|Home | About | Journals | Submit | Contact Us | Français|
Utilization of sequencing to screen the general population for preventable monogenic conditions is receiving substantial attention due to its potential to decrease morbidity and mortality. However, the selection of which variants to return is a serious implementation challenge. Procedures must be investigated to ensure optimal test characteristics and avoidance of harm from false positive test results.
We scanned exome sequences from 478 well-phenotyped individuals for potentially pathogenic variants in 17 genes representing 11 conditions that are among the most medically actionable Mendelian disorders in adults. We developed 5 variant selection algorithms with increasing sensitivity and measured their specificity in these 17 genes.
Variant selection algorithms with increasing sensitivity exhibited decreased specificity, and performance was highly dependent on the genes analyzed. The most sensitive algorithm ranged from 88.8% to 99.6% specificity among the 17 genes.
For very low prevalence conditions, small reductions in specificity greatly increase false positives. This inescapable test characteristic governs the predictive value of genomic sequencing in the general population. To address this issue, test performance must be evaluated systematically for each condition so that the false negatives and false positives can be tailored for optimal outcomes, depending on the downstream clinical consequences.
Screening programs can be valuable public health tools. Universal newborn screening has been highly successful in detecting severe but preventable genetic disorders. Such programs utilize defined mechanisms to select target conditions, based on their prevalence, severity, treatment options, and availability of a confirmatory test.1 Similar screening programs (based on genomics) may be emerging for the adult population, approximately 1% of whom are predisposed to a serious hereditary condition that may be preventable or ameliorated through early diagnosis.2 Large scale genomic sequencing initiatives comprising over 100,000 people have been announced3,4 and screening of the general adult population for hereditary cancer has recently been proposed.5 President Obama also announced a U.S. initiative to recruit a cohort of one million people in order to advance the cause of “Precision Medicine,” echoing the UK’s effort to sequence whole genomes of 100,000 patients focusing on cancer and rare diseases. Finally, we are witnessing the emergence of direct to consumer companies marketing the opportunity for genomic screening to healthy individuals, thus potentially initiating a vast uncontrolled experiment in such an approach.
Human genetic variation is ubiquitous, with ~3 million nucleotide variants per individual genome. The vast majority of variants have no health implications, but certain rare variants cause heritable monogenic conditions. Some variants have undisputed pathogenicity in these disorders, whereas most have limited or no evidence of pathogenicity and all individuals have novel variants that are essentially “private” to their family. Importantly, many variants previously claimed as causal for monogenic disorders have conflicting assertions regarding pathogenicity, have been disputed by subsequent evidence,10-13 or have been determined to have less penetrance than other disease-causing variants in the same gene.14 Genetic variants identified in clinical sequencing are typically classified into 5 categories with respect to their etiologic role in monogenic disorders: Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, and Benign.15 The Pathogenic designation implies virtually complete certainty that the variant is causal for the disorder; however, there is no universal agreement on what “likely” means. One proposal suggested that the Likely Pathogenic designation should imply 95-99% confidence in the pathogenicity of the variant,16 but quantitating confidence in variant pathogenicity is difficult and few standardized methods exist.17 For most conditions there are no “gold standard” confirmatory tests that can adjudicate the pathogenicity of genetic variants.
In screening, the test performance (sensitivity and specificity) and the prevalence of the disorder determine the predictive value of a screen positive result. If genomic screening is misapplied in the general population, false positive results could lead to overtreatment, overt harm and monetary waste. Thus, it is imperative to understand the performance of sequencing and how to optimize thresholds for returning results in the novel context of population screening, which are likely to be dramatically different than in a clinical diagnostic context.
Because of their low population prevalence, some monogenic disorders would require screening >10,000 people in order to detect a single true positive result. In such conditions, positive predictive value (PPV) is highly dependent on specificity, such that for a condition with 0.01% population frequency, reduction from 100% specificity to 99.94% specificity decreases PPV to 10%. This effect is similar but less pronounced in conditions with higher population frequencies. Although the technical sensitivity and technical specificity of sequencing (whether a genetic variant is truly present or not) can be measured relatively easily, there has so far been no effort to determine the clinical sensitivity or clinical specificity (whether the condition is truly present or not) of sequencing in the general population. Thus, it is completely unknown how genomic screening will fare in terms of predictive value.
The Center for Genomics and Society at the University of North Carolina at Chapel Hill is conducting an exploratory project, “GeneScreen,” to examine the feasibility and ethical considerations of screening 1,000 adults from the general population using massively parallel targeted sequencing of 17 genes responsible for 11 conditions that are among the most clinically actionable monogenic disorders in adults. A central obstacle to such screening is to clearly define criteria for variant selection and return, so as to avoid unacceptable numbers of false positives that would lead to unnecessary intervention and negative emotional, physical, or social consequences. Here, we evaluate different algorithms for selecting potentially pathogenic variants. The most stringent algorithm chooses variants that correspond closely to those that would be classified as “Pathogenic” in human review; the least stringent algorithm selects many additional variants that would be classified as having “Uncertain Significance.” We estimate the specificity of these algorithms in 478 exomes from a diverse cohort of well-phenotyped participants in a separate clinical sequencing exploratory research project, allowing us to simulate expected findings in the general population. This represents the first attempt to empirically measure the performance of genomic sequencing for population screening.
Exome data from 478 participants in the IRB-approved North Carolina Clinical Genomic Evaluation by Next-gen Exome Sequencing (NCGENES) project were analyzed for the 17 GeneScreen genes (APC, BRCA1, BRCA2, HFE, FBN1, KCNH2, KCNQ1, LDLR, MLH1, MSH2, MSH6, MUTYH, PMS2, RET, RYR1, SCN5A, SERPINA1). The NCGENES participants were previously sequenced in a diagnostic context for a variety of phenotypes, including cancer, aortic aneurysm, and arrhythmia (manuscript in preparation). Any participants who were sequenced for a primary indication overlapping with a condition described in this manuscript were excluded from the analysis for those conditions. Exome variants were loaded into a PostgresSQL database (v.9.0.3) for annotation and facilitation of queries.18
Variants classified as pathogenic or likely pathogenic in the National Center for Biotechnology Information ClinVar database19 (downloaded 2/14) and variants labeled as “DM” (Disease Mutation) in the Human Gene Mutation Database20 (HGMD; 2014, v2) were collected as a source of potentially pathogenic variants. Recognizing that many variants in these databases may be erroneously classified, variants were excluded if pathogenicity assertions were discordant.
Population allele frequency estimates were determined using the Exome Aggregation Consortium (ExAC), a resource composed of 63,358 unrelated individuals sequenced through a variety of studies.21 Each gene was analyzed using one of three different minor allele frequency thresholds: 1%, 0.1%, or 0.01%. The minor allele frequency threshold chosen for each gene was based on the maximum expected population frequency of pathogenic alleles for the associated condition. In addition, variants in genes associated with conditions having dominant inheritance were eliminated if the minor allele frequency in the exome sample was >0.5%. CADD scores22 were retrieved for novel missense variants, with a Phred score of 13 as the threshold for deleteriousness. Conserved functional domains within proteins were obtained from the RefSeq database.23
Five variant selection algorithms (VSA1-5) were applied to the 17 genes, with each successive algorithm choosing more potentially pathogenic variants that could qualify as a “positive” screening result:
The algorithms take into account zygosity, such that either a homozygous variant or two potentially biallelic variants were required for a “positive” screen in MUTYH, HFE, and SERPINA1.
While each successive algorithm is expected to have increased sensitivity, the actual sensitivity cannot be directly measured in this cohort because we excluded participants with phenotypes overlapping these conditions. Given the rarity of the conditions being evaluated, we do not expect to find truly affected individuals for most of the disorders. Therefore, the number of true positives in the sample should be between 0-1 for most conditions, even including non-penetrant alleles. For estimated specificity calculations, we assumed that all “positive” screening results in this small sample were false positives, with the exception of certain variants with undisputed pathogenicity after curation of results from VSA-1 and VSA-2 (eg., HFE C282Y, Ashkenazi founder mutations in BRCA1 and BRCA2). Specificity estimates were adjusted and confidence intervals were calculated using the Wilson score interval.24
The GeneScreen project has identified 17 genes implicated in 11 monogenic disorders that are expected to be highly actionable in the adult population, in which to explore genomic screening in the general population. We simulated the performance of genomic screening of these genes in the general population by applying five algorithms for identifying potentially pathogenic variants in 478 exomes (Figure 1). VSA-1, which includes only variants classified as “pathogenic” in ClinVar, has the lowest sensitivity of the algorithms tested and returned the fewest variants overall, with specificity expected to approach 100% in ideal testing conditions if results are restricted to unquestionably pathogenic variants. Even so, some variants currently listed in ClinVar as “pathogenic” are not definitively pathogenic and thus the specificity of VSA-1 was 99.5%. The overall specificity of VSA-2 was also 99.5% and the overall specificity of VSA-3 was 99.4%. As expected, the most sensitive algorithms, which include rare missense variants, had the lowest specificity: VSA-4 had an overall specificity of 98.7%, and VSA-5 had an overall specificity of 97.1%. This demonstrates that the increased sensitivity afforded by inclusion of missense variants results in a concomitant decrease in overall specificity and hence would result in a substantial increase in false positive results.
However, it is more important to consider the performance of the algorithms with respect to each gene. Two individuals were homozygous for the HFE C282Y pathogenic variant, and another two individuals were homozygous for the SERPINA1 pathogenic variant that results in the “Pi-Z” Alpha-1 Antitrypsin deficiency phenotype. These findings are interpreted to be “true positives” since their pathogenicity is undisputed. Two of the genes with very low prevalence (MUTYH and RET) had no “positive” screening results identified by any algorithm. Most of the genes had very few “positive” results identified by either VSA-1 or VSA-2, indicating that increasing the sensitivity of the algorithm by including rare predicted truncating variants does not dramatically reduce specificity. VSA-3 (which includes variants classified in ClinVar as “likely pathogenic” or HGMD as “disease mutations”) identified numerous “positive” results in several genes, including BRCA1, BRCA2, KCNQ1, LDLR, and RYR1, consistent with the well-established concern of misclassification of variants in current reference databases and highlighting the need for databases with high quality lists of pathogenic variants. Finally, a large number of “positive” results were identified in most genes by the more sensitive algorithms, VSA-4 and VSA-5, reflecting higher underlying variation in certain genes that could impose an upper limit on the possible clinical sensitivity of genomic screening in order to preserve specificity. The performance of the variant selection algorithms for each gene was plotted as 1-specificity (Figure 2).
This study models genomic screening of 17 genes for 11 highly actionable monogenic disorders and demonstrates the tradeoff between sensitivity and specificity among variant selection algorithms that might be used in a large-scale screening program. Genomic screening of the general population differs significantly from the much more familiar pursuit of diagnostic genomic sequencing because a symptomatic patient has a much higher pre-test probability of having a pathogenic variant than the average individual in the general population. We have evaluated informatics algorithms that can be used to screen genomic data for potentially pathogenic variants, and for the first time attempted to measure the specificity of genomic findings in the context of population screening.
For most conditions, the true clinical sensitivity of genetic testing is not well established. In some cases, a very small number of variants account for a high proportion of cases and thus sensitivity of the most stringent variant selection algorithms will be high, whereas in other conditions a high proportion of cases are due to novel or private mutations that would not be documented in databases of known pathogenic variants, thus reducing the sensitivity of the most stringent variant selection algorithms. In most conditions it is very difficult to discern what fraction of cases are caused by recurrent mutations (which would be considered “known pathogenic”) versus novel or rare mutations that may only occur in a small fraction of cases (which might be considered “likely pathogenic”). That being said, the predictive value of a sequencing-based screening test for rare monogenic disorders hinges predominantly on specificity, not sensitivity.
We suggest that the degree of confidence regarding variant pathogenicity is analogous to the specificity of that result. Tests reporting only “Pathogenic” variants approach 100% specificity, whereas tests reporting “Likely Pathogenic” variants have 90-95% specificity depending on the confidence standard applied. In the diagnostic context, reduced specificity is acceptable because of the high pre-test probability and the importance of excluding a diagnosis, when possible. However, when screening the general population for rare conditions, even a small decrease in specificity has a devastating impact on PPV. Conversely, in the case of rare conditions NPV is not significantly improved by even large changes in sensitivity, which decreases the imperative to consider novel variants that inherently reduce the specificity of the screening test. These factors are critical points to consider in the design of genomic screening algorithms for the healthy population.
For example, the prevalence of three BRCA1 and BRCA2 pathogenic founder mutations in the Ashkenazi Jewish population is 2.6%25 and the specificity for these variants is 100%, resulting in a very high PPV. However, the specificity of most other variants (including novel truncating variants) will be less than 100%. In the general population, where disease prevalence is less than 1% and a much wider range of pathogenic variants is detected,26 a decrease of specificity to even 99% would result in a PPV of 37%, which some may find unacceptable for application of such screening in the general population. This dramatic decrease in PPV, despite a very small change in specificity, is true for all rare monogenic conditions.
It should be noted that fully automated analysis of results without some level of human curation prior to confirmation and reporting is not possible at present; thus variants flagged as a possible “positive” screening result by informatics algorithms are expected to undergo scrutiny by a human analyst to further reduce reporting of clearly non-pathogenic variants. This necessity for some level of human curation will be a critical limiting factor for deployment of genomics in large-scale population screening but can be optimized by designing variant selection algorithms that maximize the sensitivity of the screen without burdening the reviewer with numerous non-pathogenic variants. In the long term, the empiric sensitivity and specificity of variant selection algorithms may be a more accurate measure of predictive value than the pathogenicity classifications used in diagnostic testing.
The conditions being evaluated in the GeneScreen project are heterogeneous with respect to the penetrance, expressivity of manifestations, and recommended follow-up management strategies (Table 1). Positive genomic screening results in these conditions would lead to increased clinical surveillance and other interventions. In the case of pathogenic variants in BRCA1 or BRCA2, management options include increased surveillance or prophylactic mastectomy and/or bilateral salpingoophorectomy. These recommendations are very different from management of Familial Hypercholesterolemia due to pathogenic variants in LDLR, which includes periodic monitoring of cholesterol and treatment with cholesterol-lowering medications, an efficacious intervention with dramatically less impact (physical, psychosocial and monetary). Thus, the downstream consequences of positive screening results will differ between conditions. It should be noted that in conditions with incomplete penetrance, even truly pathogenic variants may not manifest in disease and therefore would constitute overdiagnosis in some individuals who would not develop disease in their lifetime. This concern is similar in many ways to a false positive result, in that the downstream consequences would not be expected to benefit these individuals and would only expose them to harm.
The negative consequences of genomic screening can be minimized if any of the following conditions exists: 1) test performance maximizes PPV and thus results in few false positives, 2) low-risk confirmatory follow-up tests can reduce the number of false positives, or 3) downstream consequences of false positives are relatively insignificant. Thus, a genomic screening project could consider customizing PPV thresholds for different conditions based on the above criteria and available information about the mutational spectrum and proportion of cases attributable to known versus novel mutations (Table 1). For example, for BRCA1 or BRCA2 the false positive rate will need to be very low, because risk-reducing surgery is an important management step and no confirmatory tests exist to ensure that a genetic variant is indeed pathogenic. Thus, only carefully curated lists of pathogenic variants and, perhaps, rare truncating variants (the equivalent of VSA-2) should be reported in a screening context. In other cases, inclusion of rare, potentially damaging missense variants (VSA-4) may be acceptable, depending on the specificity of variant selection algorithms for the gene, the spectrum of pathogenic mutations observed, and the false positive tolerance based on clinical implications of a positive screening result.
The argument that population genomic screening should utilize gene-specific false positive thresholds requires precise measurement of the clinical sensitivity and clinical specificity of variant selection algorithms (as well as the prevalence of the condition) in order to optimize PPV. However, such estimates will likely require a dataset of 100,000+ individuals with extensive phenotypic data, which does not yet exist, and the specificity estimates provided here should be considered provisional. The population being sequenced also needs to be considered – existing datasets fail to comprehensively ascertain the pathogenic and benign variants that are present in diverse populations, which will impact variant selection algorithms and pathogenicity assessments. In addition, databases of clinically relevant variants are known to have entries with misattributed pathogenicity, and will need to be greatly improved before they can be relied upon for screening.
Screening the general population for deleterious variants in highly selected genes holds great promise to decrease morbidity and mortality for millions of individuals. However, we must be cognizant of our limited ability to interpret the pathogenicity of rare genomic variation, the implications this has for standards used to analyze and report genomic findings, and the downstream consequences of screening in the general population. Otherwise, great harm could result. Much additional research is required to assess the overall cost of screening efforts, including the downstream effects of false positive tests, in order to evaluate the overall economic impact of genomic screening.27 A host of factors will need to be studied, including cost-effectiveness, adverse consequences of interventions, insurance impact, education and consent materials that address the complexity of potential harms and benefits of screening, implications for reproductive choices, potential psychosocial impacts, and the benefits and harms of familial cascade testing.
In this study we have addressed a small subset of highly actionable genetic conditions, that we think are currently the most promising candidates for genomic screening in the general adult population. However, these results also have fundamental implications for any effort that considers genome-scale sequencing in healthy individuals. Although genomic sequencing is inherently appealing for personalized medical care, it should be initiated cautiously with rigorous attention to the predictive value of genetic variation, and with careful consideration of the implications of false positive results across the broad range of genetic conditions that could potentially be identified.
This study was supported by NHGRI 2P50 HG004488, NHGRI 1U01 HG006487, and the Howard Holderness Distinguished Medical Scholars Foundation.
The authors report no conflicts of interest. M.C.A performed data analysis. J.S.B had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
GeneScreen Investigators include:
Daniel K. Nelson, MS, CIP, National Health and Environmental Effects Research Laboratory, Environmental Protection Agency
Debra G. Skinner, PhD, Frank Porter Graham Child Development Institute, University of North Carolina at Chapel Hill
John M. Conley, J.D., Ph.D., School of Law, University of North Carolina School at Chapel Hill
Christine Rini, PhD, Department of Health Behavior, Gillings School of Global Public Health, University of North Carolina at Chapel Hill
Marcia Van Riper, RN, PhD, FANN, School of Nursing, University of North Carolina at Chapel Hill
Gabriel Lázaro-Muñoz, PhD, JD, MBE, Center for Genomics and Society, University of North Carolina at Chapel Hill
Julianne M. O’Daniel., MS, CGC, Department of Genetics, University of North Carolina at Chapel Hill
R. Jean Cadigan, PhD, Department of Social Medicine, University of North Carolina at Chapel Hill
Rebecca L. Walker, PhD, Department of Social Medicine, University of North Carolina at Chapel Hill
Ann Katherine M. Foreman, MS, CGC, Department of Genetics, University of North Carolina at Chapel Hill
Eric Juengst, PhD, Department of Social Medicine, University of North Carolina at Chapel Hill
Myra I. Roche, M.S., C.G.C., Department of Pediatrics, University of North Carolina at Chapel Hill
Kirk Wilhelmsen, MD, PhD, Department of Genetics, University of North Carolina at Chapel Hill