The increasing availability of personal genomic tests has led to discussions about the validity and utility of such tests and the balance of benefits and harms. A multidisciplinary workshop was convened by the National Institutes of Health and the Centers for Disease Control and Prevention to review the scientific foundation for using personal genomics in risk assessment and disease prevention and to develop recommendations for targeted research. The clinical validity and utility of personal genomics is a moving target with rapidly developing discoveries but little translation research to close the gap between discoveries and health impact. Workshop participants made recommendations in five domains: (1) developing and applying scientific standards for assessing personal genomic tests; (2) developing and applying a multidisciplinary research agenda, including observational studies and clinical trials to fill knowledge gaps in clinical validity and utility; (3) enhancing credible knowledge synthesis and information dissemination to clinicians and consumers; (4) linking scientific findings to evidence-based recommendations for use of personal genomics; and (5) assessing how the concept of personal utility can affect health benefits, costs, and risks by developing appropriate metrics for evaluation. To fulfill the promise of personal genomics, a rigorous multidisciplinary research agenda is needed.
The accelerated discovery of genes for common diseases fuels expectations that genomic information will become an integral component of personalized health care and disease prevention.1,2 Several companies now offer personal genomics (PG) tests directly to consumers (DTC). For our purposes, we define PG tests as those that provide comprehensive genetic risk profiles for many diseases or targeted genetic risk profiles for specific conditions (e.g., breast cancer). Such tests can include single or multiple genes, linked or causative single nucleotide polymorphisms, functional assays, and full gene or genome sequencing. PG tests can be requested by a physician or provided DTC.3
DTC marketing of PG tests can bypass the need for clinicians to order and/or interpret genetic tests, although companies differ in the extent to which they offer genetic counseling, and some states mandate provider involvement. Some scientists have voiced concerns regarding the current scientific foundation for the clinical validity (CV) and clinical utility (CU) of PG tests and the potential impact on our health care system.4 PG tests may lead to a medical testing “cascade effect” with unwarranted diagnostic, pharmacologic, and surgical interventions.5 The cascade effect has been well described for various radiology procedures (such as total body computed tomographic scans). There are also public health concerns regarding costs and possible patient harms if such cascade effects lead to unfounded preferences for pharmaceuticals or reduced motivations to pursue healthy lifestyles (see also discussion of personal utility later). Conversely, others have argued that PG tests can empower consumers and their providers in health promotion, as well as early disease detection and management, making the health care system more proactive.6 The emergence of PG tests coincides with increasing access and demand for health information.7 The arguments both for and against the use of PG must be informed by appropriate scientific research on benefits and risks.8
To discuss the scientific foundation for using PG tests in risk assessment and disease prevention, a multidisciplinary workshop was convened by several Institutes of the National Institutes of Health and the Centers for Disease Control and Prevention. The workshop sought to enhance dialogue among various stakeholders, identify gaps in knowledge, and suggest research areas. Multiple viewpoints and disciplines were represented, including industry, consumer, clinical, epidemiology, genetics, communication, social, and behavioral sciences. The following questions were discussed using case studies. (1) Is there evidence of public and provider interest in PG testing? (2) What evidence is needed to determine whether genetic information from PG tests adds to existing risk algorithms in predicting health outcomes? (3) What evidence is needed to determine whether PG tests can improve clinical outcomes? (4) What type of research is needed to determine CV and CU of PG tests? Details of the workshop can be accessed online.9 Not included in this report are the workshop discussions of government oversight, policy, and regulation of DTC genetic tests. These issues are also being considered by various advisory groups such as the Institute of Medicine and the Department of Health and Human Services' Secretary's Advisory Committee on Genetics, Health, and Society. In particular, Secretary's Advisory Committee on Genetics, Health, and Society published recommendations for oversight of genetic tests in 2008.10
The potential utility of PG tests for risk assessment and health improvement can be viewed by stages of disease natural history and points of intervention (Table 1)11. For primary prevention, genetic information may inform decision making for minimizing risk exposures, improving health behaviors and lifestyle factors, or providing prophylactic surgery, chemoprevention, or other customized interventions. Genetic information also can be useful in early disease detection (secondary prevention), in targeting treatments (tertiary prevention), and improving survival and psychosocial outcomes (quaternary prevention). Key assumptions for the scientific foundation for PG are that risk assessment should be followed by effective and safe interventions that can reduce morbidity, improve health, or other measurable utility endpoints; and that genetic information based on the combination of variants across the genome should be evaluated like other biological markers for screening or risk assessment. Finally, these assumptions must also be put in the context of well-known principles of population screening, especially as applied to genetic risk factors for asymptomatic individuals. These principles include public health importance, knowledge of the natural history of the disorder, availability of effective interventions, and full considerations of the ethical, legal, and social and policy issues surrounding these technologies.12
Multidisciplinary translation research is needed to connect gene discovery to improved health outcomes via four overlapping phases of research, denoted T1 to T4.13 T1 research entails the development of candidate genetic tests based on discovery, replication, and clinical and epidemiologic characterization. T2 research involves the evaluation of these tests for validity and utility and the development of evidence-based recommendations for their use. T3 research involves evaluating best approaches for diffusion, dissemination, and implementation of tests into practice. T4 research entails assessing the population impact including effectiveness and economic value of these tests in real-world settings.
The evaluation of genetic tests across the four translation research phases has been described using the ACCE framework (analytic validity, CV, CU, and ethical, legal, and social implications).14 This framework was formalized by the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) working group,15 an independent panel that reviews available data on validity and utility of genomic applications and produces evidence-based recommendations.16–19 The EGAPP working group adapted the methods of the US Preventive Services Task Force (USPSTF) for genomic applications. The USPSTF is a long-standing independent panel that has developed numerous evidence-based recommendations relevant to clinical preventive services.20 The EGAPP working group developed methods for assessing genetic tests for different applications (diagnostic, screening, risk assessment, prognostic, and pharmacogenomics) for both genetic disorders and common diseases. The group is currently reviewing two topics related to PG tests (Type 2 diabetes and coronary heart disease).21 As discussed later, issues of CV and CU are qualitatively and quantitatively different for PG tests designed for risk assessment compared with diagnostic tests for genetic disorders with high penetrance.
The workshop focused on CV and CU. Participants only briefly covered analytic validity because current genomic assays have high sensitivity and specificity for measured genetic variants. However, participants agreed that oversight is still needed to ensure laboratory quality of tests and the testing process.10 The evaluation of CV and CU is an ongoing and iterative process that occurs throughout the translation continuum, including evaluating early efficacy and effectiveness in real-world settings. As discussed recently by Wideroff et al.,22 research on dissemination, diffusion, effectiveness, and impact should be considered as part of a robust health services research agenda for genomic technologies.
The CV of a genetic variant is defined by its relationship with a phenotype or a health outcome, singly or in combination with other variants and environmental factors. Two steps are needed in evaluating CV: (1) establishing credible genetic associations and (2) assessing genetic disease associations in relation to the predictive value, especially vis-à-vis existing risk factors.
Variants identified by candidate gene studies and genome-wide association studies (GWAS) tend to be common (allele frequencies >5%) with small effect sizes (odds ratios <1.5).23 Many variants are not known to alter biological function and may be in linkage disequilibrium with unknown disease-related variants.24 Resequencing is beginning to identify rare potentially functional mutations that may underlie common variant disease associations.25–27 Currently, much of the heritability for common diseases is unexplained.23,24 However, we expect the current landscape to change rapidly as more variants are discovered and gene–gene and gene–environment interactions are used to refine risk estimates. Population-based case-control and cohort studies are crucial for establishing credible risk estimates of genetic variants.28,29 In addition, the cumulative evidence for genetic associations should be rigorously evaluated by systematic reviews and meta-analyses that determine if differences exist across populations.30,31
Consensus guidelines for grading cumulative evidence on genetic associations use three criteria: amount of evidence, replication, and protection from bias.32 The amount of evidence can be defined by sample size, false discovery rate, or Bayesian credibility.32 Consistency of replication across different datasets and populations also must be considered (most published GWAS have built-in replication). Protection from bias can be difficult to determine. Typical biases include phenotype misclassification, population stratification, and selective reporting.
The cumulative assessment of genetic associations in available PG tests is still a moving target. For example, in a 2008 analysis of genetic associations included in PG tests, Janssens et al.33 found that of 56 genes tested, 24 (43%) had not been subjected to meta-analyses. For the remaining 32 genes, 260 meta-analyses examined 160 unique polymorphism-disease associations, of which 60 (38%) were nominally statistically significant. The use of different single nucleotide polymorphisms to assess risk for the same disease has occasionally resulted in divergent results given to individuals.34–36 Currently, several companies offering PG tests use genetic risk factors that have been replicated in multiple studies and have worked together toward industry-wide standards.37–40 However, such standards are still a work in progress and are not uniformly accepted or applied.
Access to credible and rapidly updated information on genetic associations is difficult to deliver and urgently needed. One such approach is the HuGE Navigator, an online, continuously updated database of citations on human genome epidemiology.41 As of June 25, 2009, the knowledge base contained 43,515 genetic association studies, 1046 meta-analyses, 368 GWAS, 5593 genes, and 2204 disease terms. The HuGE Navigator allows users to view disease- and gene-centered pages to navigate to online databases (e.g., University of California at Santa Cruz Genome Browser,42 Gene Tests,43 and PharmGKB44).
Even if a genetic association is highly credible, how useful is the information for risk assessment and disease prediction? An important aspect of CV is the degree to which variants can distinguish between those who will develop an outcome and those who will not. Measures of sensitivity, specificity, positive predictive value, and negative predictive value are needed.45 Predictive values depend on the definition and prevalence of the outcome, characteristics of the tested population, penetrance, and genetic/allelic heterogeneity. Even for predominantly single gene conditions, such as hereditary breast cancer, heterogeneity can lower the positive predictive value and therefore the validity of genetic variants.46 Decision analyses for using absolute risk models to determine appropriate interventions may give more insight than standard statistical analyses.47
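The dependence of predictive values on prevalence can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the sensitivity, specificity, and prevalence values are hypothetical and are not taken from any test discussed at the workshop.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test, using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics (assumed for illustration).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# The same test applied to populations with different outcome prevalence.
for prevalence in (0.01, 0.10, 0.30):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value falls sharply as the outcome becomes rarer, which is why the characteristics of the tested population matter for asymptomatic screening.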
Three considerations affect the CV of a PG test: the degree to which predicted risks fit observed data (calibration); the ability of the test to separate those who are truly at risk from those who are not (discrimination); and change in risk assignment compared with no testing (reclassification). These considerations are prerequisites to evaluating CU (see later).
Calibration assesses whether predicted risks from models that include genetic variants and other factors are correct.48,49 Calibration is especially important when models have untested or incorrect assumptions (e.g., independent, multiplicative effects; no interactions; and applying effect sizes obtained from various studies to the tested population).
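One simple way to examine calibration is to compare mean predicted risk with the observed event rate within groups of similar predicted risk. The following is a minimal sketch under stated assumptions: the function names, the grouping scheme, and the toy data are hypothetical, and published evaluations typically use larger cohorts and formal goodness-of-fit tests.

```python
def calibration_by_group(predicted, observed, n_groups=2):
    """Sort people by predicted risk, split them into equal-sized groups,
    and return (mean predicted risk, observed event rate) per group."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = len(pairs) if g == n_groups - 1 else start + size
        chunk = pairs[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate))
    return rows


def expected_observed_ratio(predicted, observed):
    """Overall ratio of expected to observed events (1.0 is ideal)."""
    return sum(predicted) / sum(observed)


# Toy data (assumed): ten people predicted at 5% risk, ten at 30% risk.
predicted = [0.05] * 10 + [0.30] * 10
observed = [1] + [0] * 9 + [1, 1, 1] + [0] * 7  # 1 = developed the outcome

for mean_pred, obs_rate in calibration_by_group(predicted, observed):
    print(f"mean predicted={mean_pred:.2f}  observed rate={obs_rate:.2f}")
print(f"E/O ratio={expected_observed_ratio(predicted, observed):.3f}")
```

In this toy example the higher-risk group is well calibrated, whereas the lower-risk group experiences more events than the model predicts.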
Discrimination assesses the overlap in risk distributions of people who will develop the disease and those who will not. For good discrimination, a broad distribution of risks is required. The area under the curve (AUC) is one measure of the discriminative ability of a test.50–53 It is generated by plotting all sensitivity–specificity combinations for all possible cut-off values of the predicted risks. Because of small effect sizes, most genetic variants included in the current genome profiles have low discriminative accuracy and contribute only marginally to AUC compared with existing risk factors.54–56 Many more genetic variants are needed to increase the discriminative ability and predictive value of PG.57,58
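The AUC has an equivalent interpretation that is easy to compute directly: it is the probability that a randomly chosen person who develops the disease has a higher predicted risk than a randomly chosen person who does not. The sketch below uses this pairwise formulation; the function name and the risk values are hypothetical.

```python
def auc(risks_cases, risks_noncases):
    """Area under the ROC curve, computed as the fraction of case/non-case
    pairs in which the case has the higher predicted risk (ties count half)."""
    wins = 0.0
    for risk_case in risks_cases:
        for risk_noncase in risks_noncases:
            if risk_case > risk_noncase:
                wins += 1.0
            elif risk_case == risk_noncase:
                wins += 0.5
    return wins / (len(risks_cases) * len(risks_noncases))


# Hypothetical predicted risks with heavily overlapping distributions,
# as expected when variants have small effect sizes.
cases = [0.20, 0.40, 0.60]
noncases = [0.10, 0.30, 0.40]
print(f"AUC={auc(cases, noncases):.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to complete separation of the two risk distributions.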
Reclassification refers to the proportion of persons who change risk categories when prediction models are updated to incorporate new genetic variants. If risk categories are defined according to cut points used to indicate type or intensity of interventions, reclassification can impact clinical management (see discussion of clinical utility later). However, if adding genetic information does not move individuals across risk categories, reclassification will not be clinically useful. One also has to consider the number of individuals who are in risk categories where reclassification would make a difference. Studies with few such individuals (e.g., where all persons have low risk, far from the cut-off where reclassification would influence decision making) may not be able to arrive at robust estimates of the extent and correctness of reclassification. Analyses of the AUC should be integrated with analyses of risk reclassification to maximize information on the use of new markers for risk prediction.59,60 The evidence on risk reclassification based on genetic markers is rapidly growing. For example, the association of a variant on chromosome 9p21.3 (rs10757274) with cardiovascular disease has been extensively replicated, yet the variant has been shown to be a significant risk classifier for future cardiovascular disease in some studies but not in others.61–64
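Both the extent and the correctness of reclassification can be summarized numerically. The sketch below computes the proportion of people who change category and the net reclassification improvement (NRI), one commonly used summary; the text above does not prescribe a specific metric, so the choice of NRI, the function names, the cut points, and the data are all assumptions made for illustration.

```python
from bisect import bisect_right


def category(risk, cutoffs):
    """Index of the risk category for a predicted risk, given ascending
    cut points. A risk equal to a cut point falls in the higher category."""
    return bisect_right(cutoffs, risk)


def reclassification_summary(old_risks, new_risks, outcomes, cutoffs):
    """Return (proportion reclassified, net reclassification improvement).

    Moving up is correct for people who develop the outcome (outcome == 1);
    moving down is correct for people who do not (outcome == 0)."""
    up_case = down_case = up_non = down_non = 0
    for old, new, outcome in zip(old_risks, new_risks, outcomes):
        shift = category(new, cutoffs) - category(old, cutoffs)
        if shift > 0:
            if outcome:
                up_case += 1
            else:
                up_non += 1
        elif shift < 0:
            if outcome:
                down_case += 1
            else:
                down_non += 1
    n_case = sum(outcomes)
    n_non = len(outcomes) - n_case
    moved = up_case + down_case + up_non + down_non
    nri = (up_case - down_case) / n_case + (down_non - up_non) / n_non
    return moved / len(outcomes), nri


# Hypothetical cut points and risks before/after adding genetic variants.
cutoffs = [0.10, 0.20]
old_risks = [0.08, 0.15, 0.18, 0.12, 0.09, 0.22]
new_risks = [0.12, 0.22, 0.17, 0.08, 0.09, 0.18]
outcomes = [1, 1, 0, 0, 0, 1]

moved, nri = reclassification_summary(old_risks, new_risks, outcomes, cutoffs)
print(f"proportion reclassified={moved:.2f}  NRI={nri:.2f}")
```

The NRI rewards movement in the clinically appropriate direction and penalizes movement in the wrong direction, so a model can reclassify many people and still show little net improvement.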
CU is a measure of the net health benefits of PG tests (benefits minus harms). The cumulative assessment of CU and the level of certainty associated with the assessment depend on the clinical scenarios under consideration.65 CU evaluation should follow principles of comparative effectiveness research.66 The CU of PG tests may be evaluated for their ability to improve outcomes when either added to or substituted for current approaches. Two questions arise: what kinds of nongenetic interventions could serve as adequate benchmarks against which genomic information can be compared, and what study designs would be needed to assess the impact of PG tests on health care resources (e.g., resources for case management for those found to have increased risk)? For any particular scenario, the balance of benefits and harms will depend on factors such as the predictive value of genomic information; the availability and performance measures of other risk assessment tools; the acceptability, cost, and efficacy of proven interventions to reduce risk; and possibly other factors such as potential stigmatization as well as the perceived personal value of the information. These issues are discussed in the population screening literature.12
Some suggest considering the “personal utility” of PG, in which genetic and other health information may be useful to individuals even in the absence of effective interventions. For example, in a series of publications from the Risk Evaluation and Education for Alzheimer disease study (the REVEAL Study), Green and coworkers67–71 have evaluated different methods for communicating genetic information to people at risk of Alzheimer Disease (AD). Even though there are no proven effective interventions to remediate risk, the results of these studies indicate that some people perceived this information to be useful by allowing them to prepare their families and arrange personal affairs including long-term care. Moreover, those who opted for testing did not generally experience adverse psychological effects from test results provided as part of a genetic counseling protocol, even when they learned they were at high risk for AD.
In evaluating the role of personal utility, it will be crucial to develop appropriate metrics to consider the impact of different indices of individual perceived value of personalized genomics on health-related benefits, costs, and harms associated with testing and interventions, both to individuals and the society at large.72 Finally, the impact of personal utility on appropriate use of health care resources will have to be further explored.
There is much to learn about how genetically guided risk reclassification can improve health outcomes. Well-calibrated models that can reclassify people into higher risk groups that require different interventions may provide indirect evidence of CU, but these have rarely been available in the PG field. However, we must also consider how such reclassification, especially with small changes in risks due to small effect sizes, affects the need to change interventions (e.g., cholesterol lowering drugs or screening for cancer detection), and how such changes can alter the balance of benefits and risks, as well as the economic implications of testing. There is also the possibility of frequent risk reclassification based on additional genetic variants, as demonstrated in a prospective population-based study of Type 2 diabetes risk using 18 genetic variants in addition to age, sex, and body mass index (Janssens, personal communication, 2009). The utility of learning about risk updates must be considered and its impact assessed, particularly when lifestyle and nutrition recommendations and medical decisions can vary accordingly.
An unresolved question is whether observational studies can provide sufficient CU information without randomized clinical trials (RCTs). Lord et al.73 argue that CV studies suffice to show CU if a new diagnostic test is safer or more specific than, but of similar sensitivity to, an old test. However, if a new test is more sensitive than an old test, it can lead to the detection of additional cases of disease (often milder or earlier onset). Results from the treatment trials that enroll only patients detected by the old test may not apply to these extra cases. RCTs may be needed, unless we can be satisfied that the new test detects the same spectrum and subtype of disease as the old test and that intervention response is similar across the spectrum of disease.
Hypotheses for CU often come from biological and clinical data suggesting that response to interventions (e.g., pharmacogenomics) may work differently for population groups.74,75 The question is whether this information will translate into net health benefits in practice.76 To assess the CU of genetic tests, the EGAPP working group has developed analytic frameworks similar to those developed by the USPSTF, with key questions to frame the evidence; clear definitions of clinical and other outcomes of interest; explicit search strategies; use of hierarchies to characterize data sources and study designs; quality assessment of individual studies, synthetic assessment of all available evidence, linkage of evidence to recommendations; and avoidance of conflicts of interest.15 For most genomic applications (and many other diagnostic tests), direct evidence about the effectiveness and value of testing is rarely available from RCTs. For recent evaluations, the group has constructed a chain of evidence linking the strength of the association between a genotype and disorder of interest (CV) to evidence that test results can change intervention decisions and improve net health benefits (CU). So far, the group has tackled genomic applications in symptomatic patients and their families rather than the asymptomatic population at large, the main target group for PG, although similar chains of causal reasoning and evidence synthesis can be applied. For example, based on the biological reasoning, CYP450 testing was proposed as a test for adults with nonpsychotic depression before treatment with selective serotonin reuptake inhibitors. The EGAPP Working Group reviewed evidence for validity and utility of testing and found that CYP450 genotypes were not consistently associated with clinical response to selective serotonin reuptake inhibitor treatment or adverse events, and that no clinical trials had been conducted to evaluate benefits and harms. 
Thus, CYP450 testing was not recommended for this clinical situation.16 Another example is genetic testing to inform anticoagulation therapy with warfarin. The CYP2C9 and VKORC1 genes are implicated in warfarin and vitamin K metabolism, and variants in these genes are consistently associated with warfarin bleeding complications. Without large, well-designed clinical trials, however, it is not known if genotyping to determine warfarin dosing could reduce adverse consequences of hemorrhage or thrombosis or could improve health outcomes such as reduced rates of hospitalization or mortality.77 Recently, the International Warfarin Pharmacogenetics Consortium developed an algorithm for estimating warfarin dose based on both clinical and genetic data from a broad population cohort study.78,79 In evaluating CU, economic issues also should be considered. For example, Eckman et al.80 concluded that warfarin-related genotyping is unlikely to be cost effective for typical patients with atrial fibrillation (marginal cost effectiveness exceeded $170 000 per quality-adjusted life year). However, testing for warfarin dosing may be cost effective in patients at high risk for hemorrhage.80 Economic models that include sensitivity analyses need to be revisited frequently given the declining prices of PG tests and the rapid rise in health care costs.
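The kind of sensitivity analysis described above can be sketched with an incremental cost-effectiveness ratio, that is, the additional cost divided by the additional quality-adjusted life years (QALYs) gained. Every number below is hypothetical and chosen only to show how a falling test price changes the ratio; none is drawn from Eckman et al. or any other cited analysis, and the function name is an assumption.

```python
def incremental_cost_effectiveness(delta_cost, delta_qaly):
    """Incremental cost per QALY gained for a new strategy versus a comparator.

    delta_cost: additional cost per patient of the new strategy.
    delta_qaly: additional QALYs per patient (must be positive)."""
    if delta_qaly <= 0:
        raise ValueError("the new strategy must gain QALYs for the ratio to apply")
    return delta_cost / delta_qaly


# Hypothetical inputs: genotype-guided dosing is assumed to gain 0.0025 QALYs
# per patient and to cost only the price of the test.
QALY_GAIN = 0.0025

# One-way sensitivity analysis over the assumed test price.
for test_price in (400, 200, 100):
    ratio = incremental_cost_effectiveness(test_price, QALY_GAIN)
    print(f"test price=${test_price}  cost per QALY=${ratio:,.0f}")
```

Under these assumptions the ratio scales in direct proportion to the test price, which illustrates why conclusions about cost effectiveness can shift as prices fall.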
RCTs can be used to develop direct evidence for CU of PG in relation to both behavioral and pharmacological interventions. RCTs could be used to identify subgroups of individuals based on the PG profiles where interventions are most effective and to apply intervention only in those subgroups. RCTs also could be used to identify subgroups of individuals based on the PG profiles with side effects, so that reduced dosages or alternative interventions can be used. Even if no differences in the effects of interventions exist by genotype, RCTs can be used to assess whether genotype-based interventions can be more effective overall if they improve adherence to available interventions that are designed for the general population.
Examples of research questions amenable to RCTs are given in Table 2. RCTs have rarely been conducted to assess the CU of genetic information in changing behavior. Studies that have examined health behavior change have generally found that genetic risk information by itself is insufficient to promote complex behavior changes such as smoking cessation and alteration of dietary and exercise habits (see later). However, an emerging body of evidence suggests that genetic risk information may increase preferences for biological interventions over health behavior changes when both are viable options.81 For example, some studies82–84 have suggested that individuals presented with genetic risk information are more likely to affirm the importance of pharmaceutical treatments for conditions like heart disease and depression over lifestyle change or psychotherapy. In the REVEAL study, where no proven treatments are available to prevent AD, participants who learned that they were APOE ε4+ were more likely than their APOE ε4− counterparts to report engagement in suspected but not proven AD risk reduction activities (e.g., vitamin E).71 The potential for both health benefits and harms of these activities needs to be evaluated.
An example of an ongoing RCT is a primary care-based study to assess the CU of TCF7L2 testing for Type 2 diabetes in altering behavior and health measures in prediabetic patients (Ginsburg, personal communication). Secondary goals are to measure whether changes in perceived risk and beliefs about genetics are associated with behavior change after genetic testing and to determine whether a genetics-guided clinical trial would change primary care physicians' beliefs and understanding of genetics and their role in practice, especially vis-à-vis existing approaches to diabetes control that do not use genetic information.
Behavioral research conducted to date on the CU of genetic information has focused largely on the potential of genetic information to increase perceptions of vulnerability to adverse health outcomes. A considerable literature exists around genetic testing for rare hereditary cancers.85,86 Usually, persons considering genetic testing for rare genetic disorders want to know what tests are offered, what the results might mean for themselves and their families, what information they will have access to, where they can go for more information, and whether any of the information is actionable and whether action can lead to improved outcomes.
In considering PG tests, the target group is mostly asymptomatic people in the population.87,88 With relatively high innumeracy levels, education and awareness are critical for decision making by providers and consumers. Traditionally, mass media approaches have been used to increase awareness of health issues.89 However, emerging technologies now offer new approaches to personalized communication. For example, health communication with tailored health messages sent to mobile phones can be useful in conveying information about alcohol-related risks.90 In theory, educational approaches could be tailored not only to individual demographics, psychosocial beliefs, abilities, and preferences but also to genomic information. However, to date, rigorous evaluations of these technologies have been infrequent. Research is needed to develop effective and context-based approaches to communicating genetic information that promote comprehension, informed decision making about testing, and adoption of healthy behaviors. Such research will need to incorporate best practices from the risk communication literature on how to distinguish actionable health messages from those that are inconclusive or potentially misleading.91,92
Early studies evaluating the use of single gene variants to convey personalized risks for lung cancer to cigarette smokers have shown no benefit for smoking cessation.93,94 McBride et al.8 are evaluating a prototypic “multiplex” genetic susceptibility test similar to those marketed DTC. The goal of this project is to evaluate the characteristics of individuals who are most interested in such testing and whether the information provided by PG can spur individuals to seek additional risk assessments (e.g., family history and behavioral risk assessments) and/or additional health services (e.g., well care visits). Although the multiplex study does not directly involve measurement of health outcomes, it will provide valuable information on social and psychological differences between those who opt to be tested versus those who decline testing, whether individuals who opt for such testing are able to accurately interpret their test results, whether interpretation of test results is associated with positive or negative emotions or changes in risk perception, and whether PG test results lead individuals to seek other personal health risk information.8 Another opportunity in this line of research is determining whether participants communicate such test results to their primary care providers and if so, how this information impacts service delivery in primary care. Some have expressed concern that an unintended consequence of the DTC model may be “raiding the medical commons,” as consumers who are encouraged to “ask their doctor” may bring PG test information to providers who are unequipped to interpret the information, which consumes time and resources, may take away from preventive services of established value, and may result in ordering unwarranted procedures or interventions.5
As we consider the potential utility of PG, we must use a multidisciplinary approach that moves beyond the focus on the psychological effects of risk communication to understand the value of PG in behavioral change. Existing but limited public health interventions, such as promoting energy balance to prevent obesity, have not been completely effective at the population level. Clinical trials evaluating weight loss interventions consistently show high attrition rates because individuals have difficulty adhering to recommendations for energy balance.95 It is unclear, but crucial to learn, whether PG information could be used to tailor interventions that promote weight loss.96 PG information also may enable us to further deconstruct behavioral phenotypes to identify and measure pathways that influence health behaviors. In turn, this information could offer new behavioral targets for intervention.
Workshop participants made five broad recommendations to enhance the scientific foundation for using PG as a tool for improving health. Specific areas of discussion are also published online.9
Several companies are collaborating in developing standards for PG tests. This work should be expanded to include transparent criteria for analytic standards, clinical standards on selection of genetic variants with high credibility, use of appropriate data to interpret reported allelic odds ratios in terms of overall risk compared with appropriate reference populations, and model calibration and evaluation of risk distributions for health conditions included in PG tests. The current statement from three companies represented at the workshop is available.40 In addition, standards for evaluating the CV and utility of PG tests need to be developed by independent panels (see fourth recommendation below).
Multiple scientific disciplines are needed to develop the PG field (Table 3). In addition to biological studies that can point to therapeutic and preventive interventions, epidemiologic studies are needed for risk characterization, especially of gene–gene and gene–environment interactions. Study cohorts must be quite large to have adequate statistical power.97 Clinical and population studies using communication, behavioral, and social sciences are needed across the translation continuum to assess the effectiveness of genetic information for consumers and providers. We need a robust health services research agenda that includes dissemination research to assess the uptake of evidence-based practice into routine care and outcomes research to improve the quality and effectiveness of health services. Also, we need public health surveillance and assessment of cost effectiveness and impact on health disparities. Current federal genomic initiatives in translation research98–103 should be enhanced. New models of translation research should be explored such as current collaborations among industry, academia, consumers, and government.104
Timely cumulative knowledge synthesis, based on standardized formats and systematic, evidence-based processes, is needed to summarize and update information on genetic associations and to document their CV and CU. Such information needs to be translated in an accessible fashion and disseminated to consumers, providers, and policy makers to inform decision making. Given the rapid pace of discovery in the PG field, new mechanisms may be needed to provide rapid turnaround of evidence reviews, to keep all stakeholders current relative to the best available data. This will require enhanced public and provider education.
The evidence threshold for implementing personal genomic information into clinical practice and disease prevention must be considered by independent panels that have no conflict of interest and use rigorous systematic evaluations. The current dilemma is that setting the evidence bar too low allows genomic discoveries to diffuse into practice before there is adequate information on CV and CU. Consequently, payers may not cover testing costs. Conversely, setting the evidence bar too high may yield tests with high validity and utility but reduce the financial incentive for developers to innovate. Paradoxically, this could lead to fewer developed tests and potentially diminished health benefits from PG tests.105 Because PG tests potentially affect a large number of asymptomatic persons in the population, extra caution is needed to establish appropriate evidentiary thresholds. Such screening tests can expose large numbers of healthy people to potential harms from false-positive results (such as anxiety and “labeling,” as well as additional invasive testing and treatment) or from false-negative results (such as false reassurance and attendant lapses in personalized risk factor reduction efforts). As a result, independent groups formulating evidence-based clinical recommendations, such as the USPSTF, have required a high level of certainty that the benefits of screening outweigh the harms and therefore have set a high evidence bar for recommending preventive services. To achieve an appropriate linkage between evidence and practice, independent panels such as EGAPP and USPSTF should provide rapid and timely assessments to determine in a systematic and transparent fashion whether a PG test and its associated interventions do more good than harm in specific population groups or on a population-wide basis. In preventive services, the USPSTF has had a major role for more than three decades and has conducted formal analyses of hundreds of preventive interventions. The task force has generally set a high evidentiary bar for preventive tests used in asymptomatic populations. EGAPP has recently established methods and processes for genomic-related applications. The field of PG will greatly benefit from such independent evaluations.
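The concern about false-positive results in asymptomatic populations can be made concrete with a short calculation. The sketch below is only an illustration: it treats a PG result as a binary "elevated risk" flag with a fixed sensitivity and specificity, which simplifies how PG tests actually report risk, and the function name `screening_outcomes` and all input values are hypothetical rather than taken from any test or study.

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected counts when a binary test is applied to n people.

    Simplifying assumption: the test result is a yes/no flag with a
    fixed sensitivity and specificity.
    """
    affected = n * prevalence
    unaffected = n - affected
    true_pos = affected * sensitivity
    false_neg = affected - true_pos
    false_pos = unaffected * (1 - specificity)
    true_neg = unaffected - false_pos
    # Positive predictive value: share of positive results that are correct.
    ppv = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "ppv": ppv,
    }


# Hypothetical scenario: 100,000 asymptomatic people, 1% of whom will
# develop the condition, tested with 90% sensitivity and 95% specificity.
result = screening_outcomes(n=100_000, prevalence=0.01,
                            sensitivity=0.90, specificity=0.95)
print(f"False positives: {result['false_pos']:.0f}")
print(f"False negatives: {result['false_neg']:.0f}")
print(f"Positive predictive value: {result['ppv']:.1%}")
```

Under these assumed inputs, false-positive results outnumber true-positive results several times over even though the test looks accurate on paper, which is the arithmetic behind the high evidence bar for screening healthy populations.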
Finally, as discussed earlier, we should continue to explore individuals' and population subgroups' notions of perceived personal utility of PG (e.g., advantages of learning about genomic risk) and to assess whether personal utility may affect measures of CU (e.g., via improved adherence to recommendations). However, these perceptions of utility will need to be considered in the context of broader societal costs. For personal utility to be scientifically supported, objective metrics must be developed and applied in rigorous multidisciplinary observational studies and RCTs. These metrics should include measurable benefits, harms, and costs of PG testing and interventions.
In conclusion, to make the best use of PG for improved health outcomes, the CV and CU of these tests must be understood by consumers, providers, and policy makers. Scientific standards for evaluating these tests must be established and a mechanism put in place to provide authoritative, unbiased, timely reviews of new discoveries. Clinical, epidemiologic, communication, behavioral, social, and economic studies of PG must be rigorously pursued. Finally, these scientific standards have to be examined in the context of principles of population screening with full consideration of the ethical, legal and social, economic, and policy issues.
The authors thank Linda Avey, Alan Guttmacher, and Teri Manolio for their contributions to the workshop, and Daniela Seminara and Mukesh Verma for their comments on the manuscript.
The following coauthors have conflicts of interest: Jeff Gulcher and Kari Stefansson (DeCODE Genetics); Andrew Hso (23andMe); Amy DuRoss and Michele Cargill (Navigenics); and George Church (Knome). Geoff Ginsburg is on the physician advisory board for 23andMe. Amy Miller is employed by the Personalized Medicine Coalition, an organization whose members include consumer genomics companies. Sharon Terry is on the board of directors of DNA Direct.
Disclosure: Several authors have conflicts of interest because of employment and other affiliations. See the complete list in the Acknowledgments section.