In this paper, we discuss common challenges in and principles for conducting systematic reviews of genetic tests. The types of genetic tests discussed are those used to 1) determine risk or susceptibility in asymptomatic individuals; 2) reveal prognostic information to guide clinical management in those with a condition; or 3) predict response to treatments or environmental factors. This paper is not intended to provide comprehensive guidance on evaluating all genetic tests. Rather, it focuses on issues that have been of particular concern to analysts and stakeholders and on areas that are of particular relevance for the evaluation of studies of genetic tests. The key points include:
With recent advances in genotyping, it is expected that whole genome sequencing will soon be available for less than $1000. Consequently, the number of studies of genetic tests will likely increase substantially, as will the need to evaluate studies of genetic tests. The general principles for evaluating genetic tests are similar to those for interpreting other prognostic or predictive tests, but there are differences in how the principles need to be applied and the degree to which certain issues are relevant, particularly when considering genetic test results that provide predictive rather than diagnostic information.
This paper focuses on issues of particular concern to analysts and stakeholders and areas of particular relevance for the evaluation of studies of genetic tests. It is not intended to provide comprehensive guidance on evaluating all genetic tests. We reflect on genetic tests used to 1) determine risk or susceptibility in asymptomatic individuals (to identify individuals at risk for future health conditions, such as BRCA1 and BRCA2 for breast and ovarian cancer); 2) reveal prognostic information to guide clinical management and treatment in those with a condition (e.g., Oncotype Dx® for breast cancer recurrence, a test to evaluate the tumor genome of surgically excised tumors from patients with breast cancer); or 3) predict response to treatments or environmental factors including diet (nutrigenomics), drugs (pharmacogenomics, such as CYP2C9 and VKORC1 tests to inform warfarin dosing), infectious agents, chemicals, physical agents, and behavioral factors. We do not address genetic tests used for diagnostic purposes. We address issues related to both heritable mutations and somatic mutations (e.g., genetic tests for tumors).
Clinicians, geneticists, analysts, policymakers, and other stakeholders may have varying definitions of what is considered a “genetic test.” We have chosen to use a broad definition in agreement with that of the Centers for Disease Control and Prevention (CDC)-sponsored Evaluation of Genomic Applications in Practice and Prevention (EGAPP) and the Secretary’s Advisory Committee on Genetics, Health, and Society,1 namely: “A genetic test involves the analysis of chromosomes, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), genes, or gene products (e.g., enzymes and other proteins) to detect heritable or somatic variations related to disease or health. Whether a laboratory method is considered a genetic test also depends on the intended use, claim, or purpose of a test.”1 The same technologies are used for diagnostic and predictive genetic tests; it is the intended use of the test result that determines whether it is a diagnostic or predictive test.
In this paper, we discuss principles for addressing challenges related to developing the topic and structuring a genetic test review (context and scoping), as well as performing the review. This paper is meant to complement the Methods Guide for Comparative Effectiveness Reviews.2 We do not attempt to reiterate the challenges and principles described in earlier sections of this Medical Test Methods Guide, but focus instead on issues of particular relevance for evaluating studies of genetic tests. Although we have written this paper to serve as guidance for the Agency for Healthcare Research and Quality (AHRQ) Evidence-based Practice Centers (EPCs), we also intend for this to be a useful resource for other investigators interested in conducting systematic reviews on genetic tests.
Genetic tests are different from other medical tests in their relationship to the outcomes measured. Reviewers need to take into account the penetrance of the disease, time lag to outcomes, variable expressivity, and pleiotropy (as defined below). These particular aspects of genetic tests call for specific actions at various stages of planning and performing the review. Both single-gene and polygenic disorders are known. Single-gene disorders are the result of a single mutated gene and may be passed on to subsequent generations in various well-described ways (e.g., autosomal dominant, autosomal recessive, X-linked). Polygenic disorders are the result of the combined action of more than one gene and are not inherited by simple Mendelian patterns; examples include heart disease and diabetes. Some of the terms described below (penetrance, variable expressivity, and pleiotropy) are generally used to describe single-gene disorders.
Evaluations of predictive genetic tests should always consider penetrance, defined as “the proportion of people with a particular genetic change who exhibit signs and symptoms of a disorder.”3 Penetrance is a key factor in determining the future risk of developing disease and assessing the overall clinical utility of predictive genetic tests. Sufficient data to determine precise estimates of penetrance are sometimes lacking.4,5 This can be due to the lack of reliable prevalence data or a lack of long-term outcomes data. In such cases, determining the overall clinical utility of a genetic test is difficult. In some cases, modeling with sensitivity analyses can be helpful to develop estimates.4
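To illustrate why small carrier cohorts yield imprecise penetrance estimates, the following minimal Python sketch computes penetrance as a simple proportion with a Wilson score interval. The function name and the counts are hypothetical and are not drawn from any study cited here. The sketch also ignores age-dependent penetrance and ascertainment bias, which a real analysis would need to address.

```python
import math

def penetrance_with_interval(affected_carriers, total_carriers, z=1.96):
    """Point estimate of penetrance and a Wilson score interval.

    affected_carriers: carriers of the variant who developed the disorder
    total_carriers:    all carriers of the variant who were followed
    z:                 normal quantile (1.96 gives an approximate 95% interval)
    """
    if total_carriers <= 0 or not 0 <= affected_carriers <= total_carriers:
        raise ValueError("counts must satisfy 0 <= affected <= total and total > 0")
    n = total_carriers
    p_hat = affected_carriers / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return p_hat, center - half_width, center + half_width

# Hypothetical cohorts with the same observed penetrance but different sizes.
for affected, total in [(3, 10), (30, 100), (300, 1000)]:
    estimate, lower, upper = penetrance_with_interval(affected, total)
    print(f"{affected}/{total}: {estimate:.2f} ({lower:.3f} to {upper:.3f})")
```

The interval narrows as the number of carriers grows, which is one way to see why limited long-term outcome data make it hard to judge clinical utility.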
The time lag between genetic testing and clinically important events should be assessed in critical appraisal of studies of such tests. Whether the duration of studies and follow-up is sufficient to characterize the relationship between positive tests and clinical outcomes is an important consideration. In addition, it should be determined whether or not subjects have reached the age beyond which clinical expression would be likely.
Variable expressivity refers to the range of severity of the signs and symptoms that can occur in different people with the same condition.3 For example, the features of hemochromatosis vary widely. Some individuals have mild symptoms, while others experience life-threatening complications such as liver failure. The degree of expressivity should be considered in the evaluation of genetic tests.
Pleiotropy occurs when a single gene influences multiple phenotypic traits. For example, the genetic mutation causing Marfan syndrome results in cardiovascular, skeletal, and ophthalmologic abnormalities. Similarly, BRCA mutations can increase the risk of a number of cancers, including breast, ovarian, prostate, and melanoma.
Another common challenge in evaluating predictive genetic tests is that direct evidence for the impact of the test results on health outcomes is often lacking. The evidence base may often be too limited in scope to evaluate the clinical utility of the test. In addition, it is often difficult to find published information on various aspects of genetic tests, especially data related to analytic validity. For example, laboratory-developed tests (LDTs) are regulated by the Centers for Medicare & Medicaid Services (CMS) under the Clinical Laboratory Improvement Amendments (CLIA) regulations for clinical laboratories. CLIA does not require clinical validation, and many LDTs have had no clinical validation or clinical utility studies.
Genetic tests also have a number of technical issues that are particularly relevant to assessing their analytic validity. These technical issues may differ according to the type of genetic test and may influence the interpretation of a genetic test result. Technical issues may also differ depending on the specimen being tested. For example, there are different considerations when assessing tumor genomes as opposed to patient (germline) genomes.
Common challenges arise when attempting to use genetic tests to determine susceptibility or risk in asymptomatic individuals. The utility of such tests may depend on the ability of respondents, such as the patient or their relative, to report and identify certain clinical factors. For instance, if patients cannot accurately recall the family history of a heritable disease, it can be difficult to assess their risk of developing the disease.
Finally, statistical issues must be taken into account when evaluating studies of genetic tests. For example, genetic test results are often derived from analytically complex studies that have undergone a very large number of statistical tests, creating a high risk of Type I error (i.e., a spurious association is deemed significant).
Organizing frameworks for evaluating genetic tests have been developed by the United States Preventive Services Task Force (USPSTF), the CDC, and EGAPP.1,6,7 The model endorsed by the EGAPP initiative1 was based on a previous Task Force report8 and developed through a CDC-sponsored project, which piloted an evidence evaluation framework that applied the following three criteria: 1) analytic validity (technical accuracy and reliability), 2) clinical validity (ability to detect or predict an outcome, disorder, or phenotype), and 3) clinical utility (whether use of the test to direct clinical management improves patient outcomes). A fourth criterion was added: 4) ethical, legal, and social implications.6 The ACCE model (Analytic validity, Clinical validity, Clinical utility, and Ethical, legal and social implications) includes a series of 44 questions that are useful for analysts in defining the scope of a review, as well as for critically appraising studies of genetic tests (Table 1). The initial seven questions help to guide an understanding of the disorder, the setting, and the type of testing. A detailed description of the methods of the EGAPP Working Group is published elsewhere.1
It is important to have a clear definition of the clinical scenario and analytic framework when evaluating any test, including a predictive genetic test. Prior to performing a review, analysts should develop clearly defined key questions and understand the needs of decision makers and the context in which the tests are used. They should consider whether this is a test used for determining future risk of disease in asymptomatic individuals, establishing prognostic information that will influence treatment decisions, or predicting response to treatments (either effectiveness or harms)—or used for some other purpose. They should clarify the type of specimens used for the genetic test under evaluation (i.e., patient genome or tumor genome). The PICOTS typology (Patient population, Intervention, Comparator, Outcomes, Timing, Setting) should be clearly described as it will inform the development of the analytic framework and vice versa.
In constructing an analytic framework, it may be useful for analysts to consider preanalytic, analytic, and postanalytic factors particularly applicable to genetic tests (described later in this paper), as well as the key outcomes of interest. Analytic frameworks should incorporate the factors and outcomes of greatest interest to decision makers. Figure 1 illustrates a generic analytic framework for evaluating predictive genetic tests that can be modified as necessary for various situations.
In addition to effects on family members, psychological distress and possible stigmatization or discrimination are potential harms that may result from predictive genetic tests, particularly when results predict a high likelihood of disease and no proven preventive or ameliorative measures are available. For these potential harms, analysts should take into account whether the testing is for inherited or acquired genetic mutations, since this distinction influences the potential for harms. In addition, whether the condition related to the test is multifactorial or follows classic Mendelian inheritance will affect the potential for these harms.
Other important outcomes to consider when evaluating genetic tests include, but are not limited to, cost, quality of life, long-term morbidity, and indirect impact. Genetic tests may have an impact on decisions that are difficult to measure, yet very important, such as decisions regarding pregnancy.
Depending on the context, the impact of genetic testing on family members may be important, particularly in cases that involve testing for heritable conditions. One approach to including family members in the analytic framework is illustrated in Figure 2.
The Human Genome Epidemiology Network (HuGE Net) Web site can provide a helpful supplement to searches, as it includes many meta-analyses of genetic association studies as well as a source called the HuGE Navigator that can identify all types of available studies related to a genetic test.9
When assessing the gray literature, U.S. Food and Drug Administration (FDA)-approved test package inserts contain summaries of the analytic validity data. Package inserts are available on the FDA and manufacturer Web sites. Laboratory-developed tests do not require FDA clearance, and there is no requirement for publicly available data on analytic validity. When there are no published data on analytic validity of a genetic test, the external proficiency testing program carried out jointly by the American College of Medical Genetics (ACMG) and the College of American Pathologists (CAP) can be useful in establishing the degree of laboratory-to-laboratory variability, as well as some sense of reproducibility.10–12 Other potentially useful sources of unpublished data include conference publications from professional societies (e.g., the College of American Pathologists), the GeneTests Web site (www.genetests.org), the Association for Molecular Pathology Web site (www.amp.org), CDC programs (e.g., the Genetic Testing Reference Materials Coordination Program and the Newborn Screening Quality Assurance Program), and international proficiency testing programs.13
An AHRQ “horizon scan” found two databases—the LexisNexis® database (www.lexisnexis.com) and Cambridge Healthtech Institute (CHI) (www.healthtech.com/)—that had high utility in identifying genetic tests in development for clinical cancer care. A number of others had low-to-moderate utility, and some were not useful.14
There are a number of technical issues related to analytic validity that can influence the interpretation of a genetic test result, including preanalytic, analytic, and postanalytic factors.15,16 In general, preanalytic steps are those involved in obtaining, fixing or preserving, and storing samples prior to staining and analysis. Important analytic variables include the type of assay chosen and its reliability, types of samples, the specific analyte investigated, specific genotyping methods, timing of sample analysis, and complexity of performing the assay. Postanalytic variables relate to the complexity of interpreting the test result, variability from laboratory to laboratory, and quality control.15,16 Comparative effectiveness review teams should include or consult with molecular pathologists, geneticists, or others familiar with the issues related to the process of performing and reporting genetic tests to determine which of these technical issues are pertinent for a given review. Table 2 summarizes some of the preanalytic, analytic, and postanalytic questions that should be addressed.
For genetic testing of tumor specimens, it is important to understand that the tumor genome may be in a dynamic state, with mutations emerging over time (e.g., due to drug exposure or disruption of cellular repair). Tumor specimens will often contain normal cells from the patient as well as tumor cells. To accurately assess for somatic mutations using tumor specimens, particular strategies may be needed, such as enriching samples for tumor cells (e.g., by microscopic evaluation and dissection of tumor cells).
Some studies may utilize DNA-based assays whereas others may utilize functional assays with different sensitivities and specificities. Functional assays, in which a substrate or product of a metabolic process affected by a particular genetic polymorphism is measured, may have the advantage of showing potentially more important information than the presence of the genetic polymorphism itself. However, they may be affected by a number of factors and do not necessarily reflect the polymorphism alone. Unmeasured environmental factors, other genetic polymorphisms, and various disease states may influence the results of functional assays. In addition, functional assays that measure enzyme activity are taken at a single point in time. Depending on the enzyme and polymorphism being evaluated, the variation in enzyme activity over time should be considered in critical appraisal. Inconsistent results between studies using DNA-based molecular methods and those using phenotypic assays have been reported.16–18
For DNA-based tests, a variety of sample sources are available (e.g., blood, cheek swab, hair) that should hypothetically result in identical genotype results.16,19–23 However, DNA may be more difficult to obtain and purify from some tissues than from blood, particularly if the tissues have been fixed and embedded in paraffin rather than collected fresh (DNA extraction from formalin-fixed tissue is difficult, but sometimes possible).16 Some studies utilize different sources of DNA for cases and controls, introducing potential measurement bias from differences in the ease of technique and test accuracy. Extraction of DNA from tumors in oncology studies may raise additional issues that influence analytic validity, including the quantity of tissue, admixture of normal and cancerous tissue, amount of necrosis, timing of collection, and storage technique (e.g., fresh, frozen, paraffin, formalin).16
When evaluating DNA-based molecular tests, the complexity of the test method, laboratory-to-laboratory variability, and quality control should be assessed. A number of methods are available for genotyping single nucleotide polymorphisms that vary in complexity and potential for polymorphism misclassification.16,24–26 Considering laboratory reporting of internal controls and repetitive experiments can be useful in assessment of overall analytic validity. The method of interpreting test results may influence complexity as well. For example, some tests require visual inspection of electrophoresis gels. Inter-observer variability should be considered for such tests.16,27
In critical appraisal of any case-control study, it is important to determine whether cases and controls were selected from the same source population. In the case of genetic studies, the geographic location of the population does not suffice. Rather, having cases and controls matched for ethnicity/race or ancestry is important since the frequencies of DNA polymorphisms vary from population to population (i.e., population stratification). It has been noted that many case-control studies of gene-disease associations have selected controls from a population that does not represent the population from which the cases arose.16,17,28–30 In general, only nested case-control studies could have low enough potential for selection bias to provide reliable information.
For some scenarios, a number of clinical factors associated with risk assessment or susceptibility may already be well characterized. In such cases, comparative effectiveness reviews should determine the added value of using genetic testing along with known factors compared with using the known factors alone. For example, age, sex, smoking, hypertension, diabetes, and cholesterol are all well-established risk factors for cardiovascular disease. Risk stratification of individuals to determine cholesterol-lowering targets is based on these factors.31 Assessment of newly identified polymorphisms—such as those described on chromosome 9p2132—that may confer increased risk of cardiovascular disease and have potential implications for medical interventions should be evaluated in the context of these known risk factors. In this scenario, investigators should determine the added value of testing for polymorphisms of chromosome 9p21 in addition to known clinical risk factors.
Multiple polymorphisms may be associated with risk of disease, prognosis, or prediction of drug response. In such cases, the effect of multiple polymorphisms can be explored using a multiple regression model. Then, prospective studies would usually be needed to determine whether the model including the genetic tests has clinical utility. For example, VKORC1 and CYP2C9 genotypes have been associated with warfarin dose requirements in multiple regression models. In order to determine whether tests for VKORC1 and CYP2C9 have clinical utility, studies would need to compare the use of a prediction model that contains the genetic tests in combination with known clinical factors that affect warfarin dose (e.g., age, BMI) with the use of clinical factors alone.33–35
In population genetics, genotype frequencies at most loci are expected to follow a predictable distribution, known as Hardy–Weinberg equilibrium (HWE): for a biallelic locus with allele frequencies p and q, the expected genotype frequencies are p², 2pq, and q². Genetic association studies should generally report whether the genotype frequencies of the polymorphisms being evaluated are consistent with HWE. There are a number of reasons that distributions may deviate from HWE, including new mutations, selection, migration, genetic drift, and inbreeding.36 In addition, when numerous polymorphisms are tested for associations with diseases or outcomes, as in many genome-wide association studies, a proportion of them (about 5% at a significance level of 0.05) will deviate from HWE based on chance alone (related to multiple testing).37 Although it is not specific and possibly not sensitive, deviation from HWE may be a clue to bias and genotyping error.37 Analysts should consider whether studies have tested for and reported HWE. A more detailed discussion of this topic as it relates to genetic association studies has been published elsewhere.36,37
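As an illustration of what a reported HWE check involves, the sketch below applies a chi-square goodness-of-fit test (1 degree of freedom) to genotype counts at a single biallelic locus. The function name and the genotype counts are hypothetical. This is only one common approach; exact tests are often preferred when the expected counts are small, and the sketch does not implement them.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic locus.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; HWE cannot be tested")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control-group counts: a marked deficit of heterozygotes.
chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"chi-square = {chi2:.1f}, p = {p_value:.2g}")
```

A deficit of heterozygotes like the one in this made-up example is the kind of pattern that can point to genotyping error or population stratification, although, as noted above, the test is neither specific nor necessarily sensitive.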
When assessing internal validity of studies, it is important to assess whether sample size calculations appropriately accounted for the number of variant alleles and the prevalence of variants in the population of interest. This is particularly relevant for pharmacogenomic studies evaluating the functional relevance of genetic polymorphisms.38 Such studies often enroll an insufficient number of subjects to account for the number of variant alleles and the prevalence of variants in the population.38
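A rough feel for the problem can be had by computing how many subjects with each genotype a study should expect to enroll. The sketch below assumes Hardy–Weinberg proportions and a single biallelic variant. The function names and numbers are hypothetical, and the calculation is not a substitute for a formal power analysis.

```python
import math

def expected_genotype_counts(n_subjects, variant_allele_freq):
    """Expected genotype counts under Hardy-Weinberg proportions."""
    q = variant_allele_freq
    p = 1 - q
    return {
        "non_carriers": n_subjects * p * p,
        "heterozygotes": n_subjects * 2 * p * q,
        "homozygous_variant": n_subjects * q * q,
    }

def subjects_needed_for_homozygotes(min_homozygotes, variant_allele_freq):
    """Smallest enrollment expected to yield min_homozygotes homozygous-variant subjects."""
    if not 0 < variant_allele_freq <= 1:
        raise ValueError("variant allele frequency must be in (0, 1]")
    return math.ceil(min_homozygotes / variant_allele_freq ** 2)

# Hypothetical study: 200 subjects, variant allele frequency of 10%.
for genotype, count in expected_genotype_counts(200, 0.10).items():
    print(f"{genotype}: {count:.1f}")
```

With a rare variant, the expected number of homozygous-variant subjects can be very small even in a study of a few hundred people, which is why reviewers should check whether the sample size calculation considered genotype frequencies.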
Genetic test results are sometimes derived from analytically complex studies that have undergone a very large number of statistical tests. These may be in the form of genome-wide association studies searching for associations between a huge number of genetic polymorphisms and health conditions. Such association studies may launch further understanding of the importance of genetics in relation to a variety of health conditions but should generally be used to generate hypotheses rather than to test hypotheses or to confirm cause-effect relationships.16 Close scrutiny should be applied to ensure that the evidence for the association has been validated in multiple studies to minimize both potential confounding and potential publication bias issues. In addition, reviewers should note whether appropriate adjustments for multiple comparisons were used. Many recommend using a P value of less than 5 × 10⁻⁸ for the threshold of significance in large genome-wide studies.37,39,40 Other approaches include assessing the false positive report probability and controlling the false discovery rate.41–43
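The three approaches mentioned above can be sketched in a few lines of Python. The example shows a Bonferroni-style threshold (0.05 divided by roughly one million independent tests gives the commonly cited genome-wide threshold), the Benjamini–Hochberg step-up procedure for controlling the false discovery rate, and the false positive report probability as a function of significance level, power, and prior probability. Function names, P values, and the prior are illustrative assumptions, not values taken from the cited studies.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_passing_rank = rank
    # Reject every hypothesis ranked at or below the largest passing rank.
    return sorted(order[:largest_passing_rank])

def false_positive_report_probability(alpha, power, prior):
    """Probability that a 'significant' association is actually null."""
    false_positives = alpha * (1 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)

print(bonferroni_threshold(0.05, 1_000_000))
print(benjamini_hochberg([0.02, 0.03, 0.035]))
print(false_positive_report_probability(alpha=0.05, power=0.8, prior=0.001))
```

When the prior probability of a true association is low, as it is for most polymorphisms in a genome-wide scan, a nominal significance level of 0.05 leaves most "significant" findings likely to be false, which is the reason much stricter thresholds are recommended.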
When a genetic mutation associated with increased risk is present, evaluating potential causality can be difficult as many other factors may influence associations. These include environmental exposures, behaviors, and other genes. Many genetic variants identified that are thought to influence susceptibility to diseases are associated with low relative and absolute risk.16,44 Thus, exclusion of non-causal explanations for associations and consideration of potential confounders are central to critical appraisal of such associations. It may also be important to explore biologic plausibility (e.g., from in vitro studies) to help support or oppose theories of causation.16
Be cautious of publications that report prevalence estimates for genetic variants that have actually arisen from overlapping data sets.16 For example, genome-wide association studies or other large collaborative efforts, such as the International Warfarin Pharmacogenomics Consortium, may pool samples of patients that were previously included in other published studies.3 To the degree possible, investigators should identify overlapping data sets and avoid double-counting. It may be useful to organize evidence tables by study time period and geographic area to identify potential overlapping data sets.16
As mentioned under Principle 4, it is important to understand that a tumor genome may be in a dynamic state. In addition, tumor specimens will often contain normal cells from the patient. The characteristics of the specimen will influence the sensitivity and operating characteristics of the test. Tests with greater sensitivity may be required when specimens contain both normal cells and tumor cells.
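The effect of normal-cell admixture on required assay sensitivity can be approximated with simple arithmetic. The sketch below estimates the expected variant allele fraction for a clonal somatic mutation, under simplifying assumptions that are ours rather than the source's: the mutation is present in every tumor cell, copy number is fixed, and normal cells carry no mutant copies. The function names and the limit-of-detection value are hypothetical.

```python
def expected_variant_allele_fraction(tumor_purity, mutant_copies=1,
                                     tumor_copy_number=2, normal_copy_number=2):
    """Expected fraction of sequenced alleles carrying a clonal somatic mutation.

    tumor_purity: fraction of cells in the specimen that are tumor cells (0 to 1).
    """
    if not 0 <= tumor_purity <= 1:
        raise ValueError("tumor purity must be between 0 and 1")
    mutant_alleles = tumor_purity * mutant_copies
    total_alleles = (tumor_purity * tumor_copy_number
                     + (1 - tumor_purity) * normal_copy_number)
    return mutant_alleles / total_alleles

def is_detectable(tumor_purity, limit_of_detection):
    """True if the expected allele fraction meets the assay's limit of detection."""
    return expected_variant_allele_fraction(tumor_purity) >= limit_of_detection

# Hypothetical assay that needs at least 20% mutant alleles to call a variant.
for purity in (0.2, 0.4, 0.8):
    vaf = expected_variant_allele_fraction(purity)
    print(f"purity {purity:.0%}: expected fraction {vaf:.2f}, "
          f"detectable: {is_detectable(purity, 0.20)}")
```

Under these assumptions, a heterozygous mutation in a specimen that is 20% tumor cells would appear in only about 10% of alleles, so either enrichment for tumor cells or a more sensitive assay would be needed.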
Since the completion of the Human Genome Project, the HapMap Project, and related works, there have been a great number of publications describing the clinical validity of genetic test results (e.g., gene-disease associations), but far fewer studies of the clinical utility. A review of genetic testing for cytochrome P450 polymorphisms in adults with depression treated with selective serotonin reuptake inhibitors (SSRIs) developed an analytic framework and five corresponding key questions which, taken together, provide an example of a well-defined predictive genetic test scenario that explores a potential chain of evidence relating to intermediate outcomes (Figure 3).45 The authors found no prospective studies with clinical outcomes that used genotyping to guide treatment. They constructed a chain of questions to assess whether sufficient indirect evidence could answer the overarching question by evaluating the links between genotype and metabolism of SSRIs (phenotype), metabolism and SSRI efficacy, and metabolism and adverse drug reactions to SSRIs.
An EPC report on HER2 testing to manage patients with breast cancer and other solid tumors provides a detailed assessment of challenges in conducting a definitive evaluation of preanalytic, analytic, and postanalytic factors when there is substantial heterogeneity or lack of available information related to the methods of testing.46 The authors noted that it had been only very recently that many aspects of HER2 assays were standardized, and that the effects of widely varying testing methods could not be isolated. Thus, they approached this challenge by providing a narrative review for their first key question (What is the evidence on concordance and discrepancy rates for methods [e.g., FISH, IHC, etc.] used to analyze HER2 status in breast tumor tissue?).
Additional considerations arise when evaluating genetic test results used to determine susceptibility or risk in asymptomatic individuals. The utility of such tests may depend on the ability of patients and providers to report and identify certain clinical factors. For example, a review of genetic risk assessment and BRCA mutation testing underscores the importance of accurately determining family history.4,47 The analytic framework begins by classifying asymptomatic women into high, moderate, or average risk categories. This is a good example of incorporating a key preanalytic factor (family history) that has an important influence on analytic validity. Tests for BRCA mutations may be used to predict the risk for breast and ovarian cancer in high-risk women (i.e., those with a family history suggesting increased risk). However, because we do not know all of the genes that contribute to hereditary breast and ovarian cancer and because analytic methods to detect mutations in the known genes are not perfect, population-based testing for hereditary susceptibility to breast and ovarian cancer is currently not an appropriate strategy. Rather, family history-based testing is the paradigm that is recommended to guide the use of BRCA testing.4,47
Thus, family history is a genetic/genomics tool that is used to 1) identify people with possible inherited disease susceptibilities, 2) guide genetic testing strategies, 3) help interpret genetic test results, and 4) assess disease risk. The ability of providers to accurately determine a family history that confers increased risk is a key prerequisite to the utility of BRCA mutation and other predictive genetic testing. It is sometimes difficult for people to accurately recall the presence of a condition in their relatives. Sensitivity and specificity of self-reported family history are important in determining overall usefulness of predictive genetic testing.4
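For readers who want to see the calculation, the sketch below computes sensitivity and specificity of self-reported family history against a verified reference (for example, medical records or a registry). The function name and all counts are hypothetical and serve only to show the arithmetic.

```python
def sensitivity_specificity(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity and specificity from a 2x2 table of report versus verified status.

    true_pos:  relative affected, patient reports affected
    false_pos: relative unaffected, patient reports affected
    false_neg: relative affected, patient reports unaffected
    true_neg:  relative unaffected, patient reports unaffected
    """
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one affected and one unaffected relative")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical validation study of self-reported family history of a cancer.
sens, spec = sensitivity_specificity(true_pos=80, false_pos=20,
                                     false_neg=20, true_neg=180)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Low sensitivity means that women at increased risk would be misclassified as average risk and never offered testing, whereas low specificity would direct testing toward women unlikely to carry a mutation.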
Analysts should understand common challenges, and apply the principles for addressing those challenges, when conducting systematic reviews of genetic tests used as predictive indicators. Key points include:
We would like to thank Halle R. Amick (University of North Carolina, Cecil G. Sheps Center for Health Services Research) and Crystal M. Riley (Duke-NUS Graduate Medical School Singapore) for their assistance with preparation of this manuscript, insightful editing, and outstanding attention to detail. We deeply appreciate the considerable support, commitment, and contributions of Stephanie Chang, MD, MPH, the AHRQ Task Order Officer for this project and the Evidence-based Practice Center Director.
This project was funded under contract HHSA290200710056I #1 from the Agency for Healthcare Research and Quality (AHRQ), U.S. Department of Health and Human Services.
The expressed views are the authors’ and do not necessarily represent the Agency for Healthcare Research and Quality, the U.S. Department of Health and Human Services, or the Veterans Health Administration.