Methods of the EWG for reviewing the evidence share many elements of existing processes, such as those of the USPSTF,23 the AHRQ Evidence-based Practice Center Program,46 and the Centre for Evidence Based Medicine.47
These include the use of analytic frameworks with key questions to frame the evidence review; clear definitions of clinical and other outcomes of interest; explicit search strategies; use of hierarchies to characterize data sources and study designs; assessment of quality of individual studies and overall certainty of evidence; linkage of evidence to recommendations; and minimizing conflicts of interest throughout the process. Typically, however, the current evidence on genomic applications is limited to evaluating gene-disease associations, and is unlikely to include randomized controlled trials that evaluate test-based interventions and patient outcomes. Consequently, the EWG must rigorously assess the quality of observational studies, which may not be designed to address the questions posed.
In this new field, direct evidence to answer an overarching question about the effectiveness and value of testing is rarely available. Therefore, it is necessary to construct a chain of evidence, beginning with the technical performance of the test (analytic validity) and the strength of the association between a genotype and disorder of interest. The strength of this association determines the test’s ability to diagnose a disorder, assess susceptibility or risk, or provide information on prognosis or variation in drug response (clinical validity). The final link is the evidence that test results can change patient management decisions and improve net health outcomes (clinical utility).
To address some unique aspects of genetic test evaluation, the EWG has adopted several aspects of the ACCE model process, including formal assessment of analytic validity; use of unpublished literature for some evaluation components when published data are lacking or of low quality; consideration of ethical, legal, and social implications as integral to all components of evaluation; and use of questions from the ACCE analytic framework to organize collection of information.13
Important concepts that underlie the EGAPP process and add value include (1) providing a venue for multidisciplinary independent assessment of collected evidence; (2) conducting reviews that maintain a focus on medical outcomes that matter to patients, but also consider a range of specific family and societal outcomes when appropriate54; (3) developing and optimizing methods for assessing individual study quality, adequacy of evidence for each component of the analytic framework, and certainty of the overall body of evidence; (4) focusing on summarization and synthesis of the evidence and identification of gaps in knowledge; and (5) ultimately, providing a foundation for evidentiary standards that can guide policy decisions. Although evidentiary standards will necessarily vary depending on test application (e.g., for diagnosis or to guide therapy) and the clinical situation, the methods and approaches described in this report are generally applicable; further refinement is anticipated as experience is gained.
The analytic framework and key questions
After the selection and structuring of the topic to be reviewed, the EWG Methods Subcommittee drafts an analytic framework for the defined topic that explicitly illustrates the clinical scenario, the intermediate and health outcomes of interest, and the key questions to be addressed. provides generic examples of clinical scenarios. However, analytic frameworks for genetic tests differ based on clinical scenario, and must be customized for each topic. Figure 1 shows the example of an analytic framework used to develop the first EWG recommendation, Testing for Cytochrome P450 Polymorphisms in Adults with Nonpsychotic Depression Prior to Treatment with Selective Serotonin Reuptake Inhibitors (SSRIs); numbers in the figure refer to the key questions listed in the legend.55,56
Fig. 1 Analytic framework and key questions for evaluating one application of a genetic test in a specific clinical scenario: Testing for Cytochrome P450 Polymorphisms in Adults With Non-Psychotic Depression Treated With Selective Serotonin Reuptake Inhibitors
The first key question is an overarching question to determine whether there is direct evidence that using the test leads to clinically meaningful improvement in outcomes or is useful in medical or personal decision-making. In this case, EGAPP uses the USPSTF definition of direct evidence, “…a single body of evidence establishes the connection…” between the use of the genetic test (and possibly subsequent tests or interventions) and health outcomes.23
Thus, the overarching question addresses clinical utility, and specific measures of the outcomes of interest. For genetic tests, such direct evidence on outcomes is most commonly not available or of low quality, so a “chain of evidence” is constructed using a series of key questions. EGAPP follows the convention that the chain of evidence is indirect if, rather than answering the overarching question, two or more bodies of evidence (linkages in the analytic framework) are used to connect the use of the test with health outcomes.23,57
After the overarching question, the remaining key questions address the components of evaluation as links in a possible chain of evidence: analytic validity (technical test performance), clinical validity (the strength of association that determines the test’s ability to accurately and reliably identify or predict the disorder of interest), and clinical utility (balance of benefits and harms when the test is used to influence patient management). Determining whether a chain of indirect evidence can be applied to answer the overarching question requires consideration of the quality of individual studies, the adequacy of evidence for each link in the evidence chain, and the certainty of benefit based on the quantity (i.e., number and size) and quality (i.e., internal validity) of studies, the consistency and generalizability of results, and understanding of other factors or contextual issues that might influence the conclusions.23,57
The USPSTF has recently updated its methods and clarified its terminology.57
Because this approach is both thoughtful and directly applicable to the work of EGAPP, the EWG has adopted the terminology; an additional benefit will be to provide consistency for shared audiences.
Evidence collection and assessment
The review team considers the analytic framework, key questions, and any specific methodological approaches proposed by the EWG. As previously noted, the report will focus on clinical factors (e.g., natural history of disease, therapeutic alternatives) and outcomes (e.g., morbidity, mortality, quality of life), but the EWG may request that other familial, ethical, societal, or intermediate outcomes also be considered for a specific topic.54
The EWG may also request information on other relevant factors (e.g., impact on management decisions by patients and providers) and contextual issues (e.g., cost-effectiveness, current use, or feasibility of use).
Methods for individual evidence reviews will differ in small ways based on the reviewers (AHRQ EPC or other review team), the strategy for review (e.g., comprehensive, targeted/rapid), and the topic. These differences will be transparent because all evidence reviews describe methods and follow the same general steps: framing the specific questions for review; gathering technical experts and reviewers; identifying data sources; searching for evidence using explicit strategies and study inclusion/exclusion criteria; specifying criteria for assessing quality of studies; abstracting data into evidence tables; synthesizing findings; and identifying gaps and making suggestions for future research.
All draft evidence reports are distributed to the TEP and other selected experts for technical review. After consideration of reviewer comments, EPCs provide a final report that is approved and released by AHRQ and posted on the AHRQ website; the EPC may subsequently publish a summary of the evidence. Non-EPC review teams submit final reports to CDC and the EWG, along with the comments from the technical reviewers and how they were addressed; the EWG approves the final report. Final evidence reports (or links to AHRQ reports) are posted on the www.egappreviews.org web site. When possible, a manuscript summarizing the evidence report is prepared to submit for publication along with the clinical practice recommendations developed by the EWG.56
Grading quality of individual studies
Table 3 provides the hierarchies of data sources for analytic validity, and of study designs for clinical validity and utility, designated for all as Level 1 (highest) to Level 4. Table 4 provides a checklist of questions for assessing the quality of individual studies for each evaluation component based on the published literature.5,13,23,48,58,59
Different reviewers may provide a quality rating for individual studies that is based on specified criteria, or derived using a more quantitative algorithm. The EWG ranks individual studies as Good, Fair, or Marginal based on critical appraisal using the criteria in Tables 3 and 4. The designation Marginal (rather than Poor) acknowledges that some studies may not have been “poor” in overall design or conduct, but may not have been designed to address the specific key question in the evidence review.
Table 3 Hierarchies of data sources and study designs for the components of evaluation
Table 4 Criteria for assessing quality of individual studies (internal validity)55
Components of evaluation
Analytic validity
EGAPP defines the analytic validity of a genetic test as its ability to accurately and reliably measure the genotype (or analyte) of interest in the clinical laboratory, and in specimens representative of the population of interest.13
Analytic validity includes analytic sensitivity (detection rate), analytic specificity (1-false positive rate), reliability (e.g., repeatability of test results), and assay robustness (e.g., resistance to small changes in preanalytic or analytic variables).13
As illustrated by the “ACCE wheel” figure (http://www.cdc.gov/genomics/gtesting/ACCE.htm), these elements of analytic validity are themselves integral elements in the assessment of clinical validity.13,42
Many evidence-based processes assume that evaluating clinical validity will address any analytic problems, and do not formally consider analytic validity.23
The EWG has elected to pursue formal evaluation of analytic validity because genetic and genomic technologies are complex and rapidly evolving, and validation data are limited. New tests may not have been validated in multiple sites, for all populations of interest, or under routine clinical laboratory conditions over time. More importantly, review of analytic validity can also determine whether clinical validity can be improved by addressing test performance.
Test kits or reagents that have been cleared or approved by the FDA may provide information on analytic validity that is publicly available for review (e.g., FDA submission summaries).60
However, most currently available genetic tests are offered as laboratory developed tests not currently reviewed by the FDA, and information from other sources must be sought and evaluated. Different genetic tests may use a similar methodology, and information on the analytic validity of a common technology, as applied to genes not related to the review, may be informative. However, general information about the technology cannot be used as a substitute for specific information about the test under review. Based on experience to date, access to specific expertise in clinical laboratory genetics and test development is important for effective review of analytic validity.
Table 3 (column 1) provides a quality ranking of data sources that are used to obtain unbiased and reliable information about analytic validity. The best information (quality Level 1) comes from collaborative studies using a single large, carefully selected panel of well-characterized samples (both cases and controls) that are blindly tested and reported, with the results independently analyzed. At this time, such studies are largely hypothetical, but an example that comes close is the Genetic Testing Quality Control Materials Program at CDC.61
As part of this program, samples precharacterized for specific genetic variants can be accessed from Coriell Cell Repositories (Camden, NJ) by other laboratories to perform in-house validation studies.62
Data from proficiency testing schemes (Levels 1 or 2) can provide some information about all three phases of analytic validity (i.e., analytic, pre- and postanalytic), as well as interlaboratory and intermethod variability. ACCE questions 8 through 17 are helpful in ensuring that all aspects of analytic validity have been addressed.42
Table 4 (column 1) lists additional criteria for assessing the quality of individual studies on analytic validity. Assessment of the overall quality of evidence for analytic validity includes consideration of the quality of studies, the quantity of data (e.g., number and size of studies, genes/alleles tested), and the consistency and generalizability of the evidence (also see Table 5, column 1). The consistency of findings can be assessed formally (e.g., by testing for homogeneity), or by less formal methods (e.g., providing a central estimate and range of values) when sufficient data are lacking. One or more internally valid studies do not necessarily provide sufficient information to conclude that analytic validity has been established for the test. Supporting the use of a test in routine clinical practice requires data on analytic validity that are generalizable to use in diverse “real world” settings.
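The less formal approach to consistency mentioned above, a central estimate with a range of values, can be sketched in a few lines of Python. This is an illustration only and not an EGAPP-specified procedure: the function name `summarize_estimates`, the use of a pooled proportion as the central estimate, and the study counts are all assumptions introduced here.

```python
def summarize_estimates(studies):
    """Summarize per-study proportions (e.g., analytic sensitivity).

    `studies` is a list of (detected, tested) counts, one pair per study.
    Returns the pooled proportion plus the lowest and highest per-study value.
    """
    if not studies:
        raise ValueError("at least one study is required")
    for detected, tested in studies:
        if tested <= 0 or not 0 <= detected <= tested:
            raise ValueError("each study needs 0 <= detected <= tested and tested > 0")

    rates = [detected / tested for detected, tested in studies]
    # Pooled estimate: total detected over total tested (weights studies by size).
    pooled = sum(d for d, _ in studies) / sum(n for _, n in studies)
    return {"pooled": pooled, "min": min(rates), "max": max(rates)}


# Hypothetical counts from three validation studies (not real data).
example_studies = [(98, 100), (45, 50), (190, 200)]
print(summarize_estimates(example_studies))
```

A wide range around the pooled value is a signal that results may not be consistent across laboratories or methods, which is when a formal test for homogeneity becomes more informative.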
Table 5 Grading the quality of evidence for the individual components of the chain of evidence (key questions)57
Clinical validity
EGAPP defines the clinical validity of a genetic test as its ability to accurately and reliably predict the clinically defined disorder or phenotype of interest. Clinical validity encompasses clinical sensitivity and specificity (integrating analytic validity), and predictive values of positive and negative tests that take into account the disorder prevalence (the proportion of individuals in the selected setting who have, or will develop, the phenotype/clinical disorder of interest). Clinical validity may also be affected by reduced penetrance (i.e., the proportion of individuals with a disease-related genotype or mutation who develop disease), variable expressivity (i.e., variable severity of disease among individuals with the same genotype), and other genetic (e.g., variability in allele/genotype frequencies or gene-disease association in racial/ethnic subpopulations) or environmental factors. ACCE questions 18 through 25 are helpful in organizing information on clinical validity.42
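Because predictive values depend on disorder prevalence as well as on clinical sensitivity and specificity, the same test can perform very differently in different settings. The following minimal Python sketch applies the standard relationship (Bayes' rule); the function name `predictive_values` and the input values are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test used in a setting with the given prevalence.

    All three inputs are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fractions of the tested population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    # A predictive value is undefined if no one can receive that result.
    ppv = true_pos / positives if positives > 0 else float("nan")
    npv = true_neg / negatives if negatives > 0 else float("nan")
    return ppv, npv


# Hypothetical test characteristics evaluated at two prevalences.
for prevalence in (0.01, 0.20):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these assumed inputs, the positive predictive value is much lower in the low-prevalence setting, which is why estimates derived from selected case/control samples cannot be carried over directly to a general population.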
Table 3 (column 2) provides a hierarchy of study designs for assessing quality of individual studies.13,23,44,46–48,50,53,63
Published checklists for reporting studies on clinical validity are reasonably consistent, and Table 4 (column 2) provides additional criteria adopted for grading the quality of studies (e.g., execution, minimizing bias).5,13,23,44,46–51,53,58,59,63
As with analytic validity, the important characteristics defining overall quality of evidence on clinical validity include the number and quality of studies, the representativeness of the study population(s) compared with the population(s) to be tested, and the consistency and generalizability of the findings (Table 5). The quantity of data includes the number of studies, and the number of total subjects in the studies. The overall consistency of clinical validity estimates can be determined by formal methods such as meta-analysis. Minimally, estimates of clinical sensitivity and specificity should include confidence intervals.63
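As one way to attach a confidence interval to an estimate of clinical sensitivity or specificity, the Python sketch below computes a Wilson score interval for a proportion. The choice of the Wilson method, the function name `wilson_interval`, and the example counts are assumptions made for illustration; the text above does not prescribe a particular interval method.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage).

    For sensitivity, `successes` is the number of affected individuals who test
    positive and `total` is the number of affected individuals tested.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")

    p_hat = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p_hat + z2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / total
                               + z2 / (4.0 * total * total)) / denom
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical pilot study: 45 of 50 affected individuals test positive.
low, high = wilson_interval(45, 50)
print(f"sensitivity = {45 / 50:.2f}, 95% CI roughly {low:.3f} to {high:.3f}")
```

The width of the interval makes the limitation of small data sets explicit: a point estimate from a small sample can look strong while remaining compatible with much weaker performance.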
In pilot studies, initial estimates of clinical validity may be derived from small data sets focused on individuals known to have, versus not have, a disorder, or from case/control studies that may not represent the wide range or frequency of results that will be found in the general population. Although important to establish proof of concept, such studies are insufficient evidence for clinical application; additional data are needed from the entire range of the intended clinical population to reliably quantify clinical validity before introduction.
Clinical utility
EGAPP defines the clinical utility of a genetic test as the evidence of improved measurable clinical outcomes, and its usefulness and added value to patient management decision-making compared with current management without genetic testing. If a test has utility, it means that the results (positive or negative) provide information that is of value to the person, or sometimes to the individual’s family or community, in making decisions about effective treatment or preventive strategies. Clinical utility encompasses effectiveness (evidence of utility in real clinical settings), and the net benefit (the balance of benefits and harms). Frequently, it also involves assessment of efficacy (evidence of utility in controlled settings like a clinical trial).
Tables 3 and 4 (column 3) provide the hierarchy of study designs for clinical utility, and other criteria for grading the internal validity of studies (e.g., execution, minimizing bias) adopted from other published approaches.13,23,46–48,57
Paralleling the assessment of analytic and clinical validity, the three important quality characteristics for clinical utility are quality of individual studies and the overall body of evidence, the quantity of relevant data, and the consistency and generalizability of the findings (Table 5). Another criterion to be considered is whether implementation of testing in different settings, such as clinician ordered versus direct-to-consumer, could lead to variability in health outcomes.
Grading the quality of evidence for the individual components in the chain of evidence (key questions)
Table 5 provides criteria for assessing the quality of the body of evidence for the individual components of evaluation, analytic validity (column 2), clinical validity (column 3), and clinical utility (column 4).23,44,47,48,64
The adequacy of the information to answer the key questions related to each evaluation component is classified as Convincing, Adequate, or Inadequate. This information is critical to assess the “strength of linkages” in the chain of evidence.57
The intent of this approach is to minimize the risk of being wrong in the conclusions derived from the evidence. When the quality of evidence is Convincing, the observed estimate or effect is likely to be real, rather than explained by flawed study methodology; when Adequate, the observed results may be influenced by such flaws. When the quality of evidence is Inadequate, the observed results are more likely to be the result of flaws in study methodology rather than an accurate assessment; availability of only Marginal quality studies always results in Inadequate quality.
Based on the evidence available, the overall level of certainty of net health benefit is categorized as High, Moderate, or Low. High certainty is associated with consistent and generalizable results from well-designed and conducted studies, making it unlikely that estimates and conclusions will change based on future studies. When the level of certainty is Moderate, some data are available, but limitations in data quantity, quality, consistency, or generalizability reduce confidence in the results, and, as more information becomes available, the estimate or effect may change enough to alter the conclusion. Low certainty is associated with insufficient or poor quality data, results that are not consistent or generalizable, or lack of information on important outcomes of interest; as a result, conclusions are likely to change based on future studies.
Translating evidence into recommendations
Based on the evidence report, the EWG’s assessment of the magnitude of net benefit and the certainty of evidence, and consideration of other clinical and contextual issues, the EWG formulates clinical practice recommendations (Table 6). Although the information will have value to other stakeholders, the primary intended audience for the content and format of the recommendation statement is clinicians. The information is intended to provide transparent, authoritative advice, inform targeted research agendas, and underscore the increasing need for translational research that supports the appropriate transition of genomic discoveries to tests, and then to specific clinical applications that will improve health or add other value in clinical practice.
Table 6 Recommendations based on certainty of evidence, magnitude of net benefit, and contextual issues
Key factors considered in the development of a recommendation are the relative importance of the outcomes selected for review, the benefits (e.g., improved clinical outcome, reduction of risk) that result from the use of the test and subsequent actions or interventions (or if not available, maximum potential benefits), the harms (e.g., adverse clinical outcome, increase in risk or burden) that result from the use of the test and subsequent actions/interventions (or if not available, largest potential harms), and the efficacy and effectiveness of the test and follow-up compared with currently used interventions (or doing nothing). Simple decision models or outcomes tables may be used to assess the magnitudes of benefits and harms, and estimate the net effect. Consistent with the terminology used by the USPSTF, the magnitude of net benefit (benefit minus harm) may be classified as Substantial, Moderate, Small, or Zero.
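A simple outcomes table of the kind mentioned above can be represented as a list of expected events per 1,000 people tested, with net benefit computed as benefit minus harm. The Python sketch below is illustrative only: the `Outcome` class, the `net_benefit` function, and all numbers are hypothetical, and it treats every event as equally important, which a real assessment would not do without weighing the relative importance of outcomes. It deliberately does not map the result to Substantial, Moderate, Small, or Zero, because the text gives no numeric thresholds for those categories.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    description: str
    events_per_1000: float  # expected events per 1,000 people tested
    is_benefit: bool        # True for events averted, False for harms caused


def net_benefit(outcomes):
    """Return total benefit minus total harm, in events per 1,000 tested."""
    benefit = sum(o.events_per_1000 for o in outcomes if o.is_benefit)
    harm = sum(o.events_per_1000 for o in outcomes if not o.is_benefit)
    return benefit - harm


# Hypothetical outcomes table (not real data).
outcomes_table = [
    Outcome("serious adverse drug reactions averted", 12.0, True),
    Outcome("effective treatment withheld after false-positive result", 3.0, False),
    Outcome("additional follow-up procedures with complications", 2.0, False),
]
print(f"net benefit: {net_benefit(outcomes_table):.1f} events per 1,000 tested")
```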
Considering contextual factors
Contextual issues include clinical factors (e.g., severity of disorder, therapeutic alternatives), availability of diagnostic alternatives, current availability and use of the test, economics (e.g., cost, cost-effectiveness, and opportunity costs), and other ethical and psychosocial considerations (e.g., insurability, family factors, acceptability, equity/fairness). Cost-effectiveness analysis is especially important when a recommendation for testing is made. Contextual issues that are not included in preparing EGAPP recommendation statements are values or preferences, budget constraints, and precedent. Societal perspectives on whether use of the test in the proposed clinical scenario is ethical are explored before commissioning an evidence review.
The ACCE analytic framework considers as part of clinical utility the assessment of a number of additional elements related to the integration of testing into routine practice (e.g., adequate facilities/resources to support testing and appropriate follow-up, plan for monitoring the test in practice, availability of validated educational materials for providers and consumers).13
The EWG considers that most of these elements constitute information that should not be included in the consideration of clinical utility, but may be considered as contextual factors in developing recommendation statements and in translating recommendations into clinical practice.
Standard EGAPP language for recommendation statements uses the terms: Recommend For, Recommend Against, or Insufficient Evidence (Table 6). Because the types of emerging genomic tests addressed by EGAPP are more likely to have findings of Insufficient Evidence, three additional qualifiers may be added. Based on the existing evidence and consideration of contextual issues and modeling, Insufficient Evidence could be considered “Neutral” (not possible to predict with current evidence), “Discouraging” (discouraged until specific gaps in knowledge are filled or not likely to meet evidentiary standards even with further study), or “Encouraging” (likely to meet evidentiary standards with further studies or reasonable to use in limited situations based on existing evidence while additional evidence is gathered).
As a hypothetical example of how the various components of the review are brought together to reach a conclusion, consider the model of a pharmacogenetic test proposed for screening individuals who are entering treatment with a specific drug. The intended use is to identify individuals who are at risk for a serious adverse reaction to the drug. The analytic validity and clinical validity of the test are established and adequately high. However, the specific adverse outcomes of interest are often clinically diagnosed and treated as part of routine management, and clinical studies have not been conducted to show the incremental benefit of the test in improving patient outcomes. Because there is no evidence to support improvement in health outcome or other benefit of using the test (e.g., more effective, more acceptable to patients, or less costly), the EWG would consider the recommendation to be Insufficient Evidence (neutral). In a second scenario, a genetic test is proposed for testing patients with a specific disorder to provide information on prognosis and treatment. Clinical trials have provided good evidence for benefit to a subset of patients based on the test results, but more studies are needed to determine the validity and utility of testing more generally. The EWG is likely to consider the recommendation to be Insufficient Evidence (encouraging).
Products and review
Draft evidence reports are distributed by the EPC or other contractor for expert peer-review. Objectives for peer review of draft evidence reports are to ensure accuracy, completeness, clarity, and organization of the document; assess modeling, if present, for parameters, assumptions and clinical relevance; and to identify scientific or contextual issues that need to be addressed or clarified in the final evidence report. In general, the selection of reviewers is based on expertise, with consideration given to potential conflicts of interest.
When a final evidence report is received by the EWG, a writing team begins development of the recommendation statement. Technical comments are solicited from test developers on the evidence report’s accuracy and completeness, and are considered by the writing team. The recommendation statement is intended to summarize current knowledge on the validity and utility of an intended use of a genetic test (what we know and do not know), consider contextual issues related to implementation, provide guidance on appropriate use, list key gaps in knowledge, and suggest a research agenda. Following acceptance by the full EWG, the draft EGAPP recommendation statement is distributed for comment to peer reviewers selected from organizations expected to be impacted by the recommendation, the EGAPP Stakeholders Group, and other key target audiences (e.g., health care payers, consumer organizations). The objectives of this peer review process are to ensure the accuracy and completeness of the evidence summarized in the recommendation statement and the transparency of the linkage to the evidence report, improve the clarity and organization of information, solicit feedback from different perspectives, identify contextual issues that have not been addressed, and avoid unintended consequences. Final drafts of recommendation statements are approved by the EWG and submitted for publication in Genetics in Medicine. Once published, the journal provides open access to these documents, and the link is also posted on the www.egappreviews.org web site. Announcements of recommendation statements are distributed by email to a large number of stakeholders and the media. The newly established EGAPP Stakeholders Group will advise on and facilitate dissemination of evidence reports and recommendation statements.
This document describes methods developed by the EWG for establishing a systematic, evidence-based assessment process that is specifically focused on genetic tests and other applications of genomic technology. The methods aim for transparency, public accountability, and minimization of conflicts of interest, and provide a framework to guide all aspects of genetic test assessment, beginning with topic selection and concluding with recommendations and dissemination. Key objectives are to optimize existing evidence review methods to address the challenges presented by complex and rapidly emerging genomic applications, and to establish a clear linkage between the scientific evidence, the conclusions/recommendations, and the information that is subsequently disseminated.
In combining elements from other internationally recognized assessment schemes in its methods, the EWG seeks to maintain continuity in approach and nomenclature, avoid confusion in communication, and capture existing expertise and experience. The panel’s methods differ from others in some respects, however, by calling for formal assessment of analytic validity (in addition to clinical validity and clinical utility) in its evidence reviews, and including (on a selective basis) nontraditional sources of information such as gray literature, unpublished data, and review articles that address relevant technical or contextual issues. The methods and process of the EWG remain a work in progress and will continue to evolve as knowledge is gained from each evidence review and recommendation statement.
Future challenges include modifying current methods to achieve more rapid, less expensive, and targeted evidence reviews for test applications with limited literature, without sacrificing the quality of the answers needed to inform practice decisions and research agendas. A more systematic horizon scanning process is being developed to identify high priority topics more effectively, in partnership with the EGAPP Stakeholders Group and other stakeholders. Additional partnerships will need to be created to develop evidentiary standards and build additional evidence review capacity, nationally. Finally, the identification of specific gaps in knowledge in the evidence offers the opportunity to raise awareness among researchers, funding entities, and review panels, and thereby focus future translation research agendas.