The clinical utility is uncertain for many cancer genomic applications. Comparative effectiveness research (CER) can provide evidence to clarify this uncertainty.
To identify approaches to help stakeholders make evidence-based decisions, and to describe potential challenges and opportunities using CER to produce evidence-based guidance.
We identified general CER approaches for genomic applications through literature review, the authors’ experiences, and lessons learned from a recent, seven-site CER initiative in cancer genomic medicine. Case studies illustrate the use of CER approaches.
Evidence generation and synthesis approaches include comparative observational and randomized trials, patient-reported outcomes, decision modeling, and economic analysis. We identified significant challenges to conducting CER in cancer genomics: the rapid pace of innovation, the lack of regulation, the limited evidence for clinical utility, and the belief that genomic tests could have personal utility without having clinical utility. Opportunities to capitalize on CER methods in cancer genomics include improving the conduct of evidence synthesis, engaging stakeholders, increasing the number of comparative studies, and developing approaches to inform clinical guidelines and research prioritization.
CER offers a variety of methodological approaches to address stakeholders’ needs. Innovative approaches are needed to ensure an effective translation of genomic discoveries.
Evidence of clinical validity (the association between genotype and clinical phenotype) is now available for an increasing number of genomic applications. On the other hand, clinical utility (the improvement in patient outcomes and the balance of risks and benefits) is largely unknown for most cancer genomic applications. Implementing tests with uncertain clinical utility potentially wastes health care resources through variable or unnecessary use of those tests. In the worst case, individuals are harmed when they or their health care providers act on test results in ways that lead to ineffective or potentially harmful treatments, anxiety, or discrimination. Further, clinical utility may be quite specific, as when limited to subgroups with certain genotypes.1 To maximize the clinical relevance of existing and as-yet unknown genomic applications, it is crucial to ensure that clinically valid tests also have high clinical utility before they become widely used.
The clinical utility may be unclear for numerous reasons, including the relative lack of regulatory requirements for test manufacturers to demonstrate clinical utility.2 Furthermore, the research community has not aggressively prioritized either the translation of new discoveries into practical use or the generation of evidence on these applications.3 The field is also changing so quickly that evidence becomes rapidly outdated. In some cases, there may be little incentive for private sector investment in molecular diagnostics because of a lack of value-based reimbursement. Finally, existing paradigms for generating and evaluating evidence may be too slow, too costly, too unwieldy, or too unrepresentative to provide useful evidence to decision makers in a timely manner.4-7
Comparative effectiveness research (CER) is intended to create evidence for decision making, and to find out “what works” in health care. Although many definitions of CER have been proposed,8-12 we use the Institute of Medicine’s (IOM) definition:10 “CER is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.” Some also use the term patient-centered outcomes research to refer to this type of research, although this concept will ultimately carry its own definition (pcori.org).
Concerns over the growing costs of health care13-15 have made the use of CER a practical necessity, which has been enabled by $1.1 billion in funding from the American Recovery and Reinvestment Act (ARRA) and by the advent of the Patient-Centered Outcomes Research Institute (PCORI) in the recent Patient Protection and Affordable Care Act (2010). Other developments that make CER timely are a new genetic test registry (www.ncbi.nlm.nih.gov/gtr/) at the NIH, congressional hearings in July 2010 stimulated by concerns over direct-to-consumer genetic testing, and possible changes at the FDA to consider genetic tests as medical devices, which would require regulatory approval before marketing.
It is critical that all stakeholders (including consumers, insurers, policymakers, and clinicians) possess tools to assess the clinical utility of genomic applications. We describe CER approaches to answer questions about cancer genomic applications, and the potential challenges and opportunities associated with each. We provide case studies of genomic applications to illustrate the types of questions decision makers are facing, and describe potential CER study designs and methods that can be used to address them.
We searched PubMed for recent literature on CER, and searched the citations of these articles to identify additional publications relevant to CER. We also considered additional articles that were not identified through this search but were known to the authors. We selected the following methodology categories for consideration: evidence synthesis, prospective comparative clinical trials, observational research, health economics and decision modeling, and stakeholder engagement. We developed descriptions of these approaches as applied to CER based on literature reviews and the authors’ experience. We then identified a series of case studies of breast cancer genomic applications to clarify CER questions and possible methods to address them. We selected breast cancer because of the public health relevance of the disease, and because of the plethora of genomic applications currently in clinical practice. We used the ACCE framework (Analytic validity, Clinical validity, Clinical utility, and Ethical, legal and social implications)16 as a starting point to identify and organize the information we would abstract on the case studies. Finally, we identified particular challenges for using these CER approaches to conduct genome-based research.
Our results are presented in three sections: 1) identification of the key questions for CER applications in cancer genomics, 2) illustration of the key questions using examples from breast cancer genomic applications, and 3) general methodological approaches to addressing the key questions.
Is there a significant association between the results of the genomic application and clinical phenotype? (clinical validity)
Does the genomic application provide correct information? (analytic validity)
Does the genomic application provide clinically significant information? (clinical utility)
Does the genomic application lead to improved patient outcomes compared to the alternative? (comparison or added clinical value)
Genomic applications can span the entire range of disease, from risk identification to diagnosis and patient management. Table 1 shows examples of both conventional and genomic applications in the context of breast cancer for each test category. We provide summary tables of example key questions for breast cancer genomic applications that address risk assessment [Table 2] and treatment decisions [Table 3].
Clinical validity is the association between the predictor (e.g., genotype, profile, or family history status) and clinical phenotype. Predictors are identified by investigating targeted pathways, by candidate gene analysis, or through agnostic genome-wide study designs. Methodological problems from multiple testing, heterogeneity, the “winner’s curse” (the likelihood that the first report of a significant test will have a larger effect size than later replication studies), small sample size, and other concerns make interpretation challenging.18-20 Further, the attributable risk may be small because of low frequency or low penetrance, or the variant may only be linked to the functional variant. For example, initial studies reported an association between CYP2D6 variants and the risk of disease recurrence in women taking tamoxifen [Table 3].21 A systematic evidence review, however, found inconsistent evidence.22 Preliminary results from recent retrospective analyses of large randomized controlled trials (RCTs) including about 5,000 women23,24 found no association between CYP2D6 variants and breast cancer recurrence.
Analytic validity refers to characteristics of the test including reproducibility (i.e., will the same test performed on the same sample produce the same result?), the lower limit of detection (smallest quantity of the target that can be reliably detected), and analytic specificity (ability to measure the target and only the target). A proficiency testing program (exchange of quality control material for analysis and comparison across laboratories) may be the best approach to address this concern. For example, when HER2 testing [Table 3] was first used in breast cancer clinical trials, it is estimated that up to 20% of test results may have been incorrect. Laboratories with lower volume testing were the most likely to report incorrect findings.25,26 A proficiency testing program has since been implemented for HER2.27
Clinical utility concerns whether the information provided by the genomic application is actionable, and how the risks and benefits of the available actions balance. BRCA1/2 testing [Table 2] is one example. Mutation carriers are at increased risk of developing breast and ovarian cancer and can receive more effective breast cancer screening by choice of screening modality or interval, can undergo surgeries to reduce risk by 85-100%, or can select chemoprevention. High-risk women in families with known mutations who undergo testing and are found not to carry deleterious BRCA1/2 mutations can receive significant psychosocial benefit and avoid these interventions. On the other hand, the clinical utility of gene expression profiles is less clear.28 A key area of uncertainty is how women and their physicians will make treatment decisions based on test results in the intermediate risk category. Two prospective RCTs, TailoRx and RxPonder, are underway to evaluate how risk profile scores affect patient management, treatment decisions, and subsequent outcomes.29,30
Added clinical value17 asks whether the application provides better clinical, patient, or economic outcomes than the alternative, which could be another intervention or usual care. A critical factor is how to define and measure ‘better’, which could include measures of predictive accuracy, quality of life, survival, or other outcomes, including testing costs, acceptability, or feasibility. Recently, a genetic risk prediction model for breast cancer was published including 10 well-validated single nucleotide polymorphisms (SNPs) [Table 2].31 The predictive power of this genetic model is only slightly better (about 4%) than the widely used Gail model,32 which uses non-genetic factors to predict risk. Because both models explain about 60% of risk, and because the Gail model can be used without the expense of genetic testing, the added clinical value of the risk prediction model based on SNP profiles is low.
The key questions and methodological challenges described above, coupled with the need for CER to inform a diverse group of stakeholders, will require a range of innovative strategies, including both evidence synthesis and evidence generation [Table 4].
Evidence synthesis begins with identifying topics through processes such as horizon scanning,33 which searches published literature and grey literature databases (e.g., meeting abstracts, commercial websites, newsletters, or business news) for emerging genomic applications. Horizon scanning may also examine existing curated databases of published literature such as the HuGE Navigator, the GAPP Knowledge Base (GAPPKB), or the Pharmacogenomics Knowledge Base (PharmGKB). Grey literature sources identify emerging genomic applications because of the lag in reporting on these topics in peer-reviewed published literature; these may be supplemented by a query process from users as an early indicator of burgeoning clinical interest. Once new topics are identified, rapid topic briefs, or short reviews, are used to assess the feasibility of a full systematic review.
Full systematic reviews are often identified through a public nomination process and then commissioned through an existing body such as the U.S. Preventive Services Task Force, EGAPP Working Group, or the AHRQ Effective Healthcare (EHC) Program. The scope of the review is defined by the analytic framework and key questions, and the reviewers conduct a broad but systematic search to identify evidence. They develop inclusion and validity criteria for the evidence, and abstract needed data, which is then synthesized and summarized in a narrative. Quantitative approaches such as meta-analysis may provide summary estimates of critical measures across studies. While full systematic reviews are comprehensive, they may not be timely, which is a critical issue in summarizing evidence in genomics.
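As a minimal sketch of how a meta-analysis arrives at a summary estimate, the Python example below pools per-study effect estimates with inverse-variance (fixed-effect) weights. The function name and the input numbers are hypothetical and chosen only for illustration; they are not drawn from any study cited here, and a real review would also assess heterogeneity and might prefer a random-effects model.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of per-study effect estimates.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log hazard ratios and standard errors (illustrative only).
log_hazard_ratios = [0.10, -0.05, 0.20]
standard_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(log_hazard_ratios, standard_errors)
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"pooled log HR = {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The pooled standard error is smaller than that of any single study, which is the sense in which synthesis can sharpen an estimate that individual studies leave uncertain.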
Explanatory RCTs are used to evaluate the efficacy of a medical intervention. They are often viewed as the ideal approach to protect against bias. However, this study design also has limitations.34,35 Explanatory RCTs are typically restricted to selected patients, but real-world populations can differ markedly in age, race, comorbid conditions, concomitant medication use, and environmental factors. The generally small sample size of RCTs may under-represent some patient groups, a particular concern when evaluating genomic-based subgroups. Randomization requires a prospective design, and so RCTs tend to focus on questions of short-term efficacy and safety using intermediate (surrogate) endpoints. Finally, because RCT protocols are often far removed from routine practice, they may not accurately predict real-world effectiveness.
Innovative strategies in the design of clinical trials seek to overcome these limitations. Pragmatic clinical trials 36,37 address the issue of relevance by assessing the effectiveness of the intervention in routine practice by using wide patient inclusion criteria, allowing variation in the treatment protocol, and assessing outcomes relevant to everyday life. However, these studies typically require much larger sample sizes to compensate for heterogeneity in the patient population and the treatment protocol, and longer time frames to assess patient-relevant outcomes.
To fund and implement studies with larger sample sizes, collaborations between researchers, health care systems, and payers will be critical. A policy framework for conducting such collaborations is coverage with evidence development (CED). CED is a conditional reimbursement decision by a payer, with an explicit linkage between payment and data collection to reduce uncertainty about the intervention.38,39 The Centers for Medicare and Medicaid Services (CMS) recently issued a CED policy for warfarin pharmacogenomic testing, in which CMS will pay for testing if the patient is enrolled in an RCT designed to measure bleeding events.40
Cluster randomized trials are another alternative experimental design in which units such as communities, medical clinics or hospitals, or families are randomized to intervention arms rather than individuals. This design is often used when the intervention is aimed at changing the behavior of the group or the behavior of a provider, or changing the organization of services. This design can also be used to reduce contamination (e.g., ‘spill-over’ effects of a mass educational campaign), or to improve the feasibility of a study. Cluster randomized trials require more sophisticated analytic approaches and larger sample sizes because of lack of independence among individual observations.41,42 However, this study design may still be cost-efficient.43 Cluster randomized trials have been used to assess the impact of decision support tools implemented at the provider level, particularly involving genetic risk assessment based on family history.44-46
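The sample size penalty from non-independence within clusters is commonly summarized by the design effect, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intracluster correlation coefficient. The Python sketch below applies that standard formula under the simplifying assumption of equal cluster sizes; the function names and input values are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_needed_per_arm(n_individual, cluster_size, icc):
    """Clusters per arm needed to match an individually randomized trial.

    n_individual is the per-arm sample size that an individually
    randomized design would require for the same power.
    """
    inflated_n = n_individual * design_effect(cluster_size, icc)
    return math.ceil(inflated_n / cluster_size)


# Hypothetical inputs: 200 patients per arm, 20 patients per clinic, ICC 0.05.
print(design_effect(20, 0.05))                 # about 1.95
print(clusters_needed_per_arm(200, 20, 0.05))  # 20 clinics per arm
```

With these illustrative inputs, even a modest intracluster correlation nearly doubles the number of participants required, from 200 to about 390 per arm.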
Bayesian or adaptive trial designs can accelerate the pace of evidence generation by incorporating information from prior cases to alter the study midway, based on interim results. An adaptive design incorporates genomic profiles into the trial design by changing the patient randomization process to treatment arms as the trial progresses based on the accumulated data for each profile.47 Despite potential advantages, these trials have not gained widespread acceptance because of nonstandard methods and resistance among FDA regulators.
One example of an adaptive design is the I-SPY 2 project.48 This is a phase II RCT in the neoadjuvant setting for women with locally advanced breast cancer. Patients are randomized to treatment arms based on their biomarker profile. Initially, patients with a given biomarker profile have an equal chance of being randomized to each treatment arm. Over time, the randomization ratio (i.e., the vector of probabilities that a patient will be randomized to each treatment arm) for each biomarker profile is adjusted depending on the experience of previously randomized patients with that profile. Thus, future patients are more likely to be randomized to treatment arms in which patients with similar biomarker profiles achieved a better response.
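The general idea of updating a randomization ratio per biomarker profile can be sketched in a few lines of Python. This is an illustrative toy, not the algorithm used in I-SPY 2: it assumes a binary response, a uniform Beta(1, 1) prior for every profile and arm, and allocation probabilities proportional to the posterior mean response rate. The class name, arm labels, and profile label are all hypothetical.

```python
import random
from collections import defaultdict


class AdaptiveRandomizer:
    """Toy response-adaptive randomization, tracked separately per profile."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        # (profile, arm) -> [responders + 1, non-responders + 1],
        # i.e. the parameters of a Beta posterior starting from Beta(1, 1).
        self.counts = defaultdict(lambda: [1, 1])
        self.rng = random.Random(seed)

    def ratio(self, profile):
        """Vector of allocation probabilities, one per arm, for a profile."""
        means = []
        for arm in self.arms:
            a, b = self.counts[(profile, arm)]
            means.append(a / (a + b))  # posterior mean response rate
        total = sum(means)
        return [m / total for m in means]

    def assign(self, profile):
        """Randomize a new patient using the current ratio for the profile."""
        return self.rng.choices(self.arms, weights=self.ratio(profile))[0]

    def record(self, profile, arm, responded):
        """Update the posterior with one observed outcome."""
        self.counts[(profile, arm)][0 if responded else 1] += 1


trial = AdaptiveRandomizer(["arm_A", "arm_B"], seed=0)
print(trial.ratio("profile_1"))  # equal allocation before any data

for _ in range(3):
    trial.record("profile_1", "arm_A", responded=True)
    trial.record("profile_1", "arm_B", responded=False)

print(trial.ratio("profile_1"))  # allocation now favors arm_A
print(trial.assign("profile_1"))
```

Because outcomes are stored by profile, accumulating responses in one biomarker group shifts the ratio for that group only; patients with other profiles continue to be randomized according to their own group's experience.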
Observational study designs are a valuable and complementary approach to RCTs.34,35,49 These designs are especially useful when it would be unethical or infeasible to conduct an RCT. For example, Habel and colleagues (2006)50 conducted a retrospective case-control study to evaluate the association between long-term outcomes (the risk of breast cancer death) and the Oncotype DX Recurrence Score. Previous studies based on RCTs could not evaluate this outcome and used shorter-term outcomes instead, including rates of distant recurrence, as the primary measures.51,52 The primary limitation with observational study designs is the possibility of confounding bias due to unexplained differences between exposure groups, which are not controlled for through randomization. One option is to use risk adjustment approaches, such as propensity scores or instrumental variables. However, unlike randomization, these approaches cannot control for unmeasured or imperfectly measured covariates, so residual confounding may still be present. Observational designs are less subject to bias when there is no relationship between treatment assignment and treatment response and can contribute important information about unanticipated, real-world impacts that complements RCTs.
Use of large, administrative health care databases to access routinely collected data may offer significant advantages for an observational design. The large population size enables the study of infrequent events. Also, such databases are representative of routine care, making it possible to study real-world effectiveness and utilization patterns. The data are available at relatively low cost without long delays compared with data gathering for a new prospectively recruited study. Electronic data from integrated health care systems with a defined population and electronic medical records (EMRs) allow broad consideration of the patient’s health status. Over time, EMRs and associated databases will make it feasible to consider long-term outcomes. One limitation is a lack of clinically derived genomic information or the ability to easily access it.53 However, many systems now have biorepositories linked to EMRs, which can facilitate retrospective study designs.
Evidence-based bodies have generally relied on RCTs to inform their guideline development when weighing relative benefits and harms. Decision modeling provides a framework to formally incorporate indirect and direct evidence from various sources, to evaluate likely outcomes, and to quantify uncertainty. The advantages of this approach are a structured, transparent framework for assessing the available evidence, and, critically, for quantifying the uncertainty of evidence and its potential impact on patient outcomes. Challenges include timeliness of implementation, development of models acceptable to stakeholders, problems with assumptions and model transparency, and the development of formal guidelines or recommendations based on modeling analyses. Recent work indicates that stakeholders such as clinicians, health care payers, and guidelines groups are open to using such approaches in genomics if the process is transparent and there is not an overreliance on the model results to drive recommendations.54
Another CER approach is value-of-research (VOR) analysis, also called value-of-information (VOI), which is used to make decisions about selecting technologies for additional research, and for designing those trials optimally. The concept behind VOR is that additional research reduces our uncertainty about which intervention to use in clinical practice.55 Reducing uncertainty is valuable because it reduces the chances that the less optimal strategy is selected, and studies that provide ‘negative’ results are still valuable. Impacts on patients’ morbidity and mortality are assessed, as well as health care costs. These approaches are just beginning to be applied to research prioritization decisions in health care, and must be shown to be feasible as well as useful before widespread implementation. The VOR paradigm may be particularly useful in genomics because the pace of innovation leads to the need to prioritize investment in expensive comparative studies.56
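One common VOI quantity is the expected value of perfect information (EVPI): the expected net benefit of choosing the best strategy once uncertainty is resolved, minus the expected net benefit of the best strategy chosen under current uncertainty. The Python sketch below computes EVPI from simulated net-benefit draws. The strategy names and numbers are hypothetical; in practice the draws would come from a probabilistic decision model.

```python
def evpi(net_benefit_draws):
    """Expected value of perfect information from simulated draws.

    net_benefit_draws: list of dicts mapping strategy -> net benefit,
    one dict per simulated draw of the uncertain model parameters.
    """
    strategies = list(net_benefit_draws[0].keys())
    n = len(net_benefit_draws)

    # Decide now: pick the strategy with the highest expected net benefit.
    expected = {
        s: sum(draw[s] for draw in net_benefit_draws) / n for s in strategies
    }
    value_current_information = max(expected.values())

    # Decide after uncertainty is resolved: pick the best strategy per draw.
    value_perfect_information = (
        sum(max(draw.values()) for draw in net_benefit_draws) / n
    )
    return value_perfect_information - value_current_information


# Hypothetical per-patient net benefits (illustrative only).
draws = [
    {"test": 100, "no_test": 80},
    {"test": 60, "no_test": 90},
    {"test": 110, "no_test": 85},
    {"test": 70, "no_test": 95},
]
print(evpi(draws))  # 11.25
```

In this toy example the strategy preferred on average ("no_test") is the wrong choice in half of the draws, which is why resolving the uncertainty carries positive value. An EVPI near zero would suggest that further research is unlikely to change the decision.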
Cost-effectiveness analysis (CEA) is the standard approach to formally assess the incremental value of health care technologies.57 These analyses can incorporate a variety of outcomes including clinical events, life-expectancy, quality-adjusted life expectancy, and health care costs. Applying CEA to genomics can be challenging. First, the general lack of comparative effectiveness data makes evaluation of comparative value problematic, and uncertainty must be carefully assessed. Second, the value patients and clinicians place on knowing genetic information (the ‘value of knowing’) is difficult to measure and to incorporate in policy decisions.58,59 Contingent valuation (willingness-to-pay) approaches have been used;60 more recently, discrete choice experiments to assess patient preferences offer significant promise.61
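The incremental value that CEA reports is usually expressed as an incremental cost-effectiveness ratio (ICER): the difference in cost divided by the difference in effect between two strategies. The Python sketch below uses hypothetical costs and quality-adjusted life-years (QALYs), with a hypothetical function name; it does not handle dominance (a strategy that is both cheaper and more effective), where a ratio is not meaningful.

```python
def icer(cost_new, effect_new, cost_comparator, effect_comparator):
    """Incremental cost per unit of effect gained (e.g., cost per QALY)."""
    delta_cost = cost_new - cost_comparator
    delta_effect = effect_new - effect_comparator
    if delta_effect == 0:
        raise ValueError("Strategies have equal effect; ICER is undefined.")
    return delta_cost / delta_effect


# Hypothetical: test-guided care costs 12,000 and yields 10.50 QALYs;
# usual care costs 10,000 and yields 10.25 QALYs.
print(icer(12_000, 10.50, 10_000, 10.25))  # 8000.0 per QALY gained
```

Because the denominator is often a small difference between two uncertain estimates, the ratio can be unstable, which is one reason the uncertainty noted above must be assessed carefully.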
Given CER’s explicit purpose to produce useful information for decision making, there has been increasing recognition of the importance of including stakeholders such as patients, clinicians, payers, and policymakers in CER activities. The IOM recommended specifically that this work “should fully involve consumers, patients, and caregivers in …strategic planning, priority setting, research proposal development, peer review, and dissemination”.10 The rationale is that such involvement will lead to a focus on questions of most relevance to end-users.62 Stakeholder involvement should increase the chances that study designs will reflect the specific questions of decision makers, and the greater relevance of the research questions will also facilitate use of results in decision making. Recent work by Deverka and colleagues is one example of an approach to involve stakeholders in assessing the current state of evidence.
While the need for stakeholder engagement is widely recognized, the published literature on this topic is limited, and there are few formal evaluations of these methods.63 Some qualitative synthesis has identified several recurring themes, including the importance of developing trust and shared understanding through sustained interaction and devoting adequate time and resources to training and preparation.64 The need for valid methods for engaging patients, consumers and clinicians has been identified as a critical CER methods research priority.65
Developing sufficient evidence on the clinical utility of cancer genomic applications is complex. The rapid pace of innovation in genomics means that studies must be extremely efficient if the evidence is to remain timely and relevant. This can limit the potential endpoints to short-term outcomes, or require retrospective designs to enable sufficient time for events to accumulate. There is a general lack of evidence for clinical utility, but there is also a need to clarify the meaning of clinical utility. For example, the concept of personal utility, or the value of knowing the information, is clearly relevant for some decision makers and settings (e.g., direct-to-consumer marketing), but may not be relevant in a clinical context.60 The metrics for measuring personal utility are not well established.58,66 It is also essential to identify the relevant comparator for CER, and to present data to enable appropriate comparisons.
To resolve questions on the clinical utility of genomic applications, a more comprehensive approach is needed. Very few genomic applications have sufficient evidence for widespread recommendation and use in clinical care. Research is needed that considers more outcome measures, in settings that are relevant to more real-world clinical decisions. All stakeholders have a role in facilitating the generation of evidence. For example, health systems are needed to provide data and facilitate pragmatic trials, providers are needed to use genomic tests in the context of evidence generation, and test developers are needed to make tests available for collaborative study. A clear approach to developing priorities for CER is also needed to ensure that limited resources are used to resolve the most compelling questions. Such approaches should engage stakeholders to ensure the study of pressing topics in ‘real-world’ environments, should test approaches for rapid evidence synthesis, and should quantitatively assess the value of prioritized research, considering the health and well-being of patients and the decision-making needs of other stakeholders.
Second, it may be necessary to reform the evidentiary framework to define evidence standards for clinical utility.6 This task will require dialogue and interaction between evidence appraisers and end users to develop consensus and to define acceptable alternatives to the current hierarchies of evidence. That is, it will be necessary to recognize that an RCT is not desirable or feasible in every circumstance, and to decide when (not if) to use an observational study design and the extent to which evidence of underlying biological mechanisms contributes to the evidentiary framework.67 Beyond study designs, an evidentiary framework needs to cogently articulate the minimal evidence necessary before clinical application is warranted, taking into consideration issues around the type of genomic application and its clinical context.
Third, strategies that are rapid, timely, and efficient are needed given the fast pace of discovery in genomic-based approaches. Existing methods are limited,68 and innovative methods are needed to make CER successful and relevant to decision making.69,70 New strategies will involve transformation of the research infrastructure to “learning systems” that allow continual addition to the evidence base. This approach will achieve greater efficiency through efforts such as establishing bio-repositories or registries, linking electronic medical record data or administrative databases to genomic information and creating quality-assured clinical data repositories, or improving standardized coding schemes for genomic applications.
Finally, any reforms of the evidentiary framework should uphold rigorous standards on the statistical validity of the research.71 Although some study designs have a risk of greater uncertainty, we can make strategic choices about when such increased uncertainty is acceptable. We should improve the integrity and conduct of all study designs by using guidelines such as those provided in Strengthening the Reporting of Observational Studies in Epidemiology (STROBE), CONsolidated Standards of Reporting Trials Statement (CONSORT), STrengthening the REporting of Genetic Associations (STREGA), and Genetic RIsk Prediction Studies (GRIPS). Also, we can describe how threats to validity are assessed in grading evidence, or require pre-registry of the analysis plan for observational studies, as is currently done for RCTs, to reduce biases (including selective outcome reporting) or errors, such as from multiple testing.
The risk of maintaining the status quo in cancer genomic medicine is high. Informed decision making through the development and application of comparative effectiveness research could accelerate the implementation of valuable genomic applications, while avoiding harmful applications that can persist in clinical care, leading to waste or patient harm.
This manuscript was supported in part by cooperative agreements funded by the American Recovery and Reinvestment Act (ARRA) from the National Cancer Institute (NCI) including the CERGEN study UC2 CA148471 (Goddard, Whitlock, Feigelson), the CANCERGEN study RC2 CA148570-01 (Ramsey, Veenstra), RC2CA148041-01 (Lyman), Building a Genome Enabled Electronic Medical Record UC2CA150911 (Knaus), and a cooperative agreement funded by the Centers for Disease Control and Prevention 5U18-GD000005-02 (Veenstra).
David Veenstra reports serving as a consultant for Medco, Novartis Molecular Diagnostics, and Genentech, and is supported by the following genomics-related research grants: P50HG003374, RC2CA148570, and UO1GM092676 from the National Institutes of Health and U18GD000005 from the Centers for Disease Control and Prevention.
The authors have no other conflicts of interest.