The use of biomarkers to “personalize” cancer treatment (identifying discrete genes, proteins, or other indicators that can differentiate one type of cancer from another and enable the use of highly tailored therapies) offers tremendous potential for improved outcomes and lower treatment costs. However, the rapid development of cancer biomarker, or genomic, tests, combined with a paucity of evidence to support their effectiveness, presents a challenge for patients, clinicians, and other stakeholders. In this article, we propose that comparative effectiveness research be used to strengthen what is now a haphazard process for developing and marketing cancer biomarker tests. We suggest novel funding approaches and a systematic process for moving from regulatory approval to the generation of evidence that meets the needs of stakeholders and, ultimately, patients.
Because of extraordinary technological advances in genomics and proteomics—the study of the genomes of organisms and the study of protein structures and functions, respectively—clinicians in cancer practice will soon have access to tens if not hundreds of biomarker tests, also known as genomic tests. Biomarkers are measurable variations between people in their genetic material, proteins, or other biological molecules. Clinicians can use biomarker test results that correlate with the presence of disease to make a diagnosis. Biomarkers can also be used to assess the risk of future disease, the aggressiveness of a patient’s disease over time, and the likelihood that a patient will respond to a particular treatment.
Although genomic tests offer the promise of effective and efficient cancer care tailored to the genetic profile of the patient or the type of cancer, many are being introduced into clinical practice without sufficient evidence about their benefits, risks, and impact on health care costs, compared to standard care. David Ransohoff and Muin J. Khoury have argued that because personal genomic information can cause significant harms as well as benefits (for example, when a “positive” test results in patients starting a therapy that is not proven to help but has known side effects), it is no different from other kinds of clinical information and should be held to the same standards of evidence.1
Comparative effectiveness research will be more important at the intersection of personalized medicine and cancer than anywhere else because of the high risks and costs of poor-quality test information about this group of diseases. We present an integrated approach to designing and implementing comparative clinical trials of personalized medicine technologies for cancer, such as the use of genomic tests. This process has been developed through a multidisciplinary collaboration between the Fred Hutchinson Cancer Research Center, the Center for Medical Technology Policy,2 the University of Washington, and SWOG, a clinical trial cooperative.3
Key features of the approach are: (1) identifying high-priority personalized medicine technologies by reaching a consensus with multiple stakeholders, such as consumer groups and clinicians; (2) designing trials that meet the needs of multiple stakeholders; (3) conducting trials through the use of clinical trials groups, described below; and (4) developing and implementing funding mechanisms shared by private and public insurers, such as Medicare’s policy of coverage with evidence development, which reimburses certain procedures if the patient’s data become part of a registry or are used in a clinical trial.
There are many potential benefits of using a process driven by multiple stakeholders to implement comparative effectiveness studies within a cooperative group setting such as clinical trials groups. These groups, which provide a standing infrastructure to conduct large-scale clinical trials, have the capacity to generate the high-quality evidence needed for several levels of decision making. Representatives of expert groups such as the National Comprehensive Cancer Network (NCCN) or the American Society of Clinical Oncology (ASCO) and insurers participating in a trial’s design phase can help shape studies so they are better able to inform practice guidelines and coverage policies. Finally, pooling research funds from several sources, such as the National Institutes of Health and insurers, enables the health system to evaluate a greater number of personalized medicine tests.
Many personalized medicine applications in cancer treatment are at various stages of investigation, ranging from research to early clinical use. Although these technologies may improve care, providers and other stakeholders charged with assessing their usefulness in clinical practice have been hampered by the lack of information about these tests when they enter the market.
The Centers for Disease Control and Prevention’s Evaluation of Genomic Applications in Practice and Prevention group4 has conducted evidence reviews for genomic tests now on the market for colorectal, breast, and ovarian cancer. The group found that only half of the tests were consistently associated with clinical outcomes of interest such as disease recurrence or death, a concept known as clinical validity. For roughly another quarter of the tests, there was reasonable evidence that they could measurably improve patient outcomes such as survival or quality of life, or reduce the cost of care, compared to standard care without the test, a concept known as clinical utility.
Why do genomic tests often enter clinical practice with little evidence to support their clinical validity, utility, or cost-effectiveness? One important reason is the process by which the tests move from development through evaluation and regulatory review to the market. The search for an accurate biomarker often leads researchers to develop a test that identifies a particular risk or disease marker while overlooking the way that the risk or problem is currently assessed in practice. In addition, the process fails to consider what clinical actions may follow from the information provided by the test.
One example is the addition of genomic tests to standard clinical monitoring (including taking medical histories and performing regular physical examinations and mammograms) to assess breast cancer patients for the recurrence of disease after the initial treatment. The three biomarkers used in these tests, carcinoembryonic antigen (CEA) and two cancer antigens (CA 15-3 and CA 27.29), are all tumor antigens: proteins whose levels tend to rise as the number of cancer cells in the body increases. Studies suggest that monitoring these biomarkers may detect a recurrence of breast cancer five to six months earlier than standard monitoring. That earlier detection might lead to timelier treatment, which could improve the patient’s quality of life and chance of survival.
Unfortunately, these tests are neither highly sensitive nor highly specific for breast cancer recurrence. That is, they are not very likely to give a positive result when the cancer has truly recurred, nor very likely to give a negative result when it has not. Results of repeated tests in the same patient can fluctuate substantially, making changing levels difficult to interpret. Because of their poor predictive ability, these tests could produce large numbers of false positive results, resulting in unwarranted anxiety for patients and unnecessary additional evaluations such as computed tomography (CT) scans and biopsies. Moreover, the potential benefit of diagnosing a metastatic tumor five to six months earlier is unclear, because no cure for breast cancer at that stage exists today.
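To see why weak sensitivity and specificity translate into many false positives, the short sketch below computes expected test outcomes for a hypothetical monitoring cohort. The inputs (1,000 patients, a 5 percent chance of recurrence during the monitoring interval, 60 percent sensitivity, 80 percent specificity) are illustrative assumptions, not published estimates for CEA, CA 15-3, or CA 27.29, and the function and class names are hypothetical.

```python
"""Expected results of one round of biomarker testing in a hypothetical cohort.

All inputs are illustrative assumptions, not published performance figures.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringOutcomes:
    true_positives: float
    false_positives: float
    false_negatives: float
    true_negatives: float

    @property
    def positive_predictive_value(self) -> float:
        # Share of positive results that reflect a true recurrence.
        return self.true_positives / (self.true_positives + self.false_positives)


def expected_outcomes(n_patients: int, prevalence: float,
                      sensitivity: float, specificity: float) -> MonitoringOutcomes:
    """Split a cohort into expected true/false positives and negatives."""
    recurred = n_patients * prevalence        # patients whose cancer has returned
    not_recurred = n_patients - recurred
    true_pos = recurred * sensitivity         # recurrences the test flags
    true_neg = not_recurred * specificity     # non-recurrences the test clears
    return MonitoringOutcomes(
        true_positives=true_pos,
        false_positives=not_recurred - true_neg,
        false_negatives=recurred - true_pos,
        true_negatives=true_neg,
    )


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    outcomes = expected_outcomes(1000, prevalence=0.05,
                                 sensitivity=0.60, specificity=0.80)
    print(f"false positives: {outcomes.false_positives:.0f}")
    print(f"positive predictive value: {outcomes.positive_predictive_value:.2f}")
```

With these made-up numbers, about 190 of the roughly 220 positive results are false alarms, so a positive result would reflect a true recurrence only about 14 percent of the time. Because most patients in a monitoring cohort have not recurred, even a moderate false positive rate dominates the positive results.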
Because of these concerns—and the fact that no comparative effectiveness trials have been conducted specifically to evaluate whether tumor biomarker tests improve patients’ survival rates or quality of life, or the cost-effectiveness of care—clinical guidelines do not recommend using these genomic tests to monitor patients for a recurrence of breast cancer after primary therapy, which is typically surgery.5,6 Despite these guidelines, our examination of records for 2007 from Medicare and the Surveillance, Epidemiology, and End Results program of the National Cancer Institute7 found that these genomic tests are ordered for more than one in ten women with breast cancer. Furthermore, the Centers for Medicare and Medicaid Services currently reimburses providers for performing the tests.
The regulatory flexibility that manufacturers of genomic tests have in bringing their products to market is one reason for the lack of evidence about the tests.8
One pathway is through the development of a commercial test kit that manufacturers distribute to multiple laboratories. Makers of these kits must obtain Food and Drug Administration (FDA) approval prior to marketing them. The FDA requires evidence from human trials demonstrating that the tests are safe and effective when used for their intended purpose. For the FDA, effectiveness means that “it can fairly and responsibly be concluded by qualified experts that the device [test] will have the effect it purports or is represented to have under the conditions of use prescribed, recommended, or suggested in the labeling of the device.”9 The FDA’s concern is the possibility of false results, either negative or positive, rather than whether a test adds value compared to standard care (its clinical utility).
The second pathway to market is through the development of a test for a single laboratory. Such tests are commonly called laboratory-developed tests or, less formally, “home brew” tests. Traditionally these were low-risk diagnostic tests for well-defined conditions, specialized tests used to diagnose rare diseases, or tests designed to serve local patient populations. Examples of some laboratory-developed tests include microscopic examinations (such as Pap smears and manual cell counts), erythrocyte sedimentation rates, microbiology cultures and susceptibility tests, examination of tissue sections (including staining protocols), and blood cross-matching procedures. Tests for rare genetic disorders are typically designed for local populations.
With a few exceptions, laboratory-developed tests have not been subject to FDA review,10 although this policy is currently being reassessed.11 Manufacturers of genomic tests have realized that choosing this route to market avoids the expense, time, and uncertainty that FDA review of commercial test kits entails.12 As a result, the great majority of new genomic tests for cancer are brought to market this way.
Without criticizing the potential value of laboratory-developed tests, it is important to note that their manufacturers do not have to meet any federally specified evidentiary standard and can market the tests to clinicians and patients as they see fit. Jeffrey Shuren, director of the FDA’s Center for Devices and Radiological Health, testified that the FDA has observed problems with some laboratory-developed tests in recent years, including faulty data analysis, exaggerated clinical claims, fraudulent data, poor clinical study design, and unacceptable clinical performance.13
Most US laboratories that offer these tests seek certification under the Clinical Laboratory Improvement Amendments of 1988, whether or not the tests have FDA approval.10 The Centers for Medicare and Medicaid Services regulates this program, setting quality control procedures to ensure that the tests laboratories use are accurate and reliable. Separate and distinct from FDA approval, certification is not designed to evaluate the impact of tests on clinical practice or patient outcomes.
Public and private insurers have historically paid for laboratory tests certified by this program, with reimbursement based on Current Procedural Terminology (CPT) codes overseen by the American Medical Association.14 These codes are assigned to medical, surgical, and diagnostic services provided to patients and are used to determine the reimbursement that a provider receives from insurers.
Gene expression profiling tests are genomic tests that measure the activity of genes in cancer cells in order to assess the patient’s risk of cancer recurrence. They are designed to provide better results than traditional risk assessments that rely on the physical characteristics of a tumor, such as its size and pathology.
Two gene expression profiling tests for breast cancer, the Oncotype DX Breast Cancer Assay and MammaPrint, are marketed to clinicians and patients throughout the United States. MammaPrint received FDA approval on February 6, 2007, for use in determining the likelihood that breast cancer would return within five to ten years after a woman’s initial cancer diagnosis and treatment.15 Oncotype DX reached the market in 2004 as a laboratory-developed test and thus did not receive FDA approval. On February 27, 2006, California’s Part B Medicare administrator agreed to cover Oncotype DX, manufactured by Genomic Health.16 Agendia, the manufacturer of MammaPrint, received coverage approval from the same administrator on November 26, 2009.17 Both manufacturers maintain labs in California.
The MammaPrint and Oncotype DX stories illustrate the uncertainties that manufacturers face about whether insurers will reimburse providers for using their tests, as well as the different possible approaches to development and marketing. Because each test costs approximately $4,000, price was probably an important reason why major insurers waited for several years after the tests reached the market to cover them. Agendia’s decision to seek FDA approval may have been part of a strategy to accelerate MammaPrint’s acceptance by Medicare, commercial insurers, and possibly patients.18 However, the lengthy FDA approval process helped delay MammaPrint’s entry into the market relative to Oncotype DX.
The flexibility that manufacturers of genomic tests have in pursuing or avoiding the FDA approval process contributes to the heterogeneity of evidence supporting the tests. This can be an important obstacle to evaluating the comparative effectiveness of different tests, even those with essentially identical uses. For example, Oncotype DX has evidence supporting its ability both to determine the likelihood of cancer recurrence (its prognostic validity) and to determine the potential benefit of chemotherapy in reducing the risk of recurrence (its predictive validity). MammaPrint has evidence only of prognostic validity.19 However, both tests are now covered by several regional Medicare administrators and most large private insurers.
As we have shown, the current development process for genomic tests is haphazard. We propose an innovative framework to replace it, which would meet the needs of multiple stakeholders, would be systematic and evidence-driven, and would encourage use of the right test at the right point in a patient’s care. The framework includes systematic literature reviews and comparative effectiveness research.
Systematic literature reviews are an important first step in understanding the evidence for promising genomic tests in cancer. As an example, we used such a review, together with input from experts on the specific cancers, to identify recent published and unpublished studies evaluating genomic tests for the five most common types of cancer: bladder, breast, colorectal, lung, and prostate. We identified thirty-nine studies in which the clinical validity of the test was explicitly evaluated (Exhibit 1). Only eight of the studies (21 percent) evaluated clinical validity using prospective correlation; that is, these studies were designed to evaluate the relationship between the result of a genomic test and the recurrence of cancer. None of the studies was a prospective evaluation of a test’s clinical utility.
If most genomic tests lack high-quality supporting evidence to start with, how should we select tests that warrant our spending time and money on more-informative prospective trials? We believe that the challenge of choosing tests for further study could be better addressed through early and sustained partnerships among experts and stakeholders, including test manufacturers, researchers, patients, consumers, clinicians, insurers, and members of groups that prepare guidelines for clinical practice and patients.
After conducting a systematic literature review like the one described above to identify tests that merit additional research, the next step involves sharing this information with stakeholders to find a “short list” of tests that have promising evidence and appeal strongly to the greatest number of groups. In the case of the tests in Exhibit 1, that short list might include those for common cancers or for conditions where the test provides uniquely valuable information that cannot be obtained by other means.
Each stakeholder group will have different perspectives and insights. Patient representatives will have perspectives about patients’ values and preferences. Clinicians will understand the clinical context and the level of need for alternatives to the status quo. Test manufacturers and researchers will know what is currently being done to evaluate emerging genomic tests. Each of the groups might have insights into what groups of patients and points in the progress of cancer are particularly worth including in a new study.
Comparative effectiveness research has been defined as “the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care.”20 Exhibit 2 compares features of comparative effectiveness research and traditional research on genomic tests.
Some existing genomic tests have already been evaluated using study designs that could be considered comparative effectiveness research. An example is testing for Lynch syndrome. Caused by mutations in any of several genes, Lynch syndrome is a condition that predisposes patients to a high risk of developing certain cancers at an early age, primarily colorectal and uterine cancers. Although it was historically diagnosed by examining a patient’s family history, several types of genomic tests are now available to screen patients with newly diagnosed colorectal cancer for this condition. If a cancer patient is found to have Lynch syndrome, family members can be alerted, tested, and, if found to have the relevant mutations, given regular screening by colonoscopy and uterine swab.
A comparative effectiveness framework such as we propose would ask several questions about the merits of using these tests to identify patients with Lynch syndrome, compared to the alternative of using family history alone to determine the individual’s likelihood of having Lynch syndrome.
The first question would be about the clinical validity of the tests: Do they have acceptable sensitivity and specificity for identifying both people who have Lynch syndrome and those who do not? In other words, are they unlikely to yield either false negatives or false positives? Summarizing available studies, the Centers for Disease Control and Prevention’s Evaluation of Genomic Applications in Practice and Prevention group4 categorized the evidence supporting the tests as “adequate,” meaning that the studies are of reasonable quality and the tests have good sensitivity and specificity for the condition.
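For readers less familiar with these terms, the standard definitions can be written in terms of a two-by-two table that cross-classifies test results against whether a person truly has Lynch syndrome. The symbols are ours, not the source's: TP and FN are true positives and false negatives among people with the syndrome, and TN and FP are true negatives and false positives among people without it.

$$
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}
$$

A test with high sensitivity rarely misses carriers (few false negatives); a test with high specificity rarely flags non-carriers (few false positives).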
The second question relates to the clinical utility of testing: Is there sufficient evidence to recommend the use of genomic tests instead of family history to identify people who should receive screening? The Evaluation of Genomic Applications in Practice and Prevention group reviewed available studies that addressed this clinical problem and again found “adequate” evidence in favor of the tests.21
For example, a study by Bonis and colleagues shows that first-degree relatives (parents, children, and siblings) of people with Lynch syndrome mutations are willing to undergo counseling (52 percent), gene mutation testing (95 percent), and colonoscopy every one to two years (80 percent).22 Finally, a large Finnish trial demonstrated that screening people with the mutations through regular colonoscopies reduced the incidence of colorectal cancer by 62 percent, compared to usual care.23
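As a worked reading of the Finnish result, assuming the 62 percent figure is a relative reduction in incidence (the symbols are ours; I denotes colorectal cancer incidence in each group):

$$
1 - \frac{I_{\text{screened}}}{I_{\text{usual care}}} = 0.62 \quad\Longrightarrow\quad I_{\text{screened}} = 0.38\, I_{\text{usual care}}
$$

Under that reading, people who underwent regular colonoscopy developed colorectal cancer at a little over a third of the rate seen with usual care.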
The evidence supporting genomic testing of people without cancer for the Lynch syndrome mutations stems from retrospective and prospective cohort studies. Higher-quality studies, specifically controlled trials that are both prospective and randomized, may be necessary for tests that are directly tied to decisions about using chemotherapy. Much hinges on the outcome: A negative test result could trigger a decision to forgo chemotherapy in cases where the current standard of care is to receive it. That means that a false negative result might jeopardize a patient’s chances for remission or cure, but the degree to which this is a problem remains unknown without a prospective, randomized study.
Oncotype DX and MammaPrint, the gene expression profiling tests for breast cancer discussed above in the regulatory context, are examples of genomic tests for which the evidence of comparative effectiveness is mixed. These tests measure different sets of genes but have the same purpose: to guide decisions about using adjuvant chemotherapy (that is, chemotherapy given after a primary treatment such as surgery) to decrease the risk that cancer will recur. Despite their current use, there is no direct evidence supporting their clinical utility compared with the traditional approach of offering chemotherapy based on standard clinical criteria. In addition, no studies have directly compared the two tests with each other.
Patients who otherwise would have received chemotherapy but decide against it following genomic testing surely have a better quality of life because they avoid the side effects of chemotherapy, but they may suffer more anxiety about the recurrence of cancer, compared to those who are treated. In addition, insurers worry that women will demand the test simply to learn the likelihood that their cancer will recur but will choose to have chemotherapy regardless of the test result, rendering the $4,000 test useless from a decision-making standpoint.
We have no information about what types of patients receive the tests and how well they adhere to treatment recommendations based on test results in actual clinical practice. We can answer those questions with retrospective studies, using health plans’ medical records to determine the proportion of women who had the test and received chemotherapy.
Generating direct evidence of whether the tests lead to better outcomes, however, requires randomized clinical trials that compare patients who receive standard treatment to those whose treatment is based on the results of genomic tests. Two ongoing, prospective randomized trials will provide needed information: TAILORx for Oncotype DX and MINDACT for MammaPrint.
TAILORx randomly assigned nearly 10,000 women with early-stage, hormone-responsive, node-negative breast cancer and intermediate Oncotype DX Recurrence Scores (that is, scores of 11–25, which predict neither a high nor a low risk of cancer recurrence) to chemotherapy followed by endocrine therapy or to endocrine therapy alone. Because more than 45 percent of Oncotype DX scores fall in the range of 11–25, and because most women would receive chemotherapy using standard criteria, the study is primarily designed to provide information on the appropriate cutoff for recommending adjuvant chemotherapy. It does not directly assess the effect of the test result on clinical decision making. In contrast, MINDACT is a direct evaluation of MammaPrint’s clinical utility, comparing the test to conventional methods of deciding whether a woman should receive chemotherapy.
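The eligibility rule in the paragraph above reduces to a simple banding of the Recurrence Score. The sketch below assumes the score is reported on a 0–100 scale and treats scores below 11 as low and above 25 as high, which is what the 11–25 intermediate range implies; the function and class names are hypothetical, and the sketch ignores the trial's other entry criteria (early-stage, hormone-responsive, node-negative disease).

```python
"""Sort Oncotype DX Recurrence Scores into TAILORx-style risk bands (sketch)."""
from enum import Enum


class RiskBand(Enum):
    LOW = "low"
    INTERMEDIATE = "intermediate"
    HIGH = "high"


def risk_band(recurrence_score: int) -> RiskBand:
    """Map a 0-100 Recurrence Score to a band; 11-25 is the intermediate range."""
    if not 0 <= recurrence_score <= 100:
        raise ValueError("Recurrence Score must be between 0 and 100")
    if recurrence_score < 11:
        return RiskBand.LOW
    if recurrence_score <= 25:
        return RiskBand.INTERMEDIATE
    return RiskBand.HIGH


def randomized_in_tailorx(recurrence_score: int) -> bool:
    # Only intermediate scores go to randomization between chemotherapy
    # plus endocrine therapy and endocrine therapy alone.
    return risk_band(recurrence_score) is RiskBand.INTERMEDIATE


if __name__ == "__main__":
    for score in (8, 11, 18, 25, 31):
        print(score, risk_band(score).value, randomized_in_tailorx(score))
```

Keeping the band boundaries in one function makes the cutoff question the trial is designed to answer explicit: changing where "intermediate" ends changes which women would be offered chemotherapy.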
Both trials include patient health outcomes such as survival and quality of life as endpoints, and both were started years after the test they are investigating reached the market.
One question for the future is whether high-quality trials of genomic tests can be designed before the tests become widely used in practice. Early trials are important because they can help decision makers understand the risks and benefits of testing in a controlled setting, rather than through retrospective studies (with all the attendant problems of making inferences from observational data). Another question is whether, in the spirit of comparative effectiveness research, such studies can be designed to meet the needs of multiple stakeholders; that is, whether their designs can satisfy regulatory requirements and provide information that is vital to other stakeholders, such as clinicians, patients, and health insurers. We believe they can, provided that they make use of existing clinical trials groups and explicit collaborations with insurers.
Prospective randomized, controlled trials create the most compelling evidence to change clinical practice, but they are artificial constructs using highly selected patient populations. To address this issue, clinical trials groups could implement naturalistic or pragmatic trials. Pragmatic, or “large simple,” trials come closest to measuring the benefits of a novel treatment or approach in routine clinical practice. They minimize the criteria used to exclude patients and compare the innovation to usual care.
Purchasers and providers are requesting pragmatic trials to generate evidence supporting decisions about clinical care and health policy.24 The Clinical Trials Cooperative Group Program supported by the National Cancer Institute offers an infrastructure for high-quality prospective comparative effectiveness research on genomic tests for cancer.25 The program’s mission is to promote and support clinical trials of new cancer treatments and diagnostics, explore methods of cancer prevention and early detection, and study quality-of-life issues and rehabilitation during and after treatment.
To work most effectively with the Clinical Trials Cooperative Group Network, we developed a multidisciplinary collaboration between the Fred Hutchinson Cancer Research Center, the Center for Medical Technology Policy,2 the University of Washington, and SWOG.3 The Center for Medical Technology Policy facilitates meetings with stakeholders, such as representatives from clinical practice, industry, patient advocacy groups, insurers, and other payers, to identify priorities for research among emerging genomic technologies.
Our first priority has been to design a prospective comparative effectiveness study for women enrolled in Medicare whose breast cancer has spread to their lymph nodes. The study will compare usual treatment (chemotherapy for all patients) with treatment guided by Oncotype DX results (chemotherapy only for patients whose recurrence risk scores suggest it would be appropriate). Stakeholders advised us on the eligibility criteria and identified endpoints to be investigated in the trial, which opened on January 15, 2011. We will track health care use and costs, quality of life, and patient preferences.
Developing better evidence about the clinical utility of genomic tests for cancer will require creative funding strategies. The private sector bears much of the cost of discovering cancer biomarkers and manufacturing tests for them, but manufacturers cannot support definitive, long-term, randomized studies of health outcomes prior to obtaining reimbursement for their tests. Nor are federal sources of funding likely to be sufficient, given the current state of the US economy and the number of potential tests.
One policy mechanism that may be particularly useful is “coverage with evidence development.” As explained above, under coverage with evidence development, insurers reimburse providers for the cost of certain procedures if the patient’s data are added to a registry or used in a clinical trial.26 Medicare has used this approach for a variety of services, including off-label uses of biologics approved for colorectal cancer, the use of implantable cardioverter defibrillators to prevent sudden cardiac death, and the use of positron emission tomography to diagnose patients with malignancies or suspected dementia. One recent example is Medicare’s relying on this approach to begin several clinical trials evaluating genetic tests for sensitivity to warfarin in patients undergoing long-term treatment to prevent coagulation.27
Medicare’s program was intended to provide financial support for promising but very costly technologies, a description that fits many cancer genomic tests, while further evidence about their effectiveness is obtained. In some recent cases, the costs of the genomic tests alone exceed the general research costs of the clinical trial. Thus, paying for tests used in trials has become a major concern for potential funding agencies.
Medicare’s coverage with evidence development program has been in place for several years, and the Center for Medical Technology Policy is now investigating ways to make it feasible for private insurers to support evaluations of genomic tests in a similar way.28 SWOG and several major health plans are trying to resolve the related legal, financial, and policy issues.
In order for such an approach to be viable, clinical trials groups, insurers, and other interested stakeholders will need to develop standardized methods for assessing clinical validity and clinical utility, particularly in the context of rapidly evolving technology; establish a network of clinical organizations capable of rapidly enrolling patients in these studies; and develop collaborative mechanisms so that payers, clinicians, patients, and researchers can work together to design, fund, and implement studies.
Personalized medicine offers the promise of treatments tailored to individual patients, improved outcomes, and lower treatment costs for people with cancer. However, regulatory and market structures do not encourage the development and manufacture of genomic tests based on the needs of multiple stakeholders, nor do they promote the generation of high-quality evidence about the tests’ role and value in clinical practice.
Comparative effectiveness studies can provide high-quality evidence that addresses multiple stakeholders’ perspectives. But this research will require a clear process to translate research findings in the lab into clinical use in patients, as well as infrastructure, such as the Cancer Cooperative Groups Trials Network or the HMO Research Network (www.hmoresearchnetwork.org), and novel funding methods, including coverage with evidence development, to pay for diagnostic tests or treatment interventions. Such an integrated model will require that test manufacturers, clinical trials groups, and insurers modify their current ways of operating and paying for trials and treatment, and will require flexibility and willingness to share risk if the model is to be successful.
The authors thank Karma Kreizenbeck and Judy Nelson for their assistance in preparing this article. This work was partially funded by the Center for Comparative Effectiveness Research in Cancer Genomics through the American Recovery and Reinvestment Act of 2009 by the National Cancer Institute, National Institutes of Health (Agency Award No. 5UC2CA148570-02).
The content of the article is solely the responsibility of the authors and does not necessarily reflect the views or policies of the National Cancer Institute.