Using genomic tests to personalize oncology treatment is a worthy endeavor. But when the rush to capitalize on biomedical research overcomes the rules of evidence, it’s the patients who suffer.
What happened at the Duke University Institute for Genome Sciences and Policy could be the plot of a Robin Cook medical thriller — except that Cook writes fiction and the events at Duke were all too real, especially for 110 cancer patients enrolled in three clinical trials based on bogus research.
The paper by a Duke research team led by principal investigators Anil Potti, MD, and Joseph R. Nevins, PhD, published in Nature Medicine1 in 2006 is a good place to start. The team was correlating microarray-based genomic signatures of NCI-60 cell lines with response to chemotherapy drugs to predict which chemo regimen would be most effective for a patient. Essentially, the team attempted to “personalize” oncology treatment.
The Duke team’s work attracted a great deal of attention among cancer researchers. At the MD Anderson Cancer Center, in Houston, bioinformaticians Keith A. Baggerly, PhD, and Kevin R. Coombes, PhD, professors in the Department of Bioinformatics and Computational Biology, were approached by colleagues who were keen to do similar research. Baggerly and Coombes parsed the paper and then asked the Duke team for more information. As the correspondence went on, the Duke team continued to publish papers in major medical journals, but Baggerly and Coombes became increasingly convinced that the data didn’t add up. In November 2007, Nature Medicine published Baggerly and Coombes’s letter about unresolved questions, a reply from Potti, and a correction to the original paper.
Between October 2007 and April 2008, the Duke team initiated three clinical trials in which patients with early-stage non-small cell lung cancer (NSCLC), early-stage breast cancer, and advanced lung cancer and breast cancer were assigned to chemotherapy regimens based on the results of Duke’s genomic tests. A fourth trial was started at the H. Lee Moffitt Cancer Center & Research Institute, in Tampa. Desperate patients saw these trials as their last best hope, and the Duke team was excited about offering cancer patients a rational approach to treatment. Potti, Nevins, and Duke University also launched two new companies to commercialize their method for guiding chemotherapy.
When Baggerly and Coombes learned in 2009 that the clinical trials had been launched, they submitted a paper to the Annals of Applied Statistics2 that pointedly alerted readers to the risks of treating patients based on what they regarded as sloppy science and flawed data. Lisa McShane, PhD, a statistician in the National Cancer Institute’s Biometric Research Branch, says that NCI had been aware of errors and methodological faults in the Duke papers but was unaware that trials had already been initiated — until NCI received a proposal from the Duke team in 2009 to use several of the tests in an NCI-sponsored trial (CALGB-30702). The NCI subsequently contacted Duke University and asked that the validity of Potti/Nevins’ methodology be examined.
In October 2009, The Cancer Letter, a weekly trade newsletter, began covering the Duke story. On October 8, Moffitt terminated its clinical trial. Meanwhile, the Duke Data Safety Monitoring Board and the Duke Cancer Protocol Review Committee concurred that patients enrolled in the trials were not at risk — despite the information that Baggerly and Coombes had presented.
Eventually, the concerns were shared with the Duke Institutional Review Board (IRB) and the principal trial investigators, which prompted the IRB to commission an independent review of the Potti/Nevins methodology. On November 9, Baggerly sent a report about new questionable data posted online for one of the Duke trials to the vice dean of research at Duke University School of Medicine. Duke, however, acceded to Nevins’ request not to share the report with the external reviewers. On that same day, NCI, under its Cancer Therapy Evaluation Program (CTEP), denied a revised Cancer and Leukemia Group B trial protocol, CALGB-30702, for advanced stage NSCLC using Duke’s chemosensitivity predictors. A week later, McShane and CTEP associate director Jeffrey Abrams, MD, requested a re-evaluation of another Duke predictor in trial protocol CALGB-30506. In December, the external review commissioned by the IRB concluded that the predictors were scientifically valid and, with a few additions, could be fully responsive to Baggerly and Coombes’s comments. Duke sent the review to NCI in January 2010 and resumed enrollment in the three trials.
The IOM report isn’t about investigating misconduct at Duke University. It’s about recommending best practices for the discovery, confirmation, validation, and evaluation of omics-based tests to improve patient care.
In July 2010, The Cancer Letter reported that Potti had falsely claimed to be a Rhodes scholar and misrepresented other information on his curriculum vitae. Duke then placed Potti on paid administrative leave. Three days later, 31 international biostatisticians and bioinformatics experts sent a letter to NCI director Harold E. Varmus, MD, expressing concerns about the prediction models in the Duke clinical trials. Varmus and Duke officials then requested the Institute of Medicine (IOM) to assess the science behind Duke’s clinical trials and to recommend criteria for future omics-based tests. Duke investigators subsequently began retracting some of the questioned papers, and, ultimately, Duke terminated the trials.
The IOM Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials focused on best practices for developing omics-based tests. The committee included genomic scientists, a pathologist, a computational biologist, a biostatistician, a clinical trial expert, and an oncologist, among others. The result was a 274-page report, “Evolution of Translational Omics: Lessons Learned and the Path Forward,”3 published in 2012.
“This is not rocket science,” declared NCI’s Lisa McShane in her testimony to the IOM Committee. “There is computer code that evaluates the algorithm. There is data. And when you plug the data into that code, you should be able to get the answers back that you have reported. And to the extent that you can’t do that, there is a problem in one or both of those items.”
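The check McShane describes can be sketched in a few lines. This is a hypothetical illustration of the principle, not the actual Duke code, data, or results:

```python
# Reproducibility check: re-run the published analysis code on the
# published data and compare against the result claimed in the paper.
# The function, data, and reported value below are all made up.

def published_analysis(data):
    # Stand-in for the authors' released analysis code.
    return sum(data) / len(data)

published_data = [0.2, 0.4, 0.9]
reported_result = 0.5  # value claimed in the (hypothetical) paper

recomputed = published_analysis(published_data)
assert abs(recomputed - reported_result) < 1e-9, (
    "Recomputed result does not match the paper: "
    "the code, the data, or both have a problem."
)
```

When the assertion fails, McShane's conclusion follows directly: there is a problem in the code, the data, or both.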
McShane’s point is worth reiterating. The events at Duke University did not discredit the translation of genomics into diagnostics and therapeutics, nor were biostatistics or bioinformatics inadequate to the task at hand. Rather, as Baggerly, Coombes, and others demonstrated, the Duke investigators had erred in using the tools of bioinformatics. There were data management errors, incorrectly designed computational models, failure to validate gene expression tests, and what appeared to be intentional data corruption — all of which made it impossible to reproduce the results claimed by Duke investigators.
An “off-by-one” indexing error in the Potti paper published in Nature Medicine alerted Baggerly and Coombes to the possibility that “somebody is using software they don’t understand,” as Baggerly puts it. An early example of the errors that were to follow involved reporting one gene when the intended gene was the next one down in an ordered list. “Other types of mistakes became harder to explain,” Baggerly continues, referring to a paper by Hsu4 on a cisplatin chemosensitivity test used in lung cancer patients. Attempting to reproduce the results, Baggerly and Coombes were able to match only 41 of the 45 genes after accounting for the “off-by-one” indexing error. The other four were problematic. For example, Hsu reported using the Affymetrix U133A chip, but two of the four remaining probe sets are not on that chip.
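The kind of slip Baggerly describes is easy to reproduce. In this sketch (the gene names and scores are invented), two parallel lists come from the same ordered table, but one is read with a one-row offset, so every gene is silently matched to its neighbor's value:

```python
# Hypothetical illustration of an "off-by-one" indexing error.
genes = ["TP53", "EGFR", "KRAS", "MYC", "BRCA1"]
scores = [0.91, 0.15, 0.78, 0.33, 0.60]  # made-up sensitivity scores

# Correct pairing: gene i goes with score i.
correct = dict(zip(genes, scores))

# Off-by-one pairing: software skips a header row (or starts counting
# at 1 instead of 0), shifting every gene onto its neighbor's score.
shifted = dict(zip(genes[1:], scores[:-1]))

print(correct["EGFR"])  # 0.15
print(shifted["EGFR"])  # 0.91, TP53's score, silently misattributed
```

The shifted table looks perfectly well-formed, which is why the error survives casual inspection and only surfaces when someone tries to reproduce the published list gene by gene.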
“We were able to replicate a larger and larger fraction of the results, but only after specifically introducing certain types of errors that shouldn’t be there,” adds Baggerly. “That made us increasingly doubtful that their method did indeed work as described. We were seeing a lot of evidence that something’s broken.”
The IOM report defines omics as the scientific disciplines comprising the study of global sets of biological molecules such as DNAs (genomics), RNAs (transcriptomics), proteins (proteomics), and metabolites (metabolomics). What differentiates omics research from single-biomarker research is the huge volume of data generated by interrogating thousands of potentially relevant molecules. Gene variation, for example, can include mutations, translocations, insertions, deletions, copy number changes, epigenetic changes, and expression levels. Cleaning, organizing, and analyzing these complex “high-dimensional” biological datasets is what biostatisticians like McShane and bioinformaticians like Baggerly and Coombes do, typically in collaboration with clinical investigators. High-dimensional data, informally referred to as “short fat data,” are large data sets characterized by the presence of many more variables than independent observations, such as data sets that result from measurements of hundreds to thousands of molecules in a relatively small number of biological samples.
“Before we get into the analysis, which requires sophisticated mathematics, there’s the basic question: Are we looking at the right numbers in the first place?” says Baggerly, referring to early attempts to understand what Potti and Nevins were doing. “That’s where most of the difficulties that we encountered arose. Eventually we came up with what we thought were pretty good approximations of the mathematics involved. We were able to prove — at least to our satisfaction — that the right numbers hadn’t been used.”
Even with the right numbers, one risk inherent in high-dimensional data is “overfitting,” which unintentionally exploits characteristics of the data that are due to noise, experimental artifacts, or other chance effects not shared between data sets rather than to the underlying biology. In other words, overfitting leads to erroneous conclusions about the data. “There’s so much data there that you can often find things that aren’t real,” notes Baggerly.
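Baggerly's point can be demonstrated with a small simulation on pure noise. All numbers below are randomly generated, so no gene carries any real signal, yet with thousands of variables and only ten samples, a few "perfect predictors" typically appear by chance:

```python
import random

random.seed(0)

n_samples, n_genes = 10, 5000
# Random class labels: 5 "responders", 5 "non-responders".
labels = [1] * 5 + [0] * 5
random.shuffle(labels)

# Pure-noise expression matrix: no gene has any real association.
data = [[random.random() for _ in range(n_samples)] for _ in range(n_genes)]

def separates(values, labels):
    """True if one class's values all lie above the other class's."""
    resp = [v for v, y in zip(values, labels) if y == 1]
    nonresp = [v for v, y in zip(values, labels) if y == 0]
    return min(resp) > max(nonresp) or max(resp) < min(nonresp)

perfect = [g for g in range(n_genes) if separates(data[g], labels)]
print(len(perfect))  # typically several, despite there being no signal
```

A predictor built from these chance "hits" would fail on any independent data set, which is exactly why blinded validation on separate samples matters.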
All this and more is exhaustively covered in the IOM report. But “bad” biomarker tests are nothing new and certainly not limited to U.S. research, according to McShane.
“This is a problem we’ve had with traditional biomarker studies for a very long time,” she explains. “It’s not unique to omics technologies. I could point to papers in top-tier journals that have been coauthored by respected statisticians, but they’ve built genomic models that are not clinically useful or they compare clinical outcomes between groups defined by a biomarker predictor when the groups weren’t comparable.”
There are many ways to bungle costly, labor-intensive research. The international genomic research community has been writing and lecturing for years about how to do biomarker and genomic signature research. McShane herself coauthored one such paper.5 Another initiative by the global genomics research community is the EQUATOR Network (http://bit.ly/EQUATOR). The network’s website catalogues all types of reporting guidelines for health research studies. The rationale for reporting standards seems compelling — but it’s a tough sell. “Frankly, some people look at the REMARK checklist [on the EQUATOR website] and say, ‘Yeah, just someone else making my life difficult,’” says McShane. “The problem is, everyone’s busy and worried about where the next grant money is going to come from.”
Data aside, much that went wrong at Duke had to do with the basic principles of science, including not doing blinded, independent validation on samples separate from the training samples.
“What my mentor Alvan Feinstein at Yale called scientific toilet training,” says David F. Ransohoff, MD, professor of medicine and clinical professor of epidemiology at the University of North Carolina School of Medicine, in Chapel Hill, just down the road from Duke. Ransohoff served on the IOM report committee. “This is basic stuff you learned in grad school or high school. In the world of omics, it’s a matter of trying to see some clarity in the midst of lots of data, but the principles themselves are fundamentals we all understand.” Ultimately, Ransohoff acknowledges, we will need to rely on a system of checks and balances that includes the investigators themselves, the research institutions, peer-reviewed journals, funders, and the U.S. Food and Drug Administration.
The Duke Cancer Center oversight mechanism was circumvented, according to Robert M. Califf, MD, director of the Duke Translational Medicine Institute and vice chancellor for Clinical and Translational Research at Duke University School of Medicine. “Potti and Nevins made the case to the institution that their work was different enough from what we do that they shouldn’t fall under our aegis and that they should have their own enterprise, focused on genomic research,” says Califf. “They argued that they were in a new area so they should have the freedom to operate on their own. In this case it was the so-called Institute for Genome Sciences and Policy.”
This and other contributing factors — discontinuity in the Potti/Nevins statistical team; reluctance to challenge a well-regarded tenured professor; confusion about what constitutes individual and institutional conflicts of interest; the IRB’s failure to inform external reviewers of questions raised by Baggerly, Coombes, and the NCI; and ambiguity about the need for an FDA Investigational Device Exemption (IDE) — are cited in the IOM report.
One of the most commercially successful omics-based tests currently in use, Oncotype DX, was developed by Genomic Health as a laboratory-developed test (LDT) without FDA review. However, in IOM’s report, Genomic Health’s chief scientific officer Steven Shak, MD, acknowledged that “the company benefited from prior interaction with FDA and the extensive background material FDA provides on its website about assay validation.” The report recommends that FDA clarify the regulation of omics-based tests by issuing guidance or regulation that specifies which omics-based tests require FDA review and at which point a review should occur. The FDA is encouraged to continue using and publicizing the pre-IDE process and to issue guidance for LDTs not currently reviewed by the FDA.
“I think it’s clear from the case studies that you can do the FDA process well or poorly and that you can do a laboratory-developed test process well or poorly,” says Debra Leonard, MD, PhD, professor and vice chair for laboratory medicine at Weill Medical College and director of clinical laboratories at New York-Presbyterian Hospital–Weill Cornell Medical Center. “Both are valuable pathways. The CLIA pathway allows you to move technologies into clinical practice in a cost-effective way that allows rapid access to those technologies. The FDA process requires documentation of safety and efficacy.”
Leonard, who also served on the IOM committee, points out that CLIA does not have much to say about the validation of molecular or genomic tests per se. To compensate, the College of American Pathologists (CAP) developed a checklist related to the development and validation of a test prior to clinical use, and because CAP has “deemed” status under CLIA, CAP accreditation is equivalent to CLIA accreditation. As the IOM report points out, if an omics-based test result is used for patient care, the test must be performed in a CLIA-certified lab.
Many medical schools now offer courses on basic principles of clinical research design and interpretation, but it can be easy to overlook those principles when laboratory-based researchers start doing nonexperimental or observational clinical research.
In experimental research, randomization is used to allocate groups of cell lines or people to an intervention vs. control. In an experiment, because you can “hold everything else equal” (except for the intervention), it is easy to learn about cause-and-effect or other associations.
In observational research, randomization cannot be done, and the rules of evidence — how to design and interpret a study so that you can trust its results — are much trickier. A culture clash can occur when laboratory-based researchers, used to the experimental method, start doing observational research but are unfamiliar with important rules of evidence.
— David F. Ransohoff, MD,
Professor of Medicine and Clinical Professor of Epidemiology
University of North Carolina School of Medicine
“FDA rules for evaluating diagnostics have gone through a fascinating evolution in the last four decades, and the current process doesn’t make clinical sense to a lot of people — including many at the FDA,” says Ransohoff, who is also a consultant to the FDA’s Immunology Devices Panel, Center for Devices and Radiological Health. “It’s not the FDA’s fault. They’ve never gotten the guidance they need from Congress or the administration.”
Last year, Duke University convened a Translational Medicine Quality Framework (TMQF) committee, which has issued a report along with a preliminary implementation plan. The TMQF report includes four key elements: improved bioinformatics support for laboratories; improved access to biostatistical collaboration and support; an enhanced system of accountability; and a formal assessment of studies that will involve a degree of rigor far exceeding current standards before they move into human trials. “Once everybody in the leadership realized that the institution had missed the boat, there was broad agreement that we needed to put significant changes into the system that would be institution-wide and policy driven,” says Califf, who cochairs the committee.
Two of the Duke clinical trials were terminated on November 4, 2010, and the third was terminated February 3, 2011. Duke identified 40 papers that Potti coauthored and that involved original data analysis. Based on responses from all 162 of his coauthors, two thirds of the papers were expected to be partially or fully retracted, while the remaining third were still considered valid. Potti resigned from Duke in November 2010 and later joined the Coastal Cancer Center in Myrtle Beach, S.C.
In September 2011, a North Carolina law firm filed a malpractice suit on behalf of patients who enrolled in the Duke clinical trials. News media have covered the Duke events extensively. Califf and Nevins appeared on a 60 Minutes TV episode titled “Deception at Duke,” aired February 12, 2012. In June, Gilbert Omenn, MD, PhD, who chaired the IOM committee, presented the report to a standing-room-only audience at the American Society of Clinical Oncology (ASCO) 2012 meeting.
Asked about the likelihood of similar problems at other research centers, Califf says, “Well, we’ll never know. But the bigger issue is not this egregious case. It’s the everyday errors that are made because the systems are not what they should be. This is a wake-up call for everyone working in the biomedical research system to improve quality.
“If you read the IOM report, that is the point — it’s about improvements needed in the whole system.”