A panel discussion on the future of clinical trials addressed several issues raised in the course of the first day of the Workshop. The chair posed questions in six areas which the panel and audience addressed. This paper summarizes the panel discussion.
It’s not unusual for methodology to precede applications. For Bayesian trials to become more widespread, do we need advocates? Our two good examples are cancer trials at MD Anderson and device trials at the FDA. Both of these have advocates.
You’re talking about a sociocultural transformation, and it is going on right now. It’s not a philosophical war; we need examples and high-profile advocates. We also need journal editors who are receptive and more statisticians comfortable using these methods. There’s a lot of magical thinking about Bayesian methods, but clinicians can’t find a statistician who is willing or able to implement them. The methods make certain hard problems easier. We need to be realistic about when Bayesian methods are beneficial and when a frequentist design is equally good.
The need for advocacy is very important. Many don’t understand these new methods. We need to present these methods as a way to clarify issues. Statistics in Medicine is trying to publish Bayesian methods, but the problem is translating them into clinical practice.
Most serious biostatisticians are Bayesian in spirit because they’re always thinking about the whole picture. The problem is how to make the methods easier and clearer. Where they are useful, we’ll use them.
Bayesian methods have been around a long time. Why haven’t they taken hold? If one were to evaluate the proper role of Bayesian methods objectively, what would be a desirable percentage of trials that should be Bayesian?
There has been a similar discussion in epidemiology and genetics. We have talked at length about the value of Bayesian methods, but they are not being used very much. One issue is lack of familiarity; training in Bayesian approaches needs to begin in graduate school. People ask all the time how Bayesian methods influence one’s ultimate conclusion. While a bit deterministic, compelling examples showing how Bayesian methods shift final conclusions would be helpful.
The lack of computational software for implementing Bayesian methods had been a limitation until Markov chain Monte Carlo (MCMC) came along.
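The computational point can be made concrete with a toy example. The sketch below (hypothetical numbers, not from any trial discussed here) uses a random-walk Metropolis sampler, one of the simplest MCMC algorithms, to draw from the posterior of a single-arm trial’s response rate; with a flat prior the result can be checked against the exact Beta posterior.

```python
import math
import random

def log_posterior(p, successes, n):
    """Log of the unnormalised posterior for response rate p:
    binomial likelihood times a flat Beta(1, 1) prior."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)

def metropolis(successes, n, draws=20000, step=0.05, seed=1):
    """Random-walk Metropolis sampler for p."""
    rng = random.Random(seed)
    p, samples = 0.5, []
    for _ in range(draws):
        proposal = p + rng.uniform(-step, step)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < (log_posterior(proposal, successes, n)
                                     - log_posterior(p, successes, n)):
            p = proposal
        samples.append(p)
    return samples

# Hypothetical single-arm trial: 12 responses among 40 patients.
samples = metropolis(12, 40)
post_mean = sum(samples[5000:]) / len(samples[5000:])
# With a flat prior the exact posterior is Beta(13, 29),
# whose mean is 13/42, roughly 0.31; post_mean should be close.
```

Before MCMC, anything beyond such conjugate toy problems required bespoke numerical integration; the same few lines of sampling code extend to models with no closed-form posterior.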
Historically there was polemic on the Bayesian side, and that didn’t help make the methods popular. Now that the computations can be done more easily, we can get involved in hard problems. Bayesian methods will not be a panacea, but let’s formalize design in the Bayesian construct, which is not controversial, and introduce the analysis methods slowly.
An interventional cardiology trial with a Bayesian design was submitted to JAMA. A detailed section of methods was incomprehensible to the editors. The paper had to be totally rewritten to be accessible to the readers. To be effective, advocates have to be able to communicate.
To summarize, we are moving towards being Bayesians. We should consider whether there is need for the methodology in specific cases and also educate statisticians so they can implement and explain this methodology.
Surrogates are considered one way to make trials smaller and faster (hence cheaper). Dr. Fleming’s examples indicated how poorly surrogates have fared. Is there any future for surrogate endpoint clinical trials?
In hypertension and other cardiovascular trials we have moved to hard endpoints, but in orphan drugs, where the population is small, surrogates are a good start. However, you can’t stop with the surrogate. You have to be able to show that a result with a surrogate endpoint leads to a consistent result with the hard clinical endpoint.
We must rely on surrogates as an intermediate step. It is total reliance on surrogates that has gotten us into trouble. I was part of the Institute of Medicine panel that took a pretty hard-nosed position on surrogates. We might use surrogates to determine who would be a better responder, but as a final exam I don’t have much hope for surrogates.
I think that what Tom Fleming was suggesting yesterday was that if we understood all of the causal pathways and all of the effects of a specific drug, then we wouldn’t need a clinical trial. Since we don’t know everything about the drugs we test, it’s easy to get it wrong, using surrogates or not. Another means of lowering sample size is to use a composite endpoint.
There are increasing opportunities to measure surrogates. Is this good or bad?
I regard surrogates as a question of societal and regulatory decision making and levels of evidence. Surrogate endpoints are a lesser evidential standard than hard endpoints. I don’t accept that surrogates have fared poorly in many disease areas; we trumpet the counterexamples. Even if surrogates give the correct result 85% of the time, we will still be wrong roughly one time in six. Unless we are willing to wait for the time it takes to run a definitive trial, we are stuck with surrogates. We need to give attention to our ability to change decisions once we get new evidence.
Including enough detail in publications when we use interesting, novel methods will often be contentious because of word limits and our collaborators’ desire to use available space for more medical detail. How can we prevail?
We don’t ever get the space we need. We need a design paper publication and a web supplement to get the details out.
I worry that with statistics relegated to appendices, statisticians will not get credit for authorship.
There isn’t space in the manuscript for the 24 detailed items proposed by Altman. We do need alternatives, though none are really good other than a separate design publication.
Cancer trials don’t do design papers, while in cardiology we do them all the time.
With the new push towards a need for reproducibility, this problem will solve itself. There was a series of high-profile papers published in top journals which alleged that you could tailor chemotherapy to individual patients and which turned out to be incorrect. Clinical trials had been based on these results and had to be halted. The journals were reluctant to consider that the work was incorrect. Several papers ended up being retracted, and this episode has made journals more open to some means of allowing reproducibility. Data cannot be published in journals, so there will need to be different ways to deliver this information. We need to use cyberspace effectively. The journal Trials, edited by Douglas Altman, is taking initiatives for new methods of reporting clinical trials.
Supplemental material is part of the primary publication and statisticians should get credit for their work here. Another avenue to give statisticians credit is via a design paper.
Reporting methods in the medical literature is very important and one needs to describe methods sharply so that we are understood by the sophisticated clinician.
There are some tough issues here which we haven’t addressed. Supplements imply a larger burden on the reviewers and it isn’t clear that the supplemental material is thoroughly refereed. Papers which are cited in a supplement do not get counted as citations.
It’s post-publication that needs as much scrutiny as pre-publication.
What should we do when missingness is non-ignorable and different models yield results which differ from an intention-to-treat analysis? For instance, differential dropout may be expected to occur more frequently in a trial with a “usual care” control or in any unblinded trial.
This is obviously a very serious problem. The best answer is to not get into this situation, but when dealing with free-living people and with certain diseases it is virtually impossible to avoid serious missingness! When missingness is serious, it can destroy a trial, particularly if there is differential dropout. Nonetheless, we don’t want to throw the trial away, or not deal with disease states where this is a chronic problem. So we report results as best as we can and hope we’re right.
Of course eliminating missingness entirely is not realistic. One should anticipate and plan for potential missingness in the design phase.
Different kinds of analyses answer different questions. When there is missingness, we may focus on the effectiveness question and give up on the efficacy question. The goal of the paper is to represent the uncertainty as well as the results. Report estimates with uncertainty and if the uncertainty is so great that different models and different imputations give different results, we should report that. We shouldn’t report just one favored analysis but focus on properly representing the uncertainty. Sometimes we should throw up our hands and not conclude anything.
The type of missingness that statisticians dislike is censoring at last contact or 14 days after treatment stops, as was done in the Vioxx trial in colon cancer patients. Investigators should avoid censoring at last contact when it is easy enough to follow up the individuals. We should try to get complete data and then when there is missingness, try to understand the missingness mechanism. A sensitivity analysis should be a standard part of analysis and reporting.
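The simplest sensitivity analysis for a binary endpoint with dropout is a best-case/worst-case bound: count every missing outcome first as a failure, then as a success, and report how far the estimate can move. A minimal sketch with hypothetical counts:

```python
def sensitivity_bounds(successes, n_observed, n_missing):
    """Best-case/worst-case bounds for a binary response rate:
    count every missing outcome first as a failure, then as a success."""
    n_total = n_observed + n_missing
    worst = successes / n_total               # all dropouts failed
    best = (successes + n_missing) / n_total  # all dropouts responded
    return worst, best

# Hypothetical arm: 30 responses among 80 patients with observed outcomes,
# plus 20 patients lost to follow-up.
low, high = sensitivity_bounds(30, 80, 20)
# Complete-case estimate is 30/80 = 0.375; the bounds 0.30 to 0.50
# show how much the answer could move under extreme assumptions.
```

When the bounds are narrow enough that the conclusion is unchanged, the trial is robust to the missingness; when they are wide, that width is itself the finding to report.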
I don’t believe that missingness is ever at random! The hype, promise, or hope for methods of imputation is not met because they are not understandable by all, and that can lead to interesting discussions.
The objective of a sensitivity analysis is to reassure you. Even if missingness is not at random, you need a lot of missingness for methods which assume “missing at random” to be invalid. When we have a case with a lot of dropout in one arm and can’t reassure ourselves that the missingness is ignorable, should we skip publication?
When I was at the Heart, Lung and Blood Institute, missingness was not a huge problem, but since then I have seen trials in which it is not unusual for 1/3 of the subjects not to return. Are we going to tell that discipline not to bother doing clinical trials? Rather than throwing out the data, say there is a great degree of uncertainty.
I was on an IOM panel looking at treatment of Post-Traumatic Stress Disorder (PTSD) in veterans and all trials had at least 50% dropout. We said we couldn’t draw conclusions because of the dropout. Only two trials had proper imputation (most used last value carried forward).
We should punish people for missing data!
In summary, it’s best not to have missing data!
Is there a future for the large, simple trial? With increased observational evidence of association (which trialists consider hypothesis generating), how will we decide when a trial is warranted and which trials to conduct?
This is a question in the hands of the funders. There is a huge push to extract higher-quality information from designed observational studies, and how much more do we get out of a large randomized trial for x million dollars? The same data structure that will enable high-level observational studies will be useful in clinical trials. At a conference I will attend in a few months, the idea is to create a data system that is useful for both clinical trials and clinical care. We’ll see how this comes out.
In some arenas like cardiovascular diseases, where the event rates are so low, large trials are needed, so there is a future for large simple trials. The difficulty is that the endpoints are often composite and the events that happen first are not necessarily the most important ones. We often don’t follow up the patients after this first event, but should. Observational evidence has a role, but such studies have selection bias. They do give comfort when they give the same results as a clinical trial, but replacing clinical trials with observational studies would be a very dangerous route to take.
NIH will spend an enormous amount of money building a data warehouse based on electronic health records. This will increase the ability to do observational studies and it would be nice if these could be used for large simple randomized trials. However, there is still reluctance on the part of many practicing physicians to randomize. The number willing to do this is small.
My conception of large simple trials is that there would be many sites, few patients per site, and little investigator training. I don’t think I’m ready for that. One area in which we can simplify: we have learned that central adjudication of endpoints is very expensive with very little gain.
I wonder about the impact of “off-shore” trials. While this provides one avenue for increasing the number and size of trials due to lower cost, the design and analysis of “off-shore” trials may raise a number of other issues.
Is there any place for changes in trial design that are not pre-planned? Adaptive designs that look at interim results appeal to industry because they appear to increase the likelihood of a positive result. But do they?
Certainly there is a place for adaptive designs. Theoretically they should increase the likelihood of a positive result but I don’t know if they do in practice.
They should be used in a limited way, if ever, because they are open to a lot of shenanigans by people who have various conscious and unconscious biases.
We should define “adaptive”. Group sequential trials are adaptive; increasing the sample size for a low overall event rate is adaptive. We’ve been doing these for a long time. The controversy arises when treatment differences are used to change the trial, even if such a change is planned.
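One of the uncontroversial adaptations mentioned above, increasing the sample size when the blinded overall event rate comes in lower than planned, can be sketched with the standard normal-approximation sample-size formula for comparing two proportions (all rates hypothetical; z-values for two-sided alpha of 0.05 and 80% power):

```python
import math

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate per-arm sample size for comparing two proportions
    (normal approximation, equal allocation)."""
    pbar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * pbar * (1 - pbar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Planned assuming a 10% control event rate and a 30% relative reduction...
planned = n_per_arm(0.10, 0.07)
# ...but a blinded interim look suggests the pooled event rate is nearer 6%,
# so detecting the same relative effect requires a larger trial.
revised = n_per_arm(0.06, 0.042)
```

Because the recalculation uses only the pooled event rate, not the treatment difference, it preserves the blind and the trial’s frequentist operating characteristics, which is why this kind of adaptation has long been uncontroversial.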
What is the impact on the investigators when something dramatic comes out of an adaptive design? Changes in trial design need to be preplanned or else there are questions about the validity of the study. When there is a small population and a need to figure out how long we should follow up the subjects, that is one place for an adaptive design. These questions should be considered before the trial is undertaken. The methods don’t promise a positive result, but help get the most information we can out of the study.
“Adaptive designs” is a complicated, all-encompassing term. Bayesian monitoring gives an upper bound on coming to a false conclusion and allows more latitude than frequentist designs. Bayesian adaptive designs take a very different perspective than frequentist ones.
Some adaptations don’t need to be preplanned. For example, you can change a trial based on the overall effect (rather than the effect by treatment arm) without preplanning and not jeopardize the frequentist properties.
This is a very difficult area which takes a lot of thought. The FDA advises preplanning.
Our time is up now. Thanks to the panel and the audience for an interesting and lively discussion.