The choice of the target response rate is a key aspect of Phase II design. Poorly chosen targets reduce the ability of Phase II trials to determine which agents or approaches should be considered for testing in definitive Phase III trials and which should not be evaluated further. As a consequence, patients may be exposed to treatments that are unlikely to be effective, and the development of therapies that are more likely to be beneficial and to improve standards of care may be jeopardized. The cost in resources, both in dollars and in time, is difficult to overestimate.
The Phase II studies we evaluated include both those measuring tumor regression and those assessing the proportion of patients who had not progressed or who were alive at a fixed time after treatment. We found that although a high proportion (52%) of Phase II trials required historical data to determine the null, few justified the choice of null by clearly explaining the results of prior studies. Furthermore, we were unable to find a single study that incorporated statistical adjustments for either sampling error or case mix. Trials that failed to report a rationale for the historical bar which the new therapy had to exceed were much more likely to conclude that the new therapy was “active” and worthy of further study. For this analysis, we considered explicit reference to historical data necessary when the null tumor response rate exceeded 10%, or when a time-to-event endpoint such as survival at one year was used. Both of these design characteristics imply some level of activity for the historically treated group.
Although a potential limitation of this analysis was that it was restricted to trials reported in the Journal of Clinical Oncology and Cancer, we have no reason to believe that trials published in other journals would differ in terms of the need for historical data. However, it is possible that by focusing on journals that publish a higher than average proportion of Phase II trials with statistical designs, our estimates of design shortcomings are conservative. It is also possible that details on the historical data used for study design were included in the study protocols but not in the published reports we analyzed. Yet reports which omitted details of historical data were more likely to be interpreted as positive (p=0.005). Moreover, researchers attempting to analyze Phase II results should not have to take on faith a critical design decision such as the choice of the null. It may also be the case that an approach approved for one disease is tested in a Phase II trial for a different disease. Physicians considering the clinical use of this approach for the new indication should be able to evaluate key trial characteristics critically.
To our knowledge, this is the first report on the use of historical data in Phase II design. There are, however, some estimates based on prior reviews which are comparable to those presented here. For example, a systematic review of Phase II study design (5) found that just over half of the Phase II trials published in Cancer and the Journal of Clinical Oncology reported a study design and that the null was rejected in 74% of trials, both estimates close to what we report.
With the exception of single-agent trials in patients with untreatable tumors, we believe that the rationale for the null level of response must be made explicit. We have the following recommendations for Phase II design and reporting (summarized in ). First, if the null is based on historical data, these should be cited and described in the methods. The description should include the dates when the patients were treated, the type of study (Phase II, Phase III, cohort study), and details of the therapy. With respect to the dates of accrual, it is well recognized that as therapies are accepted, they are utilized earlier in the natural history of disease, in patients with an inherently better prognosis independent of treatment. A single estimate should be derived from the historical data; specifying only a range should be avoided. For instance, take the case where three prior studies had been reported with sample sizes of 1000, 100 and 20 and response rates of 33%, 22% and 15%, respectively. This is a total of 355 responses in 1120 patients (32%). It is preferable to give this single historical response rate of 32% than to say only that “response rates in prior studies varied from 15% – 33%”, because the latter offers no guidance as to the appropriate null: investigators tempted to pick the middle of the range (24%) would underestimate the true response rate and inflate the risk of a false positive.
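The pooling arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_response_rate` is hypothetical, and the study figures are the three illustrative studies from the text, not real data.

```python
# Illustrative prior studies from the example in the text:
# each entry is (sample size, reported response rate).
studies = [(1000, 0.33), (100, 0.22), (20, 0.15)]


def pooled_response_rate(studies):
    """Pool studies by summing responders and patients across studies."""
    # Recover responder counts from the reported rates (rounded to whole patients).
    responders = sum(round(n * rate) for n, rate in studies)
    patients = sum(n for n, _ in studies)
    return responders, patients, responders / patients


responders, patients, pooled = pooled_response_rate(studies)

# Midpoint of the reported range, which ignores study size entirely.
rates = [rate for _, rate in studies]
midpoint = (min(rates) + max(rates)) / 2

print(f"Pooled: {responders}/{patients} = {pooled:.0%}")
print(f"Midpoint of range: {midpoint:.0%}")
```

Because the pooled estimate weights each study by its size, the largest study dominates, whereas the midpoint of the range treats a 20-patient study and a 1000-patient study as equally informative.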
Recommendations for Phase II trials requiring historical data
The relationship between the null and the historical data should be detailed clearly. For example, in the case of a novel chemotherapy agent added to single-agent cisplatin for non-small cell lung cancer, the historical data should include the response rates to cisplatin alone. In this case, the null might rationally be set close to or slightly higher than the historical response rate. Alternatively, if the intervention were less toxic or more convenient than the treatment in the historical cohort, it would be reasonable to set the null at or slightly lower than the historical response rate.
An additional consideration is the use of statistical methods to adjust for imprecision in historical estimates. Such imprecision can have an important effect on study design. For instance, if the historical response rate in 50 patients receiving standard therapy is 50%, the 95% confidence interval around this proportion is approximately 35% to 65%. Imagine that investigators set the null at 50% for a trial of standard therapy plus novel agent. If the true response rate to standard therapy were in fact close to 65%, there would be a high probability that an ineffective novel agent would be deemed worthy of further study. A statistical approach to this problem has been proposed by Fazzari et al., who suggest using the upper bound of a one-sided 75% confidence interval for the historical data as the null response rate (3).
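As a rough illustration of the numbers above, the following Python sketch computes a two-sided 95% interval and a one-sided 75% upper bound for 25 responses in 50 patients. It assumes a simple normal (Wald) approximation; the interval method actually used by Fazzari et al. is not specified here and may differ (for example, an exact binomial interval). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(responders, patients, confidence=0.95):
    """Two-sided confidence interval for a proportion (normal approximation)."""
    p = responders / patients
    se = sqrt(p * (1 - p) / patients)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p - z * se, p + z * se


def one_sided_upper_bound(responders, patients, confidence=0.75):
    """Upper bound of a one-sided confidence interval (normal approximation)."""
    p = responders / patients
    se = sqrt(p * (1 - p) / patients)
    z = NormalDist().inv_cdf(confidence)
    return p + z * se


# Historical data from the example in the text: 50% response in 50 patients.
lower, upper = wald_interval(25, 50)
adjusted_null = one_sided_upper_bound(25, 50)

print(f"95% CI: {lower:.1%} to {upper:.1%}")
print(f"One-sided 75% upper bound: {adjusted_null:.1%}")
```

Under this approximation the two-sided interval is roughly 36% to 64%, in line with the approximate figures quoted above, and the one-sided 75% upper bound is roughly 55%. Setting the null there rather than at 50% builds a margin for sampling error in the historical estimate into the design.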
Differences in case mix between the historical cohort and the study sample should also be considered. Endpoints such as tumor response and survival rate are at least partly predictable using variables such as cancer stage, tumor grade or biomarkers. The conclusions of a Phase II trial may be misleading if the patients accrued differ on important prognostic variables from those in the historical cohort. For example, if patients in the Phase II trial had, on average, lower stage disease than those in the historical cohort, the trial would overestimate the value of the investigational agent. Techniques have been described that use multivariable models to adjust the comparison between Phase II and historical data in order to account for any differences on prognostic variables (6).
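The effect of case mix can be illustrated with a much simpler stand-in for the multivariable techniques cited above: reweighting stage-specific historical response rates to the stage distribution of the Phase II sample (direct standardization). This is not the cited method, only a sketch of the underlying idea. All counts below are hypothetical and were chosen purely for illustration, as are the function names.

```python
# HYPOTHETICAL data, for illustration only.
# Historical cohort: stage -> (patients, responders)
historical = {"stage III": (40, 24), "stage IV": (160, 48)}
# Phase II sample: stage -> patients accrued (more lower-stage disease)
phase2_accrual = {"stage III": 30, "stage IV": 20}


def crude_rate(cohort):
    """Overall historical response rate, ignoring stage."""
    patients = sum(n for n, _ in cohort.values())
    responders = sum(r for _, r in cohort.values())
    return responders / patients


def case_mix_adjusted_rate(cohort, accrual):
    """Historical response rate reweighted to the Phase II stage distribution."""
    total = sum(accrual.values())
    return sum(
        (accrual[stage] / total) * (responders / patients)
        for stage, (patients, responders) in cohort.items()
    )


crude = crude_rate(historical)
adjusted = case_mix_adjusted_rate(historical, phase2_accrual)

print(f"Crude historical rate: {crude:.0%}")
print(f"Rate expected for the Phase II case mix: {adjusted:.0%}")
```

With these made-up numbers, the crude historical rate is 36%, but standard therapy alone would be expected to produce a 48% response rate in a sample with the Phase II stage distribution. A null set at the crude rate would therefore make an inactive agent look active.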
Our analysis shows that over half of Phase II trials require historical data to determine a null response rate. This proportion is likely to increase as more effective approaches are identified. We make some simple recommendations to improve the design and reporting of such trials. More appropriate use of historical data in Phase II design will improve both the sensitivity and specificity of Phase II trials for eventual Phase III success, avoiding both unnecessary definitive trials of ineffective agents and early termination of effective drugs for lack of apparent benefit.