Phase II trial designs that ignore between-patient heterogeneity and do not allow for treatment-subgroup interactions may produce very large false positive and false negative error rates if efficacy varies by subgroup. Recent discussions of this problem were illustrated with scenarios and computer simulations. In this short communication, we reanalyzed a published phase II trial to highlight the need to consider between-patient heterogeneity and the possibility of treatment-subgroup interaction when designing and analyzing phase II studies. The single-arm trial evaluated amsacrine plus cytosine arabinoside, vincristine, and prednisone (a combination abbreviated as OAP) for adult acute leukemia, when standard treatment was adriamycin plus OAP. We carried out an analysis of covariance (ANCOVA) incorporating data from historical control patients who met eligibility criteria for the trial and received standard treatment at the study center in the years immediately preceding the trial. Patients administered experimental treatment and control patients were classified as having favorable or unfavorable prognosis according to their predicted probability of response to standard treatment. When the prognostic subgroup of patients was ignored, the response rates for experimental and standard treatment appeared similar. However, fitting an ANCOVA model determined that the effects of subgroup, treatment, and their interaction were statistically significant: experimental treatment was superior to standard treatment in patients with unfavorable prognosis and inferior to standard treatment in patients with favorable prognosis. This real-world example of treatment-subgroup interaction highlights the need to employ phase II designs that consider between-patient heterogeneity and the possibility that efficacy differs by subgroup.
Nearly all designs for phase II trials make the simplifying assumption that patients are alike in their probability of responding to experimental treatment. Often this assumption is not valid, and the patients entering the trial belong to subgroups with differing probabilities of response. When this occurs, phase II designs that ignore the possibility that efficacy differs by subgroup may produce extremely large false positive and false negative error rates within subgroups. Recent discussions of this problem were illustrated with scenarios and computer simulations. In this short communication, we reanalyzed a published phase II trial to highlight the need to consider between-patient heterogeneity and the possibility of treatment-subgroup interaction when designing and analyzing phase II studies.
The single-arm trial that we reanalyzed evaluated amsacrine (AMSA) plus cytosine arabinoside, vincristine, and prednisone (a combination abbreviated as OAP) for adult acute leukemia, when standard treatment was adriamycin plus OAP. For ethical reasons, experimental treatment was initially restricted to patients with unfavorable prognosis on standard treatment; as results on the trial accumulated, the criteria for administering experimental treatment were broadened until ultimately all patients received experimental treatment.
Subjects who received experimental treatment (n=134) were classified into two subgroups based on predicted probability of response (PPR) to standard treatment. The PPR was calculated from a logistic regression model that was constructed using 300 patients and validated using another 107 patients, all of whom met eligibility criteria for the trial and were treated at the same institution in the years immediately preceding the trial. Patients with a PPR of at least 0.60 were assigned to the favorable prognostic subgroup; all others were assigned to the unfavorable prognostic subgroup.
Data for the current reanalysis were obtained from tables in publications reporting the trial and the PPR model. The 407 patients described above (300 used to construct the PPR model and 107 used to validate it), all of whom received standard treatment, served as our historical control patients and were classified into two prognostic subgroups in the same manner as the subjects who received experimental treatment. We carried out an analysis of covariance (ANCOVA) in which the response was the proportion of complete remissions (CR) among all subjects treated (N), that is, CR/N.
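To make the idea of a treatment-subgroup interaction concrete, the following minimal sketch compares response rates (CR/N) within each prognostic subgroup and tests whether the treatment effect differs between subgroups. It is not the ANCOVA used in the reanalysis: it substitutes a simple Wald test on the difference of the two subgroup-specific risk differences. The function name `interaction_test` and the cell counts in `hypothetical_cells` are illustrative assumptions and are not the trial's data.

```python
import math


def interaction_test(cells):
    """Wald test for a treatment-subgroup interaction on the risk-difference scale.

    `cells` maps (subgroup, arm) -> (complete remissions, patients treated).
    Returns the per-subgroup rate differences (experimental minus standard),
    the interaction contrast (unfavorable minus favorable), the z statistic,
    and the two-sided p-value from the normal approximation.
    """
    diffs = {}
    variance = 0.0
    for subgroup in ("favorable", "unfavorable"):
        cr_exp, n_exp = cells[(subgroup, "experimental")]
        cr_std, n_std = cells[(subgroup, "standard")]
        p_exp = cr_exp / n_exp
        p_std = cr_std / n_std
        diffs[subgroup] = p_exp - p_std
        # Binomial variance of each observed proportion; the four cells are independent.
        variance += p_exp * (1 - p_exp) / n_exp + p_std * (1 - p_std) / n_std
    contrast = diffs["unfavorable"] - diffs["favorable"]
    z = contrast / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return diffs, contrast, z, p_value


# HYPOTHETICAL counts chosen only to illustrate a qualitative interaction;
# they are NOT taken from the AMSA trial or its historical controls.
hypothetical_cells = {
    ("favorable", "experimental"): (60, 100),
    ("favorable", "standard"): (80, 100),
    ("unfavorable", "experimental"): (40, 100),
    ("unfavorable", "standard"): (20, 100),
}

diffs, contrast, z, p_value = interaction_test(hypothetical_cells)
print(diffs, contrast, z, p_value)
```

In this hypothetical example the pooled response rate is identical in the two arms (100 of 200 in each), yet the experimental arm is worse in the favorable subgroup and better in the unfavorable subgroup, which is the pattern a pooled comparison cannot reveal.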
Response to treatment was as shown in Table 1. When the prognostic subgroup of patients was ignored, the response rates for experimental and standard treatment did not differ significantly (52.2% versus 60.2%, chi-square test, p>0.10). However, fitting an ANCOVA model determined that the effects of subgroup, treatment, and their interaction were statistically significant (Table 2). Thus experimental treatment was superior to standard treatment in patients with unfavorable prognosis and inferior to standard treatment in patients with favorable prognosis (Table 3).
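The pooled comparison above can be approximately reproduced with a Pearson chi-square test on a 2x2 table. The sketch below assumes that the counts behind the reported percentages were 70 of 134 (52.2%) for experimental treatment and 245 of 407 (60.2%) for standard treatment; these counts are back-calculated from the percentages and group sizes in the text, not read from Table 1. It also assumes no continuity correction, since the text does not say whether one was applied. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(cr_a, n_a, cr_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction) comparing two response rates.

    cr_a / n_a and cr_b / n_b are the complete remissions and patients treated
    in each group. Returns the chi-square statistic and its p-value.
    """
    observed = [[cr_a, n_a - cr_a], [cr_b, n_b - cr_b]]
    row_totals = [n_a, n_b]
    col_totals = [cr_a + cr_b, (n_a - cr_a) + (n_b - cr_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts, back-calculated from 52.2% of 134 and 60.2% of 407.
chi2, p_value = chi_square_2x2(70, 134, 245, 407)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the p-value exceeds 0.10, in line with the reported result that the pooled response rates did not differ significantly.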
The current reanalysis of a phase II trial detected significant treatment-subgroup interaction. This real-world example highlights the need for single-arm trial designs to take into account between-patient heterogeneity and allow for the possibility of differing treatment efficacy among subgroups.
Our reanalysis of the AMSA trial incorporated data from appropriate historical control patients into an ANCOVA. Investigators planning future phase II trials might consider employing the Bayesian design recently proposed by Wathen et al., in which historical subgroup effects of standard treatment are incorporated into an ANCOVA as informative priors. Because the design allows for treatment-subgroup interaction and subgroup-specific stopping rules, accrual may be stopped within one subgroup but continue in another.
However, the design of Wathen et al. would not have been appropriate for the AMSA trial, in which subjects were previously untreated patients for whom moderately effective standard treatment was available. For this reason, it would not have been ethical to administer experimental treatment simultaneously to those with favorable and unfavorable prognosis. Instead, experimental treatment was initially restricted to patients who had the most unfavorable prognosis on standard treatment. Only after their response rate was observed to be higher than predicted were the criteria for administering experimental treatment broadened, until ultimately they included all patients entering the trial.
The AMSA trial also evaluated toxicity (days of fever, episodes of infection, hyperbilirubinemia, and renal toxicity). According to the report of the trial, experimental treatment was associated with increased frequency of hyperbilirubinemia and fewer days of fever compared to standard treatment. If predicted risks of specific toxicities on standard treatment had been calculated similarly to PPR, subjects receiving experimental treatment and control patients could have been assigned to subgroups of toxicity risk, and ANCOVA models of specific toxicities could have been constructed. In this way, the study could have ascertained whether risk of toxicity, like treatment benefit, differed by prognostic subgroup. The resulting information could have made possible a subgroup-specific risk-benefit assessment of experimental treatment. Unfortunately, published data relating to the trial were not sufficiently detailed to permit us to perform such an analysis; this approach should be investigated in future trials.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.