Buzaianu and Chen apply strong curtailment to modify the two-stage select-and-test clinical trial design proposed by Thall et al. (1988). The modification reduces the expected sample size while maintaining overall power but requires continuous monitoring in stage 1. I will review the history of this type of design and discuss practical issues related to the use of strong curtailment that arise in trial conduct.
Buzaianu and Chen (BC) apply strong curtailment (SC) to modify the two-stage select-and-test design of Thall et al. (TSE; 1988) for comparing k experimental treatments, E1, …, Ek, and a standard treatment, S. In stage 1, BC constrain each arm to at most n1 patients and propose that the data be monitored continuously. If it is determined that the current best experimental treatment, E[k], cannot possibly be beaten by any other Ej, BC apply SC to stop stage 1 and proceed to stage 2. This greatly reduces the average sample size while maintaining the generalized power of the final comparative test.
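The curtailment check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not BC's actual rule, which is not reproduced in this discussion: outcomes are binary, every experimental arm is capped at n1 patients, the arms are compared by their response counts, and a possible tie is treated as "could still be beaten". The function name `unbeatable_leader` and its arguments are hypothetical.

```python
from typing import Optional, Sequence


def unbeatable_leader(responses: Sequence[int],
                      enrolled: Sequence[int],
                      n1: int) -> Optional[int]:
    """Return the index of the leading arm if no other arm can still catch it.

    responses[j] and enrolled[j] are the responses and patients observed so
    far on experimental arm j; n1 is the per-arm cap for stage 1.
    Returns None if some rival could still tie or overtake the leader.
    """
    if len(responses) != len(enrolled):
        raise ValueError("responses and enrolled must have the same length")
    for x, m in zip(responses, enrolled):
        if not 0 <= x <= m <= n1:
            raise ValueError("need 0 <= responses <= enrolled <= n1 on every arm")

    leader = max(range(len(responses)), key=lambda j: responses[j])
    for j in range(len(responses)):
        if j == leader:
            continue
        # Best case for arm j: every remaining patient responds.
        best_case = responses[j] + (n1 - enrolled[j])
        # Assumption: a possible tie counts as "could still be beaten".
        if best_case >= responses[leader]:
            return None
    return leader
```

With n1 = 10, `unbeatable_leader([8, 3, 2], [9, 6, 5], 10)` returns 0, because neither rival can exceed 7 responses, whereas `unbeatable_leader([5, 3, 2], [6, 6, 5], 10)` returns None, because the second arm could still reach 7.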
If response is associated with long-term benefit, such as extended overall survival time T, in the sense that Pr(T > t | response) > Pr(T > t | no response) for all t > 0, then SC has an ethical advantage: treating fewer patients with experimental therapies that have lower response rates means that more patients in the trial receive greater long-term benefit from their assigned treatment. By applying SC in the two-stage design, patients who would have been treated with inferior Ej’s in the later portion of stage 1 instead receive either S or E[k] in the more quickly initiated stage 2. This is an example of the more general process of weeding out inferior treatments via futility stopping rules.
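The link between response rate and long-term benefit can be made explicit with a short calculation. As an illustrative assumption not stated in the text, write p_j for the response probability under treatment j and suppose that survival depends on treatment only through response status. The law of total probability then gives:

```latex
\begin{align*}
\Pr(T > t \mid \text{treatment } j)
  &= p_j \Pr(T > t \mid \text{response})
     + (1 - p_j)\Pr(T > t \mid \text{no response}) \\
  &= \Pr(T > t \mid \text{no response})
     + p_j \bigl[\Pr(T > t \mid \text{response})
                 - \Pr(T > t \mid \text{no response})\bigr].
\end{align*}
```

Because the bracketed term is positive for all t > 0 under the stated inequality, the survival probability is increasing in p_j, so assigning a patient to a treatment with a higher response rate raises that patient's survival probability at every t.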
Two-stage select-and-test designs arose from consideration of scientific problems with the process of evaluating new treatments. Simon et al. (1985) proposed the then-radical idea of randomizing patients among E1, …, Ek for phase II evaluation of these treatments, to avoid the bias that arises from the common practice of conducting k separate single-arm trials and comparing the Ej’s on that basis. Still, if E[k] is later compared to S in a conventional phase III trial, the type I error will be inflated due to the preliminary selection of E[k]. Thus, there are two potential sources of bias: between-trial effects, if patients are not randomized among the Ej’s, and the selection of a best treatment, E[k]. TSE addressed these problems by including both the phase II selection and phase III test in one two-stage design, incorporating the stage 1 data on E[k] and S into the final stage 2 test statistic, while controlling the overall type I and type II error rates. A limitation of the TSE design is that it assumes binary outcomes in both stages, which may oversimplify treatment effects. Schaid et al. (1988, 1990) provided similar select-and-test designs that accommodate time-to-event outcomes. These are examples of “phase II–III” designs, for which many extensions, variants, and refinements have been proposed. Recent papers include Inoue et al. (2002), Stallard and Todd (2003), Todd and Stallard (2005), and Kelly et al. (2005). A review is given by Thall (2008).
BC change the TSE framework by constraining the stage 1 per-arm sample size to be ≤n1 and requiring continuous monitoring. Consequently, their design actually has many more than two stages. This raises important issues pertaining to the practical logistics of trial conduct. Designs that include sequential, outcome-adaptive interim decisions depend critically on the time, τ, required to treat each patient and observe his/her outcome, Y, the accrual rate, a, and the cohort size, c, used for applying the decision rules, with c = 1 corresponding to continuous monitoring. Decisions may include selecting a best treatment in a randomized trial, deciding whether to stop a phase II trial based on Y = 1 [≥50% tumor shrinkage at evaluation time τ], or choosing the dose for the next cohort in a phase I trial based on Y = 1 [toxicity before time τ]. In addition to decision rules and model parameters, trial logistics and thus design properties depend heavily on τ, a, and c.
Like TSE, BC assume implicitly that each patient’s outcome is observed immediately after treatment, τ = 0. Unlike TSE, BC use c = 1 rather than c = n1 in stage 1. In actual clinical trial conduct, c = 1 is practical only if outcomes can be observed quickly, such as nausea/vomiting following chemotherapy or recanalization following rapid treatment of stroke, both of which are observed within τ = 24 to 48 hours. In most oncology trials, however, it takes a nontrivial time τ to administer treatment and evaluate Y. This may greatly complicate outcome-adaptive decision-making, and monitoring Y continuously, periodically, or for cohorts of size c > 1 must account for both τ and a. In practice, accrual often overruns the nominal sample size at any interim decision by about aτ patients. For example, for the common value τ = 6 weeks in a chemotherapy trial that accrues on average a = 2 patients per week, aside from the impossible approach of setting c = 1 and suspending accrual for 6 weeks each time a patient is treated so that his/her Y may be observed, on average 12 new patients will be treated during each patient’s evaluation period. A common practical solution is to suspend accrual and apply the decision rules only after cohorts of, for example, c = 10 for a futility stopping rule in phase II or c = 3 in phase I dose-finding. Thall et al. (1999) proposed a simple “look ahead” rule that is the dose-selection analog of SC. For example, if only the first of three patients in the current cohort is fully evaluated but the outcomes (toxicity or not) of patients 2 and 3 will not alter the next chosen dose, then start the next cohort without delay. Similarly, if a futility rule in phase II says to stop the trial if [no. responses]/[no. patients fully evaluated] ≤3/10, then any experienced trialist will apply SC by stopping accrual when 0/7, 1/8, or 2/9 responses are observed. 
Such curtailment is done routinely in early-phase oncology trials, and the properties of designs that include such rules are generally derived by computer simulation.
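The arithmetic behind the overrun of about aτ patients, the curtailed futility boundaries 0/7, 1/8, 2/9, and the "look ahead" idea can be checked with a short sketch. All function names here are hypothetical, and the look-ahead helper is a generic brute-force illustration for binary outcomes, not the rule of Thall et al. (1999) as published.

```python
from itertools import product
from typing import Callable, Hashable, List, Sequence, Tuple


def expected_overrun(accrual_rate: float, eval_time: float) -> float:
    """Average number of patients accrued while one outcome is evaluated (a * tau)."""
    return accrual_rate * eval_time


def curtailed_stopping_points(max_responses: int, n_max: int) -> List[Tuple[int, int]]:
    """Earliest (responses, evaluated) pairs at which the rule
    'stop if responses <= max_responses out of n_max' is already certain."""
    points = []
    for r in range(max_responses + 1):
        # Even if all n_max - n remaining patients respond, the total stays
        # <= max_responses once n >= n_max - max_responses + r.
        points.append((r, n_max - max_responses + r))
    return points


def decision_is_determined(observed: Sequence[int],
                           n_pending: int,
                           rule: Callable[[Sequence[int]], Hashable]) -> bool:
    """True if `rule` gives the same decision for every possible completion
    of the n_pending binary outcomes not yet observed."""
    decisions = {
        rule(list(observed) + list(rest))
        for rest in product((0, 1), repeat=n_pending)
    }
    return len(decisions) == 1


def futility_rule(ys: Sequence[int]) -> bool:
    """Example rule from the text: stop if at most 3 of 10 patients respond."""
    return sum(ys) <= 3


print(expected_overrun(2, 6))            # a = 2 per week, tau = 6 weeks
print(curtailed_stopping_points(3, 10))  # boundaries for the <= 3/10 rule
print(decision_is_determined([0] * 7, 3, futility_rule))
```

The second call reproduces the boundaries quoted in the text, together with the nominal 3/10. The last call confirms that with 0 responses in 7 evaluated patients the stopping decision no longer depends on the three pending outcomes.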
In contrast, two-stage designs have the practical logistical advantage that accrual only need be suspended once, between stages, for the period τ, in order to harvest the stage 1 data and compute the interim decision criteria.