In advanced-disease cancer settings, most RCTs are designed to evaluate new agents with limited preliminary efficacy data. A high proportion of these new agents will be ineffective, and many of them have serious toxicities. First, consider designs that compare a new therapy with placebo or observation. (In a lethal disease like cancer, this implies that no active therapy is available.[9]) There is no scientific or therapeutic value in showing that the new treatment is significantly worse than no treatment.[10] Therefore, observing no improvement in the primary outcome at some point into the study is often considered sufficient to demonstrate lack of benefit. The same rationale applies to designs that evaluate addition of a new therapy to the standard therapy (eg, standard plus new therapy v standard therapy plus placebo). A simple guideline is to stop the study if, after observing 50% of the design-specified total number of events (50% of the total information), the hazard ratio (HR) of the new over the standard treatment is greater than 1 (ie, the experimental treatment is doing worse).[11,12] However, if the experimental therapy is toxic, then waiting until 50% of the events have occurred (when accrual may be almost complete) may not be appropriate. In this case, monitoring should commence earlier (at approximately 25% to 30% of the total information). For example, Danish Head and Neck Cancer Study 10 was an RCT comparing darbepoetin alfa versus placebo in patients with advanced head and neck cancer receiving radiotherapy.[13] This study was stopped at the first scheduled interim analysis performed at 50% of total information (158 events, 522 patients, 87% of planned accrual) after observing a significantly worse outcome in terms of the primary end point of locoregional control (P = .01); 5-year OS rates were 38% and 51% on darbepoetin and control, respectively (P = .08).[13,14] The trial was stopped even though the primary end point had apparently not crossed the protocol-specified symmetric monitoring boundary.[15] In retrospect, it seems this study could have benefited from an asymmetric interim analysis plan scheduled to start at approximately 25% to 30% of information.
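The simple guideline described above (stop once monitoring has started if the estimated HR exceeds 1) can be sketched in a few lines of Python. This is an illustrative sketch rather than a protocol-ready rule: the function names, the 300-event design, and the HR values are hypothetical, and the stopping probability relies on the usual large-sample assumption that the log of the estimated HR is approximately normal with variance 4 divided by the number of events under 1:1 randomization.

```python
from math import log, sqrt
from statistics import NormalDist


def information_fraction(observed_events, planned_events):
    """Fraction of the design-specified total number of events observed so far."""
    return observed_events / planned_events


def stop_for_lack_of_benefit(observed_events, planned_events, hr_estimate,
                             start_fraction=0.5):
    """Return True if monitoring has started and the estimated HR
    (new over standard) is greater than 1."""
    started = information_fraction(observed_events, planned_events) >= start_fraction
    return started and hr_estimate > 1.0


def prob_hr_estimate_above_one(true_hr, observed_events):
    """Approximate probability that the estimated HR exceeds 1 at a single look.

    Assumption (not from the source text): log(HR estimate) is approximately
    Normal(log(true HR), 4 / events), as is commonly used for 1:1 randomization.
    """
    se = 2.0 / sqrt(observed_events)
    return 1.0 - NormalDist(mu=log(true_hr), sigma=se).cdf(0.0)


if __name__ == "__main__":
    planned = 300  # hypothetical design-specified total number of events
    print(stop_for_lack_of_benefit(150, planned, hr_estimate=1.10))  # look at 50%
    print(stop_for_lack_of_benefit(90, planned, hr_estimate=1.10))   # before 50%
    print(prob_hr_estimate_above_one(1.00, 150))  # inactive agent
    print(prob_hr_estimate_above_one(0.75, 150))  # hypothetical active agent
```

Under these assumptions, an inactive agent (true HR of 1) triggers the rule at a single 50% look about half the time, whereas an agent with a true HR of 0.75 triggers it with probability of roughly 0.04 when 150 events have been observed. Passing `start_fraction=0.25` gives the earlier start suggested for toxic experimental therapies.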
In designs where the experimental arm does not contain an active standard therapy to which it is being compared (new agent v standard active therapy), monitoring for lack of benefit requires some additional considerations (especially in an aggressive disease setting). The monitoring is often influenced by the consideration that accumulating evidence that the new therapy has similar activity to an established active treatment is valuable for treatment development,[16] especially if the new therapy has a favorable toxicity profile. However, in the presence of an active standard therapy, there is heightened pressure to stop the trial for lack of benefit, because ineffectiveness of the new agent implies that patients in the experimental arm are receiving an inferior treatment. (By contrast, in designs with placebo/observation control, there is usually no expectation that the new therapy will result in a nontrivial detriment in clinical outcome relative to the control, with the main concern being treatment toxicity.) Consider an RCT [17] that compared the matrix metalloproteinase inhibitor BAY12-9566 against gemcitabine in advanced pancreatic cancer. The trial was designed using a symmetric boundary, with the first comparative interim analysis scheduled at 50% of the total information. The study was stopped at the first comparative interim analysis (140 deaths observed, 277 patients accrued, 80% of planned accrual) on the basis of observing median survival of 3.2 months on the BAY12-9566 arm and 6.4 months on the gemcitabine arm (P = .0001). Gemcitabine is an approved therapy with documented survival and palliation benefit in this poor-prognosis population.[17] Therefore, for this setting, we would recommend a lack-of-benefit rule that is scheduled to commence at approximately 25% of information and requires less dramatic evidence of harm (relative to the standard active therapy) for stopping.
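One way to weigh an earlier start of monitoring against the risk of stopping a trial of an effective agent is to simulate the interim test statistic. The sketch below is a minimal Monte Carlo illustration, not the method used in any of the trials discussed. It assumes the standard Brownian-motion approximation for the sequentially computed test statistic, a Z statistic oriented so that positive values favor the new therapy, a final one-sided test at the .025 level, and a design with 90% power (drift of 1.96 + 1.2816). The function name, the `z_stop` threshold, and the look schedules are hypothetical.

```python
import random


def simulate_rule(looks, drift, z_stop=0.0, z_final=1.96, n_sims=20000, seed=1):
    """Monte Carlo operating characteristics of a lack-of-benefit rule.

    looks   : information fractions (each < 1) at which the rule is applied
    drift   : expected value of the final Z statistic (0 under the null)
    z_stop  : stop at a look if the interim Z statistic falls below this value
              (z_stop = 0 corresponds to an estimated HR greater than 1)
    Returns (probability of stopping early, probability of a positive trial).
    """
    rng = random.Random(seed)
    times = sorted(looks) + [1.0]
    stopped = positive = 0
    for _ in range(n_sims):
        b, t_prev, early = 0.0, 0.0, False
        for t in times:
            # Independent Brownian increment with drift over (t_prev, t].
            b += drift * (t - t_prev) + rng.gauss(0.0, (t - t_prev) ** 0.5)
            t_prev = t
            z = b / t ** 0.5  # interim Z statistic at information fraction t
            if t < 1.0 and z < z_stop:
                early = True
                break
            if t == 1.0 and z > z_final:
                positive += 1
        stopped += early
    return stopped / n_sims, positive / n_sims


if __name__ == "__main__":
    drift_90 = 1.96 + 1.2816  # assumed design: one-sided .025 level, 90% power
    for looks in ([], [0.5], [0.25, 0.5, 0.75]):
        null_stop, _ = simulate_rule(looks, drift=0.0)
        alt_stop, power = simulate_rule(looks, drift=drift_90)
        print(looks, "P(stop | null) =", null_stop,
              "P(stop | alt) =", alt_stop, "power =", power)
```

Comparing schedules in the output shows how the chance of stopping an inactive agent early trades off against loss of power. A negative `z_stop` (for example, `z_stop=-0.5`) demands stronger evidence of harm at each look, and a positive value demands less; the appropriate threshold for a given trial is a design decision that this sketch does not settle.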
After the start of monitoring, interim analyses should be repeated reasonably frequently (eg, every 10% to 20% of information, or every 6 or 12 months). Once interim analyses have started, frequent interim looks cost little in terms of the statistical power of the trial and reduce the possibility that the trial continues longer than necessary.[8] Consider the Evaluation of Neo-Recormon on Outcome in Head and Neck Cancer in Europe trial [18] that compared epoetin beta versus placebo in patients with head and neck cancer undergoing radiotherapy; the study completed accrual (351 patients) in April 2001. The publication [18] was unclear regarding who was doing the interim monitoring and stated that the study sponsor decided to omit the second of the two planned interim analyses. In April 2003, 2 years after the scheduled date of the omitted interim analysis, the final analysis was conducted. It revealed a significant impairment in cancer control and survival for erythropoietin versus placebo: locoregional PFS (primary end point) HR = 1.62 (P = .0008) and OS HR = 1.39 (P = .02).[18] In theory, monitoring rules with frequent interim analyses using asymmetric boundaries could have made the negative results available earlier. This would have been important for a number of ongoing and planned trials of this agent (or similar agents). For example, in response to the publication of the Evaluation of Neo-Recormon on Outcome in Head and Neck Cancer in Europe study results, a similarly designed study, Radiation Therapy Oncology Group 99-03 (radiotherapy with or without epoetin alfa in head and neck cancer), performed an unscheduled interim analysis.[19] On the basis of observing a negative trend in patient outcome for the epoetin alfa arm, the Radiation Therapy Oncology Group DMC closed the trial after only 148 of 372 planned patients were enrolled.[19]
In settings where survival events accumulate slowly relative to patient accrual, even well-designed survival-based futility rules are unlikely to stop the study before accrual is completed. Incorporating a reliable intermediate end point (eg, PFS) into the futility rule may lessen the number of patients treated with ineffective therapy and speed up evaluation of new agents.[20,21] Note that to be used for interim futility monitoring, an intermediate end point does not generally have to be validated as a surrogate end point for the primary outcome; often, a consistent association between the intermediate and primary end points is considered sufficient.[21]