Phase II clinical trials are designed to screen out experimental therapies with low efficacy before they proceed to a large-scale phase III trial. Often, we have multiple experimental therapies for efficacy screening with respect to the same patient population. Usually, the resources for clinical trials are limited, so we may want to choose only a small number of therapies, ideally one therapy, to be compared with a standard therapy through a phase III trial. In this setting, we may take one of two approaches: (i) Conduct multiple separate phase II trials, one for each experimental therapy, and evaluate them independently using a standard design method for a single-arm phase II trial; (ii) Conduct a single phase II trial with multiple arms, randomize patients into the arms, and choose the best arm(s) using a selection method. The former approach requires more research resources due to the multiplicity of the studies. Also, the individual phase II trials may have different patient characteristics, so the comparison among different therapies can be biased.
To avoid these issues, the second approach is attractive. However, the statistical approaches for analyzing randomized phase II trials are limited. Simon, Wittes and Ellenberg (1985) consider randomizing n patients to each of K treatment arms through a single stage and picking the winner, the arm with the largest estimated response rate, among them. This approach is based on the statistical methods of ranking and selection, the basic concepts of which were introduced over 50 years ago by Bechhofer (1954), with a substantial literature since that time. Simon et al. show that, depending on the design setting, n = 16 to 70 patients per arm are required for a 0.9 correct selection probability when there exists a difference of 0.15 in response rate among the K arms. Liu, LeBlanc and Desai (1999) point out that this approach has a high selection probability even when the treatment arms have the same response rates. Sargent and Goldberg (2001) consider a similar approach by allowing selection based on other factors when the difference in observed response rates is small.
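The correct selection probability of a single-stage pick-the-winner rule can be computed exactly from binomial distributions. The following is a minimal sketch, not the calculation of Simon et al.: it assumes one arm has response rate p_low + delta and the remaining K - 1 arms share p_low, and it breaks ties at random. The function names and the tie-breaking rule are assumptions made for illustration.

```python
from math import comb


def binom_pmf(n, p):
    """Return [P(X = x) for x = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def prob_correct_selection(n, k, p_low, delta=0.15):
    """Exact probability that the truly best arm is picked as the winner.

    Assumed setting: one arm has response rate p_low + delta and the other
    k - 1 arms have p_low. The arm with the most responders wins, and ties
    are broken at random (an assumption of this sketch).
    """
    best = binom_pmf(n, p_low + delta)
    other = binom_pmf(n, p_low)
    total = 0.0
    cdf_below = 0.0  # P(a given inferior arm has fewer than x responders)
    for x in range(n + 1):
        # j inferior arms tie with the best arm at x; the rest fall below x.
        for j in range(k):
            p_config = comb(k - 1, j) * other[x] ** j * cdf_below ** (k - 1 - j)
            total += best[x] * p_config / (j + 1)
        cdf_below += other[x]
    return total
```

With delta set to 0 every arm is equally likely to win, so the function returns 1/K. A winner is still declared in every trial, which is the concern raised by Liu, LeBlanc and Desai (1999).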
Thall, Simon and Ellenberg (1989) consider studies with one control and K experimental arms. In the first stage, n1 patients are randomized to each of the K experimental arms, and the winner is chosen for the second stage if its observed efficacy is larger than that for the historical control by 10%. The trial is stopped early if the winner does not satisfy this condition. In the second stage, n2 patients are randomized to each of the control arm and the selected experimental arm from stage 1, and a one-sided test is conducted to see if the experimental arm is better than the control. They require n1 = 30 to 80 patients and n2 = 90 to 140 patients under different design settings.
proposes a two-stage design for selection of the best of three treatments. In stage 1, cohorts of three patients are randomized to Arms A, B and C, and a decision is made to continue to accrue the next cohort or to stop and choose the better two arms. In stage 2, cohorts of two patients are randomized to the two arms chosen at stage 1, and a decision is made to continue to accrue the next cohort or to stop and choose the winner. Given the maximum number of patients available for the study, the stopping time for each stage is chosen to minimize the number of future failures using a Bayesian approach. This method requires rapid determination of responses so that the sequential tests can be applied.
Steinberg and Venzon (2002) propose two-stage designs for a phase II trial with two experimental arms. In stage 1, n1 patients are randomized to each arm. The trial is stopped after stage 1 if the difference in the number of responders between the two arms is larger than d, which is chosen so that, when the two arms have a difference of 0.15 in response rate, the probability of selecting the inferior arm is controlled at a specified level. Otherwise, the trial proceeds to stage 2 to randomize an additional n2
patients to each arm. After stage 2, the winner is chosen based on the cumulative responses through the two stages. Given n, one can choose n1 = n/2 or choose n1 to minimize the expected sample size for the specified response rates with a difference of 0.15. This approach does not control the overall error probabilities through the two stages.
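The operating characteristics of such a two-arm, two-stage selection rule can be enumerated exactly. The sketch below is an illustration under stated assumptions rather than Steinberg and Venzon's own procedure: it stops after stage 1 when the absolute difference in responders is strictly larger than d, and it breaks a tie in cumulative responders at random. The function name and the tie-breaking rule are hypothetical.

```python
from math import comb


def binom_pmf(n, p):
    """Return [P(X = x) for x = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def two_stage_selection(n1, n2, d, p_inf, p_sup):
    """Return (P(select the inferior arm), expected patients per arm).

    Assumptions of this sketch: stop after stage 1 if the absolute
    difference in responders exceeds d; otherwise select on cumulative
    responders after stage 2, breaking ties at random.
    """
    s1_inf, s1_sup = binom_pmf(n1, p_inf), binom_pmf(n1, p_sup)
    s2_inf, s2_sup = binom_pmf(n2, p_inf), binom_pmf(n2, p_sup)

    # Distribution of the stage-2 difference (superior minus inferior).
    diff2 = {}
    for a, pa in enumerate(s2_sup):
        for b, pb in enumerate(s2_inf):
            diff2[a - b] = diff2.get(a - b, 0.0) + pa * pb

    p_wrong = 0.0  # probability that the inferior arm is selected
    p_stop = 0.0   # probability that the trial stops after stage 1
    for a, pa in enumerate(s1_sup):
        for b, pb in enumerate(s1_inf):
            w = pa * pb
            gap = a - b
            if abs(gap) > d:
                p_stop += w
                if gap < 0:
                    p_wrong += w
            else:
                for g2, w2 in diff2.items():
                    cum = gap + g2
                    if cum < 0:
                        p_wrong += w * w2
                    elif cum == 0:
                        p_wrong += 0.5 * w * w2

    expected_n = n1 + (1.0 - p_stop) * n2
    return p_wrong, expected_n
```

For a fixed total n = n1 + n2, one could scan candidate values of n1 and d with this function and keep the pair with the smallest expected sample size among those meeting the target error level.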
Most of these existing methods do not accurately control the type I error and the power for the whole selection procedure. Furthermore, they do not allow unequal designs among different arms. We propose exact and efficient between-arm comparison methods for analyzing randomized phase II trials designed for independent evaluation of each arm. The proposed methods can also be used for comparing the response data from multiple single-arm trials on competing therapies with similar patient populations. We use the uniformly minimum variance unbiased estimator (UMVUE) since, as shown by Jung and Kim (2004), for two-stage phase II trial designs, the maximum likelihood estimator (MLE) can be seriously biased, and the efficiency of the UMVUE is comparable to that of the MLE. In Section 2, we briefly review the UMVUE for multi-stage designs. We derive between-arm comparison methods under various conditions in Section 3. Some numerical studies are conducted in Section 4.
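The bias of the MLE under a two-stage design, which motivates the use of the UMVUE, can be checked by direct enumeration. The sketch below assumes a hypothetical single-arm design with a futility boundary only: the trial stops after stage 1 if the number of responders among n1 patients is at most a1, and otherwise treats n2 more patients. The design and the function name are illustrative assumptions and are not taken from Jung and Kim (2004).

```python
from math import comb


def mle_bias_two_stage(n1, n2, a1, p):
    """Exact bias of the naive MLE of the response rate p.

    Hypothetical design: stop after stage 1 if responders X1 <= a1,
    otherwise treat n2 more patients. The MLE is the total number of
    responders divided by the number of patients actually treated.
    """
    expected = 0.0
    for x1 in range(n1 + 1):
        w1 = comb(n1, x1) * p**x1 * (1 - p) ** (n1 - x1)
        if x1 <= a1:
            # Early stop: the estimate uses stage-1 data only.
            expected += w1 * x1 / n1
        else:
            # Given x1, the MLE is linear in X2 and E[X2] = n2 * p.
            expected += w1 * (x1 + n2 * p) / (n1 + n2)
    return expected - p
```

Setting a1 = -1 removes the early stop, and the bias is then zero up to rounding. With any futility boundary 0 <= a1 < n1 the bias is negative, because the trial stops exactly when the stage-1 response rate is low.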