Adaptive randomization increases the probability of assigning patients to the treatment arm that appears to be doing better. There are many ways to do this. We consider the method of Thall and Wathen [12]:

$$\text{Assignment probability to the experimental arm} = \frac{[P(E > C)]^{a}}{[P(E > C)]^{a} + [1 - P(E > C)]^{a}} \tag{1}$$

where $a = 1/2$ and $P(E > C)$ is the posterior probability that the experimental treatment is better than the control treatment estimated from the data seen so far (using uniform prior distributions [13]). For example, if $P(E > C)$ equaled 0.05, 0.10, 0.3, 0.5, 0.7, 0.9, or 0.95, then patients would be assigned to the experimental treatment with probability 0.19, 0.25, 0.4, 0.5, 0.6, 0.75, or 0.81, respectively.
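The mapping from the posterior probability to the assignment probability can be checked numerically. The following is a minimal sketch, assuming formula (1) has the form p^a / (p^a + (1 - p)^a), with p standing for the posterior probability that the experimental treatment is better; the function name `assignment_probability` is ours and purely illustrative.

```python
def assignment_probability(p_exp_better: float, a: float = 0.5) -> float:
    """Probability of assigning the next patient to the experimental arm.

    p_exp_better: posterior probability that experimental beats control.
    a: tuning exponent (a = 1/2 in the example in the text).
    """
    if not 0.0 <= p_exp_better <= 1.0:
        raise ValueError("p_exp_better must lie in [0, 1]")
    numerator = p_exp_better ** a
    return numerator / (numerator + (1.0 - p_exp_better) ** a)


# Tabulate the assignment probability for the posterior values quoted in the text.
for p in (0.05, 0.10, 0.3, 0.5, 0.7, 0.9, 0.95):
    print(f"posterior = {p:.2f} -> assignment probability = "
          f"{assignment_probability(p):.2f}")
```

With a = 1/2 the formula pulls the assignment probability toward 1/2 relative to the posterior itself, which is why a posterior of 0.9 translates into an assignment probability of only 0.75.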
The estimator (1) of the assignment probability can be unstable at the beginning of the trial because there is little data at that point to estimate $P(E > C)$. One possibility is to have a run-in period with the randomization probability being 1/2 before starting to use (1); this approach will be used when discussing phase III trials. The approach we will use here for phase II trials is the one given by Thall and Wathen [12]: use formula (1) but with $a = n/(2N)$, where $n$ is the current sample size of the trial and $N$ is the maximum sample size of the trial. This approach yields assignment probabilities closer to 1/2 earlier in the trial. For example, if the current estimate of $P(E > C)$ is 0.9, the probability of assignment to the experimental treatment arm would be 0.57, 0.63, or 0.70 if the trial were one quarter, one half, or three quarters completed, respectively. From a practical perspective, one would in addition want to prevent the assignment probability from becoming too unbalanced, that is, being greater than 0.8 or 0.9 (extreme imbalances can create problems with the study interpretation if there are time trends; see Adaptive Randomization of Phase III Trials). We considered two versions of the adaptive design with the probability of arm assignment capped at 0.8 and 0.9.
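The effect of letting the exponent grow with accrual can be illustrated with a short sketch. It assumes the exponent is a = n/(2N) applied to the same formula p^a / (p^a + (1 - p)^a), and it assumes the cap is applied symmetrically, so that neither arm is assigned with probability above the cap; the text does not spell out the latter, and the function name is hypothetical.

```python
def tempered_assignment_probability(p_exp_better, n_enrolled, n_max, cap=0.8):
    """Assignment probability to the experimental arm with exponent a = n/(2N).

    The result is clamped to [1 - cap, cap]; the symmetric clamp is an
    assumption made for this sketch.
    """
    if not 0 <= n_enrolled <= n_max:
        raise ValueError("n_enrolled must lie between 0 and n_max")
    a = n_enrolled / (2.0 * n_max)
    numerator = p_exp_better ** a
    prob = numerator / (numerator + (1.0 - p_exp_better) ** a)
    return min(max(prob, 1.0 - cap), cap)


# Posterior of 0.9 at one quarter, one half, and three quarters of accrual.
for n in (25, 50, 75):
    print(n, round(tempered_assignment_probability(0.9, n, 100), 2))
```

At the very start of the trial (n = 0) the exponent is 0 and the assignment probability is exactly 1/2, whatever the posterior; at full accrual the exponent reaches 1/2, the value used in the untempered example.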
The table below displays the results for the adaptive randomization and 1:1 and 2:1 fixed randomization using the same phase II operating characteristics as described in the Fixed Unbalanced Randomization section. (No early stopping is allowed in this set of simulations to simplify the comparison of the designs.) The adaptive approach requires a total of 140 patients compared with 132 patients required for a fixed 1:1 randomization. Under the null hypothesis (response rates are 20% in both arms), the probability of response for a study participant is the same for all designs (20%). However, because they enroll more patients, the adaptive randomization designs have higher numbers of nonresponders compared with the 1:1 randomization (112.0 v 105.6; first row of data in the table). When the new treatment is beneficial, the adaptive randomization provides a slightly higher probability of response: 33.2% or 33.7% versus 30% under the design alternative (third row of data in the table). At the same time, the adaptive design continues to result in a higher number of nonresponders than 1:1 randomization except when the treatment effect exceeds the design alternative. With respect to limiting the probability of arm assignment in adaptive randomization, the table suggests that there is no meaningful difference between capping the probability at 0.8 versus at 0.9. Therefore, we will cap the assignment probability at 0.8 in the following discussion.
Average Proportion of Responders, No. of Nonresponders, and Overall Proportion Treated on the Experimental Arm for Various Randomized Phase II Trial Designs, Some of Which Use Adaptive Randomization
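The arithmetic behind the nonresponder comparison can be sketched as follows. The sample sizes (140 and 132) and response rates (20% and 40%) are taken from the text; the implied proportion of patients on the experimental arm is back-calculated from the quoted 33.2% response probability and is a derived quantity, not a value read from the table. The function names are illustrative.

```python
def overall_response_rate(prop_experimental, rate_control, rate_experimental):
    """Average response probability for a participant, given the arm mix."""
    return ((1.0 - prop_experimental) * rate_control
            + prop_experimental * rate_experimental)


def expected_nonresponders(total_n, response_rate):
    """Expected number of nonresponders in a trial of total_n patients."""
    return total_n * (1.0 - response_rate)


# Null hypothesis: 20% response in both arms, so the arm mix does not matter.
print(expected_nonresponders(140, overall_response_rate(0.5, 0.2, 0.2)))  # adaptive
print(expected_nonresponders(132, overall_response_rate(0.5, 0.2, 0.2)))  # fixed 1:1

# Design alternative (20% v 40%): fixed 1:1 gives a 30% response probability.
print(expected_nonresponders(132, overall_response_rate(0.5, 0.2, 0.4)))

# A 33.2% response probability implies this share of patients on the
# experimental arm (derived here, not quoted from the table).
implied_share = (0.332 - 0.2) / (0.4 - 0.2)
print(implied_share, expected_nonresponders(140, 0.332))
```

The sketch makes the trade-off explicit: a higher per-patient response probability does not guarantee fewer nonresponders when the adaptive design needs a larger total sample size.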
Trials with adaptive randomization frequently have interim monitoring based on the assignment probability. For example, Faderl et al [2] suggest stopping their trial and declaring the experimental treatment better than the control treatment if $P(E > C) > p_{\text{stop}}$, where $p_{\text{stop}} = 0.95$; Giles et al [14] use $p_{\text{stop}} = 0.85$ in a similar manner. These investigators also suggest stopping the trial if $P(E > C) < 1 - p_{\text{stop}}$ and declaring the control treatment better. However, this type of symmetric inefficacy/futility monitoring is inappropriate for the type of one-sided question we are considering here [15]. Instead, for simplicity, we will not consider inefficacy/futility monitoring in the simulations. If the trial reaches a maximum sample size without stopping, we declare that the experimental treatment does not warrant further study.
The following table displays the results of the simulations that use early stopping for the adaptive randomization and 1:1 fixed randomization. The maximum sample sizes (190 and 208 for fixed randomization and the adaptive design, respectively) and value of $p_{\text{stop}}$ (0.984) were chosen so that the trial designs had type I error of 10% and power of 90% for the alternative of 20% versus 40% response rates. In terms of probability of response for a participant, the two designs perform similarly: the differences are < 1% across the range of simulated scenarios. When compared by the number of nonresponders, the adaptive design does nontrivially worse (eg, on average 13 more nonresponders in adaptive design under the null hypothesis) except when the treatment effect exceeds the design target alternative.
Average Sample Size, Proportion of Responders, and No. of Nonresponders for Fixed 1:1 and Adaptive Randomized Phase II Trial Design
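A simulation of the adaptive design with early stopping for efficacy could be sketched as below. This is not the authors' simulation code. It assumes independent uniform priors (so each arm's posterior is a Beta distribution), an exponent of n/(2N) in the assignment formula, a symmetric cap of 0.8, responses observed immediately, and monitoring after every patient; the monitoring schedule in particular is our assumption, so the operating characteristics it produces need not match the table. All function names are hypothetical.

```python
import math
import random


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_exp_better(resp_e, n_e, resp_c, n_c):
    """Exact posterior probability that the experimental rate exceeds control.

    Uses independent Beta posteriors arising from uniform priors.
    """
    a_e, b_e = resp_e + 1, n_e - resp_e + 1  # experimental-arm posterior
    a_c, b_c = resp_c + 1, n_c - resp_c + 1  # control-arm posterior
    log_norm = log_beta(a_c, b_c)
    total = 0.0
    for i in range(a_e):  # finite sum, valid because a_e is an integer
        total += math.exp(log_beta(a_c + i, b_c + b_e) - math.log(b_e + i)
                          - log_beta(1 + i, b_e) - log_norm)
    return min(total, 1.0)  # guard against floating-point overshoot


def simulate_trial(rate_c, rate_e, n_max=208, p_stop=0.984, cap=0.8, rng=random):
    """Simulate one adaptive trial and return summary outcomes as a dict."""
    resp = {"E": 0, "C": 0}
    n = {"E": 0, "C": 0}
    p = 0.5  # posterior before any data, by symmetry of the uniform priors
    for enrolled in range(n_max):
        a = enrolled / (2.0 * n_max)  # tempering exponent n/(2N)
        prob_e = p ** a / (p ** a + (1.0 - p) ** a)
        prob_e = min(max(prob_e, 1.0 - cap), cap)  # assumed symmetric cap
        arm = "E" if rng.random() < prob_e else "C"
        n[arm] += 1
        if rng.random() < (rate_e if arm == "E" else rate_c):
            resp[arm] += 1
        p = prob_exp_better(resp["E"], n["E"], resp["C"], n["C"])
        if p > p_stop:  # efficacy stop: declare the experimental arm better
            break
    total = n["E"] + n["C"]
    return {"declared_better": p > p_stop, "n_total": total,
            "nonresponders": total - resp["E"] - resp["C"]}


# Small demonstration run under the null (20% v 20%) and alternative (20% v 40%).
demo_rng = random.Random(2024)
for label, rate_e in (("null", 0.2), ("alternative", 0.4)):
    runs = [simulate_trial(0.2, rate_e, rng=demo_rng) for _ in range(100)]
    print(label,
          "declared better:", sum(r["declared_better"] for r in runs) / len(runs),
          "mean N:", sum(r["n_total"] for r in runs) / len(runs),
          "mean nonresponders:", sum(r["nonresponders"] for r in runs) / len(runs))
```

Because no futility rule is included, a trial under the null hypothesis usually runs to the maximum sample size, which is where the extra nonresponders of the adaptive design accumulate.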
As an example based on a real trial, consider the adaptive randomized trial of clofarabine plus low-dose cytarabine versus clofarabine (control arm) for acute myeloid leukemia [2]. With the early stopping rules used, 63% of 54 patients in the experimental arm and 31% of 16 patients in the control arm had responses, yielding 31 nonresponders in total (56% response rate; 70 patients). This is a favorable situation for an adaptive randomization design because the response rates are so different between the arms. But even here, a fixed 2:1 randomization would arguably have been a better option: 38 and 19 patients in each arm would have yielded the same precision for estimating the treatment difference if one saw the same response rate difference between the two arms, yielding 27 nonresponders (53% response rate; 57 patients).
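The comparison in this example can be reproduced with a few lines of arithmetic. The sketch below assumes that responder counts are obtained by rounding rate times arm size to the nearest integer, and that "precision" is measured by the standard error of the difference in response proportions; both are our assumptions for illustration, and the function name is hypothetical.

```python
import math


def summarize_design(n_exp, n_ctrl, rate_exp=0.63, rate_ctrl=0.31):
    """Return (nonresponders, overall response rate, SE of rate difference)."""
    resp_exp = round(rate_exp * n_exp)      # assumed rounding to whole patients
    resp_ctrl = round(rate_ctrl * n_ctrl)
    total = n_exp + n_ctrl
    nonresponders = total - resp_exp - resp_ctrl
    se_diff = math.sqrt(rate_exp * (1 - rate_exp) / n_exp
                        + rate_ctrl * (1 - rate_ctrl) / n_ctrl)
    return nonresponders, (resp_exp + resp_ctrl) / total, se_diff


for label, (n_exp, n_ctrl) in (("adaptive, as observed", (54, 16)),
                               ("fixed 2:1", (38, 19))):
    nonresp, rate, se = summarize_design(n_exp, n_ctrl)
    print(f"{label}: {nonresp} nonresponders, response rate {rate:.0%}, "
          f"SE of difference {se:.3f}")
```

The two standard errors agree to within rounding, which is the sense in which the smaller 2:1 design estimates the treatment difference as precisely as the larger adaptive one.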