Search tips
Search criteria 


Logo of blackwellopenThis ArticleFor AuthorsLearn MoreSubmit
Statistics in Medicine
Stat Med. 2017 September 10; 36(20): 3137–3153.
Published online 2017 June 13. doi:  10.1002/sim.7367
PMCID: PMC5575545

Estimation in multi‐arm two‐stage trials with treatment selection and time‐to‐event endpoint


We consider estimation of treatment effects in two‐stage adaptive multi‐arm trials with a common control. The best treatment is selected at interim, and the primary endpoint is modeled via a Cox proportional hazards model. The maximum partial‐likelihood estimator of the log hazard ratio of the selected treatment will overestimate the true treatment effect in this case. Several methods for reducing the selection bias have been proposed for normal endpoints, including an iterative method based on the estimated conditional selection biases and a shrinkage approach based on empirical Bayes theory. We adapt these methods to time‐to‐event data and compare the bias and mean squared error of all methods in an extensive simulation study and apply the proposed methods to reconstructed data from the FOCUS trial. We find that all methods tend to overcorrect the bias, and only the shrinkage methods can reduce the mean squared error. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Keywords: multi‐arm trial, time‐to‐event, adaptive design, bias, empirical Bayes

1. Introduction

Multi‐arm trials, which compare multiple treatment arms to a common control group, are more efficient than traditional two‐arm designs as only a single control group is used 1. Additional efficiency gains can be achieved by selecting treatments at one or more interim analyses 2. Most of the recent research in this area focuses on how such studies are designed for a normally distributed endpoint (e.g., 3, 4, 5, 6). Approaches that are applicable to non‐normal and particularly time‐to‐event endpoints are less frequent in contrast (e.g., 7, 8, 9, 10), although many examples of multi‐arm trials with survival endpoints, such as the three‐arm SANAD trial in generalized and unclassifiable epilepsy 11, the four‐arm STAMPEDE trial in prostate cancer 12, or the five‐arm FOCUS trial in poor prognosis advanced colorectal cancer 13, exist.

Adaptive treatment selection at an interim analysis increases the chance of a successful trial and avoids wasting resources on unpromising treatments 1. It is well known, however, that selecting promising treatments based on the observed data leads to overestimation of the treatment effect in the selected treatment arms and an underestimation of the treatment effect in the dropped treatment arms 14. A variety of different strategies to remove or reduce the bias has been developed for normal data. The uniformly minimum variance conditionally unbiased estimator (UMVCUE) in a two‐stage design when selecting the best treatment was first derived by Cohen and Sackrowitz 15 and extended to more general selection rules in 16. A simple iterative method for calculating a bias‐corrected estimate based on the estimated conditional selection bias was proposed by Stallard and Todd 17. An extension of the shrinkage estimator of Hwang 18 to two‐stage adaptive designs has been proposed by Carreras and Brannath 19. They show that the shrinkage estimator has smaller Bayes risk than the maximum likelihood estimator (MLE) and also a smaller mean squared error (MSE) in many scenarios.

Common to the methods for normal data is that they assume equal variance between treatment arms and do not consider the control group, because in the normal case, the conditional selection bias of the MLE is the same with or without a control group (although the equal variance assumption has been relaxed for the UMVCUE in recent work 20, which also explicitly accounts for a shared control group). However, in the context of survival trials and for the estimation of log hazard ratios (log‐HRs), a control group is required. The challenge thereby is that a control group, which is common to all treatment groups, induces a correlation among the log‐HR estimators. Moreover, the assumption of equal and known variances is unrealistic in the survival setting as the variances depend on the number of observed events. In this paper, we therefore adapt the method of Stallard and Todd 17 to time‐to‐event endpoints and consider shrinkage estimators in this context. In an extensive simulation study, we then compare the bias and MSE of these estimators and analyze reconstructed data from the FOCUS trial.

In Section 2.2, we derive explicit formulas for the selection bias of the MLE in the selected, as well as in the dropped treatment arms under a Cox proportional hazards model, when the treatment with the largest observed effect at an interim analysis is selected. We derive two shrinkage estimators in Section 2.3 and adapt the iterative Stallard–Todd method to survival data in Section 2.4. The simulation study in Section 3 compares the bias and MSE of these methods for different number of treatment groups, allocation ratios, and total number of events. An analysis of reconstructed data from the FOCUS trial is presented in Section 4 before we conclude with a discussion.

2. Methods

We consider a two‐stage design where the best treatment, that is, the treatment with the smallest estimated log‐HR, is selected at the interim analysis. Let S be the index of the selected treatment. Note that S is a random variable depending on the Stage 1 data.

2.1. Model

Consider K2 treatment groups and a common control group. Denote the total sample size by n, and suppose patients are allocated to group k in Stage 1 with probability p k (k = 0, …, K). Here, the index k=0 represents the control group. Assuming proportional hazards between each treatment group and the control group, we consider the following Cox proportional hazards model:


where β is the vector of log‐HRs and Z j=(Z 1j, …, Z Kj) is the vector of binary treatment indicators, that is, Z kj=1 if patient j is in group k and 0, otherwise (k = 1 …, K, j = 1, …, n). The joint model in Eq. (1) is equivalent to K separate Cox models with the same baseline hazard function λ 0, which is the hazard in the common control group. In a sequential survival trial with more than one analysis time point, we need to distinguish between the calendar time, measured from the start of the trial, and the individual survival time of each patient. Associated with each patient is an entry time R j, measured from the start of the trial, a survival time T j, and a censoring time C j, both measured from entry of the patient into the trial. At each calendar time t, only the minimum Yj(t)=min(Tj,Cj,tRj) and the censoring indicator Δj(t)=I{Tjmin(Cj,tRj)} can be observed. The observed data of all patients at calendar time t are given by


where n(t)=jIRjt is the number of patients recruited by calendar time t.

2.2. Maximum likelihood estimator

The MLE of β in Eq. (1) at calendar time t is denoted by β^MLE(t)=(β^1MLE(t),,β^KMLE(t)) and is obtained by maximizing the partial‐likelihood of the Cox model based on the data D(t). In practice, interim and final analyses are usually not scheduled at fixed calendar times but after a certain number of events have been observed. Let d j be the total number of events of the j‐th analysis and d kj the number of events in the k‐th group at the j‐th analysis, such that d j=d j0 + … + d jK (k = 0, …, K, j = 1, 2). Denote by β^MLE,j the vector of estimated log‐HRs β^kMLE,j (k = 1, …, K, j = 1, 2) at the interim and final analyses, respectively. The estimated log‐HR in the selected treatment group at the interim analysis is β^SMLE,1=mink=1,,K{β^kMLE,1}. The true treatment effect β S of the selected treatment is a random variable, and in general, βSminkβk. We say that an estimator β^SMLE,j of β S at Stage j is unbiased if E[β^SMLE,j]=E[βS] (j=1,2) 15.

The MLEs are correlated because of the common control group. If β≈0 and the censoring time C is stochastically independent from Z, we show in Section A.1 in the Appendix that the correlation ρ kl between β^kMLE,j and β^lMLE,j is approximately


Note that when the size of the common control group decreases, the correlation of the MLEs increases. In the special case p0=1k=1Kpk=1Kp, where all treatment groups are the same size, the correlation ρ between any two log‐HR estimators at the interim analysis is approximately (1−p 0)/(1+(K−1)p 0).

The derivation of the selection bias is based on the joint asymptotic distribution of the vectors β^MLE,1 and β^MLE,2. We have approximately


where Σj is the asymptotic covariance matrix of β^MLE,j (j=1,2) (see Section A.1 in the Appendix).

2.2.1. Estimation in a two‐stage design

Denote by δ^SMLE the increment of the MLE based on all data observed after the interim analysis (including events observed in Stage 2, from patients recruited in Stage 1). This is an (asymptotically) unbiased estimator of the true log‐HR, which is inefficient, because it ignores all of the Stage 1 data. Formally, δ^SMLE can be defined via the score process U of the Cox model and the corresponding Fisher information I conditional on the selection S. Asymptotically U 2SU 1S and U 1S are stochastically independent, because of the independent increments property of U. Thus, the increment of the MLE given by


is asymptotically independent from β^SMLE,1 conditionally on S. At the final analysis, the MLE of the selected treatment can be defined in two different, but asymptotically equivalent, ways, either by β^MLE,2, that is, from all data accrued until the final analysis or by a weighted sum of the first stage MLE and the increment from first to second stage, that is,


where w[set membership][0,1] is a pre‐specified constant. We will consider both estimators in our simulation study. Usually, w will be equal to the information fraction at the planned time of the interim analysis, that is, w=d 1/d 2.

2.2.2. Selection bias and mean squared error

The bias of the MLE of the selected treatment at analysis j can be written as


The true treatment effect in the selected treatment arm is overestimated, when the best performing treatment arm is selected, because


One can see that the true treatment effect in the dropped treatment arms is underestimated.

For normal means, the selection bias is maximal if all means are equal 19. Their result and its proof still hold asymptotically for the estimated log‐HRs when the signs are reversed; that is, for given variances and correlation, the absolute value of the bias is maximal if all true log‐HRs are equal. We obtain a lower bound


for the selection bias of β^SMLE,j depending on the asymptotic covariance matrix Σj. This bound is attained when the true log‐HRs are all equal. Because the correlations are always nonnegative in our setting, it is an immediate consequence of Slepian's lemma 21 that for fixed variances, the lower bound C jj) is minimal when all correlations are 0. This corresponds to a situation where every treatment group has its own independent control group.

There are two possible ways of defining the MSE, because β S is a random variable. First, the MSE of predicting β S


which is the definition that we will use in our simulation study, or alternatively the MSE of estimating E[β S]:


Note that while the bias does not depend on the correlation in the normal case, this is not the case for the variance. The MSE is always larger with control group than without a control group:


where X k is the sample mean and θ k is the true mean in group k = 0, …, K.

2.2.3. Conditional selection bias

We will now give explicit expressions for the selection probabilities and the conditional selection biases at the interim and final analyses. The detailed derivations can be found in Section B.1 in the Appendix.

Let β^kMLE,1 be the vector β^MLE,1 with the k‐th element removed. Denote by


the survival function of the (K−1)‐dimensional normal distribution with mean β k(x) and covariance matrix Σk(x), where β k(x) is the conditional mean and Σk(x) is the conditional covariance matrix of β^kMLE,1 given β^kMLE,1=x, respectively. The probability of selecting treatment k is given by


where σ1k=Σ1kk. At the interim analysis, the conditional selection bias cb1k+ of β^kMLE,1 conditional on {S=k} is


In the dropped treatment arm, the MLE is biased in the opposite direction; that is, the treatment effect is underestimated, and the bias cb1k of β^kMLE,1 given {Sk} is


We derive now an explicit formula for the conditional selection bias at the final analysis. Let uk=Σ2k.Σ11, where Σ2k· is the k‐th row of the matrix Σ2. Then, the conditional selection bias at the final analysis can be written as




In the dropped treatment arm, analogously to Eq. (3),


We have shown in Section 2.2.2 that the absolute value of the bias is maximal when the correlation is 0; that is, each treatment group has its own independent control group. In this case, Eqs (4) and (5) reduce to




respectively. Thus,


Thus, each additional event observed after the interim analysis reduces the conditional bias, and the bias at the final analysis will be smaller than at the interim analysis. We also see that only the total number of observed events matters and not whether the event comes from a patient recruited before or after the interim analysis. This is especially interesting in the dropped treatment arms, because it quantifies the bias reduction that we can expect, when continuing follow‐up until the final analysis. In our simulation study in Section 3, we see that this relation also holds approximately in the case of a common control group. This result is analogous to the normal case, for example 22.

In a two‐arm trial with a single treatment group, recruitment may be stopped early at the interim analysis for the lack of benefit, when the estimated log‐HR is larger than some pre‐specified threshold c, but follow‐up is continued until the final analysis. The selection bias in this case can be obtained as a special case by setting K=2,β^2MLE,1=c, and letting σ120. This leads to


We see that Eq. (6) still holds; that is, additional events reduce the conditional selection bias. This confirms the simulation study results of 23.

2.3. Shrinkage estimator

We consider two different approaches to deriving the shrinkage estimator. The first approach follows closely the original derivation of 18, by considering the posterior mean of the selected treatment when the covariance matrix of the data is known, but arbitrary, and then replacing the prior mean and variance by their sample estimates. The second approach first scales the estimators by the square root of their covariance matrix to obtain independent estimators with unit variances and then rescales the shrinkage estimates back to the original scale.

In the normal case with known and equal variances and four or more treatment groups, the shrinkage estimator can be derived as an empirical Bayes estimator 24. Let X k be the sample mean and θ i the true mean in group k ( k=1,,K4). We assume normal priors for the true means θ k~N(μ,τ 2) and that the sample mean X k conditional on θ k is normally distributed with mean θ k and known variance σ 2 (k = 1, …, K). The posterior mean of θ k conditional on X k is given by (1−C)μ+C X k and the posterior variance by σ 2 C, where C = τ 2/(τ 2+σ 2). Thus, the posterior mean of θ S is E[θ S|X 1, …, X K] = (1−C)μ+C X S. To obtain an estimator of the posterior mean of θ S, the unknown quantities μ and C are estimated by the overall sample mean X¯ and


respectively. Ĉ is an unbiased estimator of C 24. Because C0, the estimator Ĉ is replaced by Ĉ+=max(Ĉ,0). Therefore, the shrinkage estimator is given by


The aforementioned estimators are only defined for K4. The best linear unbiased predictor can be used instead when K3. The best linear unbiased predictor is the same as the empirical Bayes estimator but with K−3 replaced by K−119.

2.3.1. Shrinkage estimator for the log hazard ratios

The extension from independent equal variances to the general case of an arbitrary covariance matrix is straightforward. In order to simplify the notation, we write β^ in the following for the MLE at Stage j instead of β^MLE,j. Assume now


where I is the K×K identity matrix and the Σ the covariance matrix, which we assume to be known for the moment. Note that we still put independent priors on the true treatment effects. The posterior mean of β given the data β^ is


where the shrinkage factor C is now the matrix I−Σ(τ 2 I+Σ)−1. The marginal covariance matrix of β^ is τ 2 I+Σ. Because Σ is symmetric, there exists an orthogonal matrix U and a diagonal matrix D such that D=UΣU. Thus, Uβ^ is a vector of independent normal distributed variables. The marginal covariance matrix of Uβ^ is τ 2 I+D. The following iterative procedure described by Morris 24 to calculate the MLE τ^2 of τ 2 can now be applied to the transformed data Uβ^:

  1. Start with an initial guess of τ^2.
  2. Define weights wk=(τ^2+Dkk2)1 (k = 1, …, K).
  3. Estimate τ 2 by
  4. If τ^2<0, set τ^2=0.
  5. Repeat Steps 2 and 4 until convergence.

As Σ is unknown, we apply this iterative method with the diagonal matrix D^=ÛΣ^Û instead of D, where Σ^ is an estimator of the unknown covariance matrix Σ and Û is an orthogonal matrix to diagonalize Σ^. We can now define an estimator of the matrix C by


Similarly as before, we replace the unknown prior mean μ by the overall log‐HR β¯, which is calculated by pooling all treatment groups together and fitting a Cox proportional hazards model to the pooled data. The shrinkage estimator is defined by


The shrinkage estimator of the selected treatment is β^SEB. If τ^2=0, then Ĉ=0, and the shrinkage estimator is equal to the overall log‐HR β¯. We will call this the EB shrinkage method.

We could have diagonalized with the orthogonal transformation U from the beginning; that is, we could have used U θ and Uβ^, applied the shrinkage, and then transformed back to the original coordinates. This would have led to the shrinkage factor:


which is exactly the same as C, because U is orthogonal.

2.3.2. Shrinkage estimator of the scaled log hazard ratios

Instead of only diagonalizing Σ^, we standardize it to the identity matrix and then apply the method of the independent equal variances case. This leads to an alternative shrinkage estimator. Consider the problem of shrinkage towards 0 when Σ is known. We start by transforming the vector β^ with covariance matrix Σ to a vector β * with independent components and unit variances. Define the scaled MLEs:


When we use a N(μ *,τ *2 I) prior on the scaled treatment effects Σ−1/2 β, we know from the independent case with equal variances that the empirical Bayes estimator is given by Ĉ+β^, where


We have


The right hand side is the Wald test statistic for testing equality of all treatment effects, which is asymptotically equivalent to the log‐rank test statistic in the Cox model in Eq. (1). When shrinking towards the overall log‐HR β¯, we take the log‐rank test statistic Z, which directly compares the treatment groups, that is, without the control group and treatment group 1 being the new baseline, instead. Therefore,


As in the normal case, we replace K−3 by K−1 if K<4. The log‐rank test statistic Z is asymptotically χK12 distributed. This is the justification to consider the shrinkage estimator:


which we will call the LR shrinkage method.

2.3.3. Two‐stage shrinkage estimators

We define a two‐stage estimator similar to the two‐stage shrinkage estimator proposed by Carreras and Brannath 19 using the increment of the MLE:


where β^S,1 is one of the two shrinkage estimators at the interim analysis defined previously.

2.3.4. Confidence intervals

Our focus is on point estimation. Construction of correct confidence intervals is difficult and a topic of ongoing research. No adequate methods for constructing confidence intervals in this setting are available apart from bootstrap resampling. However, bootstrap confidence intervals cannot properly account for the variability introduced by the selection process, because the selection in a resampled data set may be different from the decision in the original data set, but the decision to stop recruitment in the dropped treatment arms cannot be reversed in retrospect. Despite this, such intervals are useful to compare the relative merits of the estimators. We use a bootstrap approach conditional on the selection S 0 in the original data set similar to 25:

  1. Resample data accrued until final analysis;
  2. Determine calendar time t * of interim analysis in resampled data set;
  3. Obtain Stage 1 data from resampled data set, and if necessary, recensor event times of overrunning patients at t *;
  4. Calculate Stage 1 MLEs β˜k (k=1,…,K) from Stage 1 data;
  5. If β˜S0=minkβ˜k, calculate bias‐corrected estimates.

These steps should be repeated until a large enough number of estimates have been produced in order to calculate empirical quantiles of the sampling distributions with the desired accuracy. Note that the interim analysis in each resampled data set is conducted at the same information fraction as in the original data set, although this may correspond to a different number of events and calendar time.

2.4. Stallard–Todd estimator

The bias‐corrected estimator of Stallard and Todd 17, 26 is the solution β^ST,1 of


where b(β˜)=(b1(β˜),,bK(β˜)) is an estimate of the vector of conditional biases when the true log‐HR is β˜ (Section 2.2.3). A simple fixed point iteration can be used to find β^ST,1:

  1. β˜0=β^MLE,1;
  2. β˜n+1=β^MLE,1b(β˜n);
  3. Repeat Step 2 until convergence.

The conditional biases depend on the unknown covariance matrix of β^MLE,1 (Section 2.2.3), which we replace with the estimator Σ^1 from the Cox model. The estimation of the vector b(β˜) is computationally complex, because it requires the numeric evaluation of integrals of multivariate cumulative distribution functions. Transforming the vector β^MLE,1 in the same way as in Section 2.3.2 and then solving Eq. (9) with the identity matrix as covariance matrix does not work, because these solutions are in general not solutions of the original equation.

We have defined the Stallard–Todd estimator only at the interim analysis. A similar definition is possible at the final analysis using the bias formulas derived in Section 2.2.3, that is solving Eq. (9) with β^MLE,2 instead of β^MLE,1. However, in our simulations, the iterative procedure diverged most of the time. This also happened in the data example in Section 4. This seems to be related to the fact that the selection bias depends on the marginal covariance matrix, but only the conditional covariance matrix given the selection can be estimated at the final analysis. We therefore only use the two‐stage estimator at the final analysis defined by


Confidence intervals for the Stallard–Todd estimator can only be obtained by resampling, as its (asymptotic) sampling distribution is not known. Simulations not reported here suggest that asymptotic normality does indeed hold.

2.5. Bias‐corrected Kaplan–Meier estimator

The Kaplan–Meier estimator in the selected treatment group will be biased upwards. The Kaplan–Meier estimator in the control group will always be (asymptotically) unbiased. We can define a bias‐corrected Kaplan–Meier estimator based on any of the bias‐corrected estimators β^S(t) of the log‐HR by using the relationship between the survival functions in the control and the treatment groups under the Cox model (Eq. (1)). The bias‐corrected Kaplan–Meier estimator at a time t is defined as


In simulations not reported here, we found that the selection bias of the Kaplan–Meier estimator in the selected treatment arm is negligible even when all true treatment effects are equal. We would expect the selection bias to be more pronounced, when selection was performed based on, for example, the estimated survival probability at a specific time instead of the estimated log‐HRs at interim.

3. Simulations

We consider three base scenarios in our simulation study, which are similar to those in 19 and cover a range of configurations of true log‐HRs. It is of course impossible to cover all possible configurations in a simulation study.

  1. Constant: All hazard ratios (HRs) are equal to 1. H R k=1(k = 1, …, K).
  2. Linear: A linear trend from 0.6 to 1: H R 1=1,H R kH R k+1=0.4/(K−1)(k = 2, …, K).
  3. Peak: All but one HRs are 1: H R 1=0.6,H R k = 1(k = 2, …, K).

The baseline hazard was log(2)/12, which corresponds to a median survival time of 12 (months) in the control group. We compare the bias and MSE of all methods for an increasing number of treatment groups K. A maximum of 200 patients are allocated to each group. The interim analysis was conducted after (K+1)·50 observed events in all treatment groups and control group combined. This amounts to approximately 50 events in each group when the event rates are similar. The final analysis was performed after 200 events in the selected treatment group and control group combined.

Figure Figure11 shows the bias in 105 simulations of each method in each of the base scenarios as the number of treatment groups K ranges from 2 to 6 at the interim analysis, the final analysis, and for the two‐stage estimators. The corresponding square root of the MSE is shown in Figure Figure2.2. We observed some convergence problems of Stallard–Todd estimator, especially in the case K=2 with non‐convergence in up to 20% of the simulations. In these cases, the MLE was used as a fallback, which slightly increases the bias but reduces the MSE. Overall the results are very similar to those of 19 in the normal case.

Figure 1

Bias as function of the number of treatment groups in each of the three base scenarios at the interim and final analyses and for the two‐stage estimators. The Stallard–Todd estimator is only defined at the interim analysis and as two‐stage ...

Figure 2

Root mean squared error as function of the number of treatment groups in each of the three base scenarios at the interim and final analyses and for the two‐stage estimators. The Stallard–Todd estimator is only defined at the interim analysis ...

The simulation results confirm the theoretical result that the bias of the MLE is maximal, when all treatment effects are equal (“Constant” scenario). Both shrinkage estimators clearly outperform the MLE and the Stallard–Todd method in terms of bias and MSE in this case. In the “Linear” scenario, the LR shrinkage estimator has smaller bias than the MLE, when there are four or more treatment groups and about the same MSE for any number of treatment groups. Both shrinkage estimator win over the Stallard–Todd method except in the K=2 case (which is identical to the K=2 case in the “Peak” scenario). In the “Peak” scenario, the MLE is practically unbiased and beats all other methods in terms of bias and MSE. However, the LR shrinkage estimator is still very close to the MLE and has a smaller MSE than the Stallard–Todd estimator.

While the estimates of the Stallard–Todd method are on average not far from those of the other methods, a closer look at the distribution of the estimated HRs at the interim analysis (Figure (Figure3)3) reveals that the Stallard–Todd method produces more extreme and skewed results, than the other methods, especially in the “Constant” scenario.

Figure 3

Boxplot of estimated hazard ratios at the interim analysis in the three scenarios for K=4. MLE, maximum likelihood estimator; EB, EB shrinkage estimator; LR, LR shrinkage estimator; ST, Stallard–Todd estimator.

The influence of the allocation ratio on the bias and MSE of the two‐stage estimators is shown in Figure Figure4.4. A maximum of n patients were allocated to each treatment group and a maximum of m patients to the control group. The allocation ratios m:n considered were 3:1, 2:1, 1:1, 1:2, and 1:3. This corresponds to correlations between the estimated log‐HRs of approximately 1/4, 1/3, 1/2, 2/3, and 3/4. The bias increases with increasing correlation in line with our theoretical result. The MSE apparently attains its minimum at a correlation of 0.5. This suggests that patients should be equally allocated to all groups.

Figure 4

Bias and Root mean squared error (RMSE) of the two‐stage estimators as functions of the correlation for K=4 treatment groups in each of the three base scenarios.

The bias and MSE decrease when the number of observed events increases (Figure (Figure5).5). There is a considerable finite sample bias in addition to the selection bias, when only a few events (<25 events per group) are observed.

Figure 5

Bias and Root mean squared error (RMSE) of the two‐stage estimators as functions of the number of events at the final analysis for K=4 treatment groups in each of the three base scenarios.

The largest effect considered is a HR of 0.6. The MLE is already practically unbiased when the HR is 0.6 for one treatment and 1 for all other treatments as can be seen in Figure Figure1.1. An even larger effect would not provide any additional insight, although estimation also works well for values smaller than 0.6.

4. Application

We apply our methods to data reconstructed from the published Kaplan–Meier curves of the FOCUS trial 13 using the method of 27. The Kaplan–Meier curves were digitized using WebPlotDigitizer 28.

In the FOCUS trial, five treatment strategies of sequential and combination chemotherapy for poor prognosis advanced colorectal cancer were compared: flourouracil only (arm A, control), irinotecan after flourouracil (arm B‐ir), combination of flourouracil and irinotecan (arm C‐ir), oxaliplatin after flourouracil (arm B‐ox), and combination of flourouracil and oxaliplatin (arm C‐ox). A total of 2135 patients were randomized in the ratio 2:1:1:1:1.

The reconstructed data closely match the original data in terms of hazard ratios and corresponding confidence intervals (Table 1). The data are analyzed under a hypothetical two‐stage design, with an interim analysis after 900 events across all arms. This corresponds to roughly 300 events in the control arm and 150 events in each of the four treatment arms. The final analysis is conducted after 900 events in the control arm and the selected treatment arm, resulting in about 600 events in the control arm and 300 events in the selected treatment arm. In addition to point estimates, bootstrap confidence intervals from 1000 replications for all methods conditional on the treatment selection in the original data set are provided. The resulting confidence intervals therefore do not account for the variability introduced by the selection and are expected to have less than the nominal coverage as discussed in Section 2.3.

Table 1

HRs, 95% confidence intervals, and p‐values of the original FOCUS data 13 and the reconstructed data.

Both shrinkage methods produce very similar and sensible results (Table 2), when compared with the original observed HRs of the FOCUS trial (Table 1), where no selection was performed. The estimated HRs of the LR shrinkage method are very close the MLE in all arms, with narrower confidence intervals, while the EB shrinkage method shrinks all estimates to the overall HR, resulting in a smaller treatment effect in the selected C‐ir arm. This is the same tendency of the EB shrinkage estimator to overcorrect the bias that was also observed in the “Linear” and “Peak” simulation scenarios.

Table 2

Estimated HRs and 95% bootstrap confidence intervals for all treatment groups (versus A).

The Stallard–Todd estimator applied to this specific data set at the interim analysis gives rather different results, which contradict the observed HRs in the original data, where the C‐ir arm was the best performing treatment arm. While this result is unsatisfactory, it is in‐line with the simulation results (Figure (Figure3)3) that show that extreme solutions can be obtained with this approach. The iterative procedure did not converge when applied at the final analysis (see also Section 2.4).

The effect of continuing follow‐up in the dropped treatment arms can be seen clearly in the B‐ir and C‐ox arms. The substantial difference in Stage 1 HRs completely disappears at the end of Stage 2.

5. Conclusion

The bias of the MLE was found to be small except for a large number ( K6) of treatment groups. We have shown that the common control group reduces the bias of the MLE because of the induced correlation of the MLEs and that the bias of the MLE in the dropped treatment arms is reduced when continuing follow‐up until the final analysis in accordance with the simulation results of 23.

In our simulation study, the shrinkage methods performed very well, when all treatment effects are equal, but had a larger bias than the MLE, which increased with number of treatment groups, when the the maximal absolute difference of the true treatment effects was large. These results reflect the prior assumptions underlying the empirical Bayes estimators. When comparing the estimators with respect to the MSE, the Stallard–Todd method performs the worst across all scenarios. This is also reflected in the analysis of the reconstructed FOCUS trial data, which shows that either shrinkage method is preferable to the Stallard–Todd method.

Unless there is reason to believe that there are large differences between the true treatment effects, the LR shrinkage estimator can be recommended as a way to reduce the bias and the MSE of the MLE. Because the shrinkage estimators tend to overcorrect the bias, they might also be of interest in situations where overestimation of the treatment effect is considered worse than underestimation.

All methods were based on the asymptotic normal approximation of the log‐HRs. In our main simulations, the number of events was at least 150, such that finite sample bias is negligible. This can be seen, for example, in the Peak scenario in Figure Figure1,1, where the MLE is practically unbiased. However, when the number of events per group is very small (<25), the finite sample bias outweighs the selection bias as was seen in Figure Figure5.5. Such small event numbers do not seem relevant in practice, because typically much larger event numbers are required to achieve acceptable power.

Assuming a Cox proportional hazards model is convenient but not necessary, because all methods are based on the asymptotic approximation of the distribution of the treatment effect estimators. Therefore, the Cox model could be replaced with any other model with asymptotically normal treatment effect estimators. It is anticipated that the results would be comparable to those under the Cox model.

Confidence intervals after adaptive treatment selection that have the correct coverage probability are non‐trivial and were outside of the scope of this work. As far as we know, no adequate methods for construction of confidence intervals for the shrinkage or Stallard–Todd methods have been proposed.

In most trials some form of covariate adjustment is required, for example, inclusion of center effects in a multi‐center trial to account for stratified randomization. As long as the asymptotic results still hold, no adjustment is necessary, because only the dimension K of the multivariate normal distribution needs to be increased to accommodate any additional regression coefficients.

The shrinkage estimators could also be derived as estimators arising from penalized Cox regression 29. When using an L 2 penalty, that is, k=1K(βkβ¯)2, there is a one‐to‐one correspondence between the penalty parameter and the prior variance τ. This is appealing, because the penalized Cox regression works directly on the time‐to‐event data and not only via an asymptotic normal approximation. Selection of the penalty parameter could be performed by cross‐validation in order to minimize the prediction MSE. However, cross‐validation is computationally extremely expensive and results in an additional cross‐validation error. In simulations not reported here, the penalized MLE with cross‐validation did not perform better than the shrinkage estimator.

The LR shrinkage of the vector of MLEs does not change the ordering of its components, because


and Ĉ+0. Thus, the results would be the same if selection were based on the LR shrinkage estimator instead of the MLE.

Finally, it is worth mentioning that the calculation of the shrinkage estimator does not depend on the specific selection rule used. It can be used without change with complex selection rules, for example, when selection is based on hypothesis tests. For the Stallard–Todd method and the UMVCUE approach, non‐trivial modifications are necessary 20.


This work is an independent research arising in part from Dr. Jakis Senior Research Fellowship (NIHR‐SRF‐2015‐08‐001) supported by the National Institute for Health Research. Funding for this work was also provided by the Medical Research Council (MR/M005755/1). The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health.

Appendix A. Asymptotic covariance matrix of β^ MLE (t)


Weak convergence of the maximum likelihood estimator (MLE) as a function of calendar time to a mean‐zero Gaussian field {η(t):t0} follows from theorem 4.2 of 30. For t0, define the matrix μ(t) by


where S k(t, u) = P(TC∧(tR)+>u|Z k = 1),S 0(t,u) = P(TC∧(tR)+>u|Z 1 = … = Z K = 0) and β 0 = 0. Then, for j = 1, 2,


where the covariance function of η is given by


For β≈0 and C independent of Z, the asymptotic covariance matrix of n(β^1MLE,j(t),β^2MLE,j(t))T(β1,β2)T is the inverse of the matrix:


where p˜i=pi/(p0+p1+p2). Consequently, we have for the correlation ρ 12(t) between β^1MLE,j(t) and β^2MLE,j(t):


Appendix B. Proofs


B1. Selection probability


If β^kMLE,1 (k = 1, …, K) are independent, then


where Σ˜k is the covariance matrix of β^kMLE,1β^kMLE,1.

B2. Conditional selection bias

B.2.1. At the interim analysis

At the interim analysis, the selection bias of β^kMLE,1 conditional on S=k is


Equation (2) now follows from β^kMLE,1N(βk,σk2) and


In the dropped treatment arm, the conditional bias of β^kMLE,1 given Sk is


B.2.2. At the final analysis

We have that




In the dropped treatment arm, the conditional bias of β^kMLE,2 given Sk is



Brückner M., Titman A., and Jaki T. (2017) Estimation in multi‐arm two‐stage trials with treatment selection and time‐to‐event endpoint. Statist. Med., 36: 3137–3153. doi: 10.1002/sim.7367.


1. Jaki T. Multi‐arm clinical trials with treatment selection: what can be gained and at what price?. Clinical Investigation 2015;5(4):393‐399.
2. Barthel FM, Parmar M, Royston P. How do multi‐stage, multi‐arm trials compare to the traditional two‐arm parallel group design – a reanalysis of 4 trials. Trials 2009;10:21. [PubMed]
3. Friede T, Parsons N, Stallard N, Todd S, Valdes Marquez E, Chataway J, Nicholas R. Designing a seamless phase II/III clinical trial using early outcomes for treatment selection: an application in multiple sclerosis. Statistics in Medicine 2011;30(13):1528‐1540. [PubMed]
4. Magirr D, Jaki T, Whitehead J. A generalized Dunnett test for multi‐arm multi‐stage clinical studies with treatment selection. Biometrika 2012;99(2):494‐501.
5. Stallard N, Todd S. Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine 2003;22(5):689‐703. [PubMed]
6. Wason J, Magirr D, Law M, Jaki T. Some recommendations for multi‐arm multi‐stage trials. Statistical Methods in Medical Research 2016;25(2):716‐727. [PubMed]
7. Bretz F, Koenig F, Brannath W, Glimm E, Posch M. Adaptive designs for confirmatory clinical trials. Statistics in Medicine 2009;28(8):1181‐1217. [PubMed]
8. Di Scala L, Glimm E. Time‐to‐event analysis with treatment arm selection at interim. Statistics in Medicine 2011;30(26):3067‐3081. [PubMed]
9. Jaki T, Magirr D. Considerations on covariates and endpoints in multi‐arm multi‐stage clinical trials selecting all promising treatments. Statistics in Medicine 2013;32(7):1150‐1163. [PubMed]
10. Royston P, Parmar MK, Qian W. Novel designs for multi‐arm clinical trials with survival outcomes with an application in ovarian cancer. Statistics in Medicine 2003;22(14):2239‐2256. [PubMed]
11. Marson AG, Al‐Kharusi AM, Alwaidh M, Appleton R, Baker GA, Chadwick DW, Cramp C, Cockerell OC, Cooper PN, Doughty J, et al. The SANAD study of effectiveness of valproate, lamotrigine, or topiramate for generalised and unclassifiable epilepsy: an unblinded randomised controlled trial. Lancet (London, England) 2007;369(9566):1016‐1026. [PMC free article] [PubMed]
12. James ND, Sydes MR, Clarke NW, Mason MD, Dearnaley DP, Spears MR, Ritchie AWS, Parker CC, Russell JM, Attard G, et al. Addition of docetaxel, zoledronic acid, or both to first‐line long‐term hormone therapy in prostate cancer (STAMPEDE): survival results from an adaptive, multiarm, multistage, platform randomised controlled trial. The Lancet 2016;387(10024):1163‐1177. [PMC free article] [PubMed]
13. Seymour MT, Maughan TS, Ledermann JA, Topham C, James R, Gwyther SJ, Smith DB, Shepherd S, Maraveyas A, Ferry DR, et al. Different strategies of sequential and combination chemotherapy for patients with poor prognosis advanced colorectal cancer (MRC FOCUS): a randomised controlled trial. The Lancet 2007;370(9582):143‐152. [PubMed]
14. Bauer P, Koenig F, Brannath W, Posch M. Selection and bias – two hostile brothers. Statistics in Medicine 2010;29(1):1‐13. [PubMed]
15. Cohen A, Sackrowitz HB. Two stage conditionally unbiased estimators of the selected mean. Statistics & Probability Letters 1989;8(3):273‐278.
16. Bowden J, Glimm E. Unbiased estimation of selected treatment means in two‐stage trials. Biometrical Journal 2008;50(4):515‐527. [PubMed]
17. Stallard N, Todd S. Point estimates and confidence regions for sequential trials involving selection. Journal of Statistical Planning and Inference 2005;135(2):402‐419.
18. Hwang JT. Empirical bayes estimation for the means of the selected populations. Sankhy: The Indian Journal of Statistics, Series A 1993;55(2):285‐304.
19. Carreras M, Brannath W. Shrinkage estimation in two‐stage adaptive designs with midtrial treatment selection. Statistics in Medicine 2013;32(10):1677‐1690. [PubMed]
20. Robertson DS, Prevost AT, Bowden J. Unbiased estimation in seamless phase II/III trials with unequal treatment effect variances and hypothesis‐driven selection rules. Statistics in Medicine 2016;35(22):3907‐3922. [PubMed]
21. Slepian D. The one‐sided barrier problem for Gaussian noise. Bell System Technical Journal 1962;41(2):463‐501.
22. Kimani PK, Todd S, Stallard N. Estimation after subpopulation selection in adaptive seamless trials. Statistics in Medicine 2015;34(18):2581‐2601. [PubMed]
23. Choodari‐Oskooei B., Parmar MK, Royston P, Bowden J. Impact of lack‐of‐benefit stopping rules on treatment effect estimates of two‐arm multi‐stage (TAMS) trials with time to event outcome. Trials 2013;14:23. [PubMed]
24. Morris CN. Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association 1983;78(381):47‐55.
25. Bowden J, Glimm E. Conditionally unbiased and near unbiased estimation of the selected treatment mean for multistage drop‐the‐losers trials. Biometrical Journal 2014;56(2):332‐349. [PubMed]
26. Stallard N, Todd S, Whitehead J. Estimation following selection of the largest of two normal means. Journal of Statistical Planning and Inference 2008;138(6):1629‐1638.
27. Guyot P, Ades A, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan–Meier survival curves. BMC Medical Research Methodology 2012;12:9. [PubMed]
28. Rohatgi A. WebPlotDigitizer., Version 3.10, accessed 2016‐11‐21.
29. Verweij PJM, Van Houwelingen HC. Penalized likelihood in Cox regression. Statistics in Medicine 1994;13(23–24):2427‐2436. [PubMed]
30. Bilias Y, Gu M, Ying Z. Towards a general asymptotic theory for Cox model with staggered entry. The Annals of Statistics 1997;25(2):662‐682.

Articles from Wiley-Blackwell Online Open are provided here courtesy of Wiley-Blackwell, John Wiley & Sons