Biostatistics. 2012 January; 13(1): 142–152.
Published online 2011 July 16. doi:  10.1093/biostatistics/kxr016
PMCID: PMC3276275

Efficient design and inference for multistage randomized trials of individualized treatment policies

Ree Dawson*
Frontier Science Technology and Research Foundation, 900 Commonwealth Avenue, Boston, MA 02215, USA, dawson@fstrf.dfci.harvard.edu


Clinical demand for individualized “adaptive” treatment policies in diverse fields has spawned development of clinical trial methodology for their experimental evaluation via multistage designs, building upon methods intended for the analysis of naturalistically observed strategies. Because often there is no need to parametrically smooth multistage trial data (in contrast to observational data for adaptive strategies), it is possible to establish direct connections among different methodological approaches. We show by algebraic proof that the maximum likelihood (ML) and optimal semiparametric (SP) estimators of the population mean of the outcome of a treatment policy and its standard error are equal under certain experimental conditions. This result is used to develop a unified and efficient approach to design and inference for multistage trials of policies that adapt treatment according to discrete responses. We derive a sample size formula expressed in terms of a parametric version of the optimal SP population variance. Nonparametric (sample-based) ML estimation performed well in simulation studies, in terms of achieved power, for scenarios most likely to occur in real studies, even though sample sizes were based on the parametric formula. ML outperformed the SP estimator; differences in achieved power predominately reflected differences in their estimates of the population mean (rather than estimated standard errors). Neither methodology could mitigate the potential for overestimated sample sizes when strong nonlinearity was purposely simulated for certain discrete outcomes; however, such departures from linearity may not be an issue for many clinical contexts that make evaluation of competitive treatment policies meaningful.

Keywords: Adaptive treatment strategy, Efficient SP estimation, Maximum likelihood, Multi-stage design, Sample size formula


Increased interest in individualized treatment policies has shifted the focus of their methodological development from the analysis of “naturalistically” observed strategies (e.g. Murphy and others, 2001) to experimental evaluation of a preselected set of strategies via multistage designs (e.g. Lavori and Dawson, 2000). The candidate policies under study have been described as “adaptive” treatment strategies (ATS) or “dynamic” treatment regimes because treatment changes are tailored to the circumstances of the individual. The studies have been described as sequential, multiple assignment, randomized (SMAR) trials (Murphy, 2005) because successive courses of treatment are randomly and adaptively assigned over time, according to individual treatment and response history. The multiple randomization stages correspond to the sequential decision making formalized by an ATS.

The following ATS exemplifies those evaluated in the SMAR trial of antidepressants known as STAR*D (Rush and others, 2004): “Start on treatment A; switch to B if poor response or persistent side effects, otherwise, either continue on A or augment A with C, depending on degree of improvement; continue to monitor and switch to D or augment with F, respectively, according to degree of response.” As in STAR*D, the SMAR design to evaluate this and related ATS specifies that all subjects in the trial start on A, so that the first randomization is to possible options for B and C, nested within the response categories for treatment with A. Further randomization to options for D and F is similarly nested within previous treatment and response history. Other SMAR designs may start with nonadaptive randomization, e.g. to different choices for A.

Clinical equipoise successively guides SMAR options for B, C, D and F. That principle, coupled with standardizing of clinical details, such as dosing, reduces the usual explosive variation in treatment regimes found in observational settings. Accordingly, there is often no need to parametrically smooth SMAR trial data, a property that allows us to establish direct connections among different methodological approaches. This paper shows that the simplest estimators of the population mean of the outcome of an ATS and its standard error, derived using probability calculus and “plug-in” method of moments (MOM) estimates, are equal under certain experimental conditions to the analogous estimators provided by optimal semiparametric (SP) theory, maximum likelihood (ML) theory, and Bayesian predictive inference. In particular, we assume that constrained randomization ensures the observed allocation of subjects matches that intended by design and that the sample size is large enough to ensure “replete” data sets at the end of the experiment, in the sense of precluding random zeroes at intermediate randomization steps (Lavori and Dawson, 2007).

The equality of the optimal variance estimator with the others is not obvious by appearance and full induction across randomization stages is required to derive the result algebraically. The different formulations for standard error exemplify methodological differences. The iterative probability calculus underlying MOM, ML and predictive estimators is carried out sequentially to reflect the influence due to intervening outcomes used for (nested) multistage randomization. The resulting variance estimator decomposes into stage-specific components (Lavori and Dawson, 2007), which quantify the inference “penalty” paid for not knowing a priori the joint outcome distribution (Dawson and Lavori, 2008). The efficient SP influence function used to obtain the optimal variance estimator is specified in terms of the marginal mean of the outcome measured at the end of the study (Murphy and others, 2001). The resulting variance estimator derives from the population marginal variance of the final outcome, typically used for determining the sample size for a single-stage trial, plus a sum of stage-specific variances of the inversely weighted final outcome.

In this paper, we exploit the marginal character of the SP approach to develop a regression-based formula suitable for sample size calculations, which minimizes reliance on unknown population parameters. We also derive a nonparametric counterpart for the SP efficiency gains provided by the optimal estimator, relative to the simpler marginal mean (MM) estimator defined by Murphy (2005) for SMAR trials. We consider the performance of ML and SP inference, in terms of achieved power, when using the regression-based sample size formula. The intent is to provide a unified and efficient approach to design and inference for SMAR trials of ATS that adapt treatment according to discrete responses.


Consider a $K$-stage trial. For stage $k$ in $1,\dots,K$, let $S_k$ be the status of the subject measured at the start of the $k$th stage and $A_k$ the treatment assigned by the $k$th randomization according to values for $\mathbf{S}_k = (S_1,S_2,\dots,S_k)$ and $\mathbf{A}_{k-1} = (A_1,A_2,\dots,A_{k-1})$, with $A_1$ a function of $S_1$. SMAR assignment to treatment options can be expressed in terms of (sequential) allocation to different decision rules, which determine treatment as a function of treatment and response history. We write $a_k = d_k(\mathbf{S}_k = \mathbf{s}_k, \mathbf{A}_{k-1} = \mathbf{a}_{k-1})$ for the decision rule $d_k$ at the $k$th stage, where $a_k$ and $s_k$ denote values for treatment and state; the randomization probabilities for $d_k$, denoted $\{p_k(d_k \mid \mathbf{S}_k, \mathbf{A}_{k-1})\}$, are known and experimentally fixed functions of prior state-treatment history. The strategies to be evaluated can be represented as sequences of the decision rules with positive probability of assignment. Each sequence $d = \{d_1,d_2,\dots,d_K\}$ corresponds to an ATS if the domain for each successive rule includes the state-treatment histories produced by previous rules in the sequence. This condition ensures that the $K$-stage ATS is a well-defined policy for adaptively determining the “next” treatment. The introductory example consists of two decision rules $\{d_1, d_2\}$: $A = d_1(S_1 = 1)$, $A + C = d_1(S_1 = 2)$ and $B = d_1(S_1 = 3)$, where $S_1$ indicates response to A. The second decision rule is similarly defined, e.g. $a_1 = d_2(S_2 = 1, a_1)$, where $S_2$ indicates response measured after $a_1 = d_1(S_1)$.

The SMAR design includes a primary outcome $Y$ for evaluation purposes, obtained after the $K$th stage of randomization. We judge the performance of an ATS $d$ by $\mu_d$, the population mean of $Y$ that would be observed if all subjects were treated according to $d$.

2.1. Estimator of the mean of an ATS

When the observed allocation of subjects matches that intended by design, the MOM estimator of $\mu_d$ is equal to the SP marginal mean (MM) estimator defined by Murphy (2005) for SMAR trials (Lavori and Dawson, 2007). This condition, which occurs asymptotically and might be achieved in a study using sequentially blocked randomization, is needed because the MM estimator is defined in terms of randomization probabilities rather than their sample counterparts. In this case, both estimators of $\mu_d$ can be expressed as:

$$\hat{\mu}_d = \sum_{\mathbf{s}_K} \left[\prod_{k=1}^{K} f_k(\mathbf{s}_k)\right] m_K(\mathbf{s}_K), \tag{2.1}$$

where $m_K(\mathbf{s}_K)$ is the sample mean of final responses among subjects sequentially randomized to $d$ through $K$ and having state values $\mathbf{S}_K = \mathbf{s}_K$, and $f_k(\mathbf{s}_k)$ is the sample (conditional) response rate for $S_k = s_k$, given assignment to $d$ through $k-1$ and $\mathbf{S}_{k-1} = \mathbf{s}_{k-1}$. The estimator (2.1) is a version of the nonparametric G-computational formula and is suitable for strategies that adapt treatment according to discrete states, such as the ATS in Section 1.
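The plug-in calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function name `g_computation_mean` and the data layout are hypothetical. Each subject is recorded as a pair `(states, y)`, where `states` holds the states observed while the subject was still following $d$, and `y` is the final outcome, or `None` if the subject was randomized away from $d$ at some stage. The estimate is the sum over complete state histories of the product of conditional response rates times the stratum mean of $Y$.

```python
from collections import Counter, defaultdict


def g_computation_mean(subjects, K):
    """Plug-in (G-computation) estimate of the mean of Y under strategy d.

    subjects: iterable of (states, y); hypothetical layout, see lead-in.
    K: number of stages.
    """
    num = Counter()  # num[(s_1..s_k)]: subjects observed with this state history
    den = Counter()  # den[(s_1..s_{k-1})]: subjects with that history whose S_k was observed
    y_sum = defaultdict(float)
    y_n = Counter()
    for states, y in subjects:
        states = tuple(states)
        for k in range(1, len(states) + 1):
            num[states[:k]] += 1
            den[states[:k - 1]] += 1
        if y is not None:
            y_sum[states] += y
            y_n[states] += 1

    estimate = 0.0
    for hist in [h for h in num if len(h) == K]:
        if y_n[hist] == 0:
            # A random zero: the history occurs but nobody completed d on it.
            raise ValueError(f"nonreplete data: no final outcomes for history {hist}")
        weight = 1.0
        for k in range(1, K + 1):
            weight *= num[hist[:k]] / den[hist[:k - 1]]  # f_k for this history
        estimate += weight * y_sum[hist] / y_n[hist]     # times m_K for this history
    return estimate


# Toy two-stage data set with binary states (made-up numbers).
example = [
    ((0, 0), 1.0), ((0, 0), 3.0), ((0, 1), 4.0), ((0,), None),
    ((1, 0), 6.0), ((1, 1), 8.0), ((1, 1), None), ((1,), None),
]
```

For the toy data, `g_computation_mean(example, K=2)` evaluates to 5.0. The explicit error for an empty stratum mirrors the role of repleteness: the formula is undefined when a state history that occurs in the sample has no final outcomes.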

Murphy (2005), building upon the work of Murphy and others (2001) for observational data, presented an “optimal” SP estimator of $\mu_d$ for use in SMAR trials, which has the smallest variance among the class of all regular asymptotically linear estimators. Let $p_k \equiv p_k(d_k \mid \mathbf{S}_k, \mathbf{A}_{k-1})$ and $P_k = \prod_{j=1}^{k} p_j$. The optimal estimator is the solution to the efficient estimating equation $\sum_{i=1}^{n} U_{\mathrm{opt},i}(\mu_d) = 0$, where $n$ is the number of subjects and $U_{\mathrm{opt}}$ is

[Equation (2.2): display not available in this text version.]

with $\mu_k(\mathbf{s}_k, \mathbf{d}_{k-1}) = E(Y_d \mid \mathbf{S}_k = \mathbf{s}_k, \mathbf{A}_{k-1} = \mathbf{d}_{k-1})$ for $k$ in $1,\dots,K$; $Y_d$ denotes the primary outcome when the subject is treated according to strategy $d$. For $k = 1$, $\mu_k(\mathbf{s}_k, \mathbf{d}_{k-1}) \equiv \mu_1(s_1)$.

The G-computational formula (2.1) can be used to provide consistent nonparametric estimates of the $\mu_k$ (given SMAR), in which case the solution to the estimated estimating equation is optimal (most efficient) (Murphy and others, 2001). With some calculation, the optimal estimator also reduces to (2.1), a result that holds even if the observed assignment proportions differ from the preset probabilities (despite $U_{\mathrm{opt}}$ being defined in terms of randomization probabilities).

Because the ML estimates for means and proportions coincide with the plug-in estimates obtained by the MOM for common distributions of interest here, (2.1) is also ML. It is also equal to the predictive estimator of $\mu_d$, assuming noninformative priors (Dawson and Lavori, 2008). We therefore refer to (2.1) unambiguously as the estimator of the ATS mean, denoted $\hat{\mu}_d$.

2.2. Variance estimators of the estimator of the mean of an ATS

To obtain the ML variance of $\hat{\mu}_d$, we assume that (i) the final outcome $Y$ has a stratified normal (continuous case) or Bernoulli (discrete case) distribution across strata indexed by the possible sequences $(\mathbf{s}_K, \mathbf{a}_K)$; (ii) the intermediate states $S_k$ are distributed conditionally, given $(\mathbf{s}_{k-1}, \mathbf{a}_{k-1})$, as multinomial random variables; (iii) model parameters are distinct across state-treatment histories for a given stage $k$ and across stages (reflective of SMAR allocation). Because the sequence of nested randomizations in a SMAR trial gives rise to a monotone pattern of missingness for each ATS, the likelihood for the parameters in (i) and (ii) can be factored into components, each of which is a complete-data problem; standard theory dictates that the information matrix and hence (asymptotic) ML covariance matrix of the parameters is block diagonal, with each block corresponding to a complete-data component (Little and Rubin, 1987). It is possible to obtain the ML variance of $\hat{\mu}_d$ from the block-diagonal covariance matrix (once calculated); however, a more tractable derivation uses iterated variance decomposition (Little and Rubin, 1987). The application to the SMAR setup factors the term of (2.1) for $\mathbf{s}_K$ into $\varphi_K(\mathbf{s}_K)\,m_K(\mathbf{s}_K)$, with $\varphi_k(\mathbf{s}_k) = \varphi_{k-1}(\mathbf{s}_{k-1}) f_k(\mathbf{s}_k)$, $k = 2,\dots,K$. The iterated calculation produces the same estimator obtained using probability calculus coupled with MOM (Lavori and Dawson, 2007) or Bayesian predictive inference (Dawson and Lavori, 2008). We use $\hat{v}_{ML}$ to denote the variance estimator of $\hat{\mu}_d$ provided by these three derivations.

Iterated variance decomposition yields $\hat{v}_{ML} = \hat{v}_n + \hat{v}_p$, where $\hat{v}_n$ is the “naïve” variance estimate that assumes the coefficients of $m_K(\mathbf{s}_K)$ in (2.1) are known a priori, and $\hat{v}_p$ is the “penalty” paid for estimating them:

[Equation (2.3): display not available in this text version.]

where [symbol not available] is the sample variance of $m_K \equiv m_K(\mathbf{s}_K)$ and $\varphi_K \equiv \varphi_K(\mathbf{s}_K)$ (Dawson and Lavori, 2008). The $\widehat{\mathrm{cov}}(\varphi_K, \varphi_K')$ can be obtained by induction on $k$, with $K = 1$ being the usual multinomial calculation (Lavori and Dawson, 2007). For general $K$, $\hat{v}_p$ decomposes into stage-specific components of penalty variance, with the $k$th-stage term of $\widehat{\mathrm{cov}}(\varphi_K, \varphi_K')$ reducible up to multiplicative factors to $\widehat{\mathrm{cov}}(f_k, f_k')$. See Appendix A in the supplementary materials available at Biostatistics online.

The estimated asymptotic variance of the optimal SP estimator of $\mu_d$, denoted $\hat{v}_{OPT}$, is obtained nonparametrically from the variance of $U_{\mathrm{opt}}$ (Murphy, 2005). Specifically, $\hat{v}_{OPT}$ is the estimate of $V(U_{\mathrm{opt}})/n$, where

$$V(U_{\mathrm{opt}}) = E_d\big[(Y-\mu_d)^2\big] + \sum_{k=1}^{K} E_d\big[(1-p_k)\,P_k^{-1}\,(Y-\mu_k)^2\big] \tag{2.4}$$

and the expectation $E_d()$ is calculated under the distribution of $\mathbf{S}_K$ and $Y$ when all treatments are assigned according to the strategy $d$. As before, the $\mu_k$ are estimated using the G-computational formula, which guarantees that $\hat{v}_{OPT}$ achieves the SP efficiency bound.
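As a small worked step (not part of the original derivation), the stage weights $(1-p_k)P_k^{-1}$ in the variance of $U_{\mathrm{opt}}$ telescope. This assumes that $P_k$ denotes the running product $p_1 p_2 \cdots p_k$ of the stage-wise randomization probabilities, with $P_0 = 1$. The identity suggests one way to read the stage-specific terms: if state history carried no information about $Y$, so that every $\mu_k$ equaled $\mu_d$, the total would collapse to the inversely weighted marginal variance.

```latex
\begin{align*}
\frac{1-p_k}{P_k} &= \frac{1}{P_k} - \frac{p_k}{P_k}
                   = \frac{1}{P_k} - \frac{1}{P_{k-1}}, \\
\sum_{k=1}^{K} \frac{1-p_k}{P_k} &= \frac{1}{P_K} - \frac{1}{P_0}
                                  = \frac{1}{P_K} - 1, \\
E_d\big[(Y-\mu_d)^2\big]
  + \sum_{k=1}^{K} E_d\!\left[\frac{1-p_k}{P_k}\,(Y-\mu_d)^2\right]
  &= E_d\!\left[\frac{(Y-\mu_d)^2}{P_K}\right].
\end{align*}
```

Replacing $\mu_d$ by the conditional means $\mu_k$ inside the stage-specific terms can only shrink each squared deviation on average, which is where efficiency can be gained when state history is predictive of $Y$.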

In Appendix A in the supplementary materials available at Biostatistics online, we use induction to algebraically show equality of the variance estimators. The result holds asymptotically without restriction, but for finite samples requires that the observed allocation of subjects matches that intended by design, as set out in Section 2.1 for the MM estimator of $\mu_d$. For analytic derivations, we assume blocking or some other form of constrained randomization makes this distinction moot and use the notation $p_k(d_k \mid \mathbf{S}_k, \mathbf{A}_{k-1})$ interchangeably for expected and observed proportions under $d$.

A key element of the inductive proof is the ANOVA decomposition:

[Equation (2.5): display not available in this text version.]

where $\hat{e}_d()$ is the sample estimator of $E_d()$ obtained via inverse weighting:

[Equation (2.6): display not available in this text version.]

Algebraic reexpression of $(m_K - \hat{\mu}_k)^2$ in (2.5) in terms of covariances of the “pseudo” proportions [formula not available] provides a direct comparison of $\hat{v}_{OPT} - \hat{v}_n$ to the penalty component of $\hat{v}_{ML}$ in (2.3). Because the $k$th-stage term of $\hat{v}_p$ restricts covariance uncertainty to $f_k$ and $f_k'$ (see above), the difference $\hat{v}_{OPT} - \hat{v}_{ML}$ gives rise to $K$ remainder terms that telescope to zero (proved inductively). We remark that the variance components of $V(U_{\mathrm{opt}})$, and consequently of $\hat{v}_{OPT}$, are expressed in terms of squared deviations, a property that must be shared by $\hat{v}_{ML}$ for equality to hold. This motivates (i) as the likelihood model for $Y$.


Murphy (2005) obtains the simple MM (SP) estimator of $\mu_d$ and its standard error by setting each $\mu_k$ in $U_{\mathrm{opt}}$ to $\mu_d$. To characterize the potential loss of efficiency in doing so, we express $\hat{v}_{MM}$ as:

[Equation (3.7): display not available in this text version.]

for comparison to the ANOVA decomposition (2.5) of $\hat{v}_{OPT}$; $\hat{v}_b$ accounts for response heterogeneity across subgroups indexed by state history (Lavori and Dawson, 2007). With some algebra, it follows from (2.6) that [formula not available], where $n_K \equiv n_K(\mathbf{s}_K)$ is the number of subjects sequentially randomized to $d$ through $K$ and having state history $\mathbf{s}_K$; (2.5) becomes

[Equation (3.8): display not available in this text version.]

where $\hat{v}_b(\mathbf{s}_K)$ is the summand of $\hat{v}_b$ corresponding to $\mathbf{s}_K$.

Consider $\hat{v}_b$ and the $k$th-stage term of (3.8). Let [formula not available], which can be reexpressed as [formula not available], noting that [formula not available]. Suppose that $S_k$ is binary (achievable by introducing more stages), taking on values $s_k$, $s_k'$. Accordingly, $\Delta_k$ can be sequentially defined in terms of stage-specific response heterogeneity: $\Delta_k = \delta_k + \delta_{k-1} + \cdots + \delta_2 + \delta_1$, where [formula not available], $\delta_1 = \Delta_1$ and $f_k' = f_k(\mathbf{s}_{k-1}, s_k') = 1 - f_k(\mathbf{s}_{k-1}, s_k) \equiv 1 - f_k$. The derivation follows by induction.

We can reexpress $\hat{v}_{OPT} - \hat{v}_{MM}$ directly in terms of the $\delta_k$ when $p_k(d_k \mid \mathbf{S}_k, \mathbf{A}_{k-1}) \equiv p_k(d_k)$ for all $k$. The case $K = 3$ suffices to concretely explicate the general result:

[Equation (3.9): display not available in this text version.]

The SMAR randomization probabilities specified by the trialist govern SP efficiency gains with $\hat{v}_{OPT}$ in a simple way under the assumed restrictions. The strength of the relationship of state history to $Y$, as evidenced by the magnitudes of the $\delta_k$, has impact as well, consistent with simulated results for two-stage trials (Wahed and Tsiatis, 2004). Differentiation of (3.9) shows that efficiency is maximized when each $S_k$ acts like a flip of a fair coin, thereby allowing sequential allocation of subjects to each possible state history. The smallest improvement occurs when $S_k$ is a degenerate binomial (all mass on one outcome) at each stage but the last. But then the study is not adaptive before the last stage and is equivalent to the cross-sectional $K = 1$ case.


We develop sample size formulae for inference for $\mu_d$ under the assumption that expected and observed SMAR allocations coincide and choose to use the SP variance estimator for $\hat{\mu}_d$ because of its marginal formulation. We further assume that $E_d[(Y-\mu_k)^2 \mid \mathbf{s}_{d,k}] = V_d(Y \mid \mathbf{s}_{d,k}) = \sigma_k^2(\mathbf{s}_{d,k})$ is homogeneous across state history at $k$, i.e. $\sigma_k^2(\mathbf{s}_{d,k}) \equiv \sigma_{k,d}^2 \equiv \sigma_k^2$, in order to reexpress $V(U_{\mathrm{opt}})$ in terms of familiar regression quantities. Applying iterated expectation to the $k$th-stage term in (2.4) gives $E_d[(1-p_k)P_k^{-1}(Y-\mu_k)^2] = E_d[(1-p_k)P_k^{-1}]\,\sigma_k^2$. Moreover, $E_d[(1-p_k)P_k^{-1}] = (1-p_k)P_k^{-1}$ if the $k$th-stage randomization probabilities are all equal to $p_k(d_k)$, as would occur in a “balanced” SMAR trial. In this case, $V(U_{\mathrm{opt}}) = \sigma_Y^2 + \sum_{k=1}^{K}(1-p_k)P_k^{-1}\sigma_k^2$, where $\sigma_{Y,d}^2 \equiv \sigma_Y^2$ is the marginal variance of $Y_d$. Let $R_T^2 = (1 - \sigma_K^2/\sigma_Y^2)$ be the coefficient of determination for the regression of $Y_d$ on $\mathbf{S}_{d,K}$, and $R_k^2$ denote the (population) increment in coefficient of determination when $S_{d,k}$ is added to the regression of $Y_d$ on $\mathbf{S}_{d,k-1}$. Then $V(U_{\mathrm{opt}})$ becomes

$$V(U_{\mathrm{opt}}) = \sigma_Y^2\left[1 + \sum_{k=1}^{K}(1-p_k)\,P_k^{-1}\left(1 - \sum_{j=1}^{k} R_j^2\right)\right], \tag{4.10}$$

noting that $R_T^2 = \sum R_k^2$. We refer to the multiplier of $\sigma_Y^2$ as the “variance inflation factor” (VIF) due to the SMAR design, which generalizes to

$$\mathrm{VIF} = 1 + \sum_{k=1}^{K} E_d\big[(1-p_k)\,P_k^{-1}\big]\left(1 - \sum_{j=1}^{k} R_j^2\right) \tag{4.11}$$

when randomization probabilities depend on prior state values. Using either (4.10) or (4.11) as appropriate provides the SMAR version of the usual one-sample t-test formula for sample size: $n = \mathrm{VIF}\,(z_{1-\alpha/2} + z_{1-\beta})^2/\mathrm{ES}^2$, where $\alpha$ is the significance level, $1 - \beta$ is the power to be achieved, and $\mathrm{ES} = (\mu_d - \mu_0)/\sigma_Y$ is the standardized difference between $\mu_d$ and the null mean $\mu_0$.
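A minimal sketch of the sample size calculation follows. It assumes the usual normal-approximation form of the one-sample formula, a two-sided test, and a VIF that has already been computed; the function name `smar_sample_size` is hypothetical and the exact form used by the authors may differ in detail.

```python
import math
from statistics import NormalDist


def smar_sample_size(effect_size, vif, alpha=0.05, power=0.80):
    """Subjects needed to detect a standardized effect for one strategy mean.

    Assumes n = VIF * (z_{1-alpha/2} + z_{1-beta})^2 / ES^2 (two-sided test),
    rounded up to the next integer.
    """
    if effect_size <= 0 or vif < 1:
        raise ValueError("effect_size must be positive and vif at least 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(vif * (z_alpha + z_beta) ** 2 / effect_size ** 2)
```

With `vif=1.0` the function reduces to the single-stage calculation, for example 197 subjects for ES = 0.2 at 80% power and a two-sided 5% level; any VIF above 1 scales that number proportionally before rounding.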

For a balanced SMAR trial in which $p_k(d_k \mid \mathbf{S}_k, \mathbf{A}_{k-1}) \equiv p_k(d_k)$ for all $k$, the sample size formula only requires that the unknown distribution of $(\mathbf{S}_{d,K}, Y)$ be restricted in terms of the $V_d(Y \mid \mathbf{s}_{d,k})$, assumed homogeneous across state values $\mathbf{S}_{d,k} = \mathbf{s}_{d,k}$. Homogeneity of variance is a simplifying assumption typical of power calculations for fixed treatment trials, but sequential allocation makes the assumption unlikely for all $K$ stages. More subtly, the assumed equality of the $V_d(Y \mid \mathbf{s}_{d,k})$ algebraically transforms the stratified (nonparametric) regression structure of $V(U_{\mathrm{opt}})$ resulting from optimal SP theory into linear association, as characterized by the $R_k^2$ in the VIF. Although the requirement of homogeneous variances does not directly restrict conditional expectations, (4.10) or (4.11) may only partially account for any nonlinearity in the $E_d(Y \mid \mathbf{s}_{d,k})$.

We remark that our sample size formulae are conditional on the SMAR allocation and hence may underestimate the required number of subjects; calculations based on (4.10) suggest that the typical impact of such conditioning will be negligible when constrained randomization is used.


A central issue for the performance of the sample size formula in Section 4 is how well the parametric reexpression of $V(U_{\mathrm{opt}})$, derived assuming homogeneity of variance, reflects the nonparametric inference carried out using the estimators in Section 2. It may be that successive stratification leads to one or more random zeroes at intermediate stages of randomization, even if the nominal level of power is achieved (in the frequency sense). As the sample size grows, the chance of this diminishes. We conducted simulations to understand the degree to which good performance of the sample size formula across repeated samples protects the trialist from an unlucky (nonreplete) SMAR realization. Because the formula may also fail to protect against near sampling zeroes (and thereby interfere with constrained randomization), we calculated the test statistic twice, using ML and SP estimators.

The simulation setup is structured to explicate the relationship between “repleteness,” defined as the lack of random zeroes at any intermediate stage of the SMAR experiment, and calculated sample size. Data for the ATS in the Introduction, denoted as $d$, are generated by the following scheme. The state space for symptom severity at each stage is $\{1,2,3\}$; these values determine whether to adaptively continue, augment or switch medication, using the stage-specific options specified by $d$. As in the STAR*D antidepressant study, $S_{d,1} \equiv S_1$ is obtained after an initial trial on the medication A; baseline values are equiprobable. The values for $S_{d,2}$ evolve according to the transition matrix (TM) with rows (0.7, 0.2, 0.1), (0.5, 0.3, 0.2), (0.1, 0.5, 0.4), where $TM_{ij} = \Pr(j \mid i)$, consistent with “healthier” subjects having greater probability of better successive outcomes. The final outcome is generated as a regression on state history. For the continuous case, $Y_d = \mathbf{S}_{d,2}^{T}\beta + e$, $e \sim N(0, \sigma_e^2)$, where $(\beta_1, \beta_2) = (1, 2)$ and the intercept $\beta_0 = 0.5$ is the coefficient for $S_0 \equiv 1$. For the discrete case, $Y_d$ is Bernoulli with probability $p = \mathrm{logit}^{-1}(\mathbf{S}_{d,2}^{T}\beta)$, $(\beta_1, \beta_2) = (1, 2)$; $\beta_0$ is $-6.0$, $-4.5$, or $-3.0$ to govern the degree of nonlinearity in expected Bernoulli outcomes.
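The generating scheme for the binary case is small enough that the population mean under $d$ can be computed exactly rather than simulated. The sketch below is one reading of the description, with assumptions stated plainly: $S_{d,1}$ is taken as uniform on $\{1,2,3\}$, row $i$ of the transition matrix gives the distribution of $S_{d,2}$ given $S_{d,1} = i$, and the linear predictor is $\beta_0 + \beta_1 S_{d,1} + \beta_2 S_{d,2}$. The names `TM`, `expit` and `mu_d_binary` are hypothetical.

```python
import math

# Rows: current state 1, 2, 3; columns: next state 1, 2, 3.
TM = [
    (0.7, 0.2, 0.1),
    (0.5, 0.3, 0.2),
    (0.1, 0.5, 0.4),
]
STATES = (1, 2, 3)


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def mu_d_binary(beta0, beta1=1.0, beta2=2.0):
    """Exact mean of the Bernoulli outcome under d for the assumed scheme."""
    total = 0.0
    for i, s1 in enumerate(STATES):
        for j, s2 in enumerate(STATES):
            prob_history = (1.0 / 3.0) * TM[i][j]  # Pr(S1 = s1) * Pr(S2 = s2 | S1 = s1)
            total += prob_history * expit(beta0 + beta1 * s1 + beta2 * s2)
    return total
```

Under these assumptions, the intercepts $-6.0$, $-4.5$ and $-3.0$ give means of about 0.44, 0.63 and 0.82, which agrees with the $\mu_d$ values quoted alongside the binary results.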

The simulation setup is further structured to investigate SP efficiency gains (relative to the MM estimator) beyond the case of a balanced design required by the analytic derivations in Section 3. In the simulations, random assignment to $d$ depends on prior state values in the following way: subjects who are (well, in partial remission, ill) continue on $d$ with probability (1, 1/3, 1/2); accordingly, we use formula (4.11) to calculate sample sizes. Sequential blocking is used throughout to ensure whenever possible that observed and expected allocations agree; constrained randomization also protects against nonrepleteness. Additionally, simulated trials vary by whether they use a “safe” mechanism to guarantee positive sample sizes across state histories at both stages of the trial (Lavori and Dawson, 2007). Specifically, safe implies that once the number of subjects for a particular state history falls below a certain value (set here to 6), further randomization stops and subjects with those states continue on $d$ thereafter. The safe mechanism is intended to reflect the effects of good practice in the sense that the trialist would ensure repleteness either through design or by monitoring subject accrual during the trial.
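To make the role of state-dependent randomization concrete, the following sketch evaluates the expected inverse-probability weights $E_d[(1-p_k)/P_k]$ for the two randomization stages and a variance inflation factor of the form $1 + \sum_k E_d[(1-p_k)/P_k]\,\sigma_k^2/\sigma_Y^2$ for the continuous outcome. Several things here are assumptions rather than statements from the text: states (1, 2, 3) are mapped to (well, partial remission, ill), the same continuation probabilities apply at both stages, $P_k$ is the product of the stage-wise probabilities, and $\sigma_1^2$ is taken as the average exact conditional variance of $Y$ given $S_{d,1}$ instead of a value derived from regression $R^2$ increments. The result therefore need not match the VIF behind the sample sizes reported in the tables.

```python
TM = {1: (0.7, 0.2, 0.1), 2: (0.5, 0.3, 0.2), 3: (0.1, 0.5, 0.4)}
STATES = (1, 2, 3)
P_CONTINUE = {1: 1.0, 2: 1.0 / 3.0, 3: 0.5}  # assumed mapping: well, partial, ill
BETA0, BETA1, BETA2 = 0.5, 1.0, 2.0


def stage_weights():
    """Expected values of (1 - p_k) / P_k under d, for k = 1, 2."""
    w1 = sum((1 - P_CONTINUE[s1]) / P_CONTINUE[s1] for s1 in STATES) / 3.0
    w2 = 0.0
    for s1 in STATES:
        for j, s2 in enumerate(STATES):
            p1, p2 = P_CONTINUE[s1], P_CONTINUE[s2]
            w2 += (1.0 / 3.0) * TM[s1][j] * (1 - p2) / (p1 * p2)
    return w1, w2


def vif_continuous(sigma_e):
    """VIF for Y = beta0 + beta1*S1 + beta2*S2 + e, using exact moments."""
    cond_mean, cond_var = {}, {}
    for s1 in STATES:
        m1 = sum(p * s2 for p, s2 in zip(TM[s1], STATES))
        m2 = sum(p * s2 ** 2 for p, s2 in zip(TM[s1], STATES))
        cond_mean[s1] = BETA0 + BETA1 * s1 + BETA2 * m1
        cond_var[s1] = BETA2 ** 2 * (m2 - m1 ** 2)
    grand_mean = sum(cond_mean.values()) / 3.0
    between = sum((m - grand_mean) ** 2 for m in cond_mean.values()) / 3.0
    within = sum(cond_var.values()) / 3.0
    var_y = between + within + sigma_e ** 2   # marginal variance of Y
    var_1 = within + sigma_e ** 2             # average variance given S1
    var_2 = sigma_e ** 2                      # variance given (S1, S2)
    w1, w2 = stage_weights()
    return 1.0 + (w1 * var_1 + w2 * var_2) / var_y
```

Under these assumptions the stage weights are 1.0 and 1.9, and `vif_continuous(1.0)` is about 1.90. Continuation probabilities below one for the less healthy states are what push the second-stage weight up, since those histories are both more heavily thinned and more likely to be thinned again.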

For purposes of inference for μ d, we set the standardized effect size in the sample size formula to be either 0.2 or 0.4. The trialist might specify the larger ES value to ensure adequate precision for individual ATS means when planning a pilot SMAR trial. The inherent “cost” in successfully implementing a whole-treatment strategy makes it unlikely that the trialist would find effects smaller than 0.2 of practical relevance.


Table 1 summarizes 2000 replications of the setup for continuous $Y_d$ for every combination of ES = 0.2, 0.4 and $\sigma_e$ = 0.5, 1, 2. Throughout, the nominal level of power to be achieved was set to 0.80, with the level of the test set to 0.05. The test statistic (the difference of the estimated mean and the null value divided by the standard error) was compared to 1.96, suggested by asymptotic normality of the ML and SP estimators of $\mu_d$.

Table 1.

Performance of the sample size formula for nominal power = 0.80 using either ML estimation or optimal SP estimation when $Y_d$ is continuous. VIF is calculated from the regression of $Y_d$ on $\mathbf{S}_{d,2}$

The results show that when ES = 0.2, the calculated sample sizes ensure repleteness for almost all experiments. By contrast, when ES = 0.40, the proportion of replete experiments among the 2000 replications ranges from 60% to 89%. One could argue that for most SMAR trials, the primary interest will be to detect moderate-sized causal effects, thereby increasing the sample size beyond that provided by the generalized t-test formula in Section 4 when ES = 0.4. Nonetheless, the simulations serve to illustrate the relevance of repleteness to good planning of a SMAR experiment, beyond the usual sample size considerations.

A more striking result in Table 1 is the difference in power achieved by the ML and optimal SP estimators. ML estimation is mostly robust to even substantial failures of repleteness, because of its use of sample quantities in (2.1) and (2.3) based on allocated proportions. In contrast, the SP reliance on assignment probabilities precludes the optimal estimator (and its standard error) from tuning to the sample at hand. This is true even with mostly replete repetitions, highlighting the influence of near sampling zeroes on achieved power with SP estimation. The expansion of Table 1 in Appendix B in the supplementary materials available at Biostatistics online shows that differences in power for the two approaches are influenced much more by their differences in estimates of $\mu_d$ than by differences in estimated standard errors. The cases $n$ = 320, 404 show this to be true for even modest loss of power when sample sizes for some strata are too small for sequentially blocked randomization to achieve a priori assignment probabilities. We note that the efficiency gains for ML estimation are modest, with relative efficiency running from 0.95 to 1.0, for simulated trials without safe turned on but using constrained randomization.

It is not surprising that the optimal estimator may sometimes be underpowered when the simulated trials use the safe option, given that certain a priori randomization probabilities may be set to zero. In contrast, ML estimation ensures nominal power in these cases, albeit conservatively for some scenarios. This property suggests that ML estimation is a suitable choice for inference, prior to the execution of the trial and any knowledge of the stochastic process underlying intermediate states. More generally, its “self-tuning” property in the face of random and near sampling zeroes reminds us that the asymptotic ML variance estimator coincides with the finite sample one obtained from the MOM.

Table 2 shows that repleteness and near sampling zeroes have at most moderate impact on the SP efficiency gains provided by the optimal estimator; such impact occurs because of the (inversely weighted) estimates of the $\mu_k$ in $U_{\mathrm{opt}}$. In theory, efficiency gains for fixed $\sigma_e$ should not depend on $n$, and simulations with excessively large sample sizes show this to be the case. For the realistic values of $n$ in Table 2, the relative efficiency for any given value of $\sigma_e$ depends on whether the sample size was geared to ES = 0.2 or ES = 0.4. Nonetheless, the results of the simulations confirm that the strength of the relationship of state history to $Y_d$, as evidenced by the $R_T^2$ values, governs the magnitude of efficiency gains.

Table 2.

Relative efficiency of the optimal SP estimator to the MM SP estimator when $Y_d$ is continuous. $R_T^2$ and the sample size $n$ are calculated as described in Section 4

Table 3 for the binary setup shows the sample size formula provides close to the nominal power of 0.80, albeit smaller at times, for at most moderate nonlinearity in expected Bernoulli outcomes ($\beta_0 = -6.0, -4.5$), and is conservative otherwise. We attribute the excessive sample sizes for the case $\beta_0 = -3.0$ to the inability of the VIF to adequately account for strong nonlinearity rather than to marked failure of sequential homogeneity of variance, given good performance for the normal model setup in the presence of this type of failure (Dawson and Lavori, 2010). However, strong departures from linearity may not be an issue for many realistic applications because of the impact on $\mu_d$, which is much higher for $\beta_0 = -3.0$: $\mu_d = 0.82$ compared to $\mu_d = 0.44, 0.63$ for $\beta_0 = -6.0, -4.5$, respectively. ATS will tend to be moderately successful (or not) in populations with sufficient response heterogeneity to make sequential treatment adaptation clinically attractive, making values of $\mu_d$ such as 0.82 unlikely to occur.

Table 3.

Performance of the sample size formula for nominal power = 0.80 using either ML estimation or optimal SP estimation when Y_d is binary. VIF is calculated from the regression of Y_d on S_{d,2}.

The performance of SP and ML estimation is more similar for the binary case than for continuous Y_d, although larger sample sizes (expected for discrete outcomes) promote significant differences in achieved power. The impact on achieved power due to differences in estimates of μ_d is sometimes canceled out by the impact due to differences in estimated standard errors. When repleteness held across replications, differences in standard error had modest impact. See Appendix B in the supplementary materials available at Biostatistics online.


Prior SMAR sample size formulae derived from SP theory have specified estimators that used the known randomization probabilities (Murphy, 2005; Feng and Wahed, 2009) and did not use the most efficient influence function as a basis for the derivations. Building upon that work, we have developed theoretical connections that not only provide a more efficient basis for sample size calculations but also help to establish the advantage of using observed (ML) rather than expected (optimal SP) allocations for planned evaluation of ATS. The better performance of ML estimation in terms of achieved power parallels the superiority of model-based weights for studies with nonrandomized treatments (less bias) or missing data (better efficiency) (see, e.g. Rotnitzky and Robins, 1995). We note that the results obtained here may be specific to the sequential context in which randomizations are adaptively nested over time.

The sample and population formulations of SP variance in this paper elucidate the central role played by response heterogeneity in determining the magnitude of sequential uncertainty. Section 3 offers a nonparametric characterization of sample response heterogeneity in terms of stage-specific between-subgroup sum of squares, which captures the sequential effect of response heterogeneity on SP efficiency. The increments in regression-based coefficients of determination defined in Section 4 provide the parametric counterparts at the population level and describe the sequential effect of response heterogeneity on sample size requirements. Less apparent is the intrinsic role of response heterogeneity to estimators developed for SMAR data. The entire premise of an ATS relies on a strong relationship between outcome and state on which to base decisions. Because the SMAR design mimics sequential decision making, the missingness intentionally created by sequential (nested) randomization is governed implicitly by variation in responses across states for any given strategy. In the absence of such variation, treatment assignment at any given stage reduces to a flip of a fair coin, making sequential adjustment for state history unnecessary. For certain estimators, such as the ML and optimal SP ones considered here, their adjustment for SMAR missingness to guarantee consistency also reaps the usual efficiency gains, as translated to the sequential context.

The sample size formulae we developed apply directly to inference for a single ATS but require extension for paired comparisons. In Dawson and Lavori (2010), we use the ML formulation of variance to derive an analytic approximation to (positive) between-strategy covariance created by sequential nested randomization and adjust sample sizes for pairwise comparisons accordingly. The adjusted sample size formulae are the basis of a method we establish for sizing a SMAR trial with the goal of fully powering all pairs of strategies deemed “distinct” (defined in terms of effect size).

The results in this paper emphasize the importance of running a “tight” trial, using sequentially constrained randomization in combination with some version of an a priori designated safe option. The trialist should also consider whether the calculated sample size will sufficiently protect against sparse data and whether a larger number of subjects might circumvent the need for a safe option, which effectively truncates the ATS under evaluation. The simulation set up provides one means to translate clinical judgments about intermediate response rates into the frequentist probability of experimental repleteness. The trialist can also use the simulation set up to “firm up” guesses for variance inflation factors when more than moderate nonlinearity is suspected.
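One way such a repleteness calculation might look is sketched below. The sketch takes repleteness to mean that every state-by-treatment cell required by the design contains at least `min_cell` subjects, which is an assumption about how the term would be operationalized. The two-stage design (two first-stage arms, nonresponders re-randomized between two options), the response rates, and the function name `prob_replete` are likewise hypothetical and not the simulation set up used in this paper.

```python
import numpy as np


def prob_replete(n, resp_rates=(0.4, 0.4), p1=0.5, p2=0.5,
                 min_cell=1, reps=2000, seed=0):
    """Monte Carlo estimate of the probability that a trial of size n is replete.

    resp_rates holds the guessed intermediate response rate for each
    first-stage arm (hypothetical inputs reflecting clinical judgment).
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(resp_rates, dtype=float)
    replete = 0
    for _ in range(reps):
        a1 = (rng.random(n) < p1).astype(int)  # first-stage arm (0 or 1)
        s = rng.random(n) < rates[a1]          # responder indicator
        a2 = rng.random(n) < p2                # second-stage option for nonresponders
        counts = []
        for arm in (0, 1):
            in_arm = a1 == arm
            counts.append(np.sum(in_arm & s))          # responders, who continue
            counts.append(np.sum(in_arm & ~s & a2))    # nonresponders, option 1
            counts.append(np.sum(in_arm & ~s & ~a2))   # nonresponders, option 0
        replete += int(min(counts) >= min_cell)
    return replete / reps


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, prob_replete(n, resp_rates=(0.05, 0.05)))
```

Raising `min_cell` above 1 gives a rough guard against near sampling zeroes as well as exact ones, and rerunning the function over a grid of `n` indicates whether a larger trial might avoid the need for a safe option.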


Supplementary material is available at


The National Institute of Mental Health (R01-MH51481 to Stanford University).


Conflict of Interest: None declared.


  • Dawson R, Lavori PW. Sequential causal inference: application to randomized trials of adaptive treatment strategies. Statistics in Medicine. 2008;27:1626–1645.
  • Dawson R, Lavori PW. Sample size calculations for evaluating treatment policies in multi-stage design. Clinical Trials. 2010;7:643–652.
  • Feng W, Wahed AS. Sample size for two-stage studies with maintenance therapy. Statistics in Medicine. 2009;28:2028–2041.
  • Lavori PW, Dawson R. A design for testing clinical strategies: biased adaptive within-subject randomization. Journal Royal Statistical Society A. 2000;163:29–38.
  • Lavori PW, Dawson R. Improving the efficiency of estimation in randomized trials of adaptive treatment strategies. Clinical Trials. 2007;4:297–308.
  • Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley; 1987.
  • Murphy S. An experimental design for the development of adaptive treatment strategies. Statistics in Medicine. 2005;24:1455–1481.
  • Murphy SM, van der Laan MJ, Robins JM. Marginal mean models for dynamic regimes. Journal of the American Statistical Association. 2001;96:1410–1423.
  • Rotnitzky A, Robins JR. Semiparametric regression estimation in the presence of dependent censoring. Biometrika. 1995;82:805–820.
  • Rush AJ, Fava M, Wisniewski SR, Lavori PW, Trivedi MH, Sackeim HA, Thase ME, Nierenberg AA, Quitkin FM, Kashner TM, and others. Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design. Controlled Clinical Trials. 2004;25:119–142.
  • Wahed AS, Tsiatis AA. Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials. Biometrics. 2004;60:124–133.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press