Clin Trials. Author manuscript; available in PMC 2011 January 1.

PMCID: PMC2999650

NIHMSID: NIHMS210900

Ree Dawson, Ph.D. and Philip W. Lavori, Ph.D.

Ree Dawson, Frontier Science Technology and Research Foundation, 900 Commonwealth Ave., Boston MA 02215, U.S.A;

Corresponding Author: Ree Dawson, Email: dawson@fstrf.dfci.harvard.edu

The publisher's final edited version of this article is available at Clin Trials


Sequential Multiple Assignment Randomized (SMAR) designs are used to evaluate treatment policies, also known as adaptive treatment strategies (ATS). The determination of SMAR sample sizes is challenging because of the sequential and adaptive nature of ATS, and the multi-stage randomized assignment used to evaluate them.

We derive sample size formulae appropriate for the nested structure of successive SMAR randomizations. This nesting gives rise to ATS that have overlapping data, and hence between-strategy covariance. We focus on the case when covariance is substantial enough to reduce sample size through improved inferential efficiency.

Our design calculations draw upon two distinct methodologies for SMAR trials, using the equality of the optimal semi-parametric and Bayesian predictive estimators of standard error. This ‘hybrid’ approach produces a generalization of the t-test power calculation that is carried out in terms of effect size and regression quantities familiar to the trialist.

Simulation studies support the reasonableness of underlying assumptions, as well as the adequacy of the approximation to between-strategy covariance when it is substantial. Investigation of the sensitivity of formulae to misspecification shows that the greatest influence is due to changes in effect size, which is an *a priori* clinical judgment on the part of the trialist.

We have restricted simulation investigation to SMAR studies of two and three stages, although the methods are fully general in that they apply to ‘*K*-stage’ trials.

Practical guidance is needed to allow the trialist to size a SMAR design using the derived methods. To this end, we define ATS to be ‘distinct’ when they differ by at least the (minimal) size of effect deemed to be clinically relevant. Simulation results suggest that the number of subjects needed to distinguish distinct strategies will be significantly reduced by adjustment for covariance only when small effects are of interest.

Growing interest in the development of individualized treatments has led to a new generation of clinical trials and methodology aimed at evaluating treatment policies with full experimental rigor. The clinical trials are designed in accord with the sequential decision making underlying the policies. In these studies, successive courses of treatment are randomly and adaptively assigned over time, according to the individual subject's treatment and response history; each stage of randomization corresponds to a decision stage of the dynamic treatment policies under evaluation. Multi-stage randomization trials have been used in fields as diverse as psychiatric, cancer and AIDS research, and as early as the 1980s, for the evaluation of oncology treatment policies; see [1]. In the recent statistical literature, treatment policies have been described as adaptive treatment strategies (ATS) because treatment changes are tailored to the circumstances of the individual, including response to prior treatments [2]; the multi-stage randomization designs for evaluating ATS have been described as sequential, multiple assignment, randomized (SMAR) designs [3]. We adopt both terminologies in this paper.

To make ideas concrete, consider a two-stage generic ATS that specifies ‘Start on medication *A*, change treatment to *B* if the patient's symptoms continue under *A*, otherwise maintain on *C*′, where *A*, *B*, and *C* are fixed for a particular strategy. Figure 1 depicts the SMAR trial to evaluate competitive choices for *A*, *B*, and *C*, using the state *S* to indicate whether symptoms persisted under *A*.

Two-stage SMAR design for evaluating two-stage strategies. There are two options for initial medication, *A* and *A**. The alternatives for second medication depend on initial medication and response to it. For example, responders to *A** are randomized to **...**

The SMAR structure illustrated in Figure 1 is well suited for the development of whole strategies. In particular, the design enables detection of interactive effects in treatment sequences that may be overlooked by the ‘myopic’ approach that evaluates each treatment decision with a separate single-stage trial [4]. For example, the medication *A* may be superior at symptom reduction (measured by *S*), but the optimal two-stage ATS may start with the alternative *A** because it enhances the effects of particular secondary treatments [3]. Generally, SMAR treatment alternatives are fully nested over time, according to clinical equipoise, to insure that the design encompasses the complete set of ATS relevant to determining the best strategy. This property distinguishes the SMAR design from those that randomize subjects at baseline to a subset of the strategies that would be determined sequentially by equipoise. For either approach, it is possible to evaluate the effect of initial (or a later stage) treatment by comparing whole ATS in the study that begin with different treatments [3,5]. The use of ATS for this purpose insures against fallacies associated with the myopic approach. In this paper, we focus on the comparison of whole strategies, as this is typically the primary goal of a SMAR trial.

Much of the development for SMAR trials has centered on methods for inference [6-8]. For reasons described above, evaluation is in terms of a ‘final’ outcome (measured after the last decision stage) that takes into account the sequence of intervening outcomes due to successively applied treatments [3]. Improvements to estimators have been proposed to provide gains in efficiency [9-12]. Efficiency is of particular importance to SMAR trials for two reasons. First, the original sample splits randomly and adaptively at each stage of the design, with subjects nested within strata defined by previous treatment and response history. Hence, the sequence of randomizations creates a monotone pattern of *ignorable* missingness for each ATS in the study [6]; improved efficiency helps address the resulting loss of statistical power. Second, the nested structure of treatment assignment gives rise to ATS that have overlapping treatments, e.g., two or more strategies may (adaptively) specify the same initial treatment [6]. The greater the overlap between a pair of strategies, the more likely their causal difference is to be diminished. Hence, if two strategies with substantial overlap are to be compared, it is important to size the trial efficiently.

Less attention has been given to design methodology. A significant challenge is the development of methods for sample size determination because of the sequential and adaptive nature of both the strategies under study and the treatment assignment mechanism used to assign subjects to the ATS. Consider the generic ATS, and assume that the trial uses a final outcome *Y*, which summarizes symptom severity over time, to evaluate alternatives for *A*, *B*, and *C*. The unknown population parameters required for sample size calculations not only include those for the distribution of *Y*, but also for the distribution of the intervening state *S*. In particular, because the second randomization is adaptive, the distribution of *Y* will be stratified according to the *unknown* distribution of *S* (the responses to *A*). Adding another decision stage further splits the stratification, leading to many more unknowns than commonly needed for sample size calculations, as well as greater uncertainty about their values.

For specific contexts, simulation has been used to evaluate design requirements [7,13]. A sample size formula has also been developed for two-stage trials with time-to-event endpoints when two strategies specify the same initial treatment [14]. To provide a general method for trials with continuous outcomes, Murphy appeals to ‘working’ assumptions about treatment assignment to circumvent the need to specify parameters for the unknown distribution of state (response) history when determining the number of subjects for a SMAR trial. This allows a simple upper bound on the variance of the estimated mean for the ATS to be derived solely in terms of the population ‘within-strategy’ variance of *Y*; simulations show that the sample size formula based on the bound may be conservative when treatments are assigned adaptively in terms of prior state history [3]. The approach has been extended to cover a range of research questions that can be addressed by the SMAR design [5]. As in Murphy's original paper, the test statistics used to size SMAR trials to specific questions assume that strategy means have been estimated with semi-parametric marginal means (MM) models [15]. However, the MM variance estimator has been shown to be upwardly biased for clinically realistic scenarios, which may lead to an excessive number of study subjects when used in sample size formulae [6,10].

Recently, Dawson and Lavori [16] developed an alternative method for determining sample sizes for SMAR trials with continuous outcomes, using *optimal* (efficient) semi-parametric theory as a basis for those calculations. The efficiency gains provided by optimal estimation effectively eliminate the problem of bias with the MM variance estimator. Moreover, by appealing to assumptions typical of design calculations, but adapted to the SMAR setting, the population within-strategy variance can be expressed in terms of effect size and regression quantities familiar to the trialist. This leads to a simple sample size formula for pairwise comparisons based on the t-test, and eliminates the need to specify intervening response rates or to assume worst case scenarios. Nonetheless, a key practical issue remains for the trialist intending to size a SMAR trial, which arises because of the nested nature of the randomization. Specifically, any overlap between a pair of ATS (created by sequential treatment assignment) not only diminishes their causal difference, but also introduces positive between-strategy covariance. Hence causal differences are getting smaller, while inference (that takes into account covariance) is getting more efficient. It is unclear what impact this has on observable effect sizes, and how design calculations should be carried out to efficiently size SMAR trials. In this article, we consider the role of covariance, and equivalently of overlap, in determining sample sizes for multi-stage designs.

A tree is a canonic way to depict the nested structure of a SMAR realization (see Figure 1). For the simplest design, the first stage of randomization is represented by the initial node that divides into branches, one for each possible treatment option. Because the next randomization is adaptive, each ‘treatment’ branch subdivides into further branches, one for each potential subsequent response. The tree continues to subdivide in the same way for successive stages of the design, along the treatment-response branch paths defined by previous randomizations. The set of strategies under evaluation determines the alternative treatment options and how they are adaptively applied, and consequently the SMAR tree or design. For example, the trialist would ‘dial’ in different choices for the placeholders *A*, *B*, and *C* in the example ATS in the Introduction, according to the strategies of *a priori* interest, which in turn would specify the treatment alternatives for a two-stage SMAR study. As for fixed treatment trials, the choices for *A* should satisfy clinical equipoise. The choices for *B* and *C* could vary according to the choices for *A*, and would satisfy *sequential* equipoise for the adaptive use of *B* and *C*, following the initial use of a particular choice for *A*.

For purposes of design and inference, the multi-stage design can be described sequentially in terms of the adaptive randomized treatment assignments. Consider a SMAR trial with three stages of randomization. Let *S*_{1} be the baseline state of the subject, and let *A*_{1} be the initial treatment randomly assigned on the basis of *S*_{1}. For stage two, let *S*_{2} be the status of the subject measured at the start of the second stage and *A*_{2} the treatment assigned by the second randomization according to **S**_{2} = (*S*_{1}, *S*_{2}) and *A*_{1}. Analogously, *S*_{3} is measured at the start of the third stage and *A*_{3} is assigned by the last randomization according to **S**_{3} = (*S*_{1}, *S*_{2}, *S*_{3}) and **A**_{2} = (*A*_{1}, *A*_{2}). The sequence of randomizations can be expressed as (sequential) assignment to alternative decision rules, each of which determines treatment for the next stage of the study as a function of the state and treatment history to date. We write *d*_{k}(

For a three-stage SMAR trial, each observable sequence {*d*_{1}, *d*_{2}, *d*_{3}} corresponds to an ATS, which we denote as ** d**, if the domain for each successive decision rule includes the state-treatment histories produced by previous rules in the sequence. This condition insures that the ATS is a well-defined policy for adaptively determining the ‘next’ treatment. A two-stage trial similarly evaluates sequences {

To evaluate competitive ATS, the SMAR design includes a primary outcome *Y*, obtained after the last stage of randomization, assumed in this paper to be continuous. We judge the performance of an ATS ** d** by

Under appropriate experimental conditions detailed in the Appendix (e.g., sequentially blocked randomization), the optimal semi-parametric estimator of *μ***_{d}** obtained from SMAR data, denoted

The estimator of strategy means on which the sample size formulae are based can be expressed in terms of stage-specific, stratified quantities as:

$${\widehat{\mu}}_{\mathit{\text{d}}}=\sum _{{\mathit{\text{s}}}_{3}}{\phi}_{3}({\mathit{\text{s}}}_{3}){m}_{3}({\mathit{\text{s}}}_{3})$$

(1)

where *m*_{3}(*s*_{3}) is the sample mean of final responses among subjects sequentially randomized to ** d** through the final stage and having state values

$${\phi}_{3}({\mathit{\text{s}}}_{3})=\prod _{k=1}^{3}{f}_{k}({\mathit{\text{s}}}_{k})$$

(2)

where *f _{k}*(

It is possible to derive a formula for population within-strategy variance suitable for design calculations, using V(*U _{opt}*) and a simplifying assumption [16]. At each stage, consider the residual obtained by the regression of the final outcome

$${\mathrm{\sigma}}_{Y}^{2}{P}_{3}^{-1}\left[1-(1-{p}_{1}{p}_{2}{p}_{3}){R}_{1}^{2}-(1-{p}_{2}{p}_{3}){R}_{2}^{2}-(1-{p}_{3}){R}_{3}^{2}\right]$$

(3)

where
${\mathrm{\sigma}}_{Y}^{2}$ is the variance of *Y***_{d}**,

We refer to the multiplier of
${\mathrm{\sigma}}_{Y}^{2}$ in (3) as the ‘variance inflation factor’ (VIF) due to the SMAR design. It accounts for the loss of precision due to missingness created by successive randomizations, relative to a trial that would allocate all subjects to ** d**. It also makes explicit the efficiency gains due to semi-parametric optimality, as the first term in (3) corresponds to the MM variance estimator [16].
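As a concrete check on (3), the VIF can be computed directly once the stage-wise randomization probabilities *p _{k}* and the
${R}_{k}^{2}$ values are specified. The sketch below is a minimal illustration under hypothetical inputs (equal 1/2 randomization and equal
${R}_{k}^{2}$ at each stage), not values from any trial.

```python
# Sketch of the variance inflation factor (VIF) in eq. (3) for a three-stage
# SMAR design with fixed randomization probabilities. Inputs are hypothetical.

def smar_vif(p, r2):
    """VIF of eq. (3): P_3^{-1} [1 - sum_k (1 - p_k p_{k+1} ... p_K) R_k^2]."""
    K = len(p)
    assert len(r2) == K
    P = 1.0
    for pk in p:
        P *= pk                   # P_K = product of all randomization probs
    vif = 1.0
    for k in range(K):
        tail = 1.0
        for j in range(k, K):
            tail *= p[j]          # p_k p_{k+1} ... p_K
        vif -= (1.0 - tail) * r2[k]
    return vif / P

p = [0.5, 0.5, 0.5]               # 1/2 randomization at each stage
print(smar_vif(p, [0.0, 0.0, 0.0]))  # all R_k^2 = 0: the MM bound P_3^{-1} = 8
print(smar_vif(p, [0.2, 0.2, 0.2]))  # efficiency gain from nonzero R_k^2
```

Setting all
${R}_{k}^{2}=0$ recovers the first term of (3), i.e., the MM variance multiplier; positive
${R}_{k}^{2}$ values shrink the VIF, which is the efficiency gain from semi-parametric optimality.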

When randomization probabilities for ** d** depend on state history, (3) becomes:

$${\mathrm{\sigma}}_{Y}^{2}\left[{E}_{\mathit{\text{d}}}({P}_{3}^{-1})-{E}_{\mathit{\text{d}}}\left((1-{p}_{1}{p}_{2}{p}_{3}){P}_{3}^{-1}\right){R}_{1}^{2}-{E}_{\mathit{\text{d}}}\left((1-{p}_{2}{p}_{3}){P}_{3}^{-1}\right){R}_{2}^{2}-{E}_{\mathit{\text{d}}}\left((1-{p}_{3}){P}_{3}^{-1}\right){R}_{3}^{2}\right]$$

(4)

where the expectation *E***_{d}**() is calculated under the distribution of

To illustrate SMAR sample size calculations, we consider the two-stage ATS in the Introduction, and suppose there is interest in comparing two versions, denoted ** d** and

Let ES be the standardized effect size of interest specified by the trialist. To generalize the usual t-test formula for sample size, we pool the marginal outcome variance of ** d** and

$${P}_{2}^{-1}[1-(1-{p}_{2}){R}_{T}^{2}]$$

in the absence of using a baseline state for the first randomization. For illustration, set the nominal level of power to be achieved to 0.80, with the level of the test = 0.05. Then the sample size is calculated as
$7.9\times 2\times {\mathrm{VIF}}_{p}/{\mathrm{ES}}^{2}$. If the final outcome is strongly related to the intermediate state *S*_{2} for both strategies, e.g.,
${R}_{T}^{2}=0.7$ for ** d** and
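The generalized t-test calculation above is easy to reproduce numerically. The sketch below assumes 1/2 randomization at both stages and a hypothetical standardized effect size ES = 0.5 (the actual value is a clinical judgment); the
${R}_{T}^{2}=0.7$ value follows the illustration in the text.

```python
import math

# Two-stage VIF per the displayed formula: P_2^{-1} [1 - (1 - p_2) R_T^2],
# then the generalized t-test sample size n = 7.9 * 2 * VIF_p / ES^2
# (power 0.80, two-sided alpha = 0.05). Inputs are illustrative assumptions.

def two_stage_vif(p1, p2, r2_T):
    return (1.0 / (p1 * p2)) * (1.0 - (1.0 - p2) * r2_T)

# Pool the VIFs of the two strategies; here both use R_T^2 = 0.7 for simplicity.
vif_p = 0.5 * (two_stage_vif(0.5, 0.5, 0.7) + two_stage_vif(0.5, 0.5, 0.7))

es = 0.5                                  # hypothetical effect size
n = math.ceil(7.9 * 2 * vif_p / es**2)    # subjects per comparison
print(vif_p, n)                           # VIF_p = 2.6 under these inputs
```

The factor 7.9 is the usual two-group t-test constant 2(z_{0.975} + z_{0.80})²/2 for 80% power at the 0.05 level; the VIF replaces the unit variance of the standard formula.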

The t-test generalization ignores the role of between-strategy covariance in SMAR inference, which may lead to an excessive number of subjects. To derive an adjustment to sample size when covariance is likely to be substantial, we consider the case when two ATS agree except for the last decision rule. As detailed in the Appendix, we make assumptions that imply *μ***_{d}** =

$${n}^{\ast}=n\,(1-{R}_{p}^{2})\,\frac{{P}_{3}^{-1}}{{\mathrm{VIF}}_{p}};\qquad {P}_{3}=\prod _{k=1}^{3}{p}_{k}$$

(5)

where *n* is the sample size calculated without regard to covariance, and
${R}_{p}^{2}$ is the mean of the
${R}_{T}^{2}$ values for the two strategies. The adjustment factor
${n}^{\ast}/n$ approximates the ratio of the (pooled) variance that takes into account between-strategy covariance, to the (pooled) variance that does not, as desired.

The assumed scenario is generally not realistic, and will be ‘anti-conservative’, in that the reduced sample size will be too small, when causal differences are constant conditional on state history *S*_{3}. To improve the adjustment (5), we use ES^{2} as a crude upper bound for the relative error in total variance due to the covariance approximation; this gives *n*** = *n** + ES^{2}*n*. The underlying rationale for the correction suggests that *n*** will be conservative, i.e., provide more than the nominal level of power for the pairwise comparison. See the Appendix for a complete exposition.
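The covariance adjustment (5) and the correction *n*** = *n** + ES^{2}*n* can be sketched as follows; every numerical input below is a hypothetical illustration, not a value from the paper.

```python
import math

# Covariance-adjusted sample sizes: n* from eq. (5) and the conservative
# correction n** = n* + ES^2 * n described in the text. Inputs are hypothetical.

def adjusted_sizes(n, r2_p, p, vif_p, es):
    """n* = n (1 - R_p^2) P_K^{-1} / VIF_p ;  n** = n* + ES^2 * n."""
    PK = 1.0
    for pk in p:
        PK *= pk                                  # P_K = prod of p_k
    n_star = n * (1.0 - r2_p) / (PK * vif_p)
    n_dstar = n_star + es**2 * n
    return math.ceil(n_star), math.ceil(n_dstar)

# Example: n = 200 from the unadjusted formula, pooled R_p^2 = 0.7,
# three stages with 1/2 randomization, pooled VIF = 4.6, ES = 0.3.
print(adjusted_sizes(200, 0.7, [0.5, 0.5, 0.5], 4.6, 0.3))
```

Note that *n** < *n*** < *n* in this example, matching the intended ordering: the covariance adjustment reduces the sample size, and the ES² correction claws back part of that reduction to guard against anti-conservatism.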

As a simple illustration of the adjustments for between-strategy covariance, assume now that ** d** and

To develop a strategy for determining SMAR sample size requirements, we posit the following set up. The trialist specifies *a priori* the *ES* of clinical relevance, i.e., effects smaller than that are not worth detecting. The appropriate sample size is one that insures (i) any pairwise comparison arising from the trial will be fully powered if effects are at least *ES*; (ii) resources are not ‘wasted’ on comparisons smaller than *ES*. Conceptually, it's useful to think of pairs of strategies as either distinct (having effects at least *ES*) or not; the required sample size will be the maximum needed for any comparison of distinct strategies.

As discussed in the Introduction, a key question is what role between-strategy covariance might play in sample size calculations. Any overlap in treatment differences when *ES* is sufficiently large may preclude a ‘distinct’ causal difference, and covariance can likely be ignored. However, for small enough effect sizes, two strategies could be distinct, despite common treatments; in this case, covariance might be substantial enough to require consideration. To make this concrete, consider a SMAR trial with *K* stages in which treatment under an ATS is either uniformly effective or uniformly ineffective across state histories, at any particular stage of the study. For two strategies in the trial, let *δ* (formally a function of the pair) be the stage at which their treatments diverge. For example, *δ* = 1 implies there are no common treatments for a pair of ATS; *δ* = *K* implies all but the last treatments are common. Assume now that as *δ* varies from 1 to *K*, there are at least two strategies divergent at *δ*, such that one is uniformly effective thereafter, while the other is uniformly ineffective. This condition insures that the potential for distinct strategies is as great as possible, even as pairwise covariances increase in magnitude. As constructed, the choice of *ES* determines the ‘*δ* threshold’ after which strategies fail to be distinct.

We conducted simulations to evaluate the proposed formulae for sample size requirements, to assess the role of covariance in design calculations, and to understand the impact of misspecification. To operationalize the SMAR trial described above, we adopted a variant on the simulation scheme used to evaluate the performance of predictive and semi-parametric estimators [6,10]. The central feature of the scheme is a transition matrix (TM), which is used to generate states over time for a particular strategy; the distribution of strategy-specific states is governed by the choice of entries for TM. To explicitly allow for simulated effects due to the final treatment *A _{K}* (for a

We set *K* = 2,3; *β* is varied to generate different scenarios for
${R}_{k}^{2}$ and
${R}_{T}^{2}$ values. In all cases,
${\mathrm{\sigma}}_{e}^{2}=1$ and the intercept *β*_{0} = 0.5, which is the coefficient for *S*_{0} ≡ 1. States are assumed to be binary at each stage of the SMAR trial: *S*_{k} = 1, 2, with the higher value indicative of poorer response, and equiprobable values for baseline

$${\mathrm{TM}}_{G}=\left(\begin{array}{cc}0.7& 0.3\\ 0.5& 0.5\end{array}\right),\qquad {\mathrm{TM}}_{B}=\left(\begin{array}{cc}0.5& 0.5\\ 0.4& 0.6\end{array}\right)$$

where TM* _{ij}* = Pr(j|i) for TM = TM
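The state-generation step of this simulation scheme can be sketched directly from the two transition matrices above. The coding of states as 1/2 with an equiprobable baseline follows the text; the path count and random seed are arbitrary choices for illustration.

```python
import random

# Strategy-specific transition matrices from the text:
# TM[i][j] = Pr(S_{k+1} = j+1 | S_k = i+1).
TM_G = [[0.7, 0.3], [0.5, 0.5]]   # 'good' response distribution
TM_B = [[0.5, 0.5], [0.4, 0.6]]   # 'bad' response distribution

def simulate_path(tm, K, rng):
    """One state path S_1, ..., S_K with equiprobable baseline S_1 in {1, 2}."""
    s = [rng.randint(1, 2)]
    for _ in range(K - 1):
        row = tm[s[-1] - 1]
        s.append(1 if rng.random() < row[0] else 2)
    return s

rng = random.Random(0)
paths = [simulate_path(TM_G, 3, rng) for _ in range(20000)]
frac_good_s2 = sum(p[1] == 1 for p in paths) / len(paths)
print(frac_good_s2)  # ~0.6 = 0.5*0.7 + 0.5*0.5 under TM_G
```

Repeating the simulation under TM_{B}, or switching matrices at the stage *δ* where two strategies diverge, generates the state distributions needed to evaluate the sample size formulae empirically.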

Table 1 summarizes 1000 replications of each of the scenarios. The results show that the original sample size formula (ignoring covariance) achieves the nominal level of power of 0.80 for *δ* < *K*, but is overly conservative when strategies diverge at the last point; see columns 1-3 for *K* = 3 and columns 1-2 for *K* = 2. Table 1 also confirms that the approximation to covariance derived for the case *δ* = *K* (GGG vs. GGB; GG vs. GB) used by *n** is ‘anti-conservative’, as expected. However, the additional adjustment to sample size provided by *n*** is reasonable, and with two exceptions is conservative but within the simulation standard error of 0.013.

Some of the scenarios in Table 1 were modified by setting σ* _{e}* = 2

The performance of the sample size formulae detailed in Table 1 depends on the trialist knowing the ‘true’ population quantities underlying the simulations. To study the impact of misspecification error, we selected pairs of regression coefficients from Table 1, one assumed to be true, designated ** β**, and one assumed to be in error, designated

Rows 4-6, 9-10 show substantial changes in VIF due to ‘error’, and that the calculated sample sizes are excessive when VIF* _{e}* is specified instead of

The results in Table 2 suggest that sample size is somewhat robust to misspecification errors in the
$\{{R}_{k}^{2}/{R}_{T}^{2}\}$. For example, the proportionate increments generated for the strategy BB (*K* = 2, *δ* = 1) are (0.43, 0.57) when ** β** = (1,1,3), and (0.24, 0.76) when

Less obvious to discern from Table 2 is the influence of the last treatment *A _{K}* (which produces outcome

The simulation set up provides a way to calculate sample sizes when randomization probabilities depend on the history of the individual subject. For example, suppose that the two-stage ATS in the Introduction keeps patients on initial treatment if symptoms abate; subjects who remit will not be randomized again in the SMAR trial. Assume further that the trialist sets *ES* to be moderate, and uses the comparison of strategies GG to BG to size the study, where the second ‘G’ of GG or BG only applies to subjects requiring a second randomization (i.e., *S*_{2} = 2 for an individual subject). In this case, calculations are similar to previous ones for comparing ** d** and

$${E}_{\mathit{\text{d}}}({P}_{2}^{-1})=2Pr[{\mathit{\text{S}}}_{2}=(1,1)]+4Pr[{\mathit{\text{S}}}_{2}=(1,2)]+2Pr[{\mathit{\text{S}}}_{2}=(2,1)]+4Pr[{\mathit{\text{S}}}_{2}=(2,2)]$$

(6)

noting that
${P}_{2}^{-1}=2$ for a single randomization. For GG,
${{\rm E}}_{d}({P}_{2}^{-1})=2(0.35)+4(0.15)+2(0.25)+4(0.25)=2.74$, and similarly
${{\rm E}}_{\mathit{\text{d}}}({P}_{2}^{-1})=2.7$ for BG, given equiprobable *S*_{1}. Also,
${{\rm E}}_{\mathit{\text{d}}}((1-{p}_{2}){P}_{2}^{-1})=2(0.15)+2(0.25)=0.8$ and 0.9 for GG and BG, respectively, since 1- *p*_{2} = 0 if *S*_{2} = 1. Assuming
${R}_{T}^{2}=0.7$ as before, VIF* _{p}* = 2.13, compared to 2.6 when
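Expectations of the form (6) can be checked mechanically: each state history contributes its probability times the inverse of the product of randomization probabilities along the path. The state-history probabilities in the sketch below are hypothetical, not those of the worked example.

```python
# Sketch of eq. (6): E_d(P_2^{-1}) when remitters (S_2 = 1) skip the second
# randomization, so P_2^{-1} = 2 after one randomization and 4 after two.
# The probabilities below are hypothetical illustration values.

def expected_inv_prob(probs, weights):
    """E_d(P_2^{-1}) = sum over state histories of Pr(s_2) * P_2(s_2)^{-1}."""
    assert abs(sum(probs) - 1.0) < 1e-9   # histories must exhaust the space
    return sum(pr * w for pr, w in zip(probs, weights))

# Histories ordered (1,1), (1,2), (2,1), (2,2); weight 2 if S_2 = 1, else 4.
probs = [0.30, 0.20, 0.25, 0.25]          # hypothetical Pr[S_2 = (s_1, s_2)]
weights = [2, 4, 2, 4]
e_inv = expected_inv_prob(probs, weights)
print(e_inv)  # 2*0.30 + 4*0.20 + 2*0.25 + 4*0.25 = 2.9
```

The companion terms such as E_d((1 − p_2)P_2^{-1}) follow the same pattern, with a zero weight on histories where no second randomization occurs.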

Analogous calculations could be performed for when *ES* is deemed to be large (e.g., GG vs. BB) or small (e.g., GG vs. GB), taking into account between-strategy covariance for the latter case. For a particular specification for *ES*, simulation could be used to find the pair of ATS with the smallest ‘distinct’ effect, in order to determine the required sample size.

In this paper, we draw upon two distinct methodologies for SMAR trials to develop a hybrid approach to design calculations. The simulation results support the reasonableness of the sequential homogeneity of variance assumption used to map the population optimal variance into familiar regression quantities, as well the adequacy of the approximation to between-strategy covariance when overlap between strategies is substantial. The regression framework for the calculations allows the trialist to ‘guess’ or specify the strength of the relationship of final outcomes to state history, as well as the relative importance of each stage of the study, in order to determine sample sizes for pairwise comparisons. Furthermore, the simulation set up itself can be used by the trialist to ‘firm up’ guesses for variance inflation factors and effect sizes, as well as gauge whether adjustment for between-strategy covariance is needed, using clinical appraisal of what constitutes a ‘good’ and ‘not so good’ response to treatment.

We also used simulation to investigate the sensitivity of sample size formulae to misspecification. The results show that the greatest influence is due to ES, which is an *a priori* clinical judgment on the part of the trialist. The results further suggest an intrinsic robustness to misspecifying regression quantities, particularly the relative contribution of each SMAR stage to the variance inflation factor. Moreover, sensitivity to specification of
${R}_{T}^{2}$ is diminished when the last stage of treatment has a pronounced effect on final outcome, in comparison to earlier stages, as might typify SMAR realizations.

We designate strategies as distinct or not, according to an *a priori* specified effect size of clinical relevance, to develop a strategy for determining sample sizes reflective of the SMAR design. Patients are randomized sequentially to a set of nested treatment options, which not only gives rise to a multiplicity of strategies to be evaluated, but also insures ‘partial’ redundancy among competitors (induced by common treatment history). In addition, differences among ATS are likely to be attenuated by their own sequential structure, e.g., early gains in efficacy are prone to erosion over time. The significance of this to the trialist is that many strategies may be ‘nearly’ as effective (or not), and that it is necessary to define ‘nearly’ in a clinically meaningful way before the start of the study. The notion of effect size offers one way to specify a neighborhood of indifference for SMAR trials that is accessible to clinicians and researchers alike. Our simulation results suggest that the number of subjects needed to distinguish distinct strategies will be significantly reduced by adjustment for covariance only when small effects are of interest.

We remark that presentation has been limited to two- and three-stage SMAR designs, because of the practical focus of the paper. All methodological results extend readily to the *K*-stage study [16].

Supported by National Institute of Mental Health Grant No. R01-MH51481 to Stanford University.

Throughout, we assume the observed proportions assigned to treatments at each stage coincide with the fixed randomization probabilities of the design; such coincidence occurs asymptotically by the law of large numbers and might be achieved in a study using sequentially blocked randomization [6,10]. When this condition holds, the predictive and optimal semi-parametric estimators of *μ***_{d}** and its standard error are equal [16].

To adjust design calculations for between-strategy covariance, we decompose the predictive variance for ${\widehat{\mu}}_{\mathit{\text{d}}}$, denoted

$${\widehat{v}}_{PR}={\widehat{v}}_{n}+{\widehat{v}}_{p}$$

where (suppressing dependence on state history)

$${\widehat{v}}_{n}=\sum _{{\mathit{\text{s}}}_{3}}{\phi}_{3}^{2}\,\widehat{v}({m}_{3});\qquad {\widehat{v}}_{p}=\sum _{{\mathit{\text{s}}}_{3},{\mathit{\text{s}}}_{3}^{\prime}}{m}_{3}{m}_{3}^{\prime}\,\widehat{\mathrm{cov}}({\phi}_{3},{\phi}_{3}^{\prime})$$

(7)

and $\widehat{v}({m}_{3})\equiv \widehat{v}({m}_{3}({\mathit{\text{s}}}_{3}))$ is the sample variance of *m*_{3}(*s*_{3}). (When the argument
${\mathit{\text{s}}}_{3}^{\prime}$ is suppressed, we mark the function instead, e.g.,
${\phi}_{3}^{\prime}={\phi}_{3}({\mathit{\text{s}}}_{3}^{\prime})$.) The first term ${\widehat{v}}_{n}$ is the ‘naïve’ variance estimate that assumes the coefficients of

Let *v _{p}* be the population counterpart to

$$\frac{{v}_{\mathit{\text{opt}}}-\text{}{v}_{p}}{{v}_{\mathit{\text{opt}}}}=\frac{{\mathrm{\sigma}}_{K}^{2}{P}_{3}^{-1}}{{\mathrm{\sigma}}_{Y}^{2}\mathit{\text{VIF}}}$$

(8)

where *v _{opt}* is the population optimal within-strategy variance. (Note that

Assumption (ii) is generally not realistic, as it implies with (i) that the population means of the two strategies agree, despite differences in final treatment. Furthermore, if the second assumption fails to hold *and* causal effects are constant across subgroups indexed by state history, the adjustment is ‘anti-conservative’ in that the reduced sample size will be too small. This occurs because the between-strategy covariance will be overstated by the amount
${\Delta}^{2}{\sum}_{{\mathit{\text{s}}}_{3},{\mathit{\text{s}}}_{3}^{\text{'}}}c{v}_{3}$, where
$c{v}_{3}\equiv c{v}_{3}({\mathit{\text{s}}}_{3},{\mathit{\text{s}}}_{3}^{\text{'}})$ is the population counterpart to
$\mathrm{c}\widehat{\mathrm{o}}\mathrm{v}({\phi}_{3},{\phi}_{3}^{\text{'}})\equiv \mathrm{c}\widehat{\mathrm{o}}\mathrm{v}({\phi}_{3}({\mathit{\text{s}}}_{3}),{\phi}_{3}({\mathit{\text{s}}}_{3}^{\text{'}}))$; as before, *Δ* is the difference in population means. To see this, fix
${\mathit{\text{s}}}_{3},{\mathit{\text{s}}}_{3}^{\text{'}}$ and let
${\mu}_{3},{\mu}_{3}^{\text{'}}$ be the corresponding population subgroup means. The term corresponding to
${\mathit{\text{s}}}_{3},{\mathit{\text{s}}}_{3}^{\text{'}}$ in *v _{p}* +

$$c{v}_{3}\phantom{\rule{0.2em}{0ex}}({\mu}_{3}{\mu}_{3}^{\text{'}}+{\stackrel{\sim}{\mu}}_{3}{\stackrel{\sim}{\mu}}_{3}^{\text{'}}-2{\mu}_{3}{\stackrel{\sim}{\mu}}_{3}^{\text{'}})\phantom{\rule{0.4em}{0ex}}=c{v}_{3}\phantom{\rule{0.2em}{0ex}}\Delta ({\mu}_{3}-{\stackrel{\sim}{\mu}}_{3}^{\text{'}})$$

(9)

where we use $\sim$ to distinguish strategy-specific quantities. If $s_3 = s_3'$, then (9) is equal to $cv_3\Delta^2$ (since $\mu_3 - \tilde{\mu}_3 = \Delta$ by assumption). Otherwise, it is necessary to pair the terms for $(s_3, s_3')$ and $(s_3', s_3)$ to obtain the result, noting that $\Delta(\mu_3 - \tilde{\mu}_3) + \Delta(\mu_3' - \tilde{\mu}_3') = 2\Delta^2$.
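As a quick numerical sanity check, identity (9) and the pairing step above can be verified directly. The subgroup means and effect size below are arbitrary hypothetical values, with the tilde means set by the constant-effect assumption $\tilde{\mu} = \mu - \Delta$; the common factor $cv_3$ is divided out of both sides.

```python
# Sanity check of identity (9) under the constant-causal-effect assumption.
# mu3, mu3p are hypothetical subgroup means; Delta is the (constant) effect.
mu3, mu3p, Delta = 1.7, 0.4, 0.9
mu3_t, mu3p_t = mu3 - Delta, mu3p - Delta   # tilde (other-strategy) means

# Left and right sides of (9), with the common factor cv3 divided out.
lhs = mu3 * mu3p + mu3_t * mu3p_t - 2 * mu3 * mu3p_t
rhs = Delta * (mu3 - mu3p_t)
assert abs(lhs - rhs) < 1e-12

# Pairing the (s3, s3') and (s3', s3) terms recovers 2*Delta^2.
paired = Delta * (mu3 - mu3_t) + Delta * (mu3p - mu3p_t)
assert abs(paired - 2 * Delta ** 2) < 1e-12
```

The identities hold for any choice of means, since they are purely algebraic consequences of $\tilde{\mu} = \mu - \Delta$.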

To get $n^{**}$, we use $\text{ES}^2$ as a 'crude' upper bound for the relative error in total variance due to the covariance approximation:

$$\frac{\Delta^2 \sum_{s_3,s_3'} cv_3}{v_{\mathit{OPT},p}} = \frac{\Delta^2}{\sigma_p^2}\,\frac{\sum_{s_3,s_3'} cv_3}{\text{VIF}_p} = \text{ES}^2\,\frac{\sum_{s_3,s_3'} cv_3}{\text{VIF}_p}$$

where $v_{\mathit{OPT},p}$ is obtained by pooling
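The factorization in the display above, which rests on the implicit relations $v_{\mathit{OPT},p} = \sigma_p^2\,\text{VIF}_p$ and $\text{ES} = \Delta/\sigma_p$, can likewise be checked numerically. All inputs below are assumed illustrative values, not estimates from any trial.

```python
# Check the ES^2 factorization of the relative covariance error.
# All values are illustrative assumptions.
Delta, sigma_p, VIF_p, sum_cv3 = 0.5, 1.25, 1.8, 0.12
v_opt_p = sigma_p ** 2 * VIF_p        # implicit in the displayed identity
ES = Delta / sigma_p                  # standardized effect size

rel_err = Delta ** 2 * sum_cv3 / v_opt_p   # left-hand side
factored = ES ** 2 * sum_cv3 / VIF_p       # right-hand side
assert abs(rel_err - factored) < 1e-12
```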

Ree Dawson, Frontier Science Technology and Research Foundation, 900 Commonwealth Ave., Boston MA 02215, U.S.A.

Philip W. Lavori, Department of Health Research and Policy, M/C 5405, Stanford CA 94305, U.S.A.

