
Article sections

- Abstract
- 1 Introduction
- 2 Stepped wedge design parameters
- 3 Statistical model for three level data structure
- 4 Estimate of intervention effect and power function
- 5 Power under model for two-level data
- 6 Simulation study
- 7 Discussion
- Supplementary Material
- References


Stat Methods Med Res. Author manuscript; available in PMC 2017 September 17.

Published before final editing as:

Published online 2016 March 17. doi: 10.1177/0962280216632564

PMCID: PMC5026912

NIHMSID: NIHMS795985


The publisher's version of this article, before final editing, is available at Stat Methods Med Res


Stepped-wedge (SW) designs have been steadily implemented in a variety of trials. A SW design typically assumes a three-level hierarchical data structure where participants are nested within times or periods, which are in turn nested within clusters. Therefore, statistical models for the analysis of SW trial data need to consider two correlations, the first and second level correlations. Existing power functions and sample size determination formulas have been derived based on statistical models for two-level data structures. Consequently, the second-level correlation has not been incorporated in conventional power analyses. In this paper, we derive a closed-form explicit power function based on a statistical model for three-level continuous outcome data. The power function is based on a pooled overall estimate of stratified cluster-specific estimates of an intervention effect. The sampling distribution of the pooled estimate is derived by applying a fixed-effect meta-analytic approach. Simulation studies verified that the derived power function is unbiased and applicable to a varying number of participants per period per cluster. In addition, when data structures are assumed to have two levels, we compare three types of power functions by conducting additional simulation studies under a two-level statistical model. In this case, the power function based on a sampling distribution of a marginal, as opposed to pooled, estimate of the intervention effect performed the best. Extensions of the power functions to binary outcomes are also suggested.

The stepped-wedge (SW) clinical trial design is a variation of the cluster randomized trial and is a type of crossover design at the cluster level, as treatment assignments are designed to be progressively crossed over unidirectionally from a control to an experimental arm until all clusters are completely crossed over.^{1} The random element of a SW design is the assignment of the time points of the crossover to the clusters. The main advantage of the SW design is the relaxation of logistical constraints related to human or financial resources for the conduct of classical cluster randomized trials,^{1} although there are challenges in implementing it in real-world settings.^{2,3} The SW design is also useful when clinical equipoise is not met and it is unethical to randomize participants to the control arm for the length of the study. Further detailed discussion of this issue can be found in Prost et al.^{4} Additional considerations that should be taken into account when conducting SW trials are suggested by Hargreaves et al.^{5} The SW design has been steadily implemented in a variety of trials,^{6} and a systematic review concerning the characteristics of published SW trials was conducted by Brown and Lilford,^{7} and more recently and comprehensively by Beard et al.^{8}

As is the case for all types of randomized clinical trials, sample size determination is an indispensable element of the SW design. Hussey and Hughes^{9} proposed a widely used closed-form sample size determination formula for the SW design based on a random effects model. Woertman et al.^{10} converted Hussey and Hughes’ formula into a design effect of the SW design in comparison with a conventional two parallel arm design. Baio et al.^{11} also suggested a design effect formula under a different setting and statistical model. Hemming et al.^{12} evaluated the impact of intra-cluster correlations on statistical power or sample size through design effects under various types of SW designs. In addition, simulation studies for power analysis without explicit formulas have been conducted by Baio et al.^{11} and Van den Heuvel et al.^{13}

In all those derivations above, although the first level correlations (denoted below by *ρ*_{1}) of outcomes among participants in the same times or periods within the same clusters were taken into account, the second level correlations (denoted below by *ρ*_{2}) of outcomes among participants between times or periods within the same clusters were not explicitly considered for computing power or determining sample sizes. The latter correlations would need to be modeled in a statistical model for SW design trials because SW designs by definition naturally assume a three-level data hierarchy, as participants are nested within times or periods that are in turn nested within clusters in SW designs. The nomenclatures for units of levels should depend on the study context; for example, depending on research settings, the third level units can be physicians, clinics, hospitals, schools, communities, and districts to name a few. Hereafter, however, we refer to “cluster”, “period”, and “participant” as the third, the second, and the first level data units, respectively, in the SW design.

The primary aim of this paper is to derive explicit closed-form power functions which consider also the second level correlations by formulating a three-level model accounting for the two types of correlations. To this end, in section 2, we introduce a SW design with design parameter notations. In section 3, we specify the three-level model and formulate the two types of correlations. In section 4, we estimate an overall treatment effect by pooling cluster-specific effect estimates since the number of periods exposed to the experimental condition is not identical across all clusters. A power function is derived based on a sampling distribution of the pooled estimate of the overall treatment effect. In section 5, as a secondary aim, we compare performances of three power functions including that of Hussey and Hughes^{9} under a two-level model when *ρ*_{2} is assumed to be 0 as has previously been implicitly assumed. In section 6, simulation studies compare validities of all power functions under both two- and three-level models. Discussion follows in section 7.

Here we consider the SW design depicted in Figure 1, similar to that which was considered in Woertman et al.,^{10} to illustrate design parameters. The total number of steps is represented by *S* (≥1); the number of clusters in each step is represented by *c* (≥1); the number of periods for each “depth” of a step under an experimental condition (gray areas) is represented by *p* (≥1); and each cluster has *b* (≥0) number of “baseline” periods under a control condition (blank areas). The clusters are indexed by *i* = 1, 2, …, *I* = *cS*, this being the total number of clusters. Let us further denote by *S*_{(}_{i}_{)} the depths of steps for the *i*th cluster under the experimental condition: e.g., *S*_{(}_{i}_{)} = 1 for *i* = 1, 2; *S*_{(}_{i}_{)} = 2 for *i* = 3, 4; and so on. The periods nested within clusters are indexed by *j* = 1, 2, …, *J* = *b* + *pS*, this being the total number of periods per cluster. Study participants nested within each period are indexed by *k* = 1, 2, …, *K*, this being referred to as “cell” size or the number of participants for each cell in Figure 1. Let us assume that the participants belong to only one cell without cross-over to other clusters or periods. Then the total number of participants *N* will be *N* = *IJK* = *cS*(*b* + *pS*)*K*. The parameter values for the SW design depicted in Figure 1 can be found in its footnote. The sets of indices indicating observations from the experimental and control arms will be denoted by *E* and *C*, respectively.
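The layout and bookkeeping above can be sketched in Python (our sketch, not the paper's software); the ordering of clusters by step depth and the function name `sw_design` are illustrative assumptions.

```python
import numpy as np

# Sketch of the SW layout described above: S steps, c clusters per step,
# p periods per step depth, and b baseline periods per cluster.
# X[i, j] = 1 when cluster i is under the experimental condition in period j;
# ordering clusters so that step depth S_(i) = i//c + 1 is an assumption.
def sw_design(S, c, p, b):
    I = c * S              # total number of clusters
    J = b + p * S          # total number of periods per cluster
    X = np.zeros((I, J), dtype=int)
    for i in range(I):
        depth = i // c + 1             # S_(i): step depth of cluster i
        X[i, J - p * depth:] = 1       # last p*S_(i) periods are experimental
    return X

X = sw_design(S=5, c=2, p=2, b=2)      # parameters of Figure 1 (illustrative)
I, J, K = X.shape[0], X.shape[1], 5
N = I * J * K                          # total participants: cS(b + pS)K
```

With these parameters, each pair of clusters adds one more experimental "depth", so the number of gray cells is *cpS*(*S* + 1)/2.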

A statistical model for testing an experimental intervention/treatment effect under a SW design can be formulated as follows.

$${Y}_{\mathit{\text{ijk}}}={\beta}_{0}+\delta {X}_{\mathit{\text{ijk}}}+{u}_{i}+{u}_{j(i)}+{e}_{\mathit{\text{ijk}}}$$

(1)

The study outcome is denoted by *Y _{ijk}* (*i* = 1, 2, …, *I*; *j* = 1, 2, …, *J*; *k* = 1, 2, …, *K*), and *X _{ijk}* denotes the indicator of the experimental intervention assignment, equal to 1 if the (*ijk*)th observation is under the experimental condition and 0 otherwise.

The fixed-effect overall intercept is denoted by *β*_{0}, and the fixed experimental intervention effect by *δ* in model (1). The distribution of the cluster-level random intercepts *u _{i}* is assumed to be normal as
${u}_{i}~N(0,{\sigma}_{3}^{2})$, and so is that of the period-level random intercepts, ${u}_{j(i)}~N(0,{\sigma}_{2}^{2})$; the residual errors are assumed to be distributed as ${e}_{\mathit{\text{ijk}}}~N(0,{\sigma}_{e}^{2})$, and the random intercepts and errors are assumed to be mutually independent.
Under model (1), the elements of the covariance matrix of the outcomes are

$$\mathit{\text{Cov}}({Y}_{\mathit{\text{ijk}}},{Y}_{i\prime j\prime k\prime})=1(i=i\prime \&j=j\prime \&k=k\prime ){\sigma}_{e}^{2}+1(i=i\prime \&j=j\prime ){\sigma}_{2}^{2}+1(i=i\prime ){\sigma}_{3}^{2}$$

(2)

where 1(.) is an indicator function. It follows that $\mathit{\text{Var}}({Y}_{\mathit{\text{ijk}}})={\sigma}_{e}^{2}+{\sigma}_{2}^{2}+{\sigma}_{3}^{2}\equiv {\sigma}^{2}$. The correlations among the level two data (i.e., among outcomes measured from participants in different periods within the same cluster) can be expressed for *j* ≠ *j*′ as

$${\rho}_{2}={\sigma}_{3}^{2}/{\sigma}^{2}$$

(3)

The correlations among the level one data (i.e., among outcomes measured from different participants in the same period within the same cluster) can be expressed for *k* ≠ *k*′ as

$${\rho}_{1}=({\sigma}_{2}^{2}+{\sigma}_{3}^{2})/{\sigma}^{2}$$

(4)

As a result, *ρ*_{1} is greater than or equal to *ρ*_{2}, that is, *ρ*_{1} ≥ *ρ*_{2}.
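The covariance structure (2) and the correlations (3) and (4) can be verified numerically. The following Python sketch (ours, with illustrative variance components) builds the covariance matrix of a single cluster and recovers *ρ*_{1} and *ρ*_{2}.

```python
import numpy as np

# Covariance matrix (2) for one cluster with J periods and K participants
# per period. The variance components below are illustrative values only.
def cluster_cov(J, K, se2, s22, s32):
    n = J * K
    V = np.full((n, n), s32)              # sigma_3^2: shared cluster intercept
    for j in range(J):
        sl = slice(j * K, (j + 1) * K)
        V[sl, sl] += s22                  # sigma_2^2: shared period intercept
    V += se2 * np.eye(n)                  # sigma_e^2: residual variance
    return V

se2, s22, s32 = 0.7, 0.2, 0.1             # sigma_e^2, sigma_2^2, sigma_3^2
V = cluster_cov(J=3, K=2, se2=se2, s22=s22, s32=s32)
sigma2 = se2 + s22 + s32                  # total variance sigma^2
rho1 = V[0, 1] / sigma2                   # same period, different participants: (4)
rho2 = V[0, 2] / sigma2                   # different periods, same cluster: (3)
```

Here *ρ*_{1} = 0.3 and *ρ*_{2} = 0.1, consistent with *ρ*_{1} ≥ *ρ*_{2}.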

To estimate the overall intervention effect *δ*, we consider each cluster as a stratum because outcome observations between periods within clusters are correlated and the numbers of periods exposed to the control and experimental conditions are not necessarily identical across the clusters. This means that the variances of the cluster-specific effect estimates are not necessarily identical. In our approach, we first estimate an intervention effect for each cluster/stratum in a cluster-specific fashion, and then pool the cluster-specific estimates into an overall estimate of *δ* in model (1) by applying a fixed-effect meta-analytic approach,^{14} as *δ* is assumed to be fixed and homogeneous across clusters.

The intervention effect for the *i*th cluster is denoted by *δ _{i}*. A moment estimate ${\stackrel{\sim}{\delta}}_{i}$ of *δ _{i}* can be obtained as ${\stackrel{\sim}{\delta}}_{i}={\stackrel{\sim}{\theta}}_{i,E}-{\stackrel{\sim}{\theta}}_{i,C}$, where ${\stackrel{\sim}{\theta}}_{i,E}$ and ${\stackrel{\sim}{\theta}}_{i,C}$ denote the means of the outcomes observed over the *J*_{i,E} = *pS*_{(}_{i}_{)} periods under the experimental condition and over the *J*_{i,C} = *J* − *J*_{i,E} periods under the control condition, respectively, within the *i*th cluster. Under model (1), their variances are
$$\mathit{\text{Var}}({\stackrel{\sim}{\theta}}_{i,E})=\frac{{\sigma}^{2}}{{J}_{i,E}K}\{1+(K-1){\rho}_{1}+K({J}_{i,E}-1){\rho}_{2}\},\mathit{\text{Var}}({\stackrel{\sim}{\theta}}_{i,C})=\frac{{\sigma}^{2}}{{J}_{i,C}K}\{1+(K-1){\rho}_{1}+K({J}_{i,C}-1){\rho}_{2}\}$$

and $\mathit{\text{Cov}}({\stackrel{\sim}{\theta}}_{i,C},{\stackrel{\sim}{\theta}}_{i,E})={\sigma}_{3}^{2}={\sigma}^{2}{\rho}_{2}$. It follows that the variance of ${\stackrel{\sim}{\delta}}_{i}$ can be expressed as below:

$$\mathit{\text{Var}}({\stackrel{\sim}{\delta}}_{i})=\mathit{\text{Var}}({\stackrel{\sim}{\theta}}_{i,E})+\mathit{\text{Var}}({\stackrel{\sim}{\theta}}_{i,C})-2\mathit{\text{Cov}}({\stackrel{\sim}{\theta}}_{i,E},{\stackrel{\sim}{\theta}}_{i,C})=\frac{fJ{\sigma}^{2}}{{J}_{i,E}{J}_{i,C}K}$$

where

$$f=1+(K-1){\rho}_{1}-K{\rho}_{2}$$

(5)

which is the design effect for three-level trials that randomly assign treatments at the second level within clusters.^{15,16} This design effect *f* is an increasing function of *ρ*_{1} and a decreasing function of *ρ*_{2}.

An estimate $\stackrel{\sim}{\delta}$ of the overall intervention effect can now be obtained as a pooled estimate of the ${\stackrel{\sim}{\delta}}_{i}$’s weighted by their corresponding inverse variances as follows:

$$\stackrel{\sim}{\delta}={\displaystyle \sum _{i=1}^{I}{\omega}_{i}{\stackrel{\sim}{\delta}}_{i}}/{\displaystyle \sum _{i=1}^{I}{\omega}_{i}}$$

(6)

where ${\omega}_{i}=1/\mathit{\text{Var}}({\stackrel{\sim}{\delta}}_{i})$. This pooled estimate is a weighted mean of the ${\stackrel{\sim}{\delta}}_{i}$’s. It follows that

$$\mathit{\text{Var}}(\stackrel{\sim}{\delta})=1/{\displaystyle \sum _{i=1}^{I}{\omega}_{i}}=\frac{fJ{\sigma}^{2}}{K{\displaystyle {\sum}_{i=1}^{I}{J}_{i,E}{J}_{i,C}}}$$

(7)

Under the setting depicted in Figure 1, the following equation is straightforward:

$$\sum _{i=1}^{I}{J}_{i,E}{J}_{i,C}=c{\displaystyle \sum _{m=0}^{S-1}(pm+b)(S-m)p}=\mathit{\text{cpS}}(S+1)\left\{\frac{p(S-1)+3b}{6}\right\}$$

This equation enables a power function to be expressed in terms of design parameters as follows:

$$\phi =\mathrm{\Phi}\left(\left|\delta \right|/\sqrt{\mathit{\text{Var}}(\stackrel{\sim}{\delta})}-{z}_{1-\alpha /2}\right)=\mathrm{\Phi}\left(\left|\mathrm{\Delta}\right|\sqrt{\frac{\mathit{\text{cpKS}}(S+1)(pS-p+3b)}{6f(b+pS)}}-{z}_{1-\alpha /2}\right)$$

(8)

where Φ(.) is the cumulative distribution function of the standard normal distribution, *z*_{1−}_{α}_{/2} = Φ^{−1}(1 − *α*/2), Φ^{−1}(.) is the inverse of Φ(.), and Δ = *δ*/*σ*, which is known as the standardized effect size or Cohen’s *d*.^{17} The statistical power increases with increasing Δ and *α*, and with increasing *b*, *c*, *S* (or *I*), *p*, *K*, and *ρ*_{2}, all of which decrease the variance (7). However, the statistical power decreases with increasing *ρ*_{1}, which increases *f* and thus the variance (7).
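As an illustration (our Python sketch, not the authors' code), the power function (8), the design effect (5), and the closed-form sum identity above can be coded and cross-checked as follows; the parameter values in the checks are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_eq8(Delta, c, p, K, S, b, rho1, rho2, alpha=0.05):
    """Power function (8) for the three-level model."""
    f = 1 + (K - 1) * rho1 - K * rho2          # design effect (5)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    arg = abs(Delta) * sqrt(c * p * K * S * (S + 1) * (p * S - p + 3 * b)
                            / (6 * f * (b + p * S)))
    return NormalDist().cdf(arg - z)

def sum_JEJC_direct(S, c, p, b):
    """Direct enumeration of sum_i J_{i,E} J_{i,C}, assuming step depth i//c + 1."""
    J = b + p * S
    return sum((p * (i // c + 1)) * (J - p * (i // c + 1))
               for i in range(c * S))

def sum_JEJC_closed(S, c, p, b):
    """Closed form cpS(S+1){p(S-1)+3b}/6 (always an integer)."""
    return c * p * S * (S + 1) * (p * (S - 1) + 3 * b) // 6
```

The closed form and the direct enumeration agree, and the power is monotone in *ρ*_{1} and *ρ*_{2} in the directions stated above.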

If ${\rho}_{2}={\sigma}_{3}^{2}/{\sigma}^{2}$ (3) is assumed to be 0, which is equivalent to assuming ${\sigma}_{3}^{2}=0$, then model (1) reduces to a model

$${Y}_{\mathit{\text{ijk}}}={\beta}_{0}+\delta {X}_{\mathit{\text{ijk}}}+{u}_{j(i)}+{e}_{\mathit{\text{ijk}}}$$

(9)

for a two-level data structure. Subsequently, the covariance (2) reduces to

$$\mathit{\text{Cov}}({Y}_{\mathit{\text{ijk}}},{Y}_{i\prime j\prime k\prime})=1(i=i\prime \&j=j\prime \&k=k\prime ){\sigma}_{e}^{2}+1(i=i\prime \&j=j\prime ){\sigma}_{2}^{2}$$

and likewise ${\sigma}^{2}\equiv {\sigma}_{e}^{2}+{\sigma}_{2}^{2}$ and

$$\rho \equiv {\rho}_{1}={\sigma}_{2}^{2}/{\sigma}^{2}$$

(10)

The statistical power expressed in equation (7) of Hussey and Hughes^{9} can be re-expressed, denoted here by *ϕ*_{HH}, utilizing the equations in the supplements of Woertman et al.^{10} in terms of the design parameters depicted in Figure 1 as follows

$${\phi}_{\text{HH}}=\mathrm{\Phi}\left(\left|\mathrm{\Delta}\right|\sqrt{\frac{\mathit{\text{cpKS}}(S-1/S)\{1+\rho (\mathit{\text{pKS}}/2+bK-1)\}}{6(1-\rho )\{1+\rho (\mathit{\text{pKS}}+bK-1)\}}}-{z}_{1-\alpha /2}\right)$$

(11)

This function is neither a monotone increasing nor a monotone decreasing function of *ρ*. Furthermore, *ϕ*_{HH} is undefined if *ρ* = 1, although such a value is unrealistic in practice.
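A Python transcription of *ϕ*_{HH} (11) might look as follows; this is our sketch of the displayed formula, not Hussey and Hughes' software, and it indeed fails at *ρ* = 1.

```python
from math import sqrt
from statistics import NormalDist

def power_hh(Delta, c, p, K, S, b, rho, alpha=0.05):
    """Power function (11), re-expressed from Hussey and Hughes."""
    num = c * p * K * S * (S - 1 / S) * (1 + rho * (p * K * S / 2 + b * K - 1))
    den = 6 * (1 - rho) * (1 + rho * (p * K * S + b * K - 1))  # zero at rho = 1
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(Delta) * sqrt(num / den) - z)
```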

In addition, the statistical power (8) for the three-level model can straightforwardly be reduced to

$${\phi}_{0}=\mathrm{\Phi}\left(\left|\mathrm{\Delta}\right|\sqrt{\frac{\mathit{\text{cpKS}}(S+1)(pS-p+3b)}{6{f}_{0}(b+pS)}}-{z}_{1-\alpha /2}\right)$$

(12)

where

$${f}_{0}=1+(K-1)\rho $$

(13)

which is the same as *f* (5) with *ρ*_{2} (3) and *ρ*_{1} (4) replaced by 0 and *ρ* (10), respectively; *f*_{0} (13) is the design effect for a two-level data structure.^{18} The statistical power *ϕ*_{0} is in fact based on pooling of cluster-specific effect estimates weighted by the inverses of the cluster-specific variances of the estimates, and is a monotone decreasing function of *ρ*, as is the case for (8), which decreases with *ρ*_{1}.
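Since *ϕ*_{0} (12) is simply (8) with *f* replaced by *f*_{0}, a Python sketch (ours, with illustrative parameter values in the check) is immediate:

```python
from math import sqrt
from statistics import NormalDist

def power_phi0(Delta, c, p, K, S, b, rho, alpha=0.05):
    """Power function (12): equation (8) with the two-level design effect (13)."""
    f0 = 1 + (K - 1) * rho
    z = NormalDist().inv_cdf(1 - alpha / 2)
    arg = abs(Delta) * sqrt(c * p * K * S * (S + 1) * (p * S - p + 3 * b)
                            / (6 * f0 * (b + p * S)))
    return NormalDist().cdf(arg - z)
```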

We note, however, that the clusters become nominal without any influence on statistical inference, since ${\sigma}_{3}^{2}=0$ is assumed. That is, the periods are no longer assumed to be nested within clusters, although individual observations *Y* are still assumed to be nested within periods. Therefore, a statistical power *ϕ*^{(2)} can be based on a sampling distribution of a marginal estimate of *δ* in model (9) for two-level data as follows:

$${\stackrel{\sim}{\delta}}_{M}={\displaystyle \sum _{i=1}^{I}{\displaystyle \sum _{j=1}^{J}{\displaystyle \sum _{k=1}^{K}{X}_{\mathit{\text{ijk}}}{Y}_{\mathit{\text{ijk}}}}}}/{\displaystyle \sum _{i=1}^{I}{\displaystyle \sum _{j=1}^{J}{\displaystyle \sum _{k=1}^{K}{X}_{\mathit{\text{ijk}}}}}}-{\displaystyle \sum _{i=1}^{I}{\displaystyle \sum _{j=1}^{J}{\displaystyle \sum _{k=1}^{K}{W}_{\mathit{\text{ijk}}}{Y}_{\mathit{\text{ijk}}}}}}/{\displaystyle \sum _{i=1}^{I}{\displaystyle \sum _{j=1}^{J}{\displaystyle \sum _{k=1}^{K}{W}_{\mathit{\text{ijk}}}}}}$$

(14)

where *W _{ijk}* = 1 − *X _{ijk}* indicates assignment to the control condition. It follows that the power *ϕ*^{(2)} can be obtained as follows^{19}

$${\phi}^{(2)}=\mathrm{\Phi}\left(\left|\mathrm{\Delta}\right|\sqrt{\frac{K}{{f}_{0}\left(1/{N}_{E}^{(2)}+1/{N}_{C}^{(2)}\right)}}-{z}_{1-\alpha /2}\right)$$

(15)

where
${N}_{E}^{(2)}=\#\{i,j|{X}_{\mathit{\text{ijk}}}=1\}=\mathit{\text{pcS}}(S+1)/2$ and
${N}_{C}^{(2)}=\#\{i,j|{W}_{\mathit{\text{ijk}}}=1\}=cS(b+p(S-1)/2)$ are the numbers of total periods for the experimental and control arms, respectively (the numbers of gray-colored and blank “cells” in Figure 1, respectively). It can be seen that the power function *ϕ*^{(2)} is also a monotone decreasing function of *ρ* (10).
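A Python sketch (ours) of *ϕ*^{(2)} (15), with the period counts computed from the design parameters:

```python
from math import sqrt
from statistics import NormalDist

def power_marginal(Delta, c, p, K, S, b, rho, alpha=0.05):
    """Power function (15) based on the marginal estimate (14)."""
    f0 = 1 + (K - 1) * rho                  # two-level design effect (13)
    NE = p * c * S * (S + 1) / 2            # experimental periods (gray cells)
    NC = c * S * (b + p * (S - 1) / 2)      # control periods (blank cells)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(Delta) * sqrt(K / (f0 * (1 / NE + 1 / NC))) - z)
```

Note that *N*_{E}^{(2)} + *N*_{C}^{(2)} = *IJ*, the total number of cells, which is a useful sanity check.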

We conducted simulations using the SAS v9.3 PROC MIXED routine with the restricted maximum likelihood fitting option to (1) validate the power function (8) derived under the three-level model (1); and (2) compare the three power functions *ϕ*_{HH} (11), *ϕ*_{0} (12), and *ϕ*^{(2)} (15) under the two-level model (9). We note that it is theoretically possible to derive closed-form power functions with varying *K _{ij}*, the number of observations per period per cluster. However, the exact formulas would be cumbersome both to express and to compute. Therefore, to assess the applicability of the power functions under varying cell sizes, we also conducted simulations in which *K _{ij}* varied over periods and clusters.

The magnitudes of all theoretical power functions are compared with those of the empirical power estimated from the simulations. To compute the simulation-based empirical power, which we consider the “reference” power, we fit models (1) and (9) with unknown variances, as is usual in practice, although all the power functions were derived under known variance components. We generated 1000 simulated data sets for each combination of pre-specified design parameters and estimated the empirical power as follows:

$$\stackrel{\sim}{\phi}={\displaystyle \sum _{s=1}^{1000}1\{{p}_{s}(\delta )<\alpha \}/1000}$$

(16)

where *p _{s}*(*δ*) is the *p*-value for the test of the intervention effect *δ* obtained from the *s*th simulated data set, based on the Kenward–Roger approximation.^{20}
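The paper's simulations fit mixed models with SAS PROC MIXED; as a lighter-weight illustration of the empirical power (16), the following Python sketch generates data under model (1), applies the pooled estimator (6) with the variance components treated as known (an assumption that differs from the paper's REML fits), and counts rejections of the Wald test. All parameter defaults are illustrative, not the paper's settings.

```python
import numpy as np

def empirical_power(delta, c=2, p=2, b=2, S=5, K=5, rho1=0.1, rho2=0.05,
                    sigma2=1.0, nsim=500, alpha=0.05, seed=1):
    """Monte Carlo power of the Wald test based on the pooled estimate (6),
    with known variance components (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    I, J = c * S, b + p * S
    s3 = rho2 * sigma2                       # sigma_3^2 from (3)
    s2 = (rho1 - rho2) * sigma2              # sigma_2^2 from (4) minus (3)
    se = sigma2 - s2 - s3                    # sigma_e^2
    f = 1 + (K - 1) * rho1 - K * rho2        # design effect (5)
    z = 1.959963984540054                    # z_{1 - 0.05/2}
    rejections = 0
    for _ in range(nsim):
        num = den = 0.0
        for i in range(I):
            JE = p * (i // c + 1)            # periods under experimental condition
            JC = J - JE
            y = (rng.normal(0.0, np.sqrt(s3))            # cluster intercept u_i
                 + rng.normal(0.0, np.sqrt(s2), (J, 1))  # period intercepts u_j(i)
                 + rng.normal(0.0, np.sqrt(se), (J, K))) # residuals e_ijk
            y[J - JE:, :] += delta           # last JE periods are experimental
            v_i = f * J * sigma2 / (JE * JC * K)         # cluster-specific variance
            num += (y[J - JE:].mean() - y[:JC].mean()) / v_i
            den += 1.0 / v_i
        d_hat = num / den                    # pooled estimate (6)
        rejections += abs(d_hat) * np.sqrt(den) > z      # Var(d_hat) = 1/sum(w_i)
    return rejections / nsim
```

With the defaults above (Δ = 0.3, *σ* = 1), the theoretical power from (8) is about 0.86, and the Monte Carlo estimate should fall near it up to simulation error.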

The pre-specified design parameters can be found in Table 1. The results show that the theoretical power (8) and the simulation-based empirical power $\stackrel{\sim}{\phi}$ (16) are very close to each other regardless of whether *K* is fixed or varying: $\text{mean}(\phi )-\text{mean}(\stackrel{\sim}{\phi})=-0.012$ for fixed *K* and = 0.001 for varying *K _{ij}*; and $\text{range}(\phi -\stackrel{\sim}{\phi})=(-0.049,0.010)$ and = (−0.041, 0.036), respectively. The power function also proved to be an increasing function of all design parameters except *ρ*_{1}, with which it decreases (Figure 2).

**Figure 2.** Relationship of *ρ*_{1} and *ρ*_{2} with the statistical power (8) for a three-level model: Δ = 0.3, *b* = 2, *c* = 2, *p* = 2, *S* = 5, *K* = 5, and *α* = 0.05. *Note:* rho_1 = *ρ*_{1} and rho_2 = *ρ*_{2}.

The pre-specified design parameters can be found in Table 2, in which *ρ*_{2} is set to 0. The results show that the performances of the three theoretical power functions *ϕ*_{HH} (11), *ϕ*_{0} (12), and *ϕ*^{(2)} (15) are quite different in comparison with the reference empirical power under both fixed *K* and varying *K _{ij}*: $\text{mean}({\phi}_{\text{HH}})-\text{mean}(\stackrel{\sim}{\phi})=0.129$ for fixed *K*, indicating that *ϕ*_{HH} substantially overestimated the reference power, whereas *ϕ*_{0} underestimated it and *ϕ*^{(2)} showed ignorable biases (Figure 3).

**Figure 3.** Relationship of *ρ* with the statistical power *ϕ*_{HH} (11), *ϕ*_{0} (12), and *ϕ*^{(2)} (15) for a two-level model: Δ = 0.3, *b* = 2, *c* = 2, *p* = 2, *S* = 5, *K* = 5, and *α* = 0.05. *Note*: power_HH = *ϕ*_{HH}, power_0 = *ϕ*_{0}, …

Our results suggest that the second level correlation *ρ*_{2} must be accounted for when determining sample sizes for a SW trial designed under a three-level model. However, no SW trials have so far reported an estimate of *ρ*_{2}; according to the recent review by Davey et al.,^{23} only a couple of SW trial studies^{21,22} reported *ρ*_{1}. As observed in this paper (Figure 2), the effects of both *ρ*_{1} and *ρ*_{2} on the power are substantial when a three-level model is considered. Therefore, it would be valuable for conducted SW trials to report estimates of *ρ*_{2} to aid the design of future SW trials. For two-level models, many studies have addressed the impact of *ρ* (e.g., see the literature^{12,18,24,25}), as reflected in Figure 3. However, the relationship between *ρ* and *ϕ*_{HH} is hardly predictable and mostly contradicts both the relationship between *ρ* and *ϕ*_{0} and that between *ρ* and *ϕ*^{(2)}.

The derived power function (8) proved to be unbiased and valid for the purpose of accounting for both the first and second level correlations. This finding suggests that the pooled estimate $\stackrel{\sim}{\delta}$ (6) may indeed be a maximum likelihood estimate of *δ* in model (1). Although it was derived under the special case depicted in Figure 1, the power function also proved to be applicable to SW designs with varying cell sizes *K _{ij}*. Therefore, the pooling estimation approach (6) based on the cluster-specific moment estimates can also be extended to general cases where *c* varies over steps and *K _{ij}* varies over periods within clusters.

When *ρ*_{2} does not need to be considered, i.e., under a two-level model, the power function *ϕ*^{(2)} (15) based on the marginal estimate ${\stackrel{\sim}{\delta}}_{M}$ (14) performs the best, with ignorable biases regardless of whether the cell sizes are fixed or varying. In contrast, the power function *ϕ*_{0} (12), obtained from (8) by setting *ρ*_{2} = 0, underestimates the reference power estimated by simulations. This may be because, when stratification is unnecessary, pooled estimates can have an unduly inflated variance and thus lose efficiency relative to the marginal approach. On the contrary, the widely used power function *ϕ*_{HH} (11) overestimates the reference empirical power and thus underestimates sample sizes under the values of *ρ* in Table 2. We suspect that Hussey and Hughes’ approach may unduly over- or underestimate the variance of the estimate of *δ*, depending heavily on the value of *ρ* (Figure 3).

Both models (1) and (9) assume that participants differ across the periods within clusters, let alone between clusters. However, when participants are followed up longitudinally over the periods within the same clusters and crossed over from the control to the experimental arm, another level of data structure should be modeled by expanding the three-level model (1) to a four-level model that additionally incorporates correlations of outcomes over periods within the same participants. In addition, the random intercepts could be correlated with each other, violating the independence assumption made in this paper. Derivations of power functions under these situations would be a worthy contribution to the power literature.

Although only continuous outcomes are considered in this paper, categorical or non-normal outcomes such as proportions, incidence rates, ordinal, and survival outcomes are more often of interest in many SW trials.^{8} Extension of sample size determinations for such SW trials would be of great interest. The extension might be possible by modeling those outcomes with generalized estimating equations or non-linear mixed-effects models, although derivations of closed forms could be intractable. For this reason, sample size determinations based on simulation approaches might be preferable, as attempted by Baio et al.^{11} Nonetheless, we suspect that simulating non-normal data with a multi-level hierarchy for a specified correlation structure would be challenging, particularly because, unlike for normal distributions, correlations may well vary with the means on which variances depend. Therefore, it would also be interesting to examine whether extensions based on normal approximations would be comparable. For example, although it has not been verified by simulation studies for a binary outcome, a simple replacement of Δ by ${\mathrm{\Delta}}_{p}=({p}_{1}-{p}_{0})/\sqrt{\overline{p}(1-\overline{p})}$ in *ϕ* (8) or *ϕ*^{(2)} (15) might be a good approximation owing to the central limit theorem, where *p*_{0} and *p*_{1} are the “success” probabilities under the null and alternative hypotheses, respectively, and $\overline{p}=({p}_{1}+{p}_{0})/2$.
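The suggested substitution for a binary outcome amounts to a one-line computation; a Python sketch (ours, with illustrative success probabilities):

```python
from math import sqrt

def delta_p(p0, p1):
    """Standardized effect size Delta_p for a binary outcome, as suggested above."""
    pbar = (p1 + p0) / 2
    return (p1 - p0) / sqrt(pbar * (1 - pbar))

d = delta_p(p0=0.5, p1=0.6)   # illustrative null and alternative probabilities
```

The resulting value of Δ_{p} would then be substituted for Δ in (8) or (15), pending the simulation verification noted above.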

In conclusion, the power functions *ϕ* (8) and *ϕ*^{(2)} (15) should be used for sample size determinations when designing SW trials, depending on whether the second level correlation *ρ*_{2} is assumed to be nonzero or 0, respectively. Both are applicable when cell sizes vary.

**Funding**

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was in part supported by the following NIH Grants: UL1RR025750, R01HS023608, and R01DK097096.

**Declaration of Conflicting Interests**

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

1. Hayes RJ, Moulton LH. Cluster randomized trials. Boca Raton: CRC Press; 2009.

2. Zhan Z, van den Heuvel ER, Doornbos PM, et al. Strengths and weaknesses of a stepped wedge cluster randomized design: its application in a colorectal cancer follow-up study. J Clin Epidemiol. 2014;67:454–461. [PubMed]

3. Moulton LH, Golub JE, Durovni B, et al. Statistical design of THRio: a phased implementation clinic-randomized study of a tuberculosis preventive therapy intervention. Clin Trials. 2007;4:190–199. [PubMed]

4. Prost A, Binik A, Abubakar I, et al. Logistic, ethical, and political dimensions of stepped wedge trials: critical review and case studies. Trials. 2015;16:11. [PMC free article] [PubMed]

5. Hargreaves JR, Copas AJ, Beard E, et al. Five questions to consider before conducting a stepped wedge trial. Trials. 2015;16:4. [PMC free article] [PubMed]

6. Mdege ND, Man M-S, Taylor CA, et al. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol. 2011;64:936–948. [PubMed]

7. Brown CA, Lilford RJ. The stepped wedge trial design: a systematic review. BMC Med Res Methodol. 2006;6:54. [PMC free article] [PubMed]

8. Beard E, Lewis JJ, Copas A, et al. Stepped wedge randomised controlled trials: systematic review of studies published between 2010 and 2014. Trials. 2015;16:14. [PMC free article] [PubMed]

9. Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28:182–191. [PubMed]

10. Woertman W, de Hoop E, Moerbeek M, et al. Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol. 2013;66:752–758. [PubMed]

11. Baio G, Copas A, Ambler G, et al. Sample size calculation for a stepped wedge trial. Trials. 2015;16:15. [PMC free article] [PubMed]

12. Hemming K, Girling A. The efficiency of stepped wedge vs. cluster randomized trials: stepped wedge studies do not always require a smaller sample size. J Clin Epidemiol. 2013;66:1427–1428. [PubMed]

13. Van den Heuvel ER, Zwanenburg RJ, Van Ravenswaaij-Arts CM. A stepped wedge design for testing an effect of intranasal insulin on cognitive development of children with Phelan-McDermid syndrome: a comparison of different designs. Stat Methods Med Res. 2014 doi: 10.1177/0962280214558864. [Cross Ref]

14. Hedges L, Olkin I. Statistical methods for meta-analysis. San Diego, CA: Academic Press; 1985.

15. Fazzari MJ, Kim MY, Heo M. Sample size determination for three-level randomized clinical trials with randomization at the first or second level. J Biopharm Stat. 2014;24:579–599. [PubMed]

16. Moerbeek M, van Breukelen GJP, Berger MPF. Design issues for experiments in multilevel populations. J Educ Behav Stat. 2000;25:271–284.

17. Cohen J. Statistical power analysis for the behavioral science. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.

18. Donner A, Klar N. Design and analysis of cluster randomization trials in health research. London: Arnold; 2000.

19. Ahn C, Heo M, Zhang S. Sample size calculations for clustered and longitudinal outcomes in clinical research. Boca Raton, FL: CRC Press; 2014.

20. Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53:983–997. [PubMed]

21. Bashour HN, Kanaan M, Kharouf MH, et al. The effect of training doctors in communication skills on women’s satisfaction with doctor-woman relationship during labour and delivery: a stepped wedge cluster randomised trial in Damascus. BMJ Open. 2013;3:11.

22. Durovni B, Saraceni V, Moulton LH, et al. Effect of improved tuberculosis screening and isoniazid preventive therapy on incidence of tuberculosis and death in patients with HIV in clinics in Rio de Janeiro, Brazil: a stepped wedge, cluster-randomised trial. Lancet Infect Dis. 2013;13:852–858. [PMC free article] [PubMed]

23. Davey C, Hargreaves J, Thompson JA, et al. Analysis and reporting of stepped wedge randomised controlled trials: synthesis and critical appraisal of published studies, 2010 to 2014. Trials. 2015;16:13. [PMC free article] [PubMed]

24. Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clin Trials. 2005;2:152–162. [PubMed]

25. Campbell MK, Fayers PM, Grimshaw JM. Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clin Trials. 2005;2:99–107. [PubMed]
