The accompanying table summarizes 2000 replications of the setup for continuous $Y_d$ for every combination of ES = 0.2, 0.4 and $\sigma_e$ = 0.5, 1, 2. Throughout, the nominal power was set to 0.80 and the significance level of the test to 0.05. The test statistic (the difference between the estimated mean and the null value, divided by the standard error) was compared with 1.96, as suggested by the asymptotic normality of the ML and SP estimators of $\mu$.
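To make the test and the power calculation concrete, here is a minimal Python sketch. It is not the generalized t-test formula of Section 4: it uses the textbook normal-approximation sample size for a single mean, $n = \lceil ((z_{1-\alpha/2} + z_{1-\beta})/\mathrm{ES})^2 \rceil$, with an optional multiplier `vif` (a hypothetical stand-in for a variance inflation adjustment, defaulting to 1). Achieved power is estimated as the proportion of replications in which the Wald statistic exceeds 1.96 in absolute value; a two-sided test is assumed. The data-generating model (a normal sample with mean $\mathrm{ES}\cdot\sigma_e$ and null value 0) is an illustrative assumption, not the SMAR setup.

```python
import math
import random
import statistics
from statistics import NormalDist

CRITICAL_VALUE = 1.96  # asymptotic normal cutoff for a two-sided test at alpha = 0.05


def required_n(effect_size: float, alpha: float = 0.05, power: float = 0.80,
               vif: float = 1.0) -> int:
    """Normal-approximation sample size for one mean; `vif` is a hypothetical inflation factor."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(vif * ((z_alpha + z_beta) / effect_size) ** 2)


def wald_rejects(estimate: float, null_value: float, std_error: float) -> bool:
    """Reject H0 when |estimate - null| / SE exceeds the critical value."""
    return abs(estimate - null_value) / std_error > CRITICAL_VALUE


def simulated_power(n: int, effect_size: float, sigma_e: float,
                    reps: int = 2000, seed: int = 1) -> tuple[float, float]:
    """Return (achieved power, Monte Carlo standard error) over `reps` replications."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Toy data: true mean ES * sigma_e, tested against a null value of 0.
        y = [rng.gauss(effect_size * sigma_e, sigma_e) for _ in range(n)]
        estimate = statistics.fmean(y)
        std_error = statistics.stdev(y) / math.sqrt(n)
        rejections += wald_rejects(estimate, 0.0, std_error)
    power = rejections / reps
    return power, math.sqrt(power * (1 - power) / reps)


n = required_n(0.2)
power, mc_se = simulated_power(n, effect_size=0.2, sigma_e=1.0)
print(n, round(power, 3), round(mc_se, 3))
```

With 2000 replications, the Monte Carlo standard error of an achieved power near 0.80 is about 0.009, so shortfalls of a few percentage points from nominal power are detectable.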
Performance of the sample size formula for nominal power = 0.80 using either ML estimation or optimal SP estimation when $Y_d$ is continuous. VIF is calculated from the regression of $Y_d$ on $S_d$.
The results show that when ES = 0.2, the calculated sample sizes ensure repleteness in almost all experiments. By contrast, when ES = 0.4, the proportion of replete experiments among the 2000 replications ranges from 60% to 89%. One could argue that in most SMAR trials the primary interest will be in detecting moderate-sized causal effects, which would increase the sample size beyond that given by the generalized t-test formula of Section 4 for ES = 0.4. Nonetheless, the simulations illustrate that repleteness, beyond the usual sample size considerations, is relevant to the good planning of a SMAR experiment.
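Repleteness can be checked directly on a simulated trial. The sketch below assumes, for illustration, that repleteness means every stratum (each possible combination of state history and assigned treatment) contains at least one subject, and it models allocation to strata as independent draws with hypothetical probabilities `cell_probs`. It estimates the proportion of replete experiments over 2000 replications and, for equally likely strata, checks the estimate against the inclusion-exclusion formula $P(\text{replete}) = \sum_{j=0}^{K} (-1)^j \binom{K}{j} (1 - j/K)^n$. This simple multinomial model ignores the sequential structure and the blocked randomization used in the simulations reported here.

```python
import math
import random
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence


def is_replete(observed_cells: Iterable[Hashable], all_cells: Iterable[Hashable]) -> bool:
    """True when every possible stratum appears at least once (no sampling zeroes)."""
    counts = Counter(observed_cells)
    return all(counts[cell] > 0 for cell in all_cells)


def proportion_replete(n: int, cell_probs: Sequence[float],
                       reps: int = 2000, seed: int = 7) -> float:
    """Monte Carlo share of replications in which all strata are non-empty."""
    rng = random.Random(seed)
    cells = list(range(len(cell_probs)))
    replete = 0
    for _ in range(reps):
        # Allocate n subjects to strata independently (simplifying assumption).
        sample = rng.choices(cells, weights=cell_probs, k=n)
        replete += is_replete(sample, cells)
    return replete / reps


def exact_replete_uniform(n: int, k: int) -> float:
    """Inclusion-exclusion probability that n uniform draws fill all k strata."""
    return sum((-1) ** j * math.comb(k, j) * (1 - j / k) ** n for j in range(k + 1))


print(proportion_replete(20, [1 / 8] * 8), exact_replete_uniform(20, 8))
```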
A more striking result is the difference in power achieved by the ML and optimal SP estimators. ML estimation is largely robust to even substantial failures of repleteness because the sample quantities it uses in (2.1) and (2.3) are based on the allocated proportions. In contrast, because SP estimation relies on the assignment probabilities, the optimal estimator (and its standard error) cannot tune itself to the sample at hand. This holds even when most replications are replete, highlighting the influence of near sampling zeroes on the power achieved with SP estimation. The expansion in Appendix B of the supplementary material (available at Biostatistics online) shows that the differences in power between the two approaches are driven much more by their differences in estimates of $\mu$ than by differences in estimated standard errors. The cases $n = 320$ and $n = 404$ show this to be true even for modest losses of power, when the sample sizes in some strata are too small for sequentially blocked randomization to achieve the a priori assignment probabilities.
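The contrast between estimators built on allocated proportions and those built on assignment probabilities can be seen in a deliberately small example. The sketch below is an illustrative analogy, not the estimators used here: a plug-in (G-computation) estimator that weights within-state means by the observed state proportions stands in for ML, and a plain inverse-probability-weighted (IPW) estimator that divides by the a priori stage-2 probability (assumed to be 0.5) stands in for SP estimation. The two-stage structure, the states 0/1, the treatments `"B"`/`"C"`, and the `policy` function are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical record: (intermediate state, stage-2 treatment, outcome).
Record = tuple[int, str, float]


def policy(state: int) -> str:
    """Hypothetical policy d: give 'B' in state 0 and 'C' otherwise."""
    return "B" if state == 0 else "C"


def plugin_estimate(data: list[Record]) -> float:
    """Weights each state's policy-consistent mean by the observed state proportion."""
    n = len(data)
    by_state: dict[int, list[tuple[str, float]]] = defaultdict(list)
    for state, treatment, outcome in data:
        by_state[state].append((treatment, outcome))
    total = 0.0
    for state, rows in by_state.items():
        consistent = [y for a, y in rows if a == policy(state)]
        if not consistent:
            # A sampling zero leaves the stratum mean undefined.
            raise ValueError(f"sampling zero: no subject in state {state} followed d")
        total += (len(rows) / n) * fmean(consistent)
    return total


def ipw_estimate(data: list[Record], p_assign: float = 0.5) -> float:
    """Divides policy-consistent outcomes by the a priori assignment probability."""
    n = len(data)
    return sum(y / p_assign for s, a, y in data if a == policy(s)) / n


# Every outcome equals 1, so the target mean is 1, but 3 of 4 subjects
# (rather than the expected half) happen to follow the policy.
data = [(0, "B", 1.0), (0, "B", 1.0), (0, "C", 1.0), (1, "C", 1.0)]
print(plugin_estimate(data), ipw_estimate(data))  # prints 1.0 1.5
```

The plug-in estimate adapts to the realized allocation, whereas the IPW estimate inherits the chance imbalance; in miniature, this is why SP estimation that relies on a priori probabilities cannot tune to the sample at hand.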
assignment probabilities. We note that the efficiency gains for ML estimation are modest, with relative efficiency running from 0.95 to 1.0, for simulated trials without safe turned on but using constrained randomization.
It is not surprising that the optimal estimator may sometimes be underpowered when the simulated trials use the safe option, given that certain a priori randomization probabilities may be set to zero. In contrast, ML estimation ensures nominal power in these cases, albeit conservatively in some scenarios. This property suggests that ML estimation is a suitable choice for inference when the method must be chosen before the trial is executed and before anything is known about the stochastic process underlying the intermediate states. More generally, its “self-tuning” property in the face of random and near sampling zeroes reminds us that the asymptotic ML variance estimator coincides with the finite-sample estimator obtained from the MOM.
The relative efficiencies show that repleteness and near sampling zeroes have at most a moderate impact on the SP efficiency gains provided by the optimal estimator; this impact arises through the (inversely weighted) estimates of the $\mu_k$ in $U_{\text{opt}}$. In theory, the efficiency gains for fixed $\sigma_e$ should not depend on $n$, and simulations with excessively large sample sizes confirm this. For the realistic values of $n$ considered here, the relative efficiency for any given value of $\sigma_e$ depends on whether the sample size was geared to ES = 0.2 or ES = 0.4. Nonetheless, the simulations confirm that the strength of the relationship of state history to $Y_d$, as evidenced by the $R^2$ values, governs the magnitude of the efficiency gains.
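Why the $R^2$ values should govern the efficiency gains, and why those gains should not depend on $n$ in large samples, can be illustrated with a familiar analogue. The sketch below is not $U_{\text{opt}}$: it adjusts a sample mean with a single covariate of known mean (a control variate, with the coefficient estimated by least squares), for which the large-sample variance is reduced by the factor $1 - \rho^2$, so the relative efficiency is about $1/(1 - \rho^2)$. Relative efficiency is taken here to be the variance of the unadjusted estimator divided by that of the adjusted one, and the bivariate normal data are an assumption for illustration.

```python
import math
import random
from statistics import fmean, variance


def control_variate_mean(y: list[float], x: list[float], mu_x: float) -> float:
    """Mean of y adjusted by a covariate x whose population mean mu_x is known."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx  # estimated optimal (least-squares) coefficient
    return y_bar - b * (x_bar - mu_x)


def relative_efficiency(rho: float, n: int = 50, reps: int = 2000, seed: int = 3) -> float:
    """Var(unadjusted mean) / Var(adjusted mean) across simulated replications."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y has unit variance and correlation rho with x, so R^2 = rho**2.
        y = [rho * xi + math.sqrt(1 - rho**2) * rng.gauss(0.0, 1.0) for xi in x]
        plain.append(fmean(y))
        adjusted.append(control_variate_mean(y, x, 0.0))
    return variance(plain) / variance(adjusted)


for rho in (0.0, 0.3, 0.6):
    print(rho, round(relative_efficiency(rho), 2), round(1 / (1 - rho**2), 2))
```

With $\rho = 0.6$ ($R^2 = 0.36$) the simulated ratio should land near $1/0.64 \approx 1.56$, slightly lower in small samples because the coefficient is estimated; increasing `n` changes the ratio little.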
Relative efficiency of the optimal SP estimator to the MM SP estimator when $Y_d$ is continuous. $R_T^2$ and the sample size $n$ are calculated as described in Section 4.
For the binary setup, the results show that the sample size formula provides close to the nominal power of 0.80, albeit somewhat less at times, for at most moderate nonlinearity in the expected Bernoulli outcomes ($\beta = -6.0, -4.5$), and is conservative otherwise. We attribute the excessive sample sizes for $\beta = -3.0$ to the inability of the VIF to adequately account for strong nonlinearity rather than to a marked failure of sequential homogeneity of variance, given the good performance of the normal-model setup in the presence of this type of failure (Dawson and Lavori, 2010). However, strong departures from linearity may not be at issue in many realistic applications because of their impact on $\mu$, which is much higher for $\beta = -3.0$ ($\mu = 0.82$) than for $\beta = -6.0$ and $-4.5$ ($\mu = 0.44$ and $0.63$, respectively). An ATS will tend to be at most moderately successful in populations with enough response heterogeneity to make sequential treatment adaptation clinically attractive, making values of $\mu$ as high as 0.82 unlikely.
Performance of the sample size formula for nominal power = 0.80 using either ML estimation or optimal SP estimation when $Y_d$ is binary. VIF is calculated from the regression of $Y_d$ on $S_d$.
The performance of SP and ML estimation is more similar in the binary case than for continuous $Y_d$, although the larger sample sizes (expected for discrete outcomes) promote significant differences in achieved power. The impact on achieved power of differences in the estimates of $\mu$ is sometimes canceled out by the impact of differences in the estimated standard errors. When repleteness held across replications, differences in standard errors had only a modest impact. See Appendix B in the supplementary materials
available at Biostatistics