
Contemp Clin Trials. Author manuscript; available in PMC 2013 May 1.

Published in final edited form as:

Contemp Clin Trials. 2012 May; 33(3): 550–556.

PMCID: PMC3370397

NIHMSID: NIHMS356704

Email: Song.Zhang@utsouthwestern.edu (Song Zhang)

**Abstract**

Sample size calculations based on two-sample comparisons of slopes in repeated measurements have been reported by many investigators. In contrast, the literature has paid relatively little attention to sample size calculations for time-averaged differences in the presence of missing data in repeated measurement studies. Diggle *et al.* (2002) provided a sample size formula comparing time-averaged differences for continuous outcomes in repeated measurement studies, assuming no missing data and a compound symmetry (CS) correlation structure among outcomes from the same subject. In this paper we extend Diggle *et al.*'s time-averaged difference sample size formula by allowing missing data and various correlation structures. We propose to use the generalized estimating equation (GEE) method to compare the time-averaged differences in repeated measurement studies and introduce a closed-form formula for sample size and power. Simulation studies were conducted to investigate the performance of the GEE sample size formula with small sample sizes, correlation structures from the damped exponential family, and missing data. The proposed sample size formula is illustrated using a clinical trial example.
**1. Introduction**

In controlled clinical trials, subjects are often evaluated at baseline and at intervals over a treatment period. For example, Aronow and Ahn (1994) investigated how blood pressure levels vary after meals in 499 elderly residents of a long-term health care facility. Blood pressure was measured at baseline and then every 15 minutes over a two-hour period after a meal. The researchers showed that the mean maximal reduction in post-prandial systolic blood pressure was significantly greater in residents treated with angiotensin-converting-enzyme inhibitors, calcium channel blockers, diuretics, nitrates, or psychotropic drugs than in those not treated with these drugs. By measuring the response (e.g., blood pressure reduction) at multiple time points, researchers hope that the time-averaged response (e.g., the average reduction in blood pressure over the time points) within each group can provide a more precise assessment of the treatment effect. Comparing two treatments based on the time-averaged difference, defined as the difference between the time-averaged responses under the two treatments, might consequently offer greater testing power. Care must be taken in the analysis because of the correlation introduced when several measurements are taken from the same individual. The analysis might be further complicated by the occurrence of missing data.

In this paper we investigate sample size determination based on the test of time-averaged difference between treatment groups over a period of a fixed duration. Diggle et al. (2002) provided closed-form sample size formulas to compare the time-averaged responses and the rates of change in studies with continuous outcomes, assuming no missing data, an equal number of subjects between two groups, and the compound symmetry (CS) correlation among observations from the same subject. Liu and Wu (2005) extended the sample size formula for time-averaged differences to unbalanced clinical trials. Zhang and Ahn (2011) investigated how the number of repeated measurements affects the sample size requirement in repeated measurement studies, where statistical inference is obtained based on time-averaged differences. Here we further extend the sample size calculation for time-averaged difference to allow for missing data, general correlation structures, and unequal sample sizes between study groups. Liang and Zeger (1986) developed the generalized estimating equation (GEE) method which has been widely used to analyze repeated measurements data due to its ability to accommodate missing data and robustness against mis-specification of the true correlation structure. We will employ the GEE method to derive a closed-form sample size formula for repeated measurement studies.

We briefly review the GEE method for the analysis of repeated measurements data in Section 2. A closed-form sample size formula for comparing the time-averaged differences between treatment groups is derived in Section 3. The proposed sample size formula is general enough to accommodate various missing data patterns, such as random missing (RM) or monotone missing (MM), and various correlation structures, represented by a damped exponential family that includes the autoregressive correlation of order 1 (AR(1)) and the compound symmetry (CS) correlation as special cases. In Section 4, we compare the sample size adjustment for missing data by the proposed approach (a theoretical derivation that appropriately accounts for the impact of the missing pattern, the observation probabilities over time, and the correlation structure) with that by the traditional approach (a conservative strategy under which, even after excluding patients with partial measurements, the subset of patients with complete measurements suffices to achieve the nominal power and type I error). We perform simulations to assess the performance of the sample sizes in Section 5. We apply the sample size formula to a clinical trial example in Section 6. The final section is devoted to discussion.
**2. Generalized Estimating Equation Estimator**

Let *Y*_{ij} denote the continuous response measurement obtained at time *t*_{j} from subject *i* (*i* = 1, …, *n*; *j* = 1, …, *m*), and let *r*_{i} be the treatment indicator, taking value 1 if subject *i* is in the treatment group and 0 otherwise. We assume the marginal model

$$Y_{ij} = \beta_1 + \beta_2 r_i + \epsilon_{ij},$$

(1)

where parameter *β*_{1} models the intercept effect, parameter *β*_{2} models the difference in time-averaged responses between the two groups, and *ε*_{ij} denotes random error. The primary interest is to test the hypothesis *H*_{0} : *β*_{2} = 0 against *H*_{1} : *β*_{2} ≠ 0.

It is usually assumed that *E*(*ε*_{ij}) = 0 and Var(*ε*_{ij}) = *σ*^{2}.

To simplify derivation, we rewrite the model as

$$Y_{ij} = b_1 + b_2 (r_i - \bar{r}) + \epsilon_{ij},$$

where *b*_{1} = *β*_{1} + *β*_{2}*r̄* and *b*_{2} = *β*_{2}, with $\bar{r} = n^{-1} \sum_i r_i$. Define the vector of covariates *Z*_{ij} = (1, *r*_{i} − *r̄*)′. The GEE estimator $\widehat{\mathbf{b}} = (\widehat{b}_1, \widehat{b}_2)'$ solves the estimating equation $S_n(\mathbf{b}) = 0$, where

$$S_n(\mathbf{b}) = n^{-1/2} \sum_{i=1}^{n} \sum_{j=1}^{m} (Y_{ij} - \mathbf{b}' Z_{ij}) Z_{ij}.$$

That is,

$$\widehat{\mathbf{b}} = \left( \sum_i \sum_j Z_{ij} Z_{ij}' \right)^{-1} \sum_i \sum_j Z_{ij} Y_{ij}.$$

(2)

We use the independent working correlation structure to derive the GEE estimator. Following Liang and Zeger (1986), $\sqrt{n}(\widehat{\mathbf{b}} - \mathbf{b})$ is approximately normal with mean 0 and variance $\Sigma_n = A_n^{-1} V_n A_n^{-1}$, with

$$\begin{aligned}
A_n &= n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m} Z_{ij} Z_{ij}' = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m} \begin{pmatrix} 1 & r_i - \bar{r} \\ r_i - \bar{r} & (r_i - \bar{r})^2 \end{pmatrix}, \\
V_n &= n^{-1} \sum_{i=1}^{n} \left( \sum_j Z_{ij} \widehat{\epsilon}_{ij} \right) \left( \sum_j Z_{ij} \widehat{\epsilon}_{ij} \right)' \\
    &= n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{j'=1}^{m} \widehat{\epsilon}_{ij} \widehat{\epsilon}_{ij'} \begin{pmatrix} 1 & r_i - \bar{r} \\ r_i - \bar{r} & (r_i - \bar{r})^2 \end{pmatrix},
\end{aligned}$$

(3)

and $\widehat{\epsilon}_{ij} = Y_{ij} - \widehat{\mathbf{b}}' Z_{ij}$. Hence we reject the hypothesis *H*_{0} : *b*_{2} = 0, which is equivalent to rejecting *H*_{0} : *β*_{2} = 0, if the absolute value of $\sqrt{n}\,\widehat{b}_2 / \widehat{\sigma}_2$ is greater than *z*_{1−α/2}. Here $\widehat{\sigma}_2^2$ is the (2, 2)th element of $\Sigma_n$, and *z*_{1−α/2} is the 100(1 − *α*/2)th percentile of the standard normal distribution.
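For concreteness, the estimator (2) and the sandwich variance (3) can be sketched as follows. This is our illustration in Python (the paper's own code is stated to be in R); it already carries 0/1 observation indicators, which simply equal 1 when no data are missing.

```python
import numpy as np

def gee_time_averaged(Y, delta, r):
    """GEE estimate of b = (b1, b2)' and its sandwich variance, following
    Eqs. (2)-(3) with the independent working correlation.

    Y     : n x m response matrix (entries with delta == 0 are ignored)
    delta : n x m 0/1 observation indicators (all 1 for complete data)
    r     : length-n treatment indicators (1 = treatment, 0 = control)
    """
    n, m = Y.shape
    x = r - r.mean()                                   # r_i - rbar
    A = np.zeros((2, 2))
    c = np.zeros(2)
    for i in range(n):
        Zi = np.column_stack([np.ones(m), np.full(m, x[i])])  # rows are Z_ij'
        Zd = delta[i][:, None] * Zi
        A += Zd.T @ Zi                                 # sum_ij delta Z Z'
        c += Zd.T @ Y[i]                               # sum_ij delta Z Y
    b_hat = np.linalg.solve(A, c)                      # Eq. (2), with delta
    V = np.zeros((2, 2))
    for i in range(n):
        Zi = np.column_stack([np.ones(m), np.full(m, x[i])])
        eps = delta[i] * (Y[i] - Zi @ b_hat)           # residuals, 0 if missing
        s = Zi.T @ eps                                 # sum_j delta Z_ij eps_ij
        V += np.outer(s, s)
    Sigma = np.linalg.inv(A / n) @ (V / n) @ np.linalg.inv(A / n)
    return b_hat, Sigma / n                            # Var(b_hat) ~ Sigma_n / n

# Noiseless sanity check: Y_ij = 2 + 3 r_i exactly and no missing data,
# so b1 = beta1 + beta2 * rbar = 3.5 and b2 = beta2 = 3.
r = np.array([1.0] * 5 + [0.0] * 5)
Y = 2 + 3 * r[:, None] + np.zeros((10, 3))
b, S = gee_time_averaged(Y, np.ones((10, 3)), r)
```

The test of *H*_{0} then compares $|\widehat{b}_2| / \sqrt{\widehat{\sigma}_2^2/n}$ (here, `abs(b[1]) / sqrt(S[1, 1])`) with *z*_{1−α/2}.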

**3. A Closed Form Sample Size Formula**

Let $\sigma_2^2$ be the asymptotic variance of the GEE estimator for *β*_{2}, i.e., the (2, 2)th element of the limit of $\Sigma_n$. Given type I error *α*, power 1 − *γ*, and the true value of the time-averaged difference *β*_{2}, the required sample size is

$$n = \frac{\sigma_2^2 (z_{1-\alpha/2} + z_{1-\gamma})^2}{\beta_2^2}.$$

To account for missing data, we assume that measurements are made at the scheduled times unless missing, and that the missing probability depends only on time. We introduce an indicator *δ*_{ij} which takes value 1 if subject *i* is observed at time *t*_{j} and 0 otherwise. Then *A*_{n} and *V*_{n} become

$$A_n = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m} \delta_{ij} \begin{pmatrix} 1 & r_i - \bar{r} \\ r_i - \bar{r} & (r_i - \bar{r})^2 \end{pmatrix}$$

and

$$V_n = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{j'=1}^{m} \delta_{ij} \delta_{ij'} \widehat{\epsilon}_{ij} \widehat{\epsilon}_{ij'} \begin{pmatrix} 1 & r_i - \bar{r} \\ r_i - \bar{r} & (r_i - \bar{r})^2 \end{pmatrix}.$$

Let *ρ*_{jj′} = Corr(*Y*_{ij}, *Y*_{ij′}) denote the within-subject correlation, *p*_{j} = *E*(*δ*_{ij}) the probability of observing a measurement at time *t*_{j}, and *p*_{jj′} = *E*(*δ*_{ij}*δ*_{ij′}) the joint probability of observing measurements at both *t*_{j} and *t*_{j′} (with *p*_{jj} = *p*_{j}). As *n* → ∞, *A*_{n} and *V*_{n} converge to

$$A = \lambda \begin{pmatrix} 1 & 0 \\ 0 & \sigma_r^2 \end{pmatrix} \quad \text{and} \quad V = \sigma^2 \begin{pmatrix} \eta & 0 \\ 0 & \eta \sigma_r^2 \end{pmatrix},$$

where $\sigma_r^2 = \bar{r}(1 - \bar{r})$, $\lambda = \sum_{j=1}^{m} p_j$, and $\eta = \sum_{j=1}^{m} \sum_{j'=1}^{m} p_{jj'} \rho_{jj'}$. It is easy to show that the (2, 2)th element of $\Sigma = A^{-1} V A^{-1}$ is

$$\sigma_2^2 = \frac{\sigma^2 \eta}{\lambda^2 \sigma_r^2}.$$

Then the required sample size is

$$n = \frac{\sigma^2 \eta \, (z_{1-\alpha/2} + z_{1-\gamma})^2}{\beta_2^2 \lambda^2 \sigma_r^2}.$$

(4)

This closed-form sample size formula provides a flexible approach to sample size estimation because it can accommodate a broad spectrum of experimental designs, missing patterns, and correlation structures through the specification of (*r̄*, *p*_{j}, *p*_{jj′}, *ρ*_{jj′}).

For example, *r̄* = 0.5 indicates a balanced design, while *r̄* = 0.33 implies a 1 : 2 randomization ratio between the treatment and control groups. The temporal trend of missingness is described by **P** = (*p*_{1}, …, *p*_{m}), with the joint probabilities *p*_{jj′} determined by **P** and the missing pattern. The within-subject correlation structure is modeled by the damped exponential family,

$$\rho = (\rho_{jj'})_{m \times m} = \begin{pmatrix} 1 & \rho^{|t_1 - t_2|^{\phi}} & \dots & \rho^{|t_1 - t_m|^{\phi}} \\ \rho^{|t_2 - t_1|^{\phi}} & 1 & \dots & \rho^{|t_2 - t_m|^{\phi}} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{|t_m - t_1|^{\phi}} & \rho^{|t_m - t_2|^{\phi}} & \dots & 1 \end{pmatrix}.$$

(5)

Note that *ϕ* = 0 gives the CS correlation structure with *ρ*_{jj′} = *ρ* for all *j* ≠ *j′*, while *ϕ* = 1 gives the AR(1) structure with *ρ*_{jj′} = *ρ*^{|t_j − t_j′|}; intermediate values of *ϕ* represent a gradual transition between the two.
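As a sketch of how formula (4) can be evaluated in practice (our Python illustration, not the authors' R code), the function below takes the design inputs directly. The joint-probability rules for the two missing patterns, *p*_{jj′} = min(*p*_{j}, *p*_{j′}) under monotone missing and *p*_{jj′} = *p*_{j}*p*_{j′} under random missing, are assumptions of this sketch. With *σ*² = 1 and six equally spaced measurement times on [0, 1], it reproduces sample sizes reported in Section 5, e.g. *n* = 197 for complete data under CS with *ρ* = 0.1.

```python
import math
from statistics import NormalDist

def gee_sample_size(sigma2, beta2, p, t, rho, phi,
                    rbar=0.5, missing="MM", alpha=0.05, power=0.8):
    """Evaluate the sample size formula (4).

    p[j] : probability of observing the measurement at time t[j].
    Correlation: damped exponential, rho_{jj'} = rho ** (|t_j - t_j'| ** phi).
    Joint observation probabilities (assumed here):
      "MM" (monotone missing) -> p_{jj'} = min(p_j, p_j')
      "RM" (random missing)   -> p_{jj'} = p_j * p_j' for j != j'
    """
    z = NormalDist().inv_cdf
    lam = sum(p)                                   # lambda = sum_j p_j
    eta = 0.0                                      # eta = sum p_{jj'} rho_{jj'}
    for j, (pj, tj) in enumerate(zip(p, t)):
        for k, (pk, tk) in enumerate(zip(p, t)):
            if j == k:
                eta += pj                          # p_{jj} = p_j, rho_{jj} = 1
            else:
                pjk = min(pj, pk) if missing == "MM" else pj * pk
                eta += pjk * rho ** (abs(tj - tk) ** phi)
    sigma_r2 = rbar * (1.0 - rbar)
    zsq = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil(sigma2 * eta * zsq / (beta2 ** 2 * lam ** 2 * sigma_r2))
```

For instance, `gee_sample_size(1.0, 0.2, [1]*6, [j/5 for j in range(6)], 0.1, 0.0)` evaluates the complete-data CS design with *ρ* = 0.1.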

**4. Traditional Adjustment for Missingness**

In the design of clinical trials with repeated measurements, a common practice to account for missing data is to estimate the sample size by *n*_{0}/(1 − *q*), where *n*_{0} is the sample size estimated assuming no missing measurements and *q* is the expected dropout rate (Patel and Rowe, 1999). In our notation the expected dropout rate at the end of the study is *q* = 1 − *p*_{m}, so the traditionally adjusted sample size is *n*_{0}/*p*_{m}. The following theorem compares this adjustment with the proposed formula (4).

**Theorem 1.** *In designing a repeated measurement study to compare the time-averaged difference between the control and treatment groups, the sample size estimated by (4) is no larger than that obtained by the traditional adjustment for missing data,*

$$n \le n_0 / p_m,$$

(6)

*as long as the following two conditions hold:*

1. *The probability of observation is non-increasing over time:* *p*_{1} ≥ *p*_{2} ≥ ··· ≥ *p*_{m}.
2. *The within-subject correlation is positive:* *ρ*_{jj′} ≥ 0.

*Furthermore, the equality sign in (6) only holds for complete data* (*p*_{1} = ··· = *p*_{m} = 1) *with independent measurements within subjects* (*ρ*_{jj′} = 0 *for* *j* ≠ *j′*).

*Proof.* See Appendix A.1.

Condition 1 assumes a non-decreasing trend in the probability of missing data, which is a realistic assumption in most clinical trials. The positive correlation assumption stated by Condition 2 agrees with the general perception that the within-subject correlation between measurements is generally non-negative. These two conditions place no restriction on the missing pattern (complete data, RM, MM) or the correlation structure (CS, AR(1)). The proposed sample size formula (4) appropriately accounts for the factors affecting statistical inference in repeated measurement studies. As a result, Theorem 1 shows that it preserves the power and type I error better than the traditional approach in the presence of missing data.
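To make the contrast concrete, the following sketch (ours; *σ*² = 1 is assumed, as in the simulations of Section 5) computes both adjustments for one design discussed there: *m* = 6 visits, CS correlation with *ρ* = 0.1, *β*_{2} = 0.2, *r̄* = 0.5, and observation probabilities *P*_{1} under random missing.

```python
from math import ceil
from statistics import NormalDist

# Design: m = 6 visits, CS correlation rho = 0.1, beta2 = 0.2, rbar = 0.5,
# alpha = 0.05, power = 0.8; sigma^2 = 1 assumed.
p = [1.0, 0.82, 0.79, 0.76, 0.73, 0.7]        # observation probabilities P1
rho, m = 0.1, len(p)
z = NormalDist().inv_cdf
zsq = (z(0.975) + z(0.8)) ** 2
sigma_r2 = 0.5 * 0.5                          # rbar * (1 - rbar)

def n_required(lam, eta, beta2=0.2):
    """Formula (4) with sigma^2 = 1."""
    return ceil(eta * zsq / (beta2 ** 2 * lam ** 2 * sigma_r2))

# Complete data: lambda = m and eta = m + rho * m * (m - 1).
n0 = n_required(m, m + rho * m * (m - 1))
# Proposed formula under random missing: p_{jj'} = p_j * p_{j'} for j != j'.
lam = sum(p)
eta = lam + rho * (lam ** 2 - sum(q * q for q in p))
n_proposed = n_required(lam, eta)
# Traditional adjustment: n0 / (1 - q) with q = 1 - p_m = 0.3.
n_traditional = ceil(n0 / p[-1])
```

This reproduces the numbers quoted in Section 5: *n*_{0} = 197, a traditional adjustment of 282, and a proposed requirement of only 229 subjects.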

**5. Simulation Study**

We conduct Simulation 1 to demonstrate the effect of various design configurations on the sample size in repeated measurement studies. The nominal levels of power and type I error are set at 1 − *γ* = 0.8 and *α* = 0.05, respectively. We consider correlation structures from the damped exponential family with *ϕ* = 0, 0.25, 0.5, 0.75, and 1, representing a gradual transition from the CS to the AR(1) structure. Three values of *ρ* are explored: 0.1, 0.25, and 0.5. We also assess the effect of the missing patterns, RM and MM, under various trends in the observation probability. Assuming a total of *m* = 6 scheduled measurements from each subject, we consider

$$\begin{aligned}
\mathbf{P}_1 &= (1, 0.82, 0.79, 0.76, 0.73, 0.7), \\
\mathbf{P}_2 &= (1, 0.94, 0.88, 0.82, 0.76, 0.7), \\
\mathbf{P}_3 &= (1, 1, 1, 0.9, 0.8, 0.7), \\
\mathbf{P}_4 &= (1, 1, 1, 1, 1, 1).
\end{aligned}$$

(7)

The probabilities *P*_{1}, *P*_{2} and *P*_{3} describe different scenarios in which an increasing number of subjects miss visits over the follow-up period, with the same dropout rate at the end of the study, *q* = 1 − *p*_{m} = 0.3. Specifically, *P*_{1} front-loads the missingness (a sharp drop after the first visit), *P*_{2} features a steady decline in the observation probability, and *P*_{3} concentrates the missingness toward the end of the study; *P*_{4} represents complete data.

To assess the performance of the proposed sample size approach, for every combination of the aforementioned factors (*σ*^{2}, *ρ*, *ϕ*, observation probability, missing pattern), the simulation study proceeds as follows: a) Estimate the sample size (*n*) based on Equation (4); b) Generate 5000 null (under *β*_{2} = 0) and 5000 alternative (under *β*_{2} = 0.2) data sets, each containing *n* subjects. Every subject has a vector of measurements, *Y*_{i} = (*Y*_{i1}, ···, *Y*_{im})′, generated from Model (1), with within-subject correlation given by the damped exponential structure (5) and missingness imposed according to the specified observation probabilities and missing pattern; c) Apply the GEE-based test to each data set, and compute the empirical type I error and the empirical power as the proportions of null and alternative data sets, respectively, in which *H*_{0} is rejected.
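Step b) can be sketched as follows. This is our illustration: normal errors and a single-dropout-time mechanism for monotone missing are assumptions of the sketch, not details stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_dataset(n, beta1, beta2, sigma2, p, t, rho, phi, missing="MM"):
    """Generate one data set: Model (1) responses with damped exponential
    within-subject correlation, plus 0/1 observation indicators.

    Returns Y (n x m responses) and delta (n x m indicators). Normal errors
    are assumed for illustration; GEE itself does not require normality.
    """
    m = len(t)
    tt = np.asarray(t, dtype=float)
    R = rho ** (np.abs(tt[:, None] - tt[None, :]) ** phi)
    np.fill_diagonal(R, 1.0)
    L = np.linalg.cholesky(sigma2 * R)             # Cov(eps_i) = sigma^2 * R
    r = (np.arange(n) < n // 2).astype(float)      # half treated, rbar = 0.5
    eps = rng.standard_normal((n, m)) @ L.T
    Y = beta1 + beta2 * r[:, None] + eps
    pv = np.asarray(p, dtype=float)
    if missing == "RM":                            # visits missed independently
        delta = (rng.random((n, m)) < pv).astype(int)
    else:                                          # monotone missing: one latent
        u = rng.random((n, 1))                     # dropout draw per subject,
        delta = (u <= pv).astype(int)              # so P(delta_ij = 1) = p_j
    return Y, delta
```

Under the monotone mechanism, a subject's indicators are non-increasing across visits and the marginal observation probability at visit *j* is exactly *p*_{j}.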

The empirical type I error and empirical power being close to the nominal levels (*α* = 0.05, 1 − *γ* = 0.8) indicates good performance of the proposed method. Table B.1 presents the sample size estimates and empirical power from Simulation 1. The empirical type I errors are all close to 0.05 and are omitted due to space limits. The rows with observation probability *P*_{4} are identical under the RM and MM missing patterns since there is no missing data. We have several observations. First, sample sizes increase in the order of *P*_{4}, *P*_{3}, *P*_{2} and *P*_{1}, under all correlation structures and missing patterns, which is understandable because the severity of missingness increases in the same order. It should be pointed out that the traditional adjustment for missing data greatly overestimates the sample size requirement. For example, under the CS correlation with *ρ* = 0.1, the sample size for complete data is *n*_{0} = 197. The traditional adjustment for a dropout rate of 0.3 would require a sample size of 282. The proposed approach, however, requires at most 229 and 240 subjects under the RM and MM missing patterns, respectively. The saving in sample size not only reduces the time and resources required; more importantly, it exposes fewer subjects to the potential risks of experimental treatments.

Second, along each row, the sample sizes increase with *ϕ* within the damped exponential family, regardless of missing pattern. The magnitude of the relative increase, however, is smaller under higher correlation. For example, under the RM pattern and observation probability *P*_{1}, with *ρ* = 0.1, the sample size increases from 229 to 419 as *ϕ* increases from 0 to 1, an 83% increase. With a higher correlation (*ρ* = 0.5), the sample size increases by 31%, from 490 to 641. Third, given any missing pattern and correlation structure, a stronger correlation always leads to a larger sample size requirement. For example, under the MM pattern with observation probability *P*_{2} and the AR(1) structure, the required sample sizes are 439, 551, and 677 for correlation *ρ* = 0.1, 0.25, and 0.5, respectively. Finally, Table B.1 demonstrates the importance of correctly specifying the correlation structure and missing pattern. For example, in the first row of Table B.1, all other factors being the same, a CS correlation structure (*ϕ* = 0) leads to a sample size of 229 while an AR(1) structure (*ϕ* = 1) leads to a sample size of 419. Thus if the true correlation structure is AR(1) but we mistakenly assume a CS structure in the experimental design, the study will be severely underpowered. Similarly, by comparing the corresponding sample sizes under the RM and MM patterns in the table, we can assess the impact of the missing pattern. In Table B.1, a sample size under MM is always larger than that under RM. Thus if the true missing pattern is random but we mistakenly assume an MM pattern in the sample size calculation, the study will be overpowered, resulting in wasted resources and excess risk to patients. We conducted another simulation with power 0.9 (results presented in Table B.2). Except for larger sample sizes, we have observations similar to those from Table B.1. The following theorem formally summarizes the empirical observations from the two tables.

**Theorem 2.** *For a repeated measurement study to compare the time-averaged differences between the control and treatment groups, when Conditions 1 and 2 listed in Theorem 1 hold, the required sample size*

1. *increases with the power parameter ϕ in the damped exponential family;*
2. *increases with the correlation ρ;*
3. *is larger under MM than under RM,*

*when all the other factors are held equal.*

*Proof.* See Appendix A.2.

In clinical trials patients might have difficulty following the exact schedule of clinic visits. In other words, the measurement times are more likely to be random than fixed. We conduct Simulation 2 to examine the performance of the proposed sample size in this more realistic scenario: the sample size is calculated assuming a fixed visit schedule based on (4), but the data are simulated so that the first and last measurements are made at times 0 and *T*, and the remaining measurements are made at random time points in between. The simulation results are presented in Table B.3. Because the CS structure (*ϕ* = 0) assumes a constant within-subject correlation, whether the measurement times are random or fixed does not affect the statistical inference on time-averaged differences; thus the first columns of Table B.1 and Table B.3 are identical. When *ϕ* > 0, the temporal difference between measurements affects the strength of the within-subject correlation, and in turn the inference on the time-averaged difference. The effect of randomness in measurement times, however, appears to be small. In Table B.3, the empirical power remains close to the nominal level (1 − *γ* = 0.8), indicating that the proposed sample size performs well when clinic visits deviate from the planned schedule.

The statistical inference under the GEE method is based on a large sample approximation. It is thus worthwhile to examine the performance of the proposed approach in a small-sample-size scenario, which might be encountered by practitioners due to a large treatment effect or a limited disease population. We conduct Simulation 3 where the sample size is fixed at 30 subjects per group with *n* = 60. Here is the basic idea. From Equation (4) we can derive a relationship between sample size *n* and treatment effect *β*_{2}:

$$\beta_2 = \sqrt{\frac{\sigma^2 \eta \, (z_{1-\alpha/2} + z_{1-\gamma})^2}{n \lambda^2 \sigma_r^2}}.$$

(8)

Thus by setting *n* = 60, we can compute from (8) the corresponding values of *β*_{2}. In other words, instead of fixing the value of the treatment effect *β*_{2}, in Simulation 3 we allow *β*_{2} to vary across different trial configurations so that the sample size estimated by the proposed approach is always small (*n* = 60).
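Equation (8) is straightforward to evaluate directly. As an illustration of ours (assuming *σ*² = 1 and, for the example call, complete data under CS with *ρ* = 0.1, so that *λ* = 6 and *η* = 9):

```python
import math
from statistics import NormalDist

def detectable_effect(n, sigma2, eta, lam, rbar=0.5, alpha=0.05, power=0.8):
    """Solve Equation (8) for beta2 at a fixed sample size n."""
    z = NormalDist().inv_cdf
    zsq = (z(1 - alpha / 2) + z(power)) ** 2
    return math.sqrt(sigma2 * eta * zsq / (n * lam ** 2 * rbar * (1 - rbar)))

# Complete data, m = 6, CS with rho = 0.1: lambda = 6, eta = 6 + 0.1 * 6 * 5 = 9.
beta2 = detectable_effect(n=60, sigma2=1.0, eta=9.0, lam=6.0)
```

Smaller *η* (weaker or more quickly damped correlation) or larger *λ* (less missingness) yields a smaller detectable effect at the same *n*.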

The results of Simulation 3, which is conducted in a similar fashion to Simulation 1 except that *β*_{2} varies so as to fix the sample size at *n* = 60, are presented in Table B.4. The empirical powers are close to the nominal level (0.8) under all trial configurations. This provides assurance to researchers that the proposed approach is widely applicable to clinical trials with repeated measurements, even when the sample size is relatively small (30 subjects per group with a 30% dropout rate).

The simulation studies are conducted using statistical software R 2.13.1 (R Foundation for Statistical Computing, Vienna, Austria). The R code is available upon request from the first author.

**6. Example**

We apply the sample size calculation method to a labor pain study (Davis, 1991), in which women in labor were randomly assigned to a pain medication group or a placebo group. At 30-minute intervals, the self-reported amount of pain was marked on a 100 mm line, with 0 mm = no pain and 100 mm = extreme pain. The maximum number of measurements for each woman was *m* = 6, but there were numerous missing values at later measurement times, following the monotone missing pattern. The observation probabilities are **P** = (1, 0.90, 0.77, 0.67, 0.54, 0.41). Exploratory analysis indicates that the measurements have an AR(1) correlation structure with

**7. Discussion**

In this study we have derived a closed-form sample size formula for clinical trials with repeated measurements, where the primary goal is to compare the time-averaged difference between the control and treatment groups. The formula is flexible enough to accommodate different missing patterns, severities of missing data, and correlation structures. We further show that the proposed sample size formula, by taking into account the various design factors, adjusts the sample size for missing data better than the traditional approach. It should be pointed out that it might sometimes be difficult to know the dropout rates at different time points or to accurately specify the within-subject correlation structure. In such cases the traditional adjustment for missing data remains a practical solution. The closed-form formula also provides insight into the relationship between the sample size requirement and the various design factors, as summarized in Theorem 2.

The proposed approach is readily applicable to cluster randomization trials with missing data. We have assumed that the missing data arises from a missing completely at random (MCAR) mechanism. In some clinical trials this assumption may not hold. To appropriately account for a non-MCAR mechanism, however, an additional model is required which characterizes the true missing mechanism adequately well. In such cases a general sample size formula is usually unavailable and a specially designed numerical study is required to estimate the sample size for a particular missing mechanism.

This work was supported in part by NIH grants UL1 RR024982, P30CA142543, P50CA70907, and DK081872.
**Appendix A.1. Proof of Theorem 1**

First we derive the expression of *n*_{0}, which can be obtained from Equation (4) by setting all *p*_{j} = 1 and all *p*_{jj′} = 1:

$$n_0 = \frac{\sigma^2 \left( \sum_{j=1}^{m} \sum_{j'=1}^{m} \rho_{jj'} \right) (z_{1-\alpha/2} + z_{1-\gamma})^2}{\beta_2^2 m^2 \sigma_r^2}.$$

Thus

$$\frac{n}{n_0 / p_m} = \frac{m^2 p_m}{\left( \sum_{j=1}^{m} p_j \right)^2} \cdot \frac{\sum_{j=1}^{m} \sum_{j'=1}^{m} p_{jj'} \rho_{jj'}}{\sum_{j=1}^{m} \sum_{j'=1}^{m} \rho_{jj'}}.$$

Defining $\bar{p} = \sum_{j=1}^{m} p_j / m$ and using the facts that $\rho_{jj} = 1$ and $p_{jj} = p_j$, we have

$$\frac{n}{n_0 / p_m} = \frac{m^2 p_m}{m^2 \bar{p}^2} \cdot \frac{m\bar{p} + 2\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} p_{jj'} \rho_{jj'}}{m + 2\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}}.$$

**Lemma 1.** *Under Conditions 1 and 2,*

$$\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} p_{jj'} \rho_{jj'} \le \bar{p} \sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}.$$

(A.1)

*Proof.* Inequality (A.1) is equivalent to

$$\frac{\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} p_{jj'} \rho_{jj'}}{\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}} \le \bar{p}.$$

Because *ρ*_{jj′} ≥ 0 and *p*_{jj′} ≤ *p*_{j′} for *j* < *j′* (the probability of observing both measurements cannot exceed that of observing the later one alone), we have

$$\begin{aligned}
\frac{\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} p_{jj'} \rho_{jj'}}{\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}} &\le \frac{\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} p_{j'} \rho_{jj'}}{\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}} \\
&= \frac{\sum_{j'=2}^{m} \left( p_{j'} \sum_{j=1}^{j'-1} \rho_{jj'} \right)}{\sum_{j'=2}^{m} \sum_{j=1}^{j'-1} \rho_{jj'}} \\
&= \sum_{j'=2}^{m} p_{j'} w_{j'}.
\end{aligned}$$

That is, the right-hand side of the inequality is a weighted average of {*p*_{j′} : *j′* = 2, …, *m*}, with weights

$$w_{j'} = \frac{\sum_{j=1}^{j'-1} \rho_{jj'}}{\sum_{l=2}^{m} \sum_{j=1}^{l-1} \rho_{jl}}.$$

With *ρ _{jj′}* ≥ 0 from Condition (2), we have

$$\sum_{j'=2}^{m} p_{j'} w_{j'} \le \bar{p}_{(-1)} \le \bar{p}.$$

Here $\bar{p}_{(-1)} = \sum_{j'=2}^{m} p_{j'} / (m-1)$ is the unweighted average of {*p*_{j′} : *j′* = 2, …, *m*}, which is no larger than $\bar{p}$ because *p*_{1} is the largest of the observation probabilities.

Using Lemma 1, we have

$$\frac{n}{n_0 / p_m} \le \frac{p_m}{\bar{p}^2} \cdot \frac{m\bar{p} + 2\bar{p} \sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}}{m + 2\sum_{j=1}^{m-1} \sum_{j'=j+1}^{m} \rho_{jj'}} = \frac{p_m \bar{p}}{\bar{p}^2} = \frac{p_m}{\bar{p}} \le 1.$$

The last “≤” sign comes from the fact that *p*_{m} is the smallest element in {*p*_{1}, …, *p*_{m}}, and hence *p*_{m} ≤ $\bar{p}$.

It is obvious from the above derivation that the equality sign in (6) only holds when *p*_{1} = ··· = *p*_{m} = 1 and *ρ*_{jj′} = 0 for all *j* ≠ *j′*.

**Appendix A.2. Proof of Theorem 2**

To prove Point 1, it is obvious from Equation (4) that the parameter *ϕ* affects the sample size only through $\eta = \sum_{j=1}^{m} \sum_{j'=1}^{m} p_{jj'} \rho_{jj'}$. Because $\rho_{jj'} = \rho^{|t_j - t_{j'}|^{\phi}}$ with 0 < *ρ* < 1, and the measurement times are scaled so that |*t*_{j} − *t*_{j′}| ≤ 1, each *ρ*_{jj′} is non-decreasing in *ϕ*; hence *η*, and with it the required sample size, increases with *ϕ*.

To prove Point 2, it is easy to show that, given the power parameter *ϕ*, the correlation coefficients *ρ*_{jj′} are increasing functions of *ρ*; hence *η*, and with it the sample size in (4), increases with *ρ*.

To prove Point 3, first note that the missing patterns (RM and MM) affect the sample size through *p*_{jj′} in *η*. Under RM the observation indicators are independent across visits, so *p*_{jj′} = *p*_{j}*p*_{j′} for *j* ≠ *j′*, whereas under MM a subject observed at the later visit must have been observed at the earlier one, so *p*_{jj′} = min(*p*_{j}, *p*_{j′}) = *p*_{j′} for *j* < *j′*. Because *p*_{j′} ≥ *p*_{j}*p*_{j′} and *ρ*_{jj′} ≥ 0, *η* is no smaller under MM than under RM, and hence neither is the required sample size.

**Table B.1** Sample size (empirical power) from Simulation 1, nominal power 1 − *γ* = 0.8.

(a) Random missing

| *ρ* | **P** | *ϕ* = 0 (CS) | *ϕ* = 0.25 | *ϕ* = 0.5 | *ϕ* = 0.75 | *ϕ* = 1 (AR(1)) |
|---|---|---|---|---|---|---|
| 0.1 | *P*_{1} | 229 (0.808) | 269 (0.809) | 318 (0.807) | 370 (0.805) | 419 (0.804) |
| | *P*_{2} | 220 (0.808) | 261 (0.805) | 311 (0.795) | 363 (0.812) | 413 (0.798) |
| | *P*_{3} | 211 (0.801) | 253 (0.798) | 304 (0.798) | 357 (0.795) | 408 (0.811) |
| | *P*_{4} | 197 (0.801) | 237 (0.807) | 287 (0.791) | 340 (0.802) | 389 (0.797) |
| 0.25 | *P*_{1} | 327 (0.785) | 380 (0.802) | 433 (0.805) | 481 (0.800) | 522 (0.807) |
| | *P*_{2} | 317 (0.797) | 371 (0.802) | 425 (0.809) | 474 (0.794) | 517 (0.802) |
| | *P*_{3} | 309 (0.810) | 364 (0.812) | 418 (0.792) | 468 (0.799) | 512 (0.799) |
| | *P*_{4} | 295 (0.795) | 349 (0.797) | 402 (0.800) | 451 (0.804) | 493 (0.809) |
| 0.5 | *P*_{1} | 490 (0.807) | 538 (0.807) | 579 (0.802) | 613 (0.801) | 641 (0.807) |
| | *P*_{2} | 480 (0.797) | 529 (0.798) | 571 (0.806) | 606 (0.807) | 634 (0.793) |
| | *P*_{3} | 472 (0.791) | 522 (0.802) | 564 (0.806) | 600 (0.791) | 628 (0.808) |
| | *P*_{4} | 458 (0.793) | 507 (0.804) | 549 (0.802) | 583 (0.801) | 611 (0.804) |

(b) Monotone missing

| *ρ* | **P** | *ϕ* = 0 (CS) | *ϕ* = 0.25 | *ϕ* = 0.5 | *ϕ* = 0.75 | *ϕ* = 1 (AR(1)) |
|---|---|---|---|---|---|---|
| 0.1 | *P*_{1} | 240 (0.802) | 288 (0.795) | 346 (0.803) | 408 (0.797) | 466 (0.807) |
| | *P*_{2} | 225 (0.805) | 270 (0.805) | 326 (0.804) | 384 (0.804) | 439 (0.796) |
| | *P*_{3} | 213 (0.811) | 256 (0.803) | 309 (0.802) | 364 (0.799) | 417 (0.798) |
| | *P*_{4} | 197 (0.814) | 237 (0.811) | 287 (0.808) | 340 (0.801) | 389 (0.808) |
| 0.25 | *P*_{1} | 353 (0.794) | 416 (0.798) | 479 (0.811) | 536 (0.791) | 586 (0.799) |
| | *P*_{2} | 331 (0.806) | 391 (0.803) | 450 (0.803) | 504 (0.805) | 551 (0.807) |
| | *P*_{3} | 313 (0.800) | 370 (0.813) | 426 (0.799) | 478 (0.803) | 523 (0.806) |
| | *P*_{4} | 295 (0.801) | 349 (0.813) | 402 (0.804) | 451 (0.806) | 493 (0.803) |
| 0.5 | *P*_{1} | 542 (0.803) | 599 (0.797) | 648 (0.798) | 689 (0.808) | 722 (0.806) |
| | *P*_{2} | 507 (0.790) | 562 (0.799) | 608 (0.796) | 646 (0.806) | 677 (0.798) |
| | *P*_{3} | 480 (0.800) | 531 (0.806) | 575 (0.807) | 612 (0.805) | 641 (0.797) |
| | *P*_{4} | 458 (0.809) | 507 (0.799) | 549 (0.808) | 583 (0.793) | 611 (0.798) |

| ρ | P | ϕ = 0 (CS) | ϕ = 0.25 | ϕ = 0.5 | ϕ = 0.75 | ϕ = 1 (AR(1)) |
|---|---|---|---|---|---|---|
| **(a) Random missing** | | | | | | |
| 0.1 | P_{1} | 307(0.893) | 360(0.903) | 426(0.899) | 495(0.901) | 560(0.895) |
| | P_{2} | 294(0.902) | 349(0.903) | 416(0.906) | 486(0.900) | 553(0.902) |
| | P_{3} | 282(0.905) | 338(0.909) | 406(0.899) | 478(0.899) | 546(0.900) |
| | P_{4} | 263(0.907) | 318(0.904) | 384(0.898) | 454(0.902) | 521(0.902) |
| 0.25 | P_{1} | 438(0.898) | 508(0.895) | 579(0.900) | 643(0.897) | 699(0.899) |
| | P_{2} | 425(0.897) | 497(0.902) | 569(0.898) | 635(0.904) | 691(0.894) |
| | P_{3} | 413(0.900) | 487(0.903) | 560(0.904) | 627(0.901) | 685(0.902) |
| | P_{4} | 395(0.899) | 466(0.903) | 538(0.900) | 603(0.899) | 660(0.895) |
| 0.5 | P_{1} | 656(0.905) | 720(0.904) | 775(0.896) | 821(0.904) | 858(0.898) |
| | P_{2} | 643(0.904) | 709(0.899) | 765(0.901) | 811(0.902) | 849(0.888) |
| | P_{3} | 631(0.906) | 698(0.896) | 775(0.902) | 802(0.905) | 841(0.905) |
| | P_{4} | 613(0.903) | 679(0.901) | 735(0.905) | 781(0.897) | 818(0.907) |
| **(b) Monotone missing** | | | | | | |
| 0.1 | P_{1} | 321(0.899) | 385(0.905) | 463(0.901) | 545(0.904) | 624(0.899) |
| | P_{2} | 301(0.902) | 362(0.904) | 436(0.903) | 514(0.905) | 588(0.910) |
| | P_{3} | 284(0.900) | 342(0.900) | 413(0.896) | 488(0.899) | 558(0.910) |
| | P_{4} | 263(0.905) | 318(0.906) | 384(0.907) | 454(0.901) | 521(0.904) |
| 0.25 | P_{1} | 473(0.902) | 557(0.898) | 641(0.911) | 718(0.904) | 784(0.900) |
| | P_{2} | 443(0.893) | 523(0.898) | 602(0.898) | 675(0.902) | 737(0.896) |
| | P_{3} | 418(0.901) | 495(0.905) | 571(0.909) | 640(0.898) | 700(0.906) |
| | P_{4} | 395(0.898) | 466(0.898) | 538(0.899) | 603(0.911) | 660(0.903) |
| 0.5 | P_{1} | 726(0.898) | 802(0.907) | 868(0.900) | 922(0.892) | 966(0.897) |
| | P_{2} | 679(0.897) | 752(0.896) | 814(0.898) | 865(0.903) | 906(0.905) |
| | P_{3} | 642(0.906) | 711(0.903) | 770(0.895) | 819(0.898) | 858(0.894) |
| | P_{4} | 613(0.904) | 679(0.904) | 735(0.900) | 781(0.908) | 818(0.905) |
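The ϕ axis of these tables spans the damped exponential correlation family of Munoz *et al.* (1992), corr(Y_{ij}, Y_{ik}) = ρ^{|t_j − t_k|^ϕ}, which reduces to compound symmetry at ϕ = 0 and to AR(1) at ϕ = 1. As a minimal complete-data sketch of how such a sample size is computed (the paper's closed-form formula additionally adjusts for the missing-data probabilities; the effect size, σ, and measurement times below are illustrative assumptions, not the paper's settings):

```python
import math
from statistics import NormalDist


def damped_exp_corr(times, rho, phi):
    """Damped exponential working correlation (Munoz et al., 1992):
    corr(Y_j, Y_k) = rho ** (|t_j - t_k| ** phi).
    With distinct time points, phi = 0 gives compound symmetry and
    phi = 1 gives AR(1) decay in |t_j - t_k|."""
    m = len(times)
    return [[1.0 if j == k else rho ** (abs(times[j] - times[k]) ** phi)
             for k in range(m)] for j in range(m)]


def n_per_group(delta, sigma, times, rho, phi, alpha=0.05, power=0.9):
    """Per-group n for a two-sided two-sample test of the time-averaged
    difference delta with complete data: the time-averaged response has
    variance sigma^2 * (1'R1) / m^2, so
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 * 1'R1 / (m^2 * delta^2)."""
    z = NormalDist().inv_cdf
    m = len(times)
    R = damped_exp_corr(times, rho, phi)
    one_R_one = sum(sum(row) for row in R)  # 1'R1: sum of all entries of R
    n = (2 * (z(1 - alpha / 2) + z(power)) ** 2
         * sigma ** 2 * one_R_one / (m ** 2 * delta ** 2))
    return math.ceil(n)


# Illustrative settings (assumed, not from the paper): 5 equally spaced
# visits scaled to [0, 1], sigma = 1, time-averaged difference delta = 0.3.
times = [j / 4 for j in range(5)]
for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(phi, n_per_group(delta=0.3, sigma=1.0, times=times, rho=0.1, phi=phi))
```

Consistent with the trend in the tables, n grows with ϕ in this sketch: with time points scaled to [0, 1], raising ϕ increases the between-visit correlations, which inflates the variance of the time-averaged mean and hence the required sample size.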

| ρ | P | ϕ = 0 (CS) | ϕ = 0.25 | ϕ = 0.5 | ϕ = 0.75 | ϕ = 1 (AR(1)) |
|---|---|---|---|---|---|---|
| **(a) Random missing** | | | | | | |
| 0.1 | P_{1} | 229(0.808) | 269(0.774) | 318(0.775) | 370(0.775) | 419(0.780) |
| | P_{2} | 220(0.808) | 261(0.783) | 311(0.772) | 363(0.770) | 413(0.784) |
| | P_{3} | 211(0.801) | 253(0.776) | 304(0.764) | 357(0.766) | 408(0.780) |
| | P_{4} | 197(0.801) | 237(0.778) | 287(0.769) | 340(0.767) | 389(0.766) |
| 0.25 | P_{1} | 327(0.785) | 380(0.795) | 433(0.782) | 481(0.790) | 522(0.792) |
| | P_{2} | 317(0.797) | 371(0.784) | 425(0.779) | 474(0.782) | 517(0.790) |
| | P_{3} | 309(0.810) | 364(0.790) | 418(0.787) | 468(0.776) | 512(0.792) |
| | P_{4} | 295(0.795) | 349(0.792) | 402(0.787) | 451(0.787) | 493(0.791) |
| 0.5 | P_{1} | 490(0.807) | 538(0.798) | 579(0.794) | 613(0.795) | 641(0.795) |
| | P_{2} | 480(0.797) | 529(0.796) | 571(0.795) | 606(0.791) | 634(0.799) |
| | P_{3} | 472(0.791) | 522(0.789) | 564(0.789) | 600(0.786) | 628(0.806) |
| | P_{4} | 458(0.793) | 507(0.798) | 549(0.786) | 583(0.802) | 611(0.801) |
| **(b) Monotone missing** | | | | | | |
| 0.1 | P_{1} | 240(0.802) | 288(0.772) | 346(0.764) | 408(0.764) | 466(0.784) |
| | P_{2} | 225(0.805) | 270(0.776) | 326(0.774) | 384(0.768) | 439(0.782) |
| | P_{3} | 213(0.811) | 256(0.765) | 309(0.773) | 364(0.773) | 417(0.779) |
| | P_{4} | 197(0.814) | 237(0.780) | 287(0.775) | 340(0.780) | 389(0.787) |
| 0.25 | P_{1} | 353(0.794) | 416(0.789) | 479(0.781) | 536(0.793) | 586(0.795) |
| | P_{2} | 331(0.806) | 391(0.791) | 450(0.793) | 504(0.784) | 551(0.785) |
| | P_{3} | 313(0.800) | 370(0.792) | 426(0.793) | 478(0.791) | 523(0.806) |
| | P_{4} | 295(0.801) | 349(0.788) | 402(0.788) | 451(0.784) | 493(0.791) |
| 0.5 | P_{1} | 542(0.803) | 599(0.785) | 648(0.801) | 689(0.793) | 722(0.799) |
| | P_{2} | 507(0.790) | 562(0.789) | 608(0.794) | 646(0.795) | 677(0.798) |
| | P_{3} | 480(0.800) | 531(0.790) | 575(0.788) | 612(0.807) | 641(0.796) |
| | P_{4} | 458(0.809) | 507(0.792) | 549(0.794) | 583(0.802) | 611(0.786) |

| ρ | P | ϕ = 0 (CS) | ϕ = 0.25 | ϕ = 0.5 | ϕ = 0.75 | ϕ = 1 (AR(1)) |
|---|---|---|---|---|---|---|
| **(a) Random missing** | | | | | | |
| 0.1 | P_{1} | 60(0.809) | 60(0.807) | 60(0.798) | 60(0.817) | 60(0.809) |
| | P_{2} | 60(0.809) | 60(0.814) | 60(0.816) | 60(0.803) | 60(0.816) |
| | P_{3} | 60(0.808) | 60(0.818) | 60(0.811) | 60(0.810) | 60(0.802) |
| | P_{4} | 60(0.815) | 60(0.798) | 60(0.809) | 60(0.814) | 60(0.819) |
| 0.25 | P_{1} | 60(0.808) | 60(0.808) | 60(0.814) | 60(0.824) | 60(0.809) |
| | P_{2} | 60(0.809) | 60(0.812) | 60(0.804) | 60(0.812) | 60(0.804) |
| | P_{3} | 60(0.814) | 60(0.812) | 60(0.805) | 60(0.813) | 60(0.809) |
| | P_{4} | 60(0.806) | 60(0.813) | 60(0.810) | 60(0.820) | 60(0.807) |
| 0.5 | P_{1} | 60(0.799) | 60(0.808) | 60(0.824) | 60(0.810) | 60(0.814) |
| | P_{2} | 60(0.824) | 60(0.804) | 60(0.810) | 60(0.803) | 60(0.805) |
| | P_{3} | 60(0.808) | 60(0.817) | 60(0.810) | 60(0.806) | 60(0.815) |
| | P_{4} | 60(0.808) | 60(0.809) | 60(0.820) | 60(0.817) | 60(0.806) |
| **(b) Monotone missing** | | | | | | |
| 0.1 | P_{1} | 60(0.808) | 60(0.803) | 60(0.815) | 60(0.826) | 60(0.812) |
| | P_{2} | 60(0.812) | 60(0.808) | 60(0.810) | 60(0.825) | 60(0.818) |
| | P_{3} | 60(0.807) | 60(0.810) | 60(0.804) | 60(0.800) | 60(0.801) |
| | P_{4} | 60(0.797) | 60(0.797) | 60(0.815) | 60(0.807) | 60(0.817) |
| 0.25 | P_{1} | 60(0.806) | 60(0.814) | 60(0.806) | 60(0.811) | 60(0.812) |
| | P_{2} | 60(0.812) | 60(0.810) | 60(0.812) | 60(0.801) | 60(0.814) |
| | P_{3} | 60(0.802) | 60(0.804) | 60(0.810) | 60(0.813) | 60(0.814) |
| | P_{4} | 60(0.800) | 60(0.798) | 60(0.817) | 60(0.807) | 60(0.812) |
| 0.5 | P_{1} | 60(0.815) | 60(0.817) | 60(0.812) | 60(0.816) | 60(0.816) |
| | P_{2} | 60(0.815) | 60(0.815) | 60(0.824) | 60(0.818) | 60(0.813) |
| | P_{3} | 60(0.807) | 60(0.803) | 60(0.799) | 60(0.804) | 60(0.798) |
| | P_{4} | 60(0.808) | 60(0.806) | 60(0.809) | 60(0.813) | 60(0.798) |


- Aronow WS, Ahn C. Postprandial hypotension in 499 elderly persons in a long-term health care facility. Journal of the American Geriatrics Society. 1994;42(9):930–932.
- Davis CS. Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Statistics in Medicine. 1991;10(12):1959–1980.
- Diggle P, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd ed. Oxford University Press; 2002.
- Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
- Liu H, Wu T. Sample size calculation and power analysis of time-averaged difference. Journal of Modern Applied Statistical Methods. 2005;4(2):434–445.
- Munoz A, Carey V, Schouten JP, Segal M, Rosner B. A parametric family of correlation structures for the analysis of longitudinal data. Biometrics. 1992;48(3):733–742.
- Patel H, Rowe E. Sample size for comparing linear growth curves. Journal of Biopharmaceutical Statistics. 1999;9(2):339–350.
- Zhang S, Ahn C. How many measurements for time-averaged differences in repeated measurement studies? Contemporary Clinical Trials. 2011;32(3):412–417.
