Longitudinal studies, in which the same individuals are repeatedly measured over time, have become routine in psychiatric research. In fact, it is difficult to imagine a randomized clinical trial of a new psychiatric intervention that is not longitudinal in nature. For example, all recent trials of antidepressant medications submitted in support of new drug applications (NDAs) to the U.S. Food and Drug Administration (FDA) involve longitudinal randomized clinical trials (RCTs). However, longitudinal designs are not limited to RCTs and are frequently used in observational studies to investigate associations between treatment and outcomes (eg, the relationship between antidepressants and suicide in U.S. Veterans).1 We show that the use of repeated measures leads to substantial gains in statistical power relative to studies with a single measurement occasion or a simple pre- versus post-treatment comparison. Longitudinal designs are also common in cluster-randomized trials. For example, an intervention is randomly assigned to all children within a family or within a classroom, and the members of the family or classroom are repeatedly evaluated over the course of the study. Although statistical methods for the analysis of longitudinal data with clustering of subjects are now routinely applied,2 the design of such studies often suffers from poorly specified and inadequate sample sizes, because sample size is determined using methods based on a single outcome or methods for longitudinal studies that ignore the clustering. The determination of sample sizes when subjects are both repeatedly measured over time and clustered within research centers (eg, multi-center RCTs) can be erroneous unless both factors are taken into account.
This article provides a method to determine the number of centers, the number of subjects within centers, and the number of observation points that are required to produce a pre-specified level of statistical power (eg, 80%) for a given Type I error rate (eg, 5%, corresponding to a 95% confidence level). We demonstrate that in multi-center trials, the sample size required to adequately power a study to detect a clinically meaningful difference can vary dramatically depending on whether randomization is at the level of the individual subject or at the level of the cluster (eg, classroom, clinic, hospital). Finally, in longitudinal studies, one must also be concerned with both the rate and timing of attrition, which can play a major role in determining the number of subjects and/or centers needed to adequately power a research study.
In this article, we use recent statistical results in this area3 to provide guidance on sample size determination for longitudinal studies in psychiatric research. We restrict our discussion to continuous and normally distributed outcomes for ease of exposition. Work extending these results to categorical outcomes (eg, remission of depression) and non-normally distributed count outcomes (eg, number of mental health service visits) is underway.
An important contribution of the recent work of Roy et al3 was to highlight how subject-level randomization and cluster-level randomization differ in their effect on sample size determination. When research subjects are clustered within research centers, clinics, hospitals, families, classrooms, or schools, it is often not possible to randomize individual subjects to treatment and control conditions because the intervention may be applied at the level of the cluster.4 For example, an intervention applied at the level of a classroom does not permit randomization of subjects to treatment and control conditions within a classroom. There are many cases where all subjects within a cluster are exposed to the intervention, thereby precluding randomization to a control condition. In these cases, randomization is performed at the cluster level (eg, school) and the effects of cluster randomization on sample size requirements can be profound. Clustering reduces power in two ways. First, intra-class correlation reduces the effective sample size to only a fraction of the entire sample size. Second, statistical power for intervention trials delivered at the level of the group depends much more strongly on the number of groups than on the number of subjects. Because most cluster-randomized trials can only afford a limited number of groups, statistical power can be low even with large numbers of subjects in each group. In longitudinal studies, the statistical power can increase substantially with the number of repeated measures. When a study has both clustering and longitudinal data, the statistical power is a complex function of these characteristics. These effects are more fully explored in the following sections.
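To make the first of these two effects concrete, the following is a minimal sketch that uses the standard design-effect approximation for equal-sized clusters, 1 + (m - 1)ρ, where m is the cluster size and ρ is the intra-class correlation. It is not the method of Roy et al: it assumes a single cross-sectional outcome, equal cluster sizes, and a two-arm comparison under a normal approximation, and it ignores repeated measures and attrition. The function names and the numerical values are illustrative assumptions, not quantities taken from this article.

```python
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent subjects carrying the same information."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


def approx_power(n_eff_per_arm, effect_size, alpha=0.05):
    """Normal-approximation power for a two-sided, two-arm comparison.

    effect_size is the standardized mean difference (hypothetical input).
    The negligible probability of rejecting in the wrong tail is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_eff_per_arm / 2) ** 0.5
    return NormalDist().cdf(noncentrality - z_crit)


if __name__ == "__main__":
    # Two hypothetical designs per arm, each with 1000 subjects in total.
    icc = 0.05
    for n_clusters, cluster_size in [(10, 100), (50, 20)]:
        n_eff = effective_sample_size(n_clusters, cluster_size, icc)
        power = approx_power(n_eff, effect_size=0.2)
        print(f"{n_clusters} clusters x {cluster_size} subjects: "
              f"effective n = {n_eff:.0f}, approximate power = {power:.2f}")
```

Under these assumptions, the two designs enroll the same number of subjects per arm, but the design with more and smaller clusters retains a much larger effective sample size, which illustrates why power depends more strongly on the number of groups than on the number of subjects per group.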