We have presented a case study of an actual neonatal trial in which the conclusion is sensitive to the analytic approach to multiples, and we have shown that few papers in the recent perinatal and neonatal literature account for such clustered data. In the NO CLD trial, the outcomes of siblings from the same gestation were highly correlated, as predicted from shared genetic and environmental factors. Furthermore, likely as a result of a non-random societal distribution of in vitro fertilization, infants from multiple gestations differed from singletons with respect to key demographic variables that may predict important pulmonary and neurodevelopmental outcomes of prematurity. Finally, whereas generalized estimating equations and multiple outputation yielded highly consistent estimates of a statistically significant effect, logistic regression that did not account for the clustering of outcomes did not reach significance.
The need for statistical approaches that account for clustering of data has been recognized for several decades.45,47–49,52–54,56–58
Several papers in the 1990s showed that many cluster-randomized trials failed to account for clustered data in their analyses, leading to spurious results in more than 50% of such studies.46,66,67
In the perinatal literature published between 2003 and 2008, we found that the majority of multi-center trials measuring outcomes in preterm infants did not account for multiples in their analyses. Furthermore, many did not report whether multiples were enrolled and, if so, how they were randomized. However, the use of generalized estimating equations in some of the trials reviewed may represent an early trend in the neonatal and perinatal literature.
Shared genetic and environmental influences certainly may cause the outcomes of multiples to be correlated. For instance, genetic effects account for approximately 80% of the observed variance in bronchopulmonary dysplasia susceptibility.68
In addition, study designs that assign siblings to the same treatment may increase this correlation. Although independently randomizing each twin, or systematically assigning twins to different treatment arms, could decrease the correlation in postnatal trials, further work is needed to assess the statistical and ethical implications of this approach, as well as its acceptability to families. In perinatal studies in which interventions are prenatal and the outcomes are measured in infants, this is not a feasible option. Excluding multiples from studies may also limit the generalizability of a trial, since multiples may be biologically and socio-demographically different from singletons. This limitation should therefore be weighed seriously in any decision to exclude multiples, as exclusion could be more or less appropriate depending on the study aims. Furthermore, adjusting for multiple status in addition to clustering on pregnancy may be necessary in observational studies, or in randomized trials when randomization has not balanced the distribution of multiple status and associated prognostic covariates between the treatment groups. In situations where the outcomes of multiples and singletons are different, particularly if multiple status is not a covariate in the model, multiple outputation may be more robust than generalized estimating equations (GEE). Both multiple outputation and GEE allow for adjusted analyses.
In mathematical simulations of a neonatal randomized clinical trial including twins, Shaffer et al. found only minimal differences between the results of logistic regression using generalized estimating equations and logistic regression without adjusting for correlated outcomes when the twins received the same therapy.69
Gates et al., using one real and two simulated perinatal datasets, showed that confidence intervals are often wider with methods that account for the non-independence of multiples; using the simulated datasets, they also showed that point estimates of effect size can be inconsistent across methods. The NO CLD case study presents an actual perinatal trial in which the non-independence of siblings’ outcomes caused both the estimated odds ratio and its confidence interval to be sensitive to the analytic approach, even with a lower percentage of twins than in the Gates analyses. Although the net differences were small, and some might argue the estimates were not meaningfully different, they would likely have been sufficient to alter many readers’ interpretation of the trial results, since there was a difference in statistical significance. An analytic strategy that accounts for clustering will tend to generate the most conservative estimates of the confidence intervals and the most statistically valid point estimates of effect size.
Although logistic regression excluding all but the first-enrolled multiple was presented to demonstrate the sensitivity of the calculated results to the handling of multiples, it is not a recommended approach because there are ethical concerns about excluding the data from enrolled subjects and there is potential for systematic bias. For instance, in the NO CLD trial, infants enrolled earlier had a greater benefit from the study drug.13
Therefore, systematically excluding data from infants enrolled later than their siblings could inflate the estimated odds ratio for benefit from therapy, and it does not make use of all of the gathered data. Potential bias arises both from excluding multiples from an analysis and from excluding them from eligibility for the trial itself. An “any occurrence” strategy, in which an outcome of the pregnancy is considered to have occurred if any of the multiples experiences the outcome, also fails to make use of all the information gathered in the study.58
Randomly selecting one sibling from each pregnancy to contribute to the dataset similarly discards data and is prone to random error. Multiple outputation with a large number of repetitions is essentially an extension of this method that greatly reduces that error. However, multiple outputation is more computationally intensive and may be unfamiliar to readers. Generalized estimating equations may be more familiar to readers, but may fail to converge (i.e., be statistically unstable) in situations with a low percentage of multiples. One reasonable strategy would be an a priori plan to use generalized estimating equations and, if they failed to converge, to use multiple outputation.48
The Donner and Klar cluster trials method used by Gates et al. is also an option if adjustment for covariates is not necessary.58,70
These approaches may also be expanded, for instance to allow for Bayesian modeling while accounting for the clustering. In addition, in some situations, multi-level clustering, such as by pregnancy and by center, may be appropriate.
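The multiple outputation procedure discussed above can be illustrated with a minimal sketch. This is not the analysis code used for the NO CLD trial. It assumes an unadjusted log odds ratio as the estimate of interest, a 0.5 continuity correction in each cell of the 2x2 table, and the usual multiple outputation variance (the mean within-resample variance minus the variance of the estimates across resamples). The function names and the record layout `(pregnancy_id, treated, outcome)` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict
from statistics import mean, variance


def log_odds_ratio(rows):
    """Log odds ratio and Woolf variance from (treated, outcome) pairs.

    Adds 0.5 to every cell (an assumption of this sketch) so that a
    resample with an empty cell still gives a finite estimate.
    """
    a = b = c = d = 0.5
    for treated, outcome in rows:
        if treated and outcome:
            a += 1
        elif treated:
            b += 1
        elif outcome:
            c += 1
        else:
            d += 1
    estimate = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, var


def multiple_outputation(records, n_reps=1000, seed=0):
    """records: iterable of (pregnancy_id, treated, outcome) tuples."""
    rng = random.Random(seed)
    by_pregnancy = defaultdict(list)
    for pregnancy_id, treated, outcome in records:
        by_pregnancy[pregnancy_id].append((treated, outcome))
    clusters = list(by_pregnancy.values())

    estimates, variances = [], []
    for _ in range(n_reps):
        # Keep one randomly chosen infant per pregnancy, so each resampled
        # dataset contains only independent observations.
        sample = [rng.choice(infants) for infants in clusters]
        est, var = log_odds_ratio(sample)
        estimates.append(est)
        variances.append(var)

    pooled = mean(estimates)
    # Mean within-resample variance minus between-resample variance.
    pooled_var = mean(variances) - variance(estimates)
    return pooled, pooled_var
```

With singletons only, every resample is identical and the result reduces to the ordinary unadjusted analysis. With a small number of repetitions the pooled variance can be unstable (and, in principle, negative), which is one reason a large number of repetitions is used.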
While the case of the NO CLD trial demonstrates that the results of perinatal trials may be sensitive to the analytic approach to multiples, we have not determined the exact parameters under which clustered analyses are necessary; in general, the higher the percentage of multiples or the degree of intra-cluster correlation, the less valid the results of an analysis that assumes independence of outcomes will be. It is unclear what percentage of perinatal trials would have meaningfully different results with different statistical analyses. Therapies with small effect sizes, borderline significance, and few studies are most suspect, whereas therapies shown to have a large significant effect in many populations are of the least concern. Finally, although our search strategy for the systematic review of recent multi-center randomized clinical trials could have missed some existing trials, it is clear that the majority of recent multi-center trials have not accounted for the non-independence of multiples in their analyses. While we have focused on large randomized trials, similar analytical issues exist for other study designs, including cohort studies.57
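As a rough illustration of how the percentage of multiples and the intra-cluster correlation interact, the sketch below uses the standard design-effect approximation from the cluster-trial literature, 1 + (m* - 1) x ICC, where m* is the size-weighted mean cluster size. This approximation is an assumption of the sketch, not a result from the NO CLD trial. It applies most directly when siblings receive the same treatment, and the trial composition shown is hypothetical.

```python
def design_effect(cluster_sizes, icc):
    """Approximate variance inflation from clustering.

    cluster_sizes: number of enrolled infants in each pregnancy.
    icc: assumed intra-cluster (within-pregnancy) correlation.
    """
    total = sum(cluster_sizes)
    # Size-weighted mean cluster size: the size of the pregnancy that a
    # randomly chosen infant belongs to.
    weighted_mean_size = sum(m * m for m in cluster_sizes) / total
    return 1 + (weighted_mean_size - 1) * icc


def effective_sample_size(cluster_sizes, icc):
    """Number of independent infants the clustered sample is 'worth'."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)


# Hypothetical trial: 800 singletons and 100 twin pairs,
# so 20% of the 1000 enrolled infants are twins.
sizes = [1] * 800 + [2] * 100
for icc in (0.0, 0.3, 0.6, 0.9):
    print(icc, round(design_effect(sizes, icc), 3),
          round(effective_sample_size(sizes, icc), 1))
```

Under these assumptions, an analysis that treats all infants as independent overstates the available information by the design-effect factor. The inflation grows with both the share of multiples and the correlation of their outcomes, and it vanishes when either is zero.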
In conclusion, while statistical approaches accounting for clustered data have become standard,42 particularly in reports of trials in adults where the majority of outcomes are part of a cluster group, the results of this systematic review indicate that only a small minority of perinatal and neonatal trials measuring outcomes in premature infants account for the non-independence of multiples. The example of the NO CLD study demonstrates that the results of such studies, in which twins and triplets make up a significant minority of the study population, can be sensitive to the analytic approach to clustered data. The degree of impact of clustering will depend on the number of multiple births in the sample and the degree to which the outcomes of multiples are correlated. This issue needs to be discussed a priori during trial design. Furthermore, trial reports should describe how multiples were enrolled and randomized. An analytic approach that accounts for the non-independence of siblings should be the default option, as clinical researchers have a responsibility to both research subjects and future patients to present the most valid results possible.