To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births. To explore the sensitivity of an actual trial to several analytic approaches to multiples.
A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The NO CLD trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using non-clustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses.
In the systematic review, most studies did not describe the randomization of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (p<0.01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not.
The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue.
The statistical methods most commonly used to analyze the outcomes of perinatal clinical trials assume that each measured outcome is statistically independent between individuals. In other words, they assume that outcomes are uncorrelated across individuals. However, the outcomes of twins and infants from higher order multiple births are not independent, violating the assumptions of these analytic methods. Siblings from the same gestation share genes, as well as prenatal and postnatal environmental and iatrogenic exposures. Further correlation may be induced in a randomized clinical trial if, as often happens, multiples receive the same intervention, either as a result of randomizing multiples as a cluster or, equivalently, randomizing the mother when treatment is given during pregnancy.
Multiple births are common in the neonatal intensive care unit, particularly among patients born preterm. The prevalence of infants born from multiple gestations among very low birth weight infants at NICHD Neonatal Research Network sites in 2006 ranged from 21–30% [personal communication, MCW]. Among multiples, 87% were twins, 12% were triplets, and 2% were higher-order multiples.
Analytic strategies for clustered data are standard in other medical research situations.42 For instance, clustered analyses are routinely used when the outcomes of both eyes are measured in ophthalmology trials, for repeated longitudinal measures of the same patient, or when subjects have been cluster randomized by site.43–56 In these cases, the vast majority of outcomes are part of a cluster group, whether that group consists of the data from an individual site or an individual patient. Applying statistical tests that assume independence to such clustered data can compromise both the accuracy and the precision of the results. This problem has also been demonstrated in cohorts composed exclusively of twins.44,57 However, unlike studies composed entirely of twins, in most neonatal studies singletons are the majority although multiples may make up a large minority. This makes the clustering pattern in neonatal studies unique, because most of the outcomes measured are not from a cluster group, but many are. Gates et al. also demonstrated that the analytic approach to non-independence may influence the estimates of odds ratios and confidence intervals in such populations.58 Furthermore, multiples may differ from singletons with regard to key prognostic demographic characteristics such as race and socioeconomic status, thereby increasing the importance of appropriately accounting for their correlated status and other covariates in the analysis. Therefore, neonatal populations represent a unique situation for which a standard analytic approach has not been established.
We conducted a systematic review of recent multi-center randomized clinical trials studying preterm populations to assess the prevalence of clustered analytic techniques in the neonatal and perinatal literature. We hypothesized that inappropriately weighting the correlated outcomes of multiples by not using analyses that account for their non-independence could bias the calculated odds ratios and confidence intervals, potentially altering the interpretation of trial results. As a case study to explore the sensitivity of an actual neonatal trial to the analytic approach to multiples, we used the NO CLD trial that cluster-randomized very low birth weight infants by mother to either inhaled nitric oxide or placebo.13,59
We conducted a systematic review to obtain an overview of the way in which multiple births are handled in the neonatal and perinatal literature. Because the incidence of multiple births is higher in preterm populations, we focused the search on premature infants. In addition, we narrowed the search to multi-center randomized clinical trials, both for feasibility and because large multi-center trials require extensive collaboration between trialists and statisticians. The search was conducted on August 29, 2008 using PubMed, with the terms “preterm and multicenter” and the limits “published in the last 5 years, Humans, Randomized Controlled Trial, English, Newborn: birth-1 month.” Articles were included if they were either papers describing the methods of a multicenter trial or reporting the results of the primary outcome of the trial. A primary outcome was identified if the authors directly identified it as such or if that outcome was used to determine sample size. Trials where the primary outcome was measured in the mother but not the infant were excluded. Trials where an outcome could be equally attributed to the mother or the infant, such as breast-feeding success, were included.
The NO CLD trial was a multi-center, blinded, placebo-controlled study of the impact of inhaled nitric oxide on the development of bronchopulmonary dysplasia in 582 very low birth weight premature infants.13,59 The primary outcome was survival without bronchopulmonary dysplasia.60,61 The study enrolled singletons and infants born from multiple gestations. Some infants born from a multiple gestation had no siblings enrolled in the trial because those siblings did not meet eligibility criteria, were deceased, or did not have parental consent.
Because parents had expressed this preference in previous trials, siblings from the same gestation were randomized to the same treatment. Given the cluster-randomized design, analyses accounting for clustering were planned a priori, and sample size calculations were based on clustered analyses. Generalized estimating equations were proposed as an analysis method in the original protocol, as well as random selection of a sibling from each gestation. Generalized estimating equations are an extension of generalized linear models and a form of regression that accounts for variance in clustered data.(10, 14) Multiple outputation, or within-cluster resampling, repeatedly draws data sets of independent observations from the original data set by sampling one infant per cluster, analyzes each one, and combines the results across resamples.48,62 This method was published during enrollment for the NO CLD study and was recommended by the statistician on the data safety monitoring committee; it decreases the potential for random error in the simple random selection approach.
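The within-cluster resampling idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS code used for the trial: the data layout (a list of clusters, each a list of `(treated, success)` pairs), the function names, the toy data, and the use of a 2×2-table log odds ratio with a 0.5 continuity correction are all assumptions made for the example. The pooled variance is taken as the mean within-resample variance minus the between-resample variance of the estimates, which is the combination rule usually given for multiple outputation.

```python
import math
import random
import statistics


def log_odds_ratio(rows):
    """Return (log OR, Woolf variance) for rows of (treated, success) pairs.

    0.5 is added to every cell (an assumption of this sketch) so that an
    empty cell in a small resample does not cause a division by zero.
    """
    a = b = c = d = 0.5
    for treated, success in rows:
        if treated and success:
            a += 1
        elif treated:
            b += 1
        elif success:
            c += 1
        else:
            d += 1
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def multiple_outputation(clusters, n_resamples=1000, seed=0):
    """Pool log odds ratios over resamples that keep one infant per cluster."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(n_resamples):
        # One randomly chosen infant per gestation gives independent observations.
        sample = [rng.choice(members) for members in clusters]
        est, var = log_odds_ratio(sample)
        estimates.append(est)
        variances.append(var)
    pooled = statistics.fmean(estimates)
    # Mean within-resample variance minus between-resample variance.
    # In small samples this difference can be negative; no guard is applied here.
    pooled_var = statistics.fmean(variances) - statistics.variance(estimates)
    return pooled, pooled_var


# Hypothetical toy data: singletons are clusters of size one.
toy_clusters = (
    [[(True, True)]] * 30 + [[(True, False)]] * 20
    + [[(False, True)]] * 20 + [[(False, False)]] * 30
    + [[(True, True), (True, True)], [(False, False), (False, True)]]
)
log_or, var_log_or = multiple_outputation(toy_clusters, n_resamples=200)
```

When every cluster is a singleton, each resample reproduces the full data set, so the pooled estimate and variance reduce to the ordinary unclustered ones; the resampling only matters for gestations that contribute more than one infant.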
Simple counts and percentages were used to describe study outcomes within sibling groups. Chi-squared test and ANOVA were used to assess demographic differences between singletons and multiples.
We compared several methods for analyzing these clustered data. We first calculated odds ratios and confidence intervals for the primary outcome of the NO CLD study, survival without bronchopulmonary dysplasia, by logistic regression without adjustment for clustering. Then, we used two logistic regression techniques that account for clustered data: generalized estimating equations and multiple outputation.48,51,62 Finally, we calculated a logistic regression based only on singletons and the first-enrolled infant from each multiple gestation. SAS version 9.1 was used for the analyses.
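To illustrate how accounting for clustering changes the standard error, the following Python sketch compares the model-based variance of the log odds ratio from ordinary logistic regression with a cluster-robust (sandwich) variance that groups infants by mother. This corresponds to generalized estimating equations with an independence working correlation and no small-sample correction, which is an assumption of the sketch and not necessarily the specification used in the NO CLD analysis. The function names and the data are hypothetical.

```python
import math
from collections import defaultdict


def log_or_with_variances(rows):
    """rows: (cluster_id, treated, success), with treated and success in {0, 1}.

    Returns (log OR, model-based variance, cluster-robust variance).
    Assumes both arms contain successes and failures.
    """
    n1 = sum(t for _, t, _ in rows)
    n0 = len(rows) - n1
    p1 = sum(y for _, t, y in rows if t) / n1
    p0 = sum(y for _, t, y in rows if not t) / n0
    # With a single binary covariate the logistic MLE has a closed form.
    log_or = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    w1 = n1 * p1 * (1 - p1)  # information weight, treated arm
    w0 = n0 * p0 * (1 - p0)  # information weight, control arm
    model_var = 1 / w0 + 1 / w1
    # Sum residuals within each cluster before squaring (the sandwich "meat").
    resid_all = defaultdict(float)      # sum of (y - p) over the cluster
    resid_treated = defaultdict(float)  # same sum, treated infants only
    for cid, t, y in rows:
        r = y - (p1 if t else p0)
        resid_all[cid] += r
        if t:
            resid_treated[cid] += r
    robust_var = sum(
        (-resid_all[cid] / w0 + (1 / w0 + 1 / w1) * resid_treated[cid]) ** 2
        for cid in resid_all
    )
    return log_or, model_var, robust_var


def make(prefix, treated, success, count, size=1):
    """Build `count` clusters of `size` infants sharing arm and outcome."""
    return [(f"{prefix}{k}", treated, success) for k in range(count) for _ in range(size)]


# Hypothetical data: 40 singletons plus 10 twin pairs with concordant outcomes.
rows = (
    make("a", 1, 1, 12) + make("b", 1, 0, 8) + make("c", 0, 1, 8) + make("d", 0, 0, 12)
    + make("e", 1, 1, 3, size=2) + make("f", 1, 0, 2, size=2)
    + make("g", 0, 1, 2, size=2) + make("h", 0, 0, 3, size=2)
)
log_or, model_var, robust_var = log_or_with_variances(rows)
```

In this toy data set the twins' outcomes are fully concordant, so the cluster-robust variance exceeds the model-based one. With an independence working correlation the point estimate is unchanged; other working correlations, such as exchangeable, can also shift the odds ratio itself.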
The search parameters yielded 79 papers, of which 41 met inclusion criteria. Only four (9%) of these (including the NO CLD trial) used statistics that accounted for the non-independence of multiples.8,13,16,20 Collins et al. used logistic regression with robust variance estimates clustering on the mother.16 Morley et al. presented logistic regression without clustering for the main analysis and stated that when the analysis was repeated with generalized estimating equations to account for multiple birth the “results were substantially unaffected.”20 Marret et al. used generalized estimating equations to account for the non-independence of the outcomes of twins; they also stratified randomization by multiple/singleton status and adjusted for this in the analysis, presumably to account for an unequal distribution of prognostic variables between multiples and singletons.8 Eighteen papers did not specify whether multiples were included and four papers excluded all multiples or second-born infants from the study (figure 1). Of the six studies that described how enrolled multiples were randomized, five studies cluster-randomized the infants to the same intervention and one randomized the siblings separately. Among the postnatal intervention trials reporting the frequency of multiples in the study population, the prevalence ranged from 14–36%, although none of the studies specified how many of those subjects born from a multiple gestation had a sibling enrolled in the trial.
Of the 582 infants enrolled in the NO CLD trial, 157 (23%) were twins or triplets. However, because some of these siblings were deceased or did not meet eligibility criteria, only 84 (14.4%) had a sibling enrolled in the trial. Among those with a sibling from the same gestation in the trial, there were 36 pairs of siblings (twins or two siblings from a higher-order multiple gestation) and 4 sets of three siblings (triplets or three of four quadruplets). In 26 of the sibling pairs (72%), both siblings had the same primary outcome (survival without bronchopulmonary dysplasia versus bronchopulmonary dysplasia or death). In all four sets of three siblings (100%), all the siblings shared the same outcome. In addition, statistically significant demographic differences were seen among singletons, multiples without a sibling enrolled in the trial, and multiples with a sibling enrolled in the trial (table 1). Multiples were more likely to be white and to be born to married parents and older mothers (p<0.025).
The analysis of the primary outcome of the NO CLD trial showed sensitivity to the analytic approach to clustered data, with calculated odds ratios ranging from 1.36 to 1.52. In addition, statistical significance was inconsistent across analysis methods. Using simple logistic regression without adjusting for clustering resulted in a non-significant result while adjusting for clustering with either generalized estimating equations or multiple outputation resulted in statistical significance (p<0.05). Finally, when the odds ratio for the primary outcome was calculated using generalized estimating equations and multiple outputation, similar statistically significant estimates of the odds ratio and confidence intervals were obtained (table 2).
We have presented a case study of an actual neonatal trial in which the conclusion of the study is sensitive to the analytic approach to multiples, and we have shown a low prevalence of papers accounting for such clustered data in the recent perinatal and neonatal literature. In the NO CLD trial, the outcomes of siblings from the same gestation were highly correlated, as predicted from shared genetic and environmental factors. Furthermore, likely as a result of a non-random societal distribution of in vitro fertilization,63–65 infants from multiple gestations differed from singletons with respect to key demographic variables that may be predictors of important pulmonary and neurodevelopmental outcomes of prematurity. Finally, whereas generalized estimating equations and multiple outputation yielded highly consistent estimates of a statistically significant effect, analysis by logistic regression without accounting for the clustering of outcomes did not reach significance.
The need for statistical approaches that account for clustering of data has been recognized for several decades.45,47–49,52–54,56–58 Several papers in the 1990s showed that many cluster-randomized trials failed to account for clustered data in their analyses, leading to spurious results in more than 50% of such studies.46,66,67 In the perinatal literature, we found that the majority of multi-center trials published between 2003 and 2008 that measured outcomes in preterm infants did not account for multiples in their analyses. Furthermore, many did not address whether multiples were enrolled and, if so, how they were randomized. However, the use of generalized estimating equations in some of the trials reviewed may represent an early trend in the neonatal and perinatal literature.
Shared genetic and environmental influences may cause the outcomes of multiples to be correlated. For instance, genetic effects account for approximately 80% of the observed variance in bronchopulmonary dysplasia susceptibility.68 In addition, study designs that assign siblings to the same treatment may increase this correlation. Although independently randomizing each twin or systematically assigning them to different treatment arms could be one approach to decrease the correlation in postnatal trials, further work is needed to assess the statistical and ethical implications of this approach, in addition to its palatability to families. In perinatal studies in which interventions are prenatal and the outcomes are measured in infants, this is not a feasible option. The exclusion of multiples from studies may also limit the generalizability of a trial, since multiples may be biologically and socio-demographically different from singletons. Therefore, this limitation should be seriously considered in the decision to exclude multiples from a study, as this approach could be more or less appropriate with different study aims. Furthermore, adjusting for multiple status in addition to clustering on pregnancy may be necessary in observational studies or in randomized trials when randomization has not successfully balanced the distribution of multiple status and associated prognostic covariates between the treatment groups. In situations where the outcomes of multiples and singletons are different, particularly if multiple status is not a covariate in the model, multiple outputation may be more robust than generalized estimating equations. Both multiple outputation and generalized estimating equations allow for adjusted analyses.
In mathematical simulations of a neonatal randomized clinical trial including twins, Shaffer et al. found only minimal differences between the results of logistic regression using generalized estimating equations and logistic regression without adjusting for correlated outcomes when the twins received the same therapy.69 Gates et al., using one real and two simulated perinatal datasets, showed that confidence intervals are often wider with methods that account for the non-independence of multiples; using simulated datasets they showed that there is potential for inconsistency of point estimates of effect size using different methods. The NO CLD case study presents the example of an actual perinatal trial in which the non-independence of siblings’ outcomes caused both the overall estimate of the odds ratio and the confidence interval to be sensitive to the analytic approach, even with a lower percentage of twins than in the Gates analyses. Although the net differences were small, and some might argue the estimates were not meaningfully different, they would likely have been sufficient to alter many readers’ interpretation of the trial results, since there was a difference in statistical significance. An analytic strategy that accounts for clustering will tend to generate the most conservative estimates of the confidence intervals and the most statistically valid point estimates of effect size.
Although logistic regression excluding all but the first-enrolled multiple was presented (table 2) to demonstrate the sensitivity of the calculated results to the handling of multiples, it is not a recommended approach because there are ethical concerns about excluding the data from enrolled subjects and there is potential for systematic bias. For instance, in the NO CLD trial, infants enrolled earlier had a greater benefit from the study drug.13 Therefore, systematically excluding data from infants enrolled later than their siblings could inflate the estimated odds ratio for benefit from therapy and does not make use of all of the gathered data. Potential bias exists both from excluding multiples from an analysis and from eligibility in the trial itself. An “any occurrence” strategy, in which an outcome of the pregnancy is considered to have occurred if any of the multiples experiences the outcome, also fails to make use of all the information gathered in the study.58 Randomly selecting one sibling from each pregnancy to contribute to the dataset also discards data and is prone to random error. Multiple outputation with a large number of repetitions is essentially an extension of this method that greatly decreases this possibility. However, multiple outputation is more computationally intensive and may be unfamiliar to readers. Generalized estimating equations may be more familiar to readers, but may also fail to converge (i.e. be statistically unstable) in situations with a low percentage of multiples. One reasonable strategy would be an a priori plan to use generalized estimating equations and, if they failed to converge, to use multiple outputation.48 The Donner and Klar cluster trials method used by Gates et al. is also an option if adjustment for covariates is not necessary.58,70 These approaches may also be expanded, for instance to allow for Bayesian modeling while accounting for the clustering. In addition, in some situations, multi-level clustering, such as by pregnancy and by center, may be appropriate.
While the case of the NO CLD trial demonstrates that the results of perinatal trials may be sensitive to the analytic approach to multiples, we have not determined the exact parameters under which clustered analyses are necessary; in general, the higher the percentage of multiples or the degree of intra-cluster correlation, the less valid the results of an analysis that assumes independence of outcomes will be. It is unclear what percentage of perinatal trials would have meaningfully different results with different statistical analyses. Therapies with small effect sizes, borderline significance, and few studies are most suspect, whereas therapies shown to have a large significant effect in many populations are of the least concern. Finally, although our search strategy for the systematic review of recent multi-center randomized clinical trials could have missed some existing trials, it is clear that the majority of recent multi-center trials have not accounted for the non-independence of multiples in their analyses. While we have focused on large randomized trials, similar analytical issues exist for other study designs, including cohort studies.57
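The dependence on the share of multiples and on the intra-cluster correlation can be made concrete with the usual variance inflation factor for an unweighted mean under an exchangeable correlation, 1 + ρ(Σm²/Σm − 1), where m is the number of enrolled infants per gestation. The Python sketch below applies it to the NO CLD cluster sizes reported in the results (498 infants without an enrolled sibling, 36 pairs, and 4 sets of three). The values of ρ are hypothetical, the function name is an assumption, and the factor is only a rough guide for an odds ratio from a trial, not a substitute for a clustered analysis.

```python
def design_effect(cluster_sizes, rho):
    """Variance inflation for an unweighted mean under exchangeable correlation rho."""
    n = sum(cluster_sizes)
    return 1 + rho * (sum(m * m for m in cluster_sizes) / n - 1)


# Enrolled infants per gestation in NO CLD: 582 - 84 = 498 singles, 36 pairs, 4 triples.
no_cld_sizes = [1] * 498 + [2] * 36 + [3] * 4

# Hypothetical intra-cluster correlations, chosen only for illustration.
for rho in (0.0, 0.25, 0.5, 0.75):
    print(f"rho={rho:.2f}  design effect={design_effect(no_cld_sizes, rho):.3f}")
```

Because most clusters have size one, the factor stays close to 1 even for a high ρ, which is consistent with the small but interpretation-changing differences seen between the analyses.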
In conclusion, while statistical approaches accounting for clustered data have become standard,42 particularly in reports of trials in adults where the majority of outcomes are part of a cluster group, the results of this systematic review indicate that only a small minority of perinatal and neonatal trials measuring outcomes in premature infants account for the non-independence of multiples. The example of the NO CLD study demonstrates that the results of such studies, in which twins and triplets make up a significant minority of the study population, can be sensitive to the analytic approach to clustered data. The degree of impact of clustering will depend on the number of multiple births in the sample and the degree to which the outcomes of multiples are correlated. This issue needs to be discussed a priori during trial design. Furthermore, trial reports should describe how multiples were enrolled and randomized. An analytic approach that accounts for the non-independence of siblings should be the default option, as clinical researchers have a responsibility to both research subjects and future patients to present the most valid results possible.
We would like to thank Richard Martin, David Durand, and Phillip Ballard for their professional input.
Supported by National Institutes of Health grant K23-HD056299. The NO CLD trial has been used as a case study. Disclosures with regards to that study are as follows: Supported by grants from the National Institutes of Health (U01-HL62514, P50-HL56401, P30-HD26979, P30-MRDDRC, and P30-HD26979) and the General Clinical Research Centers Program (M01-RR00240, M01-RR00084, M01-RR00425, M01-RR001271, M01-RR00064, and M01-RR00080). IKARIA (formerly INO Therapeutics) provided study gas and masked delivery systems for the primary NO CLD trial. M.C.W. and R.A.B. received support from IKARIA to fund completion of 24-month follow-up and data analysis. A.M.H. received reimbursement for travel to investigators meetings as part of these grants. A trainee in the division of Neonatology, Rainbow Babies & Children’s Hospital received grant support from IKARIA. IKARIA did not play any role in the design, analysis, interpretation, or reporting of the study.