Prominent developmental theories posit a causal link between early-life exposures and later functioning. Yet, observed associations with early exposures may not reflect causal effects because of genetic and environmental confounding. The current manuscript describes how a systematic series of epidemiologic analyses that combine several genetically-informative designs and statistical approaches can help distinguish between competing theories. In particular, the manuscript details how combining the use of measured covariates with sibling-comparisons, cousin-comparisons, and additional designs can help elucidate the sources of covariation between early-life exposures and later outcomes, including the roles of (a) factors that are not shared in families, including a potential causal effect of the exposure; (b) carryover effects from the exposure of one child to the next; and (c) familial confounding. We also describe key assumptions, and how they can be critically evaluated. Furthermore, we outline how subsequent analyses, including effect decomposition with respect to measured, plausible mediators, and quantitative genetic models can help further specify the underlying processes that account for the associations between early-life exposures and offspring outcomes.
There is growing interest in the role of preconception and prenatal exposures for later development in numerous disciplines, ranging, for example, from neuroscience (Pechtel & Pizzagalli, 2011) to economics (Heckman, 2012). The Developmental Origins of Health and Disease (DOHaD) hypothesis, a broad hypothesis that encompasses such research, posits that early-life factors can causally impact later functioning when the exposure is experienced during a sensitive developmental period (Barker, 1998). Sensitive periods are characterized by increased plasticity corresponding to changing properties (e.g., in neural circuitry: Ganzel & Morris, 2011; Knudsen, 2004; Zeanah, Gunnar, McCall, Kreppner, & Fox, 2011), with the prenatal and early-life period being particularly important because of its unrivaled rate of neuronal and physical development (Rice & Barone, 2000; Zeanah et al., 2011).
Yet, as many researchers (e.g., Thapar & Rutter, 2009) and prominent scientific committees (e.g., Academy of Medical Sciences Working Group, 2007) have noted, genetic and environmental factors that influence both the exposures and outcomes could account for the observed associations, rather than the exposures having a causal influence. For instance, quantitative genetic studies have found that correlations between environments and individuals’ genetic risk are pervasive (e.g., Jaffee & Price, 2012; Kendler & Baker, 2007; Plomin & Bergeman, 1991), which suggests that genetic factors may account for a large fraction of the observed association between putatively environmental early-life exposures and outcomes. Genetically-informative studies can help test competing hypotheses, which is the focus of the articles in this special issue of Behavior Genetics. Several articles and books also review the strengths and limitations of many different genetically-informative designs (e.g., D’Onofrio, Lahey, Turkheimer, & Lichtenstein, 2013; Knopik, 2009; Lawlor & Mishra, 2009; Rutter, Pickles, Murray, & Eaves, 2001), and elsewhere we have briefly reviewed how some of the designs have been specifically used to examine early-life exposures (D’Onofrio, Class, Lahey, & Larsson, 2014).
In the current manuscript we describe how systematic use of multiple genetically- (and environmentally-) informative designs and analytical approaches may help identify the processes through which early-life exposures come to be associated with later outcomes. The overall rationale is that rigorous translational epidemiological approaches can help specify the processes behind observed associations (Gaziano, 2010; Hiatt, 2010; Khoury, Gwinn, & Ioannidis, 2010; Weissman, Brown, & Talati, 2011). We explicitly use the term “translational epidemiology” here to refer to the observational study of health determinants designed to provide critical insights for research along the translational continuum, with basic research informing and being informed by studies of interventions. Because each genetically-informative design has a number of assumptions and limitations that require tradeoffs between internal and external validity (Shadish, Cook, & Campbell, 2002), the use of multiple designs in a systematic series of analyses enables researchers to examine the assumptions and limitations of each approach, which is essential for drawing meaningful and valid conclusions (Rutter et al., 2001). Systematically combining several genetically-informative designs and approaches also provides the opportunity to examine more complex developmental processes and theories. The goal of this manuscript, thus, is to provide a framework for considering how to explore the processes associated with early-life exposures. The review will provide researchers who are not familiar with genetically-informative designs with an overview of these designs and the logic and limitations behind their use, while illustrating the assumptions researchers must make if they rely on traditional cohort studies that compare unrelated individuals who are differentially exposed to an early-life exposure.
The review will further provide researchers who use these designs (including the authors) with a more systematic and integrated framework for conducting research in this area, particularly considering what the use of the different designs can and cannot achieve.
Fundamental for both the rationale and application of the framework are the underlying hypothesized structural relationships between the early-life exposure(s) and later outcome(s). Construction of diagrams to illustrate the relationships using, for example, Directed Acyclic Graphs (DAGs; Greenland, Pearl, & Robins, 1999), helps clarify the causal questions and identify if, how, and when they can be tested. Such diagrams can also assist in identifying potential sources of bias and guide decisions and analyses to help differentiate between alternative hypotheses for an observed association. A causal diagram should include all variables, measured and unmeasured, that are common causes of any pair of variables on the diagram (Pearl, 2000); therefore, its construction relies heavily on subject-matter knowledge and expertise (Robins, 2001). Figure 1 provides a schematic example of how a diagram can be used to illustrate the hypothesized structural relationship between an early-life exposure and later outcome. In this figure, for example, the early-life exposure and later outcome share common causes C1 (measured) and U1 (unmeasured). A common cause creates a “back-door path” between the variables that leads to a spurious association; we say that the association between the two variables suffers from “confounding.” This spurious association can be eliminated by “blocking” the back-door path, that is, by adjusting for (or conditioning on) a variable on that path. Consequently, we define a “confounder” to be any factor that can block a back-door path, not necessarily the common cause itself (e.g., in Figure 1 the confounder C2 blocks the unmeasured confounding from U2 for the association between the exposure and mediator). If correctly specified, the diagram can help identify the set of such factors (confounders) required to block all back-door paths due to common causes of the exposure and outcome (i.e., control for all sources of confounding).
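The logic of blocking a back-door path can be illustrated with a minimal simulation sketch. It assumes a linear model with a single measured common cause (labeled `c1` after C1 in Figure 1), arbitrary illustrative path coefficients of 0.8, and a true causal effect of zero; none of these values come from the manuscript. Under these assumptions the crude slope is expected to be about 0.64/1.64 ≈ 0.39, whereas the slope adjusted for `c1` should be close to zero.

```python
import numpy as np

def ols_slopes(y, *columns):
    """OLS coefficients of y on the given columns (intercept fitted, not returned)."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating model: C1 causes both exposure and outcome,
# and the exposure has NO causal effect on the outcome.
c1 = rng.normal(size=n)
exposure = 0.8 * c1 + rng.normal(size=n)
outcome = 0.8 * c1 + rng.normal(size=n)

crude = ols_slopes(outcome, exposure)[0]         # back-door path via C1 is open
adjusted = ols_slopes(outcome, exposure, c1)[0]  # back-door path via C1 is blocked

print(f"crude slope:    {crude:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

If the common cause were unmeasured (U1 rather than C1), the second regression could not be fit, which is the situation the family-based designs are intended to address.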
Diagrams are also very useful when exploring direct and indirect pathways through factors in the causal pathway between an exposure and outcome, so-called “mediators.” For example, diagrams can help identify whether the mediator is an effect of more than one cause, which is the case when the mediator is an effect of the exposure and the mediator and outcome also share common causes (U3 in Figure 1). This is important because conditioning on a common effect will open the path between its causes (here the exposure and U3). Researchers must carefully consider this because adjusting for a mediator that is also a common effect, or “collider,” will introduce a spurious (non-causal) association between the exposure and the outcome.
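The collider problem described above can be sketched in a small simulation. The model is hypothetical: the exposure affects the outcome only through the mediator, an unmeasured factor (`u3`, after U3 in Figure 1) causes both mediator and outcome, and all coefficients are arbitrary. Under these assumptions the exposure has no direct effect, yet adjusting for the mediator alone is expected to produce a spurious negative "direct" coefficient.

```python
import numpy as np

def ols_slopes(y, *columns):
    """OLS coefficients of y on the given columns (intercept fitted, not returned)."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(1)
n = 200_000

exposure = rng.normal(size=n)
u3 = rng.normal(size=n)  # unmeasured common cause of mediator and outcome
mediator = 0.5 * exposure + 0.7 * u3 + rng.normal(size=n)
outcome = 0.5 * mediator + 0.7 * u3 + rng.normal(size=n)  # no direct exposure effect

total = ols_slopes(outcome, exposure)[0]                      # total effect, 0.5 * 0.5
naive_direct = ols_slopes(outcome, exposure, mediator)[0]     # conditions on a collider
direct_given_u3 = ols_slopes(outcome, exposure, mediator, u3)[0]

print(f"total effect:                   {total:.3f}")
print(f"'direct' effect, adjusting M:   {naive_direct:.3f}")
print(f"direct effect, adjusting M, U3: {direct_given_u3:.3f}")
```

Only the last regression, which is unavailable in practice when U3 is unmeasured, recovers the (null) direct effect.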
In this manuscript we focus on early-life risk factors that can vary among siblings because individual-specific risk factors account for variability in many domains of human health and development (Plomin & Daniels, 1987; Turkheimer & Waldron, 2000). We also focus on genetically-informative designs that do not require access to datasets that rely on relatively rare kinship pairs, such as twin, adoption, or in vitro fertilization studies, although using such designs can certainly provide critically important insights (see below). We want to stress that no one research team has access to all of the designs and measures needed to fully understand the processes through which early-life exposures influence later outcomes. Thus, the systematic plan of analyses we propose is designed to help guide translational research so that findings from epidemiologic studies can help inform subsequent research on early-life exposures (e.g., additional epidemiologic studies, research focused on proximal mediating factors, etc.). We propose three main aims that are sequential (See Figure 2). The first aim is to examine the robustness of the exposure-outcome association when controlling for measured covariates. The second aim is to examine the robustness of the exposure-outcome association when using multiple genetically-informative designs along with measured covariates. The third aim is to examine specific mediators of the exposure-outcome association using effect decomposition and/or the sources of confounding using quantitative genetic modeling.
Provided a causal diagram of the structural relationships for an early-life exposure and outcome has been correctly specified (e.g., Figure 1), the diagram determines the set of factors required to block all back-door paths due to common causes of the exposure and outcome (i.e., control for all confounding). Careful consideration of the hypothesized structural relationship should, thus, assist in identifying which measured covariates to account for in Aim 1 and, most importantly, whether including those covariates will be sufficient to exclude all confounding influences. For example, observational studies of early-life exposures frequently include several covariates, such as birth year, parity/birth order, and maternal age at childbearing.
By identifying and controlling for measured confounders (labeled C1 in Figure 1), Aim 1 will provide less biased estimates of the causal exposure effect. If the observed association is completely eliminated when controlling for measured covariates, researchers frequently make inferences that confounding factors, rather than a causal influence, explain the observed association. We believe such inferences should be provisional until the actual processes responsible for the observed association are identified because further specification of the precise confounding factors could shed great light on the etiology of the outcome. For instance, knowing that measured covariates attenuated an association does not identify whether the confounding was due to environmental or genetic processes related to the measured covariates. We, therefore, argue that additional designs are needed to help identify the underlying causal processes.
If, in contrast, the association persists when controlling for measured covariates, additional research designs are critically important to explore plausible alternative explanations, particularly the role of unmeasured confounding (See Figure 1). A comprehensive consideration of plausible confounders should provide researchers with a thorough set of factors that need to be considered, and the degree to which a set of measured covariates helps rule out these alternative paths certainly depends on the specific research study. We can never verify whether any particular study appropriately measured and controlled for every salient confounder to rule out influence from unmeasured common causes (U1 in Figure 1); for instance, it is impossible to know whether a study has accounted for all genetic factors that could confound the exposure-outcome association. Hence, we believe that the results and inferences stemming from the analyses in Aim 1 need to be followed up by designs that can further examine unmeasured confounding.
Thus, the approach for addressing the plausible role of confounding factors by statistically controlling for measured covariates can help improve our understanding of the origins of the exposures and the processes through which the exposures come to be associated with subsequent outcomes (e.g., Rutter, 2000), but additional designs are needed.
In general, design features in genetically-informative approaches enable researchers to investigate unmeasured genetic and environmental common causes (part of U1 in Figure 1) that are shared by family members and explore developmental processes, such as carryover effects, that could account for associations between early-life exposures and later outcomes.
The initial genetically-informative approach that we propose is the sibling-comparison design. Numerous epidemiologic studies include several offspring from the same nuclear family. For example, several large-scale epidemiologic studies funded by the government in the United States, such as the National Collaborative Perinatal Project (Light, 1973) and the Children of the National Longitudinal Survey of Youth (Baker & Mott, 1989), assessed multiple offspring of the same mothers and include information on early-life exposures. National registries in Scandinavian countries also enable researchers to examine early risks using siblings (Byrne, Regan, & Howard, 2005; Miettunen, Suvisaari, Haukka, & Isohanni, 2011). Furthermore, researchers have specifically designed studies to leverage the advantages inherent in comparing siblings (Knopik et al., 2015; Neiderhiser, Reiss, & Hetherington, 2007). We encourage researchers to examine the possibility of using the design when conducting secondary data analysis and when designing new data collection focused on early-life exposures.
Our research team (D’Onofrio et al., 2013; Lahey & D’Onofrio, 2010) and many others (e.g., Donovan & Susser, 2011; Knopik, 2009; Lawlor & Mishra, 2009; Rutter, 2007) have written extensively about the logic of the sibling-comparison design. In brief, the design accounts for all genetic and environmental factors that make siblings similar because researchers use the unexposed siblings of exposed individuals as comparison (instead of using unexposed unrelated individuals). If an association with an early-life exposure remains when comparing differentially exposed siblings, the association cannot be due to any of the factors that they share. If there is no within-family association when comparing siblings, so that all siblings have the same rate/prevalence of the outcome regardless of their exposure, the results would (under the assumptions reviewed below) suggest that shared confounding factors account for the association. Notably, we use the term “shared” here and throughout the manuscript to refer to factors that make all siblings similar (i.e., the effective influence), regardless of whether the factors are objectively shared (Rutter, Silberg, & Simonoff, 1993). Sibling comparisons automatically control for confounding from all factors that are shared by siblings, and, because confounding factors may be difficult or even impossible to measure, numerous researchers have argued that the estimates from such studies provide a more rigorous examination of potential causal effects compared to those solely relying on measured covariates (i.e., using the design strengthens the internal validity of the study).
It is important to stress, however, that sibling comparisons have their own limitations and their validity therefore depends on several important assumptions (D’Onofrio et al., 2013; Donovan & Susser, 2011; Frisell, Oberg, Kuja-Halkola, & Sjolander, 2012; Lahey & D’Onofrio, 2010; McGue, Osler, & Christensen, 2010; Rutter, 2007; Susser, Eide, & Begg, 2010). To help illustrate some of these, Figure 3 depicts a hypothesized structural relationship between an early-life exposure and later outcome in a sibling pair (i). For simplicity, each set of common causes is represented with U; this could either be taken to cover all confounding factors or only the unmeasured (assuming measured have been appropriately accounted for). In addition to sharing common causes of exposure and outcome (Ui), siblings also share common causes of exposure (UXi), and outcome (UYi). All confounding factors not perfectly shared by siblings (non-shared) are captured in Ui1 and Ui2 respectively. When the comparison is made within siblings, all pathways through the shared factors (Ui, UXi and UYi) will be blocked. However, only siblings that are differentially exposed will contribute to an estimate of association, and their discordance has to be due to other factors than those shared. Most limitations of sibling comparisons derive from this requirement, and the more similar siblings are with respect to the exposure, the more influential the selection of discordance becomes. For internal validity, the main concern is that this renders the design sensitive to measurement error and confounding from factors the siblings do not share (Frisell et al., 2012; McGue et al., 2010). As the intra-class correlation in exposure increases, discordant individuals will also become less common, which may have implications for the external validity of the findings (generalization to the general population) and precision of the estimates (ability to acquire samples with adequate statistical power)(Allison, 2009). 
Furthermore, because the estimation assumes that the siblings’ experiences are independent of all measured and shared confounders, the design assumes no contagion from the first sibling’s status to the next (i.e., no arrow from Xi1 to Xi2, or from Yi1 to Yi2) and no carryover effect from one sibling’s exposure to the outcome of subsequent siblings (i.e., no arrow from Xi1 to Yi2) (Donovan & Susser, 2011; Sjolander, 2013; Sjolander et al., 2012).
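The core logic of the sibling comparison can be sketched with simulated sibling pairs. The sketch assumes a linear model, a single unmeasured familial factor that is perfectly shared by siblings (standing in for Ui in Figure 3), no non-shared confounding, no measurement error, and no carryover or contagion; the effect size of 0.3 is an arbitrary choice. Under these assumptions the population estimate is expected to be biased (about 0.8), whereas the within-pair estimate should recover the assumed causal effect.

```python
import numpy as np

def slope(x, y):
    """Simple-regression slope of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / (x * x).sum())

rng = np.random.default_rng(1)
n_families = 100_000
true_effect = 0.3  # hypothetical causal effect of the exposure

# One familial factor per family, shared by both siblings, affecting
# both the exposure and the outcome (unmeasured in practice).
shared = rng.normal(size=(n_families, 1))
exposure = shared + rng.normal(size=(n_families, 2))
outcome = true_effect * exposure + shared + rng.normal(size=(n_families, 2))

# Population estimate: treats all individuals as unrelated.
population = slope(exposure.ravel(), outcome.ravel())

# Sibling comparison: regress within-pair outcome differences on
# within-pair exposure differences; the shared factor cancels out.
within = slope(exposure[:, 1] - exposure[:, 0], outcome[:, 1] - outcome[:, 0])

print(f"population estimate:    {population:.3f}")
print(f"within-family estimate: {within:.3f}")
```

The within-pair estimate uses only variation in exposure between siblings, which is why pairs with identical exposure contribute nothing and why the assumptions listed above matter.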
Several of these limitations can be addressed in the design and/or analysis phase of the study and, if not, explored in sensitivity analyses. Exposures should, for example, only be considered if reliably measured, to minimize the influence of compounding measurement error. Unless all confounders that are not perfectly shared (Uij) can be identified and measured, a good rule of thumb is that siblings’ expected similarity (e.g., intra-class correlation) in exposure (UXi) should not exceed their similarity with respect to the entire set of unmeasured confounding factors (Ui and Uij). Concern for non-shared confounding could also be mitigated by considering temporal ordering; if, for example, a child cannot influence the exposure (such as the parental age at childbearing), all potentially confounding genetic factors are perfectly shared by full siblings. All measured potential confounders that could vary between siblings should be appropriately accounted for in the analysis in order to exclude their influence on the sibling comparison (see, for example, Sjölander & Greenland, 2013 for a description of the appropriate analytical methods). For early-life exposures, researchers should particularly consider the role of birth order because it is correlated with so many risk factors (e.g., parental age at childbearing, birth weight, gestational age, and infection during pregnancy). Again, careful consideration of the origins of exposures in Aim 1 will provide important insights into the additional covariates required to rule out plausible alternative hypotheses related to confounding factors that vary within sibships.
We want to stress, however, that it is impossible to rule out all possible confounding in a sibling-comparison study. For some early-life exposures it is, in fact, impossible to completely disentangle sibling-comparison estimates from the associations with highly correlated/collinear factors. For instance, any cohort effects (e.g., measured by year of birth) are perfectly correlated with advancing parental age in sibling comparisons. Furthermore, studies may not include reliable and valid measures of the relevant confounders of the exposure and outcome. To state it differently, though more similar than an unrelated individual, the unexposed sibling is not a perfectly exchangeable comparison. Researchers who use the design, as a result, need to acknowledge the limitation of being unable to account for unmeasured confounding.
The second genetically-informative design we propose researchers should use is the comparison of differentially exposed cousins. The logic behind the design (the advantages of comparing family members to account for unmeasured confounding) is parallel to the logic of sibling comparisons. The comparison of the offspring of adult siblings accounts for all factors that make individuals within an extended family similar. It follows that cousin comparisons do not account for as many genetic and environmental factors as sibling comparisons, most importantly because siblings share parents and cousins do not. For example, cousin comparisons cannot rule out influence from factors that make cousins different, which includes the influences of the spouses of the adult siblings (Eaves, Silberg, & Maes, 2005). The internal validity of cousin-comparison studies may, therefore, not be as strong as that of sibling-comparison studies.
Yet, cousin-comparison studies provide researchers with the opportunity to find converging evidence using a design with different limitations and assumptions. In fact, the comparison of differentially exposed cousins relaxes several of the assumptions found in sibling comparisons (D’Onofrio et al., 2013; D’Onofrio et al., 2013). For instance, the findings in cousins may be more generalizable to other groups because differentially exposed cousins may be more common than differentially exposed siblings. Furthermore, the assumption of no carryover effects is arguably more justified when comparing cousins than siblings. Finally, studies that contain both sibling and cousin information frequently include a larger number of differentially exposed cousins than differentially exposed siblings, which provides more statistical power.
The third genetically-informative design we propose researchers should use is the comparison of first-born cousins. Because birth order is correlated with so many early exposures (see above), we propose that researchers specifically examine differentially exposed first-born cousins. Restricting the analyses to only include first-born children provides an estimate of the association with an early exposure that is independent of all influences associated with birth order.
We also propose that researchers should specifically examine the role of carryover effects by analyzing sibling data using alternative analytic approaches. While we strongly encourage researchers to control for birth order in sibling-comparison studies and use the comparison of first-born cousins to examine associations with early-life exposures that are independent of birth order, neither of these approaches explicitly tests the influence of carryover effects. Yet, prominent developmental theories for several early risk factors, such as early teenage childbearing (e.g., Coley & Chase-Lansdale, 1998; C. A. Coyne & D’Onofrio, 2012), explicitly posit that the exposure of the first child has a causal influence on the outcomes (e.g., antisocial behavior) for all subsequent offspring (C. A. Coyne, Långström, Rickert, Lichtenstein, & D’Onofrio, 2013). Furthermore, reviewers have explicitly called for sibling-comparison studies to study such “dynamic” family influences (Donovan & Susser, 2011).
With large enough samples researchers can test the assumption of no carryover effects by conducting bidirectional case-crossover studies, which explore differentially exposed siblings across birth order (Meyer, Williams, Hernandez-Diaz, & Cnattingius, 2004). The design separately conducts sibling-comparison analyses among two sub-groups of differentially exposed sibling pairs, (1) a subgroup where the first-born was exposed and (2) a subgroup where the second-born was exposed. If the two types of sibling-comparison analyses were to yield similar results, this would not be consistent with a carryover effect. For example, when separately comparing differentially exposed sibling pairs with either the first- or second-born child being born prematurely, associations with long-term consequences were comparable, suggesting no role of carryover effects (D’Onofrio et al., 2013). In contrast, in the presence of a carryover effect we would expect the results of the sibling comparisons to differ according to which sibling (first or later-born) was exposed, assuming no effect modification by birth order.
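The reasoning behind the bidirectional comparison can be sketched with a simplified linear analogue; it is not the estimator used in the cited studies. The hypothetical model has a binary exposure, a continuous outcome, a perfectly shared familial factor, no birth-order main effect, and no effect modification by birth order. The function name `case_crossover` and all parameter values are illustrative assumptions.

```python
import numpy as np

def case_crossover(carryover, effect=0.3, n_pairs=400_000, seed=2):
    """Within-pair estimates (exposed minus unexposed sibling), split by which
    sibling was exposed. `carryover` is the hypothetical effect of the
    first-born's exposure on the second-born's outcome."""
    rng = np.random.default_rng(seed)
    x1 = rng.random(n_pairs) < 0.3           # first-born exposed?
    x2 = rng.random(n_pairs) < 0.3           # second-born exposed?
    shared = rng.normal(size=n_pairs)        # familial factor shared by the pair
    y1 = effect * x1 + shared + rng.normal(size=n_pairs)
    y2 = effect * x2 + carryover * x1 + shared + rng.normal(size=n_pairs)

    first_exposed = x1 & ~x2                 # subgroup (1)
    second_exposed = ~x1 & x2                # subgroup (2)
    est_first = float((y1 - y2)[first_exposed].mean())
    est_second = float((y2 - y1)[second_exposed].mean())
    return est_first, est_second

for c in (0.0, 0.2):
    est_first, est_second = case_crossover(carryover=c)
    print(f"carryover={c}: first-born exposed {est_first:.3f}, "
          f"second-born exposed {est_second:.3f}")
```

In this sketch the two subgroup estimates agree when there is no carryover and diverge by roughly the size of the carryover effect when one is present, because the carryover raises the outcome of the unexposed second-born only in subgroup (1).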
Bidirectional case-crossover analyses are not possible when the early exposure is highly correlated with birth order, such as when exploring parental age at childbearing. Researchers have used regression models that explore the associations with both the proband’s and previous sibling’s exposure. We would like to stress that such an approach will not distinguish between carryover and confounding. For example, researchers have included first parental age at childbearing and the age at childbearing for the proband to examine possible carryover effects (e.g., Jaffee et al., 2001) or to account for familial confounding (e.g., Petersen, Mortensen, & Pedersen, 2011; Turley, 2003). Additional designs, therefore, are needed to better identify the processes through which the exposure of an earlier child would influence all subsequent children.
The selection of exposure-discordance implies selecting relatives who differ in non-shared causes of the exposure. Since random measurement error is not shared, the selection will favor relatives that differ in the direction of such error (Frisell et al., 2012). The magnitude of the bias from measurement error depends on the intra-class correlation in exposure and the presence of unmeasured confounding. The impact that random misclassification of exposure may have on the interpretation of sibling comparisons has been reviewed in detail elsewhere (Frisell et al., 2012; McGue et al., 2010). To examine the potential bias from measurement error researchers can apply empirical or hypothesized expectations of measurement reliability (or classification sensitivity/specificity) to the specific setting (prevalence of exposure and outcome, as well as intra-class correlation in exposure). Such analyses can estimate the expected attenuations of population estimates versus sibling-comparison estimates under different scenarios, such as the strength of the true causal effect and influence of unmeasured confounding. This would be particularly important for the interpretation of an observed attenuation in Aim 2; researchers can gauge how much attenuation could be attributed to the potential influence of measurement error.
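A sketch of such a sensitivity analysis for a continuous exposure follows. It assumes classical (random, additive) measurement error, no confounding, a linear effect, and arbitrary values for the true effect, the sibling correlation in the true exposure, and the error variance. Under these assumptions the population estimate is attenuated by the reliability, whereas the within-pair estimate is attenuated by (reliability − ICC) / (1 − ICC), where ICC is the sibling correlation in the observed exposure.

```python
import numpy as np

def slope(x, y):
    """Simple-regression slope of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / (x * x).sum())

rng = np.random.default_rng(3)
n_families = 200_000
true_effect = 0.5   # hypothetical causal effect
icc_true = 0.6      # assumed sibling correlation in the TRUE exposure
error_var = 0.25    # assumed variance of random measurement error

family_part = rng.normal(size=(n_families, 1))
unique_part = rng.normal(size=(n_families, 2))
true_x = np.sqrt(icc_true) * family_part + np.sqrt(1 - icc_true) * unique_part
observed_x = true_x + np.sqrt(error_var) * rng.normal(size=(n_families, 2))
outcome = true_effect * true_x + rng.normal(size=(n_families, 2))

population = slope(observed_x.ravel(), outcome.ravel())
within = slope(np.diff(observed_x, axis=1).ravel(),
               np.diff(outcome, axis=1).ravel())

# Expected attenuation under the assumptions stated above.
reliability = 1 / (1 + error_var)
icc_observed = icc_true * reliability
expected_population = true_effect * reliability
expected_within = true_effect * (reliability - icc_observed) / (1 - icc_observed)

print(f"population: {population:.3f} (expected {expected_population:.3f})")
print(f"within:     {within:.3f} (expected {expected_within:.3f})")
```

Repeating the calculation over a grid of plausible reliabilities and intra-class correlations indicates how much of an observed within-family attenuation could be attributed to measurement error alone.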
Among the non-shared causes of exposure (which will be over-represented in exposure-discordant relatives), there most likely will be some that also cause the outcome (i.e., common causes). Hence, while the comparison of exposure-discordant relatives achieves control for confounding from all factors they share, the comparison will be more imbalanced with respect to all other confounding influences; see Figure 3 (Frisell et al., 2012). This distinguishes these comparisons from ordinary matched designs in which the index person and reference are selected to be perfectly correlated on the matching variables, but not expected to be correlated in other causes of exposure or confounders. The impact of bias will depend on the relative correlations of exposure and confounding factors within families. Given appropriate control for all measured confounders, a proposed rule of thumb is that the within-family comparison may be used when (the whole set of) unmeasured confounding factors is more correlated within families than the exposure (and avoided in the opposite scenario). While the latter correlation may be estimated, the former can only be hypothesized based on subject-matter knowledge. Discussion of the type, strength, and direction of the confounding, and of how strongly it is influenced by familial factors, should be a necessary component of the decision to apply these methods. In addition to encouraging such pre-evaluation, we also recommend careful considerations in both the design and analysis phase to help mitigate the influence of individual-level confounding. If concern still remains, it would also be possible to explore the influence of residual, unmeasured confounding using analytic results and/or data simulations (Frisell et al., 2012). By varying, for example, the hypothesized familial correlation in the total unmeasured confounding and its influence on exposure and outcome, while holding all other parameters constant (i.e., the correlation in exposure; the prevalence of exposure, confounding factors, and outcome; and the true causal effect), researchers could try to assess under which conditions (if any) their results could be explained by unmeasured confounding.
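A data simulation of this kind can be sketched as follows. The model is a deliberately simple, hypothetical one: the exposure is the sum of one unmeasured confounder and one other cause, each with its own sibling correlation; the true causal effect is zero; and all effects are linear with unit coefficients. The helper names (`sibling_pairs`, `estimates`) and parameter values are illustrative assumptions. In both scenarios below the sibling correlation in exposure is 0.5, so only the familial correlation of the confounder is varied.

```python
import numpy as np

def sibling_pairs(rho, n, rng):
    """Standard-normal values for two siblings with within-family correlation rho."""
    shared = rng.normal(size=(n, 1))
    unique = rng.normal(size=(n, 2))
    return np.sqrt(rho) * shared + np.sqrt(1 - rho) * unique

def slope(x, y):
    """Simple-regression slope of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / (x * x).sum())

def estimates(rho_confounder, rho_other, n=100_000, seed=4):
    """Population and within-pair estimates when the true causal effect is zero."""
    rng = np.random.default_rng(seed)
    confounder = sibling_pairs(rho_confounder, n, rng)  # unmeasured
    other_cause = sibling_pairs(rho_other, n, rng)      # affects exposure only
    exposure = confounder + other_cause
    outcome = confounder + rng.normal(size=(n, 2))      # no exposure effect
    population = slope(exposure.ravel(), outcome.ravel())
    within = slope(np.diff(exposure, axis=1).ravel(),
                   np.diff(outcome, axis=1).ravel())
    return population, within

# Confounder MORE correlated within families (0.8) than the exposure (0.5).
print("confounder more shared:", estimates(rho_confounder=0.8, rho_other=0.2))
# Confounder LESS correlated within families (0.2) than the exposure (0.5).
print("confounder less shared:", estimates(rho_confounder=0.2, rho_other=0.8))
```

Under these assumptions the population bias is about 0.5 in both scenarios, while the within-pair bias is expected to be smaller (about 0.2) in the first scenario and larger (about 0.8) in the second, in line with the rule of thumb described above.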
As previously mentioned, representativeness may generally be of less concern in cousins, because there are rarely grounds to suspect that exposure-discordant cousins differ substantially from the general population. For both sibling and cousin comparisons, potential deviance could be due to (a) the restriction to related individuals (i.e., requiring a mother to have at least two offspring or mothers to have a sister, respectively) and/or (b) the additional requirement of exposure-discordance. The first selection can be explored by comparing estimates of association in the population to those in samples restricted to siblings or cousins, respectively; commensurate findings would provide some reassurance that the requirement of certain family ties does not influence representativeness. If the concern is, however, that exposure-discordant individuals differ from the full sample (or the entire population), we suggest that researchers compare the distributions of measured covariates to identify and interpret potential differences.
The use of family-based designs, such as sibling and cousin comparisons, enables researchers to examine whether associations are independent of unmeasured genetic and environmental factors that make family members similar, which can provide a rigorous examination of competing hypotheses. It is important to stress that comparisons of related individuals make several assumptions, as described above. Fortunately, asking the same scientific question using different statistical approaches and genetically-informative designs (each with different assumptions and limitations) allows researchers to explore many of these assumptions, as well as examine alternative hypotheses about the processes responsible for associations between early exposures and adverse outcomes. Careful considerations in the design and analysis phase, use of multiple genetically-informative designs, and sensitivity analyses may allow for three potential outcomes (See Figure 1: results from the methods in Aim 2), which represent different processes that could explain why early risk factors are associated with later outcomes: (a) the associations cannot be explained by factors shared in families or the influence of measured covariates (i.e., the within-family associations remain robust); (b) the results are consistent with carryover effects; and (c) the results are consistent with a role of familial confounding (i.e., there are no within-family associations). We certainly do not believe that these processes are mutually exclusive; rather, we expect that multiple processes will account for the associations with early exposures.
None of the possible scenarios under Aim 2 will identify specific causal processes. However, additional epidemiologic methods can help (a) specify mediating factors, (b) examine carryover effects, and (c) identify the source (genetic and environmental factors) of the familial confounding. The objective of this aim is (for scenario a) to examine specific factors as mediators; (for scenario b) to rigorously test the exposure of the first-born child; and/or (for scenario c) to better understand the source of the confounding factors shared by siblings that account for the exposure-outcome association.
If the findings from the genetically-informative (and sensitivity) analyses in Aim 2 suggest an independent association between an early-life exposure and later outcome (consistent with a causal effect), subsequent research should explore mediating mechanisms that are more proximal and/or ideally (for translational science) amenable to intervention. Having established associations in sibling-comparison analyses also helps narrow the list of possible mediators, because, in addition to meeting the standard criterion of being in the causal pathway (i.e., a cause of the outcome that is also an effect of the exposure), the mediators are expected to vary between siblings within a family (Lahey & D’Onofrio, 2010). In fact, researchers should, whenever possible, conduct genetically-informed studies (such as those described above) to explore plausible mediating factors to further justify their inclusion in subsequent analyses. Such analyses would enable researchers to help rule out measured (C2 and C3 in Figure 1) and some unmeasured confounding (U2 and U3 in Figure 1) related to the mediating factor.
The traditional practice of adjusting for a factor in the causal pathway in order to obtain the direct effect of an exposure is problematic if (a) the mediator acts as a modifier of the exposure effect on the outcome (Kaufman, Maclehose, & Kaufman, 2004) or if (b) the mediator is a collider (by sharing unmeasured common causes with the outcome) (Cole & Hernan, 2002). A structural approach can, however, help identify and avoid erroneous inferences in effect decomposition (VanderWeele, 2009, 2010; VanderWeele & Hernandez-Diaz, 2011).
To help address some of the problems with traditional mediation analyses (Cole & Hernan, 2002; Kaufman et al., 2004), researchers can implement a causal structural approach for counterfactual-based effect decomposition. First, illustrating the hypothesized structural relationships among exposure, mediator, and outcome (using, for example, a DAG) is instrumental for evaluating the necessary assumptions of no unmeasured confounding (of exposure and outcome, mediator and exposure, and mediator and outcome). Second, under the assumptions identified in the first step, the total effect can be decomposed into direct and indirect effects (Pearl, 2001; Robins & Greenland, 1992). Controlled direct effects may be used to establish whether there are pathways between the exposure and the outcome independent of the mediator and, under the special condition of no interaction between the effects of the exposure and the mediator on the outcome, they can also be used for effect decomposition. In the case of interactions, natural direct and indirect effects are more useful for effect decomposition (VanderWeele, 2009). Third, researchers should conduct sensitivity analyses to evaluate the robustness of the underlying assumptions (VanderWeele, 2010). Exploring the influence of effect modification or unmeasured confounding helps identify conditions under which we would observe different scenarios.
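The quantities named in the second step can be written out in standard counterfactual notation. The notation here is our assumption for illustration and is not taken from the figures: a binary exposure with levels 0 and 1, $M(a)$ for the value the mediator would take under exposure level $a$, and $Y(a, m)$ for the outcome under exposure level $a$ and mediator value $m$.

```latex
\begin{align*}
\text{TE} &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\text{CDE}(m) &= E[Y(1, m)] - E[Y(0, m)] \\
\text{NDE} &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\text{NIE} &= E[Y(1, M(1))] - E[Y(1, M(0))] \\
\text{TE} &= \text{NDE} + \text{NIE}
\end{align*}
```

The last line holds by construction, because the term $E[Y(1, M(0))]$ cancels when the natural direct effect (NDE) and natural indirect effect (NIE) are summed; this is why the natural effects decompose the total effect (TE) even when exposure and mediator interact. The controlled direct effect, $\text{CDE}(m)$, fixes the mediator at a chosen value $m$ and yields a comparable decomposition only in the absence of such interaction.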
If the alternative exploration of exposures highly correlated with birth order under Aim 2 suggests the possibility that carryover effects may be present, then additional designs and approaches are needed to better understand the processes through which the exposure of one child affects later-born children in the family. It is important to note that such associations could be due to causal processes (i.e., biological and social effects from the previous exposures influence all subsequent siblings, a so-called “carryover effect”). However, the association could also be due to environmental and/or genetic confounding. To better distinguish among these possibilities, researchers can conduct analyses using the “children of full-/half-siblings” design in quantitative genetic modeling. This approach compares cousins (the offspring of siblings) who vary in their exposure to the risk factor and their genetic risk, as offspring of full-siblings share 12.5% of their genetic makeup (on average), whereas offspring of half-siblings share 6.25% (on average). Established multivariate quantitative genetic models (D’Onofrio et al., 2003; Heath, Kendler, Eaves, & Markell, 1985; McAdams et al., 2014; Silberg & Eaves, 2004) with these designs can help estimate the degree to which the observed association is due to environmental processes correlated with the exposure of the first child, which is consistent with a causal influence; environmental factors that make all cousins in an extended family similar, providing support for the role of environmental confounding; and/or genetic factors shared by cousins, suggesting the importance of genetic confounding.
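The 12.5% and 6.25% figures follow from multiplying expected genetic sharing along the path that connects two cousins through their related parents. The short calculation below makes that arithmetic explicit; the function and constant names are hypothetical, and the calculation assumes that the cousins are related only through the sibling parents (e.g., no assortative mating or inbreeding).

```python
from fractions import Fraction

# Expected proportion of segregating genetic material shared.
PARENT_CHILD = Fraction(1, 2)
FULL_SIBLINGS = Fraction(1, 2)
HALF_SIBLINGS = Fraction(1, 4)

def cousin_relatedness(parent_sibling_relatedness):
    """Expected sharing between the offspring of two related parents:
    child -> parent, parent -> parent's sibling, sibling -> that sibling's child."""
    return PARENT_CHILD * parent_sibling_relatedness * PARENT_CHILD

offspring_of_full_siblings = cousin_relatedness(FULL_SIBLINGS)
offspring_of_half_siblings = cousin_relatedness(HALF_SIBLINGS)

print(f"Offspring of full siblings: {float(offspring_of_full_siblings):.2%}")
print(f"Offspring of half siblings: {float(offspring_of_half_siblings):.2%}")
print(f"Difference: {float(offspring_of_full_siblings - offspring_of_half_siblings):.2%}")
```

The difference between the two cousin types is only 6.25 percentage points, which is one way to see why the models built on this contrast need large samples to estimate their parameters precisely.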
The design and analyses include several assumptions, which have been articulated elsewhere (D’Onofrio et al., 2013; D’Onofrio et al., 2003; Heath et al., 1985; McAdams et al., 2014; Silberg & Eaves, 2004). We want to highlight two that are particularly relevant, though. First, researchers need quite large datasets to precisely estimate the parameters of the models because the difference in genetic relatedness between the offspring of full- and half-siblings is relatively small (D’Onofrio et al., 2013). Second, the offspring of siblings designs do not account for factors that make cousins different, including the genetic and environmental factors associated with the spouses of the adult siblings (Eaves et al., 2005). As such, combining the quantitative genetic modeling with measured covariates will help researchers make more valid inferences about the carryover “effects.” For example, our studies on teenage childbearing using children of siblings designs and quantitative modeling strongly support the importance of studying the influence of age at first childbearing on all siblings when studying outcomes, such as ADHD and criminality (Chang et al., 2014; C. A. Coyne et al., 2013). In sum, the major advantage of using the offspring of full-/half-siblings is that the approach can help further distinguish among the causal and confounding hypotheses regarding how the exposure of the first child is associated with outcomes in all of his/her siblings.
If the designs and approaches in Aim 2 suggest that familial factors (shared genetic and environmental factors) account for the association between an early-life exposure and later outcome, additional designs are needed because the comparisons of siblings (or cousins) alone cannot explore the source of such familial confounding (Donovan & Susser, 2011; Lahey & D’Onofrio, 2010). Stated differently, the use of sibling and cousin comparisons can highlight the importance of Ui in Figure 3, but the designs cannot determine to what extent Ui is genetic and/or environmental. Furthermore, the conclusions researchers can draw from the multiple designs used in Aim 2 are based solely on interpreting the pattern of the results across the designs. To quantify the role of different processes, other designs and analytical strategies are needed.
In particular, researchers can take advantage of analytical models that combine sibling comparisons (with their ability to study exposures that are not shared by siblings) with the “children of full-/half-siblings” design (with its ability to study factors that are shared by siblings). Sibling-comparison analyses may indicate that factors (either genetic and/or environmental) that make siblings similar account for the observed associations with the early exposure. The “children of full-/half-siblings” design (described above) can help researchers explore the extent to which the confounding is due to genetic or environmental factors. In particular, quantitative genetic analyses of the design can help distinguish between environmental (confounding) processes that are specific to nuclear families (i.e., the factors do not make cousins similar); environmental (confounding) processes that make cousins within an extended family similar; and (confounding) genetic factors. Combining the different approaches in the same model enables researchers to simultaneously estimate the magnitude of these confounding processes and the magnitude of the sibling-comparison estimates (which, under certain assumptions, are consistent with a causal influence). Combining the different designs, therefore, enables researchers to estimate several processes that may account for the observed association between an early exposure and an outcome.
Several articles discuss the combined designs, the statistical models, and their application to early exposures (D’Onofrio et al., 2008; Harden et al., 2007; Kuja-Halkola, D’Onofrio, Larsson, & Lichtenstein, 2014; Kuja-Halkola et al., 2010). In particular, we recently combined these different genetically-informed designs to study the processes that account for associations between maternal smoking during pregnancy and numerous offspring outcomes, including pregnancy outcomes, intellectual abilities, and externalizing problems (Kuja-Halkola et al., 2014). Notably, the model results indicated that maternal smoking during pregnancy was associated with pregnancy outcomes (e.g., low birth weight and preterm birth) when comparing differentially exposed siblings, consistent with a causal influence. In contrast, the sibling-comparison estimates found no within-family associations with later intellectual abilities or externalizing—siblings had the same frequency/rates of these problems, regardless of their exposure to maternal smoking during pregnancy. Moreover, the quantitative genetic analyses indicated that genetic factors were largely responsible for this confounding, suggesting that genetic factors that influenced maternal smoking during pregnancy also influenced offspring cognitive and behavioral problems. The results highlight how advanced quantitative genetic modeling that combines several genetically-informed designs can help elucidate and distinguish between very different processes.
With Aim 3 we hope researchers will use more detailed measurement and advanced designs/analyses to better understand the processes behind observed associations between early-life exposures and later outcomes. We described three possible scenarios. First, using effect decomposition would provide insight into theoretically and empirically-supported plausible mediators of early-life exposures. Second, exploring carryover effects would enable researchers to explore social and biological factors related to the exposure of the first-born offspring on all subsequent offspring in a family. Third, quantitative behavior genetic modeling considering family relationships in multiple generations would help elucidate the sources (i.e., genetic and/or environmental) of the familial confounding.
In the current manuscript we provide an overview of how several genetically-informed designs can be used to conduct translational epidemiologic studies of early-life exposures using a systematic framework. We believe that studies using this framework can provide key insights into the consequences of early-life exposures.
We also see several other steps forward. First, to further strengthen the inference from these methods we believe that researchers should also use other family-based, genetically-informative designs, such as the co-twin design (McGue et al., 2010), quantitative genetic modeling of identical and fraternal twins (Turkheimer & Harden, submitted), adoption studies (Leve, Neiderhiser, Scaramella, & Reiss, 2010), and in vitro fertilization approaches (Thapar et al., 2007) when appropriate. For instance, causal inference regarding the importance of fetal growth for psychosocial outcomes has been greatly strengthened by the use of several genetically-informed designs (review in D’Onofrio et al., 2014). Researchers also can use genetically-informed approaches that rely on molecular genetic data, such as Mendelian Randomization (Smith & Ebrahim, 2005), to examine early-life exposures.
Second, researchers will need to collaborate to include more detailed measures of early-life exposures and plausible mediating mechanisms, including biomarkers, in genetically-informative studies (D’Onofrio et al., 2014). This will be particularly important because most of the studies using the designs we describe in the current review have relied on register-based assessments of global risks (e.g., birth weight for gestational age as a proxy for fetal growth) or maternal self-report of health behaviors (e.g., smoking during pregnancy). More detailed measurement will also enable researchers to better understand the origins of within-family differences in the exposures (e.g., why was one sibling exposed but not another?), which is important for understanding the consequences of the exposures (e.g., Caspi et al., 2004). Thus, the degree to which genetically-informed studies can specify more precise processes relies on rigorous assessments of risk factors at multiple levels of analysis.
Third, future studies of early risks that do not use genetically-informed designs should be informed by explicit consideration of unmeasured genetic and environmental confounding and the plausible role of carryover effects (i.e., the research is guided by explicit causal diagrams). We certainly understand that it will not be possible to conduct genetically-informative studies of every salient early-life exposure. For example, it is currently not feasible to conduct large-scale studies of fetal brain development based on prenatal fMRI assessments. Researchers conducting analyses of early-life risks (and those designing new studies) will need to use rigorous methods to identify and measure possible confounders and carefully consider the role of unmeasured confounding when interpreting results.
Finally, with the growing availability of GWAS data, including in large epidemiologic studies, the field will need to further explore how SNP-based information can be leveraged to further examine and clarify the processes related to parent-child associations (e.g., Eaves, Pourcain, Smith, York, & Evans, 2014) and early exposures. While the use of GWAS data may enable us to better understand early-life exposures, we believe that considerable methodological work is needed to better appreciate the advantages, limitations, and assumptions of using different methodological approaches with GWAS data (including from studies that have genotyped both the parents and offspring) when studying such exposures.
Leading researchers in numerous fields (review in D’Onofrio et al., 2013) have all stressed the critical need for researchers to use design features, including those in genetically-informative approaches, instead of solely relying on measured covariates to test competing hypotheses. For example, using measured covariates to account for confounding factors and exploring mediating variables to explain an association with an early-life exposure will often produce biased estimates because of unmeasured confounding. The overall significance of the proposed framework is that the results of studies using these approaches will help specify the processes responsible for public health problems associated with early-life exposures, regardless of whether the associations are due to causal processes or confounding. These efforts could influence policy and medical decision-making by providing public health officials, physicians, clinicians, and the public with important information about the consequences of early-life exposures. To provide just one example, using these designs to explore outcomes associated with maternal prescription medication use during pregnancy could help answer important public health questions (e.g., Parisi, Spong, Zajicek, & Guttmacher, 2011).
In addition to the implications for policy, medical, and personal decision-making, the outcomes of such research could also have an important positive impact on subsequent translational research. First, using these designs would help prevention science better identify (a) modifiable targets for intervention/prevention efforts that are consistent with causal risk factors, and (b) putative risk factors that are markers/predictors, but not causally related to the outcomes (Cicchetti, 1993; J.D. Coie, Miller-Jackson, & Bagwell, 2000; J.D. Coie et al., 1993). Second, the findings could help identify which biological mechanisms should be explored by basic research (D’Onofrio et al., 2013; Fernando & Robbins, 2011; Nestler & Hyman, 2010), emphasizing a key benefit of translational epidemiology (e.g., Weissman et al., 2011). Third, the proposed research would provide a critical foundation for empirically supported studies of effect-moderating biological factors (e.g., gene-by-environment interactions) because identification of true environmental influences is required for such endeavors (e.g., Dick, 2011; Moffitt, Caspi, & Rutter, 2005; Vrieze, Iacono, & McGue, 2012).
We acknowledge financial support from the Swedish Research Council through the Swedish Initiative for Research on Microdata in the Social and Medical Sciences (SIMSAM, grant no 340-2013-5867), NICHD (HD061817 and HD061384), and NIMH (MH094011 and MH102221).