States and behaviors of different individuals are expected to be correlated across a social network. Christakis and Fowler have proposed a ‘three degrees of influence rule’ to characterize the extent of such dependence. In this paper we discuss three distinct interpretations of such a rule: one involving only associations (which is the interpretation for which Christakis and Fowler give evidence), one involving actual causation, generally referred to as contagion or social influence, and one involving direct effects. We discuss analytic procedures appropriate for assessing evidence for each possible interpretation and the increasingly difficult methodological challenges present in each interpretation.
doi:10.1002/sim.5653
PMCID: PMC4221254
PMID: 23341081
Contagion; direct effects; homophily; sensitivity analysis; social influence
We present results that allow the researcher in certain cases to determine the direction of the bias that arises when control for confounding is inadequate. The results are given within the context of the directed acyclic graph causal framework and are stated in terms of signed edges. Rigorous definitions for signed edges are provided. We describe cases in which intuition concerning signed edges fails and we characterize the directed acyclic graphs that researchers can use to draw conclusions about the sign of the bias of unmeasured confounding. If there is only one unmeasured confounding variable on the graph, then non-increasing or non-decreasing average causal effects suffice to draw conclusions about the direction of the bias. When there is more than one unmeasured confounding variable, non-increasing and non-decreasing average causal effects can be used to draw conclusions only if the various unmeasured confounding variables are independent of one another conditional on the measured covariates. When this conditional independence property does not hold, stronger notions of monotonicity are needed to draw conclusions about the direction of the bias.
doi:10.1097/EDE.0b013e3181810e29
PMCID: PMC4242711
PMID: 18633331
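The single-confounder case in the abstract above can be illustrated with a small sketch. It assumes one unmeasured confounder U whose edges into the exposure and the outcome each carry a sign (+1 for a non-decreasing average effect, -1 for a non-increasing one, 0 for no edge), and it takes the direction of the bias to be the product of those two signs. The function name and the integer coding are hypothetical, and the sketch leaves out the further conditions under which the paper's results hold.

```python
def sign_of_confounding_bias(sign_u_to_exposure: int, sign_u_to_outcome: int) -> int:
    """Direction of bias from a single unmeasured confounder U (illustrative only).

    Each argument is +1 (non-decreasing average effect), -1 (non-increasing
    average effect) or 0 (no edge). The return value is read in the weak sense:
    +1 means the unadjusted estimate is at or above the true effect,
    -1 means at or below, 0 means no bias from U.
    """
    for s in (sign_u_to_exposure, sign_u_to_outcome):
        if s not in (-1, 0, 1):
            raise ValueError("signs must be -1, 0 or +1")
    # Assumption of this sketch: the bias sign is the product of the edge signs.
    return sign_u_to_exposure * sign_u_to_outcome


if __name__ == "__main__":
    # U raises both exposure and outcome: estimate biased upward.
    print(sign_of_confounding_bias(+1, +1))
    # U raises exposure but lowers outcome: estimate biased downward.
    print(sign_of_confounding_bias(+1, -1))
```

With two or more unmeasured confounders this product rule is not enough on its own; as the abstract notes, it also requires that the confounders be independent of one another conditional on the measured covariates.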
Summary
Formal rules governing signed edges on causal directed acyclic graphs are described in this paper and it is shown how these rules can be useful in reasoning about causality. Specifically, the notions of a monotonic effect, a weak monotonic effect and a signed edge are introduced. Results are developed relating these monotonic effects and signed edges to the sign of the causal effect of an intervention in the presence of intermediate variables. The incorporation of signed edges into the directed acyclic graph causal framework furthermore allows for the development of rules governing the relationship between monotonic effects and the sign of the covariance between two variables. It is shown that when certain assumptions about monotonic effects can be made then these results can be used to draw conclusions about the presence of causal effects even when data is missing on confounding variables.
doi:10.1111/j.1467-9868.2009.00728.x
PMCID: PMC4239133
PMID: 25419168
Bias; Causal inference; Confounding; Directed acyclic graphs; Structural equations
A key question in many studies is how to divide the total effect of an exposure into a component that acts directly on the outcome and a component that acts indirectly, i.e. through some intermediate. For example, one might be interested in the extent to which the effect of diet on blood pressure is mediated through sodium intake and the extent to which it operates through other pathways. In the context of such mediation analysis, even if the effect of the exposure on the outcome is unconfounded, estimates of direct and indirect effects will be biased if control is not made for confounders of the mediator-outcome relationship. Often data are not collected on such mediator-outcome confounding variables; the results in this paper allow researchers to assess the sensitivity of their estimates of direct and indirect effects to the biases from such confounding. Specifically, the paper provides formulas for the bias in estimates of direct and indirect effects due to confounding of the exposure-mediator relationship and of the mediator-outcome relationship. Under some simplifying assumptions, the formulas are particularly easy to use in sensitivity analysis. The bias formulas are illustrated by examples in the literature concerning direct and indirect effects in which mediator-outcome confounding may be present.
doi:10.1097/EDE.0b013e3181df191c
PMCID: PMC4231822
PMID: 20479643
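As a rough illustration of the kind of sensitivity analysis described in the abstract above, the sketch below works on the difference scale under simplifying assumptions that are stated here and are not taken from the abstract: the unmeasured mediator-outcome confounder U is binary; its effect on the outcome, gamma, is the same in every stratum of exposure and mediator; and delta is the difference in the prevalence of U between exposed and unexposed subjects within strata of the mediator and covariates. Under those assumptions the bias of the unadjusted conditional contrast is the product gamma * delta. All function names and numbers are hypothetical.

```python
def simple_bias(gamma: float, delta: float) -> float:
    """Bias of the unadjusted contrast under the simplifying assumptions above.

    gamma: effect of binary U on the outcome (difference scale), assumed constant
           across exposure and mediator levels.
    delta: P(U=1 | exposed, m, c) - P(U=1 | unexposed, m, c).
    """
    return gamma * delta


def corrected_direct_effect(estimate: float, gamma: float, delta: float) -> float:
    """Subtract the assumed bias from an observed direct-effect estimate."""
    return estimate - simple_bias(gamma, delta)


def sensitivity_grid(estimate, gammas, deltas):
    """Corrected estimates over a grid of hypothetical (gamma, delta) values."""
    return [(g, d, corrected_direct_effect(estimate, g, d))
            for g in gammas for d in deltas]


if __name__ == "__main__":
    # Hypothetical observed direct effect of 0.50 on the difference scale.
    for g, d, corrected in sensitivity_grid(0.50, [0.1, 0.2], [0.25, 0.5]):
        print(f"gamma={g:.2f} delta={d:.2f} corrected={corrected:.3f}")
```

Scanning the grid shows how strong the unmeasured confounding would have to be before the corrected estimate changes sign or moves to the null.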
Background
Observational studies have reported higher mortality among older adults treated with first-generation antipsychotics (FGAs) versus second-generation antipsychotics (SGAs). A few studies examined risk for medical events, including stroke, ventricular arrhythmia, venous thromboembolism, myocardial infarction, pneumonia, and hip fracture.
Objectives
1) Review robust epidemiologic evidence comparing mortality and medical event risk between FGAs and SGAs in older adults; 2) Quantify how much these medical events explain the observed mortality difference between FGAs and SGAs.
Data sources
PubMed and Science Citation Index.
Study eligibility criteria, participants, and interventions
Studies of antipsychotic users that: (1) evaluated mortality or the medical events specified above; (2) were restricted to populations with a mean age of 65 years or older; (3) compared FGAs to SGAs, or both to a non-user group; (4) employed a “new user” design; (5) adjusted for confounders assessed prior to antipsychotic initiation; and (6) did not require survival after antipsychotic initiation. A separate search was performed for mortality estimates associated with the specified medical events.
Study appraisal and synthesis methods
For each medical event, we used a non-parametric model to estimate lower and upper bounds for the proportion of the mortality difference—comparing FGAs to SGAs—mediated by their difference in risk for the medical event.
Results
We provide a brief, updated summary of the included studies and the biological plausibility of these mechanisms. Of the 1122 unique citations retrieved, we reviewed 20 observational cohort studies that reported 28 associations. We identified hip fracture, stroke, myocardial infarction, and ventricular arrhythmias as potential intermediaries on the causal pathway from antipsychotic type to death. However, these events did not appear to explain the entire mortality difference.
Conclusions
The current literature suggests that hip fracture, stroke, myocardial infarction, and ventricular arrhythmias partially explain the mortality difference between SGAs and FGAs.
doi:10.1371/journal.pone.0105376
PMCID: PMC4139353
PMID: 25140533
We show that, in the presence of uncontrolled environmental confounding, joint tests for the presence of a main genetic effect and gene-environment interaction will be biased if the genetic and environmental factors are correlated, even if there is no effect of either the genetic factor or the environmental factor on the disease. When environmental confounding is ignored, such tests will in fact reject the joint null of no genetic effect with a probability that tends to 1 as the sample size increases. This problem with the joint test vanishes under gene-environment independence, but it still persists if estimating the gene-environment interaction parameter itself is of interest. Uncontrolled environmental confounding will bias estimates of gene-environment interaction parameters even under gene-environment independence, but it will not do so if the unmeasured confounding variable itself does not interact with the genetic factor. Under gene-environment independence, if the interaction parameter without controlling for the environmental confounder is nonzero, then there is gene-environment interaction either between the genetic factor and the environmental factor of interest or between the genetic factor and the unmeasured environmental confounder. We evaluate several recently proposed joint tests in a simulation study and discuss the implications of these results for the conduct of gene-environment interaction studies.
doi:10.1093/aje/kws439
PMCID: PMC3698991
PMID: 23821317
case-control; case-only; confounding; gene-environment; interaction; joint tests; marginal genetic association
Purpose
Over the last thirty years, prenatal care utilization, both the proportion of women receiving the recommended number of visits and the average number of visits, has increased substantially. Although infant mortality has fallen, preterm birth has increased. We hypothesized that prenatal care may lead to lower infant mortality in part by increasing the detection of obstetrical problems for which the clinical response may be to medically induce preterm birth.
Methods
We examine whether medically induced preterm birth mediates the association between prenatal care and infant mortality using newly developed methods for mediation analysis. Data are the cohort version of the national linked birth certificate and infant death data for 2003 births. Analyses adjust for maternal sociodemographic, geographic and health characteristics.
Results
Receiving more prenatal care visits than recommended was associated with medically induced preterm birth (OR=2.44, 95% CI: 2.40, 2.49, compared with fewer visits than recommended). Medically induced preterm birth was itself associated with higher infant mortality (OR=5.08, 95% CI: 4.61, 5.60), but that association was weaker among women receiving extra prenatal care visits (OR=3.08, 95% CI: 2.88, 3.30) compared to women receiving the recommended number of visits or fewer.
Conclusions
These analyses suggest that some of the benefit of prenatal care in terms of infant mortality may be mediated by medically induced preterm birth. If so, using preterm birth rates as a metric for tracking birth policy and outcomes could be misleading.
doi:10.1016/j.annepidem.2013.04.010
PMCID: PMC3711527
PMID: 23726822
Peer influence and social interactions can give rise to spillover effects in which the exposure of one individual may affect outcomes of other individuals. Even if the intervention under study occurs at the group or cluster level as in group-randomized trials, spillover effects can occur when the mediator of interest is measured at a lower level than the treatment. Evaluators who choose groups rather than individuals as experimental units in a randomized trial often anticipate that the desirable changes in targeted social behaviors will be reinforced through interference among individuals in a group exposed to the same treatment. In an empirical evaluation of the effect of a school-wide intervention on reducing individual students’ depressive symptoms, schools in matched pairs were randomly assigned to the 4Rs intervention or the control condition. Class quality was hypothesized as an important mediator assessed at the classroom level. We reason that the quality of one classroom may affect outcomes of children in another classroom because children interact not simply with their classmates but also with those from other classes in the hallways or on the playground. In investigating the role of class quality as a mediator, failure to account for such spillover effects of one classroom on the outcomes of children in other classrooms can potentially result in bias and problems with interpretation. Using a counterfactual conceptualization of direct, indirect and spillover effects, we provide a framework that can accommodate issues of mediation and spillover effects in group randomized trials. We show that the total effect can be decomposed into a natural direct effect, a within-classroom mediated effect and a spillover mediated effect. We give identification conditions for each of the causal effects of interest and provide results on the consequences of ignoring “interference” or “spillover effects” when they are in fact present. 
Our modeling approach disentangles these effects. The analysis examines whether the 4Rs intervention has an effect on children's depressive symptoms through changing the quality of other classes as well as through changing the quality of a child's own class.
doi:10.1080/01621459.2013.779832
PMCID: PMC3753117
PMID: 23997375
Direct/indirect effects; interference; multilevel models; social interactions
Genetic association studies have been a popular approach for assessing the association between common Single Nucleotide Polymorphisms (SNPs) and complex diseases. However, other genomic data involved in the mechanism from SNPs to disease, e.g., gene expressions, are usually neglected in these association studies. In this paper, we propose to exploit gene expression information to more powerfully test the association between SNPs and diseases by jointly modeling the relations among SNPs, gene expressions and diseases. We propose a variance component test for the total effect of SNPs and a gene expression on disease risk. We cast the test within the causal mediation analysis framework with the gene expression as a potential mediator. For eQTL SNPs, the use of gene expression information can enhance power to test for the total effect of a SNP-set on disease risk, which is the combined direct effect of the SNPs and their indirect effect mediated through the gene expression. We show that the test statistic under the null hypothesis follows a mixture of χ² distributions, which can be evaluated analytically or empirically using the resampling-based perturbation method. We construct tests for each of three disease models, in which disease risk is determined by SNPs only, by SNPs and gene expression, or by SNPs, gene expression and their interactions. As the true disease model is unknown in practice, we further propose an omnibus test to accommodate different underlying disease models. We evaluate the finite sample performance of the proposed methods using simulation studies, and show that our proposed test performs well and that the omnibus test can almost reach the optimal power attained when the disease model is known and correctly specified. We apply our method to re-analyze the overall effect of the SNP-set and expression of the ORMDL3 gene on the risk of asthma.
doi:10.1214/13-AOAS690
PMCID: PMC3981558
PMID: 24729824
Causal Inference; Data Integration; Mediation Analysis; Mixed Models; Score Test; SNP Set Analysis; Variance Component Test
Recent theory in causal inference has provided concepts for mediation analysis and effect decomposition that allow one to decompose a total effect into a direct and an indirect effect. Here, it is shown that what is often taken as an indirect effect can in fact be further decomposed into a “pure” indirect effect and a mediated interactive effect, thus yielding a three-way decomposition of a total effect (direct, indirect, and interactive). This three-way decomposition applies to difference scales and also to additive ratio scales and additive hazard scales. Assumptions needed for the identification of each of these three effects are discussed and simple formulae are given for each when regression models allowing for interaction are used. The three-way decomposition is illustrated by examples from genetic and perinatal epidemiology, and discussion is given to what is gained over the traditional two-way decomposition into simply a direct and an indirect effect.
doi:10.1097/EDE.0b013e318281a64e
PMCID: PMC3563853
PMID: 23354283
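The three-way decomposition in the abstract above can be made concrete for the linear case. The sketch assumes an outcome regression E[Y | a, m, c] = theta0 + theta1*a + theta2*m + theta3*a*m + theta4'c and a mediator regression E[M | a, c] = beta0 + beta1*a + beta2'c, together with the usual no-unmeasured-confounding conditions. The parameter names, the function names, and the numbers are illustrative choices, not values from the paper.

```python
def mean_outcome(a_y, a_m, theta1, theta2, theta3, beta0, beta1, beta2_c=0.0):
    """E[Y(a_y, M(a_m))] up to the constant theta0 + theta4'c, which cancels in contrasts."""
    m = beta0 + beta1 * a_m + beta2_c          # expected mediator under exposure a_m
    return theta1 * a_y + (theta2 + theta3 * a_y) * m


def three_way_decomposition(theta1, theta2, theta3, beta0, beta1,
                            a=1.0, a_star=0.0, beta2_c=0.0):
    """Split the total effect of changing exposure from a_star to a into three parts."""
    m_star = beta0 + beta1 * a_star + beta2_c
    direct = (theta1 + theta3 * m_star) * (a - a_star)            # pure direct effect
    pure_indirect = (theta2 + theta3 * a_star) * beta1 * (a - a_star)
    mediated_interaction = theta3 * beta1 * (a - a_star) ** 2
    return {
        "direct": direct,
        "pure_indirect": pure_indirect,
        "mediated_interaction": mediated_interaction,
        "total": direct + pure_indirect + mediated_interaction,
    }


if __name__ == "__main__":
    parts = three_way_decomposition(theta1=1.0, theta2=2.0, theta3=0.5,
                                    beta0=1.0, beta1=3.0)
    print(parts)
    # The three parts should add up to the contrast of counterfactual means.
    total_check = (mean_outcome(1.0, 1.0, 1.0, 2.0, 0.5, 1.0, 3.0)
                   - mean_outcome(0.0, 0.0, 1.0, 2.0, 0.5, 1.0, 3.0))
    print(total_check)
```

When theta3 is zero the mediated interaction term vanishes and the result reduces to the familiar two-way split into a direct and an indirect effect.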
In this paper we introduce methodology, causal directed acyclic graphs, that empirical researchers can use to identify causation, avoid bias, and interpret empirical results. This methodology has become popular in a number of disciplines, including statistics, biostatistics, epidemiology and computer science, but has yet to appear in the empirical legal literature. Accordingly we outline the rules and principles underlying this new methodology and then show how it can assist empirical researchers through both hypothetical and real-world examples found in the extant literature. While causal directed acyclic graphs are certainly not a panacea for all empirical problems, we show they have potential to make the most basic and fundamental tasks, such as selecting covariate controls, relatively easy and straightforward.
doi:10.1093/lpr/mgr019
PMCID: PMC4324363
doi:10.1097/EDE.0b013e3182781410
PMCID: PMC3523303
PMID: 23232624
Causal inference with interference is a rapidly growing area. The literature has begun to relax the “no-interference” assumption that the treatment received by one individual does not affect the outcomes of other individuals. In this paper we briefly review the literature on causal inference in the presence of interference when treatments have been randomized. We then consider settings in which causal effects in the presence of interference are not identified, either because randomization alone does not suffice for identification, or because treatment is not randomized and there may be unmeasured confounders of the treatment-outcome relationship. We develop sensitivity analysis techniques for these settings. We describe several sensitivity analysis techniques for the infectiousness effect which, in a vaccine trial, captures the effect of the vaccine of one person on protecting a second person from infection even if the first is infected. We also develop two sensitivity analysis techniques for causal effects in the presence of unmeasured confounding which generalize analogous techniques when interference is absent. These two techniques for unmeasured confounding are compared and contrasted.
doi:10.1214/14-STS479
PMCID: PMC4300555
PMID: 25620841
Causal inference; infectiousness effect; interference; sensitivity analysis; spillover effect; stable unit treatment value assumption; vaccine trial
In randomized trials with subgroup analyses, the primary treatment or intervention of interest is randomized but the secondary factors defining subgroups are not. The commentary clarifies when confounding is or is not an issue in subgroup analyses. If investigators are simply interested in targeting subpopulations for intervention, control for confounding does not need to be made. If investigators are interested in intervening on the secondary factor defining the subgroup in order to increase the treatment effect or in attributing the subgroup differences to the secondary factor itself then confounding is relevant and must be controlled for. The point is illustrated using randomized trials published in the literature.
doi:10.7326/0003-4819-154-10-201105170-00008
PMCID: PMC3825512
PMID: 21576536
Loneliness has been shown to longitudinally predict subjective well-being. The authors used data from a longitudinal population-based study (2002–2006) of non-Hispanic white, African-American, and nonblack Latino-American persons born between 1935 and 1952 and living in Cook County, Illinois. They applied marginal structural models for time-varying exposures to examine the magnitude and persistence of the effects of loneliness on subjective well-being and of subjective well-being on loneliness. Their results indicate that, if interventions on loneliness were made 1 and 2 years prior to assessing final subjective well-being, then only the intervention 1 year prior would have an effect (standardized effect = −0.29). In contrast, increases in subjective well-being 1 year prior (standardized effect = −0.26) and 2 years prior (standardized effect = −0.13) to assessing final loneliness would both have an effect on an individual's final loneliness. These effects persist even after control is made for depressive symptoms, social support, and psychiatric conditions and medications as time-varying confounders. Results from this study indicate an asymmetrical and persistent feedback of fairly substantial magnitude between loneliness and subjective well-being. Mechanisms responsible for the asymmetry are discussed. Developing interventions for loneliness and subjective well-being could have substantial psychological and health benefits.
doi:10.1093/aje/kws173
PMCID: PMC3571255
PMID: 23077285
causal models; loneliness; marginal structural models; subjective well-being
Analyses of social network data have suggested that obesity, smoking, happiness and loneliness all travel through social networks. Individuals exert “contagion effects” on one another through social ties and association. These analyses have come under critique because of the possibility that homophily from unmeasured factors may explain these statistical associations and because similar findings can be obtained when the same methodology is applied to height, acne and headaches, for which the conclusion of contagion effects seems somewhat less plausible. We use sensitivity analysis techniques to assess the extent to which supposed contagion effects for obesity, smoking, happiness and loneliness might be explained away by homophily or confounding and the extent to which the critique using analysis of data on height, acne and headaches is relevant. Sensitivity analyses suggest that contagion effects for obesity and smoking cessation are reasonably robust to possible latent homophily or environmental confounding; those for happiness and loneliness are somewhat less so. Supposed effects for height, acne and headaches are all easily explained away by latent homophily and confounding. The methodology that has been employed in past studies for contagion effects in social networks, when used in conjunction with sensitivity analysis, may prove useful in establishing social influence for various behaviors and states. The sensitivity analysis approach can be used to address the critique of latent homophily as a possible explanation of associations interpreted as contagion effects.
doi:10.1177/0049124111404821
PMCID: PMC4288024
PMID: 25580037
Questions of mediation are often of interest in reasoning about mechanisms, and methods have been developed to address these questions. However, these methods make strong assumptions about the absence of confounding. Even if exposure is randomized, there may be mediator-outcome confounding variables. Inference about direct and indirect effects is particularly challenging if these mediator-outcome confounders are affected by the exposure because in this case these effects are not identified irrespective of whether data is available on these exposure-induced mediator-outcome confounders. In this paper, we provide a sensitivity analysis technique for natural direct and indirect effects that is applicable even if there are mediator-outcome confounders affected by the exposure. We give techniques for both the difference and risk ratio scales and compare the technique to other possible approaches.
doi:10.2427/9027
PMCID: PMC4287391
PMID: 25580387
Confounding; direct and indirect effects; mediation; sensitivity analysis
In vaccine trials, the vaccination of one person might prevent the infection of another; a distinction can be drawn between the ways such a protective effect might arise. Consider a setting with 2 persons per household in which one of the 2 is vaccinated. Vaccinating the first person may protect the second person by preventing the first from being infected and passing the infection on to the second. Alternatively, vaccinating the first person may protect the second by rendering the infection less contagious even if the first is infected. This latter mechanism is sometimes referred to as an “infectiousness effect” of the vaccine. Crude estimators for the infectiousness effect will be subject to selection bias due to stratification on a postvaccination event, namely the infection status of the first person. We use theory concerning causal inference under interference along with a principal-stratification framework to show that, although the crude estimator is biased, it is, under plausible assumptions, conservative for what one might define as a causal infectiousness effect. This applies to bias from selection due to the persons in the comparison, and also to selection due to pathogen virulence. We illustrate our results with an example from the literature.
doi:10.1097/EDE.0b013e31822708d5
PMCID: PMC3792580
PMID: 21753730
doi:10.1080/19345747.2012.688412
PMCID: PMC4280833
PMID: 25558296
A sufficient cause interaction between two exposures signals the presence of individuals for whom the outcome would occur only under certain values of the two exposures. When the outcome is dichotomous and all exposures are categorical, then under certain no confounding assumptions, empirical conditions for sufficient cause interactions can be constructed based on the sign of linear contrasts of conditional outcome probabilities between differently exposed subgroups, given confounders. It is argued that logistic regression models are unsatisfactory for evaluating such contrasts, and that Bernoulli regression models with linear link are prone to misspecification. We therefore develop semiparametric tests for sufficient cause interactions under models which postulate probability contrasts in terms of a finite-dimensional parameter, but which are otherwise unspecified. Estimation is often not feasible in these models because it would require nonparametric estimation of auxiliary conditional expectations given high-dimensional variables. We therefore develop ‘multiply robust tests’ under a union model that assumes at least one of several working submodels holds. In the special case of a randomized experiment or a family-based genetic study in which the joint exposure distribution is known by design or Mendelian inheritance, the procedure leads to asymptotically distribution-free tests of the null hypothesis of no sufficient cause interaction.
doi:10.1111/j.1467-9868.2011.01011.x
PMCID: PMC4280915
PMID: 25558182
Double robustness; Effect modification; Gene-environment interaction; Gene-gene interaction; Semiparametric inference; Sufficient cause; Synergism
The causal inference literature has provided definitions of direct and indirect effects based on counterfactuals that generalize the approach found in the social science literature. However, these definitions presuppose well defined hypothetical interventions on the mediator. In many settings there may be multiple ways to fix the mediator to a particular value and these different hypothetical interventions may have very different implications for the outcome of interest. In this paper we consider mediation analysis when multiple versions of the mediator are present. Specifically, we consider the problem of attempting to decompose a total effect of an exposure on an outcome into the portion through the intermediate and the portion through other pathways. We consider the setting in which there are multiple versions of the mediator but the investigator only has access to data on the particular measurement, not which version of the mediator may have brought that value about. We show that the quantity that is estimated as a natural indirect effect using only the available data does indeed have an interpretation as a particular type of mediated effect; however, the quantity estimated as a natural direct effect in fact captures both a true direct effect and an effect of the exposure on the outcome mediated through the effect of the version of the mediator that is not captured by the mediator measurement. The results are illustrated using two examples from the literature, one in which the versions of the mediator are unknown and another in which the mediator itself has been dichotomized.
doi:10.1097/EDE.0b013e31824d5fe7
PMCID: PMC3771529
PMID: 22475830
The sufficient-component cause framework assumes the existence of sets of sufficient causes that bring about an event. For a binary outcome and an arbitrary number of binary causes any set of potential outcomes can be replicated by positing a set of sufficient causes; typically this representation is not unique. A sufficient cause interaction is said to be present if within all representations there exists a sufficient cause in which two or more particular causes are all present. A singular interaction is said to be present if for some subset of individuals there is a unique minimal sufficient cause. Empirical and counterfactual conditions are given for sufficient cause interactions and singular interactions between an arbitrary number of causes. Conditions are given for cases in which none, some or all of a given set of causes affect the outcome monotonically. The relations between these results, interactions in linear statistical models and Pearl’s probability of causation are discussed.
PMCID: PMC4278668
PMID: 25552780
causal inference; counterfactual; epistasis; interaction; potential outcomes; synergism
Vaccination of one person may prevent the infection of another either because the vaccine prevents the first from being infected and from infecting the second, or because, even if the first person is infected, the vaccine may render the infection less infectious. We might refer to the first of these mechanisms as a contagion effect and the second as an infectiousness effect. In the simple setting of a randomized vaccine trial with households of size two, we use counterfactual theory under interference to provide formal definitions of a contagion effect and an unconditional infectiousness effect. Using ideas analogous to mediation analysis, we show that the indirect effect (the effect of one person’s vaccine on another’s outcome) can be decomposed into a contagion effect and an unconditional infectiousness effect on the risk-difference, risk-ratio, odds-ratio and vaccine-efficacy scales. We provide identification assumptions for such contagion and unconditional infectiousness effects, and describe a simple statistical technique to estimate these effects when they are identified. We also give a sensitivity-analysis technique to assess how inferences would change under violations of the identification assumptions. The concepts and results of this paper are illustrated with hypothetical vaccine-trial data.
doi:10.1097/EDE.0b013e31825fb7a0
PMCID: PMC3415570
PMID: 22828661
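The decomposition described in the abstract above can be written out on the risk-difference scale. The notation is an assumption made for this sketch: in a household of two, person 1 is randomized to vaccine status $a \in \{0,1\}$ and person 2 is unvaccinated; $Y_1(a)$ is person 1's infection status under $a$; and $Y_2(a, y_1)$ is person 2's infection status when person 1 has vaccine status $a$ and infection status $y_1$. Which cross-world term is added and subtracted is likewise a choice made here for illustration.

```latex
\begin{align*}
\underbrace{E[Y_2(1, Y_1(1))] - E[Y_2(0, Y_1(0))]}_{\text{indirect effect}}
  &= \underbrace{E[Y_2(1, Y_1(1))] - E[Y_2(1, Y_1(0))]}_{\text{contagion effect}} \\
  &\quad + \underbrace{E[Y_2(1, Y_1(0))] - E[Y_2(0, Y_1(0))]}_{\text{unconditional infectiousness effect}}
\end{align*}
```

The identity holds by adding and subtracting $E[Y_2(1, Y_1(0))]$. The contagion term changes only person 1's infection status, while the infectiousness term changes only person 1's vaccine status with that infection status held at its unvaccinated value.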
The interaction estimates from Bhavnani et al. (Am J Epidemiol. 2012;176(5):387–395) are used to evaluate evidence for mechanistic interaction between coinfecting pathogens for diarrheal disease. Mechanistic interaction is said to be present if there are individuals for whom the outcome would occur if both of 2 exposures are present but would not occur if 1 or the other of the exposures is absent. In the epidemiologic literature, mechanistic interaction is often conceived of as synergism within Rothman's sufficient-cause framework. Tests for additive interaction are sometimes used to assess such synergism or mechanistic interaction, but testing for positive additive interaction only allows for the conclusion of mechanistic interaction under fairly strong “monotonicity” assumptions. Alternative tests for mechanistic interaction, which do not require monotonicity assumptions, have been developed more recently but require more substantial additive interaction to draw the conclusion of the presence of mechanistic interaction. The additive interaction reported by Bhavnani et al. is of sufficient magnitude to provide strong evidence of mechanistic interaction between rotavirus and Giardia and between rotavirus and Escherichia coli/Shigellae, even without any assumptions about monotonicity.
doi:10.1093/aje/kws214
PMCID: PMC3499113
PMID: 22842718
coinfecting pathogens; diarrhea; interaction; mechanism; synergism
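The contrast between the two kinds of test in the abstract above can be sketched using the relative excess risk due to interaction, RERI = RR11 - RR10 - RR01 + 1. The thresholds used below (RERI > 0 when both exposures are assumed to have monotonic effects, RERI > 1 with no monotonicity assumption) are the ones commonly given for these tests; they are stated here as an assumption because the abstract does not list them. The function names and the risk ratios are hypothetical and are not the estimates of Bhavnani et al.; the sketch also works with point estimates only and ignores sampling uncertainty.

```python
def reri(rr10: float, rr01: float, rr11: float) -> float:
    """Relative excess risk due to interaction from three risk ratios.

    rr10: first exposure only, rr01: second exposure only, rr11: both exposures,
    each relative to the doubly unexposed group.
    """
    return rr11 - rr10 - rr01 + 1.0


def mechanistic_interaction_evidence(rr10: float, rr01: float, rr11: float) -> dict:
    """Check the point estimate against the two assumed thresholds."""
    value = reri(rr10, rr01, rr11)
    return {
        "reri": value,
        # Assumed threshold when both exposures have monotonic effects.
        "under_monotonicity": value > 0.0,
        # Assumed, more demanding threshold with no monotonicity assumption.
        "without_monotonicity": value > 1.0,
    }


if __name__ == "__main__":
    # Hypothetical risk ratios for two coinfecting pathogens.
    print(mechanistic_interaction_evidence(rr10=2.0, rr01=3.0, rr11=8.0))
    print(mechanistic_interaction_evidence(rr10=2.0, rr01=2.0, rr11=3.5))
```

The second call shows additive interaction that would count as evidence only under monotonicity, which is the situation in which the stronger test cannot draw a conclusion.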
Summary
The causal inference literature has provided a clear formal definition of confounding expressed in terms of counterfactual independence. The causal inference literature has not, however, produced a clear formal definition of a confounder, as it has given priority to the concept of confounding over that of a confounder. We consider a number of candidate definitions arising from various more informal statements made in the literature. We consider the properties satisfied by each candidate definition, principally focusing on (i) whether under the candidate definition control for all “confounders” suffices to control for “confounding” and (ii) whether each confounder in some context helps eliminate or reduce confounding bias. Several of the candidate definitions do not have these two properties. Only one candidate definition of those considered satisfies both properties. We propose that a “confounder” be defined as a pre-exposure covariate C for which there exists a set of other covariates X such that the effect of the exposure on the outcome is unconfounded conditional on (X, C) but such that for no proper subset of (X, C) is the effect of the exposure on the outcome unconfounded given the subset. We propose referring to a variable that helps reduce but not eliminate bias as a “surrogate confounder.”
PMCID: PMC4276366
PMID: 25544784
Adjustment; causal diagrams; causal inference; counterfactuals; confounder; minimal sufficiency