Summary
The causal inference literature has provided a clear formal definition of confounding expressed in terms of counterfactual independence. The causal inference literature has not, however, produced a clear formal definition of a confounder, as it has given priority to the concept of confounding over that of a confounder. We consider a number of candidate definitions arising from various more informal statements made in the literature. We consider the properties satisfied by each candidate definition, principally focusing on (i) whether under the candidate definition control for all “confounders” suffices to control for “confounding” and (ii) whether each confounder in some context helps eliminate or reduce confounding bias. Several of the candidate definitions do not have these two properties. Only one candidate definition of those considered satisfies both properties. We propose that a “confounder” be defined as a pre-exposure covariate C for which there exists a set of other covariates X such that effect of the exposure on the outcome is unconfounded conditional on (X, C) but such that for no proper subset of (X, C) is the effect of the exposure on the outcome unconfounded given the subset. A variable that helps reduce bias but not eliminate bias we propose referring to as a “surrogate confounder.”
PMCID: PMC4276366
PMID: 25544784
Adjustment; causal diagrams; causal inference; counterfactuals; confounder; minimal sufficiency
We propose a new criterion for confounder selection when the underlying causal structure is unknown and only limited knowledge is available. We assume all covariates being considered are pretreatment variables and that for each covariate it is known (i) whether the covariate is a cause of treatment, and (ii) whether the covariate is a cause of the outcome. The causal relationships the covariates have with one another is assumed unknown. We propose that control be made for any covariate that is either a cause of treatment or of the outcome or both. We show that irrespective of the actual underlying causal structure, if any subset of the observed covariates suffices to control for confounding then the set of covariates chosen by our criterion will also suffice. We show that other, commonly used, criteria for confounding control do not have this property. We use formal theory concerning causal diagrams to prove our result but the application of the result does not rely on familiarity with causal diagrams. An investigator simply need ask, “Is the covariate a cause of the treatment?” and “Is the covariate a cause of the outcome?” If the answer to either question is “yes” then the covariate is included for confounder control. We discuss some additional covariate selection results that preserve unconfoundedness and that may be of interest when used with our criterion.
doi:10.1111/j.1541-0420.2011.01619.x
PMCID: PMC3166439
PMID: 21627630
Causal inference; confounding; covariate selection; directed acyclic graphs
Various assumptions have been used in the literature to identify natural direct and indirect effects in mediation analysis. These effects are of interest because they allow for effect decomposition of a total effect into a direct and indirect effect even in the presence of interactions or non-linear models. In this paper, we consider the relation and interpretation of various identification assumptions in terms of causal diagrams interpreted as a set of non-parametric structural equations. We show that for such causal diagrams, two sets of assumptions for identification that have been described in the literature are in fact equivalent in the sense that if either set of assumptions holds for all models inducing a particular causal diagram, then the other set of assumptions will also hold for all models inducing that diagram. We moreover build on prior work concerning a complete graphical identification criterion for covariate adjustment for total effects to provide a complete graphical criterion for using covariate adjustment to identify natural direct and indirect effects. Finally, we show that this criterion is equivalent to the two sets of independence assumptions used previously for mediation analysis.
doi:10.2202/1557-4679.1297
PMCID: PMC3083137
PMID: 21556286
adjustment; causal diagrams; confounding; covariate adjustment; mediation; natural direct and indirect effects
Abstract
Inference of biological networks from high-throughput data is a central problem in bioinformatics. Particularly powerful for network reconstruction is data collected by recent studies that contain both genetic variation information and gene expression profiles from genetically distinct strains of an organism. Various statistical approaches have been applied to these data to tease out the underlying biological networks that govern how individual genetic variation mediates gene expression and how genes regulate and interact with each other. Extracting meaningful causal relationships from these networks remains a challenging but important problem. In this article, we use causal inference techniques to infer the presence or absence of causal relationships between yeast gene expressions in the framework of graphical causal models. We evaluate our method using a well studied dataset consisting of both genetic variations and gene expressions collected over randomly segregated yeast strains. Our predictions of causal regulators, genes that control the expression of a large number of target genes, are consistent with previously known experimental evidence. In addition, our method can detect the absence of causal relationships and can distinguish between direct and indirect effects of variation on a gene expression level.
doi:10.1089/cmb.2009.0176
PMCID: PMC3198891
PMID: 20377462
algorithms; gene networks; machine learning; regulatory regions