Related Articles

1.  A Note on formulae for causal mediation analysis in an odds ratio context
Epidemiologic methods  2014;2(1):21-31.
In a recent manuscript, VanderWeele and Vansteelandt (American Journal of Epidemiology, 2010;172:1339–1348) (hereafter VWV) build on results due to Judea Pearl on causal mediation analysis and derive simple closed-form expressions for so-called natural direct and indirect effects in an odds ratio context for a binary outcome and a continuous mediator. The expressions obtained by VWV make two key simplifying assumptions: (A) the mediator is normally distributed with constant variance; (B) the binary outcome is rare. Assumption A may not be appropriate in settings where, as can happen in routine epidemiologic applications, the distribution of the mediator variable is highly skewed. However, in this note, the author establishes that under a key assumption of “no mediator-exposure interaction” in the logistic regression model for the outcome, the simple formulae of VWV continue to hold even when the normality assumption of the mediator is dropped. The author further shows that when the “no interaction” assumption is relaxed, the formula of VWV for the natural indirect effect in this setting continues to apply when assumption A is also dropped. However, an alternative formula to that of VWV for the natural direct effect is required in this context and is provided in an appendix. When the disease is not rare, the author replaces assumptions A and B with an assumption C that the mediator follows a so-called Bridge distribution, in which case simple closed-form formulae are again obtained for the natural direct and indirect effects.
PMCID: PMC4193811  PMID: 25309848
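The abstract above does not reproduce the formulae themselves. As a rough illustration only, the sketch below uses the form in which the no-interaction, rare-outcome expressions are commonly stated: with an outcome model logit P(Y=1 | a, m, c) = theta0 + theta1*a + theta2*m + theta4'c and a mediator model E[M | a, c] = beta0 + beta1*a + beta2'c, the natural direct effect odds ratio is exp(theta1*(a - a_star)) and the natural indirect effect odds ratio is exp(theta2*beta1*(a - a_star)). The symbol names and the function name are assumptions introduced here, not taken from the listing.

```python
import math


def natural_effect_odds_ratios(theta1, theta2, beta1, a=1.0, a_star=0.0):
    """Odds ratios for natural direct/indirect effects (hypothetical helper).

    Assumes a rare binary outcome and NO exposure-mediator interaction.
    theta1: exposure coefficient in the logistic outcome model
    theta2: mediator coefficient in the logistic outcome model
    beta1:  exposure coefficient in the linear mediator model
    a, a_star: exposure levels being compared
    """
    delta = a - a_star
    or_nde = math.exp(theta1 * delta)          # natural direct effect
    or_nie = math.exp(theta2 * beta1 * delta)  # natural indirect effect
    or_total = or_nde * or_nie                 # effects multiply on the OR scale
    return or_nde, or_nie, or_total
```

For example, `natural_effect_odds_ratios(math.log(2), 1.0, math.log(3))` describes a setting in which the direct and indirect odds ratios are 2 and 3, so that the total effect odds ratio is their product.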
2.  Mediation Analysis for Nonlinear Models with Confounding 
Epidemiology (Cambridge, Mass.)  2012;23(6):879-888.
Recently, researchers have used a potential-outcome framework to estimate causally interpretable direct and indirect effects of an intervention or exposure on an outcome. One approach to causal-mediation analysis uses the so-called mediation formula to estimate the natural direct and indirect effects. This approach generalizes classical mediation estimators and allows for arbitrary distributions for the outcome variable and mediator. A limitation of the standard (parametric) mediation formula approach is that it requires a specified mediator regression model and distribution; such a model may be difficult to construct and may not be of primary interest. To address this limitation, we propose a new method for causal-mediation analysis that uses the empirical distribution function, thereby avoiding parametric distribution assumptions for the mediator. In order to adjust for confounders of the exposure-mediator and exposure-outcome relationships, inverse-probability weighting is incorporated based on a supplementary model of the probability of exposure. This method, which yields estimates of the natural direct and indirect effects for a specified reference group, is applied to data from a cohort study of dental caries in very-low-birth-weight adolescents to investigate the oral-hygiene index as a possible mediator. Simulation studies show low bias in the estimation of direct and indirect effects in a variety of distribution scenarios, whereas the standard mediation formula approach can be considerably biased when the distribution of the mediator is incorrectly specified.
PMCID: PMC3773310  PMID: 23007042
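The idea of replacing a parametric mediator model with the empirical distribution can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' estimator: `mu` is a hypothetical fitted outcome-regression function giving the expected outcome at exposure level `a` and mediator value `m`, and `weights` stands in for weights derived from a separate exposure model (for instance inverse-probability weights), whose construction is not shown.

```python
from typing import Callable, Dict, Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average; raises ValueError if the weights sum to zero."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def natural_effects(
    exposure: Sequence[int],
    mediator: Sequence[float],
    weights: Sequence[float],
    mu: Callable[[int, float], float],
) -> Dict[str, float]:
    """Mediation-formula estimates using observed mediator values.

    Instead of integrating mu(a, m) over a parametric mediator density,
    average it over the (weighted) mediator values actually observed in
    the unexposed (a == 0) and exposed (a == 1) groups.
    """
    m0 = [m for a, m in zip(exposure, mediator) if a == 0]
    w0 = [w for a, w in zip(exposure, weights) if a == 0]
    m1 = [m for a, m in zip(exposure, mediator) if a == 1]
    w1 = [w for a, w in zip(exposure, weights) if a == 1]

    y1_m0 = weighted_mean([mu(1, m) for m in m0], w0)  # E[Y(1, M(0))]
    y0_m0 = weighted_mean([mu(0, m) for m in m0], w0)  # E[Y(0, M(0))]
    y1_m1 = weighted_mean([mu(1, m) for m in m1], w1)  # E[Y(1, M(1))]

    return {"nde": y1_m0 - y0_m0, "nie": y1_m1 - y1_m0}
```

With a linear `mu` and unit weights the result reduces to the familiar product-of-coefficients answer, which makes a convenient sanity check before using a nonlinear outcome model.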
3.  Identification and efficient estimation of the natural direct effect among the untreated 
Biometrics  2013;69(2):310-317.
The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable were set to the level it would have been in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the non-parametric model, particularly when the intermediate variable is continuous or high dimensional. In this paper we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, propose a sensitivity analysis for some of the assumptions, and show that this new parameter is equivalent to the NDE in a randomized controlled trial. We also present a targeted minimum loss estimator (TMLE), a locally efficient, double robust substitution estimator for the statistical parameter associated with this causal parameter. The TMLE can be applied to problems with continuous and high dimensional intermediate variables, and can be used to estimate the NDE in a randomized controlled trial with such data. Additionally, we define and discuss the estimation of three related causal parameters: the natural direct effect among the treated, the indirect effect among the untreated and the indirect effect among the treated.
PMCID: PMC3692606  PMID: 23607645
Causal inference; direct effect; indirect effect; mediation analysis; semiparametric models; targeted minimum loss estimation
4.  A Three-way Decomposition of a Total Effect into Direct, Indirect, and Interactive Effects 
Epidemiology (Cambridge, Mass.)  2013;24(2):224-232.
Recent theory in causal inference has provided concepts for mediation analysis and effect decomposition that allow one to decompose a total effect into a direct and an indirect effect. Here, it is shown that what is often taken as an indirect effect can in fact be further decomposed into a “pure” indirect effect and a mediated interactive effect, thus yielding a three-way decomposition of a total effect (direct, indirect, and interactive). This three-way decomposition applies to difference scales and also to additive ratio scales and additive hazard scales. Assumptions needed for the identification of each of these three effects are discussed and simple formulae are given for each when regression models allowing for interaction are used. The three-way decomposition is illustrated by examples from genetic and perinatal epidemiology, and discussion is given to what is gained over the traditional two-way decomposition into simply a direct and an indirect effect.
PMCID: PMC3563853  PMID: 23354283
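The abstract above mentions simple formulae for the three components without listing them. The sketch below derives the components directly from the definitions on the difference scale, assuming linear models with no covariates: E[Y | a, m] = theta0 + theta1*a + theta2*m + theta3*a*m and E[M | a] = beta0 + beta1*a. These models, the symbol names, and the function name are assumptions made for illustration, and the paper's own presentation may differ.

```python
def three_way_decomposition(theta1, theta2, theta3, beta0, beta1,
                            a=1.0, a_star=0.0):
    """Split a total effect into direct, indirect and interactive parts.

    Hypothetical helper for linear outcome and mediator models with an
    exposure-mediator interaction (theta3) and no covariates.
    """
    d = a - a_star
    # Pure direct effect: change exposure, hold mediator at its a_star level.
    pure_direct = (theta1 + theta3 * (beta0 + beta1 * a_star)) * d
    # Pure indirect effect: change mediator, hold exposure at a_star.
    pure_indirect = (theta2 + theta3 * a_star) * beta1 * d
    # Mediated interaction: what remains of the total effect.
    mediated_interaction = theta3 * beta1 * d * d
    return {
        "pure_direct": pure_direct,
        "pure_indirect": pure_indirect,
        "mediated_interaction": mediated_interaction,
        "total": pure_direct + pure_indirect + mediated_interaction,
    }
```

Setting `theta3` to zero makes the interactive component vanish, which recovers the traditional two-way split into a direct and an indirect effect.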
5.  Child Mortality Estimation: Consistency of Under-Five Mortality Rate Estimates Using Full Birth Histories and Summary Birth Histories 
PLoS Medicine  2012;9(8):e1001296.
Romesh Silva assesses and analyzes differences in direct and indirect methods of estimating under-five mortality rates using data collected from full and summary birth histories in Demographic and Health Surveys from West Africa, East Africa, Latin America, and South/Southeast Asia.
Given the lack of complete vital registration data in most developing countries, for many countries it is not possible to accurately estimate under-five mortality rates from vital registration systems. Heavy reliance is often placed on direct and indirect methods for analyzing data collected from birth histories to estimate under-five mortality rates. Yet few systematic comparisons of these methods have been undertaken. This paper investigates whether analysts should use both direct and indirect estimates from full birth histories, and under what circumstances indirect estimates derived from summary birth histories should be used.
Methods and Findings
Using Demographic and Health Surveys data from West Africa, East Africa, Latin America, and South/Southeast Asia, I quantify the differences between direct and indirect estimates of under-five mortality rates, analyze data quality issues, note the relative effects of these issues, and test whether these issues explain the observed differences. I find that indirect estimates are generally consistent with direct estimates, after adjustment for fertility change and birth transference, but do not add substantial additional insight beyond direct estimates. However, choice of direct or indirect method was found to be important in terms of both the adjustment for data errors and the assumptions made about fertility.
Although adjusted indirect estimates are generally consistent with adjusted direct estimates, some notable inconsistencies were observed for countries that had experienced either a political or economic crisis or stalled health transition in their recent past. This result suggests that when a population has experienced a smooth mortality decline or only short periods of excess mortality, both adjusted methods perform equally well. However, the observed inconsistencies identified suggest that the indirect method is particularly prone to bias resulting from violations of its strong assumptions about recent mortality and fertility. Hence, indirect estimates of under-five mortality rates from summary birth histories should be used only for populations that have experienced either smooth mortality declines or only short periods of excess mortality in their recent past.
Please see later in the article for the Editors' Summary.
Editors' Summary
In 1990, 12 million children died before they reached their fifth birthday. Faced with this largely avoidable loss of young lives, in 2000, world leaders set a target of reducing under-five mortality (death) to one-third of its 1990 level by 2015 as Millennium Development Goal 4 (MDG 4); this goal, together with seven others, aims to eradicate extreme poverty globally. To track progress towards MDG 4, experts need accurate estimates of the global and country-specific under-five mortality rate (U5MR, the probability of a child dying before age five). The most reliable sources of data for U5MR estimation are vital registration systems—national records of all births and deaths. Unfortunately, developing countries, which are where most childhood deaths occur, rarely have such records, so full or summary birth histories provide the data for U5MR estimation instead. In full birth histories (FBHs), which are collected through household surveys such as those conducted by Demographic and Health Surveys (DHS), women are asked for the date of birth of all their children and the age at death of any children who have died. In summary birth histories (SBHs), which are collected through household surveys and censuses, women are asked how many children they have had and how many are alive at the time of the survey.
Why Was This Study Done?
“Direct” estimates of U5MRs can be obtained from FBHs because FBHs provide detailed information about the date of death and the exposure of children to the risk of dying. By contrast, because SBHs do not contain information on children's exposure to the risk of dying, “indirect” estimates of U5MR are obtained from SBHs using model life tables (mathematical models of the variation of mortality with age). Indirect estimates are often also derived from FBHs, but few systematic comparisons of direct and indirect methods for U5MR estimation have been undertaken. In this study, Romesh Silva investigates whether direct and indirect methods provide consistent U5MR estimates from FBHs and whether there are any circumstances under which indirect methods provide more reliable U5MR estimates than direct methods.
What Did the Researcher Do and Find?
The researcher used DHS data from West Africa, East Africa, Latin America, and South/Southeast Asia to quantify the differences between direct and indirect estimates of U5MR calculated from the same data and analyzed possible reasons for these differences. Estimates obtained using a version of the “Brass” indirect estimation method were uniformly higher than those obtained using direct estimation. Indirect and direct estimates generally agreed, however, after adjustment for changes in fertility—the Brass method assumes that country-specific fertility (the number of children born to a woman during her reproductive life) remains constant—and for birth transference, an important source of data error in FBHs that arises because DHS field staff can lessen their workload by recording births as occurring before a preset cutoff date rather than after that date. Notably, though, for countries that had experienced political or economic crises, periods of excess mortality due to conflicts, or periods during which the health transition had stalled (as countries become more affluent, overall mortality rates decline and noncommunicable diseases replace infectious diseases as the major causes of death), marked differences between indirect and direct estimates of U5MR remained, even after these adjustments.
What Do These Findings Mean?
Because the countries included in this study do not have vital registration systems, these findings provide no information about the validity of either direct or indirect estimation methods for U5MR estimation. They suggest, however, that for countries where there has been a smooth decline in mortality or only short periods of excess mortality, both direct and indirect methods of U5MR estimation work equally well, after adjustment for changes in fertility and for birth transference, and that indirect estimates add little to the insights provided into childhood mortality by direct estimates. Importantly, the inconsistencies observed between the two methods that remain after adjustment suggest that indirect U5MR estimation is more susceptible to bias (systematic errors that arise because of the assumptions used to estimate U5MR) than direct estimation. Thus, indirect estimates of U5MR from SBHs should be used only for populations that have experienced either smooth mortality declines or only short periods of excess mortality in their recent past.
Additional Information
Please access these websites via the online version of this summary at
This paper is part of a collection of papers on Child Mortality Estimation Methods published in PLOS Medicine
The United Nations Children's Fund (UNICEF) works for children's rights, survival, development, and protection around the world; it provides information on Millennium Development Goal 4, and its Childinfo website provides detailed statistics about child survival and health, including a description of the United Nations Inter-agency Group for Child Mortality Estimation; the 2011 UN IGME report Levels & Trends in Child Mortality is available
The World Health Organization has information about Millennium Development Goal 4 and provides estimates of child mortality rates (some information in several languages)
Further information about the Millennium Development Goals is available
Information is available about infant and child mortality data collected by Demographic and Health Surveys
PMCID: PMC3429405  PMID: 22952436
6.  The role of Homocysteine as a predictor for coronary heart disease 
Background and objective
There is an ongoing debate on the role of the cytotoxic amino acid homocysteine as a causal risk factor for the development of coronary heart disease. Results from multiple case-control studies demonstrate that there is a strong association between high plasma levels of homocysteine and prevalent coronary heart disease, independent of other classic risk factors. Furthermore, results from interventional studies point out that elevated plasma levels of homocysteine may effectively be lowered by the intake of folic acid and B vitamins. In order to use this information for the construction of a new preventive strategy against coronary heart disease, more information is needed: first, whether homocysteine actually is a causal risk factor with relevant predictive properties and, second, whether cardiac morbidity can be reduced by lowering elevated homocysteine plasma concentrations. Currently in Germany, the determination of homocysteine plasma levels is reimbursed by statutory health insurance in patients with manifest coronary heart disease and in patients at high risk for coronary heart disease, but not for screening purposes in asymptomatic low-risk populations.
Against this background the following assessment sets out to answer four questions:
(1) Is an elevated homocysteine plasma concentration a strong, consistent and independent (of other classic risk factors) predictor for coronary heart disease? (2) Does a therapeutic lowering of elevated homocysteine plasma levels reduce the risk of developing coronary events? (3) What is the cost-effectiveness relationship of homocysteine testing for preventive purposes? (4) Are there morally, socially or legally relevant aspects that should be considered when implementing a preventive strategy as outlined above?
In order to answer the first question, a systematic overview of prospective studies and metaanalyses of prospective studies is undertaken. Studies are included that analyse the association of homocysteine plasma levels with future cardiac events in probands without pre-existing coronary heart disease or in population-based samples. To answer the second question, a systematic overview of the literature is prepared, including randomised controlled trials and systematic reviews of randomised controlled trials that determine the effectiveness of homocysteine lowering therapy for the prevention of cardiac events. To answer the third question, economic evaluations of homocysteine testing for preventive purposes are analysed. The methodological quality of all materials is assessed with widely accepted instruments; the evidence is summarized qualitatively.
For the first question, eleven systematic reviews and 33 single studies (prospective cohort studies and nested case-control studies) are available. Among the studies there is profound heterogeneity concerning study populations, classification of exposure (homocysteine measurements, units to express “elevation”), outcome definition and measurement, as well as controlling for confounding (qualitatively and quantitatively). Taking these heterogeneities into consideration, metaanalysis of single patient data with controlling for multiple confounders seems to be the only adequate method of summarizing the results of single studies. The only available analysis of this type shows that in otherwise healthy people homocysteine plasma levels are only a very weak predictor of future cardiac events. The predictive value of the classical risk factors is much stronger. Among the studies that actively exclude patients with pre-existing coronary heart disease, there are no reports of an association between elevated homocysteine plasma levels and future cardiac events.
Eleven randomized controlled trials (ten of them reported in one systematic review) are analysed in order to answer the second question. All trials include populations at high risk for the development of (further) cardiac events. These studies also show marked clinical heterogeneity, primarily concerning the average homocysteine plasma levels at baseline, the type and mode of outcome measurement, and study duration. With one exception, none of the trials shows a risk reduction for cardiac events by lowering homocysteine plasma levels with folate or B vitamins. These results also hold for predefined subgroups with markedly elevated homocysteine plasma levels.
In order to answer the third question, three economic evaluations (modelling studies) of homocysteine testing are available. All economic models are based on the assumption that lowering homocysteine plasma levels results in risk reduction for cardiac events. Since this assumption is contradicted by the results of the interventional studies cited above, there is no evidence left to answer the third question.
Morally, socially or legally relevant aspects of homocysteine assessment are currently not being discussed in the scientific literature.
Discussion and conclusion
Many currently available pieces of evidence contradict a causal role of homocysteine in the pathogenesis of coronary heart disease. Judged against the Bradford Hill criteria, at least three are not fulfilled: the criterion of time sequence (that exposure has to happen before the outcome is measured), the criterion of a strong and consistent association, and the criterion of reversibility. Therefore, homocysteine may, if at all, play a role as a risk indicator but not as a risk factor.
Furthermore, currently available evidence does not imply that for the prevention of coronary heart disease, knowledge of homocysteine plasma levels provides any information that supersedes the information gathered from the examination of classical risk factors. So, currently for the indication of prevention, there is no evidence that homocysteine testing provides any benefit. Against this background there is also no basis for cost-effectiveness calculations.
Further basic research should clarify the discrepant results of case-control studies and prospective studies. There may be a third parameter (confounder) associated with homocysteine metabolism as well as with coronary heart disease. Further epidemiological research could elucidate the role of elevated homocysteine plasma levels as a risk indicator or prognostic indicator in patients with pre-existing coronary heart disease, taking into consideration the classical risk factors.
PMCID: PMC3011327  PMID: 21289945
7.  The Teacher, the Physician and the Person: Exploring Causal Connections between Teaching Performance and Role Model Types Using Directed Acyclic Graphs 
PLoS ONE  2013;8(7):e69449.
In fledgling areas of research, evidence supporting causal assumptions is often scarce due to the small number of empirical studies conducted. In many studies it remains unclear what impact explicit and implicit causal assumptions have on the research findings; only the primary assumptions of the researchers are often presented. This is particularly true for research on the effect of faculty’s teaching performance on their role modeling. Therefore, there is a need for robust frameworks and methods for transparent formal presentation of the underlying causal assumptions used in assessing the causal effects of teaching performance on role modeling. This study explores the effects of different (plausible) causal assumptions on research outcomes.
This study revisits a previously published study about the influence of faculty’s teaching performance on their role modeling (as teacher-supervisor, physician and person). We drew eight directed acyclic graphs (DAGs) to visually represent different plausible causal relationships between the variables under study. These DAGs were subsequently translated into corresponding statistical models, and regression analyses were performed to estimate the associations between teaching performance and role modeling.
The different causal models were compatible with major differences in the magnitude of the relationship between faculty’s teaching performance and their role modeling. Odds ratios for the associations between teaching performance and the three role model types ranged from 31.1 to 73.6 for the teacher-supervisor role, from 3.7 to 15.5 for the physician role, and from 2.8 to 13.8 for the person role.
Different sets of assumptions about causal relationships in role modeling research can be visually depicted using DAGs, which are then used to guide both statistical analysis and interpretation of results. Since study conclusions can be sensitive to different causal assumptions, results should be interpreted in the light of causal assumptions made in each study.
PMCID: PMC3720648  PMID: 23936020
8.  Diagram-based Analysis of Causal Systems (DACS): elucidating inter-relationships between determinants of acute lower respiratory infections among children in sub-Saharan Africa 
Effective interventions require evidence on how individual causal pathways jointly determine disease. Based on the concept of systems epidemiology, this paper develops Diagram-based Analysis of Causal Systems (DACS) as an approach to analyze complex systems, and applies it by examining the contributions of proximal and distal determinants of childhood acute lower respiratory infections (ALRI) in sub-Saharan Africa.
Diagram-based Analysis of Causal Systems combines the use of causal diagrams with multiple routinely available data sources, using a variety of statistical techniques. In a step-by-step process, the causal diagram evolves from conceptual based on a priori knowledge and assumptions, through operational informed by data availability which then undergoes empirical testing, to integrated which synthesizes information from multiple datasets. In our application, we apply different regression techniques to Demographic and Health Survey (DHS) datasets for Benin, Ethiopia, Kenya and Namibia and a pooled World Health Survey (WHS) dataset for sixteen African countries. Explicit strategies are employed to make decisions transparent about the inclusion/omission of arrows, the sign and strength of the relationships and homogeneity/heterogeneity across settings.
Findings about the current state of evidence on the complex web of socio-economic, environmental, behavioral and healthcare factors influencing childhood ALRI, based on DHS and WHS data, are summarized in an integrated causal diagram. Notably, solid fuel use is structured by socio-economic factors and increases the risk of childhood ALRI mortality.
Diagram-based Analysis of Causal Systems is a means of organizing the current state of knowledge about a specific area of research, and a framework for integrating statistical analyses across a whole system. This partly a priori approach is explicit about causal assumptions guiding the analysis and about researcher judgment, and wrong assumptions can be reversed following empirical testing. This approach is well-suited to dealing with complex systems, in particular where data are scarce.
PMCID: PMC3904753  PMID: 24314302
Africa; Children; Acute lower respiratory infections; Pneumonia; Health determinants; Causal diagrams; Multi-factorial causality; Systems epidemiology; Social epidemiology; Environmental epidemiology
9.  Invited Commentary: Decomposing with a Lot of Supposing 
American Journal of Epidemiology  2010;172(12):1349-1351.
In this issue of the Journal, VanderWeele and Vansteelandt (Am J Epidemiol. 2010;172(12):1339–1348) provide simple formulae for estimation of direct and indirect effects using standard logistic regression when the exposure and outcome are binary, the mediator is continuous, and the odds ratio is the chosen effect measure. They also provide concisely stated lists of assumptions necessary for estimation of these effects, including various conditional independencies and homogeneity of exposure and mediator effects over covariate strata. They further suggest that this will allow effect decomposition in case-control studies if the sampling fractions and population outcome prevalence are known with certainty. In this invited commentary, the author argues that, in a well-designed case-control study in which the sampling fraction is known, it should not be necessary to rely on the odds ratio. The odds ratio has well-known deficiencies as a causal parameter, and its use severely complicates evaluation of confounding and effect homogeneity. Although VanderWeele and Vansteelandt propose that a rare disease assumption is not necessary for estimation of controlled direct effects using their approach, collapsibility concerns suggest otherwise when the goal is causal inference rather than merely measuring association. Moreover, their clear statement of assumptions necessary for the estimation of natural/pure effects suggests that these quantities will rarely be viable estimands in observational epidemiology.
PMCID: PMC3139971  PMID: 21036956
causal inference; conditional independence; confounding; decomposition; estimation; interaction; logistic regression; odds ratio
10.  Indirect comparisons of therapeutic interventions 
Health political background
The comparison of the effectiveness of health technologies is not only laid down in German law (Social Code Book V, § 139 and § 35b) but also constitutes a central element of clinical guidelines and decision making in health care. Tools supporting decision making (e.g. Health Technology Assessments (HTA)) are therefore in need of a valid methodological repertoire for these comparisons.
Scientific background
Randomised controlled head-to-head trials which directly compare the effects of different therapies are considered the gold standard methodological approach for the comparison of the efficacy of interventions. Because this type of trial is rarely found, comparisons of efficacy often need to rely on indirect comparisons whose validity is being controversially debated.
Research questions
Research questions for the current assessment are: Which (statistical) methods for indirect comparisons of therapeutic interventions exist, how often are they applied, and how valid are their results in comparison to the results of head-to-head trials?
In a systematic literature search, all medical databases of the German Institute of Medical Documentation and Information (DIMDI) are searched for methodological papers as well as applications of indirect comparisons in systematic reviews. Results of the literature analysis are summarized qualitatively for the characterisation of methods and quantitatively for the frequency of their application.
The validity of the results from indirect comparisons is checked by comparing them to the results from the gold standard, a direct comparison. Data sets from systematic reviews which use both direct and indirect comparisons are tested for consistency by means of the z-statistic.
29 methodological papers and 106 applications of indirect methods in systematic reviews are analysed. Four methods for indirect comparisons can be identified:
Unadjusted indirect comparisons include, independent of any comparator, all randomised controlled trials (RCT) that provide a study arm with the intervention of interest. Adjusted indirect comparisons and metaregression analyses include only those studies that provide one study arm with the intervention of interest and another study arm with a common comparator. While the aforementioned methods use conventional metaanalytical techniques, Mixed treatment comparisons (MTC) use Bayesian statistics. They are able to analyse a complex network of RCT with multiple comparators simultaneously.
During the period from 1999 to 2008, adjusted indirect comparisons were the most commonly used method for indirect comparisons. Since 2006, an increase in the application of the more methodologically challenging MTC has been observed.
For the validity check, 248 data sets which include results of a direct and an indirect comparison are available. The share of statistically significant discrepant results is greatest in the unadjusted indirect comparisons (25.5% [95% CI: 13.1%; 38%]), followed by metaregression analyses (16.7% [95% CI: -13.2%; 46.5%]), adjusted indirect comparisons (12.1% [95% CI: 6.1%; 18%]) and MTC (1.8% [95% CI: -1.7%; 5.2%]). Discrepant results are mainly detected if the basic assumption for an indirect comparison (between-study homogeneity) does not hold. However, a systematic over- or underestimation of the results of direct comparisons by any of the indirect comparison methods was not observed in this sample.
The selection of an appropriate method for an indirect comparison has to account for its validity, the number of interventions to be compared, and the quality as well as the quantity of available studies. Unadjusted indirect comparisons show low validity when contrasted with the results of direct comparisons. Adjusted indirect comparisons and MTC may, under certain circumstances, give results which are consistent with the results of direct comparisons. The limited number of available reviews utilizing metaregression analyses for indirect comparisons currently prohibits empirical evaluation of this methodology.
Given the main prerequisite (a pool of homogeneous and high-quality RCT), the results of head-to-head trials may be pre-estimated by an adjusted indirect comparison or an MTC. In the context of HTA and guideline development they are valuable tools if there is a lack of a direct comparison of the interventions of interest.
PMCID: PMC3011284  PMID: 21289896
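The adjusted indirect comparison described in the abstract above can be illustrated with a minimal sketch. It assumes the widely used Bucher-style calculation, in which the effect of A versus C is obtained from two pooled estimates that share a common comparator B; the abstract does not name a specific estimator, and the function name and all input values below are hypothetical.

```python
import math

def bucher_indirect(log_or_ab, se_ab, log_or_cb, se_cb, z=1.96):
    """Indirect odds ratio of A vs C via common comparator B (illustrative only).

    Inputs are pooled log odds ratios and their standard errors for
    A vs B and C vs B; the two sets of trials are assumed independent.
    """
    # The indirect log odds ratio is the difference of the two direct ones.
    log_or_ac = log_or_ab - log_or_cb
    # Variances add because the two estimates come from independent trials.
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    lower = math.exp(log_or_ac - z * se_ac)
    upper = math.exp(log_or_ac + z * se_ac)
    return math.exp(log_or_ac), lower, upper

# Hypothetical pooled results, not taken from the report above.
or_ac, lower, upper = bucher_indirect(math.log(0.70), 0.10, math.log(0.85), 0.12)
print(f"Indirect OR A vs C: {or_ac:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Because the variances add, the indirect estimate is always less precise than either direct estimate it is built from, and it is only meaningful if the two sets of trials are sufficiently homogeneous, which is the prerequisite the abstract emphasises.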
11.  Cancer risk and the complexity of the interactions between environmental and host factors: HENVINET interactive diagrams as simple tools for exploring and understanding the scientific evidence 
Environmental Health  2012;11(Suppl 1):S9.
Development of graphical/visual presentations of cancer etiology caused by environmental stressors is a process that requires combining the complex biological interactions between xenobiotics in the living and occupational environments with genes (gene-environment interaction) and genomic and non-genomic disease-specific mechanisms in living organisms. Traditionally, presentation of causal relationships includes the statistical association between exposure to one xenobiotic and the disease, corrected for the effect of potential confounders.
Within the FP6 project HENVINET, we aimed at considering together all known agents and mechanisms involved in the development of selected cancer types. Selection of cancer types for causal diagrams was based on the corpus of available data and reported relative risk (RR). In constructing causal diagrams, the complexity of the interactions between xenobiotics was considered a priority in the interpretation of cancer risk. Additionally, gene-environment interactions were incorporated, such as polymorphisms in genes for repair and for phase I and II enzymes involved in the metabolism of xenobiotics and their elimination. Information on possible age or gender susceptibility is also included. Diagrams are user friendly thanks to multistep access to information packages and the possibility of referring to related literature and a glossary of terms. Diagrams cover both chemical and physical agents (ionizing and non-ionizing radiation) and provide basic information on the strength of the association between type of exposure and cancer risk reported by human studies and supported by mechanistic studies. Causal diagrams developed within the HENVINET project represent a valuable source of information for professionals working in the field of environmental health and epidemiology, and serve as educational material for students.
Cancer risk results from a complex interaction of environmental exposures with inherited gene polymorphisms, genetic burden collected during development and non-genomic capacity of response to environmental insults. In order to adopt effective preventive measures and the associated regulatory actions, a comprehensive investigation of cancer etiology is crucial. Variations and fluctuations of cancer incidence in human populations do not necessarily reflect environmental pollution policies or population distribution of polymorphisms of genes known to be associated with increased cancer risk. Tools which may be used in such comprehensive research, including molecular biology applied to field studies, require a methodological shift from the reductionism that has been used until recently as a basic axiom in interpretation of data. The complexity of the interactions between cells, genes and the environment, i.e. the resonance of the living matter with the environment, can be synthesized by systems biology. Within the HENVINET project such a philosophy was followed in order to develop interactive causal diagrams for the investigation of cancers with possible etiology in environmental exposure.
Causal diagrams represent integrated knowledge and a seed tool for their future development and for the development of similar diagrams for other environmentally related diseases such as asthma or sterility. In this paper, the development and application of causal diagrams for cancer are presented and discussed.
PMCID: PMC3388474  PMID: 22759509
12.  A further critique of the analytic strategy of adjusting for covariates to identify biologic mediation 
Epidemiologic research is often devoted to etiologic investigation, and so techniques that may facilitate mechanistic inferences are attractive. Some of these techniques rely on rigid and/or unrealistic assumptions, making the biologic inferences tenuous. The methodology investigated here is effect decomposition: the contrast between effect measures estimated with and without adjustment for one or more variables hypothesized to lie on the pathway through which the exposure exerts its effect. This contrast is typically used to distinguish the exposure's indirect effect, through the specified intermediate variables, from its direct effect, transmitted via pathways that do not involve the specified intermediates.
We apply a causal framework based on latent potential response types to describe the limitations inherent in effect decomposition analysis. For simplicity, we assume three measured binary variables with monotonic effects and randomized exposure, and use difference contrasts as measures of causal effect. Previous authors showed that confounding between the intermediate and the outcome threatens the validity of the decomposition strategy, even if exposure is randomized. We define exchangeability conditions for absence of confounding of the causal effects of exposure and intermediate, and generate two example populations in which the no-confounding conditions are satisfied. In one population we impose an additional prohibition against unit-level interaction (synergism). We evaluate the performance of the decomposition strategy against true values of the causal effects, as defined by the proportions of latent potential response types in the two populations.
We demonstrate that even when there is no confounding, partition of the total effect into direct and indirect effects is not reliably valid. Decomposition is valid only with the additional restriction that the population contain no units in which exposure and intermediate interact to cause the outcome. This restriction implies homogeneity of causal effects across strata of the intermediate.
Reliable effect decomposition requires not only absence of confounding, but also absence of unit-level interaction and use of linear contrasts as measures of causal effect. Epidemiologists should be wary of etiologic inference based on adjusting for intermediates, especially when using ratio effect measures or when absence of interacting potential response types cannot be confidently asserted.
PMCID: PMC526390  PMID: 15507130
effect decomposition; causality; confounding; counterfactual models; bias
13.  Cloak and DAG: A Response to the Comments on our Comment 
NeuroImage  2011;76:446-449.
Our original comment (Lindquist and Sobel 2011) made explicit the types of assumptions neuroimaging researchers are making when directed graphical models (DGMs), which include certain types of structural equation models (SEMs), are used to estimate causal effects. When these assumptions, which many researchers are not aware of, are not met, parameters of these models should not be interpreted as effects. Thus it is imperative that neuroimaging researchers interested in issues involving causation, for example, effective connectivity, consider the plausibility of these assumptions for their particular problem before using SEMs. In cases where these additional assumptions are not met, researchers may be able to use other methods and/or design experimental studies where the use of unrealistic assumptions can be avoided. Pearl does not disagree with anything we stated. However, he takes exception to our use of potential outcomes notation, which is the standard notation used in the statistical literature on causal inference, and his comment is devoted to promoting his alternative conventions. Glymour’s comment is based on three claims that he inappropriately attributes to us. Glymour is also more optimistic than we are about the potential of using DGMs to discover causal relations in neuroimaging research; we briefly address this issue toward the end of our rejoinder.
PMCID: PMC4121662  PMID: 22119004
14.  Invited Commentary: Structural Equation Models and Epidemiologic Analysis 
American Journal of Epidemiology  2012;176(7):608-612.
In this commentary, structural equation models (SEMs) are discussed as a tool for epidemiologic analysis. Such models are related to and compared with other analytic approaches often used in epidemiology, including regression analysis, causal diagrams, causal mediation analysis, and marginal structural models. Several of these other approaches in fact developed out of the SEM literature. However, SEMs themselves tend to make much stronger assumptions than these other techniques. SEMs estimate more types of effects than do these other techniques, but this comes at the price of additional assumptions. Many of these assumptions have often been ignored and not carefully evaluated when SEMs have been used in practice. In light of the strong assumptions employed by SEMs, the author argues that they should be used principally for the purposes of exploratory analysis and hypothesis generation when a broad range of effects are potentially of interest.
PMCID: PMC3530375  PMID: 22956513
causal inference; causality; causal modeling; confounding factors (epidemiology); epidemiologic methods; regression analysis; structural equation model
15.  Quantification of collider-stratification bias and the birthweight paradox 
The ‘birthweight paradox’ describes the phenomenon whereby birthweight-specific mortality curves cross when stratified on other exposures, most notably cigarette smoking. The paradox has been noted widely in the literature and numerous explanations and corrections have been suggested. Recently, causal diagrams have been used to illustrate the possibility for collider-stratification bias in models adjusting for birthweight. When two variables share a common effect, stratification on the variable representing that effect induces a statistical relation between otherwise independent factors. This bias has been proposed to explain the birthweight paradox.
Causal diagrams may illustrate sources of bias, but are limited to describing qualitative effects. In this paper, we provide causal diagrams that illustrate the birthweight paradox and use a simulation study to quantify the collider-stratification bias under a range of circumstances. Considered circumstances include exposures with and without direct effects on neonatal mortality, as well as with and without indirect effects acting through birthweight on neonatal mortality. The results of these simulations illustrate that when the birthweight-mortality relation is subject to substantial uncontrolled confounding, the bias on estimates of effect adjusted for birthweight may be sufficient to yield opposite causal conclusions, i.e. a factor that poses increased risk appears protective. Effects on stratum-specific birthweight-mortality curves were considered to illustrate the connection between collider-stratification bias and the crossing of the curves. The simulations demonstrate the conditions necessary to give rise to empirical evidence of the paradox.
PMCID: PMC2743120  PMID: 19689488
collider-stratification bias; birthweight; directed acyclic graphs; neonatal mortality
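The mechanism named in the abstract above, in which stratifying on a common effect induces an association between otherwise independent causes, can be seen in a small simulation. This is only an illustrative sketch and not the simulation design of the paper: the variable names, the continuous Gaussian variables, the threshold of 1.0 and the sample size are all assumptions made for the example.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000

# Two independent causes (hypothetical stand-ins for an exposure and an
# unmeasured risk factor).
exposure = [rng.gauss(0, 1) for _ in range(n)]
unmeasured = [rng.gauss(0, 1) for _ in range(n)]

# A common effect of both causes (hypothetical stand-in for a
# low-birthweight score); this variable is the collider.
collider = [e + u + rng.gauss(0, 1) for e, u in zip(exposure, unmeasured)]

# Marginally, the two causes are uncorrelated.
marginal = pearson(exposure, unmeasured)

# Within one stratum of the collider, they become negatively correlated.
stratum = [i for i in range(n) if collider[i] > 1.0]
within = pearson([exposure[i] for i in stratum],
                 [unmeasured[i] for i in stratum])

print(f"marginal correlation:       {marginal:+.3f}")
print(f"within-stratum correlation: {within:+.3f}")
```

If the unmeasured factor also affects the outcome, this induced association carries over into the stratum-specific exposure-outcome estimate, which is the route by which the bias discussed in the abstract arises.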
16.  Estimation of Causal Mediation Effects for a Dichotomous Outcome in Multiple-Mediator Models using the Mediation Formula 
Statistics in medicine  2013;32(24):4211-4228.
Mediators are intermediate variables in the causal pathway between an exposure and an outcome. Mediation analysis investigates the extent to which exposure effects occur through these variables, thus revealing causal mechanisms. In this paper, we consider the estimation of the mediation effect when the outcome is binary and multiple mediators of different types exist. We give a precise definition of the total mediation effect as well as decomposed mediation effects through individual or sets of mediators using the potential outcomes framework. We formulate a model of joint distribution (probit-normal) using continuous latent variables for any binary mediators to account for correlations among multiple mediators. A mediation formula approach is proposed to estimate the total mediation effect and decomposed mediation effects based on this parametric model. Estimation of mediation effects through individual or subsets of mediators requires an assumption involving the joint distribution of multiple counterfactuals. We conduct a simulation study that demonstrates low bias of mediation effect estimators for two-mediator models with various combinations of mediator types. The results also show that the power to detect a non-zero total mediation effect increases as the correlation coefficient between two mediators increases, while power for individual mediation effects reaches a maximum when the mediators are uncorrelated. We illustrate our approach by applying it to a retrospective cohort study of dental caries in adolescents with low and high socioeconomic status. Sensitivity analysis is performed to assess the robustness of conclusions regarding mediation effects when the assumption of no unmeasured mediator-outcome confounders is violated.
PMCID: PMC3789850  PMID: 23650048
mediation analysis; multiple mediators; latent variables; overall mediation effect; decomposed mediation effect; mediation formula; sensitivity analysis
17.  Handling Missing Data in Randomized Experiments with Noncompliance 
Treatment noncompliance and missing outcomes at posttreatment assessments are common problems in field experiments in naturalistic settings. Although the two complications often occur simultaneously, statistical methods that address both complications have not been routinely considered in data analysis practice in the prevention research field. This paper shows that identification and estimation of causal treatment effects considering both noncompliance and missing outcomes can be relatively easily conducted under various missing data assumptions. We review a few assumptions on missing data in the presence of noncompliance, including the latent ignorability proposed by Frangakis and Rubin (Biometrika 86:365–379, 1999), and show how these assumptions can be used in the parametric complier average causal effect (CACE) estimation framework. As a simple form of sensitivity analysis, we propose the use of alternative missing data assumptions, which will provide a range of causal effect estimates. In this way, we are less likely to settle for a possibly biased causal effect estimate based on a single assumption. We demonstrate how alternative missing data assumptions affect identification of causal effects, focusing on the CACE. The data from the Johns Hopkins School Intervention Study (Ialongo et al., Am J Community Psychol 27:599–642, 1999) are used as an example.
PMCID: PMC2912956  PMID: 20379779
Causal inference; Complier average causal effect; Latent ignorability; Missing at random; Missing data; Noncompliance
18.  An Introduction to Causal Inference* 
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
PMCID: PMC2836213  PMID: 20305706
structural equation models; confounding; graphical methods; counterfactuals; causal effects; potential-outcome; mediation; policy evaluation; causes of effects
19.  A nucleosomal approach to inferring causal relationships of histone modifications 
BMC Genomics  2014;15(Suppl 1):S7.
Histone proteins are subject to various posttranslational modifications (PTMs). Elucidating their functional relationships is crucial to understanding many biological processes. Bayesian network (BN)-based approaches have shown the advantage of revealing causal relationships, rather than simple co-occurrences, of PTMs. Previous works employing BNs to infer causal relationships of PTMs require that all confounders be included. This assumption, however, is unavoidably violated, given that several modifications are often regulated by a common but unobserved factor. An existing non-parametric method can be applied to tackle the problem, but its complexity and inflexibility make it impractical.
We propose a novel BN-based method to infer causal relationships of histone modifications. First, based on the evidence that nucleosome organization in vivo significantly affects the activities of PTM regulators working on the chromatin substrate, hidden confounders of PTMs are selectively introduced by an information-theoretic criterion. Causal relationships are then inferred from a network model of both PTMs and the derived confounders. Application to human epigenomic data shows the advantage of the proposed method in terms of computational performance and support from the literature. Requiring less strict data assumptions also makes it more practical. Interestingly, analysis of the most significant relationships suggests that the proposed method can recover biologically relevant causal effects between histone modifications, which should be important for future investigation of histone crosstalk.
PMCID: PMC4046832  PMID: 24564627
20.  Sensitivity Analysis and Bounding of Causal Effects With Alternative Identifying Assumptions 
When identification of causal effects relies on untestable assumptions regarding nonidentified parameters, sensitivity of causal effect estimates is often questioned. For proper interpretation of causal effect estimates in this situation, deriving bounds on causal parameters or exploring the sensitivity of estimates to scientifically plausible alternative assumptions can be critical. In this paper, we propose a practical way of bounding and sensitivity analysis, where multiple identifying assumptions are combined to construct tighter common bounds. In particular, we focus on the use of competing identifying assumptions that impose different restrictions on the same non-identified parameter. Since these assumptions are connected through the same parameter, direct translation across them is possible. Based on this cross-translatability, various information in the data, carried by alternative assumptions, can be effectively combined to construct tighter bounds on causal effects. Flexibility of the suggested approach is demonstrated focusing on the estimation of the complier average causal effect (CACE) in a randomized job search intervention trial that suffers from noncompliance and subsequent missing outcomes.
PMCID: PMC3150587  PMID: 21822369
alternative assumptions; bounds; causal inference; missing data; noncompliance; principal stratification; sensitivity analysis
The annals of applied statistics  2010;4(1):320-339.
Causal inference approaches in systems genetics exploit quantitative trait loci (QTL) genotypes to infer causal relationships among phenotypes. The genetic architecture of each phenotype may be complex, and poorly estimated genetic architectures may compromise the inference of causal relationships among phenotypes. Existing methods assume QTLs are known or inferred without regard to the phenotype network structure. In this paper we develop a QTL-driven phenotype network method (QTLnet) to jointly infer a causal phenotype network and associated genetic architecture for sets of correlated phenotypes. Randomization of alleles during meiosis and the unidirectional influence of genotype on phenotype allow the inference of QTLs causal to phenotypes. Causal relationships among phenotypes can be inferred using these QTL nodes, enabling us to distinguish among phenotype networks that would otherwise be distribution equivalent. We jointly model phenotypes and QTLs using homogeneous conditional Gaussian regression models, and we derive a graphical criterion for distribution equivalence. We validate the QTLnet approach in a simulation study. Finally, we illustrate with simulated data and a real example how QTLnet can be used to infer both direct and indirect effects of QTLs and phenotypes that co-map to a genomic region.
PMCID: PMC3017382  PMID: 21218138
Causal graphical models; QTL mapping; joint inference of phenotype network and genetic architecture; systems genetics; homogeneous conditional Gaussian regression models; Markov chain Monte Carlo
22.  VennMaster: Area-proportional Euler diagrams for functional GO analysis of microarrays 
BMC Bioinformatics  2008;9:67.
Microarray experiments generate vast amounts of data. The functional context of differentially expressed genes can be assessed by querying the Gene Ontology (GO) database via GoMiner. Directed acyclic graph representations, which are used to depict GO categories enriched with differentially expressed genes, are difficult to interpret and, depending on the particular analysis, may not be well suited for formulating new hypotheses. Additional graphical methods are therefore needed to augment the GO graphical representation.
We present an alternative visualization approach, area-proportional Euler diagrams, showing set relationships with semi-quantitative size information in a single diagram to support biological hypothesis formulation. The cardinalities of sets and intersection sets are represented by area-proportional Euler diagrams and their corresponding graphical (circular or polygonal) intersection areas. Optimally proportional representations are obtained using swarm and evolutionary optimization algorithms.
VennMaster's area-proportional Euler diagrams effectively structure and visualize the results of a GO analysis by indicating to what extent flagged genes are shared by different categories. In addition to reducing the complexity of the output, the visualizations facilitate generation of novel hypotheses from the analysis of seemingly unrelated categories that share differentially expressed genes.
PMCID: PMC2335321  PMID: 18230172
23.  Toward Causal Inference With Interference 
A fundamental assumption usually made in causal inference is that of no interference between individuals (or units); that is, the potential outcomes of one individual are assumed to be unaffected by the treatment assignment of other individuals. However, in many settings, this assumption obviously does not hold. For example, in the dependent happenings of infectious diseases, whether one person becomes infected depends on who else in the population is vaccinated. In this article, we consider a population of groups of individuals where interference is possible between individuals within the same group. We propose estimands for direct, indirect, total, and overall causal effects of treatment strategies in this setting. Relations among the estimands are established; for example, the total causal effect is shown to equal the sum of direct and indirect causal effects. Using an experimental design with a two-stage randomization procedure (first at the group level, then at the individual level within groups), unbiased estimators of the proposed estimands are presented. Variances of the estimators are also developed. The methodology is illustrated in two different settings where interference is likely: assessing causal effects of housing vouchers and of vaccines.
PMCID: PMC2600548  PMID: 19081744
Group-randomized trials; Potential outcomes; Stable unit treatment value assumption; SUTVA; Vaccine
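The relation stated in the abstract above, that the total causal effect equals the sum of the direct and indirect effects, can be written out as a short worked identity. The notation here is an assumption made for illustration and may differ from the article's own: $\bar{Y}(a;\psi)$ denotes the population average potential outcome for individual treatment $a \in \{0,1\}$ when groups are assigned treatment strategy $\psi$, and $\phi$ is a second strategy. The sign convention (untreated minus treated) is likewise assumed.

```latex
\begin{align*}
\overline{DE}(\psi)      &= \bar{Y}(0;\psi) - \bar{Y}(1;\psi)
  && \text{(direct effect under strategy } \psi\text{)} \\
\overline{IE}(\phi,\psi) &= \bar{Y}(0;\phi) - \bar{Y}(0;\psi)
  && \text{(indirect effect of } \psi \text{ vs. } \phi\text{)} \\
\overline{TE}(\phi,\psi) &= \bar{Y}(0;\phi) - \bar{Y}(1;\psi)
  && \text{(total effect)} \\[4pt]
\overline{DE}(\psi) + \overline{IE}(\phi,\psi)
  &= \bigl[\bar{Y}(0;\psi) - \bar{Y}(1;\psi)\bigr]
   + \bigl[\bar{Y}(0;\phi) - \bar{Y}(0;\psi)\bigr] \\
  &= \bar{Y}(0;\phi) - \bar{Y}(1;\psi)
   = \overline{TE}(\phi,\psi).
\end{align*}
```

Under these definitions the $\bar{Y}(0;\psi)$ terms cancel, so the decomposition holds by construction rather than requiring a further assumption.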
24.  The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients 
American Journal of Epidemiology  2013;177(4):292-298.
It is common to present multiple adjusted effect estimates from a single model in a single table. For example, a table might show odds ratios for one or more exposures and also for several confounders from a single logistic regression. This can lead to mistaken interpretations of these estimates. We use causal diagrams to display the sources of the problems. Presentation of exposure and confounder effect estimates from a single model may lead to several interpretative difficulties, inviting confusion of direct-effect estimates with total-effect estimates for covariates in the model. These effect estimates may also be confounded even though the effect estimate for the main exposure is not confounded. Interpretation of these effect estimates is further complicated by heterogeneity (variation, modification) of the exposure effect measure across covariate levels. We offer suggestions to limit potential misunderstandings when multiple effect estimates are presented, including precise distinction between total and direct effect measures from a single model, and use of multiple models tailored to yield total-effect estimates for covariates.
PMCID: PMC3626058  PMID: 23371353
causal diagrams; causal inference; confounding; direct effects; epidemiologic methods; mediation analysis; regression modeling
25.  A Comparative Analysis of Models Used to Evaluate the Cost-Effectiveness of Dabigatran Versus Warfarin for the Prevention of Stroke in Atrial Fibrillation 
Pharmacoeconomics  2013;31(7):589-604.
A number of models exploring the cost-effectiveness of dabigatran versus warfarin for stroke prevention in atrial fibrillation have been published. These studies found dabigatran was generally cost-effective, considering well-accepted willingness-to-pay thresholds, but estimates of the incremental cost-effectiveness ratios (ICERs) varied, even in the same setting. The objective of this study was to compare the findings of the published economic models and identify key model features accounting for differences.
All aspects of the economic evaluations were reviewed: model approach, inputs, and assumptions. A previously published model served as the reference model for comparisons of the selected studies in the US and UK settings. The reference model was adapted, wherever possible, using the inputs and key assumptions from each of the other published studies to determine if results could be reproduced in the reference model. Incremental total costs, incremental quality-adjusted life years (QALYs), and ICERs (cost per QALY) were compared between each study and the corresponding adapted reference model. The impact of each modified variable or assumption was tracked separately.
The selected studies were in the US setting (2), the Canadian setting (1), and the UK setting (2). All models used the Randomized Evaluation of Long-Term Anticoagulation study (RE-LY) as the main source for clinical inputs, and all used a Markov modelling approach, except one that used discrete event simulation. The reference model had been published in the Canadian and UK settings. In the UK setting, the reference model reported an ICER of UK£4,831, whereas the other UK-based analysis reported an ICER of UK£23,082. When the reference model was modified to use the same population characteristics, cost inputs, and utility inputs, it reproduced the results of the other model (ICER UK£25,518) reasonably well. Key reasons for the different results between the two models were the assumptions on the event utility decrement and costs associated with intracranial haemorrhage, as well as the costs of warfarin monitoring and disability following events. In the US setting, the reference model produced an ICER similar to the ICER from one of the US models (US$15,115/QALY versus US$12,386/QALY, respectively) when modelling assumptions and input values were transferred into the reference model. Key differences in results could be explained by the population characteristics (age and baseline stroke risk), utility assigned to events and specific treatments, adjustment of stroke and intracranial haemorrhage risk over time, and treatment discontinuation and switching. The reference model was able to replicate the QALY results, but not the cost results, reported by the other US cost-effectiveness analysis. The parameters driving the QALY results were utility values by disability levels as well as utilities assigned to specific treatments, and event and background mortality rates.
Despite differences in model designs and structures, it was mostly possible to replicate the results published by different authors and identify variables responsible for differences between ICERs using a reference model approach. This enables a better interpretation of published findings by focusing attention on the assumptions underlying the key model features accounting for differences.
PMCID: PMC3691493  PMID: 23615895
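The incremental cost-effectiveness ratio compared across models in the abstract above is the incremental total cost divided by the incremental QALYs. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical and are not taken from any of the reviewed models.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    delta_cost = cost_new - cost_old
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        # The ratio is undefined when the two strategies yield equal QALYs.
        raise ValueError("incremental QALYs are zero; ICER is undefined")
    return delta_cost / delta_qaly

# Hypothetical lifetime totals per patient (illustrative values only).
ratio = icer(cost_new=12000.0, cost_old=10000.0, qaly_new=8.5, qaly_old=8.0)
print(f"ICER: {ratio:,.0f} per QALY gained")
```

Because the ratio depends on small differences in both numerator and denominator, modest changes in cost or utility inputs can shift it considerably, which is consistent with the spread of published ICERs the abstract describes.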
