Related Articles

1.  Semiparametric Theory for Causal Mediation Analysis: efficiency bounds, multiple robustness, and sensitivity analysis 
Annals of statistics  2012;40(3):1816-1845.
Whilst estimation of the marginal (total) causal effect of a point exposure on an outcome is arguably the most common objective of experimental and observational studies in the health and social sciences, in recent years, investigators have also become increasingly interested in mediation analysis. Specifically, upon evaluating the total effect of the exposure, investigators routinely wish to make inferences about the direct or indirect pathways of the effect of the exposure not through or through a mediator variable that occurs subsequently to the exposure and prior to the outcome. Although powerful semiparametric methodologies have been developed to analyze observational studies, that produce double robust and highly efficient estimates of the marginal total causal effect, similar methods for mediation analysis are currently lacking. Thus, this paper develops a general semiparametric framework for obtaining inferences about so-called marginal natural direct and indirect causal effects, while appropriately accounting for a large number of pre-exposure confounding factors for the exposure and the mediator variables. Our analytic framework is particularly appealing, because it gives new insights on issues of efficiency and robustness in the context of mediation analysis. In particular, we propose new multiply robust locally efficient estimators of the marginal natural indirect and direct causal effects, and develop a novel double robust sensitivity analysis framework for the assumption of ignorability of the mediator variable.
PMCID: PMC4710381  PMID: 26770002
Natural direct effects; Natural indirect effects; double robust; mediation analysis; local efficiency
2.  Estimation of a Semiparametric Natural Direct Effect Model Incorporating Baseline Covariates 
Biometrika  2014;101(4):849-864.
Establishing cause-effect relationships is a standard goal of empirical science. Once the presence of a causal relationship is established, the precise causal mechanism involved becomes a topic of interest. A particularly popular type of mechanism analysis concerns questions of mediation, that is to what extent an effect is direct, and to what extent it is mediated by a third variable. A semiparametric theory has recently been proposed which allows multiply robust estimation of direct and mediated marginal effect functionals in observational studies (Tchetgen Tchetgen & Shpitser, 2012). In this paper we extend the new theory to handle parametric models of natural direct and indirect effects within levels of pre-exposure variables with an identity or log link function, where the model for the observed data likelihood is otherwise unrestricted. We show that estimation is generally not feasible in this model because of the curse of dimensionality associated with the required estimation of auxiliary conditional densities or expectations, given high-dimensional covariates. Thus, we consider multiply robust estimation and propose a more general model which assumes that a subset but not all of several working models holds.
PMCID: PMC4396536  PMID: 25892739
Local Efficiency; Mediation; Multiple Robustness; Natural Direct Effect; Natural Indirect Effect
3.  Sensitivity analyses for parametric causal mediation effect estimation 
Biostatistics (Oxford, England)  2014;16(2):339-351.
Causal mediation analysis uses a potential outcomes framework to estimate the direct effect of an exposure on an outcome and its indirect effect through an intermediate variable (or mediator). Causal interpretations of these effects typically rely on sequential ignorability. Because this assumption is not empirically testable, it is important to conduct sensitivity analyses. Sensitivity analyses so far offered for this situation have either focused on the case where the outcome follows a linear model or involve nonparametric or semiparametric models. We propose alternative approaches that are suitable for responses following generalized linear models. The first approach uses a Gaussian copula model involving latent versions of the mediator and the final outcome. The second approach uses a so-called hybrid causal-observational model that extends the association model for the final outcome, providing a novel sensitivity parameter. These models, while still assuming a randomized exposure, allow for unobserved (as well as observed) mediator-outcome confounders that are not affected by exposure. The methods are applied to data from a study of the effect of mother education on dental caries in adolescence.
PMCID: PMC4441101  PMID: 25395683
Causal inference; Copula; Interaction; Mediation analysis; Mediation formula; Potential outcome; Structural equations model
4.  Evaluating the Effect of Early Versus Late ARV Regimen Change if Failure on an Initial Regimen: Results From the AIDS Clinical Trials Group Study A5095 
The current goal of initial antiretroviral (ARV) therapy is suppression of plasma human immunodeficiency virus (HIV)-1 RNA levels to below 200 copies per milliliter. A proportion of HIV-infected patients who initiate antiretroviral therapy in clinical practice or antiretroviral clinical trials either fail to suppress HIV-1 RNA or have HIV-1 RNA levels rebound on therapy. Frequently, these patients have sustained CD4 cell count responses and limited or no clinical symptoms and, therefore, have potentially limited indications for altering therapy, which they may be tolerating well despite increased viral replication. On the other hand, increased viral replication on therapy leads to selection of resistance mutations to the antiretroviral agents comprising their therapy and potentially cross-resistance to other agents in the same class, decreasing the likelihood of response to subsequent antiretroviral therapy. The optimal time to switch antiretroviral therapy to ensure sustained virologic suppression and prevent clinical events in patients who have rebound in their HIV-1 RNA, yet are stable, is not known. Randomized clinical trials to compare early versus delayed switching have been difficult to design and more difficult to enroll. In some clinical trials, such as the AIDS Clinical Trials Group (ACTG) Study A5095, patients randomized to initial antiretroviral treatment combinations who fail to suppress HIV-1 RNA or have a rebound of HIV-1 RNA on therapy are allowed to switch from the initial ARV regimen to a new regimen, based on clinician and patient decisions. We delineate a statistical framework to estimate the effect of early versus late regimen change using data from ACTG A5095 in the context of two-stage designs.
In causal inference, a large class of doubly robust estimators are derived through semiparametric theory with applications to missing data problems. This class of estimators is motivated through geometric arguments and relies on large samples for good performance. By now, several authors have noted that a doubly robust estimator may be suboptimal when the outcome model is misspecified even if it is semiparametric efficient when the outcome regression model is correctly specified. Through auxiliary variables, two-stage designs, and within the contextual backdrop of our scientific problem and clinical study, we propose improved doubly robust, locally efficient estimators of a population mean and average causal effect for early versus delayed switching to second-line ARV treatment regimens. Our analysis of the ACTG A5095 data further demonstrates how methods that use auxiliary variables can improve over methods that ignore them. Using the methods developed here, we conclude that patients who switch within 8 weeks of virologic failure have better clinical outcomes, on average, than patients who delay switching to a new second-line ARV regimen after failing on the initial regimen. Ordinary statistical methods fail to find such differences. This article has online supplementary material.
PMCID: PMC3545451  PMID: 23329858
Causal inference; Double robustness; Longitudinal data analysis; Missing data; Rubin causal model; Semiparametric efficient estimation
5.  Inverse Odds Ratio-Weighted Estimation for Causal Mediation Analysis 
Statistics in medicine  2013;32(26):4567-4580.
An important scientific goal of studies in the health and social sciences is increasingly to determine to what extent the total effect of a point exposure is mediated by an intermediate variable on the causal pathway between the exposure and the outcome. A causal framework has recently been proposed for mediation analysis, which gives rise to new definitions, formal identification results and novel estimators of direct and indirect effects. In the present paper, the author describes a new inverse odds ratio-weighted (IORW) approach to estimate so-called natural direct and indirect effects. The approach, which uses as a weight the inverse of an estimate of the odds ratio function relating the exposure and the mediator, is universal in that it can be used to decompose total effects in a number of regression models commonly used in practice. Specifically, the approach may be used for effect decomposition in generalized linear models with a nonlinear link function, and in a number of other commonly used models such as the Cox proportional hazards regression for a survival outcome. The approach is simple and can be implemented in standard software provided a weight can be specified for each observation. An additional advantage of the method is that it easily incorporates multiple mediators of a categorical, discrete or continuous nature.
PMCID: PMC3954805  PMID: 23744517
Causal Mediation Analysis; Inverse odds ratio weighted estimation; natural direct and indirect effects; double robustness
6.  Mechanisms and mediation in survival analysis: towards an integrated analytical framework 
A wide-ranging debate has taken place in recent years on mediation analysis and causal modelling, raising profound theoretical, philosophical and methodological questions. The authors build on the results of these discussions to work towards an integrated approach to the analysis of research questions that situate survival outcomes in relation to complex causal pathways with multiple mediators. The background to this contribution is the increasingly urgent need for policy-relevant research on the nature of inequalities in health and healthcare.
The authors begin by summarising debates on causal inference, mediated effects and statistical models, showing that these three strands of research have powerful synergies. They review a range of approaches which seek to extend existing survival models to obtain valid estimates of mediation effects. They then argue for an alternative strategy, which involves integrating survival outcomes within Structural Equation Models via the discrete-time survival model. This approach can provide an integrated framework for studying mediation effects in relation to survival outcomes, an issue of great relevance in applied health research. The authors provide an example of how these techniques can be used to explore whether the social class position of patients has a significant indirect effect on the hazard of death from colon cancer.
The results suggest that the indirect effects of social class on survival are substantial and negative (-0.23 overall). In addition to the substantial direct effect of this variable (-0.60), its indirect effects account for more than one quarter of the total effect. The two main pathways for this indirect effect, via emergency admission (-0.12), on the one hand, and hospital caseload, on the other, (-0.10) are of similar size.
The discrete-time survival model provides an attractive way of integrating time-to-event data within the field of Structural Equation Modelling. The authors demonstrate the efficacy of this approach in identifying complex causal pathways that mediate the effects of a socio-economic baseline covariate on the hazard of death from colon cancer. The results show that this approach has the potential to shed light on a class of research questions which is of particular relevance in health research.
PMCID: PMC4772586  PMID: 26927506
Causal modelling; Mediation analysis; Social inequalities; Discrete-time survival model; Structural equation modelling; Deprivation index; Ireland; Colon cancer
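The effect sizes quoted in the abstract of entry 6 can be checked with a few lines of arithmetic. The sketch below assumes that direct and indirect effects add up to the total effect on the model's scale, which is the usual convention in structural equation models but is not stated explicitly in the abstract; the variable names are illustrative only.

```python
# Effect estimates as quoted in the abstract of entry 6.
direct_effect = -0.60
indirect_overall = -0.23      # reported overall indirect effect
indirect_emergency = -0.12    # pathway via emergency admission
indirect_caseload = -0.10     # pathway via hospital caseload

# Assumption: effects are additive on the model's scale.
total_effect = direct_effect + indirect_overall
proportion_mediated = indirect_overall / total_effect

# The two named pathways cover most of the overall indirect effect.
# The abstract does not explain the small remainder.
named_pathways = indirect_emergency + indirect_caseload
remainder = indirect_overall - named_pathways

print(f"total effect:        {total_effect:.2f}")
print(f"proportion mediated: {proportion_mediated:.3f}")
print(f"named pathways:      {named_pathways:.2f}")
print(f"remainder:           {remainder:.2f}")
```

Under this assumption the indirect share is about 0.28, which agrees with the abstract's statement that indirect effects account for more than one quarter of the total effect.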
7.  Performance of mixed effects models in the analysis of mediated longitudinal data 
Linear mixed effects models (LMMs) are a common approach for analyzing longitudinal data in a variety of settings. Although LMMs may be applied to complex data structures, such as settings where mediators are present, it is unclear whether they perform well relative to methods for mediational analyses such as structural equation models (SEMs), which have obvious appeal in such settings. For some researchers, SEMs may be more difficult than LMMs to implement, e.g. due to lack of training in the methodology or the need for specialized SEM software. It therefore is of interest to evaluate whether the LMM performs sufficiently in a scenario particularly suitable for SEMs. We focus on evaluation of the total effect (i.e. direct and indirect) of an exposure on an outcome of interest when a mediating factor is present. Our aim is to explore whether the LMM performs as well as the SEM in a setting that is conducive to using the SEM.
We simulated mediated longitudinal data from an SEM where a binary, main independent variable has both direct and indirect effects on a continuous outcome. We conducted analyses with both the LMM and SEM to evaluate the performance of the LMM in a setting where the SEM is expected to be preferable. Models were evaluated with respect to bias, coverage probability and power. Sample size, effect size and error distribution of the simulated data were varied.
Both models performed well in a range of settings. Marginal increases in power estimates were observed for the SEM, although generally there were no major differences in performance. Power for both models was good with a sample size of 250 and a small to medium effect size. Bias did not substantially increase for either model when data were generated from distributions that were both skewed and kurtotic.
In settings where the goal is to evaluate the overall effects, the LMM excluding mediating variables appears to have good performance with respect to power, bias and coverage probability relative to the SEM. The major benefit of the SEM is that it simultaneously and efficiently models both the direct and indirect effects of the mediation process.
PMCID: PMC2842282  PMID: 20170503
8.  Mediation analysis when a continuous mediator is measured with error and the outcome follows a generalized linear model 
Statistics in medicine  2014;33(28):4875-4890.
Mediation analysis is a popular approach to examine the extent to which the effect of an exposure on an outcome is through an intermediate variable (mediator) and the extent to which the effect is direct. When the mediator is mis-measured the validity of mediation analysis can be severely undermined. In this paper we first study the bias of classical, non-differential measurement error on a continuous mediator in the estimation of direct and indirect causal effects in generalized linear models when the outcome is either continuous or discrete and exposure-mediator interaction may be present. Our theoretical results as well as a numerical study demonstrate that in the presence of non-linearities the bias of naive estimators for direct and indirect effects that ignore measurement error can take unintuitive directions. We then develop methods to correct for measurement error. Three correction approaches using method of moments, regression calibration and SIMEX are compared. We apply the proposed method to the Massachusetts General Hospital lung cancer study to evaluate the effect of genetic variants mediated through smoking on lung cancer risk.
PMCID: PMC4224977  PMID: 25220625
Asymptotic bias; Measurement error; Mediation analysis; Method of moments; Regression calibration; SIMEX
9.  On identification of natural direct effects when a confounder of the mediator is directly affected by exposure 
Epidemiology (Cambridge, Mass.)  2014;25(2):282-291.
Natural direct and indirect effects formalize traditional notions of mediation analysis into a rigorous causal framework and have recently received considerable attention in epidemiology and in the social sciences. Sufficient conditions for identification of natural direct effects were formulated by Judea Pearl under a nonparametric structural equations model, which assumes certain independencies between potential outcomes. A common situation in epidemiology is that a confounder of the mediator-outcome relationship is itself affected by the exposure, in which case natural direct effects fail to be nonparametrically identified without additional assumptions, even under Pearl's nonparametric structural equations model. In this paper, we show that when a single binary confounder of the mediator is affected by the exposure, the natural direct effect is nonparametrically identified under the model, assuming monotonicity about the effect of the exposure on the confounder. A similar result is shown to hold for a vector of binary confounders of the mediator under a certain independence assumption about the confounders. Finally, we show that natural direct effects are more generally identified if there is no additive mean interaction between the mediator and confounders of the mediator affected by exposure. When correct, this latter assumption is particularly appealing because it does not require monotonicity of effects of the exposure. Additionally, it places no restriction on the nature of the confounders of the mediator which can be continuous or polytomous.
PMCID: PMC4230499  PMID: 24487211
10.  Mediation Analysis for Nonlinear Models with Confounding 
Epidemiology (Cambridge, Mass.)  2012;23(6):879-888.
Recently, researchers have used a potential-outcome framework to estimate causally interpretable direct and indirect effects of an intervention or exposure on an outcome. One approach to causal-mediation analysis uses the so-called mediation formula to estimate the natural direct and indirect effects. This approach generalizes classical mediation estimators and allows for arbitrary distributions for the outcome variable and mediator. A limitation of the standard (parametric) mediation formula approach is that it requires a specified mediator regression model and distribution; such a model may be difficult to construct and may not be of primary interest. To address this limitation, we propose a new method for causal-mediation analysis that uses the empirical distribution function, thereby avoiding parametric distribution assumptions for the mediator. In order to adjust for confounders of the exposure-mediator and exposure-outcome relationships, inverse-probability weighting is incorporated based on a supplementary model of the probability of exposure. This method, which yields estimates of the natural direct and indirect effects for a specified reference group, is applied to data from a cohort study of dental caries in very-low-birth-weight adolescents to investigate the oral-hygiene index as a possible mediator. Simulation studies show low bias in the estimation of direct and indirect effects in a variety of distribution scenarios, whereas the standard mediation formula approach can be considerably biased when the distribution of the mediator is incorrectly specified.
PMCID: PMC3773310  PMID: 23007042
11.  Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables 
PLoS ONE  2015;10(3):e0120031.
We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that only use a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can provide a tool to make care decisions informed by the patient’s high-dimensional physiological and clinical history. Our VIM parameters are analogous to slope coefficients in adjusted regressions, but are not dependent on a specific statistical model, nor do they require a certain functional form of the prediction regression to be estimated. In addition, they can be causally interpreted under causal and statistical assumptions as the expected outcome under time-specific clinical interventions, related to changes in the mean of the outcome if each individual experiences a specified change in the variable (keeping other variables in the model fixed). Better yet, the targeted MLE used is doubly robust and locally efficient. Because the proposed VIM does not constrain the prediction model fit, we use a very flexible ensemble learner (the SuperLearner), which returns a linear combination of a list of user-given algorithms. Not only is such a prediction algorithm intuitively appealing, it has theoretical justification as being asymptotically equivalent to the oracle selector. The results of the analysis show effects whose size and significance would not have been found using a parametric approach (such as stepwise regression or LASSO). In addition, the procedure is even more compelling as the predictor on which it is based showed significant improvements in cross-validated fit, for instance area under the curve (AUC) for a receiver-operator curve (ROC).
Thus, given that 1) our VIM applies to any model fitting procedure, 2) under assumptions has meaningful clinical (causal) interpretations and 3) has asymptotic (influence-curve) based robust inference, it provides a compelling alternative to existing methods for estimating variable importance in high-dimensional clinical (or other) data.
PMCID: PMC4376910  PMID: 25815719
12.  Mediation analysis of the relationship between institutional research activity and patient survival 
Recent studies have suggested that patients treated in research-active institutions have better outcomes than patients treated in research-inactive institutions. However, little attention has been paid to explaining such effects, probably because techniques for mediation analysis existing so far have not been applicable to survival data.
We investigated the underlying mechanisms using a recently developed method for mediation analysis of survival data. Our analysis of the effect of research activity on patient survival was based on 352 patients who had been diagnosed with advanced ovarian cancer at 149 hospitals in 2001. All hospitals took part in a quality assurance program of the German Cancer Society. Patient outcomes were compared between hospitals participating in clinical trials and non-trial hospitals. Surgical outcome and chemotherapy selection were explored as potential mediators of the effect of hospital research activity on patient survival.
The 219 patients treated in hospitals participating in clinical trials had more complete surgical debulking, were more likely to receive the recommended platinum-taxane combination, and had better survival than the 133 patients treated in non-trial hospitals. Taking into account baseline confounders, the overall adjusted hazard ratio of death was 0.58 (95% confidence interval: 0.42 to 0.79). This effect was decomposed into a direct effect of research activity of 0.67 and two indirect effects of 0.93 each mediated through either optimal surgery or chemotherapy. Taken together, about 26% of the beneficial effect of research activity was mediated through the proposed pathways.
Mediation analysis allows proceeding from the question “Does it work?” to the question “How does it work?” In particular, we have shown that the research activity of a hospital contributes to superior patient survival through better use of surgery and chemotherapy. This methodology may be applied to analyze direct and indirect natural effects for almost any combination of variable types.
PMCID: PMC3917547  PMID: 24447677
Trial effect; Research activity; Healthcare outcomes; Mediation; Survival analysis
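The hazard-ratio decomposition reported in the abstract of entry 12 can be verified numerically. The sketch below assumes that the direct and indirect hazard ratios combine multiplicatively and that the mediated share is measured on the log hazard ratio scale. The abstract does not say how its 26% figure was obtained, so this is one plausible reading that happens to reproduce it, not necessarily the authors' calculation.

```python
import math

# Hazard ratios as quoted in the abstract of entry 12.
hr_total = 0.58               # overall adjusted hazard ratio of death
hr_direct = 0.67              # direct effect of research activity
hr_indirect_surgery = 0.93    # indirect effect via optimal surgery
hr_indirect_chemo = 0.93      # indirect effect via chemotherapy

# Assumption: effects multiply on the hazard ratio scale.
hr_product = hr_direct * hr_indirect_surgery * hr_indirect_chemo

# Assumption: the mediated share is the indirect part of the
# log hazard ratio divided by the total log hazard ratio.
log_indirect = math.log(hr_indirect_surgery) + math.log(hr_indirect_chemo)
share_mediated = log_indirect / math.log(hr_total)

print(f"product of components: {hr_product:.3f}")
print(f"share mediated:        {share_mediated:.1%}")
```

The product of the three components rounds to the reported overall hazard ratio of 0.58, and the mediated share on the log scale comes out between 26% and 27%, in line with the "about 26%" stated in the abstract.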
13.  Estimation of Causal Mediation Effects for a Dichotomous Outcome in Multiple-Mediator Models using the Mediation Formula 
Statistics in medicine  2013;32(24):4211-4228.
Mediators are intermediate variables in the causal pathway between an exposure and an outcome. Mediation analysis investigates the extent to which exposure effects occur through these variables, thus revealing causal mechanisms. In this paper, we consider the estimation of the mediation effect when the outcome is binary and multiple mediators of different types exist. We give a precise definition of the total mediation effect as well as decomposed mediation effects through individual or sets of mediators using the potential outcomes framework. We formulate a model of joint distribution (probit-normal) using continuous latent variables for any binary mediators to account for correlations among multiple mediators. A mediation formula approach is proposed to estimate the total mediation effect and decomposed mediation effects based on this parametric model. Estimation of mediation effects through individual or subsets of mediators requires an assumption involving the joint distribution of multiple counterfactuals. We conduct a simulation study that demonstrates low bias of mediation effect estimators for two-mediator models with various combinations of mediator types. The results also show that the power to detect a non-zero total mediation effect increases as the correlation coefficient between two mediators increases, while power for individual mediation effects reaches a maximum when the mediators are uncorrelated. We illustrate our approach by applying it to a retrospective cohort study of dental caries in adolescents with low and high socioeconomic status. Sensitivity analysis is performed to assess the robustness of conclusions regarding mediation effects when the assumption of no unmeasured mediator-outcome confounders is violated.
PMCID: PMC3789850  PMID: 23650048
mediation analysis; multiple mediators; latent variables; overall mediation effect; decomposed mediation effect; mediation formula; sensitivity analysis
14.  A Three-way Decomposition of a Total Effect into Direct, Indirect, and Interactive Effects 
Epidemiology (Cambridge, Mass.)  2013;24(2):224-232.
Recent theory in causal inference has provided concepts for mediation analysis and effect decomposition that allow one to decompose a total effect into a direct and an indirect effect. Here, it is shown that what is often taken as an indirect effect can in fact be further decomposed into a “pure” indirect effect and a mediated interactive effect, thus yielding a three-way decomposition of a total effect (direct, indirect, and interactive). This three-way decomposition applies to difference scales and also to additive ratio scales and additive hazard scales. Assumptions needed for the identification of each of these three effects are discussed and simple formulae are given for each when regression models allowing for interaction are used. The three-way decomposition is illustrated by examples from genetic and perinatal epidemiology, and discussion is given to what is gained over the traditional two-way decomposition into simply a direct and an indirect effect.
PMCID: PMC3563853  PMID: 23354283
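As a rough illustration of the three-way decomposition described in entry 14, the sketch below works through the simplest case on the difference scale. It assumes two linear working models without covariates, E[Y | a, m] = theta0 + theta1*a + theta2*m + theta3*a*m and E[M | a] = beta0 + beta1*a, and it assumes the usual identification conditions hold. The function name, argument names, and coefficient values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ThreeWayDecomposition:
    pure_direct: float
    pure_indirect: float
    mediated_interaction: float

    @property
    def total(self) -> float:
        # The three components sum to the total effect.
        return self.pure_direct + self.pure_indirect + self.mediated_interaction


def three_way(theta1, theta2, theta3, beta0, beta1, a_ref=0.0, a_new=1.0):
    """Decompose the effect of changing the exposure from a_ref to a_new.

    Assumes E[Y|a,m] = theta0 + theta1*a + theta2*m + theta3*a*m
    and     E[M|a]   = beta0 + beta1*a   (no covariates).
    """
    d = a_new - a_ref
    mean_m_ref = beta0 + beta1 * a_ref
    # Change the exposure while holding the mediator at its reference-level mean.
    pure_direct = (theta1 + theta3 * mean_m_ref) * d
    # Shift the mediator while holding the exposure at the reference level.
    pure_indirect = (theta2 + theta3 * a_ref) * beta1 * d
    # The part that arises only when exposure and mediator change together.
    mediated_interaction = theta3 * beta1 * d * d
    return ThreeWayDecomposition(pure_direct, pure_indirect, mediated_interaction)


# Hypothetical coefficients, chosen only for illustration.
result = three_way(theta1=0.5, theta2=0.3, theta3=0.2, beta0=1.0, beta1=0.4)
print(result, "total =", round(result.total, 3))
```

With these illustrative coefficients the pure direct effect is 0.70, the pure indirect effect 0.12, and the mediated interaction 0.08, which sum to a total effect of 0.90. Setting theta3 to zero removes the interaction term and recovers the familiar two-way decomposition.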
15.  Child Mortality Estimation: Consistency of Under-Five Mortality Rate Estimates Using Full Birth Histories and Summary Birth Histories 
PLoS Medicine  2012;9(8):e1001296.
Romesh Silva assesses and analyzes differences in direct and indirect methods of estimating under-five mortality rates using data collected from full and summary birth histories in Demographic and Health Surveys from West Africa, East Africa, Latin America, and South/Southeast Asia.
Given the lack of complete vital registration data in most developing countries, for many countries it is not possible to accurately estimate under-five mortality rates from vital registration systems. Heavy reliance is often placed on direct and indirect methods for analyzing data collected from birth histories to estimate under-five mortality rates. Yet few systematic comparisons of these methods have been undertaken. This paper investigates whether analysts should use both direct and indirect estimates from full birth histories, and under what circumstances indirect estimates derived from summary birth histories should be used.
Methods and Findings
Using Demographic and Health Surveys data from West Africa, East Africa, Latin America, and South/Southeast Asia, I quantify the differences between direct and indirect estimates of under-five mortality rates, analyze data quality issues, note the relative effects of these issues, and test whether these issues explain the observed differences. I find that indirect estimates are generally consistent with direct estimates, after adjustment for fertility change and birth transference, but don't add substantial additional insight beyond direct estimates. However, choice of direct or indirect method was found to be important in terms of both the adjustment for data errors and the assumptions made about fertility.
Although adjusted indirect estimates are generally consistent with adjusted direct estimates, some notable inconsistencies were observed for countries that had experienced either a political or economic crisis or stalled health transition in their recent past. This result suggests that when a population has experienced a smooth mortality decline or only short periods of excess mortality, both adjusted methods perform equally well. However, the observed inconsistencies identified suggest that the indirect method is particularly prone to bias resulting from violations of its strong assumptions about recent mortality and fertility. Hence, indirect estimates of under-five mortality rates from summary birth histories should be used only for populations that have experienced either smooth mortality declines or only short periods of excess mortality in their recent past.
Editors' Summary
In 1990, 12 million children died before they reached their fifth birthday. Faced with this largely avoidable loss of young lives, in 2000, world leaders set a target of reducing under-five mortality (death) to one-third of its 1990 level by 2015 as Millennium Development Goal 4 (MDG 4); this goal, together with seven others, aims to eradicate extreme poverty globally. To track progress towards MDG 4, experts need accurate estimates of the global and country-specific under-five mortality rate (U5MR, the probability of a child dying before age five). The most reliable sources of data for U5MR estimation are vital registration systems—national records of all births and deaths. Unfortunately, developing countries, which are where most childhood deaths occur, rarely have such records, so full or summary birth histories provide the data for U5MR estimation instead. In full birth histories (FBHs), which are collected through household surveys such as those conducted by Demographic and Health Surveys (DHS), women are asked for the date of birth of all their children and the age at death of any children who have died. In summary birth histories (SBHs), which are collected through household surveys and censuses, women are asked how many children they have had and how many are alive at the time of the survey.
Why Was This Study Done?
“Direct” estimates of U5MRs can be obtained from FBHs because FBHs provide detailed information about the date of death and the exposure of children to the risk of dying. By contrast, because SBHs do not contain information on children's exposure to the risk of dying, “indirect” estimates of U5MR are obtained from SBHs using model life tables (mathematical models of the variation of mortality with age). Indirect estimates are often also derived from FBHs, but few systematic comparisons of direct and indirect methods for U5MR estimation have been undertaken. In this study, Romesh Silva investigates whether direct and indirect methods provide consistent U5MR estimates from FBHs and whether there are any circumstances under which indirect methods provide more reliable U5MR estimates than direct methods.
What Did the Researcher Do and Find?
The researcher used DHS data from West Africa, East Africa, Latin America, and South/Southeast Asia to quantify the differences between direct and indirect estimates of U5MR calculated from the same data and analyzed possible reasons for these differences. Estimates obtained using a version of the “Brass” indirect estimation method were uniformly higher than those obtained using direct estimation. Indirect and direct estimates generally agreed, however, after adjustment for changes in fertility—the Brass method assumes that country-specific fertility (the number of children born to a woman during her reproductive life) remains constant—and for birth transference, an important source of data error in FBHs that arises because DHS field staff can lessen their workload by recording births as occurring before a preset cutoff date rather than after that date. Notably, though, for countries that had experienced political or economic crises, periods of excess mortality due to conflicts, or periods during which the health transition had stalled (as countries become more affluent, overall mortality rates decline and noncommunicable diseases replace infectious diseases as the major causes of death), marked differences between indirect and direct estimates of U5MR remained, even after these adjustments.
What Do These Findings Mean?
Because the countries included in this study do not have vital registration systems, these findings provide no information about the validity of either direct or indirect estimation methods for U5MR estimation. They suggest, however, that for countries where there has been a smooth decline in mortality or only short periods of excess mortality, both direct and indirect methods of U5MR estimation work equally well, after adjustment for changes in fertility and for birth transference, and that indirect estimates add little to the insights provided into childhood mortality by direct estimates. Importantly, the inconsistencies observed between the two methods that remain after adjustment suggest that indirect U5MR estimation is more susceptible to bias (systematic errors that arise because of the assumptions used to estimate U5MR) than direct estimation. Thus, indirect estimates of U5MR from SBHs should be used only for populations that have experienced either smooth mortality declines or only short periods of excess mortality in their recent past.
Additional Information
Please access these websites via the online version of this summary at
This paper is part of a collection of papers on Child Mortality Estimation Methods published in PLOS Medicine
The United Nations Children's Fund (UNICEF) works for children's rights, survival, development, and protection around the world; it provides information on Millennium Development Goal 4, and its Childinfo website provides detailed statistics about child survival and health, including a description of the United Nations Inter-agency Group for Child Mortality Estimation; the 2011 UN IGME report Levels & Trends in Child Mortality is available
The World Health Organization has information about Millennium Development Goal 4 and provides estimates of child mortality rates (some information in several languages)
Further information about the Millennium Development Goals is available
Information is available about infant and child mortality data collected by Demographic and Health Surveys
PMCID: PMC3429405  PMID: 22952436
16.  Likelihood approaches for proportional likelihood ratio model with right-censored data 
Statistics in medicine  2014;33(14):2467-2479.
Regression methods for survival data with right censoring have been extensively studied under semiparametric transformation models [1] such as the Cox regression model [2] and the proportional odds model [3]. However, their practical application could be limited by possible violations of model assumptions or, in some cases, by the lack of a ready interpretation for the regression coefficients. As an alternative, in this paper, the proportional likelihood ratio model introduced by Luo and Tsai [4] is extended to flexibly model the relationship between survival outcome and covariates. This model has a natural connection with many important semiparametric models such as the generalized linear model and the density ratio model, and is closely related to biased sampling problems. Compared with the semiparametric transformation model, the proportional likelihood ratio model is appealing and practical in many ways because of its model flexibility and quite direct clinical interpretation. We present two likelihood approaches for the estimation and inference on the target regression parameters under independent and dependent censoring assumptions. Based on a conditional likelihood approach using uncensored failure times, a numerically simple estimation procedure is developed by maximizing a pairwise pseudo-likelihood [5]. We also develop a full likelihood approach and the most efficient maximum likelihood estimator is obtained by a profile likelihood. Simulation studies are conducted to assess the finite-sample properties of the proposed estimators and compare the efficiency of the two likelihood approaches. An application to survival data from bone marrow transplantation patients with acute leukemia is provided to illustrate the proposed method and other approaches for handling non-proportionality. The relative merits of these methods are discussed in concluding remarks.
PMCID: PMC4527348  PMID: 24500821
conditional likelihood; pairwise pseudo-likelihood; profile likelihood; proportional likelihood ratio model; right-censored data
17.  Mediation and spillover effects in group-randomized trials: a case study of the 4Rs educational intervention 
Peer influence and social interactions can give rise to spillover effects in which the exposure of one individual may affect outcomes of other individuals. Even if the intervention under study occurs at the group or cluster level as in group-randomized trials, spillover effects can occur when the mediator of interest is measured at a lower level than the treatment. Evaluators who choose groups rather than individuals as experimental units in a randomized trial often anticipate that the desirable changes in targeted social behaviors will be reinforced through interference among individuals in a group exposed to the same treatment. In an empirical evaluation of the effect of a school-wide intervention on reducing individual students’ depressive symptoms, schools in matched pairs were randomly assigned to the 4Rs intervention or the control condition. Class quality was hypothesized as an important mediator assessed at the classroom level. We reason that the quality of one classroom may affect outcomes of children in another classroom because children interact not simply with their classmates but also with those from other classes in the hallways or on the playground. In investigating the role of class quality as a mediator, failure to account for such spillover effects of one classroom on the outcomes of children in other classrooms can potentially result in bias and problems with interpretation. Using a counterfactual conceptualization of direct, indirect and spillover effects, we provide a framework that can accommodate issues of mediation and spillover effects in group randomized trials. We show that the total effect can be decomposed into a natural direct effect, a within-classroom mediated effect and a spillover mediated effect. We give identification conditions for each of the causal effects of interest and provide results on the consequences of ignoring “interference” or “spillover effects” when they are in fact present. 
Our modeling approach disentangles these effects. The analysis examines whether the 4Rs intervention has an effect on children's depressive symptoms through changing the quality of other classes as well as through changing the quality of a child's own class.
PMCID: PMC3753117  PMID: 23997375
Direct/indirect effects; interference; multilevel models; social interactions
18.  Longitudinal studies of binary response data following case-control and stratified case-control sampling: design and analysis 
Biometrics  2009;66(2):365-373.
We discuss design and analysis of longitudinal studies after case-control sampling, wherein interest is in the relationship between a longitudinal binary response that is related to the sampling (case-control) variable, and a set of covariates. We propose a semiparametric modelling framework based on a marginal longitudinal binary response model and an ancillary model for subjects’ case-control status. In this approach, the analyst must posit the population prevalence of being a case, which is then used to compute an offset term in the ancillary model. Parameter estimates from this model are used to compute offsets for the longitudinal response model. Examining the impact of population prevalence and ancillary model misspecification, we show that time-invariant covariate parameter estimates, other than the intercept, are reasonably robust, but intercept and time-varying covariate parameter estimates can be sensitive to such misspecification. We study design and analysis issues impacting study efficiency, namely: choice of sampling variable and the strength of its relationship to the response, sample stratification, choice of working covariance weighting, and degree of flexibility of the ancillary model. The research is motivated by a longitudinal study following case-control sampling of the time course of ADHD symptoms.
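The role of the posited prevalence can be illustrated with the standard intercept correction for logistic regression under case-control sampling. This is a minimal sketch under that textbook assumption, not the paper's ancillary model; the function names and the example numbers are hypothetical.

```python
import math


def case_control_offset(prevalence: float, n_cases: int, n_controls: int) -> float:
    """Log ratio of the sampling probabilities of cases versus controls.

    With population prevalence p, cases are sampled with probability
    proportional to n_cases / p and controls with probability proportional to
    n_controls / (1 - p). The log of that ratio is the amount by which the
    logistic intercept is shifted in the case-control sample.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("sample sizes must be positive")
    return math.log((n_cases / n_controls) * ((1.0 - prevalence) / prevalence))


def population_probability(sample_logit: float, offset: float) -> float:
    """Map a logit fitted in the case-control sample back to the population scale."""
    return 1.0 / (1.0 + math.exp(-(sample_logit - offset)))


# Hypothetical design: 5% prevalence, equal numbers of cases and controls.
offset = case_control_offset(prevalence=0.05, n_cases=200, n_controls=200)
```

Because the offset depends directly on the posited prevalence, a misstated prevalence shifts intercept-type quantities first, which is consistent with the sensitivity pattern the abstract reports.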
PMCID: PMC3051172  PMID: 19673861
Bias; binary data; efficiency; Generalized Estimating Equations; longitudinal data; logistic regression; outcome dependent sampling
19.  Mediation analysis with multiple versions of the mediator 
Epidemiology (Cambridge, Mass.)  2012;23(3):454-463.
The causal inference literature has provided definitions of direct and indirect effects based on counterfactuals that generalize the approach found in the social science literature. However, these definitions presuppose well defined hypothetical interventions on the mediator. In many settings there may be multiple ways to fix the mediator to a particular value and these different hypothetical interventions may have very different implications for the outcome of interest. In this paper we consider mediation analysis when multiple versions of the mediator are present. Specifically, we consider the problem of attempting to decompose a total effect of an exposure on an outcome into the portion through the intermediate and the portion through other pathways. We consider the setting in which there are multiple versions of the mediator but the investigator only has access to data on the particular measurement, not which version of the mediator may have brought that value about. We show that the quantity that is estimated as a natural indirect effect using only the available data does indeed have an interpretation as a particular type of mediated effect; however, the quantity estimated as a natural direct effect in fact captures both a true direct effect and an effect of the exposure on the outcome mediated through the effect of the version of the mediator that is not captured by the mediator measurement. The results are illustrated using two examples from the literature, one in which the versions of the mediator are unknown and another in which the mediator itself has been dichotomized.
PMCID: PMC3771529  PMID: 22475830
20.  An information criterion for marginal structural models 
Statistics in medicine  2012;32(8):1383-1393.
Marginal structural models were developed as a semiparametric alternative to the G-computation formula to estimate causal effects of exposures. In practice, these models are often specified using parametric regression models. As such, the usual conventions regarding regression model specification apply. This paper outlines strategies for marginal structural model specification, and considerations for the functional form of the exposure metric in the final structural model. We propose a quasi-likelihood information criterion adapted from use in generalized estimating equations. We evaluate the properties of our proposed information criterion using a limited simulation study. We illustrate our approach using two empirical examples. In the first example, we use data from a randomized breastfeeding promotion trial to estimate the effect of breastfeeding duration on infant weight at one year. In the second example, we use data from two prospective cohort studies to estimate the effect of highly active antiretroviral therapy on CD4 count in an observational cohort of HIV-infected men and women. The marginal structural model specified should reflect the scientific question being addressed, but can also assist in exploration of other plausible and closely related questions. In marginal structural models, as in any regression setting, correct inference depends on correct model specification. Our proposed information criterion provides a formal method for comparing model fit for different specifications.
PMCID: PMC4180061  PMID: 22972662
Bias; Causal inference; Marginal structural model; Regression analysis; Model specification
21.  Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery 
In longitudinal and repeated measures data analysis, often the goal is to determine the effect of a treatment or aspect on a particular outcome (e.g., disease progression). We consider a semiparametric repeated measures regression model, where the parametric component models the effect of the variable of interest and any modification by other covariates. The expectation of this parametric component over the other covariates is a measure of variable importance. Here, we present a targeted maximum likelihood estimator of the finite dimensional regression parameter, which is easily estimated using standard software for generalized estimating equations.
The targeted maximum likelihood method provides double robust and locally efficient estimates of the variable importance parameters and inference based on the influence curve. We demonstrate these properties through simulation under correct and incorrect model specification, and apply our method in practice to estimating the activity of transcription factors (TFs) over the cell cycle in yeast. We specifically target the importance of SWI4, SWI6, MBP1, MCM1, ACE2, FKH2, NDD1, and SWI5.
The semiparametric model allows us to determine the importance of a TF at specific time points by specifying time indicators as potential effect modifiers of the TF. Our results are promising, showing significant importance trends during the expected time periods. This methodology can also be used as a variable importance analysis tool to assess the effect of a large number of variables such as gene expressions or single nucleotide polymorphisms.
PMCID: PMC3122882  PMID: 21291412
targeted maximum likelihood; semiparametric; repeated measures; longitudinal; transcription factors
22.  Application of Behavioral Theories to Disaster and Emergency Health Preparedness: A Systematic Review 
PLoS Currents  2015;7:ecurrents.dis.31a8995ced321301466db400f1357829.
Background: Preparedness for disasters and emergencies at individual, community and organizational levels could be a more effective tool in mitigating the growing incidence of disaster risk and ameliorating its impacts; that is, it could play a more significant role in disaster risk reduction (DRR). Preparedness efforts focus on changing human behaviors in ways that reduce people’s risk and increase their ability to cope with hazard consequences. While preparedness initiatives have used behavioral theories to facilitate DRR, many theories have been used and little is known about which behavioral theories are more commonly used, where they have been used, and why they have been preferred over alternative behavioral theories. Given that theories differ with respect to the variables used and the relationship between them, a systematic analysis is an essential first step to answering questions about the relative utility of theories and providing a more robust evidence base for preparedness components of DRR strategies. The goal of this systematic review was to search and summarize evidence by assessing the application of behavioral theories to disaster and emergency health preparedness across the world.
Methods: A protocol was prepared in which the study objectives, questions, inclusion and exclusion criteria, and sensitive search strategies were developed and pilot-tested at the beginning of the study. Using selected keywords, articles were searched mainly in PubMed, Scopus, Mosby’s Index (Nursing Index) and Safetylit databases. Articles were assessed based on their titles, abstracts, and their full texts. The data were extracted from selected articles and results were presented using qualitative and quantitative methods.
Results: In total, 2040 titles, 450 abstracts and 62 full texts of articles were assessed for eligibility criteria, whilst five articles were archived from other sources, and then finally, 33 articles were selected. The Health Belief Model (HBM), Extended Parallel Process Model (EPPM), Theory of Planned Behavior (TPB) and Social Cognitive Theories were most commonly applied to influenza (H1N1 and H5N1), floods, and earthquake hazards. Studies were predominantly conducted in the USA (13 studies). In Asia, where the annual number of disasters and victims exceeds those in other continents, only three studies were identified. Overall, the main constructs of HBM (perceived susceptibility, severity, benefits, and barriers), EPPM (higher threat and higher efficacy), TPB (attitude and subjective norm), and the majority of the constructs utilized in Social Cognitive Theories were associated with preparedness for diverse hazards. However, while all the theories described above describe the relationships between constituent variables, with the exception of research on Social Cognitive Theories, few studies of other theories and models used path analysis to identify the interdependence relationships between the constructs described in the respective theories/models. Similarly, few identified how other mediating variables could influence disaster and emergency preparedness.
Conclusions: The existing evidence on the application of behavioral theories and models to disaster and emergency preparedness is chiefly from developed countries. This raises issues regarding their utility in countries, particularly in Asia and the Middle East, where cultural characteristics are very different to those prevailing in the Western countries in which theories have been developed and tested. The theories and models discussed here have been applied predominantly to disease outbreaks and natural hazards, and information on their utility as guides to preparedness for man-made hazards is lacking. Hence, future studies related to behavioral theories and models addressing preparedness need to target developing countries where disaster risk and the consequent need for preparedness is high. A need for additional work on demonstrating the relationships of variables and constructs, including more clearly articulating roles for mediating effects, was also identified in this analysis.
PMCID: PMC4494855  PMID: 26203400
Behavior; disaster; Emergency Health; Model; preparedness; Theory
23.  Model Misspecification When Excluding Instrumental Variables From PS Models in Settings Where Instruments Modify the Effects of Covariates on Treatment 
Epidemiologic methods  2014;3(1):83-96.
Theory and simulations show that variables affecting the outcome only through exposure, known as instrumental variables (IVs), should be excluded from propensity score (PS) models. In pharmacoepidemiologic studies based on automated healthcare databases, researchers will sometimes use a single PS model to control for confounding when evaluating the effect of a treatment on multiple outcomes. Because these “full” models are not constructed with a specific outcome in mind, they will usually contain a large number of IVs for any individual study or outcome. If researchers subsequently decide to evaluate a subset of the outcomes in more detail, they can construct reduced “outcome-specific” models that exclude IVs for the particular study. Accurate estimates of PSs that do not condition on IVs, however, can be compromised when simply excluding instruments from the full PS model. This misspecification may have a negligible impact on effect estimates in many settings, but is likely to be more pronounced for situations where instruments modify the effects of covariates on treatment (instrument-confounder interactions). In studies evaluating drugs during early dissemination, the effects of covariates on treatment are likely modified over calendar time and IV-confounder interaction effects on treatment are likely to exist. In these settings, refitting more flexible PS models after excluding IVs and IV-confounder interactions can work well. The authors propose an alternative method based on the concept of marginalization that can be used to remove the negative effects of controlling for IVs and IV-confounder interactions without having to refit the full PS model. This method fits the full PS model, including IVs and IV-confounder interactions, but marginalizes over values of the instruments. 
Fitting more flexible PS models after excluding IVs or using the full model to marginalize over IVs can prevent model misspecification along with the negative effects of balancing instruments in certain settings.
PMCID: PMC4319188  PMID: 25667819
24.  SIMEX and standard error estimation in semiparametric measurement error models 
SIMEX is a general-purpose technique for measurement error correction. There is a substantial literature on the application and theory of SIMEX for purely parametric problems, as well as for purely non-parametric regression problems, but there is neither application nor theory for semiparametric problems. Motivated by an example involving radiation dosimetry, we develop the basic theory for SIMEX in semiparametric problems using kernel-based estimation methods. This includes situations in which the mismeasured variable is modeled purely parametrically, purely non-parametrically, or in which the mismeasured variable has components that are modeled both parametrically and nonparametrically. Using our asymptotic expansions, easily computed standard error formulae are derived, as are the bias properties of the nonparametric estimator. The standard error method represents a new method for estimating variability of nonparametric estimators in semiparametric problems, and we show in both simulations and in our example that it improves dramatically on first order methods.
We find that for estimating the parametric part of the model, standard bandwidth choices of order O(n^{-1/5}) are sufficient to ensure asymptotic normality, and undersmoothing is not required. SIMEX has the property that it fits misspecified models, namely ones that ignore the measurement error. Our work thus also more generally describes the behavior of kernel-based methods in misspecified semiparametric problems.
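The simulation-extrapolation idea itself can be sketched in the simplest purely parametric case, a straight-line regression with a mismeasured covariate. This is not the kernel-based semiparametric procedure developed in the paper; the function names, the three-point quadratic extrapolant, and the simulated data are all assumptions made for illustration, and the measurement error standard deviation is assumed known.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simex_slope(w, y, sigma_u, n_reps=50, seed=0):
    """Return (naive slope, SIMEX-corrected slope) for mismeasured covariate w.

    Simulation step: add extra noise with variance lam * sigma_u**2 for
    lam = 1 and 2, refit the naive model each time, and average the slopes.
    Extrapolation step: pass a quadratic through the averaged slopes at
    lam = 0, 1, 2 and evaluate it at lam = -1 (no measurement error).
    """
    rng = random.Random(seed)
    naive = ols_slope(w, y)
    averaged = [naive]
    for lam in (1.0, 2.0):
        extra_sd = (lam ** 0.5) * sigma_u
        slopes = []
        for _ in range(n_reps):
            w_noisy = [wi + rng.gauss(0.0, extra_sd) for wi in w]
            slopes.append(ols_slope(w_noisy, y))
        averaged.append(mean(slopes))
    b0, b1, b2 = averaged
    # Lagrange weights of the quadratic through lam = 0, 1, 2 evaluated at -1.
    return naive, 3.0 * b0 - 3.0 * b1 + b2


def simulate(n=2000, beta=2.0, sigma_u=0.7, seed=1):
    """Hypothetical data: y depends on x, but only w = x + error is observed."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
    return w, y
```

The naive fit at lam = 0 is exactly the misspecified model that ignores measurement error, which is why the behavior of estimators under misspecification matters for SIMEX. A quadratic extrapolant usually reduces the attenuation bias without removing it entirely.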
PMCID: PMC2710855  PMID: 19609371
Berkson measurement errors; measurement error; misspecified models; nonparametric regression; radiation epidemiology; semiparametric models; SIMEX; simulation-extrapolation; standard error estimation; uniform expansions
25.  Identification and efficient estimation of the natural direct effect among the untreated 
Biometrics  2013;69(2):310-317.
The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable were set to the level it would have been in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the non-parametric model, particularly when the intermediate variable is continuous or high dimensional. In this paper we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, propose a sensitivity analysis for some of the assumptions, and show that this new parameter is equivalent to the NDE in a randomized controlled trial. We also present a targeted minimum loss estimator (TMLE), a locally efficient, double robust substitution estimator for the statistical parameter associated with this causal parameter. The TMLE can be applied to problems with continuous and high dimensional intermediate variables, and can be used to estimate the NDE in a randomized controlled trial with such data. Additionally, we define and discuss the estimation of three related causal parameters: the natural direct effect among the treated, the indirect effect among the untreated and the indirect effect among the treated.
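In counterfactual notation, the verbal definition above is usually written as follows. The symbols are assumptions introduced here (binary exposure A, intermediate M, outcome Y, with Y(a, m) and M(a) denoting counterfactuals), and the conditional version is one plausible reading of "among the untreated", not necessarily the exact parameter defined in the paper.

```latex
% Standard natural direct and indirect effects (population-level)
\mathrm{NDE} = E\big[\,Y(1, M(0)) - Y(0, M(0))\,\big], \qquad
\mathrm{NIE} = E\big[\,Y(1, M(1)) - Y(1, M(0))\,\big],

% which sum to the total effect
\mathrm{NDE} + \mathrm{NIE} = E\big[\,Y(1, M(1)) - Y(0, M(0))\,\big].

% A plausible form of the direct effect among the untreated (assumption)
\mathrm{NDE}_{A=0} = E\big[\,Y(1, M(0)) - Y(0, M(0)) \,\big|\, A = 0\,\big].
```

If exposure is randomized, A is independent of the counterfactuals, so conditioning on A = 0 leaves the expectation unchanged; this is consistent with the abstract's statement that the new parameter equals the NDE in a randomized controlled trial.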
PMCID: PMC3692606  PMID: 23607645
Causal inference; direct effect; indirect effect; mediation analysis; semiparametric models; targeted minimum loss estimation
