Related Articles

1.  Inverse Odds Ratio-Weighted Estimation for Causal Mediation Analysis 
Statistics in medicine  2013;32(26):4567-4580.
An important scientific goal of studies in the health and social sciences is increasingly to determine to what extent the total effect of a point exposure is mediated by an intermediate variable on the causal pathway between the exposure and the outcome. A causal framework has recently been proposed for mediation analysis, which gives rise to new definitions, formal identification results and novel estimators of direct and indirect effects. In the present paper, the author describes a new inverse odds ratio-weighted (IORW) approach to estimate so-called natural direct and indirect effects. The approach, which uses as a weight the inverse of an estimate of the odds ratio function relating the exposure and the mediator, is universal in that it can be used to decompose total effects in a number of regression models commonly used in practice. Specifically, the approach may be used for effect decomposition in generalized linear models with a nonlinear link function, and in a number of other commonly used models such as the Cox proportional hazards regression for a survival outcome. The approach is simple and can be implemented in standard software provided a weight can be specified for each observation. An additional advantage of the method is that it easily incorporates multiple mediators of a categorical, discrete or continuous nature.
PMCID: PMC3954805  PMID: 23744517
Causal Mediation Analysis; Inverse odds ratio weighted estimation; natural direct and indirect effects; double robustness
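As a rough sketch of how the IORW weight described in this abstract is often constructed, assume a binary exposure A, a mediator vector M, and baseline covariates C, with the exposure modelled by logistic regression. The symbols and the choice of a logistic exposure model are assumptions made for illustration; the abstract itself does not give this formula.

```latex
\[
\operatorname{logit} P(A = 1 \mid M, C) = \beta_0 + \beta_M^{\top} M + \beta_C^{\top} C,
\qquad
W =
\begin{cases}
\exp\!\bigl(-\hat{\beta}_M^{\top} M\bigr), & A = 1,\\[2pt]
1, & A = 0.
\end{cases}
\]
```

Under these assumptions, a regression of the outcome on A and C weighted by W would estimate the natural direct effect, an unweighted fit of the same regression would estimate the total effect, and the natural indirect effect would be taken as their difference on the scale of the link function.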
2.  Evaluating the Effect of Early Versus Late ARV Regimen Change if Failure on an Initial Regimen: Results From the AIDS Clinical Trials Group Study A5095 
The current goal of initial antiretroviral (ARV) therapy is suppression of plasma human immunodeficiency virus (HIV)-1 RNA levels to below 200 copies per milliliter. A proportion of HIV-infected patients who initiate antiretroviral therapy in clinical practice or antiretroviral clinical trials either fail to suppress HIV-1 RNA or have HIV-1 RNA levels rebound on therapy. Frequently, these patients have sustained CD4 cell count responses and limited or no clinical symptoms and, therefore, have potentially limited indications for altering therapy, which they may be tolerating well despite increased viral replication. On the other hand, increased viral replication on therapy leads to selection of resistance mutations to the antiretroviral agents comprising their therapy and potentially cross-resistance to other agents in the same class, decreasing the likelihood of response to subsequent antiretroviral therapy. The optimal time to switch antiretroviral therapy to ensure sustained virologic suppression and prevent clinical events in patients who have rebound in their HIV-1 RNA, yet are stable, is not known. Randomized clinical trials to compare early versus delayed switching have been difficult to design and more difficult to enroll. In some clinical trials, such as the AIDS Clinical Trials Group (ACTG) Study A5095, patients randomized to initial antiretroviral treatment combinations who fail to suppress HIV-1 RNA or have a rebound of HIV-1 RNA on therapy are allowed to switch from the initial ARV regimen to a new regimen, based on clinician and patient decisions. We delineate a statistical framework to estimate the effect of early versus late regimen change using data from ACTG A5095 in the context of two-stage designs.
In causal inference, a large class of doubly robust estimators are derived through semiparametric theory with applications to missing data problems. This class of estimators is motivated through geometric arguments and relies on large samples for good performance. By now, several authors have noted that a doubly robust estimator may be suboptimal when the outcome model is misspecified even if it is semiparametric efficient when the outcome regression model is correctly specified. Through auxiliary variables, two-stage designs, and within the contextual backdrop of our scientific problem and clinical study, we propose improved doubly robust, locally efficient estimators of a population mean and average causal effect for early versus delayed switching to second-line ARV treatment regimens. Our analysis of the ACTG A5095 data further demonstrates how methods that use auxiliary variables can improve over methods that ignore them. Using the methods developed here, we conclude that patients who switch within 8 weeks of virologic failure have better clinical outcomes, on average, than patients who delay switching to a new second-line ARV regimen after failing on the initial regimen. Ordinary statistical methods fail to find such differences. This article has online supplementary material.
PMCID: PMC3545451  PMID: 23329858
Causal inference; Double robustness; Longitudinal data analysis; Missing data; Rubin causal model; Semiparametric efficient estimation
3.  Mediation analysis of the relationship between institutional research activity and patient survival 
Recent studies have suggested that patients treated in research-active institutions have better outcomes than patients treated in research-inactive institutions. However, little attention has been paid to explaining such effects, probably because techniques for mediation analysis existing so far have not been applicable to survival data.
We investigated the underlying mechanisms using a recently developed method for mediation analysis of survival data. Our analysis of the effect of research activity on patient survival was based on 352 patients who had been diagnosed with advanced ovarian cancer at 149 hospitals in 2001. All hospitals took part in a quality assurance program of the German Cancer Society. Patient outcomes were compared between hospitals participating in clinical trials and non-trial hospitals. Surgical outcome and chemotherapy selection were explored as potential mediators of the effect of hospital research activity on patient survival.
The 219 patients treated in hospitals participating in clinical trials had more complete surgical debulking, were more likely to receive the recommended platinum-taxane combination, and had better survival than the 133 patients treated in non-trial hospitals. Taking into account baseline confounders, the overall adjusted hazard ratio of death was 0.58 (95% confidence interval: 0.42 to 0.79). This effect was decomposed into a direct effect of research activity of 0.67 and two indirect effects of 0.93 each mediated through either optimal surgery or chemotherapy. Taken together, about 26% of the beneficial effect of research activity was mediated through the proposed pathways.
Mediation analysis allows proceeding from the question “Does it work?” to the question “How does it work?” In particular, we have shown that the research activity of a hospital contributes to superior patient survival through better use of surgery and chemotherapy. This methodology may be applied to analyze direct and indirect natural effects for almost any combination of variable types.
PMCID: PMC3917547  PMID: 24447677
Trial effect; Research activity; Healthcare outcomes; Mediation; Survival analysis
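The decomposition reported in this abstract can be checked arithmetically: hazard ratios multiply, so effects add on the log scale. The sketch below is illustrative only; the function name is hypothetical, and the inputs are the rounded hazard ratios quoted in the abstract, so the result agrees with the published "about 26%" only up to rounding.

```python
import math


def decompose_hazard_ratio(direct_hr, indirect_hrs):
    """Combine direct and indirect hazard ratios on the log scale.

    Returns the implied total hazard ratio and the share of the total
    log hazard ratio that runs through the indirect pathways.
    """
    log_indirect = sum(math.log(hr) for hr in indirect_hrs)
    log_total = math.log(direct_hr) + log_indirect
    return math.exp(log_total), log_indirect / log_total


# Rounded values quoted in the abstract: direct 0.67, two indirect effects of 0.93.
total_hr, share_mediated = decompose_hazard_ratio(0.67, [0.93, 0.93])
print(f"implied total HR: {total_hr:.2f}")
print(f"share mediated:   {share_mediated:.1%}")
```

The implied total hazard ratio rounds to the reported 0.58, and the mediated share comes out at roughly a quarter of the effect.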
4.  Performance of mixed effects models in the analysis of mediated longitudinal data 
Linear mixed effects models (LMMs) are a common approach for analyzing longitudinal data in a variety of settings. Although LMMs may be applied to complex data structures, such as settings where mediators are present, it is unclear whether they perform well relative to methods for mediational analyses such as structural equation models (SEMs), which have obvious appeal in such settings. For some researchers, SEMs may be more difficult than LMMs to implement, e.g. due to lack of training in the methodology or the need for specialized SEM software. It therefore is of interest to evaluate whether the LMM performs sufficiently in a scenario particularly suitable for SEMs. We focus on evaluation of the total effect (i.e. direct and indirect) of an exposure on an outcome of interest when a mediating factor is present. Our aim is to explore whether the LMM performs as well as the SEM in a setting that is conducive to using the SEM.
We simulated mediated longitudinal data from an SEM where a binary, main independent variable has both direct and indirect effects on a continuous outcome. We conducted analyses with both the LMM and SEM to evaluate the performance of the LMM in a setting where the SEM is expected to be preferable. Models were evaluated with respect to bias, coverage probability and power. Sample size, effect size and error distribution of the simulated data were varied.
Both models performed well in a range of settings. Marginal increases in power estimates were observed for the SEM, although generally there were no major differences in performance. Power for both models was good with a sample size of 250 and a small to medium effect size. Bias did not substantially increase for either model when data were generated from distributions that were both skewed and kurtotic.
In settings where the goal is to evaluate the overall effects, the LMM excluding mediating variables appears to have good performance with respect to power, bias and coverage probability relative to the SEM. The major benefit of the SEM is that it simultaneously and efficiently models both the direct and indirect effects of the mediation process.
PMCID: PMC2842282  PMID: 20170503
5.  Mediation Analysis for Nonlinear Models with Confounding 
Epidemiology (Cambridge, Mass.)  2012;23(6):879-888.
Recently, researchers have used a potential-outcome framework to estimate causally interpretable direct and indirect effects of an intervention or exposure on an outcome. One approach to causal-mediation analysis uses the so-called mediation formula to estimate the natural direct and indirect effects. This approach generalizes classical mediation estimators and allows for arbitrary distributions for the outcome variable and mediator. A limitation of the standard (parametric) mediation formula approach is that it requires a specified mediator regression model and distribution; such a model may be difficult to construct and may not be of primary interest. To address this limitation, we propose a new method for causal-mediation analysis that uses the empirical distribution function, thereby avoiding parametric distribution assumptions for the mediator. In order to adjust for confounders of the exposure-mediator and exposure-outcome relationships, inverse-probability weighting is incorporated based on a supplementary model of the probability of exposure. This method, which yields estimates of the natural direct and indirect effects for a specified reference group, is applied to data from a cohort study of dental caries in very-low-birth-weight adolescents to investigate the oral-hygiene index as a possible mediator. Simulation studies show low bias in the estimation of direct and indirect effects in a variety of distribution scenarios, whereas the standard mediation formula approach can be considerably biased when the distribution of the mediator is incorrectly specified.
PMCID: PMC3773310  PMID: 23007042
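To make the standard mediation formula mentioned in this abstract concrete, the sketch below evaluates it for a binary exposure and a discrete mediator. It is not the authors' proposed estimator: it ignores confounders and inverse-probability weighting, and the function name, data layout, and all numbers are hypothetical.

```python
def mediation_formula(mean_y, p_m):
    """Natural direct and indirect effects for a binary exposure.

    mean_y[(a, m)] holds E[Y | A=a, M=m].
    p_m[a][m] holds P(M=m | A=a).
    """
    levels = list(p_m[0])
    # Change the exposure while holding the mediator at its unexposed distribution.
    nde = sum((mean_y[(1, m)] - mean_y[(0, m)]) * p_m[0][m] for m in levels)
    # Keep the exposure at 1 and shift the mediator distribution from unexposed to exposed.
    nie = sum(mean_y[(1, m)] * (p_m[1][m] - p_m[0][m]) for m in levels)
    return nde, nie, nde + nie


# Hypothetical conditional means and mediator distributions.
mean_y = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.30}
p_m = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

nde, nie, total = mediation_formula(mean_y, p_m)
print(f"NDE={nde:.3f}  NIE={nie:.3f}  total={total:.3f}")
```

With a parametric mediator model, p_m would come from that model; the abstract's point is that misspecifying it can bias both effects, which motivates replacing it with the empirical distribution.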
6.  Child Mortality Estimation: Consistency of Under-Five Mortality Rate Estimates Using Full Birth Histories and Summary Birth Histories 
PLoS Medicine  2012;9(8):e1001296.
Romesh Silva assesses and analyzes differences in direct and indirect methods of estimating under-five mortality rates using data collected from full and summary birth histories in Demographic and Health Surveys from West Africa, East Africa, Latin America, and South/Southeast Asia.
Given the lack of complete vital registration data in most developing countries, for many countries it is not possible to accurately estimate under-five mortality rates from vital registration systems. Heavy reliance is often placed on direct and indirect methods for analyzing data collected from birth histories to estimate under-five mortality rates. Yet few systematic comparisons of these methods have been undertaken. This paper investigates whether analysts should use both direct and indirect estimates from full birth histories, and under what circumstances indirect estimates derived from summary birth histories should be used.
Methods and Findings
Using Demographic and Health Surveys data from West Africa, East Africa, Latin America, and South/Southeast Asia, I quantify the differences between direct and indirect estimates of under-five mortality rates, analyze data quality issues, note the relative effects of these issues, and test whether these issues explain the observed differences. I find that indirect estimates are generally consistent with direct estimates, after adjustment for fertility change and birth transference, but don't add substantial additional insight beyond direct estimates. However, choice of direct or indirect method was found to be important in terms of both the adjustment for data errors and the assumptions made about fertility.
Although adjusted indirect estimates are generally consistent with adjusted direct estimates, some notable inconsistencies were observed for countries that had experienced either a political or economic crisis or stalled health transition in their recent past. This result suggests that when a population has experienced a smooth mortality decline or only short periods of excess mortality, both adjusted methods perform equally well. However, the observed inconsistencies identified suggest that the indirect method is particularly prone to bias resulting from violations of its strong assumptions about recent mortality and fertility. Hence, indirect estimates of under-five mortality rates from summary birth histories should be used only for populations that have experienced either smooth mortality declines or only short periods of excess mortality in their recent past.
Editors' Summary
In 1990, 12 million children died before they reached their fifth birthday. Faced with this largely avoidable loss of young lives, in 2000, world leaders set a target of reducing under-five mortality (death) to one-third of its 1990 level by 2015 as Millennium Development Goal 4 (MDG 4); this goal, together with seven others, aims to eradicate extreme poverty globally. To track progress towards MDG 4, experts need accurate estimates of the global and country-specific under-five mortality rate (U5MR, the probability of a child dying before age five). The most reliable sources of data for U5MR estimation are vital registration systems—national records of all births and deaths. Unfortunately, developing countries, which are where most childhood deaths occur, rarely have such records, so full or summary birth histories provide the data for U5MR estimation instead. In full birth histories (FBHs), which are collected through household surveys such as those conducted by Demographic and Health Surveys (DHS), women are asked for the date of birth of all their children and the age at death of any children who have died. In summary birth histories (SBHs), which are collected through household surveys and censuses, women are asked how many children they have had and how many are alive at the time of the survey.
Why Was This Study Done?
“Direct” estimates of U5MRs can be obtained from FBHs because FBHs provide detailed information about the date of death and the exposure of children to the risk of dying. By contrast, because SBHs do not contain information on children's exposure to the risk of dying, “indirect” estimates of U5MR are obtained from SBHs using model life tables (mathematical models of the variation of mortality with age). Indirect estimates are often also derived from FBHs, but few systematic comparisons of direct and indirect methods for U5MR estimation have been undertaken. In this study, Romesh Silva investigates whether direct and indirect methods provide consistent U5MR estimates from FBHs and whether there are any circumstances under which indirect methods provide more reliable U5MR estimates than direct methods.
What Did the Researcher Do and Find?
The researcher used DHS data from West Africa, East Africa, Latin America, and South/Southeast Asia to quantify the differences between direct and indirect estimates of U5MR calculated from the same data and analyzed possible reasons for these differences. Estimates obtained using a version of the “Brass” indirect estimation method were uniformly higher than those obtained using direct estimation. Indirect and direct estimates generally agreed, however, after adjustment for changes in fertility—the Brass method assumes that country-specific fertility (the number of children born to a woman during her reproductive life) remains constant—and for birth transference, an important source of data error in FBHs that arises because DHS field staff can lessen their workload by recording births as occurring before a preset cutoff date rather than after that date. Notably, though, for countries that had experienced political or economic crises, periods of excess mortality due to conflicts, or periods during which the health transition had stalled (as countries become more affluent, overall mortality rates decline and noncommunicable diseases replace infectious diseases as the major causes of death), marked differences between indirect and direct estimates of U5MR remained, even after these adjustments.
What Do These Findings Mean?
Because the countries included in this study do not have vital registration systems, these findings provide no information about the validity of either direct or indirect estimation methods for U5MR estimation. They suggest, however, that for countries where there has been a smooth decline in mortality or only short periods of excess mortality, both direct and indirect methods of U5MR estimation work equally well, after adjustment for changes in fertility and for birth transference, and that indirect estimates add little to the insights provided into childhood mortality by direct estimates. Importantly, the inconsistencies observed between the two methods that remain after adjustment suggest that indirect U5MR estimation is more susceptible to bias (systematic errors that arise because of the assumptions used to estimate U5MR) than direct estimation. Thus, indirect estimates of U5MR from SBHs should be used only for populations that have experienced either smooth mortality declines or only short periods of excess mortality in their recent past.
Additional Information
Please access these websites via the online version of this summary at
This paper is part of a collection of papers on Child Mortality Estimation Methods published in PLOS Medicine
The United Nations Children's Fund (UNICEF) works for children's rights, survival, development, and protection around the world; it provides information on Millennium Development Goal 4, and its Childinfo website provides detailed statistics about child survival and health, including a description of the United Nations Inter-agency Group for Child Mortality Estimation; the 2011 UN IGME report Levels & Trends in Child Mortality is available
The World Health Organization has information about Millennium Development Goal 4 and provides estimates of child mortality rates (some information in several languages)
Further information about the Millennium Development Goals is available
Information is available about infant and child mortality data collected by Demographic and Health Surveys
PMCID: PMC3429405  PMID: 22952436
7.  Estimation of Causal Mediation Effects for a Dichotomous Outcome in Multiple-Mediator Models using the Mediation Formula 
Statistics in medicine  2013;32(24):4211-4228.
Mediators are intermediate variables in the causal pathway between an exposure and an outcome. Mediation analysis investigates the extent to which exposure effects occur through these variables, thus revealing causal mechanisms. In this paper, we consider the estimation of the mediation effect when the outcome is binary and multiple mediators of different types exist. We give a precise definition of the total mediation effect as well as decomposed mediation effects through individual or sets of mediators using the potential outcomes framework. We formulate a model of joint distribution (probit-normal) using continuous latent variables for any binary mediators to account for correlations among multiple mediators. A mediation formula approach is proposed to estimate the total mediation effect and decomposed mediation effects based on this parametric model. Estimation of mediation effects through individual or subsets of mediators requires an assumption involving the joint distribution of multiple counterfactuals. We conduct a simulation study that demonstrates low bias of mediation effect estimators for two-mediator models with various combinations of mediator types. The results also show that the power to detect a non-zero total mediation effect increases as the correlation coefficient between two mediators increases, while power for individual mediation effects reaches a maximum when the mediators are uncorrelated. We illustrate our approach by applying it to a retrospective cohort study of dental caries in adolescents with low and high socioeconomic status. Sensitivity analysis is performed to assess the robustness of conclusions regarding mediation effects when the assumption of no unmeasured mediator-outcome confounders is violated.
PMCID: PMC3789850  PMID: 23650048
mediation analysis; multiple mediators; latent variables; overall mediation effect; decomposed mediation effect; mediation formula; sensitivity analysis
8.  A Three-way Decomposition of a Total Effect into Direct, Indirect, and Interactive Effects 
Epidemiology (Cambridge, Mass.)  2013;24(2):224-232.
Recent theory in causal inference has provided concepts for mediation analysis and effect decomposition that allow one to decompose a total effect into a direct and an indirect effect. Here, it is shown that what is often taken as an indirect effect can in fact be further decomposed into a “pure” indirect effect and a mediated interactive effect, thus yielding a three-way decomposition of a total effect (direct, indirect, and interactive). This three-way decomposition applies to difference scales and also to additive ratio scales and additive hazard scales. Assumptions needed for the identification of each of these three effects are discussed and simple formulae are given for each when regression models allowing for interaction are used. The three-way decomposition is illustrated by examples from genetic and perinatal epidemiology, and discussion is given to what is gained over the traditional two-way decomposition into simply a direct and an indirect effect.
PMCID: PMC3563853  PMID: 23354283
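In counterfactual notation, the three-way decomposition described in this abstract can be sketched as follows on the difference scale. Here Y_{am} denotes the outcome under exposure level a and mediator level m, and M_a the mediator value under exposure a; this notation is an assumption for illustration, since the abstract states the result only in words.

```latex
\[
\underbrace{Y_{1M_1} - Y_{0M_0}}_{\text{total effect}}
=
\underbrace{\bigl(Y_{1M_0} - Y_{0M_0}\bigr)}_{\text{direct}}
+
\underbrace{\bigl(Y_{0M_1} - Y_{0M_0}\bigr)}_{\text{pure indirect}}
+
\underbrace{\bigl(Y_{1M_1} - Y_{1M_0} - Y_{0M_1} + Y_{0M_0}\bigr)}_{\text{mediated interaction}}
\]
```

Expanding the right-hand side cancels every term except Y_{1M_1} and Y_{0M_0}, which confirms the identity. Adding the last two terms gives Y_{1M_1} - Y_{1M_0}, the quantity usually reported as the indirect effect in a two-way decomposition.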
9.  Dynamic regression hazards models for relative survival 
Statistics in medicine  2008;27(18):3563-3584.
A natural way of modelling relative survival through regression analysis is to assume an additive form between the expected population hazard and the excess hazard due to the presence of an additional cause of mortality. Within this context, the existing approaches in the parametric, semiparametric and non-parametric setting are compared and discussed. We study the additive excess hazards models, in which the excess hazard has an additive form. This makes it possible to assess the importance of time-varying effects for regression models in the relative survival framework. We show how recent developments can be used to make inferential statements about the non-parametric version of the model. This makes it possible to test the key hypothesis that an excess risk effect is time varying in contrast to being constant over time. In case some covariate effects are constant, we show how the semiparametric additive risk model can be considered in the excess risk setting, providing a better and more useful summary of the data. Estimators have explicit form and inference based on a resampling scheme is presented for both the non-parametric and semiparametric models. We also describe a new suggestion for goodness of fit of relative survival models, which consists of statistical and graphical tests based on cumulative martingale residuals. This is illustrated on the semiparametric model with proportional excess hazards. We analyze data from the TRACE study using different approaches and show the need for more flexible models in relative survival.
PMCID: PMC2737139  PMID: 18338318
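The additive structure described in this abstract can be written out as a sketch. The symbols are assumptions chosen for illustration: λ_i^P is the expected population hazard for subject i, X_i and Z_i are covariate vectors, α(t) collects time-varying effects, and γ collects constant effects.

```latex
\[
\lambda_i(t) = \lambda_i^{P}(t) + \lambda_i^{E}(t),
\qquad
\lambda_i^{E}(t) =
\begin{cases}
X_i(t)^{\top} \alpha(t) & \text{(non-parametric additive excess hazard)},\\[2pt]
X_i(t)^{\top} \alpha(t) + Z_i(t)^{\top} \gamma & \text{(semiparametric additive excess hazard)}.
\end{cases}
\]
```

In this notation, testing whether an excess risk effect is time varying amounts to testing whether a component of α(t) is constant in t, in which case it could be moved into γ.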
10.  Mediation and spillover effects in group-randomized trials: a case study of the 4Rs educational intervention 
Peer influence and social interactions can give rise to spillover effects in which the exposure of one individual may affect outcomes of other individuals. Even if the intervention under study occurs at the group or cluster level as in group-randomized trials, spillover effects can occur when the mediator of interest is measured at a lower level than the treatment. Evaluators who choose groups rather than individuals as experimental units in a randomized trial often anticipate that the desirable changes in targeted social behaviors will be reinforced through interference among individuals in a group exposed to the same treatment. In an empirical evaluation of the effect of a school-wide intervention on reducing individual students’ depressive symptoms, schools in matched pairs were randomly assigned to the 4Rs intervention or the control condition. Class quality was hypothesized as an important mediator assessed at the classroom level. We reason that the quality of one classroom may affect outcomes of children in another classroom because children interact not simply with their classmates but also with those from other classes in the hallways or on the playground. In investigating the role of class quality as a mediator, failure to account for such spillover effects of one classroom on the outcomes of children in other classrooms can potentially result in bias and problems with interpretation. Using a counterfactual conceptualization of direct, indirect and spillover effects, we provide a framework that can accommodate issues of mediation and spillover effects in group randomized trials. We show that the total effect can be decomposed into a natural direct effect, a within-classroom mediated effect and a spillover mediated effect. We give identification conditions for each of the causal effects of interest and provide results on the consequences of ignoring “interference” or “spillover effects” when they are in fact present. 
Our modeling approach disentangles these effects. The analysis examines whether the 4Rs intervention has an effect on children's depressive symptoms through changing the quality of other classes as well as through changing the quality of a child's own class.
PMCID: PMC3753117  PMID: 23997375
Direct/indirect effects; interference; multilevel models; social interactions
11.  Semiparametric Maximum Likelihood Estimation in Normal Transformation Models for Bivariate Survival Data 
Biometrika  2008;95(4):947-960.
We consider a class of semiparametric normal transformation models for right censored bivariate failure times. Nonparametric hazard rate models are transformed to a standard normal model and a joint normal distribution is assumed for the bivariate vector of transformed variates. A semiparametric maximum likelihood estimation procedure is developed for estimating the marginal survival distribution and the pairwise correlation parameters. This produces an efficient estimator of the correlation parameter of the semiparametric normal transformation model, which characterizes the bivariate dependence of bivariate survival outcomes. In addition, a simple positive-mass-redistribution algorithm can be used to implement the estimation procedures. Since the likelihood function involves infinite-dimensional parameters, the empirical process theory is utilized to study the asymptotic properties of the proposed estimators, which are shown to be consistent, asymptotically normal and semiparametric efficient. A simple estimator for the variance of the estimates is also derived. The finite sample performance is evaluated via extensive simulations.
PMCID: PMC2600666  PMID: 19079778
Asymptotic normality; Bivariate failure time; Consistency; Semiparametric efficiency; Semiparametric maximum likelihood estimate; Semiparametric normal transformation
12.  Modeling the impact of hepatitis C viral clearance on end-stage liver disease in an HIV co-infected cohort with Targeted Maximum Likelihood Estimation 
Biometrics  2013;70(1):144-152.
Despite modern effective HIV treatment, hepatitis C virus (HCV) co-infection is associated with a high risk of progression to end-stage liver disease (ESLD) which has emerged as the primary cause of death in this population. Clinical interest lies in determining the impact of clearance of HCV on risk for ESLD. In this case study, we examine whether HCV clearance affects risk of ESLD using data from the multicenter Canadian Co-infection Cohort Study. Complications in this survival analysis arise from the time-dependent nature of the data, the presence of baseline confounders, loss to follow-up, and confounders that change over time, all of which can obscure the causal effect of interest. Additional challenges included non-censoring variable missingness and event sparsity.
In order to efficiently estimate the ESLD-free survival probabilities under a specific history of HCV clearance, we demonstrate the doubly-robust and semiparametric efficient method of Targeted Maximum Likelihood Estimation (TMLE). Marginal structural models (MSM) can be used to model the effect of viral clearance (expressed as a hazard ratio) on ESLD-free survival and we demonstrate a way to estimate the parameters of a logistic model for the hazard function with TMLE. We show the theoretical derivation of the efficient influence curves for the parameters of two different MSMs and how they can be used to produce variance approximations for parameter estimates. Finally, the data analysis evaluating the impact of HCV on ESLD was undertaken using multiple imputations to account for the non-monotone missing data.
PMCID: PMC3954273  PMID: 24571372
Double-robust; Inverse probability of treatment weighting; Kaplan-Meier; Longitudinal data; Marginal structural model; Survival analysis; Targeted maximum likelihood estimation
13.  Longitudinal studies of binary response data following case-control and stratified case-control sampling: design and analysis 
Biometrics  2009;66(2):365-373.
We discuss design and analysis of longitudinal studies after case-control sampling, wherein interest is in the relationship between a longitudinal binary response that is related to the sampling (case-control) variable, and a set of covariates. We propose a semiparametric modelling framework based on a marginal longitudinal binary response model and an ancillary model for subjects’ case-control status. In this approach, the analyst must posit the population prevalence of being a case, which is then used to compute an offset term in the ancillary model. Parameter estimates from this model are used to compute offsets for the longitudinal response model. Examining the impact of population prevalence and ancillary model misspecification, we show that time-invariant covariate parameter estimates, other than the intercept, are reasonably robust, but intercept and time-varying covariate parameter estimates can be sensitive to such misspecification. We study design and analysis issues impacting study efficiency, namely: choice of sampling variable and the strength of its relationship to the response, sample stratification, choice of working covariance weighting, and degree of flexibility of the ancillary model. The research is motivated by a longitudinal study following case-control sampling of the time course of ADHD symptoms.
PMCID: PMC3051172  PMID: 19673861
Bias; binary data; efficiency; Generalized Estimating Equations; longitudinal data; logistic regression; outcome dependent sampling
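A minimal sketch of how a posited population prevalence can be turned into an offset. It computes the textbook intercept shift for logistic regression under case-control sampling, log[(n1/n0) × ((1 − π)/π)]. This is only an assumption about the kind of offset the ancillary model uses, not the authors' exact construction; the function name and the example counts are hypothetical.

```python
import math

def case_control_offset(n_cases: int, n_controls: int, prevalence: float) -> float:
    """Log ratio of the case and control sampling fractions.

    With posited population prevalence pi, cases are sampled at a rate
    proportional to n_cases / pi and controls at a rate proportional to
    n_controls / (1 - pi); the logistic intercept in the sample is
    shifted by the log of the ratio of these two rates.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("need at least one case and one control")
    return math.log(n_cases / n_controls) + math.log((1.0 - prevalence) / prevalence)

# Hypothetical example: 1:1 case-control sampling, posited prevalence of 5%
offset = case_control_offset(200, 200, 0.05)
```

When the sample's case fraction already equals the posited prevalence, the offset is zero, which is a quick sanity check on the sign convention.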
14.  An information criterion for marginal structural models 
Statistics in medicine  2012;32(8):1383-1393.
Marginal structural models were developed as a semiparametric alternative to the G-computation formula to estimate causal effects of exposures. In practice, these models are often specified using parametric regression models. As such, the usual conventions regarding regression model specification apply. This paper outlines strategies for marginal structural model specification, and considerations for the functional form of the exposure metric in the final structural model. We propose a quasi-likelihood information criterion adapted from use in generalized estimating equations. We evaluate the properties of our proposed information criterion using a limited simulation study. We illustrate our approach using two empirical examples. In the first example, we use data from a randomized breastfeeding promotion trial to estimate the effect of breastfeeding duration on infant weight at one year. In the second example, we use data from two prospective cohort studies to estimate the effect of highly active antiretroviral therapy on CD4 count in an observational cohort of HIV-infected men and women. The marginal structural model specified should reflect the scientific question being addressed, but can also assist in exploration of other plausible and closely related questions. In marginal structural models, as in any regression setting, correct inference depends on correct model specification. Our proposed information criterion provides a formal method for comparing model fit for different specifications.
PMCID: PMC4180061  PMID: 22972662
Bias; Causal inference; Marginal structural model; Regression analysis; Model specification
15.  Mediation analysis with multiple versions of the mediator 
Epidemiology (Cambridge, Mass.)  2012;23(3):454-463.
The causal inference literature has provided definitions of direct and indirect effects based on counterfactuals that generalize the approach found in the social science literature. However, these definitions presuppose well defined hypothetical interventions on the mediator. In many settings there may be multiple ways to fix the mediator to a particular value and these different hypothetical interventions may have very different implications for the outcome of interest. In this paper we consider mediation analysis when multiple versions of the mediator are present. Specifically, we consider the problem of attempting to decompose a total effect of an exposure on an outcome into the portion through the intermediate and the portion through other pathways. We consider the setting in which there are multiple versions of the mediator but the investigator only has access to data on the particular measurement, not which version of the mediator may have brought that value about. We show that the quantity that is estimated as a natural indirect effect using only the available data does indeed have an interpretation as a particular type of mediated effect; however, the quantity estimated as a natural direct effect in fact captures both a true direct effect and an effect of the exposure on the outcome mediated through the effect of the version of the mediator that is not captured by the mediator measurement. The results are illustrated using two examples from the literature, one in which the versions of the mediator are unknown and another in which the mediator itself has been dichotomized.
PMCID: PMC3771529  PMID: 22475830
16.  Repeated Measures Semiparametric Regression Using Targeted Maximum Likelihood Methodology with Application to Transcription Factor Activity Discovery 
In longitudinal and repeated measures data analysis, the goal is often to determine the effect of a treatment or aspect on a particular outcome (e.g., disease progression). We consider a semiparametric repeated measures regression model, where the parametric component models the effect of the variable of interest and any modification by other covariates. The expectation of this parametric component over the other covariates is a measure of variable importance. Here, we present a targeted maximum likelihood estimator of the finite dimensional regression parameter, which is easily estimated using standard software for generalized estimating equations.
The targeted maximum likelihood method provides double robust and locally efficient estimates of the variable importance parameters and inference based on the influence curve. We demonstrate these properties through simulation under correct and incorrect model specification, and apply our method in practice to estimating the activity of transcription factor (TF) over cell cycle in yeast. We specifically target the importance of SWI4, SWI6, MBP1, MCM1, ACE2, FKH2, NDD1, and SWI5.
The semiparametric model allows us to determine the importance of a TF at specific time points by specifying time indicators as potential effect modifiers of the TF. Our results are promising, showing significant importance trends during the expected time periods. This methodology can also be used as a variable importance analysis tool to assess the effect of a large number of variables such as gene expressions or single nucleotide polymorphisms.
PMCID: PMC3122882  PMID: 21291412
targeted maximum likelihood; semiparametric; repeated measures; longitudinal; transcription factors
17.  A Partial Linear Model in the Outcome Dependent Sampling Setting to Evaluate the Effect of Prenatal PCB Exposure on Cognitive Function in Children 
Biometrics  2010;67(3):876-885.
Outcome-dependent sampling (ODS) has been widely used in biomedical studies because it is a cost effective way to improve study efficiency. However, in the setting of a continuous outcome, the representation of the exposure variable has been limited to the framework of linear models, due to the challenge in terms of both theory and computation. Partial linear models (PLM) are a powerful inference tool to nonparametrically model the relation between an outcome and the exposure variable. In this article, we consider a case study of a partial linear model for data from an ODS design. We propose a semiparametric maximum likelihood method to make inferences with a PLM. We develop the asymptotic properties and conduct simulation studies to show that the proposed ODS estimator can produce a more efficient estimate than that from a traditional simple random sampling design with the same sample size. Using this newly developed method, we were able to explore an open question in epidemiology: whether in utero exposure to background levels of PCBs is associated with children’s intellectual impairment. Our model provides further insights into the relation between low-level PCB exposure and children’s cognitive function. The results shed new light on a body of inconsistent epidemiologic findings.
PMCID: PMC3182522  PMID: 21039397
Cost-effective designs; Empirical likelihood; Outcome dependent sampling; Partial linear model; Polychlorinated biphenyls; P-spline
18.  Identification and efficient estimation of the natural direct effect among the untreated 
Biometrics  2013;69(2):310-317.
The natural direct effect (NDE), or the effect of an exposure on an outcome if an intermediate variable was set to the level it would have been in the absence of the exposure, is often of interest to investigators. In general, the statistical parameter associated with the NDE is difficult to estimate in the non-parametric model, particularly when the intermediate variable is continuous or high dimensional. In this paper we introduce a new causal parameter called the natural direct effect among the untreated, discuss identifiability assumptions, propose a sensitivity analysis for some of the assumptions, and show that this new parameter is equivalent to the NDE in a randomized controlled trial. We also present a targeted minimum loss estimator (TMLE), a locally efficient, double robust substitution estimator for the statistical parameter associated with this causal parameter. The TMLE can be applied to problems with continuous and high dimensional intermediate variables, and can be used to estimate the NDE in a randomized controlled trial with such data. Additionally, we define and discuss the estimation of three related causal parameters: the natural direct effect among the treated, the indirect effect among the untreated and the indirect effect among the treated.
PMCID: PMC3692606  PMID: 23607645
Causal inference; direct effect; indirect effect; mediation analysis; semiparametric models; targeted minimum loss estimation
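For orientation, the verbal definition in this abstract can be written in counterfactual notation. The symbols are assumptions introduced here, not taken from the abstract: A is a binary exposure, M(a) is the value the intermediate variable would take under exposure level a, and Y(a, m) is the outcome under exposure a with the intermediate set to m. The second expression is one plausible reading of "among the untreated", namely the same contrast conditional on A = 0.

```latex
\[
\mathrm{NDE} = E\bigl[\,Y(1, M(0)) - Y(0, M(0))\,\bigr],
\qquad
\mathrm{NDE}_{\text{untreated}} = E\bigl[\,Y(1, M(0)) - Y(0, M(0)) \mid A = 0\,\bigr].
\]
```

Under this reading, randomization makes A independent of the counterfactuals, so conditioning on A = 0 changes nothing and the two quantities coincide, which agrees with the equivalence the abstract reports for randomized controlled trials.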
19.  SIMEX and standard error estimation in semiparametric measurement error models 
SIMEX is a general-purpose technique for measurement error correction. There is a substantial literature on the application and theory of SIMEX for purely parametric problems, as well as for purely non-parametric regression problems, but there is neither application nor theory for semiparametric problems. Motivated by an example involving radiation dosimetry, we develop the basic theory for SIMEX in semiparametric problems using kernel-based estimation methods. This includes situations in which the mismeasured variable is modeled purely parametrically, purely non-parametrically, or in which the mismeasured variable has components that are modeled both parametrically and nonparametrically. Using our asymptotic expansions, easily computed standard error formulae are derived, as are the bias properties of the nonparametric estimator. The standard error method represents a new method for estimating variability of nonparametric estimators in semiparametric problems, and we show in both simulations and in our example that it improves dramatically on first order methods.
We find that for estimating the parametric part of the model, standard bandwidth choices of order O(n^{-1/5}) are sufficient to ensure asymptotic normality, and undersmoothing is not required. SIMEX has the property that it fits misspecified models, namely ones that ignore the measurement error. Our work thus also more generally describes the behavior of kernel-based methods in misspecified semiparametric problems.
PMCID: PMC2710855  PMID: 19609371
Berkson measurement errors; measurement error; misspecified models; nonparametric regression; radiation epidemiology; semiparametric models; SIMEX; simulation-extrapolation; standard error estimation; uniform expansions
20.  Genetic Architecture Promotes the Evolution and Maintenance of Cooperation 
PLoS Computational Biology  2013;9(11):e1003339.
When cooperation has a direct cost and an indirect benefit, a selfish behavior is more likely to be selected for than an altruistic one. Kin and group selection do provide evolutionary explanations for the stability of cooperation in nature, but we still lack the full understanding of the genomic mechanisms that can prevent cheater invasion. In our study we used Aevol, an agent-based, in silico genomic platform to evolve populations of digital organisms that compete, reproduce, and cooperate by secreting a public good for tens of thousands of generations. We found that cooperating individuals may share a phenotype, defined as the amount of public good produced, but have very different abilities to resist cheater invasion. To understand the underlying genetic differences between cooperator types, we performed bio-inspired genomics analyses of our digital organisms by recording and comparing the locations of metabolic and secretion genes, as well as the relevant promoters and terminators. Association between metabolic and secretion genes (promoter sharing, overlap via frame shift or sense-antisense encoding) was characteristic for populations with robust cooperation and was more likely to evolve when secretion was costly. In mutational analysis experiments, we demonstrated the potential evolutionary consequences of the genetic association by performing a large number of mutations and measuring their phenotypic and fitness effects. The non-cooperating mutants arising from the individuals with genetic association were more likely to have metabolic deleterious mutations that eventually lead to selection eliminating such mutants from the population due to the accompanying fitness decrease. Effectively, cooperation evolved to be protected and robust to mutations through entangled genetic architecture. 
Our results confirm the importance of second-order selection on evolutionary outcomes, uncover an important genetic mechanism for the evolution and maintenance of cooperation, and suggest promising methods for preventing gene loss in synthetically engineered organisms.
Author Summary
Cooperation is a much studied and debated phenomenon in the microbial world marked by a key question: Given the survival-of-the-fittest evolutionary paradigm, why do individuals act in seemingly altruistic ways, paying a cost to help others? Kin selection and group selection, together with mathematical tools from areas such as economics and game theory, have provided some answers. However, they largely ignored the underlying genetic and genomic mechanisms that drive the evolution of cooperation. In this study, we show that the architecture of the genomes has a major role in shaping the fate of cooperating populations. Specifically, we use an in silico evolution platform and discover that genes for cooperative traits are “hiding” behind metabolic ones by overlapping their sequences or sharing operons. In conditions where cheaters may outcompete the cooperators, this entangled architecture evolves spontaneously and effectively protects cooperation from invasion by cheater mutants. We describe a novel genetic mechanism for the evolution and maintenance of cooperation and, by taking into account the second order selection pressures on the genomes, highlight the need for going beyond simple game theory models in its study.
PMCID: PMC3836702  PMID: 24278000
21.  Method for Evaluating Multiple Mediators: Mediating Effects of Smoking and COPD on the Association between the CHRNA5-A3 Variant and Lung Cancer Risk 
PLoS ONE  2012;7(10):e47705.
A mediation model explores the direct and indirect effects between an independent variable and a dependent variable by including other variables (or mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator in the study samples is not sampled following the principles of case-control study design. In this case, the mediation analysis using data from case-control studies might lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects as well as the percent mediated than standard regressions. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.
PMCID: PMC3471886  PMID: 23077662
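As a quick arithmetic check of the percentages reported in this abstract, the sketch below sums the three path-specific shares and derives the remainder not explained by the two mediators. The remainder is computed here for illustration and is not a figure quoted in the abstract; the variable names are hypothetical.

```python
# Percent of the genetic association mediated, as reported in the abstract
path_percent = {
    "smoking only": 18.3,
    "COPD only": 30.2,
    "smoking and COPD": 20.6,
}

# Total share explained by the two mediators (the abstract reports 69.1%)
total_mediated = round(sum(path_percent.values()), 1)

# Derived share not explained by the two mediators
unexplained = round(100.0 - total_mediated, 1)

print(f"total mediated: {total_mediated}%, unexplained: {unexplained}%")
```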
22.  A Note on formulae for causal mediation analysis in an odds ratio context 
Epidemiologic methods  2014;2(1):21-31.
In a recent manuscript, VanderWeele and Vansteelandt (American Journal of Epidemiology, 2010;172:1339–1348) (hereafter VWV) build on results due to Judea Pearl on causal mediation analysis and derive simple closed-form expressions for so-called natural direct and indirect effects in an odds ratio context for a binary outcome and a continuous mediator. The expressions obtained by VWV make two key simplifying assumptions: (A) the mediator is normally distributed with constant variance; (B) the binary outcome is rare. Assumption A may not be appropriate in settings where, as can happen in routine epidemiologic applications, the distribution of the mediator variable is highly skewed. However, in this note, the author establishes that under a key assumption of “no mediator-exposure interaction” in the logistic regression model for the outcome, the simple formulae of VWV continue to hold even when the normality assumption of the mediator is dropped. The author further shows that when the “no interaction” assumption is relaxed, the formula of VWV for the natural indirect effect in this setting continues to apply when assumption A is also dropped. However, an alternative formula to that of VWV for the natural direct effect is required in this context and is provided in an appendix. When the disease is not rare, the author replaces assumptions A and B with an assumption C that the mediator follows a so-called Bridge distribution in which case simple closed-form formulae are again obtained for the natural direct and indirect effects.
PMCID: PMC4193811  PMID: 25309848
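The abstract does not state the closed-form expressions themselves. As a hedged sketch of the simplest case it describes (rare outcome, no mediator-exposure interaction), they are commonly written as follows. The parameterization is an assumption introduced here: a logistic outcome model with exposure coefficient θ1 and mediator coefficient θ2, a linear mediator model with exposure coefficient β1, covariates c, and a comparison of exposure levels a versus a*.

```latex
\[
\operatorname{logit} P(Y = 1 \mid a, m, c) = \theta_0 + \theta_1 a + \theta_2 m + \theta_4^{\top} c,
\qquad
E[M \mid a, c] = \beta_0 + \beta_1 a + \beta_2^{\top} c,
\]
\[
\mathrm{OR}^{\mathrm{NDE}} \approx \exp\{\theta_1 (a - a^{*})\},
\qquad
\mathrm{OR}^{\mathrm{NIE}} \approx \exp\{\theta_2 \beta_1 (a - a^{*})\}.
\]
```

In this form the indirect effect is the familiar product of coefficients on the log odds ratio scale, and neither expression involves the mediator's error variance.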
23.  A robust two-way semi-linear model for normalization of cDNA microarray data 
BMC Bioinformatics  2005;6:14.
Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values.
We propose a robust semiparametric method in a two-way semi-linear model (TW-SLM) for normalization of cDNA microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that: (i) the percentage of differentially expressed genes is small; or (ii) the numbers of up- and down-regulated genes are about the same, as required in the LOWESS normalization method. We conduct simulation studies to evaluate the proposed method and use a real data set from a specially designed microarray experiment to compare the performance of the proposed method with that of the LOWESS normalization approach.
The simulation results show that the proposed method performs better than the LOWESS normalization method in terms of mean square errors for estimated gene effects. The results of analysis of the real data set also show that the proposed method yields more consistent results between the direct and the indirect comparisons and also can detect more differentially expressed genes than the LOWESS method.
Our simulation studies and the real data example indicate that the proposed robust TW-SLM method works at least as well as the LOWESS method and works better when the underlying assumptions for the LOWESS method are not satisfied. Therefore, it is a powerful alternative to the existing normalization methods.
PMCID: PMC549200  PMID: 15663789
24.  First-Year Maternal Employment and Child Development in the First Seven Years 
Using data from the first 2 phases of the NICHD Study of Early Child Care, we examine the links between maternal employment in the first 12 months of life and cognitive, social, and emotional outcomes for children at age 3, age 4½, and first grade. Drawing on theory and prior research from developmental psychology as well as economics and sociology, we address three main questions. First, what associations exist between first-year maternal employment and cognitive, social, and emotional outcomes for children over the first seven years of life? Second, to what extent do any such associations vary by the child’s gender and temperament, or the mother’s occupation? Third, to what extent do mother’s earnings, the home environment (maternal depressive symptoms, sensitivity, and HOME scores), and the type and quality of child care mediate or offset any associations between first-year employment and child outcomes, and what is the net effect of first-year maternal employment once these factors are taken into account?
We compare families in which mothers worked full time (55%), part time (23%), or did not work (22%) in the first year for non-Hispanic white children (N=900) and for African-American children (N=113). Comparisons are also made taking into account the timing of mothers’ employment within the first year. A rich set of control variables is included. OLS and SEM analyses are constructed.
With regard to cognitive outcomes, first, we find that full-time maternal employment in the first 12 months of life (but not part-time employment) is associated with significantly lower scores on some, but not all, measures of cognitive development at age 3, 4 ½, and first grade for non-Hispanic white children, but with no significant associations for the small sample of African-American children. Part-time employment in the first year is associated with higher scores than full-time employment for some measures. Employment in the second and third year of life is not associated with the cognitive outcomes. Second, we examine the role of the child’s gender and temperament and the mother’s occupation in moderating the associations between first-year maternal employment and cognitive outcomes, but find few significant interactions for either child characteristics or mother’s occupation. Third, we examine the role of an extensive set of potential mediators – the mother’s earnings, the home environment, and the type and quality of child care. We find that mothers who worked full time have higher income in the first year of life and thereafter, that mothers who worked part time have higher HOME and maternal sensitivity scores than mothers who did not work or worked full time, and that mothers who worked either full time or part time were more likely to place their children in high-quality child care by age 3 and 4 ½ and their children spent more time in center-based care by age 4 ½ than in families where mothers did not work in the first year of life. However, we also find some links between first-year maternal employment and elevated levels of maternal depressive symptoms thereafter. Turning to results from structural equation modeling, we find that the overall effects of first-year maternal employment on the cognitive outcomes are neutral. 
This occurs because significantly negative direct effects of full-time first-year employment are offset by significantly positive indirect effects working through more use of center-based care and greater maternal sensitivity by age 4 ½.
Regarding social and emotional outcomes, several findings, again limited to non-Hispanic white children, stand out. First, we find no significant associations between first-year maternal employment and later social and emotional outcomes (including attachment security) when comparing children whose mothers worked full-time or part-time in the first year with the reference group of children whose mothers did not work in the first year, although in models that take the timing of employment within the first year into account, we find some significant associations between full-time maternal employment in the first year and higher levels of caregiver- or teacher-reported externalizing problems at age 4 ½ and first grade. Second, part-time maternal employment by 12 months tends to be associated with fewer externalizing problems at age 4 ½ and first grade than full time maternal employment by 12 months. These results are unchanged when we allow for the possibility of moderation by child characteristics or maternal occupation. Third, the results from SEM models indicate that, while neither full-time nor part-time first-year employment has significant total effects on children’s externalizing behavior problems at age 4 ½ or first grade, part-time first-year employment has indirect positive effects, working primarily through differences in the home environment and maternal sensitivity. Another important finding from the SEM models is that center-based care, which is often associated with maternal employment, is not significantly associated with elevated levels of child behavior problems.
Taken together, our findings provide new insight as to the net effects of first-year maternal employment as well as the potential pathways through which associations between first-year maternal employment and later child outcomes, where present, come about. Our SEM results indicate that, on average, the associations between first-year maternal employment and later cognitive, social, and emotional outcomes are neutral, because negative effects, where present, are offset by positive effects. These results confirm that maternal employment in the first year of life may confer both advantages and disadvantages and that for the average non-Hispanic white child, those effects balance each other.
PMCID: PMC4139074  PMID: 25152543
25.  Sensitivity analysis for direct and indirect effects in the presence of exposure-induced mediator-outcome confounders 
Questions of mediation are often of interest in reasoning about mechanisms, and methods have been developed to address these questions. However, these methods make strong assumptions about the absence of confounding. Even if exposure is randomized, there may be mediator-outcome confounding variables. Inference about direct and indirect effects is particularly challenging if these mediator-outcome confounders are affected by the exposure because in this case these effects are not identified irrespective of whether data is available on these exposure-induced mediator-outcome confounders. In this paper, we provide a sensitivity analysis technique for natural direct and indirect effects that is applicable even if there are mediator-outcome confounders affected by the exposure. We give techniques for both the difference and risk ratio scales and compare the technique to other possible approaches.
PMCID: PMC4287391  PMID: 25580387
Confounding; direct and indirect effects; mediation; sensitivity analysis
