# Related Articles

Consider a study in which the effect of a binary exposure on an outcome operates partly through a binary mediator but measurement of the mediator is nondifferentially misclassified. Suppose that an investigator wishes to estimate the direct and indirect effects of the exposure on the outcome. In this paper, the authors describe a mathematical correspondence between the empirical expressions for the natural direct effect and the effect of exposure among the unexposed standardized by a binary confounder. They then exploit this correspondence to prove that the direction of the bias due to nondifferential measurement error in estimating the natural direct and indirect effects is to overestimate the natural direct effect and underestimate the natural indirect effect.
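The direction of bias described above can be checked numerically in one setting. The sketch below is not the paper's proof. It computes the mediation-formula NDE and NIE (risk-difference scale, no confounding) from the true distribution of a binary mediator, then recomputes them from the observed-data distribution implied by a nondifferentially misclassified mediator with assumed sensitivity `se` and specificity `sp`. All parameter values are hypothetical. Because the marginal means E[Y | A] do not depend on the mediator, the total effect is unchanged, so any upward bias in the NDE is exactly offset by downward bias in the NIE.

```python
# Sketch (not the paper's proof): exact observed-data NDE/NIE when a binary mediator M
# is replaced by a nondifferentially misclassified M*. All values are hypothetical.


def mediation_formula(p_m1, mu):
    """Risk-difference NDE and NIE from the mediation formula (no confounding).

    p_m1[a]    = P(M = 1 | A = a)
    mu[(a, m)] = E[Y | A = a, M = m]
    """
    pm = {a: {1: p_m1[a], 0: 1.0 - p_m1[a]} for a in (0, 1)}
    nde = sum((mu[(1, m)] - mu[(0, m)]) * pm[0][m] for m in (0, 1))
    nie = sum(mu[(1, m)] * (pm[1][m] - pm[0][m]) for m in (0, 1))
    return nde, nie


def misclassify(p_m1, mu, se, sp):
    """Distribution of (A, M*, Y) implied by sensitivity se and specificity sp,
    with M* independent of A and Y given the true M (nondifferential)."""
    p_star, mu_star = {}, {}
    for a in (0, 1):
        p = p_m1[a]
        # joint[(m, s)] = P(M = m, M* = s | A = a)
        joint = {(1, 1): se * p, (1, 0): (1.0 - se) * p,
                 (0, 1): (1.0 - sp) * (1.0 - p), (0, 0): sp * (1.0 - p)}
        p_star[a] = joint[(1, 1)] + joint[(0, 1)]
        for s in (0, 1):
            denom = joint[(1, s)] + joint[(0, s)]
            mu_star[(a, s)] = (mu[(a, 1)] * joint[(1, s)]
                               + mu[(a, 0)] * joint[(0, s)]) / denom
    return p_star, mu_star


p_m1 = {0: 0.2, 1: 0.6}
mu = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

true_nde, true_nie = mediation_formula(p_m1, mu)
obs_nde, obs_nie = mediation_formula(*misclassify(p_m1, mu, se=0.8, sp=0.9))
print(f"true NDE={true_nde:.3f} NIE={true_nie:.3f}")
print(f"observed NDE={obs_nde:.3f} NIE={obs_nie:.3f}")
```

With these values the true NDE and NIE are 0.100 and 0.080, while the misclassified mediator gives about 0.142 and 0.038. One example only illustrates the claimed direction; it does not establish it in general.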

doi:10.1093/aje/kws131

PMCID: PMC3530348
PMID: 22930481

bias (epidemiology); confounding factors (epidemiology); epidemiologic methods; measurement error; mediating factors

Recently, researchers have used a potential-outcome framework to estimate causally interpretable direct and indirect effects of an intervention or exposure on an outcome. One approach to causal-mediation analysis uses the so-called mediation formula to estimate the natural direct and indirect effects. This approach generalizes classical mediation estimators and allows for arbitrary distributions for the outcome variable and mediator. A limitation of the standard (parametric) mediation formula approach is that it requires a specified mediator regression model and distribution; such a model may be difficult to construct and may not be of primary interest. To address this limitation, we propose a new method for causal-mediation analysis that uses the empirical distribution function, thereby avoiding parametric distribution assumptions for the mediator. In order to adjust for confounders of the exposure-mediator and exposure-outcome relationships, inverse-probability weighting is incorporated based on a supplementary model of the probability of exposure. This method, which yields estimates of the natural direct and indirect effects for a specified reference group, is applied to data from a cohort study of dental caries in very-low-birth-weight adolescents to investigate the oral-hygiene index as a possible mediator. Simulation studies show low bias in the estimation of direct and indirect effects in a variety of distribution scenarios, whereas the standard mediation formula approach can be considerably biased when the distribution of the mediator is incorrectly specified.

doi:10.1097/EDE.0b013e31826c2bb9

PMCID: PMC3773310
PMID: 23007042

Recent theory in causal inference has provided concepts for mediation analysis and effect decomposition that allow one to decompose a total effect into a direct and an indirect effect. Here, it is shown that what is often taken as an indirect effect can in fact be further decomposed into a “pure” indirect effect and a mediated interactive effect, thus yielding a three-way decomposition of a total effect (direct, indirect, and interactive). This three-way decomposition applies to difference scales and also to additive ratio scales and additive hazard scales. Assumptions needed for the identification of each of these three effects are discussed and simple formulae are given for each when regression models allowing for interaction are used. The three-way decomposition is illustrated by examples from genetic and perinatal epidemiology, and discussion is given to what is gained over the traditional two-way decomposition into simply a direct and an indirect effect.
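A minimal sketch of the three-way decomposition, assuming a linear outcome model with exposure-mediator interaction and a linear mediator model, with covariates omitted for brevity. The coefficient values are hypothetical. The counterfactual-mean helper is used only to confirm that the pure direct effect, pure indirect effect, and mediated interaction add up to the total effect. For a = 1 and a* = 0, the usual two-way natural indirect effect equals PIE + INT.

```python
# Three-way decomposition under linear models (covariates omitted):
#   E[Y | a, m] = t0 + t1*a + t2*m + t3*a*m
#   E[M | a]    = b0 + b1*a
# Coefficient values are hypothetical, not estimates from the paper.


def three_way(theta, beta, a=1.0, a_star=0.0):
    _, t1, t2, t3 = theta
    b0, b1 = beta
    d = a - a_star
    pde = (t1 + t3 * (b0 + b1 * a_star)) * d   # pure direct effect
    pie = (t2 + t3 * a_star) * b1 * d          # pure indirect effect
    int_med = t3 * b1 * d ** 2                 # mediated interaction
    return pde, pie, int_med


def counterfactual_mean(theta, beta, a, a_m):
    """E[Y_{a, M_{a_m}}]; the outcome model is linear in M, so only E[M_{a_m}] is needed."""
    t0, t1, t2, t3 = theta
    b0, b1 = beta
    return t0 + t1 * a + (t2 + t3 * a) * (b0 + b1 * a_m)


theta = (1.0, 0.5, 0.8, 0.3)
beta = (0.2, 0.6)
pde, pie, int_med = three_way(theta, beta)
total = counterfactual_mean(theta, beta, 1, 1) - counterfactual_mean(theta, beta, 0, 0)
print(f"PDE={pde:.2f} PIE={pie:.2f} INT={int_med:.2f} TE={total:.2f}")
# prints: PDE=0.56 PIE=0.48 INT=0.18 TE=1.22
```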

doi:10.1097/EDE.0b013e318281a64e

PMCID: PMC3563853
PMID: 23354283

Summary

We consider a class of semiparametric normal transformation models for right-censored bivariate failure times. Nonparametric hazard rate models are transformed to a standard normal model, and a joint normal distribution is assumed for the bivariate vector of transformed variates. A semiparametric maximum likelihood estimation procedure is developed for estimating the marginal survival distribution and the pairwise correlation parameters. This produces an efficient estimator of the correlation parameter of the semiparametric normal transformation model, which characterizes the dependence between the two survival outcomes. In addition, a simple positive-mass-redistribution algorithm can be used to implement the estimation procedures. Because the likelihood function involves infinite-dimensional parameters, empirical process theory is used to study the asymptotic properties of the proposed estimators, which are shown to be consistent, asymptotically normal and semiparametric efficient. A simple estimator for the variance of the estimates is also derived. The finite-sample performance is evaluated via extensive simulations.
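The normal transformation idea can be illustrated for uncensored pairs with a rank-based stand-in: map each margin to standard-normal scores Z = Φ⁻¹(F(T)) using its empirical CDF, then correlate the scores. This is a sketch under the assumption of no censoring and no ties; it is not the paper's semiparametric maximum likelihood procedure, which handles right censoring. The simulated margins are hypothetical; any increasing transformation of the margins leaves the normal scores unchanged.

```python
# Rank-based sketch of the normal-transformation idea for *uncensored* pairs.
# Not the paper's semiparametric MLE, which also handles right censoring.
import numpy as np
from statistics import NormalDist


def normal_scores(t):
    """Phi^{-1} of the empirical CDF, with ranks scaled by n + 1 to stay inside (0, 1).
    Assumes no ties."""
    n = len(t)
    ranks = np.argsort(np.argsort(t)) + 1
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(r / (n + 1)) for r in ranks])


rng = np.random.default_rng(7)
rho = 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
# Hypothetical monotone margins for the two failure times
t1 = np.exp(z[:, 0])             # lognormal failure times
t2 = np.exp(2.0 * z[:, 1] + 1)   # a different positive margin

rho_hat = np.corrcoef(normal_scores(t1), normal_scores(t2))[0, 1]
print(round(rho_hat, 2))
```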

doi:10.1093/biomet/asn049

PMCID: PMC2600666
PMID: 19079778

Asymptotic normality; Bivariate failure time; Consistency; Semiparametric efficiency; Semiparametric maximum likelihood estimate; Semiparametric normal transformation

The causal inference literature has provided definitions of direct and indirect effects based on counterfactuals that generalize the approach found in the social science literature. However, these definitions presuppose well defined hypothetical interventions on the mediator. In many settings there may be multiple ways to fix the mediator to a particular value and these different hypothetical interventions may have very different implications for the outcome of interest. In this paper we consider mediation analysis when multiple versions of the mediator are present. Specifically, we consider the problem of attempting to decompose a total effect of an exposure on an outcome into the portion through the intermediate and the portion through other pathways. We consider the setting in which there are multiple versions of the mediator but the investigator only has access to data on the particular measurement, not which version of the mediator may have brought that value about. We show that the quantity that is estimated as a natural indirect effect using only the available data does indeed have an interpretation as a particular type of mediated effect; however, the quantity estimated as a natural direct effect in fact captures both a true direct effect and an effect of the exposure on the outcome mediated through the effect of the version of the mediator that is not captured by the mediator measurement. The results are illustrated using two examples from the literature, one in which the versions of the mediator are unknown and another in which the mediator itself has been dichotomized.

doi:10.1097/EDE.0b013e31824d5fe7

PMCID: PMC3771529
PMID: 22475830

Summary

The goal of mediation analysis is to assess direct and indirect effects of a treatment or exposure on an outcome. More generally, we may be interested in mediation in the context of a causal model characterized by a directed acyclic graph (DAG), where mediation via a specific path from exposure to outcome may involve an arbitrary number of links (or ‘stages’). Methods for estimating mediation (or pathway) effects are available for a continuous outcome and a continuous mediator related via a linear model, while for a categorical outcome or categorical mediator, methods are usually limited to two-stage mediation. We present a method applicable to multiple stages of mediation and mixed variable types using generalized linear models. We define pathway effects using a potential outcomes framework and present a general formula that provides the effect of exposure through any specified pathway. Some pathway effects are nonidentifiable, and their estimation requires an assumption regarding the correlation between counterfactuals. We provide a sensitivity analysis to assess the impact of this assumption. Confidence intervals for pathway effect estimates are obtained via a bootstrap method. The method is applied to a cohort study of dental caries in very low birth weight adolescents. A simulation study demonstrates low bias of pathway effect estimators and close-to-nominal coverage rates of confidence intervals. We also find low sensitivity to the counterfactual correlation in most scenarios.

doi:10.1111/j.1541-0420.2010.01547.x

PMCID: PMC3139764
PMID: 21306353

Copula; Generalized linear model; G-computation algorithm; Path analysis; Potential outcome; Sensitivity analysis

Background

Relative survival is commonly used for studying survival of cancer patients as it captures both the direct and indirect contribution of a cancer diagnosis on mortality by comparing the observed survival of the patients to the expected survival in a comparable cancer-free population. However, existing methods do not allow estimation of the impact of isolated conditions (e.g., excess cardiovascular mortality) on the total excess mortality. For this purpose we extend flexible parametric survival models for relative survival, which use restricted cubic splines for the baseline cumulative excess hazard and for any time-dependent effects.

Methods

In the extended model we partition the excess mortality associated with a diagnosis of cancer by estimating a separate baseline excess hazard function for each of the outcomes under investigation. This is done by incorporating mutually exclusive background mortality rates, stratified by the underlying causes of death reported in the Swedish population, and by introducing cause of death as a time-dependent effect in the extended model. This approach enables simultaneous modeling of temporal trends in, for example, excess cardiovascular mortality and the remaining cancer excess mortality. Furthermore, we illustrate how the results from the proposed model can be used to derive crude probabilities of death due to the component parts, i.e., probabilities estimated in the presence of competing causes of death.
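Crude probabilities of death by cause can be sketched from piecewise-constant hazards. The sketch assumes that the all-cause hazard is the sum of the expected (background) hazard and the excess components, as in relative survival models. Within each interval, deaths are split among causes in proportion to their hazards. The hazard values and cause labels are hypothetical and are not output from the proposed model. Because the causes are competing, the crude probabilities plus the all-cause survival sum to one.

```python
# Crude probabilities of death by cause from piecewise-constant hazards, assuming the
# all-cause hazard is the sum of the expected (background) hazard and the excess
# components. All hazard values are hypothetical, per person-year.
import numpy as np


def crude_probabilities(hazards, width):
    """hazards: dict cause -> hazard rate in each interval (same length for all causes).
    Returns (dict cause -> cumulative crude probability at each interval end,
             all-cause survival at each interval end)."""
    causes = list(hazards)
    h = np.vstack([np.asarray(hazards[c], dtype=float) for c in causes])  # shape (K, J)
    h_tot = h.sum(axis=0)
    surv_end = np.exp(-np.cumsum(h_tot * width))
    surv_start = np.concatenate(([1.0], surv_end[:-1]))
    # probability of dying from any cause within each interval
    p_die = surv_start * (1.0 - np.exp(-h_tot * width))
    # within an interval, deaths are split in proportion to each cause's hazard
    crude = {c: np.cumsum(p_die * h[k] / h_tot) for k, c in enumerate(causes)}
    return crude, surv_end


hazards = {
    "expected (background)": [0.010, 0.011, 0.012, 0.013, 0.014],
    "excess cardiovascular": [0.004, 0.005, 0.005, 0.006, 0.006],
    "remaining cancer excess": [0.080, 0.060, 0.040, 0.030, 0.020],
}
crude, surv = crude_probabilities(hazards, width=1.0)
for cause, probs in crude.items():
    print(f"{cause}: {probs[-1]:.3f} by year 5")
print(f"alive at year 5: {surv[-1]:.3f}")
```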

Results

The method is illustrated with examples where the total excess mortality experienced by patients diagnosed with breast cancer is partitioned into excess cardiovascular mortality and remaining cancer excess mortality.

Conclusions

The proposed method can be used to simultaneously study disease patterns and temporal trends for various causes of cancer-consequent deaths. Such information should be of interest for patients and clinicians as one way of improving prognosis after cancer is through adapting treatment strategies and follow-up of patients towards reducing the excess mortality caused by side effects of the treatment.

doi:10.1186/1471-2288-12-86

PMCID: PMC3526518
PMID: 22726307

Survival analysis; Cancer; Relative survival; Regression models; Competing risks

A mediation model explores the direct and indirect effects of an independent variable on a dependent variable by including other variables (mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, estimates of the association between the genetic variant and the mediator can be biased, because in a case-control study the presence or absence of the mediator is not sampled according to the principles of case-control design. Consequently, mediation analysis of case-control data may yield biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct the bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects, as well as of the percent mediated, than standard regression. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.

doi:10.1371/journal.pone.0047705

PMCID: PMC3471886
PMID: 23077662

This study compares methods for analyzing correlated survival data from physician-randomized trials of health care quality improvement interventions. Several methods have been proposed to adjust for correlated survival data; however, the most suitable method is unknown. Using the characteristics of our study example, we performed three simulation studies to compare conditional, marginal, and non-parametric methods for analyzing clustered survival data. We simulated 1,000 datasets using a shared frailty model with (1) fixed cluster size, (2) variable cluster size, and (3) non-lognormal random effects. Methods of analysis included the nonlinear mixed model (conditional), the marginal proportional hazards model with robust standard errors (marginal), and the clustered logrank test and clustered permutation test (non-parametric). For each method considered, we estimated Type I error, power, mean squared error, and the coverage probability of the treatment effect estimator. We observed underestimated Type I error for the clustered logrank test. The marginal proportional hazards method performed well even when model assumptions were violated. Nonlinear mixed models were advantageous only when the distribution was correctly specified.
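One data-generating mechanism of the kind described above can be sketched as follows: physicians (clusters) are randomized to treatment, each cluster shares a lognormal frailty, event times are exponential given the frailty, and follow-up ends at an administrative censoring time. Function and parameter names and all values are hypothetical; the study's actual simulation settings are not given in the abstract. Scenario (3) could be mimicked by drawing the log-frailty from a non-normal distribution instead.

```python
# Sketch of a shared-frailty data-generating mechanism for a physician-randomized trial:
# exponential baseline hazard, lognormal cluster frailty, administrative censoring.
# All names and parameter values are hypothetical.
import numpy as np


def simulate_clustered_survival(cluster_sizes, beta, sigma, base_rate, tau, rng):
    n_clusters = len(cluster_sizes)
    # randomize physicians (clusters) to treatment, as balanced as possible
    treat_cluster = rng.permutation(np.arange(n_clusters) % 2)
    frailty = rng.normal(0.0, sigma, n_clusters)   # log-frailty shared within a cluster
    cluster = np.repeat(np.arange(n_clusters), cluster_sizes)
    treat = treat_cluster[cluster]
    rate = base_rate * np.exp(beta * treat + frailty[cluster])
    t_event = rng.exponential(1.0 / rate)          # scale = 1 / hazard
    time = np.minimum(t_event, tau)
    event = t_event <= tau
    return cluster, treat, time, event


rng = np.random.default_rng(11)
# scenario (1): fixed cluster size; scenario (2): variable cluster size
fixed = simulate_clustered_survival(np.full(40, 25), beta=-0.3, sigma=0.5,
                                    base_rate=0.1, tau=5.0, rng=rng)
variable = simulate_clustered_survival(rng.poisson(25, 40) + 1, beta=-0.3, sigma=0.5,
                                       base_rate=0.1, tau=5.0, rng=rng)
print(f"event proportion: fixed={fixed[3].mean():.2f}, variable={variable[3].mean():.2f}")
```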

doi:10.1016/j.cct.2011.08.008

PMCID: PMC3849400
PMID: 21924382

Cluster Randomized Trials; Survival Analysis; Physician-Randomized Trials; Permutation Test; Simulation Study; Shared Frailty Model

For dichotomous outcomes, the authors discuss when the standard approaches to mediation analysis used in epidemiology and the social sciences are valid, and they provide alternative mediation analysis techniques when the standard approaches will not work. They extend definitions of controlled direct effects and natural direct and indirect effects from the risk difference scale to the odds ratio scale. A simple technique to estimate direct and indirect effect odds ratios by combining logistic and linear regressions is described that applies when the outcome is rare and the mediator continuous. Further discussion is given as to how this mediation analysis technique can be extended to settings in which data come from a case-control study design. For the standard mediation analysis techniques used in the epidemiologic and social science literatures to be valid, an assumption of no interaction between the effects of the exposure and the mediator on the outcome is needed. The approach presented here, however, will apply even when there are interactions between the effect of the exposure and the mediator on the outcome.
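The kind of closed-form expressions that result from combining the two regressions can be sketched as follows. They are derived here, not quoted from the paper, under two assumptions: a logistic outcome model with an exposure-mediator interaction, approximated as log-linear because the outcome is rare, and a normal linear mediator model with residual variance `sigma2`. Here `beta2c` stands for the covariate term of the mediator model at the chosen covariate level. Check the expressions against the paper before relying on them. The coefficient values are hypothetical. Setting the interaction `theta3` to zero recovers the familiar product-of-coefficients form for the indirect effect.

```python
# Closed-form OR-scale effects under (i) a rare-outcome logistic model
# logit P(Y=1|a,m,c) = theta0 + theta1*a + theta2*m + theta3*a*m + theta4'c and
# (ii) a normal linear mediator model M = beta0 + beta1*a + beta2'c + e, Var(e) = sigma2.
# Derived here for illustration; coefficient values are hypothetical.
import math


def or_effects(theta1, theta2, theta3, beta0, beta1, sigma2,
               a=1.0, a_star=0.0, m=0.0, beta2c=0.0):
    d = a - a_star
    or_cde = math.exp((theta1 + theta3 * m) * d)   # controlled direct effect at M = m
    log_nde = ((theta1 + theta3 * (beta0 + beta1 * a_star + beta2c + theta2 * sigma2)) * d
               + 0.5 * theta3 ** 2 * sigma2 * (a ** 2 - a_star ** 2))
    log_nie = (theta2 + theta3 * a) * beta1 * d
    return or_cde, math.exp(log_nde), math.exp(log_nie)


or_cde, or_nde, or_nie = or_effects(theta1=0.4, theta2=0.3, theta3=0.1,
                                    beta0=1.0, beta1=0.5, sigma2=2.0)
print(f"OR_CDE(m=0)={or_cde:.3f}  OR_NDE={or_nde:.3f}  OR_NIE={or_nie:.3f}")
```

On this scale the total-effect odds ratio is the product OR_NDE × OR_NIE.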

doi:10.1093/aje/kwq332

PMCID: PMC2998205
PMID: 21036955

case-control studies; causal inference; decomposition; dichotomous response; epidemiologic methods; interaction; logistic regression; odds ratio

This article discusses a method by Erikson et al. (2005) for decomposing a total effect in a logit model into direct and indirect effects. Moreover, this article extends this method in three ways. First, in the original method the variable through which the indirect effect occurs is assumed to be normally distributed. In this article the method is generalized by allowing this variable to have any distribution. Second, the original method did not provide standard errors for the estimates. In this article the bootstrap is proposed as a method of providing those. Third, I show how to include control variables in this decomposition, which was not allowed in the original method. The original method and these extensions are implemented in the ldecomp package.
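A sketch of a decomposition in the spirit of the method described above follows; it is not the `ldecomp` implementation. Counterfactual proportions come from averaging a logit model's predicted probabilities over the empirical mediator values observed in each exposure group, so no distribution is assumed for the mediator. The total log odds ratio then splits exactly into a direct part and an indirect part. The coefficients and mediator data are hypothetical. In practice the coefficients would come from a fitted logit model, and the bootstrap would supply standard errors.

```python
# Sketch of an Erikson-style decomposition on the log-odds scale, averaging a logit
# model's predictions over the *empirical* mediator values in each group.
# Coefficients and data are hypothetical; this is not the ldecomp implementation.
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def counterfactual_prob(coef, a, m_values):
    """P(Y=1) if everyone had exposure a but the mediator values in m_values."""
    g0, g1, g2 = coef
    return float(np.mean(expit(g0 + g1 * a + g2 * np.asarray(m_values))))


def decompose(coef, m_a0, m_a1):
    p00 = counterfactual_prob(coef, 0, m_a0)
    p10 = counterfactual_prob(coef, 1, m_a0)
    p11 = counterfactual_prob(coef, 1, m_a1)
    total = logit(p11) - logit(p00)      # log odds ratio between the factual groups
    direct = logit(p10) - logit(p00)     # change exposure, keep mediator values of a=0
    indirect = logit(p11) - logit(p10)   # change mediator values, keep exposure at 1
    return total, direct, indirect


rng = np.random.default_rng(3)
m_a0 = rng.exponential(1.0, 2000)   # skewed mediator in the unexposed group
m_a1 = rng.exponential(1.5, 2000)   # shifted upward in the exposed group
total, direct, indirect = decompose((-1.0, 0.5, 0.4), m_a0, m_a1)
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```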

PMCID: PMC3314333
PMID: 22468140

st0001; ldecomp; mediation; intervening variable; logit

Background

Linear mixed effects models (LMMs) are a common approach for analyzing longitudinal data in a variety of settings. Although LMMs may be applied to complex data structures, such as settings where mediators are present, it is unclear whether they perform well relative to methods for mediational analyses such as structural equation models (SEMs), which have obvious appeal in such settings. For some researchers, SEMs may be more difficult than LMMs to implement, e.g., because of lack of training in the methodology or the need for specialized SEM software. It is therefore of interest to evaluate whether the LMM performs adequately in a scenario particularly suited to SEMs. We focus on evaluation of the total effect (i.e., direct plus indirect) of an exposure on an outcome of interest when a mediating factor is present. Our aim is to explore whether the LMM performs as well as the SEM in a setting that is conducive to using the SEM.

Methods

We simulated mediated longitudinal data from an SEM where a binary, main independent variable has both direct and indirect effects on a continuous outcome. We conducted analyses with both the LMM and SEM to evaluate the performance of the LMM in a setting where the SEM is expected to be preferable. Models were evaluated with respect to bias, coverage probability and power. Sample size, effect size and error distribution of the simulated data were varied.

Results

Both models performed well in a range of settings. Marginal increases in power were observed for the SEM, although generally there were no major differences in performance. Power for both models was good with a sample size of 250 and a small to medium effect size. Bias did not substantially increase for either model when data were generated from distributions that were both skewed and kurtotic.

Conclusions

In settings where the goal is to evaluate the overall effects, the LMM excluding mediating variables appears to have good performance with respect to power, bias and coverage probability relative to the SEM. The major benefit of SEMs is that they simultaneously and efficiently model both the direct and indirect effects of the mediation process.

doi:10.1186/1471-2288-10-16

PMCID: PMC2842282
PMID: 20170503

In occupational epidemiologic studies, the healthy-worker survivor effect refers to a process that leads to bias in the estimates of an association between cumulative exposure and a health outcome. In these settings, work status acts both as an intermediate and confounding variable, and may violate the positivity assumption (the presence of exposed and unexposed observations in all strata of the confounder). Using Monte Carlo simulation, we assess the degree to which crude, work-status adjusted, and weighted (marginal structural) Cox proportional hazards models are biased in the presence of time-varying confounding and nonpositivity. We simulate data representing time-varying occupational exposure, work status, and mortality. Bias, coverage, and root mean squared error (MSE) were calculated relative to the true marginal exposure effect in a range of scenarios. For a base-case scenario, using crude, adjusted, and weighted Cox models, respectively, the hazard ratio was biased downward 19%, 9%, and 6%; 95% confidence interval coverage was 48%, 85%, and 91%; and root MSE was 0.20, 0.13, and 0.11. Although marginal structural models were less biased in most scenarios studied, neither standard nor marginal structural Cox proportional hazards models fully resolve the bias encountered under conditions of time-varying confounding and nonpositivity.

doi:10.1097/EDE.0b013e31822549e8

PMCID: PMC3155387
PMID: 21747286

Summary

Outcome-dependent sampling (ODS) has been widely used in biomedical studies because it is a cost-effective way to improve study efficiency. However, in the setting of a continuous outcome, the representation of the exposure variable has been limited to the framework of linear models, owing to both theoretical and computational challenges. Partial linear models (PLMs) are a powerful inference tool for nonparametrically modeling the relation between an outcome and the exposure variable. In this article, we consider a case study of a partial linear model for data from an ODS design. We propose a semiparametric maximum likelihood method to make inferences with a PLM. We develop the asymptotic properties and conduct simulation studies to show that the proposed ODS estimator can produce a more efficient estimate than that from a traditional simple random sampling design with the same sample size. Using this newly developed method, we were able to explore an open question in epidemiology: whether in utero exposure to background levels of polychlorinated biphenyls (PCBs) is associated with children’s intellectual impairment. Our model provides further insights into the relation between low-level PCB exposure and children’s cognitive function. The results shed new light on a body of inconsistent epidemiologic findings.

doi:10.1111/j.1541-0420.2010.01500.x

PMCID: PMC3182522
PMID: 21039397

Cost-effective designs; Empirical likelihood; Outcome dependent sampling; Partial linear model; Polychlorinated biphenyls; P-spline

In many biomedical studies, budget constraints mean that the primary covariate is collected only in a randomly selected subset of the full study cohort. Often, an inexpensive auxiliary covariate for the primary exposure variable is readily available for all cohort subjects. Valid statistical methods that use the auxiliary information to improve study efficiency are needed. To this end, we develop an estimated partial likelihood approach for correlated failure time data with auxiliary information. We assume a marginal hazard model with a common baseline hazard function. The asymptotic properties of the proposed estimators are developed. The proof of the asymptotic results is nontrivial because the moments used in the estimating equation are not martingale-based, so classical martingale theory is not sufficient. Instead, our proofs rely on modern empirical process theory. The proposed estimator is evaluated through simulation studies and is shown to have increased efficiency compared to existing methods. The proposed methods are illustrated with a data set from the Framingham study.

doi:10.1007/s10985-011-9209-x

PMCID: PMC3259288
PMID: 22094533

Marginal hazard model; Correlated failure time; Validation set; Auxiliary covariate

We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins.

doi:10.1093/biomet/asp082

PMCID: PMC3633199
PMID: 23613620

Binomial modelling; Copula function; Cross-odds ratio; Cumulative incidence function; Danish twin data; Estimating equation; Inverse-censoring probability weighting; Two-stage estimation

Objectives

To build upon state-of-the-art theory and empirical data to estimate the strength of multiple mediators of the efficacious Keep Active Minnesota (KAM) physical activity (PA) maintenance intervention.

Methods

The total, direct, and indirect effects through which KAM helped randomized participants (KAM n=523; usual care [UC] n=526) maintain moderate or vigorous PA (MVPA) for up to 2 years were estimated using structural equation modeling.

Results

Multiple mediators explained half (β=.052, P=.13) of the effect of KAM on MVPA (β=.105, P=.004). Self-efficacy was the upstream variable in 2 endogenously mediated effects, and the self-concept mediator emerged as the strongest predictor of MVPA.

Conclusions

KAM positively impacted self-efficacy, which was associated with PA enjoyment, integration into the self-concept, and PA maintenance. Successful long-term PA maintenance appears to be influenced by multiple small interrelated mediational pathways. Future research evaluating maintenance models should specify recursive relationships among mediators and outcomes.

PMCID: PMC3319762
PMID: 20604700

maintenance; physical activity; multiple mediation; behavioral intervention; structural equation modeling

The hazard ratio provides a natural target for assessing a treatment effect with survival data, with the Cox proportional hazards model providing a widely used special case. In general, the hazard ratio is a function of time and provides a visual display of the temporal pattern of the treatment effect. A variety of nonproportional hazards models have been proposed in the literature. However, available methods for flexibly estimating a possibly time-dependent hazard ratio are limited. Here, we investigate a semiparametric model that allows a wide range of time-varying hazard ratio shapes. Point estimates as well as pointwise confidence intervals and simultaneous confidence bands of the hazard ratio function are established under this model. The average hazard ratio function is also studied to assess the cumulative treatment effect. We illustrate corresponding inference procedures using coronary heart disease data from the Women's Health Initiative estrogen plus progestin clinical trial.

doi:10.1093/biostatistics/kxq061

PMCID: PMC3062151
PMID: 20860993

Clinical trial; Empirical process; Gaussian process; Hazard ratio; Simultaneous inference; Survival analysis; Treatment–time interaction

This paper presents marginal structural models (MSMs) with inverse propensity weighting (IPW) for assessing mediation. Generally, individuals are not randomly assigned to levels of the mediator. Therefore, confounders of the mediator and outcome may exist that limit causal inference, which is a goal of mediation analysis. Either regression adjustment or IPW can be used to take confounding into account, but IPW has several advantages. Regression adjustment for even one confounder of the mediator and outcome that has been influenced by treatment results in biased estimates of the direct effect (i.e., the effect of treatment on the outcome that does not go through the mediator). One advantage of IPW is that it can properly adjust for this type of confounding, assuming there are no unmeasured confounders. Further, we illustrate that IPW estimation provides unbiased estimates of all effects when there is a baseline moderator variable that interacts with the treatment, when there is a baseline moderator variable that interacts with the mediator, and when the treatment interacts with the mediator. IPW estimation also provides unbiased estimates of all effects in the presence of non-randomized treatments. In addition, we propose a test of the null hypothesis of no mediation. Finally, we illustrate this approach with an empirical data set in which the mediator is continuous, as is often the case in psychological research.
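The contrast between regression adjustment and IPW for a treatment-affected confounder can be sketched with simulated data. A randomized treatment `t` affects a binary confounder `conf`, which in turn affects both a binary mediator and the outcome. Stabilized mediator weights P(M=m | T) / P(M=m | T, L) are estimated from saturated cell frequencies, and a weighted least-squares fit then estimates the MSM E[Y_{t,m}] = c0 + c1·t + c2·m. Because treatment is randomized here, no treatment weights are needed; with non-randomized treatment they would multiply the mediator weights. All data-generating values are hypothetical, and a binary mediator is used for simplicity, although the paper's example has a continuous mediator.

```python
# Sketch: IPW estimation of an MSM E[Y_{t,m}] = c0 + c1*t + c2*m when a confounder of
# the mediator-outcome relation is itself affected by treatment.
# Treatment is randomized, so only mediator weights are needed; values are hypothetical.
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
t = rng.integers(0, 2, n)                                      # randomized treatment
conf = (rng.random(n) < 0.3 + 0.4 * t).astype(int)             # post-treatment confounder
m = (rng.random(n) < 0.2 + 0.2 * t + 0.4 * conf).astype(int)   # binary mediator
y = 1.0 * t + 2.0 * m + 1.5 * conf + rng.normal(0.0, 1.0, n)
# True controlled direct effect: 1.0 + 1.5 * 0.4 = 1.6, because the path
# T -> conf -> Y does not pass through M.


def cell_mean(values, cells, n_cells):
    """Mean of `values` within each cell, broadcast back to the observations."""
    means = np.array([values[cells == k].mean() for k in range(n_cells)])
    return means[cells]


def wls(X, y, w):
    """Weighted least squares via ordinary least squares on sqrt-weighted data."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


# Stabilized mediator weights P(M=m | T) / P(M=m | T, conf), from cell frequencies
p_t = cell_mean(m, t, 2)
p_tl = cell_mean(m, 2 * t + conf, 4)
w = np.where(m == 1, p_t / p_tl, (1 - p_t) / (1 - p_tl))

ones = np.ones(n)
msm = wls(np.column_stack([ones, t, m]), y, w)           # IPW fit of the MSM
adj = wls(np.column_stack([ones, t, m, conf]), y, ones)  # regression adjustment for conf
print(f"IPW: direct={msm[1]:.2f} mediator={msm[2]:.2f}; adjusted direct={adj[1]:.2f}")
```

The IPW estimate of the direct effect should be close to 1.6. The regression-adjusted coefficient is close to 1.0 because conditioning on the confounder blocks the part of the direct effect that runs through it.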

doi:10.1037/a0029311

PMCID: PMC3553264
PMID: 22905648

Summary

Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV.

doi:10.1093/biomet/asq005

PMCID: PMC3412576
PMID: 23049121

Dimension reduction; Inverse probability weighting; Kernel regression; Missing at random; Robustness to model misspecification

The authors use recent methodology in causal inference to disentangle the direct and indirect effects that operate through a mediator in an exposure-response association paradigm. They demonstrate how total effects can be partitioned into direct and indirect effects even when the exposure and mediator interact. The impact of bias due to unmeasured confounding on the exposure-response association is assessed through a series of sensitivity analyses. These methods are applied to a problem in perinatal epidemiology to examine the extent to which the effect of abruption on perinatal mortality is mediated through preterm delivery. Data on over 26 million US singleton births (1995–2002) were utilized. Risks of mortality among abruption and nonabruption births were 102.7 and 6.2 per 1,000 births, respectively. Risk ratios of the natural direct and indirect (preterm delivery-mediated) effects of abruption on mortality were 10.18 (95% confidence interval: 9.80, 10.58) and 1.35 (95% confidence interval: 1.33, 1.38), respectively. The proportion of increased mortality risk mediated through preterm delivery was 28.1%, with even higher proportions associated with deliveries at earlier gestational ages. Sensitivity analyses underscore that the qualitative conclusions of some mediated effects and substantial direct effects are reasonably robust to unmeasured confounding of a fairly considerable magnitude.
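The reported proportion mediated can be roughly reproduced from the rounded risk ratios above. This assumes the authors used the commonly used risk-ratio formula PM = RR_NDE (RR_NIE − 1) / (RR_NDE × RR_NIE − 1); the abstract does not state the formula. Plugging in 10.18 and 1.35 gives about 28.0%. This is close to the reported 28.1%, and the small gap is consistent with rounding of the inputs.

```python
# Proportion mediated on the risk-ratio scale, using the commonly used formula
#   PM = RR_NDE * (RR_NIE - 1) / (RR_NDE * RR_NIE - 1),
# applied to the rounded estimates reported in the abstract.


def proportion_mediated_rr(rr_nde, rr_nie):
    return rr_nde * (rr_nie - 1.0) / (rr_nde * rr_nie - 1.0)


rr_nde, rr_nie = 10.18, 1.35
rr_total = rr_nde * rr_nie   # on the ratio scale, RR_TE = RR_NDE * RR_NIE
print(round(rr_total, 2), round(proportion_mediated_rr(rr_nde, rr_nie), 3))
# prints: 13.74 0.28
```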

doi:10.1093/aje/kwr045

PMCID: PMC3159429
PMID: 21430195

abruptio placentae; bias (epidemiology); causal model; gestational age; perinatal mortality

We tried to obtain preliminary evidence to test the hypothesis that the association between driving exposure and the frequency of reporting a road crash can be decomposed into two paths: direct and indirect (mediated by risky driving patterns). In a cross-sectional study carried out between 2007 and 2010, a sample of 1114 car drivers who were students at the University of Granada completed a questionnaire with items about driving exposure during the previous year, risk-related driving circumstances and involvement in road crashes. We applied the decomposition procedure proposed by Buis for logit models. The indirect path showed a strong dose-response relationship with the frequency of reporting a road crash, whereas the direct path did not. The decomposition procedure was able to identify the indirect path as the main explanatory mechanism for the association between exposure and the frequency of reporting a road crash.

doi:10.1136/injuryprev-2012-040467

PMCID: PMC3717768
PMID: 23129719

Summary

We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study.
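A minimal inverse-probability-of-treatment-weighted sketch of the two population intervention parameters described above follows: the difference E[Y_a] − E[Y] and the ratio E[Y_a] / E[Y] for a binary outcome. Here E[Y_a] is the mean outcome if everyone were set to exposure level a, and E[Y] is the observed mean. The exposure probability is estimated from saturated cell frequencies of a single binary confounder. All data-generating values are hypothetical. The sketch does not implement the doubly robust estimators proposed in the paper.

```python
# IPTW sketch of population intervention parameters E[Y_a] - E[Y] and E[Y_a] / E[Y].
# Confounder, exposure, and outcome models below are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
conf = rng.integers(0, 2, n)                                   # binary confounder
a = (rng.random(n) < 0.2 + 0.5 * conf).astype(int)             # exposure
y = (rng.random(n) < 0.1 + 0.2 * a + 0.2 * conf).astype(int)   # binary outcome


def population_intervention_iptw(y, a, conf, a_level):
    # P(A = 1 | conf) from saturated cell frequencies
    p1 = np.array([a[conf == k].mean() for k in (0, 1)])[conf]
    g = p1 if a_level == 1 else 1.0 - p1
    mean_ya = np.mean((a == a_level) * y / g)   # IPTW estimate of E[Y_a]
    mean_y = y.mean()                           # observed population mean
    return mean_ya - mean_y, mean_ya / mean_y   # risk difference, relative risk


rd, rr = population_intervention_iptw(y, a, conf, a_level=0)
# Population values under this design: RD = 0.20 - 0.29 = -0.09, RR = 0.20 / 0.29
print(f"RD={rd:.3f} RR={rr:.3f}")
```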

PMCID: PMC2464276
PMID: 18629347

Attributable risk; Causal inference; Confounding; Counterfactual; Doubly-robust estimation; G-computation estimation; Inverse-probability-of-treatment-weighted estimation

Recently proposed double-robust estimators for a population mean from incomplete data and for a finite number of counterfactual means can have much higher efficiency than the usual double-robust estimators under misspecification of the outcome model. In this paper, we derive a new class of double-robust estimators for the parameters of regression models with incomplete cross-sectional or longitudinal data, and of marginal structural mean models for cross-sectional data with similar efficiency properties. Unlike the recent proposals, our estimators solve outcome regression estimating equations. In a simulation study, the new estimator shows improvements in variance relative to the standard double-robust estimator that are in agreement with those suggested by asymptotic theory.
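For comparison, the standard double-robust (augmented IPW) estimator of a population mean with responses missing at random can be sketched as below. It is the usual estimator the abstract compares against, not the new class proposed in the paper. For simplicity the response probability π(X) is treated as known, and all data-generating values are hypothetical. The estimator stays consistent when the outcome model is misspecified as a constant, because the response model is correct. The complete-case mean is biased because response depends on X.

```python
# Sketch of the *standard* double-robust (augmented IPW) estimator of a population mean
# with responses missing at random; not the paper's proposed estimator.
# pi(X) is treated as known for simplicity; data-generating values are hypothetical.
import numpy as np


def aipw_mean(y, r, pi, m_hat):
    """mean( R*Y/pi - (R - pi)/pi * m_hat ); missing y values are never used."""
    y_obs = np.where(r == 1, y, 0.0)
    return float(np.mean(r * y_obs / pi - (r - pi) / pi * m_hat))


rng = np.random.default_rng(8)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)     # true E[Y] = 1
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))      # P(response observed | x)
r = (rng.random(n) < pi).astype(int)
y = np.where(r == 1, y, np.nan)            # responses missing at random

obs = r == 1
coef = np.polyfit(x[obs], y[obs], 1)       # correctly specified outcome model
dr_good = aipw_mean(y, r, pi, np.polyval(coef, x))
dr_bad = aipw_mean(y, r, pi, np.full(n, y[obs].mean()))  # misspecified: constant
complete_case = y[obs].mean()
print(f"complete-case={complete_case:.3f} DR(good)={dr_good:.3f} DR(bad)={dr_bad:.3f}")
```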

doi:10.1093/biomet/ass013

PMCID: PMC3635709
PMID: 23843666

Drop-out; Marginal structural model; Missing at random

In this issue of the Journal, VanderWeele and Vansteelandt (Am J Epidemiol. 2010;172(12):1339–1348) provide simple formulae for estimation of direct and indirect effects using standard logistic regression when the exposure and outcome are binary, the mediator is continuous, and the odds ratio is the chosen effect measure. They also provide concisely stated lists of assumptions necessary for estimation of these effects, including various conditional independencies and homogeneity of exposure and mediator effects over covariate strata. They further suggest that this will allow effect decomposition in case-control studies if the sampling fractions and population outcome prevalence are known with certainty. In this invited commentary, the author argues that, in a well-designed case-control study in which the sampling fraction is known, it should not be necessary to rely on the odds ratio. The odds ratio has well-known deficiencies as a causal parameter, and its use severely complicates evaluation of confounding and effect homogeneity. Although VanderWeele and Vansteelandt propose that a rare disease assumption is not necessary for estimation of controlled direct effects using their approach, collapsibility concerns suggest otherwise when the goal is causal inference rather than merely measuring association. Moreover, their clear statement of assumptions necessary for the estimation of natural/pure effects suggests that these quantities will rarely be viable estimands in observational epidemiology.

doi:10.1093/aje/kwq329

PMCID: PMC3139971
PMID: 21036956

causal inference; conditional independence; confounding; decomposition; estimation; interaction; logistic regression; odds ratio