Clinical guidelines that rely on observational data due to the absence of data from randomized trials benefit when the observational data or its analysis emulates trial data or its analysis. In this paper, we review a methodology for emulating trials that compare the effects of different timing strategies, that is, strategies that vary the frequency of delivery of a medical intervention or procedure. We review trial emulation for comparing (i) single applications of the procedure at different times, (ii) fixed schedules of application, and (iii) schedules adapted to the evolving clinical characteristics of the patients. For illustration, we describe an application in which we estimate the effect of surveillance colonoscopies in patients who had an adenoma detected during the Norwegian Colorectal Cancer Prevention (NORCCAP) trial.
Causal inference; Dynamic strategies; Inverse probability weighting; Colorectal cancer; Surveillance; Colonoscopy
Because a comparison of noninitiators and initiators of treatment may be hopelessly confounded, guidelines for the conduct of observational research often recommend using an “active” comparator group consisting of people who initiate a treatment other than the medication of interest. In this paper, we discuss the conditions under which this approach is valid if the goal is to emulate a trial with an inactive comparator.
Identification of Effects:
We provide conditions under which a target trial in a subpopulation can be validly emulated from observational data, using an active comparator that is known or believed to be inactive for the outcome of interest. The average treatment effect in the population as a whole is not identified, but under certain conditions this approach can be used to emulate a trial in the subset of individuals who were treated with the treatment of interest, in the subset of individuals who were treated with the treatment of interest but not with the comparator, or in the subset of individuals who were treated with both the treatment of interest and the active comparator.
The Plausibility of the Comparability Conditions:
We discuss whether the required conditions can be expected to hold in pharmacoepidemiologic research, with a particular focus on whether the conditions are plausible in situations where the standard analysis fails due to unmeasured confounding by access to health care or health seeking behaviors.
The conditions discussed in this paper may at best be approximately true. Investigators using active comparator designs to emulate trials with inactive comparators should exercise caution.
Comparative Effectiveness Research (CER); Methods; Electronic Medical Record (EMR); Evidence Based Medicine
We give a simple proof of Bell’s inequality in quantum mechanics using theory from causal interaction, which, in conjunction with experiments, demonstrates that the local hidden variables assumption is false. The proof sheds light on relationships between the notion of causal interaction and interference between treatments.
Interactions; interference; local reality; quantum physics
Instrumental variable (IV) methods are increasingly being used in comparative effectiveness research. Studies using these methods often compare 2 particular treatments, and the researchers perform their IV analyses conditional on patients' receiving this subset of treatments (while ignoring the third option of “neither treatment”). The ensuing selection bias that occurs due to this restriction has gone relatively unnoticed in interpretations and discussions of these studies' results. In this paper we describe the structure of this selection bias with examples drawn from commonly proposed instruments such as calendar time and preference, illustrate the bias with causal diagrams, and estimate the magnitude and direction of possible bias using simulations. A noncausal association between the proposed instrument and the outcome can occur in analyses restricted to patients receiving a subset of the possible treatments. This results in bias in the numerator of the standard IV estimator; the bias is amplified in the treatment effect estimate. The direction and magnitude of the bias in the treatment effect estimate are functions of the distribution of and relationships between the proposed instrument, treatment values, unmeasured confounders, and outcome. IV methods used to compare a subset of treatment options are prone to substantial biases, even when the proposed instrument appears relatively strong.
collider stratification bias; epidemiologic methods; instrumental variable; selection bias
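The selection mechanism described above can be sketched in a small simulation (a hypothetical setup, not the paper's own code): a proposed instrument Z and an unmeasured confounder U are marginally independent, but both make patients less likely to take "neither treatment," so restricting to treated patients conditions on a collider and opens a noncausal Z–U–Y path even under the sharp null.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical setup: Z is a proposed instrument (e.g., calendar time),
# U is an unmeasured confounder; they are marginally independent.
z = rng.binomial(1, 0.5, n)
u = rng.binomial(1, 0.5, n)

# Both Z and U push patients away from the "neither treatment" option,
# so restricting to treated patients conditions on a collider of Z and U.
p_treated = 0.2 + 0.3 * z + 0.3 * u
treated = rng.binomial(1, p_treated).astype(bool)

# Sharp null: the outcome depends only on U, not on treatment or Z.
y = u + rng.normal(0, 1, n)

# Marginally, Z and Y are (nearly) independent ...
assoc_all = y[z == 1].mean() - y[z == 0].mean()

# ... but among the treated, Z and Y are associated: the numerator of the
# standard IV (Wald-type) estimator is no longer zero under the null.
zt, yt = z[treated], y[treated]
assoc_treated = yt[zt == 1].mean() - yt[zt == 0].mean()

print(round(assoc_all, 3), round(assoc_treated, 3))
```

In this parameterization, U is over-represented among treated patients with Z = 0, so the restricted Z–Y contrast is negative despite treatment having no effect on Y.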
To illustrate an approach to compare CD4 cell count and HIV-RNA monitoring strategies in HIV-positive individuals on antiretroviral therapy (ART).
Prospective studies of HIV-positive individuals in Europe and the USA in the HIV-CAUSAL Collaboration and The Center for AIDS Research Network of Integrated Clinical Systems.
Antiretroviral-naive individuals who initiated ART and became virologically suppressed within 12 months were followed from the date of suppression. We compared 3 CD4 cell count and HIV-RNA monitoring strategies: once every (1) 3 ± 1 months, (2) 6 ± 1 months, and (3) 9–12 ± 1 months. We used inverse-probability weighted models to compare these strategies with respect to clinical, immunologic, and virologic outcomes.
In 39,029 eligible individuals, there were 265 deaths and 690 AIDS-defining illnesses or deaths. Compared with the 3-month strategy, the mortality hazard ratios (95% CIs) were 0.86 (0.42 to 1.78) for the 6-month strategy and 0.82 (0.46 to 1.47) for the 9–12-month strategy. The respective 18-month risk ratios (95% CIs) of virologic failure (RNA >200) were 0.74 (0.46 to 1.19) and 2.35 (1.56 to 3.54), and the 18-month mean CD4 differences (95% CIs) were −5.3 (−18.6 to 7.9) and −31.7 (−52.0 to −11.3). The estimates for the 2-year risk of AIDS-defining illness or death were similar across strategies.
Our findings suggest that monitoring frequency of virologically suppressed individuals can be decreased from every 3 months to every 6, 9, or 12 months with respect to clinical outcomes. Because effects of different monitoring strategies could take years to materialize, longer follow-up is needed to fully evaluate this question.
HIV; CD4 cell count; HIV RNA; monitoring; observational studies; mortality
Preference-based instrumental variable methods are often used in comparative effectiveness research. Many instrumental variable studies estimate the local average treatment effect (i.e., the effect in the “compliers”) under the assumption of monotonicity, i.e., no “defiers,” and well-defined compliance types. However, the monotonicity assumption has not been empirically tested and the meaning of monotonicity itself is unclear.
Here we clarify the definition of local and global monotonicity and propose a novel study design to assess the monotonicity assumption empirically. Our design requires surveying physicians about their treatment plans and prescribing preferences for the same set of patients. We also discuss measures of monotonicity that can be calculated from this survey data. As an illustration, we conducted a pilot study in a survey of 53 physicians who reported treatment plans and prescribing preferences for hypothetical patients who were candidates for antipsychotic treatment.
In our study, nearly all patients exhibited some degree of monotonicity violations. In addition, patients could not be cleanly classified as compliers, defiers, always-takers, or never-takers.
We conclude that preference-based instrumental variable estimates should be interpreted cautiously because bias due to monotonicity violations is likely and because the subpopulation to which the estimate applies may not be well-defined. Investigators using preference-based instruments may consider supplementing their study with a survey to empirically assess the magnitude and direction of bias due to violations of monotonicity.
compliance type; complier; defier; instrumental variable; local average treatment effect; monotonicity
In this paper we discuss relationships between causal interactions within the counterfactual framework and interference in which the exposure of one person may affect the outcomes of another. We show that the empirical tests for causal interactions can in fact all be adapted to empirical tests for particular forms of interference. In the context of interference, by recoding the response as some function of the outcomes of the various persons within a cluster, a wide range of different forms of interference can potentially be detected. The correspondence between causal interactions and forms of interference extends to encompass n-way causal interactions, interference between n persons within a cluster, and multi-valued exposures. The theory for causal interactions provides a complete conceptual apparatus for assessing interference as well. The results are illustrated using data from a hypothetical vaccine trial to reason about specific forms of interference and spillover effects that may be present in this vaccine setting. We discuss the implications of this correspondence for our conceptualizations of interaction and for application to vaccine trials.
In Africa, antiretroviral therapy (ART) is delivered with limited laboratory monitoring, often none. In 2003–2004, investigators in the Development of Antiretroviral Therapy in Africa (DART) Trial randomized persons initiating ART in Uganda and Zimbabwe to either laboratory and clinical monitoring (LCM) or clinically driven monitoring (CDM). CD4 cell counts were measured every 12 weeks in both groups but were only returned to treating clinicians for management in the LCM group. Follow-up continued through 2008. In observational analyses, dynamic marginal structural models on pooled randomized groups were used to estimate survival under different monitoring-frequency and clinical/immunological switching strategies. Assumptions included no direct effect of randomized group on mortality or confounders and no unmeasured confounders which influenced treatment switch and mortality or treatment switch and time-dependent covariates. After 48 weeks of first-line ART, 2,946 individuals contributed 11,351 person-years of follow-up, 625 switches, and 179 deaths. The estimated survival probability after a further 240 weeks for post-48-week switch at the first CD4 cell count less than 100 cells/mm3 or non-Candida World Health Organization stage 4 event (with CD4 count <250) was 0.96 (95% confidence interval (CI): 0.94, 0.97) with 12-weekly CD4 testing, 0.96 (95% CI: 0.95, 0.97) with 24-weekly CD4 testing, 0.95 (95% CI: 0.93, 0.96) with a single CD4 test at 48 weeks (baseline), and 0.92 (95% CI: 0.91, 0.94) with no CD4 testing. Comparing randomized groups by 48-week CD4 count, the mortality risk associated with CDM versus LCM was greater in persons with CD4 counts of <100 (hazard ratio = 2.4, 95% CI: 1.3, 4.3) than in those with CD4 counts of ≥100 (hazard ratio = 1.1, 95% CI: 0.8, 1.7; interaction P = 0.04). These findings support a benefit from identifying patients immunologically failing first-line ART at 48 weeks.
Africa; antiretroviral therapy; drug switching; dynamic marginal structural models; HIV; monitoring
Methods from causal mediation analysis have generalized the traditional approach to direct and indirect effects in the epidemiologic and social science literature by allowing for interaction and non-linearities. However, the methods from the causal inference literature have themselves been subject to a major limitation in that the so-called natural direct and indirect effects that are employed are not identified from data whenever there is a variable that is affected by the exposure, which also confounds the relationship between the mediator and the outcome. In this paper we describe three alternative approaches to effect decomposition that give quantities that can be interpreted as direct and indirect effects, and that can be identified from data even in the presence of an exposure-induced mediator-outcome confounder. We describe a simple weighting-based estimation method for each of these three approaches, illustrated with data from perinatal epidemiology. The methods described here can provide insight into pathways and questions of mediation even when an exposure-induced mediator-outcome confounder is present.
A sufficient cause interaction between two exposures signals the presence of individuals for whom the outcome would occur only under certain values of the two exposures. When the outcome is dichotomous and all exposures are categorical, then under certain no confounding assumptions, empirical conditions for sufficient cause interactions can be constructed based on the sign of linear contrasts of conditional outcome probabilities between differently exposed subgroups, given confounders. It is argued that logistic regression models are unsatisfactory for evaluating such contrasts, and that Bernoulli regression models with linear link are prone to misspecification. We therefore develop semiparametric tests for sufficient cause interactions under models which postulate probability contrasts in terms of a finite-dimensional parameter, but which are otherwise unspecified. Estimation is often not feasible in these models because it would require nonparametric estimation of auxiliary conditional expectations given high-dimensional variables. We therefore develop ‘multiply robust tests’ under a union model that assumes at least one of several working submodels holds. In the special case of a randomized experiment or a family-based genetic study in which the joint exposure distribution is known by design or Mendelian inheritance, the procedure leads to asymptotically distribution-free tests of the null hypothesis of no sufficient cause interaction.
Double robustness; Effect modification; Gene-environment interaction; Gene-gene interaction; Semiparametric inference; Sufficient cause; Synergism
Most work in causal inference concerns deterministic counterfactuals; the literature on stochastic counterfactuals is small. In the stochastic counterfactual setting, the outcome for each individual under each possible set of exposures follows a probability distribution so that for any given exposure combination, outcomes vary not only between individuals but also probabilistically for each particular individual. The deterministic sufficient cause framework supplements the deterministic counterfactual framework by allowing for the representation of counterfactual outcomes in terms of sufficient causes or causal mechanisms. In the deterministic sufficient cause framework it is possible to test for the joint presence of two causes in the same causal mechanism, referred to as a sufficient cause interaction. In this paper, these ideas are extended to the setting of stochastic counterfactuals and stochastic sufficient causes. Formal definitions are given for a stochastic sufficient cause framework. It is shown that the empirical conditions that suffice to conclude the presence of a sufficient cause interaction in the deterministic sufficient cause framework suffice also to conclude the presence of a sufficient cause interaction in the stochastic sufficient cause framework. Two examples from the genetics literature, in which there is evidence that sufficient cause interactions are present, are discussed in light of the results in this paper.
Causal inference; Interaction; Stochastic counterfactual; Sufficient cause; Synergism
We present results that allow the researcher in certain cases to determine the direction of the bias that arises when control for confounding is inadequate. The results are given within the context of the directed acyclic graph causal framework and are stated in terms of signed edges. Rigorous definitions for signed edges are provided. We describe cases in which intuition concerning signed edges fails and we characterize the directed acyclic graphs that researchers can use to draw conclusions about the sign of the bias of unmeasured confounding. If there is only one unmeasured confounding variable on the graph, then non-increasing or non-decreasing average causal effects suffice to draw conclusions about the direction of the bias. When there is more than one unmeasured confounding variable, non-increasing and non-decreasing average causal effects can be used to draw conclusions only if the various unmeasured confounding variables are independent of one another conditional on the measured covariates. When this conditional independence property does not hold, stronger notions of monotonicity are needed to draw conclusions about the direction of the bias.
Formal rules governing signed edges on causal directed acyclic graphs are described in this paper and it is shown how these rules can be useful in reasoning about causality. Specifically, the notions of a monotonic effect, a weak monotonic effect and a signed edge are introduced. Results are developed relating these monotonic effects and signed edges to the sign of the causal effect of an intervention in the presence of intermediate variables. The incorporation of signed edges into the directed acyclic graph causal framework furthermore allows for the development of rules governing the relationship between monotonic effects and the sign of the covariance between two variables. It is shown that, when certain assumptions about monotonic effects can be made, these results can be used to draw conclusions about the presence of causal effects even when data are missing on confounding variables.
Bias; Causal inference; Confounding; Directed acyclic graphs; Structural equations
Containing an emerging influenza H5N1 pandemic in its earliest stages may be feasible, but containing multiple introductions of a pandemic-capable strain would be more difficult. Mills and colleagues argue that multiple introductions are likely, especially if risk of a pandemic is high.
We review the class of inverse probability weighting (IPW) approaches for the analysis of missing data under various missing data patterns and mechanisms. The IPW methods rely on the intuitive idea of creating a pseudo-population of weighted copies of the complete cases to remove selection bias introduced by the missing data. However, different weighting approaches are required depending on the missing data pattern and mechanism. We begin with a uniform missing data pattern (i.e., a scalar missing indicator indicating whether or not the full data is observed) to motivate the approach. We then generalize to more complex settings. Our goal is to provide a conceptual overview of existing IPW approaches and illustrate the connections and differences among these approaches.
missing data; inverse probability weighting; missing at random; missing not at random; monotone missing; non-monotone missing
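The pseudo-population idea described above can be sketched for the uniform missing data pattern (a minimal hypothetical simulation with one binary covariate; the weights 1/P(R=1|X) are taken as known here, whereas in practice they would be estimated, e.g., by logistic regression):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data: X is a fully observed binary covariate, Y the outcome.
x = rng.binomial(1, 0.5, n)
y = 2.0 + 1.0 * x + rng.normal(0, 1, n)  # true E[Y] = 2.5

# Y is missing at random given X: P(R = 1 | X) depends only on X.
p_obs = np.where(x == 1, 0.9, 0.5)
r = rng.binomial(1, p_obs)  # missingness indicator (1 = observed)

# The naive complete-case mean is biased: the x = 1 group is
# over-represented among complete cases.
naive = y[r == 1].mean()

# IPW: weight each complete case by 1 / P(R = 1 | X), creating a
# pseudo-population with the original distribution of X.
w = 1.0 / p_obs[r == 1]
ipw = np.average(y[r == 1], weights=w)

print(round(naive, 3), round(ipw, 3))
```

With these parameters the complete-case mean is pulled toward the x = 1 stratum (roughly 2.64 rather than 2.5), while the weighted mean recovers the full-data target.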
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation-independent baseline functions is correctly modelled, but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).
Doubly robust; Generalized odds ratio; Locally efficient; Semiparametric logistic regression
In the presence of time-varying confounders affected by prior treatment, standard statistical methods for failure time analysis may be biased. Methods that correctly adjust for this type of covariate include the parametric g-formula, inverse probability weighted estimation of marginal structural Cox proportional hazards models, and g-estimation of structural nested accelerated failure time models. In this article, we propose a novel method to estimate the causal effect of a time-dependent treatment on failure in the presence of informative right-censoring and time-dependent confounders that may be affected by past treatment: g-estimation of structural nested cumulative failure time models (SNCFTMs). An SNCFTM considers the conditional effect of a final treatment at time m on the outcome at each later time k by modeling the ratio of two counterfactual cumulative risks at time k under treatment regimes that differ only at time m. Inverse probability weights are used to adjust for informative censoring. We also present a procedure that, under certain “no-interaction” conditions, uses the g-estimates of the model parameters to calculate unconditional cumulative risks under nondynamic (static) treatment regimes. The procedure is illustrated with an example using data from a longitudinal cohort study, in which the “treatments” are healthy behaviors and the outcome is coronary heart disease.
Causal inference; Coronary heart disease; Epidemiology; G-estimation; Inverse probability weighting
Ideally, randomized trials would be used to compare the long-term effectiveness of dynamic treatment regimes on clinically relevant outcomes. However, because randomized trials are not always feasible or timely, we often must rely on observational data to compare dynamic treatment regimes. An example of a dynamic treatment regime is “start combined antiretroviral therapy (cART) within 6 months of CD4 cell count first dropping below x cells/mm3 or diagnosis of an AIDS-defining illness, whichever happens first,” where x can take values between 200 and 500. Recently, Cain et al. (2011) used inverse probability (IP) weighting of dynamic marginal structural models to find the x that minimizes 5-year mortality risk under similar dynamic regimes using observational data. Unlike standard methods, IP weighting can appropriately adjust for measured time-varying confounders (e.g., CD4 cell count, viral load) that are affected by prior treatment. Here we describe an alternative method to IP weighting for comparing the effectiveness of dynamic cART regimes: the parametric g-formula. The parametric g-formula naturally handles dynamic regimes and, like IP weighting, can appropriately adjust for measured time-varying confounders. However, estimators based on the parametric g-formula are more efficient than IP weighted estimators, often at the expense of more parametric assumptions. Here we describe how to use the parametric g-formula to estimate risk by the end of a user-specified follow-up period under dynamic treatment regimes. We describe an application of this method to answer the “when to start” question using data from the HIV-CAUSAL Collaboration.
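As a minimal illustration of the g-formula idea, the sketch below shows its simplest, point-treatment form (standardization over a single confounder) in a hypothetical simulation; the full parametric g-formula for dynamic regimes additionally models time-varying covariates and simulates them forward under each regime.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical point-treatment example: L confounds the effect of A on Y.
l = rng.binomial(1, 0.5, n)
a = rng.binomial(1, 0.2 + 0.6 * l)            # treatment depends on L
y = rng.binomial(1, 0.1 + 0.2 * a + 0.3 * l)  # true causal risk difference: 0.2

# Stratum-specific risks E[Y | A = a, L = l] (a saturated "model").
def risk(a_val, l_val):
    mask = (a == a_val) & (l == l_val)
    return y[mask].mean()

# g-formula: standardize stratum-specific risks over the marginal law of L.
p_l1 = l.mean()
risk_a1 = risk(1, 0) * (1 - p_l1) + risk(1, 1) * p_l1
risk_a0 = risk(0, 0) * (1 - p_l1) + risk(0, 1) * p_l1
gform = risk_a1 - risk_a0

# Crude (confounded) contrast for comparison.
crude = y[a == 1].mean() - y[a == 0].mean()

print(round(gform, 3), round(crude, 3))
```

The crude contrast is badly biased upward (treated patients are enriched for L = 1), while the standardized contrast recovers the true risk difference of 0.2.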
The Women’s Health Initiative randomized trial found greater coronary heart disease (CHD) risk in women assigned to estrogen/progestin therapy than in those assigned to placebo. Observational studies had previously suggested reduced CHD risk in hormone users.
Using data from the observational Nurses’ Health Study, we emulated the design and intention-to-treat (ITT) analysis of the randomized trial. The observational study was conceptualized as a sequence of “trials” in which eligible women were classified as initiators or noninitiators of estrogen/progestin therapy.
The ITT hazard ratios (95% confidence intervals) of CHD for initiators versus noninitiators were 1.42 (0.92 – 2.20) for the first 2 years, and 0.96 (0.78 – 1.18) for the entire follow-up. The ITT hazard ratios were 0.84 (0.61 – 1.14) in women within 10 years of menopause, and 1.12 (0.84 – 1.48) in the others (P value for interaction = 0.08). These ITT estimates are similar to those from the Women’s Health Initiative. Because the ITT approach causes severe treatment misclassification, we also estimated adherence-adjusted effects by inverse probability weighting. The hazard ratios were 1.61 (0.97 – 2.66) for the first 2 years, and 0.98 (0.66 – 1.49) for the entire follow-up. The hazard ratios were 0.54 (0.19 – 1.51) in women within 10 years after menopause, and 1.20 (0.78 – 1.84) in others (P value for interaction = 0.01). Finally, we also present comparisons between these estimates and previously reported NHS estimates.
Our findings suggest that the discrepancies between the Women’s Health Initiative and Nurses’ Health Study ITT estimates could be largely explained by differences in the distribution of time since menopause and length of follow-up.
Recently proposed double-robust estimators for a population mean from incomplete data and for a finite number of counterfactual means can have much higher efficiency than the usual double-robust estimators under misspecification of the outcome model. In this paper, we derive a new class of double-robust estimators for the parameters of regression models with incomplete cross-sectional or longitudinal data, and of marginal structural mean models for cross-sectional data with similar efficiency properties. Unlike the recent proposals, our estimators solve outcome regression estimating equations. In a simulation study, the new estimator shows improvements in variance relative to the standard double-robust estimator that are in agreement with those suggested by asymptotic theory.
Drop-out; Marginal structural model; Missing at random
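The standard double-robust estimator that such proposals build on can be sketched as follows for a population mean with a missing outcome (a hypothetical example in which the working models are simply plugged in as known; the point is that the augmented IPW estimator remains consistent when either the outcome model or the missingness model, but not necessarily both, is correct):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical data: X fully observed, Y missing at random given X.
x = rng.normal(0, 1, n)
y = 2.0 + x + rng.normal(0, 1, n)           # true E[Y] = 2
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # true P(R = 1 | X)
r = rng.binomial(1, pi_true)

def aipw(m_x, pi_x):
    # Augmented IPW estimator of E[Y]; r * y uses Y only where observed.
    # m_x  : working outcome model for E[Y | X]
    # pi_x : working missingness model for P(R = 1 | X)
    return np.mean(r * y / pi_x - (r - pi_x) / pi_x * m_x)

m_true = 2.0 + x
both_right = aipw(m_true, pi_true)
wrong_outcome = aipw(np.zeros(n), pi_true)     # reduces to plain IPW
wrong_missing = aipw(m_true, np.full(n, 0.7))  # misspecified propensity

print(round(both_right, 3), round(wrong_outcome, 3), round(wrong_missing, 3))
```

All three estimates are close to the true mean of 2: misspecifying either nuisance model alone leaves the estimator consistent, which is the double-robustness property whose efficiency under outcome-model misspecification the paper seeks to improve.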
Standard methods for estimating the effect of a time-varying exposure on survival may be biased in the presence of time-dependent confounders themselves affected by prior exposure. This problem can be overcome by inverse probability weighted estimation of Marginal Structural Cox Models (Cox MSM), g-estimation of Structural Nested Accelerated Failure Time Models (SNAFTM) and g-estimation of Structural Nested Cumulative Failure Time Models (SNCFTM). In this paper, we describe a data generation mechanism that approximately satisfies a Cox MSM, an SNAFTM and an SNCFTM. Besides providing a procedure for data simulation, our formal description of a data generation mechanism that satisfies all three models allows one to assess the relative advantages and disadvantages of each modeling approach. A simulation study is also presented to compare effect estimates across the three models.
As with other instrumental variable (IV) analyses, Mendelian randomization (MR) studies rest on strong assumptions. These assumptions are not routinely systematically evaluated in MR applications, although such evaluation could add to the credibility of MR analyses. In this article, the authors present several methods that are useful for evaluating the validity of an MR study. They apply these methods to a recent MR study that used fat mass and obesity-associated (FTO) genotype as an IV to estimate the effect of obesity on mental disorder. These approaches to evaluating assumptions for valid IV analyses are not fail-safe, in that there are situations where the approaches might either fail to identify a biased IV or inappropriately suggest that a valid IV is biased. Therefore, the authors describe the assumptions upon which the IV assessments rely. The methods they describe are relevant to any IV analysis, regardless of whether it is based on a genetic IV or other possible sources of exogenous variation. Methods that assess the IV assumptions are generally not conclusive, but routinely applying such methods is nonetheless likely to improve the scientific contributions of MR studies.
causality; confounding factors; epidemiologic methods; instrumental variables; Mendelian randomization analysis