Search tips
Search criteria

Results 1-25 (1585673)

Clipboard (0)

Related Articles

1.  Principal Stratification in Causal Inference 
Biometrics  2002;58(1):21-29.
Many scientific problems require that treatment comparisons be adjusted for posttreatment variables, but the estimands underlying standard methods are not causal effects. To address this deficiency, we propose a general framework for comparing treatments adjusting for posttreatment variables that yields principal effects based on principal stratification. Principal stratification with respect to a posttreatment variable is a cross-classification of subjects defined by the joint potential values of that posttreatment variable under each of the treatments being compared. Principal effects are causal effects within a principal stratum. The key property of principal strata is that they are not affected by treatment assignment and therefore can be used just as any pretreatment covariate, such as age category. As a result, the central property of our principal effects is that they are always causal effects and do not suffer from the complications of standard posttreatment-adjusted estimands. We discuss briefly that such principal causal effects are the link between three recent applications with adjustment for posttreatment variables: (i) treatment noncompliance, (ii) missing outcomes (dropout) following treatment noncompliance, and (iii) censoring by death. We then attack the problem of surrogate or biomarker endpoints, where we show, using principal causal effects, that all current definitions of surrogacy, even when perfectly true, do not generally have the desired interpretation as causal effects of treatment on outcome. We go on to formulate estimands based on principal stratification and principal causal effects and show their superiority.
PMCID: PMC4137767  PMID: 11890317
Biomarker; Causal inference; Censoring by death; Missing data; Posttreatment variable; Principal stratification; Quality of life; Rubin causal model; Surrogate
2.  Commentary on “Principal Stratification — a Goal or a Tool?” by Judea Pearl 
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
PMCID: PMC3204668  PMID: 22049267
principal stratification; causal inference; vaccine trial
3.  Accommodating Missingness When Assessing Surrogacy Via Principal Stratification 
Clinical trials (London, England)  2013;10(3):363-377.
When an outcome of interest in a clinical trial is late-occurring or difficult to obtain, surrogate markers can extract information about the effect of the treatment on the outcome of interest. Understanding associations between the causal effect of treatment on the outcome and the causal effect of treatment on the surrogate is critical to understanding the value of a surrogate from a clinical perspective.
Traditional regression approaches to determine the proportion of the treatment effect explained by surrogate markers suffer from several shortcomings: they can be unstable, and can lie outside of the 0–1 range. Further, they do not account for the fact that surrogate measures are obtained post-randomization, and thus the surrogate-outcome relationship may be subject to unmeasured confounding. Methods to avoid these problem are of key importance.
Frangakis C, Rubin DM. Principal stratification in causal inference. Biometrics 2002; 58:21–9 suggested assessing the causal effect of treatment within pre-randomization “principal strata” defined by the counterfactual joint distribution of the surrogate marker under the different treatment arms, with the proportion of the overall outcome causal effect attributable to subjects for whom the treatment affects the proposed surrogate as the key measure of interest. Li Y, Taylor JMG, Elliott MR. Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 2010; 66:523–31 developed this “principal surrogacy” approach for dichotomous markers and outcomes, utilizing Bayesian methods that accommodated non-identifiability in the model parameters. Because the surrogate marker is typically observed early, outcome data is often missing. Here we extend Li, Taylor, and Elliott to accommodate missing data in the observable final outcome under ignorable and non-ignorable settings. We also allow for the possibility that missingness has a counterfactual component, a feature that previous literature has not addressed.
We apply the proposed methods to a trial of glaucoma control comparing surgery versus medication, where intraocular pressure (IOP) control at 12 months is a surrogate for IOP control at 96 months. We also conduct a series of simulations to consider the impacts of non-ignorability, as well as sensitivity to priors and the ability of the Decision Information Criterion to choose the correct model when parameters are not fully identified.
Because model parameters cannot be fully identified from data, informative priors can introduce non-trivial bias in moderate sample size settings, while more non-informative priors can yield wide credible intervals.
Assessing the linkage between causal effects of treatment on a surrogate marker and causal effects of a treatment on an outcome is important to understanding the value of a marker. These causal effects are not fully identifiable: hence we explore the sensitivity and identifiability aspects of these models and show that relatively weak assumptions can still yield meaningful results.
PMCID: PMC4096330  PMID: 23553326
Causal Inference; Surrogate Marker; Bayesian Analysis; dentifiability; Non-response; Counterfactual
4.  Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal 
Biostatistics (Oxford, England)  2013;15(2):266-283.
In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21–29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S, we propose surrogacy validation measures from this model. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and the surrogacy measures proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440). The method is applied to data from a macular degeneration study and an ovarian cancer study.
PMCID: PMC4023321  PMID: 24285772
Bayesian estimation; Principal stratification; Surrogate endpoints
5.  Causal assessment of surrogacy in a meta-analysis of colorectal cancer trials 
Biostatistics (Oxford, England)  2011;12(3):478-492.
When the true end points (T) are difficult or costly to measure, surrogate markers (S) are often collected in clinical trials to help predict the effect of the treatment (Z). There is great interest in understanding the relationship among S, T, and Z. A principal stratification (PS) framework has been proposed by Frangakis and Rubin (2002) to study their causal associations. In this paper, we extend the framework to a multiple trial setting and propose a Bayesian hierarchical PS model to assess surrogacy. We apply the method to data from a large collection of colon cancer trials in which S and T are binary. We obtain the trial-specific causal measures among S, T, and Z, as well as their overall population-level counterparts that are invariant across trials. The method allows for information sharing across trials and reduces the nonidentifiability problem. We examine the frequentist properties of our model estimates and the impact of the monotonicity assumption using simulations. We also illustrate the challenges in evaluating surrogacy in the counterfactual framework that result from nonidentifiability.
PMCID: PMC3114655  PMID: 21252079
Bayesian estimation; Counterfactual model; Identifiability; Multiple trials; Principal stratification; Surrogate marker
6.  Clarifying the Role of Principal Stratification in the Paired Availability Design 
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
PMCID: PMC3114955  PMID: 21686085
principal stratification; causal inference; paired availability design
The annals of applied statistics  2008;2(3):1034-1055.
Participants in longitudinal studies on the effects of drug treatment and criminal justice system interventions are at high risk for institutionalization (e.g., spending time in an environment where their freedom to use drugs, commit crimes, or engage in risky behavior may be circumscribed). Methods used for estimating treatment effects in the presence of institutionalization during follow-up can be highly sensitive to assumptions that are unlikely to be met in applications and thus likely to yield misleading inferences. In this paper, we consider the use of principal stratification to control for institutionalization at follow-up. Principal stratification has been suggested for similar problems where outcomes are unobservable for samples of study participants because of dropout, death, or other forms of censoring. The method identifies principal strata within which causal effects are well defined and potentially estimable. We extend the method of principal stratification to model institutionalization at follow-up and estimate the effect of residential substance abuse treatment versus outpatient services in a large scale study of adolescent substance abuse treatment programs. Additionally, we discuss practical issues in applying the principal stratification model to data. We show via simulation studies that the model can only recover true effects provided the data meet strenuous demands and that there must be caution taken when implementing principal stratification as a technique to control for post-treatment confounders such as institutionalization.
PMCID: PMC2749670  PMID: 19779599
Principal Stratification; Post-Treatment Confounder; Institutionalization; Causal Inference
8.  Surrogate Endpoint Evaluation: Principal Stratification Criteria and the Prentice Definition 
Journal of causal inference  2015;3(2):157-175.
A common problem of interest within a randomized clinical trial is the evaluation of an inexpensive response endpoint as a valid surrogate endpoint for a clinical endpoint, where a chief purpose of a valid surrogate is to provide a way to make correct inferences on clinical treatment effects in future studies without needing to collect the clinical endpoint data. Within the principal stratification framework for addressing this problem based on data from a single randomized clinical efficacy trial, a variety of definitions and criteria for a good surrogate endpoint have been proposed, all based on or closely related to the “principal effects” or “causal effect predictiveness (CEP)” surface. We discuss CEP-based criteria for a useful surrogate endpoint, including (1) the meaning and relative importance of proposed criteria including average causal necessity (ACN), average causal sufficiency (ACS), and large clinical effect modification; (2) the relationship between these criteria and the Prentice definition of a valid surrogate endpoint; and (3) the relationship between these criteria and the consistency criterion (i.e., assurance against the “surrogate paradox”). This includes the result that ACN plus a strong version of ACS generally do not imply the Prentice definition nor the consistency criterion, but they do have these implications in special cases. Moreover, the converse does not hold except in a special case with a binary candidate surrogate. The results highlight that assumptions about the treatment effect on the clinical endpoint before the candidate surrogate is measured are influential for the ability to draw conclusions about the Prentice definition or consistency. In addition, we emphasize that in some scenarios that occur commonly in practice, the principal strata sub-populations for inference are identifiable from the observable data, in which cases the principal stratification framework has relatively high utility for the purpose of effect modification analysis, and is closely connected to the treatment marker selection problem. The results are illustrated with application to a vaccine efficacy trial, where ACN and ACS for an antibody marker are found to be consistent with the data and hence support the Prentice definition and consistency.
PMCID: PMC4692254  PMID: 26722639
9.  Surrogacy Assessment Using Principal Stratification and a Gaussian Copula Model 
In clinical trials, a surrogate outcome (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Many methods of surrogacy validation rely on models for the conditional distribution of T given Z and S. However, S is a post-randomization variable, and unobserved, simultaneous predictors of S and T may exist, resulting in a non-causal interpretation. Frangakis and Rubin1 developed the concept of principal surrogacy, stratifying on the joint distribution of the surrogate marker under treatment and control to assess the association between the causal effects of treatment on the marker and the causal effects of treatment on the clinical outcome. Working within the principal surrogacy framework, we address the scenario of an ordinal categorical variable as a surrogate for a censored failure time true endpoint. A Gaussian copula model is used to model the joint distribution of the potential outcomes of T, given the potential outcomes of S. Because the proposed model cannot be fully identified from the data, we use a Bayesian estimation approach with prior distributions consistent with reasonable assumptions in the surrogacy assessment setting. The method is applied to data from a colorectal cancer clinical trial, previously analyzed by Burzykowski et al..2
PMCID: PMC4272338  PMID: 24947559
Causal inference; Gaussian copula; Potential outcomes; Surrogate endpoint
10.  Estimating Causal Effects in Trials Involving Multi-Treatment Arms Subject to Non-compliance: A Bayesian framework 
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
PMCID: PMC3104736  PMID: 21637737
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
11.  A tutorial on principal stratification-based sensitivity analysis: Application to smoking cessation studies 
One problem with assessing effects of smoking cessation interventions on withdrawal symptoms is that symptoms are affected by whether participants abstain from smoking during trials. Those who enter a randomized trial but do not change smoking behavior might not experience withdrawal related symptoms.
We present a tutorial of how one can use a principal stratification sensitivity analysis to account for abstinence in the estimation of smoking cessation intervention effects. The paper is intended to introduce researchers to principal stratification and describe how they might implement the methods.
We provide a hypothetical example that demonstrates why estimating effects within observed abstention groups is problematic. We demonstrate how estimation of effects within groups defined by potential abstention that an individual would have in either arm of a study can provide meaningful inferences. We describe a sensitivity analysis method to estimate such effects, and use it to investigate effects of a combined behavioral and nicotine replacement therapy intervention on withdrawal symptoms in a female prisoner population.
Overall, the intervention was found to reduce withdrawal symptoms but the effect was not statistically significant in the group that was observed to abstain. More importantly, the intervention was found to be highly effective in the group that would abstain regardless of intervention assignment. The effectiveness of the intervention in other potential abstinence strata depends on the sensitivity analysis assumptions.
We make assumptions to narrow the range of our sensitivity parameter estimates. While appropriate in this situation, such assumptions might not be plausible in all situations.
A principal stratification sensitivity analysis provides a meaningful method of accounting for abstinence effects in the evaluation of smoking cessation interventions on withdrawal symptoms. Smoking researchers have previously recommended analyses in subgroups defined by observed abstention status in the evaluation of smoking cessation interventions. We believe that principal stratification analyses should replace such analyses as the preferred means of accounting for post-randomization abstinence effects in the evaluation of smoking cessation programs.
PMCID: PMC2874094  PMID: 20423924
12.  Partially hidden Markov model for time-varying principal stratification in HIV prevention trials 
It is frequently of interest to estimate the intervention effect that adjusts for post-randomization variables in clinical trials. In the recently completed HPTN 035 trial, there is differential condom use between the three microbicide gel arms and the No Gel control arm, so that intention to treat (ITT) analyses only assess the net treatment effect that includes the indirect treatment effect mediated through differential condom use. Various statistical methods in causal inference have been developed to adjust for post-randomization variables. We extend the principal stratification framework to time-varying behavioral variables in HIV prevention trials with a time-to-event endpoint, using a partially hidden Markov model (pHMM). We formulate the causal estimand of interest, establish assumptions that enable identifiability of the causal parameters, and develop maximum likelihood methods for estimation. Application of our model on the HPTN 035 trial reveals an interesting pattern of prevention effectiveness among different condom-use principal strata.
PMCID: PMC3649016  PMID: 23667279
microbicide; causal inference; posttreatment variables; direct effect
13.  Statistical identifiability and the surrogate endpoint problem, with application to vaccine trials 
Biometrics  2010;66(4):1153-1161.
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
PMCID: PMC3597127  PMID: 20105158
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
14.  Intermediate outcomes in randomized clinical trials: an introduction 
Trials  2013;14:78.
Intermediate outcomes are common and typically on the causal pathway to the final outcome. Some examples include noncompliance, missing data, and truncation by death like pregnancy (e.g. when the trial intervention is given to non-pregnant women and the final outcome is preeclampsia, defined only on pregnant women). The intention-to-treat approach does not account properly for them, and more appropriate alternative approaches like principal stratification are not yet widely known. The purposes of this study are to inform researchers that the intention-to-treat approach unfortunately does not fit all problems we face in experimental research, to introduce the principal stratification approach for dealing with intermediate outcomes, and to illustrate its application to a trial of long term calcium supplementation in women at high risk of preeclampsia.
Principal stratification and related concepts are introduced. Two ways for estimating causal effects are discussed and their application is illustrated using the calcium trial, where noncompliance and pregnancy are considered as intermediate outcomes, and preeclampsia is the main final outcome.
The limitations of traditional approaches and methods for dealing with intermediate outcomes are demonstrated. The steps, assumptions and required calculations involved in the application of the principal stratification approach are discussed in detail in the case of our calcium trial.
The intention-to-treat approach is a very sound one but unfortunately it does not fit all problems we find in randomized clinical trials; this is particularly the case for intermediate outcomes, where alternative approaches like principal stratification should be considered.
PMCID: PMC3610291  PMID: 23510143
Intermediate outcomes; Intention-to-treat approach; Principal stratification; Causal effects
15.  Estimating the palliative effect of percutaneous endoscopic gastrostomy in an observational registry using principal stratification and generalized propensity scores 
Scientific Reports  2016;6:33431.
Clinical disease registries offer a rich collection of valuable patient information but also pose challenges that require special care and attention in statistical analyses. The goal of this paper is to propose a statistical framework that allows for estimating the effect of surgical insertion of a percutaneous endogastrostomy (PEG) tube for patients living with amyotrophic lateral sclerosis (ALS) using data from a clinical registry. Although all ALS patients are informed about PEG, only some patients agree to the procedure which, leads to the potential for selection bias. Assessing the effect of PEG is further complicated by the aggressively fatal disease, such that time to death competes directly with both the opportunity to receive PEG and clinical outcome measurements. Our proposed methodology handles the “censoring by death” phenomenon through principal stratification and selection bias for PEG treatment through generalized propensity scores. We develop a fully Bayesian modeling approach to estimate the survivor average causal effect (SACE) of PEG on BMI, a surrogate outcome measure of nutrition and quality of life. The use of propensity score methods within the principal stratification framework demonstrates a significant and positive effect of PEG treatment, particularly when time of treatment is included in the treatment definition.
PMCID: PMC5027570  PMID: 27640365
16.  Causal Inference in Longitudinal Comparative Effectiveness Studies With Repeated Measures of A Continuous Intermediate Variable 
Statistics in medicine  2014;33(20):3509-3527.
We propose a principal stratification approach to assess causal effects in non-randomized longitudinal comparative effectiveness studies with a binary endpoint outcome and repeated measures of a continuous intermediate variable. Our method is an extension of the principal stratification approach by Lin et al. [10,11], originally proposed for a longitudinal randomized study to assess the treatment effect of a continuous outcome adjusting for the heterogeneity of a repeatedly measured binary intermediate variable. Our motivation for this work comes from a comparison of the effect of two glucose-lowering medications on a clinical cohort of patients with type 2 diabetes. Here we consider a causal inference problem assessing how well the two medications work relative to one another on two binary endpoint outcomes: cardiovascular disease related hospitalization and all-cause mortality. Clinically, these glucose-lowering medications can have differential effects on the intermediate outcome, glucose level over time. Ultimately we want to compare medication effects on the endpoint outcomes among individuals in the same glucose trajectory stratum while accounting for the heterogeneity in baseline covariates (i.e., to obtain “principal effects” on the endpoint outcomes). The proposed method involves a 3-step model estimation procedure. Step 1 identifies principal strata associated with the intermediate variable using hybrid growth mixture modeling analyses [13]. Step 2 obtains the stratum membership using the pseudoclass technique [17,18], and derives propensity scores for treatment assignment. Step 3 obtains the stratum-specific treatment effect on the endpoint outcome weighted by inverse propensity probabilities derived from Step 2.
PMCID: PMC4122661  PMID: 24577715
Causal inference; Comparative effectiveness studies; Growth mixture model; Principal stratification; Propensity score
17.  Mechanisms and mediation in survival analysis: towards an integrated analytical framework 
A wide-ranging debate has taken place in recent years on mediation analysis and causal modelling, raising profound theoretical, philosophical and methodological questions. The authors build on the results of these discussions to work towards an integrated approach to the analysis of research questions that situate survival outcomes in relation to complex causal pathways with multiple mediators. The background to this contribution is the increasingly urgent need for policy-relevant research on the nature of inequalities in health and healthcare.
The authors begin by summarising debates on causal inference, mediated effects and statistical models, showing that these three strands of research have powerful synergies. They review a range of approaches which seek to extend existing survival models to obtain valid estimates of mediation effects. They then argue for an alternative strategy, which involves integrating survival outcomes within Structural Equation Models via the discrete-time survival model. This approach can provide an integrated framework for studying mediation effects in relation to survival outcomes, an issue of great relevance in applied health research. The authors provide an example of how these techniques can be used to explore whether the social class position of patients has a significant indirect effect on the hazard of death from colon cancer.
The results suggest that the indirect effects of social class on survival are substantial and negative (-0.23 overall). In addition to the substantial direct effect of this variable (-0.60), its indirect effects account for more than one quarter of the total effect. The two main pathways for this indirect effect, via emergency admission (-0.12), on the one hand, and hospital caseload, on the other, (-0.10) are of similar size.
The discrete-time survival model provides an attractive way of integrating time-to-event data within the field of Structural Equation Modelling. The authors demonstrate the efficacy of this approach in identifying complex causal pathways that mediate the effects of a socio-economic baseline covariate on the hazard of death from colon cancer. The results show that this approach has the potential to shed light on a class of research questions which is of particular relevance in health research.
PMCID: PMC4772586  PMID: 26927506
Causal modelling; Mediation analysis; Social inequalities; Discrete-time survival model; Structural equation modelling; Deprivation index; Ireland; Colon cancer
18.  An Introduction to Causal Inference* 
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
PMCID: PMC2836213  PMID: 20305706
structural equation models; confounding; graphical methods; counterfactuals; causal effects; potential-outcome; mediation; policy evaluation; causes of effects
19.  Principal Stratification — a Goal or a Tool? 
Principal stratification has recently become a popular tool to address certain causal inference questions, particularly in dealing with post-randomization factors in randomized trials. Here, we analyze the conceptual basis for this framework and invite response to clarify the value of principal stratification in estimating causal effects of interest.
PMCID: PMC3083139  PMID: 21556288
causal inference; principal stratification; surrogate endpoints; direct effect; mediation
20.  Design and Inference for the Intent to Treat Principle using Adaptive Treatment Strategies and Sequential Randomization 
Statistics in medicine  2015;34(9):1441-1453.
Nonadherence to assigned treatment jeopardizes the power and interpretability of intent-to-treat comparisons from clinical trial data and continues to be an issue for effectiveness studies, despite their pragmatic emphasis. We posit that new approaches to design need to complement developments in methods for causal inference to address nonadherence, in both experimental and practice settings. This paper considers the conventional study design for psychiatric research and other medical contexts, in which subjects are randomized to treatments that are fixed throughout the trial and presents an alternative that converts the fixed treatments into an adaptive intervention that reflects best practice. The key element is the introduction of an adaptive decision point midway into the study to address a patient's reluctance to remain on treatment before completing a full-length trial of medication. The clinical uncertainty about the appropriate adaptation prompts a second randomization at the new decision point to evaluate relevant options. Additionally, the standard ‘all-or-none’ principal stratification (PS) framework is applied to the first stage of the design to address treatment discontinuation that occurs too early for a mid-trial adaptation. Drawing upon the adaptive intervention features, we develop assumptions to identify the PS causal estimand and introduce restrictions on outcome distributions to simplify Expectation-Maximization calculations. We evaluate the performance of the PS setup, with particular attention to the role played by a binary covariate. The results emphasize the importance of collecting covariate data for use in design and analysis. We consider the generality of our approach beyond the setting of psychiatric research.
PMCID: PMC4390441  PMID: 25581413
nonadherence; intent-to-treat; adaptive treatment strategy; sequential randomization; principal stratification
21.  Causal Models and Learning from Data 
Epidemiology (Cambridge, Mass.)  2014;25(3):418-426.
The practice of epidemiology requires asking causal questions. Formal frameworks for causal inference developed over the past decades have the potential to improve the rigor of this process. However, the appropriate role for formal causal thinking in applied epidemiology remains a matter of debate. We argue that a formal causal framework can help in designing a statistical analysis that comes as close as possible to answering the motivating causal question, while making clear what assumptions are required to endow the resulting estimates with a causal interpretation. A systematic approach for the integration of causal modeling with statistical estimation is presented. We highlight some common points of confusion that occur when causal modeling techniques are applied in practice and provide a broad overview on the types of questions that a causal framework can help to address. Our aims are to argue for the utility of formal causal thinking, to clarify what causal models can and cannot do, and to provide an accessible introduction to the flexible and powerful tools provided by causal models.
PMCID: PMC4077670  PMID: 24713881
22.  Statistical Approaches for Enhancing Causal Interpretation of the M to Y Relation in Mediation Analysis 
Statistical mediation methods provide valuable information about underlying mediating psychological processes, but the ability to infer that the mediator variable causes the outcome variable is more complex than widely known. Researchers have recently emphasized how violating assumptions about confounder bias severely limits causal inference of the mediator to dependent variable relation. Our article describes and addresses these limitations by drawing on new statistical developments in causal mediation analysis. We first review the assumptions underlying causal inference and discuss three ways to examine the effects of confounder bias when assumptions are violated. We then describe four approaches to address the influence of confounding variables and enhance causal inference, including comprehensive structural equation models, instrumental variable methods, principal stratification, and inverse probability weighting. Our goal is to further the adoption of statistical methods to enhance causal inference in mediation studies.
PMCID: PMC4835342  PMID: 25063043
mediation; causal inference; confounder bias
23.  Network modeling of the transcriptional effects of copy number aberrations in glioblastoma 
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
We introduce a modeling approach termed EPoC (Endogenous Perturbation analysis of Cancer), enabling the construction of global, gene-level models that causally connect gene copy number with expression in glioblastoma.On the basis of the resulting model, we predict genes that are likely to be disease-driving and validate selected predictions experimentally. We also demonstrate that further analysis of the network model by sparse singular value decomposition allows stratification of patients with glioblastoma into short-term and long-term survivors, introducing decomposed network models as a useful principle for biomarker discovery.Finally, in systematic comparisons, we demonstrate that EPoC is computationally efficient and yields more consistent results than mRNA-only methods, standard eQTL methods, and two recent multivariate methods for genotype–mRNA coupling.
Gains and losses of chromosomal material (DNA copy number aberrations; CNAs) are a characteristic feature of cancer genomes. At the level of a single locus, it is well known that increased copy number (gene amplification) typically leads to increased gene expression, whereas decreased copy number (gene deletion) leads to decreased gene expression (Pollack et al, 2002; Lee et al, 2008; Nilsson et al, 2008). However, CNAs also affect the expression of genes located outside the amplified/deleted region itself via indirect mechanisms. To fully understand the action of CNAs, it is therefore necessary to analyze their action in a network context. Toward this goal, improved computational approaches will be important, if not essential.
To determine the global effects on transcription of CNAs in the brain tumor glioblastoma, we develop EPoC (Endogenous Perturbation analysis of Cancer), a computational technique capable of inferring sparse, causal network models by combining genome-wide, paired CNA- and mRNA-level data. EPoC aims to detect disease-driving copy number aberrations and their effect on target mRNA expression, and stratify patients into long-term and short-term survivors. Technically, EPoC relates CNA perturbations to mRNA responses by matrix equations, derived from a steady-state approximation of the transcriptional network. Patient prognostic scores are obtained from singular value decompositions of the network matrix. The models are constructed by solving a large-scale, regularized regression problem.
We apply EPoC to glioblastoma data from The Cancer Genome Atlas (TCGA) consortium (186 patients). The identified CNA-driven network comprises 10 672 genes, and contains a number of copy number-altered genes that control multiple downstream genes. Highly connected hub genes include well-known oncogenes and tumor supressor genes that are frequently deleted or amplified in glioblastoma, including EGFR, PDGFRA, CDKN2A and CDKN2B, confirming a clear association between these aberrations and transcriptional variability of these brain tumors. In addition, we identify a number of hub genes that have previously not been associated with glioblastoma, including interferon alpha 1 (IFNA1), myeloid/lymphoid or mixed-lineage leukemia translocated to 10 (MLLT10, a well-known leukemia gene), glutamate decarboxylase 2 GAD2, a postulated glutamate receptor GPR158 and Necdin (NDN). Furthermore, we demonstrate that the network model contains useful information on downstream target genes (including stem cell regulators), and possible drug targets.
We proceed to explore the validity of a small network region experimentally. Introducing experimental perturbations of NDN and other targets in four glioblastoma cell lines (T98G, U-87MG, U-343MG and U-373MG), we confirm several predicted mechanisms. We also demonstrate that the TCGA glioblastoma patients can be stratified into long-term and short-term survivors, using our proposed prognostic scores derived from a singular vector decomposition of the network model. Finally, we compare EPoC to existing methods for mRNA networks analysis and expression quantitative locus methods, and demonstrate that EPoC produces more consistent models between technically independent glioblastoma data sets, and that the EPoC models exhibit better overlap with known protein–protein interaction networks and pathway maps.
In summary, we conclude that large-scale integrative modeling reveals mechanistically and prognostically informative networks in human glioblastoma. Our approach operates at the gene level and our data support that individual hub genes can be identified in practice. Very large aberrations, however, cannot be fully resolved by the current modeling strategy.
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
PMCID: PMC3101951  PMID: 21525872
cancer biology; cancer genomics; glioblastoma
24.  Hard, harder, hardest: principal stratification, statistical identifiability, and the inherent difficulty of finding surrogate endpoints 
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging.
PMCID: PMC4171402  PMID: 25342953
Surrogate endpoint; Principal stratification; Causal inference; Statistical identifiability
25.  Assessing the sensitivity of methods for estimating principal causal effects 
Statistical methods in medical research  2011;10.1177/0962280211421840.
The framework of principal stratification provides a way to think about treatment effects conditional on post-randomization variables, such as level of compliance. In particular, the complier average causal effect (CACE)–the effect of the treatment for those individuals who would comply with their treatment assignment under either treatment condition–is often of substantive interest. However, estimation of the CACE is not always straightforward, with a variety of estimation procedures and underlying assumptions, but little advice to help researchers select between methods. In this paper we discuss and examine two methods that rely on very different assumptions to estimate the CACE: a maximum likelihood (“joint”) method that assumes the “exclusion restriction,” and a propensity score based method that relies on “principal ignorability.” We detail the assumptions underlying each approach, and assess each method’s sensitivity to both its own assumptions and those of the other method using both simulated data and a motivating example. We find that the exclusion restriction based joint approach appears somewhat less sensitive to its assumptions, and that the performance of both methods is significantly improved when there are strong predictors of compliance. Interestingly, we also find that each method performs particularly well when the assumptions of the other approach are violated. These results highlight the importance of carefully selecting an estimation procedure whose assumptions are likely to be satisfied in practice and of having strong predictors of principal stratum membership.
PMCID: PMC3253203  PMID: 21971481
Complier average causal effect; Intermediate outcomes; Noncompliance; Principal stratification; Propensity scores

Results 1-25 (1585673)