
1.  Accommodating Missingness When Assessing Surrogacy Via Principal Stratification 
Clinical trials (London, England)  2013;10(3):363-377.
When an outcome of interest in a clinical trial is late-occurring or difficult to obtain, surrogate markers can be used to extract information about the effect of the treatment on the outcome of interest. Understanding associations between the causal effect of treatment on the outcome and the causal effect of treatment on the surrogate is critical to understanding the value of a surrogate from a clinical perspective.
Traditional regression approaches to determine the proportion of the treatment effect explained by surrogate markers suffer from several shortcomings: they can be unstable, and can lie outside of the 0–1 range. Further, they do not account for the fact that surrogate measures are obtained post-randomization, and thus the surrogate-outcome relationship may be subject to unmeasured confounding. Methods to avoid these problems are of key importance.
Frangakis and Rubin (Principal stratification in causal inference. Biometrics 2002;58:21–9) suggested assessing the causal effect of treatment within pre-randomization “principal strata” defined by the counterfactual joint distribution of the surrogate marker under the different treatment arms, with the proportion of the overall outcome causal effect attributable to subjects for whom the treatment affects the proposed surrogate as the key measure of interest. Li, Taylor, and Elliott (Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 2010;66:523–31) developed this “principal surrogacy” approach for dichotomous markers and outcomes, utilizing Bayesian methods that accommodate non-identifiability in the model parameters. Because the surrogate marker is typically observed early, outcome data are often missing. Here we extend Li, Taylor, and Elliott to accommodate missing data in the observable final outcome under both ignorable and non-ignorable settings. We also allow for the possibility that missingness has a counterfactual component, a feature that previous literature has not addressed.
We apply the proposed methods to a trial of glaucoma control comparing surgery versus medication, where intraocular pressure (IOP) control at 12 months is a surrogate for IOP control at 96 months. We also conduct a series of simulations to consider the impacts of non-ignorability, as well as sensitivity to priors and the ability of the Deviance Information Criterion to choose the correct model when parameters are not fully identified.
Because the model parameters cannot be fully identified from the data, informative priors can introduce non-trivial bias in moderate-sample-size settings, while less informative priors can yield wide credible intervals.
Assessing the linkage between causal effects of treatment on a surrogate marker and causal effects of a treatment on an outcome is important to understanding the value of a marker. These causal effects are not fully identifiable: hence we explore the sensitivity and identifiability aspects of these models and show that relatively weak assumptions can still yield meaningful results.
PMCID: PMC4096330  PMID: 23553326
Causal Inference; Surrogate Marker; Bayesian Analysis; Identifiability; Non-response; Counterfactual
2.  Intermediate outcomes in randomized clinical trials: an introduction 
Trials  2013;14:78.
Intermediate outcomes are common and typically lie on the causal pathway to the final outcome. Examples include noncompliance, missing data, and truncation, whether by death or by an outcome such as pregnancy (e.g., when the trial intervention is given to non-pregnant women and the final outcome is preeclampsia, which is defined only for pregnant women). The intention-to-treat approach does not account properly for them, and more appropriate alternative approaches like principal stratification are not yet widely known. The purposes of this study are to inform researchers that the intention-to-treat approach unfortunately does not fit all problems we face in experimental research, to introduce the principal stratification approach for dealing with intermediate outcomes, and to illustrate its application to a trial of long-term calcium supplementation in women at high risk of preeclampsia.
Principal stratification and related concepts are introduced. Two ways for estimating causal effects are discussed and their application is illustrated using the calcium trial, where noncompliance and pregnancy are considered as intermediate outcomes, and preeclampsia is the main final outcome.
The limitations of traditional approaches and methods for dealing with intermediate outcomes are demonstrated. The steps, assumptions and required calculations involved in the application of the principal stratification approach are discussed in detail in the case of our calcium trial.
The intention-to-treat approach is a very sound one but unfortunately it does not fit all problems we find in randomized clinical trials; this is particularly the case for intermediate outcomes, where alternative approaches like principal stratification should be considered.
PMCID: PMC3610291  PMID: 23510143
Intermediate outcomes; Intention-to-treat approach; Principal stratification; Causal effects
3.  Partially hidden Markov model for time-varying principal stratification in HIV prevention trials 
It is frequently of interest to estimate the intervention effect that adjusts for post-randomization variables in clinical trials. In the recently completed HPTN 035 trial, there is differential condom use between the three microbicide gel arms and the No Gel control arm, so that intention to treat (ITT) analyses only assess the net treatment effect that includes the indirect treatment effect mediated through differential condom use. Various statistical methods in causal inference have been developed to adjust for post-randomization variables. We extend the principal stratification framework to time-varying behavioral variables in HIV prevention trials with a time-to-event endpoint, using a partially hidden Markov model (pHMM). We formulate the causal estimand of interest, establish assumptions that enable identifiability of the causal parameters, and develop maximum likelihood methods for estimation. Application of our model on the HPTN 035 trial reveals an interesting pattern of prevention effectiveness among different condom-use principal strata.
PMCID: PMC3649016  PMID: 23667279
microbicide; causal inference; posttreatment variables; direct effect
4.  Can we apply the Mendelian randomization methodology without considering epigenetic effects? 
Instrumental variable (IV) methods have been used in econometrics for several decades now, but have only recently been introduced into epidemiologic research. Similarly, Mendelian randomization studies, which use the IV methodology for analysis and inference in epidemiology, were introduced into the epidemiologist's toolbox only in the last decade.
Mendelian randomization studies using instrumental variables (IVs) have the potential to avoid some of the limitations of observational epidemiology (confounding, reverse causality, regression dilution bias) for making causal inferences. Certain limitations of randomized controlled trials, such as problems with generalizability, feasibility and ethics for some exposures, and high costs, also make the use of Mendelian randomization in observational studies attractive. Unlike conventional randomized controlled trials (RCTs), Mendelian randomization studies can be conducted in a representative sample without imposing any exclusion criteria or requiring volunteers to be amenable to random treatment allocation.
Within the last decade, epigenetics has gained recognition as an independent field of study, and appears to be the new direction for future research into the genetics of complex diseases. Although previous articles have addressed some of the limitations of Mendelian randomization (such as the lack of suitable genetic variants, unreliable associations, population stratification, linkage disequilibrium (LD), pleiotropy, developmental canalization, the need for large sample sizes and some potential problems with binary outcomes), none has directly characterized the impact of epigenetics on Mendelian randomization. The possibility of epigenetic effects (non-Mendelian, heritable changes in gene expression not accompanied by alterations in DNA sequence) could alter the core instrumental variable assumptions of Mendelian randomization.
This paper applies conceptual considerations, algebraic derivations and data simulations to question the appropriateness of Mendelian randomization methods when epigenetic modifications are present.
Given an inheritance of gene expression from parents, Mendelian randomization studies not only need to assume a random distribution of alleles in the offspring, but also a random distribution of epigenetic changes (e.g. gene expression) at conception, in order for the core assumptions of the Mendelian randomization methodology to remain valid. As an increasing number of epidemiologists employ Mendelian randomization methods in their research, caution is therefore needed in drawing conclusions from these studies if these assumptions are not met.
PMCID: PMC2698894  PMID: 19432981
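The instrumental-variable logic underlying Mendelian randomization, as summarized in the entry above, can be illustrated with a minimal simulation. This is an illustrative sketch with invented variable names and effect sizes, not any specific study's analysis; it uses the Wald ratio, the simplest IV estimator, and assumes the core IV conditions hold (the variant affects the outcome only through the exposure and is independent of the confounder).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: G is a genetic variant (the instrument, allele
# count 0/1/2), U an unmeasured confounder, X the exposure, Y the outcome.
G = rng.binomial(2, 0.3, n)
U = rng.normal(size=n)                # confounds both X and Y
X = 0.5 * G + U + rng.normal(size=n)
Y = 0.3 * X + U + rng.normal(size=n)  # true causal effect of X on Y: 0.3

# Naive regression slope of Y on X is biased upward by U.
naive = np.cov(X, Y)[0, 1] / np.var(X)

# Wald ratio: (effect of G on Y) / (effect of G on X) recovers 0.3,
# because G is unrelated to U by the randomness of Mendelian inheritance.
wald = (np.cov(G, Y)[0, 1] / np.var(G)) / (np.cov(G, X)[0, 1] / np.var(G))
```

If epigenetic inheritance correlated G's expression with U, as the paper argues, the second line of reasoning in the comment would fail and the Wald ratio would no longer be consistent.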
5.  Causal Inference in Randomized Experiments With Mediational Processes 
Psychological methods  2008;13(4):314-336.
This article links the structural equation modeling (SEM) approach with the principal stratification (PS) approach, both of which have been widely used to study the role of intermediate posttreatment outcomes in randomized experiments. Despite the potential benefit of such integration, the 2 approaches have been developed in parallel with little interaction. This article proposes the cross-model translation (CMT) approach, in which parameter estimates are translated back and forth between the PS and SEM models. First, without involving any particular identifying assumptions, translation between PS and SEM parameters is carried out on the basis of their close conceptual connection. Monte Carlo simulations are used to further clarify the relation between the 2 approaches under particular identifying assumptions. The study concludes that, under the common goal of causal inference, what makes a practical difference is the choice of identifying assumptions, not the modeling framework itself. The CMT approach provides a common ground in which the PS and SEM approaches can be jointly considered, focusing on their common inferential problems.
PMCID: PMC2927874  PMID: 19071997
cross-model translation; mediational process; principal stratification; randomized experiment; structural equation modeling
6.  Commentary on “Principal Stratification — a Goal or a Tool?” by Judea Pearl 
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
PMCID: PMC3204668  PMID: 22049267
principal stratification; causal inference; vaccine trial
7.  Principal Stratification — Uses and Limitations 
Pearl (2011) asked the causal inference community to clarify the role of the principal stratification framework in the analysis of causal effects. Here, I argue that the notion of principal stratification has shed light on problems of non-compliance, censoring-by-death, and the analysis of post-infection outcomes; that it may be of use in considering problems of surrogacy but further development is needed; that it is of some use in assessing “direct effects”; but that it is not the appropriate tool for assessing “mediation.” There is nothing within the principal stratification framework that corresponds to a measure of an “indirect” or “mediated” effect.
PMCID: PMC3154088  PMID: 21841939
causal inference; mediation; non-compliance; potential outcomes; principal stratification; surrogates
8.  Clarifying the Role of Principal Stratification in the Paired Availability Design 
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
PMCID: PMC3114955  PMID: 21686085
principal stratification; causal inference; paired availability design
9.  Statistical identifiability and the surrogate endpoint problem, with application to vaccine trials 
Biometrics  2010;66(4):1153-1161.
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
PMCID: PMC3597127  PMID: 20105158
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
10.  Estimating Causal Effects in Trials Involving Multi-Treatment Arms Subject to Non-compliance: A Bayesian framework 
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
PMCID: PMC3104736  PMID: 21637737
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
11.  Principal Stratification in Causal Inference 
Biometrics  2002;58(1):21-29.
Many scientific problems require that treatment comparisons be adjusted for posttreatment variables, but the estimands underlying standard methods are not causal effects. To address this deficiency, we propose a general framework for comparing treatments adjusting for posttreatment variables that yields principal effects based on principal stratification. Principal stratification with respect to a posttreatment variable is a cross-classification of subjects defined by the joint potential values of that posttreatment variable under each of the treatments being compared. Principal effects are causal effects within a principal stratum. The key property of principal strata is that they are not affected by treatment assignment and therefore can be used just as any pretreatment covariate, such as age category. As a result, the central property of our principal effects is that they are always causal effects and do not suffer from the complications of standard posttreatment-adjusted estimands. We discuss briefly that such principal causal effects are the link between three recent applications with adjustment for posttreatment variables: (i) treatment noncompliance, (ii) missing outcomes (dropout) following treatment noncompliance, and (iii) censoring by death. We then attack the problem of surrogate or biomarker endpoints, where we show, using principal causal effects, that all current definitions of surrogacy, even when perfectly true, do not generally have the desired interpretation as causal effects of treatment on outcome. We go on to formulate estimands based on principal stratification and principal causal effects and show their superiority.
PMCID: PMC4137767  PMID: 11890317
Biomarker; Causal inference; Censoring by death; Missing data; Posttreatment variable; Principal stratification; Quality of life; Rubin causal model; Surrogate
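The definition in the entry above, principal strata as the cross-classification of subjects by the joint potential values of a posttreatment variable under each treatment, can be sketched in a few lines. The simulation is hypothetical: the stratum names follow the standard noncompliance terminology, and a monotonicity assumption (no "defiers") is imposed by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

# Hypothetical joint potential values of a binary posttreatment variable S
# (e.g., treatment uptake) under control, s0 = S(0), and treatment, s1 = S(1).
s0 = rng.binomial(1, 0.5, n)
s1 = np.maximum(s0, rng.binomial(1, 0.5, n))  # monotonicity: S(1) >= S(0)

# A subject's principal stratum is the pair (S(0), S(1)). It is determined
# before treatment assignment, so it can be used like a pretreatment covariate.
strata = {(0, 0): "never-taker", (0, 1): "complier", (1, 1): "always-taker"}
labels = [strata[(int(a), int(b))] for a, b in zip(s0, s1)]
```

In observed data only one of S(0), S(1) is seen per subject, which is why principal effects, causal effects within a stratum, generally require additional assumptions or sensitivity analysis to estimate.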
The annals of applied statistics  2011;5(3):1876-1892.
Motivated by a potential-outcomes perspective, the idea of principal stratification has been widely recognized for its relevance in settings susceptible to posttreatment selection bias such as randomized clinical trials where treatment received can differ from treatment assigned. In one such setting, we address subtleties involved in inference for causal effects when using a key covariate to predict membership in latent principal strata. We show that when treatment received can differ from treatment assigned in both study arms, incorporating a stratum-predictive covariate can make estimates of the “complier average causal effect” (CACE) derive from observations in the two treatment arms with different covariate distributions. Adopting a Bayesian perspective and using Markov chain Monte Carlo for computation, we develop posterior checks that characterize the extent to which incorporating the pretreatment covariate endangers estimation of the CACE. We apply the method to analyze a clinical trial comparing two treatments for jaw fractures in which the study protocol allowed surgeons to overrule both possible randomized treatment assignments based on their clinical judgment and the data contained a key covariate (injury severity) predictive of treatment received.
PMCID: PMC3269822  PMID: 22308190
Complier average causal effect; noncompliance; principal effect; principal stratification
13.  Causal Inference for Vaccine Effects on Infectiousness 
The International Journal of Biostatistics  2012;8(2). doi:10.2202/1557-4679.1354.
If a vaccine does not protect individuals completely against infection, it could still reduce infectiousness of infected vaccinated individuals to others. Typically, vaccine efficacy for infectiousness is estimated based on contrasts between the transmission risk to susceptible individuals from infected vaccinated individuals compared with that from infected unvaccinated individuals. Such estimates are problematic, however, because they are subject to selection bias and do not have a causal interpretation. Here, we develop causal estimands for vaccine efficacy for infectiousness for four different scenarios of populations of transmission units of size two. These causal estimands incorporate both principal stratification, based on the joint potential infection outcomes under vaccine and control, and interference between individuals within transmission units. In the most general scenario, both individuals can be exposed to infection outside the transmission unit and both can be assigned either vaccine or control. The three other scenarios are special cases of the general scenario where only one individual is exposed outside the transmission unit or can be assigned vaccine. The causal estimands for vaccine efficacy for infectiousness are well defined only within certain principal strata and, in general, are identifiable only with strong unverifiable assumptions. Nonetheless, the observed data do provide some information, and we derive large sample bounds on the causal vaccine efficacy for infectiousness estimands. An example of the type of data observed in a study to estimate vaccine efficacy for infectiousness is analyzed in the causal inference framework we developed.
PMCID: PMC3348179  PMID: 22499732
causal inference; principal stratification; interference; infectious disease; vaccine
14.  Assessing the sensitivity of methods for estimating principal causal effects 
Statistical methods in medical research  2011;10.1177/0962280211421840.
The framework of principal stratification provides a way to think about treatment effects conditional on post-randomization variables, such as level of compliance. In particular, the complier average causal effect (CACE)–the effect of the treatment for those individuals who would comply with their treatment assignment under either treatment condition–is often of substantive interest. However, estimation of the CACE is not always straightforward, with a variety of estimation procedures and underlying assumptions, but little advice to help researchers select between methods. In this paper we discuss and examine two methods that rely on very different assumptions to estimate the CACE: a maximum likelihood (“joint”) method that assumes the “exclusion restriction,” and a propensity score based method that relies on “principal ignorability.” We detail the assumptions underlying each approach, and assess each method’s sensitivity to both its own assumptions and those of the other method using both simulated data and a motivating example. We find that the exclusion restriction based joint approach appears somewhat less sensitive to its assumptions, and that the performance of both methods is significantly improved when there are strong predictors of compliance. Interestingly, we also find that each method performs particularly well when the assumptions of the other approach are violated. These results highlight the importance of carefully selecting an estimation procedure whose assumptions are likely to be satisfied in practice and of having strong predictors of principal stratum membership.
PMCID: PMC3253203  PMID: 21971481
Complier average causal effect; Intermediate outcomes; Noncompliance; Principal stratification; Propensity scores
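The exclusion-restriction-based CACE estimation discussed above can be illustrated with a simple moment (instrumental-variable) calculation on simulated data. This is a hedged sketch with invented parameters, assuming one-sided noncompliance and the exclusion restriction; it is not the paper's joint maximum likelihood or propensity-score implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical trial: Z is randomized assignment; C marks a complier
# (takes treatment iff assigned); D is treatment actually received.
Z = rng.binomial(1, 0.5, n)
C = rng.binomial(1, 0.6, n)       # 60% compliers, the rest never-takers
D = Z * C                         # one-sided noncompliance: no access off-arm

# Exclusion restriction: assignment affects Y only through receipt D.
Y = 2.0 * D + rng.normal(size=n)  # true CACE: 2.0

# Moment estimator (Angrist et al. 1996): ITT effect on Y divided by
# ITT effect on D, the latter being the complier proportion here.
itt_y = Y[Z == 1].mean() - Y[Z == 0].mean()
itt_d = D[Z == 1].mean() - D[Z == 0].mean()
cace = itt_y / itt_d
```

If the exclusion restriction fails (assignment shifts Y directly), this ratio no longer targets the CACE, which is why the paper's sensitivity comparison against principal-ignorability-based estimation matters.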
15.  Marginalized models for longitudinal ordinal data with application to quality of life studies 
Statistics in medicine  2008;27(21):4359-4380.
Random effects are often used in generalized linear models to explain the serial dependence for longitudinal categorical data. Marginalized random effects models (MREMs) for the analysis of longitudinal binary data have been proposed to permit likelihood-based estimation of marginal regression parameters. In this paper, we introduce an extension of the MREM to accommodate longitudinal ordinal data. Maximum marginal likelihood estimation is implemented utilizing quasi-Newton algorithms with Monte Carlo integration of the random effects. Our approach is applied to analyze the quality of life data from a recent colorectal cancer clinical trial. Dropout occurs at a high rate and is often due to tumor progression or death. To deal with progression/death, we use a mixture model for the joint distribution of longitudinal measures and progression/death times and principal stratification to draw causal inferences about survivors.
PMCID: PMC2858760  PMID: 18613246
marginalized likelihood-based models; ordinal data models; dropout
16.  Causal Vaccine Effects on Binary Postinfection Outcomes 
The effects of vaccine on postinfection outcomes, such as disease, death, and secondary transmission to others, are important scientific and public health aspects of prophylactic vaccination. As a result, evaluations of many vaccine effects condition on being infected. Conditioning on an event that occurs posttreatment (in our case, infection subsequent to assignment to vaccine or control) can result in selection bias. Moreover, because the set of individuals who would become infected if vaccinated is likely not identical to the set of those who would become infected if given control, comparisons that condition on infection do not have a causal interpretation. In this article we consider identifiability and estimation of causal vaccine effects on binary postinfection outcomes. Using the principal stratification framework, we define a postinfection causal vaccine efficacy estimand in individuals who would be infected regardless of treatment assignment. The estimand is shown to be not identifiable under the standard assumptions of the stable unit treatment value, monotonicity, and independence of treatment assignment. Thus selection models are proposed that identify the causal estimand. Closed-form maximum likelihood estimators (MLEs) are then derived under these models, including those assuming maximum possible levels of positive and negative selection bias. These results show the relations between the MLE of the causal estimand and two commonly used estimators for vaccine effects on postinfection outcomes. For example, the usual intent-to-treat estimator is shown to be an upper bound on the postinfection causal vaccine effect provided that the magnitude of protection against infection is not too large. The methods are used to evaluate postinfection vaccine effects in a clinical trial of a rotavirus vaccine candidate and in a field study of a pertussis vaccine. Our results show that pertussis vaccination has a significant causal effect in reducing disease severity.
PMCID: PMC2603579  PMID: 19096723
Causal inference; Infectious disease; Maximum likelihood; Principal stratification; Sensitivity analysis
17.  A tutorial on principal stratification-based sensitivity analysis: Application to smoking cessation studies 
One problem with assessing effects of smoking cessation interventions on withdrawal symptoms is that symptoms are affected by whether participants abstain from smoking during trials. Those who enter a randomized trial but do not change smoking behavior might not experience withdrawal related symptoms.
We present a tutorial of how one can use a principal stratification sensitivity analysis to account for abstinence in the estimation of smoking cessation intervention effects. The paper is intended to introduce researchers to principal stratification and describe how they might implement the methods.
We provide a hypothetical example that demonstrates why estimating effects within observed abstention groups is problematic. We demonstrate how estimation of effects within groups defined by potential abstention that an individual would have in either arm of a study can provide meaningful inferences. We describe a sensitivity analysis method to estimate such effects, and use it to investigate effects of a combined behavioral and nicotine replacement therapy intervention on withdrawal symptoms in a female prisoner population.
Overall, the intervention was found to reduce withdrawal symptoms but the effect was not statistically significant in the group that was observed to abstain. More importantly, the intervention was found to be highly effective in the group that would abstain regardless of intervention assignment. The effectiveness of the intervention in other potential abstinence strata depends on the sensitivity analysis assumptions.
We make assumptions to narrow the range of our sensitivity parameter estimates. While appropriate in this situation, such assumptions might not be plausible in all situations.
A principal stratification sensitivity analysis provides a meaningful method of accounting for abstinence effects in the evaluation of smoking cessation interventions on withdrawal symptoms. Smoking researchers have previously recommended analyses in subgroups defined by observed abstention status in the evaluation of smoking cessation interventions. We believe that principal stratification analyses should replace such analyses as the preferred means of accounting for post-randomization abstinence effects in the evaluation of smoking cessation programs.
PMCID: PMC2874094  PMID: 20423924
The annals of applied statistics  2008;2(3):1034-1055.
Participants in longitudinal studies on the effects of drug treatment and criminal justice system interventions are at high risk for institutionalization (e.g., spending time in an environment where their freedom to use drugs, commit crimes, or engage in risky behavior may be circumscribed). Methods used for estimating treatment effects in the presence of institutionalization during follow-up can be highly sensitive to assumptions that are unlikely to be met in applications and thus likely to yield misleading inferences. In this paper, we consider the use of principal stratification to control for institutionalization at follow-up. Principal stratification has been suggested for similar problems where outcomes are unobservable for samples of study participants because of dropout, death, or other forms of censoring. The method identifies principal strata within which causal effects are well defined and potentially estimable. We extend the method of principal stratification to model institutionalization at follow-up and estimate the effect of residential substance abuse treatment versus outpatient services in a large scale study of adolescent substance abuse treatment programs. Additionally, we discuss practical issues in applying the principal stratification model to data. We show via simulation studies that the model can recover true effects only when the data meet strenuous demands, and that caution must be taken when implementing principal stratification as a technique to control for post-treatment confounders such as institutionalization.
PMCID: PMC2749670  PMID: 19779599
Principal Stratification; Post-Treatment Confounder; Institutionalization; Causal Inference
19.  Network modeling of the transcriptional effects of copy number aberrations in glioblastoma 
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
We introduce a modeling approach termed EPoC (Endogenous Perturbation analysis of Cancer), enabling the construction of global, gene-level models that causally connect gene copy number with expression in glioblastoma. On the basis of the resulting model, we predict genes that are likely to be disease-driving and validate selected predictions experimentally. We also demonstrate that further analysis of the network model by sparse singular value decomposition allows stratification of patients with glioblastoma into short-term and long-term survivors, introducing decomposed network models as a useful principle for biomarker discovery. Finally, in systematic comparisons, we demonstrate that EPoC is computationally efficient and yields more consistent results than mRNA-only methods, standard eQTL methods, and two recent multivariate methods for genotype–mRNA coupling.
Gains and losses of chromosomal material (DNA copy number aberrations; CNAs) are a characteristic feature of cancer genomes. At the level of a single locus, it is well known that increased copy number (gene amplification) typically leads to increased gene expression, whereas decreased copy number (gene deletion) leads to decreased gene expression (Pollack et al, 2002; Lee et al, 2008; Nilsson et al, 2008). However, CNAs also affect the expression of genes located outside the amplified/deleted region itself via indirect mechanisms. To fully understand the action of CNAs, it is therefore necessary to analyze their action in a network context. Toward this goal, improved computational approaches will be important, if not essential.
To determine the global effects on transcription of CNAs in the brain tumor glioblastoma, we develop EPoC (Endogenous Perturbation analysis of Cancer), a computational technique capable of inferring sparse, causal network models by combining genome-wide, paired CNA- and mRNA-level data. EPoC aims to detect disease-driving copy number aberrations and their effect on target mRNA expression, and to stratify patients into long-term and short-term survivors. Technically, EPoC relates CNA perturbations to mRNA responses by matrix equations, derived from a steady-state approximation of the transcriptional network. Patient prognostic scores are obtained from singular value decompositions of the network matrix. The models are constructed by solving a large-scale, regularized regression problem.
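The matrix formulation can be sketched in a few lines. The code below is an illustrative toy, not the authors' implementation: it fits the gene-level network with ridge regression in place of EPoC's sparse (lasso-type) penalty, and scores patients against the leading singular vector of the fitted network; all names and the tiny simulated data set are ours.

```python
import numpy as np

def fit_network(cna, mrna, lam=0.1):
    # Fit A in mrna ≈ A @ cna (both genes x patients), i.e. a
    # steady-state linear map from copy number to expression.
    # Ridge stands in here for EPoC's sparse penalty.
    g = cna.shape[0]
    return mrna @ cna.T @ np.linalg.inv(cna @ cna.T + lam * np.eye(g))

def prognostic_scores(A, cna):
    # Project each patient's CNA profile onto the leading right
    # singular vector of the network matrix.
    _, _, vt = np.linalg.svd(A)
    return vt[0] @ cna

rng = np.random.default_rng(0)
cna = rng.normal(size=(5, 40))                  # toy: 5 genes, 40 patients
true_A = np.diag([2.0, -1.0, 0.5, 0.0, 1.5])    # direct effects only
mrna = true_A @ cna + 0.1 * rng.normal(size=cna.shape)
A_hat = fit_network(cna, mrna)
scores = prognostic_scores(A_hat, cna)
```

On this toy data the recovered `A_hat` approximates the diagonal generating matrix; in the real setting both dimensions are in the thousands, and the sparsity of the fitted network is what makes the hub structure interpretable.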
We apply EPoC to glioblastoma data from The Cancer Genome Atlas (TCGA) consortium (186 patients). The identified CNA-driven network comprises 10 672 genes, and contains a number of copy number-altered genes that control multiple downstream genes. Highly connected hub genes include well-known oncogenes and tumor suppressor genes that are frequently deleted or amplified in glioblastoma, including EGFR, PDGFRA, CDKN2A and CDKN2B, confirming a clear association between these aberrations and transcriptional variability of these brain tumors. In addition, we identify a number of hub genes that have previously not been associated with glioblastoma, including interferon alpha 1 (IFNA1), myeloid/lymphoid or mixed-lineage leukemia translocated to 10 (MLLT10, a well-known leukemia gene), glutamate decarboxylase 2 (GAD2), a postulated glutamate receptor (GPR158) and Necdin (NDN). Furthermore, we demonstrate that the network model contains useful information on downstream target genes (including stem cell regulators), and possible drug targets.
We proceed to explore the validity of a small network region experimentally. Introducing experimental perturbations of NDN and other targets in four glioblastoma cell lines (T98G, U-87MG, U-343MG and U-373MG), we confirm several predicted mechanisms. We also demonstrate that the TCGA glioblastoma patients can be stratified into long-term and short-term survivors, using our proposed prognostic scores derived from a singular value decomposition of the network model. Finally, we compare EPoC to existing methods for mRNA network analysis and expression quantitative trait locus (eQTL) methods, and demonstrate that EPoC produces more consistent models between technically independent glioblastoma data sets, and that the EPoC models exhibit better overlap with known protein–protein interaction networks and pathway maps.
In summary, we conclude that large-scale integrative modeling reveals mechanistically and prognostically informative networks in human glioblastoma. Our approach operates at the gene level and our data support that individual hub genes can be identified in practice. Very large aberrations, however, cannot be fully resolved by the current modeling strategy.
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
PMCID: PMC3101951  PMID: 21525872
cancer biology; cancer genomics; glioblastoma
20.  What Can Causal Networks Tell Us about Metabolic Pathways? 
PLoS Computational Biology  2012;8(4):e1002458.
Graphical models describe the linear correlation structure of data and have been used to establish causal relationships among phenotypes in genetic mapping populations. Data are typically collected at a single point in time. Biological processes, on the other hand, are often non-linear and display time-varying dynamics. The extent to which graphical models can recapitulate the architecture of an underlying biological process is not well understood. We consider metabolic networks with known stoichiometry to address the fundamental question: “What can causal networks tell us about metabolic pathways?” Using data from an Arabidopsis BaySha population and simulated data from dynamic models of pathway motifs, we assess our ability to reconstruct metabolic pathways using graphical models. Our results highlight the necessity of non-genetic residual biological variation for reliable inference. Recovery of the ordering within a pathway is possible, but should not be expected. Causal inference is sensitive to subtle patterns in the correlation structure that may be driven by a variety of factors, which may not emphasize the substrate-product relationship. We illustrate the effects of metabolic pathway architecture, epistasis and stochastic variation on correlation structure and graphical model-derived networks. We conclude that graphical models should be interpreted cautiously, especially if the implied causal relationships are to be used in the design of intervention strategies.
Author Summary
High-throughput profiling data are pervasive in modern genetic studies. The large-scale nature of the data can make interpretation challenging. Methods that estimate networks or graphs have become popular tools for proposing causal relationships among traits. However, it is not obvious that these methods are able to capture causal biological mechanisms. Here we address the power and limitations of causal inference methods in biological systems. We examine metabolic data from simulation and from a well-characterized metabolic pathway in plants. We show that variation has to propagate through the pathway for reliable network inference. While it is possible for causal inference methods to recover the ordering of the biological pathway, it should not be expected. Causal relationships create subtle patterns in correlation, which may be dominated by other biological factors that do not reflect the ordering of the underlying pathway. Our results shape expectations about these methods and explain some of the successes and failures of causal graphical models for network inference.
PMCID: PMC3320578  PMID: 22496633
21.  Bounding the Infectiousness Effect in Vaccine Trials 
Epidemiology (Cambridge, Mass.)  2011;22(5):686-693.
In vaccine trials, the vaccination of one person might prevent the infection of another; a distinction can be drawn between the ways such a protective effect might arise. Consider a setting with 2 persons per household in which one of the 2 is vaccinated. Vaccinating the first person may protect the second person by preventing the first from being infected and passing the infection on to the second. Alternatively, vaccinating the first person may protect the second by rendering the infection less contagious even if the first is infected. This latter mechanism is sometimes referred to as an “infectiousness effect” of the vaccine. Crude estimators for the infectiousness effect will be subject to selection bias due to stratification on a postvaccination event, namely the infection status of the first person. We use theory concerning causal inference under interference along with a principal-stratification framework to show that, although the crude estimator is biased, it is, under plausible assumptions, conservative for what one might define as a causal infectiousness effect. This applies to bias from selection due to the persons in the comparison, and also to selection due to pathogen virulence. We illustrate our results with an example from the literature.
PMCID: PMC3792580  PMID: 21753730
22.  Iterative pruning PCA improves resolution of highly structured populations 
BMC Bioinformatics  2009;10:382.
Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. Computationally efficient non-parametric methods, chiefly those based on Principal Components Analysis (PCA), are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming.
A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods.
The algorithm is computationally efficient and not constrained by the dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
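The iterative-pruning idea can be caricatured in a few lines. This is a toy sketch under our own ad hoc assumptions, not the published ipPCA algorithm: structure is "detected" by a crude leading-eigenvalue ratio rather than the paper's test, individuals are split by the sign of PC1, and the thresholds are arbitrary.

```python
import numpy as np

def ippca_split(geno, min_size=10, ev_ratio=5.0):
    # geno: individuals x markers genotype matrix (0/1/2 allele counts).
    # If the leading eigenvalue dominates (a crude stand-in for a real
    # structure test), split by the sign of PC1 and recurse; otherwise
    # report the current individuals as a single subpopulation.
    x = geno - geno.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    ev = s ** 2
    if len(geno) < 2 * min_size or ev[0] < ev_ratio * ev[1:].mean():
        return [np.arange(len(geno))]
    left = np.where(u[:, 0] < 0)[0]
    right = np.where(u[:, 0] >= 0)[0]
    # Map indices found in each half back to the caller's indexing.
    return ([left[g] for g in ippca_split(geno[left], min_size, ev_ratio)] +
            [right[g] for g in ippca_split(geno[right], min_size, ev_ratio)])

# Two simulated subpopulations differing in allele frequency at 50 of
# 200 markers; each is internally homogeneous.
rng = np.random.default_rng(1)
pop_a = rng.binomial(2, 0.2, size=(60, 200)).astype(float)
pop_b = np.hstack([rng.binomial(2, 0.8, size=(60, 50)),
                   rng.binomial(2, 0.2, size=(60, 150))]).astype(float)
parts = ippca_split(np.vstack([pop_a, pop_b]))
```

On this simulated data the recursion splits once and then stops in each pure subpopulation, recovering the two groups without any prior labels.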
PMCID: PMC2790469  PMID: 19930644
23.  Causal Inference in Longitudinal Comparative Effectiveness Studies With Repeated Measures of A Continuous Intermediate Variable 
Statistics in medicine  2014;33(20):3509-3527.
We propose a principal stratification approach to assess causal effects in non-randomized longitudinal comparative effectiveness studies with a binary endpoint outcome and repeated measures of a continuous intermediate variable. Our method is an extension of the principal stratification approach by Lin et al. [10,11], originally proposed for a longitudinal randomized study to assess the treatment effect of a continuous outcome adjusting for the heterogeneity of a repeatedly measured binary intermediate variable. Our motivation for this work comes from a comparison of the effect of two glucose-lowering medications on a clinical cohort of patients with type 2 diabetes. Here we consider a causal inference problem assessing how well the two medications work relative to one another on two binary endpoint outcomes: cardiovascular disease related hospitalization and all-cause mortality. Clinically, these glucose-lowering medications can have differential effects on the intermediate outcome, glucose level over time. Ultimately we want to compare medication effects on the endpoint outcomes among individuals in the same glucose trajectory stratum while accounting for the heterogeneity in baseline covariates (i.e., to obtain “principal effects” on the endpoint outcomes). The proposed method involves a 3-step model estimation procedure. Step 1 identifies principal strata associated with the intermediate variable using hybrid growth mixture modeling analyses [13]. Step 2 obtains the stratum membership using the pseudoclass technique [17,18], and derives propensity scores for treatment assignment. Step 3 obtains the stratum-specific treatment effect on the endpoint outcome weighted by inverse propensity probabilities derived from Step 2.
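Step 3 reduces to an inverse-propensity-weighted difference of outcome means within each stratum. The sketch below illustrates that final step alone, assuming the stratum labels (Steps 1–2) and propensity scores (Step 2) are already in hand; the function and the simulated data are our own illustration, not the authors' code.

```python
import numpy as np

def stratum_effect(y, z, ps, stratum, s):
    # IPW estimate of the treatment effect on endpoint y within
    # principal stratum s: z is the treatment indicator and ps the
    # estimated propensity P(z = 1 | covariates).
    m = stratum == s
    y, z, ps = y[m], z[m], ps[m]
    mu1 = np.sum(z * y / ps) / np.sum(z / ps)
    mu0 = np.sum((1 - z) * y / (1 - ps)) / np.sum((1 - z) / (1 - ps))
    return mu1 - mu0

# Toy data: binary endpoint, true treatment effect 0.3, treatment
# assignment confounded by x, two arbitrary strata.
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))                      # true propensity
z = rng.binomial(1, ps)
y = rng.binomial(1, 0.15 + 0.3 * z + 0.1 * (x > 0))
stratum = rng.integers(0, 2, size=n)
est = stratum_effect(y, z, ps, stratum, 0)
```

Because treated subjects here tend to have larger x (and hence higher baseline risk), the unweighted difference in means is biased upward; the weighting corrects for it within the stratum.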
PMCID: PMC4122661  PMID: 24577715
Causal inference; Comparative effectiveness studies; Growth mixture model; Principal stratification; Propensity score
24.  Effects of BMI, Fat Mass, and Lean Mass on Asthma in Childhood: A Mendelian Randomization Study 
PLoS Medicine  2014;11(7):e1001669.
In this study, Granell and colleagues used Mendelian randomization to investigate causal effects of BMI, fat mass, and lean mass on current asthma at age 7½ years in the Avon Longitudinal Study of Parents and Children (ALSPAC) and found that higher BMI increases the risk of asthma in mid-childhood.
Please see later in the article for the Editors' Summary
Observational studies have reported associations between body mass index (BMI) and asthma, but confounding and reverse causality remain plausible explanations. We aim to investigate evidence for a causal effect of BMI on asthma using a Mendelian randomization approach.
Methods and Findings
We used Mendelian randomization to investigate causal effects of BMI, fat mass, and lean mass on current asthma at age 7½ y in the Avon Longitudinal Study of Parents and Children (ALSPAC). A weighted allele score based on 32 independent BMI-related single nucleotide polymorphisms (SNPs) was derived from external data, and associations with BMI, fat mass, lean mass, and asthma were estimated. We derived instrumental variable (IV) estimates of causal risk ratios (RRs). 4,835 children had available data on BMI-associated SNPs, asthma, and BMI. The weighted allele score was strongly associated with BMI, fat mass, and lean mass (all p-values < 0.001) and with childhood asthma (RR 2.56, 95% CI 1.38–4.76 per unit score, p = 0.003). The estimated causal RR for the effect of BMI on asthma was 1.55 (95% CI 1.16–2.07) per kg/m2, p = 0.003. This effect appeared stronger for non-atopic (1.90, 95% CI 1.19–3.03) than for atopic asthma (1.37, 95% CI 0.89–2.11) though there was little evidence of heterogeneity (p = 0.31). The estimated causal RRs for the effects of fat mass and lean mass on asthma were 1.41 (95% CI 1.11–1.79) per 0.5 kg and 2.25 (95% CI 1.23–4.11) per kg, respectively. The possibility of genetic pleiotropy could not be discounted completely; however, additional IV analyses using FTO variant rs1558902 and the other BMI-related SNPs separately provided similar causal effects with wider confidence intervals. Loss to follow-up was unlikely to bias the estimated effects.
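The core IV calculation behind an allele-score analysis is the Wald (ratio) estimator: the instrument–outcome association divided by the instrument–exposure association. The sketch below is a generic textbook illustration on simulated continuous data, not the ALSPAC analysis (which estimated risk ratios from a 32-SNP weighted score); the variable names and numbers are ours.

```python
import numpy as np

def wald_ratio(g, x, y):
    # IV estimate of the causal effect of exposure x on outcome y,
    # using allele score g as the instrument: beta_gy / beta_gx.
    bgx = np.cov(g, x)[0, 1] / np.var(g, ddof=1)
    bgy = np.cov(g, y)[0, 1] / np.var(g, ddof=1)
    return bgy / bgx

# Confounded toy data: u affects both exposure and outcome, the allele
# score g affects the outcome only through the exposure, and the true
# causal effect is 0.4.
rng = np.random.default_rng(3)
n = 20000
g = rng.binomial(32, 0.3, size=n).astype(float)   # allele-count score
u = rng.normal(size=n)                            # unmeasured confounder
x = 0.2 * g + u + rng.normal(size=n)              # exposure
y = 0.4 * x + u + rng.normal(size=n)              # outcome
iv_est = wald_ratio(g, x, y)
naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # confounded OLS slope
```

Because the score is inherited independently of the confounder, `iv_est` recovers the causal slope while the ordinary regression slope is biased upward by the shared confounding, which is exactly the logic of the study's design.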
Higher BMI increases the risk of asthma in mid-childhood. Higher BMI may have contributed to the increase in asthma risk toward the end of the 20th century.
Editors' Summary
The global burden of asthma, a chronic (long-term) condition caused by inflammation of the airways (the tubes that carry air in and out of the lungs), has been rising steadily over the past few decades. It is estimated that, nowadays, 200–300 million adults and children worldwide are affected by asthma. Although asthma can develop at any age, it is often diagnosed in childhood—asthma is the most common chronic disease in children. In people with asthma, the airways can react very strongly to allergens such as animal fur or to irritants such as cigarette smoke, becoming narrower so that less air can enter the lungs. Exercise, cold air, and infections can also trigger asthma attacks, which can be fatal. The symptoms of asthma include wheezing, coughing, chest tightness, and shortness of breath. Asthma cannot be cured, but drugs can relieve its symptoms and prevent acute asthma attacks.
Why Was This Study Done?
We cannot halt the ongoing rise in global asthma rates without understanding the causes of asthma. Some experts think obesity may be one cause of asthma. Obesity, like asthma, is increasingly common, and observational studies (investigations that ask whether individuals exposed to a suspected risk factor for a condition develop that condition more often than unexposed individuals) in children have reported that body mass index (BMI, an indicator of body fat calculated by dividing a person's weight in kilograms by their height in meters squared) is positively associated with asthma. Observational studies cannot prove that obesity causes asthma because of “confounding.” Overweight children with asthma may share another unknown characteristic (confounder) that actually causes both obesity and asthma. Moreover, children with asthma may be less active than unaffected children, so they become overweight (reverse causality). Here, the researchers use “Mendelian randomization” to assess whether BMI has a causal effect on asthma. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the effect of a modifiable risk factor and the outcome of interest. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. So, if a higher BMI leads to asthma, genetic variants associated with increased BMI should be associated with an increased risk of asthma.
What Did the Researchers Do and Find?
The researchers investigated causal effects of BMI, fat mass, and lean mass on current asthma at age 7½ years in 4,835 children enrolled in the Avon Longitudinal Study of Parents and Children (ALSPAC, a long-term health project that started in 1991). They calculated an allele score for each child based on 32 BMI-related genetic variants, and estimated associations between this score and BMI, fat mass and lean mass (both measured using a special type of X-ray scanner; in children BMI is not a good indicator of “fatness”), and asthma. They report that the allele score was strongly associated with BMI, fat mass, and lean mass, and with childhood asthma. The estimated causal relative risk (risk ratio) for the effect of BMI on asthma was 1.55 per kg/m2. That is, the relative risk of asthma increased by 55% for every extra unit of BMI. The estimated causal relative risks for the effects of fat mass and lean mass on asthma were 1.41 per 0.5 kg and 2.25 per kg, respectively.
What Do These Findings Mean?
These findings suggest that a higher BMI increases the risk of asthma in mid-childhood and that global increases in BMI toward the end of the 20th century may have contributed to the global increase in asthma that occurred at the same time. It is possible that the observed association between BMI and asthma reported in this study is underpinned by “genetic pleiotropy” (a potential limitation of all Mendelian randomization analyses). That is, some of the genetic variants included in the BMI allele score could conceivably also increase the risk of asthma. Nevertheless, these findings suggest that public health interventions designed to reduce obesity may also help to limit the global rise in asthma.
Additional Information
Please access these websites via the online version of this summary at
The US Centers for Disease Control and Prevention provides information on asthma and on all aspects of overweight and obesity (in English and Spanish)
The World Health Organization provides information on asthma and on obesity (in several languages)
The UK National Health Service Choices website provides information about asthma, about asthma in children, and about obesity (including real stories)
The Global Asthma Report 2011 is available
The Global Initiative for Asthma released its updated Global Strategy for Asthma Management and Prevention on World Asthma Day 2014
Information about the Avon Longitudinal Study of Parents and Children is available
MedlinePlus provides links to further information on obesity in children, on asthma, and on asthma in children (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC4077660  PMID: 24983943
25.  Assessing statistical significance in causal graphs 
BMC Bioinformatics  2012;13:35.
Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing.
First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely is the observed degree of agreement to arise under a null distribution in which the observed classification is randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds.
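A Monte Carlo stand-in for that null clarifies what is being tested, even though the paper's contribution is computing the distribution exactly and efficiently. Everything below is our own illustrative code, not the authors' algorithms.

```python
import numpy as np

def ternary_agreement_pvalue(pred, obs, n_perm=5000, seed=0):
    # Score agreement between predicted and observed calls coded as
    # +1 (up), -1 (down), 0 (neither) by their dot product, then ask
    # how often a random relabeling of the observations scores at
    # least as well.
    rng = np.random.default_rng(seed)
    pred, obs = np.asarray(pred), np.asarray(obs)
    score = pred @ obs
    null = np.array([pred @ rng.permutation(obs) for _ in range(n_perm)])
    return (1 + np.sum(null >= score)) / (1 + n_perm)

obs = np.array([1] * 10 + [-1] * 10 + [0] * 10)
p_good = ternary_agreement_pvalue(obs.copy(), obs)   # perfect agreement
p_bad = ternary_agreement_pvalue(-obs, obs)          # perfect disagreement
```

Perfect agreement yields the maximum possible dot product, so almost no permutation matches it and the p-value is tiny; perfect disagreement yields the minimum, which every permutation beats.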
Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs.
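The degree constraints can be appreciated through the classical double-edge-swap move, which a uniform sampler must refine. The sketch below only illustrates the swap that preserves every vertex's signed in- and out-degrees; unlike the paper's algorithm it neither preserves connectedness nor guarantees uniform sampling, and the toy graph is ours.

```python
import random
from collections import Counter

def signed_degrees(edges):
    # Multisets of (vertex, sign) out-degrees and in-degrees.
    out, inn = Counter(), Counter()
    for a, b, s in edges:
        out[(a, s)] += 1
        inn[(b, s)] += 1
    return out, inn

def randomize_signed_graph(edges, n_tries=2000, seed=0):
    # Repeated double-edge swaps: (a->b, c->d) => (a->d, c->b) for two
    # edges of the same sign. A proposal is rejected if it would
    # create a self-loop or a duplicate edge.
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_tries):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b, s1), (c, d, s2) = edges[i], edges[j]
        if s1 != s2:
            continue
        e1, e2 = (a, d, s1), (c, b, s1)
        if a == d or c == b or e1 in present or e2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {e1, e2}
        edges[i], edges[j] = e1, e2
    return edges

original = [(0, 1, +1), (1, 2, +1), (2, 3, +1), (3, 0, +1),
            (0, 2, -1), (2, 0, -1), (1, 3, -1), (3, 1, -1)]
shuffled = randomize_signed_graph(original)
```

Each accepted swap rewires two same-signed edges without touching any vertex's positive or negative in- or out-degree, which is exactly the invariant the null distribution described above must hold fixed.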
We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields.
PMCID: PMC3307026  PMID: 22348444
