PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-25 (794813)

Clipboard (0)
None

Related Articles

1.  Commentary on “Principal Stratification — a Goal or a Tool?” by Judea Pearl 
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
doi:10.2202/1557-4679.1341
PMCID: PMC3204668  PMID: 22049267
principal stratification; causal inference; vaccine trial
2.  Causal assessment of surrogacy in a meta-analysis of colorectal cancer trials 
Biostatistics (Oxford, England)  2011;12(3):478-492.
When the true end points (T) are difficult or costly to measure, surrogate markers (S) are often collected in clinical trials to help predict the effect of the treatment (Z). There is great interest in understanding the relationship among S, T, and Z. A principal stratification (PS) framework has been proposed by Frangakis and Rubin (2002) to study their causal associations. In this paper, we extend the framework to a multiple trial setting and propose a Bayesian hierarchical PS model to assess surrogacy. We apply the method to data from a large collection of colon cancer trials in which S and T are binary. We obtain the trial-specific causal measures among S, T, and Z, as well as their overall population-level counterparts that are invariant across trials. The method allows for information sharing across trials and reduces the nonidentifiability problem. We examine the frequentist properties of our model estimates and the impact of the monotonicity assumption using simulations. We also illustrate the challenges in evaluating surrogacy in the counterfactual framework that result from nonidentifiability.
doi:10.1093/biostatistics/kxq082
PMCID: PMC3114655  PMID: 21252079
Bayesian estimation; Counterfactual model; Identifiability; Multiple trials; Principal stratification; Surrogate marker
3.  AN APPLICATION OF PRINCIPAL STRATIFICATION TO CONTROL FOR INSTITUTIONALIZATION AT FOLLOW-UP IN STUDIES OF SUBSTANCE ABUSE TREATMENT PROGRAMS* 
The annals of applied statistics  2008;2(3):1034-1055.
Participants in longitudinal studies on the effects of drug treatment and criminal justice system interventions are at high risk for institutionalization (e.g., spending time in an environment where their freedom to use drugs, commit crimes, or engage in risky behavior may be circumscribed). Methods used for estimating treatment effects in the presence of institutionalization during follow-up can be highly sensitive to assumptions that are unlikely to be met in applications and thus likely to yield misleading inferences. In this paper, we consider the use of principal stratification to control for institutionalization at follow-up. Principal stratification has been suggested for similar problems where outcomes are unobservable for samples of study participants because of dropout, death, or other forms of censoring. The method identifies principal strata within which causal effects are well defined and potentially estimable. We extend the method of principal stratification to model institutionalization at follow-up and estimate the effect of residential substance abuse treatment versus outpatient services in a large scale study of adolescent substance abuse treatment programs. Additionally, we discuss practical issues in applying the principal stratification model to data. We show via simulation studies that the model can only recover true effects provided the data meet strenuous demands and that there must be caution taken when implementing principal stratification as a technique to control for post-treatment confounders such as institutionalization.
doi:10.1214/08-AOAS179
PMCID: PMC2749670  PMID: 19779599
Principal Stratification; Post-Treatment Confounder; Institutionalization; Causal Inference
4.  Clarifying the Role of Principal Stratification in the Paired Availability Design 
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
doi:10.2202/1557-4679.1338
PMCID: PMC3114955  PMID: 21686085
principal stratification; causal inference; paired availability design
5.  Partially hidden Markov model for time-varying principal stratification in HIV prevention trials 
It is frequently of interest to estimate the intervention effect that adjusts for post-randomization variables in clinical trials. In the recently completed HPTN 035 trial, there is differential condom use between the three microbicide gel arms and the No Gel control arm, so that intention to treat (ITT) analyses only assess the net treatment effect that includes the indirect treatment effect mediated through differential condom use. Various statistical methods in causal inference have been developed to adjust for post-randomization variables. We extend the principal stratification framework to time-varying behavioral variables in HIV prevention trials with a time-to-event endpoint, using a partially hidden Markov model (pHMM). We formulate the causal estimand of interest, establish assumptions that enable identifiability of the causal parameters, and develop maximum likelihood methods for estimation. Application of our model on the HPTN 035 trial reveals an interesting pattern of prevention effectiveness among different condom-use principal strata.
doi:10.1080/01621459.2011.643743
PMCID: PMC3649016  PMID: 23667279
microbicide; causal inference; posttreatment variables; direct effect
6.  Principal Stratification — a Goal or a Tool? 
Principal stratification has recently become a popular tool to address certain causal inference questions, particularly in dealing with post-randomization factors in randomized trials. Here, we analyze the conceptual basis for this framework and invite response to clarify the value of principal stratification in estimating causal effects of interest.
doi:10.2202/1557-4679.1322
PMCID: PMC3083139  PMID: 21556288
causal inference; principal stratification; surrogate endpoints; direct effect; mediation
7.  Estimating Causal Effects in Trials Involving Multi-Treatment Arms Subject to Non-compliance: A Bayesian framework 
Summary
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
doi:10.1111/j.1467-9876.2009.00709.x
PMCID: PMC3104736  PMID: 21637737
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
8.  Intermediate outcomes in randomized clinical trials: an introduction 
Trials  2013;14:78.
Background
Intermediate outcomes are common and typically on the causal pathway to the final outcome. Some examples include noncompliance, missing data, and truncation by death like pregnancy (e.g. when the trial intervention is given to non-pregnant women and the final outcome is preeclampsia, defined only on pregnant women). The intention-to-treat approach does not account properly for them, and more appropriate alternative approaches like principal stratification are not yet widely known. The purposes of this study are to inform researchers that the intention-to-treat approach unfortunately does not fit all problems we face in experimental research, to introduce the principal stratification approach for dealing with intermediate outcomes, and to illustrate its application to a trial of long term calcium supplementation in women at high risk of preeclampsia.
Methods
Principal stratification and related concepts are introduced. Two ways for estimating causal effects are discussed and their application is illustrated using the calcium trial, where noncompliance and pregnancy are considered as intermediate outcomes, and preeclampsia is the main final outcome.
Results
The limitations of traditional approaches and methods for dealing with intermediate outcomes are demonstrated. The steps, assumptions and required calculations involved in the application of the principal stratification approach are discussed in detail in the case of our calcium trial.
Conclusions
The intention-to-treat approach is a very sound one but unfortunately it does not fit all problems we find in randomized clinical trials; this is particularly the case for intermediate outcomes, where alternative approaches like principal stratification should be considered.
doi:10.1186/1745-6215-14-78
PMCID: PMC3610291  PMID: 23510143
Intermediate outcomes; Intention-to-treat approach; Principal stratification; Causal effects
9.  A tutorial on principal stratification-based sensitivity analysis: Application to smoking cessation studies 
Background
One problem with assessing effects of smoking cessation interventions on withdrawal symptoms is that symptoms are affected by whether participants abstain from smoking during trials. Those who enter a randomized trial but do not change smoking behavior might not experience withdrawal related symptoms.
Purpose
We present a tutorial of how one can use a principal stratification sensitivity analysis to account for abstinence in the estimation of smoking cessation intervention effects. The paper is intended to introduce researchers to principal stratification and describe how they might implement the methods.
Methods
We provide a hypothetical example that demonstrates why estimating effects within observed abstention groups is problematic. We demonstrate how estimation of effects within groups defined by potential abstention that an individual would have in either arm of a study can provide meaningful inferences. We describe a sensitivity analysis method to estimate such effects, and use it to investigate effects of a combined behavioral and nicotine replacement therapy intervention on withdrawal symptoms in a female prisoner population.
Results
Overall, the intervention was found to reduce withdrawal symptoms but the effect was not statistically significant in the group that was observed to abstain. More importantly, the intervention was found to be highly effective in the group that would abstain regardless of intervention assignment. The effectiveness of the intervention in other potential abstinence strata depends on the sensitivity analysis assumptions.
Limitations
We make assumptions to narrow the range of our sensitivity parameter estimates. While appropriate in this situation, such assumptions might not be plausible in all situations.
Conclusions
A principal stratification sensitivity analysis provides a meaningful method of accounting for abstinence effects in the evaluation of smoking cessation interventions on withdrawal symptoms. Smoking researchers have previously recommended analyses in subgroups defined by observed abstention status in the evaluation of smoking cessation interventions. We believe that principal stratification analyses should replace such analyses as the preferred means of accounting for post-randomization abstinence effects in the evaluation of smoking cessation programs.
doi:10.1177/1740774510367811
PMCID: PMC2874094  PMID: 20423924
10.  Statistical identifiability and the surrogate endpoint problem, with application to vaccine trials 
Biometrics  2010;66(4):1153-1161.
Summary
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
doi:10.1111/j.1541-0420.2009.01380.x
PMCID: PMC3597127  PMID: 20105158
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
11.  Assessing the sensitivity of methods for estimating principal causal effects 
Statistical methods in medical research  2011;10.1177/0962280211421840.
The framework of principal stratification provides a way to think about treatment effects conditional on post-randomization variables, such as level of compliance. In particular, the complier average causal effect (CACE)–the effect of the treatment for those individuals who would comply with their treatment assignment under either treatment condition–is often of substantive interest. However, estimation of the CACE is not always straightforward, with a variety of estimation procedures and underlying assumptions, but little advice to help researchers select between methods. In this paper we discuss and examine two methods that rely on very different assumptions to estimate the CACE: a maximum likelihood (“joint”) method that assumes the “exclusion restriction,” and a propensity score based method that relies on “principal ignorability.” We detail the assumptions underlying each approach, and assess each method’s sensitivity to both its own assumptions and those of the other method using both simulated data and a motivating example. We find that the exclusion restriction based joint approach appears somewhat less sensitive to its assumptions, and that the performance of both methods is significantly improved when there are strong predictors of compliance. Interestingly, we also find that each method performs particularly well when the assumptions of the other approach are violated. These results highlight the importance of carefully selecting an estimation procedure whose assumptions are likely to be satisfied in practice and of having strong predictors of principal stratum membership.
doi:10.1177/0962280211421840
PMCID: PMC3253203  PMID: 21971481
Complier average causal effect; Intermediate outcomes; Noncompliance; Principal stratification; Propensity scores
12.  Principal Stratification and Attribution Prohibition: Good Ideas Taken Too Far 
Pearl’s article provides a useful springboard for discussing further the benefits and drawbacks of principal stratification and the associated discomfort with attributing effects to post-treatment variables. The basic insights of the approach are important: pay close attention to modification of treatment effects by variables not observable before treatment decisions are made, and be careful in attributing effects to variables when counterfactuals are ill-defined. These insights have often been taken too far in many areas of application of the approach, including instrumental variables, censoring by death, and surrogate outcomes. A novel finding is that the usual principal stratification estimand in the setting of censoring by death is by itself of little practical value in estimating intervention effects.
doi:10.2202/1557-4679.1367
PMCID: PMC3204670  PMID: 22049269
principal stratification; causal inference
13.  Response to Pearl's Comments on Principal Stratification 
Dr. Pearl invites researchers to justify their use of principal stratification. This comment explains how the use of principal stratification simplified a complex mediational problem encountered when evaluating a smoking cessation intervention's effect on reducing smoking withdrawal symptoms.
doi:10.2202/1557-4679.1330
PMCID: PMC3114954  PMID: 21686084
causal inference; principal stratification; mediation; smoking cessation interventions
14.  Bounding the Infectiousness Effect in Vaccine Trials 
Epidemiology (Cambridge, Mass.)  2011;22(5):686-693.
In vaccine trials, the vaccination of one person might prevent the infection of another; a distinction can be drawn between the ways such a protective effect might arise. Consider a setting with 2 persons per household in which one of the 2 is vaccinated. Vaccinating the first person may protect the second person by preventing the first from being infected and passing the infection on to the second. Alternatively, vaccinating the first person may protect the second by rendering the infection less contagious even if the first is infected. This latter mechanism is sometimes referred to as an “infectiousness effect” of the vaccine. Crude estimators for the infectiousness effect will be subject to selection bias due to stratification on a postvaccination event, namely the infection status of the first person. We use theory concerning causal inference under interference along with a principal-stratification framework to show that, although the crude estimator is biased, it is, under plausible assumptions, conservative for what one might define as a causal infectiousness effect. This applies to bias from selection due to the persons in the comparison, and also to selection due to pathogen virulence. We illustrate our results with an example from the literature.
doi:10.1097/EDE.0b013e31822708d5
PMCID: PMC3792580  PMID: 21753730
15.  Links between analysis of surrogate endpoints and endogeneity 
Statistics in medicine  2010;29(28):2869-2879.
Summary
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and general two-stage algorithm is proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
doi:10.1002/sim.4027
PMCID: PMC2997195  PMID: 20803482
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
16.  Estimating causal effects of air quality regulations using principal stratification for spatially correlated multivariate intermediate outcomes 
Biostatistics (Oxford, England)  2012;13(2):289-302.
Methods for causal inference regarding health effects of air quality regulations are met with unique challenges because (1) changes in air quality are intermediates on the causal pathway between regulation and health, (2) regulations typically affect multiple pollutants on the causal pathway towards health, and (3) regulating a given location can affect pollution at other locations, that is, there is interference between observations. We propose a principal stratification method designed to examine causal effects of a regulation on health that are and are not associated with causal effects of the regulation on air quality. A novel feature of our approach is the accommodation of a continuously scaled multivariate intermediate response vector representing multiple pollutants. Furthermore, we use a spatial hierarchical model for potential pollution concentrations and ultimately use estimates from this model to assess validity of assumptions regarding interference. We apply our method to estimate causal effects of the 1990 Clean Air Act Amendments among approximately 7 million Medicare enrollees living within 6 miles of a pollution monitor.
doi:10.1093/biostatistics/kxr052
PMCID: PMC3297824  PMID: 22267524
Air pollution; Bayesian statistics; Causal inference; Principal stratification; Spatial data
17.  Causal Inference in Public Health 
Causal inference has a central role in public health; the determination that an association is causal indicates the possibility for intervention. We review and comment on the long-used guidelines for interpreting evidence as supporting a causal association and contrast them with the potential outcomes framework that encourages thinking in terms of causes that are interventions. We argue that in public health this framework is more suitable, providing an estimate of an action’s consequences rather than the less precise notion of a risk factor’s causal effect. A variety of modern statistical methods adopt this approach. When an intervention cannot be specified, causal relations can still exist, but how to intervene to change the outcome will be unclear. In application, the often-complex structure of causal processes needs to be acknowledged and appropriate data collected to study them. These newer approaches need to be brought to bear on the increasingly complex public health challenges of our globalized world.
doi:10.1146/annurev-publhealth-031811-124606
PMCID: PMC4079266  PMID: 23297653
causation; causal modeling; causal framework; epidemiology
18.  Marbled Inflation From Population Structure in Gene-Based Association Studies With Rare Variants 
Genetic epidemiology  2013;37(3):10.1002/gepi.21714.
Accurate genetic association studies are crucial for the detection and the validation of disease determinants. One of the main confounding factors that affect accuracy is population stratification, and great efforts have been extended for the past decade to detect and to adjust for it. We have now efficient solutions for population stratification adjustment for single-SNP (where SNP is single-nucleotide polymorphisms) inference in genome-wide association studies, but it is unclear whether these solutions can be effectively applied to rare variation studies and in particular gene-based (or set-based) association methods that jointly analyze multiple rare and common variants. We examine here, both theoretically and empirically, the performance of two commonly used approaches for population stratification adjustment—genomic control and principal component analysis—when used on gene-based association tests. We show that, different from single-SNP inference, genes with diverse composition of rare and common variants may suffer from population stratification to various extent. The inflation in gene-level statistics could be impacted by the number and the allele frequency spectrum of SNPs in the gene, and by the gene-based testing method used in the analysis. As a consequence, using a universal inflation factor as a genomic control should be avoided in gene-based inference with sequencing data. We also demonstrate that caution needs to be exercised when using principal component adjustment because the accuracy of the adjusted analyses depends on the underlying population substructure, on the way the principal components are constructed, and on the number of principal components used to recover the substructure.
doi:10.1002/gepi.21714
PMCID: PMC3716585  PMID: 23468125
sequencing studies; gene-based association test; genomic control; principal component analysis; C-alpha test; burden test
19.  Adjusting for Population Stratification in a Fine Scale with Principal Components and Sequencing Data 
Genetic epidemiology  2013;37(8):10.1002/gepi.21764.
Population stratification is of primary interest in genetic studies to infer human evolution history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures in finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated the varying performance of adjusting for population stratification with different types and sets of variants when testing on different types of variants. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflations caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.
doi:10.1002/gepi.21764
PMCID: PMC3864649  PMID: 24123217
1000 Genomes Project; Association testing; Common variants; Principal component analysis; Rare variants; Spectral analysis
20.  Marginalized models for longitudinal ordinal data with application to quality of life studies 
Statistics in medicine  2008;27(21):4359-4380.
SUMMARY
Random effects are often used in generalized linear models to explain the serial dependence for longitudinal categorical data. Marginalized random effects models (MREMs) for the analysis of longitudinal binary data have been proposed to permit likelihood-based estimation of marginal regression parameters. In this paper, we introduce an extension of the MREM to accommodate longitudinal ordinal data. Maximum marginal likelihood estimation is implemented utilizing quasi-Newton algorithms with Monte Carlo integration of the random effects. Our approach is applied to analyze the quality of life data from a recent colorectal cancer clinical trial. Dropout occurs at a high rate and is often due to tumor progression or death. To deal with progression/death, we use a mixture model for the joint distribution of longitudinal measures and progression/death times and principal stratification to draw causal inferences about survivors.
doi:10.1002/sim.3352
PMCID: PMC2858760  PMID: 18613246
marginalized likelihood-based models; ordinal data models; dropout
21.  Causal Vaccine Effects on Binary Postinfection Outcomes 
The effects of vaccine on postinfection outcomes, such as disease, death, and secondary transmission to others, are important scientific and public health aspects of prophylactic vaccination. As a result, evaluation of many vaccine effects condition on being infected. Conditioning on an event that occurs posttreatment (in our case, infection subsequent to assignment to vaccine or control) can result in selection bias. Moreover, because the set of individuals who would become infected if vaccinated is likely not identical to the set of those who would become infected if given control, comparisons that condition on infection do not have a causal interpretation. In this article we consider identifiability and estimation of causal vaccine effects on binary postinfection outcomes. Using the principal stratification framework, we define a postinfection causal vaccine efficacy estimand in individuals who would be infected regardless of treatment assignment. The estimand is shown to be not identifiable under the standard assumptions of the stable unit treatment value, monotonicity, and independence of treatment assignment. Thus selection models are proposed that identify the causal estimand. Closed-form maximum likelihood estimators (MLEs) are then derived under these models, including those assuming maximum possible levels of positive and negative selection bias. These results show the relations between the MLE of the causal estimand and two commonly used estimators for vaccine effects on postinfection outcomes. For example, the usual intent-to-treat estimator is shown to be an upper bound on the postinfection causal vaccine effect provided that the magnitude of protection against infection is not too large. The methods are used to evaluate postinfection vaccine effects in a clinical trial of a rotavirus vaccine candidate and in a field study of a pertussis vaccine. Our results show that pertussis vaccination has a significant causal effect in reducing disease severity.
doi:10.1198/016214505000000970
PMCID: PMC2603579  PMID: 19096723
Causal inference; Infectious disease; Maximum likelihood; Principal stratification; Sensitivity analysis
22.  Clustering by genetic ancestry using genome-wide SNP data 
BMC Genetics  2010;11:108.
Background
Population stratification can cause spurious associations in a genome-wide association study (GWAS), and occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than the trait of interest. Principal components analysis (PCA) is the established approach to detect population substructure using genome-wide data and to adjust the genetic association for stratification by including the top principal components in the analysis. An alternative solution is genetic matching of cases and controls that requires, however, well defined population strata for appropriate selection of cases and controls.
Results
We developed a novel algorithm to cluster individuals into groups with similar ancestral backgrounds based on the principal components computed by PCA. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls using the clusters assigned by the algorithm substantially reduces population stratification bias. Through simulation we show that the power of our method is higher than adjustment for PCs in certain situations.
Conclusions
In addition to reducing population stratification bias and improving power, matching creates a clean dataset free of population stratification which can then be used to build prediction models without including variables to adjust for ancestry. The cluster assignments also allow for the estimation of genetic heterogeneity by examining cluster specific effects.
doi:10.1186/1471-2156-11-108
PMCID: PMC3018397  PMID: 21143920
23.  Effect of sample stratification on dairy GWAS results 
BMC Genomics  2012;13:536.
Background
Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.
Results
Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. EMMAX method had the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.
Conclusions
Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
doi:10.1186/1471-2164-13-536
PMCID: PMC3496570  PMID: 23039970
24.  THE POTENTIAL FOR BIAS IN PRINCIPAL CAUSAL EFFECT ESTIMATION WHEN TREATMENT RECEIVED DEPENDS ON A KEY COVARIATE1 
The annals of applied statistics  2011;5(3):1876-1892.
Motivated by a potential-outcomes perspective, the idea of principal stratification has been widely recognized for its relevance in settings susceptible to posttreatment selection bias such as randomized clinical trials where treatment received can differ from treatment assigned. In one such setting, we address subtleties involved in inference for causal effects when using a key covariate to predict membership in latent principal strata. We show that when treatment received can differ from treatment assigned in both study arms, incorporating a stratum-predictive covariate can make estimates of the “complier average causal effect” (CACE) derive from observations in the two treatment arms with different covariate distributions. Adopting a Bayesian perspective and using Markov chain Monte Carlo for computation, we develop posterior checks that characterize the extent to which incorporating the pretreatment covariate endangers estimation of the CACE. We apply the method to analyze a clinical trial comparing two treatments for jaw fractures in which the study protocol allowed surgeons to overrule both possible randomized treatment assignments based on their clinical judgment and the data contained a key covariate (injury severity) predictive of treatment received.
doi:10.1214/11-AOAS477
PMCID: PMC3269822  PMID: 22308190
Complier average causal effect; noncompliance; principal effect; principal stratification
25.  Bayesian inference for causal mediation effects using principal stratification with dichotomous mediators and outcomes 
Biostatistics (Oxford, England)  2010;11(2):353-372.
Most investigations in the social and health sciences aim to understand the directional or causal relationship between a treatment or risk factor and outcome. Given the multitude of pathways through which the treatment or risk factor may affect the outcome, there is also an interest in decomposing the effect of a treatment of risk factor into “direct” and “mediated” effects. For example, child's socioeconomic status (risk factor) may have a direct effect on the risk of death (outcome) and an effect that may be mediated through the adulthood socioeconomic status (mediator). Building on the potential outcome framework for causal inference, we develop a Bayesian approach for estimating direct and mediated effects in the context of a dichotomous mediator and dichotomous outcome, which is challenging as many parameters cannot be fully identified. We first define principal strata corresponding to the joint distribution of the observed and counterfactual values of the mediator, and define associate, dissociative, and mediated effects as functions of the differences in the mean outcome under differing treatment assignments within the principal strata. We then develop the likelihood properties and calculate nonparametric bounds of these causal effects assuming randomized treatment assignment. Because likelihood theory is not well developed for nonidentifiable parameters, we consider a Bayesian approach that allows the direct and mediated effects to be expressed in terms of the posterior distribution of the population parameters of interest. This range can be reduced by making further assumptions about the parameters that can be encoded in prior distribution assumptions. We perform sensitivity analyses by using several prior distributions that make weaker assumptions than monotonicity or the exclusion restriction. We consider an application that explores the mediating effects of adult poverty on the relationship between childhood poverty and risk of death.
doi:10.1093/biostatistics/kxp060
PMCID: PMC2830580  PMID: 20101045
Direct effect; Mediated effect; Monotonicity; Mortality; Poverty

Results 1-25 (794813)