This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
principal stratification; causal inference; vaccine trial
When the true end points (T) are difficult or costly to measure, surrogate markers (S) are often collected in clinical trials to help predict the effect of the treatment (Z). There is great interest in understanding the relationship among S, T, and Z. A principal stratification (PS) framework has been proposed by Frangakis and Rubin (2002) to study their causal associations. In this paper, we extend the framework to a multiple trial setting and propose a Bayesian hierarchical PS model to assess surrogacy. We apply the method to data from a large collection of colon cancer trials in which S and T are binary. We obtain the trial-specific causal measures among S, T, and Z, as well as their overall population-level counterparts that are invariant across trials. The method allows for information sharing across trials and reduces the nonidentifiability problem. We examine the frequentist properties of our model estimates and the impact of the monotonicity assumption using simulations. We also illustrate the challenges in evaluating surrogacy in the counterfactual framework that result from nonidentifiability.
Bayesian estimation; Counterfactual model; Identifiability; Multiple trials; Principal stratification; Surrogate marker
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
principal stratification; causal inference; paired availability design
Participants in longitudinal studies on the effects of drug treatment and criminal justice system interventions are at high risk for institutionalization (e.g., spending time in an environment where their freedom to use drugs, commit crimes, or engage in risky behavior may be circumscribed). Methods used for estimating treatment effects in the presence of institutionalization during follow-up can be highly sensitive to assumptions that are unlikely to be met in applications and thus likely to yield misleading inferences. In this paper, we consider the use of principal stratification to control for institutionalization at follow-up. Principal stratification has been suggested for similar problems where outcomes are unobservable for samples of study participants because of dropout, death, or other forms of censoring. The method identifies principal strata within which causal effects are well defined and potentially estimable. We extend the method of principal stratification to model institutionalization at follow-up and estimate the effect of residential substance abuse treatment versus outpatient services in a large scale study of adolescent substance abuse treatment programs. Additionally, we discuss practical issues in applying the principal stratification model to data. We show via simulation studies that the model can only recover true effects provided the data meet strenuous demands and that there must be caution taken when implementing principal stratification as a technique to control for post-treatment confounders such as institutionalization.
Principal Stratification; Post-Treatment Confounder; Institutionalization; Causal Inference
Principal stratification has recently become a popular tool to address certain causal inference questions, particularly in dealing with post-randomization factors in randomized trials. Here, we analyze the conceptual basis for this framework and invite response to clarify the value of principal stratification in estimating causal effects of interest.
causal inference; principal stratification; surrogate endpoints; direct effect; mediation
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
One problem with assessing effects of smoking cessation interventions on withdrawal symptoms is that symptoms are affected by whether participants abstain from smoking during trials. Those who enter a randomized trial but do not change smoking behavior might not experience withdrawal related symptoms.
We present a tutorial of how one can use a principal stratification sensitivity analysis to account for abstinence in the estimation of smoking cessation intervention effects. The paper is intended to introduce researchers to principal stratification and describe how they might implement the methods.
We provide a hypothetical example that demonstrates why estimating effects within observed abstention groups is problematic. We demonstrate how estimation of effects within groups defined by potential abstention that an individual would have in either arm of a study can provide meaningful inferences. We describe a sensitivity analysis method to estimate such effects, and use it to investigate effects of a combined behavioral and nicotine replacement therapy intervention on withdrawal symptoms in a female prisoner population.
Overall, the intervention was found to reduce withdrawal symptoms but the effect was not statistically significant in the group that was observed to abstain. More importantly, the intervention was found to be highly effective in the group that would abstain regardless of intervention assignment. The effectiveness of the intervention in other potential abstinence strata depends on the sensitivity analysis assumptions.
We make assumptions to narrow the range of our sensitivity parameter estimates. While appropriate in this situation, such assumptions might not be plausible in all situations.
A principal stratification sensitivity analysis provides a meaningful method of accounting for abstinence effects in the evaluation of smoking cessation interventions on withdrawal symptoms. Smoking researchers have previously recommended analyses in subgroups defined by observed abstention status in the evaluation of smoking cessation interventions. We believe that principal stratification analyses should replace such analyses as the preferred means of accounting for post-randomization abstinence effects in the evaluation of smoking cessation programs.
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
Pearl’s article provides a useful springboard for discussing further the benefits and drawbacks of principal stratification and the associated discomfort with attributing effects to post-treatment variables. The basic insights of the approach are important: pay close attention to modification of treatment effects by variables not observable before treatment decisions are made, and be careful in attributing effects to variables when counterfactuals are ill-defined. These insights have often been taken too far in many areas of application of the approach, including instrumental variables, censoring by death, and surrogate outcomes. A novel finding is that the usual principal stratification estimand in the setting of censoring by death is by itself of little practical value in estimating intervention effects.
principal stratification; causal inference
Dr. Pearl invites researchers to justify their use of principal stratification. This comment explains how the use of principal stratification simplified a complex mediational problem encountered when evaluating a smoking cessation intervention's effect on reducing smoking withdrawal symptoms.
causal inference; principal stratification; mediation; smoking cessation interventions
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and general two-stage algorithm is proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
Random effects are often used in generalized linear models to explain the serial dependence for longitudinal categorical data. Marginalized random effects models (MREMs) for the analysis of longitudinal binary data have been proposed to permit likelihood-based estimation of marginal regression parameters. In this paper, we introduce an extension of the MREM to accommodate longitudinal ordinal data. Maximum marginal likelihood estimation is implemented utilizing quasi-Newton algorithms with Monte Carlo integration of the random effects. Our approach is applied to analyze the quality of life data from a recent colorectal cancer clinical trial. Dropout occurs at a high rate and is often due to tumor progression or death. To deal with progression/death, we use a mixture model for the joint distribution of longitudinal measures and progression/death times and principal stratification to draw causal inferences about survivors.
marginalized likelihood-based models; ordinal data models; dropout
The effects of vaccine on postinfection outcomes, such as disease, death, and secondary transmission to others, are important scientific and public health aspects of prophylactic vaccination. As a result, evaluation of many vaccine effects condition on being infected. Conditioning on an event that occurs posttreatment (in our case, infection subsequent to assignment to vaccine or control) can result in selection bias. Moreover, because the set of individuals who would become infected if vaccinated is likely not identical to the set of those who would become infected if given control, comparisons that condition on infection do not have a causal interpretation. In this article we consider identifiability and estimation of causal vaccine effects on binary postinfection outcomes. Using the principal stratification framework, we define a postinfection causal vaccine efficacy estimand in individuals who would be infected regardless of treatment assignment. The estimand is shown to be not identifiable under the standard assumptions of the stable unit treatment value, monotonicity, and independence of treatment assignment. Thus selection models are proposed that identify the causal estimand. Closed-form maximum likelihood estimators (MLEs) are then derived under these models, including those assuming maximum possible levels of positive and negative selection bias. These results show the relations between the MLE of the causal estimand and two commonly used estimators for vaccine effects on postinfection outcomes. For example, the usual intent-to-treat estimator is shown to be an upper bound on the postinfection causal vaccine effect provided that the magnitude of protection against infection is not too large. The methods are used to evaluate postinfection vaccine effects in a clinical trial of a rotavirus vaccine candidate and in a field study of a pertussis vaccine. Our results show that pertussis vaccination has a significant causal effect in reducing disease severity.
Causal inference; Infectious disease; Maximum likelihood; Principal stratification; Sensitivity analysis
Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.
Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. EMMAX method had the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.
Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
Population stratification can cause spurious associations in a genome-wide association study (GWAS), and occurs when differences in allele frequencies of single nucleotide polymorphisms (SNPs) are due to ancestral differences between cases and controls rather than the trait of interest. Principal components analysis (PCA) is the established approach to detect population substructure using genome-wide data and to adjust the genetic association for stratification by including the top principal components in the analysis. An alternative solution is genetic matching of cases and controls that requires, however, well defined population strata for appropriate selection of cases and controls.
We developed a novel algorithm to cluster individuals into groups with similar ancestral backgrounds based on the principal components computed by PCA. We demonstrate the effectiveness of our algorithm in real and simulated data, and show that matching cases and controls using the clusters assigned by the algorithm substantially reduces population stratification bias. Through simulation we show that the power of our method is higher than adjustment for PCs in certain situations.
In addition to reducing population stratification bias and improving power, matching creates a clean dataset free of population stratification which can then be used to build prediction models without including variables to adjust for ancestry. The cluster assignments also allow for the estimation of genetic heterogeneity by examining cluster specific effects.
Motivated by a potential-outcomes perspective, the idea of principal stratification has been widely recognized for its relevance in settings susceptible to posttreatment selection bias such as randomized clinical trials where treatment received can differ from treatment assigned. In one such setting, we address subtleties involved in inference for causal effects when using a key covariate to predict membership in latent principal strata. We show that when treatment received can differ from treatment assigned in both study arms, incorporating a stratum-predictive covariate can make estimates of the “complier average causal effect” (CACE) derive from observations in the two treatment arms with different covariate distributions. Adopting a Bayesian perspective and using Markov chain Monte Carlo for computation, we develop posterior checks that characterize the extent to which incorporating the pretreatment covariate endangers estimation of the CACE. We apply the method to analyze a clinical trial comparing two treatments for jaw fractures in which the study protocol allowed surgeons to overrule both possible randomized treatment assignments based on their clinical judgment and the data contained a key covariate (injury severity) predictive of treatment received.
Complier average causal effect; noncompliance; principal effect; principal stratification
In assessing the mechanism of treatment efficacy in randomized clinical trials, investigators often perform mediation analyses by analyzing if the significant intent-to-treat treatment effect on outcome occurs through or around a third intermediate or mediating variable: indirect and direct effects, respectively. Standard mediation analyses assume sequential ignorability, i.e., conditional on covariates the intermediate or mediating factor is randomly assigned, as is the treatment in a randomized clinical trial. This research focuses on the application of the principal stratification approach for estimating the direct effect of a randomized treatment but without the standard sequential ignorability assumption. This approach is used to estimate the direct effect of treatment as a difference between expectations of potential outcomes within latent sub-groups of participants for whom the intermediate variable behavior would be constant, regardless of the randomized treatment assignment. Using a Bayesian estimation procedure, we also assess the sensitivity of results based on the principal stratification approach to heterogeneity of the variances among these principal strata. We assess this approach with simulations and apply it to two psychiatric examples. Both examples and the simulations indicated robustness of our findings to the homogeneous variance assumption. However, simulations showed that the magnitude of treatment effects derived under the principal stratification approach were sensitive to model mis-specification.
Principal stratification; mediating variables; direct effects; principal strata probabilities; heterogeneous variances
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
structural equation models; confounding; graphical methods; counterfactuals; causal effects; potential-outcome; mediation; policy evaluation; causes of effects
Most investigations in the social and health sciences aim to understand the directional or causal relationship between a treatment or risk factor and outcome. Given the multitude of pathways through which the treatment or risk factor may affect the outcome, there is also an interest in decomposing the effect of a treatment of risk factor into “direct” and “mediated” effects. For example, child's socioeconomic status (risk factor) may have a direct effect on the risk of death (outcome) and an effect that may be mediated through the adulthood socioeconomic status (mediator). Building on the potential outcome framework for causal inference, we develop a Bayesian approach for estimating direct and mediated effects in the context of a dichotomous mediator and dichotomous outcome, which is challenging as many parameters cannot be fully identified. We first define principal strata corresponding to the joint distribution of the observed and counterfactual values of the mediator, and define associate, dissociative, and mediated effects as functions of the differences in the mean outcome under differing treatment assignments within the principal strata. We then develop the likelihood properties and calculate nonparametric bounds of these causal effects assuming randomized treatment assignment. Because likelihood theory is not well developed for nonidentifiable parameters, we consider a Bayesian approach that allows the direct and mediated effects to be expressed in terms of the posterior distribution of the population parameters of interest. This range can be reduced by making further assumptions about the parameters that can be encoded in prior distribution assumptions. We perform sensitivity analyses by using several prior distributions that make weaker assumptions than monotonicity or the exclusion restriction. We consider an application that explores the mediating effects of adult poverty on the relationship between childhood poverty and risk of death.
Direct effect; Mediated effect; Monotonicity; Mortality; Poverty
In longitudinal clinical trials, when outcome variables at later time points are only defined for patients who survive to those times, the evaluation of the causal effect of treatment is complicated. In this paper, we describe an approach that can be used to obtain the causal effect of three treatment arms with ordinal outcomes in the presence of death using a principal stratification approach. We introduce a set of flexible assumptions to identify the causal effect and implement a sensitivity analysis for non-identifiable assumptions which we parameterize parsimoniously. Methods are illustrated on quality of life data from a recent colorectal cancer clinical trial.
Principal stratification; QOL; Ordinal data; Sensitivity analysis
Population stratification can cause spurious associations in population–based association studies. Several statistical methods have been proposed to reduce the impact of population stratification on population–based association studies. We simulated a set of stratified populations based on the real haplotype data from the HapMap ENCODE project, and compared the relative power, type I error rates, accuracy and positive prediction value of four prevailing population–based association study methods: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA) under various population stratification levels. Additionally, we evaluated the effects of sample sizes and frequencies of disease susceptible allele on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance, if sufficient ancestral informative markers are used in SA analysis. GC appeared to be strongly conservative in significantly stratified populations. It may be better to apply GC in the stratified populations with low stratification level. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inference of the results in population-based association studies.
This article links the structural equation modeling (SEM) approach with the principal stratification (PS) approach, both of which have been widely used to study the role of intermediate posttreatment outcomes in randomized experiments. Despite the potential benefit of such integration, the 2 approaches have been developed in parallel with little interaction. This article proposes the cross-model translation (CMT) approach, in which parameter estimates are translated back and forth between the PS and SEM models. First, without involving any particular identifying assumptions, translation between PS and SEM parameters is carried out on the basis of their close conceptual connection. Monte Carlo simulations are used to further clarify the relation between the 2 approaches under particular identifying assumptions. The study concludes that, under the common goal of causal inference, what makes a practical difference is the choice of identifying assumptions, not the modeling framework itself. The CMT approach provides a common ground in which the PS and SEM approaches can be jointly considered, focusing on their common inferential problems.
cross-model translation; mediational process; principal stratification; randomized experiment; structural equation modeling
Traditional transmission disequilibrium test (TDT) based methods for genetic association analyses are robust to population stratification at the cost of a substantial loss of power. We here describe a novel method for family-based association studies that corrects for population stratification with the use of an extension of principal component analysis (PCA). Specifically, we adopt PCA on unrelated parents in each family. We then infer principal components for children from those for their parents through a TDT-like strategy. Two test statistics within variance-components model are proposed for association tests. Simulation results show that the proposed tests have correct type I error rates regardless of population stratification, and have greatly improved power over two popular TDT-based methods: QTDT and FBAT. The application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility of the proposed method.
Family Based Association Tests (FBATs); Transmission Disequilibrium Test (TDT); Principal Component Analysis (PCA); Variance-Components
Background and Purpose
Sequence variants on chromosome 9p21.3 are implicated in coronary artery disease (CAD) and myocardial infarction (MI), but studies in ischemic stroke have produced inconsistent results. We investigated whether these conflicting findings were due to false positive studies confounded by population stratification, or false negative studies that failed to account for effects specific to certain stroke subtypes.
After assessing for population stratification at 9p21.3 using genome-wide data, we meta-analyzed 8 ischemic stroke studies. This analysis focused on two single nucleotide polymorphisms (SNPs), rs1537378 and rs10757278, as these variants are in strong linkage disequilibrium with most SNPs analyzed in prior studies of the region.
Principal component analysis of the genome-wide data showed no evidence of population stratification at that locus. Meta-analysis confirmed that both rs1537378 and rs10757278 are risk factors for ischemic stroke (odds ratios 1.09, [p = 0.0014], and 1.11, [p = 0.001] respectively). Subtype analysis revealed a substantial increase in the effect of each SNP for risk of large artery (LA) stroke, achieving an effect size similar to that seen in CAD/MI.
Variants on 9p21.3 are associated with ischemic stroke, and restriction of analysis to LA stroke increases effect size towards that observed in prior association studies of CAD/MI. Previous inconsistent findings are best explained by this subtype-specificity rather than any unmeasured confounding by population stratification.
Genes; Stroke; Atherosclerosis; Coronary Artery Disease; Meta-Analysis
Understanding the distribution of human genetic variation is an important foundation for research into the genetics of common diseases. Some of the alleles that modify common disease risk are themselves likely to be common and, thus, amenable to identification using gene-association methods. A problem with this approach is that the large sample sizes required for sufficient statistical power to detect alleles with moderate effect make gene-association studies susceptible to false-positive findings as the result of population stratification [1,2]. Such type I errors can be eliminated by using either family-based association tests or methods that sufficiently adjust for population stratification [3-5]. These methods require the availability of genetic markers that can detect and, thus, control for sources of genetic stratification among populations. In an effort to investigate population stratification and identify appropriate marker panels, we have analysed 11,555 single nucleotide polymorphisms in 203 individuals from 12 diverse human populations. Individuals in each population cluster to the exclusion of individuals from other populations using two clustering methods. Higher-order branching and clustering of the populations are consistent with the geographic origins of populations and with previously published genetic analyses. These data provide a valuable resource for the definition of marker panels to detect and control for population stratification in population-based gene identification studies. Using three US resident populations (European-American, African-American and Puerto Rican), we demonstrate how such studies can proceed, quantifying proportional ancestry levels and detecting significant admixture structure in each of these populations.
population genetics; population genomics; human evolution; migration; admixture; population stratification