Results 1-25 (986636)

1.  Principal Stratification in Causal Inference 
Biometrics  2002;58(1):21-29.
Summary
Many scientific problems require that treatment comparisons be adjusted for posttreatment variables, but the estimands underlying standard methods are not causal effects. To address this deficiency, we propose a general framework for comparing treatments adjusting for posttreatment variables that yields principal effects based on principal stratification. Principal stratification with respect to a posttreatment variable is a cross-classification of subjects defined by the joint potential values of that posttreatment variable under each of the treatments being compared. Principal effects are causal effects within a principal stratum. The key property of principal strata is that they are not affected by treatment assignment and therefore can be used just as any pretreatment covariate, such as age category. As a result, the central property of our principal effects is that they are always causal effects and do not suffer from the complications of standard posttreatment-adjusted estimands. We discuss briefly that such principal causal effects are the link between three recent applications with adjustment for posttreatment variables: (i) treatment noncompliance, (ii) missing outcomes (dropout) following treatment noncompliance, and (iii) censoring by death. We then attack the problem of surrogate or biomarker endpoints, where we show, using principal causal effects, that all current definitions of surrogacy, even when perfectly true, do not generally have the desired interpretation as causal effects of treatment on outcome. We go on to formulate estimands based on principal stratification and principal causal effects and show their superiority.
PMCID: PMC4137767  PMID: 11890317
Biomarker; Causal inference; Censoring by death; Missing data; Posttreatment variable; Principal stratification; Quality of life; Rubin causal model; Surrogate
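A toy illustration of the framework (not from the paper; all names and numbers are hypothetical): for a binary posttreatment variable S, the principal strata are defined by the joint potential values (S(0), S(1)), and a principal effect is a treatment contrast within one such stratum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Joint potential values of the posttreatment variable under control and
# treatment define the principal strata (S(0), S(1)).
s0 = rng.binomial(1, 0.3, n)                   # S under control
s1 = np.maximum(s0, rng.binomial(1, 0.4, n))   # S under treatment (monotone here)

# Potential outcomes whose treatment effect varies by (latent) stratum.
stratum = 2 * s0 + s1                          # 0=(0,0), 1=(0,1), 3=(1,1)
y0 = rng.normal(stratum, 1.0)
y1 = y0 + np.where(stratum == 1, 2.0, 0.5)

# A principal effect: the average causal effect within the (0,1) stratum.
mask = (s0 == 0) & (s1 == 1)
print("principal effect in stratum (S(0),S(1))=(0,1):", (y1[mask] - y0[mask]).mean())
```

Because the stratum label depends only on potential values of S, not on the realized assignment, conditioning on it yields a genuine causal effect, which is the key property the abstract describes.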
2.  Statistical identifiability and the surrogate endpoint problem, with application to vaccine trials 
Biometrics  2010;66(4):1153-1161.
Summary
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
doi:10.1111/j.1541-0420.2009.01380.x
PMCID: PMC3597127  PMID: 20105158
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
3.  Accommodating Missingness When Assessing Surrogacy Via Principal Stratification 
Clinical trials (London, England)  2013;10(3):363-377.
Background
When an outcome of interest in a clinical trial is late-occurring or difficult to obtain, surrogate markers can be used to extract information about the effect of the treatment on the outcome of interest. Understanding associations between the causal effect of treatment on the outcome and the causal effect of treatment on the surrogate is critical to understanding the value of a surrogate from a clinical perspective.
Purpose
Traditional regression approaches to determining the proportion of the treatment effect explained by surrogate markers suffer from several shortcomings: they can be unstable and can lie outside the 0–1 range. Further, they do not account for the fact that surrogate measures are obtained post-randomization, and thus the surrogate–outcome relationship may be subject to unmeasured confounding. Methods that avoid these problems are of key importance.
Methods
Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics 2002; 58:21–9 suggested assessing the causal effect of treatment within pre-randomization “principal strata” defined by the counterfactual joint distribution of the surrogate marker under the different treatment arms, with the proportion of the overall outcome causal effect attributable to subjects for whom the treatment affects the proposed surrogate as the key measure of interest. Li Y, Taylor JMG, Elliott MR. Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 2010; 66:523–31 developed this “principal surrogacy” approach for dichotomous markers and outcomes, utilizing Bayesian methods that accommodated non-identifiability in the model parameters. Because the surrogate marker is typically observed well before the final outcome, outcome data are often missing. Here we extend Li, Taylor, and Elliott to accommodate missing data in the observable final outcome under ignorable and non-ignorable settings. We also allow for the possibility that missingness has a counterfactual component, a feature that previous literature has not addressed.
Results
We apply the proposed methods to a trial of glaucoma control comparing surgery versus medication, where intraocular pressure (IOP) control at 12 months is a surrogate for IOP control at 96 months. We also conduct a series of simulations to consider the impacts of non-ignorability, as well as sensitivity to priors and the ability of the Deviance Information Criterion to choose the correct model when parameters are not fully identified.
Limitations
Because model parameters cannot be fully identified from data, informative priors can introduce non-trivial bias in moderate sample size settings, while more non-informative priors can yield wide credible intervals.
Conclusions
Assessing the linkage between causal effects of treatment on a surrogate marker and causal effects of a treatment on an outcome is important to understanding the value of a marker. These causal effects are not fully identifiable: hence we explore the sensitivity and identifiability aspects of these models and show that relatively weak assumptions can still yield meaningful results.
doi:10.1177/1740774513479522
PMCID: PMC4096330  PMID: 23553326
Causal Inference; Surrogate Marker; Bayesian Analysis; Identifiability; Non-response; Counterfactual
4.  Hard, harder, hardest: principal stratification, statistical identifiability, and the inherent difficulty of finding surrogate endpoints 
In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging.
doi:10.1186/1742-7622-11-14
PMCID: PMC4171402  PMID: 25342953
Surrogate endpoint; Principal stratification; Causal inference; Statistical identifiability
5.  Commentary on “Principal Stratification — a Goal or a Tool?” by Judea Pearl 
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
doi:10.2202/1557-4679.1341
PMCID: PMC3204668  PMID: 22049267
principal stratification; causal inference; vaccine trial
6.  Causal Vaccine Effects on Binary Postinfection Outcomes 
The effects of vaccine on postinfection outcomes, such as disease, death, and secondary transmission to others, are important scientific and public health aspects of prophylactic vaccination. As a result, the evaluation of many vaccine effects conditions on being infected. Conditioning on an event that occurs posttreatment (in our case, infection subsequent to assignment to vaccine or control) can result in selection bias. Moreover, because the set of individuals who would become infected if vaccinated is likely not identical to the set of those who would become infected if given control, comparisons that condition on infection do not have a causal interpretation. In this article we consider identifiability and estimation of causal vaccine effects on binary postinfection outcomes. Using the principal stratification framework, we define a postinfection causal vaccine efficacy estimand in individuals who would be infected regardless of treatment assignment. The estimand is shown to be not identifiable under the standard assumptions of the stable unit treatment value, monotonicity, and independence of treatment assignment. Thus selection models are proposed that identify the causal estimand. Closed-form maximum likelihood estimators (MLEs) are then derived under these models, including those assuming maximum possible levels of positive and negative selection bias. These results show the relations between the MLE of the causal estimand and two commonly used estimators for vaccine effects on postinfection outcomes. For example, the usual intent-to-treat estimator is shown to be an upper bound on the postinfection causal vaccine effect provided that the magnitude of protection against infection is not too large. The methods are used to evaluate postinfection vaccine effects in a clinical trial of a rotavirus vaccine candidate and in a field study of a pertussis vaccine. Our results show that pertussis vaccination has a significant causal effect in reducing disease severity.
doi:10.1198/016214505000000970
PMCID: PMC2603579  PMID: 19096723
Causal inference; Infectious disease; Maximum likelihood; Principal stratification; Sensitivity analysis
7.  AN APPLICATION OF PRINCIPAL STRATIFICATION TO CONTROL FOR INSTITUTIONALIZATION AT FOLLOW-UP IN STUDIES OF SUBSTANCE ABUSE TREATMENT PROGRAMS* 
The annals of applied statistics  2008;2(3):1034-1055.
Participants in longitudinal studies on the effects of drug treatment and criminal justice system interventions are at high risk for institutionalization (e.g., spending time in an environment where their freedom to use drugs, commit crimes, or engage in risky behavior may be circumscribed). Methods used for estimating treatment effects in the presence of institutionalization during follow-up can be highly sensitive to assumptions that are unlikely to be met in applications and thus likely to yield misleading inferences. In this paper, we consider the use of principal stratification to control for institutionalization at follow-up. Principal stratification has been suggested for similar problems where outcomes are unobservable for samples of study participants because of dropout, death, or other forms of censoring. The method identifies principal strata within which causal effects are well defined and potentially estimable. We extend the method of principal stratification to model institutionalization at follow-up and estimate the effect of residential substance abuse treatment versus outpatient services in a large-scale study of adolescent substance abuse treatment programs. Additionally, we discuss practical issues in applying the principal stratification model to data. We show via simulation studies that the model can recover true effects only when the data meet strenuous demands, and that caution must be taken when implementing principal stratification as a technique to control for post-treatment confounders such as institutionalization.
doi:10.1214/08-AOAS179
PMCID: PMC2749670  PMID: 19779599
Principal Stratification; Post-Treatment Confounder; Institutionalization; Causal Inference
8.  Clarifying the Role of Principal Stratification in the Paired Availability Design 
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
doi:10.2202/1557-4679.1338
PMCID: PMC3114955  PMID: 21686085
principal stratification; causal inference; paired availability design
9.  Principal Stratification — Uses and Limitations 
Pearl (2011) asked for the causal inference community to clarify the role of the principal stratification framework in the analysis of causal effects. Here, I argue that the notion of principal stratification has shed light on problems of non-compliance, censoring-by-death, and the analysis of post-infection outcomes; that it may be of use in considering problems of surrogacy but further development is needed; that it is of some use in assessing “direct effects”; but that it is not the appropriate tool for assessing “mediation.” There is nothing within the principal stratification framework that corresponds to a measure of an “indirect” or “mediated” effect.
doi:10.2202/1557-4679.1329
PMCID: PMC3154088  PMID: 21841939
causal inference; mediation; non-compliance; potential outcomes; principal stratification; surrogates
10.  Estimating Causal Effects in Trials Involving Multi-Treatment Arms Subject to Non-compliance: A Bayesian framework 
Summary
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
doi:10.1111/j.1467-9876.2009.00709.x
PMCID: PMC3104736  PMID: 21637737
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
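For intuition, the two-arm special case reduces to the familiar moment (Wald) estimator: the ITT effect on the outcome divided by the ITT effect on receipt. A minimal sketch with simulated data (illustrative only; the paper's setting is three arms with a Bayesian procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Latent compliance type: 1 = complier, 0 = never-taker (monotonicity: no defiers).
complier = rng.binomial(1, 0.6, n)
z = rng.binomial(1, 0.5, n)        # randomized assignment
d = z * complier                   # treatment actually received
y = d + rng.normal(0, 1, n)        # effect of receipt is 1.0, so true CACE = 1

# Wald estimator: ITT effect on Y divided by ITT effect on D.
itt_y = y[z == 1].mean() - y[z == 0].mean()
itt_d = d[z == 1].mean() - d[z == 0].mean()
print("estimated CACE:", itt_y / itt_d)
```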
11.  Accounting for Population Stratification in DNA Methylation Studies 
Genetic epidemiology  2014;38(3):231-241.
DNA methylation is an important epigenetic mechanism that has been linked to complex disease and is of great interest to researchers as a potential link between genome, environment, and disease. As the scale of DNA methylation association studies approaches that of genome-wide association studies (GWAS), issues such as population stratification will need to be addressed. It is well documented that failure to adjust for population stratification can lead to false positives in genetic association studies, but population stratification is often unaccounted for in DNA methylation studies. Here, we propose several approaches to correct for population stratification using principal components from different subsets of genome-wide methylation data. We first illustrate the potential for confounding due to population stratification by demonstrating widespread associations between DNA methylation and race in 388 individuals (365 African American and 23 Caucasian). We subsequently evaluate the performance of our principal-components approaches and other methods in adjusting for confounding due to population stratification. Our simulations show that 1) all of the methods considered are effective at removing inflation due to population stratification, and 2) maximum power can be obtained with SNP-based principal components, followed by methylation-based principal components, which outperform both surrogate variable analysis and genomic control. Among our different approaches to computing methylation-based principal components, we find that principal components based on CpG sites chosen for their potential to proxy nearby SNPs can provide a powerful and computationally efficient approach to adjustment for population stratification in DNA methylation studies when genome-wide SNP data are unavailable.
doi:10.1002/gepi.21789
PMCID: PMC4090102  PMID: 24478250
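A minimal sketch of the core adjustment idea, regressing the phenotype on a CpG site plus top principal components of a simulated methylation matrix; the SNP-based and proxy-CpG variants in the paper differ only in which matrix the components are computed from. Assumes numpy and statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 500, 1000

# Simulated methylation matrix with a crude two-population structure.
pop = rng.binomial(1, 0.5, n)
meth = rng.normal(0, 1, (n, p)) + 0.5 * pop[:, None] * rng.normal(0, 1, p)

# Top principal components of the methylation matrix capture the stratification.
meth_c = meth - meth.mean(axis=0)
u, s, _ = np.linalg.svd(meth_c, full_matrices=False)
pcs = u[:, :10] * s[:10]

# Phenotype confounded by population; test one CpG site with PC adjustment.
pheno = pop + rng.normal(0, 1, n)
x = sm.add_constant(np.column_stack([meth[:, 0], pcs]))
print("adjusted p-value for the CpG site:", sm.OLS(pheno, x).fit().pvalues[1])
```

Without the PC columns, the site would show a spurious association driven entirely by the population structure.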
12.  ASSESSING SURROGATE ENDPOINTS IN VACCINE TRIALS WITH CASE-COHORT SAMPLING AND THE COX MODEL1 
The annals of applied statistics  2008;2(1):386-407.
Assessing immune responses to study vaccines as surrogates of protection plays a central role in vaccine clinical trials. Motivated by three ongoing or pending HIV vaccine efficacy trials, we consider such surrogate endpoint assessment in a randomized placebo-controlled trial with case-cohort sampling of immune responses and a time to event endpoint. Based on the principal surrogate definition under the principal stratification framework proposed by Frangakis and Rubin [Biometrics 58 (2002) 21–29] and adapted by Gilbert and Hudgens (2006), we introduce estimands that measure the value of an immune response as a surrogate of protection in the context of the Cox proportional hazards model. The estimands are not identified because the immune response to vaccine is not measured in placebo recipients. We formulate the problem as a Cox model with missing covariates, and employ novel trial designs for predicting the missing immune responses and thereby identifying the estimands. The first design utilizes information from baseline predictors of the immune response, and bridges their relationship in the vaccine recipients to the placebo recipients. The second design provides a validation set for the unmeasured immune responses of uninfected placebo recipients by immunizing them with the study vaccine after trial closeout. A maximum estimated likelihood approach is proposed for estimation of the parameters. Simulated data examples are given to evaluate the proposed designs and study their properties.
doi:10.1214/07-AOAS132
PMCID: PMC2601643  PMID: 19079758
Clinical trial; discrete failure time model; missing data; potential outcomes; principal stratification; surrogate marker
13.  Estimation of dynamical model parameters taking into account undetectable marker values 
Background
Mathematical models are widely used for studying the dynamics of infectious agents such as hepatitis C virus (HCV). Most often, model parameters are estimated using standard least-squares procedures for each individual. Hierarchical models have been proposed in such applications. However, another issue is the left-censoring (undetectable values) of plasma viral load due to the lack of sensitivity of the assays used for quantification. A method is proposed to take left-censored values into account when estimating parameters of nonlinear mixed models, and its impact is demonstrated through a simulation study and an actual clinical trial of anti-HCV drugs.
Methods
The method consists of a full likelihood approach distinguishing the contributions of observed and left-censored measurements, assuming a lognormal distribution of the outcome. Parameters of the analytical solution of the system of differential equations, taking left-censoring into account, are estimated using standard software.
Results
A simulation study in which only 14% of measurements were left-censored showed that, when left-censored viral load values were replaced by the value of the detection threshold, model parameters were largely biased (from -55% to +133% depending on the parameter), with the exception of the estimate of the initial outcome value. When left-censoring was taken into account, the relative bias on fixed effects was 2% or less. Parameters were then estimated using the 100 measurements of HCV RNA available (with 12% of left-censored values) during the first 4 weeks following treatment initiation in the 17 patients included in the trial. Differences between estimates according to the method used were clinically significant, particularly for the death rate of infected cells: with the crude approach the estimate was 0.13 day⁻¹ (95% confidence interval [CI]: 0.11; 0.17), compared with 0.19 day⁻¹ (CI: 0.14; 0.26) when taking left-censoring into account. The relative differences between estimates of individual treatment efficacy according to the method used varied from 0.001% to 37%.
Conclusion
We proposed a method that gives unbiased estimates if the assumed distribution is correct (e.g. lognormal) and that is easy to use with standard software.
doi:10.1186/1471-2288-6-38
PMCID: PMC1559636  PMID: 16879756
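The likelihood decomposition described above is straightforward to implement: detected measurements contribute a log-density term, and left-censored ones contribute the log-probability of falling below the detection limit. A minimal sketch on the log scale with hypothetical parameter values, assuming scipy:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(3)
mu_true, sd_true, lod = 2.0, 1.0, 1.2   # log-scale parameters and detection limit

y = rng.normal(mu_true, sd_true, 300)   # latent log viral load
detected = y > lod                      # values at or below lod are unobserved

def neg_loglik(theta):
    mu, log_sd = theta
    sd = np.exp(log_sd)
    # Observed values contribute the density; left-censored values contribute
    # the probability of lying below the detection limit.
    ll = norm.logpdf(y[detected], mu, sd).sum()
    ll += norm.logcdf(lod, mu, sd) * (~detected).sum()
    return -ll

fit = minimize(neg_loglik, x0=[0.0, 0.0])
print("estimated mu, sd:", fit.x[0], np.exp(fit.x[1]))
```

Replacing censored values with the threshold instead would pull the estimated mean upward, mirroring the bias the simulation study reports.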
14.  Association analyses of the MAS-QTL data set using grammar, principal components and Bayesian network methodologies 
BMC Proceedings  2011;5(Suppl 3):S8.
Background
It has been shown that if genetic relationships among individuals are not taken into account in genome-wide association studies, this may lead to false positives. To address this problem, we used Genome-wide Rapid Association using Mixed Model and Regression (GRAMMAR) and principal component stratification analyses. To account for linkage disequilibrium among the significant markers, principal component loadings obtained from top markers can be included as covariates. Estimation of Bayesian networks may also be useful to investigate linkage disequilibrium among SNPs and their relation with environmental variables.
Methods
For the quantitative trait we first estimated residuals while taking polygenic effects into account. We then used a single-SNP approach to detect the most significant SNPs based on the residuals and applied principal component regression to take linkage disequilibrium among these SNPs into account. For the categorical trait we used the principal component stratification methodology to account for background effects. For correction of linkage disequilibrium we used principal component logit regression. Bayesian networks were estimated to investigate relationships among SNPs.
Results
Using the GRAMMAR and principal component stratification approach we detected around 100 significant SNPs for the quantitative trait (p<0.05 with 1000 permutations) and 109 significant SNPs (p<0.0006 with local FDR correction) for the categorical trait. With additional principal component regression we reduced the list to 16 and 50 SNPs for the quantitative and categorical trait, respectively.
Conclusions
GRAMMAR could efficiently incorporate the information regarding random genetic effects. Principal component stratification should be cautiously used with stringent multiple hypothesis testing correction to correct for ancestral stratification and association analyses for binary traits when there are systematic genetic effects such as half sib family structures. Bayesian networks are useful to investigate relationships among SNPs and environmental variables.
doi:10.1186/1753-6561-5-S3-S8
PMCID: PMC3103207  PMID: 21624178
15.  Partially hidden Markov model for time-varying principal stratification in HIV prevention trials 
It is frequently of interest to estimate the intervention effect that adjusts for post-randomization variables in clinical trials. In the recently completed HPTN 035 trial, there is differential condom use between the three microbicide gel arms and the No Gel control arm, so that intention to treat (ITT) analyses only assess the net treatment effect that includes the indirect treatment effect mediated through differential condom use. Various statistical methods in causal inference have been developed to adjust for post-randomization variables. We extend the principal stratification framework to time-varying behavioral variables in HIV prevention trials with a time-to-event endpoint, using a partially hidden Markov model (pHMM). We formulate the causal estimand of interest, establish assumptions that enable identifiability of the causal parameters, and develop maximum likelihood methods for estimation. Application of our model on the HPTN 035 trial reveals an interesting pattern of prevention effectiveness among different condom-use principal strata.
doi:10.1080/01621459.2011.643743
PMCID: PMC3649016  PMID: 23667279
microbicide; causal inference; posttreatment variables; direct effect
16.  Risk-Stratified Imputation in Survival Analysis 
Clinical trials (London, England)  2013;10(4):530-539.
Background
Censoring that is dependent on covariates associated with survival can arise in randomized trials due to changes in recruitment and eligibility criteria to minimize withdrawals, potentially leading to biased treatment effect estimates. Imputation approaches have been proposed to address censoring in survival analysis, and while these approaches may provide unbiased estimates of treatment effects, imputation of a large number of outcomes may over- or underestimate the associated variance, depending on the imputation pool selected.
Purpose
We propose an improved method, risk-stratified imputation, as an alternative to address withdrawal related to the risk of events in the context of time-to-event analyses.
Methods
Our algorithm performs imputation from a pool of replacement subjects with similar values of both treatment and covariate(s) of interest, that is, from a risk-stratified sample. This stratification prior to imputation addresses the requirement of time-to-event analysis that censored observations are representative of all other observations in the risk group with similar exposure variables. We compared our risk-stratified imputation to case deletion and bootstrap imputation in a simulated dataset in which the covariate of interest (study withdrawal) was related to treatment. A motivating example from a recent clinical trial is also presented to demonstrate the utility of our method.
Results
In our simulations, risk-stratified imputation gives estimates of treatment effect comparable to bootstrap and auxiliary variable imputation while avoiding inaccuracies of the latter two in estimating the associated variance. Similar results were obtained in analysis of clinical trial data.
Limitations
Risk-stratified imputation has little advantage over other imputation methods when covariates of interest are not related to treatment, although its performance is superior when covariates are related to treatment. Risk-stratified imputation is intended for categorical covariates, and may be sensitive to the width of the matching window if continuous covariates are used.
Conclusions
The use of the risk-stratified imputation should facilitate the analysis of many clinical trials, in which one group has a higher withdrawal rate that is related to treatment.
doi:10.1177/1740774513493150
PMCID: PMC3807795  PMID: 23818434
Censoring; Survival; Imputation; Randomized Trials; CREST; Time to Event
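A rough sketch of the central step as described: draw replacement event times for withdrawn subjects from uncensored subjects in the same treatment-by-covariate stratum who were still at risk at the withdrawal time. Column names and the donor rule are illustrative, not the authors' implementation; assumes pandas:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 2000
df = pd.DataFrame({
    "treat": rng.binomial(1, 0.5, n),
    "risk": rng.binomial(1, 0.4, n),    # covariate related to withdrawal
})
df["time"] = rng.exponential(10.0 / (1 + df["treat"] + df["risk"]))
df["event"] = rng.binomial(1, 0.8, n)   # 0 = withdrawn (censored)

def risk_stratified_impute(df, rng):
    out = df.copy()
    for _, g in df.groupby(["treat", "risk"]):          # risk strata
        for i in g.index[g["event"] == 0]:
            # Donors: uncensored subjects in the same stratum still at risk
            # at this subject's censoring time.
            donors = g[(g["event"] == 1) & (g["time"] > g.loc[i, "time"])]
            if len(donors):
                out.loc[i, "time"] = donors["time"].sample(1, random_state=rng).iloc[0]
                out.loc[i, "event"] = 1
    return out

imputed = risk_stratified_impute(df, rng)
print("event fraction before/after:", df["event"].mean(), imputed["event"].mean())
```

Stratifying before imputation is what keeps the replaced observations representative of other subjects at risk with similar exposure, per the abstract.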
17.  Cereal Domestication and Evolution of Branching: Evidence for Soft Selection in the Tb1 Orthologue of Pearl Millet (Pennisetum glaucum [L.] R. Br.) 
PLoS ONE  2011;6(7):e22404.
Background
During the Neolithic revolution, early farmers altered plant development to domesticate crops. Similar traits were often selected independently in different wild species; yet the genetic basis of this parallel phenotypic evolution remains elusive. Plant architecture ranks among these target traits composing the domestication syndrome. We focused on the reduction of branching which occurred in several cereals, an adaptation known to rely on the major gene Teosinte-branched1 (Tb1) in maize. We investigate the role of the Tb1 orthologue (Pgtb1) in the domestication of pearl millet (Pennisetum glaucum), an African outcrossing cereal.
Methodology/Principal Findings
Gene cloning, expression profiling, QTL mapping and molecular evolution analysis were combined in a comparative approach between pearl millet and maize. Our results in pearl millet support a role for PgTb1 in domestication despite important differences in the genetic basis of branching adaptation in that species compared to maize (e.g., weaker effects of PgTb1). Genetic maps suggest this pattern is consistent in other cereals with reduced branching (e.g., sorghum, foxtail millet). Moreover, although the adaptive sites underlying domestication were not formally identified, signatures of selection pointed to putative regulatory regions upstream of both Tb1 orthologues in maize and pearl millet. However, the signature of human selection at Tb1 is much weaker in pearl millet than in maize.
Conclusions/Significance
Our results suggest that some level of parallel evolution involved at least the regions directly upstream of Tb1 in the domestication of pearl millet and maize. This was unanticipated given the multigenic basis of domestication traits and the divergence of the wild progenitor species for over 30 million years prior to human selection. We also hypothesized that regular introgression of genes from the wild gene pool into domesticated pearl millet could explain why the selective sweep is softer in pearl millet than in maize.
doi:10.1371/journal.pone.0022404
PMCID: PMC3142148  PMID: 21799845
18.  Mediation Analysis with Principal Stratification 
Statistics in medicine  2009;28(7):1108-1130.
In assessing the mechanism of treatment efficacy in randomized clinical trials, investigators often perform mediation analyses by analyzing whether the significant intent-to-treat treatment effect on outcome occurs through or around a third intermediate or mediating variable: indirect and direct effects, respectively. Standard mediation analyses assume sequential ignorability, i.e., that conditional on covariates the intermediate or mediating factor is randomly assigned, as is the treatment in a randomized clinical trial. This research focuses on the application of the principal stratification approach for estimating the direct effect of a randomized treatment without the standard sequential ignorability assumption. This approach is used to estimate the direct effect of treatment as a difference between expectations of potential outcomes within latent subgroups of participants for whom the intermediate variable behavior would be constant, regardless of the randomized treatment assignment. Using a Bayesian estimation procedure, we also assess the sensitivity of results based on the principal stratification approach to heterogeneity of the variances among these principal strata. We assess this approach with simulations and apply it to two psychiatric examples. Both the examples and the simulations indicated robustness of our findings to the homogeneous variance assumption. However, the simulations showed that the magnitude of treatment effects derived under the principal stratification approach was sensitive to model mis-specification.
doi:10.1002/sim.3533
PMCID: PMC2669107  PMID: 19184975
Principal stratification; mediating variables; direct effects; principal strata probabilities; heterogeneous variances
19.  Evaluating Candidate Principal Surrogate Endpoints 
Biometrics  2008;64(4):1146-1154.
Summary
Frangakis and Rubin (2002, Biometrics 58, 21–29) proposed a new definition of a surrogate endpoint (a “principal” surrogate) based on causal effects. We introduce an estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint. Although the CEP surface is not identifiable due to missing potential outcomes, it can be identified by incorporating a baseline covariate(s) that predicts the biomarker. Given case–cohort sampling of such a baseline predictor and the biomarker in a large blinded randomized clinical trial, we develop an estimated likelihood method for estimating the CEP surface. This estimation assesses the “surrogate value” of the biomarker for reliably predicting clinical treatment effects for the same or similar setting as the trial. A CEP surface plot provides a way to compare the surrogate value of multiple biomarkers. The approach is illustrated by the problem of assessing an immune response to a vaccine as a surrogate endpoint for infection.
doi:10.1111/j.1541-0420.2008.01014.x
PMCID: PMC2726718  PMID: 18363776
Case cohort; Causal inference; Clinical trial; HIV vaccine; Postrandomization selection bias; Structural model; Prentice criteria; Principal stratification
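A purely illustrative simulation of what the CEP surface measures, E[Y(1) − Y(0) | S(1) = s1, S(0) = s0]; it requires both potential biomarker values, which is exactly why identification needs the baseline-predictor device described above. All numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Potential biomarker values under placebo and vaccine (discretized).
s0 = rng.integers(0, 3, n)
s1 = np.minimum(s0 + rng.integers(0, 3, n), 4)

# Potential infection outcomes: risk drops with the biomarker treatment effect.
y0 = rng.binomial(1, 0.10, n)
y1 = rng.binomial(1, 0.10 * np.exp(-0.5 * (s1 - s0)), n)

# Empirical CEP surface on the (s1, s0) grid.
for a in range(5):
    for b in range(3):
        m = (s1 == a) & (s0 == b)
        if m.sum() > 500:
            print(f"CEP(s1={a}, s0={b}) = {(y1[m] - y0[m]).mean():+.4f}")
```

A biomarker with surrogate value shows CEP near zero where s1 = s0 and increasingly beneficial effects as s1 − s0 grows, which is the pattern this simulation builds in.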
20.  Estimation of colorectal adenoma recurrence with dependent censoring 
Background
Due to early colonoscopy for some participants, interval-censored observations can be introduced into the data of a colorectal polyp prevention trial. The censoring could be dependent on the risk of recurrence if the reasons for having early colonoscopy are associated with recurrence. This can complicate estimation of the recurrence rate.
Methods
We propose to use midpoint imputation to convert interval-censored data problems to right-censored data problems. To adjust for potential dependent censoring, we use information from auxiliary variables to define risk groups in which to apply weighted Kaplan-Meier estimation to the midpoint-imputed data. The risk groups are defined using two risk scores derived from two working proportional hazards models with the auxiliary variables as covariates: one for the recurrence time and the other for the censoring time. The method described here is explored by simulation and illustrated with an example from a colorectal polyp prevention trial.
Results
We first show that midpoint imputation under an assumption of independent censoring produces an unbiased estimate of the recurrence rate at the end of the trial, which is often the main interest of a colorectal polyp prevention trial. We then show in simulations that, compared to conventional methods, the weighted Kaplan-Meier method applied to the midpoint-imputed data using information from auxiliary variables can improve efficiency under independent censoring and reduce bias under dependent censoring, while estimating the recurrence rate at the end of the trial.
Conclusion
The research in this paper uses midpoint imputation to handle interval-censored observations and then uses information from auxiliary variables to adjust for dependent censoring by incorporating it into the weighted Kaplan-Meier estimation. This approach can handle multiple auxiliary variables by deriving two risk scores from two working proportional hazards models. Although the idea of this approach might appear simple, the results do show that the weighted Kaplan-Meier approach can gain efficiency and reduce bias due to dependent censoring.
doi:10.1186/1471-2288-9-66
PMCID: PMC2760573  PMID: 19788750
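A small sketch of the first step with hypothetical yearly exams: a recurrence detected at an exam yields the interval (last clear exam, detecting exam], and midpoint imputation turns the data into a right-censored set that ordinary (or weighted) Kaplan-Meier machinery can handle:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000

# True recurrence times, observed only through yearly exams up to year 10.
t = rng.exponential(5.0, n)
left = np.floor(t).clip(0, 9)                             # last clear exam
right = np.where(t < 10, np.ceil(t).clip(1, 10), np.inf)  # detecting exam

# Midpoint imputation: interval-censored -> right-censored.
event = np.isfinite(right)
time = np.where(event, (left + right) / 2.0, 10.0)

# Kaplan-Meier on the imputed data; the weighted version would multiply each
# subject's contribution by its auxiliary-variable risk-group weight.
order = np.argsort(time)
time, event = time[order], event[order]
surv = np.cumprod(1.0 - event / (n - np.arange(n)))
print("estimated recurrence-free proportion at end of trial:", surv[-1])
```

Comparing against the true value exp(-10/5) ≈ 0.135 illustrates that the end-of-trial estimate is roughly unbiased under independent censoring, as the Results section states.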
21.  Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data 
Bioinformatics  2014;30(13):1867-1875.
Motivation: High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data.
Results: We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA.
Availability and implementation: The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm.
Contact: fbuettner.phys@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btu134
PMCID: PMC4071202  PMID: 24618470
22.  Multiple approaches to assessing the effects of delays for hip fracture patients in the United States and Canada. 
Health Services Research  2000;34(7):1499-1518.
Objective
To examine the determinants of postsurgery length of stay (LOS) and inpatient mortality in the United States (California and Massachusetts) and Canada (Manitoba and Quebec).
Data Sources/Study Setting
Patient discharge abstracts from the Agency for Health Care Policy and Research Nationwide Inpatient Sample and from provincial health ministries.
Study Design
Descriptive statistics by state or province, pooled competing risks hazards models (which control for censoring of LOS and inpatient mortality data), and instrumental variables (which control for confounding in observational data) were used to analyze the effect of wait time for hip fracture surgery on postsurgery outcomes.
Data Extractions
Data were extracted for patients admitted to an acute care hospital with a primary diagnosis of hip fracture who received hip fracture surgery, were admitted from home or the emergency room, were age 45 or older, stayed in the hospital 365 days or less, and were not trauma patients.
Principal Findings
The descriptive data indicate that wait times for surgery are longer in the two Canadian provinces than in the two U.S. states. Canadians also have longer postsurgery LOS and higher inpatient mortality. Yet the competing risks hazards model indicates that the effect of wait time on postsurgery LOS is small in magnitude. Instrumental variables analysis reveals that wait time for surgery is not a significant predictor of postsurgery length of stay. The hazards model reveals significant differences in mortality across regions. However, both the regressions and the instrumental variables indicate that these differences are not attributable to wait time for surgery.
Conclusions
Statistical models that account for censoring and confounding yield conclusions that differ from those implied by descriptive statistics in administrative data. Longer wait time for hip fracture surgery does not explain the difference in postsurgery outcomes across countries.
PMCID: PMC1975661  PMID: 10737450
23.  Predicting treatment effect from surrogate endpoints and historical trials: an extrapolation involving probabilities of a binary outcome or survival to a specific time 
Biometrics  2011;68(1):248-257.
SUMMARY
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated from successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival to a particular time that is computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted instead of true result for the estimated treatment effect, we compute its multiplier of standard error. Software is available for download.
doi:10.1111/j.1541-0420.2011.01646.x
PMCID: PMC3218246  PMID: 21838732
Randomized trials; Reproducibility; Principal stratification
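The leave-one-trial-out device in (2) is easy to sketch: refit the surrogate-to-true-effect prediction on all historical trials but one, record the held-out prediction error, and use the spread of those errors as the extrapolation error. Illustrative numbers only, with a simple linear prediction model standing in for the mixture/linear/principal-stratification options:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 12                                    # historical trials

# Per-trial estimated treatment effects on surrogate (x) and true endpoint (y).
x = rng.normal(0.3, 0.15, k)
y = 1.5 * x + rng.normal(0, 0.05, k)

# Leave-one-trial-out extrapolation errors.
errors = []
for i in range(k):
    keep = np.arange(k) != i
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    errors.append(y[i] - (slope * x[i] + intercept))
extrap_sd = np.std(errors, ddof=1)

# Predict the target trial's true-endpoint effect from its surrogate effect.
slope, intercept = np.polyfit(x, y, 1)
x_new = 0.25
print(f"predicted effect {slope * x_new + intercept:.3f}, "
      f"extrapolation SE ~ {extrap_sd:.3f}")
```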
24.  Using Cure Models and Multiple Imputation to Utilize Recurrence as an Auxiliary Variable for Overall Survival 
Background
Intermediate outcome variables can often be used as auxiliary variables for the true outcome of interest in randomized clinical trials. For many cancers, time to recurrence is an informative marker in predicting a patient’s overall survival outcome, and could provide auxiliary information for the analysis of survival times.
Purpose
To investigate whether models linking recurrence and death combined with a multiple imputation procedure for censored observations can result in efficiency gains in the estimation of treatment effects, and be used to shorten trial lengths.
Methods
Recurrence and death times are modeled using data from 12 trials in colorectal cancer. Multiple imputation is used as a strategy for handling missing values arising from censoring. The imputation procedure uses a cure model for time to recurrence and a time-dependent Weibull proportional hazards model for time to death. Recurrence times are imputed, and then death times are imputed conditionally on recurrence times. To illustrate these methods, trials are artificially censored 2 years after the last accrual, the imputation procedure is implemented, and a log-rank test and Cox model are used to analyze and compare these new data with the original data.
Results
The results show modest but consistent gains in efficiency in the analysis from using the auxiliary information in recurrence times. Comparison of the analyses shows the treatment effect estimates and log-rank test results from the 2-year censored imputed data to be in between the estimates from the original data and the artificially censored data, indicating that the procedure was able to recover some of the information lost to censoring.
Limitations
The models used are all fully parametric, requiring distributional assumptions of the data.
Conclusions
The proposed models may be useful for improving the efficiency of treatment effect estimation in cancer trials and for shortening trial length.
doi:10.1177/1740774511414741
PMCID: PMC3197975  PMID: 21921063
Auxiliary Variables; Colon Cancer; Cure Models; Multiple Imputation; Surrogate Endpoints
25.  Limitation of Inverse Probability-of-Censoring Weights in Estimating Survival in the Presence of Strong Selection Bias 
American Journal of Epidemiology  2011;173(5):569-577.
In time-to-event analyses, artificial censoring with correction for induced selection bias using inverse probability-of-censoring weights can be used to 1) examine the natural history of a disease after effective interventions are widely available, 2) correct bias due to noncompliance with fixed or dynamic treatment regimens, and 3) estimate survival in the presence of competing risks. Artificial censoring entails censoring participants when they meet a predefined study criterion, such as exposure to an intervention, failure to comply, or the occurrence of a competing outcome. Inverse probability-of-censoring weights use measured common predictors of the artificial censoring mechanism and the outcome of interest to determine what the survival experience of the artificially censored participants would be had they never been exposed to the intervention, complied with their treatment regimen, or not developed the competing outcome. Even if all common predictors are appropriately measured and taken into account, in the context of small sample size and strong selection bias, inverse probability-of-censoring weights could fail because of violations in assumptions necessary to correct selection bias. The authors used an example from the Multicenter AIDS Cohort Study, 1984–2008, regarding estimation of long-term acquired immunodeficiency syndrome-free survival to demonstrate the impact of violations in necessary assumptions. Approaches to improve correction methods are discussed.
doi:10.1093/aje/kwq385
PMCID: PMC3105434  PMID: 21289029
epidemiologic methods; selection bias; survival analysis
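A compact illustration of the weighting construction itself, assuming the censoring model is correct (here exponential within levels of a single common predictor x). With small samples and strong selection bias the weights 1/G blow up, which is precisely the failure mode the article demonstrates:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
x = rng.binomial(1, 0.5, n)                             # common predictor of T and C
t_event = rng.exponential(np.where(x == 1, 4.0, 10.0))
t_cens = rng.exponential(np.where(x == 1, 3.0, 30.0))   # censoring depends on x
time = np.minimum(t_event, t_cens)
delta = t_event <= t_cens                               # event observed

# Censoring distribution G(t | x) = P(C > t | x): exponential MLE per x group.
rate = np.array([(~delta & (x == g)).sum() / time[x == g].sum() for g in (0, 1)])
G = np.exp(-rate[x] * time)

# IPCW estimate of F(t0) = P(T <= t0): observed events weighted by 1 / G(T | x).
t0 = 5.0
naive = np.mean(delta & (time <= t0))      # ignores censoring entirely
ipcw = np.mean(delta * (time <= t0) / G)
true = np.mean(1 - np.exp(-t0 / np.where(x == 1, 4.0, 10.0)))
print(f"naive {naive:.3f}   ipcw {ipcw:.3f}   true {true:.3f}")
```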
