This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
principal stratification; causal inference; vaccine trial
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
Pearl (2011) asked for the causal inference community to clarify the role of the principal stratification framework in the analysis of causal effects. Here, I argue that the notion of principal stratification has shed light on problems of non-compliance, censoring-by-death, and the analysis of post-infection outcomes; that it may be of use in considering problems of surrogacy but further development is needed; that it is of some use in assessing “direct effects”; but that it is not the appropriate tool for assessing “mediation.” There is nothing within the principal stratification framework that corresponds to a measure of an “indirect” or “mediated” effect.
causal inference; mediation; non-compliance; potential outcomes; principal stratification; surrogates
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
principal stratification; causal inference; paired availability design
Participants in longitudinal studies on the effects of drug treatment and criminal justice system interventions are at high risk for institutionalization (e.g., spending time in an environment where their freedom to use drugs, commit crimes, or engage in risky behavior may be circumscribed). Methods used for estimating treatment effects in the presence of institutionalization during follow-up can be highly sensitive to assumptions that are unlikely to be met in applications and thus likely to yield misleading inferences. In this paper, we consider the use of principal stratification to control for institutionalization at follow-up. Principal stratification has been suggested for similar problems where outcomes are unobservable for samples of study participants because of dropout, death, or other forms of censoring. The method identifies principal strata within which causal effects are well defined and potentially estimable. We extend the method of principal stratification to model institutionalization at follow-up and estimate the effect of residential substance abuse treatment versus outpatient services in a large scale study of adolescent substance abuse treatment programs. Additionally, we discuss practical issues in applying the principal stratification model to data. We show via simulation studies that the model can only recover true effects provided the data meet strenuous demands and that there must be caution taken when implementing principal stratification as a technique to control for post-treatment confounders such as institutionalization.
Principal Stratification; Post-Treatment Confounder; Institutionalization; Causal Inference
The effects of vaccine on postinfection outcomes, such as disease, death, and secondary transmission to others, are important scientific and public health aspects of prophylactic vaccination. As a result, evaluation of many vaccine effects condition on being infected. Conditioning on an event that occurs posttreatment (in our case, infection subsequent to assignment to vaccine or control) can result in selection bias. Moreover, because the set of individuals who would become infected if vaccinated is likely not identical to the set of those who would become infected if given control, comparisons that condition on infection do not have a causal interpretation. In this article we consider identifiability and estimation of causal vaccine effects on binary postinfection outcomes. Using the principal stratification framework, we define a postinfection causal vaccine efficacy estimand in individuals who would be infected regardless of treatment assignment. The estimand is shown to be not identifiable under the standard assumptions of the stable unit treatment value, monotonicity, and independence of treatment assignment. Thus selection models are proposed that identify the causal estimand. Closed-form maximum likelihood estimators (MLEs) are then derived under these models, including those assuming maximum possible levels of positive and negative selection bias. These results show the relations between the MLE of the causal estimand and two commonly used estimators for vaccine effects on postinfection outcomes. For example, the usual intent-to-treat estimator is shown to be an upper bound on the postinfection causal vaccine effect provided that the magnitude of protection against infection is not too large. The methods are used to evaluate postinfection vaccine effects in a clinical trial of a rotavirus vaccine candidate and in a field study of a pertussis vaccine. Our results show that pertussis vaccination has a significant causal effect in reducing disease severity.
Causal inference; Infectious disease; Maximum likelihood; Principal stratification; Sensitivity analysis
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
It is frequently of interest to estimate the intervention effect that adjusts for post-randomization variables in clinical trials. In the recently completed HPTN 035 trial, there is differential condom use between the three microbicide gel arms and the No Gel control arm, so that intention to treat (ITT) analyses only assess the net treatment effect that includes the indirect treatment effect mediated through differential condom use. Various statistical methods in causal inference have been developed to adjust for post-randomization variables. We extend the principal stratification framework to time-varying behavioral variables in HIV prevention trials with a time-to-event endpoint, using a partially hidden Markov model (pHMM). We formulate the causal estimand of interest, establish assumptions that enable identifiability of the causal parameters, and develop maximum likelihood methods for estimation. Application of our model on the HPTN 035 trial reveals an interesting pattern of prevention effectiveness among different condom-use principal strata.
microbicide; causal inference; posttreatment variables; direct effect
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated from successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival to a particular time that is computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted instead of true result for the estimated treatment effect, we compute its multiplier of standard error. Software is available for download.
Randomized trials; Reproducibility; Principal stratification
Assessing immune responses to study vaccines as surrogates of protection plays a central role in vaccine clinical trials. Motivated by three ongoing or pending HIV vaccine efficacy trials, we consider such surrogate endpoint assessment in a randomized placebo-controlled trial with case-cohort sampling of immune responses and a time to event endpoint. Based on the principal surrogate definition under the principal stratification framework proposed by Frangakis and Rubin [Biometrics 58 (2002) 21–29] and adapted by Gilbert and Hudgens (2006), we introduce estimands that measure the value of an immune response as a surrogate of protection in the context of the Cox proportional hazards model. The estimands are not identified because the immune response to vaccine is not measured in placebo recipients. We formulate the problem as a Cox model with missing covariates, and employ novel trial designs for predicting the missing immune responses and thereby identifying the estimands. The first design utilizes information from baseline predictors of the immune response, and bridges their relationship in the vaccine recipients to the placebo recipients. The second design provides a validation set for the unmeasured immune responses of uninfected placebo recipients by immunizing them with the study vaccine after trial closeout. A maximum estimated likelihood approach is proposed for estimation of the parameters. Simulated data examples are given to evaluate the proposed designs and study their properties.
Clinical trial; discrete failure time model; missing data; potential outcomes; principal stratification; surrogate marker
In longitudinal clinical trials, when outcome variables at later time points are only defined for patients who survive to those times, the evaluation of the causal effect of treatment is complicated. In this paper, we describe an approach that can be used to obtain the causal effect of three treatment arms with ordinal outcomes in the presence of death using a principal stratification approach. We introduce a set of flexible assumptions to identify the causal effect and implement a sensitivity analysis for non-identifiable assumptions which we parameterize parsimoniously. Methods are illustrated on quality of life data from a recent colorectal cancer clinical trial.
Principal stratification; QOL; Ordinal data; Sensitivity analysis
Frangakis and Rubin (2002, Biometrics 58, 21–29) proposed a new definition of a surrogate endpoint (a “principal” surrogate) based on causal effects. We introduce an estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint. Although the CEP surface is not identifiable due to missing potential outcomes, it can be identified by incorporating a baseline covariate(s) that predicts the biomarker. Given case–cohort sampling of such a baseline predictor and the biomarker in a large blinded randomized clinical trial, we develop an estimated likelihood method for estimating the CEP surface. This estimation assesses the “surrogate value” of the biomarker for reliably predicting clinical treatment effects for the same or similar setting as the trial. A CEP surface plot provides a way to compare the surrogate value of multiple biomarkers. The approach is illustrated by the problem of assessing an immune response to a vaccine as a surrogate endpoint for infection.
Case cohort; Causal inference; Clinical trial; HIV vaccine; Postrandomization selection bias; Structural model; Prentice criteria; Principal stratification
Association studies are a promising way to uncover the genetic basis of complex traits in wild populations. Data on population stratification, linkage disequilibrium and distribution of variant effect-sizes for different trait-types are required to predict study success but are lacking for most taxa. We quantified and investigated the impacts of these key variables in a large-scale association study of a strongly selected trait of medical importance: pyrethroid resistance in the African malaria vector Anopheles gambiae.
We genotyped ≈1500 resistance-phenotyped wild mosquitoes from Ghana and Cameroon using a 1536-SNP array enriched for candidate insecticide resistance gene SNPs. Three factors greatly impacted study power. (1) Population stratification, which was attributable to co-occurrence of molecular forms (M and S), and cryptic within-form stratification necessitating both a partitioned analysis and genomic control. (2) All SNPs of substantial effect (odds ratio, OR>2) were rare (minor allele frequency, MAF<0.05). (3) Linkage disequilibrium (LD) was very low throughout most of the genome. Nevertheless, locally high LD, consistent with a recent selective sweep, and uniformly high ORs in each subsample facilitated significant direct and indirect detection of the known insecticide target site mutation kdr L1014F (OR≈6; P<10−6), but with resistance level modified by local haplotypic background.
Primarily as a result of very low LD in wild A. Gambiae, LD-based association mapping is challenging, but is feasible at least for major effect variants, especially where LD is enhanced by selective sweeps. Such variants will be of greatest importance for predictive diagnostic screening.
Intermediate outcomes are common and typically on the causal pathway to the final outcome. Some examples include noncompliance, missing data, and truncation by death like pregnancy (e.g. when the trial intervention is given to non-pregnant women and the final outcome is preeclampsia, defined only on pregnant women). The intention-to-treat approach does not account properly for them, and more appropriate alternative approaches like principal stratification are not yet widely known. The purposes of this study are to inform researchers that the intention-to-treat approach unfortunately does not fit all problems we face in experimental research, to introduce the principal stratification approach for dealing with intermediate outcomes, and to illustrate its application to a trial of long term calcium supplementation in women at high risk of preeclampsia.
Principal stratification and related concepts are introduced. Two ways for estimating causal effects are discussed and their application is illustrated using the calcium trial, where noncompliance and pregnancy are considered as intermediate outcomes, and preeclampsia is the main final outcome.
The limitations of traditional approaches and methods for dealing with intermediate outcomes are demonstrated. The steps, assumptions and required calculations involved in the application of the principal stratification approach are discussed in detail in the case of our calcium trial.
The intention-to-treat approach is a very sound one but unfortunately it does not fit all problems we find in randomized clinical trials; this is particularly the case for intermediate outcomes, where alternative approaches like principal stratification should be considered.
Intermediate outcomes; Intention-to-treat approach; Principal stratification; Causal effects
It has been shown that if genetic relationships among individuals are not taken into account for genome wide association studies, this may lead to false positives. To address this problem, we used Genome-wide Rapid Association using Mixed Model and Regression and principal component stratification analyses. To account for linkage disequilibrium among the significant markers, principal components loadings obtained from top markers can be included as covariates. Estimation of Bayesian networks may also be useful to investigate linkage disequilibrium among SNPs and their relation with environmental variables.
For the quantitative trait we first estimated residuals while taking polygenic effects into account. We then used a single SNP approach to detect the most significant SNPs based on the residuals and applied principal component regression to take linkage disequilibrium among these SNPs into account. For the categorical trait we used principal component stratification methodology to account for background effects. For correction of linkage disequilibrium we used principal component logit regression. Bayesian networks were estimated to investigate relationship among SNPs.
Using the Genome-wide Rapid Association using Mixed Model and Regression and principal component stratification approach we detected around 100 significant SNPs for the quantitative trait (p<0.05 with 1000 permutations) and 109 significant (p<0.0006 with local FDR correction) SNPs for the categorical trait. With additional principal component regression we reduced the list to 16 and 50 SNPs for the quantitative and categorical trait, respectively.
GRAMMAR could efficiently incorporate the information regarding random genetic effects. Principal component stratification should be cautiously used with stringent multiple hypothesis testing correction to correct for ancestral stratification and association analyses for binary traits when there are systematic genetic effects such as half sib family structures. Bayesian networks are useful to investigate relationships among SNPs and environmental variables.
Dr. Pearl invites researchers to justify their use of principal stratification. This comment explains how the use of principal stratification simplified a complex mediational problem encountered when evaluating a smoking cessation intervention's effect on reducing smoking withdrawal symptoms.
causal inference; principal stratification; mediation; smoking cessation interventions
In assessing the mechanism of treatment efficacy in randomized clinical trials, investigators often perform mediation analyses by analyzing if the significant intent-to-treat treatment effect on outcome occurs through or around a third intermediate or mediating variable: indirect and direct effects, respectively. Standard mediation analyses assume sequential ignorability, i.e., conditional on covariates the intermediate or mediating factor is randomly assigned, as is the treatment in a randomized clinical trial. This research focuses on the application of the principal stratification approach for estimating the direct effect of a randomized treatment but without the standard sequential ignorability assumption. This approach is used to estimate the direct effect of treatment as a difference between expectations of potential outcomes within latent sub-groups of participants for whom the intermediate variable behavior would be constant, regardless of the randomized treatment assignment. Using a Bayesian estimation procedure, we also assess the sensitivity of results based on the principal stratification approach to heterogeneity of the variances among these principal strata. We assess this approach with simulations and apply it to two psychiatric examples. Both examples and the simulations indicated robustness of our findings to the homogeneous variance assumption. However, simulations showed that the magnitude of treatment effects derived under the principal stratification approach were sensitive to model mis-specification.
Principal stratification; mediating variables; direct effects; principal strata probabilities; heterogeneous variances
In time-to-event analyses, artificial censoring with correction for induced selection bias using inverse probability-of-censoring weights can be used to 1) examine the natural history of a disease after effective interventions are widely available, 2) correct bias due to noncompliance with fixed or dynamic treatment regimens, and 3) estimate survival in the presence of competing risks. Artificial censoring entails censoring participants when they meet a predefined study criterion, such as exposure to an intervention, failure to comply, or the occurrence of a competing outcome. Inverse probability-of-censoring weights use measured common predictors of the artificial censoring mechanism and the outcome of interest to determine what the survival experience of the artificially censored participants would be had they never been exposed to the intervention, complied with their treatment regimen, or not developed the competing outcome. Even if all common predictors are appropriately measured and taken into account, in the context of small sample size and strong selection bias, inverse probability-of-censoring weights could fail because of violations in assumptions necessary to correct selection bias. The authors used an example from the Multicenter AIDS Cohort Study, 1984–2008, regarding estimation of long-term acquired immunodeficiency syndrome-free survival to demonstrate the impact of violations in necessary assumptions. Approaches to improve correction methods are discussed.
epidemiologic methods; selection bias; survival analysis
Tuberculosis (TB) and HIV co-infection remains a major public health problem. In spite of different initiatives implemented to tackle the disease, many countries have not reached TB control targets. One of the major attributing reasons for this failure is infection with HIV. This study aims to determine the effect of HIV infection on the survival of TB patients.
A retrospective cohort study was employed to compare the survival between HIV positive and HIV negative TB patients (370 each) during an eight month directly observed treatment short-course (DOTS) period. TB patient’s HIV status was considered as an exposure and follow up time until death was taken as an outcome. All patients with TB treatment outcomes other than death were censored, and death was considered as failure. Cox proportional hazard regression model was used to determine the hazard ratio (HR) of death for each main baseline predictor. TB/HIV co-infected patients were more likely to die; adjusted Hazard Rate (AHR) =1.6, 95%CI (1.01, 2.6) during the DOTS period. This risk was statistically higher among HIV patients during the continuation phase (p=0.0003), as a result HIV positive TB patients had shorter survival (Log rank test= 6.90, df= 2, p= 0.008). The adjusted survival probability was lower in HIV positive TB patients (< 15%) than HIV negative TB patients (> 85%) at the end of the DOTS period (8th month).
TB treatment survival was substantially lower in HIV infected TB patients, especially during the continuation phase. Targeted and comprehensive management of TB/HIV with a strict follow up should be considered through the entire TB treatment period.
TB treatment; Survival; TB patients; HIV positive; Hawassa; Health center
The framework of principal stratification provides a way to think about treatment effects conditional on post-randomization variables, such as level of compliance. In particular, the complier average causal effect (CACE)–the effect of the treatment for those individuals who would comply with their treatment assignment under either treatment condition–is often of substantive interest. However, estimation of the CACE is not always straightforward, with a variety of estimation procedures and underlying assumptions, but little advice to help researchers select between methods. In this paper we discuss and examine two methods that rely on very different assumptions to estimate the CACE: a maximum likelihood (“joint”) method that assumes the “exclusion restriction,” and a propensity score based method that relies on “principal ignorability.” We detail the assumptions underlying each approach, and assess each method’s sensitivity to both its own assumptions and those of the other method using both simulated data and a motivating example. We find that the exclusion restriction based joint approach appears somewhat less sensitive to its assumptions, and that the performance of both methods is significantly improved when there are strong predictors of compliance. Interestingly, we also find that each method performs particularly well when the assumptions of the other approach are violated. These results highlight the importance of carefully selecting an estimation procedure whose assumptions are likely to be satisfied in practice and of having strong predictors of principal stratum membership.
Complier average causal effect; Intermediate outcomes; Noncompliance; Principal stratification; Propensity scores
We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative model to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables but the clinical effects are assumed nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.
Lasso; Logrank; Penalized least squares; Survival analysis
This paper summarizes recent advances in causal inference and underscores the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: those about (1) the effects of potential interventions, (2) probabilities of counterfactuals, and (3) direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both. The tools are demonstrated in the analyses of mediation, causes of effects, and probabilities of causation.
structural equation models; confounding; graphical methods; counterfactuals; causal effects; potential-outcome; mediation; policy evaluation; causes of effects
In a series of papers, Robins and colleagues describe inverse probability of treatment weighted (IPTW) estimation in marginal structural models (MSMs), a method of causal analysis of longitudinal data based on counterfactual principles. This family of statistical techniques is similar in concept to weighting of survey data, except that the weights are estimated using study data rather than defined so as to reflect sampling design and post-stratification to an external population. Several decades ago Miettinen described an elementary method of causal analysis of case-control data based on indirect standardization. In this paper we extend the Miettinen approach using ideas closely related to IPTW estimation in MSMs. The technique is illustrated using data from a case-control study of oral contraceptives and myocardial infarction.
Intermediate outcome variables can often be used as auxiliary variables for the true outcome of interest in randomized clinical trials. For many cancers, time to recurrence is an informative marker in predicting a patient’s overall survival outcome, and could provide auxiliary information for the analysis of survival times.
To investigate whether models linking recurrence and death combined with a multiple imputation procedure for censored observations can result in efficiency gains in the estimation of treatment effects, and be used to shorten trial lengths.
Recurrence and death times are modeled using data from 12 trials in colorectal cancer. Multiple imputation is used as a strategy for handling missing values arising from censoring. The imputation procedure uses a cure model for time to recurrence and a time-dependent Weibull proportional hazards model for time to death. Recurrence times are imputed, and then death times are imputed conditionally on recurrence times. To illustrate these methods, trials are artificially censored 2-years after the last accrual, the imputation procedure is implemented, and a log-rank test and Cox model are used to analyze and compare these new data with the original data.
The results show modest, but consistent gains in efficiency in the analysis by using the auxiliary information in recurrence times. Comparison of analyses show the treatment effect estimates and log rank test results from the 2-year censored imputed data to be in between the estimates from the original data and the artificially censored data, indicating that the procedure was able to recover some of the lost information due to censoring.
The models used are all fully parametric, requiring distributional assumptions of the data.
The proposed models may be useful to improve the efficiency in estimation of treatment effects in cancer trials and shortening trial length.
Auxiliary Variables; Colon Cancer; Cure Models; Multiple Imputation; Surrogate Endpoints
In many randomized clinical trials, the primary response variable, for example, the survival time, is not observed directly after the patients enroll in the study but rather observed after some period of time (lag time). It is often the case that such a response variable is missing for some patients due to censoring that occurs when the study ends before the patient’s response is observed or when the patients drop out of the study. It is often assumed that censoring occurs at random which is referred to as noninformative censoring; however, in many cases such an assumption may not be reasonable. If the missing data are not analyzed properly, the estimator or test for the treatment effect may be biased. In this paper, we use semiparametric theory to derive a class of consistent and asymptotically normal estimators for the treatment effect parameter which are applicable when the response variable is right censored. The baseline auxiliary covariates and post-treatment auxiliary covariates, which may be time-dependent, are also considered in our semiparametric model. These auxiliary covariates are used to derive estimators that both account for informative censoring and are more efficient then the estimators which do not consider the auxiliary covariates.
Informative censoring; Influence function; Logrank test; Nuisance tangent space; Proportional hazards model; Regular and asymptotically linear estimators