A surrogate marker (S) is a variable that can be measured earlier and often easier than the true endpoint (T) in a clinical trial. Most previous research has been devoted to developing surrogacy measures to quantify how well S can replace T or examining the use of S in predicting the effect of a treatment (Z). However, the research often requires one to fit models for the distribution of T given S and Z. It is well known that such models do not have causal interpretations because the models condition on a post-randomization variable S. In this paper, we directly model the relationship among T, S and Z using a potential outcomes framework introduced by Frangakis and Rubin (2002). We propose a Bayesian estimation method to evaluate the causal probabilities associated with the cross-classification of the potential outcomes of S and T when S and T are both binary. We use a log-linear model to directly model the association between the potential outcomes of S and T through the odds ratios. The quantities derived from this approach always have causal interpretations. However, this causal model is not identifiable from the data without additional assumptions. To reduce the non-identifiability problem and increase the precision of statistical inferences, we assume monotonicity and incorporate prior belief that is plausible in the surrogate context by using prior distributions. We also explore the relationship among the surrogacy measures based on traditional models and this counterfactual model. The method is applied to the data from a glaucoma treatment study.
Bayesian Estimation; Counterfactual Model; Randomized Trial; Surrogate Marker
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and general two-stage algorithm is proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
Assessing immune responses to study vaccines as surrogates of protection plays a central role in vaccine clinical trials. Motivated by three ongoing or pending HIV vaccine efficacy trials, we consider such surrogate endpoint assessment in a randomized placebo-controlled trial with case-cohort sampling of immune responses and a time to event endpoint. Based on the principal surrogate definition under the principal stratification framework proposed by Frangakis and Rubin [Biometrics 58 (2002) 21–29] and adapted by Gilbert and Hudgens (2006), we introduce estimands that measure the value of an immune response as a surrogate of protection in the context of the Cox proportional hazards model. The estimands are not identified because the immune response to vaccine is not measured in placebo recipients. We formulate the problem as a Cox model with missing covariates, and employ novel trial designs for predicting the missing immune responses and thereby identifying the estimands. The first design utilizes information from baseline predictors of the immune response, and bridges their relationship in the vaccine recipients to the placebo recipients. The second design provides a validation set for the unmeasured immune responses of uninfected placebo recipients by immunizing them with the study vaccine after trial closeout. A maximum estimated likelihood approach is proposed for estimation of the parameters. Simulated data examples are given to evaluate the proposed designs and study their properties.
Clinical trial; discrete failure time model; missing data; potential outcomes; principal stratification; surrogate marker
Frangakis and Rubin (2002, Biometrics 58, 21–29) proposed a new definition of a surrogate endpoint (a “principal” surrogate) based on causal effects. We introduce an estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint. Although the CEP surface is not identifiable due to missing potential outcomes, it can be identified by incorporating a baseline covariate(s) that predicts the biomarker. Given case–cohort sampling of such a baseline predictor and the biomarker in a large blinded randomized clinical trial, we develop an estimated likelihood method for estimating the CEP surface. This estimation assesses the “surrogate value” of the biomarker for reliably predicting clinical treatment effects for the same or similar setting as the trial. A CEP surface plot provides a way to compare the surrogate value of multiple biomarkers. The approach is illustrated by the problem of assessing an immune response to a vaccine as a surrogate endpoint for infection.
Case cohort; Causal inference; Clinical trial; HIV vaccine; Postrandomization selection bias; Structural model; Prentice criteria; Principal stratification
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
principal stratification; causal inference; vaccine trial
Treatment noncompliance and missing outcomes at posttreatment assessments are common problems in field experiments in naturalistic settings. Although the two complications often occur simultaneously, statistical methods that address both complications have not been routinely considered in data analysis practice in the prevention research field. This paper shows that identification and estimation of causal treatment effects considering both noncompliance and missing outcomes can be relatively easily conducted under various missing data assumptions. We review a few assumptions on missing data in the presence of noncompliance, including the latent ignorability proposed by Frangakis and Rubin (Biometrika 86:365–379, 1999), and show how these assumptions can be used in the parametric complier average causal effect (CACE) estimation framework. As an easy way of sensitivity analysis, we propose the use of alternative missing data assumptions, which will provide a range of causal effect estimates. In this way, we are less likely to settle with a possibly biased causal effect estimate based on a single assumption. We demonstrate how alternative missing data assumptions affect identification of causal effects, focusing on the CACE. The data from the Johns Hopkins School Intervention Study (Ialongo et al., Am J Community Psychol 27:599–642, 1999) will be used as an example.
Causal inference; Complier average causal effect; Latent ignorability; Missing at random; Missing data; Noncompliance
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
There has been a recent emphasis on the identification of biomarkers and other biologic measures that may be potentially used as surrogate endpoints in clinical trials. We focus on the setting of data from a single clinical trial. In this paper, we consider a framework in which the surrogate must occur before the true endpoint. This suggests viewing the surrogate and true endpoints as semi-competing risks data; this approach is new to the literature on surrogate endpoints and leads to an asymmetrical treatment of the surrogate and true endpoints. However, such a data structure also conceptually complicates many of the previously considered measures of surrogacy in the literature. We propose novel estimation and inferential procedures for the relative effect and adjusted association quantities proposed by Buyse and Molenberghs (1998, Biometrics, 1014 – 1029). The proposed methodology is illustrated with application to simulated data, as well as to data from a leukemia study.
Bivariate survival data; Copula model; Dependent Censoring; Multivariate failure time data; Prentice criterion
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
Overall survival (OS) is the gold standard for the demonstration of a clinical benefit in cancer trials. Replacement of OS by a surrogate endpoint allows to reduce trial duration. To date, few surrogate endpoints have been validated in digestive oncology. The aim of this study was to draw up an ordered list of potential surrogate endpoints for OS in digestive cancer trials, by way of a survey among clinicians and methodologists. Secondary objective was to obtain their opinion on surrogacy and quality of life (QoL).
In 2007 and 2008, self administered sequential questionnaires were sent to a panel of French clinicians and methodologists involved in the conduct of cancer clinical trials. In the first questionnaire, panellists were asked to choose the most important characteristics defining a surrogate among six proposals, to give advantages and drawbacks of the surrogates, and to answer questions about their validation and use. Then they had to suggest potential surrogate endpoints for OS in each of the following tumour sites: oesophagus, stomach, liver, pancreas, biliary tract, lymphoma, colon, rectum, and anus. They finally gave their opinion on QoL as surrogate endpoint. In the second questionnaire, they had to classify the previously proposed candidate surrogates from the most (position #1) to the least relevant in their opinion.
Frequency at which the endpoints were chosen as first, second or third most relevant surrogates was calculated and served as final ranking.
Response rate was 30% (24/80) in the first round and 20% (16/80) in the second one. Participants highlighted key points concerning surrogacy. In particular, they reminded that a surrogate endpoint is expected to predict clinical benefit in a well-defined therapeutic situation. Half of them thought it was not relevant to study QoL as surrogate for OS.
DFS, in the neoadjuvant settings or early stages, and PFS, in the non operable or metastatic settings, were ranked first, with a frequency of more than 69% in 20 out of 22 settings. PFS was proposed in association with QoL in metastatic primary liver and stomach cancers (both 81%). This composite endpoint was ranked second in metastatic oesophageal (69%), colorectal (56%) and anal (56%) cancers, whereas QoL alone was also suggested in most metastatic situations.
Other endpoints frequently suggested were R0 resection in the neoadjuvant settings (oesophagus (69%), stomach (56%), pancreas (75%) and biliary tract (63%)) and response. An unexpected endpoint was metastatic PFS in non operable oesophageal (31%) and pancreatic (44%) cancers. Quality and results of surgical procedures like sphincter preservation were also cited as eligible surrogate endpoints in rectal (19%) and anal (50% in case of localized disease) cancers. Except for alpha-FP kinetic in hepatocellular carcinoma (13%) and CA19-9 decline (6%) in pancreas, few endpoints based on biological or tumour markers were proposed.
The overall results should help prioritise the endpoints to be statistically evaluated as surrogate for OS, so that trialists and clinicians can rely on endpoints that ensure relevant clinical benefit to the patient.
The meta-analytic approach to evaluating surrogate end points assesses the predictiveness of treatment effect on the surrogate toward treatment effect on the clinical end point based on multiple clinical trials. Definition and estimation of the correlation of treatment effects were developed in linear mixed models and later extended to binary or failure time outcomes on a case-by-case basis. In a general regression setting that covers nonnormal outcomes, we discuss in this paper several metrics that are useful in the meta-analytic evaluation of surrogacy. We propose a unified 3-step procedure to assess these metrics in settings with binary end points, time-to-event outcomes, or repeated measures. First, the joint distribution of estimated treatment effects is ascertained by an estimating equation approach; second, the restricted maximum likelihood method is used to estimate the means and the variance components of the random treatment effects; finally, confidence intervals are constructed by a parametric bootstrap procedure. The proposed method is evaluated by simulations and applications to 2 clinical trials.
Causal inference; Meta-analysis; Surrogacy
Pearl (2011) asked for the causal inference community to clarify the role of the principal stratification framework in the analysis of causal effects. Here, I argue that the notion of principal stratification has shed light on problems of non-compliance, censoring-by-death, and the analysis of post-infection outcomes; that it may be of use in considering problems of surrogacy but further development is needed; that it is of some use in assessing “direct effects”; but that it is not the appropriate tool for assessing “mediation.” There is nothing within the principal stratification framework that corresponds to a measure of an “indirect” or “mediated” effect.
causal inference; mediation; non-compliance; potential outcomes; principal stratification; surrogates
In this note, we address the problem of surrogacy using a causal modelling framework that differs substantially from the potential outcomes model that pervades the biostatistical literature. The framework comes from econometrics and conceptualizes direct effects of the surrogate endpoint on the true endpoint. While this framework can incorporate the so-called semi-competing risks data structure, we also derive a fundamental non-identifiability result. Relationships to existing causal modelling frameworks are also discussed.
Clinical Trial; Counterfactual; Dependence; Nonlinear response; Prentice Criterion; Rubin causal model
Recent technological advances have made it possible to simultaneously measure multiple protein activities at the single cell level. With such data collected under different stimulatory or inhibitory conditions, it is possible to infer the causal relationships among proteins from single cell interventional data. In this article we propose a Bayesian hierarchical modeling framework to infer the signaling pathway based on the posterior distributions of parameters in the model. Under this framework, we consider network sparsity and model the existence of an association between two proteins both at the overall level across all experiments and at each individual experimental level. This allows us to infer the pairs of proteins that are associated with each other and their causal relationships. We also explicitly consider both intrinsic noise and measurement error. Markov chain Monte Carlo is implemented for statistical inference. We demonstrate that this hierarchical modeling can effectively pool information from different interventional experiments through simulation studies and real data analysis.
Bayesian network; dependency network; Gaussian graphical model; hierarchical model; interventional data; Markov chain Monte Carlo; mixture distribution; single cell measurements; signaling pathway
Recently, many Bayesian methods have been developed for dose-finding when simultaneously modeling both toxicity and efficacy outcomes in a blended phase I/II fashion. A further challenge arises when all the true efficacy data cannot be obtained quickly after the treatment, so that surrogate markers are instead used (e.g, in cancer trials). We propose a framework to jointly model the probabilities of toxicity, efficacy and surrogate efficacy given a particular dose. Our trivariate binary model is specified as a composition of two bivariate binary submodels. In particular, we extend the bCRM approach , as well as utilize the Gumbel copula of Thall and Cook . The resulting trivariate algorithm utilizes all the available data at any given time point, and can flexibly stop the trial early for either toxicity or efficacy. Our simulation studies demonstrate our proposed method can successfully improve dosage targeting efficiency and guard against excess toxicity over a variety of true model settings and degrees of surrogacy.
Bayesian adaptive methods; Continual reassessment method (CRM); Maximum tolerated dose (MTD); Phase I/II clinical trial; Surrogate efficacy; Toxicity
Although the frequentist paradigm has been the predominant approach to clinical trial design since the 1940s, it has several notable limitations. The alternative Bayesian paradigm has been greatly enhanced by advancements in computational algorithms and computer hardware. Compared to its frequentist counterpart, the Bayesian framework has several unique advantages, and its incorporation into clinical trial design is occurring more frequently. Using an extensive literature review to assess how Bayesian methods are used in clinical trials, we find them most commonly used for dose finding, efficacy monitoring, toxicity monitoring, diagnosis/decision making, and for studying pharmacokinetics/pharmacodynamics. The additional infrastructure required for implementing Bayesian methods in clinical trials may include specialized software programs to run the study design, simulation, and analysis, and Web-based applications, which are particularly useful for timely data entry and analysis. Trial success requires not only the development of proper tools but also timely and accurate execution of data entry, quality control, adaptive randomization, and Bayesian computation. The relative merit of the Bayesian and frequentist approaches continues to be the subject of debate in statistics. However, more evidence can be found showing the convergence of the two camps, at least at the practical level. Ultimately, better clinical trial methods lead to more efficient designs, lower sample sizes, more accurate conclusions, and better outcomes for patients enrolled in the trials. Bayesian methods offer attractive alternatives for better trials. More such trials should be designed and conducted to refine the approach and demonstrate its real benefit in action.
adaptive trial design; Bayesian paradigm; clinical trial conduct; frequentist paradigm; trial efficiency; trial ethics
When identification of causal effects relies on untestable assumptions regarding nonidentified parameters, sensitivity of causal effect estimates is often questioned. For proper interpretation of causal effect estimates in this situation, deriving bounds on causal parameters or exploring the sensitivity of estimates to scientifically plausible alternative assumptions can be critical. In this paper, we propose a practical way of bounding and sensitivity analysis, where multiple identifying assumptions are combined to construct tighter common bounds. In particular, we focus on the use of competing identifying assumptions that impose different restrictions on the same non-identified parameter. Since these assumptions are connected through the same parameter, direct translation across them is possible. Based on this cross-translatability, various information in the data, carried by alternative assumptions, can be effectively combined to construct tighter bounds on causal effects. Flexibility of the suggested approach is demonstrated focusing on the estimation of the complier average causal effect (CACE) in a randomized job search intervention trial that suffers from noncompliance and subsequent missing outcomes.
alternative assumptions; bounds; causal inference; missing data; noncompliance; principal stratification; sensitivity analysis
Surrogacy was prohibited in the Netherlands until 1994, at which time the Dutch law was changed from the general prohibition of surrogacy to the prohibition of commercial surrogacy. This paper describes the results from the first and only Dutch Centre for Non-commercial IVF Surrogacy between 1997 and 2004.
A prospective study was conducted of all intended parents, and surrogate mothers and their partners (if present), in which medical, psychological and legal aspects of patient selection were assessed by questionnaires and interviews developed for this study.
More than 500 couples enquired about surrogacy by telephone or e-mail. More than 200 couples applied for surrogacy in the Centre, of which, after extensive screening, 35 couples actually entered the IVF programme and 24 completed the treatment, resulting in 16 children being born to 13 women. Recommendations for non-commercial surrogacy are given, including abandoning the 1-year waiting period before adoption, currently dictated by law, avoiding a period of unnecessary psychological distress.
Our study has shown that non-commercial IVF surrogacy is feasible, with good results in terms of pregnancy outcome and psychological outcome for all parents, and with no legal problems relating to the adoption procedures arising. The extensive screening of medical, psychological and legal aspects was a key element in helping to ensure the safety and success of the procedure.
surrogacy; IVF/ICSI outcome; counselling; infertility; assisted reproduction
Confounding can be a major source of bias in non-experimental research. The authors recently introduced propensity score calibration (PSC), which combines propensity scores (PS) and regression calibration to address confounding by variables unobserved in the main study by using variables observed in a validation study. Here, the authors assess the performance of PSC using simulations in settings with and without violation of the key assumption of PSC: that the error-prone PS estimated in the main study is a surrogate for the gold-standard PS (i.e. contains no additional information on the outcome). The assumption can be assessed if data on the outcome are available in the validation study. If data are simulated allowing for surrogacy to be violated, results largely depend on the extent of violation. If surrogacy holds, PSC leads to bias reduction between 74 and 106 percent (>100 percent representing an overcorrection). If surrogacy is violated, PSC can lead to an increase in bias. Surrogacy is violated when the direction of confounding of the exposure-disease association caused by the unobserved variable(s) differs from that of the confounding due to observed variables. When surrogacy holds, PSC is a useful approach to adjust for unmeasured confounding using validation data.
bias (epidemiology); cohort studies; confounding factors (epidemiology); epidemiologic methods; propensity score calibration; research design
Many randomized clinical trials collect multivariate longitudinal measurements in different scales, for example, binary, ordinal, and continuous. Multilevel item response models are used to evaluate the global treatment effects across multiple outcomes while accounting for all sources of correlation. Continuous measurements are often assumed to be normally distributed. But the model inference is not robust when the normality assumption is violated because of heavy tails and outliers. In this article, we develop a Bayesian method for multilevel item response models replacing the normal distributions with symmetric heavy-tailed normal/independent distributions. The inference is conducted using a Bayesian framework via Markov Chain Monte Carlo simulation implemented in BUGS language. Our proposed method is evaluated by simulation studies and is applied to Earlier versus Later Levodopa Therapy in Parkinson’s Disease study, a motivating clinical trial assessing the effect of Levodopa therapy on the Parkinson’s disease progression rate.
item response theory; latent variable; Markov Chain Monte Carlo; robust inference; clinical trial
Health information technology evaluators need to distinguish between intervention efficacy as assessed in the ideal circumstances of clinical trials and intervention effectiveness as assessed in the real world circumstances of actual practice. Because current evaluation study designs do not routinely allow for this distinction, we have developed a framework for evaluation of implementation fidelity that considers health information technologies as complex interventions and makes use of common intervention components as defined in the Oxford Implementation Index. We also propose statistical methods for the evaluation of interventions at the system and component level using the Rubin Causal Model. We then describe how to apply this framework to evaluate an ongoing clinical trial of three health information technology interventions currently implemented in a 17,000 patient community-based health network caring for Medicaid beneficiaries in Durham County, North Carolina.
We investigated the putative surrogate endpoints (PSEs) of best response (BR), complete response (CR), confirmed response (CoR), and progression-free survival (PFS) for associations with Overall Survival (OS), and as possible surrogate endpoints for OS.
Individual patient (pt) data from 870 untreated ES-SCLC pts participating in 6 single-arm (274 pts) and 3 randomized trials (596 pts) were pooled. Patient-level associations between PSEs and OS were assessed by Cox models using landmark analyses. Trial-level surrogacy of PSEs assessed by the association of treatment effects on OS and individual PSEs. Trial-level surrogacy measures included: R2 from weighted least squares regression model (WLS R2), Spearman's correlation coefficient, and R2 from bivariate survival model (Copula R2).
Median OS and PFS were 9.6 (95% CI: 9.1-10.0) and 5.5 (95% CI: 5.2-5.9) months, respectively; BR, CR, and CoR rates were 44%, 22%, and 34%, respectively. Patient-level associations showed that PFS status at 4 months was a strong predictor of subsequent survival (HR=0.42 (95% CI: 0.35-0.51); concordance index=0.63; p<0.01), with 6-month PFS being the strongest (HR=0.41 (95% CI: 0.35-0.49); concordance index=0.66; p<0.01). At the trial-level, PFS showed the highest level of surrogacy for OS (WLS R2=0.79; Copula R2=0.80), explaining 79% of the variance in OS. Tumor response endpoints showed lower surrogacy levels (WLS R2≤0.48).
PFS was strongly associated with OS at both the patient and trial-level. PFS also shows promise as a potential surrogate for OS, but further validation is needed using data from a larger number of randomized phase III trials.
extensive-stage small cell lung cancer; surrogate endpoints; pooled analysis; progression-free survival; tumor response
This study aimed to prospectively examine families created using surrogacy over a 10-year period in the UK with respect to intending parents' and children's relationship with the surrogate mother, parents' decisions over disclosure and children's understanding of the nature of their conception.
Semi-structured interviews were administered by trained researchers to intending mothers, intending fathers and children on four occasions over a 10-year period. Forty-two families (19 with a genetic surrogate mother) participated when the child was 1-year old and by age 10 years, 33 families remained in the study. Data were collected on the frequency of contact with the surrogate mother, relationship with the surrogate, disclosure of surrogacy to the child and the child's understanding of their surrogacy birth.
Frequency of contact between surrogacy families and their surrogate mother decreased over time, particularly for families whose surrogate was a previously unknown genetic carrier (P < 0.001) (i.e. where they had met through a third party and the surrogate mother's egg was used to conceive the child). Most families reported harmonious relationships with their surrogate mother. At age 10 years, 19 (90%) children who had been informed of the nature of their conception had a good understanding of this and 13 of the 14 children who were in contact with their surrogate reported that they liked her.
Surrogacy families maintained good relationships with the surrogate mother over time. Children felt positive about their surrogate mother and their surrogacy birth. The sample size of this study was small and further, larger investigations are needed before firm conclusions can be drawn.
genetic surrogacy; gestational surrogacy; disclosure; surrogate
To investigate whether progression-free survival (PFS) can be considered a surrogate endpoint for overall survival (OS) in advanced non-small-cell lung cancer (NSCLC).
Meta-analysis of individual patient data from randomised trials.
Five randomised controlled trials comparing docetaxel-based chemotherapy with vinorelbine-based chemotherapy for the first-line treatment of NSCLC.
2331 patients with advanced NSCLC.
Primary and secondary outcome measures
Surrogacy of PFS for OS was assessed through the association between these endpoints and between the treatment effects on these endpoints. The surrogate threshold effect was the minimum treatment effect on PFS required to predict a non-zero treatment effect on OS.
The median follow-up of patients still alive was 23.4 months. Median OS was 10 months and median PFS was 5.5 months. The treatment effects on PFS and OS were correlated, whether using centres (R²=0.62, 95% CI 0.52 to 0.72) or prognostic strata (R²=0.72, 95% CI 0.60 to 0.84) as units of analysis. The surrogate threshold effect was a PFS hazard ratio (HR) of 0.49 using centres or 0.53 using prognostic strata.
These analyses provide only modest support for considering PFS as an acceptable surrogate for OS in patients with advanced NSCLC. Only treatments that have a major impact on PFS (risk reduction of at least 50%) would be expected to also have a significant effect on OS. Whether these results also apply to targeted therapies is an open question that requires independent evaluation.
It is frequently of interest to estimate the intervention effect that adjusts for post-randomization variables in clinical trials. In the recently completed HPTN 035 trial, there is differential condom use between the three microbicide gel arms and the No Gel control arm, so that intention to treat (ITT) analyses only assess the net treatment effect that includes the indirect treatment effect mediated through differential condom use. Various statistical methods in causal inference have been developed to adjust for post-randomization variables. We extend the principal stratification framework to time-varying behavioral variables in HIV prevention trials with a time-to-event endpoint, using a partially hidden Markov model (pHMM). We formulate the causal estimand of interest, establish assumptions that enable identifiability of the causal parameters, and develop maximum likelihood methods for estimation. Application of our model on the HPTN 035 trial reveals an interesting pattern of prevention effectiveness among different condom-use principal strata.
microbicide; causal inference; posttreatment variables; direct effect