A surrogate marker (S) is a variable that can be measured earlier, and often more easily, than the true endpoint (T) in a clinical trial. Most previous research has been devoted to developing surrogacy measures that quantify how well S can replace T, or to examining the use of S in predicting the effect of a treatment (Z). However, this research often requires one to fit models for the distribution of T given S and Z. It is well known that such models do not have causal interpretations because they condition on a post-randomization variable, S. In this paper, we directly model the relationship among T, S and Z using the potential outcomes framework introduced by Frangakis and Rubin (2002). We propose a Bayesian estimation method to evaluate the causal probabilities associated with the cross-classification of the potential outcomes of S and T when S and T are both binary. We use a log-linear model to directly model the association between the potential outcomes of S and T through odds ratios. The quantities derived from this approach always have causal interpretations. However, this causal model is not identifiable from the data without additional assumptions. To reduce the non-identifiability problem and increase the precision of statistical inferences, we assume monotonicity and incorporate prior beliefs that are plausible in the surrogate context through prior distributions. We also explore the relationship between the surrogacy measures based on traditional models and those based on this counterfactual model. The method is applied to data from a glaucoma treatment study.
Bayesian Estimation; Counterfactual Model; Randomized Trial; Surrogate Marker
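As a rough illustration of the kind of counterfactual association parameter involved (the exact log-linear parameterization in the paper may differ), one can define, for the binary potential outcomes under treatment,
$$ \psi_1 = \frac{P\{T(1)=1,\, S(1)=1\}\; P\{T(1)=0,\, S(1)=0\}}{P\{T(1)=1,\, S(1)=0\}\; P\{T(1)=0,\, S(1)=1\}}, $$
with an analogous odds ratio $\psi_0$ for the potential outcomes under control. Monotonicity (e.g., $S(1)\ge S(0)$ and $T(1)\ge T(0)$ for every subject) sets several cells of the $2^4$ cross-classification of $\{S(0),S(1),T(0),T(1)\}$ to zero, which reduces, but does not remove, the non-identifiability that the informative priors are meant to address.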
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures that could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and a general two-stage algorithm are proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
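A minimal sketch of a two-stage, instrumental-variables-style estimator under a linear working model, with the randomized assignment z instrumenting the post-randomization surrogate s in a regression for the true endpoint t. The array names and the ordinary-least-squares stages are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def two_stage_iv(z, s, t):
        """Two-stage sketch: stage 1 regresses the surrogate on the randomized
        assignment; stage 2 regresses the true endpoint on the stage-1 fitted
        surrogate, so the surrogate coefficient is purged of its endogeneity."""
        z, s, t = map(lambda a: np.asarray(a, dtype=float), (z, s, t))
        X1 = np.column_stack([np.ones_like(z), z])
        s_hat = X1 @ np.linalg.lstsq(X1, s, rcond=None)[0]      # stage 1 fit
        X2 = np.column_stack([np.ones_like(z), s_hat])
        beta = np.linalg.lstsq(X2, t, rcond=None)[0]            # stage 2 fit
        return beta[1]  # coefficient of the (instrumented) surrogate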
Assessing immune responses to study vaccines as surrogates of protection plays a central role in vaccine clinical trials. Motivated by three ongoing or pending HIV vaccine efficacy trials, we consider such surrogate endpoint assessment in a randomized placebo-controlled trial with case-cohort sampling of immune responses and a time to event endpoint. Based on the principal surrogate definition under the principal stratification framework proposed by Frangakis and Rubin [Biometrics 58 (2002) 21–29] and adapted by Gilbert and Hudgens (2006), we introduce estimands that measure the value of an immune response as a surrogate of protection in the context of the Cox proportional hazards model. The estimands are not identified because the immune response to vaccine is not measured in placebo recipients. We formulate the problem as a Cox model with missing covariates, and employ novel trial designs for predicting the missing immune responses and thereby identifying the estimands. The first design utilizes information from baseline predictors of the immune response, and bridges their relationship in the vaccine recipients to the placebo recipients. The second design provides a validation set for the unmeasured immune responses of uninfected placebo recipients by immunizing them with the study vaccine after trial closeout. A maximum estimated likelihood approach is proposed for estimation of the parameters. Simulated data examples are given to evaluate the proposed designs and study their properties.
Clinical trial; discrete failure time model; missing data; potential outcomes; principal stratification; surrogate marker
Frangakis and Rubin (2002, Biometrics 58, 21–29) proposed a new definition of a surrogate endpoint (a “principal” surrogate) based on causal effects. We introduce an estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint. Although the CEP surface is not identifiable due to missing potential outcomes, it can be identified by incorporating a baseline covariate(s) that predicts the biomarker. Given case–cohort sampling of such a baseline predictor and the biomarker in a large blinded randomized clinical trial, we develop an estimated likelihood method for estimating the CEP surface. This estimation assesses the “surrogate value” of the biomarker for reliably predicting clinical treatment effects for the same or similar setting as the trial. A CEP surface plot provides a way to compare the surrogate value of multiple biomarkers. The approach is illustrated by the problem of assessing an immune response to a vaccine as a surrogate endpoint for infection.
Case cohort; Causal inference; Clinical trial; HIV vaccine; Postrandomization selection bias; Structural model; Prentice criteria; Principal stratification
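In rough terms (notation mine), writing $S(1), S(0)$ for the potential biomarker values and $\operatorname{risk}_z(s_1,s_0)=P\{Y(z)=1 \mid S(1)=s_1,\, S(0)=s_0\}$ for the clinical-endpoint risks within a principal stratum, the CEP surface is a contrast of the form
$$ \mathrm{CEP}(s_1, s_0) = h\{\operatorname{risk}_1(s_1,s_0),\; \operatorname{risk}_0(s_1,s_0)\}, $$
for example $1-\operatorname{risk}_1/\operatorname{risk}_0$ in the vaccine-efficacy setting. A biomarker has high surrogate value when the surface stays near zero where $s_1=s_0$ and departs from zero as the causal effect $s_1-s_0$ on the biomarker grows.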
Data analysis for randomized trials with multiple treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for multi-arm randomized trials subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend the CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method-of-moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
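For orientation, in the familiar two-arm case the complier average causal effect is
$$ \mathrm{CACE} = E\{Y(1) - Y(0) \mid \text{complier}\}, $$
identified under randomization, the exclusion restriction and monotonicity as the ratio of intention-to-treat effects, $\{E(Y \mid Z=1)-E(Y \mid Z=0)\}/\{E(D \mid Z=1)-E(D \mid Z=0)\}$, where $D$ is the treatment actually received. The paper's setting generalizes the compliance classes, and hence this estimand, to three treatment arms, where point identification no longer follows from these assumptions alone.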
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
principal stratification; causal inference; vaccine trial
Treatment noncompliance and missing outcomes at posttreatment assessments are common problems in field experiments in naturalistic settings. Although the two complications often occur simultaneously, statistical methods that address both have not been routinely considered in data analysis practice in the prevention research field. This paper shows that identification and estimation of causal treatment effects considering both noncompliance and missing outcomes can be relatively easily conducted under various missing data assumptions. We review a few assumptions on missing data in the presence of noncompliance, including the latent ignorability proposed by Frangakis and Rubin (Biometrika 86:365–379, 1999), and show how these assumptions can be used in the parametric complier average causal effect (CACE) estimation framework. As an easy way of conducting sensitivity analysis, we propose the use of alternative missing data assumptions, which provides a range of causal effect estimates. In this way, we are less likely to settle for a possibly biased causal effect estimate based on a single assumption. We demonstrate how alternative missing data assumptions affect identification of causal effects, focusing on the CACE. The data from the Johns Hopkins School Intervention Study (Ialongo et al., Am J Community Psychol 27:599–642, 1999) are used as an example.
Causal inference; Complier average causal effect; Latent ignorability; Missing at random; Missing data; Noncompliance
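In notation of my own choosing (R the response indicator for the outcome, C the latent compliance stratum, X covariates), latent ignorability assumes
$$ R \;\perp\; Y(z) \;\mid\; C,\, X,\, Z=z, $$
i.e., outcome missingness is ignorable once one conditions on the latent compliance type in addition to observed quantities, whereas standard missing-at-random conditions only on observables. The sensitivity analysis proposed in the paper amounts to re-estimating the CACE under alternative conditioning sets or restrictions of this kind and reporting the resulting range of estimates.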
There has been a recent emphasis on the identification of biomarkers and other biologic measures that may be potentially used as surrogate endpoints in clinical trials. We focus on the setting of data from a single clinical trial. In this paper, we consider a framework in which the surrogate must occur before the true endpoint. This suggests viewing the surrogate and true endpoints as semi-competing risks data; this approach is new to the literature on surrogate endpoints and leads to an asymmetrical treatment of the surrogate and true endpoints. However, such a data structure also conceptually complicates many of the previously considered measures of surrogacy in the literature. We propose novel estimation and inferential procedures for the relative effect and adjusted association quantities proposed by Buyse and Molenberghs (1998, Biometrics, 1014–1029). The proposed methodology is illustrated with application to simulated data, as well as to data from a leukemia study.
Bivariate survival data; Copula model; Dependent Censoring; Multivariate failure time data; Prentice criterion
Overall survival (OS) is the gold standard for the demonstration of a clinical benefit in cancer trials. Replacement of OS by a surrogate endpoint makes it possible to reduce trial duration. To date, few surrogate endpoints have been validated in digestive oncology. The aim of this study was to draw up an ordered list of potential surrogate endpoints for OS in digestive cancer trials, by way of a survey among clinicians and methodologists. A secondary objective was to obtain their opinion on surrogacy and quality of life (QoL).
In 2007 and 2008, self-administered sequential questionnaires were sent to a panel of French clinicians and methodologists involved in the conduct of cancer clinical trials. In the first questionnaire, panellists were asked to choose the most important characteristics defining a surrogate among six proposals, to give advantages and drawbacks of surrogates, and to answer questions about their validation and use. They then had to suggest potential surrogate endpoints for OS in each of the following tumour sites: oesophagus, stomach, liver, pancreas, biliary tract, lymphoma, colon, rectum, and anus. They finally gave their opinion on QoL as a surrogate endpoint. In the second questionnaire, they had to classify the previously proposed candidate surrogates from the most (position #1) to the least relevant in their opinion.
The frequency at which the endpoints were chosen as the first, second or third most relevant surrogate was calculated and served as the final ranking.
The response rate was 30% (24/80) in the first round and 20% (16/80) in the second. Participants highlighted key points concerning surrogacy. In particular, they emphasized that a surrogate endpoint is expected to predict clinical benefit in a well-defined therapeutic situation. Half of them thought it was not relevant to study QoL as a surrogate for OS.
DFS, in the neoadjuvant settings or early stages, and PFS, in the non-operable or metastatic settings, were ranked first, with a frequency of more than 69% in 20 out of 22 settings. PFS was proposed in association with QoL in metastatic primary liver and stomach cancers (both 81%). This composite endpoint was ranked second in metastatic oesophageal (69%), colorectal (56%) and anal (56%) cancers, whereas QoL alone was also suggested in most metastatic situations.
Other endpoints frequently suggested were R0 resection in the neoadjuvant settings (oesophagus (69%), stomach (56%), pancreas (75%) and biliary tract (63%)) and tumour response. An unexpected endpoint was metastatic PFS in non-operable oesophageal (31%) and pancreatic (44%) cancers. Quality and results of surgical procedures, such as sphincter preservation, were also cited as eligible surrogate endpoints in rectal (19%) and anal (50% in case of localized disease) cancers. Except for alpha-FP kinetics in hepatocellular carcinoma (13%) and CA19-9 decline (6%) in pancreatic cancer, few endpoints based on biological or tumour markers were proposed.
The overall results should help prioritise the endpoints to be statistically evaluated as surrogates for OS, so that trialists and clinicians can rely on endpoints that ensure relevant clinical benefit to the patient.
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
Recent technological advances have made it possible to simultaneously measure multiple protein activities at the single cell level. With such data collected under different stimulatory or inhibitory conditions, it is possible to infer the causal relationships among proteins from single cell interventional data. In this article we propose a Bayesian hierarchical modeling framework to infer the signaling pathway based on the posterior distributions of parameters in the model. Under this framework, we consider network sparsity and model the existence of an association between two proteins both at the overall level across all experiments and at each individual experimental level. This allows us to infer the pairs of proteins that are associated with each other and their causal relationships. We also explicitly consider both intrinsic noise and measurement error. Markov chain Monte Carlo is implemented for statistical inference. We demonstrate that this hierarchical modeling can effectively pool information from different interventional experiments through simulation studies and real data analysis.
Bayesian network; dependency network; Gaussian graphical model; hierarchical model; interventional data; Markov chain Monte Carlo; mixture distribution; single cell measurements; signaling pathway
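One generic way to write down the two-level edge structure the abstract describes (the paper's exact formulation may well differ) is a spike-and-slab hierarchy: an overall indicator $\gamma_{jk}\sim\mathrm{Bernoulli}(\rho)$ says whether protein $j$ can influence protein $k$ at all, and, given $\gamma_{jk}$, the coefficient in experiment $e$ follows
$$ \beta^{(e)}_{jk} \mid \gamma_{jk} \;\sim\; \bigl(1-\gamma_{jk}\pi^{(e)}\bigr)\,\delta_0 \;+\; \gamma_{jk}\pi^{(e)}\, N\!\bigl(0, \sigma^2_\beta\bigr), $$
so evidence for an edge is pooled across interventional conditions while the edge may still be switched off in individual experiments; intrinsic noise and measurement error then enter through additional Gaussian layers on the latent and observed protein activities, with all indicators and coefficients sampled by MCMC.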
When identification of causal effects relies on untestable assumptions regarding nonidentified parameters, sensitivity of causal effect estimates is often questioned. For proper interpretation of causal effect estimates in this situation, deriving bounds on causal parameters or exploring the sensitivity of estimates to scientifically plausible alternative assumptions can be critical. In this paper, we propose a practical way of bounding and sensitivity analysis, where multiple identifying assumptions are combined to construct tighter common bounds. In particular, we focus on the use of competing identifying assumptions that impose different restrictions on the same non-identified parameter. Since these assumptions are connected through the same parameter, direct translation across them is possible. Based on this cross-translatability, various information in the data, carried by alternative assumptions, can be effectively combined to construct tighter bounds on causal effects. Flexibility of the suggested approach is demonstrated focusing on the estimation of the complier average causal effect (CACE) in a randomized job search intervention trial that suffers from noncompliance and subsequent missing outcomes.
alternative assumptions; bounds; causal inference; missing data; noncompliance; principal stratification; sensitivity analysis
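A schematic of the cross-translation idea under stated assumptions: if two competing identifying assumptions each imply interval bounds on the same non-identified parameter phi, the combined bound is their intersection, and bounds on the CACE follow by mapping the endpoints through an identification formula. The function names and the assumed monotonicity of cace_of_phi are illustrative, not the paper's estimator.

    def combined_cace_bounds(bounds_a, bounds_b, cace_of_phi):
        """Intersect the interval bounds on a shared non-identified parameter phi
        implied by two competing assumptions, then map the endpoints through a
        monotone identification formula to obtain tighter bounds on the CACE."""
        lo = max(bounds_a[0], bounds_b[0])
        hi = min(bounds_a[1], bounds_b[1])
        if lo > hi:
            raise ValueError("the two assumptions are jointly incompatible with the data")
        values = (cace_of_phi(lo), cace_of_phi(hi))
        return min(values), max(values)

    # e.g. combined_cace_bounds((0.2, 0.6), (0.4, 0.9), lambda phi: 0.3 - 0.5 * phi)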
Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data-driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variances for all involved treatment comparisons are equal (i.e., the ‘common variance’ assumption). This approach ‘borrows strength’ for heterogeneity estimation across treatment comparisons and thus adds valuable precision when data are sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently, 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary.
In this paper we describe four novel approaches to modeling heterogeneity variance: two novel model structures, and two approaches for the use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. In particular, we compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities.
In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances. Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach.
MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice.
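A compact sketch of the variance structures being contrasted (notation mine): for study $i$ contributing an estimate $y_i$ with known sampling variance $s_i^2$ to comparison $c(i)$,
$$ y_i \sim N\!\bigl(\theta_i,\, s_i^2\bigr), \qquad \theta_i \sim N\!\bigl(d_{c(i)},\, \tau^2_{c(i)}\bigr), $$
where the conventional homogeneous (‘common variance’) model sets $\tau^2_{c}\equiv\tau^2$ for all comparisons; an exchangeable-variances alternative instead takes $\log\tau^2_{c}\sim N(\mu_\tau, \sigma^2_\tau)$; consistency-style restrictions link the $\tau^2_c$ of indirect comparisons to those of the comparisons they are built from; and the moderately informative option places a prior on each $\tau^2_c$ constructed from the corresponding pairwise meta-analysis.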
The use of biological surrogates as proxies for biodiversity patterns is gaining popularity, particularly in marine systems where field surveys can be expensive and species richness high. Yet, uncertainty regarding their applicability remains because of inconsistency of definitions, a lack of standard methods for estimating effectiveness, and variable spatial scales considered. We present a Bayesian meta-analysis of the effectiveness of biological surrogates in marine ecosystems. Surrogate effectiveness was defined both as the proportion of surrogacy tests where predictions based on surrogates were better than random (i.e., low probability of making a Type I error; P) and as the predictability of targets using surrogates (R2). A total of 264 published surrogacy tests combined with prior probabilities elicited from eight international experts demonstrated that the habitat, spatial scale, type of surrogate and statistical method used all influenced surrogate effectiveness, at least according to either P or R2. The type of surrogate used (higher-taxa, cross-taxa or subset taxa) was the best predictor of P, with the higher-taxa surrogates outperforming all others. The marine habitat was the best predictor of R2, with particularly low predictability in tropical reefs. Surrogate effectiveness was greatest for higher-taxa surrogates at a <10-km spatial scale, in low-complexity marine habitats such as soft bottoms, and using multivariate-based methods. Comparisons with terrestrial studies in terms of the methods used to study surrogates revealed that marine applications still ignore some problems with several widely used statistical approaches to surrogacy. Our study provides a benchmark for the reliable use of biological surrogates in marine ecosystems, and highlights directions for future development of biological surrogates in predicting biodiversity.
Pearl (2011) asked for the causal inference community to clarify the role of the principal stratification framework in the analysis of causal effects. Here, I argue that the notion of principal stratification has shed light on problems of non-compliance, censoring-by-death, and the analysis of post-infection outcomes; that it may be of use in considering problems of surrogacy but further development is needed; that it is of some use in assessing “direct effects”; but that it is not the appropriate tool for assessing “mediation.” There is nothing within the principal stratification framework that corresponds to a measure of an “indirect” or “mediated” effect.
causal inference; mediation; non-compliance; potential outcomes; principal stratification; surrogates
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this paper we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed. In an ideal but often unrealistic setting, when S satisfies Prentice’s definition for perfect surrogacy, there is the potential for substantial gain in precision by using data from S to estimate the treatment effect on T. When S is not close to a perfect surrogate, it can provide substantial information only under particular circumstances. We propose to use a targeted shrinkage regression approach that data-adaptively takes advantage of the potential efficiency gain yet avoids the need to make a strong surrogacy assumption. Simulations show that this approach strikes a balance between bias and efficiency gain. Compared with competing methods, it has better mean squared error properties and can achieve substantial efficiency gain, particularly in a common practical setting when S captures much but not all of the treatment effect and the sample size is relatively small. We apply the proposed method to a glaucoma data example.
Auxiliary Variable; Biomarker; Randomized Trials; Ridge Regression; Missing Data
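A highly simplified sketch of the general idea, not the paper's targeted shrinkage estimator: combine a complete-case estimate of the treatment effect on T with an estimate that imputes missing T values from a working linear model in (Z, S), using a mixing weight that the actual method would choose data-adaptively. The array names, the NaN coding of missing T, and the linear working model are all illustrative assumptions.

    import numpy as np

    def surrogate_assisted_effect(z, s, t, weight):
        """Shrink between (a) the complete-case difference in mean T between arms
        and (b) the same contrast after imputing missing T from a working model
        T ~ 1 + Z + S fitted on complete cases; weight in [0, 1] governs how much
        the surrogate-based imputation is trusted."""
        z, s, t = map(lambda a: np.asarray(a, dtype=float), (z, s, t))
        obs = ~np.isnan(t)
        # (a) complete-case estimate
        cc = t[obs & (z == 1)].mean() - t[obs & (z == 0)].mean()
        # (b) impute missing T from the working model fitted on complete cases
        X_obs = np.column_stack([np.ones(obs.sum()), z[obs], s[obs]])
        beta = np.linalg.lstsq(X_obs, t[obs], rcond=None)[0]
        t_full = t.copy()
        X_mis = np.column_stack([np.ones((~obs).sum()), z[~obs], s[~obs]])
        t_full[~obs] = X_mis @ beta
        aug = t_full[z == 1].mean() - t_full[z == 0].mean()
        return (1 - weight) * cc + weight * aug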
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated by successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival up to a particular time point computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted rather than a directly estimated treatment effect, we compute its standard error multiplier. Software is available for download.
Randomized trials; Reproducibility; Principal stratification
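A minimal sketch of the leave-one-trial-out step under a linear prediction model only (the paper also considers mixture and principal stratification models): hist_surr and hist_true hold per-trial treatment-effect estimates on the surrogate and true endpoints, and target_surr is the target trial's surrogate-endpoint effect. All names are illustrative.

    import numpy as np

    def predict_true_effect(hist_surr, hist_true, target_surr):
        """Fit true-endpoint effect ~ surrogate-endpoint effect on the historical
        trials, predict the target trial's true-endpoint effect, and quantify
        extrapolation error by successively leaving out each historical trial."""
        hist_surr = np.asarray(hist_surr, dtype=float)
        hist_true = np.asarray(hist_true, dtype=float)
        X = np.column_stack([np.ones_like(hist_surr), hist_surr])
        b = np.linalg.lstsq(X, hist_true, rcond=None)[0]
        prediction = b[0] + b[1] * target_surr
        errors = []
        for i in range(len(hist_surr)):
            keep = np.arange(len(hist_surr)) != i
            Xi = np.column_stack([np.ones(keep.sum()), hist_surr[keep]])
            bi = np.linalg.lstsq(Xi, hist_true[keep], rcond=None)[0]
            errors.append(hist_true[i] - (bi[0] + bi[1] * hist_surr[i]))
        return prediction, np.std(errors, ddof=1)  # point prediction, extrapolation SE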
Health information technology evaluators need to distinguish between intervention efficacy, as assessed in the ideal circumstances of clinical trials, and intervention effectiveness, as assessed in the real-world circumstances of actual practice. Because current evaluation study designs do not routinely allow for this distinction, we have developed a framework for evaluation of implementation fidelity that considers health information technologies as complex interventions and makes use of common intervention components as defined in the Oxford Implementation Index. We also propose statistical methods for the evaluation of interventions at the system and component level using the Rubin Causal Model. We then describe how to apply this framework to evaluate an ongoing clinical trial of three health information technology interventions currently implemented in a 17,000-patient community-based health network caring for Medicaid beneficiaries in Durham County, North Carolina.
Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models.
We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized and ordinal, with center and/or trial as random effects, and with age, motor score, pupil reactivity, or trial as covariates. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), the R package MCMCglmm and the SAS experimental procedure MCMC.
Three data sets (the full data set and two sub-datasets) were analysed using essentially two logistic random effects models, with either one random effect for center or two random effects for center and trial. For the ordinal outcome in the full data set, a proportional odds model with a random center effect was also fitted.
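For concreteness, the dichotomized-GOS model with a single random center effect has the standard form (covariate vector abbreviated as $\mathbf{x}_{ij}$):
$$ \operatorname{logit} P(Y_{ij}=1 \mid u_j) = \mathbf{x}_{ij}^{\top}\boldsymbol\beta + u_j, \qquad u_j \sim N\!\bigl(0, \sigma^2_{\text{center}}\bigr), $$
with a second random effect for trial added in the two-random-effects variant, and a proportional odds extension of the linear predictor for the ordinal five-point GOS.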
The packages gave similar parameter estimates for both the fixed and random effects, for the binary (and ordinal) models, in the main study, where the number of level-1 (patient-level) data units was relatively large compared with the number of level-2 (hospital-level) units. However, for the relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient.
On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no clear preference for either a frequentist or a Bayesian approach (based on vague priors), unless one has a philosophical preference. The choice of a particular implementation may largely depend on the desired flexibility and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated to be zero, with a standard error that was either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior for the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain.
Statisticians in medicine can disagree on appropriate methodology applicable to the design and analysis of clinical trials. So-called Bayesians and frequentists both claim ethical superiority. This paper, by defining and then linking together various dichotomies, argues that there is a place for both statistical camps. The choice between them depends on the phase of the clinical trial, disease prevalence and severity, but supremely on the ethics underlying the particular trial. There is always a tension between physicians primarily obligated to their own patients (the weight of 'individual ethics') and ethical committees responsible for the scientific merit of the trial and its long-term implications ('collective ethics'). Individual ethics, it is proposed, favour the Bayesian approach; collective ethics, the frequentist. Though in some situations the choice appears clear-cut, there remain others where both methodologies can be appropriate.
Genome-wide dense markers have been used to detect genes and estimate relative genetic values. Among many methods, Bayesian techniques have been widely used and shown to be powerful in genome-wide breeding value estimation and association studies. However, computation is known to be intensive under the Bayesian framework, and specifying a prior distribution for each parameter is always required for Bayesian computation. We propose the use of hierarchical likelihood to solve such problems.
Using double hierarchical generalized linear models, we analyzed the simulated dataset provided by the QTLMAS 2010 workshop. Marker-specific variances estimated by double hierarchical generalized linear models identified the QTL with large effects for both the quantitative and binary traits. The QTL positions were detected with very high accuracy. For young individuals without phenotypic records, the true and estimated breeding values had a Pearson correlation of 0.60 for the quantitative trait and 0.72 for the binary trait, where the quantitative trait had a more complicated genetic architecture involving imprinting and epistatic QTL.
Hierarchical likelihood enables estimation of marker-specific variances under the likelihoodist framework. Double hierarchical generalized linear models are powerful in localizing major QTL and computationally fast.
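A rough sketch of the double hierarchical structure, in my own notation, with $y_i$ the trait value and $z_{ij}$ the genotype code of marker $j$:
$$ y_i = \mu + \sum_j z_{ij}\, a_j + e_i, \qquad a_j \sim N\!\bigl(0, \sigma^2_j\bigr), $$
where, unlike a common-variance ridge/BLUP model, each marker effect has its own variance $\sigma^2_j$, and these variances are themselves modeled as random effects in a second-level generalized linear model; the whole hierarchy is estimated by hierarchical (h-)likelihood rather than MCMC, and markers with large estimated $\sigma^2_j$ flag putative QTL. For the binary trait, the Gaussian response is replaced by a Bernoulli response with an appropriate link.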
Confounding can be a major source of bias in non-experimental research. The authors recently introduced propensity score calibration (PSC), which combines propensity scores (PS) and regression calibration to address confounding by variables unobserved in the main study by using variables observed in a validation study. Here, the authors assess the performance of PSC using simulations in settings with and without violation of the key assumption of PSC: that the error-prone PS estimated in the main study is a surrogate for the gold-standard PS (i.e. contains no additional information on the outcome). The assumption can be assessed if data on the outcome are available in the validation study. If data are simulated allowing for surrogacy to be violated, results largely depend on the extent of violation. If surrogacy holds, PSC leads to bias reduction between 74 and 106 percent (>100 percent representing an overcorrection). If surrogacy is violated, PSC can lead to an increase in bias. Surrogacy is violated when the direction of confounding of the exposure-disease association caused by the unobserved variable(s) differs from that of the confounding due to observed variables. When surrogacy holds, PSC is a useful approach to adjust for unmeasured confounding using validation data.
bias (epidemiology); cohort studies; confounding factors (epidemiology); epidemiologic methods; propensity score calibration; research design
The ACCENT group previously established that disease-free survival (DFS) with 2 or 3 years of median follow-up predicts 5-year overall survival (5yr OS) in stage II and III colon cancer. ACCENT further proposed (1) that the association between DFS and OS is stronger in stage III than in stage II disease, and (2) that 6 or 7 years of follow-up are necessary to demonstrate DFS/OS surrogacy in recent trials. The relationship between these endpoints in trials with oral fluoropyrimidines, oxaliplatin, and irinotecan is unknown.
Associations between the treatment-effect hazard ratios (HRs) on 2 and 3yr DFS and on 5 and 6yr OS were examined in 6 phase III trials (1997-2002) not included in prior analyses. Individual data for 12,676 patients were analyzed; two trials each tested oxaliplatin, irinotecan, and oral treatment vs 5-FU/LV.
The overall association between 2/3 yr DFS and 5/6 yr OS HRs was modest to poor (simple R2 measures: 0.58 to 0.76; model-based R2: 0.17 to 0.49). In stage III patients, the association increased (model-based R2≥0.79). Observed treatment effects on 2 yr DFS accurately predicted 5/6 yr OS effects overall and in stage III patients.
In recent trials of cytotoxic chemotherapy, 2 or 3yr DFS HRs are highly predictive of 5 and 6yr OS HRs in stage III but not stage II patients. In all patients the DFS/OS association is stronger for 6yr OS; thus at least 6 years of follow-up are recommended to assess OS benefit. These data support DFS as the primary endpoint for stage III colon cancer trials testing cytotoxic agents.
Funded by NCI Grant CA-25224 to the Mayo Clinic to support the North Central Cancer Treatment Group.
Multichannel electroencephalography (EEG) offers a non-invasive tool to explore spatio-temporal dynamics of brain activity. With EEG recordings consisting of multiple trials, traditional signal processing approaches that ignore inter-trial variability in the data may fail to accurately estimate the underlying spatio-temporal brain patterns. Moreover, precise characterization of such inter-trial variability per se can be of high scientific value in establishing the relationship between brain activity and behavior. In this paper, a statistical modeling framework is introduced for learning spatio-temporal decompositions of multiple-trial EEG data recorded under two contrasting experimental conditions. By modeling the variance of source signals as random variables varying across trials, the proposed two-stage hierarchical Bayesian model is able to capture inter-trial amplitude variability in the data in a sparse way, so that a parsimonious representation of the data can be obtained. A variational Bayesian (VB) algorithm is developed for statistical inference of the hierarchical model. The efficacy of the proposed modeling framework is validated with the analysis of both synthetic and real EEG data. In the simulation study we show that even at low signal-to-noise ratios our approach is able to recover with high precision the underlying spatio-temporal patterns and the evolution of source amplitude across trials; on two brain-computer interface (BCI) data sets we show that our VB algorithm can extract physiologically meaningful spatio-temporal patterns and make more accurate predictions than two other widely used algorithms: the common spatial patterns (CSP) algorithm and the Infomax algorithm for independent component analysis (ICA). The results demonstrate that our statistical modeling framework can serve as a powerful tool for extracting brain patterns, characterizing trial-to-trial brain dynamics, and decoding brain states by exploiting useful structures in the data.
Hierarchical Bayesian; Variational Bayesian; Common spatial patterns; Spatio-temporal decomposition; Inter-trial variability; Sparse learning; Brain-computer interface
We investigated best response (BR), complete response (CR), confirmed response (CoR), and progression-free survival (PFS) as putative surrogate endpoints (PSEs) for overall survival (OS), assessing both their associations with OS and their potential as surrogates for OS.
Individual patient (pt) data from 870 untreated ES-SCLC pts participating in 6 single-arm (274 pts) and 3 randomized trials (596 pts) were pooled. Patient-level associations between PSEs and OS were assessed by Cox models using landmark analyses. Trial-level surrogacy of PSEs was assessed by the association between treatment effects on OS and treatment effects on the individual PSEs. Trial-level surrogacy measures included: R2 from a weighted least squares regression model (WLS R2), Spearman's correlation coefficient, and R2 from a bivariate survival model (Copula R2).
Median OS and PFS were 9.6 (95% CI: 9.1-10.0) and 5.5 (95% CI: 5.2-5.9) months, respectively; BR, CR, and CoR rates were 44%, 22%, and 34%, respectively. Patient-level associations showed that PFS status at 4 months was a strong predictor of subsequent survival (HR=0.42 (95% CI: 0.35-0.51); concordance index=0.63; p<0.01), with 6-month PFS being the strongest (HR=0.41 (95% CI: 0.35-0.49); concordance index=0.66; p<0.01). At the trial-level, PFS showed the highest level of surrogacy for OS (WLS R2=0.79; Copula R2=0.80), explaining 79% of the variance in OS. Tumor response endpoints showed lower surrogacy levels (WLS R2≤0.48).
PFS was strongly associated with OS at both the patient and trial levels. PFS also shows promise as a potential surrogate for OS, but further validation is needed using data from a larger number of randomized phase III trials.
extensive-stage small cell lung cancer; surrogate endpoints; pooled analysis; progression-free survival; tumor response
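A sketch of the trial-level WLS R2 computation referred to above, using one common convention for a weighted coefficient of determination; the array names (per-trial log hazard ratios for PFS and OS, and trial weights such as sample sizes) are illustrative, and the pooled analysis additionally used Spearman's correlation and a copula-based bivariate survival model.

    import numpy as np

    def trial_level_wls_r2(log_hr_pfs, log_hr_os, weights):
        """Regress per-trial OS log hazard ratios on PFS log hazard ratios by
        weighted least squares and return the weighted R^2 as a trial-level
        surrogacy measure."""
        x, y, w = (np.asarray(a, dtype=float) for a in (log_hr_pfs, log_hr_os, weights))
        X = np.column_stack([np.ones_like(x), x])
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        resid = y - X @ beta
        y_bar = np.average(y, weights=w)
        return 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_bar)**2)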