A surrogate marker (S) is a variable that can be measured earlier, and often more easily, than the true endpoint (T) in a clinical trial. Most previous research has been devoted to developing surrogacy measures to quantify how well S can replace T or examining the use of S in predicting the effect of a treatment (Z). However, the research often requires one to fit models for the distribution of T given S and Z. It is well known that such models do not have causal interpretations because the models condition on a post-randomization variable S. In this paper, we directly model the relationship among T, S and Z using a potential outcomes framework introduced by Frangakis and Rubin (2002). We propose a Bayesian estimation method to evaluate the causal probabilities associated with the cross-classification of the potential outcomes of S and T when S and T are both binary. We use a log-linear model to directly model the association between the potential outcomes of S and T through the odds ratios. The quantities derived from this approach always have causal interpretations. However, this causal model is not identifiable from the data without additional assumptions. To reduce the non-identifiability problem and increase the precision of statistical inferences, we assume monotonicity and incorporate prior belief that is plausible in the surrogate context by using prior distributions. We also explore the relationship among the surrogacy measures based on traditional models and this counterfactual model. The method is applied to data from a glaucoma treatment study.
Bayesian Estimation; Counterfactual Model; Randomized Trial; Surrogate Marker
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace "true" endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and a general two-stage algorithm are proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
Assessing immune responses to study vaccines as surrogates of protection plays a central role in vaccine clinical trials. Motivated by three ongoing or pending HIV vaccine efficacy trials, we consider such surrogate endpoint assessment in a randomized placebo-controlled trial with case-cohort sampling of immune responses and a time to event endpoint. Based on the principal surrogate definition under the principal stratification framework proposed by Frangakis and Rubin [Biometrics 58 (2002) 21–29] and adapted by Gilbert and Hudgens (2006), we introduce estimands that measure the value of an immune response as a surrogate of protection in the context of the Cox proportional hazards model. The estimands are not identified because the immune response to vaccine is not measured in placebo recipients. We formulate the problem as a Cox model with missing covariates, and employ novel trial designs for predicting the missing immune responses and thereby identifying the estimands. The first design utilizes information from baseline predictors of the immune response, and bridges their relationship in the vaccine recipients to the placebo recipients. The second design provides a validation set for the unmeasured immune responses of uninfected placebo recipients by immunizing them with the study vaccine after trial closeout. A maximum estimated likelihood approach is proposed for estimation of the parameters. Simulated data examples are given to evaluate the proposed designs and study their properties.
Clinical trial; discrete failure time model; missing data; potential outcomes; principal stratification; surrogate marker
Frangakis and Rubin (2002, Biometrics 58, 21–29) proposed a new definition of a surrogate endpoint (a “principal” surrogate) based on causal effects. We introduce an estimand for evaluating a principal surrogate, the causal effect predictiveness (CEP) surface, which quantifies how well causal treatment effects on the biomarker predict causal treatment effects on the clinical endpoint. Although the CEP surface is not identifiable due to missing potential outcomes, it can be identified by incorporating a baseline covariate(s) that predicts the biomarker. Given case–cohort sampling of such a baseline predictor and the biomarker in a large blinded randomized clinical trial, we develop an estimated likelihood method for estimating the CEP surface. This estimation assesses the “surrogate value” of the biomarker for reliably predicting clinical treatment effects for the same or similar setting as the trial. A CEP surface plot provides a way to compare the surrogate value of multiple biomarkers. The approach is illustrated by the problem of assessing an immune response to a vaccine as a surrogate endpoint for infection.
Case cohort; Causal inference; Clinical trial; HIV vaccine; Postrandomization selection bias; Structural model; Prentice criteria; Principal stratification
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al. 1996), which is defined as the treatment effect for subjects who would comply regardless of the assigned treatment. Following the idea of principal stratification (Frangakis & Rubin 2002), we define principal compliance (Little et al. 2009) in trials with three treatment arms, extend CACE and define causal estimands of interest in this setting. In addition, we discuss structural assumptions needed for estimation of causal effects and the identifiability problem inherent in this setting from both a Bayesian and a classical statistical perspective. We propose a likelihood-based framework that models potential outcomes in this setting and a Bayes procedure for statistical inference. We compare our method with a method of moments approach proposed by Cheng & Small (2006) using a hypothetical data set, and further illustrate our approach with an application to a behavioral intervention study (Janevic et al. 2003).
Causal Inference; Complier Average Causal Effect; Multi-arm Trials; Non-compliance; Principal Compliance; Principal Stratification
This commentary takes up Pearl's welcome challenge to clearly articulate the scientific value of principal stratification estimands that we and colleagues have investigated, in the area of randomized placebo-controlled preventive vaccine efficacy trials, especially trials of HIV vaccines. After briefly arguing that certain principal stratification estimands for studying vaccine effects on post-infection outcomes are of genuine scientific interest, the bulk of our commentary argues that the “causal effect predictiveness” (CEP) principal stratification estimand for evaluating immune biomarkers as surrogate endpoints is not of ultimate scientific interest, because it evaluates surrogacy restricted to the setting of a particular vaccine efficacy trial, but is nevertheless useful for guiding the selection of primary immune biomarker endpoints in Phase I/II vaccine trials and for facilitating assessment of transportability/bridging surrogacy.
principal stratification; causal inference; vaccine trial
Treatment noncompliance and missing outcomes at posttreatment assessments are common problems in field experiments in naturalistic settings. Although the two complications often occur simultaneously, statistical methods that address both complications have not been routinely considered in data analysis practice in the prevention research field. This paper shows that identification and estimation of causal treatment effects considering both noncompliance and missing outcomes can be relatively easily conducted under various missing data assumptions. We review a few assumptions on missing data in the presence of noncompliance, including the latent ignorability proposed by Frangakis and Rubin (Biometrika 86:365–379, 1999), and show how these assumptions can be used in the parametric complier average causal effect (CACE) estimation framework. As an easy way of sensitivity analysis, we propose the use of alternative missing data assumptions, which will provide a range of causal effect estimates. In this way, we are less likely to settle with a possibly biased causal effect estimate based on a single assumption. We demonstrate how alternative missing data assumptions affect identification of causal effects, focusing on the CACE. The data from the Johns Hopkins School Intervention Study (Ialongo et al., Am J Community Psychol 27:599–642, 1999) will be used as an example.
Causal inference; Complier average causal effect; Latent ignorability; Missing at random; Missing data; Noncompliance
There has been a recent emphasis on the identification of biomarkers and other biologic measures that may be potentially used as surrogate endpoints in clinical trials. We focus on the setting of data from a single clinical trial. In this paper, we consider a framework in which the surrogate must occur before the true endpoint. This suggests viewing the surrogate and true endpoints as semi-competing risks data; this approach is new to the literature on surrogate endpoints and leads to an asymmetrical treatment of the surrogate and true endpoints. However, such a data structure also conceptually complicates many of the previously considered measures of surrogacy in the literature. We propose novel estimation and inferential procedures for the relative effect and adjusted association quantities proposed by Buyse and Molenberghs (1998, Biometrics, 1014–1029). The proposed methodology is illustrated with application to simulated data, as well as to data from a leukemia study.
Bivariate survival data; Copula model; Dependent Censoring; Multivariate failure time data; Prentice criterion
Overall survival (OS) is the gold standard for the demonstration of a clinical benefit in cancer trials. Replacement of OS by a surrogate endpoint allows trial duration to be reduced. To date, few surrogate endpoints have been validated in digestive oncology. The aim of this study was to draw up an ordered list of potential surrogate endpoints for OS in digestive cancer trials, by way of a survey among clinicians and methodologists. A secondary objective was to obtain their opinion on surrogacy and quality of life (QoL).
In 2007 and 2008, self-administered sequential questionnaires were sent to a panel of French clinicians and methodologists involved in the conduct of cancer clinical trials. In the first questionnaire, panellists were asked to choose the most important characteristics defining a surrogate among six proposals, to give advantages and drawbacks of the surrogates, and to answer questions about their validation and use. They then had to suggest potential surrogate endpoints for OS in each of the following tumour sites: oesophagus, stomach, liver, pancreas, biliary tract, lymphoma, colon, rectum, and anus. They finally gave their opinion on QoL as a surrogate endpoint. In the second questionnaire, they had to classify the previously proposed candidate surrogates from the most (position #1) to the least relevant in their opinion.
The frequency at which the endpoints were chosen as the first, second or third most relevant surrogate was calculated and served as the final ranking.
The response rate was 30% (24/80) in the first round and 20% (16/80) in the second. Participants highlighted key points concerning surrogacy. In particular, they emphasized that a surrogate endpoint is expected to predict clinical benefit in a well-defined therapeutic situation. Half of them thought it was not relevant to study QoL as a surrogate for OS.
DFS, in the neoadjuvant settings or early stages, and PFS, in the non-operable or metastatic settings, were ranked first, with a frequency of more than 69% in 20 out of 22 settings. PFS was proposed in association with QoL in metastatic primary liver and stomach cancers (both 81%). This composite endpoint was ranked second in metastatic oesophageal (69%), colorectal (56%) and anal (56%) cancers, whereas QoL alone was also suggested in most metastatic situations.
Other endpoints frequently suggested were R0 resection in the neoadjuvant settings (oesophagus (69%), stomach (56%), pancreas (75%) and biliary tract (63%)) and response. An unexpected endpoint was metastatic PFS in non-operable oesophageal (31%) and pancreatic (44%) cancers. The quality and results of surgical procedures, such as sphincter preservation, were also cited as eligible surrogate endpoints in rectal (19%) and anal (50% in case of localized disease) cancers. Except for alpha-FP kinetics in hepatocellular carcinoma (13%) and CA19-9 decline in pancreatic cancer (6%), few endpoints based on biological or tumour markers were proposed.
The overall results should help prioritise the endpoints to be statistically evaluated as surrogates for OS, so that trialists and clinicians can rely on endpoints that ensure relevant clinical benefit to the patient.
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this paper, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. While marginal risks do not measure causal associations of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.
Estimated likelihood; Identifiability; Principal stratification; Sensitivity analysis; Surrogate endpoint; Vaccine trials
The meta-analytic approach to evaluating surrogate end points assesses the predictiveness of treatment effect on the surrogate toward treatment effect on the clinical end point based on multiple clinical trials. Definition and estimation of the correlation of treatment effects were developed in linear mixed models and later extended to binary or failure time outcomes on a case-by-case basis. In a general regression setting that covers nonnormal outcomes, we discuss in this paper several metrics that are useful in the meta-analytic evaluation of surrogacy. We propose a unified 3-step procedure to assess these metrics in settings with binary end points, time-to-event outcomes, or repeated measures. First, the joint distribution of estimated treatment effects is ascertained by an estimating equation approach; second, the restricted maximum likelihood method is used to estimate the means and the variance components of the random treatment effects; finally, confidence intervals are constructed by a parametric bootstrap procedure. The proposed method is evaluated by simulations and applications to 2 clinical trials.
Causal inference; Meta-analysis; Surrogacy
In this note, we address the problem of surrogacy using a causal modelling framework that differs substantially from the potential outcomes model that pervades the biostatistical literature. The framework comes from econometrics and conceptualizes direct effects of the surrogate endpoint on the true endpoint. While this framework can incorporate the so-called semi-competing risks data structure, we also derive a fundamental non-identifiability result. Relationships to existing causal modelling frameworks are also discussed.
Clinical Trial; Counterfactual; Dependence; Nonlinear response; Prentice Criterion; Rubin causal model
Recent technological advances have made it possible to simultaneously measure multiple protein activities at the single cell level. With such data collected under different stimulatory or inhibitory conditions, it is possible to infer the causal relationships among proteins from single cell interventional data. In this article we propose a Bayesian hierarchical modeling framework to infer the signaling pathway based on the posterior distributions of parameters in the model. Under this framework, we consider network sparsity and model the existence of an association between two proteins both at the overall level across all experiments and at each individual experimental level. This allows us to infer the pairs of proteins that are associated with each other and their causal relationships. We also explicitly consider both intrinsic noise and measurement error. Markov chain Monte Carlo is implemented for statistical inference. We demonstrate that this hierarchical modeling can effectively pool information from different interventional experiments through simulation studies and real data analysis.
Bayesian network; dependency network; Gaussian graphical model; hierarchical model; interventional data; Markov chain Monte Carlo; mixture distribution; single cell measurements; signaling pathway
When identification of causal effects relies on untestable assumptions regarding nonidentified parameters, sensitivity of causal effect estimates is often questioned. For proper interpretation of causal effect estimates in this situation, deriving bounds on causal parameters or exploring the sensitivity of estimates to scientifically plausible alternative assumptions can be critical. In this paper, we propose a practical way of bounding and sensitivity analysis, where multiple identifying assumptions are combined to construct tighter common bounds. In particular, we focus on the use of competing identifying assumptions that impose different restrictions on the same non-identified parameter. Since these assumptions are connected through the same parameter, direct translation across them is possible. Based on this cross-translatability, various information in the data, carried by alternative assumptions, can be effectively combined to construct tighter bounds on causal effects. Flexibility of the suggested approach is demonstrated focusing on the estimation of the complier average causal effect (CACE) in a randomized job search intervention trial that suffers from noncompliance and subsequent missing outcomes.
alternative assumptions; bounds; causal inference; missing data; noncompliance; principal stratification; sensitivity analysis
Multiple treatment comparison (MTC) meta-analyses are commonly modeled in a Bayesian framework, and weakly informative priors are typically preferred to mirror familiar data-driven frequentist approaches. Random-effects MTCs have commonly modeled heterogeneity under the assumption that the between-trial variances for all involved treatment comparisons are equal (i.e., the 'common variance' assumption). This approach 'borrows strength' for heterogeneity estimation across treatment comparisons, and thus adds valuable precision when data are sparse. The homogeneous variance assumption, however, is unrealistic and can severely bias variance estimates. Consequently, 95% credible intervals may not retain nominal coverage, and treatment rank probabilities may become distorted. Relaxing the homogeneous variance assumption may be equally problematic due to reduced precision. To regain good precision, moderately informative variance priors or additional mathematical assumptions may be necessary.
In this paper we describe four novel approaches to modeling heterogeneity variance - two novel model structures, and two approaches for use of moderately informative variance priors. We examine the relative performance of all approaches in two illustrative MTC data sets. We particularly compare between-study heterogeneity estimates and model fits, treatment effect estimates and 95% credible intervals, and treatment rank probabilities.
In both data sets, use of moderately informative variance priors constructed from the pairwise meta-analysis data yielded the best model fit and narrower credible intervals. Imposing consistency equations on variance estimates, assuming variances to be exchangeable, or using empirically informed variance priors also yielded good model fits and narrow credible intervals. The homogeneous variance model yielded high precision at all times, but overall inadequate estimates of between-trial variances. Lastly, treatment rankings were similar among the novel approaches, but considerably different when compared with the homogeneous variance approach.
MTC models using a homogeneous variance structure appear to perform sub-optimally when between-trial variances vary between comparisons. Using informative variance priors, assuming exchangeability or imposing consistency between heterogeneity variances can all ensure sufficiently reliable and realistic heterogeneity estimation, and thus more reliable MTC inferences. All four approaches should be viable candidates for replacing or supplementing the conventional homogeneous variance MTC model, which is currently the most widely used in practice.
To investigate whether progression-free survival (PFS) can be considered a surrogate endpoint for overall survival (OS) in advanced non-small-cell lung cancer (NSCLC).
Meta-analysis of individual patient data from randomised trials.
Five randomised controlled trials comparing docetaxel-based chemotherapy with vinorelbine-based chemotherapy for the first-line treatment of NSCLC.
2331 patients with advanced NSCLC.
Surrogacy of PFS for OS was assessed through the association between these endpoints and between the treatment effects on these endpoints. The surrogate threshold effect was the minimum treatment effect on PFS required to predict a non-zero treatment effect on OS.
The median follow-up of patients still alive was 23.4 months. Median OS was 10 months and median PFS was 5.5 months. The treatment effects on PFS and OS were correlated, whether using centres (R²=0.62, 95% CI 0.52 to 0.72) or prognostic strata (R²=0.72, 95% CI 0.60 to 0.84) as units of analysis. The surrogate threshold effect was a PFS hazard ratio (HR) of 0.49 using centres or 0.53 using prognostic strata.
These analyses provide only modest support for considering PFS as an acceptable surrogate for OS in patients with advanced NSCLC. Only treatments that have a major impact on PFS (risk reduction of at least 50%) would be expected to also have a significant effect on OS. Whether these results also apply to targeted therapies is an open question that requires independent evaluation.
This study aimed to prospectively examine families created using surrogacy over a 10-year period in the UK with respect to intending parents' and children's relationship with the surrogate mother, parents' decisions over disclosure and children's understanding of the nature of their conception.
Semi-structured interviews were administered by trained researchers to intending mothers, intending fathers and children on four occasions over a 10-year period. Forty-two families (19 with a genetic surrogate mother) participated when the child was 1-year old and by age 10 years, 33 families remained in the study. Data were collected on the frequency of contact with the surrogate mother, relationship with the surrogate, disclosure of surrogacy to the child and the child's understanding of their surrogacy birth.
Frequency of contact between surrogacy families and their surrogate mother decreased over time, particularly for families whose surrogate was a previously unknown genetic carrier (P < 0.001) (i.e. where they had met through a third party and the surrogate mother's egg was used to conceive the child). Most families reported harmonious relationships with their surrogate mother. At age 10 years, 19 (90%) children who had been informed of the nature of their conception had a good understanding of this and 13 of the 14 children who were in contact with their surrogate reported that they liked her.
Surrogacy families maintained good relationships with the surrogate mother over time. Children felt positive about their surrogate mother and their surrogacy birth. The sample size of this study was small and further, larger investigations are needed before firm conclusions can be drawn.
genetic surrogacy; gestational surrogacy; disclosure; surrogate
The use of biological surrogates as proxies for biodiversity patterns is gaining popularity, particularly in marine systems where field surveys can be expensive and species richness high. Yet, uncertainty regarding their applicability remains because of inconsistency of definitions, a lack of standard methods for estimating effectiveness, and variable spatial scales considered. We present a Bayesian meta-analysis of the effectiveness of biological surrogates in marine ecosystems. Surrogate effectiveness was defined both as the proportion of surrogacy tests where predictions based on surrogates were better than random (i.e., low probability of making a Type I error; P) and as the predictability of targets using surrogates (R2). A total of 264 published surrogacy tests combined with prior probabilities elicited from eight international experts demonstrated that the habitat, spatial scale, type of surrogate and statistical method used all influenced surrogate effectiveness, at least according to either P or R2. The type of surrogate used (higher-taxa, cross-taxa or subset taxa) was the best predictor of P, with the higher-taxa surrogates outperforming all others. The marine habitat was the best predictor of R2, with particularly low predictability in tropical reefs. Surrogate effectiveness was greatest for higher-taxa surrogates at a <10-km spatial scale, in low-complexity marine habitats such as soft bottoms, and using multivariate-based methods. Comparisons with terrestrial studies in terms of the methods used to study surrogates revealed that marine applications still ignore some problems with several widely used statistical approaches to surrogacy. Our study provides a benchmark for the reliable use of biological surrogates in marine ecosystems, and highlights directions for future development of biological surrogates in predicting biodiversity.
Two statistical methodologies are mainly applicable to the design and analysis of clinical trials: frequentist and Bayesian. Most traditional clinical trial designs are based on frequentist statistics. In frequentist statistics, prior information is utilized formally only in the design of a clinical trial but not in the analysis of the data. On the other hand, Bayesian statistics provide a formal mathematical method for combining prior information with current information at the design stage, during the conduct of the trial, and at the analysis stage. It is easier to implement adaptive trial designs using Bayesian methods than frequentist methods. The Bayesian approach can also be applied for post-marketing surveillance purposes and in meta-analysis. The basic tenets of good trial design are the same for both Bayesian and frequentist trials. It has been recommended that the type of analysis to be used (Bayesian or frequentist) should be chosen beforehand. Switching to an analysis method that produces a more favorable outcome after observing the data is not recommended.
Adaptive trial; Bayesian statistics; drug development
Pearl (2011) asked for the causal inference community to clarify the role of the principal stratification framework in the analysis of causal effects. Here, I argue that the notion of principal stratification has shed light on problems of non-compliance, censoring-by-death, and the analysis of post-infection outcomes; that it may be of use in considering problems of surrogacy but further development is needed; that it is of some use in assessing “direct effects”; but that it is not the appropriate tool for assessing “mediation.” There is nothing within the principal stratification framework that corresponds to a measure of an “indirect” or “mediated” effect.
causal inference; mediation; non-compliance; potential outcomes; principal stratification; surrogates
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable. This can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which it is not necessary that the same genetic markers are measured in each study. This provides an overall estimate of the causal relationship between the phenotype and the outcome, and an assessment of its heterogeneity across studies. As an example, we estimate the causal relationship of blood concentrations of C-reactive protein on fibrinogen levels using data from 11 studies. These methods provide a flexible framework for efficient estimation of causal relationships derived from multiple studies. Issues discussed include weak instrument bias, analysis of binary outcome data such as disease risk, missing genetic data, and the use of haplotypes.
Mendelian randomization; instrumental variables; causal association; meta-analysis; Bayesian methods
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this paper we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed. In an ideal but often unrealistic setting, when S satisfies Prentice’s definition for perfect surrogacy, there is the potential for substantial gain in precision by using data from S to estimate the treatment effect on T. When S is not close to a perfect surrogate, it can provide substantial information only under particular circumstances. We propose to use a targeted shrinkage regression approach that data-adaptively takes advantage of the potential efficiency gain yet avoids the need to make a strong surrogacy assumption. Simulations show that this approach strikes a balance between bias and efficiency gain. Compared with competing methods, it has better mean squared error properties and can achieve substantial efficiency gain, particularly in a common practical setting when S captures much but not all of the treatment effect and the sample size is relatively small. We apply the proposed method to a glaucoma data example.
Auxiliary Variable; Biomarker; Randomized Trials; Ridge Regression; Missing Data
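The targeted shrinkage approach above builds on ridge-type regression. As a generic illustration of the shrinkage ingredient only (not the authors' data-adaptive estimator), here is the closed-form ridge solution, with all dimensions and values hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)      # lam = 0 recovers ordinary least squares
b_shrunk = ridge(X, y, 50.0)  # larger lam pulls coefficients toward zero
```

The bias-efficiency trade-off the abstract describes corresponds to the choice of penalty: more shrinkage lowers variance at the cost of bias, and a data-adaptive choice aims to gain efficiency when the surrogate carries much of the treatment effect.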
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated by successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival to a particular time computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted instead of true result for the estimated treatment effect, we compute its multiplier of standard error. Software is available for download.
Randomized trials; Reproducibility; Principal stratification
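The leave-one-trial-out extrapolation error described above can be sketched as follows, using a simple linear prediction model. The trial-level effect estimates below are made-up numbers for illustration; the paper's mixture and principal stratification models are not shown.

```python
import numpy as np

# Hypothetical historical trials: each contributes an estimated treatment
# effect on the surrogate (s) and on the true endpoint (t).
s = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.85])
t = np.array([0.08, 0.20, 0.35, 0.50, 0.66, 0.80])

def fit_line(s, t):
    """Least-squares line t ~ a + b*s at the trial level."""
    A = np.column_stack([np.ones_like(s), s])
    return np.linalg.lstsq(A, t, rcond=None)[0]

# Successively leave out each historical trial, refit the prediction model,
# and record the error in predicting the held-out trial's true-endpoint effect.
errors = []
for i in range(len(s)):
    mask = np.arange(len(s)) != i
    a, b = fit_line(s[mask], t[mask])
    errors.append(t[i] - (a + b * s[i]))
extrap_sd = np.std(errors, ddof=1)  # estimated extrapolation error

# Predict the effect in a target trial where only the surrogate is observed.
a, b = fit_line(s, t)
s_new = 0.5                          # hypothetical surrogate effect
t_pred = a + b * s_new
```

The extrapolation standard deviation feeds the widened confidence interval and the standard-error multiplier the abstract refers to.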
Health information technology evaluators need to distinguish between intervention efficacy as assessed in the ideal circumstances of clinical trials and intervention effectiveness as assessed in the real-world circumstances of actual practice. Because current evaluation study designs do not routinely allow for this distinction, we have developed a framework for evaluation of implementation fidelity that considers health information technologies as complex interventions and makes use of common intervention components as defined in the Oxford Implementation Index. We also propose statistical methods for the evaluation of interventions at the system and component level using the Rubin Causal Model. We then describe how to apply this framework to evaluate an ongoing clinical trial of three health information technology interventions currently implemented in a 17,000-patient community-based health network caring for Medicaid beneficiaries in Durham County, North Carolina.
Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models.
We used individual patient data from 8509 patients in 231 centers with moderate or severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as the outcome, both dichotomized and ordinal, with center and/or trial as random effects, and with age, motor score, pupil reactivity, and trial as covariates. We then compared frequentist and Bayesian implementations for estimating the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), the R package MCMCglmm and the SAS experimental procedure MCMC.
Three data sets (the full data set and two sub-datasets) were analysed using two logistic random effects models, with either a single random effect for center or two random effects for center and trial. For the ordinal outcome in the full data set, a proportional odds model with a random center effect was also fitted.
The packages gave similar parameter estimates for both the fixed and random effects in the binary (and ordinal) models for the main study, i.e. when the number of level-1 (patient-level) units was large relative to the number of level-2 (hospital-level) units. However, on a relatively sparse data set, i.e. when the numbers of level-1 and level-2 units were about the same, the frequentist and Bayesian approaches gave somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient.
On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no clear preference, beyond philosophical considerations, for either a frequentist or a Bayesian approach (if based on vague priors); the choice of implementation may largely depend on the desired flexibility and the usability of the package. For small data sets the random effects variances are difficult to estimate: in the frequentist approaches the maximum likelihood estimate of this variance was often zero, with a standard error that was either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior for the variance parameter. The starting value for the variance parameter may also be critical for the convergence of the Markov chain.
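The model being compared across all these packages is the random-intercept logistic model, logit P(y_ij = 1) = x_ij'β + u_j with u_j ~ N(0, σ²) for center j. The sketch below is not any of the compared implementations; it simulates data from that model with hypothetical parameter values and fits a plain (pooled) logistic regression by Newton-Raphson, which ignores the clustering that GLMM software would integrate out.

```python
import numpy as np

rng = np.random.default_rng(2)
n_centers, n_per = 30, 50
sigma_u = 0.8  # hypothetical between-center SD of the random intercept

# Simulate: logit P(y=1) = beta0 + beta1*x + u[center], u ~ N(0, sigma_u^2)
beta0, beta1 = -0.5, 1.0
center = np.repeat(np.arange(n_centers), n_per)
u = rng.normal(0.0, sigma_u, n_centers)
x = rng.normal(size=n_centers * n_per)
eta = beta0 + beta1 * x + u[center]
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

def logit_fit(X, y, iters=25):
    """Plain logistic regression by Newton-Raphson (no random effects)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

X = np.column_stack([np.ones_like(x), x])
beta_pooled = logit_fit(X, y)  # marginal fit; a GLMM fit (e.g. lme4::glmer)
                               # would additionally estimate sigma_u
```

The pooled slope is attenuated toward zero relative to the conditional (subject-specific) coefficient, which is one reason the marginal and mixed-model estimates reported by the various packages are not directly interchangeable.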