In recent years, the medical and biostatistical literature has devoted considerable attention to surrogate versus true endpoints, where a true endpoint is the health outcome of interest and a surrogate endpoint is an outcome observed before the true endpoint that is used to draw conclusions about the effect of an intervention on the true endpoint (Weir and Walley, 2006; Lassere, 2008). By using surrogate instead of true endpoints, clinicians can draw conclusions sooner, thereby potentially helping more future patients. However, the use of surrogate endpoints to draw these conclusions involves the additional uncertainty associated with extrapolating estimates to an unobserved true endpoint.
To clearly and concisely discuss this extrapolation, we introduce the following terminology. A target trial is a randomized trial of the intervention of interest in which data are available on surrogate but not true endpoints. An historical trial is a previously conducted randomized trial with the same surrogate endpoint as in the target trial and the same true endpoint that would be observed with sufficiently long follow-up in the target trial, but typically comparing different interventions than in the target trial. A true result is the estimated effect of an intervention on the true endpoint in the target trial based on a comparison of the true endpoints from each arm of the target trial that would be observed with sufficiently long follow-up. A surrogate result is the estimated effect of an intervention on the surrogate endpoint in the target trial based on a comparison of surrogate endpoints from each arm of the target trial. A predicted result is the estimated effect of an intervention on the true endpoint in the target trial based on a comparison of predicted true endpoints from each arm of the target trial that are derived from the surrogate endpoints in the target trial and a model relating surrogate and true endpoints that is fit to data from the historical trials.
Analyzing trial data with surrogate endpoints requires an extrapolation procedure, which is a method to draw conclusions about the true result in the target trial based on either the surrogate or predicted result. When substituting a surrogate result for a true result in the target trial, the extrapolation procedure consists of testing a null hypothesis of no effect of treatment on the surrogate endpoint and drawing conclusions about a null hypothesis of no effect of treatment on the true endpoint. When the surrogate endpoint is binary, this extrapolation procedure yields correct conclusions if the following criteria hold in the target trial: (i) the Prentice Criterion, namely that the true endpoint is conditionally independent of the randomization group given the surrogate endpoint, (ii) treatment is prognostic for the surrogate and true endpoints, and (iii) the surrogate endpoint is prognostic for the true endpoint (Prentice, 1989; Buyse and Molenberghs, 1998). When substituting a predicted result for the true result, an extrapolation procedure consists of computing a 95% confidence interval for the predicted result (Daniels and Hughes, 1997; Gail et al., 2002; Korn, Albert, and McShane, 2005).
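The Prentice Criterion in (i) above can be stated compactly. As a sketch in assumed notation (the symbols are introduced here for illustration and are not taken from the text), let $Z$ denote the randomization group, $S$ the surrogate endpoint, and $T$ the true endpoint. Conditional independence of $T$ and $Z$ given $S$ then reads:

```latex
\Pr(T \mid S, Z) = \Pr(T \mid S)
```

In words, once the surrogate endpoint is known, the randomization group carries no further information about the true endpoint.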
Before using an extrapolation procedure to draw conclusions from a surrogate endpoint, it is important to compute a validation measure, which is a statistic based on historical trials that indicates whether the extrapolation procedure will likely yield correct conclusions. For a surrogate result, one validation measure is a test of the Prentice Criterion in historical trials. However, failure to reject the Prentice Criterion does not ensure that it holds. Another validation measure for a surrogate result is the proportion of treatment effect explained in a historical trial (Freedman, Graubard, and Schatzkin, 1992). However, it is difficult to select an acceptable value of this measure. For a predicted result, a commonly used validation measure is $R^2_{\mathrm{trial}}$, which equals one minus the ratio of the estimated variances of the true result adjusting, versus not adjusting, for the surrogate result in the historical trials (Buyse et al., 2000). However, selecting a threshold for $R^2_{\mathrm{trial}}$ that suffices for validation is difficult (Burzykowski and Buyse, 2006).
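The variance-ratio measure for a predicted result described above, often denoted $R^2_{\mathrm{trial}}$ in the surrogate-endpoint literature, can be written out as follows. This is a transcription of the verbal definition into assumed notation: $\Delta_T$ and $\Delta_S$ stand for the true and surrogate results across the historical trials, and both symbols are introduced here for illustration only.

```latex
R^2_{\mathrm{trial}}
  = 1 - \frac{\widehat{\mathrm{var}}\left(\Delta_T \mid \Delta_S\right)}
             {\widehat{\mathrm{var}}\left(\Delta_T\right)}
```

A value near one indicates that adjusting for the surrogate result removes most of the between-trial variability in the true result, while a value near zero indicates that it removes little.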
We consider surrogate and true endpoints involving probabilities of binary outcomes or probabilities of survival to a particular time computed from censored survival data. In this context, we make the following three contributions to the evaluation of an intervention in the target trial using the predicted result. First, we propose, as an extrapolation procedure, a 95% confidence interval for the predicted result which is derived from a prediction model and an estimated random extrapolation error. Second, we propose, as a validation measure, the coverage of the 95% confidence interval for the predicted result using a simulation based on the historical trials. Third, we propose, as a summary of the additional uncertainty when using predicted instead of true results, a standard error multiplier based on the estimated variances of predicted and true results.
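The two summary quantities named above can be illustrated with a minimal Python sketch. It is not the procedure developed in this paper: the function names are hypothetical, the coverage function simply counts how often a set of already-computed intervals contains the corresponding true results, and the multiplier is assumed here to be the square root of the ratio of the estimated variance of the predicted result to that of the true result.

```python
import math


def empirical_coverage(lower, upper, truth):
    """Fraction of intervals [lower[i], upper[i]] that contain truth[i].

    Hypothetical helper: the intervals are assumed to have been computed
    elsewhere, for example one per simulated or held-out historical trial.
    """
    if not (len(lower) == len(upper) == len(truth)) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= t <= hi for lo, hi, t in zip(lower, upper, truth))
    return hits / len(truth)


def standard_error_multiplier(var_predicted, var_true):
    """Assumed form: sqrt(var of predicted result / var of true result)."""
    if var_predicted < 0 or var_true <= 0:
        raise ValueError("variances must be non-negative and var_true positive")
    return math.sqrt(var_predicted / var_true)
```

Under this assumed form, a multiplier of 2 would mean that the standard error of the predicted result is twice that of the true result, and a coverage well below 0.95 would suggest that the nominal 95% intervals are too narrow.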