
Stat Biopharm Res. Author manuscript; available in PMC 2010 June 23.

Published in final edited form as:

Stat Biopharm Res. 2010 May 1; 2(2): 229–238.

doi: 10.1198/sbr.2009.0070 PMCID: PMC2890300

NIHMSID: NIHMS171595

Jie Huang, Novartis Pharmaceuticals, Oncology Business Unit, East Hanover, NJ 07936 (Email: moc.sitravon@gnauh.eij).


Using surrogate endpoints in clinical trials is desirable for drug development because the trials can be shortened and therefore made more cost-effective. Validating a surrogate for the clinical endpoint is critical in this context. One of the key steps in the statistical validation of a surrogate in a single trial is to estimate the proportion of treatment effect explained (PTE or PE) by the surrogate. Often the PTE is estimated from the difference between the treatment coefficients of two models, one adjusting for the surrogate and one not. This method has inherent problems: the two models may not be valid simultaneously, and the estimate can often lie outside the interval [0, 1]. In this article, we provide alternative measures for evaluating the proportion of treatment effect explained by a surrogate in logistic or probit regression models. Our measures can be estimated easily with any statistical program capable of binary linear regression modeling, and their interpretation can be illustrated using ordinal dominance (OD) curves, a concept that practical users can grasp visually. Simulation shows that our alternative measures yield more accurate estimates, which are less biased and less variable and have narrower confidence intervals. A clinical trial example is provided.

In clinical trials, study endpoints such as cancer survival or cardiovascular events require a prolonged follow-up time. The trials can be costly; it may be difficult to enroll patients, and even more difficult to follow and monitor them. A surrogate endpoint, however, can usually be observed early during a trial, and can thus serve as an attractive substitute for the clinical endpoint in studying treatment effect. Using surrogate endpoints, trials may be shortened, and it may be possible to avoid fatal clinical endpoints before drug approval. This makes efficacious treatment available to patients sooner, saves patients’ lives, and reduces medical expenditure. However, a critical question must be answered beforehand: What may serve as a valid surrogate endpoint? Both the U.S. Food and Drug Administration (FDA) (Temple 1995) and the European Medicines Agency (EMEA) (EMEA/CHMP report 2007) have recognized the importance of developing and validating surrogate endpoints. The validation of biomarkers as surrogate endpoints is part of FDA’s “critical path initiative.” Workshops and meetings have been organized among regulatory agencies, industry representatives, and academics to discuss and set out position statements on surrogate endpoints (NIAID Workshop 1989; Biomarkers Definitions Working Group 2001; DeGruttola et al. 2001). In recent years, statistical validation of surrogate endpoints has attracted increased attention.

In his landmark article, Prentice (1989) defined a surrogate as “a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the true endpoint.” He defined three criteria for admitting a valid surrogate. One of the criteria requires that “the full effect of treatment on the true endpoint is captured by the surrogate, that is, *f* (*T* |*S, Z*) = *f* (*T* |*S*).” This condition is rather stringent and unlikely to be true in practice. To validate this equation, Freedman, Graubard, and Schatzkin (1992) proposed a statistic to measure the proportion of the treatment effect explained by the surrogate (hereafter PE^{(FGS)}). This statistic is defined as the percentage change between the treatment effects estimated from two models, one adjusting for the surrogate marker and one not. It is well known that PE^{(FGS)} suffers from several serious drawbacks (DeGruttola et al. 1997; Bycott and Taylor 1998; Wang and Taylor 2002). In particular, either the point estimate or its confidence interval can fall outside the range [0, 1]. As a result, it often fails to provide a meaningful assessment of “the proportion of treatment effect explained by a surrogate marker.” Subsequently, Wang and Taylor (2002) provided the more desirable alternative measures *F*(*F*’) for PTE, which are less variable and have point estimates within [0, 1] under certain conditions. When the surrogate marker and the outcome endpoint are both continuous or both binary, *F*(*F*’) can be calculated easily. For a continuous surrogate marker and a binary outcome, Wang and Taylor (2002) suggested two approaches to derive the measures, but one is computationally difficult, and the other is not easily interpretable. Evaluating a continuous surrogate for a binary endpoint is a situation often encountered in clinical studies.
For example, blood pressure level has been accepted by clinicians and regulators as a surrogate for the incidence of stroke and congestive heart failure (Biomarkers Definitions Working Group 2001; DeGruttola et al. 2001). In an oncology imaging study, the use of post-therapy FDG-PET as a metabolic surrogate marker of tumor response in cervical cancer was prospectively validated (Schwarz et al. 2007). In this article, we provide alternative measures to evaluate the proportion of treatment effect explained by a surrogate in logistic and probit regression.

The remainder of this article is organized as follows. Section 2 introduces the definition of PTE in general. Section 3 provides our simple alternative PTE measures for the case of a continuous surrogate marker and a binary outcome. Our measures can be obtained easily with any commonly used statistical computation package capable of binary linear regression modeling. Interpretations of our measures are provided using ordinal dominance (OD) curves (Bamber 1975), a concept that can be easily understood by practical users. Section 4 presents simulation results comparing our measures with PE^{(FGS)}; both are also compared with the estimated true PTE from the continuous latent outcome underlying the binary outcome. In Section 5, the method is illustrated with a clinical example. Discussion and conclusions are provided in Section 6.

Freedman, Graubard, and Schatzkin (1992) defined the proportion of treatment effect explained by the surrogate (PE^{(FGS)}) as the percentage change in the coefficient of the treatment effect between two models, one adjusting for the surrogate and one not. Their article focused on the binary outcome in the logistic model. The method provides an intuitive concept, but the two logistic models may not be valid at the same time. Estimation of PE^{(FGS)} has been shown to be quite variable; the point estimate and its confidence interval can lie outside the interval [0, 1] (Freedman 2001). When the endpoint is the time to event, Lin, Fleming, and DeGruttola (1997) defined PTE based on two Cox proportional hazards (PH) models, with and without adjustment for the surrogate. The two models can have different baseline hazards. Although both models may be approximated by valid Cox PH models, the confidence interval is again very wide.

Consider estimating PTE in the generalized linear model setting. The outcome observation (*t*_{i}) is assumed to come from a natural exponential family with density function:

$$f({t}_{i};{\theta}_{i},\varphi )=\mathrm{exp}\left[\frac{{t}_{i}{\theta}_{i}-b\left({\theta}_{i}\right)}{a\left(\varphi \right)}+c({t}_{i},\varphi )\right].$$

The outcome (*t*_{i}) relates to treatment (*z*_{i}) and surrogate (*s*_{i}) through the full model

$$E({T}_{i}\mid {z}_{i},{s}_{i})={h}^{-1}({\beta}_{0}+{\beta}_{1}{s}_{i}+{\beta}_{2}{z}_{i}).$$

Here *h*(.) is the link function for the generalized linear model. If this full model holds, the marginal model of *t*_{i}, conditional on treatment only, would be obtained by integrating over the distribution function of *s*_{i} given *z*_{i}:

$$f({t}_{i}\mid {z}_{i})=\int f({t}_{i}\mid {z}_{i},s)f(s\mid {z}_{i})ds.$$

If both *f* (*t _{i}* |

In practice, a reduced model is fitted for outcome (*t*_{i}) and treatment (*z*_{i}) only:

$$E({T}_{i}\mid {z}_{i})={h}^{-1}({\gamma}_{0}+{\gamma}_{1}{z}_{i}).$$

The PE^{(FGS)} is estimated by

$${\mathrm{PE}}^{(\mathrm{FGS})}=1-\frac{\hat{\beta}_{2}}{\hat{\gamma}_{1}}.$$

This definition of PE^{(FGS)} is intuitive and easy to calculate. It has inherent problems, however: first, the full model and the marginal model may not be valid simultaneously except in special cases; second, the PE^{(FGS)} estimate can be extremely variable, and its point estimate and confidence interval can lie outside the interval [0, 1]. For these reasons, more reliable statistical measures of PTE are needed.
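The ratio above is simple enough to state directly in code. The following is a minimal sketch, not the authors' implementation: the function name `pe_fgs` and the coefficient values are hypothetical, chosen only to show how the estimate leaves [0, 1] when the adjusted coefficient is larger than, or opposite in sign to, the unadjusted one.

```python
def pe_fgs(beta2_hat: float, gamma1_hat: float) -> float:
    """PE^(FGS) = 1 - beta2_hat / gamma1_hat.

    beta2_hat:  treatment coefficient from the model adjusted for the surrogate.
    gamma1_hat: treatment coefficient from the model without the surrogate.
    """
    if gamma1_hat == 0:
        raise ZeroDivisionError("unadjusted treatment coefficient is zero")
    return 1.0 - beta2_hat / gamma1_hat


# Hypothetical coefficient estimates, for illustration only.
print(pe_fgs(-0.30, -0.50))  # adjusted effect is smaller: inside [0, 1]
print(pe_fgs(-0.60, -0.50))  # adjusted effect is larger: below 0
print(pe_fgs(0.10, -0.50))   # adjusted effect changes sign: above 1
```

Because the estimate is a ratio of two noisy coefficients, a small unadjusted coefficient in the denominator can make it very unstable.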

Motivated by the idea of Tsiatis, DeGruttola, and Wulfsohn (1995), Wang and Taylor (2002) defined alternative measures (*F* and *F*’) for the proportion of treatment effect explained by a surrogate. The measures are defined by two ratios, that is,

$$F=\frac{AA-AB}{AA-BB}\quad \text{and}\quad {F}^{\prime}=\frac{BA-BB}{AA-BB}.$$

Terms *AA, BB, AB*, and *BA* are defined as

$$\begin{aligned} AA&=\tilde{h}\left(\int {g}_{A}(s)\,d{P}_{A}(s)\right),\\ AB&=\tilde{h}\left(\int {g}_{A}(s)\,d{P}_{B}(s)\right),\\ BA&=\tilde{h}\left(\int {g}_{B}(s)\,d{P}_{A}(s)\right),\\ \text{and}\quad BB&=\tilde{h}\left(\int {g}_{B}(s)\,d{P}_{B}(s)\right), \end{aligned}$$

(1)

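where *P*_{A}(*s*) and *P*_{B}(*s*) are the distribution functions of the surrogate *S* in treatment groups *A* and *B*, *g*_{A}(*s*) and *g*_{B}(*s*) summarize the conditional distribution of *T* given *S* = *s* in each group, and *h̃*(.) is a chosen transformation.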

In the setting of a randomized clinical trial, *T* is a binary outcome, *Z* is a binary indicator of treatment, and *S* is a continuous surrogate marker. Without loss of generality, we assume that an increased surrogate will more likely result in an event (*T* = 1 for event and *T* = 0 for nonevent). Let us assume *P*_{A}(*s*) and *P*_{B}(*s*) are normal distribution functions with means *μ*_{A} and *μ*_{B} and standard deviations *σ*_{A} and *σ*_{B}:

$$\begin{array}{cc}\hfill {P}_{A}\left(s\right)=& \Phi \left(\frac{s-{\mu}_{A}}{{\sigma}_{A}}\right),\hfill \\ \hfill {P}_{B}\left(s\right)=& \Phi \left(\frac{s-{\mu}_{B}}{{\sigma}_{B}}\right).\hfill \end{array}$$

(2)

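Let *g*_{A}(*s*) and *g*_{B}(*s*) be the probabilities of an event given *S* = *s* in groups *A* (*Z* = 0) and *B* (*Z* = 1), following the probit model: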

$$\begin{array}{cc}\hfill & {g}_{A}\left(s\right)=\mathrm{Pr}(T=1\mid s,Z=0)=\Phi ({\beta}_{0}+{\beta}_{1}s)\hfill \\ \hfill & {g}_{B}\left(s\right)=\mathrm{Pr}(T=1\mid s,Z=1)=\Phi ({\beta}_{0}+{\beta}_{1}s-\omega ).\hfill \end{array}$$

(3)

The parameter *ω* (= −*β*_{2}) represents the effect of treatment on the outcome. Setting *ω* as nonnegative will ensure reduced odds of event (*T* = 1) for treatment versus placebo given *S* = *s*. In the expression for *F*(*F*’), (*AA* – *AB*) measures the change in the probability [*T* = 1|*Z* = *A*] for patients in the treatment *A* group, had their surrogate distribution followed *P*_{B}(*s*) instead of *P*_{A}(*s*). Under (2) and (3), the four terms have closed forms:

$$\begin{aligned} AA&=\Phi \left(\frac{{\beta}_{0}/{\beta}_{1}+{\mu}_{A}}{\sqrt{1/{\beta}_{1}^{2}+{\sigma}_{A}^{2}}}\right),\\ BB&=\Phi \left(\frac{({\beta}_{0}-\omega )/{\beta}_{1}+{\mu}_{B}}{\sqrt{1/{\beta}_{1}^{2}+{\sigma}_{B}^{2}}}\right),\\ AB&=\Phi \left(\frac{{\beta}_{0}/{\beta}_{1}+{\mu}_{B}}{\sqrt{1/{\beta}_{1}^{2}+{\sigma}_{B}^{2}}}\right),\\ BA&=\Phi \left(\frac{({\beta}_{0}-\omega )/{\beta}_{1}+{\mu}_{A}}{\sqrt{1/{\beta}_{1}^{2}+{\sigma}_{A}^{2}}}\right). \end{aligned}$$

(4)

Each of the parameters in (4) can be estimated from fitted models (2) and (3). *F*(*F*’) can thus be calculated easily. The choice of *g*(.) and *h̃*(.) may vary. In the setting of interest, we use the identity function for *h̃*(.) and the mean function for *g*(.). When the logit function is desired for the conditional probability [*T* |*S*], no analytical solution is available for the integrals defining the terms in (4). However, one may use the good approximation of

$$\frac{\mathrm{exp}\left(y\right)}{1+\mathrm{exp}\left(y\right)}\approx \Phi \left(\frac{\sqrt{3}}{\pi}y\right)$$

to obtain the estimates of *AA, BB, AB*, and *BA*.
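The closed forms in (4) translate directly into a few lines of code. The sketch below is illustrative rather than the authors' program: the function names and the parameter values are assumptions, *β*_{1} is taken to be positive (consistent with an increased surrogate making an event more likely), and the optional `logit` flag rescales logit-model coefficients by √3/π as in the approximation above.

```python
from math import pi, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def _term(b0, b1, mu, sigma):
    """Integral of Phi(b0 + b1*s) over S ~ N(mu, sigma^2), valid for b1 > 0."""
    return PHI((b0 / b1 + mu) / sqrt(1.0 / b1**2 + sigma**2))


def f_measures(beta0, beta1, omega, mu_a, sigma_a, mu_b, sigma_b, logit=False):
    """Return AA, AB, BA, BB and the ratios F and F' from equation (4)."""
    if beta1 <= 0:
        raise ValueError("this sketch assumes beta1 > 0")
    if logit:
        # Map logit-scale coefficients to the probit scale (approximation).
        c = sqrt(3.0) / pi
        beta0, beta1, omega = c * beta0, c * beta1, c * omega
    aa = _term(beta0, beta1, mu_a, sigma_a)
    ab = _term(beta0, beta1, mu_b, sigma_b)
    ba = _term(beta0 - omega, beta1, mu_a, sigma_a)
    bb = _term(beta0 - omega, beta1, mu_b, sigma_b)
    total = aa - bb  # overall treatment effect
    if total == 0:
        raise ZeroDivisionError("AA equals BB: no overall treatment effect")
    return {"AA": aa, "AB": ab, "BA": ba, "BB": bb,
            "F": (aa - ab) / total, "F_prime": (ba - bb) / total}


# Hypothetical parameter values, for illustration only.
result = f_measures(beta0=-0.5, beta1=1.0, omega=0.3,
                    mu_a=0.5, sigma_a=1.0, mu_b=0.0, sigma_b=1.0)
print({name: round(value, 3) for name, value in result.items()})
```

In practice the inputs would be the estimates from the fitted surrogate model (2) and outcome model (3).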

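In the special case *ω* = 0, we have *AB* – *BB* = 0 and *AA* – *BA* = 0, thus *F* = *F*’ = 1. If *μ*_{A} = *μ*_{B} and *σ*_{A} = *σ*_{B}, the two surrogate distributions coincide, so that *AA* = *AB* and *BA* = *BB*, and thus *F* = *F*’ = 0. Wang and Taylor (2002) gave conditions under which the measures lie within [0, 1]: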

- R1: *P*_{A}(.) is stochastically higher than *P*_{B}(.);
- R2: *g*_{A}(*s*) and *g*_{B}(*s*) are nondecreasing functions of *s*;
- R3: *g*_{A}(*s*) ≥ *g*_{B}(*s*) for all *s*.

Such conditions are generally met if *S* is a surrogate endpoint. In particular, for the setting of interest, if the variances of *S* are equal for both treatment groups, then all three conditions are met.

Unlike PE^{(FGS)}, the definition of the *F*(*F*’) measures does not rest on model assumptions. It can be applied to any model as long as the calculation of each term is feasible.

By definition, each of the terms (*AA, BB, AB, BA*) corresponds to a probability defined by two distribution functions *F*(.) and *G*(.); that is, ∫ *F*(*x*)*dG*(*x*). Thus, each term can be represented by the area under an ordinal dominance (OD) curve (Bamber 1975) connecting (0, 0) and (1, 1) in a two-dimensional probability space. Figure 1 illustrates a clinical example that will be presented in detail later in Section 5. OD curves are plotted for *AA* (solid line), *BB* (long dashed), *AB* (short dashed), and *BA* (dotted dash). The areas under the curves are the values of *AA, BB, AB*, and *BA*. Thus the area bounded by the *AA* and *BB* curves corresponds to the (*AA*–*BB*) value; that is, the overall treatment effect. Further, the area of (*AA*–*BB*) can be divided into two mutually exclusive areas of (*AA*–*AB*) and (*AB*–*BB*). Therefore the *F* statistic can be represented by the ratio between two areas: the area bounded by the solid and short dashed lines (*AA*–*AB*) versus the area bounded by the solid and long dashed lines (*AA*–*BB*). Similarly, *F*’ is represented by the ratio between the areas defined by (*BA*–*BB*) and (*AA*–*BB*). This graphic presentation is easy to understand, and directly corresponds to the definitions of the *F* and *F*’ statistics. When the curves of *AA* and *BB* overlap, there is no treatment effect; the larger the (*AA*–*BB*) area is, the stronger the treatment effect is. On the other hand, the closer together the *AB* and *BB* curves (or the *BA* and *AA* curves) are, the higher the percentage of the treatment effect that is explained by the surrogate. When *AA* overlaps *AB* or *BA* overlaps *BB*, *F*(*F*’) = 0, suggesting the surrogate is useless. When *AA* overlaps *BA* or *AB* overlaps *BB*, *F*(*F*’) = 1, suggesting the surrogate is perfect.
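The area interpretation can be checked numerically. The sketch below assumes one way of tracing the OD curve for the *AA* term, namely the points (*P*_{A}(*s*), *g*_{A}(*s*)) as *s* increases, and compares the trapezoidal area under that curve with the closed form of *AA* under the normal and probit assumptions. The parameter values and the helper name `od_area` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def od_area(g, p_cdf, s_lo, s_hi, n_steps=4000):
    """Trapezoidal area under the curve (p_cdf(s), g(s)) for s in [s_lo, s_hi]."""
    step = (s_hi - s_lo) / n_steps
    pts = [(p_cdf(s_lo + i * step), g(s_lo + i * step))
           for i in range(n_steps + 1)]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical parameter values, for illustration only.
beta0, beta1 = -0.5, 1.0
mu_a, sigma_a = 0.5, 1.0


def g_a(s):
    """Probability of an event given S = s in group A (probit model)."""
    return PHI(beta0 + beta1 * s)


p_a = NormalDist(mu_a, sigma_a).cdf  # surrogate distribution in group A

aa_numeric = od_area(g_a, p_a, mu_a - 8 * sigma_a, mu_a + 8 * sigma_a)
aa_closed = PHI((beta0 / beta1 + mu_a) / sqrt(1.0 / beta1**2 + sigma_a**2))
print(round(aa_numeric, 4), round(aa_closed, 4))
```

The same helper, applied with the other combinations of *g* and *P*, would give the areas for *AB*, *BA*, and *BB*.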

A simulation study is conducted for various scenarios (Table 1) where PTE varies from useless to perfect (0%, 16.67%, 33.33%, 66.67%, 80%, and 100%). The purpose of the simulation is to compare the performance of *F*(*F*’) measures to PE^{(FGS)} measure in the probit model. For the specific setting of the problem (i.e., binary outcome and treatment with continuous surrogate), PE^{(FGS)} and *F*(*F*’) by definition are not the same. However, if the outcome endpoint is a continuous one (*T _{c}*), and the relationships between treatment (

The sample size is set to 500 for each treatment group. For each scenario, simulation results are obtained from 1000 replicates, and are summarized in Table 2. The bootstrap estimates are based on 1000 bootstrap samples. Since the distributional form of the *F* and *F*’ statistics is unknown, following the suggestion of Wang and Taylor (2002), the bias-corrected (BC) bootstrap method (Efron and Tibshirani 1986) is used for constructing the confidence intervals of *F* and *F*’. The following two models are used for the simulation.

$$S={\alpha}_{0}+{\alpha}_{1}Z+e;$$

Model 1

$${\Phi}^{-1}\left(P(T=1\mid S,Z)\right)={\beta}_{0}+{\beta}_{1}S+{\beta}_{2}Z.$$

Model 2

To calculate PTE, the marginal model below is estimated,

$${\Phi}^{-1}\left(P(T=1\mid Z)\right)={\gamma}_{0}+{\gamma}_{1}Z.$$

Model 3
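The bias-corrected bootstrap interval mentioned above can be sketched in general form. This is a minimal illustration of the BC percentile interval (without an acceleration constant), not the authors' code: the function name, the defaults, and the toy statistic (a sample mean) are assumptions. In the setting of this article the statistic would instead refit Models 1 and 2 on each resampled data set and return *F* or *F*’.

```python
import random
from statistics import NormalDist, mean

_STD = NormalDist()


def bc_bootstrap_ci(data, statistic, n_boot=1000, level=0.95, seed=0):
    """Bias-corrected (BC) percentile bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistic(data)
    # Resample observations with replacement and recompute the statistic.
    reps = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Bias-correction constant from the share of replicates below theta_hat.
    frac = sum(r < theta_hat for r in reps) / n_boot
    frac = min(max(frac, 0.5 / n_boot), 1.0 - 0.5 / n_boot)  # keep inv_cdf finite
    z0 = _STD.inv_cdf(frac)
    alpha = (1.0 - level) / 2.0
    p_lo = _STD.cdf(2.0 * z0 + _STD.inv_cdf(alpha))
    p_hi = _STD.cdf(2.0 * z0 + _STD.inv_cdf(1.0 - alpha))

    def quantile(p):
        return reps[min(n_boot - 1, max(0, int(p * n_boot)))]

    return quantile(p_lo), quantile(p_hi)


# Toy data: 200 standard normal draws, with the sample mean as the statistic.
toy_rng = random.Random(42)
sample = [toy_rng.gauss(0.0, 1.0) for _ in range(200)]
print(bc_bootstrap_ci(sample, mean))
```

When the bootstrap distribution is centered on the point estimate, the correction constant is close to zero and the interval reduces to the ordinary percentile interval.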

Estimates of PE^{(FGS)} and *FM* are summarized in Table 2. The distributions of these estimates are plotted as histograms in Figure 2. The results show that, in all of the cases considered, *F*(*F*’) and *FM* have little bias and smaller standard deviations (SD) than PE^{(FGS)}. In all cases, the estimated PE^{(FGS)} is biased toward 0 and seriously underestimates the true PTE, except for design case C6, in which *β*_{2} = 0 and the surrogate is perfect. The variability of the estimated PE^{(FGS)} can be more than twice that of *FM*, except in design C3. The variability is particularly large when the marker is perfect or useless (cases C1 and C6). In design C3, the SD of PE^{(FGS)} is smaller because the mean estimate is around zero (compared to the true effect of 1/3). In fact, when the true PTE is less than 1/3, the estimated PE^{(FGS)} is negative. The bias of the estimated PE^{(FGS)} decreases as the true surrogate effect approaches one; however, the standard deviation remains larger than that of the estimates of *F*(*F*’). As presented in Figure 2, the distributions of PE^{(FGS)} are more skewed to the left than those of *F*(*F*’), particularly for design C1 (true effect = 0). Also in Table 2, the average *FM* is close to the true PTE. The SDs of *F* and *F*’ are similar. The problems of PE^{(FGS)} identified in our simulation are consistent with the simulation findings from other studies (Bycott and Taylor 1998; Wang and Taylor 2002).

Figure 2. Histograms of the distributions of the estimated PE^{(FGS)} and *FM* for different simulation settings. *FM* has smaller biases and standard deviations (SD) than PE^{(FGS)}.

Table 3 presents the 95% confidence intervals (CIs) and their coverage rates for *F, F*’, *FM*, and PE^{(FGS)}. PE^{(FGS)} has the poorest coverage rate, especially when the true PTE is small (only 4.8% when true PTE = 66.7%). On the other hand, *F, F*’, and *FM* have near-nominal coverage rates (>93%) in all scenarios. Table 4 shows the number of times (out of 1000 simulations) that the 95% CI lies within [0, 1], that its lower bound is ≥ 0, and that its upper bound is ≤ 1. As discussed earlier, PE^{(FGS)} underestimates the true value by 30% or more, so its 95% CI is shifted in the negative direction: the lower bound is often below 0 and the upper bound is below 1. This can be seen from the results of cases C1–C4, where PE^{(FGS)} is worse than *FM* in terms of lower bound ≥ 0 and CI within [0, 1]; the two are comparable in terms of upper bound ≤ 1. In cases C5 and C6, the results for PE^{(FGS)} and *FM* are comparable and both CIs are often within [0, 1], but the point estimate of PE^{(FGS)} and its CI are biased toward zero.

Table 4. Shown are the numbers of times (out of 1000) that the 95% confidence intervals (CIs) lie within [0, 1], that lower bounds are greater than or equal to zero, and that upper bounds are less than or equal to one for *FM* and PE^{(FGS)}.

Under the probit model, it can be derived that Model 3 takes the form

$${\Phi}^{-1}\left(P\right(T=1\mid Z\left)\right)=\frac{({\beta}_{0}+{\alpha}_{0}{\beta}_{1})}{\sqrt{1+{\beta}_{1}^{2}{\sigma}_{s}^{2}}}+\frac{({\beta}_{2}+{\alpha}_{1}{\beta}_{1})}{\sqrt{1+{\beta}_{1}^{2}{\sigma}_{s}^{2}}}Z.$$

Thus, PE^{(FGS)} by definition can be derived, that is,

$${\mathrm{PE}}^{(\mathrm{FGS})}=1-\frac{{\beta}_{2}\sqrt{1+{\beta}_{1}^{2}{\sigma}_{s}^{2}}}{{\beta}_{2}+{\alpha}_{1}{\beta}_{1}}.$$
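The marginal probit form can be sanity-checked by simulation. The sketch below uses hypothetical parameter values and function names: it draws *S* from Model 1 and *T* from Model 2, then compares the simulated event rate in each treatment arm with the rate implied by the derived marginal coefficients, and finally evaluates the PE^{(FGS)} expression at the same parameter values.

```python
import random
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def marginal_probit(alpha0, alpha1, beta0, beta1, beta2, sigma_s):
    """Intercept and slope (gamma0, gamma1) of the implied marginal probit model."""
    scale = sqrt(1.0 + beta1**2 * sigma_s**2)
    return (beta0 + alpha0 * beta1) / scale, (beta2 + alpha1 * beta1) / scale


def pe_fgs_value(alpha1, beta1, beta2, sigma_s):
    """PE^(FGS) = 1 - beta2 / gamma1 written in terms of the model parameters."""
    scale = sqrt(1.0 + beta1**2 * sigma_s**2)
    return 1.0 - beta2 * scale / (beta2 + alpha1 * beta1)


def simulated_event_rate(z, n, rng, alpha0, alpha1, beta0, beta1, beta2, sigma_s):
    """Monte Carlo estimate of P(T = 1 | Z = z) under Models 1 and 2."""
    events = 0
    for _ in range(n):
        s = alpha0 + alpha1 * z + rng.gauss(0.0, sigma_s)  # Model 1
        p = PHI(beta0 + beta1 * s + beta2 * z)              # Model 2
        events += rng.random() < p
    return events / n


# Hypothetical parameter values, for illustration only.
params = dict(alpha0=0.0, alpha1=-1.0, beta0=0.0,
              beta1=0.5, beta2=-0.5, sigma_s=1.0)
gamma0, gamma1 = marginal_probit(**params)
rng = random.Random(2010)
for z in (0, 1):
    implied = PHI(gamma0 + gamma1 * z)
    simulated = simulated_event_rate(z, 100_000, rng, **params)
    print(z, round(implied, 4), round(simulated, 4))
print(round(pe_fgs_value(params["alpha1"], params["beta1"],
                         params["beta2"], params["sigma_s"]), 3))
```

With these illustrative values the ratio *α*_{1}*β*_{1}/(*α*_{1}*β*_{1} + *β*_{2}) equals 0.5, so the printed PE^{(FGS)} value can be read against that benchmark.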

Comparing PE^{(FGS)} with the latent PTE (= $\frac{{\alpha}_{1}{\beta}_{1}}{{\alpha}_{1}{\beta}_{1}+{\beta}_{2}}$ from Section 3.2), the bias, taken as the latent PTE minus PE^{(FGS)}, is

$$\frac{{\beta}_{2}\left(\sqrt{1+{\beta}_{1}^{2}{\sigma}_{s}^{2}}-1\right)}{{\beta}_{2}+{\alpha}_{1}{\beta}_{1}},$$

which is ≥ 0 whenever *β*_{2} and *β*_{2} + *α*_{1}*β*_{1} have the same sign, suggesting that PE^{(FGS)} almost always underestimates the true effect. It is unbiased only when *β*_{2} = 0 or *β*_{1} = 0. When the logit model is desired for Model 2, the approximation

$$\frac{\mathrm{exp}\left(y\right)}{1+\mathrm{exp}\left(y\right)}\approx \Phi \left(\frac{\sqrt{3}}{\pi}y\right)$$

can be used to estimate PTE, and the same method is applicable.
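How close the logistic and scaled probit curves are can be examined numerically. The following sketch simply evaluates both functions on a grid and reports the largest absolute gap; the grid range and spacing are arbitrary choices made for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal CDF
SCALE = sqrt(3.0) / pi   # matches the standard deviations of the two curves


def logistic(y):
    """Logistic function exp(y) / (1 + exp(y))."""
    return 1.0 / (1.0 + exp(-y))


def probit_approx(y):
    """Probit approximation Phi(sqrt(3)/pi * y) to the logistic function."""
    return PHI(SCALE * y)


# Evaluate both curves on a grid from -8 to 8 in steps of 0.01.
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(logistic(y) - probit_approx(y)) for y in grid)
print(round(max_gap, 4))
```

The two curves agree exactly at *y* = 0 and differ most at moderate values of |*y*|, so the approximation matters most for event probabilities away from 0.5 but not in the extreme tails.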

Our simulation shows that *FM* is a more accurate and less variable measure than PE^{(FGS)}. Its point estimate is bounded within [0, 1]. The confidence interval of *F* (*F*’) can unavoidably extend outside [0, 1] when the true PTE is close to 0 or 1; otherwise, its 95% CI is mostly within [0, 1]. The accuracy, variability, and coverage probability are much improved relative to the PE^{(FGS)} estimate.

We use an example from the published literature (Buyse and Molenberghs 1998; Wang and Taylor 2002). This is a randomized clinical trial investigating the effect of interferon-*α* in 190 patients with age-related macular degeneration (ARMD). The surrogate *S* and clinical endpoint *T* are defined as *S* = change of visual acuity at six months (lines of vision lost in six months) and

$$T=\begin{cases}0 & \text{if patient lost less than three lines of vision at one year compared to baseline,}\\ 1 & \text{if patient lost three or more lines of vision at one year compared to baseline.}\end{cases}$$

A Box–Cox plot suggests that five extreme data points (≥8 or ≤−6) could be outliers. The surrogate variable is reasonably normally distributed within each treatment group. A two-group *t*-test on the surrogate was not significant for all 190 patients, but became significant after excluding the five patients with extreme values. Whether the five patients are outliers is arguable. For the purpose of illustrating Prentice’s validation criteria, these five patients’ data are excluded from analysis. Three of the criteria are met, that is, treatment is significant for clinical endpoint *T*, the surrogate is also significant for clinical endpoint *T*, and there is no interaction between treatment and surrogate. The criterion that the treatment effect on the clinical endpoint is captured by the surrogate is assessed by the PTE estimation.

Point estimates of PTE, *F* and *F*’, bootstrap estimates, and their confidence intervals (CIs) are reported in Table 5. The bootstrap CIs are constructed from 1000 replicates. The latent PTE, calculated from the continuous clinical endpoint (vision at one year), is 78%. Our median estimates of *F* and *F*’ are similar (62% and 64.5%). They are similar to the estimates reported by Wang and Taylor (2002) (69% and 61.9%), where the surrogate is treated as binary data. The CIs of *F, F*’, and *FM* are clearly narrower than that of PE^{(FGS)}, and they are about 30% narrower than those reported by Wang and Taylor (2002). The point estimates of *F* and *F*’ are all within (0, 1), and the lower bounds of the 95% CIs are above 0, suggesting that change of visual acuity at six months is a potential surrogate for the clinical endpoint (loss of three or more lines of vision at one year).

Figure 1 presents the OD curves for the placebo group (*AA* curve, solid line) and for the interferon-α group (*BB* curve, long dashed line). The *BB* curve lies below the *AA* curve, suggesting that interferon-α treatment may be effective in preventing the loss of three or more lines of vision in ARMD patients by the end of one year. The treatment effect is represented by the area between the *AA* and *BB* curves. To demonstrate how much of the treatment effect could be explained by the surrogate measure, the *AB* (short dashed line) and *BA* (dotted dash line) curves are plotted together with the *AA* and *BB* curves. The area between the *AA* and *AB* curves is about 2/3 (*F* = 0.628) of the (*AA*–*BB*) area, suggesting that a substantial amount of the treatment effect could be explained by the surrogate measure. Similarly, the *F*’ (= 0.647) measure is represented by the area of (*BA*–*BB*) as a fraction of the area of (*AA*–*BB*).

This article provides a useful and more accurate estimate of the proportion of treatment effect explained by a surrogate when the clinical endpoint is modeled by logistic or probit regression. The estimate can be calculated easily with any existing statistical computation package that is capable of binary regression modeling, such as SAS, SPSS, or S-Plus. The measure can be presented graphically as the ratio of areas under ordinal dominance curves.

The derived measures *F*(*F*’) have much smaller bias and smaller variability than the commonly used PE^{(FGS)}. The expected value of *F*(*F*’) is often inside [0, 1] under a few mild conditions, while PE^{(FGS)} often lies outside of [0, 1]. Unlike PE^{(FGS)}, the measures *F*(*F*’) do not require that the fitted marginal model [*T*|*Z*] and the full model [*T* |*S, Z*] be of the same type. When the full model [*T* |*Z, S*] is logistic, the marginal model [*T* |*Z*] is no longer logistic, and it is not easily obtained. *F*(*F*’) can be easily estimated by a proper transformation of the logit to the probit function, and it is shown to be more reliable and efficient. When the full model [*T* |*S, Z*] is a probit model, the marginal model [*T* |*Z*] follows a valid probit model as well. Even so, the PE^{(FGS)} estimate is shown to be unreliable and to seriously underestimate the true PTE, by as much as 30% or more.

Although theoretically both the *F* and *F*’ measures quantify the proportion of the treatment effect explained by a surrogate marker, the two could be slightly different in a nonnormal setting. *FM*, as an average of both, is less biased and has smaller variability than *F* and *F*’, and is much better than PE^{(FGS)}. We recommend the use of *FM*. Many other alternative measures have been proposed to assess the proportion of the treatment effect that is explained by a surrogate marker or mediator. A closely related proposal is the relative indirect effect (RIND) measure of Huang et al. (2004). The RIND measure is obtained by first estimating an imaginary quantity, that is, the expected potential outcome given a specific treatment had the effect of treatment not acted through the intermediate surrogate endpoint. When *g*(.) is taken as the expected mean of the conditional distribution of [*T* |*S, Z*], this expected potential outcome is equivalent to the *AB* measure in our definition, and the RIND is equivalent to the *F* measure of Wang and Taylor (2002). There are slight differences between the RIND and *F*(*F*’) measures. First, RIND is defined by setting *g*(.) as the expected conditional mean; second, unlike the *F*(*F*’) measure, the RIND measure does not require the treatment to be a binary measure; third, RIND was defined based on a system of two generalized regression equations, and thus generally requires linear assumptions between the treatment *Z* and the outcome *T*. However, such assumptions could be easily relaxed. Both the *F*(*F*’) and RIND measures were proposed for a single trial only. Although it is accepted practice to evaluate surrogacy from multiple trials (Buyse et al. 2000), validation based on a single trial can provide preliminary evidence in the early study of a surrogate or biomarker. Surrogate validation using *F*(*F*’) can be valid in a large single trial.
Recently, a new measure was proposed based on information theory which can be applied to a wide variety of settings (Alonso and Molenberghs 2007). The information-theoretic based measure inherits a difficulty in providing a hard cut-off value for discriminating good or bad surrogates.

In this article, we have considered only a single surrogate endpoint in the clinical trial. In many situations, there could be multiple biomarkers on the pathway to the clinical endpoint, where the combination of biomarkers may serve as a surrogate endpoint. One may follow the two-stage approach, by first finding a best linear combination of these markers with the available statistical methods (Pepe 2003), then applying the method presented in this article to validate the surrogate. If multiple surrogate markers assume a joint distribution, *F*(*F*’) can be derived under multiple integration.

The authors thank the Pharmacological Therapy for Macular Degeneration Study Group, F. Hoffman-LaRoche, and Dr. Marc Buyse and Dr. Geert Molenberghs for providing the authors with data and the permission to use the data. This work is partially supported by the NIDA R01DA019965-01A1 awarded to Dr. Bin Huang.

Bin Huang, Center for Epidemiology and Biostatistics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229 (Email: gro.cmhcc@gnauh.nib).

- Alonso A, Molenberghs G. Surrogate Marker Evaluation from an Information Theory Perspective. Biometrics. 2007;63:180–186.
- Bamber D. The Area Above the Ordinal Dominance Graph and the Area Below the Receiver Operating Characteristic Graph. Journal of Mathematical Psychology. 1975;12:387–415.
- Biomarkers Definitions Working Group. Biomarkers and Surrogate Endpoints: Preferred Definitions and Conceptual Framework. Clinical Pharmacology and Therapeutics. 2001;69(3):89–95.
- Buyse M, Molenberghs G. Criteria for the Validation of Surrogate Endpoint in Randomized Experiments. Biometrics. 1998;54:1014–1029.
- Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. The Validation of Surrogate Endpoints in Meta-analyses of Randomized Experiments. Biostatistics. 2000;1:49–67.
- Bycott PW, Taylor J. An Evaluation of a Measure of the Proportion of the Treatment Effect Explained by a Surrogate Marker. Controlled Clinical Trials. 1998;19:555–568.
- DeGruttola V, Fleming T, Lin DY, Coombs R. Perspective: Validating Surrogate Markers—Are We Being Naive? Journal of Infectious Diseases. 1997;175:237–246.
- DeGruttola V, Clax PC, Demets DL, Downing GJ, Ellenberg SS, Freedman L, Gail MH, Prentice R, Wittes J, Zeger SL. Considerations in the Evaluation of Surrogate Endpoints in Clinical Trials: Summary of a National Institutes of Health Workshop. Controlled Clinical Trials. 2001;22:485–502.
- Efron B, Tibshirani R. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statistical Science. 1986;1:54–77.
- EMEA/CHMP report. Innovative Drug Development Approaches: Report From the EMEA / CHMP-Think-Tank Group on Innovative Drug Development. [March 2007]. 2007. http://www.EMEA.EUROPA.EU/PDFS/HUMAN/ITF/12731807EN.PDF.
- Freedman LS. Confidence Intervals and Statistical Power of the ‘Validation’ Ratio for Surrogate or Intermediate Endpoint. Journal of Statistical Planning and Inference. 2001;96:143–153.
- Freedman LS, Graubard BI, Schatzkin A. Statistical Validation of Intermediate Endpoints for Chronic Disease. Statistics in Medicine. 1992;11:167–178.
- Huang B, Sivaganesan S, Succop P, Goodman E. Statistical Assessment of Mediational Effects for Logistic Mediational Methods. Statistics in Medicine. 2004;23:2713–2728.
- Lin DY, Fleming TR, DeGruttola V. Estimating the Proportion of Treatment Effect Explained by a Surrogate Marker. Statistics in Medicine. 1997;16:1515–1527.
- NIAID Workshop. Statistical Issues for HIV Surrogate Endpoints: Point/Counterpoint. Statistics in Medicine. 1989;17:2435–2462.
- Pepe MS. The Evaluation of Medical Tests for Classification and Predictions. Oxford University Press; 2003. (Oxford Statistical Science Series).
- Prentice R. Surrogate Endpoints in Clinical Trials: Definition and Operational Criteria. Statistics in Medicine. 1989;8:431–440.
- Schwarz JK, Siegel BA, Dehdashti F, Grigsby PW. Association of Posttherapy Positron Emission Tomography With Tumor Response and Survival in Cervical Carcinoma. Journal of the American Medical Association. 2007;298(19):2289–2295.
- Temple RJ. A Regulatory Authority’s Opinion About Surrogate Endpoints. In: Nimmo WS, Tucker GT, editors. Clinical Measurement in Drug Evaluation. Wiley; New York: 1995. pp. 3–22.
- Tsiatis AA, DeGruttola V, Wulfsohn MS. Modeling the Relationship of Survival to Longitudinal Data Measured With Error: Applications to Survival and CD4 Counts in Patients with AIDS. Journal of the American Statistical Association. 1995;90:27–37.
- Wang Y, Taylor J. A Measure of the Proportion of Treatment Effect Explained by a Surrogate Marker. Biometrics. 2002;58:803–812.
