


Stat Probab Lett. Author manuscript; available in PMC 2010 November 15.

Published in final edited form as:

Stat Probab Lett. 2009 November 15; 79(22): 2321–2327.

doi: 10.1016/j.spl.2009.08.002

PMCID: PMC2772215

NIHMSID: NIHMS139290

Corresponding Author: JIN Hua, Ph.D., School of Mathematical Sciences, South China Normal University, Guangzhou 510631, China. Tel: 86-20-85215691; Email: jinh1@163.com


**Summary**

Multiple alternative diagnostic tests for one disease are commonly available to clinicians. It is important to use all of the good diagnostic predictors simultaneously to establish a new predictor with higher statistical utility. Under the generalized linear model for binary outcomes, the linear combination of multiple predictors appearing in the link function is proved to be optimal in the sense that the area under the receiver operating characteristic (ROC) curve of this combination is the largest among all possible linear combinations. The result is applied to data from the Study of Osteoporotic Fractures (SOF), with a comparison to Su and Liu’s approach.

**1. Introduction**

Multiple alternative diagnostic tests for one disease are commonly available to clinicians, and statistical methods are available for evaluating these alternatives. For instance, multivariate regression models can be used to determine the marginal advantage of additional tests (Richards et al., 1995), and this approach has been used in osteoporosis research to determine the relative risk and/or odds ratio of osteoporotic fracture (Hans et al., 1999). While these parameters are important for determining future fracture risk or developing clinical prediction rules based on tests, multivariate regression does not directly provide information on test utility or on optimizing test combinations.

Because the sensitivity and specificity of a diagnostic test depend on the threshold used to define an abnormal result, the receiver operating characteristic (ROC) curve is often used to assess the utility of a diagnostic test (DeLong et al., 1988; Pepe, 1997). This methodology has been extended to multivariate tests by constructing a linear function of the tests and then evaluating the combined utility based on the ROC curve of that linear function. Su and Liu (1993) showed that, under a normality assumption, the best linear combination of diagnostic tests, the one achieving the maximum area under the ROC curve, is the linear discriminant function. However, the general situation has not been covered in the literature.

In this article, under the generalized linear model, we derive the optimal linear combination among all possibilities under the criterion that the area under the corresponding ROC curve of the combination is maximized, which we call the ROC criterion. The main results are presented in Section 2. Section 3 considers the estimation of the optimal linear combination and its corresponding area under the ROC curve. Section 4 demonstrates an application of the proposed method to data from the Study of Osteoporotic Fractures (SOF), with a comparison to Su and Liu’s approach. Discussion and conclusions are presented in the last section.

**2. The Best Linear Combination under the ROC Criterion**

In this section we prove that, under the generalized linear model for binary responses, the linear combination appearing in the link function uniformly maximizes sensitivity at any given specificity. Here we present results only for a simple case with two predictors; they are easily generalized to the situation of multiple diagnostic markers.

Let *Z* be the binary outcome and let (*X*_{1}, *X*_{2}) be the random predictive variables. Suppose the generalized linear model for binary responses holds, that is,

$$P(Z=1\mid {X}_{1},{X}_{2})=h({\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2})$$

(1)

where *β* = (*β*_{0}, *β*_{1}, *β*_{2})′ is the 3-vector of parameters and *h* is a known function taking values in the unit interval (0,1). Although a wide choice of link functions is available, the three most commonly used in practice are the following:

- the logistic regression model:$$h(x)=\frac{exp(x)}{1+exp(x)}$$
- the probit model:$$h(x)=\mathrm{\Phi}(x)$$
- the complementary log-log model:$$h(x)=1-exp(-exp(x))$$
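For concreteness, the three link functions can be sketched directly (a minimal Python implementation using only the standard library; the function names are ours):

```python
import math

def logistic(x):
    # logistic link: h(x) = exp(x) / (1 + exp(x))
    return 1.0 / (1.0 + math.exp(-x))

def probit(x):
    # probit link: h(x) = Phi(x), the standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cloglog(x):
    # complementary log-log link: h(x) = 1 - exp(-exp(x))
    return 1.0 - math.exp(-math.exp(x))
```

All three links are continuous, strictly increasing, and map the real line into (0, 1), which is exactly what the optimality results below require of *h*.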

Let *α*_{1}*X*_{1} + *α*_{2}*X*_{2} be any linear combination, with sensitivity *Sn* and specificity *Sp*. First, we show that the linear combination selected by the link function of the model, *β*_{1}*X*_{1} + *β*_{2}*X*_{2}, dominates all other possible linear combinations in the sense that it provides the highest sensitivity uniformly over the entire range of specificity; thus it is the best linear combination, yielding the largest area under the ROC curve (AUC) among all linear combinations. We then give a mathematical formula for computing the corresponding largest AUC.

The following useful lemma is originally due to Chebyshev (see Hardy et al., 1959).

**Lemma 1.** Let ** U** be a one-dimensional non-degenerate random variable and let *g* be a strictly decreasing function. Then

$$E[U\cdot g(U)]<E(U)\cdot E(g(U))$$

provided the expectations exist.

**Theorem 1.** Suppose the generalized linear model (1) for binary responses holds and the function *h* is continuous and strictly monotone. If (*X*_{1}, *X*_{2}) is a two-dimensional continuous variable with a continuous probability density and its expectation exists, then for any given specificity *Sp*, the coefficients of the best linear combination, the one providing the highest sensitivity uniformly, satisfy (*α*_{1}, *α*_{2}) ∝ (*β*_{1}, *β*_{2}).

Theorem 1 may be easily extended to the multiple-marker situation, which is presented in the following theorem without proof.

**Theorem 2.** Suppose the generalized linear model for binary responses holds, i.e.,

$$P(Z=1\mid {X}_{1},\cdots ,{X}_{k})=h({\beta}_{0}+{\beta}_{1}{X}_{1}+\cdots +{\beta}_{k}{X}_{k})$$

(2)

where *β* = (*β*_{0}, *β*_{1}, ···, *β _{k}*)′ is the (*k* + 1)-vector of parameters, *h* is continuous and strictly monotone, and (*X*_{1}, ···, *X _{k}*) is a *k*-dimensional continuous variable with a continuous probability density whose expectation exists. Then for any given specificity *Sp*, the coefficients of the best linear combination, the one providing the highest sensitivity uniformly, satisfy (*α*_{1}, ···, *α _{k}*) ∝ (*β*_{1}, ···, *β _{k}*).

**Theorem 3.** Suppose the conditions in Theorem 2 hold. Then the best linear combination under the ROC criterion is also given by (*α*_{1}, ···, *α _{k}*) ∝ (*β*_{1}, ···, *β _{k}*), and the corresponding largest AUC is

$$\mathit{LAUC}=\frac{1}{P(Z=1)P(Z=0)}\int G(x)h(x)dG(x)-\frac{P(Z=1)}{2P(Z=0)}$$

where *P*(*Z* = 1) = ∫*h*(*x*)*dG*(*x*), *P*(*Z* = 0) = 1 − *P*(*Z* = 1) and *G*(*x*) is the distribution function of *X* = *β*_{0} + *β*_{1}*X*_{1} + ··· + *β _{k}X_{k}*, which can be derived from the joint distribution of (*X*_{1}, ···, *X _{k}*).

For rare diseases, such as the osteoporotic hip fractures discussed in the application section, we have the following simple formula for the largest area under the ROC curve of the best linear combination, which can be derived by direct computation.

**Corollary 1.** If the prevalence of the disease, *P*(*Z* = 1), is small enough, say less than 5%, then the largest AUC may be approximated as follows:

$$\mathit{LAUC}\approx \frac{\int G(x)h(x)dG(x)}{P(Z=1)}$$

whose error is about *P*(*Z* = 1)/2, i.e., less than 2.5% when *P*(*Z* = 1) ≤ 5%.

**3. Estimation of the Best Linear Combination and the Largest AUC**

In practical applications, we first need to fit the generalized linear model to the data. Suppose that for study subject *i*, the response *Z _{i}* has the Bernoulli distribution

$$P({Z}_{i}={z}_{i})={\theta}_{i}^{{z}_{i}}{(1-{\theta}_{i})}^{1-{z}_{i}},\phantom{\rule{0.38889em}{0ex}}{z}_{i}=0,1,\phantom{\rule{0.38889em}{0ex}}i=1,\cdots ,n$$

and *θ _{i}* = *P*(*Z _{i}* = 1) is determined by the link function *θ _{i}* = *h*(*β*_{0} + *β*_{1}*x _{i}*_{1} + ··· + *β _{k}x _{ik}*), where (*x _{i}*_{1}, ···, *x _{ik}*) denotes the observed values of the predictors for subject *i*.

It is common to make use of the likelihood function as follows to estimate the parameters *β* = (*β*_{0}, *β*_{1}, ···, *β _{k}*)′:

$$L(\beta ;z)=\prod _{i=1}^{n}{[h({\beta}_{0}+{\beta}_{1}{x}_{i1}+\cdots +{\beta}_{k}{x}_{ik})]}^{{z}_{i}}{[1-h({\beta}_{0}+{\beta}_{1}{x}_{i1}+\cdots +{\beta}_{k}{x}_{ik})]}^{1-{z}_{i}}$$

In fact, we obtain the maximum likelihood estimate of *β* by maximizing *L*(*β; z*). An iterative algorithm, such as the popular Newton-Raphson method implemented in standard software such as SAS and S-Plus, may be used for the numerical computation (Cox and Snell, 1989).
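As a sketch of this fitting step (not the authors' code; Python with NumPy on simulated data, with illustrative variable names), Newton-Raphson for the logistic link iterates β ← β + (X′WX)⁻¹X′(z − p):

```python
import numpy as np

def fit_logistic_newton(X, z, n_iter=25):
    """Maximum likelihood for the logistic model via Newton-Raphson.

    X : (n, k) matrix of predictors (an intercept column is added here),
    z : (n,) array of 0/1 outcomes.  Returns the (k+1,) estimate of beta.
    """
    Xd = np.column_stack([np.ones(len(z)), X])   # design matrix with intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))     # fitted probabilities h(.)
        W = p * (1.0 - p)                        # Bernoulli variance weights
        grad = Xd.T @ (z - p)                    # score vector
        hess = Xd.T @ (Xd * W[:, None])          # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated check: recover a known (beta0, beta1, beta2)
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))
true_beta = np.array([-2.0, 1.0, -0.5])
p = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
z = rng.binomial(1, p)
beta_hat = fit_logistic_newton(X, z)
```

With a large simulated sample the estimate lands close to the generating coefficients; in practice one would simply rely on the fitting routines of SAS, S-Plus/R or similar software.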

As for estimation of the largest AUC, we further need to know the joint distribution of the diagnostic predictors. There are two approaches to this problem. One is the conventional parametric approach, of which the normal approximation is the most popular instance. The other is the standard nonparametric technique, which uses the empirical distribution based on the sample data to estimate the population distribution directly. Either way, we can then derive an estimate of the distribution function of *β*_{0} + *β*_{1}*X*_{1} +···+ *β _{k}X_{k}*, and of the corresponding area under the ROC curve, based on Theorem 3.
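The nonparametric route can be sketched as follows (Python with NumPy; β is taken as known for simplicity and the variable names are ours): replace *G* in Theorem 3 by the empirical distribution of the linear scores and plug in. For comparison, the rank-based (Mann-Whitney) AUC of the same score is computed from the simulated outcomes; by Theorem 3 the two estimates should agree up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x1 = rng.normal(size=n)
score = -2.0 + 1.0 * x1                    # beta0 + beta1*X1, taken as known here
h = 1.0 / (1.0 + np.exp(-score))           # P(Z = 1 | X1) under the logistic link
z = rng.binomial(1, h)

# Plug-in estimate of the largest AUC: G replaced by the empirical CDF
ranks = np.argsort(np.argsort(score)) + 1  # ranks of the scores (continuous, no ties)
G_hat = ranks / n                          # empirical CDF evaluated at each score
p1 = h.mean()                              # P(Z = 1) = integral of h dG
p0 = 1.0 - p1
lauc_plugin = (G_hat * h).mean() / (p1 * p0) - p1 / (2.0 * p0)

# Rank-based (Mann-Whitney) AUC of the score against the observed outcomes
n1, n0 = int(z.sum()), int((1 - z).sum())
auc_rank = (ranks[z == 1].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)
```

The plug-in version uses the fitted probabilities rather than the observed labels, so it is typically less variable than the rank-based estimate of the same quantity.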

**4. Application**

In this section, we apply our method to data obtained from the Study of Osteoporotic Fractures (SOF) for illustration and compare it to the approach of Su and Liu (1993). From 1986 to 1988, SOF recruited 9,704 white women aged 65 years or older from four areas of the United States. At baseline, bone mineral density (BMD) was measured at the calcaneus, distal radius and proximal radius using single photon absorptiometry (SPA). At the second visit (1988–89), surviving participants had BMD measurements of the posterior-anterior (PA) spine (L1–L4) and proximal femur (neck, trochanter and total hip regions of interest) using dual x-ray absorptiometry (DXA). Fractures of the hip were recorded for each subject at each visit. More details about the study design and the data have been published previously (Cummings et al., 1995).

We included 7,127 women from the study. All of these women had forearm, calcaneal, hip and spine bone mineral density (BMD) measurements. In addition to the BMD measurements, many other previously identified predictive variables at baseline were also investigated. Furthermore, they all had known 5-year hip fracture status: either they were followed for 5 years after visit 2 without hip fracture or they had hip fracture within five years after visit 2. Women who were lost to follow-up within 5 years without known hip fracture were excluded.

All 43 candidate variables, including patient demographic data, clinical BMD measured by DXA and SXA, medical history, X-ray assessment of prevalent fracture and vertebral heights, functional status, vision test results, and nutrition, have been identified as significant predictors of hip fracture risk. They may reflect different aspects of osteoporosis and aging and may help in understanding the etiology of osteoporosis and hip fracture, but only a few of them are necessary to identify subjects with elevated fracture risk. Standard approaches to the analysis of binary data, such as logistic regression and the probit model, show that a linear combination of age, femoral neck BMD and loss of height best predicts hip fracture. Our previous study also suggested that these three variables could be used to build a classification rule non-inferior to the optimal recursive partitioning rule (Jin et al., 2004). We therefore consider these three predictors and seek the best linear combination under the ROC criterion.

Assume the generalized linear model (2) holds for the SOF data, where *Z* = 1 stands for hip fracture, and *X*_{1}, *X*_{2} and *X*_{3} denote age, femoral neck BMD and loss of height, respectively. If we further assume
$h(x)=\frac{exp(x)}{1+exp(x)}$, standard software such as S-Plus or SAS provides the following fitted logistic regression model:

$$P(Z=1\mid {X}_{1},{X}_{2},{X}_{3})=1-\frac{1}{1+exp(-3.89+0.075{X}_{1}-8.90{X}_{2}+0.100{X}_{3})}$$

It follows from our Theorems 2 and 3 that *X*^{(}^{l}^{)} = 0.075*X*_{1} − 8.90*X*_{2} + 0.100*X*_{3} is the optimal linear combination, in the sense that it provides the highest sensitivity uniformly over the entire range of specificity, and hence also under the ROC criterion. The estimated best coefficients are therefore
$({\alpha}_{1}^{l},{\alpha}_{2}^{l},{\alpha}_{3}^{l})\propto (1,-119,1.33)$.
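The proportional form is simply a rescaling of the fitted coefficients so that the first coefficient equals 1 (a trivial check in Python):

```python
# Fitted logistic coefficients for age, femoral neck BMD and loss of height
coef = [0.075, -8.90, 0.100]
alpha = [c / coef[0] for c in coef]   # rescale so the first coefficient is 1
# alpha is approximately [1, -118.7, 1.33], i.e. proportional to (1, -119, 1.33)
```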

We need to estimate the joint distribution of (*X*_{1}, *X*_{2}, *X*_{3}) in order to obtain the corresponding largest area under the ROC curve. If we further assume that (*X*_{1}, *X*_{2}, *X*_{3})* ^{τ}* follows a trivariate normal distribution, its covariance matrix is estimated from the sample as

$$\widehat{\mathrm{\sum}}=\left(\begin{array}{ccc}\hfill 23.88\hfill & \hfill -0.13\hfill & \hfill 3.77\hfill \\ \hfill -0.13\hfill & \hfill 0.012\hfill & \hfill -0.05\hfill \\ \hfill 3.77\hfill & \hfill -0.05\hfill & \hfill 8.05\hfill \end{array}\right),$$

which leads to −3.89 + 0.075*X*_{1} − 8.90*X*_{2} + 0.10*X*_{3} ~ *N*(−4.05, 1.22). It follows from Theorem 3 that the largest AUC under the logistic regression model is **0.798.** The bootstrap standard error based on 250 replications is 0.049.
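This parametric route can be checked numerically: under the fitted *N*(−4.05, 1.22) distribution of the linear score and the logistic link, the integrals in Theorem 3 and Corollary 1 can be evaluated on a dense grid (a Python/NumPy sketch with trapezoidal quadrature, not the authors' code; small differences from the reported 0.798 and 0.788 can arise from rounding of the fitted mean and variance):

```python
import numpy as np

mu, var = -4.05, 1.22                  # fitted distribution of the linear score
sd = np.sqrt(var)

# Dense grid; dG(x) = phi(x) dx, with phi the N(mu, var) density
x = np.linspace(mu - 10.0 * sd, mu + 10.0 * sd, 200_001)
phi = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
dx = np.diff(x)

def integral_dG(y):
    """Trapezoidal approximation of the integral of y(x) dG(x)."""
    f = y * phi
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * dx))

# G(x): distribution function of the score, by cumulative quadrature of phi
G = np.concatenate([[0.0], np.cumsum(0.5 * (phi[1:] + phi[:-1]) * dx)])
h = 1.0 / (1.0 + np.exp(-x))           # logistic link

p1 = integral_dG(h)                    # P(Z = 1), the disease prevalence
p0 = 1.0 - p1
Gh = integral_dG(G * h)                # integral of G(x) h(x) dG(x)
lauc = Gh / (p1 * p0) - p1 / (2.0 * p0)   # Theorem 3
lauc_approx = Gh / p1                     # Corollary 1 (rare-disease form)
```

As Corollary 1 states, the rare-disease approximation sits slightly below the exact value, with the gap bounded by about *P*(*Z* = 1)/2.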

We obtain a similar result under the probit model, that is, *h*(*x*) = Φ(*x*). Here, the fitted model of hip fracture risk is

$$P(Z=1\mid {X}_{1},{X}_{2},{X}_{3})=\mathrm{\Phi}(-2.16+0.036{X}_{1}-4.09{X}_{2}+0.048{X}_{3}),$$

which suggests that *X*^{(}^{p}^{)} = 0.036*X*_{1} − 4.09*X*_{2} + 0.048*X*_{3} is the best linear combination under the probit model. The estimated best coefficients are therefore
$({\alpha}_{1}^{p},{\alpha}_{2}^{p},{\alpha}_{3}^{p})\propto (1,-114,1.34)$, and the corresponding largest AUC is **0.805** (with standard error 0.050). Incidentally, if we use the approximate formula in Corollary 1, these two largest AUCs are estimated to be **0.788** and **0.795**, just about 1% below the exact values.

We can also apply Su and Liu’s method to the hip fracture data. Under their approach, the two conditional distributions are estimated as

$$\left(\begin{array}{l}{X}_{1}\hfill \\ {X}_{2}\hfill \\ {X}_{3}\hfill \end{array}\right)|Z=1~N\left(\left(\begin{array}{l}74.97\hfill \\ 0.56\hfill \\ 5.28\hfill \end{array}\right),\left(\begin{array}{ccc}\hfill 36.83\hfill & \hfill -0.079\hfill & \hfill 8.28\hfill \\ \hfill -0.079\hfill & \hfill 0.007\hfill & \hfill -0.069\hfill \\ \hfill 8.28\hfill & \hfill -0.069\hfill & \hfill 13.23\hfill \end{array}\right)\right)$$

and

$$\left(\begin{array}{l}{X}_{1}\hfill \\ {X}_{2}\hfill \\ {X}_{3}\hfill \end{array}\right)|Z=0~N\left(\left(\begin{array}{l}70.93\hfill \\ 0.66\hfill \\ 3.30\hfill \end{array}\right),\left(\begin{array}{ccc}\hfill 22.92\hfill & \hfill -0.12\hfill & \hfill 3.36\hfill \\ \hfill -0.12\hfill & \hfill 0.012\hfill & \hfill -0.046\hfill \\ \hfill 3.36\hfill & \hfill -0.046\hfill & \hfill 7.76\hfill \end{array}\right)\right)$$

The general method in Section 3 of their paper gives the best coefficients
$({\alpha}_{1}^{s},{\alpha}_{2}^{s},{\alpha}_{3}^{s})\propto (1,-103,1.05)$, with an estimated largest AUC of **0.760**.
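Under the binormal assumption, Su and Liu's best linear direction has the closed form a ∝ (Σ₁ + Σ₀)⁻¹(μ₁ − μ₀). Applying it to the two estimated conditional distributions above (a Python/NumPy sketch, with the covariance matrices taken as symmetric) recovers coefficients close to (1, −103, 1.05); the small differences come from rounding in the reported estimates:

```python
import numpy as np

mu1 = np.array([74.97, 0.56, 5.28])    # mean of (X1, X2, X3) given Z = 1
mu0 = np.array([70.93, 0.66, 3.30])    # mean of (X1, X2, X3) given Z = 0
S1 = np.array([[36.83, -0.079, 8.28],
               [-0.079, 0.007, -0.069],
               [8.28, -0.069, 13.23]])
S0 = np.array([[22.92, -0.12, 3.36],
               [-0.12, 0.012, -0.046],
               [3.36, -0.046, 7.76]])

# Su and Liu (1993): AUC-maximizing direction under binormality
a = np.linalg.solve(S1 + S0, mu1 - mu0)
a = a / a[0]                            # proportional form, first coefficient 1
```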

To compare the goodness of fit of the three models, we directly estimate the ROC curves corresponding to the three coefficient vectors $({\alpha}_{1}^{l},{\alpha}_{2}^{l},{\alpha}_{3}^{l}),({\alpha}_{1}^{p},{\alpha}_{2}^{p},{\alpha}_{3}^{p})$ and $({\alpha}_{1}^{s},{\alpha}_{2}^{s},{\alpha}_{3}^{s})$ empirically, i.e. from the SOF data itself, and then calculate the areas under the curves. It is not surprising that the empirical AUCs are all 0.805, since the three optimal linear combinations produced by the different models differ little. Thus the logistic regression and probit models fit the SOF data very well, while Su and Liu’s method underestimates the area under the ROC curve of the best linear combination.

**5. Discussion and Conclusion**

In medical applications, it is important to use all of the good diagnostic predictors of a disease simultaneously to establish a new predictor with higher statistical utility. Linear combinations of multiple predictors are of particular interest. In this paper, we consider the common setting in which generalized linear models are used to fit data with binary outcomes. Since the linear combination identified by the link function is proved optimal under the ROC criterion, we need only estimate that combination by standard procedures once the generalized linear model passes the usual statistical checks. It is therefore easy for clinicians to obtain the optimal linear combination of multiple diagnostic predictors under generalized linear models.

One referee of Su and Liu’s paper pointed out that “using logistic regression to identify tests that best predict presence or absence of disease is also common.” Although logistic regression is usually less efficient than normal discriminant analysis when the normality assumption holds (Efron, 1975; Ruiz-Velasco, 1991), the generalized linear model for binary data is more robust, because the choice and estimation of the best linear combination require no assumption on the joint distribution of the multiple predictors. Su and Liu acknowledged this feature, stated clearly by Cox and Snell (1989): “once a vector of explanatory variables is given, then the probability that this individual belongs to one of the two groups is determined.” Our SOF data provide a real example illustrating this point.

There are two directions in which this work may be extended. On the one hand, we can extend it from linear to non-linear combinations, which may call for a new concept, the ROC region, to assess the statistical utility of a diagnostic predictor in place of the ROC curve. On the other hand, we may extend our interest from binary data to more complex data with more than two outcomes, as discussed by Yang and Carlin (2000) using an ROC surface approach. Both lines of research are currently under investigation.

**Acknowledgments**

The study was supported by grants from the National Institutes of Health (R01EB004079) and the National Bureau of Statistics of China (LX:2006B45).

**Appendix**

**Proof of Theorem 1.** Without loss of generality, let *α*_{2} = *β*_{2}. For simplicity, let *α*_{1} = *α*.

Suppose that *β*_{2} > 0 and *h* is strictly increasing. Let *f*(*x*_{1}, *x*_{2}) be the probability density function of the two-dimensional continuous variable (*X*_{1}, *X*_{2}), and let $\overline{h}(x)=1-h(x)$. Then the specificity of the linear combination *αX*_{1} + *β*_{2}*X*_{2} can be expressed as

$$Sp(c)=P(\alpha {X}_{1}+{\beta}_{2}{X}_{2}\le c\mid Z=0)=\frac{1}{P(Z=0)}\underset{\alpha {x}_{1}+{\beta}_{2}{x}_{2}\le c}{\iint}P(Z=0\mid ({x}_{1},{x}_{2}))f({x}_{1},{x}_{2})d{x}_{1}d{x}_{2}=\frac{1}{P(Z=0)}\underset{-\infty}{\overset{\infty}{\int}}\left[\underset{-\infty}{\overset{\frac{c-\alpha {x}_{1}}{{\beta}_{2}}}{\int}}\overline{h}({\beta}_{0}+{\beta}_{1}{x}_{1}+{\beta}_{2}{x}_{2})\cdot f({x}_{1},{x}_{2})d{x}_{2}\right]d{x}_{1}$$

and, therefore,

$$\underset{-\infty}{\overset{\infty}{\int}}\left[\underset{-\infty}{\overset{\frac{c-\alpha {x}_{1}}{{\beta}_{2}}}{\int}}\overline{h}({\beta}_{0}+{\beta}_{1}{x}_{1}+{\beta}_{2}{x}_{2})\xb7f({x}_{1},{x}_{2})d{x}_{2}\right]d{x}_{1}={S}_{p}\xb7P(Z=0)$$

(A1)

For any given specificity *Sp*, equation (A1) defines *c* as a differentiable function of *α*, denoted *c*(*α*) below. Furthermore, differentiating both sides of equation (A1) with respect to *α* yields

$$\underset{-\infty}{\overset{\infty}{\int}}\overline{h}({\beta}_{0}+{\beta}_{1}x+c(\alpha )-\alpha x)\xb7f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})\xb7\frac{{c}^{\prime}(\alpha )-x}{{\beta}_{2}}dx=0$$

(A2)

Evaluating (A2) at the specific choice *α* = *β*_{1}, and noting that the nonzero factor $\overline{h}({\beta}_{0}+{\beta}_{1}x+c({\beta}_{1})-{\beta}_{1}x)=\overline{h}({\beta}_{0}+c({\beta}_{1}))$ does not depend on *x* and so can be taken out of the integral, leads to the equality:

$$\underset{-\infty}{\overset{\infty}{\int}}\frac{{c}^{\prime}({\beta}_{1})-x}{{\beta}_{2}}f(x,\frac{c({\beta}_{1})-{\beta}_{1}x}{{\beta}_{2}})dx=0$$

(A3)

where *c*′(*α*) denotes the derivative of *c*(*α*) with respect to *α*.

As for the sensitivity of *αX*_{1} + *β*_{2}*X*_{2}, it can be rewritten as

$$\begin{array}{l}Sn(c(\alpha ))=P(\alpha {X}_{1}+{\beta}_{2}{X}_{2}>c(\alpha )\mid Z=1)=1-\frac{1}{P(Z=1)}\underset{\alpha {x}_{1}+{\beta}_{2}{x}_{2}\le c(\alpha )}{\iint}P(Z=1\mid ({x}_{1},{x}_{2}))f({x}_{1},{x}_{2})d{x}_{1}d{x}_{2}\\ =1-\frac{1}{P(Z=1)}\underset{-\infty}{\overset{\infty}{\int}}\left[\underset{-\infty}{\overset{\frac{c(\alpha )-\alpha {x}_{1}}{{\beta}_{2}}}{\int}}h({\beta}_{0}+{\beta}_{1}{x}_{1}+{\beta}_{2}{x}_{2})f({x}_{1},{x}_{2})d{x}_{2}\right]d{x}_{1}\\ =1-\frac{1}{P(Z=1)}\left(\underset{-\infty}{\overset{\infty}{\int}}\left[\underset{-\infty}{\overset{\frac{c(\alpha )-\alpha {x}_{1}}{{\beta}_{2}}}{\int}}f({x}_{1},{x}_{2})d{x}_{2}\right]d{x}_{1}-{S}_{p}\cdot P(Z=0)\right)\end{array}$$

by utilizing (A1). So, we just need to prove that
$S(\alpha )=\underset{-\infty}{\overset{\infty}{\int}}\left[\underset{-\infty}{\overset{\frac{c(\alpha )-\alpha {x}_{1}}{{\beta}_{2}}}{\int}}f({x}_{1},{x}_{2})d{x}_{2}\right]d{x}_{1}$ can reach the absolute minimum at the point *α* = *β*_{1} under the restriction of (A1) or (A2).

Because

$${S}^{\prime}(\alpha )=\underset{-\infty}{\overset{\infty}{\int}}\frac{{c}^{\prime}(\alpha )-x}{{\beta}_{2}}f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx,$$

(A4)

and, evaluating at *α* = *β*_{1} and applying (A3),

$${S}^{\prime}({\beta}_{1})=\underset{-\infty}{\overset{\infty}{\int}}\frac{{c}^{\prime}({\beta}_{1})-x}{{\beta}_{2}}f(x,\frac{c({\beta}_{1})-{\beta}_{1}x}{{\beta}_{2}})dx=0$$

(A5)

On the other hand, it is easy to see that, for any fixed *α*,
$\frac{1}{{\beta}_{2}}\underset{-\infty}{\overset{\infty}{\int}}f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx$ is the value of the density function of the linear combination *αX*_{1} + *β*_{2}*X*_{2} at the point *c*(*α*). Let
$p(x)=\frac{f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})}{\underset{-\infty}{\overset{\infty}{\int}}f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}$ be the density function of a random variable ** U**, and let $g(x)=\overline{h}({\beta}_{0}+({\beta}_{1}-\alpha )x+c(\alpha ))$. When *α* < *β*_{1} and *h* is strictly increasing, *g* is strictly decreasing, so the lemma gives

$$E[U\cdot g(U)]<E(U)\cdot E(g(U))$$

which leads to the following inequality:

$$\frac{\underset{-\infty}{\overset{\infty}{\int}}xg(x)f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}{\underset{-\infty}{\overset{\infty}{\int}}g(x)f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}<\frac{\underset{-\infty}{\overset{\infty}{\int}}xf(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}{\underset{-\infty}{\overset{\infty}{\int}}f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}$$

(A6)

From (A2), we see that

$$\frac{\underset{-\infty}{\overset{\infty}{\int}}xg(x)f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}{\underset{-\infty}{\overset{\infty}{\int}}g(x)f(x,\frac{c(\alpha )-\alpha x}{{\beta}_{2}})dx}={c}^{\prime}(\alpha )$$

(A7)

Combining (A4), (A6) and (A7), we obtain that *S*′(*α*) < 0 when *α* < *β*_{1}.

Similarly, we can prove that *S*′(*α*) > 0 when *α* > *β*_{1}. Therefore, combining with (A5), we finish the proof that *S*(*α*) has the absolute minimum, hence *Sn*(*c*(*α*)) has the absolute maximum, at *α* = *β*_{1} when *β*_{2} > 0 and *h* is strictly increasing.

In a similar way, it is easy to prove that the coefficients of the best linear combination providing the highest sensitivity uniformly satisfy (*α*_{1}, *α*_{2}) ∝ (*β*_{1}, *β*_{2}) when *β*_{2} < 0 or *h* is strictly decreasing. This completes the proof of the theorem.

**Proof of Theorem 3.** Once again, let $\overline{h}(x)=1-h(x)$ and write *X* = *β*_{0} + *β*_{1}*X*_{1} + ··· + *β _{k}X_{k}*. It follows from (2) that

$$P(Z=1\mid X)=h(X)$$

So we have *P*(*Z* = 1) = ∫*h*(*x*)*dG*(*x*), and the specificity of the best linear combination can be expressed as

$$Sp(c)=P(X\le c\mid Z=0)=\frac{1}{P(Z=0)}\underset{x\le c}{\int}P(Z=0\mid x)\phantom{\rule{0.38889em}{0ex}}dG(x)=\frac{1}{P(Z=0)}\underset{-\infty}{\overset{c}{\int}}\overline{h}(x)dG(x)$$

Then $\mathit{dSp}(c)=\frac{1}{P(Z=0)}\overline{h}(c)dG(c)$, and

$$\int \frac{1}{P(Z=1)}G(c)\mathit{dSp}(c)=\frac{1}{P(Z=1)P(Z=0)}\int G(c)\overline{h}(c)dG(c).$$

On the other hand, its sensitivity is

$$\begin{array}{l}Sn(c)=P(X>c\mid Z=1)=1-\frac{1}{P(Z=1)}\underset{x\le c}{\int}P(Z=1\mid x)\phantom{\rule{0.38889em}{0ex}}dG(x)=1-\frac{1}{P(Z=1)}\underset{-\infty}{\overset{c}{\int}}h(x)dG(x)\\ =1+\frac{P(Z=0)}{P(Z=1)}Sp(c)-\frac{1}{P(Z=1)}G(c)\end{array}$$

Then the largest area under the ROC curve is given by

$$\begin{array}{l}\mathit{AUC}=\int Sn(c)\mathit{dSp}(c)=\int \left(1+\frac{P(Z=0)}{P(Z=1)}Sp(c)\right)\mathit{dSp}(c)-\int \frac{1}{P(Z=1)}G(c)\mathit{dSp}(c)\\ =\left(1+\frac{P(Z=0)}{2P(Z=1)}\right)-\frac{1}{P(Z=1)P(Z=0)}\int G(c)\overline{h}(c)dG(c)\\ =\left(1+\frac{P(Z=0)}{2P(Z=1)}\right)-\frac{1}{P(Z=1)P(Z=0)}\int G(c)(1-h(c))dG(c)\\ =\frac{1}{P(Z=1)P(Z=0)}\int G(c)h(c)dG(c)+\left(1+\frac{P(Z=0)}{2P(Z=1)}-\frac{1}{2P(Z=1)P(Z=0)}\right)\\ =\frac{1}{P(Z=1)P(Z=0)}\int G(x)h(x)dG(x)-\frac{P(Z=1)}{2P(Z=0)}.\end{array}$$


**References**

- Cox DR, Snell EJ. The Analysis of Binary Data. London: Chapman and Hall; 1989. p. 132.
- Cummings SR, Nevitt MC, Browner WS, et al. Risk factors for hip fracture in white women. Study of Osteoporotic Fractures Research Group. The New England Journal of Medicine. 1995;332:767–773.
- DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
- Efron B. The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association. 1975;70:892–898.
- Hans D, Srivastav SK, Singal C, et al. Does combining the results from multiple bone sites measured by a new quantitative ultrasound device improve discrimination of hip fracture? Journal of Bone and Mineral Research. 1999;14:644–651.
- Hardy GH, Littlewood JE, Polya G. Inequalities. London and New York: Cambridge University Press; 1959. p. 43.
- Jin H, Lu Y, Stone KL, et al. Classification algorithms for hip fracture prediction based on recursive partitioning methods. Medical Decision Making. 2004;24(4):386–397.
- Pepe MS. A regression modelling framework for receiver operating characteristic curves in medical diagnostic testing. Biometrika. 1997;85:595–608.
- Richards RJ, Hammitt JK, Tsevat J. Finding the optimal multiple-test strategy using a method analogous to logistic regression analysis. Medical Decision Making. 1995;16:367–375.
- Ruiz-Velasco S. Asymptotic efficiency of logistic regression relative to linear discriminant analysis. Biometrika. 1991;78:235–243.
- Su J, Liu J. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association. 1993;88:1350–1355.
- Yang H, Carlin D. ROC surface: a generalization of ROC curve analysis. Journal of Biopharmaceutical Statistics. 2000;10:183–196.
