Extensive simulations are conducted to examine the finite-sample performance of the proposed method. We generate the covariates {*X*_{1},…, *X*_{n}} and {*Y*_{1},…, *Y*_{m}} from the following models:

- (multivariate normal) and , where *X*_{i} and *Y*_{j} are *p*-dimensional random vectors.
- (normal mixture) and , i.e. 20% of the marker values in both cases and controls are contaminated by a common error distribution.
- (multivariate log-normal) and .
- (log-normal mixture) and .

In the above settings, we let μ_{1}=(1, 0, 0,…, 0)′, μ_{2}=(0, 1, 0,…, 0)′, μ_{3}=(1, 1, 1, 0,…, 0)′ and Σ_{3}=5*I*_{p}, where *I*_{p} is the identity matrix. We consider several configurations of *n*, *m* and *p* to investigate the operational characteristics of the proposed method.
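The settings above can be sketched as follows. This is a minimal simulation sketch, not the paper's exact specification: the covariance structures of the uncontaminated components are assumed to be identity, the contamination is assumed to draw from N(μ_{3}, Σ_{3}) with Σ_{3}=5*I*_{p}, and the log-normal settings are assumed to exponentiate the corresponding normal draws; the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_setting(setting, n=50, m=50, p=3, rng=rng):
    """Sketch of the four data-generating settings described in the text.
    Covariance structures and the contamination mechanism are assumptions,
    not the paper's exact formulas."""
    mu1 = np.zeros(p); mu1[0] = 1.0              # mu_1 = (1, 0, ..., 0)'
    mu2 = np.zeros(p); mu2[1] = 1.0              # mu_2 = (0, 1, 0, ..., 0)'
    X = rng.standard_normal((n, p)) + mu1        # cases,    assumed N(mu_1, I_p)
    Y = rng.standard_normal((m, p)) + mu2        # controls, assumed N(mu_2, I_p)
    if setting in ("normal mixture", "log-normal mixture"):
        # contaminate 20% of observations in both groups with a common
        # error distribution, assumed N(mu_3, 5 I_p)
        mu3 = np.zeros(p); mu3[:min(3, p)] = 1.0  # mu_3 = (1, 1, 1, 0, ..., 0)'
        for Z in (X, Y):
            k = int(0.2 * len(Z))
            idx = rng.choice(len(Z), size=k, replace=False)
            Z[idx] = mu3 + np.sqrt(5.0) * rng.standard_normal((k, p))
    if setting in ("multivariate log-normal", "log-normal mixture"):
        X, Y = np.exp(X), np.exp(Y)              # log-normal margins by exponentiation
    return X, Y

X, Y = simulate_setting("normal mixture", n=50, m=50, p=3)
```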

First, we examine the scenario where the number of covariates is low relative to the sample size. To this end, we let *p*=3 and *n*=*m*=50. For each generated dataset, we construct a linear combination of the covariates as a score differentiating cases from controls, where the weights of the linear combination are estimated by minimizing (i) *M*_{1}(β), (ii) *M*_{2}(β), (iii) the loss function *S*(β) proposed in Ma and Huang (2007), and (iv) by fitting a regular logistic regression. We also implement the popular ada-boosting with 300 iterations using the simple stump as the base classifier (Friedman *et al.*, 2000). The continuous class probability from the ada-boosting-trained ensemble is used to generate the ROC curve. In minimizing *S*(β), we first identify the 'anchor covariate' with the most significant *p*-value from the *t*-test comparing the covariate distribution between cases and controls, and set its regression coefficient to +1 or −1 depending on the sign of the *t*-statistic. The σ in *S*(β) is then selected as 20% of the mean group difference of the anchor covariate, as suggested in Ma and Huang (2007). We then calculate the AUCs in an independent test set consisting of 2000 cases and 2000 controls for all five constructed scores. The boxplots of AUCs over 250 replications in each setting are plotted in . As expected, the AUCs in the training sets are higher than their counterparts in the validation sets. In most cases, including the first setting, where logistic regression estimates the optimal combination in terms of maximizing the AUC, the scores based on *M*_{1}(β), *M*_{2}(β), *S*(β) and logistic regression perform similarly in terms of AUC in the validation sets. In general, the score based on ada-boosting has the lowest AUC, which could be due to overfitting, as indicated by the high AUCs in the training set. Furthermore, the score based on the one-step adaptive hinge loss function performs similarly to, or slightly better than, that based on the hinge loss function itself.
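The anchor-covariate selection and the choice of σ for *S*(β) described above can be sketched as follows; the helper name is illustrative, and the per-covariate two-sample *t*-test is as stated in the text.

```python
import numpy as np
from scipy import stats

def anchor_and_sigma(X, Y):
    """Pick the anchor covariate (smallest two-sample t-test p-value),
    fix its coefficient at +1 or -1 by the sign of the t-statistic, and
    set sigma to 20% of the anchor's mean group difference, as described
    in the text (function name is illustrative)."""
    t, p = stats.ttest_ind(X, Y, axis=0)   # per-covariate t-tests, cases vs controls
    j = int(np.argmin(p))                  # anchor = most significant covariate
    coef = 1.0 if t[j] > 0 else -1.0       # coefficient fixed at +1 or -1
    sigma = 0.2 * abs(X[:, j].mean() - Y[:, j].mean())
    return j, coef, sigma
```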

Second, we examine the performance of the proposed method for covariates of moderate dimension. In this case, we let *p*=200 and *n*=*m*=50, and the lasso regularization is used for selecting the important features in logistic regression. The forward stagewise algorithm similar to that presented in Section 2.2 for minimizing *M*_{2}(β) is also used to minimize *S*(β). We choose the popular lasso penalty mainly for the purpose of fair comparison, i.e. evaluating the relative performance of the various methods under similar regularization schemes. The boxplots of AUCs in independent test sets over 250 replications are plotted in . In general, the scores based on *M*_{j}(β) perform better than those based on the alternatives in terms of average AUC in the test sets. Furthermore, the AUCs from scores constructed via *M*_{j}(β) also tend to have smaller variability than their counterparts. In the most challenging fourth setting, the empirical average AUC in the test sets is 0.66 for the score minimizing *M*_{1}(β), 0.66 for the score minimizing the hinge loss *M*_{2}(β), 0.60 for the score minimizing *S*(β), 0.60 for the score from the logistic regression fit and 0.63 for the score from ada-boosting using three markers. An increase from 0.60–0.63 to 0.66 in the AUC is often considered non-trivial in clinical practice.
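The test-set AUCs reported throughout are empirical AUCs of a linear score; a minimal sketch using the Mann-Whitney form of the empirical AUC (function name illustrative):

```python
import numpy as np

def empirical_auc(score_cases, score_controls):
    """Empirical AUC as the Mann-Whitney statistic: the fraction of
    case/control pairs in which the case scores higher (ties count 1/2)."""
    sx = np.asarray(score_cases, dtype=float)[:, None]
    sy = np.asarray(score_controls, dtype=float)[None, :]
    return float(np.mean((sx > sy) + 0.5 * (sx == sy)))

# For a fitted weight vector beta and an independent test set:
# auc = empirical_auc(X_test @ beta, Y_test @ beta)
```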

Lastly, we examine the case of high-dimensional covariates. Here, we let *p*=20 000 and *n*=*m*=50. To save computational time, the ada-boosting is only applied to the top 500 features selected based on the significance levels of *t*-tests comparing cases and controls in the training set. The simulation results are presented in . For the high-dimensional covariates, the relative performance of the proposed methods is even better than that in the previous case where *p*=200. For example, in the third setting, the empirical average AUC in the test sets is 0.85 for the score minimizing *M*_{1}(β), 0.85 for the score minimizing the hinge loss, 0.76 for the score minimizing *S*(β), 0.74 for the score from the logistic regression fit and 0.64 for the score from ada-boosting using three markers. Similarly, in the fourth setting, the empirical average AUCs in the test sets are 0.56, 0.56, 0.53, 0.53 and 0.53 for the aforementioned five methods.
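The top-500 screening step used before ada-boosting can be sketched as below: rank the features by two-sample *t*-test *p*-value on the training set and keep the *k* smallest. The function name is illustrative.

```python
import numpy as np
from scipy import stats

def screen_top_features(X, Y, k=500):
    """Keep the k features with the smallest two-sample t-test p-values
    comparing cases (X) and controls (Y) in the training set, as a
    pre-filter before ada-boosting (sketch; name is illustrative)."""
    _, p = stats.ttest_ind(X, Y, axis=0)     # per-feature p-values
    keep = np.argsort(p)[:k]                 # indices of the k smallest p-values
    return X[:, keep], Y[:, keep], keep
```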