We conducted leave-one-out (LOO) cross-validation (CV) experiments on three binary biological/clinical datasets. The first is a Parkinson’s disease dataset,^{4,5} comprising 195 instances with 22 components. The second is a colon cancer microarray dataset,^{6} preprocessed by,^{7} comprising 63 instances with 2,000 components. We ranked the genes of the colon cancer dataset by a simple index (BSS/WSS) as described in,^{8} narrowing the dataset down to 500 components. The third is a breast cancer dataset,^{9} comprising 683 instances with 10 components. The results of our algorithms on the three clinical datasets are shown in , and , respectively.
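
As an illustration of the gene-ranking step, the following is a minimal sketch of a BSS/WSS ranking. It assumes the usual definition of the index, namely the ratio of the between-class to the within-class sum of squares computed separately for each component; the function names `bss_wss` and `top_components` are hypothetical and do not come from the cited work.

```python
from collections import defaultdict


def bss_wss(X, y):
    """Return the BSS/WSS ratio of every column of X.

    X is a list of rows (one row per instance), y the class labels.
    Assumption: BSS/WSS is the between-class sum of squares divided by
    the within-class sum of squares, computed per component.
    """
    n, p = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(p):
        overall_mean = sum(row[j] for row in X) / n
        bss = 0.0
        wss = 0.0
        for rows in groups.values():
            class_mean = sum(r[j] for r in rows) / len(rows)
            bss += len(rows) * (class_mean - overall_mean) ** 2
            wss += sum((r[j] - class_mean) ** 2 for r in rows)
        if wss > 0:
            scores.append(bss / wss)
        else:
            # No within-class spread: perfectly separating if bss > 0,
            # uninformative (constant column) otherwise.
            scores.append(float("inf") if bss > 0 else 0.0)
    return scores


def top_components(X, y, m):
    """Indices of the m components with the largest BSS/WSS ratio."""
    scores = bss_wss(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:m]
```

For the colon cancer data, a call such as `top_components(X, y, 500)` would correspond to keeping 500 of the 2,000 components, under the assumptions stated above.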

For the Parkinson’s disease dataset, both SPRT-random and MSPRT-random outperform LDA at the chosen *α* and *β* values. SPRT-ordered outperforms LDA at some *α* and *β* values, while MSPRT-ordered is comparable to LDA at some *α* and *β* values. The highest accuracy rates are achieved by SPRT-random and MSPRT-random at *α* = *β* = 0.32 and *α* = *β* = 0.175, respectively. For the colon cancer dataset, MSPRT-ordered reaches the maximum accuracy rate among all the methods at *α* = *β* = 0.22. Most accuracy rates of SPRT-ordered are just below those of MSPRT-ordered. Also note that SPRT-random reaches the LDA accuracy rate, but MSPRT-random falls below it. As with the Parkinson’s disease dataset, SPRT-ordered and MSPRT-ordered outperform LDA at some *α* and *β* values. Finally, for the breast cancer dataset, both SPRT-ordered and MSPRT-ordered reach peak accuracy rates that are above the LDA accuracy rate. Note that the two peak values are the same but occur at different *α* and *β* values. SPRT-random and MSPRT-random both fall short of the LDA accuracy rate.

As described in Sections 2.2 and 2.3, SPRT and MSPRT may not use all the components before arriving at a decision. Hence, we investigated the relationship between prediction accuracy and the average number of components examined by SPRT-ordered and MSPRT-ordered using the colon cancer dataset, since it has the largest number of components. For comparison with LDA, we ordered the components by their variances and used only the *K* components with the largest variances to conduct LOO CV experiments, where *K* ranges from 5 to 50. Unlike SPRT-random and MSPRT-random, LDA always uses all *K* components to infer the class label of a new instance. shows scatter plots of accuracy rate versus number of components for the three methods. It is evident that MSPRT-ordered requires considerably fewer components than LDA to attain the maximal accuracy rate of 0.8871. SPRT-ordered also requires fewer than 10 components on average to achieve its performance peak. It may appear that LDA reaches its performance peak at *K* = 48 and remains at the peak as *K* increases. This is not the case, since *K* can go up to 500, the total number of components for this dataset. At *K* = 500, the classifier is simply LDA without component selection, and its accuracy rate is 0.8548 (see ), which is less than the maximal accuracy rate.
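
The variance-based baseline can be sketched as follows. This is only an outline under stated assumptions: the *K* components are re-selected on each training fold (the text does not say whether selection was done inside or outside the LOO loop), and a nearest-centroid rule stands in for LDA so that the example has no external dependencies. All function names are hypothetical.

```python
from statistics import variance


def top_k_by_variance(X, k):
    """Indices of the k columns of X with the largest sample variance."""
    p = len(X[0])
    variances = [variance([row[j] for row in X]) for j in range(p)]
    return sorted(range(p), key=lambda j: variances[j], reverse=True)[:k]


def fit_centroids(X, y):
    """Stand-in classifier (NOT LDA): store one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label of the centroid closest to x (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def loo_accuracy(X, y, k, fit=fit_centroids, predict=predict_centroid):
    """Leave-one-out accuracy using only the k highest-variance columns.

    X and y are plain lists. Assumption: the columns are chosen on the
    training fold only, so the held-out instance never influences them.
    """
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        cols = top_k_by_variance(train_X, k)
        model = fit([[r[j] for j in cols] for r in train_X], train_y)
        if predict(model, [X[i][j] for j in cols]) == y[i]:
            correct += 1
    return correct / len(X)
```

Passing an LDA implementation through the `fit` and `predict` arguments and sweeping `k` from 5 to 50 would give an accuracy-versus-*K* curve of the kind discussed above.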

We note that the desired error rates, *α* and *β*, in SPRT/MSPRT are treated as model parameters, which can be tuned by CV experiments on a training dataset. The LOO CV results presented above show that, at suitably chosen *α* and *β* values, SPRT-ordered and MSPRT-ordered achieve higher prediction accuracy than LDA. Moreover, on average SPRT-ordered and MSPRT-ordered require fewer components than LDA to achieve the same accuracy. In this sense, SPRT-ordered and MSPRT-ordered perform implicit feature selection when labeling a new instance. Consequently, we do not need to search for the optimal number of components, as was done for LDA.
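
To make the role of *α* and *β* concrete, the following is a minimal sketch of a two-class sequential test with Wald's classical stopping thresholds. It is not the implementation used in the experiments. It assumes independent Gaussian components with class-specific means and a shared per-component variance, and it forces a decision by the sign of the accumulated log-likelihood ratio if the components run out; the function name and its arguments are hypothetical.

```python
import math


def sprt_classify(x, mean0, mean1, var, alpha, beta, order=None):
    """Classify x sequentially; return (label, number of components used).

    alpha: desired probability of choosing class 1 when class 0 is true.
    beta:  desired probability of choosing class 0 when class 1 is true.
    order: sequence of component indices to examine (default: 0, 1, ...).
    Assumptions: independent Gaussian components with shared variance.
    """
    upper = math.log((1 - beta) / alpha)   # accept class 1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept class 0 at or below this
    if order is None:
        order = range(len(x))

    llr = 0.0   # accumulated log-likelihood ratio, class 1 versus class 0
    used = 0
    for j in order:
        # log N(x_j | mean1_j, var_j) - log N(x_j | mean0_j, var_j)
        llr += ((x[j] - mean0[j]) ** 2 - (x[j] - mean1[j]) ** 2) / (2 * var[j])
        used += 1
        if llr >= upper:
            return 1, used
        if llr <= lower:
            return 0, used
    # Components exhausted without crossing a threshold (assumed tie rule).
    return (1 if llr > 0 else 0), used
```

Smaller *α* and *β* push the two thresholds further apart, so more components are examined before a label is assigned. This is the sense in which tuning the error rates also controls how many components are used.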

Inspired by the random forest algorithm,^{11} SPRT-random and MSPRT-random can be viewed as ensemble classification algorithms, where a run of SPRT-random or MSPRT-random is analogous to a decision tree. What is different is that we did not perform bootstrapping on the training dataset for each run as in random forest. It is difficult, however, to compare SPRT-random and MSPRT-random to the other methods investigated in this work since the accuracy rates are available only at a few *α* and *β* values. More experiments need to be done at a range of *α* and *β* values to better understand the behavior of SPRT-random and MSPRT-random. We will also investigate the effect of introducing bootstrapping into our SPRT-random and MSPRT-random algorithms.

Finally, although Fu^{2} assumed independence among components, we note that this assumption is not necessary. This follows from the fact that the proof of Theorem 3.2.1 in^{10} does not assume independence among components. As long as the joint distribution of the components is known, the SPRT and MSPRT algorithms will work correctly. Because of the normality assumption, we have the joint distribution for each class immediately after estimation of the mean vectors and dispersion matrix. Consequently, obtaining independent components is not necessary for SPRT and MSPRT. We can directly sample the original (possibly dependent) components, resulting in new variants of SPRT and MSPRT. These novel variants of SPRT and MSPRT will be investigated in our future work.
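
As a sketch of why independence is not needed under the normality assumption, the log-likelihood ratio after examining the first *n* (possibly dependent) components can be written in closed form. The notation below is introduced only for illustration and is an assumption: $\mu^{(0)}_{1:n}$ and $\mu^{(1)}_{1:n}$ denote the class mean vectors restricted to the first *n* components, and $\Sigma_{1:n}$ denotes the corresponding block of a dispersion matrix assumed to be shared by both classes.

```latex
\Lambda_n
  = \log \frac{\mathcal{N}\bigl(x_{1:n};\, \mu^{(1)}_{1:n},\, \Sigma_{1:n}\bigr)}
              {\mathcal{N}\bigl(x_{1:n};\, \mu^{(0)}_{1:n},\, \Sigma_{1:n}\bigr)}
  = \bigl(\mu^{(1)}_{1:n} - \mu^{(0)}_{1:n}\bigr)^{\top}
    \Sigma_{1:n}^{-1}
    \Bigl(x_{1:n} - \tfrac{1}{2}\bigl(\mu^{(0)}_{1:n} + \mu^{(1)}_{1:n}\bigr)\Bigr)
```

With Wald's thresholds, sampling would stop as soon as $\Lambda_n \ge \log\frac{1-\beta}{\alpha}$ or $\Lambda_n \le \log\frac{\beta}{1-\alpha}$. When $\Sigma_{1:n}$ is diagonal, the expression reduces to a sum of per-component terms, which is the independent case.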