The mining of high-dimensional data, in which the number of features is much larger than the number of samples, has become increasingly important, especially in genomics, proteomics, biomedical imaging and other areas of systems biology [1]. The availability of high-dimensional data, along with new scientific problems, has significantly challenged traditional statistical theory and reshaped statistical thinking [2].

The high dimensionality of functional genomic data sets poses problems for building classifiers. Because of the sparsity of data in high-dimensional spaces, many classical classification methods break down. For example, Fisher's discriminant rule becomes inapplicable because the within-class scatter matrix becomes singular when the number of variables exceeds the number of samples [3,4].
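This singularity is easy to verify numerically. The following sketch (an illustration with made-up dimensions, not data from the paper) builds the within-class scatter matrix for a two-class problem with more variables than samples and checks its rank: each class of 5 centered samples contributes rank at most 4, so the 50 x 50 matrix has rank at most 8 and cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                      # 10 samples, 50 variables (p > n)
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)      # two classes of 5 samples each

Sw = np.zeros((p, p))
for c in (0, 1):
    Xc = X[y == c]
    Xc = Xc - Xc.mean(axis=0)      # center within class
    Sw += Xc.T @ Xc                # within-class scatter

rank = np.linalg.matrix_rank(Sw)
print(rank, p)                     # rank is at most n - 2 = 8, far below p = 50
```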

Another problem is caused by the small sample size. The number of samples is usually not adequate to be representative of the total population. Moreover, classifiers built on small sample sets are often unstable and may have a large variance in the number of misclassifications [5]. One common approach to this problem is to aggregate many classifiers instead of using a single one. There has been considerable recent interest in applying aggregating methods to the classification of high-dimensional data [6-11]. The best-known method in this class of techniques is perhaps bootstrap aggregating (bagging). Breiman found that gains in accuracy could be obtained by bagging when the base learner is not stable [6]. However, Vu and Braga-Neto argued that the use of bagging in the classification of small-sample data increases computational cost but is not likely to improve overall classification accuracy over simpler classification rules [10]. Moreover, if the sample size is small, the gains achieved via a bagged ensemble may not compensate for the decrease in accuracy of the individual models [11].
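The bagging idea can be sketched in a few lines. The base learner, the majority-vote scheme and all names below are illustrative assumptions (a nearest-centroid classifier), not the specific classifiers studied in the works cited above: each bootstrap resample of the training set yields one classifier, and the ensemble predicts by majority vote.

```python
import numpy as np

rng = np.random.default_rng(1)

def nearest_centroid_fit(X, y):
    # one centroid per class
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = list(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

def bagging_predict(X_train, y_train, X_test, n_boot=25):
    votes = []
    n = len(y_train)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # bootstrap resample with replacement
        model = nearest_centroid_fit(X_train[idx], y_train[idx])
        votes.append(nearest_centroid_predict(model, X_test))
    votes = np.stack(votes)
    # majority vote over the ensemble
    return np.array([np.bincount(col).argmax() for col in votes.T])

# toy two-class data, well separated
X_train = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y_train = np.repeat([0, 1], 20)
pred = bagging_predict(X_train, y_train, X_train)
```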

Cross-validation is probably the most widely used method for estimating prediction error. In modeling small-sample, high-dimensional data, *k*-fold cross-validation is often used [1]. The *k*-fold cross-validation estimate is a stochastic variable that depends on the partition of the data set. Full cross-validation, i.e., performing all possible partitionings, would give an accurate estimate but is computationally too expensive. Repeating *k*-fold cross-validation multiple times using different splits therefore provides a good Monte-Carlo estimate of full cross-validation [12]. This repeated procedure results in a large number of classifiers.
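The repetition scheme can be sketched as follows. This is a minimal illustration under assumed names and a toy nearest-centroid learner (not the paper's model): each repeat draws a fresh random partition into *k* folds, and the fold-wise error rates are averaged into one Monte-Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit(X, y):
    # toy base learner: one centroid per class
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    classes = list(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

def repeated_kfold_error(X, y, k=5, repeats=20):
    errors = []
    for _ in range(repeats):                  # a new random partition each repeat
        folds = np.array_split(rng.permutation(len(y)), k)
        for test_idx in folds:
            mask = np.ones(len(y), dtype=bool)
            mask[test_idx] = False            # hold out one fold
            model = fit(X[mask], y[mask])
            errors.append(np.mean(predict(model, X[test_idx]) != y[test_idx]))
    return np.mean(errors)                    # Monte-Carlo estimate over k*repeats folds

X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.repeat([0, 1], 20)
err = repeated_kfold_error(X, y)
```

Note that each of the `k * repeats` folds trains its own classifier, which is the pool of models that an aggregation scheme can exploit.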

In this paper, we aggregated the classifiers obtained from principal component discriminant analysis (PCDA) with a double cross-validation scheme [13]. PCDA is an adaptation of Fisher's linear discriminant analysis (FLDA) for high-dimensional data. In PCDA, the dimensionality of the data is reduced by principal component analysis (PCA). In the reduced-dimensional space the within-class scatter matrix is nonsingular and classical LDA can be performed [13-16]. A double cross-validation scheme was used to estimate both the number of principal components and the prediction error of the PCDA model [17]. The classifiers obtained from the different cross-validation loops are aggregated into a single classifier. This approach is tested on simulated data and on gene expression, proteomics and metabolomics data. The results may provide insights into the use of aggregated classifiers for small-sample, high-dimensional biological data.
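The core PCA-then-LDA idea can be sketched as below. This is a hypothetical two-class illustration, not the authors' implementation: the data are projected onto a few principal components (in practice the number would be chosen by the double cross-validation scheme), and Fisher's discriminant is computed in that reduced space, where the within-class scatter matrix is invertible.

```python
import numpy as np

rng = np.random.default_rng(3)

def pcda_fit(X, y, n_comp=3):
    mu = X.mean(axis=0)
    # PCA via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:n_comp].T                         # p x n_comp loading matrix
    Z = (X - mu) @ P                          # scores in the reduced space
    # Fisher's LDA (two classes) in the reduced space
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    Sw = sum(np.cov(Z[y == c].T) * (np.sum(y == c) - 1) for c in (0, 1))
    w = np.linalg.solve(Sw, m1 - m0)          # Sw is n_comp x n_comp, invertible
    thresh = w @ (m0 + m1) / 2                # midpoint decision threshold
    return mu, P, w, thresh

def pcda_predict(model, X):
    mu, P, w, thresh = model
    return (((X - mu) @ P) @ w > thresh).astype(int)

# toy high-dimensional data: 30 samples, 100 variables
X = np.vstack([rng.normal(0, 1, (15, 100)), rng.normal(1.5, 1, (15, 100))])
y = np.repeat([0, 1], 15)
model = pcda_fit(X, y)
acc = (pcda_predict(model, X) == y).mean()
```

Aggregation would then combine the `w` vectors (or the votes) of the models fitted in the different cross-validation loops, as discussed above.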