With the successful completion of the Human Genome Project (HGP), we are entering the post-genomic era. Faced with massive amounts of data, traditional biological experiments and data analysis techniques encounter great challenges. In this situation, cDNA microarrays and high-density oligonucleotide chips have emerged as global (genome-wide or system-wide) experimental approaches that are effective for the systematic analysis of large-scale genome data. Because they can simultaneously measure the activities and interactions of thousands of genes, microarrays promise new insights into the mechanisms of living systems and are attracting growing interest in both scientific research and industrial applications. Meanwhile, further biological and medical research has in turn promoted the development and application of microarrays.
Microarray experiments typically address two main kinds of problems: finding co-regulated genes for classification based on cell-type-specific [1], stage-specific [2], disease-related [4], or treatment-related [6] patterns of gene expression, and understanding gene regulatory networks by analyzing the functional roles of genes in cellular processes [9]. Here we focus on the former, especially on tumor classification using gene expression data, which has been an active topic in recent years and has received wide attention from biological and medical researchers [11]. A reliable and precise classification of tumors based on gene expression data may lead to a more complete understanding of molecular variations among tumors and, hence, to better diagnosis and treatment strategies.
Microarray experiments usually generate large datasets with expression values for thousands of genes (2000 to 20 000) but no more than a few dozen samples (20 to 80). Highly accurate classification of tissue samples in such high-dimensional problems is therefore difficult, yet it is often crucial for successful diagnosis and treatment. Several comprehensive comparisons and improved methods have been proposed recently [20]. In this paper, we introduce a combinational feature selection method using ensemble neural networks to markedly improve the accuracy and robustness of sample classification. In recent years, several researchers have used ensemble neural networks for tumor classification based on gene expression data [12]. Khan et al. [12] used neural networks to classify four subcategories of small round blue-cell tumors. Using 3750 networks, generated by repeating three-fold cross-validation 1250 times, with the 96 most influential genes as inputs, they reported excellent results on their dataset. O'Neill and Song [23] used neural networks to analyze lymphoma microarray data and reported predicting the long-term survival of individual patients with 100% accuracy on the datasets published by Alizadeh et al. [18]. Both are valuable contributions to microarray data analysis using neural networks. Our motivation is twofold: by combining various feature selection mechanisms we can exploit more of the information in the samples for classification, and by using ensemble neural networks we can combine these features more effectively and improve the stability and robustness of the results. The most important distinctions between our work and the two studies above are therefore that combinational feature selection lets us examine different profiles of the samples and use more information for classification, and that our neural networks can work in parallel, unlike those in the two papers. At the same time, whereas their work was based on a particular dataset, we obtain improved, or at least comparable, results on a wide range of different datasets. In the following section, we provide a detailed description and comparison of our new method.
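To make the general idea concrete before the detailed description, the following minimal Python sketch illustrates combining several feature selection criteria with an ensemble of classifiers. It is an illustration under stated assumptions, not the implementation used in this paper: the two gene-ranking scores (`signal_to_noise` and `mean_difference`), the single logistic unit that stands in for each neural network, the averaging of member outputs, and the toy expression matrix are all hypothetical choices made for brevity.

```python
import math
from statistics import mean, pstdev

def split_by_class(values, labels):
    """Split one gene's expression values into class-0 and class-1 lists."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return zeros, ones

def signal_to_noise(values, labels):
    """Assumed criterion 1: |mean difference| divided by summed std. deviations."""
    zeros, ones = split_by_class(values, labels)
    return abs(mean(ones) - mean(zeros)) / (pstdev(ones) + pstdev(zeros) + 1e-9)

def mean_difference(values, labels):
    """Assumed criterion 2: absolute difference of the class means."""
    zeros, ones = split_by_class(values, labels)
    return abs(mean(ones) - mean(zeros))

def top_genes(X, labels, score, k):
    """Return the indices of the k genes ranked highest by `score`."""
    scores = [score([row[g] for row in X], labels) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]

def sigmoid(z):
    # Clip the argument to avoid overflow in math.exp.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def train_unit(X, labels, epochs=500, lr=0.1):
    """Train one logistic unit by stochastic gradient descent.
    This is a stand-in for a full neural network, chosen only for brevity."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def fit_ensemble(X, labels, scores, k):
    """Train one member per feature selection criterion."""
    members = []
    for score in scores:
        genes = top_genes(X, labels, score, k)
        subset = [[row[g] for g in genes] for row in X]
        members.append((genes, train_unit(subset, labels)))
    return members

def predict(members, sample):
    """Average the member outputs and threshold at 0.5."""
    probs = []
    for genes, (w, b) in members:
        z = b + sum(wi * sample[g] for wi, g in zip(w, genes))
        probs.append(sigmoid(z))
    return 1 if mean(probs) >= 0.5 else 0

# Invented toy data: 6 samples (rows) x 4 genes (columns), two classes.
X = [
    [0.1, 1.0, 0.0, 0.3],
    [0.2, 0.5, 1.0, 0.3],
    [0.0, 1.5, 0.5, 0.3],
    [2.0, 1.1, 3.0, 0.3],
    [2.1, 0.4, 4.0, 0.4],
    [1.9, 1.4, 3.5, 0.2],
]
labels = [0, 0, 0, 1, 1, 1]

members = fit_ensemble(X, labels, [signal_to_noise, mean_difference], k=1)
selected = [genes for genes, _ in members]
predictions = [predict(members, row) for row in X]
print("genes chosen by each criterion:", selected)
print("ensemble predictions on the training samples:", predictions)
```

On this toy matrix the two criteria rank different genes first, which is the point of combining them: each ensemble member sees a different profile of the same samples, and the members are trained independently of one another, so they could be run in parallel.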