EURASIP J Bioinform Syst Biol. 2007; 2007(1): 64628.
Published online 2007 July 10. doi: 10.1155/2007/64628
PMCID: PMC3171347
Gene Selection for Multiclass Prediction by Weighted Fisher Criterion
Jianhua Xuan (corresponding author),1 Yue Wang,1 Yibin Dong,1 Yuanjian Feng,1 Bin Wang,1 Javed Khan,2 Maria Bakay,3 Zuyi Wang,1,3 Lauren Pachman,4 Sara Winokur,5 Yi-Wen Chen,3 Robert Clarke,6 and Eric Hoffman3
1 Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
2 Department of Pediatric Oncology, National Cancer Institute, Gaithersburg, MD 20877, USA
3 Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC 20010, USA
4 Disease Pathogenesis Program, Children's Memorial Research Center, Chicago, IL 60614, USA
5 Department of Biological Chemistry, University of California, Irvine, CA 92697, USA
6 Lombardi Cancer Center, Georgetown University, Washington, DC 20007, USA
Jianhua Xuan: xuan/at/vt.edu; Yue Wang: yuewang/at/vt.edu; Yibin Dong: yibin.dong/at/vt.edu; Yuanjian Feng: yjfeng/at/vt.edu; Bin Wang: binwang/at/vt.edu; Javed Khan: khanjav/at/mail.nih.gov; Maria Bakay: mbakay/at/cnmcresearch.org; Zuyi Wang: zwang/at/cnmcresearch.org; Lauren Pachman: pachman/at/northwestern.edu; Sara Winokur: stwinoku/at/uci.edu; Yi-Wen Chen: ychen/at/cnmcresearch.org; Robert Clarke: clarker/at/georgetown.edu; Eric Hoffman: ehoffman/at/cnmcresearch.org
Received August 30, 2006; Revised December 16, 2006; Accepted March 20, 2007.
Gene expression profiling has been widely used to study the molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. Gene selection, an important step toward improved diagnostics, screens tens of thousands of genes and identifies a small subset that discriminates between disease types. We propose a two-step gene selection method that identifies informative gene subsets for accurate classification of multiclass phenotypes. In the first step, individually discriminatory genes (IDGs) are identified using a one-dimensional weighted Fisher criterion (wFC). In the second step, jointly discriminatory genes (JDGs) are selected by sequential search methods, based on their joint class separability as measured by a multidimensional wFC. The performance of the selected gene subsets for multiclass prediction is evaluated with artificial neural networks (ANNs) and/or support vector machines (SVMs). Applying the proposed IDG/JDG approach to two microarray studies, one of small round blue cell tumors (SRBCTs) and one of muscular dystrophies (MDs), we identified a much smaller yet efficient set of JDGs for diagnosing SRBCTs and MDs with high prediction accuracies (96.9% for SRBCTs and 92.3% for MDs, respectively). These experimental results demonstrate that the two-step gene selection method can identify a subset of highly discriminative genes for improved multiclass prediction.
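The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: the abstract does not give the weighting function, so the erf-based pairwise weight used here is borrowed from the weighted pairwise Fisher criteria literature; the pseudo-inverse of the within-class scatter is used to cope with small sample sizes; plain sequential forward selection stands in for the unspecified "sequential search methods"; and the JDG search is restricted to the IDGs kept in step one. The function names (`pair_weight`, `weighted_fisher`, `rank_idgs`, `forward_select`) are hypothetical.

```python
import math
import numpy as np

def pair_weight(delta):
    """Assumed erf-based weight: down-weights class pairs that are already far apart."""
    if delta == 0.0:
        return 0.0  # avoids division by zero; such a pair adds nothing to the score anyway
    return math.erf(delta / (2.0 * math.sqrt(2.0))) / (2.0 * delta ** 2)

def weighted_fisher(X, y):
    """Weighted Fisher score of the gene subset held in the columns of X (samples x genes)."""
    y = np.asarray(y)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # a single gene becomes one column
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class scatter, weighted by the class priors.
    n_genes = X.shape[1]
    Sw = np.zeros((n_genes, n_genes))
    for c, p, m in zip(classes, priors, means):
        Z = X[y == c] - m
        Sw += p * (Z.T @ Z) / len(Z)
    Sw_inv = np.linalg.pinv(Sw)  # pseudo-inverse in case Sw is singular
    score = 0.0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            d2 = max(float(diff @ Sw_inv @ diff), 0.0)  # squared Mahalanobis distance
            score += priors[i] * priors[j] * pair_weight(math.sqrt(d2)) * d2
    return score

def rank_idgs(X, y, n_keep):
    """Step 1: score each gene on its own and keep the n_keep best as IDGs."""
    scores = np.array([weighted_fisher(X[:, g], y) for g in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n_keep]

def forward_select(X, y, candidates, n_select):
    """Step 2: greedily add the candidate gene that most increases the joint score."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda g: weighted_fisher(X[:, selected + [g]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With an expression matrix `X` (samples by genes) and class labels `y`, a call such as `forward_select(X, y, rank_idgs(X, y, 200), 10)` would return ten gene indices; both numbers are placeholders rather than values taken from the paper. With the weight fixed at 1, the score reduces to the ordinary multiclass Fisher criterion written as a sum over class pairs.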
Articles from EURASIP Journal on Bioinformatics and Systems Biology are provided here courtesy of BioMed Central.