The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.
The small, round blue-cell tumors (SRBCTs) of childhood, which include neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS), are so named because of their similar appearance on routine histology1. However, accurate diagnosis of SRBCTs is essential because the treatment options, responses to therapy and prognoses vary widely depending on the diagnosis. As their name implies, these cancers are difficult to distinguish by light microscopy, and currently no single test can precisely distinguish them. In clinical practice, several techniques are used for diagnosis, including immunohistochemistry2, cytogenetics, interphase fluorescence in situ hybridization3 and reverse transcription (RT)-PCR (ref. 4). Immunohistochemistry detects protein expression, but only one protein can be examined at a time. Molecular techniques such as RT-PCR are increasingly used for diagnostic confirmation following the discovery of tumor-specific translocations such as EWS-FLI1; t(11;22)(q24;q12) in EWS, and PAX3-FKHR; t(2;13)(q35;q14) in alveolar rhabdomyosarcoma (ARMS)1. However, molecular markers do not always provide a definitive diagnosis, because the classical translocations occasionally go undetected owing to technical difficulties or the presence of variant translocations.
Gene-expression profiling using cDNA microarrays permits simultaneous analysis of multiple markers and has been used to categorize cancers into subgroups5–8. However, despite the many statistical techniques available for analyzing gene-expression data, none has so far been rigorously tested for its ability to accurately distinguish cancers belonging to several diagnostic categories.
Artificial neural networks (ANNs) are computer-based algorithms which are modeled on the structure and behavior of neurons in the human brain and can be trained to recognize and categorize complex patterns9. Pattern recognition is achieved by adjusting parameters of the ANN by a process of error minimization through learning from experience. They can be calibrated using any type of input data, such as gene-expression levels generated by cDNA microarrays, and the output can be grouped into any given number of categories. ANNs have been recently applied to clinical problems such as diagnosing myocardial infarcts10 and arrhythmias from electrocardiograms11 and interpreting radiographs and magnetic resonance images12,13. Here we applied ANNs to decipher gene-expression signatures of SRBCTs and used them for diagnostic classification.
To calibrate ANN models to recognize cancers in each of the four SRBCT categories, we used gene-expression data from cDNA microarrays containing 6567 genes. The 63 training samples (see Supplemental Table A) included both tumor biopsy material (13 EWS and 10 RMS) and cell lines (10 EWS, 10 RMS, 12 NB and 8 Burkitt lymphomas (BL; a subset of NHL)). For two samples, ST486 (BL-C2 and C4) and GICAN (NB-C2 and C7), we performed two independent microarray experiments to test reproducibility; these were subsequently treated as separate samples. Filtering for a minimal level of expression reduced the number of genes to 2308 (Fig. 1a). Principal component analysis (PCA) further reduced the dimensionality, and we found that using the 10 dominant PCA components per sample as inputs and four outputs (EWS, RMS, NB or BL) produced well-calibrated ANN models. These 10 dominant components contained 63% of the variance in the data matrix; the remaining PCA components contained variance unrelated to separating the four cancers. The three-fold cross-validation procedure (see Methods) produced a total of 3750 ANN models, and the training and validation were successful (Fig. 1b). In addition, the models showed no sign of ‘over-training’, which would appear as a rise in the summed square error for the validation set with increasing training iterations, or ‘epochs’ (Fig. 1b). Using these ANN models, all 63 training samples were correctly classified to their respective categories, each having received the highest committee vote (average output) for its category.
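As an illustration of the dimensionality-reduction step, the following is a minimal sketch assuming the expression data are arranged as a list of samples, each a list of gene values. It centers each gene, finds the leading principal components by power iteration with deflation, and reports the fraction of total variance they carry (the quantity quoted above as 63% for the 10 dominant components). The function names and toy data are hypothetical; the text does not state how the PCA was computed, and for 2308 genes one would normally use a singular value decomposition rather than forming the gene-by-gene covariance matrix.

```python
import math
import random


def center(data):
    """Subtract each gene's mean; `data` is a list of samples, each a list of gene values."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Gene-by-gene sample covariance matrix (adequate for toy sizes only)."""
    n, g = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(g)]
            for i in range(g)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-15 else v


def top_components(cov, k, iters=500, seed=0):
    """Leading k (eigenvalue, unit eigenvector) pairs by power iteration with deflation."""
    rng = random.Random(seed)
    cov = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = normalize([rng.uniform(0.1, 1.0) for _ in cov])
        for _ in range(iters):
            w = mat_vec(cov, v)
            if math.sqrt(sum(x * x for x in w)) < 1e-15:
                break  # no variance left to explain
            v = normalize(w)
        lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return pairs


def explained_fraction(data, k):
    """Fraction of total variance captured by the k dominant components."""
    cov = covariance(center(data))
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(lam for lam, _ in top_components(cov, k)) / total


def project(data, pairs):
    """PCA scores: each centered sample projected onto the eigenvectors."""
    return [[sum(a * b for a, b in zip(row, v)) for _, v in pairs]
            for row in center(data)]


toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
print(explained_fraction(toy, 1))  # gene 2 carries 8/10 of the variance
```

With real data, `project` would be applied to every sample and only the first 10 scores per sample kept as ANN inputs.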
We next determined the contribution of each gene to the classification by measuring the sensitivity of the classification to a change in the expression level of each gene, using the 3750 previously calibrated ANN models (see Supplementary Methods). In this way, we ranked the genes according to their significance for the classification. We then determined the classification error rate using increasing numbers of these ranked genes; the error rate fell to 0% at 96 genes (Fig. 1c). The 10 dominant PCA components for these 96 genes contained 79% of the variance in the data matrix. Using only these 96 genes, we recalibrated the ANN models (Fig. 1a) and again correctly classified all 63 samples (Fig. 2). Moreover, multidimensional scaling (MDS) analysis5 using these 96 genes clearly separated the four cancer types (Fig. 3a). The top 96 discriminators represented 93 unique genes (Fig. 3b), as IGF2 was represented by three independent clones and MYC by two. Of the 96, 13 were anonymous expressed sequence tags (ESTs); 16 genes were specifically expressed in EWS, 20 in RMS, 15 in NB and 10 in BL. Twelve genes were good discriminators on the basis of lack of expression in BL and variable expression in the other three types. One gene (EST; Clone ID 295985) discriminated EWS from the other cancer types by its lack of expression in this cancer. The remaining 22 genes were each expressed in two of the four cancer types. To our knowledge, of the 61 genes that were specifically expressed in a cancer type, 41 have not been previously reported as associated with these diseases.
We then tested the diagnostic classification capabilities of these ANN models on a set of 25 blinded test samples. A sample is classified to a diagnostic category if it receives the highest vote for that category; because this classifier has only four possible outputs, every sample will be classified to one of the four categories, even a sample that belongs to none of them. We therefore established a diagnostic classification method based on a statistical cutoff that allows us to reject the diagnosis of a sample classified to a given category. If a sample falls outside the 95th percentile of the probability distribution of distances between samples and their ideal output (for example, for EWS the ideal output is EWS = 1, RMS = NB = BL = 0), its diagnosis is rejected (see Methods).
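The decision rule above can be sketched as follows, assuming a committee vote is a 4-tuple of average outputs in the order EWS, RMS, NB, BL. The distance is the squared Euclidean distance to the one-hot ideal output, halved so that the ideal outputs of two different categories lie exactly 1 apart; this is one normalization consistent with the Methods, but the exact form is given in the Supplemental Methods and is an assumption here. The cutoff values in the usage example are hypothetical placeholders for the empirical 95th-percentile distances.

```python
CATEGORIES = ("EWS", "RMS", "NB", "BL")


def ideal_output(category):
    """One-hot target vector, e.g. EWS -> (1, 0, 0, 0)."""
    return tuple(1.0 if c == category else 0.0 for c in CATEGORIES)


def distance_to_ideal(vote, category):
    """Squared Euclidean distance from a committee vote to the ideal output,
    halved so that two different ideal outputs are exactly 1 apart (assumed)."""
    return sum((v - t) ** 2 for v, t in zip(vote, ideal_output(category))) / 2.0


def diagnose(vote, cutoffs):
    """Top-voted category, or None if the vote lies beyond that category's cutoff."""
    category = CATEGORIES[max(range(len(CATEGORIES)), key=lambda i: vote[i])]
    if distance_to_ideal(vote, category) <= cutoffs[category]:
        return category
    return None  # highest vote, but too far from a perfect vote: reject


# Hypothetical cutoffs standing in for the empirical 95th-percentile distances.
cutoffs = {"EWS": 0.1, "RMS": 0.1, "NB": 0.1, "BL": 0.1}
print(diagnose((0.7, 0.1, 0.1, 0.1), cutoffs))  # EWS
print(diagnose((0.4, 0.3, 0.2, 0.1), cutoffs))  # None
```

The second vote still ranks EWS highest, but its distance (0.25) exceeds the cutoff, which mirrors how a sample can be correctly assigned yet not confidently diagnosed.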
The test samples contained both tumors (5 EWS, 5 RMS and 4 NB) and cell lines (1 EWS, 2 NB and 3 BL). We also tested the ability of these models to reject a diagnosis on 5 non-SRBCTs: 2 normal muscle tissues (Tests 9 and 13) and 3 cell lines, comprising an undifferentiated sarcoma (Test 5), an osteosarcoma (Test 3) and a prostate carcinoma (Test 11). Using the 3750 ANN models calibrated with the 96 genes, we correctly classified all 20 SRBCT test samples (Table 1 and Fig. 2) as well as all 63 training samples (see Supplemental Table A). Three of these samples (Test 10, Test 20 and EWS-T13) were correctly assigned to their categories (RMS, EWS and EWS, respectively), having received the highest vote for those categories. However, their distance from a perfect vote was greater than the expected 95th-percentile distance (Fig. 2), so by this criterion we could not confidently diagnose them. All five non-SRBCT samples fell outside the 95th percentiles and were therefore excluded from all four diagnostic categories. Applying these criteria to all 88 samples, the sensitivity of the ANN models for diagnostic classification was 93% for EWS, 96% for RMS and 100% for both NB and BL, and the specificity was 100% for all four diagnostic categories. Hierarchical clustering14 using the 96 genes identified by the ANN models also correctly clustered all 20 SRBCT test samples (Fig. 3c). Moreover, the two pairs of samples derived from the same cell lines, BL-C2 and C4 (ST486) and NB-C2 and C7 (GICAN), were adjacent to one another in the same cluster.
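The reported sensitivities can be reproduced from the sample counts given in the text, under the reading that a sample counts as a true positive only if it receives the highest vote for its category and also passes the 95th-percentile cutoff. The variable names are illustrative; the counts are taken directly from the training and test descriptions above.

```python
# Samples per category: training (tumors + cell lines) + test (tumors + cell lines).
totals = {
    "EWS": (13 + 10) + (5 + 1),   # 29
    "RMS": (10 + 10) + 5,         # 25
    "NB": 12 + (4 + 2),           # 18
    "BL": 8 + 3,                  # 11
}
# Correctly voted SRBCTs whose diagnosis was nevertheless rejected by the cutoff:
# Test 20 and EWS-T13 (EWS), Test 10 (RMS).
rejected = {"EWS": 2, "RMS": 1, "NB": 0, "BL": 0}
N_ALL = 88  # 83 SRBCTs + 5 non-SRBCTs


def sensitivity(cat):
    """Confidently and correctly diagnosed samples / all samples of that category."""
    return (totals[cat] - rejected[cat]) / totals[cat]


def specificity(cat, false_positives=0):
    """True negatives / all samples not of that category. No sample was confidently
    diagnosed into a wrong category, so false positives default to 0."""
    negatives = N_ALL - totals[cat]
    return (negatives - false_positives) / negatives


for cat in totals:
    print(cat, round(100 * sensitivity(cat)), round(100 * specificity(cat)))
# EWS 93 100
# RMS 96 100
# NB 100 100
# BL 100 100
```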
To confirm at the protein level that the ANN models identify genes with preferentially high expression in specific cancer types, we performed immunohistochemistry for fibroblast growth factor receptor 4 (FGFR4) on SRBCT tissue arrays. This tyrosine kinase receptor is expressed during myogenesis15 but not in adult muscle, and is of interest because of its potential role in tumor growth16 and in the prevention of terminal differentiation in muscle17. Moderate to strong cytoplasmic immunostaining for FGFR4 was seen in all 26 RMSs tested (17 alveolar, 9 embryonal). In agreement with the microarray results, staining in EWS and NHL was generally weaker, except for one case of anaplastic large cell lymphoma that was strongly positive (data not shown).
Tumors are currently diagnosed by histology and immunohistochemistry based on their morphology and protein expression, respectively. However, poorly differentiated cancers can be difficult to diagnose by routine histopathology. In addition, the histological appearance of a tumor cannot reveal the underlying genetic aberrations or biological processes that contribute to the malignant process. Here we developed a method of diagnostic classification of cancers from their gene-expression signatures and identified the genes that contributed to this classification.
We used the SRBCTs of childhood as a model because these cancers occasionally present diagnostic difficulties. For example, Ewing sarcoma is diagnosed by immunohistochemical evidence of MIC2 expression18 and lack of expression of the leukocyte common antigen CD45 (excluding lymphoma), muscle-specific actin or myogenin (excluding RMS)19. However, reliance on detection of MIC2 alone can lead to incorrect diagnosis as MIC2 expression occurs occasionally in other tumor types including RMS and NHL (ref. 1).
Monitoring global gene-expression levels by cDNA microarrays provides an additional tool for elucidating tumor biology as well as the potential for molecular diagnostic classification of cancer5–8,20–22. Currently, classification and clustering tools using gene-expression data have not been rigorously tested for diagnostic classification of more than two categories. Other approaches that share the parametric nature of ANNs and have been utilized to classify gene-expression profiles include Support Vector Machines23. Thus far, these other methods have not been fully explored to extract the genes or features that are most important for the classification performance and which also will be of interest to cancer biologists24.
Here we have approached this problem using ANN-based models. We calibrated ANN models on the expression profiles of 63 SRBCTs of 4 diagnostic categories. Due to the limited amount of training data and the high performance achieved, we limited our analysis to linear (that is, no hidden layers) ANN models. Although other linear methods may perform as well, our method can easily accommodate nonlinear features of expression data if required. To compensate for heterogeneity within the tumor samples (which contain both malignant and stromal cells) and for possible artifacts due to growth of cell lines in tissue culture, we used both tumor samples (n = 23) and cell lines (n = 40). Data from these samples is complementary, because tumor tissue, though complex, provides a gene-expression pattern representative of tumor growth in vivo, while cell lines contain a uniform malignant population without stromal contamination. Despite using only NB cell lines for calibrating the ANN models, all four NB tumors among the test samples were correctly diagnosed with high confidence. This not only demonstrates the high similarity of NB cell lines to the tumors of origin, but also validates the use of cell lines for ANN calibration. The calibrated ANN models accurately classified all 63 training SRBCTs and showed no evidence of over-training, demonstrating the robustness of this technique.
A potential difficulty with ANN-based pattern recognition models is elucidating causal links from the output to the original input data. To solve this problem and to identify the most significant genes, we calculated the sensitivity of the classification to a change in the expression level of each gene. We produced a list of genes ranked by their significance to the classification. Using this list, we established that the top 96 genes reduced the misclassifications to zero, which opens the possibility of cost-effective fabrication of SRBCT subarrays for diagnostic use. When we tested the ANN models calibrated using the 96 genes on 25 blinded samples, we were able to correctly classify all 20 SRBCT samples and reject the 5 non-SRBCTs. This supports the potential use of these methods as an adjunct to routine histological diagnosis.
Although ANN analysis leads to identification of genes specific for a cancer with implications for biology and therapy, a strength of this method is that it does not require genes to be exclusively associated with a single cancer type. This allows for classification based on complex gene-expression patterns. For example, the top 96 discriminating genes included not only those with high (61 genes) or low (12 in BL and 1 in EWS) expression in one particular cancer, but also genes that were differentially expressed in two diagnostic categories compared with the remaining two. Of the 16 genes highly expressed only in EWS, two (MIC2 and GYG2) have been previously described18,25. MIC2 immunostaining is currently used to diagnose EWS; however, we found that although MIC2 detects EWS with high sensitivity, it cannot by itself discriminate EWS, as it was also expressed in several RMSs.
Our method identifies genes related to tumor histogenesis, including genes that may not normally be expressed in the corresponding mature tissue. Of the 14 genes that have not yet been reported to be highly expressed in EWS, 4 (TUBB5, ANXA1, NOE1 and GSTM5)26–29 were neural-specific genes, lending more credence to the proposed neural histogenesis of EWS (ref. 30). Twenty genes were highly expressed only in RMS, including eight specific for muscle tissue and five (FGFR4, IGF2, MYL4, ITGA7 and IGFBP5)15,31–34 related to myogenesis. Among the latter, expression of IGF2, MYL4 and IGFBP5 has been reported in RMS (refs. 35,36), and only ITGA7 and IGFBP5 were found to be expressed in our two normal muscle samples. Of the genes specifically expressed in a cancer type, 41 have not been previously reported, including 7 ESTs with no currently known function. All of these warrant further study and might provide new insights into the biology of these cancers. For example, FGFR4, a tyrosine kinase receptor that is expressed during myogenesis and prevents terminal differentiation in myocytes15,17, was found to be highly expressed only in RMS and not in normal muscle. The relatively strong differential expression of FGFR4 in RMS was confirmed by immunostaining of tissue microarrays (data not shown). Although the high expression of FGFR4 in most cases of RMS indicates that it may be relevant to the biology of this tumor, it is also expressed in some other cancers37 and normal tissues38. Thus, although FGFR4 expression in RMS may be of biological and therapeutic interest, it is unlikely to be applicable as a sole differential diagnostic marker for these tumors.
As the main purpose of this study was to optimize the classification of these cancers, we used a stringent quality filter that retained only genes with good measurements in all samples. This may exclude some genes that are highly expressed in certain cancers but not in others, as well as genes that appear not to be expressed because of an artifact in a particular cDNA spot. However, we found that this quality filtration produced more robust prediction models and led to the identification of a set of 96 genes highly relevant to these cancers. Nonetheless, we expect that this list can be expanded by using more comprehensive arrays and larger training sets.
Here we developed a method of diagnostic classification of cancers from their gene expression signatures using ANNs. We also identified in ranked order the genes that contributed to this classification, and we were able to define a minimal set that can correctly classify our samples into their diagnostic categories. Although we achieved high sensitivity and specificity for diagnostic classification, we believe that with larger arrays and more samples it will be possible to improve on the sensitivity of these models for purposes of diagnosis in clinical practice. To our knowledge, this is the first application of ANN for diagnostic classification of cancer using gene-expression data derived from cDNA microarrays. Future applications of these methods will include studies to classify cancers according to stage and biological behavior in order to predict prognosis and thereby direct therapy. We believe this offers an alternative and powerful technique for the detection of gene-expression signatures, and the discovery of novel genes that characterize a diagnostic subgroup may also identify new targets for therapy.
The source and other information for the cell lines and tumor samples used in this study are described in Supplemental Table A (for the training set) and Table 1 (for the test set). All the original histological diagnoses were made at tertiary hospitals, which have reference diagnostic laboratories with extensive experience in the diagnosis of pediatric cancers. Approximately 20% of all samples in each category were randomly selected, blinded and set aside for testing. To augment this test set, we added 4 neuroblastoma tumors and 5 non-SRBCT samples (also blinded to the authors performing the analysis). The EWSs had a spectrum of the expected translocations, and the RMSs were a mixture of ARMS containing the PAX3-FKHR translocation and embryonal rhabdomyosarcoma (ERMS). The NBs included both MYCN-amplified and single-copy samples. The NHLs were cell lines derived from BL (see Supplemental Table B for details of all samples). The conditions for cell culture and the methods for extracting RNA from cell lines have been described previously5.
Preparation of glass cDNA microarrays, probe labeling, hybridization and image acquisition were performed according to the standard NHGRI protocol (http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/protocol.html). Image analysis was performed using DeArray software39. The cDNA clones were obtained from Research Genetics (Huntsville, Alabama) and constituted their standard microarray set of 3789 sequence-verified known genes and 2778 sequence-verified ESTs.
We filtered genes by requiring a red intensity greater than 20 in every experiment; 2308 genes passed this filter. In all experiments, each slide was normalized such that the relative (or normalized) red intensity (RRI) of each gene was defined as RRI = (mean intensity of that spot)/(mean intensity of the filtered genes). The natural logarithm of RRI, ln(RRI), was used as the measure of expression level. Hierarchical clustering and MDS plots were performed as described5.
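A minimal sketch of this filtering and normalization, assuming each slide is supplied as a list of per-gene red-channel mean spot intensities in a common gene order. The function names and toy numbers are hypothetical, and background subtraction and the second channel are not modeled.

```python
import math


def filter_genes(red, threshold=20.0):
    """Indices of genes whose red intensity exceeds `threshold` on every slide.
    `red` is a list of slides, each a list of per-gene mean spot intensities."""
    n_genes = len(red[0])
    return [g for g in range(n_genes) if all(slide[g] > threshold for slide in red)]


def log_rri(red, kept):
    """ln(RRI) per slide: spot intensity divided by the mean over the filtered genes."""
    out = []
    for slide in red:
        mean_kept = sum(slide[g] for g in kept) / len(kept)
        out.append([math.log(slide[g] / mean_kept) for g in kept])
    return out


red = [[100.0, 10.0, 50.0, 200.0],
       [80.0, 30.0, 40.0, 160.0]]
kept = filter_genes(red)      # gene 1 fails on the first slide (10 <= 20)
print(kept)                   # [0, 2, 3]
print(log_rri(red, kept))
```

By construction the RRI values on each slide average to 1, which is what makes the slides comparable before the logarithm is taken.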
To allow a supervised regression model without over-training (which requires a small number of parameters relative to the number of samples), we reduced the dimensionality of the samples by PCA (ref. 40), using centralized ln(RRI) values as input. Each sample was thus represented by 88 numbers, the projections of its gene expressions onto the PCA eigenvectors. We used the 10 dominant PCA components for subsequent analysis. We classified the training samples into the 4 categories using a 3-fold cross-validation procedure: the 63 training (labeled) samples were randomly shuffled and split into 3 equally sized groups (see Fig. 1a). Each linear ANN model was then calibrated with the 10 PCA input variables (normalized to centralized z-scores) using 2 of the groups, with the third group reserved for testing predictions (validation). This procedure was repeated 3 times, each time with a different group used for validation. The random shuffling was repeated 1250 times, and for each shuffling we analyzed 3 ANN models. Thus, in total, each sample belonged to a validation set 1250 times, and 3750 ANN models were calibrated. For each diagnostic category (EWS, RMS, NB or BL), each ANN model gave an output between 0 (not this category) and 1 (this category). The 1250 outputs for each validation sample were used as a committee: we calculated the average of all the predicted outputs (a committee vote), and a sample was classified as a particular cancer if it received the highest committee vote for that cancer. In clinical settings, it is important to be able to reject a diagnostic classification, for example for samples that do not belong to any of the four diagnoses. To make rejection possible, we computed, for each cancer type, the squared Euclidean distance between the committee vote for a sample and the ‘ideal’ output for that cancer type, normalized such that the distance between the ideal outputs of any two cancer types is unity (see Supplemental Methods).
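The committee procedure can be sketched compactly under stated assumptions. Each "linear ANN" is taken here to be a single layer of logistic output units (one per category, no hidden layer) trained by online gradient descent on the squared error; the text specifies only that the models have no hidden layers and give outputs between 0 and 1, so the training rule, learning rate, epoch count and the reduced number of shuffles are illustrative choices, not the study's settings. Inputs are assumed to be the z-scored PCA components, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_linear(xs, ys, epochs=200, lr=0.1):
    """No-hidden-layer network with logistic outputs, trained by online gradient
    descent on E = 0.5 * sum((o - y)^2). Returns (weights, biases)."""
    n_in, n_out = len(xs[0]), len(ys[0])
    w = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            for k in range(n_out):
                o = sigmoid(b[k] + sum(wi * xi for wi, xi in zip(w[k], x)))
                delta = (o - y[k]) * o * (1.0 - o)  # dE/d(net input) for output k
                b[k] -= lr * delta
                for i in range(n_in):
                    w[k][i] -= lr * delta * x[i]
    return w, b


def predict(model, x):
    w, b = model
    return [sigmoid(bk + sum(wi * xi for wi, xi in zip(wk, x))) for wk, bk in zip(w, b)]


def committee_votes(xs, ys, n_shuffles=5, n_folds=3, seed=0):
    """Repeated n-fold cross-validation: every shuffle puts each sample in exactly one
    validation fold; its committee vote is the average of those validation outputs."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    collected = [[] for _ in xs]
    for _ in range(n_shuffles):
        rng.shuffle(idx)
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = train_linear([xs[i] for i in train], [ys[i] for i in train])
            for i in fold:
                collected[i].append(predict(model, xs[i]))
    return [[sum(col) / len(col) for col in zip(*outs)] for outs in collected]


# Toy two-category example: two well-separated clusters in a 2-D input space.
xs = ([[-1.0 - 0.1 * i, -1.0 + 0.05 * i] for i in range(6)]
      + [[1.0 + 0.1 * i, 1.0 - 0.05 * i] for i in range(6)])
ys = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 6
votes = committee_votes(xs, ys)
```

With `n_shuffles=1250` and `n_folds=3`, this loop would calibrate 3 × 1250 = 3750 models, and each sample's vote would average 1250 validation outputs, matching the bookkeeping described above.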
For each cancer type, we used the 1250 ANN models for each validation sample to construct an empirical probability distribution of the distances. Using these distributions, a sample is diagnosed as a specific cancer only if it lies within the 95th percentile. All 3750 models were used to classify the additional 25 test samples.
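The text does not specify which percentile estimator was used. The sketch below assumes the nearest-rank rule and pools, for each category, the distances of validation predictions for samples of that category; the function names and toy distances are hypothetical.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, q=0.95):
    """Empirical q-quantile by the nearest-rank rule (one of several conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def build_cutoffs(validation_distances, q=0.95):
    """Per-category cutoffs from (true category, distance-to-ideal) pairs
    collected over validation predictions."""
    by_category = defaultdict(list)
    for category, distance in validation_distances:
        by_category[category].append(distance)
    return {c: nearest_rank_percentile(d, q) for c, d in by_category.items()}


def within_cutoff(distance, category, cutoffs):
    """Accept a diagnosis only if its distance does not exceed the category cutoff."""
    return distance <= cutoffs[category]


pairs = [("EWS", float(d)) for d in range(1, 21)] + [("RMS", 0.5)] * 4
cutoffs = build_cutoffs(pairs)
print(cutoffs)  # {'EWS': 19.0, 'RMS': 0.5}
```

Other estimators (for example, linear interpolation between order statistics) would shift the cutoff slightly for small samples; with 1250 distances per validation sample, the choice matters little.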
The sensitivity of the classification to each gene is determined by the absolute value of the partial derivative of the output with respect to that gene's expression, averaged over samples and ANN models (see Supplemental Methods). A large sensitivity implies that changing the gene's expression significantly influences the output. The genes can then be ranked by their sensitivity.
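The following is a sketch of this sensitivity computation under explicit assumptions. Each model is taken to be a single layer of logistic outputs acting on z-scored PCA components. By the chain rule, the derivative of output c with respect to gene g is then o_c(1 - o_c) Σ_j W_cj P_jg / s_j, where P_jg is gene g's loading on component j and s_j is the standard deviation used to z-score that component. Averaging the absolute value over outputs, as well as over samples and models, is our assumption; the exact definition is in the Supplemental Methods. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gene_sensitivity(models, samples, components, comp_std):
    """Mean |d output / d gene expression| over models, samples and outputs.

    models:     list of (W, b); W[c][j] is the weight from PCA input j to output c
    samples:    PCA-space inputs z (already centered and z-scored)
    components: components[j][g], loading of gene g on PCA component j
    comp_std:   standard deviation used to z-score component j
    """
    n_genes = len(components[0])
    totals = [0.0] * n_genes
    count = 0
    for W, b in models:
        for z in samples:
            for w_c, b_c in zip(W, b):
                o = sigmoid(b_c + sum(wj * zj for wj, zj in zip(w_c, z)))
                slope = o * (1.0 - o)  # derivative of the logistic output
                for g in range(n_genes):
                    # Chain rule back through the z-scoring and the PCA projection.
                    d = slope * sum(w_c[j] * components[j][g] / comp_std[j]
                                    for j in range(len(w_c)))
                    totals[g] += abs(d)
                count += 1
    return [t / count for t in totals]


def rank_genes(sens):
    """Gene indices ordered from most to least influential."""
    return sorted(range(len(sens)), key=lambda g: sens[g], reverse=True)


# Toy case: 3 genes, 2 components, one model with a single output.
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
model = ([[2.0, 0.5]], [0.0])
sens = gene_sensitivity([model], [[0.0, 0.0]], components, [1.0, 1.0])
print(sens, rank_genes(sens))  # [0.5, 0.125, 0.0] [0, 1, 2]
```

Because the models are linear apart from the output sigmoid, a gene's sensitivity is driven by its PCA loadings weighted by the trained ANN weights, so a gene that loads only on components the models ignore receives a sensitivity near zero.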
We thank K. Gayton, C. Tsokos, T. Fadiran, J. Lueders and R. Walker for their technical assistance; M. Ohlsson for valuable discussions on ANNs; R. Simon, M. Bittner, Y. Chen and S. Gruvberger for their helpful comments regarding the data analysis; and M. Tsokos, L. Helman and C. Thiele for cell lines supplied from the NCI. J.S.W. was in part supported by the Charles & Dana Nearburg Foundation. M.R. was in part supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation through the SWEGENE consortium. C.P. was in part supported by the Swedish Foundation for Strategic Research.
J.K., J.S.W. and M.R. contributed equally to this study.