HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.

Conf Proc IEEE Eng Med Biol Soc. Author manuscript; available in PMC 2013 May 3.
Published in final edited form as:
PMCID: PMC3644033
NIHMSID: NIHMS462173

Classification of Astrocytomas and Oligodendrogliomas from Mass Spectrometry Data Using Sparse Kernel Machines

Abstract

Glioma histologies are the primary factor in prognostic estimates and are used in determining the proper course of treatment. Furthermore, due to the sensitivity of cranial environments, real-time tumor-cell classification and boundary detection can aid in the precision and completeness of tumor resection. A recent improvement to mass spectrometry known as desorption electrospray ionization operates in an ambient environment without the application of a preparation compound. This allows for a real-time acquisition of mass spectra during surgeries and other live operations. In this paper, we present a framework using sparse kernel machines to determine a glioma sample’s histopathological subtype by analyzing its chemical composition acquired by desorption electrospray ionization mass spectrometry.

I. Introduction

Brain cancer is the second leading cause of death in children and young adults [1]. As of 2007, 126,000 cases of brain tumor have been identified in the U.S. [2]. The most common malignant brain cancers are gliomas, which account for about 80% of the 88,000 incidents in the U.S. [3]. Patients with glioblastoma multiforme (GBM), the most common form of glioma, have a median survival time of only 10.6 months [4].

Part of a glioma’s diagnosis is its histopathological type, or subtype, which is assigned qualitatively according to the type of cell from which the tumor originated or which it most closely resembles [4]. The World Health Organization (WHO) releases the International Classification of Diseases for Oncology (ICD-O), which classifies gliomas into three subtypes, namely, ependymomas, astrocytomas, and oligodendrogliomas, which originate from ependymal cells, astrocytes, and oligodendrocytes, respectively [5]. The National Cancer Institute breaks down the current cases of gliomas into 76% astrocytomas, 6.5% oligodendrogliomas, and 5.9% ependymomas [6]. Tumors are further categorized by a histopathological grade, based on how much the tumor cells differ from the original cells [4]. The American Joint Committee on Cancer grades most tumors on a scale from G1 through G4, where G1 tumors are composed of cells similar to the original tissue, and G4 tumors are composed of cells that differ the most from it [4]. These histopathological categories are the primary factors used in determining the prognosis of the tumor and planning treatment [4], [7].

Due to the sensitive nature of the cranial environment, the identification of cancerous tissue is of utmost importance [8]. Histopathological classification is currently performed by sending a resected sample for evaluation by an expert [9]. This process is poorly suited to neurosurgery, where accurate tumor classification and boundary detection are needed while the operation is in progress.

Numerous techniques exist for identifying and analyzing cancerous cells. A well-established technique known as mass spectrometry (MS) is often used for analysis of complex proteins and lipids [10], [11]. MS is appropriate for this application because brain matter is primarily composed of lipids and proteins [12]. It has been shown that certain disorders and conditions cause a measurable change in the composition of these molecules [11]. In particular, gliomas have been shown to exhibit differences in lipid composition compared to healthy glial tissue [12], [13]. MS is a powerful tool for measuring this change, since it is capable of identifying the molecular composition of a given sample. The sample molecules are ionized into the gas phase and identified using a property of the ions known as the mass-to-charge ratio (m/z) [10]. MS is advantageous in a clinical setting for two reasons. First, it has a high sensitivity and provides a large amount of chemical information. Second, the administration of contrast agents can be avoided [8].

Traditional MS methods, including matrix-assisted laser desorption/ionization (MALDI) [14], [15], require sample preparation, which introduces a significant delay between the sample resection and the mass spectrum acquisition for clinical applications. A recent improvement to MS known as desorption electrospray ionization mass spectrometry (DESI-MS) involves ionization of molecules in an ambient environment without the application of a preparation compound [16]. This allows for a real-time acquisition of mass spectra during surgeries and other live operations.

Prior work has shown that mass spectrometry can be successfully used as a technique for cancer detection [17]. Furthermore, mass data from DESI-MS has been used to successfully diagnose bladder carcinomas [11]. Glioma subtype classification has been explored using gene expression data [18]. In [9], the authors present a preliminary study for classification of glioma subtypes using mass spectrometry.

In this paper, we aim to combine the advantages of in vivo DESI-MS with real-time classification of glioma subtypes. Specifically, we show that machine learning techniques can be used to provide neurosurgeons with real-time information, which can be of critical importance for precise tumor resection. Many techniques exist for data classification; sparse kernel machines were chosen here for their superior performance in dealing with high-dimensional data [19]. We consider two sparse kernel methods for glioma subtype classification, namely, support vector machines and relevance vector machines, and show that they can classify glioma subtypes with high accuracy.

II. Sparse Kernel Machines

In this section, we discuss two sparse kernel-based algorithms, namely, the support vector machine and the relevance vector machine. For a more comprehensive discussion on these methods, see [20].

A. Support Vector Machine Algorithm

The support vector machine (SVM) algorithm [20] is a sparse kernel algorithm used in classification and regression problems. Here we will briefly discuss the SVM framework for the two-class classification problem. Let the training set be given by $x_1, x_2, \ldots, x_N$, with target values given by $z_1, z_2, \ldots, z_N$, respectively, where $x_n \in \mathbb{R}^D$ and $z_n \in \{-1, 1\}$, $n = 1, 2, \ldots, N$. Moreover, assume that this training set is linearly separable in a feature space $\mathbb{R}^M$ defined by the transformation $\varphi : \mathbb{R}^D \to \mathbb{R}^M$; that is, there exists a linear decision boundary in the feature space separating the two classes.

To classify a new data point $x \in \mathbb{R}^D$ by predicting its target value $z$, define $y(x) \triangleq w^{\mathrm{T}} \varphi(x) + b$, where $w \in \mathbb{R}^M$ is a weight vector and $b \in \mathbb{R}$ is a bias parameter. This representation can be rewritten in terms of a kernel function as $y(x) = \sum_{n=1}^{N} a_n z_n k(x, x_n) + b$, where $a_n$, $n = 1, 2, \ldots, N$, and $b$ are parameters determined by the training set $x_n$ and $z_n$, $n = 1, 2, \ldots, N$, and $k(\cdot, \cdot)$ is the kernel function. The sign of the function $y(x)$ determines the class of $x$. More specifically, for a new data point $x$, the target value is given by $z = \mathrm{sgn}(y(x))$, where $\mathrm{sgn}(y) \triangleq y/|y|$, $y \neq 0$, and $\mathrm{sgn}(0) \triangleq 0$. In the SVM approach the parameters $w$ and $b$ are chosen such that the margin, that is, the minimum distance between the decision boundary and the data points, is maximized. Hence, only a subset of the training data (i.e., the support vectors) is used to determine the decision boundary. It can be shown that the SVM problem can be formulated as a convex optimization problem [20], and hence, any local optimum is also a global optimum.
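The kernel form of the decision rule can be illustrated with a minimal Python sketch. It assumes the standard dual representation from [20], y(x) = Σ a_n z_n k(x, x_n) + b, with a linear kernel; the support vectors, coefficients, and bias below are hand-picked toy values chosen for illustration, not parameters from this study.

```python
def linear_kernel(u, v):
    """Linear kernel k(u, v) = u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sgn(y):
    """Sign function with sgn(0) defined as 0."""
    if y == 0:
        return 0
    return 1 if y > 0 else -1


def decision_value(x, support_vectors, targets, coeffs, bias, kernel=linear_kernel):
    """y(x) = sum_n a_n * z_n * k(x, x_n) + b, summed over the support vectors."""
    total = sum(a * z * kernel(x, xn)
                for a, z, xn in zip(coeffs, targets, support_vectors))
    return total + bias


def classify(x, support_vectors, targets, coeffs, bias, kernel=linear_kernel):
    """Predicted target value z = sgn(y(x))."""
    return sgn(decision_value(x, support_vectors, targets, coeffs, bias, kernel))


# Toy model (illustrative values only): two support vectors, one per class.
SUPPORT_VECTORS = [[1.0, 1.0], [-1.0, -1.0]]
TARGETS = [1, -1]
COEFFS = [0.25, 0.25]
BIAS = 0.0

if __name__ == "__main__":
    for point in ([2.0, 0.0], [-3.0, 1.0], [1.0, -1.0]):
        print(point, classify(point, SUPPORT_VECTORS, TARGETS, COEFFS, BIAS))
```

Only the support vectors enter the sum, which is the sense in which the model is sparse: training points with a_n = 0 can be discarded after training.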

In the case where there is an overlap between the two data classes, the SVM algorithm can be modified by allowing misclassification of the data points. In this case, the margin is maximized while penalizing misclassified points. Such a trade-off is controlled by a positive complexity parameter C, which is determined using a hold-out method such as cross-validation [20].
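For reference, the trade-off controlled by C is commonly written as the following optimization problem. This is the standard soft-margin formulation given in [20], not an equation reproduced from this paper; the slack variables $\xi_n$ are notation introduced here, with $\xi_n > 0$ for points that lie inside the margin or are misclassified.

```latex
\min_{w,\, b,\, \xi} \quad C \sum_{n=1}^{N} \xi_n + \frac{1}{2} \| w \|^2
\qquad \text{subject to} \qquad
z_n \left( w^{\mathrm{T}} \varphi(x_n) + b \right) \ge 1 - \xi_n,
\quad \xi_n \ge 0,
\quad n = 1, \ldots, N.
```

A large C penalizes margin violations heavily and approaches the separable case, while a small C tolerates more violations in exchange for a wider margin.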

B. Relevance Vector Machine Algorithm

The relevance vector machine (RVM) algorithm [21] is a Bayesian sparse kernel algorithm, which can be regarded as the Bayesian extension of the SVM algorithm.

Next, we briefly review the method for the classification problem involving two data classes, namely $\mathcal{C}_1$ and $\mathcal{C}_2$. Let the training set be given by $x_1, x_2, \ldots, x_N$, with target values given by $z_1, z_2, \ldots, z_N$, where $x_n \in \mathbb{R}^D$ and $z_n \in \{0, 1\}$, $n = 1, 2, \ldots, N$, $x_n \in \mathcal{C}_1$ if $z_n = 1$, and $x_n \in \mathcal{C}_2$ if $z_n = 0$. For a new data point $x \in \mathbb{R}^D$, we predict the associated class membership posterior probability distribution, namely, $p(\mathcal{C}_k \mid x)$, $k = 1, 2$, where $p(\mathcal{C}_k \mid x)$ is the conditional probability of the data class $\mathcal{C}_k$ given the data point $x$. The class membership posterior probability for a given data point $x$ is given by

$$p(\mathcal{C}_1 \mid x) = \sigma\big(w^{\mathrm{T}} \varphi(x)\big)$$
(1)

where $\varphi : \mathbb{R}^D \to \mathbb{R}^M$ is a fixed feature-space transformation, with components $\varphi_i : \mathbb{R}^D \to \mathbb{R}$, $i = 1, \ldots, M$, $w \triangleq [w_1, \ldots, w_M]^{\mathrm{T}}$ is the weight vector, and $\sigma(\cdot)$ is the logistic sigmoidal function defined by $\sigma(a) \triangleq 1/(1 + e^{-a})$. Note that the RVM algorithm is a special case of the above model. Specifically, in the RVM algorithm $w^{\mathrm{T}} \varphi(x)$ in (1) has a special form (similar to the SVM algorithm) given by $w^{\mathrm{T}} \varphi(x) = \sum_{n=1}^{N} w_n k(x, x_n) + b$, where $k(\cdot, \cdot)$ is the kernel function. Hence, the class membership posterior probability for a given data point $x$ is given by

$$p(\mathcal{C}_1 \mid x) = \sigma\Big(\sum_{n=1}^{N} w_n k(x, x_n) + b\Big)$$
(2)
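As an illustration of (2), the following Python sketch evaluates the class membership posterior for a new point from a set of kernel weights. It assumes the kernel expansion has the form Σ w_n k(x, x_n) + b, as in [20], and uses a linear kernel; the function names and the toy weights in the usage example are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def linear_kernel(u, v):
    """Linear kernel k(u, v) = u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def class_posterior(x, train_points, weights, bias=0.0, kernel=linear_kernel):
    """Return (p(C1 | x), p(C2 | x)) for the kernel expansion in (2)."""
    activation = sum(w * kernel(x, xn)
                     for w, xn in zip(weights, train_points)) + bias
    p1 = sigmoid(activation)
    return p1, 1.0 - p1


if __name__ == "__main__":
    # Toy values for illustration only.
    points = [[1.0, 0.0], [0.0, 1.0]]
    print(class_posterior([2.0, 0.0], points, [1.0, -1.0]))
```

Unlike the SVM decision value, the output is a probability, so a threshold other than 0.5 can be chosen if the two kinds of error carry different costs.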

In the sequel, we consider the general formulation (1). Each weight parameter $w_i$, $i = 1, \ldots, M$, in (1) is assumed to have a zero-mean Gaussian distribution, and hence, the weight prior distribution is given by

$$p(w \mid \alpha) = \prod_{i=1}^{M} \mathcal{N}\big(w_i \mid 0, \alpha_i^{-1}\big)$$
(3)

where $\alpha_i$, $i = 1, 2, \ldots, M$, is the precision corresponding to the weight component $w_i$, $\alpha \triangleq [\alpha_1, \ldots, \alpha_M]^{\mathrm{T}}$, and $\mathcal{N}(\cdot \mid \mu, \sigma^2)$ represents the normal distribution with mean $\mu$ and variance $\sigma^2$. The parameters $\alpha_i$, $i = 1, 2, \ldots, M$, in the prior distribution (3) are called the hyperparameters.

The hyperparameters $\alpha_i$, $i = 1, 2, \ldots, M$, can be determined by maximizing the marginal likelihood $p(z \mid \alpha)$, where $z \triangleq [z_1, \ldots, z_N]^{\mathrm{T}}$. As a result of the maximization of the marginal likelihood, a number of the hyperparameters $\alpha_i$ approach infinity. The posterior distribution of the corresponding weight parameter $w_i$ is then concentrated at zero, and hence, the corresponding component $\varphi_i(x)$ of the feature vector plays no role in the prediction, resulting in a sparse predictive model. For further details of this approach, see [20], [21].
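The sparsity mechanism can be sketched in a few lines of Python. In practice a hyperparameter that "approaches infinity" is detected by comparing it with a large finite cutoff; the cutoff `alpha_max` and the function name below are assumptions made for illustration and do not come from the paper or from the library in [23].

```python
def prune_weights(weights, alphas, alpha_max=1e9):
    """Keep only the weights whose precision alpha_i stays below alpha_max.

    A very large alpha_i means the prior (and hence the posterior) of w_i
    is concentrated at zero, so the corresponding basis function is dropped.
    Returns the retained indices and the retained weights.
    """
    kept = [i for i, alpha in enumerate(alphas) if alpha < alpha_max]
    return kept, [weights[i] for i in kept]


if __name__ == "__main__":
    # Toy values for illustration only.
    w = [0.8, 1e-7, -1.3, 0.0]
    a = [0.5, float("inf"), 2.0, 1e12]
    print(prune_weights(w, a))
```

The basis functions that survive this step correspond to the relevance vectors.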

III. Glioma Type Classification

In this section, we use the sparse kernel machines described in Section II to classify the subtype of a glioma sample. The data were collected from research subjects at the Brigham and Women’s Hospital, Boston, MA. In this study, 28 glioma samples were acquired from multiple research subjects, where the samples were either astrocytomas (A) or oligodendrogliomas (O). To maximize spatial resolution, the scanning pattern shown in Figure 1 was used to analyze each sample [22]. To account for the chemical variation within a sample, numerous spectra were extracted from each sample (one spectrum from each scanned row). The number of spectra varied depending on the amount of usable tissue in the sample. In our data set, each sample contained between 7 and 45 spectra. A total of 19 astrocytoma samples and 9 oligodendroglioma samples were scanned, yielding 426 and 205 spectra, respectively. To account for large-scale intensity differences between samples, each spectrum was normalized to zero mean and unit standard deviation.
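The per-spectrum normalization can be sketched as follows. The text does not state whether the population or the sample standard deviation was used; this sketch assumes the population form, and the function name is hypothetical.

```python
import statistics


def standardize(spectrum):
    """Scale one spectrum to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(spectrum)
    std = statistics.pstdev(spectrum)
    if std == 0:
        raise ValueError("a constant spectrum cannot be standardized")
    return [(s - mean) / std for s in spectrum]


if __name__ == "__main__":
    # Toy intensities for illustration only.
    print(standardize([1.0, 2.0, 3.0]))
```

Because each spectrum is scaled independently, the step needs no information from other spectra and therefore does not leak information between training and test data during cross-validation.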

Fig. 1
Scan pattern

The mass spectrum is regarded as a mapping $MS : \mathcal{M} \to \mathbb{R}$, where $\mathcal{M} \triangleq \{m_1, \ldots, m_D\}$ denotes the m/z values ranging from 150 to 1,000, with a resolution of 0.0833. Moreover, for a given m/z value $m_i$, $i \in \{1, \ldots, D\}$, $MS(m_i)$ denotes the number of corresponding ions detected. These intensities collectively can be represented by a vector $s \triangleq [s_1, \ldots, s_D]^{\mathrm{T}} \in \mathbb{R}^D$, where $s_i = MS(m_i)$, $i = 1, \ldots, D$, and $s_i$ is referred to as a feature point. For a given set of mass spectra $\{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^D$, $i = 1, \ldots, N$, the target value $z_i \in \{0, 1\}$, $i = 1, \ldots, N$, is provided by the pathologist, where $z_i = 0$ (resp., $z_i = 1$) indicates that $x_i$ was from an astrocytoma (resp., oligodendroglioma) sample.

The SVM and RVM frameworks were chosen for the classification problem, both with a linear kernel $k(x, x') = x^{\mathrm{T}} x'$. We applied the leave-one-out method for validation [20]. Specifically, one spectrum $x_j$ and its corresponding target value $z_j$, $j \in \{1, \ldots, N\}$, are excluded, and the remaining spectra and their corresponding target values are used as the training set. A class, $y_j$, is assigned to $x_j$ using the trained model. This process is repeated for each spectrum, and the outputs are compared to the target values $z_j$ to determine the accuracy of the classifier.

The goal of cross-validation is to measure how well a classifier generalizes to a data set independent from the set used for training. As can be expected, the spectra extracted from the same sample exhibit a large degree of dependence. To compensate for this, spectra from the same sample as xj are also removed from the training set. Another issue was the difference in the number of spectra from each sample as discussed earlier in this section. This was remedied by trimming the training set to use the same number of spectra from each sample. This trimming, however, was carried out randomly, introducing some variation into the performance of cross-validation. Thus, cross-validation was repeated 10 times, and the accuracies were averaged in the results of Table I. The MATLAB implementation of SVM was used with the penalty C = 1. The MATLAB RVM library [23] was used for RVM classification.
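The validation procedure described above can be sketched in Python as follows. Two points are assumptions of this sketch rather than details given in the text: the training set is trimmed to the size of the smallest remaining sample, and a nearest-mean classifier stands in for the SVM and RVM so that the example stays self-contained. All function names are hypothetical.

```python
import random


def training_indices(test_index, sample_ids, rng):
    """Training indices when the spectrum at test_index is held out.

    Every spectrum from the same sample as the held-out spectrum is
    excluded, and each remaining sample is randomly trimmed to the size
    of the smallest remaining sample (an assumed trimming rule).
    """
    held_out = sample_ids[test_index]
    by_sample = {}
    for i, sid in enumerate(sample_ids):
        if sid != held_out:
            by_sample.setdefault(sid, []).append(i)
    n_keep = min(len(idx) for idx in by_sample.values())
    train = []
    for idx in by_sample.values():
        train.extend(rng.sample(idx, n_keep))
    return sorted(train)


def grouped_loo_accuracy(spectra, labels, sample_ids, fit, predict, seed=0):
    """Fraction of spectra classified correctly under the scheme above."""
    rng = random.Random(seed)
    correct = 0
    for j in range(len(spectra)):
        train = training_indices(j, sample_ids, rng)
        model = fit([spectra[i] for i in train], [labels[i] for i in train])
        correct += int(predict(model, spectra[j]) == labels[j])
    return correct / len(spectra)


def fit_nearest_mean(xs, zs):
    """Stand-in classifier: store the mean spectrum of each class."""
    means = {}
    for label in set(zs):
        rows = [x for x, z in zip(xs, zs) if z == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict_nearest_mean(means, x):
    """Assign the class whose mean spectrum is closest to x."""
    def sqdist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda label: sqdist(means[label]))


# Toy data for illustration only: four samples with unequal spectrum counts.
SAMPLE_IDS = ["s1"] * 3 + ["s2"] * 2 + ["s3"] * 4 + ["s4"] * 2
LABELS = [0] * 5 + [1] * 6
SPECTRA = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [0.0, 0.3], [0.3, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.2], [4.9, 5.3], [5.3, 5.0]]

if __name__ == "__main__":
    print(grouped_loo_accuracy(SPECTRA, LABELS, SAMPLE_IDS,
                               fit_nearest_mean, predict_nearest_mean))
```

Since the trimming is random, repeating the call with different seeds and averaging the accuracies mirrors the repeated cross-validation described in the text.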

TABLE I
Classification Results Per Sample

Table I shows the performance of the SVM and RVM for the problem of classifying mass spectra corresponding to astrocytoma and oligodendroglioma. Each row lists the sample number, the subtype determined by the pathologist (A for astrocytoma and O for oligodendroglioma), and the percentage of the spectra in this sample that were properly classified using SVM and RVM. Comprehensive classification accuracies are listed at the bottom of the table, representing the percentage of all astrocytoma and oligodendroglioma spectra that were properly classified as well as the overall classification accuracy. These accuracies may be analyzed to identify samples that were given incorrect subtypes by the pathologists, or to identify samples that are chemically dissimilar to other samples of the same subtype.

Finally, a feature selection technique was applied to the spectra. With thousands of points in the spectrum to describe two classes, redundancy in the information is expected [24]. Feature selection not only can improve the classification performance but also can aid in identifying the appropriate biomarkers for each glioma type. We implemented an iterative method of reducing the number of features [24]. The feature points were split into 10 subsets of size $P = M/10$, namely, $\{x_1, \ldots, x_P\}$, $\{x_{P+1}, \ldots, x_{2P}\}$, …, $\{x_{9P+1}, \ldots, x_M\}$. The purpose was to identify the subset with the least effect on classification performance; these features were assumed to be either redundant or noisy, and were removed. The classification performance was measured by cross-validation, using an SVM classifier. This process was repeated iteratively, removing the features that contribute least to the classification performance in each iteration. A summary of these results is shown in Table II. As shown in Table II, the classification performance increases to 98% by selecting the most significant features.
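The iterative elimination can be sketched in Python as follows. The sketch interprets "least effect on classification performance" as the subset whose removal leaves the highest score, which is an assumption; `score` is a placeholder for the cross-validated SVM accuracy used in the paper, and the toy score in the usage example is invented for illustration.

```python
def split_into_subsets(features, n_subsets=10):
    """Split a list of feature indices into contiguous, near-equal subsets."""
    size = -(-len(features) // n_subsets)  # ceiling division
    return [features[i:i + size] for i in range(0, len(features), size)]


def eliminate_once(features, score, n_subsets=10):
    """Remove the subset whose removal leaves the highest score."""
    best_remaining, best_score = None, None
    for subset in split_into_subsets(features, n_subsets):
        dropped = set(subset)
        remaining = [f for f in features if f not in dropped]
        s = score(remaining)
        if best_score is None or s > best_score:
            best_remaining, best_score = remaining, s
    return best_remaining, best_score


def iterative_selection(n_features, score, n_iterations, n_subsets=10):
    """Repeat the elimination step and record (feature count, score) pairs."""
    features = list(range(n_features))
    history = [(len(features), score(features))]
    for _ in range(n_iterations):
        if len(features) < 2:
            break
        features, s = eliminate_once(features, score, n_subsets)
        history.append((len(features), s))
    return features, history


def toy_score(features):
    """Invented score: rewards keeping features 3 and 17, penalizes the rest."""
    informative = {3, 17}
    kept = len(informative & set(features)) / len(informative)
    return kept - 0.001 * len(features)


if __name__ == "__main__":
    selected, history = iterative_selection(20, toy_score, n_iterations=5)
    print(selected)
    print(history)
```

Each iteration costs one full cross-validation per candidate subset, so with 10 subsets the procedure requires roughly 10 cross-validation runs per elimination step.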

TABLE II
Iterative Feature Selection Results

IV. Conclusion

In this paper, we proposed an approach for glioma subtype classification using mass spectrometry data. We considered two sparse kernel methods for glioma subtype classification, namely, SVM and RVM, and showed that sparse kernel machines can be used for glioma subtype classification with high accuracy. Future work includes exploring alternative preprocessing, feature selection, and classification methods, and extending this framework to the multi-class glioma classification problem and to biomarker detection. Identifying the most significant molecules using a feature selection method can provide deeper insight into the pathology of the condition and can improve treatment methods.

Acknowledgments

J. Huang and B. Gholami acknowledge several fruitful discussions with Dr. Vandana Mohan.

This research was supported in part by a grant from NIH (NAC P41 RR-13218) through Brigham and Women’s Hospital, and by grants from the Air Force Office of Scientific Research, Army Research Office, the National Science Foundation, the Brain Science Foundation, Daniel E. Ponton Fund for the Neurosciences, and the NIH Director’s New Innovator Award DP2 OD007383. This work is part of the National Alliance for Medical Image Computing (NAMIC), funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 EB005149.

Contributor Information

Jacob Huang, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, jhuang/at/gatech.edu.

Behnood Gholami, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, behnood/at/gatech.edu.

Nathalie Y. R. Agar, Department of Neurosurgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02115, nagar/at/rics.bwh.harvard.edu.

Isaiah Norton, Department of Neurosurgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02115, inorton/at/partners.org.

Wassim M. Haddad, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, wm.haddad/at/aerospace.gatech.edu.

Allen R. Tannenbaum, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, tannenba/at/ece.gatech.edu.

References

[1] Jemal A, Siegel R, Xu J, Ward E. Cancer Statistics, 2010. CA Cancer J. Clin. 2010 Jul; http://caonline.amcancersoc.org/
[2] Altekruse SF, Kosary CL, Krapcho M, Neyman N, Aminou R, Waldron W, Ruhl J, Howlader N, Tatalovich Z, Cho H, Mariotto A, Eisner MP, Lewis DR, Cronin K, Chen HS, Feuer EJ, Stinchcomb DG, Edwards BK. SEER Cancer Statistics Review 1975-2007. Bethesda, MD: http://www.seer.cancer.gov/csr/1975_2007/
[3] American Cancer Society. Brain and Spinal Cord Tumors in Adults. Atlanta, GA: 2009. http://www.cancer.org/cancer/braincnstumorsinadults/
[4] Edge SB, Byrd DR. AJCC Cancer Staging Manual. 7th ed. Springer-Verlag; New York, NY: 2009.
[5] Louis DN, Ohgaki H, Wiestler OD, Cavenee WK, Burger PC, Jouvet A, Scheithauer BW, Kleihues P. The 2007 WHO classification of tumours of the central nervous system. Acta Neuropathologica. 2007 Aug;114(2):97–109.
[6] Central Brain Tumor Registry of the United States. Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2004-2007. Hinsdale, IL: 2011. http://www.cbtrus.org/reports/reports.html
[7] Cha S. Update on brain tumor imaging: from anatomy to physiology. Am. J. Neurorad. 2006 Mar;27(3):475–87.
[8] Eberlin LS, Dill AL, Golby AJ, Ligon KL, Wiseman JM, Cooks RG, Agar NYR. Discrimination of human astrocytoma subtypes by lipid analysis using desorption electrospray ionization imaging mass spectrometry. Angew. Chem. Int. Ed. Engl. 2010 Aug;49(34):5953–6.
[9] Mohan V, Agar N, Jolesz F, Tannenbaum A. Automatic Classification of Glioma Subtypes and Biomarker Identification Using DESI Mass Spectrometry Imaging; MICCAI 2010 Workshop Comp. Imag. Biomark. Tumors; 2010.
[10] Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003 Mar;422(6928):198–207.
[11] Dill AL, Ifa DR, Manicke NE, Costa AB, Ramos-Vara JA, Knapp DW, Cooks RG. Lipid profiles of canine invasive transitional cell carcinoma of the urinary bladder and adjacent normal tissue by desorption electrospray ionization imaging mass spectrometry. Anal. Chem. 2009 Nov;81(21):8758–64.
[12] Köhler M, Machill S, Salzer R, Krafft C. Characterization of lipid extracts from brain tissue and tumors using Raman spectroscopy and mass spectrometry. Anal. Bioanal. Chem. 2009 Mar;393(5):1513–20.
[13] Beljebbar A, Dukic S, Amharref N, Bellefqih S, Manfait M. Monitoring of biochemical changes through the c6 gliomas progression and invasion by fourier transform infrared (FTIR) imaging. Anal. Chem. 2009 Nov;81(22):9247–56.
[14] Hillenkamp F, Karas M, Beavis RC, Chait BT. Matrix-assisted laser desorption/ionization mass spectrometry of biopolymers. Anal. Chem. 1991 Dec;63(24):1193A–1203A.
[15] Agar NYR, Malcolm JG, Mohan V, Yang HW, Johnson MD, Tannenbaum A, Agar JN, Black PM. Imaging of meningioma progression by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Anal. Chem. 2010 Apr;82(7):2621–5.
[16] Cooks RG, Ouyang Z, Takats Z, Wiseman JM. Detection Technologies. Ambient mass spectrometry. Science. 2006 Mar;311(5767):1566–70.
[17] Semmes OJ, Feng Z, Adam B-L, Banez LL, Bigbee WL, Campos D, Cazares LH, Chan DW, Grizzle WE, Izbicka E, Kagan J, Malik G, McLerran D, Moul JW, Partin A, Prasanna P, Rosenzweig J, Sokoll LJ, Srivastava S, Srivastava S, Thompson I, Welsh MJ, White N, Winget M, Yasui Y, Zhang Z, Zhu L. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin. Chem. 2005 Jan;51(1):102–12.
[18] Li A, Walling J, Ahn S, Kotliarov Y, Su Q, Quezado M, Oberholtzer JC, Park J, Zenklusen JC, Fine HA. Unsupervised analysis of transcriptomic profiles reveals six glioma subtypes. Cancer Res. 2009 Mar;69(5):2091–9.
[19] Wagner M, Naik D, Pothen A. Protocols for disease classification from mass spectrometry data. Proteomics. 2003 Sep;3(9):1692–8.
[20] Bishop CM. Pattern Recognition and Machine Learning. Springer-Verlag; New York, NY: 2006.
[21] Tipping ME. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001 Aug;1(3):211–244.
[22] Takáts Z, Wiseman JM, Cooks RG. Ambient mass spectrometry using desorption electrospray ionization (DESI): instrumentation, mechanisms and applications in forensics, chemistry, and biology. J. Mass. Spec. 2005 Oct;40(10):1261–75.
[23] Tipping M, Faul A. Fast Marginal Likelihood Maximisation for Sparse Bayesian Models. Proc. Ninth Int’l Workshop Artif. Intel. Stat. Key West, FL: 2003. http://research.microsoft.com/enus/um/cambridge/events/aistats2003/proceedings/papers.htm
[24] Pereira F, Mitchell T, Botvinick M. Machine learning classifiers and fMRI: a tutorial overview. NeuroImage. 2009 Mar;45(1 Suppl):S199–209.