HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.

Magn Reson Med. Author manuscript; available in PMC 2010 December 1.
Published in final edited form as:
PMCID: PMC2863141
NIHMSID: NIHMS165529

Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme

Abstract

The objective of this study is to investigate the use of pattern classification methods for distinguishing different types of brain tumors, such as primary gliomas from metastases, and also for grading of gliomas. The availability of an automated computer analysis tool that is more objective than human readers can potentially lead to more reliable and reproducible brain tumor diagnostic procedures. A computer-assisted classification method combining conventional MRI and perfusion MRI is developed and used for differential diagnosis. The proposed scheme consists of several steps including ROI definition, feature extraction, feature selection and classification. The extracted features include tumor shape and intensity characteristics as well as rotation invariant texture features. Feature subset selection is performed using Support Vector Machines (SVMs) with recursive feature elimination. The method was applied on a population of 102 brain tumors histologically diagnosed as metastasis (24), meningiomas (4), gliomas WHO grade 2 (22), gliomas WHO grade 3 (18), and glioblastomas (34). The binary SVM classification accuracy, sensitivity, and specificity, assessed by leave-one-out cross-validation, were respectively 85%, 87%, and 79% for discrimination of metastases from gliomas, and 88%, 85%, and 96% for discrimination of high grade (grade III and IV) from low grade (grade II) neoplasms. Multi-class classification was also performed via a one-versus-all voting scheme.

Keywords: brain tumor, MRI, classification, SVM, feature selection, texture, tumor grade

1. Introduction

Clinical decisions regarding the treatment of brain neoplasms rely, in part, on MRI at various stages of the treatment process. Radiological diagnosis is based on the multi-parametric imaging profile (CT, conventional MRI, advanced MRI). Tumor characterization is difficult, because neoplastic tissue is often heterogeneous in spatial and imaging profiles [1], and for some imaging techniques often overlaps with normal tissue (especially the infiltrating part) [2][3]. Gliomas might show mixed characteristics, for example demonstrating both low and high grade features. The reference standard for characterizing brain neoplasms is currently based on histopathologic analysis following surgical biopsy or resection, but this also has limitations including sampling error and variability in interpretation [1][4].

The objective of this study is to provide an automated tool that may assist in the imaging evaluation of brain neoplasms by determining the glioma grade and differentiating between different tissue types, such as primary neoplasms (gliomas) from secondary neoplasms (metastases). These issues are of critical clinical importance in making decisions regarding initial and evolving treatment strategies, and conventional MR imaging is often not adequate in providing answers [1][5]. Automated tools, if proven accurate, can ultimately be applied to (i) provide more reliable differentiation, especially when the neoplasm is heterogeneous and therefore cannot be adequately sampled by localized needle biopsy, (ii) avoid invasive procedures such as biopsy, especially in cases where the risks outweigh the benefits, and (iii) expedite or anticipate the diagnosis (histological examination is usually time consuming).

Toward a similar goal, researchers used conventional MR imaging and echo-planar relative cerebral blood volume (rCBV) maps calculated from perfusion imaging to differentiate between high grade and low grade neoplasms [6], or assessed the contribution of MR perfusion alone in differentiating certain tumor types [7][8]. Many studies have used MR spectroscopy for brain tumor classification [9][10][11]. Specifically, spectroscopic and conventional MR imaging were used in [9] to differentiate benign from malignant brain neoplasms applying a decision tree algorithm, whereas spectroscopic and perfusion MRI was used in [10] to evaluate the inherent heterogeneity of brain neoplasms by defining four regions of interest (ROIs) in the tumoral and peritumoral region. Some studies used apparent diffusion coefficient (ADC) maps computed from Diffusion Tensor Imaging (DTI) data to differentiate metastases from primary cerebral tumors (by measuring diffusion in peritumoral edema) [12] or combined ADC with rCBV to differentiate tumefactive demyelinating lesions and primary neoplasms from abscesses and lymphomas [13].

The previous studies are very useful in determining the clinical significance of each MR sequence separately; however, they do not investigate non-linear relationships between different variables by pattern analysis. Pattern classification techniques were applied for differentiating brain neoplasms based on Linear Discriminant Analysis (LDA) [14][15], or Independent Component Analysis [16] on spectral intensities. Others applied Support Vector Machines (SVMs) on perfusion MRI [17] or combined variable selection and classification using Bayesian least squares SVMs and Relevance Vector Machines on microarray or spectroscopy data [18]. Textural features from T1 post-contrast images were employed in [19] to discriminate between metastatic and primary brain tumors using a probabilistic neural network with a non-linear least squares features transformation method. Although these studies apply more advanced machine learning techniques, they use a single MR sequence and do not investigate the contribution of multiple imaging parameters. Multi-parametric features are explored by non-linear classification techniques in [20][21]. Li et al. [20] classify gliomas according to their clinical grade using linear SVMs trained on a maximum of 15 descriptive features (such as amount of mass effect or blood supply), which are estimated quantitatively by domain experts. The definition of such features is based on expert knowledge and therefore is not completely automated and reproducible. Devos et al. [21] combine standard MR intensities with spectroscopy imaging to improve classification performance using 3 classification techniques.

In this study, we explore the heterogeneous regions of brain tumors by combining imaging features from several sequences and extract morphological and texture characteristics. Our analysis requires 3 ROIs, which define the neoplastic and necrotic region on contrast enhanced T1-weighted MRI, and edematous region on FLAIR image. Instead of only measuring mean values in the ROIs, we investigate if conventional MRI has higher potential when complicated features are extracted, such as rotation invariant texture features based on Gabor filtering [22]. Moreover, since it has been shown that contrast enhancement in conventional MRI is not sufficient for glioma grading or differentiating between metastasis and high grade tumor, we also incorporate rCBV maps. This approach incorporates imaging data which are (or can be) acquired in a routine clinical protocol, including multi-parametric conventional MRI and perfusion.

We apply the method for binary classification, but also investigate the multi-class classification problem for differentiating between the most common brain tumors: metastasis, meningioma (usually grade I), low grade glioma (grade II), grade III glioma, and glioblastoma (grade IV) according to the World Health Organization (WHO) system. Grade II and grade III gliomas include astrocytomas (anaplastic or not), oligodendrogliomas and oligoastrocytomas. The proposed framework consists of a training step, where the classifier learns the imaging characteristics of the different tumor types, and a testing step, where the classifier recognizes the tumor type in a new image. The framework is general and, if a significant number of observations (training samples) exists, can be applied to classify any other neoplasm and also non-neoplastic pathologies including lymphoma, abscess, encephalitis. We differentiate between tumor types by combining multi-parametric MR images into a single classification rule rather than using single modalities independently. We exploit the potential of features extracted automatically from the images in order to avoid descriptive criteria, which are rater-dependent and require prior knowledge (the help of experts).

The proposed scheme consists of four parts: ROI definition, feature extraction, feature selection and classification based on SVMs. Leave-one-out cross-validation is used to test the robustness and accuracy of the proposed classification scheme.

2. Methods

We propose a multi-parametric framework for brain tumor classification and prediction of degree of malignancy. Intensity and texture based features are integrated via a pattern classification technique into a multi-parametric imaging profile. The features are first normalized to have zero mean and unit variance. A feature selection method is then used to select a small set of effective features for classification in order to improve the generalization ability and the performance of the classifier.

In this section, the clinical data and the acquisition protocol are first reviewed. Then, the preprocessing of the data and the ROIs for feature extraction are described followed by the definition of the features. Finally, the methods for feature selection and classification are presented.

2.1. Data

We examined 98 patients (52 women, 46 men; age 17-83 years) with a diagnosis of brain neoplasm (from September 2006 to December 2007) who had not been treated at the time of MRI. Four patients each had two lesions that were unrelated to each other and were regarded as independent masses. All patients underwent biopsy or surgical resection of the tumor with histopathological diagnosis. The total of 102 brain masses were histologically diagnosed and graded based on World Health Organization (WHO) criteria as

  • metastasis (24),
  • meningiomas (4),
  • gliomas grade II (22) including astrocytomas, oligodendrogliomas, oligoastrocytomas, ependymomas and gliomatosis cerebri,
  • gliomas grade III (18) including anaplastic astrocytomas and (anaplastic) oligodendrogliomas,
  • glioblastomas (GBMs) (34) including 1 giant cell GBM.

The primary sites of cancer for patients with metastatic lesions were lung (14), breast (5), melanoma (3) and esophagus (1). For one case the primary tumor type could not be confirmed; it most likely originated from breast or lung. The study was approved by the Institutional Review Board and was compliant with the Health Insurance Portability and Accountability Act (HIPAA).

The patients were imaged using a 3.0 Tesla MRI scanner system (MAGNETOM Trio Tim System, Siemens Medical Systems, Erlangen, Germany), with a multi-channel phased-array coil. The imaging acquisition protocol was the same for all patients and included the following sequences: axial 3D T1-weighted (T1) (magnetization prepared rapid acquisition gradient echo TR/TE/TI 1760/3.1/950, matrix size 192×256, pixel spacing 0.9766×0.9766 mm, slice thickness 1 mm), sagittal 3D T2-weighted (T2) (matrix size 256×320, pixel spacing 0.8969×0.8969 mm, slice thickness 0.9 mm), Fluid Attenuated Inversion Recovery (FLAIR) (TR/TE/TI 9420/141/2500, matrix size 192×256, pixel spacing 0.9375×0.9375 mm, slice thickness 3 mm) and DTI. DTI was not used as part of this study, but will be incorporated in the future. Axial 3D contrast enhanced T1-weighted images (T1ce) were obtained after administration of a standard dose (0.1 mmol/kg) of gadodiamide with a power injector (Medrad, Indianola, PA). Also, after an initial loading dose of 3 ml gadodiamide and a 5 min delay, T2*-weighted dynamic susceptibility perfusion MRI was performed using a gradient echo EPI acquisition during bolus injection of 12 ml gadodiamide. Twenty slices were acquired with TR/TE 2000/45 ms, matrix size 128×128, pixel spacing 1.7×1.7 mm, slice thickness 3.0 mm. Relative cerebral blood volume (rCBV) maps were generated off-line by calculating the change in relaxation rate using the equation ΔR2*(t) = −ln(S(t)/S0)/TE, where S(t) is signal intensity at time t and S0 is baseline signal intensity, and then integrating under the ΔR2*(t) curve over time points corresponding to the first pass of the contrast bolus.
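The rCBV computation described above can be sketched for a single voxel as follows. This is a minimal illustration, not the off-line software used in the study: the function name, the choice of baseline and first-pass time windows, and the trapezoidal integration rule are all assumptions made for the example.

```python
import numpy as np

def relative_cbv(signal, te, baseline_idx, first_pass_idx, dt):
    """Integrate Delta R2*(t) over the first pass of the contrast bolus.

    signal         : 1D array S(t) for one voxel
    te             : echo time (seconds)
    baseline_idx   : slice of pre-bolus time points used to estimate S0 (assumed)
    first_pass_idx : slice of time points covering the first pass (assumed)
    dt             : time between successive samples, i.e. the TR (seconds)
    """
    signal = np.asarray(signal, dtype=float)
    s0 = signal[baseline_idx].mean()          # baseline signal S0
    delta_r2 = -np.log(signal / s0) / te      # Delta R2*(t) = -ln(S(t)/S0)/TE
    seg = delta_r2[first_pass_idx]
    # Trapezoidal rule (an assumed choice of numerical integration).
    return float(np.sum(0.5 * (seg[1:] + seg[:-1]) * dt))

# Synthetic voxel: constant baseline, then a signal drop during the bolus.
te, tr = 0.045, 2.0                           # TE 45 ms, TR 2000 ms as in the protocol
sig = np.full(20, 100.0)
sig[5:10] = 100.0 * np.exp(-te * 2.0)         # corresponds to Delta R2* = 2.0 s^-1
rcbv = relative_cbv(sig, te, slice(0, 5), slice(5, 10), tr)
```

In practice the first-pass window would have to be detected per voxel or per subject; here it is simply passed in.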

2.2. Preprocessing and definition of ROIs

The images are preprocessed following a number of steps including noise reduction, bias-field correction, and rigid intra-subject registration using the public software package FSL [23]. The co-registration of all sequences (T1, T1ce, T2, FLAIR, rCBV) is required in order to extract features from the ROIs and is performed with the rigid registration algorithm FLIRT [24] from FSL. The intensity levels are made comparable across subjects by histogram matching. For this purpose skull stripping is first performed using BET [25] to generate a brain tissue mask from the T1 image which is then used to extract the brain region from all other co-registered sequences. A linear transformation of the intensities (translation and scaling) is applied in order to minimize the L2-norm of the histogram difference between each subject and a template image. Histogram matching is not applied to the rCBV maps, calculated from the perfusion sequence.
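The linear intensity normalization can be illustrated with a brute-force search over the scaling and translation. This is only a sketch under stated assumptions: the function name, the grid search, the number of bins and the intensity range are hypothetical, and the optimizer actually used in the study is not specified in the text.

```python
import numpy as np

def linear_histogram_match(subject, template, scales, shifts,
                           bins=64, value_range=(0.0, 256.0)):
    """Return (a, b) such that the histogram of a*subject + b is closest,
    in the L2 sense, to the histogram of the template."""
    subject = np.asarray(subject, dtype=float).ravel()
    template = np.asarray(template, dtype=float).ravel()
    ref, _ = np.histogram(template, bins=bins, range=value_range)
    ref = ref / template.size                  # normalised template histogram
    best_cost, best_a, best_b = np.inf, 1.0, 0.0
    for a in scales:                           # candidate scalings
        for b in shifts:                       # candidate translations
            h, _ = np.histogram(a * subject + b, bins=bins, range=value_range)
            cost = np.sum((h / subject.size - ref) ** 2)
            if cost < best_cost:
                best_cost, best_a, best_b = cost, a, b
    return best_a, best_b
```

Only brain voxels (after skull stripping) would be passed in as `subject` and `template`, so that background does not dominate the histograms.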

The features were extracted from ROIs manually traced by two expert neuroradiologists. A maximum of 4 ROIs were used:

  • ROI1 (neoplastic, enhancing), ROI2 (neoplastic, non-enhancing): includes all non-necrotic enhancing neoplastic tissue, or, if the lesion did not show enhancement, the whole non-necrotic T1-hypointense neoplastic tissue avoiding peritumoral edema by tracing the FLAIR image.
  • ROI3 (necrotic): this ROI was delineated only in cases including necrotic tumor tissue by tracing the coregistered T1ce, T1, T2 and FLAIR images and by excluding hemorrhage.
  • ROI4 (edematous): FLAIR and T2 images were used to depict the peritumoral edema (possibly including neoplastic infiltration), drawing the ROI surrounding the high signal intensity seen on these sequences.

It should be noted that the ROIs were drawn based on the signal intensity of different sequences and are not expected to be highly specific to each tissue type. For example, non-enhancing neoplastic tissue and edematous tissue might overlap and be both present in ROI2 or ROI4.

2.3. Feature description

We chose a large number of features (161) for investigation which included age, tumor shape characteristics, image intensity characteristics within some of the ROIs and Gabor texture features, as explained next.

Shape and statistical characteristics of tumor (evaluated in ROI1 ∪ ROI2 ∪ ROI3)

Five shape features (si, i = 1,…,5) of the total tumor area are investigated, i.e. the tumor circularity, irregularity, rectangularity, the entropy of radial length distribution of the boundary voxels, and the surface-to-volume ratio. Also three statistical features are calculated, i.e. the ratio of enhancing (venh), necrotic (vnec), and edematous (vedm) tumor volume versus total (enhancing and non-enhancing) tumor volume.

Image intensity characteristics

The mean (μ) and variance (var) of image intensities of T1, T1ce, T2 are calculated in the central and marginal area of ROI1, ROI2 and ROI3. For FLAIR images the same intensity characteristics are extracted from ROI4. For the rCBV maps, since hyperintense areas are indicative but not specific to tumor, we first mask out areas that appear hypointense in FLAIR and then calculate the mean and variance of rCBV in the central and marginal region of ROI1, ROI2 and ROI4. More specifically, the voxels with low intensity on the FLAIR image, such as ventricles or peripheral vessels in the cortical sulci, are excluded from the analysis of the rCBV maps, because they may represent normal vasculature which may be indistinguishable from abnormal neovascularity due to neoplastic infiltration. The intensity-related features amount to 52 in total. These features are denoted as μcR(I), varcR(I), when measured in the central area and μmR(I), varmR(I), when measured in the marginal area, where R ∈ {1, …, 4} for ROI1 to ROI4, and I ∈ {T1, T1ce, T2, FLAIR, rCBV}.

Gabor texture

The voxel-wise texture features of image I(x, y, z) are extracted at each tomographic slice of the 3D ROI by convolving with 2D Gabor filters [26][27] and averaging inside the ROI. The 2D Gabor filters are mathematically described at location (x, y) as

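$$g_{\lambda,\theta,\sigma,\psi}(x,y)=\exp\left(-\frac{x_\theta^2+\gamma^2 y_\theta^2}{2\sigma^2}\right)\cos\left(\frac{2\pi}{\lambda}x_\theta+\psi\right)$$

where $x_\theta = x\cos(\theta) + y\sin(\theta)$ and $y_\theta = -x\sin(\theta) + y\cos(\theta)$.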

λ = 1/f is the wavelength, θ the orientation, γ the spatial aspect ratio which determines the eccentricity of the convolution kernel and which was taken as γ = 1 in this study, and ψ the phase offset which determines the symmetry of the Gabor function. The standard deviation σ of the Gaussian factor determines the neighborhood of a voxel in which weighted summation takes place. The ratio σ/λ determines the spatial frequency bandwidth. We used a constant ratio σ/λ [27] and therefore σ is not considered as independent parameter.

We calculated the texture by combining the output of a symmetric (ψ = 0) and anti-symmetric (ψ = π/2) Gabor kernel using the L2-norm [27], as shown below:

$$G_{\lambda,\theta}(x,y)=\sqrt{\left(I(x,y)\otimes g_{\lambda,\theta,\psi=0}(x,y)\right)^2+\left(I(x,y)\otimes g_{\lambda,\theta,\psi=\pi/2}(x,y)\right)^2},$$

where ⊗ denotes 2D linear convolution. Then, in order to make the average Gabor features rotation invariant, for each radial frequency f a Fast Fourier Transform (FFT) is performed across orientation θ [22]. Suppose we use Nθ orientations within a period of π, then Nθ magnitudes of Fourier coefficients can be obtained for each frequency f, but only the first Nc = Nθ/2 + 1 coefficients for each frequency are unique.

We extract the texture features from the T1ce in ROI1 ∪ ROI2 ∪ ROI3 and from FLAIR in ROI4. We use 5 wavelengths, λ = {2√2, 4, 4√2, 8, 8√2}, and Nθ = 8 orientations in [0, π] which after the FFT produce Nc = Nθ/2 + 1 = 5 unique coefficients for each frequency. Therefore for each bandwidth and each of the two images we obtain 25 rotation-invariant texture features, glc(I), where l = 1,…,5 is the wavelength index and c = 1,…,5 is the index on FFT coefficients and I ∈ {T1ce, FLAIR}. We used 2 bandwidth values (1 and 1.5) and finally obtained 100 features in total (25 × 2 bandwidths × 2 images) describing texture in T1ce and FLAIR for tumor classification.
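The texture pipeline for one slice, one wavelength and one bandwidth can be sketched as follows. This is an illustrative reading of the description, not the authors' code. Assumptions: the kernel is truncated at 3σ, the bandwidth is taken in octaves and converted to σ/λ with the relation commonly used for Gabor filters, the convolution is zero-padded, and all function names are hypothetical.

```python
import numpy as np

def sigma_from_bandwidth(lam, b):
    # Assumed octave-bandwidth relation; gives sigma/lambda of about 0.56 for b = 1.
    return lam / np.pi * np.sqrt(np.log(2.0) / 2.0) * (2.0**b + 1.0) / (2.0**b - 1.0)

def gabor_kernel(lam, theta, psi, sigma, gamma=1.0):
    half = int(np.ceil(3 * sigma))                     # truncation radius (assumed)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + gamma**2 * y_t**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_t / lam + psi)

def conv2_same(image, kernel):
    """2D linear convolution via zero-padded FFT, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.irfft2(np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape), s=shape)
    r0, c0 = (kh - 1) // 2, (kw - 1) // 2
    return full[r0:r0 + h, c0:c0 + w]

def rotation_invariant_gabor(image, mask, lam, bandwidth=1.0, n_theta=8):
    """Return the Nc = n_theta/2 + 1 rotation-invariant features for one wavelength."""
    sigma = sigma_from_bandwidth(lam, bandwidth)
    means = []
    for k in range(n_theta):                           # orientations within a period of pi
        theta = k * np.pi / n_theta
        even = conv2_same(image, gabor_kernel(lam, theta, 0.0, sigma))
        odd = conv2_same(image, gabor_kernel(lam, theta, np.pi / 2, sigma))
        energy = np.sqrt(even**2 + odd**2)             # L2-norm of the two responses
        means.append(energy[mask].mean())              # average inside the ROI
    coeffs = np.abs(np.fft.fft(np.array(means)))       # FFT across orientation
    return coeffs[: n_theta // 2 + 1]                  # unique magnitudes only
```

Rotating the image rotates the orientation responses cyclically, and the magnitude of the Fourier coefficients is unchanged by a cyclic shift, which is why the retained coefficients do not depend on the orientation of the lesion. Repeating the call for the 5 wavelengths gives 25 values per image and bandwidth.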

Fig. 1 illustrates in the first row the Gabor filter for a single frequency across the first 5 (out of 8) orientations and in the 2nd row the rotation-invariant filters after Fast Fourier Transform (FFT) for the same frequency. Fig. 2 illustrates examples of brain tumor types and the corresponding texture images. The first row shows one axial slice of the T1ce image with the tumoral region of interest (ROI1 ∪ ROI2 ∪ ROI3) indicated by a red line. The 2nd row shows the same slice in FLAIR image zoomed around the tumor region. As an example of texture, one pattern extracted from FLAIR is shown over the edematous area (ROI4). The illustration shows the voxel-wise texture without averaging over the area of interest.

Fig. 1
Examples of filters used to extract texture features. The 1st row shows Gabor filters for same frequency and different orientations and the 2nd row the rotation-invariant filters.
Fig. 2
MR images of different brain tumor types and an example of texture images extracted from the edematous area. From left to right: meningioma, glioma grade II, grade III, grade IV and metastasis. 1st row: T1ce image with the tumoral region of interest. ...

It should be noted that texture features are affected by the MRI acquisition parameters, especially the spatial resolution [28]. Therefore it is important to use the same acquisition protocol during training of the classifier and during testing, i.e. when the classifier learns the imaging characteristics and when the classifier recognizes a new case, respectively.

2.4. Feature selection

Feature selection methods can be divided into feature ranking methods and feature subset selection methods [29]. The feature ranking methods compute a ranking score for each feature according to its discriminative power, and then simply select the top ranked features as final features for classification. These feature selection methods are preferable for high dimensional problems, due to their computational scalability. However, the subset of features selected by the feature ranking methods might contain a lot of redundant features, since the ranking score is computed independently for each feature, by completely ignoring its correlation with others. On the contrary, the feature subset selection methods focus on selecting a subset of features that jointly have better discriminative power. In general, sophisticated subset selection methods have better classification performance than feature ranking methods, but their high computational cost usually limits their applications to the high dimensional problems. Since combinatorial search of the optimal combination of features would be computationally prohibitive, we combine the advantages of both the feature ranking method and the feature subset selection method. Specifically, we first reduce the number of features by eliminating the less relevant features using a forward selection method based on a ranking criterion and then apply backward feature elimination using a feature subset selection method, as explained next.

1) Ranking-based criterion

We use a simple ranking-based feature selection criterion, a two-tailed t-test, which measures the significance of a difference of means between two distributions [30], and therefore evaluates the discriminative power of each individual feature in separating two classes. If the features are assumed to come from normal distributions with unknown, but equal, variances, the t-statistic is defined as:

$$t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{(N_1-1)v_1+(N_2-1)v_2}{N_1+N_2-2}\left(\dfrac{1}{N_1}+\dfrac{1}{N_2}\right)}},$$

where x̄1, v1 and x̄2, v2 are the sample mean and sample variance of a particular feature in the 1st and 2nd class, respectively. N1 and N2 are the numbers of samples in each class.
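The pooled-variance t-statistic can be evaluated for all features at once. The sketch below assumes one row per sample and one column per feature; the function name is hypothetical and the p-value step (which needs the t-distribution) is left out.

```python
import numpy as np

def pooled_t_statistic(x1, x2):
    """Two-sample t-statistic with pooled (equal) variance, one value per feature.

    x1, x2 : arrays of shape (N1, n_features) and (N2, n_features)
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    v1 = x1.var(axis=0, ddof=1)                # sample variance, class 1
    v2 = x2.var(axis=0, ddof=1)                # sample variance, class 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (x1.mean(axis=0) - x2.mean(axis=0)) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))

# Worked example: means 2 and 4, variances 1 and 4, pooled variance 2.5,
# so t = -2 / sqrt(2.5 * 2/3) = -1.549...
t = pooled_t_statistic([[1.0], [2.0], [3.0]], [[2.0], [4.0], [6.0]])
```

Features would then be ranked by the magnitude of t, since the test is two-tailed.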

Since the correlation among features has been completely ignored in this feature ranking method, redundant features are inevitably selected, which ultimately affects the classification results. Therefore, we use this feature ranking method to select the more discriminative features, e.g. by applying a cut-off ratio (p-value <= α, where α = 0.1 is the significance level for rejecting the null hypothesis), and then apply the feature subset selection method on the reduced feature space, as detailed in the next paragraph. The remaining features (with p-value > α) are ranked by applying the t-test using a bagging strategy [31]. Specifically, the training set is divided in 5 folds and the features are ranked at each fold separately. Then the rankings are combined to produce the total rank for each feature based on forward selection.

2) Feature subset selection method based on SVM

SVM-based feature selection methods have been successfully applied in a variety of problems. One good example is the support vector machine recursive feature elimination (SVM-RFE) algorithm which was initially proposed for a cancer classification problem [32], and was later extended by introducing SVM-based leave-one-out error bound criteria in [33]. The goal of SVM-RFE is to find a subset of features that optimizes the performance of the classifier. This algorithm determines the ranking of the features based on a backward sequential selection method that removes one feature at a time. At each time, the removed feature makes the variation of SVM-based leave-one-out error bound smallest, compared to removing other features.

The training data set {xk, yk} ∈ Rn × {−1, 1} consists of the training patterns xk and the known class labels yk, where k = 1,…, NS (NS = N1+N2 is the total number of training samples belonging to both classes). We apply the zero-order method for identifying the variable that produces the smallest value of the ranking criterion when removed, and use the weight magnitude ‖w‖² as ranking criterion, defined as

$$\left\|w^{(i)}\right\|^2=\sum_{j=1}^{N_S}\sum_{k=1}^{N_S}a_k^{(i)}a_j^{(i)}y_k y_j K^{(i)}(x_k,x_j) \qquad (1)$$

where K(i) is the Gram matrix of the training data (see Eq. 2 in section 2.5) when the variable i is removed and a(i) is the corresponding solution of the SVM classifier.
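Evaluating Eq. 1 for one candidate variable can be sketched as below. The sketch only computes the criterion: the coefficients a(i) must come from an SVM solver trained with variable i removed, which is not shown, and the function names are hypothetical. The Gaussian kernel form exp(−‖xk − xj‖²/s²) is taken from Eq. 2.

```python
import numpy as np

def gram_without_feature(X, i, s):
    """Gaussian Gram matrix K^(i) of the training data with variable i removed."""
    X_red = np.delete(np.asarray(X, dtype=float), i, axis=1)
    sq = np.sum(X_red**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X_red @ X_red.T, 0.0)
    return np.exp(-d2 / s**2)

def w2_criterion(K, y, alpha):
    """Eq. 1: sum over j, k of a_k a_j y_k y_j K(x_k, x_j)."""
    ay = np.asarray(alpha, dtype=float) * np.asarray(y, dtype=float)
    return float(ay @ K @ ay)

# Toy check with two samples and unit coefficients:
# ||w||^2 = K11 + K22 - 2*K12 = 2 - 2*exp(-d^2/s^2).
X = np.array([[0.0, 0.0], [3.0, 1.0]])
value = w2_criterion(gram_without_feature(X, 0, s=1.0), [1, -1], [1.0, 1.0])
```

In one backward-elimination step this value would be computed for every remaining variable, and the variable giving the smallest value would be removed, as stated in the text.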

3) Constrained LDA (CLDA)

For the purpose of comparison of the previously described feature selection algorithm with a commonly used dimensionality reduction method, such as LDA, we also investigated the performance of the constrained LDA (CLDA) algorithm for feature selection [34]. CLDA maximizes the discriminant capability between classes without transforming the original features, as done by traditional LDA or PCA (Principal Component Analysis). This is important because it preserves the physical meaning of the input variables and makes it possible to assess the contribution of each original variable in classification.

4) Calculating total rank by combination

Classification is performed by following a leave-one-out strategy on the training samples. For each leave-one-out experiment, feature ranking is performed using data only from the training samples. The feature selection method is implemented in each training subset in order to correct for the selection bias [35]. It is important that cross-validation is external to the feature selection process in order to more accurately estimate the prediction error. Evidently, there is no guarantee that the same subset of features will be selected at each leave-one-out experiment. We combine the rankings of all leave-one-out experiments and report the total rank of features (in Table 2) according to the frequency of a feature appearing in a specific rank. For example the top-ranked feature is assumed to be the one that more frequently has the highest ranking score, regardless of the distribution of the scores it receives across experiments. In order to assess if the features are selected consistently, we use two measures: the entropy (E) to evaluate the certainty in ranking and the standard deviation (S) to measure the compactness. These measures are normalized between 0 (no consistency) and 1 (same rank in all experiments).
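The requirement that cross-validation be external to feature selection can be made concrete with a small sketch. To keep it short and dependency-free, an absolute t-score ranking and a nearest-class-mean rule are used as stand-ins for the SVM-RFE selection and SVM classifier of the study; these substitutions, and all names, are assumptions of the example.

```python
import numpy as np

def abs_t_scores(X, y):
    a, b = X[y == 1], X[y == -1]
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * a.var(0, ddof=1) + (n2 - 1) * b.var(0, ddof=1)) / (n1 + n2 - 2)
    return np.abs(a.mean(0) - b.mean(0)) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))

def loo_with_internal_selection(X, y, n_keep):
    """Leave-one-out accuracy with normalisation and ranking redone in every fold."""
    n = len(y)
    correct = 0
    top_counts = np.zeros(X.shape[1], dtype=int)   # how often each feature ranks first
    for k in range(n):
        train = np.arange(n) != k
        Xtr, ytr = X[train], y[train]
        mu, sd = Xtr.mean(0), Xtr.std(0) + 1e-12   # statistics from training samples only
        Ztr, zte = (Xtr - mu) / sd, (X[k] - mu) / sd
        order = np.argsort(-abs_t_scores(Ztr, ytr))  # ranking never sees sample k
        keep = order[:n_keep]
        top_counts[order[0]] += 1
        # Stand-in classifier: assign to the nearest class mean.
        m_pos = Ztr[ytr == 1][:, keep].mean(0)
        m_neg = Ztr[ytr == -1][:, keep].mean(0)
        pred = 1 if np.sum((zte[keep] - m_pos)**2) < np.sum((zte[keep] - m_neg)**2) else -1
        correct += int(pred == y[k])
    return correct / n, top_counts
```

The `top_counts` vector is the kind of frequency information from which a combined rank across leave-one-out experiments can be derived.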

Table 2
Binary classification results obtained by leave-one-out cross-validation using SVM for classification and SVM-RFE for feature selection.

2.5. Classification

Classification was performed by starting with the more discriminative features and gradually adding less discriminative features, until classification performance no longer improved. Three pattern classification methods were investigated for comparison: LDA with Fisher's Discriminant Rule [36], k-Nearest Neighbor (k-NN) [37], and nonlinear SVMs [33]. In LDA, a transformation function is sought that maximizes the ratio of between-class variance to within-class variance. Since usually there is no transformation that provides complete separation, the goal is to find the transformation that minimizes the overlap of the transformed distributions. k-NN classification is performed based on closest training examples in the feature space. In SVMs, the original input space is mapped into a higher dimensional feature space in which an optimal separating hyperplane is constructed such that the distance from the hyperplane to the nearest data point is maximized. Due to this property, SVM classifiers tend to possess good generalization ability.

A critical parameter in SVM is the penalty parameter C. A larger C corresponds to assigning a higher penalty to misclassification. Since the data are unbalanced and the sample size is too small to produce balanced classes by subsampling the largest class, we used a weighted SVM [38] to apply a larger penalty to the class with the smaller number of samples. If the penalty parameter is not weighted (equal C for both classes), there is an undesirable bias towards the class with the larger training size; we therefore set the ratio of penalties for the two classes, C1 and C2, (in each binary classification) to the inverse ratio of the training class sizes [38]. The kernel function used in our SVM classifier is the Gaussian radial basis function kernel. The Gram matrix for two feature vectors xi, xj is defined as

$$K(x_i,x_j)=\exp\left(-\frac{\left\|x_i-x_j\right\|^2}{s^2}\right), \qquad (2)$$

where s controls the size of the Gaussian kernel. We defined s to be adaptive to the number of retained features (NF), using the equation s = k · NF · log(NF), where k is a constant. The aim was to include a constant fraction of the training samples in a ball in feature space of size s for any dimensionality (number of features). The constant k was determined such that the fraction of the training samples contained in the kernel is approximately 20%.
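The class-weighted penalties, the Gaussian Gram matrix and the "fraction of training samples inside the kernel" can be sketched as follows. The overall scale of the penalties, the way the fraction is measured (mean share of other samples within distance s), and the function names are assumptions; only the ratio C1/C2 = N2/N1 and the kernel form come from the text.

```python
import numpy as np

def class_penalties(y, c=1.0):
    """Penalties with C_pos / C_neg equal to the inverse ratio of class sizes."""
    n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
    n = n_pos + n_neg
    return c * n / (2.0 * n_pos), c * n / (2.0 * n_neg)

def pairwise_sq_dist(X):
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def rbf_gram(X, s):
    """Gaussian kernel K(xi, xj) = exp(-||xi - xj||^2 / s^2)."""
    return np.exp(-pairwise_sq_dist(np.asarray(X, dtype=float)) / s**2)

def fraction_within(X, s):
    """Mean fraction of the other training samples lying within distance s."""
    d = np.sqrt(pairwise_sq_dist(np.asarray(X, dtype=float)))
    within = (d <= s).sum(axis=1) - 1          # exclude the sample itself
    return float(within.mean() / (len(d) - 1))

# Metastases (24) versus gliomas (74): the smaller class gets the larger penalty.
labels = np.array([1] * 24 + [-1] * 74)
c_met, c_gli = class_penalties(labels)
```

A helper such as `fraction_within` could be used to tune the constant k so that roughly 20% of the samples fall inside the kernel for a given number of features.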

The multi-class problem was solved by constructing and combining several binary classifiers into a voting scheme. We applied majority voting from all one-versus-all binary classification problems. For assessing the predictive ability of the classification scheme we applied leave-one-out cross-validation. In the future, when more training data become available for each class, we will assess the generalization ability through 3-fold cross validation and will further optimize the classification parameters, C and s, for each classification problem by exhaustive search.
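One possible reading of majority voting over one-versus-all classifiers is sketched below. The text does not spell out the voting or tie-breaking rule, so both are assumptions here: a positive decision of classifier k counts as a vote for class k, a negative decision counts as a vote for every other class, and ties are broken by the largest decision value.

```python
import numpy as np

def one_vs_all_vote(scores):
    """Combine one-versus-all decision values (one per class) into a class index."""
    scores = np.asarray(scores, dtype=float)
    votes = np.zeros(len(scores))
    for k, s in enumerate(scores):
        if s > 0:
            votes[k] += 1          # "class k" vote
        else:
            votes += 1             # "not class k": vote for all other classes
            votes[k] -= 1
    tied = np.flatnonzero(votes == votes.max())
    return int(tied[np.argmax(scores[tied])])   # assumed tie-break rule

# Five classes, e.g. MEN, GL2, GL3, GL4, MET; only the second classifier fires.
winner = one_vs_all_vote([-1.2, 0.4, -0.3, -2.0, -0.1])
```

When no classifier gives a positive decision, all classes receive the same number of votes and the assumed tie-break selects the class whose classifier was least negative.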

3. Results

We applied leave-one-out external cross-validation in classifying meningioma (MEN), glioma of grade II, III, and IV (GL2, GL3 and GL4, respectively), and metastasis (MET) by applying three different classification methods (LDA, kNN, non-linear SVM) and two feature ranking methods (t-test with bagging, CLDA). The results are presented in Table 1. The first column in Table 1 shows the tumor type to be classified. The other columns show the number of selected features (NF) giving the highest classification accuracy, the classification accuracy (Acc) defined as the percentage of correctly classified samples, and the area under the receiver operating characteristic (ROC) curve (AUC). The feature selection step was cross-validated; however, we currently have no automated way of selecting the number of features, hence we recorded performance as a function of the number of features. The last row shows the corresponding mean values. The results show that the classification accuracy is higher when using SVM, as expected. The number of features giving highest accuracy is overall smaller when t-test is used as ranking criterion versus CLDA.

Table 1
Binary classification accuracy (Acc) and AUC obtained by leave-one-out cross-validation using different classifiers (LDA, kNN, SVM) and feature ranking methods (t-test with bagging or CLDA)

Subsequently we investigated the performance of SVM using the SVM-RFE algorithm for feature selection and assessed the contribution of each feature in classification. The results are shown in Table 2. The first ten rows show the classification between two single tumor types, similar to Table 1, whereas the last two rows show the classification between combined types: secondary versus primary neoplasms (i.e. metastases versus gliomas grade II, III, IV) and low versus high grade gliomas (grade II versus III and IV). Those two classification problems are clinically relevant since treatment is usually adapted accordingly. Meningiomas are not included in the combined classification problems because they differ from the glial tumors and metastases in both origin and behavior.

Table 2 also shows the top ranked features (thresholded based on t-statistic and ranked by the SVM-RFE algorithm, as described in section 2.4) for each classification task. Not all selected features are shown in the third column, but only the overall top ranked calculated by combining all leave-one-out results. The notation used for these features is described analytically in section 2.3. The subsequent columns show the sensitivity, specificity, accuracy and AUC values, respectively.
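The reported performance measures can be computed from the leave-one-out predictions as sketched below. Labels are assumed to be coded +1 (positive class) and −1; the AUC is computed with the rank (Mann-Whitney) formulation, which is one standard way to obtain it and not necessarily the one used for the tables.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for labels in {+1, -1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)

def auc_rank(y_true, scores):
    """AUC as the probability that a positive sample scores above a negative one."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == -1]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

sens, spec, acc = binary_metrics([1, 1, 1, -1, -1], [1, 1, -1, -1, 1])
auc = auc_rank([1, 1, 1, -1, -1], [0.9, 0.8, 0.3, 0.1, 0.4])
```

With unbalanced classes such as those in this study, sensitivity and specificity are more informative than accuracy alone, which is why all three are reported.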

The results of Table 2 show that perfusion is an important MR imaging technique for most classification tasks. The enhancing portion in T1ce appears as a significant parameter in the distinction of meningioma from gliomas of all grades; specifically, in the case of glioblastoma, the enhancing portion in T1ce is selected as a single feature and achieves 97.4% accuracy. A combination of multi-parametric features, including T2 and T1 pre-contrast imaging characteristics, is used for classifying primary glioma versus metastasis (with accuracy 84.7%) and low versus high grade glioma (with accuracy 87.8%). Gabor texture also seems to be a relevant pattern. It is interesting to note that usually the large wavelengths (λ = 8 or 8√2) appear as part of the selected patterns.

For most binary classification pairs, the number of features selected with SVM-RFE that gives the highest classification accuracy is relatively small, which means that the chance of overfitting is reduced and the generalization ability is improved. This feature reduction is an important advantage of SVM-RFE over the simple t-test. Although the accuracy over all classification tasks is on average almost the same when applying SVM-RFE as a second step after the t-test, the average number of selected features over all classification tasks is reduced by 33% (NF = 30 in the case of the t-test and NF = 20 in the case of SVM-RFE). Fig. 3 and Fig. 4 (1st row) show the classification accuracy with increasing number of retained features. The plots illustrate that the fluctuations of accuracy around the optimal number of features are small and that the method is not very sensitive to the exact number of selected features, except in the case of metastasis versus GBM, where the plateau is quite narrow. In general, the feature selection method eliminates redundant features, reduces noise, and builds groupings that are both robust and accurate.
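The elimination loop behind SVM-RFE can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of section 2.4: it uses a linear SVM with the squared-weight ranking criterion of Guyon et al. [32], a plain subgradient solver instead of LIBSVM, no t-test pre-filter, and a small hand-made dataset. The function names and hyperparameters (`epochs`, `lr`, `lam`) are hypothetical.

```python
def train_linear_svm(X, y, features, epochs=200, lr=0.01, lam=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    Labels y must be coded -1 / +1. Returns (weights by feature index, bias)."""
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in features}
        grad_b = 0.0
        for row, label in zip(X, y):
            margin = label * (sum(w[j] * row[j] for j in features) + b)
            if margin < 1:  # only margin violators contribute to the hinge loss
                for j in features:
                    grad_w[j] -= label * row[j] / n
                grad_b -= label / n
        for j in features:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y):
    """Recursive feature elimination: retrain, drop the feature with the
    smallest squared weight, repeat. Returns indices from most to least
    important (the last feature eliminated is ranked first)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        w, _ = train_linear_svm(X, y, remaining)
        worst = min(remaining, key=lambda j: w[j] ** 2)
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]

# Toy data: feature 0 is strongly discriminative, feature 1 weakly,
# feature 2 carries no class information.
pos = [[2.0, 0.5, 0.3], [2.5, -0.2, -0.3], [1.5, 0.6, -0.3], [2.0, 0.3, 0.3]]
X = pos + [[-a, -b, c] for a, b, c in pos]
y = [1] * 4 + [-1] * 4

ranking = svm_rfe(X, y)
print(ranking)
```

Because the classifier is retrained after every removal, the weight of a feature reflects its contribution given the features that are still present, which is how the procedure can discard redundant features that a univariate t-test would keep.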

Fig. 3
Classification accuracy (with SVM-RFE) versus number of retained features for classification of metastasis versus gliomas grade II, III, or IV (1st row) and glioma grading (2nd row).
Fig. 4
Classification accuracy (with SVM-RFE) versus number of retained features (1st row) and ROC analysis (2nd row) for two main classification problems: metastases versus primary gliomas (grade II, III, IV) shown in the 1st column and low versus high grade ...

Accuracy is relatively high for all classification pairs except for grade II versus grade III glioma. These two tumor types share characteristics due to their mixed histology and are difficult to differentiate. On the other hand, the accuracy in distinguishing metastasis from primary glioma and low from high grade glioma is high, as illustrated in Fig. 4.

Consistency in feature selection during cross-validation

To assess whether the same features are selected as discriminant features in each leave-one-out experiment, we calculated the normalized entropy (E) and standard deviation (S) of the ranking scores for each feature. Averaging E and S across all selected features showed that the highest consistency was observed when distinguishing meningioma from GBM (Ē = 0.95, S̄ = 0.99, NF = 1), as well as from gliomas grade II (Ē = 0.82, S̄ = 0.92, NF = 9) and metastasis (Ē = 0.71, S̄ = 0.92, NF = 6).

Multi-class

For the multi-class problem, one-versus-all SVM classification with majority voting is applied. For each one-versus-all classification task, feature selection is performed using SVM-RFE. Since the number of classifications to be performed in the multi-class problem is large, feature selection with RFE on the total number of features (161) is computationally very expensive. For this reason, we excluded from the evaluation all features that were never (or only once) selected as significant features across the binary classification tasks (Table 2). Accordingly, 50 features were retained and used for feature selection. The confusion matrix obtained with leave-one-out cross-validation is shown in Table 3. Meningiomas were not included due to their small sample size and limited clinical value. The highest classification accuracy is achieved for metastasis, where only 2 (out of 24) samples are misclassified (1 as glioma grade III and 1 as GBM), and for low grade tumors, with 2 (out of 22) cases being misclassified (1 as glioma grade III and 1 as metastasis).

Table 3
Multi-class confusion matrix obtained with one-versus-all voting scheme using SVM classifiers with RFE.
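A one-versus-all scheme and the resulting confusion matrix can be sketched as below. How votes and ties are resolved is not detailed in this section, so the sketch makes an assumption: each sample is assigned to the class whose one-versus-all classifier returns the largest decision value. The decision values are invented and the function names are hypothetical.

```python
CLASSES = ["GL2", "GL3", "GL4", "MET"]

def one_vs_all_predict(decision_values):
    """decision_values maps each class name to the signed output of its
    one-versus-all classifier; the class with the largest output wins."""
    return max(CLASSES, key=lambda c: decision_values[c])

def confusion_matrix(true_labels, predicted_labels):
    """Nested dict: matrix[true_class][predicted_class] = number of samples."""
    matrix = {t: {p: 0 for p in CLASSES} for t in CLASSES}
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

# Hypothetical classifier outputs for four held-out cases.
true_labels = ["GL2", "GL3", "GL4", "MET"]
outputs = [
    {"GL2": 1.2, "GL3": -0.3, "GL4": -1.0, "MET": -0.8},
    {"GL2": 0.4, "GL3": 0.1, "GL4": -0.5, "MET": -0.9},  # GL3 case won by GL2
    {"GL2": -1.1, "GL3": -0.2, "GL4": 0.7, "MET": 0.3},
    {"GL2": -0.9, "GL3": -0.6, "GL4": -0.1, "MET": 0.9},
]

predictions = [one_vs_all_predict(o) for o in outputs]
matrix = confusion_matrix(true_labels, predictions)
print(predictions)
print(matrix["GL3"])
```

In a leave-one-out setting, each entry of `outputs` would come from classifiers trained without the corresponding case, and the matrix would accumulate one count per held-out sample.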

4. Discussion and Conclusions

This paper presents a classification scheme for differentiating adult brain tumors using conventional MRI and rCBV maps calculated from perfusion MRI. Shape characteristics, image intensity statistics, and rotation-invariant Gabor texture features are extracted from the central and marginal tumoral, edematous, and necrotic regions. The scheme is automated except for the tracing of the ROIs, which requires an expert. Overall, we found that SVM-based classification of texture patterns is a very promising approach to developing an objective and quantitative evaluation of brain tumors. However, larger datasets need to be analyzed, both to test the generalization ability of this approach and to further improve its performance, since such classification systems perform better when trained more extensively.

The results of multi-class classification (Table 3) illustrate that the highest classification accuracy is achieved for metastasis (91.7%) and low grade glioma (90.9%), whereas the classification accuracy for GBM is reduced (29.4% are classified as grade III and 29.4% as metastasis). The lowest classification rate in the multi-class problem is for grade III glioma, where the largest portion (44.4%) is classified as grade II and smaller portions as GBM (11.1%) or metastasis (11.1%). The prediction of glioma grade is inherently difficult since brain neoplasms are often heterogeneous, meaning that different histopathologic features (such as mitotic index) can be present throughout an individual neoplasm. Therefore, part of our validation relies on examining the distribution of the false positives. The results of Table 3 show that most of the misclassified samples are assigned to a class with a similar degree of malignancy (grade). The degree of malignancy increases from the less malignant tumors (such as meningiomas) to the high grade gliomas and metastases. The failure of the method to classify grade III gliomas possibly indicates that their extracted features do not form a separate cluster but are rather similar to the features of the neighboring classes (grade II and grade IV). The binary classification tasks, where only tumors from two single types are compared, exhibit higher classification accuracy (mean = 91%, standard deviation = 7.7% over all tasks). The highest accuracy is achieved when distinguishing grade II glioma from metastasis (97.8%) and the lowest (75%) when distinguishing grade II from grade III glioma.
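The per-class rates quoted for the multi-class problem follow from row-normalizing the confusion matrix, as the short check below illustrates. The metastasis and grade II counts are those stated in the Results; the grade III counts are inferred here from the reported percentages of 18 cases, assuming all cases fall into the four classes. The GBM row is left out because the text does not state its number of correctly classified cases.

```python
# Rows: true class; keys within a row: predicted class.
# MET and GL2 counts are stated in the text; GL3 counts are inferred from the
# reported percentages (an assumption); the GBM row is intentionally omitted.
rows = {
    "MET": {"GL2": 0, "GL3": 1, "GL4": 1, "MET": 22},
    "GL2": {"GL2": 20, "GL3": 1, "GL4": 0, "MET": 1},
    "GL3": {"GL2": 8, "GL3": 6, "GL4": 2, "MET": 2},
}

def row_percentages(counts):
    """Percentage of a true class assigned to each predicted class."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}

percentages = {name: row_percentages(counts) for name, counts in rows.items()}
for name, pct in percentages.items():
    print(name, pct)
```

Under these counts, one third of the grade III cases would be correctly classified, which is consistent with grade III being the class with the lowest classification rate.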

Generally, the classification accuracy of the proposed method is comparable to or higher than that in other studies [6] not using spectroscopy or DTI. The results in [21] using imaging intensities from standard MR alone cannot be directly compared with ours, because (i) they were based on ROIs extracted from spectroscopy imaging, which we want to avoid in this study, and (ii) they reflected the number of voxels correctly classified rather than the number of subjects (voxels from the same image were used as independent samples). In [19], an SVM-based classification system with a radial basis function kernel achieved 74.4% overall accuracy in discriminating primary brain tumors from metastases using the external cross-validation method, whereas an Artificial Neural Network classifier performed better (80% accuracy). The features employed in that study were solely textural features from the T1ce MR images. Our analysis, which achieved 84.7% accuracy in leave-one-out cross-validation, also showed that Gabor textural features from T1ce are important for this classification task when combined with statistical parameters (mean, variance, etc.) from other imaging sequences (T2, rCBV). Moreover, our results in glioma grading (low versus high grade) are comparable with the ones reported in [20], although the accuracy in [20] was assessed by cross-validation that was not external to the feature-selection process. That study was based on descriptive features estimated by domain experts, whereas we applied features extracted in a semi-automatic way.

The feature subset selection method shows that all MR sequences are important for classification, since different features are selected for different classification tasks (Table 2). The rCBV maps calculated from perfusion seem to be particularly important, since parameters extracted from them are top-ranked in most classification pairs. Contrast-enhanced T1 is also a significant sequence; the volume of enhancement as a percentage of total tumor volume (venh) is used as a single feature for distinguishing between meningioma and GBM and is also among the top-ranked features for other classification tasks. Texture calculated on specific frequencies also contributes to tumor classification. Textural parameters extracted from the edematous area in FLAIR seem to be significant for glioma grading, whereas texture in the neoplastic area in T1ce is important for distinguishing metastatic from glial tumors.

The features currently used for classification are extracted from the imaging profile and describe shape and texture. We have not incorporated features describing the deformation of the healthy structure due to tumor growth. It is known that different tumor types affect the surrounding healthy region differently and therefore studies have used the tumor mass effect as a descriptor for classifying gliomas according to their clinical grade [20] or as an independent predictor of survival [39]. For this purpose we recently proposed an automated method for quantifying the mass effect [40] by measuring how much the deformation in the tumor origin deviates from the range observed in a normal population. We plan to incorporate the indicator of mass effect as an additional parameter in our classification framework in the future.

Moreover, a limitation of this framework comes from the need for tracing ROIs, which makes the current approach semi-automatic and subject to intra- and inter-observer variability. However, it should also be noted that most of the applied features are based on spatial averaging and are therefore not very sensitive to small differences in the delineation of ROIs. The most sensitive features are expected to be the ones calculated over the marginal area of the ROIs. We have previously investigated the use of conventional and advanced MR imaging for automatic segmentation of neoplastic and healthy tissue [41]. In the future, we plan to combine the two frameworks in order to automatically segment and classify brain neoplasms.

The aim of this study is to assess the discrimination ability of standard MR imaging, which is usually acquired in most clinical facilities. The use of imaging intensities from standard MR alone reaches lower performance than when combined with spectroscopy [21]. In the future, we also plan to incorporate DTI and spectroscopy in our analysis to assess the increase in classification accuracy. Other data based on microscopy imaging or histopathological exams could also be included to increase prediction accuracy. For example, nuclear features extracted from segmented nuclei [42] or blood vessel patterns [43] can assist malignancy grading of brain astrocytomas, and the spatial organization of tumor vessels can be indicative for differentiating medulloblastomas from supratentorial PNETs [44]. Voxel-wise histopathological parameters that reflect proliferation and protein synthesis, or PET imaging, could also be used, if available, to extend the current framework to output spatial maps of malignancy (instead of a single grade), providing more reliable differentiation of heterogeneous neoplasms.

Footnotes

1. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

References

1. Aronen HJ, Gazit IE, Louis DN, Buchbinder BR, Pardo FS, Weisskoff RM, Harsh GR, Cosgrove GR, Halpern EF, Hochberg FH, Rosen BR. Cerebral Blood Volume Maps of Gliomas: Comparison with tumor grade and histological findings. Radiology. 1994;191:41–51. [PubMed]
2. Krabbe K, Gideon P, Wagn P, Hansen U, Thomsen C, Madsen F. MR diffusion imaging of human intracranial tumors. Neuroradiology. 1997;39:483–489. [PubMed]
3. Provenzale JM, Mukundan S, Barboriak DP. Diffusion-weighted and Perfusion MR Imaging for Brain Tumor Characterization and Assessment of Treatment Response. Radiology. 2006;239:632–649. [PubMed]
4. Wolde H, Pruim J, Mastik MF, Koudstaal J, Molenaar WM. Proliferative Activity in Human Brain Tumors: Comparison of Histopathology and L-[1-11C]Tyrosine PET. J Nucl Med. 1997;38:1369–1374. [PubMed]
5. Young RJ, Knopp EA. Brain MRI: Tumor Evaluation. Journal of Magnetic Resonance Imaging. 2006;24:709–724. [PubMed]
6. Lev MH, Ozsunar Y, Henson JW, Rasheed AA, Barest GD, Harsh GR, Fitzek MM, Chiocca EA, Rabinov JD, Csavoy AN, Rosen BR, Hochberg FH, Schaefer PW, Gonzalez RG. Glial Tumor Grading and Outcome Prediction Using Dynamic Spin-Echo MR Susceptibility Mapping Compared with Conventional Contrast-Enhanced MR: Confounding Effect of Elevated rCBV of Oligodendrogliomas. Am J Neuroradiol. 2004;25:214–221. [PubMed]
7. Kremer S, Grand S, Remy C, Esteve F, Lefournier V, Pasquier B, Hoffmann D, Benabid AL, Le Bas JF. Cerebral blood volume mapping by MR imaging in the initial evaluation of brain tumors. Journal of Neuroradiology. 2002;29:105–113. [PubMed]
8. Provenzale JM, Mukundan S, Barboriak DP. Diffusion-weighted and Perfusion MR Imaging for Brain Tumor Characterization and Assessment of Treatment Response. Radiology. 2006;239:632–649. [PubMed]
9. Wang Q, Liacouras EK, Miranda E, Kanamala US, Megalooikonomou V. Classification of brain tumors using MRI and MRS. Proceedings of the SPIE Conference on Medical Imaging. 2007
10. Weber MA, Zoubaa S, Schlieter M, Jüttler E, Huttner HB, Geletneky K, Ittrich C, Lichy MP, Kroll A, Debus J, Giesel FL, Hartmann M, Essig M. Diagnostic performance of spectroscopic and perfusion MRI for distinction of brain tumors. Neurology. 2006;66:1899–1906. [PubMed]
11. Cho YD, Choi GH, Lee SP, Kim JK. 1H-MRS metabolic patterns for distinguishing between meningiomas and other brain tumors. Magn Reson Imaging. 2003;21:663–672. [PubMed]
12. Higano S, Yun X, Kumabe T, Watanabe M, Mugikura S, Umetsu A, Sato A, Yamada T, Takahashi S. Malignant astrocytic tumors: clinical importance of apparent diffusion coefficient in prediction of grade and prognosis. Radiology. 2006;241:839–846. [PubMed]
13. Al-Okaili RN, Krejza J, Woo JH, Wolf RL, O'Rourke DM, Judy KD, Poptani H, Melhem ER. Intraaxial Brain Masses: MR Imaging–based Diagnostic Strategy—Initial Experience. Radiology. 2007;243:539–550. [PubMed]
14. Tate AR, Majós C, Moreno A, Howe FA, Griffiths JR, Arús C. Automated classification of short echo time in in vivo 1H brain tumor spectra: a multicenter study. Magn Reson Med. 2003;49:29–36. [PubMed]
15. Majós C, Julià-Sapé M, Alonso J, Serrallonga M, Aguilera C, Acebes JJ, Arús C, Gili J. Brain tumor classification by proton MR spectroscopy: comparison of diagnostic accuracy at short and long TE. Am J Neuroradiol. 2004;25:1696–1704. [PubMed]
16. Huang Y, Lisboa PJG, El-Deredy W. Tumour grading from magnetic resonance spectroscopy: a comparison of feature extraction with variable selection. Statist Med. 2003;22:147–164. [PubMed]
17. Emblem KE, Zoellner FG, Tennoe B, Nedregaard B, Nome T, Due-Tonnessen P, Hald JK, Scheie D, Bjornerud A. Predictive Modeling in Glioma Grading From MR Perfusion Images Using Support Vector Machines. Magnetic Resonance in Medicine. 2008;60:945–952. [PubMed]
18. Lu C, Devos A, Suykens JAK, Arus C, Van Huffel S. Bagging linear sparse Bayesian learning models for variable selection in cancer diagnosis. IEEE Transactions on Information Technology in Biomedicine. 2007;11:338–346. [PubMed]
19. Georgiadis P, Cavouras D, Kalatzis I, Daskalakis A, Kagadis GC, Sifaki K, Malamas M, Nikiforidis G, Solomou E. Improving brain tumor characterization on MRI by probabilistic neural networks and non-linear transformation of textural features. Computer Methods and Programs in Biomedicine. 2008;89:24–32. [PubMed]
20. Li G, Yang J, Ye C, Geng D. Degree prediction of malignancy in brain glioma using support vector machines. Computers in Biology and Medicine. 2006;36:313–325. [PubMed]
21. Devos A, Simonetti AW, van der Graaf M, Lukas L, Suykens JA, Vanhamme L, Buydens LM, Heerschap A, Van Huffel S. The use of multivariate MR imaging intensities versus metabolic data from MR spectroscopic imaging for brain tumour classification. J Magn Reson. 2005;173:218–228. [PubMed]
22. Tan TN. Rotation invariant texture features and their use in automatic script identification. IEEE Trans PAMI. 1998;20:751–756.
23. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang Y, De Stefano N, Brady JM, Matthews PM. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage. 2004;23 Suppl 1:S208–219. [PubMed]
24. Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Medical Image Analysis. 2001;5:143–156. [PubMed]
25. Smith SM. BET: Brain Extraction Tool. FMRIB technical report TR00SMS26
26. Daugman JG. Uncertainty relations for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America. 1985;A2:1160–1169. [PubMed]
27. Kruizinga P, Petkov N. Non-linear operator for oriented texture. IEEE Trans on Image Processing. 1999;8:1395–1407. [PubMed]
28. Mayerhoefer ME, Szomolanyi P, Jirak D, Materka A, Trattnig S. Effects of MRI acquisition parameter variations and protocol heterogeneity on the results of texture analysis and pattern discrimination: an application-oriented study. Med Phys. 2009;36:1236–43. [PubMed]
29. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3:1157–1182.
30. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press; 1992. p. 616.
31. Breiman L. Bagging predictors. Machine Learning. 1996;24:123–140.
32. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.
33. Rakotomamonjy A. Variable selection using SVM-based criteria. J Mach Learn Res. 2003;3:1357–1370.
34. Zifeng C, Baowen X, Weifeng Z, Dawei J, Junling X. CLDA: Feature Selection for Text Categorization Based on Constrained LDA. International Conference on Semantic Computing. 2007 Sept 17-19;:702–712.
35. Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA. 2002;99:6562–6566. [PubMed]
36. Lachenbruch PA. Discriminant Analysis. Hafner Press; 1975.
37. Dasarathy BV, editor. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. 1991.
38. Huang YM, Du SX. Weighted support vector machine for classification with uneven training class sizes. Proceedings of 2005 International Conference on Machine Learning and Cybernetics. 2005;7:4365–4369.
39. Lacroix M, Abi-Said D, Fourney DR, Gokaslan ZL, Shi W, DeMonte F, Lang FF, McCutcheon IE, Hassenbusch SJ, Holland E, Hess K, Michael C, Miller D, Sawaya R. A multivariate analysis of 416 patients with glioblastoma multiforme: prognosis, extent of resection, and survival. J Neurosurg. 2001;95:190–198. [PubMed]
40. Zacharaki EI, Hogea CS, Shen D, Biros G, Davatzikos C. Non-diffeomorphic registration of brain tumor images by simulating tissue loss and tumor growth. Neuroimage. 2009;46:762–774. [PMC free article] [PubMed]
41. Verma R, Zacharaki EI, Ou Y, Cai H, Chawla S, Wolf R, Lee SK, Melhem ER, Davatzikos C. Multi-parametric tissue characterization of brain neoplasms and their recurrence using pattern classification of MR images. Academic Radiology. 2008;15:966–977. [PMC free article] [PubMed]
42. Glotsos D, Tohka J, Ravazoula P, Cavouras D, Nikiforidis G. Automated diagnosis of brain tumours astrocytomas using probabilistic neural network clustering and support vector machines. International Journal of Neural Systems. 2005;15:1–11. [PubMed]
43. Selby DM, Woodard CA, Lakshman-Henry M, Bernstein JJ. Are endothelial cell patterns of astrocytomas indicative of grade? In Vivo. 1997;11:371–375. [PubMed]
44. Goldbrunner RH, Pietsch T, Vince GH, Bernstein JJ, Wagner S, Hageman H, Selby DM, Krauss J, Soerensen N, Tonn JC. Different vascular patterns of medulloblastoma and supratentorial primitive neuroectodermal tumors. Int J Dev Neurosci. 1999;17:593–599. [PubMed]