PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptNIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Med Phys. Author manuscript; available in PMC Apr 15, 2009.
Published in final edited form as:
Med Phys. Apr 2008; 35(4): 1377–1386.
doi:  10.1118/1.2870218
PMCID: PMC2669284
NIHMSID: NIHMS101228
Improved classifier for computer-aided polyp detection in CT Colonography by nonlinear dimensionality reduction
Shijun Wang, Jianhua Yao, and Ronald M. Summers*
Diagnostic Radiology Department, National Institutes of Health Clinical Center Building 10 Room 1C368X MSC 1182, Bethesda, MD 20892-1182 USA
*Corresponding author: Ronald M. Summers (rms/at/nih.gov)
Computer-aided detection (CAD) has been shown to be feasible for polyp detection on computed tomography (CT) scans. After initial detection, the data set of colonic polyp candidates has large-scale and high dimensional characteristics. In this paper, we propose a nonlinear dimensionality reduction method DMLLE based on Diffusion Map and Locally Linear Embedding for large-scale datasets. By selecting partial data as landmarks, we first map these points into a low dimensional embedding space using Diffusion Map. The embedded landmarks can be viewed as a skeleton of whole data in the low dimensional space. Then by using Locally Linear Embedding algorithm, non-landmark samples are mapped into the same low dimensional space according to their nearest landmark samples. The local geometry is preserved in both the original high dimensional space and the embedding space. In addition, DMLLE provides a faithful representation of the original high-dimensional data at coarse and fine scales. Thus, it can capture the intrinsic distance relationship between samples and reduce the influence of noisy features, two aspects that are crucial to achieving high classifier performance. We applied the proposed DMLLE method to a colonic polyp dataset of 175269 polyp candidates with 155 features. Visual inspection shows that true polyps with similar shapes are mapped to close vicinity in the low dimensional space. We compared the performance of a Support Vector Machine (SVM) classifier in the low dimensional embedding space with that in the original high dimensional space, SVM with principal component analysis dimensionality reduction and SVM committee using feature selection technology. Free-response receiver operating characteristic analysis shows that by using our DMLLE dimensionality reduction method, SVM achieves higher sensitivity with lower false positive rate compared with other methods. For 6-9 mm polyps (193 true polyps contained in test set), when the number of false positives per patient is 9, SVM with DMLLE improves the average sensitivity from 70% to 83% compared with that of an SVM committee classifier which is a state-of-the-art method for colonic polyp detection (p < 0.001).
Keywords: Computed Tomographic Colonography, Computer-aided Detection, Dimensionality reduction, Diffusion Map, Locally Linear Embedding, RFOC analysis
Colon cancer is the second leading cause of cancer-related deaths in the United States. Optical colonoscopy (OC) is a traditional and effective method for colonic polyp detection. However, OC is invasive and requires sedation of patients to reduce discomfort. Computed tomographic colonography (CTC), also known as virtual colonoscopy (VC), is an imaging technique based on X-ray penetration and is less invasive than optical colonoscopy. When CTC is performed in conjunction with computer aided detection (CAD) software, the screening becomes easier on the patient, less time-consuming, and more accurate for the radiologist [1-3].
Computer aided detection (CAD) systems to detect polyps using CTC have been under investigation in the past decades. Most of these systems used features computed from the original CT data set, such as surface curvature [18], volumetric cluster [21], edge displacement fields [20], surface patch [19], deformable models [15], surface normal overlap [16] and wavelet texture features from 2D endoluminal projection images [17], etc.
Shape analysis plays an important role in the detection of colonic polyps. Colonic polyps appear as bulbous protrusions that adhere to the inner wall of the colon or on the folds which have elongated, ridgelike structures. It is possible to extract more than 150 features from each polyp candidate. However, the high dimension (number of features) is an obstacle to the processing of the data. For example, it will bring us a lot of trouble when we approximate functions or estimate density of data in high dimensional space. It is called the curse of dimensionality. For colonic polyp detection problem, high dimensionality will make it hard to learn a good decision boundary or function in the space where the data reside. Especially when the data are noisy, high dimensions may cause highly biased estimates of the distance relationship of samples and thereby reduce the accuracy of predictions. Thus, a point of interest is finding a compact representation of polyp candidates in a low dimensional space. Classical techniques for dimensionality reduction, such as Principal Components Analysis (PCA) are designed for data whose submanifold is embedded linearly or almost linearly in the observation space [24]. Because many data from real applications, such as visual perception [4], have nonlinear submanifold structures, there is a surge in the research of Nonlinear Dimensionality Reduction (NLDR) in recent years. The representative methods of NLDR include local approaches Locally Linear Embedding (LLE) [5], Laplacian Eigenmaps [6] and global approaches ISOMAP [4], Diffusion Map [7-9], etc. In these nonlinear methods, local methods try to preserve the local geometry of the data in low-dimensional space; global approaches tend to give a more faithful representation of the data's global structure.
Here we attempt to explore, by dimensional reduction, the extraction of useful information from polyp candidates. Learning and testing are performed in a low dimensional space, and thus the classifier can be more effectively trained, as compared with that in the original high dimensional space. The outline of the paper is as follows: In section 2 we introduce the pipeline of our CAD system. Section 3 proposes a nonlinear dimensionality reduction method called Diffusion Map Locally Linear Embedding (DMLLE) for large scale datasets. We also discuss the relationship between DMLLE and Nystrom extension \cite{Fowlkes04,Lafon06b,Bengio04} in this section. Section 4 shows the experimental results and analysis of DMLLE on a large-scale colonic polyp dataset. Finally, section 5 concludes with a short summary.
Our CTC CAD protocol consists of four main parts. Firstly, when the CAD software reads a CT scan, it automatically identifies the colon wall and lumen based on a region growing and iso-surface technique [18]. This step is called colon segmentation. Secondly, for each point on the colon surface, geometric and local volumetric properties are analyzed and filtered. The specifications for the filter are elliptical curvature of the peak subtype, mean curvature range −2.5 to −0.7 cm−1, ≥8 vertices, ≥ 0.25 cm diameter, and sphericity < 1.2. The filtered surface vertices are then clustered based on connectivity. The centroid of each cluster is used as a polyp candidate [18]. The candidates are then passed to a segmentor where the detection boundaries are delineated. The segmentation is based on a combination of knowledge-guided intensity adjustment, fuzzy clustering and adaptive deformable model [15]. The third step is feature extraction, where characteristic features, such as shape and texture, are calculated from the segmented detection and its surrounding regions. Finally, the fourth provision step is discrimination, whereby the extracted features are subjected to learning algorithms [13]. These algorithms are applied to sets of training data with known labels of each polyp candidate. The learned classifier then predicts the label of each new polyp candidate. Experienced operators manually traced the boundaries of all polyps in our database and used them as the reference standard in our training and validation process.
To apply the dimensionality reduction method in the CAD system and make it practical, two major problems need to be considered. The first one is the large-scale problem. Our dataset contains CT scans of more than 1200 patients. After preprocessing, we get more than 160000 polyp candidates altogether. How to deal with such a large-scale dataset is a big challenge for current dimensionality reduction methods. The second one is the noise problem. For the complex environment inside the colon and the artifacts induced by segmentation, the features extracted from a polyp candidate contain much noise. In addition, for clinical application, we focus on the detection of small polyps whose size is in the range from 6 mm to 9 mm. Because the resolution of 3D CT images is limited, the features of small polyp candidates are not as accurate as that of bigger polyp candidates. So we should consider the noise resistance ability of dimension reduction methods. For example, if two sub-manifolds are linked by noisy samples in original high dimension space, then the performance of ISOMAP algorithm [4] will drop because the geodesic distances between samples distributed on these two sub-manifolds will be close. The same is true for the LLE method [5].
3.1 Construct global skeleton in low dimensional space by using Diffusion Map
R.R. Coifman et al. proposed a unified framework for dimensionality reduction, graph partitioning and data set parameterization based on the Markov random walk on the data [7-9]. The method called Diffusion Map is noise resistant because the calculation of distance between two samples considers all paths between them. So noisy samples will have little influence on the calculation of similarities between samples.
Assume that a system contains n samples X = {x1,x2,…,xn}. The weight function w(x, y) depicts the distance relationship between samples x and y. The transition probability of going from node x to y in one step is p1(x, y) = w(x, y)/d(x), where d(x) = Σz[sm epsilon]X w(x, z) is called the degree of node x. The one step Markov transition matrix P has a set of right eigenvectors corresponding eigenvalues 1 = |λ0|≥|λ1|≥ … ≥|λn−1|: j = λjψj. The diffusion map is defined as equation M1, where t represents the steps of Markov random walk on the graph and q(t) is the number of dimensions of embedding space.
In the Diffusion Map algorithm [9], we need to solve an eigenvalue decomposition problem in which the size of the Markov probability transition matrix is n×n. For real large-scale application, it would be prohibitive to do the eigenvalue decomposition task. For example, if an application has 100,000 samples, then the Markov probability transition matrix will need about 80 Giga bytes of memory space. The eigenvalue decomposition cannot be done in acceptable time by using current mainstream PC servers. So how to extend the Diffusion Map algorithm and make it applicable for large-scale data are critical for some real applications.
Inspired by the idea of landmarks in the ISOMAP algorithm [4], here we consider just randomly selecting partial data and embedding them into a lower dimensional space by using Diffusion Map. These selected samples are viewed as landmarks. The number of landmarks is chosen according to the capability of eigenvalue decomposition of our computation resource. As we know, Diffusion Map uses multi-step Markov transition probability as the measure of distance between any two points. So it can preserve the global geometry structure of original dataset in low dimensional space and be viewed as a global dimension reduction method. By using Diffusion Map just for these landmarks, we can construct a global skeleton of the whole data in the low dimensional space.
3.2 Embed non-landmark samples by using Locally Linear Embedding
With the skeleton in low dimension, now the problem is how to embed other samples (non-landmarks) into the same low dimensional space. As we mentioned above, local dimension reduction methods try to keep the local distance relationship of data in embedded space. Locally Linear Embedding (LLE) is such a typical algorithm [5]. The LLE algorithm contains three steps:
  • Step 1 (discovering neighbors). For each xi, find the K nearest neighbors in the original high dimensional data set.
  • Step 2 (constructing the approximation matrix). Compute the weights Wij that best reconstruct xi from its nearest neighbors. Wij should minimize the following reconstruction error: equation M2 with the constraints Σj Wij = 1 for each i.
  • Step 3 (embedding to low dimensional space). Compute the embedding by choosing ψi which minimizes equation M3 where ψij, j = 1,… K are embeddings of the K nearest neighbors of sample i in the low dimensional space.
Many proposed dimensionality reduction methods are related to eigenvalue decomposition problems. The Nystrom method provides a way to connect the eigenvalue decomposition problem with eigenfuction problems which can be used to embed out-of-sample examples [10-12]. Given a Hilbert space Hp of functions, consider the following eigenfunction problems: equation M4. By using training samples xi, i = 1, 2,…, n, we can approximate this integral as equation M5, where [phi with hat](x) is an approximation to the true eigenfunction ϕ(x). Assuming λ1, λ2,…, λn and ψ1, ψ2,…, ψn are eigenvalues and corresponding eigenvectors of the kernel matrix K of samples xi, i = 1, 2,…, n. The Nystrom extension for each eigenfunction [phi with hat]i is defined as: equation M6. Based on the Nystrom extension, for dimension reduction problems, we can embed a new sample into the low dimensional space by using a kernel function K and calculated eigenvectors ψ1, ψ2,…, ψn. So Nystrom extension provides another way to embedding non-landmark samples to the low dimensional space. We call it DM Nystrom method [11].
Then, what is the relationship between LLE and the Nystrom extension? It seems that LLE is a local interpolation method and the Nystrom extension is a global interpolation method for inference embedding of a new coming sample based on eigenvectors calculated from training samples. As pointed by Y. Bengio et al. [12], by introducing an extra parameter μ and defining the kernel matrix as equation M7, the embedding for a new sample x in LLE algorithm also follows Nystrom extension mechanism in essence. It can be viewed as a variation of basic Nystrom extension.
Borrowed from the idea of LLE, by using the embeddings of landmarks, we can embed non-landmarks to the low dimensional space and maintain the local geometry relationship at the same time. Each non-landmark sample in embedding space is constructed by using K nearest landmark embeddings. The locally linear weight matrix W is computed in the original high dimensional space.
3.3 Framework of DMLLE algorithm
Based on the above introduction, the framework of the proposed large-scale dimensionality reduction method is as follows. Because it is based on Diffusion Map algorithm and LLE, we call it Diffusion Map Locally Linear Embedding (DMLLE) algorithm.
  • Step 1. Randomly select partial samples as landmarks from whole data and calculate the Markov probability transition matrix of them.
  • Step 2. Do eigenvalue decomposition and construct the low dimensional embeddings of the landmarks by using Diffusion Map.
  • Step 3. Compute the weights Wij that best reconstruct non-landmark point xi from its nearest landmarks in original high dimensional space. Wij should minimize the following reconstruction error: equation M8 with the constraints Σi Wij = 1 for each i.
  • Step 4. Compute the low dimensional embedding of non-landmark point xi by choosing ψi which minimizes equation M9 where j is among the K-nearest landmarks of i.
By using dimension reduction, we hope we can capture the intrinsic distance relationship of samples located on low-dimensional manifold. That would be very helpful for the subsequent supervised learning task. To evaluate the effectiveness of our method for the detection of colonic polyps, we compared the performance of support vector machine (SVM) [22] using DMLLE with that of SVM using PCA dimension reduction method and SVM using all features. In addition, recent research shows that feature selection provides another feasible way to select most informative features from colonic polyp dataset [13][14]. So we also compared the performance of SVM after DMLLE dimension reduction with that of an SVM committee in which the features used by each SVM classifier are a subset of whole features from feature selection [13]. In next section, we will show the comparison results.
4.1 CT Colonography Data
To evaluate our method, we applied DMLLE to a CT colonography dataset that contains the CT scans of 1186 patients. Every patient was scanned twice – once supine and once prone. Each scan was done during a single breath hold using a 4-channel or 8-channel CT scanner (General Electric LightSpeed or LightSpeed Ultra; GE Helthcare Technologies, Waukesha, WI). CT scanning parameters included 1.25- to 2.5-mm section collimation, 15 mm/s table speed, 1-mm reconstruction interval, 100 mAs, and 120 kVp. These scans are divided into a training set (containing 395 patients) and a test set (containing 791 patients). After 3-dimensional colon surface segmentation and polyp candidate segmentation, we extracted 155 features from each polyp candidate. These features are calculated from a polyp candidate based on its shape, volume, etc. Most features are especially related with various curvatures of a polyp candidate. For the training set, we got 58406 polyp candidates that include 85 true polyp detections (coming from 54 physical polyps) whose sizes are in range 6 mm to 9 mm and 52 true polyp detections (coming from 25 physical polyps) whose sizes are greater than 9 mm. The test set contains 116863 polyp candidates in which there are 193 6-9 mm true polyp detections (coming from 123 physical polyps) and 87 true polyp detections (coming from 44 physical polyps) which are bigger than 9 mm. To focus on the detection performance of our method, here we do not consider uniqueness, meaning that several true polyp candidates may come from the same physical polyp. Since current detection algorithms perform very well on true polyps which are greater than 9 mm [3], we focus on the detection of true polyps whose sizes are in the range 6 mm to 9 mm. These polyps are important because the earlier we can find a polyp in its growth stage, the higher probability we may prevent its developing into a malignant tumor.
4.2 Experimental results
After deleting those true polyps which are greater than 9 mm or less than 6 mm, we got 58336 polyp candidates from training set and 116740 polyp candidates from test set. We randomly selected 1000 candidates from them as landmarks. Because true polyps are very few compared with false polyps, here we select all true polyps as landmarks. A Radial Basis Function (RBF) kernel with width 1/155 was used to capture the distance relationship between landmarks. Markov probability transition matrix was generated according to the kernel matrix. The step of Markov transition is 1. The above parameters are selected according to the cross validation on the training set. After diffusion map, we reduced the dimensions of original data to 30 dimensions. Figure 1 shows the distribution of eigenvalues of Markov matrix from the second largest eigenvalue to 155th eigenvalue (the largest eigenvalue of Markov matrix is always 1). The eigenvalues drop quickly and there is an elbow point around the 30th eigenvalue. Because the approximation quality of Diffusion Map is controlled by the distribution of eigenvectors [7-9], it is reasonable to use just the eigenvectors corresponding to the largest 30 eigenvalues to approximate the distribution of original data.
Figure 1
Figure 1
Eigenvalue distribution of Markov probability transition matrix.
In Fig. 2 we compare the performance of SVM [22] using DMLLE with that of SVM using PCA linear dimension reduction method. To avoid over-fitting, the comparisons were done on the training set (we divided the whole training set into two sets: train-training set and train-testing set). In Fig. 2, we show the sensitivities of SVM using DMLLE and PCA at specific false positives per patient when the dimensions of embedding space increase. Each point on the curves is the average result of 100 random tests in which we randomly selected 42 positive samples and 42 negative samples from the training set as training set (train-training set) and the remaining samples comprised the test set (train-testing set). By using DMLLE, SVM shows higher sensitivities at specific false positives per patient compared with SVM using PCA. As indicated by the distribution of eigenvalues of Markov matrix, when we reduce the dimensions of data to about 30, the performance of SVM climbs to the climax which is consistent with our previous eigenvalue analysis. We can view the coordinates which correspond to the eigenvectors whose eigenvalues are smaller than the 30th eigenvalue as noise. The introducing of noise reduces the performance of the SVM. The subspace spanned by the eigenvectors of the first largest 30 eigenvalues captures the intrinsic geometry structure of data and the classifier trained in this subspace shows higher generalization ability than that in the original high dimensional space (please refer the FROC analysis listed below). The same conclusion is drawn in the work of K. Fukumizu et al. \cite{Fukumizu04} in which they discussed how to find a low-dimensional “effective subspace” for high-dimensional data which retains the statistical relationship between the data and their labels.
Figure 2
Figure 2
Comparisons of SVM using DMLLE and PCA under different number of dimensions of embedding space at specific false positives per patient.
Figure 3. shows the distribution of polyp candidates in the reduced subspace spanned by the eigenvectors corresponding to the second to the fourth largest eigenvalues of the Markov matrix. Red dots correspond to true polyps; blue stars are false polyps which are selected as landmarks; Green dots are non-landmarks generated by Locally Linear Embedding (LLE). The number of nearest neighbors in locally linear embedding is 10. From Fig. 3, we can find that the distribution of false polyps is a mixed Gaussian like distribution. Most true polyps are located in the intersection of two Gaussian components. In Fig. 3, we also show images of some true polyps. The true polyps with similar shape are mapped to a nearby place in the embedding space.
Figure 3
Figure 3
Embeddings of data by DMLLE in low dimensional space spanned by eigenvectors corresponding to the second to the fourth largest eigenvalues. Red dots are true polyps. Blue stars are false positives selected as landmarks. Green dots are non-landmark samples. (more ...)
For comparison, we mapped the data to a 3-dimension space by using LLE and plotted the embeddings in Fig. 4. The number of nearest neighbors K = 6 was used in LLE which is the same as that used in DMLLE algorithm. It is obvious that positive samples and negative samples aggregate together which impairs the performance of classifier. Because LLE focuses on preserving the local geometry of data, it lacks the ability to maintain the global geometry structure of original data in low dimensional space. Especially when data are noisy, faraway data points may be connected by noisy samples and therefore mapped to near place. So most of the data collapse to a central point in the low dimensional space. That is what we observed in Fig. 4.
Figure 4
Figure 4
Embeddings of data by LLE. Red dots are true polyps. Green dots are false positives.
To show the influence of landmarks on the performance of the DMLLE algorithm, in Fig. 5 we show FROC curves of DMLLE using a different number of landmarks in the training set. We did 100 random samplings for each number of landmarks. For one random sampling, an SVM classifier was trained and tested 100 times in which the experimental configuration was the same as that in Fig. 2. Therefore, each of the following curves is the average of 10000 FROC curves. The experiments were done on the NIH biowulf computer cluster in parallel mode [25]. Fig. 5 shows that, with increasing number of landmarks, the performance of DMLLE is also improved. But when the number of landmarks is greater than 1000, the performance of DMLLE converges. So in the following experiments on the test set, we chose 1000 landmarks for DMLLE to limit the computational burden.
Figure 5
Figure 5
FROC curves of DMLLE under different number of landmarks on training set (polyp size: 6-9 mm).
To show the advantage of DMLLE on the detection of colonic polyps, in Fig. 6 (a) we show the FROC curves on the testing sets generated by using bootstrap sampling method on the test set (polyp size: 6-9 mm). We used the entire training set to train each algorithm; then we generated the testing set for each algorithm and each random run by bootstrap sampling data in the test set. Each point on the curves is the average result of 100 random tests. For clinical purposes, radiologists are concerned most about the sensitivity of the CAD system at low false positives per patient. In Fig. 6 (b) we plot a sub region of the FROC curves. To compare different embedding methods for the non-landmark samples after using Diffusion Map on the landmarks, in Fig. 6, we also show the results of DM Nystrom method which means the non-landmark samples are mapped to the low dimensional space by the Nystrom extension. From them we find that, SVM with DMLLE dimension reduction shows higher generalization ability than other methods. In order to show whether the improvement is significant, we compared the sensitivities of five methods at different operation points of FROC curves. We performed t-hypothesis tests (Student's t-test) to determine whether two samples (x and y) from a normal distribution have the same mean. The standard deviations are unknown but assumed equal. In table 1 we list the p-value and confidence interval of the t-hypothesis test for DMLLE and other methods at different false positives. By using t-hypothesis test, we found that the sensitivities of DMLLE between 5-10 false positives per patient has significant difference compared with that of the other four algorithms at significance level 0.05. That means by using DMLLE, the performance of our CAD system can be improved significantly.
Figure 6
Figure 6
FROCs of SVM on the large-scale colonic polyp dataset with (green dot curve) or without (magenta square curve) DMLLE dimensionality reduction, SVM committee (Blue star curve), SVM using DM Nystrom (Red triangle curve) and SVM using PCA dimension reduction (more ...)
Table 1
Table 1
T-hypothesis test of DMLLE with other four methods on 6-9 mm polyps at significance level 0.05. In each item we show the confidence interval of the difference between means of DMLLE and other methods. The p-value for each test is less than 0.001.
From the experimental result shown above, the DMLLE algorithm shows good performance on the 6-9 mm colonic polyp classification. At the same time, we also expect that it works well on the polyps bigger than 9 mm. In Fig. 7 and table 2 we show the FROCs and results of t-hypothesis test of the five methods on the bigger polyps (> 9 mm). The DMLLE method shows again the effectiveness on the bigger colonic polyp classification task. In addition, because the number of 6 – 9 mm polyps in the training set is greater than the number of > 9 mm polyps, the performance of DMLLE drops a little for > 9mm polyps compared with that of 6 – 9 mm polyps.
Figure 7
Figure 7
FROCs of SVM on the large-scale colonic polyp dataset with (green dot curve) or without (magenta square curve) DMLLE dimensionality reduction, SVM committee (Blue star curve), SVM using DM Nystrom (Red triangle curve) and SVM using PCA dimension reduction (more ...)
Table 2
Table 2
T-hypothesis test of DMLLE with other four methods on polyps bigger than 9 mm at significance level 0.05. In each item we show the confidence interval of the difference between means of DMLLE and other methods. The p-value for each test is less than 0.001 (more ...)
We found that the SVM with DMLLE shows better generalizability than SVM with all features, SVM with PCA or DM Nystrom dimensionality reduction methods. This result supports our conjecture that we can learn a better classification function in the low dimensional embedding space than in the original high dimensional space. That is also one reason why feature selection methods work. In addition, the advantage of DMLLE over PCA reveals that the colonic polyp data may have a low dimensional manifold structure [4, 5] which cannot be discovered by linear dimensionality reduction methods. Although we can not describe the low dimensional structure definitely, we can still find some clues from the embeddings and figures of true polyps shown in Fig. 3. That means the key factors to discern a true polyp from false polyps are not as many as the features extracted.
In addition, as we mentioned in previous section, the embedding mechanism of LLE for new samples is in essence a variation of the Nystrom extension. The basic Nystrom extension will use all landmarks to infer the embedding of a new sample, whereas the LLE will just use the nearest K neighbors. If we view the changing of the size of neighbor region for a new sample as the changing of scale, then it is obvious that the Nystrom extension and the LLE compute the embedding for a new sample at different scales. The basic Nystrom extension operates at a larger scale and LLE operates at a finer scale. Although LLE works better than the basic Nystrom extension, is it the best scale to infer the coordinates of a new sample based on the embeddings of landmarks? It is still an open question and worthy of further exploration.
Since current detection algorithms perform very well on true polyps which are greater than 9 mm [3], we focused on the detection of true polyps whose sizes are in the range 6 mm to 9 mm. To some extent, improving the detection probability of 6-9 mm polyps is the main target of current CTC CAD research. In order to compare the performance of different methods, we censored other than 6-9 mm polyps in most our experiments. In addition, according to the comparisons of our DMLLE method with other methods on polyps which are greater than 9 mm, the DMLLE method also outperformed other methods on bigger polyps. Because the resolution of CTC is limited, the size of polyps will affect the significance of features on the classification task (good features for big polyps may be not significant for smaller polyps [13]). In practical CTC CAD systems, one could train different classifiers for different sizes of polyps.
In our current experiments, we do not combine the results from supine and prone scans in order to compare the performance of different dimensionality reduction methods. In practical CTC CAD systems, one can get higher sensitivity if the classification of a true polyp is based on supine and prone scans [3]. We also do not focus on the detection of adenomatous polyps only but instead consider all histologic types of polyps in our database. Interested readers may find the performance of an SVM committee based on joint classification of adenomatous polyps on supine and prone scans in our previous work [3].
In this paper, we propose a nonlinear dimensionality reduction method based on Diffusion Map and Local Linear Embedding for large-scale data. By using Markov transition probability as the measure of distance between samples, Diffusion Map shows high resistance to noise. Because Diffusion Map needs to do eigenvalue decomposition for Markov matrix, it limits the application of Diffusion Map to large-scale data. Inspired by the idea of landmark of ISOMAP algorithm, we first randomly select partial data as landmarks. The embeddings of landmarks in low dimensional space is constructed by using Diffusion Map. These embeddings constitute the global skeleton of whole data in embedding space. Then by using Local Linear Embedding, non-landmark points are embedded into the same low dimensional space by using nearest landmarks. For any non-landmark point, the linear geometry relationship with its nearest landmarks is maintained in both high dimensional space and embedding space. By combining global and local dimensionality reduction methods, DMLLE provides a faithful representation of original high-dimensional data at coarse and fine scales. As a consequence, it can capture the intrinsic distance relationship between samples and reduce the influence of noisy features. Both of these characteristics are crucial to the performance of classifiers. PCA can be regarded as a global dimensionality reduction method and LLE is a local dimensionality reduction method. In the experiments in our paper, we compared the DMLLE with PCA and LLE; the comparison results support our findings.
We applied the proposed DMLLE method to the colonic polyp detection problem on virtual colonoscopy. The embedding of all polyp candidates in three-dimensional space shows that most true polyps concentrate in the valley of two Gaussian-like distributions formed by false polyps. The clustering of true polyps enables the supervised learning method to determine a good decision boundary and achieve higher generalization ability than could be achieved in the original high dimensional space. We compared the test errors of the SVM classifier trained in a low dimensional space by using DMLLE with that of the SVM classifier using all available features. FROC analysis showed that with DMLLE, the SVM classifier shows higher generalization ability than that using all features. DMLLE also shows higher generalization ability compared with an SVM committee using feature selection technology and SVM using PCA linear dimensionality reduction.
Acknowledgement
This research was supported by the Intramural Research Program of the NIH Clinical Center. This study utilized the high-performance computational capabilities of the NIH Biowulf PC/Linux cluster. We would like to thank Drs. Jiang Li and Jiamin Liu for helpful discussions, and Dr. Rong Jin for comments on the manuscript. We thank Drs. Perry Pickhardt, J. Richard Choi and William Schindler for providing CT colonography data.
1. Fletcher JG, Luboldt W. CT colonography and MR colonography: current status, research directions and comparison. European Radiology. 2000;10(5):786–801. [PubMed]
2. Frentz SM, Summers RM. Current Status of CT Colonography. Academic Radiology. 2006;13(12):1517–31. [PMC free article] [PubMed]
3. Summers RM, Yao J, Pickhardt PJ, Franaszek M, Bitter I, Brickman D, Krishna V, Choi JR. Computed tomographic virtual colonoscopy computer-aided polyp detection in a screening population. Gastroenterology. 2005;129:1832–1844. [PMC free article] [PubMed]
4. Tenenbaum JB, de Silva V, Langford JC. Global geometric framework for nonlinear dimensionality reduction. Science. 2000;290:2319–2323. [PubMed]
5. Roweis S, Saul L. Dimensionality reduction by locally linear embedding. Science. 2000;290:2323–2326. [PubMed]
6. Belkin M, Niyogi P. Advances in Neural Information Processing Systems. Vol. 14. MIT Press; Cambridge, Mass., USA: 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering.
7. Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW. Geometric diffusions as a tool for harmonic analysis and structure definition of data. Proceedings of the National Academy of Sciences. 2005 May;102(21) Part I: diffusion maps and Part II: multiscale methods. [PubMed]
8. Lafon S, Lee AB. Diffusion Maps and Coarse-Graining: A Unified Framework for Dimensionality Reduction, Graph Partitioning and Data Set Parameterization. IEEE transactions on Pattern Analysis and Machine Intelligence. 2006;28(No 9):1393–1403. [PubMed]
9. Coifman RR, Lafon S. Diffusion maps. Applied and Computational Harmonic Analysis: Special issue on Diffusion Maps and Wavelets. 2006 July;21:5–30.
10. Fowlkes C, Belongie S, Chung F, Malik J. Spectral grouping using the Nystrom method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004;26(No 2) [PubMed]
11. Lafon S, Keller Y, Coifman RR. Data fusion and multicue data matching by Diffusion Maps. IEEE transactions on Pattern Analysis and Machine Intelligence. 2006;28(No 11):1784–1797. [PubMed]
12. Bengio Y, Paiement JF, Vincent P, Delalleau O. Advances in Neural Information Processing Systems. MIT Press; Cambridge, Mass., USA: 2004. Out-of-Sample extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering.
13. Yao J, Summers RM, Hara A. Amini AA, Manduca A, editors. Optimizing the committee of support vector machines (SVM) in a colonic polyp CAD system, In Medical Imaging 2005: Physiology, Function, and Structure from Medical Images. Proceedings of SPIE. 2005;5746:384–392.
14. Li J, Yao J, Summers RM, Petrick N, Hara A. An efficient feature selection algorithm for computer-aided polyp detection. International Journal on Artificial Intelligence Tools. 2006;15(No 6):893–915.
15. Yao J, et al. Colonic Polyp Segmentation in CT Colonography Based on Fuzzy Clustering and Deformable Models. IEEE Trans. Med. Imag. 2004;23(11):1344–1352. [PubMed]
16. Paik DS, et al. Surface normal overlap: A computer-aided detection algorithm, with application to colonic polyps and lung nodules in helical CT. IEEE Trans. Med. Imag. 2004;23:661–675. [PubMed]
17. Li J, et al. IEEE ISBI. Arlington, VA: 2006. Wavelet Method for CT Colonography Computer-Aided Polyp Detection.
18. Summers RM, et al. Colonic Polyps: Complementary Role of Computer-Aided Detection in CT Colonography. Radiology. 2002;225:391–399. [PubMed]
19. Dijkers JJ, van Wijk C, Vos FM, Florie J, Nio YC, Venema HW, Truyen R, van Vliet LJ. Segmentation and Size Measurement of Polyps in CT Colonography. MICCAI. 2005 [PubMed]
20. Acar B, Beaulieu CF, Gokturk SB, et al. Edge displacement field-based classification for improved detection of polyps in CT colonography. IEEE Transactions on Medical Imaging. 2002;21(12):1461–1467. [PubMed]
21. Yoshida H, Nappi J. Three-dimensional computer-aided diagnosis scheme for detection of colonic polyps. IEEE Trans. Med. Imag. 2001;20(12):1261–1274. [PubMed]
22. Burges CJC. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery. 1998;2:121–167.
23. Fukumizu K, Bach F~R, Jordan M~I. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research. 2004;25:73–99.
24. Jolliffe IT. Principal Component Analysis. second edition Springer-Verlag; 2002.
25. Biowulf Linux cluster National Institutes of Health; Bethesda, Md: http://biowulf.nih.gov.