|Home | About | Journals | Submit | Contact Us | Français|
Barrett’s esophagus (BE) and associated adenocarcinoma have emerged as a major health care problem. Endoscopic optical coherence tomography is a microscopic sub-surface imaging technology that has been shown to differentiate tissue layers of the gastrointestinal wall and identify dysplasia in the mucosa, and is proposed as a surveillance tool to aid in management of BE. In this work a computer-aided diagnosis (CAD) system has been demonstrated for classification of dysplasia in Barrett’s esophagus using EOCT. The system is composed of four modules: region of interest segmentation, dysplasia-related image feature extraction, feature selection, and site classification and validation. Multiple feature extraction and classification methods were evaluated and the process of developing the CAD system is described in detail. Use of multiple EOCT images to classify a single site was also investigated. A total of 96 EOCT image-biopsy pairs (63 non-dysplastic, 26 low-grade and 7 high-grade dysplastic biopsy sites) from a previously described clinical study were analyzed using the CAD system, yielding an accuracy of 84% for classification of non-dysplastic vs. dysplastic BE tissue. The results motivate continued development of CAD to potentially enable EOCT surveillance of large surface areas of Barrett’s mucosa to identify dysplasia.
Barrett’s esophagus (BE) is a pre-malignant precursor lesion for esophageal adenocarcinoma [1–9]. Endoscopic optical coherence tomography (EOCT) has demonstrated interpretable high-resolution images of mucosa and submucosa in the gastrointestinal (GI) tract [10–15] and is capable of identifying dysplasia in gastrointestinal mucosal tissue [16–19]. BE is the most extensively studied potential GI application of EOCT [12,13,17,18,20–22].
Computer-aided diagnosis (CAD) is a relatively new field that aims to provide a diagnosing physician with quantitative and objective measures and interpretation of image features. In the last two decades CAD has been applied using several imaging modalities (e.g. X-ray mammography and computed tomography (CT) [23–28] ultrasound [29–34] and magnetic resonance imaging (MRI) [35–37]). These computerized analysis schemes are being developed to aid in detection of lesions and distinguishing between malignant and benign lesions in order to improve both sensitivity and specificity of detection. Many studies have shown that CAD has the potential to increase the sensitivity [23,25,38,39] and the specificity [40,41] of diagnostic imaging. The merit of computer-aided analysis of image features lies in the objectivity and reproducibility of the measures of specific features. There is a large body of knowledge developed over the past 10-15 years with new computer techniques and new image processing methods being applied frequently to new imaging modalities [25,26,28,34,42–45]. For example, mammography CAD researchers have found that different types of lesions require different approaches to detection . A wide range of feature extraction, feature selection, and classification methodologies have been explored [46–49].In general, CAD systems attempt to quantify image features associated with pathology and analyze these data to provide the physician with an objective assessment of disease state. The conventional paradigm envisions that the CAD output will be used by the physician as a “second opinion” with the final diagnosis to be made by the physician .
Although the specialized intestinal metaplasia associated with Barrett’s esophagus is readily distinguishable from normal esophageal mucosa using EOCT [12,13,20–22], discerning dysplasia within Barrett’s esophagus is more challenging [16–19]. CAD techniques may aid the effort to identify dysplasia in BE by providing objective and quantitative interpretation of clinical EOCT images reducing inter- and intra-observer variability and by detecting subtle changes in image features associated with the transformation. Furthermore, the future role of EOCT in BE surveillance is likely to involve comprehensive imaging of the involved segment of the esophagus, resulting in the acquisition of hundreds or possibly thousands of images per patient. Analysis of these data by the unaided human reader will not be practical. Therefore CAD is an essential component of the potential clinical application of EOCT to BE management.
For our clinical trials of EOCT imaging in BE, we have employed CAD methods for detecting dysplasia and classifying EOCT images of Barrett’s esophagus [19,50]. Here, we expand upon these methods and describe them in detail and evaluate their suitability for detection of dysplasia in BE using EOCT, in the context of a generalizable tissue classification scheme. The analysis is extended to include multiple texture feature analysis methods as well as image features specifically designed to quantify observed EOCT image characteristics associated with dysplasia. A two-step image feature selection process is introduced to reduce risk of over-training the CAD system. Several multivariate classification methods are investigated, as well as the effects of bootstrapping and aggregating the training data. We furthermore extend the analysis to include the use of multiple image frames per examination site and investigate three-category classification.
The paper is organized as follows. Section 2 introduces the data analyzed and methods used for CAD consisting of the steps of segmentation, feature extraction, feature selection, classification and validation. The results of feature extraction, selection and classification and validation are reported in Section 3. Finally the results are discussed in Section 4.
Under a protocol approved by the Institutional Review Board of University Hospitals Case Medical Center, we enrolled patients undergoing surveillance for Barrett’s esophagus in a study designed to assess EOCT imaging. The protocol specified that surveillance should be conducted according to the “Seattle protocol” , with biopsies being obtained in four quadrants at two centimeter intervals along the entire length of esophagus involved by Barrett's changes.
The EOCT system used to obtain these data has been described previously [52,53]. A digital stream of EOCT images was obtained at each biopsy site prior to removal of the actual biopsy. For this purpose, a 2.4 mm diameter EOCT probe was designed for use with a 2-channel endoscope fitted with an endoscopic mucosal resection (EMR) cap. The cap, a transparent, plastic cylinder beveled at the distal end, fits tightly on the end of the endoscope. When the tip of the endoscope was deflected toward the wall of the esophagus, a small circular portion of the esophageal mucosa was fixed by the cap, thereby negating the effects of esophageal motion. With the EOCT probe inserted through one of the two accessory channels in the endoscope, a portion of the esophageal mucosa within the area encircled by the cap was imaged. The EOCT probe and study protocol were designed such that the probe did not make physical contact with the tissue, thus avoiding compressing and altering the appearance of the mucosa . As a digital stream of images was being obtained, a biopsy forceps inserted through the second endoscope channel was used to obtain a specimen. With the cap properly aligned on the tip of the endoscope, the biopsy forceps entered the EOCT field of view, providing assurance that the tissue imaged by EOCT was the same tissue removed by biopsy. Using this system, EOCT images were precisely correlated to the histopathologic diagnosis at each biopsy site [17,19].
For the present analysis, we selected image-biopsy pairs from screening procedures in 33 patients. An EOCT video stream (approximately 20 frames) was recorded at each biopsy site. The images selected at each biopsy site were the last several images recorded immediately before the biopsy forceps entered the EOCT field of view. The frame acquisition rate of the EOCT system was 4/second. We assumed that an endoscopist could reasonably hold the EOCT probe in place for 2-3 seconds. Therefore, the maximum number of frames chosen for analysis per biopsy site in this study is 10.
The EOCT image stream for each biopsy site was reviewed by EOCT experts (XQ, MVS). Also, each biopsy was evaluated by an experienced GI pathologist (JEW). The criteria for image-biopsy pair selection for inclusion in this study were as follows: 1) the biopsy forceps entered the EOCT field of view, ensuring image-biopsy correlation, 2) the EOCT probe did not contact the tissue, 3) the biopsy was graded as non-dysplastic (ND) BE or as low-grade dysplasia (LD) or high-grade dysplasia (HD). Biopsies graded as indefinite dysplasia (IND) or as cancer were excluded. The study was intended to determine the utility of EOCT and CAD for detecting dysplasia in BE. Therefore, cancer was excluded from this analysis because cancerous tissue can be readily identified endoscopically, and IND was excluded for lack of a determined gold standard diagnosis. We expect that an eventual clinical EOCT-based CAD system would not provide a classification of IND, opting instead for the more conservative classification of dysplasia, since the diagnosis will be confirmed by histological analysis of a biopsy. Of a total of 314 biopsy sites imaged from 33 patients, 96 biopsy sites obtained from 12 patients were included in this analysis. A total of 690 EOCT images were included (421 from 63 ND biopsy sites, 202 from 26 LD biopsy sites, and 67 from 7 HD biopsy sites).
The observed characteristics of dysplasia in BE include decreased scattering of light and focal loss of mucosal structures [16–18,54,55]. We and others  have also observed striped patterns that are common within non-dysplastic BE EOCT images when the probe is not making contact with the tissue surface, but uncommon within dysplastic BE EOCT images. These are the image features that we selected for quantitative analysis. Figure 1 shows representative EOCT images of Barrett’s esophagus without dysplasia (A), with low grade dysplasia (B), and with high grade dysplasia (C) which illustrate these image features. Within each EOCT image the region of interest (ROI) representing the site of the biopsy was segmented for analysis using a segmentation method reported previously , removing the EMR cap and tissue outside of the cap. A binary mask was created from the ROI to select the region to be analyzed by the feature extraction algorithms. Figure 2 shows an EOCT image of Barrett’s esophagus, the segmented ROI and its binary mask.
As described in detail hereafter, a total of 39 image features were quantified from the segmented ROI within each EOCT image. From those 39 image features, 18 were selected as candidates to be used for classification based on their 2-category classification power (using ROC analysis). To reduce risk of over-fitting and to remove highly correlated features, the first 5 principal components of the 18 selected features, accounting for 94% of the variance, were used as the final features for image and site classification. All image analysis and statistical analysis presented here was carried out using signal processing and image processing toolboxes in MATLAB R2006b (The Math Works Inc,. Natick, MA, USA), and S-PLUS 7.0 (The Insightful Corp., Seattle, WA, USA).
One observed characteristic of dysplasia in BE is decreased scattered light intensity, presumably resulting from the altered optical scattering properties of the dysplastic tissue. Dysplasia is characterized by cellular changes, including the size, shape and density of nuclei within the epithelial layer. These changes affect the light scattering by the tissue. The profile of each axial line (A-scan) in the EOCT image is related to the backscattered power as a function of depth and the optical properties of the tissue. In order to quantify the scattering intensity by a tissue site, we estimated the optical properties of the tissue from the EOCT images. The single scattering model used here assumes that light attenuation occurs along the path of the wave both before and after it is backscattered, that higher-order scattering processes (multiple scattering) can be neglected, and that the beam was small and had a low beam-divergence angle. Under these assumptions, the backscattered power P(z) returning from a depth z can be approximated by the method suggested by Schmitt et al as Eq. (1) [56,57].
Here is the incident power measured at depth z = 0; A(z) is the beam divergence function; is the coherence length of the light source; is the backscattering coefficient of the tissue; and is attenuation coefficient of the tissue. The backscatter and attenuation coefficients are reasonable image features to grossly represent tissue scattering, and therefore image backscatter intensity. The tissue backscattering and attenuation coefficients can be estimated by fitting the logarithmically compressed EOCT A-scan data to a line as a function of the depth z. Then the intercept of the line can be equated to and the slope can be equated to , yielding quantities representing the A-scan that are direct functions of μb and μt.
In this study, each EOCT image ROI was filtered with a rolling average of 9 adjacent A-scans to reduce noise. Then, for each pixel, the intercept and slope in the direction of the A-scan were estimated within an approximately 200 micrometer window around the pixel. Under the single-scattering assumption outlined above, the EOCT data should always have a positive intercept and a negative slope. To remove the influence of bad estimates, A-scan segments with a positive slope or negative intercept were excluded from the intensity analysis, as were any segments deeper than excluded regions., The mean, standard deviation and dominant peak of the histogram of slope and intercept over the entire ROI were extracted as possible classification features representing the intensity characteristic of the image.
Another observed characteristic of EOCT images of dysplasia in BE is a lack of the stripe-like patterns that are observed in non-dysplastic BE EOCT images. These are seldom found in dysplastic BE EOCT images. Although the physical origin of the stripes has not been proven, they are probably associated with the surface texture of the tissue. They have been observed by us and others  when the EOCT catheter probe is not in contact with the tissue surface. Because they are often observed in non-dysplastic tissue and seldom observed in dysplastic tissue, they may be a good feature for tissue classification. Figure 3(A) shows the obvious stripe pattern within a non-dysplastic BE EOCT image and Fig. 3(B) shows no obvious stripe pattern within a dysplastic BE EOCT image. Because the EOCT probe scans in a radial manner, the stripe-like patterns within non-dysplastic BE EOCT images appear in a radial orientation. For tissue classification, the radial stripe density and relative orientation were quantified by a local intensity gradient-based method.
Local orientation and density of stripes was quantified by calculating local intensity gradients in small sub-regions within the ROI of BE EOCT images [58,59]. The gradient magnitude and the angle of the stripes were calculated using a 5x5 Prewitt gradient kernel. The edges of the ROI were removed by morphological erosion to avoid edge effects. The dominant local stripe orientation was determined within 5x5 pixel sub-regions using maximum likelihood estimation [58,59]. Because the gradient magnitude and the angle within sub-regions can be described as circular data , the maximum likelihood of the stripe orientation within a sub-region was estimated using a von Mises distribution, which is analogous to a normal distribution for circular data . The relative orientation was then determined as the difference between the local dominant angle and direction of the sub-region to center of probe. For tissue classification, three parameters were quantified: the density of stripes, defined as the number of radially-orientated stripes divided by the major axis length of the ROI, and the mean and the standard deviation of the relative orientation.
Another observed characteristic of dysplastic mucosa in BE is loss of structure associated with normal histological organization, presumably resulting from the altered tissue architecture of the dysplastic tissue [10,17,54]. Therefore such a structure loss can be quantified as texture features, such as smoothness, coarseness and homogeneity etc. in EOCT images, which could potentially serve for tissue classification. There are three principal approaches to defining qualities of image texture: statistical methods, structural methods and model based methods [61,62]. With a statistical approach, the relationships between each pixel and its neighboring pixels are quantified by their spatial distribution of gray values. A set of local descriptors are extracted, such as energy, entropy, coarseness, contrast etc. With a structural approach, an image is considered as being composed of a set of texture units or primitives. The methods of analysis usually depend upon the geometric properties of these texture units. Model based texture analysis methods are based on the construction of an image model that can be used not only to describe texture, but also to synthesize it. The model parameters capture the essential perceived qualities of texture, such as Markov random fields, fractals et al. The fact that the perception of texture has many different dimensions is an important reason why there is no single method of texture representation which is adequate for a variety of textures. Statistical and structural approaches are often combined to extract texture features .
In the present study, the size, shape, and orientation of the ROI were different for each EOCT image. Model based approaches, which are sensitive to size and shape of the ROI, would not be readily adapted for texture analysis in this case. Statistical and structural texture features tend to vary locally and so these methods are not sensitive to variation in ROI size. Based on these observations, we chose to make use of statistical and structure approaches for texture analysis in this study of BE EOCT images. Because these EOCT image data were obtained using a radial scanning probe, rotation invariant features were investigated for texture analysis for this study. Center-symmetric auto-correlation (CSAC)  and co-occurrence matrices (COOC) [64,65] are statistical methods. The texture feature coding method (TFCM)  is a combined statistical and structural approach. We chose to investigate these three methods for this study because they capture rotation- and intensity-invariant texture features and are not sensitive to ROI size.
The CSAC method can be regarded as a generalization of Laws’ kernel method . It measures covariance of any local center-symmetric patterns. Two local center-symmetric auto-correlations, linear and rank-order (SAC and SRAC), together with a related covariance measure (SCOV) and variance ratio (SVR), within-pair variance (WVAR) and between-pair variance (BVAR) were calculated. All of these are rotation-invariant measures .
Co-occurrence matrices (COOC) (also called spatial gray-level dependence matrices) were first proposed by Haralick et al [64,65] and are based on the estimation of the intensity second-order joint conditional probability density functions (pdf) for various distances and for four specified directions (0°, 45°, 90°, and 135°) between two pixels. Texture features calculated using the COOC quantify the distribution of gray-level values within an image. For this study, four texture features were calculated from the co-occurrence matrices within the segmented ROIs, contrast, correlation, energy and homogeneity. Contrast is a measure of the gray-level variation between pairs of image elements. Correlation is sensitive to uniform and repeated structures. Energy is sensitive to image regions that have only a small number of intensity distribution patterns; it is an indicator of uniformity or smoothness. Homogeneity is sensitive to images with lower contrast values. Within the segmented ROI of each EOCT image, these four features were calculated in four different directions within a 3 pixel distance.
The texture feature coding method (TFCM)  is a coding scheme in which each pixel is represented by a texture feature number (TFN). The TFN of each pixel is generated based on a 3x3 texture unit as well as the gray-level variations of its eight surrounding pixels. The TFNs are used to generate a TFN histogram from which texture feature descriptors are quantified. In this work, we calculated coarseness, homogeneity, mean convergence and variance. Coarseness measures drastic intensity change in the 8-connective neighborhood. Homogeneity measures the total number of pixels whose intensity have no significant change in the 8-connective neighborhood. Mean convergence indicates how closely the texture approximates the mean intensity within a texture unit. Variance measures deviation of TFNs from the mean. Code entropy, which measures the information content of coded TFNs, was also calculated, in four Orientations; 0°, 45°, 90° and 135°.
From the feature extraction methods described above, a total of 39 image features were quantified within the segmented ROI of each EOCT image. Classification based on multiple image features has the advantage of increasing accuracy by increasing the amount of information used. However, making use of too many image features derived from a limited training data set increases the risk of over-fitting. This will decrease the robustness of the system when classifying data outside of the training set [68,69]. Therefore it is necessary to select a limited number of image features to balance accurate and robust classification. Here, we used a two-stage feature selection process. First, image features were not equally correlated with the histopathological diagnosis, so image features were initially screened individually to remove those that were not strong classifiers. Second, image features that are strong classifiers may be correlated with each other and therefore redundant, so redundant features were removed. For this process we used the single EOCT images immediately previous to the appearance of the biopsy forceps for each biopsy case.
For the first feature selection stage, individual image features were screened for their ability to separate dysplastic and non-dysplastic populations. The receiver operating characteristic (ROC) curves of each feature were calculated to find the features with higher classification accuracy. Features with areas under ROC curve smaller than 0.7 were rejected. This threshold is subjective, but because it is only the first stage of feature selection, small changes of the threshold would not be expected to significantly affect the final classification result. The list of calculated image features and corresponding areas under the ROC curve is shown in Table 1 .
For the second stage of feature selection, to remove redundancy due to highly correlated features, principal component analysis (PCA) was applied to the set of image features selected from stage one . Each principal component is orthogonal and represents a linear combination of the original variables. The first few principal components typically account for most of the variance in the original data. Therefore, the first five principal components were selected as the parameters to be input to the multivariate classification algorithms. From the loading of the first five principal components, image features accounting for the most variance are shown in Fig. 4 .
Multivariate analysis was applied to potential image classification parameters to understand the relationships between the variables, and to predict generalizability to new data. The image parameters analyzed were the first five principal components of selected features from the single images previous to forceps appearance at each biopsy site, as described above. We carried out and compared the performance of four types of multivariate analysis for EOCT image classification of dysplasia in BE: linear and quadratic discriminant analysis (LDA & QLA), K-nearest neighbor (k-NN) classification, two types of neural network (NN) (single-hidden-layer NN (SLNN) and learning vector quantization (LVQ) network), and classification tree. Here, a binary classification was used into two categories, non-dysplastic (ND) and dysplastic (D). The dysplastic category includes both low-grade and high-grade dysplastic sites. Because of the moderate sample size (12 patients, 96 biopsy sites), to validate the classification based on the selected parameters, we applied leave-one-patient-out cross-validation to calculate the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy with their corresponding 95% confidence intervals (CI).
Fisher  introduced a linear discriminant analysis (LDA) seeking a linear combination of the variables (here image features) with a maximal ratio of the separation of the class means to the within-class variance, in which extreme values are obtained from eigen values and eigenvectors. Bayesian classification with pooled covariance matrix and equal prior probabilities for each group was used as the rule for classification. Quadratic discriminant analysis (QDA) is closely related to LDA, where the boundaries of the decision regions are quadratic surfaces in feature space instead of linear plane for LDA. The advantage of LDA and QDA is that they are easily calculated. But LDA and QDA require normality assumptions for training data sets.
The k-nearest neighbor algorithm (k-NN)  is a method for classifying objects based on the closest training samples in the feature space. The training samples are mapped into multidimensional feature space, and partitioned into regions by class labels. Testing data is assigned to the class c if it is the most frequent class label among the k nearest training samples, usually using Euclidian distance. The best choice for k depends upon the data; generally, larger values of k reduce the effect of noise on the classification, but make boundaries between classes less distinct. A good k can be selected by cross-validation, running the nearest neighbor classifier on the learning set only. Because of the simplicity and because no normality assumption is required, k-NN classification has been widely used in pattern recognition.
A neural network  is a computer-intensive, algorithmic procedure for transforming inputs into desired outputs using highly connected networks of relatively simple processing units (neurons or nodes). Many different NN architectures can be employed for pattern recognition. SLNN implements the well-known statistical techniques of linear regression and generalized linear models. Here we used the independent logistic sigmoidal activation function applied to each of the outputs independently. The classification of image features using LVQ  is based on a comparison with a number of so-called prototype vectors. LVQ is composed of a competitive layer and a linear layer. Vector quantization is one example of competitive learning. The competitive layer automatically learns to classify input vectors in a supervised manner. In vector quantization, the network is given by prototypes and the changes of the weights of the network related to the k-nearest neighbor algorithm (k-NN). The linear layer transforms the competitive layer’s classes into target classification defined by target results.
The classification tree is another non-parametric classification method. The root is the top node of the tree, and features are passed down the tree with decisions being made at each node until a terminal node or leaf is reached. Each non-terminal node contains a question on which a split is based. Each leaf contains the label of a classification . A classification tree partitions the feature space into a set of sub-regions corresponding to the leaves, since each image will be classified by the label of the leaf it reaches. Thus decision trees can be seen as a hierarchical way to describe a partition of feature space. Using tree methods for classification results in a series of logical if-then conditions (tree nodes), it is conceptually simple yet powerful, and provides a structured description of the feature space . Here we applied entropy  as a measure of impurity of nodes to construct the whole classification tree.
The advantage of LDA and QDA is that they are easily calculated. But LDA and QDA require normality assumptions for training data sets. The performance of k-NN varies according to the choice of k. The best choice for k depends upon the data; generally, larger values of k reduce the effect of noise on the classification, but make boundaries between classes less distinct. Neural networks based on artificial intelligence principles have the powerful capacity to define a non-linear separation of classes. Almost any finite-dimensional vector function on a compact set can be approximated. However, training NN are computationally expensive and can suffer from “interference” in that new data can cause it to forget some of what it has learned on old data.
Preliminary comparison of the results of these classification methods (shown in Tables 2 , ,33 , and the first row of Table 4 ) reveal that the classification tree achieved classification accuracy superior to k-NN and neural network methods, and equivalent as to LDA and QDA methods. Moreover, unlike LDA and QDA, the classification tree does not assume normality nor that the underlying relationships between the predictor variables and the dependent variable are linear or quadratic. Therefore, we employed the classification tree method for subsequent investigations presented here.
One major problem with classification trees (also NN) is their high variance. Often a small change in the training data can result in a very different series of splits, making interpretation somewhat precarious. The major reason for this instability is the hierarchical nature of the process. The optimization of a classifier using a learning sample only leads to over-fitting and bias in both model selection and error rate estimation. Independent test samples are available for some problems , but this is usually not the case in medical statistics due to small learning samples.
Bootstrapping is a data re-sampling method that iteratively trains and evaluates a classifier in order to improve its performance. Bootstrapping is applied to estimate an approximate population distribution from the empirical distribution of the observed data, and can be implemented in the case where a set of observations can be assumed to be from an independent and identically distributed population. It constructs a number of re-samples of the observed data set (of equal size to the observed data set), each of which is obtained by random sampling with replacement from the original data set .
Bootstrapping is becoming the most popular method of testing mediation because it does not require the normality assumption to be met, and it can be effectively utilized with smaller data sizes (such as N<20) . Combining bootstrapping with classification (called bootstrapping aggregating or bagging), reduces the variance in classification . The misclassification error estimated by the bootstrap with observations from the learning sample is unbiased. Bagging classification trees results in substantial reduction of misclassification error in many applications [81,82]. We classified the EOCT images based on the selected image parameters using bagging of classification trees, with 10, 30, 50, 70 and 100 times bagging. Using bootstrapped leave-one-patient-out training sets, one classification tree was constructed for each subsample. Then each test image was classified by majority voting of those bagging classification trees.
For the EOCT image/biopsy pairs included in this study, the maximum number of frames chosen for analysis per biopsy site was 10. Not all biopsy sites had 10 frames available. In this unbalanced data set, among the 96 biopsy sites (690 EOCT images) included, the number of usable frames ranged from one to more than ten per biopsy site. Cinematic viewing of multiple EOCT images of a tissue site is extremely advantageous to a human reader classifying the site, as compared to examination of a single image. Therefore, we have investigated whether use of multiple images of a biopsy site by our CAD algorithm would improve classification accuracy. This initial investigation uses the simplest possible method of making use of multiple images.
For this analysis, biopsy sites were classified in three ways. First, sites were classified into two categories as non-dysplastic (ND) or dysplastic (D). Second, sites were classified into two categories as non-high-grade dysplastic (NHD) or high-grade dysplastic (HD). Third, sites were classified into three categories as non-dysplastic (ND), low-grade dysplastic (LD), or high-grade dysplastic (HD).
For 2-category classification, we defined V (values 1-10) as the number of frames to be considered in the classification algorithm. For each value of V, we included in the analysis only those biopsy sites for which at least V frames were available. We defined R as the number of frames (out of V) that had to be classified as positive for the biopsy site to be classified as positive. For each V, we analyzed data for values of R from one to V. For each V and R, we calculated the resultant sensitivity and specificity for classifying the biopsy site, using the pathologist’s diagnosis of the biopsy as the reference standard. These data were compiled in sensitivity, specificity, and accuracy tables according to changing V and R.
For 3-category classification (ND vs. LD vs. HD), classification was calculated using classification tree with bagging using the most conservative criteria for detecting dysplasia. Within the site, if at least one HD image was found, this site was classified as HD. If no HD image and at least one LD image was found, this site was classified as LD. Otherwise the site was classified as ND.
We extracted a total of 40 image features within the segmented ROIs. Eighteen image features had areas under the ROC curve that were >0.7. These features were: mean intercept from intensity model; number of stripes per unit length from stripe detection; mean SVR, VAR and BVAR from CSAC; homogeneity, CE 0°, CE 45°, CE 90° and CE 135° from TFCM; contrast 0°, correlation 45°, correlation 90°, correlation 135°, energy 0°, energy 45°, energy 90° and energy 135° from COOC. Table 1 shows the extracted image features and their areas under the ROC curve. Highlighted features are those for which the areas under ROC curve were larger than 0.7.
In this analysis, the first principal component explained 0.69 of the variance, and the first five principal components together explained 0.94 of the variance. These were the parameters used for classification. The loading of the first five principal components is shown in Fig. 4. From this plot, we can observe that the image features contributing most to the variance of the data are: mean VAR from CSAC, CE 0° from TFCM, contrast 0° from COOC, mean intercept from intensity model and number of stripes per unit length from stripe detection. Figure 5 shows the ROC curves of the first five principal components evaluated individually as classification parameters, and each corresponding area under its ROC curve.
Table 2 reports classification results using leave-one-patient-out cross-validation by LDA, QDA, SLNN and LVQ. Table 3 reports the k-NN classification results for values of k ranging from one to six using leave-one-patient-out cross-validation. It can be seen that k-NN classification achieved the best classification of dysplasia on our EOCT data set when k = 3. Table 4 shows the classification result using the classification tree method without bagging, and with 10, 30, 50, 70 and 100 times bagging using one image per biopsy site. From these results it can be seen that misclassification errors were much higher without bagging than with bagging. At 50 times bagging, the sensitivity and specificity significantly improved. Beyond 50 times bagging, the classification accuracy did not improve further.
Results of using multiple frames per biopsy site for binary classification of non-dysplastic sites (ND, negative) vs. dysplastic sites (D, positive) are summarized in Table 5 . The table presents the sensitivity and specificity of classification using number of frames (V) ranging from 1 to 10, and numbers of positive images required to classify the site as positive (R) ranging from one to V. The table also shows how many of the biopsy sites were included in the analysis for each value of V (i.e. number of sites for which enough OCT images were available). For V = 1 and R = 1, sensitivity was 0.76 (95% confidence interval (CI): 0.59-0.87) and specificity was 0.79 (95% CI: 0.68-0.88) and accuracy was 0.77 (95% CI: 0.68-0.84). Using ten frames per biopsy site and requiring at least four frames to be positive to classify the site as positive (V = 10, R = 4) yielded a sensitivity of 0.78 (95% CI: 0.54-0.92) and a specificity of 0.90 (95% CI: 0.67-0.98) and accuracy was 0.84 (95% CI: 0.68-0.93). This was the best classification result for this experiment. In this data set, 39% of the sites had ten frames available.
Results using multiple frames per biopsy site for binary classification of non-high-grade dysplastic (NHD) vs. high-grade dysplastic (HD) sites are summarized in Table 6 . Using V = 1 and R = 1 yielded a sensitivity of 0.14 (95% confidence interval (CI): 0.06-0.31) and a specificity of 1.00 (95% CI: 0.93-1.00). The best classification result was obtained using V = 9, R = 1 or 2, which yielded a sensitivity of 0.67 (95% CI: 0.46-0.85) and a specificity of 1.00 (95% CI: 0.83-1.00). 44% of the sites had at least nine frames available. From the results shown, using multiple frames per biopsy site classification achieved higher sensitivity and specificity than using only one single frame per biopsy site. Table 7 contains positive predictive values (PPV) and negative predictive values (NPV) for the V = 1 and R = 1 cases and the best cases from the ND vs. D results and the NHD vs. HD results, using multiple frames per biopsy site. Three-category classification (ND sites vs. LG sites vs. HD sites) results are shown in Table 8 . Table 8 shows the raw classification results. Using the conservative criteria for multi-image classification, no high-grade dysplastic biopsy sites were misclassified as low-grade or non-dysplastic. Furthermore, no low-grade dysplastic biopsy sites were misclassified as non-dysplastic. However, several non-dysplastic sites and low-grade sites were misclassified as high-grade dysplastic. Table 9 shows the calculation time for each module using MATLAB R2006b on an Intel CPU with 3.25 GB of RAM computer.
The EOCT system used here and the experimental protocol were designed to acquire EOCT images precisely correlated with biopsies. An EOCT for clinical surveillance will comprehensively image the entire affected esophageal wall, not only individual locations [83,84] The image analysis presented here should be translatable to such comprehensive imaging. Based on these results, it is apparent that CAD has the potential to aid in detecting the presence or absence of dysplasia in surveillance of large surface areas of Barrett’s mucosa using EOCT.
Automatic segmentation of the region of interest (ROI) for analysis will be important for surveillance of dysplasia in Barrett’s esophagus using EOCT because of the large number of images acquired. The segmentation of ROI for comprehensive imaging will be simplified compared with the segmentation method used in this work  because the images will not include artifacts such as the EMR cap or tissue outside the cap.
Our observation of stripe-like patterns that are usually present in non-dysplastic BE EOCT images, but usually not present within dysplastic BE EOCT images, is supported by a study conducted by Chen et al , where OCT images were recorded of pinch biopsies taken during BE surveillance procedures. However, these stripe-like patterns are not observed in some BE images recorded using other EOCT systems [10,85] which made use of probes in physical contact with the tissue. Furthermore, similar stripe-like features are observable in other pitted gastrointestinal mucosae such as stomach and colon (e.g. Figure 4 in  and Figs. 1, 2 in , and Fig. 6 in .) Because the feature is observed when the probe in not in contact with the tissue, but not observed when the probe is in contact, and because the stripes are typically oriented in the direction of the illuminating probe light, it is likely that the stripes are associated with the surface topology related to the pit structure of the mucosa. The effects of probe-tissue contact on OCT detection of dysplasia within BE is currently under investigation, but it is apparent that contact obscures this feature by smoothing surface topology.
Of the 39 image features quantified in this study, 18 (46%) were selected for classification because they were found to be significant classifiers (area under ROC > 0.7) in the initial screen. The highest individual area under the curve is 0.85. Several features not selected have areas under the curve close to 0.7. From the principal component analysis, the first five principal components account for 94% of the variance within the selected features. This indicates that there is a high level of correlation amongst the calculated features . From the loading plots shown in Fig. 4, It can be seen that the first three principal components (PC 1-3) are composed mostly of texture features. PC 1 is dominated by CSAC features VAR and BVAR and the COOC energy features. PC 2 is dominated by TFCM features besides CSAC feature VAR. PC 3 consists of the COOC contrast and correlation features. PC 4 and PC 5 are composed primarily of the mean intercept and the number of stripes, respectively, with little contribution from texture features. Therefore it is apparent that while there is a high level of correlation among texture features, most of this correlation is between the various features calculated using the same method. The features calculated from the attenuation model and the stripes feature are largely independent. Whether the specific image features that appeared as the strongest classifiers in t his study also emerge as the strongest classifiers in future studies, using different EOCT systems and imaging protocols, is unclear. For example, improved resolution or signal to noise ratio can be expected to improve classification accuracy. However, the feature extraction methods described here all yielded strong classifiers, and the feature selection methods applied here are generalizable.
In this study, classifying images using bagging of classification trees  reduced variance and misclassification error due to outliers. Without bagging, misclassification errors were much higher than with bagging; for 50 times bagging of classification trees, sensitivity and specificity significantly improved results without bagging. Additionally, in this study, each feature was equally weighted. In the future, once good training data is built, different weights can be applied to each feature when using tree methods, which may improve diagnostic accuracy.
The method of using multiple images evaluated in this study is the simplest and most conservative possible approach. The results shown in Table 5 and Table 6, show that making use of multiple images from a site has the potential improve the sensitivity and specificity of classification. Compared with the results of using one image per biopsy site with bagging (Table 4), the sensitivity of using multiple frames per biopsy site for binary classification of non-dysplastic sites vs. dysplastic sites is improved significantly (0.78 vs. 0.67), however both approaches achieved similar accuracy (0.84 vs. 0.82). This is because the data are unbalanced, with more negative sites than positive sites so that the improved sensitivity does not strongly affect the overall accuracy. The improved sensitivity is reflected in the PPV, however, which is improved from 0.79 to 0.88 by use of multiple images at a classification site. Making use of multiple frames makes sense intuitively; a human reader much more easily interprets a cine-stream of EOCT images than a single image. Also, use of multiple frames may support investigation of 3D image features when 3D data are available. More sophisticated methods should make better use of the information present in multiple images at a site or of 3D image sets.
Results of CAD classification in this study must be compared cautiously with previously-reported classification by endoscopists  or with our previously reported CAD . That said, in , the EOCT system with human readers resulted in an accuracy of 78% for detection of dysplasia in patients with Barrett's esophagus compared with 82% accuracy presented here. In this study, image selection criteria for CAD were different than those of . Of a total of 405 image-biopsy pairs collected, 96 met inclusion criteria for the present study. But for each image and biopsy pair used in the present study, there was a high degree of certitude that the biopsy forceps entered the EOCT field of view, thereby ensuring nearly perfect image-biopsy correlation. On the other hand, the endoscopists grading the images in the previous study had access to the real-time image stream consisting of many images at each biopsy site, while the CAD algorithm in this study analyzed each image independently before grouping them according to biopsy site. However, the results presented here still indicate that CAD has the potential to be at least as accurate as humans for identifying dysplasia in EOCT images of BE.
The set of data analyzed in this report is similar to the set analyzed in our previous CAD report  because they originate from the same clinical study, but they are not the same data set because the current set was selected blindly using criteria based only on image quality so it would not be biased by previous classification results. For comparison we analyzed the current data set using the method reported in  and found that the classification accuracy is worse for the current data set than for the set analyzed in . In other words, the current data set is noisier than that used in . Therefore, while the methods reported here offer improvement over those reported in , this improvement is not reflected clearly in a direct comparison of the best reported values of sensitivity and specificity. However, the comparative analysis of methods reported here do make clear a number of useful observations such as: Multivariate classification is likely more robust and accurate than single parameter classification. The technique of bagging of training data improves classifier performance. Use of multiple EOCT images at a single examination location improves classification accuracy over use of a single image. Three-category classification was achieved with very low false negative rate using a classification tree and multiple images per site.
In conclusion, we present a generalizable method for developing CAD classifiers for EOCT, and the results presented here demonstrate that CAD has the potential to accurately detect the presence or absence of dysplasia for surveillance of Barrett’s esophagus using EOCT. CAD quantifies classification criteria, eliminating inter-observer variability and potentially allowing further stratification.
The authors acknowledge the contributions of Brian Wolf, and Matthew Ford and the financial support of the National Institutes of Health (CA94304 and CA114276). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
|BE: Barrett’s esophagus|
D: dysplasia or dysplastic
LD: low-grade dysplasia or dysplastic
HD: high-grade dysplasia or dysplastic
NHD: non-high-grade dysplasia or dysplastic
ROI: region of interest
CAD: computer-aided diagnosis
CSAC: center symmetric auto-correlation
TFCM: texture feature coding method
|COOC: co-occurrence matrix|
SCOV: gray scale texture covariance
VAR: local variance
BVAR: between-pair variance
WVAR: within-pair variance
SVR: variance ratio
SAC: normalized gray scale texture covariance
TFN: texture feature number
MC: mean convergence
CE: code entropy
CI: confidence interval
ROC: receiver operating characteristic
PCA: principal component analysis
LDA: linear discriminant analysis
QDA: quadratic discriminant analysis
k-NN: k-nearest neighbor
NN: neural network
SLNN: single-hidden layer NN
LVQ: learning vector quantization
Bagging: bootstrap aggregating
PPV: positive predictive value
NPV: negative predictive value