
Pattern Recognit. Author manuscript; available in PMC 2010 June 1.

Published in final edited form as: Pattern Recognit. 2009 June; 42(6): 1126–1132. doi: 10.1016/j.patcog.2008.08.028

PMCID: PMC2678744. NIHMSID: NIHMS98494


In this paper we propose a microcalcification classification scheme, assisted by content-based mammogram retrieval, for breast cancer diagnosis. We recently developed a machine learning approach for mammogram retrieval where the similarity measure between two lesion mammograms was modeled after expert observers. In this work we investigate how to use retrieved similar cases as references to improve the performance of a numerical classifier. Our rationale is that by adaptively incorporating local proximity information into a classifier, it can help to improve its classification accuracy, thereby leading to an improved “second opinion” to radiologists. Our experimental results on a mammogram database demonstrate that the proposed retrieval-driven approach with an adaptive support vector machine (SVM) could improve the classification performance from 0.78 to 0.82 in terms of the area under the ROC curve.

Breast cancer remains a leading cause of death among women in developed countries. The American Cancer Society [1] estimates that in 2008 approximately 182,460 women in the US will be diagnosed with invasive breast cancer, and about 40,480 women will die from the disease. Currently, mammography is the dominant method for detection of breast cancer, but it is still far from perfect: the high sensitivity of screening mammography is compromised by its low specificity to benign lesions, which often appear mammographically similar to malignant ones. As a result, approximately 70% of biopsies are performed on lesions that prove to be benign [2][3].

Clustered microcalcifications (MCs) can be an important early sign of breast cancer. As an example, Figure 1 shows a mammogram with a cluster of microcalcifications, which appear as bright spots of calcium deposits. Individual MCs are sometimes difficult to detect because of the surrounding breast tissue and their variation in shape (from granular to rod shapes), orientation, brightness, and size. Due to the subtlety in the appearance of individual MCs, there is a significant risk that a radiologist may misclassify some cases in breast cancer diagnosis [4]. It has been reported that 10–30% of lesions are misinterpreted during routine screening of mammograms [5]. In recent years there have been significant efforts in the development of computerized methods for automated classification of MCs. For example, Soltanian-Zadeh *et al.* [6] investigated the performance of four different texture and shape feature extraction methods for classification of benign and malignant MCs. De Santo *et al.* [7] proposed a multiple-expert approach for classifying MCs wherein the final output was obtained from a combination of multiple experts. In [4] we investigated several state-of-the-art machine learning algorithms, and found that a support vector machine (SVM) classifier could achieve the best performance among several well-known methods for MC classification.

Left: A mammogram in craniocaudal view; Right: expanded view showing clustered microcalcifications (MCs).

Recently we developed a content-based mammogram retrieval system as a diagnostic aid to radiologists in their interpretation of mammograms [8]. We conjecture that by presenting perceptually similar mammograms with known pathology to the one being evaluated, the radiologists could reach a better informed decision in their diagnosis. Our proposed mammogram retrieval system involves two major components: 1) retrieving similar mammogram images from a database by using learning based similarity measure, and 2) classifying the query mammogram image based on retrieved results (retrieval-driven classification). This retrieval framework is illustrated with a functional diagram in Figure 2.

In our retrieval system [8], we explored a similarity measure for mammogram retrieval based on supervised learning from expert readers. We evaluated the approach using data collected from an observer study with a set of clinical mammograms. It was demonstrated that the proposed machine learning approach can be used to model the notion of similarity as judged by expert readers in their interpretation of mammogram images and that it can outperform alternative similarity measures derived from unsupervised learning.

In this work, we focus on the second component of our proposed mammogram retrieval and classification system: microcalcification classification assisted by retrieval. The traditional approach in this field is to present the human observer with examples in the database that are similar to the one being examined. In this study we consider how to use the retrieved similar cases as references to improve a numerical classifier’s performance. We conjecture that by adaptively incorporating proximity information to the cost function of a classifier, it can help to improve its classification accuracy, thereby leading to an improved “second opinion” to radiologists. Toward this goal, we propose a retrieval-driven approach with an adaptive SVM for improving the classification performance. We choose SVM since it has been demonstrated to outperform several competing methods in microcalcification classification [4].

SVM is a constructive learning procedure rooted in statistical learning theory [9]. It is based on the principle of structural risk minimization, which aims at minimizing the bound on the generalization error. The SVM decision function is pre-determined through a training process using a set of examples before it can be applied to data outside the training set. As a result, an SVM tends to perform well when applied to data outside the training set. Indeed, in recent years SVM learning has found a wide range of real-world applications, including handwritten digit recognition [10], object recognition [11], speaker identification [12], face detection in images [13], and text categorization [14]. In our recent work [15], we developed an SVM based approach for detection of clustered microcalcifications in mammograms, and demonstrated using clinical mammogram data that such an approach could outperform several well-established methods in the literature. In [4], we applied SVM for classification of benign vs. malignant clustered microcalcifications.

Despite its success, the performance of an SVM classifier can be hampered by several factors in practice. First, the nature of the problem is often complicated and not well understood, as is the case of breast cancer diagnosis [4], and it is not even clear that the classification task could be well described by a single decision function. Secondly, even when such a decision function indeed exists, the “true” decision boundary is rarely obtainable because of the limited number of available training samples. In such a case, a challenging problem is how to strike a balance between over-fitting and under-fitting in the classifier model. This is especially the case when it is too expensive or simply impossible to obtain enough training samples in many practical problems. Consequently, it becomes impossible to determine the “optimal” classifier function. For example, in [4] the best classification performance achieved by the SVM was still far from being perfect, which we believe is largely due to the presence of many difficult-to-classify cases in the database we used [4].

In this work we propose a locally adaptive SVM classification scheme. In the proposed scheme, we attempt to adapt the decision function of the SVM classifier according to how it performs on samples that are close to the one being examined (called query). Specifically, before the SVM decision function is applied to the query, it is first tested and adapted based on the knowledge of the samples that are in its neighborhood. Our motivation is as follows: if the SVM function is found to perform poorly on known samples close to the query, it indicates that the decision function has not been well trained for samples in the neighborhood of the query, and thus, it will also likely not perform well on the query. In such a case, we will adjust the SVM classifier using these similar samples accordingly, which in turn can lead to improved classification accuracy on the query.

In our retrieval-driven classification scheme the SVM classifier is adaptive in that its decision boundary is adjusted according to the “local” information of the case to be classified (i.e., retrieved similar cases). To demonstrate the concept, we show a classification example in Figure 3. In this binary classification problem, there are 300 samples in each class. Figure 3(a) shows the decision boundary between the two classes of an SVM (Gaussian RBF kernel, *σ* = 2.5, *C* = 100) trained from a subset of 100 training samples; Figure 3(b) shows the decision boundary obtained using the proposed adaptive SVM classifier (with the same parametric setting as the SVM in (a)). In this example, the five nearest neighbors according to the Euclidean distance were used as the similar cases to the query sample. As can be seen, the proposed adaptive SVM could achieve improved classification over the SVM in this example.

(a) The classification boundary learned by SVM; (b) the classification boundary learned by adaptive SVM.
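For intuition, the toy setup of Figure 3 can be sketched as follows with scikit-learn, using synthetic stand-in data (the original samples are not available). Note that scikit-learn's RBF kernel is K(**x**, **x**′) = exp(−γ‖**x** − **x**′‖²), so the paper's σ = 2.5 corresponds to γ = 1/(2σ²).

```python
# Sketch of the Figure 3 setup (synthetic stand-in data, not the original).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Two overlapping Gaussian classes, 300 samples each.
X = np.vstack([rng.normal(-1.0, 1.5, (300, 2)), rng.normal(1.0, 1.5, (300, 2))])
y = np.array([-1] * 300 + [+1] * 300)

# Train on a random subset of 100 samples, as in Figure 3(a).
idx = rng.choice(len(X), size=100, replace=False)
clf = SVC(kernel="rbf", gamma=1 / (2 * 2.5**2), C=100.0).fit(X[idx], y[idx])
acc = clf.score(X, y)  # accuracy over all 600 samples
```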

We note that there exist several algorithms related to adaptive SVM classification in the literature which aim to improve a classifier’s performance. For example, in [16] an adaptive-margin SVM was proposed in which the classifier margin was adjusted for each training sample. In our own previous work [17], we used the concept of adaptive SVM in a content-based image retrieval system, where the SVM regression function was adjusted according to the relevance feedback samples provided by the user. To the best of our knowledge, these methods are quite different from the approach proposed here.

Consider a general two-class classification problem of assigning a class label *y* ∈ {−1, +1} to an input feature vector **x** ∈ *R*^{N}. We are given input–output training data pairs {(**x**_{1}, *y*_{1}), …, (**x**_{N}, *y*_{N})}.

In an SVM classifier, the classification function can be written in the following form [9]:

$${f}_{\mathit{SVM}}(\mathbf{x})={\mathbf{w}}^{T}\mathrm{\Phi}(\mathbf{x})+b$$

(1)

where the parameters **w**, *b* are determined from the training data samples. This is accomplished through minimization of the following so-called *structural risk* function:

$$\begin{array}{l}J(\mathbf{w},\xi )=\frac{1}{2}{\mathbf{w}}^{T}\mathbf{w}+C\sum _{i=1}^{N}{\xi}_{i}\\ s.t.\phantom{\rule{0.38889em}{0ex}}{y}_{i}{f}_{\mathit{SVM}}({\mathbf{x}}_{i})\ge 1-{\xi}_{i}\\ {\xi}_{i}\ge 0;\phantom{\rule{0.38889em}{0ex}}i=1,2,\dots ,N\end{array}$$

(2)

where *C* is a user-specified, positive parameter and *ξ*_{i} are slack variables. The cost function in Eq. (2) constitutes a balance between the empirical risk (i.e., the training errors reflected by the second term) and model complexity (the first term). A larger *C* places greater emphasis on reducing the training errors, whereas a smaller *C* favors a simpler (smoother) decision function.
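As a concrete illustration of this trade-off (ours, not the authors' code), a standard soft-margin SVM can be trained with a loose and a tight penalty:

```python
# A larger C penalizes slack more heavily and fits the training data more
# tightly, at the risk of over-fitting.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.array([-1] * 100 + [+1] * 100)

# Training accuracy for a loose (C = 0.01) and a tight (C = 100) penalty.
acc = {C: SVC(kernel="rbf", C=C).fit(X, y).score(X, y) for C in (0.01, 100.0)}
```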

In the proposed adaptive SVM, we modify the SVM cost function as follows:

$$\begin{array}{l}\stackrel{\sim}{J}(\mathbf{w},\xi )=\frac{1}{2}{\mathbf{w}}^{T}\mathbf{w}+{C}^{(s)}\sum _{{\mathbf{x}}_{i}\in N(\mathbf{x})}{\xi}_{i}+C\sum _{{\mathbf{x}}_{i}\notin N(\mathbf{x})}{\xi}_{i}\\ s.t.\phantom{\rule{0.38889em}{0ex}}{y}_{i}{f}_{\mathit{SVM}}({\mathbf{x}}_{i})\ge 1-{\xi}_{i}\\ {\xi}_{i}\ge 0;\phantom{\rule{0.16667em}{0ex}}i=1,2,\dots ,N\end{array}$$

(3)

where *N*(**x**) denotes the set of training samples that are in a defined neighborhood of a query sample **x**, and *C*^{(}^{s}^{)} is a penalty parameter introduced for the training samples in *N*(**x**).

In the modified cost function *J̃*(**w**, *ξ*) above, the training samples in *N*(**x**) are closer (hence more similar) to the query **x** than the others. We write *C*^{(s)} = *tC*, where 1 < *t* < ∞ is a *penalty factor*. This has the effect of imposing a greater emphasis (*C*^{(s)}) on the samples similar to the query **x** than on the other samples. The rationale is that those similar samples should have a greater impact on the classification of the query; thus, a larger penalty is assessed in the cost function *J̃*(**w**, *ξ*) for any errors made on them.
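A minimal sketch of this per-sample penalty (ours, not the authors' implementation): scikit-learn's `SVC` scales the penalty *C* per sample through the `sample_weight` argument of `fit`, so a weight of *t* on the retrieved neighbors realizes *C*^{(s)} = *tC* there and *C* elsewhere.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.2, 1.0, (100, 2))])
y = np.array([-1] * 100 + [+1] * 100)
query = np.array([0.6, 0.6])

# The 5 training samples nearest to the query (Euclidean distance) play
# the role of the retrieved similar cases N(x).
near = np.argsort(np.linalg.norm(X - query, axis=1))[:5]

C, t = 100.0, 5.0
weights = np.ones(len(X))
weights[near] = t  # penalty factor t on the neighborhood: C^(s) = t*C
ada = SVC(kernel="rbf", C=C).fit(X, y, sample_weight=weights)
label = int(ada.predict(query.reshape(1, -1))[0])
```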

In the SVM cost function, the purpose of using model complexity to regularize the optimization of empirical risk is to avoid over-fitting, a situation in which the decision boundary too precisely corresponds to the training data, and thereby may fail to perform well on data outside the training set. As in the choice of the parameter *C* in regular SVM, the newly introduced penalty factor *t* in the adaptive SVM will have to be determined during the training phase. This can be determined by using a cross-validation procedure on the set of samples similar to the query, as we discuss later in Sect. 2.4.

Below we examine how the modified SVM cost function impacts the SVM decision function in Eq. (4). Recall the SVM formulation in Eq. (2): a training sample (**x**_{i}, *y*_{i}) is called an *error support vector* when its Lagrange multiplier is at the upper bound, *α*_{i} = *C*, corresponding to a sample with a nonzero slack variable (*ξ*_{i} > 0).

Introducing a so-called kernel function *K*(·, ·), we can rewrite the SVM function *f*_{SVM}(**x**) in Eq. (1) as follows:

$${f}_{\mathit{SVM}}(\mathbf{x})=\sum _{i=1}^{{N}_{s}}{\alpha}_{i}K(\mathbf{x},{\mathbf{s}}_{i})+b$$

(4)

where **s**_{i}, *i* = 1, 2, …, *N*_{s}, denote the *support vectors* (the training samples with nonzero Lagrange multipliers), *α*_{i} are the corresponding coefficients determined from training, *b* is the bias term, and *N*_{s} is the number of support vectors.
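The expansion in Eq. (4) can be verified numerically against a trained classifier (a sketch, not the authors' code): scikit-learn exposes the coefficients *α*_{i} (with the class sign absorbed) in `dual_coef_`, the support vectors **s**_{i} in `support_vectors_`, and the bias *b* in `intercept_`.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, (80, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# Evaluate f(x) = sum_i alpha_i K(x, s_i) + b by hand and compare with
# the library's decision_function.
x = np.array([0.3, -0.2])
K = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
f_manual = float(clf.dual_coef_[0] @ K + clf.intercept_[0])    # Eq. (4)
f_sklearn = float(clf.decision_function(x.reshape(1, -1))[0])  # library value
```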

**Proposition I.** Let **x** denote a query sample to be classified, and *N*(**x**) denote the set of training samples in its neighborhood. If none of the samples in *N*(**x**) are error support vectors under the regular SVM cost function *J*(**w**, *ξ*), then the adaptive SVM decision function resulting from the modified cost function *J̃*(**w**, *ξ*) coincides with that of the regular SVM.

To demonstrate the above proposition, consider a training sample **x**_{i} in *N*(**x**) that is not an error support vector: its multiplier satisfies *α*_{i} < *C* and its slack variable *ξ*_{i} = 0. Raising the upper bound on *α*_{i} from *C* to *C*^{(s)} in this case leaves the optimality conditions unaffected, so the solution of Eq. (3) remains identical to that of Eq. (2).

The above proposition offers a rather intuitive insight into the nature of the adaptive SVM. Recall that for a sample **x**_{i} that is not an error support vector, we have *ξ*_{i} = 0; that is, the sample is already correctly classified with a functional margin of at least one, so no adaptation is warranted on its account.

Furthermore, recall that the support vectors are associated with the decision boundary of the decision function *f*_{SVM}(**x**). When the training samples in the neighborhood of the query **x** are away from the decision boundary, so is **x**. Thus, the result in Proposition I implies the following: the adaptive SVM makes the same decision as the regular SVM classifier *f*_{SVM}(**x**) for a query sample **x** that is located far away from its decision boundary; on the other hand, when **x** is near the decision boundary of *f*_{SVM}(**x**) and the samples in its neighborhood are not well classified (signaled by the presence of error support vectors), the SVM will be retrained with an emphasis on the samples in the local neighborhood of **x**. As we explain next, this observation can lead to a computationally efficient implementation of the adaptive SVM classifier.

Compared to the regular SVM, it may seem that the adaptive SVM will be much more demanding computationally, because the modified cost function *J̃*(**w**, *ξ*) in Eq. (3) varies with the query sample **x** and would need to be re-optimized for every **x**. However, based on the result in Proposition I, this is not the case; we can greatly reduce the extra computational burden by employing a regular SVM. Specifically, we adopt the following procedure for training the adaptive SVM: for each query, we first apply a regular SVM classifier to its similar cases. If it can correctly classify all of them, then we apply this SVM classifier to the query as well; otherwise, we invoke the adaptive SVM procedure. The rationale is that if the SVM classifier performs well on the similar cases, it will likely perform well on the query as well. Our experiments show that this can result in marked savings in computation time.

For retraining the SVM, the optimization problem for the modified cost function in Eq. (3) can be solved in a similar fashion as in the case of regular SVM in Eq. (2). Indeed, by using the method of Lagrange multipliers and applying the Kuhn-Tucker conditions, the dual problem of Eq. (3) is obtained as maximization of:

$$\begin{array}{l}J(\overrightarrow{\alpha})=\sum _{i=1}^{N}{\alpha}_{i}-\frac{1}{2}\sum _{i=1}^{N}\sum _{j=1}^{N}{\alpha}_{i}{\alpha}_{j}{y}_{i}{y}_{j}K({\mathbf{x}}_{i},{\mathbf{x}}_{j})\\ s.t.\phantom{\rule{0.38889em}{0ex}}\sum _{i=1}^{N}{\alpha}_{i}{y}_{i}=0\\ 0\le {\alpha}_{i}\le {C}^{(s)},\phantom{\rule{0.38889em}{0ex}}{\mathbf{x}}_{i}\in N(\mathbf{x})\\ 0\le {\alpha}_{i}\le C,\phantom{\rule{0.38889em}{0ex}}{\mathbf{x}}_{i}\notin N(\mathbf{x})\end{array}$$

(5)

The maximization is accomplished by quadratic programming. To speed up the numerical algorithm, the so-called incremental learning technique can be applied [17], in which the cost function in Eq. (3) is treated as a perturbation from the regular SVM in Eq. (2), and thus, the regular SVM solution can be used as a starting point for the solution of Eq. (5).

As in the regular SVM, the parameters of the adaptive SVM have to be determined during the training phase. In particular, the newly introduced penalty factor *t* can be determined using a leave-one-out cross-validation procedure on the set of samples similar to the query. Too large a value for *t* can lead to over-emphasis on the training samples near the query, which may cause over-fitting; too small a value for *t* may not have enough impact on the cost function. For each query, we pick the *t* value that corresponds to the lowest error rate resulting from this procedure, i.e.,

$$t=\underset{t}{\mathit{argmin}}\sum _{{\mathbf{x}}_{i}\in N(\mathbf{x})}\mid {y}_{i}-\mathit{sign}({\stackrel{\sim}{f}}_{\mathit{SVM}}({\mathbf{x}}_{i}))\mid $$

(6)

where *f̃*_{SVM}(**x**) is the adaptive SVM classifier. This yields a customized penalty factor for each test sample. In our experiments the penalty factor was found to be typically in the range of 2 ≤ *t* ≤ 10.
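The selection of *t* by leave-one-out cross-validation over the neighborhood, as in Eq. (6), can be sketched as follows (ours, under the same per-sample-weight assumption as before):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(1.5, 1.0, (60, 2))])
y = np.array([-1] * 60 + [+1] * 60)
query = np.array([0.7, 0.7])
near = np.argsort(np.linalg.norm(X - query, axis=1))[:7]

C = 100.0

def neighborhood_errors(t):
    # Hold out one neighbor at a time, retrain with penalty t*C on the
    # remaining samples of N(x), and count misclassified held-out neighbors.
    errors = 0
    for i in near:
        weights = np.ones(len(X))
        weights[near] = t
        train = np.delete(np.arange(len(X)), i)
        clf = SVC(kernel="rbf", C=C).fit(X[train], y[train],
                                         sample_weight=weights[train])
        errors += int(clf.predict(X[i].reshape(1, -1))[0] != y[i])
    return errors

# Pick t over a grid within the empirically observed range 2 <= t <= 10.
best_t = min((2, 4, 6, 8, 10), key=neighborhood_errors)
```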

Another issue for the adaptive SVM is how to determine the similar samples to use for the query. In this work we use the mammogram retrieval framework reported previously in [8]. For each query mammogram image, we invoke the retrieval system to obtain a set of similar mammograms from the database, which is then used for the adaptive SVM. For comparison purposes, we also experimented with other distance-based similarity measures, including *k*-nearest neighbors (KNN) based on the Euclidean distance [18] and discriminant adaptive nearest neighbors (DANN) [19]. DANN is an improved version of the KNN measure. KNN rests on the assumption that the class posterior probabilities are locally approximately constant, which is often not true, especially in a high-dimensional space. DANN modifies the KNN metric so that the resulting local neighborhood stretches out in the directions along which the class probability does not vary much. In our experiments, a neighborhood was formed at each query point, and the class distribution among the neighborhood points was used to decide how to adapt the metric; the adapted metric was then used in a nearest-neighbor rule at the query point. In essence, the modified neighborhood extends parallel to the local decision boundary and shrinks in directions orthogonal to it.
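As a baseline for the retrieval step, plain Euclidean *k*-nearest-neighbor lookup can be sketched as follows (illustrative feature vectors; the learned similarity measure of [8] or DANN would replace the metric, not this interface):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(7)
features = rng.random((200, 12))  # one 12-D feature vector per image
query = rng.random((1, 12))

nn = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(features)
dist, idx = nn.kneighbors(query)  # the 5 most similar cases, nearest first
```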

To further reduce the computational complexity of the proposed adaptive SVM classifier (Ada-SVM), we also introduced a pre-classifier stage in our experiments. This pre-classifier stage functions in the following fashion: For each query, we first examine its retrieved similar cases. If all these retrieved cases have the same class label, then we simply assign the same label to the query; otherwise, we will invoke the Ada-SVM classifier. The motivation for this pre-classifier stage is that if all the similar cases are found to be from the same class, it is a good indication that the neighborhood around the query is away from the decision boundary between the two classes. Because the decision function of SVM depends on only the samples located in the proximity of the boundary (i.e., support vectors), we simply invoke the KNN rule to avoid the more expensive Ada-SVM classifier. The decision steps of the resulting cascade SVM (Cas-SVM) classification system are described as follows: First, similar cases to a query sample are retrieved according to a similarity measure. Next, if all these similar cases have the same class label, simply assign the same label to the query; otherwise, we invoke the adaptive SVM classifier (Ada-SVM) as follows: if the SVM can correctly classify all the retrieved similar cases, we apply it to the query; if not, we apply the Ada-SVM to adapt the SVM classifier.
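The cascade decision steps above can be sketched with a hypothetical helper (ours, not the authors' code; the retrieval step is stood in for by Euclidean nearest neighbors):

```python
import numpy as np
from sklearn.svm import SVC

def cas_svm_predict(query, X, y, k=5, C=100.0, t=5.0):
    near = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    # Stage 1: pre-classifier -- unanimous retrieved cases decide directly.
    if len(set(y[near])) == 1:
        return int(y[near][0])
    # Stage 2: if the regular SVM classifies all neighbors correctly, use it.
    svm = SVC(kernel="rbf", C=C).fit(X, y)
    if (svm.predict(X[near]) == y[near]).all():
        return int(svm.predict(query.reshape(1, -1))[0])
    # Stage 3: Ada-SVM -- retrain with emphasis t*C on the neighborhood.
    weights = np.ones(len(X))
    weights[near] = t
    ada = SVC(kernel="rbf", C=C).fit(X, y, sample_weight=weights)
    return int(ada.predict(query.reshape(1, -1))[0])

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0.0, 1.0, (80, 2)), rng.normal(1.5, 1.0, (80, 2))])
y = np.array([-1] * 80 + [+1] * 80)
label = cas_svm_predict(np.array([0.7, 0.7]), X, y)
```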

In our study we used a database of mammogram images collected by the Department of Radiology at the University of Chicago. The database consists of a total of 200 different mammogram images from 104 cases (46 malignant, 58 benign), all of which had histologically proven lesions containing clustered microcalcifications. These images were digitized with a spatial resolution of 0.1 mm/pixel and 10-bit grayscale. All the images contain clustered MCs, as in the example shown earlier in Figure 1, and the MCs in each image were identified by a group of expert radiologists. Many of these cases are extremely difficult to classify [20]; in an observer study, the average classification performance by a group of five attending radiologists on these cases yielded a value of only 0.62 in the area under the receiver operating characteristic (ROC) curve [20].

For the retrieval system, we used the learning based similarity measure reported in [8], where 600 image pairs had been scored in a human observer study for training the similarity function. To test the proposed retrieval-driven adaptive SVM classifier, the 200 mammogram images were used in a leave-one-out procedure. During each round, one mammogram image was used as the test sample (i.e., query); similar mammogram images were then retrieved for this query from the database based on the learned similarity function; subsequently, the test mammogram was classified by the adaptive SVM.

It is important to note that, to avoid any potential bias, during the leave-one-out procedure the held-out image for testing was also removed from training the retrieval stage. This achieved complete isolation of the test sample from any of the training sets.
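The leave-one-out protocol above can be sketched as follows (synthetic stand-in data; in the paper's protocol the held-out image is also excluded from training the retrieval stage, not only from classifier training):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(11)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(1.5, 1.0, (40, 2))])
y = np.array([-1] * 40 + [+1] * 40)

scores = []
for train, test in LeaveOneOut().split(X):
    # Each round: train on all samples but one, score the held-out sample.
    clf = SVC(kernel="rbf", C=100.0).fit(X[train], y[train])
    scores.append(float(clf.decision_function(X[test])[0]))
scores = np.array(scores)  # one decision value per held-out sample
```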

In our previous work [17] a set of 10 features was used based on the geometric distribution of the MCs in a cluster. However, image features of individual MCs are also very important in the diagnosis of clustered MCs. To better characterize the similarity data provided by the experts, in this work we introduced eight additional features which have been demonstrated to have high discriminating power for cancer diagnosis [4]. These eight features were selected to have intuitive meanings that correlate qualitatively with features used by radiologists [21]. Consequently, a total of 18 features were used for describing the MCs. For the purpose of selecting the most relevant features for similarity learning and classification, we applied a feature selection procedure called sequential backward selection [22]; it is a suboptimal search procedure, but is simple and easy to implement. The following set of 12 features was finally selected for characterizing an MC cluster:

- compactness of the cluster: a measure of roundness of the region occupied by the cluster.
- eccentricity of the cluster: the eccentricity of the smallest ellipse of the region (ratio of the distance between the foci and the major axis).
- the number of MCs per unit area.
- the average of the inter-distance between neighboring MCs.
- the standard deviation of the inter-distance between neighboring MCs.
- solidity of the cluster region: the ratio between cross-sectional area and the area of the convex hull formed by the MCs.
- the moment signature of the cluster region: computed based on the distance deviation of the boundary point from the center of the region.
- the number of MCs in the cluster.
- the mean effective volume (area times effective thickness) of individual MCs.
- the relative standard deviation of the effective thickness.
- the relative standard deviation of the effective volume.
- the second highest MC-shape-irregularity measure.

In our experiment, all the feature components were normalized to have the same dynamic range (0,1).
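Such a normalization can be done with min-max scaling (a sketch; the exact normalization used in the paper is not spelled out beyond the common (0, 1) range):

```python
import numpy as np

rng = np.random.default_rng(13)
F = rng.normal(5.0, 3.0, (200, 12))  # raw 12-D feature vectors

# Map each feature component to a common (0, 1) range so that no single
# feature dominates the distance computations.
F_norm = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0))
```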

To evaluate the performance of a classifier, we used ROC analysis [23], which is now used routinely for many classification tasks. An ROC curve is a plot of the classification sensitivity (i.e., true positive fraction (TPF)) as the ordinate versus the false positive fraction (FPF, i.e., 1 − specificity) as the abscissa; for a given classifier, it is obtained by continuously varying the threshold associated with the decision function. Thus, at any given FPF, an ROC curve with a higher TPF corresponds to better classification performance. As a summary measure of overall diagnostic performance, the area under the ROC curve (denoted by *A*_{z}) is used; a larger *A*_{z} value indicates better overall classification performance.
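Computing *A*_{z} from a classifier's decision values amounts to sweeping the decision threshold and integrating the resulting ROC curve; for example (hypothetical numbers for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical decision values and true labels (+1 malignant, -1 benign).
y_true = np.array([-1, -1, -1, 1, 1, 1, -1, 1])
scores = np.array([-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, -0.9, -0.2])
A_z = roc_auc_score(y_true, scores)  # area under the ROC curve: 0.9375 here
```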

Figure 4 summarizes the classification results achieved by the proposed retrieval-driven approach (Ada-SVM and Cas-SVM), where the obtained *A*_{z} value is plotted against the number *N* of retrieved similar images used.

As can be seen from Figure 4, the classification result (*A*_{z}) could be improved from 0.7752 (SVM) to 0.8139 by the proposed Ada-SVM.

We also note from Figure 4 that as the size *N* of retrieved images is further increased the classification performance *A _{z}* value starts to decrease. We believe that this is due to the fact that the database is limited in size, which in turn limits the number of truly “similar” cases to the query. Thus, further increase of the number of retrieved images will no longer be beneficial.

For comparison, in Figure 5 we also show the classification results obtained by the adaptive SVM using a different similarity measure, the discriminant adaptive nearest neighbors (DANN) [19], for retrieving similar images. Note that the classification result could still be improved from 0.7752 (SVM) to 0.7925 (Ada-SVM with *N* = 9). For clarity, the best classification results with the different similarity measures for retrieval are shown in Table 1.

In our experiments, we also tested the proposed approach by grouping multiple views from the same cases. Specifically, the multiple views were treated as separate samples, but grouped together either for training or for testing in classification. In testing a classifier, a case is classified as malignant when any of its multiple views are classified as malignant. Under this setup, the performance was improved from 0.7691 (SVM) to 0.8050 (Ada-SVM with *N* = 7) and 0.8134 (Cas-SVM with *N* = 7). The error rates were 0.3173 (SVM), 0.2692 (Ada-SVM with *N* = 7) and 0.25 (Cas-SVM with *N* = 7), respectively. As above, SVM, Ada-SVM and Cas-SVM were using the same parameter setting (Gaussian RBF kernel, *σ* = 2.5, *C* = 100). Interestingly, these results are similar to those obtained above. In addition, when the distance based measure (DANN) was used in the retrieval stage, the classification result could be improved from 0.7691 (SVM) to 0.7856 (Ada-SVM with *N*=9).

In this paper we proposed a classification approach assisted by content-based image retrieval to improve the classification accuracy of computer-aided diagnosis for breast cancer. We presented the proposed adaptive classification scheme in the context of SVM learning, which has been demonstrated to outperform several competing methods in breast cancer classification. The proposed retrieval and classification framework was developed and tested using a database of 200 mammogram images collected by the Department of Radiology at the University of Chicago. While the dataset is somewhat limited in size, our results demonstrated that the proposed adaptive SVM classifier could lead to reduced generalization error. As part of the proposed CBIR system, we demonstrated that a numerical observer’s (classifier’s) performance could be improved by incorporating retrieved similar cases. Encouraged by this initial success, we plan to further develop and validate the proposed approach in more extensive clinical evaluations in future studies. It is reasonable to expect that a significantly enlarged database will increase the number of truly “similar” cases available for retrieval, which would lead to an even greater improvement in classification performance by the adaptive SVM classifier.

RM Nishikawa is a shareholder in Hologic, Inc (Bedford, MA). He and the University of Chicago receive research funds and royalties from Hologic.


**Yongyi Yang** received the B.S.E.E. and M.S.E.E. degrees from Northern Jiaotong University, Beijing, China, in 1985 and 1988, respectively. He received the M.S. degree in applied mathematics and the Ph.D. degree in electrical engineering from Illinois Institute of Technology (IIT), Chicago, in 1992 and 1994, respectively.

Dr. Yang is currently on the faculty of the Department of Electrical and Computer Engineering at IIT, where he is an Associate Professor. Prior to this position, he was a faculty member with the Institute of Information Science, Northern Jiaotong University. His research interests are in signal and image processing, medical imaging, machine learning, pattern recognition, and biomedical applications. He is a co-author of *Vector Space Projections: A Numerical Approach to Signal and Image Processing, Neural Nets, and Optics* (John Wiley & Sons, 1998), and an Associate Editor for the IEEE Transactions on Image Processing.


This work was supported in part by NIH/NCI grant CA89668.

1. American Cancer Society. Cancer Facts and Figures. 2008. www.cancer.org.

2. Knutzen AM, Gisvold JJ. Likelihood of malignant disease for various categories of mammographically detected, nonpalpable breast lesions. Mayo Clin Proc. 1993;68:454– 460. [PubMed]

3. Kopans DB. The positive predictive value of mammography. AJR. 1992;158:521–526. [PubMed]

4. Wei L, Yang Y, Nishikawa RM. A study on several machine-learning methods for classification of malignant and benign clustered microcalcifications. IEEE Trans on Medical Imaging. 2005;24(3):371–380. [PubMed]

5. Strickland RN, Hahn HL. Wavelet transforms for detecting microcalcifications in mammograms. IEEE Trans on Medical Imaging. 1996;15:218–229. [PubMed]

6. Soltanian-Zadeh H, Rafiee-Rad F, Pourabdollah-Nejad S. Comparison of multiwavelet, wavelet, Haralick, and shape features for microcalcification classification in mammograms. Pattern Recognition. 2004;37:1973–1986.

7. Santo MD, Molinara M, Tortorella F, Vento M. Automatic classification of clustered microcalcifications by a multiple expert system. Pattern Recognition. 2003;36:1467–1477.

8. Wei L, Yang Y, Nishikawa RM, Wernick MN. Learning of perceptual similarity from expert readers for mammogram retrieval. IEEE Int’l Symposium on Biomedical Imaging. 2006:1356–1359.

9. Vapnik V. Statistical Learning Theory. John Wiley; 1998.

10. Pontil M, Verri A. Support vector machines for 3-d object recognition. IEEE Trans on Pattern Anal Mach Intell. 1998;20:637–646.

11. Wan V, Campbell WM. Support vector machines for speaker verification and identification. Proc IEEE Workshop on Neural Networks for Signal Processing. 2000:775–784.

12. Osuna E, Freund R, Girosi F. Training support vector machines: Application to face detection. Proc Computer Vision and Pattern Recognition. 1997:130–136.

13. Joachims T. Transductive inference for text classification using support vector machines. Proc Int Conf on Machine Learning. 1999.

14. Burges CJ. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery. 1998;2:121–167.

15. El-Naqa I, Yang Y, Wernick MN, Galatsanos NP, Nishikawa RM. A support vector machine approach for detection of microcalcifications. IEEE Trans on Medical Imaging. 2002;21(12):1552–1563. [PubMed]

16. Herbrich R, Weston J. Adaptive margin support vector machines for classification. Ninth International Conference on Artificial Neural Networks. 1999;2:880–885.

17. El-Naqa I, Yang Y, Galatsanos NP, Wernick MN. Relevance feedback based on incremental learning for mammogram retrieval. Int’l Conf on Image Processing. 2003;1:729–732.

18. Duda RO, Hart PE. Pattern Classification and Scene Analysis. New York: Wiley; 1973.

19. Hastie T, Tibshirani R. Discriminant adaptive nearest neighbor classification. IEEE Trans on Pattern Anal Mach Intell. 1996;18(6):607–616.

20. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Academic Radiology. 1999;6:22–33. [PubMed]

21. Jiang Y, Nishikawa RM, Wolverton EE, Metz CE, Giger ML, Schmidt RA, Vyborny CJ. Malignant and benign clustered microcalcifications: Automated feature analysis and classification. Radiology. 1996;198:671–678. [PubMed]

22. Theodoridis S, Koutroumbas K. Pattern Recognition. Academic press; 2003.

23. Metz CE, Herman BA, Shen J. Maximum-likelihood estimation of receiver operating characteristic (ROC) curves from continuously distributed data. Stat Med. 1998;17:1033–1053. [PubMed]
