HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
J Biomed Opt. Author manuscript; available in PMC 2010 November 24.
Published in final edited form as:
PMCID: PMC2990885

Automated algorithm for breast tissue differentiation in optical coherence tomography


An automated algorithm for differentiating breast tissue types based on optical coherence tomography (OCT) data is presented. Eight parameters are derived from the OCT reflectivity profiles and their means and covariance matrices are calculated for each tissue type from a training set (48 samples) selected based on histological examination. A quadratic discrimination score is then used to assess the samples from a validation set. The algorithm results for a set of 89 breast tissue samples were correlated with the histological findings, yielding specificity and sensitivity of 0.88. If further perfected to work in real time and yield even higher sensitivity and specificity, this algorithm would be a valuable tool for biopsy guidance and could significantly increase procedure reliability by reducing both the number of nondiagnostic aspirates and the number of false negatives.

Keywords: optical coherence tomography (OCT), automated tissue differentiation, breast biopsy, fine needle aspiration

1 Introduction

Breast cancer is the most common cancer found in women in the United States, with an estimate of over 180,000 new cases every year. It is also the second leading cause of cancer deaths in women, after lung cancer, and is responsible for the deaths of over 40,000 women every year in the U.S.1 The most common breast malignancy is invasive ductal carcinoma (IDC), which accounts for around 75 to 80% of invasive breast cancers, while invasive lobular carcinoma (ILC) accounts for most other invasive cases. As with all malignancies, early detection of breast cancer is the only way to effectively manage patients who suffer from this disease. According to the American Cancer Society (ACS),2 the 5-year survival rates for persons with breast cancer that are appropriately treated are as follows: 100% for stage 0, 100% for stage I, 92% for stage IIA, 81% for stage IIB, 67% for stage IIIA, 54% for stage IIIB, and 20% for stage IV. This indicates that improved diagnosis methods need to be developed to detect this cancer in its early stage when it is treatable and its survival rates are high.

Mammography is currently the standard screening tool for breast cancer. When suspicious masses are found during the mammographic screening, various other tests are performed to confirm and stage the disease. Among these tests, biopsy has proven to be the best way to determine whether a suspicious area seen in an image is in fact cancer. Breast biopsies provide very useful diagnostic information and can be comfortably performed with intravenous sedation and local anesthesia. However, while the biopsy of relatively large palpable masses usually has a high diagnostic yield, up to 99% (Ref. 3), the biopsy of smaller infiltrating masses has a much lower diagnostic yield (below 60% in some studies).4–10 Without a priori knowledge of the 3-D location of small cancer masses, it is unlikely that a biopsy protocol will yield consistently high cancer detection rates. This is the primary reason why biopsy has relatively high rates of false-negative diagnoses when no guidance modalities are used. Since the majority of lesions for which women undergo biopsy prove to be benign and many women have multiple biopsies during their lifetimes, very reliable and less invasive techniques are required. Currently, three types of biopsies are used on a large scale: fine needle aspiration biopsy (FNAB), core needle biopsy (CNB), and vacuum biopsy (VB).

FNAB is the least invasive and best tolerated procedure, typically using a 23-gauge needle or smaller,11 and it is preferred by many patients because it does not produce discomfort or bleeding. However, its diagnostic yield largely depends on the clinician's skill.10,12–14 FNAB is a cost-effective and rapid procedure, and the results are rapidly available for cytopathologic analysis. Therefore, the patient usually has a diagnosis by the end of the visit. The number of annual FNAB procedures varies from one clinic to another within a range of a few hundred to thousands,15 but even for small clinics, it is usually over one hundred per year.

CNB is the most widely used technique and has high reliability because it allows for collection of larger tissue samples (up to 1.5 mm in diameter and 1 to 2 cm in length) for histological analysis. However, CNB is often associated with deleterious side effects, including serious discomfort and bleeding, and potential tissue morbidity. Also, CNB requires ultrasound or computed tomography (CT) needle guidance to sample nonpalpable masses and to avoid perforating major vessels.

VB has recently gained acceptance by clinicians because it provides more reliable results than FNAB or CNB.16,17 However, it is relatively expensive and produces even greater discomfort than CNB.

Currently, most of the FNAB interventions are done without any guidance modality. However, due to the inability to identify tissue type by manual palpation and the challenges of positioning the needle tip within a viable tumor, which may be admixed with normal, reactive, and necrotic tissue, nondiagnostic aspirates occur in about 20% of aspirates and in 5 to 15% of patients.3 Therefore, a relatively simple but efficient method for FNAB guidance would substantially increase the diagnostic yield of this simple, minimally invasive, and affordable procedure. Proper needle placement could substantially reduce the number of nondiagnostic aspirates and improve the sensitivity and specificity of the procedure. Ultrasound and stereotactic CT guidance of all three types of biopsy procedures have produced enhanced outcomes.18–20 However, ultrasound and stereotactic CT guidance is not always available, and when it is available, it substantially increases the overall cost of the biopsy. Therefore, simpler and less expensive guidance methods are desirable.

Optical methods have been developed for many years to improve biopsy outcome. They can detect tissue abnormalities with relatively good accuracy, and therefore they offer a viable alternative for biopsy guidance. Among various techniques developed to date, spectroscopic-based methods have shown real promise for tissue-type discrimination. For example, several diagnostic studies have found significant differences in both the emission and excitation spectra from normal, benign, and/or malignant breast tissues. Alfano et al.21,22 first showed differences between the fluorescence emission spectra of normal and malignant breast tissues. Yang et al.23,24 have used the emission spectra within the 300 to 400-nm spectral region to discriminate between malignant and fibrous samples. They found 93% sensitivity and 95% specificity, but results were worse for discriminating normal fatty and malignant tissues. Gupta et al.25 have measured emission spectra when normal, benign (fibroadenoma), and malignant (IDC) breast tissue samples were excited at 337 nm. Using the integrated emission intensity from the 337-nm excitation, malignant tissues were separated in a binary fashion from both benign and normal with a specificity of 98%. Diffuse reflectance methods have also shown promise for use in tissue discrimination. Several studies have determined that diffuse reflectance spectroscopy can detect changes in scattering and absorption due to malignancy-associated alterations in levels and organization of hemoglobin, β-carotene, DNA, and other proteins.21–23,25–29 Bigio et al.26 have used in vivo measurements to distinguish between malignant and normal tissue with sensitivities up to 69% and specificities up to 93%. Ramanujam30 has combined reflectance and fluorescence measurements but has found no significant improvement in diagnostic performance using this multimodal approach.
However, none of the aforementioned reflectance technologies can probe the tissue in depth over a distance of at least 1 to 2 mm, and therefore their role is mostly limited to epithelial malignancy biopsy guidance (for example, for colon polyps, esophageal, oral, or cervical cancer). Another limitation of most of these techniques is that they cannot be performed through the lumen of a fine gauge needle. Only optical reflectance and the reduced scattering coefficient have been investigated using a needle-like probe for tissue characterization.31

More recently, optical coherence tomography (OCT) and low-coherence interferometry (LCI), the nonscanning, nonimaging version of OCT, have been applied to tissue discrimination toward optical biopsy32–36 and image-guided surgery of breast cancer.37 OCT is an optical ranging technique that is capable of imaging depth-resolved (axial, z) tissue structure, birefringence, flow (Doppler shift), and spectra at a resolution of several microns. The tissue probing depth with this technology is on the order of 2 to 3 mm, which is almost one order of magnitude higher than the probing depth of the spectroscopic approaches. Besides this advantage, OCT systems can be constructed using fiber-optic components, and therefore OCT probes can easily fit within the bore of a fine gauge needle (for both scanning and nonscanning modes), allowing diagnostic information to be obtained directly from the FNAB site. Such very small fiber-optic-based probes have numerous clinical diagnostic and therapeutic applications.

Automated interpretation of OCT findings is, nonetheless, a very challenging issue. Previous studies32–36 suggested that this technology has the potential to substantially increase the diagnostic yield of the FNAB procedures. However, until now, only the differentiation between adipose and tumor has been demonstrated on both OCT and LCI scans.33–36 A previous study32 demonstrated the possibility of differentiating between tumor, adipose, and stroma (connective tissue) using elaborate algorithms illustrated on a very limited number of samples and without axial discrimination. The capability of differentiating between the multiple tissue types (adipose, fibrotic, tumor, necrotic, etc.) that could be present within the same OCT or LCI scan will add more value to an optical guidance tool.

In this paper, we demonstrate an advanced algorithm for automated differentiation of the three major tissue constituents usually found admixed in suspicious breast masses: adipose, fibrous, and tumor. The algorithm was tested ex vivo on 137 samples of human breast tissue and provides spatial discrimination of the tissue both laterally and axially. With this algorithm, the pathologist/clinician performing the biopsy will be able to more precisely determine what tissue type is present at the tip of the needle before performing the biopsy. Therefore, this algorithm could help substantially decrease the number of nondiagnostic aspirates and increase the overall biopsy yield.

2 Methods

2.1 Instrumentation and Measurement Protocol

An ex vivo study on excised tissue specimens was conducted to test the capability of the OCT technology for tissue differentiation. The main objectives of this study were to identify characteristic features of each of the tissue types (fibrous, adipose, tumor), develop quantitative metrics for tissue differentiation using a training set of tissue specimens, and test these metrics on a validation set of tissue specimens.

OCT measurements were performed on over 150 fresh tissue samples from patients with breast cancer surgery (lumpectomy and mastectomy). A 1310-nm spectral-domain OCT (SD-OCT) system, presented elsewhere,38 was used for this study. This system provided an axial resolution of 10 µm, a lateral resolution of 25 µm, and an imaging range of about 2.2 mm at a line rate of 5.12 kHz. The SD-LCI/OCT system can accommodate both scanning and nonscanning probes, and therefore it can work in both LCI and OCT modes. A bench-top OCT probe was used in this study employing a free-space scanning mechanism in the sample arm. An imaging rate of 5 frames/s of 512 × 1024 pixels/frame was achieved with this system and was limited by the relatively low reading rate of the InGaAs line detector (SU512LX, Sensors Unlimited, Inc., Princeton, New Jersey). However, we are currently upgrading this system, and the new SU1024-LDH camera will allow for a frame rate of 45 frames/s.

The breast tissue samples were obtained from the Pathology Department, Massachusetts General Hospital (MGH), and National Disease Research Interchange (NDRI). No information about tissue donors was provided. Tissue procurement, handling, and data collection were performed according to an MGH-approved Institutional Review Board protocol (2002P000487 from 09/14/2007) and NDRI protocol (DIFN1-001-005 from 03/05/2007). Tissue samples from NDRI were shipped overnight within a few hours from excision. The tissue was kept in saline and shipped in a box with ice bags and arrived in very good pathological condition.

Our tissue measurement protocol consisted of OCT B-scans over small areas of each sample (3-mm lateral scans). Each tissue sample was kept hydrated in saline solution at 37 °C during the measurements. After completion of OCT measurements, each tissue sample was marked with india ink on the OCT imaging locations (usually 3 to 5 locations on each sample) and fixed with formalin (10% formalin in PBS). Histologic preparation of each tissue specimen was then performed at the MGH histology department.

Typical OCT images for each tissue type are shown in Fig. 3. The OCT image was cropped to 1.5 mm depth to keep only the tissue part of the image. Representative depth reflectivity profiles (A-scans) are shown in the first row of Fig. 1. A clear difference can be observed between the adipose tissue and the other two tissue types (fibrous and tumor). However, this difference is less significant between fibrous and tumor tissue. In many cases, tissue differentiation is difficult, especially within breast masses that consist of admixed tissue types, when more than one tissue type is present within the same reflectivity profile. Therefore, a set of key metrics (signal slope and variance, mean spatial frequency of the intensity peaks, mean area of the power spectrum peaks, etc.) was developed to find specific characteristics for each tissue type.

Fig. 1
Graphical illustration of the main steps in the signal processing sequence. First column (position 1 in Fig. 3, shown later): adipose tissue; second column (position 2 in Fig. 3): fibrous and adipose tissue; third column (position 3 in ...
Fig. 3
Representative examples of OCT diagnosis on breast tissue specimens. (a) adipose; (b) admixed fibroadipose; (c) fibroglandular; (d) tumor.

2.2 Data Processing

An elaborate signal processing scheme was designed to determine a set of key metrics for tissue differentiation, and a data analysis algorithm was developed to analyze the key metrics and assign tissue type. This data processing scheme is summarized in the following. The OCT spectra are first processed following the standard SD-OCT procedure to produce the logarithmic depth reflectivity profiles (shown in Fig. 1 in linear scale using arbitrary units). The next step in data analysis is to remove the background, which bears no relevant information: a constant background is subtracted from all A-lines. After background subtraction, a low-pass filter is applied to each depth profile to generate a smoothed depth reflectivity profile (Fig. 1, first row). The smoothed profile is used only to determine the slope of the signal decay (Fig. 1, second row).
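The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function names, the moving-average filter (standing in for the unspecified low-pass filter), the clipping of negative values to zero, and the sample values are all assumptions.

```python
def subtract_background(profile, background):
    """Subtract a constant background; clipping at zero is an assumption."""
    return [max(v - background, 0.0) for v in profile]


def moving_average(profile, half_width):
    """Simple low-pass filter: centered moving average, shortened at the edges."""
    n = len(profile)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical A-line in arbitrary linear units, with a background level of 5.
a_line = [5.0, 5.0, 30.0, 22.0, 26.0, 15.0, 18.0, 9.0, 6.0, 5.0]
clean = subtract_background(a_line, background=5.0)
smooth = moving_average(clean, half_width=1)
```

Here `clean` would feed the peak and spectrum analysis, while `smooth` would be used only for the slope fit, mirroring the split described in the text.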

2.2.1 Depth reflectivity profile parameters

The slope of the reflectivity profile is the first parameter used in our algorithm. It provides information related to the depth attenuation of the signal, which is a function of tissue type. If different slopes are found at different depths, it might indicate the presence of two or more tissue types within the same depth reflectivity profile. Therefore, linear fitting is performed on several windows, each window corresponding to a portion of the depth reflectivity profile that has the same slope (Fig. 1, second row). The first depth where the signal reaches 10% of the maximum intensity of the smoothed profile is used as the starting point for the linear fit. Alternatively, the starting point can be selected at the maximum of the signal; however, the maximum of the signal might not always reflect the real tissue surface but could be deeper into the tissue, in which case the first portion of the tissue would be missed [Figs. 1(A) and 1(B)]. The end point of the linear fit is initially selected a predetermined distance (a quarter of the total depth) away from the starting point. Then the goodness of fit R² is calculated, and the end point of the linear fit is varied to maximize R² (that is, to minimize 1/R²) using a standard optimization procedure (the fminbnd function in MATLAB, for example) (Fig. 1, second row). The end point of the optimized linear fit in the first window is then used as the starting point of the second window. The end point in the second window is first automatically selected at a predetermined distance away from the starting point as described earlier, and the R² optimization procedure is performed in the second window. The process continues until the end of the profile is reached, the intensity of the linear fit becomes negative, or the signal is identically zero over the entire window. To avoid fitting very short segments of the smoothed profile, the minimum size of a window is selected as 20% of the whole depth range.
If the first positive slope is shorter than this minimum length, it can be safely neglected, since it denotes the tissue surface. All the other parameters used in our tissue differentiation algorithm are calculated in each window that was found in this initial step of the signal processing algorithm.
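The windowing procedure can be illustrated with a small sketch. It swaps in an exhaustive search over end points for the bounded optimizer (fminbnd) mentioned in the text, and it handles only the "end of profile" stopping rule. The function names, the tie-breaking rule (prefer the longer window when fits are equally good), and the synthetic two-slope profile are assumptions made for illustration.

```python
def linear_fit(y, start, end):
    """Least-squares line through y[start:end]; returns (slope, intercept, r2)."""
    n = end - start
    xs = range(start, end)
    mx = sum(xs) / n
    my = sum(y[start:end]) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y[x] - my) for x in xs)
    syy = sum((y[x] - my) ** 2 for x in xs)
    slope = sxy / sxx
    # A perfectly flat segment is treated as a perfect fit (assumption).
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, my - slope * mx, r2


def first_index_above(y, fraction=0.10):
    """First sample that reaches `fraction` of the profile maximum."""
    level = fraction * max(y)
    return next(i for i, v in enumerate(y) if v >= level)


def fit_windows(y, min_frac=0.20, tol=1e-9):
    """Consecutive windows (start, end, slope), each end chosen to maximize R^2."""
    n = len(y)
    min_len = max(3, round(min_frac * n))
    start = first_index_above(y)
    windows = []
    while n - start >= min_len:
        best_end, best_r2 = None, -1.0
        for end in range(start + min_len, n + 1):
            r2 = linear_fit(y, start, end)[2]
            if r2 > best_r2 + tol:        # clearly better fit
                best_end, best_r2 = end, r2
            elif r2 >= best_r2 - tol:     # tie: prefer the longer window
                best_end = end
        windows.append((start, best_end, linear_fit(y, start, best_end)[0]))
        start = best_end                  # next window starts where this one ends
    return windows


# Synthetic smoothed profile: shallow decay (-2 per sample), then steep decay (-8).
profile = [100 - 2 * x for x in range(11)] + [80 - 8 * (x - 10) for x in range(11, 20)]
windows = fit_windows(profile)
```

On this noise-free profile the sketch recovers two windows whose slopes match the two decay rates. Real profiles are noisy, so the choice of optimizer and tie-breaking would matter more there.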

In general, the adipose tissue is characterized by smaller slopes [1 and 2 in Fig. 1(D) and 3 in Fig. 1(E)], while fibrous and tumor tissue exhibit steeper slopes [1 and 2 in Figs. 1(E) and 1(F)].

The second parameter used in the tissue differentiation algorithm is the standard deviation (Std) of the depth profile variations around the linear fit (Fig. 1, third row). These variations, obtained by subtracting the linear fit from the depth profile, may also provide information about the nature of the tissue being investigated. Adipose tissue produces strong reflection peaks with low reflectivity zones between them because of the relatively high differences between the refractive indices of the fat cell cytoplasm and membrane, while fibrous and tumor tissues produce lower peaks. The spread of the depth profile variations around the linear fit is the largest for adipose tissue (large Std), is significantly smaller for fibrous tissue and is the smallest for tumor tissue. Notice here the change of vertical scale in Fig. 1, third row.

The mean distance between peaks is expected to be a characteristic size of the fat cells. Therefore, this mean distance is determined between consecutive peaks of the depth profile variations around the linear fit. This third parameter is called MeanPeakDistance. This parameter is expected to be relatively large for adipose tissue, medium for fibrous tissue, and small for tumor. (Tumor tissue is optically denser than fibrous or fibroglandular tissue.) Clearly, the mean distance between peaks in Fig. 1, third row, is the largest for adipose tissue (G) and is the smallest for tumor tissue (I).

The fourth parameter is the standard deviation of the peak spreading over depth (StdPeakDistance). A more homogeneous tumor tissue is expected to have a reduced spread of the peaks. Therefore, a peak finder was developed as a signal processing routine to identify the position of the peaks while neglecting the small local variations that would otherwise be interpreted as false peaks. The profile to be analyzed is first zero-padded with a factor of 5 to increase the number of points within the profile. The first derivative of the profile is then computed. Since the profile is a discrete array of points and not a continuous function, it is unlikely to find the exact zeros of the first derivative. However, the zero-padding allows us to get close enough. The algorithm searches for pairs of neighboring points for which the first derivative is positive for the left point and negative for the right point. The two points necessarily contain the zero-crossing between them, and the requirements on the first derivative ensure a negative second derivative, which identifies a peak and not a valley. The valleys can be identified in the same way by reversing the signs required of the first derivative. If the height difference between the peak and the neighboring valleys is smaller than a predetermined threshold, the peak is treated as a minor local maximum and is disregarded.
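A compact sketch of the peak finder and the two peak-distance parameters follows. It omits the factor-of-5 interpolation step, works on the first difference of the samples, and compares each peak with the higher of its two neighboring valleys; that comparison rule, the use of the population standard deviation, the function names, and the sample data are assumptions rather than details given in the text.

```python
from statistics import mean, pstdev


def find_extrema(y):
    """Peaks and valleys located by sign changes of the first difference."""
    d = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    peaks, valleys = [], []
    for i in range(len(d) - 1):
        if d[i] > 0 and d[i + 1] < 0:      # rising then falling: peak
            peaks.append(i + 1)
        elif d[i] < 0 and d[i + 1] > 0:    # falling then rising: valley
            valleys.append(i + 1)
    return peaks, valleys


def significant_peaks(y, threshold):
    """Keep peaks that rise at least `threshold` above their neighboring valleys."""
    peaks, valleys = find_extrema(y)
    kept = []
    for p in peaks:
        left = [v for v in valleys if v < p]
        right = [v for v in valleys if v > p]
        # Fall back to the profile ends when one side has no valley.
        left_level = y[left[-1]] if left else y[0]
        right_level = y[right[0]] if right else y[-1]
        if y[p] - max(left_level, right_level) >= threshold:
            kept.append(p)
    return kept


def peak_distance_stats(peaks):
    """(MeanPeakDistance, StdPeakDistance) in samples; zeros if under two peaks."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if not gaps:
        return 0.0, 0.0
    return mean(gaps), pstdev(gaps)


# Hypothetical residual around the linear fit; the bump at index 3 is too small.
residual = [0, 5, 1, 1.4, 1, 6, 0, 4, 0]
kept = significant_peaks(residual, threshold=1.0)
mean_gap, std_gap = peak_distance_stats(kept)
```

Distances here are in samples; multiplying by the axial pixel size would convert them to physical units. Flat-topped peaks (zero first difference) are not detected by this simplified version.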

2.2.2 Power spectrum parameters

The power spectrum calculation is the next step in the signal processing algorithm. The power spectrum is normalized to its maximum (Fig. 1, last row), and the peak detector is used to identify the frequency peaks. These calculations are performed for each window where different slopes were found. The weighted mean frequency (MeanFrequency) and the standard deviation around this mean (StdFrequency) are two further parameters that are evaluated. The power spectrum is used as the probability function for calculating the mean frequency. The power spectrum is expected to have a dominant low frequency for adipose tissue, corresponding to large spatial distances between the fat cell walls, while for tumors it is expected to exhibit multiple high frequencies (a broad spectrum with relatively high mean and standard deviation), as seen in the last row of Fig. 1.
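The weighted mean and standard deviation of the frequency can be sketched as below, with the normalized power spectrum acting as the weighting (probability) function. The direct DFT, the exclusion of the DC term, the frequency unit (cycles per window), and the synthetic two-tone signal are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum (DC term skipped), normalized to its maximum."""
    n = len(signal)
    spec = []
    for k in range(1, n // 2 + 1):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2)
    top = max(spec)
    return [p / top for p in spec]


def weighted_frequency_stats(spec):
    """(MeanFrequency, StdFrequency) with the spectrum used as weights."""
    freqs = range(1, len(spec) + 1)          # spec[0] corresponds to k = 1
    total = sum(spec)
    mean_f = sum(f * p for f, p in zip(freqs, spec)) / total
    var_f = sum(p * (f - mean_f) ** 2 for f, p in zip(freqs, spec)) / total
    return mean_f, math.sqrt(var_f)


# Synthetic window with two equally strong spatial frequencies (2 and 6 cycles).
n = 32
signal = [math.cos(2 * math.pi * 2 * t / n) + math.cos(2 * math.pi * 6 * t / n)
          for t in range(n)]
mean_f, std_f = weighted_frequency_stats(power_spectrum(signal))
```

For this signal the weighted mean falls midway between the two tones and the standard deviation reflects their separation. A single dominant low frequency, as expected for adipose tissue, would give a low mean and a small spread.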

Only the frequency peaks above a certain threshold (0.3 as shown by the horizontal line in the last row of Fig. 1) are counted (PeakNr), indicating the number of dominant strong frequencies. Their total area above the threshold is calculated (PeakArea) with the purpose of identifying the spread of the dominant frequencies. Sharp peaks (smaller area) or broad peaks (larger area) for the same number of dominant frequencies may indicate the presence of different tissue types within the reflectivity profile. For example, breast cancerous tissue is generally denser and stiffer than the surrounding tissue, and therefore the OCT signal exhibits an increased number of dominant frequencies resulting in a broad normalized spectrum with large PeakArea [Fig. 1(M)]. Fibroglandular tissue [slopes 1 and 2 in Figs. 1(E) and 1(L)] is more heterogeneous than cancerous tissue but more homogeneous than adipose tissue. As a result, sharper frequency peaks with smaller PeakArea than for tumor tissue are observed, but with broader frequency peaks with larger PeakArea than for adipose tissue [slopes 1, 2, and 3 in Fig. 1(K) and slope 3 in Fig. 1(L)].
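As an illustration of PeakNr and PeakArea, the sketch below counts local maxima of a normalized spectrum that exceed the 0.3 threshold and sums the area lying above the threshold with a simple rectangle rule. The exact area definition is not given in the text, so the rectangle rule, the unit bin width, the function name, and both example spectra are assumptions.

```python
def peak_number_and_area(spec, threshold=0.3, bin_width=1.0):
    """Return (PeakNr, PeakArea) for a spectrum normalized to a maximum of 1."""
    n = len(spec)
    peak_nr = 0
    for i in range(n):
        left = spec[i - 1] if i > 0 else float("-inf")
        right = spec[i + 1] if i < n - 1 else float("-inf")
        # A dominant frequency is a strict local maximum above the threshold.
        if spec[i] > threshold and spec[i] > left and spec[i] > right:
            peak_nr += 1
    # Area between the spectrum and the threshold line, where the spectrum is higher.
    area = sum((p - threshold) * bin_width for p in spec if p > threshold)
    return peak_nr, area


# Two hypothetical spectra with the same number of dominant frequencies.
sharp = [0.05, 1.0, 0.05, 0.1, 0.4, 0.1]
broad = [0.5, 1.0, 0.6, 0.35, 0.4, 0.2]
sharp_nr, sharp_area = peak_number_and_area(sharp)
broad_nr, broad_area = peak_number_and_area(broad)
```

Both spectra yield the same PeakNr, but the broad one has the larger PeakArea, which is the distinction the text draws between sharp and broad peaks.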

2.3 Decision Algorithm

As a result of the analysis presented earlier, eight parameters (Slope, Std, MeanPeakDistance, StdPeakDistance, MeanFrequency, StdFrequency, PeakNr, PeakArea) were calculated and assigned to each pixel in the OCT images for each tissue specimen used in the training set. The training set is selected based on the histological diagnosis provided by an experienced pathologist.

Mean values $\bar{x}_i$ of each parameter are calculated for the three tissue types (adipose, fibrous, and tumor; $i = 1, 2, 3$), where $\bar{x}_i$ is a column vector made of the eight means. Covariance matrices are also calculated for each tissue type, accounting for all eight parameters39:

$$S_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)\left(x_{ij} - \bar{x}_i\right)^T,$$

where $n_i$ is the number of elements in each tissue class within the training set, $x_{ij}$ is the column vector of the eight parameters for the $j$th element of class $i$, and the superscript $T$ indicates matrix transpose. The mean values for each parameter used in our algorithm, corresponding to the three tissue types in the training set, are listed in Table 1.
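The per-class statistics can be computed as in the following sketch. It assumes the usual unbiased sample covariance (division by the class size minus one), which is a standard choice but not one the surviving text confirms, and it uses two made-up parameters per sample instead of eight to keep the numbers easy to check by hand.

```python
def class_mean(samples):
    """Mean parameter vector of one tissue class."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[k] for s in samples) / n for k in range(dim)]


def class_covariance(samples):
    """Sample covariance matrix of one tissue class (divides by n - 1)."""
    n = len(samples)
    m = class_mean(samples)
    dim = len(m)
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        d = [s[k] - m[k] for k in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += d[a] * d[b]   # outer product of the deviation vector
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical training samples for one class: (Slope, Std) pairs only.
training = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]
mean_vec = class_mean(training)
cov_mat = class_covariance(training)
```

In the full algorithm the same computation would run once per tissue class on eight-element parameter vectors, giving three mean vectors and three 8 × 8 covariance matrices.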

Table 1
Mean values for the eight parameters.

For each sample to be diagnosed, the mean values and the covariance matrices are used to calculate a quadratic discrimination score39:

$$d_i(x) = -\frac{1}{2}\ln\left|S_i\right| - \frac{1}{2}\left(x - \bar{x}_i\right)^T S_i^{-1}\left(x - \bar{x}_i\right),$$

where $|\cdot|$ indicates the matrix determinant, $S_i^{-1}$ is the inverse matrix of $S_i$, and $x$ is the column vector made of the eight calculated parameters for that sample. Three quadratic discrimination scores are obtained for each pixel, one for each of the three tissue classes, and the class with the maximum score is assigned to that pixel. The quadratic discrimination score corresponds to the logarithm of the probability that the tissue at that pixel belongs to a tissue class, so the maximum score identifies the most probable class.
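A minimal sketch of the scoring and assignment step is given below. It assumes the standard form of the quadratic discrimination score without a prior-probability term (minus half the log-determinant of the class covariance, minus half the squared Mahalanobis distance to the class mean). The linear solver, the function names, and the two-parameter class statistics are illustrative assumptions, not values from Table 1.

```python
import math


def solve_and_logdet(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination; return (y, ln|det|)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        logdet += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):      # back substitution
        y[r] = (a[r][n] - sum(a[r][c] * y[c] for c in range(r + 1, n))) / a[r][r]
    return y, logdet


def quadratic_score(x, mean, cov):
    """Assumed form: -0.5 * ln|S| - 0.5 * (x - m)^T S^-1 (x - m)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y, logdet = solve_and_logdet(cov, d)
    mahalanobis = sum(di * yi for di, yi in zip(d, y))
    return -0.5 * logdet - 0.5 * mahalanobis


def classify(x, classes):
    """Return the label whose (mean, covariance) gives the highest score."""
    return max(classes, key=lambda label: quadratic_score(x, *classes[label]))


# Hypothetical two-parameter class statistics (not the paper's values).
classes = {
    "adipose": ([1.0, 8.0], [[1.0, 0.0], [0.0, 4.0]]),
    "fibrous": ([4.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]),
    "tumor": ([6.0, 1.0], [[1.0, 0.0], [0.0, 0.25]]),
}
label_a = classify([1.2, 7.0], classes)
label_t = classify([5.9, 1.1], classes)
```

Solving the linear system instead of forming the inverse explicitly is a design choice for numerical convenience; the score is the same either way.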

Figure 2 shows a scatter plot illustrating the clustering of the three main tissue types (adipose—green, fibrous—blue, tumor—red) and their projections on the x, y, and z planes for only three parameters: Slope, Std, and PeakArea. The points represented here are for each pixel of the OCT images corresponding to the training set of tissue samples. With only three parameters, there is significant overlap for the three tissue types. One can notice however, that there is a decent degree of separation among the three tissue types in the Slope–Std projection plane. The adipose tissue is also well separated from fibrous and tumor in the Slope–PeakArea projection plane, while the tumor tissue is better separated from adipose and fibrous tissue in the Std–PeakArea projection plane. Using more parameters in a multidimensional space is therefore expected to produce better clustering of the three tissue types with much less overlap.

Fig. 2
Scatter plot illustrating the clustering of the three main tissue types (adipose—green, fibrous—blue, tumor—red) and their projections on the x, y, and z planes for three parameters: Slope, Std, and PeakArea.

Multiple depth reflectivity profiles are acquired and processed in each OCT frame in either scanning or nonscanning mode. The result of the algorithm calculation is a numerical value (1 to 3) representing a tissue type. A specific color corresponding to each numerical value is attributed to every pixel in the frame: light blue (1) for adipose tissue, yellow (2) for fibrous and fibroglandular tissue, and red (3) for tumor tissue. Dark blue (0) corresponds to pixels that were masked due to low signal value. Averaging schemes, with a user-selectable window of 20 × 20 pixels to 50 × 50 pixels, are applied before displaying the results. Each pixel is finally assigned the dominant tissue type within the averaging window. The algorithm also calculates the percentage of each tissue type present in each frame.
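The display step, in which each pixel takes the dominant tissue type of its neighborhood, can be sketched as a majority vote. The handling of masked pixels (excluded from the vote and left masked), the computation of percentages over unmasked pixels only, the tiny 3 × 3 window, and the label map are assumptions for illustration; the text specifies windows of 20 × 20 to 50 × 50 pixels.

```python
from collections import Counter


def dominant_type_map(labels, window):
    """Replace each unmasked label by the most frequent label in its window."""
    rows, cols = len(labels), len(labels[0])
    half = window // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 0:          # masked pixel stays masked
                continue
            counts = Counter(
                labels[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
                if labels[i][j] != 0
            )
            out[r][c] = counts.most_common(1)[0][0]
    return out


def tissue_percentages(labels):
    """Percentage of each tissue type among the unmasked pixels of a frame."""
    flat = [v for row in labels for v in row if v != 0]
    return {t: 100.0 * flat.count(t) / len(flat) for t in (1, 2, 3)}


# Hypothetical label map: 0 = masked, 1 = adipose, 2 = fibrous, 3 = tumor.
label_map = [
    [1, 1, 1, 0],
    [1, 3, 1, 2],
    [1, 1, 1, 2],
]
display_map = dominant_type_map(label_map, window=3)
percent = tissue_percentages(label_map)
```

The isolated tumor pixel in the middle of the adipose region is voted out in `display_map`, which is the smoothing effect the averaging window is meant to provide.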

We note that no image processing schemes were used here. The described algorithm was designed and implemented for A-line processing and is capable of identifying different tissue types within each A-line whether the A-lines were acquired with or without scanning (OCT or LCI mode). Averaging was done only at the end before displaying the results. It was performed to remove variability between neighboring pixels and to ensure a locally smooth display map of the tissue assignment. Alternatively, multiple reflectivity profiles can be averaged first and then only the average profile is processed. This modality is applicable to the LCI mode, where multiple A-lines are collected from the same tissue location. This can speed up processing, since 50 to 100 A-lines are sufficient to get a correct estimation, and the acquisition and processing of a small number of A-lines is very fast.

3 Results and Discussions

Selected OCT frames for several cases of single tissue type [(A), (C), (D)], as well as of admixed tissues (B) are presented in Fig. 3. The diagnostic maps reflect the spatial distribution of each tissue type. It can be observed that each tissue type is well recovered. In Fig. 3, the adipose (A) and fibroglandular (C) tissue types can be reasonably well differentiated by a trained OCT reader by examining the OCT image only. However, it is very difficult to diagnose the relatively small differences between the fibrous (C) and tumor tissue (D) based on the OCT appearance only. For these cases, the consensus among OCT readers is relatively low. Our algorithm, however, correlates well with the histology findings.

OCT measurements were performed on 152 tissue samples to test the capability of our algorithm for tissue differentiation. Each measurement site was marked with ink, and histology was performed to correlate OCT measurements with histology findings. However, for 15 samples the technician could not find the ink marking when slicing the tissue, and therefore these samples were removed from the study. Of the remaining 137 samples, 48 were assigned to a training set and 89 to a validation set. The training set allocation was based on pathologist recommendation. These samples showed the best representation of the three tissue types: adipose, fibrous, and tumor. A correlation of over 95% was found when the algorithm was retrospectively applied to the training set. The pathologist, blind to the algorithm findings, also performed histology readings on the validation set. The trained algorithm was then applied to the validation set, and algorithm findings were correlated with histology readings. 93% of the adipose samples were correctly diagnosed, while fibrous and tumor tissues were correctly identified in 75.5% and 88% of the samples, respectively. The same set of samples was measured in an LCI (nonscanning) configuration following the same protocol.38 However, the OCT mode seems to provide better results. We attribute this to the larger tissue volume sampled in the OCT mode.

Our primary interest in this study was to train the algorithm to distinguish between normal (adipose, fibroadipose, or fibroglandular) and abnormal tissue (tumor or tumor admixed with normal tissue), and also to preferentially recognize adipose tissue, which usually creates nondiagnostic aspirates (fatty fluid or fatty cells). The sensitivity and the specificity of the algorithm findings were calculated as:

Sensitivity=TP/(TP+FN); Specificity=TN/(TN+FP),

where TP is the true positive value that was correctly attributed as positive to cancer findings, TN is the true negative value that was properly attributed to normal tissue, FN is the false negative value that was falsely ascribed as negative to cancer sites, and FP is the false positive value that was falsely assigned as positive to normal tissue samples.
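The definitions above translate directly into code. In this sketch, tumor is treated as the positive class and every other label as negative, following the normal-versus-abnormal grouping described in the text. The label lists are invented for illustration and are unrelated to the study's data.

```python
def confusion_counts(truth, predicted, positive="tumor"):
    """Return (TP, FN, TN, FP) for a binary positive-versus-rest comparison."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Made-up histology labels and algorithm labels for eight samples.
histology = ["tumor", "tumor", "tumor", "adipose", "fibrous", "fibrous", "adipose", "tumor"]
algorithm = ["tumor", "tumor", "fibrous", "adipose", "tumor", "fibrous", "adipose", "tumor"]
tp, fn, tn, fp = confusion_counts(histology, algorithm)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Note that a fibrous sample labeled adipose would still count as a true negative under this grouping, since both are normal tissue.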

The results of our study are summarized in Table 2. We note here that the results were obtained by processing multiple A-lines in each specimen generated by scanning a relatively large tissue section. Sensitivity and specificity of 0.88 were found. These are very good values considering the fact that the tissue differentiation parameters were based on a relatively small number of samples in the training set. The algorithm can be further improved by using a larger training set and by applying a weighting function to each key parameter used in the algorithm.39 Some of the parameters may provide redundant information, and their uniqueness is still under investigation. However, given the large variability in biological tissue, they seem to behave differently across a large number of tissue samples. The algorithm can also be trained to minimize the FN results (increase the sensitivity of the findings) by using log-linear modeling to determine weighting factors for each tissue feature.39 This will indicate how strongly each feature correlates with the histopathologic diagnosis. The classification approach used here follows from the assumption that each class is described by a multivariate normal distribution.37 Determining more accurate probability distribution functions for each class based on a larger training set might improve the algorithm performance.

Table 2
The results of the automated algorithm.

The current version of the algorithm, implemented in MATLAB (The MathWorks, Inc., Natick, Massachusetts), is not yet fully optimized. Post-processing on a laptop with a 2.0-GHz dual-core processor currently takes 56 s for an OCT image of 1000 A-lines and 256 depth points. The processing time could be significantly reduced by processing A-scans in parallel, using a faster processor, reducing the number of A-scans, and optimizing the algorithm. The algorithm can also be implemented in hardware for real-time processing suitable for clinical applications.40 The study presented here was applied to OCT images. However, since the processing algorithm was intentionally designed for line processing and does not use any region-based analysis techniques (e.g., texture or kernel processing schemes), it is equally well suited for nonscanning protocols in FNAB applications, where a single fiber is inserted through a biopsy needle and data are taken from one fixed location at a time.34,38 In the nonscanning case (LCI mode), several A-lines are acquired, individually processed, and averaged in the end, so processing is faster than for the 1000-A-line OCT images described here. Alternatively, multiple depth-reflectivity profiles could be averaged in the nonscanning case to reduce noise, and only the average profile could be processed with our algorithm. Therefore, even the current version of the algorithm might become suitable for real-time guidance of needle biopsy.
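
The averaging option for the nonscanning case can be sketched as follows. This is a minimal Python illustration, assuming each A-line is a sequence of reflectivity values sampled at the same depth points; the function name is hypothetical, and whether averaging is best done on linear or logarithmic reflectivity is left open here.

```python
def average_profile(a_lines):
    """Average several depth-reflectivity profiles point by point."""
    if not a_lines:
        raise ValueError("need at least one A-line")
    depth = len(a_lines[0])
    if any(len(line) != depth for line in a_lines):
        raise ValueError("all A-lines must have the same number of depth points")
    # zip(*a_lines) groups the values recorded at the same depth index.
    return [sum(column) / len(a_lines) for column in zip(*a_lines)]


# Two toy A-lines with three depth points each (illustrative values only).
profile = average_profile([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
print(profile)  # [2.0, 2.0, 2.0]
```

With this approach only one profile per needle position is passed to the classifier, so the per-position cost is one averaging step plus a single A-line classification.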

4 Conclusions

A novel algorithm for automated classification of breast tissue types based on OCT data was demonstrated. The algorithm successfully differentiated three breast tissue types (adipose, fibrous, and tumor), providing both lateral and depth discrimination. Normal tissue was distinguished from cancerous tissue with a sensitivity and specificity of 0.88. Both values might be increased by further refining the algorithm (as described earlier).

The algorithm was preliminarily tested on OCT images. However, since it is based on the processing of individual A-lines and does not use image features for tissue classification, it is also well suited for automatic interpretation of LCI data. This enables the use of simpler probes for biopsy guidance, consisting of a bare fiber inserted through a biopsy needle.34,36,38 Besides the use of a simpler probe, the LCI mode allows fewer A-lines to be processed than in the OCT mode, and it allows multiple A-lines to be averaged first so that only the average profile is processed. This mode makes the current algorithm faster and suitable for real-time tissue classification, and therefore for guidance of the biopsy needle, by providing the physician with relevant information about the type of tissue present at the tip of the needle. This could positively impact the diagnostic yield of FNAB procedures. Since FNAB is much less expensive and faster than CNB and VB, a comparable yield on palpable masses could make LCI-guided FNAB the preferred diagnostic modality.

The application of the algorithm to automated interpretation of OCT data could have a significant impact on clinical translation of OCT. Even the reported level of accuracy in differentiating tissue types could increase the yield of the biopsy procedure if OCT is used as a guidance tool. With this simple technology, the pathologist or clinician performing the biopsy will be able to guide the needle or the biopsy forceps to the most representative diagnostic area of the mass based on the instrument’s ability to determine the tissue type in real time. This will avoid unnecessary biopsies and increase the effectiveness of the procedure.


This research was supported in part by a research grant from the National Institutes of Health (1R41CA114896-01A1).


3. Farshid G, Downey P, Gill PG, Pieterse S. Assessment of 1183 screen-detected, category 3B, circumscribed masses by cytology and core biopsy with long-term follow up data. Br. J. Cancer. 2008;98(7):1182–1190. [PMC free article] [PubMed]
4. Boerner S, Sneige N. Specimen adequacy and false-negative diagnosis rate in fine-needle aspirates of palpable breast masses. Cancer. 1998;84(6):344–348. [PubMed]
5. Castella E, Gomez-Plaza MC, Urban A, Llatjos M. Fine-needle aspiration biopsy of metaplastic carcinoma of the breast: report of a case with abundant myxoid ground substance. Diagn. Cytopathol. 1996;14(4):325–327. [PubMed]
6. Gupta RK. Fine needle aspiration cytodiagnosis of primary and metastatic squamous cell carcinoma of the breast. Acta Cytol. 1997;41(3):692–696. [PubMed]
7. Hindle WH, Chen EC. Accuracy of mammographic appearances after breast fine-needle aspiration. Am. J. Obstet. Gynecol. 1997;176(6):1286–1290. discussion 1290-1282. [PubMed]
8. Pisano ED, Fajardo LL, Caudry DJ, Sneige N, Frable WJ, Berg WA, Tocino I, Schnitt SJ, Connolly JL, Gatsonis CA, McNeil BJ. Fine-needle aspiration biopsy of nonpalpable breast lesions in a multicenter clinical trial: results from the radiologic diagnostic oncology group V. Radiology. 2001;219(3):785–792. [PubMed]
9. Yen H, Florentine B, Kelly LK, Bu X, Crawford J, Martin SE. Fine-needle aspiration of a metaplastic breast carcinoma with extensive melanocytic differentiation: a case report. Diagn. Cytopathol. 2000;23(1):46–50. [PubMed]
10. Ljung BM, Drejet A, Chiampi N, Jeffrey J, Goodson WH, Chew K, Moore DH, Miller TR. Diagnostic accuracy of fine-needle aspiration biopsy is determined by physician training in sampling technique. Cancer Cytopathol. 2001;93(4):263–268. [PubMed]
11. Abati A, Simsir A. Breast fine needle aspiration biopsy: prevailing recommendations and contemporary practices. Clin. Lab Med. 2005;25(4):631–654. [PubMed]
12. Vlastos G, Verkooijen HM. Minimally invasive approaches for diagnosis and treatment of early-stage breast cancer. Oncologist. 2007;12(1):1–10. [PubMed]
13. Hatmaker AR, Donahue RMJ, Tarpley JL, Pearson AS. Cost-effective use of breast biopsy techniques in a veterans’ health care system. Am. J. Surg. 2006;192(5):e37–e41. [PubMed]
14. Klimberg VS. Advances in the diagnosis and excision of breast cancer. Am. Surg. 2003;69(1):11–14. [PubMed]
16. Jackman RJ, Rodriguez-Soto J. Breast microcalcifications: retrieval failure at prone stereotactic core and vacuum breast biopsy--frequency, causes, and outcome. Radiology. 2006;239(1):61–70. [PubMed]
17. Lomoschitz FM, Helbich TH, Rudas M, Pfarl G, Linnau KF, Stadler A, Jackman RJ. Stereotactic 11-gauge vacuum-assisted breast biopsy: influence of number of specimens on diagnostic accuracy. Radiology. 2004;232(3):897–903. [PubMed]
18. Grady I, Gorsuch H, Wilburn-Bailey S. Ultrasound-guided, vacuum-assisted, percutaneous excision of breast lesions: an accurate technique in the diagnosis of atypical ductal hyperplasia. J. Am. Coll. Surg. 2005;201(1):14–17. [PubMed]
19. Roe SM, Mathews JA, Burns P, Sumida MP, Craft P, Greer MS. Stereotactic and ultrasound core needle breast biopsy performed by surgeons. Am. J. Surg. 1997;174(6):699–704. [PubMed]
20. Azavedo E, Svane G, Auer G. Stereotactic fine-needle biopsy in 2594 mammographically detected non-palpable lesions. Lancet. 1989;1(8646):1033–1036. [PubMed]
21. Alfano RR, Pradhan A, Tang GC, Wahl SJ. Optical spectroscopic diagnosis of cancer and normal breast tissues. J. Opt. Soc. Am. B. 1989;6(5):1015–1023.
22. Alfano RR, Tang GC, Pradhan A, Lam W, Choy DSJ, Opher E. Fluorescence-spectra from cancerous and normal human-breast and lung tissues. IEEE J. Quantum Electron. 1987;23(10):1806–1811.
23. Yang YL, Celmer EJ, Koutcher JA, Alfano RR. DNA and protein changes caused by disease in human breast tissues probed by the Kubelka-Munk spectral function. Photochem. Photobiol. 2002;75(6):627–632. [PubMed]
24. Yang YL, Celmer EJ, Koutcher JA, Alfano RR. UV reflectance spectroscopy probes DNA and protein changes in human breast tissues. J. Clin. Laser Med. Surg. 2001;19(1):35–39. [PubMed]
25. Gupta PK, Majumder SK, Uppal A. Breast cancer diagnosis using N-2 laser excited autofluorescence spectroscopy. Lasers Surg. Med. 1997;21(5):417–422. [PubMed]
26. Bigio IJ, Bown SG, Briggs G, Kelley C, Lakhani S, Pickard D, Ripley PM, Rose IG, Saunders C. Diagnosis of breast cancer using elastic-scattering spectroscopy: preliminary clinical results. J. Biomed. Opt. 2000;5(2):221–228. [PubMed]
27. Ghosh N, Mohanty SK, Majumder SK, Gupta PK. Measurement of optical transport properties of normal and malignant human breast tissue. Appl. Opt. 2001;40(1):176–184. [PubMed]
28. Palmer GM, Zhu CF, Breslin TM, Xu FS, Gilchrist KW, Ramanujam N. Comparison of multiexcitation fluorescence and diffuse reflectance spectroscopy for the diagnosis of breast cancer (March 2003). IEEE Trans. Biomed. Eng. 2003;50(11):1233–1242. [PubMed]
29. Zhu CF, Palmer GM, Breslin TM, Xu FS, Ramanujam N. Use of a multiseparation fiber optic probe for the optical diagnosis of breast cancer. J. Biomed. Opt. 2005;10(2):024032. [PubMed]
30. Ramanujam N. Fluorescence spectroscopy of neoplastic and non-neoplastic tissues. Neoplasia. 2000;2(1–2):89–117. [PMC free article] [PubMed]
31. Johns M, Giller CA, German DC, Liu HL. Determination of reduced scattering coefficient of biological tissue from a needle-like probe. Opt. Express. 2005;13(13):4828–4842. [PubMed]
32. Zysk AM, Boppart SA. Computational methods for analysis of human breast tumor tissue in optical coherence tomography images. J. Biomed. Opt. 2006;11(5):054015. [PubMed]
33. Goldberg BD, Iftimia NV, Bressner JE, Pitman MB, Halpern E, Bouma BE, Tearney GJ. Automated algorithm for differentiation of human breast tissue using low coherence interferometry for fine needle aspiration biopsy guidance. J. Biomed. Opt. 2008;13(1):014014. [PubMed]
34. Iftimia NV, Bouma BE, Pitman MB, Goldberg B, Bressner J, Tearney GJ. A portable, low coherence interferometry based instrument for fine needle aspiration biopsy guidance. Rev. Sci. Instrum. 2005;76(6):064301.
35. Rao YJ, Jackson DA. Recent progress in fiber optic low-coherence interferometry. Meas. Sci. Technol. 1996;7(7):981–999.
36. Schmitt JM, Knuttel A, Bonner RF. Measurement of optical-properties of biological tissues by low-coherence reflectometry. Appl. Opt. 1993;32(30):6032–6042. [PubMed]
37. Boppart SA, Luo W, Marks DL, Singletary KW. Optical coherence tomography: feasibility for basic research and image-guided surgery of breast cancer. Breast Cancer Res. Treat. 2004;84(2):85–97. [PubMed]
38. Iftimia NV, Mujat M, Hammer DX, Ustun T, Ferguson DR. Spectral-domain low coherence interferometry/OCT system for fine needle breast biopsy guidance. Rev. Sci. Instrum. 2009;80(2):024302. [PubMed]
39. Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. 5th ed. Upper Saddle River, New Jersey: Prentice Hall; 2002.
40. Ustun TE, Iftimia NV, Ferguson RD, Hammer DX. Real-time processing for Fourier domain optical coherence tomography using a field programmable gate array. Rev. Sci. Instrum. 2008;79(11):114301. [PubMed]