In this paper we have compared an empirical approach and a Monte Carlo model based approach for the analysis of tissue fluorescence and diffuse reflectance spectra and for the diagnosis of breast cancer. Both approaches can extract diagnostically useful features from the fluorescence and diffuse reflectance spectra, thus reducing the large number of spectral variables to a few features that can be used for the discrimination of breast cancer. The classification results obtained using the features extracted by the two approaches (i.e., PCs or tissue properties) were comparable, which suggests that both approaches are similarly effective for the discrimination of breast malignancy.
In most classifications, the percentage of misclassified samples increased with decreasing percentage of malignancy, as would be expected. The misclassified samples with 0–25% malignancy were either invasive lobular cancer or carcinoma in situ, which account for only a small portion of the malignant samples. These sub-types of malignancy may be underrepresented because of the small number of tissue samples as well as the low percentage of malignancy present in these tissue samples.
In the empirical approach, PLS was employed to extract a set of components that represent the tissue spectra with dramatically reduced dimension. This method was employed, rather than Principal Component Analysis (PCA), which is widely used for data reduction, because a previous study [1] showed that PLS has the advantage of taking into account the histological diagnosis of tissue samples when extracting the principal components. Only one PC obtained from the diffuse reflectance displayed statistically significant differences between malignant and fibrous/benign breast tissues, while PCs obtained from fluorescence spectra primarily displayed statistically significant differences between malignant and adipose tissues.
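To illustrate how PLS takes the diagnosis into account, the following minimal sketch computes only the first component of a single-response PLS, whose weight vector is proportional to the covariance between each wavelength channel and the class label. The function name, the toy spectra and the 0/1 label coding are hypothetical and are not taken from the study; the full analysis would extract several components with deflation.

```python
import math

def first_pls_component(spectra, labels):
    """Return (weights, scores) of the first PLS1 component.

    spectra: list of equal-length intensity lists (one per sample).
    labels:  numeric class labels (assumed coding: 1 = malignant, 0 = other).
    """
    n, p = len(spectra), len(spectra[0])
    # Mean-center each wavelength channel and the labels.
    col_mean = [sum(row[j] for row in spectra) / n for j in range(p)]
    y_mean = sum(labels) / n
    xc = [[row[j] - col_mean[j] for j in range(p)] for row in spectra]
    yc = [y - y_mean for y in labels]
    # First weight vector: covariance of each channel with the label.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centered spectrum onto the weight vector.
    scores = [sum(xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores

# Hypothetical toy data: channel 0 varies strongly but is unrelated to the
# label; channel 1 varies weakly but tracks the label.
spectra = [[10.0, 1.0], [-10.0, 1.2], [10.0, 2.0], [-10.0, 2.2]]
labels = [0, 0, 1, 1]
weights, scores = first_pls_component(spectra, labels)
```

In this toy case the weight vector falls entirely on channel 1, the channel that tracks the label, whereas a variance-driven method such as PCA would be dominated by channel 0.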
Classifications based on the diagnostically useful PCs provided a sensitivity and specificity of up to 87% and 89% for the discrimination between malignant and non-malignant breast tissues. It was noted that classification using combined fluorescence and reflectance PCs yielded sensitivity and specificity similar to those obtained using reflectance PCs only for the discrimination between malignant and non-malignant tissue samples. Although in this study adding fluorescence does not seem to increase the diagnostic accuracy, fluorescence does provide diagnostically useful information, because four of the ten samples misclassified using reflectance PCs alone were correctly classified using fluorescence PCs. The remaining six samples were consistently misclassified using fluorescence PCs, reflectance PCs or the combination of both (see ). Combining the fluorescence and reflectance PCs directly in an SVM classification may not make full use of the complementary information that each type of spectra may contain (since a separating hyperplane in a higher dimensional space is not a direct combination of the hyperplanes in its lower dimensional sub-spaces). However, a strategy using fluorescence PCs and reflectance PCs separately (e.g., sequentially) may have the potential to improve the overall classification.
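For reference, sensitivity and specificity as used above can be computed from the histological diagnosis and the classifier output as in the following sketch. The labels and predictions are hypothetical and do not reproduce the results of this study.

```python
def sensitivity_specificity(truth, predicted, positive="malignant"):
    """Return (sensitivity, specificity) for a binary classification."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Sensitivity: fraction of malignant samples classified as malignant.
    # Specificity: fraction of non-malignant samples classified as such.
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: 4 malignant and 5 non-malignant samples.
truth = ["malignant"] * 4 + ["non-malignant"] * 5
predicted = [
    "malignant", "malignant", "malignant", "non-malignant",
    "non-malignant", "non-malignant", "non-malignant", "non-malignant",
    "malignant",
]
sens, spec = sensitivity_specificity(truth, predicted)
```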
All the extracted tissue properties displayed statistically significant differences between malignant and adipose, and between malignant and non-malignant breast tissues (p < 0.05). For the tissue samples obtained from cancer surgery, only hemoglobin saturation and the mean reduced scattering coefficient extracted from the diffuse reflectance spectra showed statistically significant differences between malignant and fibrous/benign tissues. However, only 8 fibrous/benign samples were available for this analysis. Incorporating the normal fibrous samples obtained from breast reduction surgery increased the sample size so that it was comparable to that of the malignant tissue samples. Results from the Wilcoxon rank-sum test () indicated that most of the extracted tissue properties, except for the total hemoglobin concentration and the fluorescence contribution of NADH, showed statistically significant differences (p < 0.05 at least) between malignant and fibrous/benign tissues. Classification of malignant vs. fibrous/benign tissue samples (as shown in ) using diagnostically significant absorption and scattering properties yielded higher sensitivity and specificity than classification using diagnostically significant fluorescence properties only. Combining the two sets of tissue properties did not improve the diagnostic accuracy, suggesting that the diagnosis is primarily attributable to the difference in absorption and scattering properties (especially the hemoglobin saturation).
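The Wilcoxon rank-sum comparison of an extracted tissue property between two tissue groups can be sketched as follows. This is a simplified version that assumes the normal approximation with average ranks for ties and no tie or continuity correction; the exact variant used in the study is not stated here. The property values are hypothetical.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Ties receive average ranks; no tie or continuity
    correction is applied, so p is only approximate for small samples.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical values of one tissue property in two tissue groups.
group_a = [20, 25, 30, 35, 40, 45, 50, 55]
group_b = [60, 65, 70, 75, 80, 85, 90, 95]
z, p = rank_sum_test(group_a, group_b)
significant = p < 0.05
```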
The tissue samples used in this study are a subset of the tissue samples used in a previous study, which we have reported in ref [2]. In the previous study, two sets of tissue samples obtained from two independent breast studies (including the one used in this study) were combined for the discrimination analysis. Most results shown here for the statistically significant differences between malignant and non-malignant breast tissues are consistent with those obtained from the combined sample set investigated in the previous study, with the exception that the total hemoglobin concentration displayed a significant difference (p < 0.05) between malignant and non-malignant breast tissues for this sample set, while this was not observed for the combined sample set in the previous study. This indicates that the results obtained with the Monte Carlo based inverse models are consistent across data collected from different instruments and probes.
The PCs extracted from the PLS analysis can be correlated with the tissue properties extracted from the model based analysis. For fluorescence, the 340nm PC1, for example, had a significant positive correlation with the fluorescence contribution of collagen and a significant negative correlation with the fluorescence contribution of retinol. As shown in , 340nm PC1 displayed positive values within the wavelength range of 360 – 460 nm, with the maximum appearing at around 390 nm, and negative values at wavelengths above 460 nm, with the minimum appearing at around 520 nm. Since the PCs primarily account for the difference in spectral line shape observed between malignant and non-malignant tissue samples, this may suggest that the fluorescence intensity over the wavelength range of 360 – 460 nm was higher, while the fluorescence intensity at wavelengths above 460 nm was lower, for malignant than for non-malignant tissues. The spectral features over 360 – 460 nm characterize the collagen fluorescence, and a larger PC score may indicate higher collagen fluorescence. The spectral features at wavelengths above 460 nm characterize retinol fluorescence; however, a larger PC score may indicate lower retinol fluorescence, as the PC is negative over this wavelength range. This explains the positive correlation of 340nm PC1 with the fluorescence contribution of collagen, and the negative correlation with the fluorescence contribution of retinol. The 360nm PC2s were most significantly correlated with the fluorescence contribution of NADH. showed that 360nm PC2 had a shoulder over the wavelengths between 430 – 510 nm, which coincides with the fluorescence emission maxima of NADH. The larger PC scores suggest higher NADH fluorescence, thus a positive correlation between the PC score and the fluorescence contribution of NADH is expected.
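The sign argument above can be made concrete with a small sketch in which a PC score is taken as the inner product of a spectrum with the PC loading. The loading and the two spectra below are hypothetical four-point examples, not data from this study, and the mean-centering applied in an actual PLS analysis is omitted for brevity.

```python
# Hypothetical loading sampled at four emission wavelengths: positive in
# the 360-460 nm band, negative above 460 nm.
wavelengths_nm = [380, 420, 500, 540]
loading = [0.5, 0.5, -0.5, -0.5]

def pc_score(spectrum, loading):
    """Projection of a spectrum onto a PC loading (inner product)."""
    return sum(s * l for s, l in zip(spectrum, loading))

# Hypothetical normalized spectra: one with stronger emission below
# 460 nm, one with stronger emission above 460 nm.
short_band_dominated = [1.0, 0.9, 0.3, 0.2]
long_band_dominated = [0.4, 0.4, 0.8, 0.7]

score_short = pc_score(short_band_dominated, loading)
score_long = pc_score(long_band_dominated, loading)
```

The spectrum with more intensity in the band where the loading is positive receives the larger score, while intensity in the band where the loading is negative lowers the score.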
For reflectance, the Refl PC1 was most significantly correlated with the mean reduced scattering coefficient. This PC has positive values over the entire spectrum, thus a larger PC score suggests a higher overall spectral intensity. Increased scattering in the medium will also result in a diffuse reflectance spectrum of higher intensity. Thus an increase in the score of Refl PC1 may reflect an increase in spectral intensity that can partially result from an increasing mean reduced scattering coefficient. The negative correlation observed between Refl PC1 and β-carotene concentration may be attributed to the negative correlation between β-carotene concentration and mean reduced scattering coefficient, since the former increases while the latter decreases with increasing adipose tissue content. Refl PC2 displayed a significant positive correlation with β-carotene concentration. This PC displayed an apparent valley over the wavelength range of 430 – 520 nm, which coincides with the absorption band of β-carotene. The larger the PC score is, the deeper the valley would be, suggesting more absorption by β-carotene.
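The correlations between PC scores and extracted tissue properties discussed above can be illustrated with a Pearson correlation coefficient. The choice of Pearson correlation is an assumption made for this sketch, since the type of correlation is not specified in this section, and all values below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-sample values: scores of one reflectance PC, mean
# reduced scattering coefficient, and beta-carotene concentration.
pc_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
scattering = [5.0, 6.0, 7.5, 8.0, 9.5]
carotene = [30.0, 26.0, 21.0, 18.0, 12.0]

r_scattering = pearson_r(pc_scores, scattering)
r_carotene = pearson_r(pc_scores, carotene)
```

In this toy data the score rises with scattering and falls with β-carotene, so the two coefficients have opposite signs, mirroring the pattern described for Refl PC1.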
In summary, the PC1s of fluorescence spectra measured at 340, 360 and 380 nm primarily reflect the fluorescence from collagen and retinol, while the PC2s primarily reflect the NADH fluorescence. The PC1 of diffuse reflectance spectra is most related to the scattering property, while the PC2 is primarily related to β-carotene concentration and hemoglobin saturation. For the sample set investigated in this study, the classification based on PCs and that based on intrinsic tissue properties provided comparable classification accuracy. This suggests that both the linear (PLS) and non-linear (Monte Carlo) methods extract similar features from the tissue spectra for the diagnosis of breast cancer and that one method is not superior to the other in this respect.
Each approach has its advantages and disadvantages. One advantage of the empirical approach is that it is not computationally intensive: it involves only linear regression, and the spectra of all samples are pooled together for processing. The model based approach, by contrast, is computationally intensive, as it involves a recursive procedure for model optimization and each tissue spectrum has to be processed individually. One disadvantage of the empirical approach is that a finite number of tissue spectra from both tissue types are needed to extract the principal components, and any change in the sample pool (e.g., exclusion of some samples or inclusion of new samples) will result in a different set of principal components. In this study, the empirical analysis was not performed for the discrimination of malignant vs. fibrous/benign samples after the inclusion of additional tissue samples from the breast reduction surgery, because the PLS analysis on the new sample set would produce a set of PCs different from the ones presented earlier (i.e., those extracted from the cancer surgery samples), making it difficult to relate and compare the new set of PCs with the other results presented in this study. This, however, is not a problem with the model based approach: because the model-based feature extraction is performed on each individual sample, adding new samples for further analysis is straightforward.
In conclusion, we have presented in this study the use of both an empirical and a Monte Carlo model based approach for the analysis of tissue fluorescence and diffuse reflectance spectra, and demonstrated that the two approaches provided comparable classification accuracy for discriminating breast malignancy. We also showed that there are significant correlations between the PCs extracted from the empirical spectral analysis and the intrinsic tissue properties extracted from the model based analysis, suggesting that both approaches may probe the same spectroscopic contrast in the tissue that discriminates between malignant and non-malignant breast tissues, albeit in different ways. While the empirical spectral analysis provides a straightforward means to examine the difference in the fluorescence and diffuse reflectance spectra of various tissue types, the model based analysis allows for a quantitative assessment of the physiological and biochemical information about the tissue property, thus providing insights into the biological basis of the spectral features that are observed in the tissue spectra.