A non-invasive optical diagnostic system for detection of cancerous and precancerous lesions of the cervix was evaluated in vivo. The optical system included a fiber optic probe designed to measure polarized and unpolarized light transport properties of a small volume of tissue. An algorithm for diagnosing tissue based on the optical measurements was developed which used four optical properties, three of which were related to light scattering properties and the fourth of which was related to hemoglobin concentration. A sensitivity of ~77% and specificities in the mid 60s were obtained for separating high grade squamous intraepithelial lesions and cancer from other pathologies and normal tissue. The use of different cross-validation methods in algorithm development is analyzed and the relative difficulties of diagnosing certain pathologies are assessed. Furthermore, the robustness of the optical system to use by different doctors and to changes in the fiber optic probe was also assessed, and potential improvements in the optical system are discussed.
Optical diagnostics have the potential to provide real-time diagnosis of tissue, and many optical diagnostic techniques are being developed. For example, fluorescence, light scattering and the combination of the two continue to be investigated for their ability to accurately detect pre-cancerous lesions of the cervix; a summary of published clinical studies is given in the Discussion section. The motivation for developing optical methods for detection of cancerous and precancerous cervical lesions is that current methods have several shortcomings, including missed lesions [1, 2] and loss of patients to follow-up. The diagnostic procedures currently in clinical practice are not suitable for “see and treat” methods, which would allow treatment at the time of diagnosis [4, 5].
In this study, light scattering spectroscopy alone is studied. Specifically, we have designed, built and implemented a unique fiber-optic probe which determines both morphological and biochemical properties of tissue by measuring the transport of linearly polarized and unpolarized light through a small volume of tissue. The primary goal of the study was to determine the accuracy of the light scattering measurements for the detection of high grade squamous intraepithelial lesions (HSIL), a precursor of cervical cancer. We have also investigated the effects of having different physicians use the instrument and of changes in optical probes during the course of the clinical study. The results of different resampling methods for determining the accuracy of our classification algorithms are also compared.
The experimental measurement system and probe are illustrated in Fig. 1. The tungsten lamp box contains two tungsten lamps (Gilway, model L1041), each with a UV filter (Hoya, Y-48, required by the FDA) and a shutter (Uniblitz VMM-D3). Light collected by each of the optical fibers is simultaneously dispersed using an Acton Spectra Pro 275 spectrograph with an attachment designed specifically for imaging optical fibers onto a CCD. A front-illuminated, TE-cooled CCD is used for light collection (Princeton Instruments). For measuring tissue, the probe is placed in gentle contact with the tissue and can be used to measure either polarized or unpolarized light transport. When the shutter for the lamp illuminating fiber DU is open, the light collected by fiber U is a measurement of unpolarized light transport. The center-to-center separation of fibers DU and U is 550 μm. The light source for polarized measurements is fiber DP and the polarized collection fibers are 1, 3, and 4. There is a linear polarizer (3M, HN 32% × 0.01”) over fibers DP, 1 and 4 that allows horizontally polarized light to pass. The linear polarizer over fiber 3 allows vertically polarized light to pass. All collection fibers are angled at 20° towards their respective delivery fiber in order to optimize sensitivity to epithelial tissue, and all optical fibers in the probe are 200 μm in diameter with a numerical aperture of ~0.37. Titanium dioxide in solid epoxy is used as the reference material. It is submersed in water and the probe is placed in contact with this reference material for measurement.
Nine parameters, which are described in detail below, are calculated from the spectra: total hemoglobin (Hb) concentration (multiplied by pathlength); fraction of Hb that is oxygenated; a vessel ‘size’ (divided by pathlength); wavelength dependence of the unpolarized scattered light intensity; amplitude of the unpolarized light scatter; slope of the unpolarized light scatter; the ratio of light collected by fibers 1 and 3; the ratio of light collected by fibers 1 and 4; and water concentration (multiplied by pathlength).
To estimate hemoglobin (Hb) concentration and oxygenation, the portion of the unpolarized spectrum from 500 − 800 nm was fit to Eq. 1.
CHb and CHbO2 are the concentrations of deoxygenated and oxygenated hemoglobin, respectively, both multiplied by the pathlength. εHb and εHbO2 are the wavelength-dependent absorption of deoxygenated and oxygenated hemoglobin, respectively, both taken from the literature. c0 is the ‘amplitude’ of the unpolarized light scatter and c1 is the wavelength dependence of the unpolarized light scatter, sometimes called scatter power. R is the ‘size’ of the blood vessels divided by the pathlength the collected light travelled. Eq. 1 is an equation for the absorption due to hemoglobin in blood vessels. The total Hb content (multiplied by optical pathlength) is given by CHb + CHbO2, while the fraction of hemoglobin that is oxygenated is given by CHbO2/(CHb + CHbO2). An example fit to Eq. 1 is given in Fig. 2. In some cases there was insufficient hemoglobin absorption to determine vessel diameter and in those cases V was set to 1. A more general expression which includes both hemoglobin in blood vessels and hemoglobin outside blood vessels has been described; however, for most of our data this expression resulted in overparameterization of the data.
The slope of the unpolarized data is calculated by fitting a straight line to the data from 690 to 790 nm, as shown in Fig. 2. The slope was then divided by the area under the spectrum from 690 to 790 nm. Consequently, this slope value is proportional to 1/intensity, as demonstrated in Eq. 4, where it is assumed that the data are a straight line from 690 to 790 nm and y is the average value of the intensity between 690 and 790 nm. The magnitude of the slope has been found to be greater for proliferating cells, and a greater slope magnitude indicates that the average size of structures scattering light is smaller [9, 10].
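The normalized slope described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `normalized_slope`, the ordinary least-squares fit, and the trapezoidal estimate of the area are all choices made here for illustration, and the wavelengths are assumed to be sorted in increasing order. For data that are exactly a straight line, the area equals the mean intensity times the 100 nm window, so the result reduces to slope/(100 nm × mean intensity) and does not change if the whole spectrum is rescaled.

```python
def normalized_slope(wavelengths_nm, intensities, lo=690.0, hi=790.0):
    """Least-squares slope over [lo, hi] nm divided by the area under the spectrum.

    Hypothetical helper for illustration; assumes wavelengths are sorted ascending.
    """
    pts = [(w, y) for w, y in zip(wavelengths_nm, intensities) if lo <= w <= hi]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the fitting window")

    n = len(pts)
    mean_w = sum(w for w, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n

    # Ordinary least-squares slope of intensity versus wavelength.
    sxx = sum((w - mean_w) ** 2 for w, _ in pts)
    sxy = sum((w - mean_w) * (y - mean_y) for w, y in pts)
    slope = sxy / sxx

    # Trapezoidal estimate of the area under the spectrum in the same window.
    area = sum(0.5 * (y0 + y1) * (w1 - w0)
               for (w0, y0), (w1, y1) in zip(pts, pts[1:]))
    return slope / area
```

Because both the slope and the area scale with the overall signal level, dividing one by the other removes the dependence on absolute intensity, which is the practical reason for the normalization.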
The ratio of light intensity from fibers 1 and 3 and the ratio of light intensity from fibers 1 and 4 are calculated as a function of wavelength. At wavelengths past ~900 nm, the polarizers do not polarize. Therefore, this region can be used to normalize the data and correct for the different collection efficiencies of the optical paths including fibers 1, 3 and 4. Specifically, I1/I3 and I1/I4 are normalized to 1 from 950 to 1000 nm. Example spectra are shown in Fig. 3. The physical interpretation of I1/I3 and I1/I4 has been determined in previous work. I1/I3 is greater for more strongly scattering tissue and I1/I4 increases as the average size of scattering structures decreases. Finally, water concentration was calculated in a manner analogous to that used for total Hb. An example of the fit is given in Fig. 2.
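As a rough sketch of the ratio normalization, the following divides two collected spectra point by point and rescales the result so that its mean over 950 to 1000 nm is 1. The function names are hypothetical, and interpreting "normalized to 1" as "mean equal to 1 over the band" is an assumption made here. The second helper averages a spectrum over a band; its 660 to 760 nm default is the band used for the classification variables later in the paper.

```python
def normalized_ratio(wavelengths_nm, i_num, i_den, norm_band=(950.0, 1000.0)):
    """Point-by-point ratio (e.g. I1/I3), rescaled so its mean over norm_band is 1.

    Hypothetical helper; "normalized to 1" is taken to mean the band average.
    """
    ratio = [a / b for a, b in zip(i_num, i_den)]
    band = [r for w, r in zip(wavelengths_nm, ratio)
            if norm_band[0] <= w <= norm_band[1]]
    if not band:
        raise ValueError("no data points inside the normalization band")
    scale = sum(band) / len(band)
    return [r / scale for r in ratio]


def band_average(wavelengths_nm, values, lo=660.0, hi=760.0):
    """Average of a spectrum over [lo, hi] nm."""
    selected = [v for w, v in zip(wavelengths_nm, values) if lo <= w <= hi]
    if not selected:
        raise ValueError("no data points inside the averaging band")
    return sum(selected) / len(selected)
```

Normalizing in a region where the polarizers no longer polarize means any residual difference between the two channels there reflects collection efficiency rather than tissue properties, so dividing it out leaves the polarization-dependent signal at shorter wavelengths.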
All tissue sites for which the colposcopist planned to take a biopsy as part of normal clinical procedure were measured with the spectroscopic system. Additionally, one or two normal sites were also measured but not biopsied. All sites were measured once with the spectroscopic system and then the measurements were repeated. Subsequently, biopsies were obtained and each biopsy was placed in a separate container. After all spectroscopic measurements of a patient were completed, the probe was gently wiped off, and a reference measurement was made.
Each biopsy was characterized as normal, cervicitis, low-grade squamous intraepithelial lesion (LSIL), high-grade squamous intraepithelial lesion (HSIL) or cancer by the study pathologist. The study pathologist also ranked the inflammation as none, a few clusters of inflammatory cells, or many inflammatory cells. Vascularity was parameterized as normal or increased. The tissue site was determined by histopathology as ectocervix (squamous epithelium), endocervix (columnar epithelium) or squamous columnar junction (SCJ).
Data from 151 patients were acquired and analyzed. Data from several other patients could not be used, primarily due to failure of our decade-old equipment. Human subjects review boards reviewed and approved this work at both the University of New Mexico and at Los Alamos National Laboratory. The study coordinator obtained consent from each patient.
Data from 64 of the patients were acquired with the original fiber optic probe dedicated to this study. When that probe broke in a non-repairable manner, data from the rest of the patients were acquired with a replacement fiber optic probe that was very similar, but not perfectly identical. To determine if the change in probe had any effect on the measurements, data from the two probes were compared using Student's t-tests within each pathology classification. (The two instances of invasive cancer were not included in the comparison.) When statistically significant differences were found in multiple pathology categories, the measurements made with the second probe were multiplied by a correction ratio which was calculated as follows. For each pathology classification, the average for probe 1 divided by the average for probe 2 was calculated. The average of these ratios was the correction ratio. No significant differences were found after the data were corrected.
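The probe correction described above amounts to averaging per-pathology ratios of means. A minimal sketch follows, assuming the measurements of one spectroscopic variable are held in dictionaries keyed by pathology classification; the function names and the data layout are hypothetical and chosen only for illustration.

```python
from statistics import mean


def probe_correction_ratio(probe1, probe2):
    """Average over pathologies of mean(probe 1) / mean(probe 2).

    probe1, probe2: dict mapping pathology label -> list of measured values
    (hypothetical layout). Only pathologies present for both probes are used.
    """
    shared = [label for label in probe1 if label in probe2]
    if not shared:
        raise ValueError("the two probes share no pathology classification")
    ratios = [mean(probe1[label]) / mean(probe2[label]) for label in shared]
    return mean(ratios)


def correct_probe2(values, ratio):
    """Multiply measurements made with the second probe by the correction ratio."""
    return [v * ratio for v in values]
```

For example, if the per-pathology ratios are 2.0 and 1.5, the correction ratio is 1.75, and every measurement from the second probe is multiplied by 1.75 regardless of its pathology.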
Four doctors participated in this study and made spectroscopic measurements. Each patient was measured by only one doctor. The Student's t-test was used to determine if there were significant differences in the average values of the spectroscopic variables measured by the different doctors. The values of spectroscopic parameters for two doctors cannot be compared by simply using the mean and standard deviation of all measurements of a spectroscopic variable, because each doctor may have measured a different fraction of patients with a given pathology. Also, the colposcopically normal data were not used, because the tissue that one doctor used for the colposcopically normal measurements may be different from what another doctor chose.
Here is an example of how the corrections were determined for I1/I3. The averages of all cervicitis measurements for each doctor were determined. Call these, pi1, where i goes from 1 to 4 for the four doctors. Then the averages of a second pathology were determined. Call these pi2. Then the ratio of I1/I3 values for pathology 2 to cervicitis is calculated as:
A new set of I1/I3 data for each doctor is then calculated as
where j is over the four pathologies (HSIL and cancer were grouped together) and n is the subscript for an individual data point. (Note that the colposcopically normal data are not part of these data sets.)
The standard deviation and averages were calculated for these data sets in order to use the Student's t-test to determine whether there are significant differences between the new I1/I3 data sets for each doctor.
When differences in the data sets were found (p < 0.05), corrections were made to the original data. For I1/I3, slope, total Hb, and oxy Hb, the averages were similar for three doctors, while a fourth doctor had significantly different averages from two of the first three (in the case of total Hb it was only one of the first three). Therefore, the raw data for the fourth doctor were multiplied by a correction factor so that the data sets calculated by Eq. 6 had the same average for all four doctors. The “odd” doctor was not the same in every case. The colposcopically normal data were multiplied by the same correction factor as the rest of the data.
Histograms of the number of sites with a given value of a spectroscopic variable were made for each diagnostic category for each measured variable. Histograms for slope, I1/I3 and I1/I4 were then fit to Gaussians and normalized to yield probability distributions. These histograms provide a visual picture of the changes in spectroscopic values with tissue pathology and of the overlap between different categories.
We initially analyzed our data with the Mahalanobis distance metric, which is the analysis method used by Chang et al. and Mirabal et al., but found that significantly worse results were obtained for the testing sets than for the training sets, indicating that this method was overtraining. The following classification method was found to give more similar results between the training and testing data sets. A vote is cast “by” each of three variables: the average value of I1/I4 from 660 to 760 nm, slope, and the average value of I1/I3 from 660 to 760 nm. For each variable there is a cut-off value. If the measured value for a site is on one side of the cut-off then the vote is positive, i.e. for HSIL or cancer. If it is on the other side of the cut-off, the vote is for the negative category. Initial classification is then a two-out-of-three vote. In addition, if total Hb is very high, an initially negative classification is changed into a positive classification if at least one of the three variables, I1/I4, slope, and I1/I3, had a positive vote. The cut-off values were optimized as follows. The data were normalized so that the data range for each variable was 0 to 10. The cut-off for total hemoglobin was set at 3 and the cut-off for I1/I4 was fixed at 4.5. A wide range of I1/I3 cut-offs was then tested. For each I1/I3 cut-off a wide range of slope cut-offs was tried. The optimum cut-offs for I1/I3 and slope were defined as those providing the largest sum of sensitivity and specificity such that the sensitivity was greater than or equal to 80%. A sensitivity of at least 80% was required in order to limit the number of HSIL sites that were missed. Using fixed values for the cut-offs for I1/I4 and Hb reduced the variation in results of the training and testing sets. Furthermore, correlations between the different variables meant that disparate combinations of cut-off values would yield the same results.
By holding I1/I4 constant, the optimization problem became much smaller with little change in the ultimate results. The value for the Hb cut-off was chosen such that the vast majority of measurements had total Hb less than the cut-off. (The distribution of Hb measurements is non-Gaussian.) The value for the I1/I4 cut-off was chosen to be a number with only two significant digits that was near to values commonly found in early optimization runs where I1/I4, I1/I3 and slope were all varied.
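The voting rule and the cut-off search can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: all function and key names are hypothetical, each site is represented as a tuple (I1/I4, slope, I1/I3, total Hb) on the normalized 0 to 10 scale, and a value above its cut-off is assumed to be the positive side for every variable (the text does not state the direction of each cut-off).

```python
def classify_site(i14, slope, i13, total_hb, cutoffs, hb_cutoff=3.0):
    """Two-out-of-three vote with a high-hemoglobin override.

    ASSUMPTION: a value above its cut-off is a positive (HSIL or cancer) vote.
    """
    votes = [i14 > cutoffs["i14"], slope > cutoffs["slope"], i13 > cutoffs["i13"]]
    n_positive = sum(votes)
    if n_positive >= 2:
        return True
    # Very high total Hb upgrades a negative call if at least one vote was positive.
    return total_hb > hb_cutoff and n_positive >= 1


def sensitivity_specificity(predictions, labels):
    """Return (sensitivity, specificity); both classes must be present."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def optimize_cutoffs(sites, labels, grid, i14_cutoff=4.5, hb_cutoff=3.0):
    """Grid search over I1/I3 and slope cut-offs with I1/I4 and Hb held fixed.

    Keeps the pair with the largest sensitivity + specificity among those with
    sensitivity >= 0.80. Returns (best_sum, cutoffs), or None if no pair qualifies.
    """
    best = None
    for c13 in grid:
        for c_slope in grid:
            cut = {"i14": i14_cutoff, "slope": c_slope, "i13": c13}
            preds = [classify_site(*site, cut, hb_cutoff) for site in sites]
            sens, spec = sensitivity_specificity(preds, labels)
            if sens >= 0.80 and (best is None or sens + spec > best[0]):
                best = (sens + spec, cut)
    return best
```

In a cross-validation setting, `optimize_cutoffs` would be run on the training portion only and the resulting cut-offs applied unchanged to the held-out sites.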
Five-fold cross-validation was used as a validation method for the classification algorithms. The data were split into 5 subsets of approximately equal size, with each subset containing approximately the same proportion of each pathology classification. Each of the 5 subsets was used once as a testing set with the remaining data used for training in each case. Sensitivity and specificity were estimated by averaging the results for the 5 data sets. This validation method was chosen because resampling methods, such as n-fold cross-validation, have been shown to be better at evaluating models than non-resampling methods. Furthermore, 5-fold and 10-fold cross validation have been recommended over leave-one-out cross validation [14, 15], because leave-one-out (LOO) cross validation can have large variance (e.g. the results for one trial of 25 patients may be very different than the results for a different trial of 25 patients) [14, 16].
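One way to build such stratified folds is sketched below. It is an illustrative construction, not the procedure used in the study: the function names, the fixed random seed, and the choice to deal shuffled indices out in rotation are all assumptions. Indices here stand for individual measurements; whether the study split by site or by patient is not stated in the text.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds with roughly equal size and class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    slot = 0  # running counter so consecutive items go to consecutive folds
    for members in by_class.values():
        rng.shuffle(members)
        for idx in members:
            folds[slot % k].append(idx)
            slot += 1
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)
```

Setting `k` equal to the number of measurements turns the same routine into leave-one-out cross-validation, which makes it convenient for comparing the two schemes on identical data.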
To examine the differences between five-fold cross-validation and LOO cross validation, both methods were used to develop and evaluate classification algorithms that optimize the sum of specificity and sensitivity.
Figure 2 shows a representative spectrum of collected unpolarized light and the fits to that spectrum. There are some small systematic errors in the Hb region which were fairly common. Examples of I1/I3 and I1/I4 are shown in Fig. 3. The distance light travels from the linearly polarized delivery fiber to the cross-polarized collection (I3) is longer than to the co-polarized collection (I1); therefore, the hemoglobin bands show up as positive in the I1/I3 spectrum. Similarly, the pathlength of light traveling from the delivery fiber to I4 is longer than the pathlength from the delivery fiber to I1, and the hemoglobin bands are positive in the I1/I4 spectrum.
Two fiber optic probes were used in this study. The mean values for I1/I4 and water differed for every pathology classification for the two probes, and therefore the data were corrected for these differences. In contrast, no significant differences were found between the probes for I1/I3 for any pathology category. For slope, a significant difference was found only for the category of “colposcopically normal” and was not corrected. The total Hb measurement was found to differ significantly for two categories and was corrected. Whatever physical differences in the probes caused water concentration measurements to differ also likely caused the Hb concentration measurement to vary. However, the distributions of Hb concentration in tissue are broader and consequently differences between the probes are more difficult to detect.
This study was performed by four clinicians and a few systematic differences were found in the results for different doctors. The average values of I1/I3, slope, total Hb, scatter power, and amplitude were all significantly different between some of the doctors. Corrections to the data were made as described in the Methods Section.
We have also investigated how patient characteristics (e.g. patient age) affect the spectroscopic measurements. This work is described in a separate paper where dependencies on menstrual cycle and patient age are reported.
The pathology of the measured sites is given in Table 1. A total of 362 sites were used in this analysis, half of which were biopsied sites and half of which were normal via colposcopic examination and not biopsied. The vast majority of biopsied sites were of the squamous-columnar junction which contains some combination of squamous, columnar and metaplastic epithelium. 24 biopsies were confirmed to be of the ectocervix which is usually squamous epithelium, and 11 biopsies were confirmed to be of the endocervix which is usually columnar epithelium. On average, inflammation was increased for cervicitis and HSIL compared to the normal sites. Vascularity was more likely to be increased for cervicitis and HSIL than in the normal and LSIL biopsies.
Examination of the probability distributions for a given spectroscopic variable for each pathology provides insight into which pathology categories can be accurately diagnosed. Fig. 4 shows the distributions of values of I1/I4 and slope obtained for the different diagnostic categories. The best separation is between the categories of colposcopically normal and HSIL. LSIL and HSIL have very similar distributions. The distribution for “no diagnostic abnormality” is as narrow as or narrower than the other distributions.
A goal of this work is to distinguish HSIL and cancer sites from sites with other pathologies and from normal tissue sites. There were several confounding factors. The measurements depended slightly on which doctor made the measurements and on which optical probe was used (Section 3.B). We also found that some of the spectroscopic parameters depend on patient age and menopausal or menstrual cycle status. Corrections to the data for all of these confounding effects were made before the ROC curves were calculated and the classification algorithms developed.
Receiver operating characteristic (ROC) curves for the diagnosis of HSIL and cancer versus the other pathologies are presented in Fig. 5. The areas under the ROC curves for slope, I1/I3, I1/I4 and total Hb are 0.69, 0.64, 0.70 and 0.64, respectively. ROC curves are not shown for water, oxyHb, vessel ‘size’, and ‘amplitude’ because the area under them was close to 0.5. The wavelength dependence of the unpolarized light scatter (scatter power) is very similar to the slope parameter. Since the area under the ROC curve for this parameter was 0.65, which is less than that for slope, this parameter was not used for classification. Because none of the areas under the ROC curves are near the perfect value of 1, a method of combining these metrics was desired. In the course of analyzing the data, several different methods were considered (e.g. classification by Mahalanobis distance). A voting method was chosen for simplicity and because of the similarity found between results for training and testing data sets. The inputs to the voting method are measured values for I1/I4, slope, I1/I3, and total hemoglobin as described in Section 2.G. The results are shown in Table 2. The best results are obtained when the colposcopy normal sites are included, when the positive category is HSIL or cancer, and when the negative category is non-dysplastic. The average results for the testing data sets are then a sensitivity of 77% and a specificity of 68%. When colposcopically normal sites are not included, the sensitivity is 79% and the specificity is 47%.
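For reference, the area under an ROC curve for a single variable can be computed directly from the measured values without drawing the curve. The sketch below uses the standard rank-based (Mann-Whitney) identity: the area equals the probability that a randomly chosen positive site has a larger value than a randomly chosen negative site, with ties counted as one half. The function name is hypothetical, and the code assumes that larger values indicate disease; it is not the software used to produce Fig. 5.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: measured values of one variable; labels: True for HSIL or cancer.
    ASSUMPTION: larger scores point toward the positive category.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both positive and negative sites are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An area of 0.5 corresponds to a variable that carries no diagnostic information, which is why variables with areas close to 0.5 were set aside, and an area of 1 corresponds to perfect separation.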
Leave-one-out cross-validation is a very common method for assessing the accuracy of a classification method. Table 3 compares results obtained with leave-one-out cross-validation (LOO) and five-fold cross-validation. In the top row, where the disease classification is HSIL and cancer and the non-disease classification is LSIL and non-dysplastic, the sensitivities are the same, but the specificities are higher for LOO. In the bottom row, where the disease classification is HSIL and cancer and the non-disease classification is non-dysplastic, the results are nearly identical.
Values of I1/I4 and slope have been previously shown in tissue phantom studies to correlate with the average size of the scattering structures [9-11]. I1/I4 and the slope magnitude (the slope is negative) are greater for HSIL than for non-dysplastic tissue, indicating that the average size of the scattering centers decreases in HSIL. This change may be due to increased spatial fluctuations in DNA content in dysplastic nuclei, which have been shown to affect light scattering.
An increase in slope magnitude and greater values for I1/I3 were seen for HSIL sites in this study and for our tumorigenic model in our previous in vitro experiments. However, the changes in I1/I4 found in this clinical work differ from the changes in I1/I4 seen in our in vitro measurements comparing a tumorigenic and non-tumorigenic model. I1/I4 was smaller for the tumorigenic model than the non-tumorigenic model. In contrast, I1/I4 was larger for in vivo precancerous and cancerous tissue. The reason for this difference is currently not known. However, there are several possibilities. First, the morphological changes in the cells may really be different between the in vitro fibroblast model and the in vivo cervical epithelial cells. Second, tissue is more complex than the cell models. In vivo, we measure not only cells, but also some of the underlying stroma. Both the thickness of the epithelial layer of cells and the properties of the stroma may change in precancerous tissue.
When comparing results of different studies, several details of the studies should be considered. Extremely important is the extent of inclusion of sites that are expected to be normal (ENS). We (Fig. 4 and Table 2) and others have found that separating sites that appear normal by colposcopy from HSIL sites is easier than separating non-normal appearing sites from HSIL.
Another consideration is the validation and resampling methods, particularly for small data sets. As noted in section 2.G some resampling methods give more robust results. Closely related is the reported error in the presented sensitivity and specificity. Unfortunately, this is frequently not reported. The number of patients in the study should also be considered. In our own work, we have found that better sensitivities and specificities are obtained when the sample size is smaller. The number of patients in this study, 151, is comparable to or significantly larger than that used in previously published studies.
Table 4 summarizes results from three distinct point measurement spectroscopy studies which used fluorescence, light scattering or a combination of both. The first line is a study from 1996 which used fluorescence from three separate excitation wavelengths. 59% of the samples were normal epithelium. The diagnostic algorithm was developed with a calibration model and tested on a testing data set. The results for the separate data set were slightly worse than for the calibration set, similar to our differences between average training and average testing set results. The results appear slightly better than those in this study; however, this result may be caused by the slightly higher percentage of expected normals. The second two lines in Table 4 are results from the same study. In one case fluorescence was used to perform the diagnosis and in the second case reflectance was used. Our result of 77±4.5% sensitivity is similar to these results, while our result of 68±2% specificity is slightly lower than from this study, which has a similar number of patients. Comparison of our results to the third study is difficult because pathology classifications are different and the study is quite a bit smaller. Nonetheless, the bottom two lines of Table 4 demonstrate a very important point. The reported accuracy greatly decreases when expected normal sites (ENS) are not included in the study.
Table 5 shows the results of three studies performed with imaging instruments. The first study (rows 1−4) was quite small and contained a large number of expected normal sites, 373 out of a total of 490. The second study (row 5) is the largest published study and used an excellent cross-validation method. The fraction of sites that were normal by colposcopy was probably smaller than in our study (an exact number is not given). The final study (row 6) is again difficult to compare with ours, because the sensitivity and specificity are presented on a per patient basis and the validation method was unusual (see Table caption). Overall, our results compare well to those in the peer-reviewed literature.
Results from a small portion of patients in this study have been previously reported. Specifically, results from 29 patients were reported in a retrospective study with no validation of the classification method. In an analysis that excluded 3 sites, a sensitivity of 100% and a specificity of 80% were obtained. When the colposcopy normal sites were not included the specificity dropped to 55%. These results are somewhat better than those reported here for the much larger data set. The major reason for this change in accuracy is, most likely, that no validation was performed for the first study, while the larger study used 5-fold cross-validation. Furthermore, it seems plausible that better results are obtained with a small data set because classification parameters are optimized for the unique characteristics of that small data set. Because of the discrepancy in accuracy, we examined the significance of variables used for classification to determine if that had changed. For the small data set, slope and I1/I4 were found to have significantly different averages (i.e. p < 0.05) for the non-HSIL and the HSIL data regardless of whether the colposcopy normals were included. Those results held for the larger data set. Similarly, I1/I3 was significant only when the colposcopy normals were included for both the small and large data sets.
An optical imaging system that performs both fluorescence and reflectance has been reported to increase detection rates of HSIL and cancer. In one published study, half the patients went to a colposcopy-only arm and the others went to a colposcopy plus optical imaging arm. The percent of patients found to have HSIL or worse was significantly greater in the optical imaging + colposcopy arm, 14.4% vs 11.4%. In a second study of 193 subjects, colposcopists completed their standard exam and then were instructed to take at least one biopsy in a region identified as high probability for HSIL or worse by the optical imaging system. An additional 9 patients were identified as having HSIL or worse via this biopsy, above the 41 already identified by colposcopy. Importantly, ~1 more biopsy was taken per patient due to the use of the optical imaging system. The study did not demonstrate whether or not the increase in detection rates was simply due to the increase in the number of biopsies.
Information about how patient characteristics affect the spectroscopy data was used in this work to improve the quality of the data. Most of this information is routinely acquired in a clinical exam (e.g. age) and the other information can easily be acquired. The values of the spectroscopic variables also depended slightly on the clinician making the measurement. In this work, we corrected for these effects. However, this correction will not generally be feasible. These differences are possibly due to differences in the time between when acetic acid is applied and measurements are made or, more likely, to how the doctors hold and use the fiber optic probe. We are working on incorporating a pressure sensor into the probe so that all doctors will hold the probe against the tissue with the same pressure. Additionally, the manufacturing procedures are being modified in order to eliminate differences between optical probes.
Improvements in the accuracy of the optical system are clearly desired. To make these improvements it will be necessary to understand some fundamentals. For example, it is important to understand whether the widths of the probability distributions in Fig. 4 are instrumental, due to measurement technique, or biological in origin.
Alterations to the probe are planned and implementation of the changes will be influenced by knowledge of the fundamental scattering processes that are the basis for this technique. For example, some alterations will provide additional light scattering information with very little added complexity to the probe. Other alterations will make the probe a more robust clinical tool. We have already built a prototype with an incorporated pressure sensor.
In order for this spectroscopy system to be used in the places where it is most needed (i.e. low income areas), the system must be made less expensive. There is tremendous potential for greatly decreasing the size and cost of the instrumentation used here, because light scattering is a strong and relatively easy to measure signal.
An elastic light scattering system that measures both polarized and unpolarized light transport in the cervical epithelium has a sensitivity of 77±5% for detection of HSIL and a specificity of 44±3% for colposcopically abnormal sites. An important result of this study is that much improved results are obtained if colposcopically normal sites are included in the analysis. Further conclusions are that spectroscopic measurements varied slightly depending on the doctor using the spectroscopic system, and that similar results for sensitivity were obtained using either leave-one-out cross-validation or five-fold cross-validation, while specificity results were sometimes greater for leave-one-out cross-validation.