To assess the consistency of digitization of 35 mm slides, as practiced in ophthalmologic research, and to estimate the impact of any variation on semi-automated retinal vessel width measurements.
A single retinal slide was repeatedly digitized under various conditions on three scanner models. Average color channel levels were extracted from the resulting images, and retinal vessel widths were graded from the images. Variations in the channel levels, and their possible correlation with vessel width, were analyzed.
The Nikon 5000 scanner had average coefficients of variation (CV) of 0.4, 2.3, and 0.5 for the red, green, and blue channel levels across all runs. The p-values of the correlations between the red, green, and blue color channel levels and the width of the large retinal arteriole were 0.89, 0.27, and 0.58, respectively.
Our results suggest that the tested scanners digitize 35 mm slides reliably, without biasing the retinal vessel measurements.
The 35 mm color slide has been used to document eye disease in ophthalmic practice and research for many years.1–3 Research centers such as those at the University of Wisconsin have accumulated large libraries of color slides, representing longitudinal views of the retinas and lenses of study participants in epidemiological studies and clinical trials as well as patients receiving clinical care for retinal disease. More recently, digital fundus photography has become a popular option,4 paralleling developments in the mainstream photography world.
In many cases, an image originally acquired on film is used as input to an automated or semi-automated process, necessitating digitization. At the University of Wisconsin, images originally acquired on film, in some cases from longitudinal studies spanning decades, are digitized for use in semi-automated assessments of retinal vessel diameters. Thirty-five millimeter color slides are digitized using general-purpose flatbed scanners with 35 mm adapters or dedicated 35 mm scanners. The latter are generally recommended for journalistic and scientific 35 mm conversion due to their greater spatial resolution and more consistent quality.5 The dedicated 35 mm film scanners have been widely used in projects where retinal or lens images were originally acquired on film and then digitized.6–9 At the University of Wisconsin, evaluation of the retinal vessel diameters is commonly performed on these images. An underlying assumption is that the digital scanning process is stable, that is, variations in light sensitivity of the 35 mm scanner do not affect the scientific results. The question of potential variations in the sensitivity of the 35 mm scanner has not been widely addressed in the vision research literature, although related issues have been discussed in the radiology field.10–12
Anecdotal evidence of instability of digital scanners led us to design an experiment where a color slide was repeatedly scanned and the resulting images compared. We used popular models of 35 mm scanners and assessed the impact of this variation on retinal vascular caliber gradings.
A single Early Treatment Diabetic Retinopathy Study (ETDRS) Field 1 slide, shot on Kodak Ektachrome 100 Plus film and centered on the optic disc, was scanned in four runs of 20 scans each on each of three scanner models. The slide, of a healthy eye of one of the authors, was of good photographic quality.
The scanners used were the Nikon LS-2000 (Nikon Corp, Tokyo, Japan), the Nikon 5000, and the Polaroid 4000 (Polaroid Corp, Waltham, MA). For the Nikon LS-2000, the Nikon Scan 2.5 software was used and the images were scanned at 2700 dots per inch (dpi). For the Nikon 5000, the Nikon Scan 4.0 software was used and the images were scanned at 2540 dpi. For the Polaroid 4000, the Polaroid Polacolor Insight 5.5 software was used and the images scanned at 3000 dpi. Images were acquired at 8 bits per channel and saved in the Tagged Image File Format (TIFF) format.
The four runs were organized into sessions of two cold runs and two warm runs. Cold runs were defined as sessions in which the equipment had been turned on for more than 30 minutes but less than four hours; warm runs were sessions in which the equipment had been on for more than four hours. A custom Matlab (The MathWorks, Natick, MA) program averaged the pixel values across each entire image separately for the red, green, and blue channels, yielding three numbers per image. These three averages characterized an image and served as a rough quantitative measure of how much the scanned images differed; we sought a tool that would reveal scanner drift and inconsistency, and because we had received reports of dramatic increases in scanner sensitivity in one channel, particularly the red channel, we focused on color channel levels. The numerical results showed whether two scans were identical and quantified the degree of any difference. All 240 resulting images were processed in this way. The analysis examined differences between the warm and cold runs, as well as differences within runs, to determine whether the variations were attributable to temperature. Vessels within each image were measured three times using the Interactive Vessel ANalysis (IVAN) software13 and three variations of the IVAN grading protocol (The IVAN Grading Protocol, University of Wisconsin Ocular Epidemiology Reading Center internal document, 2006).
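The per-channel averaging step can be sketched as follows. This is a hypothetical re-implementation in Python for illustration only; the study used custom Matlab routines, and the `channel_means` function and the demo data are assumptions, with a flat list of (r, g, b) tuples standing in for a decoded TIFF image.

```python
# Hypothetical sketch of the per-channel averaging described above.
# `pixels` stands in for a decoded 8-bit-per-channel TIFF image,
# flattened to one (r, g, b) tuple per pixel.

def channel_means(pixels):
    """Return the mean red, green, and blue levels (0-255 scale)."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    return r, g, b

# Example: a tiny 2x2 "image"
demo = [(100, 150, 200), (110, 150, 190), (90, 160, 210), (100, 140, 200)]
print(channel_means(demo))  # three numbers characterizing the image
```

The three returned numbers correspond to the per-image values that were compared across scans to detect drift.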
The IVAN grading protocol specifies the procedure for choosing the vessels to grade for width, where to measure them, and how to handle issues such as branching of the vessels within the evaluation zone. It also delineates methods for correcting mis-readings by the automated process. For this experiment, we chose to focus on four specific retinal vessels: two arterioles and two venules. Three variations of the standard protocol were developed.
In the automated approach (Phase 1), the only grader intervention was to guide the IVAN software to evaluate the pre-selected vessels. In the semi-automated approach (Phase 2), grader intervention also included correction, if needed, of the automated placement of the grid over the optic nerve. The third phase applied the standard IVAN grading protocol to a greater degree, in that the grader was permitted to truncate and otherwise modify the automated grading. The IVAN software attempts to find the optic nerve head and center the grid on it; occasionally this automatic centration fails and is manually corrected. Under our modified versions of the grading protocol, in Phase 1 we noted, but did not correct, any mis-centration; in Phases 2 and 3, we noted and corrected these errors.
Statistical analysis was performed with the SAS System v 9.1.3 (SAS Institute, Cary, NC). The mean, standard deviation (SD), and coefficient of variation (CV) of the color channel levels were computed with PROC MEANS. The effect of the warm-up period and the correlation between color channel levels and vessel caliber gradings were analyzed with ANOVA (ANalysis Of VAriance) in PROC GLM (General Linear Model), which is robust to unbalanced data sets. For purposes of analysis, the luminance levels for each channel were stratified into low, medium, and high tertiles of equal size by scanner, and these tertiles were regressed against the vessel widths.
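The two summary computations can be illustrated in a short sketch. The study ran these in SAS (PROC MEANS and PROC GLM); this stdlib-only Python version is a hypothetical simplification, and the function names and demo values are assumptions.

```python
# Hypothetical sketch of the summary statistics: the coefficient of
# variation (CV = sample SD / mean, expressed in percent) and the
# rank-based tertile split used to categorize channel levels.
import statistics

def coefficient_of_variation(values):
    """CV in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def tertiles(values):
    """Split values into low/medium/high thirds by rank."""
    ranked = sorted(values)
    n = len(ranked)
    return ranked[: n // 3], ranked[n // 3 : 2 * n // 3], ranked[2 * n // 3 :]

# Demo: six hypothetical green-channel averages from repeated scans
levels = [118, 119, 120, 120, 121, 122]
print(round(coefficient_of_variation(levels), 2))
low, mid, high = tertiles(levels)
```

In the study, such tertiles (rather than the raw levels) were entered as categorical factors in the regression against vessel width.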
When each color channel was averaged across all runs, the Nikon 5000 scanner had the lowest CV for the blue and red channels (Table 1), and the Nikon LS-2000 scanner had the lowest CV for the green channel (Table 2). The SD for the Nikon scanners ranged from 0.43 to 1.21 across the channels; the Polaroid scanner had higher SDs, ranging from 6.73 in the green channel to 9.01 in the red channel. All channel intensity levels are based on eight-bit pixels and thus can potentially range from 0 to 255. Owing to a scanning error, the data for run 1 (a cold run) of the Nikon LS-2000 were discarded.
Analysis of variance was used to compare the warm and cold runs by color channel. The data were missing for one Nikon LS-2000 cold run. Significant differences were found in the color channel levels, in most cases, between the two cold runs and between the two warm runs, as well as between the warm and cold runs, suggesting that the overall variability, while of interest, did not relate to temperature.
The vessel caliber measurements were regressed against the categorized luminance values of the three channels for the three scanners, with all runs combined. Figures 1–4 show scatter plots of the widths of the four measured vessels against the color channel levels for the Nikon 5000 scanner; for simplicity, only the results from Phase 3 of the grading protocol and from the Nikon 5000 scanner are shown. In these figures the color channel levels are normalized to the average value for that scanner and color, and regression lines are displayed. In most cases, the measured vessel widths varied by roughly three microns and the color channel levels by about five brightness levels. The data show no significant relation between color channel level and vessel width, except for a relation between the small arteriole and the green channel, which reached statistical significance.
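The normalization and the trend check behind these scatter plots can be sketched as follows. This is a hypothetical simplification: the study regressed widths on tertiles of channel level in PROC GLM, whereas this sketch fits an ordinary least-squares line to mean-centered levels; the `ols_slope` helper and all demo values are assumptions.

```python
# Hypothetical sketch: normalize channel levels to the scanner's average
# (as in Figures 1-4) and fit a least-squares line of vessel width on
# the normalized levels to look for a trend.

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Demo: hypothetical red-channel averages from repeated scans
levels = [118.0, 119.0, 120.0, 121.0, 122.0]
mean_level = sum(levels) / len(levels)
normalized = [v - mean_level for v in levels]  # centered on scanner average

# Hypothetical vessel widths (microns) graded from the same scans
widths = [101.0, 103.0, 102.0, 101.5, 102.5]
print(ols_slope(normalized, widths))  # slope near zero => no relation
```

A slope indistinguishable from zero, as the study found for nearly all vessel/channel pairs, indicates that the brightness drift did not bias the width gradings.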
In 3 of 80 scans from the Nikon 5000 scanner, the software incorrectly centered the grid to a large degree, necessitating re-centration of the grid by the grader. Notwithstanding the occasional incorrect centration, we found that the average CV of the measured vessel widths was quite small; for the Nikon 5000 scans, the CV was 0.3 for the large arteriole and 0.4 for the large venule.
This study grew out of reported inconsistencies with the 35 mm scanners extensively used at the Ocular Epidemiology Reading Center. We decided to examine the reliability of the scanning process, investigating the physical consistency of the resulting images as well as the impact of any variation on our computer-assisted retinal vessel caliber grading. Our overall test was to scan the same slide repeatedly; ideally, these repeated scans would yield a series of identical image files. Because the scanner and associated personal computers are electro-mechanical devices, we expected some variation, and we found some in the color channel levels. Interesting patterns emerged that may bear further investigation; however, they did not appear to correlate with the warm-up period, and they did not significantly affect the vessel width measurements.
A review of the literature did not find extensive coverage of this topic in vision research publications, although many projects have digitized 35 mm slides in the course of their activity (Ferrier NJ. Automated identification of the anatomical features in slit lamp photographs of the lens. IOVS 2002;43:E-Abstract 435).14–17 In 1991, Cideciyan, Nagel and Jacobson developed a mathematical analysis and model of the overall noise produced by the process of retinal film photography and digital scanning in the course of research into retinitis pigmentosa.18 They described the mechanical operation of the Nikon LS3500 scanner and developed an image processing technique to restore detail lost in the process. In 2000, Wilson, Nemeth, Edwards and Soliz attempted to find the best set of scanning parameters to support human recognition and grading of retinal features such as vessels, cup and disc, and drusen.19
In the field of radiation oncology, a number of investigators have reported on the use of, and issues associated with, consumer-level flatbed and slide scanners for digitizing dosimetry film. Lynch, Kozelka, Ranade et al. analyzed various performance characteristics of the Macbeth densitometer as well as the Epson and Microtek flatbed scanners used to read radiochromic film.20 They found that repeated scans on the flatbed scanners warmed the film itself, which in turn affected the values being measured, and that the scanner did not respond uniformly over the entire area of the film. Hupe and Brunzendorf examined the use of the Nikon 8000 ED slide scanner for radiochromic film dosimetry.21 This scanner used light-emitting diode (LED) illumination, which did not heat the film as much as the lamps in the flatbed scanners noted above. They also observed the non-uniform characteristics of the charge-coupled device (CCD) receptor array but found that the chief contributor to uncertainty in the process was the non-uniformity of the film itself. Paelinck, De Neve, De Wagter et al. evaluated the Epson Pro 1680 Expression scanner and found a warm-up effect in which the reading was slightly higher on the first scan; they did not find other forms of drift over time. Along with the other investigators, they noted that the “response over the scanner field is not uniform.”22
Semiconductor theory indicates that warmer temperatures result in more noise in the electronic response of the optical sensor.23 The researchers discussed above did not find conclusive evidence of a warm-up effect, and neither did we. Variations over time are illustrated in Figures 1–4. Note that the luminance of the color channels is measured on a 0–255 scale,24 so a change of 2.0 would be less than 1% of the entire range.
The Nikon scanners had lower CVs in the color channel levels, suggesting better suitability for this type of application.
The core of our analysis was to repeatedly measure the width of certain vessels in the selected eye using variations of our IVAN vessel caliber reading protocol. We made certain modifications to focus precisely on potential vessel width variations. Our hypothesis was that the variations in color channel level, particularly where color levels varied with respect to each other, might influence either the automated or human finding of the vessel walls. As noted above, no clear-cut correlation was found.
We were also interested in any other insights the experiment offered about the vessel caliber grading process using the IVAN software. The IVAN software attempts to automatically detect the optic nerve head and center a grading grid on it. We would not expect any incorrect centration, as all images were of the same slide and the original photo quality was good. Nonetheless, we found a rate of 2.4% incorrectly centered grids. Despite this, the average CV of the measured vessel widths was quite small: for the Nikon 5000 scans, the CV was 0.3 for the large arteriole and 0.4 for the large venule. In comparison, Newsom, Sullivan, Rassam et al. found CVs of 1.5–7.5% for automated vessel diameter measurement and CVs of 6–34% for an observer-driven measurement method.24
In conclusion, the film digitization process did not produce systematic bias that would influence the measurement of vessel caliber or the consequent calculation of summary variables based on individual vessel width measurement. The variations in the vessel caliber were low and lent credence to the reproducibility of the IVAN-based vessel width grading process.
The authors wish to thank Mr. Scott Brandenburg, Ms. Holly Cohn, Ms. Lisa Grady, and Ms. Mary Kay Aprison for their assistance.
Supported by the National Institutes of Health grant EY06594 (R Klein, BEK Klein).
No conflicting relationship exists for any author of this manuscript.