To assess the repeatability and measurement error associated with cone density and nearest neighbor distance (NND) estimates in images of the parafoveal cone mosaic obtained with an adaptive optics scanning light ophthalmoscope (AOSLO).
Twenty-one participants with no known ocular pathology were recruited. Four retinal locations, each approximately 0.65° from the center of fixation, were imaged 10 times in randomized order with an AOSLO. Cone coordinates in each image were identified using an automated algorithm (with or without manual correction), from which cone density and NND were calculated. Owing to naturally occurring fixational instability, the 10 images recorded from a given location did not overlap entirely. We therefore analyzed each image set both before and after alignment.
Automated estimates of cone density on the unaligned image sets showed a coefficient of repeatability of 11,769 cones/mm² (17.1%). The primary reason for this variability appears to be fixational instability, as aligning the 10 images to include the exact same retinal area improved the repeatability to 4,358 cones/mm² (6.4%) using completely automated cone identification software. Repeatability improved further when cones missed by the automated algorithm were identified manually, with a coefficient of repeatability of 1,967 cones/mm² (2.7%). In the automated analyses, NND showed better repeatability than cone density, and it was generally insensitive to undersampling by the automated algorithm.
As our data were collected in a young, healthy population, this likely represents a best-case estimate for corresponding measurements in patients with retinal disease. Similar studies need to be carried out on other imaging systems (including those using different imaging modalities, wavefront correction technology, and/or cone identification software), as repeatability would be expected to be highly sensitive to initial image quality and the performance of cone identification algorithms. Separate studies addressing inter-session repeatability and inter-observer reliability are also needed.
The use of ophthalmoscopes equipped with adaptive optics (AO) enables direct visualization of individual cone and rod photoreceptors in the living human retina.1, 2 The higher transverse resolution provided by AO makes it possible to examine features of the photoreceptor mosaic such as the spatial arrangement of the different spectral types of cone within the mosaic,3, 4 temporal reflectance changes of individual cones and rods,5–9 and even the orientation tuning of individual cones.10 However, the most exciting applications of this imaging technology are perhaps the clinical ones, as AO imaging tools offer the promise of a more sensitive means of characterizing and tracking retinal degeneration than is currently possible with conventional clinical tools. This capability is especially pertinent to conditions for which treatments are available or will soon become available.
Central to the realization of the clinical potential of AO imaging is the development of robust techniques with which to analyze such high-resolution images. The ability to use retinal images to make a determination about whether the photoreceptor mosaic of a particular individual has changed over time, or whether it differs from normal depends, among other things, on the reliability and repeatability of the metric being used. Metrics currently used include cell density,11 mosaic geometry,12, 13 and cell spacing,14, 15 though there remains inconsistency in how these are derived. While numerous studies have examined photoreceptor density and spacing in the normal16–18 and diseased14, 19–23 retina, there have been only a few reports examining the repeatability of such measurements, outlined below.
A recent study by Talcott et al. performed a repeated-measures analysis of cone spacing in three normal eyes and found no significant change in cone spacing over time periods ranging from 16 to 53 months.24 They provide an estimate of error in cone density measurements of 6.3%, which takes into account cone selection/misidentification, spectacle magnification errors, distortion in cone images from eye motion, and the selection of the region of interest for analysis. In a single patient with a red-green color vision defect, Rha et al. observed a 3.9% change in cone density over a period of six years.25 Boretsky et al. reported a standard deviation of less than 1,000 cones/mm² for repeated measures of the same retinal location, though the identification of cone cells was reported to be highly dependent on the confocal pinhole diameter (which would affect the contrast of individual cells) and no additional repeatability statistics were reported.26 Song et al. imaged a single retinal location in one subject at two time points separated by six months and observed cone density estimates from the two sessions within 2%.18 Despite these isolated reports, there remains a pressing need to rigorously define repeatability statistics for cone density measurements in a larger population, in order to facilitate their application to larger clinical studies. In other words, it is difficult to determine whether a significant change has occurred without an estimate of the repeatability of any one measurement. As such, the purpose of the present study was to assess the intrasession repeatability of in vivo cone density measurements based on automated and semi-automated cone identification, and to quantify the measurement error. In addition, we investigated the intrasession repeatability of a metric of cone spacing, mean nearest neighbor distance (NND), also using automated and semi-automated cone identification.
For both metrics, we also assessed the effect of the size of the retinal area sampled, as different sampling strategies are often used by different investigators. These results provide a valuable starting point in the discussion of repeatability, and similar systematic approaches will be required for different systems and cone identification software.
All research followed the tenets of the Declaration of Helsinki and study protocols were approved by the Institutional Review Boards at the Medical College of Wisconsin and Marquette University. Subjects provided informed consent after the nature and possible consequences of the study were explained. Axial length measurements were obtained on all of the subjects using an IOL Master (Carl Zeiss Meditec, Dublin, CA) to calculate the scale of the retinal images. Twenty-one subjects (13 males and 8 females, aged 25.9 ± 6.5 years) were recruited for the study (Table 1). No subjects had any vision-limiting pathology, though one subject (JC_0002) was found to have an inherited color vision deficiency (deuteranopia). While some individuals with color vision defects have been shown to have disrupted cone mosaics,20, 27 this subject was previously shown to have a contiguous cone mosaic of normal density and did not harbor any genetic mutation known to affect cone structure in red-green color vision defects, and was thus included in the present study.
Each subject’s head was stabilized using a chin and forehead rest similar to those found on standard clinical imaging instruments. There was no pupil dilation or control of accommodation using eye drops. A previously described AOSLO was used to image the parafoveal cone mosaic of the right eye.28, 29 The superluminescent diode used for retinal imaging had a wavelength of 775nm, and the field of view was 0.96° × 0.96°. The system’s imaging pupil was 7.75mm; however, the eye’s pupil was undilated and certainly smaller than this. We thus calculated that the 30μm confocal pinhole of our system was one Airy disk diameter or less. Separate image sequences of 150 frames each were acquired at four parafoveal locations, each approximately 0.65° from the center of fixation (Figure 1). The four parafoveal locations were imaged in a random order, with the subject remaining positioned on the chin/forehead rest for each set of four image sequences. Randomization of the imaging order had two potential benefits. First, image quality may be best at the first location imaged, when the tear film might be more evenly distributed across the cornea (though subjects were instructed to blink normally during each imaging set); randomizing the order prevents this advantage from always favoring the same location. Second, randomization mitigates any effect of fixational stability decreasing over the course of the imaging session, for example as a result of fatigue. This procedure was repeated 10 times for each subject, with a short break after each set of four locations. The image acquisition software had an “active blink removal” algorithm, which discarded frames with a mean intensity below a specified threshold. This increased the percentage of frames in the recorded image sequence (always 150 frames) that contained usable retinal image data.
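The active blink removal step can be pictured as a simple intensity gate applied during acquisition. The sketch below is a hypothetical illustration rather than the acquisition software's code: the frame source, the function name `acquire_sequence`, and the threshold value are all assumptions, since the text specifies only that frames with a mean intensity below a set threshold were discarded and that each recorded sequence contained 150 frames.

```python
from typing import Iterable, List

import numpy as np


def acquire_sequence(frame_stream: Iterable[np.ndarray],
                     threshold: float,
                     n_frames: int = 150) -> List[np.ndarray]:
    """Collect frames until `n_frames` have passed the blink gate.

    Hypothetical sketch: a frame whose mean intensity falls below
    `threshold` is treated as a blink and dropped, so the recorded
    sequence has `n_frames` frames if the stream supplies enough
    usable frames (fewer otherwise).
    """
    kept: List[np.ndarray] = []
    for frame in frame_stream:
        if frame.mean() < threshold:
            continue  # dark frame, most likely a blink: discard it
        kept.append(frame)
        if len(kept) == n_frames:
            break
    return kept
```

Because the gate runs while frames are being recorded rather than afterwards, dark frames never occupy one of the fixed 150 slots, which is why it raises the fraction of usable frames in each sequence.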
To correct for intraframe distortions caused by the sinusoidal motion of the resonant optical scanner, we estimated the distortion from stable images of a Ronchi ruling and then resampled each frame of the raw image sequence over a grid of equally spaced pixels. After desinusoiding, a reference frame was manually selected from each image sequence for subsequent registration using custom software. Frames within a given image sequence were registered using a “strip” registration method, in which each frame of interest is divided into strips and each strip is aligned to the location in the reference frame that maximizes the normalized cross-correlation between them.30 Once all the frames were registered, the 40 frames with the highest normalized cross-correlation to the reference frame were averaged to generate a final registered image with an increased signal-to-noise ratio for subsequent analysis.
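The final frame-selection step, ranking registered frames by their normalized cross-correlation with the reference and averaging the best 40, can be sketched as follows. This is an illustrative Python version under stated assumptions: frames are taken to be already desinusoided and strip-registered (the strip registration itself is not reproduced), the correlation is computed as the zero-mean normalized cross-correlation of whole frames at zero lag, and the function names are hypothetical.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized images at zero lag."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def average_best_frames(registered: np.ndarray, reference: np.ndarray, k: int = 40) -> np.ndarray:
    """Average the k registered frames that correlate best with the reference.

    `registered` is assumed to be an (n_frames, height, width) stack that has
    already been desinusoided and registered to `reference`.
    """
    registered = np.asarray(registered, dtype=float)
    scores = np.array([ncc(frame, reference) for frame in registered])
    best = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return registered[best].mean(axis=0)
```

With 150 frames per sequence, k = 40 keeps roughly the best quarter, trading some of the available signal for fewer frames with residual registration errors.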
A total of 840 registered images (21 subjects, four locations each, 10 images at each location) were analyzed. The same retinal area (55μm × 55μm) within the central portion of each image was cropped and used for subsequent analysis of cone density at each location (Figure 1). The cropped images were analyzed in three different ways. First, a completely automated algorithm implemented in Matlab (Mathworks, Natick, MA) was used to identify the cones in each cropped image. This is a modified version of the previously described algorithm of Li & Roorda (2007).12 This algorithm first applies a finite-impulse-response low-pass filter to the retinal image. The original version of the algorithm required manual setting of the filter's cutoff frequency, which dramatically affects the performance of the algorithm. In our study, the filter was determined objectively and automatically from the image itself (by first automatically estimating the modal cone frequency in the image being analyzed). Local maxima were then identified in the filtered image; complete details of the method for applying the filter and identifying local maxima have been published previously12 and were applied in the same way here. The number of cones in each cropped image was divided by the retinal area (0.003025mm²) to derive an estimate of cone density for that image. The (x,y) coordinates of the cones were stored in a text array, and the Delaunay triangulation of the coordinates was obtained. From this triangulation, the built-in dsearch function in Matlab was used to find, for each cone, the distance to the closest other cone in the array (its NND). This is equivalent to the newer nearestNeighbor function.
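The measurement chain described above (low-pass filtering, local-maximum detection, density as cone count divided by area, and NND) can be sketched in Python. This is not the Li & Roorda algorithm: a Gaussian filter stands in for their FIR low-pass filter, its width `sigma_px` is a hypothetical parameter rather than being derived from the modal cone frequency, and the above-mean intensity test is an added assumption to reject maxima in dark background. The NND step uses a SciPy k-d tree instead of Matlab's Delaunay-based `dsearch`; because a point's nearest neighbor is always one of its Delaunay neighbors, both approaches return the same distances.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree


def find_cones(image: np.ndarray, sigma_px: float, min_distance_px: int = 2) -> np.ndarray:
    """Return (x, y) pixel coordinates of putative cones.

    Hypothetical stand-in for the published algorithm: Gaussian low-pass
    filtering followed by local-maximum detection.
    """
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma=sigma_px)
    # A pixel is a local maximum if it equals the maximum of its neighbourhood.
    neighbourhood = 2 * min_distance_px + 1
    peaks = ndimage.maximum_filter(smoothed, size=neighbourhood) == smoothed
    # Assumption: discard maxima in dark background regions.
    peaks &= smoothed > smoothed.mean()
    rows, cols = np.nonzero(peaks)
    return np.column_stack([cols, rows]).astype(float)


def cone_density(n_cones: int, width_um: float, height_um: float) -> float:
    """Cones per mm^2 for a rectangular window given in micrometres."""
    area_mm2 = (width_um * 1e-3) * (height_um * 1e-3)
    return n_cones / area_mm2


def mean_nnd(coords: np.ndarray) -> float:
    """Mean distance from each cone to its closest other cone (same units as coords)."""
    dist, _ = cKDTree(coords).query(coords, k=2)  # column 0 is each cone itself
    return float(dist[:, 1].mean())
```

For example, `cone_density(219, 55.0, 55.0)` gives 219 / 0.003025 mm², about 72,400 cones/mm², and `mean_nnd` returns distances in pixels unless the coordinates are first scaled to micrometers using the axial-length-based image scale.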
We then repeated the analysis, except that in the second analysis the 10 averaged images from a given location were first aligned to one another (using the same strip registration as described above) before the central portion was cropped (see Figure 2 and Supplementary Digital Content 1 available at [LWW insert link]). This ensures that cone density and NND estimates were derived from exactly the same retinal area. The third analysis incorporated manual identification of cones missed by the automated algorithm, using the same aligned image sets as in the second analysis. All manual additions for the 840 aligned and cropped images were performed by the same observer (author JC). The observer was masked to the identity of the images, which were presented in random order. During the manual addition step, the observer adjusted the brightness and contrast of the image to help determine whether a cone was present. While the observer could also remove cones, no such removals were necessary in our image set.
These three analyses were then applied to two additional cropped image sets using smaller sampling windows. As we were interested in the effect of sampling window size, we simply truncated the (x,y) cone coordinate list to retain only those cones falling within a central 40μm × 40μm or 25μm × 25μm square, yielding the 40μm × 40μm and 25μm × 25μm cropped image sets, respectively.
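Truncating the coordinate list to a smaller central window amounts to a mask on the (x, y) array, followed by dividing the surviving count by the new window area. The sketch assumes coordinates in micrometers, a square window centered on `center_um` (for a 55μm crop with its origin at one corner, that center would be (27.5, 27.5)), and inclusive window edges; the function names are hypothetical, and the handling of cones exactly on the boundary is our assumption, since the text does not specify it.

```python
import numpy as np


def crop_to_window(coords_um: np.ndarray, center_um, window_um: float) -> np.ndarray:
    """Keep cones inside a square window of side `window_um` centred on `center_um`.

    Cones lying exactly on the window edge are kept (an assumption).
    """
    half = window_um / 2.0
    offsets = np.abs(coords_um - np.asarray(center_um, dtype=float))
    inside = np.all(offsets <= half, axis=1)
    return coords_um[inside]


def window_density(coords_um: np.ndarray, center_um, window_um: float) -> float:
    """Cone density (cones/mm^2) of the truncated coordinate list."""
    n = len(crop_to_window(coords_um, center_um, window_um))
    return n / (window_um * 1e-3) ** 2
```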
The repeatability for each of the analysis conditions described above was calculated from the within-subject standard deviation (Sw), as outlined by Bland & Altman (1996).31 To estimate Sw, we first calculated the standard deviation of the repeated measures for each subject and squared it to obtain the variance for each subject. The square root of the mean variance across the 21 subjects gives Sw, and repeatability is defined as 2.77 × Sw.31 The 95% confidence interval (CI) for Sw is $S_w \pm 1.96\,S_w/\sqrt{2n(m-1)}$, where n is the number of subjects and m is the number of observations for each subject; multiplying these limits by 2.77 gives the CI for repeatability. Repeatability is reported both in measurement units and as a percentage of the mean. The measurement error is defined as 1.96 × Sw; the difference between a subject’s measurement and the true value would be expected to be less than the measurement error for 95% of observations.
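These statistics follow directly from a subjects × repeats table. The sketch below is an illustrative Python version; the function name and dictionary keys are hypothetical, and using the sample standard deviation (ddof=1) for each subject is our assumption about the variance estimator. The factor 2.77 is 1.96 × √2, because the difference between two measurements has a standard deviation of √2 × Sw.

```python
import numpy as np


def repeatability_stats(measurements) -> dict:
    """Bland-Altman style repeatability from an (n_subjects, m_repeats) table."""
    x = np.asarray(measurements, dtype=float)
    n, m = x.shape
    per_subject_var = x.var(axis=1, ddof=1)      # squared SD of each subject's repeats
    sw = float(np.sqrt(per_subject_var.mean()))  # within-subject SD
    se_sw = sw / np.sqrt(2 * n * (m - 1))        # standard error of Sw
    sw_ci = (sw - 1.96 * se_sw, sw + 1.96 * se_sw)
    return {
        "Sw": sw,
        "repeatability": 2.77 * sw,
        "repeatability_pct": 100.0 * 2.77 * sw / x.mean(),
        "repeatability_ci95": (2.77 * sw_ci[0], 2.77 * sw_ci[1]),
        "measurement_error": 1.96 * sw,
    }
```

As a consistency check against the values reported in the results, a measurement error of 8,328 cones/mm² implies Sw = 8,328 / 1.96 ≈ 4,249 cones/mm², and 2.77 × 4,249 ≈ 11,770 cones/mm², matching the reported repeatability of 11,769 cones/mm² to within rounding.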
Figure 3 shows representative images of the parafoveal cone mosaic (~0.65° eccentricity) for all 21 subjects, acquired at the temporal-superior fixation location. As can be seen in the figure, contiguous images of the cone mosaic were obtained in all subjects. Using the completely automated algorithm on the unaligned images, we found an average repeatability for cone density of 11,769 cones/mm², or 17.1%. This means that the difference between two measurements for the same subject would be less than this value for 95% of pairs of observations. The measurement error in this case was 8,328 cones/mm², meaning that the difference between a single measurement and the true value would be expected to be less than this for 95% of observations. Compared to cone density, NND showed better repeatability, 0.29 μm (8.4%), with a measurement error of 0.20 μm. A summary of the repeatability statistics is provided in Table 2 and Table 3. The left panel of Figure 2 shows that, despite the subject being instructed to fixate at a given location 10 times, a slightly different patch of cones was imaged each time. The relatively poor repeatability here is therefore most likely due to fixation being unstable even in “normal” subjects, combined with the rapid change in the density and spacing of the cone mosaic near the fovea. As a result, even small deviations in fixation would produce differences in cone density or NND between successive images.
To account for fixational instability, the 10 images from a given fixation location were first aligned to one another before the central 55μm × 55μm was cropped for analysis. As shown in the right panel of Figure 2, this ensures that exactly the same cones are included in the analysis. As summarized in Table 2, this improves the average repeatability to 4,358 cones/mm², or 6.4%, for the aligned images. In this case, the measurement error was 3,084 cones/mm², which again represents the expected difference between a single measurement and the true value for 95% of observations. For the 55μm × 55μm cropped images, the automated algorithm identified an average of 207 cones; since 4,358 cones/mm² × 0.003025 mm² ≈ 13 cones, the number of cones identified in two measurements of the same location would be expected to differ by fewer than 13 for 95% of pairs of observations. The average repeatability for the NND measurements improved to 0.078μm (2.3%), with a measurement error of 0.055μm (Table 3).
The third analysis allowed the manual addition of cones that were missed by the automated algorithm. Despite good image contrast and resolution, the performance of the automated cone identification algorithm was highly variable, as illustrated in Figure 4. An average of 12 cones per image were manually added across the 840 images analyzed (range = 0–62 cones added), resulting in an average of 219 total cones in the 55μm × 55μm cropped images. The top row of Figure 4 shows an example of an image where the user added no cones; in other words, by the judgment of the user, no cones were missed by the automated algorithm. The middle row of Figure 4 shows an example of an image where the user identified 12 cones missed by the automated algorithm, and the bottom row shows an example where the user identified 62 missed cones. The manual addition step further improves the repeatability of cone density measurements, with an average repeatability of 1,967 cones/mm², or 2.7% (Table 2). For our 55μm × 55μm window, this is equivalent to about 6 cones (1,967 cones/mm² × 0.003025 mm² ≈ 6), indicating that the number of cones identified in two measurements of the same location would differ by fewer than 6 for 95% of pairs of observations. The associated measurement error improves to 1,392 cones/mm², and the average standard deviation for the 10 repeated measures across the 21 subjects was 710 cones/mm².
In contrast to cone density, the NND measurements showed no improvement with manual addition of cones over those obtained on the aligned images using the completely automated algorithm, highlighting the insensitivity of this metric to small amounts of undersampling. The average repeatability for the NND measurements was 0.090μm (2.7%), with a measurement error of 0.064μm (Table 3).
We repeated all of the above analyses on our image sets using two smaller sampling windows, 40μm × 40μm and 25μm × 25μm, chosen to match those reported previously by other groups.16, 18 As the sampling window size decreased, repeatability generally worsened (i.e., the coefficient of repeatability increased) and the measurement error increased for both cone density and NND, though the size of the effect varied. Complete statistical summaries for cone density for the 40μm × 40μm sampling window are given in Table 4, while those for the 25μm × 25μm sampling window are given in Table 5. Table 6 and Table 7 provide similar summaries of the NND measurements. These data illustrate the importance of specifying the size of the sampling window used to derive density and spacing estimates, in order to facilitate comparison across studies.
Accepting that the estimates of cone density and NND obtained using the aligned images with manual addition of cones are more accurate than those based on the completely automated analysis, we can examine the statistics of the normal cone mosaic. Table 8 provides the average cone density and NND for each subject using each of the three sampling window sizes. There was no significant difference in cone density across the three sampling window conditions (p=0.21, repeated measures ANOVA, GraphPad Instat, v3.1a). The average cone density for each subject ranged from 55,165 cones/mm² to 93,604 cones/mm², with a group mean (± SD) of 72,528 ± 8,539 cones/mm² (using the 55μm × 55μm window). This is comparable to previous estimates at this retinal location (~0.65°). For example, Li et al. reported a range from about 64,000 cones/mm² to 98,000 cones/mm² at a comparable eccentricity across 18 subjects.17
As seen in Table 8, there was a significant difference between the NND values across the three sampling window conditions (p<0.0001, repeated measures ANOVA, Bonferroni corrected, GraphPad Instat, v3.1a). This presumably reflects the fact that, as the sampling window decreases in size, the proportion of cones whose nearest neighbor may lie outside the window increases. These edge cones will, on average, increase the NND, because there are only two possible scenarios for any given cone. Either its nearest neighbor resides within the sampling window, or it falls outside the sampling window. If it falls inside the sampling window, the NND value recorded for that cone equals its true NND. If, on the other hand, it falls outside the sampling window, the NND value recorded for that cone is the distance to its closest neighbor within the sampling window, which is necessarily greater than its true NND. While this artifact affects the overall accuracy of NND measurements, it would not be expected to affect the measured repeatability, as each image within a given condition would be expected to have a similar proportion of cones at the edge of that particular sampling window.
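The edge argument can be checked with a toy simulation. The sketch below is illustrative only: it builds a jittered triangular lattice as a stand-in for a parafoveal cone mosaic (the 3.5 μm spacing and 0.3 μm jitter are assumed values, not data from this study) and compares, for cones inside a central window, the NND measured to any cone in the mosaic with the NND measured only to cones inside the window. Because the window's cones are a subset of the mosaic, the windowed NND can never be smaller than the true NND, so its mean is biased upward.

```python
import numpy as np
from scipy.spatial import cKDTree


def jittered_lattice(spacing: float, extent: float, jitter_sd: float, seed: int = 0) -> np.ndarray:
    """Toy cone mosaic (um): triangular lattice with Gaussian positional jitter."""
    rng = np.random.default_rng(seed)
    row_h = spacing * np.sqrt(3) / 2
    pts = np.array([(x + (spacing / 2 if i % 2 else 0.0), y)
                    for i, y in enumerate(np.arange(0.0, extent, row_h))
                    for x in np.arange(0.0, extent, spacing)])
    return pts + rng.normal(0.0, jitter_sd, size=pts.shape)


def nnd_to_others(points: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Distance from each point to its nearest other point in `reference`.

    `points` must be a subset of `reference`, so column 0 of the query
    (distance 0, the point itself) is skipped.
    """
    dist, _ = cKDTree(reference).query(points, k=2)
    return dist[:, 1]


def edge_bias(mosaic: np.ndarray, center: np.ndarray, window_um: float):
    """Return (mean true NND, mean windowed NND, fraction of cones with inflated NND)."""
    inside = mosaic[np.all(np.abs(mosaic - center) <= window_um / 2, axis=1)]
    true_nnd = nnd_to_others(inside, mosaic)    # neighbours anywhere in the mosaic
    window_nnd = nnd_to_others(inside, inside)  # neighbours restricted to the window
    return true_nnd.mean(), window_nnd.mean(), float(np.mean(window_nnd > true_nnd))


if __name__ == "__main__":
    mosaic = jittered_lattice(3.5, 150.0, 0.3, seed=1)
    for w in (55.0, 40.0, 25.0):
        print(w, edge_bias(mosaic, np.array([75.0, 75.0]), w))
```

One would expect the fraction of affected cones to grow as the window shrinks, since a smaller square has a larger perimeter relative to its area; the exact numbers depend on the assumed spacing and jitter.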
Using undilated pupils, we obtained images of the contiguous cone mosaic in 21 subjects with an AOSLO at four locations, each approximately 0.65° from the center of fixation. We used automated and/or manual approaches to identify the cones in each image, from which cone density and NND were calculated. These data represent an important first step in assessing the broader clinical utility of such measurements, specifically with regard to determining whether a given mosaic has changed over time or whether a given mosaic differs significantly from another or from a population mean. There are a number of important limitations and caveats to our study that we review here, with the goal of stimulating further work on this issue so as to accelerate the development of robust image analysis tools for in vivo images of the photoreceptor mosaic.
First, our images were acquired close to the fovea (within about 200 μm), where a number of studies have shown cone density to change most rapidly.17, 18, 32 In the periphery, where cone density is more uniform, one would expect repeatability to be less affected by fixational instability, and the difference between the automated analyses with and without prior alignment of the successive images to be smaller.
A second issue relates to the fact that we examined only the cone mosaic. As has been shown recently, it is now possible to image the rod mosaic.2, 29, 33 Unlike the cone mosaic, which appears to reach an asymptotic density beyond about 5mm, rod density changes throughout the retina, first increasing sharply with distance from the fovea and then decreasing beyond the rod-rim.32 As a result, the same negative effect that small misalignments between images have on the repeatability of parafoveal cone density estimates would exist for estimates of peripheral rod density. Thus, we conclude that obtaining the highest intersession repeatability requires precise alignment of images from each session, or some other means of ensuring that the images are from the exact same retinal location. Failing to do so severely limits the sensitivity of the corresponding photoreceptor density measurements.
Another important issue to consider relates to the use of cone density and NND as our image metrics. While our NND measurements were less sensitive to undersampling (i.e., missed cones) than our estimates of cone density (Table 3), it has been shown previously that measures of cone spacing based on an exclusion radius are even less sensitive to undersampling.34, 35 Such insensitivity can be viewed as either an advantage or a disadvantage. From the point of view of developing image processing tools to find cones in an image, spacing metrics relax the requirement that such a tool find every cell in the image. From an image interpretation point of view, however, finding “normal” cone spacing in an image in no way ensures that the image in its entirety is “normal”, so spacing measures may overestimate the global health of the photoreceptor mosaic. For example, a mosaic with sporadic loss of cones would be flagged as having normal spacing but abnormal density. To use density, one needs to be sure that every cell remaining in the mosaic can be reliably visualized. Likewise, any analysis of the geometry of the mosaic (e.g., Voronoi analysis) requires that every cell present be visualized. As suggested by Chen et al.,36 cone spacing (and, conversely, cone density) should each be considered only one aspect of image analysis. Perhaps more importantly, combining different mosaic metrics (both local and global) will provide a more comprehensive picture of the overall integrity of the mosaic.
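The point that sporadic cone loss can leave spacing looking normal while density falls can be illustrated with a second toy simulation. The sketch assumes a perfect triangular lattice with 3.5 μm spacing and removes about 10% of cones at random; both values are illustrative assumptions, not measurements. NND is measured to any remaining cone in the mosaic, so the window edge does not bias it. In a triangular lattice each cone has six equidistant neighbors, so a surviving cone's NND changes only if all six are removed, whereas density drops in proportion to the loss.

```python
import numpy as np
from scipy.spatial import cKDTree


def triangular_lattice(spacing: float, extent: float) -> np.ndarray:
    """Perfect triangular lattice of points (um) covering roughly [0, extent]^2."""
    row_h = spacing * np.sqrt(3) / 2
    return np.array([(x + (spacing / 2 if i % 2 else 0.0), y)
                     for i, y in enumerate(np.arange(0.0, extent, row_h))
                     for x in np.arange(0.0, extent, spacing)])


def window_metrics(points: np.ndarray, center: np.ndarray, window_um: float):
    """Cone density (cones/mm^2) and mean NND (um) for a square window.

    NND is measured to any remaining cone in the mosaic, not just those in
    the window, so window-edge effects do not bias it.
    """
    inside = np.all(np.abs(points - center) <= window_um / 2, axis=1)
    density = inside.sum() / (window_um * 1e-3) ** 2
    dist, _ = cKDTree(points).query(points[inside], k=2)  # column 0 is the cone itself
    return float(density), float(dist[:, 1].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mosaic = triangular_lattice(3.5, 150.0)
    center = np.array([75.0, 75.0])
    before = window_metrics(mosaic, center, 55.0)
    survivors = mosaic[rng.random(len(mosaic)) >= 0.10]  # drop ~10% of cones at random
    after = window_metrics(survivors, center, 55.0)
    print("density before/after:", before[0], after[0])
    print("mean NND before/after:", before[1], after[1])
```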
In conclusion, we have defined the repeatability of parafoveal cone density measurements, and the associated measurement error, for our AOSLO system and accompanying semi-automated cone identification software. Repeatability would be expected to differ from system to system depending on image quality and on the individual being imaged; thus, these results should not be generalized to other research or commercial AO systems, though our data provide a useful starting point for the discussion of reliability and repeatability. Our data also demonstrate the importance of specifying the size of the sampling window, as this can affect the repeatability and/or absolute values of cone density and NND. For multicenter clinical trials, it will be important to demonstrate comparable repeatability across systems, as well as to establish inter-session repeatability and inter-observer reliability. Equally important is the development of normative databases against which measurements of the cone mosaic in diseased retinas can be compared. The growing databases of cone spacing14, 22, 36 and cone density16–18 will need to be expanded to include information about the rod mosaic, and the repeatability of the measurements used to construct them will need to be defined.
Shown are unaligned (left) and aligned (right) image sequences of the 10 images acquired using the temporal-inferior fixation location for JC_0616. The white box depicts a 55μm × 55μm sampling window, demonstrating how different photoreceptors are sampled in each of the 10 images in the unaligned condition, while in the aligned image sequence, the exact same photoreceptors are analyzed in each of the 10 images. Scale bar is 50μm. (.avi file).
J. Carroll is the recipient of a Career Development Award from Research to Prevent Blindness. A. Dubra is the recipient of a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. This study was supported by NIH Grants P30EY001931, T32EY014537, R01EY017607, UL1RR031973, The Gene and Ruth Posner Foundation, Foundation Fighting Blindness, RD and Linda Peters Foundation, and an unrestricted departmental grant from Research to Prevent Blindness. This investigation was conducted in a facility constructed with support from Research Facilities Improvement Program Grant Number C06 RR-RR016511 from the National Center for Research Resources, National Institutes of Health. The authors would like to thank Charlie Fields for designing the patient interface, and Austin Roorda & Kaccie Li for access to the Matlab code of their cone identification algorithm.