We assessed intergrader agreement between film and digital images in classifying the severity levels of DR, DME, and CSME in a subset of study participants enrolled in DRCR.net studies. In general, agreement between digital and film grading of DR severity was substantial regardless of whether the digital images were obtained using 7 standard ETDRS fields or 4-field, wide-angle images. Agreement of macular edema grading ranged from moderate to substantial.
Recent reports comparing film and digital media for DR grading by the FPRC, following the same procedures, support these study results. Li et al.13 compared pairs of stereoscopic images from 152 eyes of 85 patients across the entire ETDRS scale. They showed 67.8% exact agreement and 96.1% agreement within one step in assessment of DR severity level (weighted κ statistic, 0.86; 95% CI, 0.82–0.90). These images were from a single academic center and were obtained by a single photographer. The similarity to the results found in our present study suggests that rigorous certification of photographers minimizes potential variability between photographers and camera systems. Results were not compared to those in earlier publications of film–digital analyses, which showed poor sensitivity of digital images.12,14
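The three agreement measures quoted above (exact agreement, agreement within one step, and weighted κ) can be illustrated with a minimal Python sketch. It assumes severity levels are coded as consecutive integer steps and uses linear agreement weights; the weighting scheme actually applied by the FPRC is not stated here, so the weights, the function name `agreement_summary`, and the example grades are illustrative assumptions only.

```python
from collections import Counter


def agreement_summary(film, digital, n_levels):
    """Return (exact agreement, within-one-step agreement, weighted kappa).

    film, digital: equal-length lists of integer severity steps in
    0..n_levels-1, one pair per eye. Linear weights are an assumption.
    """
    if len(film) != len(digital) or not film:
        raise ValueError("need two non-empty, equal-length grade lists")
    if n_levels < 2:
        raise ValueError("the scale needs at least two levels")

    n = len(film)
    exact = sum(f == d for f, d in zip(film, digital)) / n
    within_one = sum(abs(f - d) <= 1 for f, d in zip(film, digital)) / n

    # Linear agreement weight: 1 on the diagonal, 0 at the maximum distance.
    def weight(i, j):
        return 1.0 - abs(i - j) / (n_levels - 1)

    pair_counts = Counter(zip(film, digital))
    film_marginal = Counter(film)
    digital_marginal = Counter(digital)

    # Weighted observed agreement and its chance expectation from the marginals.
    observed = sum(weight(i, j) * c for (i, j), c in pair_counts.items()) / n
    expected = sum(
        weight(i, j) * film_marginal[i] * digital_marginal[j]
        for i in film_marginal
        for j in digital_marginal
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when all grades are identical")

    kappa = (observed - expected) / (1.0 - expected)
    return exact, within_one, kappa


# Hypothetical grades for eight eyes on a five-step scale.
film_grades = [0, 1, 2, 3, 4, 2, 1, 0]
digital_grades = [0, 1, 3, 3, 4, 2, 0, 0]
exact, within_one, kappa = agreement_summary(film_grades, digital_grades, 5)
print(f"exact={exact:.2f} within_one={within_one:.2f} kappa={kappa:.2f}")
```

Because the weights give partial credit for near misses, a long scale with many one-step disagreements can still yield a high weighted κ even when exact agreement is modest, which is the pattern seen in the figures above.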
Digital cameras have improved substantially in recent years, and post hoc manipulation of color balance and illumination has further enhanced the quality of digital images. Li et al.15 compared a film image evaluation with three different formats of digital images acquired from different camera types. They noted substantial agreement between all digital formats and film in the evaluation of diabetic retinopathy severity level. Additionally, they noted that severity assessment remained the same, even without stereoscopic images. In the present study, disagreements on DR severity did not appear to result from one medium systematically yielding more severe grades, as the discrepancies were distributed fairly equally in both directions. Distribution of DR severity grading on film and digital images appeared similar (P for marginal homogeneity = 0.45). There were five images in which the photograph pairs were more than two steps apart. Side-by-side comparisons of these sets showed that four of the five images had different field definition, leading to a lesion being seen on one medium and not the other. The lesion was visible on two film images and on two digital images.
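Whether discordant grades lean toward one medium can be checked with a simple exact sign test, sketched below in Python. This is a deliberately simpler stand-in, not the marginal homogeneity test reported above (whose exact form is not specified here); the function name `direction_sign_test` and the example grades are hypothetical.

```python
from math import comb


def direction_sign_test(film, digital):
    """Exact two-sided sign test on the direction of discordant pairs.

    Returns (pairs graded higher on film, pairs graded higher on digital,
    p-value). Concordant pairs carry no directional information and are
    ignored.
    """
    if len(film) != len(digital):
        raise ValueError("grade lists must have equal length")

    higher_on_film = sum(f > d for f, d in zip(film, digital))
    higher_on_digital = sum(d > f for f, d in zip(film, digital))
    n = higher_on_film + higher_on_digital
    if n == 0:
        return higher_on_film, higher_on_digital, 1.0

    # Under the null hypothesis each discordant pair is equally likely to
    # favor either medium, so the count follows Binomial(n, 0.5).
    k = min(higher_on_film, higher_on_digital)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)  # double the smaller tail, capped at 1
    return higher_on_film, higher_on_digital, p_value


# Hypothetical grades: two pairs higher on film, two higher on digital.
print(direction_sign_test([1, 2, 3, 4, 2], [2, 1, 4, 3, 2]))
```

A large p-value here is consistent with discrepancies falling about equally on both sides, but unlike a test of marginal homogeneity it ignores how many steps apart the two grades are.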
The agreement rate between film and digital images was similar to that expected with film–film reproducibility for DR severity level. Grader reproducibility for DR severity on film photographs, as published in ETDRS report number 126 (a grading process similar to that used in the present study), was 53% exact agreement, and 88% were within one step (weighted κ statistic, 0.65). Contemporaneous intergrader reproducibility of DR severity level at the FPRC is assessed by a biannual regrading of images across multiple ongoing studies. Results from May 2010 showed 70% exact agreement and 91% agreement within one step (weighted κ statistic, 0.81). These data included a combination of film and digital images graded over the previous 6 months (unpublished results analyzed by the FPRC, 2010).
In the present study, the agreement rates were lower for the ETDRS DME severity scale than for CSME (weighted κ, 0.44 and 0.72, respectively). This difference may have resulted in part from the differing length of the scales (nine and three steps, respectively) and in part from the presence, in most eyes, of either center-involved DME or no DME at all. Li et al.16 compared an evaluation of presence or absence of CSME in a set of 152 paired stereoscopic film and digital images. The κ statistic for identification of CSME was 0.86 (95% CI: 0.77–0.95), somewhat higher than the weighted κ statistic for CSME in the present study. In a previous DRCR.net report using the ETDRS DME severity scale in eyes with a broader range of DME severity, the weighted κ statistic for a film–film comparison was 0.58 (SE 0.05).8
Photographic gradability and quality are important aspects of the grading process, as subtle DR lesions can easily be obscured without optimal illumination, contrast, and color balance.2 In the present study of eyes with DR, with post hoc optimization of digital images, no systematic differences were noted between film and digital photograph quality. The frequency of ungradable images in assessing DR was slightly greater on digital media than on film, though not to the extent that future clinical trial results would be significantly affected. Technical components that can affect the data quality of a digital image are the make, model, and magnification of the camera. Spatial resolution is not an issue for most modern digital cameras, which have a resolution of 6 megapixels or higher, close to that of color film. The FPRC procedure of certifying cameras reduces variability and also allows for “scaling” of images, to attain measurement data that can be compared uniformly across various camera types.
Strengths of the study are the inclusion of a large number of clinical sites, digital systems, and photographers in a clinical trial setting, using measures to maximize the quality of digital images. Limitations include fewer subjects with severe NPDR and mild degrees of DME and a small sample size for comparison of agreement between 4-field wide-angle and modified 7-field fundus cameras. Grading was performed by summary assessment, and therefore the ability of each modality to assess individual retinopathy lesions relative to the other could not be evaluated.
In conclusion, grading results from digital fundus images obtained on 4-field wide or modified 7-field fundus cameras, when optimized in a standardized manner, have substantial to almost perfect agreement with grading obtained from film photographs for assessing DR level. Differences between gradings of DR severity obtained on film and digital images may be similar to differences in replicate gradings of film images. Digital fundus photographs are suitable for clinical trials where DR level is an outcome variable.