The results reported in this paper suggest that automated image grading offers diabetic retinopathy screening programmes an opportunity to safely reduce the manual burden of grading.
The automated “disease/no disease” grading software was more sensitive but less specific than its manual equivalent. As “disease/no disease” grading is only part of the grading process, the overall specificity is unlikely to be affected.
The main function of any grading system is to identify those patients who require referral to ophthalmology or who are not suitable for photographic screening. As well as being effective, automated “disease/no disease” grading has the potential to reduce the human workload of grading. Manual “disease/no disease” grading and “full disease” graders undertook, respectively, 6722 and 2545 grading episodes (including 164 cases referred to “full disease” grading due to concerns about non‐diabetic eye disease) totalling 9267 episodes. In comparison, a system using automated “disease/no disease” grading and manual “full disease” grading would have led to only 3652 manual grading episodes. This equates to a 60% workload reduction.
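The workload figures above can be verified with simple arithmetic; the following minimal sketch uses only the episode counts quoted in the text:

```python
# Worked check of the grading-workload figures quoted above
# (all counts are taken directly from the text).
manual_dnd = 6722    # manual "disease/no disease" grading episodes
manual_full = 2545   # manual "full disease" grading episodes
total_manual = manual_dnd + manual_full   # 9267 episodes in the manual system

hybrid_manual = 3652  # manual episodes remaining once "disease/no disease"
                      # grading is automated

reduction = 1 - hybrid_manual / total_manual
print(f"manual episodes: {total_manual}, hybrid: {hybrid_manual}")
print(f"workload reduction: {reduction:.1%}")  # ~60%, as stated in the text
```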
The current system had higher sensitivity and lower specificity than reported in our previous work, but that earlier work was confined to good quality images.17
Other studies have reported higher sensitivities and specificities, but they used small numbers of patients and did not include automated quality assessment.18,19,28,29,30
The only comparable study used retinal photographs, manually graded for quality, from 773 patients. Candidate bright and dark lesions were identified by image analysis and features classified by a neural network. The authors' recommended operating point gave a sensitivity of 94.8% and specificity of 52.8%.30
In our study, automated grading missed five cases of referable retinopathy/maculopathy (table 2). While this is a concern, the main identifiable source of false negatives in the grading pathway was “full disease” grading, which accounted for 18 missed cases of referable eye disease in the test set (16 of referable maculopathy and 2 of severe background diabetic retinopathy). The two eyes graded as having proliferative retinopathy and missed by the automated system (table 3) were found to have only mild diabetic retinopathy at eye clinic examination. The automated system detected both patients due to retinopathy in the opposite eye.
The demographic profile of the patients was similar to that reported in the Scottish Diabetes Survey 2003.31
The prevalence of referable retinopathy is comparable to that reported from Tayside (3.0%) and Newcastle (4.5%) but lower than that reported from Liverpool (7.1%) and Cheltenham (12.2%).21,31,32,33,34
As with many screening programmes, the main indication for additional slit lamp examination (883 patients, 13.1%) was ungradeable images (553 patients, 8.2%). Although the population studied was predominantly Caucasian, racial variations in pigmentation mainly affect the reflectance of blue and red light from the retina, causing differences in retinal colour. Our automated software uses mainly the green plane of the image, which we would expect to be more affected by photographic conditions than by racial variations in retinal pigmentation.
We have presented results by patient, by eye and by image to enable comparison with other grading systems. As expected, owing to concordance between a patient's two eyes, automated detection rates for images and eyes had lower sensitivities and higher specificities than the corresponding detection rates for patients.
The automated methods tested in this study are adaptable to local photographic procedures and equipment. For example, photographic protocols requiring a higher number of images per eye could result in higher sensitivity and lower specificity per patient; the operating point for the sensitivity of DH/MA detection would then need to be altered to reproduce the results presented here. Variations in photographic scale can be handled by scaling images to a standard number of pixels per degree. There are also local variations in requirements pertaining to retinal field of view; these would affect the field definition aspect of automated image quality assessment but can be accommodated by straightforward modification of software parameters.25
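The scaling step mentioned above can be sketched as follows. This is a minimal illustration only: the `target_ppd` value is hypothetical (the study's actual normalisation parameters are not given here), and nearest-neighbour resampling is used purely to keep the sketch dependency-light.

```python
import numpy as np

def rescale_to_standard(image, pixels_per_degree, target_ppd=20.0):
    """Resample a fundus image to a standard angular resolution.

    `target_ppd` is an illustrative value, not the figure used in the
    study. Nearest-neighbour resampling keeps this sketch simple; a
    real pipeline would use proper interpolation (e.g. bilinear).
    """
    scale = target_ppd / pixels_per_degree
    h, w = image.shape[:2]
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Map each output pixel back to its nearest source pixel.
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    return image[np.ix_(rows, cols)]
```

Under this scheme, images captured at different scales are brought to a common resolution before lesion detection, so the same detection parameters apply regardless of camera.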
Any necessary resizing or selection of software parameters could then be made appropriately for each case.
Automated grading can run almost continuously, at a speed that depends on the computer system and the number of parallel processors. For example, the average time to process one patient is under 4 min (on a PC with a 3 GHz Intel Pentium processor), whereas manual grading typically requires 6 min.
Automated grading could have a significant impact on the costs of quality assurance, an essential component of systematic screening. The small increase in quality assurance required for “full disease” graders (due to the higher referral rate) would be outweighed by the reduced quality assurance requirements for “disease/no disease” grading even though the automated system would be treated as an additional grader.
In conclusion, automated grading of diabetic retinopathy and image quality could safely reduce the burden of “disease/no disease” grading in diabetic retinopathy screening programmes and could facilitate implementation of screening across Europe.