To assess the efficacy of automated “disease/no disease” grading for diabetic retinopathy within a systematic screening programme.
Anonymised images were obtained from consecutive patients attending a regional primary care based diabetic retinopathy screening programme. A training set of 1067 images was used to develop automated grading algorithms. The final software was tested using a separate set of 14 406 images from 6722 patients. The sensitivity and specificity of manual and automated systems operating as “disease/no disease” graders (detecting poor quality images and any diabetic retinopathy) were determined relative to a clinical reference standard.
The reference standard classified 8.2% of the patients as having ungradeable images (technical failures) and 62.5% as having no retinopathy. Detection of technical failures or any retinopathy was achieved by manual grading with 86.5% sensitivity (95% confidence interval 85.1 to 87.8) and 95.3% specificity (94.6 to 95.9) and by automated grading with 90.5% sensitivity (89.3 to 91.6) and 67.4% specificity (66.0 to 68.8). Manual and automated grading detected 99.1% and 97.9%, respectively, of patients with referable or observable retinopathy/maculopathy. Manual and automated grading detected 95.7% and 99.8%, respectively, of technical failures.
Automated “disease/no disease” grading of diabetic retinopathy could safely reduce the burden of grading in diabetic retinopathy screening programmes.
Diabetic retinopathy is a major cause of visual impairment in Europe.1,2,3 Systematic screening for diabetic retinopathy in people with diabetes has been shown to be cost‐effective and an essential component of their care.4,5,6,7,8 An ageing population, sedentary lifestyles and obesity all contribute to the increasing prevalence of type 2 diabetes. The number of people with diabetes is expected to double in the next 15 to 30 years.9 Delivery of a quality assured, systematic screening programme is a major challenge for health care providers. At the conference “Screening for Diabetic Retinopathy in Europe – 15 years after St. Vincent” on 17–18 November 2005 in Liverpool,10 official national representatives of 29 European countries declared that European countries should:
“Reduce the risk of visual impairment due to diabetic retinopathy by 2010 through:
‐ a systematic programme of screening reaching at least 80% of the population with diabetes;
‐ using trained professionals and personnel;
‐ universal access to laser therapy.”
Assuming a 4% prevalence of diabetes, governments in Europe might have to offer screening to 35 million people.11
In our experience, 60% of patients screened have no retinopathy.12 To make best use of limited resources, it has been proposed that the assessment of image quality and the presence or absence of any diabetic retinopathy could be performed by relatively inexperienced “disease/no disease” graders after a short period of training. Experienced “full disease” graders would then identify patients, deemed to have retinopathy, for referral to ophthalmology.13,14,15
Microaneurysms appear as red dots on colour photographs and are a sensitive sign of diabetic retinopathy.16 A number of automated systems for detecting microaneurysms in digital fundus photographs have been evaluated.16,17,18,19 However, before these systems could operate in screening practice they would require more thorough validation and the addition of a system for detecting images with ungradeable quality.
The aim of this study was to compare a combined automated image quality and “disease/no disease” grading system with the existing manual system in a diabetic retinopathy systematic screening programme.
Anonymised images forming a test set were obtained prospectively from consecutive patients attending the Grampian Diabetes Retinal Screening Programme in 2003–2004 in North‐East Scotland. This consisted of 14 406 images from 6722 consecutive patients. The median age was 65 years (IQR 19 years), 3725 (55%) were male and 46 (0.7%) had only one screenable eye. A total of 1423 patients (21.2%) received mydriasis as adequate quality photographs were not obtained with undilated photography. Ethical approval was obtained from the Grampian Medical Research Ethics Committee for the use of the anonymised images and grading data.
The programme used non‐mydriatic 45‐degree fixed Canon CR5‐45NM and mobile Canon CR6‐45NM fundus cameras (Canon Inc. Medical Equipment Business Group, Kanagawa, Japan) attached to Canon D30 digital colour cameras (2160×1440 pixels). Each patient had a minimum of one disc/macula photograph per screenable eye.13 The photographers took additional photographs if image quality or field definition was considered inadequate at the time of photography. If the photograph was inadequate because the pupil was too small then the pupils were dilated using Tropicamide 1% (Chauvin Pharmaceuticals Ltd, Kingston‐upon‐Thames, UK).
The test set images were displayed on a contrast‐adjusted 22 inch monitor (Iiyama Vision Master Pro 512) at full resolution and graded by the clinical research fellow using the definitions of table 1. The quality grading included assessments of field definition and image clarity. The clinical research fellow was trained to use the Scottish Diabetic Retinopathy Grading Scheme (table 1) with 500 images from a training set. This is based on the ETDRS grading scheme, adapted for non‐stereoscopic single disc/macula field photographs, and very similar to the “International clinical diabetic retinopathy and diabetic macular edema severity scale”.14
There was good agreement between the clinical research fellow and the clinical director for retinal screening outcome (κ=0.91), grade of retinopathy (κ=0.85) and grade of maculopathy (κ=0.85) for the training set.
This study assessed the efficacy of the manual and automated “disease/no disease” grading systems against the reference standard (fig 1). In the manual system, patients were referred for “full disease” grading if the images of either eye were of inadequate quality or if any diabetic retinopathy was detected. The automated system referred patients for “full disease” grading if all images of one eye were of inadequate quality or if dot haemorrhage/microaneurysms (DH/MA) were detected.
The manual grading of the retinal images was performed by three retinal screeners who also performed the photography. They were trained according to the recommendations of the HTBS.13 Manual graders recorded one grade for each eye.
The automated grading system consisted of software for image quality assessment and for DH/MA detection. The software was developed using a training set that was separate from the test set. Details of the image processing methods are described elsewhere.17,24,25,26,27 Quality assessment was performed in two steps, field definition assessment and clarity assessment. Field definition was assessed by ascertaining the location of the optic disc and the fovea, and whether both temporal arcades were visible. The position of the optic disc was found by identifying a circular shape lying at the apex of an approximate ellipse formed by the temporal arcades. The fovea was identified as a dark area close to a point at a fixed distance from the optic disc and on the centreline of the elliptical temporal arcades. Image clarity was assessed by checking the length of visible small vessels within a circle of radius of 1.75 optic disc diameters around the fovea. An image was judged to have adequate quality for retinopathy detection if it had adequate field definition and adequate clarity. The location, in the image, of the optic disc and the orientation of the course of the temporal arcades was used to decide whether an image was of the left or right eye.
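The quality decision described above reduces to a conjunction of the two checks. The following is an illustrative sketch only: the function name and the vessel-length threshold are hypothetical and do not come from the authors' software.

```python
def image_quality_adequate(field_definition_ok: bool,
                           visible_vessel_length_px: float,
                           clarity_threshold_px: float = 500.0) -> bool:
    """An image is usable for retinopathy detection only if BOTH checks pass:
    - field definition: optic disc, fovea and temporal arcades located
      as described in the text;
    - clarity: enough small-vessel length visible within a circle of radius
      1.75 disc diameters around the fovea (threshold here is illustrative).
    """
    clarity_ok = visible_vessel_length_px >= clarity_threshold_px
    return field_definition_ok and clarity_ok
```

A patient passes quality assessment when at least one image per screenable eye satisfies this predicate.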
If there was at least one image of adequate quality for each eye, then DH/MA detection was applied to all images of adequate quality for that patient. Dark objects of limited size were detected using techniques from mathematical morphology and used as DH/MA candidates. As many of these are not real DH/MAs, features were evaluated on each candidate such as area, eccentricity, mean intensity and mean intensity gradient. The features were used to classify candidates as true or false identifications. DH/MA detection was improved by recent developments that include explicit vessel detection and analysis of the background retina around candidates.26
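As a rough illustration of the morphological step, a bottom-hat transform (grayscale closing minus the image) highlights dark objects no larger than the structuring element, which is a standard way to generate small dark-lesion candidates. This sketch is not the authors' implementation; the kernel size and threshold are invented for illustration.

```python
import numpy as np

def grey_dilate(img: np.ndarray, k: int) -> np.ndarray:
    # Naive grayscale dilation with a k x k square structuring element.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def grey_erode(img: np.ndarray, k: int) -> np.ndarray:
    # Naive grayscale erosion with a k x k square structuring element.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

def dark_candidates(img: np.ndarray, k: int = 5, threshold: int = 20) -> np.ndarray:
    """Bottom-hat transform: closing minus image. Small dark objects
    (DH/MA candidates) no larger than the k x k element are highlighted;
    each candidate would then be classified using features such as area,
    eccentricity and mean intensity gradient."""
    closing = grey_erode(grey_dilate(img, k), k)
    return (closing.astype(int) - img.astype(int)) > threshold
```

In practice the candidate mask is only the first stage; as the text notes, most candidates are false and must be filtered by a feature-based classifier.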
The operating point of DH/MA detection can be adjusted to alter its sensitivity, an increase in sensitivity being accompanied by a decrease in specificity. Due to inter‐eye concordance, the sensitivity must be adjusted according to the results per patient rather than the results per eye. We used an operating point that gave a higher sensitivity for detection of patients with any retinopathy than that of manual graders.
Data were analysed using the Statistical Package for Social Sciences (SPSS, V.13.0). Sensitivity and specificity are presented for manual and automated grading with confidence intervals calculated using the Wilson score method without continuity correction. Overall agreements were measured using the kappa (κ) statistic. McNemar tests were used to compare positive agreement between the automated and manual grading.
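The Wilson score interval without continuity correction can be computed directly from the number of successes and the sample size; this standalone sketch uses z = 1.96 for 95% intervals, matching the method named above.

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval (no continuity correction) for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half, centre + half
```

Unlike the simple normal approximation, the Wilson interval never extends below 0 or above 1, which matters for the high sensitivities reported here.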
Tables 2–5 compare the performance of the manual and automated “disease/no disease” grading systems against the reference standard. Table 2 shows frequencies of patients with each grade of retinopathy as assigned by the reference standard process. Table 2 also shows detection rates and the number of missed cases for each retinopathy grade and for each “disease/no disease” grading system. The number of patients misclassified as normal by the automated system (n=240) was significantly lower than for the manual system (n=341) (p<0.001).
Table 2 shows that three of the 330 patients with referable or observable retinopathy/maculopathy (M1, R2, M2, R3 or R4) were graded as having no retinopathy by the manual system and that seven were graded as having no retinopathy by the automated system. The difference is not statistically significant (p=0.125). None of the non‐referred patients had new vessel growth. One patient (M2) had solitary minor exudate formation and was missed by both systems. Six other patients (two M1, four M2) were missed by the automated system: two had solitary blot haemorrhages, one had a small solitary linear streak of exudate and three had small, scattered exudates. The manual system missed two additional patients, one with isolated intra‐retinal microvascular anomalies (R3) and one with microaneurysms and small exudates (M1).
Table 3 presents data in a similar format but for detection rates in eyes and images. The manual and automated systems missed 10 eyes and 26 eyes, respectively, out of 465 eyes with referable or observable retinopathy/maculopathy. In terms of images, the automated system missed 28 out of 527. It was not possible to obtain detection rates for the manual system by image as the manual graders provided a composite grade for each eye. Two eyes from two patients graded as R4 but missed by the automated system were found, at clinical review, to have mild background diabetic retinopathy and disc collaterals resulting from old retinal vein occlusions.
Tables 4 and 5 show sensitivities and specificities for both grading systems. In table 4, the results are presented by patient and, in table 5, by eye and by image. As for table 3, it was not possible to obtain results by image for the manual system. Technical failure sensitivities and specificities are for the quality assessment stage of the two systems alone and take no account of disease. Sensitivities and specificities for detection of any “disease” in patients with no technical failure are based on the subset of cases where a decision on the presence of retinopathy was made by both the reference standard and by the grading system under assessment. They therefore ignore cases deemed as technical failure by either the reference standard or by the grading system under assessment. Sensitivities and specificities for referral to “full disease” grading are based on a “disease/no disease” grading result combining technical failure and any “disease” detection. They indicate the proportions of cases that were correctly determined as being normal or abnormal by the grading system under assessment.
The results reported in this paper suggest that automated image grading offers diabetic retinopathy screening programmes an opportunity to safely reduce the manual burden of grading.
The automated “disease/no disease” grading software was more sensitive but less specific than its manual equivalent. As “disease/no disease” grading is only part of the grading process, the overall specificity is unlikely to be affected.
The main function of any grading system is to identify those patients who require referral to ophthalmology or who are not suitable for photographic screening. As well as being effective, automated “disease/no disease” grading has the potential to reduce the human workload of grading. Manual “disease/no disease” grading and “full disease” graders undertook, respectively, 6722 and 2545 grading episodes (including 164 cases referred to “full disease” grading due to concerns about non‐diabetic eye disease) totalling 9267 episodes. In comparison, a system using automated “disease/no disease” grading and manual “full disease” grading would have led to only 3652 manual grading episodes. This equates to a 60% workload reduction.
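The quoted workload reduction follows directly from the episode counts given above; this short check simply reproduces the arithmetic.

```python
# Manual pathway: 6722 "disease/no disease" episodes plus 2545 "full disease"
# episodes (including 164 referrals for suspected non-diabetic eye disease).
manual_episodes = 6722 + 2545  # 9267 in total

# Automated "disease/no disease" grading would leave only the manual
# "full disease" episodes it refers on.
automated_pathway_episodes = 3652

reduction = 1 - automated_pathway_episodes / manual_episodes
# reduction is about 0.606, consistent with the quoted 60% cut
# in manual grading episodes.
```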
The current system had higher sensitivity and lower specificity than reported in our previous work, but that work had been confined to good quality images.17 Other studies have reported higher sensitivities and specificities, but they used small numbers of patients and did not include automated quality assessment.18,19,28,29,30 The only comparable study used retinal photographs, manually graded for quality, from 773 patients. Candidate bright and dark lesions were identified by image analysis and features classified by a neural network. The authors' recommended operating point gave a sensitivity of 94.8% and specificity of 52.8%.30
In our study, automated grading missed five cases of referable retinopathy/maculopathy (table 2). While this is a concern, the main identifiable source of false negatives in the grading pathway was “full disease” grading, accounting for 18 missed cases of referable eye disease in the test set (16 of referable maculopathy and 2 of severe background diabetic retinopathy). The two eyes graded as having proliferative retinopathy and missed by the automated system (table 3) were found to have only mild diabetic retinopathy at eye clinic examination. The automated system detected both patients due to retinopathy in the opposite eye.
The demographic profile of the patients was similar to that reported in the Scottish Diabetes Survey 2003.31 The prevalence of referable retinopathy is comparable to that reported from Tayside (3.0%) and Newcastle (4.5%) but lower than that reported from Liverpool (7.1%) and Cheltenham (12.2%).21,31,32,33,34 As with many screening programmes, the main indications for additional slit lamp examination (883 patients, 13.1%) were people with ungradeable images (553, 8.2%). Although the population studied was predominantly Caucasian, racial variations in pigmentation mainly affect the reflectance of blue and red light from the retina causing differences in retinal colour. Our automated software uses mainly the green plane of the image, which we would expect to be more affected by photographic conditions than by racial variations in retinal pigmentation.
We have presented results by patient, by eye and by image to enable comparison with other grading systems. As was expected, due to concordance between patient eyes, automated detection rates for images and eyes had lower sensitivities and higher specificities than the corresponding detection rates for patients.
The automated methods tested in this study are adaptable to local photographic procedures and equipment. For example, photographic protocols requiring a higher number of images per eye could result in a higher sensitivity and lower specificity per patient so that the operating point for the sensitivity of DH/MA detection would need to be altered if similar results to those presented here are desired. Variations in photographic scale can be handled by scaling of images to a standard number of pixels per degree. There are also local variations in requirements pertaining to retinal field of view. This would affect the field definition aspect of automated image quality assessment but can be adjusted by straightforward modification of software parameters.25 Any necessary resizing or selection of software parameters could then be made appropriately for each case.
Automated grading can run almost continuously at a speed depending on the computer system and the number of parallel processors. For example, the average time to process one patient is under 4 min (on a PC with a 3 GHz Intel Pentium processor) whereas manual grading typically requires 6 min.
Automated grading could have a significant impact on the costs of quality assurance, an essential component of systematic screening. The small increase in quality assurance required for “full disease” graders (due to the higher referral rate) would be outweighed by the reduced quality assurance requirements for “disease/no disease” grading even though the automated system would be treated as an additional grader.
In conclusion, automated grading of diabetic retinopathy and image quality could safely reduce the burden of “disease/no disease” grading in diabetic retinopathy screening programmes and could facilitate implementation of screening across Europe.
We would like to thank Lorraine Urquhart and members of her team from the Grampian Retinal Screening Programme for their support.
JAO was the principal investigator. JAO, PFS, KAG, PM and GJP designed the study. SP performed the data collection and reference grading. ADF developed the automated methods. JAO performed the quality assurance. SF and GJP performed the statistical analyses. All participated in the interpretation of the data. SP and ADF wrote the first draft of the paper. All authors reviewed and revised the paper for important intellectual content. JAO takes responsibility for the content.
DH/MA - dot haemorrhage/microaneurysm
Funding: This project was funded by the Chief Scientist Office, Scottish Executive Health Department (grant number CZH/4/76).
Competing interests: Implementation in Scotland is being considered. If this occurs it is likely that there will be some remuneration for the University of Aberdeen, NHS Grampian and the Scottish Executive.
Ethics approval: Ethics approval was obtained from the Grampian Medical Research Ethics Committee for the use of the anonymised images and grading data.