Br J Ophthalmol. Nov 2007; 91(11): 1512–1517.
Published online May 15, 2007. doi: 10.1136/bjo.2007.119453
PMCID: PMC2095421
The efficacy of automated “disease/no disease” grading for diabetic retinopathy in a systematic screening programme
S Philip, A D Fleming, K A Goatman, S Fonseca, P McNamee, G S Scotland, G J Prescott, P F Sharp, and J A Olson
S Philip, Biomedical Physics and Grampian Retinal Screening Programme, University of Aberdeen, Foresterhill, Aberdeen
A D Fleming, K A Goatman, P F Sharp, Biomedical Physics, University of Aberdeen, Foresterhill, Aberdeen
S Fonseca, G J Prescott, Department of Public Health, University of Aberdeen, Foresterhill, Aberdeen
P McNamee, G S Scotland, Health Economics Research Unit, University of Aberdeen, Foresterhill, Aberdeen
J A Olson, Retinal Screening, David Anderson Building, Foresterhill Road, Aberdeen
Correspondence to: Dr John A Olson
Clinical Director, Diabetes Retinal Screening Service, David Anderson Building, Foresterhill Road, Aberdeen AB25 2ZP; John.olson@nhs.net
Accepted May 4, 2007.
Aim
To assess the efficacy of automated “disease/no disease” grading for diabetic retinopathy within a systematic screening programme.
Methods
Anonymised images were obtained from consecutive patients attending a regional primary care based diabetic retinopathy screening programme. A training set of 1067 images was used to develop automated grading algorithms. The final software was tested using a separate set of 14 406 images from 6722 patients. The sensitivity and specificity of manual and automated systems operating as “disease/no disease” graders (detecting poor quality images and any diabetic retinopathy) were determined relative to a clinical reference standard.
Results
The reference standard classified 8.2% of the patients as having ungradeable images (technical failures) and 62.5% as having no retinopathy. Detection of technical failures or any retinopathy was achieved by manual grading with 86.5% sensitivity (95% confidence interval 85.1 to 87.8) and 95.3% specificity (94.6 to 95.9) and by automated grading with 90.5% sensitivity (89.3 to 91.6) and 67.4% specificity (66.0 to 68.8). Manual and automated grading detected 99.1% and 97.9%, respectively, of patients with referable or observable retinopathy/maculopathy. Manual and automated grading detected 95.7% and 99.8%, respectively, of technical failures.
Conclusion
Automated “disease/no disease” grading of diabetic retinopathy could safely reduce the burden of grading in diabetic retinopathy screening programmes.
Diabetic retinopathy is a major cause of visual impairment in Europe.1,2,3 Systematic screening for diabetic retinopathy in people with diabetes has been shown to be cost‐effective and is an essential component of their care.4,5,6,7,8 An ageing population, sedentary lifestyles and obesity all contribute to the increasing prevalence of type 2 diabetes, and the number of people with diabetes is expected to double in the next 15 to 30 years.9 Delivery of a quality assured, systematic screening programme is therefore a major challenge for health care providers. At the conference “Screening for Diabetic Retinopathy in Europe – 15 years after St. Vincent”, held on 17–18 November 2005 in Liverpool,10 official national representatives of 29 European countries declared that European countries should:
“Reduce the risk of visual impairment due to diabetic retinopathy by 2010 through:
‐ a systematic programme of screening reaching at least 80% of the population with diabetes;
‐ using trained professionals and personnel;
‐ universal access to laser therapy.”
Assuming a 4% prevalence of diabetes, governments in Europe might have to offer screening to 35 million people.11
In our experience, 60% of patients screened have no retinopathy.12 To make best use of limited resources, it has been proposed that the assessment of image quality and of the presence or absence of any diabetic retinopathy could be performed by relatively inexperienced “disease/no disease” graders after a short period of training. Experienced “full disease” graders would then identify which of the patients deemed to have retinopathy require referral to ophthalmology.13,14,15
Microaneurysms appear as red dots on colour photographs and are a sensitive sign of diabetic retinopathy.16 A number of automated systems for detecting microaneurysms in digital fundus photographs have been evaluated.16,17,18,19 However, before these systems could operate in screening practice they would require more thorough validation and the addition of a system for detecting images with ungradeable quality.
The aim of this study was to compare a combined automated image quality and “disease/no disease” grading system with the existing manual system in a diabetic retinopathy systematic screening programme.
Study population
Anonymised images forming a test set were obtained prospectively from consecutive patients attending the Grampian Diabetes Retinal Screening Programme in North‐East Scotland in 2003–2004. The set consisted of 14 406 images from 6722 consecutive patients. The median age was 65 years (IQR 19 years), 3725 patients (55%) were male and 46 (0.7%) had only one screenable eye. A total of 1423 patients (21.2%) received mydriasis because photographs of adequate quality could not be obtained without dilation. Ethical approval was obtained from the Grampian Medical Research Ethics Committee for the use of the anonymised images and grading data.
Photographic protocol
The programme used non‐mydriatic 45‐degree fixed Canon CR5‐45NM and mobile Canon CR6‐45NM fundus cameras (Canon Inc. Medical Equipment Business Group, Kanagawa, Japan) attached to Canon D30 digital colour cameras (2160×1440 pixels). Each patient had a minimum of one disc/macula photograph per screenable eye.13 The photographers took additional photographs if image quality or field definition was considered inadequate at the time of photography. If a photograph was inadequate because the pupil was too small, the pupils were dilated using tropicamide 1% (Chauvin Pharmaceuticals Ltd, Kingston‐upon‐Thames, UK).
Reference standard grading
The test set images were displayed at full resolution on a contrast-adjusted 22-inch monitor (Iiyama Vision Master Pro 512) and graded by the clinical research fellow using the definitions in table 1. The quality grading included assessments of field definition and image clarity. The clinical research fellow was trained to use the Scottish Diabetic Retinopathy Grading Scheme (table 1) with 500 images from a training set. This scheme is based on the ETDRS grading scheme, adapted for non‐stereoscopic single disc/macula field photographs, and is very similar to the “International clinical diabetic retinopathy and diabetic macular edema disease severity scales”.14
Table 1 The Scottish Diabetic Retinopathy Grading Scheme 2004.7
There was good agreement between the clinical research fellow and the clinical director for retinal screening outcome (κ = 0.91), grade of retinopathy (κ = 0.85) and grade of maculopathy (κ = 0.85) for the training set.
Disease/no disease grading systems
This study assessed the efficacy of the manual and automated “disease/no disease” grading systems against the reference standard (fig 1). In the manual system, patients were referred for “full disease” grading if the images of either eye were of inadequate quality or if any diabetic retinopathy was detected. The automated system referred patients for “full disease” grading if all images of one eye were of inadequate quality or if dot haemorrhage/microaneurysms (DH/MA) were detected.
Figure 1 Flow charts of the “disease/no disease” manual and automated grading systems for assessment of eyes, images and patients. The automated system detected dot haemorrhages/microaneurysms (DH/MA).
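To make the referral rule concrete, the following minimal sketch (in Python, with hypothetical names such as refer_patient) encodes the automated decision described above and in figure 1: a patient is referred for “full disease” grading if every image of either eye fails quality assessment, or if DH/MA are detected in any adequate-quality image.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RetinalImage:
    eye: str        # "left" or "right"
    adequate: bool  # passed automated quality assessment
    has_dhma: bool  # DH/MA detected in this image

def refer_patient(images: List[RetinalImage]) -> bool:
    """True if the patient goes on to manual 'full disease' grading."""
    for side in ("left", "right"):
        eye_images = [im for im in images if im.eye == side]
        if not eye_images:
            continue  # patient may have only one screenable eye
        # Technical failure: every image of this eye is of inadequate quality
        if not any(im.adequate for im in eye_images):
            return True
    # Any DH/MA found in an adequate-quality image also triggers referral
    return any(im.has_dhma for im in images if im.adequate)
```

Note that a single adequate image per eye is enough to avoid a technical failure, matching the per-eye rule in figure 1.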
The manual grading of the retinal images was performed by three retinal screeners who also performed the photography. They were trained according to the recommendations of the Health Technology Board for Scotland (HTBS).13 Manual graders recorded one grade for each eye.
The automated grading system consisted of software for image quality assessment and for DH/MA detection. The software was developed using a training set that was separate from the test set. Details of the image processing methods are described elsewhere.17,24,25,26,27 Quality assessment was performed in two steps: field definition assessment and clarity assessment. Field definition was assessed by ascertaining the location of the optic disc and the fovea, and whether both temporal arcades were visible. The position of the optic disc was found by identifying a circular shape lying at the apex of an approximate ellipse formed by the temporal arcades. The fovea was identified as a dark area close to a point at a fixed distance from the optic disc and on the centreline of the elliptical temporal arcades. Image clarity was assessed by checking the length of visible small vessels within a circle of radius 1.75 optic disc diameters around the fovea. An image was judged to have adequate quality for retinopathy detection if it had adequate field definition and adequate clarity. The location of the optic disc in the image and the orientation of the course of the temporal arcades were used to decide whether an image was of the left or right eye.
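The quality decision itself reduces to two boolean checks. The sketch below uses assumed helper names and an illustrative clarity threshold (the parameter values actually used are given in reference 25) to show how field definition and clarity combine into a per-image adequacy decision.

```python
def field_definition_ok(disc_found: bool, fovea_found: bool,
                        arcades_visible: bool) -> bool:
    # Field definition: optic disc and fovea located and both temporal
    # arcades visible (a simplification of the criteria described above)
    return disc_found and fovea_found and arcades_visible

def clarity_ok(visible_vessel_length: float, min_length: float = 1000.0) -> bool:
    # Clarity: total length (in pixels) of small vessels visible within
    # 1.75 disc diameters of the fovea; min_length is illustrative only
    return visible_vessel_length >= min_length

def image_adequate(disc_found: bool, fovea_found: bool, arcades_visible: bool,
                   visible_vessel_length: float) -> bool:
    # An image is adequate for retinopathy detection only if it has both
    # adequate field definition and adequate clarity
    return (field_definition_ok(disc_found, fovea_found, arcades_visible)
            and clarity_ok(visible_vessel_length))
```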
If there was at least one image of adequate quality for each eye, then DH/MA detection was applied to all images of adequate quality for that patient. Dark objects of limited size were detected using techniques from mathematical morphology and used as DH/MA candidates. As many of these are not real DH/MAs, features such as area, eccentricity, mean intensity and mean intensity gradient were evaluated for each candidate and used to classify candidates as true or false identifications. DH/MA detection was improved by recent developments that include explicit vessel detection and analysis of the background retina around candidates.26
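A minimal sketch of this style of morphology-based candidate detection is shown below using scikit-image; the footprint size, threshold and area limit are illustrative assumptions, not the values used by the authors' software.

```python
import numpy as np
from skimage import morphology, measure

def dhma_candidates(green: np.ndarray, max_area: int = 120):
    """Find small dark objects (DH/MA candidates) in the green plane
    and compute per-candidate features for a true/false classifier."""
    # Black top-hat highlights dark structures smaller than the footprint
    tophat = morphology.black_tophat(green, morphology.disk(5))
    # Illustrative threshold: keep the strongest dark responses
    mask = tophat > tophat.mean() + 2 * tophat.std()
    labels = measure.label(mask)
    feats = []
    for region in measure.regionprops(labels, intensity_image=green):
        if region.area <= max_area:  # DH/MAs are of limited size
            feats.append({
                "area": region.area,
                "eccentricity": region.eccentricity,
                "mean_intensity": region.mean_intensity,
            })
    return feats  # features are then classified as true or false DH/MA
```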
The operating point of DH/MA detection can be adjusted to alter its sensitivity, an increase in sensitivity being accompanied by a decrease in specificity. Due to inter‐eye concordance, the sensitivity must be adjusted according to the results per patient rather than the results per eye. We used an operating point that gave a higher sensitivity for detection of patients with any retinopathy than that of manual graders.
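Choosing such an operating point can be sketched as follows, assuming each patient is summarised by the maximum DH/MA candidate score over their adequate images (a hypothetical reduction used here for illustration).

```python
import numpy as np

def choose_threshold(patient_scores: np.ndarray,
                     has_disease: np.ndarray,
                     target_sensitivity: float = 0.90) -> float:
    """Pick the largest score threshold whose per-patient sensitivity
    meets the target on a tuning set (illustrative sketch)."""
    diseased = np.sort(patient_scores[has_disease])[::-1]  # descending
    # Detecting the top k diseased patients needs threshold <= k-th score
    k = int(np.ceil(target_sensitivity * diseased.size))
    return diseased[k - 1]

# Synthetic example: choose_threshold(np.array([0.9, 0.2, 0.6, 0.8, 0.4]),
#                                     np.array([True] * 5), 0.8) -> 0.4
```

Lowering the threshold raises per-patient sensitivity at the cost of specificity, which is the trade-off described above.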
Statistical analysis
Data were analysed using the Statistical Package for the Social Sciences (SPSS, V.13.0). Sensitivity and specificity are presented for manual and automated grading, with confidence intervals calculated using the Wilson score method without continuity correction. Overall agreement was measured using the kappa (κ) statistic. McNemar tests were used to compare positive agreement between the automated and manual grading.
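For reference, the Wilson score interval without continuity correction used throughout the results can be computed as in this short sketch.

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score interval (no continuity correction) for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Illustrative counts only: 865 of 1000 gives approximately (0.842, 0.885)
```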
Results
Tables 2–5 compare the performance of the manual and automated “disease/no disease” grading systems against the reference standard. Table 2 shows frequencies of patients with each grade of retinopathy as assigned by the reference standard process. Table 2 also shows detection rates and the number of missed cases for each retinopathy grade and for each “disease/no disease” grading system. The number of patients misclassified as normal by the automated system (n = 240) was significantly lower than for the manual system (n = 341) (p<0.001).
Table 2 Detection rates with 95% confidence intervals (CI), by patient, for manual and automated “disease/no disease” grading for each of the grades assigned by the reference standard, with frequency of occurrence of each grade.
Table 3 Detection rates with 95% confidence intervals (CI), by eye and by image, for manual and automated “disease/no disease” grading for each of the grades assigned by the reference standard, with frequency of occurrence of each grade.
Table 4 Sensitivities and specificities, with 95% confidence intervals (CI), by patient, for technical failures, any “disease” in patients with no technical failure, and referral for “full disease” grading.
Table 5 Sensitivities and specificities, with 95% confidence intervals (CI), by eye and by image, for technical failures, any “disease” in patients with no technical failure, and referral for “full disease” grading.
Table 2 shows that three of the 330 patients with referable or observable retinopathy/maculopathy (R2, R3, R4, M1 or M2) were graded as having no retinopathy by the manual system and that seven were graded as having no retinopathy by the automated system. The difference is not statistically significant (p = 0.125). None of the non‐referred patients had new vessel growth. One patient (M2) had solitary minor exudate formation and was missed by both systems. Six other patients (two M1, four M2) were missed by the automated system: two had solitary blot haemorrhages, one had a small solitary linear streak of exudate and three had small, scattered exudates. The manual system missed two additional patients, one with isolated intra‐retinal microvascular anomalies (R3) and one with microaneurysms and small exudates (M1).
Table 3 presents data in a similar format but for detection rates in eyes and images. The manual and automated systems missed 10 eyes and 26 eyes, respectively, out of 465 eyes with referable or observable retinopathy/maculopathy. In terms of images, the automated system missed 28 out of 527. It was not possible to obtain detection rates for the manual system by image as the manual graders provided a composite grade for each eye. Two eyes from two patients graded as R4 but missed by the automated system were found, at clinical review, to have mild background diabetic retinopathy and disc collaterals resulting from old retinal vein occlusions.
Tables 4 and 5 show sensitivities and specificities for both grading systems. In table 4 the results are presented by patient and in table 5 by eye and by image. As for table 3, it was not possible to obtain results by image for the manual system. Technical failure sensitivities and specificities are for the quality assessment stage of the two systems alone and take no account of disease. Sensitivities and specificities for detection of any “disease” in patients with no technical failure are based on the subset of cases where a decision on the presence of retinopathy was made both by the reference standard and by the grading system under assessment. They therefore ignore cases deemed technical failures by either the reference standard or the grading system under assessment. Sensitivities and specificities for referral to “full disease” grading are based on a “disease/no disease” grading result combining technical failure and any “disease” detection. They indicate the proportions of cases that were correctly determined as being normal or abnormal by the grading system under assessment.
Discussion
The results reported in this paper suggest that automated image grading offers diabetic retinopathy screening programmes an opportunity to safely reduce the manual burden of grading.
The automated “disease/no disease” grading software was more sensitive but less specific than its manual equivalent. As “disease/no disease” grading is only part of the grading process, and false positives it passes on can still be rejected at “full disease” grading, the overall specificity of the programme is unlikely to be affected.
The main function of any grading system is to identify those patients who require referral to ophthalmology or who are not suitable for photographic screening. As well as being effective, automated “disease/no disease” grading has the potential to reduce the human workload of grading. Manual “disease/no disease” graders and “full disease” graders undertook 6722 and 2545 grading episodes respectively (including 164 cases referred to “full disease” grading due to concerns about non‐diabetic eye disease), totalling 9267 episodes. In comparison, a system using automated “disease/no disease” grading and manual “full disease” grading would have led to only 3652 manual grading episodes. This equates to a workload reduction of about 60%.
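The quoted figure follows directly from the episode counts above, as this trivial check shows.

```python
# Grading episodes under the existing manual pathway (from the text above)
manual_episodes = 6722 + 2545            # = 9267
# Manual episodes remaining if "disease/no disease" grading is automated
remaining_manual_episodes = 3652

reduction = 1 - remaining_manual_episodes / manual_episodes
print(f"workload reduction: {reduction:.1%}")   # -> 60.6%
```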
The current system had higher sensitivity and lower specificity than reported in our previous work, although that work was confined to good quality images.17 Other studies have reported higher sensitivities and specificities, but they used small numbers of patients and did not include automated quality assessment.18,19,28,29,30 The only comparable study used retinal photographs, manually graded for quality, from 773 patients. Candidate bright and dark lesions were identified by image analysis and their features classified by a neural network. The authors' recommended operating point gave a sensitivity of 94.8% and a specificity of 52.8%.30
In our study, automated grading missed five cases of referable retinopathy/maculopathy (table 2). While this is a concern, the main identifiable source of false negatives in the grading pathway was “full disease” grading, which accounted for 18 missed cases of referable eye disease in the test set (16 of referable maculopathy and two of severe background diabetic retinopathy). The two eyes graded as having proliferative retinopathy and missed by the automated system (table 3) were found to have only mild diabetic retinopathy at eye clinic examination. The automated system detected both patients due to retinopathy in the opposite eye.
The demographic profile of the patients was similar to that reported in the Scottish Diabetes Survey 2003.31 The prevalence of referable retinopathy is comparable to that reported from Tayside (3.0%) and Newcastle (4.5%) but lower than that reported from Liverpool (7.1%) and Cheltenham (12.2%).21,31,32,33,34 As with many screening programmes, the main indication for additional slit lamp examination (883 patients, 13.1%) was ungradeable images (553 patients, 8.2%). Although the population studied was predominantly Caucasian, racial variations in pigmentation mainly affect the reflectance of blue and red light from the retina, causing differences in retinal colour. Our automated software uses mainly the green plane of the image, which we would expect to be more affected by photographic conditions than by racial variations in retinal pigmentation.
We have presented results by patient, by eye and by image to enable comparison with other grading systems. As was expected, due to concordance between patient eyes, automated detection rates for images and eyes had lower sensitivities and higher specificities than the corresponding detection rates for patients.
The automated methods tested in this study are adaptable to local photographic procedures and equipment. For example, photographic protocols requiring a higher number of images per eye could result in a higher sensitivity and lower specificity per patient, so the operating point for DH/MA detection would need to be altered to achieve results similar to those presented here. Variations in photographic scale can be handled by scaling images to a standard number of pixels per degree. There are also local variations in requirements pertaining to retinal field of view; these would affect the field definition aspect of automated image quality assessment but can be accommodated by straightforward modification of software parameters.25 Any necessary resizing or selection of software parameters could then be made appropriately for each case.
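Scale standardisation of this kind might look like the following sketch; the target pixels-per-degree value and the function name are illustrative assumptions (for the cameras used here, a 45° field captured at 2160 pixels across corresponds to roughly 48 pixels per degree).

```python
import numpy as np
from skimage.transform import rescale

def standardise_scale(image: np.ndarray,
                      pixels_per_degree: float,
                      target_ppd: float = 30.0) -> np.ndarray:
    """Resample a colour fundus image to a standard number of pixels per
    degree so lesions and vessels appear at a consistent size; target_ppd
    is illustrative, not a value stated in the paper."""
    factor = target_ppd / pixels_per_degree
    # channel_axis=-1 assumes an RGB image; use channel_axis=None for grey
    return rescale(image, factor, channel_axis=-1, anti_aliasing=True)
```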
Automated grading can run almost continuously at a speed depending on the computer system and the number of parallel processors. For example, the average time to process one patient is under 4 min (on a PC with a 3 GHz Intel Pentium processor) whereas manual grading typically requires 6 min.
Automated grading could have a significant impact on the costs of quality assurance, an essential component of systematic screening. The small increase in quality assurance required for “full disease” graders (due to the higher referral rate) would be outweighed by the reduced quality assurance requirements for “disease/no disease” grading even though the automated system would be treated as an additional grader.
In conclusion, automated grading of diabetic retinopathy and image quality could safely reduce the burden of “disease/no disease” grading in diabetic retinopathy screening programmes and could facilitate implementation of screening across Europe.
Acknowledgements
We would like to thank Lorraine Urquhart and members of her team from the Grampian Retinal Screening Programme for their support.
JAO was the principal investigator. JAO, PFS, KAG, PM and GJP designed the study. SP performed the data collection and reference grading. ADF developed the automated methods. JAO performed the quality assurance. SF and GJP performed the statistical analyses. All participated in the interpretation of the data. SP and ADF wrote the first draft of the paper. All authors reviewed and revised the paper for important intellectual content. JAO takes responsibility for the content.
Abbreviations
DH/MA - dot haemorrhage/microaneurysm
Footnotes
Funding: This project was funded by the Chief Scientist Office, Scottish Executive Health Department (grant number CZH/4/76).
Competing interests: Implementation in Scotland is being considered. If this occurs it is likely that there will be some remuneration for the University of Aberdeen, NHS Grampian and the Scottish Executive.
Ethics approval: Ethics approval was obtained from the Grampian Medical Research Ethics Committee for the use of the anonymised images and grading data.
References
1. Kocur I, Resnikoff S. Visual impairment and blindness in Europe and their prevention. Br J Ophthalmol 2002;86:716–722.
2. Bamashmus MA, Matlhaga B, Dutton GN. Causes of blindness and visual impairment in the West of Scotland. Eye 2004;18:257–261.
3. Evans J, Rooney C, Ashwood F, et al. Blindness and partial sight in England and Wales: April 1990–March 1991. Health Trends 1996;28:5–12.
4. Foulds WS, McCuish A, Barrie T, et al. Diabetic retinopathy in the West of Scotland: its detection and prevalence, and the cost‐effectiveness of a proposed screening programme. Health Bull (Edinb) 1983;41:318–326.
5. Sculpher MJ, Buxton MJ, Ferguson BA, et al. Screening for diabetic retinopathy: a relative cost‐effectiveness analysis of alternative modalities and strategies. Health Econ 1992;1:39–51.
6. James M, Turner DA, Broadbent DM, et al. Cost effectiveness analysis of screening for sight threatening diabetic eye disease. BMJ 2000;320:1627–1631.
7. Diabetic Retinopathy Screening Implementation Group. Diabetic retinopathy screening services in Scotland: recommendations for implementation. Edinburgh: Scottish Executive, 2003.
8. Scottish Intercollegiate Guidelines Network. SIGN 55. Management of diabetes. Edinburgh: Scottish Intercollegiate Guidelines Network, 2001.
9. Wild S, Roglic G, Green A, et al. Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Diabetes Care 2004;27:1047–1053.
10. European conference on screening for diabetic retinopathy in Europe, May 2006. http://www.drscreening2005.org.uk/conference_report.doc (accessed 19 March 2007).
11. World Health Organization. Atlas of health in Europe. http://www.euro.who.int/Document/E79876.pdf (accessed 19 March 2007).
12. Philip S, Cowie LM, Olson JA. The impact of the Health Technology Board for Scotland's grading model on referrals to ophthalmology services. Br J Ophthalmol 2005;89:891–896.
13. Facey K, Cummins E, Macpherson K, et al. Organisation of services for diabetic retinopathy screening. Health Technology Assessment Report 1. Glasgow: Health Technology Board for Scotland, 2002.
14. Wilkinson CP, Ferris FL III, Klein RE, et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology 2003;110:1677–1682.
15. Harding S, Greenwood R, Aldington S, et al. Grading and disease management in national screening for diabetic retinopathy in England and Wales. Diabet Med 2003;20:965–971.
16. Kohner EM, Stratton IM, Aldington SJ, et al. Microaneurysms in the development of diabetic retinopathy (UKPDS 42). UK Prospective Diabetes Study Group. Diabetologia 1999;42:1107–1112.
17. Hipwell JH, Strachan F, Olson JA, et al. Automated detection of microaneurysms in digital red‐free photographs: a diabetic retinopathy screening tool. Diabet Med 2000;17:588–594.
18. Larsen M, Godt J, Larsen N, et al. Automated detection of fundus photographic red lesions in diabetic retinopathy. Invest Ophthalmol Vis Sci 2003;44:761–766.
19. Niemeijer M, van Ginneken B, Staal J, et al. Automatic detection of red lesions in digital color fundus photographs. IEEE Trans Med Imaging 2005;24:584–592.
20. Olson JA, Strachan FM, Hipwell JH, et al. A comparative evaluation of digital imaging, retinal photography and optometrist examination in screening for diabetic retinopathy. Diabet Med 2003;20:528–534.
21. Scanlon PH, Malhotra R, Thomas G, et al. The effectiveness of screening for diabetic retinopathy by digital imaging photography and technician ophthalmoscopy. Diabet Med 2003;20:467–474.
22. Agrawal A, McKibbin MA. Technical failure in photographic screening for diabetic retinopathy. Diabet Med 2003;20:777.
23. NHS Quality Improvement Scotland. Diabetic retinopathy screening. Clinical standards, March 2004. Edinburgh: NHS Quality Improvement Scotland, 2004.
24. Cree MJ, Olson JA, McHardy KC, et al. A fully automated comparative microaneurysm digital detection system. Eye 1997;11:622–628.
25. Fleming AD, Philip S, Goatman KA, et al. Automated assessment of diabetic retinal image quality based on clarity and field definition. Invest Ophthalmol Vis Sci 2006;47:1120–1125.
26. Fleming AD, Philip S, Goatman KA, et al. Automated microaneurysm detection using local contrast normalization and local vessel detection. IEEE Trans Med Imaging 2006;25:1223–1232.
27. Fleming AD, Goatman KA, Philip S, et al. Automatic detection of retinal anatomy to assist diabetic retinopathy screening. Phys Med Biol 2007;52:331–345.
28. Lee SC, Lee ET, Kingsley RM, et al. Comparison of diagnosis of early retinal lesions of diabetic retinopathy between a computer system and human experts. Arch Ophthalmol 2001;119:509–515.
29. Sinthanayothin C, Boyce JF, Williamson TH, et al. Automated detection of diabetic retinopathy on digital fundus images. Diabet Med 2002;19:105–112.
30. Usher D, Dumskyj M, Himaga M, et al. Automated detection of diabetic retinopathy in digital retinal images: a tool for diabetic retinopathy screening. Diabet Med 2004;21:84–90.
31. Scottish Diabetes Survey Monitoring Group. Scottish diabetes survey 2002. Edinburgh: Scottish Executive, 2003.
32. Leese GP, Morris AD, Swaminathan K, et al. Implementation of national diabetes retinal screening programme is associated with a lower proportion of patients referred to ophthalmology. Diabet Med 2005;22:1112–1115.
33. Younis N, Broadbent DM, Vora JR, et al. Prevalence of diabetic eye disease in patients entering a systematic primary care‐based eye screening programme. Diabet Med 2002;19:1014–1021.
34. Pandit RJ, Taylor R. Quality assurance in screening for sight‐threatening diabetic retinopathy. Diabet Med 2002;19:285–291.
35. Goldberg MF, Fine SL. Symposium on the treatment of diabetic retinopathy. Washington, DC: US Department of Health, Education & Welfare, 1969:7–15.