To review findings from the authors’ published studies involving telemedicine and image analysis for retinopathy of prematurity (ROP) diagnosis.
Twenty-two ROP experts interpreted a set of 34 wide-angle retinal images for presence of plus disease. For each image, a reference standard diagnosis was defined from expert consensus. A computer-based system was used to measure individual and linear combinations of image parameters for arteries and veins: integrated curvature (IC), diameter, and tortuosity index (TI). Sensitivity, specificity, and receiver operating characteristic areas under the curve (AUC) for plus disease diagnosis were determined for each expert. Sensitivity and specificity curves were calculated for the computer-based system by varying the diagnostic cutoffs for arterial IC and venous diameter. Individual vessels from the original 34 images were identified with particular diagnostic cutoffs, and combined into composite wide-angle images using graphics editing software.
Expert sensitivity ranged from 0.308 to 1.000, specificity from 0.571 to 1.000, and AUC from 0.784 to 1.000. Among computer system parameters, one linear combination had AUC 0.967, which was greater than that of 18 of 22 (81.8%) experts. Composite computer-generated images were produced using the arterial IC and venous diameter values associated with 75% under-diagnosis of plus disease (ie, 25% sensitivity cutoff), 50% under-diagnosis of plus disease (ie, 50% sensitivity cutoff), and 25% under-diagnosis of plus disease (ie, 75% sensitivity cutoff).
Computer-based image analysis has the potential to diagnose severe ROP with comparable or better accuracy than experts, and could provide added value to telemedicine systems. Future quantitative definitions of plus disease might improve diagnostic objectivity.
Retinopathy of prematurity (ROP) is a vasoproliferative disease of low birth weight infants. ROP incidence is over 65% in infants with birth weight <1,251 g and over 80% in infants with birth weight <1,000 g.1,2 There are 4 million live births in the United States each year; of these, 60,000 have birth weight <1,500 g.3 Among these babies, it is estimated that 600 annually suffer a lifetime of blindness.4
This is a growing problem because the number of infants at risk for ROP is rising. The annual preterm birth rate in the United States has grown from 9% to 13% since 1981, while survival rates continue to rise.5,6 A joint policy statement recommends that all infants with birth weight <1,500 g or gestational age ≤30 weeks should be monitored for ROP. In fact, this gestational age cutoff was recently expanded because of concern that larger infants may, in rare cases, develop severe ROP.7 Furthermore, the societal burden of infancy-acquired blindness is enormous. It is estimated that the governmental cost of visual impairment from ROP in the United States is $69 to $117 million/year in inflation-adjusted dollars.8 As neonatal advances have disseminated throughout Latin America, Eastern Europe, and Asia, worldwide ROP incidence has increased dramatically.9,10 Concerns have been raised about an emerging international “epidemic” due to persistent variability in oxygen management as well as a shortage of adequately trained ophthalmologists.11
Standard disease management involves dilated ophthalmoscopy at the neonatal intensive care unit (NICU) bedside by an experienced examiner, with hand-drawn documentation of retinal findings using the international classification of ROP.12,13 These advances in clinical care have dramatically improved the visual prognosis for at-risk infants. At the same time, standard ROP care is logistically difficult, time consuming, and associated with tremendous medico-legal liability. Because of such pressures, a recent survey found that only 54% of retinal specialists and pediatric ophthalmologists were willing to manage ROP and that over 20% planned to stop in the near future.14 Another study reported that 36% of American neonatologists were unable to transfer infants to other NICUs because there were no specialists available to perform ROP screening.15 In a focus group study involving 15 neonatology nurses and physicians, we found that participants expressed 15 concepts related to standard ROP care, of which 2 reflected positive perceptions and 13 reflected negative perceptions (e-Supplement 1, available at jaapos.org).
Emerging technologies such as telemedicine and computer-based image analysis have potential to address these limitations in clinical care. This paper reviews a series of studies that we have performed involving automated image analysis for ROP diagnosis compared to that of human experts. We will summarize our research findings involving three complementary topics: (1) accuracy of remote image-based ROP diagnosis compared to indirect ophthalmoscopy by ophthalmology experts and perceptions of neonatal staff toward this telemedicine approach; (2) performance of a computer-based image analysis system compared to expert review for plus disease diagnosis from digital retinal images; (3) quantification of retinal vascular features related to plus disease. All studies described in this manuscript were approved by the Columbia University Medical Center Institutional Review Board, and were performed in compliance with the Health Insurance Portability and Accountability Act.
Store-and-forward telemedicine is an emerging technology in which medical data and images are captured for subsequent interpretation by a remote expert. Large-scale telemedicine systems have been implemented in specialties such as radiology and pathology, which rely heavily on the interpretation of image-based data.16,17 This approach might be applied to address some of the current challenges associated with ROP management and quality of care. In ROP telemedicine programs, images would likely be captured by trained neonatal staff for subsequent grading by a remote ophthalmologist. This could reduce travel time for ophthalmologists, simplify logistical coordination with neonatal staff, and improve accessibility to expert care for patients. Serial retinal imaging may offer a more objective method for documentation of disease findings and progression. In addition, widespread retinal imaging for ROP would provide opportunities to create digital libraries for educational and research purposes. Retinal imaging might also cause less physiologic stress to infants than ophthalmoscopy with scleral depression.18,19
In a prospective study of telemedicine for ROP diagnosis, we obtained informed consent from parents of 67 consecutive infants (206 total eye examinations) hospitalized at Columbia University during a one-year recruitment period.20 All subjects underwent wide-angle retinal imaging (RetCam-II; Clarity Medical Systems, Pleasanton, CA) by a trained neonatal nurse at 31–33 weeks and/or 35–37 weeks postmenstrual age using a standard protocol. Data were uploaded to a web-based telemedicine system and interpreted by three expert retinal specialist graders, who were experienced in interpretation of wide-angle retinal images. Graders provided a diagnosis and an evaluation of image quality for each eye. Diagnoses were divided into four ordinal categories: (1) no ROP, (2) mild ROP (defined as less than type 2 ROP), (3) type-2 prethreshold ROP (defined as zone 1, stage 1 or 2, without plus disease, or zone 2, stage 3, without plus disease), or (4) treatment-requiring ROP (defined as type-1 ROP [zone 1, any stage, with plus disease; zone 1, stage 3, without plus disease; zone 2, stage 2 or 3, with plus disease] or threshold ROP [at least 5 contiguous or 8 noncontiguous clock hours of stage 3 in zone 1 or 2, with plus disease]). Findings were compared to a reference standard of dilated indirect ophthalmoscopy by an experienced pediatric ophthalmologist, which was performed independently on the same day during routine ROP care.
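The four ordinal categories above can be written as a small decision rule. The following Python sketch is illustrative only and is not the grading software used in the study: the function name, the integer encoding of zone and stage, and the restriction to stages 0 through 3 are assumptions made for this example. Under the definitions quoted above, any eye meeting the threshold criteria (stage 3 in zone 1 or 2 with plus disease) already meets the type-1 criteria, so clock hours are not needed to assign the category.

```python
def rop_category(zone, stage, plus):
    """Map exam findings to the four ordinal categories described in the text.

    zone:  1, 2, or 3 (ignored when stage == 0)
    stage: 0 (no ROP) through 3; higher stages are outside this sketch
    plus:  True if plus disease is present
    Returns 1 (no ROP), 2 (mild), 3 (type-2 prethreshold), or 4 (treatment-requiring).
    """
    if stage == 0:
        return 1  # no ROP

    # Type-1 ROP as defined in the text
    type1 = (
        (zone == 1 and plus)                            # zone 1, any stage, with plus
        or (zone == 1 and stage == 3 and not plus)      # zone 1, stage 3, without plus
        or (zone == 2 and stage in (2, 3) and plus)     # zone 2, stage 2 or 3, with plus
    )
    if type1:
        return 4  # treatment-requiring ROP

    # Type-2 prethreshold ROP as defined in the text
    type2 = (
        (zone == 1 and stage in (1, 2) and not plus)    # zone 1, stage 1 or 2, without plus
        or (zone == 2 and stage == 3 and not plus)      # zone 2, stage 3, without plus
    )
    if type2:
        return 3

    return 2  # ROP present but less than type 2


print(rop_category(zone=2, stage=3, plus=False))
```

Because the categories are ordinal, a diagnosis of "type 2 or worse" corresponds to `rop_category(...) >= 3` in this encoding.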
Our studies found that accuracy of telemedicine diagnosis compared to indirect ophthalmoscopy was high at 35–37 weeks postmenstrual age (sensitivity 100% and specificity 85%–94% for diagnosis of type 2 or worse ROP), that inter-grader reliability of telemedicine diagnosis was substantial to near-perfect at 35–37 weeks (κ 0.79–0.89 for diagnosis of type 2 or worse ROP), and that intra-grader reliability of telemedicine diagnosis was near-perfect (κ 0.77–1.00).20 When the same physician performed ophthalmoscopic and telemedicine examinations at different times, the absolute intra-physician agreement between these two examinations was 86% (178/206 eye examinations). Among the 14% intra-physician discrepancies between ophthalmoscopic and telemedicine diagnoses, many involved uncertainty about presence of zone 1 disease or plus disease. In these latter cases, there was often photographic evidence suggesting that telemedicine diagnoses may have been more accurate.21 Finally, quality of images captured by trained neonatal nurses was generally high, and was better at 35–37 weeks postmenstrual age than at 31–33 weeks. At 35–37 weeks postmenstrual age, technical image quality was rated “adequate for diagnosis” in 70%–78% of images and “possibly adequate for diagnosis” in 16%–25% of images. Accuracy and reliability of remote diagnosis were high, regardless of image quality ratings by the three graders.
Using these findings, together with evidence-based outcome data from published studies, we have developed economic models representing ROP management using telemedicine and standard ophthalmoscopy. These models found that telemedicine is more cost-effective than standard ophthalmoscopy for ROP management across a wide range of assumptions.22
Taken together, these findings suggest that ROP detection using telemedicine is feasible, and that image-based diagnosis may provide important benefits over traditional ophthalmoscopic examinations. Photographs could be scrutinized in detail, in a manner analogous to other ophthalmic diseases such as diabetic retinopathy where the diagnostic “gold standard” is considered to be review of images by certified reading centers.23 Furthermore, retinal images from telemedicine systems could be analyzed by computer-based systems for detection of abnormalities, as a diagnostic aid for physicians.
Plus disease, which is characterized by arterial tortuosity and venous dilation, is a key component of the international classification of ROP. The minimum vascular abnormality required for plus disease is defined by a standard photograph, which was selected by expert committee.12,13,24 More recently, major trials have explicitly required ≥2 quadrants of this amount of vascular change for plus disease diagnosis.25,26 Accurate diagnosis is critical because the Cryotherapy for ROP (CRYO-ROP) and Early Treatment for ROP (ETROP) trials established that plus disease is a necessary feature of threshold disease and a sufficient feature of type 1 ROP, both of which benefit from treatment with laser photocoagulation or cryotherapy.24,26
However, diagnosis and documentation of ophthalmoscopic findings may be heavily subjective. This is particularly true with regard to identification of plus disease, which is defined by a standard photograph with qualitative descriptors (“dilation” and “tortuosity”) rather than by clearly established cutoff values for abnormality. We have in fact shown that there may be significant variability in the accuracy and consistency of plus disease diagnosis from wide-angle images, even among recognized ROP experts.27 These factors raise major concerns, because errors in classification may result in over- or under-treatment of disease.
Computer-based image analysis has potential to produce quantifiable, objective measurements for plus disease diagnosis. To examine the feasibility of this approach, we conducted a pilot study in which 22 recognized clinical ROP experts independently interpreted a set of 34 wide-angle retinal images for presence of plus disease.28 Images were then processed using the Retinal Image multiScale Analysis (RISA) system.29,30 This computer-based system performs semiautomated segmentation of retinal vessels, and calculates quantitative values for three parameters of each vessel: diameter (defined as total vessel area divided by its length), tortuosity index (TI, defined as vessel length divided by length of a line segment connecting its end points), and integrated curvature (IC, defined as the sum of angles along the vessel, normalized by its length). The mean value of each parameter was then calculated in every image, for arteries and veins separately (eg, “arterial IC,” “venous diameter,” “arterial TI”).
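The three parameter definitions above can be illustrated with a short Python sketch. This is not the RISA implementation. It assumes, purely for illustration, that a segmented vessel is available as an ordered list of centerline points plus a total area, and that the "sum of angles" is the sum of absolute turning angles between consecutive centerline segments. The function name and input format are hypothetical.

```python
import math


def vessel_parameters(centerline, area):
    """Return (diameter, tortuosity_index, integrated_curvature) for one vessel.

    centerline: list of (x, y) points ordered along the vessel (assumed input format)
    area:       total segmented vessel area, in squared coordinate units
    Assumes at least two distinct end points, so length and chord are nonzero.
    """
    # Vectors between consecutive centerline points
    segments = [(x2 - x1, y2 - y1)
                for (x1, y1), (x2, y2) in zip(centerline, centerline[1:])]
    length = sum(math.hypot(dx, dy) for dx, dy in segments)

    # Straight line segment connecting the two end points
    (x0, y0), (xn, yn) = centerline[0], centerline[-1]
    chord = math.hypot(xn - x0, yn - y0)

    # Sum of absolute turning angles (radians) between consecutive segments
    total_angle = 0.0
    for (ax, ay), (bx, by) in zip(segments, segments[1:]):
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        total_angle += abs(math.atan2(cross, dot))

    diameter = area / length                   # area divided by length
    tortuosity_index = length / chord          # path length divided by chord length
    integrated_curvature = total_angle / length  # summed angles normalized by length
    return diameter, tortuosity_index, integrated_curvature


# A vessel with one right-angle bend: TI is about 1.41, IC is about 0.79 rad per unit length
print(vessel_parameters([(0, 0), (1, 0), (1, 1)], area=2.0))
```

A perfectly straight vessel gives TI of 1 and IC of 0 under these definitions, and both values increase as the vessel becomes more tortuous. Per-image values such as "arterial IC" would then be the mean of the relevant parameter over all arteries in the image.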
A consensus reference standard diagnosis (“plus” vs “not plus”) was defined for each image as the response provided by the majority of the 22 experts. Accuracy of each expert for diagnosing plus disease was assessed by sensitivity and specificity compared to this reference standard. Receiver operating characteristic (ROC) curves, which illustrate discriminative ability of diagnostic tests by plotting sensitivity against (1-specificity) over a range of cutoff thresholds, were constructed for each expert.28,31 This was done using an ordinal scale of “plus,” “pre-plus,” and “neither” diagnoses.
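As a minimal sketch of the majority-vote reference standard and the per-expert accuracy measures, consider the following Python example. The function names and the toy ratings are hypothetical, and the handling of an exact tie (treated here as "not plus") is an assumption, since the text does not describe how ties were resolved.

```python
def consensus_reference(ratings):
    """Majority response per image.

    ratings: one list per expert, each holding one boolean per image
             (True = "plus", False = "not plus").
    An exact tie is treated as "not plus" (an assumption of this sketch).
    """
    n_experts = len(ratings)
    n_images = len(ratings[0])
    return [2 * sum(r[i] for r in ratings) > n_experts for i in range(n_images)]


def sensitivity_specificity(expert, reference):
    """Sensitivity and specificity of one expert against the reference standard."""
    pairs = list(zip(expert, reference))
    tp = sum(e and r for e, r in pairs)              # plus called plus
    fn = sum((not e) and r for e, r in pairs)        # plus missed
    tn = sum((not e) and (not r) for e, r in pairs)  # not plus called not plus
    fp = sum(e and (not r) for e, r in pairs)        # not plus called plus
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 3 experts rating 4 images (illustrative values, not study data)
ratings = [
    [True, True, False, False],
    [True, False, False, False],
    [True, True, True, False],
]
reference = consensus_reference(ratings)
for expert in ratings:
    print(sensitivity_specificity(expert, reference))
```

Note that each expert contributes to the consensus against which that expert is scored, which is inherent to a reference standard defined by majority opinion.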
Accuracy of the computer-based system was assessed by sensitivity and specificity curves generated from varying the numeric cutoff separating “plus” from “not plus” for each parameter, and by receiver operating characteristic AUCs. This was done for individual system parameters, and for linear combinations of multiple parameters.32 Results of this analysis are shown in e-Supplements 2 and 3 (available at jaapos.org). Among the 22 ROP experts, sensitivity of plus disease diagnosis ranged from 0.31 to 1.00, specificity ranged from 0.57 to 1.00, and AUC ranged from 0.78 to 1.00. Among individual computer system parameters, venular IC, venular diameter, and arteriolar IC had the highest receiver operating characteristic AUCs (0.82–0.85). Among all computer system parameters, the linear combination of arteriolar IC, arteriolar TI, venular IC, venular diameter, and venular TI had the highest AUC (0.97), which was greater than that of 18 (82%) of 22 experts.28
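The cutoff-sweeping analysis described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code used in the study: an image is called "plus" when its parameter value is at or above the cutoff, AUC is computed with the rank-based (Mann-Whitney) formulation, and the weights passed to `linear_combination` are hypothetical because the text does not give the fitted coefficients. All function names and sample values are invented for the example.

```python
def sens_spec_at_cutoff(values, is_plus, cutoff):
    """Sensitivity and specificity when images with value >= cutoff are called "plus"."""
    calls = [v >= cutoff for v in values]
    tp = sum(c and p for c, p in zip(calls, is_plus))
    tn = sum((not c) and (not p) for c, p in zip(calls, is_plus))
    n_pos = sum(is_plus)
    n_neg = len(is_plus) - n_pos
    return tp / n_pos, tn / n_neg


def auc(values, is_plus):
    """ROC area: fraction of (plus, not-plus) image pairs ranked correctly; ties count 0.5."""
    pos = [v for v, p in zip(values, is_plus) if p]
    neg = [v for v, p in zip(values, is_plus) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def linear_combination(parameter_rows, weights):
    """Weighted sum of several parameters per image (weights are hypothetical)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in parameter_rows]


# Toy data: one parameter value per image and the reference standard (not study data)
values = [1.0, 3.0, 2.0, 4.0]
is_plus = [False, False, True, True]

# Sweeping the cutoff over the observed values traces the sensitivity/specificity curves
for cutoff in sorted(set(values)):
    print(cutoff, sens_spec_at_cutoff(values, is_plus, cutoff))
print("AUC:", auc(values, is_plus))
```

The combined score returned by `linear_combination` can be passed to `auc` in the same way as a single parameter, which is how an individual parameter and a multi-parameter combination can be compared on one scale.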
We have replicated these findings using an independent data set of 20 images interpreted by 11 ROP experts.33 Taken together, these results demonstrate that accuracy of ROP experts for plus disease diagnosis is imperfect, and that a computer-based image analysis system may have potential to diagnose plus disease with comparable or better accuracy using objective and quantitative measurements.
As described above, the standard photographic definition of plus disease is somewhat subjective and qualitative. This definition is further limited because the appearance of vessels in the published photograph is not uniform and because there are no guidelines regarding which specific vessels represent the minimum requisite abnormality. Finally, the published standard photograph has a much narrower field of view than ophthalmoscopy or wide-angle cameras, and this difference in perspective may contribute to variability among examiners.12,13,24 Our above studies involving computer-based diagnosis of plus disease created a large database relating the values of quantitative vascular parameters to the overall diagnostic impressions of numerous recognized ROP experts.28 In principle, these data may be utilized to systematically examine the vascular features that are considered most important by experts for identifying the presence of plus disease in wide-angle retinal images. This could be used to generate composite wide-angle images representing “plus disease” at varying degrees of abnormality based on expert opinion.
To develop a methodology for generating composite images, we used sensitivity curves from the above study involving interpretation of 34 wide-angle retinal images by 22 ROP experts (Figure 1A). Curves from arterial IC and venous diameter were used because these two parameters are most closely related to the descriptive definition of plus disease (“arterial tortuosity and venous dilation”), and because these two parameters matched most closely with expert opinions after cross-validation methods (data not shown).28,33 Using these curves, three precise diagnostic cutoffs for arterial IC and venous diameter were identified: (1) values associated with 25% under-diagnosis of true plus disease according to the reference standard determined by expert consensus (ie, 75% sensitivity cutoff), (2) values associated with 50% under-diagnosis of true plus disease (ie, 50% sensitivity cutoff), and (3) values associated with 75% under-diagnosis of true plus disease (ie, 25% sensitivity cutoff).34
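One way to read a cutoff off a sensitivity curve is sketched below in Python. It assumes that an image is called "plus" when its parameter value is at or above the cutoff, and it returns the highest cutoff at which at least the target fraction of reference-standard plus images would still be detected. The function name and the sample values are hypothetical, and the study may have interpolated along the curve differently.

```python
import math


def cutoff_for_sensitivity(plus_values, target_sensitivity):
    """Highest cutoff whose sensitivity is at least target_sensitivity.

    plus_values: parameter values (eg, arterial IC) for the images that are
                 "plus" according to the reference standard.
    """
    ordered = sorted(plus_values, reverse=True)
    # Number of plus images that must lie at or above the cutoff;
    # the small tolerance guards against floating-point round-up
    k = max(1, math.ceil(target_sensitivity * len(ordered) - 1e-9))
    return ordered[k - 1]


# Toy values for four reference-standard plus images (not study data)
plus_values = [10.0, 20.0, 30.0, 40.0]
for target in (0.75, 0.50, 0.25):
    print(target, cutoff_for_sensitivity(plus_values, target))
```

In this sketch the 75% sensitivity cutoff is the lowest of the three values and the 25% sensitivity cutoff is the highest, so vessels selected at the 75% cutoff show the least abnormality.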
From among the 34 original study images, individual vessels were identified which matched these cutoff values (Figure 1B). One artery and one vein were selected from each quadrant, and combined using graphics-editing software (Photoshop CS2; Adobe, San Jose, CA) to produce composite images at 75%, 50%, and 25% sensitivity cutoff values for arterial IC and venous diameter. The resulting composite images are displayed in Figure 2. For example, the image at 75% sensitivity cutoff (Figure 2A) illustrates the level associated with 25% under-diagnosis of plus disease, and therefore has less arterial IC and venous dilation than the images at 50% and 25% sensitivity cutoffs (Figures 2B and 2C).34
This methodology might be applied toward developing a future definition of plus disease which is based on more quantitative principles.34 To illustrate the validity of this approach, Figure 3 compares the appearance of two images generated from this process to published photographs selected by expert committee. Figures 3A and 3B display the composite image reflecting 50% under-diagnosis of true plus disease (ie, 50% sensitivity cutoff), which has been cropped and magnified to match the perspective of the standard photographic definition of plus disease.24 Figures 3C and 3D display the composite image reflecting 25% under-diagnosis of true plus disease (ie, 75% sensitivity cutoff), compared to a published example of pre-plus disease.13 This allows direct comparison of vascular features, and illustrates potential limitations of a photographic definition with a smaller field-of-view than indirect ophthalmoscopy or wide-angle imaging. It also illustrates that pre-plus disease and plus disease represent a spectrum of disease severity that can be quantified using parameters such as IC and diameter to represent characteristics of retinal vessels.
Taken together, these findings demonstrate a methodology for quantifying retinal image characteristics and generating composite plus disease images over a range of disease severities, based on the opinions of recognized ROP experts. This may have applications as an educational tool for ophthalmologists, a mechanism for developing future quantitative definitions of plus disease, and a technique that may eventually be generalized to other image-based diseases. More detailed studies to investigate the exact features that are used by experts while diagnosing plus disease may inform further efforts to model this process using computer analysis.
A crisis in ROP is developing because of factors such as rapidly increasing numbers of premature births worldwide, decreasing supply of qualified ophthalmologists willing to manage the disease, rising medicolegal liability, growing economic pressures, and persistent inequities in accessibility to quality health care. Ophthalmologists who currently manage ROP are faced with additional challenges because of the qualitative nature of diagnosis and hand-drawn documentation, rapid evolution in retinal vascular findings, and logistical difficulties associated with coordination of care among multiple ophthalmologists and neonatal specialists.
Telemedicine and related information technologies might provide methods for improving some negative features of standard ROP care related to ophthalmologist workflow as well as communication among physicians, families, and NICU staff. This was suggested by a focus group study that we conducted (Table 1), in which groups of neonatologists and NICU nurses were asked open-ended questions that had been prepared in advance by a moderator experienced in qualitative research. Sessions were videotaped to ensure completeness of data collection, and comments were analyzed using qualitative research software (NVivo 2.0; QSR, Doncaster, Australia) to identify and classify key concepts and themes raised by participants.
Our studies indicate that the accuracy, reliability, and image quality of remote ROP diagnosis are very high at later post-menstrual ages, and that telemedicine may be more cost-effective than standard care.22 Unresolved issues include medico-legal liability, engineering of telemedicine strategies into existing neonatal workflows, uncertainty about image quality at earlier post-menstrual ages,20 and training and standardization to ensure consistent quality of image capture and review. Major benefits would result from the availability of wide-angle retinal photographs used for telemedicine diagnosis. Images could be reviewed to precisely measure critical diagnostic features such as zone of disease and changes in vascular appearance over time. These images could one day be supplied to computer-based analysis systems for identification of parameters such as plus disease, which are clinically important yet currently diagnosed using methods that are subjective and qualitative. We have shown that automated image analysis has potential to diagnose plus disease comparably to or better than experts, and these tools could provide significant added value to telemedicine systems. This would be analogous to widely used methods for computer-based interpretation of electrocardiograms and Papanicolaou smears.41,42
Additional studies will be required to understand which features are considered most important by experts for diagnosis of ROP and plus disease, how reference standards for clinical parameters such as plus disease should best be determined, and whether image-based ROP diagnosis introduces additional bias43 or improves consistency compared to traditional ophthalmoscopic examination. Ultimately, a new definition of plus disease based on quantitative, measurable vascular parameters may improve accuracy and decrease variability in diagnosis. The future challenge will be to demonstrate improved quality through technology, while maintaining or improving critical aspects of the patient–physician relationship.
The authors have no commercial, proprietary, or financial interest in any of the products or companies described in this article. Dr. Chiang is an unpaid member of the Scientific Advisory Board at Clarity Medical Systems (Pleasanton, CA).
Supported by a Career Development Award from Research to Prevent Blindness (MFC) and by grant EY13972 from the National Eye Institute of the National Institutes of Health (MFC). The authors would like to thank all expert participants for their contributions to these projects.
Institution where study conducted: Columbia University Medical Center
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.