Automated mosaic software programs are used to stitch together overlapping retinal fundus photographs. The performance of these programs in eyes with retinal diseases has not been independently evaluated. This study compares the quality of the mosaic products of three autophotomontage software programs, using digital fundus photographs of eyes with cytomegalovirus (CMV) retinitis.
Photographs of 99 eyes with CMV retinitis of 94 patients with HIV were taken at Maharaj Nakorn Chiang Mai Hospital in Chiang Mai, Thailand. Automated mosaic images were created for each of the 99 eyes by three different commercially available programs: IMAGEnet (Topcon, Oakland, NJ), i2k Retina (DualAlign LLC, Clifton Park, NY), and AutoMontage (OIS, Sacramento, CA). Three masked graders ranked each set of mosaics for each eye. The graders also assessed the overall image quality and documented mosaic artifacts in each image.
i2k Retina was ranked as the best program (70%–88%) more often than AutoMontage (10%–33%, P < 0.001) or IMAGEnet (0%–4%, P < 0.001) for creating automontages from digital fundus photographs of eyes with CMV retinitis. Acceptable quality mosaic images were reported most commonly for i2k Retina (93%–94%) and AutoMontage (91%–95%), followed by IMAGEnet (27%–56%, P < 0.001). IMAGEnet had a significantly higher percentage of mosaic errors than did either i2k Retina or AutoMontage (P < 0.001).
In eyes with CMV retinitis, both the i2k Retina and AutoMontage software packages appear to create higher quality mosaics than does IMAGEnet. Automated retinal mosaic imaging may be valuable in diagnosing CMV retinitis and observing disease progression.
Automated mosaic software programs stitch together multiple fundus photographs for the evaluation of retinal diseases. These programs use multiple overlapping digital fundus photographs to construct a single composite image, easing the review of these photographs.1,2 Programs with automated mosaic functions may use landmarks in the normal retina, such as the optic disc and the vasculature, to assist in the creation of a photomontage.3,4 Although many mosaic programs are in use, their performance in images with retinal pathology has not been independently evaluated.
CMV retinitis is an opportunistic infection affecting severely immunocompromised patients, including many patients with HIV/AIDS.5 Although the incidence of CMV retinitis has declined in industrialized countries due to the widespread availability of antiretroviral medications,6 it remains a significant cause of blindness in many regions of the world affected by HIV/AIDS.7–9 In Southeast Asia, CMV retinitis is still present in up to one third of HIV/AIDS patients with CD4 counts of less than 50 cells/μL.7,10 CMV retinitis has a characteristic appearance7 and is a new and important target for telemedicine screening in developing nations.11
An accurate photomontage of eyes with CMV retinitis can be helpful in determining the location, extent, and progression of disease, and may have a role in the remote diagnosis and treatment of CMV. We recently performed a study on telemedicine diagnosis of CMV retinitis at an ocular infectious diseases clinic in northern Thailand. As an ancillary study, we evaluated three commercially available autophotomontage software programs using digital fundus photographs of eyes with CMV retinitis.
The study protocol was in compliance with the Declaration of Helsinki, and institutional review board approval was prospectively obtained at the University of California, San Francisco, and Chiang Mai University.
This study was conducted using fundus photographs acquired from 94 patients with HIV at the Ocular Infectious Diseases Clinic at Maharaj Nakorn Chiang Mai Hospital in Chiang Mai, Thailand from August 2008 through April 2009. All patients presenting for their initial visit or diagnosed with CMV retinitis in the previous month were offered enrollment in the study. The image capture protocol was analogous to the fundus photography protocol used in the Longitudinal Study of the Ocular Complications of AIDS.12 A trained ophthalmic photographer used a digital fundus camera (TRC-NW 6S; Topcon, Oakland, NJ) to capture overlapping 45° fundus photographs in each eye: one central image of the optic disc and macula, with eight or more surrounding midperipheral images. Each eye was photographed only once. A mean of nine fundus photographs were taken for each eye (range, 8–15).
In this ancillary study, we included 89 eyes with a definite clinical diagnosis of CMV retinitis, 5 eyes with an uncertain diagnosis of CMV, and 5 eyes with a diagnosis from photographic review only. One of the authors (JC) loaded all photographs for each study eye into each of three commercially available automontage programs: OIS AutoMontage (OIS, Sacramento, CA), i2k Retina (DualAlign LLC, Clifton Park, NY), and IMAGEnet Professional (Topcon, Oakland, NJ). We used the factory default settings for each program, without specialized image registration settings. Although we did not use any of the advanced options, all three programs can accommodate as many individual photographs as necessary, and all can display images as blended or unblended mosaics. Advanced blending options are available in i2k Retina and OIS AutoMontage, and IMAGEnet allows for manual construction of a mosaic. The same author (JC) created all the mosaics, documenting whether at least two individual photographs were stitched together and also whether all the individual photographs in the set had been incorporated into the mosaic. The completed mosaics were presented to three expert graders (DH, TPM, and JDK) for two rounds of grading. First, the mosaics were presented as randomly sorted sets of three mosaics per study eye (one mosaic from each software program), to facilitate a side-by-side comparison. Second, the mosaics were presented in random order (without stratifying by study eye), with each mosaic paired with the individual central photograph used in constructing the mosaic. The graders were masked to the identity of the software.
In the first round of grading (i.e., side-by-side comparison of the three fundus montages from the same study eye), the three graders assigned each image a rank based on personal preference (1, best; 3, worst). Images that could not be stitched into a mosaic by one program, but could be by another, were automatically given the lowest ranking. In the second round of grading, the same three graders assessed whether each mosaic was gradable or ungradable. Images that were considered to be ungradable were not assessed further. For gradable mosaics, images were assessed for the presence or absence of the following five photographic features: duplicated vessels (duplication of the same blood vessel from a common origin), breakage in vessel continuity (sudden sharp disruption of a single, nonduplicated blood vessel that is not due to image misplacement), misplaced images (placement of one or more of the individual photographs in an incorrect sector of the montage), relative blurriness of the mosaic image (compared to the individual central photograph), and acceptable versus unacceptable overall quality of the mosaic (acceptable mosaics were defined as the presence of few or no mosaic errors and the preference for using the mosaic as opposed to the individual images for clinical care). The graders were also invited to make comments about individual mosaics. They repeated this exercise on a set of 50 randomly selected duplicate mosaics approximately 2 weeks after the initial grading, to assess intrarater reliability. All graders viewed images for the study on a glossy monitor manufactured by Apple, Inc. (Cupertino, CA).
We performed descriptive statistics for each software program separately for each grader. The proportion of mosaic images assigned a rank of 1 and the proportion assigned a rank of 3 were computed for each program. To assess whether the ranks differed between the three software programs (using all three raters for each eye), we computed a simple rank centroid statistic, as described in Appendix A. We further examined whether the differences in rank were statistically significant in pairwise comparisons of the three programs.
Intrarater reliability was assessed using Cohen's κ on two separate measurements from each rater (with 95% CI computed using the bootstrap percentile method),13 except when the estimate was 1, in which case we used the exact method to compute a lower, 97.5%, boundary.14 Interrater reliability between all three raters was assessed with the Fleiss κ for multiple raters (with 95% CI computed using the bootstrap percentile method).13 The κ coefficients were interpreted according to the following categories: κ < 0.4 corresponds to poor agreement, κ from 0.4 to 0.75 corresponds to fair to good agreement, and κ >0.75 corresponds to excellent agreement.15
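The intrarater calculation described above can be illustrated with a minimal sketch, assuming binary grades (1, feature present; 0, absent) recorded twice for the same images. The function names, the example grades, the number of bootstrap replicates, and the random seed are illustrative assumptions and are not taken from the study, which performed its analysis in R.

```python
import random


def cohen_kappa(first, second):
    """Cohen's kappa for two paired lists of categorical grades."""
    n = len(first)
    categories = set(first) | set(second)
    observed = sum(a == b for a, b in zip(first, second)) / n
    expected = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    if expected == 1.0:
        # Both gradings use one identical category: kappa is undefined (0/0).
        return float("nan")
    return (observed - expected) / (1.0 - expected)


def bootstrap_percentile_ci(first, second, replicates=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for kappa, resampling images with replacement."""
    rng = random.Random(seed)
    n = len(first)
    estimates = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohen_kappa([first[i] for i in idx], [second[i] for i in idx])
        if k == k:  # drop resamples where kappa is undefined (NaN)
            estimates.append(k)
    if not estimates:
        raise ValueError("kappa was undefined in every bootstrap resample")
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 + level) / 2 * len(estimates)))]
    return lower, upper


# Hypothetical grades for 10 images, graded twice by the same rater.
initial = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
repeat = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohen_kappa(initial, repeat), 3))  # 0.8
print(bootstrap_percentile_ci(initial, repeat))
```

One property worth noting: when a rater assigns the same grade to every image in one session and to all but one image in the other, raw agreement is high but chance-expected agreement is equally high, so κ is 0. For example, `cohen_kappa([0] * 50, [0] * 49 + [1])` returns 0.0 despite 98% raw agreement.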
We used clustered logistic regression to model the probability of the presence of each feature, accounting for the correlation between images captured from the same person and also the correlation between images graded by the same grader. We used a likelihood ratio test to assess whether there was a significant difference in the frequency of each photographic feature between the three programs. When this overall hypothesis test was significant, we further examined pairwise contrasts.
We then assessed whether the presence or absence of each of the photographic features was associated with the programs' ranking. Since each grader also determined the presence or absence of each feature of interest, we calculated the correlation coefficient for each eye–grader combination. Each estimated correlation coefficient summarizes the degree of association between the presence or absence of each feature and the rankings for the three software programs (for that eye and grader). In the absence of any association, the expected average of these correlations is 0. We tested the hypothesis that these were 0 by estimating the intercept in a linear mixed effects regression model with eye and grader as random effects.
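The per-combination correlation described above can be sketched as follows, assuming a Pearson correlation between a 0/1 feature indicator and the rank across the three programs. The data layout, the eye and grader labels, and the handling of undefined correlations (skipped when a feature is present in all or none of the three mosaics) are illustrative assumptions. The mixed effects model used for the hypothesis test is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns None when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gradings: (eye, grader) -> (feature present per program, rank per program).
gradings = {
    ("eye01", "A"): ([0, 0, 1], [1, 2, 3]),  # feature only in the worst-ranked mosaic
    ("eye01", "B"): ([0, 1, 1], [1, 3, 2]),
    ("eye02", "A"): ([0, 0, 0], [2, 1, 3]),  # feature never present: undefined
}

correlations = {key: pearson(feature, ranks) for key, (feature, ranks) in gradings.items()}
defined = [r for r in correlations.values() if r is not None]
mean_correlation = sum(defined) / len(defined)
print(correlations)
print(mean_correlation)
```

With rank 1 as best, a positive correlation here means the feature tends to appear in worse-ranked mosaics.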
To determine whether missing individual images (i.e., the full set of individual images was not incorporated into the mosaic) was predictive of the ranks by each grader, we computed the rank centroid for images with the missing indicator and the rank centroid for those without and computed the distance between these two centroids (see Appendix A for details). Under the null hypothesis, this difference of distances should be 0 on average; the sampling distribution of this difference was computed using the permutation method. We then computed Spearman's rank correlation coefficient between missing individual images and the mean rank of each photo. Statistical analysis was performed with R 2.12 (University of Auckland, Auckland, New Zealand). We accounted for multiple comparisons by setting the significance level at α = 0.002, which assumed a Bonferroni correction with a family-wise error rate of α = 0.05 and 25 comparisons.
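As a check on the stated threshold, the Bonferroni correction divides the family-wise error rate by the number of comparisons. The symbols below are introduced here only for illustration:

```latex
\alpha_{\text{per comparison}} = \frac{\alpha_{\text{family-wise}}}{m} = \frac{0.05}{25} = 0.002
```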
All 99 sets of retinal photographs were processed by each of the three auto-photomontage programs. The programs differed in their ability to create mosaics from the study eyes and also in their ability to incorporate all the individual photographs into the final mosaic. The i2k Retina program incorporated at least two individual images in its mosaics for 97% of the study eyes, AutoMontage for 100% of the study eyes, and IMAGEnet for 92% of the study eyes (P = 0.01, χ2). i2k Retina incorporated all the individual photographs for 82.8% of the study eyes, AutoMontage for 77.8% of the study eyes, and IMAGEnet for 32.3% of study eyes (P < 0.001, χ2).
As summarized in Figure 1, graders most frequently assigned the number one ranking to photographic mosaics formed by i2k Retina (70%–88%), followed by AutoMontage (10%–33%), and then IMAGEnet (0%–4%). IMAGEnet received the lowest ranking for 79% to 97% of the mosaics. Both i2k Retina (4%–7%) and AutoMontage (2%–14%) were rarely ranked 3. Differences in assigned rank between programs were statistically significant (P < 0.001, permutation test, rank centroid).
Photographic mosaics were graded for duplicated vessels, vessel breaks, misplaced images, relative blurriness, and overall quality. Intrarater reliability was higher for graders A and B than for grader C (Table 1). Although grader C had an intrarater κ of 0 for duplicated vessels, breaks in vessel continuity, and relative blurriness, the uncorrected agreement for each feature was ≥96%. The low κ occurred because grader C assigned the same grade to all 50 duplicate images used to assess intrarater reliability. Graders A and B each had good to excellent intrarater reliability for all photographic features except for vessel breaks (κ = 0.31 for grader B). Interrater reliability among all three graders was poor for duplicated vessels, breaks, and blurriness, largely because grader C had low agreement with the other two graders (κ < 0.2 for duplicated vessels, vessel breaks, and relative blurriness for reliability between A and C and B and C).
There were significant differences between the three software programs for each of the five photographic features that were graded (overall P < 0.001 for each feature). Figure 2 summarizes the presence of the five photographic features in each program. Figures 3 and 4 show examples of the photographic features. IMAGEnet produced mosaics that were more likely to have duplicated vessels, misplaced images, relative blurriness, and unacceptable overall quality than those produced by either AutoMontage or i2k Retina (P < 0.001 for each comparison). Mosaics by IMAGEnet were more likely to have breaks in blood vessel continuity than those by i2k Retina (P < 0.001), but not more likely than those by AutoMontage (P = 0.95). The i2k Retina and AutoMontage programs differed in only two of the photographic features, with duplicated vessels (P = 0.001) and breaks in blood vessel continuity (P < 0.001) observed more frequently in the AutoMontage images. There was no significant difference between the i2k Retina and AutoMontage programs for misplaced images (P = 0.37), relative blurriness (P = 0.75), or unacceptable overall quality (P = 0.15).
The overall rank assigned to a photographic mosaic correlated significantly with the presence of each of the five photographic features (P < 0.001 for each feature; Table 2). The magnitude of association was relatively strong for misplaced images, image blurriness, duplicated vessels, and overall quality. A much lower correlation coefficient was observed for vessel breaks. We also found that the ability of a program to incorporate all individual images into the mosaic was associated with its overall ranking, with a correlation coefficient of 0.43 (P < 0.001, permutation test, rank centroid difference).
All graders commented on two types of image distortions that were not elicited in the grading form but were present in some mosaics: (1) visible, overlapping borders of the individual images at the suture lines of the peripheral mosaic and (2) linear color changes in the mosaic, usually present in the center of the image (Figs. 3, 4), that were not present in the individual central photograph. These abnormalities were systematically quantified by grader A, who documented visible overlapping borders in 23% of the 99 AutoMontage-created mosaics, 1% of the 96 i2k Retina-created mosaics, and none of the 91 IMAGEnet-created mosaics. He also noted abrupt color changes in 22% of the 96 mosaics made by i2k Retina, 4% of the 99 mosaics made by AutoMontage, and none of the 91 mosaics made by IMAGEnet.
The i2k Retina and AutoMontage software packages appear to create high-quality photographic mosaics from digital photographs of retinas with CMV retinitis. These programs created mosaics of higher quality and with fewer missing images or other errors than those created by IMAGEnet. Although the i2k Retina and AutoMontage programs scored similarly in many aspects, all three graders ranked i2k Retina as the best program more frequently. Each of the photographic features documented by the three graders (duplicated vessels, vessel breaks, misplaced images, relative blurriness, and overall quality) was significantly associated with a program being ranked as inferior, with vessel breaks having a weaker association than the other errors.
In this study, we evaluated the specific features that play a role in determining the quality of a mosaic retinal image. Several studies have compared different imaging modalities and their clinical usefulness in retinal diseases, including comparisons between 35-mm film and digital imaging in diabetic retinopathy.16–18 However, we are unaware of any previous reports comparing the clinician-assessed quality of different retinal imaging software products that perform the same imaging function. Because ophthalmic imaging technology is rapidly advancing, objective clinical evaluation of each new facet of technology is important. By assessing the automontage process, this study provides a framework for assessing other imaging programs based on clinician judgment.
Within the field of CMV research and telemedicine, the ability to create consistent, high-quality mosaics can be clinically useful. Although high-quality mosaics may not be necessary for the diagnosis of CMV retinitis, they allow for easier comparison with previous photographs, which could simplify the identification of new lesions and aid in the assessment of CMV retinitis border advancement. High-quality mosaics could therefore improve clinical practice, and they could also be used as an outcome measure in clinical and epidemiologic research.
We used a competitive ranking system in this study, which did not allow ties, and therefore we cannot provide information on the magnitude of difference between the three programs. However, we addressed this by also collecting information on five photographic features. We observed that although the i2k Retina software received the highest ranking most of the time, the quality of AutoMontage mosaics was similar to that of i2k Retina for three of the five features (misplaced images, relative blurriness, and overall quality), suggesting that there was probably not a large margin that separated the rankings of these two programs.
Vessel breakage in the retinal montages had some of the lowest intra- and interrater reliability values and also displayed the weakest degree of correlation with a program's ranking. In general, vessel breaks were often subtle findings that occurred at the mosaic periphery (Fig. 4). Perhaps in this location the breaks were not visually distracting and did not interfere with assessment of pathology. This finding suggests that vessel breaks may not be an appropriate measure of software performance. Programmers of mosaic software may want to focus on reducing the presence of other mosaic errors before addressing vessel breaks.
There were several limitations to this study. First, we did not examine any of the cartographic or mathematical problems involved in constructing an accurate mosaic (such as the coordinate system, the model for image transformation, and the construction of a flat image from a curved retinal surface1), but instead chose to focus on the final mosaic image itself. Second, we did not assess differences between the programs regarding measurement of lesions. Because of the cartographic distortions inherent in the mosaic construction process, linear and area measurements of the retina are artificially inflated, especially in the periphery of the mosaic—a problem outside the scope of this study. Third, we focused on photographic features that were relatively easy to quantify, but did not include some features that are most likely very important to the viewing experience but more difficult to assess, such as image distortion induced by the montaging procedure. Fourth, the retinal camera stored all individual photographs as highly compressed images (roughly 320 kb with dimensions of 3872 × 2592 pixels). It is possible that the mosaic programs would have performed differently had higher quality, uncompressed images been uploaded. However, results should not have been biased by the use of compressed images, since all three mosaic programs were loaded with the same sets of photographs. Finally, we did not standardize monitor size and resolution.
We found no significant difference in the proportion of mosaics with acceptable overall quality between i2k Retina and AutoMontage, both of which created high-quality images with fewer errors than IMAGEnet. The i2k Retina program was nonetheless most often ranked first among the three programs. The accurate montages that these programs create may be valuable in future CMV retinitis research and telemedicine. Further investigation is necessary to determine whether our findings can be generalized to retinal diseases other than CMV retinitis.
To assess whether the ranks of the three software programs differed statistically, we proceeded as follows. For each grader, we obtained the rank for software programs 1, 2, and 3 (note that these dependent variables always sum to 6). For each eye, we transformed these ranks to the coordinates of a point with a distance to the nearest side of an equilateral triangle that is proportional to the rank:
Under the null hypothesis that the ranks are independent of the software program, the centroid of these points lies at the center of the triangle. We computed the following multivariate statistic (combining all three ranks for each grader on each eye). Specifically, we computed the distance of the rank centroid from the center of the triangle,
The sampling distribution of this statistic under the null hypothesis was computed using the permutation method with 10,000 replicates. All three graders were weighted equally.
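The procedure in this appendix can be sketched in code under stated assumptions. The coordinate formula is not available in this text, so the sketch assumes a barycentric (ternary plot) mapping: each program is a vertex of a unit-side equilateral triangle, and a rank triple is placed at the point with weights rank/6, so that its distance to the side opposite each vertex is proportional to that program's rank. The null distribution is generated by shuffling the three ranks within each eye and grader combination. The function names, the p-value convention (adding 1 to numerator and denominator), and the seed are also assumptions rather than the authors' implementation.

```python
import math
import random

# Assumed layout: one vertex of a unit-side equilateral triangle per software program.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
CENTER = (0.5, math.sqrt(3) / 6)


def to_point(ranks):
    """Map a rank triple (a permutation of 1, 2, 3) to the point with weights rank/6."""
    total = sum(ranks)
    x = sum(r / total * vx for r, (vx, _) in zip(ranks, VERTICES))
    y = sum(r / total * vy for r, (_, vy) in zip(ranks, VERTICES))
    return x, y


def centroid_distance(rank_triples):
    """Distance from the centroid of all mapped points to the triangle center."""
    points = [to_point(t) for t in rank_triples]
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.hypot(cx - CENTER[0], cy - CENTER[1])


def permutation_p_value(rank_triples, replicates=10_000, seed=0):
    """Permutation test: shuffle program labels within each eye and grader triple."""
    rng = random.Random(seed)
    observed = centroid_distance(rank_triples)
    extreme = 0
    for _ in range(replicates):
        shuffled = [rng.sample(t, len(t)) for t in rank_triples]
        if centroid_distance(shuffled) >= observed - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (replicates + 1)


# Hypothetical ranks (program 1, program 2, program 3), one triple per eye and grader.
triples = [(1, 2, 3)] * 20 + [(2, 1, 3)] * 8 + [(1, 3, 2)] * 2
statistic, p_value = permutation_p_value(triples)
print(statistic, p_value)
```

Because each eye and grader combination contributes one triple, graders with equal numbers of graded eyes are weighted equally in this sketch. When every program has a mean rank of 2, the centroid falls on the triangle center and the statistic is 0.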
Supported by the University of California, San Francisco-Gladstone Institute of Virology and Immunology Center for AIDS Research, the University of California at San Francisco Research Evaluation and Allocation Committee, National Institutes of Health/National Eye Institute (NIH/NEI) Grant K23EY019071, That Man May See, the Littlefield Trust, the Peierls Foundation, the Doris Duke Charitable Foundation, and grateful patients. OIS AutoMontage and i2k Retina donated their automontage software for the study. Topcon IMAGEnet Professional was previously purchased along with the Topcon digital fundus camera. None of the software companies tested in this article had any input into the study design or analysis.
Disclosure: J. Chen, None; S. Ausayakhun, None; S. Ausayakhun, None; C. Jirawison, None; C.M. Khouri, None; T.C. Porco, None; D. Heiden, None; J.D. Keenan, None; T.P. Margolis, OIS (F), DualAlign LLC (F)