Automated perimetry does not produce reliable estimates of true psychophysical threshold in glaucomatous visual fields when the perimetric threshold falls below 15 to 19 dB. It may be possible to truncate testing at such locations and not use stimuli with very high contrast. However, this can only be recommended if it does not harm the ability to monitor change. This study examined the effect of applying such a cutoff by censoring sensitivities in two existing longitudinal datasets.
Series of six visual fields were taken from participants with glaucoma or high-risk ocular hypertension in the Portland Progression Project (P3) and Rotterdam Eye Study (RES). Pointwise linear regression was used to find “progressing” locations, defined as a slope ≤ −1 dB/y with P < 1%. An eye was labeled progressing if ≥3 locations were progressing. This was repeated after setting any sensitivities below the cutoff value C (CdB) to instead equal that value for different integer values of CdB.
In the P3 cohort tested using Swedish Interactive Testing Algorithm (SITA) Standard, censoring below 15 to 19 dB did not reduce the number of eyes flagged as progressing. For the RES cohort tested using the Full Threshold algorithm, censoring below 10 dB did not reduce the number of eyes flagged as progressing, but a modest reduction was seen for CdB between 10 dB and 15 to 19 dB.
The proportion of eyes flagged as progressing was not decreased by censoring unreliable sensitivities. Restricting the range of contrast used in clinical perimetry may be possible without hampering the ability to monitor glaucomatous visual field progression.
Functional measurement is an essential part of the standard of care for patients with glaucoma. Most commonly, measurements are performed using white-on-white standard automated perimetry (SAP). The principle of this testing modality is that the patient looks at a uniform background of low luminance and presses a button when a small circular stimulus of greater luminance is detected.1,2 As the stimulus luminance increases, so should the probability that the patient responds. This allows assessment of different levels of light sensitivity, defined as the stimulus contrast to which the patient will respond on 50% of presentations, at different locations across the visual field.
Perimetry using modern automated equipment such as the Humphrey field analyzer (HFA; Carl-Zeiss Meditec, Inc., Dublin, CA, USA) presents stimuli as high as 10,000 apostilbs (asb). However, this range may be higher than is necessary. Earlier instruments such as the Goldmann perimeter unit used a maximum luminance of 1000 asb,3 beyond which the effects of stray light caused by diffusion within the eye become substantial. It was reported that “a maximum target luminance of 1000 asb…is entirely sufficient and adequate to measure deep scotomata with the geometry and stray light behavior of the blind spot, down to the limit set by stray light interference.”4 Even this target luminance may be too high a contrast to obtain useful information; it has been suggested that “since our data suggest that, with SAP III, threshold estimates below 20 dB have little value for predicting the value at retest, one can make the case that for the purposes of detecting change, examination of such test locations could be eliminated.”5 This could be caused by nonlinearities in the magnitude of the response to contrast stimuli.6,7
In a recent paper, we used frequency-of-seeing curves to assess responses to high-contrast stimuli in regions of glaucomatous functional damage. We found that clinical perimetry did not produce reliable estimates of the psychophysical detection threshold when it was above 400% to 1000% contrast,8 using a conventional size III stimulus (a circular luminance increment with a diameter of 26 minutes of arc). In equivalent clinical units, perimetric sensitivities did not provide reliable estimates of contrast sensitivity below 15 to 19 dB. In that study, the clinical testing was performed using the HFA with the Swedish Interactive Testing Algorithm (SITA) Standard test.9 For locations with sensitivity worse than 15 to 19 dB, the relationship between sensitivity measurements from frequency-of-seeing curves and those obtained from clinical perimetry had an R² value of <0.1, indicating that the true sensitivity explained less than 10% of the observed variance in perimetric sensitivity. The probability that observers responded to a 20,000% (2-dB) contrast stimulus in a deep visual field defect was typically only very slightly higher than the probability that they would respond to a 2000% (12-dB) contrast stimulus at the same location. We hypothesized that this was caused by the effects of response saturation,10,11 whereby the remaining retinal ganglion cells were already near their maximum firing rate, so further increases in contrast would not cause an appreciable increase in response probability (with any small increase possibly reflecting effects of light from the stimulus being scattered toward relatively more healthy retinal areas with higher sensitivity). In some cases, the maximal response probability was below 50%, so sensitivity, as conventionally defined, would not exist at that location.
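The correspondence between percent contrast and perimetric dB quoted above can be checked with a short calculation. The following is a minimal sketch rather than the instrument's own code; it assumes a 0-dB stimulus of 10,000 asb (as stated earlier) and a background luminance of 31.5 asb, which is an assumption not given in the text. The function names are hypothetical.

```python
import math

MAX_STIMULUS_ASB = 10_000.0  # luminance increment of the 0-dB stimulus
BACKGROUND_ASB = 31.5        # ASSUMED background luminance (not stated in the text)


def db_to_contrast_percent(db: float) -> float:
    """Weber contrast (%) of the stimulus presented at a given dB attenuation."""
    increment_asb = MAX_STIMULUS_ASB * 10 ** (-db / 10)
    return 100.0 * increment_asb / BACKGROUND_ASB


def contrast_percent_to_db(percent: float) -> float:
    """Inverse mapping: dB attenuation that yields the given Weber contrast (%)."""
    increment_asb = percent / 100.0 * BACKGROUND_ASB
    return 10.0 * math.log10(MAX_STIMULUS_ASB / increment_asb)


if __name__ == "__main__":
    for db in (2, 12, 15, 19):
        print(f"{db:>2} dB -> about {db_to_contrast_percent(db):,.0f}% contrast")
```

Under these assumptions, 2, 12, 15, and 19 dB map to roughly 20,000%, 2000%, 1000%, and 400% contrast, matching the figures quoted above.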
It should be noted that transition from “reliable” to “unreliable” perimetric sensitivities is gradual, not abrupt. The maximum response probability is asymptotic, so an increase in stimulus contrast will still produce a very small increase in response probability beyond any chosen cutoff value. Furthermore, the contrast at which these effects become apparent will vary among eyes and locations. However, although the distinction between reliable and unreliable estimates is essentially artificial, it is still useful for both research and clinical practice to define a numerical cutoff, even while acknowledging the accompanying caveats.
Some limited useful information could be extracted from locations with sensitivity below 15 to 19 dB, even if the sensitivity itself cannot be reliably measured. A reported value >0 dB (instead of <0 dB) indicates that some function likely remains at that location, as the observer responded to at least one stimulus presentation. Furthermore, a small positive correlation exists between sensitivities observed on repeat testing, even when both test and retest measurements are below 15 dB.12 This is consistent with differences in the asymptotic maximum response probability between such locations. A location with 80% asymptotic response probability to a high-contrast stimulus would tend to generate higher sensitivity estimates than a location with 20% asymptotic response probability to high-contrast stimuli, because there is an increased chance that the observer will respond to one or more stimulus presentations.
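The argument about asymptotic response probability can be illustrated with simple arithmetic. This sketch assumes that responses to successive high-contrast presentations are independent and that a location receives three such presentations; both are illustrative assumptions, not properties of any particular testing algorithm.

```python
def prob_at_least_one_response(p_max: float, n_presentations: int) -> float:
    """Chance of responding to one or more of n independent presentations,
    each seen with the asymptotic maximum probability p_max."""
    return 1.0 - (1.0 - p_max) ** n_presentations


if __name__ == "__main__":
    n = 3  # hypothetical number of high-contrast presentations at one location
    for p_max in (0.8, 0.2):
        p_any = prob_at_least_one_response(p_max, n)
        print(f"p_max = {p_max:.0%}: P(at least one response in {n}) = {p_any:.3f}")
```

With these assumed numbers, the location with 80% asymptotic response probability almost always produces at least one response (and so a reported value above <0 dB), whereas the location with 20% does so about half the time.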
Therefore, our working hypothesis was that automated perimetry effectively measures contrast sensitivity when the outcome is above 15 to 19 dB but (albeit poorly) reflects the maximal response probability when the outcome is below 15 to 19 dB. In this scenario, it may be better to cease testing at a given location at 15 dB (in the same manner as current clinical perimetry ceases at 0 dB), saving testing time. Reducing the test duration would also reduce fatigue, improving the reliability of results. It would also allow test duration to be more similar between patients; at present, testing takes significantly longer for damaged eyes. Additionally, this change would minimize the effects of stray light and photoreceptor bleaching that are caused by very high contrast stimuli, thus providing further improvement to the reliability and repeatability of measurements of sensitivity.
In this study, we asked whether sensitivities below a certain cutoff contributed to monitoring of glaucomatous visual field change, using current clinical testing algorithms. We applied pointwise linear regression13,14 to series from two separate cohorts and then determined whether each eye would be flagged as “progressing.” This was then repeated after setting any sensitivities below CdB to equal CdB, for a range of cutoff (C) values. If the ability to detect progression were compromised by censoring, then contrasts below CdB would indeed be necessary. However, if progression could be detected just as easily using the censored sensitivities, then ceasing testing at this value might be feasible.
Participants in the Portland Progression Project (P3)15 were recruited to a tertiary glaucoma clinic study at Devers Eye Institute. Inclusion criteria were simply a diagnosis of primary open-angle glaucoma and/or likelihood of developing glaucomatous damage, as determined by each participant's physician. Exclusion criteria included inability to perform reliable visual field testing, best-corrected visual acuity worse than 20/40, cataract or media opacities likely to significantly increase light scatter, or other conditions or medications that might affect the visual field. All protocols were approved and monitored by the Legacy Health Institutional Review Board and adhered to the Health Insurance Portability and Accountability Act of 1996 and tenets of the Declaration of Helsinki. All participants provided written informed consent once all of the risks and benefits of participation were explained to them.
Portland Progression Project participants were tested every 6 months using a variety of structural and functional tests. Standard automated perimetry was performed using an HFA II perimeter, with the 24-2 test pattern, a size III white-on-white stimulus, and SITA Standard testing algorithm.9 Only test results with <33% false negatives and false positives were used. Test results with >33% fixation losses were included only if the technician performing the test noted that fixation remained “stable” as observed on the instrument's operator screen, as fixation losses can be erroneous if the blind spot is inaccurately mapped at the start of the test. For this study, only eyes with series of at least six reliable tests by these criteria were included in the analysis.
Participants with glaucoma (excluding pseudoexfoliation glaucoma) were recruited into the Rotterdam Eye Study (RES) from the Rotterdam Eye Hospital. Details of the RES study have been published elsewhere.16,17 The dataset was made available for research and was downloaded from http://www.rodrep.com/data-sets.html (in the public domain) on June 25, 2015. Inclusion criteria required that two of the following four conditions were met: pattern standard deviation was significantly abnormal, with P < 0.05; glaucoma hemifield test result was abnormal; presence of a cluster of ≥three locations, depressed at P < 0.05; or one location at P < 0.01. Visual field defects had to be reproduced on at least one additional occasion. Exclusion criteria were secondary glaucoma (except for pigmentary glaucoma); evidence of a functional defect consistent with other disease; visual acuity worse than 0.3 logMAR; refractive error outside the range of −10.0 to +5.0 diopters (D); cataract surgery in the previous 12 months; previous refractive or vitreoretinal surgery; evidence of diabetic retinopathy, diabetic macular edema, or other vitreoretinal disease; previous keratoplastic surgery; diabetes; leukemia; acquired immune deficiency syndrome (AIDS); uncontrolled systemic hypertension; multiple sclerosis or (other) life-threatening disease. All participants provided written informed consent, and the study conformed to the tenets of the Declaration of Helsinki.
Participants in RES were tested approximately every 6 months, using a variety of structural and functional tests. Standard automated perimetry was performed in the same manner as in the P3 study, except that the Full Threshold algorithm was used. Again, only eyes with series of at least six tests were included in this analysis.
Pointwise sensitivities from perimetry were used in the instrument's native dB scale. Although printouts from the HFA perimeter report sensitivities as “<0 dB” if the person does not respond to the 0-dB stimulus, for analytical purposes the perimeter software treats such locations as having sensitivity equal to −2 dB when using SITA Standard and −1 dB when using the Full Threshold algorithm. Censored sensitivities were calculated on the basis that sensitivities below some cutoff CdB are unreliable. Therefore, any sensitivities recorded as equal to or less than CdB were set as having a censored sensitivity equal to CdB. Previously, using frequency-of-seeing curves, we reported that the reliability cutoff is approximately 15 to 19 dB for size III stimuli and threshold estimates made with SITA Standard,8 but for this study, we repeated the analyses using all integer values of CdB from 0 to 35 dB. All analyses were performed using the R statistical programming language.18
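The censoring step can be sketched as follows. The original analyses were performed in R; this Python version is only an illustration of the rule described above, and the function name and example series are hypothetical.

```python
from typing import List, Sequence


def censor(sensitivities_db: Sequence[float], cutoff_db: float) -> List[float]:
    """Set every sensitivity at or below the cutoff equal to the cutoff."""
    return [max(s, cutoff_db) for s in sensitivities_db]


if __name__ == "__main__":
    # Hypothetical series at one location; -2 dB stands for a "<0 dB" result
    # as encoded under SITA Standard.
    series = [22, 18, 14, 9, -2, 3]
    for cutoff in (0, 15, 25):  # the study itself swept integer cutoffs 0 to 35
        print(f"CdB = {cutoff:>2}: {censor(series, cutoff)}")
```

With a cutoff of 15 dB the example series becomes 22, 18, 15, 15, 15, 15: the early decline is preserved, while the erratic values in the unreliable range are flattened.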
Each of the 52 non–blind spot locations within the 24-2 test pattern was labeled progressing if the rate of change was worse than −1 dB/y and the slope was significantly different from zero at the P < 1% level. Otherwise, the location was labeled stable.13 In the primary analysis, the eye was labeled progressing if at least three locations were progressing by these criteria. In a secondary analysis, the eye was labeled progressing if at least three neighboring locations were progressing by these same criteria. “Neighboring locations” were defined as those within ±6° both horizontally and vertically (i.e., a maximum of eight neighbors), and within the same hemisphere (except for locations temporal of the blind spot where the same hemisphere requirement was not imposed). Requiring three locations to be progressing results in a strict criterion that will provide high specificity14; the aim is to determine whether censoring would cause eyes to be considered stable when there is strong evidence of progression, rather than examining more borderline cases. Analyses were performed using the last six visual fields for each eye and then repeated using the last eight visual fields per eye when a sufficiently long series was available.
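The primary progression criterion can be sketched in a few lines. This is an illustrative Python version, not the authors' R code. It assumes ordinary least-squares regression of sensitivity on time, a two-sided test of the slope against zero (the text does not say whether the test was one- or two-sided), and tabulated critical t values for series of six or eight fields only. All function names and example data are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Two-sided 1% critical values of Student's t, keyed by degrees of freedom
# (n - 2). Only the two series lengths used in the study are covered.
T_CRIT_1PCT = {4: 4.604, 6: 3.707}


def slope_and_t(years: Sequence[float], sens_db: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope (dB/y) and its t statistic against a zero slope."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(sens_db) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, sens_db))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, sens_db))
    if rss == 0.0:
        # Perfect fit: a flat line carries no evidence, any other slope is exact.
        return slope, (0.0 if slope == 0.0 else math.copysign(math.inf, slope))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def location_progressing(years: Sequence[float], sens_db: Sequence[float]) -> bool:
    """Slope of -1 dB/y or worse, significant at the (assumed two-sided) 1% level."""
    slope, t_stat = slope_and_t(years, sens_db)
    t_crit = T_CRIT_1PCT[len(years) - 2]  # KeyError for unsupported series lengths
    return slope <= -1.0 and abs(t_stat) > t_crit


def eye_progressing(years: Sequence[float],
                    series_by_location: Sequence[Sequence[float]],
                    min_locations: int = 3) -> bool:
    """Primary criterion: at least three progressing locations anywhere in the field."""
    n_progressing = sum(location_progressing(years, s) for s in series_by_location)
    return n_progressing >= min_locations


if __name__ == "__main__":
    visit_years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]  # six visits, 6 months apart
    declining = [30, 29, 28.5, 27, 26.5, 25]      # hypothetical location
    stable = [28, 29, 27, 28, 29, 28]             # hypothetical location
    eye = [declining] * 3 + [stable] * 49         # 52 non-blind spot locations
    print("Eye flagged as progressing:", eye_progressing(visit_years, eye))
```

The secondary criterion would additionally require the three flagged locations to be neighbors as defined above; that adjacency check is omitted here for brevity.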
The P3 dataset contained series of ≥6 reliable visual fields from 474 eyes of 242 participants. Of these, 410 eyes of 211 participants had ≥8 reliable visual fields. The RES dataset contained series of ≥6 visual fields from 277 eyes of 139 participants, and all those series had ≥8 visual fields. The Table summarizes the two datasets. Histograms of the pointwise sensitivities at the end of each series are shown in Figure 1 for the two datasets.
Figure 2 shows the number of locations flagged as progressing in each dataset, using either the most recent six visual fields (Fig. 2, black lines) or the most recent eight visual fields (Fig. 2, red lines). Figure 3 shows the number of eyes flagged as progressing based on identification of four different criteria: (1) three progressing locations, using the most recent six visual fields (Fig. 3, black solid line); (2) three progressing neighboring locations, using the most recent six visual fields (Fig. 3, black dashed line); (3) three progressing locations, using the most recent eight visual fields (Fig. 3, red solid line); and (4) three progressing neighboring locations, using the most recent eight visual fields (Fig. 3, red dashed line).
For the P3 cohort, the number of progressing eyes is out of a possible total of 474 for the black lines, and 410 for the red lines. For the RES cohort, the number is out of a possible total of 277 eyes in both cases. The left-most data point on each figure is equivalent to current clinical practice, that is, not imposing further censoring on the data. As expected, requiring the three progressing locations to be neighbors reduces the number of eyes flagged as progressing. Extending the series length from six to eight visual fields increases the number of eyes flagged as progressing. This is likely due to the fact that longer series are more likely to be significant at the level of P < 1% for the same rate of change because of a greater number of observations. It should also be noted that sensitivities reported by the Full Threshold algorithm (as used in the RES cohort) are approximately 1 dB lower on average than those obtained from the SITA Standard (as used in the P3 cohort), likely because Full Threshold defines sensitivity using the contrast of the last stimulus seen, which will therefore tend to have higher intensity than the psychophysical threshold.12 Therefore, a CdB = 15 dB when Full Threshold is used is approximately equivalent to a CdB = 16 dB when SITA Standard is used.
Whichever of the four progression criteria is used, in the P3 dataset (tested using the SITA Standard test), a CdB of 15 to 19 dB does not appear to hamper the ability of pointwise linear regression to detect eyes that are deteriorating. In the RES dataset (tested using the Full Threshold test algorithm), cutoffs below 15 dB may cause a small reduction in the number of locations and (to a lesser extent) the number of eyes flagged as deteriorating when using series of eight visual fields, but no such trend is apparent when using series of six visual fields. In contrast, for either dataset, if a cutoff of, for example, 25 dB were imposed (i.e., all sensitivities below 25 dB would be set equal to 25 dB), then this would result in a substantial reduction in the ability to detect functional change.
It has long been established that perimetric sensitivities become more variable in regions of the visual field displaying damage. Our recent paper showed that the correlation between perimetric sensitivities from the SITA Standard test and the true psychophysical detection threshold is in fact so weak below 15 to 19 dB that these measurements cannot be considered reliable.8 Continued testing at such locations may be unimportant for detection of damage, because a sensitivity of 15 dB is abnormal in almost all circumstances. Therefore, the primary potential use of such sensitivity estimates is for assessing progression. Results of this study question whether change detection is helped by such clinical measurements.
When a localized scotoma exists, containing severely damaged locations below 15 to 19 dB but still with some remaining function, these locations can of course continue to deteriorate. However, at this relatively late stage of the disease process, it is uncommon for such changes to occur without concurrent changes elsewhere in the visual field. It is much easier to detect changes happening elsewhere at locations with higher sensitivity, due to the concomitant lower variability. As seen in Figure 3, the number of eyes flagged as progressing was not reduced by moderate censoring, even though there may be a few locations that are no longer flagged as progressing.
It can be seen from Figure 3 that the number of eyes flagged as progressing in the P3 cohort does not decrease until the cutoff CdB is greater than 15 to 19 dB. By contrast, in the RES cohort, censoring might have caused a slight decrease in the ability to detect change. It seems there is more useful information contained in sensitivities between 10 and 20 dB in the RES cohort. There are two major differences between the RES and P3 cohorts that could explain this discrepancy. First, the RES cohort contained more severely damaged eyes. The average mean deviation shown in the Table is much lower for the RES cohort (−9.2 dB) than for the P3 cohort (−1.3 dB). Consequently, the proportion of eyes undergoing change might be expected to be higher for the RES cohort. Without censoring, 24 of 474 eyes (5.1%) were flagged as progressing in the P3 cohort and 25 of 277 eyes (9.0%) in the RES cohort (Fig. 3, solid red line). However, although this difference in severities could impact the likelihood of glaucomatous progression, it should not affect the ability to detect functional change in those eyes that are progressing. It should also be noted that, clinically, as in Figure 1, most locations tended to appear toward the normal end of the spectrum.19
The second and more consequential difference between the two cohorts is that P3 used the SITA Standard test, whereas RES used the Full Threshold test algorithm. If the asymptotic maximal response probability is indeed reached at 15 to 19 dB, then it will not be possible to accurately assess this maximum using just one or two presentations in the SITA Standard. However, the Full Threshold algorithm performs more stimulus presentations at high contrast, using step sizes of 4 or 2 dB and two reversals. Maximal response probability will be reflected in the proportion of these presentations to which the individual responds and hence will be (at least weakly) reflected in the reported sensitivity. Continuing functional deterioration would reduce this probability and hence reduce the reported sensitivity. This means that reported sensitivities may still contain useful information later in the disease process when the Full Threshold algorithm is used than when SITA Standard is used. Unfortunately, the Full Threshold algorithm typically takes more than 15 minutes per eye in damaged eyes, and its potential benefit of being able to obtain useful information slightly later in the disease process is unlikely to outweigh the practical considerations of a much longer test duration. Notably, even when Full Threshold was used in the RES cohort, there was still no obvious reduction in the ability to detect changes when sensitivities were censored up to CdB of approximately 10 dB.
Stopping testing at a given location rather than continuing to test with very high contrast stimuli would seem to reduce the dynamic range of perimetry. Given the difficulty with using structural tests to assess progression in severe glaucoma,20 it seems counterintuitive to reduce the possibility of using functional measures to assess progression. However, the variability is so high in these situations that it is already extremely difficult to reliably assess functional progression at such locations, using current clinical perimetry. Presently, a damaged location progresses from a reliable sensitivity above 15 to 19 dB to something that fluctuates unpredictably within the range between 0 and 15 to 19 dB, to a consistent value of <0 dB when no function remains. If testing ceased at 15 dB, then the main change clinically would be that measurements in the unreliable range would be labeled <15 dB instead of some number within that range, in the same way that locations where the observer does not respond to any stimuli are currently reported as having sensitivity <0 dB. As demonstrated in the results above, the concomitant reduction in variability means that the ability to detect progression would not be hindered.
Testing algorithms could usefully be altered in light of these findings. Algorithms could simply stop testing a location once 15 dB was reached. The SITA Standard test currently takes up to twice as long for severely damaged eyes compared with testing for normal eyes,21 compromising reliability by increasing fatigue. It also incorporates postprocessing using estimates from neighboring locations,22 so a reduction in variability at locations below 15 dB could also reduce variability at locations above this cutoff. Ceasing testing at 15 dB would therefore help in obtaining accurate sensitivity estimates at other (healthier) locations elsewhere in the visual field. This would allow easier detection of change at those locations and hence should negate any slight loss of ability to detect change caused by censoring at very damaged locations. It would also allow the test duration to be more similar across different disease severities. It is known that structural measurements, such as retinal nerve fiber layer thickness, reach a floor at approximately this same level of damage.23,24 Attention could be focused on monitoring other areas of the visual field where function is still relatively preserved and where both functional and structural measurements remain reliable.
An alternative would be for the algorithm to perform repeated presentations at severely damaged locations, using a 15-dB stimulus. This would enable making an estimate of the asymptotic maximum response probability at that location, allowing further assessment of pointwise change. Indeed, we hypothesize that this effect is responsible for the fact that useful information can be obtained at lower sensitivities with the Full Threshold algorithm than with SITA Standard. However, this would take longer than simply ceasing testing at 15 dB.
Another possibility would be to increase stimulus size at damaged locations. We have previously shown that a location will remain within its reliable stimulus range later into the disease when using a size V stimulus25; that is, it was still possible to obtain reliable sensitivity measurements using larger stimuli for some time after it had become impossible using smaller stimuli. This is because increasing stimulus size increases the contrast sensitivity, so the location remains within the reliable stimulus range later in the course of the disease. The limit of the reliable stimulus range was still 15 to 19 dB when the larger size V stimulus was used,25 so, although it will take longer for the location to reach this limit, there would still be no benefit to testing with very high contrasts.
For clinical purposes, pointwise linear regression is not optimal when encountering very low sensitivities because of the floor effect at 0 dB. If a CdB >0 dB is imposed, then this floor effect will become even more pronounced. In order to resolve this problem, Tobit or exponential regression models have been proposed for longitudinal analysis.26 However, using these models would necessarily improve the fit when censoring sensitivities, allowing more locations to achieve significance at the P level of <1%. Therefore, for the purpose of this study, standard pointwise linear regression was used to avoid biasing the conclusions. An alternative way to analyze such censored data would be to omit all measurements below CdB, instead of setting them to equal CdB; but this would greatly reduce the number of series containing sufficient measurements to perform linear regression, so again it was felt to be unsuitable for this particular study even though it may be beneficial in other circumstances.
One confounding factor that can limit the usefulness of the pointwise linear regression technique is nonlinearity of changes in sensitivity over time. That is why the primary analysis used a fixed series length of the most recent six visits, because longitudinal nonlinearity would be less pronounced over such a time span.27 In another study, we suggested that functional progression on a dB scale might in fact accelerate but be linear on a 1/Lambert scale.28 On such a 1/Lambert scale, differences between a 0-dB stimulus and a 10-dB stimulus are much smaller than differences between a 20-dB stimulus and a 30-dB stimulus. Therefore, the effect of censoring would be even smaller than that shown here.
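The difference between the two scales can be made concrete with a short calculation. This sketch assumes that the 0-dB stimulus (10,000 asb) corresponds to 1 Lambert, so that sensitivity in 1/Lambert units is 10^(dB/10); the function name is hypothetical.

```python
def db_to_inverse_lambert(db: float) -> float:
    """Linear sensitivity in 1/Lambert, ASSUMING the 0-dB stimulus equals 1 Lambert."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    low_span = db_to_inverse_lambert(10) - db_to_inverse_lambert(0)
    high_span = db_to_inverse_lambert(30) - db_to_inverse_lambert(20)
    print(f" 0 to 10 dB spans {low_span:.0f} units on the 1/Lambert scale")
    print(f"20 to 30 dB spans {high_span:.0f} units on the 1/Lambert scale")
```

Under this assumption the 0-to-10 dB interval spans 9 linear units, whereas the 20-to-30 dB interval spans 900, so values lost to censoring at the low end occupy a very small part of the linear scale.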
In conclusion, we found that moderate censoring of perimetric sensitivities did not adversely affect the ability to detect and monitor visual field deterioration. Automated testing algorithms could be altered to stop presenting increasingly high-contrast stimuli beyond some cutoff at damaged locations. Based on our previous results,8 we suggest a cutoff of 15 to 19 dB, and the results shown here are consistent with that suggestion. Algorithms could then simply report that the sensitivity is below the chosen cutoff, similar to how they are currently reported when sensitivity is below the arbitrary cutoff of 0 dB. Alternatively, repeated testing could be performed at 15 dB in order to estimate the maximum response probability at that location; or the stimulus size could be increased, enabling reliable sensitivities to still be measured. Meanwhile, the search for visual field deterioration should concentrate on locations in which moderately good sensitivity remains.
Supported by US National Institutes of Health Grants R01-EY020922 (SKG), R01-EY019674 (SD), and R01-EY024542 (WHS) and the Good Samaritan Foundation.
Disclosure: S.K. Gardiner, None; W.H. Swanson, Carl Zeiss Meditec (C), Heidelberg Engineering (C); S. Demirel, None