Br J Ophthalmol. 2005 November; 89(11): 1427–1432.
PMCID: PMC1772941

Factors affecting the test-retest variability of Heidelberg retina tomograph and Heidelberg retina tomograph II measurements


Aims: To evaluate the test-retest variability of stereometric parameter measurements made with the Heidelberg retina tomograph (HRT) and Heidelberg retina tomograph II (HRT-II), and to establish which parameter(s) provided the most repeatable and reliable measurements with both devices. The factors affecting the repeatability of measurements of the selected parameter(s) were then investigated.

Methods: 43 ocular hypertensive and 31 glaucoma subjects were recruited to a test-retest study. One eye from each subject underwent HRT and HRT-II imaging by two observers on each of two occasions within 6 weeks of each other. Lens grading was carried out by LOCS III grading and Scheimpflug camera generated densitogram analysis.

Results: Rim area (RA) and mean cup depth measurements were found to be least variable. Both inter-test reference height difference and image quality had a strong relation (R2>0.5, p<0.0001) with inter-test RA difference and, together, are responsible for 70% of RA measurement variability. Image quality was influenced by lens opacity, cylindrical error, and age. Inter-test RA measurement differences were unrelated to the observer or visit interval.

Conclusions: RA represents an appropriate measure for monitoring glaucomatous progression. Reference height difference and image quality were the factors that most influenced RA measurement variability. Image analysis strategies that address these factors may reduce test-retest variability.

Keywords: Heidelberg retina tomograph, glaucoma, test-retest variability

The ability to evaluate disease progression is of great importance in the management of patients with primary open angle glaucoma (POAG). Likewise, the detection of progression in subjects with ocular hypertension (OHT) could enable early intervention and delay the onset of visual field (VF) loss.1,2 POAG is a chronic, progressive disorder in which damage to retinal ganglion cell axons results in characteristic VF defects. However, there is evidence to suggest that structural changes to the optic nerve head (ONH) may occur before identifiable VF changes are detected by standard perimetry.3–7 Clinical examination of the ONH is of limited value in detecting progression because of interobserver variation.8 An objective approach, longitudinal quantitative imaging of the ONH, may be more useful for the detection of glaucomatous progression.

One imaging method is scanning laser tomography performed with the Heidelberg retina tomograph (HRT) and, more recently, the Heidelberg retina tomograph II (HRT-II). The HRT-II is intended for use in the clinical milieu by various operators, some of whom may not possess the level of experience required for the successful use of the previous model.9 Both devices generate three dimensional mean topography images from which a range of stereometric parameter values can be calculated. These parameters can be measured over time to detect progression. It is useful to estimate the test-retest repeatability of stereometric parameter measurements so that changes caused by disease progression and not by measurement error can be identified correctly.

The purpose of this study was to evaluate the test-retest variability of stereometric parameter measurements made with the HRT and HRT-II, and to establish which parameter(s) provided the most repeatable and reliable measurements. The factors that may affect the repeatability of measurements of the selected parameter were then investigated.

Methods

Subject selection

A total of 74 (43 OHT; 31 POAG) subjects were recruited from the ocular hypertension clinic at Moorfields Eye Hospital. OHT was defined as intraocular pressure (IOP) more than 21 mm Hg on two or more occasions and a baseline Humphrey 24–2 full threshold Advanced Glaucoma Intervention Study (AGIS)10 score of 0. POAG was defined as a consistent AGIS VF score greater than 0, and a pretreatment IOP greater than 21 mm Hg on two or more occasions. All subjects had previous experience of scanning laser tomography. For each subject, one eye was selected on the basis that it had a refractive error less than 12 dioptres of spherical power and no history of previous intraocular surgery. In subjects with lens opacity, the eye with the greater degree of opacity was preferentially selected although the presence of lens opacity itself was not a criterion for subject selection. This study adhered to the tenets of the Declaration of Helsinki and had local ethics committee approval and the subjects’ informed consent.

Testing protocol

Image acquisition was carried out by two experienced observers (ETW and NGS) at each of two visits within 6 weeks of each other. The testing sequences were performed using both the HRT (Version 2.01b; Heidelberg Engineering, Heidelberg, Germany) and HRT-II (Eye Explorer Version 1.7.0). In each subject, the scan focus (HRT and HRT-II) and depth of focus (HRT) that were used in the first visit were also used in the second visit. A series of three scans was acquired by each observer at a 10° field of view for the HRT, and 15° for the HRT-II (the two different scanning angles have the same degree of resolution). The following imaging protocol was adhered to for all subjects:

  • Visit 1: ETW then NGS then ETW
  • Visit 2: ETW then NGS.

Following IOP measurement at visit 1, the eyes were dilated using tropicamide 1% to enable one observer (NGS) to carry out lens grading. Subjective grading was carried out using the Lens Opacity Classification System III (LOCS III).11 Nuclear opalescence (NO, range 0.1–6.9), nuclear colour (NC, range 0.1–6.9), posterior subcapsular (P, range 0.1–5.9) and cortical (C, range 0.1–5.9) scores were graded against a standardised transparency. Scheimpflug photography was performed using the Case 2000 series (Marcher Diagnostics, Hereford, UK). The central nuclear dip (CND) value, derived from digitised densitograms, was used as an objective lens score.12

Image analysis

Heidelberg Eye Explorer (Version 1.7.0), the operating system of the HRT-II, was used to generate mean topography images and to perform image analysis. The term “HRT-II Explorer” is used to indicate when Explorer was used to analyse the HRT-II mean topographies. The HRT topographies were imported into the HRT-II operating platform as HRT-Port files. HRT mean topographies were generated and analysed using the same Explorer software as the HRT-II images (termed “HRT Explorer”). HRT mean topographies were also generated and analysed using an option on the Explorer software called “HRT Classic”, which is derived from the older MS-DOS HRT software. HRT and HRT-II images may be examined interchangeably, and therefore longitudinally, using Explorer; HRT-II images are not compatible with HRT Classic. Contour lines were drawn by one observer (NGS) on the baseline mean topographies and exported to the subsequent images. Four different image sequences were analysed for both imaging devices:

  • intraobserver/intravisit (ETW then ETW, same visit)
  • interobserver/intravisit (ETW then NGS, same visit)
  • intraobserver/intervisit (ETW then ETW, different visits)
  • interobserver/intervisit (ETW then NGS, different visits).

The Standard reference plane was used for all analyses in this study. The mean pixel height standard deviation (MPHSD) was recorded for each mean topography as a proxy measure of image quality.
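
As an illustration only, an image quality score of this kind can be sketched in Python. The sketch assumes that MPHSD is the standard deviation of height at each pixel across the scans of a series, averaged over all pixels; the instrument's exact formula is not given here, and the function name `mphsd` and the nested-list data layout are hypothetical.

```python
from statistics import mean, stdev

def mphsd(scans):
    """Mean of the per-pixel height standard deviations across a series of scans.

    scans: list of equally sized 2D height maps (each a list of rows).
    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    per_pixel_sd = []
    # zip(*scans) pairs up corresponding rows from every scan
    for rows in zip(*scans):
        # zip(*rows) pairs up corresponding pixels within those rows
        for heights in zip(*rows):
            per_pixel_sd.append(stdev(heights))
    return mean(per_pixel_sd)
```

Under this assumption a lower value indicates closer agreement between the scans that make up the mean topography.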

Statistical methods

Within subject coefficient of variation was used to examine the repeatability of each stereometric parameter. Within subject coefficient of variation was calculated with the following equations:

CVw = 100×sw/mean of all repeated measurements

where sw is the common standard deviation of repeated measurements (within subject standard deviation) and CVw is the within subject coefficient of variation.
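
The calculation can be sketched in Python as follows. This is a minimal illustration, assuming that sw is estimated as the square root of the pooled within subject variance; that estimator and the function names are assumptions of the sketch rather than a description of the software used in the study.

```python
import math
from statistics import mean

def within_subject_sd(measurements):
    """Pooled within subject standard deviation (sw).

    measurements: one list of repeated values per subject (two or more each).
    """
    sum_sq = 0.0
    deg_free = 0
    for repeats in measurements:
        m = mean(repeats)
        # Squared deviations of each repeat from that subject's own mean
        sum_sq += sum((x - m) ** 2 for x in repeats)
        deg_free += len(repeats) - 1
    return math.sqrt(sum_sq / deg_free)

def cv_within(measurements):
    """CVw = 100 x sw / mean of all repeated measurements."""
    sw = within_subject_sd(measurements)
    grand_mean = mean(x for repeats in measurements for x in repeats)
    return 100.0 * sw / grand_mean
```

Because CVw divides by the overall mean, it becomes unstable for parameters whose mean value is close to zero.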

Test-retest repeatability was also assessed by constructing Bland-Altman plots and by estimating the repeatability coefficient (RC) as:

RC = sqrt(2)×1.96sw

This statistical method was applied when no relation was observed between observation magnitude and observation difference, and when the observation differences were normally distributed13 (JM Bland, personal communication, 2004).
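
A minimal Python sketch of these two steps is given below. The repeatability coefficient follows the formula above; the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences) and the function names are assumptions of the sketch.

```python
import math
from statistics import mean, stdev

def repeatability_coefficient(sw):
    """RC = sqrt(2) x 1.96 x sw, where sw is the within subject SD."""
    return math.sqrt(2) * 1.96 * sw

def bland_altman(first, second):
    """Points and summary lines for a Bland-Altman plot of paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)                 # mean inter-test difference
    spread = 1.96 * stdev(diffs)       # half width of the 95% limits
    return {
        "points": list(zip(means, diffs)),  # x = pair mean, y = pair difference
        "bias": bias,
        "lower": bias - spread,
        "upper": bias + spread,
    }
```

Plotting the returned points makes it possible to check by eye whether the differences depend on the magnitude of the measurement, which is the condition stated above for using RC.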

Intraclass correlation coefficient (ICC) was used to estimate the reliability of the parameters generated.
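
The form of ICC used is not specified here. As an illustration only, the sketch below computes a one way random effects ICC for single measurements from the between subject and within subject mean squares; the choice of this form and the function name are assumptions.

```python
from statistics import mean

def icc_oneway(measurements):
    """One way random effects ICC for single measurements.

    measurements: one list per subject, each with the same number (k >= 2)
    of repeated values.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = mean(x for row in measurements for x in row)
    subject_means = [mean(row) for row in measurements]
    # Between subject mean square
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within subject mean square
    msw = sum(
        (x - m) ** 2
        for row, m in zip(measurements, subject_means)
        for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Because the between subject mean square appears in the formula, the same measurement error yields a higher ICC in a more heterogeneous sample.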

Scatter plots and regression lines were constructed to identify which factors influenced test-retest measurement variability with significant associations assumed at p<0.05. The factors evaluated were age, refractive error (spherical and cylindrical power), IOP, lens score (NO, NC, PS, C, and CND), MPHSD, inter-test reference height difference, disc area, and baseline rim area.
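
For a single factor, the regression line and R2 can be sketched in Python as below. This is an illustrative least squares fit only; the function name is hypothetical, and the p value reported in the study would need a statistics package and is not computed here.

```python
from statistics import mean

def linear_fit(x, y):
    """Least squares line y = intercept + slope * x, with R2.

    x: values of one candidate factor (for example, MPHSD).
    y: inter-test differences in the measured parameter.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared
```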

All statistical analyses were performed using Medcalc Version (Medcalc Software, Mariakerke, Belgium) and SPSS Version 11.5 (SPSS Inc, Chicago, IL, USA).

Results

The male:female ratio of the subjects was 41:33 and the right:left eye ratio was 43:31. The baseline subject characteristics are summarised in table 1.

Table 1
 Summary of baseline characteristics of test-retest subjects

Judged by the coefficient of variation, RA and mean cup depth had the highest measurement repeatability (table 2).

Table 2
 Coefficient of variation values for the stereometric parameters generated in the test-retest study

Judged by the ICC, mean cup depth, cup volume, cup area and RA were the most reliable parameters (table 3). There is no significant difference between the coefficients generated for these parameters in the situation most likely to be encountered in the longitudinal setting (interobserver/intervisit, IV).

Table 3
 Intraclass correlation coefficients for the stereometric parameters generated in the test-retest study

RA and mean cup depth measurements were the most consistently repeatable and reliable. As RA represents the more clinically meaningful measure, subsequent analyses were performed on this parameter.

RA interobserver/intervisit CVw values were 10.3% (HRT Classic), 10.2% (HRT Explorer), and 7.8% (HRT-II Explorer). RA repeatability coefficients were similar for HRT Classic, HRT Explorer, and HRT-II Explorer, and were not affected by the observer or test interval (table 4). There was a tendency towards more repeatable measurements with HRT Classic, compared with HRT Explorer.

Table 4
 Rim area repeatability coefficients obtained with different types of HRT at different image sequences

Bland-Altman plots (fig 1) show similar interobserver/intervisit repeatability for the three HRT software platforms. The mean inter-test difference in all cases approximates zero.

Figure 1
 Bland-Altman plots of interobserver/intervisit rim area obtained with HRT Classic (A), HRT Explorer (B), and HRT-II Explorer (C).

Factors affecting repeatability

Inter-test reference height difference and mean image MPHSD were the two factors which consistently had a strong relation (R2>0.5) with inter-test RA difference for all testing permutations. Figures 2 and 3 show scatter plots, with a regression line, of intraobserver/intravisit RA difference against inter-test reference height difference and against mean MPHSD, respectively. Weaker relations (R2<0.5) of inter-test RA difference were observed with CND, LOCS III NC, and NO scores, and cylindrical power. Table 5 summarises these relations for intraobserver/intravisit RA.

Figure 2
 Scatter plot of intraobserver/intravisit rim area difference (mm2) against reference height difference (mm) using HRT Explorer. The regression line is also shown (R2 = 0.7, p<0.0001). ...

Figure 3
 Scatter plot of intraobserver/intravisit rim area difference (mm2) against mean image quality (MPHSD) using HRT Explorer. The regression line is also shown (R2 = 0.4, p<0.0001). ...

Table 5
 Association (R2) between sources of variability and intraobserver/intravisit rim area differences (mm2)

A weak relation (R2 = 0.1) with age was also found with HRT Classic and HRT-II in the interobserver/intravisit and interobserver/intervisit settings. No relation was observed with disc area, baseline RA, spherical power, spherical equivalent (spherical power + cylindrical power/2), IOP, time between visits, LOCS III PS, or LOCS III C scores.

A multiple regression was performed, with RA difference as the dependent variable, and the factors identified as influencing RA difference (that is, reference height difference, MPHSD, CND, NO, NC, age, and cylindrical power) as independent variables. Reference height difference (p<0.0001) and MPHSD (p = 0.05) were the only two significant variables (R2 = 0.7).
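
A multiple regression of this kind can be sketched in Python by ordinary least squares. The sketch below solves the normal equations by Gauss-Jordan elimination so that it needs no external library; it is an illustration under those assumptions, not the procedure implemented in the statistical packages used in the study, and the function name is hypothetical. It returns the coefficients and R2 only, without p values.

```python
def ols(predictors, outcome):
    """Ordinary least squares with an intercept.

    predictors: one row per observation, one column per independent variable.
    outcome: the dependent variable (for example, RA difference).
    Returns (coefficients with the intercept first, R2).
    """
    rows = [[1.0] + list(r) for r in predictors]   # prepend intercept column
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (fails with a division by zero if the predictors are collinear)
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(p):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    beta = [m[i][p] / m[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return beta, 1.0 - ss_res / ss_tot
```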

To elucidate which factors influenced image quality (as determined by MPHSD), a multiple regression was carried out for intraobserver/intravisit HRT Explorer, with MPHSD as the dependent variable and CND (as a single, objective measure of lens opacity), age, and cylindrical power as independent variables (R2 = 0.5). CND and cylindrical power displayed a highly significant (p<0.0001) relation with image quality, and age showed a weaker but significant (p = 0.03) relation.

Discussion

Scanning laser tomography is a well established technique that provides reproducible ONH measurements.14,15 The topographic measures produced by the HRT and its predecessor, the laser tomographic scanner (LTS, Heidelberg Engineering, Heidelberg, Germany), have been demonstrated to be repeatable,16,17 and to have less variation compared with other techniques such as computer assisted planimetry.18 Little has been published about the reproducibility of the HRT-II.19,20 As the HRT-II is intended as a “clinical” instrument, its reproducibility under clinical conditions needs to be established. The profile of the subjects in this study was heterogeneous in terms of demographics, disease stage, refractive error, media opacity, and image quality, and therefore simulates the patient profile encountered in clinic. Image quality has previously been shown to be associated with pupil size and the degree of lens opacity (both objective scoring and LOCS III grading). Image quality was seen to improve with pupillary dilation but the improvements were often small.21 Pupil size was therefore not taken into consideration in this study. None of the subjects was taking miotic medications at the time of the study, although this was not a recruitment criterion.

Since the publication of the original reproducibility studies of the HRT,17,22 the Windows based Explorer platform has been introduced. From this perspective, this study is the first to examine the repeatability of HRT defined morphometric parameters using the newer software.

RA and mean cup depth were the most repeatable parameters for both devices. Some caution is required when interpreting coefficients of variability as some parameters, such as cup area and cup shape measure, have mean values of low magnitude (approaching zero). Another difficulty is the interpretation of differences between ICC values of a similar magnitude. It is therefore unlikely that there is any real difference in measurement reliability between mean cup depth, cup area, cup volume, and RA. It should also be noted that the ICC values depend on the variability of the sample population. As our sample was enriched with eyes with lenticular opacity, the ICC values may not be applicable to other populations with less cataract.

Overall, RA and mean cup depth were consistently the most repeatable and reliable of the parameters measured. This concurs with previous findings.23 The findings from another study showed that mean cup depth and cup area were the least variable parameters measured with the HRT-II.19 There is, however, no advantage in measuring cup area as it is merely the difference between disc area (kept constant in Explorer) and RA. As it contains the retinal ganglion cell axons, RA is a meaningful parameter for physicians. It has also been shown to discriminate between normal, glaucoma, and OHT subjects,24–26 and is therefore an appropriate candidate for the assessment of progression.

In this study, the repeatability of RA measurements was similar with both devices (RC = 0.2–0.3 mm2), irrespective of observer or test interval. Similar repeatability between imaging performed at the same visit or at different visits is consistent with a previous study, where no difference was identified in the short term and long term variability of topographic measurements.27 The HRT-II performs at least as well as the HRT. The similar level of RA repeatability between HRT-II and HRT Explorer analyses indicates that the two methods could theoretically be used interchangeably in a longitudinal setting.

The sources of variability for the HRT have been documented and include patient/scanner misalignment,28 and interobserver differences in optic disc contour line drawing.18,29,30 The present study identifies inter-test reference height difference to be the most consistent factor related to test-retest variability. It has previously been reported that the use of a 320 μm reference plane reduced RA variability, compared with the Standard reference plane.31 Image quality, as recorded by MPHSD, is another factor consistently found to influence variability. MPHSD is a gauge of the variability of pixel height measurements across the three topographic images used to construct the mean image.14 This study shows that image quality was, in turn, influenced by lens opacity, age, and degree of astigmatism. Sihota et al also found a significant correlation between the test-retest variability of the HRT-II and age and degree of astigmatism.19 Our results suggest that MPHSD may be an appropriate summary measure for the effect of these factors. It is therefore possible to predict repeatability coefficients for various levels of image quality without having to measure the patient’s age, degree of media opacity, or astigmatism.

In conclusion, this study indicates that RA may be an appropriate measure when monitoring glaucoma progression, as its measurements consistently showed excellent repeatability and reliability. Repeatability was similar with both the HRT and HRT-II, irrespective of observer or test interval. Reference height difference and image quality were found to be the factors that influenced RA variability most. The findings of this study will be used as the basis for suggesting strategies for improving test-retest repeatability. Once this is achieved, strategies for monitoring stereometric parameter progression can be devised and tested.

Abbreviations

  • CND, central nuclear dip
  • HRT, Heidelberg retina tomograph
  • ICC, intraclass correlation coefficient
  • LOCS III, Lens Opacity Classification System III
  • NC, nuclear colour
  • NO, nuclear opalescence
  • OHT, ocular hypertension
  • ONH, optic nerve head
  • POAG, primary open angle glaucoma
  • RA, rim area
  • VF, visual field


Competing interests: NGS was funded by a Friends of Moorfields research fellowship and through an unrestricted grant from Heidelberg Engineering.

References

1. Kass MA, Heuer DK, Higginbotham EJ, et al. The Ocular Hypertension Treatment Study: a randomized trial determines that topical ocular hypotensive medication delays or prevents the onset of primary open-angle glaucoma. Arch Ophthalmol 2002;120:701–13.
2. Lee BL, Wilson MR. Ocular Hypertension Treatment Study (OHTS) commentary. Curr Opin Ophthalmol 2003;14:74–7.
3. Sommer A, Pollack I, Maumenee AE. Optic disc parameters and onset of glaucomatous field loss. I. Methods and progressive changes in disc morphology. Arch Ophthalmol 1979;97:1444–8.
4. Sommer A, Pollack I, Maumenee AE. Optic disc parameters and onset of glaucomatous field loss. II. Static screening criteria. Arch Ophthalmol 1979;97:1449–54.
5. Pederson JE, Anderson DR. The mode of progressive disc cupping in ocular hypertension and glaucoma. Arch Ophthalmol 1980;98:490–5.
6. Quigley HA, Katz J, Derick RJ, et al. An evaluation of optic disc and nerve fiber layer examinations in monitoring progression of early glaucoma damage. Ophthalmology 1992;99:19–28.
7. Zeyen TG, Caprioli J. Progression of disc and field damage in early glaucoma. Arch Ophthalmol 1993;111:62–5.
8. Lichter PR. Variability of expert observers in evaluating the optic disc. Trans Am Ophthalmol Soc 1976;74:532–72.
9. Heidelberg Engineering. Operation manual for the Heidelberg retina tomograph II. Vol 1.7.0. 2003.
10. Advanced Glaucoma Intervention Study. 2. Visual field test scoring and reliability. Ophthalmology 1994;101:1445–55.
11. Chylack LT, Wolfe JK, Singer DM, et al. The Lens Opacities Classification System III. The Longitudinal Study of Cataract Study Group. Arch Ophthalmol 1993;111:831–6.
12. Hammond CJ, Snieder H, Spector TD, et al. Genetic and environmental factors in age-related nuclear cataracts in monozygotic and dizygotic twins. N Engl J Med 2000;342:1786–90.
13. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
14. Dreher AW, Tso PC, Weinreb RN. Reproducibility of topographic measurements of the normal and glaucomatous optic nerve head with the laser tomographic scanner. Am J Ophthalmol 1991;111:221–9.
15. Kruse FE, Burk RO, Volcker HE, et al. Reproducibility of topographic measurements of the optic nerve head with laser tomographic scanning. Ophthalmology 1989;96:1320–4.
16. Rohrschneider K, Burk RO, Volcker HE. Reproducibility of topometric data acquisition in normal and glaucomatous optic nerve heads with the laser tomographic scanner. Graefes Arch Clin Exp Ophthalmol 1993;231:457–64.
17. Rohrschneider K, Burk RO, Kruse FE, et al. Reproducibility of the optic nerve head topography with a new laser tomographic scanning device. Ophthalmology 1994;101:1044–9.
18. Garway-Heath DF, Poinoosawmy D, Wollstein G, et al. Inter- and intraobserver variation in the analysis of optic disc images: comparison of the Heidelberg retina tomograph and computer assisted planimetry. Br J Ophthalmol 1999;83:664–9.
19. Sihota R, Gulati V, Agarwal HC, et al. Variables affecting test-retest variability of Heidelberg Retina Tomograph II stereometric parameters. J Glaucoma 2002;11:321–8.
20. Verdonck N, Zeyen T, Van Malderen L, et al. Short-term intra-individual variability in Heidelberg retina tomograph II. Bull Soc Belge Ophtalmol 2002;11:51–7.
21. Zangwill L, Irak I, Berry CC, et al. Effect of cataract and pupil size on image quality with confocal scanning laser ophthalmoscopy. Arch Ophthalmol 1997;115:983–90.
22. Mikelberg FS, Wijsman K, Schulzer M. Reproducibility of topographic parameters obtained with the Heidelberg retina tomograph. J Glaucoma 1993;2:101–3.
23. Tan JC, Garway-Heath DF, Hitchings RA. Variability across the optic nerve head in scanning laser tomography. Br J Ophthalmol 2003;87:557–9.
24. Wollstein G, Garway-Heath DF, Hitchings RA. Identification of early glaucoma cases with the scanning laser ophthalmoscope. Ophthalmology 1998;105:1557–63.
25. Kiriyama N, Ando A, Fukui C, et al. A comparison of optic disc topographic parameters in patients with primary open angle glaucoma, normal tension glaucoma, and ocular hypertension. Graefes Arch Clin Exp Ophthalmol 2003;241:541–5.
26. Zangwill LM, van Horn S, de Souza Lima M, et al. Optic nerve head topography in ocular hypertensive eyes using confocal scanning laser ophthalmoscopy. Am J Ophthalmol 1996;122:520–5.
27. Chauhan BC, Macdonald CA. Influence of time separation on variability estimates of topographical measurements with confocal scanning laser tomography. J Glaucoma 1995;4:189–93.
28. Orgul S, Cioffi GA, Bacon DR, et al. Sources of variability of topometric data with a scanning laser ophthalmoscope. Arch Ophthalmol 1996;114:161–4.
29. Miglior S, Albe E, Guareschi M, et al. Intraobserver and interobserver reproducibility in the evaluation of optic disc stereometric parameters by Heidelberg Retina Tomograph. Ophthalmology 2002;109:1072–7.
30. Iester M, Mikelberg FS, Courtright P, et al. Interobserver variability of optic disk variables measured by confocal scanning laser tomography. Am J Ophthalmol 2001;132:57–62.
31. Tan JC, Garway-Heath DF, Fitzke FW, et al. Reasons for rim area variability in scanning laser tomography. Invest Ophthalmol Vis Sci 2003;44:1126–31.

Articles from The British Journal of Ophthalmology are provided here courtesy of BMJ Group