|Home | About | Journals | Submit | Contact Us | Français|
Automated measurements of the retinal nerve fiber layer thickness on circular OCT B-Scans provide physicians additional parameters for glaucoma diagnosis. We propose a novel retinal nerve fiber layer segmentation algorithm for frequency domain data that can be applied on scans from both normal healthy subjects, as well as glaucoma patients, using the same set of parameters. In addition, the algorithm remains almost unaffected by image quality. The main part of the segmentation process is based on the minimization of an energy function consisting of gradient and local smoothing terms. A quantitative evaluation comparing the automated segmentation results to manually corrected segmentations from three reviewers is performed. A total of 72 scans from glaucoma patients and 132 scans from normal subjects, all from different persons, composed the database for the evaluation of the segmentation algorithm. A mean absolute error per A-Scan of 2.9 µm was achieved on glaucomatous eyes, and 3.6 µm on healthy eyes. The mean absolute segmentation error over all A-Scans lies below 10 µm on 95.1% of the images. Thus our approach provides a reliable tool for extracting diagnostic relevant parameters from OCT B-Scans for glaucoma diagnosis.
Ophthalmology has been one of the main application areas of Optical Coherence Tomography (OCT)  since its invention in 1991. OCT allows a direct visualization of the retina and its layered structure. This proved to be very beneficial for glaucoma disease research. Glaucoma is one of the most frequent reasons for blindness in the world [2, 3]. During glaucoma progression the supporting tissue and nerve fibers in the retina are lost. Thus a thinning of the innermost retinal layer, the retinal nerve fiber layer (RNFL), is observed [4, 5, 6, 7]. Since the first appearance of commercially available OCT systems, automated retinal layer segmentation algorithms were presented to objectively quantify the RNFL thickness and its loss. While segmentation algorithms are built into commercial systems, details of their design remain undisclosed. Below we give a short overview of only the published research on retinal layer segmentation.
To our knowledge, Koozekanani et al.  presented for the first time an automated retina segmentation algorithm on Time Domain (TD) OCT scans. This first publication on the topic already mentioned one main challenge. The noise that corrupts OCT images is non-Gaussian, multiplicative and neighborhood correlated (see also Schmitt et al. ). Thus, it can not be easily suppressed by standard software denoising methods. Besides this observation one key element of subsequent algorithms was already included in this early paper: An edge detection approach was introduced which also takes the leading sign of the derivative into account so as to differentiate between rising and falling contrast along a depth profile . This method outperformes simple thresholding algorithms, as they tend to fail due to the high variance of intensity values on OCT images. Their evaluation considered only healthy normal subjects. Ishikawa et al.  included glaucoma patients in their performance analysis. To our knowledge, this was the first RNFL segmentation algorithm published and the only group that evaluates the algorithm on both glaucoma patients and normal subjects. In a subsequent paper five different layers were segmented . For the evaluation the quality of the images was measured with the quality index of Stein et al. . It was stated that the automated segmentation fails on more than half of the low quality images. According to Ishikawa et al. a good quality scan is harder to achieve on diseased eyes, which makes the automated segmentation challenging. Additionally, a visualization of the RNFL thickness in 2D interpolated out of multiple radial scans was shown. While Ishikava et al. used an integrity check to connect neighboring 1D depth profile segmentations, Shahidi et al.  just averaged thickness profiles in transversal direction. This lead to thickness profiles of much coarser resolution. Fernandez et al.  proposed complex diffusion (see Gilboa et al. ) instead of the typical median filtering as a preprocessing step for the segmentation. Seven retinal layers including the inner/outer nuclear layer and inner/outer plexiform layer were segmented. It is stated that pathologies may violate assumptions made in the algorithm and thus parameters have to be adjusted. In the same year Mujat et al.  presented an algorithm to segment the RNFL on Frequency Domain (FD) OCT Volume data. The segmentation was done frame-by-frame in 2D. While results were shown only from two volumes of healthy normal subjects and no evaluation was performed, the number of steps included in the algorithm leads to the assumption that FD data, although of higher resolution and with fewer motion artifacts, presents more challenges to segmentation developers. Somfai et al.  investigated how decreased image quality caused by operator errors affects segmentation results.
Baroni et al.  formulated an edge-likelihood function consisting of a gradient and a smoothness term to segment TD-OCT images. The most recent work still concentrating on the segmentation of TD-OCT images was published by Tan et al. . A segmentation based on progressive edge detection was presented that regularizes the segmentation result by averaging fewer A-Scans in each segmentation refinement step. The algorithm itself was not evaluated, since the scope of the work was not on the algorithm development, but on the investigation of parameters generated from the segmentations for glaucoma diagnosis. Images with segmentation errors were discarded from the presented study. Later Tan et al. investigated further glaucoma diagnostic parameters on 3D FD-OCT volume scans .
The first method that really made use of 3D information is the retina segmentation from Haecker et al.  which was further developed to a multilayer segmentation by Garvin et al. . Here six radial linear TD-OCT 2D scans were combined to a volume. A 3D-Graph search to minimize a cost function for each layer (for up to five layers) was performed to segment the volume. For each boundary, assumptions in the cost function were made that could include e.g. a signed edge, summation of pixel intensities in limited regions and summation of pixel intensities from already segmented borders. The algorithm was evaluated on data from subjects with unilateral chronic anterior ischemic optic neuropathy. No glaucoma patients were included in the evaluation. The approach was further extended to all 10 retinal layers in Quellec et al. . In this work, the focus shifted from the actual segmentation of the retinal layers to an application of the segmentation results. Thickness measurements, together with texture features from image data within the layer boundaries, are used to build up a model for the detection of pathologic retina abnormalities like fluid filled regions.
Recently, multiple sophisticated methods known from computer science and optimization theory were applied and modified for OCT-Data. Tolliver et al.  uses spectral rounding to detect the RNFL on FD-OCT data of normal subjects and patients. Mishra et. al.  uses a two step optimization scheme solved with dynamic programing to segment all retinal layers. It was applied on scans of rat eyes, not on data from humans. The same holds for the work of Yazdanpanah et al. , who uses active contours. An enery functional that includes a shape prior is minimized. Such prior information can be either formulated out of heuristic considerations, or computed from training data. Kajić et al.  use the latter in their appearance shape model based approach. The evaluation shows that the algorithm is very robust and insensitive to noise. However the results were only generated from data of normal eyes. The evaluation from Chiu et al.  is also performed only on data of normal subjects, but visual examples in the paper show that plausible segmentations even in severe pathologic cases can be generated by the method. It is based on graph theory and dynamic programming. Vermeer et al.  also avoid heuristics in their approach. A support vector machine is trained to classify every pixel of a volume for its position above or below certain boundaries. The resulting segmentation is regularized by a level set method. The classifier was trained on data of normal subjects. An evaluation was performed on volume scans of 10 normal subjects and 8 glaucomatous patient, but only one or two B-Scans per volume scan were taken into account for the evaluation. The segmentation algorithm showed decreased performance on the pathological data.
The work of Götzinger et al.  used the additional information of a polarization sensitive OCT system provides for segmenting the retinal pigment epithelium with two different methods. Patient data was shown, but a quantitative evaluation was not performed.
Contrary to the aforementioned boundary search approaches, Fuller et al.  proposed a half-manual region-based classification to segment layers. Out of manually marked regions on sample frames, a SVM classifier was trained for segmenting the whole FD-volume scan. A similar classifier-based half-manual approach was followed by Szulmowski et al. . Joeres et al.  and Sadda et al.  presented a completely manual segmentation tool for TD-OCT Scans. In pathologies like age related macula degeneration, or pigment epithelial detachment automated algorithms will most likely fail due to heavy abnormalities in the data. They showed that a manual segmentation provides reliable and reproducible results in these cases.
In this paper we present a completely automated segmentation of the RNFL. To our knowledge this is the most important layer for glaucoma diagnosis and is thus the focus of our work. The segmentation is a 2D approach working on circular FD-OCT scans, but can be easily applied on 3D volume data, as will be shown. The goal during the development of the algorithm was to make as few assumptions on the layer borders as possible. The employed assumptions should also be very general. Reliable application on pathological cases should be possible without changing any parameter. In Section 2 our method is presented. Qualitative visual examples and quantitative evaluation results are shown in Section 3. Section 4 includes a summary, conclusions, as well as further ideas.
We use two datasets that are described in the following subsection (2.1). The processing steps to segment the RNFL on circular B-Scans are covered in the subsequent two subsections. First the inner and outer retinal borders are detected (2.2). Particularly, the inner border of the retina is detected at the inner border of the RNFL, or internal limiting membrane (ILM). The outer retinal border is detected at the the outer retinal pigment epithelium boundary (RPE). The algorithm for finding the position of the outer nerve fiber layer boundary (ONFL) is described in (2.3). The method is then extended to 3D (2.4). The final subsection presents our evaluation methodology (2.5).
Circular B-Scans were acquired from 204 subjects with a Spectralis HRA+OCT (Heidelberg Engineering, Heidelberg, Germany). This OCT device is referred to as Spectralis for the remainder of the paper. The scans were centered at the optic disk and had a diameter of 3.4mm. The B-Scans consisted of 512 or 768 A-Scans. Each A-Scan consists of 496 pixels. The axial resolution of the Spectralis is 7µm in tissue, although the pixel length is 3.87µm. The images are thus oversampled in the axial direction. The raw data was exported using the VOL file format of Heidelberg Engineering. The pixel intensity value range in the VOL files is [0;1], saved as 32 bit floating point values. All computations were performed in the same data format.
To clarify denominations in this work: A-Scan and depth profile are used interchangeably. B-Scan, OCT image and 2D frame are also used as synonyms. OCT volumes are also referred as 3D data. In the description of the algorithm the axial direction is Z. To simplify formulas, the transversal direction in a circular Scan, giving the position of an A-Scan in the resulting image, is denominated only by R. The transversal directions in a volume are X and Y. The Z direction as well as the Y direction have their origins in the upper left corner of the corresponding images. Figure 1 illustrates these notations. Unless stated otherwise, the intensity values of the VOL-File are double square routed for display reasons as proposed by Heidelberg Engineering. All abbreviations and symbols are shown in Table 5.
The recorded data has a wide range of subjectively perceived quality. Unfortunately, at the time the data was recorded the Spectralis software had no built-in quality index. Therefore we chose the following quality measure that, by visual inspection, correlates well with the subjectively perceived quality.
QI is the quality index (QI). I(z, r) denotes the image intensity at position z in Z-direction and r in R-Direction. #NI(z,r)=0 is the number of pixels on the scan with intensity value 0. This is normalized by #N, the complete number of pixels on the scan. The quality index is motivated as follows: The scans result from an averaging of multiple single scans. Fewer scans yield a higher noise level. We observed that on the Spectralis images with high noise level some pixels have 0 intensity even inside the tissue. This is even more pronounced in the dark regions outside the retina. By averaging more images this effect diminishes, as the number of randomly distributed zero intensities significantly decreases. Because it is not possible to read out the number of single captures out of the VOL files in the present software version, our simple QI is presumed to be an objective measurement of the quality of an image. Subtracting the fraction from 1 yields an ascending index. High QI values correspond to a high quality image. The QI is not necessary in complete agreement with with a subjective assessment, but it provides a rough estimate for evaluation and data description. The distribution of QI in our circular B-Scan evaluation data set can be seen in Fig. 2. No image of the dataset was excluded due to quality reasons.
Two groups of subjects formed our circular B-Scan dataset: Glaucoma patients (72 subjects, age 62.1±9.4, mean defect 5.5±6.2) and healthy normal subjects (132 subjects, age 49.8±16.3, mean defect 0.5±0.6). All patients were members of the ‘Erlangen Glaucoma Registry’ with annual visits to our glaucoma service. The inclusion/exclusion criteria and the type of examinations are defined in a protocol which was approved by the Local Ethics committee. The study is registered at www.clinicaltrials.gov (NCT00494923). The study followed the tenets of the declaration of Helsinki for research involving human subjects and informed consent was obtained from all participants of the study.
Only one eye of each subject was taken into account for the evaluation. The subjects were diagnosed based on an ophthalmic examination using slit lamp inspection, applanation tonometry, funduscopy, gonioscopy, perimetry and papillometry. A 24 hours intraocular pressure profile with 6 determinations was also obtained. A detailed review of the employed diagnostic routine can be found in Baleanu et al.  and Horn et al.  and is not within the scope of this paper.
In addition to the circular B-Scans volume scans were acquired centered on the optic nerve head. The number of pixels are 512 in the X-direction, 97 in the Y-direction and 496 in the Z-direction. The pixel spacing in the X-direction is 11.55µm, in the Y-direction 61.51µm and 3.87µm in the Z-direction. One volume scan from a glaucoma patient and one volume scan from a healthy normal subject are discussed in Section 3 as exemplary cases.
All algorithm parameters were adapted by visually inspecting random sets of images from the database. The processing steps of the algorithm are shown in Fig. 3 with visual examples provided in Fig. 4. A scan from a glaucoma patient was chosen as an example. It shows a nearly complete loss of the RNFL in the transition between the inferior and temporal quadrant, while the other regions still have relative high RNFL thickness.
To limit the search space for the retinal boundaries, first a separating line located inside the outer nuclear layer is identified. It splits the image content into the the inner segment (ISG) and the outer segment (OSG) of the retina. The image is blurred with a wide Gaussian filter (standard deviation σ = 22pixels). The separating line is the lowest intensity value inbetween the two maximas with the highest intensity value (see Fig. 4 (a) and Fig. 5(a)). The intensities of each A-Scan were scaled to [0;1] in the ISG and OSG separately. For a rough speckle noise removal, a 2D median filter of size 5 in the Z- and 7 in the R-direction is applied twice, as proposed by Ishikawa et al. . The ILM is then set to the greatest contrast rise in the ISG, while the RPE is set to the greatest contrast drop in the OSG. Both lines are smoothed. Smoothing includes an outlier detection by fitting a polynomial of degree 5 and removing distant line segments afterwards. Short disconnected segments are also removed, a median filter and Gaussian smoothing are applied. Gaps are closed by linear interpolation. After the smoothing, the A-Scans of the original unprocessed image are aligned so that the RPE forms a constant even line (see Fig. 4 (b)). This image becomes the basis for the following ONFL segmentation.
Our empirical analysis of the OCT data showed that a simple edge detection for the ONFL, even if the retina boundaries are known, will not give promising results. This holds especially true for a general low image quality, glaucoma patients having a complete local loss of the RNFL and normal subjects with a very thick RNFL. For the last two cases, a state-of-the-art preprocessing with sophisticated denoising as proposed by Fernandez et al.  and Mujat et al  is also insufficient. A neighborhood integrity check as mentioned in  might not be able to cope with a jump of the segmented border in a whole region to a higher contrast outer layer border. Assumptions on the layers, as Garvin et al.  made, may be violated in pathological cases, or parameters have to be adapted for either normal subjects, or glaucoma patients. Our approach is the following: The image is denoised with complex diffusion (see Gilboa et al. ) as proposed by Fernandez et al. . Our implementation is not based on the traditional time-marching implementation but uses lagged diffusivity [36, 37]. The code of the algorithm can be downloaded from the homepage of our work group (http://www5.informatik.uni-erlangen.de) on the personal page of the first author. The timestep parameter was set to 13, while the σCD parameter that controls which gradients are detected as edges, is directly estimated from the image:
I denotes the original image matrix, Imed filt is the original image on which every A-Scan is filtered with a median filter of width 7. The is a heuristic weighting factor. The computation of the standard deviation of all pixels is abreviated by std(...). The noise estimate does not correspond to a physically meaningful noise measurement on the OCT data, but it has by visual inspection proven to adapt to the different noise levels and qualities of the OCT B-Scans. After denoising the image, along the A-Scans the four greatest contrast drops are detected in the ISG (see Fig. 4 (c)) and Fig. 5(b)). Actually, only three layer borders with falling contrast should lie in the ISG beneath the ILM, namely: the ONFL; the border between the inner plexiform layer and the inner nuclear layer; the border between the outer plexiform layer and the outer nuclear layer. To derive an initial segmentation from the detected four minima, the following measurement is used:
The part sum ratio PS yields for every column position r on the B-Scan the ratio of the summed intensities in the ISG and OSG region. The heuristic rule to form an initial segmentation is: If PS is high (PS > 0.7), the second contrast drop beneath the ILM is chosen. This can be motivated by the idea, that a high PS indicates a thick RNFL. Thus, the first contrast drop is most likely speckle noise. In the other cases (PS ≤ 0.7) the first contrast drop beneath the ILM is used. This does not hold if fewer than three contrast drops are detected in an A-Scan. A complete loss of the RNFL can be assumed in that case. The initial segmentation is set to the IFNL. This heuristic method delivers a good estimate of the segmentation (see Fig. 4 (d)), but can be further enhanced.
To improve the segmentation we formulate an energy-minimization based approach:
ONFL(r) gives the Z-position of the boundary at an A-Scan r. E(r) is the energy at an A-Scan r that needs to be minimized. It consists of three terms. Two factors, α and β weight these terms. In the current implementation they are set to and respectively. The first term, G(z, r) is the gradient at depth z. As the ONFL is in Z-direction a contrast fall-off, the gradient should have a negative sign with an absolute value which is as high as possible. N(r) is a smoothing term that ensures that there are no high jumps in the border, while allowing for some edges. It is defined as the sum of the absolute differences in height z of the border in A-Scan r and its neighbors:
The second smoothness term D(r) works on a more global scale. It is motivated by the observation, that when the A-Scans are aligned for an even RPE, the ONFL is partially almost even, too. In Baroni et al.  the distance to a constant line along the whole B-Scan was taken as a smoothness term. We extend this idea. The RNFL should not be as constant as possible over the whole B-Scan but within certain regions. To avoid using arbitrary positions on the image, the regions in between blood vessels are used. D(r) is therefore the distance to the average height of the segmented boundary in between two blood vessels:
BVR means the region in between two blood vessels, #NrBVR is the number of A-Scans in this region. The blood vessel positions are determined by adaptive thresholding. A layer of 8 pixels above the RPE is summed along Z-direction to form a RPE intensity profile. The average of this profile is computed in a 61 pixel wide window. If the value of the profile in the middle of the window is lower than 0.7 times the average, it is marked as a blood vessel. For the BVR boundaries the centers of the blood vessels are computed. As the size parameter of the average window and the threshold are fixed, some large vessels above 12 pixels in width are not detected. This does not affect the segmentation results as typically enough blood vessel centers are detected for splitting the image into regions. Very small blood vessels with a diameter below 4 pixels are ignored in this computation.
The energy Equation (4) is solved iteratively by moving the boundaries in the directions of decreased energy. The initial segmentation is created out of high-contrast edges. As we want to avoid being stuck in this first estimate in the iterative process the initial segmentation is heavily blurred by fitting a polynomial of degree 4 trough it. This polynomial provides the initial values for the energy minimization process. The resulting boundary solution is smoothed similar to the RPE in Section 2.2. The result of the algorithm is shown in Fig. 4 (e).
The aforementioned method is applicable on 2D circular Scans. A direct application on 2D linear scans or slices from volumes leads to segmentation errors. The assumption that the ONFL is roughly piecewise constant is violated. Furthermore obvious splitting points, like the blood vessels, might be missing in some areas as for example the macula region. But a segmentation of 3D data can be achieved by interpolating circular B-Scans with varying diameters d out of the volumes. Given linear scans, circular scans are created through bilinear interpolation of the intensity information in the R-direction out of the X/Y aligned volumes. The center of the circular scans has to be set manually. The segmentation algorithm is then applied to the interpolated scans without any parameter change. A method for visualizing the results is computing a RNFL thickness profile RNFLd(r) from the segmentation by calculating the distance between the ILM and ONFL:
where ONFLd(r) and ILMd(r) denote the border positions in pixels at an A-Scan r for a scan circle diameter d. ScaleZ is the pixel spacing in the Z-direction in µm/pixel. ScaleZ is 3.86µm/pixel in the case of the Spectralis. These thickness profiles are then transformed back from the radial coordinate system of the interpolated B-Scan to the cartesian coordinates of the volume scan and color coded. As the Spectralis provides a hardware registered SLO-image besides the OCT-scan, the resulting 2D thickness map can be translucently laid over the SLO image to provide an instant RNFL thickness visualization. A color coding of the map enables the differentiation between the gray scale SLO image and the RNFL thickness values. Visual examples of the method are given in the results Section 3.
In order to quantitatively evaluate the algorithm we wrote a graphical user interface in Matlab (Mathworks, Inc., Natick, Massachusetts, USA) for displaying the segmentation results. Various display modes are available. For example, the image contrast can be adjusted and layer borders can be switched on and off. The simultaneously acquired SLO image of the Spectralis is also displayed. Manual corrections to the automated segmentation can be made by free-hand repainting of the segmentation borders. No region has to be selected. The method allows for the correction of even the smallest errors. We decided not to allow the observers to draw complete manual segmentations but rather to correct errors of the automatic segmentation. This was done for two reasons: Firstly, offsets in the segmentation lines often appear when they are completely manually drawn, even within one image and especially among different observers. This means that lines or line segments are shifted in the Z-direction by a few pixels distance. The human observers do not draw attention to this constant shift, but it prevents precise evaluation. The second reason for the correction of errors is that a complete manual segmentation of the dataset would be too time-consuming. The error correction alone took up to 12 hours for the entire circular B-Scan dataset for each reviewer.
Two experts in the field (authors MM and RT) and a student individually reviewed all segmented images of the circular B-Scan dataset. All automated segmentations were created with the same parameter set. A common rule on how to treat blood vessel regions was designed beforehand by the two experts. The segmentation boundary should follow the intensity drop at the ONFL as long as it is visible. Throughout the shadow region the two loose ends should be connected with a straight line. The student was trained on example images for the task.
To evaluate the automated segmentation three manual corrections were therefore available. To generate one segmentation per A-Scan that holds as a gold standard (GS) for the evaluation, the median border was selected from the three manual corrections. Thus, outlier corrections are not taken into account. The resulting gold standard is not an average, but lies exactly on a position where at least one observer set it. B-Scans consisting of less than 768 A-Scans were interpolated to 768 A-Scans before the evaluation computations. The differences between the segmentations were not calculated at the specific boundaries, but by looking at the resulting RNFL thickness at each position r. The RNFL thickness was calculated similar to Eq. (8).
For the evaluation of observer differences, an agreement between the observers is given. Agreement in our case defines the percentage of images were the mean absolute observer difference (MAODi) over all 768 A-Scans lies below a certain threshold. The MAODi for a single image with number i in the dataset is computed by:
RNFLObs1,i(r) denotes the RNFL thickness profile of image number i computed out of the manually corrected ILM and ONFL (see Eq. 8) of observer 1. RNFLObs2,i(r) and RNFLObs3,i(r) are defined accordingly. MAOD without the subscript i denotes an average of the MAODi over a part or the complete image data set. This convention holds also for similar B-Scan related measurements. The agreement is then:
The number of images is given by #Img, the number of images with the MAODi below a certain threshold t is given by . If the averaging in the MAODi formula is carried out over the image data set instead of the A-Scans of one image, we obtain the mean absolute observer difference for each A-Scan position (MAOD(r)):
The difference of the automated segmentation to the gold standard (Diffi(r)) for a single image with number i in the dataset is computed by:
where RNFLGS,i(r) denotes the thickness profile according to the gold standard and RNFLautom,i(r) the thickness profile generated by the automated segmentation. We consider thickness profile differences below 8µm as negligible. If an A-Scan has less than 8µm thickness difference, we set it to 0µm for further evaluation. All subsequent numbers and graphs are calculated from these thresholded data. We use this threshold because it lies in the range of the resolution limit of the Spectralis in tissue. It roughly equals a 2 pixel distance on the image. No other exceptions are made, no image is excluded due to algorithm failure reasons.
The mean RNFL thickness(mRNFLi) is a parameter used for glaucoma diagnosis. It is computed by
The difference of the mRNFLi between the GS and the automatic segmentation (mean difference to the gold standard - MDG) is thus
The averaging may cancel out positive and negative difference values. The mean absolute difference to the gold standard (MADGi) is an additional error measurement and better represents the total quantitative segmentation error:
This evaluation measurement can be, similar to the MAOD(r), computed for each A-Scan and over the entire data set instead of on a single image:
Replacing the MAODi in Eq. (10) by the MADGi yields an agreement value between the GS and the automatic segmentation.
The evaluation results presented in Section 3 were generated after all algorithm parameters had been fixed. No parameter was changed during the evaluation procedure. All scans of normal subjects and glaucoma patients were processed with the same parameter set.
Fig. 6 shows visual examples of the segmentation results. Figure 6 (a) displays segmentation results of an image of a normal eye and (b) and (c) of glaucomatous eyes. The RNFL boundaries are detected without severe errors in all these example cases, independent of the actual thickness of the RNFL.The significant variation of the RNFL thickness throughout the image is also captured. This is shown in Fig. 4 (e) and 6 (b). Figure 6 (d) shows a segmentation result of an image with very low quality. Here disruptions in the boundaries, as well as minor segmentation errors, can be observed and are marked with arrows. Note that, on the left and right side of the image clear boundaries can not be observed at all. The varying image quality that is best in the nasal quadrant is due to the signal fall-off of FD-OCT systems. The imaging range of an FD-OCT is limited by this signal fall-off. This is caused by the attenuation of the OCT signal due to the washout of the interferogram fringe visibility with increasing path-length difference of the interferometer . The nasal quadrant of the example image in Fig. 6 (d) was placed on the upper side of the original image before aligning it to the RPE and thus has the shortest path-length difference.
The average runtime of the algorithm was measured on the complete circular B-Scan dataset using a MacBook Pro, Intel Core 2 Duo, 2,66 GHz with 4GB main memory. The code was written in Matlab (Mathworks, Inc.). Only one processor core was utilized. The average runtime was 20.5s. It did not differ substantially from normal subjects to glaucoma patients, or between images of good or bad quality. Included in the average runtime are the loading of the data from the hard disc and storing of the results. The biggest part of the running time (in average 73.5%) was used for the complex diffusion filter. Diffusion filters can be considerably optimized for speed by using multigrid technologies. Other parts of the algorithm can also be accelerated by implementing them in a more efficient language, for example C++. However, algorithm speed is not the focus of this paper. Therefore we did not optimize for computational efficiency.
We first quantitatively review the inter observer deviations. In Table 1 the agreements of the three observers are shown. There is a high degree of inter-observer agreement with an average 93.7% for a 5µm threshold and 98.2% for a 10µm treshold. On the other hand, these numbers show that the match is not perfect.
To localize the mismatch, in Fig. 8 the mean absolute observer difference for each A-Scan position (MAOD(r)) is plotted in addition with the blood vessel densitiy (BVD(r)) and the mean GS RNFL thickness (mRNFLGS(r)). The BVD(r) denotes the percentage of images, where a blood vessel is detected by the automatic algorithm at an A-Scan position r. All 3 curves are computed over the entire circular B-Scan dataset. It can be seen that the main peaks of all curves lie in the same positions. The correlation in between them is also high. Between MAOD(r) and BVD(r) it is 0.86, between the BVD(r) and the mRNFLGS(r) 0.87 and between mRNFLGS(r) and MAOD(r) 0.84. The observer differences are located mainly around blood vessel regions. This shows that, despite the fact that a rule was formulated for the treatment of blood vessel regions, there are still perceptual differences between the observers. The correlation of high RNFL thickness and high blood vessel density is due to the fact, that the OCT scans are taken on a circle of 3.4mm. Using this diameter, the nerve fibers and the main blood vessels are concentrated on the same positions. This does not necessarily hold for other scan circle diameters.
After localizing the inter-observer deviations, a second point of interest is whether normal and glaucomatous eyes were treated differently. The MAOD averaged over a) the 72 glaucoma patients and b) the 132 normal subjects separately is given in Table 2. It is higher for glaucoma patients (2.6µm) than for normal subjects (1.9µm). The difference becomes more evident when examining the mean difference measurements in relation to the mean RNFL thickness for each B-Scan. The relative mean absolute difference is more than twice as high for glaucoma patients (4.7%) compared to normal subjects (1.9%).
The influence of the image quality on the manual segmentations can also be viewed in Table 2. The 72 images with lowest quality (QI < 0.69) were selected from the database for this evaluation. This number was chosen so that it equals the number of images from glaucoma patients. The correlation of the images from glaucomtous eyes and low quality images can be computed in that way. The correlation is 0.12. We conclude that scans from both normal subjects and glaucoma patients were recorded with low quality. The MAOD on low quality images (2.7µm) was higher compared to that of better quality (1.9µm). This shows that, on a good quality scan, boundaries are placed more consistently among human observers.
We also compared the automated segmentations to the gold standard. We first compare the mean RNFL thickness of the GS and the automatic segmentation, as this value serves as a glaucoma parameter in daily cinical routine. The average difference to the gold standard (MDG) over the whole circular B-Scan dataset is 2.4 µm (see Table 3). On glaucoma patients it is 0.9 µm compared to 3.2 µm on healthy eyes. On good quality images it is 2.1 µm compared to 2.9 µm on the low quality images. Note that the average error in all groups is positive. One can conclude that the error is systematic. If errors occur in the segmentation, there is a high probability that the RNFL thickness is on average segmented too thinly.
Despite that, as stated in Section 2.5, averaged positive and negative errors at different A-Scan positions may cancel each other out at least partially when using the MDG as an error measurement. Hence Table 3 also shows the mean absolute differences to the gold standard averaged over all A-scans (MADG). Glaucoma patients have smaller MADG (2.9µm) than normal subjects (3.6µm). If these values again are examined in relation to the mean RNFL thickness, the behavior changes. The average relative error on glaucoma patients is higher (4.9%) than on normal subjects (3.7%). On images with low quality both the absolute and the relative error (4.5µm−5.7%) are higher than on images of good quality (2.7µm−3.3%). This shows that the algorithm results are affected by the presence of pathologies and by low image quality. The effect adds up at the intersection of these two groups (glaucoma and low quality images) with an error of 3.9µm−6.7%.These numbers, however, also state that the effect is not severe and lies within the standard deviations of the measurements in the respective groups. The numbers for all other possible intersection groups are given in Table 3.
The agreement of the automated segmentation with the gold standard shown in Table 4 are generally lower than the inter-observer agreements. If a threshold of 10µm for the MADG is set, 95% of the images show an agreement. Suprisingly, the agreement in the glaucoma group is higher than on scans of normal subjects. 97.2% of the glaucomatous images have an average absolute error of at most 10µm. As it can be concluded already out of the difference measurements, low image quality results in lowered agreement.
A point of special interest is the location of the differences. Figure 9 shows (similar to Fig. 8) the mean absolute difference to the gold standard at A-Scan position (MADG(r)) computed over the whole circular B-Scan dataset. The blood vessel distribution is also shown. They correlate with a factor of 0.84. Most differences are found in the superior quadrant and the inferior quadrant (quadrant positions see Fig. 1). This behavior showed up in the inter observer review (see Fig. 8), too. The segmentation errors concentrate in regions with a high blood vessel density. The treatment of blood vessels in automated segmentations is still an open question, though (see Hood et al. ). It can lead to segmentation differences among OCT system implementations (see Hood et al. ). Our proposed segmentation algorithm generates a thinner RNFL thickness than the current software of the Spectralis in normal subjects. Baleanu et al.  reported a mean RNFL thickness of 97.2±9.7µm for healthy eyes, while our algorithm measured 94.1±11.7µm (see Table 3).
To conclude the evaluation of the automated segmentation on circular B-Scans, a comparison of the MADG(r) to the MAOD(r) is performed. As Fig. 10 shows, the difference of the automated segmentation to the gold standard is higher than the inter-observer difference. But one can also notice that it scales almost linearly. This is an indicator that the automated segmentation fails mostly in regions where a definite objective decision by humans is also hard to obtain. The correlation between the MADG(r) and MOAD(r) calculated over the whole circular B-Scan dataset is 0.93.
The usability of the proposed segmentation algorithm for 3D Volume scans is shown exemplarily on two volume scans. As described in Section 2.4 circular B-Scans with varying diameters were interpolated out of the volumes. As the data density in the Y-direction of the volume is coarse, compared to the X- and Z-direction, and motion compensation is carried out by the Spectralis, a lower image quality is expected. This assumption is supported by a visual inspection. In Fig. 11 an interpolated scan is shown compared to a circular B-scan from a glaucoma patient. Both the interpolated and the circular scan have the same diameter. The interpolated scan suffers from interpolation artifacts especially in the nasal and temporal quadrant. The data density in these regions is the lowest (see arrows in Fig. 11 (b)). When looking at the transition between the inferior and nasal quadrant it must be noted that some image features are shifted. This shift is inherent in the data and not produced by the interpolation algorithm.
Despite the lower quality of the interpolated image and the feature shifts, the RNFL thickness plots of the automated segmentations show a high correlation of 0.82 and low difference in mean RNFL thickness of 2.9 µm. As the algorithm does not make assumptions on the RNFL thickness, it is applied to the interpolated scans without changing any parameter. If now multiple interpolated scans with varying diameter are segmented and the resulting RNFL thickness is color coded and mapped back to the coordinate system of the volume scan, a thickness map is obtained. This map can be translucently laid over the SLO image acquired during the volume scan protocol of the Spectralis. An example thickness map of the same glaucoma patient as in Fig. 11 is shown in Fig. 13 (b). In Fig. 13 (a) a scan of a normal eye is shown in comparison. 100 circular scans with radii between 1 and 3mm are interpolated out of the volumes for these visualizations. This visualization shows the local RNFL loss of the glaucoma patient in the temporal quadrant. A visualization artifact is present in both scans in the nasal region due to the boundary segmentation differences in the B-scan. This artifact does not affect the instantaneous qualitative impression an observer can get from the thickness map.
We presented an automated algorithm for segmenting the RNFL on circular OCT B-Scans. Its key idea is the minimization of an energy term that takes into account the gradient along an A-Scan, as well as local and regional smoothness. The initialization segmentation provided to the minimization procedure is found by denoising the image with complex diffusion and using a heuristic selection of prominent edges. The algorithm makes only few general assumptions on the position and shape of the layer borders and thus can be applied on scans of both normal and glaucomatous eyes.
We have substantiated the applicability of the proposed algorithm by an evaluation on a dataset of images from 132 normal subjects and 72 glaucoma patients. The images were compared to a gold standard generated out of manual corrections to the automated segmentations of two experts and a trained student. The mean absolute difference to the gold standard (MADGi) lies below 10µm in 97.2% of all glaucomatous eyes and 94.0% of all normal eyes. 97.0% of the images with high quality have an overall MADGi below 10µm compared to 91.7% of images with low quality. The inter-observer difference is lower than the one between the automated segmentation and the gold standard, but scales nearly lineary (correlation coefficient 0.93). This leads to the conclusion, that the algorithm produces errors in positions where human observers also do not perfectly agree on. By interpolating circular scans out of OCT volumes the 2D algorithm is also applicable on 3D data. Using multiple scans with varying diameters, a 2D RNFL map can be generated. Relative measurement errors are slightly higher for pathologic data and low image quality. They are mostly located in regions with high blood vessel concentration. Thus, a more accurate segmentation at the position of the retinal vessels will increase the performance of the algorithm. However, the average absolute measurement error, as well as the average relative error, compared to the mean RNFL thickness, are very low. They are below the standard deviation measured in the glaucoma and normal group separately.
We believe that the automated segmentation algorithm for FD-OCT data presented and evaluated in this work provides a reliable tool for extracting diagnostically relevant parameters from circular B-Scans. The same algorithm applied to 3D volume data gives the physician an immediate impression of the RNFL thickness around the optic nerve head.
The authors gratefully acknowledge funding of the Erlangen Graduate School in Advanced Optical Technologies (SAOT) by the German Research Foundation (DFG) in the framework of the German excellence initiative.