This section details an implementation of the generalized segmentation algorithm discussed in Section 2 that automatically segments eight retinal layer boundaries in SDOCT images. The data to be segmented is assumed to consist of raster scanned images densely sampled in the lateral (B-scan) dimension. Each B-scan is segmented independently from other images within the same volume, and is assumed to be centered at the macula without the optic nerve head present.
The figure below shows a full schematic of this algorithm, and the following subsections discuss each of the outlined steps.
Eight retinal layer boundary segmentation algorithm schematic for SDOCT images.
3.1 Image flattening
As mentioned in Section 2, a graph is segmented by computing the minimum weighted path from a start node to an end node. Inherent in this method is the tendency to find the shortest geometric path, since fewer traversed nodes result in a lower total weight. As a result, features with strong curvature or other irregularities, even those with relatively strong gradients, are disadvantaged, since their paths do not reflect the shortest geometric distance. As an alternative to the normalized-cut approach [17], a natural solution to this dilemma is to transform the image such that the desired path is shortened.
To account for the natural retinal curvature seen in SDOCT images, we flatten the image to avoid inaccurate shortcuts across the layers when segmenting.
The figure below demonstrates retinal flattening, where (b) is the flattened version of the original image, (a).
Image flattening. (a) The original retinal SDOCT image. (b) The flattened image.
We created the flattened image based on a pilot estimate of the RPE layer. Our pilot estimate is based on prior knowledge that the RPE layer is one of the most hyper-reflective layers within a retinal SDOCT image. Thus, after denoising the SDOCT image with a Gaussian filter, we tentatively assign the brightest pixel in each column as an estimate of the RPE. We locate outlier pixels, often associated with the NFL, by searching for discontinuities greater than 50 pixels in the RPE estimate. These outliers are removed from the RPE estimate along with pixels lying in columns that present a significantly lower signal-to-noise ratio. We fit a second order polynomial to the remaining valid RPE points, and shift each column up or down such that the RPE points lie on a flat line. Regions of the flattened image that are outside the original field of view are extrapolated from the mirror image of the valid pixels. This avoids border artifacts when later filtering the image. The extrapolated pixels are excluded from weight calculations, and the resulting image is a smoothly flattened retina.
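A minimal sketch of this flattening step, assuming grayscale NumPy arrays, is given below. The function name and Gaussian width are our own choices, the SNR-based column rejection is simplified to the discontinuity check alone, and wrap-around shifting stands in for the mirror-image extrapolation described above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flatten_retina(img, max_jump=50):
    """Flatten a B-scan by aligning a pilot RPE estimate to a horizontal line.
    Returns the flattened image and the per-column shifts (kept for unflattening)."""
    smooth = gaussian_filter(img.astype(float), sigma=3)   # denoise before the pilot estimate
    rpe = np.argmax(smooth, axis=0).astype(float)          # brightest pixel per column ~ RPE
    cols = np.arange(img.shape[1])

    # Reject outlier columns (e.g. where the NFL is brightest): large discontinuities
    valid = np.abs(np.gradient(rpe)) < max_jump

    # Second-order polynomial fit through the remaining valid RPE points
    baseline = np.polyval(np.polyfit(cols[valid], rpe[valid], 2), cols)

    # Shift each column up or down so the fitted RPE lies on a flat line
    shifts = np.round(baseline - baseline.min()).astype(int)
    flat = np.empty_like(img)
    for c in cols:
        flat[:, c] = np.roll(img[:, c], -shifts[c])  # wrap-around in place of mirror extrapolation
    return flat, shifts
```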
3.2 Calculate graph weights
Graph weights in the literature [17] often include two terms representing the geometric distance and the intensity difference of the graph nodes. Given the relatively high resolution of an SDOCT retinal scan, most features of interest have a smooth transition between neighboring pixels. Therefore, each node is associated with only its eight nearest neighbors, sparing us from incorporating geometric distance weights. All other node pairs are disconnected, resulting in a sparse adjacency matrix of intensity difference graph weights. For example, an M × N sized image has an MN × MN sized adjacency matrix with MNC filled entries, where C (here, eight) is the number of nearest neighbors. We then define our graph to be undirected, thus halving the size of the adjacency matrix.
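For illustration, the 8-connected sparsity pattern can be assembled as follows (a sketch; the row-major node numbering and the helper name are our own choices, and the unit weights are placeholders until Eq. (1) below):

```python
import numpy as np
from scipy import sparse

def neighbor_pairs(M, N):
    """(a, b) node-index pairs linking each pixel of an M x N image to its
    8 nearest neighbors; nodes are numbered row-major, node = i * N + j.
    Since the graph is undirected, the four forward offsets suffice."""
    idx = np.arange(M * N).reshape(M, N)
    a_list, b_list = [], []
    for di, dj in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        src = idx[max(0, -di):M - max(0, di), max(0, -dj):N - max(0, dj)]
        dst = idx[max(0, di):M - max(0, -di), max(0, dj):N - max(0, -dj)]
        a_list.append(src.ravel())
        b_list.append(dst.ravel())
    return np.concatenate(a_list), np.concatenate(b_list)

# Example: one triangle of the MN x MN adjacency matrix, ~4MN stored entries,
# with placeholder unit weights to be replaced by Eq. (1) below.
M, N = 512, 1000
a, b = neighbor_pairs(M, N)
W = sparse.coo_matrix((np.ones(a.size), (a, b)), shape=(M * N, M * N)).tocsr()
```

Storing only one triangle is what halves the matrix, since an undirected shortest-path solver can traverse each stored edge in both directions.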
In SDOCT images, retinal layers are primarily horizontal structures distinguishable by a change in pixel intensity in the vertical direction. Weights can therefore be calculated solely based on intensity gradients as follows:

wab = 2 − (ga + gb) + wmin,  (1)

where wab is the weight assigned to the edge connecting nodes a and b, ga is the vertical gradient of the image at node a, gb is the vertical gradient of the image at node b, and wmin is the minimum weight in the graph, a small positive number added for system stabilization.

Equation (1) assigns low weight values to node pairs with large vertical gradients. In our implementation, ga and gb are normalized to values between 0 and 1, and wmin = 1 × 10−5. These weights are further adjusted to account for the directionality of the gradient. To segment the vitreous-NFL layer boundary, for example, it is known that the boundary exhibits a darker layer above a brighter layer [12]. In contrast, the NFL-GCL layer boundary has a light-to-dark layer change. As a result, edge maps such as [1; −1] and [−1; 1] can be utilized when calculating the gradient to extract the appropriate layers. Finally, for automatic endpoint initialization as discussed in Section 2.2, the end columns are duplicated and added to either side of the image, with vertical edge weights equal to wmin and all other weights following Eq. (1).
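A sketch of the weight calculation under these conventions, reusing the neighbor pairs from the previous snippet (the global min-max normalization is one plausible reading of "normalized to values between 0 and 1", and the endpoint-column duplication is omitted):

```python
import numpy as np
from scipy import sparse

def gradient_weights(img, a, b, dark_to_light=True, w_min=1e-5):
    """Sparse adjacency matrix with weights per Eq. (1):
    w_ab = 2 - (g_a + g_b) + w_min, using a signed vertical gradient so that
    each matrix favors one transition direction."""
    g = np.zeros(img.shape, dtype=float)
    g[1:, :] = img[1:, :] - img[:-1, :]        # [-1; 1] edge map (row index grows downward)
    if not dark_to_light:
        g = -g                                  # [1; -1] responds to light-above-dark edges
    g = (g - g.min()) / (g.max() - g.min())     # normalize gradients to [0, 1]
    gv = g.ravel()                              # row-major, matching the node numbering
    w = 2.0 - (gv[a] + gv[b]) + w_min           # strong gradients -> low edge weights
    n = img.size
    return sparse.coo_matrix((w, (a, b)), shape=(n, n)).tocsr()
```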
Our implementation for segmenting retinal layers in SDOCT images yields two undirected adjacency matrices sensitive to either dark-to-light or light-to-dark intensity transitions. For layer boundaries exhibiting a darker layer above a lighter layer, such as the vitreous-NFL, INL-OPL, IS-OS, and OS-RPE boundaries, we utilize the dark-to-light adjacency matrix. In contrast, for the NFL-GCL, IPL-INL, OPL-ONL, and RPE-choroid boundaries, which exhibit a lighter layer above a darker layer, we use the light-to-dark adjacency matrix.
The figure below shows the two complementary gradient images used for calculating edge weights: (a) is the dark-to-light gradient image used to generate weights for segmenting a boundary with a darker layer above a lighter layer, and (b) is the light-to-dark gradient image for boundaries separating a lighter layer above a darker layer.
Gradient images used for calculating graph weights. (a) Dark-to-light image for segmenting a darker layer above a lighter layer. (b) Light-to-dark image for segmenting a lighter layer above a darker layer.
3.3 Segmenting the vitreous-NFL and the IS-OS
We implement the segmentation of multiple layers in an iterative process, where layer boundaries are cut by order of prominence. The vitreous-NFL and IS-OS are the two most prominent layer boundaries in an SDOCT retinal image due to their high contrast in pixel intensity. Our algorithm utilizes Dijkstra’s method and the dark-to-light graph weights to find the lowest-weighted path initialized at the upper left and the bottom right pixels of the image. The resulting cut is either the vitreous-NFL or the IS-OS.
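With a weight matrix W in hand, SciPy's Dijkstra routine can recover this cut (a sketch; node 0 and node MN − 1 are the upper-left and bottom-right pixels under the row-major numbering assumed earlier):

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def min_weight_path(W, M, N):
    """Trace the lowest-weighted path from node 0 (upper-left pixel) to
    node M*N - 1 (bottom-right pixel) and return its pixel coordinates."""
    start, end = 0, M * N - 1
    _, pred = dijkstra(W, directed=False, indices=start,
                       return_predecessors=True)
    path = [end]
    while path[-1] != start:                  # walk the predecessor chain backward
        path.append(pred[path[-1]])
    rows, cols = np.unravel_index(path[::-1], (M, N))
    return rows, cols                         # pixel coordinates of the graph cut
```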
The vitreous-NFL is the topmost layer of the retina on a B-scan, with little hyper-reflectivity above it, unlike the IS-OS. The region directly above the cut is therefore inspected for bright layers in order to determine which layer boundary was segmented. If there is no hyper-reflectivity present in the region, we conclude that the segmented boundary is the vitreous-NFL; otherwise, we infer that the IS-OS was detected. To determine the presence or absence of hyper-reflectivity, the image is first low-pass filtered with a Gaussian kernel and thresholded using Otsu's method [23] to generate a binary mask. This step isolates the NFL-OPL and IS-RPE complexes. The fraction of bright pixels in the region above the cut is then calculated for the binary image. If the fraction exceeds 0.025, we conclude that the segmented layer boundary is the IS-OS, due to the presence of the NFL-OPL complex; otherwise, we conclude that the vitreous-NFL layer boundary was segmented.
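The layer-identity test might look as follows (a sketch; threshold_otsu from scikit-image stands in for Otsu's method [23], the Gaussian width is an assumption, and the 0.025 cutoff is the one quoted above):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.filters import threshold_otsu

def cut_is_isos(img, cut_rows, frac_cutoff=0.025):
    """True if the segmented cut is the IS-OS, i.e. if the region above the
    cut contains hyper-reflective (NFL-OPL complex) pixels."""
    smooth = gaussian_filter(img.astype(float), sigma=2)
    bright = smooth > threshold_otsu(smooth)     # isolates the NFL-OPL / IS-RPE complexes
    above = np.zeros_like(bright)
    for c, r in enumerate(cut_rows):             # cut_rows: boundary row per column
        above[:r, c] = True
    frac = bright[above].mean() if above.any() else 0.0
    return frac > frac_cutoff
```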
Following the limited search space concept described in Section 2.3, if the IS-OS layer boundary was first segmented, the search space is limited to the region 20 pixels above the segmented line so the vitreous-NFL can be segmented next. In contrast, if the vitreous-NFL was segmented first, then the search space is limited to the region 40 pixels below it to segment the IS-OS. Consequently, the algorithm successfully segments the two most prominent layer boundaries.
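One way to realize such a limited search space is to keep only the edges whose endpoints fall within a band around the reference cut (a sketch; a, b, and w are the edge arrays from Section 3.2, and cut_rows is a NumPy array giving the per-column row index of the reference boundary):

```python
import numpy as np
from scipy import sparse

def limit_search_region(a, b, w, cut_rows, shape, above, below):
    """Rebuild the adjacency matrix, keeping only edges whose endpoints lie
    between `above` pixels over and `below` pixels under the reference cut."""
    M, N = shape
    ra, ca = np.divmod(a, N)                  # row/column of each edge endpoint
    rb, cb = np.divmod(b, N)
    ref_a, ref_b = cut_rows[ca], cut_rows[cb]
    keep = ((ra >= ref_a - above) & (ra <= ref_a + below) &
            (rb >= ref_b - above) & (rb <= ref_b + below))
    return sparse.coo_matrix((w[keep], (a[keep], b[keep])),
                             shape=(M * N, M * N)).tocsr()
```

For instance, above=20, below=0 restricts the graph to the region 20 pixels above a pilot IS-OS, while above=0, below=40 serves the opposite case.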
In a representative SDOCT B-scan, the vitreous-NFL is first segmented, given a search space of the entire image. The search region is then limited as described above, resulting in the segmented IS-OS line. This example also shows the result of the automatic endpoint initialization implemented in Section 3.2. Lastly, note that the pilot IS-OS layer detected at this stage may be inaccurate due to IS-OS and RPE layer confusion. Such errors are corrected in Section 3.8.
3.4 Search region limitation using connectivity-based segmentation
While the vitreous-NFL and IS-OS are easily identifiable due to their prominent hyper-reflectivity, the remaining layer boundaries are not as distinct. To accurately segment these remaining layers, we introduce a method for defining a valid and narrow search region that isolates the layer boundaries of interest. Our method is implemented in two steps. We first develop a pilot estimate of pixel clusters corresponding to the hyper-reflective layer regions using a column-wise intensity profiling technique. Then, we connect and modify clusters associated with a particular layer, in what we call the connectivity-based step. We exploit the resulting layer estimations to limit search regions in Sections 3.6 to 3.8, in which three of the estimated layer boundaries are considered at a time: we use the top and bottom boundaries as the search space for identifying the cut that represents the middle layer. The column-wise intensity profiling and connectivity-based segmentation steps are described in the following paragraphs.
First, we enhance the contrast between the light and dark layers. To achieve this, we coarsely denoise the image with a rectangular averaging filter of size 3 × 19 pixels. Next, we threshold this image by setting the values of pixels that are smaller than the median of their corresponding column to zero. An example contrast-enhanced image is shown in the figure below.
Contrast enhancement. (a) A flattened retinal SDOCT image. (b) The contrast-enhanced image.
A binary mask is then generated to isolate the hyper-reflective retinal layers. This is done by taking the 1-D, column-wise, second-order derivative of the contrast-enhanced image to boost layer boundaries [24]. Next, we create a pilot binary mask by thresholding the edge-enhanced image at zero. Then, to remove outliers in each column, we set all non-zero clusters less than 5 pixels tall to zero and join the remaining clusters that are closer than 3 pixels to each other. The resulting mask corresponds to the hyper-reflective layers in the retina (i.e., NFL, IPL, OPL, IS-OS, RPE). A horizontal 1-D closing operation with a 10-pixel kernel is performed on the image as a whole to close gaps, and any clusters less than 500 pixels in size are removed. The result is a coarse binary mask of the hyper-reflective retinal layers, as shown in Fig. 10.
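A sketch of this mask-generation pipeline with scipy.ndimage (the filter and kernel sizes follow the text; the column-wise cluster cleanup is approximated with vertical 1-D morphology, and the sign kept after thresholding the second derivative is an assumption):

```python
import numpy as np
from scipy import ndimage

def hyper_reflective_mask(img):
    """Coarse binary mask of the hyper-reflective layers (NFL, IPL, OPL, IS-OS, RPE)."""
    # Contrast enhancement: 3 x 19 averaging filter, then column-median thresholding
    den = ndimage.uniform_filter(img.astype(float), size=(3, 19))
    enh = np.where(den > np.median(den, axis=0, keepdims=True), den, 0.0)

    # Column-wise second derivative; bright bands are concave down, so we keep
    # pixels where it is negative (the sign convention is an assumption)
    d2 = np.diff(enh, n=2, axis=0)
    mask = np.zeros(img.shape, dtype=bool)
    mask[1:-1, :] = d2 < 0

    # Column-wise cleanup, approximated with vertical 1-D morphology:
    # join clusters closer than 3 px, then drop clusters shorter than 5 px
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 1)))
    mask = ndimage.binary_opening(mask, structure=np.ones((5, 1)))

    # Horizontal 1-D closing with a 10-px kernel, then remove clusters < 500 px
    mask = ndimage.binary_closing(mask, structure=np.ones((1, 10)))
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    mask[np.isin(labels, np.nonzero(sizes < 500)[0] + 1)] = False
    return mask
```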
Fig. 10. (a) A binary mask of the filtered image. The red arrows mark the location of the holes corresponding to the GCL-IPL complex. (b) A zoomed-in binary mask with disconnected layers, achieved by interpolating the lower boundaries of the corresponding holes.
The layers in some columns of the coarse binary mask might be blended together, as seen in the top layers of Fig. 10(a). We distinguish these merged hyper-reflective layers by interpolating their approximate location from neighboring columns in which they are detached. This is done by first detecting hyper-reflective pixel clusters with holes corresponding to a partially detected middle hypo-reflective layer; note, for example, the merged NFL and IPL clusters and the holes corresponding to the GCL layer in Fig. 10(a). We simply cut the merged layers by interpolating the lower boundaries of the corresponding holes, separating all layers as shown in Fig. 10(b). Note that the interpolating line may not break all merged layers into two clusters, as is the case in the foveal region of Fig. 10(b). We separate these areas from the original cluster with vertical lines.
The algorithm then examines each column in the original binary mask (Fig. 10(a)) and tentatively assigns each cluster of pixels within the column to a particular anatomical layer. The available choices for hyper-reflective layers are the NFL, IPL, OPL, IS-OS, RPE, or “no assignment” in the case that the rules fail to reach a reasonably certain assignment. To do this, the number of clusters is counted for each column. For columns with five clusters, the retinal layer assignments are straightforward, since there is a one-to-one correspondence with the targeted layers. To determine the missing layers in columns with fewer than five clusters, we take advantage of our prior knowledge of the layer order and approximate layer distances. For instance, we assume that the OS-RPE is at least 30 pixels below the vitreous-NFL, and that the distance between the RPE and IS-OS layers is less than 10 pixels. The end result is the tentatively assigned retinal layers shown in Fig. 11(a).
Fig. 11. Retinal layer assignments, where blue = NFL, green = IPL, yellow = OPL, cyan = IS-OS, magenta = RPE. (a) Column-wise layer assignments of the mask in Fig. 10. Note the conflicts in the top three layer assignments. (b) Cluster layer assignments after the voting-based refinement.
To correct possible conflicts in the column-wise layer assignments, where a spatially connected group of pixels may have tentative assignments to different retinal layers (Fig. 11(a)), we devise a voting scheme. We reassign all connected pixels from a 2-D binary cluster to a single retinal layer based on the majority tentative assignment of the pixels in that cluster. The result of this cluster-based refinement is shown in Fig. 11(b).
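The voting scheme itself is compact (a sketch; tentative holds per-pixel layer codes from the column-wise step, with 0 meaning "no assignment"):

```python
import numpy as np
from scipy import ndimage

def vote_cluster_labels(mask, tentative):
    """Reassign every 2-D connected cluster in `mask` to the single layer
    code held by the majority of its tentatively assigned pixels."""
    clusters, n = ndimage.label(mask)
    voted = np.zeros_like(tentative)
    for k in range(1, n + 1):
        in_k = clusters == k
        votes = tentative[in_k]
        votes = votes[votes > 0]              # ignore "no assignment" pixels
        if votes.size:
            voted[in_k] = np.bincount(votes).argmax()
    return voted
```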
The boundaries of the NFL, IPL, and OPL are determined by finding the top and bottom edges of the assigned pixel clusters (Fig. 11(b)). The boundaries of the IS-OS and RPE are found by first distinguishing the two layers from each other. The brightest pixels in the filtered image that are (a) given the IS-OS / RPE layer assignment in the original mask and (b) furthest from the top of the image are assigned as the center of the RPE. This center RPE line is then smoothed with a median filter. The RPE edges are located by performing a focused search on the filtered image for large changes in gradient near the center RPE line. The same is done to find the IS-OS edges by searching just above the RPE layer, resulting in the estimated retinal layers displayed in the figure below. Because these layer estimations still contain errors, they are refined in the subsequent subsections using Dijkstra's method, with adjacent layer estimations as search region boundaries.
Connectivity-based segmentation of retinal layers using the layer assignments from Fig. 11.
3.5 Vessel detection
Retinal SDOCT images with prominent vessels pose a challenge when segmenting the NFL-GCL boundary, because they result in hyper-reflective bulges in the NFL layer, as shown in the figure below. In a recent study, retinal blood vessels were located through iterative polynomial smoothing [25]. To address this problem, we incorporate a method that detects major vessels on individual B-scans, making Dijkstra's algorithm indifferent to bulges in the NFL layer.
Vessel detection. (a) NFL-GCL without vessel detection. (b) NFL-GCL with vessel detection.
In the first step, we tentatively segment the RPE-choroid layer boundary, using the pilot IS-OS to limit the search region. Then, a Gaussian filter is applied to the image, and each column is summed over the region ranging from the RPE-choroid boundary to 15 pixels above it. Columns exhibiting low sums, corresponding to the dark areas caused by vessel shadowing, are defined as the vessel regions.
In the second step, we set the edge weights in the vessel regions to wmin prior to segmenting the NFL-GCL. With the graph weights updated, the NFL-GCL cut is indifferent to these vessel regions. The figure above demonstrates the effectiveness of the vessel correction algorithm in improving the accuracy of the NFL-GCL boundary detection, which is described in Section 3.6.
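A sketch of the shadow-based detector (the 15-pixel band is from the text; the low-sum cutoff, here a fraction of the median column sum, and the Gaussian width are assumptions; rpe_choroid_rows comes from the tentative RPE-choroid segmentation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vessel_columns(img, rpe_choroid_rows, band=15, rel_cutoff=0.7):
    """Flag columns whose summed intensity in the `band` pixels above the
    RPE-choroid boundary is low, indicating vessel shadowing."""
    smooth = gaussian_filter(img.astype(float), sigma=2)
    sums = np.array([smooth[max(0, r - band):r, c].sum()
                     for c, r in enumerate(rpe_choroid_rows)])
    return sums < rel_cutoff * np.median(sums)    # boolean per-column vessel mask
```

Edges lying in the flagged columns then receive the weight wmin before the NFL-GCL cut is computed.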
3.6 Segmenting the NFL-GCL
After incorporating minimum weights in the vessel regions and estimating the layer boundaries using connectivity-based segmentation, we form a pilot estimate of the NFL-GCL layer boundary using Dijkstra's algorithm and the updated light-to-dark graph weights. We limit the search region using the vitreous-NFL from Section 3.3 and the estimated IPL-INL from Section 3.4 prior to cutting. Errors may arise, however, due to the similar light-to-dark gradient characteristics of the NFL-GCL and IPL-INL layer boundaries. Moreover, the NFL layer is thicker on the nasal side of the fovea, which is closer to the optic nerve head, necessitating an asymmetric, larger search region.
We include a technique in our algorithm to account for the NFL-GCL / IPL-INL confusion. First, the temporal side is predetermined based on knowledge of the scan direction (horizontal / vertical) and which eye (left / right) was imaged. For instance, horizontal scans of the left eye exhibit layer thinning on the right side of the fovea, whereas vertical scans of the left eye show layer thinning on both sides of the fovea.
Using the vitreous-NFL from Section 3.3 and the preliminary NFL-GCL cut, we estimate the NFL thickness. Moving across the image from the thicker to the thinner side of the layer, we find the first instance where the thickness drops below a threshold of 6 pixels and mark it as the divide between the thick and thin sides. We limit the search region on the thinner NFL side to 10 pixels below the vitreous-NFL, and the search region on the thicker side is expanded from 5 pixels below the vitreous-NFL to 5 pixels below the preliminary NFL-GCL cut.
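For example (a sketch; boundary positions are per-column row indices, and the arrays are assumed oriented so that column 0 lies on the thicker, nasal side):

```python
import numpy as np

def thick_thin_divide(vitreous_nfl, nfl_gcl, thresh=6):
    """First column, scanning from the thick toward the thin side, where the
    pilot NFL thickness drops below `thresh` pixels."""
    thickness = nfl_gcl - vitreous_nfl        # per-column NFL thickness in pixels
    below = np.nonzero(thickness < thresh)[0]
    return below[0] if below.size else thickness.size
```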
The figure below shows the resulting NFL-GCL segmentation before (a) and after (b) correcting the layer boundary.
NFL-GCL segmentation. (a) NFL-GCL before correction. (b) NFL-GCL after correction.
3.7 Detecting the fovea and segmenting the IPL to ONL layer boundaries
It is important to detect whether the image contains the fovea, since layers tend to merge together and become increasingly indistinguishable near the fovea. Due to inaccuracies in estimating the layer boundaries, errors similar to the example in the figure below often appear in the pilot segmentations for B-scans containing the fovea. To address such problems, we first perform a pilot estimate of the IPL-INL, INL-OPL, and OPL-ONL layer boundaries using the appropriate weighting matrix (light-to-dark or dark-to-light) and Dijkstra's algorithm. We limit the search regions using the previously segmented lines and the estimated layer boundaries from Section 3.4. For example, the IPL-INL search region is limited to 3 pixels below the NFL-GCL and above the estimated IPL-INL or INL-OPL boundaries, whichever is higher.
Fovea correction. (a) Segmented image before correction. (b) Segmented image after correction.
In the next step, we estimate the presence and location of the fovea by calculating the NFL, GCL-IPL, INL, and OPL layer thicknesses from the tentatively segmented layers. We then assign columns with a mean layer thickness of less than 5 pixels to the foveal region. Since this region may not necessarily be centered at the fovea, we locate the center by calculating the vitreous-NFL to IS-OS thickness; the column containing the minimum thickness is located and expanded by 20 pixels on either side. If this region coincides with the previously detected foveal region, it is included as part of the fovea.
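A sketch of this test (the 5-pixel thickness and 20-pixel expansion are the values quoted above; the array layout is our own):

```python
import numpy as np

def foveal_columns(boundaries, vitreous_nfl, is_os, thin_thresh=5, halfwidth=20):
    """Boolean mask of foveal columns. `boundaries` stacks, top to bottom, the
    per-column rows of the boundaries enclosing the NFL, GCL-IPL, INL, and OPL."""
    mean_t = np.diff(boundaries, axis=0).mean(axis=0)   # mean layer thickness per column
    fovea = mean_t < thin_thresh

    # Expand around the column with the thinnest vitreous-NFL-to-IS-OS span
    center = int(np.argmin(is_os - vitreous_nfl))
    window = np.zeros_like(fovea)
    window[max(0, center - halfwidth):center + halfwidth + 1] = True
    if (window & fovea).any():                # only if it overlaps the detected region
        fovea |= window
    return fovea
```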
After locating the fovea, we define more accurate search regions to account for foveal layer merging. This is done by maintaining the position of the lower boundary of the search space while expanding its top boundary upward to reach the vitreous-NFL when re-segmenting the IPL-INL, INL-OPL, and OPL-ONL. The resulting segmentations are more accurate, as depicted in the figure above.
3.8 Segmenting the IS to choroid layer boundaries
The remaining lower boundaries (IS-OS, OS-RPE, and RPE-choroid) are segmented in a straightforward manner by exploiting the previously estimated layer boundaries, the connectivity-based search regions, and Dijkstra’s algorithm. In particular, we improve upon the IS-OS estimate from Section 3.3 to remove any possible IS-OS / OS-RPE confusion. Our implementation re-segments the IS-OS with a modified search region ranging from 4 pixels below the segmented OPL-ONL line to 2 pixels above the estimated OS-RPE from Section 3.4.
After segmenting all layer boundaries, we unflatten the image by shifting the pixels up or down, opposite to the direction of image flattening, thereby restoring the original curvature. The result is an algorithm that segments eight retinal layer boundaries on a given SDOCT image.