Automated layer segmentation of macular OCT images using dual-scale gradient information

A novel automated boundary segmentation algorithm is proposed for fast and reliable quantification of nine intra-retinal boundaries in optical coherence tomography (OCT) images. The algorithm employs a two-step segmentation schema based on gradient information in dual scales, utilizing local and complementary global gradient information simultaneously. A shortest path search is applied to optimize the edge selection. The segmentation algorithm was validated with independent manual segmentation and a reproducibility study. It demonstrates high accuracy and reproducibility in segmenting normal 3D OCT volumes. The execution time is about 16 seconds per volume (480×512×128 voxels). The algorithm shows potential for quantifying images from diseased retinas as well.


Introduction
The recent advancement of optical coherence tomography (OCT) from time domain OCT to spectral domain OCT (SD-OCT) has greatly improved imaging speed and detection sensitivity and has thus allowed acquisitions of larger volumes of data [1][2][3][4][5]. Retinal 3D OCT volumes are now commonly acquired for clinical diagnosis or investigation. But without accurate and fast quantification of these large volumes of data, it is difficult for clinicians or scientists to quickly diagnose or investigate retinal diseases. Therefore, automatically extracting useful information from OCT images, such as calculating total retinal thickness, has become increasingly significant. Segmentation is a critical step towards reliable quantification. Because of segmentation, total retinal thickness and nerve fiber layer (NFL) thickness have become useful indices to indicate the progress of glaucoma. Recent improvements in OCT technology with respect to axial resolution have made the delineation of more intra-retinal layers possible, which will greatly benefit clinical studies in areas such as glaucoma and diabetic retinopathy [6][7][8][9].
Fully automated segmentation of retinal OCT images can be very challenging due to the intrinsic speckle noise, and to the possible presence of blood vessels and other artifacts of motion or reduced illumination. Various methods have been explored to segment the intraretinal layers. The initial straightforward methods were based purely on intensity variation [10][11][12][13][14]. Although adaptive-thresholding [10] and iterative refinement [12,14] techniques have been proposed, segmentation techniques dependent on intensity thresholding still suffer to some degree from intensity inconsistency and discontinuity within a single image or across images. Fabritius et al. incorporated 3D intensity information to further improve the intensity-based segmentation [15]. However, their technique was only demonstrated on two boundaries, the internal limiting membrane (ILM) and the retinal pigment epithelium (RPE) upper boundary, and these borders have the highest intensity and contrast. Recently, segmentation methods based on more complex models, which incorporate not only intensity but also gradient information or other constraints, have been proposed [16][17][18][19][20]. Among these methods, two models have also been widely used in other image modalities, such as ultrasound. One model is based on active contour segmentation [16,17], and the other based on an optimal graph search [18]. The active contour approach is good at finding local optima, but has the limitation of requiring pre-determination of the initial points that must be close enough to converge to the optimal path. Mishra et al. have proposed a two-step optimization solution [17] to first localize the approximate positions of the layers. Compared with active contour approach, the optimal graph search method can guarantee to find the global minimums [18]. The cost functions or constraints of the graph were built on gradient and/or intensity information in a 3D context. In addition, Götzinger et al. demonstrated the use of unique polarization scrambling characteristics to locate the RPE [21], but polarization sensitive OCT is needed to acquire the polarization data.
Most of the above segmentation methods are based on either gradient or intensity information. Gradient and intensity information are actually correlated to some degree, so that when there is a blood vessel or artifact present, both local gradient and intensity information provide little information. This situation may result in errors in the boundary detection process. However, we have noticed that the gradient information obtained with a larger kernel can provide useful information that is complementary to the local gradient or intensity information, since it incorporates neighboring information. In this paper we have simultaneously utilized both local and global gradient information (i.e. dual-scale gradient information) to gather edge information and optimize the edges using a shortest path search method, a type of dynamic programming. The customized local Canny edge detector output and the global axial intensity gradient are chosen as the main costs to build up the edgeoptimization graph. By combining local and global information, the local missing information is compensated for without losing other local details.
Segmentation execution time is an important factor in the development of practical applications [15]. Our aim was to minimize processing time while yielding reliable detection results for a total of nine boundaries. To obtain reliable results, complex models or methods are usually introduced. The reported automated segmentation execution times on 3D data sets in the literature are not necessarily practical for widespread clinical use [14][15][16]18,[21][22][23]. The fastest ones that have been reported are 37s for the ILM and RPE border detection on a 320×1024×138 voxel volume [15] and 70s for eleven layer detection on a 200×1024×200 voxel volume [23]. To gain speed, our segmentation approach utilizes well established algorithms that can be implemented in real-time applications. The proposed nine boundary detection method first gathers the dual-scale gradient information and then performs a shortest path search. The gradient information is based heavily on Canny edge detection and the shortest path search utilizes dynamic programming. Utilizing both Canny edge detection and dynamic programming, the segmentation process can be sped up. We show here that our algorithm is an efficient and effective segmentation technique for quantifying OCT macular images from normal retinas and that it holds promise for quantifying the changes seen with disease as well.

Methods
The edge detection algorithm as a whole is based on gradient information and a shortest path search. The advantage of using gradient rather than intensity information is that the segmentation will not be severely affected by local absolute intensity values as long as the contrasts between layers remain. The gradient information is obtained from two approaches: the local Canny edge result and the global axial intensity gradient. The shortest path search helps to complete and optimize the boundary by incorporating other necessary information. Nine boundaries were detected in this manner, including: boundary 1 corresponding to the inner limiting membrane (ILM); boundary 2 between the NFL and the ganglion cells layer (GCL); boundary 3 between the GCL and inner plexiform layer (IPL); boundary 4 between the IPL and the inner nuclear layer (INL); boundary 5 between the INL and the outer plexiform layer (OPL); boundary 6 corresponding to the external limiting membrane (ELM); boundary 7 between the inner segments (IS) and the outer segments (OS); boundary 8 between the OS and the retinal pigment epithelium (RPE); and boundary 9 between Bruch's membrane (BM) and the choroid. (See Fig. 1 for each layer position.) These nine boundaries are detected sequentially to best utilize the prior information and each boundary detection is based on the same two-step segmentation strategy described as follows.

Two-step segmentation algorithm
The detection of each boundary follows the same two-step segmentation algorithm. First, a customized Canny edge detector is used to create a map showing local main edges. A complementary gradient map in the axial direction is also acquired with a larger kernel. Second, a graph is built based on a combination of the axial intensity gradient and the Canny edge maps. The layer boundary is then extracted by a shortest path search applied to the graph using a dynamic programming algorithm.
The Canny edge detector has been recognized as a well-optimized edge detector [24]. Here, the Canny edge detector is customized using three thresholds: a low-valued threshold to remove the false edges, a high-valued threshold to highlight the significant edges, and a middle-valued threshold to preserve other potential edges as well. The customized Canny edge output is not binary, but rather consists of three discrete values as determined by the thresholds, i.e. the higher the value, the stronger the edge. The parameters of the Canny edge detector, such as kernel size and thresholds, are catered to the different boundary detections, but are common and set automatically for all subjects. All the parameters were experimentally optimized for general use in segmenting Topcon OCT images. The thresholds are set to constant values that vary by boundary. For high contrast boundaries, such as the ILM and IS/OS, the high-valued threshold is set to 0.85, and the middle-valued and low-valued thresholds are set to 0. For low contrast boundaries, such as the ELM, all three thresholds are set to 0. Other intra-retinal boundaries have the high-valued threshold set to 0.8, the middle-valued threshold set to 0.55, and the low-valued threshold set to 0.1. The kernel size of each Canny edge detector varies and is proportionately adjusted by the scan resolution. In most cases, the Canny edge map can correctly show the discontinuous layer edges being detected. Therefore, the Canny edge map can serve as the primary guide within the shortest path search.
A graph based only on node cost assignments is constructed. The node costs are mainly based on a linear combination of Canny edge detector output, axial intensity gradient values, and other information, such as intensity, for specific boundaries, and are given by (1) where C(i, j) is the node cost function, i is the X direction index with maximum number n, j is the Y direction index with maximum number m, and w 1 , w 2 , and w 3 are weights.
The axial intensity gradient map provides complementary search guidance where Canny edge information is missing or weak and is acquired by applying a derivative of Gaussian filter in the axial direction to the Gaussian smoothed image. The reason that the axial intensity gradient map can provide additional information is that it extracts the axial direction gradient using a larger kernel size. Due to the larger kernel size, the neighboring information assists to fill in the lost information in blood vessel shadowing areas, but at the same time local variations at other locations will be smoothed spatially as well. Utilizing gradient information in dual-scale can preserve local details while interpolating information in shadow regions. Lu et al. presented an approach to pre-extract the vessel locations throughout each B-scan and then apply a local linear interpolation across disconnected Canny edges to complete the boundaries [20]. In our approach, pre-extraction of vessel locations, which is not a trivial operation, is unnecessary. Figure 2 illustrates how the axial intensity gradient map can provide useful complementary gradient information to the main guidance of the Canny edge detector output, using an IPL/INL boundary segmentation as an example. The Canny edge detector generally located the IPL/INL boundary where there were no blood vessels as illustrated in Fig. 2b, while the axial gradient utilizing a bigger kernel provided additional information in the blood vessel shadow area shown in Fig. 2c. The axial gradient output values were continuous, and the figure display range has been adjusted to better visualize the gradients near the IPL/INL. Figure 2d shows the shortest path search result from the map combining both Canny edge and axial gradient information. Clearly, the local variations were well tracked while the boundary within the blood vessel shadow region was reasonably interpolated.
The applied shortest path search, or optimization, algorithm utilizes dynamic programming. Dynamic programming is an efficient optimization method, which does not require the specification of start and end points [25] and has been shown to be applicable in retinal image segmentation [26]. The shortest path search here finds the minimum cost to reach position (i, j). Equation (2) shows how a dynamic programming algorithm is utilized to define the object function to search the shortest path. (2) where t(i, j) is the object function of minimum cost to reach (i, j), C(i, j) is the node cost function, i is the X direction index with maximum number n, and j is the Y direction index with maximum number m. Figure 3 illustrates the sequence in the boundary detection process for a single B-scan. The OCT image is first preprocessed with a customized cross-correlation based alignment algorithm to realign the A-scans. No additional denoising or smoothing treatment is needed, which avoids the possible edge blurring brought about by some smoothing processes. The ILM and IS/OS boundaries are detected first so that the shortest path search maps of the other boundaries can be restricted to successively smaller search areas, which saves considerable computation time. Features such as edge direction and pixel intensity are added as the additional cost to the search graph and depend upon the boundary of interest. The edge direction is only considered as black-to-white or white-to-black. After detecting the IS/ OS boundary, the image is further aligned along the IS/OS boundary. The OS/RPE and BM/ Choroid boundaries are the next to be detected. In order to accurately detect the OS/RPE and BM/Choroid, a smaller kernel size is applied in the Canny edge detector to increase the axial detection sensitivity. The search graph is constructed using only Canny edge and axial intensity gradient strength, and the search area is limited below the IS/OS boundary. The IPL/INL and NFL/GCL are detected next. The GCL/IPL is then detected within the NFL/ GCL and IPL/INL boundaries. The Canny edge detector is applied again within that area to extract the GCL/IPL boundary. The INL/OPL, the dark-to-bright edge between the IPL and the outer nuclear layer, is the next to be detected. Finally, the ELM is detected in a similar manner. Although the detection of a single boundary may utilize the other two pre-detected neighboring boundaries to limit the shortest path search area, the intra-retinal boundaries are allowed to overlap their neighboring boundaries. To ensure smoothness, intra-retinal boundaries, including the NFL/GCL, GCL/IPL, IPL/INL, ELM, OS/RPE and BM/Choroid, are smoothed using a polynomial curve-fitting based technique. The polynomial order is common for all subjects and varies by boundary, ranging from 5 to 16. If the segmented data sets are 3D volumes, additional smoothing is applied across frames. The boundaries for each A-scan location are filtered with a one-dimensional Gaussian kernel across B-scans.

Experiment and quantitative analysis
Accuracy and repeatability testing were performed for both normal and fast mode segmentations. In the normal mode segmentation, the full size image is the input to segmentation algorithm. In the fast mode segmentation, an A-scan reduction process is introduced to further speed up the segmentation. Each full size B-scan image (such as 480×512 pixels) is reduced to a smaller size image (480×171 pixels) with A-scans reduced to one-third the original number by averaging every consecutive three A-scans. After the segmentation algorithm is applied to the reduced size image, the results are then linearly interpolated back to the full image space.
Segmentation accuracy was independently validated by comparing our algorithm results to the results [28] from a manual segmentation procedure [6,27,28]. The data sets included 38 scans obtained from 38 individuals, 19 glaucoma patients and 19 controls. The thirty-eight two-dimensional macular scans of the horizontal meridian were acquired using Topcon 3D OCT-1000 equipment, and consisted of 16 overlapping, averaged B-scans (1024 A-scans and 6 mm in length). The image depth was 1.68 mm and axial resolution was 3.5 µm/pixel. The manual segmentation results were provided by four experienced and well-trained segmenters with the use of a computer-aided manual segmentation procedure [6,27]. The manual segmentation procedure by trained individuals has been shown to have low withinand between-segmenter variations [28]. The segmenters only segmented 5 of the 9 boundaries. The automated segmentation algorithm was applied to the same data sets without any prior information regarding the manual segmentation results. The unsigned and signed local border position differences of the ILM, NFL/GCL, IPL/INL, INL/OPL and BM/ Choroid boundaries were compared as follows [22]: algorithm versus the average of the four segmenters; algorithm versus each of the four segmenters; and a pair-wise comparison between the four segmenters. The differences were not calculated in image locations where at least one segmenter reported that a boundary was not visible. At each A-scan location within each image, the local border position difference was calculated as the difference between the border position for the algorithm and the manual segmentation. The signed and unsigned local border differences were then averaged across A-scans and images. The average pair-wise difference between segmenters was calculated as the average of the mean local border position differences of all the possible segmenter pairings among the four segmenters [28].
To test repeatability of the segmentation algorithm, vertical macular 3D scans of 43 eyes (each with 3 repetitions) from Topcon's Normative Database were segmented. The 3D scans were acquired using Topcon 3D OCT-1000 machines. Each 3D scan has 128 frames and each frame covers a 6mm by 6mm macular area (B-scan image size 480×512). The intraclass correlation (ICC), mean of coefficients of variation (CV) and mean of standard deviation (SD) were calculated to test the reproducibility of each detected boundary. For these statistical reproducibility calculations, the data within each group of four adjacent Alines were averaged down to yield a reduced data set of size 128 by 128. An automated foveal centering algorithm was then applied to the first repetition for each subject. Next, the data for the remaining two repetitions were registered to the first using a proprietary registration algorithm that allowed for optimization of translation and rotation based on total retinal thickness information (thickness from ILM to BM/Choroid). The data sets were then cropped to the center 5mm by 5mm square region, resulting in a final data set size of 107 by 107. The ICC, CV, and SD calculations were then performed across the three repetitions for each subject, and the 43 individual results were averaged together yielding collective means of ICC, CV, and SD. Figure 4 illustrates a typical segmentation result for a three-dimensional volume of a normal eye and shows the corresponding two-dimensional thickness maps for the NFL, ganglion cell complex (GCC: from ILM to IPL/INL boundary) and total retina. The segmentation algorithm was implemented in Matlab and C. All the data sets were processed by a personal computer (CPU: Core 2 Duo 3 GHz, RAM: 4 GB). It took about 45 seconds in C for the full nine layer detection for each 3D volume in normal segmentation processing mode and 16 seconds in fast segmentation mode. The algorithm was not specifically optimized for multiple threads.

Segmentation experiment results
The unsigned and signed border position differences for normal segmentation mode are summarized in Tables 1 and 2 and the results for fast segmentation mode are listed in Tables  3 and 4. Looking at the individual boundaries, the ILM, INL/OPL and BM/Choroid border position differences for the algorithm (Auto) versus manual segmentation were very close to the between-segmenter differences, which were within axial resolution (3.5 µm) range. The NFL/GCL and IPL/INL differences for algorithm versus manual segmentation were larger than the differences between segmenters. One major source of the larger discrepancy for the IPL/INL was in the foveal region. Because all of the tested images spanned the horizontal meridian, this factor affected all scans in our algorithm versus manual segmentation comparison. In particular, the auto-segmented IPL/INL around the fovea tended to be below the manually segmented boundary as shown in Fig. 5. When the fovea area (the central 250 of the 1024 A-scans) was excluded, the unsigned border position difference of the algorithm versus average of the four segmenters for IPL/INL boundary was improved to 3.72 ± 1.03µm from 4.31 ± 1.04µm. The discrepancy in the NFL/GCL measurements arose mainly from temporal side measurements where the NFL is very thin. This thin NFL could be detected by our algorithm, but the detected thicknesses were thicker than what were measured by the human segmenters. When only the nasal side NFL was counted in the accuracy validation, the unsigned border position difference of the algorithm versus the average of the four segmenters was improved to 2.97 ± 0.57µm, which was smaller than the between-segmenter difference. It is notable that the human experts also have difficulty performing consistently within and between segmenters in segmenting thin temporal NFL outer boundaries as well as the intra-retinal layer boundaries in the central fovea area [28]. The overall unsigned border position difference (3.39 ± 0.96µm) for the algorithm versus averaged manual segmentation was close to the axial resolution of 3.5µm and slightly larger than the averaged between-segmenter difference (3.28 ± 1.03µm). The results for the fast segmentation mode followed the same trend with slightly larger numbers in unsigned border position differences.
The grand average ICC, CV and SD reproducibility results for both normal and fast segmentation modes are listed in Table 5. Overall the ICC of each boundary was above 0.94, the mean coefficient of variation was less than 7.4%, and the mean standard deviation was less than 2.8µm. There was virtually no difference between normal mode reproducibility results and fast mode results. The high reproducibility, including NFL/GCL and IPL/INL boundary detection [29][30][31][32], indicates the potential for monitoring disease progress in a clinical application.
Automated segmentation techniques can have a greater chance of failure when the image quality is poor. Figure 6a shows the segmentation result of a low intensity OCT image. Although the intensity in the middle part of the image sharply decreases, the contrasts between different layers are still preserved and the boundary is still visible. An algorithm based on the gradient information in dual scales can detect this information and accurately delineate the image. In addition, the customized Canny edge detector utilizes a threethreshold technique, and therefore the lower contrast edges still can be preserved as true boundary candidates rather than designated as background noise. The shortest path search will pick up these low contrast edges when no other stronger edges present. As a result the proposed algorithm has the ability to segment low-intensity and low-contrast OCT images. Figure 6b demonstrates the segmentation robustness in the presence of blood vessel artifacts. Discontinuities caused by blood vessels are commonly seen in macular OCT images, and this represents a significant challenge for automated segmentation and thickness map generation [33]. Our algorithm incorporates neighboring gradient information to overcome the blood vessel discontinuity issue as illustrated in Fig. 2. Figure 8 illustrates the results based on a vertical 3D scan volume for a patient with glaucoma. The thinning of the NFL in the nasal and inferior hemispheres is clearly visible from the NFL thickness map as well as the total thickness map.

Discussion and conclusions
In summary, a novel automated segmentation algorithm has been developed and evaluated. The segmentation algorithm is mainly based on dual-scale gradient information that captures both global and local features. As indicated by the reported high accuracy, reproducibility, and fast execution speed, it holds high promise toward practical applications.
Regarding accuracy, similar quantitative validations versus manual segmentation are found in four previous macular OCT image segmentation studies [18,19,22,23]. In these studies, comparable border differences between algorithm-versus-manual-segmentation and segmenter-versus-segmenter were reported, although the paper that was based on a statistical model did not specify the absolute difference values [19]. The authors of the 3D segmentation papers presented overall unsigned border differences of 5.69 ± 2.41µm (fovea excluded) [18], 6.1 ± 2.9µm [22] and 5.75 ± 1.37µm [23] between algorithm and manual segmentation in their three studies. Our results (3.39 ± 0.96µm) are considerably smaller than what has been reported. While our algorithm may have been performing better, several factors may have contributed to our smaller values, including machine type (time domain OCT or SD-OCT), axial resolution, image type and quality, and the boundaries quantified. The mean difference between algorithm and manual segmentation shown in our study was slightly larger than the mean between-segmenter difference. The fact that all segmenters were well trained and given the same guidelines [28] and that all our test images span the horizontal meridian may contribute to this result.
Among the nine boundaries segmented in our paper, the GCL/IPL and ELM are not commonly found in literature. Several attempts at quantifying the combined GCL and IPL thickness have shown its potential application in the detection of diseases such as glaucoma and diabetic retinopathy [6][7][8][9]. Our results technically demonstrate the feasibility of automated GCL/IPL and ELM segmentation and/or quantification, which may be beneficial in future clinical study. With current OCT single image quality, manual segmentation of the GCL/IPL and ELM boundaries is not yet available. The IS/OS and OS/RPE were not tested for accuracy due to the availability of manual segmentation data [28]. While visually these two untested boundaries appear to be performing well, they too should be validated against manual segmentation in future work.
The algorithm aims to achieve high speed without degrading accuracy. Ideally, to achieve the highest degree of sensitivity and accuracy, there would be customized Canny edges for each individual boundary detection. However, to reduce processing time in our implementation, the ILM and IS/OS share the information from the same Canny edge detector and axial gradient map, as do the OS/RPE and BM/Choroid, and the NFL/GCL and IPL/INL. The GCL/IPL and INL/OPL can be calculated separately in relatively little time, since the prior information of neighboring boundaries narrows down the detection area, which is an advantage of sequential detection. In this way, about 45 seconds are needed to complete the segmentation of a full-size 3D volume. To further gain speed, the A-scan reduction technique has been implemented, thereby successfully reducing the execution time to 16 seconds per volume without notably degrading the accuracy or reproducibility.
Although the algorithm is primarily based on gradient information, it still has the flexibility to add any other useful information to the final optimization because its post-processing is based on a convenient graph framework. Currently, only node cost is assigned. But link cost, usually assigned as smoothing terms, can also be used. If link cost were to be assigned, the additional smoothing procedure in the current algorithm may no longer be needed. Additional flexibility rests in the possibility for different kernel sizes, Canny thresholds for the various boundaries, and weight factors of different terms in the cost function. Our settings for these parameters were designed experimentally and intuitively based on various OCT images including normal and diseased eyes with different scan types. In other words, the algorithm has been optimized for general use rather than only for the data sets in this work.
In the final processing step for 3D volumes, the layer height information from neighboring B-scans is utilized to additionally smooth the 2D segmentation results. In our approach, the 3D information is integrated after 2D segmentation. Our current algorithm has the advantage of utilizing 3D information without building a complex 3D segmentation model, but it loses the ability to fully utilize the 3D information. In the future, efficiently integrating 3D information prior to 2D segmentation may be beneficial.
So far, this algorithm has been extensively evaluated for macular OCT images of relatively normal scans. More work is needed to evaluate our approach on retinas affected with outer and inner retinal diseases. Our approach can also be extended to other scan modes as well, such as circumpapillary and optic nerve head images.    Results of a normal 3D vertical scan volume. NFL, GCC (from ILM to IPL/INL) and total retinal thickness maps (from ILM to OS/RPE) covered 5×5mm 2 area, in which three Bscans (the 32nd, 64th and 96th images) of this volume were shown. T: temporal; I: inferior; N: nasal; S: superior. Comparison between manual segmentation (average of four segmenters' results, yellow lines) and algorithm segmentation (blue lines). The blue arrows illustrate the commonly seen NFL/GCL discrepancy between human segmenters and the automated algorithm. The red arrows indicate the commonly seen IPL/INL difference between human segmenters and the automated algorithm around the fovea.  Results of a 3D horizontal scan volume with drusen. The total thickness map (from ILM to OS/RPE) and the RPE complex thickness map (from OS/RPE to BM/Choroid) cover 5×5mm 2 area, in which two B-scans (the 20th and 40th images) of this volume are shown. T: temporal; I: inferior; N: nasal. Results of a 3D vertical scan volume from a glaucoma patient. NFL and total retinal thickness maps (from ILM to OS/RPE) cover 5×5mm 2 area, in which two sample B-scans (the 15th and 47th images) from the volume are shown. T: temporal; I: inferior; N: nasal; S: superior. Table 1 Unsigned border position differences (mean ± SD in um) of 38 scans using normal segmentation mode  Signed border position differences (mean ± SD in um) of 38 scans using normal segmentation mode  Unsigned border position differences (mean ± SD in um) of 38 scans using fast segmentation mode  Signed border position differences (mean ± SD in um) of 38 scans using fast segmentation mode  ICC: intra-class correlation; SD: standard deviation; CV: coefficient of variation; Normal: segmentation in normal mode; Fast: segmentation in fast mode with A-scan reduction technique applied