Layer boundary evolution method for macular OCT layer segmentation

Optical coherence tomography (OCT) produces high-resolution depth images of the retina and is now the standard of care for in-vivo ophthalmological assessment. It is also increasingly used to evaluate neurological disorders such as multiple sclerosis (MS). Automatic segmentation methods identify the retinal layers of the macular cube, providing consistent results free of intra- and inter-rater variation in a fraction of the time required for manual segmentation. In this paper, we propose a fast multi-layer macular OCT segmentation method based on a fast level set method. Our framework uses contours in an approach optimized specifically for OCT layer segmentation over the whole macular cube. Our algorithm takes boundary probability maps from a trained random forest and iteratively refines the prediction to subvoxel precision. Evaluation on both healthy and multiple sclerosis subjects shows that our method is statistically better than a state-of-the-art graph-based method.


Introduction
Optical coherence tomography (OCT) is a three-dimensional (3D) imaging technique that emits a beam of light into the tissues to be examined and detects the reflected or back-scattered light from the tissues. This reflected light interferes with light that originated from the same source; the reflectivity profile along the light beam can be derived from the interference signal and used to generate an A-scan. Sequences of A-scans are used to form two-dimensional images (B-scans) and three-dimensional volumes, and the reflectivity profile of all A-scans at a fixed depth forms a C-scan. OCT acquires images with high resolution (< 5 µm) [1], from which the individual retinal layers can be demarcated, qualitatively assessed, and quantified [2].
Ophthalmological assessment of retinal disease and neurological disorders often includes a segmentation of retinal cellular layers from OCT images [3][4][5]. Eight retinal layers, delimited by nine boundaries, are typically segmented; although the boundary between the inner and outer photoreceptor segments (IS-OS) has more recently been referred to as the ellipsoid zone, we have continued to use the term IS-OS.
Quantification of layer thickness is often used in the study of various ophthalmological and neurological diseases. For example, subjects with multiple sclerosis (MS) of all subtypes exhibit significant thinning of the inner retinal layers [3]. OCT is also used to study myopia [7], diabetes [5,[8][9][10], Alzheimer's disease (AD) [11], and glaucoma [4,12,13]. Since manual segmentation of each layer is time consuming and prone to inter-rater variability, fast, automatic algorithms for retinal layer segmentation are preferred.
Various approaches have been developed for automated layer segmentation of OCT images, including methods based on shortest paths [17], active contours [18][19][20], statistical models [21], and level sets [22][23][24][25]. The graph method introduced by Li et al. [36] was originally developed for blood vessel segmentation before being repurposed by Garvin et al. [26] for OCT data. The underlying graph framework, with various improvements and extensions [14][15][16][28][29][30][31], has become the predominant method in use today for OCT segmentation. Recently, much effort has focused on exploring deep networks, which have proven very effective in other medical imaging tasks, for OCT image segmentation. Fang et al. [33] developed a combined approach using both a deep neural network and a graph cut search, while He et al. [34] proposed a cascaded fully convolutional neural network to guarantee topological ordering of the layers. This paper introduces a novel framework for retinal layer segmentation that builds on the multi-layer fast level set method (MLS) [37]. Our method takes probability maps for each boundary from a trained random forest as input. An external force field and an initial boundary location are calculated from the input probability maps, and the initialization is further refined through iterative updates based on the external and internal forces.
Five major extensions to MLS are employed: 1) a subvoxel location is calculated during surface evolution, which allows for better estimation of the smoothness force; 2) a simple Gaussian filtering process is used to calculate the smoothness force without the need to calculate curvature, which significantly reduces computational time; 3) the initial surface is generated using a shortest path method; 4) surface evolution is performed in a BM flattened space and the benefit of using a flat space is incorporated into an extra term in the internal force; and 5) vector field convolution [38] is used instead of gradient vector flow [39] for better performance. Our proposed layer boundary evolution (LBE) algorithm can be divided into two parts. The first part, in which nine boundary probability maps are calculated from a retinal OCT image, is largely adopted from [28]. Three steps are involved: preprocessing an OCT image, feature extraction, and boundary classification using a trained random forest. The second part uses the boundary probability maps from the trained random forest for generating two inputs to the layer boundary evolution: a force field and an initial prediction for each boundary. The final result is the output of layer boundary evolution, which iteratively refines the input prediction to a subvoxel accuracy (see Fig. 2). Since the first part is well documented in [28] and its source code is publicly available (http://www.nitrc.org/projects/aura_tools), we briefly introduce each step in the first part and then focus most of our discussion on the second part.

Preprocessing
The preprocessing steps transform an OCT volume into a common space so that the intensity and spatial coordinates of an input volume are consistent with the training data. An example of a preprocessed B-scan is shown in Fig. 3.

Retina boundary flattening
Flattening is a common preprocessing step performed by many retina segmentation algorithms [17,26]. We first estimate the top and bottom boundaries of the retina, specifically the ILM and the BM. A B-scan is filtered by a Gaussian smoothing kernel with σ = 3 pixels, and the derivative along each A-scan within the B-scan is computed. The locations of the ILM and BM in a particular A-scan are found as the locations with the largest positive and negative derivatives, respectively. Error points are found by comparing each estimated boundary with an independently median-filtered version of itself. Points that deviate more than 15 voxels from the median-filtered surface are defined as outliers and are replaced by interpolated points. A BM-flattened volume is obtained by vertically shifting each A-scan within its B-scan based on the estimated BM surface. Features related to the spatial coordinates of pixels contain more information in a retina-boundary-flattened volume. Our classifier described in Section 2.2 is trained using features calculated from this flattened volume and its corresponding manual delineation.
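As a rough sketch, the boundary estimation and flattening steps above might look like the following single-B-scan version in Python with NumPy/SciPy. Function and parameter names are illustrative, and np.roll is used for the vertical shift, which wraps around rather than zero-padding as a production implementation would.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

def estimate_and_flatten(bscan, sigma=3.0, outlier_thresh=15, med_size=15):
    """Estimate ILM/BM on one B-scan (rows = depth, cols = A-scans)
    and flatten it to the BM. Illustrative sketch of the steps above."""
    smoothed = gaussian_filter(bscan.astype(float), sigma)
    deriv = np.gradient(smoothed, axis=0)   # derivative along each A-scan
    ilm = np.argmax(deriv, axis=0)          # largest positive derivative
    bm = np.argmin(deriv, axis=0)           # largest negative derivative

    def fix_outliers(boundary):
        # Compare against an independently median-filtered copy and
        # replace points deviating by more than the threshold.
        med = median_filter(boundary, size=med_size)
        bad = np.abs(boundary - med) > outlier_thresh
        good = ~bad
        idx = np.arange(boundary.size)
        return np.where(bad, np.interp(idx, idx[good], boundary[good]), boundary)

    ilm = fix_outliers(ilm.astype(float))
    bm = fix_outliers(bm.astype(float))

    # Flatten: shift each A-scan so the BM lies on a common row.
    flat = np.empty_like(bscan)
    target = int(np.round(bm.mean()))
    for j in range(bscan.shape[1]):
        shift = target - int(np.round(bm[j]))
        flat[:, j] = np.roll(bscan[:, j], shift)
    return flat, ilm, bm
```

The derivative-extrema rule works because the vitreous above the ILM and the choroid below the BM are both darker than the retina between them.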

Intensity normalization
The automatic intensity rescaling and automatic real-time averaging performed by the OCT scanner may cause a large intensity variation for a particular tissue type within one subject. These differences in dynamic range are undesirable, especially during feature classification. An intensity-consistent B-scan is achieved by computing a robust maximum I_m and linearly rescaling intensities from the range [0, I_m] to the range [0, 1], with values greater than I_m being clamped at 1. The robust maximum I_m is found by first median filtering each individual A-scan within the same B-scan using a kernel size of 15 pixels. Then, I_m is set to the value that is 5% larger than the maximum intensity of the entire median-filtered image.
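A minimal sketch of this normalization, assuming rows index depth along the A-scans:

```python
import numpy as np
from scipy.ndimage import median_filter

def normalize_bscan(bscan, med_size=15, margin=0.05):
    """Rescale a B-scan to [0, 1] using a robust maximum: median-filter
    each A-scan (15-pixel kernel along depth), take the maximum of the
    filtered image, inflate it by 5%, and clamp values above it to 1."""
    med = median_filter(bscan.astype(float), size=(med_size, 1))
    i_m = (1.0 + margin) * med.max()
    return np.clip(bscan / i_m, 0.0, 1.0)
```

The median filtering makes the maximum robust to isolated bright speckle, so a few noisy voxels cannot compress the dynamic range of the whole image.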

Random forest classification
The boundary locations are estimated using a trained random forest classifier [28]. In total 27 features are calculated for every voxel in the preprocessed OCT volume. Those features are passed to decision trees in the forest, which provides information about whether the voxel belongs to a specific boundary or not. By combining the predictions from all decision trees, the probability of each voxel belonging to each boundary is computed. In short, the trained random forest takes features and outputs nine probability maps, each associated with one boundary (see Fig. 4). A probability map has the same size as the input OCT volume. To train the random forest, manual delineations are also preprocessed to have the same volume size and flattened BM.

Surface representation
The fast level set method introduced by Shi et al. [40] uses a list instead of a level set function to represent a boundary or the zero level set. Their idea adapts naturally to boundaries in retinal OCT images. Specifically, the anatomical structure of human retina determines that retinal layers cannot fold on themselves. In other words, an A-scan can only intersect a specific layer boundary once in an OCT image. Therefore, for a given OCT image of size l × m × n, where l is the number of C-scans, m is the number of A-scans within a B-scan, and n is the number of B-scans within the 3D volume, a 2D array of size m × n is sufficient to represent a specific boundary. The j th row, k th column of the 2D array corresponds to the j th A-scan within the k th B-scan, and its value indicates the intersection location of that boundary in that A-scan. In other words, this 2D array represents the boundary height as shown in Fig. 5. We note that a layer boundary in a B-scan of size l × m is represented by a 1D array of size 1 × m.

Force field
The external force F_ext used to drive the surface to the layer boundaries is computed from the probability map described in Section 2.2. A probability map can be thought of as an edge map of the corresponding boundary. We generate the external force field using vector field convolution (VFC) [38], which calculates the convolution of the probability map and a vector field kernel. All vectors in the vector field kernel point to the kernel origin; thus our external force always points to locations with the highest boundary probability, and it can be used to guide our model to the boundary locations learned by the random forest. An example of the vector field kernel with its generated force field in a zoomed B-scan is shown in Fig. 6. During layer boundary evolution, we evolve surfaces vertically in a B-scan (see Section 2.6), which means only forces along an A-scan take effect; this is why only the vertical forces are shown. The vector field kernel, however, is two-dimensional so that information from adjacent A-scans is incorporated. Compared with other methods, VFC produces a force field that is robust to noise, which is important for processing fuzzy edge maps like the probability maps generated by a random forest. A comparison between VFC and gradient vector flow [39], which our previous approach used, found a threefold improvement in speed [38].
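The VFC field can be sketched as follows, using the 1/r^γ magnitude profile from Li and Acton's formulation; the kernel radius and γ shown here are illustrative choices, not the paper's tuned values.

```python
import numpy as np
from scipy.signal import fftconvolve

def vfc_force(prob_map, radius=32, gamma=1.7):
    """Vector field convolution: convolve a boundary probability map with
    a kernel whose vectors point toward the kernel origin, with magnitude
    decaying as 1/r**gamma. Returns the vertical (A-scan direction) and
    lateral force components on the pixel grid."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    r = np.hypot(x, y)
    r[radius, radius] = 1.0                # avoid division by zero at origin
    mag = 1.0 / r**gamma
    ky = -y / r * mag                      # unit vectors toward the origin,
    kx = -x / r * mag                      # scaled by the decaying magnitude
    ky[radius, radius] = kx[radius, radius] = 0.0
    fy = fftconvolve(prob_map, ky, mode="same")  # vertical component
    fx = fftconvolve(prob_map, kx, mode="same")  # lateral component
    return fy, fx
```

For an isolated bright response, the resulting field points toward that response from every direction within the kernel radius, which is what pulls a nearby surface onto the boundary.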
Internal forces are used to enforce smoothness of the output surface. The smoothness constraints for layer boundaries in retinal OCT are different from other boundaries. Specifically, boundaries have very different curvature near the fovea versus far from the fovea. Because of this, a universal smoothness constraint over the entire boundary may produce overly-smooth surfaces near the fovea and bumpy surface elsewhere. Using a flat space [25,41,42] can solve this problem by encoding the boundary shape prior information within the flat space. A flat surface in the flat space corresponds to a curvy surface near the fovea and smooth surface away from the fovea. Therefore enforcing a universal smoothness constraint in the flat space allows different smoothness characteristics in different regions. However, a smooth surface in flat space does not necessarily mean a smooth surface in native space, because the conversion from flat space to native space involves interpolation [42]. Instead of converting to flat space, we propose a new internal force F int = F s + F p that accounts for the smoothness of the surface and uses a shape prior for the retinal boundaries.
Internal forces for smoothness regularization, F_s, are commonly computed using the curvature of the surface, which is computationally expensive to evaluate. Alternatively, we introduce a Gaussian filtering process that efficiently calculates F_s. The use of Gaussian smoothing to replace the curvature was first proposed in [40]; it is motivated by the observation that when the level set function φ is chosen as a signed distance function, the curvature takes a simpler form and equals the Laplacian of φ, i.e., κ = div(∇φ/|∇φ|) = Δφ. From scale-space theory [43], we know that evolving a function according to its Laplacian is equivalent to Gaussian filtering the function. We first filter the predicted surface B, where B is the 2D array of numbers representing the retinal surface as depicted in Fig. 5, with a Gaussian kernel of standard deviation s to produce a smoothed surface B_s. F_s is then computed as
F_s(i) = c_s (B_s(i) − B(i)) î,      (1)
where i is the A-scan index, c_s is a parameter controlling the strength of the force, and î is a unit vector along the direction of the A-scan. F_s always points toward the smoothed surface; it regularizes a bumpy surface into a smooth one. The second term, F_p, is computed from the difference between the smoothed initialization, B_p, and the predicted surface B as
F_p(i) = c_p (B_p(i) − B(i)) î,      (2)
where c_p is a parameter controlling the strength of the force. Intuitively, F_p ensures that the evolution will not drive the surface too far from the initialization. Similar to the use of a flat space, we assume that the initial surface encodes the boundary shape prior, and F_p makes use of this prior without a flat-space conversion. While we impose a universal smoothness force F_s over the entire boundary, F_p allows the total internal force to act differently in different regions. Generally speaking, F_s near the fovea is large, but it will not oversmooth the boundary because F_p compensates for F_s and encourages a curved surface when F_s tries to flatten it out.
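Under these definitions, the internal force reduces to a few array operations. The sketch below assumes B and B_p are the boundary-height arrays and that c_s, c_p, and s are the per-boundary tuning parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def internal_force(B, B_p, c_s=1.0, c_p=1.0, s=2.0):
    """Internal force component along the A-scan direction: the smoothness
    term pulls each point toward a Gaussian-smoothed copy of the current
    surface, and the shape-prior term pulls it back toward the smoothed
    initialization B_p."""
    B_s = gaussian_filter(B, s)       # smoothed current surface
    F_s = c_s * (B_s - B)             # points toward the smoothed surface
    F_p = c_p * (B_p - B)             # points back toward the initialization
    return F_s + F_p
```

A surface that is already smooth and sits on its initialization feels no internal force, while an isolated bump is pulled back down by both terms.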
Both F_s and F_p were developed in a heuristic manner after the exploration of more traditional forces proved unsuccessful. The external force F_ext computed from VFC is normalized to the range [0, 1]. During the evolution described in Section 2.6, we compute the internal force F_int for a boundary based on the current predicted surface B. The total force F for evolution is computed as
F = F_ext + F_int = F_ext + F_s + F_p.

Initialization
With an external force field free of noise, generating an initialization would be trivial, as the forces point perfectly towards the true boundary; for example, we could simply initialize nine flat planes as the predicted surfaces. However, a real force field is usually noisy, and pixels away from the boundary may not have forces that point in the correct direction. If we initialize a surface far from the true boundary, it may never evolve to the true boundary, because false boundary responses from noise in the neighborhood may overwhelm the force from the true boundary. Therefore, we want to initialize the surface close enough to the true boundary to guarantee that our initialization is within the capture range of the force fields. We use Dijkstra's shortest path (DSP) algorithm [44] to find the shortest path through the B-scan probabilities.
To use DSP, we construct a graph within each B-scan for each boundary using its probability P(i, j) derived from the random forest. The vertex at the i-th location in the j-th A-scan only connects to vertices in the (j + 1)-th A-scan that are between i − 1 and i + 1. This ensures that the shortest path we find is smooth and, more importantly, that it crosses each A-scan only once. The edge cost C(i, j) of reaching the vertex at location i in the j-th A-scan is defined from the boundary probability as C(i, j) = 1 − P(i, j), so that paths through high-probability voxels are cheap. The total distance is the sum of all edge costs from the previous vertices. We initialize all vertices in the first A-scan as visited, with the total distance equal to the edge cost described above. We then iteratively assess the vertex with the minimal total distance among all visited vertices. The algorithm stops when we mark a vertex in the last A-scan as visited. From that vertex, we trace back to the first A-scan, and the coordinates of all vertices along the path are used as the initial surface location. For an input B-scan probability map, we thus find a smooth path that goes across all A-scans. Since the shortest path method provides no topology guarantee, its predicted surfaces may cross over each other, a problem most likely to occur near the fovea. To address this, after the nine boundary predictions are individually found in each B-scan, we fix ordering errors as we combine them. We first combine the ILM and RNFL-GCL boundaries; if in some A-scans the boundary height of the RNFL-GCL is close to or even exceeds that of the ILM, we move the RNFL-GCL boundary in those A-scans down so that the RNFL layer has a predefined minimal thickness. This process is iterated to combine all nine boundaries, guaranteeing that the initial boundaries have the correct layer ordering along every A-scan. Previously, we used a regression model to generate initial boundary locations from the retina thickness [42].
We find that the DSP is fast and robust and produces an initialization that is closer to the true boundary.
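A sketch of this initialization for one boundary in one B-scan follows. The vertex cost 1 − P is an assumption (any cost that decreases with boundary probability works); and because each vertex connects only to the next A-scan within ±1 pixel, the graph is a layered DAG, so a dynamic-programming shortest path is equivalent to Dijkstra's algorithm here and simpler to write down.

```python
import numpy as np

def init_boundary(P):
    """Shortest-path initialization for one boundary in one B-scan.
    P is a (depth x n_ascans) probability map. Returns one row index
    per A-scan, with consecutive indices differing by at most 1."""
    depth, n = P.shape
    cost = 1.0 - P                      # assumed cost: cheap where P is high
    dist = np.full((depth, n), np.inf)  # total distance to each vertex
    prev = np.zeros((depth, n), dtype=int)
    dist[:, 0] = cost[:, 0]
    for j in range(1, n):
        for i in range(depth):
            # predecessors in the previous A-scan, within +/- 1 pixel
            lo, hi = max(i - 1, 0), min(i + 1, depth - 1)
            k = lo + int(np.argmin(dist[lo:hi + 1, j - 1]))
            dist[i, j] = dist[k, j - 1] + cost[i, j]
            prev[i, j] = k
    # Trace back from the cheapest endpoint in the last A-scan.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmin(dist[:, -1]))
    for j in range(n - 1, 0, -1):
        path[j - 1] = prev[path[j], j]
    return path
```

The ±1 connectivity is what enforces both the smoothness of the initialization and the one-crossing-per-A-scan property.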

Layer boundary evolution
Layer boundary evolution (LBE) is an iterative process that refines the initial surfaces to the boundary locations indicated by the force fields described in Section 2.4. Our evolution scheme evolves the boundaries in the A-scan direction by adjusting values in the 2D arrays; adding to or subtracting from a 2D array moves the boundary towards the inner or outer retina, respectively. This approach provides two benefits: first, we can easily find grid points adjacent to the boundary without the need for a level set function; second, the layer ordering can be easily maintained during the evolution. Given the layered nature of the tissues within the retina, there is a topological requirement for OCT images: layers cannot fold over themselves, and the ordering of the eight layers between the vitreous and choroid must be maintained. Since only one location is recorded in each A-scan for each boundary using our surface representation, the first requirement is automatically satisfied. We postprocess the initialization to have the correct layer ordering, as described in Section 2.5. Evolving the boundary only in the A-scan direction helps keep the ordering of layers consistent with the underlying tissue ordering. Whenever a change of boundary height (a change of a value in the surface 2D array) would cause an adjacent layer to become thinner than the minimal thickness, we reset the boundary height so that the layer has exactly the predefined minimal thickness. As shown in Fig. 7, before the current iteration, Point C and Point D are 0.9 pixels apart. If Point C tries to move down by more than 0.8 pixels, its location is reset so that the distance between Point C and Point D is exactly 0.1 pixels, where 0.1 pixels is the predefined minimal layer thickness (for this example).
During each iteration, a decision is made for every value in a 2D array B: increase the value (move up toward the inner retina), decrease the value (move down toward the outer retina), or keep the surface position unchanged. For a boundary point B^n(j, k), this decision is made by examining the combination of external and internal forces at its upper adjacent grid point, F_u^n(j, k) = F_ext(ceil(B^n(j, k))) + F_int^n(ceil(B^n(j, k))), and at its lower adjacent grid point, F_l^n(j, k) = F_ext(floor(B^n(j, k))) + F_int^n(floor(B^n(j, k))).
The function ceil(·) rounds its input up to the nearest integer, while floor(·) rounds it down. The superscript n indicates the iteration index, and (j, k) is the A-scan index. While F_ext can be precomputed, F_int must be recalculated during each iteration. We use two arrows separated by a division line to represent the directions of the net forces F_u^n(j, k) and F_l^n(j, k); for example, the symbol ⇓ ⇑ indicates that F_u^n(j, k) points down while F_l^n(j, k) points up. The decision rule for refining a predicted surface point B^n(j, k) in the n-th iteration is as follows:
1. If the two forces are ⇓ ⇑ , the two net forces point at each other, which is the feature exhibited at a boundary, and a zero level set location is calculated as the predicted boundary location by linearly interpolating the net force between the lower and upper grid points:
B^{n+1}(j, k) = floor(B^n(j, k)) + |F_l^n(j, k)| / (|F_l^n(j, k)| + |F_u^n(j, k)|).
Even if this subvoxel location may change in future iterations because of changes in the internal force, it is essential to calculate it so that boundary points in adjacent A-scans can use this more accurate location to better estimate F_s in later iterations.
2.–3. If both forces point in the same direction (⇑ ⇑ or ⇓ ⇓), the true boundary does not lie between the current upper and lower grid points, and we evolve the boundary point to follow the force field.
4. If the two forces are ⇑ ⇓ , pointing away from each other, the combined force field cannot correctly infer the direction of the true boundary; this can result from noise in the probability map or a poor initial surface. In such cases, our model trusts the smoothness force F_s.
Because the upper grid point (ceil(B(j, k))) is always one voxel higher than the lower grid point (floor(B(j, k))), it follows from Equation 1 that the smoothness force at the upper grid point is always smaller than at the lower grid point. Thus, in subsequent iterations, F_u^n(j, k) and F_l^n(j, k) will not point away from each other again.
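The per-A-scan decision rule can be sketched as below for a single boundary in a B-scan. For simplicity, the force F is the precomputed sum of the external and internal components sampled on the pixel grid (the method recomputes the internal force each iteration), positive force points toward the inner retina (increasing boundary height), and the minimal-thickness reset against adjacent boundaries is omitted.

```python
import numpy as np

def evolve_step(B, F):
    """One LBE iteration for one boundary in a B-scan. B[j] is the subvoxel
    boundary height in A-scan j; F[i, j] is the net force at integer height
    i in A-scan j, with positive values pointing toward the inner retina."""
    B_new = B.copy()
    for j in range(B.size):
        lo = int(np.floor(B[j]))
        hi = lo + 1                    # upper adjacent grid point, one voxel up
        F_l, F_u = F[lo, j], F[hi, j]  # net forces at lower/upper grid points
        if F_u < 0 and F_l > 0:
            # case 1: forces point at each other -> zero crossing of the
            # force between the two grid points gives a subvoxel location
            B_new[j] = lo + F_l / (F_l - F_u)
        elif F_u > 0 and F_l > 0:
            B_new[j] = B[j] + 1        # case 2: both point up -> move up
        elif F_u < 0 and F_l < 0:
            B_new[j] = B[j] - 1        # case 3: both point down -> move down
        # case 4: forces point away from each other -> keep B[j]; the
        # smoothness force resolves this in later iterations
    return B_new
```

Iterating evolve_step until the surface stops moving (or a fixed iteration budget is reached) yields the refined subvoxel boundary.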

Data set
We test our LBE algorithm on macular OCT images with no observable intraretinal pathologies. We do this with data from the right eyes of 36 subjects obtained using a Spectralis OCT system (Heidelberg Engineering, Heidelberg, Germany). The research protocol was approved by the local Institutional Review Board, and written informed consent was obtained from all participants. Of the 36 subjects, 21 were diagnosed with MS while the remaining 15 were healthy controls. All scans were screened and found to be free of microcystic macular edema (a pathology sometimes found in MS subjects).
All scans were acquired using the Spectralis scanner's automatic real-time function in order to improve image quality by averaging at least 12 images of the same location (the ART setting was fixed at 12). The resulting scans had signal-to-noise ratios (SNRs) between 20 dB and 38 dB, with a mean SNR of 30.6 dB. Macular raster scans (20° × 20°) were acquired with 49 B-scans, each B-scan having 1024 A-scans with 496 voxels per A-scan. The B-scan resolutions varied slightly between subjects and averaged 5.8 µm laterally and 3.9 µm axially. The through-plane distance (slice separation) averaged 123.6 µm between images, resulting in an imaging area of approximately 6 × 6 mm². We note that our data is of a higher resolution and better SNR than many other recent publications. The nine layer boundaries were manually delineated on all B-scans for all subjects by a single rater using an internally developed protocol and software tool. The manual delineations were performed by clicking on approximately 20-50 points along each layer border followed by interpolation between the points using a cubic B-spline. Visual feedback was used to move points to ensure a correct boundary.

Parameter selection
The random forest classifier was trained using 4 healthy subjects and 6 MS subjects. The recommended parameters of the RF classifier described in [28] are used. Since our initialization is very close to the final results, the maximum number of iterations is set to 10. This allows an accurate estimation of the boundary at a small computational cost.
There are three parameters related to the internal force that need to be determined in our proposed method (LBE) for a single boundary: 1) c_s in Equation 1, which weights the smoothness force; 2) c_p in Equation 2, which weights the shape prior force; and 3) s, the standard deviation used when computing B_s in Equation 1. Different sets of parameters were found for each of the nine boundaries using a validation set of 2 healthy subjects and 3 MS subjects. We used the mean absolute boundary error as the metric when determining these parameters. The mean absolute boundary error is calculated as the absolute distance of each boundary from the manually delineated surface along each A-scan. These errors are first averaged across all A-scans in each subject and then averaged over all subjects for each boundary.
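The per-subject metric itself is simple; a sketch, assuming the 3.9 µm axial resolution reported for our data is used to convert voxel distances to micrometers:

```python
import numpy as np

def mean_absolute_boundary_error(pred, truth, axial_res_um=3.9):
    """Mean absolute boundary error for one subject and one boundary:
    the absolute surface distance per A-scan averaged over all A-scans.
    pred and truth are m x n boundary-height arrays (in voxels); the
    axial resolution converts voxel units to micrometers. Averaging the
    per-subject values over subjects gives the per-boundary error."""
    return np.mean(np.abs(pred - truth)) * axial_res_um
```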
We ran 1440 experiments on the validation set for each boundary with different parameters and chose our parameters by selecting the combination of c_s, c_p, and s that produced the lowest mean absolute boundary error. We limited the search region of c_s and c_p to [0, 2.75]; otherwise, the internal force might have overwhelmed the external force, yielding a result that is simply a smoothed version of the initial surface. Figure 8 shows the effect of parameter selection on the mean absolute error of four boundaries. Each point in the 3D volume in Fig. 8 represents the absolute error produced by a certain choice of the three parameters. For each boundary, mean absolute errors more than 5% above the lowest mean absolute error are shown as red circles. Meanwhile, a mean absolute error within 5% of the best result is linearly mapped to [0, 1], with the lowest mean absolute error indicated by 0 (small blue spheres) and the worst indicated by 1 (large yellow spheres). The best results are marked with a magenta triangle. After determining the optimal c_s, c_p, and s on the validation set, we used them on the test data set; the mean absolute boundary error for iterations between 1 and 50 is shown in Fig. 9, demonstrating that 10 iterations is sufficient to allow all boundaries to converge.

Results
We evaluated our method (LBE) on the remaining 21 subjects that were not used to train the random forest or for parameter selection. We compared with a previously reported graph cut method (AURA v2.11) [28] and its most recent version (AURA v3.4). AURA v2.11 has been independently shown to be one of the most accurate OCT retinal segmentation tools [45]. For the purpose of direct comparison, the recommended parameters [28] for the random forest are used for all three methods. An example of the manual delineations, as well as the result of the AURA v3.4 and LBE methods on the same subject are shown in Fig. 10.
Shown in Table 1 is the mean absolute boundary error of the three methods against the manual delineation. The mean absolute boundary error is calculated as the absolute distance of each boundary from the manually delineated surface along each A-scan. The standard deviations were calculated treating every error value as separate (i.e., errors were not averaged for each subject before calculation). The mean absolute boundary error is smaller for LBE than for AURA v3.4 in 8 of the 9 boundaries, the exception being the RNFL-GCIP boundary. A paired Wilcoxon signed rank test was used to test the significance of any improvement between AURA v3.4 and our method; 6 of the 9 boundaries reach significance (α level of 0.001). Two boundaries (OS-RPE and BM) lack statistical significance because of the large variance. AURA v3.4 has a lower mean absolute boundary error than LBE on one of the nine boundaries; however, this difference does not reach significance. We also computed the Dice coefficient [46] between each of the three methods and the manual delineation; the results are shown in Table 2. The Dice coefficient measures how much the segmentation results produced by an algorithm agree with the manual delineation; it has a range of [0, 1], with higher values reflecting more agreement. Since both the manual delineation and the segmentation results are boundary locations, we first converted the boundary locations to a layer mask by rounding each boundary location to the voxel level and then, in each A-scan, assigning the corresponding layer label from the current boundary voxel to the location of the boundary below. The resulting Dice coefficients for AURA v3.4 and LBE are very close, with five of the eight layers showing statistically significant improvements (α level of 0.001).
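The boundary-to-layer-mask conversion and Dice computation can be sketched as follows; the array shapes and label convention are illustrative.

```python
import numpy as np

def boundaries_to_labels(boundaries, depth):
    """Convert stacked boundary heights (n_boundaries x n_ascans, ordered
    top to bottom in row index) into per-A-scan layer labels, rounding to
    the voxel grid; label b marks the layer between boundary b and b+1."""
    labels = np.zeros((depth, boundaries.shape[1]), dtype=int)
    rounded = np.round(boundaries).astype(int)
    for b in range(boundaries.shape[0] - 1):
        for j in range(boundaries.shape[1]):
            labels[rounded[b, j]:rounded[b + 1, j], j] = b + 1
    return labels

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())
```

Comparing `labels == b` masks layer by layer between an algorithm and the manual delineation gives the per-layer Dice values reported in Table 2.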
The computational performance of AURA v3.4 and LBE was also compared. Both algorithms were implemented in MATLAB 2018a and run on a four-core 3.1 GHz Linux computer; run times are reported in Table 4. Typically, more than 80 seconds are spent producing the probability maps, which are common to both algorithms. The peak memory usage of our algorithm is 9.7 GB during the random forest classification, compared with 11.5 GB for AURA v3.4 during its graph cut search.

Conclusion
In this paper, we proposed a layer boundary evolution method, loosely derived from the fast level set method, to replace the commonly used graph-based approach. We build our pipeline upon the state-of-the-art graph-based method [28]: our method takes boundary probability maps from a trained random forest, and the results show excellent performance in terms of both mean absolute boundary error and layer Dice coefficient. In particular, five out of nine boundary estimates are statistically better than the state-of-the-art graph cut method when using the same random forest classifier. In addition, less computational cost is required, which makes our method an ideal alternative to the graph cut search for boundary refinement after feature classification. The mean absolute boundary error is less than 4 µm for all nine boundaries. Although we are unable to outperform deep-learning-based methods [47] in terms of speed (since the latter need only a few seconds to run on a GPU), our method provides topologically correct results at subvoxel resolution, which most deep-learning-based methods cannot achieve. In terms of the run times of other methods, we note that OCTRIMA3D [48] has a run time of 28 seconds whereas AURA v2.11 on the same hardware had a run time of 152 seconds [48]; however, AURA v2.11 performed better in terms of mean absolute boundary error. The data used to develop LBE is of higher quality than is typically acquired in a clinical OCT setting; however, it is similar to the data used to develop the state-of-the-art method [28]. Moreover, the random forest that generates the boundary classification probability maps is the same as in [28] and has previously been shown to be very robust across scanners and data sources [45,49], which gives us some confidence in the general utility of our method across scanning platforms.
Future work will focus on replacing the current random forest classifier with a deep learning model to produce boundary probability maps using data-driven features. The major challenge will be producing a diffused probability map from a deep network. Deep networks are often quite certain about the position of a boundary, outputting near-binary probability maps; that is, the true uncertainty of the deep network is not fully reflected in the network's output.

Funding
This work was supported by the NIH/NEI under grant R01-EY024655 (PI: Prince) and by the NIH/NINDS under grant R01-NS082347 (PI: Calabresi).

Disclosures
The authors declare that there are no conflicts of interest related to this article.