Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema

We present a fully automatic algorithm to identify fluid-filled regions and seven retinal layers on spectral domain optical coherence tomography images of eyes with diabetic macular edema (DME). To achieve this, we developed a kernel regression (KR)-based classification method to estimate fluid and retinal layer positions. We then used these classification estimates as a guide to more accurately segment the retinal layer boundaries using our previously described graph theory and dynamic programming (GTDP) framework. We validated our algorithm on 110 Bscans from ten patients with severe DME pathology, showing an overall mean Dice coefficient of 0.78 when comparing our KR + GTDP algorithm to an expert grader. This is comparable to the inter-observer Dice coefficient of 0.79. The entire data set is available online, including our automatic and manual segmentation results. To the best of our knowledge, this is the first validated, fully-automated, seven-layer and fluid segmentation method which has been applied to real-world images containing severe DME. ©2015 Optical Society of America OCIS codes: (100.0100) Image processing; (170.4500) Optical coherence tomography; (170.4470) Ophthalmology. References and links 1. J. A. Davidson, T. A. Ciulla, J. B. McGill, K. A. Kles, and P. W. Anderson, “How the diabetic eye loses vision,” Endocrine 32(1), 107–116 (2007). 2. J. W. Yau, S. L. Rogers, R. Kawasaki, E. L. Lamoureux, J. W. Kowalski, T. Bek, S. J. Chen, J. M. Dekker, A. Fletcher, J. Grauslund, S. Haffner, R. F. Hamman, M. K. Ikram, T. Kayama, B. E. Klein, R. Klein, S. Krishnaiah, K. Mayurasakorn, J. P. O’Hare, T. J. Orchard, M. Porta, M. Rema, M. S. Roy, T. Sharma, J. Shaw, H. Taylor, J. M. Tielsch, R. Varma, J. J. Wang, N. Wang, S. West, L. Xu, M. Yasuda, X. Zhang, P. Mitchell, and T. Y. Wong, Meta-Analysis for Eye Disease (META-EYE) Study Group, “Global prevalence and major risk factors of diabetic retinopathy,” Diabetes Care 35(3), 556–564 (2012). 3. V. Kumar, A. K. Abbas, N. Fausto, S. L. Robbins, and R. S. Cotran, Robbins and Cotran pathologic basis of disease, 7th ed. (Elsevier Saunders, Philadelphia, 2005), pp. xv, 1525 p. 4. J. Pe’er, R. Folberg, A. Itin, H. Gnessin, I. Hemo, and E. Keshet, “Upregulated expression of vascular endothelial growth factor in proliferative diabetic retinopathy,” Br. J. Ophthalmol. 80(3), 241–245 (1996). 5. M. J. Tolentino, D. S. McLeod, M. Taomoto, T. Otsuji, A. P. Adamis, and G. A. Lutty, “Pathologic features of vascular endothelial growth factor-induced retinopathy in the nonhuman primate,” Am. J. Ophthalmol. 133(3), 373–385 (2002). 6. A. Bringmann, T. Pannicke, J. Grosche, M. Francke, P. Wiedemann, S. N. Skatchkov, N. N. Osborne, and A. Reichenbach, “Müller cells in the healthy and diseased retina,” Prog. Retin. Eye Res. 25(4), 397–424 (2006). 7. A. Bringmann, A. Reichenbach, and P. Wiedemann, “Pathomechanisms of cystoid macular edema,” Ophthalmic Res. 36(5), 241–249 (2004). 8. A. Negi and M. F. Marmor, “The resorption of subretinal fluid after diffuse damage to the retinal pigment epithelium,” Invest. Ophthalmol. Vis. Sci. 24(11), 1475–1479 (1983). 9. M. F. Marmor, “Mechanisms of fluid accumulation in retinal edema,” Doc. Ophthalmol. 97(3/4), 239–249 (1999). #231093 $15.00 USD Received 19 Dec 2014; revised 25 Feb 2015; accepted 27 Feb 2015; published 9 Mar 2015 (C) 2015 OSA 1 Apr 2015 | Vol. 6, No. 4 | DOI:10.1364/BOE.6.001172 | BIOMEDICAL OPTICS EXPRESS 1172 10. Early Treatment Diabetic Retinopathy Study Research Group, “Treatment techniques and clinical guidelines for photocoagulation of diabetic macular edema. Early Treatment Diabetic Retinopathy Study Report Number 2,” Ophthalmology 94(7), 761–774 (1987). 11. E. T. Cunningham, Jr., A. P. Adamis, M. Altaweel, L. P. Aiello, N. M. Bressler, D. J. D’Amico, M. Goldbaum, D. R. Guyer, B. Katz, M. Patel, and S. D. Schwartz, Macugen Diabetic Retinopathy Study Group, “A phase II randomized double-masked trial of pegaptanib, an anti-vascular endothelial growth factor aptamer, for diabetic macular edema,” Ophthalmology 112(10), 1747–1757 (2005). 12. M. J. Elman, L. P. Aiello, R. W. Beck, N. M. Bressler, S. B. Bressler, A. R. Edwards, F. L. Ferris 3rd, S. M. Friedman, A. R. Glassman, K. M. Miller, I. U. Scott, C. R. Stockdale, and J. K. Sun, Diabetic Retinopathy Clinical Research Network, “Randomized trial evaluating ranibizumab plus prompt or deferred laser or triamcinolone plus prompt laser for diabetic macular edema,” Ophthalmology 117(6), 1064–1077 (2010). 13. P. Mitchell, F. Bandello, U. Schmidt-Erfurth, G. E. Lang, P. Massin, R. O. Schlingemann, F. Sutter, C. Simader, G. Burian, O. Gerstner, and A. Weichselberger, RESTORE study group, “The RESTORE study: ranibizumab monotherapy or combined with laser versus laser monotherapy for diabetic macular edema,” Ophthalmology 118(4), 615–625 (2011). 14. M. J. Elman, N. M. Bressler, H. Qin, R. W. Beck, F. L. Ferris 3rd, S. M. Friedman, A. R. Glassman, I. U. Scott, C. R. Stockdale, and J. K. Sun, Diabetic Retinopathy Clinical Research Network, “Expanded 2-year follow-up of ranibizumab plus prompt or deferred laser or triamcinolone plus prompt laser for diabetic macular edema,” Ophthalmology 118(4), 609–614 (2011). 15. D. V. Do, U. Schmidt-Erfurth, V. H. Gonzalez, C. M. Gordon, M. Tolentino, A. J. Berliner, R. Vitti, R. Rückert, R. Sandbrink, D. Stein, K. Yang, K. Beckmann, and J. S. Heier, “The DA VINCI Study: phase 2 primary results of VEGF Trap-Eye in patients with diabetic macular edema,” Ophthalmology 118(9), 1819–1826 (2011). 16. M. C. Gillies, J. M. Simpson, C. Gaston, G. Hunt, H. Ali, M. Zhu, and F. Sutter, “Five-year results of a randomized trial with open-label extension of triamcinolone acetonide for refractory diabetic macular edema,” Ophthalmology 116(11), 2182–2187 (2009). 17. A. I. Kotsolis, E. Tsianta, M. Niskopoulou, P. Masaoutis, S. Baltatzis, and I. D. Ladas, “Ranibizumab for diabetic macular edema difficult to treat with focal/grid laser,” Graefe's archive for clinical and experimental ophthalmology = Albrecht Von Graefes Arch. Klin. Exp. Ophthalmol. 248(11), 1553–1557 (2010). 18. Q. D. Nguyen, S. M. Shah, A. A. Khwaja, R. Channa, E. Hatef, D. V. Do, D. Boyer, J. S. Heier, P. Abraham, A. B. Thach, E. S. Lit, B. S. Foster, E. Kruger, P. Dugel, T. Chang, A. Das, T. A. Ciulla, J. S. Pollack, J. I. Lim, D. Eliott, and P. A. Campochiaro, READ-2 Study Group, “Two-year outcomes of the ranibizumab for edema of the mAcula in diabetes (READ-2) study,” Ophthalmology 117(11), 2146–2151 (2010). 19. P. Massin, F. Bandello, J. G. Garweg, L. L. Hansen, S. P. Harding, M. Larsen, P. Mitchell, D. Sharp, U. E. WolfSchnurrbusch, M. Gekkieva, A. Weichselberger, and S. Wolf, “Safety and efficacy of ranibizumab in diabetic macular edema (RESOLVE Study): a 12-month, randomized, controlled, double-masked, multicenter phase II study,” Diabetes Care 33(11), 2399–2405 (2010). 20. D. J. Browning, M. M. Altaweel, N. M. Bressler, S. B. Bressler, I. U. Scott, and Diabetic Retinopathy Clinical Research Network, “Diabetic macular edema: what is focal and what is diffuse?” Am. J. Ophthalmol. 146(5), 649–655, e1–e6 (2008). 21. A. Hunter, E. K. Chin, and D. G. Telander, “Macular edema in the era of spectral-domain optical coherence tomography,” Clin. Ophthalmol. 7, 2085–2089 (2013). 22. Y. M. Helmy and H. R. Atta Allah, “Optical coherence tomography classification of diabetic cystoid macular edema,” Clin. Ophthalmol. 7, 1731–1737 (2013). 23. B. L. Sikorski, G. Malukiewicz, J. Stafiej, H. Lesiewska-Junk, and D. Raczynska, “The diagnostic function of OCT in diabetic maculopathy,” Mediators Inflamm. 2013, 434560 (2013). 24. S. Sonoda, T. Sakamoto, M. Shirasawa, T. Yamashita, H. Otsuka, and H. Terasaki, “Correlation between reflectivity of subretinal fluid in OCT images and concentration of intravitreal VEGF in eyes with diabetic macular edema,” Invest. Ophthalmol. Vis. Sci. 54(8), 5367–5374 (2013). 25. D. A. Sim, P. A. Keane, S. Fung, M. Karampelas, S. R. Sadda, M. Fruttiger, P. J. Patel, A. Tufail, and C. A. Egan, “Quantitative analysis of diabetic macular ischemia using optical coherence tomography,” Invest. Ophthalmol. Vis. Sci. 55(1), 417–423 (2014). 26. S. Ghosh, P. Bansal, H. Shejao, R. Hegde, D. Roy, and S. Biswas, “Correlation of morphological pattern of optical coherence tomography in diabetic macular edema with systemic risk factors in middle aged males,” Int. Ophthalmol. 35, 3–10 (2014). 27. B. J. Antony, M. D. Abràmoff, M. M. Harper, W. Jeong, E. H. Sohn, Y. H. Kwon, R. Kardon, and M. K. Garvin, “A combined machine-learning and graph-based framework for the segmentation of retinal surfaces in SD-OCT volumes,” Biomed. Opt. Express 4(12), 2712–2728 (2013). 28. S. J. Chiu, X. T. Li, P. Nicholas, C. A. Toth, J. A. Izatt, and S. Farsiu, “Automatic segmentation of seven retinal layers in SDOCT images congruent with expert manual segmentation,” Opt. Express 18(18), 19413–19428 (2010). 29. D. C. DeBuc, G. M. Somfai, S. Ranganathan, E. Tátrai, M. Ferencz, and C. A. Puliafito, “Reliability and reproducibility of macular segmentation using a custom-built optical coherence tomography retinal image analysis software,” J. Biomed. Opt. 14(6), 064023 (2009). 30. H. Ishikawa, D. M. Stein, G. Wollstein, S. Beaton, J. G. Fujimoto, and J. S. Schuman, “Macular Segmentation with Optical Coherence Tomography,” Invest. Ophthalmol. Vis. Sci. 46(6), 2012–2017 (2005). 31. K. A. Vermeer, J. van der Schoot, H. G. Lemij, and J. F. de Boer, “Automated segmentation by pixel classification of retinal layers in ophthalmic OCT images,” Biomed. Opt. Express 2(6), 1743–1756 (2011). #231093 $15.00 USD Received 19 Dec 2014; revised 25 Feb 2015; accepted 27 Feb 2015; published 9 Mar 2015 (C) 2015 OSA 1 Apr 2015 | Vol. 6, No. 4 | DOI:10.1364/BOE.6.001172 | BIOMEDICAL OPTICS EXPRESS 1173 32. M. Shahidi, Z. Wang, and R. Zelkha, “Quantitative thickness measurement of retinal layers imaged by optical coherence tomography,” Am. J. Ophthalmol. 


Introduction
Diabetic retinopathy is the leading cause of blindness among working-aged adults in the United States [1].Among those affected, approximately 21 million people develop diabetic macular edema (DME) [2].DME results from the breakdown of the blood-retinal barrier as retinal vascular endothelial and mural cells become damaged from chronic hyperglycemia [3].Retinal hypoxia results in increased production of vascular endothelial growth factors (VEGF) and other signaling cascades [3][4][5].These further progress DME through mechanisms such as cytotoxic damage to retinal fluid transport cells [6][7][8][9].Ultimately, the imbalance between vascular leakage and fluid transport leads to retinal edema and vision loss.
Furthermore, while averaging-based denoising methods enhance image quality and improve the performance of automated image segmentation algorithms [52], they also result in drastic increases in image acquisition times, which may not be tolerated by ill or uncooperative patients, especially when capturing dense volumetric scans.It is therefore desirable to develop automatic OCT segmentation algorithms which can be applied to lower quality images captured using fast-image scanning protocols, as these are often required in real-world, clinical settings.
Finally, while a few state-of-the-art algorithms provide combined fluid and layer segmentation methods [47,48,50], to the best of our knowledge, no fully-automated algorithm has been validated to identify all retinal layers and fluid-filled regions.Furthermore, there is limited rigorous analysis of layer-specific changes in retinal thickness following therapy.As such, the selection of personalized therapeutic strategies for patients remains subjective.As a step towards this goal, we present a fully automated algorithm based on kernel regression (KR) [53] and our previously described graph theory and dynamic programming (GTDP) framework [46,54,55] to identify fluid-filled regions and seven retinal layers on spectral domain (SD)-OCT images of eyes with DME.In the first phase of developing our algorithm, we created KR-based classifiers to estimate fluid and retinal layer positions.We then used these classification estimates in the second phase to guide GTDP segmentation for a more accurate result.
In Section 2, we provide an introduction to KR, and in Section 3 we present our new general purpose KR-based classification method.We then combine our KR-based classification method with GTDP segmentation in Section 4 and describe our implementation on OCT images of eyes with DME in Sections 5 and 6.Finally, we validate the performance of our algorithm in Sections 7 and 8.

Kernel regression review
KR is a non-parametric method for deriving local estimates of a function using a kernel that weighs the relative importance of nearby points [53].While traditional KR-based applications include image denoising and interpolation [53], deblurring [56], and object detection [57], we use KR to improve image classification and segmentation.In this section, we provide an overview on the second order iterative Gaussian steering KR method for image denoising as introduced and described in detail in [53].

Problem formulation
Looking at a small patch of P pixels on a noisy image , the intensity p y of a pixel can be described by ( ) , 1, , , where We next solve for the unknowns 0 , β 1 , β and 2 β in Eq. ( 2) using the weighted linear least squares formulation [53,58] following where , , , , and "diag" defines a diagonal matrix and ˆ( ) ⋅ K is a linearly normalized kernel function used to weigh the P observations such that 1 ˆ( ) 1 3) is a linear least squares formulation, the solution can be given by Eq. ( 4), where the first element in b is the denoised image ˆ.

Adaptive iterative Gaussian steering kernel
In order to perform KR as described in Section 2.1, an appropriate kernel must be selected.In this section, we describe the popular adaptive iterative Gaussian steering kernel ( ) iGS K [53].
To summarize, this kernel can be described as a Gaussian function that is elongated ( ) σ , rotated ( ) θ , and scaled ( ) γ based on the local data to reduce edge blurring.
To compute the Gaussian steering kernel for a pixel z, we first smooth the image using a linear Gaussian filter with standard deviations of i h and j h along the i and j dimensions, and then we compute the image gradients i G and .
Second, we compute the steering parameters , σ , θ and γ as described in Eq. ( 6), where ' λ and '' λ are parameters that prevent undefined or zero values for σ and .γ We then denoise z following Eq.( 4) using a linearly normalized version of the kernel defined Eq. ( 7), where h is the global smoothing parameter.
An example image and its Gaussian steering KR denoised result are shown in Fig. 1(a) and 1(f), respectively.Figure 1(c) and 1(e) are the iterative Gaussian steering kernels for Fig. 1(b) and 1(d), respectively.The kernel in Fig. 1(c) is wider and more isotropic since its corresponding image patch does not contain strong edges, while the kernel in Fig. 1(e) is narrower and oriented along the image edge.In Fig. 1(f), the regions external to the retina (black) were not denoised with the exception of a padded boundary surrounding the retina.The thickness of this boundary is half the kernel size to encapsulate all pixels required for feature vector computation in Sections 3.

Kernel regression-based classification
In this section, we integrate KR with supervised classification [61,62] to classify noisy images.The following subsections describe our novel, general purpose KR-based classification method.Specific application of this method for segmenting retinal layers is discussed in Sections 5 and 6.

Steering kernel regression estimation
Since pixels within an image are individually classified, the presence of noise worsens classification results.Thus, the first step is to use the KR method described in Eq. ( 4) to denoise the image and obtain: 1) direct estimates for the noise-free first and second derivatives ( 1 β and 2 β ), and 2) adaptive directional kernels ( ) K for each pixel.

Computing features
To classify a pixel z , the pixel needs associated descriptors called features.Each feature v is a scalar value used to describe the pixel, and L features comprise a feature vector , , , .
Basic features include pixel intensity, gradient, and location.Other feature types include textural features such as Laws' Texture Energy Measures [63,64].
To mitigate the effects of noise on feature computation, we perform weighted averaging.Since the steering kernels in Section 2.2 adapt to the shape of the underlying structure, we use the kernels from Section 3.1 to filter the features following where ( ) v z is the th l feature for pixel p z and ˆ( ) v z is the th l (now denoised) feature for pixel z.We also use the kernels themselves as features (e.g.kernel height, width, and area) to reveal more indistinct information.Specific feature examples are defined in Section 5.8.

Defining the true (training) classes
In supervised classification, a set of classes (categories) is pre-defined for a given classifier [62].Given a training data set, we manually identify the true class c for each training pixel z such that c .
∈ c Specific examples of these classes are given in Section 5.2.

Defining the classifier function
For an image I, we define a classifier ( ) ϕ ⋅ which estimates the class ĉ ∈ c for a pixel z following ˆ( ).
Provided that the training data set is composed of R total training pixels, we first combine the feature vectors for all R pixels into the feature vector set , , , , where r v is the feature vector for the th r training pixel.For each class k c , we then create a feature vector subset training pixels with k c as the true class.From this definition, it can be seen that We next define the classifier function used to estimate the class.To take into consideration the typical spread of feature values, we use a weighted negative Gaussian classifier function following Eq.(10), where is the mean feature vector across the training feature vector subset , is a set of constant weights used to alter the relative importance of features, and  is the Hadamard product, an element-wise multiplication operation for vectors and matrices.Selection of these weights is described in Section 3.5.

Weighted sequential forward feature selection
Since not all features in v may be useful for predicting the true class c , we reduce the dimension of v by defining a subset ' ⊆ v v of length ' L L ≤ containing the most relevant features [65][66][67].We achieve this by performing weighted sequential forward feature selection (wSFFS), a variation of the classic SFFS method [66] that simultaneously selects features and their corresponding weights.In the classic SFFS method, ' v begins as a null set and features are added to it sequentially [66] by minimizing a predefined criterion function ( ) E ⋅ (e.g.misclassification rate).
A limitation of the SFFS method is that it weighs each feature equally.As a result, we use the wSFFS method to select features along with their optimal weight.Given a set of A possible weight values , , , , , , , .
We then perform SFFS on A V to find the most relevant weighted feature.After sequentially adding weighted features, the result is a subset of features ' v and corresponding weights [ ] where the th l weight l .
ω ∈ a We subsequently use this new set of reduced features and weights in Eq. ( 10) to classify the pixel in place of the full, non-weighted feature vector .v Finally, to reduce classifier overfitting, we use K-fold cross-validation [68] during feature selection.We also increase the classification accuracy by using K-means clustering [69] to divide V into K clusters.We then perform wSFFS on each of the clusters to generate K sets of selected feature vectors and weights.When classifying a pixel, the set that minimizes the criterion function is used to determine the final class.

Classification-based GTDP segmentation
To improve the KR-based classification method from Section 3, we apply our GTDP segmentation framework introduced in [54] to more accurately isolate class boundaries.This section provides an overview of our GTDP method and describes how the classification results can be used to both improve and simplify GTDP segmentation.

GTDP layer segmentation review
We represent an image as a graph, where each pixel is a node on the graph and adjacent nodes are connected by edges.To create a path preference, we assign positive weights to the edges based on features such as image intensity or gradient, where ultimately the lowest weighted path is preferred.For example, to segment the darkest path across an image, we assign weights based on image intensity, where the weight ab w of the edge connecting nodes a and b following min , where n I is the intensity of I at node n and min w is a positive number specifying the minimum weight of an edge since a weight equal to zero indicates two unconnected nodes.
We also specify that ab ba w w = for all node pairs, as this generates an undirected graph with half as many weight values.
To segment a horizontal layered structure, we add an additional node to either side of the graph to automatically initialize the start and end nodes of our path.The additional node on the left is connected to all nodes in the leftmost column of the graph, and the right additional node has edges connecting it to all nodes in the rightmost column.These additional edges are assigned a weight of min . w We then apply Dijkstra's algorithm [70] to find the shortest path from the start node to the end node, which corresponds to our desired segmentation.For more details referring to our GTDP segmentation algorithm, refer to [54,55].

Classification-based search region limitation
For scenarios in which multiple similar features are to be segmented within a given image (e.g.retinal layer boundaries), the graph may be isolated to a smaller region containing only one of the features.This may be done using a priori knowledge of the feature to segment.For example, to segment the inner plexiform layer (IPL) / inner nuclear layer (INL) boundary, we can set all edge weights in the outer half of the retina to zero.This prevents the algorithm from segmenting outer retinal features such as the outer segment (OS) / retinal pigmented epithelium (RPE) boundary.Implementation examples of these types of search region limitation can be found in [45,46,54,55,71].
To simplify and reduce the complexity of search region limitation, we can modify the edge weights of the graph to emphasize a particular region.For example, let us extend our example in Section 4.1 to segment the darkest path across an image.Given an image I and its classification result , we would like to segment the darkest path across the portion of the image where [ , ] .
To do this, we compute the edge weights for where n C is the class at node n and { } k c From Eq. ( 12), it can be seen that edges connecting nodes with class k c are given lower weights than edges connecting nodes assigned to other classes.The advantage of this method is that far fewer a priori rules are required to isolate the desired region, as the classification method automatically generates estimates of region locations.Furthermore, hard limits are not set on the search region, as portions of the graph are not excluded.

Training the DME classifier
In this Section, we describe an implementation of our KR-based classification method to generate a DME classifier to identify fluid-filled regions and seven retinal layers on SD-OCT images of eyes with DME.An outline of our method is shown on the left side of Fig. 2. All parameter values were either selected empirically based on the training data set or were extensions from our previous segmentation algorithms [45,54,55].

DME training data set
To learn our DME classifier, we obtained training data separate from our validation data set.We used the Duke Enterprise Data Unified Content Explorer search engine to retrospectively identify patients within the Duke Eye Center Medical Retina practice with a billing code for DME (ICD-9 362.07) associated with their visit.An ophthalmologist then identified six patients imaged in clinic using the standard Spectralis (Heidelberg Engineering, Heidelberg, Germany

Target classes
To generate the target classes for classifier training, we manually segmented fluid-filled regions and semi-automatically segmented all eight retinal layer boundaries following the definitions in Fig. 3.This was done for 12 B-scans within the training data set (two from each volume).The B-scans selected consisted of six images near the fovea (B-scan 31 for all volumes) and six peripheral images (B-scans 1, 6, 11, 16, 21, and 26, one for each of the six volumes).We then used the manual segmentations to assign the true class for each pixel, with a total of eight possible classes defined in Table 1 and the classified result shown in Fig. 4(a).

Pre-processing the image
Prior to training the classifier, we perform image pre-processing steps for all B-scans within a volume following our validated DME algorithm described in [45].To summarize, we first remove Spectralis registration boundaries by replacing them with the mirror image of the valid image regions.To standardize the images following [45] and [55], we resize the image to a lateral and axial resolution of 13.4 j r = µm/pixel and 6.7 i r = µm/pixel, respectively, and then linearly normalize the image to range from 0 to 1.

Flattening the retina
Next, we flatten the curved retina to obtain a shorter path for GTDP segmentation.Following [45], we achieve this by first heavily filtering each image using a 40.2 µm × 80.4 µm Gaussian with 73.7 σ = µm × 147.4 µm.We then threshold the filtered image at 0.5 to extract the inner and outer hyper-reflective complexes.We refine the hyper-reflective mask by 1) opening any gaps in the clusters using a 33.5 µm × 40.2 µm rectangular structuring element, 2) removing all clusters less than 0.018 mm 2 in size, and 3) closing any remaining gaps using the same structuring element.Using GTDP, we segment the boundaries of the hyperreflective complexes as described in [55] which obtains estimates at the middle of the ONL and RPE.
Unlike our previous algorithms which flattened the retina on B-scans individually [45,54,55], in this algorithm we perform the above steps for all images in the volume to generate 2D map estimates ( ) of the OPL and RPE surfaces.Using the map of the RPE, we linearly interpolate any missing A-scans and B-scans.We then search for outliers by computing the absolute second derivative in the vertical direction and isolating all values greater than three standard deviations above the mean.Outlier clusters within each B-scan are then linearly interpolated, and the map of the RPE is smoothed using a 500 µm Gaussian filter with 250 σ = µm.Finally, for each B-scan, we circularly shift the A-scans until the RPE estimate is flat.Since the ONL boundary no longer aligns with the flattened RPE and image, we also flatten the ONL using the same flattening transformation.

Segmenting the total retina
To isolate the retina from the vitreous and choroid, we implement our GTDP layer segmentation framework described in Section 4.1 to segment the ILM using the dark-to-light gradient weights computed following

LinNorm , ,
x y z is a linear normalization of x to range from y to .
z While using Dijkstra's algorithm to segment the ILM, we limit the search region to span from the top of the image down to the RPE estimate generated in Section 5.4.To ensure that hyper-reflective spots or detachments internal to the ILM are not segmented, we re-segment the boundary using the weights in Eq. ( 14) where ab D is the Euclidean distance from node a to node .
b We also limit the search region to span from the initial ILM estimate to 67 µm below the ILM.

Locating the fovea
To compute the features in Section 5.8 and gauge the location of the optic nerve and merged retinal layers, we first identify the location of the fovea on the volumetric macular scan.Upon segmenting the ILM and BM across all B-scans, we locate the fovea based on the 2D total retinal thickness (TRT) map computed by subtracting the ILM from BM.To do this, we first filter the TRT map ( ) using a Gaussian (120 µm × 89.8 µm and σ = 18.95 µm × 85.8 µm).We then narrow our search region to the central 3 mm of the volumetric macular scan and find all local extrema with a depth greater than 13.4 µm.If the extremum with the lowest or highest TRT resides closest to the center of the volume, then we assign that location as the fovea.Otherwise, we create a summed voxel projection (SVP) by summing pixels ranging from the ILM to 33.5 µm below the ILM across all B-scans.As with the TRT, we smooth the SVP, locate the extrema, and computed distances from the fovea.Looking at the five extrema closest to the center of the volume, we set the extremum with the lowest intensity as the fovea.

Kernel regression-based denoising
We next perform iterative Gaussian steering KR on each training image as described in Section 2 to 1) extract the normalized kernels ˆiGS K for every pixel z in the retina, and 2) recover b defined in Eq. ( 4) which contains the denoised image and its first and second order derivatives.Parameter values are listed in Section 2.2, and since only the pixels within the retina need to be classified, we do not denoise pixels within the vitreous and choroid.

Computing DME training features
To learn the DME classifier, we computed feature vectors for the 12 training images manually segmented in Section 5.2.We generated a total of 22 features ( 1 2 2 v v − ) for each vector, and only pixels within the retina were processed.The nine KR-based features are defined in Table 2, the four position-related features are listed in Table 3, and the nine Laws' Texture Energy Measure features [63,64] are listed in Table 4.We then enhance the texture features in Table 4 using Eq. ( 8) and further enhance all features except 7 1 3 v v − using adaptive histogram equalization [72].Finally, we normalize all features to match the range of the training features.Figure 5 shows Laws' 5 5   T E E feature computed with and without KR-based denoising.As can be seen from Fig. 5, utilizing KR reduced the amount of noise present within the feature.

Selecting relevant features and weights
To select the most relevant features and weights for fluid and layer classification, we implemented our wSFFS method described in Section 3.5.We used ten-fold cross-validation [73] during this process and adjusted selection parameters until we achieved the desired classification result.During feature selection, we set the maximum number of features in ' v to 10, a ranged from 0.1 to 1 in 0.1 increments, used the classifier function defined in Eq. ( 10), and allowed a maximum of three clusters.
Next, we defined three types of DME classifiers: 1) a layer classifier ( l ϕ ) for segmenting layer boundaries, 2) a fluid classifier ( f ϕ ) for identifying fluid-filled regions, and 3) a layer + fluid classifier ( l f ϕ + ) used to remove false positive fluid detected by the fluid classifier.For each classifier type, we defined the criterion function following Eqs.( 16) and (17), where ' 1 , , , is the set of reduced feature vectors for all training data, r c is the true class for , r ' v and k c is the class that is being analyzed.For the fluid classifier, 1 k = so only the fluid class is evaluated by the criterion function.For the layer classifier, 2, ,8 k =  so only the layers are analyzed, and 1, ,8 k =  for the layer + fluid classifier so all classes are analyzed.The resulting feature and weights are shown in Table 5 and Fig. 6, where l ϕ is a 3-cluster classifier, and f ϕ and l f ϕ + are 1-cluster classifiers.  .Figure 4 shows an example training image comparing the true classes (Fig. 4(a)) to the automatically generated classes output by our algorithm (Fig. 4(b)).

Automatic classification-based GTDP segmentation algorithm for DME
We next leveraged the classification results from Section 5 to create a fully automatic GTDP segmentation algorithm to identify fluid-filled regions and seven retinal layers on SD-OCT images with DME.An outline of the algorithm is shown on the right side of Fig. 2 and explained in detail in the following subsections.

Pre-processing, flattening, and segmenting the retina
We pre-process all images in a volumetric scan, flatten the entire retina based on estimates of the RPE, and use GTDP to segment the ILM and BM following Sections 0 -5.5.

Classification of fluid and retinal layers
To classify the images, we locate the fovea and perform second order iterative Gaussian KR as described in Sections 5.6 and 5.7, respectively.We then compute the 12 features listed in Table 5 for all pixels residing within the retina following Section 5.8.Finally, we classify the feature vectors using the three classifiers from Section 5.9, and merge the results following Section 5.10 to generate the final image C containing the fluid and layer class estimates.

GTDP segmentation of intraretinal layers
To segment the six remaining intraretinal layer boundaries, we begin by creating a binary mask of the OS/RPE boundary based on the layer classification image .l C To generate the binary image , B we set all pixels from the bottom of the classified ISE layer to the bottom of the classified RPE layer to 1.We then segment the OS/RPE boundary using the weights in Eq. ( 18) and a search region extending from the ILM to BM, where n B is the value of B at node , n max 0.5, d = and max 0.5.b = Furthermore, for Eq. ( 18) and for all subsequent weight calculations, we compute DL n G using the denoised image 0 β generated in Section 6.2.
( ) Next, we segment the ISM/ISE by setting all pixels in B ranging from the bottom of the classified ONL-ISM to the bottom of the classified ISE.We then segment the boundary using Eq. ( 18) where max 0.5, and the search region extends from halfway between the classified ILM and OS/RPE to13.4 µm above the OS/RPE.
The third intraretinal boundary we segment is the NFL/GCL.Since the ILM is oftentimes significantly brighter and creates a pseudo light-dark transition within the NFL, we take the pixel-wise square root of Î and use the resulting image to compute the light-to-dark gradient image of the NFL/GCL boundary, generating sqLD G .We then compute the graph weights following ( ) ) min LinNorm , 0, 1 LinNorm , 0, 0.1 LinNorm , 0, 0.1 , where sqLD n G is the light-to-dark gradient of sqLD G at node n and values in B ranging from the middle of the classified NFL to the bottom of the GCL-IPL are set to 1.We then set the search region to range from the ILM to the ISM/ISE.However, since the thickness of the NFL varies more than all other layers and is oftentimes very faint on the temporal side, we provide additional search region steps as follows: 1) the bottom boundary for A-scans less than 1 mm from the fovea is set to the ILM plus 40.2 µm times the distance from the fovea (in mm), 2) the top boundary for A-scans between 1.5 and 3 mm nasal to the fovea is set to the ILM plus 6.7 µm times the distance from the fovea, and 3) the bottom boundary for A-scans greater than 1 mm temporal to the fovea is set to a third of the distance between the ILM and ISM/ISE.
Next, we segment the OPL/ONL using Eq. ( 20) where max 0.5, b = max 0.5, i = and the binary image is set to 1 between the top of the classified OPL and the middle of the classified ONL.We then set the top of the search region to either the NFL/GCL plus the NFL thickness, or 67 µm below the NGL/GCL, whichever is closer to the NFL/GCL.Finally, the bottom of the search region is set to the ISM/ISE.( To segment the INL/OPL, we set values in B extending from the top of the classified INL to the bottom of the classified OPL to 1.We also morphologically dilate the edges of the fluid in C using a disk structuring element with a radius of 2 pixels.The dilated edges are also set to 1 in B .We then use Eq. ( 18) with max 0.1 d = and max 0.1 b = to compute graph weights, and we limit the search region extent from 13.4 µm below the NFL to the OPL/ONL.
With the IPL/INL being the last boundary to segment, we set values in B ranging from the bottom of the classified GCL-IPL to the bottom of the INL to 1.To discourage the IPL/INL from running through fluid regions, we set fluid regions to a negative value in .B We then compute graph weights following Eq.( 20) with max 0.4 b = and max 0. i = Finally, we limit the search region to range from the a third of the distance between the NFL/GCL and INL/OPL, to the INL/OPL.

DME validation data set
We obtained our validation data set by identifying ten patients with DME that were not included in the training data set.The method for selecting these data sets is described in Section 5.1, with the difference that the images had to be of adequate quality (i.e.layer and fluid boundaries needed to be visible).The image acquisition parameters were consistent with the training data set, and lateral and azimuthal resolutions ranged from 10.94 -11.98 µm/pixel and 118 -128 µm/pixel, respectively.We made the entire data set available online, including the training and validation data sets and their corresponding automatic and manual segmentation results.This data set can be found at www.duke.edu/~sf59/Chiu_BOE_2014_dataset.htm.

Automatic versus manual segmentation
Two ophthalmologists manually segmented all fluid-filled regions and eight retinal layer boundaries using custom software (DOCTRAP V50.9) for 110 images (11 per patient).The images selected consisted of the foveal scan fov k and scans fov k ± 2, fov k ± 5, fov k ± 10, fov k ± 15, and fov k ± 20.The graders worked in a dimly lit room alone and free of other distractions.A tablet computer was used to allow the grader to utilize a stylus to mark or draw the segmentation as if writing on a flat surface.While marking the images, the ophthalmologists were able to use either a freeform (drawing) or connect-the-dots (piecewise cubic-interpolated clicking) method.They also had the options to hide/show other markings, erase/add/correct markings, and pin/unpin adjacent layer boundaries.The graders first segmented the retinal layers using predominantly the connect-the-dots function with freehand segmentation reserved for particularly complex contours such as those encountered retinal layers are severely disrupted by edema.In cases where boundaries between layers were completely obscured or lost, the graders made their best estimate.The retinal layer segmentation could then be hidden to allow segmentation of edema without influence by retinal layer segmentation.The graders segmented retinal edema using the freehand function, and all edema was segmented whether it was cystic or non-cystic in nature.Cysts were segmented individually, whereas non-cystic edema was segmented according to the contour of thickening and the perceived location of the edema with respect to the retinal layers.The expert graders (MJA and PSM) are both fellowship-trained medical retina specialists on the faculty at the Duke Eye Center.They each have over five years of clinical experience working with diabetic subjects and reading OCT images.Both have prior experience with manual OCT grading and have each published two manuscripts involving OCT analysis and grading.Outside consultation and discussion between graders was prohibited.
To validate our algorithm, we compared our fully automatic KR + GTDP segmentation to our pre-determined primary grader (MJA) within a 6-mm lateral radius from the fovea.We then measured the Dice coefficient [74,75] for retinal layer and fluid classification.We also measured the mean layer thickness difference between automatic and manual segmentation for each B-scan, and computed the absolute mean error across all 110 B-scans.To realize the improvement of our new KR + GTDP segmentation algorithm compared to existing methods that do not consider the presence of fluid, we also evaluated the performance of our GTDP algorithm developed for normal eyes on the DME data set.This Normal GTDP algorithm is a modified version of our 2010 algorithm described in [54], where enhancements had been made following our 2010 publication to improve segmentation results and address issues such as registration artifacts seen on Spectralis SD-OCT images.Finally, we computed the mean Dice coefficient and thickness difference for each layer for a given patient and performed the Wilcoxon matched-pairs test across all patients for each automated algorithm compared to the inter-manual results, as well as the Normal and DME algorithms compared to each other.

Experimental results
Figure 7 (right) shows the qualitative results of our automatic KR + GTDP segmentation algorithm for varying amounts of edema, including cases where there was no fluid present (Fig. 7(f)).This is in comparison to the automatic GTDP results using the normal algorithm (Fig. 7, middle).Table 6 and Table 7 report quantitative results comparing automatic and manual segmentation, with an overall mean Dice coefficient and thickness difference of 0.78 ± 0.10 and 5.22 ± 5.81 µm, respectively, for our proposed KR + GTDP algorithm.This is comparable to the inter-manual results which yielded an overall mean Dice coefficient and thickness difference of 0.79 ± 0.09 and 4.30 ± 2.38 µm, respectively.On the other hand, our popular GTDP algorithm developed for eyes without pathology had mean overall values of 0.70 ± 0.19 and 11.39 ± 19.11 µm.Table 8 shows the resulting p-values when comparing a) the mean patient Dice and thickness difference values between the automated and gold standard methods, to b) the inter-manual (Columns 1 and 2) or Normal GTDP (Column 3) results.An insignificant result against the inter-manual results indicates that the automated algorithm performed comparably to a second expert grader, while a significant result against the Normal algorithm indicates an improvement in the segmentation algorithm.Finally, Fig. 8 is an example comparison of the segmentations completed by the algorithm (Fig. 8(b)) and by the two graders (Fig. 8(c) and 8(d)), and Fig. 9 shows example errors in automatic segmentation (Fig. 9, bottom) when compared to a manual grader (Fig. 9, middle).The performance of the algorithms in MATLAB (MathWorks, Natick, MA) were 1.8 seconds per image on average for the Normal GTDP algorithm and 11.4 seconds for the KR + GTDP algorithm.This is in comparison to the 5.5 minutes required per image for manual segmentation.

Discussion
Despite having two qualified graders to manually mark the validation data sets, their results did not overlap with high accuracy (Table 6).This is likely due to the lower image quality and severe DME pathology present within the data set, as we deliberately chose to use real-world, clinical images in our study.While not necessarily a representative example for all graded images, Fig. 8 shows the challenge faced by both expert and automatic graders when confronted with an image from clinical practice that is affected by severe DME.The massive retinal thickening and punctate inner retinal hyper-reflectivity results in shadowing and loss of detail in the outer retinal layers.While some cysts are well-delineated hypo-reflective spaces in Fig. 8(a), others are minimally hypo-reflective compared to the surrounding tissue and create a judgment call for the grader.In this case, one grader perceived some non-cystic edema (Fig. 8(d)) while the other identified primarily cystic edema (Fig. 8(c)).The subjective nature of complex image segmentation by humans is thus one of the major obstacles to generating robust imaging-based biomarkers and is the primary unmet scientific and clinical need that is addressable using automated algorithms.
Despite the difficulty in managing such a challenging data set, Table 6 indicates that our fully automatic KR + GTDP algorithm generated results with an error close to the interobserver variability (Dice coefficient of 0.78 versus 0.79).Furthermore, this was a significant improvement from previous algorithms developed by us, as our Normal GTDP method was tested on images without the significant pathology as seen on our DME data set.KR + GTDP classification of fluid-filled regions yielded the lowest Dice coefficient with errors similar to those shown in Fig. 9.In Fig. 9(a), false positives were detected near areas containing hyperreflective deposits.Since the training data set did not include images with these hyperreflective spots, the algorithm perceived these areas as "normal" and its surrounding regions as hypo-reflective, thus resulting in false positive classifications.Future versions of the algorithm can include the deposits as an additional class.Figure 9(b), in contrast, is a very dim image.Since the image contained a single hyper-reflective spot, the rest of the image remained dark after linear normalization.In future versions, the image can instead be nonlinearly normalized (e.g.histogram equalization) prior to using the DME classifier.Finally, Fig. 9(c) shows the algorithm's tendency to identify more regions of the ONL as fluid compared to the grader.This may be partially due to the small training data set size of 12 images.
Classification-based GTDP segmentation includes embedded denoising which is advantageous for reducing the noise present within the classification feature vectors (Fig. 5).While many different denoising methods may be also used to reduce the noise, the exciting properties of KR include its direct estimation of the denoised first and second order derivatives ( 1 β and 2 β ), as well as the generation of locally-weighted kernels which can be used directly as features or reused for non-linear feature denoising.A limitation of KR is the dependence of parameter values on the image noise level.Thus in order to optimally denoise images of varying scan types, future versions of the algorithm should adapt the KR parameter values based on the estimated noise level.Adapting alternative image representation models such as sparsity-based techniques, shown to be very efficient in representing retinal OCT images [76], may also improve the performance and is part of our ongoing work.The need for GTDP segmentation is a limitation to this study, as the KR-based classification method was not sufficient to accurately identify retinal layers as a standalone method.While K-means clustering helped improve the classification of layers with varying thicknesses and positions (e.g. a thicker NFL on the nasal side versus a thinner NFL on the temporal side as shown in Fig. 4(b), the classified intraretinal layers did not merge at the fovea.Furthermore, three separate classifiers were required to achieve the final classification result.While l f +

C
provided a balance in accuracy between the layer and fluid classifiers, l C generated the most accurate layer classifications and f C achieved better fluid classifications but with more false positives.We therefore merged all three classifiers to achieve the best possible classification result.Finally, since there was no hard constraint on layer ordering, Fig. 4(b) shows the OS-RPE layer sometimes appearing above the ISE.Nevertheless, the classification example shown in Fig. 4(b) shows that our method provides a significant starting point that is helpful in simplifying the GTDP search region step.Other more advanced classification methods (e.g.support vector machines) may also be implemented in place of our described method to possibly improve the classifier performance.
The classifier features and weights shown in Fig. 6 and Table 5 provide insight into the characteristics that are important for fluid and layer identification.For example, the denoised intensity feature ( 1 v ) was the most important feature for identifying fluid, likely due to the dark intensity exhibited by fluid.In contrast, for both cases where the classifier was required to identify layers, the vertical position within the retina ( 11 v ) was most important.This is due to the layered nature of the retina.Adapting alternative image representation models such as sparsity-based techniques, shown to be very efficient in representing retinal OCT images [76], may improve the performance and is part of our ongoing work.
Finally, our randomly selected validation data set did not include many images with subretinal fluid, a relatively uncommon feature in DME.Thus, its performance for such images is unknown.Furthermore, our algorithm does not yet distinguish between cystic and non-cystic fluid.Incorporating the ability to identify and categorize different types of fluid (e.g.subretinal, cystic, non-cystic) is a necessary step in our future work, as this will enable rigorous DME subtyping.While the correlation between DME subtype and response to therapy is still unknown, the potential clinical implications of personalized therapy and improved treatment outcomes are vast.

Conclusion
We developed a fully automatic algorithm to identify fluid and eight retinal layer boundaries on SD-OCT images with DME pathology.Results showed that our automatic algorithm performed comparably to manual graders.This is the first validated, fully-automated, and multi-layer segmentation method which has been applied to real-world, clinical images containing severe DME.Accurate identification of DME imaging biomarkers is extremely important, as it will facilitate the quantification and understanding of DME.

Fig. 1 .
Fig. 1. Second order iterative Gaussian steering KR of an SD-OCT image with DME.a) Automatically flattened image, b,d) zoomed-in images of the pink and green boxes in (a), c,e) Gaussian steering kernels used to denoised the central pixel of (b) and (d), and f) KR-denoised image of (a).In (f), regions external to the retina (black) were not denoised with the exception of a padded boundary surrounding the retina required for feature vector computation.
) 61-line volume scan protocol with severe DME pathology and varying image quality.Averaging of the B-scans was determined by the photographer, and ranged from 9 to 21 raw images per averaged B-scan.The volumetric scans were Q = 61 B-scans × N = 768 A-scans with an axial resolution 3.87 i r = µm/pixel, lateral resolution ( ) j r ranging from 11.07 -11.59 µm/pixel, and azimuthal resolution ( ) k r ranging from 118 -128 µm/pixel.

Fig. 2 .
Fig. 2. Flowchart of the KR-based classification and GTDP-based segmentation algorithm for identifying fluid-filled regions and eight retinal layer boundaries on images with DME pathology.

Fig. 3 .
Fig. 3. Target retinal layer boundaries and fluid to segment on an SD-OCT B-scan of an eye with DME.A flattened version of the image without markings is shown in Fig. 1(a).

Fig. 4 .
Fig. 4. KR-based classification of retinal layers and fluid-filled regions.a) Manual classification of the image in Fig. 1(a) with the classes defined in Table 1.b) Automatic classification using the merged DME classifier.
n G is the maximum of the dark-to-light vertical (filter [1, 1] T = − ) and diagonal gradients of I at node n after filtering the image using a Gaussian (20.1 µm × 67 µm, ) where LD n G is the light-to-dark equivalent of DL n G and n I is the intensity of the filtered image at node n .We also set the top boundary of the search region to the mean location between the ONL estimate and 100.5 µm below the ILM, and we set the bottom boundary to the bottom of the image.

Fig. 5 .
Fig. 5. Laws' 5 5 T E E texture feature computed for the image in Fig. 1(a) with (a) and without (b) KR-based denoising.Both feature images have been enhanced using adaptive histogram equalization.

Fig. 6 .
Fig. 6.Visual example of features selected for the DME classifiers.a-l) Features computed for the image in Fig. 1(a).

Fig. 7 .
Fig. 7. Qualitative results for the identification of fluid and eight retinal layer boundaries on SD-OCT images of eyes with DME.a-c) Images with significant, moderate, and no visible fluid, respectively, d-f) their corresponding automatic segmentation result using the GTDP algorithm developed for normal eyes, and g-i) the automatic KR + GTDP segmentation result.

Fig. 8 .
Fig. 8. Automatic and manual segmentation comparison.a) An SD-OCT B-scan with DME pathology, b) the fully automatic segmentation result, and c-d) the segmentation results completed by two different graders.

Fig. 9 .
Fig. 9. Automatic fluid detection errors.Top) Cropped portions of SD-OCT B-scans with DME, middle) manual segmentation of fluid-filled regions by a grader, and bottom) automatic fluid-filled classification.a) An image with hyper-reflective deposits, b) a dim image with a hyper-reflective outlier, and c) a high quality image.