Effect of patch size and network architecture on a convolutional neural network approach for automatic segmentation of OCT retinal layers

Deep learning strategies, particularly convolutional neural networks (CNNs), are especially suited to finding patterns in images and using those patterns for image classification. The method is normally applied to an image patch and assigns a class weight to the patch; this method has recently been used to detect the probability of retinal boundary locations in OCT images, which is subsequently used to segment the OCT image using a graph-search approach. This paper examines the effects of a number of modifications to the CNN architecture with the aim of optimizing retinal layer segmentation, specifically the effect of patch size as well as the network architecture design on CNN performance and subsequent layer segmentation. The results demonstrate that increasing patch size can improve the performance of the classification and provides a more reliable segmentation in the analysis of retinal layer characteristics in OCT imaging. Similarly, this work shows that changing aspects of the CNN network design can also significantly improve the segmentation results. This work also demonstrates that the performance of the method can change depending on the number of classes (i.e. boundaries) used to train the CNN, with fewer classes showing an inferior performance due to the presence of similar image features between classes that can trigger false positives. Changes in the network (patch size and/or architecture) can be applied to provide a superior segmentation performance, which is robust to the class effect. The findings from this work may inform future CNN development in OCT retinal image analysis.


Introduction
Optical coherence tomography (OCT) has transformed the imaging of the eye, providing high-resolution cross-sectional images of the ocular tissue that are now commonly used in clinical practice and research to better understand the eye in both health and disease [1,2]. Although qualitative assessment of OCT images can be used to inform clinical decision making, including the classification and detection of eye diseases, quantification of the acquired images is necessary to better understand the eye's normal development and the impact of common eye conditions such as myopia upon eye morphology [3][4][5], and to more reliably facilitate early disease detection, the tracking of disease progression and monitoring of treatment efficacy [6][7][8][9]. Estimates of tissue thickness derived from quantitative OCT image analysis remain the most common way to characterize the B-scan images acquired from the instrument [10]. Thus, layer segmentation is a fundamental task in OCT image analysis, and the development of reliable OCT image segmentation methods has been the focus of numerous previous studies.
A number of papers have reviewed the rich literature involving OCT image analysis, particularly layer segmentation methods [11][12], which include a large variety of approaches such as analysis of intensity variation and adaptive thresholding [13], intensity-based Markov boundary models [14], texture and shape analysis [15], graph theory techniques [16], multi-surface graph-cut approaches [17], and active contour segmentation models [18]. These techniques are in general a set of ad hoc rules applied to the image to extract the boundaries of interest. Although the methods have shown significant merit, they do not always generalize well, and changes in the image may mean that changes in the set of rules are also required. Machine learning methods develop their own rules based on the provided training data (images plus boundary position) and may be more resilient to variations in the data that are commonly encountered in clinical images. Thus, in recent years, machine learning methods have emerged as a useful tool for a range of retinal OCT image analysis applications, including the classification of OCT images into different disease groups and the detection of pathological features in images [19][20][21][22][23]. Reviews of this research area can be found elsewhere [24,25]. Other applications include the automatic detection of the foveal center in patients with age-related macular degeneration [26] and the detection and segmentation of retinal pigment epithelium detachments [27].
For layer segmentation, a number of studies have used machine learning methods to detect the boundaries between retinal layers. Vermeer et al. [28] first extracted several features from individual A-scans, then used these features in a Support Vector Machine to classify each pixel within the A-scan to segment six retinal layers. Features extracted included the intensity, neighboring intensity to various lengths, and neighboring gradients to various lengths. Lang et al. [29] used a Random Forest Classifier (RFC) to classify features around a pixel into nine layer classes, then either a Canny edge detector or graph-search was used to segment the layers into boundaries. Ben-Cohen et al. [30] utilized a combination of the U-net fully convolutional architecture [31], Sobel edge detection, and graph-search to identify four retinal boundaries. U-net first identified the full retinal layers, then the Sobel edge detector was used to segment these layers, and finally Dijkstra's graph-search algorithm was used to predict the boundaries between layers. Venhuizen et al. [32] also used a U-net based network to calculate the total thickness of the retina. U-net was used to classify parts of the scan as either retinal or another part of the eye, then this classification was thresholded to create a binary image of the retina. Roy et al. [33] created an encoder-decoder framework similar to U-net, called ReLayNet, to segment the image into retinal tissue (eight layers) and intra-layer fluid. ReLayNet was used as an end-to-end approach, with the output being a completely segmented image with no further processing necessary.
Recently, Fang et al. [34] proposed a convolutional neural network (CNN) and graph-search method (termed CNN-GS) for the automatic segmentation of nine layer boundaries in OCT retinal images of patients with non-exudative age-related macular degeneration. The work follows Chiu's [16] original OCT segmentation model based on graph theory, but replaces the edge-detection step with a CNN. The CNN predicts the boundary location, providing a per-layer probability image, which is then used to trace the boundary using graph-search methods. The method runs a patch (window) of fixed size through the image to provide the probability of a particular boundary being present in the center of that window. The authors adopted a previous CNN architecture, proposed for a different data set, for the classification of patches in OCT retinal boundaries. Their model operates on a square input patch of 33x33 pixels. In this paper, an in-depth analysis of the effect of using different patch sizes and CNN architectures on segmentation performance is presented. The aim of this work is to better understand the effect of patch size as well as network architecture on CNN performance and subsequent layer segmentation results, in order to improve the performance of the classification and optimize the outcomes from the OCT retinal layer segmentation. The findings of this work may inform future CNN development in OCT imaging analysis using deep learning strategies.

Demographics and OCT data set
A retrospective data set of retinal OCT images was used for this study, and a detailed description of the study participants and procedures has been provided in a number of previous publications [5,35,36]. Briefly, this retrospective data is from a longitudinal study examining macular retinal layer thickness in childhood involving 101 children (13.1 ± 1.4 years) with a range of refractive errors. Retinal OCT images were collected on each child at four study visits, conducted every 6 months over an 18-month period, although for this work, OCT images were analyzed from the first visit only. The study was approved by the Queensland University of Technology human research ethics committee and all study procedures followed the tenets of the Declaration of Helsinki. All participants enrolled had normal vision in both eyes, no history or evidence of ocular disease, injury or surgery and no manifest hyperopic refractive errors of greater than +1.25 DS.
High-resolution cross-sectional retinal images were collected using the Heidelberg Spectralis (Heidelberg Engineering, Heidelberg, Germany) SD-OCT instrument. This device uses a super luminescent diode with a central wavelength of 870 nm for OCT scanning (capturing 40,000 A-scans per second), and provides cross-sectional retinal OCT images with an axial digital resolution of 3.9 μm. The images captured were 496 pixels deep and 1,536 pixels wide, with a total area of 761,856 pixels, with scale of 3.9 μm per pixel deep, and 5.7 μm per pixel wide.
At each study visit, the participants had 2 series of 6 high resolution foveal centered radial OCT scan lines acquired using the instrument's Enhanced Depth Imaging (EDI) mode (with each radial scan line separated by 30 degrees). The EDI mode is typically used to enhance the visibility of the choroid [37], and it has been shown that retinal thickness measures from EDI scans are comparable to those collected using the Spectralis instrument's conventional imaging mode [38]. To improve the image signal to noise ratio, 30 frames were averaged using the instrument's automatic real time eye tracking feature. The exported OCT images were analyzed using custom written software. Initially, an automated graph based method [3,16] was used to segment the boundaries of 7 different retinal layers, including: the outer boundary of the retinal pigment epithelium (RPE), the inner boundary of the inner segment ellipsoid zone (ISe), the inner boundary of the external limiting membrane (ELM), the boundary between the outer plexiform layer and inner nuclear layer (OPL/INL), the boundary between the inner nuclear layer and the inner plexiform layer (INL/IPL), the boundary between the ganglion cell layer and the nerve fiber layer (GCL/NFL) and the inner boundary of the inner limiting membrane (ILM) (Fig. 1). An experienced observer, masked to the demographic and refractive details of the participants, then checked the integrity of the automated segmentation of each boundary and manually corrected any segmentation errors.

Overview of image processing methods
The method used for the segmentation of OCT images follows a similar procedure to that used by Fang et al. [34]. This is a two-stage method consisting of an initial CNN followed by a graph-search procedure. The CNN computes a probability map that indicates the likelihood of a retinal boundary lying at any given pixel, and this probability map is then used as an input to the graph-search method that traces from the top left of the image to the bottom right to calculate the most likely path of the retinal boundary of interest. All training and testing of networks was computed on an Nvidia Titan Xp using MATLAB R2017a and VLFeat's MatConvNet library [39].
The Fang method uses a well-known CNN, the CIFAR-CNN architecture [40,41], which was originally designed and tested to classify small 32x32 color images into 10 classes, and the authors adopted this network for retinal OCT image layer segmentation. In this paper, we explore the proposed method and examine the effects of a number of modifications to the CNN architecture with the aim of optimizing retinal layer segmentation. The modifications take into account the distribution of the information (retinal boundaries within the tissue) within the OCT image and adapt the patch size to better capture the richness of the information and improve image segmentation. Figure 2 provides an overview of the proposed methods, which have been divided into two major steps. The first step involves the training of the CNN network, using a data set with seven labelled retinal boundaries. The second step involves the testing of the trained network performance to produce probability maps and then subsequent graph-search to trace these boundary locations.

Fig. 2. Overview of the proposed method with the two major steps involved in the process. In the training step (top section) the different proposed networks can be substituted to evaluate the effect of different networks on the performance of the segmentation task.

Convolutional neural networks
Convolutional Neural Networks (CNNs) are classifiers often used for image analysis [41]. In ophthalmic applications, CNNs have been used for a number of different purposes, including cone detection in high-resolution retinal images obtained using adaptive optics [42], and retinal vessel segmentation [43]. CNNs consist of many layers, normally arranged into blocks which find local features, activate based on these findings, and sub-sample to create a smaller but more feature-rich input for the next block [44]. The most commonly used layers for these blocks are combinations of convolutional layers, activation layers and sub-sampling layers. The convolutional layers take an X1xY1xZ1 sized input and convolve it to a 1x1xZ2 output. The activation layers are normally implemented using Rectified Linear Units (ReLU) [45], which pass positive inputs unchanged and set negative inputs to zero. The sub-sampling layers, normally average or max pooling layers, reduce the size of the input while preserving the most relevant information. Other layers that are used less commonly, but still appear in most networks, are fully connected layers, which are simply convolutional layers connecting to all values in the input, and softmax layers, which are used to calculate the final probabilities of each class.
In general, a network block consists of a convolutional layer, a ReLU layer, and a subsampling layer. Each block reduces the height and breadth of the input, but increases the depth of the input, until the final classification is performed [46]. CNNs are normally trained by a gradient descent method over multiple iterations, where each sample is used multiple times. This gradient descent algorithm seeks to minimize the error with respect to various weights throughout the network. Most modern networks can consist of millions of parameters which govern the convolutional or activation functions within the network. In most cases the data set used for training is too large to fit within the memory of a single GPU, so training is often done in several smaller batches (a stochastic gradient descent [47]).
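The layer types described above can be illustrated with a minimal sketch in plain Python. This is not the MatConvNet implementation used in this work; the function names `relu`, `max_pool_2x2` and `softmax` are hypothetical, and the 2x2 non-overlapping pooling window is an assumption chosen only for illustration.

```python
import math


def relu(feature_map):
    """Rectified linear unit: keep positive values, set the rest to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling (assumed window size).

    A trailing odd row or column is dropped, which halves each dimension.
    """
    height = len(feature_map) // 2 * 2
    width = len(feature_map[0]) // 2 * 2
    return [
        [
            max(
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Applying `relu` followed by `max_pool_2x2` to the output of a convolution mimics one block; `softmax` would be applied once, to the final class scores.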

Patch prediction and boundary segmentation
The first stage in the CNN-GS method is the prediction of the boundary locations. Fang et al. [34] pre-processed images by applying an intensity normalization; for completeness, this step was retained in the procedure. The normalization procedure (originally presented in [28]) ensures the data lies within the range 0 to 1 and removes outliers by looking for maximum values of a smooth filtered image. To train the network, small patches of arbitrary dimensions were sampled once per column (A-scan) of the OCT image, centered on each boundary, together with a randomly chosen non-boundary sample patch to provide a null or background sample. To preserve the weighting of the network, any column without all layers present was omitted from the training or testing sets. Each class was given an arbitrary label from 1 to 8, including the background class.
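The per-column sampling described above can be sketched as follows. This is an illustrative sketch only: the function name `sample_training_centers`, the use of `None` to mark a missing boundary, and the choice of the last label for the background class are assumptions, not details taken from the original implementation.

```python
import random


def sample_training_centers(boundary_rows, height, rng):
    """Return (row, col, label) patch centers for one B-scan.

    boundary_rows[k][c] is the row of boundary k+1 in column c, or None
    when that boundary is absent (assumed encoding). Boundaries receive
    labels 1..K and the background class receives K+1 (assumed choice).
    """
    background_label = len(boundary_rows) + 1
    samples = []
    for col in range(len(boundary_rows[0])):
        rows = [boundary[col] for boundary in boundary_rows]
        if any(r is None for r in rows):
            continue  # omit columns where not all boundaries are present
        for label, row in enumerate(rows, start=1):
            samples.append((row, col, label))
        # One randomly chosen non-boundary row per column as background.
        free_rows = [r for r in range(height) if r not in rows]
        samples.append((rng.choice(free_rows), col, background_label))
    return samples
```

With seven boundaries, each complete column yields eight patch centers, one per class, which keeps the classes balanced.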
A total of 138 OCT images from 70 subjects were used to form the training set, with 28 of these scans used as validation samples during training. Training was done by stochastic gradient descent with momentum [47], in batches of 1024 randomly chosen samples at a time. Given the scans in this project were collected in a radial pattern, an equal number of scans from each orientation were used during training and validation. For the prediction of the boundaries, a CNN was trained to predict the probability of a sample patch belonging to any of the seven boundary classes, or the background class in the case it did not belong to any boundary. The probability map contains values between zero and one, where values close to one indicate a high likelihood of the boundary being present in that particular pixel position (Fig. 3).
To perform image segmentation from the probability maps, a graph-search method is used to find the highest weighted probability path from one side of the scan to the other side for each boundary. This follows a similar approach to Chiu et al. [16]. Thus, the probability map represents a graph of nodes, where each pixel corresponds to a node. Dijkstra's algorithm [48] determines the preferred path between any two nodes on the entire graph by calculating the lowest weight between them. By initializing the start-node and end-node at each side of the probability map, the detected path matches the layer of interest given the appropriate weight map. The weight assigned to the edge connecting adjacent nodes a and b is calculated as $W_{ab} = 2 - (g_a + g_b) + W_{min}$, where $g_a$ and $g_b$ are the gradient information at nodes a and b respectively, with $g_a, g_b \in [0, 1]$, and $W_{min}$ is a constant value of $10^{-5}$. Here a 4-node connection is used (right, top, left, and bottom). After the weight maps are calculated, Dijkstra's algorithm is used to determine the lowest weighted path of the graph between the start and end nodes. This process is repeated for all seven non-background classes (i.e. boundaries), until a prediction has been created for each layer.
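A minimal sketch of this graph-search step is given below, using the edge weight defined above with the per-boundary probability taking the role of $g$. It is not the authors' code: the function name `trace_boundary` is hypothetical, and the extra column of ones appended to each side of the map (so that the path can enter and leave at any row at negligible cost) is an assumption modelled on the endpoint initialization commonly used with this type of graph search.

```python
import heapq

W_MIN = 1e-5  # constant minimum edge weight


def trace_boundary(prob_map):
    """Trace the lowest-weight left-to-right path with Dijkstra's algorithm.

    prob_map is a list of rows with values in [0, 1]. Returns the path as
    (row, col) nodes expressed in the coordinates of the original map.
    """
    height = len(prob_map)
    # Assumed endpoint handling: pad both sides with a column of ones so
    # that vertical moves inside the padding cost only W_MIN.
    padded = [[1.0] + list(row) + [1.0] for row in prob_map]
    width = len(padded[0])
    start, end = (0, 0), (height - 1, width - 1)

    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        r, c = node
        # 4-node connection: right, top, left, bottom.
        for nr, nc in ((r, c + 1), (r - 1, c), (r, c - 1), (r + 1, c)):
            if 0 <= nr < height and 0 <= nc < width:
                weight = 2.0 - (padded[r][c] + padded[nr][nc]) + W_MIN
                nd = d + weight
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))

    # Walk back from the end node, then drop the two padded columns.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return [(r, c - 1) for r, c in path if 1 <= c <= width - 2]
```

Because high probabilities give weights close to $W_{min}$, the cheapest path follows the ridge of the probability map; running the function once per class yields one traced boundary per layer.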

Network design rationale
There were several networks created during testing, however only the best performing networks will be discussed in detail. The CIFAR network, used by Fang et al. [34], was used as the baseline for comparison purposes. This network originally used a 32x32 size input, but this was adapted to a 33x33 image patch to center the boundary at the center of the patch. The aim of this work is to examine the effects of a number of CNN modifications (patch size and network design) with the goal of optimizing retinal layer segmentation.
Patch size: When the original patch was used, some boundaries were easily confused with neighboring boundaries without the full topological frame. This negative effect was particularly significant when reducing the number of classes (see section 3.3). For this reason, we hypothesized that an increase in patch size may improve performance for these particular layers. Previous studies have shown that using a larger image patch in CNNs can lead to better classification accuracy, since CNNs can capture more contextual information to make the decision [49][50][51]. Thus, a larger network was tested (65x65 pixels patch size), which was designed to resemble the baseline network topology. However, it is important to note that increasing the patch size also requires changes in the network, since some of the layers (fully connected layers) need to have a fixed size input by definition. Thus, the change of patch size also implies a change in the layer architecture.

Complex network design: The original CIFAR-10 network used in this work was designed to handle an RGB object classification problem. Here, we propose a more complex CNN network, obtained by removing some of the pooling layers. Instead, convolutional layers can be used to reduce the size of the input if it is not zero padded. U-Net uses a similar principle to focus the result on a central area [31]. The benefit of having no zero padding is that there is no effect from adding a large number of zeros around the border of the image, and the output depends only on the actual data. For comparison purposes, the network was tested with both patch sizes (33x33 and 65x65) explored in the previous version. Table 1 provides the details of the tested architectures; the four tested CNNs cover various patch sizes and network complexities to gain a better understanding of the CNN performance. The Fang et al. [34] network (CIFAR 33x33) is used as a baseline for our work.
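The way an unpadded convolution shrinks its input, and can therefore stand in for a pooling layer, follows from the standard output-size formula. The sketch below applies it to both patch sizes; the stack of three 5x5 convolutions is purely hypothetical and does not reproduce the architectures listed in Table 1.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_sizes(patch, layers):
    """Track the spatial size of a patch through (kernel, stride, pad) layers."""
    sizes = [patch]
    for kernel, stride, pad in layers:
        sizes.append(conv_out(sizes[-1], kernel, stride, pad))
    return sizes


# Hypothetical stack: three unpadded 5x5 convolutions with stride 1.
unpadded = [(5, 1, 0)] * 3
print(trace_sizes(33, unpadded))  # [33, 29, 25, 21]
print(trace_sizes(65, unpadded))  # [65, 61, 57, 53]

# The same kernels with zero padding of 2 leave the size unchanged.
padded = [(5, 1, 2)] * 3
print(trace_sizes(33, padded))    # [33, 33, 33, 33]
```

The same arithmetic shows why the fully connected layers must change with patch size: the feature map reaching them is 21 pixels wide for a 33x33 patch but 53 pixels wide for a 65x65 patch in this hypothetical stack.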

Evaluation
To evaluate the performance of the network, 120 OCT images from 20 subjects were used to create a test data set. All subjects in the test data set were different to those in the training set, including validation subjects. This ensured that the training data set was completely separate from the data set used for the testing. Similar to the training, an equal number of radial orientations were chosen. For all networks, the images were pre-processed with the same filter used by Fang et al. [34], but no additional pre-processing was performed in comparison to the baseline network.
When supplied with a test image, the image was first split into patches around each pixel. In the case of pixels where the patch fell outside the borders of the image, zeros were added to pad the patch. Once every pixel had its corresponding patch created, the patches were provided to the network as inputs, and the resulting class probabilities became that pixel's values in the probability map. Once the probability of every pixel was calculated, the boundaries were segmented using the graph-search approach. These boundaries were then compared to the known truth values (from the original segmentation data [5] that were examined and manually corrected by an experienced trained human observer) to calculate the mean error, the mean absolute error and the standard deviation of each error. In a large portion of the scans used there was sub-optimal image quality and poor scan detail at the extreme edges of the images, as well as anatomical features where layers were not present (e.g. the optic nerve head), so for the calculation of error, the images were truncated by 100 pixels on either edge.
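The zero-padded patch extraction described above can be sketched as follows. The function name `extract_patch` and the list-of-rows image representation are assumptions made for illustration; an odd patch size (such as 33 or 65) is assumed so that the pixel of interest sits exactly at the patch center.

```python
def extract_patch(image, row, col, size):
    """Return the size x size patch centered on (row, col).

    Positions that fall outside the image are filled with zeros.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - half, row + half + 1):
        patch_row = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            patch_row.append(image[r][c] if inside else 0.0)
        patch.append(patch_row)
    return patch
```

Calling this function for every pixel and passing each patch to the trained network would fill the probability map one pixel at a time.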
Repeatability was also measured by comparing the predicted thickness between each series of 2 repeated measurements for a single participant. Only 14 participants of the 20 used for testing had consecutive measurements present for all orientations in both series, so only 84 pairs of B-scans were used for assessing the repeatability.

Overall performance
To compare the overall performance of the network, the difference between the predicted boundary location for each network architecture and the true boundary location was calculated using both mean error and mean absolute error on a column by column basis, with the standard deviation of both errors also included. All values are given in pixels throughout the paper. The mean absolute error is a measure of how far all points are from the truth-value regardless of whether these points fall above or below the truth. It can be observed that the mean absolute per-layer error for the 65x65 networks is comparable to or slightly lower than that of the 33x33 network (mean absolute error for all layers 1.30 pixels [33x33 CIFAR] and 1.07 pixels [65x65 CIFAR]) (Fig. 4 and Table 2, top). However, there is also a more pronounced decrease in the standard deviation of the large networks relative to the 33x33 networks. This represents both a more consistently correct segmentation outcome, and that the errors were generally clustered very close to the truth.
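The two error measures can be made concrete with a short sketch. The function name `boundary_errors`, the sign convention (predicted minus truth) and the use of the population standard deviation are assumptions; the `trim` argument mirrors the truncation of columns at either edge of the image described in the evaluation procedure.

```python
import statistics


def boundary_errors(predicted, truth, trim=0):
    """Column-by-column error statistics for one boundary, in pixels.

    predicted and truth hold one row position per column; trim columns
    are discarded at each edge before the statistics are computed.
    """
    cols = range(trim, len(truth) - trim)
    signed = [predicted[c] - truth[c] for c in cols]  # assumed sign convention
    absolute = [abs(e) for e in signed]
    return {
        "mean_error": statistics.mean(signed),
        "sd_error": statistics.pstdev(signed),
        "mean_abs_error": statistics.mean(absolute),
        "sd_abs_error": statistics.pstdev(absolute),
    }


# Errors of -1, +1, -1, +1 pixels cancel in the mean error (0)
# but not in the mean absolute error (1).
result = boundary_errors([10, 12, 9, 11], [11, 11, 10, 10])
```

In this example the prediction is never exactly right, yet the mean error is zero, which is why the mean absolute error is the more informative measure of agreement.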

Table 2. Difference in boundary position between the manual observer and the different network architectures and patch sizes for the entire data set (120 B-scans). Results are reported as mean (standard deviation) in pixel units.
In comparison, the mean error is the average of both positive and negative errors, so this mean error value can be balanced out by similar amounts of positive and negative errors even if the predictions are not close to the truth-value. Across the different networks, the values of the mean error are small (all below 1 pixel), with only minimal differences between networks (Table 2, top). Similarly, the standard deviation shows small differences across most of the retinal boundaries for the larger patch image.

Figure 4 provides the comparison for each of the seven boundaries considered in the study, with the graphs providing a mean absolute error for the seven boundaries of interest for each of the four considered CNN architectures. When viewed on a layer by layer basis (Fig. 4), the performance of the methods varies between the different layers as well as with distance from the foveal center within each layer. Although the larger CIFAR network yields a lower mean absolute error in most boundaries in comparison to the smaller CIFAR network, the more complex networks appear to provide more significant improvements. Across the different networks, the layers that showed slightly worse performance were the NFL followed by the INL/IPL and OPL/INL, particularly for the 33x33 network, but performance improved significantly using the larger patch size 65x65 or the Complex CNN.

Figure 5 provides an illustration of the boundary prediction from an A-scan of a representative B-scan image, with three (33x33 CIFAR and both Complex networks) corresponding per-layer probability profiles for that A-scan (i.e. a cross-section of the per-layer probability maps). For this particular example, it can be observed how the different architectures affect the outcomes. Both of the complex networks show narrower peaks in the probability maps, where the peak of the probability coincides well with the true location marked by the observer.
The baseline network (CIFAR 33x33) provides a wider profile with less distinct peaks and more false positives (i.e. secondary peaks) from other layers. Overall, the large (65x65) Complex network appears to provide better results and shows a closer agreement with the experienced observer. Figure 6 presents the boundary comparison (large Complex network versus manual) for five B-scans from five different subjects, illustrating this close agreement.

Repeatability results
Repeatability error was calculated for each retinal layer. The error was calculated by comparing the thickness from one series of measurements to the other series of measurements (i.e. a comparison between the two repeated series of 6 radial scans collected on each subject). Comparisons were only made between scans of the same orientation between series. Table 3 presents both the mean thickness repeatability as well as the mean absolute thickness repeatability for the baseline (33x33 CIFAR) and large complex (65x65 Complex) networks. The mean absolute thickness repeatability shows the larger differences between networks, with the 65x65 Complex network exhibiting the best overall performance. The large Complex network shows the lowest errors across all boundaries, with a mean reduction in the mean absolute repeatability of 0.85 pixels [range 0.01 to 2.41 pixels] compared to the baseline network. For the mean absolute thickness repeatability, there is a larger margin in standard deviation, with the larger Complex 65x65 network showing substantially smaller standard deviations, indicative of more consistent performance, having smaller errors on average compared to the 33x33 CIFAR network.

Effect of CNN classes on performance
To observe the effect that the number of classes has on the performance of the different networks, all four networks were trained again using only 4 classes (background class plus three boundaries: ILM, INL/IPL and RPE) and the results were compared with those from the previous 8 classes (background class plus seven boundaries: ILM, GCL/NFL, INL/IPL, OPL/INL, ELM, ISe and RPE). Table 4 provides the summary performance for the boundary position evaluated on the 120 OCT images from 20 subjects.
The results demonstrate that the number of classes can have a negative effect on the performance of the network, since reducing the number of classes increases the boundary error. This increase is especially significant for the mean absolute error, but the standard deviation of the mean error also shows an increase. This effect is likely linked to the higher number of false positives in the probability maps from the group with a lower number of classes, which will negatively impact the graph-search method. To show this, Fig. 7 compares the probability maps for a particular B-scan, only for the three layers of interest (ILM in blue, INL/IPL in orange and RPE in red). The color in the map indicates a high probability of a boundary being present in that location. The image shows the effect on performance of training the CNN with different networks and training classes. Interestingly, for the larger and complex networks, the effect is reduced and the probability maps show a narrower path with fewer false positives, indicating the superior performance of these networks.

Tables 2 and 3 illustrate the close agreement between the automatic and manual methods as well as the repeatability of the outcomes, and generally demonstrate only small errors between the automated and manual (truth) methods on average. However, it is also important to understand the nature of the disagreement between the methods. Of the seven boundaries, the GCL/NFL generally showed the greatest differences, especially near the edges of the OCT images in the presence of retinal blood vessels, as shown in Fig. 4. To better understand this, a few examples of interest are discussed here and presented in Fig. 8. In Fig. 8, the results from the manual observer (red line) are compared with the automatic method (yellow dotted line) for the GCL/NFL boundary. For each example, a close-up of the region where the error is located is presented at the bottom of the image, along with the probability map.
For a number of these examples, the probability maps highlight two potential paths, above and below the GCL/NFL. Given that the GCL/NFL has such large topographical variations (features) across the image, the "learned features" vary significantly. For these particular examples, the paths with the false positives coincide with the regions of the NFL where there are retinal blood vessels, leading to shadows and hence reduced image contrast in the region.

Other results
In other examples shown, the ILM boundary provides a stronger feature that is traced by the graph-search, similar to the way the GCL/NFL features appear close to the foveal center.

Conclusions
This study examined the effect of patch size and CNN architecture design to provide a novel approach for the segmentation of the retinal boundaries in OCT images. The technique builds upon and extends a previously proposed method that combines CNN methods with graph-search [34]. The CNN method is used to detect the probability of the locations of the boundaries present in the OCT image. The boundary probability maps are then traced using a graph-search method to extract the position of the boundaries. In this work, a number of different CNN design aspects were proposed and their performance compared, demonstrating that CNN architecture has a significant impact on the segmentation results. Specifically, transitioning to a large input patch for the CNN seems to take advantage of the rich features and improved the performance and the detection of the retinal boundaries. This improvement was particularly evident in the reduction of the standard deviation of the error. Although no previous studies have systematically examined the influence of patch size on CNN performance for the analysis of OCT images, our findings are in agreement with previous research that has shown that a larger image patch in CNNs can lead to better classification accuracy [49][50][51]. However, it is important to note that part of this improvement may be due to the fact that a more complex network is needed to handle the change in patch size. In this work, the effect of a more complex network architecture was also evaluated, which showed a significant improvement across all layers in comparison to the original network. Although the improvement was more significant for the larger Complex network, even the smaller patch size Complex network outperformed the two tested CIFAR networks. Overall, from the results obtained on a data set containing 120 B-scans, the proposed method shows close agreement with manual analysis performed by an experienced observer.
In this study, it was also demonstrated that the performance of this architecture depends on the number of classes used during training. Similar features across the image are likely to trigger the incorrect detection of the boundaries (i.e. false positives), which will affect the graph-search and result in segmentation error. The proposed larger and more complex network showed less dependency on the number of classes and should be considered for OCT retinal boundary classification problems with fewer classes (pre-segmented retinal boundaries).
Despite the encouraging results given by the technique in this data set of images, there are a number of potential limitations that should be considered. All the B-scans had a quality index (QI) value greater than 20 dB (the mean QI from all measurements was 33 ± 3 dB) as per the manufacturer recommendations. This value is achieved through the averaging of multiple B-scans to reduce speckle noise and enhance contrast; however, B-scan averaging may not be feasible with dense volumetric scanning protocols. To ensure the method can deal with images of lower quality, two options should be explored in the future: training the CNN with labelled retinal images obtained under lower QI (fewer averaged images), or data augmentation. This second approach would involve adding noise to the data, using an OCT speckle noise model [52,53]. It is also acknowledged that none of the subjects from the data set had retinal pathologies. Although the small square network (33x33), which is used in this work as a benchmark, has already shown good results for non-exudative age-related macular degeneration [34], the use of the proposed CNN architectures and their benefit in subjects with different posterior segment pathologies should be validated in the future.
Manual analysis of retinal boundaries is a complicated and time-consuming task. The method proposed here provides encouraging results for the automated segmentation of optical coherence tomograms of the normal human retina, with a close agreement with the results from an experienced human observer. The work presented here may assist in the future development of CNN methods for retinal boundary detection.