Automatic segmentation of OCT retinal boundaries using recurrent neural networks and graph search

Abstract: The manual segmentation of individual retinal layers within optical coherence tomography (OCT) images is a time-consuming task and is prone to errors. The investigation into automatic segmentation methods that are both efficient and accurate has seen a variety of methods proposed. In particular, recent machine learning approaches have focused on the use of convolutional neural networks (CNNs). Traditionally applied to sequential data, recurrent neural networks (RNNs) have recently demonstrated success in the area of image analysis, primarily due to their ability to extract temporal features from sequences of images or volumetric data. However, their potential use in OCT retinal layer segmentation has not previously been reported, and their direct application for extracting spatial features from individual 2D images has been limited. This paper proposes the use of a recurrent neural network trained as a patch-based image classifier (retinal boundary classifier) with a graph search (RNN-GS) to segment seven retinal layer boundaries in OCT images from healthy children and three retinal layer boundaries in OCT images from patients with age-related macular degeneration (AMD). The optimal architecture configuration to maximize classification performance is explored. The results demonstrate that a RNN is a viable alternative to a CNN for image classification tasks in the case where the images exhibit a clear sequential structure. Compared to a CNN, the RNN showed a slightly superior average generalization classification accuracy. In terms of segmentation, the RNN-GS performed competitively against a previously proposed CNN based method (CNN-GS) with respect to both accuracy and consistency. These findings apply to both normal and AMD data. Overall, the RNN-GS method yielded superior mean absolute errors in terms of the boundary position, with an average error of 0.53 pixels (normal).

of-the-art performance on a data set for brain MRI segmentation. Investigating dynamic cardiac MRI reconstruction, Qin et al [46], used a novel convolutional recurrent neural network which outperformed existing techniques in terms of both speed and accuracy while also requiring fewer hyperparameters. In another paper, Xie et al [47], used a clockwork recurrent neural network (CW-RNN) [48] based architecture to accurately segment muscle perimysium.
While a RNN approach has proven successful in the analysis of a range of medical images, to the best of our knowledge, there is no previous work utilizing RNNs as an approach to segment retinal boundaries in OCT images. There is also little evidence of RNNs being applied directly to individual medical images to extract spatial features. Instead, they have been used predominantly to extract temporal features from sequences of feature maps, with convolutional neural networks (CNNs) preferred to operate spatially on each image as a prior step. In this work, a novel recurrent neural network combined with a graph search approach (RNN-GS) is presented. This combines patch-based boundary classification using RNNs with a subsequent graph search to delineate retinal layer boundaries. This approach is partly inspired by the work of Fang et al [20], but in the methodology presented here the CNN is replaced with a RNN. A detailed selection of the RNN architecture and configuration as well as the evaluation of the optimal RNN model is presented.
The paper is structured as follows. In Section 2, the RNN-GS methodology and approach is outlined, including details about the data sets used as well as the RNN model and architecture selection. Section 3 presents experimental classification results for a range of RNN architectures which were used to inform the empirical selection of a suitable RNN design. Section 4 presents the segmentation results for the selected RNN design with performance evaluated against other CNN based methods. Discussion of the method and results are provided in Section 5. Concluding remarks are included in Section 6.

Data set 1 (normal OCT images)
The first data set (data set 1) used in this work consists of a range of OCT retinal images from a longitudinal study that has been described in detail in a number of previous publications [21,49-51]. The data comprises OCT retinal scans for 101 children taken at four different visits over an 18-month period. All subjects had normal vision in both eyes and no history of ocular pathology. The images were acquired using the Heidelberg Spectralis (Heidelberg Engineering, Heidelberg, Germany) SD-OCT instrument. At each visit, each subject had two sets of six foveal-centered radial retinal scans taken. The instrument's Enhanced Depth Imaging mode was used and automatic real time tracking was also utilized to improve the signal-to-noise ratio by averaging 30 frames for each image. The acquired images each measure 1536x496 pixels (width x height). With a vertical scale of 3.9 µm per pixel and a horizontal scale of 5.7 µm per pixel, this corresponds to an approximate physical area of size 8.8x1.9 mm (width x height). These images were exported and analyzed using custom software where an automated graph based method [13,52], was used to segment seven retinal layer boundaries for each image. This segmented data was then assessed by an expert human observer who manually corrected any segmentation errors. Throughout this paper, "B-scan" refers to an individual full-size (1536x496) image while "A-scan" corresponds to a single column of a B-scan.
The seven layer boundaries within the labelled data include the outer boundary of the retinal pigment epithelium (RPE), the inner boundary of the inner segment ellipsoid zone (ISe), the inner boundary of the external limiting membrane (ELM), the boundary between the outer plexiform layer and inner nuclear layer (OPL/INL), the boundary between the inner nuclear layer and the inner plexiform layer (INL/IPL), the boundary between the ganglion cell layer and the nerve fiber layer (GCL/NFL) and the inner boundary of the inner limiting membrane (ILM). A number of parameters are also associated with each layer of the network, including the receptive field size (width x height in pixels), the number of filters, and the type of recurrent unit. Each additional filter gives the network an opportunity to learn a different pattern. In terms of the recurrent unit types, both the LSTM and GRU are considered as options within each layer. The receptive field corresponds to the volume that is processed by the RNN at each step in each sequence. For each RNN, the spatial dimensions (width and height) of the output volume are equal to the respective sequence lengths in each direction. These sequence lengths are equal to the spatial dimensions of the input volume divided by the corresponding receptive field dimensions. Meanwhile, the depth (number of channels) of the output volume is simply equal to the number of filters. For example, an 8-filter horizontal unidirectional RNN operating with an input volume size of 16x16x3 (width x height x channels) and a receptive field size of 2x2 would process a volume of size 2x2x3 at each of 8 steps (horizontally), for each of 8 sequences (vertically), with an overall output volume size of 8x8x8. A visualization of this example is illustrated in step 1 of Fig. 2. For bidirectional layers, the output volumes of each pass are concatenated together along the depth dimension as depicted in steps 2 and 3.
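To illustrate the dimension arithmetic above, the following is a minimal NumPy sketch of a horizontal unidirectional sweep over receptive-field patches. A toy randomly-initialized simple RNN cell stands in for the trained LSTM/GRU, and all names are illustrative rather than taken from the actual implementation:

```python
import numpy as np

def rnn_sweep(volume, rf=(2, 2), n_filters=8, seed=0):
    """Toy horizontal unidirectional RNN sweep over receptive-field patches.

    volume: (H, W, C) input volume; rf: receptive field size (h, w).
    Returns an output volume of shape (H//h, W//w, n_filters).
    """
    H, W, C = volume.shape
    h, w = rf
    rng = np.random.default_rng(seed)
    in_dim = h * w * C
    # Toy cell parameters (stand-ins for a trained GRU/LSTM).
    Wx = rng.standard_normal((in_dim, n_filters)) * 0.1
    Wh = rng.standard_normal((n_filters, n_filters)) * 0.1
    out = np.zeros((H // h, W // w, n_filters))
    for i in range(H // h):              # one sequence per row of patches
        state = np.zeros(n_filters)
        for j in range(W // w):          # one step per patch, left to right
            patch = volume[i*h:(i+1)*h, j*w:(j+1)*w, :].ravel()
            state = np.tanh(patch @ Wx + state @ Wh)  # simple RNN update
            out[i, j, :] = state
    return out

x = np.random.default_rng(1).random((16, 16, 3))
y = rnn_sweep(x)     # 8 steps x 8 sequences x 8 filters
print(y.shape)       # (8, 8, 8)
```

A bidirectional layer would run a second sweep right-to-left and concatenate the two output volumes along the depth dimension.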
To avoid overfitting to the training data, the dropout regularization technique was utilized [61]. Here, each layer is equipped with a level of dropout which corresponds to the percentage of units in that layer which are randomly turned off (dropped) in each epoch. Batch normalization [62], at the input to each layer was also used. This ensures that the mean and variance of each mini-batch is scaled to 0 and 1 respectively, which can help to improve the performance and stability of the network during training.
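As a minimal sketch of the standardization step, the following normalizes each feature of a mini-batch to zero mean and unit variance (the learned scale and shift parameters of full batch normalization are omitted for brevity):

```python
import numpy as np

def batch_norm(batch, eps=1e-5):
    """Standardize a mini-batch to mean 0 and variance 1 per feature.

    The learned scale (gamma) and shift (beta) parameters of full
    batch normalization are omitted here for brevity.
    """
    mu = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mu) / np.sqrt(var + eps)

b = np.random.default_rng(0).normal(5.0, 3.0, size=(1024, 32))
bn = batch_norm(b)
print(np.allclose(bn.mean(axis=0), 0, atol=1e-6),
      np.allclose(bn.var(axis=0), 1, atol=1e-3))   # True True
```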

Training and patch classification
The RNN model is trained as a classifier. This is done by constructing small overlapping patches from the OCT images and assigning each to a class based on the layer boundary that they are centered upon. These constitute the "positive" training examples. For patches not centered upon a layer boundary, Fang et al [20], utilized a single class for "negative" training examples, also called the "background" class. In this study, two background classes were used with the intention of better capturing the different background features of the OCT image, particularly some of the features and image artefacts in the retina, vitreous (anterior to the retina) and in the choroid and scleral region (posterior to the retina). The first background class consists of patches centered within the retina (between the ILM and RPE for data set 1 and between the ILM and BM for data set 2) as well as in a small region of both the vitreous and choroid directly above and below the retina respectively. The height of these smaller regions is set to be equal to the patch height. All patches within the described area that are not centered on any boundary are considered part of this class. The second background class consists of patches centered in a region bounded between the bottom of the first background class region and the bottom of the image. Zero-padding is added to any patches at the edge of images where required.
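A simplified sketch of the patch extraction with zero padding at the image edges is given below (function and parameter names are illustrative, not those of the actual implementation; the 64x32 patch dimensions adopted later in the paper are used as an example):

```python
import numpy as np

def extract_patch(image, row, col, height=64, width=32):
    """Extract a patch 'centered' on (row, col), zero-padding at the edges.

    With even patch dimensions there is no true center pixel; the target
    pixel sits just above and to the left of the central point.
    """
    padded = np.pad(image, ((height // 2, height // 2),
                            (width // 2, width // 2)))
    # (row, col) in the padded image is offset by the pad widths.
    r, c = row + height // 2, col + width // 2
    return padded[r - height // 2: r + height // 2,
                  c - width // 2: c + width // 2]

bscan = np.random.default_rng(0).random((496, 1536))
p = extract_patch(bscan, row=0, col=0)   # corner patch: mostly zero padding
print(p.shape)                           # (64, 32)
```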
Using the 'labelled data A' images as described in Section 2.1 and 2.2, patches were created for both training and validation (for both data set 1 and data set 2). Patches were created for each class with background examples randomly selected within their corresponding ranges. To reduce computational burden, the total number of patches was restricted with patches only created for every eighth column of each image. For data set 1, the training set was comprised of ~1,450,000 patches with an additional ~380,000 for validation. This was a nine-class classification problem with equally weighted classes. Similarly, for data set 2, the training set was comprised of ~980,000 patches with an additional ~320,000 for validation. This was a five-class classification problem with equally weighted classes. In an effort to maximize training performance, all patches were normalized (0-1) before they were input to the network. However, unlike previous studies [13,20,21], intensity normalization and other image pre-processing steps were not used in this work for any of the data sets.
The Adam algorithm [63], a stochastic gradient-based optimizer, was used to train the network by minimizing log loss (cross-entropy) [64]. Empirically, Adam performs well in practice with little to no parameter adjustment and compares favorably to other stochastic optimization methods [63]. Due to Adam's relatively quick convergence to good solutions in a small number of epochs, no early stopping was used for training. In addition, given the adaptive per-parameter learning rates that this optimizer possesses, no learning rate scheme was deemed to be necessary. The network was trained for 50 epochs with a batch size of 1024 and the model that yielded minimum validation loss was selected. This is similar to approaches described and used elsewhere [65,66]. The number of epochs and the batch size were chosen empirically, while the algorithm parameters were left at their recommended default values. The Keras API [67], with Tensorflow [68], backend in Python was the programming environment of choice.
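For illustration, the log loss (categorical cross-entropy) minimized during training can be sketched as follows; this is a NumPy toy rather than the Keras implementation used in this work:

```python
import numpy as np

def log_loss(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy (log loss).

    probs: (N, K) predicted class probabilities (rows sum to 1);
    labels: (N,) integer class indices (e.g. 9 classes for data set 1).
    """
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# A confident, correct prediction gives a low loss ...
good = log_loss(np.array([[0.9, 0.05, 0.05]]), np.array([0]))
# ... while a confident, wrong prediction is penalized heavily.
bad = log_loss(np.array([[0.05, 0.9, 0.05]]), np.array([0]))
print(good < bad)   # True
```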

Probability maps and graph search
For a single OCT test image, patches are generated for every pixel and passed to the trained neural network to be classified. From this, per class probability maps can be constructed and a graph search performed to delineate the layer boundaries. The idea was proposed by Chiu et al [13], for the segmentation of OCT images and has been adapted in a number of CNN studies [20,21]. However, in contrast to previous work, in this study the search path was not limited between the top and bottom layer boundaries. Each probability map can be used to construct a directed graph where the pixels in the map correspond to vertices in the graph. Each vertex is connected to its three rightmost neighbors (diagonally above, horizontally, diagonally below). The weights of these connections are given by the equation:

w = 2 - (P_s + P_d) + w_min

where P_s and P_d are the probabilities (0-1) of the source and destination vertices respectively, and w_min = 1x10^-5 is a small positive number added for system stability.
To automate the start and end point initialization, a column of maximum intensity pixels is appended to both the left and right of the image. As well as being connected to their rightmost neighbors, vertices in these columns are also connected vertically from top to bottom. This allows for a graph search algorithm, like Dijkstra's shortest-path algorithm [69], as used here, to start at the top-left corner and traverse the graph through to the bottom-right corner without any manual interaction. In this way, a graph cut is performed and this shortest path is used as the predicted location of the layer boundary.
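The graph construction and shortest-path search described above can be sketched as follows. This is an illustrative NumPy/heapq implementation with a toy probability map, not the actual implementation used in this work:

```python
import heapq
import numpy as np

def segment_boundary(prob_map, w_min=1e-5):
    """Shortest-path boundary delineation on a single probability map.

    Columns of probability-1 'pixels' are appended to the left and right
    so the search can run corner to corner with no manual initialization.
    """
    pm = np.hstack([np.ones((prob_map.shape[0], 1)), prob_map,
                    np.ones((prob_map.shape[0], 1))])
    H, W = pm.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[0, 0] = 0.0
    heap = [(0.0, (0, 0))]
    while heap:                               # Dijkstra's algorithm
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue
        # Rightward moves: diagonally above, horizontal, diagonally below.
        moves = [(r - 1, c + 1), (r, c + 1), (r + 1, c + 1)]
        # The appended columns are also connected vertically, top to bottom.
        if c in (0, W - 1):
            moves.append((r + 1, c))
        for nr, nc in moves:
            if 0 <= nr < H and 0 <= nc < W:
                w = 2.0 - (pm[r, c] + pm[nr, nc]) + w_min
                if d + w < dist[nr, nc]:
                    dist[nr, nc] = d + w
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (d + w, (nr, nc)))
    # Trace back from the bottom-right corner, dropping the added columns.
    node, path = (H - 1, W - 1), []
    while node in prev:
        path.append(node)
        node = prev[node]
    return [r for r, c in reversed(path) if 1 <= c < W - 1]

pm = np.full((6, 5), 0.01)
pm[3, :] = 0.99                 # a flat, high-probability boundary
print(segment_boundary(pm))     # [3, 3, 3, 3, 3]
```

Because every edge weight is at least w_min > 0, Dijkstra's algorithm is applicable, and the returned path hugs the high-probability row of the map.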

Comparison of methods
A comparison between the RNN-based method and a patch-based CNN method is presented. The CNN method used is identical except that the RNN is replaced by a CNN. The CNN used here is the so-called "complex CNN" proposed by Hamwood et al [21]. This is trained identically to the RNN as described in Section 2.5.
In addition, a comparison between the patch-based method and a full image-based method is also provided. The method for comparison is a fully-convolutional network and graph search method (FCN-GS). For this, the patch-based classifier network is replaced with a U-Net [70], style architecture similar to that used by Venhuizen et al [26]. The FCN used here consists of four downsampling blocks each with two 3x3 convolutional layers. The network was trained for 50 epochs with a batch size of three using Adam with default parameters in a similar way to the patch-based networks in Section 2.5. Cross-entropy loss is used here to classify each pixel of the image into one of eight area classes. These eight areas are constructed between adjacent layer boundaries and the top and bottom of the image as required to create an overall area mask. For A-scans where at least one layer boundary is not defined, the image is zeroed with the corresponding columns in the area mask set to be defined as the top-most region. The overall method is similar to that used by Ben-Cohen et al [25], where the FCN is used for semantic segmentation on whole OCT images. Instead of classifying patches to generate probability maps, the Sobel filter is applied to the area probability maps output from the FCN to extract the boundary probability maps. A shortest-path graph search is then performed using these boundary probability maps in the same way as the patch-based method.
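To illustrate the boundary extraction step, the following toy sketch applies a vertical Sobel kernel to a synthetic area probability map; it is a simplified, NumPy-only stand-in for the filtering used in practice:

```python
import numpy as np

def area_to_boundary(area_prob):
    """Vertical Sobel response of an area probability map.

    High responses mark rows where the area class switches on/off,
    i.e. the layer boundaries between adjacent regions.
    """
    k = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
    H, W = area_prob.shape
    padded = np.pad(area_prob, 1, mode="edge")
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i+3, j:j+3] * k)
    return np.abs(out)

# Synthetic area map: class "on" from row 5 down (a step edge at row 5).
a = np.zeros((10, 8))
a[5:, :] = 1.0
b = area_to_boundary(a)
print(int(np.argmax(b[:, 4])))   # 4 (rows 4/5 straddle the step edge)
```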

Evaluation
As described in Section 2.1 for data set 1 and Section 2.2 for data set 2, the images contained in labelled data B were used to evaluate the whole method. By comparing the predicted boundary positions to the truth (the segmentation from the expert human observer), the mean error and mean absolute error with their associated standard deviations were calculated for each layer across the whole test set. For data set 1, the full-width image was used for both patch creation and performing the graph search. However, due to the lack of consistency of the layers around the left and right extremities of the image (e.g. presence of optic nerve head and shadows), the first and last 100 pixels of each side were excluded from the final error calculations and comparisons. For data set 2, the full-width image was used as input to the network with a full-width probability map used for the graph search. However, as the true layer boundaries were not defined in every column, only those columns with all true boundary locations present were used for error calculations and comparisons.
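The error metrics used for evaluation can be sketched as below, with an illustrative function and synthetic boundary positions:

```python
import numpy as np

def boundary_errors(pred, truth):
    """Signed and absolute boundary-position errors (in pixels).

    pred, truth: per-column boundary rows for the evaluated columns
    (columns excluded from the comparison are assumed already removed).
    """
    err = pred - truth
    return {"mean_error": err.mean(), "std_error": err.std(),
            "mean_abs_error": np.abs(err).mean(),
            "std_abs_error": np.abs(err).std()}

truth = np.array([100.0, 101.0, 102.0, 103.0])
pred = np.array([100.5, 100.5, 102.5, 103.0])
m = boundary_errors(pred, truth)
print(m["mean_error"], m["mean_abs_error"])   # 0.125 0.375
```

Note that signed errors can cancel (here +0.5 and -0.5), which is why the mean absolute error is reported alongside the mean error.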

RNN design
In order to design a suitable RNN architecture for patch classification, the impact of various network parameters on performance was examined. This section presents the results for a range of experiments, including: the impact of patch size and direction of operation (Section 3.1), receptive field size (Section 3.2), number of filters (Section 3.3), stacking and ordering of layers (Section 3.4), and fully-connected layers (Section 3.5).
Visin et al. [38], used dropout after each layer in their network while Srivastava et al. [61], show that 50% dropout is a sensible choice for a variety of tasks. As such, 50% dropout is added after each fully-connected layer in Section 3.5. However, due to the relatively small number of parameters in the RNN layers used here, this level of dropout was deemed to be unnecessary and a potential hindrance to performance. Instead, 25% dropout is used after each RNN layer to ensure sufficient network capacity.
All networks were trained identically as described in Section 2.5. The experiments are performed only utilizing data set 1 to allow for a fair comparison with Hamwood et al. [21], and their CNN which was also tested and optimized using normal OCT images. The generalizability of each network is assessed by measuring the classification accuracy on the validation set (described as part of labelled data A in Section 2.1). This is computed by dividing the number of correctly classified patches by the total number of patches. Due to randomness associated with both the network weight initialization and batch ordering leading to possibly different solutions, each experiment was performed three times and the results were averaged. These experiments were used to inform the careful selection of the most suitable final RNN architecture that was employed, which is described in Section 3.6.

Patch size and direction
For the design of the RNN, it is of interest to investigate the effect of the patch size (height x width pixels) on the network performance. In their CNN-GS approach, Fang et al [20], used a 33x33 patch size centered on the layer boundaries, while Hamwood et al [21], showed that increasing the size can improve network performance. With this in mind, 32x32, 64x32, 32x64 and 64x64 patches are compared on a range of RNN architectures. Even-sized dimensions are chosen to facilitate the network model and to avoid additional zero padding. Because of the even size, the patch cannot be truly centered, and therefore each is consistently placed with the layer boundary positioned on the pixel above and to the left of the central point. Table 1 outlines the results of the experiments undertaken. A small but significant improvement in classification accuracy was observed when using a vertically oriented 64x32 patch (longer along the A-scan direction) compared to a 32x32 (about 1.1% mean improvement). However, this level of improvement is much less pronounced when comparing the 32x32 with the horizontally oriented 32x64 patch size (about 0.4% mean improvement). Despite possessing twice as many pixels, the 64x64 patch does not exhibit a clear performance benefit compared to the 64x32 patch (below 0.1% mean improvement). Thus, the 64x32 patch size appears to yield the best trade-off between accuracy and complexity for the tested sizes. It should be noted that other patch sizes were not tested for computational reasons. Within the ReNet layers [38], RNNs were used separately to process the input horizontally or vertically. To better understand the impact that the direction of operation has on network performance, these different options were considered.
As shown in Table 1, the direction of operation appears to have a small impact on the classification accuracy, although it is worth noting that RNNs operating in the vertical direction outperform their horizontal counterparts by a small percentage. However, operating bi-directionally does not appear to yield improved performance.

Receptive field size
The effect of the receptive field size on the network performance was also investigated. Visin et al [38], used a receptive field size of 2x2 between each of their ReNet layers. Here, a variety of square and rectangular receptive field sizes were compared on a single-layer vertical unidirectional RNN with the results outlined in Table 2. Similar to the effect of patch size described in Section 3.1, the vertical rectangular receptive fields provide a marginal improvement in performance compared to the equivalent horizontal variants, attributable to the vertical nature of the layer structure in the image. Overall, most of the tested sizes give similar performance indicating that the size of the receptive field does not have a significant impact on the accuracy for the tested data set.

Number of filters
Increasing the number of filters gives the neural network more parameters and hence more opportunity to learn. The change in classification accuracy, as the number of filters in a single layer vertical unidirectional RNN is varied, was investigated to better estimate the optimal number of filters and the impact on performance. Table 3 shows that adding more filters yields a small increase in classification accuracy, albeit with diminishing returns. For this single layer network, choosing 32 filters gives a good trade-off between accuracy and complexity.

Stacked layers and order
The ReNet architecture [38], uses several layers of RNNs, each of which first operate on the input vertically before horizontally. Here, the effect of adding additional layers to the network as well as the order that these are stacked together was evaluated. The results presented in Table 4 indicate that stacking layers improves the classification accuracy. Further, stacking both horizontal and vertical RNN layers yields greater performance than solely using vertical ones. There is no noticeable performance difference when changing the stacking order. This is also the case when using bi-directional RNNs, reinforcing the results presented in Section 3.1.

Fully-connected layers
Visin et al. [38], used one or more fully-connected (FC) output layers of size 4096 in their ReNet architecture. The effect of including a fully-connected layer in our network design was also evaluated. The results presented in Table 5 show that adding a fully-connected layer has little benefit given the corresponding drastic increase in network parameters.

RNN architecture selection
Based on the experimental findings presented in Sections 3.1-3.5, a RNN architecture was selected. An overview of this architecture is provided in Table 6. As discussed in Section 3.5, no fully-connected layers are used due to their seemingly negligible performance benefit. Two sets of vertical and horizontal bi-directional layers are used, each with a size of 32 filters (16 per direction) and 25% dropout. Because the classification is based on pixel level accuracy, the first two layers are equipped with a 1x1 receptive field to enable the network to initially process the full-sized image on a pixel by pixel basis. The subsequent layers utilize a 2x2 receptive field with the intention of allowing the network to learn context at different levels. As described in Sections 3.1-3.5, the network operates with gated recurrent units (GRUs) which were found to perform comparably to LSTMs for this problem.

Normal OCT data (data set 1)
Using normal OCT images (data set 1) as described in Section 2.1, the RNN-GS method was evaluated as described in Section 2 using the RNN architecture selected and trained as outlined in Section 3. Utilizing a 64x32 patch size, the network yielded a validation classification accuracy of 96.84% (0.05) taken as the average over three training runs. The mean accuracy of the seven boundary classes (excluding the background) was 98.25% (0.06) with the individual per-class accuracies ranging between 96.52% (0.08) (the IPL) and 99.24% (0.08) (the ILM). With the chosen patch size, the RNN architecture consisted of ~70,000 total parameters. Using an Nvidia GeForce GTX 1080Ti + Intel Xeon W-2125, the average evaluation time per B-scan was ~145 seconds. Here, the time to generate the probability maps was ~105 seconds on average with an average of ~40 seconds to perform the graph search for all seven boundaries. The segmentation results for each layer boundary are presented below in terms of the mean error and the mean absolute error as well as their standard deviations. The patch-based approach was also evaluated using the Complex CNN architecture as described in Section 2.7 using the same set of 64x32 patches. To support the patch dimensionality, a 13x5 fully-connected output layer was used. Averaged over three training runs, the CNN provided a validation classification accuracy of 96.36% (0.04), 0.48% lower than the RNN. The per-class accuracies ranged between 95.65% (0.85) (the IPL) and 99.17% (0.11) (the ILM) with a mean accuracy for the seven boundary classes of 97.94% (0.08), 0.31% lower than the RNN. This CNN architecture consisted of ~1,200,000 total parameters, approximately 17 times as many as the RNN. Using the same hardware, the average evaluation time per B-scan was approximately 65 seconds, about 2.2 times faster than the RNN.
Given the same time for the graph search (~40 seconds), this corresponds to ~25 seconds on average to generate the probability maps which is about 4.2 times faster than the RNN.
The segmentation errors in terms of boundary positions (in pixels) are presented in Table 7. The mean errors (and mean absolute errors) between methods are of similar magnitude, suggesting that the two networks give a similar level of performance, with the RNN based approach performing marginally better on each boundary (a 0.02 to 0.05 pixel improvement in mean absolute error), with the exception of the GCL/NFL (0.12 pixel improvement in mean absolute error). This corresponds to an average improvement of 0.05 pixels (mean absolute error), with RNN-GS yielding an average of 0.53 pixels mean absolute error on each boundary compared to 0.58 for CNN-GS. Both RNN-GS and CNN-GS performed best on the ISe boundary, with 0.33 and 0.35 pixels mean absolute error respectively, whereas both performed poorest on the GCL/NFL, with respective mean absolute errors of 0.84 and 0.96 pixels. The standard deviations of the errors are also consistently smaller for the RNN-GS method for each of the considered layers, indicating a greater level of consistency in the segmentation compared to the CNN-GS approach. The error profiles in Fig. 3 demonstrate consistently small errors across the central 6 mm of the B-scan for each layer, and also show a high level of similarity between the two considered methods, with the exception of the GCL/NFL where RNN-GS performed noticeably better across the entire boundary. These profiles also show that both networks exhibit a noticeable central error spike for the OPL/INL boundary, attributable to the merging of the layer boundaries at the fovea. In addition, all the layer boundaries showed a spike in error on the far right side of the profile, which corresponds to the location of the optic nerve head in a number of scans, where the retinal boundaries disappear. Some example segmentation plots for data set 1 using the RNN-GS method are displayed in Fig. 4.
The patch-based method employed here is also compared with a fully-convolutional based approach (FCN-GS) as described in Section 2.7. In terms of the boundary position error (Table 7), the FCN-GS method is comparable in accuracy to RNN-GS and CNN-GS with an average mean absolute error of 0.55 pixels compared to 0.53 and 0.58 for RNN-GS and CNN-GS respectively. However, FCN-GS shows a greater level of consistency for the segmentations with smaller standard deviations on all boundaries with the exception of the ELM. Similar to the two patch-based methods, FCN-GS showed the lowest error on the ISe and highest error on the GCL/NFL. In addition, Fig. 3 shows the error profiles of FCN-GS to be somewhat similar to the two patch-based methods. The FCN contained ~490,000 parameters, approximately 7 times more than the RNN. However, the FCN was much faster in general with a per-image probability map creation requiring about one second, approximately 100 times faster than the RNN. For per-image evaluation overall, FCN-GS was ~3.5 times faster than RNN-GS when taking the graph search into consideration.

Discussion
Similar to the findings of Visin et al [38], there was no observed performance difference between the LSTM and GRU recurrent units. Using a data set comprising normal OCT images (data set 1), the segmentation results showed the RNN based approach performed competitively in comparison to a CNN approach using the same patch-size. RNN-GS showed marginally smaller mean absolute errors for all seven layer boundaries and a greater consistency (i.e. smaller standard deviations) in the segmentation than CNN-GS. Overall, mean absolute errors of less than one pixel for all seven layer boundaries were observed, with less than half a pixel for four of those boundaries, indicating close agreement to the truth. Despite possessing 17 times fewer parameters, the evaluation time of RNN-GS was longer than that of CNN-GS.
This can be attributed to the relatively high number of operations required to process an image sequentially as is the case with the RNN.
To gauge the performance on pathological data, RNN-GS and CNN-GS were also evaluated using a data set comprising OCT images from patients exhibiting age-related macular degeneration (data set 2). The RNN showed competitive performance with smaller mean absolute errors for all three layer boundaries corresponding to a mean improvement of 0.36 pixels. In particular, CNN-GS exhibited a small number of major failure cases for the ILM boundary. These failure cases were the result of the relatively high level of noise within some B-scans where the ILM was less well defined. These failures were not evident for RNN-GS possibly indicating a greater robustness of the method in the presence of noise. Segmentation of the RPEDC and BM boundaries proved more challenging in the presence of pathological features. However, for both of these boundaries, RNN-GS exhibited marginally superior mean absolute errors and standard deviations compared to CNN-GS continuing the trend evident within the results from the normal OCT images.
It should be noted that this work did not focus on the performance of the method with regards to different ocular pathologies, with only one type of pathological data (AMD) investigated here. In addition, the RNN network architecture here was optimized using data from normal OCT images. Future work should attempt to further explore the application of this method to pathological data by extending the types of pathologies present within the data, as well as investigating an optimal network architecture for such data.
In the past, RNNs have proven useful for tasks involving sequential data whereas CNNs have had considerable success when applied to image data. Consequently, RNNs have received less attention for image classification problems. Here, the ability of an RNN to perform competitively against a CNN on such a task was investigated. RNNs are suited to sequential data, so the good performance relative to the CNN may be attributed to the sequential nature of the retinal layer structure and features. In all, RNNs provide a viable alternative to CNNs for this particular problem, even in the presence of retinal pathology and poor image quality (AMD data).
It should be noted that the 64x32 patch size used here is not necessarily optimal, with a number of alternative patch sizes not tested for computational reasons. Nonetheless, these are promising and encouraging results. Future work may investigate other patch sizes and, in particular, larger vertically-oriented rectangular patches, as these appear to give the best trade-off between performance and speed.
The patch-based approach presented here (RNN-GS) was compared to a full image-based approach utilizing a fully-convolutional network (FCN-GS). For normal OCT images (data set 1), the accuracy was comparable with RNN-GS. However, FCN-GS was more consistent in the segmentation with lower standard deviations for most boundaries. FCN-GS was also much faster in terms of evaluation, highlighting this as a possible drawback of the patch-based method, especially when time is critical (e.g. for many clinical applications where rapid segmentation performance is required). However, it should be noted that optimizing the speed of the RNN was not a focus here and should be investigated in future work.
For AMD OCT images (data set 2), the overall accuracy between RNN-GS and FCN-GS was again comparable. Like CNN-GS, FCN-GS exhibited a number of major failure cases, which were responsible for the relatively high mean absolute error and standard deviation on the ILM. On the other hand, FCN-GS was more accurate and consistent on the BM boundary. It is possible that the superior performance is a result of the greater amount of context available to the FCN while processing the whole image at once. Future work in the area may further investigate the relative performance of the patch-based method compared to full image-based methods.

Conclusions
In this paper, the RNN-GS method exhibited promising results for the segmentation of retinal layers in healthy individuals and AMD patients. In addition, RNNs have been identified as a sensible alternative to CNNs for tasks with images involving a sequence as is the case with the layer structure observed in the OCT retinal images used in this work. The results and the RNN-GS methodology presented here may assist future work in the domain of OCT retinal segmentation and highlight the potential of RNN-based methods for OCT image analysis.