An end-to-end network for segmenting the vasculature of three retinal capillary plexuses from OCT angiographic volumes

The segmentation of en face retinal capillary angiograms from volumetric optical coherence tomographic angiography (OCTA) usually relies on retinal layer segmentation, which is time-consuming and error-prone. In this study, we developed a deep-learning-based method to segment vessels in the superficial vascular plexus (SVP), intermediate capillary plexus (ICP), and deep capillary plexus (DCP) directly from volumetric OCTA data. The method contains a three-dimensional convolutional neural network (CNN) for extracting distinct retinal layers, a custom projection module to generate three vascular plexuses from OCTA data, and three parallel CNNs to segment vasculature. Experimental results on OCTA data from rat eyes demonstrated the feasibility of the proposed method. This end-to-end network has the potential to simplify OCTA data processing for retinal vasculature segmentation. The main contribution of this study is a custom projection module that connects the retinal layer segmentation and vasculature segmentation modules and automatically converts the data from three to two dimensions, thus establishing an end-to-end method to segment three retinal capillary plexuses from volumetric OCTA without any human intervention.


Introduction
Optical coherence tomography (OCT) can non-invasively provide three-dimensional (3D) images of tissue microstructure at micrometer resolution and has been widely used in ophthalmology for research and diagnosing ocular diseases [1]. OCT angiography (OCTA) is a novel imaging modality based on structural OCT. By measuring the OCT signal variation between consecutive B-scans, the intrinsic blood flow signal, down to the capillary level, can be detected and used to generate 3D images of the retinal microvasculature [2,3]. Because of the 3D, high-resolution nature of OCTA imaging, it is uniquely capable of elucidating the retinal circulatory structure in both humans [4] and model organisms [5][6][7][8] in vivo.
Many of the most important OCTA metrics, such as vessel area, skeleton density, and vessel morphological features (for example, caliber or tortuosity), rely on accurate vessel segmentation. A few studies have explored methods to reliably extract the OCTA-generated vasculature from en face images [9][10][11], but most such approaches have limitations. Many studies focused only on segmenting the superficial complex [11][12][13]. However, different diseases can affect individual plexuses differently, and the organization of the different plexuses is important for a full understanding of retinal function [14]. Effective algorithms that characterize the deep as well as the superficial plexuses are therefore also needed. Furthermore, vessel segmentation is just one step in the process of deriving a binarized two-dimensional (2D) vasculature map from volumetric OCTA data. Generating the en face images (angiograms) is equally crucial, and it requires (1) accurate segmentation of the retinal anatomic layers and (2) data projection within the segmented slab.
Many OCTA algorithms therefore need additional software support. Retinal layer segmentation in particular is non-trivial, and mis-segmentation can map flow to regions where it could be misinterpreted as pathological. Researchers have proposed a number of layer segmentation algorithms based on both conventional image processing methods [15][16][17] and deep learning approaches [18][19][20][21]. These methods, including the deep learning ones, are sometimes context-dependent, varying with location [13,22,23], disease [24][25][26], or species [24]. For example, an algorithm designed to segment a healthy human retina may not perform well on an eye with advanced disorganization of the retinal layers, or on a rat eye. After retinal slab segmentation, a projection strategy is applied within the segmented slab to produce a 2D en face image. Within the retina, maximum projection performs better than average projection on OCTA data [27]. All told, both the design choices and the accuracy of layer segmentation and projection can strongly influence vessel segmentation. Differences between vessel density studies may therefore be attributable in part not only to the vessel segmentation algorithm itself, but also to the layer segmentation and projection methods.
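To make the projection step concrete, the following is a minimal NumPy sketch of maximum versus average projection within a segmented slab. It assumes the OCTA volume is stored with axes (depth, y, x) and that the slab is given by per-A-line top and bottom boundary indices; the function name `project_slab` and this data layout are hypothetical illustrations, not the implementation used in [27] or in this study.

```python
import numpy as np

def project_slab(octa, top, bottom, mode="max"):
    """Project OCTA voxels between two layer boundaries to a 2D en face image.

    octa:   (depth, ny, nx) OCTA volume (assumed axis order)
    top:    (ny, nx) int array, first depth index of the slab (inclusive)
    bottom: (ny, nx) int array, last depth index of the slab (exclusive)
    """
    depth = octa.shape[0]
    z = np.arange(depth)[:, None, None]                 # depth index of every voxel
    inside = (z >= top[None]) & (z < bottom[None])      # boolean slab mask
    if mode == "max":
        # Voxels outside the slab become -inf so they never win the maximum;
        # A-lines with an empty slab are set to 0.
        out = np.where(inside, octa, -np.inf).max(axis=0)
        return np.where(np.isfinite(out), out, 0.0)
    if mode == "mean":
        counts = np.maximum(inside.sum(axis=0), 1)      # avoid division by zero
        return np.where(inside, octa, 0.0).sum(axis=0) / counts
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, an A-line with decorrelation values [0.1, 0.9, 0.3, 0.5] and a slab covering depth indices 1 to 2 projects to 0.9 with `mode="max"` but to 0.6 with `mode="mean"`. This gives some intuition for why maximum projection can better preserve a thin vessel that occupies only a few voxels of the slab.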
Recently, deep-learning-based methods have demonstrated tremendous success in image processing. In OCTA in particular, deep learning has shown great potential for accounting for artifacts [28,29], enhancing retinal angiograms [30], detecting vascular biomarkers [28,29,31-33], and classifying or staging retinopathy [34][35][36][37][38]. In this study, we aim to use deep learning to build an end-to-end algorithm that segments the vasculature of three retinal capillary plexuses from OCT angiographic volumes in a rodent model. With an end-to-end strategy, the process from OCTA scans (input end) to segmented capillary vasculature (output end) is fully automatic, with no manual intervention in the intermediate steps. We chose a rodent model for two reasons. First, compared to the human retina, the rodent retina has sparser vascular coverage. This can significantly improve the quality of the ground truth, since isolated vessels are easier to distinguish from the background than the closely packed capillaries of the human retina. In turn, higher confidence in the ground truth enables better quantification of the network's performance. Second, although many investigators have used OCTA to study the retinal vasculature in animal models of ocular diseases [39][40][41], very few have investigated the robustness of their vessel segmentation algorithms. This study will provide a valuable automatic processing tool for OCTA analysis in animal imaging.

Data acquisition
A total of 88 OCT data volumes were acquired from one or both eyes of 10 Brown Norway rats using a prototype 50-kHz visible-light OCT (vis-OCT) system with a full-width half-maximum bandwidth of 90 nm, from 510 to 610 nm [42]. Two to five scans were acquired from each eye. The volumetric scans covered a 2.2 × 2.2 mm² field of view. Each volume consists of 512 sampling positions along the slow axis (Y). At each Y position, three consecutive B-scans were captured, each containing 512 A-lines. The OCTA data were computed during scanning using the split-spectrum amplitude-decorrelation angiography (SSADA) algorithm [43].
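The decorrelation step can be sketched in simplified form following the published SSADA formula, in which the flow signal is one minus the amplitude correlation between adjacent repeated B-scans, averaged over spectral sub-bands and B-scan pairs. This NumPy version assumes the spectrum has already been split into M sub-bands and converted to amplitude images, stored in a hypothetical (bands, repeats, z, x) array; it is an illustration, not the acquisition software of the vis-OCT system used here. With three repeated B-scans per Y position, there are two adjacent pairs per position.

```python
import numpy as np

def ssada_decorrelation(amp, eps=1e-12):
    """Amplitude decorrelation averaged over spectral bands and B-scan pairs.

    amp: (M, N, nz, nx) OCT amplitudes from M spectral sub-bands and N repeated
         B-scans at the same slow-axis position (assumed layout).
    Returns an (nz, nx) image: near 0 for static tissue, larger where blood flows.
    """
    a_n, a_next = amp[:, :-1], amp[:, 1:]      # adjacent B-scan pairs (n, n+1)
    # Per-pair correlation term A_n * A_{n+1} / (0.5 * A_n^2 + 0.5 * A_{n+1}^2)
    ratio = (a_n * a_next) / (0.5 * a_n**2 + 0.5 * a_next**2 + eps)
    return 1.0 - ratio.mean(axis=(0, 1))       # average over bands and pairs
```

Identical repeated B-scans give a decorrelation of about 0, while an amplitude that drops to zero in the middle B-scan gives 1. The small `eps` only guards against division by zero in signal-free pixels.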

Convolutional neural network architecture
In this study, the key challenges for accurate vessel segmentation from the OCT/OCTA volume (Fig. 1 A & B) are identifying the boundaries of the retinal layers and projecting the corresponding volumetric slabs to 2D en face images within a single CNN architecture. To overcome these challenges, we designed a new convolutional neural network (Fig. 1) that contains a 3D convolutional module (Fig. 1 C), a custom projection module (Fig. 1 D), and three 2D convolutional modules (Fig. 1 E). The 3D convolutional module takes volumetric structural OCT data (Fig. 1 A) as input to identify the boundaries of each retinal layer. This module adopts a U-net-like [44] fully convolutional architecture composed of a down-sampling encoder and an up-sampling decoder, with three skip connections linking corresponding encoder and decoder layers. The custom projection module was designed to project volumetric slabs to 2D en face images of the SVP, ICP, and DCP. The last module comprises three parallel 2D convolutional networks that perform vasculature segmentation and output the three segmented retinal capillary plexuses (Fig. 1 F-H). All three subnetworks share the same architecture, DenseNet [45], and each takes one output of the custom projection module as input. The custom projection module receives the output of the preceding 3D convolutional module, namely the retinal layer segmentation result (Fig. 2 A), together with the volumetric OCTA data (Fig. 2 D). It learns a 6×1 weight vector for each vascular plexus (Fig. 2 B).
Each weight vector contains six scalars, which represent the background and five retinal layers: the nerve fiber layer (NFL) + ganglion cell layer (GCL) + inner plexiform layer (IPL), the inner nuclear layer (INL), the outer plexiform layer (OPL), the outer nuclear layer (ONL), and the retinal pigment epithelium (RPE). Then, the weights are propagated to the whole volume.
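One plausible reading of how six per-class weights can be propagated to every voxel is sketched below in NumPy: each voxel's weight is the sum of the class weights multiplied by that voxel's predicted class probabilities, the OCTA volume is scaled by this weight, and the result is collapsed along depth. The function name, the (class, depth, y, x) layout, the class order, and the use of a maximum for the final depth projection are all assumptions for illustration; in the actual network the weight vectors are trainable, and the same operations could be written with `tf.einsum` and `tf.reduce_max`.

```python
import numpy as np

# Assumed class order: background, NFL+GCL+IPL, INL, OPL, ONL, RPE

def custom_projection(layer_prob, octa, weights):
    """Hypothetical NumPy version of a learnable slab-weighting projection.

    layer_prob: (6, depth, ny, nx) per-voxel class probabilities from the 3D module
    octa:       (depth, ny, nx) OCTA volume
    weights:    (3, 6) one weight vector per plexus (SVP, ICP, DCP)
    Returns:    (3, ny, nx) en face images, one per plexus
    """
    # Propagate each 6-element weight vector to every voxel:
    # voxel_w[k, z, y, x] = sum_c weights[k, c] * layer_prob[c, z, y, x]
    voxel_w = np.einsum("kc,czyx->kzyx", weights, layer_prob)
    weighted = voxel_w * octa[None]      # scale the flow signal separately per plexus
    return weighted.max(axis=1)          # collapse depth (max projection assumed)
```

With one-hot weight vectors selecting NFL+GCL+IPL, INL, and OPL, this reduces to a maximum projection over each corresponding slab; a learned, soft weight vector instead lets neighboring layers contribute partially, which is consistent with the peaked weight profiles reported in Fig. 4.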

Dataset preparation
The dataset used in this study comprises 88 samples, each containing a structural OCT data volume (Fig. 3 A), an OCTA data volume (Fig. 3 B), a volumetric ground truth map for retinal layer segmentation (Fig. 3 C), and three ground truth maps for the retinal vascular plexus en face images (Fig. 3 D-F). To generate the volumetric ground truth map for retinal layer segmentation, we first applied an automated retinal layer segmentation algorithm [16] to segment five retinal layer boundaries; two certified graders (P.S., M.G.) then corrected the segmentation errors, which took about 20 minutes per case. A third grader (Y.G.) reviewed all corrections to ensure that no obvious segmentation errors remained. After retinal layer segmentation, we used maximum value projection [27] to produce an en face OCT angiogram for each plexus. Three experts independently delineated the retinal vasculature in each en face angiogram, and the final ground truth map was obtained by combining the three manual gradings with a pixel-wise voting method.
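The text does not spell out the voting rule, so the sketch below assumes a simple pixel-wise majority: a pixel is labeled as vessel when at least two of the three graders marked it. The function name `majority_vote` is hypothetical.

```python
import numpy as np

def majority_vote(masks):
    """Combine binary grader masks by pixel-wise majority voting.

    masks: sequence of equally shaped binary arrays (1 = vessel), e.g. 3 graders.
    A pixel is labeled vessel when more than half of the graders marked it.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)            # number of graders marking each pixel
    return votes * 2 > len(masks)        # strict majority
```

With an odd number of graders, a strict majority can never tie, which is one practical reason to use three graders for this kind of fusion.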

Training settings
In the proposed network, the 3D and 2D convolutional modules perform different tasks. The 3D convolutional module performs multi-class segmentation and outputs the locations of the retinal layers. Each 2D convolutional module performs binary segmentation and outputs the retinal vessel segmentation result for its plexus. The training loss in both modules was calculated as a weighted cross-entropy:

$$L = -\sum_{i=1}^{C} \omega_i \, y_i \log(p_i) \qquad (1)$$

where $C$ is the number of classes, $y_i$ is the ground truth for class $i$, $p_i$ is the probability of class $i$ predicted by the network, and $\omega_i$ is the weight of class $i$. In the 3D module, the weight was set to 1 for the background and 2 for all retinal layers; in the 2D modules, it was set to 1 for the background and 2 for vessels. We used the Adam optimizer [46] to minimize the loss during training, with an initial learning rate of 0.001. The batch size was set to 2 as a compromise imposed by hardware limitations, and the maximum number of training epochs was set to 1000. A global learning rate decay strategy reduced the learning rate by 90% when the validation loss showed no decrease (a difference between the losses of two epochs lower than 0.0001) for 5 consecutive epochs. An early-stopping strategy stopped training when the loss showed no decrease (by the same 0.0001 criterion) for 10 consecutive epochs. The dataset was split into 66 (75%) cases for training, 10 (11%) for validation, and 12 (14%) for testing. Due to hardware limitations, the samples were randomly cropped to 84 × 360 × 84 pixels (width × height × depth) before being fed to the network.
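As a worked illustration of Eq. (1), here is a minimal NumPy sketch of the weighted cross-entropy for one-hot labels. Averaging the per-pixel loss over all pixels or voxels is an assumption (the text does not state the reduction), and the function name is hypothetical.

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, class_weights, eps=1e-7):
    """Mean over pixels of L = -sum_i w_i * y_i * log(p_i), as in Eq. (1).

    y_true:        (..., C) one-hot ground truth
    p_pred:        (..., C) predicted class probabilities (e.g. softmax output)
    class_weights: (C,) e.g. [1, 2] for background/vessel in the 2D modules,
                   or [1, 2, 2, 2, 2, 2] for background and five layers in 3D
    """
    y = np.asarray(y_true, dtype=float)
    w = np.asarray(class_weights, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0)   # avoid log(0)
    per_pixel = -(w * y * np.log(p)).sum(axis=-1)
    return per_pixel.mean()
```

For example, a vessel pixel and a background pixel, both predicted at probability 0.5, contribute 2 ln 2 and ln 2 respectively, for a mean loss of 1.5 ln 2. Doubling the weight of the foreground classes in this way counteracts the fact that background pixels vastly outnumber vessel or thin-layer pixels.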
We implemented our network in Python 3.7 with TensorFlow on a PC with an Intel i7 CPU, an NVIDIA TITAN RTX graphics card, and 64 GB of RAM.
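The learning rate decay and early-stopping rules described above can be replayed on a sequence of per-epoch losses with the pure-Python sketch below. It follows the text's definition of "no decrease" (a drop of less than 0.0001 relative to the previous epoch) and, as a simplifying assumption, applies both rules to the same loss sequence. The function name is hypothetical. Keras offers comparable off-the-shelf callbacks (`tf.keras.callbacks.ReduceLROnPlateau` with `factor=0.1, patience=5, min_delta=1e-4`, and `tf.keras.callbacks.EarlyStopping` with `patience=10, min_delta=1e-4`), but those compare against the best value seen so far rather than the previous epoch, so their behavior can differ slightly.

```python
def plateau_schedule(losses, lr=1e-3, decay_patience=5, stop_patience=10,
                     min_delta=1e-4, factor=0.1):
    """Replay the decay and early-stopping rules on a list of per-epoch losses.

    Returns (learning rate used in each epoch, index of the last epoch run).
    """
    lrs = []
    decay_wait = stop_wait = 0
    prev = None
    for epoch, loss in enumerate(losses):
        lrs.append(lr)                          # rate used during this epoch
        improved = prev is None or (prev - loss) >= min_delta
        prev = loss
        if improved:
            decay_wait = stop_wait = 0
            continue
        decay_wait += 1
        stop_wait += 1
        if stop_wait >= stop_patience:          # early stopping
            return lrs, epoch
        if decay_wait >= decay_patience:        # cut the learning rate by 90%
            lr *= factor
            decay_wait = 0
    return lrs, len(losses) - 1
```

For a loss curve that flattens after the second epoch, the rate drops from 1e-3 to 1e-4 after five stagnant epochs, and training stops after the tenth stagnant epoch.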

Weight vectors in the custom projection layer
To verify that the custom projection layer works as expected, we plotted its learned weight vectors after training (Fig. 4). The weight vector for the SVP (Fig. 4, red line) peaks at NFL + GCL + IPL, indicating that the OCTA data in the NFL + GCL + IPL slab contributed the most flow signal to the SVP. Similarly, the INL slab contributed most to the ICP, and the OPL slab contributed most to the DCP.

Performance validation metrics
We used five-fold cross-validation to evaluate the performance of our network on the entire dataset. To quantify performance, we calculated three measures, specificity, sensitivity, and F1-score (Eq. (2)), on the results of both retinal layer segmentation and retinal capillary segmentation:

$$\text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{F1} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (2)$$

where TP is the number of true positives (correctly predicted target pixels), TN is the number of true negatives (correctly predicted non-target pixels), FP is the number of false positives (non-target pixels wrongly predicted as target), and FN is the number of false negatives (target pixels wrongly predicted as non-target).
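The three measures in Eq. (2) can be computed from a predicted and a ground truth binary mask as in the NumPy sketch below. For multi-class retinal layer segmentation, one would presumably evaluate each layer one-versus-rest (for example, passing `pred_labels == k` and `true_labels == k`); that per-class treatment and the function name are assumptions for illustration.

```python
import numpy as np

def seg_metrics(pred, truth):
    """Specificity, sensitivity, and F1-score (Eq. (2)) for one binary target."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)        # target predicted as target
    tn = np.sum(~pred & ~truth)      # non-target predicted as non-target
    fp = np.sum(pred & ~truth)       # non-target predicted as target
    fn = np.sum(~pred & truth)       # target predicted as non-target
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return specificity, sensitivity, f1
```

Note that TN does not appear in the F1-score, so, unlike specificity, F1 is not inflated by the large number of easy background pixels in sparse vessel maps.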

Performance on retinal layer segmentation
Table 1 summarizes the performance of the retinal layer segmentation. Specificity was high for all retinal layers, while sensitivity was lower in the INL and OPL. This may be because the INL and OPL occupy a smaller fraction of the volume than the other layers, which makes them more vulnerable to segmentation errors. Because the F1-score accounts for both false positives and false negatives, it is likely a better indicator of overall network performance. Shadows cast by large vessels are a notable challenge in retinal layer segmentation. Because the 3D convolutional module in our network extracts context from the volumetric data, the network was robust in areas with strong shadow artifacts caused by large vessels (Fig. 5 A). With accurate segmentation from the 3D convolutional module, the custom projection module can generate high-quality 2D en face images of the retinal capillary plexuses (Fig. 5 B-D).

Performance in segmentation of three retinal capillary plexuses
Specificity approached 1 in each of the three retinal capillary plexuses, indicating that the network distinguishes noise and background from flow signal well (Table 2). Sensitivity was lower in the ICP and DCP, which may be due to the relatively low layer segmentation accuracy in the INL and OPL. Compared with the SVP, the vasculature in the ICP and DCP has lower contrast and higher noise (Fig. 6 A1-C1). Moreover, because the ICP lies at the junction between the SVP and DCP, its vasculature appears discontinuous (Fig. 6 B), which may also contribute to the lower sensitivity of the proposed method.

Vessel density quantification
We quantified vessel density from the output of our network and compared it with that of the ground truth maps on the test set. To increase the number of samples for quantification, we split each en face plexus image into four equal parts and calculated the mean vessel density in each part. Consistent with the high performance of our method on the SVP, the SVP vessel densities agreed very closely (Fig. 7 A). Although our method shows relatively lower accuracy for the ICP and DCP (Table 2), the vessel densities in the ICP and DCP were also highly consistent (Fig. 7 B, C).
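A minimal sketch of the quadrant-based quantification is shown below. It assumes that vessel density is the fraction of vessel pixels in a binary en face map and that the "four equal parts" are the image quadrants; both the definition and the function name are assumptions, since the text does not give them explicitly.

```python
import numpy as np

def quadrant_vessel_density(mask):
    """Vessel density (fraction of vessel pixels) in each quadrant of an en face mask.

    Returns [top-left, top-right, bottom-left, bottom-right].
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    hy, hx = h // 2, w // 2
    quads = [mask[:hy, :hx], mask[:hy, hx:], mask[hy:, :hx], mask[hy:, hx:]]
    return [float(q.mean()) for q in quads]
```

Applying this to both the network output and the ground truth for each test image yields paired density values, whose agreement could then be summarized, for example, with `np.corrcoef` or a Bland-Altman analysis.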

Discussion
We used a deep convolutional network to design an end-to-end method that automatically segments all three retinal plexuses (SVP, ICP, and DCP) in visible-light OCT/OCTA data from rat eyes. Before this work, several automated retinal layer segmentation methods [15,17-21] and vessel segmentation algorithms [47][48][49] had been developed separately. However, those vessel segmentation algorithms relied on a device-specific layer segmentation algorithm to generate the 2D en face angiograms. In particular, Li et al. [50] proposed a deep-learning-based method that performs segmentation directly from 3D OCTA data to produce a 2D en face image, but they segmented only the large vessels of the SVP and not the capillaries. To the best of our knowledge, this is the first end-to-end method to segment all three retinal capillary plexuses from volumetric OCTA data. Our method contains three modules: a 3D convolutional module, a custom projection module, and a 2D convolutional module. The 3D convolutional module adopts a U-net-like architecture. With skip connections between the encoder and decoder, the network can reuse lower-level features to help generate high-resolution segmentation results and mitigate vanishing gradients during training [44]. The custom projection module, which bridges the 3D and 2D convolutional modules, is key to this method because it removes human intervention and integrates the whole process. Comparing the weight vectors learned by the custom projection module (Fig. 4) with vessel density by depth in the rat eye [42] shows that it worked as expected. The 2D convolutional module contains three parallel subnetworks that allow separate segmentation of the three retinal capillary plexuses.
Our results indicate that this network performs well (F1-score > 90%) for vessel segmentation in the SVP. In the ICP and DCP, however, performance deteriorated. With increasing depth, confounding factors introduced by low OCT signal strength become more prevalent and interfere with network performance. Sensitivity in the ICP was particularly low, and we believe that additional factors contribute to this deterioration. First, as the junction between the SVP and DCP, the ICP vasculature in rats mainly comprises vertical vessels that connect the SVP (dominated by arteries and arterioles) and the DCP (dominated by capillaries). These inter-plexus vessels appear in en face images only as single dots, so the ICP appears very sparse and discontinuous (Fig. 6 B) rather than forming a complete vascular network like the SVP or DCP [42]. The resulting disconnected vascular morphology makes segmentation harder for the network and manual delineation harder for human graders. Second, because of the ICP's greater depth, flow signal attenuation is stronger, which may affect the performance of both the network and manual grading and could further decrease agreement with the ground truth.
We have also demonstrated that a deep-learning-based method can be used to build an end-to-end pipeline that segments the three retinal plexuses from OCTA volumes without manual assistance, which can greatly reduce the time required for data processing in animal models. The method presented here is a useful research tool in its own right, since animal models play an important role in ophthalmic research and will almost certainly continue to do so for the foreseeable future.
We also believe that the strategy outlined here will eventually be applicable to clinical (human) data sets. Human retinas share a similar structure with rat retinas in both anatomical layers and vascular organization, so a model trained to perform a similar function for humans is eminently feasible. However, some challenges must be solved before this application becomes a reality. The human retinal vasculature is denser than that of the rat, which may make it more difficult to generate a ground truth data set as accurate as the one for the rat eye. Furthermore, to achieve clinical relevance, any network used on human patients will have to demonstrate robust performance across a wide variety of diseases. Finally, clinical data sets often contain images of poor quality, and even the highest-quality images are unlikely to be as clear as the rat retinal images used in this study, because human subjects are imaged without anesthesia. Fortunately, clinical data are abundant. This will allow training with larger datasets, which may offset the impact of data diversity on network performance.
It is worth noting that, although the overlap between the ground truth and the network output was imperfect, vessel density measured from both indicates that the network's segmentation result did not adversely affect our ability to quantify vessel density. While the overlap between the output and the ground truth is clearly important, small shifts in vessel location may reflect the difficulty of establishing the true location of a vessel in a pixelated image as much as they reflect network performance. Additionally, few diagnostic criteria are concerned with the precise pixel-scale location of a vessel; instead, summary statistics such as vessel density are used for retinopathy characterization [51][52][53][54]. By avoiding manual segmentation, our results should be unambiguously transferable between measurements in different contexts. Finally, given the scalability of the convolutional network, we expect that accuracy can be improved by increasing the dataset size and optimizing the network architecture. This approach may help improve segmentation in the more difficult layers in the future.

Conclusions
In summary, we proposed a deep learning method for vessel segmentation in all three retinal capillary plexuses of the rat eye from visible-light OCT/OCTA. The network segments five retinal layers using a 3D convolutional module, projects 3D OCTA data to 2D en face OCTA images using a custom projection module, and segments three retinal capillary plexuses using a 2D convolutional module. Together, these three modules form an end-to-end workflow for vessel segmentation of the retinal plexuses. The high performance shown here indicates that this approach can replace complex data processing procedures and reduce errors caused by manual processing.

Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.