ChoroidNET: A Dense Dilated U-Net Model for Choroid Layer and Vessel Segmentation in Optical Coherence Tomography Images

Understanding the changes in choroidal thickness and vasculature is important for monitoring the development and progression of various ophthalmic diseases. Accurate segmentation of the choroid layer and choroidal vessels is critical to better analyze and understand the choroidal changes. In this study, we develop a dense dilated U-Net model (ChoroidNET) for segmenting the choroid layer and choroidal vessels in optical coherence tomography (OCT) images. The performance of ChoroidNET is evaluated using an OCT dataset that contains images with various retinal pathologies. Overall Dice coefficients of 95.1 ± 0.4 and 82.4 ± 2.4 were obtained for choroid layer and vessel segmentation, respectively. Comparisons with state-of-the-art models show that ChoroidNET produces results that are consistent with the ground truths and is the most robust of the compared segmentation frameworks.


I. INTRODUCTION
In optical coherence tomography (OCT) images, the choroid is a dense vascular layer between the retina and the sclera. It comprises choroidal vessels (luminal area) embedded in elastic connective tissues (stromal area). Its main function is to supply oxygen and nourishment to the outer retina. The thickness and vascularity index of the choroid are choroidal biomarkers [1] that facilitate the diagnosis, prognosis, and treatment of various ophthalmic diseases or their pathological conditions, such as age-related macular degeneration (AMD) [2], choroidal neovascularization (CNV), a pathology that occurs in wet AMD [3], diabetic macular edema (DME) [4], and retinitis pigmentosa [5], [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Sabah Mohammed.
OCT is a non-invasive imaging technique that captures a cross-sectional view of the retina, including the choroid. With rapid development in optical imaging technology, enhanced depth imaging OCT (EDI-OCT) and swept-source OCT (SS-OCT) enable better visualization of the choroid than the conventional spectral-domain OCT (SD-OCT) [7], [8]. Figure 1 shows an OCT image of the components in the choroid layer, i.e., the upper boundary of the choroid (blue dashed line), the lower boundary of the choroid (green dashed line), the choroidal vessels, and the stromal area.
The choroid has an inhomogeneous texture because it contains vessels. The contrast between the choroid and the sclera is usually low in an OCT image, and thus the lower boundary of the choroid, called the choroid-sclera interface (CSI), is fuzzy and difficult to distinguish from the choroid. The choroid layer and choroidal vessels in OCT images must thus be manually annotated, which is time-consuming, error-prone (due to indistinct vascular structures), and subject to interobserver variability. Although some automated segmentation approaches are available, individual applications are required for segmenting the choroid layer and choroidal vessels. To the best of our knowledge, this study is the first to combine the segmentation of the choroid layer and vessels, which is clinically important. This is done using a deep learning model trained on eyes with various pathologies. In this work, we propose a dense dilated U-Net model called ChoroidNET for segmenting the choroid layer and choroidal vessels in OCT images. The automatic extraction of these regions would greatly assist ophthalmologists in diagnosis and treatment monitoring. ChoroidNET uses the U-Net [9] as a backbone architecture and integrates dilated convolutions with different dilation rates (factors) at the bottleneck. The dense connection of dilated convolutions exploits image contexts at multiple scales and improves segmentation performance. The experimental results demonstrate that ChoroidNET significantly outperforms existing state-of-the-art methods. We perform ablation studies to confirm the performance of ChoroidNET. Our ultimate goal is to automatically quantify clinical parameters that can be derived from the choroid, such as the luminal-to-stromal ratio and choroidal thickness.
The main contributions of this work are: 1) development of ChoroidNET for segmenting the choroid layer and choroidal vessels; 2) highlighting of the use of dilated convolutions in both layer and vessel segmentation; 3) robust segmentation of OCT images of eyes with various retinal pathologies; 4) qualitative and quantitative evaluation using manually annotated ground truths to determine the reproducibility of ChoroidNET.

II. LITERATURE REVIEW
Many traditional image processing methods have been proposed for fully or semi-automatically segmenting the choroid layer in various retinal diseases. A graph search algorithm [10], graph cuts and dynamic programming [11], [12], min-cut max-flow graph theory [13], the Dijkstra shortest path algorithm [14]-[16], active contours [17], and the level set method [18] have been applied to detect the choroid boundaries in OCT images. Poor robustness is the main drawback of choroid segmentation techniques based on traditional image processing, since they are highly sensitive to severe pathologies in the images. Deep learning has gained increasing interest in medical image processing research, and several deep learning models were recently applied to choroid segmentation to improve performance. To detect the choroid boundaries, Sui et al. [19] presented a multi-scale convolutional neural network (CNN) to learn the edge weights in a graph searching approach. Masood et al. [20] performed automatic choroid segmentation using a patch-based CNN and morphological operations. A variety of deep learning models, such as CNNs, residual networks, recurrent neural networks, and squeeze-and-excitation networks, were explored in the choroid segmentation works of Kugelman et al. [21] and Alonso-Caneiro et al. [22]. They investigated the effects of patch size, network architecture, and image pre-processing techniques on their patch-based and semantic segmentation networks. Chen et al. [23] used two SegNet models [24] to generate edge probability maps for Bruch's membrane (BM) and the CSI. Then, seam carving was applied to obtain a full choroid layer by finding a path of connected pixels between BM and the CSI. Devalla et al. [25] presented the dilated-residual U-Net model (DRUNET) for segmenting optic nerve head tissues, which contain the choroid, in OCT images of glaucomatous and healthy eyes. Zhang et al. [26] infused a biomarker prior into a global-to-local network (BIONET) for choroid segmentation. BIONET is composed of 1) a biomarker infused prediction network that learns the biomarker features, 2) a global multi-layered segmentation module that initially segments all layers (retinal layers and the choroid layer) in OCT images, and 3) a local choroid segmentation module that segments the choroid using the result from the global module and the learned biomarker features. Hsia et al. [27] segmented the choroid layer using a mask region-based CNN model composed of a deep residual network and a feature pyramid network. The choroid segmentation performance of deep learning models is very competitive and generally better than that of traditional image processing techniques.
Algorithms based on traditional image processing techniques have also been developed for choroidal vessel segmentation. Srinath et al. [28] initially defined the retinal pigment epithelium (RPE) by finding the brightest region and the CSI by calculating the structural similarity index between the choroid and the sclera. Then, choroidal vessels were segmented in the region between the RPE and the CSI using the level set method.
Recently, Liu et al. [29] presented a deep-learning-based choroidal vessel segmentation model adapted from RefineNet [30]. There have been attempts to obtain the choroidal thickness and vasculature from SS-OCT images. Zheng et al. [31] detected the choroid's upper and lower boundaries in OCT images using the Residual U-Net model [32] and then performed binarization to detect the choroidal vessels using Niblack's algorithm. Zhou et al. [33] applied an attenuation correction approach to compensate for the attenuated light in SS-OCT images as a pre-processing step in choroid segmentation. Then, choroidal vessel maps, which enable the choroidal vasculature to be visualized without OCT angiography, were generated using a projection of OCT structural information.
Table 1 summarizes the performance, datasets, advantages, and drawbacks/limitations of some existing choroid layer and vessel segmentation methods.

III. METHODS
This section describes the network architecture and components of ChoroidNET. ChoroidNET is a patch-based model that adopts the structure of U-Net [9]. To denoise and enhance the contrast of OCT images, we preprocess each extracted patch using normalized gamma-corrected contrast-limited adaptive histogram equalization [34]. Training is then performed using the preprocessed patches. During training, we also perform data augmentation, which includes affine transformation, horizontal flipping, random distortion, and zooming.
A. NETWORK ARCHITECTURE
Figure 2 shows the network architecture of ChoroidNET. The model comprises a layer segmentation module (LSM) and a vessel segmentation module (VSM). Each module consists of an encoder path, a decoder path, and a dilation block, which uses dense dilated convolutions instead of standard convolutions. The blocks used in the network are defined as follows. A standard (purple) block corresponds to the resulting activation map from two consecutive 3 × 3 standard convolutions. All layers in a standard block are regularized by DropBlock (DB) [35], batch-normalized (BN) [36], and activated by a rectified linear unit (ReLU) [37]. A gray block represents the activation map forwarded from the encoder path that is concatenated with the corresponding up-sampled map in the decoder path. The red and yellow blocks, at the bottleneck of LSM and VSM, respectively, are dilation blocks that comprise six dilated convolutions with different dilation factors. These dilation blocks help ChoroidNET to overcome the loss of detailed spatial information and the difficulty in extracting contextual semantic features. ChoroidNET thus has a good segmentation accuracy, resulting in smooth boundaries of the choroid layer. The intersection area of the input patch and the choroid layer prediction from LSM is fed into VSM to obtain a more consistent segmentation of choroidal vessels.
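The hand-off from LSM to VSM can be pictured as a masking step. The following is a minimal sketch, not the authors' implementation: the function name `mask_patch_with_layer`, the 0.5 threshold, and the use of nested lists as images are assumptions made purely for illustration.

```python
def mask_patch_with_layer(patch, layer_prob, threshold=0.5):
    """Keep patch intensities only where the layer prediction is positive.

    patch      -- 2D list of pixel intensities (the input patch)
    layer_prob -- 2D list of choroid-layer probabilities from LSM (same shape)
    threshold  -- assumed cut-off for binarizing the probabilities
    """
    if len(patch) != len(layer_prob):
        raise ValueError("patch and layer_prob must have the same height")
    masked = []
    for row_pix, row_prob in zip(patch, layer_prob):
        if len(row_pix) != len(row_prob):
            raise ValueError("patch and layer_prob must have the same width")
        # Pixels outside the predicted choroid layer are set to zero.
        masked.append([pix if prob >= threshold else 0
                       for pix, prob in zip(row_pix, row_prob)])
    return masked


patch = [[10, 20, 30],
         [40, 50, 60]]
layer_prob = [[0.1, 0.9, 0.8],
              [0.2, 0.7, 0.3]]
vsm_input = mask_patch_with_layer(patch, layer_prob)
print(vsm_input)  # [[0, 20, 30], [0, 50, 0]]
```

Restricting the VSM input in this way removes dark regions outside the choroid (for example, retinal fluid) before vessel segmentation is attempted.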

1) ENCODER PATH
At each level of the encoder path, the number of feature vectors is doubled. Thus, the bottommost level of the encoder path generates high-level semantic features. The purpose of the encoder path is to capture the contextual information of the input patches. This information is then fed to the decoder path through skip connections [38].

2) DECODER PATH
After each level of the decoder path, a 2 × 2 up-sampling operation is applied to restore the image to its original size.
The purpose of the decoder path is to perform semantic segmentation by concatenating up-sampled outputs and the contextual information transferred from the encoder path via skip connections. The features generated by the dilation block are added to achieve multi-scale context aggregation. Finally, a 1 × 1 convolution and a sigmoid activation are applied to obtain the pixel-wise binary segmentation.
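To make the encoder/decoder bookkeeping concrete, the sketch below traces the spatial size and filter count through a U-Net-style network for a 224 × 224 patch. The depth of four levels, the base filter count of 32, and the 2 × 2 down-sampling between encoder levels are assumptions for illustration only; the text does not state these values for ChoroidNET.

```python
def unet_shapes(size=224, base_filters=32, depth=4):
    """Trace (spatial size, filters) through a U-Net-style encoder and decoder.

    base_filters and depth are hypothetical values, not taken from the paper.
    """
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2**depth")
    encoder = []
    s, f = size, base_filters
    for _ in range(depth):
        encoder.append((s, f))
        s //= 2   # assumed 2 x 2 down-sampling between encoder levels
        f *= 2    # number of filters doubles at each encoder level
    bottleneck = (s, f)
    decoder = []
    for _ in range(depth):
        s *= 2    # 2 x 2 up-sampling in the decoder path
        f //= 2   # number of filters halves after up-sampling
        decoder.append((s, f))
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes()
print(encoder)     # [(224, 32), (112, 64), (56, 128), (28, 256)]
print(bottleneck)  # (14, 512)
print(decoder)     # [(28, 256), (56, 128), (112, 64), (224, 32)]
```

Each decoder level has the same size and filter count as the encoder level it is concatenated with, which is what makes the skip connections possible.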

3) DILATED CONVOLUTIONS
A dilated convolution refers to a convolution conducted with a dilated filter. Yu and Koltun [39] and Chen et al. [40] reported that dilated convolutions, which insert zeros between the pixels of the kernel, can be used instead of down-sampling operations to expand the receptive field without degrading the resolution of intermediate feature maps. Consider a convolutional kernel $K_l$ with a kernel size of $k \times k$ in dilated layer $l$. The receptive field $F_{K_l}$ of kernel $K_l$ can be calculated as:

$$F_{K_l} = k + (k - 1)(D_{K_l} - 1)$$

where $D_{K_l}$ denotes the dilation rate of kernel $K_l$. Figure 3 shows how the dilated convolutions adaptively enlarge the field of view by increasing the dilation rates.
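As an illustration of how dilation enlarges the area covered by a kernel, the sketch below uses the standard relation that a kernel with k taps and dilation rate D spans k + (k − 1)(D − 1) positions, and applies a one-dimensional dilated convolution to a short signal. It is a simplified, pure-Python example written for this explanation; the function names are hypothetical and it is not the layer used in ChoroidNET.

```python
def effective_kernel_size(k, dilation):
    """Extent covered by a k-tap kernel with (dilation - 1) zeros between taps."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are applied to samples that are `dilation` positions apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


for d in (1, 2, 3):
    print(d, effective_kernel_size(3, d))  # 1 3, then 2 5, then 3 7

signal = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # [9, 12, 15]
```

The number of weights stays at three in every case; only the spacing between the taps changes, which is why the receptive field grows without additional parameters.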
For the dilation block of LSM, we increase the dilation factors in increments of 2. We experimentally found that this increasing order of dilation factors yields better performance in the choroid layer segmentation. However, aggressively increasing dilation factors is less effective for small objects such as choroidal vessels. Dilated convolutions with increasing dilation factors lead to weak spatial consistency between neighboring pixels and thus fail to aggregate local features. To address this, Hamaguchi et al. [41] used a local feature extractor after large contexts are aggregated by increasing the dilation factors. The local feature extractor helps to extract local features by decreasing the dilation factors. Inspired by this concept, for the dilation block of VSM, we first increase the dilation factors gradually and then decrease them to recover consistency between neighboring pixels.
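The effect of the two dilation schedules can be compared by computing the receptive field of a stack of stride-1 convolutions, which grows by (k − 1) × D for each layer with dilation rate D. In the sketch below, the rate lists `lsm_rates` and `vsm_rates` are hypothetical examples of an "increasing in steps of 2" schedule and an "increase then decrease" schedule with six layers each; the actual rates used in ChoroidNET are not given in the text, and the dense connections are ignored.

```python
def stacked_receptive_field(dilations, k=3):
    """Receptive field of sequential stride-1 k x k convolutions."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d  # each layer widens the field by (k - 1) * dilation
    return rf


lsm_rates = [1, 3, 5, 7, 9, 11]  # hypothetical: increasing in increments of 2
vsm_rates = [1, 2, 3, 3, 2, 1]   # hypothetical: increase, then decrease

print(stacked_receptive_field(lsm_rates))  # 73
print(stacked_receptive_field(vsm_rates))  # 25
```

Under these assumed rates, the monotonically increasing schedule covers a much wider context, which suits a large structure such as the choroid layer, while the increase-then-decrease schedule keeps the context smaller and ends with densely sampled kernels, which suits small vessels.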

4) DROPBLOCK
Dropout [42] is a widely used regularization technique for fully connected networks. It prevents the overfitting caused by co-adaptation on the training dataset by reducing the complexity of the network architecture and randomly dropping out independent features. However, this technique is less effective for convolutional networks, where the features are spatially correlated, because semantic information can still leak through the network. Thus, Ghiasi et al. [43] proposed DropBlock [35], which is a form of structured dropout, for effectively regularizing convolutional networks. We apply DropBlock to prevent our network from overfitting and to effectively remove semantic information. Figure 4 shows how DropBlock discards some contiguous regions that contain certain semantic information from a feature map of the choroid layer.
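The idea of dropping contiguous regions rather than independent activations can be sketched in a few lines. The example below is a simplified, pure-Python illustration of the DropBlock scheme on a single 2D feature map, under the assumptions of an odd `block_size` and block centres sampled only where the whole block fits; the function name, default values, and list-based representation are hypothetical and do not reflect the settings used in ChoroidNET.

```python
import random


def drop_block(feature_map, block_size=3, keep_prob=0.9, rng=None):
    """Zero out contiguous block_size x block_size regions of a 2D feature map.

    Returns (output, mask); mask is 1 where activations are kept, 0 where dropped.
    """
    if block_size % 2 == 0:
        raise ValueError("this sketch assumes an odd block_size")
    rng = rng or random.Random()
    h, w = len(feature_map), len(feature_map[0])
    if block_size > min(h, w):
        raise ValueError("block_size must not exceed the feature map size")
    valid_h, valid_w = h - block_size + 1, w - block_size + 1
    # Probability of a position becoming a block centre, chosen so that
    # roughly (1 - keep_prob) of the activations end up dropped.
    gamma = (1.0 - keep_prob) / block_size ** 2 * (h * w) / (valid_h * valid_w)
    half = block_size // 2
    mask = [[1] * w for _ in range(h)]
    for r in range(half, h - half):
        for c in range(half, w - half):
            if rng.random() < gamma:
                for rr in range(r - half, r + half + 1):
                    for cc in range(c - half, c + half + 1):
                        mask[rr][cc] = 0
    kept = sum(map(sum, mask))
    # Rescale the surviving activations to preserve the overall magnitude.
    scale = (h * w) / kept if kept else 0.0
    output = [[feature_map[r][c] * mask[r][c] * scale for c in range(w)]
              for r in range(h)]
    return output, mask


ones = [[1.0] * 8 for _ in range(8)]
output, mask = drop_block(ones, block_size=3, keep_prob=0.5,
                          rng=random.Random(0))
for row in mask:
    print("".join(str(v) for v in row))
```

Because whole blocks are removed, the network cannot recover the dropped information from neighbouring, spatially correlated activations, which is the leakage problem of ordinary dropout described above.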

IV. EXPERIMENTS
This section describes the datasets used in the experiment, existing state-of-the-art models used for comparison, and the experiment and its implementation details.

A. DATASETS
Kermany et al. [44] published a large OCT dataset that contains approximately 80,000 images. These images were acquired via spectral-domain OCT (SD-OCT; Spectralis, Heidelberg Engineering) and collected from the Shiley Eye Institute of the University of California San Diego, the California Retinal Research Foundation, Medical Center Ophthalmology Associates, Shanghai First People's Hospital, and the Beijing Tongren Eye Center. This dataset was constructed to evaluate methods for classifying OCT images into four categories, namely CNV, DME, Drusen, and Normal. Abnormalities, such as the neovascular membrane and associated subretinal fluid in CNV images, retinal-thickening-associated intraretinal fluid in DME images, and multiple drusen, are present in their dataset. CNV and the appearance of drusen indicate clinical signs of AMD.
In the experiment, we evaluated the performance of ChoroidNET using 80 OCT images (20 images from each category) randomly selected from this OCT dataset. Figure 5 shows examples of OCT images used in our experiment. The ground truths of the choroid layer and choroidal vessels were annotated by an expert observer using the ibisPaint application [45].

B. EXPERIMENT AND IMPLEMENTATION DETAILS
Ten images from each category were used to create a training set and the remaining images were used to create a test set. We enlarged the training set by using patches cropped from the original images (minimum dimensions: 230 × 495). Previous studies have shown that increasing the size of an image patch in a deep learning network provides a more precise segmentation performance since the network can capture more contextual information to make the prediction [46]. However, using a larger image patch requires more memory. Considering the limited GPU memory, we chose a patch size that is large enough to cover the choroid region and to allow the down-sampling operations in our network, yet small enough to keep the problem tractable. We randomly extracted 300 patches (dimensions: 224 × 224) from each image in a training set, for a total of 12,000 patches. Note that the areas of some patches overlapped. 90% of each training set was used for training and the remaining 10% was used for validation.
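The patch sampling and the 90%/10% split described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: uniform sampling of the top-left corner, the helper names `sample_patch_origins` and `split_train_val`, and the fixed seed are all hypothetical.

```python
import random


def sample_patch_origins(height, width, patch=224, n=300, rng=None):
    """Return n random (top, left) corners of patch x patch crops in an image."""
    if height < patch or width < patch:
        raise ValueError("image is smaller than the patch")
    rng = rng or random.Random()
    return [(rng.randint(0, height - patch), rng.randint(0, width - patch))
            for _ in range(n)]


def split_train_val(n_items, rng=None):
    """Shuffle item indices and split them 90% / 10%."""
    rng = rng or random.Random()
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = n_items * 9 // 10
    return indices[:n_train], indices[n_train:]


rng = random.Random(0)  # fixed seed, for repeatability of the example only
origins = sample_patch_origins(230, 495, rng=rng)  # smallest image size in the text
n_patches = 40 * len(origins)  # 40 training images x 300 patches each
train_idx, val_idx = split_train_val(n_patches, rng)
print(n_patches, len(train_idx), len(val_idx))  # 12000 10800 1200
```

For the smallest image (230 × 495), only seven vertical offsets are possible for a 224 × 224 patch, so overlap between patches is unavoidable, consistent with the note above.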
ChoroidNET was trained on each training set end-to-end using a computer with an Intel Core i7 CPU and an NVIDIA GeForce GTX 1070 Ti GPU. The training was performed for 50 epochs with a batch size of 4 and an initial learning rate of 0.0001. The RMSprop optimizer was used to adaptively reduce the learning rate. The loss function ($L$) was based on the sum of the binary cross entropy loss ($L_{BCE}$) and the Dice loss ($L_{Dice}$):

$$L = L_{BCE} + L_{Dice}$$

where $y \in [0, 1]$ and $\hat{p} \in [0, 1]$ respectively denote the set of pixels in the ground truth and the set of pixels predicted by the trained network.
The segmentation performance for the choroid layer and choroidal vessels was quantitatively evaluated in terms of five evaluation metrics, namely accuracy, the Dice coefficient, precision, recall, and specificity. The formulas for these metrics are shown in Table 2. The metrics were calculated based on four possibilities, namely true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The numerical results are expressed as mean ± standard deviation (SD).
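A combined BCE and Dice loss and the five evaluation metrics can be sketched as below. The exact loss terms and the formulas of Table 2 are not reproduced in this text, so the code uses the commonly used definitions as an assumption; the smoothing constant, the clipping value `eps`, the 0.5 threshold, and all function names are hypothetical.

```python
import math


def bce_loss(y, p, eps=1e-7):
    """Mean binary cross entropy between labels y and probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)


def dice_loss(y, p, smooth=1.0):
    """Soft Dice loss: 1 - 2|y.p| / (|y| + |p|), with an assumed smoothing term."""
    inter = sum(yi * pi for yi, pi in zip(y, p))
    return 1.0 - (2.0 * inter + smooth) / (sum(y) + sum(p) + smooth)


def combined_loss(y, p):
    """Sum of the BCE and Dice losses."""
    return bce_loss(y, p) + dice_loss(y, p)


def confusion_counts(y, p, threshold=0.5):
    """Return (TP, TN, FP, FN) after thresholding the probabilities."""
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 1)
    tn = sum(1 for yi, qi in zip(y, pred) if yi == 0 and qi == 0)
    fp = sum(1 for yi, qi in zip(y, pred) if yi == 0 and qi == 1)
    fn = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Standard definitions; assumes all denominators are non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.3, 0.6, 0.1, 0.2, 0.1, 0.4]
counts = confusion_counts(y, p)
print(counts)  # (2, 4, 1, 1)
print(metrics(*counts))
print(round(combined_loss(y, p), 4))
```

The BCE term penalizes every pixel independently, whereas the Dice term measures region overlap and is less affected by the imbalance between the small foreground (vessels) and the large background.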

C. EXISTING METHODS
The performance of ChoroidNET is compared with that of U-Net++ $L^3$ [47], DRUNET [25], and Residual U-Net [31]. U-Net++ is an improved U-Net architecture based on nested and dense skip connections. U-Net++ was used to segment polyp, liver, and cell nuclei datasets. DRUNET and Residual U-Net adopt the structure of U-Net. DRUNET integrates residual blocks that comprise two dilated convolutions, instead of standard convolution blocks (except at the top level). DRUNET was designed for segmenting optic nerve head (ONH) tissues (including the choroid) in ONH-centered SD-OCT images. Residual U-Net inserts a residual connection between each pair of convolution blocks in its U-Net backbone. Residual U-Net was used for detecting the upper and lower boundaries of the choroid in foveal-centered SS-OCT images.
For a fair comparison, we trained and validated U-Net++, DRUNET, and Residual U-Net using the same training and test sets used for ChoroidNET, and performed the same data augmentation, pre-processing, and DropBlock regularization as that in our experiment.

V. RESULTS
This section presents the experimental results of the choroid layer and choroidal vessel segmentation.

A. COMPARISON WITH EXISTING METHODS
The segmented images produced by U-Net++, DRUNET, Residual U-Net, and ChoroidNET were qualitatively compared with their corresponding ground truths and quantitatively evaluated. Figures 6 and 7 show examples of the choroid layer and choroidal vessel segmentation results. Table 3 compares the performance of the choroid layer and choroidal vessel segmentation for the tested networks.
In general, the choroid layer segmentation results for ChoroidNET are qualitatively comparable to the ground truths. ChoroidNET shows the best segmentation performance (accuracy: 98.5 ± 0.2, Dice coefficient: 95.1 ± 0.4, precision: 94.1 ± 1.6, recall: 96.1 ± 0.9, specificity: 99.0 ± 0.3). U-Net++ produced choroid layer segmentations as smooth as the ground truths and also had a high Dice coefficient (94.0 ± 1.1) and recall (96.0 ± 0.8). However, it was slightly inferior to ChoroidNET in terms of all evaluation metrics and oversegmented areas outside the choroid layer for DME images. DRUNET produced irregular choroid boundaries. The Residual U-Net results are similar to the ground truths; however, the segmented boundaries of the choroid are not smooth.

B. ABLATION STUDIES
To provide insight into each design element of ChoroidNET, we conducted four ablation studies. The ablation models were trained and validated using the same training and test sets. Figure 8 shows the architectures of the ablation models. Figures 9 and 10 show the segmentation results of the choroid layer and the choroidal vessels, respectively, for the ablation models and ChoroidNET. Table 4 compares the performance of ChoroidNET and its ablation models.
For choroidal vessel segmentation, Ablation-1 had the highest recall (92.3 ± 2.2), but it oversegmented the region outside the choroid layer and thus had the lowest precision (68.6 ± 5.6). Ablation-2 slightly outperforms Ablation-1, by 0.6% in terms of the absolute Dice coefficient. This highlights the efficiency of the dilation block in VSM. The performance improvements (in terms of the Dice coefficient) of ChoroidNET over the four ablation models are 3.7%, 3.1%, 0.9%, and 0.6%, respectively. The improvement of ChoroidNET over Ablation-4 demonstrates the effectiveness of the connection between LSM and VSM.

C. INTRA-OBSERVER VARIABILITY
To assess intra-observer variability, our observer repeated the annotation process for the choroid layer and choroidal vessels. Table 5 shows the variability between the two sets of ground truths (GT1 and GT2) and ChoroidNET's segmentation. The intraclass correlation coefficient (ICC) was used to measure the variability; an ICC value of 1 indicates the highest agreement between two observations. The intra-observer reproducibility of choroid layer and vessel segmentation between GT1 and GT2 was excellent (Dice coefficient: 96.1 ± 1.1 and 84.1 ± 2.6; ICC: 0.983 and 0.971, respectively). ChoroidNET also produced a high agreement with
Figure 11 shows ChoroidNET's segmentation results and their corresponding ground truths.

D. CONSISTENCY OF THE PROPOSED NETWORK
We included 80 more images (20 images each from CNV, DME, Drusen, and Normal) to validate the consistency of our proposed network. The proposed model was trained and validated on new training and test sets (40 images each). The training and validation processes were performed in the same way as in the previous training. We then compared the performance of the two distinct trained models using the two distinct test sets. Table 6 presents the quantitative performance of the proposed network for four sets. Set-1 corresponds to the results of trained model-1 on test set-1, set-2 to the results of trained model-1 on test set-2, set-3 to the results of trained model-2 on test set-1, and set-4 to the results of trained model-2 on test set-2. Figure 12

VI. DISCUSSION
We now present a qualitative and quantitative segmentation analysis of the choroid layer and choroidal vessels. The experimental results in Table 3 show that ChoroidNET outperforms the compared state-of-the-art models for the segmentation of the choroid layer and choroidal vessels.
In an eye with DME, an accumulation of fluid with cystic properties usually occurs in the retinal layers. In an OCT image, those accumulated fluid regions are similar to the characteristics of choroidal vessels. For DME images, U-Net++ and Residual U-Net had inconsistent vessel segmentation performance compared to that of DRUNET and ChoroidNET, as shown with the yellow arrows in Figure 7 (c and e). The objective of the standard convolutions in U-Net++ and Residual U-Net is to extract the spatial information in the image. A deeper network can learn more semantic information. However, spatial information is lost at deeper layers, and thus the network predicts incorrect regions outside the choroid layer. Dilated convolutions reduce the loss of spatial information by expanding the receptive field of the network. Thus, the dilated convolutions in DRUNET and ChoroidNET facilitate the creation of large-scale feature maps with rich spatial information. The segmentation performance of DRUNET and ChoroidNET is thus more consistent for DME images.
In the U-Net architecture, the number of filters is doubled after down-sampling in the encoder path and halved after up-sampling in the decoder path. However, in the DRUNET architecture, only 16 filters are used in both standard blocks and residual blocks. The filters of a convolutional layer capture the patterns in image data. A higher number of filters allows the network to learn more complex patterns (abstractions) contained in image data and extract useful features. DRUNET thus had poor vessel segmentation performance for CNV and Drusen images: it was unable to separate the choroid pattern from the neovascular membrane in CNV images and mistakenly segmented small drusen (which occur in the complex between the RPE and the choroid) as choroidal vessels, as illustrated with the yellow arrows in Figure 7 (d). U-Net++, Residual U-Net, and ChoroidNET use a high number of filters (the same as that in U-Net), which considerably improves the recognition and segmentation of the choroid layer and vessels.
Overall, the segmentation results of ChoroidNET are similar to and consistent with the ground truths. U-Net++, DRUNET, and Residual U-Net are sensitive to the pathologies (subretinal and intraretinal fluid) present in CNV, DME, and Drusen images. In contrast, there is no visual difference in the segmentation performance of ChoroidNET for CNV, DME, Drusen, and Normal images.
The number of parameters used by a network depends on the number of filters. ChoroidNET and Residual U-Net each use approximately 4.5 million parameters (compared to 2.2 million for U-Net++ and only 40,000 for DRUNET) and thus have a much higher computational cost and use much more memory. This is a major drawback of ChoroidNET.
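The dependence of the parameter count on the number of filters follows from the size of a convolutional layer: a k × k convolution with C_in input channels and C_out filters has k · k · C_in · C_out weights plus C_out biases. The sketch below applies this to a block of two 3 × 3 convolutions. The channel counts are illustrative only and are not intended to reproduce the totals quoted above.

```python
def conv2d_params(in_ch, out_ch, k=3, bias=True):
    """Number of trainable parameters of a k x k convolutional layer."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)


def double_conv_params(in_ch, out_ch, k=3):
    """Parameters of two consecutive k x k convolutions (a standard block)."""
    return conv2d_params(in_ch, out_ch, k) + conv2d_params(out_ch, out_ch, k)


# A block that keeps 16 filters throughout versus a deep block of a network
# whose filter count doubles at each level (illustrative channel counts).
print(double_conv_params(16, 16))    # 4640
print(double_conv_params(256, 512))  # 3539968
```

Because the count scales with the product of input and output channels, doubling the filters at each level makes the deepest blocks dominate the total, whereas a network with a constant 16 filters stays small at every depth.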
We also evaluated the segmentation performance of Further, we performed ablation studies to demystify the architecture of the proposed network. We also measured the intra-observer variability for choroid layer and vessel segmentation. To assess the consistency and robustness of the proposed model, we tested ChoroidNET's performance using an additional dataset that contains 80 images of CNV, DME, Drusen, and Normal.
In summary, ChoroidNET significantly outperforms U-Net++, DRUNET and Residual U-Net and is robust for images with various retinal pathologies. In addition, it provides good tradeoffs between TPR and FPR, and between precision and recall for both choroid layer and choroidal vessel segmentation. ChoroidNET is thus the most robust model.

VII. CONCLUSION
In this study, we proposed ChoroidNET, a robust segmentation model for segmenting both the choroid layer and choroidal vessels in OCT images. ChoroidNET uses U-Net as a backbone and adds dense dilated convolutions at the bottleneck of LSM and VSM. The performance of ChoroidNET was tested on CNV, DME, Drusen, and Normal OCT images. The numerical results indicate that ChoroidNET outperforms U-Net++, DRUNET, and Residual U-Net, and is robust to cases of pathological abnormality (i.e., neovascular membrane and associated subretinal fluid in CNV, retinal-thickening-associated intraretinal fluid in DME, and multiple drusen).
Clinical research has shown that choroidal structures, in terms of changes in the luminal and stromal areas, and visual functions are highly correlated in diseased eyes [48]- [50]. Based on the segmentation results of ChoroidNET, our work could be extended to offer accurate quantification of clinical parameters derived from the choroid. These parameters can be used to find clinical correlations between choroidal changes and other clinical measures. It would be helpful for ophthalmologists to monitor changes in the choroid layer over time for various eye diseases.
In this work, we considered the segmentation of the choroid layer and choroidal vessels in OCT images. We will consider the segmentation of the retinal layers, the RPE, and the sclera in future studies because the pathologies of other tissues in the retina are important for diagnosing diseases such as Alzheimer's disease [51], AMD, diabetic retinopathy, and scleritis.

HIDEAKI HANEISHI received the M.S. and Ph.D. degrees from the Tokyo Institute of Technology, in 1987 and 1990, respectively. He joined Chiba University as a Research Associate, in 1990. He was a Visiting Research Scientist with the Department of Radiology, University of Arizona, from 1995 to 1996. He is currently a Full Professor with the Center for Frontier Medical Engineering (CFME) and also the Director of CFME.