Dual-stage deep learning framework for pigment epithelium detachment segmentation in polypoidal choroidal vasculopathy.

Worldwide, polypoidal choroidal vasculopathy (PCV) is a common vision-threatening exudative maculopathy, and pigment epithelium detachment (PED) is an important clinical characteristic. Thus, precise and efficient PED segmentation is necessary for PCV clinical diagnosis and treatment. We propose a dual-stage learning framework via deep neural networks (DNN) for automated PED segmentation in PCV patients to avoid issues associated with manual PED segmentation (subjectivity, manual segmentation errors, and high time consumption). The optical coherence tomography scans of fifty patients were quantitatively evaluated with different algorithms and clinicians. Dual-stage DNN outperformed existing PED segmentation methods for all segmentation accuracy parameters, including true positive volume fraction (85.74 ± 8.69%), dice similarity coefficient (85.69 ± 8.08%), positive predictive value (86.02 ± 8.99%) and false positive volume fraction (0.38 ± 0.18%). Dual-stage DNN achieves accurate PED quantitative information, works with multiple types of PEDs and agrees well with manual delineation, suggesting that it is a potential automated assistant for PCV management.
© 2017 Optical Society of America
OCIS codes: (110.4500) Optical coherence tomography; (170.3880) Medical and biological imaging; (100.0100) Image processing; (100.4996) Pattern recognition, neural networks.

References and links
1. J. H. Kim, S. W. Kang, T. H. Kim, S. J. Kim, and J. Ahn, “Structure of polypoidal choroidal vasculopathy studied by colocalization between tomographic and angiographic lesions,” Am. J. Ophthalmol. 156(5), 974–980 (2013).
2. R. Liu, J. Li, Z. Li, S. Yu, Y. Yang, H. Yan, J. Zeng, S. Tang, and X. Ding, “Distinguishing polypoidal choroidal vasculopathy from typical neovascular age-related macular degeneration based on spectral domain optical coherence tomography,” Retina 36(4), 778–786 (2016).
3. S. Mrejen, D. Sarraf, S. K. Mukkamala, and K. B. Freund, “Multimodal imaging of pigment epithelial detachment: a guide to evaluation,” Retina 33(9), 1735–1762 (2013).
4. C. W. Wong, Y. Yanagi, W. K. Lee, Y. Ogura, I. Yeo, T. Y. Wong, and C. M. Cheung, “Age-related macular degeneration and polypoidal choroidal vasculopathy in Asians,” Prog. Retin. Eye Res. 53, 107–139 (2016).
5. S. M. Cohen, G. T. Kokame, and J. D. Gass, “Paraproteinemias associated with serous detachments of the retinal pigment epithelium and neurosensory retina,” Retina 16(6), 467–473 (1996).
6. J. D. Gass, S. B. Bressler, L. Akduman, J. Olk, P. J. Caskey, and L. E. Zimmerman, “Bilateral idiopathic multifocal retinal pigment epithelium detachments in otherwise healthy middle-aged adults: a clinicopathologic study,” Retina 25(3), 304–310 (2005).
7. U. Schmidt-Erfurth, S. M. Waldstein, G. G. Deak, M. Kundi, and C. Simader, “Pigment epithelial detachment followed by retinal cystoid degeneration leads to vision loss in treatment of neovascular age-related macular degeneration,” Ophthalmology 122(4), 822–832 (2015).
8. N. Nagai, M. Suzuki, A. Uchida, T. Kurihara, M. Kamoshita, S. Minami, H. Shinoda, K. Tsubota, and Y. Ozawa, “Non-responsiveness to intravitreal aflibercept treatment in neovascular age-related macular degeneration: implications of serous pigment epithelial detachment,” Sci. Rep. 6(1), 29619 (2016).
9. A. C. Tan, D. Simhaee, C. Balaratnasingam, K. K. Dansingani, and L. A. Yannuzzi, “A perspective on the nature

Vol. 8, No. 9 | 1 Sep 2017 | BIOMEDICAL OPTICS EXPRESS 4061


Introduction
Worldwide, polypoidal choroidal vasculopathy (PCV) is a common, vision-threatening exudative maculopathy. Pigment epithelium detachment (PED), which occurs secondary to leakage and bleeding beneath the retinal pigment epithelium (RPE), is an important clinical characteristic of this chorioretinal disease. Because PED volume can predict the treatment outcome of PCV [1-4], precise and reliable PED segmentation is required for quantification in clinical practice. Generally, PEDs can be divided into three types: serous, vascularized and drusenoid PEDs [3]. This work focuses on the PEDs found in PCV patients, i.e., serous and vascularized PEDs. Serous PED is caused by a collection of fluid in the sub-RPE space [5,6]. Vascularized PED, which is the result of angiogenesis and sub-RPE neovascularization, is more sight threatening than other types of PED but is more responsive to treatment [7,8]. Drusenoid PED seldom appears in PCV patients because it is caused by drusen, which are uncommon in PCV patients [9].
Compared with other imaging modalities, optical coherence tomography (OCT) provides noninvasive, in vivo, high-resolution cross-sectional views [10]. OCT is now the preferred imaging modality for PCV disease management and has been widely utilized for PED segmentation [1,11,12]. As manual interpretation of PED images is time consuming and prone to human error [13-15], recent studies have been developing computer-aided diagnosis systems to provide efficient, reproducible and reliable information [16]. However, three main challenges significantly impede precise PED segmentation: (1) distorted morphology that limits the use of prior knowledge [17]; (2) boundaries blurred by unexpected speckles and undesirable abnormalities, which impede precise delineation; and (3) intensity inhomogeneity between serous and vascularized PED that hinders accurate segmentation. As illustrated in Fig. 1, serous PED appears as an arch-shaped region with homogeneously hyporeflective regions below the RPE layer, whereas vascularized PED has heterogeneous signals with hyperreflective vascular lesions and hyporeflective lumens beneath the RPE layer [3]. Thus, the homogeneity among the neighboring tissues together with the heterogeneity within PED poses a difficult challenge for automated segmentation of vascularized PED in PCV patients. Several algorithms have been proposed for PED segmentation from OCT images, including conventional threshold-based methods and more recent graph theory-based methods (the state-of-the-art methods) [13,18-21]. All of these methods rely on carefully hand-crafted, low-level image features, which are sensitive to image quality and intensity variance and therefore sometimes fail to reproduce their performance across different scenarios (serous and vascularized PEDs).
In recent years, with the emergence of deep neural networks (DNN), there has been tremendous improvement in automated feature extraction, in which the learned features are highly convolved to encode the intrinsic structures of the image for classification, recognition and segmentation [22,23]. In this study, we propose a novel DNN-based framework to automatically segment PEDs in PCV patients. We validated the PED segmentation performance of this framework against two specialists as well as other state-of-the-art algorithms. To the best of our knowledge, this is the first dual-stage DNN learning framework for automated PED segmentation in PCV patients. The novelty of this paper is the dual-stage DNN learning. We first learn the Bruch's membrane (BM) layer on images via one DNN; we then employ the obtained BM layer as a constraint to assist another DNN in segmenting PED regions. Whereas a single-stage network cannot address all of the different OCT imaging issues at once, our framework focuses on a different issue in each stage and therefore performs better than a single-stage network.

Dual-stage PED segmentation framework
Given a denoised image Î, we use convolutional neural networks (CNN), a proven and powerful DNN feature extractor, to capture image features and use these features to differentiate PED regions from Î. We propose to segment PED in a dual-stage manner. We built two DNNs (named S1-Net and S2-Net) to form the main structure of our framework (shown in Fig. 2). The fully convolutional network (FCN), an off-the-shelf CNN model that extracts PED-oriented whole-image features and learns PED segmentation end to end [30], was adopted as the structure of both S1-Net and S2-Net (listed in Table 1). Following FCN, because the five pooling layers make the feature map 32× subsampled, we set the filter size of the deconvolution layer to 64 and its stride to 32 so that the resolution of the obtained segmentation map is the same as that of the input image. In our dual-stage learning scheme, after normalization and denoising, we first capture the BM layer from the image via the S1-Net model and then use the recognized BM layer as a constraint for the later PED recognition and delineation via the S2-Net model. We explain the framework architecture in detail as follows.
Table 1 notes. Abbreviations: c, convolution layer; bn, batch normalization; r, rectified linear unit; p, pooling layer; fc, fully convolutional layer; dc, deconvolution layer; c + r, convolution layer followed by rectified linear unit; softmax, decision layer that produces the segmentation probability map. a The fully convolutional network (FCN) is a type of deep neural network (DNN) that forms the basis of our framework. The settings of the whole network, from input to output, are shown from top to bottom.
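The resolution arithmetic behind the filter size of 64 and stride of 32 can be checked with a short sketch. It assumes five 2 × 2 poolings with stride 2 and a symmetric crop of 16 pixels in the transposed convolution; the crop value and the function names are assumptions for illustration and are not stated in the text.

```python
def pooled_size(size: int, num_pools: int = 5, factor: int = 2) -> int:
    """Spatial size after `num_pools` poolings that each subsample by `factor`."""
    for _ in range(num_pools):
        size //= factor
    return size


def deconv_output_size(size: int, kernel: int = 64, stride: int = 32, crop: int = 16) -> int:
    """Output size of a transposed convolution: (n - 1) * stride + kernel - 2 * crop.

    The crop of 16 pixels per side is an assumed value, not taken from the paper.
    """
    return (size - 1) * stride + kernel - 2 * crop


coarse = pooled_size(384)               # 384 / 2**5 = 12
restored = deconv_output_size(coarse)   # (12 - 1) * 32 + 64 - 2 * 16 = 384
print(coarse, restored)
```

Under these assumptions a 384 × 384 input is reduced to a 12 × 12 feature map and upsampled back to 384 × 384 in a single deconvolution step.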

Pre-processing
The original images in our data were 512 × 496 pixels. They were resized to 384 × 384 pixels with an imresize process before being fed to the network, and the output of the framework was restored to the original resolution by bilinear interpolation to obtain the final results. Although the DNN structure we used (FCN) can accept images of any resolution, we normalized the input size to improve the efficiency of our framework. Input-size normalization is widely used in DNN-based segmentation methods [34,35], and according to the experimental results of these methods, it has little effect on performance. As shown in Fig. 3(a), because of signal transmission in OCT machines or other sources of noise, OCT images in real clinical practice usually contain unexpected speckles and patterns. To reduce the influence of this imaging noise, the original images were denoised with the probability-based non-local means filter [36,37], which is competitive with other state-of-the-art speckle removal techniques and accurately preserves edges and structural details at a small computational cost.
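The size normalization and restoration steps can be sketched in pure Python as follows. This is a minimal illustration, not the authors' implementation: it uses plain bilinear interpolation in both directions (without the anti-aliasing that an imresize-style routine may apply when shrinking), treats a B-scan as 496 rows by 512 columns, and the function name `resize_bilinear` is hypothetical.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D list-of-lists image with bilinear interpolation (pixel centres aligned)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row centre back to input coordinates and clamp to the image.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Synthetic B-scan (assumed layout: 496 rows x 512 columns) with a vertical intensity ramp.
scan = [[r / 495.0] * 512 for r in range(496)]
net_input = resize_bilinear(scan, 384, 384)      # size fed to the network
restored = resize_bilinear(net_input, 496, 512)  # stand-in for restoring an output map
print(len(net_input), len(net_input[0]), len(restored), len(restored[0]))
```

In the framework itself, the map that is restored would be the segmentation output rather than the resized input; the round trip here only demonstrates the two resampling directions.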

BM layer recognition learning
To recognize the BM layer from the OCT image, we input the denoised image Î (shown in Fig. 3(b)) into S1-Net, in which Î is passed through a sequence of convolution and pooling layers (as listed in Table 1). The output of this sequence is highly convolved data that can represent the intrinsic and semantic information of Î. The highly convolved data are then passed into a deconvolution layer (the last layer of S1-Net in Table 1), which is the transpose of convolution and upsamples the convolved data. The ground truth of the BM layer for each training image is required for this learning. On the ground truth, we pad the regions below the BM layer with foreground pixels so that the positive and negative samples are relatively balanced, and we use the padded ground truth to train S1-Net. As shown in Fig. 3(c) and 3(d), this training yields a more compact and precise BM layer than training with the line-based ground truth. The result is a probability map R (shown in Fig. 3(d)) with the same resolution as Î, which we use as the BM layer recognition result.
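The effect of padding the ground truth can be illustrated with a toy example. The sketch below assumes the BM annotation is stored as one row index per image column, with row indices increasing downward; this representation and the function names are assumptions made for illustration.

```python
def line_ground_truth(bm_rows, height):
    """Mask with foreground (1) only on the annotated BM pixel of each column."""
    return [[1 if y == bm_rows[x] else 0 for x in range(len(bm_rows))]
            for y in range(height)]


def padded_ground_truth(bm_rows, height):
    """Mask with foreground (1) on the BM pixel and everything below it."""
    return [[1 if y >= bm_rows[x] else 0 for x in range(len(bm_rows))]
            for y in range(height)]


def foreground_fraction(mask):
    """Share of pixels labelled as foreground."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


bm_rows = [5, 5, 4, 4, 5, 6]   # toy BM position for a 6-column, 8-row image
line = line_ground_truth(bm_rows, 8)
padded = padded_ground_truth(bm_rows, 8)
print(foreground_fraction(line), foreground_fraction(padded))
```

Even in this tiny image the line-based mask labels only 6 of 48 pixels as foreground, whereas the padded mask labels 19 of 48, which is the class-balance argument made in the text.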

PED delineation learning
In the PED learning stage, we employ R obtained in the previous stage as prior knowledge to assist S2-Net training for PED delineation. Specifically, the intensity image Î is transformed into an RGB image G (shown in Fig. 3(e)) such that the recognized BM layer appears red on G. We input G into S2-Net and train it with the ground truth of PED regions. Because G has three channels, the size of the input data and the first filter bank differ from those of S1-Net. As the BM layer is imposed on the input data of S2-Net, this constraint is successively and inherently attached to the PED-region-oriented feature maps. We use the output of S2-Net as the segmentation map and then apply a threshold of 0.5 to delineate the PED contour (shown in Fig. 3(f)).
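One possible way to build G and to threshold the S2-Net output is sketched below. The text does not say how R is converted into the red BM marking; the sketch assumes that, in each column, the topmost pixel where R exceeds 0.5 is taken as the BM position and painted pure red. That rule, the strict comparison, and all function names are assumptions.

```python
def bm_rows_from_prob(R, thresh=0.5):
    """Topmost row per column where R exceeds `thresh` (None if no pixel does)."""
    height, width = len(R), len(R[0])
    return [next((y for y in range(height) if R[y][x] > thresh), None)
            for x in range(width)]


def overlay_bm(gray, R, thresh=0.5):
    """Turn a grey image into RGB tuples and paint the assumed BM position red."""
    rgb = [[(g, g, g) for g in row] for row in gray]
    for x, y in enumerate(bm_rows_from_prob(R, thresh)):
        if y is not None:
            rgb[y][x] = (255, 0, 0)
    return rgb


def delineate(prob_map, thresh=0.5):
    """Binary PED mask from a segmentation probability map."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


gray = [[10, 20], [30, 40], [50, 60]]
R = [[0.1, 0.2], [0.9, 0.3], [0.95, 0.8]]
G = overlay_bm(gray, R)
print(G[1][0], G[2][1], delineate([[0.2, 0.7]]))
```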

Framework settings
We implemented our framework using MatConvNet [38]; the implementation was accelerated via GPU computation. The number of training epochs was set to 50, batch size was set to 20 and learning rate was set to 0.0001; these parameters were derived empirically to produce optimal results. Consistent with the original MatConvNet [38], other parameters, including batch size, momentum and weight decay, followed the default settings as these were shown to be robust in previous works [24,34]. Our experiments followed a ten-fold cross validation protocol where each validation process contains 45 patient scans as a training set and 5 scans as a validation set.
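The ten-fold protocol with 45 training scans and 5 validation scans per fold can be sketched as a patient-level split. The shuffling seed and the function name are assumptions; the text does not describe how patients were assigned to folds.

```python
import random


def ten_fold_splits(patient_ids, n_folds=10, seed=0):
    """Yield (train, validation) lists of patient IDs, one pair per fold."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)                 # assumed: random assignment
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, validation


splits = list(ten_fold_splits(range(50)))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Splitting by patient rather than by B-scan keeps all slices of one scan in the same partition, so no patient contributes to both training and validation within a fold.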

DNN and optimization in the framework
The main structure of our framework is FCN, which is widely used for image segmentation [30]. In this kind of CNN model, convolution and pooling are the two essential layers. When the input data X pass into a convolution layer, a bank of learnable filters and biases convolves X to produce a d′-dimensional feature map. During the training phase, these filters and biases are updated so that the produced feature maps become more discriminative for differentiating PED. Pooling layers, which usually follow convolution layers, reduce the resolution of the feature maps. These procedures make the obtained features less sensitive to input shifts and distortions [39]. In practice, the input data normally pass through a set of convolution and pooling layers so that the extracted feature maps are more intrinsic and semantic than low-level hand-crafted features [40].
After the feature maps are extracted, a deconvolution layer is embedded to obtain a segmentation probability map with the same resolution as the input image. Deconvolution is the transpose of convolution, as defined in [38], in terms of the deconvolution weights, the input X and output Y of the deconvolution, the padding size z and the stride r of the deconvolution. To train the CNN, we minimize the softmax log-loss so that the CNN evolves toward optimal segmentation. In the loss function, defined as in [38], S is the set of pixels of all training images, x_t is the output of the t-th channel of the deconvolution layer at position x, c is the label of x, and λ is the weight decay that regularizes the learnable weights W. λ is set to 0.0005 empirically.
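A toy example makes the roles of the two layer types concrete. The sketch below applies one hand-set 2 × 2 filter (a learned filter bank would have many), a rectified linear unit and 2 × 2 max pooling to a small image with a vertical edge; the filter values and function names are chosen for illustration only.

```python
def conv2d_valid(x, w, b=0.0):
    """'Valid' 2-D convolution as used in CNNs (cross-correlation) with one filter."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(w[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw)) + b
             for c in range(out_w)] for r in range(out_h)]


def relu(x):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(x[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(x[0]) - size + 1, size)]
            for r in range(0, len(x) - size + 1, size)]


edge_filter = [[-1, 1], [-1, 1]]                   # responds to dark-to-bright edges
image = [[0, 0, 1, 1, 1] for _ in range(5)]        # vertical edge
shifted = [[0, 1, 1, 1, 1] for _ in range(5)]      # same edge, one pixel to the left

features = max_pool(relu(conv2d_valid(image, edge_filter)))
features_shifted = max_pool(relu(conv2d_valid(shifted, edge_filter)))
print(features, features_shifted)
```

In this example the one-pixel shift moves the filter response within the same pooling window, so the pooled feature maps are identical, which illustrates the reduced sensitivity to small input shifts.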
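For orientation, a softmax log-loss with weight decay that is consistent with the symbols defined above can be written as follows. This is a reconstruction under assumptions, not the equation printed in the paper: in particular, the squared-norm form and scaling of the regularization term are assumed.

```latex
% Assumed form: softmax log-loss over all training pixels plus weight decay.
% x_t : output of the t-th channel of the deconvolution layer at pixel x
% c   : ground-truth label of pixel x,  S : set of pixels of all training images
\ell(W) \;=\; -\sum_{x \in S} \log \frac{\exp(x_c)}{\sum_{t} \exp(x_t)}
        \;+\; \lambda \,\lVert W \rVert_2^2
```

The first term rewards a high softmax probability for the correct label at every pixel, and the second term penalizes large weights, with λ = 0.0005 as stated in the text.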

Data set
Spectral domain OCT volume scans of patients diagnosed with PCV were obtained with a Heidelberg Spectralis device (Heidelberg Engineering, Heidelberg, Germany) between March 2015 and December 2015. The dimension of each OCT volume image is 512 × 97 × 496 voxels (97 B-scans of 512 × 496 pixels); the resolution is 11.13 µm × 59 µm × 3.83 µm (the distance between B-scans is 59 µm). In total, 1800 OCT B-scans from the 50 patients were used. All the B-scans were taken continuously from the PED regions. The tenets of the Declaration of Helsinki were followed, and the Institutional Review Board of Shanghai General Hospital, Shanghai Jiao Tong University approved the study. Informed consent was obtained from all subjects. The diagnosis of PCV was based on the EVEREST study using angiographic criteria [41]. To evaluate the layer segmentation results, two retinal specialists manually labeled PED regions, the BM layer and the inner limiting membrane (ILM) on each scan (which contains 97 slices). The results generated by one of the specialists (Expert I) defined the ground truth. For subgroup analysis, Expert I classified the PED cases as either simple or complicated: cases with vascularized PEDs in less than 50% of the slices were defined as simple, whereas cases with vascularized PEDs in more than 50% of the slices and with hyperreflective exudates around the PED regions were defined as complicated. The unsigned border positioning error was calculated by measuring the vertical absolute Euclidean distances between the BM layers positioned by the different methods and the ground truth [15].
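The unsigned border positioning error and the conversion from voxel counts to physical volume can be sketched from the stated resolution. The sketch assumes that 3.83 µm is the axial (vertical) pixel spacing, that each boundary is stored as one row index per column, and that the error is averaged over columns; the function names are hypothetical.

```python
AXIAL_UM = 3.83       # assumed vertical pixel spacing in micrometres
LATERAL_UM = 11.13    # spacing along a B-scan
BSCAN_GAP_UM = 59.0   # distance between B-scans


def unsigned_border_error_um(pred_rows, true_rows, um_per_pixel=AXIAL_UM):
    """Mean absolute vertical distance between two boundaries, in micrometres."""
    if len(pred_rows) != len(true_rows):
        raise ValueError("boundaries must cover the same columns")
    diffs = [abs(p - t) for p, t in zip(pred_rows, true_rows)]
    return um_per_pixel * sum(diffs) / len(diffs)


def voxels_to_mm3(n_voxels):
    """Convert a voxel count to cubic millimetres (1 mm^3 = 1e9 um^3)."""
    return n_voxels * LATERAL_UM * BSCAN_GAP_UM * AXIAL_UM * 1e-9


print(unsigned_border_error_um([100, 101, 103], [100, 100, 100]))
print(voxels_to_mm3(1_000_000))
```

With these spacings one voxel corresponds to roughly 2.5 × 10⁻⁶ mm³, so the volume differences reported in mm³ map to a few hundred to a few thousand voxels.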

Problem statement and approach to the major challenges
Given a denoised OCT image Î, the task is to differentiate PED regions from Î in our PCV data set, that is, to assign each pixel location to a particular label l in the label space ℒ = {l} = {1, …, K} for K classes. We treat the current segmentation task as a K = 2 class classification problem. The approach of our dual-stage framework to the major challenges is described below.
We identified three major challenges in the Introduction. (1) We use DNN to address distorted morphology. In a DNN, the convolution layers extract and mix image information, and with deeper convolution layers the extracted information captures more of the intrinsic image structure. The pooling layers make our framework less sensitive to input rotation and shift. Therefore, even when the morphology on OCT images is distorted, our DNN-based framework can still recognize most PED regions. (2) Our dual-stage learning addresses the challenges of speckles, abnormalities and inhomogeneous regions. A single DNN is unlikely to handle all of these issues at once, whereas our model pursues a different learning aim in each stage, which is expected to reduce the impact of each issue. For example, S1-Net is responsible for recognizing the BM layer. Because the speckles and abnormalities around the BM layer are the major issue at this stage, S1-Net is expected to learn to handle them without having to deal with intensity inhomogeneity. In contrast, S2-Net is responsible for PED region detection. In this stage, intensity inhomogeneity inside the PED region becomes the major challenge, so S2-Net is expected to learn to handle it regardless of the other issues.

Comparison methods
The first comparison method is the graph theory-based algorithm proposed by Sun et al. [40]. Because its source code is not available, we implemented the algorithm ourselves as the benchmark in our evaluation. The first step of their segmentation algorithm is a multi-scale graph search (GS) algorithm, which is mainly based on the work of Shi et al. [15]. This method defines layers in order from top to bottom by calculating the dark-to-bright and bright-to-dark boundaries, and the RPE layer is defined as one of these boundaries. The BM layer is then created from the convexity of the RPE. In Sun's work [40], PED boundary delineation was conducted via a combined machine learning (ML) algorithm (AdaBoost) that marks the regions between these two layers. The parameters we used are shown in Table 2 and are the same as in their papers [15,40]. We denote their approach as GS + ML (shown in Fig. 3(i) and 3(j) for the two steps of their algorithm: layer segmentation and PED segmentation). To verify our implementation of Sun's work [40], we validated it on 100 arbitrary serous PED slices from our private data set. We obtained a DSC of 91.77 ± 4.41%, which is consistent with the published result (91.20 ± 3.77% DSC) on Sun's private data set and supports the fidelity of our implementation of the method from Sun's work [40].
In addition, a single-stage DNN framework (FCN), which directly takes OCT images without the BM layer marking (shown in Fig. 3(b)), is also comparatively evaluated. The basic settings of the single-stage DNN framework are the same as those of our proposed framework, as shown in Table 1. An example segmentation result of the single-stage DNN framework is shown in Fig. 3(g).
Table 2 notes. Abbreviations: EZ = ellipsoid zone; ILM = inner limiting membrane; RPE = retinal pigment epithelium. a Bruch's membrane was detected after the RPE floor using the convhull algorithm. PED region segmentation was conducted by first locating the area between the RPE and BM, and then a graph cut and morphology combined algorithm was used to obtain the final results using AdaBoost. The details were given in the work of Sun et al. [40]. b Initial detection level: According to Shi et al., the 3D OCT scan is downsampled by a factor of 2 twice in the z-direction to form three resolution levels [15]. Level 1 represents the lowest resolution, and level 3 represents the highest resolution, i.e., the original data. In this manner, their algorithm is multi-resolution. c The following two facts are considered when determining the smoothness constraints Δx and Δy for each surface: the image resolution and the shape of surfaces. Δx

Evaluation metrics
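The true positive volume fraction (TPVF), dice similarity coefficient (DSC), positive predictive value (PPV) and false positive volume fraction (FPVF) are used in our evaluation [40] and defined as follows:

$$\mathrm{TPVF} = \frac{|V_R \cap V_G|}{|V_G|}, \qquad \mathrm{DSC} = \frac{2\,|V_R \cap V_G|}{|V_R| + |V_G|},$$

$$\mathrm{PPV} = \frac{|V_R \cap V_G|}{|V_R|}, \qquad \mathrm{FPVF} = \frac{|V_R| - |V_R \cap V_G|}{|V_\varepsilon| - |V_G|},$$

where $V_R$, $V_G$ and $V_\varepsilon$ are the segmentation result, the ground truth and the retina volume between BM and ILM, respectively.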
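The four metrics can be computed from voxel sets as in the sketch below. It assumes the usual overlap definitions (TPVF as the fraction of ground-truth volume recovered, PPV as the fraction of segmented volume that is correct, DSC as twice the overlap divided by the summed volumes, and FPVF as the falsely segmented volume relative to the non-PED retina volume); the set-based representation and the function name are assumptions.

```python
def overlap_metrics(seg, truth, retina):
    """TPVF, DSC, PPV and FPVF for sets of voxel indices.

    seg    : voxels labelled PED by the method
    truth  : voxels labelled PED in the ground truth
    retina : all voxels between BM and ILM (assumed to contain `truth`)
    """
    tp = len(seg & truth)
    return {
        "TPVF": tp / len(truth),
        "DSC": 2 * tp / (len(seg) + len(truth)),
        "PPV": tp / len(seg),
        "FPVF": (len(seg) - tp) / (len(retina) - len(truth)),
    }


retina = set(range(100))
truth = set(range(10, 30))   # 20 ground-truth voxels
seg = set(range(15, 35))     # 20 segmented voxels, 15 of them correct
print(overlap_metrics(seg, truth, retina))
```

For this toy case the overlap is 15 voxels, so TPVF, DSC and PPV are all 0.75, and the 5 falsely segmented voxels out of 80 non-PED retina voxels give an FPVF of 0.0625.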

Statistical analyses
The results are presented as the mean ± standard deviation (SD) for continuous variables. Intergroup differences were tested by t-test. Correlation analysis was used to display the correlation of PED volumes measured between different methods and different specialists. Bland-Altman analysis was used to analyze agreement [42]. We used 95% limits of agreement (LoA) to evaluate agreement between the different methods and experts. Statistical significance was set at p<0.05 (two tailed).
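The 95% limits of agreement used in the Bland-Altman analysis can be computed as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below assumes paired volume measurements from two raters or methods; the sample values and the function name are illustrative and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower 95% LoA, upper 95% LoA) for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)   # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative paired PED volumes in mm^3 (not study data).
method = [1.0, 2.0, 3.0, 4.0]
expert = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper = bland_altman(method, expert)
print(round(bias, 4), round(lower, 4), round(upper, 4))
```

A narrower interval between the lower and upper limits indicates closer agreement between the two sets of measurements.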

Results
Fifty SD-OCT scans of PCV patients from Shanghai General Hospital were collected to evaluate our proposed framework. The average age of the patients was 66.5 years (95% confidence interval [CI], 64.8-68.1 years); 64% (32/50) of the patients were male with an average best corrected visual acuity (logMAR) of 0.68 (95% CI, 0.58-0.78). Twenty-four patients were classified as simple cases, and 26 patients were complicated cases. The final segmentation results on SD-OCT images are illustrated in Fig. 4. For serous PED, segmentation errors mainly occurred where the RPE layer was discontinuous (shown in the last row of column a). For vascularized PED, segmentation errors mainly occurred where the PED regions were difficult to distinguish from the surrounding tissues (shown in the last three rows of column c).
Our framework was implemented in MATLAB R2016a and run on a desktop PC with a GPU NVIDIA GeForce GTX 980 equipped on an Intel Core i7 2.60 GHz machine. The average running time of our framework per B-Scan is 0.92 seconds.

Segmentation accuracy of the different methods and experts
Quantitative assessments of the segmentation performance achieved by the different methods are summarized in Table 3. The results from Expert II are comparable with the ground truth (Expert I), supporting the reliability of the ground truth. In terms of accuracy, the TPVF, DSC, PPV and FPVF (mean ± standard deviation) of the proposed framework are 85.74 ± 8.69%, 85.69 ± 8.08%, 86.02 ± 8.99% and 0.38 ± 0.18%, respectively. Higher values of TPVF and DSC indicate that the PED region is segmented more accurately. A PPV over 85% and an FPVF below 0.5% indicate that few regions are incorrectly segmented as PED. Compared with the state-of-the-art graph theory-based method (GS + ML) and the single-stage DNN, our framework performed better by over 5%.
Table 3 (surviving entry: 90.03 ± 4.13 (p<0.0001) b). Abbreviations: DNN = deep neural network; GS + ML = a method based on graph theory proposed by Sun et al. [40]; p = statistical significance test between dual-stage DNN and other methods. a All the results are listed as the mean ± standard deviation in percentage. Statistical tests: a t-test was applied to test for differences in segmentation accuracy between dual-stage DNN and the other methods/expert. b p<0.05

Correlation between the different methods and experts
A strong correlation is obtained between Expert I and Expert II (r = 0.9997, p < 0.001) in all cases, supporting the reliability of the ground truth. There is a significant positive correlation between our dual-stage DNN framework and the experts (correlation coefficient 0.9986 with Expert I and 0.9981 with Expert II, p < 0.001 for each). The correlation coefficients of single-stage DNN (Expert I 0.9666, Expert II 0.9691, p < 0.001 for each) and GS + ML (Expert I 0.9775, Expert II 0.9781, p < 0.001 for each) are not as high as those of our framework. A comparison of PED volumes measured by the three methods is summarized in Table 4. Mean PED volume measurements were not significantly different between the different methods and experts. However, dual-stage DNN showed the smallest difference from Expert I (0.0032 mm³) and Expert II (0.0016 mm³), which is comparable to the difference between experts (0.0017 mm³). The differences between the other two methods and the experts are almost ten times larger than those of the dual-stage method. In Bland-Altman analysis, dual-stage DNN has the narrowest 95% LoA with Expert I (−0.0691 to 0.0627 mm³) and Expert II (−0.0765 to 0.0734 mm³), which is nearly a quarter of the LoA of the other two methods (shown in Fig. 5). These differences are acceptable for clinical purposes, as noted in a previous work [40]. Less than 5% of points are outside the agreement limits, and the bias is not significant as the line of equality is within the confidence interval of the mean difference.

Segmentation performance of the different methods within the subgroups
The segmentation performance by dual-stage DNN is different within subgroups. In simple cases, the mean TPVF, DSC and PPV of our segmentation results are over 92%, and the standard deviation is less than 4% (3.87%, 2.64% and 3.71%, respectively). This performance is the highest on simple cases among all comparison methods. In complicated cases, the proposed method also outperforms all the comparison methods. The results from our framework are TPVF and DSC of nearly 80% (79.11% and 79.32%, respectively) and PPV over 80% (80.11%).

BM layer recognition performance between the different methods
BM layer recognition is compared between dual-stage DNN and GS + ML because this constraint is important for improving segmentation performance. The unsigned border positioning error of our layer segmentation method is 5.71 ± 3.53 μm, which is almost half the error of GS + ML (10.53 ± 4.69 μm, p<0.0001). In addition to the performance difference between algorithms, we also notice that the layer recognition result of dual-stage DNN on simple cases is better than that on complicated cases by 0.91 μm (p<0.0001).

Discussion
PCV is characterized by abnormal choroidal vascular networks with aneurysmal or polypoidal terminations. These choroidal vascular changes lead to morphology changes in the RPE, characterized as various forms of PED. Multiple PEDs, sharp PED peaks, PED notches, and rounded polyp lumens inside the PED regions were helpful in diagnosing PCV based on SD-OCT [43]. Decreased PED volumes have been observed during resolution with anti-VEGF therapy in clinical studies [21,44]. Treatment based on changes in PED morphology rather than on exudative or hemorrhagic recurrences may improve the long-term visual outcomes of this disease. Nevertheless, large samples and long-term clinical studies are needed to obtain enough data to support this treatment paradigm. However, data analyses by manual segmentation are time consuming and thus limit the design of clinical studies [14]. Automatic PED segmentation is the first step for PED morphology analysis of large clinical samples. DNN has been successfully applied to segmenting tumors from normal tissue, such as skin, liver and bone tumors [27-29]. Thus, DNN has the potential to be the solution.
Our dual-stage DNN framework achieved a mean DSC and TPVF of over 85%, outperforming the comparative methods. A strong correlation was found between dual-stage DNN and the experts. In terms of agreement with human experts, dual-stage DNN has the smallest difference. The mean volume difference with Expert II (0.0016 mm³) is comparable to that between the experts (0.0017 mm³). Compared with the other methods, dual-stage DNN had the narrowest 95% LoA with the experts. These results demonstrate that dual-stage DNN can be applied for PED segmentation and volume monitoring in PCV treatment.
These improvements can be attributed to the following reasons: (1) using the advantages of high-level features from DNN; and (2) utilizing the prior knowledge from BM recognition in a dual-stage manner to later assist DNN for PED delineation. The highly reflective exudates above the PED region, which disturb the bright-to-dark boundary delineation, lower the segmentation performance of GS + ML. However, our learned, highly convolved features can effectively represent the intrinsic data structures of different image scenarios and self-adapt to various PED cases (serous and vascularized). The BM layer is blurred in some cases and thus impedes the feature extraction by a single-stage DNN; in dual-stage DNN, the millions of learnable parameters in the DNN model are trained simultaneously to fit the BM layer recognition task. Compared with a conventional classifier such as AdaBoost, which has a limited number of parameters, a DNN adjusts millions of parameters to address the issues of poor-quality images, and millions of parameters are expected to handle complicated scenarios better than a small parameter set.
Due to the different ratios of serous PEDs to vascularized PEDs between the subgroups, the segmentation performance of dual-stage DNN on simple cases is better than that on complicated cases. Compared with the image quality in simple cases, poor image quality in complicated cases, typically caused by speckles and abnormalities around the PED region, lowers the precision of layer recognition and PED region segmentation. As shown in Fig. 4, compared with the ground truth, vascularized PEDs may be over- or under-segmented by our framework because the PED regions are difficult to distinguish from the surrounding tissues. However, the proposed framework still had the highest performance.
In a recent study [21], PED segmentation was conducted in PCV patients using built-in commercial software, which was originally developed for drusenoid PED segmentation [20] based on threshold-based methods, and manual correction was performed when the automated results were incorrect. However, the software has some limitations [18,20]. First, as noted in an earlier study, for PEDs surrounded by abnormal and highly reflective exudates, the built-in commercial software could not precisely recognize the RPE layer, which eventually caused incorrect PED segmentation [18]. We utilized information from the whole image, fully convolved into the framework, to obtain the PED segmentation and theoretically overcome this challenge. Second, the software was designed to ignore PEDs with a height below a threshold set to 20 μm [20]. However, branching vascular networks and small PEDs in PCV occurred within the 20-μm limit beneath the RPE layer in some cases, as shown in the first row of column (c) in Fig. 4. Our algorithm sets no such limit and thus avoids this problem. With our framework, automated segmentation has the smallest segmentation error; thus this method requires fewer manual corrections, which will save time that would otherwise be devoted to manual correction and will therefore permit larger-scale population analyses.
In this study, the algorithm was developed and evaluated with data acquired from a Spectralis device. As our framework takes OCT images with a fixed resolution, we used the imresize process to ensure that the size of the input images was 384 × 384. After the final results were obtained from the framework, another imresize process was conducted to restore the images to their original size. Furthermore, prior to the segmentation procedure, all images were pre-processed for speckle noise reduction by the probability-based non-local means filter [36]. Although many denoising methods perform well, such as sparsity-based denoising [45,46], we chose this method because it is competitive with other state-of-the-art speckle removal techniques and accurately preserves edges and structural details at a small computational cost. Denoising methods such as sparsity-based denoising could be an option in our framework and will be studied in the future. In addition, FCN was used as the basis of our framework, while other single-stage networks such as U-net and RelayNet could also be employed [31,33]. We would like to investigate their performance in our framework in the future. Because our segmentation algorithm focuses on diseased images, we will build a classifier to separate normal and diseased images prior to segmentation in our future work. Finally, our database limited the testing of our algorithm on drusenoid PED, which seldom appears in PCV [47]. Since this type of PED did not appear in our PCV database, we will test the performance of our algorithm on it and on other diseases in the future.

Conclusion
Our dual-stage DNN framework can be applied to multiple types of PED segmentation (serous and vascularized) in PCV patients, which is in contrast to the existing algorithm studies that only focused on a particular type of PED. Moreover, our framework can be further extended to PED segmentations in central serous chorioretinopathy [48], choroidal neovascularization secondary to age-related macular degeneration [7] and myopic choroidal neovascularization [49] in which serous and vascularized PEDs commonly occur. Dual-stage DNN, suitable for large database segmentation, can provide needed information for better disease management.

Funding
National Natural Science Foundation of China (NSFC) (81570851, 81273424) and Project of the National Key Research Program on Precision Medicine (2016YFC0904800)

Disclosures
The authors declare that there are no conflicts of interest related to this article.