Automatic geographic atrophy segmentation using optical attenuation in OCT scans with deep learning

Abstract
A deep learning algorithm was developed to automatically identify, segment, and quantify geographic atrophy (GA) based on optical attenuation coefficients (OACs) calculated from optical coherence tomography (OCT) datasets. Normal eyes and eyes with GA secondary to age-related macular degeneration were imaged with swept-source OCT using 6 × 6 mm scanning patterns. OACs calculated from OCT scans were used to generate customized composite en face OAC images. GA lesions were identified and measured using customized en face sub-retinal pigment epithelium (subRPE) OCT images. Two deep learning models with the same U-Net architecture were trained using OAC images and subRPE OCT images. Model performance was evaluated using DICE similarity coefficients (DSCs). The GA areas were calculated and compared with manual segmentations using Pearson's correlation and Bland-Altman plots. In total, 80 GA eyes and 60 normal eyes were included in this study, of which 16 GA eyes and 12 normal eyes were used to test the models. Both models identified GA with 100% sensitivity and specificity at the subject level. For the GA eyes, the model trained with OAC images achieved significantly higher DSCs, stronger correlation with manual results, and smaller mean bias than the model trained with subRPE OCT images (0.940 ± 0.032 vs 0.889 ± 0.056, p = 0.03, paired t-test; r = 0.995 vs r = 0.959; mean bias = 0.011 mm vs mean bias = 0.117 mm). In summary, the proposed deep learning model using composite OAC images effectively and accurately identified, segmented, and quantified GA using OCT scans.


Introduction
Geographic atrophy (GA), also known as complete RPE and outer retinal atrophy (cRORA) [1], forms in the late stage of nonexudative age-related macular degeneration (AMD) and is characterized by the loss of photoreceptors, retinal pigment epithelium (RPE), and choriocapillaris (CC) [1,2]. GA is estimated to affect approximately 5 million people globally, and its prevalence increases exponentially with age [3,4]. With no FDA-approved treatment available, it remains a leading cause of irreversible vision loss worldwide [5,6]. Currently, several clinical trials are underway using complement inhibitors for the treatment of GA, where the growth rate of the GA area is used as the primary study endpoint [7-10]. An automated and accurate approach to identify, segment, and quantify GA would be of great interest and importance for following these patients in clinical practice and for confirming the effectiveness of treatments in clinical trials.
Historically, GA was defined as a sharply demarcated area of apparent hypopigmentation on color fundus imaging (CFI), but with current technological advancements, fundus autofluorescence (FAF) and optical coherence tomography (OCT) have become more widely used in clinical practice and clinical research to identify and measure GA [11-19]. Both modalities have their own advantages: FAF contrasts RPE pigmentation by visualizing ocular fluorophores such as melanin and lipofuscin [4], while OCT provides depth-resolved information using both spectral domain OCT (SD-OCT) and swept-source OCT (SS-OCT). From OCT datasets, it is possible to define custom slabs under the RPE based on boundary-specific segmentation, allowing for the visualization of defined anatomical regions. One custom slab, known as the sub-retinal pigment epithelium slab (subRPE slab), can produce en face images that specifically accentuate the choroidal hyper-transmission defects (hyperTDs) that arise when the RPE is attenuated or absent [11,12,20,21].
Several automatic algorithms have been developed using various imaging approaches to segment GA and calculate GA areas. These include computer vision and traditional machine learning approaches such as the region-based Chan-Vese model [22], random forest classifiers [23], and fuzzy c-means clustering [24], as well as modern deep learning approaches such as sparse autoencoders [25,26], convolutional neural networks (CNNs) [27-30], and a generative adversarial network (GAN) [31]. Overall, these published works demonstrated good agreement with human graders and achieved satisfactory DICE similarity coefficients (DSCs), ranging from 0.68 to 0.89.
In this study, we propose a new deep learning approach to identify, segment, and quantify GA area using optical attenuation coefficients (OACs) calculated from OCT datasets. We introduce novel en face OAC images to identify and visualize GA and a CNN model for the task of automatic GA identification and segmentation.

Methods
The OCT images used to develop, train, and test the algorithm were acquired as part of a prospective study performed at the University of Miami and approved by the Institutional Review Board of the University of Miami Miller School of Medicine, in adherence to the tenets of the Declaration of Helsinki and the Health Insurance Portability and Accountability Act of 1996 regulations. Normal subjects and subjects diagnosed with GA secondary to nonexudative AMD were enrolled from June 2016 to November 2019, with informed consent obtained before participation.

Imaging acquisition
All subjects underwent SS-OCT scanning (PLEX Elite 9000, Carl Zeiss Meditec, Dublin, CA). This instrument uses a 100 kHz light source with a 1050 nm central wavelength and a 100 nm bandwidth, resulting in an axial resolution of ∼5.5 µm and a lateral resolution of ∼20 µm estimated at the retinal surface. 6 × 6 mm scans were acquired at the baseline visit, at six-month follow-up visits, and at one-year follow-up visits as previously described [18]. For each 6 × 6 mm scan, there are 1536 pixels on each A-line (3 mm), 500 A-lines on each B-scan, and 500 sets of twice-repeated B-scans. Scans with a signal strength less than 7 or evident motion artifacts were excluded from further data analysis.
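The scan geometry above fixes the pixel dimensions used throughout the pipeline. As a quick sketch (values taken from the text, not from instrument documentation), the axial pixel size and lateral A-line spacing work out as follows:

```python
# Scan geometry of the 6 x 6 mm pattern described above (a sketch; the
# numbers come from the text, not from the instrument's specifications).
AXIAL_DEPTH_MM = 3.0      # imaging depth covered by one A-line
AXIAL_PIXELS = 1536       # pixels per A-line
ALINES_PER_BSCAN = 500    # A-lines per B-scan
SCAN_WIDTH_MM = 6.0       # lateral extent of the scan

delta_axial_mm = AXIAL_DEPTH_MM / AXIAL_PIXELS        # ~0.00195 mm (~1.95 um)
delta_lateral_mm = SCAN_WIDTH_MM / ALINES_PER_BSCAN   # 0.012 mm (12 um)

print(f"axial pixel size: {delta_axial_mm * 1000:.2f} um")
print(f"lateral A-line spacing: {delta_lateral_mm * 1000:.1f} um")
```

The axial pixel size is the ∆ that appears in the OAC calculation described in the next section.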

Image processing
A depth-resolved single scattering model was applied to calculate the OAC for each pixel in the volumetric SS-OCT dataset [32,33]. Briefly, if it is assumed that all light is completely attenuated within the imaging depth range, that the backscattered light is a fixed fraction of the attenuated light, and that the detected light intensity is uniform over a pixel, then the OAC at each pixel in the volumetric OCT dataset can be written as:

µ[i] = I[i] / (2∆ · Σ_{j=i+1}^∞ I[j])  (1)

where µ[i] is the OAC of the ith pixel, with a unit of mm⁻¹, ∆ is the axial size of a pixel (mm), and I[i] is the detected OCT signal intensity (linear scale) at the ith pixel. Given the assumption of full attenuation within the imaging range, the tail sum Σ_{j=i+1}^∞ I[j] could be calculated as the summation of the OCT intensities of all pixels beneath the ith pixel. An equation provided by the manufacturer was used to convert the log-scale SS-OCT data back to the linear scale.
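The per-pixel estimate in Eq. (1) can be vectorized over a whole B-scan with a reversed cumulative sum. The sketch below assumes a hypothetical linear-scale intensity array of shape (depth, n_alines); it illustrates the single-scattering model rather than reproducing the authors' exact implementation:

```python
import numpy as np

def estimate_oac(intensity, delta_mm):
    """Depth-resolved single-scattering OAC estimate following Eq. (1):
    mu[i] = I[i] / (2 * delta * sum_{j>i} I[j]).

    `intensity`: linear-scale OCT intensities, shape (depth, n_alines).
    `delta_mm`: axial pixel size in mm. Returns OAC in mm^-1.
    """
    # Tail sum: total intensity of all pixels beneath pixel i on each A-line.
    total_below = np.flip(np.cumsum(np.flip(intensity, axis=0), axis=0),
                          axis=0) - intensity
    eps = 1e-12  # guard against division by zero at the bottom of the scan
    return intensity / (2.0 * delta_mm * (total_below + eps))
```

Because the tail sum vanishes at the deepest pixel, the estimate diverges there; in practice the slabs of interest lie well above the bottom of the scan, so this edge case is harmless.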
An automatic proprietary segmentation algorithm (Carl Zeiss Meditec, Dublin, CA) was used to identify the lower boundary of the retinal nerve fiber layer (RNFL) and Bruch's membrane (BM) on the volumetric SS-OCT data. These segmentation lines were then applied to the volumetric OAC data using MATLAB version R2016b software (MathWorks, Natick, Massachusetts). Two separate slabs were defined to generate en face images for GA visualization and segmentation. The first slab was defined as extending from the RNFL to BM, as shown between the red dashed lines on an OAC B-scan in Fig. 1(A). For this slab, four different en face OAC images were automatically generated: the maximum projection image (OAC max image, Fig. 1(B)), the sum projection image (OAC sum image, Fig. 1(C)), the RPE to BM distance map (the OAC elevation map, Figs. 1(D), 1(E)), and the false color OAC image, which is a composite using a different OAC image in each color channel, i.e., the OAC max image in the red channel, the OAC sum image in the green channel, and the OAC elevation map in the blue channel (Fig. 1(F)), as previously described in detail [34]. The position of the RPE is defined as the pixel with the maximum OAC value above BM along each A-line, and the OAC elevation map represents the distance between the RPE and BM. A 5 × 5 pixel median filter was used for smoothing to reduce the appearance of noise in the image. Green dashed lines in Fig. 1(D) show an example of this RPE-BM distance on an OAC B-scan. The second slab, also known as the subRPE slab, was defined as extending from 64 µm below BM to 400 µm below BM, shown between the yellow dashed lines in Fig. 1(G). For this subRPE slab, an en face OCT image was also automatically generated using the sum projection (Fig. 1(H)). GA lesions (Fig. 1(I)) were manually outlined on this en face subRPE OCT image, referencing B-scans, by two independent graders (Y.S. and L.W.) using Photoshop CC (Adobe Systems, San Jose, California, USA), and consensus boundaries were reached between both graders. In cases of disagreement, a senior grader (P.J.R.) served as the adjudicator, as described in previous studies [18]. All images were resized to 512 × 512 pixels.
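The composite false color image described above can be sketched as three en face projections stacked into the RGB channels. The function below assumes a hypothetical OAC volume already cropped between the lower RNFL boundary and BM, with BM at the bottom of the slab (an illustrative simplification of the authors' pipeline; the 5 × 5 median smoothing mentioned in the text is omitted here):

```python
import numpy as np

def composite_false_color(oac_slab):
    """Composite en face OAC image: max projection -> red, sum
    projection -> green, RPE-to-BM elevation -> blue.

    `oac_slab`: OAC volume of shape (depth, rows, cols), RNFL-to-BM,
    with BM assumed to lie at the bottom row of the slab.
    """
    oac_max = oac_slab.max(axis=0)   # maximum projection (red channel)
    oac_sum = oac_slab.sum(axis=0)   # sum projection (green channel)
    # RPE position: pixel with the maximum OAC above BM on each A-line;
    # elevation is its distance (in pixels) from BM (blue channel).
    rpe_idx = oac_slab.argmax(axis=0)
    elevation = (oac_slab.shape[0] - 1) - rpe_idx

    def normalize(img):
        """Scale an image to [0, 1] for display."""
        img = img.astype(float)
        rng = img.max() - img.min()
        return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)

    return np.stack([normalize(oac_max), normalize(oac_sum),
                     normalize(elevation)], axis=-1)
```

In the actual pipeline the elevation would be converted to physical units using the axial pixel size, and each channel's display scaling would be tuned for contrast; the per-channel min-max normalization here is only a placeholder.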

Deep learning model
To perform the GA segmentation task, two deep learning models were trained using the same U-Net architecture with different input images, namely, the composite OAC false color images and the en face OCT subRPE images. The U-Net model architecture is shown in Fig. 2 with all specifications labeled. The only difference between the two models is the input layer: one accepts a 3-channel image and the other a 1-channel image. Training used 80% of all eyes, and testing used the remaining 20%. Within the training cases, there was also an 80:20 split between training and validation, partitioned at the eye level. Cases were shuffled and the set division was completely random. The hyperparameters (learning rate, dropout, and batch normalization) for the training process were tuned on the validation set using grid search. During training, data augmentation with zoom, shear, and rotation was implemented, and a batch size of 8 was used. For each 3 × 3 convolution block, the He normal initializer [35] was used for kernel initialization. The Adam optimizer was used, the model evaluation metric was defined as the soft DSC (sDSC), and the loss function was the sDSC loss:

sDSC = (2 Σ pᵢgᵢ + s) / (Σ pᵢ + Σ gᵢ + s), with sums over i = 1, …, N; Loss = 1 − sDSC  (2)

where N is the number of all pixels, pᵢ and gᵢ represent the ith pixel on the prediction and ground truth image, respectively, and s is a smoothing constant set to 0.0001 to avoid division by zero. Each model was trained for up to 200 epochs with an early-stopping patience of 50 epochs, and only the model with the best metric was saved. The models were implemented in Keras using TensorFlow as the backend, and training was performed on a 16 GB NVIDIA Tesla P100 GPU through Google Colab.
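The sDSC loss of Eq. (2) can be written compactly; the NumPy sketch below mirrors the smoothed form given in the text (in the paper the loss would be a Keras/TensorFlow function operating on tensors, but the arithmetic is identical):

```python
import numpy as np

def soft_dice_loss(pred, truth, s=1e-4):
    """Soft DSC loss from Eq. (2): loss = 1 - sDSC, where
    sDSC = (2 * sum(p * g) + s) / (sum(p) + sum(g) + s).

    `pred`: predicted probabilities in [0, 1]; `truth`: binary ground
    truth; `s`: smoothing constant to avoid division by zero.
    """
    p = np.asarray(pred, dtype=float).ravel()
    g = np.asarray(truth, dtype=float).ravel()
    sdsc = (2.0 * np.sum(p * g) + s) / (np.sum(p) + np.sum(g) + s)
    return 1.0 - sdsc
```

A perfect prediction gives a loss of 0, and a completely wrong prediction approaches 1; because the predictions are soft probabilities, the loss is differentiable and can be minimized directly by Adam.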

Evaluation metrics and statistical analysis
To evaluate the performance of the trained models, the DSC, area square-root difference (ASRD), and subject-level sensitivity and specificity were calculated on the testing set:

DSC = 2TP / (2TP + FP + FN)  (3)
Sensitivity = TP / (TP + FN)  (4)
Specificity = TN / (TN + FP)  (5)

where TP denotes true positive, TN denotes true negative, FP denotes false positive, and FN denotes false negative. Note that TP, FP, and FN in Eq. (3) represent pixel-level information, while TP, TN, FP, and FN in Eqs. (4) and (5) represent eye-level information. A threshold of 0.5 was used to binarize the probability map from the model's prediction output. An image with any GA pixels was classified as having GA.
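Eqs. (3)-(5) and the eye-level decision rule can be sketched directly from binary masks (the 0.5 threshold and the any-GA-pixel rule are stated in the text; the function names here are illustrative):

```python
import numpy as np

def dsc(pred_mask, truth_mask):
    """Pixel-level DICE similarity coefficient, Eq. (3):
    DSC = 2*TP / (2*TP + FP + FN)."""
    p = np.asarray(pred_mask).astype(bool)
    g = np.asarray(truth_mask).astype(bool)
    tp = np.sum(p & g)    # pixels segmented by both model and grader
    fp = np.sum(p & ~g)   # pixels segmented only by the model
    fn = np.sum(~p & g)   # pixels segmented only by the grader
    return 2.0 * tp / (2.0 * tp + fp + fn)

def classify_eye(prob_map, threshold=0.5):
    """Eye-level decision feeding sensitivity/specificity, Eqs. (4)-(5):
    binarize the probability map at 0.5; any GA pixel -> the eye has GA."""
    return bool(np.any(prob_map >= threshold))
```

Sensitivity and specificity then follow by counting the eye-level TP/TN/FP/FN decisions over the test set.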
To further compare the identified GA regions, total area and square-root area measurements of GA were calculated for both the ground truth and the model outputs. A square-root transformation was applied to calculate the size and growth of GA since this strategy decreases the influence of baseline lesion size on the test-retest variability and on the growth of GA [19,36]. The paired t-test was used to compare model outputs using OAC composite images and OCT subRPE images. Pearson's linear correlation was used to compare the square-root area of the manual and automatic GA segmentations, and Bland-Altman plots were used to analyze the agreement between them. P values < 0.05 were considered statistically significant.
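The square-root area measurement and the Bland-Altman summary statistics can be sketched as follows. The pixel footprint shown in the test, (6/512)², is an assumption for illustration (a 512 × 512 en face image of a 6 × 6 mm scan); these helper names are hypothetical:

```python
import numpy as np

def sqrt_area_mm(mask, pixel_area_mm2):
    """Square-root GA area (mm) from a binary en face mask.

    `pixel_area_mm2`: en face area of one pixel in mm^2.
    """
    return float(np.sqrt(np.count_nonzero(mask) * pixel_area_mm2))

def bland_altman(manual, auto):
    """Mean bias and 95% limits of agreement between paired square-root
    area measurements (manual minus automatic)."""
    d = np.asarray(manual, dtype=float) - np.asarray(auto, dtype=float)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias, bias - half_width, bias + half_width
```

The square-root transform makes the bias and limits of agreement comparable across small and large lesions, which is why it is the standard scale for GA growth analyses.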

Results
In total, 80 eyes diagnosed with GA secondary to nonexudative AMD and 60 normal eyes with no history of ocular disease, normal vision, and no identified optic disc, retinal, or choroidal pathologies on examination were included in this study. All cases were randomly shuffled such that 51 GA eyes and 38 normal eyes were used for training, 13 GA eyes and 10 normal eyes were used for validation, and 16 GA eyes and 12 normal eyes were used for testing. In the training dataset, 22 of the 51 GA eyes had three scans from three visits, and these scans were added into the training set for data augmentation. Eyes in the validation and testing sets only had one scan each. Table 1 summarizes the patient demographics in this study. In total, two separate models with the same architecture were trained using the same datasets but with different en face images as the input, namely, the composite OAC false color model and the OCT subRPE model. Both models had the same learning rate of 0.0003 and the same batch normalization momentum of 0.1 with the scale set as false. A dropout of 0.3 was used for the composite OAC model and a dropout of 0.5 was used for the OCT subRPE model. All hyperparameters were tuned on the validation set. Each model was trained with 200 epochs, and their specific sDSCs for training, validation, and testing are given in Table 2.
A series of evaluation metrics was quantified on the testing cases for each trained model, and the specific values are tabulated in Table 3. For testing, the model outputs, GA probability maps (0-1), were binarized with a threshold of 0.5. The DSC was calculated for each individual image, and the mean and standard deviation (SD) are reported in Table 3. To further compare the quantification of GA segmentation generated by our models with the ground truth, the GA square-root area was calculated for all GA cases in the test set. Figure 8 shows the Bland-Altman plots and Pearson's correlation plots for both proposed models. The GA square-root area segmented by both models showed significant correlation with the ground truth (R² = 0.99 for the composite OAC model and R² = 0.92 for the OCT subRPE model, both p < 0.0001). Both model outputs also showed satisfactory agreement with the ground truth; compared with the ground truth, the composite OAC model resulted in a smaller bias of 11 µm while the OCT subRPE model resulted in a larger bias of 117 µm.

Discussion and conclusion
In this study, we introduced a novel strategy to visualize GA using OAC calculated from OCT datasets, and we demonstrated that the proposed composite OAC false color en face images could be effectively used to identify, segment, and quantify GA areas automatically with a deep learning model.
According to the Classification of Atrophy Meetings (CAM) consensus, GA or cRORA is defined by three inclusive OCT criteria: (1) a region of hyperTD of at least 250 µm in its greatest linear dimension, (2) a zone of attenuation or disruption of the RPE of at least 250 µm in its greatest linear dimension, and (3) evidence of overlying photoreceptor degeneration; and one exclusive criterion: the presence of scrolled RPE or other signs of an RPE tear [1]. This definition of GA or cRORA relies solely on averaged B-scans, but en face imaging of GA using the subRPE slab is a convenient alternative to fundus autofluorescence and conventional OCT B-scans for the detection of GA [20,21,36-38]. Our proposed OAC approach is particularly suitable for GA identification because it allows en face views with direct three-dimensional information on RPE attenuation and disruption. The OAC quantifies a tissue's ability to attenuate (absorb and scatter) light, making it particularly useful for identifying high pigmentation (or the lack thereof) in retinal tissues. Using our custom slab and en face imaging strategy, we show that the composite OAC false color images provide a novel approach to visualizing the RPE with strong contrast. When RPE cells die and lose their pigment, their OAC values are reduced as well, resulting in a dark appearance on the composite OAC images. This is very similar to the loss of lipofuscin causing a dark appearance on FAF images, although the OAC is not specific to lipofuscin or melanin. In addition to the enhanced contrast for attenuated or disrupted RPE, our OAC approach also provides the same depth-resolved advantage as the traditional OCT approach. By incorporating three different en face images from the same slab in the composite OAC images, we provide depth-resolved information, namely the RPE elevation, in an en face view. This approach is also useful for identifying drusen and other forms of RPE elevation in AMD eyes [34].
We used the same deep learning model architecture and trained two models with different input images: the traditional OCT subRPE images that were widely used in previous studies [22,26,31] and the novel composite OAC images introduced in this study. We purposely designed our study to illustrate the importance of image pre-processing for GA segmentation with deep learning. Using the same model architecture, the same hyperparameter tuning process, and the same patients' OCT scans, we demonstrated significantly higher agreement with the ground truth when using composite OAC images than when using traditional OCT subRPE images in our testing set. For all 28 eyes in our testing sets, both models successfully distinguished eyes with GA from normal eyes. For the 16 eyes with GA in the testing sets, the composite OAC model achieved a mean DSC of 0.940 with an SD of 0.032, significantly higher than the OCT subRPE model's mean DSC of 0.889 with an SD of 0.056 (p = 0.03, paired t-test). For GA square-root area measurements, the composite OAC model achieved a stronger correlation with the ground truth than the OCT subRPE model (r = 0.995 vs r = 0.959, R² = 0.99 vs R² = 0.92), as well as a smaller mean bias (11 µm vs 117 µm).
Previous studies have reported GA segmentation models with different input images resulting in different DSC segmentation accuracies. Using CFI as inputs, Feeny et al. reported a DSC of 0.68 ± 0.25 using a random forest algorithm [23], and Liefers et al. reported a DSC of 0.72 ± 0.26 with a deep learning network [39]. In comparison, models with FAF and SD-OCT images have resulted in higher DSC values. Hu et al. reported a DSC of 0.89 ± 0.07 using FAF inputs and a DSC of 0.87 ± 0.09 using SD-OCT inputs with a level set approach [40]. Wu et al. reported a DSC of 0.872 ± 0.066 using SD-OCT images as input with GAN-synthesized FAF and a U-Net model for segmentation [31]. In our study, using a simple U-Net model and SS-OCT data, we achieved a DSC of 0.889 ± 0.056 using subRPE images, similar to those used in SD-OCT studies. Though different datasets were used in different studies and direct comparisons of testing DSC values are somewhat unfair, our SS-OCT subRPE model achieved a segmentation accuracy similar to these previous studies. However, the model using composite OAC images achieved a significantly higher segmentation accuracy (0.940 ± 0.032) than the model using OCT subRPE images. This is a fair comparison since the same volumetric SS-OCT datasets were used to generate the en face input images for both models. It should also be noted that although our deep learning model structure is simpler than some of those in previously published studies [29-31], our segmentation accuracy in terms of DSC is similar or superior to what was reported before, likely due to the use of denser and higher quality SS-OCT scans as well as the enhanced contrast of GA produced by using the OAC.
The criteria for defining GA or cRORA are evolving, and this adds difficulty to the task of GA segmentation. In our study, we followed the criteria proposed by the CAM consensus meeting and defined regions of cRORA based on the attenuation/disruption of the RPE [1]. Figures 4-7 demonstrate some segmentation inaccuracies of our proposed model, using solid arrows to represent false negative segmentation and dashed arrows to represent false positive segmentation. It should be noted that we did not apply a size threshold (250 microns in the greatest linear dimension) to our model outputs; therefore, some false positive segmentations (Fig. 4, dashed arrows) may represent incomplete RPE and outer retinal atrophy (iRORA) instead of cRORA due to their size. Depending on the specific application, further developments and uses of our model could include size thresholding. There are several situations that may lead to false negative results. Figure 5 shows an example of un-segmented cRORA that may not reach the 250 micron threshold in the horizontal dimension, yet the border of the lesion in a non-horizontal dimension may exceed 250 microns. As shown in the B-scans (panels C and F), even though hyperTDs and attenuated/disrupted RPE are present, the RPE is still elevated, and the hypertransmission or OAC defect can still be seen on both the composite OAC image and the subRPE OCT image. This raises the question of whether restricting the definition of cRORA to the horizontal dimension really makes sense, since the progression of disease occurs in all dimensions. The advantage of en face imaging is that the full geometry of disease progression can be appreciated, not just what occurs in the horizontal dimension on the B-scan. In these situations, both models may be better at identifying the full extent of disease. Figure 6 shows an example of a false negative outcome where residual RPE was surrounded by cRORA.
According to the manual grading guidelines, such regions were included in the cRORA outline, but both models struggled to correctly segment them as there is still residual RPE signal present. Figure 7 shows an example of a false negative outcome caused by choroidal vessel artifacts. The OCT subRPE model failed to identify these regions correctly, but the composite OAC model correctly labeled these regions as cRORA, since the choroidal vessel artifacts are not as prominent on OAC images as they are on OCT subRPE images.
There are several limitations to our study. First, our approach is an estimation of the true OAC values and might not be an entirely accurate representation of the OAC. We selected this approach for its computational simplicity and its depth-resolved nature. It should also be noted that the assumption of complete signal attenuation within the imaging range is often violated within GA lesions, meaning that our study could underestimate the OAC values within the GA region. Other OAC estimation approaches could potentially be more accurate but would not serve the purpose here, which was to develop an automated segmentation of GA [41-43]. Second, we used a well-known and simple U-Net model structure, so our technical contribution in terms of model architecture is very limited. We made this choice to emphasize the added value of the proposed composite OAC images and to demonstrate that innovation in data pre-processing can be equally important, if not more important, than innovation in model architecture in medical imaging. Third, we had a limited dataset to work with: only 80 eyes with GA were included in our study, and there was a significant difference in age between our eyes with GA and normal eyes. With the limited number of patients, we were not able to include only one eye per patient, which could also introduce bias into our study. Future development and validation using a larger dataset are warranted. Lastly, although our study was based on SS-OCT data, it remains to be seen whether our proposed method can be applied equally well to SD-OCT data. Our study included only OCT scans from the PLEX Elite 9000; there could be variation among different OCT instruments in converting the OCT signal back to the linear scale, and future studies are warranted.
In summary, we proposed a novel strategy to visualize and quantify GA using composite OAC false color en face images generated from SS-OCT datasets, and we demonstrated that these OAC images could be used to identify, segment, and quantify GA automatically and accurately with a simple deep learning U-Net model.

Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.