Robust non-perfusion area detection in three retinal plexuses using convolutional neural network in OCT angiography

Abstract: Non-perfusion area (NPA) is a quantitative biomarker useful for characterizing ischemia in diabetic retinopathy (DR). Projection-resolved optical coherence tomographic angiography (PR-OCTA) allows visualization of retinal capillaries and quantification of NPA in individual plexuses. However, poor scan quality can make current NPA detection algorithms unreliable and inaccurate. In this work, we present a robust NPA detection algorithm using a convolutional neural network (CNN). By merging information from OCT angiograms and OCT reflectance images, the CNN could exclude signal reduction and motion artifacts and detect avascular features from local to global scales with the resolution preserved. Across a wide range of signal strength indices, and on both healthy and DR eyes, the algorithm achieved high accuracy and repeatability.

More recently, we developed a "projection-resolved" (PR)-OCTA algorithm [15][16][17] that suppresses projection artifacts on both en face and cross-sectional angiograms and enhances depth resolution of vascular networks [7,9]. By using PR-OCTA, it is possible to acquire accurate, detailed images not only of the superficial vascular complex (SVC) but also the intermediate (ICP) and deep capillary plexuses (DCP). Recent work indicates that an evaluation of the individual plexuses provides a superior diagnostic accuracy in DR [7,9,18,19].
However, automated image analysis can yield incorrect estimations of NPA when the images are compromised by poor scan quality, motion, or signal reduction artifacts. We recently reported progress on detecting NPA on 6 × 6 mm² SVC angiograms using a customized network [20,21]. Despite this preliminary promise, it is more challenging to detect NPA in the ICP and DCP since these angiograms are more susceptible to signal reduction and motion artifacts. As light propagates deeper, the signal strength is attenuated and the signal-to-noise ratio may decrease. Although projection artifacts were suppressed by our prior work, the discontinuity of vasculature caused by shadows of superficial vessels is still very obvious, which makes NPA detection more challenging. Therefore, a robust NPA detection platform is essential to investigate pathological effects on the ICP and DCP.
In this work we present a robust NPA detection algorithm using a CNN that, regardless of image quality, is able to detect NPA in three retinal plexuses by accurately distinguishing decreased flow signal caused by nonperfusion from that caused by signal reduction and motion artifacts.

Dataset
OCTA scans were collected at the Casey Eye Institute, Oregon Health & Science University, Portland, OR, USA and Shanxi Eye Hospital, Taiyuan, Shanxi, PR China. The study was conducted in compliance with the Declaration of Helsinki.
Participants with healthy eyes and patients diagnosed with DR were scanned using a 70 kHz commercial OCTA system (RTVue-XR; Optovue, Fremont, CA) with a central wavelength of 840 nm. The eyes were scanned using a 3 × 3 mm² macular scan pattern. There were 304 A-lines in each B-scan and 304 B-scans in each volumetric scan. B-scans were repeated twice at the same location to produce an OCTA volume, and retinal blood flow was detected using the commercial split-spectrum amplitude-decorrelation angiography (SSADA) algorithm [22]. One X-fast and one Y-fast scan were obtained and registered to suppress motion artifacts [23].
For data preprocessing, retinal layers were segmented by a directional graph search algorithm incorporated in our in-house COOL-ART software for OCTA image processing, and segmentation errors were corrected manually on challenging scans (Fig. 1(A)) [24,25]. The PR-OCTA algorithm [15] removed projection artifacts and visualized retinal capillary plexuses in three distinct slabs located at different depths (Fig. 1(B)-(E)) [26].

Artifact-induced challenges in detecting NPA
Signal reduction is one of the major challenges for NPA detection. It is mainly caused by two factors: (1) shadow artifacts caused by vitreous opacity or vignetting; and (2) defocus artifacts caused by poor focusing, astigmatism, and aberrations. From an optical standpoint, shadow artifacts can be explained by signal attenuation below strongly absorbing or scattering obstructions; they present visual characteristics similar to NPA in that no vessels or capillaries appear in the affected area. This poses a challenge for machine learning methods. On defocused scans, not only is flow signal reduced, but NPA boundaries are blurred, which increases the difficulty of identifying small NPA.
Eye motion artifacts are the other major challenge. They include microsaccadic and bulk motion artifacts. Microsaccadic artifacts manifest as bright stripes with high flow values on angiograms, and can cause under-segmentation when they happen to fall on NPA. Bulk motion artifacts have signal strength similar to that of capillaries, and can leave minimal contrast between NPA and vascular areas. In particular, they make it challenging to calculate a self-adapted thresholding map for detecting NPA.

Detecting NPA using CNN
Regardless of the causes, a key challenge for accurate NPA detection is differentiating signal loss due to artifacts from true NPA. We hypothesize that the en face mean projection of OCT reflectance, along with the corresponding OCTA angiograms, can be used for this purpose. The key insight is that true NPA should manifest in only the OCTA signal, while signal loss due to artifacts will affect both the structural and angiographic signal simultaneously. In this study, the goal was to reliably segment NPA in three retinal plexuses. Input to the CNN then consisted of the slab-specific mean projection of angiographic and structural images. The mean projection of the OCT reflectance of the entire inner retina is also included because it could help distinguish reduced tissue reflectivity (e.g. due to fluid accumulation) from shadow.
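The three-image input described above can be sketched as a simple channel stack. This is a minimal illustration only: the function name `build_cnn_input` and the per-channel min-max scaling are assumptions made for the example, not details reported in this paper.

```python
import numpy as np

def build_cnn_input(angio_slab, refl_slab, refl_inner):
    """Stack three en face images (each H x W) into one H x W x 3 array.

    angio_slab : slab-specific mean projection of the OCTA signal
    refl_slab  : slab-specific mean projection of OCT reflectance
    refl_inner : mean projection of reflectance over the entire inner retina
    """
    channels = [np.asarray(c, dtype=np.float32)
                for c in (angio_slab, refl_slab, refl_inner)]
    if len({c.shape for c in channels}) != 1:
        raise ValueError("all en face images must share the same shape")

    scaled = []
    for c in channels:
        # Assumed preprocessing: scale each channel to [0, 1] independently.
        span = c.max() - c.min()
        scaled.append((c - c.min()) / span if span > 0 else np.zeros_like(c))
    return np.stack(scaled, axis=-1)
```

With 304 × 304 en face images, the result has shape (304, 304, 3), so each pixel carries both the angiographic and the structural evidence that the network needs to separate true NPA from signal loss.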
In common CNN architectures, pooling layers are interleaved with convolution layers. The repeated pooling layers compress features, enabling detection across multiple scales. However, with capillary widths of only 1-5 pixels, the reduction in feature resolution caused by repeated pooling is unacceptable, since it may eliminate capillaries from consideration entirely. Hence, we replaced most pooling layers with atrous kernels, which detect and extract multi-scale features while maintaining feature resolution.
Atrous kernels are generated by dilating the kernel with zeros. The dilation rate refers to the distance between the kernel elements. A dilation rate of one gives the original 3 × 3 kernel, a dilation rate of two inserts one zero between neighboring kernel elements, and a dilation rate of three inserts two zeros between them (Fig. 2). The overall kernel size increases with the number of zeros included, as does the distance between the kernel elements. As illustrated in Fig. 2, the result is that as kernel size increases the field of view for feature extraction grows, but critically the computational cost is equivalent across dilation rates. The CNN architecture designed in this study is shown in Fig. 3. During encoding, atrous kernels were applied in convolution layers. In deeper layers of the network the feature dimension increases. During decoding, the U-net-like architecture concatenated raw features from the encoding. To detect variously sized NPAs, atrous kernels were also used to parallelize feature decoding and extract the multi-scale features for decision-making. The softmax activation function generated the NPA probability maps as output. Pixels with probability greater than 0.5 were labeled as NPA.
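The kernel dilation described above can be made concrete with a short sketch. The helper `dilate_kernel` is a hypothetical name used only for illustration; in practice a deep learning framework applies the dilation implicitly rather than materializing the zeros.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel elements."""
    k = np.asarray(kernel)
    if rate < 1:
        raise ValueError("dilation rate must be >= 1")
    height = (k.shape[0] - 1) * rate + 1
    width = (k.shape[1] - 1) * rate + 1
    out = np.zeros((height, width), dtype=k.dtype)
    out[::rate, ::rate] = k  # original weights land on a sparse grid
    return out

base = np.ones((3, 3))
for rate in (1, 2, 3):
    dilated = dilate_kernel(base, rate)
    # Field of view grows (3, 5, 7) while the number of weights stays at 9.
    print(rate, dilated.shape, int(np.count_nonzero(dilated)))
```

The number of non-zero weights, and therefore the number of multiplications per output pixel, is the same at every dilation rate, which is why the larger field of view comes at no extra computational cost.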

Subjects and ground truth generation
The data used in this study consisted of a total of 1428 scans acquired from both eyes of 428 participants, including 122 normal controls, 56 diabetics without retinopathy, 118 with mild to moderate DR, and 132 with severe DR. DR scans included 2 repeat scans within each visit and 2-3 follow-up visits per patient. We reserved 110 cases for testing and used the remainder for training. Ten healthy eyes were also scanned to generate scans with manufactured signal reduction artifacts for training and for testing the robustness of the proposed algorithm on low-quality images. We manufactured scan quality reduction by placing neutral density filters in the beam path to simulate shadow artifacts, and by manually adjusting the defocus parameter in the AngioVue software to simulate defocus artifacts. The optical density of the filters ranged from 0.1 to 0.6, and the defocus ranged from −0.5 to −4.5 diopters relative to the baseline, which was scanned without neutral density filters and with optimal focus. Five of the ten manufactured signal reduction scan sets were reserved for validation.
Certified graders manually delineated the ground truth NPA, and a retinal specialist reviewed the results, using our in-house software, which was equipped with a semi-automated NPA detection algorithm (Fig. 4). Shadow artifacts were identified on both the inner retinal slab (Fig. 4 bottom left) and the plexus-specific en face reflectance images (Fig. 4 bottom right). The manual correction function allowed graders to remove over-detected NPA in shadow areas and to delineate under-detected NPA with a stylus or mouse, after which region-growing software filled the delineated area automatically. To generate a reliable ground truth, three graders manually delineated the NPA and a pixel-level majority vote was applied: a pixel was labeled as NPA when two or more graders marked it (Fig. 4(B1)-(B3)).
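For orientation, the optical density (OD) of a neutral density filter relates to its transmission by the standard definition T = 10^(−OD). The sketch below evaluates this for the range of filters quoted above. The `passes` parameter is an assumption added for illustration: if the filter sits in the sample beam, the light may traverse it on both the way in and the way out, but the text does not state this.

```python
def nd_transmission(optical_density, passes=1):
    """Fraction of light transmitted by a neutral density filter.

    Uses the standard definition T = 10 ** (-OD). `passes` is an
    illustrative assumption for a beam that crosses the filter more
    than once; it is not a detail taken from the paper.
    """
    return 10.0 ** (-optical_density * passes)

for od in (0.1, 0.3, 0.6):
    print(f"OD {od:.1f}: single-pass transmission {nd_transmission(od):.3f}")
```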
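A pixel-level majority vote over grader masks can be sketched as follows. The function name `majority_vote` and the boolean-mask representation are assumptions for the example; the in-house grading software itself is not described in enough detail to reproduce.

```python
import numpy as np

def majority_vote(masks, min_votes=2):
    """Combine binary NPA masks from several graders.

    A pixel is kept as NPA when at least `min_votes` graders marked it
    (two of three graders in the scheme described above).
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks], axis=0)
    return stack.sum(axis=0) >= min_votes

# Toy 2 x 2 masks from three hypothetical graders.
g1 = [[1, 1], [0, 0]]
g2 = [[1, 0], [0, 1]]
g3 = [[1, 1], [0, 0]]
print(majority_vote([g1, g2, g3]).astype(int))
```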

Implementation
The training loss was calculated by the cross-entropy function with L2 regularization to prevent overfitting:
$$E = -\sum_{i=1}^{N} y_i \log(p_i) + \alpha \lVert w \rVert_2^2$$
where y_i is the label, p_i is the output from a softmax activation function, N is the number of classes, w is the vector containing the weights of the convolution kernels, and α is the L2 regularization weight. During training, the Adam optimizer was used to accelerate convergence. Batch normalization was also applied to help prevent overfitting and improve generalization capability. The learning rate was decayed as training progressed to reach the optimal parameters. The designed CNN was implemented using TensorFlow on a system with an Intel i7-6800K CPU @ 3.40 GHz, 128 GB of DDR4 RAM, and an Nvidia 2080Ti graphics card (4352 CUDA cores, 1635 MHz boost clock, 11 GB GDDR6 memory).
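A cross-entropy loss with an L2 penalty on the kernel weights can be sketched in NumPy as below. This is an illustrative stand-in rather than the TensorFlow code used in the study; the function names, the averaging over pixels, and the small `eps` added for numerical stability are all assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def regularized_cross_entropy(logits, labels_onehot, weights, alpha, eps=1e-12):
    """Mean cross-entropy over pixels plus alpha times the squared L2 norm of the weights.

    logits        : (..., N) raw network outputs for N classes
    labels_onehot : (..., N) one-hot labels y_i
    weights       : list of arrays holding the convolution kernel weights w
    alpha         : L2 regularization weight
    """
    p = softmax(np.asarray(logits, dtype=float))
    ce = -(np.asarray(labels_onehot) * np.log(p + eps)).sum(axis=-1).mean()
    l2 = sum(float(np.sum(np.square(w))) for w in weights)
    return ce + alpha * l2
```

For two classes with equal logits, the cross-entropy term equals ln 2, and the penalty term grows with the squared magnitude of the weights, which is what discourages overfitting.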

Effect of signal attenuation
As illustrated by Fig. 5, neutral density filters attenuated the brightness of OCT angiograms, but the vasculature remained sharply defined because the laser beam was still focused on the retina (Fig. 5(B)). This can be quantified by the signal strength index (SSI), a proprietary measurement obtained by averaging the logarithmic reflectance of the tissue volume and normalizing the value to [0, 100] [27]. SSI relates to overall image brightness, a proxy for image quality. Defocus, however, not only reduces SSI but also blurs the entire angiogram (Fig. 5(C)). As demonstrated in Fig. 5, the effects of the neutral density filter and defocus are different; in neither case, however, are additional imaging artifacts introduced by the manufactured signal attenuation.
To better explore the effect of these artifacts, we normalized SSI by its highest value in the complete manufactured artifact data set. Normalized in this manner, the data indicate that neutral density filters effectively lowered SSI (Fig. 6(A)). In healthy eyes, only the foveal avascular zone (FAZ) is avascular. We determined an FAZ baseline from the inner retinal maximum projection en face angiogram, since the FAZ has the best contrast there (Fig. 7(A)). Even in OCTA scans attenuated by neutral density filters, the detected NPAs were quite similar to those in the unaltered scans (Fig. 7(B)-(D)). In order to directly quantify these results, we normalized the detected FAZ areas to the manually delineated baseline case. Ideally the normalized FAZ area would have a value of 1, indicating that signal reduction did not affect the CNN's performance. As can be seen from the essentially similar shape and size of the segmented NPA even as SSI declines from left to right in the figure, the algorithm's performance was not detrimentally affected by SSI decline. Indeed, in all of our measurements normalized FAZ area was close to unity, and showed no significant trend with SSI reduction (Fig. 6(B)-(D)). Next, we tested the performance of the proposed algorithm on manufactured defocus scans. The beam defocus was calibrated in diopters. By again normalizing SSI to its highest measured value in the defocus data set, and FAZ area to its baseline, we found that defocus successfully produced signal attenuation but did not adversely affect our algorithm's performance (Fig. 8). Visual inspection confirms this conclusion, as the shape and area of the detected FAZ did not obviously change with increasing defocus (Fig. 9).
In order to quantify the effect of SSI reduction directly, we compared normalized FAZ area across our manufactured artifact data set, and again found no significant trend (Fig. 10).
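The normalization used in this analysis can be sketched as follows. The numbers in the usage lines are hypothetical values chosen for illustration, not measurements from this study, and the least-squares slope is only an informal indicator; the statistical test behind "no significant trend" is not specified in the text.

```python
import numpy as np

def normalize_ssi_and_faz(ssi, faz_area, baseline_faz_area):
    """Normalize SSI by its highest value and FAZ area by the baseline area."""
    ssi = np.asarray(ssi, dtype=float)
    faz = np.asarray(faz_area, dtype=float)
    return ssi / ssi.max(), faz / float(baseline_faz_area)

def trend_slope(norm_ssi, norm_faz):
    """Least-squares slope of normalized FAZ area against normalized SSI."""
    slope, _intercept = np.polyfit(norm_ssi, norm_faz, 1)
    return float(slope)

# Hypothetical example values (SSI, FAZ area in mm^2), for illustration only.
ssi = [80, 70, 60, 50]
faz = [0.30, 0.31, 0.29, 0.30]
norm_ssi, norm_faz = normalize_ssi_and_faz(ssi, faz, baseline_faz_area=0.30)
print(norm_ssi, norm_faz, trend_slope(norm_ssi, norm_faz))
```

A normalized FAZ area that stays near 1 while normalized SSI falls is the pattern that indicates insensitivity to signal strength.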

Effect of artifacts in clinical dataset
While it is not possible to manufacture other artifacts in a manner similar to signal strength reduction, it is still possible to assess our algorithm's performance on representative scans qualitatively. As noted, shadow artifacts are especially detrimental for NPA detection, since they mimic the appearance of avascular regions. As indicated in a representative scan (from an NPDR eye, Fig. 11), our algorithm was able to correctly exclude the shadow area from being detected as NPA.
In our previous algorithms, microsaccadic artifacts could also disrupt NPA detection and quantification. The algorithm we propose here correctly classified NPA regions crossed by microsaccadic artifacts as NPA (Fig. 12). Finally, bulk motion artifacts can increase the background level in OCTA scans, potentially diminishing contrast between vasculature and static tissue. This artifact also did not obviously affect our algorithm's performance (Fig. 13).

Performance on clinical DR scans
To further assess performance of the algorithm, we evaluated its segmentation accuracy on DR scans, since DR is a key application for NPA detection. We assessed the agreement between the outputs of the algorithm and the ground truth using four metrics: intersection over union (IOU), precision, recall, and F1 score. They are defined as:
$$\mathrm{IOU} = \frac{|GT \cap Out|}{|GT \cup Out|}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where GT is the manually graded NPA, Out is the NPA detected by the proposed algorithm, TP is the true positive count, FP is the false positive count, and FN is the false negative count. Using these metrics, we explored the effect of DR severity on NPA detection (Table 1). Across all DR severities and in each plexus, the algorithm achieved high IOU, precision, recall, and F1 score.
Table 1. Agreement between detected NPA and ground truth (mean ± standard deviation) grouped by DR severity
As expected, the extent of NPA increases with DR severity. Importantly, NPA in diabetics without retinopathy was larger than that in normal controls, indicating that NPA could serve as a biomarker for detecting DR at an early stage. However, the mean IOU deteriorated with more severe DR in the ICP and DCP. Despite the PR-OCTA algorithm, some projection artifacts remain in these plexuses. Scattering and absorption of light also attenuate flow signal in the deeper layers, reducing the contrast of NPA with the surrounding vasculature. The decline in performance is, however, partially due to the difficulty of establishing an adequate ground truth. In eyes with large NPA and significant signal attenuation, human graders may find it difficult to clearly delineate the true NPA. NPA detection on a representative severe DR case (SSI = 54) is demonstrated in Fig. 14. The detected NPA matches the ground truth well in the SVC (Fig. 14(A1)-(D1)), where NPA contrasts most with perfused regions. In the ICP (Fig. 14(A2)-(D2)), the primary detected regions still match the ground truth; and even in the DCP, which has the least contrast (Fig. 14(A3)-(D3)), the detection was accurate.
After close visual inspection, we found that small capillaries with low flow strength and isolated areas were excluded from the manually graded ground truth; the algorithm-labeled regions appear more "realistic" than the human-labeled ones. This explains why mean IOU is lower in more severe cases.
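The four agreement metrics used in this section can be computed from a pair of binary masks as sketched below. The function names are illustrative, and returning NaN when a denominator is zero is a convention assumed for the example rather than one stated in the text. The 0.5 threshold follows the labeling rule given earlier for the softmax output.

```python
import numpy as np

def threshold_probability(prob_map, threshold=0.5):
    """Label pixels whose NPA probability exceeds the threshold."""
    return np.asarray(prob_map) > threshold

def segmentation_metrics(gt, out):
    """IOU, precision, recall and F1 between ground truth and detected NPA."""
    gt = np.asarray(gt, dtype=bool)
    out = np.asarray(out, dtype=bool)
    tp = int(np.logical_and(gt, out).sum())    # NPA in both masks
    fp = int(np.logical_and(~gt, out).sum())   # detected but not graded as NPA
    fn = int(np.logical_and(gt, ~out).sum())   # graded as NPA but missed

    def ratio(num, den):
        # Assumed convention: undefined ratios are reported as NaN.
        return num / den if den > 0 else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),  # |GT ∩ Out| / |GT ∪ Out|
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

gt = [[1, 1], [0, 0]]
out = threshold_probability([[0.9, 0.2], [0.6, 0.5]])
print(segmentation_metrics(gt, out))
```

The F1 expression 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean of precision and recall, and avoids a second division by a possibly undefined value.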

Repeatability assessment
Fifty healthy eyes enrolled in our clinical studies were scanned twice to test repeatability by calculating the coefficient of variation (CV) in NPA segmentation. Scans affected by shadow and motion artifacts were also included in this dataset. Our study shows that the intra-visit repeatability is very high (Table 2). In a healthy eye, detected NPA should be identical in the whole inner retinal slab, SVC, ICP and DCP. This was demonstrated by the nearly consistent NPA area in Table 1. Moreover, the size of the detected FAZ closely matched our previous outcome [7], which indicates that the performance of the algorithm is robust in all layers and reliable in clinical studies. Figure 15 shows a test scan with focal shadow artifacts. The designed CNN was capable of differentiating shadow artifacts from NPA. The majority of shadow artifacts were excluded from the outputs, and only a few small areas were mis-detected (Fig. 15(C1)). Despite these occasional mis-segmentations, our algorithm performs favorably against other algorithms that have either ignored the effect of shadow artifacts, as in [28,29], or required manual corrections [30].
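One common way to compute a CV from paired repeat scans is to divide the pooled within-eye standard deviation by the overall mean. The sketch below uses that definition as an assumption, since the exact CV formula is not given in the text; the function name is likewise illustrative.

```python
import numpy as np

def coefficient_of_variation(scan1, scan2):
    """CV of repeat measurements: pooled within-eye SD divided by the grand mean.

    scan1, scan2 : NPA areas from the first and second scan of each eye.
    Assumed definition; for two repeats the within-eye sample variance
    is (a - b)**2 / 2.
    """
    a = np.asarray(scan1, dtype=float)
    b = np.asarray(scan2, dtype=float)
    within_var = (a - b) ** 2 / 2.0
    pooled_sd = np.sqrt(within_var.mean())
    grand_mean = np.concatenate([a, b]).mean()
    return float(pooled_sd / grand_mean)
```

Identical repeat measurements give a CV of 0, and the value rises as the two scans of the same eye disagree more relative to the typical NPA size.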

Discussion and conclusion
While NPA is an important biomarker for DR [7,9,14,19], accurately quantifying NPA from OCTA scans can be challenging. In particular, artifacts that reduce signal have confounded previous automated approaches to identifying NPA, resulting in misclassifications [28,29,31] that require manual correction [30]. While our previous work also used the OCT reflectance image to account for shadow artifacts, our results were limited to the SVC [20,21], requiring manual correction in the ICP or DCP. This is not an insignificant task, since it would require an expert to examine two-thirds of the plexus en face images. Because the NPA in the ICP and DCP can provide information about DR progression distinct from that in the SVC, an accurate assessment of these plexuses is critical [32]. However, automated NPA detection in the deeper plexuses poses a significant challenge. First, it requires an accurate means of removing flow projection artifacts. Previous methods such as slab subtraction could not remove the projection artifacts without introducing disruption in the vasculature that led to false identification of NPA [33]. Second, signal strength diminishes in deeper anatomic slabs, making algorithms more vulnerable to error from signal-reducing artifacts and necessitating an algorithm that can demonstrate insensitivity to such variation (section 4.1). The ability to mitigate the effect of inherent signal variation between different images is especially important because diseased eyes often have lower signal strength (SSI) due to complications such as edema, cataract, floaters and poor tear film quality. An algorithm dependent on SSI could therefore compound the effect of disease, since SSI-biased NPA detection is also potentially pathology-biased.
We have presented a method that accurately detects NPA on three retinal plexuses. The test set used to evaluate performance did not exclude any OCTA scans due to image quality regardless of artifacts or high noise levels, indicating that the CNN could be effective on clinical data sets.
Several design choices contributed to the CNN's performance. By including both plexus-specific angiograms and en face OCT reflectance maps in the CNN input, it was able to differentiate NPA from shadow artifacts. The CNN also incorporated atrous kernels, which enabled it to detect NPA at multiple scales while maintaining the resolution of the detected regions. The low CV indicated high repeatability in healthy controls scanned in the clinic. The proposed method yielded more repeatable results than manual grading, which may be affected by subjective judgements, and than thresholding and other image-processing-based methods [19,31,34]. Finally, the CNN was trained on a diverse data set, which allowed it to learn a large variety of features relevant to the NPA detection task and facilitated accurate performance even on images with severe disease or low quality. One possible deficiency in the proposed solution is that its segmentation accuracy apparently diminished as DR severity increased. While disagreement with the ground truth is obviously not preferred, NPA segmentation in severe DR cases is difficult not just for automated procedures but for human graders as well. Thus, reduced agreement between the algorithm and human graders may not necessarily indicate worse performance.