Retinal layer thicknesses retrieved with different segmentation algorithms from optical coherence tomography scans acquired under different signal-to-noise ratio conditions

Abstract: Glaucomatous damage can be quantified by measuring the thickness of different retinal layers. However, poor image quality may hamper the accuracy of the layer thickness measurement. We determined the effect of poor image quality (low signal-to-noise ratio) on the different layer thicknesses and compared different segmentation algorithms regarding their robustness against this degrading effect. For this purpose, we performed OCT measurements in the macular area of healthy subjects and degraded the image quality by employing neutral density filters. We also analysed OCT scans from glaucoma patients with different disease severity. The algorithms used were the Canon HS-100’s built-in algorithm, DOCTRAP, IOWA, and FWHM, an approach we developed. We showed that the four algorithms were all susceptible to noise to a varying degree, depending on the retinal layer assessed, and that the results of different algorithms were not interchangeable. The algorithms also differed in their ability to differentiate between young healthy eyes and older glaucoma eyes and failed to accurately separate different glaucoma stages from each other.


Introduction
Glaucoma is an age-related optic neuropathy. The primary site involved in this disease consists of the retinal ganglion cells (RGCs). Loss of RGC axons causes excavation of the optic nerve head (ONH) and thinning of the retinal nerve fiber layer (RNFL). The ganglion cell complex (GCC) consists of the RNFL, the ganglion cell layer (GCL) and the inner plexiform layer (IPL), while the GCL and the IPL together are referred to as GCIPL. In order to measure the thickness of these retinal layers in vivo, the currently most commonly used technique is optical coherence tomography (OCT) [1]. It has been shown that OCT-reported layer thicknesses depend largely on image quality; a lower image quality reduces the observed thickness [2]. This underestimation of layer thickness hampers the usage of OCT both for the detection of glaucoma (screening) and for progression detection, once glaucoma has been diagnosed (monitoring).
OCT uses infrared radiation, which passes through several different media when assessing the retina, and is therefore subject to noise from media opacities [3]. Media opacities, in elderly patients commonly caused by cataract, thus decrease image quality and indeed make the RNFL seem thinner [4]. A similar thinning with a decrease in image quality was found in healthy subjects, using neutral density filters [5]. On the other hand, RNFL thickness may be overestimated due to image intensity transformation of the spectrometer output, done as a pre-processing step in order to make low reflective layers more visible [6]. Finally, the changes in the retinal tissue that occur in glaucoma may compromise image quality [7,8]. In order to accurately measure RNFL thickness, it is of great interest to fully understand the influence of image quality and image processing on the reported RNFL thickness [9].
In this study, we aimed (1) to reproduce the previously reported degrading effect of poor image quality (low signal-to-noise ratio (SNR)) on the different layer thicknesses and (2) to compare different segmentation algorithms regarding their robustness against this degrading effect. For this purpose, we performed OCT measurements in healthy subjects and degraded the image quality by employing neutral density filters. We also analyzed OCT scans from glaucoma patients, in which the image quality is generally lower, among other reasons because of a thinner RNFL with lower reflectivity. We analyzed the obtained images using (i) device-specific proprietary software, (ii) freely available segmentation software, namely the Iowa reference algorithm [10], which was also used by Darma et al. [5], and (iii) the DOCTRAP software [11]. As a reference (iv), we followed the theoretical framework that we developed recently [6]. In that study, we introduced the full width at half maximum (FWHM) of the square of the first peak in the Fourier-transformed spectrometer output as an unbiased estimate of the RNFL thickness. We focused on the macular area, which is at least as relevant to glaucoma diagnostics as the peripapillary area [12][13][14][15] and which allowed us to address different retinal layers. In a subsequent study, we will address the peripapillary area.

Study population
Twenty healthy subjects who fulfilled the inclusion criteria participated in this cross-sectional, observational study. The inclusion criteria were (1) age between 18 and 50 years, (2) best-corrected visual acuity of 1.0 or better in the study eye, (3) a spherical equivalent > -4 diopters (that is, high myopia was not allowed), (4) a normal visual field as tested with frequency doubling technology (FDT C20-1 screening mode; Carl Zeiss, Jena, Germany), (5) an intraocular pressure of 21 mmHg or below (non-contact tonometry), and (6) no ophthalmic history or first-degree relatives with a history of glaucoma. Mean (standard deviation [SD]) age was 28.5 (5.5) years; 45% were male. If both eyes were eligible, the right eye was chosen; this resulted in 19 right eyes and 1 left eye. The ethics board of the University Medical Center Groningen (UMCG) approved the study protocol. All participants provided written informed consent. The study followed the tenets of the Declaration of Helsinki.
We selected 102 subjects with glaucoma from the database of the Groningen Longitudinal Glaucoma Study [16]. Glaucoma patients were divided into three groups according to the mean deviation (MD) of the latest standard automated perimetry (HFA 30-2 SITA fast; Carl Zeiss Meditec AG, Jena, Germany) recording of their right eye. The three groups were early glaucoma (MD better than -6 dB), moderate glaucoma (MD between -12 and -6 dB), and severe glaucoma (MD worse than -12 dB). Within the database, the study patients have a randomly allotted study number. Randomization was done by selecting, for each group, the first 34 patients whose right eye fulfilled the inclusion criterion of a spherical equivalent > -4 diopters and who had OCT scans taken with the Canon HS-100 device (see below). Of the 102 glaucoma patients, 5 early, 5 moderate, and 9 severe patients were removed due to the presence of other eye diseases that might interfere with the segmentation of inner retinal layers (e.g., macular pucker, cystoid macular oedema), leaving 29 early, 29 moderate, and 25 severe glaucoma patients. Mean (SD) age of the glaucoma patients was 72.8 (8.0), 72.7 (7.0), and 73.6 (9.7) years for early, moderate, and severe glaucoma patients, respectively; 53%, 56%, and 55% were male.
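The MD-based grouping can be written as a small rule. The sketch below is illustrative only: the function name `glaucoma_stage` is hypothetical, and the assignment of values exactly at -6 dB and -12 dB to the moderate group is an assumption, since the text does not state to which group the boundary values belong.

```python
def glaucoma_stage(md_db):
    """Assign a glaucoma stage from the perimetric mean deviation (MD, in dB).

    Cut-offs follow the text: early (MD better than -6 dB), moderate
    (MD between -12 and -6 dB), severe (MD worse than -12 dB).
    Assumption: the boundary values -6 and -12 dB count as moderate.
    """
    if md_db > -6.0:
        return "early"
    if md_db >= -12.0:
        return "moderate"
    return "severe"


# Example: three hypothetical MD values, one per stage.
for md in (-3.2, -9.0, -15.5):
    print(md, glaucoma_stage(md))
```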

Data collection
Data collection in healthy subjects was done by a single operator (TH) using the Canon OCT-HS100 device (software version 4.2.1). Twelve OCT scans per subject were taken, three without a filter for reference, and three scans per different neutral density filter. Filters were used to attenuate the light, which lowers the SNR. Filter strengths used were 0.3, 0.6, and 0.9 optical density (Neutral density NIR filter; Edmund Optics, Barrington, USA), corresponding to 50%, 25%, and 12.5% of the light getting through in a single pass, respectively. Pupils were dilated with 0.5% tropicamide. Volume scans were taken from the macular area using the glaucoma scan mode (128 B-scans, each with 1024 A-scans in a 10×10 mm area centered at the fovea, with a depth resolution of 3 µm). The quality parameter (range: 0-10; the higher the better) provided by the device was also recorded. The OCT scans from the healthy subjects can be downloaded from dataverse.nl for comparison with future segmentation algorithms.
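The relation between optical density and transmitted light quoted above follows from the definition T = 10^(-OD). The sketch below reproduces the quoted percentages; the function names are hypothetical, and the `passes` argument is an assumption added to illustrate that light traversing a filter more than once is attenuated once per pass.

```python
import math


def transmission(optical_density, passes=1):
    """Fraction of light transmitted by a neutral density filter.

    Uses the definition T = 10 ** (-OD). `passes` is an illustrative
    assumption: each additional pass through the filter multiplies
    the attenuation.
    """
    return 10.0 ** (-optical_density * passes)


def attenuation_db(optical_density, passes=1):
    """The same attenuation expressed in decibels (10 dB per unit of OD per pass)."""
    return -10.0 * math.log10(transmission(optical_density, passes))


for od in (0.3, 0.6, 0.9):
    print(f"OD {od}: {100 * transmission(od):.1f}% transmitted in a single pass")
```

The nominal values of 50%, 25%, and 12.5% in the text are rounded; the definition gives slightly different numbers (for example, about 50.1% for 0.3 OD).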

Regions of interest and segmentation algorithms
We analyzed the collected data for two different regions of interest (ROIs) with four different segmentation algorithms. We primarily focused on the RNFL; we also analyzed the GCIPL and total retinal thickness (TRT). The ROIs were (1) a circular area around the fovea with a diameter of 5 mm, excluding the fovea itself (eccentricity between 0.5 and 2.5 mm), named '5 mm ROI', and (2) a circular area with a diameter of 10 mm, connected to the 5 mm ROI (eccentricity between 2.5 and 5 mm), named '10 mm ROI'. These areas were chosen due to them being used by the Canon proprietary software.
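The two ROIs are defined purely by eccentricity, which makes the assignment of a retinal location to an ROI easy to express. The following is a minimal sketch, assuming coordinates in millimetres relative to the fovea; the function name `roi_label` and the choice to assign the shared 2.5 mm boundary to the outer ROI are assumptions for illustration.

```python
import math


def roi_label(x_mm, y_mm):
    """Return the ROI for a location given relative to the fovea (in mm).

    5 mm ROI: eccentricity from 0.5 up to 2.5 mm (fovea excluded).
    10 mm ROI: eccentricity from 2.5 to 5 mm.
    Locations outside both ROIs return None.
    """
    eccentricity = math.hypot(x_mm, y_mm)
    if 0.5 <= eccentricity < 2.5:
        return "5 mm ROI"
    if 2.5 <= eccentricity <= 5.0:
        return "10 mm ROI"
    return None


print(roi_label(0.0, 0.0))  # foveal centre, excluded from both ROIs
print(roi_label(1.0, 1.0))
print(roi_label(3.0, 0.0))
```

Note that the corners of the 10×10 mm scan lie beyond 5 mm eccentricity and therefore fall outside both ROIs.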
The algorithms were (i) the device-specific proprietary software (Canon OCT-HS100, software version 4.2.1); (ii) the Iowa reference algorithm (Retinal Image Analysis Lab, Iowa Institute for Biomedical Imaging, Iowa City, IA, version 3.8.0; [10,17,18]), for which scans were analyzed using the standalone desktop application OCTExplorer, in which we chose the automated 10-layer macular OCT segmentation for the DICOM files exported from the Canon device; (iii) the Duke Optical Coherence Tomography Retinal Analysis Program (DOCTRAP, software version 61.8; [11,19,20]), an automated segmentation algorithm that runs on MATLAB, for which we manually set the adjustable parameters: the eye, a vertical scan orientation, and a pixel resolution of 9.77 µm/pixel horizontally and 1.7 µm/pixel vertically; and (iv) the FWHM of the square of the first peak in the linearly scaled intensity image (details given below; [6]). For IOWA and DOCTRAP, we used the original, logarithmic images.
For algorithms with predefined regions (CANON and IOWA), we selected those regions that covered the above-mentioned ROIs as accurately as possible. CANON provides segmentation results for 8 areas; a ninth area corresponding to the inner 1 mm circle centred at the fovea is not reported by CANON and was therefore discarded from all analyses (Fig. 1). For the 5 mm ROI and the 10 mm ROI, we took, for CANON, the average value from the inner 4 areas and outer 4 areas, respectively (Fig. 1(A)). For the IOWA algorithm, we selected the 10-2 grid, excluding the 4 squares centered at the fovea, as the 5 mm ROI (Fig. 1(C)). The IOWA algorithm does not provide an equivalent of the 10 mm ROI. For algorithms without predefined regions (DOCTRAP and FWHM), we defined the two ROIs as depicted in Fig. 1(B). For each rectangular area of 1×0.5 mm in Fig. 1(B), we calculated the mean thickness from 6 B-scans of 100 A-scans each, and these mean thicknesses were subsequently averaged over the corresponding ROIs.
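The size of the rectangular areas follows from the scan geometry given under Data collection (128 B-scans of 1024 A-scans over 10×10 mm). The sketch below works through this arithmetic and shows one way to average a block of thickness values. The names are hypothetical, and skipping NaN entries when averaging is an assumption, not a documented step of the original pipeline.

```python
import math

SCAN_SIZE_MM = 10.0   # side length of the square macular scan
N_A_SCANS = 1024      # A-scans per B-scan
N_B_SCANS = 128       # B-scans per volume

A_SPACING_UM = SCAN_SIZE_MM * 1000.0 / N_A_SCANS   # spacing between A-scans
B_SPACING_UM = SCAN_SIZE_MM * 1000.0 / N_B_SCANS   # spacing between B-scans


def block_size_mm(n_a_scans=100, n_b_scans=6):
    """Physical size (along B-scan, across B-scans) of one averaging block."""
    return (n_a_scans * A_SPACING_UM / 1000.0,
            n_b_scans * B_SPACING_UM / 1000.0)


def nan_mean(values):
    """Mean of the non-NaN values; NaN if no valid value remains (assumption)."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


print(A_SPACING_UM, B_SPACING_UM)
print(block_size_mm())
```

With these numbers, 100 A-scans span about 0.98 mm and 6 B-scans about 0.47 mm, consistent with the nominal 1×0.5 mm areas, and the A-scan spacing matches the 9.77 µm/pixel horizontal resolution entered in DOCTRAP.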
For FWHM, we developed segmentation software using MATLAB (version R2018a; The MathWorks Inc., Natick, Massachusetts). First, B-scan intensity values were transformed to (squared) spectrometer values assuming that the 0-255 grayscale intensity values correspond to a 50 dB dynamic range [21]. Dynamic range is device-specific and estimated to be between 40 and 60 dB [22,23]. The raw spectrometer values, depicted as the linear-scale intensity I_linear(i, j) at location (i, j), were therefore obtained by inverting this logarithmic mapping of the corresponding original log-scale intensity value I_log(i, j). For the linear-scale images, a median filter of 5×10 pixels (5 along each A-scan; 10 successive A-scans) was applied twice (first on the original image and then on the resulting image) to remove speckle noise from individual B-scans, so that individual A-scans could be aligned accurately. A-scans were aligned in the z-direction based on an intensity threshold. The intensity threshold for detecting retinal structures was set to 20 times the mean intensity of the noise level (mean intensity in the first 50 pixels in each individual A-scan). Peaks or high reflective areas in an A-scan are usually in the RNFL and in the RPE. When two highly reflective areas, separated by a low reflective area defined as a minimum of 20 pixels (34 µm) in length below the set intensity threshold, were identified, the set intensity threshold was used to find the edge of the inner limiting membrane. If fewer than two peaks were found, the intensity threshold was lowered fourfold (to 5 times the mean intensity of the noise level) in order to detect peaks in case of a weak signal due to ND filters or cataract. If only one peak, assumed to be the RPE, or no peaks were found with either of the intensity threshold levels, the A-scan was assigned as NaN (typically in case of, for example, a floater or a blink during image acquisition). A total of 100 consecutive linear-scale A-scans were averaged to construct an RNFL-aligned B-scan. If all 100 consecutive A-scans were NaN, the B-scan was considered missing. Table 1 presents the average percentage of missing locations per B-scan in the corresponding dataset. RNFL thickness was calculated, per B-scan, from the average of the 100 linear-scale A-scans as the FWHM above the noise floor (see Fig. 2). In order to avoid a too narrow thickness estimate from a noisy averaged A-scan, gaps of no more than 5 pixels (8.5 µm) were allowed between two consecutive peaks. When the distance was over 5 pixels, only the first peak was considered.
Table 1. Characteristics of the study population (mean with SD between brackets unless stated otherwise). Columns: healthy subjects (n = 20) and glaucoma patients.
In order to account for nonexistent RNFL in the different glaucoma conditions, RNFL thickness was defined to be zero if the maximum intensity of the RNFL peak was less than one-tenth of the maximum intensity of the RPE (see Table 1). TRT was estimated as the distance between the ILM border, used in FWHM estimation, and the half maximum of the RPE peak at the RPE/choroid border (TRT is marked with red circles in Figs. 2 and 3). If no RNFL peak was found, TRT was measured from the detected layer that fulfilled the intensity threshold, most likely the GCIPL. RNFL thicknesses from six vertically oriented B-scans were averaged in order to calculate the average RNFL thickness in a 1×0.5 mm rectangular area. These rectangular areas were attributed to the two ROIs described above (Fig. 1(B)).
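The main steps of the FWHM estimate can be illustrated on a single averaged A-scan. The following is a simplified sketch and not the MATLAB implementation used in this study. The mapping in `log_to_linear` is one reading of the 50 dB assumption (gray value 255 corresponding to 50 dB above gray value 0, with 10 dB per decade because the values represent squared spectrometer output); all function names are hypothetical; and the input profile is assumed to be already cropped to the inner retina, so that alignment and RPE detection are left out.

```python
AXIAL_UM_PER_PIXEL = 1.7  # vertical pixel size quoted in the text


def log_to_linear(gray, dynamic_range_db=50.0):
    """Map a 0-255 log-scale gray value to a linear intensity (assumed mapping)."""
    return 10.0 ** (gray / 255.0 * dynamic_range_db / 10.0)


def detection_threshold(a_scan, factor=20.0, n_noise=50):
    """Threshold as a multiple of the mean of the first n_noise (noise) pixels."""
    noise = a_scan[:n_noise]
    return factor * sum(noise) / len(noise)


def runs_above(profile, level):
    """Return (start, end) index pairs, end exclusive, of samples above level."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > level and start is None:
            start = i
        elif value <= level and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def fwhm_thickness_um(profile, noise_floor, rpe_peak, max_gap=5):
    """Width at half maximum (above the noise floor) of the first peak, in µm.

    Returns 0 if the peak is weaker than one-tenth of the RPE peak.
    Runs separated by at most max_gap pixels are merged; later runs are ignored.
    """
    peak = max(profile)
    if peak < 0.1 * rpe_peak:
        return 0.0
    half_level = noise_floor + 0.5 * (peak - noise_floor)
    runs = runs_above(profile, half_level)
    if not runs:
        return 0.0
    start, end = runs[0]
    for next_start, next_end in runs[1:]:
        if next_start - end <= max_gap:
            end = next_end
        else:
            break
    return (end - start) * AXIAL_UM_PER_PIXEL


# Toy profile: two nearby peaks (merged) and a distant one (ignored).
profile = [1, 1, 10, 10, 10, 1, 1, 9, 9, 1, 1, 1, 1, 1, 1, 1, 8, 1]
print(fwhm_thickness_um(profile, noise_floor=1.0, rpe_peak=20.0))
```

In the toy profile the first two runs are separated by 2 pixels and are merged into a width of 7 pixels, whereas the third run lies 7 pixels further and is ignored.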

Data analysis
Mean RNFL thickness, GCIPL thickness, and TRT were calculated for both ROIs; the 10 mm ROI could not be analysed with the IOWA algorithm; the GCIPL thickness was only available for the Canon proprietary software and IOWA. Three images were taken per neutral density filter condition; all images were analysed with all algorithms. The mean thickness (calculated per filter condition and per algorithm) of the three images was used for the between-algorithm and between-filter condition analysis.
Data were presented as box plots showing, for each algorithm and ROI, the RNFL thickness, GCIPL thickness, and TRT as a function of optical density for healthy subjects and as a function of disease stage for glaucoma cases. We studied, for each algorithm separately, the influence of SNR with two-way ANOVA and the influence of disease presence and stage with one-way ANOVA. For post-hoc analysis, we used Bonferroni-corrected t-tests. Repeatability within each algorithm (using the three separate measurements) and agreement between algorithms were studied using the intraclass correlation coefficient (ICC) for agreement. Because we expected a systematic difference between FWHM and the other algorithms (see Introduction section), we also used Pearson's correlation coefficient to compare the thickness measurements between FWHM and the other algorithms, and we compared the thickness differences between FWHM and the other algorithms in the 0.0 OD condition using two-way ANOVA. The analysis was performed using R (version 3.6.0; R Foundation for Statistical Computing, Vienna, Austria). A P value of 0.05 or less was considered statistically significant.

Results
Figure 2 shows how the different algorithms (CANON, DOCTRAP, IOWA, and FWHM) segment the RNFL and the TRT, as a function of SNR (filter strength). All images are from the same subject and the same retinal location. Figure 3 illustrates the corresponding results as a function of glaucoma stage (early, moderate, and severe; from three different subjects). Figures 4 and 5 show boxplots with RNFL thickness for the different algorithms as a function of filter strength (upper row) and disease stage (lower row), for the 5 mm (Fig. 4) and 10 mm (Fig. 5; not available for IOWA) ROI, respectively. Clearly, the RNFL is thinner in the 5 mm ROI than in the 10 mm ROI. FWHM yielded a thinner RNFL than the other algorithms in the no-filter condition. The effect of glaucoma on the RNFL thickness was more pronounced for the 10 mm ROI than for the 5 mm ROI.
Table 2 presents the corresponding Bonferroni-corrected P values from the ANOVA analyses. All algorithms were, to a greater or lesser extent, subject to layer thinning due to noise caused by the ND filters. For the 5 mm ROI, CANON was most robust regarding the applied filters but also failed to recognize glaucoma. Figure 6 shows boxplots with GCIPL thickness for the two algorithms that provided this metric (CANON and IOWA) as a function of filter strength (upper row) and disease stage (lower row), for the 5 mm (both algorithms) and 10 mm (not available for IOWA) ROI. Obviously, the GCIPL is thicker in the 5 mm ROI than in the 10 mm ROI. Table 2 (middle part) presents the corresponding Bonferroni-corrected P values from the ANOVA analyses. CANON but not IOWA was robust regarding the applied filters; both algorithms easily detected the presence of glaucoma. Figure 7 shows boxplots with TRT for the different algorithms as a function of filter strength (upper row) and disease stage (lower row), for the 5 mm (DOCTRAP, IOWA, and FWHM) and 10 mm ROI (DOCTRAP and FWHM). TRT is thicker in the 5 mm ROI than in the 10 mm ROI. Table 2 (lower part) presents the corresponding Bonferroni-corrected P values from the ANOVA analyses. TRT measurements were influenced by the applied filters for all algorithms. IOWA and FWHM appeared to be robust against noise up to the 0.3 OD filter; DOCTRAP up to the 0.6 OD filter. However, there was a slightly thicker TRT for the 0.3 OD filter for DOCTRAP, for both ROIs.
The last three columns of Table 2 show the suitability of the algorithms, layers, and ROIs for discriminating between the successive glaucoma stages. Regarding the layers, GCIPL and TRT were more suitable for discriminating between early and moderate glaucoma than RNFL; for RNFL, the 10 mm ROI was better than the 5 mm ROI. All current algorithms failed to discriminate between moderate and severe glaucoma, for all employed layers and ROIs, except for CANON assessing the GCIPL at the 10 mm ROI. For CANON, the RNFL thickness also seemed significantly different between moderate and severe glaucoma at the 5 mm ROI, but the RNFL thickness appeared to increase rather than decrease with disease severity, suggesting a chance finding. Moreover, CANON failed for this combination of layer and ROI to discriminate between healthy and glaucoma. Table 3 depicts the intraclass correlation coefficient (ICC) agreement for the RNFL, GCIPL, and TRT assessments, both between different segmentation algorithms and within each algorithm, the latter based on the three measurements performed for each filter strength. According to the interpretation guidelines by Cicchetti [24], TRT showed excellent agreement for all tested algorithms (ICC > 0.75). All four algorithms showed good (ICC > 0.60) repeatability for the RNFL thickness in the 5 mm ROI and excellent repeatability for the 10 mm ROI for essentially all filter strengths. For the GCIPL thickness, CANON showed excellent repeatability for all filter strengths whereas IOWA showed excellent repeatability only for 0.0 and 0.3 OD. Regarding the between-algorithm comparisons, the agreement was excellent for TRT. For RNFL and GCIPL, the agreement between algorithms was poor (ICC < 0.4), with the exception of a good agreement between CANON and DOCTRAP for the 10 mm ROI RNFL and an excellent agreement between CANON and IOWA for the 5 mm ROI GCIPL (up to 0.3 OD).
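The difference between agreement and correlation can be illustrated numerically. The sketch below computes a two-way, single-measure, absolute-agreement ICC together with Pearson's correlation for two hypothetical sets of thickness values that differ by a constant offset. The ICC variant is an assumption (the text specifies only "ICC agreement"), the data are invented for illustration, and the category boundaries in `cicchetti_label` follow the commonly cited Cicchetti cut-offs.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_agreement(rows):
    """Two-way, single-measure, absolute-agreement ICC.

    rows: one list per subject, one column per algorithm (or repeat).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def cicchetti_label(icc):
    """Qualitative label: poor < 0.40, fair < 0.60, good < 0.75, else excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical thicknesses (µm): method B reads a constant 12 µm thinner than A.
method_a = [30.0, 34.0, 38.0, 41.0, 45.0]
method_b = [v - 12.0 for v in method_a]
icc = icc_agreement([[a, b] for a, b in zip(method_a, method_b)])
print(pearson(method_a, method_b), icc, cicchetti_label(icc))
```

In this invented example the two methods are perfectly correlated, yet the constant offset alone brings the agreement ICC down to roughly 0.32, which would be labelled poor.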
Because there is a systematic difference in RNFL thickness between FWHM and the other algorithms (see above), compromising the ICC agreement, we also calculated Pearson's correlation coefficients for RNFL thickness between FWHM and CANON, IOWA, and DOCTRAP in the 5 mm RNFL ROI, yielding 0.58, 0.44, and

Discussion
Optical coherence tomography segmentation results of retinal layers are influenced by artificial noise: most layer thicknesses decrease with increasing neutral density filter strength. For RNFL, CANON is most robust against this degrading effect, followed by DOCTRAP and FWHM, and then IOWA. FWHM gives a thinner RNFL than the other algorithms. For GCIPL, CANON but not IOWA is robust against the effect of the applied filters. For TRT, DOCTRAP, FWHM, and IOWA show a similar robustness. GCIPL and TRT are more suitable for discriminating between healthy and glaucoma, and between early and moderate glaucoma, than RNFL; for RNFL, the 10 mm ROI is better than the 5 mm ROI. The current algorithms mostly fail to discriminate between moderate and severe glaucoma, for all employed layers and ROIs.

Comparison with literature
Darma et al. showed, using the IOWA algorithm, that an optical density of 0.26 caused by an artificial filter produces a clinically meaningful decrease of 1 SD in the GCC thickness [5]. We found a similar layer thinning at 0.3 OD for the IOWA algorithm for both RNFL and GCIPL at the 5 mm ROI used by IOWA. Other algorithms were robust against the influence of noise at 0.3 OD, and showed a smaller decrease in layer thickness for higher filter strengths. Darma et al. also reported significant differences in TRT measurements, although these differences were within one SD from the baseline measurements even at 0.67 OD, and were thus deemed not to be clinically meaningful [5]. In agreement with this, in our data different filter strengths had only a small influence on the TRT in all algorithms and ROIs assessed. Differences were significant, but the IQRs amply overlapped across the different filter strengths (Fig. 7). Wolf-Schnurrbusch et al. compared six different OCT devices in their ability to measure central retinal thickness, and concluded that different OCT devices cannot be used interchangeably for the measurement of macular thickness, and that the differences between devices are due to differences in the segmentation software [25]. A more recent study by Cho and Hwang showed, in four different OCT devices, that the macular retinal thickness in 32 healthy eyes differed by 0.7 to 27.3 µm, and only results between Spectralis OCT and RS-3000 Advance OCT showed good agreement [26]. Testing between devices is usually performed using the device-specific software, and therefore differences between devices can be attributed either to differences in the hardware or to differences in the built-in segmentation software. In our previous study, we showed that eleven different OCT devices produce comparable results when the thickness is analyzed in an identical way, suggesting that differences due to hardware can possibly be eliminated [21].
Here we compared measurements from a single device using four different algorithms, and showed variable agreement between different algorithms in RNFL and GCIPL measurements, in both ROIs, which worsened even more in noisy conditions. Different algorithms, however, showed good agreement when assessing TRT, even in noisy conditions, possibly due to similarly defined ILM and choroid borders.
Broadly speaking, our data suggest that discriminating between moderate and severe glaucoma with OCT is difficult, indirectly indicating its limited value for progression detection in these stages. This is often attributed to a floor effect. Earlier, peripapillary RNFL (pRNFL) measurements have been shown to reach a measurement floor at a standard automated perimetry mean deviation somewhere between -10 and -15 dB [12,27,28]. It has been suggested, however, that macular measurements may be superior to peripapillary measurements for detecting change between moderate and severe glaucoma [29]. Our results suggest that detecting progression from RNFL, GCIPL, or TRT in either ROI with any of the algorithms is challenging. Our finding that the mean GCIPL thickness does not differ significantly between the late stages of glaucoma is in agreement with findings by Sung et al. [13]. However, some previous studies have shown that the GCIPL between the fovea and the optic disc, being an area that is spared relatively long in glaucoma, could provide valuable information on late disease progression [13][14][15]. We explored this in our data by looking at the GCIPL thickness in areas 2 and 4 in Fig. 1(A). We found no clear differences between moderate and severe glaucoma for either IOWA (P=1.0) or CANON (P=0.05).

Limitations and strengths
A limitation of our study is that we were unable to measure exactly the same areas with all of the algorithms. Similarly, some algorithms were limited in the number of layers they could segment. While the Canon has an imaging area of 10×10 mm, the imaging areas of other OCT devices may be limited to 6×6 mm. The measurement area in the Canon device is divided into eight regions (Fig. 1(A)), of which four are in the inner 5 mm ROI and the rest in the outer 10 mm ROI. IOWA analyzes preset areas within a 6 mm circle. We chose their 10-2 grid as it matches the 5 mm ROI as accurately as possible (Fig. 1(C)). DOCTRAP does not have segmentation for the GCL or IPL layers. For DOCTRAP we used the same grid as for the FWHM algorithm (Fig. 1(B)). Large ROIs were chosen as they have higher test-retest accuracy [30]. Although our primary outcome was the macular RNFL thickness, under- and overestimation biases of the RNFL thickness undoubtedly lead to an observed change in the underlying layers, mainly the GCL [6]. A strength of our study is that we compared four different algorithms under a wide range of SNR conditions in the same group of healthy subjects, and that we applied the same algorithms and methodology to subjects with different glaucoma stages. This contrasts with most earlier studies on the effect of SNR on OCT layer thickness assessment, which used only the device's proprietary algorithm or a single third-party algorithm. We chose ND filters over computational SNR degradation. ND filters offer a physical proof of signal strength degradation and, importantly, they can be used in case of proprietary software inside the device, where access to the raw images (needed for computational SNR degradation) is not possible. As all scans were acquired using the Canon device, the algorithm comparisons could be biased towards the CANON algorithm.
In this regard it is worthwhile to mention that software differences seem to matter more than hardware differences in current commercial devices [21].
Importantly, the healthy subjects and glaucoma patients in our study differed regarding age. As such, any differences found between these two groups could be due to either glaucoma or ageing. The reason for the age difference is that we wanted to have clear optics, and limited variability therein, in our primary experiment, the effect of filter strength on OCT layer thickness assessment. This implies young healthy subjects. Despite the large database of the Groningen Longitudinal Glaucoma Study, it was not possible to find a reasonable number of age-similar glaucoma patients. Within the glaucoma patients, however, the stage subgroups were age-similar and differences in layer thicknesses between these groups could thus be attributed to glaucoma.

Implications
In this study we investigated whether low SNR images could be reliably segmented with the algorithms in question. As the answer is, in general, no, criteria for a good image become pivotal. The Canon OCT provides an image quality index. The recommended value of this index is 7 or higher (scale: 0-10). However, the technician should not just observe the index, but should also pay attention to blink and motion. Interestingly, the value of 7 was only reached (on average) for the no filter condition in the healthy subjects (Table 1). As such, the cut-off point seems well-chosen. In glaucoma, however, the mean index value was below 7 (Table 1), indicating that clinicians often have to deal with scans that are formally suboptimal.
What are the clinical implications of our findings? First, in the past, apparent thinning of retinal layers has been observed in patients with cataract, where the RNFL thickness increased after cataract surgery [4,31,32]. Such a thickness increase could be due to improvement of the optical properties of the eye or to a real thickness change. Our study confirms the strong association between image quality and layer thicknesses as observed with OCT. Therefore, it is crucial to emphasize in a clinical setting that the best possible image quality has to be achieved during image acquisition. The acquisition time with OCT is only a few seconds, and averaging more B-scans or taking an entirely new scan does not take much of the technician's or patient's time, but could greatly improve the accuracy of OCT interpretation. However, averaging is not possible in all OCT devices, and in devices where averaging is possible, excessive averaging increases the acquisition time, which in turn will result in more blink artifacts, motion artifacts, and discomfort for the patient. Recently, post-processing techniques that use deep learning for denoising the images have been proposed to achieve a better image quality for segmentation [33].
Albeit promising, such post-processing methods are not yet a feasible option in a clinical setting; for this, the entire process has to be implemented in the device itself. For the time being, the most important message is that technicians should be instructed to check the acquired image and to repeat the test in case of a low SNR image (to see if the image quality can be improved by better scan alignment) and in case of blink or patient movement. Second, retinal layers have been reported to become thinner with age [34,35]. Apart from a real thinning, a degradation of the optical quality of the eye with age may have its impact as well. A thorough understanding of the effect of image quality on layer thickness assessment is pivotal for the interpretation of longitudinal data, both in healthy subjects (for understanding normal ageing) and in glaucoma patients (for an unbiased progression detection).
With current knowledge, macular RNFL (mRNFL) thickness assessment is clearly sensitive to noise, and only at the 10 mm ROI does RNFL thickness seem to be a meaningful biomarker of disease. Hence, for studying changes in the inner retina with a macular OCT scan, the GCIPL seems to be preferable over mRNFL, being robust against noise and offering good disease detection and separation. This is in agreement with findings by Khawaja et al. from a large cohort of UK Biobank participants, where they concluded that for detecting glaucomatous and macular pathophysiologic processes, the GCIPL, over mRNFL and GCC, may be the superior inner retinal biomarker [35]. The GCC is usually used to study the usability of the macular OCT scan in glaucoma detection, as it has been shown to have the same diagnostic performance in detecting early, moderate and severe glaucoma as the pRNFL [36,37]. Kim et al. found that the GCC and RGCL thickness in the 3 to 6 mm diameter macular ROI were the best macular parameters for discriminating between early glaucoma and healthy. The only macular parameter that was not significantly different between glaucoma and healthy was the RNFL in their smallest ROI (1 to 2 mm diameter) [38]. In a population-based study by Springelkamp et al., the inferior part of the GCL offered the highest sensitivity at 97.5% specificity and area under the ROC-curve, compared to other macular or peripapillary parameters [39].
Total macular thickness has been suggested as a potential alternative to assess structural changes in patients with glaucoma [40]. However, earlier studies suggested that macular thickness does not outperform pRNFL in glaucoma detection [9,41,42]. Our findings suggest that TRT is both sensitive to disease induced changes and robust against noise, apparently outperforming mRNFL and GCIPL, and could therefore be a viable measure of glaucomatous damage. Hence, the role of TRT in glaucomatous damage and change detection needs further research.
What are the implications of our findings regarding our knowledge of the true RNFL thickness? In addition to layer thinning due to noise, retinal layers may be overestimated. Layer thickness overestimation has been shown in phantom eyes where a 49 µm layer thickness can be overestimated by 13-23 µm depending on the OCT device and the built-in algorithm used [43]. We confirmed this overestimation by performing phantom eye measurements similar to those of de Kinkelder et al. and developed a theoretical framework for this overestimation [6]. The FWHM approach was the spin-off of this work, and should be able to provide an unbiased RNFL thickness estimate, insensitive to noise. Clearly, FWHM yielded in the current study a thinner RNFL, indicating that indeed other algorithms overestimate this layer. FWHM thickness measurements provided on average an 11.9 µm thinner RNFL thickness in the 5 mm ROI and a 19.7 µm thinner thickness in the 10 mm ROI compared to the other algorithms. Whether this thickness difference can be tied to an actual overestimation remains an open question, since validating measurements in histology is extremely challenging. The phantom eye measurements, however, clearly support the hypothesis that the real RNFL is thinner than commonly reported [21,43]. As a practical method for clinical studies, FWHM was more robust against noise than IOWA, but did not surpass the other two algorithms.

Conclusion
In conclusion, we showed that the four algorithms used in this study are all susceptible to noise at a varying degree, and the results between different algorithms are not interchangeable. The algorithms differ in their ability to differentiate between young healthy eyes and older glaucoma eyes, and they all fail to accurately separate different glaucoma stages from each other based solely on the RNFL thickness in the macular area. From the RNFL, GCIPL, and TRT measurements, the TRT was the most robust against noise, had the highest agreement between different algorithms, and was the best metric in separating early and moderate and early and severe glaucoma from each other. A cross-sectional screening study with age-similar healthy subjects and glaucoma patients and a longitudinal study in a large glaucoma cohort exploring the value of this parameter would be the logical next steps. None of the algorithms compared were able to separate moderate and severe glaucoma from each other, leaving a clear challenge in ocular imaging.

Disclosures
The authors declare no conflicts of interest.