Testing a phantom eye under various signal-to-noise ratio conditions using eleven different OCT devices.

We compared eleven OCT devices in their ability to quantify retinal layer thicknesses under different signal-strength conditions, using a commercially available phantom eye. We analyzed a medium-intensity 50 µm layer in an identical manner for all devices, using the provided log-scale images and a reconstructed linear-scale tissue reflectivity metric. Thickness measurements were highly comparable when the data were analyzed in an identical manner. With optimal signal strength, the thickness of the 50 µm layer was overestimated by a mean of 4.3 µm in the log-scale images and of 2.7 µm in the linear-scale images.


Introduction
Optical coherence tomography (OCT) has had a major impact on clinical practice in ophthalmology. In retinal disease, a qualitative description of the images is common, sometimes extended with a measure of the overall thickness of the retina to quantify edema; in glaucoma and other diseases affecting the optic nerve, a quantitative assessment of individual retinal layers, especially the retinal nerve fiber layer (RNFL), retinal ganglion cell layer (RGCL), and inner plexiform layer (IPL), is the primary approach. Unfortunately, current commercial OCT devices show poor inter-brand reproducibility in the assessment of the retinal layers, which limits the interchangeable use of different devices. Differences between devices have been observed both in human eyes [1][2][3][4][5][6][7] and in specially developed phantom eyes [8][9][10][11].
Discrepancies in observed layer thicknesses can be the result of differences in hardware, software, image quality, field of view (FOV), or the size of the region of interest (ROI) [9]. Moreover, if devices differ, it is not possible to know which one is most accurate. Proprietary issues play a role here, and validating OCT measurements against histology is extremely challenging. The only way to standardize measurements is to use some kind of phantom eye with known anatomy. While the use of phantoms is quite common in other medical imaging modalities like CT and MRI [12], their use in ophthalmic imaging is sparse and not standardized [13]. A decrease in image quality can be caused by changes in the optical media due to, for example, dry eye syndrome, corneal opacities, cataract, or myopia [14,15]. A decrease in image quality may cause the measured layers to seem thinner than they actually are [16][17][18] and may thus hamper disease detection and accurate follow-up. One way to artificially decrease the signal-to-noise ratio (SNR) is to prevent a portion of the light from reaching the imaging sensor by using neutral density filters.
The aim of this study was to compare 11 commercially available OCT acquisition devices in their ability to quantify retinal layer thicknesses under different signal-strength conditions. For this purpose, we assessed a layer with known thickness in a commercially available phantom eye. OCT scans were acquired using each device's default settings with optimal and artificially-degraded image quality using neutral density filters. A standardized, custom approach to analyze layer thicknesses was used to remove the effect of differences in segmentation software. As a result, all images were analyzed in an identical manner.

Model eye
The whole model eye, as provided by Rowe Technical Design, Inc. (Dana Point, CA, USA), is fitted with the following tissue phantoms: an aspheric cornea with a central corneal thickness of 427 µm; a crystalline lens, the shape and thickness of which mimic those of a 50-year-old male (4.0 mm central thickness); and a retina phantom composed of 6 layers, including a retinal pigment epithelium layer and a choroid. Total retinal thickness is approximately 300 µm. The thickness varies [see Fig. 1(C)] because, during the embossment process, the top four layers are crushed to make a foveal shape. However, this does not influence the layer of interest, the fifth layer. Total axial length of the model eye is approximately 25.7 mm. The model eye is filled with a polymer fluid that mimics the refractive index (RI) of aqueous and vitreous humor (1.3315 ± 0.0005 at room temperature) and is sealed without bubbles. There is a 5 mm pupil and all phantoms are axisymmetric in the model eye. The thickness of the layer used for this study [marked in Fig. 1(C)] inside the model eye is 50 ± 1 µm and the mean RI of the various retinal layers is 1.564 ± 0.001, as reported by the manufacturer. The same model eye was used in all measurements.
Table 1 lists the 11 spectral domain OCT (SD-OCT) devices used and highlights their technical aspects and the differences in the imaging protocols used. Two identical Heidelberg Engineering Spectralis OCT-1 devices (Heidelberg, Germany) were used at two different locations (City, University of London, London, UK and Queen Elizabeth Hospital Birmingham, Birmingham, UK) to investigate differences between two devices of the same model. The imaging technology of the Heidelberg Engineering Flex is the same as that used in the Spectralis OCT-2. However, the latter is a device with a conventional headrest, while the former is mounted on an adjustable arm.

Image acquisition
Acquisitions were obtained using each device's default settings, with the model eye fixed to the headrest [Fig. 1(A)]. Figures 1(B)-1(D) show a fundus image of the model eye and B-scans from two different devices, respectively. Scans with and without attenuated signals were obtained from each device. Three neutral density filters (NIR filter; Edmund Optics, Barrington, USA) with optical densities (OD) of 0.3, 0.6, and 0.9 were used to attenuate the light in order to measure thicknesses under six different signal-to-noise ratio (SNR) conditions. Filters were used individually or stacked in combination to reduce the intensity to 1/2, 1/4, 1/8, 1/16, 1/32, and 1/64 of the original intensity in a single pass (0.3, 0.6, 0.9, 1.2, 1.5, and 1.8 OD, respectively). They were placed between the OCT device and the model eye and held against the OCT device to ensure stability. When a filter was used, the OCT device had to be moved closer to the model eye (which was left in place between the measurements) because of the changed optical path length.
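The relation between the filter combinations and the quoted intensity fractions can be sketched as follows. This is a minimal illustration, assuming the standard definition of optical density (single-pass transmission T = 10^(-OD)) and that the optical densities of stacked filters add; the function names are hypothetical and not part of the study's software.

```python
from itertools import combinations

# Optical densities of the three neutral density filters named in the text.
FILTER_ODS = (0.3, 0.6, 0.9)


def transmission(od: float) -> float:
    """Fraction of light transmitted in a single pass: T = 10**(-OD)."""
    return 10.0 ** (-od)


def stacked_settings(ods=FILTER_ODS):
    """Map each achievable total OD to the first filter combination found.

    Assumes the optical densities of stacked filters simply add.
    Combinations with fewer filters are tried first, so 0.9 OD is
    attributed to the single 0.9 filter rather than to 0.3 + 0.6.
    """
    settings = {}
    for n in range(1, len(ods) + 1):
        for combo in combinations(ods, n):
            total = round(sum(combo), 1)
            settings.setdefault(total, combo)
    return dict(sorted(settings.items()))


for total_od, combo in stacked_settings().items():
    print(f"OD {total_od:.1f} from {combo}: T = {transmission(total_od):.4f}")
```

Under these assumptions the three filters yield exactly the six settings listed above, each step of 0.3 OD roughly halving the intensity, and only the 1.8 OD setting requires all three filters at once.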
Applying several filters meant the OCT device had to be moved even closer. Some devices were unable to accommodate more than one or two filters due to a lack of space between the model eye and the OCT device. For each neutral density filter setting, either a single B-scan or a full set of B-scans (cube scan; see Data analysis subsection) was captured. Autofocus was used in all devices that offered it. The focus was switched to manual when the autofocus failed, which happened often when filters were in place.

Data analysis
For each B-scan, we calculated the mean of 10 adjacent A-scans to improve the SNR. When a 3D scan was acquired, this was repeated for 10 adjacent B-scans, yielding 10 mean A-scans. These 10 mean A-scans were analyzed separately, after which the resulting thicknesses were averaged. For the Spectralis devices, only a single B-scan was acquired (with ART level 9, meaning that the single B-scan is an average of 9 recordings), yielding a single mean A-scan. We also analyzed the data in an alternative way. The raw output of an OCT device (the square of the modulus of the Fourier transform) is proportional to the tissue reflectivity. Before exporting, this output is transformed logarithmically to generate the images normally provided by the OCT devices, in which relevant structures can be distinguished by the human eye; these are referred to as log-scale images in this article. Images may also be compressed to fit the commonly used 8-bit depth of monitors. Unbiased thickness estimates, however, require the untransformed data, referred to as linear-scale images in this article [19].
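The averaging scheme described above can be sketched as follows. This is an illustration only: the data layout (a volume as a list of B-scans, each a list of A-scans, each a list of depth samples), the function names, and the `measure_thickness` callback are assumptions made for the example, not the study's actual code.

```python
from statistics import fmean


def mean_a_scan(b_scan, start, n=10):
    """Average n adjacent A-scans of one B-scan, depth sample by depth sample.

    b_scan is a list of A-scans; each A-scan is a list of intensities
    along depth (hypothetical layout chosen for this sketch).
    """
    columns = b_scan[start:start + n]
    if len(columns) < n:
        raise ValueError("not enough adjacent A-scans")
    return [fmean(samples) for samples in zip(*columns)]


def mean_a_scans_from_volume(volume, b_start, a_start, n_b=10, n_a=10):
    """Return one mean A-scan for each of n_b adjacent B-scans."""
    return [mean_a_scan(b, a_start, n_a) for b in volume[b_start:b_start + n_b]]


def mean_thickness(volume, b_start, a_start, measure_thickness):
    """Analyze each mean A-scan separately, then average the thicknesses.

    measure_thickness is a placeholder for whatever routine turns one
    depth profile into a layer thickness.
    """
    profiles = mean_a_scans_from_volume(volume, b_start, a_start)
    return fmean(measure_thickness(p) for p in profiles)
```

For a device that delivers a single averaged B-scan, `mean_a_scan` alone would be applied once, giving a single depth profile.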
We were able to export the linear-scale data from the Heidelberg Engineering devices by using the Heyex software (Heidelberg Engineering). The other OCT devices provided only the default, log-scale images. As the details of the transformation may differ amongst OCT devices and are not provided by the manufacturers, a logarithmic transformation with a 50 dB dynamic range was assumed (see Discussion section) [20,21]. For an 8-bit image, 50 dB (five decades) spread over 255 grey levels corresponds to one decade per 51 grey levels, which results in an image intensity transformation given by:
$$I_{\mathrm{linear}}(i,j) = 10^{\,I_{\mathrm{log}}(i,j)/51} \qquad (1)$$
where $I_{\mathrm{linear}}(i,j)$ is the linear-scale intensity at location $(i,j)$ and $I_{\mathrm{log}}(i,j)$ the log-scale intensity (the grey intensity values of the image; 0-255 for an 8-bit image). For Optovue, which gives 12-bit images, another intensity transformation was used, based on optical bench measurements by Zhang et al. [22]:
$$I_{\mathrm{linear}}(i,j) = 10^{\,I_{\mathrm{log}}(i,j)/800} \qquad (2)$$
Equations (1) and (2) are essentially identical if they are applied to 8-bit and 12-bit images, respectively ($2^{12}/800 \approx 5$).
OCT devices use the RI of human retinal tissue to convert the measured optical thickness into the desired physical thickness of the layers. An RI of 1.38 is assumed for the human eye [20]. The RI of the measured layer in the phantom eye is 1.564, as reported by the manufacturer. In order to get the true physical thickness, we need to account for the difference in the RI [23]. The true physical thickness is therefore 1.38/1.564 = 0.88 times the reported thickness. This ratio was used to scale the measurements.
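The two conversions described in this subsection can be sketched as follows. The sketch assumes that the full 8-bit grey scale (0-255) spans exactly the assumed 50 dB dynamic range; the actual device transformations are not disclosed, and the function names are hypothetical.

```python
GREY_MAX = 255            # largest grey value of an 8-bit image
DYNAMIC_RANGE_DB = 50.0   # dynamic range assumed in the text


def log_to_linear(grey, dynamic_range_db=DYNAMIC_RANGE_DB, grey_max=GREY_MAX):
    """Convert a log-scale grey value to a relative linear-scale intensity.

    Assumes the grey scale maps linearly onto the dynamic range in dB,
    so grey_max corresponds to dynamic_range_db / 10 decades.
    The result is relative: the absolute scale factor is unknown.
    """
    decades = dynamic_range_db / 10.0
    return 10.0 ** (decades * grey / grey_max)


def phantom_thickness(reported_um, n_assumed=1.38, n_phantom=1.564):
    """Rescale a device-reported thickness to the phantom's refractive index.

    The device divides the optical thickness by n_assumed; the phantom
    layer actually has index n_phantom, hence the factor n_assumed / n_phantom.
    """
    return reported_um * n_assumed / n_phantom


print(log_to_linear(51))        # one decade above grey value 0
print(phantom_thickness(60.0))  # a reported 60 µm, rescaled
```

With a 50 dB range, every 51 grey levels correspond to a factor of ten in linear intensity, which is why only the ratio of dynamic range to bit depth matters for the conversion.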

Statistical analysis
The effect of filter strength and analysis method (image based versus reflectivity based) was studied using repeated measures analysis of variance (ANOVA). Because filters could not be used in all devices, we also compared the analysis methods separately for the thickness measurements without a filter, using a paired t-test. All analyses were performed using R (version 3.2.3; R Foundation for Statistical Computing, Vienna, Austria). A p-value of 0.05 or less was considered statistically significant.

Results
Figure 3 shows the effect of signal attenuation on the OCT image. Topcon Maestro, being fully automated, failed to take any images of the model eye when filters were used, since it did not recognize the pupil and failed to align the device. Topcon 1000 and 2000 did not show anything in the scans beyond the 0.9 OD filter setting. Only the Canon device could be used to measure the phantom eye with the 1.8 OD setting, because the other OCT devices lacked the physical space between the device and the phantom eye (1.8 OD requires the use of all 3 filters simultaneously). Thickness measurement with the Canon and Nidek devices became impossible beyond 1.2 OD due to noise.
Fig. 3. B-scans from the phantom eye measured using different OCT devices and different optical densities (OD). All scans were cropped to show the same area of interest. Normal image size for each device is described in Table 1.
Table 2 shows the thickness measurement results for the 50 ± 1 µm thick layer. The measured thicknesses based on the log-scale image intensity ($I_{\mathrm{log}}$) values, for images without a filter, ranged between 52.4 and 58.1 µm, with a mean ± standard deviation of 54.3 ± 1.7 µm across all 11 devices. All values were above the thickness declared by the manufacturer. The corresponding thicknesses based on the linear-scale image intensity ($I_{\mathrm{linear}}$) values ranged from 50.3 to 54.6 µm, with a mean of 52.7 ± 1.8 µm across all devices.
Without a filter, the linear-scale values were significantly lower than the log-scale values (paired t-test; P = 0.012, 95% CI [0.43, 2.77]). No obvious monotonic thinning or thickening of the measured layer with the use of filters was noticeable for any of the devices. ANOVA (for filter strengths 0-0.9 and without Topcon Maestro, to avoid missing data) showed a statistically significant effect of filter strength (P = 0.04), but the corresponding post-hoc comparisons were not significant when Bonferroni-corrected paired t-tests were applied. There was a statistically significant difference between analysis methods when comparing across all filter strengths (P = 0.016). The effect of filter strength did not depend on the analysis method (interaction between filter strength and analysis method: P = 0.85). We found no significant differences between devices in the log-scale images. For the linear-scale images, there were significant differences between Spectralis 2 and Topcon 1000 (P = 0.03) and between Spectralis 1 and Topcon 1000 (P = 0.03). The four Spectralis devices gave identical results within the resolution of the approach (as we acquired a single B-scan in these devices, the resolution of the thickness assessment is equal to the pixel size) for filter strengths up to and including 0.6; one of the devices (OCT-1 City UoL) failed for filter strength 1.2 and above and reported a thinner layer at filter strength 0.9.
Table 2 notes: HE = Heidelberg Engineering; the two Spectralis OCT-1 devices located at City, University of London and Queen Elizabeth Hospital Birmingham are identified as "City UoL" and "QEHB", respectively. * Could not scan due to automated system; ** Nothing visible in the scan; *** Could not fit 3 filters; **** Could not be calculated due to noise.

Discussion
Across eleven OCT devices from six manufacturers, the thickness measurements were highly comparable when the data were analyzed in an identical manner. Without a filter, the thickness of the 50 µm layer was overestimated by a mean of 4.3 µm in the log-scale images and by a mean of 2.7 µm in the linear-scale images, showing only a modest overestimation across all devices. Lowering the SNR with a neutral density filter did not have a clear monotonic thinning or thickening effect on the measured layer. Thickness measurements of high-reflective layers (like the RNFL) overestimate the real thickness if based on log-scale image intensity, and this overestimation decreases with decreasing signal-to-noise ratio (increasing filter strength). Kinkelder et al. found, using a phantom eye with a layer thickness of 49 µm, an overestimation ranging from 13 to 23 µm for different devices [8]. Similarly, thickness measurements of low-reflective layers underestimate the real thickness. This over- and underestimation is less pronounced, and should theoretically disappear completely, if the analysis is based on linear-scale images [19]. The phantom layer addressed in the current study is neither a high-reflective nor a low-reflective layer, but is preceded by a layer with a lower reflectivity and followed by a layer with a higher reflectivity [Figs. 1(C), 2(A), 2(C)]. As such, the effect of the analysis method (log-scale image versus linear-scale image) on the thickness measurements should be small compared to what has been reported for high-reflective layers, and the effect of the signal-to-noise ratio on the thickness measurements is not self-evident.
Indeed, in the current study the differences between the methods and the influence of the signal-to-noise ratio were small; within these small differences, the thickness measurements based on the linear-scale images were in better agreement with the thickness specified by the manufacturer than those based on the log-scale images. Reconstructing the linear-scale data requires knowledge of the dynamic range of the device. Unfortunately, none of the manufacturers disclosed this information. As the dynamic range is expected to be between 40 and 60 dB [21], we chose to use 50 dB for all scans.
To explore the effect of this assumption, we conducted additional analyses with dynamic ranges of 40 and 60 dB, using the Canon HS-100 data for 0.0 OD. Mean thicknesses were 50.6, 50.3, and 50.1 µm for the 40, 50, and 60 dB dynamic ranges, respectively.
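One way to see why the assumed dynamic range has only a limited influence is to express a fixed linear-scale criterion in grey levels. The sketch below uses a half-maximum criterion purely as an illustrative assumption (the edge criterion is not specified in this section) and again assumes that the 8-bit grey scale spans the full dynamic range; the function name is hypothetical.

```python
import math


def half_max_drop_in_grey_levels(dynamic_range_db, grey_max=255):
    """Grey levels separating a peak from its linear-scale half-maximum.

    Half of the linear intensity is a drop of 10*log10(2) dB (about 3 dB).
    If grey_max grey levels span dynamic_range_db, that drop corresponds to
    grey_max * drop_db / dynamic_range_db grey levels.
    """
    drop_db = 10.0 * math.log10(2.0)
    return grey_max * drop_db / dynamic_range_db


for dr in (40, 50, 60):
    print(f"{dr} dB: half-maximum lies "
          f"{half_max_drop_in_grey_levels(dr):.1f} grey levels below the peak")
```

Under these assumptions, changing the dynamic range from 40 to 60 dB moves the half-maximum level by only a few grey levels, so a layer boundary on a steep edge would shift little.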
In a clinical setting, thickness measurements from different OCT devices have been found to be comparable in some studies [24] and not comparable in others [5,6,25,26]. Differences in the mean RNFL thickness between Stratus and Spectralis OCT (106.2 ± 6.9 µm vs 100.0 ± 7.3 µm) have been reported, and the devices were concluded not to be interchangeable in a clinical evaluation [6]. Agrawal et al. showed that built-in algorithms may be the main cause of differences in the measured layer thicknesses [9]. The reported mean difference in RNFL thickness between Spectralis and Cirrus devices was 6.7 µm when each device's own software was used, and was reduced to 0.1 µm when custom software was used [5]. In agreement with this, using third-party software to measure retinal layers from different OCT devices has been shown to offer good agreement cross-sectionally and longitudinally in MS patients [27]. Other studies found different devices (Spectralis and Cirrus) to agree well (correlation coefficient = 0.912) in measuring the mean RNFL thickness, but a correlation coefficient is not an appropriate measure of agreement. Indeed, they found the means to differ by approximately 5 µm (89.22 ± 15.87 versus 84.54 ± 13.68 µm using Spectralis and Cirrus devices, respectively) [28]. We were not able to use the built-in algorithms (because they were not able to correctly segment the phantom eye layers), but our data confirm that measurements become very similar between devices if a uniform, custom approach is applied.
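The difference between correlation and agreement can be illustrated with a small example. The numbers below are synthetic values invented for this illustration, not data from any of the cited studies; the Bland-Altman style summary (mean difference and 95% limits of agreement) is a commonly used alternative and is shown here only as a sketch.

```python
from math import sqrt
from statistics import fmean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def bias_and_limits(x, y):
    """Mean difference (bias) and 95% limits of agreement (bias ± 1.96 SD)."""
    diffs = [a - b for a, b in zip(x, y)]
    bias, sd = fmean(diffs), stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Synthetic RNFL thicknesses (µm) for two hypothetical devices:
# device B reads roughly 5 µm lower than device A for every eye.
device_a = [70.0, 80.0, 90.0, 100.0, 110.0]
device_b = [65.5, 74.6, 85.3, 94.8, 105.1]

r = pearson_r(device_a, device_b)
bias, limits = bias_and_limits(device_a, device_b)
print(f"r = {r:.4f}, bias = {bias:.2f} µm, "
      f"limits = ({limits[0]:.2f}, {limits[1]:.2f}) µm")
```

In this synthetic case the correlation is close to 1 even though the two devices differ systematically by about 5 µm, which is why a high correlation coefficient alone does not show that devices are interchangeable.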
In order to get the best clinical follow-up for patients, using the same device and the same operator is recommended [25]. This is, however, not always possible, since clinics may use different OCT brands in daily practice, either simultaneously or successively over time, and often employ more than one ophthalmic technician. Currently, there are no established protocols between OCT device manufacturers that allow easy comparison between brands. Establishing agreement between different OCT devices is needed in order to harmonize test results, enabling accurate between-device longitudinal follow-up. Our study shows that different devices could be comparable if their data are analyzed in the same way. Further improvements could be realized by using, amongst others, standardized ROIs.
The main limitation of this study is that we were unable to compare the devices' own built-in algorithms against our approach. We could not use the devices' segmentation results because the layer of interest, shown in Fig. 1(C), does not resemble an anatomical layer that the devices would automatically report. The devices' built-in algorithms usually report only the total retinal thickness or the thickness of the top three layers (RNFL, GCL, and IPL). A future prospect is to image an anatomically correct phantom, with a clear distinction between the top layers that the OCT devices' built-in software can automatically evaluate, and to compare these segmentation results to a custom approach. Phantoms have been shown to mimic the layer thicknesses and the scattering properties of the human eye, but still fall short of accurately describing the full complexity of the real anatomy.
In conclusion, in the case of a phantom eye, our study shows that measurements from different devices can be reliably compared against each other when a custom approach is used to quantify the layer thickness.