Applying an Open-Source Segmentation Algorithm to Different OCT Devices in Multiple Sclerosis Patients and Healthy Controls: Implications for Clinical Trials

Background. The lack of segmentation algorithms operative across optical coherence tomography (OCT) platforms hinders utility of retinal layer measures in MS trials. Objective. To determine cross-sectional and longitudinal agreement of retinal layer thicknesses derived from an open-source, fully-automated, segmentation algorithm, applied to two spectral-domain OCT devices. Methods. Cirrus HD-OCT and Spectralis OCT macular scans from 68 MS patients and 22 healthy controls were segmented. A longitudinal cohort comprising 51 subjects (mean follow-up: 1.4 ± 0.9 years) was also examined. Bland-Altman analyses and interscanner agreement indices were utilized to assess agreement between scanners. Results. Low mean differences (−2.16 to 0.26 μm) and narrow limits of agreement (LOA) were noted for ganglion cell and inner and outer nuclear layer thicknesses cross-sectionally. Longitudinally we found low mean differences (−0.195 to 0.21 μm) for changes in all layers, with wider LOA. Comparisons of rate of change in layer thicknesses over time revealed consistent results between the platforms. Conclusions. Retinal thickness measures for the majority of the retinal layers agree well cross-sectionally and longitudinally between the two scanners at the cohort level, with greater variability at the individual level. This open-source segmentation algorithm enables combining data from different OCT platforms, broadening utilization of OCT as an outcome measure in MS trials.


Introduction
Multiple sclerosis (MS) is a chronic demyelinating disorder of the central nervous system, with both inflammatory and neurodegenerative components [1]. MS has a predilection to affect the optic nerves with autopsy studies revealing that up to 99% of MS patients have involvement of the optic nerves, regardless of optic neuritis history [2][3][4]. While acute demyelination and inflammatory axonal transection may be responsible for the symptoms observed during an acute relapse, neuroaxonal degeneration appears to be the principal pathological substrate underlying accumulation of disability and progression in MS [5][6][7]. Several putative therapeutic strategies for remyelination and neuroprotection are now transitioning from the laboratory to early phase clinical trials [8][9][10]. The anterior visual pathway has been proposed as an ideal model within which to study the effect of such therapies, due to its excellent structure-function correlations [11,12].
Optical coherence tomography (OCT) is a rapid, noninvasive, well tolerated, and reproducible method utilizing low-coherence, near-infrared light to generate high-resolution, cross-sectional images of the retina [13]. Advances in OCT technology have led to shorter scan times, improved resolution, and better reproducibility [14,15]. Current generation spectral-domain (SD) OCT devices have a resolution of approximately 4-5 μm. Initial studies of OCT in MS primarily focused on peripapillary retinal nerve fiber layer (p-RNFL) and total macular volume (TMV) measurements [16,17]. Recently developed automated retinal layer segmentation algorithms have enabled examination of alterations within discrete retinal layers in MS [18][19][20]. Optic nerve pathology results in degeneration of its constituent axons, the retinal nerve fiber layer, and ganglion cell neurons, from which optic nerve axons derive. Moreover, studies suggest that primary retinal pathology may also be operative in MS, though this has been challenged by other studies [20,21]. Previous studies found that increased inner nuclear layer (INL) thickness may be associated with the development of new T2 lesions, contrast enhancing lesions, and EDSS progression, while p-RNFL and ganglion cell layer (GCL) thicknesses may correlate with grey matter volume [22,23]. Despite these findings and the relative inexpensiveness of OCT, OCT-derived measures have not been widely employed as outcome measures in clinical trials. This likely relates to the utilization of different OCT platforms across varying clinical sites and the fact that currently employed segmentation algorithms are mostly platform specific. This is a barrier for not only MS research and trials, but also virtually all disciplines in which OCT is of interest.
The comparison of quantitative results across OCT platforms has been a challenge, since manufacturer segmentation algorithms utilize different anatomical landmarks from which retinal measures are calculated [24,25]. An open-source segmentation algorithm that could be used to segment OCT scans from different OCT platforms in a consistent fashion could allow more widespread use of OCT in clinical trials, provided the agreement between acquired measures across the platforms was good. In this study comprising cross-sectional and longitudinal cohorts, we performed a cross-platform comparison of retinal layer OCT segmentation utilizing a new open-source segmentation algorithm and also compared derived measures between MS patients and healthy controls.

Study Population.
Patients for this study were recruited from the Johns Hopkins Multiple Sclerosis Center by convenience sampling. Written informed consent was obtained from study participants. The study was approved by the Institutional Review Board of Johns Hopkins University, was HIPAA compliant, and adhered to the tenets laid down in the Declaration of Helsinki. MS diagnosis and subtype classification into relapsing-remitting (RRMS), secondary progressive (SPMS), or primary progressive (PPMS) were confirmed by the treating neurologist, based on the revised McDonald criteria [26]. Healthy controls (HCs) were recruited from volunteers amongst Johns Hopkins staff. Individuals with refractive errors of > ±6.0 diopters, history of ocular surgery, glaucoma, hypertension, diabetes, or any other apparent ocular pathology were excluded.

OCT Scanning.
Retinal imaging was performed by experienced technicians on Cirrus HD-OCT model 4000, software version 5.0 (Carl Zeiss Meditec, Dublin, CA, USA) and Spectralis OCT, software version 5.2.4 (Heidelberg Engineering, Heidelberg, Germany), as described in detail elsewhere [15,25]. Briefly, Cirrus macular data was obtained using the macular cube 512 × 128 protocol. OSCAR-1B quality control criteria were applied to OCT scans. Only scans with signal strengths ≥ 7 and without artifact were included in the study. Spectralis macular scans were obtained using the fast macular protocol. Spectralis macular scans included in this study had an automatic real time (ART) of 16 and signal strength ≥ 20 dB and were devoid of artifact. We removed 8 scans that did not fulfill these quality control criteria. Cirrus and Spectralis scans were obtained in a random order on the same day.

OCT Segmentation.
Layer segmentation of the OCT data was performed using a previously developed and validated algorithm for detecting 8 layers within the macula as depicted in Figure 1 [27]. The algorithm works in three stages: preprocessing, pixel classification, and graph-based multilayer segmentation. In the preprocessing stage, the intensities of each B-scan image are normalized to add consistency between scans. Additionally, estimates of the inner and outer retinal boundaries (inner limiting membrane (ILM) and Bruch's membrane (BM)) are used to restrict the region of interest for the algorithm, as well as to flatten the data to the BM boundary. In the second stage, we use a random forest classifier to determine the probability that each pixel belongs to one of the 9 layer boundaries [28]. The classifier was trained using manual segmentations from 7 randomly chosen subjects (including both MS and control data). In the final stage, a graph-based segmentation algorithm is used to find the 9 surfaces (corresponding to the boundaries between each of the 8 layers) by maximizing the boundary-specific probabilities on those surfaces [29]. Constraints are used to limit the minimum and maximum distance between each boundary and to limit the smoothness of the final segmentation. Thickness measurements are computed by averaging the thickness values within a square 5 × 5 mm region centered at the fovea. The center of the fovea was estimated as the location of the A-scan having the smallest total macular thickness within the central 2 × 2 mm area of the data. Note that the thickness values were not averaged over the entire 6 × 6 mm imaged area since we allow the position of the fovea to vary by ±1 mm. This algorithm is available for download at http://www.nitrc.org/projects/aura_tools/. The run time for the algorithm is between 3 and 4 minutes per scan. Cirrus scans were exported in .img format prior to segmentation with the algorithm, while Spectralis scans were exported in .vol format.
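The fovea localization and thickness averaging described above can be illustrated with a minimal sketch. It is not the released algorithm's code: the function names, the nested-list thickness maps, and the per-pixel spacing arguments are assumptions made for illustration, and the maps are assumed to be already computed from the segmented boundaries.

```python
def find_fovea(total_thickness, mm_per_row, mm_per_col, search_mm=2.0):
    """Return (row, col) of the thinnest A-scan in the central search window.

    total_thickness: 2D list of total macular thickness, one value per A-scan.
    mm_per_row, mm_per_col: assumed physical spacing between A-scans (mm).
    """
    n_rows, n_cols = len(total_thickness), len(total_thickness[0])
    centre_r, centre_c = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    half = search_mm / 2.0
    best = None
    for r in range(n_rows):
        if abs(r - centre_r) * mm_per_row > half:
            continue  # outside the central 2 x 2 mm window
        for c in range(n_cols):
            if abs(c - centre_c) * mm_per_col > half:
                continue
            value = total_thickness[r][c]
            if best is None or value < best[0]:
                best = (value, r, c)
    return best[1], best[2]


def mean_layer_thickness(layer_thickness, fovea, mm_per_row, mm_per_col,
                         region_mm=5.0):
    """Average a layer's thickness over a square region centred at the fovea."""
    fovea_r, fovea_c = fovea
    half = region_mm / 2.0
    values = [
        layer_thickness[r][c]
        for r in range(len(layer_thickness))
        for c in range(len(layer_thickness[0]))
        if abs(r - fovea_r) * mm_per_row <= half
        and abs(c - fovea_c) * mm_per_col <= half
    ]
    return sum(values) / len(values)
```

Because the search window allows the fovea to sit up to 1 mm from the scan centre, a 5 × 5 mm averaging region always stays inside the 6 × 6 mm imaged area, which is the reason given in the text for not averaging over the full scan.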
Following segmentation, scans were inspected for segmentation errors. The segmentation software allows for scans that have segmentation errors identified on visual inspection to have segmentation lines corrected manually.
The layers produced by manufacturer segmentation algorithms include mRNFL, ganglion cell + inner plexiform layer (GCIP), inner nuclear layer + outer plexiform layer (INL + OPL), and outer nuclear layer + photoreceptor (ONL + PR) for the Cirrus HD-OCT and mRNFL, GCL, IPL, INL, OPL, ONL, photoreceptor-inner segment (PR-IS), photoreceptor-outer segment (PR-OS), and retinal pigment epithelium (RPE) for Spectralis.

Statistical Analysis.
For the cross-sectional cohort, Bland-Altman analyses were used to assess agreement between the two OCT platforms [30,31]. This included calculating mean differences with 95% confidence intervals (CI), limits of agreement (LOA) with 95% CI, and Bland-Altman plots of differences against average measurements. These were calculated for all retinal layers. The interscanner agreement index was calculated for each retinal layer for each subject. This index has previously been used to compare interscanner variation between MRI platforms [31], as well as OCT measures derived from different scanners [18]. The index is computed from each pair of measurements, where A denotes the measurement on machine A and B the measurement on machine B.
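The cross-sectional agreement statistics described in this section can be sketched as follows. The Bland-Altman part uses the standard definition (mean difference ± 1.96 standard deviations of the paired differences). The form of the interscanner agreement index is an assumption, since its defining equation is not reproduced in this text: the sketch uses one minus the absolute difference divided by the mean of the two measurements, expressed as a percentage. The function names are hypothetical, and the confidence intervals and intereye correlations handled in the study are omitted.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (mean difference, lower LOA, upper LOA) for paired measurements.

    a, b: thicknesses (in micrometres) of the same eyes on machines A and B.
    """
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


def agreement_index(a, b):
    """Interscanner agreement for one pair of measurements, in percent.

    ASSUMED form: 100 * (1 - |a - b| / mean(a, b)); not confirmed by the text.
    """
    return 100.0 * (1.0 - abs(a - b) / ((a + b) / 2.0))
```

Under this assumed form, identical measurements give an index of 100, and the index falls as the two devices diverge relative to the layer's thickness, which would explain why a thin layer such as the mRNFL can show a lower index for a similar absolute difference.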
For the longitudinal cohort we performed modified Bland-Altman analysis to adjust for repeated measures, utilizing the change in various retinal layers between serial scans [32]. Similar to the cross-sectional cohort, we calculated mean differences and LOA with 95% CI and Bland-Altman plots. We utilized mixed effects linear models to calculate the rate of change of layer thickness over time, adjusting for age and sex and accounting for within-subject intereye correlations.
We performed an exploratory comparison of thicknesses of various retinal layers (derived from each platform) between MS subjects and healthy controls utilizing a mixed effects linear regression model, adjusting for age and sex and accounting for within-subject intereye correlations. P values < 0.05 were defined as statistically significant.

Study Population.
The cross-sectional cohort consisted of 90 subjects, 68 MS patients and 22 HCs. The longitudinal cohort was a subgroup of the cross-sectional cohort, consisting of 51 subjects. The demographic characteristics of the study participants are summarized in Table 1. MS patients were significantly older than HCs (P < 0.001), and the greater proportion of females among MS patients was not statistically significant (P = 0.25). The mean follow-up duration of the longitudinal cohort was 1.4 ± 0.9 years.

Mean differences and LOA for all layers are listed in Table 2. The mean differences and LOA for all retinal layers were similar between MS patients and healthy controls (data not shown). We also constructed Bland-Altman plots for these comparisons for each layer (Figure 2). These showed no evidence of a systematic relationship between differences and average thickness values. Interscanner agreement indices were extremely high for all layers except for the mRNFL (mRNFL: 85.5) (Figure 3). These results further support excellent agreement between OCT segmentation measures across the cohort between the two scanners.

Longitudinal Comparison of Segmentation across
Mean differences and LOA for the change in retinal layer thicknesses are listed in Table 3. Bland-Altman plots for the change in retinal layers across the two devices are shown in Figure 4.
We also utilized mixed effects models to ascertain the rate of change of different retinal layers for the entire cohort using segmentation values derived from the two platforms. These analyses are summarized in Table 4. We found that, except for the mRNFL, the remaining layers showed consistency in the significance of rate of change in the layer thicknesses between the two platforms.

Comparison of Retinal Layers between MS and Healthy Controls.
In the cross-sectional cohort, relative to controls, MS patients had reduced mRNFL (P = 0.001) and GCIP (P < 0.001) thicknesses across both platforms, adjusting for age and gender. Table 5 lists the differences in mean thickness of individual layers between HCs and MS subjects, with separate comparisons for each OCT platform.

Discussion
The results of this study reveal excellent agreement of retinal layer measures acquired from two different OCT scanners, both cross-sectionally and longitudinally across MS patients and healthy controls, helping to validate a new automated retinal segmentation algorithm operative across platforms. Utilizing this segmentation technique could help overcome current limitations in comparing retinal segmentation data across different OCT scanners, enabling wider adoption of OCT measures as outcomes in clinical trials.
The results of the cross-sectional comparison suggest excellent cross-platform agreement at the cohort level for the GCIP, INL, INL + OPL, and ONL + PR, as evidenced by the small mean differences for these measures between the two OCT devices studied. The mean difference for the mRNFL was larger, suggesting poorer agreement between scanners. The small mean differences suggest that at the cohort level retinal layer measures are comparable across platforms. This is an important finding since it suggests data acquired using different scanners could be pooled, utilizing this segmentation algorithm. In multicenter studies, where different sites may have different scanners, this could allow an increase in sample size, power, and ultimately the ability to detect meaningful relationships between OCT and other clinical and imaging measures of MS disease activity. Further support for agreement is derived from our analyses comparing retinal layer thickness measures between MS patients and healthy controls acquired from the two different OCT scanners. These results were consistent in terms of magnitude of difference, as well as significance, across both platforms for all retinal neuronal layers, underpinning the potential utility of employing a consistent segmentation algorithm for the examination of cross-sectional data acquired on different OCT scanners. Similar results were obtained in the longitudinal cohort, suggesting that utilizing retinal layer thicknesses from two different OCT platforms may not change the interpretation of the rates of layer change across the entire cohort. The LOA represent the agreement of measures at an individual level. Although we found the LOA to be narrower than those reported in previous studies, they may still be unacceptably wide to support the use of different platforms interchangeably at the individual patient level. In routine clinical practice, therefore, patients should continue to be scanned on the same OCT device, as has been suggested in prior studies [18,25].

Results from the longitudinal cohort revealed small mean differences for all retinal layers. Comparing these mean differences to the absolute values for change in the layers, the GCIP, INL + OPL, ONL, and RPE mean differences suggested good agreement at the cohort level. The mean differences for mRNFL, INL, and ONL + PR appeared large compared to the absolute values of the change in those layers. This suggests that the GCIP, INL + OPL, ONL, and RPE agreed well between the scanners at the cohort level over time, raising the possibility that the employed segmentation algorithm may have utility not only cross-sectionally, but longitudinally as well. Elaborating upon these findings, when assessing the rate of change of individual retinal layers over time (adjusting for age and gender), we found consistent results between the platforms in terms of the direction and significance of change in various retinal layer thicknesses with the exception of the mRNFL. This would suggest that, despite the mean differences being large compared to the absolute rate of change for some layers, combining data at the cohort level may be possible for all layers except for the mRNFL. Similar to the cross-sectional cohort, the LOA for the longitudinal cohort suggested poorer agreement at the individual level. Thus it would not be advisable to use different scanners while following up an individual patient over time.
TMV and pRNFL have been previously compared across scanners [24,25,33]. Some of these studies suggested poor agreement between different OCT platforms. The limitation with these studies is that manufacturers utilize different landmarks to calculate retinal thicknesses, thus making it difficult to compare across platforms. The use of a common algorithm to segment data from different platforms helps to circumvent this problem by utilizing consistent landmarks.
It has been suggested that there may be inherent limitations to comparing data from different scanners. A study comparing lateral and axial thickness measures derived from different SD-OCT scanners imaging a phantom eye showed significant differences between OCT platforms [34]. The authors suggested using a conversion factor when attempting to compare retinal measures across scanners. This study did not attempt to segment retinal layers and utilized manual rather than automated measurement methods. In contrast, another study utilizing manual delineation of retinal boundaries showed that it was possible to obtain almost identical retinal thickness values from different OCT scanners [35]. The use of an automated method may facilitate more consistent and accurate measurement of retinal layer thicknesses.
A limitation of this study is that the novel segmentation technique employed was only applied to two OCT platforms, and we therefore do not know whether its application to other OCT scanners may be as effective. Moreover, due to the novelty of this algorithm and the single center nature of this study, the findings of this study should be replicated in a larger sample size, over multiple centers, preferably incorporating more OCT platforms, and a wider host of neuroophthalmological/ophthalmological disorders to generate more generalizable and definitive results. Despite these limitations, this study represents a major advance in terms of demonstrating the utility and advantages of applying a consistent segmentation technique to scans acquired from different OCT scanners.
An important development that could help in the application of segmentation algorithms such as ours, as well as the incorporation of OCT images into patient electronic medical records, would be the development of DICOM standards that would be utilized by multiple OCT device manufacturers. This could help expand the application of novel segmentation algorithms to multiple OCT platforms and their use in OCT studies as well as clinical practice.
The availability of an open-source OCT segmentation algorithm would be of interest to those conducting observational studies utilizing OCT, allowing them to increase sample size and pool data across centers. Such a segmentation technique is even more critical for the incorporation of OCT as a routine outcome measure in trials. From an MS perspective this is important since a new wave of trials of remyelinating therapies is poised to begin. At present, we have limited techniques to measure the effects of such therapies in humans. OCT has been proposed as an important tool that could be specifically utilized for this purpose based on the supposition that rapid remyelination will protect retinal axons from degeneration. OCT could also be incorporated as an outcome measure in trials of putative neuroprotective agents. The ability to easily and consistently segment OCT scans, in order to derive reliable retinal segmentation data from different scanners at multiple centers, would be an essential prerequisite to increasing the utilization of OCT in not only these scenarios, but also a whole host of other research and trial settings. Therefore our validation of a novel retinal segmentation algorithm, which can be consistently applied across OCT platforms, could be a major step towards expanding OCT utilization in not only MS research, but also other neuroophthalmological disorders, ultimately facilitating therapeutic advances.