Hybrid deep learning and optimal graph search method for optical coherence tomography layer segmentation in diseases affecting the optic nerve

Accurate segmentation of retinal layers in optical coherence tomography (OCT) images is critical for assessing diseases that affect the optic nerve, but existing automated algorithms often fail when pathology causes irregular layer topology, such as extreme thinning of the ganglion cell-inner plexiform layer (GCIPL). Deep LOGISMOS, a hybrid approach that combines the strengths of deep learning and 3D graph search while overcoming their respective limitations, was developed to improve the accuracy, robustness and generalizability of retinal layer segmentation. The method was trained on 124 OCT volumes from both eyes of 31 non-arteritic anterior ischemic optic neuropathy (NAION) patients and tested on three cross-sectional datasets with available reference tracings: Test-NAION (40 volumes from both eyes of 20 NAION subjects), Test-G (29 volumes from 29 glaucoma subjects/eyes), and Test-JHU (35 volumes from 21 multiple sclerosis and 14 control subjects/eyes), and one longitudinal dataset without reference tracings: Test-G-L (155 volumes from 15 glaucoma patients/eyes). In the three test datasets with reference tracings (Test-NAION, Test-G, and Test-JHU), Deep LOGISMOS achieved very high Dice similarity coefficients (%) on GCIPL: 89.97±3.59, 90.63±2.56, and 94.06±1.76, respectively. In the same context, Deep LOGISMOS outperformed the Iowa Reference Algorithms by improving the Dice score by 17.5, 5.4, and 7.5 percentage points, and also surpassed the deep learning framework nnU-Net with improvements of 4.4, 3.7, and 1.0 percentage points. For the 15 severe glaucoma eyes with marked GCIPL thinning (Test-G-L), it demonstrated reliable regional GCIPL thickness measurement over five years. The proposed Deep LOGISMOS approach has the potential to enhance precise quantification of retinal structures, aiding diagnosis and treatment management of optic nerve diseases.


Introduction
Optical coherence tomography (OCT) imaging is essential for the diagnosis, progression monitoring and damage assessment of diseases affecting the optic nerve, most of which cause some degree of irreversible vision loss [1][2][3]. OCT enables visualization and thickness measurement of the retinal nerve fiber layer (RNFL) and ganglion cell-inner plexiform layer (GCIPL), which are reduced due to optic nerve damage. These measures provide important clinical information about the structural changes needed for monitoring disease progression. For glaucoma patients, the gradual progression of nerve damage and thinning of these layers are visible [3][4][5][6] and can be quantified [3,7]. For patients with multiple sclerosis (MS), subtle regional thinning of RNFL and GCIPL thickness over time is correlated with factors beyond visual function, such as reduced brain volume, neuronal loss (demonstrated on pathology of the eye), cognitive impairment, and overall disability [8][9][10]. For patients with non-arteritic ischemic optic neuropathy (NAION), the condition is evident from the initial manifestation of substantial retinal swelling, followed by rapid thinning of the RNFL and inner retinal layers in the subsequent months [2,11,12]. Figure 1 shows several examples of OCT images with thickening and thinning of retinal layers. Such significant thickening and thinning can cause standard automated segmentation methods to fail to identify the RNFL and GCIPL correctly [13,14], confining eye care providers to following glaucoma progression primarily by visual field (VF) outcomes; VF testing depends on determining the threshold of light perception at different locations of the patient's visual field and can be inconsistent, particularly when the visual field loss is severe [15]. To ensure reliable and reproducible retinal structural quantification in OCT, an accurate layer segmentation method that is robust to extreme variations in layer thickness is critical.

Traditionally, experts design automated OCT layer segmentation approaches based on domain knowledge, assuming regular retinal thicknesses, surface topology, and image properties of OCT devices. Previously, we developed the Iowa Reference Algorithms (also known as OCTExplorer [40]) with highly integrated, purpose-specific designs and parameters for automated layer segmentation in both macular and optic-nerve-head (ONH) volumetric scans from different OCT devices. It was designed based on Layered Optimal Graph Image Segmentation for Multiple Objects and Surfaces (LOGISMOS, Sec. 2.2), a general-purpose method for optimally segmenting multiple n-D surfaces that mutually interact within individual objects and/or between objects [16][17][18][19]. Since LOGISMOS theoretically guarantees surface/object topology by design, it has been widely used and achieves higher segmentation accuracy than other standard automated methods [20] in multi-surface/object segmentation applications. However, the LOGISMOS parameters in OCTExplorer were not tailored for severe retinopathy and/or retinal deformation, resulting in unreliable segmentation results and structural measurements in extreme cases of thickening and thinning of the retinal layers. Although LOGISMOS can achieve the desired accuracy by using fine-tuned parameters (surface constraints) for specific retinal pathology [21][22][23], such parameter tuning is often difficult for users without sufficient understanding of the underlying algorithm.
Recent advances in deep learning methods have demonstrated exceptional capabilities in modeling retinal structure by directly learning features from input data (with precise, high-quality manually traced layers/surfaces in the case of supervised retinal layer segmentation). Among various deep learning architectures, U-Net [24] based approaches have gained significant attention in OCT layer segmentation. Mishra et al. [25] proposed using a U-Net to create 2D probability maps of drusen and reticular pseudodrusen (RPD) and combined them with the image gradient to guide a shortest-path algorithm that identifies retinal surfaces in 2D B-scans. He et al. [26] trained a residual U-Net with two output branches producing probability maps of the retinal layers and junction surfaces, and merged the probability maps in an iterative surface topology module to estimate the retinal surface locations. Furthermore, Yadav et al. [27] proposed a cascaded two-stage design, in which the first U-Net identified the retina in a 2D OCT B-scan and the following U-Net focused on identifying each retinal layer within the pre-segmented retina. Most recently, He et al. [28] addressed the segmentation of longitudinal OCT scans by proposing a long-short-term-memory U-Net (LSTM-UNet) to incorporate information from adjacent 2D B-scans and use the segmented surfaces as longitudinal priors. A hidden Markov model was also used to avoid segmenting all past images in the longitudinal OCT segmentation task, reducing GPU memory usage. Mukherjee et al. [29] utilized a 3D U-Net on blocks of 8 neighboring B-scans to produce single-pixel surfaces generated from A-scan-wise maximal probability, with a 3D autoencoder serving as a constraint-inducing regularizer to enforce surface smoothness and improve surface topology.
Existing automated approaches have shown excellent performance in identifying retinal surfaces, but several major issues remain. First, although 2D segmentation methods do not require intense computational power, their results can be locally unstable due to the absence of volumetric contextual information. Second, some deep-learning methods require intricate pre- and post-processing steps, such as hole filling, outlier detection, and surface/layer topology checking [27], which are often empirically tuned to the training data and can cause unforeseen errors on different test datasets. Another common post-processing strategy is to refine the segmented surfaces according to the topological order of the retina [25], such that errors occurring on the first-processed surface can easily propagate to other surfaces that were originally segmented correctly.
In this study, a hybrid approach, Deep LOGISMOS, was developed to combine the strengths of deep learning and LOGISMOS while overcoming their respective limitations. A two-stage, true 3D pipeline was implemented to improve accuracy, robustness and generalizability. The novelties and advantages of this work are as follows.
- A two-stage segmentation approach is used: pre-segmentation locates the whole retina region in the full image, and final segmentation identifies the individual layers. Each stage employs a dedicated deep learning network, whose output probability map is used to derive the in-region cost for the following LOGISMOS optimization, which also incorporates a weighted gradient-based on-surface cost to improve segmentation robustness and accuracy.
- High-quality deep learning probability maps eliminate the need for complex parameter tuning for LOGISMOS.
- The built-in topology guarantees of LOGISMOS eliminate the need for complex and dataset-specific post-processing of deep learning results.
- A state-of-the-art multi-threaded maximum flow solver substantially reduces the running time of the LOGISMOS graph optimization.
- The true 3D implementation is less sensitive to outlier B-scans with heavy noise or artifacts.
In addition to the traditional segmentation accuracy metrics, the clinical usability of the developed method is assessed by the quality of the resulting retinal layer thickness maps. Robustness and generalizability are evaluated across cohorts and OCT devices. This work does not involve creating a new deep learning segmentation network or augmenting the existing capabilities of LOGISMOS. Instead, it demonstrates the usefulness of an effective combination of the two methodologies. The deep learning components in this study were implemented using the nnU-Net framework [30] but can potentially be replaced with any network capable of producing probability maps for retinal layers. The LOGISMOS graph structure remains consistent with that proposed in [18] while using in-region costs derived from deep learning.

Methods
As shown in Fig. 2, the proposed method consists of two stages: pre-segmentation and final segmentation. The retina and top/bottom background regions are first segmented during the pre-segmentation stage; the target retinal layers/surfaces are then simultaneously identified in 3D in the final segmentation stage. The five target retinal layers are: retinal nerve fiber layer (RNFL), ganglion cell-inner plexiform layer (GCIPL), inner nuclear plus outer plexiform layer (INOPL), outer nuclear layer (ONL), and retinal pigment epithelium (RPE) complex. The target surfaces are the bounding surfaces of these five layers: the inner limiting membrane (ILM), RNFL-GCL, IPL-INL, OPL-ONL, and the upper and lower RPE complex surfaces. The pre-segmentation stage is implemented using a 2D U-Net followed by a two-surface 3D LOGISMOS, while the final segmentation stage uses a 3D U-Net followed by a six-surface 3D LOGISMOS.

Surface-oriented two-stage segmentation
OCT images inherently exhibit a significant imbalance between the retina and the background, as the retina constitutes a minor portion of the overall image. The penetration depth of OCT is approximately 2 mm and the thickness of a normal retina rarely exceeds 400 µm [31], which means that less than 20% of a B-scan is occupied by retina. Even with existing pathology considered, the majority of B-scan voxels still belong to the background. Considering that each individual layer occupies an even smaller percentage of the B-scan, this skewed representation of the retina and its layers relative to the background might lead a deep learning network to overemphasize the background when a multi-layer segmentation model is trained on complete B-scans.
To more effectively manage this imbalance between background and retinal layers, as well as to reduce the computational cost of the subsequent 3D U-Net based multi-layer segmentation, we implemented pre-segmentation of the entire retinal region using a 2D U-Net component from the nnU-Net framework. The retinal region, in this context, is defined as the area between the ILM and the lower bounding surface of the RPE complex. This allowed the generation of distinct probability maps for the bottom background, the retinal region, and the top background. Although a 3D U-Net could be utilized for pre-segmentation, we found that the 2D U-Net was sufficient for accurate retina delineation with significantly reduced computational demands.
Typically, the pre-segmentation process provides a reasonable labeling of the retinal region. However, the retinal region can exhibit complex and irregular features due to various factors, including artifacts, local pathologies, OCT deficiencies, blood vessels, etc. Furthermore, when applying the trained model to an unseen dataset, the pre-segmentation may produce unexpected false predictions such as holes, irregular attachments, incorrect labels and similar anomalies. These inaccuracies stem from unseen patterns arising from different imaging device characteristics and cohort properties. Converting such region-based segmentation to surface-based segmentation may not be straightforward and might require tailored post-processing. Moreover, an accurate outline of the retinal region is essential for achieving final segmentation accuracy. To better handle such complexities, the LOGISMOS framework was employed, utilizing the probability maps of the three regions (bottom background, retina, and top background) from the 2D U-Net output to achieve a two-surface segmentation of the retinal region. More details about our Deep LOGISMOS framework are discussed in Section 2.2.
Following the pre-segmentation, the retinal regions are flattened based on the top surface (ILM), which typically has less complexity than the bottom surface (OB-RPE). This flattening is intended to provide better layer connectivity, facilitating improved multi-layer/-surface segmentation in both the 3D U-Net and the LOGISMOS algorithm. Additionally, the retinal region is cropped based on the maximum retinal thickness. Top and bottom margins are added to the cropped retinal region, allowing for some errors caused by unseen complexities and incorporating some contextual information.
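As a concrete illustration, ILM-based flattening can be sketched as a per-A-scan shift that places the detected ILM row at a fixed depth; the function name and the fixed target row are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def flatten_retina(volume, ilm_rows, target_row=20):
    """Flatten an OCT volume by shifting each A-scan so the ILM surface
    lands on a fixed row. `volume` is (n_bscans, n_rows, n_ascans) and
    `ilm_rows[b, a]` gives the ILM row index for A-scan a of B-scan b.
    Illustrative sketch; names and the fixed target row are assumptions."""
    n_b, n_rows, n_a = volume.shape
    flat = np.zeros_like(volume)
    for b in range(n_b):
        for a in range(n_a):
            # Shift this A-scan so its ILM row moves to target_row.
            shift = target_row - int(ilm_rows[b, a])
            flat[b, :, a] = np.roll(volume[b, :, a], shift)
    return flat
```

Reversing the shifts (and any cropping) maps results back to the original geometry, as done after the deep learning stage.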
The final segmentation of individual layers is performed using the 3D U-Net component from the nnU-Net framework. The training phase uses the retinal region cropped based on manually labeled surfaces, ensuring that training is independent of the pre-segmentation model. The inference phase uses the retinal region cropped based on the pre-segmentation results. The 3D U-Net was trained to segment each layer independently, with training labels consisting of the cropped background and the five retinal layers (derived from the six surfaces) within the focal, cropped retinal region. In spite of all efforts to address label imbalance and improve the 3D connectivity of the retinal layers, the 3D U-Net may still encounter challenges in generalizing to unseen patterns and inevitable artifacts, occasionally producing unexpected false predictions. As in the pre-segmentation stage, we opted for a unified LOGISMOS framework instead of applying customized post-processing techniques for unforeseen failure cases.

Deep LOGISMOS
As shown in Fig. 3, the 3D image to segment is covered by columns of inter-connected graph nodes such that each target surface intersects each column exactly once, and the graph edges connecting the nodes are used to enforce geometric constraints. The smoothness constraint ∆s determines the maximal allowed shape variation between neighboring columns of the same surface: if node n on a given column is the on-surface node, the on-surface nodes on the neighboring columns can only reside within [n − ∆s, n + ∆s]. Surface separation constraints, ∆l and ∆u, determine the bounds of the distances between two surfaces on corresponding columns.
Each graph node is assigned a cost that can be an on-surface cost, an in-region cost, or their combination. Let s, c, n denote the indices of surface, column and node, and S, C, N denote the number of surfaces, columns per surface and nodes per column, respectively. The on-surface cost emulates the unlikeness of the target surface passing through a given node; the on-surface cost of node n on column c of surface s is thus denoted as C_os(s, c, n). A given multi-surface segmentation can be represented by its surface function SF such that n = SF(s, c) defines the location of surface s on column c. The total on-surface cost for a segmentation is

    C_total_os = Σ_{s=1..S} Σ_{c=1..C} C_os(s, c, SF(s, c)).   (1)

When the target surfaces are non-crossing, the above surface function has an equivalent region function RF such that r = RF(c, n) assigns node n on column c to one of the S + 1 regions (Fig. 3(a)). The in-region cost C_ir(r, c, n) emulates the unlikeness of (c, n) belonging to region r ∈ [0, S]. The total in-region cost for a segmentation is

    C_total_ir = Σ_{c=1..C} Σ_{n=1..N} C_ir(RF(c, n), c, n).   (2)

On-surface and in-region costs can be used individually or together with adjustable weights as

    C_total = w_s · C_total_os + α · w_r · C_total_ir,   (3)

where w_s and w_r are used to adjust the relative contributions of the on-surface and in-region costs.
Because C_os and C_ir are computed over different numbers of cost values, α must be used as an additional correction factor; when the on-surface and in-region costs have the same absolute value range, α = S/N. After the graph is constructed using proper cost functions, the desired segmentation with globally minimal total cost, satisfying all geometric constraints, can be found via the maximum flow of the graph. For more details about the graph construction, the conversion of the above costs to terminal capacities of the graph nodes, and finding the maximum flow, please refer to [16,18].

Deep LOGISMOS utilizes in-region costs derived from high-quality deep learning probability maps to achieve fast and reliable layer identification. Deep learning, while regionally robust, may not always produce very accurate surface delineation and may produce local errors due to noise or artifacts. Therefore, a Gaussian-gradient-based on-surface cost was combined with the learned in-region cost. Using w_s < w_r in Eq. (3), the main role of the on-surface cost is to provide an additional nudge when the in-region cost is insufficient.
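A minimal sketch of evaluating the combined objective of Eq. (3) for a candidate segmentation follows. The region-function convention (counting surfaces above each node) is an assumption valid for non-crossing surfaces ordered top to bottom, and the actual method finds the minimizer via graph maximum flow rather than by direct evaluation:

```python
import numpy as np

def total_cost(surface_fn, C_os, C_ir, w_s=0.4, w_r=0.6):
    """Evaluate the combined objective of Eq. (3) for one candidate
    segmentation. C_os has shape (S, C, N) (on-surface costs), C_ir has
    shape (S + 1, C, N) (in-region costs), and surface_fn has shape (S, C)
    giving n = SF(s, c). alpha = S / N corrects for the different numbers
    of cost terms. Illustrative sketch only."""
    S, C, N = C_os.shape
    alpha = S / N
    # Total on-surface cost: sum of C_os(s, c, SF(s, c)) over surfaces/columns.
    s_idx, c_idx = np.meshgrid(np.arange(S), np.arange(C), indexing="ij")
    cost_os = C_os[s_idx, c_idx, surface_fn].sum()
    # Region function RF(c, n): number of surfaces strictly above node n
    # (assumes non-crossing surfaces ordered top to bottom).
    n_grid = np.arange(N)[None, :]
    region = (surface_fn[:, :, None] < n_grid).sum(axis=0)      # (C, N)
    c_idx2, n_idx2 = np.meshgrid(np.arange(C), np.arange(N), indexing="ij")
    cost_ir = C_ir[region, c_idx2, n_idx2].sum()
    return w_s * cost_os + alpha * w_r * cost_ir
```

The default weights match the pre-segmentation setting reported later (w_r = 0.6, w_s = 0.4).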
One limitation of the previous OCTExplorer is its long running time associated with the Boykov-Kolmogorov maximum flow solver [32]. The IBFS and EIBFS solvers [33,34] can achieve a 4-8 times speedup on the same graph, and a multi-threaded implementation provides an additional 2-4 times speedup [35]. The running time of these maximum flow solvers also heavily depends on the quality of the costs; using costs with less ambiguity, such as those derived from deep learning, can further reduce the running time substantially compared to hand-crafted costs.

Datasets
Table 1 lists detailed information about the training and test datasets used in this study. For the training data, 124 Cirrus OCT macular volumes (Carl Zeiss Meditec, Germany) from 31 NAION participants (both eyes and two visits) were manually selected from the Quark Pharmaceutical clinical trial (ClinicalTrials.gov Identifier: NCT02341560). For the training set, the inclusion of OCT images from both the NAION-affected study eye and the unaffected fellow eye across two visits aims to maximize the utility of limited data. This approach covers a broad spectrum of retinal patterns from swelling to atrophy, providing the model with diverse learning opportunities. Reference tracings are available for three cross-sectional test datasets. Test-NAION includes 40 Cirrus OCT macular scans of 20 additional participants from the same Quark NAION trial. Test-G includes 29 Cirrus OCT macular scans of the affected eye from 29 randomly selected glaucoma patients from the University of Iowa Hospitals and Clinics. Test-JHU is a publicly available dataset that includes 21 and 14 Spectralis OCT macular scans (Heidelberg Engineering Inc., Germany) from 21 multiple sclerosis (MS) and 14 control subjects, respectively, from the Johns Hopkins Hospital [36]. Additionally, a longitudinal test dataset (Test-G-L), containing 155 Cirrus OCT macular scans from 15 independent glaucoma subjects from the University of Iowa Hospitals and Clinics and averaging 10.33±0.7 semi-annual (162.32±14.63-day interval) scans of the study eye, was used to test the robustness of our method on longitudinal data. Reference tracings of six surfaces (ILM, RNFL-GCL, IPL-INL, OPL-ONL, upper and lower RPE) were created by manually modifying the surfaces produced by OCTExplorer for the Training, Test-NAION, and Test-G datasets. In addition, this study included two imaging protocols for the Cirrus OCT scans: 128 B-scans of 512×1024 pixels and 200 B-scans of 200×1024 pixels, both physically covering 6×6×2 mm³; each Spectralis OCT scan contained 49 B-scans of 1024×496 pixels, physically covering 6×6×1.8 mm³. Further insights into our model's approach to managing diverse OCT dimensions are elaborated in Section 3.2.
The study protocol was approved by the University of Iowa's Institutional Review Board and adhered to the tenets of the Declaration of Helsinki.The Quark NAION trial data was prospectively collected during a study that was approved by numerous IRBs and participants provided consent prior to enrollment.

Deep learning
We incorporated two U-Net architectures from the nnU-Net framework [30] into our segmentation pipeline: a 2D nnU-Net for pre-segmentation and a 3D nnU-Net for final segmentation. For both U-Net variants, we employed Stochastic Gradient Descent (SGD) with mini-batches [37] as the optimizer, configured with a momentum of 0.99 and a weight decay of 0.005. The learning rate was initially set to 0.01 and then gradually decreased throughout the training. Both the 2D and 3D U-Net underwent an initial training phase of 30 epochs to reach a performance plateau and were fine-tuned for an additional 15 epochs to optimize their performance. The proposed models were implemented using the PyTorch platform [38] and trained on an NVIDIA Tesla V100 GPU.
As described above and outlined in Table 1, our segmentation model was trained on Cirrus OCT images from a single cohort of NAION patients and then applied to various test datasets to assess its performance and generalizability. To facilitate the application of our model across different cohorts, devices and protocols, we standardized the size of the B-scan image input to the 2D and 3D U-Net models to 512×512 and 512×256, respectively. The B-scans were restored to their original size and resolution for the final LOGISMOS segmentation. A patch size of 128×128×128 was used in the 3D nnU-Net model; this patch size was chosen to manage the computational demands of processing 3D scans while still preserving sufficient image detail for effective segmentation. During the inference phase, these patches, along with a sliding-window technique, were utilized to generate the probability maps for Cirrus OCT images, which usually have more than 120 B-scans.
However, in contrast to the 128 or 200 B-scans per OCT volume in the collected Cirrus data, the Spectralis OCT scans used in the Test-JHU dataset have only 49 B-scans per volume, which poses a challenge during the inference phase for the 3D U-Net with a 128-B-scan patch. To address this, the number of B-scans of the Spectralis OCT images was increased to 145 by inserting two linearly interpolated B-scans between each pair of neighboring original B-scans. This approach preserved all the original B-scans without introducing artifacts and brought the scans into a comparable range with the Cirrus OCT images in the training dataset. The original B-scans were extracted after the inference phase.
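The 49-to-145 B-scan upsampling can be sketched as follows, assuming an array layout of (n_bscans, rows, cols); two interpolated frames between each neighboring pair gives 49 + 48×2 = 145:

```python
import numpy as np

def upsample_bscans(volume, k=2):
    """Insert k linearly interpolated B-scans between each pair of
    neighboring originals. Original B-scans are preserved exactly and can
    be recovered afterwards as out[::k + 1]. Sketch with an assumed
    (n_bscans, rows, cols) layout."""
    n = volume.shape[0]
    out = []
    for i in range(n - 1):
        out.append(volume[i])
        for j in range(1, k + 1):
            t = j / (k + 1)                          # interpolation weight
            out.append((1 - t) * volume[i] + t * volume[i + 1])
    out.append(volume[-1])
    return np.stack(out)
```

With k = 2 and 49 input B-scans, the output has 145 B-scans, and slicing with a stride of 3 recovers the originals after inference.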
At the end of this stage, the resulting probability maps were mapped back to the original image by resizing and reversing the flattening and cropping steps. This remapping serves two purposes. First, the output probability maps can be visualized in the context of the original image for easy quality checking. Second, the subsequent LOGISMOS step can be performed independently, without knowledge of the resizing, flattening and cropping parameters used for the deep learning.

Deep LOGISMOS
The in-region costs of Deep LOGISMOS were directly derived from the probability maps using a simple probability-to-unlikeness conversion. The on-surface costs were computed based on the intensity gradient along each A-scan and the known bright-to-dark or dark-to-bright pattern of the given surface. The ranges of absolute values of the in-region and on-surface costs were linearly scaled to [0, 100], so the correction factor α in Eq. (3) is S/N. The relative contributions of the in-region and on-surface costs, w_r and w_s, were experimentally set to 0.6 and 0.4 for pre-segmentation, and 0.9 and 0.1 for final segmentation, by testing several w_r > w_s combinations on selected difficult cases and observing the frequency and severity of local errors.
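A hedged sketch of these cost conversions follows; the exact probability-to-unlikeness mapping is not specified in the text, so cost = 100·(1 − p) is one simple choice consistent with the [0, 100] scaling, and the gradient-polarity handling is likewise an assumption:

```python
import numpy as np

def inregion_cost(prob, scale=100.0):
    """Convert a deep-learning probability map into an in-region unlikeness
    cost scaled to [0, 100]. The mapping cost = scale * (1 - p) is an
    assumption; the text only states a 'simple probability-to-unlikeness
    conversion'."""
    return scale * (1.0 - np.clip(prob, 0.0, 1.0))

def onsurface_cost(ascan, dark_to_bright=True, scale=100.0):
    """Gradient-based on-surface unlikeness along one A-scan: strong edges
    of the expected polarity get low cost. Illustrative sketch."""
    g = np.gradient(ascan.astype(float))
    if not dark_to_bright:
        g = -g                          # flip for bright-to-dark surfaces
    g = np.clip(g, 0.0, None)           # keep only the expected polarity
    gmax = g.max() if g.max() > 0 else 1.0
    return scale * (1.0 - g / gmax)     # strong expected edge -> cost near 0
```

Because both cost types share the [0, 100] range, α = S/N can be applied directly when combining them as in Eq. (3).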
LOGISMOS optimization was always performed in 3D to segment all target surfaces simultaneously. For pre-segmentation, the graph columns covered complete A-scans with 256 evenly spaced nodes. For final segmentation, the graph columns covered A-scans within the retina range identified by pre-segmentation with one-to-one node-to-voxel correspondence, thus achieving implicit flattening and cropping. The LOGISMOS constraints were ∆s = 8 nodes, ∆l = 0, and ∆u = 0.5N, where N is the number of nodes per column. These constraints are extremely relaxed, were applied to both segmentation stages, and were standardized across all test datasets.
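For illustration, the geometric constraints above can be checked on a candidate segmentation as follows; this uses a simplified 1-D column neighborhood, whereas the actual graph encodes these constraints as arcs over the full 2-D column grid:

```python
import numpy as np

def satisfies_constraints(surface_fn, delta_s, delta_l, delta_u):
    """Check whether a candidate segmentation (surface_fn of shape (S, C),
    with columns along axis 1 assumed to be neighbors, and surfaces ordered
    top to bottom) satisfies the LOGISMOS smoothness and surface-separation
    constraints. Illustrative sketch only."""
    # Smoothness: same-surface positions on neighboring columns differ by <= delta_s.
    if np.abs(np.diff(surface_fn, axis=1)).max() > delta_s:
        return False
    # Separation: distance between consecutive surfaces within [delta_l, delta_u].
    sep = np.diff(surface_fn, axis=0)
    return bool((sep >= delta_l).all() and (sep <= delta_u).all())
```

With ∆s = 8, ∆l = 0 and ∆u = 0.5N, almost any anatomically plausible surface configuration passes, which is why the same constraint set could be standardized across all test datasets.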

Validation metrics
The deep learning segmentation results (assigning each voxel a unique label based on the corresponding values in the probability maps) have no guaranteed correct topology and therefore were only evaluated by the Dice similarity coefficient. Deep LOGISMOS results, with correct topology, were compared with manual tracings using the Dice coefficient, surface positioning error, and layer thickness error.
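For reference, the per-layer Dice similarity coefficient used here (reported in %) is:

```python
import numpy as np

def dice(seg, ref, label):
    """Dice similarity coefficient (in %) for one layer label between a
    segmentation and the reference tracing:
    100 * 2|A ∩ B| / (|A| + |B|)."""
    a = (seg == label)
    b = (ref == label)
    denom = a.sum() + b.sum()
    # If the label is absent from both volumes, report perfect agreement.
    return 100.0 * 2.0 * np.logical_and(a, b).sum() / denom if denom else 100.0
```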
To demonstrate the robustness and quality of clinically oriented metrics derived from Deep LOGISMOS, the GCIPL layer thicknesses were measured in an elliptical annulus grid centered on the fovea using standard Cirrus macular analysis settings: vertical inner and outer radii of 0.5 and 2.0 mm, respectively; horizontal inner and outer radii of 0.6 and 2.4 mm, respectively [5].
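The elliptical annulus can be realized as a boolean mask over the en-face thickness map; the sketch below assumes a given pixel spacing and fovea location, and omits the six-sector split for brevity:

```python
import numpy as np

def gcipl_annulus_mask(shape, center, px_mm):
    """Boolean mask of the standard Cirrus macular GCIPL elliptical annulus:
    vertical inner/outer radii 0.5/2.0 mm, horizontal inner/outer radii
    0.6/2.4 mm, centered on the fovea. `shape` is (rows, cols) of the
    en-face thickness map, `center` the fovea (row, col), `px_mm` the pixel
    size in mm as (row, col). Illustrative sketch."""
    rows, cols = np.indices(shape)
    y = (rows - center[0]) * px_mm[0]   # vertical distance in mm
    x = (cols - center[1]) * px_mm[1]   # horizontal distance in mm
    inner = (y / 0.5) ** 2 + (x / 0.6) ** 2 <= 1.0
    outer = (y / 2.0) ** 2 + (x / 2.4) ** 2 <= 1.0
    return outer & ~inner

# Mean GCIPL thickness over the annulus would then be thickness_map[mask].mean().
```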

Results
The training dataset included 124 scans from 31 NAION participants, and the model training time was approximately 2.5 and 4.0 hours for the pre-segmentation and final segmentation stages, respectively, using a Tesla V100 GPU. For each input OCT volume, the pre-segmentation stage took approximately 40 seconds for the deep learning inference (using a Tesla V100 GPU) and 1 second for the LOGISMOS graph optimization (using an 8-core modern CPU). The subsequent final segmentation stage took approximately 5 seconds for deep learning inference and 10-15 seconds for LOGISMOS optimization. For comparison, OCTExplorer took around 210 seconds to segment a Cirrus OCT macular scan.
After training, Deep LOGISMOS successfully identified the six target surfaces in all 259 OCT scans from the four test datasets without failures. Figure 4 shows several examples from subjects in the Test-NAION dataset. The Deep LOGISMOS results were evaluated against the reference tracings. For comparison, the results of OCTExplorer and of the deep-learning-only approach (referred to as Deep Learning; evaluated only by Dice due to its lack of guaranteed topology) were also evaluated. The performance evaluation results are listed in Tables 2, 3 and 4.

Test-NAION dataset
The characteristics of the Test-NAION dataset were similar to those of the training dataset: all subjects were from the same Quark NAION study and all OCT images were acquired by the Cirrus device. Figure 5 illustrates a comparison of retinal thickness maps generated from the segmentation results of Deep LOGISMOS, OCTExplorer, and manual tracing. Compared with Deep LOGISMOS, OCTExplorer struggled to stably determine a thin RNFL, and the segmentation errors then propagated to other layers in the inner retina.
Table 2 shows that Deep LOGISMOS achieved significantly higher Dice coefficients than the deep-learning-only approach in all five retinal layers. For the layers that are highly affected by NAION (showing apparent thinning of the inner retina, including RNFL, GCIPL, and INOPL), Deep LOGISMOS significantly outperformed OCTExplorer, yielding much higher mean Dice coefficients and lower standard deviations. For surface positioning errors, Deep LOGISMOS produced outstanding results in Table 3, especially for the bounding surfaces of the GCIPL (i.e., RNFL-GCL and IPL-INL), on which OCTExplorer showed approximately 6-8 voxel signed and unsigned errors that were reduced by more than 50% with Deep LOGISMOS. Since the thin-retina-related segmentation errors are often caused by a misidentified regional GCIPL, the neighboring RNFL and INOPL (and sometimes even ONL) thicknesses are mainly affected; Deep LOGISMOS showed significantly better thickness measurement accuracy for RNFL, INOPL, and ONL in Table 4.

Test-G dataset
The Test-G dataset was used to assess the performance of Deep LOGISMOS on images acquired by the same OCT device but from a different glaucoma cohort. The quantitative results for this dataset are included in Tables 2, 3 and 4.

Test-JHU dataset
The Test-JHU dataset was used to assess the performance of Deep LOGISMOS on images acquired by a different OCT device (Spectralis) from eyes with a different disease (multiple sclerosis). Table 2 shows that Deep LOGISMOS yielded the highest Dice coefficients in all five target layers (p-value < 0.05). For surface positioning errors, Deep LOGISMOS achieved sub-voxel mean signed/unsigned errors and standard deviations in all target layers except for the lower RPE. The Deep LOGISMOS results are comparable with those of other deep-learning methods that were trained directly on Spectralis OCTs [27]. For the layer thickness error measurements in Table 4, both OCTExplorer and Deep LOGISMOS show sub-voxel-level errors in all target layers except for the RPE complex.
Compared with the Cirrus images, the Spectralis images have higher quality, with an especially clearer distinction between layers. Therefore, the improvement in Dice introduced by Deep LOGISMOS over deep learning was not as substantial as on the other test datasets in Table 2. However, the success of the final segmentation was made possible by the robust pre-segmentation, in which LOGISMOS was imperative for overcoming the errors of deep learning, as shown in Fig. 10.

Test-G-L dataset
The Test-G-L dataset was used to test the stability of Deep LOGISMOS on longitudinal macular OCT scans, which is essential for monitoring disease progression. A standard Cirrus elliptical annulus grid was centered at the fovea, and regional GCIPL thicknesses were computed in six sectors (N: nasal, S: superior, T: temporal, I: inferior). Figure 6 shows a comparison example of the regional GCIPL sector thicknesses calculated by Deep LOGISMOS and OCTExplorer. Both the radar plot and the longitudinal thickness maps demonstrate that Deep LOGISMOS was able to stably identify the GCIPL even when its thickness had reached the floor value. For this extremely challenging case, the OCTExplorer results were only reliable in the nasal sectors.
Reference tracings were not available for Test-G-L because the tracing process is extremely time-consuming for longitudinal datasets. However, it is known that the included 15 subjects did not exhibit rapid glaucoma progression and that the visit intervals were less than six months. Therefore, the sector thickness changes (∆T) in the GCIPL annulus grid were expected to be relatively small and stable. Figure 7 shows the distributions of ∆T based on the segmentation results from Deep LOGISMOS and OCTExplorer. Deep LOGISMOS yielded highly consistent ∆T measurements (|∆T| < 2 µm) that agree with the expectation derived from the clinical findings. In contrast, the dramatic ∆T changes (e.g., |∆T| > 4 µm) produced by OCTExplorer suggest unreliable measurements due to inconsistent segmentation.

Discussion
Deep LOGISMOS can robustly and accurately segment retinal layers even in challenging cases of NAION and glaucoma, as shown in Figs. 8 and 9. Although mild segmentation errors still occurred in Fig. 8(e,f) and Fig. 9(d), Deep LOGISMOS, with its built-in topology constraints, eliminated the need for complex and dataset-specific post-processing of deep learning results and limited both the size of the affected region and the severity of the errors. Figure 10 shows that the capability of the Cirrus-trained deep learning model to segment Spectralis images is substantially compromised in the pre-segmentation stage, producing many mislabeled regions. However, the LOGISMOS optimization, without adopting a new parameter set or data-specific parameter tuning, overcame these negative effects of cross-device inference and produced correct pre-segmentation results. After that, the final segmentation showed no apparent differences between the Spectralis and Cirrus OCT images.
The reference tracings of the training, Test-NAION, and Test-G datasets were obtained by manual modification of OCTExplorer results to reduce human effort. Since the underlying pathology mainly affects the RNFL and GCIPL, their related surfaces were subjected to more manual modification than the ILM, ONL, and RPE, which were often visually reasonable and thus did not require modification in most situations. This time-saving approach might introduce bias into the trained model and the quantitative results, and it also caused slightly larger (but still acceptable) errors on the RPE in Tables 3 and 4. This potential bias in reference tracings does not exist in the independently traced Test-JHU dataset, on which Deep LOGISMOS outperformed OCTExplorer consistently across all retinal layers/surfaces.
Potential future work may include extending our layer segmentation approach to simultaneously identify retinal layers and regions with fluid accumulations, including cases with age-related macular degeneration, diabetic retinopathy, and radiation retinopathy. The implemented workflow can also be further enhanced with Just-Enough Interaction (JEI) [39] to allow intuitive and efficient correction of segmentation errors.

Conclusion
This study presented Deep LOGISMOS, a novel hybrid framework that combines deep learning and optimal graph search for accurate and robust segmentation of retinal layers in OCT images. The results demonstrated that Deep LOGISMOS consistently outperformed existing standard automated algorithms (OCTExplorer) and deep learning alone. Although trained only on NAION cases acquired with Cirrus, the method showed great flexibility and generalizability when applied to various disease cohorts (NAION, glaucoma, and MS) and to images acquired with a different device (Spectralis). It also provided stable and robust longitudinal quantification of retinal layers, enabling more reliable monitoring of disease progression over time.
In summary, Deep LOGISMOS addresses key limitations of current OCT layer segmentation methods and provides a robust automated approach to extract clinically meaningful retinal structural information, especially in cases of significant retinal layer thickening or thinning where current algorithms fail. With further validation, this hybrid framework has the potential to aid diagnosis, enhance understanding of pathophysiology, and improve management of diseases affecting the optic nerve.

Fig. 1 .
Fig. 1. Examples of OCT central B-scans showing thickening and thinning. (a) A NAION-affected eye shows edema in the region close to the optic nerve head and fluid around the outer retina. (b) The same patient's unaffected fellow eye at the same visit shows no swelling or atrophy. (c) The NAION-affected eye after six months shows significant GCIPL thinning, and the RNFL is hardly identifiable. (d) An independent eye with advanced glaucoma also shows severe thinning of the RNFL and GCIPL.

Fig. 2 .
Fig. 2. Deep LOGISMOS workflow. The pre-segmentation stage identifies the whole retina region, while the final segmentation stage identifies individual surfaces within the focal retina region.

Fig. 3.
Fig. 3. LOGISMOS graph construction. (a) Two terrain-like target surfaces in a 3D volume image. Each target surface intersects each column exactly once. (b) Intra-column edges and inter-column edges for the surface smoothness constraint, ∆s. (c) Edges connecting corresponding columns of two non-crossing surfaces to enforce the lower and upper bounds of the surface separation constraints, ∆l and ∆u, respectively.
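To make the constraint structure concrete: in a single 2-D slice, finding one minimum-cost terrain-like surface under the smoothness constraint ∆s reduces to dynamic programming, sketched below. This is an illustrative simplification, not the paper's implementation; LOGISMOS itself solves the 3-D, multi-surface problem with the separation constraints ∆l and ∆u via a single s-t minimum cut on the graph in the figure.

```python
import numpy as np

def optimal_surface_2d(cost, delta_s):
    """Minimum-cost terrain-like surface in a 2-D slice.

    cost[x, z]: cost of placing the surface at depth z in column x.
    delta_s: maximum allowed depth change between adjacent columns.
    Returns one depth z per column (the surface intersects each
    column exactly once, as in Fig. 3(a)).
    """
    n_cols, n_depth = cost.shape
    dp = cost[0].astype(float).copy()          # best cost ending at (0, z)
    back = np.zeros((n_cols, n_depth), dtype=int)
    for x in range(1, n_cols):
        new = np.full(n_depth, np.inf)
        for z in range(n_depth):
            # Smoothness: previous depth must lie within +/- delta_s.
            lo, hi = max(0, z - delta_s), min(n_depth, z + delta_s + 1)
            j = lo + int(np.argmin(dp[lo:hi]))
            new[z] = dp[j] + cost[x, z]
            back[x, z] = j
        dp = new
    # Backtrack from the cheapest depth in the last column.
    z = int(np.argmin(dp))
    surf = [z]
    for x in range(n_cols - 1, 0, -1):
        z = int(back[x, z])
        surf.append(z)
    return surf[::-1]
```

The hard ±∆s window here plays the role of the infinite-capacity inter-column edges in Fig. 3(b): surface positions violating the constraint are simply unreachable.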

Fig. 4 .
Fig. 4. Example segmentation results of NAION subjects. Column 1: Original images. Column 2: Deep learning pre-segmentation of (top to bottom) the top background (no color), the retina, and the bottom background. Column 3: Deep LOGISMOS pre-segmentation of the ILM and lower RPE surfaces. Column 4: Deep learning final segmentation of the RNFL, GCIPL, INOPL, ONL, and RPE complex. Column 5: Deep LOGISMOS final segmentation showing six target surfaces.

Fig. 5. Fig. 6.
Fig. 5. Retinal thickness maps generated from segmentation using Deep LOGISMOS, OCTExplorer, and manual tracing. The dashed lines mark the B-scan location in the thickness maps. The color scale is adjusted to highlight the variation of the thicknesses in different layers. The segmentation errors propagate among the inner retinal layers in the OCTExplorer results. Horizontal line artifacts exist in the thickness maps derived from manual tracings because surfaces were traced on individual B-scans without sufficient visual feedback to check and maintain surface continuity between B-scans.
Fig. 6. GCIPL thickness maps (µm) and GCIPL thickness radar plots.

Fig. 8. Fig. 9. Fig. 10.
Fig. 8. Challenging cases in Test-NAION with individual B-scans and the segmented surfaces from Deep LOGISMOS. (a) A hyper-reflective region under the RPE complex, (b) shadows under separate inner retinal vessels, (c) shadows from vitreous media opacity, (d) shadows under multiple retinal vessels close to each other, (e) weak OCT signal, (f) weak OCT signal and pigment epithelial detachment, (g) optic disc edema, (h) sub-retinal and intra-retinal fluid due to the edema, (i) severe motion artifact, and (j) hypo-reflectivity at the inner retina. Yellow arrows: challenging regions. (i, j) show reconstructed vertical B-scans perpendicular to the regular horizontal B-scans.

Table 1. Summary of training and test datasets.
b Carl Zeiss Meditec, Dublin, CA. c Heidelberg Engineering, Heidelberg, Germany. d Both eyes from two visits. e Both eyes from the same visit. f 'L' for longitudinal; single eye with more than 9 semi-annual visits. Note: Datasets denoted in bold have reference tracings available.

Table 3. Surface positioning errors a (voxels) of six target retinal surfaces.
a Surface locations are not directly available from Deep Learning.b Deep LOGISMOS is significantly better than OCTExplorer (p<0.05).
a Not directly available for Deep Learning. b
Table 2 shows that Deep LOGISMOS had significantly higher Dice coefficients than the deep-learning-only approach in all five target layers. Compared to OCTExplorer, Deep LOGISMOS showed much lower