An Open-Source Deep Learning Algorithm for Efficient and Fully Automatic Analysis of the Choroid in Optical Coherence Tomography

Purpose: To develop an open-source, fully automatic deep learning algorithm, DeepGPET, for choroid region segmentation in optical coherence tomography (OCT) data.
Methods: We used a dataset of 715 OCT B-scans (82 subjects, 115 eyes) from three clinical studies related to systemic disease. Ground-truth segmentations were generated using a clinically validated, semiautomatic choroid segmentation method, Gaussian Process Edge Tracing (GPET). We finetuned a U-Net with the MobileNetV3 backbone pretrained on ImageNet. Standard segmentation agreement metrics, as well as derived measures of choroidal thickness and area, were used to evaluate DeepGPET, alongside qualitative evaluation from a clinical ophthalmologist.
Results: DeepGPET achieved excellent agreement with GPET on data from three clinical studies (AUC = 0.9994, Dice = 0.9664; Pearson correlation = 0.8908 for choroidal thickness and 0.9082 for choroidal area), while reducing the mean processing time per image on a standard laptop CPU from 34.49 ± 15.09 seconds using GPET to 1.25 ± 0.10 seconds using DeepGPET. Both methods performed similarly according to a clinical ophthalmologist who qualitatively judged a subset of segmentations by GPET and DeepGPET, based on smoothness and accuracy of segmentations.
Conclusions: DeepGPET, a fully automatic, open-source algorithm for choroidal segmentation, will enable researchers to efficiently extract choroidal measurements, even for large datasets. As no manual interventions are required, DeepGPET is less subjective than semiautomatic methods and could be deployed in clinical practice without requiring a trained operator.
Translational Relevance: DeepGPET addresses the lack of open-source, fully automatic, and clinically relevant choroid segmentation algorithms, and its subsequent public release will facilitate future choroidal research in both ophthalmology and wider systemic health.


Introduction
The retinal choroid is a complex, extensively interconnected vessel network positioned between the retina and the sclera. The choroid holds the majority of the vasculature in the eye and plays a pivotal role in nourishing the retina. Optical coherence tomography (OCT) is an ocular imaging modality that uses low-coherence light to construct a three-dimensional map of chorioretinal structures at the back of the eye. Standard OCT imaging does not visualise the deeper choroidal tissue well, as it sits beneath the hyperreflective retinal pigment epithelium layer of the retina. Enhanced Depth Imaging OCT (EDI-OCT) overcomes this problem and offers improved visualisation of the choroid, thus providing a unique window into the microvascular network which not only resides closest to the brain embryologically, but also carries the highest volumetric flow per unit tissue weight of any organ in the body.
Since the advent of OCT, interest in the role played by the choroid in systemic health has been growing 1, as non-invasive imaging of the choroidal microvasculature may provide a novel location to detect systemic microvascular changes early. Indeed, changes in choroidal blood flow, thickness and other markers have been shown to correspond with patient health, such as choroidal thickness in chronic kidney disease 2 and choroidal area and vascularity in Alzheimer's dementia 3.
Quantification of the choroid in EDI-OCT imaging requires segmentation of the choroidal space. However, this is a harder problem than retinal layer segmentation due to poor signal penetration from the device (and thus lower signal-to-noise ratio) and shadows cast by superficial retinal vessels and choroidal stroma tissue. This results in poor intra- and inter-rater agreement even with manual segmentation by experienced clinicians, and manual segmentation is too labour intensive and subjective to be practical for analysing large scale datasets.
Semi-automated algorithms improve on this slightly but are typically multi-stage procedures, requiring traditional image processing techniques to prepare the images for downstream segmentation 4. Methods based on graph theory such as Dijkstra's algorithm 5;6 or graph cut 7, as well as on statistical techniques including level sets 8;9, contour evolution 10, and Gaussian mixture models 11 have been proposed previously. Concurrently, deep learning (DL)-based approaches have emerged. One study 12 used a DL model for choroid layer segmentation, but with traditional contour tracing as a post-processing step. Other DL-based approaches also use traditional image processing techniques as pre- or post-processing steps 13;14;15, whereas others are fully DL-based 16;17, in a similar vein to the proposed method. More recently, DL has been used to distil existing semi-automatic traditional image processing pipelines into a fully-automatic method 18.
Gaussian Process Edge Tracing (GPET), based on Bayesian machine learning 19, is a particularly promising method for choroid layer segmentation that has been clinically and quantitatively validated 20. Gaussian process (GP) regression is used to model the upper and lower boundaries of the choroid from OCT scans. For each boundary, a recursive Bayesian scheme is employed to iteratively detect boundary pixels based on the image gradient and the GP regressor's distribution of candidate boundaries. However, GPET is semi-automatic and thus requires time-consuming manual interventions by specifically trained personnel, which introduces subjectivity and limits the potential for analysing larger datasets or deploying GPET into clinical practice.
There are currently no accessible, open-source algorithms for fully-automatic choroidal segmentation. All available algorithms fall into one of three categories. First, semi-automatic methods 21;22 that require human supervision and thus require training and introduce subjectivity. Second, fully-automatic DL-based methods that are not openly accessible, either only providing the code but not the trained model necessary to use the method 23 or not providing any access at the time of writing 24. Third, fully-automatic methods that comprise many steps, requiring a good understanding of image processing techniques and a license for proprietary software (MATLAB) 25.
We aim to develop and release an open-source, raw image-to-measurement, fully-automatic method for choroid region segmentation that can be easily used without special training and does not require licenses for proprietary software (Fig. 1). Importantly, we intend not only to make our method available to the research community, but to do so in a frictionless way that allows other researchers to download and use our method without seeking our approval. We distil GPET into a deep learning algorithm, DeepGPET, which can process images without supervision in a fraction of the time, permitting analysis of large scale datasets and potential deployment into clinical care and research practice without prior training in image processing. The code and model weights for DeepGPET are available here: https://github.com/jaburke166/deepgpet.

Study population
We used 715 OCT B-scans belonging to 82 subjects from three studies: OCTANE 26, a study looking at renal function and impairment in chronic kidney disease patients; i-Test, a study recruiting pregnant women of any gestation or those who have delivered a baby within 6 months, including controls and individuals at high risk of complications; and Normative, data from 30 healthy volunteers as a control group 27. All studies conformed with the Declaration of Helsinki and received relevant ethical approval and informed consent from all subjects. Table 1 provides an overview of basic population characteristics and number of subjects/images of these studies. Supplementary Fig. S1 presents box-plot distributions of choroidal thickness and area for the three datasets used to build DeepGPET, with Table S1 presenting tabular mean and standard deviation values.
Two Heidelberg spectral domain OCT SPECTRALIS devices were used for image acquisition: the Standard Module (OCT1 system) and FLEX Module (OCT2 system). The FLEX is a portable version that enables imaging of patients in a ward environment. Both machines imaged a 30° (8.7 mm) region, generating a macular, cross-sectional OCT B-scan at 768 × 768 pixel resolution. Notably, 14% of the OCT B-scans were non-EDI and thus present more challenging images with lower signal-to-noise ratio in the choroidal part of the OCT. Horizontal and vertical line scans were centred at the fovea with active eye tracking, using an Automatic Real Time (ART) value of 100. Posterior pole macular scans covered a 30-degree by 25-degree region, using EDI mode.
We split the data approximately 85:8:7 between training (603 B-scans, 66 subjects), validation (58 B-scans, 9 subjects) and test sets (54 B-scans, 7 subjects). We split the data at the patient level, i.e. each subject's OCT images are present in only one set, and the sets were selected so that each had proportionally equal amounts of scan types (EDI/non-EDI) to best represent image quality. See supplementary Table S2 for an overview of basic population and imaging characteristics for each set.
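The patient-level split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the (scan_id, subject_id) input format, the seed and the subject-level fractions are all assumptions, and the sketch does not reproduce the stratification by scan type (EDI/non-EDI).

```python
import random
from collections import defaultdict


def split_by_subject(scans, fractions=(0.85, 0.08, 0.07), seed=0):
    """Assign whole subjects to train/val/test so no subject spans two sets.

    `scans` is a list of (scan_id, subject_id) pairs. The fractions are
    applied to the number of subjects (an assumption for illustration).
    """
    by_subject = defaultdict(list)
    for scan_id, subject_id in scans:
        by_subject[subject_id].append(scan_id)

    # Shuffle subjects, not scans, so all scans of a subject stay together.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],
    }
    return {
        name: [scan for subject in members for scan in by_subject[subject]]
        for name, members in groups.items()
    }
```

Splitting on subjects rather than on individual B-scans avoids leakage between sets, because B-scans from the same eye are highly correlated.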

DeepGPET
As the ground truths are based on GPET, DeepGPET can be seen as a more efficient, fully automatic and distilled version of GPET. Our approach was to fine-tune a U-Net 28 with a MobileNetV3 29 backbone pre-trained on ImageNet for 60 epochs with batch size 16 using AdamW 30 (lr = 10⁻³, β₁ = 0.9, β₂ = 0.999, weight decay = 10⁻²). After epoch 30, we maintain an exponential moving average (EMA) of model weights, which we then use as our final model. We use the following data augmentations: brightness and contrast changes, horizontal flipping, and simulated OCT speckle noise by applying Gaussian noise followed by multiplicative noise (all p = 0.5); Gaussian blur and random affine transforms (both p = 0.25). To reduce memory load, we crop the black space above and below the OCT B-scan and process images at a resolution of 544 × 768 pixels. Images are standardised by subtracting 0.1 and dividing by 0.2, and no further pre-processing is done. We used Python 3.11, PyTorch 2.0, Segmentation Models PyTorch 31 and the timm library 32.
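Two of the steps above, the intensity standardisation and the EMA of model weights, can be sketched in plain Python. This is an illustration under stated assumptions, not the released implementation: pixel intensities are assumed to be scaled to [0, 1] beforehand, weights are represented as a dictionary of floats, and the decay value is hypothetical because the text does not state it.

```python
def standardise(pixels, shift=0.1, scale=0.2):
    """Standardise intensities as described in the text: (x - 0.1) / 0.2."""
    return [(p - shift) / scale for p in pixels]


class WeightEMA:
    """Exponential moving average of model weights with a delayed start.

    `update` is meant to be called at the end of each epoch (epochs counted
    from 1). `decay` is an assumed value, not taken from the paper.
    """

    def __init__(self, start_epoch=30, decay=0.99):
        self.start_epoch = start_epoch
        self.decay = decay
        self.average = None

    def update(self, epoch, weights):
        if epoch < self.start_epoch:
            return  # no averaging before the warm-up has finished
        if self.average is None:
            self.average = dict(weights)  # start from the current weights
            return
        for name, value in weights.items():
            self.average[name] = (
                self.decay * self.average[name] + (1 - self.decay) * value
            )
```

At the end of training, `average` would hold the smoothed weights that are used as the final model.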

Statistical analysis
We used the Dice coefficient and Area Under the ROC Curve (AUC) for evaluating agreement in segmentations, as well as the Pearson correlation r and Mean Absolute Error (MAE) for segmentation-derived choroid thickness and area. The calculation of thickness and area from the segmentation is described in more detail elsewhere 20. Briefly, for thickness the average of 3 measures is used, taken at the fovea and 2,000 microns from it in either direction by drawing a perpendicular line from the upper boundary to the lower boundary to account for choroidal curvature.
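For reference, the three simplest agreement metrics named above can be written in a few lines of plain Python. This is a generic sketch of the standard definitions, with hypothetical function names, and is not taken from the DeepGPET code.

```python
from math import sqrt


def dice(mask_a, mask_b):
    """Dice coefficient of two binary masks given as flat 0/1 sequences."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * intersection / total if total else 1.0


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(x, y):
    """Mean absolute difference between paired measurements."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)
```

Dice is computed on the segmentation masks, whereas Pearson r and MAE are computed on the derived per-image thickness and area values.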

Figure 1 (caption not recovered; panel labels: semi-automatic GPET, with manual selection of initial points and processing; fully-automatic DeepGPET, a fully-automatic DL algorithm).
For area, pixels are counted in a region of interest 3,000 microns around the fovea, which corresponds to the commonly used Early Treatment Diabetic Retinopathy Study (ETDRS) macular area of 6,000 × 6,000 microns 33.
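The pixel-counting step for area can be sketched as below. This is a simplified illustration, not the procedure of reference 20: it counts mask pixels in image columns within 3,000 microns of a given fovea column and ignores choroidal curvature. The function and parameter names are hypothetical. The lateral scale could be taken as roughly 8,700 / 768 microns per pixel from the acquisition settings above, while the axial scale is device-specific and must be supplied by the user.

```python
def choroid_area_mm2(mask, fovea_col, um_per_px_x, um_per_px_y,
                     half_width_um=3000.0):
    """Area (mm^2) of a binary mask within +/- half_width_um of the fovea.

    `mask` is a list of rows, each a list of 0/1 values. The region of
    interest is defined by image columns only (a simplifying assumption).
    """
    half_width_px = int(half_width_um / um_per_px_x)
    left = max(0, fovea_col - half_width_px)
    right = min(len(mask[0]), fovea_col + half_width_px + 1)
    n_pixels = sum(sum(row[left:right]) for row in mask)
    # Each pixel covers um_per_px_x * um_per_px_y square microns.
    return n_pixels * um_per_px_x * um_per_px_y / 1e6
```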
We compare DeepGPET's agreement with GPET's segmentations against the repeatability of GPET itself. The creator of GPET, J.B., made both the original and repeated segmentations with GPET. Since both segmentations were done by the same person, there is no inter-rater subjectivity at play here. Thus, the intra-rater agreement measured here is a best case scenario and forms an upper bound on the agreement with the original segmentations that can be expected of GPET or of any other semi-automatic method requiring manual input, since such methods are necessarily subject to human variability, unlike DeepGPET.
In addition to quantitative evaluations, we also compared segmentations by GPET and DeepGPET for 20 test set OCT images qualitatively by having them rated by I.M., an experienced clinical ophthalmologist. We selected 7 examples with the highest disagreement in thickness and area, 7 examples with disagreement closest to the median, and 6 examples with the lowest disagreement. Thus, these 20 examples cover cases where both methods are very different, cases of typical disagreement, and cases where both methods are very similar. In each instance, I.M. was shown the segmentations of both methods overlaid on the OCT (blinded to which method produced which segmentation) and was also provided with the raw, full-resolution OCT. I.M. was then asked to rate each one along three dimensions: quality of the upper boundary, quality of the lower boundary, and overall smoothness, using an ordinal scale: "Very bad", "Bad", "Okay", "Good", "Very good".

Quantitative
Table 2 shows the results for DeepGPET and a repeat GPET, compared to the initial GPET segmentation as "ground-truth".

Agreement in segmentation.
Both methods have excellent agreement with the original segmentations. DeepGPET's agreement is comparable to the repeatability of GPET itself, with DeepGPET's AUC being slightly higher (0.9994 vs 0.9812) and Dice coefficient slightly lower (0.9664 vs 0.9672). DeepGPET performing better in terms of AUC but worse in terms of Dice suggests that for pixels where it disagrees with GPET after thresholding, the confidence is lower than for ones where it agrees with GPET. This in turn suggests that DeepGPET is well-calibrated based on the raw predictions made for each pixel.
Processing speed and manual interventions. Both methods were compared on the same standard laptop CPU. DeepGPET processed the images in only 3.6% of the time that GPET needed. DeepGPET ran fully automatically and successfully segmented all images, whereas GPET required 1.27 manual interventions on average, including selecting initial pixels and manual adjustment of GPET parameters when the initial segmentation failed. This results in massive time savings: a standard OCT volume scan consists of 61 B-scans. With GPET, processing such a volume for a single eye takes about 35 minutes, during which a person has to select initial pixels to guide tracing (for all images) and adjust parameters if GPET initially failed (for about 25% of images). In contrast, DeepGPET could do the same processing in about 76 seconds on the same hardware, during which no manual input is needed. DeepGPET could even be GPU-accelerated to cut the processing time by another order of magnitude.
The lack of manual interventions required by DeepGPET means that, unlike with GPET, no subjectivity is introduced, particularly when used by different people. Additionally, DeepGPET does not require specifically trained analysts and could be used fully automatically in clinical practice.
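The difference between the two metrics can be illustrated with a toy example: AUC is computed from the raw per-pixel scores and depends only on their ranking, whereas Dice is computed after thresholding. In the sketch below, the function names, the toy data and the threshold of 0.5 are assumptions made for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dice_at_threshold(labels, scores, threshold=0.5):
    """Dice coefficient after binarising the scores at `threshold`."""
    pred = [int(s >= threshold) for s in scores]
    intersection = sum(l * p for l, p in zip(labels, pred))
    return 2 * intersection / (sum(labels) + sum(pred))


# One choroid pixel scores 0.45: it is missed after thresholding, but it
# still ranks above every background pixel.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.4, 0.2, 0.1]
toy_auc = auc(labels, scores)
toy_dice = dice_at_threshold(labels, scores)
```

Here the thresholded error lowers Dice while AUC stays at its maximum, which is the pattern expected when disagreements are concentrated on low-confidence pixels.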
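The volume-level timings quoted above follow directly from the per-image means reported in this paper. A short worked calculation is given below; the variable names are illustrative.

```python
GPET_SECONDS_PER_SCAN = 34.49      # mean per-image time reported for GPET
DEEPGPET_SECONDS_PER_SCAN = 1.25   # mean per-image time reported for DeepGPET
BSCANS_PER_VOLUME = 61             # standard OCT volume scan

# Total processing time for one volume with each method.
gpet_volume_minutes = BSCANS_PER_VOLUME * GPET_SECONDS_PER_SCAN / 60
deepgpet_volume_seconds = BSCANS_PER_VOLUME * DEEPGPET_SECONDS_PER_SCAN

# DeepGPET time as a percentage of GPET time, and the speed-up factor.
relative_time_percent = 100 * DEEPGPET_SECONDS_PER_SCAN / GPET_SECONDS_PER_SCAN
speed_up = GPET_SECONDS_PER_SCAN / DEEPGPET_SECONDS_PER_SCAN

print(f"GPET: {gpet_volume_minutes:.1f} min per volume")
print(f"DeepGPET: {deepgpet_volume_seconds:.1f} s per volume")
print(f"DeepGPET needs {relative_time_percent:.1f}% of the GPET time "
      f"({speed_up:.1f}x faster)")
```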
Agreement in choroid area and thickness. GPET showed very high repeatability for choroid thickness (CT; Pearson r = 0.9527, MAE = 10.4074 µm) and choroid area (CA; Pearson r = 0.9726, MAE = 0.0486 mm²). DeepGPET achieved slightly lower, yet also very high agreement for both thickness (Pearson r = 0.8908, MAE = 13.3086 µm) and area (Pearson r = 0.9082, MAE = 0.0699 mm²). Fig. 2 shows correlation plots for thickness and area. DeepGPET's agreement with GPET does not quite reach the repeatability of GPET itself when used by the same experienced analyst, but it is comparable and high in absolute terms. Especially noteworthy is that the MAE for repeated GPET is only 21% lower for thickness and 30% lower for area than for DeepGPET. Thus, DeepGPET comes quite close to optimal performance, i.e. best case repeatability where the same experienced analyst did both sets of annotation. Furthermore, the regression fits in both derived measures for DeepGPET are closer to the identity line than for the repeated GPET measurements. For CT, the linear fit estimated a slope of 1.043 (95% confidence interval of 0.895 to 1.192) and an intercept of -7.308 µm (95% confidence interval of -48.967 µm to 34.350 µm). For CA, the linear fit estimated a slope of 1.01 (95% confidence interval of 0.878 to 1.137) and an intercept of 0.016 mm² (95% confidence interval of -0.195 mm² to 0.226 mm²). All confidence intervals contain 1 for the slope and 0 for the intercept, suggesting no systematic bias or proportional difference between GPET and DeepGPET 34;35. Fig. 3 shows the residuals between DeepGPET and the ground truth labels from the held-out test set using Bland-Altman plots 36. Rahman 37 found that intra-rater agreement and inter-rater agreement of subfoveal choroidal thickness measurements were 23 µm and 32 µm, respectively. For CT, only 9.3% (5 / 54) of residuals were greater than 23 µm in absolute value, with 4 of these representing major sources of disagreement. Similarly for CA, the majority of residuals were centred around 0 (mean residual of -0.02 mm²), with only 5.5% (3 / 54) of residuals lying outside the limits of agreement.

Table 3 Qualitative ratings of 20 test set segmentations along 3 key dimensions. The rater was blinded to the identity of the methods and their order was randomised for every example.
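The residual analysis above rests on two simple computations: the Bland-Altman bias with its limits of agreement, and the share of residuals exceeding a fixed threshold such as 23 µm. The sketch below uses the conventional limits of mean ± 1.96 standard deviations; the function names are hypothetical, and the paper does not state exactly how its limits were computed.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement use the conventional bias +/- 1.96 * SD of the
    differences (sample standard deviation).
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def fraction_exceeding(diffs, threshold):
    """Fraction of residuals whose absolute value is above `threshold`."""
    return sum(abs(d) > threshold for d in diffs) / len(diffs)
```

For example, `fraction_exceeding(residuals_um, 23)` would give the share of thickness residuals above the intra-rater agreement reported in reference 37, where `residuals_um` is a hypothetical list of per-image differences in microns.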

Qualitative
Table 3 shows the results of the adjudication between GPET and DeepGPET. The upper boundary was rated as "Very good" for both methods in all 20 cases. However, DeepGPET was rated as "Bad" in 2 cases for the lower boundary and in 1 case for smoothness. Otherwise, both methods performed very similarly. Fig. 4 shows some examples. In (a), DeepGPET segments more of the temporal region than GPET does, providing a full width segmentation which was preferred by the rater. Additionally, both approaches are able to segment a smooth boundary, even in regions with stroma fluid obscuring the lower boundary (red arrow). In (b), the lower boundary for this choroid is very faint and is actually below the majority of the vessels sitting most posterior (red arrow). DeepGPET produced a smooth and concave boundary preferred by the rater, while GPET fell victim to hugging the posterior-most vessels in the subfoveal region. In (c), DeepGPET rejected the true boundary in the low contrast region (red arrow) and opted for a more well-defined one, while GPET segmented the more uncertain path. Since GPET permits human intervention, there is more opportunity to fine-tune its parameters to fit what the analyst believes is the true boundary. Here, the rater preferred GPET, while DeepGPET's under-confidence led to under-segmentation and to a bad rating. In (d), the lower boundary is difficult to delineate due to a thick suprachoroidal space (red arrow) and thus a lack of lower boundary definition. Here, the rater gave a bad rating to DeepGPET and preferred GPET, while remarking that GPET actually under-segmented the choroid by intersecting through posterior vessels. The choroids in Fig. 4(b-d) are the choroids with the largest CT and CA disagreement between DeepGPET and GPET as observed in Fig. 3.

Discussion
We developed DeepGPET, a fully-automatic and efficient method for choroid layer segmentation, by distilling GPET, a clinically validated semi-automatic method. DeepGPET achieved excellent agreement with GPET on held-out data in terms of segmentation and derived choroidal measurements, approaching the repeatability of GPET itself and remaining well within the inter-rater agreement threshold reported in previous work 37. We also found no significant association between segmentation performance (via Dice score) and choroidal thickness, area and the Heidelberg signal-to-noise quality index in the held-out test set (supplementary Table S3 and Fig. S2). Most importantly, DeepGPET does not require specialist training and can process images fully automatically in a fraction of the time, enabling analysis of large scale datasets and potential deployment in clinical practice.
While the observed agreement was very high, it was not perfect. However, even higher agreement with GPET would not necessarily produce a better method, as GPET itself is not perfect and even conceptually there is debate around the exact location of the choroid-scleral interface (CSI), i.e. the lower choroid boundary in an OCT B-scan. The CSI is commonly defined, e.g. by the original authors behind EDI-OCT 38, as the smooth inner boundary between the choroid and sclera, or just below the most posterior vessels but excluding the suprachoroidal space. However, even that definition is still debated and can be hard to discern in practice. Not all choroids are smooth, and there are edge cases like vessels passing from the sclera into the choroid, or stroma fluid obscurations that make the boundary even more ambiguous. These features, coupled with low signal-to-noise ratio and vessel shadowing from superficial retinal vessels, all contribute to the difficult challenge of choroid layer segmentation.
For quantitative analysis of choroidal phenotypes, the specific definition of the CSI is secondary to applying the same, consistent definition across and within patients. Here, fully-automatic methods like DeepGPET provide a large benefit by removing the subjectivity present in semi-automatic methods. Where semi-automatic methods require manual input, two analysts with different understandings of the CSI could produce vastly different segmentations. With DeepGPET, the same image is always segmented in the same way, removing subjectivity. Initial experiments with other types of OCT imaging have positively indicated DeepGPET's ability to generalise to different visualisations of the choroid. Fig. 5 shows a peripapillary scan extracted from the Heidelberg Standard Module, centred on the optic nerve head, with the choroid automatically segmented. Fig. 6 shows choroid segmentations using DeepGPET for three OCT B-scans from a TopCon device (DRI OCT Triton plus): two cases where DeepGPET works well and one case where it does not. This shows some promise for its usability on scans that differ from the Heidelberg macular line scans on which it was trained. We hope in future iterations to extend the training data with scans from different imaging devices and scan locations. We recommend that those using DeepGPET on non-Heidelberg images review the segmentations afterwards as a sanity check.
In the present work, we used data from three studies, two OCT devices and included both EDI and non-EDI scans. However, we only used data from subjects that were either healthy or had systemic but not eye disease, so DeepGPET might not be robust to ocular pathology. In future work, we plan to externally validate DeepGPET and include cases of ocular pathologies. A further limitation is that while GPET has been clinically validated, not all segmentations used for training DeepGPET were entirely perfect. Thus, revisiting some of the existing segmentations and manually improving them to a "gold standard" for purposes of training the model could improve DeepGPET. For instance, GPET does not always segment the whole width of the choroid. Interestingly, DeepGPET is already able to do that in some cases (e.g. Fig. 4(a) and Fig. 5), but also emulates the incomplete segmentations by GPET in other cases. A model trained on enhanced "gold standard" segmentations could produce even better segmentations.
Finally, we have focused on segmentation as it is the most important and most time-consuming step of choroidal analysis. However, the location of the fovea on OCT images needs to be identified to define the region of interest for derived measurements such as thickness, area and volume. Identifying the fovea is less time-consuming and less ambiguous than choroid segmentation, and so we plan to extend DeepGPET to output the fovea location. This would make DeepGPET a fast and efficient end-to-end framework capable of converting a raw OCT image to a set of clinically meaningful segmentation-derived measurements. Likewise, segmenting the choroidal vessels is a very challenging task even for humans and would be prohibitively time-consuming to do manually, but in the future we aim to explore whether DeepGPET can automatically segment the vasculature within the choroid as well.

Conclusion
Choroid segmentation is a key step in calculating choroidal measurements like thickness and area. Currently, this is commonly done manually, which is labour intensive and introduces subjectivity. Semi-automatic methods only partially alleviate both of these problems, and previous fully-automatic methods were not easily accessible for researchers. DeepGPET addresses this gap as a fully-automatic, end-to-end algorithm that does not require manual interventions. DeepGPET provides similar performance to the previously clinically validated, semi-automatic GPET, while being fully-automatic and an order of magnitude faster. This enables the analysis of large scale datasets and potential deployment in clinical practice without necessitating a trained operator. Although the definition of the lower choroid boundary is still subject to debate, especially when it comes to suprachoroidal spaces, the most important consideration is to have a method that consistently applies the same definition across subjects and studies, which DeepGPET as a fully-automatic method provides. As an easily accessible, open-source algorithm for choroid segmentation, DeepGPET will enable researchers to easily calculate choroidal measurements much faster and with less subjectivity than before.

Figure 2 Correlation plots comparing derived measures of mean choroid thickness (a) and choroid area (b) using DeepGPET and the re-segmentations using GPET.

Figure 3 Bland-Altman plots comparing the agreement between DeepGPET and GPET using mean choroid thickness (a) and choroid area (b).

Figure 4 Four examples from the adjudication. The rater preferred DeepGPET for (a-b) and GPET for (c-d). Top row: green, segmented by both GPET and DeepGPET; red, GPET only; and blue, DeepGPET only. Bottom row: arrows indicate important choroidal features which can make segmentation challenging. (a): no large vessels in nasal region to guide segmentation; (b): lower boundary very faint and below the posterior-most vessels; (c): lower boundary noisy and faint; (d): large suprachoroidal space visible.

Figure 5 An example peripapillary scan from Heidelberg's Standard Module, automatically segmented by DeepGPET without manual intervention.

Figure 6 Three OCT B-scan images from a TopCon imaging device, of which two were successful (a-b) and one was not (c).

Figure S2 Test set Dice scores plotted against choroid thickness (left), area (middle) and Heidelberg-measured quality index (right) in the held-out test set. The outlier Dice score of approximately 0.84 is the Dice score between DeepGPET and GPET from Fig. 4(d).

Table 2 Metrics for DeepGPET and repeated GPET using the initial GPET annotation as "ground-truth". Time given as mean ± standard deviation.