Cost-efficient and glaucoma-specifical model by exploiting normal OCT images with knowledge transfer learning

Monitoring the progression of glaucoma is crucial for preventing further vision loss. However, deep learning-based models emphasize early glaucoma detection, resulting in a significant performance gap on glaucoma-confirmed subjects. Moreover, developing a fully supervised model suffers from insufficient annotated glaucoma datasets. Currently, abundant low-cost normal OCT images with pixel-level annotations can serve as valuable resources, but effectively transferring shared knowledge from normal datasets is a challenge. To alleviate this issue, we propose a knowledge transfer learning model that exploits shared knowledge from abundant, low-cost annotated normal OCT images by explicitly establishing the relationship between the normal domain and the glaucoma domain. Specifically, we directly introduce glaucoma domain information into the training stage through a three-step adversarial strategy. Additionally, our proposed model exploits shared features at different levels, in both the output space and the encoding space, with a suitable output size through a multi-level strategy. We have collected and collated a dataset called the TongRen OCT glaucoma dataset, including pixel-level annotated glaucoma OCT images and diagnostic information. The results on the dataset demonstrate that our proposed model outperforms the unsupervised model and the mixed training strategy, achieving increases of 5.28% and 5.77% in mIoU, respectively. Moreover, our proposed model narrows the performance gap to the fully supervised model to only 1.01% in mIoU. Therefore, our proposed model can serve as a valuable tool for extracting glaucoma-related features, facilitating the tracking of glaucoma progression.


Introduction
Glaucoma, one of the leading causes of blindness, is characterized by damage to visual cells within the retinal layers. It significantly reduces patients' quality of life and causes a high proportion of labor capacity loss, imposing a substantial burden on both individuals and society [1]. While timely detection and therapy can effectively slow its development, the resulting vision loss is irreversible [2]. Therefore, monitoring the progression of glaucoma plays a crucial role in assessing the effectiveness of therapy and preventing further vision loss, especially for cases in mild or severe stages [3].
In the clinic, assessing the condition of the fundus is essential for early glaucoma detection and for tracking glaucoma progression. Although color fundus imaging is widely used to monitor the fundus because of its reasonable price, its use is limited because it primarily reveals superficial information [4]. Currently, optical coherence tomography (OCT) [5], one of the most common imaging approaches, has been proven to provide deep insight into the retinal layers in a non-invasive manner. Additionally, OCT images provide quantitative and qualitative information about the retinal layers, including the thickness of the retinal nerve fiber layer (RNFL) and detailed visualization of deep retinal structures; these features are of significant value in tracking the progression of glaucoma [6,7]. However, manually delineating retinal layers is time-consuming and inefficient. Furthermore, the accuracy of annotations varies with diagnostic experience, given the presence of inter-glaucoma and intra-glaucoma variations [8]. Therefore, a robust and accurate segmentation tool is urgently needed to assist ophthalmologists in assessing OCT images of glaucoma.
Deep learning-based models have achieved considerable success in image analysis tasks owing to sufficient well-annotated datasets [9]. However, developing supervised models for monitoring glaucoma suffers from insufficient annotated glaucoma OCT images, because of the high cost of collecting glaucoma samples and the low efficiency of pixel-level labelling [10]. Moreover, as illustrated in Fig. 1, most deep learning-based models target early glaucoma detection using mixed datasets that combine a large number of normal images with a limited number of abnormal images. Although these mixed models demonstrate satisfactory performance for early glaucoma detection, they fail on glaucoma-confirmed samples, which makes them unsuitable for monitoring the progression of glaucoma. The limited glaucoma samples make it hard for models to capture shared features and glaucoma-related features at the same time. As a result, the models tend to be biased towards the normal domain, leaving a large gap to the glaucoma domain [11]. Besides insufficient labelled glaucoma datasets, intra-domain variations among glaucoma samples of different stages and individuals, and inter-domain variations between the glaucoma and normal domains, present further challenges [12].
Fig. 1. Illustration of our motivation. Early detection models are not suitable for assisting in monitoring the progression of glaucoma due to a significant performance gap on glaucoma-confirmed samples. We exploit shared knowledge in the encoding space and the output space from low-cost annotated normal OCT images to alleviate this issue.
Currently, abundant normal OCT images with pixel-level annotations are available at a low cost [13,14]. Therefore, it is beneficial to exploit shared knowledge from normal datasets for glaucoma tasks, based on the similarities between the two domains. A common and straightforward approach is to train a model directly on normal samples. While this strategy may yield acceptable performance in limited situations, such as early-stage glaucoma, it fails on the majority of glaucoma samples, particularly in later stages. The main reason is the significant feature discrepancy between the two domains, which causes models to inevitably learn unshared features and become biased towards the normal domain.
To mitigate the above issues, we argue that it is essential to appropriately introduce glaucoma information to establish a relationship between the two domains during the training stage. The discrepancies in segmentation results and encoding features between the two domains can be considered domain gaps. Therefore, the performance deterioration can be treated as an implicit domain gap problem. We are inspired by domain adaptation techniques, which aim to narrow the gaps between source tasks and target tasks. We acknowledge that the performance of domain adaptation relies heavily on the similarity between the source dataset and the target dataset. Thus, most domain adaptation models are applied to the same category under different conditions. Although our source dataset (normal domain) and target dataset (glaucoma domain) belong to different categories, we argue that they share considerable overlap, including a similar retinal layer layout and similar context patterns. Furthermore, glaucoma samples do not introduce new kinds of lesions.
Therefore, we propose a novel knowledge transfer learning model for retinal layer segmentation of glaucoma OCT images, which exploits abundant annotated normal datasets by introducing domain adaptation techniques. The proposed model only requires low-cost pixel-level annotated normal OCT images and image-level annotated glaucoma OCT images. Compared with the direct training strategy, which simultaneously learns shared and unshared features, the proposed model not only exploits abundant normal samples to capture shared knowledge but also uses the learned relationship between the two domains to capture glaucoma-related features and remove unrelated features. Our proposed model comprises two parts: a segmentation part and a domain discriminator part. The segmentation part serves as a generator that produces feature maps in the encoding space and segmentation results in the output space, and it is shared by the normal and glaucoma domains. The domain discriminator part serves as an evaluator that explicitly assesses the domain gaps in encoding features and segmentation results between the normal and glaucoma domains. The abundant normal OCT images (with pixel-level annotations) are used to optimize the shared segmentation part to capture general features, while glaucoma OCT images (without pixel-level annotations) are used to evaluate the distance between the encoding features and segmentation results of the two domains to capture glaucoma-specific features. These distances serve as dissimilarity information for adjusting the trained shared segmentation part: the domain gap gradients are back-propagated from the evaluator part to the segmentation generator part, which helps transfer the distribution of segmentation results from the normal domain to the glaucoma domain. Furthermore, because narrowing domain gaps at multiple levels is beneficial for dense pixel prediction tasks, we combine the domain gaps from an encoding space and the output space to help the proposed model capture related features in different spaces. In addition, we adopt a suitable output-layer size for the domain discriminator modules so that domain gaps are captured at a suitable scale. Therefore, the proposed model can force the segmentation results to migrate to the glaucoma domain in both the output space and the encoding space at a suitable size.
To address the lack of high-quality glaucoma OCT image datasets, we have collected a dataset comprising normal and glaucoma samples with corresponding pixel-level annotations of retinal layers from an annual physical examination database covering 2008 to 2018, which we call the TongRen OCT glaucoma dataset (TRGD).

Contribution
The contributions of this work are summarized as follows:
1. We propose a knowledge transfer model to assist in monitoring glaucoma progression, which provides accurate retinal layer segmentation of glaucoma OCT images and reduces the labelling cost by exploiting abundant pixel-level annotated normal OCT images.
2. We introduce domain adaptation techniques to explicitly establish the relationship between the glaucoma domain and the normal domain through a multi-step adversarial training strategy. Our proposed model can gradually capture general features and glaucoma-specific features based on similarities and dissimilarities, using an appropriate output size in the encoding feature space and the output space, respectively.
3. We have collected a high-quality glaucoma OCT dataset with pixel-level annotations, called TRGD, which serves as a valuable resource for glaucoma research. We have conducted extensive experiments on the TRGD dataset. The results show that our proposed model outperforms the direct unsupervised algorithm and comes close to the fully supervised method.

Related work
In this section, we briefly summarize retinal layer segmentation approaches for OCT images and knowledge transfer learning models in medical imaging.

Retinal layer segmentation approaches in OCT images
Approaches to retinal layer segmentation in OCT images mainly comprise traditional machine learning-based approaches and deep learning-based approaches.
Most traditional approaches for OCT retinal layer segmentation are based on features hand-crafted by experts. Early traditional approaches extracted grey-level gradients from images and searched for peaks of gradient intensity to represent the boundaries of retinal layers [15,16]. Subsequently, deformable models such as active contours and level sets were used to overcome the disadvantages of gradient-based approaches by constructing an adaptive model based on shape priors [17-19]. Moreover, a graph-based method called graph-cut was proposed, which treats the pixels of an OCT image as the vertices of a graph and their relationships as its edges. However, establishing pixel relationships requires expert knowledge and diagnostic experience. Moreover, graph-based models generalize poorly to unseen datasets [20,21].
Recently, deep learning-based approaches have been developed for segmenting the retinal layers in OCT images. The work in [22] achieved promising performance by introducing the U-Net [23], benefiting from its fully convolutional layers and skip connections. Subsequently, various U-Net-based models, including RelayNet [24], CE-Net [25] and U-Net++ [26], were developed by modifying the model framework to better suit OCT tasks. Moreover, generative adversarial network (GAN)-based models have been used as regularization approaches, achieving significantly improved performance on the task [27,28]. Furthermore, several models have successfully combined the advantages of deep learning-based models with traditional machine learning approaches, with promising performance [28-31].
However, the majority of studies focus on developing universal segmentation models that segment all kinds of OCT images, including normal and various abnormal samples, for the purpose of early detection. Those universal models are biased towards the normal domain, resulting in poor performance on glaucoma samples, and are therefore not suitable for assisting in monitoring the progression of glaucoma.

Knowledge transfer learning
Knowledge transfer learning aims to transfer useful knowledge from related datasets or tasks to new tasks [32]. Commonly, the dataset of the source task should be highly similar to the dataset of the target task. For natural image tasks, public datasets such as ImageNet [33] and Open Images [34] can serve as related datasets for downstream tasks. Moreover, synthetic datasets have become popular because they can be annotated at low cost by automatic or semi-automatic methods. For instance, the synthetic dataset GTA5 can serve as a city-related semantic segmentation dataset [35]. For medical image tasks, knowledge transfer learning is appealing as an annotation-efficient approach because of the high cost of annotating and collecting datasets.
Pre-training is one popular approach to knowledge transfer: a model is initially trained on related annotated datasets, and the pre-trained model is then fine-tuned for the target task. In medical tasks, various disease datasets can be used to train a pre-trained model for related disease tasks [36]. Wang et al. proposed a transfer model that captures shared knowledge from lung disease datasets to segment COVID-19 CT images with limited annotated data [37]. However, the performance of pre-trained models may be poor in some situations. One possible reason is that the related datasets do not take part in the training stage of the downstream task, so the model cannot capture crucial related features. To address this limitation, several approaches establish connections between related datasets by introducing the related dataset into the training stage of the downstream task. Zhang et al. proposed a source-free transfer model that generates target-style and consistent features for diabetic retinopathy detection by collaboratively transferring features from unannotated datasets [38]. Moreover, domain adaptation techniques have been introduced into transfer learning models to better align features between related datasets and alleviate the problem of limited annotated data. Lei et al. proposed an unsupervised domain adaptation model that maps related source datasets to the target domain to improve the segmentation of the optic disc and optic cup in color fundus images [39]. Cho et al. proposed a stain-style transfer model that learns similar and dissimilar features between different stain styles of histopathological images via adversarial learning, thus encouraging the model to capture shared discriminative knowledge from source datasets [40]. Alvaro et al. proposed a transfer learning model that learns similarities among datasets from different devices via a contrastive learning model [41].
The performance of a knowledge transfer learning model relies on the similarity between the source dataset and the target dataset; significant feature disparities make it difficult for models to capture related features effectively. Therefore, most transfer learning approaches exploit shared features from datasets of the same category. In our case, however, the source dataset is normal while the target dataset is glaucoma. We argue that there is a large overlap between the glaucoma domain and the normal domain, because glaucoma primarily causes deformation and thickness changes of the retinal layers rather than introducing lesions. These similarities underpin the performance of our proposed knowledge transfer learning model.

Methods
In this section, we first give an overview of the proposed model and then provide the details of the knowledge transfer module and the multi-level knowledge transfer module. Finally, we describe the overall training procedure in depth.

Overview of the proposed model
An overview of the proposed model is shown in Fig. 2. The proposed model comprises one segmentation module and domain discriminator modules at two levels. The segmentation module is based on the U-Net and is shared by normal and glaucoma OCT images to generate encoding features and segmentation results. Two domain discriminators, based on a fully convolutional architecture, serve as domain evaluators that establish the relationship between the two domains in spaces of different levels. The normal OCT images with pixel-level annotations (upper branch) are used by the segmentation module to capture general shared features, and the glaucoma OCT images without pixel-level annotations (lower branch) are used by the domain discriminators to capture glaucoma-related features. Additionally, we introduce the two levels of domain discriminator modules in the output space and an encoding feature space, since these two spaces contain relevant features of different levels.

Knowledge transfer module
The proposed knowledge transfer module comprises two parts: a segmentation generator part and a domain discriminator part. The former, denoted as $S$, aims to generate encoding features and segmentation results. The framework of $S$ is based on the U-Net model and captures general OCT features in a supervised way from abundant pixel-level annotated normal OCT images. It can also be conveniently replaced by other segmentation models depending on the specific task. The Dice loss is adopted in the segmentation generator to mitigate the imbalance problem caused by small targets. Here, $I_n$ and $Y_n$ represent OCT images and the corresponding pixel-level annotations from the normal domain, and $I_g$ represents OCT images from the glaucoma domain. We first forward $I_n$ to the segmentation generator $S$ and optimize it with the pixel-level annotations $Y_n$ from the normal domain. We then forward $I_g$ and $I_n$ to the initially optimized $S$ to obtain the predicted pixel-level segmentation maps $S(I_n)$ and $S(I_g)$, which serve as inputs to the latter part.
The latter part, denoted as $D$, serves as a domain discriminator responsible for estimating the domain gap between the segmentation maps or encoding features of the two domains produced by $S$. The main framework of $D$ is a fully convolutional neural network that determines the domain of its inputs through a domain discriminator loss, denoted as $L_D$, which is based on cross-entropy. $S_g$ and $S_n$ are the glaucoma and normal training sets, respectively. $Z$ is set to 0 if the segmentation soft-max map belongs to the normal domain and to 1 if it comes from the glaucoma domain. We forward the predictions $S(I_n)$ and $S(I_g)$ to $D$ to obtain the domain cross-entropy loss $L_D$ and optimize $D$ with the ground-truth image-level labels.
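The two losses described in this subsection can be sketched as follows. These are assumed forms, a standard soft Dice loss and a binary cross-entropy domain loss, and may differ from the exact expressions used in this work. The pixel index $p$, the map position $(h, w)$, and the symbol $P$ (a segmentation soft-max map or encoding feature map produced by $S$) are notation introduced here for illustration only.

```latex
% Assumed soft Dice loss on normal-domain images
L_{seg}(I_n) = 1 - \frac{2 \sum_{p} S(I_n)_p \, Y_{n,p}}{\sum_{p} S(I_n)_p + \sum_{p} Y_{n,p}}

% Assumed cross-entropy domain loss; Z = 0 for the normal domain, Z = 1 for the glaucoma domain
L_D(P) = - \sum_{h,w} \Big[ (1 - Z) \log\big(1 - D(P)^{(h,w)}\big) + Z \log D(P)^{(h,w)} \Big]
```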
We then update the initially optimized segmentation module $S$ with the domain gap loss. To evaluate the domain gap loss $L_{gap}$ between the two domains, we treat glaucoma labels as the target image-level labels of the segmentation results of normal images. In detail, we forward the OCT segmentation results and encoding features from the segmentation module to the domain discriminator $D$ to calculate the domain gap loss. The gradient of $L_{gap}$ is then back-propagated from $D$ to the segmentation generator $S$, moving the segmentation results closer to the glaucoma domain and encouraging $S$ to generate glaucoma-related features.
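The label flip behind the domain gap loss can be illustrated with a small numeric sketch. It assumes that the discriminator outputs, for each region, the probability that the input comes from the glaucoma domain, and that losses are averaged over the map; the function name `domain_bce` and the probability values are hypothetical and serve only as an illustration.

```python
import math

NORMAL, GLAUCOMA = 0, 1  # domain labels Z as described in the text


def domain_bce(prob_map, z, eps=1e-7):
    """Mean binary cross-entropy between a discriminator map and a domain label z.

    prob_map holds assumed per-region probabilities of belonging to the glaucoma domain.
    """
    flat = [min(max(p, eps), 1.0 - eps) for row in prob_map for p in row]
    total = sum(z * math.log(p) + (1 - z) * math.log(1.0 - p) for p in flat)
    return -total / len(flat)


# Hypothetical discriminator outputs for one normal and one glaucoma image.
d_normal = [[0.1, 0.2], [0.3, 0.2]]
d_glaucoma = [[0.8, 0.9], [0.7, 0.9]]

# Discriminator loss: every map is scored against its true domain label.
loss_d = domain_bce(d_normal, NORMAL) + domain_bce(d_glaucoma, GLAUCOMA)

# Domain gap loss: normal-image outputs are scored against the glaucoma label,
# so the loss is large while the discriminator still recognizes them as normal.
loss_gap = domain_bce(d_normal, GLAUCOMA)

print(loss_d, loss_gap)
```

In this sketch the gap loss shrinks only when the discriminator starts to assign glaucoma-like probabilities to the outputs for normal images, which is the direction in which the segmentation generator is pushed.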

Multi-level knowledge transfer module
Aligning only global features in the output space is not sufficient for a pixel-level classification task, so we introduce modules at two levels in our proposed model: an encoding feature space and the output space. The output space of the segmentation module contains scene layout and context information, while the encoding space also holds shared features at a high-dimensional level.
In detail, two separate domain discriminators with a similar architecture are connected to two distinct spaces: an encoding space and the output space. We fuse the losses of the two levels using trade-off weight coefficients. To balance the losses of the different levels, we employ a collaborative learning method based on performance. In the fused domain gap loss, $L_{gap}^{e}$ and $L_{gap}^{o}$ are the domain gap losses of the encoding space and the output space, respectively, and $w_e$ and $w_o$ are the corresponding weight coefficients. The detailed module architectures are given in the section on the details of the proposed model.
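Read as a weighted sum, the fusion of the two domain gap losses can be sketched as below. The additive form is an assumption based on the description of trade-off weight coefficients and is not necessarily the exact expression used in this work.

```latex
% Assumed weighted fusion of the encoding-space and output-space domain gap losses
L_{gap\_all} = w_e \, L_{gap}^{e} + w_o \, L_{gap}^{o}
```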

Objective function
To achieve optimal performance, we formulate the glaucoma OCT image segmentation task as an objective function built from two kinds of losses: one loss from the segmentation generator module and two losses from the domain discriminator modules in the encoding feature space and the output space. We combine them directly. In the overall objective function, $L_{seg}$ is the Dice segmentation loss computed under supervision with the ground-truth pixel-level annotations of normal OCT images, and $L_{gap\_all}$ is the domain gap loss of the predictions for normal images with respect to the glaucoma domain. $I_N$ and $I_G$ are OCT images from the normal domain and the glaucoma domain, respectively.
We apply a min-max optimization strategy to help the model converge. To push the segmentation results closer to the glaucoma domain, our proposed model minimizes the segmentation error on pixel-level annotated normal images through the segmentation loss of the segmentation module, and maximizes the probability that the segmentation results are considered to belong to the glaucoma domain through the domain gap losses of the domain discriminators.
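A plausible written form of the overall objective and of the min-max strategy is sketched below. It assumes that the losses are summed without an additional weight, following the statement that they are combined directly, and that the optimization follows the usual adversarial pattern; the exact expressions used in this work may differ.

```latex
% Assumed overall objective: segmentation loss plus fused domain gap loss
L(I_N, I_G) = L_{seg}(I_N) + L_{gap\_all}(I_N, I_G)

% Assumed min-max optimization over the segmentation module S and the discriminators D
\max_{D} \min_{S} \; L(I_N, I_G)
```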

Details of the proposed model
Our proposed framework comprises a segmentation generator module and two domain discriminator modules. We use the U-Net, a prevalent medical segmentation network, as the backbone of the segmentation module, which is shared by normal and glaucoma OCT images. The segmentation module is composed of two pathways: a contraction path for encoding features and an expansion path for recovering spatial information. We apply four down-sampling blocks in the contraction path, each of which has two convolutional blocks and a max-pooling layer. Likewise, we apply four up-sampling blocks in the expansion path, each of which has two convolutional blocks and an un-pooling layer. To facilitate the recovery of spatial information, the features from the four encoding layers are concatenated with the corresponding decoding features through four skip connections, passing shallow features to deep layers. In detail, a convolutional block in each path consists of a convolutional layer with a 3 × 3 kernel and a padding of 1, followed by a batch normalization layer and a non-linear layer with leaky ReLU. Additionally, the UP1 encoding features and the output features serve as inputs to the two subsequent discriminators.
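The spatial sizes implied by this description can be traced with a short sketch. It assumes stride-1 convolutions, 2 × 2 max pooling that halves the size, and un-pooling that doubles it; the function names are hypothetical, and the bottleneck is treated as a plain pass-through.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def unet_feature_sizes(input_size=512, depth=4):
    """Trace feature-map side lengths through the contraction and expansion paths."""
    size = input_size
    down = []
    for _ in range(depth):
        size = conv_out(conv_out(size))  # two 3x3 convolutional blocks keep the size
        down.append(size)                # feature passed along the skip connection
        size //= 2                       # assumed 2x2 max pooling halves the size
    bottleneck = size
    up = []
    for _ in range(depth):
        size *= 2                        # assumed un-pooling doubles the size
        size = conv_out(conv_out(size))  # two 3x3 convolutional blocks keep the size
        up.append(size)
    return down, bottleneck, up


down, bottleneck, up = unet_feature_sizes(512)
print(down, bottleneck, up)
```

Under these assumptions the first up-sampling block yields 64 × 64 maps, which agrees with the 64 × 64 × 256 encoding features reported in the experimental configuration, and each skip connection joins feature maps of equal size.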
We use the same framework for the domain discriminator path in both the encoding feature space and the output space. The framework is based on a fully convolutional neural network containing four consecutive convolutional layers with the same kernel size but different numbers of filters. A leaky ReLU non-linear layer follows every convolutional layer except the last one. Moreover, the final output layer of the domain discriminator is a convolutional layer with only one 3 × 3 kernel filter, producing a one-channel soft-max map as the domain discriminator result. Each value in the map represents the domain distance between the normal and glaucoma domains in the corresponding region of the input features from the encoding space or the output space of the segmentation module. In our proposed model, we only use two levels of features to balance cost and performance.

Overall training procedure
The detailed training procedure is as follows. We first forward the normal OCT images $I_n$ to the segmentation module and optimize the segmentation generator with the pixel-level annotations $Y_n$ in a supervised manner. Then the segmentation soft-max results and encoding feature maps of the glaucoma and normal domains from the initially optimized segmentation module are passed to the domain discriminator module of the corresponding level to evaluate the domain gap loss with fake image-level labels. We then update the segmentation generator by back-propagating the domain gap gradients, which encourages it to generate segmentation results close to the glaucoma domain. Next, to improve the capability of the discriminator modules, we fix the parameters of the segmentation module and optimize the discriminator modules on inputs from the segmentation module by decreasing the discriminator loss. The iteration continues until the domain gap loss is relatively low. Moreover, we apply a step-by-step training strategy in which a randomly selected part of the normal domain is used each time to obtain a small domain gap, and we optimize the proposed model in this way until it converges. The first part of the converged model, the segmentation part, then serves as the segmentation module that produces the final segmentation results for glaucoma OCT images.
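The order of the three updates and the stopping rule can be summarized in a schematic loop. This is only a control-flow sketch: the three step functions are hypothetical stand-ins for real optimizer updates, and the threshold and the decaying toy losses are invented for illustration.

```python
def train(seg_step, gap_step, disc_step, gap_threshold, max_iters):
    """Run the three-step schedule until the domain gap loss is low enough."""
    history = []
    for _ in range(max_iters):
        seg_loss = seg_step()    # step 1: supervised update of S on normal images
        gap_loss = gap_step()    # step 2: update S with the domain gap loss, D fixed
        disc_loss = disc_step()  # step 3: update D with S fixed
        history.append((seg_loss, gap_loss, disc_loss))
        if gap_loss < gap_threshold:
            break
    return history


def make_decaying(start, rate):
    """Toy stand-in for an update step: returns a loss that decays geometrically."""
    state = {"value": start}

    def step():
        state["value"] *= rate
        return state["value"]

    return step


history = train(
    seg_step=make_decaying(1.0, 0.9),
    gap_step=make_decaying(2.0, 0.5),
    disc_step=make_decaying(0.7, 1.0),
    gap_threshold=0.1,
    max_iters=100,
)
print(len(history), history[-1])
```

In a real implementation each step would wrap a forward pass, a loss computation, and an optimizer update for the corresponding module.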

Deployment in clinical practice
In clinical practice, the deployment of our proposed model includes several steps. First, the proposed model is trained with low-cost annotated normal samples. Subsequently, at the inference stage, the optimized segmentation part serves as the final segmentation model, and the non-segmentation parts are removed from the model. Thereafter, successive OCT images of a patient are fed into this model to generate segmentation results for the retinal layers. These pixel-level classification results are then used to derive progression-related features. Ophthalmologists can subsequently use these features to assess the condition of the deep retinal layers by comparing the changes in glaucoma-related layers between previous and current OCT images. Any detailed alterations can be provided as a valuable reference, helping ophthalmologists decide on further treatment.

Data acquisition and data overview
The glaucoma and normal OCT images used in this study were obtained from an annual physical examination project conducted by KaiLuan MeiKuang, a mining company, from 2008 to 2018 in HeBei province, PR China. The images were acquired with a Spectralis SD-OCT (Heidelberg Engineering GmbH, Heidelberg, Germany) using five different scan modalities at the macular region and the optic nerve head (ONH) region. As shown in Fig. 3, the dataset consists specifically of circular scans of the ONH region, which are commonly employed to monitor glaucoma. As shown in Table 1, our collected dataset comprises 567 samples in total, consisting of 347 glaucoma and 220 normal samples. All images were captured from adults (overall mean age: 65.97, overall standard deviation: 10.92), with mean ages of 56.51 and 71.89 for normal and glaucoma subjects, respectively. The male-to-female ratio among all participants is approximately 1:0.89 (299 males to 268 females), and male samples outnumber female samples among the glaucoma samples (182 males to 165 females). Additionally, female samples account for a large proportion of the normal samples (134 of the 220 normal samples). We find that the RNFL thickness differs between glaucoma and normal samples (88.88 for glaucoma versus 111.37 for normal). These feature dissimilarities indicate an obvious difference between normal and glaucoma samples. All subjects are Han Chinese (East Asian). The detailed age, Quality, and RNFL distributions of normal and glaucoma samples are shown in Fig. S1 in Supplement 1. All individuals in the dataset provided written informed consent for the use of their OCT images in this research.
The quality of the images plays a key role in the performance of our models. To stay close to clinical practice, we did not apply a constraint to exclude images with a low Quality value. Nevertheless, we manually inspected the samples to eliminate any images that completely lack retinal layers or exhibit defects caused by eye movement. The annotation procedure has two levels (image level and pixel level). For the image-level annotation (whether an image is glaucoma or normal), an expert from TongRen Eye hospital with more than 10 years of diagnostic experience provided a diagnosis based on clinical evidence, which was then assigned as the image-level label. For the pixel-level annotations, we initially applied a graph-cut-based algorithm (OCTSEG v0.4 [42]) to obtain pre-segmentation results, reducing the labelling cost. Subsequently, two experts from HuaZhong University of Science and Technology union ShenZhen hospital and Lishui Renmin hospital refined the pre-segmentation results and corrected any wrong annotations, with each expert responsible for half of the pixel-level annotations in the dataset. Finally, all pixel-level annotations were reviewed by one expert with over 10 years of diagnostic experience (the same expert as the image-level annotator). The analysis is performed on individual eyes. In addition, we include the GCC layers, which are the layers relevant to glaucoma, specifically the RNFL, GCL, and IPL. More detailed information about the dataset can be found in Table 1.

Preprocessing of OCT images
To improve the quality of our dataset, we implemented two preprocessing steps. First, we used the BM3D denoising algorithm to remove speckle noise from the OCT images, carefully adjusting its parameters to obtain high-quality, low-noise images [43]. Second, because the relatively low contrast of the raw OCT images in our dataset makes it hard to determine the boundaries of retinal layers, we applied an adaptive histogram equalization algorithm to enhance the contrast of the OCT images and mitigate fuzzy boundaries [44]. Preprocessing examples can be found in Fig. S2 in Supplement 1.

Experimental configuration
To accurately evaluate the performance of our proposed model, we randomly split the dataset with a predefined method based on the ID numbers of the samples; the detailed steps are given in the supplemental material. For our proposed model, including the multi-level model and the single-level model, we use 220 normal OCT images with pixel-level annotations and 220 glaucoma OCT images without pixel-level annotations as the training dataset. For the unsupervised method, the training dataset comes entirely from the normal domain (220 normal OCT images with pixel-level annotations). For the fully supervised method, the training dataset comes entirely from the glaucoma domain (220 glaucoma OCT images with pixel-level annotations). For the mixed methods, we use different proportions of pixel-level annotated glaucoma and normal subjects as the training datasets while keeping the overall number of images the same. The validation and test datasets are the same for all experiments and comprise 59 and 68 glaucoma samples, respectively. The input image size for the segmentation module is 512 × 512 × 1 (height × width × channel), with a batch size of 1. The segmentation module produces features at two levels (64 × 64 × 256 and 512 × 512 × 6) from an encoding space and the output space, respectively. These features serve as inputs to the corresponding domain discriminator modules. We employ stochastic gradient descent (SGD) to optimize the segmentation module, with an initial learning rate of $2.5 \times 10^{-4}$, a momentum of 0.9 and a weight decay of $5 \times 10^{-4}$. The Adam optimizer, with an initial learning rate of $1 \times 10^{-4}$ and betas of 0.9 and 0.99, is used to optimize the domain discriminator modules. During the optimization procedure, the parameters of the segmentation module are updated twice, while those of the domain discriminator modules are updated once.
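The glaucoma split sizes reported above add up to the full glaucoma set (220 + 59 + 68 = 347). A deterministic ID-based split of this kind can be sketched as follows; the actual procedure is described in the supplemental material, so the seeded shuffle, the function name `split_by_id`, and the ID format used here are assumptions for illustration.

```python
import random


def split_by_id(sample_ids, n_train, n_val, n_test, seed=0):
    """Deterministically split sample IDs into train, validation, and test subsets."""
    ids = sorted(sample_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of samples")
    random.Random(seed).shuffle(ids)  # seeded shuffle keeps the split reproducible
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Hypothetical glaucoma sample IDs; the real IDs come from the dataset.
glaucoma_ids = [f"G{i:03d}" for i in range(347)]
train_ids, val_ids, test_ids = split_by_id(glaucoma_ids, 220, 59, 68)
print(len(train_ids), len(val_ids), len(test_ids))
```

Sorting before the seeded shuffle makes the result independent of the order in which the IDs are supplied.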

Implementation details and evaluation metrics
All experiments are implemented in PyTorch (pytorch-gpu 1.8) [45] and trained on an NVIDIA Tesla K80 GPU with 12 GB of memory. The proposed model contains a segmentation module and multi-level domain discriminator modules. The backbone network of the segmentation generator module is a modified U-Net, one of the most prevalent networks in medical image analysis; it can easily be replaced by other segmentation networks. The domain discriminator modules, based on a fully convolutional neural network, are applied at an encoding space and at the output space of the segmentation module. The details of the network are given in the section describing the proposed network.
To quantitatively evaluate the performance of our proposed model, we use the intersection over union (IoU) coefficient, a common metric in image segmentation tasks. Based on set theory, it measures the similarity between two sets as the ratio of their intersection to their union, and is defined by formula (7):

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \qquad (7)$$

where TP denotes true positive, FP denotes false positive, TN denotes true negative and FN denotes false negative segmentation results. A larger value indicates higher similarity between the sets and better model performance.
To further evaluate the segmentation performance of our proposed model, we employ the Dice coefficient, a common metric in medical imaging tasks. This metric is a variant of the IoU coefficient that computes the ratio of twice the intersection to the sum of the two sets, as defined by formula (8):

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (8)$$

where TP and TN denote true positive and true negative prediction results, respectively, and FN and FP denote false negative and false positive prediction results, respectively.
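As an illustration of the two overlap metrics, the following minimal sketch computes per-class IoU and Dice from flattened label sequences using their standard TP/FP/FN forms. The function names and the toy labels are hypothetical; the paper's own evaluation code is not shown in the text.

```python
def confusion_counts(pred, target, cls):
    """Count true positives, false positives and false negatives for one class."""
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
    return tp, fp, fn


def iou(tp, fp, fn):
    """Intersection over union: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")


def dice(tp, fp, fn):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


# Toy example: six pixels, three classes (labels are illustrative only).
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 1]
counts_cls2 = confusion_counts(pred, target, cls=2)
iou_cls2 = iou(*counts_cls2)    # 2 / (2 + 1 + 1) = 0.5
dice_cls2 = dice(*counts_cls2)  # 4 / (4 + 1 + 1) = 0.666...
```

The two metrics are monotonically related by Dice = 2·IoU / (1 + IoU), so they rank models identically per class but differ in scale. A mean IoU would average `iou` over the classes.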
To quantitatively evaluate how well our proposed model supports monitoring the progression of glaucoma, we use the prediction error of the RNFL thickness at different positions, a key glaucoma-related feature in clinical practice. It is defined by formula (9): we count the RNFL pixels in the ground truth and in the segmentation result, compute the squared thickness error at each position between the ground truth and the segmentation result, and then divide by the corresponding width of the RNFL. Here, MSE_L is the mean square error of the RNFL thickness between the ground truth and the predicted value, $L_i^g$ and $L_i^p$ are the ground-truth and predicted RNFL thickness at position i, respectively, and $w_i$ denotes the width of the RNFL. A smaller mean square error between the ground-truth and predicted RNFL thickness indicates better performance.
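One possible reading of this thickness metric is sketched below. It assumes that thickness at position i is the number of RNFL pixels in image column i, and that the summed squared errors are divided by the number of columns. Both points are our interpretation of the description, and the function names and label value are hypothetical.

```python
def column_thickness(label_map, layer_id):
    """Thickness per column: number of pixels labelled `layer_id` in each column."""
    height, width = len(label_map), len(label_map[0])
    return [
        sum(1 for row in range(height) if label_map[row][col] == layer_id)
        for col in range(width)
    ]


def thickness_mse(gt_map, pred_map, layer_id):
    """Mean squared thickness error across columns (assumed form of MSE_L)."""
    gt = column_thickness(gt_map, layer_id)
    pred = column_thickness(pred_map, layer_id)
    return sum((g - p) ** 2 for g, p in zip(gt, pred)) / len(gt)


RNFL = 1  # hypothetical label value for the RNFL layer
gt_map = [[1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]    # column thicknesses: 2, 1, 0
pred_map = [[1, 1, 1],
            [0, 0, 0],
            [0, 0, 0]]  # column thicknesses: 1, 1, 1
mse_l = thickness_mse(gt_map, pred_map, RNFL)  # (1 + 0 + 1) / 3
```

Because the error is squared, a few columns with large thickness deviations dominate the score, which makes the metric sensitive to local mis-segmentation of the RNFL.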
To assess the boundary segmentation results of our proposed model, we apply the Hausdorff distance, defined in formula (10), where G denotes the ground truth and P denotes the segmentation result. The term d(a, b) in formula (10) is the Euclidean distance between point a in the ground truth and point b in the segmentation result. A larger Hausdorff distance means a larger gap between the boundaries of the ground truth and those of the prediction, and therefore indicates lower performance.
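For reference, a minimal sketch of the Hausdorff distance between two boundary point sets is given below. It uses the standard symmetric form (the larger of the two directed distances); whether formula (10) uses the symmetric or a directed variant is an assumption here. The brute-force search is quadratic in the number of points and is meant only as an illustration.

```python
import math


def directed_hausdorff(a_pts, b_pts):
    """Largest distance from a point in a_pts to its nearest point in b_pts."""
    return max(min(math.dist(a, b) for b in b_pts) for a in a_pts)


def hausdorff(g_pts, p_pts):
    """Symmetric Hausdorff distance between ground truth G and prediction P."""
    return max(directed_hausdorff(g_pts, p_pts), directed_hausdorff(p_pts, g_pts))


# Toy boundaries given as (row, col) points; the values are illustrative only.
G = [(0, 0), (0, 1), (0, 2)]
P = [(0, 0), (0, 1), (3, 2)]
h = hausdorff(G, P)  # the outlier (3, 2) is 3 pixels from its nearest point in G
```

Since the metric is a maximum over points, a single isolated mis-segmented region can determine the whole score.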

Un-supervised model and fully-supervised model
To establish a baseline for evaluating our proposed model, we first train a model using only pixel-level annotated normal OCT images and then apply the trained model to the glaucoma test dataset; we refer to this as the un-supervised approach. To ensure a fair comparison, the framework of the un-supervised model is the same as the segmentation module of our proposed model, and the experimental settings and test dataset are also consistent. The quantitative results are shown in Table 2. The performance of the un-supervised approach on the glaucoma test dataset is not extremely poor (82.94 mIoU, 61.77 MSE_L, 74.97 GCC, 0.9067 Dice, and 37.23 H-distance), owing to the partial overlap and similarities between the normal and glaucoma domains. However, feature discrepancies between the two domains degrade the performance of the un-supervised model. In addition, it is important to establish an upper baseline by training a model on the pixel-level annotated glaucoma dataset (the fully-supervised approach), so that the performance gap of the other methods can be measured. The model architecture, test dataset and experimental settings are the same as for the un-supervised approach. The quantitative results, also shown in Table 2, demonstrate that the fully-supervised approach performs significantly better than the un-supervised approach on all metrics (88.21 mIoU, 8.39 MSE_L, 82.67 GCC, 0.9374 Dice, and 24.53 H-distance). The main reason is that the training domain of the fully-supervised approach is the same as the domain of the test dataset, whereas the training dataset of the un-supervised model comes from the normal domain. However, the fully-supervised approach is expensive because it requires a pixel-level annotated glaucoma dataset.

Single-level knowledge transfer model at output space
To demonstrate that our proposed model is better than the un-supervised approach, we conducted extensive experiments with a single-level version of the proposed model that narrows domain gaps only at the output space. The experimental settings, test dataset and segmentation module are the same as for the un-supervised approach, except that a domain discriminator module is added at the output space. The results are shown in Table 3. The single-level model at output space achieves 85.90 mIoU when the domain gap loss weight is 1 × 10⁻⁴, outperforming the un-supervised approach by 3.57% in terms of mIoU, which demonstrates that the knowledge transfer module encourages the segmentation module to capture glaucoma-related features. However, the performance is sensitive to the weight assigned to the domain gap loss: an unsuitable weight can make the model perform even worse than the model without the adaptation module. As shown in Table 3, when the domain gap weight is set to 0.1, the single-level model reaches only 82.16 mIoU. In addition, the performance of the proposed model is influenced by the output size of the final layer of the domain discriminator. We therefore conduct experiments with six different output sizes (single value, 32 × 32, 64 × 64, 128 × 128, 256 × 256 and 512 × 512) for the final layer of the domain discriminator module. All other experimental settings, the model architecture and the test dataset are the same as for the model above. The results, shown in Table S1, demonstrate that the patch-level (32 × 32) model yields the highest performance (86.16 mIoU) among all output sizes, whereas the models with an output size of 512 × 512 and with a single output value achieve only 83.20 and 85.90 mIoU, respectively. The main reason is that a discriminator with a single output value measures domain gaps at a global scale, which causes the model to lose important details, whereas discriminators with a large output size focus on too many details and are easily influenced by noise.
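The improvement figures quoted in this section appear to be relative gains over the baseline rather than absolute differences in mIoU points. The short check below, a sketch with a hypothetical helper name, reproduces the 3.57% figure from the two mIoU values reported in the text.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# mIoU values taken from the text: single-level output model vs. un-supervised baseline.
gain = relative_gain(85.90, 82.94)
print(f"{gain:.2f}%")  # prints 3.57%
```

The corresponding absolute difference is 2.96 mIoU points, so the distinction matters when comparing against results reported elsewhere.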

Single-level knowledge transfer model at encoding space
In addition, we conduct extensive experiments with different single-level models, each narrowing the domain gap at a different encoding space, to uncover the roles of the corresponding features. All experimental settings, the test dataset and the module architecture are the same as for the single-level model at output space described above. The results, shown in Table 4, demonstrate that features from different encoding spaces contribute differently to the overall performance. The single-level models at the UP4 and UP1 spaces rank as the top two among all single-level models, with 86.23 mIoU and 86.17 mIoU, respectively. The main reason is that features from high-dimensional feature spaces carry high-level, semantically discriminative information and thus help the domain discriminator evaluate domain gaps. Although the single-level model at the bottom space achieves the lowest performance (85.47 mIoU) among all single-level models, it still outperforms the un-supervised approach, which demonstrates the effectiveness of our proposed model.

Multi-level knowledge transfer model at encoding and output spaces
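To further improve the performance of our proposed model, we conduct experiments with a multi-level version that narrows domain gaps at a feature space and at the output space, in order to capture features at different levels. The experimental settings, model architecture and test dataset are the same as for the single-level models. The architecture of our proposed multi-level model is shown in Fig. 2. The experimental results are shown in Table 2, which demonstrates that our proposed multi-level model outperforms the un-supervised, mixed-strategy and single-level models. This shows that the proposed model captures features at different levels by exploiting multi-level spaces. The performance of the multi-level model (87.32 mIoU) improves by 1.35% and 1.26% over the single-level output model (86.16 mIoU) and the single-level encoding model (86.23 mIoU), respectively, and by 5.28% over the un-supervised method (82.94 mIoU). In terms of Dice, the multi-level model also outperforms these models, achieving a score of 0.9323. Moreover, the RNFL thickness error (MSE_L) and the Hausdorff distance of our proposed model (9.93 and 26.64) are significantly better than those of the un-supervised model (61.77 and 37.23). For GCIPL, another glaucoma-related layer, our proposed model achieves a significant increase of 9.81% over the un-supervised model. For GCC, which covers all the glaucoma-related layers, our proposed model improves on the un-supervised model by 7.96%. Furthermore, as shown in Table 5, it is essential to balance the weights of the domain gaps at different levels, because features in the encoding space and the output space play different roles. These results also show that narrowing the domain gap in only one space is insufficient to achieve satisfactory performance.

a Note: Ada1 is the weight of the UP1 encoding space; Ada2 is the weight of the output space, fixed at 1 × 10⁻⁴ according to the optimal performance of the single-level model.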
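To make the role of the two balance weights concrete, the sketch below combines a supervised segmentation loss with two adversarial terms, one per level. It assumes a standard adversarial domain adaptation objective in which the segmentation module is rewarded when each discriminator labels glaucoma inputs as "normal". The exact losses of the proposed three-step strategy are not given in this section, so the function names, the domain label convention and the example value of Ada1 are all hypothetical.

```python
import math


def bce(probs, label, eps=1e-7):
    """Mean binary cross-entropy of discriminator outputs against one domain label."""
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(probs)


NORMAL, GLAUCOMA = 1, 0  # hypothetical domain labels


def segmentation_objective(seg_loss, d_enc_on_glaucoma, d_out_on_glaucoma,
                           ada1, ada2=1e-4):
    """Supervised loss plus weighted adversarial terms at two levels.

    d_*_on_glaucoma are discriminator probabilities (one per output patch)
    for glaucoma inputs; ada1 weights the encoding-space term and ada2 the
    output-space term (1e-4 is the value reported for the output space).
    """
    adv_enc = bce(d_enc_on_glaucoma, NORMAL)
    adv_out = bce(d_out_on_glaucoma, NORMAL)
    return seg_loss + ada1 * adv_enc + ada2 * adv_out


# ada1=2e-4 is an arbitrary illustrative value, not a setting from the paper.
total = segmentation_objective(1.0, [0.5, 0.5], [0.5, 0.5], ada1=2e-4)
```

With weights of this magnitude the adversarial terms act as small corrections to the supervised loss, which is consistent with the observation that an overly large weight (such as 0.1) degrades segmentation performance.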

Mixed model with different proportion of pixel-level annotated glaucoma
To further verify the performance of our proposed method, we compare it with the mixed training strategy commonly used in early glaucoma detection. We trained a model in a supervised manner on mixed datasets that combine pixel-level annotated glaucoma OCT images and normal OCT images, mimicking common early detection models [46,47]. The architecture of the mixed model is the same as the segmentation module of our proposed model, and the experimental settings and test dataset are also the same. The results are shown in Table 6. Compared with the un-supervised method (0% glaucoma), the performance of the mixed method improves with the help of pixel-level annotated glaucoma images. However, our proposed model still outperforms the mixed model, with an improvement of 5.77% when the training dataset combines 10% pixel-level annotated glaucoma images with 90% pixel-level annotated control images. Typically, glaucoma samples make up only a small part of the overall dataset compared with normal samples. As a result, the mixed model is biased towards the normal domain, leading to poor performance on glaucoma samples. Although the performance of the mixed models improves as the proportion of glaucoma samples increases (60% glaucoma, 85.32 mIoU), they inevitably learn both shared and un-shared features, since they capture the differences between the two domains only implicitly. A further drawback of the mixed training strategy is the high cost of pixel-level annotated glaucoma OCT datasets, whereas our proposed model does not require such annotations.
Based on these results, it is evident that an early glaucoma detection model trained on a mixed dataset is not suitable for monitoring the progression of glaucoma.

Visualization
To allow direct observation of the experimental results, we visualize the retinal layer segmentation results of different models. The visualizations, shown in Fig. 4, clearly demonstrate the better performance of our multi-level model compared with the un-supervised method and the mixed training method. Furthermore, the results of our proposed model closely approximate the ground truth. Notably, our proposed model produces more accurate boundaries and layer segmentations, particularly for the RNFL, with significantly fewer isolated mis-segmented regions than the other methods. To observe the learning behavior of different models intuitively, we visualize their training stages by comparing, at the same epoch, the absolute difference between the relationship matrices of the normal domain and the glaucoma domain. As depicted in Fig. 5, the absolute difference matrix of the model without domain discriminator modules remains relatively stable throughout training. This stable trend indicates that the model captures only normal-related features. In contrast, the absolute difference matrix of the model with domain discriminator modules fluctuates during the early training stage and gradually stabilizes as training proceeds, which indicates that the model is progressively learning glaucoma-related features. These different trends intuitively demonstrate that the proposed model is able to capture glaucoma-related features and remove unrelated features, and they explicitly illustrate the benefits of our knowledge transfer learning modules.
To visualize the results and domain gaps at the domain level, we plot the distributions of the prediction results of the un-supervised model and of the proposed model on the same test dataset. The results, shown in Fig. 6, reveal a large gap between the distribution of the un-supervised model's predictions and the distribution of the ground-truth glaucoma domain, while the former is close to the distribution of the normal domain. In contrast, the distribution of the proposed model's predictions is close to that of the glaucoma domain and far from that of the normal domain. The domain-level distributions therefore further suggest that the un-supervised model has not captured glaucoma-related features, and they provide compelling evidence that our proposed model effectively narrows the domain gap in an explicit manner.

Discussion
Glaucoma can cause permanent vision loss and even blindness. Timely intervention and therapy are crucial for stopping its development. A comprehensive assessment of the retinal layers provides valuable information for tracking glaucoma progression. However, manual delineation of the retinal layers is inefficient and demands extensive clinical experience. It is therefore appealing to develop an efficient and accurate segmentation tool that helps ophthalmologists obtain this information.
Several glaucoma-related markers, including the RNFL and GCC thickness maps extracted from OCT images, are of significant importance for tracking the progression of glaucoma. Moreover, these indicators have the potential to help ophthalmologists refine both diagnostic procedures and treatment strategies [48][49][50]. The measurement of these markers relies on the segmentation of retinal layers in OCT images, since inaccurate segmentation produces imprecise and unreliable marker values, potentially resulting in missed diagnoses. Therefore, there is a compelling need for an automatic and precise model for retinal layer segmentation that reduces cost, improves efficiency and achieves high accuracy. This is especially crucial in underdeveloped regions where there is a shortage of ophthalmologists.
Supervised learning models have achieved great success when sufficient labeled datasets are available. However, extending these models to the task of monitoring glaucoma progression is challenging because few annotated glaucoma datasets are available. The major obstacle is the high cost and burden of collecting and labelling datasets. The deformation and thickness changes of the retinal layers further complicate the determination of boundaries. Therefore, it is of great significance to develop a knowledge transfer model that serves as an annotation-efficient, low-cost method. Furthermore, considerable feature discrepancies across glaucoma stages and across individuals pose an additional challenge.
Currently, sufficient pixel-level annotated normal datasets can be obtained at low cost. However, effectively leveraging the shared knowledge in a normal dataset is challenging. A model trained only on a normal dataset performs poorly on glaucoma samples because it inevitably captures shared and un-shared features simultaneously. The common mixed strategy, which directly trains a model on a mixed dataset comprising a large number of normal samples and a small number of glaucoma samples, tends to pay more attention to the majority category and becomes biased towards the normal domain.
To address the shortage of annotated glaucoma datasets and overcome the limitations of the learning strategies above, we argue that exploiting knowledge from sufficient annotated normal samples is a promising way to lower cost, provided that the relevant knowledge is transferred correctly from the normal dataset. Our intuition is that the normal domain and the glaucoma domain share features because of their similarities, for example the similar layout and texture of the retinal layers, since no fluid lesions are introduced that would disrupt the layout of the retinal layers. We can therefore capture shared features from the normal domain based on this similarity. However, feature discrepancies between the two domains remain, since glaucoma causes retinal layer deformation and changes in retinal thickness. For instance, the thickness of the RNFL differs between normal and glaucoma samples, and the morphology of glaucoma samples is diverse compared with the consistent morphology of normal samples. Moreover, the boundaries of some retinal layers in glaucoma samples may not be distinct.
We argue that these feature discrepancies can be regarded as domain gaps, and that developing a model to narrow them is key to overcoming the poor performance. Inspired by strategies for handling domain gaps in natural images, where poor performance arises from the different domains of the source and target datasets, we proposed a knowledge transfer model that exploits sufficient pixel-level annotated normal datasets by establishing the relationship between the two domains. Our setting differs from typical domain gap problems, in which the source and target datasets belong to the same category: here the categories of the source domain and the target domain are different. The proposed model can discover similarities and dissimilarities between the two domains through a shared segmentation module and domain discriminator modules. To evaluate our proposed model, we collected a dataset comprising normal and glaucoma samples with corresponding pixel-level annotations of the retinal layers, drawn from a large physical examination database covering 2008 to 2018.
To validate that our proposed model performs better than the common un-supervised model, we first employ single-level models at the output space or at encoding feature spaces. The quantitative, qualitative and visualization results demonstrate that all proposed single-level models achieve better performance on the glaucoma test samples than the un-supervised model. Furthermore, the single-level model at output space achieves satisfactory performance, since features such as layout and texture are distinct in the output space. These results provide compelling evidence that our proposed model effectively discovers and captures glaucoma-related features. In particular, for the RNFL, one of the most glaucoma-related features, our proposed model significantly outperforms the un-supervised model and the mixed model.
Moreover, because narrowing domain gaps at a single space is insufficient for pixel-level prediction tasks, we propose a multi-level strategy that combines features from both an encoding feature space and the output space. We employ only a two-level model, at one encoding space and the output space, rather than all spaces, to balance performance and cost. The results show that this cooperative learning strategy over two levels yields better performance than the single-level models. In other words, narrowing domain gaps at the output space and at an encoding space together helps bring the domain of the prediction results close to the glaucoma domain and alleviates the bias of the model towards the normal domain.
The visualizations of test samples, the difference matrices from the training stage and the distribution map of the prediction results directly show that the predictions of our proposed model are closer to the target glaucoma domain than those of the un-supervised method and the mixed training model. In summary, our proposed model not only captures general features, based on similarities, from sufficient pixel-level annotated normal samples, but also leverages dissimilarities to capture glaucoma-related features while removing unrelated features. However, our proposed model has some limitations. First, it focuses on accurate segmentation of the retinal layers in glaucoma-confirmed samples, in order to obtain glaucoma-related markers that assist in monitoring the progression of glaucoma, and it is not suitable for early detection of glaucoma. Second, its performance falls short of the fully-supervised method, which limits its applicability in clinical practice. Third, the model was restricted to five major retinal layers of the OCT images, and the classification criterion used to annotate the dataset is relatively coarse: we categorized images only as normal or glaucoma, without a detailed severity-based glaucoma stage, which prevents a more comprehensive evaluation.
In the near future, we aim to develop a novel knowledge transfer framework that improves the capacity of the proposed model and brings its performance closer to that of the fully-supervised approach. We believe that there may exist alternative transfer learning models, including various domain adaptation models, that are better suited to this specific task and merit exploration in future work. Additionally, we plan to improve the quality and diversity of glaucoma datasets by annotating detailed glaucoma stages (mild, moderate and severe), providing pixel-level annotations for more retinal layers and collecting more samples, in order to evaluate the performance of our proposed model comprehensively.

Conclusion
In this paper, to alleviate the shortage of pixel-level annotated OCT images for developing models that assist in monitoring glaucoma progression, we proposed a knowledge transfer model that exploits sufficient pixel-level annotated normal samples by explicitly narrowing the domain gaps between the normal domain and the glaucoma domain. Furthermore, we collected a high-quality glaucoma OCT dataset from an annual physical examination database covering 2008 to 2018. Extensive experiments on this dataset show that our proposed model can transfer shared knowledge from sufficient annotated normal samples and significantly outperforms the direct un-supervised model and the mixed training model. Moreover, a multi-level strategy was employed to exploit features from an encoding feature space and the output space, narrowing the performance gap to the fully-supervised model. These results indicate that the proposed model effectively captures general features and glaucoma-related features based on the similarities and discrepancies between the domains, thus achieving decent performance while lowering cost. The visualization results at the domain level and across training stages show that our proposed model moves the distribution of prediction results from the normal domain towards the target glaucoma domain, thus alleviating the bias issue. We hope that our proposed model can be extended to other suitable and challenging medical imaging tasks.

Fig. 2. Illustration of our proposed model, which incorporates one shared segmentation module for generating pixel-level results and domain discriminator modules at two different levels for evaluating domain gaps. Sufficient normal OCT images with pixel-level annotations are used to optimize the shared segmentation component so that it captures general features. Glaucoma OCT images without pixel-level annotations, carrying only image-level labels, are used to assess the dissimilarity between the two domains so that glaucoma-specific features can be captured. These distances serve as dissimilarity information for adjusting the shared segmentation module: domain gap gradients are back-propagated from the evaluation module to the segmentation module, encouraging the model to shift the distribution of segmentation results towards the glaucoma domain. During the inference phase, the optimized module is used to obtain pixel-level segmentation results, which are subsequently transformed into glaucoma progression-related features such as the RNFL thickness map.

Fig. 3. Two examples of normal and glaucoma OCT images in the TRGD dataset, acquired in the ONH region in circular scan mode. The rightmost image is an example of the pixel-level annotations of OCT images.

Fig. 4. Visual comparison of the segmentation results of different models on the TRGD dataset. In each panel, a zoomed-in image is provided to give a detailed view of a small region.

Fig. 5. Visualization of the feature relation matrices for normal and glaucoma samples, along with the absolute difference matrices with and without the knowledge transfer module during training.

Fig. 6. Visualization of the distributions of the results from different methods, together with the ground-truth normal and glaucoma domains.

Table 1. Demographics and image characteristics of the TRGD dataset. a
a Note: OD, right eye; OS, left eye; ART, automatic real time; Quality, image quality.

Table 2. Quantitative results of the un-supervised, fully-supervised, mixed-strategy, single-level and multi-level proposed models. a
Columns: Method, Un-supervised, Mixed-strategy, Fully-supervised, Single-level, Multi-level.
a Note: single-level (training dataset = 50% pixel-level annotated normal + 50% un-annotated glaucoma, knowledge transfer applied only at the output space), multi-level (training dataset = the same as the single-level model, but knowledge transfer applied at the UP1 encoding space and the output space). Detailed settings of the different models are given in the experimental configuration section.

Table 3. Different weights of the single-level model at output space. a
a Note: Ada2 is the balance weight of the single-level model at output space.


Table 6. Quantitative results of the mixed model at different ratios of pixel-level annotated glaucoma samples. a
a Note: % represents the proportion of glaucoma samples in the overall training dataset, while the total size of the training dataset remains constant.