Multimodal affine registration for ICGA and MCSL fundus images of high myopia

The registration between indocyanine green angiography (ICGA) and multi-color scanning laser (MCSL) fundus images is vital for the joint segmentation of linear lesions in ICGA and MCSL and for evaluating whether MCSL can replace ICGA as a non-invasive diagnostic modality for linear lesions. To the best of our knowledge, no previous study has focused on image registration between these two modalities. In this paper, we propose a framework based on convolutional neural networks for multimodal affine registration between ICGA and MCSL images, which consists of two parts: a coarse registration stage and a fine registration stage. In the coarse registration stage, the optic disc is segmented and its centroid is used as a matching point to perform coarse registration. The fine registration stage regresses the affine parameters directly using a jointly supervised and weakly-supervised loss function. Experimental results show the effectiveness of the proposed method, which lays a sound foundation for further evaluation of non-invasive diagnosis of linear lesions based on MCSL.


Introduction
Pathological myopia is a major cause of blindness in many developed countries [1,2]. T. Tokoro et al. proposed that high myopia accompanied by visual dysfunction might be defined as pathologic myopia [3]. According to [4,5], the linear lesion (indicated by the yellow arrow in Fig. 1) is an important clinical sign for evaluating the development from high myopia to pathological myopia. At present, indocyanine green angiography (ICGA) (shown in Fig. 1(a), (b), (c) and (d)) is considered the "ground truth" for the diagnosis of linear lesions in ophthalmology clinics [6,7], but it requires the injection of the contrast agent indocyanine green (ICG), which may cause adverse reactions such as allergy, dizziness, and even shock [8]. There is therefore an urgent need for a non-invasive imaging modality that can replace ICGA for the diagnosis of linear lesions. Multi-color scanning laser (MCSL) imaging is a non-invasive imaging technology in which three lasers with different wavelengths (488 nm, 515 nm and 820 nm) scan the fundus simultaneously. The MCSL image, fused from several fundus images (shown in Fig. 1(e), (f), (g) and (h)), can reveal linear lesions more richly than other non-invasive modalities such as color fundus imaging and red-free fundus imaging, and than some invasive ones such as fundus fluorescein angiography (FFA). Therefore, we investigate whether MCSL could replace ICGA as a non-invasive imaging modality for linear lesion diagnosis. As the first step of this study, the ICGA and MCSL images need to be registered. As can be seen from Fig. 1, the multimodal registration between ICGA and MCSL fundus images is challenging for two reasons: (1) the appearance differences between ICGA and MCSL are large; (2) the retinal vessels are so blurred in late-phase ICGA images that they cannot be used as structural features during registration.
Traditional image registration methods can be roughly classified into two categories: intensity-based methods and feature-based methods [9,10]. The main idea of the intensity-based methods [11,12] is to search for the geometric transformation parameters iteratively. The parameters are optimized by maximizing or minimizing a similarity metric between the transformed moving image and the corresponding fixed image. Common similarity metrics include cross-correlation (CC) [13], mutual information (MI) [14], normalized mutual information (NMI) [15], and sum of squared differences (SSD) [16]. Because of the iterative operations, the time and computational cost of registration are high. Feature-based methods [17,18] usually use features such as points, edges, lines, contours, surfaces and areas to establish the correspondence between the fixed image and the moving image. Algorithms such as SIFT [19], SUSAN [20] and SURF [21] are often used for feature extraction. Retinal fundus image registration generally adopts feature-based methods. Existing feature-based fundus image registration methods [22-25] mainly use vessel features to register color fundus images, FFA images and ICG images. However, retinal vessel features are not suitable for registering late-stage ICGA and MCSL images.
To overcome the shortcomings of the traditional methods, deep learning based methods, which have achieved great success in image classification [26], segmentation [27] and object detection [28], have been introduced into image registration and achieve better performance. In particular, the registration time is greatly reduced, making intraoperative real-time registration possible. Deep learning based methods can be classified as supervised, weakly supervised and unsupervised. The supervised methods [29,30] need the corresponding ground truth deformations obtained by traditional methods or manual registration. Reference [31] proposed a dual-supervised deep learning strategy that uses both supervised and unsupervised loss functions. Some works [32,33] use weakly supervised learning to merge tissue segmentation labels into the registration network and guide it to learn the registration parameters, which does not require registration ground truth. In recent years, to avoid the difficulty of obtaining registration ground truth and the exhausting manual segmentation, unsupervised learning methods [34-36] have been developed. In these methods, a similarity metric is used as the cost function to be maximized or minimized during network training. However, [37] pointed out that manually crafted similarity metrics have had very little success in multimodal registration. Recently, generative adversarial networks (GANs) have been applied to registration [38-42], with the discriminator playing the role of the similarity metric.
Several studies [29,43-45] have pointed out that a single convolutional neural network cannot work well in registration tasks with large deformation. To solve this problem, [29] proposed CNN regressors that predict the affine parameters in a hierarchical manner. Reference [43] adopted a multi-stage strategy by stacking multiple stages of ConvNets, in which each stage has its own registration task. Work [44] proposed a multi-step affine framework that contains only one neural network. As in a recurrent network, the parameters of the affine network are shared by all steps, and at each step the network outputs parameters that refine the previously predicted affine transformation. However, determining the best number of iterations requires many experiments, and recurrent training of a single neural network increases the training time. Paper [45] proposed unsupervised end-to-end learning with progressive alignment through deep recursive cascaded neural networks, in which multiple networks are trained to transform images with large deformation gradually. Furthermore, [46] proposed a two-step registration method based on traditional registration methods.
To the best of our knowledge, multimodal registration of ICGA and MCSL has not yet been reported. Since the high myopia cases to be analyzed are in the stationary phase, key fundus features such as the optic disc, retinal vessels and linear lesions do not change significantly within an MCSL and ICGA image pair, although there may be a certain time interval between the two acquisitions (usually no more than 1-2 weeks, which is the common appointment interval for an ICGA examination in the clinic). That is, the differences between the two modalities are mainly caused by the capture angle and field of view, which can be aligned through an affine transformation such as scaling and rotation. However, the registration task still faces great challenges because both the inter-subject differences within ICGA/MCSL and the intra-subject differences between ICGA and MCSL are large, as shown in Fig. 1. This makes it difficult to optimize all the required affine transformations at once, including scaling, rotation, translation and shearing, especially large translations and scaling.
Therefore, in this paper, we propose a two-stage affine registration framework to register ICGA and MCSL images from coarse to fine. The affine registration parameters are regressed through the two-stage registration networks. In the first stage, feature information of the image, such as the optic disc, is used for coarse registration, which greatly decreases the initial registration error. In the second stage, to avoid both the over-fitting caused by supervised training based on the registration ground truth and the under-fitting caused by weakly-supervised training based on few prior structural features (only the optic disc), we adopt a dual-supervised training strategy that combines the supervised and weakly-supervised loss functions to achieve fine registration of the image pair. The main contributions of this paper are as follows:
- A two-stage, coarse-to-fine registration framework is proposed.
- The supervised and weakly-supervised loss functions are jointly applied.
- Image prior information such as the optic disc is efficiently used in both stages of the registration.
- Multimodal registration of ICGA and MCSL images is explored for the first time.
The remainder of the paper is organized as follows. Section 2 describes the proposed method in detail. Section 3 presents and analyzes the experimental results, followed by conclusions and discussion in Section 4.

Overview
An overview of the proposed method is shown in Fig. 2, including coarse registration and fine registration in both the training and test phases. Because of the large scale and spatial differences between the original ICGA and MCSL image pair, a coarse registration consisting of translation and scaling can greatly reduce the initial registration error and the difficulty of the subsequent fine registration. The translation parameters are calculated from the centroid coordinates of the paired optic disc labels. The scaling parameter is an empirical value estimated from a statistical analysis of our experimental dataset. The fine registration network is adapted from a ResNet18 model [47]. During training of the fine registration network, the predicted affine parameters and the registered moving label are used to calculate the supervised loss function $L_{rmse}$ (Eq. (10)) and the weakly-supervised loss function $L_{dice}$ (Eq. (11)), respectively, which together form the dual-supervision loss function $L$ (Eq. (9)).
The networks are trained to obtain the optimal network parameters $\xi$ by minimizing the dual-supervision loss function $L$, and then predict the optimal affine transformation parameters $M$. Here $\Gamma$ represents the proposed networks and $\xi$ denotes the parameters of the neural network. $x_f$ and $x_m$ are the original fixed image and moving image, respectively. $l_f$ and $l_m$ are the corresponding fixed label and moving label, respectively. $M_{gt}$ and $M$ are the ground truth and predicted transformation parameters, respectively.
In the test phase, the original image pair $(x_f, x_m)$ and the corresponding label pair $(l_f, l_m)$ are coarsely registered first, and the coarsely registered image pair and label pair are then sent to the trained fine registration network to predict the parameters. In the coarse registration stage $\Gamma_{coarse}$, two trained U-Net networks [27] automatically segment the optic disc labels $(l_f, l_m)$. The translation parameters obtained by aligning the label centroid coordinates, combined with the scaling parameter, give the coarse registration affine matrix $M_{coarse}$. The coarsely registered moving image $x_m^w$ and the corresponding label $l_m^w$ are then obtained by Eq. (4):

$$\mathrm{Resampler}(x_m, l_m; M_{coarse}) = x_m^w, l_m^w \qquad (4)$$

where $\mathrm{Resampler}(\cdot)$ denotes the affine transformation (resampling) operation.
In the fine registration stage $\Gamma_{fine}$, the trained fine registration network predicts the affine transformation matrix $M_{fine}$. Finally, the original moving image $x_m$ is interpolated only once to obtain the final registered image $x_m^{reg}$:

$$\mathrm{Resampler}(x_m; M_{coarse} M_{fine}) = x_m^{reg} \qquad (6)$$

This avoids the information loss caused by multiple interpolations. No manual registration parameters or manually labeled optic disc labels are required during the test phase.
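The single-interpolation idea can be illustrated at the coordinate level. The following is a minimal sketch, not the authors' implementation: the matrix values are hypothetical, and it assumes a column-vector convention in which the transform applied first is the right-hand factor. The order in which the two matrices are written therefore depends on the convention of the resampler (for example, whether the matrix maps moving coordinates forward or defines a sampling grid).

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_affine(m, point):
    """Map a 2-D point (x, y) through a 3x3 homogeneous affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical matrices chosen for illustration (not values from the paper).
m_coarse = [[1.0, 0.0, 12.0], [0.0, 0.6, -8.0], [0.0, 0.0, 1.0]]
m_fine = [[1.02, 0.01, 1.5], [-0.01, 0.98, -0.7], [0.0, 0.0, 1.0]]

# Assumed convention: column vectors, so "coarse first, then fine" is
# written as m_fine @ m_coarse.
m_total = matmul3(m_fine, m_coarse)

point = (100.0, 150.0)
two_step = apply_affine(m_fine, apply_affine(m_coarse, point))  # two mappings
one_step = apply_affine(m_total, point)                         # one mapping
```

Because the composed matrix maps every coordinate to the same place as the two successive mappings, the image only needs to be resampled once with the composed matrix.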
We summarize some advantages of the proposed framework shown in Fig. 2. First, we make full use of the optic disc label in both stages of registration, which addresses the problem that retinal vessels are barely visible in the late phase of ICGA imaging (see Fig. 1(a)-(d)). Second, the supervised and weakly-supervised loss functions are effectively combined in the proposed network. Such a dual supervision mechanism is a trend in deep learning based registration [48].

Affine registration
Affine registration is a linear, global transformation in which the same transformation parameters apply to every pixel. That is, the coordinates of the moving image can be mapped to the fixed image through a single set of parameters. Equation (7) gives the mathematical expression of the affine transformation (rotation about the image origin):

$$\begin{bmatrix} X' \\ Y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \qquad (7)$$

Equation (8) also gives the detailed composition of the affine transformation matrix, where $\theta$ is the rotation angle, $k_x$ and $k_y$ are the shear coefficients in the x and y directions, and $\lambda_x$ and $\lambda_y$ are the scaling factors in the x and y directions, respectively.
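To make the roles of the rotation, shear, scaling and translation parameters concrete, the sketch below builds a homogeneous affine matrix from them. The composition order used here (scaling, then shear, then rotation, then translation) is an assumption for illustration; the text does not state the order used in the paper, and the function name `affine_from_parameters` is hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def affine_from_parameters(theta, k_x, k_y, lam_x, lam_y, b_1, b_2):
    """Build a 3x3 affine matrix from rotation, shear, scale and translation."""
    c, s = math.cos(theta), math.sin(theta)
    rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    shear = [[1.0, k_x, 0.0], [k_y, 1.0, 0.0], [0.0, 0.0, 1.0]]
    scale = [[lam_x, 0.0, 0.0], [0.0, lam_y, 0.0], [0.0, 0.0, 1.0]]
    translation = [[1.0, 0.0, b_1], [0.0, 1.0, b_2], [0.0, 0.0, 1.0]]
    # Assumed order: scale first, then shear, then rotation, then translation.
    return matmul3(translation, matmul3(rotation, matmul3(shear, scale)))


def apply_affine(m, point):
    """Map a 2-D point (x, y) through a 3x3 homogeneous affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For instance, `apply_affine(affine_from_parameters(0.0, 0.0, 0.0, 2.0, 3.0, 5.0, 7.0), (1.0, 1.0))` scales the point by (2, 3) and then shifts it by (5, 7).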

Coarse registration
To make the deep learning network suitable for multimodal image registration with large deformation, it is necessary to reduce the initial deformation. In this paper, prior information in the image is used to coarsely register the image pairs. In ICGA and MCSL images, the most remarkable features are the retinal vessels and the optic disc. However, the retinal vessels are very faint in the late-stage ICGA images (as shown in Fig. 1(c) and (d)) or severely disturbed by the choroidal vessels. Moreover, the noise is severe in the middle-stage ICGA images (as shown in Fig. 1(a) and (b)). Therefore, the retinal vessels are not suitable as prior information.
In contrast, the optic disc, with its near-circular structure, is robust in both modalities, so it can be used as the prior information in the coarse registration.
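A minimal sketch of centroid-based coarse alignment follows. It assumes the form that this section describes for the coarse step: a translation plus a vertical scaling of 0.6, with $t_x = X_1 - X_2$ and $t_y = Y_1 - 0.6\,Y_2$, where $(X_1, Y_1)$ and $(X_2, Y_2)$ are taken to be the fixed and moving optic disc centroids. The function names and the list-of-rows mask representation are hypothetical.

```python
def centroid(mask):
    """Centroid (x, y) of the nonzero pixels of a binary mask (list of rows)."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("mask contains no foreground pixels")
    return sum_x / count, sum_y / count


def coarse_matrix(fixed_centroid, moving_centroid, scale_y=0.6):
    """Translation + vertical scaling that maps the moving centroid onto the fixed one."""
    x1, y1 = fixed_centroid
    x2, y2 = moving_centroid
    t_x = x1 - x2              # assumed form of the x translation
    t_y = y1 - scale_y * y2    # assumed form of the y translation
    return [[1.0, 0.0, t_x], [0.0, scale_y, t_y], [0.0, 0.0, 1.0]]


def apply_affine(m, point):
    """Map a 2-D point (x, y) through a 3x3 homogeneous affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

With this form, the moving centroid lands exactly on the fixed centroid after the transformation, which is the property the coarse stage relies on.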
In the training phase, the manually labeled optic disc labels are used to calculate the affine matrix for the coarse registration of the image pair and the corresponding optic disc label pair. In the test phase, as shown in Fig. 3, two individually trained U-Net networks automatically segment the optic disc labels in the ICGA and MCSL images, respectively, for coarse registration. The centroid coordinates of the fixed optic disc label $(X_1, Y_1)$ and of the moving one $(X_2, Y_2)$ are used to calculate the coarse registration matrix $M_{coarse}$, and the coarse registration transformation is applied to the moving image and its optic disc label. Equation (8) describes the process of coarse registration, where $t_x = X_1 - X_2$ and $t_y = Y_1 - 0.6\,Y_2$ are the translation parameters in the x and y directions, and 0.6 is the empirical value of the scaling parameter in the height direction.

Fine registration
To reduce the network's dependence on the ground truth parameters and further improve the registration performance, the optic disc label information is used as auxiliary supervision, and a dual-supervised loss function is adopted to optimize the fine registration network. Equation (9) gives the loss function of the network, which is composed of a root mean square error (RMSE) loss and a Dice loss.
The network uses the RMSE loss function $L_{rmse}$ in Eq. (10) to evaluate the difference between the ground truth affine parameters and the predicted affine parameters over each mini-batch.
where $b$ is the batch size, $v(\cdot)$ denotes the vector form of a matrix, and $M_{gt}^i$ and $M^i$ are the $i$-th ground truth affine parameter matrix and the corresponding predicted affine parameter matrix in a batch, respectively.
The predicted affine registration parameters are applied to the coarsely warped optic disc label $l_m^w$ to obtain the registered label $l_m^{reg}$ by spatial resampling. The Dice loss between $l_m^{reg}$ and the fixed label $l_f$ is given in Eq. (11).
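The combination of an RMSE term on the affine parameters and a Dice term on the optic disc labels can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas: the RMSE is averaged over all parameter entries in the batch, the Dice loss uses a small smoothing constant `eps`, and the two terms are summed with a hypothetical weight `weight_dice`.

```python
import math


def rmse_loss(pred_batch, gt_batch):
    """RMSE between flattened predicted and ground truth affine parameters."""
    total, count = 0.0, 0
    for pred, gt in zip(pred_batch, gt_batch):
        for p, g in zip(pred, gt):
            total += (p - g) ** 2
            count += 1
    return math.sqrt(total / count)


def dice_loss(registered_label, fixed_label, eps=1e-6):
    """1 - Dice overlap between two flattened labels with values in [0, 1]."""
    intersection = sum(r * f for r, f in zip(registered_label, fixed_label))
    total = sum(registered_label) + sum(fixed_label)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def dual_supervised_loss(pred_batch, gt_batch, registered_label, fixed_label,
                         weight_dice=1.0):
    """Supervised RMSE term plus weakly-supervised Dice term (assumed weighting)."""
    return (rmse_loss(pred_batch, gt_batch)
            + weight_dice * dice_loss(registered_label, fixed_label))
```

The RMSE term ties the prediction to the manually derived parameters, while the Dice term rewards overlap of the warped and fixed optic disc labels even when the ground truth parameters are imperfect.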

Implementation details
All experiments were implemented in PyTorch on a Linux server running Ubuntu 16.04 with an Intel Core i7-8700 CPU and 8 GB of RAM. The networks were trained on a single NVIDIA GeForce GTX 1060 GPU with 3 GB of memory. In the coarse registration stage, the initial learning rate is set to 1e-3 with the SGD optimizer and the Poly learning rate strategy. In the fine registration stage, the learning rate is 1e-3 and the Adam optimizer is used. The batch size $b$ is set to 16.
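For readers unfamiliar with the Poly strategy, the sketch below shows the commonly used polynomial decay rule. The exponent `power=0.9` and the total number of steps are assumptions, since the text only gives the initial learning rate of 1e-3.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial ("Poly") learning rate decay from base_lr down to zero."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    return base_lr * (1.0 - step / total_steps) ** power


# Example schedule with a hypothetical budget of 1000 training steps.
schedule = [poly_lr(1e-3, step, 1000) for step in range(0, 1001, 250)]
```

The rate starts at the initial value and decreases monotonically to zero at the final step.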

Dataset
The collection and analysis of image data were approved by the Institutional Review Board of Shanghai General Hospital and adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from each subject. The medical records and the ICGA (Heidelberg Retina Angiography 2, Heidelberg Engineering, Heidelberg, Germany, 596 × 496 pixels) and MCSL (SPECTRALIS, Heidelberg Engineering, Heidelberg, Germany, from 596 × 496 pixels to 960 × 496 pixels) databases of Shanghai General Hospital from July 2018 to June 2019 were searched and reviewed. In total, 117 pairs of ICGA and MCSL images from 112 eyes (85 patients) were included, of which 102 had linear lesions (such as Fig. 1(a) and (c)) and 15 had no linear lesions (such as Fig. 1(b) and (d)). The heights of the original MCSL images are fixed at 496 pixels, while the widths vary from 596 to 960 pixels because of different view angles. A previous study [49] reported that linear lesions are hypofluorescent in the late ICGA phase, which is 15 minutes after ICG dye injection. In the late phase, blood vessels have different morphologies and appear bright white (such as Fig. 1(a) and (b)) or dark gray (such as Fig. 1(c) and (d)).
To reduce the computational cost, all images are resampled to (256, 256), converted to grayscale and normalized to [0, 1]. Online data augmentation was applied during training, including rotation in [-5°, 5°], translation in [-6, 6] and scaling in [0.9, 1.2]. Five-fold cross validation was adopted to evaluate the performance of the proposed method. All data were randomly split into five parts according to the subjects and initial registration errors, containing 23, 23, 23, 24 and 24 image pairs.
Ground truth. Under the supervision of experienced ophthalmologists, three pairs of key points, including intersections and bifurcations of blood vessels, are manually selected in each ICGA image and the corresponding MCSL image. The ground truth affine parameters are calculated from these three pairs of key points. The ground truth of the optic disc is manually labeled under the supervision of an experienced ophthalmologist.
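Three non-collinear point correspondences determine an affine transformation uniquely, because each output coordinate gives three linear equations in three unknowns. The following sketch shows one way to recover the six parameters with Cramer's rule. It assumes the matrix maps moving-image points to fixed-image points, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve the 3x3 linear system a @ x = rhs with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("key points are collinear; affine matrix is not unique")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in a]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_three_pairs(moving_pts, fixed_pts):
    """Affine matrix mapping three moving points onto three fixed points."""
    a = [[x, y, 1.0] for x, y in moving_pts]
    row_x = solve3(a, [p[0] for p in fixed_pts])  # a11, a12, b1
    row_y = solve3(a, [p[1] for p in fixed_pts])  # a21, a22, b2
    return [row_x, row_y, [0.0, 0.0, 1.0]]
```

In practice the selected points should be well spread over the image, since nearly collinear points make the solution sensitive to annotation error.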

Evaluation of optic disc segmentation
In the coarse registration stage, the optic disc is segmented with the original U-Net in both ICGA and MCSL images. The IoU (Intersection over Union), Dice coefficient, sensitivity, specificity and accuracy of the optic disc segmentation are shown in Table 1. These results indicate good segmentation performance and support the feasibility of both the optic disc centroid calculation in the coarse registration stage and the weak supervision based on prior image features in the fine registration stage.

Metrics
To evaluate the performance of the proposed method objectively, three metrics are adopted in this paper: the RMSE of the distance on five key points, and the Dice similarity coefficient (DSC) [50] and target registration error (TRE) [33] on the optic disc label pair. The DSC reflects the degree of overlap of the optic disc label pair, and the TRE reflects the distance between the centers of the optic disc label pair. Paired Wilcoxon signed-rank tests (significance level $\alpha_H = 0.05$) are applied to compare the medians of the registration results between different methods.
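The three metrics can be sketched as follows. This is an illustrative reading of the definitions above, with assumptions: masks are binary lists of rows, the TRE is taken as the Euclidean distance between label centroids, and the key point RMSE is the root of the mean squared point-to-point distance. The function names are hypothetical.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks (lists of rows)."""
    inter = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += 1 if a else 0
            size_b += 1 if b else 0
            inter += 1 if (a and b) else 0
    if size_a + size_b == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * inter / (size_a + size_b)


def centroid(mask):
    """Centroid (x, y) of the nonzero pixels of a binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def target_registration_error(mask_a, mask_b):
    """Euclidean distance between the centroids of two label masks."""
    (xa, ya), (xb, yb) = centroid(mask_a), centroid(mask_b)
    return math.hypot(xa - xb, ya - yb)


def keypoint_rmse(points_a, points_b):
    """Root mean squared distance between corresponding key points."""
    squared = [(xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))
```

A higher DSC and lower TRE and RMSE indicate better alignment.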

Ablation experiments
In this section, the effect of the two-stage network and the improvement brought by the dual-supervised loss function are investigated. Table 2 and Table 3 show the results of the ablation experiments, in which "Baseline" means single-stage registration (fine registration network only) with a single supervised RMSE loss, "TS" means two-stage registration with a single supervised RMSE loss, and "TD" means two-stage registration with the dual-supervised loss combining RMSE loss and Dice loss. As can be seen from Table 2, the proposed "TD" framework achieved a median RMSE of 4.42 pixels on five key points (first and third quartiles: 3.13 and 5.91 pixels), a median DSC of 0.888 on the optic disc label (first and third quartiles: 0.847 and 0.919), and a median TRE of 3.162 pixels on the label centroids (first and third quartiles: 1.58 and 4.30 pixels). More detailed results are summarized in Table 3 and shown in Fig. 5. Compared with the original image pairs, the "Baseline" method significantly decreased the RMSE of the key points (p-value < 0.001) and the TRE of the optic disc label centroids (p-value < 0.001), and significantly increased the DSC of the optic disc label pair (p-value < 0.001). The "TS" method significantly outperformed the "Baseline" method on all three metrics (RMSE, DSC and TRE) with p-values < 0.001. This result indicates that coarse registration based on optic disc label centroid alignment effectively reduces the initial error and the difficulty of fine registration. The proposed "TD" method significantly surpassed the "TS" method in RMSE and TRE with p-values < 0.001 and in DSC with p-value = 0.0011, which indicates that the auxiliary supervision of the optic disc label can further refine the registration result. Notably, the improvement in RMSE between "TS" and "TD" in Table 2 and Table 3 appears slight. We think the reason may be that the annotated key points are relatively sparse, so fine adjustments (such as small scaling, translation, and rotation) cannot be reflected well in the RMSE indicator. However, as shown in Fig. 6(c) and (d), these fine adjustments do improve the overall registration result.
Figure 6 shows an example of registration results with different methods (the corresponding original image pair is shown in Fig. 1(a) and (e)). It can be seen from Fig. 6 that the overlap of both the retinal vessels and the optic discs gradually increases from left to right, which indicates that the registration is progressively improved by the two-stage registration and dual-supervision strategies. Comparing the overlap of the optic discs and the corresponding labels in Fig. 6(b) and Fig. 6(c) shows that the coarse-to-fine registration strategy is effective. Comparing the overlap of the retinal vessels and the optic disc labels in Fig. 6(c) and Fig. 6(d) illustrates that combining the supervised and weakly-supervised loss functions further refines the registration.

Fig. 5. Boxplot of the cross-validation results obtained from the networks described in Section 2. The numerical results are also summarized in Table 2.
Figure 7 shows some registration results for two-modality image pairs with linear lesions. It can be seen from Fig. 7 that the linear lesions in the ICGA and MCSL images are well aligned, which indicates the possibility of non-invasive diagnosis of linear lesions via MCSL imaging.
The influence of training the model on data with or without lesions is also investigated in this section. Besides the model trained on the mixed data (data with and without lesions, shown in Table 2 and Table 3 as "TD"), an additional model is trained only on the abnormal data (102 pairs of images with lesions). Because there are only 15 pairs of normal data (data without lesions), we do not train a network only on the normal data. We then validate these two models with the mixed data, abnormal data and normal data, respectively. Table 4 shows the corresponding cross-validation registration results. As can be seen from Table 4, the overall performance of the model trained with mixed data (TD-M) is generally better than that of the model trained with abnormal data (TD-A). We think the possible reason is the relatively large quantity and good diversity of the training samples (data with and without lesions) for model TD-M. In addition, for model TD-A, the validation results on normal data are worse than those on abnormal data. The reason may be that model TD-A learns to take advantage of linear lesion features, which are not available in the normal data.

Conclusion and discussions
Pathological myopia, which develops from high myopia and its complications, is one of the main causes of blindness worldwide. The timely detection and analysis of linear lesions is necessary and effective for the prevention, monitoring and treatment of pathological myopia. In our previous related research [7], an improved cGAN based framework was proposed to segment linear lesions in ICGA images. As mentioned above, ICGA is invasive and some patients may suffer allergic reactions. To address this problem, our team is evaluating whether non-invasive MCSL imaging can replace invasive ICGA imaging in linear lesion diagnosis and analysis. The conclusion of this evaluation will be drawn from the results of joint linear lesion segmentation in MCSL and ICGA images, which is our ongoing and challenging research. The MCSL and ICGA registration studied in this paper is the necessary premise of the joint linear lesion segmentation and the subsequent evaluation.
In this paper, we propose a deep learning based two-stage framework for the registration of ICGA and MCSL images, which consists of a coarse registration stage and a fine registration stage. The optic disc label information is fully used in both stages to increase the robustness and effectiveness of the network. We also combine supervised and weakly-supervised learning strategies to train the fine registration network, implemented through the RMSE loss on the affine parameters and the Dice loss on the optic disc label, respectively.
There are still some shortcomings in this paper: (1) The experimental dataset is small, with only 117 image pairs (102 with linear lesions and 15 without linear lesions). The generalization of the proposed registration network can be improved by enlarging the dataset, especially the amount of data without linear lesions. (2) To simplify the coarse registration stage, the original U-Net is used to segment the optic disc, and its segmentation error (shown in Table 1) may slightly affect the registration performance. The optic disc segmentation accuracy should be further improved with an improved U-Net or other advanced CNNs so that more information, such as the edge of the optic disc, can be fully used in the two-modality image registration. Introducing an adversarial training strategy and using the discriminator as a similarity measure for multi-modality image registration is also a direction of our future work. (3) Although both ICGA and MCSL use confocal laser scanning imaging technology, the distortion caused by the retina's natural curvature cannot be unified for the following two reasons: (a) the ICGA and MCSL images used in our experiments were acquired with two different devices; (b) the wavelength and number of lasers are different (ICGA: 795 nm; MCSL: 488 nm, 515 nm and 820 nm). Affine registration may therefore not be sufficient to model the transformation between ICGA and MCSL. In future work, we will explore registration algorithms that combine affine and non-rigid transformations to achieve better registration performance. Building on further improvements in registration accuracy, we will use the complementary characteristics of the multi-modal information for the non-invasive detection and analysis of linear lesions in high myopia.
As shown in Fig. 1(a)-(d), the retinal vessels in ICGA images appear hyper-fluorescent with a noisy background (Fig. 1(a)) or low contrast (Fig. 1(b)) at about 20 minutes, and blurry and hypo-fluorescent at the very late stage (about 30+ minutes, Fig. 1(c) and (d)).

Fig. 2. The framework of the proposed method. In the training phase, the black dashed lines represent the data flow of dual supervision. In the test phase, the red lines represent the data flow of the final registration.
In Eq. (7), $(X, Y)$ and $(X', Y')$ represent the coordinates of the moving image and the fixed image, respectively. $a_{11}$, $a_{12}$, $a_{21}$ and $a_{22}$ are the deformation parameters; $b_1$ and $b_2$ are the translation parameters along the x and y axes, respectively. Together, the parameters $a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$, $b_1$ and $b_2$ constitute the coordinate transformation matrix.

Figure 4 presents the structure of the fine registration network, which consists of two parts: feature extraction layers and regression layers.The feature extraction layers are implemented based on Resnet18 without classification layer.The regression layers contain four cascaded fully connected layers after the last global average pooling layer of Resnet18, with 512, 256, 64

Fig. 6. An example of registration results with different methods. The corresponding original ICGA and MCSL image pair is shown in Fig. 1(a) and (e), respectively. From top to bottom, the rows show the overlap of the image pair after registration, the corresponding magnified sections in the yellow box, and the overlap of the optic disc label pair (white region) after registration. Each column shows the result of a different method: (a) Before registration. (b) Baseline. (c) Two stage + Single supervised loss function (TS). (d) Two stage + Dual supervised loss function (TD). (e) Ground Truth (GT).

Fig. 7. The registration results of ICGA and MCSL images with linear lesions. The yellow arrows indicate linear lesions. The last row shows an enlarged view of the image region in the red box.

Table 2. The cross-validation registration results of ablation experiments, measured with percentiles [25th, 50th, and 75th]. a
a "Bef. reg." represents before registration; "TS" represents Two stage + Single supervised loss function; "TD" represents Two stage + Dual supervised loss function.

Table 3. The cross-validation registration results of ablation experiments, measured with mean and standard deviation. a
a "Bef. reg." represents before registration; "TS" represents Two stage + Single supervised loss function; "TD" represents Two stage + Dual supervised loss function.

Table 4. The cross-validation registration results of ablation experiments, measured with percentiles [25th, 50th, and 75th] (the first 6 rows) and mean and standard deviation (the next 6 rows). a
a "TD" represents Two stage + Dual supervised loss function. "TD-M" and "TD-A" represent the model trained with mixed data and the model trained with abnormal data, respectively.