An adversarially consensus model of augmented unlabeled data for cardiac image segmentation (CAU+)

Abstract: High-quality medical images play an important role in intelligent medical analysis. However, medical images with professional annotations are difficult to acquire, which makes the required medical image datasets expensive and time-consuming to build. In this paper, we propose a semi-supervised method, CAU+, which is a consensus model of augmented unlabeled data for cardiac image segmentation. First, the model is divided into two parts: the segmentation network and the discriminator network. The segmentation network is based on the teacher-student model. A labeled image is sent to the student model, while an unlabeled image is processed by CTAugment. The strongly augmented samples are sent to the student model and the weakly augmented samples are sent to the teacher model. Second, CAU+ adopts a hybrid loss function, which mixes the supervised loss for labeled data with the unsupervised loss for unlabeled data. Third, adversarial learning is introduced to facilitate the semi-supervised learning of unlabeled images by using the confidence map generated by the discriminator as a supervised signal. Evaluated on the Automated Cardiac Diagnosis Challenge (ACDC) dataset, CAU+ shows good effectiveness and generality. Compared with the latest semi-supervised learning methods, it improves the Dice coefficient (DSC) by up to 18.01, the Jaccard coefficient (JC) by up to 16.72, and the relative absolute volume difference (RAVD) by up to 0.8, and it reduces the average surface distance (ASD) and the 95% Hausdorff distance ($HD_{95}$) by over 50%.


Introduction
According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death worldwide. In 2016, 17.9 million people died from CVDs, with heart disease and stroke being the leading CVDs, and this number is increasing every year. Significant advances in cardiovascular research and practice have been made in recent decades, aimed at improving the diagnosis and treatment of heart diseases and at reducing CVD mortality. Modern medical imaging techniques, such as magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound, are widely used. They allow the non-invasive qualitative and quantitative assessment of cardiac anatomy and function for diagnosis, disease monitoring, treatment planning, and prognosis [1].
Cardiac image segmentation is an important first step in many applications. It partitions the image into a number of semantically (i.e., anatomically) meaningful regions, from which quantitative metrics such as myocardial mass, wall thickness, left ventricle (LV) and right ventricle (RV) volumes, and ejection fraction (EF) can be extracted. Typically, the anatomical structures of interest for cardiac image segmentation include the left ventricle, right ventricle, left atrium, right atrium, and coronary arteries.
Cardiac MRI (constructed from a series of parallel short-axis slices) is considered the gold standard for the functional analysis of the heart because of its well-known ability to differentiate between different types of tissue [2]. However, cardiac MRI segmentation faces several difficulties. These include inherent noise caused by motion artefacts and cardiac dynamics, as well as variations in the shape and intensity of cardiac structures from patient to patient and from condition to condition [3].
Current fully supervised methods for cardiac image segmentation are mainly based on convolutional neural networks using fully convolutional network (FCN) [4][5][6][7] or U-Net [8][9][10] architectures. However, one of the major challenges for deep learning methods is the scarcity of annotated data, especially in the field of medical imaging. Most studies have used a fully supervised approach to train their networks, but this requires many annotated images, and annotating cardiac images is time-consuming and requires considerable expertise.
Based on this, many semi-supervised algorithms have been applied to the field of medical image segmentation. In a semi-supervised learning task, only a small fraction of the training images is assumed to have full pixel-level annotations, while a large number of unlabeled images are available to improve accuracy and generalization. Since unlabeled data does not require labor-intensive annotation, any performance gains from using unlabeled data are low-cost. The challenge in this learning scenario is to use the large amount of unlabeled data effectively and thoroughly.
In this paper, we propose a semi-supervised method, CAU+, for cardiac image segmentation. This method is based on our previously proposed method CAU [11]. Two mechanisms are used for upgrading from CAU to CAU+. First, inspired by ReMixMatch, we replace the data augmentation of CAU with CTAugment so that the model can dynamically learn the augmentation strategy during the training process. Second, inspired by AdvSemiSeg [12], adversarial learning is introduced by replacing the generator with a segmentation network (i.e., the teacher-student model in CAU), and a confidence map generated by the discriminator guides the loss function as a supervised signal.
The contributions of this paper include the following:
(1) We propose a semi-supervised algorithm for cardiac image segmentation, namely "an adversarially consensus model of augmented unlabeled data (CAU+)", which enables low-cost and high-precision segmentation of cardiac images.
(2) Our method combines a teacher-student model, and the overall framework is based on a weighted combination of supervised and unsupervised losses from this model. In this way, false identification is avoided and the regularization effect is improved.
(3) We extend the strong and weak augmentation of the data into CTAugment, which uses ideas from control theory to dynamically learn the magnitude of each transformation during the training process.
(4) We propose a combination of unsupervised losses to make full use of unlabeled data, i.e., minimizing the difference between network predictions under different data augmentation treatments and minimizing the entropy of the output of both networks. Based on this, adversarial learning is introduced: an adversarial loss is added to the unsupervised loss and a discriminator is trained, so that the confidence map generated by the discriminator serves as a supervised signal that facilitates semi-supervised learning of unlabeled images.
(5) We validate CAU+ on the ACDC dataset, experimentally demonstrating the effectiveness of our method. Experiments show that CAU+ improves over the original CAU in almost all experiments with the same amount of data (up to 1.17 higher DSC, up to 5.64 lower ASD, up to 1.94 lower $HD_{95}$, up to 3.24 improvement in JC and up to 0.06 improvement in RAVD). Compared with the latest semi-supervised learning methods, CAU+ improves DSC by up to 18.01, JC by up to 16.72, and RAVD by up to 0.8, and reduces ASD and $HD_{95}$ by more than 50%. With only 35% and 50% of the labeled data, it also outperforms a fully supervised algorithm that uses all labeled data in the ACDC dataset.

Semi-supervised learning
Many semi-supervised learning methods improve the generalization of the model by adding a loss term computed on unlabeled data. The loss term usually consists of the following:
1) Entropy minimization, which encourages the model to output high-confidence predictions on unlabeled data.
2) Consistency regularization, which encourages the model to output the same probability distribution after the data is perturbed.
3) Generic regularization, which encourages better generalization and reduces overfitting.
MixMatch [15] achieves good results by combining these methods into one loss. ReMixMatch [16] improves on MixMatch with two components: Distribution Alignment, which aligns the distribution of predictions on unlabeled data with that of the labeled data, and Augmentation Anchoring, which uses the predictions for the weakly augmented samples as the training target for the strongly augmented versions. To generate a strong augmentation, ReMixMatch proposes a variant of AutoAugment, called CTAugment, which can learn the augmentation strategy during training. FixMatch [17] simplifies MixMatch and ReMixMatch by using weak augmentation to obtain a pseudo-label for unlabeled data, and then using the pseudo-label to supervise the output for the strongly augmented version.

Automated data augmentation based on control theory
Data augmentation is an effective technique to improve the accuracy of modern image classifiers. AutoAugment [18] is a method for learning data augmentation strategies that improve validation accuracy. The augmentation strategy consists of a set of (transformation, parameter magnitude) tuples to be applied to each image. Crucially, however, AutoAugment is learned under supervision (i.e., the magnitude and order of the transformations are determined by training many models on a proxy task). This makes AutoAugment problematic for semi-supervised learning with few labels, especially for medical images, where labeled images are sparse. To remove the need to train the strategy on labeled data, RandAugment [19] samples transformations uniformly at random, but this requires tuning the hyperparameters of the random sampling on the validation set, which is also methodologically difficult when very little labeled data is available.
ReMixMatch introduces a control-theory-based variant of AutoAugment, called CTAugment, which removes AutoAugment's need to learn the augmentation strategy separately before training. Unlike AutoAugment, CTAugment learns the augmentation strategy while the model is being trained, making it particularly convenient to set up in semi-supervised learning.
In CTAugment, there is a set of 18 possible transformations, and the magnitude values of each transformation are divided into bins, with each bin assigned a weight. Initially, all bins have a weight of 1. Two transformations are randomly selected from this set with equal probability to form a sequence of transformations, similar to RandAugment. For each transformation, a magnitude bin is randomly selected based on the normalized bin weights. Labeled samples are augmented by these two transformations and fed to the model to measure how close the model predictions are to the actual labels, and the bin weights of these transformations are then updated accordingly. In this way, CTAugment learns to select augmentations for which the model has a higher chance of predicting the correct label, and thus augments within the network tolerance.

Adversarial learning for image segmentation
Generative adversarial networks are based on a game, in the game-theoretic sense, between two machine learning models, which are usually implemented using neural networks.
We can think of generative adversarial learning as being a bit like counterfeiters and police: counterfeiters create counterfeit currency, while the police try to arrest counterfeiters and keep legitimate currency in circulation. The competition between the counterfeiters and the police leads to increasingly realistic counterfeits, until the counterfeiters create perfect counterfeits and the police are unable to tell the difference. A complication of this analogy is that the generator learns from the gradient of the discriminator, as if the counterfeiter had planted a mole among the police to report the specific methods the police use to detect counterfeit currency.
Since the framework of the generative adversarial network (GAN) and its theoretical foundations were proposed, it has provided ideas for research in many directions in the field of images. In the area of semi-supervised semantic segmentation of images, several studies have used adversarial methods to make the segmentation of unlabeled images more like the segmentation of labeled images [20][21][22][23]. Considering the spatial resolution, Hung et al. [12] proposed a method for semi-supervised semantic segmentation using adversarial networks, designing a discriminator in a fully convolutional manner to distinguish the predicted probability map from the ground truth segmentation distribution. In the field of medical image segmentation, Xu [25] proposed an adversarial model that allows a boundary mining model to learn from additional unlabeled data by evaluating segmentation performance and by providing pseudo-supervision. Zhang et al. [21] introduced adversarial learning to encourage the segmentation output of unlabeled data to be similar to the annotation of labeled data. Chen et al. [20] added a discriminator after the segmentation network to distinguish whether the input signed distance map is from a labeled or an unlabeled image. These methods always include a discriminator to distinguish whether the input is an annotation from a labeled image or a prediction from an unlabeled image.

[Figure 1: The improved CAU model (CAU+). The unlabeled samples are trained by the similarity loss $\mathcal{L}_{sim}$, the entropy loss $\mathcal{L}_{ent}$, and the adversarial loss $\mathcal{L}_{adv}$. The discriminator network contains the discriminator; $\mathcal{L}_{D}$ trains the discriminator, and $\mathcal{L}_{adv}$ is for adversarial training.]

Figure 1 shows the improved CAU model (CAU+). The model is divided into two parts: the segmentation network and the discriminator network. The segmentation network, like CAU, is based on the teacher-student model. The teacher model and the student model therefore share the same architecture, and in this paper, we use U-Net.
The labeled image is sent to the student model, and the unlabeled image is processed by CTAugment. The strongly augmented samples are sent to the student model and the weakly augmented samples are sent to the teacher model. Meanwhile, the similarity measure of the two models' outputs is calculated and the entropy of the two models' outputs is minimized. In the discriminator network, we add the adversarial loss $\mathcal{L}_{adv}$: the discriminator network computes a confidence map, and in turn, the confidence map is used as a supervised signal to guide the segmentation network (i.e., the teacher-student model). We use all prediction data to train the discriminator network, and the loss function $\mathcal{L}_{D}$ is used to train it.

Adversarial learning in CAU+
Similar to AdvSemiSeg, the model consists of two parts: a segmentation network and a discriminator network. The former can be any network designed for semantic segmentation; in this paper, we use U-Net [26]. Given an input image of size $H \times W \times 3$, the segmentation network outputs a class probability map of size $H \times W \times C$, where C is the number of semantic classes. The framework of the segmentation network is based on the same teacher-student model as CAU. The discriminator network takes a class probability map as input, obtained either from the segmentation network or from the ground truth label after one-hot encoding, and outputs a spatial probability map of size $H \times W \times 1$. Each pixel of this map indicates the source of the input, with a pixel value of 1 indicating the ground truth label and a pixel value of 0 indicating the segmentation network. When using labeled data, the segmentation network is supervised by the supervised loss $\mathcal{L}_{sup}$; for unlabeled data, the adversarial loss $\mathcal{L}_{adv}$ is added to the unsupervised loss of CAU. After obtaining the initial segmentation prediction of unlabeled data from the segmentation network, we compute the confidence map with the discriminator network, and in turn use this confidence map as a supervisory signal to guide the segmentation network. AdvSemiSeg trains the discriminator using only labeled data, but due to the sparsity of medical image data, we use the full data to train the discriminator.

Augmentation in CAU+
CAU+ extends the strong augmentation and weak augmentation processing of data in CAU to CTAugment. CTAugment not only removes the disadvantage that AutoAugment [18] must be trained on a proxy task before it can be used, but also the disadvantage that the hyperparameters of RandAugment are hard to tune when labeled images are rare. Like RandAugment, it samples transforms uniformly at random, but it dynamically infers the magnitude of each transform during training. Intuitively, CTAugment learns the likelihood that a given augmentation will produce an image that is still classified correctly. Using these likelihoods, CTAugment samples only those augmentations that fall within the tolerance of the network. First, as in AutoAugment, CTAugment divides each parameter of each transform into bins of distortion magnitude. Let $m$ be the vector of bin weights for a certain distortion parameter. At the beginning of training, all magnitude bins are set with weights initialized to 1.
These weights are used to determine which magnitude bin is applied to a given image. In each training step, two transforms are sampled uniformly at random for each image. To augment the images for training, CTAugment generates a set of modified bin weights $\hat{m}$ for each parameter of these transforms, where $\hat{m}_i = m_i$ if $m_i > 0.8$ and $\hat{m}_i = 0$ otherwise, and the magnitude bins are drawn from $\mathrm{Categorical}(\mathrm{Normalize}(\hat{m}))$ [15]. To update the weights of the sampled transforms, CTAugment first samples one magnitude bin $m_i$ uniformly at random for each transform parameter. The resulting transforms are then applied to a labeled image $x$ with label $p$, resulting in an augmented version $\hat{x}$ of the image. Then, CTAugment computes
$$\omega = 1 - \frac{1}{2L}\sum \left|\mathrm{model}(y \mid \hat{x}; \theta) - p\right|,$$
which measures how well the model's predictions match the labels. The weight of each sampled magnitude bin is subsequently updated to $m_i = \rho\, m_i + (1-\rho)\,\omega$, where $\rho = 0.99$ is a fixed exponential decay hyperparameter. In this paper, CTAugment is also divided into strong augmentation and weak augmentation. As mentioned in ReMixMatch, the exponential decay hyperparameter does not significantly affect the results, but depth and threshold do, and according to the experiments in ReMixMatch, depth = 2 and threshold = 0.8 give the best results. In our experiments, starting from the depth and threshold values provided by ReMixMatch, the value of the threshold has no significant effect on the results, but the value of the depth does, and both depth = 2 and depth = 1 can produce better results.
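The bin-weight mechanism described above can be sketched in a few lines of plain Python. This is only an illustration of the update rule under stated assumptions, not the implementation used in the experiments: the function names, the number of bins in the test, and the uniform fallback when every bin is suppressed are hypothetical.

```python
import random

DECAY = 0.99      # rho, the fixed exponential decay hyperparameter from the text
THRESHOLD = 0.8   # bins whose weight is not above this value are never sampled


def sample_bin_for_training(weights, rng):
    """Draw a magnitude bin from Categorical(Normalize(m_hat))."""
    modified = [w if w > THRESHOLD else 0.0 for w in weights]
    if sum(modified) == 0.0:
        # Assumption (not from the text): fall back to uniform sampling
        # if every bin has been suppressed.
        modified = [1.0] * len(weights)
    return rng.choices(range(len(weights)), weights=modified, k=1)[0]


def match_score(predicted, label):
    """omega = 1 - (1 / 2L) * sum |prediction - label| over L classes."""
    num_classes = len(label)
    l1 = sum(abs(p - q) for p, q in zip(predicted, label))
    return 1.0 - l1 / (2.0 * num_classes)


def update_bin(weights, bin_index, omega):
    """Apply m_i <- rho * m_i + (1 - rho) * omega to the sampled bin."""
    weights[bin_index] = DECAY * weights[bin_index] + (1.0 - DECAY) * omega
    return weights
```

Because a bin is only sampled for training while its weight stays above the threshold, magnitudes on which the model's predictions repeatedly disagree with the labels gradually drop out of the augmentation pool.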

Mean teacher based semi-supervised framework
Our segmentation network is based on the mean teacher architecture, which consists of two identical models, the student model and the teacher model. The weights of both models are randomly initialized at the beginning of training. The weights of the teacher model are set using an exponentially weighted moving average (EMA) of successive student weights:
$$\theta'_t = \alpha\,\theta'_{t-1} + (1-\alpha)\,\theta_t$$
where $\theta_t$ is the parameter of the student model and $\theta'_t$ is the parameter of the teacher model at training step $t$. $\alpha$ is a smoothing-factor hyperparameter that controls how much of the training history the EMA covers. According to the experience of [14], setting $\alpha = 0.999$ achieves good performance. Therefore, $\alpha$ is also set to 0.999 in this paper. Each prediction of the teacher model can be considered as an ensemble of the current and previous versions of the student model.
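As a minimal sketch of the EMA update, assuming for illustration that model weights are stored in a plain dictionary of floats (the helper name `ema_update` is hypothetical):

```python
ALPHA = 0.999  # smoothing factor used in the text


def ema_update(teacher, student, alpha=ALPHA):
    """Return teacher weights after one EMA step.

    Each teacher weight becomes alpha * teacher + (1 - alpha) * student,
    so the teacher changes slowly and averages successive student models.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }
```

With $\alpha = 0.999$, a single step moves the teacher only 0.1% of the way toward the current student, which is why its predictions behave like an ensemble over many recent student versions.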

Loss function
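CAU+ consists of a segmentation network and a discriminator network. The segmentation network is trained by the following loss function:
$$\mathcal{L}_{seg} = \mathcal{L}_{sup} + \lambda_{1}\mathcal{L}_{sim} + \lambda_{2}\mathcal{L}_{ent} + \lambda_{adv}\mathcal{L}_{adv} \qquad (1)$$
where $\mathcal{L}_{sup}$ is the loss for training on labeled data, and $\mathcal{L}_{sim}$, $\mathcal{L}_{ent}$, and $\mathcal{L}_{adv}$ are the losses for training on unlabeled data.
$\lambda_{adv}$ is a hyperparameter, and $\lambda_{1}$ and $\lambda_{2}$ are weight factors, which are defined by a time-dependent Gaussian warm-up function of the form $\lambda(t) = 0.1 \cdot e^{-5\left(1 - t/t_{max}\right)^{2}}$ [27], where $t$ represents the current training iteration and $t_{max}$ is the total number of iterations. $\mathcal{L}_{sup}$ is defined as:
$$\mathcal{L}_{sup} = \mathcal{L}_{focal}(\hat{y}_i, y_i) + \mathcal{L}_{dice}(\hat{y}_i, y_i)$$
where $\mathcal{L}_{focal}$ is the focal loss and $\mathcal{L}_{dice}$ is the dice loss. $\hat{y}_i$ represents the prediction and $y_i$ represents the label of image $x_i$.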
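The time-dependent weighting can be illustrated with a short sketch. It assumes the commonly used Gaussian warm-up form $0.1 \cdot e^{-5(1 - t/t_{max})^2}$; only the constant 0.1 is stated in the text, so the sharpness value of 5 and the function name are assumptions made for illustration.

```python
import math


def gaussian_warmup(step, total_steps, max_weight=0.1, sharpness=5.0):
    """Weight of an unsupervised loss term at a given training iteration.

    Assumed form: max_weight * exp(-sharpness * (1 - step / total_steps) ** 2).
    The weight starts near zero and rises smoothly to max_weight.
    """
    phase = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * math.exp(-sharpness * (1.0 - phase) ** 2)
```

Under this assumption the unsupervised terms contribute almost nothing early in training, when the teacher's predictions are still unreliable, and reach their full weight of 0.1 at the final iteration.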
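$\mathcal{L}_{sim}$ is defined as:
$$\mathcal{L}_{sim} = H\!\left(\frac{f_{s}(x^{s}) + f_{t}(x^{w})}{2}\right) - \frac{H\!\left(f_{s}(x^{s})\right) + H\!\left(f_{t}(x^{w})\right)}{2}$$
where $H(p)$ is the entropy of $p$. Define the student network as $f_{s}$, and the teacher network as $f_{t}$. $x^{s}$ denotes the strongly augmented data that is input to the student network, and $x^{w}$ denotes the weakly augmented data that is input to the teacher network. We calculate the Jensen-Shannon divergence between the outputs of the student network and the teacher network, which is used to make their predictions for the unlabeled data close.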
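A minimal sketch of the Jensen-Shannon divergence between two class-probability vectors, written in its entropy form. It works on a single pixel's class distribution; applying it per pixel and averaging over the image is an assumption about how the loss would be aggregated.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(v * math.log(v) for v in p if v > 0.0)


def js_divergence(p, q):
    """JS(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2."""
    mixture = [(a + b) / 2.0 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2.0
```

The divergence is symmetric, is zero when the student and teacher agree exactly, and is bounded above by $\ln 2$, which keeps the consistency term well behaved even when the two predictions are very different.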
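Derived from Shannon entropy [28], $\mathcal{L}_{ent}$ is defined as:
$$E_{x}^{(h,w)} = -\frac{1}{\log C}\sum_{c=1}^{C} P_{x}^{(h,w,c)} \log P_{x}^{(h,w,c)}, \qquad \mathcal{L}_{ent} = \sum_{h,w} E_{x}^{(h,w)}$$
where $x$ represents the input image, $P_{x}$ is the class probability map predicted for $x$, and $E_{x}$ is an entropy map consisting of independent pixel-level entropies in the normalized range [0,1]. We can encourage the model to make more confident predictions on unlabeled data by entropy minimization [29].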
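The normalized pixel-level entropy map can be sketched as follows, assuming the prediction is stored as an H x W x C nested list of class probabilities. Dividing by $\log C$ is what keeps each pixel's entropy in [0,1]; summing over pixels is an assumption about the aggregation.

```python
import math


def normalized_pixel_entropy(probs):
    """Entropy of one pixel's class distribution, scaled to [0, 1]."""
    raw = -sum(p * math.log(p) for p in probs if p > 0.0)
    return raw / math.log(len(probs))


def entropy_map(prob_map):
    """Apply the normalized entropy to every pixel of an H x W x C map."""
    return [[normalized_pixel_entropy(px) for px in row] for row in prob_map]


def entropy_loss(prob_map):
    """Sum of the entropy map (assumed aggregation: sum over pixels)."""
    return sum(v for row in entropy_map(prob_map) for v in row)
```

A uniform prediction gives an entropy of 1 and a one-hot prediction gives 0, so minimizing this loss pushes the network toward confident predictions on unlabeled pixels.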
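$\mathcal{L}_{adv}$ is defined as:
$$\mathcal{L}_{adv} = -\sum_{h,w} \log D\!\left(S(x)\right)^{(h,w)}$$
where $D(\cdot)$ indicates the discriminator network and $S(x)$ is the prediction of the segmentation network for the input image $x$. With this loss, we train the segmentation network to deceive the discriminator by maximizing the probability that the prediction results are judged to come from the ground truth distribution.
The discriminator network is trained by a cross-entropy loss, which is defined as:
$$\mathcal{L}_{D} = -\mathbb{E}_{\hat{y}}\!\left[\log\left(1 - D(\hat{y})\right)\right] - \mathbb{E}_{y}\!\left[\log D(y)\right]$$
where $\mathbb{E}_{\hat{y}}$ is the expectation over the prediction values and $\mathbb{E}_{y}$ is the expectation over the true values. The purpose of $\mathcal{L}_{D}$ is to train a discriminator network for adversarial training. The goal of the discriminator network is to distinguish whether the input is a ground truth label map or a probability map generated by the segmentation network.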
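The two adversarial objectives can be sketched on small confidence maps, assuming the discriminator output is an H x W nested list of probabilities in which 1 means "ground truth" and 0 means "segmentation network". The clipping constant and the function names are hypothetical details added for numerical safety and illustration.

```python
import math

EPS = 1e-7  # assumption: clip probabilities to avoid log(0)


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def adversarial_loss(conf_on_prediction):
    """Segmentation-side loss: -sum over pixels of log D(S(x)).

    It is small when the discriminator believes the prediction is ground truth.
    """
    return -sum(math.log(_clip(d)) for row in conf_on_prediction for d in row)


def discriminator_loss(conf_on_prediction, conf_on_label):
    """Cross-entropy for the discriminator.

    Predictions should be scored 0 and ground truth labels should be scored 1;
    expectations are taken as means over pixels.
    """
    pred = [math.log(1.0 - _clip(d)) for row in conf_on_prediction for d in row]
    true = [math.log(_clip(d)) for row in conf_on_label for d in row]
    return -(sum(pred) / len(pred)) - (sum(true) / len(true))
```

The two losses pull in opposite directions on the same confidence map: the discriminator is rewarded for scoring predictions near 0, while the segmentation network is rewarded for pushing those scores toward 1.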

Experiment and analysis
We evaluated CAU and CAU+ on the ACDC dataset. Our methods are compared with five existing semi-supervised algorithms using 10, 15, 20, 35, and 50% of the labeled data, as well as the extreme cases of 1-5% of the labeled data.
The loss function used in the fully supervised algorithm is the same as that used in the semi-supervised algorithms for the labeled data, and the labeled images are randomly selected from the dataset. The base model used in all our experiments is U-Net, a classical and effective model in the field of medical image segmentation. We use the cosine learning rate strategy 0.05 1.0 cos 1 . The optimizer is stochastic gradient descent (SGD) with a learning rate of 0.03. The batch size is 8, and the total number of iterations is 30,000. During training, the 3D images are sliced (the total number of slices is 1562) for 2D segmentation, and the predictions are generated slice by slice and stacked into a 3D volume. The number of labeled slices used for the different percentages in the experiments is shown in Table 1.

Dataset
In this paper, all experiments and comparisons are based on the public benchmark dataset ACDC. The ACDC dataset was created from real clinical examinations obtained at the University Hospital of Dijon, and it has a larger scope than previous cardiac datasets because it includes expert manual segmentation results for the right and left ventricles and the myocardial epicardial contours. Its 200 annotated short-axis cardiac MR images from 100 patients make the ACDC dataset suitable for both clinical and algorithmic studies. The dataset covers the left ventricle (LV), myocardium (Myo), and right ventricle (RV), together with their corresponding segmentation masks. Given the large intervals between short-axis slices and the potential for inter-slice shifts due to respiratory motion, the ACDC dataset is more suitable for 2D segmentation than conventional cardiac images that must be segmented in 3D [30].
We selected 20% of the total dataset as the test set; of the remaining data, 10% is used as the validation set and 90% as the training set. We crop all training images to the same size of 256 × 256, normalize their intensities to the range [0,1], and randomly shuffle them before feeding them into the network for training.
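The split and the intensity normalization can be illustrated with a short sketch. The text does not state whether the split is made over patients, volumes, or slices, so the count of 100 used in the note below is purely illustrative, and both function names are hypothetical.

```python
def split_sizes(n_total, test_frac=0.2, val_frac_of_rest=0.1):
    """Return (train, validation, test) sizes for the split described above."""
    n_test = round(n_total * test_frac)
    n_rest = n_total - n_test
    n_val = round(n_rest * val_frac_of_rest)
    return n_rest - n_val, n_val, n_test


def min_max_normalize(pixels):
    """Rescale a flat list of intensities to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]
```

For 100 items this yields 72 for training, 8 for validation, and 20 for testing, i.e., 72%, 8%, and 20% of the whole dataset.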

Metric
We use five standard metrics to evaluate the performance of CAU+, including DSC, ASD, $HD_{95}$, JC and RAVD.

DSC
DSC is a set similarity measure, usually used to calculate the similarity of two samples, and takes values in [0,1]. The closer it is to 1, the better the result is. It is defined as:
$$DSC = \frac{2TP}{2TP + FP + FN}$$
where $TP$ is true positive, $FP$ is false positive, and $FN$ is false negative.

ASD
ASD is a measure of the distance between two surfaces. It is the average of the distances between each point on one surface and the nearest point on the other surface. It is defined as:
$$ASD(A,B) = \frac{1}{|S(A)| + |S(B)|}\left(\sum_{a \in S(A)} d\big(a, S(B)\big) + \sum_{b \in S(B)} d\big(b, S(A)\big)\right)$$
where $S(A)$ and $S(B)$ represent the sets of surface voxels of $A$ and $B$, and $d\big(v, S(\cdot)\big) = \min_{s \in S(\cdot)} \lVert v - s \rVert$ represents the shortest Euclidean distance from any voxel $v$ to $S(\cdot)$.
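A small sketch of DSC and ASD on toy inputs, assuming surfaces are given as lists of coordinate tuples. Extracting surface voxels from a segmentation mask is outside the scope of this illustration, and the function names are hypothetical.

```python
import math


def dice(tp, fp, fn):
    """DSC = 2TP / (2TP + FP + FN); undefined when all counts are zero."""
    return 2.0 * tp / (2.0 * tp + fp + fn)


def nearest_distance(point, surface):
    """Shortest Euclidean distance from a point to any point of a surface."""
    return min(math.dist(point, s) for s in surface)


def average_surface_distance(surface_a, surface_b):
    """Symmetric average of nearest-point distances between two surfaces."""
    a_to_b = sum(nearest_distance(a, surface_b) for a in surface_a)
    b_to_a = sum(nearest_distance(b, surface_a) for b in surface_b)
    return (a_to_b + b_to_a) / (len(surface_a) + len(surface_b))
```

For example, 8 true positives with 2 false positives and 2 false negatives give a DSC of 0.8, and two parallel contours one pixel apart give an ASD of 1.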

$HD_{95}$
The Hausdorff distance is a measure of the distance between two sets of points. It is defined as the maximum distance from a point in one set to the nearest point in the other set. The 95% Hausdorff distance is the 95th percentile of the ordered distances and is more robust to a small number of outliers. The Hausdorff distance is defined as:
$$HD(A,B) = \max\left\{\max_{a \in A}\min_{b \in B} d(a,b),\ \max_{b \in B}\min_{a \in A} d(a,b)\right\}$$
where $a$ is a point of set $A$, $b$ is a point of set $B$, and $d(a,b)$ is the Euclidean distance between $a$ and $b$. $HD_{95}$ replaces the outer maximum over points with the 95th percentile.

JC
JC is used to measure the similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union. It is in the range of [0%,100%]. The higher the percentage, the more similar the two sample sets are. It is defined as:
$$JC(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
where $A \cap B$ represents the intersection of set $A$ and set $B$, and $A \cup B$ represents the union of set $A$ and set $B$.
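The following sketch computes $HD_{95}$ and JC on small point sets. Conventions for $HD_{95}$ differ between toolkits (for example, whether the two directed distance lists are pooled before taking the percentile); here the percentile is taken per direction and the larger value is returned, which is an assumption rather than the convention confirmed by the text.

```python
import math


def directed_distances(src, dst):
    """For each point in src, the distance to its nearest point in dst."""
    return [min(math.dist(s, d) for d in dst) for s in src]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def hd95(points_a, points_b):
    """95th-percentile Hausdorff distance (per-direction percentile, then max)."""
    forward = percentile(directed_distances(points_a, points_b), 95)
    backward = percentile(directed_distances(points_b, points_a), 95)
    return max(forward, backward)


def jaccard(voxels_a, voxels_b):
    """JC = |A intersect B| / |A union B| for two sets of voxel coordinates."""
    a, b = set(voxels_a), set(voxels_b)
    return len(a & b) / len(a | b)
```

Taking the 95th percentile instead of the maximum means that a handful of stray boundary voxels no longer dominates the reported distance.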
RAVD
RAVD is a metric used in medical imaging to evaluate the accuracy of segmentation algorithms. Its value is non-negative and has no fixed upper limit. The closer the value is to 0, the closer the segmentation result is to the reference standard. It is defined as:
$$RAVD = \frac{\left|V_{seg} - V_{ref}\right|}{V_{ref}} \qquad (11)$$
where $V_{ref}$ represents the number of voxels in the reference standard and $V_{seg}$ represents the number of voxels in the segmentation result.
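As a worked example of this metric, assuming the absolute-difference form implied by its name (the function name `ravd` is hypothetical):

```python
def ravd(ref_voxels, seg_voxels):
    """Relative absolute volume difference: |V_seg - V_ref| / V_ref."""
    if ref_voxels <= 0:
        raise ValueError("reference volume must be positive")
    return abs(seg_voxels - ref_voxels) / ref_voxels
```

A segmentation of 1100 voxels against a reference of 1000 voxels gives 0.1, and so does a segmentation of 900 voxels: the metric measures the size of the volume error but not its direction or spatial overlap, which is why it is reported alongside DSC and the surface distances.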

Ablation study
Because the segmentation network is based on the teacher-student model, the segmentation prediction data that is input to the discriminator differs from that of AdvSemiSeg, while the ground truth label input is the same. We found that applying the adversarial loss only to the segmentation predictions of labeled data gives poor results, which is expected because the amount of labeled data is inherently small. Additionally, we found that including both labeled and unlabeled data in the adversarial loss training does not work well; therefore, the adversarial loss is applied only to the segmentation predictions of unlabeled data.
[Table 3]
In Table 2, values marked in red indicate the best value for a metric, and values marked in green indicate the second-best value.
CAU with both CTAugment and adversarial training is better than CAU with only CTAugment or only adversarial training, and is better than the original CAU.
According to the table, applying the adversarial loss only to weakly augmented data works better than applying it to all unlabeled data. The reason may be that the segmentation network uses the teacher-student model, in which the strongly augmented data is fed into the student model while the weakly augmented data is fed into the teacher model. The teacher model is an average of successive student models, so it theoretically learns more useful and correct semantic information, whereas training on the strongly augmented data may mislead the model. We found little difference between the effects of _01 and _02 , so the following experiments were conducted for both combinations. Table 4 shows the experiments for _01 and _02 with 10, 15, 20, 35, and 50% of the labeled data and at the extremes of 1-5% of the labeled data.
_0~ X 1,2 indicates the experiments with the parameters depth = 1 and threshold = 0.85 for CTAugment. Values marked in red in the table indicate the best value for a metric among the four experiments for a given amount of data.
According to the table, it can be seen that: 1) depth = 1 is better for the case of 10-50% of data volume, i.e., the case with relatively more labeled data, while depth = 2 is better for the extreme case of 1-5% of data volume; and 2) performs better for the extreme case of 1-5% data volume and performs better for the case of 10-50% data volume.

Results
In Table 5, we record the total training time and the average time per epoch for CAU, CAU+, and the other semi-supervised methods with 50% of the data. These are compared with the total time and average epoch time of the fully supervised method using all data.
Due to the time required for CTAugment to learn the augmentation strategy, the total time for CAU+ is relatively longer.
Our methods take slightly more time than the other semi-supervised and fully supervised algorithms in terms of total time and average epoch time. However, because training is performed offline, the trained models are ready to use afterwards. Therefore, the small increase in time is worth the increase in accuracy.
The results of our method CAU+ on the ACDC dataset with 1-5%, 10, 15, 20, 35 and 50% data volumes are listed in Table 6. Values marked in red indicate the best value of a metric for a given data volume, and values marked in green indicate the second-best value. CAU+ improves on CAU for most data volumes, and performs much better than the fully supervised and other semi-supervised algorithms with the same volume of data for the extreme data amounts of 1-5%. Moreover, with 35% and 50% of the data, its Dice, ASD, $HD_{95}$, JC, and RAVD values are better than those of the fully supervised algorithm using all data.
As shown in Figure 2, the bar chart presents the Dice metrics for different algorithms with different numbers of labeled images on the ACDC dataset. The blue color is the original CAU and the brown color is CAU+, i.e., CAU with the improvements proposed in this paper. As can be seen, our method CAU+ performs particularly well at the extremes of data volume (i.e., 1-5%). CAU+ improves on CAU in almost all cases with different data volumes. Moreover, the adversarial learning method DAN has lower indices than the others in almost all cases. This indicates that adversarial learning alone does not work well in cardiac image segmentation, demonstrating the effectiveness of CAU+ in introducing adversarial learning into the original CAU.
As Figure 3 shows, the bar chart presents the ASD metrics for different algorithms on the ACDC dataset with different numbers of labeled images. The blue color is the original CAU and the brown color is CAU+, i.e., CAU with the improvements proposed in this paper. It can be seen that CAU+ has lower indices than the fully supervised algorithm and the other semi-supervised algorithms that use the same volume of data for all data sizes, and that the indices are improved compared to CAU for almost all data sizes. In particular, CAU+ resolves the problem that the original CAU has higher indices than the fully supervised algorithm with the same volume of data in 50% of the cases. Moreover, the adversarial learning method DAN and mean teacher yield a higher ASD than the fully supervised algorithm with the same volume of data in almost all cases, demonstrating that neither method alone works well for cardiac image segmentation and validating the effectiveness of CAU+.
As shown in Figure 4, the bar chart presents the $HD_{95}$ metrics for different algorithms on the ACDC dataset with different numbers of labeled images. The blue color is the original CAU and the brown color is CAU+, i.e., CAU with the improvements proposed in this paper. It can be seen that CAU+ has a lower index than the fully supervised algorithm and the other existing semi-supervised algorithms that use the same volume of data for all data volumes. CAU+ shows a larger improvement over CAU for 2-15% of the data volume, and in particular it resolves the problem that the $HD_{95}$ of CAU exceeds that of the fully supervised algorithm with the same volume of data in 50% of the cases. Moreover, in the extreme case (i.e., 1-5%), almost all other existing semi-supervised algorithms outperform the fully supervised algorithm using the same volume of data, while CAU+ performs better than all algorithms, verifying the effectiveness of CAU+ for cardiac image segmentation when the amount of data is extremely small.
Figure 5 plots the segmentation results generated by different methods on the ACDC dataset. The first column is the ground truth, the second column is the fully supervised method that uses all labeled images, the third column is the fully supervised method with various data volumes, and the fourth column is our improved algorithm for CAU (i.e., CAU+). The fifth, sixth, seventh, eighth, ninth and tenth columns are CAU, FixMatch, adversarial learning, entropy minimization, mean teacher and pseudo label [13], respectively. The segmented areas, colored blue, green and red in Figure 5, are the left ventricle, myocardium, and right ventricle. It can be seen that CAU+ and CAU are closer to the ground truth in the extreme case of data volume (1-4%) than the fully supervised and other semi-supervised methods with the same data volume, and the observations also show the effectiveness of our method compared to fully supervised methods using all labeled images.
In fact, our method can also be used to segment images from other imaging techniques, such as optical coherence tomography (OCT) [31][32][33][34]. OCT is a non-invasive imaging technique that provides structural and functional imaging of the retina with high spatial and temporal resolution. In the future, we will try to apply our methods to the segmentation of images from other imaging techniques, which we believe will lead to further advances in medical image segmentation.

Conclusions
In this paper, we propose a semi-supervised training method, CAU+, which is suitable for cardiac image segmentation. The model is divided into two parts: the segmentation network and the discriminator network. The segmentation network is based on the teacher-student model. A labeled image is sent to the student model, and an unlabeled image is processed by CTAugment. Then, the strongly augmented samples are sent to the student model and the weakly augmented samples are sent to the teacher model. The method adopts a hybrid loss function, which mixes the supervised loss for labeled data with the unsupervised loss for unlabeled data. Adversarial learning is also introduced to facilitate semi-supervised learning of unlabeled data through the confidence maps generated by the discriminator. We validate CAU and CAU+ on the ACDC dataset. Experiments show that CAU+ improves over the original CAU in most experiments with the same amount of data (up to 1.17 higher DSC, up to 5.64 lower ASD, up to 1.94 lower $HD_{95}$, up to 3.24 improvement in JC and up to 0.06 improvement in RAVD). Compared with the latest semi-supervised learning methods, CAU+ improves DSC by up to 18.01, JC by up to 16.72, and RAVD by up to 0.8, and reduces ASD and $HD_{95}$ by more than 50%. With only 35% and 50% of the labeled data, it also outperforms the fully supervised algorithm that uses all labeled data.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.