Weakly supervised semantic segmentation of leukocyte images based on class activation maps

Leukocytes are an essential component of the human defense system, and accurate segmentation of leukocyte images is a crucial step towards automating their detection. Most existing methods for leukocyte image segmentation rely on fully supervised semantic segmentation (FSSS) with extensive pixel-level annotations, which are time-consuming and labor-intensive to produce. To address this issue, this paper proposes a weakly supervised semantic segmentation (WSSS) approach for leukocyte images based on improved class activation maps (CAMs). Firstly, to alleviate the ambiguous boundary between leukocytes and background, a preprocessing technique is employed to enhance image quality. Secondly, an attention mechanism is added to refine the generated CAMs by improving the matching of local and global features. Random walks, dense conditional random fields and hole filling are then leveraged to obtain the final pseudo-segmentation labels. Finally, a fully supervised segmentation network is trained with the pseudo-segmentation labels. The method is evaluated on the BCCD and TMAMD datasets. Experimental results demonstrate that the pseudo-segmentation annotations generated by this method can be used to train UNet to performance close to that of FSSS. This method effectively reduces manual annotation cost while achieving WSSS of leukocyte images.


Introduction
Leukocytes are an important component of the human immune system, and screening and early diagnosis of blood cancer can reduce mortality among cancer patients [1]. The collection, segmentation, classification, and counting of leukocytes are the main tasks in studying their impact on human health. Because the numbers and proportions of different types of white blood cells vary with different diseases, the accuracy of white blood cell detection is crucial for auxiliary disease diagnosis [2].
With the emergence of large-scale datasets and the rapid development of high-performance GPU computing, automatic feature extraction through deep learning has achieved significant success in various computer vision tasks without manual intervention [3]. Recently, fully supervised semantic segmentation (FSSS) has made remarkable progress in white blood cell image segmentation [4][5][6]. However, creating large-scale training datasets with precise pixel-level annotations for every image is considerably expensive, labor-intensive and time-consuming. Even outside professional medical pathology images, it reportedly takes 1.5 hours to create the pixel-level annotation for a single image in the Cityscapes dataset [7].
Weakly supervised semantic segmentation (WSSS) trains networks using image-level annotations, scribbles, bounding boxes or points. These are weaker than pixel-level annotations and cheaper to acquire; among them, image-level labels are the easiest to obtain and the cheapest to produce. Many related works have been proposed, one line of which is WSSS based on class activation maps (CAMs) [8]. In natural-scene segmentation, Jiwoo et al. trained an additional network called AffinityNet that predicts semantic affinity between pairs of adjacent image coordinates [9]. AffinityNet generates a transformation matrix that is multiplied with the CAMs to adjust their activation coverage. IRNet generates a transition matrix from boundary activation maps and extends this approach to weakly supervised instance segmentation [10]. Sanghyun et al. proposed Puzzle-CAM, which minimizes the differences between the features from separate patches and the whole image [11]. Puzzle-CAM can activate the entire target region effectively, rather than only the most discriminative part. Wang et al. considered the relationship between the threshold set for training AffinityNet and the classification network, and designed an iterative self-improving segmentation model based on an encoder-decoder structure to train the classification and segmentation networks simultaneously [12].
For histopathology images, Qu et al. proposed a novel approach using point annotations for nuclei segmentation [13]. Two types of coarse labels with complementary information are derived from the point annotations and then used to train a deep neural network; a fully connected conditional random field loss further refines the model. Chan et al. trained a convolutional neural network on patch-level annotations, applied Grad-CAM to infer coarse class maps followed by class-specific modifications, and used a CRF for post-processing to generate fine pixel-level nuclear segmentation maps [14]. Compared to natural-scene images, relatively few works apply WSSS with the lowest-cost image-level annotations in other fields, such as medical histopathology images and visible-light satellite images [15]. The advancement of medical technology has led to an increasing number of white blood cell datasets. The number of images in these datasets continues to grow, with high resolutions and often a large number of cells per image [16]. Adding pixel-level annotations requires personnel with specialized knowledge to ensure accurate labeling [17], whereas obtaining image-level annotations is easy, and existing public blood cell datasets often provide category information. Therefore, it is both feasible and necessary to use weakly supervised methods to address the difficulty and high cost of annotating white blood cell images. The statement of significance is shown in Table 1.

Table 1. The statement of significance

Problem
Existing segmentation networks for leukocyte images use only FSSS, which makes annotation labor-intensive and time-consuming.

What is Already Known
Existing CAM-based methods have achieved great success in WSSS of natural scenes, but they have not yet been applied to leukocyte images.

What this Paper Adds
This paper proposes adding preprocessing and an attention mechanism to generate CAMs, and further optimizes the CAMs, implementing WSSS of white blood cell images.
This paper focuses on image-level annotations to achieve WSSS of white blood cell images. The main contributions are as follows: (1) In the stage of training the classification network that generates CAMs, a preprocessing operation enhancing the Lab space of leukocyte images is added, and an attention mechanism is introduced into the backbone network that generates CAMs by matching local and global features. For the different categories of white blood cells, the improved CAMs have higher quality and serve as seeds for generating pseudo labels.
(2) In the CAM optimization stage, after the AffinityNet network, random walk and dense conditional random fields, hole filling is applied to improve the quality of the pseudo segmentation labels.
(3) Without any other data, only image-level annotations are used for WSSS of white blood cell images, generating pixel-level pseudo labels and reducing the annotation cost of white blood cell image segmentation.

Method
We propose a leukocyte image segmentation method that uses only image-level annotations. To address blurred boundaries and low contrast, a contrast-stretching preprocessing operation is applied when training the classification network that generates CAMs. Based on matching local and global features, an improved backbone feature extraction network is trained to generate CAMs. Then, AffinityNet, the random walk algorithm, dense conditional random fields and hole filling are used to optimize the CAMs, ultimately generating pixel-level pseudo segmentation annotations.

Preprocessing
In blood cell datasets, the boundaries between white blood cells and the background are relatively unclear. Although the classification network focuses on the most discriminative regions to determine the categories of different white blood cells, clear boundaries are vital for image-level weakly supervised segmentation: they make it easier for the generated CAMs to grow within regions of semantic similarity, which avoids over-activation and under-activation.
In this paper, contrast stretching is added to the mainstream preprocessing pipeline. Firstly, the input image is converted to the Lab color space. For the L channel, the maximum and minimum values are found, and the difference between each pixel's brightness value and the minimum brightness value is multiplied by a scaling factor. These operations on the L channel are expressed as Eq. (1):

P_l_stretch = (P_l - P_l_min) / (P_l_max - P_l_min) × 255    (1)

where P_l represents the pixel value on the L channel, P_l_max and P_l_min represent the maximum and minimum values of the L channel, and P_l_stretch is the transformed value. Then the stretched L channel is merged with the a and b channels to create a new Lab image, which is finally converted back to the RGB color space. The process of enhancing the Lab color space is shown in Fig. 1. Compared to the preprocessing in [11], the complete preprocessing for WSSS of white blood cell images includes cropping, horizontal flipping, Lab enhancement, normalization, and random cropping.
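The L-channel stretch of Eq. (1) can be sketched as follows. This is a minimal numpy version that assumes the image has already been converted to Lab (in practice OpenCV's cv2.cvtColor with COLOR_RGB2LAB and COLOR_LAB2RGB would handle the conversions); the full-range scaling factor 255 / (P_l_max - P_l_min) is the standard contrast-stretching choice.

```python
import numpy as np

def stretch_l_channel(lab):
    """Contrast-stretch the L channel of an (H, W, 3) Lab image, Eq. (1)."""
    l = lab[..., 0].astype(np.float32)
    l_min, l_max = l.min(), l.max()
    if l_max > l_min:  # guard against a constant channel
        l = (l - l_min) / (l_max - l_min) * 255.0
    out = lab.astype(np.float32).copy()
    out[..., 0] = l  # a and b channels are left unchanged
    return out.astype(np.uint8)
```

After the stretch, the result would be merged back and converted to RGB before the remaining augmentations (flipping, normalization, random cropping).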

CAMs
As the seed for generating pseudo segmentation labels, the quality of the CAMs directly affects the subsequent training of the pseudo segmentation labels and the fully supervised segmentation network. CAMs are obtained from classification networks, so generating high-quality CAMs depends on the performance of the classification network. Since white blood cells have similar contours, different categories are distinguished primarily by the morphology of the nucleus. Because the network pays more attention to the most recognizable area, CAMs often fail to cover the entire cell area, resulting in under-activation. In addition, impurities in the image background with colors similar to those of white blood cells can cause non-target areas to be activated, causing over-activation. To address these challenges while ensuring accurate classification and comprehensive coverage of white blood cells, we propose a local and global matching method incorporating channel attention for generating CAMs.
Fig. 2 illustrates the overall process of generating CAMs in this study. By improving upon the structure used in [11], our generated CAMs for white blood cells resemble real pixel-level annotations more closely than unmodified CAMs. Specifically, CAMs are generated by an improved classification network with two branches. On the first branch, a white blood cell image is input into the feature extraction backbone, which follows the structure of ResNet50 and consists of a convolutional layer conv1 and four residual stages [18]. To enhance the feature extraction capability of the network, a Squeeze-and-Excitation (SE) block [19] is introduced after the four residual stages, assigning different weights to different positions of the image from the perspective of the channel domain. Denoting the last feature map of the backbone as A_s, the classification loss L_cls between the predicted categories and the image-level annotations is calculated after global average pooling. On the second branch, the white blood cell image is first divided into four patches by the tiling module. After passing through the backbone, the feature maps generated by the four patches are re-concatenated into a new feature map A_re, and the classification loss L_p_cls on this branch is calculated after global average pooling. The last feature maps before global average pooling in the two branches, namely the CAMs, are used to calculate the reconstruction loss L_re. The reconstruction loss relates the CAMs generated from the original image and the tiled image, providing self-supervision. L_cls and L_p_cls improve classification performance and roughly estimate the target area, while L_re narrows the gap between pixel-level and image-level supervision. The backbone of the two branches shares weights.
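The tiling and re-concatenation of the second branch can be sketched as follows. This is a toy numpy version operating on arrays whose last two axes are (H, W); in the actual network, tile splits the input image into a 2×2 grid before the backbone, and merge reassembles the four resulting feature maps into A_re.

```python
import numpy as np

def tile(x):
    """Split an array with trailing (H, W) axes into four 2x2 patches."""
    h, w = x.shape[-2] // 2, x.shape[-1] // 2
    return [x[..., :h, :w], x[..., :h, w:], x[..., h:, :w], x[..., h:, w:]]

def merge(patches):
    """Re-concatenate four patches (or their feature maps) into one map."""
    top = np.concatenate(patches[:2], axis=-1)
    bottom = np.concatenate(patches[2:], axis=-1)
    return np.concatenate([top, bottom], axis=-2)
```

Because each patch passes through the shared-weight backbone independently, merge(backbone-applied patches) yields the re-concatenated feature map A_re that is compared against A_s.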
The global average pooling operation is denoted as G, the multi-label soft margin loss function as l_cls, and Ŷ_s = G(A_s) is the prediction vector of the classification network. With image-level labels Y, L_cls is given by Eq. (2):

L_cls = l_cls(G(A_s), Y)    (2)

L_p_cls is given by Eq. (3):

L_p_cls = l_cls(G(A_re), Y)    (3)

L_re is given by Eq. (4):

L_re = ||A_s - A_re||_1    (4)

The total loss is given by Eq. (5):

L = L_cls + L_p_cls + α · L_re    (5)

where the value of α is 4.
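The losses of Eqs. (2)-(5) can be sketched as follows. This is a numpy stand-in (a real implementation would use PyTorch's MultiLabelSoftMarginLoss); treating L_re as a mean absolute difference between the two branches' CAMs is an assumption consistent with the Puzzle-CAM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_soft_margin(logits, targets):
    """Mean binary cross-entropy over classes: the l_cls of Eqs. (2)-(3)."""
    p = sigmoid(logits)
    return float(np.mean(-(targets * np.log(p) + (1 - targets) * np.log(1 - p))))

def total_loss(cam_s, cam_re, y, alpha=4.0):
    """Eq. (5): L = L_cls + L_p_cls + alpha * L_re for (B, K, H, W) CAMs."""
    logits_s = cam_s.mean(axis=(-2, -1))    # G: global average pooling
    logits_re = cam_re.mean(axis=(-2, -1))
    l_cls = multilabel_soft_margin(logits_s, y)      # Eq. (2)
    l_p_cls = multilabel_soft_margin(logits_re, y)   # Eq. (3)
    l_re = float(np.mean(np.abs(cam_s - cam_re)))    # Eq. (4), L1 distance
    return l_cls + l_p_cls + alpha * l_re
```

When the two branches agree exactly (A_s = A_re), the reconstruction term vanishes and the total loss reduces to the two classification terms.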

CAMs optimization
The affinity-learning approach refines CAMs using the seed-region-growing idea. It typically learns and predicts the pixel affinity relationships in the image through an additional network, and then uses the predicted affinity as the similarity criterion for region growing. AffinityNet is the classic model among pixel-affinity learning methods: it uses the discriminative regions obtained from CAMs as initial "seed" regions, constructs a network to learn the affinity between pixel pairs in the image, and uses the learned affinity as the criterion for region growing. Through the random walk algorithm [20] and conditional random fields [21], semantic information is diffused from the seeds to obtain refined pseudo labels, which are then used to train a fully supervised semantic segmentation network. In this paper, AffinityNet uses ResNet50 as the backbone and aggregates the output of each sub-block of the backbone: each sub-block output is reduced to a unified dimension through 3 × 3 convolutional layers and then concatenated along the channel direction through a 1 × 1 convolutional layer. The training objective is based on the Euclidean distance between two pixels on the resulting feature map, described in Eq. (6):

W_ij = exp(-||f_aff(i) - f_aff(j)||)    (6)

where i and j represent the positions of two pixels and f_aff represents the concatenated convolutional feature map. The specific AffinityNet structure used in this paper is shown in Fig. 3. We noticed that after the random walk algorithm refines the CAMs with the output matrix of AffinityNet, dense conditional random fields can improve the quality of the generated pseudo segmentation labels, but some pixels inside a target may still be judged as background, leaving holes in the target, as shown in the second column of the figure. This is unfavorable for training the fully supervised segmentation model in the second stage. Therefore, hole filling is added after the conditional random fields. The complete CAM optimization process includes: applying the semantic affinity matrix generated by the trained AffinityNet, the random walk algorithm, conditional random fields, and finally hole filling.
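The affinity computation of Eq. (6) and the subsequent random-walk propagation can be sketched for a small set of pixels as follows. This is a toy dense numpy version; the actual implementation works on downsampled CAMs and restricts the affinity matrix to neighbouring pixel pairs within a fixed radius.

```python
import numpy as np

def affinity_matrix(feats):
    """Eq. (6): W_ij = exp(-||f_aff(i) - f_aff(j)||) for (N, C) pixel features."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    return np.exp(-d)

def random_walk(cam, W, t=3):
    """Diffuse per-pixel CAM scores with t steps of the affinity transition matrix."""
    T = W / W.sum(axis=1, keepdims=True)  # row-normalise into transition probabilities
    T = np.linalg.matrix_power(T, t)
    return T @ cam
```

Because each row of the transition matrix sums to one, propagation redistributes activation toward semantically similar pixels without changing its overall scale, which is what lets the seed regions grow to the full target.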
Commonly used hole filling algorithms rely on morphological operations such as dilation, or on iterative filling. Figure 4 presents the hole filling used in this article: first, find the contour of each target in the obtained pseudo segmentation labels; then remove the category to obtain a binary image containing only the contour; finally, fill the binary image with the pixel value of that category. When some targets stick together (as shown in the second and third rows of the figure), contour-based filling can draw two cells of different categories into one contour. If the maximum value is used for filling, as in the third image, some pixels of one cell may be assigned to the other category; for the second image this assignment is incorrect. This situation is not resolved in this work.
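A minimal version of the hole-filling idea for the single-class (binary) case can be sketched as follows; in practice OpenCV's cv2.findContours and cv2.drawContours would be used per category, and the multi-category sticking problem described above is left unresolved. Here a hole is any background region not connected to the image border.

```python
import numpy as np
from collections import deque

def fill_holes(mask):
    """Fill enclosed background holes in a 2-D label map (binary simplification)."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    # seed a flood fill with every background pixel on the image border
    q = deque((r, c) for r in range(h) for c in range(w)
              if mask[r, c] == 0 and (r in (0, h - 1) or c in (0, w - 1)))
    for r, c in q:
        outside[r, c] = True
    while q:  # BFS over 4-connected background pixels reachable from the border
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] == 0 and not outside[nr, nc]:
                outside[nr, nc] = True
                q.append((nr, nc))
    filled = mask.copy()
    filled[(mask == 0) & ~outside] = 1  # unreached background pixels are holes
    return filled
```

Background pixels reachable from the border are kept; the rest are interior holes and take the target's label.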

Experimental environment
All experiments in this paper were run in the mainstream PyTorch framework on Ubuntu 20.04 with two RTX 3090 GPUs.

Dataset
BCCD dataset: In the BCCD (Blood Cell Count and Detection) dataset, each image provides a white blood cell category annotation covering 5 types of white blood cells. There are 364 images of neutrophils (NEU), eosinophils (EOS), lymphocytes (LYM), monocytes (MON) and basophils (BAS). Most images contain only a single cell, and a few contain multiple cells. Since the number of BAS images in the original dataset is too small, and to ensure that the training, validation and test sets keep the same data distribution before augmentation, only NEU, EOS, LYM and MON are weakly supervised segmented in this paper. Due to the small number of samples, the dataset was augmented; the number of each category before and after augmentation is shown in Table 2.

TMAMD dataset: TMAMD is The Munich AML Morphology Dataset, open-sourced on The Cancer Imaging Archive platform. It contains 15 classes of leukocytes and has a serious class-imbalance problem; some classes have fewer than 30 samples. Therefore, we focused only on the four categories with the largest numbers of samples, namely typical lymphocytes (LYT), monocytes (MON), myeloblasts (MYO), and segmented neutrophils (NGS). We selected 1/10 of each of the four classes to maintain the same distribution as the original dataset. The amount of original data and the number selected are shown in Table 3.

Both datasets were annotated at the pixel level under the guidance of professional doctors, in a format consistent with the Pascal VOC dataset. The pixel-level annotations were used to evaluate the resulting pseudo-segmentation annotations and the final weakly supervised segmentation results. Both datasets were divided into training, validation and test sets at a ratio of 7:2:1.
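The per-class 7:2:1 split described above can be sketched as follows. This is a hypothetical helper (the filenames and class keys are placeholders, not from the source); shuffling within each class keeps the class distribution the same across the three subsets.

```python
import random

def split_per_class(items_by_class, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split each class's items into train/val/test at the given ratios."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls, items in items_by_class.items():
        items = items[:]              # avoid mutating the caller's list
        rng.shuffle(items)
        n = len(items)
        n_tr = round(n * ratios[0])
        n_va = round(n * ratios[1])
        train += [(p, cls) for p in items[:n_tr]]
        val += [(p, cls) for p in items[n_tr:n_tr + n_va]]
        test += [(p, cls) for p in items[n_tr + n_va:]]
    return train, val, test
```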

Model training
In the experiments, the training data for the classification network generating CAMs and for the semantic affinity network AffinityNet were resized to 512 × 512 × 3. Training ran for 50 epochs with a batch size of 12, and the initial learning rate was set to 0.1. During training, the image-level labels for the classification network were one-hot encoded, and the value of α in Eq. (5) increased gradually from 0 to 4.
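The exact shape of the α schedule is not specified in the text; one plausible choice, sketched here as an assumption, is a linear ramp from 0 to the maximum over the training run.

```python
def alpha_schedule(epoch, total_epochs=50, alpha_max=4.0):
    """Linearly ramp the reconstruction-loss weight from 0 up to alpha_max."""
    return alpha_max * min(1.0, epoch / total_epochs)
```

Ramping α lets the classification losses dominate early training, bringing in the reconstruction constraint only once the CAMs are roughly localized.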

Evaluation
To assess segmentation performance, this article employs several metrics: Intersection over Union (IoU), mean Intersection over Union (mIoU), mean Pixel Accuracy (mPA), and the Dice coefficient (Dice). IoU regards the labeled image and the predicted image as two sets and computes the ratio of their intersection to their union; mIoU is the average of the IoU values over all classes. IoU and mIoU are expressed as Eq. (7) and Eq. (8):

IoU = TP / (TP + FP + FN)    (7)

mIoU = (1/k) Σ_{i=1}^{k} IoU_i    (8)

mPA is the average, over categories, of the ratio of correctly classified pixels to the total pixels predicted for that category, expressed as Eq. (9):

mPA = (1/k) Σ_{i=1}^{k} TP_i / (TP_i + FP_i)    (9)

Dice measures the similarity between two sets, expressed as Eq. (10):

Dice = 2TP / (2TP + FP + FN)    (10)

where TP denotes the pixels correctly identified as positive, FP the pixels incorrectly identified as positive, FN the pixels incorrectly identified as negative, TN the pixels correctly identified as negative, and k is the total number of classes.
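The metrics of Eqs. (7)-(10) can be computed from a confusion matrix as follows. This sketch assumes every class appears in both prediction and ground truth (otherwise the per-class ratios would need a zero-division guard).

```python
import numpy as np

def segmentation_metrics(pred, gt, k):
    """Return (mIoU, mean Dice, mPA) for integer label maps with k classes."""
    cm = np.zeros((k, k), dtype=np.int64)       # rows: ground truth, cols: prediction
    for g, p in zip(gt.ravel(), pred.ravel()):
        cm[g, p] += 1
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                    # predicted as class i but wrong
    fn = cm.sum(axis=1) - tp                    # class i pixels missed
    iou = tp / (tp + fp + fn)                   # Eq. (7)
    dice = 2 * tp / (2 * tp + fp + fn)          # Eq. (10)
    pa = tp / (tp + fp)                         # per-class accuracy, Eq. (9)
    return iou.mean(), dice.mean(), pa.mean()   # means over k classes, Eq. (8)
```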

Comparative experiment on generating pseudo segmentation labels
Figure 5 shows the CAMs generated by the methods of Refs. [9] (denoted PSA), [22] (denoted SEAM) and [11] (denoted Puzzle-CAM), compared with the proposed method on BCCD and TMAMD. Since the comparative methods were not originally studied on leukocyte images, all experiments were performed in the same experimental environment described in Section 3.1. We ensured that the inputs of all comparative models remained consistent and adjusted relevant parameters appropriately to achieve the best results for each method. It can be seen that the CAMs generated by the proposed method avoid over-activation in areas such as leukocyte boundaries and impurities, while covering the complete target area as much as possible.
Meanwhile, the generated pseudo segmentation labels are presented. The three comparison methods and the proposed method all use AffinityNet, random walk and CRF to improve label quality. Figure 6 illustrates the pseudo segmentation labels generated by these methods on the two datasets, with different colors representing different leukocyte classes: in the BCCD dataset, blue, red, green and purple correspond to LYM, MON, NEU and EOS, respectively; in the TMAMD dataset, blue represents LYT, red MON, green MYO, and purple NGS. To assess the quality of the generated pseudo segmentation labels, they were compared with the ground truth labels. Table 4 presents the IoU for each class of pseudo labels produced by the different methods on the two datasets. The classes represented by ①, ②, ③ and ④ differ between the datasets: in BCCD they correspond to LYM, MON, NEU and EOS, and in TMAMD to LYT, MON, MYO and NGS, respectively. Table 5 compares the mIoU, Dice and mPA between the pseudo segmentation labels and the true labels for the different methods on the two datasets, and shows that the indicators for each class on both datasets generally exceed those of the other methods.

Ablation experiment
To analyze the effectiveness of each proposed module, ablation experiments were conducted on the BCCD dataset. For weakly supervised semantic segmentation of white blood cell images, we propose two improvements on Puzzle-CAM, preprocessing and the SE attention mechanism, which directly affect the generation of CAMs. Therefore, the best training mIoU per epoch after training the classification network with these first two improvements on top of Puzzle-CAM is provided, as shown in Fig. 7. The red circle markings in Fig. 7 represent the maximum mIoU reached during training.
Here, the results of the initial Puzzle-CAM and the proposed improved CAMs are provided. From Fig. 8, it can be seen that the initial Puzzle-CAM activates impurities present in the blood cell image, and the activation of white blood cells does not cover the complete target. After preprocessing and the addition of SE blocks, these problems are well resolved, which proves the effectiveness of the method.
Table 6 shows the objective results of the three improvements on the final pseudo segmentation labels. All ablation experiments have a direct impact on the final pseudo segmentation labels, i.e., they use the complete pipeline of raw CAM generation plus CAM optimization. Firstly, image preprocessing makes the distinction between target and background in white blood cell images more prominent, thereby enhancing the original image; the experimental results show that the preprocessing operation is necessary. Secondly, the proposed method improves the CAMs by matching local and global features and applies an attention mechanism to the feature extraction network that generates CAMs, which comprehensively elevates the quality of the pseudo segmentation labels. Finally, applying the hole filling technique to the refined CAMs brings a further performance improvement. The experimental results show that, relative to the baseline, the mIoU of the generated pseudo labels is increased by 3.75%, the Dice coefficient by 1.86%, and the mPA by 4.30%.

Comparative results of training UNet using pseudo segmentation labels
UNet was originally designed for biomedical images, yields good segmentation results and is widely used across semantic segmentation tasks; it is therefore selected as the fully supervised segmentation network for comparing the quality of the generated pseudo segmentation labels. Table 7 shows the results of training models with pseudo segmentation labels generated by the different methods. With UNet as the fully supervised segmentation network trained on the pseudo segmentation labels generated by the proposed method, the mIoU, Dice and mPA of leukocyte segmentation on BCCD decrease by 1.26%, 1.84% and 1.63%, respectively, compared with full supervision; on TMAMD they decrease by 1.34%, 0.82% and 2.08%, respectively. Although each index decreases slightly, using image-level labels to achieve white blood cell semantic segmentation greatly reduces the cost of manual labeling. Compared with the other three mainstream weakly supervised methods, training on the labels generated in this paper comes closest to the results of fully supervised segmentation.

Additional experiment
Although data augmentation was applied in our experiments, we conducted five-fold cross-validation on the two datasets to assess the model's generalization, with each image used exactly once in a validation set. We evaluated the pseudo segmentation labels generated by our improved method against the ground truth annotations; the results are presented in Table 8. For the BCCD dataset, mIoU across folds ranged from 89.88% to 92.30%, indicating consistent performance with slight variation; Dice coefficients ranged from 93.73% to 95.88%, showing robust pixel-level prediction, and mPA scores ranged from 93.53% to 96.62%, reflecting high pixel-level accuracy. On the TMAMD dataset, mIoU ranged from 92.56% to 93.92%, Dice from 96.10% to 96.86%, and mPA from 94.23% to 96.38%. The minimal fluctuations in these metrics indicate the model's reliability and efficacy in generating pseudo segmentation annotations.
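The five-fold protocol above (each image appearing in exactly one validation fold) can be sketched as follows, as a minimal index-level helper.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train, val) index lists; each index lands in exactly one val fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal interleaved folds
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```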

Discussion and conclusion
The background of our study is that, with the improvement of computer performance and the growing number of white blood cell images, training a fully supervised semantic segmentation model for white blood cell images first requires pixel-level annotations, which imposes a heavy workload on doctors. It is therefore necessary to develop an effective method to solve the image annotation problem. For this purpose, we propose a CAM-based label generation method that uses low-cost category labels. The method was evaluated on two datasets, and the experimental results showed good performance and generalizability.
Compared with previous methods for white blood cell segmentation, which focus only on full supervision, our study has several strengths. First, we apply Lab-space image enhancement before the classification network; this addresses the low contrast between white blood cells and the background that prevents CAMs from recognizing boundaries, avoiding under-activation. Then, we generate CAMs by matching local and global features with channel attention added; exploiting the fact that local CAMs activate targets better than those of the full image further avoids under-activation. For the generated pseudo segmentation annotations, we apply hole filling after AffinityNet to avoid holes in the target area. Finally, we verify the effectiveness of the generated annotations by training UNet models with the pseudo segmentation labels.
On the other hand, our method has limitations. Due to the constraints of publicly available datasets, most images used in our experiments contain only one white blood cell; our method has therefore had limited exposure to scenes with multiple white blood cells in a single image.
In the future, to improve white blood cell segmentation performance and the ability to handle diverse clinical situations, we will use image stitching to construct multi-cell images from the public datasets, which may help address this limitation.
In conclusion, this work presents an effective weakly supervised semantic segmentation method for white blood cell images, evaluated on the BCCD and TMAMD datasets. The quality of the generated pseudo segmentation labels was verified through fully supervised experiments with UNet. The experimental results demonstrate that our method achieves segmentation results close to those of fully supervised methods, indicating its potential as part of an automatic white blood cell diagnosis pipeline. It can significantly reduce annotation cost in the clinical workflow, shorten turnaround time, and promote future research in the medical imaging community.

Fig 1. The process of preprocessing.

Fig 2. The overall process of generating CAMs in this study. By improving upon the structure used in [11], our generated CAMs for white blood cells resemble real pixel-level annotations more closely than unmodified CAMs. The quality of the CAMs affects the subsequent training of pseudo segmentation labels and fully supervised segmentation networks. CAMs are obtained from classification networks, and generating high-quality CAMs depends on the performance of the classification network. Since white blood cells have similar contours, different categories are distinguished primarily by the morphology of the nucleus. Because the network attends mostly to the most discriminative regions, CAMs often fail to cover the entire cell area, resulting in under-activation. In addition, background impurities with colors similar to those of white blood cells can cause non-target areas to be activated, leading to over-activation. To address these challenges while ensuring accurate classification and comprehensive coverage of white blood cells, we propose a local and global matching method incorporating channel attention for generating CAMs.
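The channel attention referred to above follows the squeeze-and-excitation pattern (SE blocks, as named in the ablation section): globally pool each channel, pass the pooled vector through a small two-layer bottleneck, and rescale the channels by the resulting sigmoid weights. A minimal numpy sketch with hypothetical weight matrices, not the paper's exact module:

```python
import numpy as np

def se_attention(feat, w1, b1, w2, b2):
    """Squeeze-and-excitation channel attention on a (C, H, W) feature map.
    w1: (C//r, C) and w2: (C, C//r) are the excitation weights for
    reduction ratio r; both are illustrative placeholders here."""
    s = feat.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s + b1, 0.0)           # excitation FC1 + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))   # excitation FC2 + sigmoid -> (C,)
    return feat * a[:, None, None]             # rescale each channel
```

In a real network the same operation sits inside the backbone as a learned layer; the point of the sketch is only the squeeze-excite-rescale data flow.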

3.2 Dataset
BCCD dataset: In the BCCD (Blood Cell Count and Detection) dataset, each image

Fig. 5. CAM visualization. (a) Leukocyte images. (b) CAMs obtained by PSA. (c) CAMs obtained by SEAM. (d) CAMs obtained by Puzzle-CAMs. (e) CAMs obtained by the method in this paper.

Fig. 6. Pseudo segmentation labels for images. (a) Leukocyte images. (b) Pseudo segmentation labels generated by PSA. (c) Pseudo segmentation labels generated by SEAM. (d) Pseudo segmentation labels generated by Puzzle-CAMs. (e) Pseudo segmentation labels generated by the method in this paper. (f) Ground truth.

Fig 7. Best mIoU for training classification networks.
Here, results for the initial Puzzle-CAMs and the proposed improved CAMs are provided. From Fig 8, it can be seen that the initial Puzzle-CAMs activate impurities present in the blood cell image, and the activation of white blood cells does not cover the complete target. After preprocessing and the addition of SE blocks, these problems are well resolved, which demonstrates the effectiveness of the method.

Fig 8. Ablation of CAMs.
Table 6 shows the objective results of the three improvements on the final pseudo segmentation labels. All ablation experiments have a direct impact on the final pseudo segmentation labels, that is, on the complete raw CAMs and the CAM optimization. First, image preprocessing makes the distinction between target and background in white blood cell images more prominent, thereby enhancing the original image; the experimental results show that this preprocessing operation is necessary. Second, the proposed method improves CAMs by matching local and global features and applies an attention mechanism to the feature extraction network that generates CAMs, which comprehensively raises the quality of the pseudo segmentation labels. Finally, applying the hole filling technique to the refined CAMs brings further performance improvements. The experimental results show that, relative to the baseline, the mIoU of the generated pseudo labels increases by 3.75%, and the Dice coefficient is