An MRI brain tumor segmentation method based on improved U-Net

Abstract: To improve the segmentation of brain tumor images and address the loss of feature information during convolutional neural network (CNN) training, we present an MRI brain tumor segmentation method based on an enhanced U-Net architecture. First, ResNet50 is used as the backbone network of the improved U-Net, since a deeper CNN can improve feature extraction. Next, the residual module is enhanced with the Convolutional Block Attention Module (CBAM) to increase representational capability by focusing on important features and suppressing unnecessary ones. Finally, the cross-entropy loss function and the Dice similarity coefficient are combined to form the loss function of the network, which addresses the class imbalance of the data and improves segmentation of the tumor area. Segmentation performance was evaluated on the test set, where the enhanced U-Net achieved an average Intersection over Union (IoU) of 86.64% and a Dice score of 87.47%. These values were 3.13% and 2.06% higher, respectively, compared to the original U-Net and R-Unet models. Consequently, the proposed enhanced U-Net improves brain tumor segmentation and offers technical support for MRI diagnosis and treatment.


Introduction
Brain tumors pose a serious threat to human life and well-being and have attracted significant public attention. With advances in technology, medical imaging has become an effective tool in the diagnosis and treatment of brain tumors. By examining lesions in medical images, experts can diagnose the type and severity of a patient's disease and then proceed with the appropriate treatment. Magnetic Resonance Imaging (MRI) is a commonly employed imaging technology in patient care and is widely used for brain tumor segmentation because it effectively presents brain tissue structure and pathological areas [1]. Although MRI provides great assistance to doctors, its imaging process can be influenced by surrounding factors, leading to problems such as artifacts and field inhomogeneity. These problems make manual diagnosis more difficult, time-consuming and prone to misdiagnosis. Using machine vision to segment pathological areas supports rapid and accurate assessment of disease severity and enables targeted treatment, which is of significant importance for the treatment of brain tumors.
In the early stages, common segmentation algorithms included region-growing-based methods [2][3][4] and edge-based methods [5][6][7], which were widely applied in brain tumor research. These algorithms segmented the target regions based on differences in image grayscale values, texture and color. However, a single threshold segmentation method cannot meet the accuracy requirements of medical images, and the resulting errors can significantly affect the assessment of the patient's condition. To enhance the effectiveness of medical image segmentation, machine learning techniques such as Support Vector Machine (SVM) [8] and Random Forest (RF) [9] have gradually been employed. Machine learning algorithms use classifiers to determine the class of each pixel in brain tumor images. Stelios Krinidis et al. proposed a variation of the fuzzy c-means (FCM) algorithm that fuses grayscale information with local spatial information to enhance the clustering of the target. This improved FCM algorithm effectively reduces the sensitivity of clustering algorithms to outliers and noise in noisy images [10]. Ortiz et al. proposed an automated, unsupervised segmentation method for MRI that combines Self-Organizing Maps (SOM) and a Genetic Algorithm (GA) to achieve detailed segmentation of MRI brain images. They also presented a novel SOM clustering mechanism that uses spatial information to define clustering boundaries, yielding a completely unsupervised and automated segmentation method [11]. Khalid et al. introduced an approach for segmenting and classifying MRI scans by leveraging multi-modality MRI data. They used the extracted wavelet coefficients as feature vectors and applied SOM and SVM classifiers to categorize the images into normal and pathological classes [12]. Traditional machine learning has thus been widely used as an effective tool for assisting segmentation, greatly enhancing physicians' ability to treat brain tumors. However, brain diseases exhibit variability, and manually selected features often cannot represent image characteristics well. In some cases, these features not only fail to improve recognition accuracy but may also lead to misidentifications.
The treatment of brain tumors has advanced remarkably thanks to continuous progress in computer technology and hardware, and deep learning in particular has played a pivotal role in these breakthroughs [13]. Ronneberger et al. proposed a symmetrical segmentation network called U-Net, which uses the same factor for upsampling and downsampling to extract features from images. It has become a commonly used network model in medical imaging and effectively obtains the desired tumor segmentation regions [14]. Vittikop et al. introduced an improved U-Net segmentation method by incorporating skip connections into the U-Net architecture. This fusion of deep and shallow features enhances the semantic and spatial information of the images, compensating for the lack of shallow information during feature extraction and achieving good results [15]. Kaikai Luo presented a U-Net segmentation method for brain tumor MR images that integrates attention mechanisms and multi-view fusion. By incorporating attention mechanisms into the cascaded architecture of the decoding module, this method outperformed other approaches, with improvements of 0.9%, 1.3% and 0.6% in evaluation metrics [16]. Although U-Net is an effective method for brain tumor segmentation [17,18], there is still considerable room to improve its segmentation performance. To this end, segmentation networks with fused attention mechanism modules [19] and hybrid segmentation networks [20][21][22] have been proposed. Although these U-Net-based algorithms improve brain tumor segmentation, they suffer from increased computational complexity and parameter count.
In this article, we propose a segmentation method that combines an attention mechanism with the U-Net network to improve segmentation accuracy. ResNet50 is used as the backbone network of U-Net, so that the deep residual convolutional network can extract the required feature information at a deeper level. Furthermore, the residual part of the network is integrated with the convolutional block attention module (CBAM). CBAM improves the expressive power of features and the network's ability to capture them; it can be embedded in any network; and its structure is simple, so the increase in parameter count is relatively small. In addition, a Dice loss term is used in the prediction phase of the U-Net, and the experimental results show that it further enhances segmentation of the tumor core region. The loss function in this study consists of two basic functions, cross-entropy and Dice. By adjusting the weights of the two functions, the class imbalance problem can be addressed, leading to improved segmentation. The method provides a technical reference for clinical diagnosis and treatment based on medical images.

In 2015, He et al. [23] introduced the ResNet series of networks, among which ResNet50 is a deep residual network widely used in various algorithms. The network structure of ResNet50 consists of two fundamental modules: the Conv Block and the Identity Block. The Conv Block changes the dimensions of the network, while the Identity Block deepens it. Figure 1 shows the network structure. The first part applies convolution, normalization, an activation function and max pooling to the input image. The second, third, fourth and fifth parts are composed of Conv Blocks and Identity Blocks, with the Identity Block executed 2, 3, 5 and 2 times, respectively. The sixth part performs global average pooling on the output, converting the feature maps into a feature vector, and a classifier then calculates the probability distribution over the classes.
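As a quick arithmetic check on the block counts above, the following sketch tallies the weighted layers of the backbone. It assumes the standard ResNet50 bottleneck design (three convolutions per block, exactly one Conv Block per stage, a single stem convolution and a single fully connected classifier); these details and the variable names are illustrative assumptions, not taken from the text.

```python
# Identity Blocks per stage (parts 2-5), as stated in the text.
identity_blocks = [2, 3, 5, 2]
# Assumption: each stage starts with exactly one Conv Block.
conv_blocks = [1, 1, 1, 1]
# Assumption: every block is a bottleneck with three convolutions.
CONVS_PER_BLOCK = 3

blocks_per_stage = [c + i for c, i in zip(conv_blocks, identity_blocks)]
total_blocks = sum(blocks_per_stage)

# One stem convolution (part 1) plus one fully connected classifier (part 6).
weighted_layers = 1 + total_blocks * CONVS_PER_BLOCK + 1

print(blocks_per_stage)      # [3, 4, 6, 3]
print(sum(identity_blocks))  # 12
print(weighted_layers)       # 50
```

Under these assumptions the count comes to 50 weighted layers, and the 12 Identity Blocks match the number of places where an attention module can be attached to an Identity Block.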

The convolutional block attention module
CBAM, a lightweight attention module, comprises two sub-modules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM) [24]. This structure, depicted in Figure 2, applies channel attention followed by spatial attention. The channel attention mechanism extracts information from the feature map using global AvgPool and MaxPool, which map the C × H × W feature map to C × 1 × 1 descriptors carrying rich high-level features. Subsequently, a Multi-Layer Perceptron (MLP) is used to adjust the number of channels (C). The MLP is given by Eq (1):

$$\mathrm{MLP}(X) = W_2\, R(W_1 X + b_1) \quad (1)$$

where W1 is the weight matrix from the input layer to the hidden layer, b1 is the bias vector of the hidden layer, W2 is the weight matrix from the hidden layer to the output layer, R is the ReLU activation function of the hidden layer and X represents the input. The MLP outputs then serve as the input for the next step. The feature map output by CAM has dimension C × 1 × 1 and is computed as in Eq (2), where s is the sigmoid activation function:

$$M_c(F) = s\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \quad (2)$$

F' serves as the input of SAM and is computed as in Eq (3):

$$F' = M_c(F) \otimes F \quad (3)$$

where ⊗ denotes element-wise multiplication, with the C × 1 × 1 attention map broadcast to the C × H × W feature map.
Similar to CAM, SAM also uses AvgPool and MaxPool to extract information, here applied along the channel axis, mapping the C × H × W feature map to 1 × H × W feature maps. The resulting feature maps are concatenated along the channel dimension (channel splicing). A 7 × 7 convolution is then used to reduce the dimensionality, mapping the 2 × H × W feature map to a 1 × H × W feature map, and the sigmoid function is applied to obtain the spatial attention map. The output is computed as in Eq (4):

$$M_s(F') = s\big(f^{7 \times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big) \quad (4)$$

where f^{7×7} denotes the 7 × 7 convolution. Multiplying this map element-wise with F' maps the 1 × H × W attention map back to a C × H × W feature map.
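The channel and spatial attention steps described above can be sketched in plain Python on a tiny C × H × W feature map stored as nested lists. This is a minimal illustration, not the authors' implementation: the weights `w1`, `b1`, `w2` and the convolution `kernel` are made-up toy values, the convolution uses zero padding, and the demo uses a 3 × 3 kernel instead of 7 × 7 so that it fits the 3 × 3 toy input.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp(x, w1, b1, w2):
    """Eq (1)-style MLP: W2 * ReLU(W1 * x + b1), applied to a length-C vector."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(f, w1, b1, w2):
    """Return C weights in (0, 1) from global avg- and max-pooled descriptors."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(max(r) for r in ch) for ch in f]
    a, m = mlp(avg, w1, b1, w2), mlp(mx, w1, b1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]

def spatial_attention(f, kernel):
    """Return an H x W map in (0, 1) from channel-wise avg and max maps.

    kernel has shape 2 x k x k (one slice per pooled map); zero padding
    keeps the output at H x W.
    """
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg = [[sum(f[ch][i][j] for ch in range(c)) / c for j in range(w)]
           for i in range(h)]
    mx = [[max(f[ch][i][j] for ch in range(c)) for j in range(w)]
          for i in range(h)]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p, plane in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[p][di][dj] * plane[y][x]
            out[i][j] = sigmoid(acc)
    return out

def cbam(f, w1, b1, w2, kernel):
    """Apply channel attention, then spatial attention, by broadcasting."""
    mc = channel_attention(f, w1, b1, w2)
    f1 = [[[mc[ch] * v for v in row] for row in f[ch]] for ch in range(len(f))]
    ms = spatial_attention(f1, kernel)
    return [[[ms[i][j] * f1[ch][i][j] for j in range(len(ms[0]))]
             for i in range(len(ms))] for ch in range(len(f1))]

# Toy input: C=2, H=W=3, with made-up weights (hidden size 1).
feature = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
           [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]
w1, b1, w2 = [[0.1, 0.2]], [0.0], [[0.5], [-0.5]]
kernel = [[[0.1] * 3 for _ in range(3)] for _ in range(2)]
refined = cbam(feature, w1, b1, w2, kernel)
print(len(refined), len(refined[0]), len(refined[0][0]))  # 2 3 3
```

The output keeps the C × H × W shape of the input, which is what allows the module to be dropped into a residual block without changing the surrounding layers.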
In this paper, ResNet50 is used as the backbone network of U-Net, and CBAM is inserted into the Bottleneck module of ResNet50, which makes the network pay more attention to informative feature channels and spatial areas. As Figure 1 shows, ResNet50 consists of Conv Blocks and Identity Blocks. In this method, CBAM is added to each Identity Block, for a total of 12 insertions. Figure 3 shows a schematic diagram of inserting CBAM into the residual module of ResNet50. CBAM adds few parameters and little computation, and it integrates as a plug-and-play module within existing network architectures. By leveraging the attention mechanism, CBAM enhances the representation capability of the network by emphasizing crucial features and suppressing irrelevant ones. This lets the model focus on important information and thereby improves overall performance.

Loss function
In the improved network, the loss function combines the cross-entropy loss function and the Dice similarity coefficient. Cross-entropy is commonly used in the training of network models: it drives the optimization toward good performance during training and helps mitigate vanishing gradients. The formula is shown in Eq (5):

$$L_{CE} = -\sum_{i}\sum_{j}\left[k_{ij}\log p_{ij} + (1 - k_{ij})\log(1 - p_{ij})\right] \quad (5)$$

where M and N represent the set of pixel points and the set of labeled pixel points in the segmented image. K represents the real category, and k_ij refers to the category for the i-th prediction map and the j-th real label. P represents the predicted value, and p_ij is the predicted value for the i-th prediction map and the j-th real label. In MRI, foreground and background features are unevenly distributed, so using the cross-entropy loss function alone biases the network toward the background features. Dice can handle this data imbalance and is widely used in medical image segmentation. The Dice loss function is given in Eq (6), in which the terms for M and N are calculated separately and ε is a smoothing term, used mainly to avoid a zero denominator.
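The interaction of the two loss terms described in this section can be sketched in plain Python for a binary mask flattened to a list of pixels. The per-pixel averaging of the cross-entropy term, the soft (probability-weighted) intersection in the Dice term, the placement and value of `eps`, and the weights `w_ce` and `w_dice` are all assumptions made for illustration; the text does not specify them.

```python
import math

def cross_entropy(probs, labels, tiny=1e-12):
    """Mean binary cross-entropy over pixels; labels are 0 or 1."""
    total = 0.0
    for p, k in zip(probs, labels):
        p = min(max(p, tiny), 1.0 - tiny)  # keep log() finite
        total += k * math.log(p) + (1 - k) * math.log(1.0 - p)
    return -total / len(probs)

def dice_loss(probs, labels, eps=1e-5):
    """1 - Dice, with eps guarding against a zero denominator."""
    inter = sum(p * k for p, k in zip(probs, labels))
    dice = (2.0 * inter + eps) / (sum(probs) + sum(labels) + eps)
    return 1.0 - dice

def combined_loss(probs, labels, w_ce=1.0, w_dice=1.0):
    """Weighted sum of the two terms (the weights are hypothetical)."""
    return w_ce * cross_entropy(probs, labels) + w_dice * dice_loss(probs, labels)

# Imbalanced toy mask: 1 tumor pixel out of 8.
labels = [0, 0, 0, 0, 0, 0, 0, 1]
all_background = [0.01] * 8   # misses the tumor entirely
good = [0.01] * 7 + [0.99]    # finds it

# Cross-entropy alone stays fairly small even though the tumor is missed.
print(cross_entropy(all_background, labels))
# The Dice term is close to 1 for the same prediction, exposing the miss.
print(dice_loss(all_background, labels))
# The combined loss ranks the good prediction below the all-background one.
print(combined_loss(good, labels) < combined_loss(all_background, labels))
```

On this toy mask, averaging over mostly background pixels dilutes the cross-entropy penalty for the missed tumor pixel, while the Dice term depends only on the overlap with the foreground, which is the motivation for mixing the two.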
Dice is often used to learn network parameters so that the predicted value moves closer to the real value. The function is given in Eq (7).

U-Net is a typical encoder-decoder structure whose symmetric shape resembles the letter "U". The method in this paper is based on the U-Net model: ResNet50 serves as the feature extraction network of the improved U-Net, and CBAM is integrated into its residual modules. Figure 4 shows the improved U-Net, which is composed of an encoder, a decoder and skip connections. The encoder uses ResNet50 for feature extraction, and its stages correspond to part 1 through part 6 of ResNet50, respectively. The decoder uses up-sampling layers instead of the traditional CNN pooling layers to improve the resolution of the output feature map. Each up-sampling layer restores the feature map to the size of the previous layer through deconvolution. The decoder receives the semantic information from the bottom of the U-Net and recombines it with the high-resolution features of the encoder through the skip connections, so the U-Net used in this article can better segment fine structures. Finally, a 1 × 1 convolution maps the number of channels to the required number of categories, producing an output consistent in size with the input image.

Evaluation indicators
In this paper, we select Intersection over Union (IoU), the Dice coefficient and the Hausdorff distance (HD) as the performance evaluation metrics for the model [25]. IoU is a widely adopted evaluation metric in CNN-based semantic segmentation. It is defined in Eq (8):

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \quad (8)$$

where A represents the ground truth and B represents the network prediction result. The Dice coefficient is commonly used in semantic segmentation as a measure of sample similarity. It compares the segmented target region with the annotated target region: a high degree of similarity indicates a good segmentation result, and a low one suggests a poor result. The Dice coefficient is calculated as in Eq (9):

$$\mathrm{Dice} = \frac{2|A \cap B|}{|A| + |B|} \quad (9)$$

The Hausdorff distance is given in Eq (10):

$$H(A, B) = \max\left\{\max_{a \in A}\min_{b \in B}\|a - b\|,\ \max_{b \in B}\min_{a \in A}\|b - a\|\right\} \quad (10)$$

where A = {a1, a2, ..., ap}, B = {b1, b2, ..., bq} and ||·|| represents the distance norm between points of A and B.
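The three metrics can be illustrated with a short plain-Python sketch that treats each mask as a set of foreground pixel coordinates. The Euclidean norm, the brute-force Hausdorff search and the convention of returning 1.0 when both masks are empty are assumptions for illustration; `hausdorff` expects both sets to be non-empty.

```python
import math

def iou(a, b):
    """Intersection over Union of two sets of pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dice(a, b):
    """Dice coefficient of two sets of pixel coordinates."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def hausdorff(a, b):
    """Symmetric Hausdorff distance with the Euclidean norm (brute force)."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))

# Toy masks on a pixel grid: ground truth A and prediction B.
A = {(0, 0), (0, 1), (1, 0), (1, 1)}
B = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(iou(A, B))        # 2 shared pixels out of 6 in the union
print(dice(A, B))       # 0.5
print(hausdorff(A, B))  # 1.0
```

Higher IoU and Dice values mean more overlap, while a smaller Hausdorff distance means the worst-case boundary error is smaller, so the three metrics are read in opposite directions.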

Data set and test environment
The experimental data set in this paper is from the medical imaging database published in The Cancer Genome Atlas (TCGA) [26].

Python is used as the programming language and Windows 10 as the platform for the experiments, with the PyTorch framework as the training environment. The test computer was equipped with an AMD Ryzen 6 1700X six-core processor and an 8 GB GPU (GeForce GTX 1070Ti). The network was trained for 150 epochs with a batch size of 6 and a learning rate of 0.001. Input images were resized to 256 × 256 pixels. After every 5 training epochs, the network is validated and a model checkpoint is saved. Figure 6 shows the loss value over the course of network training. The loss value approaching a steady state reflects the convergence of the model. In the figure, the red curve illustrates the variation of the loss value during training, while the blue curve represents the loss value at each epoch. The trend of the loss curves shows that the model progressively converges as the number of training iterations increases. After 20 epochs, both the training and validation loss curves gradually stabilize, indicating that the model has reached a good state.

Ablation experiments
In this paper, the improved U-Net algorithm was used to segment MRI tumors. For comparison, the U-Net algorithm with ResNet50 as the backbone network (R-Unet) and the U-Net algorithm with CBAM (CB-Unet) were also evaluated. During the test, U-Net and its improved variants used the same network parameters, test environment and test set. Figure 7 shows the image segmentation results of the different networks. Column A is the MR brain tumor images, column B is the label images, column C is the original U-Net segmentation results, column D is the R-Unet segmentation results, column E is the CB-Unet segmentation results and column F is the improved U-Net segmentation results.
The figure shows that U-Net can locate brain tumors but cannot segment them completely, and its performance is poor. R-Unet performs better than U-Net: it can separate the tumor area on its own, although its segmentation may lack precision and result in over-segmentation. Despite this imperfect segmentation, replacing the backbone network clearly improves the results over the original U-Net for the MRI segmentation task. The segmentation results of CB-Unet are shown in column E. Compared row by row, the overall results of CB-Unet are better than those of U-Net. For some images CB-Unet is not as good as R-Unet, but its overall segmentation accuracy is good. In summary, both ResNet50 and CBAM can improve the segmentation performance for brain tumors.
Column F shows the segmentation performance of the improved U-Net, which combines ResNet50 and CBAM. The comparison shows that the improved U-Net algorithm segments the tumor region at the same location with higher accuracy and delineates the extent of the tumor region in more detail. Comparing column F with column B, the improved U-Net can segment brain tumor regions, but some over-segmentation remains. In summary, the improved U-Net algorithm exhibits a better segmentation effect than U-Net: it not only locates the target accurately but also achieves high accuracy in segmenting the tumor region.

Table 1 shows the test statistics of these segmentation methods. In terms of IoU, Dice and HD, the results of R-Unet and CB-Unet are both superior to those of U-Net. The improved U-Net algorithm outperforms both CB-Unet and R-Unet in IoU, with scores 1.51% and 2.41% higher, respectively. Similarly, its Dice score is 0.88% and 2.06% higher than those of CB-Unet and R-Unet, respectively. Finally, the improved U-Net also achieves the smallest HD, indicating that its segmented regions are closest to the target areas. These experimental results indicate that the improved U-Net algorithm exhibits superior segmentation performance in brain tumor analysis.

The segmentation results of improved U-Net
Comparing the segmentation performance before and after the improvements shows that the improved U-Net network in this paper performs best and can effectively segment the brain tumor area. To better reflect the segmentation effect of the improved network, Figure 8 shows the segmentation and extraction results for MRI. Column A is the original image, column B is the segmented image, column C is the segmented fusion image and column D is the segmentation and extraction of the tumor. The segmented images show that the network proposed in this paper can segment the tumor area and extract the tumor location effectively, with high segmentation accuracy. Compared with U-Net, CB-Unet and R-Unet, the improved segmentation algorithm reduces over-segmentation and inaccurate segmentation, and the segmented target coincides more closely with the actual target. In conclusion, the improved U-Net segmentation network has a better segmentation effect and can provide effective technical support for medical treatment.

Specifically, the Dice evaluation index is 3.18% and 2.41% higher compared to U-Net and R-Unet, respectively, while the MIoU evaluation index is 3.13% and 2.06% higher than U-Net and R-Unet, respectively. These results affirm that the proposed model effectively improves MRI segmentation and offers valuable technical insights for MRI diagnosis and treatment. Additionally, the model lays a solid theoretical foundation for the subsequent segmentation of 3D brain tumor images.

Use of AI tools declaration
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

Figure 1. The backbone network architecture used in this experiment was ResNet50.

Figure 2. The CBAM inserted in this experiment.

Figure 3. The specific application of CBAM in this network.

Figure 4. The improved U-Net network required for the experiments in this paper.
The dataset consists of 3929 pairs of brain MR images and corresponding manual FLAIR anomaly segmentation masks, with an image size of 256 × 256; sample images are shown in Figure 5. The background of the MR images is black, which greatly facilitates segmentation by network models. Before training, 100 images are reserved as test data for use after model fitting. The remaining 3829 images are divided into two sets at a ratio of 9:1, with 90% of the data allocated for training and 10% for validation. The training set is used for the normal training of the network, and the validation set is used to verify the performance of the model after each round of training. Furthermore, image augmentation is added in the training process, and each image input to the network is randomly flipped 5 times.
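The split sizes implied by these figures can be checked with a short plain-Python sketch. The random seed, rounding the training share down, and the choice of horizontal and vertical flips with probability 0.5 each are assumptions for illustration; the text states only the counts, the 9:1 ratio and that images are randomly flipped.

```python
import random

TOTAL_PAIRS = 3929    # image/mask pairs in the dataset
HELD_OUT = 100        # reserved for the final test
TRAIN_FRACTION = 0.9  # 9:1 split of the remainder

def split_indices(total, held_out, train_fraction, seed=0):
    """Shuffle indices and return (train, validation, test) index lists."""
    idx = list(range(total))
    random.Random(seed).shuffle(idx)
    test, rest = idx[:held_out], idx[held_out:]
    n_train = int(len(rest) * train_fraction)  # assumption: round down
    return rest[:n_train], rest[n_train:], test

def random_flip(image, rng):
    """Flip a 2D image (list of rows) horizontally and/or vertically at random."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    if rng.random() < 0.5:
        image = image[::-1]
    return image

train, val, test = split_indices(TOTAL_PAIRS, HELD_OUT, TRAIN_FRACTION)
print(len(train), len(val), len(test))  # 3446 383 100
```

In a segmentation setting the same flip has to be applied to an image and its mask, for example by drawing the flip decision once and applying it to both.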

Figure 5. The original MRI images required for the experiment.

Figure 6. The change in loss value during the network training process.

Figure 7. Segmentation results of different segmentation algorithms on the same objects.

Figure 8. The segmentation results of the improved algorithm.
An enhanced U-Net network model was proposed in this paper for MRI brain tumor segmentation. The model uses ResNet50 as the backbone of the improved U-Net, whose deeper architecture enhances the effectiveness of feature extraction. The CBAM module is incorporated into the residual component of the backbone network, enhancing representation capability by emphasizing important features while suppressing irrelevant ones. To address class imbalance in the data and improve tumor core region segmentation, a combination of the cross-entropy loss function and the Dice similarity coefficient is used as the network's loss function. Experimental results demonstrate that the improved U-Net network model outperforms its predecessor in terms of segmentation performance.

Table 1. Results of ablation experiments.