Article

Computer-Aided Diagnosis System for Breast Ultrasound Reports Generation and Classification Method Based on Deep Learning

College of Computer Science, Sichuan University, Section 4, Southern 1st Ring Rd, Chengdu 610065, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6577; https://doi.org/10.3390/app13116577
Submission received: 22 April 2023 / Revised: 22 May 2023 / Accepted: 24 May 2023 / Published: 29 May 2023

Abstract
Breast cancer is one of the most common malignancies threatening women’s health. Ultrasound is a widely used technique for the early detection of breast tumors. However, after receiving a paper ultrasound report, most patients have to wait several days for the diagnostic result, which increases their psychological burden and may delay treatment. Based on deep learning, this study designed a computer-aided diagnosis system that directly classifies benign and malignant tumors in breast ultrasound images photographed from paper reports by patients, helping them obtain auxiliary diagnostic results as soon as possible. To segment and denoise photographs of patients’ ultrasound reports, this paper proposes a breast ultrasound report generation method, which mainly includes a segmentation model, a rotation classification model and a generative model. With this method, multiple high-quality individual breast ultrasound images can be obtained from a single ultrasound report photo, improving the performance of the breast ultrasound image classification model. To exploit these high-quality breast ultrasound images and improve classification performance, this paper also proposes a breast ultrasound report classification model that includes a feature extraction module, a channel attention module and a classification module. When the input images contained noise, the model reached an accuracy of 89.31%, a recall of 88.65%, a specificity of 89.57%, an F1 score of 89.42% and an AUC of 94.53%. The proposed method is well suited to practical application scenarios and can quickly and accurately help patients obtain benign/malignant classification results from their ultrasound reports.

1. Introduction

At present, breast cancer poses the greatest threat to women’s health as the most prevalent type of malignant tumor [1]. Preventive breast screening can detect breast tumors early and improve the survival rate of patients [2]. Breast imaging is the most important screening approach, and common modalities include ultrasound, X-ray mammography and MR imaging. In Europe and America, X-ray mammography is the main breast screening modality, with a sensitivity of about 85%; however, for dense breasts the sensitivity drops to 47–64.4% [3]. In China, 49.2% of women have dense breasts, so using X-rays for breast screening can increase false negative results. Ultrasound is not affected by breast density, and it is radiation-free, non-invasive and low-cost, so it is often used as a routine examination method in middle-income countries.
During an ultrasound examination, the sonographer moves the detection probe over the patient’s breast in different directions and, based on experience, examines key regions and captures images. After the screening is completed, the sonographer fills in an ultrasound examination report, whose main contents are the breast ultrasound images of areas with suspected findings captured during screening, a morphological description of the suspected tumor and the sonographer’s diagnostic recommendations. After obtaining the report, a professional doctor comprehensively considers the images in the report and the corresponding information provided by the sonographer to give a diagnostic opinion. However, in practice, because many patients are waiting for diagnosis, doctors are often unable to analyze ultrasound reports in time. After receiving the ultrasound report, patients usually have to wait several days for the diagnostic advice of a professional doctor. This waiting time places high psychological pressure on patients and may cause them to miss the best window for early treatment. Moreover, when benign and malignant breast tissues share similar features, it can be challenging for less experienced physicians to determine whether a region is malignant.
Because breast tumor screening plays an important role in the prevention, diagnosis and treatment of breast cancer, and because doctors face heavy diagnostic workloads and differ in diagnostic experience, it is particularly important to provide a fast, objective and accurate auxiliary diagnostic method. With the gradual development of deep learning, more and more artificial intelligence applications have emerged, and computer-aided diagnosis (CAD) is one of them. Such a system can analyze breast ultrasound images and provide benign or malignant auxiliary diagnostic results. However, current CAD systems [4,5,6,7] for this classification task predominantly focus on analyzing electronic images extracted from hospital systems rather than evaluating the entire ultrasound report. This is inconsistent with actual application scenarios, because patients can only obtain paper ultrasound reports and cannot access high-quality electronic images within the hospital. Classifying benign and malignant cases from paper-based breast ultrasound reports tends to be more challenging than analyzing a single electronic breast ultrasound image. Firstly, a breast ultrasound report contains multiple breast ultrasound images, which conventional image classification methods have difficulty handling. Secondly, paper-based breast ultrasound reports photographed with a mobile phone often carry environmental noise such as glare, rotation and deformation, which can affect the accuracy of the classification model. In addition, some existing CAD systems [8,9] are based on a VGG network structure for classification and do not consider the gradual degradation of model performance as the network deepens. Many methods use transfer learning to extract image features but do not filter out the important feature channels as inputs for the classification model. Finally, existing CAD systems do not provide a simple and easy-to-use system for patients but rather deploy an internal hospital system to assist doctors in diagnosis.
To tackle the aforementioned challenges, this paper presents a mobile-assisted diagnostic system for the classification of benign and malignant tumors in breast ultrasound images. The system primarily consists of a breast ultrasound image generation method and a breast ultrasound image classification model. The image generation method is employed for segmenting and denoising the breast ultrasound report, while the image classification model is utilized for distinguishing between benign and malignant types of breast ultrasound images. The primary contributions of this article include the following:
  • This study presents a breast ultrasound report image generation method to solve the problem of excessive noise in breast ultrasound report photos. The method consists of three models: a segmentation model, a rotation classification model and a generation model. It restores low-quality breast ultrasound images and generates higher-quality ones, which improves the classification performance of the subsequent classification model;
  • This work presents a classification method of benign and malignant tumors in ultrasound reports utilizing ResNet [10]. The method consists of a feature extraction module, a channel attention module and a classification module.
  • A convenient and simple self-diagnosis platform for breast cancer tumors is designed, allowing users to perform assisted diagnosis on their own after receiving ultrasound reports.

2. Related Work

Computer-aided diagnosis (CAD) systems can analyze breast ultrasound images and provide auxiliary diagnosis results for benign or malignant tumors. Numerous research efforts have focused on the segmentation, feature extraction, feature selection and classification of breast images [11,12,13,14]. Xuan et al. [12] proposed a technique that integrates region growing and edge detection for segmenting magnetic resonance (MR) images. High feature counts increase computational cost and slow down the classification process. Feature selection methods reduce the feature space to improve accuracy and minimize computation time by removing redundant, irrelevant and noisy features [13]. The classification stage, the core of a CAD system, assigns the extracted ROI features to predefined classes, typically a binary classification into positive and negative classes. Support vector machines (SVMs) [15,16] are important methods in classification tasks; a breast cancer diagnosis approach utilizing an SVM-based method was suggested by Akay et al. [14].
Recently, owing to advances in neural networks, deep convolutional neural networks (CNNs) [17] have been developed that achieve high accuracy in complex image classification tasks such as the ImageNet challenge [18], advancing progress in classification and detection. Deep learning techniques learn features automatically via neural networks, bypassing the manual design and feature extraction processes typically involved in conventional machine learning approaches. They can handle more complex data and have stronger generalization ability and scalability. Daoud et al. [19] proposed using deep feature extraction and transfer learning to complete the breast ultrasound image classification task. Hijab et al. [20] initially trained a small network from scratch as a baseline model and then employed two more complex network models: the first was VGG16 [21], pre-trained on over 14 million images, and the second was a fine-tuned model built on features extracted by the pre-trained VGG16. Masud et al. [22] utilized eight CNN models pre-trained on the ImageNet dataset together with one custom model, employing the convolutional layer weights of the pre-trained models as feature extractors for breast cancer classification. Gheflati et al. [23] first used ViT to classify breast ultrasound images with different enhancement strategies. Sahu et al. [24] developed five new breast cancer detection frameworks based on deep hybrid convolutional neural networks; the proposed hybrid scheme outperformed the respective base classifiers while maintaining the overall advantages of both networks. Table 1 compares existing methods for breast ultrasound image classification.

3. Materials and Methods

3.1. Patient and Image Characteristics

We collected 16,091 electronic breast ultrasound images from three hospitals in China, referred to as ElecPic in this paper. Sonographers used these images to generate ultrasound reports, which included four or more ultrasound images, image analysis and the sonographer’s inspection suggestions. To simulate the process of patients photographing and uploading their reports, we de-identified the ElecPic dataset and printed it on A4 paper to construct paper-based ultrasound reports. Then, we used 25 different models of smartphones to take pictures of these reports. We used the YOLOv5s [26] model to segment the photos of these ultrasound reports, and the segmented images obtained in this way are referred to as Photo in this paper. In addition, when users photograph ultrasound reports with their phones, many types of noise such as deformation, light spots and rotation are introduced. Deformation noise is often caused by the folding or improper handling of ultrasound reports by patients, leading to distortions in the ultrasound images. Light spot artifacts can be attributed to excessive light exposure in the environment during photography. Rotational noise may arise when patients capture ultrasound images with mobile phones oriented vertically or horizontally. To simulate these issues, we applied corresponding processing when constructing the paper-based ultrasound reports. We added deformation, light spot and rotation processing to the Photo dataset separately, producing the datasets referred to as Defor-Photo, Spot-Photo and Rot-Photo, respectively. Furthermore, we randomly mixed these noises together and obtained a dataset containing all three types of noise, called the Mix-Photo dataset. In training our generation model, we used Mix-Photo as the input images and ElecPic as the target images. Considering that large-angle rotations significantly affect the generation quality of our model, we also trained, based on the Rot-Photo dataset, a classification model that judges whether an input image has a rotation problem. The images produced by our generation model are called GenPic hereafter. Finally, the classification results were obtained by inputting both the GenPic and Mix-Photo datasets into our classification model. The datasets used in this article are listed in Table 2.
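To make the construction of the noisy datasets concrete, the sketch below shows one possible way to synthesize the three noise types described above (light spots, deformation and ±90-degree rotation) on a cropped ultrasound image using OpenCV. The function names and parameter values are illustrative assumptions, not the exact augmentation pipeline used to build Defor-Photo, Spot-Photo and Rot-Photo.

```python
# Minimal sketch (assumptions: BGR uint8 images loaded with cv2.imread; parameter
# values are illustrative, not the authors' exact settings).
import cv2
import numpy as np

def add_light_spot(img, radius=60, strength=120):
    """Brighten a blurred circular region to mimic glare on a paper report."""
    h, w = img.shape[:2]
    center = (int(np.random.randint(w)), int(np.random.randint(h)))
    mask = np.zeros((h, w), dtype=np.float32)
    cv2.circle(mask, center, radius, 1.0, thickness=-1)
    mask = cv2.GaussianBlur(mask, (0, 0), sigmaX=radius / 2)
    out = img.astype(np.float32) + strength * mask[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)

def add_deformation(img, max_shift=0.05):
    """Apply a small random perspective warp to mimic a bent or folded report."""
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.rand(4, 2).astype(np.float32) - 0.5) * 2 * max_shift * np.float32([w, h])
    M = cv2.getPerspectiveTransform(src, src + jitter)
    return cv2.warpPerspective(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)

def add_rotation(img):
    """Rotate by 90 degrees clockwise or counterclockwise to mimic phone orientation."""
    code = np.random.choice([cv2.ROTATE_90_CLOCKWISE, cv2.ROTATE_90_COUNTERCLOCKWISE])
    return cv2.rotate(img, int(code))
```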

3.2. Overall Structure

The overall architecture is shown in Figure 1. Our system consists of a mobile phone-assisted diagnosis program, a breast ultrasound report generation method and a breast ultrasound classification model.
  • Mobile-assisted diagnostic program: Patients use the assisted diagnostic program provided by us to take and upload breast ultrasound reports, and after subsequent processing, acquire the classification outcomes for benign and malignant breast ultrasound images.
  • Breast ultrasound report generation method: As mentioned earlier in this article, it is difficult for classification models to directly classify the benign and malignant breast ultrasound reports. There is a lot of noise in the photos taken by users. Therefore, we use segmentation models, rotation classification models and generation models to segment and denoise breast ultrasound reports.
  • Breast ultrasound report classification model: The classification model has two inputs. One input is the high-quality image produced by the generative model, and the other is the original low-quality image segmented by the segmentation model. Both are fed into the classification model and, after passing through the feature extraction module, channel attention module and classification module, the final classification result is obtained.

3.3. Breast Ultrasound Report Generation Method

The breast ultrasound classification model cannot directly classify breast ultrasound reports, so it is necessary to segment the breast ultrasound reports. We annotated multiple breast ultrasound reports and trained a YOLOv5s model. With the YOLOv5s model, we obtained multiple independent images of breast ultrasounds.
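As an illustration of this segmentation step, the sketch below crops the individual ultrasound images out of a report photo with a trained YOLOv5s detector, using the torch.hub interface of the ultralytics/yolov5 repository. The checkpoint name report_yolov5s.pt and the confidence threshold are hypothetical placeholders, not the authors' actual files or settings.

```python
# Hedged sketch: assumes a YOLOv5s checkpoint fine-tuned on annotated report photos.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="report_yolov5s.pt")

def crop_ultrasound_images(report_path, conf_thres=0.5):
    """Return the ultrasound image regions detected in one report photo."""
    report = cv2.imread(report_path)
    rgb = cv2.cvtColor(report, cv2.COLOR_BGR2RGB)   # YOLOv5 hub models expect RGB
    detections = model(rgb).xyxy[0].tolist()        # [x1, y1, x2, y2, conf, class]
    crops = []
    for x1, y1, x2, y2, conf, _cls in detections:
        if conf >= conf_thres:
            crops.append(report[int(y1):int(y2), int(x1):int(x2)])
    return crops
```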
The breast ultrasound report photos taken by users contain a lot of noise, which can affect the performance of the classification model. Common types of noise include light spots, deformation and rotation. We prepared a dataset containing these types of noise in advance. Then, we built a Pix2Pix [27] model to eliminate the influence of these noises and generate high-quality breast ultrasound images. Pix2Pix is based on cGAN (conditional GAN) [28] and is mainly used for supervised image-to-image generation tasks. Compared with a traditional GAN, Pix2Pix replaces the random noise input with an image provided by the user; in this task, the input of the Pix2Pix model is a low-quality breast ultrasound image. In terms of the loss function, cGAN’s objective includes an L2 loss, while Pix2Pix chooses an L1 loss. The formula for the L1 loss is shown in Equation (1).
$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\left[ \lVert y - G(x) \rVert_1 \right]$ (1)
where $\mathcal{L}_{L1}(G)$ denotes the evaluation of the performance of generator $G$ using the L1 loss function, $\mathbb{E}_{x,y}$ denotes the expectation over the sample pairs $(x, y)$, and $\lVert y - G(x) \rVert_1$ is the L1 distance between the true value $y$ and the generator’s prediction $G(x)$. The final objective function of Pix2Pix is Equation (2).
$\mathcal{L}_{pix2pix}(G, D) = \mathcal{L}_{GAN}(G, D) + \lambda\, \mathbb{E}_{x,y}\left[ \lVert y - G(x) \rVert_1 \right]$ (2)
$\mathcal{L}_{GAN}(G, D)$ represents the adversarial loss, where $G$ is the generator and $D$ is the discriminator, and $\lambda$ is a hyperparameter that controls the weight of the reconstruction loss. We input low-quality breast ultrasound images and their corresponding electronic images into the Pix2Pix model, which learns their relationship and generates a fake high-quality image that retains the features of the low-quality image while removing its noise. By observing the generated results of our Pix2Pix model, we found that image quality deteriorates significantly when the rotation angle of a low-quality image exceeds 90 degrees; therefore, we trained a classification model to correct the rotation angle of low-quality images and solve this problem.
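For concreteness, the sketch below computes the generator objective of Equation (2): a conditional adversarial term plus the λ-weighted L1 term. The generator and discriminator modules and the value λ = 100 (the default suggested in the Pix2Pix paper) are assumptions; this is not the exact training code used in this work.

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()  # adversarial loss on raw discriminator logits
l1_criterion = nn.L1Loss()              # reconstruction term ||y - G(x)||_1
lambda_l1 = 100.0                       # assumed weighting of the L1 term

def generator_loss(G, D, noisy_photo, clean_target):
    """Adversarial loss + lambda * L1 loss, as in Equation (2)."""
    fake = G(noisy_photo)
    # The conditional discriminator sees the (input, output) pair, concatenated on channels.
    pred_fake = D(torch.cat([noisy_photo, fake], dim=1))
    adv = adv_criterion(pred_fake, torch.ones_like(pred_fake))  # try to fool D
    rec = l1_criterion(fake, clean_target)
    return adv + lambda_l1 * rec
```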

Rotating Classification Model

We found that the current dataset contains many low-quality images with rotation angles exceeding 90 degrees. This problem may be caused by different shooting methods of users. When we input these images into the Pix2Pix model for training, the generated high-quality images are significantly different from the original low-quality ones. Therefore, we collected 2352 images with such problems and trained a classification model. As ResNet has powerful feature classification capabilities, ResNet34 was chosen as the classification network. The classification results include three categories: not rotated, rotated clockwise by 90 degrees and rotated counterclockwise by 90 degrees. Before inputting low-quality images into the generation model, the rotation classification model will determine whether there is a rotation problem in the input image.
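A minimal sketch of such a rotation classifier is shown below: an ImageNet-pretrained ResNet34 with its final layer replaced by a three-way output, plus a helper that rotates a predicted image back to the upright orientation. The label-to-class mapping and the torchvision weight name are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Three classes: 0 = not rotated, 1 = rotated 90 degrees clockwise,
# 2 = rotated 90 degrees counterclockwise (assumed label order).
rot_model = models.resnet34(weights="IMAGENET1K_V1")
rot_model.fc = nn.Linear(rot_model.fc.in_features, 3)

def correct_rotation(img_chw):
    """Predict the rotation class of a CHW tensor and rotate it back to upright."""
    rot_model.eval()
    with torch.no_grad():
        pred = rot_model(img_chw.unsqueeze(0)).argmax(dim=1).item()
    if pred == 1:                                   # was rotated clockwise -> undo it
        return torch.rot90(img_chw, k=1, dims=(1, 2))
    if pred == 2:                                   # was rotated counterclockwise -> undo it
        return torch.rot90(img_chw, k=-1, dims=(1, 2))
    return img_chw
```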

3.4. Breast Ultrasound Report Classification Model

As shown in Figure 2, the classification model consists of a parallel feature extraction module, a channel attention module and a classification module. The parallel feature extraction module consists of two ResNet50 networks: one takes the original low-quality image as input, while the other takes the high-quality image generated by the Pix2Pix model as input. The high-quality image eliminates problems such as glare, deformation and rotation in the low-quality image but may lose some of its original features. Therefore, to retain the features of the low-quality image, we constructed a dual-input network model with the original low-quality image as one of the inputs.
After the parallel feature extraction module, we obtain two sets of feature maps. These two feature maps are concatenated and fed into the channel attention module, which filters the concatenated channels, calculates the channel attention scores and applies them as weights to each channel. It increases the weight of important channels in the concatenated feature map and decreases the weight of channels that are less relevant to the final classification result. After passing through the channel attention module, each channel has its corresponding weight information. Finally, these channels are provided as input to a classification network, consisting primarily of fully connected layers, to derive the final classification outcome.

3.4.1. Feature Extraction Module

To perform the subsequent classification, feature extraction from breast ultrasound images is essential. We employed ResNet50 as the backbone of the feature extraction module. To prevent overfitting, we utilized transfer learning to initialize the network parameters with pre-trained weights and fine-tuned them. We opted for the non-medical ImageNet dataset for pre-training. Although the ImageNet and breast ultrasound datasets differ, most images share similar low-level features. Consequently, pre-training gives the model a general feature extraction ability, diminishing the need for extensive datasets and reducing training time and memory costs. ResNet50 is composed of several similar residual blocks, with a mapping path connecting the input and output of each residual block. Based on ResNet50, we removed the final average pooling layer and fully connected layer from the original network structure, retained only the residual blocks, and used the feature maps output by the final residual block as the input to the channel attention module. To simultaneously extract information from low-quality breast ultrasound images and high-quality generated images, we designed two feature extraction modules.
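A minimal sketch of this truncated backbone, assuming the torchvision ResNet50 implementation: the average pooling and fully connected layers are dropped so that each branch outputs the 2048-channel feature map of the last residual stage.

```python
import torch.nn as nn
from torchvision import models

def build_backbone():
    """ImageNet-pretrained ResNet50 without its avgpool and fc layers."""
    resnet = models.resnet50(weights="IMAGENET1K_V1")
    return nn.Sequential(*list(resnet.children())[:-2])

backbone_photo = build_backbone()    # branch for the original low-quality crop (Photo)
backbone_genpic = build_backbone()   # branch for the Pix2Pix-generated image (GenPic)
# For a 224 x 224 input, each branch outputs a 2048 x 7 x 7 feature map.
```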

3.4.2. Channel Attention Module

After the extraction operation of the parallel feature extraction module, the feature maps of the two input images are obtained. We then need to extract important channel information from the two feature maps. Firstly, we concatenate the two feature maps, as represented by Equation (3):
$r_{con} = \mathrm{Concatenate}(r_s, r_g)$ (3)
Here, $r_s$ and $r_g$ correspond to the feature vectors extracted by ResNet50 from the original low-quality breast ultrasound image and the generated high-quality breast ultrasound image, respectively, and the two vectors are concatenated using the concatenate function. Then, we use the Squeeze operation, a global average pooling (GAP) operation, to compress the feature map, as in Equation (4):
$z = F_{seq}(r_{con}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} r_{con}(i, j)$ (4)
H denotes the height of the feature image, while W corresponds to the width of the feature map. The GAP operation performs average pooling on the spatial dimensions until each spatial dimension becomes 1, while keeping the other dimensions unchanged. We need to utilize the correlation between channels rather than spatial distribution. Using GAP to mask spatial distribution information can make calculations more accurate.
The next step after the Squeeze operation is the Excitation step, which is implemented through two fully connected layers. Let $z_c$ be the number of input channels and $m$ be the channel scaling parameter. The first fully connected layer scales the number of channels to $z_c / m$; the result is then passed through a ReLU layer and fed into the second fully connected layer. The second fully connected layer restores the channel number to $z_c$, and finally a Sigmoid layer produces the attention scores. The formula for these steps is Equation (5):
$s = F_{exc}(z, W) = \sigma\left( W_2\, \delta(W_1 z) \right)$ (5)
$W_1$ and $W_2$ represent the weights of the two fully connected layers, and $z$ represents the input. In this step, the scaling parameter is used to reduce the number of channels in order to lower the computational complexity and is usually set to 16. After obtaining the attention score $s$, it is multiplied by $r_{con}$ to enhance important feature channels and weaken unimportant ones, thereby increasing the weight of the important feature channels, as in Equation (6):
$x = F_{Scale}(r_{con}, s) = s \cdot r_{con}$ (6)
$s$ represents the attention score and $r_{con}$ is the concatenated feature map. After computing the channel attention module, we acquire the final feature maps, which the classification module uses to derive the ultimate classification outcome.
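The sketch below implements Equations (3)–(6) as a PyTorch module, under the assumption that each ResNet50 branch contributes 2048 channels, so the concatenated map has 4096 channels and a reduction ratio of m = 16 is used.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-Excitation style attention over the concatenated feature map."""
    def __init__(self, channels=4096, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # Eq. (4): global average pooling
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # scale channels to z_c / m
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # restore to z_c
            nn.Sigmoid(),                                # attention scores s
        )

    def forward(self, r_s, r_g):
        r_con = torch.cat([r_s, r_g], dim=1)             # Eq. (3): channel concatenation
        b, c, _, _ = r_con.shape
        z = self.squeeze(r_con).view(b, c)
        s = self.excite(z).view(b, c, 1, 1)              # Eq. (5)
        return s * r_con                                 # Eq. (6): reweight the channels
```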

3.4.3. Classification Module

The classification module consists of one average pooling layer and two fully connected layers. The feature map produced by the channel attention module first passes through the average pooling layer and then through the first fully connected layer, where the channel dimension is reduced from 4096 to 1024. It then proceeds through the second fully connected layer, where the channel dimension is further reduced from 1024 to 2. Finally, the module outputs the benign or malignant classification result for the breast ultrasound image.
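Putting the three modules together, the sketch below assembles a dual-input classifier from the backbone and attention sketches given above, followed by average pooling and the 4096 → 1024 → 2 fully connected head. The non-linearity between the two fully connected layers is an assumption.

```python
import torch.nn as nn

class ReportClassifier(nn.Module):
    """Dual-input benign/malignant classifier (sketch; reuses build_backbone and ChannelAttention)."""
    def __init__(self):
        super().__init__()
        self.backbone_photo = build_backbone()
        self.backbone_genpic = build_backbone()
        self.attention = ChannelAttention(channels=4096)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Linear(4096, 1024),
            nn.ReLU(inplace=True),   # assumed activation between the two FC layers
            nn.Linear(1024, 2),      # benign / malignant
        )

    def forward(self, photo, genpic):
        x = self.attention(self.backbone_photo(photo), self.backbone_genpic(genpic))
        return self.head(self.pool(x).flatten(1))
```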

3.5. Dataset and Experimental Settings

3.5.1. Dataset

For the generative model, we used the Mix-Photo dataset for training. For the rotating classification model, we annotated 3200 images with four categories of labels: normal direction, 90 degrees clockwise, 180 degrees clockwise and 270 degrees clockwise, with 800 images per label. For the breast ultrasound report classification model, both the GenPic and Mix-Photo datasets were used for training; they contain 9506 benign labels and 7395 malignant labels. The data were partitioned into a training set, validation set and test set at a ratio of 7:1.5:1.5.

3.5.2. Experimental Settings

The segmentation and generation models in this article are trained based on the methods provided by the open-source projects YOLOv5 and Pix2Pix. The rotation classification model and breast ultrasound report classification model proposed in this article are implemented with the deep learning framework PyTorch. The GPU used for training is an NVIDIA Tesla K40 with 12 GB of memory. When training the classification network, the images are resized to 224 × 224. The optimizer is Adam (weight decay rate $1 \times 10^{-4}$, momentum 0.9), the batch size is 12, the learning rate is $1 \times 10^{-3}$ and the number of training iterations is set to 200, with the best-performing parameters selected on the validation set after each iteration. Finally, the performance of the model was evaluated on the test set to obtain its performance indicators.
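The reported settings translate into roughly the following training loop sketch. The Adam betas and the device handling are assumptions; the text specifies only the learning rate, weight decay, a momentum of 0.9, the batch size and 200 iterations.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ReportClassifier().to(device)               # from the sketch in Section 3.4.3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    """One pass over a loader yielding (Photo, GenPic, label) batches of size 12."""
    model.train()
    for photo, genpic, label in loader:
        photo, genpic, label = photo.to(device), genpic.to(device), label.to(device)
        optimizer.zero_grad()
        loss = criterion(model(photo, genpic), label)
        loss.backward()
        optimizer.step()
```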
For the generative model, this article mainly evaluates two aspects: on the one hand, its denoising effect; on the other hand, the performance of the breast ultrasound report classification model before and after adding the generative model. For classification models, we use accuracy, recall, AUC and the F1 score. TP (true positive) and TN (true negative) denote correctly classified positive and negative samples, respectively, while FP (false positive) and FN (false negative) denote misclassified samples. Accuracy is the ratio of correctly classified samples; its formula is Equation (7):
$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (7)
Recall is the proportion of true positives among actual positives; the formula for Recall is Equation (8):
$Recall = \frac{TP}{TP + FN}$ (8)
AUC measures classifier performance, ranging from 0.5 to 1. The F1 score is the harmonic mean of precision and recall; the formula for F1 is Equation (9):
$F_1 = \frac{2 \times TP}{2 \times TP + FP + FN}$ (9)
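A small sketch computing these metrics (plus specificity) from raw counts, treating malignant as the positive class; AUC would normally be computed from predicted probabilities, for example with sklearn.metrics.roc_auc_score.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, specificity and F1 from confusion-matrix counts (Eqs. (7)-(9))."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),            # sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```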

4. Results

In this section, we first conduct performance tests and comparisons on the breast ultrasound report classification model, the rotation classification model and the breast ultrasound image generation model. We then perform ablation experiments on each model.

4.1. Performance Experiment

4.1.1. Breast Ultrasound Report Classification Model

To evaluate the effectiveness of the proposed breast ultrasound report classification model, the model was trained using the Mix-Photo and GenPic datasets. The best parameters were selected according to the validation set, while the model’s true performance was evaluated using the test set. Moreover, alternative methods and models were trained on the Mix-Photo dataset and contrasted with the proposed approach, as illustrated in Table 3.
Table 3 compares recent work on breast ultrasound classification tasks; all of these methods are based on deep learning models. It can be clearly seen from the table that our method achieves the best performance on all indicators: the accuracy is 0.8931, recall is 0.8865, specificity is 0.8957, F1 score is 0.8942 and AUC is 0.9453. This indicates that our method has higher classification performance than the other methods. In addition, compared with the other works, the basic ResNet50 network already shows better performance on all indicators. The feature extraction module in this article is also based on the ResNet network, which reflects that the feature extraction ability of ResNet can significantly improve the performance of models for medical image processing tasks.

4.1.2. Rotating Classification Model

This section tests the performance of the rotation classification model and compares networks from the ResNet series. The performance of other types of classification models was also compared, and the results are shown in Table 4. Since the rotation classification model is a multi-class model, the micro-average method was used to aggregate the per-class prediction results. The micro-average method sums the TP, FP and FN counts over all categories before applying the binary classification formulas.
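As a concrete illustration of the micro-average used here, the sketch below sums the per-class TP, FP and FN counts of the multi-class rotation predictions before applying the binary formulas; the function and argument names are illustrative.

```python
import numpy as np

def micro_average(y_true, y_pred, num_classes):
    """Micro-averaged precision, recall and F1 for a multi-class prediction."""
    tp = fp = fn = 0
    for c in range(num_classes):
        tp += np.sum((y_pred == c) & (y_true == c))
        fp += np.sum((y_pred == c) & (y_true != c))
        fn += np.sum((y_pred != c) & (y_true == c))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)
```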
From the table, it is evident that as the depth of the ResNet model increases, it can extract more profound feature information from images. There is a significant improvement in the model’s classification performance from ResNet18 to ResNet50. ResNet34 achieves the best results in terms of accuracy, recall rate and F1 score. While ResNet50 attains the highest AUC score of 0.8862, it is slightly inferior to ResNet34 in other performance metrics. Although the performance of ResNet34 and ResNet50 is comparable, ResNet34 has a relatively simpler structure with fewer parameters. Considering this study’s practical use case, we opt for ResNet34 as our rotation classification model.
From the results, it can be seen that the ResNet models perform better than VGG, InceptionV3 and DenseNet121 in terms of accuracy, AUC and F1 score. However, in terms of recall rate, DenseNet121 performs better than ResNet34. Additionally, from the table it can be seen that DenseNet121 and ResNet34 have similar and higher performance in all indicators compared with the other two models. This also reflects that by effectively utilizing features from previous layers to construct deeper and more accurate convolutional neural network models, image recognition and classification performance can be improved.

4.1.3. Generative Model

The generative model is trained using the Mix-Photo and ElecPic datasets, mainly to remove noises such as deformation, glare and rotation in the Mix-Photo dataset. The generated images are called GenPic. The generative model used in this article removes image noises, as shown in Figure 3.
From Figure 3a, it can be seen that the generative model has a strong ability to remove light spot noise, and the generated images are very close to the ElecPic images. It also removes the color flow Doppler on the ElecPic images, making the tumor area more prominent. It can be seen from Figure 3b that the generative model cannot handle images with significant deformation noise very well. The deformation of Mix-Photo in the first column is more significant, resulting in a significant difference between the GenPic and ElecPic. The images in the second and third columns have less deformation, and GenPic is closer to ElecPic. In addition to light spot and deformation noises, there is also significant rotation. We use the proposed rotation classification model to remove this noise, and the image generation results can be viewed in the comparison shown in Figure 4. In addition to the subjective analysis of GenPic, we also compared the performance differences of the breast ultrasound report classification model when inputting GenPic and Mix-Photo. For details, please refer to Section 4.3.4 of this article for comparative experiments.

4.2. Generative Model Ablation Experiment

Due to the use of a rotation classification model in the generative model for restoring images with large angle rotations, a comparison was made between the generated images of the generative model with and without the rotation classification model, as shown in Figure 4.
Figure 3. Noise removal performance of generative model on breast ultrasound photos. (a) Image comparison with light spot noise. (b) Image comparison with deformation noise.
Figure 4. Comparison of image generation quality between non-rotation classification model and rotation classification model. (a) ElecPic. (b) Photo. (c) None Rot-Photo. (d) Rot-Photo.
From Figure 4c, it can be seen that before adding the rotation classification model, the generated image quality of the generation model was very poor and could not obtain images similar to ElecPic. However, after adding the rotation classification model, as shown in Figure 4d, the generation model can generate images highly similar to ElecPic. Compared with Photo, the color is closer to ElecPic and there are more details similar to ElecPic in dark areas of the image.

4.3. Breast Ultrasound Report Classification Model Ablation Experiment

4.3.1. Network Structure Ablation Experiment

This article proposes a benign and malignant classification model for breast ultrasound reports, which consists of a feature extraction module, a channel attention module and a classification module. To evaluate the performance enhancements resulting from the feature extraction and channel attention modules, an ablation experiment was conducted, and the experimental findings are presented in Table 5.
The Baseline in the table refers to using the basic ResNet50 network for the classification task, without the feature extraction module and channel attention module proposed in this paper. Through analysis of the data in the table, it can be found that after adding only the feature extraction module, the model’s accuracy, recall rate, specificity, F1 score and AUC all decreased compared with the Baseline. This is because, without the channel attention module, the model cannot weight the feature channels, which degrades its classification performance. After adding the channel attention module on top of the Baseline and the feature extraction module, accuracy increased by 1.02% and the recall rate increased by 1.14% compared with the Baseline alone, and the other indicators also improved. This shows that combining the feature extraction module with the channel attention module can significantly improve classification performance. In this section, a structural ablation experiment was also conducted on the channel attention module alone; compared with the Baseline without the channel attention module, accuracy increased by 0.92%, the recall rate increased by 0.99% and the other indicators also improved.

4.3.2. Feature Extraction Module Ablation Experiment

Unlike a conventional feature extraction module, the proposed feature extraction module requires two inputs: one is Photo and the other is GenPic. To compare the influence of the dual input on the proposed classification model with that of a single input, an ablation experiment was conducted in this section, and the results are shown in Table 6.
From the table, it can be seen that when the input image is Photo, the accuracy is higher compared with only inputting GenPic. This may be because Photo retains all of the original information of the image, while GenPic may change some original information. The recall rate is the highest when only inputting GenPic. GenPic can restore the edge shape of breast tumors to a certain extent, allowing the classification model to capture more edge information in tumor areas and detect more breast tumor areas that cannot be detected in the original image, thus improving the recall rate. Through a comparison experiment between single-input and dual-input models, it can be seen that with the dual-input model, classification performance can be improved by considering both original information from Photo and partially restored tumor regions from GenPic.

4.3.3. Channel Attention Module Ablation Experiment

To validate the effectiveness of the channel attention module (CA) used in this article, it was compared with the spatial-channel attention module CBAM [29]. The results are shown in Table 7.
From the table, it is observable that after adding the spatial-channel attention module, the model’s accuracy increased by 0.31%, the recall rate increased by 0.1%, specificity increased by 1.89%, the F1 score increased by 0.7% and AUC increased by 0.63%. Although this is a slight improvement, it is still not as good as adding only the channel attention module. It is speculated that the spatial attention component increases the weights of some irrelevant areas of the image at the spatial level, which affects the channel attention module’s selection of channel weights and thus reduces the model’s classification performance.

4.3.4. Impact of Generative Model on Classification Performance

In the actual breast ultrasound image classification task, the generative model is used to denoise the Photo and obtain the GenPic image. Then, both Photo and GenPic are input into the breast ultrasound report classification model. To validate the performance improvement of the classification model with GenPic images, this section tested different input images, and the experimental results are shown in Table 8.
By comparing with Table 6, it can be seen that the dual-input Photo and GenPic present a slight improvement in most indicators compared with single-input. The dual-input uses two feature extraction modules to extract image features, so when classifying, the model has relatively more feature information available for use. When the input images are changed to Photo + GenPic, there is a significant improvement compared with only inputting Photo or GenPic. This is because the channel attention module can filter the original features contained in Photo and the image features restored by GenPic, selecting significant feature channels to enhance the classification model’s performance.

4.4. Channel Attention Visualization Analysis

In order to better demonstrate the role of the channel attention module in the breast ultrasound report classification model, this section uses Grad-CAM [30] to display the regions of interest that the model focuses on in the form of heat maps.
From Figure 5b, it can be seen that when there is a malignant tumor in the image, the model’s attention is mostly focused on the area where the tumor is located. From Figure 5d, it can be seen that when the tumor in the image is benign or absent, there is no specific area on which the model’s attention is concentrated. Analyzing the visualized attention maps reflects the actual role of the channel attention module in the classification model.

5. Discussion

At present, breast cancer is the most prevalent form of malignant tumor posing a threat to women’s health. Timely and accurate screening for breast tumors is particularly important in reducing the mortality rate of breast cancer. However, with the increasing demand for breast ultrasound screening, doctors’ workload and working hours have also increased sharply. In addition, due to differences in medical resources across regions, inexperienced doctors may make misdiagnoses of ultrasound images. Therefore, this article proposes a computer-aided diagnosis system for breast ultrasound reports based on deep learning. Patients can use our mobile application to upload photos of paper-based breast ultrasound reports. After processing and analysis by the breast ultrasound report generation method and classification model, they can obtain benign or malignant classification results of the breast ultrasound images in the report. Our experiments show that using deep learning methods to classify benign or malignant breast ultrasound images has good performance. When there is more noise in the image of a breast ultrasound report, using a generative model can effectively remove noises from it. In addition, using high-quality generated images as one input to the classification model can also improve its performance in classifying benign or malignant cases.
To solve the segmentation and noise removal problems in breast ultrasound reports, this paper proposes a breast ultrasound report generation method based on deep learning. Firstly, using existing electronic breast ultrasound images, thousands of breast ultrasound reports were constructed and used as datasets for the subsequent image generation and classification methods. Then, the YOLOv5 object detection model was used as a segmentation model to segment the breast ultrasound reports in order to improve the performance of the subsequent classification model. Since patients may introduce noise such as rotation, deformation and light spots due to different shooting methods and environments when photographing a breast ultrasound report, this paper proposes using a generation model to denoise the images. Considering that high-quality electronic breast ultrasound images were already available, supervised training is performed with the Pix2Pix model to generate high-quality breast ultrasound images from the segmented low-quality ones and remove their noise. Finally, during the training of the generative model, it was found that when there is a large difference between the rotation angle of the input image and that of the electronic image, the quality of the generated images becomes very poor. Therefore, a ResNet-based rotation classification model was introduced, which achieved good classification performance.
In order to classify ultrasound reports based on high-quality breast ultrasound-generated images, this paper proposes a breast ultrasound report classification model. The feature extraction module extracts raw information from the original image as well as tumor edge information recovered from the generated image to enable the network to obtain more feature information. After completing feature extraction, it is necessary to filter out channels. This paper uses a channel attention module to select important channels from concatenated channels and assign them larger weights in order to improve the performance of the classification model. Finally, the classification module is used for benign–malignant classification. Through our testing, when noisy breast ultrasound images are input into this classification model, its accuracy can reach 89.31% and recall can reach 88.65%, indicating good performance in classifying such images.

6. Conclusions

This article proposes a deep learning-based computer-aided diagnosis system for breast ultrasound reports, which can assist patients in quickly and conveniently obtaining diagnostic results through mobile terminals. The following conclusions can be drawn. Firstly, the breast ultrasound image generation method proposed in this article can effectively segment the ultrasound report and remove noise from the breast ultrasound images. In our tests, using denoised breast ultrasound images as input for benign and malignant classification improved accuracy by 0.57%, recall by 1.18%, specificity by 3.43% and the F1 score by 3%. This indicates that adding the recovered information can improve the recall of the classification model, thereby reducing missed diagnoses and saving more patients’ lives. Secondly, the breast ultrasound report classification model proposed in this article still showed good classification performance when the input breast ultrasound images contained noise, with 89.31% accuracy, 88.65% recall, 89.57% specificity, 89.42% F1 score and 94.53% AUC. Future work will proceed in two directions. In terms of breast ultrasound image generation, the image generation method used in this paper may produce lower-quality images when encountering image deformation; in future work, we plan to continue optimizing the generation model or adopt better generation models for this part of the work. Regarding breast ultrasound image classification, the task addressed in this article is the classification of benign and malignant breast ultrasound images. In actual medical scenarios, BI-RADS grading is also performed on breast tumors: from grade 0 to grade 6, the possibility that the tumor is malignant gradually increases. Therefore, a multi-class classification task for breast tumors is planned in subsequent work. In addition, we also plan to improve the existing model structure to further enhance the classification performance.

Author Contributions

H.Q. designed and performed the research and wrote the paper; L.Z. proposed revisions to the paper and ensured the accuracy of the experiments; Q.G. made critical revisions to the paper, proposed improvements to some of the methods and was responsible for data collection. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data available upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
  2. Oeffinger, K.C.; Fontham, E.T.; Etzioni, R.; Herzig, A.; Michaelson, J.S.; Shih, Y.C.T.; Walter, L.C.; Church, T.R.; Flowers, C.R.; LaMonte, S.J.; et al. Breast cancer screening for women at average risk: 2015 guideline update from the American Cancer Society. JAMA 2015, 314, 1599–1614.
  3. Brem, R.F.; Lenihan, M.J.; Lieberman, J.; Torrente, J. Screening breast ultrasound: Past, present, and future. Am. J. Roentgenol. 2015, 204, 234–240.
  4. Huang, Q.; Huang, Y.; Luo, Y.; Yuan, F.; Li, X. Segmentation of breast ultrasound image with semantic classification of superpixels. Med. Image Anal. 2020, 61, 101657.
  5. Huang, Q.; Yang, F.; Liu, L.; Li, X. Automatic segmentation of breast lesions for interaction in ultrasonic computer-aided diagnosis. Inf. Sci. 2015, 314, 293–310.
  6. Ding, J.; Cheng, H.; Xian, M.; Zhang, Y.; Xu, F. Local-weighted Citation-kNN algorithm for breast ultrasound image classification. Optik 2015, 126, 5188–5193.
  7. Shi, X.; Cheng, H.D.; Hu, L.; Ju, W.; Tian, J. Detection and classification of masses in breast ultrasound images. Digit. Signal Process. 2010, 20, 824–836.
  8. Byra, M.; Galperin, M.; Ojeda-Fournier, H.; Olson, L.; O’Boyle, M.; Comstock, C.; Andre, M. Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med. Phys. 2019, 46, 746–755.
  9. Zeimarani, B.; Costa, M.G.F.; Nurani, N.Z.; Bianco, S.R.; Pereira, W.C.D.A.; Costa Filho, C.F.F. Breast lesion classification in ultrasound images using deep convolutional neural network. IEEE Access 2020, 8, 133349–133359.
  10. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  11. Jalalian, A.; Mashohor, S.B.; Mahmud, H.R.; Saripan, M.I.B.; Ramli, A.R.B.; Karasfi, B. Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: A review. Clin. Imaging 2013, 37, 420–426.
  12. Xuan, J.; Adali, T.; Wang, Y. Segmentation of magnetic resonance brain image: Integrating region growing and edge detection. In Proceedings of the International Conference on Image Processing, Washington, DC, USA, 23–26 October 1995; Volume 3, pp. 544–547.
  13. Tang, J.; Alelyani, S.; Liu, H. Feature selection for classification: A review. In Data Classification: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2014; pp. 37–64.
  14. Akay, M.F. Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst. Appl. 2009, 36, 3240–3247.
  15. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
  16. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
  17. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  18. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
  19. Daoud, M.I.; Abdel-Rahman, S.; Alazrai, R. Breast ultrasound image classification using a pre-trained convolutional neural network. In Proceedings of the 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Sorrento, Italy, 26–29 November 2019; pp. 167–171.
  20. Hijab, A.; Rushdi, M.A.; Gomaa, M.M.; Eldeib, A. Breast cancer classification in ultrasound images using transfer learning. In Proceedings of the 2019 Fifth International Conference on Advances in Biomedical Engineering (ICABME), Tripoli, Lebanon, 17–19 October 2019; pp. 1–4.
  21. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  22. Masud, M.; Eldin Rashed, A.E.; Hossain, M.S. Convolutional neural network-based models for diagnosis of breast cancer. Neural Comput. Appl. 2022, 34, 11383–11394.
  23. Gheflati, B.; Rivaz, H. Vision transformers for classification of breast ultrasound images. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, Scotland, 11–15 July 2022; pp. 480–483.
  24. Sahu, A.; Das, P.K.; Meher, S. High accuracy hybrid CNN classifiers for breast cancer detection using mammogram and ultrasound datasets. Biomed. Signal Process. Control 2023, 80, 104292.
  25. Wang, F.; Liu, X.; Yuan, N.; Qian, B.; Ruan, L.; Yin, C.; Jin, C. Study on automatic detection and classification of breast nodule using deep convolutional neural network system. J. Thorac. Dis. 2020, 12, 4690.
  26. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  27. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976.
  28. Wang, J.; Zhao, Y.; Noble, J.H.; Dawant, B.M. Conditional generative adversarial networks for metal artifact reduction in CT images of the ear. In Proceedings of Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018; Part I.
  29. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
  30. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
Figure 1. The pipeline of the proposed method.
Figure 2. Breast ultrasound report classification model.
Figure 5. Visualization of channel attention module. (a) Malignant tumor. (b) Malignant CAM. (c) Benign or no tumor. (d) Benign or no CAM.
Table 1. Comparison of existing breast ultrasonic image classification methods.
Reference | Year | Category | Method | Datasets Number | Additional Comments
Hijab [20] | 2018 | Classification | VGG19 | 882 | Transfer learning
Daoud [19] | 2019 | Classification | AlexNet | 210 | Transfer learning
Wang [25] | 2020 | Segmentation and Classification | Seg3D U-Net & CNN | 293 | Spatial attention
Masud [22] | 2022 | Classification | Custom CNN Model | 1030 | -
Zeimarani [9] | 2020 | Classification | Custom CNN Model | 641 | -
Gheflati [23] | 2022 | Classification | ViT | 780 | -
Sahu [24] | 2023 | Classification | Custom CNN Model | 9684 | -
Table 2. The dataset used in this article.
Dataset | Number | Size
ElecPic | 16,091 | 224 × 224
Photo | 11,021 | 224 × 224
Defor-Photo | 11,021 | 224 × 224
Spot-Photo | 11,021 | 224 × 224
Rot-Photo | 11,021 | 224 × 224
Mix-Photo | 11,021 | 224 × 224
GenPic | 11,021 | 224 × 224
ElecPic—Original ultrasound images exported from the hospital; Photo—A single ultrasound image taken with a mobile phone and cropped; Defor-Photo—Ultrasound images containing deformation noise; Spot-Photo—Ultrasound images containing a spot of light; Rot-Photo—Ultrasound images containing rotational noise; Mix-Photo—Ultrasound image mixed with three types of noise.
Table 3. Experimental results and comparisons of classification models for breast ultrasound reports.
Classification Methods | Accuracy | Recall | Specificity | F1 | AUC
Wang [25] | 0.7894 | 0.7546 | 0.8234 | 0.8601 | 0.8596
Masud [22] | 0.7920 | 0.7581 | 0.8380 | 0.8796 | 0.8576
Zeimarani [9] | 0.8544 | 0.8345 | 0.8581 | 0.8601 | 0.9101
Hijab [20] | 0.8544 | 0.8545 | 0.8581 | 0.8701 | 0.9032
Daoud [19] | 0.8695 | 0.8645 | 0.8481 | 0.8801 | 0.9232
ResNet50 | 0.8801 | 0.8753 | 0.8534 | 0.8706 | 0.9302
Ours | 0.8931 | 0.8865 | 0.8957 | 0.8942 | 0.9453
Table 4. Comparison of experimental results and performance of rotation classification model.
Classification Methods | Accuracy | Recall | AUC | F1 | Parameters
VGG | 0.8452 | 0.8362 | 0.8223 | 0.8316 | -
InceptionV3 | 0.8554 | 0.8454 | 0.8354 | 0.8478 | -
DenseNet121 | 0.8868 | 0.8998 | 0.8731 | 0.8962 | -
ResNet18 | 0.8636 | 0.8734 | 0.8534 | 0.8717 | 1.35 M
ResNet34 | 0.8972 | 0.8973 | 0.8734 | 0.8971 | 2.93 M
ResNet50 | 0.8896 | 0.8963 | 0.8862 | 0.8931 | 3.34 M
Table 5. Experiment of Network structure Ablation.
Network Composition | Accuracy | Recall | Specificity | F1 | AUC
Baseline | 0.8801 | 0.8753 | 0.8534 | 0.8706 | 0.9302
Baseline + Feature extraction module | 0.8544 | 0.8345 | 0.8481 | 0.8501 | 0.8703
Baseline + Channel attention module | 0.8893 | 0.8852 | 0.8932 | 0.8901 | 0.9378
Ours | 0.8903 | 0.8867 | 0.8903 | 0.8934 | 0.9374
Table 6. Comparison of experimental results between single input and dual input.
Input Type | Dataset | Accuracy | Recall | Specificity | F1 | AUC
single input | Photo | 0.8874 | 0.8747 | 0.8614 | 0.8642 | 0.9355
single input | GenPic | 0.8809 | 0.8872 | 0.8729 | 0.8801 | 0.8851
dual input | Photo + GenPic | 0.8931 | 0.8865 | 0.8957 | 0.8942 | 0.9453
Table 7. Experimental results of different attention modules.
Model | Accuracy | Recall | Specificity | F1 | AUC
Baseline | 0.8801 | 0.8753 | 0.8534 | 0.8706 | 0.9302
Baseline + CBAM | 0.8832 | 0.8763 | 0.8723 | 0.8776 | 0.9365
Baseline + CA | 0.8931 | 0.8865 | 0.8957 | 0.8942 | 0.9453
Table 8. The impact of different inputs on the classification model.
Input Image Category | Accuracy | Recall | Specificity | F1 | AUC
Photo + Photo | 0.8874 | 0.8748 | 0.8641 | 0.8658 | 0.9375
GenPic + GenPic | 0.8824 | 0.8845 | 0.8751 | 0.8861 | 0.8973
Photo + GenPic | 0.8931 | 0.8865 | 0.8957 | 0.8942 | 0.9453