Analysis of identifying COVID-19 with deep learning model

Coronavirus Disease 2019 (COVID-19) swept the world and has had a serious impact on human society. Doctors typically use CT scans and chest X-ray images to determine whether a patient is infected. Many researchers have tried to apply deep learning methods to COVID-19 detection. However, feature extraction with deep learning faces several problems, such as few data samples, unclear images, and images containing special marks. This article applies deep learning methods to COVID-19 detection and visually analyzes popular deep learning methods. Experiments verify that when deep learning is applied to public small-sample COVID-19 datasets, some of the test results are unreliable. We propose solutions to these problems of deep learning in COVID-19 detection.

ICEMCE 2020 IOP Publishing
Journal of Physics: Conference Series 1601 (2020) 052021 doi:10.1088/1742-6596/1601/5/052021

... of public papers [9]. He et al. proposed the MoCo [10] transfer learning scheme for this small dataset. Tartaglione et al. argue that training on small datasets raises several problems: 1. transfer learning on feature tasks requires caution; 2. hidden biases in the dataset become the specific identifiers the model learns; 3. the small data size cannot provide statistical certainty for learning.
Given these problems, we first apply a deep learning model to the COVID-19 dataset for detection. We visually analyze the detection results and conclude that some of the deep learning results are unreliable and have nothing to do with the patient's health.

COVID-19 recognition based on deep learning methods
Using convolutional networks to extract image features has become a common approach in computer vision. Convolutional neural networks make it possible to quickly build a feature extraction model, without much concern for the effectiveness of individual features. Apostolopoulos et al. used VGG19 and MobileNet v2 for feature extraction on X-ray images and found that MobileNet v2 outperforms VGG19 in terms of specificity [11]. They consider MobileNet v2 the most effective model for their specific classification task and data samples. Wang et al. designed COVID-Net specifically for detecting COVID-19 cases from CXR images; to make the COVID-Net architecture observable and to visualize the internal convolution maps that explain its decisions, the architecture makes heavy use of projection-expansion-projection design patterns [12]. Hemdan et al. verified the feasibility of COVIDX-Net on their own dataset, using VGG19 and DenseNet as feature extractors to obtain considerable accuracy [13].
Tartaglione et al. argue that if different small datasets are used and contain any biases (corners, medical equipment, or other incidental factors such as age and gender tags), a deep learning model can learn to recognize these dataset biases rather than focusing on features related to COVID-19 [14]. Maguolo et al. blacked out the center of the X-ray scans and trained a classifier only on the outer part of the image. Using X-ray datasets that exclude most of the lungs, they obtained results similar to other classifiers. They infer that several recognition methods are unrelated to the presence of COVID-19, and that the features the convolutional network learns may have nothing to do with the patient's health status [15].
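The ablation just described can be sketched in a few lines. This is a minimal NumPy stand-in, not Maguolo et al.'s actual code, and the `frac` parameter (fraction of each side to black out) is an assumed knob:

```python
import numpy as np

def black_out_center(img, frac=0.5):
    """Zero out the central region of an image so a classifier can only
    see the border. `frac` (assumed parameter) is the fraction of each
    side covered by the blacked-out square."""
    h, w = img.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    out = img.copy()                      # leave the input untouched
    out[top:top + ch, left:left + cw] = 0
    return out
```

If a model trained on such images still separates the classes, the discriminative signal must live outside the lungs, which is exactly the concern raised above.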

Deep learning model analysis
Feature extraction with convolutional neural networks naturally brings a black-box effect. Especially on small datasets, the focus of a convolutional network may differ from our original intention, judging samples in simplistic ways instead. Many researchers are exploring interpretability for convolutional networks. CAM visualizes based on the last-layer feature maps of the convolutional network, retraining with global average pooling [16]. Grad-CAM back-propagates from the node with the largest softmax value (corresponding to the most confident class), computes the gradients at the last convolutional layer, and uses the average gradient of each feature map as that map's weight [17]. Grad-CAM++ optimizes Grad-CAM by introducing pixel-level weighting of the output gradients at specific locations [18], focusing more on regions that contribute significantly to recognizing objects. Wang et al. proposed a novel post-hoc visual interpretation method called Score-CAM, based on class activation mapping. Unlike previous activation-mapping methods, Score-CAM removes the dependence on gradients by obtaining the weight of each activation map from its forward score [19]. The ProtoPNet network dissects the image by looking for prototypical parts and combines the evidence from the prototypes for the final classification [20].
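The Grad-CAM weighting described above can be sketched directly, assuming the feature maps and their gradients (shape K x H x W) have already been extracted from the network; here they are plain NumPy arrays rather than framework tensors:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM sketch. `feature_maps` and `gradients` have shape (K, H, W).

    The weight of each feature map is the spatial mean of its gradient;
    the heatmap is the ReLU of the weighted sum of the feature maps.
    """
    weights = gradients.mean(axis=(1, 2))                  # one weight per map
    heatmap = np.tensordot(weights, feature_maps, axes=1)  # (H, W)
    return np.maximum(heatmap, 0.0)                        # ReLU

# Toy example: two 2x2 maps; the second map gets a negative weight
maps = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[0.0, 1.0], [1.0, 0.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[-1.0, -1.0], [-1.0, -1.0]]])
print(grad_cam(maps, grads))  # negative contributions are clipped by the ReLU
```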

Deep learning recognition model
Research data on COVID-19 has so far been published only in small quantities. Our deep learning analysis is mainly based on the COVID-CT-Dataset [9] and COVID-ChestXRay [8] datasets, both of which have been widely used in COVID-19 detection. To improve the performance of the detection model, we use DenseNet as its feature extractor [21]. Because DenseNet reuses features extensively, its many skip connections keep gradients from saturating and fully support backward optimization of the front-end layers. The COVID-19 recognition model with DenseNet as backbone is shown in Figure 1. Since the dataset is small, we follow the self-supervised learning method proposed by He et al. for training [22]. It is designed to capture the inherent patterns and attributes of the input data (such as CT images) without manually provided labels. In practice, COVID-19 positive data are very scarce while non-COVID-19 data are plentiful. The basic idea of self-supervised learning is to construct auxiliary tasks from the data itself, without manually annotated labels, forcing the network to learn meaningful representations by performing the auxiliary tasks well. The recognition model in this paper is first trained on the unlabeled dataset and then fine-tuned on the labeled dataset.
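To illustrate how an auxiliary task manufactures labels from unlabeled scans, here is a hedged sketch of rotation prediction, one of the simplest pretext tasks. It is a stand-in for illustration only; the paper itself follows He et al.'s MoCo-style contrastive scheme, which is more involved:

```python
import numpy as np

def rotation_pretext(images):
    """Rotation-prediction pretext task (illustrative stand-in, not the
    MoCo scheme used in the paper): each unlabeled image is rotated by
    0/90/180/270 degrees, and the rotation index serves as a free label
    that the network must predict."""
    samples, labels = [], []
    for img in images:
        for k in range(4):
            samples.append(np.rot90(img, k))  # rotate k * 90 degrees
            labels.append(k)                  # the rotation is the label
    return samples, labels
```

A network that can tell how a chest image was rotated must have learned something about its anatomical layout, and those representations can then be fine-tuned on the small labeled set.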

3.2. Analysis methods and strategies
Our analysis of the model uses Score-CAM [19], the latest classification interpretation method. Score-CAM removes the dependence on gradients by obtaining the weight of each activation map on the target class from its forward score; the final result is a linear combination of the weights and the activation maps. Score-CAM has two important stages, as shown in Figure 2. In stage 1, the activation maps of the specified layer are extracted; each activation map acts as a mask for the original image. In stage 2, each mask is up-sampled and normalized, and multiplied with the original image. The results of stage 2 are fed into the fully connected network to obtain the classification weights. Finally, the result is generated by a linear combination of the score-based weights and the activation maps. Stages 1 and 2 share the same CNN module with the feature extractor.
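The two stages can be sketched as follows. This is a minimal NumPy sketch, assuming the activation maps are already up-sampled to the image size, and `score_fn` stands in for the classifier's forward pass returning the target-class score:

```python
import numpy as np

def score_cam(image, activations, score_fn):
    """Score-CAM sketch (gradient-free). `activations` has shape (K, H, W),
    already up-sampled to the image size; `score_fn(masked_image)` is an
    assumed callable returning the classifier's target-class score."""
    scores = []
    for act in activations:                       # stage 1: one mask per map
        rng = act.max() - act.min()
        mask = (act - act.min()) / rng if rng > 0 else np.zeros_like(act)
        scores.append(score_fn(image * mask))     # stage 2: forward score
    weights = np.exp(scores) / np.sum(np.exp(scores))  # softmax over maps
    heatmap = np.tensordot(weights, activations, axes=1)
    return np.maximum(heatmap, 0.0)               # ReLU on the combination
```

In the real pipeline, `score_fn` is the shared CNN plus the fully connected head, so both stages reuse the same feature extractor as described above.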
To explore which parts of the X-ray image the deep learning model finds most recognizable, we use a second method, described in Figure 3. With DenseNet as backbone, we trained a network that dissects the image into prototypical parts and, following the ProtoPNet recognition method, combines the evidence from the prototypes for the final classification [20].

Figure 2. The Score-CAM inspection process has two stages. In stage 1, the activation maps are extracted. In stage 2, the activation maps are normalized and multiplied element-wise with the original image; the resulting scores pass through the fully connected layer, and the weights are finally fused linearly with the activation maps.

Implementation Details
All our experiments are performed on an i7-8700K server with 32 GB of memory, using an NVIDIA GTX 1080 Ti GPU. The training data come from the COVID-CT-Dataset and COVID-ChestXRay datasets. Both models use DenseNet as the backbone with pre-trained parameters.

Experimental results
We use ProtoPNet to detect COVID-19 in the images. Table 1 shows our experimental results. With randomly cropped images, DenseNet achieves an accuracy of 0.85. When we instead crop the images to remove interfering information, DenseNet achieves an accuracy of 0.82. VGG19 as feature extractor has limited performance, with an accuracy of 0.77. The experimental results show that random cropping improves classification performance. To investigate whether self-supervised learning really helps transfer learning, we also run classification experiments with DenseNet directly. In Table 2, self-supervised learning + random crop reaches an accuracy of 0.83 with an area under the curve (AUC) of 0.89, whereas random crop without self-supervised pretraining reaches an accuracy of 0.91 with an AUC of 0.90.
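For reference, the random-crop augmentation used above can be sketched as follows. This is a NumPy stand-in; an actual training pipeline would presumably use a framework transform such as torchvision's `RandomCrop`:

```python
import numpy as np

def random_crop(img, size, rng=None):
    """Random square crop. Crop coordinates are drawn uniformly, so fixed
    border artifacts (tags, device markers) do not sit at a constant
    position across the training set."""
    if rng is None:
        rng = np.random.default_rng(0)  # seeded default for reproducibility
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]
```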

4.3. Internal analysis and interpretation of the model
We use Score-CAM to analyze the deep learning models; the Score-CAM visualization results are shown in Figure 4. Based on the above analysis, simply applying deep learning methods to the small COVID-19-related datasets yields results that have nothing to do with the patient: the deep learning method merely classifies by texture, edges, and the characters embedded in the pictures. The experimental and visualization results show that self-supervised learning and center cropping effectively push the model to focus on the central area of the image, which genuinely reflects the patient's health. What these methods have in common is that they improve the authenticity of the dataset and eliminate background interference.
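The center-cropping step mentioned above can be sketched as follows (a minimal NumPy stand-in for a framework transform such as torchvision's `CenterCrop`):

```python
import numpy as np

def center_crop(img, size):
    """Keep only the central `size` x `size` region of the radiograph,
    discarding the border areas where annotations, tags, and device
    markers typically sit."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```

Because the lungs occupy the center of a chest image while most spurious marks sit near the edges, this crop removes much of the shortcut signal the model would otherwise latch onto.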

5. Conclusion
This article explores the problems deep learning methods face in COVID-19 recognition. The existing COVID-19 image datasets are relatively small, so deep learning recognition can deviate from our original intention. The experimental results show that deep learning tends to classify pictures using simplistic cues. When the dataset is too small, characters embedded in the images interfere with the model, and the final recognition result has nothing to do with the patient's lung health. When using a deep learning model to identify COVID-19, we recommend: 1. constructing enough data before applying deep learning; 2. cleaning the data to remove special marks; 3. increasing the diversity of the dataset.