Automated image classification of chest X-rays of COVID-19 using deep transfer learning

Introduction: In December 2019, the city of Wuhan, in the Hubei province of China, became the epicentre of an outbreak of a disease that the World Health Organisation named COVID-19. Detection of the virus by rRT-PCR (Real-Time Reverse Transcription-Polymerase Chain Reaction) tests has a reported high false negative rate, while CXR (Chest X-Ray) images show salient features of the disease. The objective of this paper is to establish an early automated screening model that uses low computational power and raw radiology images to assist physicians and radiologists in the early detection and isolation of potentially positive COVID-19 patients, and so help stop the rapid spread of the virus and the escalating death rates in vulnerable countries with limited hospital capacity and a low doctor-to-patient ratio.
Materials and methods: Our database consists of 447 COVID-19 and 447 No-finding CXR images, a total of 894 CXR images. It was divided into 4 parts: training, validation, testing and a local/Aligarh dataset. The 4th (local/Aligarh) folder of the dataset was created to retest the diagnostic efficacy of our model on images from a developing nation such as India (images from J.N.M.C., Aligarh, Uttar Pradesh, India). We used an Artificial Intelligence technique called CNN (Convolutional Neural Network). The CNN-based architecture used was MobileNet, which is faster than an ordinary convolutional model and substantially decreases the computational cost.
Results: The experimental results of our model show an accuracy of 96.33%. The F1-score is 93% and 96% for the 1st testing and 2nd testing (local/Aligarh) datasets, respectively (Tables 3.3 and 3.4). The false negative (FN) value is 6 for the validation dataset (Fig. 3.6), 0 for the testing dataset (Fig. 3.7) and 2 for the local/Aligarh dataset. The recall/sensitivity of the classifier is 93% and 96% for the 1st testing and 2nd testing (local/Aligarh) datasets, respectively (Tables 3.3 and 3.4). The recall/sensitivity for the detection of COVID-19 (+) specifically is 88% for the testing dataset and 100% for the locally acquired dataset from India. The False Negative Rate (FNR) is 12% for the testing dataset and 0% for the locally acquired dataset (local/Aligarh). The execution time for the model to predict the input images and classify them is less than 0.1 s.
Discussion and conclusion: The false negative rate is much lower than that of the standard rRT-PCR tests, and is 0% on the locally acquired dataset. This suggests that the established model, with its end-to-end structure and deep learning technique, can be employed to assist radiologists in validating their initial screenings of Chest X-Ray images of COVID-19 in developed and developing nations. Further research is needed to test the model and make it more robust, to employ it in multiclass classification and to try to sensitise it to new strains of COVID-19. This model might help cultivate tele-radiology.


Introduction
COVID-19 has caused global alarm, broken families, crushed economies, reduced incomes and burdened healthcare systems. Coronaviruses are a large family of viruses, some causing less severe diseases such as the common cold and others causing more severe diseases such as SARS-CoV (Severe Acute Respiratory Syndrome Coronavirus) in 2002 and MERS-CoV (Middle East Respiratory Syndrome Coronavirus) in 2012. In December 2019, the city of Wuhan, in the Hubei province of China, became the epicentre of an outbreak of pneumonia of unknown cause [1]. The pneumonia spread quickly, and it was reported at an early stage that patients had a history of contact with the Huanan seafood market. On 7 January 2020, the Chinese Centre for Disease Control and Prevention confirmed by molecular methods, in a throat swab sample from a patient, that the pathogen of this disease was a novel coronavirus [2], and the WHO temporarily called it "2019-nCoV acute respiratory disease" [3,4]. On 11 February 2020, the International Committee on Taxonomy of Viruses (ICTV) announced the official name of the virus as "SARS-CoV-2" (Severe Acute Respiratory Syndrome Coronavirus-2) owing to its genetic similarity to the virus behind the SARS-CoV outbreak of 2002. From a risk communication perspective and to avoid stigmatisation of regions and ethnic groups, the WHO announced on the same date that the new disease would be named Coronavirus Disease 2019, "COVID-19" [4,5]. This is the 3rd coronavirus outbreak in the past 20 years and the 6th Public Health Emergency of International Concern (PHEIC) declared by the WHO since the International Health Regulations (IHR) came into force [3]. The main transmission routes identified for COVID-19 are respiratory droplets and direct contact with symptomatic and asymptomatic persons. The incubation period was observed to be around 3-7 days, with a maximum of 14 days [6].
With virus and clinical research moving at a breakneck pace, more and more rare and unusual symptoms that may be associated with the SARS-CoV-2 virus are coming to light in different age cohorts. The common symptoms of COVID-19 are cough, shortness of breath or difficulty breathing, fever, chills, muscle pain, sore throat and new loss of taste or smell. Other, less common symptoms have also been reported, including gastrointestinal symptoms such as nausea, vomiting or diarrhoea [2]. Apart from these, rare and unusual symptoms such as multi-system inflammatory syndrome in children, strokes and blood clots in adults, COVID toes, silent hypoxia and delirium have been reported in The Scientist [7]. This evidence calls for early diagnosis, isolation and treatment, both to facilitate research and to flatten the curve by isolating positive patients. Real-Time Reverse Transcription-Polymerase Chain Reaction (rRT-PCR) tests were used for confirming COVID-19. However, these tests are time consuming and have high false negative rates of between 2% and 29%, and the supply of nucleic acid detection kits is limited [8,9]. The role of imaging in COVID-19 is of paramount importance, as the characteristic manifestations of the disease in the lungs appear before the symptoms [10]. Both CT scans and X-Rays can help in early diagnosis and in monitoring the clinical course of the disease. CT scans are an advanced form of X-Ray imaging and hence expose the patient to the radiation of nearly hundreds of X-Rays. CT scanners need thorough sanitisation after every patient, otherwise the risk of catching the disease through contamination may increase. CT scanners may also not be readily available to screen the large number of potential COVID-19 patients, especially in developing countries [11]. Furthermore, it is challenging for radiologists to gain expertise in the diagnosis of this disease, especially in places with a limited number of radiologists.
It is the need of the hour that healthcare and artificial intelligence merge to prevent unnecessary deaths and to promote tele-radiology, using the internet as the primary channel for sending clinical data and digital images while maintaining social distancing. This study is thus an attempt to contribute to the early diagnosis of COVID-19 patients amidst the pandemic [12][13][14]. It aims to establish an early automated screening model based on a low-computation deep transfer learning technique that achieves high accuracy and few false negatives, as assessed with standard performance metrics. The model uses raw Chest X-Ray (CXR) images, which are easy to procure and more convenient, to distinguish COVID-19 cases, in order to assist radiologists and help flatten the curve. Its performance is further tested on images from India, a vulnerable country with a low doctor-to-patient ratio.

X-Ray image dataset
The Chest X-Ray (CXR) images in our dataset for predicting COVID-19 are combined from 2 different sources (Fig. 2.1). The first source is the "covid-chestX-Ray-dataset", a public open dataset of CXR and CT images of patients who are positive for or suspected of COVID-19 or other viral and bacterial pneumonias (MERS, SARS, and ARDS), developed by Joseph Paul Cohen and available in a GitHub repository. This dataset is compiled from public sources as well as through indirect collection from hospitals and physicians. It is constantly updated, and the chest X-Ray images are in dcm, jpg, or png format [15]. The second source is the "CXR8" database developed by Ronald M. Summers of the National Institutes of Health-Clinical Centre, from which we gathered the images of normal patients (No-finding). These images are available in png format [16]. We have an additional dataset obtained locally and directly from Jawaharlal Nehru Medical College, Aligarh Muslim University (JNMC, AMU), Aligarh, Uttar Pradesh, India, on which the model is tested and the performance metrics analysed. We received these images via e-mail, and the details of the patients were cropped out [17]. The first source contained 758 CXR and CT images of various lung diseases such as COVID-19, SARS and Legionella. Of these, 521 were COVID-19 images, from which we selected the 417 CXR images with PA, AP or AP supine views only. From the second source we took 447 normal (No-finding) CXR images. These 417 and 447 images were combined to form our database of 864 CXR images. The additional dataset acquired locally from J.N.M.C., A.M.U. had 30 COVID-19 images, extending our database to 447 COVID-19 and 447 No-finding CXR images, a total of 894 images.

Data processing
The database was divided into 4 parts: the training, validation and testing datasets, plus a local/Aligarh dataset. The 1st (training) folder had 309 COVID-19 images and 320 No-finding images. The 2nd (validation) folder had 58 COVID-19 images and 51 No-finding images, and the 3rd (testing) folder had 50 COVID-19 images and 49 No-finding images. The 4th (local/Aligarh) folder was created to re-test the diagnostic efficacy of our model and had 30 COVID-19 images and 27 No-finding images.
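The folder counts above can be cross-checked against the database totals with a few lines of arithmetic. The following is only a bookkeeping sketch: the dictionary keys and function names are illustrative choices, while the counts are the ones stated in the text.

```python
# Image counts per folder as stated in the text: (COVID-19, No-finding).
# The folder names used as keys are illustrative labels only.
SPLITS = {
    "training": (309, 320),
    "validation": (58, 51),
    "testing": (50, 49),
    "local_aligarh": (30, 27),
}


def class_totals(splits):
    """Sum the per-class image counts over all folders."""
    covid = sum(c for c, _ in splits.values())
    no_finding = sum(n for _, n in splits.values())
    return covid, no_finding


def split_fractions(splits):
    """Fraction of the whole database held by each folder."""
    total = sum(c + n for c, n in splits.values())
    return {name: (c + n) / total for name, (c, n) in splits.items()}


covid_total, no_finding_total = class_totals(SPLITS)
fractions = split_fractions(SPLITS)

print(covid_total, no_finding_total, covid_total + no_finding_total)  # 447 447 894
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The per-class sums reproduce the 447 + 447 = 894 images reported for the full database, and the printed fractions show how the images are distributed across the four folders.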

The CNN Architecture-MobileNet
Artificial Intelligence has bridged the gap between the capabilities of humans and machines. Computer vision is a domain of AI that enables machines to perceive the world as humans do. The advancements in this field have been built on one particular algorithm, the Convolutional Neural Network (CNN). A CNN consists of an input layer, hidden layer(s) and an output layer. The hidden layers consist of convolutional layers that carry out feature extraction, together with additional layers such as activation functions, pooling layers and batch normalization, and the fully connected layer completes the classification process [18].
To develop a diagnostic model for classifying COVID-19 and non-COVID-19 patients, we chose a deep learning model based on the MobileNet Convolutional Neural Network. MobileNet is considered much faster than a standard convolutional network because it applies a distinct filter to each input channel. Our model is built on the depthwise separable convolution, which consists of two successive operations: a depthwise convolution at the filtering stage, which applies a convolution to a single input channel at a time, and a pointwise convolution at the combining stage, which performs a linear combination of the outputs of the depthwise convolution (Fig. 2.2).
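The two stages can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes stride 1 and no padding, omits batch normalization and ReLU, and all function names and the toy input are hypothetical, chosen only to show the filtering step (one kernel per channel) followed by the combining step (a 1 x 1 convolution across channels).

```python
def depthwise_conv(x, kernels):
    """Filtering stage: convolve each input channel with its own K x K kernel.

    x is a list of M channels (each a 2-D list); kernels holds M kernels.
    Stride 1 and no padding are assumed for simplicity.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h_out = len(channel) - k + 1
        w_out = len(channel[0]) - k + 1
        out.append([
            [
                sum(channel[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
                for j in range(w_out)
            ]
            for i in range(h_out)
        ])
    return out


def pointwise_conv(x, weights):
    """Combining stage: 1 x 1 convolution; weights has N rows of M coefficients."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [sum(wm * x[m][i][j] for m, wm in enumerate(row)) for j in range(w)]
            for i in range(h)
        ]
        for row in weights
    ]


def depthwise_separable_conv(x, kernels, weights):
    """Depthwise filtering followed by pointwise combination."""
    return pointwise_conv(depthwise_conv(x, kernels), weights)


# Toy input (hypothetical): M = 2 channels of size 3 x 3.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]   # one 2 x 2 kernel per channel
weights = [[1, 0], [0, 1], [1, -1]]              # N = 3 output channels

filtered = depthwise_conv(x, kernels)
result = depthwise_separable_conv(x, kernels, weights)
print(filtered)  # [[[6, 8], [12, 14]], [[4, 4], [4, 4]]]
print(result)    # [[[6, 8], [12, 14]], [[4, 4], [4, 4]], [[2, 4], [8, 10]]]
```

Note that the depthwise stage never mixes channels; all cross-channel mixing happens in the pointwise stage, which is what allows the two costs to be counted separately.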
Batch normalization and a rectified linear unit (ReLU) layer come after each convolution stage. The computational cost is reduced substantially in the depthwise separable approach because filtering and combining are performed as separate steps, which minimizes the size of the model and its complexity. As an example, consider a feature map of size $D_f \times D_f$, a kernel of size $D_k \times D_k$, $M$ input channels and $N$ output channels. The computational cost of the standard convolution is $D_k \cdot D_k \cdot M \cdot N \cdot D_f \cdot D_f$. Following [19], the depthwise separable convolution costs $D_k \cdot D_k \cdot M \cdot D_f \cdot D_f + M \cdot N \cdot D_f \cdot D_f$, which is a fraction $\frac{1}{N} + \frac{1}{D_k^2}$ of the standard cost. This is what makes MobileNet faster than the ordinary convolutional model and decreases the computational cost [19].
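The saving can be made concrete with a short calculation. The sketch below uses the cost expressions from the standard MobileNet formulation [19], with the symbols defined above: a standard convolution costs Dk·Dk·M·N·Df·Df multiply-accumulates, and a depthwise separable convolution costs Dk·Dk·M·Df·Df (depthwise) plus M·N·Df·Df (pointwise). The layer dimensions in the example are hypothetical and are not taken from the model used in this paper.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-accumulates of a standard convolution: Dk*Dk*M*N*Df*Df."""
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk, m, n, df):
    """Depthwise cost (Dk*Dk*M*Df*Df) plus pointwise cost (M*N*Df*Df)."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


def cost_ratio(dk, m, n, df):
    """Separable cost as a fraction of standard cost; equals 1/N + 1/Dk**2."""
    return depthwise_separable_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels,
# 112 x 112 feature map. These numbers are for illustration only.
dk, m, n, df = 3, 32, 64, 112
ratio = cost_ratio(dk, m, n, df)

print(standard_conv_cost(dk, m, n, df))        # 231211008
print(depthwise_separable_cost(dk, m, n, df))  # 29302784
print(round(ratio, 4))                         # 0.1267
```

For this hypothetical layer the separable form needs roughly one eighth of the operations, and with a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows.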
The version used for this model was MobileNet_V2, with 3.47 million parameters and 300 million multiply-accumulate operations (MACs).

Results
The evaluation platform used was the Kaggle cloud service platform with CUDA version 9.2, and the evaluation took 20 min using the 17.1 GB NVIDIA Tesla P100 GPU. Table 3.1 and Table 3
AUC-ROC curve: An AUC-ROC curve is a performance measurement for classification problems at various threshold settings. The AUC is also called the c-statistic. The ROC is a probability curve, and the AUC represents the degree or measure of separability: it tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with and without the disease. The ROC curve is plotted with the TPR (true positive rate) on the y-axis against the FPR (false positive rate) on the x-axis.
The classification report in Table 3.3 for the testing dataset shows an F1-score of 93%. The recall/sensitivity for the detection of COVID-19 specifically is 88%. The macro-average and weighted-average recall of the classifier are both 93%. The classification report in Table 3.4 for the local/Aligarh dataset shows an F1-score of 96%. The recall/sensitivity for the detection of COVID-19 specifically is 100%. The macro-average and weighted-average recall of the classifier are both 96%. The False Negative Rate (FNR), defined as FN / (FN + TP) and therefore equal to 1 - recall, is 12% for the testing dataset and 0% for the locally acquired testing dataset (local/Aligarh).
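The relationship between per-class recall, the false negative rate and the macro and weighted averages can be sketched as follows. The counts are HYPOTHETICAL: they are chosen only so that the COVID-19 class has 50 images and 88% recall, as reported for the testing dataset, and the No-finding counts are purely assumed. They are not read from the confusion matrices of this paper, and the function names are illustrative.

```python
def recall(tp, fn):
    """Fraction of the actual members of a class that the model finds."""
    return tp / (tp + fn)


def false_negative_rate(tp, fn):
    """Fraction of the actual members of a class that the model misses."""
    return fn / (tp + fn)


def macro_average(values):
    """Unweighted mean of the per-class scores."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean of the per-class scores weighted by class size (support)."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# HYPOTHETICAL counts, for illustration only (not from the paper's figures):
# 50 COVID-19 images with 44 detected, 49 No-finding images with 48 detected.
covid_tp, covid_fn = 44, 6
nofinding_tp, nofinding_fn = 48, 1

covid_recall = recall(covid_tp, covid_fn)
covid_fnr = false_negative_rate(covid_tp, covid_fn)
nofinding_recall = recall(nofinding_tp, nofinding_fn)

recalls = [covid_recall, nofinding_recall]
supports = [covid_tp + covid_fn, nofinding_tp + nofinding_fn]
macro_recall = macro_average(recalls)
weighted_recall = weighted_average(recalls, supports)

print(f"COVID-19 recall {covid_recall:.2f}, FNR {covid_fnr:.2f}")
print(f"macro recall {macro_recall:.2f}, weighted recall {weighted_recall:.2f}")
```

Because the two classes are almost the same size, the macro and weighted averages nearly coincide, which is why a classification report on a balanced dataset shows the same rounded value for both.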
Accuracy is considered important when true positives and true negatives are key. It is a useful measurement of performance when the classes are balanced. In the case where false positives and false negatives are of immense significance, F1-score is a better metric. This is also useful when the classes are imbalanced. We will tend to focus slightly more on the F1-score than accuracy, although in our case both are equally important.
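The difference between the two metrics is easiest to see on a deliberately imbalanced example. The sketch below is illustrative only: both sets of counts are hypothetical and the function names are not taken from any particular library.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical imbalanced case: 10 positives among 1000 images, and a
# classifier that labels every image as negative.
imbalanced_accuracy = accuracy(tp=0, tn=990, fp=0, fn=10)
imbalanced_f1 = f1_score(tp=0, fp=0, fn=10)

# Hypothetical roughly balanced case with a few errors in each class.
balanced_accuracy = accuracy(tp=44, tn=48, fp=1, fn=6)
balanced_f1 = f1_score(tp=44, fp=1, fn=6)

print(f"imbalanced: accuracy {imbalanced_accuracy:.2f}, F1 {imbalanced_f1:.2f}")
print(f"balanced:   accuracy {balanced_accuracy:.2f}, F1 {balanced_f1:.2f}")
```

In the imbalanced case the accuracy is 0.99 although no positive case is found, whereas the F1-score is 0. In the balanced case the two metrics agree closely, which is consistent with treating both as informative for a balanced dataset.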
From Table 3.5 we see that the model has an error rate of 3.67% at the 20th epoch. From Table 3.6 we see that the top-1 accuracy of our model is 96.33%. This means that the first guess of our model is 96.33% accurate and that the model gives 100% accurate prediction results on the second guess. The optimal learning rate for our model lies between 1e-03 and 1e-02, as shown in Fig. 3.9, where it gives the least loss. The learning rate is a hyper-parameter, set when configuring a neural network, that controls how much to change the model in response to the estimated error each time the model weights are updated. Small values can result in a longer training process, whereas higher values could lead to an unstable training process.
Fig. 3.10 shows the loss function for the training and validation datasets. The loss function tracks the loss during training over time, as evaluated on the individual batches during the forward pass. For most of our training we obtained a good learning rate with a low value of loss. During validation we see a couple of peaks, but with a continuous decay in loss, which eventually falls to a low value. Thus, the result for the loss function was a good "flattened loss".
Figs. 3.11, 3.12, 3.13, 3.14, 3.15 and 3.16 show the predicted output class for each of the input raw CXR images. Figs. 3.14, 3.15 and 3.16 show the predicted class of the locally acquired images from Aligarh, U.P., India. The execution time for the model to predict the input images and classify them is less than 0.1 s. The time taken by the classifier to classify two random images was 0.08546829223632812 s and 0.08694672584533691 s.
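The effect of the learning rate described above can be reproduced on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an assumption made purely for illustration: it is not the network, the optimiser or the learning-rate search used in this study, and the three rates are arbitrary.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 and return the loss after each weight update."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2 with respect to w
        w -= lr * grad      # update scaled by the learning rate
        losses.append(w * w)
    return losses


small = gradient_descent(lr=0.01)     # stable but slow
moderate = gradient_descent(lr=0.1)   # converges quickly
large = gradient_descent(lr=1.1)      # overshoots on every step and diverges

for name, losses in [("small", small), ("moderate", moderate), ("large", large)]:
    print(f"{name:8s} final loss after 20 steps: {losses[-1]:.4g}")
```

With the small rate the loss is still far from zero after 20 steps, with the moderate rate it is close to zero, and with the large rate it grows at every step, mirroring the trade-off between slow and unstable training.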
A comparison of our model with other binary classification models that use radiological imaging for the diagnosis of COVID-19 is given in Table 3.7.

Discussion
The novel coronavirus that emerged in Wuhan has now engulfed the entire globe. It has exposed broken healthcare systems and a lack of healthcare facilities. The gold-standard nucleic acid-based detection has its shortcomings, including a high rate of false negatives due to factors such as methodological disadvantages and disease stage [20]. Many open sources of COVID-19 images have emerged for the public to use for the greater good. Ying S. et al. (2020) proposed the DRE-Net model, built on the pre-trained ResNet50, and attained an accuracy of 86% using CT scans [12]. They used 777 COVID-19 images and 708 healthy images.  [25]. These studies, which are comparable to our model, can be viewed in Table 3.7.
Our data is taken from two open sources and consists of 864 CXR images: 417 COVID-19 and 447 No-finding CXR images. Our utmost concern was to have CXR images from comparable age groups, as age can have a significant impact on the diagnostic ability of the automated model. Tulin  [26,27]. This accuracy could be attributed to the use of non-COVID images from a dataset of children aged 1-5 years, owing to the lack of availability of, or restrictions on, adult image datasets at the time of their study. CXR images of children are evidently different from those of adults, and using them can cause the accuracy to shoot up because of the obvious differences between the two. Hence, taking data from similar age cohorts is indispensable.
Our proposed model attained a high accuracy of 96.33%. Since our data was balanced, the accuracy does play an important role in the justification of our model. As we are dealing with an infectious disease, it is essential to reduce the number of false negatives, i.e. positive COVID-19 patients being predicted as negative. The FN value of our model was exceptionally low.
Recall is also a useful metric in cases where a false negative is costlier than a false positive, as we do not want to accidentally discharge an infectious person into the healthy population and thereby spread the contagious virus. The F1-score, which is the harmonic mean of precision and recall, is 93% for the testing dataset and 96% for the second testing (local/Aligarh) dataset. For the 1st testing dataset, we obtained a recall of 93% for the classifier and a recall of 88% for the COVID-19 class. For the 2nd (local/Aligarh) testing dataset, we obtained a recall of 96% for the classifier and a recall of 100% for the COVID-19 class. The False Negative Rate (FNR) is 12% for the testing dataset and 0% for the locally acquired dataset (local/Aligarh). According to another study, for an assumed specificity of 100% of the diagnostic assay, the FNR was 9.3% and the sensitivity/recall was 90.7%, and it was suggested that rRT-PCR results alone should not be the deciding factor for COVID-19 [28]. This means that the false negative rate is quite low for the above-mentioned datasets.
Hence, all the above performance metrics indicate that, with further training and testing, this model can be employed in hospitals, as aimed by this paper.
The advantages of using this model are:
1) Higher accuracy
2) Extremely low false negatives
3) Classification of raw images
4) Fully automated end-to-end (back-end-to-front-end) structure
5) No feature extraction required
6) Results in less than a second
7) X-Ray images are easier to procure than CT scans, especially in developing countries
8) The patient is exposed to less radiation in an X-Ray
9) X-Ray units are easier to sanitise than CT scanners
10) An approach to assist radiologists
11) Promotion of tele-radiology while abiding by social distancing protocols
Due to the COVID-19 outbreak and the rising number of cases in the world, especially in India, our study could not be clinically tested. We could collect only 30 local radiology images of COVID-19 cases and evaluate them with our model. In future, we aim to validate the model by incorporating more CXR images, and to take the model a step ahead from binary classification to multi-class classification. We might also experiment with different layering structures and compare the results. After conducting clinical trials, we aim to deploy the model in local hospitals for early screening of potential COVID-19 patients, and to increase its sensitivity to the different variants of COVID-19.

Conclusion
In this study, we established a fully automated early screening model with an end-to-end structure, requiring no feature extraction, for the detection of COVID-19 from Chest X-Ray images using deep learning, with an accuracy of 96.33%. The recall/sensitivity for the testing and local/Aligarh datasets was 93% and 96%, respectively. The False Negative Rate (FNR) is 12% for the testing dataset and 0% for the locally acquired dataset (local/Aligarh). Imaging of COVID-19 with X-Rays is more feasible in developing countries and wherever the patient count exceeds the capacity of existing imaging modalities. This automated model can help reduce the patient load for radiologists. It could be a promising supplementary aid for frontline workers, particularly in countries where more and more healthcare workers have been isolated after testing positive, causing an acute shortage of healthcare workers. Social distancing norms are also fulfilled, as this technology promotes tele-radiology. COVID-19 has burdened healthcare systems and economies. Early diagnosis with the aid of image classification models allows early containment of this contagious disease and assists in flattening the curve. We intend to make our model more robust and accurate by validating it with additional image databases.

Clinical trial
No clinical trial was conducted. Images from Jawaharlal Nehru Medical College, Aligarh Muslim University (JNMC, AMU), Aligarh, Uttar Pradesh, India were tested to measure the diagnostic efficacy of the model.

Informed consent and patient details
Images were mostly taken from online open data sources. The local/Aligarh images were received without any details of the patients. Any remaining traces of patient identity on the CXR images were also removed before use.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.