SAM: Self-augmentation mechanism for COVID-19 detection using chest X-ray images

COVID-19 is a rapidly spreading viral disease that has affected over 100 countries worldwide. The numbers of casualties and cases of infection have escalated, particularly in countries with weakened healthcare systems. Currently, reverse transcription-polymerase chain reaction (RT-PCR) is the test of choice for diagnosing COVID-19. However, current evidence suggests that COVID-19 infected patients mostly develop a lung infection after coming into contact with the virus. Therefore, chest X-ray (i.e., radiography) and chest CT can serve as a surrogate in countries where PCR is not readily available. This has pushed the scientific community to detect COVID-19 infection from X-ray images, and recently proposed machine learning methods offer great promise for fast and accurate detection. Deep learning with convolutional neural networks (CNNs) has been successfully applied to radiological imaging to improve diagnostic accuracy. However, performance remains limited by the lack of representative X-ray images in public benchmark datasets. To alleviate this issue, we propose a self-augmentation mechanism that performs data augmentation in the feature space rather than in the data space using reconstruction independent component analysis (RICA). Specifically, a unified architecture is proposed that contains a deep convolutional neural network (CNN), a feature augmentation mechanism, and a bidirectional LSTM (BiLSTM). The CNN provides high-level features extracted at the pooling layer, from which the augmentation mechanism chooses the most relevant features and generates low-dimensional augmented features. Finally, the BiLSTM is used to classify the processed sequential information. We conducted experiments on three publicly available databases to show that the proposed approach achieves state-of-the-art results with accuracies of 97%, 84%, and 98%. Explainability analysis has been carried out using feature visualization through PCA projection and t-SNE plots.


Introduction
Coronavirus disease (COVID-19) is a viral respiratory disease that initially emerged in China, when a cluster of patients with pneumonia of unknown cause was reported in Wuhan, the capital of Hubei Province. The virus that caused the disease was identified as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the International Committee on Taxonomy of Viruses based on phylogeny, taxonomy, and established practice [1]. At the time of writing, the World Health Organization (WHO) reported approximately 185 million affected people and 92,798 deaths worldwide [2]. Moreover, animals such as cats and dogs have also been reported to be infected with SARS-CoV-2 in many countries, including the United States. Thus, WHO declared this virus a "Public Health Emergency of International Concern" and classified it as a pandemic in March 2020 [3]. Most infected people develop mild to moderate illness, and common symptoms are runny nose, body aches, cough, fever, sore throat, and shortness of breath [4]. Since the beginning of the pandemic, several diagnostic methods have been approved by various international and country-specific agencies. However, there is no clear consensus on which tests should be used for acute complaints to yield a correct diagnosis in a timely manner. In EU member countries, 365 different commercialized devices have been used for this purpose. Among them, 168 are immunoassays, three are sequencing-based methods, 192 are PCR-based methods, and two are based on other medical devices [5]. WHO recommended the RT-PCR test (developed by Corman), which is now considered the standard for detecting a coronavirus infection. However, the false-negative rate was found to be approximately 20% to 40% among infected cases in China due to inappropriate sample collection, faulty operation, storage, and low-sensitivity test kits [6].

muhammad.usman@oulu.fi (U. Muhammad); mdziaul.hoque@oulu.fi (Md.Z. Hoque)
Along with laboratory testing, chest CT scans interpreted by a radiologist can be considered a complementary tool to RT-PCR [7]. COVID-19 infected patients show ground-glass opacities (GGO) in the periphery of both lungs, which appear more grey or hazy as opposed to the normally dark-appearing lungs. It is also stated that, in patients who recovered from COVID-19 pneumonia, lung disease was observed ten days after the onset of symptoms [8]. In the early days of the pandemic, clinical centers in Wuhan were working with an insufficient number of often malfunctioning test kits, resulting in a concerning number of false negatives.

Journal Pre-proof
To counteract these challenges, doctors were persuaded to make diagnoses based only on laboratory and chest CT results [9]. In developing countries, such as India, where the number of test kits remains low, CT is also used for COVID-19 detection.
In addition to CT scans, chest X-ray machines are easily accessible in almost all hospitals and have a potential role in diagnosis because X-ray images present visual indicators linked with COVID-19 [7]. In Fig. 1, we visualize example images from the normal, COVID-19, pneumonia, and bacterial pneumonia classes, taken from two X-ray image databases [10,11,8]. Thus, radiologic images obtained from COVID-19 cases, together with laboratory results, may help in the early detection of infection. The study conducted on CT images by Kong et al. [12] demonstrates acute bilateral airspace opacities in infected patients. Zhao et al. [13] reported that most patients had a fever as the onset symptom. Based on the results of the scans, GGO (87%), vascular enlargement in the lesion (72%), or mixed GGO and consolidation (65%) appeared. Moreover, the authors show that lesions present on CT images are more likely to have a peripheral distribution. Li et al. observed that chest CT had a low rate of missed diagnosis of COVID-19. GGOs and consolidation with or without vascular enlargement are common CT features of COVID-19 and may be useful as a standard method for the rapid diagnosis of COVID-19 [7]. Similarly, Zu et al. [14] concluded that 1649 of chest CTs can have rounded lung opacities.
Machine learning (ML) techniques are attracting substantial interest in the medical field, where deep learning-based models have been successfully utilized in many healthcare applications such as depression detection [15], pain estimation [16], breast cancer detection [17], Alzheimer's disease classification [18], and pneumonia detection from chest X-ray images [19]. Due to the increase in COVID-19 cases, healthcare systems have been overwhelmed and require alternative solutions for the automated diagnosis of COVID-19. In this regard, many attempts have been made to address such problems using radiology images [8,9,7,20,21,12]. However, it is not feasible to build a large labeled database for every disease, e.g., viral pneumonia, COVID-19, bacterial pneumonia, aspiration pneumonia, etc. Thus, the bias in small datasets and the lack of representative training and tuning data impair the performance of such deep learning models.
A simple way to deal with these challenges consists in applying data augmentation techniques, which enable researchers to significantly increase the diversity of data without collecting new data. However, augmented data obtained by borrowing from unlabeled data [22], random erasing [23], or randomly masking regions [24] depend heavily on training parameters. For instance, a slight rotation between 1 and 30 degrees or a random cropping ((288, 288) → (224, 224)) could be useful on digit recognition tasks such as MNIST, but as the rotation degree increases, the label of the data is no longer preserved after the transformation [25]. The dominant approaches, such as generative adversarial networks (GANs) [26], Bidirectional GANs [27], DCGAN [28], Progressively Growing GANs [29], and CycleGAN [30], generate synthetic images but require careful domain adaptation to transfer the knowledge and features to the real image domain. A study based on a combined CNN-BiLSTM reveals that the training samples of COVID-19 need to be enlarged to test the generalizability of the developed systems [31]. The authors claim that COVID-19 images might be associated with multiple disease symptoms, and demand computer-aided diagnostic (CAD) systems to detect them accurately and rapidly. Most of the existing methods consider either 2-class (normal vs. COVID-19, i.e., binary classification) or 3-class classification (normal vs. pneumonia vs. COVID-19). To overcome this issue, Asif et al. [32] proposed a deep learning model based on the Xception architecture for 4-class cases (COVID vs. bacterial pneumonia vs. viral pneumonia vs. normal). Wang et al. [33] introduced COVID-Net and achieved 83.5% accuracy in classifying the bacterial pneumonia, COVID-19, normal, and viral pneumonia classes. However, the performance for multi-class disease predictions remains unclear. Except for ensemble-CNNs [7], none of the methods discussed treats 5-class cases.
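To make the image-space transformation mentioned above concrete, the following is a minimal sketch of a random crop from 288×288 to 224×224 using NumPy. The function name `random_crop` and the random array standing in for an X-ray image are illustrative assumptions; they are not taken from any of the cited works.

```python
import numpy as np

def random_crop(image, out_size, rng):
    """Cut a random out_size x out_size window from a 2-D image array."""
    height, width = image.shape[:2]
    # Pick the top-left corner so the window stays inside the image.
    top = rng.integers(0, height - out_size + 1)
    left = rng.integers(0, width - out_size + 1)
    return image[top:top + out_size, left:left + out_size]

rng = np.random.default_rng(0)
image = rng.random((288, 288))      # hypothetical grayscale image stand-in
crop = random_crop(image, 224, rng)
print(crop.shape)
```

Each call yields a different window of the same image, which is how this kind of augmentation increases diversity; whether the label survives the transformation depends on the task, as the paragraph above points out.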
On the other hand, existing CNN-LSTM based COVID-19 detection methods [34] treat the convolutional features as equally important and ignore the interference information (e.g., mutual exclusion and redundancy), which can prevent learning of long data sequences. Moreover, the high-dimensional vector generated by the CNN can increase the number of network parameters of the LSTM and make the network difficult to optimize. Therefore, motivated by the urgent need to develop an artificial intelligence (AI) solution to aid in the rapid evaluation of different lung diseases together with COVID-19 detection, and inspired by openly available databases, we propose a unified architecture that consists of a deep convolutional neural network (CNN), a feature augmentation mechanism, and a bidirectional LSTM (BiLSTM) for the detection of five classes, including COVID-19, from X-ray images. Specifically, the feature augmentation mechanism based on reconstruction independent component analysis (RICA) [35] is designed in such a way that it improves the performance of the CNN-based BiLSTM architecture by approximating the real distribution in feature space rather than in data space, where the generated features are mutually independent and promise diversity. To the best of our knowledge, this is the first work in the COVID-19 literature that implements feature augmentation without performing any training data augmentation strategy. An illustration of this concept is provided in Fig. 2. By employing this strategy, the interference information or redundancy is significantly reduced by selecting the low-dimensional augmented features.
In addition, it is worth mentioning that chest radiography analysis is known to have inherent limitations in the early stages of COVID-19 detection, due to low sensitivity in ground-glass opacity detection [14]. Moreover, recovered patients are likely to be protected against reinfection for several weeks but may still transmit the virus. However, well-trained deep learning methods can focus on anomalies that are not visible to human eyes, which may encourage their application in health care systems. Overall, our main contributions in this paper are summarized as follows:
1. We introduce a deep feature augmentation framework to improve COVID-19 detection, mitigating the current lack of sufficient annotated data.
2. We employ a combined CNN-BiLSTM network to show that the proposed low-dimensional augmented features are more compact and more powerful than raw CNN features for the diagnosis of COVID-19 in a robust manner.
3. The effectiveness and validation of our proposed method have been extensively explored on three publicly available datasets and compared with state-of-the-art results.
4. PCA and t-SNE feature visualization has been utilized to demonstrate the explainability of the proposed learning model. Moreover, a detailed experimental analysis is conducted in terms of specificity, sensitivity, F1-score, accuracy, confusion matrix, and receiver operating characteristic (ROC) to determine the performance of the proposed method.
The rest of this paper is organized as follows. Section 2 reviews the literature in the field. Section 3 details the proposed method, highlighting its different phases, including deep-feature extraction, the augmentation-based learning module, and the associated recurrent neural network. Section 4 presents the experimental results, including the dataset description, evaluation metrics, implementation details, results, and explainability analysis. Finally, concluding statements and perspectives for future work are provided in Section 5.

Literature Review
In recent months, researchers have evaluated SARS-CoV-2 infected chest X-ray images using convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and encoder-decoder models. A brief overview of recent developments is provided in this section.
Narin et al. [36] utilized five pre-trained deep learning models (InceptionV3, Inception-ResNetV2, ResNet50, ResNet101, and ResNet152) to detect COVID-19, bacterial pneumonia, and viral pneumonia. Their best performing model achieved an accuracy of 98% using a ResNet-50 CNN pre-trained on COVID-19 images. Rajaraman et al. [37] proposed a method that increases training data using weakly labeled data augmentation. A stage-wise approach was used to train the CNN. The authors concluded that weakly labeled data augmentation is superior to baseline non-augmented training. In [38], a correlation learning mechanism is proposed, and the images are augmented by flipping, image noising, and rotation.
Ozyurt et al. [39] combined a traditional machine learning descriptor (LBP) with a feature selection method that picks the most informative features, achieving a 95.84% classification accuracy on CT images. Rahimzadeh and Attar [40] introduced a combined deep CNN to classify 11302 chest X-ray images. In their study, Xception and ResNet50V2 were used, and the approach was presented as a new strategy to address the unbalanced dataset problem. They reported an accuracy rate of 99.56%. Castiglioni et al. [41] utilized an independent dataset of 110 patients suspected of COVID-19 infection, and developed ten convolutional neural networks (CNNs) to evaluate the performance.
Multiple state-of-the-art deep learning models, including DenseNet201, ResNet50V2, and InceptionV3, were fine-tuned individually to make independent predictions in [42]. Then, a weighted average ensembling technique was used to combine them to achieve a classification accuracy of 91.62%. Similarly, Wang et al. [33] proposed a tailored deep convolutional neural network for classifying chest X-ray images. Hemdan et al. [43] examined seven different CNN architectures in their experiment, including DenseNet-121, VGG-19, ResNet-V2, Inception-V3, Xception, MobileNet-V2, and InceptionResNet-V2. Their work revealed that the DenseNet and VGG-19 models achieved the best performance, with 91% accuracy for detecting COVID-19 and non-COVID-19 infections.
Berrimi et al. [44] fine-tuned two pre-trained models, DenseNet and InceptionV3, to classify both X-ray and CT chest scans. To increase the diversity of the training data, the images were rotated, zoomed, horizontally flipped, and shifted. Nour et al. [45] designed a CNN with five convolution layers from scratch. The extracted CNN features are then evaluated with traditional machine learning classifiers such as k-nearest neighbor, support vector machine (SVM), and decision tree. The authors concluded that the SVM classifier, with an accuracy of 98.97%, performs the best among all of them. Giacomo et al. [46] detect lung disorders using X-ray images. Specifically, a fuzzy logic segmentation method combined with a neural network is proposed, and an accuracy of 92.56% is reported in their work. Yoo et al. [47] used a pre-trained ResNet18 together with different decision trees to classify CXR images as normal, tuberculosis, or COVID-19. Aslan et al. [34] proposed a hybrid architecture based on CNN-BiLSTM for COVID-19 detection. Moreover, the authors employed two deep learning architectures, including Artificial Neural Networks (ANN) and a hybrid structure containing a BiLSTM layer, to utilize the temporal properties. Accuracies of 98.14% and 98.70% were achieved using the first and second architecture, respectively. Nayaar et al. [48] showed that thoracic (chest) imaging is effective in the diagnosis of coronavirus disease (COVID-19). Mukherjee et al. [49] utilized a lightweight (9-layer) CNN-tailored deep neural network to detect COVID-19 positive cases, and achieved an overall accuracy of 96.28%. A federated learning approach is proposed to detect COVID-19 in [50], where 98.72% accuracy was reported. Challenges, innovations, and opportunities to detect COVID-19 are discussed in [51]. Mukher et al. [52] proposed a light-weight CNN-tailored shallow architecture to detect COVID-19. The proposed model was designed with fewer parameters than other deep learning models and validated using 321 COVID-19 positive chest X-ray images with an accuracy of 99.69%. Marcin et al. [53] designed a method for diseased tissue detection in input X-ray images.
M. Turkoglu [54] employed the transfer learning approach by using the AlexNet architecture. To choose the most effective features, the Relief feature selection algorithm is used. Finally, a Support Vector Machine (SVM) is applied to detect COVID-19 and pneumonia. An accuracy of 99.18% was reported. Sahlol et al. [55] proposed a combined approach where the Inception model is utilized to extract the features and a swarm-based feature selection algorithm is applied to choose the most relevant features. Two public COVID-19 X-ray datasets are used and 99.18% accuracy was reported. Mesut et al. [56] utilized MobileNetV2 and SqueezeNet models to extract the deep features. Then, the Social Mimic optimization method is applied, and the features are combined and classified using an SVM classifier. To overcome the limited number of chest X-ray samples, Karbhari et al. [57] proposed an Auxiliary Classifier Generative Adversarial Network (ACGAN) to generate synthetic images. Based on the obtained images, convolutional neural networks (CNNs) are utilized to detect COVID-19 in the CXRs.
Loey et al. [58] utilized a GAN architecture to synthesize auxiliary images, motivated by the lack of datasets, especially for chest X-ray images. Three deep transfer models are selected to detect four classes, i.e., COVID-19, normal, bacterial pneumonia, and viral pneumonia. GoogleNet performed the best in their work. An encoder-decoder network is proposed in [59], where CORONA-Net is shown to perform the best for COVID-19 detection. MASC-Net, which consists of a multi-input encoder-decoder, is introduced to automatically detect infected lung regions from COVID-19 chest CT scans [60]. A 3D U-Net is proposed as an encoder-decoder method in [61], where multi-task learning is applied and compared with four transfer learning strategies. The authors concluded that using multiple lung lesion datasets can extract more general features.
Therefore, previous research shows that chest X-ray images are commonly used in most current works and play an important role in the diagnosis of COVID-19. However, learning from imbalanced data, or from insufficient features extracted from limited X-ray training samples, cannot provide the expected performance in COVID-19 detection. Thus, the proposed work focuses on a data augmentation strategy where label-preserving features are generated to improve the performance of the deep learning model.

The Proposed Method
The general framework of the proposed approach is divided into three components: (1) extraction of deep features, (2) an augmentation-based learning module, and (3) a BiLSTM-based sub-network. We first describe the procedure of feature extraction for guiding the process of feature generation. Next, we explain the procedure of augmenting the training data in feature space. Finally, the structure of the BiLSTM network is discussed. The overall procedure of the proposed approach is illustrated in Fig. 3.

Deep features extraction
Inspired by the performance of deep learning models, we adapt ResNet-50, a CNN architecture known for its stability and performance, to extract high-quality features for our task [62]. The model is fine-tuned by replacing the last fully-connected layer with a new fully-connected layer whose number of outputs equals the number of classes in the dataset. We freeze the weights of the first ten layers so that their gradients do not need to be computed. This is motivated by the fact that the earlier layers of ResNet contain more generic features (e.g., color blob detectors or edge detectors), whereas the remaining layers become more specific to the details of the classes contained in the original dataset. For the new fully-connected layer, the learning rate is scaled by a factor of 10 for the weights and 20 for the biases. By biasing the weight updates in the new fully-connected layer, the influence of each training sample in the new dataset is magnified and training time is reduced. We utilize the cross-entropy loss to adjust model weights during training. The purpose is to decrease the loss and drive the network towards accurate predictions. It is defined as

$$ \mathcal{L} = -\sum_{i=1}^{C} y_i \log(p_i) \tag{1} $$

where $y_i$ denotes the true label, $p_i$ the softmax probability for the $i$th class, and $C$ the total number of classes.
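As a small worked example of the cross-entropy loss described above, the sketch below computes softmax probabilities and the resulting loss for a single sample. The logits, the class order, and the helper names `softmax` and `cross_entropy` are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, probabilities):
    """Sum of -y_i * log(p_i) over classes; only the true class contributes."""
    return -sum(y * math.log(p)
                for y, p in zip(one_hot_label, probabilities) if y > 0)

logits = [2.0, 0.5, -1.0]    # hypothetical scores for (COVID-19, Normal, Pneumonia)
label = [1, 0, 0]            # one-hot true label: COVID-19
probs = softmax(logits)
loss = cross_entropy(label, probs)
print(probs, loss)
```

With a one-hot label, the loss reduces to the negative log-probability assigned to the true class, so it approaches zero as that probability approaches one.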

An augmentation-based learning module
Data augmentation in the image space is a well-established technique that enhances the size and quality of training datasets such that deep learning models can robustly model the training data. However, feature augmentation has not yet received the same level of attention. This is crucial for applications like COVID-19 detection, where the number of training samples remains limited. To this end, feature augmentation is conducted based on reconstruction independent component analysis (RICA) [35]. RICA was designed to overcome the drawbacks of independent component analysis (ICA) by replacing ICA's orthonormality constraint with a soft reconstruction penalty, which turns out to be very useful in learning sparse features. Therefore, the idea behind our proposed mechanism is to extract more meaningful information from the generated features to correctly classify target samples.
In our case, RICA receives as input the features from the last pooling layer of ResNet-50 and converts them into a new lower-dimensional representation. The transformation applied by RICA starts from the following model:

$$ x = A s \tag{2} $$

where $x$ is the vector representing the CNN features, $A$ denotes the mixing matrix, and $s$ are the independent components used for dimensionality reduction. The goal of RICA is to describe the observed data $x$ by mixing the components $s$. We need to determine both $A$ and $s$ from the data $x$ because we can neither directly extract the sources $s$ nor know the mixing matrix $A$. Let $W$ be the inverse of $A$; then the model can be expressed as:

$$ s = W x \tag{3} $$

Hence, using the original data $x$, the goal is to determine a set of vectors (corresponding to the column vectors of matrix $W$) that make the features sparse while forming an orthonormal basis. In this regard, our matrix $W$ maps the data $x$ to the features $s$. The optimization problem defined by RICA becomes [35]:

$$ \min_{W} \; \lambda \lVert g(W x) \rVert_1 + \lVert W^{\top} W x - x \rVert_2^2 \tag{4} $$

where $g$ is the non-linear convex function evaluated in the objective and $x \in \mathbb{R}^{n \times m}$ (where $n$ denotes the number of features and $m$ is the number of data vectors in $x$). The $\ell_1$ norm is the sparsity penalty, and $W^{\top}$ acts as the tied reconstruction matrix. To decrease the computational cost of the optimization, the limited-memory BFGS (L-BFGS) algorithm [63] is used as an unconstrained optimizer, which results in fast convergence. Moreover, RICA can handle data with approximate whitening or even without whitening [35].
As illustrated in the formulation of RICA in equation (4), in the first part, $\lambda$ represents the weight assigned to the sparsity constraint in relation to the reconstruction condition. The second part emphasizes accurate reconstruction of the original features by minimizing the reconstruction error $\lVert W^{\top} W x - x \rVert_2^2$. In this regard, we fix the feature dimension of the pooling layer features to 400, and empirically set the weights to 80, 100, and 120, because the higher the weight given to the sparsity constraint, the less precise the reconstruction will be, and vice versa. Hence, we obtain three augmented feature vectors with different weights while keeping the same dimension of 400. Similarly, we repeat the same procedure to obtain three augmented feature sets for the second dataset by setting the feature dimension to 500. The representations learned by the augmentation mechanism contain discriminative information related to the classes, which allows the network to accurately predict them.
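The procedure above (one sparsity-plus-reconstruction objective, solved three times with different sparsity weights) can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: a smooth absolute value stands in for the sparsity penalty, the objective is averaged over samples, plain gradient descent with step halving is swapped in for L-BFGS, and the toy sizes (16-dimensional features, 40 samples, 4 output dimensions) replace the 2048-dimensional pooled features and the 400 or 500 output dimensions. The names `rica_objective` and `fit_rica` are hypothetical.

```python
import numpy as np

def rica_objective(W, X, lam, eps=1e-8):
    """Return (value, gradient) of lam * sparsity + reconstruction error.

    W: (k, n) unmixing matrix; X: (n, m) features, one column per sample.
    """
    m = X.shape[1]
    S = W @ X                              # (k, m) low-dimensional features
    R = W.T @ S - X                        # (n, m) reconstruction residual
    smooth_abs = np.sqrt(S ** 2 + eps)     # smooth stand-in for |S| (assumption)
    value = (lam * smooth_abs.sum() + (R ** 2).sum()) / m
    grad = (lam * (S / smooth_abs) @ X.T + 2.0 * (S @ R.T + W @ R @ X.T)) / m
    return value, grad

def fit_rica(X, k, lam, steps=100, seed=0):
    """Minimize the objective by gradient descent with step halving."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, X.shape[0]))
    value, grad = rica_objective(W, X, lam)
    for _ in range(steps):
        step = 1.0
        while step > 1e-10:
            candidate = W - step * grad
            new_value, new_grad = rica_objective(candidate, X, lam)
            if new_value < value:          # accept the first step that lowers the objective
                W, value, grad = candidate, new_value, new_grad
                break
            step *= 0.5
    return W, value

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 40))          # toy stand-in for pooled CNN features
# One augmented feature set per sparsity weight, all with the same dimension.
augmented = [fit_rica(X, k=4, lam=lam)[0] @ X for lam in (80, 100, 120)]
print([a.shape for a in augmented])
```

On real features the useful range of the sparsity weight depends on how the inputs are scaled, so the values 80, 100, and 120 are reused here only to mirror the text.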

Recurrent Neural Network (RNN)
Mainstream CNN frameworks are related to conventional statistical models, thus lacking the capacity to map sequences to sequences. BiLSTM [64] is a kind of RNN that can process sequences of arbitrary length and has obtained strong performance in natural language processing [65]. However, the high dimensionality and sparsity of the data are among the major challenges that limit its performance. Taking advantage of the low-dimensional RICA features, BiLSTM performs better than when using the raw CNN features (we further discuss this argument in Section 4.4). The BiLSTM is implemented similarly to the standard bidirectional LSTM, except that the input is based on the three augmented feature sets. We found that the proposed strategy, calculated at each time step, resulted in improved reconstructions, which proved vital to our feature augmentation process.
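BiLSTM networks process each sequence vector through a memory cell ($c$), which retains dependencies between the elements of the input sequence. Each LSTM unit comprises an input gate ($i$), an output gate ($o$), and a forget gate ($f$). The input gate governs the information flow into the cell by multiplying the cell's non-linear transformation of the inputs ($g$). The output gate decides how much information from the cell is used to compute the output activation of the LSTM unit. The forget gate regulates the extent to which a value remains in the cell. The LSTM unit updates for time step $t$ are:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + R_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where $x_t$ is the input at the current time step, $c_t$ is the current cell state, $h_t$ is the output activation (hidden state), $i_t$, $f_t$, and $o_t$ are the input gate, forget gate, and output gate activations, respectively, $W$, $R$, and $b$ denote the input weights, recurrent weights, and biases, $\sigma$ is the logistic sigmoid function, and $\odot$ represents element-wise multiplication.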
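The gate updates described above can be traced in a few lines of NumPy. The sketch below runs a standard LSTM cell forward and backward over a short sequence and concatenates the two final hidden states. The weight names, the small random initialization, and the input shape (3 time steps of 400 features, 60 hidden units per direction) are assumptions made for illustration; the text does not spell out how the trained network arranges its inputs at this level of detail.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update: gates i, f, o, candidate g, then cell and hidden state."""
    i = sigmoid(p["Wi"] @ x_t + p["Ri"] @ h_prev + p["bi"])
    f = sigmoid(p["Wf"] @ x_t + p["Rf"] @ h_prev + p["bf"])
    o = sigmoid(p["Wo"] @ x_t + p["Ro"] @ h_prev + p["bo"])
    g = np.tanh(p["Wg"] @ x_t + p["Rg"] @ h_prev + p["bg"])
    c = f * c_prev + i * g            # forget part of the old cell, add gated candidate
    h = o * np.tanh(c)                # output gate scales the exposed cell content
    return h, c

def init_params(input_size, hidden_size, rng):
    """Small random weights and zero biases for the four gates (illustrative only)."""
    p = {}
    for gate in "ifog":
        p["W" + gate] = 0.1 * rng.standard_normal((hidden_size, input_size))
        p["R" + gate] = 0.1 * rng.standard_normal((hidden_size, hidden_size))
        p["b" + gate] = np.zeros(hidden_size)
    return p

def run_lstm(sequence, p, hidden_size):
    """Process the time steps in order and return the final hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
    return h

def bilstm_last_state(sequence, p_fwd, p_bwd, hidden_size):
    """Concatenate the final states of a forward pass and a reversed pass."""
    h_fwd = run_lstm(sequence, p_fwd, hidden_size)
    h_bwd = run_lstm(sequence[::-1], p_bwd, hidden_size)
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(0)
sequence = rng.standard_normal((3, 400))     # assumed: 3 time steps x 400 features
p_fwd = init_params(400, 60, rng)
p_bwd = init_params(400, 60, rng)
state = bilstm_last_state(sequence, p_fwd, p_bwd, 60)
print(state.shape)
```

In a full classifier, a vector like `state` would be passed to a fully connected layer followed by a softmax; training the weights is outside the scope of this sketch.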

Experiments and Results
In this section, we first provide a brief description of three databases that are used to evaluate our method. Then, we present evaluation metrics, implementation details, and experimental results, which are discussed later in comparison to state-of-the-art methods.

COVID-19 X-ray scan database:
The second dataset is collected from the open-access source provided by Vantaggiato et al. [7], where two scenarios are examined. In the first scenario, three classes (Normal, COVID-19, and Pneumonia) are provided in the dataset. For training, each class has 404 images. The validation and testing sets contain 100 and 207 images, respectively. In the second scenario, two more classes were added by the authors, yielding a five-class model: Normal, COVID-19, Viral-Pneumonia, Bacterial-Pneumonia, and Lung-Opacity. This dataset (https://github.com/Edo2610/Covid-19_X-ray_Two-proposed-Databases, accessed on June 11, 2021) is acquired from different open-access sources [10,66,67,68].
The SARS-CoV-2 CT-scan dataset: The third dataset used in our work is acquired from [69]. It consists of 2482 CT scan images collected from hospitals in Sao Paulo, Brazil. In total, 1252 scans are from patients infected with SARS-CoV-2 and 1230 were reported as normal. This dataset (https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset, accessed on Nov 15, 2021) is also publicly available. We summarize the number of classes and images of each dataset in Table 1.

Table 3: The general comparison of the proposed method with other 3-class state-of-the-art methods.

Evaluation Metrics
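The performance of the proposed method is evaluated with respect to Sensitivity, Specificity, Precision, F-Score and Accuracy, defined using the equations below:

$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$

$$ \text{Specificity} = \frac{TN}{TN + FP} $$

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{F-Score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} $$

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The ROC curve shows the performance of the underlying classification model at all classification thresholds.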
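For reference, the five metrics named above can be computed from the four confusion-matrix counts as in the sketch below. It uses the standard textbook definitions, and the counts are made-up numbers for illustration, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)                  # recall on the positive class
    specificity = tn / (tn + fp)                  # recall on the negative class
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
    }

# Hypothetical counts: 100 positive and 100 negative test samples.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class problem, the same function can be applied per class by treating that class as positive and all others as negative.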

Implementation details
All the X-ray images are resized to 224×224 based on the size requirement of the model. No image data augmentation was applied; the features are augmented only in the feature space. The CNN is fine-tuned using stochastic gradient descent (SGD) with a learning rate of 3e-4, a mini-batch size of 32, and 5 epochs, with the samples shuffled before every epoch. To build the feature augmentation mechanism, the feature vector is extracted from the output of the last pooling layer of ResNet-50 with a size of 2048. RICA is utilized to augment the pooling layer features into three augmented feature sets with a dimension of 400 for the X-ray image dataset and 500 for the COVID-19 X-ray scan and SARS-CoV-2 CT-scan databases. The performance also varies with the number of iterations RICA runs before stopping; 80 to 120 iterations were used to extract the augmented features.
We treat these augmented features as three sequences, and each sequence is an A-by-Z array, where A is the number of features (the output size of RICA) and Z is the number of samples. Therefore, the BiLSTM takes three sequence feature sets as input. For training the BiLSTM, the Adam optimizer is used with the learning rate set to 0.0001. The sub-network consists of a BiLSTM layer with 60 hidden units, a fully connected layer, and a softmax layer, and the number of epochs is fixed to 150. Initializing the BiLSTM sub-network with random initialization can be challenging because large random-valued weights may lead to the problem of exploding gradients. Therefore, we set the recurrent weights with the He initializer [75], which performs the best in all scenarios of our experiments.
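The array layout and the weight initialization described above can be illustrated with a short NumPy sketch. Two points are assumptions rather than facts from the text: the three A-by-Z feature sets are interpreted here as three time steps per sample, which is only one possible reading, and the recurrent weights of the four gates are stacked into a single matrix. Random arrays stand in for the RICA outputs, and `he_normal` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
A, Z = 400, 8                    # features per set, toy number of samples
# Stand-ins for the three augmented feature sets, each an A-by-Z array.
feature_sets = [rng.standard_normal((A, Z)) for _ in range(3)]

# Stack to (3, A, Z), then move samples first: one (3, A) sequence per sample.
sequences = np.stack(feature_sets, axis=0).transpose(2, 0, 1)

def he_normal(fan_out, fan_in, rng):
    """He initialization: zero-mean Gaussian with variance 2 / fan_in."""
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

hidden_units = 60
# Assumed layout: recurrent weights of the four gates stacked row-wise.
recurrent_weights = he_normal(4 * hidden_units, hidden_units, rng)
print(sequences.shape, recurrent_weights.shape)
```

Scaling the variance by the fan-in keeps the initial recurrent weights small for wider layers, which is the property the text relies on to avoid exploding gradients at the start of training.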

Experimental results
In order to detect COVID-19 on the first X-ray image database [10,11], we split the original image dataset into eighty percent for training and twenty percent for testing, in the same spirit as [8]. Rather than designing a CNN model from scratch, we leverage transfer learning using pre-trained CNN models that have shown outstanding results in classification tasks across a wide variety of classes/types/applications. Specifically, four state-of-the-art pre-trained models, namely ResNet50 [62], SqueezeNet [76], GoogleNet [77], and DenseNet-201 [78], were fine-tuned for the COVID-19 detection task, and their results are reported in terms of specificity, sensitivity, precision, F1-score, and accuracy in Table 2. Then, we evaluate the performance of the deep models by combining them with a BiLSTM (CNN-BiLSTM) network. Finally, we report the efficacy of the proposed augmentation mechanism with a combined CNN-BiLSTM. It can be noted that all these deep learning models exhibit limited performance in detecting the three classes: COVID-19, No-Findings, and Pneumonia. One reason is that the training samples of the COVID-19 class resemble those of the Pneumonia class, and their number is small compared to the other two classes (No-Findings and Pneumonia), which might cause the model to overfit. When these models are connected with a BiLSTM, the performance improves but remains limited because the interference information is ignored.

Figure 4: Confusion matrix of the three-class scenario using ResNET50+SA+BiLSTM for the X-ray image dataset [8]. The horizontal and vertical axes show the predicted and true classes, respectively.
Figure 5: Confusion matrix of the three-class scenario using ResNET50+SA+BiLSTM for the COVID-19 X-ray scan database [7]. The horizontal and vertical axes show the predicted and true classes, respectively.
Figure 6: Confusion matrix of the five-class scenario using ResNET50+SA+BiLSTM for the COVID-19 X-ray scan database [7]. The horizontal and vertical axes show the predicted and true classes, respectively.

Table 3 summarizes the comparison with other state-of-the-art works on the three-class X-ray image database. Since COVID-19 is an emerging disease, the first dataset (X-ray image) used in our work is regularly updated with new images. A fair comparison is therefore possible only with the previous work [8]. Nevertheless, we also compare our method with other three-class methods that are specifically designed for COVID-19 detection. At the time of writing this paper, the database contained a total of 125 COVID-19 chest X-ray images. The best results were obtained with ResNet-50 and validated using a 5-fold cross-validation procedure.
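The 5-fold cross-validation mentioned above can be sketched as follows. This is a plain (non-stratified) split written only for illustration; whether the authors stratified by class is not stated, and the helper name `k_fold_indices` and the sample count of 125 (borrowed from the COVID-19 image count quoted in the text) are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical sample count, used here only to show the fold sizes.
splits = list(k_fold_indices(n_samples=125, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train_idx)} train / {len(test_idx)} test")
```

Each sample appears in exactly one test fold, so averaging the per-fold metrics uses every image once for evaluation.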
The proposed method provides 97% accuracy, which is 10% higher than the previously proposed method [8] on the same dataset (three classes), and 99% accuracy for the two-class scenario. For further analysis, a confusion matrix is shown in Fig.4. It can be observed that the proposed approach classified COVID-19 better than the other two classes. Table 4 illustrates the results of the three-class scenario on the second COVID-19 database [7], for which the training, validation, and testing sets are provided separately. Using the self-augmentation mechanism, the proposed method achieves 79% accuracy, a 4% improvement over the previous study [7]. In addition to the three-class scenario, the results of the five-class scenario are reported in Table 5. The proposed method provides 84% accuracy, which is 3% better than the Ensemble-CNNs [7]. For further analysis, Figures 5 and 6 show the confusion matrices of the three-class and five-class scenarios, respectively. The main observation is that the proposed method attained 99.5% accuracy for the detection of COVID-19 samples. The confusion matrix in Fig.7 covers only the COVID-19 and Normal classes, with two rows and two columns showing the number of true positives, false negatives, false positives, and true negatives. It shows that the model predicted all 207 COVID-19 X-ray images correctly, with no false negatives. In the case of the Normal class, 98 images are misclassified while 109 images are correctly classified. All correct predictions lie on the diagonal of the table (highlighted in light blue and dark blue), so it is easy to visually inspect the table for prediction errors. Therefore, we can observe that the proposed framework is proficient in distinguishing the COVID-19 samples in both datasets.
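To make the link between a confusion matrix and the reported metrics concrete, the sketch below derives sensitivity, specificity, precision, F1-score, and overall accuracy in a one-vs-rest fashion. It follows the convention of the figures (rows are true classes, columns are predicted classes). The function name `per_class_metrics` and the counts in `toy_cm` are hypothetical and are not results from the paper.

```python
def per_class_metrics(cm):
    """Compute accuracy and one-vs-rest metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        results.append({"sensitivity": sensitivity, "specificity": specificity,
                        "precision": precision, "f1": f1})
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return accuracy, results


# Hypothetical three-class counts, for illustration only.
toy_cm = [[18, 1, 1],
          [2, 45, 3],
          [1, 4, 25]]
accuracy, results = per_class_metrics(toy_cm)
print(f"accuracy = {accuracy:.2f}")
for k, m in enumerate(results):
    print(k, {name: round(value, 3) for name, value in m.items()})
```

A class with no false negatives has a sensitivity of 1.0 regardless of how the other classes are classified, which is why per-class and overall figures can differ noticeably.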
As our main focus is the classification of COVID-19 samples, we present ROC curves for a two-class detection problem (COVID-19 vs Normal), in which only the true positive rate (TPR) and false positive rate (FPR) are needed. The best possible detection method would yield a curve passing through the upper left corner, i.e., coordinate (0,1) of the ROC space, which corresponds to 100% sensitivity (no false negatives) and 100% specificity (no false positives). In Fig.8, the curves are shown for raw CNN features, for the CNN with a BiLSTM-based network, and finally for the network with the augmentation mechanism. It can be observed that the proposed augmentation mechanism clearly improves the performance of a CNN-based BiLSTM architecture, as exhibited by a higher sensitivity rate.

In Table 6, we compare our proposed method with state-of-the-art methods on the COVID-19 CT-scan dataset. The results are obtained by dividing the dataset into 90% for training and 10% for testing [80]. In contrast to previous deep learning methods [69,79,80], our proposed method explicitly takes advantage of augmented features and efficiently detects COVID-19 cases, achieving an accuracy of 98.38%. Thus, based on the experimental analysis on all three datasets, we conclude that neither a single CNN model nor a CNN-based BiLSTM achieves the best results for all the evaluation metrics. Therefore, the proposed augmentation is essential to produce a robust feature representation for COVID-19 detection.

Table 6: The performance comparison results for the SARS-CoV-2 CT-scan dataset. Columns: Method; Performance Metrics (%): Accuracy, Precision, Sensitivity, Specificity, F1-Score. Row: xDNN [69] 97
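The ROC construction discussed above can be sketched by sweeping a decision threshold over classifier scores and recording the (FPR, TPR) pair at each step. The helper names `roc_points` and `auc` and the toy labels and scores are hypothetical, chosen only to illustrate the procedure; they are not outputs of the proposed model.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold.

    labels are 1 for the positive class (e.g. COVID-19) and 0 otherwise.
    Both classes must be present in labels.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for six samples, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(toy_labels, toy_scores)
area = auc(points)
print(points)
print(f"AUC = {area:.4f}")
```

A detector whose curve reaches the point (0, 1) separates the two classes perfectly at some threshold, and its area under the curve equals 1.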


Ablation study
We conduct an ablation study to show how the weight ( ) assigned to the sparsity constraint influences performance. We report the effectiveness of each augmented feature set on all three datasets, as well as the computational time required to train and test the model. Moreover, we compare our proposed method with other dimensionality reduction techniques, namely principal component analysis (PCA) [81] and factor analysis [82]. All the experiments are performed on a workstation with a 3.5 GHz Intel Core i7-5930k and 64 GB of RAM.
The findings in Table 7 show that ResNet-50+ 1 +BiLSTM already achieves a good accuracy of 95.60%. When we increase the weights ( 2 , 3 ), the performance improves further. By combining all the weighted features, we achieve the best performance on the X-ray image dataset (3-class case). Similarly, for the COVID-19 X-ray scan database (5-class case) and the SARS-CoV-2 CT-scan dataset (2-class case), we obtain the highest performance with the combination of 1 , 2 , and 3 . In our experiments, the proposed method takes 352.4s for training and 1.1s for testing on the X-ray image dataset. The COVID-19 X-ray scan database and the SARS-CoV-2 CT-scan dataset take 528.4s and 478.1s for training, and 1.9s and 1.1s for testing, respectively.
Table 8 presents the detailed classification results obtained with the dimensionality reduction techniques. The results show that PCA significantly degrades the performance on all the datasets. A possible explanation is that PCA fails to preserve the feature information when the features are reduced to a small number of components, i.e., linear combinations of the original features. Surprisingly, factor analysis provides better performance on the COVID-19 X-ray scan database and achieves state-of-the-art performance when ResNet is combined with the BiLSTM network. However, it decreases the performance on the SARS-CoV-2 CT-scan and X-ray image databases. In contrast to PCA and factor analysis, the proposed mechanism maintains the best performance on all three datasets.

Explainability Analysis
One of the advantages of the proposed approach is that the detection process of the model can be interpreted. For each stage, we can see how the features are structured in the high-dimensional (ResNet) space and how the augmented feature space affects the different classification stages. To this end, we employ the PCA projection [83] and the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm [84]. PCA offers a convenient explanatory framework since its axes are linear combinations of the original dimensions, allowing comprehension of high-dimensional patterns. Similarly, t-SNE creates a low-dimensional representation of complex high-dimensional data through a series of transformations and fine-tuned optimization procedures. In this respect, the projection results of both PCA and t-SNE provide a rough indication of the quality of the separation and support explainability through visual exploration.
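The statement that PCA axes are linear combinations of the original dimensions can be illustrated with a small standard-library sketch that finds the top two principal axes by power iteration and projects the samples onto them. The function name `pca_project`, the power-iteration approach, and the two synthetic 4-dimensional clusters are assumptions made for illustration; the actual features in this work are much higher-dimensional and were not produced this way.

```python
import random


def pca_project(rows, n_components=2, iters=200, seed=0):
    """Project rows onto the leading principal axes found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
        axes.append(v)
        # Deflation: remove the component just found before the next search.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    projected = [[sum(c[j] * ax[j] for j in range(d)) for ax in axes]
                 for c in centered]
    return projected, axes


# Two synthetic clusters standing in for two classes (illustration only).
rng = random.Random(1)
class_a = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(20)]
class_b = [[rng.gauss(m, 0.5) for m in (3.0, 3.0, 0.0, 0.0)] for _ in range(20)]
projected, axes = pca_project(class_a + class_b)

mean_a = sum(p[0] for p in projected[:20]) / 20
mean_b = sum(p[0] for p in projected[20:]) / 20
print("first axis (weights on the original dimensions):",
      [round(x, 3) for x in axes[0]])
print("class separation along the first axis:", round(abs(mean_a - mean_b), 3))
```

Because each axis is an explicit weight vector over the original dimensions, inspecting its largest entries indicates which input dimensions drive the separation seen in the 2-D plot.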
The ResNet features in Fig.9 (a) indicate that both classes (COVID-19 vs Normal) are strongly correlated, which makes it hard for the BiLSTM to separate them, as shown in Fig.9 (b) and Fig.9 (c). We also visualize the five-class features in Fig.9 (d) and observe that the Normal class is still correlated with the Lung Opacity class, which causes the CNN-BiLSTM architecture to overfit. The derived clusters indicate that the prior information obtained from raw CNN features decreases the performance. In contrast, the augmented features generated by RICA reduce the correlation between similar classes, as shown in Fig.10 (a), and are able to capture more variability in the feature space. Moreover, Fig.10 (b) and Fig.10 (c) show that the data points corresponding to Normal and COVID-19 are linearly separable, which could potentially lead to better performance when training the BiLSTM on low-dimensional data. Thus, the proposed mechanism helps to overcome the overfitting issue and also separates the five classes more effectively than raw CNN features, as shown in Fig.10 (d).
It should be noted that our exploration of visualization as a way to achieve explainability can be expanded in different directions. First, projection quality metrics can be used to assess the quality of each projection produced by PCA or t-SNE. Global measures such as Normalized Stress, Distance Consistency, and ClustMe [85], or local measures such as the projection precision score [86], can shed light on the quality of such projections. Nevertheless, such an assessment may also be misleading and cannot explain why such results occurred. In this context, one should mention the interesting work of Fujiwara et al. [87], who proposed contrasting clusters in PCA (ccPCA) as a way to find out which dimensions contributed more to the formation of a selected cluster and why it differed from the rest of the dataset, based on information on separation and internal versus external variability.

Conclusion
In this study, we address the problem of COVID-19 detection from chest CT and X-ray images. For this purpose, a unified architecture consisting of a deep convolutional neural network, an augmentation mechanism, and a bidirectional LSTM is proposed. The CNN provides the high-level features extracted at the pooling layer, where the augmentation mechanism selects the most relevant features and generates low-dimensional augmented features. Finally, the BiLSTM is used to classify the processed sequential information. The proposed method provides an end-to-end structure without the need for manual feature extraction. We showed that the detection of COVID-19 was improved by using the low-dimensional augmented features obtained through reconstruction independent component analysis. Extensive experiments on three publicly available COVID-19 X-ray image datasets, using state-of-the-art network architectures including SqueezeNet, GoogleNet, and DenseNet-201 as well as recently published works, showed that our newly designed CNN-based BiLSTM architecture outperformed several state-of-the-art models.
Our model achieved 97% accuracy, which is 10% higher than the best performing model published so far in the literature [74] on the three classes, and 99% accuracy for the two-class dataset. In the five-class case, our model achieved 84% accuracy, which is 3% better than the previously proposed method in [7]. In some other scenarios, the developed model has demonstrated the ability to achieve 100% accuracy for the detection of COVID-19 samples. We also showed that the component-wise property of the overall architecture can be exploited: each stage (component) can be used to generate explanations that help to comprehend the actions of the model. Explainability through PCA and t-SNE has also been explored and discussed, highlighting the potential deficiencies that may restrict the ability of PCA or t-SNE projections to answer the "why" question in explainability, while the prospect of the recently introduced ccPCA has been recognized. In the future, we plan to make the feature selection method and the RICA analysis in the convolutional layer of a CNN more robust, in a way that enhances the explainability of the results, and to develop a joint visualization approach that combines PCA and t-SNE projection outcomes with attention weights. The source code is available at the project webpage: https://github.com/ziaul55/COVID-19-Detection