MRFGRO: a hybrid meta-heuristic feature selection method for screening COVID-19 using deep features

COVID-19 is a respiratory disease that causes infection in both the lungs and the upper respiratory tract. The World Health Organization (WHO) has declared it a global pandemic because of its rapid spread across the globe. The most common method for COVID-19 diagnosis is real-time reverse transcription-polymerase chain reaction (RT-PCR), which takes a significant amount of time to produce a result. Computer-based medical image analysis can be beneficial for diagnosing such diseases, as it can give good results in less time. Computed Tomography (CT) scans are used to monitor lung diseases including COVID-19. In this work, a hybrid model for COVID-19 detection has been developed, which has two key stages. In the first stage, we fine-tune the parameters of pre-trained convolutional neural networks (CNNs) to extract features from CT scans of COVID-19 affected lungs. As pre-trained CNNs, we use two standard models, namely GoogLeNet and ResNet18. In the second stage, we propose a hybrid meta-heuristic feature selection (FS) algorithm, named Manta Ray Foraging based Golden Ratio Optimizer (MRFGRO), to select the most significant feature subset. The proposed model is evaluated on three publicly available datasets, namely the COVID-CT, SARS-CoV-2 and MOSMED datasets, and attains state-of-the-art classification accuracies of 99.15%, 99.42% and 95.57%, respectively. The obtained results confirm that the proposed approach is quite efficient when compared to the local texture descriptors used for COVID-19 detection from chest CT-scan images.

• We have fine-tuned the parameters of several pre-trained CNNs (GoogLeNet, ResNet18, ResNet152, VGG19 and VGG16), extracted features from them, and compared each combination to find the best-performing model. The combination of GoogLeNet and ResNet18 gives the best result among all combinations (detailed discussion in "Deep feature extraction" section).
• Although each individual CNN model yields comparatively few redundant features, we have proposed a hybrid meta-heuristic approach, MRFGRO, to reduce the overall feature dimension and increase the model's classification accuracy. That is, the MRFGRO algorithm reduces the dimension of the feature space, which in turn leads to faster and better classification. We have compared its results with those of other optimization algorithms and found them to be better (detailed discussion in "Comparison with other optimization algorithms" section).
• We have evaluated our model on three publicly available datasets, namely COVID-CT, SARS-CoV-2 and MOSMED, and achieved accuracies of 99.15%, 99.42% and 95.57%, respectively.

Literature survey
In this section, we describe some existing methods for COVID-19 detection using machine learning and deep learning models. Disease detection from CT-scan images with computer-aided systems started at the end of the twentieth century. The detection of many chronic diseases has become much easier with deep learning and machine learning based models. Different machine learning and deep learning models have been proposed to diagnose lung diseases including COVID-19 and chronic pneumonia. The basic constraint for COVID-19 detection using medical images is the lack of data. For this reason, Waheed et al. 12 have proposed an Auxiliary Classifier Generative Adversarial Network (ACGAN) that generates synthetic images, which can help to increase the performance of CNNs. Horry et al. have used a transfer learning model on different multimodal COVID datasets 13 . Sabanci et al. have coupled a pre-trained CNN with a Bidirectional Long Short-Term Memory (BiLSTM) network to emphasize temporal features 14 . Polsinelli et al. have proposed a light CNN based on SqueezeNet, applied it to the dataset developed by Zhao et al., and obtained an accuracy of 83.3% 15 . Wang et al. 16 have proposed a deep CNN, trained it with 13,975 X-ray images, and obtained an accuracy of 98.9%. In another work, Ying et al. 17 have introduced DRE-Net to classify COVID and healthy patients using chest CT-scan images and achieved an accuracy of 86%. Ozturk et al. have proposed a 17-layer CNN named DarkCovidNet, which achieved an accuracy of 87.02% for three-class classification and 98.08% for two-class classification. Moreover, Rajarshi et al. 18 have developed a model which extracts deep features from various CNNs and then selects the optimal feature subset using Harris Hawks optimisation with the Simulated Annealing algorithm. 
Their method has been evaluated on the SARS-CoV-2 CT-scan dataset, where it obtained an accuracy of 98.85%. Table 1 lists further works on automated COVID-19 detection using medical image analysis.
From the literature survey, it is understood that most researchers have relied on different deep learning models for the detection of COVID-19 from medical images 25 . From the above discussion, we can also say that different CNN based models have different capabilities for extracting features from the input images. However, if we concatenate the feature vectors obtained from those models, the result is a high dimensional feature vector which, in turn, needs more storage and a large amount of time to train a model. Here lies the need for an FS model that can eliminate the redundant features from the extracted deep feature set. Meta-heuristic 26 approaches are quite popular for this task, and several feature selection techniques have been introduced in recent times. However, researchers have found that a single optimization algorithm might fail to deal with every problem 11 , which has motivated hybrid approaches. Some recent hybrid optimization algorithms are: cooperative Genetic Algorithm (CGA) 27 , Late Acceptance Hill-Climbing (BBA-LAHC) 28 , the hybridization of the Mayfly algorithm (MA) and Harmony Search (HS), named the MA-HS algorithm 29 , the hybridization of GA with Particle Swarm Optimization (PSO) and the Ant Colony Optimization (ACO) algorithm 30 , and clustering-based equilibrium and ant colony optimization (EOAS) 31 . Keeping the above facts in mind, in the present work we propose a hybrid meta-heuristic FS algorithm, called MRFGRO, which reduces the dimension of the features obtained from the deep learning models applied to chest CT-scan images for COVID-19 detection.

Materials and methods
In this section, the workflow of the proposed approach for COVID-19 detection has been discussed successively. The entire work is divided into different subsections that include: (A) dataset description, (B) deep feature extraction, and (C) feature selection.
Dataset description. In this paper, we have evaluated our model on three publicly available datasets which are briefly described below.
COVID-CT dataset. The COVID-CT dataset was developed by Zhao et al. 32 . As the name suggests, this dataset consists of chest CT-scan images, with 349 confirmed COVID-19 cases and 397 healthy cases. In this research framework, all images are resized to 224 × 224 × 3 and normalized before being fed to the deep learning frameworks for feature extraction. Because the dataset is very small, it is augmented during the training of the deep neural networks by a rotation of 50°, a slant angle of 0.5°, and by enabling horizontal and vertical flipping.
SARS-CoV-2 dataset. The SARS-CoV-2 CT-scan dataset was developed by Soares et al. 22 . This dataset contains 2492 chest CT-scan images, of which 1262 are COVID-19 positive and the remaining 1230 are of healthy subjects. As with the previous dataset, the images are resized to 224 × 224 × 3, and during training, data augmentation is applied with 25° of rotation and horizontal flipping.
• CT[0] Normal lung tissue with no sign of viral pneumonia.
• CT[1] Multiple ground-glass opacities are noticed and 25% of the lung parenchyma is involved.
• CT[4] Multiple ground-glass opacities are diffused and more than 75% of the lung parenchyma is involved.
Deep feature extraction. It is sometimes difficult to design a competent feature vector using conventional feature engineering techniques when the underlying dataset is very complex. Moreover, a feature vector designed for a particular dataset may not perform well when applied to other datasets. Hence, in this research work, we have focused on extracting deep features using pre-trained CNN models. For deep feature extraction, we have considered five standard pre-trained CNNs, namely GoogLeNet 8 , ResNet18 9 , ResNet152 9 , VGG19 10 and VGG16 10 . All of the pre-trained CNNs are fine-tuned on the datasets for 30 epochs. In all cases, the cross-entropy loss 34 is optimized by the Adam optimizer 35 with a learning rate and momentum of 0.0009 and 0.85, respectively. After 30 epochs of training, the weights of the epoch which achieves the minimum loss are loaded and the model is set to its evaluation mode. Thereafter, both the training and testing images are passed through the model, and the features from the last layer are extracted. This is how deep feature extraction has been performed in this study. The numbers of deep features extracted using the different CNNs are shown in Table 2. Also, to evaluate the deep features obtained from different CNNs together, we have tested combinations of CNNs by fusing their feature sets and evaluating them through our proposed MRFGRO algorithm for FS. In the fusing process, the features from different CNNs are concatenated to form the final feature vector. Suppose the features extracted from CNN1 and CNN2 are $f_1$ and $f_2$, and that after applying the fusion function F(.), the final feature vector is f. Therefore,
$$f = F(f_1, f_2).$$
Then, the number of features in f is the sum of the numbers of features in the feature sets of the individual CNNs:
$$N_f = \sum_i N_{f_i},$$
where $N_f$ is the number of features in the fused feature set and $N_{f_i}$ is the number of features in the ith deep feature set. The results obtained with features from the different nets and their combinations are provided in "Results and discussion" section. Additionally, a diagram of the deep feature extraction process is given in Fig. 2.
Feature selection model. We have extracted features from different CNNs and concatenated them in various combinations. As a result, the size of the feature set becomes very large. A feature vector of this size may cause the classifiers to overfit, and it may contain redundant features. To address this issue, we design an FS algorithm that can produce a more useful feature subset out of the entire feature set. In doing so, we propose a new hybrid meta-heuristic FS algorithm, called MRFGRO, by hybridizing MRFO with GRO. One of the main limitations of FS models is premature convergence, that is, getting stuck at a local minimum. A hybrid model can help to balance exploration and exploitation, so that the problem of premature convergence can be overcome. The working mechanism of each candidate optimization algorithm and their hybridization procedure are discussed in the following subsections.
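To make the fusion step and the resulting growth in dimensionality concrete, the following minimal sketch concatenates per-image feature vectors from two extractors. The function name `fuse_features` and the toy feature values are illustrative assumptions, not part of the original implementation; real deep feature vectors would be far longer.

```python
from typing import List, Sequence


def fuse_features(*feature_sets: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate per-sample feature vectors coming from several extractors.

    Each argument holds one feature vector per image; all arguments must
    describe the same images in the same order.
    """
    n_samples = len(feature_sets[0])
    if any(len(fs) != n_samples for fs in feature_sets):
        raise ValueError("all feature sets must describe the same samples")
    return [
        [value for fs in feature_sets for value in fs[i]]
        for i in range(n_samples)
    ]


# Toy stand-ins for the deep features of two images (sizes are illustrative).
f1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # hypothetical CNN1 features, N_f1 = 3
f2 = [[1.0, 2.0], [3.0, 4.0]]            # hypothetical CNN2 features, N_f2 = 2

f = fuse_features(f1, f2)
n_f = len(f[0])  # N_f = N_f1 + N_f2
print(n_f)
```

The fused dimension is simply the sum of the individual dimensions, which is why fusing several CNNs quickly produces feature vectors large enough to motivate a selection step.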
Manta ray foraging optimizer. MRFO 36 is one of the two optimization algorithms which we have chosen to build our hybrid FS model. MRFO is based on the foraging strategies that manta rays use to hunt their prey. Three different foraging strategies are used in the algorithm: chain foraging, cyclone foraging, and somersault foraging. In the first strategy, manta rays aim to reach a high concentration of their prey, plankton. They therefore form a foraging chain in which each manta ray follows the prey, and their positions are updated over the iterations. In the mathematical expression of chain foraging, the position of the jth manta ray at iteration n is denoted by $p_j^n$, while d, N and $p_{best}^n$ are a random vector, the number of manta rays and the best solution, respectively, and β is a weighting coefficient. Once they are aware of the exact position of the plankton, the manta rays form a chain and swim towards the prey along a spiral path. In cyclone foraging, in addition to the spiral motion, each manta ray moves one step towards the one ahead of it, and thus a cyclonic motion is formed. Cyclone foraging can be expressed in terms of two perpendicular components involving a random number ω. Similar to chain foraging, the position and movement in cyclone foraging towards the minimum are expressed using a weighting factor γ, which depends on the maximum number of iterations I and a random number $d_1$. Since the manta rays search for the prey from their reference positions, cyclone foraging has good exploitation in the search for the best solution. In addition, the cyclone foraging process forces each manta ray, or candidate solution, to search for a new best solution that lies far from the current best. This is how exploration is enhanced here. It is performed by assigning a random position $p_{rand}$ in the search space, drawn between lb and ub, the lower and upper bounds of the problem variables, respectively. The final stage of MRFO is somersault foraging, in which the food is treated as a hinge. In this type of foraging, each manta ray tumbles around the hinge to a new position. The motion is governed by the somersault foraging factor S and two random numbers $d_2$ and $d_3$. This is the last phase; here the distances between the emerging solutions and the global minimum shrink, and the solutions converge to the optimal solution. The range of this foraging reduces adaptively over the iterations. This is how MRFO approaches the optimal solution by mimicking the hunting process of manta rays.
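For reference, the MRFO update rules described above are commonly written as follows. This is a sketch in the symbols defined in the text, following the standard MRFO formulation 36 ; the spiral-shape constant b, the case split on the first manta ray (j = 1), and the exact placement of the random vector d are assumptions and may differ in detail from the expressions used in this work.

```latex
% Chain foraging, with weighting coefficient \beta
\[
p_j^{n+1} =
\begin{cases}
p_j^{n} + d\,(p_{best}^{n} - p_j^{n}) + \beta\,(p_{best}^{n} - p_j^{n}), & j = 1,\\[2pt]
p_j^{n} + d\,(p_{j-1}^{n} - p_j^{n}) + \beta\,(p_{best}^{n} - p_j^{n}), & j = 2,\dots,N,
\end{cases}
\qquad
\beta = 2d\sqrt{\lvert \log d \rvert}
\]

% Spiral motion in two perpendicular components (b: assumed spiral constant)
\[
\begin{aligned}
X_j^{n+1} &= X_{best} + d\,(X_{j-1}^{n} - X_j^{n}) + e^{b\omega}\cos(2\pi\omega)\,(X_{best} - X_j^{n}),\\
Y_j^{n+1} &= Y_{best} + d\,(Y_{j-1}^{n} - Y_j^{n}) + e^{b\omega}\sin(2\pi\omega)\,(Y_{best} - Y_j^{n})
\end{aligned}
\]

% Cyclone foraging, with weighting factor \gamma
\[
p_j^{n+1} =
\begin{cases}
p_{best}^{n} + d\,(p_{best}^{n} - p_j^{n}) + \gamma\,(p_{best}^{n} - p_j^{n}), & j = 1,\\[2pt]
p_{best}^{n} + d\,(p_{j-1}^{n} - p_j^{n}) + \gamma\,(p_{best}^{n} - p_j^{n}), & j = 2,\dots,N,
\end{cases}
\qquad
\gamma = 2\,e^{\,d_1 \frac{I - n + 1}{I}}\,\sin(2\pi d_1)
\]

% Random reference position used for exploration
% (it replaces p_{best}^{n} in the cyclone update)
\[
p_{rand} = lb + d\,(ub - lb)
\]

% Somersault foraging around the best position
\[
p_j^{n+1} = p_j^{n} + S\,(d_2\,p_{best}^{n} - d_3\,p_j^{n})
\]
```

In the standard formulation the somersault factor is typically set to S = 2, and the factor γ shrinks as n approaches I, which is what makes the search range contract over the iterations.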
Golden ratio optimization. Various physical phenomena exhibit a fixed ratio known as the golden ratio 37 . The golden ratio is closely tied to the Fibonacci series, an infinite series in which the kth term is the sum of the (k − 1)th and (k − 2)th terms. The ratio of two consecutive terms in the series converges to a fixed number, approximately 1.618, which is called the golden ratio. This is the key idea of the GRO algorithm. The kth Fibonacci number can be obtained from a closed-form equation.
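A closed form commonly used for the kth Fibonacci number is Binet's formula, F_k = (φ^k − (1 − φ)^k)/√5 with φ = (1 + √5)/2. The short sketch below evaluates it and checks it against the recurrence; whether the equation used in this work has exactly this form (for example, whether it includes an extra scaling factor) is an assumption, and the function names are illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, approximately 1.618


def fibonacci_binet(k: int) -> int:
    """k-th Fibonacci number (F_0 = 0, F_1 = 1) from Binet's closed form.

    Rounding absorbs floating-point error; this is reliable only for
    moderate k (a few dozen) with double precision.
    """
    return round((PHI ** k - (1 - PHI) ** k) / math.sqrt(5))


def fibonacci_iterative(k: int) -> int:
    """k-th Fibonacci number from the defining recurrence, for cross-checking."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


# The ratio of consecutive terms approaches the golden ratio as k grows.
for k in (5, 10, 20):
    ratio = fibonacci_binet(k + 1) / fibonacci_binet(k)
    print(k, fibonacci_binet(k), ratio)
```

The printed ratios move towards φ as k increases, which illustrates that the golden ratio is a limit of the series rather than the exact ratio of every pair of consecutive terms.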
Similar to other wrapper based FS algorithms, an initial population is generated here as well. In the GRO algorithm, the candidate solutions are treated as vectors, which have magnitudes as well as directions. The directions and magnitudes of these vectors are updated over the iterations so that they move towards the global minimum. Initially, the mean of the population is computed and the fitness of each candidate solution is calculated. Thereafter, each candidate solution of the population is compared to the mean solution. If the mean solution is fitter than the worst solution, the worst solution is replaced by the mean solution. This process is carried out iteratively, updating the population in each iteration: the worst solution within the updated population is identified again and the above steps are repeated. Thus the vectors of the population converge towards the minimum.
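The mean-replacement step described above can be sketched as follows. This covers only the step in which the worst candidate is swapped for the population mean, not the golden-ratio-based vector updates of the full GRO algorithm; the function name, the minimization convention (lower fitness is better) and the sphere test function are assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def replace_worst_with_mean(
    population: List[Vector], fitness: Callable[[Vector], float]
) -> List[Vector]:
    """Swap the worst candidate for the population mean if the mean is fitter.

    Lower fitness values are treated as better (minimization).
    """
    dim = len(population[0])
    mean = [sum(ind[d] for ind in population) / len(population) for d in range(dim)]
    scores = [fitness(ind) for ind in population]
    worst = max(range(len(population)), key=scores.__getitem__)
    updated = [list(ind) for ind in population]
    if fitness(mean) < scores[worst]:
        updated[worst] = mean
    return updated


def sphere(x: Vector) -> float:
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


pop = [[1.0, 1.0], [-1.0, 0.5], [4.0, -3.0]]
new_pop = replace_worst_with_mean(pop, sphere)
print(new_pop)
```

In this toy run the candidate farthest from the origin is the worst one, and the population mean is closer to the minimum, so the replacement takes place and the other candidates are left unchanged.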
Proposed algorithm. The proposed MRFGRO algorithm (see Fig. 4) is a hybrid of MRFO and GRO. The main motive of this hybridization is to overcome the drawbacks of the parent algorithms. A candidate feature subset is represented by a vector of 0's and 1's, where 1 indicates that the corresponding feature is selected and 0 indicates that it is not. The basic goal of FS algorithms is to reduce the number of 1's while achieving higher accuracy. Optimization in a continuous search space differs considerably from optimization in a binary search space. The binary search space is considered as a hypercube, and the search agents try to move to nearer corners of the hypercube by flipping bits. Two widely used families of transfer functions for converting a continuous optimization problem into a binary one are the S-shaped and V-shaped transfer functions. In this paper, we have used an S-shaped transfer function, which is represented by Eq. (17).
Transfer function. The role of the transfer function is to convert the feature set representation into a series of 0's and 1's before the final training is performed. For this purpose, we have used the sigmoid function for binarization. The output of the sigmoid function ranges between 0 and 1. Eq. (17) refers to the sigmoid function, and Fig. 3 shows its graphical representation.
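As an illustration of how an S-shaped transfer function can turn a continuous position into a feature mask, consider the sketch below. The sigmoid S(x) = 1/(1 + e^(−x)) is standard; the stochastic rule (bit i is set with probability S(x_i)) is one common convention and is an assumption here, since the exact binarization rule is not spelled out in the text. The names `binarize` and `position` are illustrative.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """S-shaped transfer function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Map a continuous position to a 0/1 feature mask.

    Bit i is set to 1 with probability sigmoid(position[i]) (assumed rule).
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
position = [-6.0, -0.5, 0.0, 0.5, 6.0]
mask = binarize(position, rng)
selected = [i for i, bit in enumerate(mask) if bit == 1]
print(mask, selected)
```

Strongly negative coordinates almost always yield 0 (feature dropped) and strongly positive ones almost always yield 1 (feature kept), while coordinates near zero are decided close to a coin flip, which keeps some exploration in the binary space.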
Our proposed algorithm has the following steps:

Results and discussion
In this section, we report the experimental results on the three COVID-19 detection datasets, brief descriptions of which are given in the previous section. The experimentation includes the results obtained by different machine learning classifiers used for the fitness calculation of the MRFGRO algorithm, loss plots and accuracy plots of different deep learning models, a comparison of the MRFGRO algorithm with other FS algorithms, hyperparameter tuning, and so on. At the end, we conclude this section with comparative studies of the proposed method of COVID-19 detection against several state-of-the-art techniques. For evaluation, we have used four standard metrics: Accuracy, Precision, Recall, and F1 score. All these metrics are taken into consideration to evaluate the proposed model more generally as well as to handle the class imbalance issue. These evaluation metrics depend on some elementary measures, which are true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The mathematical expressions for calculating the aforementioned metrics based on the TP, TN, FP, and FN values are given below:
• Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
• Precision: $\frac{TP}{TP + FP}$
• Recall: $\frac{TP}{TP + FN}$
• F1 score: $\frac{TP}{TP + \frac{1}{2}(FP + FN)}$
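The four metrics can be computed directly from the confusion-matrix counts, as in the following minimal sketch. The function name and the example counts are illustrative assumptions and are not results from this study; returning 0.0 when a denominator is zero is likewise a convention chosen here.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = tp / (tp + 0.5 * (fp + fn)) if (tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a binary COVID / non-COVID classifier.
m = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The F1 expression TP / (TP + ½(FP + FN)) is algebraically the harmonic mean of precision and recall, so it stays informative when the two classes are not equally represented.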
In the previous section, it is mentioned that we have extracted deep features instead of traditional features for automatic COVID-19 detection from CT-scan images. We have fine-tuned the pre-trained networks for 30 epochs with the Adam optimizer and a learning rate of 0.001. The loss function optimized is the cross-entropy loss. During training, we have used the data augmentation mentioned in "Dataset description" section, where the datasets are briefly discussed. After training, the fine-tuned weights are saved; thereafter the images are loaded and the features of the last layer are extracted. The validation loss plots and accuracy plots of all the CNNs on the SARS-CoV-2 CT-scan dataset are shown in Figs. 5 and 6. From Figs. 5 and 6, it is observed that both the GoogLeNet and ResNet18 architectures converge better than the other CNNs, and the obtained accuracies are also better. The loss curves converge much better on the SARS-CoV-2 CT-scan dataset than on the COVID-CT dataset, since the former contains more images. For these two datasets, the accuracies of GoogLeNet and ResNet18 are much greater than those of the others, but on the MOSMED dataset all of the nets achieve comparable results. The best results on the SARS-CoV-2 CT-scan dataset and the COVID-CT dataset are achieved by ResNet18, at 92% and 90%, respectively, whereas on the MOSMED dataset GoogLeNet achieves the best result, around 88%. ResNet152 performs poorly on the COVID-CT dataset but gives decent results on the SARS-CoV-2 CT-scan dataset. Both VGG16 and VGG19 consistently produce poor results on the SARS-CoV-2 CT-scan dataset and the COVID-CT dataset but report comparable results on the MOSMED dataset. The result obtained by combining the deep features of GoogLeNet and ResNet18 is superior to all other combinations in terms of final classification accuracy for all three datasets. 
For the SARS-CoV-2 CT-scan dataset and the MOSMED dataset, the differences in classification accuracy between the combinations are significant, whereas for the COVID-CT dataset the results are quite comparable. Due to their very large number of features, the VGG models, alone and in combination, fail to achieve promising results. A possible reason is that many non-informative features are generated, which degrade the overall recognition accuracy. Therefore, we have combined the deep feature sets of the GoogLeNet and ResNet18 models, and this is considered as our final feature set.
It is to be noted that all the results are examined by fixing the other parameters to the optimal combination. These parameters include the machine learning classifier used in calculating the fitness function, different hyperparameters of these classifiers, and various parameters of MRFGRO optimization algorithm itself.
Calculation of fitness value. Different machine learning classifiers, namely SVM, ELM, and MLP, have been used for calculating the fitness value of the MRFGRO algorithm and for the final classification task. A brief description of these classifiers is given in the previous section. As expected, the results obtained by these classifiers differ numerically from one another. The results obtained by the three classifiers on all three datasets are reported in Table 4.
In most of the cases in Table 4, the SVM classifier outperforms the other two in terms of accuracy as well as the other evaluation metrics. In some cases, the ELM classifier achieves better results than the SVM classifier; however, the MLP classifier has not performed as well. The results obtained by the ELM classifier for the SARS-CoV-2 CT-scan dataset and the COVID-CT dataset are quite comparable to those of the SVM classifier, but for the MOSMED dataset the differences are much larger. Therefore, the SVM classifier has been chosen for both classification and fitness calculation.
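A wrapper-style fitness evaluation of a candidate feature mask can be sketched as follows. The weighted combination of classification error and selected-feature ratio, and the weight `alpha = 0.99`, are a formulation commonly seen in the wrapper FS literature and are assumptions here; the fitness function actually used in this work is not specified in the text. The callable `accuracy_of` is a hypothetical stand-in for training and validating a classifier (for example an SVM) on the selected columns.

```python
from typing import Callable, Sequence


def wrapper_fitness(
    mask: Sequence[int],
    accuracy_of: Callable[[Sequence[int]], float],
    alpha: float = 0.99,
) -> float:
    """Fitness of a 0/1 feature mask; lower is better.

    Combines the classification error with the fraction of selected
    features (assumed weighting, not taken from the paper).
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0  # an empty subset cannot be classified: worst fitness
    error = 1.0 - accuracy_of(mask)
    return alpha * error + (1.0 - alpha) * n_selected / len(mask)


def toy_accuracy(mask: Sequence[int]) -> float:
    """Hypothetical classifier accuracy: only features 0 and 2 are informative."""
    informative = (0, 2)
    hits = sum(1 for i in informative if mask[i] == 1)
    return 0.5 + 0.25 * hits


compact = wrapper_fitness([1, 0, 1, 0], toy_accuracy)
full = wrapper_fitness([1, 1, 1, 1], toy_accuracy)
print(compact < full)
```

With such a formulation, two masks that reach the same accuracy are ranked by size, so the search is nudged towards smaller subsets without sacrificing accuracy.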
Hyperparameter tuning. There are many hyperparameters in this entire framework of optimizing deep features using our proposed MRFGRO algorithm. Some are used during deep feature extraction and some are used in the proposed FS algorithm.
The main hyperparameters of the deep learning models are the optimizer, the learning rate, the momentum of the optimizer, and the batch size, among others. In the training procedure, the optimizer and learning rate have been set to Adam and 1e-3 for all three datasets. The batch sizes for the SARS-CoV-2 CT-scan, COVID-CT, and MOSMED datasets are taken as 50, 25, and 30, respectively. The final classification accuracies achieved with different combinations of optimizers and learning rates on all three datasets are illustrated in Fig. 7.
It is to be mentioned that the accuracies reported in the plots are achieved after applying the FS algorithm, not the accuracies obtained by the deep learning models. Other deep learning hyperparameters such as momentum, regularization constant, etc. have been fixed to their standard values.
Some of the most important hyperparameters of the MRFGRO based FS algorithm are the initial population size and the kernel function and regularization parameter of the SVM classifier. The variation of the resulting accuracy with the initial population size on all three datasets is shown in Fig. 8.
The maximum accuracy for all three datasets is obtained with the initial population size of 10. Therefore, the initial population is fixed to 10 in this current study.

Comparison with other optimization algorithms.
To confirm the superiority of the MRFGRO algorithm, we have evaluated many popular optimization algorithms on all three datasets and compared their results with those obtained by the MRFGRO algorithm. The algorithms which we have chosen for comparison are … 45 , GRO and MRFO. In addition to these, some hybrid algorithms such as GA+EO, PSO+ASO and HSA+GRO, which gave good results, are also reported here. It is to be noted that numerous optimization algorithms for feature selection have been developed over the past three decades. Therefore, it is not possible to assess the performance of every possible combination of these feature selection algorithms. Hence, from the chosen algorithms, only those combinations which gave comparatively good and promising results are reported. These wrapper based optimization algorithms have not been chosen at random: GA, HSA and PSO are long-established algorithms with a history of successful use in varied domains, whereas the other three have been developed in recent times and have shown better efficiency in many fields. The classification accuracies obtained by the different optimization algorithms (used for FS in the literature) are shown in Table 5. The proposed MRFGRO algorithm performs much better than the old and new FS algorithms considered here in terms of classification accuracy for all three datasets. Along with its impressive classification accuracy, the number of features selected by the MRFGRO algorithm is also very small. This indicates that the MRFGRO algorithm is very efficient in selecting optimal features, thereby improving the overall classification accuracy.

Table 4. Results obtained by the proposed MRFGRO algorithm using different classifiers on all three COVID-19 datasets. Maximum values of accuracy, precision, recall and F1 score for each dataset are made bold. Columns: evaluation parameter, followed by SVM (%), MLP (%) and ELM (%) for each of the SARS-CoV-2 CT-scan, COVID-CT and MOSMED datasets.
Comparison with recent methods. To gauge the goodness of the proposed framework, the results obtained by some recent works on the aforementioned datasets have been compared with those obtained by the present one. The results of the comparative studies are reported in Tables 6, 7 and 8. The proposed method achieves the best results on all the aforesaid datasets. Among the other methods, Shaban et al. 46 , using traditional machine learning with FS, achieve an impressive 96% on the COVID-CT dataset, whereas H. Alshazly 47 , using transfer learning with ResNet101, reports 99.4% accuracy on the SARS-CoV-2 CT-scan dataset, which is almost the same as the accuracy achieved by the MRFGRO model (99.42%). The MOSMED dataset has not been much explored so far. Rohila et al. 48 performed segmentation and classification, and reported 94.9% classification accuracy with their proposed ReCOV-101 net. As a whole, we can say that the proposed model of optimizing deep features using the MRFGRO algorithm outperforms the recently published models for COVID-19 detection considered here.

Conclusion
In this work, we have proposed a new hybrid FS model, called MRFGRO, which has been evaluated on three standard CT-scan based COVID-19 detection datasets. We have computed deep features instead of using traditional feature engineering for this task, owing to the advantages of deep features over traditional features mentioned earlier. The state-of-the-art results obtained on all three datasets are reported in "Results and discussion" section, which also demonstrates the effectiveness and superiority of the hybrid MRFGRO over other FS algorithms. Despite the many advantages of the proposed framework, there are some limitations too. We conclude our paper by mentioning some future extensions of this work, keeping in mind the limitations of the MRFGRO algorithm:
• We have evaluated our model only on CT-scan datasets. To confirm the robustness of the work, chest X-ray image datasets can also be taken into consideration.
• Hyperparameters of transfer learning, such as the optimizer, learning rate and batch size, are very important for proficient learning of the CNN models. In this study, we have chosen these parameters through extensive experimentation. However, there are more efficient ways to find them, such as using optimization techniques; for example, Bayesian optimization can be used for tuning the hyperparameters of deep learning models.
• In recent times, some advanced neural nets have also been developed, such as SqueezeNet, Xception, Capsule networks, and so on. These nets can also be used for deep feature extraction.
• The selection of the initial population of the MRFGRO algorithm can also be investigated, which may help to increase the convergence rate of the algorithm.