Optimal feature selection using novel flamingo search algorithm for classification of COVID-19 patients from clinical text

Although several AI-based models have been established for COVID-19 diagnosis, a gap in machine-based diagnosis remains, making further efforts to combat this epidemic imperative. Motivated by the persistent need for a reliable system for choosing features, we developed a new feature selection (FS) method and a model to predict COVID-19 from clinical texts. This study employs a newly developed methodology inspired by flamingo behavior to find a near-optimal feature subset for accurate diagnosis of COVID-19 patients. The best features are selected in two stages. In the first stage, we apply a term weighting technique, RTF-C-IEF, to quantify the significance of the extracted features. The second stage uses a newly developed feature selection approach, the improved binary flamingo search algorithm (IBFSA), which chooses the most important and relevant features for COVID-19 patients. A multi-strategy improvement process is at the heart of this study; its primary objective is to broaden the algorithm's capabilities by increasing diversity and supporting exploration of the search space. Additionally, a binary mechanism was used to adapt the traditional FSA to binary FS problems. Two datasets, containing 3053 and 1446 cases respectively, were used to evaluate the suggested model with the Support Vector Machine (SVM) and other classifiers. The results showed that IBFSA performs best compared with numerous previous swarm algorithms. The number of selected features was also reduced drastically, by 88%, while obtaining the best global optimal features.


Introduction
A new coronavirus (COVID-19) emerged in Wuhan in December 2019 and quickly swept worldwide [1]. The COVID-19 epidemic was declared a Public Health Emergency of International Concern by the World Health Organization in January 2020 [2]. To counteract, control, lessen, and confine the effects and consequences of the COVID-19 virus, studies are still being carried out in a variety of fields. A number of models based on artificial intelligence have been developed to diagnose COVID-19 [3]. However, there are still few machine-based models for diagnosing infectious epidemics.
This study focuses on mining clinical text related to COVID-19 and applying machine learning algorithms to categorize COVID-19 patients. Clinical texts are narrative texts that provide a great deal of information about affected patients, including individual symptoms, demographic information, diagnosis, laboratory test results, chest x-ray reports, and treatments. However, the data in clinical texts are often high dimensional and include uninformative features that significantly affect the accuracy of the classifier. As a result, the dimensionality of the data must be decreased [4]. Because clinical documents are very large, Feature Selection (FS) is an essential step before the classification process [5]. Its main advantage is finding a subset of relevant features that is useful for categorization, in addition to delivering high recognition rates, easing data comprehension, shortening training time, and resolving the curse of dimensionality [6,7]. FS is a challenging and complex problem because it requires striking a balance between reducing the number of features and maintaining high classifier accuracy, so it needs an effective search strategy, especially when dealing with clinical text. Complicated problems such as feature selection are often tackled with algorithms that take inspiration from nature. In recent years, numerous novel swarm intelligence optimization algorithms have been proposed, such as the binary horse herd optimization algorithm [8], moth flame optimization [9], Binary Particle Swarm Optimization [10,11], the binary grey wolf optimizer [12], the binary aquila optimizer [13], and artificial gorilla troop optimization [14].
This work presents, for the first time, the flamingo search algorithm (FSA) for handling FS tasks in the healthcare sector. FSA is an efficient new swarm intelligence optimization method inspired by the migratory and foraging behavior of flamingos. Figure 1 depicts flamingo communities and individuals in their natural habitat. To the best of our knowledge, FSA has not previously been used for feature selection; consequently, in this research, the proposed IBFSA has been developed to minimize the number of features chosen from clinical text related to COVID-19 while maximizing classification accuracy. The proposed method is a wrapper-based approach; hence, a learning algorithm must be part of the evaluation process. In this investigation, SVMs are used [15,16]. The most important contributions of this study are: (1) the development of a swarm algorithm called IBFSA, an improved binary version of FSA, to deal with the feature selection process; and (2) a novel modified initialization approach that enhances diversity and convergence during the search process. The remaining parts of the paper are structured as follows. Section 2 reviews related work on COVID-19 clinical text and the FS procedure. Section 3 gives an overview of the FSA. The proposed methodology is outlined in Section 4. The experiments and findings are presented and discussed in Section 5. Finally, Section 6 concludes the paper.

Related works
Compared with other topics, few attempts have been made to create intelligent classifiers, including feature selection, for the clinical text categorization of COVID-19 patients. To correctly identify COVID-19 patients, the authors of [17] employed Binary Particle Swarm Optimization (BPSO) as a wrapper approach for critical feature selection. According to their experiments, it not only beats other methods but also achieves the highest accuracy with the lowest time overhead. In [18], a COVID-19 dataset was used for disease diagnosis based on the Grasshopper Optimization Algorithm (GOA). The experimental findings demonstrate that the suggested method provides high classification accuracy. The work in [19] presents an intelligent strategy for predicting SARS-CoV-2 (COVID-19) using genetic feature selection techniques. The proposed model appears to have substantially lower prediction errors than conventional techniques. In [20], the authors propose a hybrid strategy based on the BOA algorithm and particle swarm optimization (PSO). The suggested methodology was tested using a COVID-19 dataset. The experimental results show that the proposed BOAPSO model outperforms PSO, BOA and GWO in terms of improving precision and reducing the number of chosen features by 91.07, 87.2, 87.8 and 87.3%, respectively. The work in [14] introduces a discrete artificial gorilla troop optimization (DAGTO) approach for dealing with FS challenges in the healthcare sector. A case study on COVID-19 samples and ten medical datasets were used to demonstrate the method's influence in practice. Statistical evidence shows that it performs the best. In [13], the Aquila optimizer (AO) is suggested as a search technique to find the optimal feature subset. A real-world COVID-19 dataset is used to evaluate the proposed method. Results showed that AO is superior to competing algorithms in terms of the accuracy attained with the fewest features. The novel Caledonian crow learning algorithm is used in [21] to propose a strategy for selecting features relevant to COVID-19. The suggested approach for detecting COVID-19 patients is more accurate than a competing method, as demonstrated by experimental findings on a COVID-19 dataset from a Brazilian hospital. The best feature subset may also be chosen with a combination of the brainstorm optimization algorithm and the firefly algorithm, as described in [22]. The proposed technique was applied to a dataset of coronavirus-related diseases. The experimental findings demonstrated superior classification accuracy compared to previous approaches. Table 1 provides a brief comparison of earlier works on COVID-19 detection methods.
In conclusion, most of the experiments on COVID-19 classification that compared machine learning and global intelligent algorithms with conventional methodologies showed good classification results. In addition, swarm intelligence algorithms have been used effectively for feature selection in various domains, but they have not been widely applied to the categorization of clinical text related to COVID-19. As a result, there is a need and substantial motivation to present a new approach that includes a weighting scheme, an intelligent feature selection method based on IBFSA, and an SVM classifier for classifying COVID-19 patients from clinical texts.

Overview of standard FSA
The FSA is a biologically inspired evolutionary algorithm modeled after how flamingos find food in nature. Each candidate solution to the optimization problem is represented by a flamingo, and each flamingo has two primary behaviors, namely foraging and migrating. Flamingos do not know where food is most abundant in the current search region (the global optimum). Therefore, they look for a site with more plentiful food than the known food in the search region by sharing information with each other, updating the location of each flamingo, and thereby changing the locations of other flamingos in the group (the global optimal solution). Identifying the globally best solution inside a specified search area is a central aim of a swarm intelligence algorithm, and the flamingos' behavior is a fitting metaphor for this purpose [23].
The fundamental steps of this algorithm are described below:
Step 1. The population is initialized: the population size is set as $P$, the maximum number of iterations is $Iter_{Max}$, and the proportion of migrating flamingos in the first part is $MP_b$.
Step 2. In the $i$th iteration of flamingo population renewal, the number of foraging flamingos is $MP_r = rand[0,1] \times P \times (1 - MP_b)$. The number of migrating flamingos in the first part of this iteration is $MP_o = MP_b \times P$. The number of migrating flamingos in the second part of this iteration is $MP_t = P - MP_o - MP_r$. Individual flamingo fitness values are calculated, and the entire flamingo population is then ranked by fitness. The flamingos with low fitness ($MP_b$) and high fitness ($MP_t$) are classified as migrants, while the others are classified as foraging flamingos.
Step 3. Migrating flamingos are updated based on Eq (2), and foraging flamingos are updated based on Eq (1):
$$x_{ij}^{t+1} = \frac{x_{ij}^{t} + \varepsilon_1 \times xb_j^{t} + G_2 \times \left| G_1 \times xb_j^{t} + \varepsilon_2 \times x_{ij}^{t} \right|}{K} \quad (1)$$
$$x_{ij}^{t+1} = x_{ij}^{t} + \omega \times \left( xb_j^{t} - x_{ij}^{t} \right) \quad (2)$$
In Eq (1), $x_{ij}^{t+1}$ represents the location of the $i$th flamingo in the $j$th dimension of the population in the $(t+1)$th iteration, and $x_{ij}^{t}$ represents the location of the $i$th flamingo in the $j$th dimension in the $t$th iteration, namely, the location of the flamingo's feet. $xb_j^{t}$ represents the $j$th dimension location of the flamingo with the best fitness in the population in the $t$th iteration. $K = K(n)$ is a diffusion factor, a random number that follows the chi-square distribution with $n$ degrees of freedom. It is used to enlarge the foraging range of the flamingos and to simulate the chance of individual selection in nature, enhancing the global search ability. The random numbers $G_1 = N(0,1)$ and $G_2 = N(0,1)$ follow the standard normal distribution, and $\varepsilon_1$ and $\varepsilon_2$ are set to $-1$ or $1$ at random.
In Eq (2), $x_{ij}^{t+1}$, $x_{ij}^{t}$ and $xb_j^{t}$ have the same meaning as in Eq (1). $\omega = N(0,1)$ is a Gaussian random number; it is employed to broaden the search area during flamingo migration and to simulate the randomness of individual flamingo behavior during the migration process.
Step 4. Check for flamingos that are out of bounds.
Step 5. Move to Step 6 if the maximum number of iterations has been reached; otherwise, go to Step 2.
Step 6. Output the optimal solution and the optimal value. The FSA pseudocode is displayed in Algorithm 1.
Table 1. Comparison of earlier works on COVID-19 detection (method; advantage; limitation).
GOA and CNN [18]; Easy to implement and takes little time by optimizing CNN by GOA; The proposed method can be further enhanced by utilizing a more detailed dataset with more images from all three classes.
BOA, PSO and ML [20]; Compared to conventional classification methods, the proposed hybrid model is more effective at classifying COVID-19 patients; The COVID-19 patient dataset used is small and was not of very high dimensionality.
CA and ANN [21]; ANN is a powerful classification technique; The patient selection has potential bias because the database is so unbalanced that infected people make up only 10% of the total.
BSO, FA and ML

Algorithm 1. Pseudocode of the FSA (closing lines)
end while
Return the best solution X_best and its optimal value /* X_best is the best solution obtained by the algorithm */
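The six steps of the standard FSA can be sketched as a short program. This is a minimal illustration under stated assumptions, not the authors' implementation: the foraging and migration rules follow the form commonly given for FSA [23], the chi-square degrees of freedom are set to the problem dimension, the migration factor is drawn from N(0,1), out-of-bounds values are simply clipped, and the names `fsa_minimize`, `mp_b` and `seed` are hypothetical.

```python
import random


def fsa_minimize(objective, dim, lower, upper, pop_size=20, max_iter=100,
                 mp_b=0.1, seed=0):
    """Illustrative FSA loop for minimization (assumed update rules)."""
    rng = random.Random(seed)
    # Step 1: initialize the population uniformly inside the bounds.
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    best = list(min(pop, key=objective))
    best_val = objective(best)

    for _ in range(max_iter):
        # Step 2: split the ranked population into migrants and foragers.
        n_forage = int(rng.random() * pop_size * (1 - mp_b))
        n_mig_first = int(mp_b * pop_size)
        n_mig_second = pop_size - n_forage - n_mig_first
        pop.sort(key=objective)  # lowest objective value first
        new_pop = []
        for rank, x in enumerate(pop):
            is_migrant = (rank < n_mig_first
                          or rank >= pop_size - n_mig_second)
            if is_migrant:
                # Step 3 (migration): move relative to the best flamingo.
                w = rng.gauss(0.0, 1.0)  # assumption: N(0, 1)
                cand = [xj + w * (bj - xj) for xj, bj in zip(x, best)]
            else:
                # Step 3 (foraging): chi-square diffusion factor K(n),
                # sampled as Gamma(n/2, scale=2) with n = dim (assumption).
                k = max(rng.gammavariate(dim / 2.0, 2.0), 1e-12)
                cand = []
                for xj, bj in zip(x, best):
                    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
                    e1, e2 = rng.choice((-1, 1)), rng.choice((-1, 1))
                    cand.append(
                        (xj + e1 * bj + g2 * abs(g1 * bj + e2 * xj)) / k)
            # Step 4: clip flamingos that left the search range.
            new_pop.append([min(max(v, lower), upper) for v in cand])
        pop = new_pop
        leader = min(pop, key=objective)
        if objective(leader) < best_val:
            best, best_val = list(leader), objective(leader)
    # Steps 5-6: stop after max_iter and return the best solution found.
    return best, best_val
```

The sketch keeps the best solution found so far, so the returned value never gets worse across iterations; whether the original algorithm uses the best of the current iteration instead is not specified here.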

Proposed methods
To predict a COVID-19 diagnosis from clinical texts, the strategy described in this work includes six processing stages, namely dataset collection and description, text pre-processing, feature extraction, feature selection, application of machine learning methods, and performance evaluation. The block diagram of the suggested model is shown in Figure 2. Both datasets contain demographic information, such as age, sex, and comorbidities, in addition to other diagnostic information and related tests, including symptoms, vital signs, lab results, values from routine blood tests, chest CT imaging results, disposition, admission to an ICU, and survival to hospital discharge. The two datasets consist of 3053 and 1446 patients, respectively. Table 2 summarizes the datasets used, which comprise varying numbers of samples and attributes.

Text preprocessing
Extracting hidden features from clinical texts is difficult because they are presented in an unstructured format. Thus, to train a classifier, the data must be pre-processed and presented in a readable form. Since some symbols and words are not useful for categorization, pre-processing aims to clean the data and improve its quality. Several pre-processing steps were used to convert unstructured clinical texts into a word vector: removal of punctuation, numbers, stop words and other characters, letter-case conversion, short-word removal, tokenization, part-of-speech tagging, stemming, and lemmatization.
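As a rough illustration of the cleaning steps listed above, the following sketch covers case conversion, removal of punctuation and numbers, tokenization, stop-word removal, and short-word removal using only the Python standard library. The stop-word list, the minimum word length, and the function name `preprocess` are assumptions chosen for the example; part-of-speech tagging, stemming, and lemmatization are omitted because they require an NLP library.

```python
import re

# Tiny illustrative stop-word list (assumption; a real list is much larger).
STOP_WORDS = {"the", "a", "an", "of", "and", "with", "was", "is", "in",
              "to", "for", "has"}


def preprocess(text, min_len=3):
    """Turn one clinical note into a list of cleaned tokens."""
    text = text.lower()                    # letter-case conversion
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation, digits, symbols
    tokens = text.split()                  # whitespace tokenization
    # Remove stop words and words shorter than min_len characters.
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]
```

For example, `preprocess("Patient, 45 y/o, presented with FEVER and a dry cough.")` keeps only the content words of the note.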

Feature extraction
To complete NLP tasks, it is crucial to identify an effective text representation [24]. Different features are extracted from the pre-processed clinical texts. The feature engineering described here consists of two steps. In the first step, SpaCy and ScispaCy were employed to extract medical entities from the clinical text. In some reports, symptoms with more than one word (e.g., "shortness of breath") were then converted into a single expression. ScispaCy provides a robust rule-matching engine and fast models for biomedical natural language processing [25].
In the second step, the RTF-C-IEF weighting method [26] is used to transform the extracted concepts, which serve as features, into probability values ready for the feature selection model. This procedure drastically decreases the number of features while preserving the informative ones. RTF-C-IEF is a statistical weighting method that measures a term's significance within a document, and it serves as the first stage of the feature selection strategy for text mining. It was used for feature extraction instead of the classical Bag of Words (BoW) and TF-IDF because RTF-C-IEF provides more accurate results [26].
A higher RTF-C-IEF score indicates that a feature is more significant within the clinical context of the text. The RTF-C-IEF formula [26] combines the following quantities: the term frequency, the frequency count of the word in the core corpus, the total number of documents in the dataset, and the number of documents in the collection in which the term appears.

Improvements embedded into the standard FSA based feature selection
Feature selection is a crucial step before classification: it selects the important features, eliminates the irrelevant ones, reduces the feature dimensions, and shortens the computing time required to complete the classification [10,27,28]. To achieve this, FSA [29] is implemented. FSA is a new algorithm that simulates the behavior of flamingos searching for the best possible solution within a given search region (where food is most plentiful).
Since FS is a binary problem, the native optimizer needs to be adapted so that FSA can optimize in a high-dimensional binary search space, thereby improving the algorithm's efficiency. Several significant steps in updating the FSA algorithm are detailed in this study. Introducing a new operator into the algorithm's structure is the most common method for enhancing FSA exploration as well as correcting the typical roaming behavior of swarm members. First, transfer functions from the S-shaped family are used to convert the FSA to binary. Second, a novel modified initialization approach (MIA) is incorporated into the standard FSA to obtain high-quality individuals at the beginning and thus increase the likelihood of discovering the best solution, which may improve the optimization performance. Third, the Levy flight operator is added to each flamingo to boost its variability and the optimizer's capacity to probe further into underexplored portions of the search space. Finally, exploitation is enhanced by a Local Search Algorithm (LSA). These improvements are discussed in this sub-section. The architecture of the suggested feature selection approach is depicted in Figure 4, and the pseudocode of IBFSA is presented in Algorithm 4.

Transformation function
The FS problem is modeled as a binary one: in feature-subset selection, each decision variable can only take the value 0 or 1. FSA therefore cannot be used directly to solve a feature selection problem, because the final solution it produces using Eqs (1) and (2) is made up of continuous values (real number domain). As a result, a transfer function (TF) must be used to convert the values from continuous to binary (0 or 1). The TF specifies the rate at which the values of the decision variables change from 1 to 0 and back. That is, the output of the TF chosen for this conversion should fall within the range [0,1]. The S-shaped family of logistic transfer functions is well suited to this mapping since it produces output in the [0,1] range. The purpose of this mapping is to identify which features are discarded and which are selected. Here, a flamingo stands for a feature set, and its binary values indicate whether or not each feature is chosen for inclusion in the final model, where 1 represents a selected feature and 0 a discarded one. An individual's value range is mapped to [0,1] by the following function [10]:
$$S(x_{ij}^{t}) = \frac{1}{1 + e^{-x_{ij}^{t}}} \quad (4)$$
where $x_{ij}^{t}$ denotes the location of the $i$th flamingo in the $j$th dimension at the $t$th iteration, computed by Eqs (1) and (2). The output of the S-shaped function in Eq (4) is still continuous, as illustrated in Figure 3. Thus, to obtain a binary value, the position is modified as follows:
$$x_{ij}^{t+1} = \begin{cases} 1, & rand < S(x_{ij}^{t+1}) \\ 0, & \text{otherwise} \end{cases} \quad (5)$$
where $x_{ij}^{t+1}$ represents the $j$th element of the $i$th solution in iteration $t+1$, and $rand \in [0,1]$.
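The mapping from a continuous flamingo position to a binary feature mask can be sketched as follows. This is a minimal example assuming the standard logistic S-shaped function and the usual thresholding rule (a bit is set to 1 when a uniform random number falls below the transfer value); the function names `sigmoid` and `binarize` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic S-shaped transfer function, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    A value of 1 marks a selected feature, 0 a discarded one.
    """
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]
```

Strongly positive coordinates are almost always mapped to 1 and strongly negative ones to 0, while coordinates near zero are selected with probability close to one half, which keeps some randomness in the search.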

Levy flight strategy
Figure 4 depicts Levy flight, a mathematical representation of a random motion that follows a heavy-tailed probability distribution [30]. Levy flight was recently introduced as a tool for optimization problems. It has since been incorporated into the design of many optimization algorithms to improve their performance in areas including speed of convergence, preventing premature convergence, escaping from local minima, and striking a balance between exploration and exploitation [8,9,30]. This research aims to improve the FS process used in COVID-19 diagnosis from clinical texts by proposing, for the first time, that Levy flight be included in the FSA structure to enhance the performance of the FSA optimizer. Eq (6) gives the flamingo location update based on the Levy flight improvement. To increase the diversity of the search, each updated flamingo employs Levy flight once, resulting in a higher level of exploration.
In Eq (6), $X_i^{t}$ indicates the $i$th flamingo at iteration $t$, rand indicates a random number in the range [0, 1], ⊕ represents the dot (entry-wise) product, and α represents the step control parameter. Levy flight, as previously mentioned, is a random walk in which the step size follows a Levy distribution, as given in Eq (7). Using Eq (8), Levy is computed from random numbers, where µ and ν are drawn from normal distributions. Eq (9) shows how to calculate φ, where Γ represents the standard Gamma function and β = 1.5, as mentioned in [31].
Evolutionary algorithms rely heavily on the diversity and convergence of their populations, and population initialization is a crucial aspect of this. The purpose of this step is to offer an initial guess at potential solutions. These initial solutions are then iteratively improved throughout the optimization process until a stopping criterion is fulfilled. In most cases, a high-quality initial population helps an algorithm converge more quickly and find the optimal solution. On the other hand, an algorithm may be unable to locate the optimal solution if it starts from poor guesses [32,33]. In recent years, research has shown that proper initialization approaches can improve the likelihood of locating global optimum solutions and decrease the variance of the final search outcomes [34]. In this paper, FSA is extended to make it appropriate for the optimization problem by introducing a new initialization algorithm named MIA. Its basic idea is to create a population from the initial population in a simple way, without any complex equation and without much change to the original FSA algorithm and its structure. Next, the better individuals are selected from the initial population, resulting in a new initial population made up of outstanding individuals. Thus, MIA handles this part of the algorithm and covers the feasible space properly. Additionally, the suggested initialization technique significantly improves solution quality, finds the optimal solution with high precision, and helps boost the likelihood of starting near a global optimum. The complete pseudocode of MIA is displayed in Algorithm 2.
The LSA algorithm was created by [35] and is presented in Algorithm 3. In each iteration over the migratory flamingos, LSA is called to improve the local location obtained by Eq (3). After the migratory flamingos have moved to their best position, LSA is called again. In addition, so as not to lose the promising sites that a flamingo passes through during its search for the global optimum, we added a parameter that helps it retain the site with the best fitness value it has reached so far; this prevents the flamingo from moving away from the best position to a worse one.
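The Levy step referred to in Eqs (7)-(9) is commonly generated with Mantegna's method, which the sketch below follows. It is an illustration under assumptions rather than the paper's exact update: µ and ν are taken as standard normal draws scaled by φ, the step control value `alpha=0.01` is a placeholder, and the additive position update in `levy_move` is one plausible reading of Eq (6). All function names are hypothetical.

```python
import math
import random


def levy_phi(beta=1.5):
    """Scale factor phi built from the Gamma function (Mantegna's form)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step: phi * mu / |nu|^(1/beta)."""
    mu = rng.gauss(0.0, 1.0)
    nu = rng.gauss(0.0, 1.0)
    while nu == 0.0:  # guard against division by zero
        nu = rng.gauss(0.0, 1.0)
    return levy_phi(beta) * mu / abs(nu) ** (1 / beta)


def levy_move(position, rng, alpha=0.01, beta=1.5):
    """Perturb every coordinate of a flamingo with its own Levy step."""
    return [x + alpha * levy_step(rng, beta) for x in position]
```

Most steps drawn this way are small, but occasional very large jumps occur; those jumps are what help a flamingo leave a local optimum. With β = 1.5 the scale factor φ evaluates to roughly 0.697.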

Binary FSA for FS problem
After the TF step, each flamingo is a binary vector whose dimensions match those of the dataset. The fitness function of IBFSA quantifies each flamingo's fitness by combining two seemingly opposing goals: the number of chosen features and the classification accuracy. The FS problem seeks to maximize classification accuracy (minimize the error rate) with a minimum number of selected features. The model performance was then optimized with the SVM technique, and the optimal set of features for detecting COVID-19 was determined by identifying the best flamingo. IBFSA uses the following fitness function to evaluate the solutions and balance the two main goals:
$$Fitness = \alpha \, E + \beta \, \frac{|S|}{|T|} \quad (10)$$
where $E$ is the classifier's error rate, $|S|$ is the number of selected features, and $|T|$ is the total number of features. The values of $\alpha$ and $\beta$ are the weights employed to strike a balance between these two goals.
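A wrapper fitness of this weighted-sum kind can be sketched as below. The classifier is abstracted behind a callable so the example stays self-contained: `error_rate_fn` is a hypothetical stand-in for training and testing an SVM on the selected columns, and the default weights 0.99 and 0.01 are illustrative values, not taken from this paper.

```python
def fitness(mask, error_rate_fn, alpha=0.99, beta=0.01):
    """Weighted sum of classification error and selected-feature ratio.

    mask          -- list of 0/1 flags, one per feature
    error_rate_fn -- callable taking the selected feature indices and
                     returning an error rate in [0, 1] (hypothetical
                     stand-in for the SVM evaluation)
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        # An empty subset cannot be classified; give it the worst fitness.
        return float("inf")
    error = error_rate_fn(selected)
    return alpha * error + beta * len(selected) / len(mask)
```

Lower values are better: a subset wins either by lowering the error rate or, at equal error, by using fewer features. Penalizing the empty subset with infinity is a design choice of this sketch.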

Classifier and evaluation
The proposed method is a wrapper-based approach; hence, a learning algorithm must be part of the assessment process. In this research, SVMs are used as classifiers in the fitness evaluation process [36,37,38] because they are efficient, particularly when dealing with datasets that have only two classes. The other classifiers are utilized in all other cases. Each dataset was divided at random into 80% for training and 20% for testing. Multiple metrics, including precision, sensitivity, F-measure, Macro-F1, and Macro-Recall, are used to assess the results of our tests and verify the efficacy of the suggested method. The macro-averaged metrics are defined as follows:
$$Macro\text{-}F1 = \frac{1}{k}\sum_{i=1}^{k} F_i, \qquad Macro\text{-}Recall = \frac{1}{k}\sum_{i=1}^{k} R_i$$
where $k$ denotes the total number of classes, and $F_i$ and $R_i$ are the F-measure and recall values of the $i$th class. To increase the statistical significance of the empirical results, we independently run each optimization technique 20 times on all datasets. For each assessment, the following metrics are calculated and used: average classification accuracy, feature selection ratio, average fitness, and standard deviation (STD).
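The per-class and macro-averaged metrics named above can be computed as in the following sketch, which uses the standard definitions of precision, recall (sensitivity), and F-measure. The function names are hypothetical, and a metric whose denominator is zero is reported as 0.0 by convention in this example.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} for every class."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        scores[c] = (precision, recall, f_measure)
    return scores


def macro_average(scores):
    """Average recall and F-measure over the k classes (unweighted)."""
    k = len(scores)
    macro_recall = sum(r for _, r, _ in scores.values()) / k
    macro_f1 = sum(f for _, _, f in scores.values()) / k
    return macro_recall, macro_f1
```

Because the macro average weights every class equally, it is less dominated by the majority class than plain accuracy, which matters when infected and uninfected cases are unbalanced.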

Results and analysis
This section offers a comprehensive empirical examination of the behavior of the IBFSA optimization algorithm and its improvements. Two datasets of COVID-19 patient medical records are used for the experiments. Table 2 details the specifics of these datasets.

Parameter tuning
It is well known that a metaheuristic method can hardly achieve optimal performance across all possible optimization problems, especially with the same parameter settings. Therefore, to obtain optimal performance, it is preferable to fine-tune the critical parameters for each optimization problem independently. Once IBFSA has been defined and its procedure explained, its parameters must be set (the number of flamingos, the number of iterations, and the number of runs). The iterations give the flamingos the chance to reach their best intensity within one generation, and repeating the full set of iterations over several runs lets each run reach its best intensity. Although multiple runs take more time, they make it more likely that the solution produced is optimal. Note that only a subset (80%) of each COVID-19 dataset is used in the parameter-setting experiments, while the remaining data is held out for the final assessment and validation (testing data). To prevent random bias, each combination is run separately 20 times, and the average results are reported. In addition, the suggested method was compared with state-of-the-art wrapper approaches, namely BPSO, BGWO, BWOA, BMFO and BFFA. All algorithms were run on the same computer platform with the same settings for the common algorithm parameters to ensure that the comparisons are fair. Table 3 lists the tuned parameter values.

Experiment
Here, we present the results obtained by applying our method to the COVID-19 test datasets, measuring how well the system classified the data. The experiments are conducted in two stages. In stage one, the impact of the term weighting scheme on categorizing COVID-19 patients is investigated, as we look for the best performance when it is included in the suggested strategy. In the second stage, the proposed IBFSA is compared with numerous alternative wrapper FS methods to demonstrate its efficacy. The IBFSA result, which consists of clinical texts with reduced feature sizes, is used as input for the classifiers to categorize the patients into the appropriate classes. Note that the feature selection phase was separated from the classification phase. SVM with a linear kernel function as the baseline classifier, Random Forest, logistic regression, the Naive Bayes classifier, and the multi-layer perceptron are all used to assess the quality of the feature subsets. These experiments are based on two key metrics: 1) the total number of features chosen; and 2) the accuracy of the classification. Measures such as the best fitness value, worst fitness value, mean fitness value, STD of the fitness values, average number of selected features, average accuracy score, and maximum accuracy value obtained are used to evaluate IBFSA performance on the FS problem in this section. For ease of understanding, the best results of a particular method are presented in bold. Table 4 displays the total number of features extracted during pre-processing, before the feature selection procedure. Tables 7 and 8 display the total number of features chosen from the datasets by the various techniques. The tables show that, averaged over 20 runs, IBFSA achieves a better number of selected features than any other technique tested (for both DS1 and DS2). Keep in mind that accuracy and the number of selected features are trade-offs. Thus, it may be challenging to get the best results for both of these objectives on any dataset. In light of this, we can conclude that the proposed IBFSA outperforms the other algorithms in terms of feature selection on the chosen datasets, as shown in Figures 6 and 7. The boxplots for both datasets in Figures 8 and 9 show the number of features selected and the performance of the algorithms. Note that the boxplots reflect the classification outcomes and the number of selected features after each method has been executed 20 times. These figures make it possible to see the minimum, median, and maximum values of the data. As shown in these figures, the IBFSA boxplots lie higher than those of the other approaches on both datasets. Tables 9 and 10 show that, for the LR and RF classifiers, IBFSA performs best in terms of accuracy, precision, and F-measure. However, there is no significant difference in average recall values between IBFSA and the others. For the MLP classifier, Table 11 shows that IBFSA has the best mean performance as measured by the F-measure. On the other hand, Table 12 shows that, compared with the other models, the combination of Naive Bayes and IBFSA categorizes the texts with higher sensitivity. Moreover, Table 13 shows that the SVM with IBFSA has superior efficacy and outperforms all other algorithms in classifier performance (see Figure 10). Classifier results for the second dataset are displayed in Tables 14-18. As can be seen from Tables 14-16, the classifiers achieved promising performance with all methods; however, there is only a marginal difference in accuracy between the classifiers. Notably, Table 17 shows that an NB classifier trained with IBFSA has superior efficacy compared to its peers, achieving an average classification sensitivity of 98.25% and a maximum sensitivity of 100% among the 20 runs. Table 18 shows that IBFSA has the most accurate performance of all the rivals with the SVM classifier (see Figure 11). According to the results in Tables 13 and 18, the IBFSA optimizer with the SVM classifier demonstrates higher classification accuracy than the variations using the LR, RF, MLP and NB classifiers on all selected datasets. One of the reasons is that the SVM classifier has built-in protection against over-fitting and does not depend primarily on the total number of processed features. It therefore has better potential than the other studied classifiers for dealing with larger text feature spaces. As seen in the results, when dealing with sparse samples, the SVM demonstrates steadier efficacy than the other models. On these particular datasets, the IBFSA algorithm achieves better results than the competing approaches in terms of feature selection accuracy. The inclusion of new, more efficient components that improve the algorithm's balance between its exploratory and exploitative capacities is one possible explanation for the improved performance.

Conclusions and future work
A new diagnostic model for COVID-19 has been developed that effectively increases the final prediction accuracy. The suggested approach includes two primary stages. The first stage uses RTF-C-IEF to determine the importance of each feature. In the second stage, the modified flamingo search algorithm is used to choose a set of pertinent and non-redundant features. Finally, an SVM-based classifier is used to predict COVID-19 from the features selected from clinical text. Our experiments were conducted on two datasets: the first was collected from hospitals in the south of Iraq, and the second from several sources on websites. In IBFSA, we presented four strategies to boost both the global and local search capabilities of the algorithm. In addition, the continuous algorithm was adapted to the binary feature selection problem using a binary transformation method. We compared the suggested technique with state-of-the-art swarm-based feature selection methods such as PSO, MFO, GWO, and FFA. Experiments reveal that the suggested technique reduces the number of features by more than 88% while achieving accuracy superior to the other methods. As a result, it can be concluded that the suggested approach is a powerful feature selection method for classifying COVID-19 patients. Moreover, the IBFSA results indicate that feature selection decreased the number of diagnostic mistakes for COVID-19 patients. In this way, feature selection helps machine learning focus on the most relevant information, lessening the likelihood of an incorrect diagnosis when distinguishing between infected and uninfected individuals. In future work, we will expand and diversify the test datasets to better assess the suggested methodology.

Figure 2. Diagram of the workflow of the study.

Figure 3. S-shaped function used in FSA algorithm.
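A minimal sketch of how an S-shaped transfer function can turn a continuous flamingo position into a binary feature mask. It assumes the common sigmoid form S(x) = 1 / (1 + e^(-x)) and a stochastic threshold. The exact function and thresholding rule used in the paper are not reproduced here. The helper name `s_shaped` is hypothetical, and `binary_map` simply borrows the name that appears in Algorithm 2.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Sigmoid transfer function mapping a real value into (0, 1).

    Assumption: the plain logistic sigmoid. Written in a numerically
    stable way so large negative inputs do not overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_map(position, rng=random):
    """Convert a continuous position vector into a 0/1 feature mask.

    Each dimension is set to 1 (feature kept) with probability
    s_shaped(x), otherwise 0 (feature dropped).
    """
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]
```

Passing a seeded `random.Random` instance as `rng` makes the conversion reproducible across runs, which is useful when a method is executed repeatedly and the results are compared.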

Algorithm 2. The proposed MIA algorithm.
Initialization: randomly generate the positions of the N flamingos (N is the population size); convert the positions to binary with binary_map; compute the fitness of every flamingo in the population; set the maximum number of local iterations.
Main loop: for each iteration do [loop body not recoverable] (global optimal position).

Figure 8. Boxplots of IBFSA compared with other algorithms in the number of selected features for both datasets.

Figure 9. Boxplots of the performance of IBFSA compared with other algorithms, measured by the F-score of the SVM classifier, for both datasets.

Figure 10. Average classification F-measure of IBFSA on DS1 compared with other algorithms by SVM Classifier.

Figure 11. Average classification F-measure of IBFSA on DS2 compared with other algorithms by SVM Classifier.

Table 1. A summary comparison of earlier works on the COVID-19 detection method.

Table 2. Details of datasets.
improve the best solution currently obtained by removing any remaining potentially irrelevant features. At first, LSA stores the best solution produced at the end of each IBFSA iteration in a temporary variable. To improve this temporary solution, LSA runs iteratively up to the maximum number of local iterations. At each LSA iteration, four features are randomly selected from the temporary solution, and LSA flips the value of each selected feature. Then, the fitness value of the new solution (the new
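The local search step described in this fragment can be sketched as follows. The source text breaks off before the acceptance rule, so the rule used here (keep the flipped solution only when its fitness is strictly lower) and the convention that lower fitness is better are assumptions. The names `local_search`, `fitness`, `max_local_iter`, and `n_flip` are hypothetical.

```python
import random


def local_search(best_mask, fitness, max_local_iter, rng=random, n_flip=4):
    """Try to improve a binary feature mask by flipping a few random bits.

    best_mask      : list of 0/1 values, the best solution of the current
                     IBFSA iteration (left unmodified).
    fitness        : callable taking a mask and returning a number.
                     Assumption: lower is better.
    max_local_iter : number of local search iterations.
    n_flip         : features flipped per iteration (four in the text).
    """
    temp = list(best_mask)          # temporary copy of the best solution
    temp_fit = fitness(temp)
    for _ in range(max_local_iter):
        candidate = list(temp)
        k = min(n_flip, len(candidate))
        # Randomly select k feature indices and reverse each one.
        for i in rng.sample(range(len(candidate)), k):
            candidate[i] = 1 - candidate[i]
        cand_fit = fitness(candidate)
        # Assumed acceptance rule: keep the candidate only if it is better.
        if cand_fit < temp_fit:
            temp, temp_fit = candidate, cand_fit
    return temp, temp_fit
```

With a toy fitness such as `lambda m: sum(m)` (the count of selected features), the returned mask never has more selected features than the starting one. A realistic fitness would also account for classification error.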

Table 4. Number of the extracted features from pre-processing.

Table 5. Fitness values from various algorithms on DS1.

Table 6. Fitness values from various algorithms on DS2.

Table 7. Number of selected features from various algorithms on DS1.

Table 8. Number of selected features from various algorithms on DS2.

Table 9. Comparison results of classification performance obtained by the LR algorithm with DS1.

Table 10. Comparison results of classification performance obtained by the RF algorithm with DS1.

Table 11. Comparison results of classification performance obtained by the MLP algorithm with DS1.

Table 12. Comparison results of classification performance obtained by the NB algorithm with DS1.

Table 13. Comparison results of classification performance obtained by the SVM algorithm with DS1.

Table 14. Comparison results of classification performance obtained by the LR algorithm with DS2.

Table 15. Comparison results of classification performance obtained by the RF algorithm with DS2.

Table 16. Comparison results of classification performance obtained by the MLP algorithm with DS2.

Table 17. Comparison results of classification performance obtained by the NB algorithm with DS2.