A Survey of COVID-19 Diagnosis Using Routine Blood Tests with the Aid of Artificial Intelligence Techniques

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, has considerably affected the global economy and healthcare systems. Infection is conventionally diagnosed using the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test. However, RT-PCR frequently yields false-negative results. Recent works indicate that COVID-19 can also be diagnosed using imaging techniques, including CT scans and X-rays, as well as blood tests. Nevertheless, X-rays and CT scans cannot always be used for patient screening because of high costs, radiation doses, and an insufficient number of devices. Therefore, there is a need for a less expensive and faster diagnostic model to distinguish positive from negative cases of COVID-19. Blood tests are easily performed and cost less than RT-PCR and imaging tests. Since biochemical parameters in routine blood tests vary during COVID-19 infection, they may supply physicians with useful information for the diagnosis of COVID-19. This study reviews newly emerging artificial intelligence (AI)-based methods to diagnose COVID-19 using routine blood tests. We gathered information about research resources and inspected 92 articles that were carefully chosen from a variety of publishers, such as IEEE, Springer, Elsevier, and MDPI. These 92 studies are classified into two tables, which contain articles that use machine learning and deep learning models to diagnose COVID-19 using routine blood test datasets. In these studies, Random Forest and logistic regression are the most widely used machine learning methods, and the most widely used performance metrics are accuracy, sensitivity, specificity, and AUC. Finally, we conclude by discussing and analyzing these studies. This survey can serve as a starting point for novice researchers working on COVID-19 classification.


Introduction
SARS-CoV-2 was first recognized in China, after which the severe pneumonia caused by the virus, called COVID-19, rapidly spread worldwide [1,2]. COVID-19 has different clinical symptoms, such as dyspnea, fever, cough, myalgia, fatigue, gastrointestinal complications, and headache [3,4]. This virus is dangerous and increases mortality among individuals with compromised immune systems. Medical professionals and infectious disease experts around the globe are seeking a solution for the disease. COVID-19 has been a leading cause of death in many countries around the world, with the United States, Italy, and Spain having among the highest numbers of deaths [5]. Figure 1 shows the global 14-day COVID-19 case notification rate per 100,000 as of 15 July 2020. To date, several diagnostic techniques have been adopted by physicians, such as RT-PCR [6,7], imaging techniques, and blood tests [8]. RT-PCR, which is the reference standard for the diagnosis of COVID-19 [9,10], suffers from low sensitivity (60-71%) and longer waiting times for results [11], and places an extra burden on the healthcare system by requiring expensive equipment [12][13][14]. Additionally, there is a shortage of testing kits, reagents, and trained personnel for analysis, particularly in less-developed nations [12]. Thus, scientists have searched for more accessible diagnostic techniques, among which imaging techniques have attracted remarkable attention [15]. Chest CT and chest X-rays are the mainstream imaging alternatives for diagnosing COVID-19 infection. Chest CT offers better sensitivity [16], similar to RT-PCR [12]; however, it also has numerous downsides, such as the risk of hospital-acquired infection, radiation exposure, and limited access to CT devices [17][18][19]. Chest X-ray is a less costly imaging choice and uses a lower radiation dose than CT, and virtually every health center and most clinics have access to X-ray equipment [20][21][22]. Chest X-ray, similar to CT, provides physicians with imaging signs of SARS-CoV-2 infection, e.g., ground-glass opacity, but suffers from an increased rate of false-negative results [23][24][25]. Blood tests are broadly accessible and have much lower costs than RT-PCR and imaging tests. Since biochemical parameters measured in routine blood tests, for example, lactate dehydrogenase (LDH) and C-reactive protein (CRP) [26], change over the course of COVID-19 infection, blood tests can provide physicians with data about the diagnosis of COVID-19 [12]. Consequently, blood tests may offer a potentially valuable instrument for the quick screening of infected patients and compensate for the shortcomings of RT-PCR and CT scans by providing an initial step of detection [27].
The health industry is eagerly pursuing new techniques and technologies to address the spread of the COVID-19 epidemic during the international health crisis. AI is one of the most prominent technologies that can track the spread and estimate the growth rate of COVID-19 and assess the risk and severity of COVID-19 patients. AI can also predict mortality by analyzing previous patient data. Moreover, AI can help combat this virus by supporting testing, medical assistance, data and information, and recommendations regarding disease control [28].
Machine learning (ML) [29][30][31], a subfield of AI, builds statistical models algorithmically and requires relatively little prior knowledge to learn how to handle problems [32][33][34]. Deep learning (DL) [35], in turn, is a subfield of ML that concentrates on deep neural-network-based models that learn from data using feed-forward computation and back-propagation. DL has improved significantly over the last two decades in several tasks [36][37][38]. However, it generally needs a vast amount of data to learn. Exceptions, where large-scale datasets are not required for training, include generative models and transfer learning [39][40][41].
This survey introduces a series of main AI works, comprising ML and DL research articles, on COVID-19 diagnosis using routine blood tests. In total, we investigated 92 articles, of which 82 are ML-based and the rest are based on DL models. Because of the rapid growth of COVID-19 cases, many of the articles we cite were published before a thorough investigation could be conducted, so their accuracy and quality should be assessed through peer review. These articles are classified in two tables; by comparing the different works, we determine which AI algorithms and performance metrics are used most often to detect COVID-19, as well as which blood test datasets are used.
The remainder of this study is organized as follows. Section 2 discusses the procedure followed for selecting the research articles. A summary of contemporary machine learning and deep learning research is given in Section 3. Section 4 presents the blood features used in previous articles. The approaches are examined, and the outcomes of various models are discussed, in Section 5. Section 6 concludes the article.

Protocol for Choosing COVID-19-Related Research Articles
To choose the research articles, the most pertinent keywords, such as COVID-19, routine blood tests, machine learning, and deep learning, were used. Moreover, we employed digital databases, including IEEE Xplore, Elsevier, Springer, and MDPI, to collect only English-language literature. Note that research articles about blood tests and rapid antigen tests were not reviewed. Figure 2 shows the details of the statistics of ML and DL publications on COVID-19. The search strategy was adjusted to obtain the maximum number of documents. Table 1 lists the number of documents retrieved by the queries referenced. The results were thoroughly scrutinized to seek relevant results. The queries yielded 4785 documents. The IEEE database was subject to a review of 620 studies that were evaluated according to their titles and abstracts. This screening led to the exclusion of 459 documents, as their material was not related to the analysis, and the remaining 161 were chosen for a full-text review. After a full-text review of the studies, 148 were rejected, and the remaining 13 were included in the present review. In total, the titles and abstracts of 1123 studies from the Springer database were scanned through. After the screening process, 992 documents were deemed unsuitable and only 131 were qualified for a full-text review. In total, 112 of the studies were disregarded and 19 were ultimately included in this study. The titles and abstracts of 1201 studies in the Elsevier database were examined. After the screening was finished, 1102 documents were disregarded because they had no association with the investigation, leaving 99 documents to be analyzed for a full-text review. From these studies, 21 were accepted and 78 rejected. A review of the MDPI database was conducted, resulting in 337 studies, with the evaluation based on their titles and abstracts. 
The screening process resulted in the elimination of 301 documents, as they were not relevant to the analysis, while 36 were selected for a full-text review. A complete textual analysis of the studies led to the rejection of 28, while the remaining 8 were included in the current review. Finally, the titles and abstracts of 1504 studies in other databases were evaluated. Following the screening, 178 documents were kept for a full-text review, since the other 1326 had no association to the topic being studied. Out of these studies, 147 were rejected and 31 were used for the final analysis.
This section delineates the content of the 92 studies found in the databases investigated. The results of the research article search are summarized in Figure 3.
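The screening arithmetic described above can be checked with a short tally. The following is a minimal sketch: the counts are copied from the text, while the dictionary layout and the function name `tally` are illustrative choices, not part of the original protocol.

```python
# Screening counts per database, copied from the selection protocol text.
# Tuple order: (retrieved, excluded at title/abstract, excluded at full text)
SCREENING = {
    "IEEE": (620, 459, 148),
    "Springer": (1123, 992, 112),
    "Elsevier": (1201, 1102, 78),
    "MDPI": (337, 301, 28),
    "Other": (1504, 1326, 147),
}


def tally(screening):
    """Return total retrieved, total read in full text, and total included."""
    retrieved = sum(r for r, _, _ in screening.values())
    full_text = sum(r - a for r, a, _ in screening.values())
    included = sum(r - a - f for r, a, f in screening.values())
    return retrieved, full_text, included


retrieved, full_text, included = tally(SCREENING)
print(retrieved, full_text, included)
```

Summing the per-database figures reproduces the 4785 retrieved documents and the 92 included studies reported in this section.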

Overview of Machine Learning and Deep Learning Methods
Routine blood tests can be used to quickly diagnose COVID-19 infection using AI-based techniques such as ML and DL. These models can uncover potential relationships between different parameters in blood test results and provide information to guide decisions.
This section contains research articles that use routine blood tests to diagnose COVID-19 while taking into account ML and DL models. The majority of COVID-19 detection techniques used are shown in Figure 4.

Machine Learning
Increasing attention has been paid to the potential use of ML methods for COVID-19 diagnosis using routine blood tests. The following are some well-known ML algorithms that have been applied to COVID-19 diagnosis [42]. Support vector machine (SVM) [43] is used for classification. The goal of SVM is to identify the hyperplane that best separates the classes in feature space. Many hyperplanes can separate a dataset; SVM selects the one that maximizes the margin between the classes. SVM is often highly accurate, and margin maximization makes it comparatively robust to overfitting.
Random Forest (RF) [44] is an ensemble technique based on decision trees. It uses a re-sampling process called bagging to build multiple trees, each acting as a weak classifier, and combines their predictions. RF is effective for highly complex problems and can handle missing data and unbalanced datasets.
K-nearest-neighbor (KNN) [45] is a classification algorithm with which a sample's label can be predicted using the labels of its closest neighbors. It is necessary to choose the parameter K and use the attribute-distance computation metric to determine which other data points are the closest neighbors.
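The KNN procedure described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: Euclidean distance is assumed as the attribute-distance metric, and the feature vectors and labels are hypothetical toy values, not data from any reviewed study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Euclidean distance is one common choice of attribute-distance metric.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled feature vectors (e.g. two blood parameters).
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.7)]
train_y = [0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, (0.75, 0.8), k=3))
```

In practice, blood parameters are measured on very different scales, so features would normally be standardized before distances are computed.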
Logistic regression (LR) [46] is a classification algorithm. Based on the values of the independent features, LR models the probability that a sample belongs to a particular class; the fitted model can then be used to predict this probability for new samples. LR is computationally simple and supports continuous numerical features, but it cannot capture non-linear relationships without additional feature transformations.
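To make the probability model concrete, the sketch below fits a logistic regression by plain batch gradient descent on the log-loss. It is an illustrative implementation rather than the method of any reviewed study; the learning rate, epoch count, and the one-feature toy dataset are assumptions chosen for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """P(class = 1 | features) under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def fit(xs, ys, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss."""
    n_features = len(xs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(weights, bias, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / len(xs)
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical one-feature dataset: a scaled marker that is higher in positives.
xs = [(0.1,), (0.2,), (0.8,), (0.9,)]
ys = [0, 0, 1, 1]
weights, bias = fit(xs, ys)
p_low = predict_proba(weights, bias, (0.1,))
p_high = predict_proba(weights, bias, (0.9,))
print(round(p_low, 3), round(p_high, 3))
```

Because the decision boundary is linear in the features, a marker whose risk is elevated at both low and high values would need an added transformed feature (for example, a squared term) to be modeled this way.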
Decision tree (DT) [47] is a supervised ML algorithm. To characterize the relationships between attributes and a class label, DT generates a tree-structured model. It splits the observations recursively on the most informative attribute, i.e., the one with the highest gain ratio. The data are split at the internal nodes, and decisions are made at the leaves.
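The gain ratio used to pick the splitting attribute can be computed as information gain divided by split information. The sketch below assumes a categorical attribute and entropy measured in bits; the attribute values ("low"/"high") and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    total = len(items)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(items).values()
    )


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by a categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical discretized marker level versus class label.
labels = [1, 1, 0, 0]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```

A tree builder would evaluate this score for every candidate attribute at a node and split on the one with the highest value.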
Naive Bayes (NB) [48] is a simple probabilistic classifier based on Bayes' theorem. It cannot handle missing data.
Extreme Gradient Boosting (XGBoost) [49] is a machine learning library based on decision trees that implements a scalable, distributed form of Gradient-Boosted Decision Trees (GBDT). Its advantages include parallel tree boosting, and it is a leading machine learning library for regression, classification, and ranking.
LGBM (LightGBM) [50] is an efficient, distributed, and high-performing gradient boosting framework based on decision tree algorithms, used for ranking, classification, and other machine learning tasks.
Table 2 displays the most recent machine learning models for early detection of COVID-19 or assessing the disease severity level of COVID-19 patients based on laboratory and clinical data.

Deep Learning
The performance improvements of hardware components, such as graphics cards, and the drop in unit costs are two reasons DL has grown in popularity. DL has also been aided by machine learning and information processing studies [51][52][53], as well as an increase in training data. Numerous domains, including computer vision [36][54][55][56], natural language processing [57][58][59][60][61][62], and speech recognition [63], have extensively used various deep learning architectures, such as the artificial neural network (ANN), convolutional neural network (CNN), and recurrent neural network (RNN). ANN is an information processing method that draws inspiration from the biological nervous system of humans. Its structure is made up of neurons, activation functions, and input, hidden, and output layers. In an ANN, each layer comprises a set of neurons, and the output of each layer is the input of the following layer. From the incoming data, each successive layer learns increasingly intricate relationships. CNN is a deep learning architecture created to analyze visual data, such as photographs and videos. Different layer types perform different functions in a CNN: the convolution layer, pooling layer, fully connected layer, and activation function layer. In RNN structures, the output is influenced by both the previous and the current inputs; these networks produce their results by combining data from the past and present.
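The layer-by-layer flow of an ANN described above can be illustrated with a single forward pass through one hidden layer. This is a minimal sketch: the ReLU and sigmoid activations, the layer sizes, and all weight values are hypothetical choices for illustration, and no training (back-propagation) is shown.

```python
import math


def relu(vector):
    return [max(0.0, x) for x in vector]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (ReLU) -> single output neuron (sigmoid)."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    logit = dense(hidden, [out_w], [out_b])[0]
    return sigmoid(logit)


# Hypothetical weights for a 2-input, 2-hidden-neuron, 1-output network.
hidden_w = [[0.5, -0.25], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [2.0, 1.0]
out_b = -1.0
probability = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(probability)
```

Each call to `dense` takes the previous layer's output as its input, which is the chaining of layers that the paragraph describes.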
In this study, we selected 11 deep-learning-based studies, shown in Table 3. Overall, the number of works based on deep models is small; however, their accuracy on large datasets is reported to be superior to that of ML methods.

Features
The importance of features in routine blood tests for COVID-19 diagnosis is significant, because in machine learning and deep learning, these features can be used to build predictive models that can identify patterns in data, allowing for earlier detection and diagnosis of COVID-19. Additionally, routine blood tests can be used to monitor disease progression and treatment response in patients with COVID-19. Changes in these features over time can indicate the severity of the disease and the effectiveness of treatment. Moreover, the identification of the most informative features from routine blood tests can help develop models that can predict disease outcomes and mortality rates. This information can be used to allocate resources and prioritize treatment for patients most at risk of severe illness.
A wrong selection of features in medical diagnosis using routine blood tests, especially for COVID-19, can have severe consequences for patient health and treatment outcomes. Machine learning and deep learning algorithms rely heavily on the selection of appropriate features to make accurate predictions. If the wrong features are chosen, the accuracy and reliability of the diagnostic results can be significantly degraded. In the case of COVID-19 diagnosis, this could result in misdiagnosis or delayed diagnosis, leading to delayed treatment and potentially worse outcomes for patients. For example, if important features, such as inflammatory markers or lymphocyte counts, are excluded from the model, patients with mild or asymptomatic cases of COVID-19 may be missed, contributing to the spread of the disease. Moreover, a wrong selection of features can lead to false positives or false negatives, both of which have significant implications for patient care. False positives can lead to unnecessary medical interventions or treatments, which can be costly and time-consuming and may cause patient harm. False negatives can result in delayed treatment, allowing the disease to progress and worsening outcomes for the patient. Finally, a wrong selection of features can result in poor generalizability: the model may not perform well on new data or may be specific to a particular population or dataset, limiting its usefulness and applicability in real-world settings.
The blood features that were used by previous studies (see Table 4) are as follows [152]:
• Hematocrit: The proportion of erythrocytes (commonly referred to as red blood cells) in the blood. When this percentage is low, it could signify respiratory difficulties and possibly reveal the severity of COVID-19 cases [153].
• Hemoglobin: The protein in red blood cells that carries oxygen in the bloodstream. When someone is diagnosed with pneumonia due to COVID-19, a drop in the hemoglobin (Hb) level shortly after the diagnosis could suggest that the pneumonia is getting worse. Anemia is also a frequent occurrence in COVID-19 cases [154].
• Red blood cell distribution width (RDW): Also described as the RDW coefficient of variation, it quantifies the variability in the size of erythrocytes. While initially used as a diagnostic tool for anemia, it has since evolved into an indicator of infections and more severe ailments, including cardiovascular disease and cancer. Although it is not a reliable indicator of the presence of COVID-19, it has been recognized as a sign of the disease's severity, as elevated RDW levels have been associated with mortality in cases of COVID-19 [155].
• Mean corpuscular hemoglobin (MCH): The mean amount of hemoglobin, a protein that carries oxygen throughout the body, present in an individual red blood cell [75]. Sarkar et al. [156] suggested that changes in MCH levels could indicate the presence of COVID-19. A decrease in the MCH value typically signifies a lack of iron in the body, known as iron deficiency anemia. In general, people with COVID-19 tend to display MCH values that are slightly below the normal range, falling within one standard deviation.
• Mean corpuscular hemoglobin concentration (MCHC): A measurement, similar to MCH, of the average hemoglobin concentration inside an individual red blood cell [75]. A low MCHC value suggests that a person's red blood cells have insufficient hemoglobin, indicating anemia. According to [157], this metric aided in differentiating cases of COVID-19 from community-acquired pneumonia.
• Mean corpuscular volume (MCV): An indicator of the typical size or volume of erythrocytes [75]. Changes in the mean size of red blood cells, whether an increase or a decrease, can signal underlying health concerns, and research has associated such alterations with the severity of COVID-19 [158].
• Lymphocytes absolute: This metric serves as a marker for infectious processes. A reduced level can be an indication of serious COVID-19, which can prompt early treatment or suggest a negative outcome [154].
• Leukocytes: White blood cells are the immune system's defensive cells. Research has demonstrated that COVID-19 can attack these cells, causing them to release pro-inflammatory cytokines that increase inflammation in the affected individual [159]. Furthermore, indicators present in the genetic composition of leukocytes can be used to detect the presence of COVID-19 [160].
• Basophils absolute: Basophils are crucial cells of the immune system, and their levels tend to rise during prolonged inflammation or allergic reactions. However, research has shown that individuals infected with COVID-19, particularly those with severe cases, experience a significant decrease in basophil counts [161]. Similarly, eosinophils, which play a role in defending the body against parasites and infections, also exhibit reduced levels in COVID-19 patients [162].
• Platelets: Cells produced in the bone marrow that aid in blood coagulation. Monitoring this measurement closely is important because a rise in its levels may not always be indicative of COVID-19, but can instead point to complications related to the disease, including thrombosis [163].
• Monocytes absolute: Monocytes and macrophages are essential constituents of the immune system that provide protection against different microorganisms and viruses [164]. Macrophages exist in bodily tissues, whereas monocytes are found in the bloodstream and are identifiable through blood counts. Despite their beneficial characteristics, these cells can have harmful effects on those with COVID-19, leading to lung infections and lesions. Several studies have revealed a reduction in the number of monocytes in people with COVID-19 [164]. Other research, such as Meidaninikjeh et al. [165], proposes creating novel methods to detect the migration of these cells towards the lungs as a potential sign of COVID-19, and using suitable treatments to reduce lung damage.
• SARS_CoV2_PCR: The PCR result is dependable in verifying the existence of a COVID-19 infection because it detects the virus's genetic material. This variable takes the value 0 for negative instances and 1 for positive instances.

Discussion and Analysis
It is challenging to diagnose coronavirus infection using routine blood testing. Researchers have employed numerous preprocessing strategies, feature extraction approaches, and classification models [166]. Identifying a single strategy or set of methodologies that produces the best outcomes for detecting COVID-19 from routine blood tests is challenging. Most research articles reported accuracy rates of more than 90%, which is high. Nevertheless, the goal is to bring accuracy as close to 100% as possible, because inaccurate disease classification, even in a small number of cases, is unacceptable. Generalization capacity also poses a serious issue for all learning-based methodologies; it depends on both the procedures themselves and the diversity of the training dataset. Deep learning has nonetheless had a significant impact on how routine blood tests are applied, and we anticipate that it will become a more effective methodology in the future [167].
Different ML and DL techniques have been used in the 92 reviewed studies. The ML and DL methods utilized are displayed in Figure 4. Figure 5 shows that Random Forest is the most used machine learning method (16%), followed by LR (14%), SVM and XGBoost (11%), KNN and ANN (7%), DT (6%), etc.
Four metrics, namely accuracy, sensitivity (recall), specificity, and AUC, were used to evaluate and compare the performance of the proposed COVID-19 diagnosis methods quantitatively. These four performance metrics, as used in the COVID-19 diagnosis literature, are shown in Table 5. Accuracy indicates how many samples are classified correctly (the ratio of correct predictions to all predictions). Sensitivity, or recall, is the ratio of correctly classified COVID-19-positive samples to the total number of actually positive samples. Specificity is the ratio of correctly classified negative samples to the total number of actually negative samples. AUC is the area under the ROC curve, which runs from (0, 0) to (1, 1).

Table 5. Performance metrics used in the research articles (columns: Performance Metrics, Refs.).
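The four metrics defined above can be computed directly from true labels, predicted labels, and predicted scores. The sketch below is illustrative: the function names are arbitrary, AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half), and the labels and scores are hypothetical toy values.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 means positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    return tp / (tp + fn)  # true positives over all actual positives


def specificity(tn, fp):
    return tn / (tn + fp)  # true negatives over all actual negatives


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores; predictions use a 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
print(auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies often report it alongside the other three.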
Although we conducted a search in four major databases and found 4785 documents, this research has some limitations. The first is the possibility of selection bias, because only English-language research articles were chosen. The keywords used for the queries also influence the results that are obtained. Another limitation is the time frame in which the search was conducted. As with any rapidly evolving field, new research is constantly being published, and the results of this study may not reflect the most recent findings. Additionally, the scope of the search may have been too narrow, focusing only on research articles related to a specific aspect of the topic. Moreover, the quality of the research articles selected in the review may vary, as not all research articles undergo the same level of peer review or scrutiny. It is possible that some of the research articles chosen had limitations in their methodology or analysis, which could affect the validity of the conclusions drawn from the review.
We note that the primary benefit of the current study is that it gives the reader a comprehensive list of recent research articles that use various prediction approaches based on routine blood tests. The reader gets access to a method-based categorization of publications (ML and DL). For anyone interested in this subject, the research articles given in this study would be a good place to start and would accelerate their learning in this area. For anyone conducting a literature review, the flowchart shown in Figure 3 would be useful. There are two further drawbacks to this study. First, we have only considered studies published within the last one to two years. Second, this survey has not covered material from other relevant databases, because of the overwhelming volume of publications that the authors would have had to manage.

Conclusions
The ongoing COVID-19 pandemic has gravely threatened millions of lives in a short amount of time. Because CT scanning is more expensive and time-consuming than routine blood testing, routine blood tests are more broadly accessible than CT imaging. As a result, many researchers have used routine blood tests to identify COVID-19. After reviewing the literature in this field, we find that there is a dearth of annotated data on patients with COVID-19. The performance of the aforementioned data-hungry models could be significantly improved by building high-quality datasets of COVID-19 patients. ML and DL techniques can detect coronavirus infection when applied to routine blood tests. This study compares recent research that uses ML and DL algorithms to detect coronavirus infection from routine blood tests obtained from multiple open-source datasets. This study only covers the 92 studies examined, although numerous further studies have recently been undertaken in this area.
After screening 4785 research articles, only 92 were deemed pertinent to the topic under investigation for this study. This gives the reader a sense of how uncommon this topic is in the studied field. The application of ML and DL to the prediction of COVID-19 using routine blood tests remains comparatively underexplored. It should be noted that 559 full-text research articles were examined for this review. The authors suggest that future research consider other databases in the literature review, such as Emerald, Scopus, and Web of Science. In addition, the authors suggest that researchers consider using open-source datasets to train and test ML and DL models for predicting COVID-19 using routine blood tests. Open-source datasets can provide a standardized and accessible platform for researchers to develop and test their models. This can also facilitate collaboration and the sharing of data and results among researchers in the field. The authors also recommend that future research explore the potential of using ML and DL in combination with other medical technologies, such as imaging and genomics, to develop more accurate and comprehensive diagnostic tools for COVID-19. By leveraging multiple sources of data, researchers can develop more holistic approaches to predicting and treating the disease.

Data Availability Statement: Data will be made available on request.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this study: