Harnessing Machine Learning in Early COVID-19 Detection and Prognosis: A Comprehensive Systematic Review

During the early phase of the COVID-19 pandemic, reverse transcriptase-polymerase chain reaction (RT-PCR) testing faced limitations, prompting the exploration of machine learning (ML) alternatives for diagnosis and prognosis. Providing a comprehensive appraisal of such decision support systems and their use in COVID-19 management can aid the medical community in making informed decisions during the risk assessment of their patients, especially in low-resource settings. Therefore, the objective of this study was to systematically review studies that used ML to predict the diagnosis of COVID-19 or the severity of the disease. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), we conducted a literature search of MEDLINE (OVID), Scopus, EMBASE, and IEEE Xplore from January 1 to June 30, 2020. The outcomes were COVID-19 diagnosis or prognostic measures such as death, need for mechanical ventilation, admission, and acute respiratory distress syndrome. We included peer-reviewed observational studies, clinical trials, research letters, case series, and reports. We extracted data on each study's country, setting, sample size, data source, dataset, diagnostic or prognostic outcomes, prediction measures, type of ML model, and measures of diagnostic accuracy. Bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). This study was registered in the International Prospective Register of Systematic Reviews (PROSPERO) under the number CRD42020197109. Sixty-six records were included in the final data extraction. Forty-three (64%) studies used secondary data. The largest proportion of studies was from Chinese authors (30%). Most of the literature (79%) relied on chest imaging for prediction, while the remainder used various laboratory indicators, including hematological, biochemical, and immunological markers. Thirteen studies explored predicting COVID-19 severity, while the rest predicted diagnosis. 
Seventy percent of the articles used deep learning models, while 30% used traditional ML algorithms. Most studies reported high sensitivity, specificity, and accuracy for the ML models (exceeding 90%). The overall concern about the risk of bias was "unclear" in 56% of the studies, mainly due to concerns about selection bias. ML may help identify COVID-19 patients in the early phase of a pandemic, particularly in the context of chest imaging. Although these studies suggest that ML models can exhibit high accuracy, the novelty of the models and the biases in dataset selection make their use as a replacement for clinicians' cognitive decision-making questionable. Continued research is needed to enhance the robustness and reliability of ML systems in COVID-19 diagnosis and prognosis.


Introduction And Background
Machine learning (ML), one of the broad disciplines of artificial intelligence (AI), refers to the ability of a machine to understand and learn hidden knowledge by finding patterns in large datasets using analytical techniques [1]. ML requires modeling design, learning functions, and developing algorithms. The main idea is to enable automated classification or clustering techniques that progressively learn behavior from data to uncover new patterns and predict future outcomes through decision support systems [1]. Generally, ML can be broadly divided into three types: supervised learning, unsupervised learning, and reinforcement learning [2]. The "supervised" method is the type often used in disease prediction. Supervised ML includes several classes such as regression, support vector machine, decision tree, random forest, naive Bayes, k-nearest neighbors, and artificial neural network [3]. A more complex form of the neural network is deep learning (DL), which employs multiple layers of neural networks [4]. DL can be supervised, unsupervised, or reinforcement-based.
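To make the supervised-learning workflow concrete, the following is a minimal sketch, not drawn from any of the reviewed studies, of fitting three of the classifier families named above on a synthetic binary-outcome dataset using scikit-learn; the dataset, features, and hyperparameters are all illustrative assumptions.

```python
# Illustrative sketch: supervised classification on synthetic data
# standing in for a tabular clinical dataset (e.g., laboratory markers).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary outcome (e.g., disease vs. no disease), 10 features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Three supervised classifier families mentioned in the text.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the training split and score it on the held-out split.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
print(scores)
```

In practice, the reviewed studies followed this same fit-on-training, evaluate-on-held-out-data pattern, albeit with real clinical or imaging data and more elaborate validation schemes.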
ML has been frequently adopted as an aid for diagnostic screening during the COVID-19 pandemic, where the research suggests its ability to identify infected individuals from radiological imaging before symptoms develop [5]. ML technology can also process hundreds of thousands of images in a short period while exhibiting higher sensitivity and specificity for detecting radiological changes than the naked human eye [5]. At the beginning of the COVID-19 pandemic in 2020, there was an urgency to expand diagnostic capacity beyond RT-PCR testing.

TABLE 1: Set of search strings adapted to each of the databases searched

LH conducted the database search, and the results were exported to EndNote [23] to facilitate collaboration among reviewers during the study selection process. The search strategy followed two stages and was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines [24]. In the first stage, four investigators (RD, AJ, JH, and HT) independently screened the titles and abstracts of all articles retrieved from the searched databases. If an abstract provided sufficient information to exclude an article, it was excluded at this stage; articles with relevant titles whose abstracts did not provide sufficient grounds for exclusion proceeded to full-text screening. During the second stage, the same four investigators screened the full text of all retained articles against the inclusion and exclusion criteria. Disagreements were resolved by consensus.

Inclusion and Exclusion Criteria
We included observational studies, clinical trials, research letters, case series, and case reports addressing ML models in COVID-19 prediction without language restrictions. However, inclusion was restricted to articles that met the following criteria: (1) The article was published in a peer-reviewed journal; (2) the population was any patients with suspected SARS-CoV-2 infection or with a confirmed diagnosis when the prognosis was predicted; (3) the ML models were used to assist in the diagnosis or prognosis of suspected or diagnosed COVID-19 patients; and (4) the outcome of interest was COVID-19 diagnosis or prognosis. We excluded time series, surveillance studies forecasting the COVID-19 pandemic, systematic or narrative reviews, opinions, short communications, commentaries, statement articles, news reports, preprints, and articles where we failed to access the full text despite contacting the authors. However, preprints that had been published at the time of writing this article were included. We also excluded any study that only used ML models to predict the diagnosis or prognosis of diseases other than COVID-19 or studies that predicted the diagnosis or prognosis of COVID-19 without ML. Two authors (RD and MA) resolved any discrepancies through discussion and consensus.

FIGURE 2: Number of publications by country based on authors' countries
Overall, 42 (64%) studies used publicly available secondary data (Appendix 1). The most commonly used source for COVID-19 radiographic images was the Joseph Cohen dataset [95], while the most frequent source for non-COVID-19 radiographic images was Kaggle.com [96]. Although these databases were frequently used in the included studies, there was insufficient information to evaluate the similarity of samples retrieved from these publicly available data. As a result of having these secondary data sources available for COVID-19 cases and non-COVID-19 individuals, the most common study design was an un-nested case-control design (71%). Of the 29 studies that used primary hospital data, the data were predominantly from Chinese hospitals (19 studies), most of which were in Wuhan. A few studies had small sample sizes of fewer than 100 patients [27,44,46,48,51,91]. In one study, the sample size was not mentioned at all [66]. An important observation was that 52 studies (79%) were in the field of radiology, in which the ML models were developed using chest radiographs for predicting COVID-19 diagnosis and distinguishing it from other lung diseases or for predicting disease severity among hospitalized COVID-19 individuals. Thirty studies used chest X-ray images, 20 used chest CT images, one used both chest X-ray and CT images, and one used chest ultrasound (US) frames. Thirteen of the included studies used ML models to predict COVID-19 disease severity [39,40,46,51,56,59,63,77,86,87,89,92]. Severity outcome measures included the need for ICU transfer, hospital stay time, mechanical ventilation, and death (Appendix 1).

Types of Machine Learning Techniques Used
Different ML methods were used in the studies. These included convolutional neural networks (CNNs), decision trees (DT), random forest (RF), gradient boosting machines (GBM), support vector machines (SVM), artificial neural networks (ANN), k-nearest neighbors (KNN), logistic regression, and naive Bayes (Appendix 1). Most studies used deep learning (DL) models (74%), specifically in the form of CNNs, although ML is not limited to this technique. Some studies combined multiple types of ML models to enhance a CNN model, while others compared the performance of different ML models to identify the one with superior diagnostic accuracy. Fine-tuning pre-trained models was also popular among the retrieved studies. For the most part, the model architecture was clearly described, and the breakdown of datasets into training, validation, and testing sets was also reported; this latter information was missing from 15 studies.
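A CNN builds its predictions from stacked convolution, activation, and pooling stages. The following is a minimal NumPy illustration of one such stage, the basic building block of the CNN architectures used in these studies; the image size, kernel, and values are illustrative assumptions, not taken from any reviewed model.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trims edges that do not divide evenly."""
    h = fmap.shape[0] // size
    w = fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))           # toy stand-in for a chest image
kernel = rng.standard_normal((3, 3))   # one learned convolutional filter
features = np.maximum(conv2d(image, kernel), 0)  # ReLU activation
pooled = max_pool(features)
print(pooled.shape)  # (13, 13)
```

Deep CNNs repeat this convolve-activate-pool pattern over many layers with many learned filters before a final classification layer; the pre-trained models mentioned above supply those learned filters so that researchers only need to fine-tune the later layers.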

Diagnostic Accuracy Measures
The reported measures of model accuracy varied across the retrieved studies (Appendix 1). These measures included accuracy, sensitivity, specificity, precision, recall, F1 score, positive predictive value (PPV), negative predictive value (NPV), the area under the curve (AUC), Kappa statistic, % correctness, and % completeness. The most frequently reported measure was accuracy. The majority of studies reported performance measures exceeding 90%, most commonly in studies that utilized ML to diagnose COVID-19 through chest imaging. In all instances where a comparison was reported, ML accuracy was superior to residents' or consultants' naked-eye diagnosis.
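Most of the measures listed above derive directly from the 2x2 confusion matrix. As a reference sketch, the following computes them for a hypothetical confusion matrix; the counts are invented for illustration and do not come from any reviewed study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy measures commonly reported in the reviewed studies,
    computed from the counts of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # precision / positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}

# Hypothetical test set: 90 true positives, 5 false positives,
# 95 true negatives, 10 false negatives.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

Note that AUC, unlike these counts-based measures, is threshold-independent: it summarizes sensitivity and specificity across all possible decision cutoffs, which is one reason studies often report it alongside a single accuracy figure.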

Risk of Bias Assessment
The overall risk of bias was "unclear" for 56% of the studies, while the applicability concerns were "low" for 88% of the studies (Figure 3). Unfortunately, most of the studies fell short in the domain of participant selection. This was because the majority of studies, particularly those using secondary open-source data, selected their databases arbitrarily, and inclusion and exclusion criteria for the selected sample were never mentioned. Additionally, it was unclear why particular databases were chosen or whether the subsamples (COVID-19 vs. non-COVID-19) were randomly selected, which may have introduced selection bias into the studies. When data from different countries were used, it was unclear how comparable these data were. The predictor data were mostly chest images, and participants' characteristics were rarely considered during analysis. As most studies involved chest imaging, the timing of chest image acquisition was seldom recorded. The quality of the chest image datasets was also questionable. For example, the images from the two most commonly used data sources (Joseph Cohen and Kaggle.com) are stored in JPEG format, which has a low 8-bit depth (256 gray shades), making them vulnerable to losing important pixel information. This imaging quality does not reflect clinical or radiology practice, where digital imaging and communications in medicine (DICOM) images of at least 12 bits (4,096 gray shades) are used.
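The information loss from this difference in bit depth is straightforward to quantify. The short sketch below (illustrative only; the shift-based quantization is one common down-conversion scheme, not necessarily the one used by any particular dataset) shows how reducing 12-bit pixel values to 8 bits collapses every block of 16 adjacent gray levels into one.

```python
# A 12-bit DICOM pixel range has 2**12 gray levels; an 8-bit JPEG has 2**8.
dicom_levels = 2 ** 12   # 4096
jpeg_levels = 2 ** 8     # 256

# Quantize every possible 12-bit value down to 8 bits by dropping the
# 4 low-order bits: 16 distinct 12-bit levels map onto each 8-bit level.
pixels_12bit = list(range(dicom_levels))
pixels_8bit = [p >> 4 for p in pixels_12bit]
unique_8bit = len(set(pixels_8bit))

print(dicom_levels, jpeg_levels, unique_8bit)  # 4096 256 256
```

Subtle attenuation differences that span fewer than 16 of the original gray levels are therefore indistinguishable after conversion, which is the pixel-information loss the risk-of-bias assessment flags.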
Most of the data used in these studies were from China, so similar accuracy cannot easily be assumed if the ML models are tested on other populations. On the other hand, some studies created large datasets that included data collected from different hospitals worldwide. Moreover, most studies did not specify the covariates controlled for in the ML models, which may have introduced confounding by unmeasured personal characteristics. The points above may have compromised the internal and external validity of the evaluated studies.

Discussion
Incorporating ML into health care is becoming more common. Advancements in ML accelerated exceptionally during the COVID-19 pandemic, during which the technology was adopted and improved for COVID-19 screening, diagnosis, and treatment, in addition to vaccine development [97,98]. The current systematic review focused on summarizing the published literature on utilizing ML in the diagnosis or prognosis of COVID-19 during the early phase of the pandemic. The findings from this study can be summarized in the following points. First, the studies suggest that ML can indeed help in identifying COVID-19 with high levels of accuracy, especially in the context of radiological diagnosis. Second, DL was the most commonly used ML method for this purpose. Third, secondary data analysis was common among these studies, as many researchers shared their data through open platforms. However, despite this data compilation, most of the data were collected from Chinese populations, and there was little effort to merge large datasets to conduct ML testing on large samples representing various populations worldwide. The popularity of applying ML to chest X-rays and CT scans in the retrieved studies agrees with what has been previously published [14,15,97,98]. This may be due to the feasibility of obtaining chest images in most healthcare settings. It may also be linked to the availability of open-source chest image data for training and the numerous existing pre-trained models that can be applied to chest images [14,15,97,98].
The literature examining the utility of ML in chest imaging suggests that this technology is exceptionally efficient in augmenting physicians' diagnoses, which can help reduce medical errors and improve patient safety [97,98]. Although abundant literature explores using ML to identify pulmonary lesions on chest imaging, there is still room for innovation in this domain. Future research could combine all available data from different countries into one mega-dataset and validate and test existing models for diagnostic accuracy. Another avenue worth exploring would be improving the accuracy of ML in identifying COVID-19 lesions on chest US imaging. This modality involves no ionizing radiation and has not been thoroughly examined in the currently reviewed literature [99]. DL is the most common ML method utilized as a decision support system for medical purposes [19,97,98]. DL has been described as having a shorter testing time compared to other types of ML models. Additionally, many pre-trained DL models, particularly CNN models, were shared during the pandemic as open-source algorithms, which may have made it easier for other researchers to use them as backbones to build on [98].
Given that most of the literature examined DL models using chest images as the main predictor, we recommend that future research expand on existing models and experiment with DL using presenting symptoms and laboratory markers as predictors. The retrieved studies suggest that the latter two indicators were used primarily in regression rather than DL modeling. The availability of COVID-19 data repositories may have driven the frequent use of secondary data for ML modeling. The urgency of expediting and facilitating COVID-19-related research during the early stages of the pandemic led scientific journals to encourage authors to share their data through publicly accessible COVID-19 data repositories. Most of these include Chinese data, followed by data from the United States, the United Kingdom, and the European Union [100]. This may explain the abundance of Chinese data in the studies retrieved for our systematic review.
However, many publicly available chest X-ray data are stored in non-standard formats with limited gray-shade levels, which may limit the generalizability of the models trained on them. Chinese scientists also had the highest rate of COVID-19-related research production, especially in the early stages of the pandemic [100]. This may be explained by the natural course of the COVID-19 pandemic, which spread from China to other parts of the world two to three months later. Shuja et al. evaluated the sharing of COVID-19 datasets during the pandemic and identified 23 medical datasets shared for COVID-19 research [101]. Drawbacks of these data included limited generalizability to other populations, small sample sizes, and challenges in accessing non-open-source data [101].
There are a few significant limitations to our study that should be mentioned. Due to the variability in ML models, datasets used, and accuracy measures, we could not synthesize pooled accuracy estimates. This variability also made it challenging to select the best method for ML modeling for prediction; different ML methods should be used depending on the context and desired prediction functions [98]. Moreover, our systematic review was limited to the search engines mentioned. Therefore, our review could have missed studies indexed outside these databases and in languages other than English. Despite these limitations, this systematic review provided a detailed summary of the data types, predictive measures, and accuracy measures reported in ML models used to predict the diagnosis and prognosis of COVID-19 in the early pandemic phase. It also provided a detailed critique of the quality of the published literature, something lacking in many of the available reviews published on this topic. We believe that our results can be used as a data source for future researchers to select existing models and publicly available data to experiment with in order to modify ML methods for enhancing healthcare delivery, especially with new developments in AI chatbots, such as ChatGPT, which have been used to explore possible causes of excess mortality in 2022 [102,103]. Further research is warranted on whether evolving AI chatbots could facilitate early integration of AI into future infectious disease outbreak responses, provided these models become more reliable [104-106].

Conclusions
The COVID-19 pandemic has caused unprecedented disruption to healthcare systems around the world. This has led many countries to adopt modern technological approaches that can be alternatives to high-cost and inaccessible medical investigations and management modalities for combating COVID-19. The research suggests that ML can serve as a helpful aid in localizing and segmenting COVID-19 lesions on chest images. However, due to the uncertainty around the selection of samples in such research and the ambiguity in controlling for essential confounders in the development of such ML models, the results of accuracy in disease prediction should be approached with caution. Nevertheless, this research is rapidly evolving and requires more efforts to validate and test the existing models to establish their efficacy in different population settings. Although this current technology should not replace the gold standard diagnostic method for COVID-19 via RT-PCR, we encourage researchers to continue the scientific battle against this pandemic, focusing their interests on developing large datasets from different countries on which the existing models can be tested. These can be formed into mega-data repositories. Finally, transparency about data sources and sampling techniques is also essential for scientists to improve the quality of ML diagnostic and prognostic research.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.