Accuracy of Machine Learning Models to Predict Mortality in COVID-19 Infection Using the Clinical and Laboratory Data at the Time of Admission

Aim This study aimed to develop a model to predict mortality in patients with coronavirus disease 2019 (COVID-19) from the basic medical data available on the first day of admission. Methods The medical data, including the demographic, clinical, and laboratory features on the first day of admission of clinically diagnosed COVID-19 patients, were documented. The outcome of each patient was also recorded as discharge or death. Feature selection models were then implemented, and different machine learning models were developed on top of the selected features to predict discharge or death. The trained models were then tested on the test dataset. Results A total of 520 patients were included in the training dataset. Feature selection identified 22 features as the most powerful predictors. Among the different machine learning models, the naive Bayes demonstrated the best performance with an area under the curve of 0.85. The ensemble model combining the naive Bayes and neural network performed slightly better, with an area under the curve of 0.86. The models performed comparably on the test dataset. Conclusion Developing a predictive machine learning model based on the basic medical features on the first day of admission in COVID-19 infection is feasible with acceptable performance.


Introduction
In December 2019, a mass outbreak of coronavirus occurred in Wuhan, China. The World Health Organization (WHO) formally named the disease coronavirus disease 2019 (COVID-19) on February 11, 2020. Because of the rapid spread of the virus, there has been a sharp rise in the demand for medical resources to support infected people [1]. In about 20% of patients, the infection is severe and may necessitate hospitalization. A mortality rate of about 13.4% has been reported in the severe form of the disease [2,3]. Risk assessment and knowledge of the predictors of death among patients diagnosed with COVID-19 are crucial to target high-risk patients through early and more intensive interventions [4]. In addition to efficient diagnosis and treatment, accurate prognosis prediction is necessary to reduce the strain on healthcare systems and provide the best possible care for patients. When allocating limited medical resources, prediction models that estimate the risk of a poor outcome in an infected individual based on prediagnosis information could help to triage patients more effectively [5].
So far, several predictive models for COVID-19 have been used, and it appears that the most important medical features are age, body temperature, gender, creatinine level, and blood pressure [6]. However, designing a standard study about prognosis prediction is challenging. Many studies suffer from weak design and likely overestimation of the model performance, as was evident in a systematic review by Wynants and colleagues. They reported that most predictive models are prone to high bias, mainly because of selection bias, unclear reporting, and a high chance of overfitting [6].
Machine learning (ML), a branch of artificial intelligence (AI) that learns from past data to build predictive models, can help in this circumstance [7]. In recent years, ML has been developed as a useful tool to analyze large amounts of data from medical records or medical images [8]. Recently, there have been advances in the prediction of COVID-19 outcomes using ML. These advances include estimating the mortality risk in patients with suspected or confirmed COVID-19, predicting progression to a severe or critical state, and predicting the duration of hospital stay [9].
This study aimed to develop an ML platform for outcome prediction in admitted COVID-19 patients. We aimed to rely on clinical and laboratory features accessible in small rural medical centers rather than sophisticated data such as computed tomography (CT) images.

Materials And Methods
This study was approved by the ethical committee of our university (Arak University of Medical Sciences). Informed consent was obtained from all subjects upon admission.

Retrospective part of the study: training dataset
The electronic medical records of patients admitted to a tertiary referral hospital from March 20, 2020, to November 20, 2020, were evaluated retrospectively. All adult patients (age above 18 years) with a clinical diagnosis of COVID-19 infection were included. The clinical diagnosis corresponded to the physician's diagnosis based on the patient's exposure to positive cases of COVID-19 and positive clinical and imaging findings. The polymerase chain reaction (PCR) test was not used as part of the inclusion criteria; however, it was used as one of the predictive features to predict mortality. Patients who left the hospital against the physicians' recommendation were excluded. All patients with a clinical diagnosis of COVID-19 who died during admission were recorded. The patients admitted and discharged with clinical COVID-19 infection during the same time interval outnumbered the expired patients; however, only 270 discharged patients were randomly included to avoid class imbalance.
Multiple medical features (predictive features) related to the patients' demographics, medical history, and clinical and laboratory findings "at the time of admission" were extracted and documented (Table 1). The outcome (target feature) was recorded as discharge or death. The mentioned features were analyzed with the Orange data mining platform version 3.27 (Bioinformatics Lab at University of Ljubljana, Slovenia) [10]. Upon feature selection, the most important predictive features were selected. Subsequently, different ML models were trained on the selected features to predict the outcome (discharge versus death). The performance (sensitivity, specificity, accuracy, and area under the curve (AUC)) of each model was then evaluated and reported by 10-fold cross-validation. Moreover, different ensemble models were developed by combining various ML models. The performances of the ensemble models were also assessed and reported by 10-fold cross-validation.
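The workflow above (train per-class models on the selected features, then score them by 10-fold cross-validation) was carried out in the Orange platform. As an illustration only, a minimal from-scratch sketch of the same idea is shown below; the Gaussian naive Bayes, the synthetic 520-patient dataset, and the 22-feature dimensionality are stand-ins for the real records, not the study's actual pipeline:

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: class priors plus per-feature normal densities."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        return self

    def predict(self, X):
        # Log-posterior (up to a shared constant) for each class.
        log_post = []
        for p, mu, var in zip(self.priors_, self.means_, self.vars_):
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
            log_post.append(np.log(p) + ll)
        return self.classes_[np.argmax(np.column_stack(log_post), axis=1)]

def cross_val_accuracy(model, X, y, k=10, seed=0):
    """Plain k-fold cross-validation returning the per-fold accuracies."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = model.fit(X[train], y[train]).predict(X[test])
        accs.append(float(np.mean(pred == y[test])))
    return accs

# Synthetic stand-in: 520 "patients", 22 selected features, binary outcome
# (0 = discharge, 1 = death); the class-1 feature means are shifted upward.
rng = np.random.default_rng(42)
n, d = 520, 22
y = (rng.random(n) < 0.5).astype(int)
X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, d))
accs = cross_val_accuracy(GaussianNaiveBayes(), X, y)
print(round(float(np.mean(accs)), 3))
```

On real clinical data the folds would normally be stratified by outcome to preserve the death/discharge ratio in each split; the unstratified loop above is kept deliberately simple.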

Prospective part of the study: test dataset
Upon completion of the study, the final trained models were tested on a test dataset. For the test dataset, patients with a clinical diagnosis of COVID-19 who were admitted to the same hospital were included prospectively from November 21, 2020, to February 3, 2021. Upon evaluating the medical records, the demographic, medical history, and clinical and laboratory data at the time of admission were collected in the same fashion as described for the training dataset. All consecutive patients in the mentioned time window were included. Patients who left the hospital against the physicians' recommendation were excluded. The data collection did not interfere with patient management, and the physicians and nurses who took care of the patients were not informed of the study. The trained models were then tested on this prospective dataset to predict the outcome. The performance of each model was then calculated from the confusion matrix.
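As a reminder of how the reported test-set metrics follow from a binary confusion matrix, a small helper is sketched below; the counts are hypothetical, chosen for illustration, and are not the study's actual figures:

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Sensitivity, specificity, and accuracy from binary confusion-matrix counts.
    Here "positive" means the death outcome."""
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 80 deaths flagged, 20 deaths missed,
# 70 discharges correctly predicted, 30 false alarms.
sens, spec, acc = metrics_from_confusion(tp=80, fp=30, fn=20, tn=70)
print(sens, spec, acc)  # 0.8 0.7 0.75
```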
All statistical analyses were performed using SPSS version 22 (Chicago, IL: SPSS Inc.) and R version 3.3.3 (Vienna, Austria: R Foundation for Statistical Computing). A P-value below 0.05 was considered statistically significant.

Results
After implementing the inclusion and exclusion criteria, 520 patients were included in this project, with a mean age of 67.48 years and a standard deviation (SD) of 16.27; 264 patients (50.77%) were female. Feature selection identified 22 features as the most powerful predictors. In 10-fold cross-validation on the training dataset, the naive Bayes model showed the best performance with an AUC of 0.85, and the ensemble of the naive Bayes and neural network performed slightly better with an AUC of 0.86. The trained models were then tested on the test dataset. Again, the naive Bayes had the best performance with an AUC of 0.82, an accuracy of 80%, a sensitivity of 81%, and a specificity of 76%. The ensemble model from the naive Bayes and neural network combination had slightly better performance with an AUC of 0.85, an accuracy of 81%, a sensitivity of 85%, and a specificity of 61% (Table 4).
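The text does not specify how the naive Bayes and neural network outputs were combined into the ensemble. One common choice, shown here purely as an assumed illustration, is soft voting: averaging the two models' predicted death probabilities and thresholding the mean:

```python
def soft_vote(probs_nb, probs_nn, threshold=0.5):
    """Soft-voting ensemble of two models' predicted probabilities.
    Returns 1 (death) where the mean probability reaches the threshold,
    else 0 (discharge). Both inputs are per-patient probability lists."""
    return [1 if (p1 + p2) / 2 >= threshold else 0
            for p1, p2 in zip(probs_nb, probs_nn)]

# Hypothetical per-patient death probabilities from the two models.
nb = [0.90, 0.20, 0.55]
nn = [0.70, 0.40, 0.35]
print(soft_vote(nb, nn))  # [1, 0, 0]
```

Other combination rules (majority voting, stacking a meta-learner on the base models' outputs) are equally plausible readings of "ensemble" here.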

Discussion
Prognosis prediction is critical for the triage of patients with COVID-19. In this context, ML-based models appear promising. For instance, Vaid et al. implemented Extreme Gradient Boosting (XGBoost) on medical records [11]. They could predict mortality on days three, five, and seven after hospitalization with AUCs of 0.89, 0.85, and 0.84, respectively. In another AI model based on a convolutional neural network (CNN), researchers differentiated the progressive form of infection from the nonprogressive form using CT images and clinical data with an AUC of 0.91 [12].
In this study, we trained various ML models to obtain the best performance for mortality prediction in patients with COVID-19 infection based on clinical and laboratory findings on the first day of admission. We tried to select the training dataset in a way that does not suffer from class imbalance. By feature selection, we also tried to avoid overfitting. The trained models were then tested on the test dataset to make sure there was no substantial overfitting. By evaluating the medical data from 1270 COVID-19 patients, Guan et al. found the following predictive features for mortality in COVID-19: age and the levels of high-sensitivity C-reactive protein (hs-CRP), lactate dehydrogenase (LDH), ferritin, and interleukin-10 (IL-10). The subsequent Simple-tree XGBoost model predicted mortality with an accuracy above 90% [16]. In a nationwide cohort of COVID-19 in South Korea, the researchers evaluated the data from 10,237 patients. They reported age above 70 years, male sex, moderate or severe disability, nursing home residence, diabetes mellitus, chronic lung disease, and asthma as significant predictors of mortality. They reported an AUC of 0.96 for the least absolute shrinkage and selection operator (LASSO) and linear support vector machine (SVM) models using these features [5].
By evaluating the studies mentioned above, it seems that the LDH level is a common predictive feature in most of them, including our study. However, there is significant variation among the other selected features. This difference can be partially explained by the demographic differences between these studies (e.g., different ethnicities, mean age, access to medical services, etc.).
There is controversy about the role of comorbidities in predictive models for COVID-19. In our study, the comorbidities were not found to be significantly predictive. Several previously published studies emphasized the role of comorbidities in the prediction of mortality in COVID-19 patients. For instance, Wollenstein-Betech et al. trained different models based on demographic data and comorbidities of symptomatic COVID-19 patients. In their study, SVM and logistic regression (LR) models showed the best performance in predicting hospitalization, mortality, need for ICU admission, and need for a ventilator, with accuracies of 72%, 79%, 89%, and 90%, respectively [17]. In another study, hypertension, the most common comorbidity in COVID-19 patients, had a controversial effect on severe illness and mortality [18]. In a meta-analysis by Silverio et al., hypertension did not make an independent contribution to the in-patient mortality rate [19]. However, in some other surveys, hypertension showed a strong correlation with the severity and fatality of COVID-19 [20]. Diabetes and dyslipidemia had predictive value for severe illness and mortality in different studies [20][21][22][23].
Our study has several limitations. The laboratory data are fairly accurate; however, the clinical data (e.g., vital signs) were documented from the medical records. The healthcare workers in our medical center did not receive additional training for our study and were blind to it; thus, the recorded clinical data may have been influenced by staff experience and accuracy. Another limitation is the lack of imaging biomarkers, which were previously shown to have predictive value [24,25]. However, we tried to focus on easily accessible medical data that are feasible to collect in small medical centers. Many other factors could interfere with the predictive potency of our findings, such as the therapeutic strategy for each patient, underlying deficiency of vitamins/minerals [26,27], interleukin levels [28], and the different virus mutants/variants [29,30]. In addition, our study differs from most similar studies. In most studies, the PCR test was used as an inclusion criterion for COVID-19 infection. In this study, however, patients with a "clinical diagnosis" of COVID-19 who underwent standard-of-care treatment were included, and the PCR test was used as a predictive feature rather than an inclusion criterion. This strategy was selected to address false-negative PCR tests. Finally, this study has the intrinsic limitations of machine learning, especially the "black box" problem: the processes of decision making and prognosis prediction are not transparent in these models. Although we tried to avoid overfitting by feature selection, all patients included in this study belong to a specific region, and it is unclear whether the final models would work well for other ethnicities.

Conclusions
Prediction of mortality from the basic medical data on the first day of admission is feasible using ML models with acceptable accuracy. Such models may result in better patient triage and improve these patients' survival by referring high-risk cases to well-equipped centers at the time of admission. In our study, the naive Bayes model showed the best performance for this task. The selected features in our research are accessible medical data that can feasibly be recorded in small rural clinics. In the future, these models' performance may be improved by adding sophisticated medical data and images from tertiary medical centers.

Additional Information Disclosures
Human subjects: Consent was obtained or waived by all participants in this study. Ethical Committee of Arak University of Medical Sciences issued approval #IR.KHOMEIN.1399.006. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.