Artificial Neural Networks model for predicting patients’ mortality due to COVID-19

Introduction. COVID-19 is the pandemic of the century, and it has generated unusual circumstances. It has caused a scarcity of medical and human resources that has led health systems to collapse, especially in third world countries. Objective. To help medical staff understand the behavior of the pandemic, several studies have pointed to patient-related factors that affect the mortality risk of COVID-19 patients. In the current study, an Artificial Neural Network (ANN) was employed to predict COVID-19 mortality. Material and methods. The modeling phase used a database of 684 samples collected from Mohamed Seddik Ben Yahia Hospital of Jijel, containing the antecedent diseases and blood biomarkers data of patients. Firstly, 18 parameters were selected for the input layer based on literature recommendations and consultation with an expert medical team. Furthermore, the optimal inputs were modeled using the ANN and compared with the other models proposed in the literature. Subsequently, the proposed optimal model was used to develop a public graphical user interface (GUI) with Matlab software. Conclusion. Finally, a reliable and easy-to-use graphical interface, named "CoviSurv2021", is produced in the present study. It will be very useful to medical staff for selecting the priority patients who most urgently need hospitalization, prioritizing patients when the hospital is overloaded, and saving time to provide the necessary care.


Introduction
Human history has witnessed the spread of deadly diseases that took the lives of millions. To begin with, an infectious respiratory disease known as SARS spread in China 18 years ago; it was the first form of Coronavirus [1,2]. Later, in 2012, Saudi Arabia witnessed a deadly contagious disease known as Middle East Respiratory Syndrome [3,4]. In late 2019, a Coronavirus spread severely again in a new form, which was named SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). Coronavirus is a zoonotic virus, i.e., it is transmitted from animals to humans. The name Coronavirus refers to the crown-like spikes on the surface of these viruses [5]. Furthermore, four main genera of coronaviruses have been identified, known as Alpha, Beta, Gamma, and Delta [6]. The current pandemic caused by the newly emerged SARS-CoV-2 has created a uniquely dangerous situation around the world [7], which forced national governments to impose drastic measures to contain it, especially since the international economy was disastrously affected. SARS-CoV-2 causes a disease called COVID-19, which is marked by symptoms such as coughing, fever, chills, and a range of respiratory symptoms [8]. By the end of June 2021, the total number of confirmed COVID-19 cases had exceeded 187 million worldwide, while the total number of deaths approached 4 million. Unfortunately, recent studies suggest that this pandemic will stay with us for many years to come, which makes it essential to develop new methods that mitigate the dire consequences of COVID-19 by saving lives [9].
During a pandemic, the most important task is to manage patients diagnosed with COVID-19 by identifying those at a higher risk of mortality at the very onset of symptoms, so that they receive the appropriate treatment, bearing in mind that the condition of high-risk patients can deteriorate rapidly. Previously published studies reported that patients who died from COVID-19 initially had mild symptoms but suddenly progressed to a critical stage that caused their death [10]. For instance, in Italy, 75% of the deceased patients showed mild symptoms, such as fever, dyspnea, and cough, upon their admission to the hospital [10]. Thus, the development of an Artificial Intelligence (AI) interface to predict patient mortality is essential. Machine learning and AI are gaining popularity in the field of computer science. They have enhanced human life in many areas, providing solutions to many problems that were difficult to solve by classical methods. AI has been widely adopted in sectors such as science, engineering, business, weather forecasting, and medicine because of the impact it has had on them. In medicine, it has helped caregivers with medical decision-making through its multiple functions [11]. Well-known examples published in the literature are the prediction of the severity of COVID-19 infection and of the possibility of death. The latter is the subject of this research paper, with the help of ANN modeling. Since the emergence of the COVID-19 pandemic, several researchers have proposed machine learning (ML) models to predict the mortality risk of Coronavirus patients [12]. The significant benefit of such models is to aid medical staff in setting patient priorities, i.e., deciding who urgently needs attention or hospitalization first, prioritizing patients when the hospital is overcrowded, and saving time to provide the care needed.
Machine learning methods have proven to be effective in many applications [13]. A review of the published machine learning studies used to predict COVID-19 mortality is provided in Table 1. The blood biomarkers used in the majority of them have made the proposed models very helpful. This is because biomarkers can help identify COVID-19 patients who have a high risk of dying, by giving vital information concerning their health status, as reported in previous studies [14,15]. However, the aforementioned models are limited to a small number of input parameters for modeling the mortality risk in patients with COVID-19. Their main shortcoming is that they disregard antecedent disease parameters. This was probably done to simplify the complicated mechanism of the phenomenon under study, even though several additional factors are considered to improve model predictability. Zhang et al. (2021) carried out a clinical study on 82 COVID-19 patients, which reported that respiratory, cardiac, hemorrhage, hepatic, and renal antecedent diseases had caused the death of 100%, 89%, 80.5%, 78.0%, and 31.7% of patients, respectively. Most of the patients had elevated CRP (100%) and D-dimer (97.1%) [16]. Accordingly, using these parameters together with blood biomarkers could enhance the predictive capability of the model. Additionally, the majority of the available studies have presented their models in the form of equations, which makes them hard to use in the future. Such a presentation is of little practical use to other researchers and medical staff.
To overcome this limitation, we have presented our model as a simple, reliable, and easy-to-use interface (dubbed "CoviSurv2021"), which can predict COVID-19 mortality in future cases once the input parameters are provided. The proposed model can therefore be readily used and is available to anyone interested in this modeling problem, whatever their level of expertise.

Dataset
In this paper, the dataset contained samples of infected and deceased COVID-19 patients, collected from Mohamed Seddik Ben Yahia Hospital, Jijel, from August until December 2020. We browsed the patient files of the aforementioned hospital, which covered more than 1600 confirmed COVID-19 patients, including 684 fully sufficient samples from patients of both genders (male and female). It is worth mentioning that the disease was confirmed by a PCR test. We removed the incomplete data samples and missing values to conduct a reliable study. To make the modeling step accurate, a considerable effort was undertaken to balance the database, with a similar number of samples for recovered and deceased patients in the training and validation data. The data samples in the training and validation phases were randomly chosen and kept completely separate. Fig. 1 presents the methodology adopted in the present study. For a reliable study, we removed useless information and concentrated on the antecedent diseases and blood biomarkers parameters. A medical team of experts was consulted to make sure that all of the relevant inputs were used. Table 2 and Table 3 present an overview of the variables adopted in this study.

In an ANN, an input layer receives the external data, an output layer gives the solution to the problem, and a hidden layer is an intermediate layer that separates the other two. ANNs rely on mathematical algorithms, generally the back-propagation algorithm, to learn from real datasets and provide a mathematical model that can estimate the output parameter from the input ones in future studies [27]. Fig. 3 illustrates the structure of the ANN application. In addition, the principal mechanism of an ANN usually follows three stages: training, testing, and validation. The objective is to decrease the errors between the output and target values [28].

Fig. 3. ANNs architecture
The first stage is training, which iteratively adjusts the weight and bias values, frequently by using the popular "Back Propagation Algorithm", until the selected stopping criterion is met. The method thus yields the optimal model, which is formed by the weight and bias values and the transfer function. The latter is selected based on the type of problem under study (e.g., linear, log-sigmoid, or tan-sigmoid function) [29,30]. The optimal model is then used to estimate the target value from the input values with the minimum possible error. However, a model expressed only as weights, biases, and a transfer function is hard for medical staff to use in future studies. Accordingly, in the current study, we presented the model in the form of an easy-to-use and reliable interface.
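To make the roles of the weights, biases, and transfer functions more concrete, the following Python sketch evaluates a single forward pass through a small feed-forward network. It is only an illustration under stated assumptions: the sizes (18 inputs, 12 hidden nodes, one output) follow the architecture reported later in the paper, the weights are random placeholders rather than the trained values, and the log-sigmoid output function is an assumption, since the text does not state which output transfer function was used. All function and variable names are hypothetical.

```python
import math
import random


def tansig(x):
    """Tan-sigmoid (hyperbolic tangent) transfer function."""
    return math.tanh(x)


def logsig(x):
    """Log-sigmoid transfer function, mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tansig hidden layer -> logsig output."""
    hidden = [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder parameters (assumption): random weights, zero biases.
rng = random.Random(0)
N_INPUTS, N_HIDDEN = 18, 12
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b_out = 0.0

# With an all-zero input and zero biases, every hidden activation is 0,
# so the output is logsig(0) = 0.5.
zero_input = [0.0] * N_INPUTS
score = forward(zero_input, w_hidden, b_hidden, w_out, b_out)

# A random input yields a score strictly between 0 and 1.
random_input = [rng.random() for _ in range(N_INPUTS)]
random_score = forward(random_input, w_hidden, b_hidden, w_out, b_out)
```

Training with back-propagation would repeatedly adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` to reduce the error between such outputs and the target values; that step is omitted here.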

Performance evaluation of ANN Model
ANNs were trained with a varying number of nodes in the hidden layer (0-20 nodes) in order to find the optimal architecture. The performance of each model was assessed according to several measures extracted from the confusion matrix, a technique for summarizing and evaluating the performance of a classification model. It is an N x N matrix, also known as an error matrix, where N is the number of target classes; each row of the matrix represents the predicted class, and each column represents the actual class [31]. A confusion matrix is generally used to characterize and visualize the performance of a classifier and to give an overview of its misclassifications. In our study, the classification was binary, so the confusion matrix was a 2x2 matrix, because the target variable had two values, Positive or Negative, as shown in Table 4 [32]. The predictive quality of the suggested models was assessed via several performance measures and graphical presentations. The performance measures were Sensitivity, Specificity, Precision, and Accuracy, expressed in equations 1, 2, 3, and 4, respectively.
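As an illustration of the four measures named above, the following Python sketch computes them from the four cells of a 2x2 confusion matrix using their standard textbook definitions. The function name and the example counts are hypothetical; the counts are not taken from Table 4 or from the results of this study.

```python
def classification_measures(tp, fn, fp, tn):
    """Standard measures derived from a 2x2 confusion matrix.

    tp: true positives, fn: false negatives,
    fp: false positives, tn: true negatives.
    """
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn),
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp),
        # Share of predicted positives that were actually positive.
        "precision": tp / (tp + fp),
        # Share of all samples that were classified correctly.
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts, for illustration only.
measures = classification_measures(tp=40, fn=10, fp=5, tn=45)
```

With these illustrative counts, the sensitivity is 40/50, the specificity 45/50, the precision 40/45, and the accuracy 85/100.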

Descriptive data analysis
The database employed in this study consisted of a sample of 684 individuals (190 deaths and 494 recoveries) with 18 attributes (10 quantitative and 8 qualitative features). Table 5 shows the descriptive statistics of the quantitative variables, computed with SPSS, which were very dispersed. A considerable effort was made to gather as much patient information as possible in order to build a well-established database. The findings indicated that all variables were regularly distributed. Likewise, the results confirmed that the dataset covered a wide range of data. Consequently, this dataset could be used to develop new empirical models.

Evaluation of the COVID-19 mortality using ANN
To determine the most appropriate ANN model, the first phase involved choosing the input factors that most strongly influence the target, and the next phase was to define the optimal number of nodes in the hidden layer. To define suitable input factors for the ANN method, both the antecedent diseases and the blood biomarkers parameters were used, based on literature recommendations and consultation with an expert medical team. Afterward, we determined the ideal number of nodes in the ANN model for assessing COVID-19 mortality. The efficiency of each model during the training and validation phases is reported in Table 6. Four indicators, namely sensitivity, specificity, precision, and accuracy, were used to compare the suggested models and choose the optimal one.
The data were separated into two sections, i.e., 80% for training and 20% for validation. As Table 6 shows, COVID-19 mortality was modeled using the ANN method, where the network architecture was varied and the resulting models were compared using four indicators to determine the optimal one. In the training phase, the models produced sensitivity from 0.92 to 1, specificity from 0.639 to 1, precision from 0.792 to 1, and accuracy from 0.831 to 1. Similarly, in the validation phase, they produced sensitivity from 0.72 to 1, specificity from 0.467 to 0.846, precision from 0.692 to 0.909, and accuracy from 0.676 to 0.865. The best performance was obtained from the ANN model with 12 neurons in the hidden layer (18-12-1) and the Tan-Sigmoid transfer function. This model was deemed optimal because it produced the best overall results in terms of sensitivity (1/0.955), specificity (1/0.733), precision (1/0.84), and accuracy (1/0.865) in the training/validation phases. Fig. 4 presents the architecture of the optimal ANN model, which was trained in 25 steps. The best mean squared error was reached at the seventeenth step (MSEvalidation = 0.0118, MSEtraining = 0.0046, and MSEall = 0.0074), as presented in Fig. 5. The error histogram obtained for the optimal ANN model is presented in Fig. 6, where the green and blue bars denote validation and training data, respectively. The majority of errors between the output and target values ranged between -0.05 and 0.05. Furthermore, a classifier is preferably evaluated with the confusion matrix, an error matrix laid out as a table that helps investigators assess the effectiveness of the classifier.
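The 80%/20% separation mentioned at the start of this paragraph can be sketched in Python as a random, non-overlapping split of sample indices. This is a minimal illustration, not the procedure actually used in the study: the function name, the fixed seed, and the use of simple shuffling (rather than any class-balancing step) are assumptions.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Randomly split sample indices into disjoint training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 684 is the number of samples reported for the database.
train_idx, val_idx = split_indices(684)
```

With 684 samples, this gives 547 training indices and 137 validation indices, and no index appears in both sets.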
In the confusion matrix, the instances of the target class are generally represented in the columns, whereas the instances of the output class are represented in the rows, as illustrated in Fig. 8. The results indicated a good accuracy: 1 for training data, 0.87 for validation data, and 0.973 for all data. For all data, the ratio of patients correctly identified as survivors among the true survivors was 96.4%, the ratio of patients correctly identified as non-survivors among the true non-survivors was 98.6%, and the ratio of correct diagnoses (survival and non-survival) among all cases was 97.3%. Fig. 9 illustrates the ROC analysis of the training and validation data. ROC analysis was used to assess the true precision of the optimal model by relating diagnostic sensitivity to specificity. The axes of the curves were the true positive rate (calling the survival survival) and the false positive rate (calling the non-survival survival). The curves lie between 0 and 1; a curve close to the y-axis and the upper boundary indicates good results, whereas a curve along the 45° diagonal indicates ineffective learning. The results in Fig. 9 clearly indicated that the proposed ANN model had learned well. To test the effectiveness of the suggested ANN model, a comparative study was performed with the 11 empirical models proposed in the literature for predicting COVID-19 mortality, as presented in Table 1. The comparison was based on the accuracy of the optimal model. Published studies have treated accuracy as a significant indicator of predictive quality, with the optimal model being characterized by an accuracy close to 100%. The findings showed that the ANN model proposed in our study was the best performing model, with the maximum accuracy (97.3% for all data).
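The ROC analysis described above can be illustrated with a short Python sketch that sweeps a decision threshold over classifier scores and records the false positive rate and true positive rate at each step, then estimates the area under the curve with the trapezoidal rule. The function names, scores, and labels are hypothetical; label 1 stands for the positive class (survival, in the convention used above), and the data are not taken from Fig. 9.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points of a ROC curve.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for the positive class, 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step, from the strictest to the loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical, perfectly separated scores: the curve follows the y-axis
# and the upper boundary, which is the "good result" case described above.
points = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
auc = area_under_curve(points)
```

A classifier that ranks samples no better than chance produces points near the 45° diagonal and an area close to 0.5.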

Graphical User Interface (GUI) design "CoviSurv2021"
Most studies that use AI methods present their models in the form of equations, which makes the proposed models hard to use in the future. Such a presentation is of little practical use to other researchers and medical staff. To be genuinely useful, an ANN model should be presented as a programmed interface or a simple script written in a well-known programming language such as Matlab or Python [33]. In this form, the ANN model can be easily used and is available to anybody concerned with the modeling problem.
In the current study, an easy-to-use graphical interface was built on the most appropriate ANN model, as illustrated in Fig. 10.
The suggested model was used to design an open-access GUI. Matlab software was used to program the interface, called "CoviSurv2021". In this name, "Covi" refers to "COVID-19", "Surv" refers to survival, reflecting the aim of helping as many people as possible survive, and 2021 is the year in which the interface was designed. CoviSurv2021 takes the patient's personal information, computed tomography result, antecedent diseases, and blood biomarkers. Firstly, the user enters the patient's personal information: sex (male or female), age, and the computed tomography pulmonary damage ratio (0% to 25%, 25% to 50%, 50% to 75%, or 75% to 100%). Secondly, the user answers yes or no as to whether the patient has antecedent diseases, namely obesity, Insulin-Dependent Diabetes Mellitus (type 1), Non-Insulin-Dependent Diabetes Mellitus (type 2), High Blood Pressure, Lung Diseases, Cardiovascular Diseases, and Kidney Diseases. Thirdly, the user enters the blood biomarker analyses (Glycemia, Glomerular Filtration Rate, Albumin to Creatinine Rate, International Normalized Ratio, C-Reactive Protein, White Blood Cells, Platelet Count, and D-dimer). Finally, after the user clicks Run, the classification result appears in the outputs as either survival or non-survival, the latter meaning that the probability of death is high. The CoviSurv2021 interface will be very useful for all medical staff, helping them to select the patients who need attention first and most urgently need hospitalization, to prioritize patients when the hospital is overcrowded, and to save time to provide the needed care.
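The 18 inputs listed above can be gathered into one record and converted into a numeric vector before being passed to a model. The Python sketch below shows one possible way to do this. It is not the CoviSurv2021 implementation, which was programmed in Matlab: the field names, the coding of sex and yes/no answers as 1/0, the coding of the computed tomography damage ratio as a class from 1 to 4, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass
class PatientRecord:
    """Hypothetical container for the 18 inputs described in the text."""
    # Personal information and computed tomography
    sex_male: bool
    age: float
    ct_damage_class: int  # assumed coding: 1 = 0-25%, 2 = 25-50%, 3 = 50-75%, 4 = 75-100%
    # Antecedent diseases (yes/no)
    obesity: bool
    diabetes_type1: bool
    diabetes_type2: bool
    high_blood_pressure: bool
    lung_disease: bool
    cardiovascular_disease: bool
    kidney_disease: bool
    # Blood biomarkers
    glycemia: float
    glomerular_filtration_rate: float
    albumin_creatinine_rate: float
    international_normalized_ratio: float
    c_reactive_protein: float
    white_blood_cells: float
    platelet_count: float
    d_dimer: float

    def to_vector(self):
        """Return the inputs as a list of floats, in field order."""
        if self.ct_damage_class not in (1, 2, 3, 4):
            raise ValueError("ct_damage_class must be 1, 2, 3, or 4")
        return [float(value) for value in astuple(self)]


# Made-up example values, not real patient data.
example = PatientRecord(
    sex_male=True, age=63, ct_damage_class=2,
    obesity=False, diabetes_type1=False, diabetes_type2=True,
    high_blood_pressure=True, lung_disease=False,
    cardiovascular_disease=False, kidney_disease=False,
    glycemia=1.4, glomerular_filtration_rate=85.0,
    albumin_creatinine_rate=20.0, international_normalized_ratio=1.1,
    c_reactive_protein=48.0, white_blood_cells=7.5,
    platelet_count=210.0, d_dimer=0.6,
)
vector = example.to_vector()
```

The resulting vector has 18 entries, matching the 18 parameters of the input layer; units and any scaling of the biomarker values would have to follow those used when the model was trained.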

Discussion
After the worldwide spread of COVID-19, the classification of patient mortality risk became of great importance for both prevention and treatment allocation, especially with the latest variant, "Omicron", which spread faster and led to overcrowded hospitals and hence to the inability of medical staff to treat large numbers of patients. For this reason, there was an urgent need to understand and characterize the behavior of the pandemic. Several studies have pointed to the existence of patient-related factors affecting the mortality risk of COVID-19 patients. In the current study, an ANN was employed to predict COVID-19 mortality from collected data on patients' antecedent diseases and blood biomarkers. Only parameters that can serve as risk factors for evaluating likely mortality in COVID-19 patients were adopted. These parameters were chosen based on literature recommendations and consultation with an expert medical team. Afterward, a high-accuracy model was developed with ANN modeling using a large number of data. The findings clearly indicated that the optimal model was the one containing 12 nodes in the hidden layer and using the Tansig transfer function (18-12-2). This model provided the highest values in terms of sensitivity, specificity, precision, and accuracy compared to the other models. Furthermore, the newly developed model was assessed by comparing its accuracy with that of other models proposed in the literature. The conclusion was that the optimal ANN model was more accurate than the proposed empirical models. Our model was closely followed by the SVM model proposed by Gao et al. [26], which gave an acceptable accuracy and ranked second. Moreover, the results revealed the poor performance of the logistic regression model proposed by Bhandari et al. [17] in predicting COVID-19 mortality.
With respect to the performance of the machine learning models, the ranking was as follows: the model presented in our study, followed by the models of Gao et al. [26], Li et al. [24], Pourhomayoun and Shakibi [18], Ko et al. [19], Aktar et al. [20], Assaf et al. [22], Hu et al. [21], Chowdhury et al. [23], Tezza et al. [10], and Das et al. [25]. Finally, the proposed optimal model was used to develop a public GUI in order to facilitate its use in the future. This reliable and easy-to-use graphical interface, dubbed "CoviSurv2021", was programmed with Matlab software. The essential benefit of "CoviSurv2021" is to aid hospitals and medical services in selecting the priority patients who most urgently need hospitalization, prioritizing patients when the hospital is overcrowded, and saving time to provide the care needed. Furthermore, both the antecedent diseases and the blood biomarker analyses are usually fast, affordable, and promptly accessible in hospitals. At most, obtaining these parameters takes twenty-four hours in the same health facility where patients are receiving treatment. In other words, patients at risk of death can be identified in the early stages of their hospitalization and given the appropriate care, while patients who are not at risk can be directed to treatment at home to avoid overcrowding in hospitals. Our findings represent a useful contribution to the medical field. The model built in our study is a reliable tool for predicting COVID-19 mortality at the early stage of hospital admission. Its predictive performance is considerably improved compared to the other models proposed in the literature. In this pandemic crisis, the shortage of medical and human resources is causing major problems for the health system, especially in third world countries.
As a result, AI can play a significant role in the management of COVID-19 patients. In addition, CoviSurv2021 has several advantages. First, CoviSurv2021 can predict which patients are at high risk of death at the early stage of hospital admission (that is, within 24 hours of admission). Predicting mortality at the time of admission can be very informative for clinicians because, according to previous studies, the critical period of disease progression is from 10 to 14 days after the onset of symptoms. Additionally, CoviSurv2021 can provide advice on treatment priorities for people who need intensive treatment from the first day of illness, while people who are not at risk of dying can be referred for home treatment to avoid overcrowding in hospitals. Second, the mortality results predicted by AI are explainable and easily understood by medical researchers. Moreover, the interface will be easily usable by clinicians, nurses, and even patients at home. Finally, the database collected from hospitals could help doctors and researchers in future studies to better understand the complex behavior of COVID-19.
Despite the promising results obtained in the current study, a number of significant shortcomings need to be discussed. The main shortcoming is the small sample size used in the modeling, which could affect the accuracy of mortality identification. It might leave the suggested model unable to predict correctly from new input data that were not used in the training phase. Additionally, a small database in machine learning modeling can cause over-fitting and under-fitting problems. Therefore, investigators generally use large and diverse databases gathered by sharing knowledge among themselves. This is a significant concern for future research, which should collect data from several countries to improve learning and thereby produce a better model. Further modeling of COVID-19 mortality using meta-heuristic algorithms is strongly suggested. It is worth noting that meta-heuristic algorithms have shown high efficiency when combined with machine learning methods, enhancing their training and speeding up convergence to the optimal parameters [34][35][36]. Demographic, genetic, and sociological factors could also influence the mortality of COVID-19 patients, and we therefore strongly encourage further studies on these parameters. We seek to further develop the CoviSurv2021 interface in order to turn it into an information source that helps hospitals and medical services.

Conclusion
This study provides a simple, reliable, and easy-to-use interface, dubbed "CoviSurv2021", for predicting the mortality risk of patients suffering from COVID-19. Firstly, a considerable amount of data on patients' antecedent diseases and blood biomarkers was collected. These parameters were selected as risk factors for evaluating the likely mortality of COVID-19 patients based on literature recommendations and consultation with an expert medical team. Eighteen parameters were selected for the input layer. Several ANN models were then built to model COVID-19 mortality from the antecedent disease and blood biomarker data of the patients. Afterward, the proposed models were evaluated via four performance measures (sensitivity, specificity, precision, and accuracy). The comparison between the different models revealed the superiority of the (18-12-2) model, which gave the highest accuracy during both the training and validation phases. The accuracy values were used to compare the performance of the optimal ANN model with that of the models proposed in the literature. The findings indicate that the aforementioned ANN model is more effective than the empirical models. Finally, the proposed optimal model was used to develop a public GUI with Matlab software, resulting in the reliable and easy-to-use graphical interface "CoviSurv2021". The fundamental benefit of "CoviSurv2021" is to help medical staff select the first-priority patients who most urgently need hospitalization, prioritize patients when the hospital is overcrowded, and save time to provide the care needed. This work has opened up several questions that need further investigation in order to address certain limitations. Firstly, there is a need to use more data from other countries to improve the training phase. Secondly, we recommend combining meta-heuristic algorithms with machine learning methods for predicting COVID-19 mortality in future studies. These algorithms have proved highly efficient, enhancing training and converging rapidly to the optimal solution.