COVID-19 Death Risk Assessment in Iran using Artificial Neural Network

Since COVID-19 became a pandemic, it has posed a unique public health concern worldwide because of its high death rate. The disease is caused by SARS-CoV-2, a coronavirus from the same family as the viruses that cause Middle East Respiratory Syndrome (MERS) and severe acute respiratory syndrome (SARS). Risk assessment is a vital step toward disease risk reduction: it improves understanding of the risk factors associated with the disease and allows existing data to inform adequate preventive and mitigation measures. Machine learning techniques have gained strength since 2000, as they play a crucial role in data analysis and help in developing standard mortality models. This study aims to find the best Artificial Neural Network (ANN) model for analysing the risk factors that contribute to the high mortality and morbidity associated with COVID-19 in Iran, and to predict the risk of death for people in different situations. A systematic review and meta-analysis were examined, using patient risk factor data from published studies to estimate COVID-19 death risk. Risk factors for the disease were extracted from an existing study, and the ANN was then used to compute the best risk prediction for the disease. After different numbers of hidden neurons and different training functions were assessed, the ANN model with 5 hidden neurons trained with the Bayesian Regularization algorithm gave the most satisfactory results. The coefficient of determination (R) and Root Mean Square Error (RMSE) were 9.99999e-1 and 4.54201e-19, respectively.


Introduction
Since its emergence, coronavirus disease 2019 (COVID-19) has been a great threat to both global health and the socio-economic situation of almost all countries. The virus was first reported in Wuhan City, Hubei Province, China, with cases occurring continuously since December 12th, 2019, and its origin was linked to the South China Seafood Market [1]. On December 29th, 2019, the first 4 cases of COVID-19 were reported, all linked to the Huanan (South China) Seafood Wholesale Market. Following this, the World Health Organization (WHO) China Country Office was informed on December 31st, 2019 of a pneumonia of unknown cause detected in the city of Wuhan [2]. COVID-19 has been associated with high morbidity and mortality among both young and old people worldwide, but the origin of the disease remains unknown. Human-to-human transmission is one of the biggest challenges in curtailing the epidemic, and it led the WHO to declare a global health emergency on January 30th, 2020. The virus that causes COVID-19 is a zoonotic virus belonging to a large family of positive-sense, single-stranded RNA viruses that can cause mild to moderate upper respiratory tract illnesses, such as the common cold, as well as more severe infections with symptoms similar to those of Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) [3,4]. COVID-19 spread rapidly to other countries, including South Korea, Japan, Thailand, Singapore, Italy, the US, the UK, Spain and Iran, and on March 11th, 2020, the WHO classified it as a pandemic [1]. As of October 22nd, 2020, WHO reports showed 41,104,946 confirmed cases and 1,128,325 deaths globally [5]. Early in the outbreak, Iran reported a rapidly increasing number of infections in the Middle East and the highest cumulative number of deaths outside China.
As of October 23rd, 2020, there were 550,757 COVID-19 cases in Iran, including 31,650 associated deaths. Six neighbouring countries (Afghanistan, Bahrain, Kuwait, Iraq, Oman, and Pakistan) had reported cases imported from Iran [6]. According to statistics reported as of October 23rd, 2020, Iran ranked 13th in the number of deaths due to COVID-19. Multiple reports have shown that pre-existing illnesses, such as cardiovascular disorders, respiratory diseases, cancer, infectious diseases and substance abuse, can increase the morbidity and mortality of COVID-19. Existing data from China indicate that cardiovascular disease, diabetes and hypertension are highly prevalent among patients. The disease can worsen existing conditions and thereby lead to death. Risk assessment is therefore important for reducing the risk of this disease: it helps in understanding the associated risk factors and provides the opportunity to decide on adequate preventive and mitigation measures [7,8].
Since 2000, machine learning techniques have gained strength in data analysis, playing a vital role in epidemiological data analysis through the development of standard mortality models [3]. Artificial Neural Networks (ANNs) are statistical approaches for modelling complex non-linear relationships in spatial epidemiology. The technique is considered one of the most efficient methods for processing large data sets, which can be analysed computationally to reveal trends and patterns and to produce predictions and forecasts.
These techniques have been applied in a number of areas, including artificial intelligence, public health, epidemiology, economics, environmental science and agriculture [8,9]. An ANN is a collection of a large number of units called neurons. Each neuron receives information, processes it, and sends the result to other neurons. The ANN structure is computer-based and consists of several basic processing elements operating in parallel. The most common ANN is a three-layer network consisting of an input layer, one or more hidden layers, and an output layer. The input layer holds the independent variables, which are linked to the hidden layer for processing. The hidden layer applies activation functions to weighted combinations of the inputs, and the weights capture the influence of the predictors on the dependent variables. The prediction is completed in the output layer, and the results are given with a small estimation error. Generally, a feedforward network is trained with a backpropagation algorithm, which learns the relation between a defined set of input-output pairs in the training phase [10].
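The layer structure described above can be illustrated with a minimal sketch of one forward pass through a feedforward network. It assumes a hyperbolic tangent activation in the hidden layer and a single linear output neuron, and it borrows the 9-5-1 layout reported later for the ANN model. The weights are random placeholders and the input vector is hypothetical; neither comes from the study.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear output neuron."""
    # Each hidden neuron applies tanh to a weighted sum of the inputs plus a bias.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # The output neuron is linear: a weighted sum of hidden activations plus a bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder (untrained) weights for a 9-5-1 network; assumption for illustration.
rng = random.Random(0)
n_inputs, n_hidden = 9, 5
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# Hypothetical encoded input (e.g. age group, gender and comorbidity indicators).
x = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

Training would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` so that the output matches the target values; the sketch only shows how a prediction is produced once the weights are fixed.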
Unfortunately, to date there is no global, standard response to the COVID-19 pandemic, and each country is facing the crisis based on its own resources, expertise and hypotheses. As a result, the criteria for testing, hospitalisation and case estimation differ between countries, which makes it difficult to estimate the number of people affected by the epidemic. Early screening and risk estimation of the disease can help reduce the chance of death and improve the survival rate [11]. This study therefore aimed to evaluate the ability of an ANN-based method to assess the risk factors associated with the high morbidity and mortality of COVID-19 and to estimate the risk of an infected patient dying. To this end, we compared results from the conventional method and ANN modelling.

Conventional Method
The data were extracted from a published retrospective epidemiological study of hospitalized COVID-19 cases at Baqiyatallah hospital in Tehran, Iran, covering patients admitted from February 19th, 2020, to April 15th, 2020 [12]. The study variables were as follows: age, gender, final outcome (death or survival), and type of comorbidities. The numbers of infected people and death cases varied by gender and by comorbidity. Descriptive analyses of the variables were expressed with the risk factors represented as LEL=100, MEL=100. LEL (lower energy level) included data for death cases of COVID-19 by age and gender as stated in the journal (Table 1 and Table 2). It was represented as: (1)

MEL included data for death cases of COVID-19 with pre-existing medical condition as also stated in the journal (Table 1 and

Table 1. Risk factors of the model [12].

Age
Gender
Pre-existing medical conditions (Cancer, Hypertension, Chronic Respiratory disease, Diabetes, Chronic Kidney disease, Cardiovascular disorder)

Artificial Neural Network (ANN)
The ANN was trained in MATLAB using the outputs of the conventional model as targets. To develop the machine learning model, a two-layer feed-forward neural network was designed. Its input data were the age, gender, and comorbidities of COVID-19 death cases in Iran (Table 2), and its target data were the outputs of the conventional model. The network was trained with the 'TRAINLM' function, which applies the Levenberg-Marquardt (LM) algorithm. A total of 35 data samples were used: 25 for training, 5 for validation and 5 for testing. The network structure, shown in Figure 1, has 9 inputs, 5 hidden neurons and 1 output. The 'TANSIG' transfer function was used for the hidden layer and a linear transfer function for the output layer. The number of hidden layers, the number of neurons, and the training algorithm are all influential parameters in defining the best prediction model using ANN.
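The 25/5/5 partition of the 35 samples can be sketched as follows. The text does not state how samples were assigned to each subset, so a seeded random shuffle is assumed here; the function name `split_indices` and the seed are illustrative choices, not part of the study.

```python
import random


def split_indices(n_samples, n_train, n_val, n_test, seed=0):
    """Randomly partition sample indices into training, validation and test sets."""
    if n_train + n_val + n_test != n_samples:
        raise ValueError("split sizes must add up to the number of samples")
    indices = list(range(n_samples))
    # Seeded shuffle so the partition is reproducible (assumption, not from the paper).
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(35, 25, 5, 5)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 25 5 5
```

With only 5 validation and 5 test samples, the measured error can change noticeably from one random partition to another, so repeating the split with different seeds is a reasonable robustness check.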

Conventional Modelling Result
Based on Table 2, the probability that a patient would die if confirmed to have COVID-19 was generated. The probability of dying of COVID-19 was calculated separately from age and gender and from pre-existing medical conditions. In the conventional method, using all the information extracted from previous studies (Table 2), an algorithm was written to obtain the probability for each disease based on the various risk factors. The algorithm produced one value for the probability that a patient will die of COVID-19 depending on age, gender and pre-existing medical conditions, and the method was repeated 35 times (see Appendix A). The probabilities for the 35 samples were calculated using MATLAB.
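The formulas used in the conventional method are not reproduced in this text. As an illustration only, one common way to estimate such a probability from tabulated counts is the relative frequency, P(death | group) = deaths in group / cases in group. The sketch below assumes that form; the group labels and counts are hypothetical and are not values from Table 2.

```python
def death_probability(deaths, cases):
    """Relative-frequency estimate of P(death | group) = deaths / cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must be between 0 and cases")
    return deaths / cases


# Hypothetical counts for illustration only; these are NOT values from Table 2.
example_groups = {
    ("male", "60+"): (12, 80),    # (deaths, cases)
    ("female", "60+"): (6, 60),
}

probabilities = {
    group: death_probability(deaths, cases)
    for group, (deaths, cases) in example_groups.items()
}
for group, p in probabilities.items():
    print(group, round(p, 3))
```

How the age-and-gender estimate and the comorbidity estimate were combined into a single value per sample is not stated here, so the sketch stops at the per-group probabilities.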

Artificial Neural Network (ANN) Result
Training was carried out on a total of 35 data samples. The network structure had 9 inputs, 5 hidden neurons and 1 output. The ANN was optimized using the Bayesian Regularization algorithm.
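The abstract reports the fit in terms of R and RMSE. A minimal sketch of how these two quantities can be computed from target and predicted values is given below. It assumes that R is the Pearson correlation coefficient between targets and predictions (its square gives a coefficient of determination for a linear fit); whether the study used R or its square is not confirmed by the text. The example values are hypothetical.

```python
import math


def rmse(targets, predictions):
    """Root mean square error between target and predicted values."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)


def pearson_r(targets, predictions):
    """Pearson correlation coefficient between target and predicted values."""
    if len(targets) != len(predictions) or len(targets) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical targets (conventional model) and ANN predictions, for illustration.
targets = [0.10, 0.25, 0.40, 0.55]
predictions = [0.12, 0.24, 0.41, 0.53]
print(pearson_r(targets, predictions), rmse(targets, predictions))
```

An R close to 1 together with an RMSE close to 0 indicates that the network reproduces the conventional model's outputs closely; it does not by itself show how well the model would predict outcomes for new patients.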

Discussion
Several similar studies provide context for this work. Assaf et al. [13] used three methods, namely Neural Network, Random Forest, and Classification and Regression Decision Tree (CART), to predict patient deterioration. They reached 88.0% sensitivity, 92.7% specificity and 92.0% accuracy in predicting critical COVID-19. Dhamodharavadhani et al. [3] employed Statistical Neural Network (SNN) models, namely Probabilistic Neural Network (PNN), Radial Basis Function Neural Network (RBFNN), and Generalized Regression Neural Network (GRNN). They showed that the PNN and RBFNN-based MRP models had the best performance among all models for COVID-19. Clift et al. [14] derived and validated a risk prediction algorithm to estimate hospital admission and mortality outcomes from COVID-19 in adults. The models were fitted in the derivation cohort to derive risk equations using a range of predictor variables. Although the QCOVID population-based risk algorithm performed well, showing very high levels of discrimination for deaths due to COVID-19, the algorithm will be regularly updated as the understanding of COVID-19 changes and as more data become available. The present study could be extended to other diseases to help the healthcare system respond more effectively during an outbreak or a pandemic, especially with most countries currently experiencing a second wave of COVID-19.

Conclusion
In this study, we aimed to assess the utility of machine-learning models for predicting the risk of death for patients infected with COVID-19 based on their gender, age group and pre-existing medical conditions. Machine Learning (ML) models enabled assessment of the relations between the inputs and outputs of complex processes. ML models generated better performance than traditional prediction models, owing to their ability to reveal nonlinear associations and to optimize over multiple factors. The data analysed were limited in number and came from a case study in Iran only; extended studies with larger samples of the variables and their relationship with the disease could therefore provide more accurate information for COVID-19 monitoring. Given the large number of deaths associated with COVID-19, especially among people with pre-existing medical conditions, the proposed method can help medical facilities, hospitals, and caregivers decide which patients need attention first, especially when many patients must be attended to. It can also help eliminate delays in providing the necessary care, in order to reduce the risk of a patient with COVID-19 dying.