Forecast of the outbreak of COVID-19 using artificial neural network: Case study Qatar, Spain, and Italy

The present study predicts and analyzes the growth and spread of the COVID-19 pandemic using an artificial neural network (ANN). The first wave of the pandemic outbreak of the novel coronavirus (SARS-CoV-2) began in December 2019 and continued to March 2020. As declared by the World Health Organization (WHO), the virus has affected populations across the globe, and its accelerated spread is a universal concern. An ANN architecture was developed to predict the impact of the outbreak in Qatar, Spain, and Italy. Official statistical data gathered from each country until July 6th were used to validate and test the prediction model. Model performance was assessed using the root mean square error (RMSE), the mean absolute percentage error, and the coefficient of determination (R²); R² between the predicted and reported infected and death cases reached 0.99 for the dates considered. The verified and validated growth model for these countries reflected the effects of the measures taken by governments and the medical sector to mitigate the pandemic, slow the spread of the virus, and reduce the death rate. The differences in the spread rate were related to different exogenous factors (such as social, political, and health factors, among others) that are difficult to measure. The simple and well-structured ANN model can be adapted to different propagation dynamics and could help health managers and decision-makers better control and prevent the occurrence of a pandemic.


INTRODUCTION
The outbreak of the novel coronavirus disease (COVID-19), which began in December 2019 in the city of Wuhan (China) and spread rapidly to other countries in March 2020, has had wide-ranging impacts on public health and the world economy [1][2][3]. The virus was first detected and isolated from a single patient in late December and was subsequently identified and verified in additional patients [4,5]. Genome analysis of the novel virus, SARS-CoV-2, revealed up to 96% sequence similarity to bat coronaviruses, and the virus shares properties with other pathogenic coronaviruses such as SARS-CoV and MERS-CoV [6][7][8][9][10]. Different studies have shown that the main concern related to SARS-CoV-2 is its high transmission potential, which has been responsible for the global COVID-19 outbreak [11][12][13]. The rapid spread of COVID-19 results from several transmission routes, of which direct contact between humans has proved to be the most effective [14]. Close contact facilitates rapid transmission because droplets are expelled when infected individuals cough or sneeze [15,16]. As the virus has been shown to be airborne, it has a higher chance of transmission and contagiousness, and it can survive for up to 9 days on different materials [17][18][19]. Based on preliminary observations, COVID-19 was proposed to have an incubation period of 3 to 10 days, with an average of 5.2 days [20,21]; other studies suggest a period of 2 to 14 days [22,23]. Furthermore, individuals in the incubation period can transmit the infection, which makes identification very difficult and means that the number of infected people is likely higher than the official count [24].
The number of COVID-19 cases exceeded 177,419,000 on July 16, 2021, in more than 110 countries, with a total death toll of up to 3,839,600 people. Because of the high risk posed by the rapid spread of the virus, the Director-General of WHO declared it a global pandemic with direct effects on public health requiring urgent international attention [25,26]. This declaration encouraged all countries to take serious remedial action to prevent the spread of the virus among their citizens, protect public health, and, if required, ban travel and close borders. In Europe, the outbreak started in northern Italy on February 22, 2020, leading to closure on March 9, 2020, and the suspension of all national activities on March 21, 2020. Spain then closed commercial activities on March 15, 2020 [27]. In the Middle East, the first cases were reported in Qatar at the beginning of March, and the number rose to more than 400 within 25 days, leading to the closure of the industrial city and the suspension of schools and universities, followed by a full closure within 20 days. In other countries, precautions focused on initial patient screening and mass testing of all suspected cases and anyone who had been in contact with them, isolating only the affected people and areas [28,29]. Despite all these measures, the virus continued to spread throughout the world, leading to the largest quarantine in history. Between May 10, 2020 and June 15, 2021, confirmed infections increased from 3,900,000 to more than 177,000,000, and deaths from more than 270,000 to more than 3,800,000, in more than 180 affected countries [30].
COVID-19 is more infectious than SARS or MERS, with high basic reproduction numbers (R0), and cases have accumulated across the globe at an alarming rate, causing more infections and high mortality [24]. The fast spread and high transmissibility indicate that the prevention and control strategies employed, including isolation, detection tests, and prophylactic measures, although helping to flatten the epidemic curve, have not been able to limit, prevent, or stop the spread of the virus throughout the world. Therefore, there is an urgent need for models that can predict the outbreak of the pandemic in different areas. Such a model would help decision-makers and physicians prepare, understand the real magnitude of the risk, and take appropriate prevention measures. Prediction tools can also help estimate the scale of the risk and prepare the required control measures sufficiently in advance.
Different models have been used throughout the history of epidemiology to describe outbreaks of different diseases [31,32]. One of the most frequently used is the classic susceptible-infected-recovered (SIR) model developed by Kermack and McKendrick [33,34]. Various prediction models based on the SEIR model have been studied [35,36], and the logistic model has been used to successfully predict infections over 20 days [37]. Subsequent studies used more complicated models with multiple SIR variants to predict outbreaks of SARS [38] and cholera [39]. Vanderpas et al. [37] incorporated the exposed population and their corresponding immunity into the model. Furthermore, logistic models have been used to predict growth processes such as bacterial growth [40] and infectious diseases [41]. Other studies have used the Gompertz model, which is usually applied to bacterial growth, to predict virus outbreaks [42,43]. Recently, Wu et al. [44] studied the growth of COVID-19 and predicted the national spread of the pandemic in China using the SEIR model. Yang and Wang [45] used a confinement variable to predict the national spread of COVID-19 in China. Other studies have estimated the infected and non-infected populations in different areas of China [46] and on flights from Japan [47]. Additionally, epidemic evolution has been modeled with a system of differential equations for the susceptible-infected-recovered-dead (SIRD) variables. However, all of these models require accurate initial data, knowledge of the temporal dynamics of the disease, and the growth rate to achieve precise predictions, and such information is not available for COVID-19.
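To make the compartmental idea concrete, the classic SIR model of Kermack and McKendrick can be sketched in a few lines. This is an illustrative sketch only: the population size, the rates `beta` and `gamma`, and the forward-Euler time step are hypothetical choices, not values fitted in this study, and a production model would normally use a higher-order ODE solver.

```python
"""Classic SIR (Kermack-McKendrick) model integrated with forward Euler.

All parameter values used below are hypothetical and for illustration only.
"""


def simulate_sir(n, i0, beta, gamma, days, dt=0.1):
    """Return a list of (S, I, R) tuples, one per day, starting at day 0.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    dt is assumed to divide one day evenly (e.g. 0.1 -> 10 steps per day).
    """
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I flow in this step
            new_recoveries = gamma * i * dt         # I -> R flow in this step
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical example: R0 = beta / gamma = 2.5 in a population of one million
    trajectory = simulate_sir(n=1_000_000, i0=10, beta=0.5, gamma=0.2, days=200)
    peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
    print(f"Peak of about {trajectory[peak_day][1]:,.0f} infected on day {peak_day}")
```

Changing `beta` or `gamma` shifts both the timing and the height of the peak, which is why such models depend on reliable estimates of the growth rate and the initial conditions.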
Recently, artificial neural networks (ANNs) have been used successfully to predict the evolution of different systems with a high degree of accuracy [48]. ANNs have demonstrated excellent short-term prediction accuracy for different engineering processes such as wastewater treatment [49,50], electrodialysis separation, metal removal [51], biosorption of heavy metals [52], anaerobic digestion [53], biofuel production [52][54][55][56][57], cell growth rate [58], and population growth [59]. The ANN architecture is a simple and fast methodology for predicting process output compared with complicated physically based models. In general, ANN predictions rely on formulating a mathematical relationship between the input collected data (ICD) and the system output function (SOF) without the need for a prior physical correlation. The neurons (Ns) connect, store, and manipulate the ICD to produce the SOF using different combinations of transfer functions (TFs). The strengths of the signals from different Ns determine the contribution of the ICD to the SOF through the different layers of the ANN. This study therefore presents, for the first time, an ANN architecture developed to predict the spread of COVID-19 and to understand the evolution of the epidemic over time. The developed ANN architecture can forecast outbreak behavior (numbers of infections and deaths) to help health systems and politicians anticipate future situations and make better decisions to control and prevent this pandemic. The architecture was applied to two European countries (Spain and Italy) and one Middle Eastern country (Qatar) to test its applicability.

Data collection
The growth of the COVID-19 pandemic in different countries was predicted using the ANN algorithm. The outbreak was examined in two different regions: the Middle East, represented by Qatar, with low population density, a hot and humid climate, and a high number of reported cases per million inhabitants (PMIN), and Europe, represented by Spain and Italy, with high population density, mild Mediterranean climate conditions, and a medium to high number of reported cases PMIN. Data on infected and death cases were obtained from WHO [30], from the daily worldwide reports of the European Centre for Disease Prevention and Control [60], and from the daily reports issued by the Ministry of Health in Qatar, which are publicly available on the ministry website.
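Because these sources publish cumulative counts, a typical preprocessing step is to derive daily new cases and to express counts per million inhabitants, the measure used above to compare countries. The sketch below shows this under stated assumptions: the function names and example numbers are hypothetical, and clipping negative day-to-day differences (which can appear after retrospective data revisions) to zero is an illustrative choice, not a step described in the study.

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative case series into daily new cases.

    Negative day-to-day differences (e.g. from retrospective data revisions)
    are clipped to zero; this clipping is an illustrative choice.
    """
    daily = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(max(current - previous, 0))
    return daily


def per_million(cases, population):
    """Express case counts per million inhabitants."""
    return [c * 1_000_000 / population for c in cases]


if __name__ == "__main__":
    # Hypothetical cumulative counts and population, not taken from the study
    cumulative = [3, 5, 9, 9, 14]
    daily = daily_from_cumulative(cumulative)
    print(daily)                          # [3, 2, 4, 0, 5]
    print(per_million(daily, 2_000_000))  # [1.5, 1.0, 2.0, 0.0, 2.5]
```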

Artificial neural network model
The ANN architecture simulates both linear and non-linear systems by combining and reasoning over the influence of the input collected data (ICD) on the SOF. The ICD are introduced into the ANN architecture to generate a mathematical SOF without requiring a prior physical relationship. The ICD are connected within the ANN via nodes known as neurons (Ns), which receive, store, and manipulate them via TFs at varying intensities depending on the contribution of the ICD to the SOF. The ANN forecast is refined by error tuning using a feed-forward back-propagation neural network (FF-BPNN) algorithm, in which the ICD flow through the ANN layers: the input layer (IL), the hidden layers (HL), and the output layer (OL). The Ns in adjacent layers are linked by connection weights ($CW_{ij}$), which are adjusted according to the mapping competency of the trained network, and each N is activated by a bias value ($\beta_j$). The ICD entered at the IL, $X_i$, are transferred to the HLs to capture the correlation between the dependent and independent parameters. At each N, the weighted sum of the inputs is added to the bias, and the result is passed through the TF to the next layer and ultimately to the OL:

$$ y_j = TF\left(\sum_{i=1}^{n} CW_{ij} X_i + \beta_j\right) $$

where $n$ is the number of inputs to neuron $j$ and $y_j$ is its output. The SOF at the OL is compared with the measured data, and the associated error (AE) is determined. The AE signal is then propagated backward from the OL through the HLs toward the IL, updating the weights and biases at each N to reduce the error. This iterative procedure continues until the required minimum error is reached. Training of the ANN architecture is then complete, and a separate data set, known as the testing data, is used to verify its prediction ability.
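The feed-forward and back-propagation steps described above can be sketched in plain Python. This is a hedged illustration, not the study's MATLAB model: it uses a single hidden layer with an HT transfer function and an S output (the study used two hidden layers), trains with plain per-sample gradient descent instead of Levenberg-Marquardt, stops after a fixed number of epochs rather than at an error threshold, and fits a hypothetical S-shaped curve on normalized time. The class and function names are illustrative.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid (S) transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyFFBPNN:
    """1 input -> n_hidden HT neurons -> 1 S output, trained by back-propagation."""

    def __init__(self, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # IL -> HL weights
        self.b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # HL biases
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # HL -> OL weights
        self.b2 = rng.uniform(-1, 1)                               # OL bias

    def forward(self, x):
        # Feed forward: weighted input plus bias, passed through each layer's TF
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Error signal at the OL for the loss 0.5 * (y - target)^2
        delta_out = (y - target) * y * (1.0 - y)
        for j, hj in enumerate(h):
            # Propagate the error back to hidden neuron j (before changing w2[j])
            delta_hidden = delta_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * delta_out * hj
            self.w1[j] -= lr * delta_hidden * x
            self.b1[j] -= lr * delta_hidden
        self.b2 -= lr * delta_out


def mse(net, xs, targets):
    """Mean squared error of the network over a data set."""
    return sum((net.predict(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


def fit(net, xs, targets, epochs=2000, lr=0.5):
    """Repeat forward and backward passes for a fixed number of epochs."""
    for _ in range(epochs):
        for x, t in zip(xs, targets):
            net.train_step(x, t, lr)
    return net


if __name__ == "__main__":
    # Hypothetical data: an S-shaped growth curve on normalized time in [0, 1]
    xs = [k / 20 for k in range(21)]
    targets = [sigmoid(6 * (x - 0.5)) for x in xs]
    net = TinyFFBPNN(n_hidden=5, seed=0)
    before = mse(net, xs, targets)
    fit(net, xs, targets)
    print(f"MSE before training: {before:.4f}, after: {mse(net, xs, targets):.5f}")
```

Computing each hidden neuron's error term before updating its outgoing weight keeps the update a true gradient step; Levenberg-Marquardt would instead solve a damped least-squares system over all weights at once.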
The algorithm used in developing the ANN architecture is presented in Figure 1a.
Initially, the numbers of infection and death cases were collected for each country until June 16, 2021. The dates of the first infection and first death were then adjusted for each country, and the rate of spread and the maximum number of cases were determined. The data were entered into the IL and propagated through the ANN using the calculated $CW_{ij}$ and the weighted sum of the inputs ($\sum CW_{ij} X_i + \beta_j$). The $CW_{ij}$ between layers were tuned using the Levenberg-Marquardt back-propagation algorithm. The arrangement of the ANN employed in the present study is shown in Figure 1b. The growth of infections and deaths can be represented as a function of time ($t$), the initial number of patients (infected or dead) ($P_i$), the maximum predicted number of infected or dead patients ($P_{max}$), and the characteristic growth rate of the pandemic ($\mu_g$), as given by equation (1):

$$ P(t) = f\left(t, P_i, P_{max}, \mu_g\right) \qquad (1) $$

The ANN was trained, tested, and validated in MATLAB® (The MathWorks, Inc., version R2010a) following L-M-BB-TA algorithms. A total of 5400 ICD from the three countries were used in the calculations, divided into training (56%), testing (24%), and validation (the remaining 20%) subsets. The ICD were normalized with respect to their minimum and maximum values before being used in the ANN, to decrease the chance of converging to local minima.
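The min-max normalization and the 56/24/20 split described above can be sketched as follows. The random assignment of samples to subsets and the helper names are assumptions for illustration; the paper does not state how individual samples were allocated, and for time series a chronological split is a common alternative.

```python
import random


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; also return (min, max) so predictions can be mapped back."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def denormalize(scaled, bounds):
    """Invert min_max_normalize using the stored (min, max)."""
    lo, hi = bounds
    return [lo + s * (hi - lo) for s in scaled]


def split_indices(n, train=0.56, test=0.24, seed=0):
    """Randomly partition n sample indices into training, testing and validation subsets.

    Whatever is left after the training and testing fractions (here 20%) is used for validation.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_test = round(test * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


if __name__ == "__main__":
    train_idx, test_idx, val_idx = split_indices(5400)
    print(len(train_idx), len(test_idx), len(val_idx))  # 3024 1296 1080
```

Storing the (min, max) pair is what allows ANN outputs on the [0, 1] scale to be converted back into case counts.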

Statistical analysis
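The ANN predictions during training and testing were evaluated using the root mean square error (RMS), the coefficient of determination (R²), and the mean absolute percentage error (MAPE), as defined in equations (2) to (4):

$$ RMS = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(P_k - \hat{P}_k\right)^2} \qquad (2) $$

$$ R^2 = 1 - \frac{\sum_{k=1}^{N}\left(P_k - \hat{P}_k\right)^2}{\sum_{k=1}^{N}\left(P_k - \bar{P}\right)^2} \qquad (3) $$

$$ MAPE = \frac{100}{N}\sum_{k=1}^{N}\left|\frac{P_k - \hat{P}_k}{P_k}\right| \qquad (4) $$

where $P_k$ is the reported number of infected or dead patients at time $k$, $\hat{P}_k$ is the corresponding number predicted by the ANN algorithm, $\bar{P}$ is the mean of the reported values, and $N$ is the number of data points. The acceptable RMS for the testing data was set between 10⁻⁴ and 10⁻².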
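The three statistics used to judge the ANN (RMS, R², and the mean absolute percentage error) can be computed directly from paired reported and predicted counts. This sketch uses the standard textbook definitions, which is an assumption about the exact form used in the study; the function names and sample numbers are hypothetical.

```python
import math


def rms(reported, predicted):
    """Root mean square error between reported and predicted counts."""
    n = len(reported)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reported, predicted)) / n)


def r_squared(reported, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_r = sum(reported) / len(reported)
    ss_res = sum((r - p) ** 2 for r, p in zip(reported, predicted))
    ss_tot = sum((r - mean_r) ** 2 for r in reported)
    return 1.0 - ss_res / ss_tot


def mape(reported, predicted):
    """Mean absolute percentage error (%); reported values must be non-zero."""
    return 100.0 / len(reported) * sum(abs((r - p) / r) for r, p in zip(reported, predicted))


if __name__ == "__main__":
    reported = [100, 200, 300, 400]   # hypothetical reported cases
    predicted = [110, 190, 300, 400]  # hypothetical ANN output
    print(f"RMS = {rms(reported, predicted):.3f}")       # RMS = 7.071
    print(f"R2 = {r_squared(reported, predicted):.3f}")  # R2 = 0.996
    print(f"MAPE = {mape(reported, predicted):.2f} %")   # MAPE = 3.75 %
```

RMS is expressed in case units, whereas MAPE is scale-free, which is why both are useful when comparing countries with very different outbreak sizes.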

The ANN model
The collected infection and death data were trained using a wide variety of TF combinations (sigmoid (S), hyperbolic tangent (HT), and hyperbolic secant (HS)) and different numbers of iterations, as presented in Table 2. Note that training was carried out with data up to July 2020. The results revealed that the most appropriate ANN architectures for predicting the numbers of infected and death cases were 5-4-4-2 and 5-5-5-2, respectively, both using HT, S, and S as the TFs in HL-1, HL-2, and the OL. For the infection data, the RMS and the maximum percentage deviation error were 0.36% and 45.75%, with no more than 0.23% of the manipulated data falling within a maximum percentage deviation of ±10%. For death cases, the RMS was 0.40%, and 22% of the manipulated data fell within a maximum percentage deviation of ±10%. Previous studies modeled the spread of the pandemic with the Gompertz model, which belongs to the family of sigmoid curves [61]. The TF combinations used in this study, which also belong to the sigmoid family, support and refine the prediction of the ANN architecture [62]. The results showed that a single exponential model was not adequate for describing the virus outbreak, whereas the double exponential Gompertz model can accurately describe biological growth. Indeed, sigmoid functions with a double exponential expression have been used to model human mortality, bacterial growth curves, population growth [63], the growth of animal fetuses [64], the growth of chickens [65], and the weight gain of fish [66], as well as anaerobic digestion kinetics [53]. The developed ANN architecture combines two families of TFs to predict the growth and development of COVID-19.
The combination of S and HT generates an SOF that follows the instantaneous disease growth as a function of time and presents an inflection point where the growth curve changes from convex (accelerating growth) to concave (decelerating growth).
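The inflection point described above can be located analytically for the classic three-parameter Gompertz curve, y(t) = a·exp(-b·exp(-c·t)). Differentiating twice gives y''(t) = c²·y·u·(u - 1) with u = b·exp(-c·t), so the curvature changes sign at t* = ln(b)/c, where y = a/e. The sketch below checks this numerically; the parameter values are hypothetical, and this textbook form is not necessarily the growth function used in this study.

```python
import math


def gompertz(t, a, b, c):
    """Classic Gompertz curve: a = upper asymptote, b = displacement, c = growth-rate constant."""
    return a * math.exp(-b * math.exp(-c * t))


def gompertz_inflection(a, b, c):
    """Return (t*, y*) where y''(t) = 0: t* = ln(b) / c and y* = a / e (t* > 0 requires b > 1)."""
    return math.log(b) / c, a / math.e


def second_difference(f, t, h=1.0):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


if __name__ == "__main__":
    a, b, c = 1000.0, 50.0, 0.1  # hypothetical parameters

    def curve(t):
        return gompertz(t, a, b, c)

    t_star, y_star = gompertz_inflection(a, b, c)
    print(f"inflection at t = {t_star:.1f}, y = {y_star:.1f}")
    print("convex before:", second_difference(curve, t_star - 5) > 0)   # True
    print("concave after:", second_difference(curve, t_star + 5) < 0)  # True
```

The same sign check on the second difference can be applied to any fitted sigmoid-type output, including an ANN prediction, to locate where daily growth starts to slow.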
Similarly, the training tests with different ICD achieved a maximum RMS and maximum percentage deviation error of 0.55% and 5.1% for infection cases and 0.65% and 3.1% for death cases, respectively. The results in Table 2 show that an ANN architecture with one IL, two HLs, and one OL predicts the numbers of infected and dead cases in the studied countries with high accuracy.
During the training stage, R² and the mean absolute percentage error were 0.993 and 0.076%, respectively; for the testing stage, these values were 0.995 and 0.189%. The frequency count of the relative error followed a Gaussian distribution, with relative errors ranging from -27.6% to 32.5% and from -28.8% to 31.9% for the training and testing stages, respectively. The residuals between the reported and predicted numbers of cases were scattered around the horizontal zero line, with percentage errors in the range of -0.5 to 1.0 for training and -1.0 to 1.0 for testing data, indicating very small deviations from the reported data.

Prediction of the infected cases
The developed ANN architecture was used to calculate the numbers of infected and death cases in Qatar, Spain, and Italy. Table 3 compares the predicted and reported COVID-19 cases from January 2020 to June 2021. Because the outbreak began on different dates in the three countries, the prediction calculations started from the date on which at least three confirmed cases had been reported in each country. Analysis of variance (ANOVA) indicated that the developed ANN architecture accurately predicts the numbers of infected and death cases in all three countries, and Student's t-test (p = 0.05) revealed no significant difference between the predicted and reported numbers of cases up to June 16, 2021.
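Aligning each country's series so that day 0 is the first day with at least three confirmed cases, as described above, can be sketched as follows. The function name, the example date strings, and the use of cumulative counts for the threshold are illustrative assumptions.

```python
def align_to_onset(dates, cumulative, threshold=3):
    """Drop the days before cumulative confirmed cases first reach `threshold`.

    Returns (dates, cases, day_index), re-indexed so that day 0 is the onset date.
    """
    for k, value in enumerate(cumulative):
        if value >= threshold:
            kept_cases = cumulative[k:]
            return dates[k:], kept_cases, list(range(len(kept_cases)))
    raise ValueError(f"cumulative cases never reach {threshold}")


if __name__ == "__main__":
    # Hypothetical series for one country
    dates = ["2020-02-28", "2020-02-29", "2020-03-01", "2020-03-02"]
    cumulative = [0, 1, 3, 7]
    print(align_to_onset(dates, cumulative))
    # (['2020-03-01', '2020-03-02'], [3, 7], [0, 1])
```

Using a common onset threshold makes the time axis comparable across countries whose outbreaks began on different calendar dates.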
Calculations showed that up to 98.5% of the infected cases and 95.7% of the death cases had a maximum percentage deviation from the reported cases within ±2.5% and ±3.5%, respectively. Table 3 also shows that the RMS, mean absolute percentage error, and R² of the forecasted infected cases were ≤4.31, ≤1.65, and ≥0.94, respectively, for the three countries. The calculated residuals between recorded and forecasted cases were within ±0.75. As noted above, residuals scattered around the horizontal zero line indicate high prediction accuracy. Although the spread rate and the number of reported cases were highest in Qatar compared with Spain and Italy, its low number of deaths suggests that population density, the age of infected people, social distancing precautions, weather conditions, and individual responsibility have a major impact on the critical evolution of the pandemic. Figure 3 shows the number of daily infected cases (NDICs): the ANN model follows the same trend as the reported cases and also forecasts an expected decrease in the NDICs for the coming period. Figures 2 and 3 show that the ANN prediction follows the evolution of the reported cases and can forecast future numbers of cases with R² ≥ 0.96, RMS ≥ 0.055, and absolute relative error (AbRE) ≤ 1.66 for all countries. The results in Figure 3 also show that the NDICs in Qatar during the exponential outbreak phase were lower than in Spain and Italy, mainly because of the time the European countries took to introduce measures against the pandemic, differences in climate conditions, and differences in population density between the European countries and Qatar. Other countries, such as South Korea and China, reported very low numbers of daily infected cases owing to the strict confinement measures and monitoring strategies applied as soon as the outbreak was discovered, which halted the pandemic rapidly.
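The share of predictions falling within a given percentage of the reported values, and the residuals used in the residual plots, can be computed as below. The function names and numbers are hypothetical, and measuring the deviation relative to each reported value is an assumption about how the percentages above were computed.

```python
def share_within_tolerance(reported, predicted, tol_percent):
    """Percentage of points whose deviation |p - r| / r * 100 is at most tol_percent."""
    hits = sum(1 for r, p in zip(reported, predicted) if abs(p - r) / r * 100.0 <= tol_percent)
    return 100.0 * hits / len(reported)


def residuals(reported, predicted):
    """Reported minus predicted; values scattered around zero indicate no systematic bias."""
    return [r - p for r, p in zip(reported, predicted)]


if __name__ == "__main__":
    reported = [100, 200, 400, 800]   # hypothetical reported cases
    predicted = [102, 190, 401, 800]  # hypothetical ANN output
    print(share_within_tolerance(reported, predicted, 2.5))  # 75.0
    print(residuals(reported, predicted))                    # [-2, 10, -1, 0]
```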
The developed ANN model was used to predict the evolution of cases after July 6, 2021, and to verify whether the end of the pandemic can be determined from the recorded data and the developed model. The NDICs were used to predict the date on which the pandemic subsides and the number of new infections reaches zero. According to the ANN model with the corresponding coefficients for each country, the end of the pandemic, when the number of cases becomes stable at ≤200, is predicted for October in Qatar, mid-September in Spain, and early September in Italy. However, if precautions are lifted and/or curfew and social distancing regulations are relaxed, a second wave may start and the number of cases will increase.
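Reading the end of the pandemic off a forecast, taken here as the first day from which every forecast daily count stays at or below 200, can be sketched as follows. This interpretation of "stable" and the function name are assumptions for illustration.

```python
def first_stable_day(daily_forecast, limit=200):
    """Return the first index from which every forecast value is <= limit, or None.

    Scans backwards so the whole forecast is checked in a single pass.
    """
    stable_from = None
    for k in range(len(daily_forecast) - 1, -1, -1):
        if daily_forecast[k] > limit:
            break
        stable_from = k
    return stable_from


if __name__ == "__main__":
    forecast = [500, 300, 150, 250, 180, 120, 90]  # hypothetical daily forecast
    print(first_stable_day(forecast))  # 4 (the dip to 150 on day 2 does not count)
```

Requiring the forecast to stay below the limit, rather than dip below it once, avoids declaring an end during a temporary lull, which is the situation the second-wave caveat warns about.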

Mortality forecast
The ANN architecture was also used to model and calculate the number of death cases in the three studied countries from the initial date of the pandemic until the number of cases reached zero.
In this case, similar trends were observed for Spain and Italy, whereas Qatar differed, with the lowest expected number of deaths (Table 3). The ANN model used short-term data to predict the long-term spread of the pandemic. The ANN was also valid for determining the mortality and infection rates of the pandemic with respect to infected individuals who were detected with symptoms. The FF-BPNN tuning algorithm used in this study eliminates the limitations caused by input data uncertainty, which create problems for models based on physical relationships. It also reduces the amount of evolution data needed to achieve accurate estimates, which is a major limitation of various growth models, including the Gompertz function. Moreover, the ANN structure can be modified to incorporate the external containment factors applied by each country and to determine their effect on the outbreak; such a structure would help anticipate how each containment factor affects disease growth patterns. Compared with other models, the ANN model provides accurate predictions of infection and death cases without requiring a prior physical correlation or the assumptions of epidemiological models. In this regard, the model can be considered an easy tool for predicting different diseases. Various studies have used compartmental models to quantitatively estimate the impact of interventions on the pandemic [67]; most of these models take a population perspective and are either deterministic or stochastic (e.g., SEIR/SLIR, SIRD) [68,69]. Models based on Bayesian methods [70,71], agent-based models, and generalized growth models [72] have also been used for COVID-19 prediction. Based on the available results, the ANN can be considered among the top five models for pandemic prediction.
All of these epidemic models focus on determining the timing of the COVID-19 outbreak and the differences in controls and precautions from nation to nation. The present work provides an initial benchmark demonstrating the potential of ANNs for future research.

5-DECLARATION
The author of this work confirms the following:

1- The data used in the present study were obtained from daily reports issued by the Ministry of Health in Qatar, which are publicly available on the ministry website.

2- All methods used in this study were carried out in accordance with relevant guidelines and regulations.

3- The experimental protocols used in this study were developed solely by the author, who works in the Department of Chemical Engineering, Qatar University.

4- The author confirms that no information was taken from third parties.