COVID-19 prediction using LSTM algorithm: GCC case study

Coronavirus disease 2019 (COVID-19) is the black swan of 2020. The human response to contain the virus is also creating massive ripples through different systems, such as health, economy, education, and tourism. This paper focuses on researching and applying Artificial Intelligence (AI) algorithms to predict COVID-19 propagation using the available time-series data, and on studying the effect of the quality of life, the number of tests performed, and the awareness of citizens on the spread of the virus in the Gulf Cooperation Council (GCC) countries in the Gulf region. We therefore focused on cases in the Kingdom of Saudi Arabia (KSA), the United Arab Emirates (UAE), Kuwait, Bahrain, Oman, and Qatar. For this aim, we accessed the real time-series datasets collected by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Our data cover the period from January 22, 2020 to January 25, 2021. We implemented the proposed model based on Long Short-Term Memory (LSTM) with ten hidden units (neurons) to predict COVID-19 confirmed and death cases. The experimental results indicate that KSA and Qatar would take the longest period to recover from the COVID-19 virus, and that the situation will be controllable in the second half of March 2021 in UAE, Kuwait, Oman, and Bahrain. We also calculated the root mean square error (RMSE) between the actual and predicted values of each country for confirmed and death cases. The best values are 320.79 and 1.84, respectively, both for Bahrain, while the worst values are 1768.35 and 21.78, respectively, both for KSA. In addition, we calculated the mean absolute relative error (MARE) between the actual and predicted values of each country for confirmed and death cases. The best values are 37.76 and 0.30, for Kuwait and Qatar, respectively, while the worst values are 71.45 and 1.33, respectively, both for KSA.


Introduction
Coronaviruses are a large family of viruses that can cause severe illness in human beings. The first known severe epidemic was Severe Acute Respiratory Syndrome (SARS), which occurred in 2003, while the second outbreak of serious illness started in 2012 in Saudi Arabia with the Middle East Respiratory Syndrome (MERS). The current outbreak of disease due to COVID-19 was first reported in late December 2019. This new infection is highly contagious and has rapidly spread around the world. On January 30, 2020, the World Health Organization (WHO) declared this outbreak a Public Health Emergency of International Concern; by then it had spread to 18 countries. On February 11, 2020, WHO named the disease "COVID-19". On March 11, as the number of COVID-19 cases outside China had increased many times, with more than 118,000 cases in 114 countries and more than 4000 deaths, WHO declared it a pandemic [1].
The World Health Organization states that the coronavirus can spread rapidly from one person to another through contact and respiratory droplets. The number of identified cases has increased quickly around the world, so various research studies and projects have faced new challenges in forecasting the peak of the epidemic to help governments decide on measures to restrict the spread of the disease. The challenge now is how to estimate the peak of a pandemic while taking into account all the efforts that have been made in all directions.
Nowadays, most advanced applications and systems for different businesses are based on artificial intelligence and machine learning. Deep learning is a branch of machine learning that delivers strong performance and often surpasses classical machine learning methods, especially as the scale of the data increases.
As indicated by many AI researchers, deep learning models achieve high accuracy and can exceed human performance in specific cases [2]. Machine learning and deep learning models are widely used for time-series forecasting and have given excellent results in recent years. Several deep learning models have achieved remarkable results across diverse applications in different fields. Therefore, deep learning based on time-series data has become more popular and has been applied in different applications such as network intrusion detection, illegal traffic flow detection, fraud detection, and video surveillance.
This paper focuses on the six GCC countries in the Gulf region: KSA, UAE, Oman, Bahrain, Kuwait, and Qatar. These countries share almost the same weather and have a similar standard of living. The paper aims to use LSTM to predict the spread of COVID-19 in the GCC countries.
The rest of the paper is organized as follows: Section 2 discusses the methodology of applying LSTM and data preparation. Section 3 presents the proposed model of using LSTM on time-series data. Sections 4 and 5 show the predicted results for the selected countries and the discussion of these results, respectively. Finally, Section 6 gives the conclusion and future work.

Literature review
Coronaviruses (CoV) are a large family of viruses that cause diseases ranging from the common cold to more severe diseases such as the Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). Coronavirus disease (COVID-19) is caused by a new strain discovered in 2019 that had not been previously identified in humans [3]. COVID-19 causes symptoms that proved to be moderate in about 82% of cases, while the others are severe or critical [4]. The total number of cases all over the world until now is more than 109 million, of which more than 25 million are active cases (99% in mild condition), with 2 million deaths and 77,782,723 recovered cases.
In the GCC countries, we found that the total number of cases is more than 2,535,000, of which more than 51,834 are active cases (90% in mild condition), with 10,650 deaths and 1,243,977 recovered cases. Planning is critical to decreasing the sudden and potentially catastrophic impact of an infectious disease pandemic on society. Several researchers have presented ideas about facing such types of diseases. In Ref. [5], the researchers showed that the SARS-CoV virus was transmitted from civet cats to humans, and the MERS-CoV virus from dromedary camels to humans.
Several research areas have applied AI (such as disease diagnosis in health care). One of the main advantages of AI is that a trained model can be used to classify unseen images. In these studies, AI was applied to detect whether a patient is positive for COVID-19 using their chest X-ray image. In Ref. [6], the researchers proposed a deep transfer learning-based approach using chest X-ray images obtained from COVID-19 patients, and they succeeded in predicting COVID-19 patients automatically. In Ref. [7], the researchers presented a new method that could screen for COVID-19 fully automatically using deep learning technologies. They showed that models with a location-attention mechanism could more accurately classify COVID-19 on chest radiographs, with an overall accuracy rate of 86.7%.
Moreover, in Ref. [8], the authors used a deep learning approach to automatically detect COVID-19 patients and assess the disease burden on CT scans, using a dataset of CT scans from 157 international patients from China and the USA. Their proposed system analyzes the CT scan at two distinct levels: subsystems A and B. Subsystem A performs a 3D analysis, and subsystem B performs a 2D analysis of each slice of the scan to detect and localize larger diffuse opacities, including ground-glass infiltrates (which have been clinically identified as representative of COVID-19). The authors applied Resnet-50-2 to subsystem B and obtained an area under the curve of 99.6% when evaluating their system. The sensitivity and specificity were 98.2% and 92.2%, respectively.
In [2], the researchers utilized real-world datasets. The system examines chest X-ray images to identify such patients. Their findings show that such an analysis is valuable in COVID-19 diagnosis, as X-rays are conveniently available, quick, and low-cost. Empirical findings obtained from 1000 X-ray images of real patients confirmed that their proposed system is useful for detecting COVID-19 and achieves an F-measure range of 95-99%. Also, three forecasting techniques, the prophet algorithm (PA), the autoregressive integrated moving average (ARIMA) model, and the long short-term memory neural network (LSTM), were adopted to predict the numbers of COVID-19 confirmations, recoveries, and deaths over the following seven days. The prediction results show promising performance and offer an average accuracy of 94.80% and 88.43% in Australia and Jordan, respectively.
Meanwhile, the authors of [9] developed a deep learning approach to extract information from CT scans. Their study included a collection of 453 CT images from 99 patients. They extracted 195 regions of interest (ROIs), with sizes ranging from 395 × 223 to 636 × 533 pixels, from the CT scans of 44 COVID-19-positive pneumonia patients and 258 ROIs from those of 50 COVID-19-negative patients. They applied a modified Inception network model and obtained an accuracy of 82.9% on internal validation, with a specificity of 80.5% and a sensitivity of 84%. The external testing dataset showed a total accuracy of 73.1%, with a specificity of 67% and a sensitivity of 74%.
Moreover, a CNN was applied to a limited dataset, which was not explicitly described in the study, to evaluate and forecast the number of reported cases in China [10]. The researchers used the mean absolute error and the root mean square error to compare their model with other deep learning models, including the multilayer perceptron, long short-term memory (LSTM), and gated recurrent units. The authors concluded that the obtained results promise high predictive efficiency.
In another study, researchers compared India's COVID-19 data against those of several countries as well as key US states with a major outbreak. It was found that the basic reproduction number R0 for India is in the expected range of 1.4-3.9 [11]. The rate of growth of infections in India is close to that of Washington and California. From this model, it was estimated that India would reach equilibrium by the end of May 2020, with a final epidemic size of close to 13,000. However, if India enters the community transmission stage, the estimate will be invalid. The impact of social distancing was also assessed by analyzing data from different geographical areas, again assuming no community transmission.
In [12], artificial neural network (ANN) based models were used to estimate the confirmed cases of COVID-19 in China, Japan, Singapore, Iran, Italy, South Africa, and the United States of America. These models exploit historical records of confirmed cases, while their main difference is the number of days that they assume to affect the estimation process. The COVID-19 data were divided into a training part and a test part. The former was used to train the ANN models, while the latter was used to compare the results. The data analysis shows not only significant fluctuations in daily confirmed cases, but also that different ranges of total confirmed cases were considered.
Based on the obtained results, the ANN-based model that considers the previous 14 days outperforms the other ones. This comparison reveals the importance of considering the maximum incubation period in predicting the COVID-19 outbreak. Comparing the ranges of determination coefficients shows that the estimated results for Italy are the best. Moreover, the predicted results for Iran achieved ranges of [0.09, 0.15] and [0.21, 0.36] for the mean absolute relative errors and normalized root mean square errors, respectively, which were the best ranges obtained for these criteria among the different countries.
In Ref. [13], an open-source code of Multi-Gene Genetic Programming (MGGP) from the literature is used, and the control parameters of MGGP are examined. The maximum number of genes allowed in an individual and the maximum tree depth are two essential control parameters set by the user. The former is a multi-gene parameter, while the latter is a tree-building parameter. There is a trade-off in selecting suitable values for these two parameters: higher values may improve performance, but such improvement may inevitably result in a more complicated model. In that study, the maximum number of genes allowed in an individual is set to 5 by adopting a trial-and-error process.

Quality of life
In 1995, the WHO recognized the importance of assessing and improving people's quality of life. Accordingly, we sought to examine the relation between the quality of life and the spread of COVID-19.
According to the quality of life index [14], the quality index (QI) is 169.17 for the United Arab Emirates, 155.77 for Qatar, 173.46 for Oman, 130.91 for Bahrain, 144.52 for Saudi Arabia, and 113.99 for Kuwait. Table 1 sorts the six countries according to their QI in ascending order. Even though the Quality of Life Index is continuously changing, these countries' capitals are among the top ten Arab cities in standard of living. Although each of them is unique in its own way, they all share one quality: they are among the Arab world's best cities.

Long short-term memory (LSTM)
LSTM is a recurrent network architecture used in combination with an appropriate gradient-based learning algorithm. It is designed to overcome the error back-flow problems of conventional recurrent networks, in which error signals propagated back in time tend to either explode or vanish. LSTM can learn to bridge time intervals of over 1000 steps, even in the case of noisy, incompressible input sequences, without loss of short time lag capabilities. This is achieved by an efficient, gradient-based algorithm for an architecture that enforces a constant error flow, neither exploding nor vanishing, through the internal states of each unit [15].
In principle, an LSTM can use its memory cells to remember long-range information and keep track of the various attributes of the text it is currently processing. For instance, it is a simple exercise to write down toy cell weights that would allow the cell to keep track of whether it is inside a quoted string [16].
During the training process of a network, the primary goal is to minimize the loss (in terms of error or cost) observed in the output when training data are sent through it. We compute the gradient, that is, the derivative of the loss with respect to a particular set of weights, adjust the weights accordingly, and repeat this process until we obtain an optimal set of weights for which the loss is minimal. This is the idea of backpropagation. Sometimes, it so happens that the gradient is almost negligible [17]. It must be noted that the gradient of a layer depends on certain components in the successive layers. If some of these components are small (less than 1), the result obtained, which is the gradient, will be even smaller. This is known as the scaling effect. When this gradient is multiplied by the learning rate, which is a small value ranging between 0.1 and 0.001, it results in an even smaller value. As a consequence, the change in the weights is minuscule, producing almost the same output as before. Similarly, if the gradients are large in value because of large values of the components, the weights get updated to a value beyond the optimal value. This is known as the problem of exploding gradients. To avoid this scaling effect, the neural network unit was rebuilt so that the scaling factor was fixed to one. The cell was then enriched by several gating units and was called LSTM [18].
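The scaling effect described above can be illustrated with a minimal Python sketch. It is not taken from the cited works; the function name `backprop_scale` and the per-step factors 0.9 and 1.1 are assumptions chosen only for illustration. The sketch multiplies a unit error signal by the same factor at every step, mimicking how a gradient is rescaled as it is propagated back through many layers or time steps.

```python
def backprop_scale(factor, steps):
    """Return the overall scaling of an error signal after `steps` multiplications."""
    scale = 1.0
    for _ in range(steps):
        scale *= factor  # each step rescales the signal once
    return scale


# Illustrative (assumed) per-step factors over 50 steps
vanishing = backprop_scale(0.9, 50)  # factor below 1: the signal shrinks
exploding = backprop_scale(1.1, 50)  # factor above 1: the signal grows
constant = backprop_scale(1.0, 50)   # factor fixed to 1: the signal is preserved

print(f"factor 0.9 -> {vanishing:.5f}")
print(f"factor 1.1 -> {exploding:.2f}")
print(f"factor 1.0 -> {constant:.1f}")
```

With a factor of one the signal is neither damped nor amplified, which is the property that the LSTM cell is built to enforce.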
A vanilla LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The forget gate was not originally part of the LSTM network but was proposed in Ref. [19] to allow the network to reset its state. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information associated with the cell. In the rest of this section, LSTM will refer to the vanilla version, as this is the most popular LSTM architecture [17]. This does not imply, however, that it is also the superior one in every situation. In short, the LSTM architecture consists of a set of recurrently connected sub-networks, known as memory blocks. The idea behind the memory block is to maintain its state over time and regulate the information flow through non-linear gating units. Fig. 1 displays the architecture of a vanilla LSTM block, which includes the gates, the input signal x(t), the output signal y(t), the activation functions, and peephole connections [19].
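To make the role of the gates concrete, the following is a minimal sketch of one time step of an LSTM block, written for a single hidden unit in plain Python. It is a simplification under stated assumptions, not the implementation used in this paper: peephole connections are omitted, and the function name `lstm_step`, the parameter layout, and the weight values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM block (no peephole connections).

    `p` maps each name ("i", "f", "o", "g") to a tuple
    (input weight, recurrent weight, bias).
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate
    f = sigmoid(pre("f"))    # forget gate
    o = sigmoid(pre("o"))    # output gate
    g = math.tanh(pre("g"))  # candidate cell input
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # block output y(t)
    return h, c


# Hypothetical weights: forget gate saturated open, input gate saturated closed
params = {
    "i": (0.0, 0.0, -50.0),
    "f": (0.0, 0.0, 50.0),
    "o": (0.5, 0.5, 0.0),
    "g": (1.0, 1.0, 0.0),
}
h, c = lstm_step(x=0.3, h_prev=0.1, c_prev=0.7, p=params)
print(h, c)
```

With the forget gate open and the input gate closed, the cell state is carried over essentially unchanged, which is how the block maintains its state over time.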

Data preparation
Fig. 1. Architecture of a typical vanilla LSTM block [16].

Data preparation was managed using the COVID-19 time series. Real data were prepared and transformed into a proper representation and coding to fit into the LSTM model for prediction and forecasting problems. The LSTM model learns a function that maps a sequence of past observations as input to an output observation, as in the following steps:
1. The COVID-19 time-series data for the selected countries were downloaded from the Johns Hopkins dataset [20] (JHU CSSE, 2020), covering January 22, 2020 till July 24, 2020.
2. The date column was dropped from the time-series data. As an example, consider the sequence of observations [10,7,15,20,24,36,40,31,47,53,45]. The sequence of observations must be transformed into multiple samples from which the LSTM can learn, so we divided the sequence into multiple input/output patterns called samples. In this paper, the sequence of observations was transformed using a window size equal to one week (because the spread of COVID-19 can change significantly from one week to another). We considered seven time steps as the input and one time step as the corresponding output for the prediction model being learned, as shown in Table 2.
3. We used the Long Short-Term Memory (LSTM) as a regression model to predict and forecast COVID-19 spread for the coming months based on the number of confirmed cases and the number of death cases.
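The transformation of a sequence into input/output samples described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_sequence` is an assumption, and only the example sequence and the window size of seven come from the text.

```python
def split_sequence(sequence, n_steps):
    """Slide a window of `n_steps` inputs over the sequence; the value
    immediately after each window is the corresponding output."""
    samples, targets = [], []
    for start in range(len(sequence) - n_steps):
        samples.append(sequence[start:start + n_steps])
        targets.append(sequence[start + n_steps])
    return samples, targets


# Example sequence from the text, with a one-week window
observations = [10, 7, 15, 20, 24, 36, 40, 31, 47, 53, 45]
X, y = split_sequence(observations, n_steps=7)

for window, target in zip(X, y):
    print(window, "->", target)
```

For this 11-value sequence the sketch yields four samples; the first maps [10, 7, 15, 20, 24, 36, 40] to 31 and the last maps [20, 24, 36, 40, 31, 47, 53] to 45.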

The proposed LSTM model
We have implemented a simple Long Short-Term Memory (LSTM) model with an input layer, a single hidden layer, and an output layer that is used to make a prediction. The input layer has a number of neurons equal to the 7 sequence steps (one week of COVID-19 data points). The hidden layer is an LSTM layer with 10 hidden units (neurons) and a rectified linear unit (ReLU) as the activation function. The output layer is a dense layer with 1 unit for predicting the output. The learning rate is set to 0.001, and it decays every five epochs. Moreover, we have used 1000 as the number of epochs, Adam as the optimizer, and the mean square error as the loss function. After that, we fit the model to the prepared data to make a prediction. The obtained results may vary given the stochastic nature of the LSTM model; therefore, we have run it several times. Finally, we feed the last sequence, together with its output, back into the model to forecast the next value in the series.
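A possible realization of this configuration is sketched below using the Keras API of TensorFlow. The paper does not name its framework, so the choice of Keras is an assumption, as are the single input feature per time step, the decay factor of 0.9 (the text only states that the rate decays every five epochs), and the helper names `step_decay`, `build_model`, and `forecast_recursive`. TensorFlow is imported inside `build_model` so that the other helpers run without it.

```python
def step_decay(epoch, lr, drop=0.9, every=5):
    """Multiply the learning rate by `drop` once every `every` epochs.
    The value of `drop` is an assumption; the paper does not state it."""
    if epoch > 0 and epoch % every == 0:
        return lr * drop
    return lr


def build_model(n_steps=7, n_features=1, units=10, learning_rate=0.001):
    """LSTM(10, ReLU) followed by Dense(1), trained with Adam and MSE loss."""
    import tensorflow as tf  # lazy import: only needed to build the network

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_steps, n_features)),
        tf.keras.layers.LSTM(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
    )
    return model


def forecast_recursive(predict_next, history, horizon, n_steps=7):
    """Forecast `horizon` values by feeding each prediction back as input."""
    window = list(history[-n_steps:])
    forecasts = []
    for _ in range(horizon):
        value = predict_next(window)
        forecasts.append(value)
        window = window[1:] + [value]  # drop the oldest value, append the newest
    return forecasts


# Stand-in predictor so the forecasting loop can be run without a trained model
demo = forecast_recursive(lambda w: w[-1] + 1, history=[1, 2, 3, 4, 5, 6, 7], horizon=3)
print(demo)
```

With a fitted model, training would pass `tf.keras.callbacks.LearningRateScheduler(step_decay)` to `model.fit(..., epochs=1000)`, and `predict_next` would wrap `model.predict` on the window reshaped to (1, 7, 1).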

Experimental results
We created a series of prediction and forecasting models based on time-series data to check when the situation will be under control in the GCC countries (KSA, UAE, Oman, Bahrain, Kuwait, and Qatar). The models were straightforward to build. We calculated two metrics for model performance evaluation: the root mean square error (RMSE) and the mean absolute relative error (MARE). The results are stated in Tables 3 and 4, respectively.
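The two metrics can be computed as in the following sketch. The paper does not state its formulas, so the common definitions are assumed here: RMSE as the square root of the mean squared difference, and MARE as the mean of |actual - predicted| / |actual|. Skipping days with an actual value of zero is an additional assumption, and the sample numbers are illustrative, not data from this study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long series."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mare(actual, predicted):
    """Mean absolute relative error; days with actual == 0 are skipped (assumption)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return sum(ratios) / len(ratios)


# Illustrative values only
actual = [100, 200, 400]
predicted = [110, 190, 380]
print(f"RMSE = {rmse(actual, predicted):.4f}")
print(f"MARE = {mare(actual, predicted):.4f}")
```

Because RMSE is expressed in the units of the series, its magnitude grows with the case counts of a country, whereas MARE is scale-free under this definition.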

Discussion
From the LSTM model, we find that KSA and Qatar would take the longest period to recover from the virus (during 2021), as shown in Figs. 2 and 6. Moreover, KSA's total number of deaths is decreasing, and the situation might be controllable after the second half of March 2021, as shown in Fig. 3. Fig. 4 shows that the UAE has already started a second wave of the COVID-19 virus whose peak is higher than that of the first wave, and that a high number of confirmed cases will persist (during 2021) due to the lack of social distancing. The total number of deaths in the UAE will be under control from the second half of March 2021, as shown in Fig. 5.

Table 2. The inputs/output patterns for time-series data example.
For Qatar, the total number of deaths has been controllable since the beginning of November 2020 and has remained stable until now, as shown in Fig. 7. There is a high spread of the COVID-19 virus in all other countries, but due to strict social distancing, the situation will be controllable in the second half of March 2021, as shown in Figs. 8, 10 and 12. For Bahrain, the total number of deaths and the overall situation have been under control since the beginning of February 2021, as shown in Fig. 9.
For Oman, the total number of deaths and the overall situation may be under control after the second half of February 2021, as shown in Fig. 11. Finally, the total number of deaths in Kuwait has been under control since the second half of February 2021, as shown in Fig. 13.
We calculated the root mean square error (RMSE) between each country's actual and predicted values for both confirmed and death cases, as shown in Table 3. The largest values were related to KSA cases, and the smallest values were related to Bahrain. We also calculated the mean absolute relative error (MARE) for both confirmed and death cases, as shown in Table 4. The largest values for both confirmed and death cases were related to KSA, while the smallest value for the confirmed cases was related to Kuwait and the smallest value for the death cases was related to Qatar.
The situation of each country should be related to the values in Table 5: the ratio of total COVID-19 tests to the total population is almost 37%, 52%, 30%, 39%, 283%, and 169% for KSA, Qatar, Oman, Kuwait, UAE, and Bahrain, respectively. These values should also be related to the quality of life stated in Table 1. Oman has the highest quality index among the six countries (QI = 173.46), while its ratio of total COVID-19 tests to the total population is only 30%. Nevertheless, the COVID-19 situation there is under control, so the quality of life may affect the situation.
The UAE has QI = 169.17, and its ratio of total COVID-19 tests to the total population is 283%, so the COVID-19 situation there is under control, because a high quality of life is combined with a large number of tests performed.
KSA and Qatar have average quality indices (QI = 144.52 and QI = 155.77, respectively) and also average testing levels, with ratios of total COVID-19 tests to the total population of 37% and 52%, respectively. We therefore find that these countries would take the longest period to recover from the virus (during 2021).
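To view the two indicators discussed above side by side, the following short sketch collects the quality index and the testing ratio quoted in the text and sorts the countries by each. The figures are those reported in this paper; the variable names are hypothetical, and the sketch performs no statistical analysis.

```python
# Quality index (QI) and tests-to-population ratio (%) as quoted in the text
countries = {
    "KSA": {"qi": 144.52, "tests_pct": 37},
    "Qatar": {"qi": 155.77, "tests_pct": 52},
    "Oman": {"qi": 173.46, "tests_pct": 30},
    "Kuwait": {"qi": 113.99, "tests_pct": 39},
    "UAE": {"qi": 169.17, "tests_pct": 283},
    "Bahrain": {"qi": 130.91, "tests_pct": 169},
}

# Ascending QI, the ordering the text attributes to Table 1
by_qi = sorted(countries, key=lambda name: countries[name]["qi"])
# Descending testing ratio
by_tests = sorted(countries, key=lambda name: countries[name]["tests_pct"], reverse=True)

print(f"{'Country':<8} {'QI':>7} {'Tests %':>8}")
for name in by_qi:
    row = countries[name]
    print(f"{name:<8} {row['qi']:>7.2f} {row['tests_pct']:>8}")
```

The two orderings differ: Oman ranks first by QI but last by testing ratio, while the UAE is near the top of both.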

Conclusions and future work
Using machine learning for time-series prediction, especially for COVID-19, proved useful in modeling and forecasting the end status of the virus spread. This paper focused on applying Artificial Intelligence (AI) algorithms to predict COVID-19 using time-series data, and on studying the effect of lifestyle and the awareness of citizens on the spread of the virus in the GCC countries in the Gulf region, namely the Kingdom of Saudi Arabia (KSA), the United Arab Emirates (UAE), Kuwait, Oman, Bahrain, and Qatar. For this aim, we accessed the real time-series datasets collected by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The results obtained from the LSTM models indicate that KSA and Qatar would take the longest period to recover from the COVID-19 virus, and that the situation will be controllable in the second half of March 2021 in the other countries. We also found that the quality of life, along with the total number of tests performed, may affect the spread of COVID-19. Age and the weather may also have an effect on the spread and mortality of COVID-19, and a lack of social distancing has a powerful impact on these factors. Further investigations are still needed.