Time series forecasting of new cases and new deaths rate for COVID-19 using deep learning methods

The first known case of coronavirus disease 2019 (COVID-19) was identified in December 2019. The disease has since spread worldwide, leading to an ongoing pandemic that has imposed restrictions and costs on many countries. Predicting the number of new cases and deaths during this period can be a useful step in anticipating the costs and facilities required in the future. The purpose of this study is to predict the rates of new cases and new deaths one, three and seven days ahead over the next 100 days. The motivation for predicting every n days (instead of every day) is to investigate whether computational cost can be reduced while still achieving reasonable performance. Such a scenario may be encountered in real-time forecasting of time series. Six deep learning methods are examined on data obtained from the WHO website. Three of the methods are LSTM, Convolutional LSTM, and GRU. The bidirectional extension of each method is then considered to forecast the rates of new cases and new deaths in Australia and Iran. This study is novel in that it carries out a comprehensive evaluation of these three deep learning methods and their bidirectional extensions for prediction on COVID-19 new cases and new deaths time series. To the best of our knowledge, this is the first time that Bi-GRU and Bi-Conv-LSTM models have been used for prediction on COVID-19 new cases and new deaths time series. The evaluation of the methods is presented in the form of graphs and the Friedman statistical test. The results show that the bidirectional models have lower errors than the other models. Several error evaluation metrics are presented to compare all models, and finally the superiority of the bidirectional methods is determined. This research could be useful for organisations working against COVID-19 and determining their long-term plans.

Machine Learning (ML) has established itself as a distinct research field in the recent decade by solving numerous exceptionally complex real-world problems (17,18). In this research, the numbers of new cases and new deaths are predicted using deep learning, which is a subfield of ML. Existing studies have tried to predict mortality each day. In this article, the mortality rate and new cases are predicted every day, every three days and every seven days using deep learning models such as Long Short-Term Memory (LSTM). The motivation of this research is to perform an in-depth comparison of LSTM, Conv-LSTM and GRU with their bidirectional extensions. Moreover, based on the existing literature, it seems that Bi-GRU and Bi-Conv-LSTM have not been used before as predictors of COVID-19 time series data. In our experiments, we rely on the Friedman test to compare the six deep learning methods statistically. Similar to the existing literature, we perform every-day forecasting. Unlike previous works, we also perform prediction every three and every seven days, which requires one-third and one-seventh of the computational cost of every-day prediction, respectively. Prediction every three and seven days is investigated to determine whether it is possible to reduce computational complexity and still achieve reasonable performance. Reducing computational complexity matters in any application involving real-time forecasting of time series. The rest of the paper is structured as follows: Section 2 contains related research in this field, Section 3 briefly reviews the background knowledge, Section 4 provides the dataset description, Section 5 is devoted to the proposed method, Section 6 gives the experimental results, Section 7 presents the discussion and Section 8 presents the conclusion and future works.

Related works
In this section, we briefly review the existing literature that has a similar scope to this paper. The differences between the reviewed works and our approach are highlighted as well. Pinter et al. (19) predicted the number of infected people and the mortality rate by employing a hybrid ML approach. Their hybrid method combined a Multi-layered Perceptron (MLP) with the Imperialist Competitive Algorithm (MLP-ICA). The MLP was used as the predictor and ICA (an evolutionary optimization method) was used as the optimizer.
The hybrid method was trained on a dataset from Hungary (20). The trained model was compared against an adaptive network-based fuzzy inference system (ANFIS). The prediction horizon was chosen to be nine days. (6). They showed that their method had the best performance among similar methods.
Dowd et al. (21) Babaei et al. (24) analyzed the impact of health-protective measures such as quarantine, wearing masks, and social distancing using a susceptible-exposed-infectious-recovered (SEIR) type model on a hypothetical population. To further improve the model, the environmental noise present in the data was taken into account using a Brownian motion process. In addition, the stability of the proposed model was analyzed. The authors reported that health strategies play a major role in containing the virus threat.
A mathematical model of the spread of COVID-19 was proposed in (25). The unique solvability of the proposed model was also proved. Additionally, the reproduction number of the proposed model was discussed. To study the behavior of the model, some numerical simulations were conducted. Another study on the spread of COVID-19 was conducted by Babaei et al. (26). The authors introduced a stochastic model considering several disease compartments related to different age groups. Their model was based on the observance of safety protocols, such as wearing masks and placing people in quarantine. The numerical results showed the effectiveness of safety protocols for COVID-19 containment. Singh et al. (31) analyzed the evolution of COVID-19 spread in an assumed population by employing a fractional-order dynamical system. They proposed a stable computational method to solve the dynamical system numerically. The computational method is based on the discretization of the domain and the short memory principle. The implemented approach divided the population into five subgroups (susceptible people, exposed people, infected people, etc.) and analyzed how these subgroups behave over time. Gao et al. conducted another study to describe COVID-19 spread behavior based on fractional calculus (31,32). Whereas that work focused on the virus transfer from the reservoir to people, we focus on predicting the mortality rate and the spread of the virus based on observed data.

Deep Learning and its variations
Deep learning (DL) is a family of machine learning methods based on artificial neural networks (ANNs). This research introduces a DL system for the prediction of the COVID-19 time series. The following is an introduction to some of the DL methods used to predict time series, namely LSTM, Bi-LSTM, Conv-LSTM and GRU.
LSTM is a special type of Recurrent Neural Network (RNN) which relies on its repeating module called cell to remember sequence of information. Each cell contains three gates namely input, output, and forget gates. The forget gate decides how much information of the cell state must be thrown away. The input gate specifies the new information that must be stored in the cell state. The output gate decides the parts of the cell state that must be sent to the cell output.
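The role of the three gates can be made concrete with a minimal sketch of one LSTM time step. The code below uses the standard LSTM update equations on scalar inputs for readability. The function name `lstm_step`, the parameter dictionary layout and the "candidate" entry are illustrative assumptions, not the implementation used in this study.

```python
import math


def sigmoid(x):
    """Logistic function that squashes a pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (standard formulation).

    params maps a gate name to (w_x, w_h, b); this layout is a
    hypothetical choice made for this sketch.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # share of old cell state to keep
    i = sigmoid(pre_activation("input"))        # share of new information to store
    o = sigmoid(pre_activation("output"))       # share of cell state to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new information

    c = f * c_prev + i * g                      # updated cell state
    h = o * math.tanh(c)                        # cell output (hidden state)
    return h, c
```

In a real layer the scalars become vectors and the weights become matrices, but the flow of information through the forget, input and output gates is the same.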
A Bi-LSTM network is an extension of traditional LSTM which trains two LSTMs. One of the LSTMs is trained on the input sequence. The other LSTM is trained on the input sequence but in reversed order. Bi-LSTM can achieve faster learning compared to traditional LSTM.
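The bidirectional idea can be sketched independently of the cell type. In the hedged example below, `run_recurrent` and `run_bidirectional` are hypothetical helper names, and the `step` functions stand in for any recurrent cell (LSTM, GRU, and so on). The sketch only shows how the forward and reversed passes are aligned per time step.

```python
def run_recurrent(step, sequence, h0=0.0):
    """Apply step(h, x) along the sequence and collect every state."""
    states = []
    h = h0
    for x in sequence:
        h = step(h, x)
        states.append(h)
    return states


def run_bidirectional(forward_step, backward_step, sequence, h0=0.0):
    """Pair each time step with a forward state and a backward state.

    The backward pass reads the sequence in reversed order; its states
    are then flipped back so that index t refers to the same time step
    in both directions.
    """
    forward = run_recurrent(forward_step, sequence, h0)
    backward = run_recurrent(backward_step, list(reversed(sequence)), h0)
    backward.reverse()
    return list(zip(forward, backward))
```

With a toy step such as a running sum, the forward state at time t summarizes the past up to t, while the backward state summarizes everything from t to the end of the window.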
Traditional LSTM was designed to work with one-dimensional data, so it cannot cope with multi-dimensional data such as images. Conv-LSTM addresses this issue by replacing the gate layers of the LSTM with convolutional layers. Conv-LSTM can encode spatiotemporal data in its memory cell (39). By combining the convolution operators with an LSTM memory cell, the Conv-LSTM can learn which data from the past cell state should be 'remembered' or 'forgotten'.
GRU (40) is a special version of RNN. GRU is similar to LSTM, but it has two gates instead of three: the update and reset gates. The update gate determines how much of the past information should be passed along to the future. The reset gate determines how much of the past information must be discarded (41).
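For comparison with LSTM, one GRU time step can be sketched in the same scalar style. The code follows one common formulation of the GRU equations. The function name `gru_step` and the parameter layout are assumptions made for illustration, and some libraries swap the roles of z and (1 - z) in the final interpolation.

```python
import math


def sigmoid(x):
    """Logistic function that squashes a pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, params):
    """One time step of a scalar GRU cell.

    params maps a name to (w_x, w_h, b); this layout is hypothetical.
    """
    def pre_activation(name, h):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h + b

    z = sigmoid(pre_activation("update", h_prev))  # how much to overwrite
    r = sigmoid(pre_activation("reset", h_prev))   # how much past to use
    # The candidate state only sees the part of the past let through by r.
    h_candidate = math.tanh(pre_activation("candidate", r * h_prev))
    # Interpolate between the old state and the candidate state.
    return (1.0 - z) * h_prev + z * h_candidate
```

Unlike LSTM, there is no separate cell state: the hidden state is the only memory carried from one step to the next.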

Dataset Description
This research aims to predict COVID-19 prevalence in the future, focusing on the rates of new cases and new deaths. The dataset used in this research contains the statistical reports of COVID-19 cases and the mortality rate of different countries. It was obtained from the WHO website (42). The dataset includes eight columns, among them "Date Reported", "Country Code", "Country", "WHO Region", "Cumulative Cases", and "Cumulative Deaths". In this research, the "New Cases", "Cumulative Cases", "New Deaths", and "Cumulative Deaths" columns are used as time series to forecast the future rates of new cases and new deaths in two countries, Australia and Iran. The rest of the features are presented in Table 1.

Proposed Method
In this research, a DL-based approach was used to forecast the rate of new cases and new deaths every one, three and seven days. We experimented with six neural network models as our predictor. Each model consists of an input layer, an output layer and three hidden layers.
The first three models were LSTM, Conv-LSTM, and GRU. The next three models were the bidirectional versions of the first three, i.e. Bi-LSTM, Bi-Conv-LSTM and Bi-GRU. The number of neurons in the hidden layers was 50. In all layers, the Rectified Linear Unit (ReLU) was used as the activation function. The training was performed with respect to the MSE loss function using the Adam optimiser. The hyper-parameters of Adam were set to β1 = 0.9 and β2 = 0.999. The learning rate was set to 0.001. The model was trained for 200 epochs. In Table 2, additional details of the implemented models are shown.
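To show what the reported Adam settings control, the following sketch applies one Adam update to a single scalar parameter using the standard update rule. The defaults mirror the learning rate and the two decay rates stated above. The value of `eps` and the function name `adam_step` are assumptions, since the source does not report them.

```python
import math


def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    t is the 1-based step counter. eps is an assumed value that is not
    taken from the paper.
    """
    m = beta1 * m + (1.0 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step the bias correction makes the update magnitude approximately equal to the learning rate, regardless of the scale of the gradient.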
For the training data, the time series of Australia and Iran have been chosen from WHO website's database which reports new cases and new deaths rates. Approximately 70% of the data were used for training and the rest were kept for testing. About 20% of the training data were used for validation.
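The split described above can be sketched as follows. The sketch assumes a chronological split in which the validation block is taken from the end of the training portion. The source gives only the approximate percentages, so the ordering and the helper name `chronological_split` are assumptions.

```python
def chronological_split(series, train_frac=0.7, val_frac=0.2):
    """Split a time series into train, validation and test blocks.

    train_frac is the share of all points kept for training, and
    val_frac is the share of those training points held out for
    validation. Time order is preserved, which is an assumption.
    """
    n_train_full = int(round(len(series) * train_frac))
    n_val = int(round(n_train_full * val_frac))
    n_train = n_train_full - n_val

    train = series[:n_train]
    validation = series[n_train:n_train_full]
    test = series[n_train_full:]
    return train, validation, test
```

Keeping the blocks in time order avoids evaluating the model on days that precede some of its training data.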
In the first round of training, the time series were fed to the model, which predicted the next day. The training was then repeated so that the model output predicted the next three days. Finally, the model was trained a third time to produce predictions for the next seven days. As the forecasting horizon increases from one to three and to seven days, the error rate of the model increases, which makes sense since forecasting for a longer horizon is harder than forecasting for a shorter one. The training process was carried out for both the new cases and the new deaths time series. Figure 1 illustrates the high-level steps of the proposed method.
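Training for one, three and seven days ahead amounts to pairing a window of past observations with the next `horizon` values. The sketch below illustrates this framing. The window length is an assumption (the source does not report the look-back length), and `make_windows` and `calls_needed` are hypothetical helper names.

```python
def make_windows(series, window, horizon):
    """Build (input window, target block) pairs for supervised training.

    Each input holds `window` consecutive values and each target holds
    the `horizon` values that immediately follow it.
    """
    inputs, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window:start + window + horizon])
    return inputs, targets


def calls_needed(days, horizon):
    """Number of model calls needed to cover `days` days, horizon days per call."""
    return -(-days // horizon)  # ceiling division
```

For a 100-day period, `calls_needed` gives 100, 34 and 15 calls for horizons of one, three and seven days, which is the source of the roughly one-third and one-seventh cost reduction.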

Experimental Results and Analysis
In this section, the experimental results for LSTM, Conv-LSTM and GRU as well as their bidirectional counterparts are reported. To the best of our knowledge, we are the first to use Bi-Conv-LSTM and Bi-GRU for prediction of COVID-19 new cases and deaths based on time series data.
To have a fair comparison, we tried to implement all methods under relatively similar conditions. The prediction error was calculated based on criteria (1)-(4), where y is the actual value, ŷ is the corresponding estimated value, and n is the number of samples.
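As a hedged illustration of such criteria, the sketch below implements MAPE and explained variance, which are both referred to in the results, using their standard definitions. RMSE is included only as a commonly used companion metric. Whether it is one of the four criteria of this study is an assumption, and the function names are illustrative.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, which can happen on days
    with no reported cases or deaths.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error (assumed here, not confirmed by the source)."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


def explained_variance(actual, predicted):
    """One minus the ratio of residual variance to the variance of the actual values."""
    def variance(values):
        mean = sum(values) / len(values)
        return sum((value - mean) ** 2 for value in values) / len(values)

    residuals = [a - p for a, p in zip(actual, predicted)]
    return 1.0 - variance(residuals) / variance(actual)
```

Note that explained variance ignores a constant offset in the predictions, so it is best read together with an error metric such as MAPE.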

Forecasting performance
For each of the mentioned methods, the errors of the 1, 3, and 7-day ahead predictions of new cases/deaths over a 100-day period were calculated for Australia and Iran. To this end, the predicted values were compared with the actual values, and the error rate was calculated based on the evaluation criteria (Equations 1-4). The errors over the 100-day period for each of the models on the Australian data are given in Figure 2 and Figure 3. As is apparent from the MAPE values in Figure 2, Bi-GRU and LSTM have the best performance in the 1-day prediction, Conv-LSTM is the best method in the 3-day prediction, and Bi-Conv-LSTM has the best performance in the 7-day prediction. All of the evaluated methods in Figure 2 have approximately similar explained variance. Figure 3 illustrates the evaluation results for new deaths prediction over the 100-day period in Australia. An interesting observation in Figure 3 is how LSTM significantly outperforms GRU in the 7-day ahead prediction. The reason lies in the fact that GRU has a simpler structure (fewer parameters) consisting of only two gates.
However, the more complex structure of LSTM sometimes seems to prevail, as is the case in the 7-day prediction of new deaths in Figure 3. For the one day ahead prediction (Figure 12.a), GRU predicted 354000 while the true value was 345000. The absolute error in this case is |345000 − 354000| = 9000, which is 2.6% of the true value. Therefore, although 9000 seems at first to be a large error, it is tolerable compared to the magnitude of the true value (345000). For the seven days ahead prediction (Figure 12.c), the same method predicted 404000 while the true value was 345000. The absolute error is |345000 − 404000| = 59000, which is 17.10% of the true value. Obviously, the absolute error of the seven days ahead prediction is higher than that of the one day ahead prediction. However, 59000 is still a reasonable value compared to the true value. The results are reported in Table 3, based on which the algorithms were ranked as shown in Table 4. The methods with lower ranks are better than the ones with higher ranks.
To carry out the Friedman test, the rankings from Table 4 are required. Let $r_i^j$ denote the rank of the $j$-th algorithm on the $i$-th dataset, so that the average rank of each algorithm is $R_j = \frac{1}{N}\sum_{i=1}^{N} r_i^j$. The Friedman statistic is then computed as

$$\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4}\right],$$

where $k$ is the number of algorithms and $N$ is the number of datasets. Motivated by the Friedman test, Iman and Davenport (46) proposed another statistic,

$$F_F = \frac{(N-1)\,\chi_F^2}{N(k-1) - \chi_F^2},$$

which follows the F-distribution with $(k-1)$ and $(k-1)(N-1)$ degrees of freedom. $\chi_F^2$ and $F_F$ are computed from the results of Table 4. With six algorithms and 12 datasets (New cases 1-day AU, …), $F_F$ is governed by the F-distribution with $((k-1), (k-1)(N-1)) = (5, 55)$ degrees of freedom. The critical value of $F(5, 55)$ is 2.38 for significance level $\alpha = 0.05$. As is clear from Table 4, the Bi-GRU algorithm has the best average rank among all the algorithms, followed by LSTM, GRU, Bi-Conv-LSTM, Bi-LSTM, and Conv-LSTM.
The forecasting ability of the six models is due to their memorizing capability. The limitation of the proposed method is that the characteristics of the time series data might change as time passes. Therefore, to keep the models accurate, we are forced to incur the cost of retraining the models on newly observed data.
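The Friedman and Iman-Davenport statistics used in the comparison can be computed from a rank table with a few lines of code. The sketch below uses the standard formulas for these two tests. The function names are illustrative, and a rank table supplied by the caller stands in for Table 4, whose values are not reproduced here.

```python
def average_ranks(rank_table):
    """Average rank of each algorithm; rows are datasets, columns are algorithms."""
    n_datasets = len(rank_table)
    n_algorithms = len(rank_table[0])
    return [sum(row[j] for row in rank_table) / n_datasets
            for j in range(n_algorithms)]


def friedman_statistic(rank_table):
    """Friedman chi-square statistic computed from per-dataset ranks."""
    n = len(rank_table)
    k = len(rank_table[0])
    squared = sum(r * r for r in average_ranks(rank_table))
    return 12.0 * n / (k * (k + 1)) * (squared - k * (k + 1) ** 2 / 4.0)


def iman_davenport_statistic(rank_table):
    """Iman-Davenport F statistic derived from the Friedman statistic.

    Not defined when all datasets rank the algorithms identically,
    because the denominator is then zero.
    """
    n = len(rank_table)
    k = len(rank_table[0])
    chi2 = friedman_statistic(rank_table)
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


def degrees_of_freedom(k, n):
    """Degrees of freedom of the F-distribution for k algorithms and n datasets."""
    return k - 1, (k - 1) * (n - 1)
```

The null hypothesis that all algorithms perform equally is rejected when the Iman-Davenport statistic exceeds the critical value of the F-distribution with the degrees of freedom returned by `degrees_of_freedom`.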

Conclusion
In this research, six different models were compared for predicting the number of new cases and deaths in the next 100 days. The prediction was done for each day, every 3 days and every 7 days. The conducted experiments showed that most of the time, the bidirectional models outperform their non-bidirectional counterparts.
In the future, the plan is to use a combination of other machine learning and deep learning methods to achieve better results. In particular, experimenting with non-parametric models such as Gaussian Process (GP) to perform time series forecasting seems interesting since GP can provide uncertainty about its predictions. We might be able to determine the appropriate prediction horizon based on the uncertainty provided by GP.