Approach to COVID-19 time series data using deep learning and spectral analysis methods

This article focuses on the application of deep learning and spectral analysis to epidemiological time series data, a topic that has recently piqued the interest of several researchers. The COVID-19 virus is still mutating, particularly into the delta and omicron variants, which are known for their high contagiousness, but policymakers and governments are resolute in combating the pandemic's spread through recent massive vaccination campaigns. We applied extreme learning machine (ELM), multilayer perceptron (MLP), long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN) and deep neural network (DNN) methods to time series data from the start of the pandemic until September 3, 2021 in France, Russia, Turkey, India, the United States of America (USA), Brazil and the United Kingdom (UK). Our aim was to predict daily new cases and daily deaths during different waves of the pandemic in these countries, using the root mean square error (RMSE) and the relative root mean square error (rRMSE) to measure the performance of these methods. We used the


Introduction
The COVID-19 pandemic is still evolving in various countries around the world, and most countries are encouraging their citizens to be vaccinated in order to reduce contagion and the number of daily new cases. The pandemic is particularly prevalent in the countries discussed in this article, owing to the delta and omicron variants, which are known for their high contagiousness. Forecasting and predicting its course are critical for guiding experts and leaders all over the world, and many researchers have worked and are still working on it (see [1][2][3][4][5][6][7][8][9][10] for some papers from our research network).
As of November 5, 2021, there were 249,671,619 cases worldwide, with the United States having 47,198,233 cases, India 34,333,199, Brazil 21,849,137, the United Kingdom 9,241,916, Russia 8,714,595, Turkey 8,178,901, and France 7,190,334, making them the seven countries with the most reported cases [11]. Globally, 5,049,824 people had died of the disease (about 2% of reported cases), while 226,062,761 had recovered. This indicates that the virus, although not fatal in most cases, is highly contagious and evolves into different variants; if it is not properly treated and mitigated, we will continue to see new cases despite the vaccination campaigns currently underway in all countries.
Many researchers have applied deep learning and spectral analysis to various kinds of data, most recently to COVID-19 data. We review a few of these works below. Using daily confirmed cases from the COVID-19 pandemic, [12] used regression models to make predictions for various upcoming scenarios in several countries. [10] derived new information on the epidemic parameters in France by using a new method to analyze cumulative reported COVID-19 cases and by computing the effective basic reproduction ratio on a daily basis. [13] studied the COVID-19 confirmed case curves for the United States, Brazil, Canada, and Australia; the results show a decrease in new cases for the 2019-2020 season, and the authors concluded that the control measures put in place by the governments of these countries may have played a critical role in mitigating the spread of the pandemic. Forecasting the COVID-19 pandemic with different time series data and machine learning methods for different countries was considered in [8][14][15][16][17], while the use of artificial intelligence to tackle the pandemic with deep learning methods was considered with RNN in [18][19][20][21], ANN in [22,23], CNN in [24][25][26][27][28], DNN in [29,30], GRU in [31][32][33][34], and LSTM in [35][36][37][38]. Researchers who used both machine and deep learning methods report quite good predictions. Spectral analysis allows us to smooth time series data and to transform it from the time domain to the frequency domain in order to reveal the different peaks in the data; it is suited to stationary time series [39][40][41][42][43][44], while non-stationary data were considered in [45]. The prediction performance of different time series methods is analyzed in [46][47][48][49][50][51].
In [52], a survey approach was used to investigate how deep learning has been used to combat the COVID-19 pandemic, describing how different data types are fed into deep neural networks and how tasks are formulated as learning problems. The oscillations in US COVID-19 incidence and mortality data were studied in [53], which showed that the oscillation in the number of new cases can be largely explained by daily variation in testing. [54] investigated the significance test for periodogram peaks and its applicability to time series data, whereas [55,56] provided forecasting based on spectral analysis for time series data, which proved to be a powerful tool for parameter estimation and prediction. Other classical models of the COVID-19 pandemic can be found in [57][58][59][60][61].
This paper uses a data-driven approach to retro-predict and forecast the daily new cases and deaths of the COVID-19 outbreak, from the start of the pandemic until September 3, 2021, at various stages of the epidemic waves in the USA, India, Brazil, France, Turkey, the UK, and Russia, which are the seven countries in the world where the pandemic is most prevalent. For prediction, we used various deep learning methods, while for short-term forecasting, we used ELM, MLP and spectral analysis, and we compared their performance using RMSE and rRMSE. We also use spectral analysis to measure the different frequencies in our data set during the contagious period, by estimating the periodicity of the time series and analyzing its peaks. In our review of the literature on COVID-19 data analysis, we have not found any study applying prediction methods ranging from classical neural methods to recent deep learning algorithms (extreme learning machine (ELM), multilayer perceptron (MLP), long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN) and deep neural network (DNN)) to country data covering the period from the beginning of the pandemic to the most recent data, including the period after vaccination started in the countries we considered. Moreover, no study uses these methods after a spectral analysis that removes the weekly and seasonal components found in epidemiological incidence and mortality data. Finally, no article jointly considers a panel of seven countries representative of the diversity of behaviors of the viral disease in specific demographic and geoclimatic contexts.
The paper is organized as follows: Section 2 presents the time series visualization and evaluation metrics, Section 3 describes the methods used in the article, Section 4 presents the results of the data analysis, Section 5 discusses these results, and Section 6 draws some conclusions from our findings.

Time series and evaluation metrics
In this section, we present visualizations of the daily new cases, with their 3-day and 7-day moving averages, for Turkey, the United Kingdom, the United States, Russia, India, France, and Brazil, taken from [11]. These data from [11] are used throughout our analysis. When analyzing a model, it is critical to measure its performance in order to draw the best conclusions and interpretations from the time series data. Two error measures were used to estimate the prediction and forecasting precision of the models. For i = 1, 2, …, n, let Y_i denote the observed values, y_i the predicted values, and n the number of data points.
- Root Mean Square Error (RMSE), given as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - y_i\right)^2}$$
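As an illustration, a minimal NumPy sketch of these error measures and of the 3-day and 7-day moving averages follows. The RMSE function implements the usual definition, the square root of the mean squared difference between observed and predicted values. The rRMSE normalisation used here (RMSE divided by the mean observed value) is an assumption, since this section does not state which relative form the article uses. The trailing (right-aligned) moving-average window is likewise assumed, and the numbers are toy values, not the article's data.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed values Y_i and predictions y_i."""
    Y = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((Y - y) ** 2)))

def rrmse(observed, predicted):
    """Relative RMSE. ASSUMPTION: RMSE divided by the mean of the observed values;
    other papers normalise differently (e.g. by the range or per observation)."""
    return rmse(observed, predicted) / float(np.mean(observed))

def moving_average(x, window):
    """Trailing k-day moving average (k = 3 or 7 here); the first k-1 days get no value."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

# Toy numbers, not the article's data
observed = [100, 120, 90, 110]
predicted = [110, 115, 95, 100]
print(rmse(observed, predicted))    # sqrt(62.5), about 7.91
print(rrmse(observed, predicted))   # about 0.075, i.e. 7.5% of the mean level
print(moving_average([1, 2, 3, 4, 5], 3))
```

Because the relative measure divides by the level of each series, it allows errors to be compared across countries whose daily counts differ by orders of magnitude, which raw RMSE does not.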

Methods
In this section, we describe the two types of methods we used in our analysis.

Deep learning
There are several deep learning methods; we chose six of them in order to compare their performance on the COVID-19 outbreak data set.

Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)
LSTM and GRU are the most widely used gated forms of Recurrent Neural Networks (RNNs). The LSTM was designed to overcome an inherent problem of simple recurrent neural networks, the vanishing gradient problem, by introducing a memory cell whose contents are regulated by gates. It is designed to model long-term dependencies and to determine the optimal time lag for the time series by giving the memory unit the ability to decide what information to remember or forget. This gated structure allows the LSTM network to recall past data, which makes it easier to create connections between current and past data points and allows the network to find patterns that play out over time [18,62].
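To make the gating concrete, here is a minimal NumPy sketch of one step of the standard LSTM cell (input, forget and output gates plus a candidate update). The weights are random and untrained, and the stacking order of the gate blocks is our own choice. The 50 units only mirror the units = 50 setting used in this study. This is an illustration of the mechanism, not the authors' trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.

    x: input (d,), h: previous hidden state (m,), c: previous cell state (m,).
    W: (4m, d), U: (4m, m), b: (4m,), stacked here as [input, forget, output, candidate]."""
    m = h.size
    z = W @ x + U @ h + b
    i = sigmoid(z[:m])              # input gate: how much new information to write
    f = sigmoid(z[m:2 * m])         # forget gate: how much of the old memory to keep
    o = sigmoid(z[2 * m:3 * m])     # output gate: how much memory to expose
    g = np.tanh(z[3 * m:])          # candidate values
    c_new = f * c + i * g           # additive memory update, which eases vanishing gradients
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
d, m = 1, 50                        # one input feature, 50 units
W = rng.normal(scale=0.1, size=(4 * m, d))
U = rng.normal(scale=0.1, size=(4 * m, m))
b = np.zeros(4 * m)
h, c = np.zeros(m), np.zeros(m)
for value in np.linspace(0.0, 1.0, 14):      # a hypothetical window of 14 scaled daily values
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h.shape)
```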
The GRU is very similar to the LSTM because it also uses gates to control the flow of information, but unlike the LSTM, it does not have a separate cell state; it only has a hidden state. The GRU has fewer parameters than the LSTM, which helps to speed up training on the data set [19,63]. Figure 2 depicts the LSTM and GRU architectures. In an artificial neural network (ANN), the hidden layer is used to store and evaluate how significant each input is for the output. Figure 3b represents an ANN if it has only one hidden layer. CNN and DNN are two types of artificial neural networks.
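Why the GRU is lighter than the LSTM can be seen in a minimal NumPy sketch of one GRU step: three gate blocks instead of four, and no separate cell state. This follows one common formulation. Some texts swap z and 1 - z, and some library implementations add an extra bias vector, so exact parameter counts may differ from the plain counts computed here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W, U, b):
    """One GRU step. W: (3m, d), U: (3m, m), b: (3m,), stacked as [update, reset, candidate]."""
    m = h.size
    Wz, Wr, Wh = W[:m], W[m:2 * m], W[2 * m:]
    Uz, Ur, Uh = U[:m], U[m:2 * m], U[2 * m:]
    bz, br, bh = b[:m], b[m:2 * m], b[2 * m:]
    z = sigmoid(Wz @ x + Uz @ h + bz)                 # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)                 # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)     # candidate hidden state
    return (1.0 - z) * h + z * h_tilde                # the hidden state is the only memory

def n_params_lstm(d, m):
    return 4 * m * (d + m + 1)    # four blocks, each with W, U and a bias

def n_params_gru(d, m):
    return 3 * m * (d + m + 1)    # three blocks in this plain formulation

print(n_params_lstm(1, 50), n_params_gru(1, 50))   # 10400 7800
```

With one input feature and 50 units, the GRU in this formulation needs three quarters of the LSTM's parameters, which is the source of its faster training.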
A CNN is a type of neural network made up of neurons with learnable weights and biases. Compared with other classification algorithms, a CNN requires much less preprocessing. It is made up of filters that are applied to the input data, effectively condensing the data into a smaller resolution. CNNs are therefore well suited to denoising input data before it is fed into basic neural networks. The advantage of this method is that it captures spatial dependencies in the data [34,65,66]. Figure 3a depicts a CNN.
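The filtering and condensing idea can be illustrated with a hand-rolled "valid" 1-D convolution, a ReLU and non-overlapping max pooling in NumPy. The fixed averaging kernel of size 3 is our stand-in for learned weights, since a trained CNN learns its kernels. Like most deep learning libraries, the sketch computes a cross-correlation without flipping the kernel.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """'Valid' 1-D convolution as in CNN layers (cross-correlation: the kernel is not flipped)."""
    x = np.asarray(x, dtype=float)
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias for i in range(len(x) - k + 1)])

def relu(z):
    return np.maximum(z, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder shorter than `size` is dropped."""
    x = np.asarray(x, dtype=float)
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)  # toy input sequence
kernel = np.ones(3) / 3                            # kernel size 3; fixed averaging weights stand in for learned ones
features = relu(conv1d_valid(x, kernel))           # length 7 - 3 + 1 = 5
pooled = max_pool1d(features, size=2)              # length 5 // 2 = 2
print(features, pooled)
```

Each stage shortens the sequence (7, then 5, then 2 values), which is the "smaller resolution" mentioned above. The averaging kernel also smooths day-to-day noise.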
A DNN is an artificial neural network with multiple layers in which data flows from the input layer to the output layer without going backward: the links between layers point only forward, and no node is visited twice. Such a network is also known as a feedforward network [21,67]. Figure 3b depicts a DNN.
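A forward pass through such a network can be sketched in NumPy. The hidden sizes 32 and 8, ReLU hidden activations and a linear output mirror the DNN configuration reported in this study. The 7-value input window, the He-style initialisation and the random inputs are assumptions, and the network is untrained.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    # He-style scaling for ReLU layers; ASSUMPTION, the article does not state its initialiser
    return rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)

def dnn_forward(X, layers):
    """Feedforward pass: ReLU on the hidden layers, linear output. X has shape (batch, n_inputs)."""
    a = X
    for W, b in layers[:-1]:
        a = np.maximum(a @ W + b, 0.0)   # data only moves forward, layer by layer
    W_out, b_out = layers[-1]
    return a @ W_out + b_out             # linear output for a regression target

rng = np.random.default_rng(1)
n_inputs = 7                             # hypothetical window of 7 past daily values
layers = [init_layer(rng, n_inputs, 32),  # first hidden layer: 32 neurons
          init_layer(rng, 32, 8),         # second hidden layer: 8 neurons
          init_layer(rng, 8, 1)]          # single output
X = rng.random((5, n_inputs))            # 5 hypothetical scaled input windows
out = dnn_forward(X, layers)
n_params = sum(W.size + b.size for W, b in layers)
print(out.shape, n_params)               # (5, 1) 529
```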
An MLP is a type of feedforward ANN that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer, as shown in Figure 3b. ELM, in contrast, is a training algorithm for single-hidden-layer feedforward ANNs in which the hidden-layer weights are assigned randomly and only the output weights are computed, analytically by least squares; it therefore converges much faster than traditional gradient-based training and yields promising performance [38][39][40][41].
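ELM training can be sketched in a few lines of NumPy. The input-to-hidden weights and biases are drawn at random and then frozen, and the output weights are the least-squares solution obtained with the Moore-Penrose pseudo-inverse, with no iterative optimisation. The tanh activation, 20 hidden units, 7 lags and the synthetic weekly series are assumptions, not the article's settings.

```python
import numpy as np

class ELMRegressor:
    """Extreme learning machine sketch: a random, fixed hidden layer and a least-squares output layer."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)      # hidden activations H

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights, never trained
        self.b = self.rng.normal(size=self.n_hidden)                # random hidden biases
        self.beta = np.linalg.pinv(self._hidden(X)) @ y             # output weights via the pseudo-inverse
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

rng = np.random.default_rng(0)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 7) + 0.1 * rng.normal(size=t.size)   # synthetic weekly signal
lags = 7
X = np.array([series[i - lags:i] for i in range(lags, len(series))])
y = series[lags:]
model = ELMRegressor(n_hidden=20, seed=0).fit(X, y)
train_rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
print(round(float(train_rmse), 3))
```

The only "learning" is one linear solve. This is why ELM fits much faster than networks trained by backpropagation, at the cost of relying on a random hidden layer.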

Spectral analysis
Spectral analysis is the decomposition of a time series into underlying sine and cosine functions of different frequencies, which allows us to identify the frequencies that appear to be strong or important. Another way to think about spectral analysis is as a linear multiple regression problem, in which the dependent variable is the observed time series and the independent variables are the sine and cosine functions of all possible frequencies. Spectral analysis identifies the correlation of the sine and cosine functions of different frequencies with the observed data; if a large correlation is found, one can conclude that the corresponding function reflects a strong periodicity in the real data [39]. Spectral analysis is useful for analyzing stationary time series and for identifying noise-corrupted periodic signals.
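The regression view of spectral analysis can be sketched with ordinary least squares. The series is regressed on an intercept plus one cosine/sine pair for each candidate period, and the amplitude sqrt(a² + b²) of each pair is then read off. The synthetic series and the candidate periods (7 and 3.5 days) are illustrative assumptions.

```python
import numpy as np

def harmonic_regression(x, periods):
    """OLS fit of x_t on an intercept plus cos/sin pairs at each candidate period (in days)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    cols = [np.ones(x.size)]
    for p in periods:
        cols += [np.cos(2 * np.pi * t / p), np.sin(2 * np.pi * t / p)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    # amplitude sqrt(a^2 + b^2) of each cosine/sine pair; large values flag strong periodicity
    amplitudes = {p: float(np.hypot(coef[1 + 2 * j], coef[2 + 2 * j]))
                  for j, p in enumerate(periods)}
    return coef, X @ coef, amplitudes

rng = np.random.default_rng(2)
t = np.arange(140)
x = 50 + 10 * np.cos(2 * np.pi * t / 7) + rng.normal(size=t.size)   # synthetic weekly pattern
coef, fitted, amp = harmonic_regression(x, periods=[7, 3.5])
print(amp)   # the 7-day amplitude is close to 10, the 3.5-day one close to 0
```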
One of the most important tools in spectral analysis is the periodogram, which quantifies the contributions of individual frequencies to the time series regression. It is given by $P_k = a_k^2 + b_k^2$, where $P_k$ is the periodogram value and $a_k$ and $b_k$ are the cosine and sine coefficients at frequency $k$ (for $k = 1, 2, \ldots, n/2$). The periodogram values can be interpreted in terms of the variance of the real data at the respective frequency. A plot of $P_k$ (as spikes) against $k$ is a Fourier line spectrum. The periodogram is obtained by joining the tips of the spikes in the Fourier line spectrum to give a continuous plot and scaling it so that its area equals the variance [40].
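Because the periodogram is tied to the regression coefficients, it can be computed directly from the discrete Fourier transform. In the NumPy sketch below, X_k = Σ x_t exp(-2πikt/n) is returned by numpy.fft.rfft. The cosine and sine coefficients are a_k = (2/n) Re X_k and b_k = -(2/n) Im X_k, and P_k = a_k² + b_k² is returned at frequency k/n cycles per day. Other texts scale P_k differently (for example by n/2), which does not move the peaks. The synthetic 700-day series is an assumption.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram P_k = a_k**2 + b_k**2 at frequencies k/n, k = 1, ..., n//2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x - x.mean())        # X_k = sum_t x_t * exp(-2j*pi*k*t/n)
    k = np.arange(1, n // 2 + 1)
    a = 2.0 / n * X[k].real              # cosine coefficients (exact for 0 < k < n/2)
    b = -2.0 / n * X[k].imag             # sine coefficients
    return k / n, a ** 2 + b ** 2        # frequency in cycles per day, P_k

rng = np.random.default_rng(3)
t = np.arange(700)
x = 20 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(size=t.size)   # synthetic daily series
freq, P = periodogram(x)
peak = freq[np.argmax(P)]
print(peak, 1 / peak)   # peak at 100/700 = 1/7 cycles per day, i.e. a 7-day period
```

The spike at k = 100 has height close to 9, the squared amplitude of the weekly sine, which illustrates how P_k measures the contribution of that frequency.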

Parameters for the modelling in LSTM, GRU, CNN, and DNN methods
LSTM: To evaluate the model, we trained it on 80% of the data set and tested it on the remaining 20%. The MinMaxScaler in Python was used to normalize the data, and the best hyperparameters for the LSTM, determined by a manual search, are: batch size of 32, 100 epochs, dropout = 0.2, and 50 units. The Adam optimizer was used, with the mean squared error as the loss function; Adam was found to slightly outperform other learning algorithms. The errors are shown in Tables 1 and 2, and the results are visualized in Figure 4 for daily new cases and Figure 5 for daily deaths.
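The preprocessing described here, a chronological 80/20 split followed by min-max scaling, can be sketched in NumPy. Fitting the scaler on the training part only, the 14-day input window and the class name MinMax are our assumptions. In practice scikit-learn's MinMaxScaler, whose default feature_range is (0, 1), performs the scaling. Batch size and number of epochs are arguments to the deep learning framework's training call, which is not shown.

```python
import numpy as np

def train_test_split_chrono(x, train_frac=0.8):
    """Chronological split: the first 80% of days for training, the last 20% for testing."""
    cut = int(len(x) * train_frac)
    return x[:cut], x[cut:]

class MinMax:
    """Minimal stand-in for MinMaxScaler with feature_range=(0, 1) on a single series."""

    def fit(self, x):
        self.lo, self.hi = float(np.min(x)), float(np.max(x))
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, z):
        return np.asarray(z, dtype=float) * (self.hi - self.lo) + self.lo

def make_windows(x, window):
    """Supervised pairs: `window` past values -> next value, shaped (samples, window, 1) for an RNN."""
    X = np.array([x[i - window:i] for i in range(window, len(x))])
    y = np.asarray(x[window:])
    return X[..., np.newaxis], y

series = np.arange(100, dtype=float)          # hypothetical daily counts
train, test = train_test_split_chrono(series)
scaler = MinMax().fit(train)                  # ASSUMPTION: fitted on the training part only
X, y = make_windows(scaler.transform(train), window=14)
print(X.shape, y.shape)                       # (66, 14, 1) (66,)
```

Predictions made on the scaled series are mapped back to case counts with inverse_transform before errors such as RMSE are computed.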
GRU: To evaluate the model, we trained it on 80% of the data set and tested it on the remaining 20%. The MinMaxScaler in Python was used to scale the data, and the best hyperparameters for the model, determined by a manual search, are: batch size of 30, 100 epochs, dropout = 0.2, and 50 units. We used the Adam optimizer and the tanh activation function, with the mean squared error as the loss function; Adam was found to slightly outperform other learning algorithms. The errors are shown in Tables 1 and 2, and the results are visualized in Figure 4 for daily new cases and Figure 5 for daily deaths.
CNN: To evaluate the model, we trained it on 80% of the data set and tested it on the remaining 20%. The network has an input layer with three neurons, followed by convolution layers with 128, 64, and 16 units, a max-pooling layer with 64 units, and a kernel size of 3. The MinMaxScaler in Python was used to scale the data, and the best hyperparameters for the model, determined by a manual search, are: batch size of 32, 100 epochs, verbose = 0.2 (0 for the checkpoint), and validation split = 0.2.
Adam is the optimizer in use, and the mean squared error is the loss function. The ReLU activation function was used for all layers except the output layer, which uses a linear activation. The Adam optimizer was found to slightly outperform other learning algorithms. Tables 1 and 2 show the errors, and the results are visualized in Figure 4 for daily new cases and Figure 5 for daily deaths.
DNN: To evaluate the model, we trained it on 80% of the data set and tested it on the remaining 20%. The DNN has two hidden layers with 32 and 8 neurons, respectively; we found that going beyond 32 neurons gives a much higher error and poorer prediction. The MinMaxScaler in Python was used to scale the data, and the best hyperparameters for the model, determined by a manual search, are: batch size of 32, 100 epochs, verbose = 0.2 (0 for the checkpoint), period = 1, and validation split = 0.2.
The optimizer used is Adam, and the mean squared error is the loss function. The ReLU activation function was used for all layers except the output layer, which uses a linear activation. The Adam optimizer was found to slightly outperform other learning algorithms. Tables 1 and 2 show the errors, and the results are visualized in Figure 4 for daily new cases and Figure 5 for daily deaths.

ANN results
We present here a visualization of the network weights and biases for our data. As shown in Figures 6 and 7, the weights are well behaved and the bias is low. We also predicted part of the data (the test data); the daily new cases prediction scores are shown in Figure 6 for (…). We found that the errors for daily deaths were lower than those for daily new cases.

MLP & ELM results
We describe below the best parameters used in each case, chosen on the basis of performance, for forecasting daily new cases and deaths. For Turkey daily deaths and for Russia daily new cases and deaths, we used 12 lags, five hidden layers, and 20 repetitions, with the series modelled in differences and the forecasts combined using the median operator, as shown in Figures 9c, 8d, and 9d. For daily new cases in Turkey, India, and Brazil, as well as for daily deaths in the United Kingdom, we used 24 lags, always keeping 12 lags and testing the rest for inclusion; the other parameters were kept unchanged, as shown in Figures 8c, 8f, 9b, and 9e. In addition, for daily new cases in France, the United States, the United Kingdom, and Brazil, we used 24 lags while retaining all of them, as shown in Figures 8a, 8b, 8g, and 9f. For India daily deaths, we used four hidden layers, 20 repetitions, and 12 lags, with the same parameters for forecasting and series modelling, as shown in Figure 8e. Figure 9a shows the use of 24 lags, with no testing for inclusion, for daily deaths in the United States. Finally, for France daily deaths, we removed the trend without differencing and used regressors and 12 lags.
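Several of these settings (lagged inputs, modelling the series in differences, and combining the forecasts of repeated network fits with the median) are generic preprocessing and post-processing steps. The NumPy helpers below sketch them under our own naming. The network fits themselves are replaced by hypothetical forecast arrays, and the procedure for testing lags for inclusion is not reproduced.

```python
import numpy as np

def lag_matrix(x, lags):
    """Design matrix with rows [x_{t-1}, ..., x_{t-lags}] and targets x_t."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[lags - j:len(x) - j] for j in range(1, lags + 1)])
    return X, x[lags:]

def median_combine(forecasts):
    """Combine forecasts from repeated network fits (shape: reps x horizon) with the median."""
    return np.median(np.asarray(forecasts, dtype=float), axis=0)

def undifference(last_level, diff_forecast):
    """Turn forecasts of first differences back into levels."""
    return last_level + np.cumsum(diff_forecast)

x = np.array([10, 12, 15, 19, 24], dtype=float)   # toy cumulative-style series
d = np.diff(x)                                     # modelled in differences: [2, 3, 4, 5]
X, y = lag_matrix(d, lags=2)                       # X = [[3, 2], [4, 3]], y = [4, 5]
reps = [[6, 7], [5, 9], [7, 8]]                    # hypothetical difference forecasts from 3 fits
combined = median_combine(reps)                    # [6, 8]
levels = undifference(x[-1], combined)             # [30, 38]
print(X, y, combined, levels)
```

The median is less sensitive than the mean to a single poorly trained repetition, which is one reason to prefer it when combining many small networks.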

Spectral analysis results
We first checked for stationarity and normality before applying spectral analysis to our time series. We used the Ljung-Box test, the Jarque-Bera normality test, and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and found that the p-value  0.01 for all of these tests and for all of the time series considered in this article, leading us to the conclusion that our data set is stationary and its residual is normal (the KPSS test takes stationarity as its null hypothesis, and the Jarque-Bera test takes normality as its null hypothesis). We then estimated the spectral density and smoothed the periodogram. We plotted the spectral density and spectrum and performed harmonic regression, because the goal of spectral analysis is to decompose a time series into periodic components: we regressed the time series on a set of sine and cosine waves, including daily harmonics as well as other harmonics. For the entire data set, we obtained p < 0.05. We were able to describe the daily variation accurately, and the median of the residuals was negative and close to zero, which is consistent with normality. For the plots, we used a continuous (linear) scale for x and a log scale for y, since log scaling has some theoretical advantages.
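To reproduce two of these diagnostics, the NumPy sketch below computes the Ljung-Box Q statistic and the Jarque-Bera statistic together with their chi-square p-values. The closed-form chi-square tail used here holds only for even degrees of freedom, which is why the lag h = 14 (two weeks) is chosen; that lag is itself an assumption. The KPSS test is omitted because its p-values come from tabulated critical values. In practice, statsmodels (kpss, acorr_ljungbox) and SciPy (scipy.stats.jarque_bera) provide tested implementations. Here the Ljung-Box test is applied to a raw synthetic series for illustration; it is usually applied to model residuals.

```python
import math
import numpy as np

def chi2_sf_even(x, df):
    """Chi-square survival function P(X > x), closed form valid for even degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def jarque_bera(x):
    """JB statistic and p-value; H0: the data are normally distributed."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = np.mean(d ** 2), np.mean(d ** 3), np.mean(d ** 4)
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = x.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return float(jb), chi2_sf_even(float(jb), 2)

def ljung_box(x, h=14):
    """Ljung-Box Q statistic and p-value; H0: no autocorrelation up to lag h (h must be even here)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    denom = np.sum(d ** 2)
    rho = np.array([np.sum(d[k:] * d[:-k]) / denom for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
    return float(q), chi2_sf_even(float(q), h)

rng = np.random.default_rng(4)
t = np.arange(140)
weekly = np.sin(2 * np.pi * t / 7) + 0.1 * rng.normal(size=t.size)   # synthetic series with a weekly cycle
q, p_lb = ljung_box(weekly, h=14)
jb, p_jb = jarque_bera(np.tile([-1.0, 1.0], 50))                     # a clearly non-normal two-point sample
print(p_lb, jb, p_jb)
```

A small p-value rejects the null hypothesis. For this reason the direction of each test's null matters when the results are read as evidence of stationarity or normality.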
The periodogram values should be roughly normally distributed on the log scale, and log scaling can be useful because it spreads out the low frequencies while squashing the high frequencies. Because the periodogram is noisy, we smoothed it using a moving-average kernel of span 9. We also used tapering and the multitaper method to estimate the spectral density; the multitaper method allows peaks to be tested with an F-test. We found that p < 0.001 for the entire data set, indicating significant peaks in the spectrum. The time-bandwidth parameter (NW) used is 16 and the number of tapers is 31 (that is, 2NW - 1). Figures 10 and 11 show the results of the spectral analysis, together with a forecast based on the analysis, methods and theoretical study given in [50].
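Smoothing the raw periodogram with a moving average over 9 neighbouring frequencies and taking logarithms for plotting can be sketched as below. We read the article's 9-point moving-average smoothing as a kernel of span 9, which is an assumption, as are the edge handling and the synthetic data. The multitaper estimate itself (DPSS tapers and Thomson's F-test) is not implemented here. SciPy exposes the tapers as scipy.signal.windows.dpss, and the sketch only records the usual relation between the time-bandwidth parameter and the number of tapers.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram P_k = a_k**2 + b_k**2 at frequencies k/n, k = 1, ..., n//2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x - x.mean())
    k = np.arange(1, n // 2 + 1)
    return k / n, 4.0 * np.abs(X[k]) ** 2 / n ** 2

def smooth_periodogram(P, span=9):
    """Moving-average (Daniell-type) smoothing over `span` neighbouring frequencies;
    near the ends the average is taken over the available neighbours only."""
    kernel = np.ones(span)
    total = np.convolve(P, kernel, mode="same")
    count = np.convolve(np.ones_like(P), kernel, mode="same")
    return total / count

NW = 16
n_tapers = 2 * NW - 1   # the usual multitaper choice K = 2NW - 1, i.e. 31 tapers for NW = 16

rng = np.random.default_rng(5)
t = np.arange(560)
x = 20 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(size=t.size)   # synthetic daily series
freq, P = periodogram(x)
S = smooth_periodogram(P, span=9)
log_S = np.log10(S)     # log scale for the y axis of the spectrum plot
print(freq[np.argmax(S)], n_tapers)
```

Smoothing trades frequency resolution for variance. The weekly peak is still found, but only to within a few frequency bins of 1/7 cycles per day.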

Discussion
According to Tables 1 and 2, DNN has the smallest error for daily new cases in the United States, and Figure 4a shows that the estimate is good and reflects the situation of the pandemic in the country, where new cases fluctuate; daily deaths in the United States are best predicted by GRU, which has the smallest error. For the UK, MLP performed better than the other models for daily new cases, as shown in Figure 8b, where daily new cases are decreasing from November 2021 as observed in the real data, while CNN best predicts daily deaths, as shown in Figure 5b. For Russia, daily new cases and daily deaths are best predicted by MLP, which has the smallest error, and Figures 8d and 9d show that the forecast curve is increasing, which corresponds to the current situation in the country; with the festive season approaching in December 2021, cases are expected to increase if not controlled, and also as new variants emerge, because the virus keeps mutating. For France, CNN best predicts daily deaths, indicating that, despite a decreasing trend, daily deaths will fluctuate, as shown in Figure 5g, and MLP best predicts daily new cases, indicating a slight increase in September 2021, as shown in Figure 8g. MLP also achieves the best prediction for daily new cases and daily deaths in India, Brazil, and Turkey, as shown in Figures 8c, 8e, 8f, 9c, 9e, and 9f. Despite a slight increase in cases in September, the pandemic dynamics from October in India, Brazil, and Turkey show a decreasing but fluctuating trend. Figures 10 and 11 show a clear peak at a frequency of about 0.145 cycles per day, which corresponds to a period of about 7 days (1/0.145 ≈ 6.9 days), i.e. a weekly pattern in the countries considered, for both daily new cases and daily deaths. The exception is Turkey's daily deaths (Figure 11f), where peaks are barely visible, with a slight peak at 0.14 and another at 0.08.
India daily deaths, Russia daily new cases, and United Kingdom daily new cases have only one clear peak (see Figures 11b, 10e, and 10g), whereas the others also have smaller peaks at about 3.6 days and 3.4 days, i.e. roughly mid-week. This is expected because cases accumulate from the beginning of the week, and we have also seen some countries collate their infection records after three days or more (e.g., once a week in Cameroon). Furthermore, the forecasting pattern of the spectral analysis corresponds to that of the MLP forecasts, except for Russia's daily new cases and daily deaths, for which the two methods give different results. Forecasting based on spectral analysis also confirms the fluctuations in the MLP forecast values. We also observed that all the methods are sensitive to their parameters: for instance, for DNN, going beyond 32 neurons gives a much higher error and poorer prediction, and for MLP, using too many lags and neurons for some of the data also degrades the prediction, as explained in the section dedicated to the models. Training time also varies across models and is one of the factors affecting their performance: the LSTM had the shortest training time, which may explain why it did not give the best performance on any of the data sets. Finally, we noticed that model performance depends strongly on the size of the data set.
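The frequencies read off Figures 10 and 11 (in cycles per day) and the periods quoted in days are simple reciprocals of each other. The short Python sketch below makes that arithmetic explicit. One may also note that a weekly pattern that is not purely sinusoidal produces a harmonic at 2/7 ≈ 0.286 cycles per day (3.5 days), which lies between the 3.6-day and 3.4-day peaks reported here. This is offered as a possible alternative reading, not as the article's interpretation.

```python
def period_days(freq):
    """Period in days of a component with frequency `freq` in cycles per day."""
    return 1.0 / freq

def freq_per_day(period):
    """Frequency in cycles per day of a component with a period of `period` days."""
    return 1.0 / period

for f in (0.145, 0.14, 0.08):
    print(f"{f} cycles/day -> {period_days(f):.2f} days")
# 0.145 -> 6.90, 0.14 -> 7.14, 0.08 -> 12.50
for p in (7, 3.6, 3.5, 3.4):
    print(f"{p} days -> {freq_per_day(p):.3f} cycles/day")
# 7 -> 0.143, 3.6 -> 0.278, 3.5 -> 0.286, 3.4 -> 0.294
```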

Conclusions
In this paper, we used different deep learning methods for prediction, and spectral analysis of COVID-19 outbreak data with recurrent forecasting, for the UK, USA, India, France, Brazil, Turkey, and Russia, which are the countries where the pandemic is most prevalent across the globe. We show that, despite vaccination campaigns, daily new cases are still increasing in some of these countries, although deaths are reduced despite the high rate of infection. From a medical point of view, this suggests that people's immune systems, helped by vaccination, are fighting the virus. The retro-predicted values obtained with our methods are close to, and often coincide with, the observed values of the time series. We also find significant peaks at about 7 days, 3.6 days, and 3.4 days. Mitigation measures such as vaccination, use of face masks, and social distancing have reduced people's exposure to the virus [68,69] and are dynamically changing the values of the time series. We have also shown that rRMSE is better suited than RMSE to checking model performance, because RMSE values are large, especially for daily new cases in all countries.
In conclusion, among the deep learning methods, MLP gave better predictions than the other methods for four of the seven countries considered, for both daily new cases and daily deaths, followed by CNN; the spectral analysis predictions also show patterns similar to those of MLP.
In future work, we intend to apply the analysis presented in this article on a continental basis, in order to better forecast the future evolution of the pandemic on different continents. Deep learning performs better with large data sets, and we hope that this new approach will improve on the results obtained so far in this line of research.