SIRVD-DL: A COVID-19 deep learning prediction model based on time-dependent SIRVD

COVID-19 is one of the biggest challenges that humanity has faced in recent years. Many researchers have proposed prediction methods for modeling virus transmission and forecasting the trend of COVID-19. Among them, methods based on artificial intelligence currently attract the most interest and are the most widely used. However, prediction with artificial intelligence methods alone cannot capture the temporal pattern of infectious disease transmission. To solve this problem, this paper proposes a COVID-19 prediction model based on a time-dependent SIRVD model and deep learning. The model combines deep learning with a mathematical model of infectious diseases, and forecasts the parameters of the mathematical model with deep learning models such as LSTM and other time series prediction methods. Against the background of mass vaccination, we analyzed COVID-19 data from January 15, 2021, to May 27, 2021, in nine countries: India, Argentina, Brazil, South Korea, Russia, the United Kingdom, France, Germany, and Italy. The experimental results show that the prediction model not only improves single-day predictions by 50% compared to pure deep learning methods, but can also be adapted to short- and medium-term predictions, which makes the overall prediction more interpretable and robust.


Introduction
COVID-19 is an infectious disease caused by a novel coronavirus (SARS-CoV-2). Since its outbreak at the end of 2019, it has spread throughout the world. As of June 15, 2021, there were 176,275,966 confirmed cases and 3,812,436 deaths worldwide. COVID-19 has had a huge impact on the global economy, human health, and daily life. In response to this pandemic, researchers and policymakers around the world have been working hard to develop countermeasures against COVID-19 in order to control the epidemic and reduce its impact on human health and the economy [1]. A large number of studies have modeled the spread of the virus and predicted the number of people infected [2].
Since Kermack proposed the mathematical theory of epidemics in 1927, mathematical models of infectious diseases have been used as an important tool for analyzing epidemiological characteristics and transmission [3]. Since the outbreak of COVID-19, several mathematical models have been used to predict the epidemic. Among them, the most widely used are the SIR (Susceptible, Infected, and Recovered) and SEIR (Susceptible, Exposed, Infected, and Recovered) models, and most studies modify and adjust these two models. Liao et al. [4] proposed a generalized adaptive SIR prediction model that introduces a time window mechanism for dynamic data analysis, which can dynamically measure the basic reproduction number and the exponential growth rate. Pushpendra et al. [5] proposed a generalized SIR (GSIR) model, which includes multi-day reported cases. The experimental evaluation on COVID-19 data from different countries shows that the model can provide continuous prediction and monitoring of the COVID-19 pandemic. Based on the SIR model, Cooper et al. [6] regarded the number of susceptible people as a variable, which provided a theoretical framework for studying the spread of COVID-19 in the community. Hakimeh et al. [7] used Caputo derivatives to establish a fractional-order SIRD (Susceptible, Infected, Recovered, and Deceased) mathematical model for the transmission of COVID-19, and predicted the second wave of the epidemic in Iran and Japan through numerical simulation with derivatives of different orders. Ramezani et al. [8] proposed a variant of the SEIRD (Susceptible, Exposed, Infected, Recovered, and Deceased) model to capture the non-linear behavior of the COVID-19 pandemic while accounting for asymptomatic infected individuals. Mathematical models of infectious diseases can be used to evaluate the transmission process of epidemics, but the introduction of model parameters is based on many assumptions.
Unknown parameters need to be estimated by model fitting, which makes the model more uncertain [9]. Moreover, given the long duration of COVID-19, the change of model parameters is related to many factors, and it is a challenge for a single model to fit the real situation well [10]. In order to effectively predict COVID-19, more advanced methods should be applied to consider the various factors in the process of virus transmission.
As an emerging method, deep learning is widely used in the analysis and prediction of the COVID-19 epidemic. Vinay et al. [11] used the LSTM (long short-term memory) deep learning method for the first time to model the spread of the infectious disease in Canada and predict the severity of COVID-19. Smail et al. [12] used ARIMA (autoregressive integrated moving average), NARNN (nonlinear autoregressive neural network), and LSTM methods to model confirmed COVID-19 cases, and found that LSTM was the most accurate model. Khondoker et al. [13] studied four deep learning models: LSTM, GRU (gated recurrent unit), CNN (convolutional neural network), and MCNN (multivariate convolutional neural network). Their results show that the CNN is superior to the other deep learning models in terms of validation accuracy and prediction consistency. Jayanthi et al. [14] used ARIMA, LSTM, Stacked LSTM, and Prophet models to analyze and predict the global cumulative numbers of confirmed cases, deaths, and recoveries. The results show that the Stacked LSTM algorithm has higher accuracy than the other methods considered, with an error of less than 2%. The above research shows that deep learning can provide reliable single-day prediction results. However, this kind of method has two problems. The first is that methods based on pure numerical fitting cannot correctly capture the trend of an epidemic as it spreads [13]. The second is that the temporal pattern of the number of infected people is very simple, so it is difficult for these methods to find the right patterns of long-term change, which leads to predictions that are only valid for a short period. A model that can provide forecasts for longer periods is undoubtedly more meaningful for policy-making and strategic planning.
Considering that epidemic mathematical models encode background knowledge of infectious diseases and deep learning has strong predictive capabilities, combining the two may improve the interpretability of deep learning methods and generate more robust predictions [15]. This paper proposes a COVID-19 prediction model, SIRVD-DL (Susceptible, Infected, Recovered, Vaccinated, and Deceased - Deep Learning), which applies deep learning to a time-dependent SIRVD model so that the combined model is more explanatory and provides effective predictions. First, the model modifies the existing SIRVD model with vaccination status so that the model parameters can be measured dynamically. The model parameters are then smoothed with the moving average method to deal with data noise and anomalies, and predicted by the deep learning method. Finally, the predicted parameters are substituted into the SIRVD model to obtain the final predicted number of infections. We have conducted numerical experiments to address the following three research questions.
• RQ1: Is the time-dependent SIRVD-DL model reasonable and effective?
• RQ2: How is the prediction performance of the proposed SIRVD-DL model? What is the improvement compared to single deep learning methods? How does the model perform in short- and medium-term forecasts?
• RQ3: Is the moving average method proposed in this paper effective for data smoothing? What is the difference between using this method and not using it?
The rest of the paper is organized as follows. Section 2 proposes the SIRVD-DL prediction model. Section 3 presents numerical experiments and analyzes the results to illustrate the effectiveness of our model. Section 4 provides discussion and suggestions. Finally, the last section summarizes the paper.

Method
The SIRVD-DL prediction model is mainly divided into three parts: data preprocessing, constructing the SIRVD model, and the deep learning model. The overall workflow of the model is shown in Fig. 1.
Step 1: Data preprocessing. First, data sets of COVID-19 are obtained, including the numbers of confirmed cases, recovered cases, deceased cases, and vaccinations. The data is then preprocessed and transformed into the format required by the SIRVD-DL model.
Step 2: Constructing the SIRVD model. In this step, an existing SIRVD model is modified to adapt to dynamic changes over time. The curve of parameter changes over time is then built by measuring the model parameters.
Step 3: Deep learning model. Based on the measured model parameters obtained in the previous step, the moving average method is used to smooth the data. A time series vector is then constructed as the input. Model training and evaluation are carried out with deep learning methods such as LSTM and its variants to select the best model parameters. Using these model parameters, the SIRVD-DL model is built to predict the number of COVID-19 infections.
SIRVD-DL aims to evaluate the changes in the parameters of the epidemic in order to build the best model for predicting the development trend of the epidemic. In the rest of this section, we describe each part in detail.

Data sources
Since the outbreak of COVID-19, many governments have released public data sources and incorporated real-time observation data for up-to-date analysis and prediction. This article collects research data from two data sources. The first is the time-series data collected by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, which is widely used by researchers and includes the global cumulative confirmed cases, cumulative recovered cases, and cumulative deaths. The data set provides variables such as country, province, longitude, latitude, and the number of cases for each date. Table 1 shows some sample data of global confirmed cases.
The second data source is the website Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations, as well as other variables of potential interest. The data used in this article is the COVID-19 vaccination data [16] collected from official reports by the Our World in Data team. Table 2 shows some sample vaccination data. In the table, location is the name of the country (or region within a country), iso_code is a three-letter country code, date is the date of the observation, total_vaccinations is the total number of doses administered, people_vaccinated is the total number of people who received at least one vaccine dose, and people_fully_vaccinated is the total number of people who received all doses prescribed by the vaccination protocol.

Data transforming
After obtaining the above two data sets, we need to transform the data into the format required by the SIRVD model. The number of infected cases I(t) is computed from the cumulative confirmed cases C(t), the recovered cases R(t), and the deceased cases D(t), and the number of susceptible people S(t) is then derived from the total population N and the other compartments. Here S(t), I(t), R(t), V(t), and D(t) are functions of time that represent the numbers of susceptible, infected, recovered, vaccinated, and deceased individuals at time t in a population of total size N. C(t), R(t), D(t), and V(t) can be obtained from the two data sets mentioned above. The population N can be obtained from the data provided by Our World in Data, which is based on the latest revision of the United Nations World Population Prospects. After preprocessing the data, the next step is to construct a time-dependent SIRVD model.
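The transformation described above can be sketched in a few lines of Python. The two relations used here, I(t) = C(t) − R(t) − D(t) and S(t) = N − I(t) − R(t) − V(t) − D(t), are assumptions reconstructed from the variable definitions in the text; the function name `to_compartments` and the toy numbers are hypothetical and only serve as an illustration.

```python
import numpy as np

def to_compartments(confirmed, recovered, deceased, vaccinated, population):
    """Convert cumulative series into S, I, R, V, D compartment series.

    Assumed relations (reconstructed from the definitions, not quoted):
        I(t) = C(t) - R(t) - D(t)
        S(t) = N - I(t) - R(t) - V(t) - D(t)
    """
    C = np.asarray(confirmed, dtype=float)
    R = np.asarray(recovered, dtype=float)
    D = np.asarray(deceased, dtype=float)
    V = np.asarray(vaccinated, dtype=float)
    I = C - R - D                    # currently infected (active) cases
    S = population - I - R - V - D   # everyone not in another compartment
    return S, I, R, V, D

# Toy numbers, purely illustrative (not real data).
S, I, R, V, D = to_compartments(
    confirmed=[100, 150], recovered=[40, 60], deceased=[5, 6],
    vaccinated=[10, 20], population=1000,
)
```

By construction the five series sum to N at every time step, which is the conservation property the SIRVD model relies on.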

The time-dependent SIRVD model

The basic SIRVD epidemic model
The basic SIRVD model [17] is based on the SIR epidemic model. The SIRVD model describes the interaction of the virus with the host during transmission, and divides the population into five types: susceptible, infected, recovered, vaccinated, and deceased. Compared with the SIR model, SIRVD adds two new states, vaccinated and deceased. Vaccinated refers to individuals in the population who have been vaccinated against the disease and cannot be infected, while deceased refers to individuals who died because of the disease [18]. The SIRVD model can be represented by a system of ordinary differential equations in which β is the infection rate, that is, the rate of transition from susceptible to infected; γ is the recovery rate, the rate of transition from infected to recovered; δ is the death rate, the rate of transition from infected to deceased; α is the vaccination rate, the rate of transition from susceptible to vaccinated; and σ is the susceptibility rate, the rate of transition from recovered to susceptible. It should be noted that the transmission process of the virus is described by the nonlinear term βI(t)S(t)/N, which represents the number of people transferred from the susceptible state S to the infected state I per unit time.
In the basic SIRVD model, the five parameters (infection rate β, recovery rate γ, death rate δ, vaccination rate α, and susceptibility rate σ) are assumed to be constant, which ignores the fact that they change continuously in the process of virus transmission. In order to accurately and effectively predict the development trend of the disease, we propose a time-dependent SIRVD model, in which the five parameters are all functions of time t. This time-dependent SIRVD model can better track and control the spread of the disease and predict future trends.
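Written out from the parameter definitions above (β: S to I, γ: I to R, δ: I to D, α: S to V, σ: R to S), the system can be sketched as follows. This is a reconstruction under those stated transitions rather than a verbatim copy of the original equations; the last line is the conservation relation that the text later refers to as Eq. (8).

```latex
\begin{aligned}
\frac{dS(t)}{dt} &= -\frac{\beta I(t) S(t)}{N} - \alpha S(t) + \sigma R(t) \\
\frac{dI(t)}{dt} &= \frac{\beta I(t) S(t)}{N} - \gamma I(t) - \delta I(t) \\
\frac{dR(t)}{dt} &= \gamma I(t) - \sigma R(t) \\
\frac{dV(t)}{dt} &= \alpha S(t) \\
\frac{dD(t)}{dt} &= \delta I(t) \\
S(t) + I(t) + R(t) + V(t) + D(t) &= N
\end{aligned}
```

The five right-hand sides sum to zero, which is consistent with a constant total population N.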

The time-dependent SIRVD epidemic model
The time-dependent SIRVD model regards the infection rate β, the recovery rate γ, the death rate δ, the vaccination rate α, and the susceptibility rate σ as functions of time t: β(t), γ(t), δ(t), α(t), and σ(t). The differential equations are rewritten accordingly as Eqs. (9)-(13), and the five variables (S(t), I(t), R(t), V(t), and D(t)) still satisfy Eq. (8). If we assume that the total population N is constant, then the increases and decreases of the individual compartments sum to 0.
Since the data of COVID-19 is updated daily, we can change Eqs. 9-13 into difference equations.
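Assuming a time step of one day, the discretized system can be sketched as below. It is a reconstruction from the parameter definitions with each rate made time-dependent; the original equation numbers are not reproduced because the ordering of the equations cannot be recovered with certainty.

```latex
\begin{aligned}
S(t+1) - S(t) &= -\frac{\beta(t) I(t) S(t)}{N} - \alpha(t) S(t) + \sigma(t) R(t) \\
I(t+1) - I(t) &= \frac{\beta(t) I(t) S(t)}{N} - \gamma(t) I(t) - \delta(t) I(t) \\
R(t+1) - R(t) &= \gamma(t) I(t) - \sigma(t) R(t) \\
V(t+1) - V(t) &= \alpha(t) S(t) \\
D(t+1) - D(t) &= \delta(t) I(t)
\end{aligned}
```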
Similarly, the five variables (S(t), I(t), R(t), V(t), and D(t)) still satisfy Eq. (8). During the spread of COVID-19, the reinfection rate can be assumed to be approximately zero because the human body produces antibodies against the virus [19]. Therefore, Eq. (15) can be rewritten as Eq. (20), from which the expression for γ(t) is obtained. In the same way, the other two time-dependent parameters α(t) and δ(t) can be easily derived from the difference equations Eq. (16) and Eq. (17).
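Under the assumption σ(t) = 0 and a daily time step, the recovered, vaccinated, and deceased difference equations can each be solved for their rate. The following is a sketch of the resulting expressions, reconstructed from the definitions in the text rather than copied from the original equations.

```latex
\begin{aligned}
R(t+1) - R(t) &= \gamma(t) I(t)
  &&\Longrightarrow\quad \gamma(t) = \frac{R(t+1) - R(t)}{I(t)} \\
V(t+1) - V(t) &= \alpha(t) S(t)
  &&\Longrightarrow\quad \alpha(t) = \frac{V(t+1) - V(t)}{S(t)} \\
D(t+1) - D(t) &= \delta(t) I(t)
  &&\Longrightarrow\quad \delta(t) = \frac{D(t+1) - D(t)}{I(t)}
\end{aligned}
```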
Substituting the recovery rate and the death rate into Eq. (14) gives the time-dependent parameter β(t), which is expressed by Eq. (24).
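The parameter measurement can be illustrated with a short Python sketch. It assumes a daily time step, σ(t) = 0, and the infected-compartment balance I(t+1) − I(t) = β(t)I(t)S(t)/N − γ(t)I(t) − δ(t)I(t), so that β(t) = N[ΔI + ΔR + ΔD] / (S(t)I(t)). The function name `estimate_parameters` and the toy series are hypothetical.

```python
import numpy as np

def estimate_parameters(S, I, R, V, D, N):
    """Estimate daily beta, gamma, delta, alpha from compartment series.

    Assumes sigma(t) = 0 and one-day differences. Days with I(t) = 0 or
    S(t) = 0 would produce divisions by zero and must be handled by the caller.
    """
    S, I, R, V, D = (np.asarray(x, dtype=float) for x in (S, I, R, V, D))
    dI, dR, dV, dD = (np.diff(x) for x in (I, R, V, D))
    S_t, I_t = S[:-1], I[:-1]          # values at day t for each difference

    gamma = dR / I_t                   # recovery rate
    delta = dD / I_t                   # death rate
    alpha = dV / S_t                   # vaccination rate
    # New infections per day equal dI + dR + dD when sigma = 0.
    beta = N * (dI + dR + dD) / (S_t * I_t)
    return beta, gamma, delta, alpha

# Toy two-day series generated with beta=0.3, gamma=0.1, delta=0.01, alpha=0.02.
beta, gamma, delta, alpha = estimate_parameters(
    S=[900, 855], I=[100, 116], R=[0, 10], V=[0, 18], D=[0, 1], N=1000,
)
```

Each returned array has one entry fewer than the input series, because a rate at day t needs the observation at day t + 1.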

Deep learning prediction model
Through the time-dependent SIRVD model, measured values of the parameters can be obtained and arranged as time series. One problem is that the time series of the number of infected people does not exhibit a usable sequential pattern [13]. To solve this problem, our model first predicts the estimated parameters to discover the development trend of the epidemic, and then builds the time-dependent SIRVD model from the predicted parameter values. Deep learning is an effective time series prediction method [20]. In the following, we introduce some specific deep learning time series prediction methods, the prediction process of SIRVD-DL, and the evaluation metrics of the model.

Deep learning time series prediction model
Deep learning (DL) performs well in time series analysis and prediction, and can automatically learn the temporal correlations and structure of data, such as seasonality, periodicity, and trend [14]. The recurrent neural network (RNN) is a kind of neural network designed for sequence-structured data. Its hidden state is distributed across time steps, which makes it possible to store a lot of information about the past. RNNs are commonly used in predictive applications because of their ability to process variable-length sequences [21]. However, a standard RNN cannot overcome vanishing or exploding gradients, and because the hidden-layer activation only involves the previous time step, it can only store short-term memory and cannot pass on long-term historical information [22]. To solve this problem, Hochreiter et al. proposed the long short-term memory (LSTM) neural network. The model includes a memory cell and three gating units that control the flow of data; it can decide which values to store, forget, or erase, and alleviates the vanishing and exploding gradient problem that affects long-term dependencies in recurrent neural networks. There are many variants of LSTM; those considered here are Vanilla LSTM, Stacked LSTM, Bidirectional LSTM, and GRU, which are described in detail below.

Vanilla LSTM
Vanilla LSTM is the most basic and most widely used LSTM [22]. Fig. 2 shows the architecture of Vanilla LSTM. The rounded rectangle represents a cell state, the green circle represents the input of the time series, and the purple circle represents the output of the hidden layer. The output of the cell state at time t and the output of the hidden layer are passed to the next cell state, so that data flows through the cell state. The cell state contains a memory block of hidden units, which control the flow of information from the input to the output. The first sigmoid function is the forget gate, which discards information from the previous cell state. The next sigmoid and the first tanh function form the input gate, which determines what new information is stored in the cell state. The last sigmoid function is the output gate, which determines the information passed to the next hidden state. Eqs. (25)-(29) give the specific mathematical form of LSTM, where W denotes the weight matrices and b the biases of the forget gate f_t, input gate i_t, cell state c_t, and output gate o_t.
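For reference, a commonly used formulation of the LSTM cell that matches the gates described above is sketched below. The exact notation and numbering of the original equations may differ. Here sigm denotes the logistic sigmoid (written this way to avoid confusion with the susceptibility rate σ), ⊙ is element-wise multiplication, h_t is the hidden state, and x_t is the input at time t.

```latex
\begin{aligned}
f_t &= \operatorname{sigm}\!\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \operatorname{sigm}\!\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \operatorname{sigm}\!\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```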

Stacked LSTM
Stacked LSTM [23], also known as a deep LSTM, is an extension of the vanilla LSTM we described above. In Stacked LSTM, there are multiple hidden layers containing multiple memory cells. Multi-layer superposition increases the depth of the neural network. Each layer has some information and passes it to the next layer to form an accurate model with a higher level and deeper representation. The stacked LSTM structure is shown in Fig. 3. For each time step, it provides a separate output instead of providing a single output for all time steps.

Bidirectional LSTM
The standard RNN processes the input in only one direction and ignores the information available from the future. This problem is solved by the bidirectional topology of LSTM. Bidirectional LSTM [24] extracts complete temporal information at time t by considering both past and future information. This method divides the hidden neurons of the standard RNN into a forward state and a backward state, and the neurons of the two states are not connected to each other. The basic structure of a bidirectional LSTM unrolled over three time steps is shown in Fig. 4. Without the backward state, this structure reduces to a standard unidirectional RNN. With this structure, there is no need to include additional delays as in standard RNNs.

GRU
GRU (gated recurrent unit) is a variant of LSTM [25] with a similar design. It is mainly used to address the vanishing gradient problem of typical RNNs, thereby improving the learning of long-term dependencies in the network. Fig. 5 shows the structure of GRU. The GRU block also uses the tanh and sigmoid functions to calculate its values, but unlike the LSTM block it has no separate memory cell and no separate forget gate; the update gate is responsible for controlling the flow of information. In addition to the update gate, the GRU block has a reset gate. Because of these structural differences, GRU has fewer parameters and a simpler design, which makes it more computationally efficient and easier to train. In a GRU block, four values are calculated: update gate, reset gate, candidate activation, and output activation. Each gate and the candidate activation has its own weights and bias, and the current block input and the previous activation are used as inputs for calculating these values. In the first step, the sigmoid function is used to calculate the values of the gates. Eqs. (30)-(33) give the specific mathematical form of GRU.
In this paper, Vanilla LSTM, Stacked LSTM, Bidirectional LSTM, and GRU are all implemented using the Keras package (an open-source software library that provides a Python interface for artificial neural networks), and MSE is used as the loss function for all of them. Table 3 shows the number of layers, the number of units, and the total number of parameters for each method.

Predicting process
We use the estimated model parameters β(t), γ(t), and δ(t) as the prediction targets of the deep learning method. Since the model parameters change continuously during the development of the epidemic, we construct a time window of size w as one dimension of the input time series fed to the deep learning model in order to capture this change. Let x denote the model parameter to be predicted, where x is one of {β(t), γ(t), δ(t)}; the corresponding time series is then (x_1, x_2, …, x_T).
Generally, the model parameters evaluated by SIRVD contain noise, which affects the subsequent extraction of temporal features by deep learning. To deal with this problem, a moving average with a small sliding window is used to smooth the model parameters. It divides the model parameter curve into two parts: baseline and residuals. Specifically, let the length of this small sliding window be W and the step length be 1. For each point x_t, the corresponding point on the baseline is x*_t, whose value is the average of (x_{t−W+1}, x_{t−W+2}, …, x_t). The difference between x_t and x*_t is called the residual. The baseline B consists of the values x*_t, and the residuals R consist of the differences x_t − x*_t.
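The baseline and residual split can be sketched with NumPy as follows. The helper name `moving_average_split` is hypothetical, and dropping the first W − 1 points (which have no full trailing window) is an assumption, since the text does not say how the start of the series is handled.

```python
import numpy as np

def moving_average_split(x, W):
    """Split a series into a trailing moving-average baseline and residuals.

    baseline[k] is the mean of x[k .. k+W-1], i.e. the average of the W most
    recent values ending at position t = k + W - 1. The first W - 1 points
    are dropped because they lack a full window (an assumption).
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(W) / W
    baseline = np.convolve(x, kernel, mode="valid")  # length len(x) - W + 1
    residual = x[W - 1:] - baseline                  # x_t - x*_t
    return baseline, residual

# Toy series with one spike at the third point.
baseline, residual = moving_average_split([1.0, 2.0, 6.0, 4.0, 5.0], W=3)
```

Only the baseline would be passed on to the deep learning model; the residual is kept here solely to inspect how much noise was removed.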
By using the moving average method, most of the noise is eliminated while the baseline retains the underlying shape of the data. The residual contains random noise and is not used as model input. Thus, a single input sample has shape w × 1. For a time range of length T and a model output dimension d, the input data that can be constructed has dimension (T − w − d) × w. Finally, the optimal hyperparameters are obtained by training and evaluating the model, which is then used to predict β(t), γ(t), and δ(t). The predicted number of infections can be obtained by using Eq. (16). The prediction algorithm is shown in ALGORITHM 1.
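A minimal sketch of how the windowed training samples and the final infection update might be built is given below. The function names `make_windows` and `next_infected` are hypothetical. With the indexing used here the number of samples is T − w − d + 1, so the count can differ by one from other conventions for handling the endpoints. The update rule assumes the infected-compartment difference equation with σ(t) = 0.

```python
import numpy as np

def make_windows(series, w, d):
    """Build (samples, w, 1) inputs and (samples, d) targets from a 1-D series."""
    series = np.asarray(series, dtype=float)
    n = len(series) - w - d + 1
    if n <= 0:
        raise ValueError("series too short for the given w and d")
    X = np.stack([series[i:i + w] for i in range(n)])
    y = np.stack([series[i + w:i + w + d] for i in range(n)])
    return X[..., np.newaxis], y   # trailing axis: one feature per time step

def next_infected(I_t, S_t, N, beta, gamma, delta):
    """One-day update of the infected count from predicted parameters.

    Assumes I(t+1) = I(t) + beta*I*S/N - gamma*I - delta*I.
    """
    return I_t + beta * I_t * S_t / N - gamma * I_t - delta * I_t

X, y = make_windows(np.arange(10), w=3, d=2)
I_next = next_infected(I_t=100, S_t=900, N=1000, beta=0.3, gamma=0.1, delta=0.01)
```

The array X has the layout that recurrent layers typically expect (samples, time steps, features), with a single feature because each parameter is predicted as its own series.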

Model evaluation metrics
By comparing actual and predicted values, the accuracy of the model can be evaluated. This study uses four evaluation metrics to make fair and effective comparisons: root mean square error (RMSE), normalized root mean square error (NRMSE), coefficient of determination (R²), and mean absolute percentage error (MAPE). The calculation methods are given in Eqs. (37)-(40).
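The four metrics can be sketched with NumPy as follows. The definitions of RMSE, R², and MAPE are the standard ones; normalizing NRMSE by the range of the observed values is an assumption, since other normalizations (for example by the mean) are also in use and the text does not say which one is applied.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def nrmse(y_true, y_pred):
    # Assumption: normalize by the range (max - min) of the observed values.
    y_true = np.asarray(y_true, float)
    return rmse(y_true, y_pred) / float(y_true.max() - y_true.min())

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

def mape(y_true, y_pred):
    # Expressed in percent; undefined when an observed value is zero.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

actual = [100.0, 200.0, 300.0, 400.0]
close = [110.0, 190.0, 310.0, 390.0]     # small errors
reversed_pred = actual[::-1]             # worse than predicting the mean
```

With this definition R² is not bounded below by zero: `r2(actual, reversed_pred)` is negative because the residual sum of squares exceeds the total sum of squares.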

Parameters setup
The experiments in this paper are conducted with open-source libraries such as NumPy [26], pandas [27], TensorFlow (Google) [28], and Keras [29]. Python [30] is a high-level general-purpose programming language that can interact with deep learning libraries, and the deep learning model structures used in this paper are built with it. Hyperparameters are values that define the architecture of the deep learning model. The correct choice of hyperparameters is the key to achieving high-quality models. In order to determine the best combination of hyperparameters, this paper uses a grid search algorithm that takes a set of candidate values for each tuned hyperparameter. After enumerating every possible hyperparameter combination [31], each combination is used to train the deep learning model. To reduce the influence of the random initialization of the weights, each set of hyperparameters is trained three times, and each trained model is then evaluated. The hyperparameter combinations are shown in Table 4. The data set used in our experiment is India's historical data from January 15, 2021, to May 27, 2021. The data is divided into a training set and a test set to train and test our prediction model. The evaluation metrics of the model are described in Section 2.3.3.

Fig. 8. Single-day prediction using SIRVD-DL in India.
Table 6. The prediction performance in 3-day, 7-day, 14-day, 21-day, and 28-day forecasts of SIRVD-DL and other models.
Fig. 9. Extracting the baseline of model parameters using the moving average method. The baseline x*_t preserves the underlying shape of the original data x_t. (a) β*(t) and β(t) (b) γ*(t) and γ(t) (c) α*(t) and α(t) (d) δ*(t) and δ(t).
Fig. 10. The residual of model parameters using the moving average method.
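The grid search with repeated training described in this section can be sketched as below. The hyperparameter names (`units`, `window`), the candidate values, and the scoring function are hypothetical stand-ins, since Table 4 is not reproduced here; averaging the score over the three runs is also an assumption about how the repeated trainings are combined.

```python
import itertools
import random
import statistics

def grid_search(grid, train_and_score, repeats=3):
    """Try every combination in `grid`; average the score over `repeats` runs.

    `train_and_score(params, seed)` must return an error to minimize.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [train_and_score(params, seed) for seed in range(repeats)]
        mean_score = statistics.mean(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_train_and_score(params, seed):
    """Hypothetical stand-in for training a model and returning its error."""
    noise = random.Random(seed).uniform(0.0, 0.01)  # mimics seed-dependent variation
    return abs(params["units"] - 64) / 64 + abs(params["window"] - 7) / 7 + noise

grid = {"units": [32, 64, 128], "window": [3, 7, 14]}
best_params, best_score = grid_search(grid, toy_train_and_score)
```

In a real experiment `toy_train_and_score` would be replaced by a function that builds and trains the network with the given hyperparameters and returns a validation error such as MAPE.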

RQ1: Is the time-dependent SIRVD model reasonable and effective?
As shown in Fig. 6, the data set for India includes the cumulative number of confirmed cases, the number of susceptible people, the number of infected people, the number of recovered people, the number of vaccinated people, and the number of deaths. We applied the time-dependent SIRVD model to this data set and obtained the changes in the model parameters from January 15, 2021, to May 26, 2021, as shown in Fig. 7, including the infection rate, recovery rate, vaccination rate, and death rate. It should be noted that the susceptibility rate σ(t) is not included here. Since the human body produces antibodies that prevent future re-infection with the COVID-19 virus [19], the susceptibility rate σ(t) is assumed to be zero. It can be seen from Fig. 7(a) that the infection rate is at its maximum when Day = 90. This coincides with the middle of the rising period in Fig. 6(c), when infections rise fastest, which shows that the measurement of the model parameters is valid. In Fig. 7(b), the recovery rate has a trough between Day = 60 and Day = 90. This is because a large number of infections leads to insufficient medical resources, which greatly reduces the success rate of recovery. In the same interval, the vaccination rate in Fig. 7(c) reached a peak. The death rate in Fig. 7(d) gradually increased after the peak of infection, indicating that the number of deaths would subsequently reach its peak.
In general, the model parameters evaluated with the time-dependent SIRVD model are a good measure of real-time changes in the development of the epidemic. Most importantly, Fig. 7 shows that each measured model parameter fluctuates periodically over time, which is the basis for the subsequent parameter prediction.

RQ2: How is the prediction performance of the proposed SIRVD-DL model? What is the improvement compared to single deep learning methods? How does the model perform in short- and medium-term forecasts?
In order to verify the effectiveness of the proposed SIRVD-DL model, we compared it with deep learning prediction methods from existing studies. Among them, Stacked LSTM and Bidirectional LSTM use the methods of Arora et al. [32], and GRU and Vanilla LSTM use the methods of Nabi et al. [13]. Table 5 compares SIRVD-DL with four prediction models (Vanilla LSTM, Stacked LSTM, Bidirectional LSTM, and GRU) on single-day prediction. The test data set is India's data from April 18, 2021, to May 27, 2021. It can be seen from Table 5 that SIRVD-DL is the best on every evaluation metric in single-day forecasting, with the lowest RMSE (385128.55), NRMSE (0.012373), and MAPE (0.92%), and an R² of 0.995. The MAPE is within one percent, which is an improvement of 51% compared to the best existing single deep learning prediction method. Fig. 8 shows the difference between the number of infections predicted by the SIRVD-DL model and the actual number of infections.
At the same time, in order to test the model in short-term and medium-term prediction, we ran 3-day, 7-day, 14-day, 21-day, and 28-day experiments. The experimental results are shown in Table 6. SIRVD-DL achieves a MAPE of 2.51% for 3-day, 5.07% for 7-day, 10.93% for 14-day, 17.89% for 21-day, and 26.57% for 28-day prediction. Compared with the other methods, SIRVD-DL performs better in short-term and medium-term prediction. As the number of forecast days increases, its coefficient of determination R² remains at a relatively high level, which shows that the prediction of the model is effective. It is worth noting that, as the number of prediction days increases, R² becomes negative for the other methods. This is possible because we computed it with the sklearn.metrics.r2_score function of scikit-learn, which can return negative values. A negative value indicates that the prediction error of the fitted model is greater than that of simply predicting the mean, which means that the prediction performance of the model is poor.
In general, the SIRVD-DL model performs well in single-day forecasts and in short-term and medium-term predictions, which shows that the proposed model is effective. Moreover, short-term and medium-term predictions provide important guidance for helping governments balance the load on medical infrastructure and decide on control measures such as imposing or lifting lockdowns.

RQ3: Is the moving average method proposed in this paper effective for data smoothing? What is the difference between using this method and not using it?
In order to predict the number of infections, we first used the moving average method to smooth the parameters evaluated by the SIRVD model, as shown in Fig. 9. The baseline extracted by our algorithm eliminates most of the anomalies and noise while preserving the underlying shape of the data. This processing makes the changes of the model parameters more periodic and more suitable for time series prediction. Fig. 10 shows the residual left after the baseline of the model parameters is extracted. The residual contains random noise, so it is not used for prediction. After the data is smoothed, we apply standardization again to obtain a standardized baseline, which serves as the input of our deep learning model.
We carried out comparative experiments with and without moving-average smoothing. In Fig. 11, the x-axis shows the three predicted model parameters β(t), γ(t), and δ(t). The blue bars are the results of the two methods on the training set, and the green bars are the results on the test set. After the moving average is applied, the MAPE of the predictions drops significantly, which shows that the moving average method considerably improves the prediction of the model parameters. Fig. 12 shows the actual parameter predictions.

Discussion
COVID-19 has changed the daily lives of most people around the world. Researchers across the world have done much work on the transmission of COVID-19, and this work plays an important guiding role for medical workers and government decision-makers. A large number of studies have proposed different COVID-19 prediction models, among which artificial intelligence-based methods currently attract the most interest from the global scientific community [20]. Although artificial intelligence methods can achieve high prediction accuracy, it is difficult for deep learning alone to capture how the number of infected people changes over time [13]. To solve this problem, we proposed the SIRVD-DL model, which combines deep learning methods with a mathematical model of infectious diseases and uses deep learning to predict the parameters of that model. The advantage of the SIRVD-DL prediction model is that the introduction of an infectious disease mathematical model makes the overall prediction more interpretable and robust. Our experimental results show that, compared with other methods [32], SIRVD-DL not only improves the single-day forecast by 51% but is also suitable for short-term and medium-term forecasting. This makes it more practical for policy formulation and strategic planning.
To demonstrate the generality of our method, we also verified it on data from eight other countries: Argentina, Brazil, South Korea, Russia, the United Kingdom, France, Germany, and Italy. The end date of the COVID-19 data is the same for all eight countries, May 27, 2021. The start date of the COVID-19 data is December 29, 2020 for Argentina, January 16, 2021 for Brazil, February 25, 2021 for South Korea, December 15, 2020 for Russia, May 26, 2021 for the United Kingdom, and December 27, 2020 for France, Germany, and Italy. The prediction results of SIRVD-DL for these eight countries are shown in Table 7. From the table, we can see that SIRVD-DL performs well on the data sets of other countries. The average MAPE is within 1.5% and the coefficient of determination is 0.995. This shows that our model generalizes and can be used for COVID-19 prediction in other countries and regions.
It should be noted that our model still has some limitations. One limitation is the accuracy of the data. The data we use are official statistics, which may differ from the actual situation in the real world. In addition, the model does not consider asymptomatic infections or the reinfection of recovered individuals. The reinfection rate after vaccination and the vaccine efficacy rate are also not considered in this paper because they are difficult to obtain and may be inaccurate, although they would affect the model. Another limitation of our research is that the deep learning method we use for parameter prediction may not be optimal, and there may be better ways to solve the model and predict the parameters. We will improve our model in future work.

Conclusion
We proposed the SIRVD-DL model to establish a virus transmission model and predict the trend of COVID-19. The model combines a mathematical model of infectious diseases with deep learning technology to overcome the problem that pure deep learning methods cannot capture development trends. It can track the epidemic model parameters in real time under the current situation of large-scale vaccination and predict the epidemic trend of infectious diseases. Experimental numerical results show that the SIRVD-DL model can estimate model parameters such as the infection rate, recovery rate, and death rate. The single-day prediction accuracy improves by 51% compared with the best existing pure deep learning prediction methods. At the same time, the model performs very well in short-term and medium-term predictions. The single-day prediction error is 2.51%, the 7-day prediction error is 5.07%, the 14-day prediction error is 10.93%, and the 28-day prediction error is within 30%.

Author contributions
Zhifang Liao conceived the experiments, Xiaoping Fan collected the data, Peng Lan and Zhining Liao conducted the experiments, Benjamin Kelly and Aidan Innes analyzed the results. All authors reviewed the manuscript.

Funding
This work was supported by the Special Funds for the Construction of an Innovative Province of Hunan, No. 2020GK2028.