Epi-DNNs: Epidemiological priors informed deep neural networks for modeling COVID-19 dynamics

Differential equations-based epidemic compartmental models and deep neural networks-based artificial intelligence (AI) models are powerful tools for analyzing and fighting the transmission of COVID-19. However, the capability of compartmental models is limited by the challenges of parameter estimation, while AI models fail to discover the evolutionary pattern of COVID-19 and lack explainability. This paper aims to provide a novel method (called Epi-DNNs) by integrating compartmental models and deep neural networks (DNNs) to model the complex dynamics of COVID-19. In the proposed Epi-DNNs method, the neural network is designed to express the unknown parameters in the compartmental model and the Runge–Kutta method is implemented to solve the ordinary differential equations (ODEs) so as to give the values of the ODEs at a given time. Specifically, the discrepancy between predictions and observations is incorporated into the loss function, then the defined loss is minimized and applied to identify the best-fitted parameters governing the compartmental model. Furthermore, we verify the performance of Epi-DNNs on the real-world reported COVID-19 data on the Omicron epidemic in Shanghai covering February 25 to May 27, 2022. The experimental findings on the synthesized data have revealed its effectiveness in COVID-19 transmission modeling. Moreover, the inferred parameters from the proposed Epi-DNNs method yield a predictive compartmental model, which can serve to forecast future dynamics.


Introduction
Epidemic compartmental models categorize the population into different compartments based on disease status to analyze the transmission dynamics of infectious diseases, serving as a powerful tool for detecting, understanding, and combating outbreaks [1]. In 1927, Kermack and McKendrick constructed the fundamental susceptible-infected-recovered (SIR) compartmental model to study the transmission dynamics of the Black Death in London, the United Kingdom [2]. Since the start of COVID-19, the SIR compartmental model and its variants, such as Susceptible-Exposed-Infectious-Removed (SEIR) [3], Susceptible-Infected-Recovered-Death (SIRD) [4][5][6], Susceptible-Exposed-Infectious-Quarantine-Removed (SEIQR) [7], and Susceptible-Exposed-Infectious-Hospitalized-Removed (SEIHR) [8], have been widely applied. However, parameter estimation for these models faces two limitations. One is that the computational cost of numerical simulations increases exponentially with the complexity of the parameters and models. The other is that conventional parameter estimation methods are only suited for time-constant parameters, which fail to reflect the complex dynamics of the infectious disease over time in real-world scenarios.
Artificial intelligence (AI), especially deep neural network (DNN) models, has also played an important role in analyzing and fighting the transmission of the COVID-19 epidemic [13,14]. Although AI models have great power to fit the data and provide short-term predictions, two main weaknesses hinder their practical application. One is that they cannot find the patterns of the disease transmission process and may produce unreasonable predictions because they ignore biological reality. The other is that they depend heavily on the quality and quantity of data: the model may not be useful if the data do not reasonably capture reality. Compartmental models and AI models have both been shown to be reliable tools in fighting the COVID-19 pandemic, each with its own limitations. Therefore, exploring how to combine compartmental models and AI models to enhance their performance is a promising research topic. Recently, Physics-informed Neural Networks (PINNs) have shown success in embedding differential equations into neural networks so that the networks satisfy the equations while accurately fitting the data [15]. That is, neural networks are used to model nonlinear systems, while prior knowledge such as a system of differential equations reduces the required data and constrains the model's search space. Since then, DNN-based models have been consistently used as non-linear function approximators and have shown strong potential to address various scientific computing tasks in many fields. Additionally, several research efforts have attempted to apply the PINNs framework to modeling and analyzing the dynamics of COVID-19 [16][17][18][19][20].
The concept of PINNs was first proposed for time-dependent partial differential equations (PDEs), providing a flexible computational framework to address various scientific computing tasks. Inspired by PINNs, we propose the Epi-DNNs method to model the complex outbreak dynamics of COVID-19 by integrating real-world data, epidemic transmission laws, and numerical ODE solvers into DNNs. Specifically, we build DNNs to express the unknown parameters in the compartmental model and introduce a numerical ODE solver to solve the corresponding ODEs so as to give the values of the ODEs at a given time. The discrepancy between predictions and observations is formulated as the loss function, which is minimized to identify the best-fitted parameters governing the compartmental model. We verify the effectiveness of the proposed Epi-DNNs method on the COVID-19 reported data in the real world across several regions. The findings of simulation experiments demonstrate that the proposed Epi-DNNs method robustly performs data-driven parameter estimation for COVID-19 transmission modeling. The main contributions of this paper are as follows:
• To efficiently respond to the complexity of infectious disease transmission dynamics in the real world, we propose a method that combines mathematical modeling and neural network modeling. The proposed method treats the coefficients of the epidemic compartmental model as time-varying parameters, which provides an accurate capture of transmission dynamics and reliable predictions.
• We build a separate neural network for each time-varying parameter in the epidemic compartmental model and perform a Fourier transformation of the input data to reduce the inherent stochastic and noisy nature of real-world data. The proposed method offers a feasible way to efficiently estimate time-varying parameters, instead of handling them by dividing the time horizon into different intervals, as conventional parameter estimation methods are limited to.
• We apply the proposed method to real-world reported COVID-19 data to validate its effectiveness. More importantly, the proposed Epi-DNNs approach can be easily adapted to other compartmental models, providing a convenient way to model and analyze the dynamics of infectious disease transmission in any region.
The remainder of this paper is organized as follows. In Section 2, we briefly present related work on COVID-19 transmission modeling. In Section 3, we introduce the Fourier-induced neural networks, the SIRD compartmental model, and the overview of the proposed Epi-DNNs method as well as its implementation details. In Section 4, we present simulation results based on the real-world reported data. In Section 5, we present some discussions and suggestions. Finally, a brief conclusion is given in Section 6.

Related works
The coronavirus disease 2019 (COVID-19) and its related impact have emerged as one of the most complex and threatening public health challenges ever encountered. Given uncertainties in the transmission of COVID-19, and the impact of infections, hospitalizations, and deaths, the infectious disease transmission models have been widely used since the outbreak to answer a number of questions for decision-makers. Modeling approaches for studying infectious disease transmission primarily include compartmental models, statistical models, ensemble models, and individual models.
Compartmental models allow as much complexity in the model as is necessary and can represent non-linear processes and feedback. Li et al. applied a networked dynamic meta-population model and Bayesian inference to infer the proportion of undetected individuals in early COVID-19 infections and analyze their contribution to virus spread [21]. Tian et al. performed a quantitative analysis of the effectiveness of control measures between December 31, 2019 and February 19, 2020, using a data set that included case reports, human movement, and public health interventions [22]. Wei et al. proposed an extended SEIR model to evaluate how the implementation of clinical diagnostic criteria and a universal symptom survey contributed to COVID-19 control in Wuhan, China [23]. Wang et al. used a modeling approach to reconstruct the full-spectrum dynamics of COVID-19 in Wuhan between January 1 and March 8, 2020 across 5 periods defined by events and interventions, and identified the high covertness and high transmissibility of the outbreak [24]. Liu et al. took the conversion rate between asymptomatic infections and reported/unreported symptomatic infections into account and proposed an infectious dynamics model adapted to all-people testing (APT). The model suits densely populated metropolises that use APT for prevention, where it gave more reasonable results and more accurate epidemic predictions [25]. The primary limitation of these compartmental models is that they are subject to assumptions about the transmission process and the parameters.
Statistical models depend heavily on the quality and quantity of historical data used to make the prediction. Ensemble models are a compilation of multiple model outputs, which mitigates the risk of relying on just one model. Individual models incorporate each individual in the population as a separate agent with its own assumptions and parameters. Each of these four categories of model structures has advantages and limitations. AI technologies have been intensively applied to modeling COVID-19, including daily infection prediction, medical imaging, health and clinical records, protein sequences, and drug discovery. AI plays a significant role in controlling the COVID-19 pandemic: the Intelligent Systems and Methods to Combat Covid-19 collection [26] categorizes and summarizes different intelligent systems and methods to prevent further COVID-19 spreading and provides a detailed description of various application scenarios. Chimmula et al. applied long short-term memory (LSTM) networks to model the spread of infectious diseases in Canada and predict the severity of COVID-19 [27]. Jayanthi et al. used the Autoregressive Integrated Moving Average (ARIMA) model, LSTM, Stacked LSTM, and Prophet [28] models to analyze and predict the global cumulative number of confirmed cases, death cases, and recovered cases [29]. Nabi et al. studied four deep learning models, namely LSTM, gated recurrent unit (GRU) networks, convolutional neural networks (CNN), and multivariate CNN, to understand the future dynamics of COVID-19 [30].
Physics-informed machine learning introduces a learning bias by directly embedding prior knowledge to achieve more accurate and robust performance. Recent studies have successfully applied physics-informed machine learning to study the complex outbreak dynamics of COVID-19 by integrating advanced epidemiological models into deep neural networks. Kharazmi et al. analyzed several epidemiological models through the lens of PINNs to identify time-dependent parameters and data-driven fractional differential operators [18]. Long et al. proposed a variant of PINNs to identify the time-varying parameters of the Susceptible-Infectious-Recovered-Deceased model for the spread of COVID-19 by fitting daily reported cases [20]. Nascimento et al. proposed an approach that can implement hybrid models combining physics-informed and data-driven kernels, where the latter are used to reduce the gap between predictions and observations [31]. Cai et al. adopted a Caputo-Hadamard fractional derivative to refine the classical susceptible-exposed-infected-removed model, and then inferred the fractional order and time-dependent parameters as well as the unobserved dynamics of the fractional SEIR model via fractional physics-informed neural networks [32].

Neural network modeling
Deep neural networks. Mathematically, a deep neural network (DNN) defines a mapping of the form
$$\mathcal{F}: x \in \mathbb{R}^{d} \mapsto y = \mathcal{F}(x) \in \mathbb{R}^{c},$$
where $d$ and $c$ are the dimensions of the input and output, respectively. Generally, a standard neural unit of a DNN receives an input $x \in \mathbb{R}^{d}$ and produces an output $y \in \mathbb{R}^{m}$, i.e., $y = \sigma(Wx + b)$ with $W \in \mathbb{R}^{m \times d}$ and $b \in \mathbb{R}^{m}$ being the weight matrix and bias vector, respectively. The function $\sigma(\cdot)$, which is referred to as the activation function, is designed to add element-wise non-linearity to the model.
A DNN with hidden layers can be regarded as a nested composition of sequential standard neural units. For convenience, we denote the output of the DNN by $y(x; \theta)$ with $\theta$ standing for the set of all weights and biases. Specifically, the $l$-th neuron in layer $k$ can be formulated as
$$y_{l}^{[k]} = \sum_{i=1}^{n^{[k-1]}} W_{il}^{[k]}\, \sigma^{[k-1]}\left(y_{i}^{[k-1]}\right) + b_{l}^{[k]}, \tag{1}$$
where $y_{i}^{[k-1]}$ represents the value of the $i$-th neuron in layer $k-1$, $n^{[k-1]}$ represents the number of neurons in layer $k-1$, $\sigma^{[k-1]}$ is the activation function of layer $k-1$, $W_{il}^{[k]}$ is the weight between the $i$-th neuron in layer $k-1$ and the $l$-th neuron in layer $k$, and $b_{l}^{[k]}$ is the bias of the $l$-th neuron in layer $k$.
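The nested composition described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the helper names `dense`, `forward`, and `init_layers`, the layer sizes, and the uniform initialization are all assumptions made for the example.

```python
import math
import random

def dense(x, W, b):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def forward(x, layers, act=math.tanh):
    """Compose neural units layer by layer; the last layer is kept linear."""
    for k, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if k < len(layers) - 1:
            x = [act(v) for v in x]
    return x

def init_layers(sizes, seed=0):
    """Hypothetical initializer: small uniform weights, zero biases."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers

# A scalar input (e.g. time t) mapped to a scalar output through two hidden layers.
layers = init_layers([1, 8, 8, 1])
y = forward([0.3], layers)
```

Here the set of all `(W, b)` pairs plays the role of the parameter set of the network, and `forward` is the mapping from input to output.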
ResNet block. The Residual Network architecture (ResNet) was proposed to solve the problem of vanishing/exploding gradients of deep convolutional neural networks in computer vision tasks [33]. The key idea of ResNet is the skip connection, which provides an alternate shortcut path for the gradient to flow through and enables the model to learn identity functions, guaranteeing that a higher layer performs at least as well as a lower layer. Given these advantages, ResNet methods have been widely used in DNNs for solving PDEs and have shown extraordinary performance in approximating the solution and high-order derivatives of PDEs [34,35]. The architecture of ResNet is given in Eq. (2).
Fourier mapping. The activation function is one of the critical factors in designing the architecture of DNNs. As a non-linear transformation that bounds the value of the input data, it directly affects the performance of DNN models in practical applications. Several different types of activation functions have been used in DNNs, such as $\mathrm{ReLU}(x) = \max\{0, x\}$ and $\tanh(x)$.
A well-proven phenomenon is that DNNs show a spectral bias or frequency preference, that is, DNNs first capture the low-frequency components of the input data [36]. Within this sense of spectral bias and Fourier approximation, a given real function $f(x)$ can be expressed in the following sine and cosine expansions:
$$f(x) \approx \sum_{n=0}^{N} \left[\cos(\Lambda_n x)\, \mathcal{T}(x, \tilde{\theta}_n) + \sin(\Lambda_n x)\, \mathcal{T}(x, \bar{\theta}_n)\right],$$
where $\mathcal{T}(x, \tilde{\theta}_n)$ and $\mathcal{T}(x, \bar{\theta}_n)$ represent DNN modules, and $\{\Lambda_0, \Lambda_1, \Lambda_2, \ldots\}$ are the frequencies of interest in the target function, where $\Lambda_0 = 0$ will always be included. Indeed, recent works have shown that using a Fourier feature mapping as an activation function can remarkably improve the capacity of DNNs [37][38][39][40][41]. Therefore, a novel activation function can be expressed as Eq. (3) based on spectral bias and Fourier approximation. It can mitigate the pathology of spectral bias and enable networks to learn the target function well [37,38]:
$$\sigma(x) = \begin{bmatrix} \cos(\kappa x) \\ \sin(\kappa x) \end{bmatrix}, \tag{3}$$
where $\kappa$ is a user-specified vector (not trainable) which is consistent with the number of neural units in the first hidden layer of the DNN. By performing a Fourier feature mapping of the input data, the input points in $\mathbb{R}^{d}$ can be mapped to the range $[-1, 1]$. After that, the following layers of the neural network can process the feature information in Fourier space efficiently. The neural network architecture part of the Epi-DNNs method is shown in Fig. 2.
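A minimal sketch of the two building blocks discussed above, a Fourier feature mapping and a residual (skip-connection) block, may help make them concrete. It rests on assumptions that the text does not fix: the fixed vector `kappa` is applied element-wise to the pre-activations of the first hidden layer, the residual block has the simple form "input plus activated affine map", and all numerical values are hypothetical.

```python
import math

def first_layer(t, weights, biases):
    """Pre-activations of the first hidden layer for a scalar input t."""
    return [w * t + b for w, b in zip(weights, biases)]

def fourier_features(z, kappa):
    """Map z to [cos(kappa*z), sin(kappa*z)]; kappa is fixed, not trained."""
    cos_part = [math.cos(k * v) for k, v in zip(kappa, z)]
    sin_part = [math.sin(k * v) for k, v in zip(kappa, z)]
    return cos_part + sin_part

def residual_block(x, W, b, act=math.tanh):
    """Skip connection: x + act(W x + b); W must be square."""
    h = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [xi + act(hi) for xi, hi in zip(x, h)]

# Hypothetical first layer with 4 units and frequencies 1..4.
kappa = [1.0, 2.0, 3.0, 4.0]
z = first_layer(0.5, weights=[0.3, -0.2, 0.1, 0.7], biases=[0.0, 0.1, -0.1, 0.2])
features = fourier_features(z, kappa)

# With zero weights the block reduces to the identity, which is the
# behaviour the skip connection is meant to make easy to learn.
n = len(features)
out = residual_block(features, [[0.0] * n for _ in range(n)], [0.0] * n)
```

Every entry of `features` lies in [-1, 1] regardless of the scale of `t`, which is the bounding property mentioned in the text.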

Compartmental models
Compartmental models enable the simulation of multi-state population transitions by incorporating domain knowledge and mathematical assumptions to characterize the dynamics of infectious diseases.
X. Ning et al.

Fig. 2.
Illustration of the representation of Fourier basis in DNNs with ResNet block. $W_{l}^{[1]}$ represents the weight of the $l$-th hidden unit in the first hidden layer.
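Transmission dynamics models of infectious diseases are generally represented as the following non-linear dynamical system:
$$\frac{d\mathbf{u}(t)}{dt} = f(\mathbf{u}(t), t; \lambda), \tag{4}$$
where $\mathbf{u} \in \mathbb{R}^{D}$ (typically $D \gg 1$) is the state variable, $t \in [t_0, T]$ is the time, $\mathbf{u}(t_0)$ is the initial state, and $\lambda$ stands for the parameters of the dynamical system.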
The basic Susceptible-Infected-Recovered-Death (SIRD) model extends the SIR model; it describes the interaction of the virus with the host during transmission and divides the population into 4 types: susceptible, infected, recovered, and deceased. The SIRD model can be described by the following ordinary differential equations:
$$\begin{aligned} \frac{dS(t)}{dt} &= -\frac{\beta}{N} S(t) I(t), \\ \frac{dI(t)}{dt} &= \frac{\beta}{N} S(t) I(t) - \gamma I(t) - \mu I(t), \\ \frac{dR(t)}{dt} &= \gamma I(t), \\ \frac{dD(t)}{dt} &= \mu I(t), \end{aligned} \tag{5}$$
where S(t), I(t), R(t), D(t) denote the number of susceptible, infected, recovered and deceased individuals over time, respectively, and $N = S(t) + I(t) + R(t) + D(t)$ is the total population. $\beta \geq 0$ represents the transmission rate of the disease. $\gamma \geq 0$ represents the recovery rate, which is the proportion of infected individuals that recover from the disease per unit of time. $\mu \geq 0$ is the death rate. The model is initialized at some conventional $t_0 = 0$ with values $S(t_0) = S_0 > 0$, $I(t_0) = I_0 > 0$, $R(t_0) = R_0 \geq 0$, and $D(t_0) = D_0 \geq 0$. $R(t) + D(t)$ denotes the removed individuals, that is, those removed from the susceptible compartment due to death or immunization.
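As an illustration of the SIRD equations, the sketch below evaluates their right-hand side for one state. It assumes the population-normalized incidence term (transmission rate times S times I divided by the total population N); the function name `sird_rhs` and the numerical values are hypothetical.

```python
def sird_rhs(state, beta, gamma, mu):
    """Time derivatives (dS, dI, dR, dD) of the SIRD model.

    Assumes frequency-dependent incidence beta * S * I / N with
    N = S + I + R + D (an assumption of this sketch).
    """
    S, I, R, D = state
    N = S + I + R + D
    new_infections = beta * S * I / N
    dS = -new_infections
    dI = new_infections - gamma * I - mu * I
    dR = gamma * I
    dD = mu * I
    return (dS, dI, dR, dD)

# Hypothetical state and rates, for illustration only.
derivs = sird_rhs((9990.0, 10.0, 0.0, 0.0), beta=0.4, gamma=0.1, mu=0.01)
```

The four derivatives sum to zero, so the total population is conserved, and they all vanish when there are no infected individuals.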
In the basic SIRD model, the three parameters of transmission rate $\beta$, recovery rate $\gamma$, and death rate $\mu$ are considered time-constant. However, the long duration of the pandemic, the associated interventions

Overview of Epi-DNNs
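Here, DNNs with input $t$ are parameterized by a set of parameters $\theta$ as the hypothesis space and implemented to represent the data-driven surrogates of the unknown model parameters. For the given differential Eqs. (4), the task is to find the value of the unknown function $y(t)$ at a given point $t$. The Runge-Kutta method is an effective and widely used numerical method for solving differential equations, and its numerical accuracy is determined by its order. The classical fourth-order Runge-Kutta method is the most commonly used numerical ODE solver. For $y'(t) = f(t, y)$ with step size $h$, it reads
$$\begin{aligned} y_{n+1} &= y_n + \frac{h}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right), \\ k_1 &= f(t_n, y_n), \\ k_2 &= f\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_1\right), \\ k_3 &= f\left(t_n + \tfrac{h}{2},\; y_n + \tfrac{h}{2} k_2\right), \\ k_4 &= f\left(t_n + h,\; y_n + h k_3\right). \end{aligned}$$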
The formula computes the next value $y_{n+1}$ using the current value $y_n$ plus the weighted average of four increments. Then the expression of the time-varying parameters $\beta(t)$, $\gamma(t)$, and $\mu(t)$ for the SIRD model can be obtained by minimizing the loss function of Eq. (8), where the observed data for $S(t)$, $I(t)$, $R(t)$ and $D(t)$ at $t = t_0, t_0 + 1, t_0 + 2, \ldots$ within a given time interval $[t_0, T]$ are denoted as $S_0, S_1, S_2, \ldots$, $I_0, I_1, I_2, \ldots$, $R_0, R_1, R_2, \ldots$ and $D_0, D_1, D_2, \ldots$, respectively. The predictions are produced by the nonlinear mapping of the fourth-order Runge-Kutta method for given input data and parameters. It should be noted that the fourth-order Runge-Kutta method has local and global errors of $O(h^5)$ and $O(h^4)$, respectively, with $h$ being the step size. In addition, we introduce four positive relaxing factors to balance the contributions of the four squared-error terms (for $S$, $I$, $R$, and $D$) and the regularization sum of the network parameters in the loss function.
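The following sketch shows how a Runge-Kutta step and a data-misfit loss of this kind can fit together. It is an assumption-laden illustration rather than the authors' loss: the parameters are held constant within each daily step, the incidence term is normalized by the total population, the regularization term is omitted, and the names `rk4_step`, `one_step_loss`, and `weights` are hypothetical.

```python
def sird_rhs(y, beta, gamma, mu):
    """SIRD right-hand side with incidence beta * S * I / N (assumed form)."""
    S, I, R, D = y
    N = S + I + R + D
    new_inf = beta * S * I / N
    return [-new_inf, new_inf - (gamma + mu) * I, gamma * I, mu * I]

def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * h * k for yi, k in zip(y, k1)])
    k3 = f([yi + 0.5 * h * k for yi, k in zip(y, k2)])
    k4 = f([yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def one_step_loss(obs, params, weights=(1.0, 1.0, 1.0, 1.0), h=1.0):
    """Weighted mean squared error between RK4 one-step predictions
    started from obs[i] and the next observation obs[i + 1]."""
    total = 0.0
    for i in range(len(obs) - 1):
        beta, gamma, mu = params[i]
        pred = rk4_step(lambda y: sird_rhs(y, beta, gamma, mu), obs[i], h)
        total += sum(w * (p - o) ** 2
                     for w, p, o in zip(weights, pred, obs[i + 1]))
    return total / (len(obs) - 1)

# Synthetic "observations" generated with known parameters (illustration only).
true_params = [(0.4, 0.1, 0.01)] * 5
obs = [[9990.0, 10.0, 0.0, 0.0]]
for beta, gamma, mu in true_params:
    obs.append(rk4_step(lambda y: sird_rhs(y, beta, gamma, mu), obs[-1], 1.0))

loss_true = one_step_loss(obs, true_params)            # parameters that generated the data
loss_wrong = one_step_loss(obs, [(0.2, 0.1, 0.01)] * 5)  # mis-specified transmission rate
```

In the full method the triples in `params` would come from the neural networks evaluated at each time point, and the loss would be minimized with respect to the network weights.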
To obtain the ideal $\theta_\beta^{*}$, $\theta_\gamma^{*}$ and $\theta_\mu^{*}$, optimization methods such as gradient descent (GD) or stochastic gradient descent (SGD) are required to update the parameters of the DNNs during training. In this context, the SGD is given by:
$$\theta^{k+1} = \theta^{k} - \alpha^{k}\, \nabla_{\theta^{k}} \mathcal{L}(\theta^{k}),$$
where the learning rate $\alpha^{k}$ decreases with increasing $k$ and $\theta = \{\theta_\beta, \theta_\gamma, \theta_\mu\}$. Algorithm 1 describes the workflow of the proposed Epi-DNNs method for solving nonlinear dynamical Eqs. (5).

Experimental setting
Data. The simulation is based on the real-world COVID-19 data announced on the official website of the Shanghai Municipal Health Commission. The related data set includes exhaustive information on the time series $I(t)$, $R(t)$, and $D(t)$ covering February 25 to May 27, 2022. Here $I(t)$ refers to the sum of confirmed and asymptomatic cases (asymptomatic infections are assumed to transform into recovered status after 6 days [42]). These time series are smoothed with a 7-day moving average to reduce the errors in the data. Key events and corresponding dates are taken into account to better understand the evolution of the COVID-19 transmission in Shanghai, which are listed below:
• February 25, 2022: First asymptomatic case.
Framework. Each neural network implemented in this paper comprises 5 layers, where the weight matrices and bias vectors of the layers are respectively $W_1 \in \mathbb{R}^{1 \times 35}$, $W_2 \in \mathbb{R}^{35 \times 50}$, $W_3 \in \mathbb{R}^{50 \times 30}$, $W_4 \in \mathbb{R}^{30 \times 30}$, $W_5 \in \mathbb{R}^{30 \times 20}$ and $b_1 \in \mathbb{R}^{35}$, $b_2 \in \mathbb{R}^{50}$, $b_3 \in \mathbb{R}^{30}$, $b_4 \in \mathbb{R}^{30}$, $b_5 \in \mathbb{R}^{20}$. In this numerical experiment, all neural networks are trained by the Adam optimizer, where the initial learning rate is $2 \times 10^{-3}$ with a decay rate of 95% for every 1000 epochs. In addition, the regularization factor is set as 0.0005, and the maximum number of epochs is set as 100 000.
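Two of the settings above, the 7-day moving average and the stepwise learning-rate decay, can be sketched as follows. Both functions are illustrative assumptions: the text does not say whether the moving average is trailing or centered (a trailing window is used here), and "a decay rate of 95% for every 1000 epochs" is interpreted as multiplying the learning rate by 0.95 after each block of 1000 epochs.

```python
def moving_average(series, window=7):
    """Trailing moving average; early points average whatever is available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def learning_rate(epoch, lr0=2e-3, decay=0.95, every=1000):
    """Step decay: lr0 * decay ** (number of completed 1000-epoch blocks)."""
    return lr0 * decay ** (epoch // every)

# Hypothetical daily counts with reporting noise.
daily = [10, 14, 9, 20, 18, 25, 30, 28, 40, 35]
smooth = moving_average(daily)
lr_start, lr_late = learning_rate(0), learning_rate(50_000)
```

Under this interpretation the learning rate falls from 2e-3 to roughly 1.5e-4 after 50 000 epochs.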

Result analyses
Data fitting. We observe in Fig. 3 that the value of the loss function decreases throughout the training process and gradually stabilizes at a small value. According to Eq. (8), the loss function measures the differences between the real-world data and the predicted data; therefore, a small final loss value demonstrates a good fit between the data and the model.

Inferences. We are interested in inferring the parameters $\beta$, $\gamma$, and $\mu$ by solving the inverse problem of the SIRD model. Fig. 4 shows that the time behavior of the fitted parameters is consistent with their expected dynamics. Owing to the high transmissibility and immune escape properties of the Omicron variant, the infections increased sharply following the first case. The authorities of Shanghai started imposing the closure of some public places on March 2, followed by a series of interventions to combat the outbreak of Omicron. These interventions achieved a certain success, as demonstrated by a significant reduction in the transmission rate and $\mathcal{R}(t)$. However, the outbreak was not under control ($\mathcal{R}(t) > 1$). On March 16, grid precise management was implemented but with limited effectiveness until March 26. As can be seen, the transmission rate oscillates, with $\mathcal{R}(t)$ consistently greater than 1. Only after the lockdown of eastern Shanghai on March 28 and the lockdown of western Shanghai on April 1 did the transmission rate $\beta(t)$ and the effective reproduction number $\mathcal{R}(t)$ show a continuous decreasing trend, with $\mathcal{R}(t)$ gradually approaching 0, and the outbreak was curbed. On the other hand, the recovery rate and the death rate are expected to increase and decrease, respectively, thanks to the use of more effective treatments for the disease.
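To make the link between the inferred rates and the reproduction number concrete, the sketch below uses one common definition for an SIRD model: the transmission rate times the susceptible fraction, divided by the total removal rate (recovery plus death). Whether this matches the exact definition used for Fig. 4 is an assumption, and the parameter values are hypothetical.

```python
def effective_reproduction_number(beta, gamma, mu, S, N):
    """R(t) = beta * S / ((gamma + mu) * N), a common SIRD definition
    (assumed here, not taken from the text)."""
    return beta * S / ((gamma + mu) * N)

# Hypothetical trajectory: transmission rate falling as interventions tighten.
N = 1_000_000.0
S = 990_000.0
betas = [0.60, 0.35, 0.12, 0.03]
gamma, mu = 0.10, 0.001
rt = [effective_reproduction_number(b, gamma, mu, S, N) for b in betas]
```

With these illustrative values the reproduction number starts well above 1 and ends below 1, mirroring the qualitative pattern described for the period before and after the lockdowns.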
Forecasting. The non-linear ODE system requires determined initial conditions and model parameters to make predictions. As the initial conditions can be obtained from the training data and the model parameters are already calibrated, we can forecast the epidemic dynamics by solving the forward problem. In the prediction part, the values of $\beta(t)$, $\gamma(t)$ and $\mu(t)$ are assumed to be their final values in the training time window. Fig. 5 depicts the data fitting and prediction obtained by using the identified time-varying model with the parameters given above. The perfect match between the predictions and the observations demonstrates that the parameters inferred by the learned network are very plausible, and it also illustrates the generalization ability of the model.
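The forecasting step, freezing the parameters at their last trained values and integrating forward, can be sketched as below. This is a simplified illustration under stated assumptions: a population-normalized SIRD incidence term, a daily step size, and hypothetical initial values and rates; the names `rk4_step` and `forecast` are not from the paper.

```python
def sird_rhs(y, beta, gamma, mu):
    """SIRD right-hand side with incidence beta * S * I / N (assumed form)."""
    S, I, R, D = y
    N = S + I + R + D
    new_inf = beta * S * I / N
    return [-new_inf, new_inf - (gamma + mu) * I, gamma * I, mu * I]

def rk4_step(y, h, beta, gamma, mu):
    """One classical RK4 step of the SIRD system with fixed parameters."""
    f = lambda s: sird_rhs(s, beta, gamma, mu)
    k1 = f(y)
    k2 = f([yi + 0.5 * h * k for yi, k in zip(y, k1)])
    k3 = f([yi + 0.5 * h * k for yi, k in zip(y, k2)])
    k4 = f([yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def forecast(y_last, beta, gamma, mu, days, h=1.0):
    """Integrate forward from the last training state with frozen parameters."""
    trajectory = [list(y_last)]
    for _ in range(days):
        trajectory.append(rk4_step(trajectory[-1], h, beta, gamma, mu))
    return trajectory

# Hypothetical last training state and final parameter values.
y_last = [900_000.0, 5_000.0, 94_000.0, 1_000.0]
traj = forecast(y_last, beta=0.05, gamma=0.12, mu=0.001, days=7)
```

Each entry of `traj` is one forecast day, so a 3-day, 5-day, or 7-day horizon corresponds to reading off the first 3, 5, or 7 steps.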

Evaluation metrics
By comparing forecasting results and observations, the performance of the proposed Epi-DNNs can be evaluated. We use four evaluation metrics to make fair and effective comparisons: mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and relative error (REL). Their corresponding equations are shown in Eqs. (9)-(12), respectively.
To test the performance of the proposed Epi-DNNs method in prediction, we conducted 3-day, 5-day, and 7-day forecasting experiments. The experimental results presented in Table 1 show that the proposed Epi-DNNs method forecasts with high accuracy.
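For reference, the four metrics can be computed as in the sketch below. MAE, MAPE, and RMSE follow their standard definitions (MAPE is returned as a fraction rather than a percentage); the form of REL used here, the sum of squared errors divided by the sum of squared observations, is an assumption, since the defining equations are not reproduced in this text. The sample numbers are hypothetical.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def mape(obs, pred):
    """Mean absolute percentage error as a fraction; obs must be non-zero."""
    return sum(abs((p - o) / o) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rel(obs, pred):
    """Relative error: sum of squared errors over sum of squared observations
    (assumed definition)."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / sum(o ** 2 for o in obs)

# Hypothetical 3-day forecast against observations.
observed = [1200.0, 1100.0, 1000.0]
predicted = [1180.0, 1120.0, 990.0]
scores = {name: fn(observed, predicted)
          for name, fn in [("MAE", mae), ("MAPE", mape), ("RMSE", rmse), ("REL", rel)]}
```

All four metrics are zero for a perfect forecast and grow as the predictions depart from the observations.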

Discussion
The COVID-19 pandemic has severely affected the lives of most people around the world and has caused a great loss of life. COVID-19 has reshaped the focus of global scientific attention and efforts, and researchers across the world have done much work to analyze the dynamics of COVID-19. Among these efforts, combining mathematical modeling and emerging AI technology to capture the complex outbreak dynamics of COVID-19 is a promising research topic. In this paper, we proposed the Epi-DNNs method, which combines deep learning with the compartmental model to model the real-time dynamics of COVID-19. Experimental results demonstrate that the time-varying parameters of the compartmental model identified by the proposed Epi-DNNs method are consistent with expectations. The transmission rate determines the dynamics of the epidemic, and the time-varying $\beta(t)$ estimated by the proposed Epi-DNNs method can accurately capture the changes in government interventions and individual behaviors. The recovery rate $\gamma(t)$ and the death rate $\mu(t)$ are expected to increase and decrease, respectively, thanks to more effective treatments for the disease. The $\gamma(t)$ and $\mu(t)$ identified by our proposed Epi-DNNs method also fit well with the improved capacity of the healthcare system to fight against COVID-19. The effective reproduction number $\mathcal{R}(t)$ characterizes the transmission process of the virus; it represents the average number of susceptible individuals that one infected individual infects over the infectious period. Chen et al. divided the epidemic into three phases to describe the epidemiological characteristics and spatiotemporal transmission dynamics of the Omicron outbreak in Shanghai and estimated the dynamics of $\mathcal{R}(t)$ [43]. Lou et al. constructed an extended compartmental model to retrospectively analyze the epidemic in Shanghai from 26 February 2022 to 31 May 2022 across four periods defined by related interventions and estimated $\mathcal{R}(t)$. As shown in Figure, the value of $\mathcal{R}(t)$ estimated by the proposed Epi-DNNs method is consistent with those given by other researchers [44]. More importantly, when the estimated parameters are applied to the compartmental model to depict the dynamics of COVID-19, the close agreement between model predictions and observed data also underscores that the parameters fit the data well.
For different research scenarios, compartmental models need to be divided into different compartments, for example by distinguishing asymptomatic and symptomatic cases, adding virus mutations, or adding a vaccination campaign. The proposed Epi-DNNs method is easy to implement without any background knowledge of numerical analysis (for example, stability conditions). To apply the Epi-DNNs method to other compartmental models, practitioners only need to redefine the transformation matrix for each compartment according to the equations and build DNNs with the help of libraries that implement deep neural networks. Therefore, the proposed Epi-DNNs method is applicable to parameter estimation for other compartmental models, other areas around the world, and future infectious diseases. Although the proposed Epi-DNNs method provides an important modeling approach for infectious disease transmission, it also has some limitations. Because it is impossible to build a state-of-the-art compartmental model that can represent all the scenarios of COVID-19, we selected only the SIRD model as an example to test the performance of Epi-DNNs. In addition, the data we use are official statistics, which will differ somewhat from the actual situation in the real world. Furthermore, the proposed Epi-DNNs method employs fully connected networks to build the model, which may not be the optimal approach. In future work, we will try to combine other neural networks, such as RNNs and LSTMs, with the compartmental model for COVID-19 modeling.

Conclusion
In this paper, we proposed a novel Epi-DNNs method to identify the time-varying parameters of the epidemic compartmental model so as to accurately depict the dynamics of COVID-19. Incorporating domain knowledge, mathematical modeling, and AI techniques, we analyze the dynamics of COVID-19 using the compartmental model and identify its parameters with neural networks and real-world observations. Experimental results revealed that the proposed Epi-DNNs method calibrates the parameters of the compartmental model accurately and effectively. Based on the estimated parameters, reliable predictions are performed to validate the feasibility and predictive ability of the proposed idea for modeling the dynamics of COVID-19. We emphasize that our method can easily be implemented without any background knowledge of numerical analysis (for example, stability conditions); it only requires familiarity with libraries for implementing neural networks. Therefore, the proposed Epi-DNNs method is applicable to parameter estimation for other compartmental models, other areas around the world, and future infectious diseases.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.