A mathematical model for the novel coronavirus epidemic in Wuhan, China

We propose a mathematical model to investigate the current outbreak of the coronavirus disease 2019 (COVID-19) in Wuhan, China. Our model describes the multiple transmission pathways in the infection dynamics, and emphasizes the role of the environmental reservoir in the transmission and spread of this disease. Our model also employs non-constant transmission rates which change with the epidemiological status and environmental conditions and which reflect the impact of the on-going disease control measures. We conduct a detailed analysis of this model, and demonstrate its application using publicly reported data. Among other findings, our analytical and numerical results indicate that the coronavirus infection would remain endemic, which necessitates long-term disease prevention and intervention programs.


Introduction
A severe outbreak of respiratory illness started in Wuhan, a city of 11 million people in central China, in December 2019. The causative agent is the novel coronavirus which was identified and isolated from a single patient in early January and subsequently verified in 16 additional patients [1]. The virus is believed to have a zoonotic origin. In particular, the Huanan Seafood Market, a live animal and seafood wholesale market in Wuhan, was regarded as a primary source of this epidemic, as it is found that 55% of the first 425 confirmed cases were linked to the marketplace [2]. Meanwhile, recent comparisons of the genetic sequences of this virus and bat coronaviruses show a 96% similarity [3]. This is the third zoonotic human coronavirus emerging in the current century, after the severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002 that spread to 37 countries and the Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 that spread to 27 countries. Typical symptoms of COVID-19 infection include dry cough, fever, fatigue, A number of modeling studies have already been performed for the COVID-19 epidemic. Wu et al. [11] introduced a susceptible-exposed-infectious-recovered (SEIR) model to describe the transmission dynamics, and forecasted the national and global spread of the disease, based on reported data from December 31, 2019 to January 28, 2020. They also estimated that the basic reproductive number for COVID-19 was about 2.68. Read et al. [12] reported a value of 3.1 for the basic reproductive number based on data fitting of a SEIR model, using an assumption of Poisson-distributed daily time increments. Tang et al. [13] proposed a deterministic compartmental model incorporating the clinical progression of the disease, the individual epidemiological status, and the intervention measures. They found that the control reproductive number could be as high as 6.47, and that intervention strategies such as intensive contact tracing followed by quarantine and isolation can effectively reduce the control reproduction number and the transmission risk. Imai et al. [14] conducted computational modeling of potential epidemic trajectories to estimate the size of the disease outbreak in Wuhan, with a focus on the human-to-human transmission. Their results imply that control measures need to block well over 60% of transmission to be effective in containing the outbreak. In addition, Gao et al. [15] developed a deep learning algorithm to analyze the infectivity of the novel coronavirus and predict its potential hosts. Their results indicate that bats and minks may be two animal hosts of this virus.
Most of these models have emphasized the significant role of the direct, human-to-human transmission pathway in this epidemic [16], as highlighted by the facts that the majority of the infected individuals did not have any contact with the marketplaces in Wuhan, that the number of infections has been rapidly increasing, and that the disease has spread to all provinces in China as well as more than 20 other countries. In particular, a large number of infected individuals exhibit a relatively long incubation period so that they do not show any symptoms and are unaware of their infection for as long as 10-14 days, during which time they can easily transmit the disease to other people through direct contact.
On the other hand, the models published thus far have not taken into account the role of the environment in the transmission of COVID-19. For example, it is reported that environmental samples taken from the areas of the Huanan Seafood Market have come back positive for the novel coronavirus [4], suggesting that the pathogen may be transmitted through the environmental reservoir. When infected individuals cough or sneeze, they may spread the virus to the environment through their respiratory droplets which then may go on to infect other people with close contact of the same area. Such transmission would especially be facilitated during the early period of the disease outbreak when the general public was not aware of the infection risk, infected individuals were not isolated, and most people did not wear face masks. Even worse, there is a possibility that the virus may survive in the environment for several days, increasing the risk of contamination via surfaces and fomites [17,18]. Such environmental survival was confirmed for SARS-CoV [19]. A most recent study, based on the review of 22 types of coronaviruses, reveals that coronaviruses such as SARS-CoV, MERS-CoV and endemic human coronaviruses can persist on inanimate surfaces like metal, glass or plastic for up to 9 days [20], providing strong evidences for the pathogen's environmental survival. Additionally, the novel coronavirus has been found in the stool of some infected individuals [5], which may contaminate the aquatic environment, and fecal-oral contact remains a possible route of transmission for this disease.
In the present paper, we present a new mathematical model for COVID-19 that incorporates multiple transmission pathways, including both the environment-to-human and human-tohuman routes. In particular, we introduce an environmental compartment that represents the pathogen concentration in the environmental reservoir. A susceptible individual may contract the disease through the interaction with the contaminated environment, with an infectious but asymptomatic individual, or with an infectious and symptomatic individual. Meanwhile, the transmission rates in our model depend on the epidemiological status and environmental conditions which change with time. In particular, when the infection level is high, people would be motivated to take necessary action to reduce the contact with the infected individuals and contaminated environment so as to protect themselves and their families, leading to a reduction of the average transmission rates. Such varied transmission rates also reflect the strong disease control measures that the Chinese government has implemented, including large-scale quarantine, intensive tracking of movement and contact, strict isolation, extending the Lunar New Year holiday, and advising the public to stay home and avoid spreading infection.
The remainder of this paper is organized as follows. In Section 2, we present our model and assumptions, and conduct a detailed mathematical analysis. In Section 3, we conduct numerical simulation by incorporating the infection data reported for the city of Wuhan. In Section 4, we conclude the paper with some discussion.

Formulation
We divide the total human population into four compartments: the susceptible (denoted by S), the exposed (denoted by E), the infected (denoted by I), and the recovered (denoted by R). Individuals in the infected class have fully developed disease symptoms and can infect other people. Individuals in the exposed class are in the incubation period; they do not show symptoms but are still capable of infecting others. Thus, another interpretation of the E and I compartments in our model is that they contain asymptomatic infected and symptomatic infected individuals, respectively.
We introduce the following model to describe the transmission dynamics of the COVID-19 epidemic: dV dt = ξ 1 E + ξ 2 I − σV , (2.1) where V is the concentration of the coronavirus in the environmental reservoir. The parameter Λ represents the population influx, μ is the natural death rate of human hosts, α −1 is the incubation period between the infection and the onset of symptoms, w is the diseaseinduced death rate, γ is the rate of recovery from infection, ξ 1 and ξ 2 are the respective rates of the exposed and infected individuals contributing the coronavirus to the environmental reservoir, and σ is the removal rate of the virus from the environment. The functions β E (E) and β I (I) represent the direct, human-to-human transmission rates between the exposed and susceptible individuals, and between the infected and susceptible individuals, respectively, and the function β V (V) represents the indirect, environment-to-human transmission rate. We assume that β E (E), β I (I) and β V (V) are all non-increasing functions, given that higher values of E, I and V would motivate stronger control measures that could reduce the transmission rates. Specifically, we make the following assumptions: Apparently, system (2.1) has a unique disease-free equilibrium (DFE) at The infection components in this model are E, I, and V. The new infection matrix F and the transition matrix V are given by where w 1 = w + γ + μ. The basic reproduction number of model (2.1) is then defined as the spectral radius of the next generation matrix FV −1 [21]; i.e., which provides a quantification of the disease risk. The first two parts ℛ 1 and ℛ 2 measure the contributions from the human-to-human transmission routes (exposed-to-susceptible and infected-to-susceptible, respectively), and the third part ℛ 3 represents the contribution from the environment-to-human transmission route. These three transmission modes collectively shape the overall infection risk for the COVID-19 outbreak.

Equilibrium analysis
We now analyze the equilibria of the system (2.1) which will provide essential information regarding the long-term dynamics of the disease. Let (S, E, I, R, V) be an equilibrium of model (2.1) and thereby satisfy the following equations It follows from the first two equations of (2.6) that S can be denoted by a function of I, namely, Yang and Wang Page 5 Math Biosci Eng. Author manuscript; available in PMC 2020 July 23.
Meanwhile, in view of the second equation of (2.5) and Eqs (2.6), we obtain Let us now consider curves S = ϕ(I), I ≥ 0 and S = ψ(I), I ≥ 0. In particular, the intersections of these two curves in ℝ + 2 determine the non-DFE equilibria. Clearly, ϕ(I) is strictly decreasing, whereas ψ(I) is increasing since β E , and Thus, we conclude:
Therefore, by Eq (2.6), we find that the model (2.1) admits a unique equilibrium, the DFE X 0 , if ℛ 0 ≤ 1; and it admits two equilibria, the DFE X 0 and the EE X * , if ℛ 0 > 1.
In what follows, we perform a study on the global stability of the DFE. By a simple comparison principle, we find that 0 ≤ S + E + I + R ≤ S 0 and 0 ≤ V ≤ ξ 1 + ξ 2 S 0 σ . Thus, it leads to a biologically feasible domain Theorem 2.1. The following statements hold for the model (2.1).
Proof. Let X = (E, I, V) T . One can verify that where the matrices F and V are given in Eq (2.3). By manipulating some algebraic computaion, we let u = (β E (0), β I (0), β V (0)). It then follows from the fact Differentiating ℒ along the solutions of (2.1), we have If ℛ 0 < 1, the equality If ℛ 0 = 1, then the equality It is easy to see that Yang and Wang Page 7 Math Biosci Eng. Author manuscript; available in PMC 2020 July 23.
In contrast, if ℛ 0 > 1, then it follows from the continuity of the vector fields that dℒ 0 dt > 0 in a neighborhood of the DFE in Ω°. Thus the DFE is unstable by the Lyapunov stability theory. The last part of the theorem can be proved by the persistent theory [23] which is similar to the proof of Theorem 2.5 in Gao and Ruan [24]. □ In addition, we have conducted an analysis on the global asymptotic stability of the endemic equilibrium [25,26], and the details are presented in the following theorem. Essentially, these stability results establish ℛ 0 = 1 as a forward transcritical bifurcation point, or, a sharp threshold for disease dynamics, and indicate that reducing ℛ 0 to values at or below unity will be sufficient to eradicate the disease. In other words, our model (2.1) exhibits regular threshold dynamics. In order to simplify our notations, we will adopt the abbreviations Hence, The last inequality follows from the assumptions that β P (P) and β P (P)P, where P can represent E, I, or V, are non-increasing and non-decreasing functions of P, respectively. This implies 1 − β p * β P ≤ 0 P * ≤ P β P P β p *P * − 1 ≥ 0.
Similarly, one can verify that  Therefore, X * is globally asymptotically stable in Ω°. □

Numerical results
We now apply our model to study the COVID-19 epidemic in the city of Wuhan. We use the outbreak data published daily by WHO and other sources [7,[27][28][29][30]. These data sets contain the daily reported new cases, cumulative cases, and disease-caused deaths for the city of Wuhan, as well as each province in China and all other countries that have reported COVID-19 infection.
To conduct the numerical simulation, we consider the following functions for the three transmission rates in our model: We implement our model and conduct numerical simulation for an epidemic period starting from January 23, 2020, when the city of Wuhan was placed in quarantine, to February 10, 2020. According to the estimate made by the Chinese government, about 9 million people remain in Wuhan after January 23 and they are not allowed to move out of the city. Meanwhile, only a relatively small number of people (mainly public health professionals) travel into the city since its lockdown. Thus, the influx rate Λ in our model is only based on newborns in Wuhan. The values of the transmission constants β E0 and β I0 can be found in a recent study [13]. The incubation period of the infection ranges between 2-14 days, with a mean of 5-7 days [31], and we take the value of 7 days in our model. The average recovery period is about 15 days [31], and so we set the disease recovery rate as γ = 1/15 per day.
Members of the coronavirus family can survive in the environment from a few hours to several days [19], and we take the value of 1 day which results in a virus removal rate σ = 1 per day. Additionally, since the Chinese government has been implementing a very strict isolation policy and intense medical care for all the confirmed cases, represented by I in our model, the chance of those infected individuals spreading the coronavirus to the environment connected with the general public is very low, and so we assume the virus shedding rate from the infected individuals is zero; i.e., ξ 2 = 0. Note that our results in Theorem 2.1 and Theorem 2.2 still hold in this case since the contribution of the coronavirus to the environmental reservoir remains a positive number w 1 ξ 1 . These and other parameters, their values and sources are provided in Table 1. There are three parameters, however, that remain to be determined: the environment-to-human transmission constant β V0 , the transmission adjustment coefficient c, and the virus shedding rate ξ 1 by the exposed individuals. The values of these parameters are not available in the literature because the models published thus far have not considered the environmental component for the COVID-19 infection, and they have generally applied constant transmission rates which remain fixed in time.
To estimate the values for these three parameters, similar to [32], we fit our model to the daily reported infection data for Wuhan from January 23 to February 10 by using the standard least squares method. Based on reported data, the initial condition is set as (S(0), E(0), I(0), R(0), V(0)) = (89985051000, 475, 10, 10000) [33]. Figure 1 shows the numbers of cumulative confirmed cases in Wuhan during this period versus our fitting curve. The parameter values and their 95% confidence intervals are presented in Table 2. The normalized mean square error (NMSE) for the data fitting is found as 0.0058.
Based on the parameter values from data fitting, we are able to evaluate the basic reproduction number ℛ 0 = 4.25. Specifically, we find that ℛ 1 = 1.959, ℛ 2 = 0.789, ℛ 3 = 1.497, which quantify the infection risk from each of the three transmission routes. Among these three components, the largest one ℛ 1 comes from the exposed-to-susceptible transmission, since exposed individuals show no symptoms and can easily spread the infection to other people with close contact, often in an unconscious manner. Meanwhile, the smallest component ℛ 2 comes from the infected-to-susceptible transmission, possibly due to the strict isolation policy on the symptomatic infected individuals. In addition, we observe that ℛ 3 = 1.497, showing a significant contribution from the environmental reservoir toward the overall infection risk. Figure 2 displays a short-term prediction for I (the infected individuals) and E (the exposed individuals) in Wuhan using our model. It shows that the infection level, starting from January 23 (marked as day 0 in our simulation), would continue increasing for about 80 days, reach a peak value around 45,000 infections, and then gradually go down afterwards. Meanwhile, the long-term behavior of the epidemic would be determined by the property of the endemic equlibrium of the system, which is found as X * = (2583683, 1353, 2735, 6528015, 3111). A phase portrait of I vs. E is provided in Figure 3, where all the solution orbits converge to the endemic equlibrium, illustrating its global asymptotical stability that is stated in Theorem 2.2.
In addition, we have performed a numerical test using simple, constant transmission rates in our model: β E (E) = β E0 , β I (I) = β I0 , β V (V ) = β V 0 , (3.2) equivalent to setting c = 0 in Eq (3.1). This leaves two parameters, β V0 and ξ 1 , to be estimated by data fitting. Using the same set of data, we find that ξ 1 ≈ 4.28 with the 95% confidence interval (0, 15.611), and β V0 ≈ 4.91 × 10 −10 with the 95% confidence interval (4.218 × 10 −10 , 5.603 × 10 −10 ). The normalized mean square error (NMSE) for the data fitting is 0.0266, larger than that in the previous scenario, 0.0058. Meanwhile, Figure 4 shows a prediction of the Wuhan coronavirus outbreak size in this setting. Compared to Figure 2, We now clearly observe a significantly higher level of infection; particularly, the peak value appears at 2.8 × 10 6 , which is extremely large and clearly unrealistic. The result demonstrates that using fixed transmission rates, which do not take into account the strong disease control measures currently on-going in Wuhan, may overestimate the epidemic severity and generate misguided information.

Discussion
We have proposed a mathematical model to investigate the on-going novel coronavirus epidemic in Wuhan, China. There are two unique features in our model: (1) the incorporation of an environmental reservoir into the disease transmission dynamics, and (2) the use of non-constant transmission rates which change with the epidemiological status and environmental conditions and which reflect the impact of the disease control measures implemented in Wuhan. We have conducted a detailed analysis of this model, and applied it to study the Wuhan epidemic using publicly reported data.
The basic reproduction number ℛ 0 of this model consists of three parts, representing the three different transmission routes; i.e., from the exposed individuals, the infected individuals, and the environmental reservoir, to the susceptible individuals. These three transmission modes collectively shape the overall disease risk of this epidemic, suggesting that intervention strategies should target all these three transmission routes. Our equilibrium analysis of this model shows that the disease dynamics exhibit a regular threshold at ℛ 0 = 1.
We have established the global asymptotic stability of the disease-free equilibrium when ℛ 0 < 1, and the global asymptotic stability of the endemic equilibrium when ℛ 0 > 1.
Our numerical simulation results demonstrate the application of our model to the COVID-19 outbreak in Wuhan. Our model can fit the reported data well. Through data fitting, we obtain an estimate of basic reproduction number, ℛ 0 = 4.25. In particular, we find that the contribution of the environmental reservoir (measured by ℛ 3 ) is significant in shaping the overall disease risk. Our model predicts the appearance of an epidemic peak, after which the infection level would decrease and approach an endemic state in the long run. We also find that if we use constant transmission rates instead, the model would predict a much higher and unrealistic epidemic peak. This is caused by the fixed transmission rates that do not reflect the impact of on-going disease control measures. It is an indication that using epidemiologically and environmentally dependent transmission rates can potentially generate more practical simulation results.
At present, many aspects regarding the pathology, ecology and epidemiology of the novel coronavirus remain unknown, which adds challenges to the mathematical modeling.
Particularly, in our current model, we have employed a bilinear incidence rate based on the law of mass action to represent the environment-to-human transmission route [34]. Practically, though, a saturation based incidence rate might better characterize the environmental pathogen, and we hope to investigate it in our future modeling efforts.
Meanwhile, the transmission rates β E and β V in our model depend on E and V, respectively, while in reality the exposed population E and the environmental pathogen concentration V may be unknown. To better quantify these transmission rates, we could instead assume that they are functions of I; i.e., β E (I) and β V (I), since the infected population I can be easily calculated from the reported data. Nevertheless, it is reasonable to assume that E and V are positively correlated to I, and so the qualitative properties of these transmission functions would remain the same under both formulations.
Given the current development of COVID-19, it is widely speculated that this disease would persist in the human world and become endemic. Our mathematical analysis and numerical simulation results support this speculation. The findings in this study imply that we should be prepared to fight the coronavirus infection for a much longer term than that of the current epidemic wave, in order to reduce the endemic burden and potentially eradicate the disease eventually. Among other intervention strategies, new vaccines for the novel coronavirus, which are currently in research and development, could play an important role in achieving that goal.
We emphasize that our data fitting is based on the reported confirmed cases in Wuhan from January 23 to February 10 in 2020. These confirmed cases were determined by the method of nucleic acid testing kits. On February 12, 2020, the national health commission in China started including cases confirmed by another method; i.e, clinical diagnosis, which refers to using CT imaging scans to diagnose patients. This change of criteria led to a surge of confirmed cases on February 12 (with an increase of about 14,000 new cases for Wuhan in a single day), and our current study does not take into account this factor. In this regard, our prediction of the epidemic duration and size should be interpreted as applicable only to the confirmed cases based on the previous, more strict, testing method. The issues regarding the accuracy, reliability and standard of reported data are complex and are beyond the scope of this work, which is more oriented on the mathematical modeling side. We plan to address the new development of the outbreak data in another piece of work in the near future. We also plan to expand our modeling efforts to the province and country levels beyond the epicenter, the city of Wuhan, and study the spread of the novel coronavirus in larger spatial scales. Cumulative confirmed cases for the city of Wuhan from January 23, 2020 to February 10, 2020. Circles (in blue) denote the reported cases and solid line (in red) denotes the simulation result. The basic reproduction number is ℛ 0 = 4.25 based on the parameters from  A simulation result for the outbreak size in Wuhan using the transmission rates formulated in Eq (3.1), the parameters from Table 1, and the result of data fitting.  A simulation result for the outbreak size in Wuhan using the constant transmission rates given in Eq (3.2), the parameters from Table 1, and the result of data fitting. Yang  Definitions and values of model parameters.

Parameter Definition Estimated mean value Source
Λ Influx rate 271.23 per day [30] β E0 Transmission constant between S and E 3.11 × 10 −8 /person/day [13] β I0 Transmission constant between S and I 0.62 × 10 −8 /person/day [13] β V0 Transmission constant between S and V fitting by data c Transmission adjustment coefficient fitting by data μ Natural death rate 3.01 × 10 −5 per day [30] 1/α Incubation period 7 days [31] w Disease-induced death rate 0.01 per day [30] γ Recovery rate 1/15 per day [31] σ Removal rate of virus 1 per day [19] ξ 1 Virus shedding rate by exposed people fitting by data - Virus shedding rate by infected people 0 per person per day per ml -Page 21