Transmission rates and environmental reservoirs for COVID-19 – a modeling study

The coronavirus disease 2019 (COVID-19) remains a global pandemic at present. Although the human-to-human transmission route for this disease has been well established, its transmission mechanism is not fully understood. In this paper, we propose a mathematical model for COVID-19 which incorporates multiple transmission pathways and which employs time-dependent transmission rates reflecting the impact of disease prevalence and outbreak control. Applying this model to a retrospective study based on publicly reported data in China, we argue that the environmental reservoirs play an important role in the transmission and spread of the coronavirus. This argument is supported by our data fitting and numerical simulation results for the city of Wuhan, for the provinces of Hubei and Guangdong, and for the entire country of China.


Introduction
The coronavirus disease 2019 (COVID-19) has led to high morbidity and mortality in China, Europe, and the United States. The disease has spread throughout the world, creating unprecedented challenges in public health. On March 11, 2020, the World Health Organization (WHO) declared COVID-19 as a global pandemic. This pandemic is currently on-going and the number of infections has been fast increasing, with tens of millions of cases reported in over 210 countries and territories.
COVID-19 is caused by a novel coronavirus which is now named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Typical symptoms of the infection include dry cough, fever, fatigue, difficulty in breathing, and bilateral lung infiltration in severe cases [7]. Non-respiratory symptoms such as nausea, vomiting, and diarrhea are also reported in some patients [1,4,27]. SARS-CoV-2 is regarded as the third zoonotic human coronavirus emerging in the current century, after SARS-CoV in 2002 that spread to 37 countries and the Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 that spread to 27 countries.
In order to better understand the transmission and spread of COVID-19 and to forecast its epidemic development, a number of mathematical and computational models have been proposed (see, e.g. [8,11,13,17,24]). Almost all these models are based on the susceptibleexposed-infected-recovered (SEIR) compartmental framework, exclusively focussed on the direct, human-to-human transmission pathway [1,2]. On the other hand, the role of the environment in the transmission of COVID-19 has been largely neglected in current clinical and theoretical studies and rarely investigated in modeling and simulation [28]. As a result, our knowledge regarding the transmission mechanism and the epidemiological characteristics of COVID-19 remain limited at present [3,15,23,25] It has been commonly accepted that COVID-19 can be transmitted through direct contact between human hosts, and both the symptomatic and asymptomatic individuals are capable of infecting others [1,14]. In addition, though not well established yet, the indirect transmission pathway from the environment to human hosts is a highly possible route to spread the coronavirus. When infected individuals cough, sneeze or exhale, they release respiratory droplets that contain the coronavirus and most of these droplets would fall on nearby surfaces and objects. Other individuals could catch the virus by touching the contaminated surfaces or objects and then touching their eyes, noses or mouths. Meanwhile, the coronavirus released by infected individuals could float in the air as aerosols and then be breathed in when other people pass by. Such environment-to-human transmission routes, and the success of such a transmission mode, largely rely on the ability of the coronavirus to survive and persist in the environment.
Environmental survival and viability of SARS-CoV were confirmed in [6,27]. A recent study reviewed 22 types of coronaviruses and found that SARS-CoV, MERS-CoV and endemic human coronaviruses can persist on inanimate surfaces such as metal, glass or plastic for up to 9 days [9]. Another experimental study published in March 2020 found that the novel coronavirus (i.e. SARS-CoV-2) was detectable in aerosols for up to 3 hours, on copper for up to 4 hours, on cardboard for up to 24 hours, and on plastic and stainless steel for up to 3 days [22]. These findings that the virus can remain viable and infectious in aerosols for hours and on surfaces for days indicate a high probability and significant risk of environmental transmission, particularly airborne and fomite transmission, for SARS-CoV-2. Additionally, the novel coronavirus has been found in the stool of some infected individuals [4], which may contaminate the aquatic environment through the sewage water and could add another possible route of environmental transmission for COVID-19 [27].
Another limitation in the current modeling studies of COVID-19 is that the transmission rates are typically fixed as constants, rendering simplicity for both mathematical analysis and data fitting. In practice, however, the transmission rates may change with the epidemiological status and may be impacted by the outbreak control. For example, the Chinese government implemented strong disease control measures to fight COVID-19, including large-scale quarantine, intensive tracking of movement and contact, strict isolation of infected individuals, expanded medical facilities, as well as extending the Lunar New Year holiday and demanding the public to stay home. Meanwhile, when the reported infection level is high, people would be motivated to take voluntary action to reduce the contact with the infected individuals and contaminated environment so as to protect themselves and their family members [26]. As a result, the actual transmission rates may decrease even though the outbreak was ascending, leading to a low level of transmission at a time of high disease prevalence. Consequently, reflecting this time and prevalence dependent feature of transmission rates could improve the accuracy in modeling and simulating COVID-19.
In this paper, we aim to characterize the variable transmission rates of COVID-19 and quantify the role of the environmental reservoirs in the transmission of this disease, through a mathematical modeling approach. The environmental reservoirs here, defined in a broad sense, include the air, water, and all surfaces and objects surrounding human individuals. Our model differs from the previously published COVID-19 models by explicitly including both the direct and indirect transmission routes, and by incorporating timedependent transmission rates that reflect the realistic impact of disease control measures. We apply our model to the retrospective data reported in China, where COVID-19 originally started. We implement data fitting for the city of Wuhan, the provinces of Hubei and Guangdong, as well as the entire country, and conduct numerical simulation and prediction for the epidemic development of COVID-19 in China.

Methods
We utilize a mathematical model based on differential equations to investigate the transmission dynamics of COVID-19 in China. The model involves five compartments: the susceptible individuals (denoted by S), the exposed individuals (denoted by E), the infected/infectious individuals (denoted by I), the recovered individuals (denoted by R), and the concentration of the coronavirus in the environment (denoted by V). Both the exposed and infectious individuals are capable of infecting others. Individuals in the exposed class E are in the incubation period and do not show symptoms, whereas individuals in the infectious class I have fully developed disease symptoms and are identified and counted in the publicly reported data. Thus, the E and I compartments in our model contain asymptomatic infected and symptomatic infected individuals, respectively.
The model is described as follows: The parameter is the population influx rate, μ is the natural death rate for the human hosts, α −1 is the incubation period, w is the disease induced death rate, γ is the rate of recovery from the infection, ξ 1 and ξ 2 are the rates of contributing the coronavirus to the environment by exposed and infectious individuals, respectively, and σ is the removal rate of the coronavirus from the environmental reservoirs. The values of these parameters are discussed in the Appendix, Section A.2.
We incorporate both the human-to-human (i.e. direct) and the environment-to-human (i.e. indirect) transmission routes in this model. The human-to-human transmission occurs between susceptible and exposed individuals, and between susceptible and infectious individuals. We assume a bilinear incidence for each of these transmission pathways.
We also assume that each of the transmission rates β E (I, t), β I (I, t) and β V (I, t) depends on the prevalence I and time t; particularly, each one is a non-increasing function of I, given that a higher level of disease prevalence would motivate stronger outbreak control measures that could reduce the transmission rates.
In this study, we consider a time domain of T consecutive days, divided into two time periods: 0 ≤ t ≤ t * and t * < t ≤ T for some t * < T. We employ the following representation for the three transmission rates: In the formulation above, β E0 , β I0 and β V0 denote the upper bounds, and c is a positive constant that adjusts the speed of the decay, for these transmission rates when 0 ≤ t ≤ t * . During this (first) time period, each transmission rate keeps decreasing as the prevalence I increases, due to the implementation of the disease control motivated by the reported infection level. We assume that the impact of the control measures would reach the maximum strength after t * days, so that each transmission rate would attain its lower bound and stabilize at that value for t * < t ≤ T. Thus, for this (second) time period, each transmission rate becomes a constant, given by β E * , β I * and β V * , respectively. Specifically, we set a simulation period of T = 100 days. We start our simulation from January 23, 2020, when the city of Wuhan was placed in quarantine and completely locked down, followed by the nation-wide implementation of a stay-at-home order. We set the length of the first time period as t * = 20 days; i.e. from January 23 to February 11. It is worth mentioning that on February 12, 2020, the national health commission in China started including cases confirmed by clinical diagnosis, which refers to using CT imaging scans to diagnose patients, in addition to those based on the original method of nucleic acid testing kits. Thus, February 12 marks the first day in our second time period (from February 12, 2020 to May 1, 2020), which has a length of 80 days.
Through a sensitivity analysis (see details in the Appendix, Section A.3), we find that the environment-to-human transmission constant β V0 , the transmission adjustment parameter c, and the virus shedding rate ξ 1 are among the most sensitive parameters involved in our model. Meanwhile, the values of these three parameters are not available in the literature as the majority of models published thus far have focussed on the SEIR framework without considering the environmental component of COVID-19, and they have generally applied constant transmission rates which remain fixed in time. Hence, to estimate the values for these three parameters, we fit our model to the daily reported infection data during the first time period, using the standard least squares method. With their estimated values, we then conduct numerical simulation and prediction in the second time period using fixed transmission rates, formulated above.
Hence, our study consists of two major steps: data fitting in the first time period, and simulation and prediction in the second time period. We conduct the data fitting and simulation tasks separately at the city, province and country levels in China. Specifically, we first apply our model to the reported data in Wuhan city, which had been the COVID-19 epicenter until March 2020 when Europe and the US took up this position. We then use the model to study the provinces of Hubei and Guangdong, which have the highest numbers of cases among all provinces in China. Finally we extend our modeling study to the entire country of China.

Results
We first conduct the data fitting for Wuhan city from January 23 to February 11, 2020 using our model (1) that incorporates both the direct and indirect transmission routes, to estimate the values of the parameters β V0 , c and ξ 1 . Based on data reported on January 23, the initial condition for the host population is set as (S(0), E(0), I(0), R(0)) = (8999015, 500, 475, 10) [19]. The initial condition for the coronavirus concentration is set as V(0) = 1000 viruses/ml. Figure 1 shows the numbers of cumulative confirmed cases in Wuhan during this period versus our fitting curve. The parameter values and their 95% confidence intervals are presented in Table 1. To measure the goodness-of-fit, we calculate the normalized root-mean-square error (NRMSE), which is defined as follows: are computed data, and n is the number of data points used. In general, a lower value of NRMSE indicates a better fitting result. We find that the NRMSE for our Wuhan data fitting is 0.0381. Based on the parameter values from data fitting, we are able to evaluate the basic reproduction number R 0 for the disease outbreak in Wuhan during this period; see Equation (A3) in the Appendix for the expression of R 0 . Estimates of the basic reproduction numbers for COIVD-19 in China range from 2.0 to 7.0 in the literature [3,15]. In  our study, we obtain that R 0 = 5.11. In particular, the third component R 3 in R 0 provides a quantification of the infection risk from the environment-to-human transmission route. We find that R 3 = 2.33, showing a significant contribution from the environmental reservoirs toward the overall infection risk.
As mentioned before, February 12 marks the first day of the second time period in our study where the transmission rates β E (I, t), β I (I, t) and β V (I, t) are fixed as constants, described in Equation (2). We then conduct numerical simulation using these constant transmission rates for 80 days, from February 12 (day 1) to May 1 (day 80). Figure 2 displays our simulation results for the number of the cumulative cases. Particularly, we compare our simulated curve (shown by the red solid line) against the reported data (shown by the blue circles) from February 12 to May 1, and we observe very good agreement. Indeed, this is confirmed by the NRMSE between the simulated and reported data, which is found as 0.0135. The simulated curve levels off after March 15 (day 33), consistent with the fact that there were almost no new cases reported in Wuhan after March 15 [18,19]. Meanwhile, to illustrate the impact of the environment-to-human transmission pathway on our result, we remove the indirect transmission route from the model and keep only the two direct (exposed-to-susceptible and infectious-to-susceptible) transmission routes, and then repeat the simulation for the same 80-day period. The result, displayed by the red dashed line in Figure 2, clearly shows that the simulated curve deviates from the actual data in this case. Although it is qualitatively evident, the result in Figure 2 provides a quantification on the contribution of the indirect transmission route toward the cumulative number of infections.
To further demonstrate the role of the environment-to-human transmission mode, we conduct both the data fitting in the first time period and the simulation in the second time period using the direct transmission mode only. Specifically, we remove the indirect transmission route from our model (1) so that only the two direct (exposed-to-susceptible and infectious-to-susceptible) transmission routes are taking effects. Then we conduct the fitting for the same Wuhan data from January 23 to February 11 (20 days), again using the standard least squares method. The result is displayed in Figure 3(a), which shows a clearly larger fitting error compared to that in Figure 1. Based on this fitting result, we then conduction the simulation from February 12 to May 1 (80 days) again using solely the direct transmission routes, and the result is displayed in Figure 3(b). We observe that the simulated curve is significantly below the reported data, a pattern similar to that in Figure 2 (see the dashed line). Together, these fitting and simulation results indicate that we would underestimate the disease burden and the total cumulative cases if the indirect transmission route is not considered.
Next, we apply the same two-step procedure to Hubei province, which contains Wuhan city. In the data fitting step, we fit our model (1) with both the direct and indirect transmission routes to the reported data in Hubei province for 20 days from January 23 to February  11 (i.e. the first time period). The result is presented in Figure 4(a). The values of the three parameters β V0 , c and ξ 1 and their 95% confidence intervals are presented in Table 2. The NRMSE for this data fitting is 0.0357. The basic reproduction number is R 0 = 4.45, and the component associated with the indirect transmission is R 3 = 2.79. In the simulation step, we fix the three transmission rates β E (I), β I (I) and β V (I) as constants, and conduct numerical simulation for 80 days from February 12 to May 1 (i.e. the second time period). . Data fitting and simulation results for Hubei province: (a) Data fitting for the cumulative confirmed cases using both the direct and indirect transmission routes, from January 23, 2020 (day 1) to February 11, 2020 (day 20). The circles denote the reported cases and the solid line denotes the fitting result. (b) Simulation for the cumulative confirmed cases based on data fitting from part (a), for a period of 80 days from February 12, 2020 (day 1) to May 1, 2020 (day 80). The transmission rates are constants in the simulation. (c) Data fitting for the cumulative confirmed cases using the direct transmission routes only, from January 23, 2020 (day 1) to February 11, 2020 (day 20). (d) Simulation for the cumulative confirmed cases based on data fitting from part (c), for a period of 80 days from February 12, 2020 (day 1) to May 1, 2020 (day 80). The transmission rates are constants in the simulation. We also apply the same fitting-simulation procedure to Guangdong province, which has the second highest number of infections among all provinces in China. The data fitting results with both the direct and indirect routes are presented in Table 3 and Figure 5(a), and the associated simulation results are presented in Figure 5(b). Meanwhile, the fitting and simulation results solely based on the direct transmission routes are displayed  in Figure 5(c,d), respectively. Although the infection level in Guangdong province is much lower than that in Hubei province, Figure 5 shows qualitatively similar behaviors, up to March 15, to those in Figure 4. There was a significant increase of the reported cases in Guangdong province after March 15, due to a relatively large number of cases imported from other countries [12], and our model does not account for such imported cases. Additionally, we have conducted data fitting and simulation, using the same two-step procedure, for the entire country of China. The results with both the direct and indirect transmission pathways are presented in Table 4 and Figure 6(a,b). The results using only the direct transmission routes are presented in Figure 6(c,d). We observe similar patterns as those at the city and province levels (see . In particular, based on the data fitting results in the first time period, we observe that the simulation curve in the second time period agrees well with the reported data if incorporating both the direct and indirect transmission routes, and that the numerical prediction would underestimate the infection risk and burden if the indirect transmission route is not included in the model. At the country level, we find the basic reproduction number is R 0 = 4.97, and the indirect transmission associated component is R 3 = 2.79. This again demonstrates the significant role played by the environment-to-human transmission pathway in the spread of COVID-19. Furthermore, we have conducted a long-term simulation for the disease prevalence I(t) over a period of 200 days starting from February 12, 2020, using our model (1) with both the direct and indirect transmission routes. The results for the city of Wuhan, the provinces of Hubei and Guangdong, and the entire country of China are presented in Figure 7. We consider two scenarios in this simulation: (1) We assume that the disease control measures (such as strict restriction of contact) are maintained at the maximum strength, which are implemented throughout the entire course so that each transmission rate is kept at its minimum; i.e. β E * , β I * and β V * from Equation (2). Other parameter values remain the same as given in Table A1. This is the same as what we have done before for the simulation of the second time period, but to extend the duration from 80 days to 200 days. The results, represented by the blue solid lines in Figure 7, clearly shows that the infection level approaches 0, a disease-free state, in the long run, consistent with our previous observation that the simulated curves of the cumulative cases reach a steady state. (2) We assume that the strength of the disease control is reduced to normal, as the number of new infections is quickly decreasing after February 12, so that each transmission rate remains a function of the infection level I and varies with time (see Equation (2)). Effectively, when the disease Figure 5. Data fitting and simulation results for Guangdong province: (a) Data fitting for the cumulative confirmed cases using both the direct and indirect transmission routes, from January 23, 2020 (day 1) to February 11, 2020 (day 20). (b) Simulation for the cumulative confirmed cases based on data fitting from part (a), for a period of 80 days from February 12, 2020 (day 1) to May 1, 2020 (day 80). (c) Data fitting for the cumulative confirmed cases using the direct transmission routes only, from January 23, 2020 (day 1) to February 11, 2020 (day 20). (d) Simulation for the cumulative confirmed cases based on data fitting from part (c), for a period of 80 days from February 12, 2020 (day 1) to May 1, 2020 (day 80). The reported cases in Guangdong province increased significantly after March 15 (day 33) due to cases imported from abroad. prevalence decreases, the transmission rates would increase due to weakened disease control. The results, represented by the red dashed lines in Figure 7, shows that the infection level would approach a positive endemic state over time. This is consistent with our analytical prediction given in Theorem A.2 in the Appendix. We also observe from Figure 7 that the simulated curves representing these two scenarios coincide during the first 30 days and then depart from each other afterwards, indicating that the impact of disease control measures at different strengths may be similar during the declining period of the outbreak, but the distinction would be reflected in the long run.
Finally, we present another set of fitting and simulation results to further demonstrate our model (1) with both the direct and indirect transmission routes. The accuracy of the reported COVID-19 data in China (as well as many other countries) has been under debate. In a recent Science article [11], the authors reported that about 86% of all infections in  China were undocumented before January 23, 2020, and about 35% of all infections were undocumented afterwards. Using their estimated reporting rates which are constants in different time periods (i.e. 0.14 and 0.65 for reported new cases before and after January 23, respectively), we modify the numbers of COVID-19 cases in Wuhan city, Hubei and Guangdong provinces, and the entire country of China, and conduct data fitting and simulation again based on the same two-step procedure as described before. The results are presented in Table 5 and Figure 8. Despite the different datasets with elevated infection numbers, we observe similar patterns as those in Figures 1-6 at the city, province and country levels; the data fitting in the first time period (shown in Figure 8(a,c,e,g)) and the numerical simulation in the second time period (shown in Figure 8(b,d,f,h)) both agree well with the modified data. These results provide another piece of evidence for our model application, and indicate that the accuracy of the reported data does not have a significant impact on the validity of our methods.

Discussion
Using a new mathematical model, we have performed a study on the transmission dynamics of COVID-19 in the city of Wuhan, the provinces of Hubei and Guangdong, and the entire country of China. Our model incorporates both the direct and indirect transmission routes, and emphasizes the role played by the environmental reservoirs in the transmission and spread of COVID-19. A unique feature of our model is that the transmission rates depend on the disease prevalence and time, with the potential to catch the realistic evolution of the contact pattern and transmission variation under the impact of the COVID-19 outbreak and the disease control measures. Our numerical study consists of two major parts: the data fitting in the first time period (20 days from January 23 to February 11, 2020), and the simulation and prediction in the second time period (80 days from February 12 to May 1, 2020). The selection of February 12, 2020 to separate these two periods is based on several factors. First, after 20 days of implementing strong disease control measures ordered by the Chinese government, it is reasonable to assume that the disease transmission rates have been minimized and stabilized. Second, the change of the testing methods and criteria on February 12 led to a surge of confirmed cases in China, with an increase of about 14,000 new cases for Wuhan city alone in a single day. The spike in the reported data poses a potential difficulty for a curve fitting. Our approach overcomes this challenge by conducting data fitting up to February 11, 2020, and then using fixed transmission rates for the simulation and prediction afterwards. The results generated, particularly the good agreement between the simulation curves and the real data during the second time period at all the city, province and country levels, provide a validation of this approach. We have also compared the results with those obtained by removing the indirect transmission route from our model and using only the direct transmission routes for data fitting and simulation, and the comparison provides another validation of our approach. We have carried out the fitting and simulation tasks for both the publicly reported data and the modified data based on the estimated reporting rates from [11], and we observe very similar patterns.
The data fitting is employed to estimate the three key parameters β V0 , c and ξ 1 which are highly sensitive in our model. With their values fitted from the reported data, we are able to evaluate the basic reproduction number R 0 at the city, province and country levels. In each of these calculations, we find that the component R 3 , which measures the risk of the environment-to-human transmission route, takes up a significant portion (ranging from 45% to 62%) of R 0 . Furthermore, the simulation results in the second time period demonstrate that without incorporating the environmental transmission mode in our model, we would underestimate the severity and burden of the disease.
The 'environment' in this work is defined in a broad sense, including the air and all the surfaces and objects surrounding human individuals. The environment-to-human (or, indirect) transmission routes in our study include, but not limited to, all the fomite transmissions through contacting the contaminated surfaces and airborne transmissions through inhaling the floating coronavirus in the form of aerosols. Recent experimental findings, particularly in [22], provide clear evidences that coronaviruses can persist in these environmental settings from several hours to a few days, implying the risk of transmitting the pathogen from the environmental reservoirs to the human hosts. Our modeling study provides a quantification of this risk, and demonstrates that the environmental reservoirs indeed play an important role in shaping the COVID-19 epidemic in China. Our results provide a strong justification for the practice of hand washing and surface cleaning and disinfection, recommended by CDC and WHO, as these actions can effectively reduce the transmission of coronaviruses through the environment-to-human pathway.
Mathematical representation of the transmission rates for COVID-19 is certainly not unique, and our formulation in Equation (2) provides a new and feasible way to characterize the temporal variation of the actual transmission rates under the impact of the epidemic evolution and the disease control measures, particularly for the outbreak period in China. Our numerical results demonstrate that this characterization, on which our two-step, data fitting and simulation, study approach is based, can well predict the epidemic development of COVID-19 in China from February 12 to May 1 in 2020. This pattern, though, does not necessarily reflect the reality for a long period of time.
The results presented in Figure 7 show that in the long run the curve with maximum disease control approaches 0 (the disease-free equilibrium), while the curve with normal disease control approaches a positive steady state (the endemic equilibrium). It is reasonable to expect that the realistic situation would find a way somewhere between these two hypothetical scenarios. Because relatively few new infections have appeared in China since early March of 2020, it is practically not meaningful to continue implementing control measures at the maximum strength. On the other hand, due to the severe consequence that COVID-19 has caused, many people would remain cautious in their behavior and attempt to reduce or limit their contact with others for a certain period of time. As a result, the practical transmission rates in the near future are expected to range between those associated with the two scenarios, and the real infection level would be bounded below by the disease-free state (determined by the maximum control) and above by the endemic state (determined by the normal control).

Acknowledgments
This work was partially supported by the National Institutes of Health under grant number 1R15GM131315. The authors are grateful to the two anonymous referees for their helpful comments that have improved the original manuscript.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was partially supported by the National Institutes of Health under grant number 1R15GM131315.

A.2 Parameter values
The values of the transmission constants β E0 and β I0 can be found in a recent study [17]. The incubation period of the infection ranges between 2-14 days, with a mean of 5-7 days [16]; we take the value of 7 days in our model. The average recovery period is about 15 days [16], and so we set the disease recovery rate as γ = 1/15 per day. Members of the coronavirus family can survive in the environment from a few hours to several days [6,22], and we take the value of 1 day which results in a virus removal rate σ = 1 per day. Meanwhile, since the Chinese government has been implementing a mandatory isolation policy and intense medical care for all the confirmed cases, represented by I in our model, the chance of those infected individuals spreading the coronavirus to the environment connected with the general public is extremely low, and so we assume the virus shedding rate from the infected individuals is zero; i.e. ξ 2 = 0. It turns out that the sensitivity of the parameter ξ 2 is also very low (see Section A.3). Additionally, the influx rate depends on the location and population size. For example, the period of our concern starts on January 23, 2020, when the city of Wuhan was placed in quarantine and no one was allowed to move out of the city. Only a relatively small number of people (mainly public health professionals) travelled into the city since its lockdown. Thus, the influx rate for Wuhan city is mainly based on newborns and we obtain = 271.23 per day [18].
Other parameter values are determined through data fitting.

A.3 Sensitivity analysis
The values of the three parameters β V0 , c and ξ 1 in our model are not available in the literature, and are thus estimated from data fitting. It turns out that these three are among the most sensitive parameters in all the model parameters. Below we provide details of our sensitivity analysis.
We consider the 9 parameters β E0 , β I0 , β V0 , c, α, w, γ , ξ 1 and ξ 2 in our model. We employ the basic differential equation analysis approach [5] to derive the sensitivity equations for our system (1). Let X = {S, E, I, R, V} and P = {β E0 , β I0 , β V0 , c, α, w, γ , ξ 1 , ξ 2 }. For X ∈ X , y ∈ P, we define the relative sensitivity s(X, y) of the state X to the parameter y, non-dimensionalized by the state X and the parameter value y, as s(X, y) = ∂X ∂y To compute the partial derivative ∂X ∂y , which is also referred to as a quasi-state variable, we differentiate it with respect to t to obtain We then numerically solve for the quasi-state solutions { ∂X ∂y : X ∈ X , y ∈ P} by associating system (1) with system (A5).
A typical set of results are presented in Figure A1, where we plot the relative sensitivities of I with respect to these 9 parameters in the set P, presented in three groups. We clearly observe that all these sensitivity variables approach a steady state in the long term. In particular, within each of the three groups, we have Hence, we conclude that the three parameters β V0 , c and ξ 1 all have high sensitivities compared to other parameters. Meanwhile, we notice that the sensitivity of the parameter ξ 2 is extremely low, close to 0 all the time, which provides another justification of setting ξ 2 = 0 in our study. Figure A1. The relative sensitivities s(I, y) for the 9 parameters y ∈ P.

A.4 Equilibrium analysis
In this section, we conduct a mathematical analysis for the model (1) when β E (I, t) = β E (I), β I (I, t) = β I (I) and β V (I, t) = β V (I); i.e. when these transmission rates do not explicitly depend on time t. We focus our attention on the equilibria of the system (1) which will provide essential information regarding the long-term dynamics of the disease.
Therefore, by Equation (A7), we find that the model (1) admits a unique equilibrium, the DFE X 0 , if R 0 ≤ 1; and it admits two equilibria, the DFE X 0 and the EE X * , if R 0 > 1.
In what follows, we perform a study on the global stability of the DFE. By a simple comparison principle, we find that 0 ≤ S + E + I + R ≤ S 0 and 0 ≤ V ≤ ξ S 0 σ . Thus, it leads to a biologically feasible domain = (S, E, I, R, V) ∈ R 5 Theorem A.1: The following statements hold for the model (1).
Proof: Let X = (E, I, V) T . One can verify that where the matrices F and V are given in Equation (A2). By some algebraic manipulation, we let u = (β E (0), β I (0), β V (0)). It then follows from the fact R 0 = ρ(FV −1 ) = ρ(V −1 F) and direct calculation that u is a left eigenvector associated with the eigenvalue R 0 of the matrix V −1 F; i.e. uV −1 F = R 0 u. Consider a Lyapunov function Differentiating L along the solutions of (1), we have If R 0 < 1, the equality dL 0 dt = 0 implies that uX = 0. This leads to E = I = V = 0 by noting that all components of u are positive. Hence, when R 0 < 1, equations of (A6) yield S = S 0 , and E = I = R = V = 0. Thus, the invariant set on which dL 0 dt = 0 contains only the point X 0 . If R 0 = 1, then the equality dL 0 dt = 0 implies that It is easy to see that Hence, we have either , and S = S 0 . As analyzed before, each case would indicate that the DEF X 0 is the only invariant set on {(S, E, I, R, V) ∈ : dL 0 dt = 0}. Therefore, when R 0 < 1 or R 0 = 1, the largest invariant set on which dL 0 dt = 0 consists of the singleton X 0 = (S 0 , 0, 0, 0, 0). By LaSalle's Invariance Principle [10], the DFE is globally asymptotically stable in if R 0 ≤ 1.
In contrast, if R 0 > 1, then it follows from the continuity of vector fields that dL 0 dt > 0 in a neighbourhood of the DFE in . Thus the DFE is unstable by the Lyapunov stability theory. The last part can be proved by the persistence theory [20].
Theorem A.2: If R 0 > 1, then the unique endemic equilibrium X * of the system (1) is locally asympototically stable.
Theorems A.1 and A.2 establish a sharp threshold at R 0 = 1 for the disease dynamics: when R 0 ≤ 1, the infection will be eliminated and the disease-free state will be accomplished; when R 0 > 1, the infection will persist and approach an endemic state in the long run. The results underscore the importance of reducing R 0 below unity, through public health policies and outbreak control Virus shedding rate by exposed people fitting by data ξ 2 Virus shedding rate by infected people 0 per person per day per ml strategies, and through the control of both the direct and indirect transmission routes, in order to contain and eventually eradicate the disease.