A discrete stochastic model of the COVID-19 outbreak: Forecast and control

: The novel Coronavirus (COVID-19) is spreading and has caused a large-scale infection in China since December 2019. This has led to a signiﬁcant impact on the lives and economy in China and other countries. Here we develop a discrete-time stochastic epidemic model with binomial distributions to study the transmission of the disease. Model parameters are estimated on the basis of ﬁtting to newly reported data from January 11 to February 13, 2020 in China. The estimates of the contact rate and the e ﬀ ective reproductive number support the e ﬃ ciency of the control measures that have been implemented so far. Simulations show the newly conﬁrmed cases will continue to decline and the total conﬁrmed cases will reach the peak around the end of February of 2020 under the current control measures. The impact of the timing of returning to work is also evaluated on the disease transmission given di ﬀ erent strength of protection and control measures.


Introduction
COVID-19 is the novel coronavirus that has spread among humans, mainly in China, since December 2019 [1]. It was first discovered in Wuhan, the capital city of the Hubei province [2]. The virus subsequently spread to other provinces in China. Cases of infection have also appeared in other countries [3,4]. COVID-19 is a respiratory virus that is transferred via contact with an infected person through droplets when a person coughs or sneezes, or through saliva droplets. The main clinical manifestations of the infection are fever, fatigue, respiratory symptoms (mainly dry cough), and emergence of dyspnea. Most patients have mild to moderate symptoms with good prognosis, but some patients can be in critical condition or die [1]. As of February 15, 2020, it has been reported that 68,500 people have confirmed with the COVID-19 infection and 1596 people have died in the mainland of China [5].
To avoid the risk of a large-scale movement of population that can accelerate the spread of the disease, the Chinese government has implemented various measures. The government keeps track of people who had direct contact with the COVID-19 patients, known as contact tracing. From the last day of contact with a confirmed patient, contacts are monitored for signs of illness within 14 days. If contacts develop fever or other symptoms, they are isolated, tested, and hospitalized immediately to prevent further spread of the virus to others. Since Jan 23, 2020, metropolitan-wide quarantine and traffic restriction have been taken in Wuhan and several nearby cities. As the outbreak grows rapidly, China is imposing a more massive quarantine. All schools have postponed the start of the spring semester. Most companies have also postponed the starting date of work after the holiday of Spring Festival. A number of countries or regions have issued restrictions on the entry of Chinese citizens. A large number of domestic and international flights have been canceled. There is no doubt that the outbreak of this virus infection has severely affected people's life, economy and health. How long this situation will last and when the disease will be controlled is a great concern to everyone.
In the last few decades, there were several major outbreaks of infectious diseases, such as atypical pneumonia (SARS) in 2003, H1N1 influenza in 2009, and H7N9 influenza in 2013. It is imperative to improve early predictive and warning capabilities for the disease endemic or pandemic. Mathematical models, combined with the prevalence data, have been used to study the dynamics, analyze the causes and key factors of the outbreaks, forecast the trend of disease spread, and provide optimal control strategies and measures. For example, Zhou and Ma [6] formulated a discrete mathematical model to investigate the transmission of SARS. Their simulation results agree with the data and indicate that early quarantine and a high quarantine rate are crucial to the control of SARS. Chowell et al. [7] and Lekone et al. [8] developed ordinary differential equations and stochastic SEIR models to study the dynamics of infectious disease and the effect of control interventions, respectively. Their models used the outbreak of Ebola in the Democratic Republic of Congo in 1995 as a case study. Very recently, some researchers have studied the spread of COVID-19 [9][10][11]. Most of them estimated the basic reproductive number R 0 , a key parameter to evaluate the potential of viral transmission [12][13][14][15][16]. Tang et al. [17,18] constructed a deterministic compartmental model and also estimated the basic reproductive number. They provided the predicted results under different degrees of control measures.
In this study, we will use a discrete-time stochastic compartment model to study the dynamics of COVID-19 epidemics. The model includes the transmission of COVID-19 and the government's measures to control the disease spread. The model captures the epidemiological status of individuals in the clinical progression of the disease and the changes in each time interval. We make full use of the existing reported data and perform parameter estimation based on the data. Furthermore, we forecast the spread of the disease in the next period of time based on the parameter estimates and numerical simulation. We also use simulations to evaluate the risk of returning to work at different timing and provide suggestions on when people can start their routine work and life.

Model
Based on the development and epidemiological characteristics of COVID-19 infection, the SEIR model is appropriate to study the dynamic of this disease. The population is partitioned into subpopulations as susceptible (S (t)), exposed (E(t)), infected (I(t)), hospitalized (H(t)), and recovered (R(t)). Let N denote the total population size. The government takes measures to track and quarantine people who have close contact with confirmed cases. Thus, a fraction of the susceptible population is quarantined and identified as S q (t) and a fraction of the exposed population is isolated and identified as E q (t). We consider the discrete time point series T = 0, 1, 2, · · · , T n as the time progression of the disease. Here the time step is chosen to be one day, i.e., h = 1. At this timescale, the number of each compartment is dependent on the number in the previous day and the inflows and removals from other compartments during the day. Let B i j (t) be the number of individual transportation between compartments. We provide their detailed descriptions as follows: The transition of an individual from one state to the next state can be considered as a stochastic process. The time length that an individual has been in a certain compartment obeys exponential distribution. If we assume that the parameter of the exponential distribution is λ(t), then the probability that individuals leave the current state in the time interval h is 1 − exp(−λ(t)h). Further, the numbers of inflows and removals from other compartments during one day can be generated by a binomial distribution. The number of experiments in the binomial distribution is the number of individuals in the current compartment. The transmission of the disease is presumed to occur in the context of close contact between susceptible and infected persons. Assuming the transmission probability is β and the contact rate is c(t), then βc(t) I(t) N is the exponential rate that leads to the probability of individuals leaving the susceptible compartment. In view of contact tracing, we denote the quarantined proportion of individuals exposed to the virus is q. Based on the above assumptions and the stochastic SEIR model in [8], the discrete-time stochastic compartment model for COVID-19 infection is constructed as The above random variables involve binomial distributions Bin(n, p) with the following probabilities: Note that the number of S (t) is approximately the total population N in China, which is a large number. The limit distribution of the binomial distribution is the poisson distribution. This is used instead for B 11 (t) and B 12 (t). The variation of these seven compartments and their relationship are illustrated in the diagram 1. The other parameters are summarized in Table 1. Reducing exposure is one of the effective measures to control the spread of disease. The public raises awareness of precaution and takes fewer trips or wears a mask in the presence of a disease outbreak. The government has taken more control measures, e.g., blocking the entrances and exits of the city to stop the movement of population. In this model, the contact rate c(t) (i.e. the average number of contacts of a person per unit time) is assumed to be a piecewise function. It is a constant before the time point t * at which control measures are taken. It is assumed to decrease gradually from c 0 to c u . We assume that c(t) takes the following form: The city of Wuhan was blockaded on January 23, 2020. The actual data we used for data fitting start from January 11, 2020. Thus, we let t * = 12.

Data
We obtained the data from the National Health Commission of the People's Republic of China [5] and the Health Commission of the Hubei Province [19]. The data that will be used for parameter estimation include the cases in the mainland of China, such as the newly reported confirmed cases, the newly recovered cases, the new death cases, and the number of people released from medical observation. The total population N in this study is the population of China, approximately 1.4 billion. The cumulative number of reported confirmed cases can be used to compare with the model simulations. The total number of confirmed cases is the cumulative number of cases minus the cumulative number of recovered and death cases. As a complete reporting coverage of the cases has been available since January 11, 2020, the data used in our analysis are from January 11 to February 13, 2020. Since February 13, the Hubei province has carried out screening of the previous suspected cases and revised the diagnosis results, adding "clinical diagnosis" to the diagnosis classification. Although the number of clinical cases has been public as confirmed cases in Hubei, the clinical diagnosis is not included in the number of confirmed cases in this study.

Parameter estimation
In general, Bayesian estimation or the maximum likelihood estimation method can be used to estimate unknown parameters in this type of stochastic models [8,20]. Because B i j (t), where (i, j) ∈ {(1,1), (1,2), (2,1), (3,1), (3,2), (3,3), (4,1), (5,1), (6,1), (6,2)}, are conditionally independent, the likelihood function can be the accumulation of probability densities of all variables, given by where g i, j is the binomial densities of B i j (t). The best scenario is that the value of transportation between compartments can be obtained. However, the exposed and some infected people are impossible to be identified. In the reported cases about COVID-19 infection, the data we obtained is the time series for B 41 , B 61 , B 62 , and the number of new cases, which is equal to the sum of B 51 and B 31 . The remaining time series including B 11 , B 12 , B 21 , B 32 , B 33 are hard to be determined. It is difficult to accomplish parameter estimates using Bayesian estimation or the maximum likelihood estimation due to such a data structure. Therefore, the Metropolis-Hastings (MH) algorithm will be used to estimate parameters in our model by carrying out the Markov Chain Monte Carlo (MCMC) procedure [21,22].
In the model, we assume σ = 1/7 and λ = 1/14 because the quarantined individuals were isolated for 14 days and the incubation period of COVID-19 is about 7 days [17,23]. The initial conditions are the data on January 11, that is, H 0 = 41, R 0 = 6, and the sum of S q and E q is 739. We estimated the parameters from the mean values of 10,000 samples after a burn-in period of 5000 iterations. The estimated results of all parameters relevant to COVID-19 infection are displayed in Table 1. The model parameters fitted to the data of COVID-19 infection, including the newly reported confirmed cases and the total confirmed cases, are shown in Figure 2. In addition, stochastic simulations in Figure 3 show that the model can provide a good fit to the data of newly recovered cases and new death cases of the COVID-19 infection.

The reproductive number
Comparing with the basic reproductive number, the effective reproductive number can be used to measure the number of secondary cases generated by one primary case in a population in which there is partial immunity or some intervention measures have been implemented [24][25][26][27][28]. It changes during the progress of the disease outbreak. As the population size is much larger than the resulting size of the outbreak, i.e., S (t)/N ≈ 1, the effective or control reproductive number of our model is given by the following formula In contrast with the basic reproductive number that usually involves a constant contact rate, we use a time-varying contact rate c(t) in the effective reproductive number [17]. Using the parameter values estimated from the data fitting, the effective reproductive number R c (t) can be computed numerically. The result shown in Figure 4 indicates that R c (t) is large at the beginning of the disease outbreak.
However, as control measures are implemented, the effective reproductive number decreases eventually to less than 1.    The effective reproductive number R c (t) Figure 4. The effective reproductive number R c (t) from January 11 to February 13, 2020. The gray line is the range due to disturbance in the stochastic model.

Prediction
An assumption of our model is that the contact rate is exponentially decreasing to c u due to various prevention measures. The estimates show that the maximum value of the contact rate is 34.03 at the beginning of disease spread. As control measures are implemented, the contact rate declines to 0.93. This indicates that susceptible people are well protected by these measures and there is a small chance of transmission by infected people. Using this assumption and estimated parameters, we can make predictions about the future trend of the disease spread. In Figure 5(a) and (b), the simulations show the dynamics of the number of newly confirmed cases and the total confirmed cases in the 350 days after January 11, 2020. The number of newly cases has begun to decrease slightly compared with before. Figure 5(a) also indicates that the number of newly confirmed cases will continue to decline in the future. The number of newly confirmed cases will decrease to 0 in about 102-119 days after January 11, i.e., April 22, 2020 to May 9, 2020. Figure 5(b) shows that the number of total confirmed cases is still increasing. It will reach the peak in about 46-48 days after January 11, that is, February 26, 2020 to February 28, 2020.

Control measure
Up to now, the Chinese government has adopted very strict measures to control the spread of the disease. Most people are quarantined at home. Most companies have stopped working and all schools have postponed the opening date. How long will such measures last? If people start to work and students go to school as usual, the population flow will increase the risk of contact. Will this lead to a second outbreak of the disease? When does people's life can go back to normal? These are issues of great concern to everybody in the country. We consider the following four scenarios: Using the above assumptions and fixing other parameters according to data fitting, we numerically investigate the dynamics of the number of newly confirmed cases and total confirmed cases, which are shown in Figure 6. The simulations suggest that when people return to work early without sufficient protection it is likely to observe the increase of newly confirmed cases. The total number of confirmed cases will also increase by more than 2000. If people return to work on March 20, it will result in a small increase in the number of newly confirmed cases. However, it's going to decrease quickly. Thus, postponing the return to work would be of great help to control the disease transmission.

Discussion
In view of the randomness in the transmission of the COVID-19 infection, we construct a discretetime stochastic compartment system to study the dynamic behavior of the disease outbreak, in which the population in each compartment is assumed to obey a binomial distribution. In the model, we further assume that the contact rate between susceptible and infected individuals decreases exponentially since the government has implemented strict control measures. This discrete system can make a good use of the newly reported data, including the number of newly reported confirmed cases, newly recovered cases, new death cases, and those quarantined that are released per day, to calibrate the model parameters. Because the diagnostic criterion for confirmed cases was revised on February 13, we use the data reported from January 11 to February 13, 2020 in this study. Although there are no data for the populations in the other groups, we estimated the parameters in the model using the MCMC procedure based on the exiting reported data.
The maximum value of contact rate we estimated is 34.02 and then dropped to 0.93. It indicates that the control measures implemented are very effective. Figure 3 shows that the effective reproductive number of the infection declines from 6.96 to 0.47 over the time period of simulation. Some other groups have also estimated R 0 of the COVID-19 infection [29,30]. See a summary of the estimates in the reference [31]. Some estimates of R 0 are comparable to ours (before various control measures are taken) and some are smaller. This may be due to different estimation methods under different model assumptions. In any case, the results indicate that this novel coronavirus is highly contagious in the early stage (e.g., higher than the SARS coronavirus outbreak in 2003 [31]). However, the control measures implemented so far are shown to be efficient in lowering the effective reproductive number to below 1.
The stochastic fits show that the reported data are less than those simulated in the initial phase, but the subsequent simulation agrees well with the reported data. Most of the data are in the region of the stochastic simulation. By the time of the submission of this manuscript, the number of newly reported confirmed cases has begun to decrease. Assuming that the current control measures adopted by the government and individuals are maintained, the predicted results shown in Figure 4 indicate that the peak of total confirmed cases will reach around late February of 2020, followed by a decrease. The newly confirmed cases will decline to zero in late April or early May of 2020. To study the timing of returning to work on the disease dynamics, we consider four different scenarios on the time to return to work and the strength of protective measures. We assume the contact rate is 3 if the resumption of work causes wide migration of people and the protective measures are not sufficient. In this case, the simulation shows that the infection will have a second outbreak if people return to work on March 1. The situation will be better if people can return to work after March 20, 2020.
In conclusion, our simulation shows that the contact rate is a key factor on the control of the COVID-19 outbreak. When there is a sign of epidemic, we should raise awareness of self-protection and take all possible protective measures, such as wearing a mask and staying indoor to reduce the risk of getting infected. Although the number of new cases of infection is decreasing, there is still a possibility of future outbreaks if there are no adequate protective measures or people return to work early. The public should not relax their vigilance against the transmission of this highly contagious disease.