Parameter Estimation and Prediction of COVID-19 Epidemic Turning Point and Ending Time of a Case Study on SIR/SQAIR Epidemic Models

In this paper, the SIR epidemiological model for COVID-19 with unknown parameters is considered in the first strategy. Three curves (S, I, and R) are fitted to the real data of South Korea, taken from the Korea Disease Control and Prevention Agency (KDCA) and analyzed in detail. Using the least squares method to minimize the error between the fitted curves and the actual data, the unknown parameters, namely, the transmission rate, recovery rate, and mortality rate, are estimated. The goodness of fit of the model is assessed with two criteria (SSE and RMSE), and the uncertainty range of the estimated parameters is also presented. Using the resulting model, the possible ending time and the turning point of the COVID-19 outbreak in the United States are predicted. Because of the lack of treatment and vaccine, in the next strategy a new group called quarantined people is added to the proposed model. A hidden state comprising asymptomatic individuals, who are very common in COVID-19, is also considered to make the model more realistic and closer to the real world. The SIR model is thereby developed into the SQAIR model. The delay in the recovery of infected people is also treated as an unknown parameter. As in the previous steps, the possible ending time and the turning point in the United States are predicted. The model obtained in each strategy for South Korea is compared with the actual data from KDCA to verify the accuracy of the parameter estimates.


Introduction and Problem Statement
The novel coronavirus disease (COVID-19) is an emerging disease that first spread from Wuhan, China. It has since spread across the entire world and attracted worldwide attention. Since COVID-19 has affected more than 200 countries, it is important to model this disease correctly, identify how it spreads, and predict its course so that the necessary steps can be taken. In this regard, predictive mathematical models play a crucial role in investigating the epidemic spread in the absence of specific antivirals or an effective vaccine. Many articles have been published on disease modeling and on estimating the unknown parameters of infectious diseases, including COVID-19. Hence, in the following, after reviewing other papers, we discuss our objectives and innovations in the present paper.
Accordingly, the authors in [1] estimated the parameters of the SIR model of COVID-19 in India using an actual data set. Also, Bastos and Cajueiro [2] used two variations of the SIR-type model (SIR and SIAS) to forecast the evolution of the SARS-CoV-2 virus with real data from Brazil. The second wave of the spread of COVID-19 in Iran is forecast with the SIR model in [3]. The authors in [4] also forecast the trend of COVID-19 using the least square error (LSE) technique. Furthermore, in [5], the key epidemic parameters of the generalized SEIR model are estimated to forecast the epidemic size, peak time, and possible ending time of COVID-19 for five different regions. Asymptomatic and quarantined people are not considered in these papers. Given the nature of COVID-19, it is advisable to include groups of quarantined and asymptomatic individuals. Therefore, in our work, we modified the SIR model by including asymptomatic people and people put into quarantine. Also, the author in [6] introduced a SIR-type model for COVID-19 in Northern Italy, based on parameter estimation, that considered only the asymptomatic individuals.
In [7], the parameters and initial values of the SIR epidemic model are estimated from reported case data of the Hong Kong seasonal influenza epidemic in New York City in 1968-1969, to identify the relationship between unreported and reported cases. The study by Liu et al. [8] develops a mathematical model that includes a new group of unreported cases for the COVID-19 epidemic in Wuhan, in which the parameters and the initial conditions of the proposed model are estimated. The parameterized model is then used to identify the number of unreported cases. In the present study, by contrast, the initial values are known from real data, so there is no need to estimate them. The study by Hadeler [9] identifies the time-dependent transmission rate in epidemic SIR, SIRS, and SEIRS models and reviews and compares the various results. In addition, in [10], the authors estimate the infection rate β of a SIR epidemic model based on input-output (IO) equations that depend on the known output measurement and its derivatives. Furthermore, the authors in [11] introduced a more complete epidemic model for influenza that can be used for other diseases by modifying its parameters. In this regard, the authors in [12] applied optimal control to the proposed epidemic model for COVID-19 and compared it with Ebola and influenza.
There are many different methods for estimating parameters in various epidemic models, and they can be chosen as required. For example, if new data are added during the identification process, the model should be based on the observations up to the current time. The parameter estimates should therefore be computed recursively over time, as described in detail in [13]. Moreover, if the model is two-dimensional, the study by Shafieirad et al. [14] can be helpful. In addition to the continuous models considered for epidemic dynamics, discrete models can also be used, which are discussed in detail in [15]. Also, since some people previously infected with COVID-19 have been reported to become susceptible again, the authors in [16] introduced a modified SEIRS model that accounts for the possible susceptibility of recovered people for control action. In [17], a new mathematical model with time-dependent coefficients is used to characterize the dynamics of COVID-19 in three countries: South Korea, Italy, and Brazil.
Since the prevalence of COVID-19 in the United States is on the rise, it is vital to make predictions on the possible ending time. The method mentioned in this article can be applied to other countries and similar diseases. Since the prevalence of COVID-19 in South Korea has decreased and there is a complete set of data, taken from KDCA, an accurate model can be obtained to predict the ending point of the disease in other countries (including the United States).
Our motivation is to evaluate our method's efficiency on a classical SIR and SQAIR epidemiological model to predict the turning point and ending time of the COVID-19 disease in the United States. For this purpose, the method used in this study is the following.
Using actual data of South Korea, taken from KDCA, which has provided accurate and well-documented statistics on the prevalence of the coronavirus disease, the epidemic model's unknown parameters can be estimated. Using the resulting model, the possible ending time of COVID-19 in the United States can be predicted. We use two strategies in this article: In the first strategy, the unknown parameters of a classical SIR (susceptible-infected-recovered) epidemiological model are estimated using the LS method, which is simple to apply. The turning point and ending time of COVID-19 in the United States are then predicted. There may be asymptomatic carriers in the community who are in the incubation period and transmit the disease to others despite showing no symptoms and even despite a negative COVID-19 test result. As a result, in this study, we also included this group as asymptomatic people in our model and modified the basic SIR model to the SAIR model. Additionally, since there is no cure or vaccine for COVID-19 yet, it is necessary to quarantine susceptible people, and since the SAIR model has no group for them, in the next step the SAIR model is developed into the SQAIR model by introducing a quarantine group (Q). Besides, the delay in transferring people from the infected group to the recovered group is an essential factor added to the SQAIR model because it makes the model more realistic and closer to the natural process of the spread of COVID-19. The same steps are then applied to the SQAIR model to obtain the turning point and ending time of the COVID-19 outbreak in the United States.
Since there is no proper viral treatment or effective vaccine yet to prevent and control the spreading rate, currently, the best options and widely used strategies for decreasing the outbreak's growth rate are social distancing, stay-at-home orders, self-quarantine, lockdowns, isolation, and wearing a face mask.
In this paper, the group of quarantined people (Q) covers all the above strategies, which are collectively called quarantine for simplicity. As mentioned in the papers above, other groups can be added to the model, but this study aims to predict COVID-19 with a comprehensive yet straightforward model that incorporates the general features of the disease and can easily express its behavior. Furthermore, accounting for delay brings the model closer to the real world. Hence, in this study, the delay in transferring infected people to the group of recovered people is treated as an unknown parameter.
In the following, the general structure of the paper is presented.
In the first section of the paper, the introduction and problem statement were presented. The paper continues with Section 2, which presents the SIR model with dynamic equations and diagrams. In Section 3, the estimation of unknown parameters, model upgrade, prediction, and comparison of results are presented. Finally, the conclusion is given in Section 4.
Computational and Mathematical Methods in Medicine

The SIR Epidemic Model
The SIR epidemic model used in this paper is described as follows: let S be the number of people susceptible to infection, I the number of infected people (people who have tested positive for COVID-19), and R the number of recovered people. The SIR epidemic model is given by
$$S_{k+1} = -\beta S_k I_k, \tag{1a}$$
$$I_{k+1} = \beta S_k I_k - (g + \mu_d) I_k, \tag{1b}$$
$$R_{k+1} = g I_k, \tag{1c}$$
where each left-hand side denotes the change rate of the corresponding group, β is the transmission rate, and the initial conditions are $S_0 \ge 0$, $I_0 \ge 0$, and $R_0 \ge 0$. All states are nonnegative ($S, I, R \ge 0$). The total population N includes individuals who have been tested. Owing to normalization, the total number of people in this statistical population, N, is equal to one.
Remark 1. The total population N includes the individuals who have been tested (it is a statistical population that can be generalized to the total size of the population), and it is generally variable. In this work, however, it is fixed and equal to the total number of people on the last day on which data were taken (the 72nd day). In other studies, N is usually taken as the whole population of the country; however, it is difficult to treat the entire population (almost 330 million in the U.S.) as involved with COVID-19 because the disease is not equally distributed across all the states of a vast country like the United States. Hence, we considered a smaller community (people who have been tested) as our statistical population, which contains all three groups of people (S, I, and R). The parameters are also identified more accurately in this way. For example, suppose the number of infected people is 500,000 and the total population is about 330 million. In that case, the ratio of infected people to the total population is small (about 0.15%), and the parameters are not estimated correctly. Of course, when the statistical population consists of tested individuals, the results can be generalized to the entire population.
As shown in Figure 1, infected people recover at a rate of $g$, and $\mu_d$ indicates the removal rate of infected people due to mortality caused by the infection.
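The dynamics described above can be illustrated with a minimal simulation sketch. It assumes that the model is iterated with a one-day step on a normalized population and that each equation is read as a daily increment; the parameter values and the function name `simulate_sir` are illustrative assumptions, not the estimates or code of this paper.

```python
import numpy as np

def simulate_sir(beta, g, mu_d, s0, i0, r0, days):
    """Iterate a discrete-time SIR model with a one-day step.

    Assumption: each model equation is read as a daily increment,
    e.g. S[k+1] - S[k] = -beta * S[k] * I[k], on a normalized population.
    """
    S = np.empty(days + 1)
    I = np.empty(days + 1)
    R = np.empty(days + 1)
    S[0], I[0], R[0] = s0, i0, r0
    for k in range(days):
        new_infections = beta * S[k] * I[k]   # flow from S to I
        recoveries = g * I[k]                 # flow from I to R
        deaths = mu_d * I[k]                  # removal from I due to mortality
        S[k + 1] = S[k] - new_infections
        I[k + 1] = I[k] + new_infections - recoveries - deaths
        R[k + 1] = R[k] + recoveries
    return S, I, R

# Illustrative parameter values only (not the estimates reported in the paper).
S, I, R = simulate_sir(beta=0.4, g=0.1, mu_d=0.01,
                       s0=0.99, i0=0.01, r0=0.0, days=300)
```

In this sketch the susceptible fraction can only decrease, and the infected fraction eventually decays once new infections no longer offset recoveries and deaths.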
Since the epidemic model parameters are unknown, estimating these parameters with the real data taken from South Korea is the main objective of this paper. As a result, using the known parameters, the spread of infection in the United States can be predicted by the method presented in Section 3.
Remark 2. Epidemic models are naturally discrete because data are collected and/or reported over discrete units of time, which makes it easier to compare the data with the output of a discrete model and to implement the model. For system identification, the input and output data must be measured in the time domain; a model structure (usually a discrete model) is then selected, and an estimation method (the LS method in this paper) is applied to estimate the unknown parameter values. Since the identification data in this study, taken from medical reports, are daily, a discrete model structure is chosen. Such data may be weekly or daily in fast-spreading epidemics, such as influenza, SARS, Ebola, and especially the novel coronavirus (COVID-19). In essence, epidemic data are discrete and can be treated as continuous with a small step length, and after the parameters are estimated, the model can simply be written in continuous form. Furthermore, the numerical investigation of discrete-time epidemic models is more straightforward. Studies of discrete epidemic models are discussed in [15].
Remark 3. Assume that the initial values satisfy $S_{k_0} \ge 0$, $I_{k_0} \ge 0$, and $R_{k_0} \ge 0$, in which $k_0 = 0$, and that all parameters ($\mu_d$, $\beta$, $g$) are positive. In the mentioned model, the change rate of the susceptible people is $S_{k+1} = -\beta S_k I_k$, which shows that susceptible people become infected at the rate of β and move from group S to I. Then, once the number of susceptible people reaches zero, the rate of change ($S_{k+1} = -\beta S_k I_k$) becomes zero and remains unchanged. After the number of susceptible people reaches zero on a specific day ($S_{k_1} = 0$), Eq. (1b) changes to $I_{k+1} = -(g + \mu_d) I_k$, which is a difference equation that eventually tends to zero ($I_{k_2} = 0$). On the other hand, the number of recovered people increases at the rate of $g$, and when the number of infected people reaches zero, the number of recovered people remains at its maximum value ($R_{k_2} = R_m$, $R_m > 0$). As a result, on day $k_2$

Parameter Estimation, Prediction, and Comparing Results
According to the daily official reports of the Korea Disease Control and Prevention Agency (KDCA), the numbers of infected people and daily deaths are publicly available. The number of infected people ($I$) (people who have tested positive for COVID-19) and the number of people who have died of the coronavirus disease ($d$) are specified in Table 1. According to Equations (3) and (4), the numbers of infected, susceptible, and recovered people are determined. Minimizing an objective function yields estimates of the unknown parameters ($\beta$, $g$, $\mu_d$), presented in two strategies.

SIR Strategy.
In the first strategy, three curves (S, I, and R) are fitted to the real data of South Korea, given in Table 1. The goodness of fit, shown in Table 2, describes how well the function fits the set of actual data and is measured with two criteria, the sum of squared errors (SSE) and the root mean squared error (RMSE), which quantify the deviation of the actual data from the fitted curve. For both criteria, the smaller the value, the better the model fits. According to Table 2, the fit results are reasonable because the SSE and RMSE values are small and close to zero. The error between the fitted curves and the actual data is taken as the objective function, and applying the least squares method to the objective functions yields estimates of the unknown parameters. The objective functions are denoted by $J_1$, $J_2$, and $J_3$, where $N_T$ is the total number of data points.
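The estimation step can be sketched as follows. This is a minimal illustration, assuming that each daily increment is linear in a single parameter so that every least-squares problem has a closed-form solution; the helper names (`ls_slope`, `estimate_sir_parameters`, `sse_rmse`), the cumulative death series `D`, and the synthetic noise-free data are assumptions made for illustration and do not reproduce the objective functions or the KDCA data used in this paper. RMSE is taken here as the square root of SSE divided by the number of points; some fitting tools divide by the residual degrees of freedom instead.

```python
import numpy as np

def ls_slope(x, y):
    """One-regressor least squares: theta minimizing sum((y - theta * x)**2)."""
    return float(np.dot(x, y) / np.dot(x, x))

def estimate_sir_parameters(S, I, R, D):
    """Estimate (beta, g, mu_d) from daily series of S, I, R and cumulative deaths D.

    Assumption: increments are linear in the parameters, e.g.
    S[k+1] - S[k] = -beta * S[k] * I[k] and R[k+1] - R[k] = g * I[k].
    """
    beta = ls_slope(-S[:-1] * I[:-1], np.diff(S))
    g = ls_slope(I[:-1], np.diff(R))
    mu_d = ls_slope(I[:-1], np.diff(D))
    return beta, g, mu_d

def sse_rmse(actual, fitted):
    """Sum of squared errors and root mean squared error (sqrt(SSE / N))."""
    err = np.asarray(actual) - np.asarray(fitted)
    sse = float(np.sum(err ** 2))
    return sse, float(np.sqrt(sse / err.size))

# Synthetic, noise-free series generated with made-up parameter values.
n_days = 72
S, I, R, D = (np.zeros(n_days) for _ in range(4))
S[0], I[0] = 0.99, 0.01
for k in range(n_days - 1):
    S[k + 1] = S[k] - 0.35 * S[k] * I[k]
    I[k + 1] = I[k] + 0.35 * S[k] * I[k] - (0.08 + 0.01) * I[k]
    R[k + 1] = R[k] + 0.08 * I[k]
    D[k + 1] = D[k] + 0.01 * I[k]

beta_hat, g_hat, mu_hat = estimate_sir_parameters(S, I, R, D)

# One-step-ahead fitted values of S and the two goodness-of-fit criteria.
S_fit = np.concatenate(([S[0]], S[:-1] - beta_hat * S[:-1] * I[:-1]))
sse, rmse = sse_rmse(S, S_fit)
```

With real, noisy data the estimates would not match any true value exactly, and the SSE and RMSE would indicate how far the fitted curve deviates from the observations.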
The basic reproduction number can also be estimated as $R_0 = \beta / (g + \mu_d)$, based on the estimated parameters (see [11] for details). The authors in [21] also estimated the reproduction number, which is a critical quantity in the outbreak of COVID-19, based on publicly available sources, to investigate the growth rate of the COVID-19 outbreak in South Korea. According to Table 2, the uncertainty range of the basic reproduction number can be calculated as follows: $R_{0,\min} = \beta_{\min} / (g_{\max} + \mu_{d,\max})$ as the lower bound and $R_{0,\max} = \beta_{\max} / (g_{\min} + \mu_{d,\min})$ as the upper bound. The desired basic reproduction number can be calculated as $R_{0,\mathrm{desired}} = \beta_{\mathrm{mean}} / (g_{\mathrm{mean}} + \mu_{d,\mathrm{mean}})$ using the means of the parameters in Table 3. The numbers of susceptible, infected, and recovered people are shown in Figures 2-4, respectively. The real data series of the susceptible, infected, and recovered people obtained from Table 1 are compared with the numbers obtained from the model with the estimated parameters.
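The range calculation above amounts to simple arithmetic, sketched below. The parameter ranges are made-up values for illustration (not those of Table 3), the function name is hypothetical, and the mean of each parameter is assumed here to be the midpoint of its range.

```python
def reproduction_number_range(beta_rng, g_rng, mu_rng):
    """Return (R0_min, R0_mean, R0_max) from (low, high) parameter ranges.

    Assumption: the mean of each parameter is the midpoint of its range.
    """
    (b_lo, b_hi), (g_lo, g_hi), (m_lo, m_hi) = beta_rng, g_rng, mu_rng
    r0_min = b_lo / (g_hi + m_hi)   # smallest numerator, largest denominator
    r0_max = b_hi / (g_lo + m_lo)   # largest numerator, smallest denominator
    r0_mean = ((b_lo + b_hi) / 2) / ((g_lo + g_hi) / 2 + (m_lo + m_hi) / 2)
    return r0_min, r0_mean, r0_max

# Illustrative ranges only.
r0_min, r0_mean, r0_max = reproduction_number_range(
    beta_rng=(0.30, 0.40), g_rng=(0.08, 0.12), mu_rng=(0.01, 0.02))
```

Pairing the smallest transmission rate with the largest removal rates (and vice versa) gives the widest interval consistent with the individual parameter ranges.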
As it turns out, the resulting SIR model fits the South Korean data well, so this model can be used to predict the possible ending point of COVID-19 in the United States. Because COVID-19 is spreading rapidly in the United States, knowing the turning (inflection) point and possible ending time of the disease can be crucial for making effective decisions. As shown in Figures 5-7, in the simplest strategy (SIR), the epidemic situation for the United States is not hopeful for the next 50 days: the turning point of the disease is in the middle of June, and the number of infected people at the peak is about twice its current value (Apr. 28, 2020). Fortunately, however, the epidemic is expected to end completely within seven months (from Apr. 28, 2020).
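Reading the turning point and ending time off a predicted trajectory can be sketched as below. The definitions used are assumptions made for illustration: the turning point is taken as the day of maximum infected fraction, and the ending time as the first later day on which the infected fraction falls below a hypothetical cutoff `threshold`. The trajectory is generated with made-up parameter values, not the estimates of this paper.

```python
import numpy as np

def turning_point_and_end(I, threshold=1e-4):
    """Return (peak_day, end_day) for an infected-fraction series.

    Assumptions: the turning point is the day of maximum I, and the epidemic
    is treated as ended on the first later day with I below `threshold`.
    end_day is None if the series never drops below the cutoff.
    """
    I = np.asarray(I, dtype=float)
    peak_day = int(np.argmax(I))
    below = np.nonzero(I[peak_day:] < threshold)[0]
    end_day = int(peak_day + below[0]) if below.size else None
    return peak_day, end_day

# Illustrative trajectory from a discrete SIR iteration (made-up parameters).
S, I = 0.99, 0.01
infected = [I]
for _ in range(300):
    # Right-hand sides use the previous day's S and I.
    S, I = S - 0.4 * S * I, I + 0.4 * S * I - 0.11 * I
    infected.append(I)

peak_day, end_day = turning_point_and_end(infected)
```

The cutoff controls how strictly "ending" is defined; a smaller threshold yields a later ending time for the same trajectory.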
However, in order to get closer to the real world, the model can be developed. Therefore, our studies will be expanded in the following strategy.

SQAIR Strategy.
Since coronavirus disease is currently incurable, quarantine is a priority in all countries. Therefore, a new group called quarantined people can be added to the proposed model. Also, considering a new hidden state can make the model more realistic. This hidden state, denoted by A, comprises asymptomatic people, who are very common in COVID-19. Delay in the transfer of infected people to the group of recovered people is also considered. These three conditions can be described as follows:
(1) The new group (A) consists of infected people who have a negative COVID-19 test and no symptoms. They are in their incubation period and can transmit the disease to others without any visible symptoms.
(2) In coronavirus disease, infected people continue to be carriers of the virus after recovery, so they remain in the infected group because they can continue to infect susceptible individuals at the rate of β; they therefore move to the group of recovered people with a delay ($k_d$).
(3) The quarantined people are denoted by (Q). Figure 8 shows the quarantine group and how individuals are transferred to it. In different countries, the quarantine rate of susceptible individuals may vary, so we denote this rate by ψ.
Equations (1a)-(1c) are reformulated as follows:
$$S_{k+1} = -\beta (I_k + A_k) S_k - \psi S_k, \tag{8a}$$
$$Q_{k+1} = \psi S_k, \tag{8b}$$
$$A_{k+1} = \beta (I_k + A_k) S_k - \alpha A_k, \tag{8c}$$
$$I_{k+1} = \alpha A_k - \mu_d I_k - g I_{k-k_d}, \tag{8d}$$
$$R_{k+1} = g I_{k-k_d}, \tag{8e}$$
where α is the rate of transfer of individuals from group A to I. The values of $k_d$ and $A_k$ are unknown, and ψ is the quarantine rate.
Remark 4. Assume that the initial values satisfy $S_{k_0} \ge 0$, $Q_{k_0} \ge 0$, $A_{k_0} \ge 0$, $I_{k_0} \ge 0$, and $R_{k_0} \ge 0$, in which $k_0 = 0$, and that all parameters ($\mu_d$, $\beta$, $g$, $\psi$, $\alpha$) are positive. The rate of change of the susceptible people is $S_{k+1} = -Z S_k$, in which $Z = \beta (I_k + A_k) + \psi \ge 0$, and it remains at zero once the number of susceptible people reaches zero ($S_{k_1} = 0$). The number of quarantined people Q increases and remains at its maximum once the number of susceptible individuals reaches zero ($Q_{k_1} = Q_m$).
Then, after the number of susceptible people reaches zero, the rate of change of asymptomatic people ($A_{k+1} = \beta (I_k + A_k) S_k - \alpha A_k$) becomes $A_{k+1} = -\alpha A_k$, a difference equation that eventually tends to zero ($A_{k_2} = 0$, $k_2 > k_1$). After day $k_2$, the rate of change of the infected people becomes $I_{k+1} = -\mu_d I_k - g I_{k-k_d}$, which has a downward trend and converges to zero ($I_{k_3} = 0$, $k_3 > k_2 > k_1$), and according to $R_{k+1} = g I_{k-k_d}$, the number of recovered people reaches its maximum value on day $k_3$ and remains stationary ($R_{k_3} = R_m$). As a result, the stable equilibrium point of the model is obtained as $(0, Q_m, 0, 0, R_m)$.
First, it is assumed that there is no group of quarantined people (Q) and that the rate α is not affected by quarantined people; therefore,
$$N' = S_k + A_k + I_k + R_k. \tag{9}$$
The total population N′ includes the individuals who have been tested (it is a statistical population that can be generalized to the total size of the population), and it is generally variable. In this work, however, the total population is fixed and equal to the total number of people on the last day on which data were taken (the 72nd day). According to Equation (9), the numbers $I_k$ and $R_k$ are obtained directly from the actual data. Since N′ is known, $A_k + S_k$ can be calculated. There are no reported data for A alone. Since the normalized value of N′ is equal to one and the numbers of infected and recovered people for South Korea are known,
$$A_k + S_k = 1 - (I_k + R_k). \tag{10}$$
Due to the incubation period of COVID-19, it is difficult to separate these two groups, and as shown in Figure 9, with a quarantine rate of 95%, it is predicted that the total number of susceptible and asymptomatic people in the United States will eventually reach almost zero in two months. Since each state is nonnegative, their sum is nonnegative too. If A + S reaches zero, then A and S must each become zero.
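The SQAIR dynamics with recovery delay described in Remark 4 can be sketched as a simple iteration. This is an illustration under stated assumptions: each equation is read as a daily increment, the delayed infected count is taken as zero before any history exists, and all parameter values except ψ = 0.95 are made up. The function name `simulate_sqair` is hypothetical.

```python
import numpy as np

def simulate_sqair(beta, g, mu_d, psi, alpha, k_d, init, days):
    """Iterate a discrete-time SQAIR model with a recovery delay of k_d days.

    Assumptions: equations are read as daily increments, and the delayed
    infected count I[k - k_d] is zero before any history exists.
    """
    S, Q, A, I, R = (np.zeros(days + 1) for _ in range(5))
    S[0], Q[0], A[0], I[0], R[0] = init
    for k in range(days):
        I_delayed = I[k - k_d] if k >= k_d else 0.0
        exposure = beta * (I[k] + A[k]) * S[k]   # newly infected, initially asymptomatic
        S[k + 1] = S[k] - exposure - psi * S[k]
        Q[k + 1] = Q[k] + psi * S[k]
        A[k + 1] = A[k] + exposure - alpha * A[k]
        I[k + 1] = I[k] + alpha * A[k] - mu_d * I[k] - g * I_delayed
        R[k + 1] = R[k] + g * I_delayed
    return S, Q, A, I, R

# Illustrative parameter values (only psi = 0.95 is taken from the text).
params = dict(beta=0.3, g=0.05, mu_d=0.01, psi=0.95, alpha=0.2, k_d=3)
S, Q, A, I, R = simulate_sqair(**params, init=(0.98, 0.0, 0.01, 0.01, 0.0), days=120)
```

In this sketch the susceptible group empties within a few days because of the high quarantine rate, after which the quarantined group stays at its maximum, in line with the qualitative behavior described in Remark 4.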
Even though the number of asymptomatic people is unknown, it is certain that it will eventually reach zero. Using the actual data in Table 1, a function (or curve) is fitted to the vector $A_k + S_k$. As before, the goodness of fit of the model is assessed with two criteria, given in Table 4. According to Table 4, the small SSE and RMSE indicate a close fit of the function to the data. Therefore, our model fits the South Korean data very well.
Then, by differentiating the fitted function and equating the result with Equation (11), which is obtained from Equations (8a) and (8c), the unknown value α can be obtained.
Now, to make the parameters more accurate and to choose the optimal parameter, the LS method is used as before, where $e_{(S+A)_k} = (S+A)_{\mathrm{fit},k} - \widehat{(S+A)}_k$, in which $(S+A)_{\mathrm{fit},k}$ is the total number of S and A obtained from the fitted curve. Besides that, $\widehat{(S+A)}_k = \phi_{4k}^{T} \theta_4$ is the total number of the actual S and A obtained from $1 - (I_k + R_k)$, where $\phi_{4k}^{T} = [-A_k]$ and $\theta_4 = [\alpha]$. Also, according to Equation (8e), $I_{k-k_d} = (1/g) R_{k+1}$. Since $(1/g) R_{k+1}$ is determined, the value of $I_{k-k_d}$ is determined, too. The value of $I_k$ is also known, so by analyzing the data in Table 1 and comparing the differences between these two vectors, $k_d$ can be obtained approximately. Now, considering the new group (A) and the delay, the unknown parameters, including $\beta$, $g$, $\mu_d$, $k_d$, and $\alpha$, are estimated in the same way, and the new model is obtained. The uncertainty range of the estimated parameters is also presented in Tables 4 and 5. Figures 10-12 show the comparison of the number of infected people, the number of recovered people, and the sum of the S and A groups, respectively, based on the actual data and the model obtained from the estimated parameters.
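The comparison of $(1/g) R_{k+1}$ with the infected series to find the delay can be sketched as a search over candidate lags. This is one possible way to automate the comparison and is an assumption, since the text only states that $k_d$ is obtained approximately from data analysis. The sketch also assumes that $g$ is already known, that the recovery equation is read as a daily increment of R, and that the synthetic series below stand in for real data; the function name `estimate_delay` is hypothetical.

```python
import numpy as np

def estimate_delay(I, R, g, max_lag=14):
    """Pick the lag d that best aligns (R[k+1] - R[k]) / g with I[k - d].

    Assumptions: g is known, R has one more entry than I, and the delay
    lies between 0 and max_lag days.
    """
    I = np.asarray(I, dtype=float)
    target = np.diff(np.asarray(R, dtype=float)) / g   # equals I[k - k_d] under the model
    ks = np.arange(max_lag, len(target))               # days with full lag history
    errors = [np.sum((target[ks] - I[ks - d]) ** 2) for d in range(max_lag + 1)]
    return int(np.argmin(errors))

# Synthetic example: a bell-shaped infected curve and recoveries delayed by 4 days.
g_true, lag_true = 0.05, 4
days = np.arange(72)
I = np.exp(-((days - 30) / 10.0) ** 2)
dR = np.zeros(72)
dR[lag_true:] = g_true * I[:-lag_true]
R = np.concatenate(([0.0], np.cumsum(dR)))

k_d_hat = estimate_delay(I, R, g_true)
```

With noisy real data the minimum of the mismatch is typically less sharp, so the result should be read as an approximate delay.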
Finally, with α and $k_d$ estimated and ψ = 0.95, the spread of COVID-19 in the United States can be predicted, as shown in Figures 13-15. In Figure 13, the actual data published by the CDC of the United States from Feb. 15 to Apr. 28 are marked. The predicted end of the COVID-19 epidemic in the United States will be at the end of November 2020. But even in the current situation, by applying this technique to the epidemic situation in the U.S., it can be conjectured that eventual eradication is reached in seven months, which maximizes the number of individuals who escape infection altogether. As shown in Figure 14, there is a significant difference between the number of recovered people predicted by the proposed model and the real number of recovered people in the United States. Since in the proposed model 95% of people are quarantined at the beginning of the disease outbreak, fewer people get infected. As a result, fewer people recover from the disease, and fewer recovered people imply that convergence toward immunity will be faster, whereas the number of recovered people in the U.S. is on the rise, indicating a high number of infected people. If the United States had applied the strict early quarantine assumed in this study, the number of recovered people would have been smaller (because fewer people would have been infected). In Table 6, there are no statistics on the numbers of asymptomatic and susceptible people. However, based on the proposed model, it can be predicted that (A + S) reaches zero (Figure 9), and since these numbers are nonnegative and their sum reaches zero, both S and A reach zero. Reaching zero susceptible people means that the quarantine is carried out well and that the number of asymptomatic people has fallen to zero (meaning that all of them have recovered).
Finally, Figure 15 shows the number of people in quarantine; as expected, the susceptible people are quarantined well. Although there is a lack of actual data for some groups (A and Q), the SQAIR model with estimated
second peaks of COVID-19 observed in some countries or are expected soon. Accordingly, it can be estimated that the pandemic will peak during the second wave, in the fall of 2020. Hence, if the quarantine is not carried out correctly and is broken for any reason, highly fatal subsequent waves may follow, as happened with the Spanish flu in 1918. It should also be pointed out that the spread of COVID-19 in the United States would still be very severe. In addition to the high growth and mortality rates of the COVID-19 outbreak, the economic and social costs of the disease are a further problem, and if the quarantine of people is not emphasized, there will be many catastrophic economic consequences, as discussed in [22]. Therefore, the law can contribute to preventing COVID-19 by supporting access to treatment and allowing public health authorities to limit contact with infectious people in response to disease outbreaks. Accordingly, the government should intervene to reduce the number of people involved, which requires imposing martial law to strictly quarantine the population, efforts to treat infected people, and clinical research. Also, criminal penalties for breaking the quarantine and transmitting COVID-19 may create incentives for individuals to stay home. Encouraging people to practice self-protection (such as wearing a face mask, social distancing, and limiting gatherings) is important for breaking the transmission chain, especially in countries where rates of COVID-19 are high.