A Wiener process with jumps to model the logarithm of new epidemic cases

: The number of daily new cases of an epidemic is assumed to evolve as the exponential of a Wiener process with Poissonian jumps that are exponentially distributed. The model parameters can be estimated by using the method of moments. In an application to the COVID-19 pandemic in the province of Qu´ebec, Canada, the proposed model is shown to be acceptable. General formulas for the probability that a given increase in the number of daily new cases is due to the normal variations of the continuous part of the process or rather to a jump of this process are given. Based on these formulas, the probability of observing the likely start of a new wave of infections is calculated for the application to the COVID-19 pandemic.

Because Y(t) > 0 for any t ≥ 0, geometric Brownian motions are used extensively in financial mathematics to forecast the future price of stocks; see, for example, [2]. However, it is well known that diffusion processes with fixed infinitesimal parameters cannot reproduce the sudden increases or decreases of stock prices or financial indices like the Dow Jones.
A more realistic model is obtained by adding random jumps to the diffusion process, so that it becomes a jump-diffusion process. The jumps are generally assumed to occur according to a Poisson process {N(t), t ≥ 0} with rate λ, which is independent of the diffusion process. Moreover, for the sake of simplicity, the jumps are often taken to be exponentially distributed independent random variables U 1 , U 2 , . . . with parameter α. Finally, the jumps are assumed to be independent of the Poisson process.
In the case of epidemics, the basic model is the Susceptible-Infected-Recovered (SIR) model introduced by Kermack and McKendrick [3] in 1927 and that has been generalized in various ways since then. In particular, stochastic versions have been proposed. An optimal control problem for a stochastic version of the Kermack-McKendrick model has been considered by the author [4].
Since the beginning of the COVID-19 pandemic, many papers have been published on the potential spread of the disease. Premarathna et al. [5] tried various models to forecast the number of cases in Sri Lanka. Bhattacharyya et al. [6] also attempted to forecast daily new cases of COVID-19, by using a hybrid time series model. Ghosh et al. [7] took the transmission of the infection through Brownian motion-like dynamics into account in their study.
In a mainly theoretical paper, Tesfay et al. [8] proved the existence and uniqueness of the global positive solution for a stochastic COVID-19 model with jump-diffusion. El Kharrazi and Saoud [9] simulated the spread of the COVID-19 epidemic by adding a jump-diffusion process in a SIR model. Finally, Fabiano and Radenović [10] were able to reproduce correctly the first two COVID-19 waves in Italy by making use of a geometric Brownian motion.
In this paper, our aim is first to find a one-dimensional model for the number X(t) of daily new cases of a given epidemic at time t. In Section 2, we will assume X(t) evolves as the exponential of a Wiener process with Poissonian jumps that are exponentially distributed. There will be four model parameters. We will see how to estimate these parameters by using the method of moments.
In Section 3, the model will be applied to the COVID-19 pandemic, using real data from the province of Québec, Canada. For this application, it will be shown that the proposed model is acceptable. Then, general formulas for the probability that a given increase in the number of daily new cases can be attributed to the normal variations of the continuous part of the process or is rather due to a jump of this process will be derived. Making use of these formulas, the probability of observing the likely start of a new wave of infections for the application to the COVID-19 pandemic will be calculated. Finally, we will end this paper with a few remarks in Section 4.

The mathematical model
Let {B(t), t ≥ 0} be a standard Brownian motion (starting from zero) and {N(t), t ≥ 0} be a Poisson process with rate λ. We assume that the two processes are independent. The stochastic process {Z(t), t ≥ 0} defined by where µ ∈ R and σ > 0 are constants, and U 1 , U 2 , . . . are independent and identically distributed (i.i.d.) random variables that are also independent of the Poisson process, is a Wiener process with Poissonian jumps. In this paper, we suppose that U 1 , U 2 , . . . have the common density function That is, the parent variable U has an exponential distribution with parameter α, so that its expected value is E[U] = 1/α.
Next, let X(t) denote the number of new cases of a certain epidemic on day t. We consider the following model: It follows that {X(t), t ≥ 0} is a stochastic process whose logarithm is a Wiener process with jumps. If the rate λ of the Poisson process decreases to zero, then {X(t), t ≥ 0} reduces to a geometric Brownian motion.
There are four parameters in our model: µ, σ, λ and α. In order to estimate these parameters, we will use the method of moments. Since µt + σ B(t) has a Gaussian distribution N(µt, σ 2 t), we can write that where s is assumed to be a positive constant. Furthermore (see [11]), With the help of Eqs. (2.4) and (2.5), and the fact that the two parts of the jump-diffusion process are assumed to be independent, we can calculate Here, because there are four parameters to estimate, we must calculate E k for k = 1, 2, 3, 4. Actually, the jump-diffusion process {Z(t), t ≥ 0} was recently used by the author in hydrology as a model for river flows and the four moments that we need were computed. It was found that Moreover, because both the Wiener and Poisson processes have stationary and independent increments, we can write that Now, let {x 1 , . . . , x n } be a set of observed daily new cases of the epidemic of interest. We define by the method of moments, we compute the point estimatesμ and we setμ k = µ k . In our problem, to obtain point estimates of the four parameters µ, σ, λ and α, we must solve the system of non-linear equationsμ k = µ k for k = 1, 2, 3, 4. This can be done numerically.
Remark. In the case when there is at least one value of x i which is equal to zero in the data set, we can simply define the new variable X * (t) = X(t) + 1, so that x i becomes x * i := x i + 1. An alternative procedure to estimate the parameters µ, σ, λ and α is the following: • from Eq. (2.7) with t = 1 and the equationμ 1 = µ 1 , we have • substitute the above expression for µ into the equationμ 2 = µ 2 and then solve for σ in terms of λ and α; • substitute the expression for σ into the equationμ 3 = µ 3 and then solve for λ in terms of α; • substitute the expression for λ into the equationμ 4 = µ 4 to obtain a polynomial equation of degree 9 in α.
If there is only one admissible root of the polynomial equation, then point estimates of the other three parameters can be calculated by working backwards.

Application to the COVID-19 pandemic
We will implement our model using the data for the COVID-19 pandemic in the province of Québec, Canada, that can be found on the web site https://www.donneesquebec.ca. The data set gives, from January 23rd, 2020, the number of daily new cases, the cumulative number of cases, the number of new deaths in various places (home, nursing homes, etc.), the cumulative numbers of deaths in these various places, as well as the total new deaths and cumulative numbers of deaths.
In our study, we are interested in the observed daily new cases. We used the data until June 16th, 2022. There are 876 observations in the data set.
Remarks. (i) Actually, the first case occurred on February 25th, 2020. Therefore, there are many zeros at the beginning of the data set, which affects the values of the point estimates of the model parameters.
(ii) We could use the data only from February 25th, 2020. However, there are a few zeros after this date. So, as mentioned in the previous section, we will consider the variable x * i := x i + 1 for i = 1, . . . , 876. The values of x i and ln(x * i ) for i = 1, . . . , 876 are shown in Figures 1 and 2, respectively. We can see the various waves of infections since the beginning of the pandemic. Because the penultimate wave is much larger than the others, the waves appear more clearly in Fig. 2. It seems unlikely that especially the penultimate wave (in the case of Fig. 1) or the first one (in the case of Fig. 2) could be interpreted as values taken by a Wiener process with constant drift. It is more plausible to assume that these are   In Figure 3, the histogram of the differences x i+1 − x i from February 25th, 2020 to June 16th, 2022 is presented. It is clear that the differences are not normally distributed. This is confirmed by the probability plot shown in Figure 4.
The histogram and the probability plot of the differences ln(x i+1 ) − ln(x i ) from February 25th, 2020 to June 16th, 2022 are presented respectively in Figures 5 and 6. We see that the normality assumption is much more plausible. However, performing the Anderson-Darling normality test with the help of the statistical software program Minitab, this assumption is strongly rejected. This is hardly surprising. Indeed, there are 843 observations in the data set. It is well known in statistics that by increasing the size of the random sample, every potential model for the data will eventually be rejected.
A mathematical model for a phenomenon like an epidemic is obviously a simplification of the real process. It is impossible to find an exact model, even a stochastic one. The aim of this paper is to propose a model that reasonably describes the evolution of the epidemic and that can be useful.
To justify the use of our model, we drew a random sample of size 30 from the data set. The probability plot of this random sample is shown in Figure 7. This time, the normality assumption is easily accepted. The p-value of the test is equal to 0,814. Therefore, though not perfect, the proposed model seems at least acceptable.
(3.1)  Hence, according to the model, the annual rate of events of the Poisson process is approximately equal to 13. Furthermore, the average percentage increase of daily new cases produced by an event is e 1/α − 1 41%.
Remark. Another way to estimate the parameters λ and α is to first define what constitutes a jump and then compute the average increase caused by a jump. However, this technique is rather subjective, because in practice there are several instances when it is not obvious whether the observed increase in daily new cases is due to the variations of the continuous part of the stochastic process or is indeed the result of a new event of the Poisson process. Therefore, the method proposed above to estimate the four model parameters simultaneously seems to us more reliable.
In practice, the model parameters are not constants, but are rather functions of the time t. The point estimatesμ k , k = 1, 2, 3, 4, for the periods from January 23rd, 2020 to April 4th, 2021 (437 data points) and from April 5th, 2021 to June 16th, 2022 (438 data points) are presented in Table 2. Proceeding as We will now compute the probability that the increase of observed new cases likely indicates that a new wave of infections has started, or is rather simply due to the normal variations that we can expect, according to our model.
A jump-diffusion process is a continuous-time stochastic process. However, the observations are available on a daily basis. Therefore, if a new wave has indeed started, the number of new daily cases is expected to increase significantly. This increase cannot in general be attributable solely to the continuous part of the stochastic process.
Let F denote the event that there is no jump on a given day. If the event F occurs, the probability that the number of new cases increases by at least r%, compared to the previous day, is given by = P e µ+σ Z 0 ≥ 1 + r , where Z 0 is a standard Gaussian random variable. We thus have where Φ(·) is the distribution function of Z 0 . When there is a jump on day t, we have We can write that The probabilities p 1 (r) and p 2 (r) are given in Table 3 for various values of r. These probabilities were computed with the point estimates of the parameters µ, σ and α calculated by using the whole data set. We see that a daily increase of at least 10% of new cases is not unlikely, even in the case of no jump on that day. However, a daily increase of at least 20% is much likelier if a jump occurred. If the number of new cases was already quite large and has doubled on a given day, it is almost certain that a jump occurred that probably indicates the start of a new wave of infections.
Moreover, let us define the event G r : the number of new cases increases by at least r%, compared to the previous day. Neglecting the probability of observing two or more jumps on a given day, we can write that P[G r ] = p 1 (r) · P[F] + p 2 (r) · P[F ] = p 1 (r) · 1 − e −λ λ + p 2 (r) · e −λ λ.  It follows that the probability that a jump took place on a given day, given that the event G r occurred, is given by The value of this conditional probability was computed withλ = 0.0355 and can be found in Table 4. Finally, we could have computed the probability of consecutive daily increases to detect the onset of a new wave more reliably.

Conclusion
In this paper, a jump-diffusion process has been proposed as a model for the logarithm of the number of daily new cases of a given epidemic. The model was applied to the case of the COVID-19 pandemic. The four model parameters are easy to estimate by using the method of moments. Once these parameters have been estimated, we could use them to forecast the number of daily new cases for    Based on our model, we were able to compute the probability that the increase in the number of new cases on a given day likely signals the start of a new wave of infections.
Diffusion processes with jumps have been used for several years in financial mathematics to model the evolution of various stock market indices or commodity prices. In the case of epidemiology, there are still few papers in which the authors have used these processes as models.
It is unlikely that any one model can be used successfully for every possible type of epidemic, or even for all countries affected by a certain epidemic. In this paper, we saw that the proposed model seems at least realistic in the case of the COVID-19 epidemic in the province of Québec. It can be expected to be acceptable for countries such as the United States and those in Western Europe as well.
Finally, instead of using a jump-diffusion process, we could have considered a diffusion process whose infinitesimal parameters are time-dependent and are estimated by computing, for instance, a one-week moving average.