Application of the backstepping method to the prediction of increase or decrease of infected population

Background In mathematical epidemiology, age-structured epidemic models have usually been formulated as the boundary-value problems of the partial differential equations. On the other hand, in engineering, the backstepping method has recently been developed and widely studied by many authors. Methods Using the backstepping method, we obtained a boundary feedback control which plays the role of the threshold criteria for the prediction of increase or decrease of newly infected population. Under an assumption that the period of infectiousness is same for all infected individuals (that is, the recovery rate is given by the Dirac delta function multiplied by a sufficiently large positive constant), the prediction method is simplified to the comparison of the numbers of reported cases at the current and previous time steps. Results Our prediction method was applied to the reported cases per sentinel of influenza in Japan from 2006 to 2015 and its accuracy was 0.81 (404 correct predictions to the total 500 predictions). It was higher than that of the ARIMA models with different orders of the autoregressive part, differencing and moving-average process. In addition, a proposed method for the estimation of the number of reported cases, which is consistent with our prediction method, was better than that of the best-fitted ARIMA model ARIMA(1,1,0) in the sense of mean square error. Conclusions Our prediction method based on the backstepping method can be simplified to the comparison of the numbers of reported cases of the current and previous time steps. In spite of its simplicity, it can provide a good prediction for the spread of influenza in Japan.


Background
The spreading mechanism of infectious diseases in population has been studied in terms of mathematical modeling since the pioneering works by Kermack and McKendrick [1,2] (see Hethcote [3] for a detailed review). In the eradication or reduction of diseases, the concept of control would play an important role (see, for instance, Anderson and May [4] and Smith et al. [5]). To design the appropriate control strategies, the prediction of epidemic size is essentially important. In recent years, Ferguson et al. [6], Hyder et al. [7] and Riley et al. [8] studied influenza and other diseases from this perspective. On the other hand, the qualitative analysis of mathematical models is also important for understanding the effect of control. Alexander et al. [9] and Mills et al. [10] published their study from this perspective. For a long time, age-structured epidemic models have been studied by many authors (see, for instance, Iannelli [11], Inaba [12], Inaba and Nishiura [13] and Tudor [14]). Mathematically, these models can be regarded as the boundary-value problems of partial differential equations. On the other hand, in engineering, the backstepping method has recently been developed by Smyshlyaev and Krstic [15] to obtain the boundary feedback control for stabilizing the systems of partial differential equations and has widely been studied by many authors (see, for instance, Susto and Krstic [16] and Baccoli et al. [17]). The aim of this study is to make use of the backstepping method for epidemiological considerations. Specifically, we will develop a new method for the prediction of increase or decrease of infected population.
In the classical theory of the basic reproduction number R 0 (see, for instance, Diekmann et al. [18] and van den Driessche and Watmough [19]), the number of newly infected individuals produced by a typical infected individual invading into a fully susceptible population is characterized as the threshold value. That is, if R 0 > 1, then the epidemic size will increase and if R 0 < 1, then it will decrease. In our method, a threshold criteria U(t) will be calculated for each time step t as a boundary feedback control. It will be shown that whether the newly infected population I(t, 0) at each time step t will exceed U(t) or not determines the increase or decrease of infected population at the next time step t + 1. That is, if I(t, 0) > U(t) (I(t, 0) < U(t)) at the current time step t, then the infected population will increase (decrease) at the next time step t + 1. Thus, we can predict the increase or decrease of infected population by comparing I(t, 0) and U(t) for each time step.

The model formulation
The meaning of each symbol in our mathematical model is as follows.
• t ≥ 0: the chronological time; • a ∈ [0, a † ]: the class age (that is, the time elapsed since the infection) of infected individuals; • a † > 0: the maximum period of infectiousness; • I(t, a): the infected population of class age a at time t; • γ (a) ∈ L 1 + (0, a † ): the per capita recovery rate of infected individuals of class age a.
Under this setting, the class age-structured epidemic model in this paper is formulated as follows.
Note that the boundary condition for a = 0 is not considered in (1) unlike usual class age-structured epidemic models. In fact, in this paper, we analytically derive the boundary feedback control U(t) by using the backstepping method and compare it with I(t, 0) which will be obtained from the real data.

The backstepping method
The aim of this section is to derive the boundary feedback control U(t). Let us consider the following integral transformation.
Now we seek the integral kernel k(a, σ ) such that w(t, a) becomes the solution of the following target system.
where ψ(a) is a positive continuous function on [0, a † ] which is chosen such that the operator describing the target system has the spectrum consisting of stable eigenvalues and zero eigenvalue, that is, w(t, a) converges to a nontrivial equilibrium w * (a) as t → +∞. Therefore, we see that if the integral equation has an equilibrium I * (a), then I(t, a) converges to I * (a) as t → +∞. Hence, in what follows, we will determine the integral kernel k(a, σ ) such that (2) satisfies (3) and further (4) has the equilibrium I * (a). Differentiating (2) with respect to t and using (1), we have On the other hand, differentiating (2) with respect to a, we have Adding both sides of the Eqs. (5) and (6) and using (1), we have Comparing this equation with (3), we see that k(a, σ ) should satisfy the following equation.
Since the operator describing the target system (3) is required to have zero eigenvalue, ψ(σ ) should satisfy Note that the Eq. (8) corresponds to the situation where the basic reproduction number R 0 is just equal to one (see, for instance, Diekmann et al. [18]). By integrating (7) along the characteristic line σ − a = const., we obtain Since the kernel k(a, σ ) is bounded on the domain 0 ≤ a ≤ σ ≤ a † and w * (a) is continuous on [0, a † ], the integral Eq. (4) has an equilibrium I * (a) and the solution I(t, a) of (1) converges to I * (a) as t → +∞ under the boundary condition Thus, the boundary feedback control U(t) is given by and if I(t, 0) > U(t), then the infected population I(t, a) will increase and diverge, while if I(t, 0) < U(t), then the infected population I(t, a) will decrease and converge to 0.

Remark on mathematical well-posedness
If we consider the initial condition I 0 (a) := I(0, a), it should satisfy the following condition to let (1) have a differentiable solution I(t, a): In fact, if the first two conditions are satisfied, we can show the existence of the differentiable solution I(t, a) by constructing a C 0 -semigroup generated by the differen- Since the aim of this paper is to develop a prediction method, we do not require the observed data to satisfy the condition (11). However, we remark that this mathematical rigorousness is essentially important from the analytical point of view.

The prediction method
From the above discussion, we obtain the following prediction guideline.
If I(t, 0) > U(t), then the infected population I(t, a) will increase. If I(t, 0) < U(t), then the infected population I(t, a) will decrease.
In fact, we obtain Fig. 1 for some artificial parameters, which is an example for verifying the validity of (12).
In what follows, for simplicity, we assume that ψ(a) ≡ ψ and the period of infectiousness is same for all infected individuals. To represent this situation, we assume that the recovery rate is given by the Dirac delta function multiplied by a sufficiently large positive constant M 0: γ (a) = Mδ a − a † . Here, a model function for the Dirac delta function can be given by the isosceles triangle with height m > 0 and base a † − 2/m < a < a † (which approaches to δ a − a † as m → +∞). In this case, the survival rate at which an infected individual of class age a still has infectiousness is given by and hence, ψ = e M /a † . Then, from (10), the boundary feedback control U(t) is given by Let us consider the discrete-time series and let a † = 1. Then, by using the rectangle approximation at σ = a † = 1, the above equality can be represented as follows.
Thus, from (12), we can predict that if then the number of reported cases at the next time step t + 1 will decrease. Although this prediction method is very simple, its accuracy can be high for the real data of the spread of influenza in Japan (see the next section).

Prediction of increase or decrease of infected population for influenza in Japan
In this section, we apply our prediction method proposed above to the real data of reported cases of influenza in Japan from 2006 to 2015. The data is available in the website of the National Institute of Infectious Diseases, Japan (see [20]). Let the unit time step be a week. First, in Table 1, we exhibit the result of our prediction from the week 1 to week 12 in 2015.
In Table 1, it is easy to see that the control U(t) is equal to the number of reported cases at the previous week t − 1 as derived in (13). For each time step t ∈ [2,11], the number of reported cases is compared to U(t) and if it is greater than U(t), then the prediction is "Increase" and if it is less than U(t), then the prediction is "Decrease". In this case, the number of correct predictions is 9 and the total number of predictions is 10. Hence, the accuracy of the prediction is 9/10 = 0.90. Next, we extend our prediction to all weeks from 2006 to 2015. The time series of the actual number of reported cases per sentinel in this period is illustrated in Fig. 2. In this case, the accuracy of our prediction for each year is listed in Table 2.
As in the previous example, the prediction is not performed for the first week (t = 1) and the last week (t = 52). Therefore, the total number of predictions per year is 50. The accuracy of total predictions is 0.81 and relatively high.

Comparison
Next we compare our prediction method with an alternative prediction method based on ARIMA (AutoRegressive Integrated Moving Average) models (see, for instance, Kane et al. [21]). To perform the alternative prediction method, we use the function "ARIMAProcess" implemented in Wolfram Mathematica 10.3. For each of the weeks in 2006-2015, we iteratively apply the actual data from each of the past 10 weeks (including the current week) to predict the number of reported cases at each of the next weeks. Note that we additionally need the data of the last 10 weeks in 2005 to perform the prediction at the beginning of 2006. If the predicted number for the next week is greater (less) than the reported number at the current week, then the prediction can be regarded as "Increase" ("Decrease"). As it is well known, ARIMA models ARIMA(p, d, q) can be characterized by the orders of the autoregressive part p, differencing d and moving-average process q. For different p, d, q ∈ [0, 1] , p + d + q > 0, we obtain the prediction result as listed in Table 3.
From Table 3, we see that (p, d, q) = (1, 1, 0) is the best choice in this case. Nonetheless, its accuracy is lower than that of our prediction method (0.79 < 0.81).

Estimation of the number of reported cases
Finally, we perform an estimation of the number of reported cases, which is consistent with our prediction method. Roughly, we assume that the increase or decrease of newly infected population at each week follows a linear law. That is,  where k > 0 is a fitting parameter. It is easy to see that (14) is consistent with our prediction method (that is, if I(t, 0) > U(t), then I(t + 1, 0) > I(t, 0) and if I(t, 0) < U(t), then I(t + 1, 0) < I(t, 0)). Using the actual data from 2005 to 2015, we perform an ongoing estimation from the first week of 2006 to the last week of 2015. To minimize the mean square error between the actual number of reported cases and our estimated values, k is chosen to be 0.6 (see Fig. 3). In this case, the mean square error of our estimation is 6.17457371, while that of the best-fitted ARIMA model ARIMA(1, 1, 0) is 9.469997. The estimation result is shown in Fig. 4. In Fig. 4, it is seen that although three curves take close values, some values estimated by ARIMA(1, 1, 0) are negative.

Discussion
Although our prediction method only need the data from each of the past 2 weeks (including the current week), the accuracy of it (0.81) was higher than that of the best-fitted ARIMA model ARIMA(1, 1, 0) (0.79) which is based on the data from each of the past 10 weeks. Moreover, our estimation method (14) for the number of newly infected population was better than ARIMA(1, 1, 0) in the sense of mean square error. From these results, we conjecture that focusing only on the data in the current and previous weeks can lead to a good prediction for the spread of influenza in Japan.

Conclusions
In this study, based on the backstepping method in engineering, we developed a new prediction method for the increase or decrease of newly infected population. Under the assumption that the period of infectiousness is same for all infected individuals (that is,  the recovery rate is given by the Dirac delta function multiplied by a sufficiently large positive constant), the method was simplified to the comparison of the number of reported cases at the current and previous time steps. In spite of its simplicity, its accuracy was relatively high (0.81) for the spread of influenza in Japan from 2006 to 2015. Furthermore, the simple estimation method (14) based on the linear law was proposed and its accuracy was better in the sense of mean square error than that of the best-fitted ARIMA model ARIMA(1, 1, 0), which is based on the data from each of the past 10 weeks. From these results, we conjectured that focusing only on the data in the current and previous weeks can lead to a good prediction for the spread of influenza in Japan. As future tasks, not limited to influenza in Japan, our prediction method would be applied to various infectious diseases in various countries. The assumption that the period of infectiousness is same for all infected individuals might have to be modified for each case. Our model (1) was based on the SIR epidemic model in which the newly infected individuals immediately have the infectiousness without latency. To consider more realistic situations, we might have to start the discussion from some different models such as the SEIR and SIRS epidemic models.