CLAR(1) point forecasting under estimation uncertainty

Forecast error is not only caused by the randomness of the data‐generating process but also by the uncertainty due to estimated model parameters. We investigate these different sources of forecast error for a popular type of count process, the Poisson first‐order integer‐valued autoregressive (INAR(1)) process. However, many of our analytical derivations also hold for the more general family of conditional linear AR(1) (CLAR(1)) processes. In addition, results from a simulation study are presented, to verify and complement our asymptotic approximations.

To cover this variety of different AR(1)-like models, Grunwald, Hyndman, Tedesco, and Tweedie (2000) defined and discussed a general class of time series models, which are characterized by such an autocorrelation structure, namely, the conditional linear AR(1) (CLAR(1)) models.

Definition 1.
A stationary real-valued (either discrete- or continuous-valued) process (X_t)_ℤ is said to be a CLAR(1) process if its conditional mean is given by E[X_t | X_{t−1}, X_{t−2}, …] = α X_{t−1} + β with α ∈ (−1; 1) and β ∈ ℝ, and if the variance exists, σ² = V[X_t] < ∞.
A CLAR(1) process according to Definition 1 satisfies μ = E[X_t] = β∕(1 − α) and ρ(k) = α^k (Grunwald et al., 2000). If being concerned with a count process, then we additionally have to require that β > 0 and α ∈ [0; 1).

Example 1. The INAR(1) model by McKenzie (1985) is defined by the model recursion X_t = α∘X_{t−1} + ε_t with α ∈ [0; 1), where "∘" denotes the binomial thinning operation (as an integer-valued substitute of the ordinary multiplication in the AR(1)'s model recursion), and where (ε_t)_ℤ constitutes an independent and identically distributed (i.i.d.) count process of innovations. Here, binomial thinning is defined by requiring that α∘X | X ∼ Bin(X, α), and it is assumed that these thinnings are executed independently of other thinnings and innovations. A comprehensive survey of properties and methods concerning the INAR(1) model is presented in chapter 2 of Weiß (2018). Among others, it holds that Poisson-distributed innovations lead to Poisson-distributed observations, namely, ε_t ∼ Poi(β) and X_t ∼ Poi(μ); we refer to this specific instance as the Poisson INAR(1) model.
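For readers who want to experiment with Example 1, the following minimal sketch (our own illustration; the function name simulate_poisson_inar1 and the chosen parameter values are not from the article) simulates a Poisson INAR(1) path by combining binomial thinning with Poisson innovations.

```python
import numpy as np

def simulate_poisson_inar1(alpha, beta, n, rng=None):
    """Simulate X_1, ..., X_n from a Poisson INAR(1) process X_t = alpha o X_{t-1} + eps_t,
    where alpha o X | X ~ Bin(X, alpha) and eps_t ~ Poi(beta)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = beta / (1.0 - alpha)
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(mu)                         # start in the stationary Poi(mu) distribution
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning alpha o X_{t-1}
        x[t] = survivors + rng.poisson(beta)       # add innovation eps_t ~ Poi(beta)
    return x

# Example: alpha = 0.5 and mu = 2.5, i.e., beta = mu * (1 - alpha) = 1.25
path = simulate_poisson_inar1(alpha=0.5, beta=1.25, n=200)
print(path[:10], path.mean())  # sample mean should be close to mu = 2.5
```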
For a CLAR(1) process with its conditional-mean-based definition, it appears obvious to do point forecasting based on the conditional mean. So given the parameter values of α, β, the h-step-ahead point forecast is defined by

X̂_{T+h} = α^h X_T + (1 − α^h)∕(1 − α) · β.

In case of a discrete-valued data-generating process (DGP), this does not lead to a coherent point forecast in the sense of Freeland and McCabe (2004) and Jung and Tremayne (2006), so an additional discretization (e.g., rounding) would be required in practice here (Homburg, Weiß, Alwan, Frahm, & Göb, 2019). However, for the sake of a unified treatment, let us concentrate on the basic mean-based forecasting approach. In addition, the mean-based forecasting of count processes has been considered by many other authors as well, for example, by Sutradhar (2008), Mohammadipour and Boylan (2012), and Maiti, Biswas, and Das (2016). In practice, α, β are not known but have to be estimated from the available time series data X_1, …, X_S with S ≤ T; denote the corresponding estimators by α̂, β̂. A possible choice for α̂, β̂, which is unique across all CLAR(1) processes, are the moment estimators ρ̂(1) and X̄ (1 − ρ̂(1)), respectively (whereas maximum likelihood (ML) estimators would be model-specific). Forecasting with estimated parameters is then done according to

X̂_{T+h} = α̂^h X_T + (1 − α̂^h)∕(1 − α̂) · β̂.
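The moment estimators and the plug-in forecast just described translate into a few lines of code; this is a minimal sketch under our own naming (moment_estimates, clar1_forecast), not code from the article.

```python
import numpy as np

def moment_estimates(x):
    """Moment estimators for a CLAR(1) process: alpha_hat = rho_hat(1), beta_hat = xbar * (1 - alpha_hat)."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    alpha_hat = np.sum((x[:-1] - xbar) * (x[1:] - xbar)) / np.sum((x - xbar) ** 2)  # lag-1 sample ACF
    beta_hat = xbar * (1.0 - alpha_hat)
    return alpha_hat, beta_hat

def clar1_forecast(x_T, h, alpha, beta):
    """h-step-ahead conditional-mean forecast alpha^h * x_T + (1 - alpha^h) / (1 - alpha) * beta."""
    return alpha**h * x_T + (1.0 - alpha**h) / (1.0 - alpha) * beta

# Usage with the simulated path from the previous sketch (here S = T, i.e., forecasting from the last observation):
# alpha_hat, beta_hat = moment_estimates(path)
# x_hat = clar1_forecast(path[-1], h=1, alpha=alpha_hat, beta=beta_hat)
```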
Inspired by the works of Phillips (1979) and Fuller and Hasza (1981), let us now look at stochastic properties of the forecast error

X̂_{T+1} − X_{T+1} = (α X_T + β − X_{T+1}) + (α̂ − α) X_T + (β̂ − β),     (1)

where the first summand in parentheses is the error if forecasting with known parameters, and the remaining summands express the additional effect of estimated parameters. The situation h = 1 in (1) is not only most important for applications in practice, it is also the only situation where an asymptotic analytic treatment turned out to be feasible (because the order of the involved moments increases with h). Nevertheless, the case h > 1 is later investigated via simulations. Phillips (1979) considered the special case of a zero-mean Gaussian AR(1) process (so only α has to be estimated). There, it was possible to derive an approximate distribution for this forecast error. This, however, will not be possible for more general CLAR(1) processes. In particular, it will not be feasible if considering CLAR(1) count processes like the Poisson INAR(1) model (Weiß, 2018). Therefore, in the present article, we concentrate on the probably most important measure of forecast uncertainty, the variance of X̂_{T+1} − X_{T+1}, as was also done by Fuller and Hasza (1981) for Gaussian AR processes. Knowledge of the variance will also allow us to construct simple kσ-prediction intervals. Here, V[X̂_{T+1} − X_{T+1}] is influenced by
• the estimation uncertainty of α̂ − α and β̂ − β;
• the cross-dependence of X_T and α̂ − α, β̂ − β;
• the randomness of the DGP as expressed by X_{T+1} − α X_T − β.
For the ordinary AR(1) process X_t = α · X_{t−1} + ε_t, the latter difference reduces to ε_{T+1} − β, so uncertainty is only caused by the error term. For an INAR(1) process, in contrast, we have the binomial thinning operation as an additional source of uncertainty,

X_{T+1} − α X_T − β = (α∘X_T − α X_T) + (ε_{T+1} − β).

Note that explicit asymptotic formulas for the case of moment estimators α̂, β̂ have been studied by Weiß and Schweer (2016) for the Poisson INAR(1) model (and also the INARCH(1) one). It should also be emphasized that the dependence between X_T and α̂, β̂ depends on the actual value of the difference T − S.
To derive an analytic approximation for the variance of the forecast error, V[X̂_{T+1} − X_{T+1}], we start with a linear approximation of the involved estimators (under a general CLAR(1) assumption) in Section 2. Then, again under the CLAR(1) assumption, we derive a general expression for the forecast-error variance in Section 3. This expression depends on higher order mixed moments, which can only be computed under a full model specification. This is illustrated in Section 4, where we derive a simple closed-form approximate formula for the case of a Poisson INAR(1) DGP. Also, approximations for large T − S or h are discussed. The approximations are evaluated in the simulation study presented in Section 5. Finally, we conclude in Section 6.
To be able to approximate the variance of the forecast error (1), we approximate the deviations α̂ − α, β̂ − β by their first-order Taylor approximations. Then we compute the asymptotic variance of the linearized version of (α̂ − α) X_T + (β̂ − β) as an approximation of the corresponding true variance. For this purpose, we require the higher order joint moments of the form (2). Generally, these depend on the specific model assumption for the DGP. However, a few relations hold for CLAR(1) processes in general.

Proposition 1. Let (X_t)_ℤ be a stationary CLAR(1) process according to Definition 1.
Proof. An explicit expression for the case μ(k) is easily derived by using that the ACF equals ρ(k) = α^k. For μ(k,l) with l > k, the CLAR(1) assumption implies that … This completes the proof of Proposition 1. ▪

The first-order Taylor approximation of α̂ − α or β̂ − β, respectively, takes the form …, where the functions g⋆ = g_α, g_β are defined by (3). These linear approximations are computed as follows.
Proposition 2. Let (X_t)_ℤ be a stationary CLAR(1) process according to Definition 1. Recall the moment estimators α̂ = g_α(·) and β̂ = g_β(·) as defined by (3). …

The proof of Proposition 3 is provided by Appendix A.2.
…, that is, this difference becomes zero if the considered CLAR(1) process is time-reversible. Thus, for a time-reversible CLAR(1) process, Proposition 3 simplifies to Equation (6). Examples of such time-reversible CLAR(1) processes are the continuous-valued Gaussian AR(1) process, and the discrete-valued Poisson INAR(1) process (unbounded counts) as well as the binomial AR(1) process (bounded counts); see McKenzie (1985). The formulas derived in Proposition 3 and Equation (6) are not expected to serve as a reasonable approximation of the actual mean forecast error, because the biases of α̂, β̂ in (5) are ignored due to the linear approximation. However, Proposition 3 and Equation (6) …

VARIANCE OF CLAR(1) FORECAST ERROR
For the variance of X̂_{T+1} − X_{T+1}, formula (1) first implies a decomposition into the known-parameter error and the additional effect of estimation. However, it should be noted that for any expression f(X_1, …, X_T), the law of total expectation and the definition of a CLAR(1) process imply that E[(X_{T+1} − α X_T − β) · f(X_1, …, X_T)] = 0. This can be further decomposed into …, where the "known-parameter variance" (the unavoidable variance if doing forecasting) computes as V[X_{T+1} − α X_T − β] = σ² (1 − α²).

Remark 1. In (7), it is again possible to account for the possible discrepancy between S and T. The law of total covariance implies that … Analogously, the law of total variance yields … However, for general CLAR(1) processes, the conditional variance V[X_T | X_S, …] is not specified. Consequently, we cannot derive a general expression for S ≤ T this time. Therefore, in the sequel, we concentrate on the case S = T. However, later in Section 4, when considering the special case of a Poisson INAR(1) process, we explain how the results could be extended to the case S < T.
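To make the vanishing cross term explicit (a worked version of the argument above; the shorthand f is ours and not from the article), write f ∶= (α̂ − α) X_T + (β̂ − β), which is a function of X_1, …, X_T whenever S ≤ T, and note that E[X_{T+1} − α X_T − β] = 0 for a stationary CLAR(1) process. Then

\[
\operatorname{Cov}\bigl[X_{T+1} - \alpha X_T - \beta,\; f\bigr]
  = \mathrm{E}\bigl[(X_{T+1} - \alpha X_T - \beta)\, f\bigr]
  = \mathrm{E}\bigl[f \cdot \mathrm{E}[X_{T+1} - \alpha X_T - \beta \mid X_T, X_{T-1}, \ldots]\bigr]
  = 0,
\]

so the forecast-error variance splits into the two summands V[X̂_{T+1} − X_{T+1}] = V[X_{T+1} − α X_T − β] + V[(α̂ − α) X_T + (β̂ − β)].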
To obtain an approximation for the "estimated-parameter variance add-on" V[(α̂ − α) X_T + (β̂ − β)], …, respectively. Since the considered linear approximations of α̂ − α, β̂ − β imply that the means of (α̂ − α) X_T and (β̂ − β) are …, respectively, … In a first step, this is done for general CLAR(1) processes. Later in Section 4, the obtained formulas are used to derive explicit closed-form expressions for the illustrative case of a Poisson INAR(1) process.

Proposition 4.
Let (X_t)_ℤ be a CLAR(1) process according to Definition 1, and let S = T. Then …, where a, b, A, B are defined in Proposition 2.
The proof of Proposition 4 is provided by Appendix A.3. Since general closed-form expressions for the higher order moments of a CLAR(1) process are not possible, the approximate variance as given by Proposition 4 has to be computed individually for specific cases. Note that joint moments up to order 6 are required for this purpose. We shall exemplify the required computations for the special case of a Poisson INAR(1) process.

VARIANCE OF POISSON INAR(1) FORECAST ERROR
In this section, we consider a special type of CLAR(1) process, the Poisson INAR(1) process introduced in Example 1. To be able to compute the approximate variance according to Proposition 4, we need expressions for the higher order joint moments.
Proposition 5. Let (X_t)_ℤ be a stationary Poisson INAR(1) process according to Example 1. Then the following results hold: … The expressions (i)–(iii) for the moments of order ≤ 4 have been derived by Weiß (2012). The remaining proof of Proposition 5 is provided by Appendix A.4.

Now we apply Proposition 5 to Equations (7) and (8), as well as to Proposition 4.

Theorem 1. Let (X_t)_ℤ be a Poisson INAR(1) process according to Example 1, and let S = T. Then the variance of the forecast error V[X̂_{T+1} − X_{T+1}] is given by …

The proof of Theorem 1 is provided by Appendix A.5. It should be pointed out that the given known-parameter variance could also be computed in a different way; recall the discussion in Section 1. Note that ε_{T+1} is independent of X_T and the thinning thereof. By the law of total variance and the binomial variance, we have V[α∘X_T − α X_T] = E[V[α∘X_T | X_T]] = E[α (1 − α) X_T] = α (1 − α) μ. So altogether, V[X_{T+1} − α X_T − β] = α (1 − α) μ + V[ε_{T+1}] = α (1 − α) μ + β. Note that for the ordinary AR(1) process, the first term would vanish. Hence, INAR(1) forecasting suffers from stronger uncertainty than AR(1) forecasting does. Actually, in Fuller and Hasza (1981), we find the approximate expression for the Gaussian AR(1) … This summand covers the additional randomness of the DGP as well as the additional uncertainty of its parameter estimators.
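As a quick numerical sanity check of the known-parameter variance α(1 − α)μ + β (our own sketch, not part of the article), one can simulate many stationary pairs (X_T, X_{T+1}) and compare the empirical variance of X_{T+1} − α X_T − β with the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, mu = 0.5, 2.5
beta = mu * (1.0 - alpha)                      # beta = 1.25

x_T = rng.poisson(mu, size=1_000_000)          # stationary X_T ~ Poi(mu)
x_T1 = rng.binomial(x_T, alpha) + rng.poisson(beta, size=x_T.size)  # X_{T+1} = alpha o X_T + eps_{T+1}
err = x_T1 - alpha * x_T - beta                # known-parameter forecast error

print(err.var())                               # empirical value, close to ...
print(alpha * (1.0 - alpha) * mu + beta)       # ... the theoretical value alpha(1-alpha)mu + beta = 1.875
```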
Remark 2. Let us recall Remark 1 concerning the case S < T. There, it was shown that the variance V[(α̂ − α) X_T] computes as a sum consisting of two terms: … This variance can be split into the three summands … Hence, it becomes clear that additional effort is required if intending to derive an asymptotic approximation for the case S < T: … needs to be computed in analogy to Lemma 3, and an approximation for V[…] is required, where the latter is provided by Weiß and Schweer (2016).
Although a general extension to T > S would require additional effort, some assertions for T ≫ S immediately follow. Since then α^{T−S} ≈ 0, the mean forecast error follows from Equation (5) as … The required bias expressions are known from Weiß and Schweer (2016) as … So altogether, …, which becomes more pronounced if … The asymptotic cross-covariance matrix of α̂, β̂ equals …; see Weiß and Schweer (2016). Thus, (7) implies that …, like in Theorem 1. Hence, this approximation formula seems to be appropriate for many situations S ≤ T. In contrast to the forecast bias (10), increasing α has a decreasing effect on the variance, while it is increased with increasing μ. Finally, for h → ∞, we approximately predict X̂_{T+h} ≈ μ̂, so …, where the last expression follows from Weiß and Schweer (2016).
Example 2. Let us discuss a time series consisting of monthly counts of claims caused by burn-related injuries in the heavy manufacturing industry (1987–1994, so T = 96). The dataset was taken from example 2.5.1 in Freeland (1998), where it was used to illustrate the Poisson INAR(1) model. … ≈ 4.81 (Fuller & Hasza, 1981). Hence, a notable part of the forecast uncertainty would be neglected in this way.
However, let us return to the fitted Poisson INAR(1) model. The derived results can now be used to evaluate the uncertainty of the point forecast value x̂_{T+1} = α̂ x_T + μ̂ (1 − α̂) in the given data example, where x_T = x_96 = 11 according to Figure 1. So x̂_{T+1} = x̂_97 ≈ 9.69. An approximate 90% interval is constructed by adding ±1.645 times the forecast standard deviation to x̂_97. If ignoring the estimation uncertainty and, thus, using …
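The interval construction described in this example can be sketched as follows; note that alpha_hat, mu_hat, and var_hat below are placeholders of our own choosing (they are not the fitted values or the variance computed in the article), so the printed interval is purely illustrative.

```python
import numpy as np

alpha_hat, mu_hat = 0.4, 8.6                          # hypothetical moment estimates
x_T = 11                                              # last observation x_96 of the claims series
x_hat = alpha_hat * x_T + mu_hat * (1.0 - alpha_hat)  # 1-step-ahead point forecast

var_hat = 7.0                                         # placeholder for the approximate forecast-error variance
half_width = 1.645 * np.sqrt(var_hat)                 # 90% normal-approximation half-width
print(x_hat - half_width, x_hat + half_width)         # approximate 90% prediction interval
```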

RESULTS FROM A SIMULATION STUDY
To evaluate the approximations of the forecast-error variance according to Theorem 1 and Equations (11), (12), as well as that of the mean forecast error according to Equation (10), we did a comprehensive simulation study, where Poisson INAR(1) processes with different parametrizations were simulated (always 1 million replications per scenario). The moment estimates of α, β were computed based on the simulated time series x_1, …, x_S, and the subsequent observations were used to determine the forecast error x̂_{T+h} − x_{T+h} for different choices of T − S ≥ 0 and h ≥ 1. For each scenario, the one million values of the forecast error were used to compute the sample mean and variance thereof. The full simulation results are available from the authors upon request. In what follows, we present representative excerpts thereof to illustrate our general findings.

Although our main interest is in the forecast-error variance, let us start with a brief discussion of the mean forecast error. Probably most important (see the excerpt in Table 1), the mean forecast error is generally very close to 0, such that it might be judged negligible for applications in practice. Nevertheless, a few further conclusions can be drawn. We see that with increasing gap T − S (first block of columns in Table 1), the mean forecast errors quickly stabilize, namely, at a value close to the approximation (10). Actually, it appears that (10) …

For the 1-step-ahead predictions, the variance approximation according to Theorem 1 or Equation (11), respectively, can be used. From Table 2, we see that this approximation generally works quite well, and the approximation quality improves for decreasing α and increasing S. Increasing T − S deteriorates the approximation quality, but at least for S ≥ 200, we again have a close agreement. Although the variance increases with increasing mean (see Equation 11), the actual approximation quality does not seem to be affected by varying μ. Thus, only the case μ = 2.5 is shown in the tables. The necessity of accounting for the estimation uncertainty becomes clear if considering the known-parameter variance V[X_{T+1} − α X_T − β] and its decomposition according to (9); see the last column in Table 2. It can be clearly seen that the simulated variance V[X̂_{T+1} − X_{T+1}] is always considerably larger than V[X_{T+1} − α X_T − β], and this discrepancy further increases with decreasing sample size. Hence, the newly derived approximation formula Equation (11) should be used in practice. If computing, for example, a kσ-prediction interval without considering the estimation uncertainty, there is a higher risk of falling below the intended coverage level. It is also insightful to compare with the pure innovations' variance (this would be the only source of variation in the AR(1) case), which is much smaller than V[X_{T+1} − α X_T − β]. Hence, for a Poisson INAR(1) process, a major part of the forecast uncertainty is caused by the thinning mechanism.
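The design of this simulation study can be sketched as follows (a minimal re-implementation of our own with far fewer replications; simulate_poisson_inar1 and moment_estimates are the helper functions from the earlier sketches):

```python
import numpy as np

def forecast_error_experiment(alpha, mu, S, gap=0, h=1, reps=10_000, rng=None):
    """Monte-Carlo mean and variance of the forecast error x_hat_{T+h} - x_{T+h},
    where (alpha, beta) are estimated from x_1, ..., x_S and T = S + gap."""
    rng = np.random.default_rng() if rng is None else rng
    beta = mu * (1.0 - alpha)
    errors = np.empty(reps)
    for r in range(reps):
        x = simulate_poisson_inar1(alpha, beta, S + gap + h, rng)   # path x_1, ..., x_{T+h}
        a_hat, b_hat = moment_estimates(x[:S])                      # moment estimates from x_1, ..., x_S
        x_T = x[S + gap - 1]                                        # observation x_T with T = S + gap
        x_hat = a_hat**h * x_T + (1.0 - a_hat**h) / (1.0 - a_hat) * b_hat
        errors[r] = x_hat - x[S + gap + h - 1]                      # forecast error x_hat_{T+h} - x_{T+h}
    return errors.mean(), errors.var()

# Example scenario: alpha = 0.5, mu = 2.5, S = T = 100 (gap = 0), h = 1
# mean_err, var_err = forecast_error_experiment(0.5, 2.5, S=100)
```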
Finally, let us discuss the h-step-ahead forecast-error variance (see Table 3). For h = 1 (also see the previous discussion), the approximation according to Theorem 1 or Equation (11), respectively, works quite well. Then, for increasing forecast horizon h, the variance also quickly increases, namely, toward the value given by Equation (12). Actually, this approximation can also be used for finite values of h; for h = 10, as an example, a rather good agreement can be observed.

CONCLUSIONS
We analyzed the distribution of the error in mean-based point forecasting of CLAR(1) processes. Forecast error is not only caused by the randomness of the DGP itself, but it is notably affected by the additional uncertainty due to parameter estimation. While the mean forecast error, though slightly negative, is often negligible, the variance becomes quite large (much larger than in the Gaussian AR(1) case). In particular, the decomposition of the forecast-error variance showed that the estimation uncertainty causes a notable increase in variance compared with the known-parameter case. Thus, for applications in practice, it is highly recommended to account for the effect of parameter estimation if evaluating the uncertainty in forecasting.

A.2 Proof of Proposition 3
To compute E[(α̂ − α) X_S] analytically, we are going to use the Taylor approximation according to Proposition 2. This yields …

Using the moment notations (2), this leads to … Utilizing Proposition 1 yields …

Inserting μ(0) and μ(1) as well as the definitions of a and b leads to …

A.3 Proof of Proposition 4
To prove Proposition 4, we first derive the three auxiliary results given in Lemmas 1–3.

Lemma 1. Let (X_t)_ℤ be a CLAR(1) process according to Definition 1, and let S = T. Then …, where A, B are defined in Proposition 2.
Proof. Using the linear approximations according to Proposition 2, we obtain … Expanding this product yields …
This sum can be split as follows: …, where we used the expressions for μ(0) and μ(1) according to Proposition 1. Finally, we can explicitly compute … Thus, … Using that A + B = 3 (1 − α) = b, we complete the proof of Lemma 1. ▪

Lemma 2. Let (X_t)_ℤ be a CLAR(1) process according to Definition 1, and let S = T. Then …, where a, b, A, B are defined in Proposition 2.
Proof. Using the linear approximations according to Proposition 2, we obtain … Expanding this product yields …
Note that A − a = A + 2b and b − B = A. Then, the sum can be split as follows: …
Now, we can apply Proposition 1 in the same way as in the proof of Proposition 3. Using that α ⋅ μ(0,0) = μ(0,1) − (1 − α) ⋅ μ(0), we get … Thus, using that b = 2 (1 − α), we get … Using that a = −2 b and applying the geometric sum formula completes the proof. ▪

Lemma 3. Let (X_t)_ℤ be a CLAR(1) process according to Definition 1, and let S = T. Then …, where a, b are defined in Proposition 2.
Proof. Using the linear approximations according to Proposition 2, the squared moment computes as … Expanding this product yields …
This sum can be split as follows: …
Using that b μ(0) = − B, we complete the proof of Lemma 3. ▪

Now we combine the expressions derived in Lemmas 1–3 according to … This yields …
Collecting terms of equal type leads to …
This concludes the proof of Proposition 4.

A.4 Proof of Proposition 5
It remains to prove the expressions for the fifth- and sixth-order moments given in parts (iv) and (v), respectively. Both formulas are derived in a completely analogous way. To save some space, we only present the proof of part (iv) for illustration.
For the conditional moments given before, we introduce the notation … for n ≥ 1. Using the law of total expectation, we now distinguish between the following five cases:

Case 1: n > m. Then we have μ(k,l,m,n) = …

Case 2: n = m > l. Then we have μ(k,l,m,m) = … with a(k,l) ∶= μ⁴ + μ³ (α^k + α^{l−k}) + μ² (1 + α) α^l. Hence, we get …