A goodness-of-fit test for Poisson count processes

Abstract: We study a novel class of goodness-of-fit tests for parametric count time series regression models. These test statistics are formed by considering smoothed versions of the empirical process of the Pearson residuals. Our construction yields test statistics which are consistent against Pitman's local alternatives and converge weakly at the usual parametric rate. To approximate the asymptotic null distribution of the test statistics, we propose a parametric bootstrap method and study its properties. The methodology is applied to simulated and real data.


Introduction
Recently, several authors have developed theory for linear and nonlinear parametric models for regression analysis of count time series; see, for instance, Fokianos et al. [9], Neumann [19], Woodard et al. [27], Fokianos and Tjøstheim [11] and Doukhan et al. [2]. The main issues addressed in those contributions are the study of conditions under which the observed process is ergodic and the associated problem of maximum likelihood estimation. However, a missing part of those studies is the development of goodness-of-fit methods for assessing the adequacy of the proposed models. In this contribution, we fill this void by developing methodology for testing the goodness of fit of parametric linear and nonlinear count time series models. The problem of goodness of fit has been studied recently by Neumann [19], but the proposed test statistic cannot, in general, be employed to detect local alternatives. In our approach, this obstacle is removed because the proposed test statistics are based on a smooth approximation of the empirical process of the Pearson residuals; see McCullagh and Nelder [18].
In what follows, we assume the following setup. Consider a stationary vector-valued process {(Y_t, λ_t)}, for t ∈ N_0 = N ∪ {0}. In general, {Y_t} denotes the response count process, which is assumed to follow the Poisson distribution conditionally on the past, and {λ_t} denotes the mean process. In other words, if we denote by F_t = σ(λ_0, Y_s : s ≤ t), we assume that the model is given by

Y_t | F_{t−1} ~ Poisson(λ_t),   λ_t = f(λ_{t−1}, Y_{t−1}),   t ≥ 1,   (1.1)

for some function f : [0, ∞) × N_0 → [λ_min, ∞) with λ_min > 0.
To understand the usefulness of model (1.1), suppose that a count time series is available and its autocorrelation function assumes relatively high values at large lags. It is then natural to model such data by including a large number of lagged regressor variables. This approach can be avoided by employing model (1.1), which provides a parsimonious way to model this type of data; see Fokianos et al. [9] and Fokianos [8] for more. Such a modeling approach follows the spirit of GARCH models (see Bollerslev [1]), whereby the volatility is regressed on its own past and on lagged regressors. For the Poisson distribution the mean equals the variance, and therefore model (1.1) can be viewed as an integer GARCH-type model.
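To make the model concrete, the linear special case of (1.1), λ_t = θ_1 + θ_2 λ_{t−1} + θ_3 Y_{t−1} (the model used in the empirical study below), can be simulated recursively. The following is an illustrative sketch, not code from the paper; the function name and defaults are our own choices.

```python
import numpy as np

def simulate_linear_poisson(theta1, theta2, theta3, n, lam0=1.0, seed=0):
    """Simulate Y_t | F_{t-1} ~ Poisson(lam_t) with the linear intensity
    lam_t = theta1 + theta2 * lam_{t-1} + theta3 * Y_{t-1}.
    The contraction condition (C) holds when theta2 + theta3 < 1."""
    rng = np.random.default_rng(seed)
    lam = np.empty(n + 1)
    y = np.empty(n + 1, dtype=np.int64)
    lam[0] = lam0
    y[0] = rng.poisson(lam[0])
    for t in range(1, n + 1):
        lam[t] = theta1 + theta2 * lam[t - 1] + theta3 * y[t - 1]
        y[t] = rng.poisson(lam[t])
    return y, lam

# Stationary mean is theta1 / (1 - theta2 - theta3) = 0.5 / 0.25 = 2
y, lam = simulate_linear_poisson(0.5, 0.40, 0.35, 20000)
print(round(y.mean(), 2))
```

With θ_2 + θ_3 = 0.75 < 1 the contraction condition is met and the sample mean of a long simulated path settles near the stationary mean 2.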
Recalling now the general model (1.1), we note that although estimation and inference are quite well developed, goodness-of-fit methods have not attracted much attention. In this article we consider testing the goodness of fit of model (1.1) by focusing on the following two forms of hypotheses. The first refers to the simple hypothesis

H_0 : f = f_0,   (1.2)

for some completely specified function f_0 which satisfies the contractivity assumption (C) below. However, in applications, the most interesting testing problem is given by the composite hypotheses

H_0 : f ∈ {f_θ : θ ∈ Θ},   (1.3)

where Θ ⊆ R^d and the function f_θ is known up to a parameter θ and satisfies assumption (C) as before. These testing problems have been considered in a recent contribution by Neumann [19]. Assuming that observations Y_0, …, Y_n are available, he proposed a simple procedure for testing (1.2) based on the test statistic

T_{n,0} = (1/√n) Σ_{t=1}^n [(Y_t − λ̃_t)² − Y_t].

Here, the initial value λ̃_0 is arbitrarily chosen and, for t = 1, …, n, λ̃_t = f_0(λ̃_{t−1}, Y_{t−1}). It is clear that the test statistic T_{n,0} is based on the observation that the conditional variance of Y_t equals its conditional mean, because of the Poisson assumption. In the case of the composite hypotheses (1.3), the above test statistic is suitably modified to

T̂_{n,0} = (1/√n) Σ_{t=1}^n [(Y_t − λ̂_t)² − Y_t],

where θ̂_n is any √n-consistent estimator of θ, λ̂_0 is chosen arbitrarily and, as before, λ̂_t = f_{θ̂_n}(λ̂_{t−1}, Y_{t−1}) for t = 1, …, n. It turns out that both of these test statistics are asymptotically normally distributed and offer easily implemented specification tests for the intensity function f. However, they are not suitable for detecting local alternatives in general. In this contribution, we propose alternative test statistics which are able to detect local alternatives with nontrivial asymptotic power. Those test statistics are formed on the basis of the Pearson residuals and a smoothing function, as we show in Section 2.
We point out that the smoothing function need not depend on any bandwidth; in other words, the power of the proposed test statistic is not influenced by any suboptimal bandwidth choice. In developing these test statistics, we extend the theory of specification testing to integer-valued time series. In fact, we study the asymptotic distribution of the proposed test statistics and show that they are consistent against Pitman's local alternatives at the usual rate of n^{−1/2}, where n denotes the sample size. In some sense, we extend the theory of nonparametric goodness-of-fit tests for autoregressions (Koul and Stute [16]) to count time series with a feedback mechanism; recall (1.1). A major difference between our work and that of Koul and Stute [16] is that the proposed methodology includes models with a feedback mechanism and employs the empirical process of the Pearson residuals. In addition, we relax the number of moments needed to obtain asymptotic results, following a recent approach outlined by Escanciano [5]. Furthermore, the goodness-of-fit test statistics are easily computed and are valid for a fairly large class of problems. To implement the proposed test statistic for testing (1.3), we take full advantage of model (1.1) by resorting to a parametric bootstrap; see Section 3. It is shown theoretically that the bootstrap replications successfully approximate the null distribution of the test statistic. Furthermore, Section 4 demonstrates empirically that this method works well in practice and does not depend heavily on the particular smoothing function.

Main results
Throughout this paper we assume that observations Y_0, …, Y_n are available and that they are generated by a stationary process ((Y_t, λ_t))_{t∈T} which satisfies model (1.1). For simplicity of presentation we choose the index set T = Z, which is always possible due to Kolmogorov's theorem. Before stating our main results, we impose the following contractive condition on the function f:

(C)   |f(λ, y) − f(λ′, y′)| ≤ κ_1 |λ − λ′| + κ_2 |y − y′|   for all λ, λ′ ≥ 0 and y, y′ ∈ N_0,

where κ_1 and κ_2 are nonnegative constants with κ := κ_1 + κ_2 < 1.

Test statistic for testing simple hypothesis
It is instructive to consider first the case of the simple hypothesis (1.2). We first introduce some notation. Denote by I_t = (λ_t, Y_t)′ the vector consisting of the values of the intensity process and the count process at time t, and denote by Ĩ_t = (λ̃_t, Y_t)′ its observable counterpart. Recall that if λ̃_0 is some initial value, then the process λ̃_t = f_0(λ̃_{t−1}, Y_{t−1}) can be defined recursively. Put

ξ_t = (Y_t − λ_t)/√λ_t   and   ξ̃_t = (Y_t − λ̃_t)/√λ̃_t.

Note that the sequence {ξ_t} consists of the so-called Pearson residuals; see Kedem and Fokianos [15, Ch. 1]. For x ∈ Π := [0, ∞)², define the supremum-type test statistic

T_n = sup_{x∈Π} |G_n(x)|,   where   G_n(x) = (1/√n) Σ_{t=1}^n ξ̃_t w(x − Ĩ_{t−1}),   (2.1)

and w(·) is some suitably defined weight function. In applications, we consider weight functions of the form w(x) = w(x_1, x_2) = K(x_1)K(x_2), where K(·) is a univariate kernel and x = (x_1, x_2) ∈ Π. We employ the uniform, Gaussian and Epanechnikov kernels. For instance, when the uniform kernel is employed, the test statistic (2.1) is computed by using the weights

w(x) = (1/4) 1(|x_1| ≤ 1) 1(|x_2| ≤ 1),

where 1(A) denotes the indicator function of a set A. Obvious formulas hold for the other kernel functions that we use and for the case of the composite hypotheses considered in Section 2.2; see equation (2.2). Assumption (A1)(ii) below is trivially satisfied for the Epanechnikov kernel. We also include in the simulations test statistics formed with the uniform and Gaussian kernels, even though they do not satisfy (A1)(ii); see below.
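A brute-force way to evaluate the supremum-type statistic in (2.1) is to approximate the supremum over Π on a finite grid. The sketch below does this for a completely specified intensity function f_0, using a Gaussian product kernel for w; the grid construction and the function name are our own illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def gof_statistic_simple(y, f0, lam0=1.0, grid_size=20):
    """Approximate T_n = sup_{x in Pi} |G_n(x)| where
    G_n(x) = n^{-1/2} * sum_t xi_t * w(x - I_{t-1}),
    with xi_t the Pearson residuals and w a product Gaussian kernel.
    The sup over Pi = [0, inf)^2 is taken over a finite grid."""
    y = np.asarray(y)
    n = len(y) - 1
    lam = np.empty(n + 1)
    lam[0] = lam0
    for t in range(1, n + 1):
        lam[t] = f0(lam[t - 1], y[t - 1])
    xi = (y[1:] - lam[1:]) / np.sqrt(lam[1:])    # Pearson residuals
    I = np.column_stack([lam[:-1], y[:-1]])      # I_{t-1} = (lam_{t-1}, Y_{t-1})'
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    g1 = np.linspace(0.0, lam.max() + 3.0, grid_size)
    g2 = np.linspace(0.0, y.max() + 3.0, grid_size)
    t_n = 0.0
    for x1 in g1:
        for x2 in g2:
            g_val = np.sum(xi * K(x1 - I[:, 0]) * K(x2 - I[:, 1])) / np.sqrt(n)
            t_n = max(t_n, abs(g_val))
    return t_n
```

As a toy usage, for i.i.d. Poisson(2) data and the constant null intensity f_0 ≡ 2 the statistic is finite and nonnegative; finer grids give a sharper approximation of the supremum at higher cost.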
It is not clear whether there exists an optimal choice of the weight function w(·). However, given the theory developed in this paper, we suggest that users employ weight functions satisfying the following assumption.
(A1) (i) The true data-generating process is given by (1.1) such that the contraction assumption (C) is fulfilled. (ii) The weight function w(·) is not identically zero, has bounded support contained in [−C, C]², for some constant C, and satisfies |w(x) − w(y)| ≤ L_w max{|x_1 − y_1|, |x_2 − y_2|} for all x, y ∈ Π and some L_w < ∞.
Proposition 2.1. Suppose that the simple hypothesis (1.2) is true and that assumption (A1) is fulfilled. Then

G_n ⇒ G,

where G = (G(x))_{x∈Π} is a centered Gaussian process with covariance function K, and convergence holds true w.r.t. the supremum metric.

Test statistic for testing composite hypotheses
By employing similar notation as above, λ̂_t = f_{θ̂_n}(λ̂_{t−1}, Y_{t−1}), for t = 1, …, n with λ̂_0 chosen arbitrarily, is the estimated unobserved process. Similar to the case of the simple hypothesis, we define the estimated Pearson residuals by

ξ̂_t = (Y_t − λ̂_t)/√λ̂_t.

It is obvious that test statistic (2.1) can be adapted for testing (1.3) as follows:

T̂_n = sup_{x∈Π} |Ĝ_n(x)|,   where   Ĝ_n(x) = (1/√n) Σ_{t=1}^n ξ̂_t w(x − Î_{t−1}).   (2.2)

Note that the processes (λ̃_t)_{t∈N_0} and (λ̂_t)_{t∈N_0} are not stationary in general. It can be shown by backward iterations that, for given (Y_t)_{t∈Z} and θ ∈ Θ, the system of equations λ_t(θ) = f_θ(λ_{t−1}(θ), Y_{t−1}) has a unique stationary solution of the form λ_t(θ) = g_θ(Y_{t−1}, Y_{t−2}, …), for some measurable function g_θ; see also the proof of Theorem 3.1 in Neumann [19]. In the technical part below, we use λ_t(θ̂_n) as an approximation for λ̂_t. Under H_0, we have λ_t = λ_t(θ_0), where θ_0 denotes the true parameter value. The assumptions below are essential for studying the asymptotic behavior of test statistic (2.2). They are mild and are satisfied by many useful models; see Fokianos and Tjøstheim [11] and Doukhan et al. [2].
(A2) (i) Assume that θ̂_n is an estimator of θ ∈ Θ ⊆ R^d which satisfies the linear expansion

√n (θ̂_n − θ_0) = (1/√n) Σ_{t=1}^n l_t + o_P(1),

for a stationary and ergodic sequence of martingale differences (l_t)_t with finite second moments. (ii) λ_t(θ) is a continuously differentiable function with respect to θ such that E sup_{θ∈Θ} ‖∂λ_1(θ)/∂θ‖ < ∞.

Proposition 2.2. Suppose that H_0 is true and assume further that (A1) to (A3) are satisfied. Then

Ĝ_n ⇒ G̃,

where G̃ = (G̃(x))_{x∈Π} is a centered Gaussian process with covariance function K̃, and convergence holds w.r.t. the supremum metric.

Corollary 2.2. Suppose that the assumptions of Proposition 2.2 are satisfied. Then

T̂_n → sup_{x∈Π} |G̃(x)|   in distribution.

If, additionally, there exists an x ∈ Π with var(G̃(x)) > 0, then the distribution of sup_{x∈Π} |G̃(x)| is absolutely continuous with respect to the Lebesgue measure and

sup_{t≥0} |P(T̂_n ≤ t) − P(sup_{x∈Π} |G̃(x)| ≤ t)| → 0.

Behavior of the test statistic under fixed and local alternatives
In this part of the article we consider the behavior of the test statistic (2.2) under fixed and local alternatives. In particular, for studying convergence under a sequence of local alternatives, we impose assumptions analogous to A6 and A3 of Escanciano [4] and Escanciano [5], respectively.

Proposition 2.3. (i) Under a fixed alternative, for which the true intensity function f does not belong to the parametric family {f_θ : θ ∈ Θ}, the test statistic T̂_n diverges in probability. (ii) Under a sequence of local alternatives H_{1,n}, which approach the null model at the rate n^{−1/2}, with (P_n)_{n∈N} the distributions according to H_{1,n}, the process Ĝ_n converges weakly under P_n to the Gaussian process G̃ shifted by a deterministic drift function. The above results, in conjunction with Proposition 3.1, indicate that the test is expected to have nontrivial power.
Remark 1. Our initial attempt at the problem of goodness-of-fit testing for count time series was based on supremum-type tests of the form (Koul and Stute [16])

H_n(x) = (1/√n) Σ_{t=1}^n ξ̂_t 1(Î_{t−1} ≤ x),   x ∈ Π,   (2.4)

where the inequality is understood componentwise. In fact, Escanciano [5] introduced a method to prove stochastic equicontinuity for processes such as (H_n)_{n∈N} under ergodicity and minimal moment conditions. Unfortunately, we were not able to verify conditions similar to those in that paper. In particular, for condition W3 of Escanciano [5] to hold true, the author gives as a sufficient condition that some conditional densities are dominated by the unconditional density. For count time series models, the conditional distribution of (λ_t, Y_t)′ given F_{t−2} is clearly discrete, while this property is not guaranteed for the unconditional distribution in general. Furthermore, Escanciano and Mayoral [6] gave a set of alternative sufficient conditions for proving stochastic equicontinuity. Because of the discrete distributions we consider, we were not able to verify the counterpart of their condition A1(c). Nevertheless, the asymptotic distribution of supremum-type test statistics based on (2.4) can in principle be studied following the arguments of Koul and Stute [16] and utilizing the recent results on weak dependence properties obtained by Doukhan et al. [2].
Remark 2. It is a sort of folklore, mainly in the context of i.i.d. observations, that alternatives at a distance of order less than n^{−1/2} from the null model cannot be detected with asymptotically nontrivial power. We believe that this is also true in the case of Poisson count processes. As a simple example, consider the simplified case where we have i.i.d. observations Y_1, …, Y_n, either with Y_t ~ P_0 = Poisson(λ_0), λ_0 > 0, or with Y_t ~ P_n = Poisson(λ_n), where λ_n = λ_0 + c_n and n^{1/2} c_n → 0 as n → ∞. The Hellinger affinity of P_0 and P_n equals

ρ(P_0, P_n) = Σ_{k=0}^∞ √(P_0({k}) P_n({k})) = exp(−(√λ_n − √λ_0)²/2),

so the squared Hellinger distance between P_0 and P_n fulfills

H²(P_0, P_n) = 2(1 − ρ(P_0, P_n)) = 2(1 − exp(−(√λ_n − √λ_0)²/2)) = O(c_n²).

Denote by P_0^⊗n and P_n^⊗n the n-fold products of the distributions P_0 and P_n, respectively. Since the Hellinger affinity factorizes over product measures, we obtain

ρ(P_0^⊗n, P_n^⊗n) = (ρ(P_0, P_n))^n = exp(−n(√λ_n − √λ_0)²/2) → 1,

because n(√λ_n − √λ_0)² = O(n c_n²) → 0. Therefore, we obtain for the Hellinger distance between the product measures that

H²(P_0^⊗n, P_n^⊗n) = 2(1 − ρ(P_0^⊗n, P_n^⊗n)) → 0,

which implies, by the Cauchy–Schwarz inequality, that the total variation distance between P_0^⊗n and P_n^⊗n tends to zero as well. This proves that no test of the hypothesis H_0 : Y_t ~ P_0 versus H_1^{(n)} : Y_t ~ P_n can have an asymptotically nontrivial power.
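The key identity behind the remark, ρ(P_0, P_n) = Σ_k √(p_0(k) p_n(k)) = exp(−(√λ_n − √λ_0)²/2) for two Poisson laws, is easy to verify numerically. The snippet below is only a sanity check of this closed form; the function names are our own.

```python
from math import exp, log, sqrt, lgamma

def log_poisson_pmf(lam, k):
    # log P(Y = k) for Y ~ Poisson(lam)
    return -lam + k * log(lam) - lgamma(k + 1)

def hellinger_affinity(lam0, lam1, kmax=400):
    # rho(P0, P1) = sum_k sqrt(p0(k) * p1(k)), truncated at kmax
    return sum(exp(0.5 * (log_poisson_pmf(lam0, k) + log_poisson_pmf(lam1, k)))
               for k in range(kmax + 1))

lam0, lam_n = 2.0, 2.1
closed_form = exp(-((sqrt(lam_n) - sqrt(lam0)) ** 2) / 2.0)
print(hellinger_affinity(lam0, lam_n), closed_form)
```

The truncated sum and the closed form agree to machine precision for moderate intensities, which is the fact the folklore argument rests on.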

Bootstrap
Resampling methods have been used in different contexts by several authors to approximate the null distribution of test statistics; see, for instance, Stute et al. [24] and Franke et al. [12], among others. In the case of testing the composite hypotheses (1.3) for count time series, we take advantage of the Poisson assumption and employ the following version of the parametric bootstrap to compute p-values for testing (1.3).
Step 1. Based on the observations Y_0, …, Y_n, compute the estimator θ̂_n and the observed value of the test statistic T̂_n.
Step 2. Generate a bootstrap sample Y*_0, …, Y*_n recursively from model (1.1) with intensity function f_{θ̂_n} and a starting value λ*_0; that is, λ*_t = f_{θ̂_n}(λ*_{t−1}, Y*_{t−1}) and Y*_t | F*_{t−1} ~ Poisson(λ*_t).
Step 3. Given the bootstrap sample, compute the estimator θ̂*_n in the same way as θ̂_n.
Step 4. Based on the above sample, compute the bootstrap counterparts Ĝ*_n(x) and T̂*_n of Ĝ_n(x) and T̂_n, respectively.
The choice of the starting value λ*_0 will be discussed in the next section. The next theorem shows the appropriateness of the bootstrap approximation.
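The parametric bootstrap procedure above can be sketched generically as follows. Here `fit`, `simulate` and `statistic` are hypothetical user-supplied callables (the estimator, the model simulator, and the test statistic); they are not functions defined in the paper, and the routine is a sketch of the resampling loop only.

```python
import numpy as np

def parametric_bootstrap_pvalue(y, fit, simulate, statistic, B=499, seed=0):
    """Repeat the bootstrap steps B times and return the p-value
    (1 + #{i : T*_i >= T_n}) / (B + 1)."""
    rng = np.random.default_rng(seed)
    theta_hat = fit(y)                               # Step 1: estimate theta ...
    t_obs = statistic(y, theta_hat)                  # ... and the observed statistic
    t_star = np.empty(B)
    for i in range(B):
        y_star = simulate(theta_hat, len(y), rng)    # Step 2: simulate fitted model
        theta_star = fit(y_star)                     # Step 3: re-estimate
        t_star[i] = statistic(y_star, theta_star)    # Step 4: bootstrap statistic
    return (1.0 + np.sum(t_star >= t_obs)) / (B + 1.0)
```

As a toy check, for i.i.d. Poisson data with the mean as estimator and an overdispersion-type statistic, the routine returns a p-value in (0, 1].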
In particular, the bootstrap statistic T̂*_n converges, conditionally on the data, to the same limit distribution as T̂_n under H_0, while T̂_n diverges under fixed alternatives; hence the proposed test is consistent against fixed alternatives under the conditions of the theorem. The following corollary states that our bootstrap-based test has asymptotically the prescribed size.

An empirical study
We illustrate the performance of the proposed goodness-of-fit methodology by a limited simulation study. More precisely, we study the test statistic (2.2). To compute its value, we use weight functions w(·) of the form w(x) = w(x_1, x_2) = K(x_1)K(x_2), where K(·) is a univariate kernel and x = (x_1, x_2) ∈ Π, as discussed before. Note again that we employ three types of kernels: the Gaussian, uniform and Epanechnikov kernels. The first two do not satisfy assumption (A1)(ii); they are included for completeness of the presentation and to demonstrate that assumption (A1)(ii) might possibly be relaxed. For comparison, we also study empirically the asymptotic distribution of the supremum-type test statistic based on H_n(x); recall (2.4). We use the maximum likelihood estimator to calculate the Pearson residuals. We set λ_0 = 0 and ∂λ_0/∂θ = 0 to initialize the recursions in the case of the linear model. To compute the p-value of the test statistic we use the proposed bootstrap methodology. Accordingly, the test statistic is recalculated for B parametric bootstrap replications of the data set. Then, if T̂_n denotes the observed value of the test statistic and T*_{i;n} denotes the value of the test statistic in the i-th bootstrap run, the corresponding p-value used to determine acceptance or rejection is given by

p = (1 + #{i : T*_{i;n} ≥ T̂_n}) / (B + 1).

Throughout the simulations (and for the data analysis) we use B = 499. All results are based on 500 runs. We first study the empirical size of the test. Table 1 shows the achieved significance levels for testing the linear model given by λ_t = θ_1 + θ_2 λ_{t−1} + θ_3 Y_{t−1}. It can be shown that, for the linear model, the stationary mean is given by E(Y_t) = θ_1/(1 − θ_2 − θ_3). Table 1 reports results when the stationary mean equals 2 (a relatively low value) for two different configurations of the parameters and for different sample sizes.
In both cases, we see that all test statistics attain the nominal significance levels quite satisfactorily, especially for the larger sample sizes. Furthermore, the significance levels obtained by the test statistic based on (2.4) tend to be lower than those obtained by the proposed goodness-of-fit tests. Table 2 shows the power of the various test statistics for the model λ_t = θ_1 + θ_2 λ_{t−1} + (θ_3 + θ_4 exp(−θ_5 Y²_{t−1}))Y_{t−1} with θ_4, θ_5 > 0 such that θ_2 + θ_3 + θ_4 < 1. This is a case of a model which belongs to the local alternative specification studied in Proposition 2.3. In both cases, the power of all test statistics increases with the sample size. Note that in the first case the power is relatively low for the test statistic based on the Epanechnikov kernel, but this behavior changes when considering the other model. For the second model and for large sample sizes, the test statistics achieve relatively high power.
We now study the power of the test for testing the linear model against a log-linear model specification, as suggested by Fokianos and Tjøstheim [10]. More specifically, if we denote by ν_t = log λ_t the log-mean process, then the log-linear model studied by Fokianos and Tjøstheim [10] is given by ν_t = θ_1 + θ_2 ν_{t−1} + θ_3 log(1 + Y_{t−1}), for t ≥ 1. Note that for this model the parameters are allowed to be either positive or negative, whereas for the linear model all parameters have to be positive. It is rather challenging to compute the mean and the autocovariance function of the process {Y_t} with this particular log-mean specification. However, Fokianos and Tjøstheim [10] have shown that when θ_2 and θ_3 have the same sign and |θ_2 + θ_3| < 1, then the maximum likelihood estimator is consistent and asymptotically normally distributed. The same conclusion is true when θ_2 and θ_3 have opposite signs, but in this case the required condition is θ_2² + θ_3² < 1. For the log-linear model, we use the starting values ν_0 = 1 and ∂ν_0/∂θ = 0. Table 3 shows values of the empirical power of the test statistic T̂_n when the data are generated by a log-linear model, as described before, for different sample sizes. We note that when the coefficient of the feedback process (that is, θ_2) is positive, the proposed test statistic achieves good power even though the sample size is relatively small. The performance of the test statistic based on H_n compares favorably with that of all other test statistics in this case. In the first case, in which the data are generated by a log-linear model with θ_1 = 0.5, θ_2 = 0.65, θ_3 = −0.35, the autocorrelation function takes on negative values; hence the linear model will be rejected often, since its autocorrelation function is always positive (see Fokianos and Tjøstheim [10], and Fokianos [8]).
For the second case, where the data are generated by a log-linear model with θ_1 = 0.5, θ_2 = −0.55, θ_3 = 0.45, the autocorrelation function of the observed process behaves like an alternating sequence with a positive lag-one value. Therefore, the linear model might accommodate such a dependence structure, and it is therefore rejected less often. We note that the test statistic based on H_n performs better than all other test statistics in this case.
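The log-linear alternative used in Table 3 can be simulated in the same recursive way as the linear model. The sketch below covers the opposite-sign case θ_2 = −0.55, θ_3 = 0.45 discussed above, using the starting value ν_0 = 1 as in the text; the function name and the remaining choices are our own.

```python
import numpy as np

def simulate_loglinear_poisson(theta1, theta2, theta3, n, nu0=1.0, seed=0):
    """Simulate the log-linear model of Fokianos and Tjostheim [10]:
    nu_t = theta1 + theta2 * nu_{t-1} + theta3 * log(1 + Y_{t-1}),
    Y_t | F_{t-1} ~ Poisson(exp(nu_t)).
    Unlike the linear model, the parameters here may be negative."""
    rng = np.random.default_rng(seed)
    nu = np.empty(n + 1)
    y = np.empty(n + 1, dtype=np.int64)
    nu[0] = nu0
    y[0] = rng.poisson(np.exp(nu[0]))
    for t in range(1, n + 1):
        nu[t] = theta1 + theta2 * nu[t - 1] + theta3 * np.log1p(y[t - 1])
        y[t] = rng.poisson(np.exp(nu[t]))
    return y, nu

# Opposite-sign case: theta2^2 + theta3^2 = 0.505 < 1, so the MLE theory applies
y, nu = simulate_loglinear_poisson(0.5, -0.55, 0.45, 5000)
```

Simulated paths from this configuration can then be fed to the linear-model test to reproduce the power experiments of Table 3.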

Some data examples
The theory is illustrated empirically by two real data examples discussed in detail by Fokianos [8]. We first examine the number of transactions for the stock Ericsson B over a one-day period; these data were reported by Fokianos et al. [9], where a linear model was fitted to the data. This series is an example of a count process whose autocorrelation function decays slowly to zero; it is therefore expected that a linear feedback model will fit the data adequately; for further information see Fokianos [8]. Table 4 shows the test statistics accompanied by their p-values, which have been obtained after B = 499 bootstrap replications. We note that the test statistic T̂_n points to the adequacy of the linear model regardless of the kernel function that is employed. However, the test statistic based on H_n raises some doubt about the linearity assumption, though not an overwhelming one. Given the simplicity of fitting the linear model to the data, we conclude that it describes the data adequately. The second data example that we consider is a time series of claims, referred to as Series C3, of short-term disability benefits made by cut-injured workers in the logging industry; see Zhu and Joe [28] and Fokianos [8]. In contrast to the previous example, this is a count time series whose autocorrelation function decays rapidly towards zero; hence the feedback mechanism might not provide further insight into the modeling of this series; for more see Fokianos [8]. Table 4 shows that all test statistics point to the adequacy of the linear model for these data. In this case, the results are similar regardless of the chosen test statistic.

Appendix
In what follows, we denote by ‖w‖_∞ = sup_{x∈Π} w(x) the supremum norm of the weight function.

Proof of Proposition 2.1.
To prove the first part of the proposition we use a suitable decomposition of G_n − G_{n,0} together with the fact that |λ̃_t − λ_t| ≤ κ_1^t |λ̃_0 − λ_0|, which follows directly from assumption (C); we then obtain that assertion (i) holds true. In view of this result, it suffices to prove that G_{n,0} ⇒ G. Towards this goal, we need to prove convergence of the finite-dimensional distributions and stochastic equicontinuity. Weak convergence of the finite-dimensional distributions of G_{n,0} to those of the process G follows from the Cramér–Wold device and the CLT for martingale difference arrays; see Pollard [21, p. 171].
Hence, if the event Ω_{j_0}^∞ = ∩_{j=j_0+1}^∞ Ω_j occurs, then we obtain from |T_{n,1}(x) − T_{n,1}(π_{j_0}(x))| ≤ |T_{n,1}(x) − T_{n,1}(π_J(x))| + Σ_{j=j_0+1}^J λ_j, for all J ≥ j_0, that the required bound holds, provided that j_0 is sufficiently large. It remains to show that the probability of the unfavorable event Ω \ Ω_{j_0}^∞ is smaller than or equal to η for sufficiently large j_0. The required increment bounds involve the quantities a_{j;k_1,k_2}; the last equality in their computation follows from the boundedness of the support of w. Therefore, we obtain by the Bernstein–Freedman inequality (see Freedman [13, Proposition 2.1]) an exponential bound for the probability of the complements of the Ω_j and hence, for j_0 sufficiently large, the assertion. Proof of Corollary 2.1. The first assertion follows from Proposition 2.1 and the continuous mapping theorem. Absolute continuity of the distribution of sup_{x∈Π} |G(x)| will be derived from a result of Lifshits [17]. Since G(λ, y) → 0 as either λ → ∞ or y → ∞, this supremum has the same distribution as sup_{λ,y∈[0,1]} |G̃(λ, y)|, where G̃ denotes the correspondingly reparametrized process. The process (G̃(λ, y))_{λ,y∈[0,1]} is a centered Gaussian process defined on a compact set and with continuous sample paths. Hence, Proposition 3 of Lifshits [17] can be applied, and it follows that the distribution of sup_{λ,y} G̃(λ, y) is absolutely continuous on (0, ∞) with respect to the Lebesgue measure. For the same reason, the distribution of sup_{λ,y}(−G̃(λ, y)) is also absolutely continuous on (0, ∞). Hence, the distribution of sup_{λ,y} |G̃(λ, y)| has no atom unequal to 0. Since P(sup_{λ,y} |G̃(λ, y)| = 0) = 0, we obtain the second assertion. Convergence of the distribution functions in the uniform norm can be deduced from the distributional convergence in conjunction with the continuity of the limiting distribution function; see e.g. van der Vaart [26, Lemma 2.11].
Proof of Proposition 2.2. (i) To simplify calculations, we approximate the λ̂_t by their stationary counterparts, λ_t(θ̂_n) = g_{θ̂_n}(Y_{t−1}, Y_{t−2}, …). It follows from the contractive condition (A3)(ii) that these quantities are exponentially close; see e.g. inequality (5.13) in Neumann [19]. Therefore, we obtain a corresponding decomposition of the statistic into a leading term and remainders R_{n,1}, R_{n,2}, R_{n,3}, say. According to Neumann [19, Eq. (5.13)], if ‖θ̂_n − θ_0‖ ≤ δ, we obtain a uniform bound, and we see that the summands in R_{n,1}(x, θ̂_n) are essentially of order O_P(n^{−1/2}), which suggests that sup_{x∈Π} |R_{n,1}(x, θ̂_n)| should be of negligible order. Apart from this heuristic, a sound proof of this fact is more delicate, since we take a supremum over x and the λ_t(θ̂_n) depend, via θ̂_n, on the whole sample. To proceed, we first approximate θ̂_n by values from a sufficiently fine grid, apply an exponential inequality to the corresponding sums, and finally use continuity arguments to conclude. To invoke the Bernstein–Freedman inequality, we replace, as in the proof of Proposition 2.1, the ξ_t by ξ̄_{n,t} = ξ̃_{n,t} − E(ξ̃_{n,t} | F_{t−1}), where ξ̃_{n,t} = ξ_t 1(|ξ_t| ≤ √n). We split up R_{n,1}(x, θ̂_n) =: R_{n,11}(x, θ̂_n) + R_{n,12}(x, θ̂_n) + R_{n,13}(x, θ̂_n).
Finally, since the weight function w has compact support [−C, C]², the contributions over the region {x : ‖x‖ > γ_n} vanish, and therefore the supremum over this region is negligible. Using the Lipschitz continuity of w we obtain the corresponding estimate, and therefore the bound for R_{n,2}. To estimate R_{n,3}(x, θ̂_n), we split it up further. Hence, it follows from (A2)(ii) and (A-1) that, for some θ̄_n between θ_0 and θ̂_n, the first term on the right-hand side of (A-16) is o_P(1). We obtain from (A-17) and the ergodic theorem that n^{−1} Σ_{t=1}^n ξ_t(θ_0) converges a.s.
To show the second part of the result, we need to establish stochastic equicontinuity and convergence of the finite-dimensional distributions. Stochastic equicontinuity of (G_{n,0})_{n∈N} was already shown in the second part of the proof of Proposition 2.1. Hence, it remains to show that the process (n^{−1/2} Σ_{t=1}^n E_{θ_0}[(∂λ_1(θ_0)/∂θ) w(x − I_0)/√λ_1(θ_0)]′ l_t)_{x∈Π} also possesses this property. We obtain the required bound from the Lipschitz continuity of the weight function w. Furthermore, we have that sup_{x : ‖x‖ ≥ c} E_{θ_0}[‖∂λ_1(θ_0)/∂θ‖ 1(I_0 ∈ supp(w(x − ·)))] → 0 as c → ∞.
These two facts yield, together with n −1/2 n t=1 l t = O P (1) the desired stochastic equicontinuity.
In addition, weak convergence of the finite-dimensional distributions of G_{n,0} to those of the process G follows from the Cramér–Wold device and the CLT for martingale difference arrays given on page 171 in Pollard [21]. We then split the remainder term involving ∂f_θ(I_{t−1})/∂θ and w(x − I_{t−1}(θ)) as R_{n,1}(x) + R_{n,2}(x).
We can see by calculations analogous to that for G n,0 that (n 1/2 R n,1 (x)) x∈Π converges to a certain Gaussian process that yields sup x∈Π |R n,1 (x)| = o P (1).
Summarizing the above calculations, we obtain the desired representation with sup_{x∈Π} |R_n(x)| = o_{P_n}(1).
Moreover, the first term on the right-hand side of (A-21) is stochastically equicontinuous. The assertion follows now from the Cramér-Wold device and the CLT for martingale difference arrays given on page 171 in Pollard [21].
Proof of Proposition 3.1. In all cases, stochastic equicontinuity can be proved as above. Moreover, we obtain from the contractive property that E*(λ*_0)² = O_P(1). To obtain part (i) of the proposition, note that under H_0 and H_{1,n} the conditional distributions of the bootstrap variables converge weakly, in probability, to those of the original random variables.