Estimation and testing linearity for non-linear mixed poisson autoregressions

Non-linear mixed Poisson autoregressive models are studied for the analysis of count time series. Given a correct mean specification of the model, we discuss quasi maximum likelihood estimation based on Poisson log-likelihood function. A score testing procedure for checking linearity of the mean process is developed. We consider the cases of identifiable and non identifiable parameters under the null hypothesis. When the parameters are identifiable then a chi-square approximation to the distribution of the score test is obtained. In the case of non identifiable parameters, a supremum score type test statistic is employed for checking linearity of the mean process. The methodology is applied to simulated and real data.


Introduction
The aim of this contribution is to study estimation and testing for non-linear mixed Poisson autoregressions. In fact, we enlarge the existing framework of Poisson and negative binomial based autoregressive models, as discussed by [35,Ch.4], [12] and [22], among others. It turns out that the class of mixed Poisson models contains numerous examples of integer-valued models including Poissonstopped-sum distributions ( [32,Sec. 3]), Tweedie-Poisson models ( [33,37]) and other. Mixed Poisson models do not necessarily belong to the exponential family models as discussed by [35] and more recently by [13]; a case in point is the negative binomial distribution with an unknown dispersion parameter.
We discuss ergodicity and stationarity conditions of those models by employing the notion of weak dependence (see [15]). Furthermore, assuming that the mean of the process has been correctly specified, we develop quasi maximum likelihood inference by employing the Poisson log-likelihood function. This approach avoids complicated likelihood functions-as in the case of mixed Poisson models-yet it produces consistent estimates under the correct mean specification. We omit details regarding estimation theory because these can be found in [9] for the case of negative binomial process.
We are particularly interested in testing linearity of the assumed model for the mean process. This question can be attacked by using the likelihood ratio, Wald or score (or Lagrange Multiplier) tests. The score test is often a very convenient tool because it does not require estimation of the model under the alternative. However, its application in the context of mixed Poisson time series modes, needs to be done with care because the test statistic is calculated by employing a quasi-likelihood function. All aforementioned types of test statistics are asymptotically equivalent (cf. [26,Ch. 8]).
Special attention is paid to two classes of nonlinear models specifying the mean process of a count time series under the alternative hypotheses. The first class consists of identifiable models. In this case and under the null hypothesis of linearity, the score test statistic possesses an asymptotic chi-square distribution. The second class consists of models in which a nonnegative nuisance parameter exists only under the alternative hypothesis. Then the testing problem is nonstandard and the classical asymptotic theory does not apply; see [11,2,44,1,40] among others. A notable example of a time series model which is not identifiable under the null, is the threshold model ( [45]). We propose a supremum score-type test statistic for dealing with the problem of non-identifiability. Our contribution is summarized by the following: • An introduction of a general framework for modeling count time series.
Indeed, mixed Poisson models include several parametric models which are quite useful to integer valued data analysis. • Inference and testing linearity based on a quasi-likelihood function. In fact, testing hypotheses by employing a score test which is based on a quasilikelihood function requires suitable adjustment, as we show in Proposition 3.1. • The proposed score test gives an additional tool to the data analyst for checking linearity of a given model. Further diagnostics for identification of a suitable model can be found in [34,10,13].
The paper is organized as follows. Section 2 discusses mixed Poisson autoregression and gives conditions for proving ergodicity, stationarity and existence of moments for the proposed models. In addition, we develop quasi-likelihood inference to obtain consistent regression parameters. This section generalizes previous results (see [9]); the interested reader is referred to this work for further details. This section lays out basic ideas and notation used throughout the paper. In Section 3, we discuss the score test for testing linearity of the assumed model for the mean process. We consider the cases of identifiable and non identifiable parameters. Section 4 reports empirical results for the performance of the proposed test statistics. In Section 5 the proposed testing methodology is illustrated to a real count time series.

Mixed poisson autoregression and inference
Assume that {Y t , t ∈ Z} denotes a count time series and let {λ t , t ∈ Z} be a sequence of mean processes. Denote by F Y,λ t the past of the process up to and including time t, that is F Y,λ t = σ(Y s , s ≤ t, λ 0 ), where λ 0 denotes some starting value. We will study the following class of count time series models defined by In the above,Ñ t is a standard homogeneous Poisson process (that is a Poisson process with rate equal to 1) and {Z t } denotes a sequence of independent and identically distributed positive random variables with mean 1, such that E|Z t | r < ∞, r ∈ N, which are independent ofÑ t . In addition, we assume that λ t is measurable with respect to {Y s , s < t} and Z t is independent of {Y s , s < t}.
The family of processes belonging to (2.1) is called mixed Poisson process (see [41], for instance). Two important distributional assumptions are implied by (2.1) and they are routinely employed for the analysis of count time series. Namely, the Poisson distribution given by and the negative binomial distribution given by where ν > 0. Note that (2.2) is a special case of (2.1) when {Z t } is a sequence of degenerate random variables with mean 1. Furthermore, (2.3) is a special case of (2.1) when {Z t } are iid Gamma with mean 1 and variance 1/ν. Related work for the case of the negative binomial is that of [49] but with a different parametrization and when the function f (·, ·) of (2.1) is linear (see (2.5)). Regardless of the choice of Z s, the conditional mean of {Y t } as given by (2.1) is always equal to λ t . Furthermore, the variance of (2.1) is given by The conditional variance of the Poisson distribution is equal to λ t , whereas the conditional variance of (2.3) is equal to λ t + λ 2 t /ν. We show that modeling based on (2.1) generalizes several existing results reported in the literature. Consider, for instance, the work by [50] who suggests the generalized Poisson distribution for modeling count time series data; see [32, pp. 396] who prove that the generalized Poisson distribution is proper distribution for a certain range of parameter values. In fact, the generalized Poisson distribution is a Poisson-stopped-sum distribution; hence it is an infinitely divisible distribution. [31] use the property of infinite divisibility to show that a proper generalized Poisson distribution is a mixed Poisson distribution. Hence, (2.1) covers the case of generalized Poisson distribution after suitable model reparametrization. Moreover, likelihood based inference for the generalized Poisson distribution is implemented by constraining its parameters; see [32, pp. 399] for more. In general, mixed Poisson distributions with an infinitely divisible mixing random variable Z are Poisson-stopped-sum distribution; [32, pp. 324].
Another count time series model, which was suggested by [51], is that of zeroinflated Poisson (negative binomial) distribution. These models are included in the framework introduced by (2.1). When the mixing variables Z t are binary, then we obtain a zero-inflated Poisson model. A similar argument yields a zeroinflated negative binomial model. In this contribution, we will employ quasimaximum likelihood for estimation. In the context of zero-inflated Poisson and negative binomial models, [51] employs the E-M algorithm. We point out that [50,51] considers linear autoregressions but we consider nonlinear models, in general. As a final remark, the works of [33] and [37] show that suitable choice of the mixing variables Z t in (2.1) yields to Tweedie-Poisson exponential dispersion models which include the Neyman type A and the Poisson-inverse Gaussian distributions.

Modeling
Assume that the mean process of (2.1) is given by where f (·, ·) is a parametric function defined on N 0 × R + and taking values on (0, ∞), where N 0 = {0, 1, 2, . . .}. For instance, consider the linear model where d > 0, a 1 > 0, b 1 > 0, such that a 1 + b 1 < 1. Model (2.5) has been shown to be stationary with any moments (see [22,43] for the Poisson case and [9] for the negative binomial case). Some examples of (2.4) include and Models (2.6) and (2.7) are modifications of analogous models studied by [43,23,19,9]. Model (2.6) introduces a deviation of (2.5) in the sense that small values of γ make (2.6) to approach (2.5). [9] show that when max{b 1 , dγ − b 1 }+a 1 < 1, model (2.6) is ergodic and stationary whose moments are finite. The same holds for the mixed Poisson process specification given by (2.1). Similarly model (2.7) can be viewed as a Smooth Transition Autoreggressive (STAR) model (see [44]). It turns out that when 0 < a 1 + b 1 + c 1 < 1, model (2.7) is ergodic, stationary and it has moments of all orders. Model (2.8) is a threshold model and has been studied recently by [47,18,46]. In particular, [46] show that (2.8) posses a stationary and ergodic solution, with any moments if 0 < a 1 + b 1 < 1 and a 1 + a 2 < 1, under the Poisson assumption (provided that min(d 1 , a 1 , b 1 , d+d 2 , a 1 +a 2 , b 1 +b 2 ) > 0). Furthermore, to obtain the asymptotic distribution of the MLE, they stipulate the condition b 1 + b 2 < 1. If all the coefficients are positive, then the condition a 1 + a 2 + b 1 + b 2 < 1 guarantees that (2.8) has a unique stationary solution which possess any moments; in addition the MLE of the parameter vector (a 1 , a 2 , b 1 , b 2 ) is consistent and asymptotically normally distributed (see also [47,18] for stationarity and consistency of the MLE for model (2.8)). We are not aware of any study regarding the probabilistic properties of the threshold model under any other distributional assumption. The following proposition generalizes [9, Thm.1] and it applies to (2.6) and (2.7); see also [13] in the context of exponential family. Proposition 2.1. Suppose that {Y t , t ∈ Z} is a count time series specified by (2.1) with a mean process {λ t , t ∈ Z} given by (2.4). Let {Z t } be a sequence of independent and identically distributed positive random variables with mean 1, such that E|Z t | r < ∞, r ∈ N, which are independent ofÑ t and independent of {Y s , s < t}. Suppose that there exist constants α 1 , α 2 of non-negative real numbers such that (2.4) which is stationary, ergodic and for any r ∈ N satisfies E (Y 0 , λ 0 ) r < ∞.

Inference
For the case of mixed Poisson models (2.1), it is rather challenging, in general, to have readily available a likelihood function, because the distribution of the mixing variable Z t is generally unknown. Hence, we resort to a quasi maximum likelihood (QMLE) methodology. This method is quite analogous to quasi-likelihood inference developed for estimation and fitting of ordinary GARCH models. For instance [4,25,42,3] among others, study the Gaussian likelihood function irrespectively of the assumed error distribution. It turns out that QMLE are consistent estimators of regression parameters under a correct mean process specification (see also [48,28,30], for instance). To define properly the QMLE, consider the Poisson log-likelihood function, conditional on some starting value λ 0 , where θ denotes the unknown parameter vector. The quasi-score function is defined by The solution of the system of nonlinear equations S n (θ) = 0, if it exists, yields the QMLE of θ which we denote byθ. The conditional information matrix is defined by It can be shown, under the assumptions of [9, Thm. 2] thatθ is consistent and asymptotically normally distributed; that is where the matrices G and G 1 are given by and where m denotes the dimension of θ andλ t = λ t (θ). There exists at most one solution of (2.14). If it exists, then it is consistent and can be calculated by existing software (cf. [14,6]). For the case σ 2 Z = 1/ν, see [7,Ch. 3] and [38].

Testing linearity
Testing linearity, within a parametric framework, is a problem which is attacked by computing the likelihood ratio, Wald and score tests. The likelihood ratio and Wald tests require estimation for the full model which can be computationally challenging. Consider (2.7), for instance. To obtain the maximum likelihood estimation of γ, we need rather large sample sizes. The problem's complexity increases especially for small values of γ, as empirical experience has shown; see also [52,Ch.18,p. 684] who consider smooth transition models. For (2.8), it is well known, from the linear model estimation theory, that the threshold parameter r is not estimated at the rate n −1/2 (see [8]). Hence, testing and inference in the context of (2.8) rise challenging computational and theoretical issues.
The computational advantage of the score test is that it is calculated after estimating the constrained model under the null. In other words, testing for linearity requires estimation of the simple linear model (2.5). In addition, the asymptotic distribution of the score statistic is not affected when parameters lie at the boundary of the hypothesis, (see [26,Ch. 8]). Denote by θ = (θ (1) , θ (2) ) the unknown parameter, where θ (1) and θ (2) are vectors of dimension m 1 and m 2 , respectively, such that m 1 + m 2 = m. The hypotheses of interest are n ) be the constrained quasi-likelihood estimator of θ = (θ (1) , θ (2) ). Put S n = (S (1) n , S (2) n ) for the corresponding partition of the score function. The general form of the score statistic is given by (cf. [6] and [29,Ch. 5]), In the above, Σ is an appropriate estimator for the covariance matrix Σ = Var(S (2) n (θ n )/ √ n). Because this approach is based on quasi-score instead of the true score, certain adjustments should be made for obtaining its asymptotic distribution. Recall (2.12) and consider the following partition and similarly for G 1 (see (2.13)). Then it can be shown that (3.2) If the true distribution is Poisson, then the matrices G and G 1 coincide and

Standard implementation
If all the parameters are identified under the null hypothesis, then the standard asymptotic theory holds as the following result shows.
Consider the problem Then the score test statistic converges to a chi-square random variable, i.e.
as n → ∞, when H 0 is true. In addition, consider testing The proof is postponed to the appendix. Recall (2.6) and let θ (1) = (d, a 1 , b 1 ) and θ (2) = γ. The hypothesis H 0 : γ = 0 implies that the parameter γ lies at the boundary of the parameter space. Following [26, pp. 196], Proposition 3.1 assures that the quasi-likelihood based score test statistic follows asymptotically the chi-square distribution with one degree of freedom for testing γ = 0.

Non-standard implementation
We deviate from the notation used so far in this section so that we can bring across the main ideas. We are interested on testing linearity of the mean process when a non-linear model contains nuisance parameters that are not identified under the null; recall (2.7) and (2.8). Then the quasi maximum likelihood estimators have nonstandard behavior. The lack of identification affects also the score test and the classical asymptotic theory does not apply. Consider (2.7). When c 1 = 0, the parameter γ is not identified under the null. Consequently, testing of (3.4) cannot be implemented in a standard way and the traditional large sample theory does not apply. To deal with this problem, we consider a fixed arbitrary value of γ. Then the model is still linear in the parameters and the score statistic LM n is asymptotically distributed as a chi-square random variable with one degree of freedom, under the null. Despite the fact that this test is in general consistent, even for alternatives where γ 0 = γ, it may lack power for alternatives where γ 0 is far from γ. To avoid low values of power obtained by the above approach, we propose a supremum type of test statistic. Consider Γ, a grid of values for the parameter γ. Then the sup-score test statistic is given by where LM n (γ) is given by (3.1). We reject hypothesis Similarly, consider a grid of values for the threshold parameter r, say Γ, and calculate LM n = sup r∈Γ LM n (r).
We reject (3.6) for large values of the test statistic.

Remark 3.1.
It is of interest to develop the asymptotic distribution of the supremum type score test statistic LM n similarly to the works by [24] and [39]. Note that Proposition 2.1 shows that the joint process (Y t , λ t ) is τ -weakly dependent ( [15,20]). However a uniform central limit theorem for multivariate τ -weakly dependent process is not available in the literature, to the best of our knowledge (but see [16] who prove a central limit theorem for the case of univariate empirical distribution function). For univariate functions of mixing process, see additionally the recent work by [17]. The main challenge is to extend the work of [16] to multivariate empirical processes indexed by classes of functions. We conjecture that under the null hypothesis and using the assumptions of Proposition 2.1, the process sup γ LM n (γ), as defined by (3.5), will converge to the supremum of a chi-square process (see also [11]). Our work focuses explicitly on the computational aspects of LM n and the problem of obtaining its asymptotic distribution will be examined elsewhere.
In the following, we examine the finite sample behavior of the score test under both cases of identifiable and non identifiable parameters.

Simulation study
We present a limited simulation study to demonstrate empirically the theoretical results. For this work, we employ parametric bootstrap based either on the Poisson distribution or the negative binomial distribution. More specifically, given the data, we estimate by QMLE the parameters of model (2.5) and calculate the score test statistic LM n given by In the case of a non identifiable parameter, the score test is computed by choosing a grid for the values of the non identifiable parameter and then take the supremum over this grid. We use B = 499 bootstrap replicates and 200 simulations for various sample sizes.

Results for identifiable models
We consider the size of the proposed score test statistic. Recall model (2.5) and choose (d, a 1 , b 1 ) = (1.5, 0.05, 0.6) and n = 250, 500 and 1000. We maximize the quasi log-likelihood function (2.9) by a quasi-Newton method using the constrOptim() function of R to obtain the QMLE. For selected nominal significance level α = 1%, α = 5% or α = 10% we obtain the results reported by Table 1 for the size of the test statistic (3.1). For comparison, the last three columns list the achieved significance levels of the test derived from the asymptotic chi-square distribution with one degree of freedom. The empirical results show that the bootstrap approximation performs well; in fact it yields achieved significance levels which are closer to nominal significance levels, especially for smaller sample size.
To study the power of the test statistic we work analogously under the model where γ = 0.3, 0.5, 1. Table 2-which is obtained in the same manner as Table 1shows that as γ assumes larger values, the power of the test statistic (3.1) approaches unity; see also Figure 1.

Results for non identifiable models
To examine the performance of the sup-score test statistic (3.5) for testing model (2.7), put  Table 3 shows that the test statistic, generally, achieves its nominal significance level. However, larger sample sizes yield to more accurate approximation of the nominal level. Table 4 shows that the power of the test statistic is relatively  large for increasing sample sizes and for values of γ close to zero provided that the parameter c 1 is also of large magnitude. Consider next the threshold model which specifies the time series mean process by model (2.8). The empirical results for testing (3.6) are based on the Poisson assumption. The sup-score test statistic is computed by defining a grid of values for the threshold parameter r. Following the suggestion of [46], let the grid defined by a sequence of ten equidistant values in the interval [q 1 , q 2 ], Table 4 Empirical power for test statistic (3.5) for testing ( where q 1 and q 2 are set to be respectively the empirical 0.2 and 0.8 quantile of each time series replication. Table 5 (respectively, Table 6) reports simulation results on the size (respectively, power) of the test statistic. The approximation of nominal significance levels improves for large sample sizes. Furthermore, the power of the test statistic increases for values of parameters a 2 and b 2 of large magnitude.

Example
We consider a time series of weekly number of measles at Sheffield for the period between September 8th, 1978 and April 17th, 1987. The total number of observations is 450. The sample mean (sample variance, respectively) of these data is 17.151 (265.781, respectively), showing strong overdispersion. Employing the quasi-likelihood methodology as outlined in Section 2.2, we obtain the QMLE of the regression parameters for each of models (2.5)-(2.8). Table 7 summarizes the results. We also report standard errors (in parentheses) where the first row shows    To test for linearity for each of these models, we use B = 499 bootstrap replications and calculate the p-values for each model and for each distributional assumption. Table 8 reports the results and indicates that the linear model (2.5) is always accepted. This is in accordance with the results obtained by Table 7 when we compare the estimators with their standard errors. To support further this conclusion we have calculated the scoring rules considered by [34,10,13]. All results obtained by fitting different models under different response distributions are close to the results obtained by [10] and they are not reported again. However, all these methods point out to the suitability of the linear mixed Poisson model even though some non-stationarity is quite evident. and the first m 1 components yield Thus, we have that It holds that KG K = G 21  Consider now the misspecified model where the data are generated from the mixed Poisson model. Following [26] and [27, pp. 126], (see also [5,36,21]), the score statistic is given by LM n = S n (θ n )G −1 K (KG −1 G 1 G −1 K ) −1 KG −1 S n (θ n ) = S (2) n (θ n )KG −1 K (KG −1 G 1 G −1 K ) −1 KG −1 K S (2) n (θ n ). Some calculations yield that KG −1 where Σ MP is given by (3.2), that is Σ MP = G 1,22 − G 21 G −1 11 G 1,12 − G 1,21 G −1 11 G 12 + G 21 G −1 11 G 1,11 G −1 11 G 12 and thus, the score test statistic is given by LM n = S (2) n (θ n ) Σ . Assume now that the data are generated again from a Poisson model under the local Pitman-type alternatives H 1 : θ (2) = θ (2) 0 + n −1/2 δ. In order to obtain the limiting distribution of the score test statistic (3.1) under the alternative, we only need the asymptotic distribution of the score function under H 1 . By a Taylor expansion of S n (θ 0 ) about θ n = θ 0 + n −1/2 δ * , where now δ * is a fixed vector in R m + of the form δ * = (δ 1 , δ), we have that n −1/2 S n (θ 0 ) op (1) = n −1/2 S n (θ n ) − n −1 H n (θ n )δ * .