Goodness‐of‐fit tests for Poisson count time series based on the Stein–Chen identity

To test the null hypothesis of a Poisson marginal distribution, test statistics based on the Stein–Chen identity are proposed. For a wide class of Poisson count time series, the asymptotic distribution of different types of Stein–Chen statistics is derived, also when multiple statistics are applied jointly. The performance of the tests is analyzed with simulations, as is the question of which Stein–Chen functions should be used for which alternative. Illustrative data examples are presented, and possible extensions of the novel Stein–Chen approach are discussed as well.


INTRODUCTION
Stein (1972) introduced an approach for bounding the error when approximating distributions, nowadays referred to as "Stein's method." In the special case of the Poisson approximation of a count distribution (i.e., a distribution on the set N_0 = {0, 1, …}), which was developed by Chen (1975), it is referred to as the "Stein-Chen method" (Erhardsson, 2005; Stein, 1986). The Stein-Chen characterization of a Poisson-distributed count random variable X with mean μ > 0, X ∼ Poi(μ), uses expectations of functions f : N_0 → R defined on the nonnegative integers. More precisely, the (univariate) Stein-Chen identity states that

μ E[f(X + 1)] = E[X f(X)]     (1)

holds for any bounded f if and only if X ∼ Poi(μ). Note that the necessity part of the Stein-Chen identity holds for any function f : N_0 → R, provided that the involved moments exist. Besides its original application within Poisson approximations, Weiß and Aleksandrov (2020) demonstrated that Equation (1) also constitutes a useful tool for computing diverse Poisson moments. In the present work, however, our aim is different: we use the Stein-Chen identity (1) to derive novel goodness-of-fit (GoF) tests for the Poisson distribution; see Beltrán-Beltrán and O'Reilly (2019) and Gürtler and Henze (2000) for references on Poi-GoF tests. Generally, there are two strategies for defining a GoF test statistic: one can try to capture (almost) the whole distribution, or one can focus on specific characteristic properties thereof. The first group comprises, for example, the famous Pearson statistic, while the second group includes the widely used Fisher's index of dispersion (Beltrán-Beltrán & O'Reilly, 2019; Gürtler & Henze, 2000; Kyriakoussis, Li, & Papadopoulos, 1998). Tests from the first group offer the potential of being powerful against nearly any violation of the null hypothesis, but they are often difficult to implement in practice, especially when dealing not with independent and identically distributed (i.i.d.) data but with time series data (Weiß, 2018b).
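To make the characterization concrete, the identity can be checked numerically by truncated summation: both sides of (1) agree under a Poisson distribution but differ, for example, under a geometric distribution with the same mean. A minimal sketch (function names and the truncation bound are ours):

```python
import math

def poi_pmf(x, mu):
    # Poisson probability mass function
    return math.exp(-mu) * mu ** x / math.factorial(x)

def expect(pmf, h, M=100):
    # truncated expectation E[h(X)] = sum_{x=0}^{M} h(x) * P(X = x)
    return sum(h(x) * pmf(x) for x in range(M + 1))

mu = 2.0
f = lambda x: math.exp(-x)      # a bounded weight function

# Poisson case: both sides of the Stein-Chen identity agree
lhs = mu * expect(lambda x: poi_pmf(x, mu), lambda x: f(x + 1))
rhs = expect(lambda x: poi_pmf(x, mu), lambda x: x * f(x))

# geometric distribution with the same mean: the identity fails
p = 1 / (1 + mu)
lhs_g = mu * expect(lambda x: p * (1 - p) ** x, lambda x: f(x + 1))
rhs_g = expect(lambda x: p * (1 - p) ** x, lambda x: x * f(x))
```

The clearly nonzero discrepancy in the non-Poisson case is exactly what the GoF statistics developed below exploit.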
The tests from the second group, by contrast, often have feasible asymptotic distributions also in the time series case. They are sensitive only to selected alternative scenarios (e.g., the dispersion test can detect alternatives that go along with changes in the dispersion structure), but for these, they often exhibit excellent power.
The Stein-Chen characterization (1) of the Poi-distribution might be used to construct a GoF test belonging to the first group, as it was done by Betsch, Ebner, and Nestmann (2020) for the i.i.d. case, but this test requires a bootstrap implementation. In this article, by contrast, we use the Stein-Chen identity to develop GoF tests from the second group, but with great flexibility regarding the covered alternatives, and with a unique approach for computing the test statistics' asymptotic distribution (also comprising the time series case). The outstanding feature of our proposed Stein-Chen tests is the fact that the function f in (1), that is, the weighting scheme used for the expectations, is chosen flexibly with respect to the relevant alternative scenario. Furthermore, we apply our novel GoF tests not only to i.i.d. counts, but also to a rather wide class of time series models having a Poisson marginal distribution. The relevant types of data generating process (DGP) have the property that any pair (X t , X t−k ) with time lag k ∈ N = {1, 2, … } follows a bivariate Poisson (BPoi) distribution (Johnson, Kotz, & Balakrishnan, 1997, chapter 37), and corresponding models can be found within the Poi-INARMA family (Poisson INteger-valued AutoRegressive Moving Average), see Section 2 for details. In Section 3, these pairwise BPoi-distributions, together with the bivariate Stein-Chen identities of Weiß and Aleksandrov (2020), are used to derive the asymptotic distribution of our novel GoF test statistics. The performance of the resulting Stein-Chen GoF tests is investigated with simulations for diverse types of DGPs and alternative scenarios, see Section 5. In Section 6, we consider the joint application of different Stein-Chen tests, which might be used for further improving the power performance. The proposed approaches are also illustrated by several real-data examples in Sections 3 and 6. 
Finally, Section 7 concludes the article and outlines issues for future research.

ON POISSON INARMA PROCESSES
INARMA models have been proposed as integer-valued counterparts to the ordinary ARMA models for real-valued time series. Their basic idea is to replace the multiplications in the ARMA model's recursion by an appropriate type of thinning operation such that the integer nature of the counts is preserved (Weiß, 2018a). For the Poi-INARMA models considered here, one uses the operation of binomial thinning "∘" (Steutel & van Harn, 1979): α ∘ X | X ∼ Bin(X, α) with thinning parameter α ∈ [0; 1). Then, the INAR(p) model with p ∈ N (Alzaid & Al-Osh, 1990) is defined by the recursion

X_t = α_1 ∘_t X_{t−1} + ⋯ + α_p ∘_t X_{t−p} + ε_t,     (2)

where the innovations (ε_t)_Z are i.i.d. count random variables. We added the subscript "t" to the p thinnings in (2) to express that these are applied at time t for generating the tth observation, X_t. The thinnings α_1 ∘_t X_{t−1}, …, α_p ∘_t X_{t−p} at time t are executed independently of each other, and independently of the innovations (ε_t)_Z. But thinnings corresponding to different times might be mutually dependent: in the model proposed by Alzaid and Al-Osh (1990), the conditional distribution of (α_1 ∘_{t+1} X_t, …, α_p ∘_{t+p} X_t) given X_t (these are the thinnings of X_t used for the observations at times t + 1, …, t + p) is multinomial. In this case, Poisson innovations lead to a Poisson marginal distribution, and any pair satisfies

(X_t, X_{t−k}) ∼ BPoi(μρ(k); μ(1 − ρ(k)), μ(1 − ρ(k))) =: BPoi_{μ,ρ(k)}.     (3)

Here, BPoi(λ_0; λ_1, λ_2) refers to the bivariate Poisson distribution with parameters λ_0 ≥ 0 and λ_1, λ_2 > 0, which is defined as the distribution of (Z_0 + Z_1, Z_0 + Z_2) with independent Poisson variates Z_i ∼ Poi(λ_i) for i = 0, 1, 2 (Johnson et al., 1997, chapter 37). The autocorrelation function (ACF) involved in (3) equals ρ(k) = α_1^k if p = 1; for p = 2, more involved expressions are provided by Alzaid and Al-Osh (1988). Note that the type of Poi-INAR(p) model proposed by Du and Li (1991), where (α_1 ∘_{t+1} X_t, …, α_p ∘_{t+p} X_t) given X_t are conditionally independent (and not multinomial as in the model by Alzaid & Al-Osh, 1990), does not satisfy the "BPoi_{μ,ρ(k)}-property" (3) for p ≥ 2; in fact, it does not even have a Poisson marginal distribution. A comprehensive analysis of Poi-INMA(q) models is provided by Weiß (2008).
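For illustration, a Poi-INAR(1) path can be simulated directly from the recursion with binomial thinning; with Poi(μ(1 − α)) innovations, the marginal distribution is Poi(μ) and the ACF is ρ(k) = α^k. A minimal sketch (function names, seed, and sample size are ours):

```python
import math
import random

random.seed(1)

def poisson_rv(lam):
    # inversion sampler for Poisson(lam)
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def bin_thin(x, alpha):
    # binomial thinning: each of the x units survives independently w.p. alpha
    return sum(random.random() < alpha for _ in range(x))

def sim_poi_inar1(n, mu, alpha, prerun=100):
    # X_t = alpha o_t X_{t-1} + eps_t with Poi(mu*(1-alpha)) innovations,
    # which yields a Poi(mu) marginal distribution
    lam, x, xs = mu * (1 - alpha), 0, []
    for t in range(n + prerun):
        x = bin_thin(x, alpha) + poisson_rv(lam)
        if t >= prerun:
            xs.append(x)
    return xs

xs = sim_poi_inar1(n=20000, mu=2.0, alpha=0.5)
n, xbar = len(xs), sum(xs) / len(xs)
# empirical lag-1 autocorrelation, which should be close to alpha
acf1 = (sum((xs[t] - xbar) * (xs[t - 1] - xbar) for t in range(1, n))
        / sum((x - xbar) ** 2 for x in xs))
```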
They rely on the model recursion

X_t = β_0 ∘_t ε_t + β_1 ∘_t ε_{t−1} + ⋯ + β_q ∘_t ε_{t−q},

where the q + 1 thinnings at time t are performed independently of each other (commonly, one sets β_0 = 1 by using the convention 1 ∘ X = X). With the notation β_• := ∑_{j=0}^{q} β_j, it holds that Poi(μ/β_•)-innovations lead to Poi(μ)-observations. The pairwise bivariate distributions and the ACF further depend on the actual choice of the conditional distribution of (β_0 ∘_t ε_t, β_1 ∘_{t+1} ε_t, …, β_q ∘_{t+q} ε_t) given ε_t; see Weiß (2008) for details. But in any case, the relation (3) still holds true.
Hence, several types of Poi-INARMA processes have the property that pairs (X_t, X_{t−k}) are bivariately Poisson distributed. Their distribution BPoi_{μ,ρ(k)} is even symmetric, because the λ_1- and λ_2-parameters agree with each other due to stationarity. This symmetry is later used to simplify the asymptotic derivations. In particular, it allows us to simplify the bivariate Stein-Chen identities in theorem 4.1 of Weiß and Aleksandrov (2020), leading to necessary conditions for (X_t, X_{t+k}) being BPoi_{μ,ρ(k)}-distributed, provided that the involved expectations exist.
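The symmetric bivariate Poisson pairs can also be generated directly from the additive construction: with Z_0 ∼ Poi(μρ) and independent Z_1, Z_2 ∼ Poi(μ(1 − ρ)), the pair (Z_0 + Z_1, Z_0 + Z_2) has Poi(μ) marginals and correlation ρ. A small simulation check (names, seed, and sample size are ours):

```python
import math
import random

random.seed(5)

def poisson_rv(lam):
    # inversion sampler for Poisson(lam)
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def bpoi_rv(mu, rho):
    # symmetric BPoi pair: (Z0 + Z1, Z0 + Z2) with Z0 ~ Poi(mu*rho)
    # and independent Z1, Z2 ~ Poi(mu*(1 - rho))
    z0 = poisson_rv(mu * rho)
    return z0 + poisson_rv(mu * (1 - rho)), z0 + poisson_rv(mu * (1 - rho))

mu, rho, reps = 2.0, 0.4, 40000
pairs = [bpoi_rv(mu, rho) for _ in range(reps)]
m1 = sum(a for a, b in pairs) / reps
m2 = sum(b for a, b in pairs) / reps
cov = sum((a - m1) * (b - m2) for a, b in pairs) / reps
v1 = sum((a - m1) ** 2 for a, b in pairs) / reps
v2 = sum((b - m2) ** 2 for a, b in pairs) / reps
corr = cov / math.sqrt(v1 * v2)   # should be close to rho
```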

STEIN-CHEN GOF TESTS
One of the most well-known characterizations of the Poisson distribution is its equidispersion property, that is, its variance equals its mean, whereas common non-Poisson distributions often have a variance being larger (overdispersion) or smaller (underdispersion) than the mean. Thus, to test the null hypothesis of a Poisson distribution, one may focus on testing for equidispersion. This is the idea behind the widely used Fisher's index of dispersion (Beltrán-Beltrán & O'Reilly, 2019; Gürtler & Henze, 2000; Kyriakoussis et al., 1998), which has also been used for some types of Poi-INARMA processes, see Aleksandrov and Weiß (2020a) and the references therein. The dispersion index Î is the quotient of the empirical variance and the empirical mean, which can be rewritten as

Î = \overline{X(X−1)}/X̄ + 1 − X̄,     (8)

where \overline{X(X−1)} = n^{−1} ∑_{t=1}^{n} X_t(X_t − 1) denotes the corresponding sample mean. The equidispersion property itself can be stated as

E[X(X − 1)] = μ².     (9)

Equations (8) and (9) show that Î and the equidispersion property correspond to the Stein-Chen identity (1) for the specific choice f(x) = x − 1. While the dispersion index (and thus the choice f(x) = x − 1 in (1)) is known to perform very well for uncovering overdispersion (this is confirmed later in Section 5), there might be better choices of f for other types of alternative, that is, choices leading to improved power.
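The rewriting of Î as a Stein-Chen-type sample moment can be verified on any count sample; a minimal sketch (variable names are ours), using the empirical variance with denominator n:

```python
data = [3, 0, 2, 5, 1, 1, 4, 2, 0, 3]            # an arbitrary count sample
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / n      # empirical variance (denominator n)
I_hat = s2 / xbar                                # Fisher's index of dispersion
xx1 = sum(x * (x - 1) for x in data) / n         # sample mean of X(X - 1)
I_stein = xx1 / xbar + 1 - xbar                  # Stein-Chen rewrite, f(x) = x - 1
```

The two expressions agree exactly, since s² = \overline{X(X−1)} + X̄ − X̄².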

Stein-Chen statistics
In what follows, we utilize the general Stein-Chen identity (1) to derive GoF tests for detecting deviations from a Poisson distribution, given a count time series X_1, …, X_n of length n ∈ N. Therefore, we replace the theoretical moments involved in (1) by the corresponding sample moments,

\overline{f(X+1)} = n^{−1} ∑_{t=1}^{n} f(X_t + 1),  \overline{X f(X)} = n^{−1} ∑_{t=1}^{n} X_t f(X_t),  X̄ = n^{−1} ∑_{t=1}^{n} X_t.

At this point, the main idea behind the proposed Stein-Chen approach becomes clear: the function f serves as a kind of weighting function, which gives a large weight to, for example, low counts or a certain segment of counts. Several example choices for f are discussed later in Section 5; the dispersion index (8) uses the linear function f(x) = x − 1.
There are many different ways of defining test statistics via (1). For each considered statistic, we shall derive a normal approximation based on asymptotic considerations (under the Poisson null hypothesis). But since it is not clear in advance for which type of statistic such a normal approximation performs best, we include several candidates, T̂_1, …, T̂_4, in our investigations; see (10) for their definitions. If the value of the test statistic T̂_1 or T̂_2 deviates significantly from 1, then the DGP is supposed not to be Poisson distributed; in the case of T̂_3 and T̂_4, we look for departures from 0. For the ratio statistics T̂_1, T̂_2, T̂_4, possible issues regarding a division by zero have to be kept in mind. Recall that the test statistic T̂_4 is equivalent (up to a constant) to the ordinary dispersion index (8) if choosing f(x) = x − 1 (linear weighting scheme), whereas T̂_2 then corresponds to the modified dispersion statistic of Kyriakoussis et al. (1998). Since the dispersion test is one of the most widely used Poi-GoF tests in practice (also for Poi-INARMA processes, see Aleksandrov & Weiß, 2020a), it constitutes the natural benchmark test in this work.
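Since the display (10) with the exact definitions is not reproduced here, the following sketch only illustrates the construction principle with one plausible ratio-type candidate built from the sample moments above; the function names and the exact normalization are our own assumptions, not the definitions from (10):

```python
import math

def sample_moments(xs, f):
    # sample counterparts of E[f(X + 1)], E[X f(X)], and E[X]
    n = len(xs)
    m_f = sum(f(x + 1) for x in xs) / n
    m_xf = sum(x * f(x) for x in xs) / n
    xbar = sum(xs) / n
    return m_f, m_xf, xbar

def steinchen_ratio(xs, f):
    # ratio-type candidate: close to 1 under the Poisson null, since the
    # Stein-Chen identity gives E[X f(X)] = mu E[f(X + 1)];
    # beware a possible division by zero for unlucky samples
    m_f, m_xf, xbar = sample_moments(xs, f)
    return m_xf / (xbar * m_f)

data = [3, 0, 2, 5, 1, 1, 4, 2, 0, 3]
ratio_exp = steinchen_ratio(data, lambda x: math.exp(-x))
ratio_lin = steinchen_ratio(data, lambda x: x - 1)
```

With f(x) = x − 1, this ratio reduces to \overline{X(X−1)}/X̄², that is, to the modified dispersion statistic mentioned above.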

Univariate asymptotics
We start our asymptotic investigations with a central limit theorem (CLT) for the three-dimensional vector-valued process (Y_t)_Z given by (11), which collects the centered versions of f(X_t + 1), X_t f(X_t), and X_t. Under the null X_t ∼ Poi(μ), because of the Stein-Chen identity (1), we have E[Y_t] = 0. Furthermore, we assume that the DGP (under the null) is such that the CLT of Ibragimov (1962) can be applied to n^{−1/2} ∑_{t=1}^{n} Y_t. This is the case if the process is α-mixing with exponentially decreasing weights, as satisfied by the above Poi-INAR(p) process (Doukhan, Fokianos, & Li, 2012) (the Poi-INMA(q) process is even q-dependent), but the requirements in Ibragimov (1962) are even slightly weaker. The following lemma summarizes the resulting asymptotics.
Lemma 1. Let the DGP (X_t)_Z satisfy (X_t, X_{t−k}) ∼ BPoi_{μ,ρ(k)} as well as the aforementioned mixing conditions. Then, n^{−1/2} ∑_{t=1}^{n} Y_t according to (11) is asymptotically normally distributed with mean 0 and covariance matrix Σ = (σ_ij).

The proof of Lemma 1 can be found in Appendix A.1. In the following Theorem 1, we derive the asymptotics of the statistics T̂_1, …, T̂_4 from (10). To allow for improved size properties of the corresponding GoF tests (relevant for small n), we offer bias-corrected formulas for the means of T̂_1, …, T̂_4.

Theorem 1. Let the DGP (X_t)_Z satisfy the assumptions of Lemma 1. Then, the distribution of the statistic T̂_i, i = 1, …, 4, according to (10) can be approximated by the normal distribution N(μ_{T̂_i}, σ²_{T̂_i}/n), where the bias-corrected means μ_{T̂_i} and the variances σ²_{T̂_i} are expressed in terms of the quantities σ_ij provided by Lemma 1.
Theorem 1 follows from Lemma 1 by utilizing Taylor expansions of the statistics ("Delta method"); the proof can be found in Appendix A.2. The required values of μ_i and σ_ij(k) can be computed numerically by truncating the involved infinite sums after M summands, with M sufficiently large. Further implementation details are provided in Section 4.
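To illustrate the truncation, the Poisson moments entering the asymptotic formulas can be approximated by finite sums; a minimal sketch (function names and the choice M = 50 are ours), using the weighting scheme f(x) = exp(−x):

```python
import math

def poi_pmf(x, mu):
    # Poisson probability mass function
    return math.exp(-mu) * mu ** x / math.factorial(x)

def poi_expect(h, mu, M):
    # truncated Poisson expectation: sum_{x=0}^{M} h(x) * P(X = x)
    return sum(h(x) * poi_pmf(x, mu) for x in range(M + 1))

mu, M = 2.0, 50
f = lambda x: math.exp(-x)
m1 = poi_expect(lambda x: f(x + 1), mu, M)   # approximates E[f(X + 1)]
m2 = poi_expect(lambda x: x * f(x), mu, M)   # approximates E[X f(X)]
```

As a sanity check, the truncated moments satisfy μ·m1 ≈ m2 (Stein-Chen identity), and m1 matches the closed form e^{−1} exp(μ(e^{−1} − 1)) obtained from the Poisson probability generating function.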

Multivariate asymptotics
For a joint application of multiple Stein-Chen GoF statistics, their joint (asymptotic) distribution is required for computing the resulting size. Therefore, the asymptotics of Section 3 have to be extended in the following way. Let us focus on the case of two Stein-Chen functions f, g : N_0 → R; a further extension to more than two functions is straightforward. First, we substitute the three-dimensional definition of (Y_t)_Z according to (11) by the five-dimensional one in (12), which adds the centered versions of g(X_t + 1) and X_t g(X_t). If the DGP (under the null) satisfies the mixing conditions of Lemma 1, then n^{−1/2} ∑_{t=1}^{n} Y_t is asymptotically normally distributed.

Lemma 2. Let the DGP (X_t)_Z satisfy the assumptions of Lemma 1. Then, n^{−1/2} ∑_{t=1}^{n} Y_t according to (12) is asymptotically normally distributed with mean 0 and covariance matrix Σ = (σ_ij), where
• σ_11, σ_22, σ_33, σ_12, σ_13, σ_23 are the same as in Lemma 1,
• σ_44, σ_55, σ_14, σ_15, σ_45 agree with σ_22, σ_33, σ_12, σ_13, σ_23, respectively, after having replaced f by g, and where the remaining entries σ_24, σ_25, σ_34, σ_35 involve both f and g.

The proof of Lemma 2 is provided in Appendix A.3. In the second step, we apply the statistics T̂_1, …, T̂_4 from (10) to the Stein-Chen functions f, g and derive their joint asymptotic distributions in analogy to our derivation of Theorem 1. To keep it simple, we use the same type of statistic for both f and g, but the derivations are easily adapted to the case of different statistics as well. Since the bias corrections and asymptotic variances are already provided by Theorem 1, the subsequent Theorem 2 concentrates on the asymptotic cross-covariance of both statistics.
Theorem 2. Let the DGP (X_t)_Z satisfy the assumptions in Lemma 1, and let T̂_{f;i} express that statistic T̂_i is combined with function f. Then, the joint distribution of the statistics (T̂_{f;i}, T̂_{g;i}) can be approximated by the bivariate normal distribution with the bias-corrected means (μ_{T̂_{f;i}}, μ_{T̂_{g;i}}) according to Theorem 1, and with the covariance matrix Σ/n, where Σ has the diagonal entries σ²_{T̂_{f;i}} and σ²_{T̂_{g;i}} from Theorem 1.

For the proof of Theorem 2, see Appendix A.4. In the special case of i.i.d. Poisson counts, the cross-covariance takes an explicit closed form. Joint tests based on the multivariate asymptotics derived before are further discussed in Section 6.

IMPLEMENTATION OF STEIN-CHEN GOF TESTS
Before studying the performance of the proposed Stein-Chen GoF tests by simulations, let us briefly discuss their implementation in practice. This is done step by step with illustrative examples, starting with the simplest case of i.i.d. counts, continuing with processes having finitely many nonvanishing ACF values (such as, e.g., INMA(1) counts), and ending with INAR(1) counts, where all pairs (X_t, X_{t−k}) are dependent for any lag k ∈ N.

Application to i.i.d. counts
In the case of an i.i.d. DGP, the formulas derived in Lemma 1 and Theorem 1 simplify considerably, because the infinite sums ∑_{k=1}^{∞} … completely vanish, that is, the asymptotic covariances in Lemma 1 reduce to their lag-0 parts. As stated before, the involved moments μ_i and σ_ij(0) have to be approximated by truncating the summation after the Mth summand, with M being sufficiently large. We decided to set M equal to the (1 − 10^{−7})-quantile of the hypothetical Poisson distribution. Let us illustrate this approach and the whole application of the Stein-Chen tests with a first data example. We analyze a time series of n = 156 strike counts (for 4-week periods in 1948-1959) in the U.K. coal mining industry, which have been published as a frequency table in Kendall (1961, table 7) and are, thus, treated as being i.i.d. These data have previously been used to illustrate models for underdispersed counts (see section 2.3 in Weiß, 2013, for details and references), as the sample variance ≈0.742 is considerably smaller than the mean ≈0.994. To check for a significant violation of the Poisson null hypothesis, let us apply the (two-sided) Stein-Chen test statistic T̂_2 with function f(x) = exp(−x) (this choice is supported by our simulation study later in Section 5.1). Plugging in the sample mean 0.994 instead of μ, the (1 − 10^{−7})-quantile equals M = 10, which is used as the upper summation bound for computing the required μ_i and σ_ij(0).
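The choice of M as the (1 − 10^{−7})-quantile is easy to implement by accumulating the Poisson cdf; a minimal sketch (the function name is ours):

```python
import math

def poisson_quantile(p, mu):
    # smallest m with P(X <= m) >= p for X ~ Poi(mu)
    pmf = math.exp(-mu)
    cdf, m = pmf, 0
    while cdf < p:
        m += 1
        pmf *= mu / m
        cdf += pmf
    return m

# strike-counts example: plug in the sample mean 0.994 for mu
M = poisson_quantile(1 - 1e-7, mu=0.994)
```

For the strike counts, plugging in the sample mean 0.994 indeed yields M = 10.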
Plugging these into Theorem 1, the asymptotic mean and standard deviation of T̂_2 are given by ≈1.006 and ≈0.083, respectively, whereas the test statistic itself takes the value ≈1.264. So we have to reject the Poisson null hypothesis at the 5%-level in favor of an underdispersed model for the strike counts data.

Application to INMA(1) counts
The asymptotics of Lemma 1 and Theorem 1 do not only simplify in the i.i.d. case. For the Poi-INMA(1) DGP X_t = ε_t + β ∘_t ε_{t−1}, which also satisfies the conditions of Lemma 1, X_t and X_{t−k} are independent if k ≥ 2. Therefore, only the summand for k = 1 in the sums ∑_{k=1}^{∞} … in Lemma 1 is nonzero. Compared with the i.i.d. case, we additionally have to compute the σ_ij(1), while ρ(1) = β/(1 + β) (Aleksandrov & Weiß, 2020a).
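For illustration, a Poi-INMA(1) path can be simulated from its recursion and the empirical lag-1 autocorrelation compared with β/(1 + β); a minimal sketch (function names, seed, and sample size are ours):

```python
import math
import random

random.seed(2)

def poisson_rv(lam):
    # inversion sampler for Poisson(lam)
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def bin_thin(x, beta):
    # binomial thinning of x with probability beta
    return sum(random.random() < beta for _ in range(x))

def sim_poi_inma1(n, mu, beta):
    # X_t = eps_t + beta o_t eps_{t-1}; Poi(mu/(1+beta)) innovations
    # give Poi(mu) marginals and rho(1) = beta/(1+beta)
    lam = mu / (1 + beta)
    eps_prev, xs = poisson_rv(lam), []
    for _ in range(n):
        eps = poisson_rv(lam)
        xs.append(eps + bin_thin(eps_prev, beta))
        eps_prev = eps
    return xs

xs = sim_poi_inma1(n=20000, mu=2.0, beta=0.5)
n, xbar = len(xs), sum(xs) / len(xs)
acf1 = (sum((xs[t] - xbar) * (xs[t - 1] - xbar) for t in range(1, n))
        / sum((x - xbar) ** 2 for x in xs))   # should be near 0.5/1.5
```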
As an example, let us look at the time series "1*" from Freeland (1998), where claimants from the logging industry with burn-related injuries are counted on a monthly basis (1985-1994, so n = 120). In Aleksandrov and Weiß (2020a), it was shown that these data are well described by a Poi-INMA(1) model, although the sample variance ≈0.162 is slightly smaller than the mean ≈0.175. To cross-check this result, we apply the Stein-Chen test statistic T̂_2 with f(x) = exp(−x), which takes the value ≈1.053. Using again moment estimates, the two-sided critical values are computed from an approximate normal distribution, whose mean and variance have to account for the apparent INMA(1) autocorrelation (ρ̂(1) ≈ 0.222). We get μ_{T̂_2} ≈ 1.008 and σ_{T̂_2}/√n ≈ 0.088 (compared with ≈1.006 and ≈0.084, respectively, under an i.i.d. assumption), such that we cannot reject the Poisson null hypothesis, in accordance with the findings of Aleksandrov and Weiß (2020a).

Application to INAR(1) counts
For Poi-INAR processes, the pairs (X_t, X_{t−k}) are dependent for any lag k ∈ N, that is, the sums ∑_{k=1}^{∞} … in Lemma 1 are truly infinite. Nevertheless, some simplifications are possible also for Poi-INAR processes. If, for example, the DGP (X_t)_Z is a Poi-INAR(1) process according to Section 2, so X_t = α ∘_t X_{t−1} + ε_t, then the conditions of Lemma 1 and Theorem 1 are satisfied. Furthermore, the expression for σ_11 (and thus also for σ_12, σ_13) can be simplified by using the geometric sum formula. For the remaining σ_ij, however, a closed-form solution is not possible, that is, we additionally have to truncate the sums ∑_{k=1}^{∞} … after N summands, with N sufficiently large. We implemented a dynamic choice of N, where N is not further increased once the Euclidean distance between successive truncation approximations becomes negligibly small. Let us discuss the emergency counts in section 3.1 of Weiß (2013) for illustration. The length of this time series is n = 96, its sample mean is ≈2.563, while the sample variance ≈1.870 is considerably smaller. The ACF is of AR(1)-type with a rather large lag-1 value, ρ̂(1) ≈ 0.664, such that Weiß (2013) decided to use an INAR(1) model for these data. To test the null hypothesis of a Poi-INAR(1) model, we use the dispersion test as well as the Stein-Chen test statistic T̂_2 with f(x) = exp(−x). The asymptotic implementation of the tests uses the Poi-INAR(1) moment estimates. Although the dispersion index Î ≈ 0.722 is quite low, we cannot reject the null hypothesis. This is due to the large standard error σ_I/√n ≈ 0.232 (while μ_I ≈ 0.948), which is caused by the large ACF in combination with the low sample size. This decision is confirmed by the T̂_2-test, which leads to T̂_2 ≈ 1.237 while μ_{T̂_2} ≈ 1.058 and σ_{T̂_2}/√n ≈ 0.197. To achieve the latter values, the upper summation limit was chosen as N = 31, such that altogether 2 + 3 × 32 = 98 moments had to be computed, causing a lot of programming work.
Therefore, for DGPs with a nonvanishing ACF, one may think of an alternative to the asymptotic implementation, namely, a parametric bootstrap solution. Detailed investigations of INAR bootstrap schemes can be found in Jentsch and Weiß (2019) and Weiß and Jentsch (2019). The bootstrap solution requires less programming work, but at the cost of increased computing times. Given the moment estimates for the emergency counts time series, B = 500 bootstrap time series of length n = 96 are generated from the fitted Poi-INAR(1) model and used to compute replicates of the statistic T̂_2 with f(x) = exp(−x). Then, we compute the 2.5%-quantile of these replicated statistics as the lower critical value, leading to ≈0.702 (asymptotic: ≈0.672), and the 97.5%-quantile as the upper critical value, ≈1.511 (asymptotic: ≈1.445). Thus, T̂_2 ≈ 1.237 again does not lead to a rejection. This type of INAR(1) bootstrap implementation is further investigated in our simulations in Section 5.3, while we consider the asymptotic implementations for the i.i.d. and INMA(1) DGPs in Section 5.
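The bootstrap scheme just described can be sketched in a few lines; note that the ratio-type statistic below is our own stand-in for T̂_2 (its exact definition in (10) is not reproduced here), and all names, the seed, and the quantile indexing are ours as well:

```python
import math
import random

random.seed(3)

def poisson_rv(lam):
    # inversion sampler for Poisson(lam)
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def bin_thin(x, alpha):
    return sum(random.random() < alpha for _ in range(x))

def sim_poi_inar1(n, mu, alpha, prerun=100):
    # Poi-INAR(1): X_t = alpha o_t X_{t-1} + eps_t, eps_t ~ Poi(mu*(1-alpha))
    lam, x, xs = mu * (1 - alpha), 0, []
    for t in range(n + prerun):
        x = bin_thin(x, alpha) + poisson_rv(lam)
        if t >= prerun:
            xs.append(x)
    return xs

def ratio_statistic(xs, f):
    # our stand-in for a ratio-type Stein-Chen statistic; close to 1 under the null
    n = len(xs)
    m_f = sum(f(x + 1) for x in xs) / n
    m_xf = sum(x * f(x) for x in xs) / n
    return m_xf / ((sum(xs) / n) * m_f)

# parametric bootstrap with the moment estimates of the emergency counts
mu_hat, alpha_hat, n, B = 2.563, 0.664, 96, 500
f = lambda x: math.exp(-x)
reps = sorted(ratio_statistic(sim_poi_inar1(n, mu_hat, alpha_hat), f)
              for _ in range(B))
lo, hi = reps[int(0.025 * B)], reps[int(0.975 * B) - 1]   # bootstrap critical values
```

The resulting quantiles play the role of the lower and upper critical values for the null distribution of the statistic.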

RESULTS FROM A SIMULATION STUDY
To analyze the performance of the Stein-Chen GoF tests proposed in Section 3, we did a comprehensive simulation study. Relevant excerpts thereof are presented in the sequel, while further simulation results are available from the authors upon request. In Sections 5.1 and 5.2, where the GoF tests are implemented based on the asymptotic results of Theorem 1, we use 10^4 replications per scenario. The simulations are done for the means μ ∈ {2, 5} and for the sample sizes n ∈ {100, 250, 500, 1000}. Later, in Section 5.3, when considering the computer-intensive bootstrap implementations, we had to reduce the number of replications to 5000 and considered fewer scenarios; details are given below. The length of the prerun (if required) was chosen as 100, and the nominal level for the GoF tests is 5% throughout. We compared the performance of the statistics T̂_1, …, T̂_4 with two-sided critical values. For the asymptotic implementations based on Theorem 1, we used the plug-in approach, that is, the true parameter values needed for the asymptotic approximations are replaced by the corresponding moment estimates. Doing so, it turned out that the statistic T̂_4 has the same performance as the statistic T̂_2, as long as we plug in the sample mean instead of the true mean in Theorem 1. Therefore, T̂_4 is not considered further in the sequel.

FIGURE 1: PMF P(X = x) for diverse count distributions with mean μ = 2 (PMF, probability mass function).

One of the major aims of our simulation study was to find out which function f (weighting scheme) in combination with which statistic T̂_i leads to the best size and power values with respect to different alternative scenarios and in the presence of serial dependence. To keep the results manageable, we present them only for those functions f and statistics T̂_i where, provided a decent size, the power was among the best in almost all scenarios.
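As a scaled-down illustration of such a size simulation, the following sketch estimates the empirical size of the two-sided dispersion test for i.i.d. Poi(2) counts, using the classical asymptotic null approximation Î ≈ N(1, 2/n) for i.i.d. Poisson data (all names, the seed, and the reduced number of replications are ours):

```python
import math
import random

random.seed(4)

def poisson_rv(lam):
    # inversion sampler for Poisson(lam)
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def dispersion_index(xs):
    # Fisher's index of dispersion with denominator-n variance
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n / xbar

mu, n, reps, z = 2.0, 100, 2000, 1.96
crit = z * math.sqrt(2 / n)           # two-sided 5% band around 1
rejections = 0
for _ in range(reps):
    xs = [poisson_rv(mu) for _ in range(n)]
    if abs(dispersion_index(xs) - 1) > crit:
        rejections += 1
size = rejections / reps              # should be close to the nominal 0.05
```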
We also explain which functions f work best for which scenario, such that guidance for the choice of f is provided. As alternative scenarios, we considered the cases of equidispersion but not Poisson, underdispersion, overdispersion, and zero inflation. We considered the negative binomial (NB) distribution in the overdispersion case, the Good distribution for equidispersion and underdispersion, and the zero-inflated Poisson (ZIP) distribution for zero inflation. Background and references on the aforementioned distributions can be found in appendix A.1 of Weiß (2018a). Example plots of their probability mass function (PMF) are shown in Figure 1, where the hypothetical Poisson distribution is always plotted by gray bars.

GoF tests for i.i.d. counts
If applying the Stein-Chen tests to i.i.d. counts, the asymptotic results of Lemma 1 and Theorem 1 simplify considerably, recall Section 4.1. For i.i.d. counts in the case of an equidispersed alternative, namely the Good distribution (see Table 1), we did not only consider the dispersion index Î (corresponding to f(x) = x − 1) as a competitor, but also the function f(x) = (x − 1)(x − 2) together with the statistic T̂_2, because this leads to a statistic similar to the skewness index used by Schweer and Weiß (2016). If, in the sequel, we refer to the skewness index, we actually refer to T̂_2 with f(x) = (x − 1)(x − 2). The results in Table 1 show that we achieve the best power in the equidispersion case if using the function exp(−x) (this function was already used for the data examples in Sections 4.1-4.3). This is reasonable, as exp(−x) puts a large weight on small outcomes (where the difference between the Poisson and the equidispersed Good distribution is most apparent, recall Figure 1), whereas both x − 1 and (x − 1)(x − 2) put their weight on large outcomes (which are hardly observed for an equidispersed distribution with low μ). However, even if using exp(−x) as the weighting scheme, the power values are not particularly large for the lowest mean, μ = 2, because the Poisson and the equidispersed Good distribution are quite similar in this case (Schweer & Weiß, 2016), also see Figure 1. But as we shall see in Section 6, an appropriate type of joint Stein-Chen test allows a good power to be achieved even for μ = 2.

TABLE 1: Simulated size and power (nominal level 0.05) of different two-sided tests in the case of equidispersion.
In Table 2, results are presented for an underdispersed alternative, where the dispersion index and the skewness index again serve as the benchmark tests. For underdispersion, even more than for equidispersion, it is crucial to focus on the lower tail of the count distribution, recall Figure 1. Thus, it is plausible that the best power is again achieved by the exponential weighting scheme, f(x) = exp(−x), in combination with the statistic T̂_2 (clearly better than the dispersion and skewness index). But also the inverse weighting scheme 1/(x + 0.1), with its large weight on low counts, does quite well, whereas the remaining choices for f in Table 2 lead to less power because of too much weight in the upper tail. Hence, if aiming to uncover a certain type of alternative, it is important to identify the region of strongest deviation from the Poisson null, and to choose a function f with sufficient weight in that region. For the equidispersed and underdispersed alternatives analyzed so far, the function exp(−x), with its weight on low counts, can be recommended for achieving the best power.
In Table 3, we investigate the alternative of i.i.d. ZIP counts. Since the ZIP distribution approaches the Poisson null for σ²/μ → 1, we also present a local power analysis this time (due to the use of potentially highly nonlinear functions f, a theory-based local power analysis of the proposed GoF tests, for several choices of f and for different local alternatives, is nontrivial and not within the scope of this article). While the statistic T̂_2 was superior in the equidispersion and underdispersion scenarios, we now observe an advantage for T̂_1. But apart from the different construction of the test statistic, it is again f(x) = exp(−x) and f(x) = 1/(x + 0.1) that perform best in terms of (local) power, both putting their maximal weight on zero, where the zero inflation becomes most apparent, recall Figure 1. Note that the dispersion index does considerably worse than these two tests, which might be surprising, as zero inflation goes along with overdispersion, and the latter is well detected by the dispersion index (also see the next paragraph). But because of the isolated point mass at zero (recall Figure 1), zero inflation causes a rather irregular type of overdispersion. This confirms our guidance to choose the weighting scheme f in accordance with the region of strongest deviation from the Poisson null that one would like to detect. Finally, let us look at the regular type of overdispersion caused by an NB-alternative. In this case, we did not find any function f among the above proposals that leads to a better power than the dispersion index Î (i.e., f(x) = x − 1). Some functions perform almost as well as the dispersion index, but none does better. In Table 4, we show the NB-results for the functions |x − 1|^a with a ∈ {2, …, 0.6} for μ = 2, where a = 1 corresponds to the dispersion index (its rejection rates are printed in italic font). These functions have increasing weight for increasing x, which is reasonable for uncovering overdispersion.
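The overdispersion induced by zero inflation can be made explicit: if a ZIP variable equals 0 with extra probability ω and otherwise follows Poi(λ), its variance-to-mean ratio is 1 + ωλ. A minimal numerical check from the pmf (parameter names ω, λ and the truncation bound are ours):

```python
import math

def zip_pmf(x, omega, lam):
    # zero-inflated Poisson: extra point mass omega at zero
    p = (1 - omega) * math.exp(-lam) * lam ** x / math.factorial(x)
    return p + omega if x == 0 else p

omega, lam, M = 0.25, 2.0, 60
mean = sum(x * zip_pmf(x, omega, lam) for x in range(M + 1))
ex2 = sum(x * x * zip_pmf(x, omega, lam) for x in range(M + 1))
ratio = (ex2 - mean ** 2) / mean      # variance-to-mean ratio, equals 1 + omega*lam
```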
We see that increasing deviations from a = 1 (especially decreases) lead to a deterioration of the power, showing that the dispersion index is the best choice for detecting such regular overdispersion.
To sum up, if constructing a Stein-Chen statistic aiming to detect a particular type of deviation from the Poisson null hypothesis (such as, e.g., underdispersion or zero inflation), one should identify the region in N 0 of strongest deviation from the Poisson null, and then choose the weighting scheme f such that it puts sufficient weight into this region. This guidance is also followed for the subsequent power analyses, where the counts exhibit additional serial dependence.

GoF tests for INMA(1) counts
After having analyzed how to choose the weighting scheme f for achieving a good power, we now study the effect of additional serial dependence on the performance of the Stein-Chen GoF tests. The results for the INMA(1) case turn out to be quite similar to those of the i.i.d. case; therefore, we reduced the number of presented scenarios to save some space. Table 5 shows size and power results for the alternative of an INMA(1) DGP with equidispersed but Good-distributed innovations. Compared with the i.i.d. case in Table 1, we recognize a clearly worse power, that is, the additional serial dependence complicates the detection of this type of deviation from a Poisson distribution. Especially for the low mean μ = 2, there is virtually no power left. Nevertheless, for μ = 5, the Stein-Chen statistic with weight function exp(−x), emphasizing low counts, clearly outperforms the dispersion and the skewness index.
Also in the case of an underdispersed alternative, the power deteriorates in the presence of serial dependence, see Table 6 in comparison with Table 2. But in contrast with the equidispersion case, we still have reasonable power for detecting underdispersion in the case of an INMA(1) DGP. In particular, each of the considered Stein-Chen tests is still superior to the dispersion and skewness index. Altogether, the favorite solution appears to be the statistic T̂_2 together with the exponential weighting scheme.
Finally, let us turn to the overdispersion scenarios. For the regular type of overdispersion caused by the NB-INMA(1) alternative, we still did not find any other function f that improves on the dispersion index, recall the discussion of Table 4. We omitted plotting the results, because they are very similar to the i.i.d. case in Table 4, except for having less power with increasing dependence. For overdispersion caused by zero inflation, that is, for an INMA(1) DGP with ZIP-distributed innovations, we show some simulation results in Table 7. More precisely, we have to compare the upper part of Table 7 with the results in Table 3, because this upper part still relies on an implementation of the asymptotic results provided by Lemma 1 and Theorem 1. We see that both functions exp(−x) and 1/(x + 0.1) are superior to the dispersion index, although the actual power values are again decreased because of the apparent serial dependence. Thus, to sum up, the guidance for choosing f provided in Section 5.1 remains valid, but we have to be aware of a deterioration of power caused by the additional serial dependence.

TABLE 5: Simulated size and power (nominal level 0.05) of different two-sided tests in the case of INMA(1) equidispersion with thinning parameter β = 0.5.

By contrast with our previous analyses, this time we did a further simulation experiment, which becomes more relevant in Section 5.3. Besides the asymptotic implementation of the Stein-Chen tests, we also did a parametric bootstrap implementation in analogy to Section 4.3, see the lower part of Table 7. Given the moment estimates for the available Monte-Carlo time series, B = 500 bootstrap time series of length n were generated from the fitted Poi-INMA(1) model and used to compute the considered statistics. The 2.5%- and 97.5%-quantiles of these replicated statistics were used as the critical values for deciding about the null hypothesis.
Table 7 compares the performance resulting from these two implementations. The size of the bootstrap implementation tends to exceed the nominal 5% level, so it is slightly worse than the asymptotic implementation. Regarding power, the asymptotic implementation is clearly better for the small sample size n = 100, while both implementations perform similarly well for n ≥ 500. Thus, for INMA-type data, the asymptotic implementation is preferable, also in view of its lower computational effort. One should note, however, that the bootstrap implementation generally performs similarly well, which is important for situations where an asymptotic implementation is difficult, such as for the INAR-type DGP considered in Section 5.3.
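The parametric bootstrap procedure just described (fit by moments, regenerate B = 500 series, take the 2.5%- and 97.5%-quantiles of the replicated statistics as critical values) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a simple difference-type Stein-Chen statistic, mean(X f(X)) − X̄ · mean(f(X+1)), as a stand-in for the statistics T̂_i defined in the paper's Section 3, and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_poi_inma1(n, mu, beta):
    # Poi-INMA(1): X_t = beta "thinning" eps_{t-1} + eps_t with binomial
    # thinning; Poi(mu/(1+beta)) innovations give a Poisson marginal, mean mu.
    eps = rng.poisson(mu / (1.0 + beta), size=n + 1)
    return rng.binomial(eps[:-1], beta) + eps[1:]

def stein_chen_stat(x, f):
    # Difference-type statistic: close to 0 under a Poisson marginal,
    # by the Stein-Chen identity (1).
    return np.mean(x * f(x)) - x.mean() * np.mean(f(x + 1))

def bootstrap_critical_values(x, f, beta_hat, B=500, level=0.05):
    # Parametric bootstrap: regenerate B Poi-INMA(1) series from the
    # moment fit, then take the 2.5%- and 97.5%-quantiles of the statistic.
    n, mu_hat = len(x), x.mean()
    stats = np.array([stein_chen_stat(sim_poi_inma1(n, mu_hat, beta_hat), f)
                      for _ in range(B)])
    return np.quantile(stats, [level / 2, 1 - level / 2])

x = sim_poi_inma1(1000, mu=2.0, beta=0.5)
f = lambda v: np.exp(-v)
lo, hi = bootstrap_critical_values(x, f, beta_hat=0.5)
t = stein_chen_stat(x, f)
print(lo <= t <= hi)  # H0 holds here, so non-rejection is the typical outcome
```

In practice, beta_hat would itself be a moment estimate rather than the known value used in this sketch.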

GoF tests for INAR(1) counts
In the INMA(1) case discussed in Section 5.2, the motivation for the bootstrap implementation was solely academic, because an asymptotic implementation is easily possible and computationally more efficient. Nevertheless, it was important to recognize that, at least for larger sample sizes, the performance of the bootstrap and asymptotic implementations was nearly the same.

TABLE 6: Simulated size and power (nominal level 0.05) of different two-sided tests in the situation of underdispersion with mean 2 and thinning parameter 0.5.

The situation is different in the INAR(1) case. There, also the summands for k > 1 have to be computed in Lemma 1, such that an asymptotic implementation no longer offers a computational advantage; recall the discussion in Section 4.3. Therefore, we now consider the parametric bootstrap implementation for the INAR(1) case (with 5000 Monte-Carlo and B = 500 bootstrap replications, where the latter again use the Poi-INAR(1) fit relying on moment estimates). In our first simulation experiment, we compare the parametric bootstrap's performance between the ZIP-INMA(1) and ZIP-INAR(1) cases; see Table 8. There is no big difference between the two cases. The sizes are slightly worse in the INMA(1) case, whereas the power is considerably better there. The latter is easily explained: although both DGPs have the same value for their thinning parameter, the extent of serial dependence is much stronger for the INAR(1) DGP. In accordance with our previous observations, we conclude that the deterioration of power is caused by this increase in dependence.
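The stronger serial dependence of the INAR(1) DGP can be made concrete with a short simulation: its autocorrelations decay like powers of the thinning parameter, whereas the INMA(1) model is 1-dependent. The sketch below simulates a Poi-INAR(1) path via binomial thinning; parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sim_poi_inar1(n, mu, alpha, burn=200):
    # Poi-INAR(1): X_t = alpha "thinning" X_{t-1} + eps_t with binomial
    # thinning and Poi(mu*(1-alpha)) innovations, so the stationary
    # marginal distribution is Poi(mu).
    x = np.empty(n + burn, dtype=int)
    x[0] = rng.poisson(mu)
    eps = rng.poisson(mu * (1.0 - alpha), size=n + burn)
    for t in range(1, n + burn):
        x[t] = rng.binomial(x[t - 1], alpha) + eps[t]
    return x[burn:]

def acf(x, k):
    # Sample autocorrelation at lag k.
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

x = sim_poi_inar1(100_000, mu=2.0, alpha=0.5)
# INAR(1) autocorrelation decays like alpha**k, so dependence reaches
# far beyond lag 1, unlike the 1-dependent INMA(1) model.
print(round(acf(x, 1), 2), round(acf(x, 2), 2))
```

With alpha = 0.5, the lag-1 and lag-2 autocorrelations should come out near 0.5 and 0.25, respectively.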
Next, let us discuss the alternative of Good-distributed innovations having equidispersion. Comparing Table 5 with Table 9, we see that the power in the INAR(1) case again becomes worse than in the INMA(1) case for increasing values of the dependence parameter. In particular, for dependence parameter values ≥ 0.5, there is virtually no power left despite the large sample size n = 1000. Comparing Table 6 with Table 10 for the case of Good-distributed innovations with underdispersion, we see that the power of the INAR(1) bootstrap implementation is comparable with that of the asymptotic INMA(1) implementation for weak dependence, while the power again worsens as the dependence increases. Another pattern is more notable: the ranking among the statistics changes with increasing dependence. For a dependence parameter of 0.75, the best power is observed for the dispersion index and the statistic T̂1 with f(x) = log(x + 1).

TABLE 7: Simulated size and power (nominal level 0.05) of different two-sided tests in the situation of zero inflation with mean 2 and thinning parameter 0.5.

In our final simulation experiment, we apply the block bootstrap to the INAR(1) DGP (with 5000 replications and B = 500). The motivation for this experiment is as follows: if we know that we are concerned with an INAR(1) DGP and just want to test for a Poisson distribution within this setup (as assumed before), there is no reason to prefer the block bootstrap over the parametric one. But if we do not know the actual type of DGP, then it seems plausible to use the block bootstrap to account for the (unknown) dependence structure while testing the Poisson null hypothesis. The performance of this approach is investigated for the INAR(1) DGP and compared with the parametric bootstrap's performance. The idea is to take the 2.5%- and 97.5%-quantiles of the bootstrapped deviations T̂*_{i,b} − T̂_i, b = 1, …, B, as the critical values for the deviation of the test statistic T̂_i from its hypothetical value 0 or 1, respectively.
For implementation, we used the stationary bootstrap (SB) method described in Politis and Romano (1994). The beginning of each block was chosen according to a discrete uniform distribution, and the block lengths themselves according to a geometric distribution with mean b. Here, the mean block length b was either chosen as the fixed value 24 (which appears reasonable for sample size n = 1000), or in a data-driven way by using the function b.star from R's np package (which implements the approach of Patton, Politis, & White, 2009).

TABLE 8: Parametric bootstrap, simulated size and power (nominal level 0.05) of different two-sided tests in the case of zero inflation for the sample size n = 1000 with thinning parameter 0.5 in either case.

TABLE 9: Parametric bootstrap, simulated size and power (nominal level 0.05) of different two-sided tests in the case of equidispersion for n = 1000.

For both SB implementations, we "wrapped" the data around a circle such that X_n is followed by X_1. So if a block extends beyond X_n, we continue with X_1, … when building the block, that is, we take X_{(t−1) mod n + 1} at time t. The results obtained for the SB implementations, in comparison with those of the parametric bootstrap, are summarized in Table 11. While the SB implementations lead to roughly the same power, we observe a rather strong oversizing. In addition, for sample sizes smaller than n = 1000, we did not observe a reasonable performance at all. Hence, the SB implementations have to be used with caution.
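A single stationary-bootstrap resample with circular wrapping, as just described, can be sketched as follows. This is a minimal illustration of the Politis-Romano scheme (uniform block starts, geometric block lengths, wrap-around indexing); the helper name is ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def stationary_bootstrap(x, mean_block=24):
    # One stationary-bootstrap resample (Politis & Romano, 1994):
    # block starts are discrete-uniform, block lengths are geometric
    # with mean `mean_block`, and indexing wraps so X_n is followed by X_1.
    n = len(x)
    out = np.empty(n, dtype=x.dtype)
    pos = 0
    while pos < n:
        start = rng.integers(n)                   # uniform block start
        length = rng.geometric(1.0 / mean_block)  # geometric block length
        length = min(length, n - pos)             # truncate the last block
        idx = (start + np.arange(length)) % n     # wrap around the circle
        out[pos:pos + length] = x[idx]
        pos += length
    return out

x = np.arange(10)
xb = stationary_bootstrap(x, mean_block=3)
print(len(xb) == len(x))
```

In the experiments above, such resamples would be drawn B = 500 times and the test statistic recomputed on each resample.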

JOINT STEIN-CHEN GOF TESTING
When testing the Poisson null hypothesis against a specific type of alternative, Section 5 provided guidance on how to choose an appropriate weighting scheme f for the Stein-Chen statistic.

TABLE 11: First block corresponds to the parametric bootstrap, second block to SB with fixed b = 24 ("SB24"), and third block to SB with b selected by b.star ("SB*").

To achieve good power against multiple types of alternatives, different choices of the Stein-Chen function f may be necessary. Thus, it suggests itself to apply multiple functions and the corresponding test statistics jointly, yielding a kind of omnibus test. The required asymptotics have already been derived in Section 3.3.
As an application of the proposed joint Stein-Chen tests, let us return to the i.i.d. simulations discussed in Section 5.1. In Table 1, we analyzed the performance of single Stein-Chen statistics for testing against an equidispersed alternative. Using the function exp(−x), we obtained better power than with the dispersion or skewness test, but the power was still not satisfactory for λ = 2. Therefore, let us now analyze the joint test statistic (T̂f;2, T̂g;2) with g(x) = exp(−x) and f corresponding to either the dispersion or the skewness statistic; see Table 12. Using (13), the test was implemented by utilizing that the quadratic form n (T̂f;2, T̂g;2) Σ̂⁻¹ (T̂f;2, T̂g;2)⊤ is asymptotically χ²₂-distributed.

TABLE 12: Simulated size and power (nominal level 0.05) of the joint statistic (T̂f;2, T̂g;2) in the case of equidispersion.

Comparing Table 1 with Table 12, it turns out that both combined tests outperform any of the one-dimensional statistics, with good power not only for λ = 5 but also for λ = 2. Table 12 also indicates a slight advantage for the skewness-exp combination. Overall, we recognize that combining multiple Stein-Chen statistics offers the potential for improved performance compared with testing based on a single statistic.
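The joint test reduces to comparing a quadratic form against the χ²₂ quantile. The sketch below illustrates this in the i.i.d. case for two weighting functions; since the paper's closed-form asymptotic covariance (Section 3.3) is not reproduced here, we substitute a Monte-Carlo estimate of Σ under the fitted Poisson model, which is an assumption of this sketch rather than the paper's implementation, and again use a simple difference-type statistic as a stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)

def stein_chen_T(y, h):
    # Difference-type Stein-Chen statistic for weighting function h.
    return np.mean(y * h(y)) - y.mean() * np.mean(h(y + 1))

def mc_cov(lam, n, f, g, reps=2000):
    # Monte-Carlo estimate of the covariance of sqrt(n)*(T_f, T_g)
    # under i.i.d. Poi(lam): a pragmatic stand-in for the paper's
    # closed-form asymptotic covariance matrix.
    Ts = []
    for _ in range(reps):
        y = rng.poisson(lam, n)  # one sample, both statistics from it
        Ts.append([stein_chen_T(y, h) for h in (f, g)])
    return n * np.cov(np.array(Ts).T)

# Joint test: n * T Sigma^{-1} T' is asymptotically chi^2 with 2 df;
# 5.991 is the 95% quantile of the chi^2_2 distribution.
x = rng.poisson(2.0, 1000)
f = lambda v: np.exp(-v)
g = lambda v: 1.0 / (v + 0.1)
T = np.array([stein_chen_T(x, h) for h in (f, g)])
Sigma = mc_cov(x.mean(), len(x), f, g)
q = len(x) * T @ np.linalg.solve(Sigma, T)
print(q > 5.991)  # under H0, as here, rejection occurs with prob. about 0.05
```

Any pair of weighting functions can be plugged in for f and g; the two chosen here are merely examples from the text.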
Let us apply this joint test in a final data example, where we analyze the polonium data of Rutherford, Geiger, and Bateman (1910). For n = 2608 time intervals of length 1/8 min, the number of scintillations caused by the radioactive decay of polonium was determined. Since successive intervals were interrupted by breaks of several minutes, it is reasonable to assume an i.i.d. count DGP for these data. Rutherford et al. (1910) argued that a Poisson model is suitable for the data; in fact, we observe nearly equidispersed counts (sample mean ≈ 3.872 and variance ≈ 3.696). The Poisson null hypothesis was checked by Gürtler and Henze (2000) with altogether 18 GoF tests, two of which led to a rejection at the 5% level. Thus, they conjectured that the data might originate from an equidispersed non-Poisson distribution. To further investigate this conjecture, we apply both aforementioned joint Stein-Chen tests to the data. The χ²-statistic of the dispersion-exp (skewness-exp) test takes the value ≈ 4.638 (≈ 2.594). Since the 95% quantile of the χ²₂-distribution equals ≈ 5.991, we do not reject the Poisson null at the 5% level. So in contrast to Gürtler and Henze (2000), we do not find evidence against a Poisson distribution for the polonium data.

CONCLUSIONS AND FUTURE RESEARCH

There are several directions for future research. If choosing f(x) = x − 1, our Stein-Chen approach covers the test statistics relying on second-order factorial moments that were proposed by Kyriakoussis et al. (1998). In fact, we also considered the third-order case f(x) = (x − 1)(x − 2), but both cases are still limited to a Poisson null hypothesis. So besides an analysis of higher-order statistics, it would be relevant to extend our Stein-Chen approach to other types of null hypotheses, such as the (negative) binomial distribution Bin(n, π) (NB(n, π)) considered by Kyriakoussis et al. (1998). Test statistics could be defined by considering adequate types of Stein identity, such as E[X f(X)] = μ E[(1 + X/n)/(1 + μ/n) · f(X + 1)] with mean μ in the NB case; see Sudheesh and Tibiletti (2012) for these and further identities. The resulting tests could then be applied to, for example, the geometric INAR-type DGP of Ristić, Bakouch, and Nastić (2009), which uses the NB-thinning operation instead of the binomial one. Related to this, an asymptotic analysis of the local power of the proposed Stein-Chen tests, with respect to different local alternatives, would be relevant to achieve theory-based guidance for choosing f.
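The Stein identity for the NB case, E[X f(X)] = μ E[(1 + X/n)/(1 + μ/n) f(X + 1)] with mean μ = n(1 − π)/π, is easy to confirm numerically; NumPy's negative-binomial sampler uses exactly this (n, π) parametrization. Parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Numerical check of the NB Stein identity
#   E[X f(X)] = mu * E[(1 + X/n) / (1 + mu/n) * f(X+1)]
# for X ~ NB(n, pi) with mean mu = n*(1-pi)/pi.
n_par, pi = 4, 0.6
mu = n_par * (1 - pi) / pi
x = rng.negative_binomial(n_par, pi, size=1_000_000)
f = lambda v: np.exp(-v)

lhs = np.mean(x * f(x))
rhs = mu * np.mean((1 + x / n_par) / (1 + mu / n_par) * f(x + 1))
print(round(lhs, 3), round(rhs, 3))  # the two sides agree up to MC error
```

The same check with other bounded functions f, or other (n, π) values, behaves analogously.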
Another direction for future research is to apply the Stein-Chen approach to DGPs from the Poi-INGARCH (INteger-valued Generalized AutoRegressive Conditional Heteroscedasticity) family. For these conditional Poisson regression models, the conditional distribution of X_t given X_{t−1}, … is Poi(M_t), where the conditional mean M_t = E[X_t | X_{t−1}, …] depends linearly on past observations and possibly on past conditional means (Weiß, 2018a). Then, the Stein-Chen identity (1) could be applied to the conditional Poi(M_t)-distribution, leading to E[X_t f(X_t) | X_{t−1}, …] = M_t · E[f(X_t + 1) | X_{t−1}, …]. This is used to define standardized Stein-Chen residuals, where the difference X_t f(X_t) − M_t f(X_t + 1) is divided by its conditional standard deviation; the required conditional variance follows from Equation (A3). These residuals might be used for diagnostic model checking in analogy to Zhu and Wang (2010). The choice f(x) ≡ 1 in (15) results in the ordinary (standardized) Pearson residuals, see Aleksandrov and Weiß (2020b) for a recent investigation, but we conjecture that nonconstant Stein-Chen functions will lead to an improved performance for some types of alternatives.
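As a sketch of these diagnostic residuals, assume the construction implied by the text: the difference X_t f(X_t) − M_t f(X_t + 1) has conditional mean zero by the identity and is standardized by its conditional standard deviation (this is our reading; the paper's Equations (15) and (A3) are not reproduced here). The conditional variance is obtained by truncated summation of the Poi(M_t) pmf, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def sim_poi_ingarch11(T, omega, a, b, burn=200):
    # Poi-INGARCH(1,1): M_t = omega + a*X_{t-1} + b*M_{t-1},
    # X_t | past ~ Poi(M_t); returns observations and conditional means.
    M = omega / (1 - a - b)  # stationary mean as starting value
    x = rng.poisson(M)
    xs, Ms = [], []
    for _ in range(T + burn):
        M = omega + a * x + b * M
        x = rng.poisson(M)
        xs.append(x); Ms.append(M)
    return np.array(xs[burn:]), np.array(Ms[burn:])

def cond_var(M, f, kmax=200):
    # Conditional variance of D = X f(X) - M f(X+1) under Poi(M), by
    # truncated summation (the conditional mean of D is 0 by the identity).
    k = np.arange(kmax)
    log_fact = np.cumsum(np.log(np.maximum(k, 1)))   # log k!
    pmf = np.exp(-M + k * np.log(M) - log_fact)
    d = k * f(k) - M * f(k + 1)
    m = np.sum(pmf * d)
    return np.sum(pmf * (d - m) ** 2)

# Standardized Stein-Chen residuals r_t = D_t / sd(D_t | past);
# with f = 1 these reduce to ordinary Pearson residuals (X_t - M_t)/sqrt(M_t).
x, M = sim_poi_ingarch11(500, omega=1.0, a=0.3, b=0.2)
f = lambda v: np.exp(-v)
res = np.array([(xt * f(xt) - Mt * f(xt + 1)) / np.sqrt(cond_var(Mt, f))
                for xt, Mt in zip(x, M)])
print(round(res.mean(), 2), round(res.std(), 2))  # roughly 0 and 1 in-model
```

Under the model, the residuals form a standardized martingale difference sequence, which is what makes them usable for diagnostic checking.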

ACKNOWLEDGMENTS
The authors thank the associate editor and the two referees for their useful comments on an earlier draft of this article. This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) -Projektnummer 437270842.

Note that the two expectations involved are equal because of the symmetry of the bivariate Poisson distribution BPoi(k). So σ22 simplifies to the expression given in Lemma 1.
Next, we apply (A2) to f(x) f(x + 1). The remaining covariance term does not simplify further, so the proof of Lemma 1 is complete.
Then, the Delta method implies the asymptotic normality of √n (T̂_i − g_i), with g_i evaluated at the true moments. To derive the asymptotic variances, we need the gradients ∇g_i(y).
The remaining asymptotic variances σ²_{T̂_i} follow immediately. Finally, we compute a bias correction based on a second-order Taylor expansion. For this purpose, we derive the Hessians H_{g_i}(y1, y2, y3) for i = 1, …, 4, and evaluate them at the true moments as H_i. The approximate bias correction is then obtained accordingly.