An Observation-Driven Random Parameter INAR(1) Model Based on the Poisson Thinning Operator

This paper presents a first-order integer-valued autoregressive (INAR(1)) time series model whose parameters are observation-driven and may follow a given random distribution. We establish the ergodicity of the model and derive theoretical properties of point estimation, interval estimation, and parameter testing, which we verify through numerical simulations. Lastly, we demonstrate the application of the model on real-world datasets.


Introduction
Integer-valued time series data are prevalent in both scientific research and various socioeconomic contexts. Examples include the annual number of companies listed on stock exchanges, the monthly usage of hospital beds in specific departments, and the yearly frequency of major earthquakes or tsunamis. However, traditional continuous-valued time series models cannot precisely capture the unique characteristics of integer-valued data, which can therefore only be approximated by continuous-valued models. This shortcoming may lead to model mis-specification, posing challenges for statistical inference. Consequently, the modeling and analysis of integer-valued time series data have increasingly gained attention within academia. Among the extensive range of integer-valued time series models, thinning operator models have attracted considerable interest from scholars due to their resemblance to Autoregressive Moving Average (ARMA) models in continuous-valued time series theory. Thinning operator models replace the multiplication in ARMA models with the binomial thinning operator, initially introduced by Steutel and Van Harn [1]:

φ ∘ Y = ∑_{i=1}^{Y} B_i,

where Y is a non-negative integer-valued random variable and {B_i} represents a Bernoulli random variable sequence that is independent of Y, satisfying the condition P(B_i = 1) = 1 − P(B_i = 0) = φ. Building upon this concept, Al-Osh and Alzaid [2] developed the first-order Integer-valued Autoregressive (INAR(1)) model, for t ∈ N_+:

Y_t = φ ∘ Y_{t−1} + Z_t,

where Z_t is the innovation term entering the model during period t. Its marginal distribution is Poisson with expected value λ, giving rise to the name Poisson INAR(1) model. An intuitive interpretation of this model is that, within a hospital setting, the number of in-patients in period t comprises patients from period t−1 who have not yet been discharged, along with patients newly admitted in period t.
Given that B_i adheres to a Bernoulli distribution, the binomial thinning operator can only express excitation states from {0, 1} to {0, 1}: each unit present at time t−1 either survives into time t or does not. However, the binomial thinning operator does not represent the sole available option for thinning. Moreover, the thinning operator models previously mentioned presuppose that φ is independent of other variables, thereby neglecting the dynamic features of the coefficient φ in INAR models.
To tackle this limitation, Zheng and Basawa [14] proposed a first-order observation-driven integer-valued autoregressive process. Triebsch [15] introduced the first-order Functional Coefficient Integer-valued Time Series model based on the thinning operator, in which the coefficient φ_t during period t is a measurable function of the previous observation Y_{t−1}. Furthermore, Monteiro, Scotto, and Pereira [16] presented the Self-Exciting Threshold Integer-valued Time Series model (SETINAR), in which the coefficient φ_t during period t assumes diverse values contingent on the observations in prior periods. Building on the geometric thinning operator (also known as the negative binomial thinning operator) proposed by Ristić, Bakouch, and Nastić [17], Yu, Wang, and Yang [18] introduced an INAR(1) model encompassing observation-driven parameters. With respect to integer-valued time series models featuring observation-driven parameters, existing studies primarily focus on binomial and geometric thinning operators. However, the binomial thinning operator cannot represent one-to-many excitation states, and both binomial and geometric thinning operators exhibit limited descriptive capacity for locally non-stationary phenomena and extreme values in real data. Consequently, this paper employs a Poisson thinning operator, defined as follows:

φ_t ∘ Y_{t−1} = ∑_{i=1}^{Y_{t−1}} B_i^{(t)},

where {B_i^{(t)}} is independent of Y_{t−1} and constitutes an independent and identically distributed Poisson random variable sequence with intensity parameter φ_t > 0, whose probability mass function is

P(B_i^{(t)} = k) = e^{−φ_t} φ_t^k / k!,  k = 0, 1, 2, . . .

Leveraging this thinning operator, the INAR(1) model in this study is formulated as follows:

Y_t = φ_t ∘ Y_{t−1} + Z_t,

where the sequence {Z_t} comprises independent and identically distributed non-negative integer-valued random variables, which are independent of B_i^{(t)} and {Y_s}_{s<t}.
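To make the operator concrete, here is a minimal Python sketch (the function names are ours, not from the paper). It draws φ ∘ Y as a sum of Y independent Poisson(φ) counts; since the standard library has no Poisson sampler, one is built with Knuth's algorithm, which is adequate for the small intensities used here.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw via Knuth's algorithm (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def poisson_thinning(phi, y, rng):
    """Poisson thinning: phi ∘ y = sum of y i.i.d. Poisson(phi) counts.

    Unlike binomial thinning, each of the y 'parents' may contribute
    0, 1, or several offspring, so one-to-many excitation is possible."""
    return sum(rpois(phi, rng) for _ in range(y))
```

Because a sum of Y independent Poisson(φ) variables is Poisson(φ·Y), the thinned count has conditional mean φ·Y; this is the property the conditional-moment calculations below rely on.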
Furthermore, diverging from the parameters set forth by Yu, Wang, and Yang [18], we posit that φ_t correlates with the previous observation Y_{t−1}, and given Y_{t−1}, φ_t | Y_{t−1} may still conform to a specific non-negative probability distribution. In Section 2, we will demonstrate that if the expectation of this non-negative probability distribution falls below 1, it does not affect the model's ergodicity. Simultaneously, because φ_t | Y_{t−1} occasionally exceeds 1, the autoregressive model can exhibit non-stationary features or generate extreme values within specific periods, all without compromising its overall stationarity. In comparison to existing research, this setting offers the advantage of simultaneously illustrating one-to-many excitation states, observation-driven and time-varying parameter structures, and localized non-stationary features or extreme values. For example, in public health, a patient with an infectious disease may not transmit the illness to others or could potentially infect one or multiple individuals, indicating one-to-many excitation states. As the number of infections fluctuates, local epidemic prevention policies may change, consequently modifying the disease's transmissibility and reflecting the time-varying and observation-driven characteristics of the coefficient. During particular periods of rapid infectious disease spread, the majority of infected individuals are likely to infect more than one other person, resulting in infection data that exhibit extreme values or localized non-stationary characteristics.
The organization of this paper is as follows: in Section 2, we introduce the integer-valued time series model featuring observation-driven coefficients under investigation and outline its essential statistical properties. In Section 3, we describe the estimation and testing methods pertinent to this model and present asymptotic results. Section 4 provides numerical simulation outcomes of these techniques, elaborating on the performance of the estimation and testing approaches across diverse settings and sample conditions. Section 5 demonstrates the application of the proposed model using real-world data. Finally, Section 6 offers a summary and discussion.

Model Construction and Basic Properties
For the time series {Y_t}, consider the following data generating process:

Y_t = φ_t ∘ Y_{t−1} + Z_t.  (4)

Given Y_{t−1}, φ_t may be fixed as:

φ_t = ν(Y_{t−1}; β).

Alternatively, {φ_t} could represent an independent random variable sequence with conditional expectation

E(φ_t | Y_{t−1}) = ν(Y_{t−1}; β),

where β is a parameter vector, the function ν(·; ·) belongs to a specific parametric family of functions G = {ν(Y_{t−1}; β); β ∈ Θ}, and Θ is a compact subset of Euclidean space. β is an interior point of Θ, and ν(y; β) is thrice continuously differentiable with respect to β. The conditional variance is given by Var(φ_t | Y_{t−1}) = σ². Additionally, {Z_t} comprises an independent and identically distributed non-negative integer-valued random variable sequence with probability mass function f_z, expectation E(Z_t) = λ < ∞, and finite variance.

Remark 1. Integer-valued probability distributions that align with the settings of Z_t are common, with typical examples being Poisson and geometric distributions. This paper employs a Poisson distribution in the numerical simulation section.

Remark 2.
There are numerous functions that align with the setting of ν(·; ·), the most typical being the linear function ν(Y_{t−1}; β) = β_0 + β_1 Y_{t−1}. A linear specification is adopted in this paper's numerical simulation section.

Remark 3. From model (4), it is evident that {Y_t} is a Markov chain defined on the set of natural numbers N, with a one-step-ahead transition probability obtained by convolving the thinned component φ_t ∘ Y_{t−1} with the innovation Z_t.

Based on the above model construction, we can obtain the conditional moments for Model (4); in particular,

E(Y_t | Y_{t−1}) = ν(Y_{t−1}; β) Y_{t−1} + λ.

Starting from these conditional moments, we can construct estimating equations to estimate the unknown parameters in the model. Ergodicity is crucial for the convergence of parameter estimation, as presented in the following property:

Property 2. If sup_{y∈N} ν(y; β) < 1 for β ∈ Θ, then the data generating process {Y_t} defined by (4) is an ergodic Markov chain.

Remark 4.
In Property 2, since the form of the function ν is not determined, we cannot directly provide primitive conditions for the ergodicity of {Y_t}. However, for specific cases, such as ν(Y_{t−1}; β) = β_0 + β_1 Y_{t−1}, we can intuitively see that stationarity and ergodicity of the data generating process require β_1 ≤ 0 at the very least, making the expected value of φ_t lower when Y_{t−1} is higher, and vice versa. From the proof in Appendix A, it can be observed that the ergodicity of {Y_t} requires the existence of a constant 0 < m < 1 bounding ν(y; β) from above. A specification such as 1 + exp(β_0 + β_1 Y_{t−1}) always exceeds 1 and, for β_1 > 0, increases with Y_{t−1}, making it impossible to determine a constant m that meets the requirement.

Parameter Estimation and Hypothesis Testing
In this section, we assume that the time series {Y_t}_{t=1}^T satisfies the data-generating process defined by Equation (4), with θ_0 = (β_0, λ_0) as the true parameter vector of this process and θ = (β, λ) as the unknown parameter vector to be estimated. In this paper, our primary focus is on two estimation methods: Conditional Least Squares (CLS) and Conditional Maximum Likelihood (CML). Additionally, we attempt to establish observation-driven interval estimation through the estimating equations of CLS, and observation-driven hypothesis testing through the framework of Empirical Likelihood (EL). Here, we first impose regularity assumptions; in particular, the derivative matrix of the conditional mean with respect to the parameters is assumed to be of full rank.

Conditional Least Squares Estimation
The CLS estimator minimizes the sum of squared one-step-ahead prediction errors

S(θ) = ∑_{t=1}^{T} (Y_t − ν(Y_{t−1}; β) Y_{t−1} − λ)²,

whose first-order condition is ∑_{t=1}^{T} M_t(θ) = 0, where

M_t(θ) = (Y_t − ν(Y_{t−1}; β) Y_{t−1} − λ) ∂(ν(Y_{t−1}; β) Y_{t−1} + λ)/∂θ.

Thus, the estimating equation is given by ∑_{t=1}^{T} M_t(θ) = 0. Solving this equation provides the CLS estimate θ̂_CLS for the parameter vector θ = (β, λ).

Theorem 1. Under assumptions (A1) to (A5), the CLS estimator θ̂_CLS is a consistent estimator of the true parameter θ_0, with asymptotic distribution

√T (θ̂_CLS − θ_0) →_d N(0, V(θ_0)^{−1} W(θ_0) V(θ_0)^{−1}),

where V(θ_0) and W(θ_0) are the usual Klimko–Nelson moment matrices of the estimating functions.
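For the linear specification ν(y; β) = β_0 + β_1 y, the conditional mean is E(Y_t | Y_{t−1} = y) = β_0 y + β_1 y² + λ, so CLS reduces to ordinary least squares of Y_t on the regressors (Y_{t−1}, Y_{t−1}², 1). A self-contained sketch under that assumption (the helper names are ours):

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def cls_estimate(y):
    """CLS for nu(y; beta) = beta0 + beta1*y: regress Y_t on (y, y^2, 1)."""
    X = [[float(v), float(v) * float(v), 1.0] for v in y[:-1]]
    z = [float(v) for v in y[1:]]
    # normal equations (X'X) theta = X'z
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xtz = [sum(r[i] * zz for r, zz in zip(X, z)) for i in range(3)]
    return solve3(XtX, Xtz)  # (beta0_hat, beta1_hat, lambda_hat)
```

On data whose conditional mean is exactly of this form, the estimator recovers (β_0, β_1, λ); on noisy counts it is the CLS solution of the estimating equation above.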

Interval Estimation
Based on the estimating equations from the CLS estimation, we can construct observation-driven interval estimation and hypothesis testing. Let

H(θ) = (∑_{t=1}^{T} M_t(θ))′ (∑_{t=1}^{T} M_t(θ) M_t(θ)′)^{−1} (∑_{t=1}^{T} M_t(θ)).

We can then obtain the following theorem:

Remark 5. From Equation (8), we can construct an interval estimation for θ_0:

{θ : H(θ) ≤ C_α},

where C_α satisfies P(χ²_{dim(θ)} ≤ C_α) = α for 0 < α < 1. From the perspective of hypothesis testing, this serves as an acceptance region for testing the null hypothesis H_0 : θ_0 = θ. If H(θ) > C_α, the null hypothesis is rejected.

Empirical Likelihood Test
In the following, we introduce hypothesis testing based on empirical likelihood estimation. First, we provide a brief introduction to the empirical likelihood (EL) method. Initially proposed by Owen [19] for providing interval estimates of an expectation, the EL method was later extended to estimating-equation settings by Qin and Lawless [20]. For T observations y_1, y_2, . . . , y_T of a random variable Y with distribution F, the empirical likelihood ratio is defined as

R(F) = L(F) / L(F_T),

where L(F) = ∏_{t=1}^{T} p_t is the nonparametric likelihood function, p_t = dF(y_t) = P(Y = y_t), and F_T(y) = (1/T) ∑_{t=1}^{T} 1{y_t ≤ y} is the empirical distribution function of Y, with dF_T(y_t) = 1/T for all t. Under the constraints ∑_{t=1}^{T} p_t = 1 and p_t ≥ 0 for all t, F_T maximizes L(F), so R(F) ≤ 1.
Suppose we are interested in the parameter vector θ, which satisfies the estimating equation E(M_t(θ)) = 0. We then add a new constraint for p_t: ∑_{t=1}^{T} p_t M_t(θ) = 0. Based on this, we can establish the profile empirical likelihood ratio function:

R(θ) = max{ ∏_{t=1}^{T} T p_t : ∑_{t=1}^{T} p_t = 1, p_t ≥ 0, ∑_{t=1}^{T} p_t M_t(θ) = 0 }.

The profile empirical likelihood ratio function can be solved using the Lagrange multiplier method. Let

G = ∑_{t=1}^{T} log p_t + η (1 − ∑_{t=1}^{T} p_t) − T γ′ ∑_{t=1}^{T} p_t M_t(θ),

where η and γ are Lagrange multipliers. It can be proved that when G is maximized, η = T, and:

p_t = 1 / (T (1 + γ′ M_t(θ))).

Here, as a function of θ, γ = γ(θ) is the solution to the following equation:

(1/T) ∑_{t=1}^{T} M_t(θ) / (1 + γ′ M_t(θ)) = 0.

Substituting this into p_t and R(θ), the log empirical likelihood ratio function can be defined as:

L_E(θ) = −log R(θ) = ∑_{t=1}^{T} log(1 + γ(θ)′ M_t(θ)).

The empirical likelihood estimate is then given by:

θ̂_EL = argmin_{θ∈Θ} L_E(θ).

The corresponding γ is denoted by γ̂_{θ̂_EL}.
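To illustrate the mechanics in the simplest setting, the sketch below computes Owen's empirical log-likelihood ratio for a scalar mean, i.e. with estimating function M_t(μ) = y_t − μ, profiling out the scalar Lagrange multiplier γ by bisection (the names are ours; the paper's test uses the vector-valued M_t(θ) instead):

```python
import math

def el_log_ratio(y, mu):
    """-2 * log empirical likelihood ratio for the mean of y.

    With estimating function M_t = y_t - mu, the weights are
    p_t = 1 / (T * (1 + gamma * (y_t - mu))), and gamma solves
    sum((y_t - mu) / (1 + gamma * (y_t - mu))) = 0, found by bisection.
    """
    d = [v - mu for v in y]
    if min(d) >= 0 or max(d) <= 0:
        return float("inf")  # mu lies outside the convex hull of the data
    # gamma must keep every 1 + gamma * d_t strictly positive
    lo = -1.0 / max(d) + 1e-12
    hi = -1.0 / min(d) - 1e-12
    def g(gam):
        return sum(dt / (1.0 + gam * dt) for dt in d)
    for _ in range(200):  # g is strictly decreasing on (lo, hi)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    gam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + gam * dt) for dt in d)
```

At the sample mean the ratio is zero, and it grows as the hypothesized mean moves away, which is exactly the behavior the χ² calibration of the EL test exploits.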

Remark 6.
Since R(θ) ≤ 1 (the unconstrained maximum of the nonparametric likelihood is attained at p_t = 1/T), it can be deduced that L_E(θ) = −log R(θ) ≥ 0.

Remark 7.
Since the number of estimating equations matches the number of parameters to be estimated (the just-identified case, in econometrics terminology), and θ̂_CLS is the solution to the estimating equation ∑_{t=1}^{T} M_t(θ) = 0, it follows (see Chen and Van Keilegom [21]) that at θ̂_CLS the constraint is satisfied with γ = 0, so L_E(θ̂_CLS) = 0 attains the minimum and

θ̂_EL = θ̂_CLS.

Therefore, we omit empirical likelihood estimation from the point estimation segment of the numerical simulation section.

Conditional Maximum Likelihood Estimation
It is straightforward to derive the log-likelihood function log L(θ) from the one-step-ahead transition probability (6) of model (4). In time-series models, the probability distribution of the first observation Y_1 is unknown, and its influence on the likelihood function is minimal when the sample size T is sufficiently large. Thus, we focus only on the conditional likelihood function. Given that the log conditional likelihood is a nonlinear function of the parameter vector θ = (β, λ), we employ numerical methods to solve

θ̂_CML = argmax_{θ∈Θ} ∑_{t=2}^{T} log P(Y_t | Y_{t−1}; θ).

To obtain the asymptotic distribution of θ̂_CML, we need to verify the regularity conditions presented in Billingsley [22]. The satisfaction of these conditions can be directly observed from the model-building process in Section 2 and the assumptions provided in Section 3, so the proof is omitted. We arrive at the following theorem:

Theorem 4. Under assumptions (A1)–(A6), the conditional maximum likelihood estimator θ̂_CML consistently estimates the true parameter θ_0 and exhibits the asymptotic distribution

√T (θ̂_CML − θ_0) →_d N(0, I(θ_0)^{−1}),

where I(θ_0) represents the Fisher information matrix.
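For the fixed-coefficient case with Poisson innovations, the transition law is simple: φ_t ∘ Y_{t−1} given Y_{t−1} is Poisson(φ_t Y_{t−1}), so Y_t | Y_{t−1} = y ~ Poisson(ν(y; β) y + λ). The sketch below computes the conditional log-likelihood for this case, using an illustrative positive link exp(β_0 + β_1 y) of our own choosing (the paper's linear link would need a positivity constraint):

```python
import math

def cond_loglik(y, beta0, beta1, lam):
    """Conditional log-likelihood, fixed-phi case with Poisson innovations.

    Y_t | Y_{t-1} = v  ~  Poisson(nu(v) * v + lam), with the illustrative
    positive link nu(v) = exp(beta0 + beta1 * v) (an assumption of this
    sketch, not the paper's exact specification).
    """
    ll = 0.0
    for prev, cur in zip(y[:-1], y[1:]):
        mu = math.exp(beta0 + beta1 * prev) * prev + lam
        ll += cur * math.log(mu) - mu - math.lgamma(cur + 1)
    return ll
```

θ̂_CML is then obtained by maximizing this function numerically, e.g. by grid search or a derivative-free optimizer.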

Remark 9.
Achieving CML estimation requires making specific assumptions about the probability distribution of Z t . In this paper, we assume Z t follows a Poisson distribution with parameter λ. This strong assumption can result in significant errors or even inconsistency in statistical inference based on the CML method if the assumed model does not represent the true data-generating process. This constitutes the primary drawback of CML estimation. The impact of model mis-specification on CML estimation will be examined in the following numerical simulation section.

Numerical Simulation
In this section, we set the function ν as a linear function, considering the following data-generating process:

Y_t = φ_t ∘ Y_{t−1} + Z_t,  (10)
E(φ_t | Y_{t−1}) = β_0 + β_1 Y_{t−1}.  (11)

Here, {Z_t} represents an independently and identically distributed Poisson random variable sequence with mean λ. In the subsequent numerical simulation studies, we mainly concentrate on three aspects: parameter estimation, interval estimation, and empirical likelihood ratio testing. All numerical simulations are based on 1000 replications.
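A sketch of this data-generating process under the four φ_t | Y_{t−1} settings used below; truncating the linear conditional mean at a small positive value, and the helper names, are our implementation conveniences, not part of the paper's specification:

```python
import math
import random

def rpois(lam, rng):
    """Knuth's algorithm for one Poisson(lam) draw."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate(T, beta0, beta1, lam, phi_dist="fixed", seed=0):
    """Y_t = phi_t ∘ Y_{t-1} + Z_t with E(phi_t | Y_{t-1}) = A_t,
    A_t = max(beta0 + beta1 * Y_{t-1}, 0.01) (truncation: our convenience)."""
    rng = random.Random(seed)
    y, path = 1, []
    for _ in range(T):
        A = max(beta0 + beta1 * y, 0.01)
        if phi_dist == "fixed":
            phi = A
        elif phi_dist == "uniform":          # mean A, support (0, 2A)
            phi = rng.uniform(0.0, 2.0 * A)
        elif phi_dist == "exponential":      # mean A
            phi = rng.expovariate(1.0 / A)
        else:                                # "chi-square": mean A via Gamma(A/2, 2)
            phi = rng.gammavariate(A / 2.0, 2.0)
        y = sum(rpois(phi, rng) for _ in range(y)) + rpois(lam, rng)
        path.append(y)
    return path
```

The heavy right tails of the exponential and chi-square draws are what produce the occasional extreme values visible in the sample paths discussed below.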

Parameter Estimation
We generate data using the above model and apply the CLS and CML methods to estimate parameters. Moreover, we define three statistical measures for evaluating estimation performance (using λ as an example):

Sample bias: Bias = (1/1000) ∑_{i=1}^{1000} (λ̂_i − λ);
Root mean square error: RMSE = ((1/1000) ∑_{i=1}^{1000} (λ̂_i − λ)²)^{1/2};
Mean absolute deviation error: MADE = (1/1000) ∑_{i=1}^{1000} |λ̂_i − λ|.

In the CML estimation, the score function is obtained by differentiating the conditional log-likelihood built from the one-step-ahead transition probability. We primarily consider four distribution cases for φ_t | Y_{t−1} when Z_t follows a Poisson distribution, writing A_t = β_0 + β_1 Y_{t−1} for the conditional mean of φ_t:

(i) φ_t | Y_{t−1} = A_t is fixed, without any randomness; the log-likelihood follows directly from the Poisson transition probability.
(ii) φ_t | Y_{t−1} follows a uniform distribution with mean A_t, minimum value 0, and maximum value 2A_t.
(iii) φ_t | Y_{t−1} follows an exponential distribution with mean A_t.
(iv) φ_t | Y_{t−1} follows a chi-square distribution with mean A_t. Although A_t is not an integer, we still call it a chi-square distribution.

In cases (ii)–(iv), the transition probability is the Poisson probability mixed over the distribution of φ_t, and the log-likelihood is computed accordingly.

From Table 1, we can observe that for both CLS and CML estimators, as the sample size T gradually increases, Bias, RMSE, and MADE all decline, indicating the consistency of these estimators. Notably, both CLS and CML yield satisfactory parameter estimates. In large samples, CLS and CML estimates are approximately equal, while in small samples, under the premise of a correctly specified model, CML tends to provide superior estimation precision. Furthermore, we present an additional set of parameter estimation simulation results in Appendix A, as shown in Table A1. Figure 1 showcases the typical trajectory of data generated by models (10) and (11) with parameters β_0 = 1, β_1 = −0.6, and λ = 1.2.
In this figure, "fixed" represents φ t |y t−1 as a fixed parameter given y t−1 , "uniform" denotes φ t |y t−1 following a uniform distribution, "exponential" signifies φ t |y t−1 following an exponential distribution, and "chi-square" indicates φ t |y t−1 following a chi-square distribution. Figure 1 reveals that some extreme values are present in the sample paths when φ t |y t−1 follows either an exponential or chi-square distribution, with the latter capable of generating even higher extreme values. This suggests that these two distribution settings for φ t |y t−1 contain a certain descriptive ability concerning the extreme values in the data.
As pointed out in Section 3.4, the CML method depends upon correct model specification. To evaluate the effects of model misspecification on parameter estimation, we consider {Z t } as an independently and identically distributed geometric random-variable sequence with a mean of λ within the data generation process (10) and (11). Subsequently, we employ both the CLS and CML methods for estimation, presenting the results in the table below.
From Table 2, we can observe that the three statistical measures Bias, RMSE, and MADE for the CML estimator have noticeably increased compared to the CLS estimator. This indicates that model misspecification significantly impacts CML estimation, necessitating appropriate model selection before employing the CML estimation method. As long as the conditional expectation E(Y_t | Y_{t−1}) is correctly specified, CLS estimation will be more robust than CML estimation. Moreover, we provide the parameter estimation simulation results obtained under misspecification of the φ_t | y_{t−1} distribution in Appendix A, as shown in Table A2.
Table 2 settings: β_0 = 1, β_1 = −0.6, λ = 1.2; φ_t | y_{t−1} follows a uniform distribution, and Z_t follows a geometric distribution.

Interval Estimation
We perform a numerical simulation study on the coverage frequency of the interval estimation proposed in Theorem 2 and Remark 5 for the true values in the model. We consider parameter settings of β_0 = 1, β_1 = −0.6, and λ = 1.2. The nominal levels considered are 0.90 and 0.95, with the specific simulation results presented in Table 3. From Table 3, we can observe that as the sample size T increases, the coverage frequency of the interval estimation gradually approaches the nominal level. Even with smaller sample sizes, the coverage frequency for the true values remains satisfactory. This result suggests that the data-driven interval estimation achieves commendable performance.

Empirical Likelihood Test
Lastly, we perform a numerical simulation study on the empirical likelihood test (EL test). For the observation-driven parameter model defined by data generation processes (10) and (11), we aim to test whether β 1 equals 0. If β 1 = 0, our model's parameters are not driven by observations. We employ models (10) and (11) to generate sequences, assuming φ t |y t−1 is a fixed parameter, and perform estimation under the null hypothesis. Then, we compare the test statistic proposed in Theorem 3 with the upper 0.90 and 0.95 quantiles of the corresponding chi-square distribution; if the EL test statistic exceeds the critical value, we reject the null hypothesis.
Initially, we investigate scenarios in which the true value of β_1 for the data generation process equals 0, testing hypotheses of the form H_0 : β_1 = −b against H_1 : β_1 ≠ −b, where b is a nonnegative constant. The simulation results for the test power are presented below (the results for H_0 : β_1 = 0 represent the frequency of Type I errors). Next, we examine scenarios where the true value of β_1 in the data generation process is not equal to 0, considering the same family of hypotheses. The simulation results for the test power are as follows.

From Tables 4 and 5, we observe that the Type I error frequency of the EL test gradually approaches the corresponding nominal level as the sample size T increases, while the test power concurrently ascends to 1. Notably, in small-sample scenarios, when the true value of β_1 is 0, the test power for H_0 : β_1 = −0.1 is relatively low. Likewise, when the true value of β_1 is −0.1, the test power for H_0 : β_1 = 0 exhibits a similar pattern. Overall, however, the EL test performs satisfactorily when the gap between the true and hypothesized values of β_1 is relatively large, or in cases involving large samples. Owing to space constraints, we include in Appendix A the EL test simulation results for the parameter λ under φ_t | y_{t−1} following four distinct random distributions, as shown in Table A3.

It is crucial to note that the estimating equation employed in the empirical likelihood test solely reflects the linear mean structure inherent in the data-generating process. For more intricate and nonlinear coefficient random distributions, the test exhibits limited descriptive capacity. As a result, we advise against utilizing the empirical likelihood test in cases where φ_t | y_{t−1} is stochastic. In Appendix A, we present numerical simulation results pertaining to the empirical likelihood test when φ_t | y_{t−1} adheres to an exponential distribution.
As evidenced by Table A4, the empirical likelihood test demonstrates a very high frequency of Type I errors when φ t |y t−1 conforms to an exponential distribution. Consequently, we discourage the use of the empirical likelihood test in such circumstances.

Real Data Application
In this section, we analyze the daily download count data for the software CWB TeXpert, covering the period from 1 June 2006, to 28 February 2007, resulting in a sample size of T = 267. This dataset is made available on the Supplementary webpage associated with Weiß [23].
From the sample path in Figure 2, we observe that this data contains a considerable number of extreme values. Simultaneously, the ACF and PACF plots suggest that the sample might have originated from a first-order autoregressive data-generating process. We proceed to analyze this data using the models introduced in this paper. For the CML estimation, CML_fix in the table below represents φ_t | y_{t−1} as a fixed parameter, CML_unif denotes φ_t | y_{t−1} following a uniform distribution, CML_exp signifies φ_t | y_{t−1} following an exponential distribution, and CML_chi indicates φ_t | y_{t−1} following a chi-square distribution. Additionally, for comparison purposes, we applied the model proposed by Yu et al. [18] to this dataset, denoted as CML_geom in the subsequent table. The estimation results are displayed in Table 6, where we provide AIC and BIC values for the four distributions that φ_t | y_{t−1} may follow.
Based on these two information criteria, we show a preference for models in which φ t |y t−1 follows either a chi-square distribution or an exponential distribution. This preference might be attributable to the presence of extreme values in the sample path, as anticipated. As observed in Figure 1 in Section 4, models with φ t |y t−1 following either a chi-square or exponential distribution prove more effective in capturing data characterized by extreme values.

Discussion and Conclusions
In this paper, we propose a first-order integer-valued autoregressive time series model based on the Poisson thinning operator. The parameters of this model are observation-driven and may follow specific random distributions, resulting in time-varying autoregressive coefficients. We established the ergodicity of this model and performed estimation and hypothesis testing using conditional least squares (CLS), conditional maximum likelihood (CML), and empirical likelihood (EL) methods. Additionally, we provided a data-driven interval estimation.
In the numerical simulation study, we compared the parameter estimation performance of CLS and CML, verified the coverage frequency of the interval estimation for the true parameter values in the data generation process, and conducted corresponding simulation studies for the EL test. The simulation study reveals that the properties of the CML estimation depend on the correct model specification, while the CLS estimation demonstrates a degree of robustness against model misspecifications.
In future research, observation-driven parameter integer-valued time series models offer numerous promising avenues for development. A brief overview of some of these directions follows: (1) Combining observation-driven parameters with self-driven parameters, namely self-exciting threshold models: the SETINAR model of Monteiro, Scotto, and Pereira [16], Model (12), switches between two INAR regimes according to whether the lagged observation Y_{t−d} exceeds a threshold R. In this model, p^(1) and p^(2) represent given positive integers (the autoregressive orders of the two regimes), with the thinning coefficients of regime j summing to a value in (0, 1) for j = 1, 2. Additionally, the innovation series Z_t^(1) and Z_t^(2) possess probability distributions F_1 and F_2 on the set of natural numbers N_0, respectively. The constant R represents the threshold value responsible for the structural transition in the lagged d-period observation excitation model. Monteiro, Scotto, and Pereira [16] demonstrated that Model (12) possesses a strictly stationary solution when p^(1) = p^(2) = 1. By effectively combining observation-driven parameter models with self-driven parameter models and flexibly selecting thinning operators, a more diverse range of integer-valued time series models can be characterized.
(2) Expanding upon current observation-driven models to incorporate higher-order models: Du and Li [24] introduced the INAR(p) model

Y_t = α_1 ∘ Y_{t−1} + α_2 ∘ Y_{t−2} + · · · + α_p ∘ Y_{t−p} + Z_t,

in which ∑_{i=1}^{p} α_i < 1 and {Z_t} represents a sequence of integer-valued random variables defined on the set of natural numbers N_0. Existing observation-driven models are primarily first-order models. By extending these models to higher-order versions, more intricate and complex parameter dynamics can be described. It is important to note that when progressing to higher-order models, the technique utilized in the proof of Property 2 is no longer applicable for establishing the model's ergodicity. As a result, new proof methods need to be sought from related Markov chain theories.
(3) Extending the observation-driven parameter setting to Integer-valued Autoregressive Conditional Heteroskedasticity (INARCH) models: Fokianos, Rahbek, and Tjøstheim [25] proposed the INARCH model (which they referred to as Poisson autoregression), in which Y_t given the past is Poisson with intensity λ_t, and λ_t is a linear function of λ_{t−1} and Y_{t−1} with coefficients α ≥ 0 and β ≥ 0 satisfying α + β < 1. This model is a natural extension of the generalized linear model and helps to capture the fluctuating changes of observed variables over time. Another advantage of this model is its simplicity, which makes it easy to establish the likelihood function of the INARCH model. Extending the observation-driven parameter setting to integer-valued autoregressive conditional heteroskedasticity models allows the model to describe the driving effect of the fluctuations of observed variables on the parameters. However, the challenge lies in the fact that, compared to the INAR model used in this paper, the ergodicity of the INARCH model is more difficult to establish.

(4) Forecasting integer-valued time series: in time series research, it is common to employ the h-step-ahead conditional expectation E(Y_{T+h} | Y_T) for forecasting. Nonetheless, this approach does not guarantee that the predicted values will be integers, and such predictions primarily describe the expected characteristics of the model, without capturing potentially time-varying coefficients or other features, as illustrated in Figure A1. Furthermore, Freeland and McCabe [26] highlighted that utilizing conditional medians or conditional modes for forecasting could be misleading. Consequently, it is essential to adopt innovative forecasting methods for integer-valued time series analysis.
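As a concrete illustration of point (4), the naive plug-in forecaster below iterates the one-step conditional-mean map; because the map is nonlinear, this only approximates the exact h-step conditional expectation for h > 1, and the output is generally not an integer, which is precisely the drawback noted above. The exponential link is our illustrative assumption:

```python
import math

def plugin_forecast(y_T, h, beta0, beta1, lam):
    """Iterate the one-step conditional-mean map m(y) = nu(y)*y + lam,
    with the illustrative link nu(y) = exp(beta0 + beta1*y).  The result
    is a plug-in approximation and is generally not an integer."""
    y = float(y_T)
    for _ in range(h):
        y = math.exp(beta0 + beta1 * y) * y + lam
    return y
```

Rounding the output, or reporting the full predictive distribution instead of a point value, are the kinds of remedies the forecasting literature cited above discusses.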
The rapid advancement of machine learning and deep learning in recent years has offered numerous new perspectives, such as the deep autoregressive model based on autoregressive recurrent neural networks proposed by Salinas, Flunkert, Gasthaus, and Januschowski [10], which may hold significant potential for widespread application in the domain of integer-valued time series.

Appendix A

Property A1. (i) Given the data generation process (4), the law of iterated expectation yields the conditional mean E(Y_t | Y_{t−1}) = ν(Y_{t−1}; β) Y_{t−1} + λ; using the formula Var(Y) = Var(E(Y|X)) + E(Var(Y|X)), the conditional variance follows. (ii) By the law of iterated expectation, the unconditional moments follow in the same way.

Property A2. According to Theorem 1 in Tweedie [27] (see also Meyn and Tweedie [28]), a sufficient condition for {Y_t} to be an ergodic Markov chain is the existence of a set K and a measurable function g on the state space Y of {Y_t} such that E(g(Y_t) | Y_{t−1} = y) ≤ g(y) − ε for y ∉ K and, for a constant B, E(g(Y_t) | Y_{t−1} = y) ≤ B for y ∈ K. The state space Y of {Y_t} is the set of natural numbers N = {0, 1, 2, 3, . . .}. Let g(y) = y; then E(g(Y_t) | Y_{t−1} = y) = ν(y; β) y + λ. Since sup_{y∈N} ν(y; β) < 1, we can choose a constant 0 < m < 1 with sup_{y∈N} ν(y; β) < m, so that ν(y; β) y + λ ≤ m y for all y ≥ N, where N is a sufficiently large integer. Defining K = {0, 1, 2, . . . , N − 1}, both conditions hold. Hence, the data generation process {Y_t} is ergodic.
Theorem A1. According to Theorems 5 and 6 in Klimko and Nelson [29], let g(θ; Y_{t−1}) = E(Y_t | Y_{t−1}); if the following conditions hold, then Theorem 1 in this paper holds: the first-, second-, and third-order partial derivatives of g with respect to the components of θ exist and are continuous with respect to θ, and suitable dominating functions exist. Note that the second- and third-order partial derivatives of g with respect to λ are both 0. According to assumptions (A2) and (A3), the required derivatives exist and are continuous with respect to θ. According to assumption (A5), V(θ_0) is non-singular. Based on assumptions (A1) and (A4) and the Hölder inequality, all conditions are satisfied. Thus, Theorem 1 holds.
By similar reasoning, let M*_T = max_{1≤t≤T} ‖M_t(θ)‖; for any ε > 0, with probability 1, there will be only a finite number of T ∈ N such that M*_T > ε√T. The ergodicity of {Y_t} and Lemma A1 then give the required convergence.

Theorem A2. Given the ergodicity of {Y_t} and Lemma A1, and applying Theorem 14.6 from Davidson [30], the law of large numbers holds for the relevant averages. Similarly, since E(M_t(θ_0) | F_{t−1}) = 0, the partial sums of each coordinate of M_t(θ_0) form a martingale with respect to the natural filtration. Based on this, the ergodicity of {Y_t}, and Lemmas 2 and 3, applying Theorem 25.4 from Davidson [30] establishes that the conditions of Theorem 25.3 in Davidson [30] are satisfied. Furthermore, for any non-zero vector c of conformable dimension, the Cramér–Wold device applies with σ_c² = E(c′ M_t(θ_0) M_t(θ_0)′ c). In summary, H(θ_0) →_d χ² with dim(θ) degrees of freedom.
Theorem A3. Following steps similar to those in Yu, Wang, and Yang [18] and Qin and Lawless [20] (replacing the usage of the law of the iterated logarithm with Lemma A4), we can establish the required expansions. Based on this, we perform a Taylor expansion of 2L_E(θ̂_EL) at θ = θ_0, γ = 0. It is easy to see that the resulting matrix is symmetric; we then show that this symmetric matrix is positive semi-definite, where A ⪰ B means that A − B is positive semi-definite. Therefore, by the result in Rao [32], the conclusion follows.