Stability and asymptotics for autoregressive processes

Abstract: The paper studies infinite order autoregressive models for both temporal and spatial processes. We present sufficient conditions for the existence of stationary distributions. To understand the underlying dynamics and to capture the dependence structure, we introduce functional dependence measures and relate them to Lipschitz coefficients of the data-generating mechanisms. Our stability result allows both short- and long-range dependence. With functional dependence measures, we can establish an asymptotic theory for the underlying processes.


Introduction
Nonlinear autoregressive (AR) processes have been extensively studied in the literature; see Priestley [25], Tong [31], Tjøstheim [30], Fan and Yao [12] and Wu and Shao [35], among others. To study such processes, one needs to deal with two fundamental issues: stability and asymptotic theory. For the former, one should develop sufficient conditions on the mechanism of the underlying process so that it can have a stationary solution. The asymptotic theory is useful for the related statistical inference. In this paper we shall consider both issues for a general class of nonlinear AR(∞) models.
To fix ideas, we adopt the following setting. Let ε_t, t ∈ Z, be independent and identically distributed (i.i.d.) random elements on the probability space (Ω, F, P). Consider

X_t = F(X_{t−1}, X_{t−2}, . . . ; ε_t),    (1)

where F_{ε_t}(x_1, x_2, . . .) = F(x_1, x_2, . . . ; ε_t) is a real-valued function and can be viewed as the data generating mechanism of the process (X_t). We can view (1) as an AR(∞) process. By Wu and Shao [35], for the special AR(1) process

X_t = F(X_{t−1}; ε_t),    (2)

assuming that there exists x* such that E|F_{ε_0}(x*)|^p < ∞, p ≥ 1, and the contraction condition

‖F_{ε_0}(x) − F_{ε_0}(x′)‖_p ≤ L|x − x′| with L < 1,    (3)

where ‖Z‖_p = (E|Z|^p)^{1/p}, then there exists a stationary solution of the form

X_t = g(ξ_t),    (4)

where g is a measurable function and ξ_t = (ε_t, ε_{t−1}, . . .) is the shift process. See Diaconis and Freedman [7] and Jarner and Tweedie [18] for related contributions. If condition (3) fails with L = 1, then (X_t) may not have a stationary solution. A prominent example is the random walk X_t = X_{t−1} + ε_t, which has L = 1. Shao and Wu [29] considered AR(d) processes with finite lag d,

X_t = F(X_{t−1}, . . . , X_{t−d}; ε_t),    (5)

and obtained a similar result: (5) has a stationary solution if

a_1 + · · · + a_d < 1,    (6)

where the a_i ≥ 0 are Lipschitz constants: for all s_1, . . . , s_d and w_1, . . . , w_d,

‖F(s_1, . . . , s_d; ε_0) − F(w_1, . . . , w_d; ε_0)‖_p ≤ Σ_{i=1}^d a_i |s_i − w_i|.

For the AR(∞) process (1), it turns out that, interestingly, a contraction condition like (3) may not be needed for stationarity. For example, consider the fractional integration model

(1 − B)^d X_t = ε_t, 0 < d < 1/2,    (7)

where B is the backshift operator, BX_t = X_{t−1}. We can rewrite (7) in the form of (1) with a_k = dΓ(k − d)/(Γ(1 − d)Γ(k + 1)).
Then the Lipschitz constants (a_k) for the corresponding linear function F sum to Σ_{k≥1} a_k = 1, while (7) does have a stationary solution since 0 < d < 1/2. In Section 2 we shall study the stability problem for long memory AR(∞) processes. Hence, unlike (2), AR(∞) models can allow both short- and long-range dependence.
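The boundary case Σ_{k≥1} a_k = 1 can be checked numerically. The following Python sketch (an illustration, not part of the paper's development) computes the AR(∞) coefficients a_k = dΓ(k − d)/(Γ(1 − d)Γ(k + 1)) of the fractional integration model (7) via the stable recursion a_1 = d, a_{k+1} = a_k(k − d)/(k + 1), and verifies that the partial sums increase toward 1:

```python
# Coefficients a_k of the AR(infinity) form of (1 - B)^d X_t = eps_t,
# computed by the ratio recursion a_1 = d, a_{k+1} = a_k * (k - d) / (k + 1),
# which follows from a_k = d * Gamma(k - d) / (Gamma(1 - d) * Gamma(k + 1)).
def frac_int_coeffs(d, n):
    a = [d]
    for k in range(1, n):
        a.append(a[-1] * (k - d) / (k + 1))
    return a

d = 0.3
a = frac_int_coeffs(d, 100_000)
s = sum(a)
# All coefficients are positive and the partial sums increase toward 1:
# the contraction condition fails exactly at the boundary, even though a
# stationary solution exists for 0 < d < 1/2.  The tail decays like N^{-d},
# so the partial sum is slightly below 1 at any finite truncation.
print(min(a) > 0, s)
```

The slow hyperbolic decay a_k ≍ k^{−1−d} is exactly what produces the long-range dependence discussed in Section 2.3.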
For the extension to spatial processes, we consider the simultaneous autoregressive scheme

X_t = G((X_{t+v})_{v≠0}; ε_t), t ∈ Z^d.    (8)

We say that

X_t = g((ε_{t+v})_{v∈Z^d}),    (9)

where g(·) is a measurable function, is a stationary solution if it satisfies the relation (8). When d = 1, (8) reduces to the two-sided AR(∞) process

X_t = G((X_{t+k})_{k≠0}; ε_t), t ∈ Z.    (10)

Unlike (1), the autoregressive scheme (10) allows non-causality. Properties of spatial processes are studied by Whittle [33] and Besag [3], among others. Gaussian and linear spatial processes have been widely studied in the literature. For linear processes, Whittle [33] proposed ways to transform bilateral models into unilateral ones so that time series results can be applied. The nonlinear case is more challenging. Here, we will study stationary distributions for bilateral models directly, using an idea similar to loopy propagation, under a short-range dependence condition.
To perform statistical inference for the process (X_t), such as hypothesis testing and the construction of confidence intervals, we need to establish an asymptotic theory. In particular we will present a central limit theorem and a Gaussian approximation result. To this end, we need to measure the decay speed of dependence. In this paper we adopt the framework of functional dependence measures introduced by Wu [34], which is easy to work with for a broad class of functions and enables us to obtain sharp approximation rates. The main task lies in building a convolution relationship between the Lipschitz coefficients of F_{ε_t}(·) and the functional dependence measures of the underlying processes. See Section 3.1 for details.
Using the functional dependence decay results we can derive a CLT and a quenched CLT for the stationary process, and a Gaussian approximation which can be used in various applications such as change-point analysis. For the augmented GARCH(1,1) model, Aue, Berkes and Horváth [1] obtained a Gaussian approximation rate o(n^{5/12+ε}), ε > 0. Using our result, we can derive a sharper Gaussian approximation for both model (1) and augmented GARCH(∞), and our rate is optimal in view of the classical Gaussian approximation for i.i.d. random variables by Komlós, Major and Tusnády [22].
The paper is organized as follows. In Section 2 we present sufficient conditions for the existence of stationary distributions for both temporal and spatial models. It turns out that, interestingly, our result can also be applied to random coefficient models which are not in the form of (1); see Section 2.2, where augmented GARCH(∞) processes are discussed. In Section 3, we introduce functional dependence measures and apply them to our models. Based on that, we derive the relationship between the decay rates of the Lipschitz coefficients and the functional dependence measures, and develop various asymptotic results. Proofs are given in Section 4.

Stationary distribution
In Section 2.1 we shall present sufficient conditions for the existence of stationary distributions of model (1) with short-range dependence. The theorem is also applicable to random coefficient models; see Section 2.2. Section 2.3 (resp. 2.4) concerns long-range dependent processes (resp. simultaneous autoregressive schemes), while Section 2.5 deals with extensions to non-stationary processes.

Short-range dependent AR(∞) processes
To state our main stability result for the process (1), we shall introduce a Lipschitz-type condition. For a random variable Z, we say Z ∈ L^p if ‖Z‖_p := (E|Z|^p)^{1/p} < ∞.

Condition 1 (Stochastic Lipschitz Continuity). Assume that there exist constants a_k ≥ 0, k ∈ N, such that F_{ε_0}(w) ∈ L^p, p ≥ 1, and for all w = (w_1, w_2, . . .) and s = (s_1, s_2, . . .),

‖F_{ε_0}(w) − F_{ε_0}(s)‖_p ≤ Σ_{k=1}^∞ a_k |w_k − s_k|.

Definition 1. On a filtered probability space (Ω, F, (F_t)_{t∈Z}, P), a process (X_t) is said to be adapted if, for each t, X_t is F_t-measurable.
From now on, let F_t always be the σ-field generated by ξ_t = (ε_t, ε_{t−1}, . . .).
Theorem 1. Assume Condition 1 holds and that

Σ_{k=1}^∞ a_k < 1.    (12)

Then there exists a unique strictly stationary solution of (1) in L^p adapted to (F_t)_{t∈Z}.
To incorporate the case with 0 < p < 1, we need to slightly modify Condition 1.
Corollary 1. Further assume (12). Then (1) has a unique strictly stationary solution in L^p adapted to (F_t)_{t∈Z}.
Finite order bilinear processes are considered in Granger and Andersen [16] and Rao and Gabr [27].
To study the existence of stationary solutions, we use the idea of backward iteration, enlightened by the "coupling from the past" algorithm of Propp and Wilson [26]. Traditionally, forward iterations are considered. For the simple Markov chain example (2), assuming for convenience that 0 is in the state space, one checks whether the forward iteration X̃_t = F(X̃_{t−1}; ε_t), X̃_0 = 0, converges weakly as t → ∞. If it converges weakly to a distribution π (say), then π is a stationary distribution. For the backward iteration, we instead let X_t^{(t−n)} be obtained by starting from state 0 at time t − n and iterating the map (2) forward up to time t. Under suitable conditions on F_{ε_t}(·), X_t^{(t−n)} converges almost surely as n → ∞ and the limit, denoted by X_t, satisfies (2). For the AR(∞) process (1), we follow a similar idea to generate the sequence: let X_n^{(m)} = 0 for n ≤ m and X_n^{(m)} = F(X_{n−1}^{(m)}, X_{n−2}^{(m)}, . . . ; ε_n) for n > m. Note that X_n^{(m)} has the same distribution as X_{n−m}^{(0)}, and X_t^{(t−n)} converges almost surely to X_t (say) as n → ∞. In the proof of Theorem 1 we shall make the latter idea rigorous.
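The key point of the backward iteration is that the innovation attached to each time is held fixed while only the starting time moves further into the past. The following Python sketch illustrates this for a toy contractive map F(x; ε) = 0.5 tanh(x) + ε (our own choice, with Lipschitz constant 0.5; not a model from the paper):

```python
import numpy as np

# Backward iteration ("coupling from the past") for a contractive AR(1)
# X_t = F(X_{t-1}; eps_t) with the illustrative map F(x; e) = 0.5*tanh(x) + e.
# The innovation at each time index is FIXED; only the starting time moves back.
rng = np.random.default_rng(0)
T = 60                       # innovations for times -T+1, ..., 0
eps = rng.normal(size=T)

def backward_iterate(n):
    """Start from state 0 at time -n and iterate the map forward to time 0."""
    x = 0.0
    for e in eps[T - n:]:
        x = 0.5 * np.tanh(x) + e
    return x

vals = [backward_iterate(n) for n in (10, 20, 40)]
# Successive backward iterates agree to within roughly L^n = 0.5^n, so they
# converge a.s. to the stationary X_0; forward iterates, in contrast, keep
# fluctuating and only converge in distribution.
print(vals)
```

The geometric agreement of the iterates is exactly the almost sure convergence of X_0^{(−n)} used in the proof of Theorem 1.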

Random coefficient models
Interestingly, our stationarity theory also applies to the following process:

X_t = f_0(ε_t) + Σ_{k=1}^∞ f_k(ε_{t−1}, . . . , ε_{t−k}) X_{t−k},    (16)

where the ε_t are i.i.d. and {f_k}_{k≥0} are real-valued functions. Let Λ : R^+ → R be an invertible function. Based on (16), we can define the augmented GARCH process Y_t by restricting f_k(ε_{t−1}, . . . , ε_{t−k}) to f_k(ε_{t−k}) and letting

Y_t = σ_t ε_t with σ_t^2 = Λ^{−1}(X_t),    (17)

provided that Λ(x) is invertible; see Duan [10]. The augmented GARCH class contains many commonly used ARCH-type models.
The lag-one version was considered by Carrasco and Chen [5], who studied its moment and mixing properties.
Example 5. Ding, Granger and Engle [8] introduced the asymmetric power GARCH model. See [10] and Aue, Berkes and Horváth [1] for other GARCH models. Aue, Berkes and Horváth [1] derived a Gaussian approximation for partial sum processes, and [17] used a blocking technique to derive various asymptotic properties. The stability and asymptotic properties of infinite lag augmented GARCH have not been discussed in the literature. Here we can tackle the latter problem using a method similar to that for process (1).
Then process (16) has a unique L^p strictly stationary solution of the form X_t = g(ξ_t).

Remark 1. Inequalities (12) and (19) can be viewed as contraction conditions. When the corresponding sums equal 1, extra assumptions are needed to guarantee a stationary solution; see Section 2.3. Such a process can be long-memory.
For 0 < p < 1, we need a slight modification of condition (19) and an approach similar to Corollary 1. Douc, Roueff and Soulier [9] used a Volterra expansion to establish the stationary distribution. Here we can deal with a more general situation: under the modified condition, there exists a stationary solution of (16) which has a finite pth norm.
Remark 2. Notice that in model (16), if f_k ≥ 0 for all k ≥ 0, then the stationary distribution of (16) is nonnegative. This is useful in checking the existence of Λ^{−1}(X_t): for example, if Λ(x) = x, then for σ_t^2 = Λ^{−1}(X_t) we require X_t to be nonnegative.
Example 6. For the ARCH(∞) model (18), we let in (16) X_t = σ_t^2, ε_k = ξ_k^2 and f_k(x) = β_k x, where the ξ_k are the innovations. Corollary 2 gives the sufficient condition for stationarity of (Y_t):

Σ_{k=1}^∞ β_k ‖ξ_0^2‖_p < 1.

This condition is also proposed in Giraitis, Kokoszka and Leipus [13]. With the special structure (18), they apply the Volterra series expansion, which is also used in subsequent works; see Kazakevicius and Leipus [21] and Giraitis, Leipus and Surgailis [14, 15], among others. In comparison, our treatment does not rely on this special structure. Instead we use a convolution relation and backward generation, which can be applied to a broader class of nonlinear models.
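A truncated ARCH(∞) recursion is easy to simulate and to check against the contraction condition. The sketch below uses hypothetical coefficients β_k = c ρ^k of our own choosing (any summable nonnegative sequence with Σβ_k E[ξ²] < 1 would do) and standard normal innovations, so E[ξ²] = 1:

```python
import numpy as np

# Truncated ARCH(infinity) sketch: sigma_t^2 = beta_0 + sum_k beta_k * Y_{t-k}^2,
# Y_t = sigma_t * xi_t, with hypothetical coefficients beta_k = c * rho**k.
# The sufficient condition of Example 6 with p = 1 reads sum_k beta_k * E[xi^2] < 1.
rng = np.random.default_rng(1)
beta0, c, rho, K = 0.1, 0.2, 0.5, 30
beta = c * rho ** np.arange(1, K + 1)
cond = beta.sum() * 1.0           # E[xi^2] = 1 for standard normal innovations
assert cond < 1                   # the contraction condition holds

n = 5000
Y2 = np.zeros(n)                  # squared observations Y_t^2
for t in range(n):
    lags = Y2[max(0, t - K):t][::-1]        # Y_{t-1}^2, Y_{t-2}^2, ...
    sigma2 = beta0 + (beta[:len(lags)] * lags).sum()
    Y2[t] = sigma2 * rng.normal() ** 2
print(cond, Y2[-1000:].mean())    # condition value and a stable second moment
```

Under the condition, the simulated second moment stabilizes near β_0/(1 − Σβ_k); if the condition value exceeded 1, the recursion would blow up.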

Long-range dependent AR(∞) processes
If condition (12) in Theorem 1 is violated and Σ_{k=1}^∞ a_k = 1, as the fractional integration process (7) shows, a stationary solution can still possibly exist. For convenience we still assume that 0 is in the state space. For process (1), define X_{t+k}^{(t)} recursively by the backward generation of Section 2.1, starting from state 0 at time t.

Condition 2. There exists some p > 0 such that the cumulative L^p influence of the initial state, expressed through the backward shift process G_t, is finite.

This condition can be interpreted as saying that the cumulative influence of the initial state is finite. It holds for many processes. Below we shall consider the example of random coefficient AR models.
A special case of (21) is the bilinear model, with f_k affine. Computing the recursion (20) based on (21), the difference of successive iterates satisfies a linear recursion whose initial value is determined by the starting state. Consequently, by induction and (22), we can bound the pth moment of this difference, and (1) then has a stationary L^p solution.
We now apply Theorem 2 to (21). Assume that a_k = ‖f_k(ε_0)‖_p < ∞, p ≥ 2, and that (23) holds; then there exists a stationary L^p solution.
and |G(s)| is bounded from below by a constant c > 0, then (23) holds. We can also replace (23) by corresponding conditions (24) on the tail sums Σ_{k>n} a_k. In the following example we shall apply a Tauberian theorem to verify (24). For sequences (a_n) and (b_n), write a_n ∼ b_n if a_n/b_n → 1 as n → ∞.
which is an analogue of Condition 2 for process (16); it also implies that the cumulative influence of the initial state is finite. From the recursive equation and the zero initial value, (27) follows directly.

Simultaneous autoregressive schemes
In this section we shall consider stationary distributions for spatial models. Linear spatial processes were studied by Whittle [33] and Besag [3], among others. For the form (10), which can be viewed as a bilateral version of model (1), we adopt an idea similar to the loopy propagation commonly used in machine learning. First set the initial values to zero, and then update them based on the previous iterate:

X_t^{(k+1)} = G((X_{t+j}^{(k)})_{j≠0}; ε_t), X_t^{(0)} = 0.

Similarly, we can set the initial values and update them for the general form (8). Under suitable conditions on G (cf. Condition 3), X_t^{(k)} has a limit as k → ∞; cf. Theorem 3.
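The zero-start update scheme can be demonstrated on a small two-sided linear example. The following sketch (illustrative only: the nearest-neighbor coefficients a_{±1} = 0.2, with Σ|a_k| = 0.4 < 1, are our own choice, and the process lives on a circle of length n rather than all of Z) iterates the update map from zero and shows geometric convergence to a fixed point:

```python
import numpy as np

# Zero-start iteration for a two-sided linear scheme
# X_t = 0.2 * X_{t-1} + 0.2 * X_{t+1} + eps_t on a circle of length n.
rng = np.random.default_rng(2)
n = 200
eps = rng.normal(size=n)

def update(x):
    # One sweep of the update map; np.roll implements the circular neighbors.
    return 0.2 * np.roll(x, 1) + 0.2 * np.roll(x, -1) + eps

x = np.zeros(n)
gaps = []
for _ in range(40):
    x_new = update(x)
    gaps.append(np.abs(x_new - x).max())   # sup-norm gap between sweeps
    x = x_new
resid = np.abs(x - update(x)).max()        # fixed-point residual
print(gaps[-1], resid)
```

Since the update map is a sup-norm contraction with factor 0.4, the gaps shrink like 0.4^k, mirroring the convergence of X_t^{(k)} asserted in Theorem 3.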
Condition 3. There exist constants a_v ≥ 0, v ∈ Z^d, and p ≥ 1, such that for all w = (w_v)_{v≠0} and s = (s_v)_{v≠0},

‖G(w; ε_0) − G(s; ε_0)‖_p ≤ Σ_{v≠0} a_v |w_v − s_v|.

Theorem 3. Assume Condition 3 with some p ≥ 1 and the contraction condition Σ_{v≠0} a_v < 1. Then there exists a unique L^p stationary solution of (8).
Consider the spatial threshold AR model. This example is a spatial generalization of the threshold AR processes of Tong [31].

Extension to non-stationary processes
Though stationary models work well in many cases, they may be unsuitable in more complicated situations: when the location domain has a boundary or is non-lattice, the neighborhood configurations differ from point to point, and there is hardly a geometric or physical basis for assuming stationarity. For such irregular cases, the function G in (8) can be location dependent. Paulik, Das and Loh [24] and Brunsdon, Fotheringham and Charlton [4] studied linear cases, and Jenish and Prucha [19, 20] derived LLNs and CLTs for nonlinear situations under mixing or near-epoch dependence.
Here we consider the model

X_t = G^{(t)}((X_s)_{s∈Θ_t}; ε_t), t ∈ Θ,    (30)

where Θ is a set with countably many points, Θ_t ⊆ Θ may change with t, and the data generating mechanism G^{(t)} is a real-valued measurable function. This setting arises in practice: for instance, if the lattice has a boundary, or if we are dealing not with a regular lattice but with a general undirected graph, then the relative configuration at each point differs and it is no longer appropriate to assume the same function G for every point.
For this more general situation, under certain uniform boundedness conditions on G^{(t)}, there exists a measurable function H_t such that X_t = H_t(ξ) satisfies the system (30), where ξ = (ε_t)_{t∈Θ}.

Condition 4. Assume there exist coefficients a_{t,s} ≥ 0, ρ < 1, M < ∞ and p ≥ 1 such that the data generating mechanism G^{(t)} satisfies

‖G^{(t)}(w; ε_t) − G^{(t)}(w′; ε_t)‖_p ≤ Σ_{s∈Θ_t} a_{t,s} |w_s − w′_s|, Σ_{s∈Θ_t} a_{t,s} ≤ ρ, and ‖G^{(t)}(0; ε_t)‖_p ≤ M.

Corollary 5. If (G^{(t)})_{t∈Θ} satisfies Condition 4, then there exists a measurable function H_t such that X_t = H_t(ξ) ∈ L^p and (30) holds.

Functional dependence measures
In this section, we shall compute the functional dependence measures introduced by Wu [34] for the processes (1) and (8). In view of (4) and (9), we consider the form

X_i = g(ε_{i−j}, j ∈ Ξ),    (31)

where the data generating mechanism g is a real-valued measurable function such that X_i is properly defined, and the ε_i, i ∈ Θ, are i.i.d. random variables. For model (1) with representation (4), Ξ = {0, 1, 2, . . .} and Θ = Z. For spatial processes on the lattice in Z^d, both Ξ and Θ are Z^d.
Assume that X_i ∈ L^p, p > 0. Let ε′_j, ε_i, i, j ∈ Θ, be i.i.d. random variables. In view of (31), X_t is a random variable constructed from the underlying random sequence (ε_i, i ∈ Θ). Therefore, instead of directly describing the relationship between X_s and X_t, we use the functional dependence measure to capture the extent to which X_t depends on the individual ε_i. Changing ε_{t−i} to ε′_{t−i} while keeping the other ε_j unchanged, we obtain a coupled copy of X_t, denoted by X_{t,t−i}. By stationarity, the functional dependence measure is

δ_{i,p} = ‖X_t − X_{t,t−i}‖_p.

To deal with functional dependence measures, we need the following Theorem 4, which concerns magnitudes of convolved sequences. The result is of independent interest. Case (v) provides an explicit decay speed of the functional dependence measure, and it implies that the bound in case (ii) is sharp.
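The coupling construction above is straightforward to estimate by Monte Carlo. The sketch below (illustrative only; the AR(1)-type coefficients φ^j with φ = 0.6, the truncation length and the sample size are our own choices) estimates δ_{i,2} for a truncated linear process X_t = Σ_j φ^j ε_{t−j}, where the exact value δ_{i,2} = φ^i ‖ε_0 − ε′_0‖_2 = φ^i √2 is available for comparison:

```python
import numpy as np

# Monte Carlo estimate of the functional dependence measure delta_{i,2} for the
# truncated linear process X_t = sum_{j<J} phi^j eps_{t-j}: replace eps_{t-i}
# by an independent copy and measure the L^2 distance of the coupled outputs.
rng = np.random.default_rng(3)
phi, J, R = 0.6, 50, 20_000
b = phi ** np.arange(J)                # coefficients b_j = phi^j

i = 3
eps = rng.normal(size=(R, J))          # column j holds eps_{t-j}
eps_cp = eps.copy()
eps_cp[:, i] = rng.normal(size=R)      # couple: replace eps_{t-i} by eps'_{t-i}
X, X_cp = eps @ b, eps_cp @ b
delta_hat = np.sqrt(np.mean((X - X_cp) ** 2))
print(delta_hat, phi ** i * np.sqrt(2))   # estimate vs exact phi^i * sqrt(2)
```

For this linear case the measure inherits the geometric decay of the coefficients, which is the prototype of the convolution relationship developed below.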
In the following two sections, we will apply this dependence measure to our models and derive the relationship between the decay rates of the functional dependence measures and the Lipschitz coefficients. The functional dependence measures of the underlying processes are quite useful for further deriving asymptotic properties; cf. Sections 3.2-3.4.
We can similarly have a corresponding result for the process (16).
Corollary 7. Let (X_i)_{i∈Z} be the random coefficient AR(∞) process defined in (16). Assume that the conditions of Corollary 2 are satisfied with p ≥ 1. Remark 5. For the PGARCH and asymmetric PGARCH, θ_{t,p} = 0 for any t.
Proof of Corollary 7. Changing ε_0 to ε′_0, we similarly obtain a coupled sequence X_{t,0} satisfying the analogous recursion. Let u_n = a_n and v_n = b_n as in (37). Then the result follows from Theorem 4.

Functional dependence measure for simultaneous autoregressive schemes
Consider the process (8). Recall the generating mechanism of (X Condition 5. Let p ≥ 1. For the data generating mechanism G 0 in (8), assume that

Condition 5 holds, for instance, when
Then the functional dependence measure δ

Proposition 1. Assume that function G satisfies Conditions 3 and 5 with
For the line transect model (10), the bound is the same as in Theorem 4. For the higher dimensional case Z^d, we require an extra factor |v|^{1−d}.
Example 11. If the process has only finite order, then we can obtain geometric moment contraction ([35]) under milder conditions. Specifically, for X_t = G_t(X_s, s ∈ Θ), t ∈ Z^d, assume that the index set Θ contains only finitely many vectors in Z^d \ {0}; then r = max{|v| : v ∈ Θ} is finite, and the functional dependence measure of X_t decays geometrically provided that Condition 3 holds with ρ = Σ_{v∈Θ} a_v < 1. Let K = ⌊|t|/r⌋; then δ_{t,p} = O(ρ^K). Hence we have the geometric moment contraction δ_t = O(ρ^{|t|/r}). In the nonlinear time series setting (cf. (5) and (6)), Shao and Wu [29] obtained a similar result.

Functional dependence measure for non-stationary simultaneous autoregressive schemes
Let (Θ, d) be a metric space containing countably many indices. By Corollary 5, we can construct (X_t)_{t∈Θ} satisfying (30). Interestingly, we can obtain results similar to Proposition 1 for such a system. To account for non-stationarity, we define the functional dependence measure pointwise in t. Condition 6. Let p ≥ 1. For G^{(t)} in (30), assume that the Lipschitz bound holds for all t ∈ Θ, and M_0 := sup_{t∈Θ} ‖g_t(0, ε_0)‖_p < ∞.

Central limit theorem
Theorem 1 in El Machkouri, Volný and Wu [11] asserts that if the functional dependence measure δ_{i,2} of the process (9) is summable, then

|Γ_n|^{−1/2} Σ_{i∈Γ_n} (X_i − EX_i) ⇒ N(0, σ^2)    (42)

for any sequence of finite sets Γ_n ⊂ Z^d with |Γ_n| → ∞. Notice that the above CLT holds without any other requirement on Γ_n. The summability of (δ_{i,2}) follows from Theorem 4. If additionally |∂Γ_n|/|Γ_n| goes to zero, then the normalized variance converges, |Γ_n|^{−1} var(Σ_{i∈Γ_n} X_i) → σ^2. For the process (8), by Proposition 1, if Condition 5 holds with p = 2 and Σ_{v≠0} a_v < 1, then Σ_{i∈Z^d} δ_{i,2} < ∞ and the above CLT holds.
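In dimension d = 1 the CLT can be checked numerically. The following sketch (a numerical illustration with parameters of our own choosing, not part of the formal theory) simulates the linear AR(1) process X_t = φX_{t−1} + ε_t, whose functional dependence measures δ_{i,2} = φ^i √2 are summable, and compares the variance of n^{−1/2}S_n with the long-run variance σ² = 1/(1 − φ)² for unit-variance innovations:

```python
import numpy as np

# CLT check for the AR(1) process X_t = phi * X_{t-1} + eps_t: the variance of
# n^{-1/2} * S_n should approach sigma^2 = 1/(1 - phi)^2 = 4 for phi = 0.5.
rng = np.random.default_rng(4)
phi, n, reps = 0.5, 2000, 2000
eps = rng.normal(size=(reps, n))
x = np.zeros(reps)            # all replications advanced one time step at once
s = np.zeros(reps)
for t in range(n):
    x = phi * x + eps[:, t]
    s += x
sums = s / np.sqrt(n)         # reps independent copies of n^{-1/2} * S_n
print(sums.mean(), sums.var())    # mean near 0, variance near 4
```

The long-run variance exceeds the marginal variance 1/(1 − φ²) because positive autocorrelations inflate the partial sums, which is why σ² in (42) involves the full covariance sum rather than var(X_0) alone.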

Quenched central limit theorem
In certain applications such as MCMC, the process starts at values that do not follow the stationary distribution. This leads to the idea of a quenched CLT; see Volný and Woodroofe [32]. For process (1), due to its infinite order, we cannot generate it directly; however, we can generate a sequence (X_k^•)_{k≥1} through the recursion (1) initialized at fixed starting values. Theorem 5 provides a CLT for such sequences.
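The mechanism behind the quenched result is that the gap between the fixed-start run and the stationary run contracts to zero, so their partial sums share one CLT. A toy sketch (an AR(1) illustration with our own parameters, not the paper's general model):

```python
import numpy as np

# Quenched-start illustration: run X_t = 0.5 * X_{t-1} + eps_t once from the
# exact stationary start and once from the fixed (non-stationary) point x = 5,
# DRIVEN BY THE SAME innovations.  The gap d_k = |X_k - X_k^o| contracts by the
# factor 0.5 at every step, so both runs obey the same CLT.
rng = np.random.default_rng(5)
n = 50
eps = rng.normal(size=n)
x_stat = rng.normal() / np.sqrt(1 - 0.25)   # stationary N(0, 1/(1 - phi^2))
x_fix = 5.0
gaps = []
for e in eps:
    x_stat = 0.5 * x_stat + e
    x_fix = 0.5 * x_fix + e
    gaps.append(abs(x_stat - x_fix))
print(gaps[0], gaps[-1])      # the gap decays geometrically: gap_k = 0.5^k * gap_0
```

Here d_k → 0 deterministically at the contraction rate; in the proof of Theorem 5 the same role is played by the recursive inequality obtained from Condition 1.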
Proof of Theorem 5. By Theorem 1, there exists a stationary solution in L^p. Let c_0 = max(‖X_0‖_p, ‖X_0^•‖_p). Condition 1 leads to a recursive inequality for d_k := ‖X_k − X_k^•‖_p. Since d_k → 0, (44) goes to 0 as n → ∞. Thus it remains to show the CLT for the partial sums of (X_i). From Theorem 4(i), Condition 1 implies the summability of the functional dependence measures (‖X_0 − X_{0,−k}‖_p)_{k≥0}. Then by (42) in Subsection 3.2, the CLT holds. The proof of Corollary 9 is similar.

An invariance principle
Section 2 provides sufficient conditions for the existence of stationary distributions for processes (1) and (16). Based on the functional dependence measures from Section 3.1, we can further derive Gaussian approximation results.
Theorem 6. Let (a_k) be either (i) the constants in Condition 1 for process (1), or (ii) a_k = u_k + v_k defined in (37) for process (16), with p > 2. Assume Σ_{k≥1} a_k < 1 together with a suitable decay condition on (a_k). Then there exists a probability space (Ω^c, A^c, P^c) on which we can define a process (X_t^c) with the same distribution as the stationary solution of (1) (resp. (16)) and a standard Brownian motion B(·) such that

Σ_{i≤n} (X_i^c − EX_i^c) − σB(n) = o_{a.s.}(n^{1/p}).

Proof of Theorem 6. We shall only prove case (i), since (ii) is similar. By Theorem 1, (1) has an L^p stationary solution (4). Following the steps in Section 3, we can get (36), and the conclusion then follows.

Aue, Berkes and Horváth [1] obtained an invariance principle for a process which is a special case of our (16) and (17). Assuming that |1/Λ′(Λ^{−1}(x))| ≤ Cx^γ and Λ(σ_0^2) ≥ ω hold for some constants C, γ, ω > 0, and that Y_0 has a finite moment of order ν > 4(1 + γ), they obtained a strong invariance principle for S_n = Σ_{i=1}^n Y_i with rate o_{a.s.}(n^θ), where θ > 5/12. Our Theorem 6 provides a much sharper rate. Let p = ν/(1 + γ); then p > 4, and Theorem 6 leads to a Gaussian approximation with error rate o_{a.s.}(n^{1/p}), which is much sharper than their rate o_{a.s.}(n^θ) with θ > 5/12 since 1/p < 1/4 < 5/12. Aue, Berkes and Horváth [1] applied their invariance principle to a change point detection problem with weighted CUSUM statistics ([6]). It is expected that our sharper strong invariance principle can lead to an improved convergence rate there.

Proofs
In this section we shall provide proofs of the results stated in the previous sections.

Proof. We shall show by induction that τ_k ≤ c + Σ_{i=1}^k b_i. This trivially holds when k = 0. Suppose it holds for all k ≤ n; then for k = n + 1 the bound follows from (33) and the induction hypothesis. Thus Σ_{k=0}^N τ_k ≤ (c + B)/δ for every N, and therefore T := Σ_{k=0}^∞ τ_k is finite. Applying (33) again and letting N → ∞, we have T ≤ τ_0 + B + AT, implying T ≤ (c + B)/δ.
(60) where ⌊x⌋ denotes the largest integer not exceeding x.
Next we shall apply induction to show that

τ_n ≤ Mn^{−θ} holds for all n ∈ N.    (61)

By Lemma 1 we have τ_n ≤ c + B, and thus τ_k ≤ Mk^{−θ} for any k ≤ N_0. Suppose that (61) holds for any k ≤ n − 1. Then for k = n, the bound follows by the induction hypothesis, which completes the induction step. For the second part, applying Jensen's inequality we derive the desired inequality. For the last part, the claim follows from part (i).

Proof of (iii). By (ii), the upper bound follows. By (33), we get τ_n ≥ u_n τ_0 + v_n ≥ min{1, c}(u_n + v_n), implying the lower bound.
For the last part, since u_k n^θ = u_k k^θ (n/k)^θ, and (n/k)^θ is bounded below for k ≥ (1 − Δ)n, we obtain the desired bound.