Statistical inference for non-stationary GARCH(p, q) models

This paper studies the quasi-maximum likelihood estimator (QMLE) of non-stationary GARCH(p, q) models. By expressing GARCH models in matrix form, the log-likelihood function is written in terms of a product of random matrices. Oseledec's multiplicative ergodic theorem is then used to establish the asymptotic properties of the log-likelihood function and thereby to show the weak consistency and asymptotic normality of the QMLE for non-stationary GARCH(p, q) models.


Introduction
The quasi-maximum likelihood estimator (QMLE) is commonly used in practice to estimate the parameters of ARCH-type models. The literature on statistical inference for GARCH(p, q) models is considerable. Recent studies on the properties of the QMLE can be found in Berkes et al. [3], Berkes and Horváth [2], Straumann [15], and Robinson and Zaffaroni [13], among others. These papers establish the strong consistency and asymptotic normality of the QMLE by assuming that, within a parameter space Θ, the GARCH(p, q) equation admits a strictly stationary solution for all θ ∈ Θ. In contrast, Jensen and Rahbek ([7], [8]) relax the stationarity conditions and establish the asymptotic behavior of non-stationary GARCH(1, 1) and ARCH(1) models; the QMLEs of both stationary and non-stationary GARCH(1, 1) models are asymptotically normal and consistent in certain senses. The purpose of this paper is to extend Jensen and Rahbek's results to the general non-stationary GARCH(p, q) setting.
In this paper, we are interested in the case ρ > 0. In this situation, the GARCH model does not admit any strictly stationary solution. However, a stochastic process {X_t}_{0≤t≤n} can nevertheless be defined by specifying the initial probability distribution of the vector Y_{−1}.

Main results
Before stating the assumptions, an alternative vector-matrix representation of the GARCH model is introduced. Let Y′_t = (Y_t, 1)^T and let A′_t denote the corresponding augmented coefficient matrix, so that the GARCH model can be rewritten as Y′_t = A′_t Y′_{t−1}. We assume the following conditions throughout this paper.
A1: Eε⁴_t < ∞, and E|ε_t|^{−2δ} < ∞ for some δ > 0.
A2: The top Lyapunov exponent of A for the data generating process is strictly positive.
A3: The top Lyapunov exponent of A′ for the data generating process is strictly positive and simple (cf. Theorem 4.1 for the definition of simplicity).
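To make the vector-matrix form concrete, the sketch below builds a companion-style pair (A_t, b) with Y_t = A_t Y_{t−1} + b for GARCH(p, q), in the spirit of Bougerol and Picard [4], and verifies it against the scalar recursion. The block ordering of the state vector is an assumption for illustration; the paper's exact A_t may order the blocks differently.

```python
import numpy as np

def companion(eps2, alpha, beta, omega):
    """One companion-style (A_t, b) with state
    Y_t = (s2[t+1], ..., s2[t-p+2], x2[t], ..., x2[t-q+2]),
    so that Y_t = A_t Y_{t-1} + b reproduces the GARCH(p, q) recursion.
    The block ordering is an assumption, not necessarily the paper's."""
    q, p = len(alpha), len(beta)
    d = p + q - 1
    A = np.zeros((d, d))
    A[0, :p] = beta             # beta_1 ... beta_p act on the sigma^2 block
    A[0, 0] += alpha[0] * eps2  # alpha_1 * eps_t^2 * s2[t] = alpha_1 * x2[t]
    A[0, p:] = alpha[1:]        # alpha_2 ... alpha_q act on the X^2 block
    for i in range(1, p):       # shift the sigma^2 block down
        A[i, i - 1] = 1.0
    if q >= 2:
        A[p, 0] = eps2          # x2[t] = eps_t^2 * s2[t]
        for j in range(1, q - 1):  # shift the X^2 block down
            A[p + j, p + j - 1] = 1.0
    b = np.zeros(d)
    b[0] = omega
    return A, b

# check against the scalar GARCH(2, 2) recursion
rng = np.random.default_rng(0)
omega, alpha, beta = 0.5, np.array([0.3, 0.1]), np.array([0.2, 0.1])
n = 60
eps2 = rng.standard_normal(n) ** 2
s2 = np.ones(n)
x2 = eps2 * s2                  # x2[0], x2[1] consistent with s2[0] = s2[1] = 1
for t in range(2, n):
    s2[t] = omega + alpha @ x2[t-2:t][::-1] + beta @ s2[t-2:t][::-1]
    x2[t] = eps2[t] * s2[t]

Y = np.array([s2[2], s2[1], x2[1]])   # Y_1 = (s2[2], s2[1], x2[1])
for t in range(2, n - 1):
    A, b = companion(eps2[t], alpha, beta, omega)
    Y = A @ Y + b                     # Y_t = (s2[t+1], s2[t], x2[t])
    assert np.isclose(Y[0], s2[t + 1])
```

The same construction, with the constant b absorbed into an augmented matrix acting on (Y_t, 1), gives the primed system Y′_t = A′_t Y′_{t−1}.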
Remark 2.1. Details about the concepts of Lyapunov exponents related to the discussion in this paper are given in Appendix A.1. According to Oseledec's multiplicative ergodic theorem, p + q − 1 real numbers (−∞ is allowed), called Lyapunov exponents, can be associated with a sequence of random matrices A_1, A_2, . . . to characterize the asymptotic behavior of the product A_n A_{n−1} · · · A_1. These Lyapunov exponents may have multiplicities greater than one; a Lyapunov exponent is called simple if its multiplicity is one. The greatest Lyapunov exponent is the top Lyapunov exponent defined in Section 1.
When the top Lyapunov exponent of A is strictly positive, the volatility diverges to infinity and the process exhibits explosive behavior.
Lemma 2.1. Let ρ be the top Lyapunov exponent of A and suppose that ρ > 0. Then lim_{t→∞} σ²_t = +∞ a.s.
The main results on consistency and asymptotic normality of the QMLE are given as follows.
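In the GARCH(1, 1) special case the top Lyapunov exponent reduces to ρ = E log(α ε²_t + β), and Lemma 2.1 can be checked by simulation: with ρ > 0 the volatility explodes at the geometric rate e^{ρt}. The parameter values below are illustrative, and the recursion is tracked on the log scale to avoid overflow.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, omega = 5.0, 0.5, 1.0    # illustrative values with rho > 0

# Monte Carlo estimate of rho = E log(alpha * eps^2 + beta)
eps2 = rng.standard_normal(200_000) ** 2
rho_hat = np.mean(np.log(alpha * eps2 + beta))

# simulate sigma^2_t = omega + (alpha * eps^2_{t-1} + beta) * sigma^2_{t-1}
T = 2000
log_s2 = 0.0                          # track log sigma^2 to avoid overflow
for e2 in rng.standard_normal(T) ** 2:
    # log(omega + (alpha*e2 + beta)*s2) = log s2 + log(alpha*e2 + beta + omega/s2)
    log_s2 = log_s2 + np.log(alpha * e2 + beta + omega * np.exp(-log_s2))
growth = log_s2 / T                   # close to rho when rho > 0

print(rho_hat, growth)
```

Both estimates of the growth rate agree, and σ²_t diverges, consistent with Lemma 2.1.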
Theorem 2.1. Let H_0 and ω be arbitrarily chosen fixed values, where all the elements in H_0 are non-negative but not all equal to zero. Then, there exist a positive-definite matrix Ω and a fixed open neighborhood M(θ_0) of θ_0, independent of n, such that (I) with probability tending to one as n → ∞, the likelihood function L_n(θ) is uniquely minimized in M(θ_0).
Remark 2.3. Theorem 2.1 guarantees the existence of a consistent local QMLE in an open neighborhood M(θ_0) of θ_0. As θ_0 is unknown, in practice we search for the stationary points of L_n(θ) instead. Denote the set of such stationary points by T. Then θ̂_n constructed in Theorem 2.1 belongs to T. That is, if n is sufficiently large, T contains a vector that is close enough to the true parameter θ_0. If T is a singleton, then its only element must equal θ̂_n = arg min_{M(θ_0)} L_n(θ). Although the uniqueness of the stationary point is not guaranteed, simulations suggest that Gauss–Newton type methods usually give a solution close to the true value θ_0 in most practical situations.
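A minimal numerical sketch of Theorem 2.1 for the GARCH(1, 1) special case: data are generated in the explosive regime, ω is fixed at an arbitrary (wrong) value as the theorem permits, and L_n(θ) is minimized over (β, α). The parameter values and the crude grid-refinement search are illustrative assumptions, standing in for the Gauss–Newton methods mentioned in Remark 2.3.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha0, beta0, omega0 = 3.0, 0.3, 0.5   # true values; E log(alpha0*eps^2 + beta0) > 0
n = 500

# simulate a non-stationary GARCH(1, 1) path
eps = rng.standard_normal(n)
s2 = np.empty(n)
x = np.empty(n)
s2[0], x[0] = 1.0, eps[0]
for t in range(1, n):
    s2[t] = omega0 + alpha0 * x[t - 1] ** 2 + beta0 * s2[t - 1]
    x[t] = np.sqrt(s2[t]) * eps[t]

def qlik(beta, alpha, x, omega_fixed=1.0):
    """L_n(theta) with omega fixed at an arbitrary value, as Theorem 2.1 allows."""
    h, total = omega_fixed, 0.0
    for t in range(1, len(x)):
        h = omega_fixed + alpha * x[t - 1] ** 2 + beta * h
        total += np.log(h) + x[t] ** 2 / h
    return total / len(x)

# crude local search: repeatedly refine a grid around the current best point
beta_hat, alpha_hat, width = 0.5, 2.0, 2.0
for _ in range(6):
    betas = np.clip(np.linspace(beta_hat - width / 4, beta_hat + width / 4, 15),
                    1e-6, 0.999)
    alphas = np.clip(np.linspace(alpha_hat - width, alpha_hat + width, 15),
                     1e-6, None)
    vals = [(qlik(b, a, x), b, a) for b in betas for a in alphas]
    _, beta_hat, alpha_hat = min(vals)
    width /= 3.0

print(beta_hat, alpha_hat)
```

Even with ω misspecified, the minimizer of L_n over (β, α) lands near (β_0, α_0), illustrating the local consistency asserted by the theorem.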

Proofs
This section provides the proofs of Lemma 2.1 and Theorem 2.1. Lemma 2.1 is proved in subsection 3.1. An outline of the proof of Theorem 2.1 is given in subsection 3.2, while the technical details are given in the subsequent sections and the appendix. The following conventions are used throughout the paper.
Convention 3.2. The notation e_i refers to a unit vector with the i-th component equal to one and all other components equal to zero. When there is no confusion, the dimension of e_i is not specified.
Convention 3.5. Two matrix norms are used throughout this section: ‖·‖_1, the largest row sum of the matrix, and the operator norm ‖·‖, i.e., ‖M‖ = sup_{|x|=1} |Mx|.

Convention 3.6.
Let Ω be the sample space; it can be chosen as the set that contains all sample paths of {ε_t}_{t∈Z}. Let L be the shift operator on Ω.

Proof of Lemma 2.1
Applying the recursive relationship Y_t = A_t Y_{t−1} + b repeatedly, we obtain a lower bound for σ²_t that holds for sufficiently large t. Consequently, σ²_t diverges almost surely.

Outline of the Proofs of Theorem 2.1
Note that the process is not stationary; therefore, the ergodic theorem and the central limit theorem are not directly applicable to establish the asymptotics of L_n(θ). In the GARCH(1, 1) case, Jensen and Rahbek [7] show that the asymptotic properties of θ̂_n can be obtained without using the convergence and asymptotic normality of L_n(θ), provided the derivatives of ℓ_t(θ) up to order three can be approximated by stationary processes. To generalize the results from GARCH(1, 1) to GARCH(p, q), the most difficult part of the proof is to show that the quantity h_{t−j}(θ_0)/h_t(θ_0), which appears in the derivatives of ℓ_t(θ), has the following two properties: (1) for any fixed positive integer j, h_{t−j}(θ_0)/h_t(θ_0) has a limiting distribution as t → ∞.
Provided that (1) and (2) hold, the remainder of the proof is analogous to that in Jensen and Rahbek [7]. The proofs of (1) and (2) are less trivial than in the GARCH(1, 1) case. For property (1), take j = 1 as an example. In the GARCH(1, 1) case, h_{t−1}(θ_0)/h_t(θ_0) can be approximated by 1/(α_0 ε²_{t−1} + β_0), which is stationary. In the GARCH(p, q) case, the corresponding quantity involves products of random matrices, which complicates the matter. To establish the convergence of h_{t−j}(θ_0)/h_t(θ_0), techniques for products of random matrices are indispensable. Property (1) is established in the following lemma, which is a consequence of Proposition 4.1 and Lemma 4.1 given in Section 4.
Lemma 3.1. There exists a stationary, ergodic, and adapted stochastic vector-valued process {η_t} yielding the approximation in property (1).
N.H. Chan and C.T. Ng / Non-stationary GARCH
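For GARCH(1, 1) the stationary approximation can be checked directly: h_t = ω + (α ε²_{t−1} + β) h_{t−1}, so h_{t−1}/h_t differs from 1/(α ε²_{t−1} + β) by a term of order ω/h_t, which vanishes once the volatility explodes. A quick numerical check with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, omega = 3.0, 0.3, 0.5   # explosive regime: E log(alpha*eps^2 + beta) > 0
T = 300

eps2 = rng.standard_normal(T) ** 2
h = np.empty(T)
h[0] = 1.0
for t in range(1, T):
    h[t] = omega + (alpha * eps2[t - 1] + beta) * h[t - 1]

t = T - 1
ratio = h[t - 1] / h[t]                            # h_{t-1}/h_t
stationary_approx = 1.0 / (alpha * eps2[t - 1] + beta)
print(abs(ratio - stationary_approx))              # tiny once h_t has exploded
```

The discrepancy is of order ω/(h_{t−1} β²), so it is astronomically small once h_t is large; no comparably simple stationary expression is available for general (p, q), which is why the matrix-product machinery of Section 4 is needed.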

To see this, let F ≫ 0 be a (p + q − 1)-dimensional vector. Since all elements in A_{t−j} are non-negative, a componentwise inequality holds, and applying this step repeatedly yields a product bound. If the first component of F is one, the resulting inequality is applicable with F = Y_{t−j−1}/h_{t−j}. We have the following lemma, whose proof is given in Section 5.

Product of random matrices
This section is devoted to establishing some properties of the product of random matrices P′_t = A′_t A′_{t−1} · · · A′_1 that was used in Section 3 to establish Lemma 3.1. Recall that the GARCH model can be written in vector-matrix notation (see Section 2); the product P′_t arises from applying the recursive relationship repeatedly. Oseledec's multiplicative ergodic theorem and the concept of Lyapunov exponents are essential tools for our purpose. According to Oseledec's multiplicative ergodic theorem, p + q − 1 Lyapunov exponents are associated with P′_t to characterize its asymptotic behavior. The corresponding limit results are given in subsections 4.2 and 4.3, respectively. Subsection 4.1 provides an introduction to Oseledec's multiplicative ergodic theorem, which is essential for understanding the material presented in subsections 4.2 and 4.3.
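The top Lyapunov exponent that Oseledec's theorem attaches to the product can be estimated numerically by iterating the product on a positive vector with renormalization (in the Furstenberg–Kesten spirit). The GARCH(2, 2) companion layout and parameter values below are illustrative assumptions; note that such A_t are singular when p, q ≥ 2, producing −∞ exponents, which is why only the top exponent is tracked here and why the paper passes to invertible matrices M_t for the full spectrum.

```python
import numpy as np

def A_matrix(eps2, alpha, beta):
    """Companion-style A_t for GARCH(2, 2); state (s2[t+1], s2[t], x2[t]).
    This layout is an assumption in the spirit of Bougerol and Picard [4]."""
    a1, a2 = alpha
    b1, b2 = beta
    return np.array([[b1 + a1 * eps2, b2,  a2],
                     [1.0,            0.0, 0.0],
                     [eps2,           0.0, 0.0]])

def top_lyapunov(alpha, beta, T=20_000, seed=4):
    """Estimate rho = lim (1/t) log |A_t ... A_1 x| by renormalized iteration."""
    rng = np.random.default_rng(seed)
    v = np.ones(3)
    acc = 0.0
    for _ in range(T):
        v = A_matrix(rng.standard_normal() ** 2, alpha, beta) @ v
        norm = np.linalg.norm(v)
        acc += np.log(norm)   # accumulate the growth rate
        v /= norm             # renormalize to avoid overflow
    return acc / T

rho_explosive = top_lyapunov((3.0, 0.1), (0.3, 0.1))    # coefficients sum > 1
rho_stationary = top_lyapunov((0.05, 0.02), (0.1, 0.05))  # coefficients sum < 1
print(rho_explosive, rho_stationary)
```

The sign of the estimate separates the strictly stationary regime (negative top exponent) from the explosive regime studied in this paper (positive top exponent).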

Oseledec's multiplicative ergodic theorem
Results related to Oseledec's multiplicative ergodic theorem are introduced in this subsection. References on this topic include Ledrappier [10] and the collections of Cohen et al. [5] and Arnold, Crauel and Eckmann [1]. Section 1.5 of Krengel [9] also provides a short introduction to some of the results. Oseledec's multiplicative ergodic theorem is stated in Theorem 4.1, in which Lyapunov exponents and their multiplicities are defined. Ledrappier's version of the multiplicative ergodic theorem and some related results are stated without proof in Theorem 4.2.
(II). For 1 ≤ k ≤ s, the random set is a subspace with dimension r_k + · · · + r_s.
(III). The subspaces can be arranged in ascending order. Let {M_t} be a stationary and ergodic stochastic process of invertible d × d matrices such that E log⁺‖M_0‖ < ∞ and E log⁺‖M_0^{−1}‖ < ∞; then the following properties hold.
1. For 1 ≤ k ≤ s and u ∈ W_k, we have the version of Le Jan [11] of Oseledec's theorem. In addition, ξ_k(ω) depends only on the values of M_t for −∞ < t ≤ 0, and hence the process ξ_k(L^t ω) is stationary, ergodic, and adapted.

Top Lyapunov exponent of
The purpose of this subsection is to prove Proposition 4.1 below.
Proof of Proposition 4.1. We prove this statement in two steps.
Step 1: Since P′_t and x are non-negative, it is enough to show that (1/t) log |P′_t e_1|, . . . , (1/t) log |P′_t e_{p+q−1}| converge to the same limit ρ′_1. Theorem 4.1 guarantees that (1/t) log |P′_t e_1| converges. To obtain the convergence of (1/t) log |P′_t e_2|, . . . , (1/t) log |P′_t e_{p+q−1}|, the following identities will be used; similar identities can be found in the proof of Theorem 1.3 in Bougerol and Picard [4]. The required result follows from the following fact: for all positive sequences a_n, b_n and positive constants k_1, k_2, if (1/n) log a_n → a and (1/n) log b_n → a, then (1/n) log(k_1 a_n + k_2 b_n) → a. The remaining convergence is a consequence of this fact and the following inequalities, where the second inequality is obtained by considering y such that sup_{|y|=1} |P′_t y| is attained; since the absolute values of all components of y must be smaller than one, this yields the required result.

Asymptotic behavior of
The purpose of this subsection is to establish the following lemma.
as t → ∞, then there exists a stationary, ergodic, and adapted stochastic R^{p+q+1}-valued process {η′_t}_{t∈Z} such that, as n → ∞,

Outline of the proof of Lemma 4.1: To prove Lemma 4.1, we construct the stochastic process η′_t from Ledrappier's version of the multiplicative ergodic theorem, stated in Theorem 4.2. This theorem applies to a stationary and ergodic sequence of invertible matrices. Here, we construct an invertible matrix M_t and a linear transform E_t : R^{max(p,q)+1} → R^{p+q} that links A′_t and M_t. The proof is organized as follows. First, the invertible random matrices M_t and the linear transform E_t are defined. Proposition 4.2 relates the matrix A′_t to M_t, which can be used to establish the identity (4.1). We show in Proposition 4.3 that A′_t and M_t share the same set of Lyapunov exponents, except for −∞, which appears for A′_t only. These facts allow us to establish the asymptotic behavior of A′_t from that of M_t. Finally, we show that the resulting process can serve as the required approximation.

Define M_t as follows. In order to apply Theorem 4.2, we need E log⁺‖M_0^{−1}‖ < ∞. The choice of the norm here is immaterial, as all matrix norms are equivalent; it is more convenient to work with the norm ‖·‖_1. For {M_t} chosen in this subsection, the condition holds because E log⁺|ε_0|^{−2} < ∞, which follows from E|ε_0|^{−2δ} < ∞ for some δ > 0.
Proof. Directly from the definition.
Proposition 4.3. The multiplicities of the Lyapunov exponents ρ′_s, . . . , ρ′_1 are the same for A′_t and M_t.
Proof. First, we show that −∞ is a Lyapunov exponent of A′_t. Let J_r(λ) be the standard Jordan block of order r with diagonal elements equal to λ. Simple algebraic manipulations show that a non-random full-rank matrix P with min(p, q) − 1 columns can be found so that −∞ is a Lyapunov exponent of A′_t with multiplicity at least min(p, q) − 1 (see Theorem 4.1).
By Theorem 4.1, we can find vector spaces V_k. Let V_P be the vector space spanned by the columns of P, and define a set of vector spaces accordingly. The linear transformation E_0 does not change the dimension of a vector space. In addition, for any 1 ≤ k ≤ s, no element of V_P, say η_P, can be written as a linear combination of a basis of E_0 V_k, say ξ_1, . . . , ξ_{r_k}. To see this, assume on the contrary that such a representation exists. By the invertibility of M, the vectors M_t · · · M_1 ξ_i are linearly independent, which leads to a contradiction. As a result, the dimension of E_0 V_k + V_P must be min(p, q) − 1 + r_k + · · · + r_s.

Now, it can be seen that the process constructed above fulfills the requirement, by Proposition 4.2 and the fact established above.

Miscellaneous results on matrices
This appendix presents two results on the matrices B and (Q_{t,j})_{11} introduced in Section 3. These two results are used frequently. In the following, (B^j)_{ik} denotes the (i, k)-th element of the j-th power of B.
Proof. (I) The first conclusion is trivial.
(II) The characteristic equation of B is given by (5.1). Let λ_1, . . . , λ_p be the eigenvalues, and take R = |λ_1| + δ. By Cauchy's estimate (see Theorem 10.26 in Rudin [14]), an upper bound for (B^j)_{ik} is obtained. (III) Under the condition that β_1, . . . , β_p > 0, the characteristic equation (5.1) has one and only one positive real root, which is also the root with the largest modulus.
Consider the Jordan decomposition B = PJP^{−1}. Normalizing the first component, the eigenvector corresponding to λ_1 and the corresponding row in P^{−1} (with the first component normalized) can be written down explicitly. Note that all elements in this row vector must be greater than zero: if any one of them were negative, say the second component, then the characteristic equation would no longer be satisfied by λ_1. Since λ_1 appears only once in the Jordan matrix J and the coefficient of λ^j_1 for e^T_1 B^j in the decomposition B^j = PJ^jP^{−1} is nonzero, we have (B^j)_{1i} = O(λ^j_1).
Remark 5.1. A necessary and sufficient condition for B^j to decay exponentially is that ρ(B) < 1. This condition is equivalent to the condition that all roots of 1 − β(z) = 0 lie outside the unit disc; the latter formulation will be used often.
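The equivalence in Remark 5.1 is easy to verify numerically: the nonzero eigenvalues of the companion matrix B are the reciprocals of the roots of 1 − β(z) = 0, so ρ(B) < 1 exactly when all roots lie outside the unit disc. The same sketch also illustrates (B^j)_{11} = O(λ^j_1); the coefficients below are illustrative.

```python
import numpy as np

beta = [0.5, 0.3]                      # illustrative beta_1, beta_2
p = len(beta)

# companion matrix B with first row (beta_1, ..., beta_p)
B = np.zeros((p, p))
B[0, :] = beta
for i in range(1, p):
    B[i, i - 1] = 1.0

spectral_radius = max(abs(np.linalg.eigvals(B)))

# roots of 1 - beta_1 z - ... - beta_p z^p = 0 (np.roots: highest degree first)
roots = np.roots([-b for b in beta[::-1]] + [1.0])
min_root_modulus = min(abs(roots))

# (B^j)_{11} = O(lambda_1^j): consecutive ratios approach the top eigenvalue
B30 = np.linalg.matrix_power(B, 30)[0, 0]
B31 = np.linalg.matrix_power(B, 31)[0, 0]

print(spectral_radius, min_root_modulus, B31 / B30)
```

Here ρ(B) ≈ 0.852 and the smallest root modulus ≈ 1.174, reciprocals of each other, so ρ(B) < 1 coincides with all roots lying outside the unit disc.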
Here ρ depends on B and on the choices of δ and the δ_i's. We turn to the multinomial expansion of (Q_{t,ℓ})_{11}.

Next, we show that (Q_{t,ℓ})_{11} decays exponentially almost surely. Note that (B^i)_{11}/(B^{i−1})_{11} → λ; then for an arbitrarily chosen δ′ > 0, the bounds below hold. For non-degenerate ε², f(0) < λ^{−r} and g(0) > log λ. By the right-continuity of both f(x) and g(x) at x = 0, we can choose δ′ > 0 such that K_1 = f(δ′) < λ^{−r} and also K_2 = g(δ′) > log λ. Almost sure exponential decay of (Q_{t,ℓ})_{11} follows.
Appendix A: A detailed proof of Theorem 2.1
We closely follow the method of Jensen and Rahbek [7] to complete the proof of Theorem 2.1. It is much easier to establish Theorem 2.1 under the additional assumptions that ω = ω_0 and H_0 = (σ²_0, σ²_{−1}, . . . , σ²_{−p+1}), in which case h_t(θ_0) = σ²_t. Under these assumptions, we have the following Theorem A.1, which is proved in subsections A.1 to A.3. In subsection A.4, we prove Theorem 2.1 in its full generality. Some technical lemmas are given in subsection A.5. In what follows, we give an outline of the proof of Theorem A.1. It suffices to construct positive-definite matrices Ω_1, Ω_2 and a neighborhood N(θ_0) of θ_0 such that the following conditions C1-C3 hold. Then, Lemma 1 of Jensen and Rahbek [7] yields our results.
and (C3) the third-order derivatives are uniformly bounded by n-dependent random variables C_n, where C_n →_p c for some 0 < c < ∞.
The mean ergodic theorem and the martingale-array central limit theorem (see Pollard [12]) can be used to establish C1-C3, provided we can construct stationary and ergodic stochastic processes which approximate ℓ_t(θ_0) and its derivatives up to second order. Similarly, to establish C3, we need stationary and ergodic stochastic processes v^{i1,i2,i3}_t with finite expectation. The derivatives of ℓ_t(θ) up to order three are given below (see equations 8-10 in Jensen and Rahbek [7]).
Below, we consider the terms that appear in the above equations individually; some useful identities are given.
First, when ω = ω_0 and H_0 = (σ²_0, σ²_{−1}, . . . , σ²_{−p+1}), we have h_t(θ_0) = σ²_t in particular. The quantities ∂_i h_t(θ), ∂_{i1 i2} h_t(θ), and ∂_{i1 i2 i3} h_t(θ) can be expressed in terms of h_{t−j}(θ) for j = 1, 2, 3, . . . Consider the corresponding recursive relationship in vector-matrix form. With (A.4), the recursive relationships for the derivatives of H_t up to order three can be obtained; for example, the first-order derivatives are obtained by differentiating (A.4). Applying these recursive relationships repeatedly yields the required expressions. This subsection is devoted to establishing the approximations to h^i_{1t}(θ_0) and h^{i1 i2}_{2t}(θ_0) by stationary and ergodic processes, which are then used in subsection A.2 to guarantee C1 and C2. Since we are only interested in θ = θ_0 when establishing C1 and C2, we drop the argument (θ_0) and write (α, β) instead of (α_0, β_0). Throughout this section, we assume that the conditions in Theorem A.1 hold.
Applying Y_t = A_t Y_{t−1} + b repeatedly, h_t can be written as a sum of two terms. Note that the first term dominates when h_t = σ²_t → +∞, which is guaranteed by Lemma 2.1. It is shown in Lemma 3.1 that there exists a stationary, ergodic, and adapted stochastic vector-valued process {η_t} such that the approximation holds as t → ∞. The approximations to h^i_{1t}(θ_0) and h^{i1 i2}_{2t}(θ_0) are denoted by u^i_{1t} and u^{i1 i2}_{2t}. For θ_i = β_µ, where µ = 1, . . . , p, and for θ_i = α_µ, where µ = 1, . . . , q, define the processes accordingly. The second-order derivatives h^{i1 i2}_{2t} are approximated similarly. When {θ_{i1}, θ_{i2}} = {β_1, β_1}, (Q_{t,j})_{11} and u^{i1}_{1,t−j} are independent. Using Minkowski's inequality and the preceding result on E(u^i_{1t})^p, we obtain the required moment bound.
Proof.
Step 1: First, we give upper and lower bounds for the differences u^{i1...ik}_{kt} − h^{i1...ik}_{kt} and show that these bounds converge to zero in L_p.
Here, we only consider the case {θ_i} = {β_1}. For any integer j, the summand converges almost surely to zero as t → ∞ according to Lemma 3.1, and it can be bounded as follows. The quantity max_{1≤k≤p+q−1} |1 − η_{t−1,k} Y_{t−1,k}/h_t| is bounded above by some random variable with finite moment according to Lemma A.16; in addition, it converges to zero almost surely as t → ∞ by Lemma 3.1. Therefore, the moments of this quantity converge to zero by the dominated convergence theorem. Also, a further term vanishes as a result of Proposition 5.1 and Lemma 3.2. The lower bound for u^i_{kt} − h^i_{kt} is obtained analogously. That the p-th moment of the upper bound converges to zero can be shown using Minkowski's inequality and the following fact: let a_t and b_t be two sequences; if a_t decays exponentially and b_t → 0, then the convolution sum Σ_j a_j b_{t−j} converges to zero. To see this, let n be an integer such that |b_j| < δ for j > n, where δ > 0 is an arbitrarily small real number, and suppose that |a_t| ≤ Kλ^t. Then, for t > n, the sum splits into a part bounded by a multiple of δ and a part bounded by a vanishing geometric tail. The last term converges to zero as t → ∞, hence the required result.
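The elementary fact invoked above — convolving an exponentially decaying sequence with a sequence tending to zero still gives a sequence tending to zero — can be sanity-checked numerically with illustrative sequences:

```python
import numpy as np

t_max = 400
a = 0.5 ** np.arange(t_max + 1)         # a_j decays exponentially
b = 1.0 / (np.arange(t_max + 1) + 1.0)  # b_t -> 0 slowly

# c_t = sum_{j=0}^{t} a_j * b_{t-j}
c = np.array([np.sum(a[:t + 1] * b[t::-1]) for t in range(t_max + 1)])
print(c[20], c[400])
```

The sum c_t shrinks with t even though b_t decays only polynomially, because the exponentially small weights a_j suppress the early (large) values of b.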
Next, we construct an upper bound for u^{β1}_{1t} − h^{β1}_{1t}. Note that by Proposition 5.1 and Lemma 3.2, the relevant sum converges. Suppose that n is an integer so that the tail of the sum is sufficiently small, and assume that t > n. Then we obtain the required result h^{β1}_{1t} − u^{β1}_{1t} →_{L_p} 0.
Step 3: That u^{i1 i2}_{2t} − h^{i1 i2}_{2t} →_{L_p} 0 can be shown in a similar manner as in Step 1, by means of Lemma A.1 and the recursive relationship (A.7).

A.2. Conditions C1 and C2
With the stationary and ergodic stochastic processes u^i_{1t} and u^{i1 i2}_{2t} constructed in the last subsection, conditions C1 and C2 are established in this subsection. Again, since we are only interested in θ = θ_0 when establishing C1 and C2, we drop the argument (θ_0) and write (α, β) instead of (α_0, β_0). Define Ω as below, where Ω_1 and Ω_2 in Lemma 2.1 are chosen to be E(1 − ε²)²Ω and Ω respectively. Lemma A.3 gives C1, Lemma A.4 gives C2, and Lemma A.5 establishes the positive-definiteness of Ω.
Proof. The convergence of the first-order derivative of the quasi log-likelihood function can be obtained by the martingale central limit theorem. Using Lemma A.2 and the mean ergodic theorem, the sum of conditional covariances converges. To show that the Lindeberg condition holds, we bound h^i_{1t} by a stationary and ergodic process.
What remains is similar to the arguments in Berkes et al. [3]. If there were a positive integer m such that ψ_m ≠ ψ*_m and ψ_i = ψ*_i for all 0 < i < m, then, since the first and second terms in the square bracket are independent, the distribution of ε²_{t−m} would be degenerate, which is impossible under our assumption. Thus we must have ψ_j = ψ*_j for all j = 1, 2, . . . Within the radius of convergence, by the assumption that α(z) and 1 − β(z) are co-prime, α(z) = α*(z) and β(z) = β*(z). That is, λ = 0.

A.3. Condition C3
This subsection is devoted to bounding the quantities h^{i1...ik}_{kt}(θ) uniformly over a neighborhood of θ_0. The results are given in Lemmas A.8 to A.11. It should be noted that the conditions ω = ω_0 and H_0 = (σ²_0, σ²_{−1}, . . . , σ²_{−p+1}) are never used in this subsection, so the results given here are applicable to proving Theorem 2.1 as well.
The neighborhood N(θ_0) is chosen as a rectangular region θ_L ≪ θ ≪ θ_U such that all components of θ_L are strictly positive. The notations θ_L = (β_L, α_L) and θ_U = (β_U, α_U) are used. Using Proposition 5.1 and Lemma 3.2, together with the continuity of E(Q_{t,j})_{11}(β, α_0) with respect to β, if β_L is chosen close enough to β_U, then the relevant sum converges almost surely and has a finite expectation.
For this selected neighborhood N (θ 0 ), we have the following two useful lemmas.
for some positive constants κ_U and κ_L.
Lemma A.8. There exists a stationary and ergodic process {v_{0t}} such that the relevant quantity is bounded by v_{0t}, and the r-th moment Ev^r_{0t} < ∞ for r = 1, 2, 3, 4.
Proof. Let θ ∈ Θ and partition the vector θ into β and α. We establish a bound for the right-hand side. By Lemma A.6, the bound involves κ_U defined in (A.9), which is non-stochastic and does not depend on θ. Consider the quantity h_t(β_0, α_0)/h_t(β, α_0).

By (A.13) of Lemma A.13,
Together with Lemma A.14 and (A.12) in Lemma A.13, the result follows. The result for higher moments can be obtained by applying Minkowski's inequality to (A.11).
Lemma A.9. There exist a neighborhood N(θ_0) and a stationary and ergodic process {v^i_{1t}} such that the supremum over N(θ_0) is bounded by v^i_{1t}, and the r-th moment Ev^r_{1t} < ∞ for r = 1, 2, 3, 4.
Proof. Suppose that there exist θ_L and θ_U such that θ_L ≪ θ ≪ θ_U for all θ ∈ N(θ_0). We consider the derivatives with respect to θ_i for θ_i = α_µ and θ_i = β_µ respectively, as follows.
Thus, the derivative is bounded by a quantity which is almost surely convergent with finite expectation.
Proof. From equation (A.7) for the second derivatives, we only need to consider the leading term. Consider the case θ_{i1} = β_{µ1}; the required bound follows. A result similar to Lemma A.10 is stated below without proof.
The derivatives up to order three that appear in the above relations are given in Equations (A.1)-(A.3); the fourth-order derivatives can be obtained by differentiating Equation (A.3). By Lemma A.15, the convergence results hold if the following two conditions are satisfied.
1. X²_t/h_t(θ, ϕ) and the quantities h^{i1...ik}_{kt} that involve differentiation with respect to (α, β) only are bounded by stationary and ergodic processes with finite unconditional moments.
2. The quantities h^{i1...ik}_{kt} that involve differentiation with respect to ϕ decay almost surely at the rate O(t^k µ^t) for some non-negative integer k and 0 < µ < 1.
The first condition is established in Lemmas A.8 to A.11. We now show that the second condition holds. By Equation (A.5), when θ_{i0} = ω, the derivative decays at the rate O(λ^t), where λ is the eigenvalue of B with the largest modulus. Similarly, when θ_{i0} = h_{−µ+1}, we have ∂_{i0} H_t(θ, ϕ) = B^{t−j} e_µ = O(λ^t).
Note that the quantity splits into two terms. A bound for the first term on the right-hand side is given in Lemma A.8. For the second term, according to Proposition 4.1, for all 0 < δ < ρ, we have h_t(θ_0, ϕ) ≥ O(e^{(ρ−δ)t}).
If δ is chosen so that O(e^{(ρ−δ)t}) > O((Q_{t,t})_{11}(θ_0)), then the ratio decays at the rate µ^t for some 0 < µ < 1. Using similar arguments, it is not difficult to show that, in general, the decay rate is O(t^{k′} µ^t) for some non-negative integer k′ ≤ k and 0 < µ < 1.
In particular, the first element is given explicitly.
Proof. The recursive relationships for H_t(β_1, α) and H_t(β_2, α) are given as follows.