Asymptotic variance of functionals of discrete-time Markov chains via the Drazin inverse.

We consider a $\psi$-irreducible, discrete-time Markov chain on a general state space with transition kernel $P$. Under suitable conditions on the chain, kernels can be treated as bounded linear operators between spaces of functions or measures and the Drazin inverse of the kernel operator $I - P$ exists. The Drazin inverse provides a unifying framework for objects governing the chain. This framework is applied to derive a computational technique for the asymptotic variance in the central limit theorems of univariate and higher-order partial sums. Higher-order partial sums are treated as univariate sums on a `sliding-window' chain. Our results are demonstrated on a simple AR(1) model and suggest a potential for computational simplification.


Introduction
Assume {Φ_i} is a ψ-irreducible, discrete-time Markov chain with transition kernel P evolving on a Polish state space X. Denote by B(X) the Borel sets associated with X. A "univariate partial sum" will refer to $S_n(g) = \sum_{i=1}^n g(\Phi_i)$ for a function g : X → R. Define the "sliding-window chain" as $\{\tilde\Phi_i = (\Phi_i, \ldots, \Phi_{i+p-1})\}$ for some integer p. A "high-order partial sum" is $\tilde S_n(\tilde g) = \sum_{i=1}^n \tilde g(\Phi_i, \ldots, \Phi_{i+p-1})$ for some function g̃ : X^p → R. We consider the role of linear operators in formulating the limiting behavior of the chain. Of special interest are the "fundamental kernel" Z and the "Drazin inverse" (or "group inverse") Q^# of the kernel Q = I − P. Our study is somewhat unconventional in that it focuses on Q^# rather than Z, which has traditionally been used in investigations of a Markov chain's limiting behavior. The framework centered on Q^# is particularly tidy and convenient, and can simplify computation. As an example, we show Q^# yields an easy correspondence between the objects governing sliding-window chains and their analogues on the original chain.
A considerable literature is available on central limit theorems for univariate partial sums, with a variety of formulas for the asymptotic variance of S_n(g). The sliding-window chain inherits the stability properties of {Φ} and admits a viewpoint in which S̃_n(g̃) is treated as a univariate partial sum, $\sum_{i=1}^n \tilde g(\tilde\Phi_i)$. Connections between the limiting properties of S_n(g) and S̃_n(g̃) are made by describing the probability laws governing {Φ̃} in terms of those of {Φ}. Our investigation builds upon the central limit theorem derived in Meyn and Tweedie (1993, Ch. 17) and related ideas in Glynn and Meyn (1996). Our main reference on the Drazin inverse of a linear operator is Koliha (1996a), but Campbell and Meyer's (1979) investigations in finite-dimensional state spaces also provide inspiration. References for Markov-chain central limit theorems abound, including Kemeny and Snell (1960), Cogburn (1972), Nummelin (1984), Kipnis and Varadhan (1986), Tóth (1986), Geyer (1992), Chan and Geyer (1994), Tierney (1994), Robert (1995), Robert and Casella (1999, Ch. 4), Roberts and Tweedie (1996), Nummelin (2002), Hobert et al. (2002), Roberts and Rosenthal (2004), and Jones (2004). The main results are in Section 2. Markov chain stability conditions and solutions to the "Poisson equation" are discussed in Section 2.1; these establish the fundamental kernel and Drazin inverse as bounded linear operators on spaces of functions and signed measures. Section 2.2 expresses representations for the asymptotic variance associated with Meyn and Tweedie's (1993) central limit theorem in terms of the Drazin inverse. Extensions of the Drazin inverse methodology to the sliding-window chain are in Sections 2.3 and 2.4. Demonstrations on linear autoregressive processes are in Section 3, and conclusions are drawn in Section 4.

Results
Let M denote the space of bounded signed measures on (X, B(X)). Let ν ∈ M, g : X → R a measurable function, A and B kernels, x ∈ X, and E ∈ B(X). We have the following notation: νA(E) := ∫ A(x, E) ν(dx); Ag(x) := ∫ g(y) A(x, dy); BA(x, E) := ∫ A(y, E) B(x, dy); νg := ∫ g(x) ν(dx); I(x, E) := 1 if x ∈ E and 0 otherwise; I_g(x, E) := g(x)I(x, E). Define the n-step transition probability kernel P^n of {Φ} as the n-fold product of P with itself.
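On a finite state space these kernel operations reduce to matrix algebra, which we use for quick numerical illustrations below. A minimal sketch — the three-state chain, function g, and measure ν are toy objects of our own, not taken from the text:

```python
import numpy as np

# Illustrative 3-state chain, function g, and measure nu (our own toy objects).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
g  = np.array([1.0, -2.0, 0.5])   # g : X -> R, one value per state
nu = np.array([0.2, 0.5, 0.3])    # a measure, one mass per state

nuP = nu @ P        # the measure nuP:  nuP(E) = int P(x, E) nu(dx)
Pg  = P @ g         # the function Pg:  Pg(x) = int g(y) P(x, dy)
P2  = P @ P         # kernel product: the 2-step kernel P^2
nug = nu @ g        # the scalar nu(g) = int g dnu
I_g = np.diag(g)    # the kernel I_g(x, E) = g(x) I(x, E)

# Associativity ties the operations together: nu(Pg) = (nuP)g.
assert np.isclose(nu @ Pg, nuP @ g)
# P^2 is again a transition kernel: rows sum to one.
assert np.allclose(P2.sum(axis=1), 1.0)
```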

The Poisson equation and linear operator solutions
Define the kernel $R = \sum_{i=0}^{\infty} q(i)P^i$, where q is a probability measure on the nonnegative integers. A set C is petite if it holds that R(x, E) ≥ ν(E) for every x ∈ C, E ∈ B(X), some q as above, and some nontrivial measure ν on (X, B(X)). A rather weak stability condition for {Φ_i} which yields a central limit theorem is the drift condition

(1) PV(x) ≤ V(x) − f(x) + b·1_C(x), x ∈ X,

where b is a finite constant, f : X → [1, ∞), V : X → [1, ∞), and C ∈ B(X) is a petite set. Such a chain is termed f-regular and has a stationary distribution π with πf < ∞ (see, e.g., Glynn and Meyn, Thm. 2.2). Define the kernel Π(x, E) := π(E) and note it satisfies ΠP = PΠ = Π. For a functional g : X → R, the limiting properties of S_n(g) can be understood through a solution ĝ : X → R to the "Poisson equation"

(2) ĝ(x) − Pĝ(x) = g(x) − πg.

In a similar spirit, we formalize a framework for operators A that send g to solutions ĝ = Ag of the Poisson equation (2). Following convention, for a function h : X → [1, ∞), let L^∞_h denote the space of measurable functions g : X → R with sup_x |g(x)|/h(x) < ∞, and let M_h denote the space of signed measures ν ∈ M with ∫ h d|ν| < ∞.

Proof. Suppose the kernel A is such that ĝ = Ag satisfies the Poisson equation (2) for g ∈ L^∞_f. Choose g_1 ∈ L^∞_f and ν ∈ M_{V+1} arbitrarily and set g_2 = Ag_1. Then Proposition 1 implies g_2 ∈ L^∞_{V+1}, in turn implying |νg_2| < ∞. Since νg_2 = {νA}g_1, then νA ∈ M_f. Thus A is a bounded linear operator from M_{V+1} to M_f, and the conclusion regarding Q^#, Z follows. Since πV < ∞, clearly π ∈ M_{V+1}. Since (1) holds, this implies πf < ∞ and we have |νΠg_1| < ∞, implying νΠ ∈ M_f. Thus Π is a bounded linear operator from M_{V+1} to M_f.
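On a finite state space, a solution of (2) can be computed with a single linear solve through the kernel {I − P + Π}^{-1} used later in the text. A sketch with an illustrative three-state chain of our own:

```python
import numpy as np

# Illustrative 3-state chain (our own toy example).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()                 # stationary distribution: pi P = pi
Pi = np.outer(np.ones(3), pi)      # the kernel Pi(x, E) = pi(E)

g = np.array([1.0, -2.0, 0.5])
# Solve (I - P + Pi) ghat = g - pi(g); the result satisfies the Poisson equation (2).
ghat = np.linalg.solve(np.eye(3) - P + Pi, g - pi @ g)

assert np.allclose(ghat - P @ ghat, g - pi @ g)   # ghat - P ghat = g - pi(g)
assert abs(pi @ ghat) < 1e-10                     # this particular solution is pi-centered
```

Solutions of (2) are unique only up to an additive constant; the solve above picks out the π-centered one.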
For a kernel A to be the Drazin inverse of the kernel Q = I − P it must satisfy the algebraic criteria

(3) QA = AQ, QAQ = Q, AQA = A,

in which case it is unique. The next proposition establishes that Q^# is the Drazin inverse of Q and delineates useful relationships among Z, Q^#, and Π.
(i) Q^# satisfies the criteria (3) and is therefore the unique Drazin inverse of Q.

Proof. Note that when (1) is true and πV < ∞, the kernels Π, Q^# and Z exist and are well-defined due to Propositions 1 and 2.
To prove statement (i), it is straightforward to use the definitions of Q and Q^# to show (3). The uniqueness of Q^# with respect to the criteria (3) follows from Koliha (1996a, Lem. 2.4).
Statement (iii) now follows from (ii), recalling Q = I − P. Statement (iv) also follows immediately from (ii), since Z = {I − P + Π}^{-1}. To prove the general case, apply this in the induction step.

The results to this point, and those to follow, require the drift condition (1) as a sufficient condition. We pause for a remark on the necessity of (1). The existence of the operator Q^# follows from f-regularity with π(V) < ∞, since for a ψ-irreducible chain these imply $\sum_{i \ge 0} |P^i g(x) - \pi g| < \infty$ for any initial x and any function g with |g| ≤ f (Meyn and Tweedie, 1993, Thm. 14.0.1). This is not equivalent to f-ergodicity, which merely requires ‖P^n − π‖_f → 0. Of course, if a chain is not f-ergodic there is no hope of Q^# existing, so an investigation into the necessity of (1) must involve f-ergodic chains. A ψ-irreducible chain is f-regular if and only if it satisfies (1) (Meyn and Tweedie, 1993, Thm. 14.2.6), and, while an aperiodic f-regular chain is f-ergodic, aperiodic f-ergodic chains need not be f-regular (Meyn and Tweedie, 1993, Thm. 14.3.3). However, an f-ergodic chain restricted to a full (a set A with ψ(A) = 1) absorbing set is f-regular. Thus, when the maximal irreducibility measure ψ is Lebesgue measure, an aperiodic f-ergodic chain is f-regular almost everywhere. In this case, which we suspect will contain most real applications, the drift condition (1) is 'almost' necessary as well as sufficient for the existence of Q^# when π(V) < ∞. The question of the necessity of (1) in establishing the existence of the operator Z is resolved similarly, the difference being Glynn and Meyn's (1996) characterization of f-regularity through bounds of the form $E_x[\sum_{i=0}^{\tau_B - 1} f(\Phi_i)] \le V(x) + c(B)$, with V as above, τ_B being the hitting time for a set B ∈ B(X), and c(·) a set function with c(B) < ∞.
This is slightly stronger than the characterization, equivalent to (1), of a ψ-irreducible chain being f-regular when it admits a countable covering of the state space X by regular sets. The special case of V-uniform ergodicity occurs when the bounding function V in (1) is a multiple of f. The operator norm $\|A\|_V = \sup_{x \in X} \sup_{g : |g| \le V} |Ag(x)|/V(x)$ makes the relevant kernels bounded linear operators from either L^∞_V to itself or M_V to itself. It also allows an equivalent characterization of V-uniform ergodicity: $\|P^n - \Pi\|_V \le R\rho^n$ for some R < ∞ and ρ < 1, providing an immediate proof of the existence of Q^# since $\sum_{i \ge 0}\{P^i - \Pi\}$ converges in the norm ‖·‖_V, considered over either spaces of functions or measures. When Q^# is a bounded linear operator on a Banach space, operator-theoretic results such as those of Koliha (1996b) describe its spectral properties. One particular implication is that Q must have an isolated eigenvalue at zero, which admits the study of "second eigenvalue" properties of Markov chains. Such ideas have been used to investigate a Markov chain's rate of convergence to its invariant distribution. See, e.g., Lawler and Sokal (1988) and more recent work by François (1999) and León (2001).
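For a finite chain all of these objects are computable matrices, so the criteria (3), the relation Q^# = Z − Π (one of the standard identities among Z, Q^#, and Π), the series $\sum_{i\ge0}\{P^i - \Pi\}$, and the isolated eigenvalue at zero can each be checked directly. A sketch with an illustrative three-state chain of our own:

```python
import numpy as np

# Illustrative 3-state chain (our own toy example).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
I = np.eye(n)
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1))]); pi = pi / pi.sum()
Pi = np.outer(np.ones(n), pi)

Q  = I - P
Z  = np.linalg.inv(I - P + Pi)   # fundamental kernel Z = {I - P + Pi}^{-1}
Qd = Z - Pi                      # candidate for Q^#

# The Drazin criteria (3): QA = AQ, QAQ = Q, AQA = A.
assert np.allclose(Q @ Qd, Qd @ Q)
assert np.allclose(Q @ Qd @ Q, Q)
assert np.allclose(Qd @ Q @ Qd, Qd)
assert np.allclose(Pi @ Qd, 0.0) and np.allclose(Qd @ Pi, 0.0)

# Under geometric ergodicity, Q^# = sum_{i>=0} (P^i - Pi); check by truncation.
S = sum(np.linalg.matrix_power(P, i) - Pi for i in range(200))
assert np.allclose(Qd, S)

# Zero is an isolated, simple point of the spectrum of Q = I - P.
ev = np.sort(np.abs(np.linalg.eigvals(Q)))
assert ev[0] < 1e-10 and ev[1] > 0.1
```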

Representations of asymptotic variance
Considering the kernels as operators between spaces of functions makes sense when calculating asymptotic quantities of partial sums. Considering them as operators between spaces of measures is logical when concerned with convergence of measures, such as convergence of the transition probabilities to the stationary distribution. A combination of the two is applicable to studying convergence of normalized partial sums to a normal distribution in the central limit theorem. We recount some important asymptotic properties of the partial sum S_n(g).

Proposition 4. (i) The asymptotic mean of n^{-1}S_n(g) is πg. (ii) Suppose ĝ solves (2) and π(ĝ²) < ∞. Then $n^{-1/2}\{S_n(g) - n\pi g\}$ converges in distribution to a mean-zero Gaussian random variable with variance σ²_g.
The quantity σ²_g is the asymptotic variance of the partial sum S_n(g). It is a quantity of central interest in Markov chain theory, appearing, for example, in the CLT of Proposition 4.ii and throughout MCMC methodology. We express the asymptotic variance in terms of the Drazin inverse Q^#.
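On a finite state space the Drazin-inverse representation can be compared numerically with the familiar autocovariance series. The sketch below uses the form σ²_g = 2πI_gQ^#g − πI_g{I − Π}g — our reading of the two terms of (6) as they are described in Example 1 — with a toy chain and function of our own:

```python
import numpy as np

# Illustrative 3-state chain and function g (our own toy choices).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
g = np.array([1.0, -2.0, 0.5])
n = P.shape[0]
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1))]); pi = pi / pi.sum()
Pi = np.outer(np.ones(n), pi)
Qd = np.linalg.inv(np.eye(n) - P + Pi) - Pi   # Drazin inverse Q^#

# Drazin-inverse representation: 2 pi I_g Q^# g - pi I_g (I - Pi) g.
sigma2_drazin = 2 * pi @ (g * (Qd @ g)) - pi @ (g * (g - pi @ g))

# Autocovariance series: gamma_0 + 2 sum_j gamma_j, gamma_j = pi I_g (P^j - Pi) g.
gamma = lambda j: pi @ (g * ((np.linalg.matrix_power(P, j) - Pi) @ g))
sigma2_acov = gamma(0) + 2 * sum(gamma(j) for j in range(1, 200))

assert np.isclose(sigma2_drazin, sigma2_acov)
```

The agreement reflects Q^# = Σ_{j≥0}{P^j − Π}: the Drazin route replaces an infinite series of matrix powers by a single matrix inversion.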
Note (9) may be interpreted in terms of the autocovariances of {Φ_i}. Assuming Φ_0 is initially distributed according to π, it is straightforward to show πI_g{I − Π}g = Var[g(Φ_0)] and πI_g{P^j − Π}g = Cov[g(Φ_0), g(Φ_j)]. Under (1), P^j → Π, and since ΠQ^# = 0 it follows that the first term in (9) disappears. The resulting formula is then the familiar (Meyn and Tweedie, 1993, Thm. 17.5.3) expression $\sigma^2_g = \mathrm{Var}[g(\Phi_0)] + 2\sum_{j \ge 1} \mathrm{Cov}[g(\Phi_0), g(\Phi_j)]$.

Representations for a sliding-window chain in terms of the original chain
Under suitable regularity conditions on {Φ}, the asymptotic properties of S̃_n(g̃) are detailed in formulas (4)–(6) with π, P, and Q^# replaced by π̃, P̃, and Q̃^#, the analogous objects governing {Φ̃_i}. In this section, the connection of π̃, P̃, and Q̃^# to π, P, and Q^# is made explicit. The transition probability kernel P̃ of the sliding-window chain is defined on the product space (X^p, B(X)^p). We extend measures from {Φ} to {Φ̃} as follows.
Given ν ∈ M, define the measure ν̃ on (X^p, B(X)^p) by ν̃(dx_1, …, dx_p) := ν(dx_1)P(x_1, dx_2) ⋯ P(x_{p−1}, dx_p); for a function g̃ : X^p → R define ν̃g̃ appropriately. Define from g̃ the function g̃* : X → R by g̃*(x_1) := ∫ g̃(x_1, x_2, …, x_p) P(x_1, dx_2) ⋯ P(x_{p−1}, dx_p), and note we can write νg̃* = ν̃g̃. An analogue to the drift condition (1) holds for the sliding-window chain: from f and V in (1), define f̃(x_1, …, x_p) := f(x_p), Ṽ(x_1, …, x_p) := V(x_p), and C̃ := X^{p−1} × C. The stability condition (1) may now be rewritten P̃Ṽ ≤ Ṽ − f̃ + b·1_{C̃}. It remains to show that C̃ is petite.
We now define a number of relevant linear spaces. Let M^p be the space of bounded signed measures on the product space (X^p, B(X)^p). Recall the spaces L^∞_h and M_h defined in Section 2.1. Let L̃^∞_h denote the space of measurable functions g̃ : X^p → R such that g̃* ∈ L^∞_h. Let M̃_h denote the space of bounded signed measures ν̃ ∈ M^p for which there is a signed measure ν ∈ M_h such that ν̃g̃ = νg̃* for any g̃ ∈ L̃^∞_h. Based on Proposition 6 and the preceding discussion, we expect that if the original chain is stable then so is the sliding-window chain, in which case, by the results in Section 2.2, an invariant measure must exist, a Drazin inverse must exist, and so must solutions to the sliding-window version of the Poisson equation. The following shows this is the case.
(i) The measure π̃ ∈ M̃_f is the invariant measure of the sliding-window chain.
(ii) The Drazin inverse of Q̃, denoted Q̃^#, exists as a bounded linear operator from L̃^∞_f to L̃^∞_{V+1} and from M̃_{V+1} to M̃_f, is unique, and provides solutions ĝ = Q̃^#g̃ to the Poisson equation.
The measure π̃ ∈ M̃_f such that π̃g̃ = πg̃* with π ∈ M_f is therefore the invariant measure of the sliding-window chain. Statement (ii): Since πV < ∞, then π̃Ṽ < ∞. Define Q̃ = I − P̃. As implied by Proposition 3.i, Q̃^# exists, is unique, and may be written Q̃^# = {I − P̃ + Π̃}^{-1} − Π̃. Since π̃Ṽ < ∞, the central limit theorem and asymptotic variance expressions in Propositions 4 and 5 apply to {Φ̃} with P̃, Π̃, Q̃^# replacing P, Π, Q^#. We have already related P̃ and Π̃ to P and Π. We next connect Q̃^# to its analogous object Q^# on the original chain.
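For a finite chain with window p = 2, the extensions π̃ and g̃* can be built explicitly and the identities π̃P̃ = π̃ and νg̃* = ν̃g̃ verified directly. A sketch with a three-state chain and g̃ of our own choosing:

```python
import numpy as np

# Illustrative 3-state chain; sliding window with p = 2 (our own toy example).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1))]); pi = pi / pi.sum()

# Pt: transition kernel of the pair chain (x, y) -> (y, z) with probability P(y, z).
Pt = np.zeros((n * n, n * n))
for x in range(n):
    for y in range(n):
        Pt[x * n + y, y * n:(y + 1) * n] = P[y]

# Extended measures: nut(x, y) = nu(x) P(x, y); in particular pit extends pi.
pit = (pi[:, None] * P).reshape(-1)
assert np.allclose(pit @ Pt, pit)          # pit is invariant for Pt

# gt(x, y) and its projection gt*(x) = sum_y P(x, y) gt(x, y).
G   = np.array([[0., 1., 2.], [1., 3., 0.], [2., 0., 1.]])
gt  = G.reshape(-1)
gts = (P * G).sum(axis=1)

nu  = np.array([0.5, 0.2, 0.3])
nut = (nu[:, None] * P).reshape(-1)
assert np.isclose(nu @ gts, nut @ gt)      # nu(gt*) = nut(gt)
assert np.isclose(pi @ gts, pit @ gt)      # the invariant case
```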

Asymptotic variance of high-order partial sums
We apply these results to show how the asymptotic variance of S̃_n(g̃) can be expressed in terms of Q^#.

Proposition 9. When Proposition 4.ii holds for the analogous objects of the sliding-window chain, the asymptotic variance of S̃_n(g̃) is given by (12).

Proof. It follows almost directly from Proposition 8 that π̃I_g̃P̃^{p−1}Q̃^#g̃ = πP_g̃Q^#g̃*. Combine this with Proposition 3.v with j = p − 1 and (6), in which the term π̃I_g̃{P̃^i − Π̃}g̃ has been expanded to π̃I_g̃P̃^ig̃ − (π̃g̃)².
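The practical content of Proposition 9 — that the asymptotic variance of a high-order sum matches a computation using only the original chain's kernels — can be illustrated numerically with p = 2. The sketch below evaluates the window-chain formula in the form σ̃² = 2π̃I_g̃Q̃^#g̃ − π̃I_g̃{I − Π̃}g̃ (our reading of (6) applied to the window chain) and compares it with the autocovariance series of h_i = g̃(Φ_i, Φ_{i+1}), assembled from P alone; chain and g̃ are toy choices of our own:

```python
import numpy as np

# Illustrative 3-state chain, window p = 2, toy gt (our own choices).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]
w, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(w - 1))]); pi = pi / pi.sum()
G  = np.array([[0., 1., 2.], [1., 3., 0.], [2., 0., 1.]])
gt = G.reshape(-1)

# Route 1: build the sliding-window chain and use its Drazin inverse.
Pt = np.zeros((n * n, n * n))
for x in range(n):
    for y in range(n):
        Pt[x * n + y, y * n:(y + 1) * n] = P[y]
pit = (pi[:, None] * P).reshape(-1)
Pit = np.outer(np.ones(n * n), pit)
Qtd = np.linalg.inv(np.eye(n * n) - Pt + Pit) - Pit
s2_window = 2 * pit @ (gt * (Qtd @ gt)) - pit @ (gt * (gt - pit @ gt))

# Route 2: autocovariances of h_i = gt(Phi_i, Phi_{i+1}) from the original chain only:
# E[h_0 h_j] = sum pi(x)P(x,y)gt(x,y) P^{j-1}(y,u) P(u,v)gt(u,v) for j >= 1.
mu = pit @ gt
a = (pi[:, None] * P * G).sum(axis=0)   # a(y) = sum_x pi(x)P(x,y)gt(x,y)
b = (P * G).sum(axis=1)                 # b(u) = sum_v P(u,v)gt(u,v)
var0 = pit @ gt**2 - mu**2
covs = [a @ np.linalg.matrix_power(P, j - 1) @ b - mu**2 for j in range(1, 200)]
s2_direct = var0 + 2 * sum(covs)

assert np.isclose(s2_window, s2_direct)
```

Route 2 never forms an n² × n² object other than P̃ implicitly through P's powers, which is the kind of simplification the section is after.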

Applications to autoregressive processes
At this point, we have established the Drazin inverse methodology for {Φ}, extended it to {Φ̃}, and expressed this extension in terms of the kernels related to {Φ}. We will now illustrate all of this with some applications.
We consider a simple time series model, the linear autoregressive process of order one, AR(1). The AR(1) model is Φ_{i+1} = φΦ_i + ε_{i+1}, where the {ε_i} are mean-zero Gaussian random variables with constant variance σ². We suppose |φ| < 1 so that {Φ_i} is a mean-zero stationary process. The transition probability kernel identifies P(x, ·) with a Gaussian distribution having mean φx and variance σ². A straightforward iterative argument will then show that P^i(x, ·) is a Gaussian probability measure with mean φ^ix and variance σ²(1 − φ^{2i})/(1 − φ²). Known results (see, e.g., Brockwell and Davis, 1991, Ch. 3) establish the invariant distribution π is mean-zero Gaussian with variance σ²_π := σ²/(1 − φ²). We verify the drift condition (1) for V-uniform ergodicity (and thus f-regularity, since f is a multiple of V) using the bounding function V(x) = e^{c|x|}, c > 0, which satisfies V ≥ 1. Note PV(x) < ∞ and {Φ_i} is ψ-irreducible with the irreducibility measure ψ being Lebesgue measure. Since |φ| < 1 and E[e^{c|ε_i|}] < ∞, for |x| large enough it holds that PV(x) ≤ e^{c|φ||x|}E[e^{c|ε_1|}] ≤ λV(x) for some λ < 1; then, taking C = {x : |x| ≤ m} for m sufficiently large, C is petite and the drift condition for V-uniform ergodicity is satisfied.

First, in Example 1, to demonstrate the basic techniques described in Section 2.2, we treat a problem involving a univariate partial sum only. Next, in Example 2, to demonstrate the techniques derived in Section 2.4, we treat a problem involving a high-order partial sum. A third example is also given, which combines the formulas derived in Example 2 to rederive the asymptotic variance of the least squares estimator for the autoregressive parameter.
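The stated form of P^i(x, ·) can be checked by iterating the one-step recursions m ← φm and v ← φ²v + σ², whose limits recover the invariant law. A quick sketch with illustrative parameter values of our own:

```python
import numpy as np

phi, sigma2, x0 = 0.6, 1.0, 2.0   # illustrative AR(1) parameters and starting point

# Mean and variance of P^i(x0, .) follow the recursions m <- phi*m, v <- phi^2*v + sigma2.
m, v = x0, 0.0
for i in range(1, 51):
    m = phi * m
    v = phi**2 * v + sigma2
    assert np.isclose(m, phi**i * x0)                               # mean phi^i x0
    assert np.isclose(v, sigma2 * (1 - phi**(2 * i)) / (1 - phi**2))  # stated variance

# As i -> infinity these converge to the invariant law: mean 0, variance sigma2/(1-phi^2).
assert abs(m) < 1e-10
assert np.isclose(v, sigma2 / (1 - phi**2), atol=1e-10)
```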
Although the examples deduce fairly broad formulas, they have been laid out in such a way as to sketch how our high-order asymptotic formulas might be used in a general methodology: the first problem provides an understanding of the action of Q^# as applied to a univariate partial sum, which is then applied in the second, more complicated problem involving higher-order partial sums; finally, the third problem pulls the results together in a useful context. The amount of computational effort required in each example is roughly the same as would be needed in a more direct approach, such as applying (10), which does not involve Q^#. Nevertheless, we anticipate advantages in future applications to general autoregressive time series in which Q^# is known with uncertainty and at considerable computational cost. By following a sequence of steps similar to these examples, one avoids calculating the Drazin inverse of the sliding-window chain, thereby heading off the additional uncertainty and computational cost that would arise in working with this more complicated chain directly. Further discussion of potential applications is given in Section 4.
Example 1. We calculate the asymptotic variance (6) for g(x) = x^d for some integer d. By rescaling P^i(x, ·) with respect to the distribution of a standard Gaussian random variable Z ∼ G(0, 1), the binomial theorem provides (13), in which indexing uses j′ = 2j only, since the odd moments of Z are zero. Here, ⌊d/2⌋ represents the greatest integer not exceeding d/2. The binomial theorem may be used again to obtain (14), where K_j = 0, …, j if j < d/2 and K_j = 1, …, j if j = d/2. The special indexing arises since πg = σ^d_π E[Z^d] is identical to the expanded summand in (13) indexed by j = d/2 and k = 0. The first term in (6) involves πI_gQ^#g; to evaluate this using (14), write h(x) = x^{d−2j}. It then remains to calculate πI_g{I − Π}g, the second term in (6), which is easily done using the moments of Z. The asymptotic variance (6) for g(x) = x^d then follows, with explicit expressions for the example values d = 1, 2, 3. In general settings, the even moments of Z may be calculated through the well-known formula $E[Z^{2j}] = 2^j \pi^{-1/2} \Gamma(j + 1/2)$, where Γ is the gamma function.
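For the simplest case d = 1 (so g(x) = x and πg = 0), the action of Q^# is explicit: P^kg(x) = φ^kx gives Q^#g(x) = x/(1 − φ), and the two terms of (6), read as 2πI_gQ^#g − πI_g{I − Π}g (our sign convention), reduce to 2σ²_π/(1 − φ) − σ²_π = σ²/(1 − φ)². A numerical cross-check against the autocovariance series, with parameter values of our own:

```python
import numpy as np

phi, sig2 = 0.6, 1.0                 # illustrative AR(1) parameters
sig2_pi = sig2 / (1 - phi**2)        # stationary variance sigma_pi^2

# Route 1: the two terms of (6), using Q# g (x) = x / (1 - phi) for g(x) = x.
sigma2_drazin = 2 * sig2_pi / (1 - phi) - sig2_pi

# Route 2: the autocovariance series gamma_0 + 2 sum_k gamma_k, gamma_k = sig2_pi * phi^k.
sigma2_acov = sig2_pi + 2 * sum(sig2_pi * phi**k for k in range(1, 500))

assert np.isclose(sigma2_drazin, sigma2_acov)
assert np.isclose(sigma2_drazin, sig2 / (1 - phi)**2)   # closed form sigma^2/(1-phi)^2
```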
Example 2. In this example, the sliding-window chain formulas derived in Section 2.4 are used to evaluate the asymptotic behavior of the partial sums $S_1 = \sum_{i=1}^n \Phi_i\Phi_{i+r-1}$ and $S_2 = \sum_{i=1}^n \Phi_i\Phi_{i+p-1}$, with r and p being fixed, positive integers. The specific objective is to evaluate σ²_i ≈ n^{-1}Var[S_i], for i = 1, 2, and σ_{12} ≈ n^{-1}Cov[S_1, S_2] under the simple AR(1) model. The parameter r shall be restricted to 1 ≤ r < (p + 1)/2, which identifies the simpler cases in calculating the second term in (12). Set g̃(x̃) = a_1x_1x_r + a_2x_1x_p, for arbitrary constants a_1 and a_2. Expressions for the asymptotic mean and variance of S̃_n(g̃) will have the form μ_g̃ = a_1μ_1 + a_2μ_2 and σ²_g̃ = a_1²σ_1² + 2a_1a_2σ_{12} + a_2²σ_2². Thus, once μ_g̃ and σ²_g̃ are derived, the desired quantities may simply be read off. We begin the derivation of σ²_g̃ by calculating the first term in (12) from Proposition 9. We need the following lemma.
Putting together (16), (17), (19), and (20) yields the asymptotic variance (12).

Example 3. Under the AR(1) model, a least squares estimator for φ arises by regressing one-step-ahead values Φ_{i+1} onto current values Φ_i in the manner of no-intercept simple linear regression. Calculating this from a trajectory of n + 1 time points, Φ_1, …, Φ_{n+1}, the estimator is $\hat\varphi = \sum_{i=1}^n \Phi_i\Phi_{i+1} \big/ \sum_{i=1}^n \Phi_i^2$. Observe that φ̂ is calculated from the partial sums of the previous example with p = 2 and r = 1: the partial sums are $S_1 = \sum_{i=1}^n \Phi_i^2$ and $S_2 = \sum_{i=1}^n \Phi_i\Phi_{i+1}$, and the least squares estimator is φ̂(S_1, S_2) = S_2/S_1.
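As a sanity check on Example 3, φ̂ = S_2/S_1 can be computed from a simulated AR(1) trajectory; its sampling standard deviation is of order $\sqrt{(1-\varphi^2)/n}$. A simulation sketch in which the parameters, seed, and tolerance are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma, n = 0.5, 1.0, 200_000   # illustrative values

# Simulate one long AR(1) trajectory started from stationarity.
eps = rng.normal(0.0, sigma, n + 1)
x = np.empty(n + 1)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2))
for i in range(n):
    x[i + 1] = phi * x[i] + eps[i + 1]

S1 = np.sum(x[:n]**2)               # sum of Phi_i^2          (p = 2, r = 1)
S2 = np.sum(x[:n] * x[1:n + 1])     # sum of Phi_i Phi_{i+1}
phi_hat = S2 / S1

# phi_hat is consistent; sqrt((1-phi^2)/n) is about 0.002 here, so 0.05 is a loose bound.
assert abs(phi_hat - phi) < 0.05
```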

Conclusions and discussion
A linear operator framework was established for discrete-time Markov chains, and the Drazin inverse was shown to unify objects central to Markov chain theory. This framework was applied to asymptotic variance calculations for univariate and higher-order partial sums, with the formulas expressed in terms of objects governing the original chain, {Φ_i}, which presumably would be obtained by statistical analysis. A potential application is the analysis of general autoregressive time series. The present study does not go as far as to provide a methodology for such an analysis, but our hope is that this technique might be exploited in future applications in order to reduce computational expense. Another extension of these methods would be to continuous-time Markov chains. We note some existing literature which uses asymptotic variance formulas to monitor the convergence of simulated Markov chains to their invariant distributions. For instance, Peskun (1973) studied convergence in chains with finite state space using the fundamental matrix. More recent work by Chauveau and Diebolt (2002) builds upon the formula (10) to study convergence in general state-space chains. We conjecture the present results would be useful in these applications as well. It is prudent to remark on continuity and approximation of π and Q^#. Continuity results for Q^# are found in Campbell (1980) and Koliha and Rakočević (1998). These can be extended to the continuity of π through our Proposition 3.iii. Meyer (1975) contains an early algorithm for calculating Q^# for finite state-space Markov chains. Working with abstract Banach spaces, González and Koliha (1999) study semi-iterative methods for solving linear equations such as π = πP in cases where Q^# exists. More recently, iterative methods for approximating Q^# and other generalized inverses in general state-space settings are studied in Djordjević, Stanimirović, and Wei (2004).
We expect the results of these investigations would be helpful in developing efficient computational algorithms for Markov chain applications.