Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes

We continue the investigation of the spectral theory and exponential asymptotics of Markov processes, following Kontoyiannis and Meyn (2003). We introduce a new family of nonlinear Lyapunov drift criteria, characterizing distinct subclasses of geometrically ergodic Markov processes in terms of inequalities for the nonlinear generator. We concentrate on the class of "multiplicatively regular" Markov processes, characterized via conditions similar to (but weaker than) those of Donsker-Varadhan. For any such process {Phi(t)} with transition kernel P on a general state space, the following are obtained. 1. SPECTRAL THEORY: For a large class of functionals F, the kernel Phat(x,dy) = e^{F(x)}P(x,dy) has a discrete spectrum in an appropriately defined Banach space. There exists a "maximal" solution to the "multiplicative Poisson equation," defined as the eigenvalue problem for Phat. Regularity properties are established for \Lambda(F) = \log(\lambda), where \lambda is the maximal eigenvalue, and for its convex dual. 2. MULTIPLICATIVE MEAN ERGODIC THEOREM: The normalized mean E_x[\exp(S_t)] of the exponential of the partial sums {S_t} of the process with respect to any one of the above functionals F converges to the maximal eigenfunction. 3. MULTIPLICATIVE REGULARITY: The drift criterion under which our results are derived is equivalent to the existence of regeneration times with finite exponential moments for {S_t}. 4. LARGE DEVIATIONS: The sequence of empirical measures of {Phi(t)} satisfies an LDP in a topology finer than the \tau-topology. The rate function is \Lambda^* and it coincides with the Donsker-Varadhan rate function. 5. EXACT LARGE DEVIATIONS: The partial sums {S_t} satisfy an exact LD expansion, analogous to that obtained for independent random variables.


Introduction and Main Results
Let $\Phi = \{\Phi(t) : t \in T\}$ be a Markov process taking values in a Polish state space $X$, equipped with its associated Borel $\sigma$-field $\mathcal{B}$. The time index $T$ may be discrete, $T = \mathbb{Z}_+$, or continuous, $T = \mathbb{R}_+$, but we specialize to the discrete-parameter case after Section 1.1. The distribution of $\Phi$ is determined by its initial state $\Phi(0) = x \in X$ and the transition semigroup $\{P^t : t \in T\}$, where in discrete time all kernels $P^t$ are powers of the 1-step transition kernel $P$. Throughout the paper we assume that $\Phi$ is $\psi$-irreducible and aperiodic. This means that there is a $\sigma$-finite measure $\psi$ on $(X, \mathcal{B})$ such that, for any $A \in \mathcal{B}$ satisfying $\psi(A) > 0$ and any initial condition $x$, $P^t(x, A) > 0$ for all $t$ sufficiently large.
Moreover, we assume that $\psi$ is maximal in the sense that any other such $\psi'$ is absolutely continuous with respect to $\psi$ (written $\psi' \prec \psi$).
For a $\psi$-irreducible Markov process it is known that ergodicity is equivalent to the existence of a solution to the Lyapunov drift criterion (V3) below [34,17]. Let $V : X \to (0, \infty]$ be an extended-real-valued function, with $V(x_0) < \infty$ for at least one $x_0 \in X$, and write $\mathcal{A}$ for the (extended) generator of the semigroup $\{P^t : t \in T\}$. This is equal to $\mathcal{A} = (P - I)$ in discrete time (where $I = I(x, dy)$ denotes the identity kernel $\delta_x(dy)$), and in continuous time we think of $\mathcal{A}$ as a generalization of the classical differential generator $\mathcal{A} = \frac{d}{dt} P^t |_{t=0}$. Recall that a function $s : X \to \mathbb{R}_+$ and a probability measure $\nu$ on $(X, \mathcal{B})$ are called small if for some measure $m$ on $\mathbb{Z}_+$ with finite mean we have $\sum_{t \ge 0} P^t(x, A)\, m(t) \ge s(x)\nu(A)$, $x \in X$, $A \in \mathcal{B}$.
A set $C$ is called small if $s = \epsilon I_C$ is a small function for some $\epsilon > 0$. Also recall that an arbitrary kernel $P = P(x, dy)$ acts linearly on functions $f : X \to \mathbb{C}$ and measures $\nu$ on $(X, \mathcal{B})$, via $P f(\cdot) = \int_X P(\cdot, dy) f(y)$ and $\nu P(\cdot) = \int_X \nu(dx) P(x, \cdot)$, respectively.
We say that the Lyapunov drift condition (V3) holds with respect to the Lyapunov function $V$ [34], if: For a function $W : X \to [1, \infty)$, a small set $C \subset X$, and constants $\delta > 0$, $b < \infty$, $\mathcal{A}V \le -\delta W + b I_C$ on $S_V := \{x : V(x) < \infty\}$. Condition (V3) implies that the set $S_V$ is absorbing (and hence full), so that $V(x) < \infty$ a.e. $[\psi]$; see [34, Proposition 4.2.3].
As in [34,32], a central role in our development will be played by weighted $L_\infty$ spaces: For any function $W : X \to (0, \infty]$, define the Banach space of complex-valued functions, $L^W_\infty := \{g : X \to \mathbb{C} \ \text{s.t.}\ \sup_x |g(x)|/W(x) < \infty\}$, with associated norm $\|g\|_W := \sup_x |g(x)|/W(x)$. We write $\mathcal{B}^+$ for the set of functions $s : X \to [0, \infty]$ satisfying $\psi(s) := \int s(x)\, \psi(dx) > 0$, and, with a slight abuse of notation, we write $A \in \mathcal{B}^+$ if $A \in \mathcal{B}$ and $\psi(A) > 0$ (i.e., the indicator function $I_A$ is in $\mathcal{B}^+$). Also, we let $\mathcal{M}^W_1$ denote the Banach space of signed and possibly complex-valued measures $\mu$ on $(X, \mathcal{B})$ satisfying $\|\mu\|_W := \sup\{|\mu|(F) : F \in L^W_\infty,\ \|F\|_W \le 1\} < \infty$. The following consequences of (V3) may be found in [34, Theorem 14.0.1].
Theorem 1.1 (Ergodicity) Suppose that $\Phi$ is a $\psi$-irreducible and aperiodic discrete-time chain, and that condition (V3) is satisfied. Then the following properties hold:
1. ($W$-ergodicity) The process is positive recurrent with a unique invariant probability measure $\pi \in \mathcal{M}^W_1$ and for all $x \in S_V$, where $P_x$ denotes the conditional distribution of $\Phi$ given $\Phi(0) = x$.

2. ($W$-regularity) For any
where $E_x$ is the expectation with respect to $P_x$, and the hitting times $\tau_A$ are defined as,
3. (Fundamental Kernel) There exists a linear operator $Z : L^W_\infty \to L^{V+1}_\infty$, the fundamental kernel, such that $\mathcal{A}ZF = -F + \pi(F)$, $F \in L^W_\infty$. That is, the function $\widehat{F} := ZF$ solves the Poisson equation, $\mathcal{A}\widehat{F} = -F + \pi(F)$.
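The Poisson equation in part 3 can be illustrated on a finite state space, where every function is bounded and the series $\sum_{n \ge 0} P^n (F - \pi(F))$ converges geometrically. The following is only a sketch under stated assumptions: the 3-state matrix `P`, the function `F`, and all helper names are hypothetical and do not come from the paper, and the series construction is one standard way to realize a solution, not necessarily the operator $Z$ constructed in [34].

```python
# Hypothetical 3-state chain and function; every number here is illustrative.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
F = [1.0, 0.0, 2.0]

def apply_kernel(K, f):
    """(K f)(x) = sum_y K(x, y) f(y): the kernel acting on a function."""
    return [sum(k * fy for k, fy in zip(row, f)) for row in K]

def stationary(K, iters=2000):
    """Invariant probability pi = pi K, found by iterating mu -> mu K."""
    n = len(K)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * K[x][y] for x in range(n)) for y in range(n)]
    return mu

def poisson_solution(K, f, iters=2000):
    """F_hat = sum_{n>=0} K^n (f - pi(f)); converges for an ergodic finite chain."""
    pi = stationary(K)
    mean = sum(p * fx for p, fx in zip(pi, f))
    term = [fx - mean for fx in f]
    total = [0.0] * len(f)
    for _ in range(iters):
        total = [t + u for t, u in zip(total, term)]
        term = apply_kernel(K, term)
    return total, mean

F_hat, pi_F = poisson_solution(P, F)
lhs = [pf - fh for pf, fh in zip(apply_kernel(P, F_hat), F_hat)]   # (P - I) F_hat
rhs = [-fx + pi_F for fx in F]                                     # -F + pi(F)
residual = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The quantity `residual` measures how well the computed `F_hat` satisfies $(P - I)\widehat{F} = -F + \pi(F)$ in the discrete-time case $\mathcal{A} = P - I$.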

Multiplicative Ergodic Theory
The ergodic theory outlined in Theorem 1.1 is based upon consideration of the semigroup of linear operators $\{P^t\}$ acting on the Banach space $L^W_\infty$. In particular, the ergodic behavior of the corresponding Markov process can be determined via the generator $\mathcal{A}$ of this semigroup. In this paper we show that the foundations of the multiplicative ergodic theory and of the large deviations behavior of $\Phi$ can be developed in analogy to the linear theory, by shifting attention from the semigroup of linear operators $\{P^t\}$ to the family of nonlinear, convex operators $\{\mathcal{W}^t\}$ defined, for appropriate $G$, by $\mathcal{W}^t G(x) := \log\bigl(E_x[e^{G(\Phi(t))}]\bigr)$, $x \in X$, $t \in T$.
Formally, we would like to define the 'generator' $\mathcal{H}$ associated with $\{\mathcal{W}^t\}$ by letting $\mathcal{H} = (\mathcal{W} - I)$ in discrete time and $\mathcal{H} = \frac{d}{dt}\mathcal{W}^t |_{t=0}$ in continuous time. Observing that $\mathcal{W}^t G = \log(P^t e^G)$, in discrete time we have $\mathcal{H}G = (\mathcal{W} - I)G = \log(P e^G) - G = \log(e^{-G} P e^G)$, and in continuous time we can similarly calculate, $\mathcal{H}G = \frac{d}{dt}\log(P^t e^G)\big|_{t=0} = e^{-G}\mathcal{A}e^G$, whenever all the above limits exist. Rather than assume differentiability, we use these expressions as motivation for the following rigorous definition of the nonlinear generator, when $e^G$ is in the domain of the extended generator: $\mathcal{H}(G) := \log(e^{-G} P e^G)$ in discrete time, and $\mathcal{H}(G) := e^{-G}\mathcal{A}e^G$ in continuous time. In continuous time, this is Fleming's nonlinear generator; see [22] for a starting point, and [20,21] for recent surveys.
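The discrete-time identities above can be checked directly on a finite state space. The sketch below assumes a hypothetical two-state transition matrix `P` and test function `G` (neither is taken from the paper); the helper names are likewise illustrative. It computes $\log(P e^G) - G$, verifies the semigroup property of the nonlinear operators by comparing two applications of the one-step operator with a single application built from $P^2$, and confirms that the nonlinear generator vanishes on constants.

```python
import math

# Hypothetical two-state transition matrix and test function (illustrative values only).
P = [[0.9, 0.1],
     [0.4, 0.6]]
G = [0.3, -1.2]

def apply_kernel(K, f):
    """(K f)(x) = sum_y K(x, y) f(y)."""
    return [sum(k * fy for k, fy in zip(row, f)) for row in K]

def matmul(A, B):
    """Matrix product, used to form the two-step kernel P^2."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def W_step(K, g):
    """Nonlinear one-step operator: (W g)(x) = log (K e^g)(x)."""
    return [math.log(val) for val in apply_kernel(K, [math.exp(gx) for gx in g])]

def H(K, g):
    """Discrete-time nonlinear generator: H(g) = W g - g = log(e^{-g} K e^g)."""
    return [w - gx for w, gx in zip(W_step(K, g), g)]

HG = H(P, G)

# Semigroup property: applying W twice agrees with one application built from P^2.
twice = W_step(P, W_step(P, G))
direct = W_step(matmul(P, P), G)
semigroup_gap = max(abs(a - b) for a, b in zip(twice, direct))

# H vanishes on constant functions because P is a probability kernel.
H_const = H(P, [2.0, 2.0])
```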
In this paper our main focus will be on the following 'multiplicative' analog of (V3), where the role of the generator is now played by the nonlinear generator $\mathcal{H}$. We say that the Lyapunov drift criterion (DV3) holds with respect to the Lyapunov function $V : X \to (0, \infty]$, if: For a function $W : X \to [1, \infty)$, a small set $C \subset X$, and constants $\delta > 0$, $b < \infty$, $\mathcal{H}(V) \le -\delta W + b I_C$ on $S_V$. [This condition was introduced in [32], under the name (mV3).] Under either condition (V3) or (DV3), we let $\{C_W(r)\}$ denote the sublevel sets of $W$: $C_W(r) = \{y : W(y) \le r\}$, $r \in \mathbb{R}$.
The main assumption in many of our results below will be that $\Phi$ satisfies (DV3), and also that the transition kernels satisfy a mild continuity condition: We require that they possess a density with respect to some reference measure, uniformly over all initial conditions $x$ in the sublevel set $C_W(r)$ of $W$. These assumptions are formalized in condition (DV3+) below.
Condition (DV3+) captures the essential ingredients of the large deviations conditions imposed by Donsker and Varadhan in their pioneering work [14,15,16], and is in fact somewhat weaker than those conditions. In Section 2 an extensive discussion of this assumption is given, and its relation to several well-known conditions in the literature is described in detail. In particular, part (ii) of condition (DV3+) [to which we will often refer as the "density assumption" in (DV3+)] is generally the weaker of the two assumptions.
In most of our results we assume that the function W in (DV3) is unbounded, W ∞ := sup x |W (x)| = ∞. When this is the case, we let W 0 : X → [1, ∞) be a fixed function in L W ∞ , whose growth at infinity is strictly slower than W in the sense that Below we collect, from various parts of the paper, the "multiplicative" ergodic results we derive from (DV3+), in analogy to the "linear" ergodic-theoretic results stated in Theorem 1.1.
Under (DV3), the stochastic process m = {m(t)} defined below is a super-martingale with respect to F t = σ{Φ(s) : 0 ≤ s ≤ t}, t ≥ 0, From the super-martingale property and Jensen's inequality we obtain the bound, which gives the desired bound in (1.), where η := δη 0 . The multiplicative ergodic limit (7) follows from Theorem 3.1 (iii). The existence of an inverse G to H is given in Proposition 3.6, which establishes the boundF ∈ L V ∞ stated in (1.), as well as result (3.). Theorem 2.5 shows that (DV3) actually characterizes W -multiplicative regularity, and provides the bound in (2.).
As in [32], central to our development is the observation that the multiplicative Poisson equation (8) can be written as an eigenvalue problem. In discrete time with $\Lambda = \Lambda(F)$, (8) becomes $(e^F P)e^{\check F} = e^{\Lambda} e^{\check F}$, or, writing $f = e^F$, $\check f = e^{\check F}$ and $\lambda = e^{\Lambda}$, we obtain the eigenvalue equation, $P_f \check f = \lambda \check f$.
The assumptions of Theorem 1.2 are most easily illustrated in continuous time. Consider the following diffusion model on $\mathbb{R}$, sometimes referred to as the Smoluchowski equation. For a given potential $u : \mathbb{R} \to \mathbb{R}_+$, this is defined by the stochastic differential equation $dX(t) = -u_x(X(t))\,dt + \sigma\, dW(t)$, where $u_x := \frac{d}{dx}u$, and $W = \{W(t) : t \ge 0\}$ is a standard Brownian motion. On $C^2$, the extended generator $\mathcal{A}$ of $X = \{X(t) : t \ge 0\}$ coincides with the differential generator given by, $\mathcal{A} = \tfrac{1}{2}\sigma^2 \frac{d^2}{dx^2} - u_x \frac{d}{dx}$. When $\sigma > 0$ this is an elliptic diffusion, so that the semigroup $\{P^t\}$ has a family of smooth, positive densities $P^t(x, dy) = p(x, y; t)\,dy$, $x, y \in \mathbb{R}$ [33]. Hence the Markov process $X$ is $\psi$-irreducible, with $\psi$ equal to Lebesgue measure on $\mathbb{R}$.
A special case is the one-dimensional Ornstein-Uhlenbeck process, where the corresponding potential function is u(x) = 1 2 δx 2 , x ∈ R.
Proposition 1.3 The Smoluchowski equation satisfies (DV3+) with $V = 1 + u\sigma^{-2}$ and $W = 1 + u_x^2$, provided the potential function $u : \mathbb{R} \to \mathbb{R}_+$ is $C^2$ and satisfies:
Proof. Let $V = 1 + u\sigma^{-2}$. We then have, $\mathcal{H}(V) = e^{-V}\mathcal{A}e^{V} = \mathcal{A}V + \tfrac{1}{2}\sigma^2 V_x^2 = \tfrac{1}{2}u_{xx} - \tfrac{1}{2}\sigma^{-2}u_x^2$. It is thus clear that the desired drift conditions hold. The proof is complete since $P^t(x, dy)$ possesses a continuous density $p(x, y; t)$ for each $t > 0$: We may take $T_0 = 1$, and for each $r$ we take $\beta_r$ equal to a constant times Lebesgue measure on $C_W(r)$.
Proposition 1.3 does not admit an exact generalization to discrete-time models. However, the discrete-time one-dimensional Ornstein-Uhlenbeck process, does satisfy the conclusions of the proposition, again with $V = 1 + \epsilon_0 x^2$ for some $\epsilon_0 > 0$, when $\delta > 0$ and $W$ is an i.i.d. Gaussian process with positive variance.
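The claim for the discrete-time model can be explored numerically. The sketch below assumes, purely for illustration, a linear Gaussian recursion $X(t+1) = \alpha X(t) + \sigma N(t+1)$ with $|\alpha| < 1$ and i.i.d. standard normal $N$; this parametrization, the parameter values, and the function names are assumptions and are not recovered from the text. For $V = 1 + \epsilon_0 x^2$ the nonlinear drift $\log E[e^{V(X(1))} \mid X(0) = x] - V(x)$ has a closed form via the Gaussian moment identity, and is a negative quadratic in $x$ plus a constant whenever $\epsilon_0 < (1 - \alpha^2)/(2\sigma^2)$.

```python
import math

def H_closed_form(x, alpha, sigma, eps0):
    """H(V)(x) = log E[exp(V(X(1))) | X(0) = x] - V(x) for V = 1 + eps0 * x^2.

    Uses the Gaussian identity E[exp(e*Y^2)] = exp(e*m^2/(1-2*e*s^2)) / sqrt(1-2*e*s^2)
    for Y ~ N(m, s^2), valid when 2*e*s^2 < 1.
    """
    k = 1.0 - 2.0 * eps0 * sigma ** 2
    if k <= 0:
        raise ValueError("need 2 * eps0 * sigma^2 < 1")
    return eps0 * (alpha * x) ** 2 / k - 0.5 * math.log(k) - eps0 * x ** 2

def H_numeric(x, alpha, sigma, eps0, zmax=12.0, step=0.01):
    """Same quantity by a Riemann sum over the standard normal noise variable."""
    total = 0.0
    n = int(round(2 * zmax / step))
    for i in range(n + 1):
        z = -zmax + i * step
        y = alpha * x + sigma * z
        total += math.exp(eps0 * y * y - 0.5 * z * z)
    expectation = total * step / math.sqrt(2.0 * math.pi)
    return math.log(expectation) - eps0 * x * x

alpha, sigma, eps0 = 0.5, 1.0, 0.1   # hypothetical parameters
values = [(x, H_closed_form(x, alpha, sigma, eps0)) for x in (0.0, 1.0, 3.0, 10.0)]
```

With these hypothetical parameters the drift is positive near the origin and decreases quadratically away from it, which is the qualitative shape a bound of the form $\mathcal{H}(V) \le -\delta W + bI_C$ with quadratic $W$ requires.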
Notation. Often in the transition from ergodic results to their multiplicative counterparts we have to take exponentials of the corresponding quantities. In order to make this correspondence transparent we have tried throughout the paper to follow, as consistently as possible, the convention that the exponential version of a quantity is written as the corresponding lower case letter. For example, above we already had f = e F ,f = eF and λ = e Λ .
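The eigenvalue form of the multiplicative Poisson equation is easy to experiment with on a finite state space, where the scaled kernel $e^{F(x)}P(x, dy)$ is a positive matrix. The sketch below is illustrative only: the two-state matrix `P`, the functional `F`, and the helper names are hypothetical. It finds the maximal eigenvalue and eigenfunction by power iteration and then compares the normalized mean $\lambda^{-n}E_x[\exp(S_n)]$, computed as $\lambda^{-n}(P_f^n \mathbf{1})(x)$ with $S_n = F(\Phi(0)) + \cdots + F(\Phi(n-1))$, against the eigenfunction up to a constant factor.

```python
import math

# Hypothetical two-state chain and functional (illustrative values only).
P = [[0.9, 0.1],
     [0.4, 0.6]]
F = [0.5, -0.3]

def scaled_kernel(K, func):
    """P_f(x, y) = e^{F(x)} P(x, y)."""
    return [[math.exp(fx) * p for p in row] for fx, row in zip(func, K)]

def apply_kernel(K, h):
    """(K h)(x) = sum_y K(x, y) h(y)."""
    return [sum(k * hy for k, hy in zip(row, h)) for row in K]

def principal_eigenpair(K, iters=500):
    """Power iteration; the eigenfunction is normalized so that max_x h(x) = 1."""
    h = [1.0] * len(K)
    lam = 1.0
    for _ in range(iters):
        Kh = apply_kernel(K, h)
        lam = max(Kh)
        h = [val / lam for val in Kh]
    return lam, h

Pf = scaled_kernel(P, F)
lam, f_check = principal_eigenpair(Pf)
Lambda = math.log(lam)

# Normalized mean lam^{-n} (P_f^n 1)(x), iterated for n = 200 steps.
m = [1.0, 1.0]
for _ in range(200):
    m = [val / lam for val in apply_kernel(Pf, m)]

eig_residual = max(abs(a - lam * b) for a, b in zip(apply_kernel(Pf, f_check), f_check))
ratio_gap = abs(m[0] / m[1] - f_check[0] / f_check[1])
```

Here `eig_residual` checks the eigenvalue equation $P_f \check f = \lambda \check f$, and `ratio_gap` checks that the normalized mean is proportional to $\check f$; the normalization of $\check f$ used here (maximum equal to one) is a convenience of the sketch.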

Large Deviations
From now on we restrict attention to the discrete-time case. Part 1 of Theorem 1.2 extends the multiplicative mean ergodic theorem of [32] to the larger class of (possibly unbounded) functionals $F \in L^{W_0}_\infty$. In this section we assume that (DV3+) holds with an unbounded function $W$, and we let a function $W_0 \in L^W_\infty$ be chosen as in (6). For $n \ge 1$, let $L_n$ denote the empirical measures induced by $\Phi$ on $(X, \mathcal{B})$, $L_n := \frac{1}{n}\sum_{t=0}^{n-1}\delta_{\Phi(t)}$, and write $\langle \cdot, \cdot \rangle$ for the usual inner product; for $\mu$ a measure and $G$ a function, $\langle \mu, G \rangle = \mu(G) := \int G(y)\, \mu(dy)$, whenever the integral exists. Then, from Theorem 3.1 it follows that for any real-valued $F \in L^{W_0}_\infty$ and any $a \in \mathbb{R}$ we have the following version of the multiplicative mean ergodic theorem, $\exp(-n\Lambda(aF))\, E_x[\exp(an\langle L_n, F\rangle)] \to \check f_a(x)$, $n \to \infty$, $x \in X$, where $\check f_a := e^{\mathcal{G}(aF)}$ is the eigenfunction constructed in part 3 of Theorem 1.2, corresponding to the function $aF$.
In Section 5, strong large deviations results for the sequence of empirical measures $\{L_n\}$ are derived from the multiplicative mean ergodic theorem in (15), using standard techniques [9,7,12]. First we show that, for any initial condition $x \in X$, the sequence $\{L_n\}$ satisfies a large deviations principle (LDP) in the space $\mathcal{M}_1$ of all probability measures on $(X, \mathcal{B})$ equipped with the $\tau^{W_0}$-topology, that is, the topology generated by the system of neighborhoods $N_F(c, \epsilon) := \{\nu \in \mathcal{M}_1 : |\nu(F) - c| < \epsilon\}$, for real-valued $F \in L^{W_0}_\infty$, $c \in \mathbb{R}$, $\epsilon > 0$. Moreover, the rate function $I(\nu)$ that governs this LDP is the same as the Donsker-Varadhan rate function, and can be characterized in terms of relative entropy, $I(\nu) := \inf_{\check P} H(\nu \odot \check P \,\|\, \nu \odot P)$, where the infimum is over all transition kernels $\check P$ for which $\nu$ is an invariant measure, $\nu \odot \check P$ denotes the bivariate measure $[\nu \odot \check P](dx, dy) := \nu(dx)\check P(x, dy)$ on $(X \times X, \mathcal{B} \times \mathcal{B})$, and $H(\cdot \,\|\, \cdot)$ denotes the relative entropy, $H(\mu \,\|\, \nu) = \int d\mu \log\frac{d\mu}{d\nu}$ when the density $\frac{d\mu}{d\nu}$ exists, and $H(\mu \,\|\, \nu) = \infty$ otherwise. [Throughout the paper we follow the usual convention that the infimum of the empty set is $+\infty$.] As we discuss in Section 2.6 and Section 5, the density assumption in (DV3+) (ii) is weaker than the continuity assumptions of Donsker and Varadhan, but it cannot be removed entirely.
Further, the precise convergence in (15) leads to exact large deviations expansions analogous to those obtained by Bahadur and Ranga Rao [1] for independent random variables, and to the local expansions established in [32] for geometrically ergodic chains. For real-valued, non-lattice functionals $F \in L^{W_0}_\infty$, in Theorem 5.3 we obtain the following: For $c > \pi(F)$ and $x \in X$, $P_x\bigl\{\sum_{t=0}^{n-1}F(\Phi(t)) \ge nc\bigr\} \sim \frac{\check f_a(x)}{a\sqrt{2\pi n \sigma_a^2}}\, e^{-nJ(c)}$, $n \to \infty$, where $a \in \mathbb{R}$ is chosen such that $\frac{d}{da}\Lambda(aF) = c$, $\check f_a(x)$ is the eigenfunction appearing in the multiplicative mean ergodic theorem (15), $\sigma_a^2 = \frac{d^2}{da^2}\Lambda(aF)$, and the exponent $J(c)$ is given in terms of $I(\nu)$ as $J(c) := \inf\{I(\nu) : \nu \text{ is a probability measure on } (X, \mathcal{B}) \text{ satisfying } \nu(F) \ge c\}$. A corresponding expansion is given for lattice functionals.
These large deviations results extend the classical Donsker-Varadhan LDP [14,15] in several directions: First, our conditions are weaker. Second, when (DV3+) holds with an unbounded function $W$, the $\tau^{W_0}$-topology is finer and hence stronger than either the topology of weak convergence, or the $\tau$-topology, with respect to which the LDP for the empirical measures $\{L_n\}$ is usually established [24,4,13]. Third, apart from the LDP we also obtain precise large deviations expansions as in (18) for the partial sums with respect to (possibly unbounded) functionals $F \in L^{W_0}_\infty$.
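The ingredients of the exact expansion, namely the tilting parameter $a$ solving $\frac{d}{da}\Lambda(aF) = c$, the variance $\sigma_a^2$, and the exponent, can be computed explicitly for a two-state chain. The sketch below is a hedged illustration: the chain `P`, the functional `F`, the level `c`, and all function names are hypothetical, derivatives are taken by finite differences, and the exponent is evaluated through the standard Legendre-transform expression $ac - \Lambda(aF)$, which is assumed here rather than quoted from the text.

```python
import math

P = [[0.9, 0.1],
     [0.4, 0.6]]      # hypothetical two-state chain with invariant measure (0.8, 0.2)
F = [0.0, 1.0]        # hypothetical functional, so pi(F) = 0.2

def Lambda(a):
    """log of the maximal eigenvalue of the 2x2 scaled kernel e^{a F(x)} P(x, y)."""
    p, q = (math.exp(a * F[0]) * val for val in P[0])
    r, s = (math.exp(a * F[1]) * val for val in P[1])
    return math.log(0.5 * ((p + s) + math.sqrt((p - s) ** 2 + 4.0 * q * r)))

def dLambda(a, h=1e-5):
    """Central-difference approximation of d/da Lambda(aF)."""
    return (Lambda(a + h) - Lambda(a - h)) / (2.0 * h)

def d2Lambda(a, h=1e-4):
    """Second-difference approximation of d^2/da^2 Lambda(aF)."""
    return (Lambda(a + h) - 2.0 * Lambda(a) + Lambda(a - h)) / (h * h)

def tilt(c, lo=0.0, hi=50.0, iters=200):
    """Bisection for a > 0 with dLambda(a) = c; dLambda is nondecreasing by convexity."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dLambda(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = 0.5                      # a level above pi(F) = 0.2
a = tilt(c)
J = a * c - Lambda(a)        # Legendre-transform value at the tilting parameter
sigma2 = d2Lambda(a)
```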
Following the Donsker-Varadhan papers, a large amount of work has been done in establishing large deviations properties of Markov chains under a variety of different assumptions; see [12,13] for detailed treatments. Under conditions similar to those in this paper, Ney and Nummelin have proved "pinned" large deviations principles in [37,38]. In a different vein, under much weaker assumptions (essentially under irreducibility alone) de Acosta [10] and Jain [28] have proved general large deviations lower bounds, but these are, in general, not tight.
One of the first places where the Feller continuity assumption of Donsker and Varadhan was relaxed is Bolthausen's work [4]. There, a very stringent condition on the chain is imposed, often referred to in the literature as Stroock's uniform condition (U). In Section 2.5 we argue that (U) is much more restrictive than the conditions we impose in this paper. In particular, condition (U) implies Doeblin recurrence as well as the density assumption in (DV3+) (ii).
More recently, Eichelsbacher and Schmock [19] proved an LDP for the empirical measures of Markov chains, again under the uniform condition (U). This LDP is proved in a strict subset of M 1 , and with respect to a topology finer than the usual τ -topology and similar in spirit to the τ W 0 topology introduced here. In addition to (U), the results of [19] require strong integrability conditions that are a priori hard to verify: In the above notation, in [19] it is assumed that for at least one unbounded function W 0 : X → R, we have E x [exp{a|W 0 (Φ(n))|}] < ∞, uniformly over n ≥ 1, for all real a > 0. This assumption is closely related to our condition (DV3), and, as we show in Section 3, (DV3) in particular provides a means for identifying a natural class of functions W 0 satisfying this bound.

Structural Assumptions
There is a wide range of interrelated tools that have been used to establish large deviations properties for Markov processes and to develop parts of the corresponding multiplicative ergodic theory. Most of these tools rely on a functional-analytic setting within which spectral properties of the process are examined. A brief survey of these approaches is given in [32], where the main results relied on the geometric ergodicity of the process. In this section we show how the assumptions used in prior work may be expressed in terms of the drift criteria introduced here and describe the operator-theoretic setting upon which all our subsequent results will be based.

Drift Conditions
Recall that the (extended) generator $\mathcal{A}$ of $\Phi$ is defined as follows: For a function $g : X \to \mathbb{C}$, we write $\mathcal{A}g = h$ if for each initial condition $\Phi(0) = x \in X$ the process $\ell(t) := \sum_{s=0}^{t-1} h(\Phi(s)) - g(\Phi(t))$, $t \ge 1$, is a local martingale with respect to the natural filtration $\{\mathcal{F}_t = \sigma(\Phi(s), 0 \le s \le t) : t \ge 1\}$. In discrete time, the extended generator is simply $\mathcal{A} = P - I$, and its domain contains all measurable functions on $X$.
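The following drift conditions are considered in [34] in discrete time, where in each case $C$ is small, $V : X \to (0, \infty]$ is finite a.e. $[\psi]$, and $b < \infty$, $\delta > 0$ are constants:
(V2) $\mathcal{A}V \le -\delta + bI_C$
(V3) $\mathcal{A}V \le -\delta W + bI_C$
(V4) $\mathcal{A}V \le -\delta V + bI_C$
We further assume that $W$ is bounded below by unity in (V3), and that $V$ is bounded from below by unity in (V4). It is easy to see that (V2)-(V4) are stated in order of increasing strength: (V4) $\Rightarrow$ (V3) $\Rightarrow$ (V2). Analogous multiplicative versions of these drift criteria are defined as follows, where $\mathcal{H}$ is the nonlinear generator defined in (4):
(DV2) $\mathcal{H}(V) \le -\delta + bI_C$
(DV3) $\mathcal{H}(V) \le -\delta W + bI_C$
(DV4) $\mathcal{H}(V) \le -\delta V + bI_C$
The following implications follow easily from the definitions:
Proposition 2.1 For each $k = 2, 3, 4$, (DV$k$) $\Rightarrow$ (V$k$).
Proof. We provide a proof only for $k = 3$ since all are similar. Under (DV3), $P e^V \le e^{V - \delta W + bI_C}$. Jensen's inequality gives $e^{PV} \le P e^V$, and taking logarithms gives (V3).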
We find that Proposition 2.1 gives a poor bound in general. Theorem 2.2 shows that (DV2) actually implies (V4). Its proof is given in the Appendix, after the proof of Theorem 2.5.
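The Jensen step, $(P - I)V \le \log(e^{-V}Pe^V)$ pointwise, can be seen concretely on a toy chain. The sketch below assumes a hypothetical reflected random walk on $\{0, \ldots, 10\}$ with downward drift, a linear function $V$, the constant choice $W \equiv 1$ (a finite state space cannot carry an unbounded $W$, so this is effectively the $k = 2$ case), and $C = \{0\}$; none of these choices come from the paper.

```python
import math

N = 10                       # hypothetical state space {0, 1, ..., N}
UP, DOWN = 0.2, 0.8          # hypothetical reflected random walk with downward drift
theta = math.log(2.0)
V = [1.0 + theta * x for x in range(N + 1)]

def step_distribution(x):
    """Transition probabilities out of state x as {next_state: probability}."""
    up = min(x + 1, N)
    down = max(x - 1, 0)
    dist = {}
    dist[up] = dist.get(up, 0.0) + UP
    dist[down] = dist.get(down, 0.0) + DOWN
    return dist

def linear_drift(x):
    """(P - I)V(x)."""
    return sum(p * V[y] for y, p in step_distribution(x).items()) - V[x]

def nonlinear_drift(x):
    """H(V)(x) = log (P e^V)(x) - V(x)."""
    return math.log(sum(p * math.exp(V[y]) for y, p in step_distribution(x).items())) - V[x]

AV = [linear_drift(x) for x in range(N + 1)]
HV = [nonlinear_drift(x) for x in range(N + 1)]

delta = -math.log(0.8)       # interior value of -H(V) for this walk
b = math.log(1.2) + delta    # correction needed on the set C = {0}
bound = [-delta + (b if x == 0 else 0.0) for x in range(N + 1)]   # -delta*W + b*I_C, W = 1
```

In this example `HV` lies below `bound` at every state, and `AV` lies below `HV`, so the linear drift inequality follows from the multiplicative one exactly as in the proof.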

Spectral Theory Without Reversibility
The spectral theory described in this paper and in [32] is based on various operator semigroups $\{\widehat P^n : n \in \mathbb{Z}_+\}$, where each $\widehat P^n$ is the $n$th composition of a possibly non-positive kernel $\widehat P$. Examples are the transition kernel $P$; the multiplication kernel $I_G(x, dy) = G(x)\delta_x(dy)$ for a given function $G$; the scaled kernel defined by $P_f(x, dy) := f(x)P(x, dy)$ for any function $F : X \to \mathbb{C}$ with $f = e^F$; and also the twisted kernel, defined for a given function $h : X \to (0, \infty)$ by $\check P_h(x, A) := \frac{\int_A P(x, dy)h(y)}{Ph(x)}$, $x \in X$, $A \in \mathcal{B}$. This is a probabilistic kernel (i.e., a positive kernel with $\check P_h(x, X) = 1$ for all $x$) provided $Ph(x) < \infty$, $x \in X$. It is a generalization of the twisted kernel considered in [32], where the function $h$ was taken as $h = \check f$ for a specially constructed $\check f$. It may also be regarded as a version of Doob's $h$-transform [40].
The most common approach to spectral decompositions for probabilistic semigroups $\{P^n\}$ is to impose a reversibility condition [23,5,41]. The motivation for this assumption comes from the $L_2$ setting in which these problems are typically posed, and the well-known fact that the semigroup $\{P^n\}$ is then self-adjoint. We avoid a Hilbert space setting here and instead consider the weighted $L_\infty$ function spaces defined in (2); cf. [30,31,25,35,32].
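The weighting function is determined by the particular drift condition satisfied by the process. In particular, under (DV3) it follows from the convexity of $\mathcal{H}$ (see Proposition 4.4) that for any $0 < \eta \le 1$ we have the bound, $\mathcal{H}(\eta V) \le -\eta\delta W + \eta b I_C$, which may be equivalently expressed as $P e^{\eta V} \le e^{\eta V - \eta\delta W + \eta b I_C}$. Consequently, with $v_\eta := e^{\eta V}$, the scaled kernel $P_f : L^{v_\eta}_\infty \to L^{v_\eta}_\infty$ is a bounded linear operator for any function $f = e^F$ satisfying $\|F_+\|_W \le \eta\delta$ (where $F_+ := \max(F, 0)$), and any $0 \le \eta \le 1$.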
Under any one of the above Lyapunov drift criteria, we will usually consider the function $v$ defined in terms of the corresponding Lyapunov function $V$ on $X$ via $v = e^V$. For any such function $v : X \to [1, \infty)$ and any linear operator $\widehat P : L^v_\infty \to L^v_\infty$, we denote the induced operator norm by, $|||\widehat P|||_v := \sup\{\|\widehat P h\|_v / \|h\|_v : h \in L^v_\infty,\ \|h\|_v \ne 0\}$. The spectrum $\mathcal{S}(\widehat P) \subset \mathbb{C}$ of $\widehat P$ is the set of $z \in \mathbb{C}$ such that the inverse $[Iz - \widehat P]^{-1}$ does not exist as a bounded linear operator on $L^v_\infty$. We let $\xi = \xi(\{\widehat P^n\})$ denote the spectral radius of the semigroup $\{\widehat P^n\}$, $\xi(\{\widehat P^n\}) := \lim_{n \to \infty} |||\widehat P^n|||_v^{1/n}$. In general, the quantities $|||\widehat P|||_v$ and $\xi$ depend upon the particular weighting function $v$. If $\widehat P$ is a positive operator, then $\xi$ is greater than or equal to the generalized principal eigenvalue, or g.p.e. (see e.g. [39]), and they are actually equal under suitable regularity assumptions (see [2,32], and Proposition 2.8 below).
As in [32], we say that $\widehat P$ admits a spectral gap if there exists $\epsilon_0 > 0$ such that the set $\mathcal{S}(\widehat P) \cap \{z : |z| \ge \xi - \epsilon_0\}$ is finite and contains only poles of finite multiplicity; recall that $z_0 \in \mathcal{S}(\widehat P)$ is a pole of (finite) multiplicity $n$ if: (i) $z_0$ is isolated in $\mathcal{S}(\widehat P)$, i.e., for some $\epsilon_1 > 0$ we have $\{z \in \mathcal{S}(\widehat P) : |z - z_0| \le \epsilon_1\} = \{z_0\}$; (ii) The associated projection operator can be expressed as a finite linear combination of some where $[s \otimes \nu](x, dy) := s(x)\nu(dy)$.
See [32, Sec. 4] for more details. Moreover, we say that $\widehat P$ is $v$-uniform if it admits a spectral gap and also there exists a unique pole $\lambda_\circ \in \mathcal{S}(\widehat P)$ of multiplicity one, satisfying $|\lambda_\circ| = \xi(\{\widehat P^n\})$.
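On a finite state space the induced norm on the weighted sup-norm space reduces to $\max_x \sum_y |K(x, y)|\, v(y)/v(x)$, which makes the definitions above easy to compute. The sketch below uses a hypothetical 3-state kernel `P` and Lyapunov function `V` (illustrative values, not from the paper) and approximates the spectral radius by $|||K^n|||_v^{1/n}$ for a fixed $n$. In finite dimensions the limit cannot depend on $v$, so the example only illustrates the definitions: the norm itself changes with the weight, while both spectral radius estimates are close to one.

```python
import math

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]                     # hypothetical 3-state kernel
V = [1.0, 2.0, 3.0]                       # hypothetical Lyapunov function
v = [math.exp(x) for x in V]              # weighting function v = e^V
flat = [1.0, 1.0, 1.0]                    # unweighted case

def weighted_norm(K, w):
    """Induced norm on the weighted sup-norm space: max_x sum_y |K(x,y)| w(y) / w(x)."""
    return max(sum(abs(k) * wy for k, wy in zip(row, w)) / wx for row, wx in zip(K, w))

def matmul(A, B):
    """Matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def spectral_radius_estimate(K, w, n=200):
    """|||K^n|||_w^{1/n} for a fixed n, approximating the limit as n grows."""
    Kn = K
    for _ in range(n - 1):
        Kn = matmul(Kn, K)
    return weighted_norm(Kn, w) ** (1.0 / n)

norm_flat = weighted_norm(P, flat)
norm_v = weighted_norm(P, v)
xi_flat = spectral_radius_estimate(P, flat)
xi_v = spectral_radius_estimate(P, v)
```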
Recall that a Markov process $\Phi$ is called geometrically ergodic [32] or equivalently $V$-uniformly ergodic [34] if it is positive recurrent, and the semigroup converges in the induced operator norm, $|||P^t - \mathbf{1} \otimes \pi|||_V \to 0$, $t \to \infty$, where $\mathbf{1}$ denotes the constant function $\mathbf{1}(x) \equiv 1$. It is known that this is characterized by condition (V4). Under this assumption, in [32] we proved that $\Phi$ satisfies a "local" large deviations principle. In this paper under the stronger condition (DV3+) we show that these local results can be extended to a full large deviations principle.
The following result, taken from [32,Proposition 4.6], says that geometric ergodicity is equivalent to the existence of a spectral gap: (a) If Φ is geometrically ergodic with Lyapunov function V , then its transition kernel P admits a spectral gap in L V ∞ and it is V -uniform.
Next we want to investigate the corresponding relationship between condition (DV3) and when the kernel P has a discrete spectrum in L v ∞ . First we establish an analogous 'near equivalence' between assumption (DV3) and the notion of v-separability, and in Theorem 3.5 we show that v-separability implies the discrete spectrum property.
$[\psi]$, we say that the linear operator $\widehat P : L^v_\infty \to L^v_\infty$ is $v$-separable if it can be approximated uniformly by kernels with finite rank. That is, for each $\epsilon > 0$, there exists a finite-rank operator $\widehat K_\epsilon$ such that $|||\widehat P - \widehat K_\epsilon|||_v \le \epsilon$. Since the kernel $\widehat K_\epsilon$ has a finite-dimensional range space, we are assured of the existence of an integer $n \ge 1$, functions Note that the eigenvalues of $\widehat K_\epsilon$ may be interpreted as a pseudo-spectrum; see [8].
The following equivalence, established in the Appendix, illustrates the intimate relationship between the essential ingredients of the Donsker-Varadhan conditions, and the associated spectral theory as developed in this paper. Note that in Theorem 2.4 the density assumption from part (ii) of (DV3+) has been replaced by the more natural and weaker statement that $I_{C_W(r)} P^{T_0}$ is $v$-separable for all $r$. The fact that this is indeed weaker than the assumption in (DV3+) (ii) follows from Lemma B.3 in the Appendix. Applications of Theorem 2.4 to diffusions on $\mathbb{R}^n$ and refinements in this special case are developed in [26].
We say that a linear operator P : L v ∞ → L v ∞ has a discrete spectrum in L v ∞ if its spectrum S has the property that S ∩ K is finite, and contains only poles of finite multiplicity, for any compact set K ⊂ C \ {0}. It is shown in Theorem 3.5 that the spectrum of P is discrete under the conditions of (b) above.
Taking a different operator-theoretic approach, Deuschel and Stroock [13] prove large deviations results for the empirical measures of stationary Markov chains under the condition of hypercontractivity (or hypermixing). In particular, their conditions imply that for some T 0 , the kernel P T 0 (x, dy) is a bounded linear operator from L 2 (π) to L 4 (π), with norm equal to 1.

Multiplicative Regularity
Recall the definition of the empirical measures in (14), and the hitting times {τ A } defined in (3). The next set of results characterize the drift criterion (DV3) in terms of the following regularity assumptions: The Markov process Φ is called geometrically regular if there exists a geometrically regular set C, and η > 0 such that (ii) A set C ∈ B is called H-multiplicatively regular (H-m.-regular) if for any A ∈ B + , there exists η = η(A) > 0 satisfying, The Markov process Φ is H-m.-regular if there exists an H-m.-regular set C ∈ B, and η > 0 such that In [34, Theorem 15.0.1] a precise equivalence is given between geometric regularity and the existence of a solution to the drift inequality (V4). The following analogous result shows that (DV3) characterizes multiplicative regularity. A proof of Theorem 2.5 is included in the Appendix.
(ii) The drift inequality (DV3) holds for some V : X → (0, ∞) and with H ∈ L W ∞ . If either of these equivalent conditions hold, then for any A ∈ B + , there exists ǫ > 0, 1 ≥ η > 0, and B < ∞ satisfying, where V is the solution to (DV3) in (ii).
In a similar vein, in [44] the following condition is imposed for a diffusion on X = R n : For any n ≥ 1 there exists K n ⊂ X compact, such that for any In [44,42] it is shown that this condition is closely related to the existence of a solution to (DV3), where the function W is further assumed to have compact sublevel sets. Under these assumptions, and under continuity assumptions similar to those imposed in [43], it is possible to show that the operator P n is compact for all n > 0 [42, Theorem 2.1], or [11,Lemma 3.4]. We show in Proposition 2.6 that the bound assumed in [44] always holds under (DV3+). We say that G : X → R + is coercive if the sublevel set {x : G(x) ≤ n} is precompact for each n ≥ 1. Coercive functions exist only when X is σ-compact. Proposition 2.6 Let Φ be a ψ-irreducible and aperiodic Markov chain on X. Assume moreover that X = R n ; that condition (DV3+) holds with V : X → [1, ∞) continuous; W unbounded; and the kernels {I C W (r) P T 0 : r ≥ 1} are v-separable for some T 0 ≥ 1. Then, there exists a sequence of compact sets {K n : n ≥ 1} satisfying (27).
Proof. Lemma B.2 combined with Proposition C.7 implies that we may construct functions (V 1 , W 1 ) from X to [1, ∞), and a constant b 1 satisfying the following: sup{V (x) : Lemma C.8 combined with continuity of V then implies that (27) also holds, with K r = closure of C W 1 (n r ) for some sequence of positive integers {n r }. Proposition 2.6 has a partial converse: Proposition 2.7 Suppose the chain Φ is ψ-irreducible and aperiodic. Suppose moreover that X = R n ; that the support of ψ has non-empty interior; that P has the Feller property; and that there exists a sequence of compact sets {K n : n ≥ 1} satisfying (27). Then Condition (DV3) holds with V, W : X → [1, ∞) continuous and coercive.
Proof. Proposition A.2 asserts that there exists a solution to the inequality H(V ) ≤ − 1 2 W + bI C with (V, W ) continuous and coercive, C compact, and b < ∞. Under the assumptions of the proposition, compact sets are small (combine Proposition 6.2.8 with Theorem 5.5.7 of [34]). We may conclude that C is small, and hence that (DV3) holds.

Perron-Frobenius Theory
As in [32] we find strong connections between the theory developed in this paper, and the Perron-Frobenius theory of positive semigroups, as developed in [39].
Suppose that $\{\widehat P^n : n \in \mathbb{Z}_+\}$ is a semigroup of positive operators. We assume that $\{\widehat P^n\}$ has finite spectral radius $\hat\xi$ in $L^v_\infty$. Then, the resolvent kernel defined by $\widehat R_\lambda := [I\lambda - \widehat P]^{-1}$ is a bounded linear operator on $L^v_\infty$ for each $\lambda > \hat\xi$. We assume moreover that the semigroup is $\psi$-irreducible, that is, whenever $A \in \mathcal{B}$ satisfies $\psi(A) > 0$, then $\sum_{k=0}^{\infty} \widehat P^k(x, A) > 0$, for all $x \in X$. If $\Phi$ is a $\psi$-irreducible Markov chain, then for any measurable function $F : X \to \mathbb{R}$, the kernel $\widehat P = P_f$ generates a $\psi$-irreducible semigroup.
In general, under $\psi$-irreducibility of the semigroup, one may find many solutions to the minorization condition, $\widehat R_\lambda(x, A) \ge s(x)\nu(A)$, $x \in X$, $A \in \mathcal{B}$, with $\lambda > 0$, $s \in \mathcal{B}^+$, and $\nu \in \mathcal{M}^+$, that is, $s : X \to \mathbb{R}_+$ is measurable with $\psi(s) > 0$, and $\nu$ is a positive measure on $(X, \mathcal{B})$ satisfying $\nu(X) > 0$. The pair $(s, \nu)$ is then called small, just as in the probabilistic setting. Theorem 3.2 of [39] states that there exists a constant $\hat\lambda \in (0, \infty]$, the generalized principal eigenvalue, or g.p.e., such that, for any small function $s \in \mathcal{B}^+$, The semigroup is said to be $\hat\lambda$-transient if for one, and then all small pairs $(s, \nu)$, satisfying $s \in \mathcal{B}^+$, $\nu \in \mathcal{M}^+$, we have $\sum_{k=0}^{\infty} \hat\lambda^{-k-1}\, \nu \widehat P^k s < \infty$; otherwise it is called $\hat\lambda$-recurrent. Proposition 2.8 shows that the generalized principal eigenvalue coincides with the spectral radius when considering positive semigroups that admit a spectral gap. Related results may be found in Theorem 4.4 and Proposition 4.5 of [32].
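For a positive matrix the resolvent can be formed from the series $\sum_{k \ge 0} \lambda^{-k-1}\widehat P^k$, which converges exactly when $\lambda$ exceeds the spectral radius. The sketch below is an illustration under assumptions: it reuses a hypothetical two-state scaled kernel (the matrix `P`, functional `F`, and helper names are not from the paper), takes $\lambda$ to be 1.5 times the spectral radius, and checks that the truncated series inverts $\lambda I - \widehat P$. All entries of the result are strictly positive, so for instance a constant function paired with a multiple of counting measure gives one solution of the minorization condition in this toy setting.

```python
import math

P = [[0.9, 0.1],
     [0.4, 0.6]]                          # hypothetical two-state chain
F = [0.5, -0.3]                           # hypothetical functional
K = [[math.exp(fx) * p for p in row] for fx, row in zip(F, P)]   # scaled kernel P_f

def matmul(A, B):
    """Matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def spectral_radius_2x2(M):
    """Largest eigenvalue of a 2x2 matrix with positive entries (closed form)."""
    (p, q), (r, s) = M
    return 0.5 * ((p + s) + math.sqrt((p - s) ** 2 + 4.0 * q * r))

def resolvent_series(M, lam, terms=400):
    """Partial sum of sum_{k>=0} lam^{-k-1} M^k."""
    n = len(M)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # M^0
    total = [[0.0] * n for _ in range(n)]
    scale = 1.0 / lam
    for _ in range(terms):
        total = [[t + scale * pw for t, pw in zip(trow, prow)]
                 for trow, prow in zip(total, power)]
        power = matmul(power, M)
        scale /= lam
    return total

xi = spectral_radius_2x2(K)
lam = 1.5 * xi
R = resolvent_series(K, lam)

# Check (lam * I - K) R = I entrywise.
KR = matmul(K, R)
residual = max(abs(lam * R[i][j] - KR[i][j] - (1.0 if i == j else 0.0))
               for i in range(2) for j in range(2))
min_entry = min(min(row) for row in R)
```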
Proposition 2.8 Suppose that { P n : n ∈ Z + } is a ψ-irreducible, positive semigroup. Suppose moreover that the semigroup admits a spectral gap in L v ∞ , with finite spectral radiusξ. Then: (iv) For any λ >ξ, and any (s, ν) that solve (28) Proof. Suppose that either (i) or (ii) is false. In either case, for all small pairs (s, ν), It then follows that the projection operator Q defined in (25) satisfies ν Qs = 0 for all small s ∈ L v ∞ , ν ∈ M v 1 . This is only possible if Q = 0, which is impossible under our assumption that the semigroup admits a spectral gap.
To complete the proof, observe that the semigroup generated by the kernel R λ also admits a spectral gap, with spectral radiusγ = (λ −ξ) −1 . It follows that there is a closed ball D ⊂ C containingγ such that the two kernels below are bounded linear operators on From (i) and (ii) we know that R λ isγ-recurrent, which implies that νYγs = 1, and that P h =ξh (see [39,Theorem 5.1]). Moreover, again from (i), (ii), since νYγs < ∞ it follows that the spectral radius of ( R λ − s ⊗ ν) is strictly less thanγ, which implies (iii). Finally, since |||Yγ||| v < ∞ we may conclude that h ∈ L v ∞ , and this establishes (iv). On specializing to the kernels {P f : F ∈ L W 0 ∞ } we obtain the following corollary. Define for any measurable function F : X → (−∞, ∞]: (i) Λ(F ) = log(λ(F )) = the logarithm of the g.p.e. for P f .
Lemma 2.9 Consider a ψ-irreducible Markov chain, and a measurable function G : Proof. We have |||P n g ||| v < ∞ for some n ≥ 1 when Ξ(G) < ∞. Consequently, since G and V are assumed positive, we have g( Proposition 2.10 Under (DV3+) the functional Ξ is finite-valued and convex on L W 0 ∞ , and may be identified as the logarithm of the generalized principal eigenvalue: Proof. Theorem 2.4 implies that P f is v-separable, and Proposition 2.8 then gives the desired equivalence. Convexity is established in Lemma C.1.
The spectral radius of the twisted kernel given in (21) also has a simple representation, when the function h is chosen as a solution to the multiplicative Poisson equation: ∞ , the twisted kernelPf satisfies (DV3+) with Lyapunov functionV := V −F + c for c ≥ 0 sufficiently large. Consequently, the semigroup generated by the twisted kernel has a discrete spectrum in Lv ∞ , and its log-spectral radius has the representation,Ξ Proof. The kernels P f andPf are related by a scaling and a similarity transformation, It follows that (DV3+) (i) is satisfied with the Lyapunov functionV , and we haveV ≥ 1 for sufficiently large c sincef ∈ L v ∞ . The representation ofΞ also follows from the above relationship betweenPf and P f .
Since the set C W (r) is small for the semigroup {P t f : t ≥ 0}, there exists ǫ > 0, T 1 < ∞, and a probability distribution ν such that Consequently,f −1 is bounded on C W (r).
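The relationship between the scaled kernel and the twisted kernel can be checked directly on a finite state space. The sketch below assumes that the twisted kernel is obtained from P f by dividing by the eigenvalue and conjugating with the eigenfunction, which is our reading of the "scaling and similarity transformation" in the proof above; the matrix `P`, the function `F` and the helper `perron` are hypothetical.

```python
import numpy as np

def perron(A):
    """Perron eigenvalue and positive eigenvector (summing to one) of a positive matrix."""
    w, V = np.linalg.eig(A)
    k = np.argmax(w.real)
    v = V[:, k].real
    return w[k].real, v / v.sum()

# Hypothetical three-state chain and functional (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])

P_f = np.exp(F)[:, None] * P          # scaled kernel f(x) P(x, y)
lam, f_check = perron(P_f)            # multiplicative Poisson equation: P_f f_check = lam * f_check

# Twisted kernel: scale by 1/lam and conjugate with the eigenfunction.
P_twisted = P_f * f_check[None, :] / (lam * f_check[:, None])
row_sums = P_twisted.sum(axis=1)
```

Each row of `P_twisted` sums to one, so it is a transition matrix, and its spectrum is that of `P_f` divided by `lam`.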

Doeblin and Uniform Conditions
The uniform upper bound in condition (DV3+) (ii) is easily verified in many models. Consider first the special case of a discrete time chain Φ with a countable state space X, and with W such that C W (r) is finite for all r < W ∞ . In this case we may take T 0 = 1 in (DV3+) (ii), and set This is the starting point for the bounds obtained in [2]. A common assumption for general state space models is the following: See [13,12], as well as [43,27,29]. It is obvious that (31) implies the validity of the upper bound in our assumption (DV3+) (ii). Somewhat surprisingly, Condition (U) also implies a corresponding lower bound, and moreover we may take the bounding measure equal to the invariant measure π: Proposition 2.12 Suppose that Φ is an aperiodic, ψ-irreducible chain. Then, condition (U) holds if and only if there is a probability measure π on (X, B), a constant N 0 ≥ 1, and a sequence of non-negative numbers {δ n : n ≥ N 0 }, satisfying, Proof. It is enough to show that condition (U) implies the sequence of bounds given in (32). Condition (U) implies the following minorization, Since the chain is assumed aperiodic and ψ-irreducible, it follows that the chain is uniformly ergodic, a property somewhat stronger than Doeblin's condition [34,Theorem 16.2.2]. Consequently, there exists an invariant probability measure π, and constants B 0 < ∞, b 0 > 0 such that, Condition (U) then gives the following upper bound: On multiplying (31) by π(dy), and integrating over y ∈ X, we obtain, Let Γ denote the bivariate measure given by, Γ(dx, dy) = π(dx)P T 1 (x, dy), for x, y ∈ X. The previous bound implies that Γ has a density p(x, y; T 1 ) with respect to π×π, where p( · , · ; T 1 ) is jointly measurable, and may be chosen so that it satisfies the strict upper bound, p(x, y; The probability measure Γ has common one-dimensional marginals (equal to π). Consequently, we must have p(x, y; T 1 )π(dx) = 1 a.e. y ∈ X [π]. 
For n ≥ 2T 1 we define the density p(x, y; n) via, p(x, y; n) := ∫ P^{n−T_1}(x, dz) p(z, y; T_1), x, y ∈ X.
We have the upper bound sup_{x,y} p(x, y; n) ≤ b_0 for all n ≥ T_1, since P^k is an L_∞-contraction for any k ≥ 0. Combining this bound with (33) gives the strict bound, This easily implies the result.
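Two steps of this argument are easy to verify numerically for a finite chain, where the density of P n (x, · ) with respect to π is simply P n (x, y)/π(y): the density integrates to one against π(dx), and its supremum cannot increase with n because P is an L ∞ -contraction. The matrix `P` below is a hypothetical example chosen for this sketch.

```python
import numpy as np

# Hypothetical three-state chain (an assumption of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# Invariant probability: left Perron eigenvector of P.
w, V = np.linalg.eig(P.T)
pi = V[:, np.argmax(w.real)].real
pi = pi / pi.sum()

def density(n):
    """p(x, y; n): density of P^n(x, .) with respect to pi."""
    return np.linalg.matrix_power(P, n) / pi[None, :]

marginal = pi @ density(3)                           # sum_x pi(x) p(x, y; 3), for each y
sup_bounds = [density(n).max() for n in range(1, 8)]  # sup over (x, y) for n = 1..7
```

Here `marginal` is identically one, and `sup_bounds` is non-increasing in n.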
Note that, for the special case of reflected Brownian motion on a compact domain, a similar result is established in [3].
We have already noted in the above proof that the lower bound in (32) implies the Doeblin condition, which is known to be equivalent to (V4) with V bounded for a ψ-irreducible chain [34,Theorem 16.2.2]. Consequently, condition (U) frequently holds for models on compact state spaces but it rarely holds for models on R n . We summarize this and related correspondences with drift criteria here.
where r > 1 is arbitrary, and ǫ > 0 is to be determined. The functions V and V 0 are equivalent when ǫ ≤ T 1 −1 r −T 1 +1 since then by Hölder's inequality, x ∈ X, and the right hand side is in under the assumptions of (ii). Moreover, we have V ≥ ǫV 0 by considering only the first term in the definition of V . Hence V ∈ L V 0 ∞ and V 0 ∈ L V ∞ , which shows that V and V 0 are equivalent. We assume henceforth that this bound holds on ǫ.
Hölder's inequality also gives the bound, This implies the result since the state space is small.

Donsker-Varadhan Theory
In Donsker and Varadhan's classic papers [14,15,16] there are two distinct sets of assumptions that are imposed for ensuring the existence of a large deviations principle, roughly corresponding to parts (i) and (ii) of our condition (DV3+).
Lyapunov criteria. The Lyapunov function criterion of [16,43] is essentially equivalent to (DV3), with the additional constraint that the function W has compact sublevel sets; see conditions (1)-(5) on [43, p. 34]. In the general case (when X is not compact) this implies that (DV3) holds with an unbounded W . It is worth noting that the nonlinear generator is implicitly already present in the Donsker-Varadhan work, visible both in the form of the rate function, and in the assumptions imposed in [15,16,43].
Continuity and density assumptions. In [43] two additional conditions are imposed on Φ. It is assumed that the chain satisfies a strong version of the Feller property, and that for each x, P (x, dy) has a continuous density p x (y) with respect to some reference measure α(dy) which is independent of x.
These rather strong assumptions are easily seen to imply condition (DV3+) (ii) when W is coercive, so that the sets C W (r) are pre-compact.

Multiplicative Mean Ergodic Theorems
The main results of this section are summarized in the following two theorems. In particular, the multiplicative mean ergodic theorem given in (35) will play a central role in the proofs of the large deviations limit theorems in Section 5. For all these results we will assume that Φ satisfies (DV3) with an unbounded function W . As above, we let B + denote the set of functions h : X → [0, ∞] with ψ(h) > 0; for A ∈ B we write A ∈ B + if ψ(A) > 0; and let M + denote the set of positive measures on B satisfying µ(X) > 0.
As in (6) in the Introduction, we choose an arbitrary measurable function W 0 : X → [1, ∞) in L W ∞ , whose growth at infinity is strictly slower than W . This may be expressed in terms of the weighted L ∞ norm via, lim where {C W (r)} are the sublevel sets of W defined in (5). The function W 0 is fixed throughout this section. Given F ∈ L W 0 ∞ and an arbitrary α ∈ C, we recall from [32] the notation P α := e αF P , and where v := e V and V is the Lyapunov function in (DV3+). Next, we collect the main results of this section in the following theorem. Recall the definition of the empirical measures {L n } from (14). (i) There is a maximal, isolated eigenvalue λ(αF ) ∈ S α satisfying |λ(αF )| = ξ(αF ). Furthermore, Λ(αF ) := log(λ(αF )) is analytic as a function of α ∈ Ω, and for real α it coincides with the log-generalized principal eigenvalue of Section 2.4.
(ii) Corresponding to each eigenvalue λ(αF ), there is an eigenfunctionf α ∈ L v ∞ and an eigenmeasureμ α ∈ M v 1 , where v := e V , normalized so thatμ α (f α ) =μ α (X) = 1. The functionf α solves the multiplicative Poisson equation, and the measureμ α is a corresponding eigenmeasure: Proof. Lemma B.3 in the Appendix shows that (P f 0 ) 2T 0 +2 is v η -separable for any F 0 ∈ L W 0 ∞ , and Theorem 3.5 then implies that the spectrum of P f 0 is discrete. It follows that solutions to the eigenvalue problem for Theorem 3.4 establishes the limit (iii) for α ∈ C in a neighborhood of the origin. Consider then the twisted kernelP =Pf a , where a is real. Proposition 2.11 states that this satisfies (DV3+) with Lyapunov functionV := V /f a . An application of Theorem 3.4 to this kernel then implies a uniform bound of the form (iii) for α in a neighborhood of a. For any given a > 0 we may appeal to compactness of the line-segment {a ∈ R : |a| ≤ a} to construct ω > 0 such that (35) holds for α ∈ Ω.
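The multiplicative mean ergodic limit can be observed directly on a finite state space. The sketch below assumes the convention S n = F(Φ(0)) + ... + F(Φ(n − 1)), so that E x [exp(S n )] = (P f ) n 1 (x), and uses the normalization µ̌(f̌) = µ̌(X) = 1 from part (ii); the matrix `P` and the function `F` are hypothetical.

```python
import numpy as np

# Hypothetical three-state chain and functional (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])
P_f = np.exp(F)[:, None] * P

# Maximal eigenvalue with right eigenfunction and left eigenmeasure.
w, R = np.linalg.eig(P_f)
k = np.argmax(w.real)
lam, f_check = w[k].real, R[:, k].real
wl, L = np.linalg.eig(P_f.T)
mu_check = L[:, np.argmax(wl.real)].real
mu_check = mu_check / mu_check.sum()          # mu_check(X) = 1
f_check = f_check / (mu_check @ f_check)      # mu_check(f_check) = 1

# Normalized mean lam^{-n} E_x[exp(S_n)] for a moderately large n.
n = 60
normalized_mean = np.linalg.matrix_power(P_f, n) @ np.ones(3) / lam**n
```

For this example `normalized_mean` agrees with `f_check` to many digits, reflecting the exponentially fast convergence to the maximal eigenfunction.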
We note that this result has many immediate extensions. In particular, if condition (DV3+) is satisfied, then this condition also holds with (V, W ) replaced by (1 − η + ηV, W ) for any 0 < η < 1. Consequently, f̌ ∈ L vη ∞ for any 0 < η ≤ 1 when F ∈ L W 0 ∞ . Part (iii) of the theorem is at the heart of the proof of all the large deviations properties we establish in Section 5. For example, from (35) we easily obtain that, for any F ∈ L W 0 ∞ , the log-moment generating functions of the partial sums converge uniformly and exponentially fast: We therefore think of Λ(αF ) as the limiting log-moment generating function of the partial sums {S n } corresponding to the function F , and much of our effort in the following two sections will be devoted to examining the regularity properties of Λ and its convex dual Λ * .
Following [32], next we give a weaker multiplicative mean ergodic theorem for α in a neighborhood of the imaginary axis. Recall the following terminology: The asymptotic variance σ 2 (F ) of a function F : X → R is defined to be the variance obtained in the corresponding Central Limit Theorem for the partial sums of F (Φ(n)), assuming it exists. For a V -uniformly ergodic (or, equivalently, a geometrically ergodic) chain, the asymptotic variance is finite for any function F satisfying F 2 ∈ L V ∞ , and [34, Theorem 17.0.1] gives the representation, The minimal h for which this holds is called the span of F . If the function F can be written as a sum, F = F 0 + F ℓ , where F ℓ is lattice with span h and F 0 has zero asymptotic variance, then F is called almost-lattice (and h is its span). Otherwise, F is called strongly non-lattice. The lattice condition is discussed in more detail in [32]. The proof of the following result follows from Theorem 3.1 and the arguments used in the proof of [32, Theorem 4.2].
Theorem 3.2 (Bounds Around the iω-Axis) Assume that the Markov chain Φ satisfies condition (DV3+) with an unbounded W , and that F ∈ L W 0 ∞ is real-valued.

Spectral Theory of v-Separable Operators
The following continuity result allows perturbation analysis to establish a spectral gap under (DV3). Recall that we set v η := e ηV ; for any real-valued F ∈ L W ∞ we define f := e F ; and we let P f denote the kernel P f (x, dy) := f (x)P (x, dy).

Lemma 3.3 Suppose that Φ is ψ-irreducible and aperiodic, and that condition
Proof. We have from the definition of the induced operator norm, Also, we have the elementary bounds, for all x ∈ X, Combining these bounds gives, The supremum is bounded under the assumptions of the proposition, which establishes the desired bound. We now show that, for any given h ∈ L vη ∞ , F ∈ L W 0 ∞ , the map G → I G−F P f h represents the Frechet derivative of P f h. We begin with the mean value theorem, where F θ = θF + (1 − θ)G for some θ : X → (0, 1). The bounds leading up to (39) then lead to the following bound, for all x ∈ X, It follows that there exists b 1 < ∞ such that which establishes Frechet differentiability.
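The derivative formula in Lemma 3.3 can be sanity-checked by finite differences on a finite state space, where (P f h)(x) = e F (x) (P h)(x). In the sketch below the matrix `P`, the functions `F`, `G`, `h` and the helper `scaled_apply` are hypothetical, and a central difference stands in for the limit defining the derivative.

```python
import numpy as np

# Hypothetical three-state chain and functions (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])
G = np.array([-0.4, 0.1, 0.2])
h = np.array([1.0, 2.0, 0.5])

def scaled_apply(F, h):
    """(P_f h)(x) = e^{F(x)} * sum_y P(x, y) h(y)."""
    return np.exp(F) * (P @ h)

direction = G - F
derivative = direction * scaled_apply(F, h)    # I_{G-F} P_f h

# Central difference of F -> P_f h in the direction G - F.
eps = 1e-5
finite_diff = (scaled_apply(F + eps * direction, h)
               - scaled_apply(F - eps * direction, h)) / (2 * eps)
```

The vectors `finite_diff` and `derivative` agree up to the discretization error of the central difference.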
Next we present a local result, in the sense that it holds for all F with sufficiently small L W ∞ -norm, where the precise bound on ‖F‖ W is not explicit. Although a value can be computed as in [32], it does not take a convenient form. Note that Theorem 3.4 does not require the density condition used in (DV3+).
The definition of the empirical measures {L n } is given in (14).
(ii) There exist positive constants B 0 and b 0 such that, for all g ∈ L vη ∞ , x ∈ X, n ≥ 1, we have withf ,μ, λ(F ) given as in (i).
(iii) If V is bounded on the set C used in (DV3) then we may take η 0 = 1.
Proof. Assumption (DV3) combined with Theorem 2.2 implies that P is v η -uniform for all η > 0 sufficiently small (when V is bounded on C then (DV3) implies v-uniformity, so we may take η = 1).
It follows that the inverse [I − P + 1 ⊗ π] −1 exists as a bounded linear operator on L vη ∞ [34, Theorem 16.0.1]. An application of Lemma 3.3 implies that the kernels P f converge to P in norm We have the explicit representation, writing ∆ : The first term on the right hand side exists as a power series in H −1 ∆, provided Moreover, in this case we obtain the bound, For any F ∈ L W ∞ we have the upper bound, |F | ≤ [|||F||| W δ −1 ]δW , where δ > 0 is given in (DV3). Recalling the definition of the log-generalized principal eigenvalue functional Λ from Section 2.4, and assuming that θ := |||F||| W δ −1 < 1, we may apply the convexity of Λ (see Lemma C.1) to obtain the upper bound, where b is given in (DV3). From (44) we conclude that there is a constant ǫ 0 > 0 such that ǫ 0 < 1 2 ǫ 1 , and (42) together with the bound |λ(F ) − 1| < 1 2 ǫ 1 hold whenever |||F||| W < ǫ 0 . For such F , it follows that (43) holds, and hence P f is v η -uniform. SettingȞ := [Iλ(F ) − P f + 1 ⊗ π] we may express the eigenfunction and eigenmeasure explicitly as: The remaining results follow as in [32,Theorem 4.1].
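The role of the operator Ȟ can be illustrated on a finite state space. The displayed formulas for the eigenfunction and eigenmeasure are not legible in this copy, so the sketch below assumes the representations f̌ = Ȟ −1 1 and µ̌ = π Ȟ −1 , which are consistent with the definition of Ȟ under the normalizations π(f̌) = 1 and µ̌(X) = 1; the matrix `P` and the function `F` are hypothetical.

```python
import numpy as np

# Hypothetical three-state chain and functional (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])
n = 3

# Invariant probability pi of P and maximal eigenvalue of P_f.
w, V = np.linalg.eig(P.T)
pi = V[:, np.argmax(w.real)].real
pi = pi / pi.sum()
P_f = np.exp(F)[:, None] * P
lam = float(np.max(np.abs(np.linalg.eigvals(P_f))))

# H_check = I * lam - P_f + 1 (x) pi, with (1 (x) pi)(x, y) = pi(y).
H_check = lam * np.eye(n) - P_f + np.outer(np.ones(n), pi)
f_check = np.linalg.solve(H_check, np.ones(n))    # assumed form: H_check^{-1} 1
mu_check = np.linalg.solve(H_check.T, pi)         # assumed form: pi H_check^{-1}
```

Under these assumptions `f_check` and `mu_check` solve the right and left eigenvalue problems for `P_f` with eigenvalue `lam`.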
In order to extend Theorem 3.4 to a non-local result we invoke the density condition in (DV3+) (ii). In fact, any such extension seems to require some sort of a density assumption.
Recall that, in the notation of Section 2.2 and Section 2.4, we say that the spectrum S in L v ∞ of a linear operator P : L v ∞ → L v ∞ is discrete, if for any compact set K ⊂ C \ {0}, S ∩ K is finite and contains only poles of finite multiplicity. We saw earlier that condition (DV3+) implies that P 2T 0 +2 is v-separable. Next we show in turn that any v-separable linear operator P has a discrete spectrum in L v ∞ .
Proof. Assume first that T 0 = 1. For a given ǫ > 0, set P = K + ∆ with |||∆||| v < ǫ, and with K a finite-rank operator. Write K = n i=1 s i ⊗ ν i , and for each z ∈ C define the complex numbers {m ij (z)} via Let M (z) denote the corresponding n × n matrix, and set γ(z) = det(I − M (z)). The function γ is analytic on {|z| > |||∆||| v } because on this domain we have Moreover, this function satisfies γ(z) → 1 as |z| → ∞, from which we may conclude that the equation γ(z) = 0 has at most a finite number of solutions in any compact subset of {|z| > |||∆||| v }.
As argued in the proof of Theorem 3.4, if γ(z) ≠ 0, then we have, Conversely, this inverse does not exist when γ(z) = 0. Recalling that ε ≥ |||∆||| v , we conclude that S( P ) ∩ {z : |z| > ε} = {z : γ(z) = 0}. The right hand side denotes a finite set, and ε > 0 is arbitrary. It follows that the spectrum of P is discrete. If T 0 > 1 then from the foregoing we may conclude that the spectrum of P T 0 is discrete. The conclusion then follows from the identity
For each n ≥ 1, we define the nonlinear operators Λ n and G n on the space of real-valued functions F ∈ L W 0 ∞ , via, Λ n (F ) := (1/n) log E x [exp(n⟨L n , F⟩)]. The following result implies that both sequences of operators {G n } and {Λ n } are convergent.
Smoothness properties of the limiting nonlinear operators are established in Propositions 4.3 and 4.5.
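The determinant criterion used above to prove discreteness of the spectrum has a simple finite-dimensional analogue. The sketch below assumes m ij (z) = ν i (zI − ∆) −1 s j (the displayed definition is not legible in this copy, and this form is the one given by the matrix determinant lemma), and uses the Euclidean operator norm in place of the v-norm; all matrices are hypothetical.

```python
import numpy as np

n = 4
# Hypothetical finite-rank part K = s_1 (x) nu_1 + s_2 (x) nu_2.
S = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])   # columns s_1, s_2
N = np.array([[0.4, 0.2], [0.3, -0.1], [0.2, 0.1], [0.1, -0.2]])   # columns nu_1, nu_2
K = S @ N.T
Delta = 0.05 * np.eye(n, k=1)          # small perturbation (nilpotent shift)
A = K + Delta
norm_delta = np.linalg.norm(Delta, 2)  # operator 2-norm, equal to 0.05 here

def gamma(z):
    """gamma(z) = det(I - M(z)) with M(z) = N^T (zI - Delta)^{-1} S (assumed form)."""
    M = N.T @ np.linalg.solve(z * np.eye(n) - Delta, S)
    return np.linalg.det(np.eye(2) - M)

# Spectrum of A outside the disc of radius ||Delta||, and gamma evaluated there.
outer = [z for z in np.linalg.eigvals(A) if abs(z) > norm_delta]
residuals = [abs(gamma(z)) for z in outer]
```

Every eigenvalue of `A` outside the disc of radius `norm_delta` is a zero of `gamma` up to rounding, while `gamma` is nonzero at points of the same region that are not eigenvalues, such as z = 2.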
Proposition 3.6 Suppose that (DV3+) holds with an unbounded function W . Then there exists a nonlinear operator G : Proof. Note that the second bound follows from the first. So, let δ 0 > 0 and F 0 ∈ L W 0 ∞ be given, and consider an arbitrary F ∈ L W 0 ∞ satisfying F − F 0 W 0 ≤ δ 0 . We defineF n := G n (F ) for n ≥ 0, andF = G(F ) := log(f ), withf given in Theorem 3.1. We show below that for any η > 0, there exists b(η) < ∞ such that for all such F , Taking this for granted for the moment, observe that we then have, for any r ≥ 1, n ≥ 1, Moreover, Theorem 3.1 implies that for any r ≥ 1, provided we have the uniform bound (45). Putting these two conclusions together, and letting r → ∞ then gives, lim sup This then proves the desired uniform convergence, since η > 0 is arbitrary.
We now prove the uniform bound (45). We begin with consideration of the functions {F̌ : ‖F − F 0 ‖ W 0 ≤ δ 0 }, since the corresponding bounds on {F̌ n } then follow relatively easily.
Let τ = min{k ≥ 1 : |F (Φ(k))| ≤ r}, with r ≥ 1 chosen so that {x : |F (x)| ≤ r} ∈ B + . The stochastic process below is a positive local martingale, The local martingale property combined with Fatou's Lemma then gives the bound, and then by Jensen's inequality and the definition of τ , The right hand side is bounded below by −k 0 (V + 1) for some finite k 0 by (V3) and [34, Theorem 14.0.1]. However, this bound can be improved. Since F ∈ L W 0 ∞ , and since W ∈ L V ∞ with (W 0 , W ) satisfying (6), we can find, for any η 0 > 0, a constant b 0 (η 0 ) and a small set Small sets are special (see [39]), which implies that Moreover, it follows from [34, Theorem 14.0.1] that for some b 0 < ∞, Combining the bounds (46-49) establishes (45) for F̌ . From (35) in Theorem 3.1 we have, for any η > 0, constants From the foregoing we see that the right hand side is bounded by 2ηV + b(2η) for some b(2η) < ∞ and all n.
To complete the proof, we show that a corresponding lower bound holds: By definition of f n and an application of Jensen's inequality we have for all n ≥ 0, where the expectation is with respect to the process with transition kernelPf . On taking logarithms, and appealing to the mean ergodic limit for the twisted process, for constants This together with the bounds obtained onF shows that (45) does hold.
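The convergence of Λ n to Λ is easy to observe on a finite state space. The sketch below assumes that n⟨L n , F⟩ equals the partial sum F(Φ(0)) + ... + F(Φ(n − 1)), so that Λ n (F)(x) = (1/n) log (P f ) n 1 (x); the matrix `P`, the function `F` and the helper `Lambda_n` are hypothetical.

```python
import numpy as np

# Hypothetical three-state chain and functional (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])
P_f = np.exp(F)[:, None] * P

# Limit Lambda(F): log of the maximal eigenvalue of P_f.
Lam = float(np.log(np.max(np.abs(np.linalg.eigvals(P_f)))))

def Lambda_n(n):
    """Lambda_n(F)(x) = (1/n) log E_x[exp(S_n)], computed as (1/n) log (P_f^n 1)(x)."""
    return np.log(np.linalg.matrix_power(P_f, n) @ np.ones(3)) / n

# Uniform (over initial states) error for increasing n.
errors = {n: float(np.max(np.abs(Lambda_n(n) - Lam))) for n in (10, 50, 250)}
```

The entries of `errors` decrease roughly like 1/n, which is the rate one expects when the normalized mean converges to a non-constant eigenfunction.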

Entropy, Duality and Convexity
In this section we consider structural properties of the operators G, H and the functional Λ. As above, we assume throughout that Φ satisfies (DV3+) with an unbounded function W , and we choose and fix an arbitrary function W 0 ∈ L W ∞ as in (34). Also, throughout this section we restrict attention to real-valued functions in L W 0 ∞ and real-valued measures in M W 0 1 since one of our goals is to establish convexity and present Taylor series expansions of G, H, and Λ acting on L W 0 ∞ . Recall from Proposition 2.8 that the log-generalized principal eigenvalue Λ coincides with the log-spectral radius Ξ on this domain.
The convex dual of the functional Λ : A probability measure µ ∈ M W 0 1 and a function F ∈ L W 0 ∞ form a dual pair if the above supremum is attained, so that Λ(F ) + Λ * (µ) = µ, F .
The main result of this section is a proof that Λ * can be expressed in terms of relative entropy (recall (17)) provided that we extend the definition to include bivariate measures on (X × X, B × B). Throughout this section we let M denote a generic function on X × X, and Γ a generic measure on (X × X, B × B). The definitions of L W ∞ and M W 1 are extended as follows: The following proposition shows that consideration of the bivariate chain Ψ, For any univariate measure µ and transition kernelP , we write µ ⊙P for the bivariate measure µ⊙P (dx, dy) := µ(dx)P (x, dy). In particular, Proposition 4.1 shows that if Φ satisfies (DV3+) with an unbounded W , then so does Ψ.
Proposition 4.1 The following implications hold for any Markov chain Φ, with corresponding bivariate chain Ψ: (ii) If C is a small set for Φ, then X × C is small for Ψ; (iii) If C ∈ B, µ, and T 0 ≥ 1 satisfy P T 0 (y, A) ≤ µ(A) for y ∈ C, A ∈ B, then on setting C 2 = X × C and µ 2 = µ ⊙ P we have, where P 2 denotes the transition kernel for Ψ; (iv) If ν ∈ M + is small for Φ then ν 2 := ν ⊙ P is small for Ψ; (v) Suppose that Φ satisfies the drift condition (DV3). Then Ψ also satisfies the following version of (DV3), where H 2 is the nonlinear generator for Ψ, C 2 = X × C, and Proof. To prove (i) consider any set A 2 ∈ B × B with ψ 2 (A 2 ) > 0. Define Then we have ψ(g) > 0, and hence by ψ-irreducibility of Φ, ∞ k=0 P k g (x) > 0, for all x ∈ X. It follows immediately that ∞ k=0 P k 2 I A 2 (x, y) > 0, for all x, y ∈ X, from which we deduce that Ψ is ψ 2 -irreducible. This proves (i), and (ii)-(iv) are similar.
Proof. Any probability measure Γ on (X × X, B × B) can be decomposed as Γ(dx, dy) = π̌(dx)P̌(x, dy), where π̌ is the first marginal for Γ. We show in Lemma 4.11 that the marginals of Γ must agree when Λ * (Γ) < ∞, and this establishes (i). Finiteness of Λ * (Γ) also implies that Γ is absolutely continuous with respect to π ⊙ P . This follows from Proposition 4.6 (iv) below, applied to the bivariate chain Ψ. Consequently, the transition kernel can be expressed, P̌(x, dy) = m(x, y)P (x, dy), for x, y ∈ X, for some measurable function m : With M = log m, Proposition C.10 gives the upper bound, We apply Proposition C.4 to obtain a corresponding lower bound: There is a sequence {M k : as k → ∞. Moreover, we have Λ(M ) = 0 since P̌(x, dy) = m(x, y)P (x, dy) is the transition kernel for a positive recurrent Markov chain, and hence 1-recurrent [39]. Consequently, We thus obtain the identity Λ * (Γ) = ⟨Γ, M⟩, which is precisely (ii). Finally, part (iii) follows from Proposition 4.6 (iii) combined with Proposition 4.10.
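The identity Λ * (Γ) = ⟨Γ, M⟩ with M = log m can be illustrated for a finite chain by taking P̌ to be a twisted kernel. In the sketch below, the twisted kernel is assumed to have the form P f conjugated by its eigenfunction and divided by its eigenvalue; the matrix `P`, the function `F`, the trial functions and the helpers `perron` and `Lambda` are hypothetical. The quantity ⟨Γ, M⟩ is then the relative entropy of Γ = π̌ ⊙ P̌ with respect to π̌ ⊙ P.

```python
import numpy as np

def perron(A):
    """Perron eigenvalue and positive eigenvector (summing to one) of a positive matrix."""
    w, V = np.linalg.eig(A)
    k = np.argmax(w.real)
    v = V[:, k].real
    return w[k].real, v / v.sum()

def Lambda(G):
    """Log of the maximal eigenvalue of diag(e^G) P."""
    return float(np.log(perron(np.exp(G)[:, None] * P)[0]))

# Hypothetical three-state chain and functional (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])

P_f = np.exp(F)[:, None] * P
lam, f_check = perron(P_f)
P_tw = P_f * f_check[None, :] / (lam * f_check[:, None])   # twisted kernel (assumed form)
_, pi_tw = perron(P_tw.T)                                  # its invariant probability

Gamma = pi_tw[:, None] * P_tw                       # bivariate measure pi_tw (.) P_tw
rel_entropy = float(np.sum(Gamma * np.log(P_tw / P)))   # <Gamma, M> with M = log(P_tw / P)
dual_value = float(pi_tw @ F - np.log(lam))             # pi_tw(F) - Lambda(F)

# Trial functions for the supremum defining the univariate convex dual at pi_tw.
trials = [np.array([-0.4, 0.1, 0.2]), np.array([1.0, 0.0, -1.0]), np.zeros(3), 2 * F]
trial_values = [float(pi_tw @ G - Lambda(G)) for G in trials]
```

In this example `rel_entropy` coincides with `dual_value`, and none of the `trial_values` exceeds it, as expected when π̌ and F form a dual pair.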

Convexity and Taylor Expansions
We now return to consideration of the univariate chain Φ, and establish some regularity and smoothness properties for the (univariate) functional Λ and the nonlinear operators H and G.
We recall the definition of the twisted kernelP h from (21), and for any h : X → (0, ∞) we define the bilinear and quadratic forms, When h ≡ 1 we remove the subscript so that F, G := P (F G) − (P F )(P G), and Q(F ) := P (F 2 ) − (P F ) 2 . It is well-known that σ 2 (F ) := π(Q(ZF )) is equal to the asymptotic variance given in (37)   (i) Λ is strongly continuous: For each F 0 ∈ L W 0 ∞ there exists B < ∞, such that for all F ∈ L W 0 ∞ satisfying F W 0 < 1, is analytic as a function of a. Moreover, we have the second-order Taylor expansion, where g =f 0 := e G(F 0 ) , and π g is the invariant probability measure ofP g .
Proof. Part (i) follows from Proposition 2.10 combined with Lemma 3.3.
To establish (ii) we note that Λ n (F 0 + aF ) is an analytic function of a for each initial x, and F 0 , F ∈ L W 0 ∞ . Proposition 3.6 states that this converges to Λ(F 0 + aF ), which is convex and hence also continuous on R, and the convergence is uniform for a in compact subsets of R. This implies that the limit is an analytic function of a.
The second-order Taylor series expansion follows as in the proof of property P4 in the Appendix of [32].
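The second-order expansion of Λ at F 0 = 0 can be checked numerically on a finite state space: the first derivative of a → Λ(aF) at the origin should equal π(F), and the second derivative should equal the asymptotic variance π(Q(ZF)). In the sketch below the matrix `P`, the function `F` and the helper `Lambda` are hypothetical, the fundamental kernel is taken as Z = [I − P + 1 ⊗ π] −1 (Q is unchanged by adding constants, so the normalization of Z does not matter here), and central differences approximate the derivatives.

```python
import numpy as np

# Hypothetical three-state chain and functional (assumptions of this sketch).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
F = np.array([0.3, -0.2, 0.5])

def Lambda(G):
    """Log spectral radius of diag(e^G) P."""
    return float(np.log(np.max(np.abs(np.linalg.eigvals(np.exp(G)[:, None] * P)))))

# Invariant probability and fundamental kernel Z = [I - P + 1 (x) pi]^{-1}.
w, V = np.linalg.eig(P.T)
pi = V[:, np.argmax(w.real)].real
pi = pi / pi.sum()
Z = np.linalg.inv(np.eye(3) - P + np.outer(np.ones(3), pi))

# Asymptotic variance pi(Q(ZF)) with Q(G) = P(G^2) - (PG)^2.
F_hat = Z @ F
sigma2 = float(pi @ (P @ F_hat**2 - (P @ F_hat)**2))

# Central differences for the first two derivatives of a -> Lambda(aF) at a = 0.
h = 1e-3
slope = (Lambda(h * F) - Lambda(-h * F)) / (2 * h)
curvature = (Lambda(h * F) - 2 * Lambda(0 * F) + Lambda(-h * F)) / h**2
```

Up to discretization error, `slope` matches π(F) and `curvature` matches `sigma2`.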
where inequalities between functions are interpreted pointwise.
(ii) H is smooth: We have the second-order Taylor expansions, for any F, F 0 ∈ L W 0 ∞ , where g =f 0 := e G(F 0 ) and A g is the generator ofP g .
Proof. We first show that H : which shows that H(F ) ∈ L V ∞ . Given these bounds, the smoothness result (ii) is a consequence of elementary calculus.
To establish convexity, we let H i = H(F i ) and f i = e F i , so that P f i = e H i f i , i = 1, 2. An application of Hölder's inequality gives the bound, We can also obtain a Taylor-series approximation for G, but it is convenient to consider a re-normalization to avoid additive constants. Define, We have the Taylor series expansion, where Zf 0 is the fundamental kernel forPf 0 , normalized so that πZf 0 F = 0, F ∈ L W 0 ∞ .
Proof. The strong continuity follows from strong continuity of P g given in Lemma 3.3. The Taylor-series expansion is established first with F 0 = 0. Given F ∈ L W 0 ∞ , a ∈ R, we let f a = exp(aF ), and let f̌ a be the solution to the eigenfunction equation given by Under assumption (DV3) alone we have seen in Theorem 3.4 that this is an eigenfunction in L v ∞ for small |a|. We also have F̌ a = log(f̌ a ) = G 0 (F a ) + k(a), with k(a) = π(F a ). In the analysis that follows, we focus on F̌ a rather than G 0 (F a ), since constant terms will be eliminated through our normalization.
We note that the first derivative may be written explicitly as, Observe that the derivative is in L v ∞ since both I F P fa and [I − P fa + 1 ⊗ π] −1 are bounded linear operators on L v ∞ . Similar conclusions hold for all higher-order derivatives. We define the twisted kernel as above, As in [32] we may verify that the function F a = d daF a is a solution to Poisson's equation, where π a is invariant forP a . Setting a = 0 gives the first term in the Taylor series expansion for G 0 .
To obtain an expression for the second term we differentiate Poisson's equation: We wish to compute the second derivative, F (2) a = d 2 da 2 log(f a ), which requires a formula for the derivative ofP a : For any G ∈ L V ∞ , d da Letting H a = F a , F a f a , the identities (58) and (59) then give, Letting Z a denote the fundamental kernel forP a we conclude that Evaluating all derivatives at the origin provides the quadratic approximation for G 0 , where Z is the fundamental kernel for P , normalized so that πZ = 0, and F = ZF .
To establish the Taylor-series expansion at arbitrary a 0 ∈ R we repeat the above arguments, applied to the Markov chain with transition kernelP a 0 . This satisfies (DV3+) withV = c + V −F a 0 for sufficiently large c > 0, by Proposition 2.11.

Representations of the Univariate Convex Dual
The following result provides bounds on the (univariate) convex dual functional Λ * , and gives some alternative representations: Proposition 4.6 Suppose that (DV3+) holds with an unbounded function W . Then, for any probability measure µ ∈ M W 0 1 : (iii) There exists ǫ 0 > 0, independent of µ ∈ M W 0 1 , such that (iv) If µ is not absolutely continuous with respect to π, then Λ * (µ) = ∞.
The proof is provided after the following bound.
Lemma 4.7 Suppose that (DV3+) holds with an unbounded function W . Then,F ∈ L ∞ provided the following conditions hold: F ∈ L V ∞ ; Λ(F ) = 0; and F = F I C V (r) for some r ≥ 1.
Proof. From the local martingale property we have, This then gives the bound, Proof of Proposition 4.6. For any F ∈ L W 0 ∞ , and any r ≥ 1 we write, F r = I C V (r) [F − γ r ], where γ r ∈ R is chosen so that Λ(F r ) = 0. Its existence follows from Proposition 4.3.
To prove (iv), write µ = pµ 0 + (1 − p)µ 1 where µ 0 , µ 1 are probability measures on (X, B) such that µ 1 ≺ π is absolutely continuous and µ 0 is singular with respect to π. Let S denote the support of µ 0 . We have Λ(F ) = 0 whenever F ∈ L ∞ is supported on S, and hence which is infinite, as claimed.
Proof (sketch). Let P̌ a = P̌ ǧ a and let π a denote the invariant distribution for P̌ a , given ‖G‖ W 0 ≤ 1 and a ∈ [0, 1]. We let Z a denote the fundamental kernel for P̌ a , normalized so that π a (Z a G) = π a (G), and we let G a = Z a G. Proposition 4.3 then gives the representation, The proof is completed on showing that where the supremum is over all a and G in this class. This follows from the arguments above; see in particular (45) and the surrounding arguments.
In the following proposition we give another characterization of dual pairs (µ, G) for Λ * . (iii) Suppose that µ ∈ M W 0 1 , and that there exists G ∈ L W 0 ∞ satisfying, Then µ is invariant under the twisted kernelP g .
Proof. The first result is simply Jensen's inequality: If equality holds, it then follows that e H is constant a.e. [π].

Characterization of the Bivariate Convex Dual
We now turn to the case of bivariate functions and measures. Given any function of two variables M : X × X → R, we let m = e M and extend the definition of the scaled kernel in (20) via, P m (x, dy) := m(x, y)P (x, dy) , x, y ∈ X .
The following result shows that the spectral radius of this kernel coincides with that defined for the bivariate chain Ψ. The proof is routine.
Proposition 4.10 Suppose that P m has finite spectral radius λ m in v η -norm for all sufficiently small η > 0. Let P 2 denote the transition kernel for the bivariate chain Ψ.
(i) I m P 2 has the same spectral radius in v η2 -norm for sufficiently small η > 0, with v η2 (x, y) = exp(η[V (y) + 1 2 δW (x)]). (ii) If P m has an eigenfunctionf , then I m P 2 also possesses an eigenfunction given by, For a Markov process with transition kernel P satisfying (DV3+), we say that M and M are similar if there exists H ∈ L V ∞ such that The function M is called degenerate if it is similar to M ≡ 0. The log-generalized principal eigenvalues agree (Λ(M ) = Λ( M )) whenever M, M are similar. This is the basis of the following two lemmas.
Proof. The conclusion that Γ ≺ π ⊙ P follows from Proposition 4.6 (iv). where Γ 1 and Γ 2 denote the two marginals. If Γ 1 ≠ Γ 2 it is clear that the right hand side cannot be bounded in H.

Lemma 4.12
Suppose that (DV3+) holds with an unbounded function W . Suppose moreover that M ∈ L V ∞,2 , and that the asymptotic variance of the partial sums n−1 k=0 M (Φ(k), Φ(k + 1)), n ≥ 1, is equal to zero. Then the function M is degenerate.
Proof. Applying [32,Proposition 2.4] to the bivariate chain Ψ with transition kernel P 2 , we can find M such that where π 2 = π ⊙ P is the invariant probability measure for P 2 . Since Φ(k + 1) is conditionally independent of Φ(k − 1) given Φ(k), it follows that M does not depend on its first variable. Thus we can find F ∈ L V ∞ satisfying whereπ is a marginal of Γ (see Lemma 4.11). Then, the function M 0 is similar to whereF = log(f ), withf equal to an eigenfunction for P m , with eigenvalue λ(M ). (ii) Conversely, suppose that Γ ∈ M W 0 1,2 is given, satisfying Γ ≺ [π ⊙ P ], and suppose that its one-dimensional marginals agree. Consider the decomposition, Γ(dx, dy) = [π ⊙P ](dx, dy), whereπ := Γ 1 = Γ 2 is the (common) first marginal of Γ on (X, B), andP is a transition kernel. Let Proof. Part (i) is a bivariate version of Proposition 4.9: We know that Γ is an invariant measure for a bivariate process, whose one-dimensional transition kernel is of the form, Invariance may be expressed as follows: Γ(dy, dz) = x∈X Γ(dx, dy)P m (y, dz) , y, z ∈ X.
To prove (ii), letΛ( · ) denote the functional defining the log-generalized principal eigenvalue for the transition kernelP = P m . Proposition 2.

Large Deviations Asymptotics
In this section we use the multiplicative mean ergodic theorems of Section 3 and the structural results of Section 4 to study the large deviations properties of the empirical measures {L n } induced by the Markov chain Φ on (X, B); recall the definition of {L n } in (14). As in the previous section, we assume throughout this section that the Markov chain Φ satisfies (DV3+) with an unbounded function W , and we choose and fix a function W 0 : X → [1, ∞) in L W ∞ as in (34). Our first result, the large deviations principle (LDP) for the sequence of measures {L n }, will be established in a topology finer (and hence stronger) than either the topology of weak convergence, or the τ -topology. As described in the Introduction, we consider the τ W 0 -topology on the space M 1 of probability measures on (X, B), defined by the system of neighborhoods (16).
Since the map (x 1 , . . . , x n ) → 1 n n i=1 δ x i from X n to M 1 may not be measurable with respect to the natural Borel σ-field induced by the τ W 0 -topology on M 1 , we will instead consider the (smaller) σ-field F, defined as the smallest σ-field that makes all the maps below measurable: where E o andĒ denote the interior and the closure of E in the τ W 0 topology, respectively.
The proof is based on an application of the Dawson-Gärtner projective limit theorem along the same lines as the proof of Theorem 6.2.10 in [12]. The main two technical ingredients are provided by, first, the multiplicative mean ergodic theorem Theorem 3.1 (iii) which, as noted in (36), shows that the log-moment generating functions converge to Λ. And second, by the regularity properties of Λ and the identification of Λ * in terms of relative entropy, established in Section 4 and Section C of the Appendix.
As in Section 4, in order to identify the rate function for the LDP we find it easier to consider the bivariate chain Ψ. Recall the bivariate extensions of our earlier definitions from equations (51), (52), (53) and (54).
Proof of Theorem 5.1. We begin by establishing an LDP for Φ with rate function given by Λ * . Recall that Proposition 3.6 gives In order to apply the projective limit theorem we need to extend the domain of the convex dual functional Λ * as follows. For probability measures ν ∈ M W 0 1 , Λ * (ν) is defined in (50), and the same definition applies when ν is a probability measure not necessarily in M W 0 1 . More generally, let L ′ denote the algebraic dual of the space L = L W 0 ∞ , consisting of all linear functionals Θ : L → R, and equipped with the weakest topology that makes the functional Therefore, we can identify the space of probability measures M 1 with the corresponding subset of L ′ , and observe that the induced topology on M 1 is simply the τ W 0 -topology.
Next, extend the definition of Λ * to all Θ ∈ L ′ via and observe that [12, Assumption 4.6.8] is satisfied by construction (with W = L = L W 0 ∞ , X = L ′ and B = F), and that by Proposition 4.3 the function Λ(F 0 + αF ) is Gateaux differentiable. Therefore, we can apply the Dawson-Gärtner projective limit theorem [12, Corollary 4.6.11 (a)] to obtain that the sequence of empirical measures {L n } satisfy the LDP in the space L ′ with respect to the convex, good rate function Λ * . Moreover, since by Proposition C.9 we know that Λ * (Θ) = ∞ for Θ ∈ M 1 , we obtain the same LDP in the space (M 1 , F), with respect to the induced topology, namely, the τ W 0 -topology; see, e.g., [12,Lemma 4.1.5].
Next note that, in view of Proposition 4.1, the bivariate chain Ψ also satisfies the same LDP. But in this case, we claim that we can express Λ * (Γ) for any bivariate probability measure Γ as follows: if the two marginals Γ 1 and Γ 2 of Γ agree; ∞ , otherwise.
Finally, an application of the contraction principle [12, Theorem 4.2.1] implies that the univariate convex dual Λ * (ν) coincides with I(ν) in (62). Simply note that the τ W 0 -topology on the space of probability measures is Hausdorff, and that the map Γ → Γ 1 is continuous in that topology.
Theorem 5.1 strengthens the "local" large deviations of [32] to a full LDP. The assumptions under which this LDP is proved are more restrictive than those in [32], but apparently they cannot be significantly relaxed. In particular, the density assumption of (DV3+) (ii) cannot be removed, as illustrated by the counter-example given in [18]. This example is of an irreducible, aperiodic Markov chain with state space X = [0, 1], satisfying Doeblin's condition. It can be easily seen that this Markov chain satisfies condition (DV3) with Lyapunov function V (x) = −(1/2) log x, x ∈ [0, 1], and with W given by Taking δ = 1, C = [0, 1] and b = 2 yields a solution to (DV3), with the Lyapunov function V and the unbounded function W as above. But for this Markov chain the density assumption in (DV3+) (ii) is not satisfied, and as shown in [18], it satisfies the LDP with a rate function different from the one in Theorem 5.1.
The LDP of Theorem 5.1 can easily be extended to the sequence of empirical measures of k-tuples L n,k , defined for each k ≥ 2 by We write M 1,k for the space of all probability measures on (X k , B k ), and we let F k denote the σ-field of subsets of M 1,k defined analogously to F in (61), with X k in place of X, and with real-valued functions F in the space Similarly, the τ W 0 k -topology on M 1,k is defined by the system of neighborhoods A straightforward generalization of the argument in the above proof yields the following corollary. The proof is omitted.
Corollary 5.2 Under the assumptions of Theorem 5.1, for any initial condition Φ(0) = x, the sequence of empirical measures {L n,k } satisfies the LDP in the space (M 1,k , F k ) equipped with the τ W 0 k -topology, with the good, convex rate function where ν k−1 denotes the first (k − 1)-dimensional marginal of ν k .
Next we show that under the assumptions of Theorem 5.1 it is possible to obtain exact large deviations results for the partial sums S n , of a real-valued functional F ∈ L W 0 ∞ . In the next two theorems we prove analogs of the corresponding expansions of Bahadur and Ranga Rao for the partial sums of independent random variables [1]. Our results generalize those obtained by Miller [36] for finite state Markov chains, and those in [32] proved for geometrically ergodic Markov processes but only in a neighborhood of the mean; see [32] for further bibliographical references.
First we note that, since for any F ∈ L W 0 ∞ the map ν → ⟨ν, F⟩ from M 1 to R is continuous under the τ W 0 -topology, we can apply the contraction principle to obtain an LDP for the partial sums {S n } in (66): Their laws satisfy the LDP on R with respect to the good, convex rate function J(c) as in (19), Alternatively, based on (the weak version of) the multiplicative mean ergodic theorem in (63), we can apply the Gärtner-Ellis theorem [12, Theorem 2.3.6] to conclude that the laws of the partial sums {S n } satisfy the LDP on R with respect to the good rate function J * (c), so that, in particular, J(c) = J * (c) for all c. Now suppose for simplicity that the function F has zero mean π(F ) = 0 and nontrivial central limit theorem variance σ 2 (F ) > 0; recall the definition of σ 2 (F ) from Section 3.1. To evaluate the supremum in (67), we recall from Lemma 2.10 that Λ(aF ) is convex in a ∈ R, and since by Theorem 3.1 it is also analytic, it is strictly convex. Therefore, if we define then J * (c) = ∞ for values of c larger than F max , and the probabilities of the large deviations events {S n ≥ nc} decay to zero super-exponentially fast. Therefore, from now on we concentrate on the interesting range of values 0 < c < F max . Note that, although in the case of independent and identically distributed random variables it is easy to identify F max as the right endpoint of the support of F , for Markov chains this need not be the case, as illustrated by the following example.
Example. Let Φ = {Φ(n) : n ≥ 0} be a discrete-time version of the Ornstein-Uhlenbeck process in R 2 , with Φ(0) = x ∈ R 2 and, Φ(n + 1) = Φ 1 (n + 1) where {N (k)} is a sequence of independent and identically distributed N (0, 1) random variables. Let A denote the above 2-by-2 matrix, and assume that the roots of the quadratic equation z 2 + a 1 z + a 2 = 0 lie within the open unit disk in C.
Note that there exist γ < 1 and a positive definite matrix P satisfying A^T P A ≤ γP. One may take P = Σ_{k=0}^∞ γ^{−k} (A^k)^T A^k, where γ < 1 is chosen so that the sum is convergent; for this choice A^T P A = γ(P − I) ≤ γP. Then Φ satisfies (DV3+) (i) with Lyapunov function V (x) = 1 + ε x^T P x, and W = V , for suitably small ε > 0 (hence, the drift condition (DV4) also holds). Condition (DV3+) (ii) holds with T 0 = 2 since P 2 (x, · ) has a Gaussian distribution with full-rank covariance.
Consider the functions The asymptotic variance of F 0 is zero, and for any initial condition we have We conclude that F max = (F + ) max = 1, although π{F > c} > 0 for each c ≥ 0 under the invariant distribution π.
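The choice of the matrix P in the example above can be checked numerically. The following is a minimal sketch, assuming that A is the companion matrix [[−a 1 , −a 2 ], [1, 0]] of the quadratic z 2 + a 1 z + a 2 ; this form of A, the coefficient values, and the helper name `lyapunov_matrix` are illustrative assumptions and are not taken from the paper. The code sums the series for P and evaluates the residual of the identity A^T P A = γ(P − I), which in turn gives A^T P A ≤ γP.

```python
import numpy as np

def lyapunov_matrix(A, gamma, terms=500):
    """Truncated series P = sum_{k>=0} gamma^{-k} (A^k)^T A^k.

    The series converges when gamma exceeds the squared spectral radius of A.
    """
    M = A / np.sqrt(gamma)          # (M^k)^T M^k = gamma^{-k} (A^k)^T A^k
    term = np.eye(A.shape[0])       # k = 0 term of the series
    P = term.copy()
    for _ in range(terms):
        term = M.T @ term @ M       # next term of the series
        P += term
    return P

# Hypothetical coefficients: the roots of z^2 + a1 z + a2 are 0.2 and 0.3.
a1, a2 = -0.5, 0.06
A = np.array([[-a1, -a2], [1.0, 0.0]])   # assumed companion form
gamma = 0.5                               # any value in (0.09, 1) works here
P = lyapunov_matrix(A, gamma)

# Residual of the identity A^T P A = gamma (P - I); it should be negligible.
residual = A.T @ P @ A - gamma * (P - np.eye(2))
```

Working with M = A/√γ avoids forming the large factors γ^{−k} and the small powers A^k separately, which keeps the truncated sum numerically stable.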
Recall from Section 3.1 the definitions of lattice and non-lattice functionals.

Theorem 5.3 (Exact Large Deviations for Non-Lattice Functionals)
Suppose that Φ satisfies (DV3+) with an unbounded function W , and that F ∈ L W 0 ∞ is a real-valued, strongly-non-lattice functional, with π(F ) = 0 and σ 2 (F ) ≠ 0. Then, for any 0 < c < F max and all x ∈ X, where a > 0 is the unique solution of the equation d/da Λ(aF ) = c, σ 2 a := d 2 /da 2 Λ(aF ) > 0, f a (x) is the eigenfunction constructed in Theorem 3.1, and J(c) is defined in (19). A corresponding result holds for the lower tail.
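For a finite state space the constants appearing in Theorem 5.3 can be computed directly, since Λ(aF ) is then the logarithm of the largest eigenvalue of the matrix with entries e^{aF(x)} P(x, y). The sketch below is illustrative only: the three-state kernel, the functional, the finite-difference derivative and the bisection routine are assumptions made for the example and are not part of the paper. It finds the tilt a > 0 solving d/da Λ(aF ) = c and evaluates ac − Λ(aF ), which is the value of the supremum of a′c − Λ(a′F ) over a′ because Λ(aF ) is convex in a.

```python
import numpy as np

def Lambda(P, F, a):
    """Log of the spectral radius of the tilted matrix e^{a F(x)} P(x, y)."""
    tilted = np.exp(a * F)[:, None] * P
    return float(np.log(np.max(np.abs(np.linalg.eigvals(tilted)))))

def stationary(P):
    """Invariant probability vector of an irreducible stochastic matrix."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

def dLambda(P, F, a, h=1e-5):
    """Central finite-difference approximation of d/da Lambda(aF)."""
    return (Lambda(P, F, a + h) - Lambda(P, F, a - h)) / (2.0 * h)

def solve_tilt(P, F, c, hi=20.0, iters=100):
    """Bisection on (0, hi) for d/da Lambda(aF) = c; the derivative is increasing."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dLambda(P, F, mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical three-state chain and functional, centred so that pi(F) = 0.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
raw = np.array([-1.0, 0.0, 2.0])
F = raw - stationary(P) @ raw

c = 0.5                          # a level strictly between 0 and max(F)
a = solve_tilt(P, F, c)          # tilt solving d/da Lambda(aF) = c
J = a * c - Lambda(P, F, a)      # value of sup over a' of a'c - Lambda(a'F)
```

Bisection is used here because it only needs monotonicity of the derivative; a Newton iteration on the same equation would also work when second derivatives are available.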
The proof of Theorem 5.3 is identical to that of the corresponding result in [32], based on the following simple properties of a Markov chain satisfying (DV3+). We omit properties P5 and P6 since they are not needed here.
Properties. Suppose Φ satisfies (DV3+) with an unbounded function W , and choose and fix an arbitrary x ∈ X and a function F ∈ L W 0 ∞ with zero asymptotic mean π(F ) = 0 and nontrivial asymptotic variance σ 2 = σ 2 (F ) ≠ 0. Let S n denote the partial sums in (66) and write m n (α) for the moment generating functions The proofs of the following properties are exactly as those of the corresponding results in [32], and are based primarily on the multiplicative mean ergodic theorem Theorem 3.1, and the Taylor expansion of Λ(F ) given in Proposition 4.3. Observe that by Theorem 2.2 we have that the Lyapunov function V in (DV3+) satisfies π(V 2 ) < ∞.
An analogous asymptotic expansion for lattice functionals is given in the next theorem; again, its proof is omitted as it is identical to that of the corresponding result in [32].
Theorem 5.4 (Exact Large Deviations for Lattice Functionals) Suppose Φ satisfies (DV3+) with an unbounded function W , and that F ∈ L W 0 ∞ is a real-valued, lattice functional with span h > 0, π(F ) = 0 and σ 2 (F ) ≠ 0. Let {c n } be a sequence of real numbers in (ε, ∞) for some ε > 0, and assume (without loss of generality) that, for each n, c n is in the support of S n . Then, for all x ∈ X, where Λ n (a) is the log-moment generating function of S n , Λ n (a) := log E x [e^{a S n}], n ≥ 1, a ∈ R , each a n > 0 is the unique solution of the equation d/da Λ n (a) = c n , and J n (c) is the convex dual of Λ n (a), J n (c) := Λ * n (c) := sup A corresponding result holds for the lower tail.
Observe that the expansion (69) in the lattice case is slightly more general than the one in Theorem 5.3. If the sequence {c n } converges to some c > ǫ as n → ∞, then, as in [32], the a n also converge to some a > 0, and have, for all x ∈ S, where the last bound uses Cauchy-Schwarz. This gives an upper bound for x ∈ S, and the same bound also holds for all x since g n (x) ≤ sup y∈S g n (y). Choosing η > 0 so small that ρ := e η2T B 1 (2η)(1 − ǫ) Proof of Theorem 2.5.
Proof of Theorem 2.2.
The construction of a Lyapunov function V * follows from the bounds given above, beginning with (72) (note however that W ≡ 1 under (DV2)). Assume that the set A ∈ B + is fixed, with V bounded on A. We assume moreover that A is small; this is without loss of generality by [34, Proposition 5.2.4 (ii)]. Fix k ≥ 0, and define, Consideration of this stopping time in (72) gives the upper bound, for some b 1 < ∞, and on summing both sides we obtain the pair of bounds, We now demonstrate that this function satisfies the desired drift condition: We have, . This is indeed a version of (V4).
Proposition A.2 Suppose that X is σ-compact and locally compact; that P has the Feller property; and that there exists a sequence of compact sets {K n : n ≥ 1} satisfying (27): For Then, there exists a solution to the inequality, such that V, W : X → [1, ∞) are continuous, their sublevel sets are precompact, C ∈ B is compact, and b < ∞.
Proof. Let {O n : n ≥ 1} denote a sequence of open, precompact sets satisfying O n ↑ X, and K n ⊂ closure of O n ⊂ O n+1 , n ≥ 1. For each n ≥ 1 we consider a continuous function s n : X → [0, 1] satisfying s n (x) = 1 for x ∈ O n , and s n (x) = 0 for x ∈ O c n+1 . We then define a stopping time τ n ≥ 1 through the conditional distributions, From the conditions imposed on s n we may conclude that τ Kn ≥ τ On ≥ τ n ≥ τ O n+1 for each n ≥ 1.
For n ≥ 1, m ≥ 1 we define V n,m : X → R + by, Continuity of this function is established as follows: First, observe that under the Feller property we can infer that P x {τ n = k} is a continuous function of x ∈ X for any k ≥ 1. The bound τ n ≤ τ Kn , n ≥ 1, combined with (27) then establishes a form of uniform integrability sufficient to infer the desired continuity. Moreover, by the dominated convergence theorem we have V n,m (x) ↓ 0, m → ∞, for each x ∈ X. Continuity implies that this convergence is uniform on compacta. We choose {m n : n ≥ 1} so that V n,mn (x) ≤ 1 on O n+1 , and we define V n = V n,mn . Letting W n = n − 1 1 − s m , we obtain the bound H(V n ) ≤ −W n + 1. Let {p n } ⊂ R + satisfy Σ n≥1 p n = 1, Σ n≥1 p n n = ∞, and define, W := 1 +
B v-Separable Kernels
The following result is immediate from the definition (24).
Lemma B.1 Suppose that { P n : n ∈ Z + } is a positive semigroup, with finite spectral radius ξ̂ > 0. Then the inverse [Iz − P ] −1 admits the power series representation, where the sum converges in norm.

Lemma B.2 (i) is a simple corollary:
Lemma B.2 Consider a positive semigroup { P n : n ∈ Z + } that is ψ-irreducible. Then: (i) The spectral radius ξ̂ in L v ∞ of { P t } satisfies ξ̂ < b 0 for a given b 0 < ∞ if and only if there is a b < b 0 , and a function v 1 : X → [1, ∞) such that v 1 is equivalent to v, and P v 1 ≤ bv 1 .
Then v 1 ∈ L v ∞ by Lemma B.1, and v ≤ v 1 by construction. Moreover, it is easy to see that v 1 satisfies the desired inequality.
Conversely, if the inequality holds then for any 0 < η < 1, n ≥ 1, It follows that ξ̂ ≤ η −1 b since v and v 1 are equivalent. Since η < 1 is arbitrary, this shows that b ≥ ξ̂, and completes the proof.
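In finite dimensions every strictly positive vector is equivalent to v, so Lemma B.2 (i) reduces to the Collatz-Wielandt characterization of the spectral radius of a positive matrix: the smallest b with P v 1 ≤ b v 1 is never below the spectral radius, and it equals the spectral radius when v 1 is the Perron eigenvector. The sketch below checks this numerically; the randomly generated matrix and the helper names are illustrative assumptions.

```python
import numpy as np

def spectral_radius(P):
    """Largest modulus among the eigenvalues of P."""
    return float(np.max(np.abs(np.linalg.eigvals(P))))

def drift_constant(P, v1):
    """Smallest b such that P v1 <= b v1 componentwise (v1 strictly positive)."""
    return float(np.max((P @ v1) / v1))

def perron_vector(P):
    """Positive eigenvector for the largest eigenvalue of a strictly positive matrix."""
    w, v = np.linalg.eig(P)
    return np.abs(np.real(v[:, np.argmax(np.real(w))]))

rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(4, 4))        # hypothetical positive kernel
xi = spectral_radius(P)

# Drift constants for arbitrary positive test functions v1: each one bounds xi from above.
trial_bounds = [drift_constant(P, rng.uniform(0.5, 2.0, size=4)) for _ in range(100)]

# The Perron eigenvector attains the bound.
best_bound = drift_constant(P, perron_vector(P))
```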
The following result will be used below to construct v-separable kernels.
Lemma B.3 Suppose that P is a positive kernel, and that there is a measure µ ∈ M v 1 satisfying Then P 2 is v-separable.
We then define P ε (x, dy) = r ε (x, y)v −1 (y)µ(dy) , x, y ∈ X , and P ε2 := P P ε . The latter kernel may be expressed P ε2 = s i ⊗ ν i , with We have s i ∈ L v ∞ and ν i ∈ M v 1 for each i. For any g ∈ L v ∞ , x ∈ X, we then have,
Lemma B.4 Suppose that (DV3) holds with W unbounded. Fix 0 < η ≤ 1, and consider any measurable function F satisfying We then have |||I C W (r) c P f ||| vη → 0, exponentially fast, as r → ∞.

Proof of Theorem 2.4.
(a) ⇒ (b). When (DV3) holds we can conclude from Lemma B.4 that |||P − I C W (r) P||| v 0 → 0 as r → ∞. It follows that |||P T − I C W (r) P T ||| v 0 → 0 as r → ∞ for any T ≥ 1. In particular, this holds for T = T 0 . Under the separability assumption on {I C W (r) P T 0 : r ≥ 1} it then follows that P T 0 is v-separable.
(b) ⇒ (a). We first show that each of the sets {C v 0 (r) : r ≥ 1} is small. Under the assumptions of (b) we may find, for each ǫ > 0, an integer N ≥ 1, functions This gives for any r ≥ 1, Let A ∈ B be a small set with ν i (A c ) < ǫ for each i. From the bound above and using similar arguments, It follows that for any r ≥ 1, we may find a small set A(r) such that P T 0 (x, A(r)) ≥ 1 2 , for x ∈ C v 0 (r). It then follows from [34, Proposition 5.2.4] that C v 0 (r) is small.
We now construct a solution to the drift inequality in (DV3). Using finite approximations as in (74), we may construct, for each n ≥ 1, an integer r n ≥ n such that Since the norm is submultiplicative, this then gives the bound, We then define for each n ≥ 1, From the previous bound on |||(P I C c rn ) k ||| v 0 we have the pair of bounds, Finally, we set where C = C v (r) for some r, and the constants b and r are chosen so that W (x) ≥ 1 for all x ∈ X. The bounds (75) together with the lower bound v n ≥ v 0 e n I C c rn imply that which implies the existence of r and b satisfying these requirements.
In much of the remainder of the appendix we replace (DV3+) with the following more general condition: (ii) There exists T 0 > 0 such that I C W (r) P T 0 is v-separable for each r < ∞.
Theorem 2.4 states that this is roughly equivalent to (DV3+) with an unbounded function W . In fact, we do have an analogous upper bound for P T 0 :
Lemma B.6 Suppose that the conditions of (76) hold. Then, for each r ≥ 1, ε > 0, there is a positive measure β r,ε ∈ M v 1 such that
Proof. We apply the approximation (74) used in the proof of Theorem 2.4, where {s i : 1 ≤ i ≤ N } ⊂ L v 0 ∞ are non-negative valued, and {ν i : 1 ≤ i ≤ N } ⊂ M v 0 1 are probability measures. We may assume that the {s i } satisfy the bound 1 = P T 0 (x, X) ≥ s i (x) − 1, x ∈ C W (r), and it follows that we may take β r,ε = 2 Σ_{i=1}^N ν i .
The following result is proven exactly as Lemma B.5, using Lemma B.6.

C Properties of Λ and Λ *
In this section we obtain additional properties of Λ and Λ * . One of the main goals is to establish approximations of Λ(G) through bounded functions when G is possibly unbounded. Similar issues are treated in [13,Chapter 5] where a tightness condition is used to provide related approximations.
Lemma C.1 For a ψ-irreducible Markov chain: (i) The log-generalized principal eigenvalue Λ is convex on the space of measurable functions F : X → (−∞, ∞].
(ii) The log-spectral radius Ξ is convex on the space of measurable functions F : X → (−∞, ∞].
Proof. The proofs of (i) and (ii) are similar, and both are based on Lemma B.2. We provide a proof of (ii) only. Fix F 1 , F 2 ∈ L W 0 ∞ , η, θ ∈ (0, 1), and let b i = η −1 ξ(F i ), i = 1, 2. Lemma B.2 implies that there exist functions {v 1 , v 2 } equivalent to v, and satisfying We then define so that by Hölder's inequality, The function v θ is equivalent to v. Consequently, we may apply Lemma B.2 once more to obtain that ξ( Taking logarithms then gives, This completes the proof since 0 < η < 1 is arbitrary.
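The convexity in Lemma C.1 (ii) can be observed numerically on a finite state space, where Ξ(F ) is the logarithm of the spectral radius of the matrix with entries e^{F(x)} P(x, y). This is only a sanity check on a randomly generated example, not a substitute for the proof; the kernel, the functions F1 and F2, and the helper `Xi` are illustrative assumptions.

```python
import numpy as np

def Xi(P, F):
    """Log of the spectral radius of the tilted matrix e^{F(x)} P(x, y)."""
    tilted = np.exp(F)[:, None] * P
    return float(np.log(np.max(np.abs(np.linalg.eigvals(tilted)))))

rng = np.random.default_rng(1)
P = rng.uniform(0.1, 1.0, size=(5, 5))
P /= P.sum(axis=1, keepdims=True)             # hypothetical stochastic kernel
F1 = rng.normal(size=5)
F2 = rng.normal(size=5)

thetas = np.linspace(0.0, 1.0, 21)
# Convexity gap: theta*Xi(F1) + (1-theta)*Xi(F2) - Xi(theta*F1 + (1-theta)*F2).
# Convexity of Xi means every entry is non-negative (up to rounding).
gaps = [t * Xi(P, F1) + (1 - t) * Xi(P, F2) - Xi(P, t * F1 + (1 - t) * F2)
        for t in thetas]
```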
The following result establishes a form of upper semi-continuity for the functional Λ.
Lemma C.2 Suppose that Φ is ψ-irreducible, and consider a sequence {F n } of measurable, real-valued functions on X. Suppose there exists a measurable function F : X → R such that F n ↑ F , as n ↑ ∞. Then the corresponding generalized principal eigenvalues converge: Λ(F n ) → Λ(F ), as n ↑ ∞.
Proof. It is obvious that lim sup n→∞ Λ(F n ) ≤ Λ(F ). To complete the proof we establish a bound on the limit infimum. Under the assumptions of the proposition we have P T fn ≥ P T f 1 , for any T ≥ 1, n ≥ 1. It follows that we can find an integer T 0 ≥ 1, a function s : X → [0, 1], and a probability ν on B satisfying ψ(s) > 0 and P T 0 fn ≥ s ⊗ ν , 1 ≤ n ≤ ∞.
Let (f n , λ n ) denote the Perron-Frobenius eigenfunction and generalized principal eigenvalue for P fn , normalized so that ν(f n ) = 1 for each n. For each n ≥ 1 we have the upper bound, P fn f n ≤ λ n f n . This gives a lower bound on the {f n }: f n ≥ λ −T 0 n P T 0 fn f n ≥ λ −T 0 n ν(f n )s = λ −T 0 n s.
In applying Lemma C.2 we typically assume that suitable regularity conditions hold so that Ξ(F ) = Λ(F ). Under a finiteness assumption alone we obtain a complementary continuity result for certain classes of decreasing sequences of functions. One such result is given here: Lemma C.3 Suppose that |||P||| v < ∞, and that F : X → R is measurable, with Ξ(F + ) < ∞. Then, with F n := max(F, −n) we have, Ξ(F n ) ↓ Ξ(F ), as n ↑ ∞.
To establish a tight approximation for Λ(M ), where M = log m is as in the proof of Theorem 4.2, we will approximate M by bounded functions.
Proposition C.4 Suppose that |||P||| v < ∞, and that F : X → R is measurable, with Ξ(F ) < ∞, and Λ(F ) = Ξ(F ). Then, there exists a sequence {n k : k ≥ 1} such that with F k := F I{−n k ≤ F ≤ k} we have:
Proof. Let F 0 k := F I{F ≤ k}. From Lemma C.2 we have Λ(F 0 k ) ↑ Λ(F ), k → ∞. It follows that we also have Ξ(F 0 k ) ↑ Ξ(F ), k → ∞, since Ξ dominates Λ. We now apply Lemma C.3: For each k ≥ 1 we may find n k ≥ 1 such that with F k := F I{−n k ≤ F ≤ k},
The following proposition implies that Λ is tight in a strong sense under (DV3+):
Proposition C.5 Suppose that the conditions of (76) hold. Then, for any increasing sequence of measurable sets K n ↑ X, and any G ∈ L W 0 ∞ ,
The proof is postponed until after the following lemma.
Lemma C.6 Suppose that the conditions of (76) hold, and consider any increasing sequence of measurable sets K n ↑ X, and any G ∈ L W 0 ∞ . Then, on letting g n = exp(I K c n G), n ≥ 1, we have |||P T 0 P gn − P T 0 +1 ||| v → 0 , n → ∞.
Proof. We may assume without loss of generality that G ≥ 0. As usual, we set g = e G . Under (76) we have |||P gn ||| v ≤ |||P g ||| v < ∞, n ≥ 1. Consequently, given Lemma B.4, it is enough to show that for any r ≥ 1, To see this, observe that for any h ∈ L v ∞ , x ∈ X, I C W (r) [P T 0 P gn − P T 0 +1 ]h (x) = I C W (r) [P T 0 I K n c [P g − P ]]h (x) , where the measure β r,ǫ ∈ M v 1 is given in Lemma B.6. Consequently, This proves the result since ǫ > 0 is arbitrary.
Proof of Proposition C.5. To see (i), consider any G ∈ L W 0 ∞ , and any sequence of measurable sets K n ↑ X. We assume without loss of generality that G ≥ 0.
Fix any b > 1, and define for n ≥ 1, G n = (T 0 + 1)bI K c n G. In view of Lemma C.6, given any Λ > 0, we may find n ≥ 1 such that the spectral radius of the semigroup generated by the kernel P n := P T 0 P gn satisfies ξ n < e Λ . With n, Λ fixed, we then have for some b n < ∞, P k n v ≤ b n e kΛ v for k ≥ 1. This has the sample path representation, x ∈ X, k ≥ 1.
Denote by h 0,k (x) the expectation on the left hand side. We then have, for each j ≥ 1, h j,k (x) := P j h 0,k (x) ≤ b n e kΛ (|||P||| v ) j v(x) , x ∈ X.
We then obtain the following bound using Hölder's inequality, x ∈ X, k ≥ 1.
This shows that Λ I Kn G → Λ G as claimed.
Proposition C.5 allows us to broaden the class of functions for which Ξ is finite-valued.
If the state space X is σ-compact, then we may assume that W 1 is also coercive.
Proof. Fix a sequence of measurable sets satisfying K n ↑ X, with sup x∈Kn V (x) < ∞ for each n. Proposition C.5 implies that we may find, for each k ≥ 1, an integer n k ≥ 1 such that Ξ(2 k+1 I K c n k W 0 ) ≤ 1. We then define The functional Ξ is convex by Lemma C.1, which gives the bound, To see that W 1 ∈ L V ∞ we apply Lemma 2.9. Finally, if X is σ-compact, then the {K n } may be taken to be compact sets, which then implies the coercive property for W 1 .
We have the following useful corollary. The proof is routine, given Proposition C.7 and Lemma B.2 (i); see also [2, Theorem 2.4].
We now turn to properties of the dual functional Λ * defined in (64). The continuity results stated in Proposition C.5 lead to the following representation.
Proposition C.9 Suppose that the conditions of (76) hold. Let Θ be a linear functional on L W 0 ∞,2 satisfying Λ * (Θ) < ∞. Then Θ may be represented as, where ν ∈ M W 0 1 is a probability measure.
Proof. We proceed in several steps, making repeated use of the bound, First note that on considering constant functions in (77)  It is clear that finiteness of Λ * implies that ⟨Θ, 1⟩ = 1. Next, consider any G : X → R + with G ∈ L W 0 ∞ . Then, since Λ(cG) ≤ 0 for c ≤ 0, We conclude that ⟨Θ, G⟩ ≥ 0 for G ≥ 0. Consider now a set A ∈ B of ψ-measure zero. Then Λ(cI A ) = 0 for any c ≥ 0, and we can argue as above using (77) that ∞ > Λ * (Θ) ≥ sup c>0 ⟨Θ, I A ⟩ c, which shows that ⟨Θ, I A ⟩ = 0.
It follows that lim sup n→∞ Θ(G n ) = 0, which implies that Θ defines a countably additive set function, so that Θ is in fact a probability measure.
More generally, we define Λ * for bivariate probability measures Γ not necessarily in M W 0 1,2 using the same definition as in (54). Recall from Lemma 4.11 that the two marginals of Γ agree whenever Λ * (Γ) < ∞. Proposition C.10 provides further structure.
Lemma B.5 shows that Λ(εW ) < ∞ for ε > 0 sufficiently small, and this gives (79). Define P̌ through the decomposition Γ = π̌ ⊙ P̌ , and let Ě denote the expectation for the Markov chain with transition kernel P̌ . We assume that P̌ is of the form P̌ (x, dy) = m(x, y)P (x, dy), x, y ∈ X, and set M = log(m), since otherwise the relative entropy is infinite and there is nothing to prove. We then have, for any G ∈ L W 0 ∞,2 , Λ(G) = lim T →∞ Taking the supremum over all G ∈ L W 0 ∞,2 gives (78).