SELF-NORMALIZED LARGE DEVIATIONS FOR MARKOV CHAINS

Abstract. We prove a self-normalized large deviation principle for sums of Banach space valued functions of a Markov chain. Self-normalization applies in situations where a full large deviation principle is not available. We follow the lead of Dembo and Shao [DemSha98b], who state partial large deviation principles for independent and identically distributed random sequences.


From Cramér to Shao
Let $(E, \mathcal{E})$ be a Polish space and $(X_n)_n$ a sequence of $E$-valued random variables. For a Borel function $g : E \to \mathbb{R}^d$ and $q > 1$, we introduce
$$S_n(g) = \sum_{i=1}^n g(X_i) \quad\text{and}\quad V_{n,q}(g) = \Big(\sum_{i=1}^n \|g(X_i)\|^q\Big)^{1/q}.$$
If $(X_n)_n$ is an independent and identically distributed (shortened as i.i.d.) sequence with distribution $\mu$ and if $g : E \to \mathbb{R}$ is $\mu$-integrable, the classical Cramér-Chernoff large deviation theorem states that
$$P\Big(\frac{S_n(g)}{n} \ge x\Big) \le e^{-n\, h_g(x)},$$
where $h_g$ is the Cramér transform of the i.i.d. sequence $(g(X_i))_i$. This inequality is useful if $h_g(x) > 0$ for all $x > \int g\, d\mu$, i.e. if the "Cramér condition" is satisfied: there exists $\tau > 0$ such that $\int e^{\tau |g|}\, d\mu < \infty$.
Under this condition, we have
$$\frac{1}{n} \log P\Big(\frac{S_n(g)}{n} \ge x\Big) \to -h_g(x).$$
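As a purely illustrative numerical sketch (our own, not part of the paper), the Cramér transform can be computed by a grid Legendre transform; for a Bernoulli(1/2) variable it agrees with the closed form $h(x) = x\log x + (1-x)\log(1-x) + \log 2$. All function names below are ours.

```python
import math

def cumulant(theta, p=0.5):
    # log-moment generating function Lambda(theta) = log E[e^{theta X}] of a Bernoulli(p)
    return math.log(1 - p + p * math.exp(theta))

def cramer_transform(x, p=0.5):
    # Legendre transform h(x) = sup_theta (theta * x - Lambda(theta)), over a finite grid
    grid = [i / 100.0 for i in range(-2000, 2001)]
    return max(theta * x - cumulant(theta, p) for theta in grid)

def h_exact(x):
    # closed form for p = 1/2: relative entropy of Bernoulli(x) with respect to the fair coin
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)
```

The transform vanishes at the mean $x = 1/2$ and is strictly positive away from it, which is exactly what makes the Chernoff bound informative.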
However, this assumption is far too strong in many situations. In [Sha97], Shao shows that it is possible to get rid of this exponential moment assumption by taking advantage of self-normalization. He considers for instance the self-normalized sequence
$$R_{n,q}(g) = \frac{S_n(g)}{n^{1-1/q}\, V_{n,q}(g)}$$
and obtains the following very interesting result (with the convention $\int g\, d\mu / \|g\|_{L^q(\mu)} = 0$ if $\|g\|_{L^q(\mu)} = \infty$): for every $x > \int g\, d\mu / \|g\|_{L^q(\mu)}$,
$$\lim_n \frac{1}{n} \log P\big(R_{n,q}(g) \ge x\big) = -K(x) < 0,$$
without any moment assumption on the random variables $(g(X_i))_i$.
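It is worth noting that Hölder's inequality forces $|R_{n,q}(g)| \le 1$ deterministically, whatever the tails of the summands; this is also why rates for such self-normalized ratios are infinite on $]1, \infty[$ later in the paper. A small simulation sketch (ours, with standard Cauchy noise, which has no finite moments of any positive order) illustrates this:

```python
import math
import random

def self_normalized_ratio(xs, q):
    # R_{n,q} = S_n / (n^{1-1/q} * V_{n,q}),  with V_{n,q} = (sum |x_i|^q)^{1/q}
    n = len(xs)
    s = sum(xs)
    v = sum(abs(x) ** q for x in xs) ** (1.0 / q)
    return s / (n ** (1.0 - 1.0 / q) * v)

random.seed(0)
# standard Cauchy samples via the inverse CDF: tan(pi * (U - 1/2))
xs = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(10000)]
```

By Hölder, $|S_n| \le \sum_i |x_i| \le n^{1-1/q} (\sum_i |x_i|^q)^{1/q}$, so the ratio always lies in $[-1, 1]$, no matter how heavy-tailed the noise is.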
In this work, we consider the same problem in the Markovian framework and obtain analogous results in Section 2.5 (Corollary 2).

Full and partial large deviation principles
Introducing the notion of partial large deviation principle in two articles, [DemSha98a] and [DemSha98b], Dembo and Shao place Shao's paper [Sha97] in a more general setting and streamline the tools used to obtain these results.
To help comprehension, we recall the basic vocabulary of large deviation theory. Let $E$ be a metric topological space equipped with its Borel $\sigma$-field $\mathcal{E}$. A function $I : E \to [0, \infty]$ is a good rate function if its level sets $\{x;\ I(x) \le t\}$ are compact, and it is a weak rate function if its level sets are closed (namely, if $I$ is lower semi-continuous, shortened to l.s.c. in the sequel). A sequence of probability measures $(\mu_n)_n$ on $(E, \mathcal{E})$ satisfies a large deviation principle (shortened to LDP) with a good rate function $I$ if, for every open subset $G$ and every closed subset $F$ of $E$,
$$\liminf_n \frac{1}{n} \log \mu_n(G) \ge -I(G), \qquad (1)$$
$$\limsup_n \frac{1}{n} \log \mu_n(F) \le -I(F). \qquad (2)$$
We say that the sequence $(\mu_n)_n$ satisfies an upper LDP (resp. a lower LDP) if (2) only (resp. (1) only) holds. Moreover, a weak LDP is said to hold if (2) is satisfied for the compact sets of $E$ only and if $I$ is a weak rate function.
The concept of partial large deviation principle (PLDP) has been introduced by Dembo and Shao in [DemSha98a] and [DemSha98b]: the sequence $(\mu_n)_n$ satisfies an upper PLDP with weak rate $I$ with respect to a subclass $\mathcal{S}$ of $\mathcal{E}$ if, for every $A \in \mathcal{S}$, we have
$$\limsup_n \frac{1}{n} \log \mu_n(A) \le -I(\bar{A}).$$
The full PLDP is said to hold if (1) is satisfied as well for every open $G \subset E$.

Plan of the paper
In Section 2, we give our main results. A weak large deviation principle for "balanced couples" is stated in Section 3 as a preliminary step towards the main theorem (just as, in the i.i.d. case, the weak Cramér theorem is the first step in proving self-normalized results). We give some comments along with examples in Section 4. The proofs of the results are given in Sections 5 and 6: Section 5 deals with the weak large deviation principle, while Section 6 provides the partial exponential tightness which is the key to the partial large deviation theorem. Finally, Section 7 brings some precisions about upper weak large deviations (Theorem 2).

Main results, partial LDP
We consider a Markov chain $X = (X_i)_{i \in \mathbb{N}}$ taking values in a Polish space $E$ endowed with its Borel $\sigma$-field $\mathcal{E}$. Its transition kernel is denoted by $p$, $C_b(E)$ is the space of real bounded continuous functions on $E$ and $\mathcal{P}(E)$ the space of probability measures on $E$ equipped with the topology of weak convergence. If $\zeta$ belongs to $\mathcal{P}(E^2)$, we denote by $\zeta^1$ and $\zeta^2$ its first and second marginals. If $\xi \in \mathcal{P}(E)$ and $\Gamma \in \mathcal{E} \otimes \mathcal{E}$, then
$$\xi p(\cdot) = \int \xi(dx)\, p(x, \cdot), \qquad p(f)(x) = \int p(x, dy)\, f(y), \qquad \xi \otimes p(\Gamma) = \iint I_\Gamma(x, y)\, \xi(dx)\, p(x, dy).$$
We work with the canonical form of the Markov chain $(E^{\mathbb{N}}, \mathcal{E}^{\otimes \mathbb{N}}, (P_x)_{x \in E}, (X_n)_{n \ge 0})$ and the following notation: for any initial distribution $\nu$, $P_\nu = \int \nu(dx)\, P_x$.

Assumptions on the Markov chain
These are the assumptions we may need in what follows; the third one is not needed for the upper LDP results.
The upper bounds stated in this section require a regularity assumption concerning the Markov chain. Let us recall the classical Feller property and the "almost Fellerian" extension proposed by Dupuis and Ellis [DupEll] and related to a condition introduced by J.G. Attali ([Att]) : Assumption 1 (Fellerian or almost Fellerian transition).
• The transition p satisfies the Feller property (or is "Fellerian") if the map : x → p(x, ·) is continuous for the weak convergence topology of P(E).
• More generally, denoting D(p) the discontinuity set of x → p(x, ·), p is "almost Fellerian" if, for every x ∈ E and all δ > 0, there exist an open set G δ of E containing D(p) and a real number r(x) > 0 such that for any y ∈ E, d(x, y) ≤ r(x) =⇒ p(y, G δ ) ≤ δ. In particular, for all x ∈ E, p(x, D(p)) = 0.
We recall that a Lyapunov function is a non-negative Borel function whose level sets are relatively compact. The existence of an invariant probability measure $\mu$ for $p$ is guaranteed by the following "stabilization" condition: there exist a Lyapunov function $V$, $a < 1$ and $b \in \mathbb{R}$ such that $pV \le aV + b$.
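As a side remark (our own sketch, standard reasoning not spelled out in the paper), iterating the stabilization condition explains this existence result: for all $n$,

```latex
p^n V(x) \le a\, p^{n-1} V(x) + b \le \cdots \le a^n V(x) + b \sum_{k=0}^{n-1} a^k
         \le V(x) + \frac{b}{1-a},
```

so, by Markov's inequality, $\sup_n P_x\big(V(X_n) \ge t\big) \le \big(V(x) + \tfrac{b}{1-a}\big)/t$. Since the level sets of $V$ are relatively compact, the laws of the $X_n$ are tight, and (using the Feller property of Assumption 1) any weak limit of the Cesàro averages $\frac{1}{n}\sum_{k=1}^{n} p^k(x, \cdot)$ is an invariant probability, by the Krylov-Bogolioubov argument.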
If this invariant probability $\mu$ is unique, $(X_n)_n$ is $\mu$-stable (almost surely, $L_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ converges weakly to $\mu$) and we have the law of large numbers: if $g : E \to \mathbb{R}^d$ is continuous and such that $\|g\| \le \rho(U)$ with $\rho(t)/t \to 0$ when $t \to \infty$, then $\frac{1}{n} S_n(g) \to \int g\, d\mu$ almost surely. The following assumption is a stronger version. It was introduced by Donsker and Varadhan (condition $H^*$ in [DonVar76]) in order to obtain an upper LDP for the empirical distributions of a Markov chain. Our version is taken from Dupuis-Ellis ([DupEll], Chapter 8).
Assumption 2 (Criterion of exponential stabilization associated with $(U, V)$). There exist a Borel function $U : E \to \mathbb{R}_+$, a Lyapunov function $V$ and a non-negative constant $C$ such that
$$\log p(e^U)(x) \le U(x) - V(x) + C \quad \text{for all } x \in E.$$
Remark 1.
a) Under Assumptions 1 and 2, $p$ always has an invariant probability measure (see Proposition 9.2.6 in [DupEll]). This probability is unique if $p$ is irreducible.
c) U and W = e U are Lyapunov functions.
Assumption 3 (Strong irreducibility). $p$ satisfies the following two conditions:
1) there exists an integer $L$ such that, for every $(x, y) \in E^2$,
2) $p$ has an invariant probability measure $\mu$.
Remark 2. Assumptions 1, 2 and 3 are always satisfied in the i.i.d. case.

Particular case of the functional autoregressive models
Let us illustrate our assumptions with the following model taking values in $\mathbb{R}^d$:
$$X_{n+1} = f(X_n) + \sigma(X_n)\, \varepsilon_{n+1},$$
where $(\varepsilon_n)_n$ is an i.i.d. sequence independent of $X_0$. We do not know many large deviation results for such models. We can mention the LDP for the one-dimensional linear autoregressive model with Gaussian noise (see [BryDem97]). There also exists a moderate large deviation result for the multidimensional linear autoregressive model, and for the kernel estimator with a generalized Gaussian noise (see [Wor99]). The study of such models under weaker assumptions on the noise is one of the motivations of self-normalization (following [Sha97]). Let us consider the conditions imposed by the assumptions of Section 2.1 for this particular model:
• This Markov chain is Fellerian if $f$ and $\sigma$ are continuous; it is almost Fellerian if $f$ and $\sigma$ are Lebesgue almost everywhere continuous and $\varepsilon_1$ has a bounded density with respect to the Lebesgue measure on $\mathbb{R}^d$. The almost Fellerian assumption allows us to study important models in econometrics, such as threshold autoregressive models, for which the function $f$ is defined by
$$f = \sum_i f_i\, I_{\Gamma_i},$$
with $f_i$ continuous and the $\Gamma_i$ disjoint with boundaries of Lebesgue measure zero (and $\cup_i \Gamma_i = \mathbb{R}^d$).
• Exponential stabilization holds for the model if there exists a positive Borel function $U$ satisfying a suitable limit condition. If $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ is an increasing function, then for any $s \in\, ]0, 1[$ we can take $U(x) = \phi(\|x\|)$ under various assumptions. Of course, a less constraining condition on the noise leads to a more restrictive condition on the function $f$:
a) under the Cramér condition $E(\exp(\tau \|\varepsilon_1\|)) < \infty$ $(\tau > 0)$, we can take $\phi(t) = \tau_r t$, with a corresponding growth condition on $f$;
b) under intermediate assumptions on the noise, we can take, for any $s \in\, ]0, 1[$, an intermediate choice of $\phi$;
c) if we only assume that the noise is integrable, $E(\|\varepsilon_1\|) < \infty$, then $\phi(t) = \log(t)$, with a corresponding condition on $f$.
• Finally, strong irreducibility is satisfied as soon as $\varepsilon_1$ has a strictly positive density with respect to the Lebesgue measure.
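The threshold autoregressive models described above can be sketched as follows (our own illustrative simulation; the two regimes, the coefficients and the Gaussian noise are hypothetical choices, not taken from the paper):

```python
import random

def tar_step(x, eps):
    # hypothetical threshold autoregression: f(x) = 0.3*x for x < 0, f(x) = -0.5*x otherwise;
    # both regimes are contractions, so pV <= aV + b holds with V(x) = |x| and a = 0.5
    if x < 0:
        return 0.3 * x + eps
    return -0.5 * x + eps

random.seed(1)
path = [0.0]
for _ in range(5000):
    path.append(tar_step(path[-1], random.gauss(0.0, 1.0)))
```

Because each regime is contractive, the simulated path stays stochastically bounded, in line with the stabilization condition of Section 2.1.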

Donsker-Varadhan rate functions
According to Donsker-Varadhan theory ([DonVar75a] and [DonVar76]), we introduce the relative entropy between two probability measures $\nu$ and $\mu$ on $E$:
$$K(\nu \mid \mu) = \int \log\frac{d\nu}{d\mu}\, d\nu \ \text{ if } \nu \ll \mu, \qquad K(\nu \mid \mu) = \infty \ \text{ otherwise,}$$
and the rates related to the LDP for the empirical distributions $L_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_{i-1}, X_i}$ and $L^1_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_{i-1}}$ are, for $\zeta \in \mathcal{P}(E^2)$ and $\xi \in \mathcal{P}(E)$,
$$I(\zeta) = K(\zeta \mid \zeta^1 \otimes p), \qquad I_1(\xi) = \inf\{I(\zeta);\ \zeta^1 = \xi\}.$$
Furthermore, we assume that $(B, \mathcal{B})$ is a separable Banach space endowed with its Borel $\sigma$-field. For a $B$-valued measurable function $g$ on $E$ (resp. $G$ on $E^2$), we set, for $x \in B$:
$$h^\circ_g(x) = \inf\Big\{I_1(\xi);\ \int g\, d\xi = x\Big\}, \qquad h^\circ_G(x) = \inf\Big\{I(\zeta);\ \int G\, d\zeta = x\Big\}. \qquad (5)$$
These functions are convex (this statement is proved in Paragraph 5.1) but might not be l.s.c. Hence, we consider the corresponding l.s.c.-normalized functions $h_g$ and $h_G$, with
$$h_G(x) = \lim_{\delta \downarrow 0}\ \inf_{\|y - x\| \le \delta} h^\circ_G(y).$$
These functions are l.s.c. and convex. We call them "Donsker-Varadhan rates". Finally, the following notations are constantly used in the sequel:
$$g_n = \frac{1}{n} \sum_{i=1}^n g(X_i), \qquad G_n = \frac{1}{n} \sum_{i=1}^n G(X_{i-1}, X_i) = \int G\, dL_n.$$
For a function $h : H \to \mathbb{R}_+$ and a subset $\Gamma$ of $H$, we note $h(\Gamma) = \inf_{x \in \Gamma} h(x)$.
• If $g : E \to B$, the study of the rate $h_g$ and the sequence $g_n$ is a particular case of the one involving $h_G$ and $G_n$, using the function $G(x, y) = g(y)$ (it is not difficult to show that, in this case, $h_g = h_G$). Hence, we shall only work with functions $G$ defined on $E^2$.
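In the Donsker-Varadhan theory, the rate for the pair empirical measure is $I(\zeta) = K(\zeta \mid \zeta^1 \otimes p)$, and for $\zeta = \xi \otimes q$ it splits as $\sum_x \xi(x)\, K(q(x,\cdot) \mid p(x,\cdot))$ on a finite state space. A small numerical sketch (our own construction; the kernels below are illustrative choices) on a two-state chain:

```python
import math

def kl(a, b):
    # relative entropy K(a | b) between two probability vectors
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def dv_rate(xi, q, p):
    # I(zeta) for zeta = xi (x) q:  K(xi (x) q | xi (x) p) = sum_x xi(x) K(q(x,.) | p(x,.))
    return sum(xi[i] * kl(q[i], p[i]) for i in range(len(xi)))

p = [[0.9, 0.1], [0.2, 0.8]]  # illustrative "true" transition kernel (our choice)
q = [[0.7, 0.3], [0.3, 0.7]]  # perturbed kernel, doubly stochastic
xi = [0.5, 0.5]               # stationary distribution of q
```

The rate vanishes exactly when $q = p$, and is strictly positive for any genuinely perturbed kernel, mirroring the fact that $I(\zeta) = 0$ characterizes invariant pairs $\mu \otimes p$.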

Previous LDP for vector valued functionals
With the Donsker-Varadhan rate h G , we have the following theorem (see [GulLipLot94]) which generalizes known results for bounded functions.

Theorem (Full upper LDP [GulLipLot94]).
Let G : E 2 → R d be a continuous function.
1) Assume Assumptions 1 and 2 are satisfied. Let $V \oplus V$ be defined by $V \oplus V(x, y) = V(x) + V(y)$ and assume the domination condition
$$\rho(\|G\|) \le V \oplus V, \qquad (6)$$
where $\rho : \mathbb{R}_+ \to \mathbb{R}_+$ is a function such that $\rho(t)/t \to \infty$ when $t \to \infty$. Then $(G_n)_n$ satisfies an upper LDP with rate $h_G$.
2) If Assumption 3 is also satisfied, the full LDP with rate $h_G$ is valid.
The case studied in Section 2.2 (functional autoregressive models) shows that the domination condition (6) is not easily checked. In Section 2.5, we give self-normalized large deviation principles which would be obvious under assumption (6); this parallels [Sha97] and [DemSha98b], which handled self-normalization in the i.i.d. case to get rid of Cramér's condition.

Partial Large Deviation Principle
We now state our main results extending Dembo-Shao's work ( [DemSha98b]) to our Markovian framework.
Theorem 1 (Self-normalized LDP). Assume that the transition probability $p$ is almost Fellerian and satisfies the criterion of exponential stabilization associated with $(U, V)$. Set
$$\Gamma_r = \big\{x = (x^{(1)}, x^{(2)});\ \rho \circ N(x^{(2)}) \ge r\, x^{(1)}\big\} \quad\text{and}\quad J(r) = h_G(\Gamma_r).$$
$J$ is increasing and left-continuous. In particular, the following Chernoff-like upper bound holds for every compact subset $H$ of $H_M$:
$$\limsup_n\ \sup_{\nu \in H}\ \frac{1}{n} \log P_\nu\Big(\frac{\rho \circ N(G^{(2)}_n)}{G^{(1)}_n} \ge r\Big) \le -J(r),$$
and the sequence $\big(\rho \circ N(G^{(2)}_n)/G^{(1)}_n\big)_n$ satisfies an upper LDP with rate $J(\cdot)$ on $\mathbb{R}_+$.

We obtain the following interesting corollaries.
Corollary 1. Assume that the assumptions and notations of Theorem 1 hold. In addition, suppose that the chain satisfies the strong irreducibility hypothesis with invariant probability measure $\mu$ and that $G^{(1)}$ is integrable with respect to $\mu \otimes p$. Then, for any initial distribution $\nu$, the full partial large deviation principle is valid. Finally, we give the following more explicit corollary, applying Theorem 1 to the function $G = (\|F\|^q, F)$. For $q > 1$, we introduce the notation $J_q(r) = J(r^q)$.
Corollary 2. Let $F$ be a continuous function from $E^2$ into $\mathbb{R}^d$. Assume that the transition probability $p$ is almost Fellerian and satisfies the criterion of exponential stabilization associated with $(U, V)$. Then, for any given $q > 1$ and any compact subset $H$ of $H_M$, the self-normalized sequence $\big(\|F_n\| / ((\|F\|^q)_n)^{1/q}\big)_n$ satisfies an upper LDP with rate function $J_q$ on $\mathbb{R}_+$. c) If, in addition, the chain satisfies the strong irreducibility hypothesis with invariant probability measure $\mu$ and if $\|F\|^q$ is integrable with respect to $\mu \otimes p$, then $J_q(r) > 0$ if $\|\int F\, d\mu \otimes p\| \le r\,\big(\int \|F\|^q\, d\mu \otimes p\big)^{1/q}$, and, for any initial distribution $\nu$, the corresponding lower bound holds with rate $K$, where $K(r) = 0$ if and only if $\|F\|^q$ is integrable with respect to $\mu \otimes p$ and $\|\int F\, d\mu \otimes p\| \ge r\,\big(\int \|F\|^q\, d\mu \otimes p\big)^{1/q}$.

Remark 4.
a) If the function U is bounded above on the compact sets of E, the results hold uniformly over the initial states x ∈ K for any compact subset K of E.
b) If the function U is l.s.c., then H M is a compact subset of P(E) and the results hold uniformly over H M .

Tests on Markovian models
The results stated in Section 2.5 (more particularly Corollary 2) are obviously interesting, as in the i.i.d. case, for obtaining exponential rates in the law of large numbers. For example, the large deviation upper bounds allow us to reproduce Shao's results on the Student statistic stated in 1.1 and to build tests for hypotheses such as "the observed random variable is a Markov chain with transition $p$", with exponentially decreasing levels. Let us be more specific for a test between two hypotheses.
We consider two transition probabilities $(p_i(x, \cdot))$ $(i = 0, 1)$ on $(E, \mathcal{E})$ satisfying Assumptions 1 and 2. Let $(\mu_i)$ $(i = 0, 1)$ be the unique invariant probability measures associated with the $p_i$, and let us assume that there exists a measurable, strictly positive function $h$ such that, for any $x \in E$, $p_1(x, \cdot) = h(x, \cdot)\, p_0(x, \cdot)$; then, for every $n$, the distribution of $(X_0, \ldots, X_n)$ under $P^1_\nu$ has density $f_n = \prod_{i=1}^n h(X_{i-1}, X_i)$ with respect to its distribution under $P^0_\nu$. A natural test of "$p_0$ against $p_1$" has rejection region $\{\frac{1}{n} \log f_n \ge t\}$; the errors are $\alpha_n = P^0_\nu(\frac{1}{n} \log f_n \ge t)$ and $\beta_n = P^1_\nu(\frac{1}{n} \log f_n < t)$. Part d) of Corollary 2 leads to the following result. Let us assume that $L(p_1 \mid p_0) > 0$ or $L(p_0 \mid p_1) > 0$: the models are distinguishable. Then, for any suitable threshold $t$, both error probabilities decrease exponentially fast. For another application of self-normalized large deviations, one can look at [HeSha96].
3 Weak LDP for vector valued functions with Donsker-Varadhan rates

Known results concerning empirical distributions
Several upper LDPs for the empirical distributions $L_n$ have been established, notably by de Acosta [Aco90] and Dupuis-Ellis [DupEll]. The statement that follows is a synthesis of the results we need in our proofs.

About Donsker-Varadhan rate functions
The function $G = (G^{(1)}, G^{(2)})$ considered in Theorem 1 is a particular case of the following class of functions, which we call "balanced couples":
• $G^{(1)}$ is a continuous non-negative function on $E^2$;
• $G^{(2)}$ is continuous from $E^2$ into a separable Banach space $B$ and dominated through a continuous function of $G^{(1)}$.
Besides, if the function $G^{(1)}$ has compact level sets (i.e. if $G^{(1)}$ is a Lyapunov function), then the couple $(G^{(1)}, G^{(2)})$ will be called a "Lyapunov balanced couple".
The following lemma will be proved in section 5.1.
Lemma 1 (Properties of Donsker-Varadhan rate functions).
1) For any function $G$, the function defined by (5) is convex. Hence, its l.s.c.-normalized function $h_G$ is a convex weak rate function.
2) If $G$ is a Lyapunov balanced couple, then $h^*_G$ defined in Theorem 1 is a convex and l.s.c. function.

Upper weak LDP for balanced couples
Theorem 2 (Upper weak LDP for balanced couples).
Assume that $p$ is an almost Fellerian transition on $(E, \mathcal{E})$ that satisfies the criterion of exponential stabilization associated with $(U, V)$. If $G = (G^{(1)}, G^{(2)})$ is a balanced couple, then $((G^{(1)}_n, G^{(2)}_n))_n$ satisfies a uniform upper weak LDP for every initial distribution, with weak rate function $h^*_G(\cdot)$. In other words, for any compact subset $K$ of $\mathbb{R}_+ \times B$ and any compact subset $H$ of $H_M$,
$$\limsup_n\ \sup_{\nu \in H}\ \frac{1}{n} \log P_\nu\big((G^{(1)}_n, G^{(2)}_n) \in K\big) \le -h^*_G(K).$$
In particular, if $R \in\, ]0, \infty[$ and if $C$ is a compact set in $B$, then
$$\limsup_n\ \sup_{\nu \in H}\ \frac{1}{n} \log P_\nu\big(G^{(1)}_n \le R,\ G^{(2)}_n \in C\big) \le -h^*_G([0, R] \times C).$$

Lower LDP
A general lower LDP for sums of Banach space valued additive functionals of a Markov chain has been proved by de Acosta and Ney ([AcoNey98]) with no assumptions other than the irreducibility of the chain and the measurability of the function. Yet, it seems difficult to compare the "spectral rate" for which their lower LDP holds with $h_G$.
Our proof relies on the dynamic programming method developed by Dupuis and Ellis ([DupEll]) for proving the lower LDP, which needs a stronger assumption than standard irreducibility (Condition 8.4.1 in [DupEll]). Therefore, we achieve a less general result than that of de Acosta and Ney, but it holds with the same rate $h_G$ as the upper LDP.
The following Theorem requires strong irreducibility but no assumption about the regularity of p or G.
Theorem 3. If $p$ fulfills the strong irreducibility assumption and if $G : E^2 \to B$ is measurable and integrable with respect to $\mu \otimes p$, then, for every initial distribution $\nu$, the sequence $(G_n)_n$ satisfies, for any open set $U$ of $B$,
$$\liminf_{n \to \infty} \frac{1}{n} \log P_\nu(G_n \in U) \ge -h_G(U).$$

Cramér and Donsker-Varadhan for i.i.d. random vectors
We consider a Polish space E and an i.i.d. E-valued random sequence (X n ) n with distribution µ.
• If $g : E \to B$ is a measurable function (where $B$ is a separable Banach space), the sequence $(g(X_n))_n$ is i.i.d. and $B$-valued, and $(g_n)_n$ satisfies a weak-convex LDP with the Cramér rate
$$h^{Cramer}_g(x) = \sup_{\theta \in B^*}\Big\{\langle \theta, x\rangle - \log \int e^{\langle \theta, g(y)\rangle}\, \mu(dy)\Big\}.$$
This result is due to Bahadur and Zabell [BahZab79] (see also [DemZei] and [Aze]).
Under the Cramér condition $E(\exp(t\|g(X)\|)) < \infty$ (for every $t$ if $B$ is a general separable Banach space, and for some $t > 0$ in the particular situation $B = \mathbb{R}^d$), $h^{Cramer}_g$ is a good rate function and the full LDP holds (see [DonVar76] or [Aco88]).
• On the other hand, the i.i.d. case fits our context with $p(x, \cdot) = \mu(\cdot)$ for all $x \in E$. There always exists a Lyapunov function $V$ such that $\int e^V d\mu < \infty$; hence, the criterion of exponential stabilization associated with $(V, V)$ is satisfied. The strong irreducibility hypothesis is also satisfied, and Theorem 3 holds for any measurable function $g : E \to B$ integrable with respect to $\mu$.
The convex large deviation principle allows us to write, for any $\nu \in \mathcal{P}(E)$ and any $x \in B$, $h_g(x) \ge h^{Cramer}_g(x)$. Hence $h_g \ge h^{Cramer}_g$, and all our upper bound results of Section 2 involving the rate $h_g$ are still valid with $h^{Cramer}_g$ (without assuming the Cramér condition). The lower bound results obtained in Theorem 1 and in Corollaries 1 and 2 hold with the rate $h^{Cramer}_g$, according to the weak-convex theorem.
As a direct consequence of the full upper LDP theorem in the i.i.d. case, $h_g = h^{Cramer}_g$ whenever $E(\exp \rho(\|g(X)\|)) < \infty$ (with $\rho : \mathbb{R}_+ \to \mathbb{R}_+$ satisfying $\rho(t)/t \to \infty$). Moreover, $h_g(x) = h^{Cramer}_g(x)$ whenever $x$ is the gradient of $\theta \mapsto \log \int \exp(\langle \theta, g(y)\rangle)\, \mu(dy)$ at a point $\theta(x)$ belonging to the interior of $\{\theta;\ \int \exp(\langle \theta, g(y)\rangle)\, \mu(dy) < \infty\}$: if $\gamma_{\theta(x)}$ is the probability measure proportional to $\exp(\langle \theta(x), g\rangle)\, \mu$, we have $\int g\, d\gamma_{\theta(x)} = x$ and $h^{Cramer}_g(x) = K(\gamma_{\theta(x)} \mid \mu)$. In view of these two facts, one might ask whether $h_g = h^{Cramer}_g$ is always true. At this point, we cannot answer this question, but we show in the following that it is true for our partial large deviation bounds.
In order to avoid situations in which the Cramér rate is degenerate (for example, when the Laplace transform is infinite everywhere except at 0), it is natural to consider the weak LDP associated with a balanced couple $(g^{(1)}, g^{(2)})$ for which the domain of the Laplace transform contains the set $]-\infty, 0[\, \times B^*$. This is the idea of Dembo and Shao ([DemSha98a], [DemSha98b], [Sha97]). Our paper follows the steps of [DemSha98b], where the authors consider an i.i.d. sequence $(Y_n)_n$ taking values in $B$ and the balanced couple $(\rho \circ N(x), x)$ ($\rho$ and $N$ defined as in Theorem 1). Therefore, when $\rho \circ N(Y)$ is integrable, Corollaries 1 and 2 yield the same self-normalized results as [DemSha98b], with the same rate (namely the Cramér rate). Without assuming that $\rho \circ N(Y)$ is integrable, Theorem 1 and parts a) and b) of Corollary 2 remain valid.

About the full upper LDP
Can we extend the full upper LDP stated in 2.4 to functions $g$ such that $g = O(V)$? The answer is no, as the following counter-example (inspired by [AcoNey98]) shows.

• Description of the Markov chain
Denoting by $E(\cdot)$ the integer part, the states $0$ and $u_m, v_m, w_m$ $(m \in \mathbb{N})$ form a partition of $\mathbb{N}$. We consider the $\mathbb{N}$-valued Markov chain whose transition kernel satisfies, for each $m$,
$$p(0, 0) = p_0, \qquad p(0, u_m) = p_m, \qquad p(u_m, v_m) = p(v_m, w_m) = p(w_m, 0) = 1.$$
This Markov chain is recurrent and aperiodic, with stationary distribution $\mu$ such that
$$\mu(0) = \frac{1}{4 - 3p_0} = c, \qquad \mu(u_m) = \mu(v_m) = \mu(w_m) = c\, p_m \ \text{ for every } m.$$
In order to compute the Donsker-Varadhan rate $I_1$, we must determine which transition kernels $q$ are absolutely continuous with respect to $p$. They are necessarily of the following form: for each $m \in \mathbb{N}$,
$$q(0, u_m) = q_m > 0, \qquad q(u_m, v_m) = q(v_m, w_m) = q(w_m, 0) = 1.$$
• A function $g$. Let $g$ be the function defined by $g(u_m) = m$, $g(v_m) = -2m$ and $g(w_m) = m$ (and $g(0) = 0$).
For any probability measure $\nu$ such that $I_1(\nu) < \infty$, we have $\int g\, d\nu = 0$; hence $h_g(x) = 0$ if $x = 0$ and $h_g(x) = \infty$ elsewhere. On the other hand, if we set $r(t) = \sum_{j \ge t} p_j$ and $a > 0$, then we have, for this function,
$$P_x(g_n \ge a) = P(X_{n-1} = 0)\, r(an), \qquad P_x(g_n \le -a) = P(X_{n-2} = 0)\, r(an).$$
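The identity $\int g\, d\nu = 0$ can be seen by a direct cycle computation (our sketch, taking $g(0) = 0$): any kernel $q \ll p$ moves deterministically along $u_m \to v_m \to w_m \to 0$, so an invariant probability $\nu$ puts equal mass on $u_m$, $v_m$ and $w_m$, and

```latex
\int g\, d\nu = \sum_m \nu(u_m)\,\big(g(u_m) + g(v_m) + g(w_m)\big)
             = \sum_m \nu(u_m)\,(m - 2m + m) = 0 .
```

The cancellation along each cycle is exactly what makes $h_g$ degenerate here.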

Moreover, P(X
Therefore, if the sequence $(p_m)_m$ is such that $\frac{1}{n} \log r(an) \to R(a) > -\infty$ and if $-R$ is continuous and increasing to infinity, then $(g_n)_n$ satisfies an LDP with rate $J(x) = -R(|x|)$ for every initial state (see Lemma 9). The upper weak LDP cannot possibly hold with rate $h_g$. We now check whether these results are compatible with the upper LDP theorem given in Section 2.4.

• A criterion of exponential stabilization
Assume that p satisfies the criterion of exponential stabilization associated with (U, V ), V non-negative, increasing and such that :

Large deviation upper bound and regularity of the function
This example shows that the regularity of the function G is necessary to get an upper LDP (unlike the lower LDP).
We consider a model where $(\varepsilon_i)_i$ is an i.i.d. sequence with distribution $P(\varepsilon_1 = 1) = P(\varepsilon_1 = -1) = \frac{1}{2}$, and we take $E = [-2, 2]$. Let $\zeta$ be an invariant probability for $q_{t(\cdot)}$ and let $D$ be a suitable subset of $[-2, 2]$. We can prove by induction that, if $q_{t(\cdot)}$ is absolutely continuous with respect to $p$, then necessarily its invariant probability measure $\zeta$ satisfies $\zeta(D) = 0$. As a consequence, the rate $h_{I_D}$ is infinite everywhere except at 0, where it vanishes. But, starting from 0, the chain remains in $D$: therefore, we have $\frac{1}{n} \log P_0((I_D)_n = 1) = 0$. According to these two observations, the upper LDP cannot hold for the sequence $((I_D)_n)_n$ with rate $h_{I_D}$.
Similarly, our results about Lyapunov balanced couples are no longer valid when the function is not regular enough: for instance, the same phenomenon occurs for $G(x) = (|x|, I_D(x))$. The upper large deviations theorem given in 2.4 does not apply to every measurable function $G$, even when it is bounded. This remark also applies to our weak upper LDP (Theorem 2), hence to our PLDP.
2) The convexity of $h^*_G$ follows by a similar argument. Let us prove the lower semi-continuity of $h^*_G$. Let $(x_n)_n = (x^{(1)}_n, x^{(2)}_n)_n$ be a sequence of $\mathbb{R}_+ \times B$ converging to $x = (x^{(1)}, x^{(2)})$, and let $\varepsilon > 0$. Assume that $\liminf h^*_G(x_n) < \infty$ (otherwise there is nothing to prove). Let $(\zeta_n)_n$ be a sequence of $\mathcal{P}(E^2)$ such that $I(\zeta_n) \le h^*_G(x_n) + \varepsilon$. Then, as the function $\zeta \mapsto \int G^{(1)}\, d\zeta$ has compact level sets (because $G^{(1)}$ is a l.s.c. Lyapunov function) and as $(x^{(1)}_n)_n$ is bounded, the sequence $(\zeta_n)_n$ is relatively compact.
Then let $(x_{u(n)})_n$ be a subsequence such that $\lim_n h^*_G(x_{u(n)}) = \liminf_n h^*_G(x_n)$ and such that $(\zeta_{u(n)})_n$ converges weakly to some probability measure $\zeta$. For the same reasons of uniform integrability, we have $\int G\, d\zeta = x$.

Proof of the weak upper LDP
Let $G = (G^{(1)}, G^{(2)})$ be a balanced couple. Let $K$ be a compact subset of $\mathbb{R}_+ \times B$ and $\Gamma_K = \{\zeta \in \mathcal{P}(E^2);\ \int G\, d\zeta \in K\}$. Since $G^{(1)}$ is a Lyapunov function, $\Gamma_K$ is a relatively compact subset of $\mathcal{P}(E^2)$. According to the Donsker-Varadhan theorem given in Section 3.1, we have:
Theorem 2 follows from this lemma. Let us prove that $I(\bar{\Gamma}_K) \ge I(\Gamma_K)$. For every given $\zeta \in \bar{\Gamma}_K$, there exists a sequence $(\zeta_n)_n$ of $\Gamma_K$ which converges weakly to $\zeta$.
If the strong irreducibility hypothesis holds with the invariant probability measure $\mu$, we have, for any $x$: This property implies the $\mu$-irreducibility of the Markov chain. According to Nummelin ([Num]), the chain has a small set $C$. In other words, there exist a probability measure $\xi$ on $C$, a real number $h > 0$ and $a > 0$ such that: In particular, $\xi \ll \mu$. If we note $\xi U(C) = \int \xi(dx)\, U(x, C)$ and $R_C = \{\omega;\ \sum_{i \ge 0} I_C(X_i(\omega)) = \infty\}$, two situations can occur:
* $C$ is a transient small set: $\xi U(C) < \infty$ and $P_\xi(R_C) = 0$;
* $C$ is a recurrent small set: $\xi U(C) = \infty$ and $P_\xi(R_C) = 1$.
The transient case is impossible here, because it would imply that $\mu(C) = \xi(C) = 0$. Consequently, $C$ is a recurrent small set. We set $\Gamma_C = \{x;\ P_x(R_C) = 1\}$ and note that $1 = P_\xi(R_C) = P_\xi\big(\sum_{i \ge n} I_C(X_i) = \infty\big) = E_\xi\big(P_{X_n}(R_C)\big)$ for any $n$. Moreover, $P_{X_n}(R_C) = 1$ if and only if $X_n \in \Gamma_C$, hence $\xi p^n(\Gamma_C) = 1$ for all $n$. Therefore, for any $x$ of $E$ and any integer $l \ge L$, we have $p^l(x, \Gamma^c_C) = 0$. Every point of $E$ leads to the recurrent small set $C$ almost surely, and the transition kernel has an invariant probability measure. Therefore, the chain is positive recurrent (see, for example, Theorem 8.2.16 in [Duf]).

Lower Laplace principle
We follow the method developed by Dupuis-Ellis ([DupEll]) to prove the lower LDP for empirical distributions.
* Representation formula. For any initial distribution $\nu$ and every $j \le n$, we introduce the following notations:
$$L_{j,n} = \frac{1}{n} \sum_{i=1}^{j} \delta_{X_{i-1}, X_i}, \qquad L_{n,n} = L_n.$$
Let $f$ be a bounded Lipschitz function from $B$ to $\mathbb{R}$; $\mathcal{F}_j$ is the $\sigma$-field generated by $X_0, \ldots, X_j$.
This is the dynamic programming equation for the controlled Markov chain with state space $E \times B$ and control space $\mathcal{P}(E^2)$; the transition kernel at time $j$ is $Q_{j,n}$:
$$Q_{j,n}(y, r, \zeta;\ \cdot) = \text{distribution of } \Big(Z_2,\ r + \frac{1}{n}\, G(Z_1, Z_2)\Big), \ \text{where } Z = (Z_1, Z_2) \text{ has the distribution } \zeta.$$
The final cost is $(y, r) \mapsto f(r)$ and the running cost is $c_j(y, r, \zeta) = \frac{1}{n} K(\zeta \mid \delta_y \otimes p)$.
* Lower Laplace principle. Let $q$ be a transition kernel of a recurrent Markov chain with stationary distribution $\alpha$ such that $I(\alpha \otimes q) < \infty$ and $\int \|G\|\, d\alpha \otimes q < \infty$.
For a Markov chain $(Y_j)_{0 \le j \le n}$ with initial distribution $\nu$ and transition kernel $q$, $(T_j)_{0 \le j \le n}$ is the controlled Markov chain for the control $\zeta = \delta_y \otimes q$. Consequently, we will take $\nu_\delta = \zeta_{\delta/M}$.
By convexity of the relative entropy, we obtain the required bound. We now prove that $\zeta_t$ belongs to $N$. According to Lemma 4, there exists a transition kernel $q_t$ with stationary distribution $\zeta^1_t$ such that, for all $n$ and any $x \in E$, $q^n_t(x, \cdot) \ll p^n(x, \cdot)$. Moreover, $\zeta^1_t \ll \mu$. Let us show that $q_t$ satisfies the strong irreducibility hypothesis.
Obviously, $\zeta^1_t(\cdot) \ge t\,\mu(\cdot)$. The probability measures $\zeta^1_t$ and $\mu$ are equivalent. We denote by $h$ the density of $\zeta^1_t$ with respect to $\mu$; $h \ge t$. Let $A$ and $B$ belong to $\mathcal{E}$. Consequently, $q_t(x, \cdot) \ge \frac{t}{h(x)}\, p(x, \cdot)$ $\mu$-almost surely. We modify $q_t$ as we did in Lemma 4 for this inequality to hold on all of $E$. This modification raises no problem since $\zeta^1_t \sim \mu$: we change $q_t$ on a set $N$ such that $\zeta^1_t(N) = 0$. Therefore, $\zeta^1_t$ remains invariant for $q_t$, and $q^n_t(x, \cdot) \sim p^n(x, \cdot)$ for every $n \in \mathbb{N}$ and all $x \in E$. ♦
The lower "Laplace principle" is proved: for any bounded Lipschitz function $f$,
$$\liminf_n \frac{1}{n} \log E_\nu\big(e^{-n f(G_n)}\big) \ge -\inf_x \big(f(x) + h_G(x)\big).$$
The lower part of the Bryc theorem ("lower Laplace principle" $\Longrightarrow$ "lower LDP") is proved by considering bounded Lipschitz functions only, and without using the lower semi-continuity of the "rate" (see [DupEll]). Consequently, we have the lower LDP for any initial distribution $\nu$: for any open subset $U$ of $B$,
$$\liminf_{n \to \infty} \frac{1}{n} \log P_\nu(G_n \in U) \ge -h_G(U). \quad ♦$$

6 Proof of the partial LDP

An exponential tightness result
The following result allows us to take advantage of the exponential tightness criteria stated in [DemSha98b].
Lemma 5. Suppose that Assumptions 1 and 2 are satisfied, and let $F$ be a continuous function from $E^2$ into $\mathbb{R}_+$. Then there exists a continuous non-negative function $T$, increasing to infinity, such that $h_{T \circ F}$ is a good rate function and, for every compact subset $H$ of $H_M$ and $r > 0$,
$$\limsup_n\ \sup_{\nu \in H}\ \frac{1}{n} \log P_\nu\big((T \circ F)_n \ge r\big) \le -h_{T \circ F}([r, \infty[).$$

Proof
The result is clear if the function $F$ is bounded. Let us assume that $F$ is unbounded and introduce an appropriate function $\alpha$; this function increases to infinity. Let $\beta$ be a continuous function, strictly increasing to infinity and such that $\beta \ge \alpha$. According to the definition of $\alpha$, we consider a continuous increasing function $k$ such that $k(t)/t \to 0$ as $t \to \infty$. The conditions required to apply the full upper LDP given in Section 2.4 are satisfied with $\rho = k^{-1}(\cdot)$; hence $h_{T \circ F}$ is a good rate function and we have, for every $r > 0$, the stated bound. An immediate consequence of this result is that, for any $\delta > 0$, formula (9) holds. Indeed, $h_{T \circ F}([\delta T(r), \infty[) \to \infty$ as $r \to \infty$, for every given $\delta > 0$. ♦

6.2 Proof of Theorem 1, part a.

Proof
We only need to check that $\rho(t)/t \to \infty$ as $t \to \infty$. There exists $B > 0$ such that, for every $t \ge B + 1$, we have $\rho(t) \ge at + b$, thanks to the convexity of $\rho(\cdot)$. If we set $L(t) = \rho(t)/t$, then $L(t)/L(t^{1/2}) = \rho(t)/(t^{1/2} \rho(t^{1/2}))$, and therefore $L(t) \to \infty$. Indeed, if $(t_n)_n \uparrow \infty$ were a sequence such that $\limsup L(t_n) < \infty$, then we would have $L(t_n)/L(t_n^{1/2}) \le \limsup L(t_n)/a < \infty$, which contradicts our hypothesis on $\rho$. ♦ Now we can apply Theorem 2 to the function $G$: the sequence $(G^{(1)}_n, G^{(2)}_n)_n$ satisfies a weak upper LDP with rate $h^*_G$. According to [DemSha98b], the partial LDP stated in Theorem 1 holds as soon as $(G^{(1)}_n, G^{(2)}_n)_n$ is exponentially tight with respect to the class $\mathcal{S}(\rho \circ N)$; in other words if, for any positive number $R$ and any set $A \in \mathcal{S}(\rho \circ N)$, there exists a compact subset on which the deviation probabilities are controlled. To prove such a statement, we apply formula (9), proved in the previous paragraph, to the function $F = N \circ G^{(2)}$: this yields a bound for every $\delta > 0$. On the other hand, $\rho \circ N$ belongs to the class of functions introduced in Definition 0.1.1 of [DemSha98b]. Therefore, the proof of Lemma 0.1.1 in [DemSha98b] applies to the function $(G^{(1)}, G^{(2)}) = (\rho \circ N(G^{(2)}), G^{(2)})$, and this yields a control for any $\varepsilon > 0$. This property allows us to apply Proposition 0.1.2 of [DemSha98b] to prove that the distributions of $(G^{(1)}_n, G^{(2)}_n)$ are exponentially tight with respect to $\mathcal{S}(\rho \circ N)$. Part a) of Theorem 1 is then proved. ♦
Assume that $h_G(\Gamma_r) = 0$. This implies the existence, for any integer $n$, of a probability measure $\zeta_n$ such that $\int G\, d\zeta_n \in \Gamma_r$ and $I(\zeta_n) \le \frac{1}{n}$. As $I$ is a good rate function, we can consider a subsequence of $(\zeta_n)_n$ that converges weakly to $\zeta$. By construction of $(\zeta_n)_n$, we have $I(\zeta) = 0$ and $\int G\, d\zeta \in \Gamma_r$. The first assertion entails that $\zeta^1 = \zeta^2$ (which we denote $\mu$ in the following) and that $\mu$ is an invariant probability measure for $p$. The second assertion implies that $G^{(1)}$ is integrable with respect to $\zeta = \zeta^1 \otimes p$. For any $r > 0$, the set $\Gamma_r = \big\{(x^{(1)}, x^{(2)});\ \rho \circ N(x^{(2)}) \ge r\, x^{(1)}\big\}$ belongs to the class $\mathcal{S}(\rho \circ N)$. The Chernoff upper bound is an obvious consequence of part a). ♦ We now prove the upper LDP with rate $J$.
Lemma 8. $J$ is a left-continuous, non-decreasing function. In addition, $J$ vanishes at zero and is infinite on $]1, \infty[$.

Proof
If $r' > r$, then $\Gamma_{r'}$ is a subset of $\Gamma_r$, hence $h_G(\Gamma_{r'}) \ge h_G(\Gamma_r)$ and $J$ is non-decreasing. Moreover, for $r > 1$, the set of probability measures $\zeta$ such that $\int G\, d\zeta \in \Gamma_r$ is empty (since the function $\rho \circ N$ is convex) and $J(r) = \infty$.
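The emptiness of this set for $r > 1$ can be sketched via Jensen's inequality (our own rendering, assuming as in Theorem 1 that $G^{(1)} = \rho \circ N(G^{(2)})$ with $\rho$ convex increasing and $N$ a norm, so that $\rho \circ N$ is convex): for any $\zeta$ with $\int G^{(1)}\, d\zeta < \infty$,

```latex
\rho \circ N\Big(\int G^{(2)}\, d\zeta\Big)
   \le \int \rho \circ N\big(G^{(2)}\big)\, d\zeta
    =  \int G^{(1)}\, d\zeta ,
```

hence every attainable point $x = \int G\, d\zeta$ satisfies $\rho \circ N(x^{(2)}) \le x^{(1)}$ and cannot lie in $\Gamma_r$ when $r > 1$ and $x^{(1)} > 0$.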
We now prove the left-continuity of $J$ (and consequently its lower semi-continuity). Let $r \in\, ]0, 1]$ and let $(r_n)_n$ be an increasing sequence with limit $r$ such that $\sup_n J(r_n) < J(r)$. Let then $(x_n)_n$ be an $\mathbb{R}_+ \times B$ valued sequence such that $x_n \in \Gamma_{r_n}$, and let $(\zeta_n)_n$ be a sequence of $\mathcal{P}(E^2)$ such that $\int G\, d\zeta_n = x_n$ and
$$I(\zeta_n) < h_G(x_n) + 1/n < h_G(\Gamma_{r_n}) + 2/n = J(r_n) + 2/n.$$
Therefore $\sup_n I(\zeta_n) < \infty$ and, since $I$ has compact level sets, there exists a subsequence $(\zeta_{u(n)})_n$ which converges weakly to some $\zeta \in \mathcal{P}(E^2)$. The uniform integrability argument used in 5.1 shows that $\big(\int G^{(1)}\, d\zeta, \int G^{(2)}\, d\zeta\big) \in \Gamma_r$. Consequently, since $I$ is l.s.c., we have
$$J(r) = h_G(\Gamma_r) \le I(\zeta) \le \liminf_n I(\zeta_{u(n)}) \le \sup_n J(r_n) < J(r).$$
We reach a contradiction; $J$ is left-continuous, therefore l.s.c. ♦ Since $J$ is l.s.c. and infinite on $]1, \infty[$, it is a good rate function. The upper LDP with rate $J$ comes via the following lemma, taken from [Wor00].
Lemma 9. Let $(\mu_n)_n$ be a sequence of probability measures on $\mathbb{R}_+$. If there exists an $\mathbb{R}_+$-valued function $J$, increasing to infinity, vanishing at zero, left-continuous and such that, for any $r > 0$,
$$\limsup_n \frac{1}{n} \log \mu_n([r, \infty[) \le -J(r),$$
then $(\mu_n)_n$ satisfies an upper LDP with rate $J(\cdot)$ on $\mathbb{R}_+$.
We apply this result to our situation and obtain that $\big(\rho \circ N(G^{(2)}_n)/G^{(1)}_n\big)_n$ satisfies an upper LDP with rate $J$ on $\mathbb{R}_+$.

Proof (taken from [Wor00])
Since $J$ converges to infinity, the sequence $(\mu_n)_n$ is exponentially tight. If we consider the rate $H$ associated with an LDP satisfied by a subsequence, we have, for any $r > 0$, $H(r) \ge H(]r_n, \infty[) \ge J(r_n)$ for a subsequence $(r_n)_n$ increasing to $r$; hence $H(r) \ge J(r)$ by the lower semi-continuity of $J$. ♦

6.4 Proof of Corollary 2, part d.
Applying Theorem 1 to the Borel set $A_r \in \mathcal{S}(|\cdot|^q)$ and using the same arguments as in the proof of Lemma 7, we obtain that $K(r) = 0$ if and only if $\|F\|^q$ is integrable with respect to $\mu \otimes p$ and $\|\int F\, d\mu \otimes p\| \ge r\,\big(\int \|F\|^q\, d\mu \otimes p\big)^{1/q}$. ♦

Appendix
Like the weak Cramér theorem in the i.i.d. case, the weak upper LDP stated in Theorem 2 may be of independent interest. We show in this section that it can easily be checked, without assuming the criterion of exponential stabilization, when the transition is Fellerian.

7.1 Weak LDP for the empirical distributions
The following upper LDP result was essentially proved by Donsker and Varadhan ([DonVar75a] and [DonVar76]).

Theorem
Assume that $p$ is Fellerian. Then $(L_n)_n$ satisfies a uniform upper weak LDP over all initial distributions:
$$\limsup_n\ \sup_{\nu \in \mathcal{P}(E)} \frac{1}{n} \log P_\nu(L_n \in \Gamma) \le -I(\Gamma) \quad \text{for every compact subset } \Gamma \text{ of } \mathcal{P}(E^2).$$

Proof
• Denoting by $LB(E^2)$ the space of Lipschitz bounded functions on $E^2$, endowed with the norm $\|\cdot\|_{LB} = \|\cdot\|_\infty + r(\cdot)$, $r(\cdot)$ being the Lipschitz constant of the function, let $J$ be the weak rate defined on $\mathcal{P}(E^2)$ by
$$J(\zeta) = \sup_{G \in LB(E^2)} \Big\{\int G(x, y)\, \zeta(dx, dy) - \log \int e^{G(x,y)}\, \zeta^1(dx)\, p(x, dy)\Big\}.$$
Indeed, $J$ is l.s.c., because $p$ is Fellerian and hence $\nu \mapsto \nu \otimes p$ is a continuous map.
Setting $p^*G(x) = \log \int e^{G(x,y)}\, p(x, dy)$, we have $p^*G \in C_b(E)$ when $G \in LB(E^2)$, and the following identity is obtained by conditioning, for every initial distribution $\nu$:
$$1 = E_\nu\Big(e^{\sum_{j=1}^n G(X_{j-1}, X_j) - p^*G(X_{j-1})}\Big) = E_\nu\Big(e^{n \int (G(x,y) - p^*G(x))\, L_n(dx, dy)}\Big).$$
Letting $n \to \infty$ and choosing $\lambda$ arbitrarily close to $J(\Gamma)$ yields the following upper bound, for every compact subset $\Gamma \subset \mathcal{P}(E^2)$:
$$\limsup_{n \to \infty}\ \sup_{\nu \in \mathcal{P}(E)} \frac{1}{n} \log P_\nu(L_n \in \Gamma) \le -J(\Gamma). \qquad (10)$$
• The proof will be complete as soon as we prove that (10) holds with the rate $I$ instead of $J$.

Weak LDP for balanced couples
Taking advantage of the result proved in Section 7.1, we can easily check the following altered version of Theorem 2.
Theorem 4. Assume that $p$ is Fellerian. If $G = (G^{(1)}, G^{(2)})$ is a Lyapunov balanced couple on $E^2$, then $((G^{(1)}_n, G^{(2)}_n))_n$ satisfies a uniform weak upper LDP for every initial distribution, with the weak rate function $h^*_G(\cdot)$. In other words, for any compact subset $K$ of $\mathbb{R}_+ \times B$,
$$\limsup_n\ \sup_{\nu \in \mathcal{P}(E)} \frac{1}{n} \log P_\nu\big((G^{(1)}_n, G^{(2)}_n) \in K\big) \le -h^*_G(K).$$