The exponential resolvent of a Markov process and large deviations for Markov processes via Hamilton-Jacobi equations

We study the Hamilton-Jacobi equation f - lambda Hf = h, where H f = e^{-f}Ae^f and where A is an operator that corresponds to a well-posed martingale problem. We identify an operator that gives viscosity solutions to the Hamilton-Jacobi equation, and which can therefore be interpreted as the resolvent of H. The operator is given in terms of an optimization problem where the running cost is a path-space relative entropy. Finally, we use the resolvents to give a new proof of the abstract large deviation result of Feng and Kurtz.


Introduction
Let $E$ be Polish and let $A \subseteq C_b(E) \times C_b(E)$ be an operator such that the martingale problem for $A$ is well posed. In this paper, we study the non-linear operator $H \subseteq C_b(E) \times C_b(E)$ given by all pairs $(f, g)$ such that
$$t \mapsto \exp\left\{ f(X(t)) - f(X(0)) - \int_0^t g(X(s))\, ds \right\} \tag{1.1}$$
is a martingale with respect to $\mathcal{F}_t := \sigma(X(s) \mid s \le t)$, where $X$ is a solution of the well-posed martingale problem for $A$. (If $e^f \in D(A)$, then $(f, e^{-f} A e^f) \in H$.) The operator $H$, the martingales of (1.1) corresponding to $H$, and the semigroup
$$V(t)f(x) = \log E\left[ e^{f(X(t))} \,\middle|\, X(0) = x \right] \tag{1.2}$$
that formally corresponds to $H$ play (possibly after rescaling) a key role in the theory of stochastic control and large deviations of Markov processes, see e.g. [3, 4, 6, 8-10, 17, 18, 21].

Consider a sequence of Markov processes $X_n$. Feng and Kurtz [8] showed in their extensive monograph on the large deviations for Markov processes that the convergence of the non-linear semigroups $V_n(t)$ defined by
$$V_n(t)f(x) = \frac{1}{n} \log E\left[ e^{n f(X_n(t))} \,\middle|\, X_n(0) = x \right]$$
to some appropriate limiting semigroup $V(t)$ is a major step in establishing path-space large deviations for the sequence $X_n$. It is well known in the theory of linear semigroups that the convergence of semigroups $V_n(t)$ to $V(t)$ is essentially implied by the convergence of their infinitesimal generators '$H_n f = \partial_t V_n(t)f|_{t=0}$' to '$Hf = \partial_t V(t)f|_{t=0}$', see e.g. [11, 15, 22]. The results also hold in the non-linear context. However, in the non-linear setting, the relation between semigroup and generator is less clear. To be precise, $V(t)$ is generated by $H$ if we have a resolvent
$$R(\lambda) := (\mathbb{1} - \lambda H)^{-1}, \qquad \lambda > 0, \tag{1.3}$$
which approximates the semigroup in the following way:
$$V(t)f = \lim_{m \to \infty} R\left(\frac{t}{m}\right)^m f. \tag{1.4}$$
To be able to effectively use the Trotter-Kato-Kurtz approximation results in the theory of large deviations or stochastic control, it is therefore important to have a grip on the resolvent that connects the semigroup $V(t)$ to the operator $H$ via (1.3) and (1.4).
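In the linear case, the approximation of a semigroup by powers of its resolvent can be checked numerically on a finite state space. The following is a minimal sketch under stated assumptions: the 3-state generator `A` is a hypothetical example, the resolvent is the linear one $(\mathbb{1} - \lambda A)^{-1}$, and the semigroup is the matrix exponential. It illustrates the linear analogue of (1.3) and (1.4) only, not the non-linear operators studied in this paper.

```python
import numpy as np

# Hypothetical 3-state generator: symmetric jump rates, rows sum to zero.
A = np.array([[-2.0, 1.0, 1.0],
              [1.0, -2.0, 1.0],
              [1.0, 1.0, -2.0]])

def semigroup(t):
    """S(t) = exp(tA), computed from the eigendecomposition of the symmetric A."""
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.exp(t * w)) @ U.T

def resolvent(lam):
    """Linear resolvent R(lam) = (1 - lam A)^{-1}."""
    return np.linalg.inv(np.eye(3) - lam * A)

def approx_error(t, m):
    """Largest entrywise distance between R(t/m)^m and S(t)."""
    powered = np.linalg.matrix_power(resolvent(t / m), m)
    return float(np.abs(powered - semigroup(t)).max())

errors = [approx_error(1.0, m) for m in (1, 10, 100, 1000)]
print(errors)
```

The error decreases as $m$ grows, which is the finite-dimensional version of the statement that the resolvent approximates the semigroup.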
An important first step in this direction was made in [8] by replacing the Markov process X by an approximating jump process with bounded generator. Indeed, in the case of bounded A one can establish the existence of (1.3) by using fixed-point arguments. [8] then proceed to establish path-space large deviations for sequences of Markov processes using probabilistic approximation arguments, semigroup convergence (Trotter-Kato-Kurtz) and the theory of viscosity solutions to characterize the limiting semigroup.
A second observation is that in the context of diffusion processes, or for operators $H$ that are first-order, it is not clear that one can actually invert $(\mathbb{1} - \lambda H)$ due to issues with the domain: solutions of the Hamilton-Jacobi equation $f - \lambda Hf = h$ can have non-differentiable points. However, one can often give a family of operators $R(\lambda)$ in terms of a deterministic control problem that yield viscosity solutions to the equation $f - \lambda Hf = h$. An extension $\widehat{H}$ of $H$ can then be defined in terms of $R$ such that the operator $\widehat{H}$ and the semigroup $V(t)$ are connected as in (1.3) and (1.4). This paper therefore has a two-fold aim.
(1) Identify an operator R(λ) in terms of a control problem, which yields viscosity solutions to f − λHf = h where H is in terms of the martingales of (1.1). This we aim to do in the context of general (Feller) Markov processes.
(2) Give a new proof of the main large deviation result of [8] by using the operators R(λ).
Regarding (1), we will show that the operators $R(\lambda)$ defined as
$$R(\lambda)h(x) = \sup_{Q \in \mathcal{P}} \left\{ \int_0^\infty \left( \int h(X(t))\, Q(dX) - S_t(Q \mid P_x) \right) \tau_\lambda(dt) \right\} \tag{1.5}$$
give viscosity solutions to the Hamilton-Jacobi equation for $H$. That is: $R(\lambda)h$ is a viscosity solution to
$$f - \lambda H f = h. \tag{1.6}$$
Here $S_t(Q \mid P_x)$ is the relative entropy of $Q$ with respect to the solution of the martingale problem started at $x$ evaluated up to time $t$, and $\tau_\lambda$ is the law of an exponential random variable with mean $\lambda$. Our proof that $R(\lambda)h$ is a viscosity solution to the Hamilton-Jacobi equation will be carried out using a variant of a result by [8] extended to an abstract context in [13]. The family $\{R(\lambda)\}_{\lambda > 0}$ of (1.5) gives viscosity solutions to (1.6) if
(a) for all $(f, g) \in H$ we have $R(\lambda)(f - \lambda g) = f$,
(b) $R(\lambda)$ is contractive and a pseudo-resolvent. That is: $\|R(\lambda)\| \le 1$ and for all $h \in C_b(E)$ and $0 < \alpha < \beta$ we have
$$R(\beta)h = R(\alpha)\left( R(\beta)h - \alpha \frac{R(\beta)h - h}{\beta} \right).$$
In other words: if $R(\lambda)$ serves as a classical left-inverse to $\mathbb{1} - \lambda H$ and is also a pseudo-resolvent, then it is a viscosity right-inverse of $(\mathbb{1} - \lambda H)$.
To finish the analysis towards goal (1), we need to establish that our resolvent approximates the semigroup:
(c) For the resolvent in (1.5) it holds that $V(t)h = \lim_{m \to \infty} R(t/m)^m h$, where the semigroup is given by (1.2).
This result follows from the intuition that the sum of $n$ independent exponential random variables of mean $t/n$ converges to $t$. The difficulty lies in analysing the concatenation of suprema as in (1.5), which will be carried out using suitable upper and lower bounds.

The second goal, (2), of this paper is to reprove the main large deviation result of [8]. The general procedure is as follows:
• Given exponential tightness, one can restrict the analysis to the finite-dimensional distributions.
• One establishes the large deviation principle for finite-dimensional distributions by assuming this is true at time 0 and by proving that rescaled versions of the semigroups (1.2) of conditional log-moment generating functions converge.
• One proves convergence of the infinitesimal generators H n → H and establishes well-posedness of the Hamilton-Jacobi equation f − λHf = h to obtain convergence of the semigroups.
This paper follows the same general strategy, but establishes the third step in a new way. Instead of working with the resolvent of approximating Markov jump processes, the proof in this paper is based on a semigroup approximation argument of [13] combined with the explicit identification of the resolvents corresponding to the non-linear operators H n .
We give a short comparison of the result in this paper to the main result in [8]. Our condition on the convergence of Hamiltonians H n → H is slightly simpler than the one in [8]. This is due to being able to work with the Markov process itself instead of an approximating jump process. The result in this paper is a bit weaker in the sense that we assume the solutions to the martingale problems are continuous in the starting point, as opposed to only assuming measurability in [8]. This is to keep the technicalities as simple as possible, and it is expected that this can be generalized. In addition, [8] establishes a result for discrete time processes, which we do not carry out here. This extension should be possible too.

The paper is organized as follows. We start in Section 2 with preliminary definitions. In Section 3 we state the main results on the resolvent. In addition to the announced results (a), (b) and (c) we also obtain that R(λ) is a continuous map on C b (E). Proofs of continuity of R(λ), in addition to various other regularity properties, are given in Section 5; the proofs of (a), (b) and (c) are given in Section 6. In Section 4 we state a simple version of the large deviation result. A more general version and its proof are given in Section 7.

Preliminaries
Let $E$ be a Polish space. $C_b(E)$ denotes the space of continuous and bounded functions. Denote by $\mathcal{B}(E)$ the Borel σ-algebra of $E$. Denote by $M(E)$ and $M_b(E)$ the spaces of measurable and bounded measurable functions $f : E \to [-\infty, \infty]$ and denote by $\mathcal{P}(E)$ the space of Borel probability measures on $E$. $\|\cdot\|$ will denote the supremum norm on $C_b(E)$. In addition to considering uniform convergence we consider the compact-open and strict topologies:
• The compact-open topology $\kappa$ on $C_b(E)$ is generated by the semi-norms $p_K(f) = \sup_{x \in K} |f(x)|$, where $K$ ranges over all compact subsets of $E$.
• The strict topology $\beta$ on $C_b(E)$ is generated by all semi-norms $p_{K_n, a_n}(f) := \sup_n a_n \sup_{x \in K_n} |f(x)|$, varying over non-negative sequences $a_n$ converging to $0$ and sequences of compact sets $K_n \subseteq E$. See e.g. [14, 20, 23].
As we will often work with the convergence of sequences for the strict topology, we characterize this convergence and give a useful notion of taking closures. A sequence $f_n$ converges to $f$ for the strict topology if and only if $f_n$ converges to $f$ bounded and uniformly on compacts (buc):
$$\sup_n \|f_n\| < \infty, \qquad \lim_{n \to \infty} \sup_{x \in K} |f_n(x) - f(x)| = 0 \quad \text{for all compact } K \subseteq E.$$
We denote by $D_E(\mathbb{R}^+)$ the Skorokhod space of trajectories $X : \mathbb{R}^+ \to E$ that have left limits and are right-continuous. We equip this space with its usual topology, see [7, Chapter 3]. As $D_E(\mathbb{R}^+)$ is our main space of interest, we write $\mathcal{P} := \mathcal{P}(D_E(\mathbb{R}^+))$.

Let $\mathcal{X}$ be a general Polish space (e.g. $E$ or $D_E(\mathbb{R}^+)$). For two measures $\mu, \nu \in \mathcal{P}(\mathcal{X})$ we denote by
$$S(\nu \mid \mu) := \begin{cases} \int \log \frac{d\nu}{d\mu}\, d\nu & \text{if } \nu \ll \mu, \\ \infty & \text{otherwise,} \end{cases}$$
the relative entropy of $\nu$ with respect to $\mu$. For any sub-sigma algebra $\mathcal{F}$ of $\mathcal{B}(\mathcal{X})$, we denote by $S_{\mathcal{F}}$ the relative entropy when the measures are restricted to the σ-algebra $\mathcal{F}$. In the text below, we will often work with the space $D_E(\mathbb{R}^+)$. We will then write $S_t$ for the relative entropy when we restrict to $\mathcal{F}_t := \sigma(X(s) \mid s \le t)$.
Finally, for $\lambda > 0$, denote by $\tau_\lambda \in \mathcal{P}(\mathbb{R}^+)$ the law of an exponential random variable with mean $\lambda$:
$$\tau_\lambda(dt) = \mathbb{1}_{\{t \ge 0\}}\, \lambda^{-1} e^{-\lambda^{-1} t}\, dt.$$
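For intuition, the relative entropy can be computed explicitly on a finite set, where measures are probability vectors. The sketch below is illustrative only: the function name `relative_entropy` and the vectors `mu` and `nu` are hypothetical, and the convention that the entropy is infinite when absolute continuity fails is the standard one assumed here.

```python
import math

def relative_entropy(nu, mu):
    """S(nu | mu) for probability vectors on a finite set; inf if nu is not << mu."""
    total = 0.0
    for q, p in zip(nu, mu):
        if q == 0.0:
            continue            # convention: 0 * log 0 = 0
        if p == 0.0:
            return math.inf     # nu charges a point that mu does not
        total += q * math.log(q / p)
    return total

mu = [0.5, 0.25, 0.25]
nu = [0.25, 0.25, 0.5]

s_nu_mu = relative_entropy(nu, mu)
s_mu_mu = relative_entropy(mu, mu)
s_singular = relative_entropy(nu, [1.0, 0.0, 0.0])
print(s_nu_mu, s_mu_mu, s_singular)
```

The three values illustrate strict positivity for distinct measures, vanishing on the diagonal, and the value infinity when $\nu$ is not absolutely continuous with respect to $\mu$.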

The martingale problem
Given an operator $A$ on $C_b(E)$ with domain $D(A)$ and a measure $\nu \in \mathcal{P}(E)$, we say that $P \in \mathcal{P}(D_E(\mathbb{R}^+))$ solves the martingale problem for $(A, \nu)$ if $P \circ X(0)^{-1} = \nu$ and if for all $f \in D(A)$ the process
$$M_f(t) := f(X(t)) - f(X(0)) - \int_0^t Af(X(s))\, ds$$
is a martingale with respect to its natural filtration $\mathcal{F}_t := \sigma(X(s) \mid s \le t)$ under $P$.
We say that uniqueness holds for the martingale problem if for every $\nu \in \mathcal{P}(E)$ the set of solutions of the martingale problem that start at $\nu$ has at most one element. Furthermore, we say that the martingale problem is well-posed if this set contains exactly one element for every $\nu$.

Viscosity solutions of Hamilton-Jacobi equations
Consider an operator If B is single valued and (f, g) ∈ B, we write Bf := g. We denote D(B) for the domain of B and R(B) for the range of B.
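Consider the equation
$$f - \lambda B f = h. \tag{2.1}$$
• We say that a bounded upper semi-continuous function $u : E \to \mathbb{R}$ is a subsolution of (2.1) if for all $(f, g) \in B$ there is a sequence $x_n \in E$ such that
$$\lim_{n \to \infty} u(x_n) - f(x_n) = \sup_x u(x) - f(x), \tag{2.2}$$
$$\limsup_{n \to \infty} u(x_n) - \lambda g(x_n) - h(x_n) \le 0. \tag{2.3}$$
• We say that a bounded lower semi-continuous function $v : E \to \mathbb{R}$ is a supersolution of (2.1) if for all $(f, g) \in B$ there is a sequence $x_n \in E$ such that
$$\lim_{n \to \infty} v(x_n) - f(x_n) = \inf_x v(x) - f(x), \tag{2.4}$$
$$\liminf_{n \to \infty} v(x_n) - \lambda g(x_n) - h(x_n) \ge 0. \tag{2.5}$$
• We say that $u$ is a solution of (2.1) if it is both a subsolution and a supersolution.
• We say that (2.1) satisfies the comparison principle if for every subsolution $u$ and supersolution $v$, we have $\sup_x u(x) - v(x) \le 0$.
Note that the comparison principle implies uniqueness of viscosity solutions.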

Convergence of operators
there are (f n , g n ) ∈ B n such that β−lim n f n = f and β−lim g n = g.

Large deviations
Definition 2.4. Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables on a Polish space $\mathcal{X}$. Furthermore, consider a function $I : \mathcal{X} \to [0, \infty]$ and a sequence $\{r_n\}_{n \ge 1}$ of positive real numbers such that $r_n \to \infty$. We say that
• the function $I$ is a rate-function if the set $\{x \mid I(x) \le c\}$ is closed for every $c \ge 0$. We say $I$ is good if the sub-level sets are compact.
• the sequence $\{X_n\}_{n \ge 1}$ is exponentially tight at speed $r_n$ if, for every $a \ge 0$, there exists a compact set $K_a \subseteq \mathcal{X}$ such that $\limsup_n r_n^{-1} \log P[X_n \notin K_a] \le -a$.
• the sequence $\{X_n\}_{n \ge 1}$ satisfies the large deviation principle with speed $r_n$ and good rate-function $I$ if for every closed set $A \subseteq \mathcal{X}$, we have
$$\limsup_{n \to \infty} r_n^{-1} \log P[X_n \in A] \le - \inf_{x \in A} I(x),$$
and if for every open set $U \subseteq \mathcal{X}$,
$$\liminf_{n \to \infty} r_n^{-1} \log P[X_n \in U] \ge - \inf_{x \in U} I(x).$$
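A standard one-dimensional illustration of the definition is the Gaussian family $X_n \sim N(0, 1/n)$, which is known to satisfy the large deviation principle with speed $r_n = n$ and rate function $I(x) = x^2/2$. The sketch below evaluates the scaled log-probability of the set $[a, \infty)$ exactly through the complementary error function. The function names are hypothetical and the example is not taken from the paper.

```python
import math

def scaled_log_tail(n, a):
    """(1/n) log P[X_n >= a] for X_n ~ N(0, 1/n), via the complementary error function."""
    tail = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))
    return math.log(tail) / n

def rate(x):
    """Rate function I(x) = x^2 / 2 of the Gaussian family at speed r_n = n."""
    return 0.5 * x * x

values = [scaled_log_tail(n, 1.0) for n in (25, 100, 400)]
print(values, -rate(1.0))
```

As $n$ grows the values approach $-\inf_{x \ge 1} I(x) = -1/2$, matching both the upper bound for the closed set $[1, \infty)$ and the lower bound for the open set $(1, \infty)$.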

The non-linear resolvent of a Markov process
Our main result is based on the assumption that the martingale problem is well-posed and that the solution map in terms of the starting point is continuous.
is an operator such that the martingale problem The map x → P x is assumed to be continuous for the weak topology on P = P(D E (R + )).
We introduce the triplet of key objects in semigroup theory: generator, resolvent, and semigroup.

Definition 3.2. (a) Let H be a collection of pairs
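are martingales with respect to the filtration $\mathcal{F}_t := \sigma(X(s) \mid s \le t)$ and law $P_x$.
(c) For $t \ge 0$ and $h \in C_b(E)$, define
$$V(t)h(x) := \log E\left[ e^{h(X(t))} \,\middle|\, X(0) = x \right] = \sup_{Q \in \mathcal{P}} \left\{ \int h(X(t))\, Q(dX) - S_t(Q \mid P_x) \right\}.$$
Note that the final equality follows by Lemma A.1.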
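The variational representation of a log-moment generating function as a supremum of "expectation minus relative entropy" can be verified directly on a finite set. The sketch below assumes that this Donsker-Varadhan type duality is the content used from Lemma A.1; the reference law `P`, the function `h`, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([0.2, 0.5, 0.3])      # hypothetical reference law on three points
h = np.array([1.0, -0.5, 2.0])     # hypothetical bounded function

def objective(Q):
    """Integral of h under Q minus the relative entropy S(Q | P)."""
    mask = Q > 0
    return float(Q @ h - np.sum(Q[mask] * np.log(Q[mask] / P[mask])))

log_moment = float(np.log(P @ np.exp(h)))      # log E_P[e^h]
Q_star = P * np.exp(h) / (P @ np.exp(h))       # exponentially tilted measure

# Randomly sampled competitors never exceed the log-moment generating function.
random_values = [objective(rng.dirichlet(np.ones(3))) for _ in range(1000)]
print(log_moment, objective(Q_star), max(random_values))
```

The tilted measure `Q_star` attains the supremum, which is the finite-state counterpart of the optimizers used repeatedly in the proofs.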
The following is an immediate consequence of [ The first main result of this paper is the following. The proof of this result follows in Section 6. To facilitate further use of the nonlinear resolvent, we establish also that (a) The map R(λ) maps C b (E) into C b (E).
(b) The operators $R(\lambda)$ act as the resolvent of the semigroup $\{V(t)\}_{t \ge 0}$.
These properties will allow us to use our main result to establish large deviations in a later part of the paper, see Section 7. We state (a) and (b) as Propositions.
for the strict topology.
Proposition 3.5 will be verified in Section 5, in which we will also verify other regularity properties of R(λ). Proposition 3.6 is a part of our main results connecting the resolvent and semigroup and will be established in Section 6.

Strategy of the proof of Theorem 3.4 and discussion on extensions
Theorem 3.4 will follow as a consequence of Proposition 3.4 of [13]. We therefore have to check three properties of $R(\lambda)$:
(a) For all $(f, g) \in H$ we have $R(\lambda)(f - \lambda g) = f$.
(b) The pseudo-resolvent property: for all $h \in C_b(E)$ and $0 < \alpha < \beta$ we have
$$R(\beta)h = R(\alpha)\left( R(\beta)h - \alpha \frac{R(\beta)h - h}{\beta} \right).$$
(c) $R(\lambda)$ is contractive.
We verify (c) in Section 5 as it relates to the regularity of the resolvent. We verify (a) and (b) in Sections 6.1 and 6.2 respectively. As is known from the theory of weak convergence, the resolvent is related to exponential integrals.
• (a) is related to integration by parts: for bounded measurable functions $z$ on $\mathbb{R}^+$, we have
$$\lambda \int_0^\infty z(t)\, \tau_\lambda(dt) = \int_0^\infty \int_0^t z(s)\, ds\, \tau_\lambda(dt).$$
• (b) is related to a more elaborate property of exponential random variables.
• Finally, the approximation property of Proposition 3.6 is essentially a law of large numbers. The sum of $n$ independent exponential random variables of mean $t/n$ converges to $t$.
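The law of large numbers behind the approximation property can be illustrated by simulation. The sketch below draws sums of $n$ independent exponential random variables with mean $t/n$ and reports their empirical standard deviation, which should be close to the theoretical value $t/\sqrt{n}$. The sample size, the seed, and the helper name `sum_of_exponentials` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sum_of_exponentials(t, n, samples=20000):
    """Samples of the sum of n independent exponentials, each with mean t / n."""
    return rng.exponential(scale=t / n, size=(samples, n)).sum(axis=1)

t = 2.0
draws = {n: sum_of_exponentials(t, n) for n in (1, 10, 100)}
means = {n: float(x.mean()) for n, x in draws.items()}
spread = {n: float(x.std()) for n, x in draws.items()}
print(means)
print(spread)  # theoretical standard deviation is t / sqrt(n)
```

The means stay near $t$ while the spread shrinks, so the random time at which the resolvent evaluates the process concentrates at the deterministic time $t$.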
In the non-linear setting, our resolvent is given in terms of an optimization problem over an exponential integral. Thus, our method is aimed towards treating the optimisation procedures by careful choices of measures and decomposition and concatenation of relative entropies by using Proposition A.3 and then using the properties of exponential integrals.
Any of the results mentioned in the above section can be carried out by introducing an extra scaling parameter into the operators. For $r > 0$, let $H[r]$ be the collection of pairs $(f, g)$ such that
$$t \mapsto \exp\left\{ r \left( f(X(t)) - f(X(0)) - \int_0^t g(X(s))\, ds \right) \right\}$$
are martingales. As above, we have $\{ (f, r^{-1} e^{-rf} g) \mid (e^{rf}, g) \in A \} \subseteq H[r]$. Relatively straightforwardly, chasing the constant $r$, one can show that $R[r](\lambda)h$ gives viscosity solutions to $f - \lambda H[r] f = h$.

Question 3.8. To some extent one could wonder whether Theorem 3.4 has an extension where $H_\dagger$ is a collection of pairs $(f, g)$ such that the processes of the form (1.1) are supermartingales, and where $H_\ddagger$ is a collection of pairs $(f, g)$ such that they are submartingales. The statement would become that for each $h \in C_b(E)$ and $\lambda > 0$ the function $R(\lambda)h$ is a viscosity subsolution to $f - \lambda H_\dagger f = h$ and a viscosity supersolution to $f - \lambda H_\ddagger f = h$. Indeed, some of the arguments in Section 6 can be carried out for sub- and supermartingales respectively. Certain arguments, however, use that we work with martingales. For example, Lemma A.1 holds for probability measures only.

Large deviations for Markov processes
In this section, we consider the large deviations on D E (R + ) of a sequence of Markov processes X n . In Section 7 below, we will instead consider the more general framework where the X n take their values in a sequence of spaces E n that are embedded in E by a map η n and where the images η n (E n ) converge in some appropriate way to E. As this introduces a whole range of technical complications, we restrict ourselves in this section to the most simple case.
be linear operators and let r n be positive real numbers such that r n → ∞. Suppose that • The martingale problems for A n are well-posed. Denote by x → P n x the solution to the martingale problem for A n .
• For each $n$, the map $x \mapsto P^n_x$ is continuous for the weak topology on $\mathcal{P}(D_E(\mathbb{R}^+))$.
• For all compact sets $K \subseteq E$ and $a \ge 0$ there is a compact set $K_a \subseteq D_E(\mathbb{R}^+)$ such that
$$\limsup_{n \to \infty} \sup_{x \in K} \frac{1}{r_n} \log P^n_x[X_n \notin K_a] \le -a.$$
The first two conditions correspond to Condition 3.1. The final one states that we have exponential tightness of the processes $X_n$ uniformly in the starting position in a compact set.
Corresponding to the previous section, define the operators H n consisting of pairs g(X n (s))ds are martingales. Also define the rescaled log moment-generating functions Theorem 4.2. Let Condition 4.1 be satisfied. Let r n > 0 be some sequence such that r n → ∞. Suppose that (a) The large deviation principle holds for X n (0) on E with speed r n and good rate function I 0 .
In addition, the processes X n satisfy a large deviation principle on D E (R + ) with speed r n and rate function ).

(4.1)
Here ∆ c γ is the set of continuity points of γ. The conditional rate functions I t are given by

Remark 4.3.
A representation for I in a Lagrangian form can be obtained by the analysis in Chapter 8 of [8]. To some extent the analysis is similar to the one of this paper. First, one identifies the resolvent as a deterministic control problem by showing that it solves the Hamilton-Jacobi equation in the viscosity sense. Second, one shows that it approximates a control-semigroup. Third, one uses the control-semigroup to show that (4.1) is also given in terms of the control problem.

Regularity of the semigroup and resolvent
The main object of study of this paper is the resolvent introduced in Definition 3.2. Before we start with the main results, we first establish that the resolvent itself is 'regular':
• We establish that h → R(λ)h is sequentially continuous for the strict topology.
• We establish that lim λ↓0 R(λ)h = h for the strict topology.
Before starting with analysing the resolvent, we establish regularity properties for the cost function that appears in the definition of R(λ).

Properties of relative entropy
A key property of Legendre transformation is that convergence of convex functionals implies (and is often equivalent to) Gamma convergence of their convex duals. This can be derived from a paper of Zabell [24]. In the context of weak convergence of measures this has recently been established with a direct proof by Mariani in Proposition 3.2 of [16]. We state the result for completeness.
(a) $\mu_n \to \mu$ weakly,
(1) The Gamma lower bound: for any sequence $\nu_n \to \nu$ we have $\liminf_n S(\nu_n \mid \mu_n) \ge S(\nu \mid \mu)$.
(2) The Gamma upper bound: for any $\nu$ there are $\nu_n$ such that $\nu_n \to \nu$ and $\limsup_n S(\nu_n \mid \mu_n) \le S(\nu \mid \mu)$.
Our resolvent is given in terms of the cost functional
$$S_\lambda(Q \mid P) := \int_0^\infty S_t(Q \mid P)\, \tau_\lambda(dt).$$
Below, we establish Gamma convergence for $S_\lambda$.
• The Gamma $\liminf_n$ inequality, in addition to the compactness of the level sets (coercivity) of $S_\lambda$, is established in Lemma 5.2.
• In Proposition 5.3 we strengthen the coercivity to allow for compactness of the level sets of S λ uniformly for small λ (equi-coercivity). This property will allow us to study R(λ) uniformly for small λ.
• The Gamma lim sup n inequality is established in Proposition 5.4.

The Γ − lim inf inequality and coercivity
Lemma 5.2. For any λ > 0 the map is lower semi-continuous. In addition, the map has compact sublevel sets in the following sense: fix a compact set K ⊆ P(D E (R + )) and c 0. Then the set Proof. The first claim follows by lower semi-continuity of (P, Q) → S t (Q | P) and Fatou's lemma. For the second claim note that a set A ⊆ P( is compact for all t, see Theorem 3.7.2 in [7]. Thus, fix t and suppose Q ∈ A(c). Then there is some P ∈ K such that The result now follows by Proposition A.4.
The final estimate in the above proof is not uniform for small λ. This is due to the fact that the exponential random variables $\tau_\lambda$ concentrate near 0. Thus, we can only control the relative entropies for small intervals of time, after which the measure Q is essentially free to do what it wants. Equi-coercivity of the level sets can be recovered to some extent by restricting the interval on which one is allowed to tilt the measure.
Proof. First recall that a set of measures in $\mathcal{P}$ is compact if the set of their restrictions to a finite time interval is relatively compact. Pick $P \in K$ and $0 < \lambda \le \lambda_0$ and let $Q^* \in \mathcal{P}$ be such that $S_\lambda(Q^* \mid P) \le c$. We obtain By the remark at the start of the proof, this set is compact by Proposition A.4.

The Γ − lim sup inequality: construction of a recovery sequence
For the proof of the Γ − lim inf inequality, we could use Proposition 5.1 and Fatou.
In the context of the Γ − lim sup inequality, we run into the following issue. Given a sequence $x_n \to x$ and fixed time $t$, the result of Proposition 5.1 will allow us to construct a sequence $Q_n$ converging to $Q$ such that $\limsup_n S_t(Q_n \mid P_{x_n}) \le S_t(Q \mid P_x)$. This statement can, however, not immediately be lifted to the functional $S_\lambda$ as the construction gives no information on times $s \ne t$. But using the Markovian structure of the family $\{P_y\}_{y \in E}$ and continuity of these measures in $y$ will allow us to construct measures $Q_n$ converging to $Q$ such that also $\limsup_n S_\lambda(Q_n \mid P_{x_n}) \le S_\lambda(Q \mid P_x)$. This construction will be carried out via a projective limit argument.
Then, there are measures Q n ∈ P(D E (R + )) that converge to Q. In addition We infer from Fatou's lemma that also We will construct the measures Q n by arguing via appropriately chosen finitedimensional projections of Q. Thus, we need to establish a conditional version of the lim sup n inequality for Gamma convergence of relative entropy functionals. We state and prove this conditional result first, after which we prove Proposition 5.4.
Suppose that this family of measures is a version of the regular conditional measures µ n (· | x) and also of {µ(· | x)} x∈X .
Then there are measures $\nu_n \in \mathcal{P}(X \times Y)$ converging to $\nu$ such that the restriction of $\nu_n$ to $X$ equals $\nu_{n,0}$ and $\limsup_{n \to \infty} S(\nu_n \mid \mu_n) \le S(\nu \mid \mu)$.
Proof. First of all, note that if $S(\nu \mid \mu) = \infty$, the proof is trivial. Thus, assume $S(\nu \mid \mu) < \infty$. Denote by $\nu(\cdot \mid x)$ a version of the regular conditional probability of $\nu$ conditional on $x \in X$. By the Skorokhod representation theorem, [2, Theorem 8.5.4], we can find a probability space $(\Omega, \mathcal{A})$ and a measure $\kappa$ on $(\Omega, \mathcal{A})$, and random variables $X_n, X : \Omega \to X$ such that the random variables $X_n$ and $X$ under the law $\kappa$ have distributions $\nu_{n,0}$ and $\nu_0$ and such that $X_n$ converges to $X$ $\kappa$-almost surely. Thus, by assumption, there is a set $B \in \mathcal{A}$ of $\kappa$ measure 1 on which $X_n \to X$ and on which $\mu_n(\cdot \mid X_n) = \tilde{\mu}(\cdot \mid X_n)$ converges to $\mu(\cdot \mid X) = \tilde{\mu}(\cdot \mid X)$. It follows by Proposition 5.1 that on this set there are measures $\pi_n(\cdot \mid X_n)$ such that $\pi_n(\cdot \mid X_n) \to \nu(\cdot \mid X)$ weakly and
$$\limsup_n S(\pi_n(\cdot \mid X_n) \mid \mu_n(\cdot \mid X_n)) \le S(\nu(\cdot \mid X) \mid \mu(\cdot \mid X)).$$
We could construct a sequence of measures $\nu_n$ out of $\nu_0$ and the conditional kernels $\pi_n$. To establish the $\limsup_n$ inequality for the relative entropies, however, we will need to interchange a $\limsup_n$ and an integral by using Fatou's lemma. At this point, we are not able to give a dominating function that will allow the application of Fatou. To solve this issue, we will use $\pi_n$ only when its relative entropy is not too large.
We start with the proof of (1). By construction and Proposition A.3, we have In line 3, we used Fatou's lemma, using as an upper bound the function S(ν(· | X) | µ(· | X))+ 1. This function has finite κ integral as Next, we establish (2): ν n → ν. By (1) and Proposition A.4 the collection of measures ν n is tight. As a consequence, it suffices to establish that hdν n → hdν for a strictly dense set of functions h that is also an algebra by the Stone-Weierstrass theorem for the strict topology. Clearly, the set of linear combinations of functions of the form h(x, y) = f(x)g(y) is an algebra that separates points. Thus, it suffices to establish convergence for h(x, y) = f(x)g(y) only. For h of this form, we have By the weak convergence of ν n (· | X n ) to ν(· | X) on a set of κ measure 1, we find by the dominated convergence theorem that This establishes that hdν n → hdν for h(x, y) = f(x)g(y) and thus that ν n → ν.
Proof of Proposition 5.4. First of all: we can choose finite collections of times T k := . . } such that: For any k, we find by Lemma 5.5 and induction over the finite collection of times in T k that there are measures Q k n ∈ P(D E (R + )) such that (1) for all t t imax(k) : Thus, we obtain for all t 0 that which implies by Proposition A.4 that the family Q k n is tight. By construction, i.e. Lemma 5.5, the restrictions of the measures Q k n to the set of times T k converge to the restriction of Q to the times in T k . A straightforward diagonal argument can be used to find k(n) such that restriction of the measures Q n := Q k(n) n to the union k T k to Q restricted to the union k T k . This however, establishes that Q n converges to Q by Theorem 3.7.8 of [7].

Regularity of the resolvent in x
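We proceed with the proof of Proposition 3.5: establishing $R(\lambda)h \in C_b(E)$. For the proof of upper semi-continuity of $x \mapsto R(\lambda)h(x)$ we use the following technical result that we state for completeness.
Lemma 5.6 (Lemma 17.30 in [1]). Let X and Y be two Polish spaces. Let φ :
Proof of Proposition 3.5. Fix $\lambda > 0$ and $h \in C_b(E)$. Denote as before
$$S_\lambda(Q \mid P) = \int_0^\infty S_t(Q \mid P)\, \tau_\lambda(dt)$$
to shorten notation. By Lemma 5.2 the map $Q \mapsto S_\lambda(Q \mid P_x)$ has compact sublevel sets and is lower semi-continuous. As $h$ is bounded we have
$$R(\lambda)h(x) = \sup_{Q \in \Gamma_x} \left\{ \int_0^\infty \int h(X(t))\, Q(dX)\, \tau_\lambda(dt) - S_\lambda(Q \mid P_x) \right\},$$
where $\Gamma_x := \{Q \in \mathcal{P} \mid S_\lambda(Q \mid P_x) \le 2\|h\|\}$. Note that $\Gamma_x$ is non-empty and compact. Due to the lower semi-continuity of $S_\lambda$ and the continuity of the integral over $h$, it follows that $x \mapsto R(\lambda)h(x)$ is upper semi-continuous by Lemma 17.30 of [1] if the collection of sets $\Gamma_x$ is upper hemi-continuous; or in other words: if $Q_n \in \Gamma_{x_n}$ and $(x_n, Q_n) \to (x, Q)$ then $Q \in \Gamma_x$. This, however, follows directly from the lower semi-continuity of $S_\lambda$.
Next, we establish lower semi-continuity of $x \mapsto R(\lambda)h(x)$. Let $x_n$ be a sequence converging to $x$. Pick $Q$ so that
$$R(\lambda)h(x) = \int_0^\infty \int h(X(t))\, Q(dX)\, \tau_\lambda(dt) - S_\lambda(Q \mid P_x).$$
It follows by Proposition 5.4 that there are $Q_n \in \mathcal{P}(D_E(\mathbb{R}^+))$ such that $Q_n \to Q$ and $\limsup_n S_\lambda(Q_n \mid P_{x_n}) \le S_\lambda(Q \mid P_x)$. We obtain that
$$\liminf_n R(\lambda)h(x_n) \ge \liminf_n \left\{ \int_0^\infty \int h(X(t))\, Q_n(dX)\, \tau_\lambda(dt) - S_\lambda(Q_n \mid P_{x_n}) \right\} \ge R(\lambda)h(x),$$
establishing lower semi-continuity.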

Regularity of the resolvent in h
We proceed with establishing that the resolvent is sequentially strictly continuous in h, uniformly for small λ.
Pick an arbitrary λ such that 0 < λ λ 0 . For x ∈ K, let Q x,λ ∈ P be the measure such that Denote by T (λ) := −λ log δ 2||h1−h2|| . Then it follows that Now denote by Q x,λ the measure that equals Q x,λ on the time interval [0, T (λ)] and satisfies S T (λ) ( Q x,λ |P x ) = S( Q x,λ |P x ). By Proposition 5.3 the set of the measures Q x,λ , x ∈ K, 0 < λ λ 0 , is relatively compact, which implies we can find a K ⊆ E with probability (1 − δ 2 ) the trajectories stay in K. We conclude that for all λ such that 0 < λ λ 0 .

Strong continuity of the resolvent and semigroup
We establish that as λ ↓ 0 the resolvents converge to the identity operator. We also establish strict continuity of the semigroup.

Lemma 5.8.
For h ∈ C b (E) we have lim λ→0 R(λ)h = h for the strict topology.
Proof. As $\|R(\lambda)h\| \le \|h\|$, strict convergence $\lim_{\lambda \to 0} R(\lambda)h = h$ follows by proving uniform convergence on compact sets $K \subseteq E$. If we choose for $Q$ the measure $P_x$ in the defining supremum of $R(\lambda)h(x)$, we obtain the upper bound As the measures $\{P_x\}_{x \in K}$ are tight, we have control on the modulus of continuity of the trajectories $t \mapsto X(t)$. This implies that the right-hand side converges to 0 as $\lambda \downarrow 0$ uniformly for $x \in K$.
We prove the second inequality. Fix $\varepsilon \in (0, 4\|h\|)$; we prove that for $\lambda$ sufficiently small, we have $\sup_{x \in K} R(\lambda)h(x) - h(x) \le \varepsilon$. First of all, let $T(\lambda) := -\lambda \log \frac{\varepsilon}{4\|h\|}$ and let $Q_{x,\lambda}$ optimize $R(\lambda)h(x)$. We then have Also note that as in Lemma 5.7 we have $S_\lambda(Q_{x,\lambda} \mid P_x) \le 2\|h\|$. This implies, using that $t \mapsto S_t$ is increasing in $t$, that Denote by $\widehat{Q}_{x,\lambda}$ the measures that equal $Q_{x,\lambda}$ up to time $T(\lambda)$ and satisfy Now let $\lambda \le \lambda^* := \left( \log\left( 4\|h\| \varepsilon^{-1} \right) \right)^{-1}$. Then $T(\lambda) \le 1$ and we obtain for all $s \le 1$ that

Proof. The map $t \mapsto S(t)e^f$ is strictly continuous by Theorem 3.1 of [14] and bounded away from 0. Thus a straightforward verification shows that also $V(t)f = \log S(t)e^f$ is strictly continuous.
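The relation $V(t)f = \log S(t)e^f$ between the non-linear and the linear semigroup can be explored numerically on a finite state space. The following sketch assumes a hypothetical 3-state generator `A` and computes the matrix exponential by a truncated Taylor series. It checks the semigroup property of $V$ and compares a difference quotient at time zero with $Hf = e^{-f}Ae^f$.

```python
import numpy as np

# Hypothetical generator of a 3-state jump process (rows sum to zero).
A = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 2.0, -2.0]])

def S(t, terms=60):
    """Linear semigroup exp(tA) via a truncated Taylor series."""
    result, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ (t * A) / k
        result = result + term
    return result

def V(t, f):
    """Non-linear semigroup V(t)f = log S(t) e^f."""
    return np.log(S(t) @ np.exp(f))

def H(f):
    """Non-linear generator Hf = e^{-f} A e^f."""
    return np.exp(-f) * (A @ np.exp(f))

f = np.array([0.3, -1.0, 0.7])
semigroup_gap = float(np.abs(V(0.4, V(0.6, f)) - V(1.0, f)).max())
eps = 1e-6
generator_gap = float(np.abs((V(eps, f) - f) / eps - H(f)).max())
print(semigroup_gap, generator_gap)
```

Both gaps are small, which matches the formal statement that $H$ is the derivative of $V(t)$ at $t = 0$.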

Measurability of the optimal measure
In Section 6 below, we will apply the resolvent to the resolvent. This means we have to perform an optimization procedure twice. In particular, this implies we have to integrate over the outcome of the first supremum. To treat this procedure effectively, we need measurability of the optimizing measure.
Lemma 5.10. Let h ∈ C b (E) and λ > 0. There exists a measurable map x → Q x such that Q x ∈ P and We base the proof of this result on a measurable-selection theorem. We state it for completeness.
Theorem 5.11 (Theorem 6.9.6 in [2]). Let X, Y be Polish spaces and let Γ be a measurable subset of X × Y. Suppose that the set Γ x := {y | (x, y) ∈ Γ } is non-empty and σ-compact for all x ∈ X. Then Γ contains the graph of a Borel measurable mapping f : X → Y.
We will apply this result below by using the following argument. Let f, g be measurable maps f, g : which is the inverse image of {0} and hence measurable.
Proof of Lemma 5.10. We aim to apply Theorem 5.11. Thus, we have to establish that the set Γ ⊆ E × P defined by is measurable and that Γ x := {Q | (x, Q) ∈ Γ } is non-empty and σ-compact.
Similarly as in the proof of Proposition 3.5, we find that Γ x is compact and nonempty. We also saw in that proof that the map ( is upper semi-continuous. As x → R(λ)h(x) is continuous by Proposition 3.5 we see that the set Γ is the set of points where two measurable functions agree implying that Γ is measurable. An application of Theorem 5.11 concludes the proof.

Proofs of the main results
In this section, we prove the two main results: Theorem 3.4 and Proposition 3.6. We argued in Section 3.1 that the first result follows by establishing that $R(\lambda)$ is a classical left-inverse of $(\mathbb{1} - \lambda H)$ and that the family $R(\lambda)$ is a pseudo-resolvent. We establish these two properties in Sections 6.1 and 6.2. The proof of Proposition 3.6 is carried out in Section 6.3.

R(λ) is a classical left-inverse of $\mathbb{1} - \lambda H$
The proof that $R(\lambda)$ is a classical left-inverse of $\mathbb{1} - \lambda H$ is based on a well-known integration by parts formula for the exponential distribution: for bounded measurable functions $z$ on $\mathbb{R}^+$ we have
$$\lambda \int_0^\infty z(t)\, \tau_\lambda(dt) = \int_0^\infty \int_0^t z(s)\, ds\, \tau_\lambda(dt).$$
A generalization is given by the following lemma.
Lemma 6.1. Fix λ > 0 and Q ∈ P(D E (R + )). Let z be a measurable function on E.
Then we have The lemma allows us to rewrite the application of R(λ) to f − λg in integral form.
The integral that comes out can be analyzed using the definition of $H$ in terms of exponential martingales. This leads to the desired result.
Proof. Fix $\lambda > 0$, $x \in E$ and $(f, g) \in H$. We start by proving $R(\lambda)(f - \lambda g)(x) \le f(x)$.
Set $h = f - \lambda g$. By Lemma 6.1 we have By optimizing the integrand, we find by Lemma A.1 As $(f, g) \in H$ we can reduce the inner expectation to time 0 by using the martingale property. This yields establishing the first inequality.
We now prove the reverse inequality $R(\lambda)(f - \lambda g)(x) \ge f(x)$. To do so, we construct a measure $Q$ that achieves the supremum. For each time $t \ge 0$, define the measure $Q_t$ via the Radon-Nikodym derivative
$$\frac{dQ_t}{dP_x}(X) = \exp\left\{ f(X(t)) - f(X(0)) - \int_0^t g(X(s))\, ds \right\}.$$
Note that as $t \mapsto \exp\left\{ f(X(t)) - f(X(0)) - \int_0^t g(X(s))\, ds \right\}$ is a $P_x$ martingale, we have for $s \le t$ that $Q_t|_{\mathcal{F}_s} = Q_s|_{\mathcal{F}_s}$. Thus, standard arguments show that there is a measure $Q \in \mathcal{P}$ such that $Q|_{\mathcal{F}_t} = Q_t|_{\mathcal{F}_t}$. Note that by construction, we have $Q(X(0) = x) = 1$. Using this measure $Q$, applying Lemma 6.1, we obtain establishing the second inequality.
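The integration by parts identity for the exponential distribution that underlies this argument can be checked by numerical quadrature. The sketch below assumes the identity in the form $\lambda \int z\, d\tau_\lambda = \int \int_0^t z(s)\, ds\, \tau_\lambda(dt)$ and uses the hypothetical test function $z(t) = \cos t$, for which both sides equal $\lambda / (1 + \lambda^2)$. The grid and truncation point are arbitrary choices.

```python
import numpy as np

lam = 0.7
t = np.linspace(0.0, 40.0 * lam, 200001)   # truncate where the density is negligible
dt = t[1] - t[0]
density = np.exp(-t / lam) / lam            # density of tau_lambda

def trapezoid(y):
    """Trapezoidal rule on the uniform grid t."""
    return float(dt * (y.sum() - 0.5 * (y[0] + y[-1])))

z = np.cos(t)
# Z(t) = int_0^t z(s) ds, accumulated with the trapezoidal rule.
Z = np.concatenate(([0.0], np.cumsum(0.5 * (z[1:] + z[:-1]) * dt)))

lhs = lam * trapezoid(z * density)
rhs = trapezoid(Z * density)
exact = lam / (1.0 + lam ** 2)
print(lhs, rhs, exact)
```

The two quadratures agree with each other and with the closed form, which is the scalar mechanism that turns the time integral of $g$ into a factor $\lambda$ in the proof above.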

R is a pseudo-resolvent
The next step is the verification that the family of operators R(λ) is a pseudoresolvent. As in the previous section, this property is essentially an extension of a key property of the exponential distribution. We state it as a lemma that can be verified using basic calculus. Lifting this property to the family R(λ) yields the pseudo-resolvent property.

Proposition 6.4. For all $h \in C_b(E)$, $x \in E$, and $0 < \alpha < \beta$, we have
$$R(\beta)h(x) = R(\alpha)\left( R(\beta)h - \alpha \frac{R(\beta)h - h}{\beta} \right)(x). \tag{6.2}$$
Note that the right-hand side of (6.2) can be rewritten as
$$R(\alpha)\left( \left(1 - \frac{\alpha}{\beta}\right) R(\beta)h + \frac{\alpha}{\beta} h \right)(x). \tag{6.3}$$
To establish (6.2) we establish two inequalities. To do so, we will consider two techniques. First, to prove that the right-hand side is dominated by the left-hand side, we need to concatenate optimizers. To establish the other inequality, we will take an optimizer for $R(\beta)h$ and make a time-dependent splitting, so that we can dominate the first part in the first optimization, and the second by the second optimization in (6.3). The proof of Proposition 6.4 will be carried out in the next two sections. Both proofs are inspired by the proof of Lemma 8.20 of [8] where the pseudo-resolvent property is established for the deterministic case.
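For comparison, the pseudo-resolvent identity is elementary in the linear case, where $R(\lambda) = (\mathbb{1} - \lambda A)^{-1}$ for a bounded generator $A$. The sketch below verifies it on a hypothetical 3-state generator and also checks contractivity in the supremum norm. It is a linear analogue only and does not involve the optimization over path measures used in this paper.

```python
import numpy as np

# Hypothetical generator of a 3-state jump process (rows sum to zero).
A = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 2.0, -2.0]])
h = np.array([1.0, -2.0, 0.5])

def R(lam, g):
    """Linear resolvent (1 - lam A)^{-1} applied to g."""
    return np.linalg.solve(np.eye(3) - lam * A, g)

alpha, beta = 0.3, 1.2
lhs = R(beta, h)
rhs = R(alpha, lhs - alpha * (lhs - h) / beta)
gap = float(np.abs(lhs - rhs).max())
contractive = bool(np.abs(lhs).max() <= np.abs(h).max() + 1e-12)
print(gap, contractive)
```

The identity holds up to rounding because $A R(\beta)h = (R(\beta)h - h)/\beta$, so the argument of $R(\alpha)$ equals $(\mathbb{1} - \alpha A) R(\beta) h$.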

Concatenating measures
In this section, we will prove that

(6.4)
We start by introducing the procedure of concatenating measures. For s ≥ 0 and X, Y ∈ D_E(R^+) such that X(s) = Y(0), define the concatenation $\kappa^s_{X,Y} \in D_E(\mathbb{R}^+)$ by $\kappa^s_{X,Y}(t) = X(t)$ for t ≤ s and $\kappa^s_{X,Y}(t) = Y(t - s)$ for t ≥ s. For Q ∈ P(D_E(R^+)) and a map q : D_E(R^+) → P(D_E(R^+)) with q(X) = Q_X that is F_s-measurable and supported on a set such that Y(0) = X(s), define the measure $Q \odot_s q\,(dZ) = \int\!\!\int \delta_{\kappa^s_{X,Y}}(dZ)\, Q_X(dY)\, Q(dX)$. (6.5) Before turning to the proof of (6.4), we compute the relative entropy of $Q \odot_s q$.
Lemma 6.5. Fix s, t > s and X ∈ D_E(R^+). We have
Proof. Fix s, t > s and X ∈ D_E(R^+). Define the measure $Q_{s,X}(dZ) = \int Q_X(dY)\, \delta_{\kappa^s_{X,Y}}(dZ)$.
It follows by definition that $Q \odot_s q\,(dZ) = \int Q(dX)\, Q_{s,X}(dZ)$ and that $Q_{s,X}$ is the regular conditional measure of $Q \odot_s q$ conditioned on F_s. Denote by $P_{[0,s],X}$ the measure P_x conditioned on F_s. Proposition A.3, applied for the conditioning on F_s yields Both measures $Q_{s,X}$ and $P_{[0,s],X}$ are supported by trajectories that equal X on the time interval [0, s]. Shifting both measures by s, we find Q_X (as defined above) and, by the Markov property, $P_{X(s)}$. As this shift is an isomorphism of measure spaces, we find which establishes the claim.
Proof of (6.4). Fix h ∈ C_b(E), x ∈ E and 0 < α < β. We aim to establish (6.4) by taking the optimizers for both optimization procedures on the right-hand side and concatenating them. This yields a new measure that also turns up in the optimization procedure on the left-hand side, thus establishing the claim. For the concatenation, we use Lemma 6.5 to put together the relative entropies of both procedures, and we finish with Lemma 6.3 to obtain the correct integral form. Thus, let Q ∈ P be the optimizer of For any y ∈ E let Q_y ∈ P be the optimizer of Fix s ≥ 0. We established in Lemma 5.10 that the map q defined by q(y) := Q_y is measurable. Thus, using Q and q, we define $Q_s := Q \odot_s q$ as in (6.5). By definition of R(β)h(x), we find We treat both terms on the right-hand side separately.
Using Lemma 6.5 in line 3 below, we find that the second term equals Thus, for each fixed s, we find a lower bound for R(β)h(x). If we multiply this inequality by the probability density $\frac{\beta - \alpha}{\alpha\beta}\, e^{(\beta^{-1} - \alpha^{-1})s}$ on R^+ and integrate over s, we find The integrals of the terms in lines three, five and six immediately simplify to integration over τ_α(ds) and τ_β(dt). The two other integrals can be simplified by using that for nice functions G we have Plugging in also the equality β−α Note that the terms in the third and fourth line together give − Changing the roles of s and t in the double integrals, we arrive at the inequality By our choice of $Q_{X(t)}$, we see that indeed which establishes (6.4).

Decomposing measures
In this section, we will prove that The main step in the proof is to decompose the measure that turns up as the optimizer in the variational problem defining R(β)h. Fix x ∈ E and let Q ∈ P such that By general measure theoretic arguments, we can find for every fixed t an F_t-measurable family of measures X → Q t,X such that and such that if Q t,X is restricted to trajectories up to time t we find δ_X. Denote by Q t,X the measure that is obtained from Q t,X under the push-forward map θ_t(X)(s) = X(t + s).
Thus, Q t,X is supported by trajectories such that Y(0) = X(t) (for Q almost all X).
Proof of (6.7). As in Section 6.2.1, we obtain that Thus, if we can prove that then we obtain (6.7) by replacing Q t,X by its optimum to obtain R(β)h(X(t)) in the integrand and afterwards optimizing to obtain R(α). This, however, follows as in the proof of the first inequality in Section 6.2.1.

A variational semigroup generated by the resolvent
We conclude this section by proving Proposition 3.6, that is, we establish that the resolvent approximates the semigroup. Again, the key idea is to reduce to a property of exponential distributions. This time, we use that the sum of n independent exponential random variables with mean t/n converges to t. As the resolvent is defined in terms of an optimization procedure, we cannot directly apply this intuition. However, we will use natural upper and lower bounds for concatenations of R(λ) that we can control. The result will follow immediately from Lemmas 6.7 and 6.8 below. We start with the definition of some additional operators. For each distribution τ ∈ P(R^+) and For all τ and h, we have $T^+(\tau)h \geq T^-(\tau)h$. For exponential random variables τ_λ or fixed times t, we find
Lemma 6.7. For τ_1, τ_2, we have
Proof. The first claim follows by similar, but easier, arguments as in the proof of (6.4) in Section 6.2.1. Similarly, for the second claim, we refer to the arguments in Section 6.2.2.
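The exponential-distribution fact used at the start of this section, that a sum of n independent exponential random variables with mean t/n concentrates at t, can be checked numerically. The sketch below is only an illustration with hypothetical function names: it compares the Laplace transform of the sum, $(1 + st/n)^{-n}$, with $e^{-st}$, which is the scalar analogue of the approximation of a semigroup by iterated resolvents, and records the variance $t^2/n$ of the sum.

```python
import math

def laplace_of_sum(s, t, n):
    # E[exp(-s (T_1 + ... + T_n))] for i.i.d. exponential T_i with mean t/n.
    return (1.0 + s * t / n) ** (-n)

def variance_of_sum(t, n):
    # Each summand has variance (t/n)^2, so the sum has variance t^2/n.
    return n * (t / n) ** 2

S, T = 1.5, 2.0
ERRORS = [abs(laplace_of_sum(S, T, n) - math.exp(-S * T)) for n in (1, 10, 100, 1000, 10000)]
print(ERRORS)
```

The errors decay roughly like 1/n, and the variance tends to zero, so the law of the sum converges weakly to the point mass at t.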
Lemma 6.8. Let h ∈ C_b(E) and t ∈ R^+ and let τ_n ∈ P(R^+) be such that τ_n → δ_t. Then we have lim for the strict topology. In addition, we have for each sequence x_n → x that as well as $\sup_n \|T^-(\tau_n)h\| \leq \|h\|$.
Proof. Fix h ∈ C_b(E) and a sequence τ_n and t such that τ_n → δ_t. Note that it is immediate that $\sup_n \|T^+(\tau_n)h\| \leq \|h\|$ and $\sup_n \|T^-(\tau_n)h\| \leq \|h\|$. We proceed by establishing strict convergence for $T^+(\tau_n)h$. By Lemma A.1, we have By Lemma 5.9 the map t → V(t)f is continuous for the strict topology. Thus strict continuity of $\tau \mapsto T^+(\tau)h$ follows. For the second statement, fix x_n converging to x in E. Let Q ∈ P(D_E(R^+)) such that and such that $S_t(Q \,|\, P_x) = S(Q \,|\, P_x)$.
By Proposition 5.4, we can find Q_n ∈ P_n such that Q_n → Q and such that for each s we have $\limsup_n S_s(Q_n \,|\, P_{x_n}) \leq S_s(Q \,|\, P_x)$ and $S_s(Q_n \,|\, P_{x_n}) \leq S_s(Q \,|\, P_x) + 1$ for all n and s. These properties imply that S t (Q n | P xn ) + 1 if s ≤ t + 1. (6.10) Thus, applying the lim inf_n to $T^-(\tau_n)h(x_n)$, we find As Q_n → Q and τ_n → τ and the map s → X(s) is continuous at t for Q almost every X as Q ≪ P_x, the first term converges to $\int h(X(t))\,Q(dX)$. For the second term, we obtain by (6.10) and the property that We conclude that $\liminf_n T^-(\tau_n)h(x_n) \geq V(t)f(x_n)$.

A large deviation principle for Markov processes
In Section 4, we considered a sequence of Markov processes on a Polish space E and stated a large deviation principle on D E (R + ). In this section, we prove a more general version of this result that takes into account variations that one runs into in practice. As a first generalization, we consider Markov processes t → X n (t) on a sequence of spaces E n that are embedded into some space E using maps η n : E n → E.
As an example X n could be a process on E n := {−1,1} n , whereas we are interested in the large deviation behaviour of the average of the n values which takes values in E = [−1,1]. In Theorem 4.2, we assumed exponential tightness and that certain sequences of functions converge. We need to modify these two concepts to allow for a sequence of spaces.
• We want to establish convergence of functions that are defined on different spaces. We therefore need a new notion of bounded and uniform convergence on compact sets. The key step in this definition will be to assign to each compact set K ⊆ E a sequence of compact sets K_n ⊆ E_n so that η_n(K_n) 'converge' to K. In fact, to have a little more flexibility in our assignment of compact sets, we will work below with a large index set Q so that to each q ∈ Q we associate compact sets $K_n^q$ ⊆ E_n and $K^q$ ⊆ E.
• Exponential tightness and buc convergence can be exploited together to make sure we get proper limiting statements. As our notion of buc convergence changes, we have to adapt our notion of exponential tightness to take into account the index set Q.
We make two additional generalizations that are useful in practice.
• Often, it is hard to find an operator H that is the limit of the sequence H_n. Rather, one finds upper and lower bounds H† and H‡ for the sequence H_n. See also Question 3.8 on whether at the pre-limit level one is able to work with upper and lower bounds.
• In the context of averaging or homogenisation, the natural limiting operator H is a subset of C b (E) × C b (F), where F is some space that takes into account additional information. For example F = E × R, where the additional component R takes into account the information of a fast process or a microscopic scale.
We thus start with a section on preliminaries that allows us to talk about these four extensions.
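The example E_n = {−1, 1}^n with η_n the empirical average, mentioned at the start of this section, can be made concrete with a short sketch. The function names below are hypothetical; the code enumerates the image η_n(E_n) ⊆ [−1, 1] for small n and measures how well it approximates a given point of E = [−1, 1].

```python
from itertools import product

def eta(x):
    # Embedding eta_n : {-1, 1}^n -> [-1, 1], the average of the n values.
    return sum(x) / len(x)

def image(n):
    # The finite set eta_n(E_n), i.e. the points -1 + 2k/n for k = 0, ..., n.
    return sorted({eta(x) for x in product((-1, 1), repeat=n)})

def distance_to_image(z, n):
    # Distance from z in [-1, 1] to the image of eta_n.
    return min(abs(z - y) for y in image(n))

print(image(4))
print(distance_to_image(0.3, 10))
```

Every point of [−1, 1] lies within distance 1/n of η_n(E_n), which is the sense in which the images of the spaces E_n fill out E as n grows.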

Preliminary definitions
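Definition 7.1 (Kuratowski convergence). Let $\{A_n\}_{n \geq 1}$ be a sequence of subsets in a space E. We define the limit superior and limit inferior of the sequence as
$\limsup_{n \to \infty} A_n := \{x \in E \mid \forall\, U \in \mathcal{U}_x \ \forall\, N \geq 1 \ \exists\, n \geq N : A_n \cap U \neq \emptyset\}$,
$\liminf_{n \to \infty} A_n := \{x \in E \mid \forall\, U \in \mathcal{U}_x \ \exists\, N \geq 1 \ \forall\, n \geq N : A_n \cap U \neq \emptyset\}$,
where $\mathcal{U}_x$ is the collection of open neighbourhoods of x in E. If $A := \limsup_n A_n = \liminf_n A_n$, we write $A = \lim_n A_n$ and say that A is the Kuratowski limit of the sequence $\{A_n\}_{n \geq 1}$.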

Embedding spaces
Our main result will be based on the following setting.

Assumption 7.2.
We have spaces E_n and E, F and continuous maps $\eta_n : E_n \to E$, $\hat{\eta}_n : E_n \to F$ and a continuous surjective map γ : F → E such that the following diagram commutes: In addition, there is a directed set Q (a partially ordered set such that every two elements have an upper bound). For each q ∈ Q, we have compact sets $K_n^q \subseteq E_n$ and compact sets $K^q \subseteq E$ and $\hat{K}^q \subseteq F$ such that (a) If $q_1 \leq q_2$, we have $K^{q_1} \subseteq K^{q_2}$, $\hat{K}^{q_1} \subseteq \hat{K}^{q_2}$ and for all n we have $K_n^{q_1} \subseteq K_n^{q_2}$.
Remark 7.3. Note that (b) implies that $\limsup_n \hat{\eta}_n(K_n^q) \subseteq \hat{K}^q$ and together with (d) that $\limsup_n \eta_n(K_n^q) \subseteq K^q$. Thus, the final three conditions imply that the sequences $\eta_n(K_n^q)$ for various q ∈ Q cover all compact sets in E, and are also covered by compact sets in E (in fact this final statement holds on the larger space F). This implies that the index set Q connects the structure of compact sets in E and F in a suitable way to (a subset of) the compact sets of the sequence E_n.
We use our index set Q to extend our notion of bounded and uniform convergence on compact sets.
Definition 7.4. Let Assumption 7.2 be satisfied. For each n let f_n ∈ M_b(E_n) and
• if for all q ∈ Q and $x_n \in K_n^q$ converging to $x \in K^q$ we have
• if for all q ∈ Q $\lim_{n \to \infty} \sup_{x \in K_n^q} |f_n(x) - f(\eta_n(x))| = 0$.

Viscosity solutions of Hamilton-Jacobi equations
Below we will introduce a more general version of viscosity solutions compared to Section 2. One recovers the old definition by taking B† = B‡ = B, F = E and γ(x) = x. (7.2)
• We say that u : E → R is a subsolution of equation (7.1) if u ∈ USC_u(E) and if, for all (f, g) ∈ B† such that $\sup_x u(x) - f(x) < \infty$ there is a sequence y_n ∈ F such that $\lim_{n \to \infty} u(\gamma(y_n)) - f(\gamma(y_n)) = \sup_x u(x) - f(x)$ and $\limsup_{n \to \infty} u(\gamma(y_n)) - g(y_n) - h_1(\gamma(y_n)) \leq 0$.

(7.4)
• We say that v : E → R is a supersolution of equation ( and $\liminf_{n \to \infty} v(\gamma(y_n)) - g(y_n) - h_2(\gamma(y_n)) \geq 0$. (7.6)
• We say that u is a solution of the pair of equations (7.1) and (7.2) if it is both a subsolution for B† and a supersolution for B‡.
• We say that (7.1) and (7.2) satisfy the comparison principle if for every subsolution u to (7.1) and supersolution v to (7.2), we have

Notions of convergence of Hamiltonians
We now introduce our notion of upper and lower bound for the sequence H n .
(a) The extended sub-limit ex-subLIM_n H_n is defined by the collection and if for any q ∈ Q and sequence $z_{n(k)} \in K^q_{n(k)}$ (with k → n(k) strictly increasing) such that $\lim_k \hat{\eta}_{n(k)}(z_{n(k)}) = y$ in F with $\lim_k f_{n(k)}(z_{n(k)}) = f(\gamma(y)) < \infty$ we have $\limsup_{k \to \infty} g_{n(k)}(z_{n(k)}) \leq g(y)$.
and if for any q ∈ Q and sequence $z_{n(k)} \in K^q_{n(k)}$ (with k → n(k) strictly increasing) such that $\lim_k \hat{\eta}_{n(k)}(z_{n(k)}) = y$ in F with $\lim_k f_{n(k)}(z_{n(k)}) = f(\gamma(y)) > -\infty$ we have $\liminf_{k \to \infty} g_{n(k)}(z_{n(k)}) \geq g(y)$.
(7.13)
Remark 7.8. The conditions in (7.8) and (7.11) are implied by LIM f_n = f. Conditions (7.9) and (7.10) are implied by LIM_n g_n ≤ g, whereas conditions (7.12) and (7.13) are implied by LIM_n g_n ≥ g.
Comparing this to Definition 2.3, we indeed see that the sub and super-limit can be interpreted as upper and lower bounds instead of limits.

Large deviations for Markov processes
We proceed by stating our main large deviation result, which extends Theorem 4.2.
We first give the appropriate generalization of Condition 4.1.
be linear operators and let r n be positive real numbers such that r n → ∞. Suppose that • The martingale problems for A n are well-posed on E n . Denote by x → P n x the solution to the martingale problem for A n .
• For each n, the map $x \mapsto P^n_x$ is continuous for the weak topology on $P(D_{E_n}(\mathbb{R}^+))$.
• For each a_1 > 0 there is a q ∈ Q such that lim sup
• [Exponential compact containment] For each q ∈ Q, T > 0 and a_2 > 0 there exists $\hat{q} = \hat{q}(q, T, a_2) \in Q$ such that $\limsup_{n \to \infty} \sup_{y \in K_n^q} \frac{1}{r_n} \log P\left[\exists\, t \leq T : Y_n(t) \notin K_n^{\hat{q}} \,\middle|\, Y_n(0) = y\right] \leq -a_2$.
Note that these conditions can be mapped to the ones of Condition 4.1, except for the third one. In Theorem 4.2, we assumed the large deviation principle at time 0, which implies this remaining condition if E_n = E. Here, however, we need to assume that the mass is concentrated already on a q ∈ Q before the maps η_n.
Theorem 7.10. Suppose that we are in the setting of Assumption 7.2 and that Condition 7.9 is satisfied. Denote X_n = η_n(Y_n). Define the operator semigroup are martingales with respect to $\mathcal{F}^n_t := \sigma\{Y_n(s) \mid s \leq t\}$. Suppose furthermore that (a) The large deviation principle holds for X_n(0) = η_n(Y_n(0)) with speed r_n and good rate function I_0.
(b) The processes X n = η n (Y n ) are exponentially tight on D E (R + ).
(c) There are two operators Suppose that for all h ∈ D and λ > 0 the comparison principle holds for viscosity subsolutions to f − In addition, the processes X n = η n (Y n ) satisfy a large deviation principle on D E (R + ) with speed r n and rate function ).

(7.14)
Here $\Delta^c_\gamma$ is the set of continuity points of γ. The conditional rate functions I_t are given by We proceed with two remarks on how to obtain exponential tightness of the processes and the variational representation of the rate function. We start with the exponential tightness. The verification of exponential tightness of the processes η_n(Y_n) comes down to verifying two statements. The first one is exponential compact containment, which has been assumed in Condition 7.9. The second one is to control the oscillations of the process, which can often be achieved by considering the exponential martingales. This has been done in the proof of Corollary 4.19 of [8]. We state it for completeness, including a definition that we need in its statement.
Definition 7.11. Let q be a metric that generates the topology on E. We say that D ⊆ C_b(E) approximates the metric q if for each compact K ⊆ E and z ∈ K there exist f_n ∈ D such that $\lim_n \sup_{x \in K} |f_n(x) - q(x, z)| = 0$.
Proposition 7.12 (Corollary 4.19 [8]). Suppose that we are in the setting of Assumption 7.2 and Condition 7.9. Let r_n > 0 be some sequence such that r_n → ∞. Denote X_n = η_n(Y_n). Let D ⊆ C_b(E) and S ⊆ R. Suppose that (a) Either D is closed under addition and separates points in E and S = R, or D approximates a metric q and S = (0, ∞).
(b) For each λ ∈ S and f ∈ D there are (f n , g n ) such that (λf n , g n ) ∈ D(H n ) with LIM f n = f and for all q ∈ Q sup n sup x∈K q n g n (x) < ∞.
Then the sequence of processes {X n } is exponentially tight.
Note that Condition (b) often follows from the convergence H † ⊆ ex−LIM SUP n H n .
We proceed with a remark on the variational representation of the rate function.
Remark 7.13. For an expression of the large deviation rate-functional in a Lagrangian form, one can show that a variational resolvent, similar to the one in this paper, but with a Lagrangian instead of an entropy as a penalization, solves the limiting Hamilton-Jacobi equation. This has been carried out in Chapter 8 of [8]. Generally, this leads to an expression L can usually be obtained from the operators H † and H ‡ by a (Legendre) transformation. Often one formally has We refrain from carrying out this step as it would follow [8, Chapter 8] exactly.

Strategy of the proof and discussion on the method of proof
Feng and Kurtz [8] showed in their extensive monograph that path-space large deviations of the processes X n = η n (Y n ) on D E (R + ) can be obtained by establishing exponential tightness and the convergence of the non-linear semigroups V n (t).
We repeat the important steps in this approach.
(1) A projective limit theorem (rather a special version of the projective limit theorem and the inverse contraction principle, [8,Theorem 4.28]) for the Skorokhod space establishes that, given exponential tightness, it suffices to establish large deviations for the finite dimensional distributions of X n = η n (Y n ).
(2) By Bryc's theorem, the large deviations for finite dimensional distributions follow from the convergence of the rescaled log-Laplace transforms.
(3) Using the Markov property, one can reduce the convergence of the log-Laplace transforms to the large deviation principle at time 0 and the convergence of semigroups.
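As a static illustration of step (2), without any time evolution, the following sketch compares a rescaled log-Laplace transform with its variational limit. It assumes the toy setting where X_n is the average of n independent fair ±1 coin flips, whose large deviation rate function is the Cramér rate function I(x) = ((1+x)/2) log(1+x) + ((1−x)/2) log(1−x); the function names and the grid size are hypothetical.

```python
import math

def rate(x):
    # Cramer rate function of the average of fair +-1 coin flips.
    total = 0.0
    for p in ((1 + x) / 2, (1 - x) / 2):
        if p > 0:
            total += p * math.log(2 * p)
    return total

def rescaled_log_laplace(f, n):
    # (1/n) log E[exp(n f(X_n))], computed exactly from the binomial
    # distribution with a log-sum-exp for numerical stability.
    terms = [
        math.log(math.comb(n, k)) - n * math.log(2) + n * f(-1 + 2 * k / n)
        for k in range(n + 1)
    ]
    m = max(terms)
    return (m + math.log(sum(math.exp(u - m) for u in terms))) / n

def varadhan_limit(f, grid_size=20001):
    # Grid approximation of sup_x { f(x) - I(x) } over [-1, 1].
    xs = [-1 + 2 * i / (grid_size - 1) for i in range(grid_size)]
    return max(f(x) - rate(x) for x in xs)

linear = lambda x: x
print(rescaled_log_laplace(linear, 50), varadhan_limit(linear), math.log(math.cosh(1.0)))
```

For f(x) = x both quantities equal log cosh(1), and for other continuous f the pre-limit value approaches the supremum as n grows. In the dynamic setting of this section, the role of the rescaled log-Laplace transform is played by the semigroups V_n(t).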
We will give a new proof of the path-space large deviation principle on the basis of this strategy. However, the key component of establishing the convergence of semigroups will be based on the explicit identification of the resolvents of the non-linear semigroups and the semigroup convergence result of [13]. At this point we remark on two differences with the main result of [8].
Throughout we assume that the maps $\eta_n$, $\hat{\eta}_n$ and $x \mapsto P^n_x$ are continuous, whereas in [8] they are allowed to be measurable only. The results in [13] allow one to work with measurable resolvents also, but the methods of the first part of this paper are based on properties of continuous functions. It would be of interest to see whether these methods can be extended to the context of measurable functions also.
The key reason why [8] can work with measurable maps is the approximation of the processes X_n by their Yosida approximants. This approximation does introduce an extra condition into the notions of ex-LIM SUP and ex-LIM INF. Compare our (7.9) and (7.12) to Equations (7.19) and (7.22) of [8].

Proof of Theorem 7.10
The following result is based on the variant of the projective limit theorem and Bryc's theorem. See Theorem 5.15, Remark 5.16 and Corollary 5.17 in [8].
Theorem 7.14. Suppose that we are in the setting of Assumption 7.2 and that Condition 7.9 is satisfied. Denote X n = η n (Y n ). Define the operator semigroup V n (t) on C b (E n ): Suppose furthermore that (a) The large deviation principle holds for X n (0) = η n (Y n (0)) with speed r n and good rate function I 0 .
(b) The processes X n = η n (Y n ) are exponentially tight on D E (R + ).
Then the processes X n = η n (Y n ) satisfy a large deviation principle on D E (R + ) with speed r n and rate function ).

(7.15)
Here $\Delta^c_\gamma$ is the set of continuity points of γ. The conditional rate functions I_t are given by We will not prove this result, but refer to [8, pages 93 and 94], as it follows essentially from the projective limit theorem and Bryc's result. The new contribution of this paper is a new method to obtain the convergence of semigroups based on the explicit identification of the resolvent corresponding to the semigroups V_n(t).
Proof of Theorem 7.10. The result follows from Theorem 7.14 if we can establish the convergence of semigroups, and obtain a limiting semigroup that is defined on all of C b (E). To do so, we apply Theorem 6.1 in [13]. The semigroups V n are of the type as in Remark 3.7, whose resolvents and generators we have identified in Theorem 3.4 and Proposition 3.6. The conditions on convergence of Hamiltonians for [13, Theorem 6.1] have been assumed in Theorem 7.10 and we can work with B n = C b (E n ) due to Proposition 3.5.
The following two ingredients for the application of [13, Theorem 6.1] are missing:
• joint local equi-continuity of the semigroups $\{V_n(t)\}_{n \geq 1}$,
• joint local equi-continuity of the resolvents $\{R_n(\lambda)\}_{n \geq 1}$.
We check these properties in Lemmas 7.15 and 7.16 below. As a consequence [13, Theorem 6.1] can be applied, and we obtain convergence of V_n(t) to a semigroup V(t), which is defined on the quasi-closure of the set Thus, if for all h ∈ C_b(E) we have lim_{λ→0} R(λ)h = h for the strict topology, then indeed the semigroup V(t) is defined on all of C_b(E). We prove this in Lemma 7.17 below. This establishes the final result.
The estimates below will be similar in spirit to estimates carried out in Section 5.
There we were able to use tightness of sets of measures that have bounded relative entropy (see Proposition A.4). Here, however, we need an argument that allows us to obtain tightness in the sense of estimates with the index set Q from the exponential compact containment condition and rescaled boundedness of relative entropies. A basic estimate of this type is included as Proposition B.1 and will serve as the key replacement of Proposition A.4.
Proof. Fix h_1, h_2 ∈ C_b(E_n), δ > 0, q ∈ Q, and T > 0. By exponential compact containment, see Condition 7.9, there is $\hat{q}$ such that $\limsup_{n \to \infty} \sup_{y \in K_n^q} \frac{1}{r_n} \log P\left[\exists\, t \leq T : Y_n(t) \notin K_n^{\hat{q}} \,\middle|\, Y_n(0) = y\right] \leq -a$. h 2 (Y n (t))Q(dY n ) − 1 r n S(Q | P n ) .
As h_2 is bounded, the optimizer Q_n must satisfy $\frac{1}{r_n} S(Q_n \,|\, P_n) \leq 2\|h_2\|$. Thus by Proposition B.1 and (7.16) applied to Q_n restricted to the marginal at time t, we have that for each δ = 2ε > 0, there is a $\hat{q}$ such that
Proof. If we work for a single λ instead of uniformly over 0 < λ ≤ λ_0, we can proceed as in the proof above. We first cut off the tail of the exponential random variable, which introduces a small error. Then we use the exponential compact containment condition and Proposition B.1 to find an appropriate $\hat{q}$ that can be used to finish the argument as in the proof of Lemma 7.15 above. If we work with a uniform estimate over 0 < λ ≤ λ_0, the argument needs to be adapted as in Lemma 5.7. We carry out a similar adaptation in the proof of Lemma 7.17 below. Fix q ∈ Q such that K ⊆ K^q and set h_n = h ∘ η_n. Then we have by construction that LIM h_n = h and by Theorem 6.1 of [13] we have LIM R_n(λ)h_n = R(λ)h for any λ > 0.
Pick x ∈ K and let $x_n \in K_n^q$ such that η_n(x_n) → x. We have Thus the result follows if we can prove that for each ε > 0 there is a λ such that $\sup_n |R_n(\lambda)h_n(x_n) - h_n(x_n)| \leq \varepsilon$. (7.17) Denote by $P^n_y$ the law of Y_n on $D_{E_n}(\mathbb{R}^+)$ when started in y ∈ E_n. We have $R_n(\lambda)h_n(x_n) - h_n(x_n) = \sup_{Q \in P(D_{E_n}(\mathbb{R}^+))} \int_0^\infty \left[ \int \left( h(\eta_n(Y_n(t))) - h(\eta_n(x_n)) \right) Q(dY_n) - \frac{1}{r_n} S_t(Q \,|\, P^n_{x_n}) \right] \tau_\lambda(dt)$. (7.18) As in Lemma 5.8, we argue via a lower and upper bound.
As the measures P xn are exponentially tight, the measures Q n,λ • η −1 n restricted to F 1 are tight due to Proposition B.1. Tightness implies we can control the modulus of continuity, which implies we can upper bound (7.20) uniformly in n by ε by choosing λ small.

A Properties of relative entropy
The following result by Donsker and Varadhan can be derived from Lemmas 4.5.8 and 6.2.13 of [5]. $\langle f, \nu \rangle - \log \langle e^f, \mu \rangle$ ∀ ν ∈ P(X).
By the second property of previous lemma, we immediately obtain lower semicontinuity of S.
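On a finite state space the duality between relative entropy and log-Laplace transforms behind the Donsker-Varadhan result can be checked directly. The sketch below is an illustration under that finite-space assumption, with hypothetical names: it verifies that ⟨f, ν⟩ − S(ν | µ) never exceeds log⟨e^f, µ⟩ for a few choices of ν and that equality holds for the tilted measure ν ∝ e^f µ.

```python
import math

def relative_entropy(nu, mu):
    # S(nu | mu) for probability vectors on a finite set, assuming mu > 0.
    return sum(q * math.log(q / p) for q, p in zip(nu, mu) if q > 0)

def log_laplace(f, mu):
    # log <e^f, mu>.
    return math.log(sum(p * math.exp(v) for v, p in zip(f, mu)))

def tilted(f, mu):
    # The measure with density proportional to e^f with respect to mu.
    weights = [p * math.exp(v) for v, p in zip(f, mu)]
    z = sum(weights)
    return [w / z for w in weights]

def objective(f, nu, mu):
    # <f, nu> - S(nu | mu), the quantity that is maximised over nu.
    return sum(q * v for q, v in zip(nu, f)) - relative_entropy(nu, mu)

MU = [0.2, 0.5, 0.3]
F = [1.0, -0.5, 2.0]
print(objective(F, tilted(F, MU), MU), log_laplace(F, MU))
```

The same tilting appears in the proof that R(λ)(f − λg) = f above, where the optimizing measure is obtained from an exponential martingale.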
We next give an extension of Theorem D.13 in [5], given as Exercise 5.13 in [19].
The final result of this appendix is the equi-coercivity of relative entropy in the second component.