On large deviations in the averaging principle for SDE's with a "full dependence", correction

We establish the large deviation principle for stochastic differential equations with averaging in the case when all coefficients of the fast component depend on the slow one, including diffusion.


Introduction
This is a corrected version of the paper [16]. We consider the SDE system

$$dX_t = f(X_t, Y_t)\,dt, \quad X_0 = x_0, \qquad dY_t = \varepsilon^{-2} B(X_t, Y_t)\,dt + \varepsilon^{-1} C(X_t, Y_t)\,dW_t, \quad Y_0 = y_0. \tag{1}$$

Here $X_t \in E^d$, $Y_t \in M$, $M$ is a compact manifold of dimension $\ell$ (e.g., the torus $T^\ell$), $f$ is a function with values in the $d$-dimensional Euclidean space $E^d$, $B$ is a function with values in $TM$, $C$ is a function with values in $(TM)^\ell$ (i.e., in local coordinates, an $\ell \times \ell$ matrix), $(W_t)$ is an $\ell$-dimensional Wiener process with respect to some increasing and right-continuous filtration $(F_t)$ on some probability space $(\Omega, F, P)$, and $\varepsilon > 0$ is a small parameter, i.e., $\varepsilon \to 0$. Concerning SDE's on manifolds we refer to [5]. The large deviation principle (LDP) for such systems with a "full dependence", that is, $C(X_t, Y_t)$, was not treated before [16]. Only the case $C(Y_t)$ was considered, in the papers [1,2,3] for a compact state space and in [14] for a non-compact one. The papers [10], [15] and [11] on similar or close topics for more general systems with small additive diffusions should also be mentioned; all of them, however, concern only the case $C(Y_t)$. Concerning the most recent developments, the reader is referred to [7] and the references therein.
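To make the slow-fast structure concrete, the following is a minimal Euler-Maruyama sketch. It assumes the system has the form $dX = f\,dt$, $dY = \varepsilon^{-2} B\,dt + \varepsilon^{-1} C\,dW$, takes $d = \ell = 1$ with $M$ the circle, and uses hypothetical coefficients `f`, `B`, `C` chosen only so that they are bounded, Lipschitz, and (for `C`) nondegenerate. None of these choices comes from the paper.

```python
import numpy as np

def simulate_slow_fast(eps, T=1.0, n_steps=20000, x0=0.0, y0=0.0, seed=0):
    """Euler-Maruyama sketch of a slow-fast pair (X, Y) with Y on the circle [0, 2*pi).

    The coefficients below are hypothetical illustrations, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    f = lambda x, y: np.sin(y) + 0.5 * np.cos(x)        # slow drift, |f| <= 1.5
    B = lambda x, y: np.cos(x + y)                       # fast drift
    C = lambda x, y: 1.0 + 0.5 * np.sin(x) * np.cos(y)   # fast diffusion, depends on x
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, y0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))                # Wiener increment over dt
        x[k + 1] = x[k] + f(x[k], y[k]) * dt
        y_next = y[k] + B(x[k], y[k]) * dt / eps**2 + C(x[k], y[k]) * dW / eps
        y[k + 1] = np.mod(y_next, 2 * np.pi)             # wrap onto the circle
    return x, y
```

The time step should be small compared with `eps**2`, since the fast component evolves on that time scale; the slow path stays within distance `1.5 * T` of its starting point because the hypothetical `f` is bounded by 1.5.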
The LDP for systems like (1) is important in averaging and homogenization, in the KPP equation theory, for stochastic approximation algorithms with averaging, and so forth. The problem of an LDP for the case $C(X_t, Y_t)$ has been open since [1,2]. Intuitively, the scheme used for $C(Y_t)$ should work; at least, almost all main steps go through. Indeed, there was only one lacuna: the use of Girsanov's transformation did not allow freezing of $X_t$ if $C$ depended on the slow motion, while it worked well and very naturally for the drift $B(X_t, Y_t)$. Yet the problem remained unresolved for years and the answer was unclear. Notice that this difficulty does not appear in analogous discrete-time systems (see [4, Chapter 11]).
It turned out that the use of Girsanov's transformation was, in a sense, an obstacle to resolving the problem. Our approach in this paper is based on a new technical result, Lemma 5 below. The main new idea is to use two different scales of partitions of the interval $[0, T]$: a "first-order partition" by points $\Delta, 2\Delta, \ldots$, which do not depend on the small parameter $\varepsilon$, and "second-order partitions", which depend on $\varepsilon$ in a special way, by points $\varepsilon^2 t(\varepsilon), 2\varepsilon^2 t(\varepsilon), \ldots$. Then the exponential estimates needed for the proof of the result can be established in two steps. First, the estimates for a "small" partition interval are derived using the uniform bound of Lemma 3 (see below) and the estimates for stochastic integrals. It is important that in the "second" scale the fast motion is still close enough to its frozen version [the bound (14) below]. Second, the bounds for "small" partitions and induction give the estimate for a "large" partition interval.
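The two scales can be laid out explicitly. The sketch below lists the first-order points $\Delta, 2\Delta, \ldots$ on $[0, T]$ and the second-order points $\varepsilon^2 t(\varepsilon), 2\varepsilon^2 t(\varepsilon), \ldots$ inside one first-order interval. The function name and the particular choice $t(\varepsilon) = \sqrt{\log(1/\varepsilon)}$ are assumptions made for illustration; the text only requires that $t(\varepsilon)$ depend on $\varepsilon$ "in a special way".

```python
import math

def two_scale_partitions(T, delta, eps):
    """Return (first, second, t_eps).

    first  : points delta, 2*delta, ..., covering [0, T] (independent of eps)
    second : points eps^2 t(eps), 2 eps^2 t(eps), ... inside [0, delta]
    t_eps  : the value t(eps); sqrt(log(1/eps)) is a hypothetical choice
    """
    t_eps = math.sqrt(math.log(1.0 / eps))   # assumed slowly growing t(eps)
    h = eps**2 * t_eps                       # length of a second-order interval
    m = int(round(T / delta))                # number of first-order intervals
    first = [k * delta for k in range(1, m + 1)]
    n = int(delta // h)                      # second-order intervals per first-order one
    second = [j * h for j in range(1, n + 1)]
    return first, second, t_eps
```

With `T = 1`, `delta = 0.1` and `eps = 0.01` this gives ten first-order points and several hundred second-order points per first-order interval, which illustrates why the estimate on a "large" interval is assembled by induction over many "small" ones.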
The original proof in [16] contained a gap related to the boundedness of an auxiliary constant $b$: in the original version this constant may depend implicitly on the partition size $\Delta$, while the choice of $\Delta$ could depend on $b$, hence generating a vicious circle. The main aim of this version of the paper is to present the "patch". A provisional version of this correction may be found in [17]. The present version is simplified further. The correction uses improved approximations that keep the constant $b$ bounded in the lower and upper bounds, and it also uses a truncated Legendre transformation in the upper bound. The author is deeply indebted to Professor Yuri Kifer for discovering this vicious circle in the original version of the paper. The main technical tool remains Lemma 5. All standing assumptions are the same as in the original version.
The main result is stated in Section 2. In Section 3 we present auxiliary lemmas, among them the main technical Lemma 5 with its proof and a version of an important lemma from [3] (see Lemma 6), which requires certain comments. Those comments, along with other related remarks, are given in the Appendix, which has also been slightly extended. The proof of the main theorem is presented in Section 4.

Main result
We make the following assumptions.
$(A_f)$ The function $f$ is bounded and satisfies the Lipschitz condition.
$(A_C)$ The function $CC^*$ is bounded and uniformly nondegenerate, and $C$ satisfies the Lipschitz condition.
$(A_B)$ The function $B$ is bounded and satisfies the Lipschitz condition.
Some conditions may be relaxed; for example, B may be assumed locally bounded, C locally (with respect to x) nondegenerate and so on.
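The family of processes $X^\varepsilon$ satisfies a large deviation principle in the space $C([0, T]; R^d)$ with a normalizing coefficient $\varepsilon^{-2}$ and a rate function $S(\varphi)$ if the following three conditions are satisfied:

$$\limsup_{\varepsilon \to 0} \varepsilon^2 \log P(X^\varepsilon \in F) \le -\inf_{\varphi \in F} S(\varphi) \quad \text{for any closed } F, \tag{2}$$

$$\liminf_{\varepsilon \to 0} \varepsilon^2 \log P(X^\varepsilon \in G) \ge -\inf_{\varphi \in G} S(\varphi) \quad \text{for any open } G, \tag{3}$$

and $S$ is a "good" rate function; that is, for any $s \ge 0$, the set $\{\varphi \in C([0, T]; R^d) : \varphi_0 = x,\ S(\varphi) \le s\}$ is compact in $C([0, T]; R^d)$. We will establish the following equivalent set of assertions due to Freidlin and Wentzell,

$$\limsup_{\delta \to 0} \limsup_{\varepsilon \to 0} \varepsilon^2 \log P(\rho(X^\varepsilon, \Phi(s)) \ge \delta) \le -s \quad \forall s > 0, \tag{4}$$

$$\liminf_{\delta \to 0} \liminf_{\varepsilon \to 0} \varepsilon^2 \log P(\rho(X^\varepsilon, \varphi) < \delta) \ge -S(\varphi) \quad \forall \varphi, \tag{5}$$

where $\rho(\varphi, \psi) = \sup_{0 \le s \le T} |\varphi_s - \psi_s|$, where $\Phi(s) := \{\varphi \in C([0, T]; R^d) : \varphi_0 = x,\ S(\varphi) \le s\}$, and where $S$ is a "good" rate function (see above). In what follows, $\dot\varphi_t$ denotes the derivative of $\varphi_t$; if it does not exist almost everywhere, or if the integral $\int_0^T L(\varphi_t, \dot\varphi_t)\,dt$ diverges, then by definition $\int_0^T L(\varphi_t, \dot\varphi_t)\,dt := +\infty$.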
The limit $H$ exists and is finite for any $\beta$, the functions $H$ and $L$ are convex in their last arguments $\beta$ and $\alpha$, respectively, $L \ge 0$ and $H$ is continuously differentiable in $\beta$.
The differentiability of $H$ at any $\beta$ will be provided by the compactness of the state space of the fast component. The constants $C$ in the calculus may change from line to line, unlike $K$, $C_f$, $L_f$ and some others.
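The convex duality between $H$ and $L$ can be illustrated numerically. The sketch below assumes, as the later references to the Fenchel-Legendre adjoint suggest, that $L(x, \cdot)$ is the Legendre transform of $H(x, \cdot)$, and it uses a hypothetical one-dimensional $H(\beta) = \sqrt{1 + \beta^2} - 1$ (convex, smooth, with $|H'| < 1$) as a stand-in; the helper `legendre` and the grid are illustrative choices only.

```python
import numpy as np

def legendre(H, alpha, betas):
    """Approximate sup over beta of (alpha * beta - H(beta)) on a finite grid of beta."""
    return np.max(alpha * betas - H(betas))

# Hypothetical convex H with derivative bounded by 1 in absolute value.
H = lambda beta: np.sqrt(1.0 + beta**2) - 1.0
betas = np.linspace(-200.0, 200.0, 400001)

# Inside the range of H' the supremum is attained; closed form is 1 - sqrt(1 - alpha^2).
L_half = legendre(H, 0.5, betas)

# Outside the range of H' the supremum is not attained and grows with the grid size.
L_out = legendre(H, 1.5, betas)
```

For this stand-in, `L_half` agrees with $1 - \sqrt{1 - 0.25}$ up to grid error, while `L_out` keeps growing as the grid of $\beta$ is enlarged. This matches the observation made in the proof that $L(x, \alpha) = +\infty$ once $|\alpha|$ exceeds the bound on $f$.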

Auxiliary lemmas
Let $\tilde W_t = \varepsilon^{-1} W_{t\varepsilon^2}$, $y_t = Y_{t\varepsilon^2}$, $x_t = X_{t\varepsilon^2}$, and let $y^x_t$ solve the SDE

$$dy^x_t = B(x, y^x_t)\,dt + C(x, y^x_t)\,d\tilde W_t, \quad y^x_0 = y.$$

Below $\tilde F_t := F_{t\varepsilon^2}$. Let us consider the semigroup of operators $T^\beta_t$, $t \ge 0$, on $C(M)$ defined by the formula

$$T^\beta_t g(y) = E_y\, g(y^x_t) \exp\Big(\int_0^t \beta f(x', y^x_s)\,ds\Big),$$

where $\beta \in E^d$, $\beta f$ is a scalar product and the index $y$ in $E_y$ means the initial value of $y^x_t$ at $t = 0$. If some inequality is uniform over $y \in M$, this index may be dropped in the calculus.

Lemma 2 Let assumptions $(A_f)$, $(A_B)$, $(A_C)$ be satisfied. Then the spectral radius $r(T^\beta_1)$ is a simple eigenvalue of $T^\beta_1$ separated from the rest of the spectrum, and its eigenfunction $e_\beta$ belongs to the cone $C^+(M)$. Moreover, the function $r(T^\beta_1)$ is smooth (of class $C^\infty$) in $\beta$, and for any $b > 0$ the function $e_\beta$ is bounded and separated away from zero uniformly in $|\beta| < b$ and all $x'$, $x$.
). The function $\tilde H(x', x, \beta)$ is of class $C^\infty$ in $\beta$ and convex in $\beta$. For any $b > 0$ there exists $C(b)$ such that, for any $y$, $|\beta| < b$ and for all values of $t > 0$ uniformly in $x$, $x'$, Notice that $|\tilde H($ In what follows, $\nabla_\beta \tilde H$ stands for the gradient of $\tilde H$ with respect to $\beta$. Lemmas 1-4 are standard (cf. [14] or [13]). They are based on Frobenius-type theorems for positive compact operators (see [8]) and the theory of perturbations of linear operators (see [6, Chapter 2]).
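The Frobenius-type statement has a transparent finite-state analogue. The sketch below replaces the fast diffusion on $M$ by a hypothetical two-state Markov chain with generator `Q` and takes hypothetical values `f` on the two states. It assumes that $H$ is the logarithm of the spectral radius of the time-one operator, which for the matrix semigroup $\exp(t(Q + \mathrm{diag}(\beta f)))$ is the largest real part among the eigenvalues of $Q + \mathrm{diag}(\beta f)$. This is an illustration of the mechanism only, not the operator on $C(M)$ treated in the lemmas.

```python
import numpy as np

def H_finite_state(Q, f, beta):
    """Largest real part of the eigenvalues of Q + diag(beta * f).

    For T_t = exp(t (Q + diag(beta f))) the spectral radius of T_1 equals exp(H),
    so H plays the role of log r(T_1^beta) in this finite-state analogue.
    """
    A = Q + np.diag(beta * f)
    return np.max(np.linalg.eigvals(A).real)

Q = np.array([[-1.0, 1.0], [1.0, -1.0]])   # hypothetical two-state generator
f = np.array([1.0, -1.0])                  # hypothetical values of f on the two states
```

For these choices the eigenvalue can be computed by hand: $H(\beta) = \sqrt{1 + \beta^2} - 1$, which is smooth and convex in $\beta$, vanishes at $\beta = 0$, and has gradient bounded by $\max |f| = 1$.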
Step 1. It suffices to prove (8) and (9) for $t_0 = 0$. Moreover, since $H$ is continuous, it suffices to check both inequalities for $x = x_0$. Indeed, the bound and we use the uniform continuity of the function $H$ on compact sets (recall that $|\beta| \le b$). The same arguments are applicable to the second inequality of the assertion of the lemma. So, in the sequel we consider the case $x_0 = x$. Let us show first that if $\varepsilon$ is small enough. Due to Lemma 3, it would be correct if $y_s$ were replaced by $y^x_s$ and $t(\varepsilon) \ge \nu^{-1} C(b)$. We will also use the bounds By virtue of Lemma 3 we have if $\varepsilon$ is small enough.
Let us estimate the second term in (12). By virtue of the inequalities for the Itô and Lebesgue integrals, we have By virtue of Gronwall's lemma, one gets In particular, So the second term in (12) does not exceed the value $\exp(C_f\, b\, t(\varepsilon)\nu)\,\nu^{-2} C\, t(\varepsilon)^2 \varepsilon^2$, which is $o(\exp(K t(\varepsilon)))$ for any $K < 0$. Indeed, for any such $K$ we have, Hence, we get with any $K < 0$ for $\varepsilon > 0$ small enough, by virtue of (13). The upper bound in (10) follows.
The lower bound in (10) may be established similarly. For the convenience of the reader we show the calculus. We estimate for Since the second term in (12) is $o(\exp(K t(\varepsilon)))$ with any $K < 0$, this implies the bound Now due to (13), we get, with any $K < 0$ and $\varepsilon > 0$ small enough, which implies the lower bound in (10).
Notice that both bounds in (10) are uniform with respect to |β| ≤ b and x ′ , x, y 0 . Since the function H is continuous, we get on the set if δ(ν) is small enough.
In particular, $|x_{kt(\varepsilon)} - x| < \delta(\nu)$ for any $1 \le k \le N$. By induction, we get from (16) for such $k$, or, after the time change, Since $H$ is continuous, we obtain for $k = N$,

Lemma 5 is proved. QED

The next lemma is an improved version of Lemma 7.5.2 from [3]. Although we will not use it explicitly, its technique is essential.
We added to the original assertion the property that χ n t may be chosen piecewise linear. Indeed, such functions are used in the proof; see [3,Section 7.5]. The existence of β s asserted in the lemma also follows from the proof; see [2] or [3]. Assertions aboutψ andβ s also added to the original assertion can be deduced from the proof using similar arguments.
In fact, there is a small gap in the original proof: an additional assumption was used which was not formulated explicitly. This is why we present a precise statement and give the necessary comments in the Appendix.

Proof of Theorem 1
1. First part of the proof: the lower bound. Let $S(\varphi) < \infty$ and $\nu > 0$. To establish the lower bound, we will show the inequality: given any $\nu > 0$ and any $\delta > 0$, we have for $\varepsilon > 0$ small enough,

$$P(\rho(X, \varphi) < \delta) \ge \exp\big(-\varepsilon^{-2}(S(\varphi) + \nu)\big).$$

Denote $H(x, \beta) = \tilde H(x, x, \beta)$. The existence of the limit $\tilde H(x, x', \cdot)$ for any $x$, $x'$, and its differentiability and continuity are asserted in Lemmas 3 and 4. Throughout the proof, we may and will assume that for any $s$, $L(\varphi_s, \dot\varphi_s) < \infty$. Indeed, this may be violated only on a set of $s$ of Lebesgue measure zero. Notice that due to the boundedness of the function $f$, this inequality implies $\sup_s |\dot\varphi_s| \le \|f\|_C$, since for any $|\alpha| > \|f\|_C$ we have $L(x, \alpha) = +\infty$. Unlike in the previous section, in the sequel both $X_0 = x_0$ and $Y_0 = y_0$ are fixed; hence, the symbols $P$ and $E$ will be used without indices.

2. We are going to reduce the problem of estimation from below the probability to that for the probability where both $\psi$, $\chi$ approximate $\varphi$. The rough idea is eventually to choose a step function as $\psi$ and a piecewise linear one as $\chi$; however, we are going to perform these approximations gradually. A step function is needed because our only technical tool, Lemma 5, is established for this very case. A piecewise linear ψ is not necessary, but convenient. Eventually we will consider a finite-dimensional "discretized" subset of the set $\{\rho(X, \varphi) < \delta\}$ with appropriately chosen $\Delta$, $X^\psi$, deterministic curves $\psi$, $\chi$, and constants $\delta'_k$: in particular, we will choose $\delta'_1 \ll \delta'_2 \ll \ldots \ll \delta'_{T/\Delta} \ll \delta$. While performing all these approximations, we need to establish simultaneously a special property: at any point $s$, the variable $\beta_s = \beta_s[\psi_s, \dot\chi_s]$, Fenchel-Legendre adjoint to $\dot\chi_s$ (see below), can be chosen uniformly bounded.

5. Our next goal is the choice of appropriate functions $\chi$ and $\psi$. It is essential to keep the integral $\int_0^T L(\varphi_s, \dot\chi_s)\,ds$ close to $S(\varphi)$. Also, for technical reasons we want some discretization. Hence, we will use a trick well known from the definition of stochastic integrals, based on the following Lemma. Hence, applying this Lemma we may fix some $a \in [0, 1]$ for which there exists a sequence $m' \to \infty$ such that Simultaneously for almost every $a \in [0, 1]$, by virtue of the same Lemma and because $\varphi$ is absolutely continuous, we also have, and sup each time over a new subsequence. Yet, to simplify notations, in the sequel $m'$ will be replaced by $m$. Denote Notice that $\psi$ is piecewise constant (a step function) with finitely many values, while $\chi$ is piecewise linear with finitely many values of slopes. Let Notice that $S^\varphi(\varphi) = S(\varphi)$. Then (21) implies At the same time we have, if $m$ is large enough. Moreover, in addition, if $\delta'$ and $\lambda = \rho(\varphi, \chi)$ are small enough with respect to $\delta$; hence, we can fix the value $\delta'$ here.
We can choose a vectorχ s := α ∈ L • [f, ψ s ] so that the value L(ψ s ,χ s ) is close enough to L(ψ s ,χ s ). Recall that there are finitely many vector-values ofχ s for any given m and a; correspondingly, we will choose finitely many approximations satisfyingχ s ∈ L • [f, ψ s ]. Let us also choose some adjoint β for each α =χ s and denote it by β[ψ s ,χ s ].
In the case if the set L • [f, ϕ s ] is empty, the function H(ϕ s , β) is linear in β and one can chooseφ s :=φ s and β[ϕ s ,φ s ] = 0, see Appendix A.
Notice that in every case (whether the interior $L^\circ[f, \psi_s]$ is empty or not, and whatever the choice of $\beta$ if it is not unique) only finitely many vectors $\beta[\psi_s, \dot\chi_s]$ are chosen. Hence, we may denote Notice that this value is fixed from now on. Let We may assume that χ is as close to $\varphi$ as we like, say, $\rho(\chi, \psi) < \nu/3$ and also

7. In the general case, the discretisations of $\varphi$ should be read $\varphi^{\Delta,a} = (\varphi_{\Delta - \tilde a}, \varphi_{2\Delta - \tilde a}, \ldots, \varphi_{m\Delta - \tilde a}, \varphi_T)$, where $\tilde a = a - [a/\Delta]\Delta$; if $a = 0$ then we may use the approximation $\varphi^\Delta = (\varphi_\Delta, \varphi_{2\Delta}, \ldots, \varphi_{m\Delta})$, $m\Delta = T$. Notice that 'almost every value' of $a$ does not guarantee any particular value, so we cannot be sure about taking $a = 0$. Hence, let us consider the general case here. Denote $k\Delta - \tilde a =: t_k$, $1 \le k \le m$, and $t_{m+1} := T$ in the case $\tilde a \ne 0$ (and no $t_{m+1}$ in the case $\tilde a = 0$). Since the drift of the diffusion $X^\psi$ is bounded, $\|f\|_C < \infty$, we have straight away (however, cf. [3, proof of Lemma 7.5.1]), if $\delta''$ and $\Delta$ are small enough, (notice that here $\Delta \le \Delta(\delta'')$ is not required), and assuming all our curves start at $x_0$ at time zero (hence, we do not include the starting point into the definition of $\varphi^\Delta$). Here for discretized curves we use the metric, Now, we are going to estimate from below the value in the right hand side of the inequality, where $\delta'_1 < \delta'_2 < \ldots < \delta'_{m+1} = \min(\delta(\nu), \delta'')$, $i = 1, \ldots, m$, and $\delta(\nu)$ is from Lemma 5; here all values $\delta'_i$ and certain auxiliary values $z_i$ will be chosen in the next two steps as follows: where $0 < \kappa \le 1$. We emphasize that $\delta''$ and $\Delta$ may be chosen arbitrarily small at this stage; in particular, we require that they satisfy the conditions of Lemma 5, which will be used in the sequel, that is, we do require $\delta'' \le \delta(\nu)$ and $\Delta \le \Delta(\nu)$. Hence, both $\delta''$ and $\Delta$ are fixed at this stage.
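The shifted grid of step 7 can be written out directly. The sketch below builds the points $t_k = k\Delta - \tilde a$ with $\tilde a = a - [a/\Delta]\Delta$, appends $T$ as a last point when $\tilde a \ne 0$, and evaluates a curve on that grid. The function names are hypothetical, and the max-distance used in `rho_discrete` is an assumed form of the metric for discretized curves, chosen by analogy with the uniform metric $\rho$.

```python
import numpy as np

def shifted_grid(T, delta, a):
    """Grid t_k = k*delta - a_tilde for k = 1..m, with a_tilde = a - floor(a/delta)*delta.

    When a_tilde > 0 the endpoint T is appended as an extra last grid point.
    """
    m = int(round(T / delta))
    a_tilde = a - np.floor(a / delta) * delta
    t = np.arange(1, m + 1) * delta - a_tilde
    if a_tilde > 0:
        t = np.append(t, T)
    return t

def discretize(phi, T, delta, a):
    """Values of the curve phi on the shifted grid (the starting point is not included)."""
    return phi(shifted_grid(T, delta, a))

def rho_discrete(u, v):
    """Max-distance between two discretized curves (an assumed form of the metric)."""
    return np.max(np.abs(u - v))
```

For example, `shifted_grid(1.0, 0.25, 0.1)` gives the five points 0.15, 0.4, 0.65, 0.9, 1.0, while `shifted_grid(1.0, 0.25, 0.0)` gives the four points 0.25, 0.5, 0.75, 1.0 with no extra endpoint.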
9. Let us show that given $\delta'_{m+1}$, there exists $C_{m+1} > 0$ such that on the set if $\varepsilon$ is small enough. There exists a finite number of vectors $v_1, v_2, \ldots, v_{2d}$ such that $\|v_k\| = 1$ for all $k$ (any orthonormal basis would do, complemented by its "symmetric" transformation, i.e., with each coordinate vector $v$ we consider $-v$ as well), and for any (non-random) vector $\xi$ and any positive $c$, Let $\nu'_m > 0$ (this is a new constant which has nothing to do with $\nu$ and will be fixed shortly, see (41) below; we need it only while establishing the inequality (37)). By the exponential Chebyshev inequality we estimate, for any $v := v_k$ and any $0 \le z \le 1$ on the set $\{|X^\psi_{t_m} - \chi_{t_m}| < \delta'_m\}$, if $\varepsilon$ is small enough. We used here the identity $\chi_{t_{m+1}} - \chi_{t_m} = \Delta_m \dot\chi_{t_m+}$. Denote so that the rightmost side of (38) may be represented as Notice that $h(0) = 0$. Moreover, since $\dot\chi_{t_m+} = \nabla_\beta \tilde H(\psi_{t_m}, \psi_{t_m}, \beta(m + 1))$ (see (recall that $\Delta_m \le \Delta$ and that here $m_{\nabla_\beta \tilde H}$ stands for the modulus of continuity of the function $\nabla_\beta \tilde H$ given $|\beta(m)| \le b + 1$ ($b + 1$ will be useful in the sequel, although here $b$ would be enough)). The inequality Recall that a slightly stronger assumption was used in the rule of choosing $\delta'_m$ and we will need a stronger version in a minute, see (40) below. Moreover, since $\nabla_\beta \tilde H$ is bounded and continuous due to Lemma 4, then $h'(z) \ge C_m/2$ for small $z$, say, for $0 \le z \le z_m$ (thus, $z_m$ is fixed here), on the rather than (39). Hence, under the assumption of (40), the right hand side in (38) with $z = z_m$ on the set $\{|X^\psi_{t_m} - \chi_{t_m}| < \delta'_m\}$ does not exceed the value Recall that the constant $\nu'_m$ should have been fixed in the beginning of this step of the proof; hence, we can do it now, once we have chosen $z_m$, since the latter does not require any knowledge of $\nu'_m$. Given $\{|X^\psi_{t_m} - \chi_{t_m}| < \delta'_m\}$, this implies the bound, which is equivalent to (37) with $C_{m+1} := C'_{m+1} z_m$. In turn, (37) implies the estimate if $\varepsilon$ is small enough.
Indeed, $\nu$, $C_{m+1}$ and $\Delta_m$ being fixed, one can choose $\varepsilon$ so that

10. By "backward" induction from $k = m$ to $k = 1$, choosing at each step $\delta'_{k-1}$ and $z_{k-1}$ small enough in comparison to (cf. (40)), as well as all auxiliary values $C_{k-1}$, for $\varepsilon$ small enough and since $\sum_{k=1}^{m+1} \delta'_k \le 2\delta'_{m+1}$, we get the desired lower bound: provided $4b\delta'_{m+1} < \nu$. This is equivalent to (5). This bound is uniform in $x \in E^d$, $|y| \le r$, and $\varphi \in \Phi_x(s)$ for any $r, s > 0$, similar to Lemma 7.4.1 from [3].
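The role of the $2d$ directions in step 9 is to reduce a bound on the norm of a vector to finitely many one-dimensional bounds, each of which can be handled by the exponential Chebyshev inequality. The sketch below checks the elementary fact behind such a reduction for the coordinate directions $\pm e_1, \ldots, \pm e_d$: some projection is at least $|\xi|/\sqrt{d}$. The constant $1/\sqrt{d}$ and the function names are assumptions for illustration and need not coincide with the constant used in the proof.

```python
import numpy as np

def symmetric_directions(d):
    """The 2d unit vectors e_1, ..., e_d, -e_1, ..., -e_d, stored as rows."""
    eye = np.eye(d)
    return np.vstack([eye, -eye])

def best_projection(xi):
    """Largest scalar product of xi with one of the 2d symmetric coordinate directions."""
    V = symmetric_directions(len(xi))
    return np.max(V @ xi)
```

Since `best_projection(xi)` equals the largest absolute coordinate of `xi`, it is never smaller than `np.linalg.norm(xi) / np.sqrt(d)`; hence the event that the norm exceeds $c$ is contained in the union over $k$ of the events that the projection on $v_k$ exceeds $c/\sqrt{d}$.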
11. The property of the rate function S to be a "good rate function" can be shown as in [3], using the semi-continuity of the function L(x, y) with respect to y and continuity with respect to x variable (see [3,Lemma 7.4.2]).
12. Second part of the proof: the upper bound. Assume that the assertion (4) is not true, that is, there exist $s$ and $\nu > 0$ with the following properties: $\forall \bar\delta > 0$, there exists $\delta_0 < \bar\delta$, $\forall \bar\varepsilon$, there exists $\varepsilon < \bar\varepsilon$: In other words, for some (hence, actually, for any) $\delta_0 > 0$ arbitrarily close to zero, there exists a sequence $\varepsilon_n \to 0$ such that We fix any such $\delta_0 > 0$.
Further, consider $F_1$, the compact obtained from $F$ by dropping the $\delta_0/2$-neighbourhood of the set $\Phi_x(s) = \{\varphi \in C([0, T]; R^d) : \varphi_0 = x,\ S(\varphi) \le s\}$. Denote $\bar\delta_\nu = \inf_{\varphi \in F_1} \delta_\nu(\varphi)$, and take any $\delta' \le \min(\bar\delta_\nu/(4KT + 2),\ \delta_0/2)$, where $K$ is a Lipschitz constant of $f$. Choose a finite $\delta'$-net for the set $F_1$, and let $\varphi_1, \ldots, \varphi_N$ be its elements. None of them belongs to $\Phi_x(s)$; hence, $S(\varphi_i) \ge s'$ with some $s' > s$. Notice that Then, for any $n$ there exists an index $i$ such that There is a finite number of $i = 1, \ldots, N$. Thus, there exists at least one $i$ such that (44) holds true for this $i$ for some subsequence $n' \to \infty$ and correspondingly $\varepsilon_{n'} \to 0$; however, we will keep the notation $n$ for simplicity. We may rewrite (44) as since $N$ does not depend on $\varepsilon_n$, strictly speaking with some new $\nu > 0$; however, it is again convenient to keep the same notation. Denote $\varphi(\delta') := \varphi_i$ with this $i$ (any one if not unique).
In fact, this implies the same inequality for any δ ′ > 0, because with any δ ′ for which the inequality holds true, every greater value would do as well. Therefore, for any δ ′ , there exists ε > 0 (arbitrarily small) such that We are going to show that this leads to a contradiction.
Consider the function $\ell_b(\varphi_t, \dot\varphi_t)$. We have, and the function $\ell$ is decreasing with $b \to \infty$. Hence, given $\nu > 0$, one can choose a $b > 0$ such that Notice that we have chosen $b$, which is now fixed for the second part of the proof of the Theorem. Moreover, one can also choose a discretisation step $\Delta$ (see above, step 5 of the proof and, in particular, Lemma 7) such that for almost every $a \in [0, 1]$ In addition, we require $\Delta \le \Delta(\nu/20)$ (see Lemma 5). Hence, we have chosen $\Delta$ and $m = T/\Delta$. We also fix any $a \in [0, 1]$ satisfying (48).
A similar calculus and similar inequalities are valid for any unit vector β 0 . This shows, in particular, that dim L A (x) = dim L f (x), and, moreover, that L A (x) = L f (x). Since A(x) is convex, it shows also that the interior A • (x) with respect to L A(x) is not empty, except for only the case dim(L A(x) ) = 1. Hence, the third condition is equivalent to the second one and to the first.
So, the condition A • k = ∅ is always satisfied if the set {f (x, ·)} for any x consists of more than one point. In fact, if card{f (x, ·)} = 1 for any x then f does not depend on y. In this case, one has nothing to average.
Notice that our considerations above provide the following description of the set L

C. About $\alpha_s \in L^\circ[f, \varphi_s]$. Let $x = \varphi_s$, $\alpha = \alpha[x, \chi]$ as described in the proof of Theorem 1. If we show that for any direction $v$ (a unit vector) satisfying the property $m_v < M_v$, the strict double inequality $m_v < \partial H(x, zv)/\partial z|_{z=0} < M_v$, $z \in R^1$, holds true, then it would follow that $\alpha_s \in L^\circ[f, \varphi_s]$. Let $\nu > 0$ and again let two open sets $B'$ and $B''$ be chosen such that $\sup_{y \in B'} vf(x, y) < m_v + \nu/2$ and $\inf_{y \in B''} vf(x, y) > M_v - \nu/2$. Let $\mu_{inv}(B'')$ be the invariant measure of the event $\{y^x_t \in B''\}$. We can choose $\nu$ and correspondingly $B''$ so that $\mu_{inv}(B'') < 1$. Then, due to large deviation asymptotics for the process $y^x_t$, for any $\mu_{inv}(B'') < \zeta < 1$ there exists $\lambda > 0$ such that

$$P\Big(t^{-1} \int_0^t 1(y^x_s \in B'')\,ds \ge \zeta\Big) \le \exp(-\lambda t), \quad t \ge t_\zeta.$$