On regularization by a small noise of multidimensional ODEs with non-Lipschitz coefficients

In this paper we solve a selection problem for a multidimensional SDE $d X^\varepsilon(t)=a(X^\varepsilon(t)) d t+\varepsilon \sigma(X^\varepsilon(t))\, d W(t)$, where the drift and diffusion are locally Lipschitz continuous outside of a fixed hyperplane $H$. It is assumed that $X^\varepsilon(0)=x^0\in H$, the drift $a(x)$ has Hölder asymptotics as $x$ approaches $H$, and the limit ODE $d X(t)=a(X(t))\, d t$ does not have a unique solution. We show that if the drift pushes the solution away from $H$, then the limit process selects, with certain probabilities, some extreme solutions of the limit ODE. If the drift attracts the solution to $H$, then the limit process satisfies an ODE with some averaged coefficients. To prove the last result we formulate an averaging principle, which is quite general and new.


Introduction
Consider an ODE $\frac{du(t)}{dt} = a(u(t))$, $u(0) = 0$ (1.1), where $a$ is a continuous function of linear growth that satisfies a local Lipschitz condition everywhere except at the point $u = 0$. Then uniqueness of the solution to (1.1) may fail; e.g., for $a(u) = \sqrt{|u|}\,\mathrm{sgn}(u)$ the ODE (1.1) has multiple solutions, among them $\pm t^2/4$, $t \geq 0$. Consider a perturbation of (1.1) by a small noise: $du^\varepsilon(t) = a(u^\varepsilon(t))\,dt + \varepsilon\, dW(t)$ (1.2), where $W$ is a Wiener process. Equation (1.2) has a unique strong solution by the Zvonkin-Veretennikov theorem [23]. It is easy to see that the family of distributions of $\{u^\varepsilon\}$ is weakly relatively compact because $a$ has linear growth. Moreover, any limit point of $\{u^\varepsilon\}$ as $\varepsilon \to 0$ satisfies equation (1.1) because $a$ is continuous. Hence, if the limit $\lim_{\varepsilon\to 0} u^\varepsilon$ (in distribution) exists, then this limit may be considered as a natural selection of a solution to (1.1). This problem originated in the papers of Bafico and Baldi [2,3], who considered the one-dimensional case; for other generalizations see, for example, [4,5,6,7,8,9,12,15,19,20,21,22] and references therein. Investigations in the multidimensional case are much more complicated than in the one-dimensional one. There are still no simple sufficient conditions that ensure the existence of the limit $\lim_{\varepsilon\to 0} u^\varepsilon$ and characterize it. One of the reasons for this is the absence of a linear ordering in the multidimensional case. Indeed, in the one-dimensional situation there are only two ways to exit from the point 0: one to the right and another to the left. The probability of going left or right can be easily obtained since there are explicit formulas for hitting probabilities of one-dimensional diffusions. Outside of 0 the limit process must satisfy the original ODE because $a$ is Lipschitz continuous there.
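The non-uniqueness in the example above can be checked by direct differentiation. The following worked computation is a sketch for the power-type drift $a(u)=\mathrm{sgn}(u)|u|^{\gamma}$; the general exponent $\gamma\in(0,1)$ and the delay $t_0$ are introduced here only for illustration, and $\gamma = 1/2$ corresponds to the square-root example with solutions $\pm t^2/4$.

```latex
% Assumption (illustrative): a(u) = sgn(u)|u|^{\gamma} with 0 < \gamma < 1.
\[
  u_\pm(t) = \pm\bigl((1-\gamma)\,t\bigr)^{\frac{1}{1-\gamma}}, \qquad t \ge 0,
\]
\[
  \frac{d}{dt}\,u_+(t)
  = \bigl((1-\gamma)\,t\bigr)^{\frac{\gamma}{1-\gamma}}
  = \bigl(u_+(t)\bigr)^{\gamma}
  = a\bigl(u_+(t)\bigr), \qquad u_+(0) = 0 .
\]
% For \gamma = 1/2 this gives u_\pm(t) = \pm (t/2)^2 = \pm t^2/4.
% Further solutions: u \equiv 0, and the delayed curves
\[
  u(t) = \pm\bigl((1-\gamma)\,(t-t_0)_+\bigr)^{\frac{1}{1-\gamma}}, \qquad t_0 \ge 0 .
\]
```

In this picture $u_\pm$ are the two extreme solutions: every other solution waits at zero for some time $t_0$ before leaving along a shifted copy of $u_+$ or $u_-$.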
In this paper we consider the multidimensional case, where the Lipschitz condition for $a$ may fail on a hyperplane. Let us describe the corresponding model. Consider an SDE $du^\varepsilon(t) = a(u^\varepsilon(t))\,dt + \varepsilon\sigma(u^\varepsilon(t))\,dW(t)$ (1.3), where $a : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d\times m}$ are measurable functions and $W$ is an $m$-dimensional Wiener process.
Assume that a and σ are of linear growth, σ is continuous and satisfies the uniform ellipticity condition. This ensures existence and uniqueness of a weak solution to (1.3) and relative compactness for the distributions of {u ε }.
Set $H := \mathbb{R}^{d-1} \times \{0\}$. Suppose that the initial point $x^0 \in H$ and that the drift $a$ is locally Lipschitz in $\mathbb{R}^d \setminus H$.
Note that the definition of a on H is inessential because u ε spends zero time in H with probability 1 due to the non-degeneracy of the diffusion coefficient.
The case when $a$ is globally Lipschitz continuous in the lower half-space $\mathbb{R}^d_- := \mathbb{R}^{d-1} \times (-\infty, 0)$ and in the upper half-space $\mathbb{R}^d_+ := \mathbb{R}^{d-1} \times (0, \infty)$ was investigated in [20]. The result was formulated in terms of the vertical components of $a^\pm(x^0) := \lim_{x\to x^0,\ x\in\mathbb{R}^d_\pm} a(x)$. In this paper we investigate the case when the drift has Hölder-type asymptotics in a neighborhood of $H$. Namely, we will assume that
A1. $a_d(x) = |x_d|^\gamma b(x)$, where $\gamma < 1$, $x_d$ is the $d$-th coordinate of $x = (x_1, \dots, x_d)$, and $b$ is a globally Lipschitz continuous function in $\mathbb{R}^d_+$ and $\mathbb{R}^d_-$ with $b^\pm(x) \neq 0$, $x \in H$.
A2. $a_k$, $k = 1, \dots, d-1$, are globally Lipschitz functions in $\mathbb{R}^d_+$ and $\mathbb{R}^d_-$.
This case has new features, and the proofs are based on new ideas compared to the proofs in [20]. To illustrate the difference, let us briefly recall the results of [20], where the case $\gamma = 0$ was considered, and sketch the expected results in the case $\gamma \in (0, 1)$.
Case 1. (The vector field $a$ pushes away from the hyperplane.) Denote by $n = (0, \dots, 0, 1)$ the normal vector to the hyperplane $H$. Assume that $\gamma = 0$ and $\pm(a^\pm(x), n) > 0$, $x \in H$. Then there are two solutions $u^\pm$ to the ODE $du(t) = a(u(t))\,dt$ (1.4) that start at $x^0 \in H$ and exit from $H$ immediately to the upper and the lower half-spaces, respectively. It was proved in [20] that if $\gamma = 0$, then the limit process $u^0$ immediately leaves $H$ and moves as $u^\pm$ with probabilities proportional to $|(a^\pm(x^0), n)|$. The proof was similar to the one-dimensional situation and used a comparison principle adapted to the multidimensional setting. The investigation for arbitrary $\gamma \in (0, 1)$ will be similar, but the selection probabilities will be different.
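To see where selection probabilities of this kind come from, here is a sketch of the classical one-dimensional computation via the scale function. It assumes a model drift $a(u)=c_+u^{\gamma}$ for $u>0$ and $a(u)=-c_-|u|^{\gamma}$ for $u<0$ with additive noise $\varepsilon\,dW$; the constants $c_\pm>0$, the exponent $\gamma\in[0,1)$ and the level $\delta>0$ are notation introduced here, and the resulting formula is only a one-dimensional heuristic, not the multidimensional statement proved in the paper.

```latex
% Scale function density of du = a(u) dt + \varepsilon dW (assumed model drift):
\[
  s_\varepsilon'(u)
  = \exp\Bigl(-\frac{2}{\varepsilon^{2}}\int_0^u a(z)\,dz\Bigr)
  = \exp\Bigl(-\frac{2c_\pm\,|u|^{1+\gamma}}{(1+\gamma)\,\varepsilon^{2}}\Bigr),
  \qquad \pm u > 0 .
\]
% Exit probability from (-\delta,\delta) through the upper end, started at 0:
\[
  \mathsf{P}_0\bigl(u^\varepsilon \text{ hits } \delta \text{ before } -\delta\bigr)
  = \frac{\int_{-\delta}^{0} s_\varepsilon'(u)\,du}{\int_{-\delta}^{\delta} s_\varepsilon'(u)\,du} .
\]
% Asymptotics of each one-sided integral as \varepsilon \to 0 (c = c_+ or c_-):
\[
  \int_0^{\delta}\exp\Bigl(-\frac{2c\,u^{1+\gamma}}{(1+\gamma)\,\varepsilon^{2}}\Bigr)du
  \sim \Gamma\Bigl(1+\tfrac{1}{1+\gamma}\Bigr)
  \Bigl(\frac{(1+\gamma)\,\varepsilon^{2}}{2c}\Bigr)^{\frac{1}{1+\gamma}} .
\]
% Hence, for every fixed \delta > 0:
\[
  \lim_{\varepsilon\to 0}
  \mathsf{P}_0\bigl(u^\varepsilon \text{ hits } \delta \text{ before } -\delta\bigr)
  = \frac{c_-^{-\frac{1}{1+\gamma}}}{c_+^{-\frac{1}{1+\gamma}} + c_-^{-\frac{1}{1+\gamma}}}
  = \frac{c_+^{\frac{1}{1+\gamma}}}{c_+^{\frac{1}{1+\gamma}} + c_-^{\frac{1}{1+\gamma}}} .
\]
```

For $\gamma=0$ the limit is $c_+/(c_++c_-)$, i.e. proportional to the size of the drift on the corresponding side, while for $\gamma\in(0,1)$ the constants enter through the power $1/(1+\gamma)$.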
Remark 1.1. It was assumed in [20] that the noise is additive, i.e., σ is the identity matrix and m = d. The case of multiplicative noise is completely analogous.
Remark 1.2. If $\gamma = 0$ and the vector field $a$ pushes away from $H$ on one side and attracts to $H$ on the other (for example, $(a^\pm(x), n) > 0$), then there is a unique solution to (1.4) that starts at $x^0 \in H$. This solution exits from $H$ immediately (to the upper half-space in our case), and the limit process $u^0$ equals this solution of the ODE, see [20]. If $\gamma \in (0, 1)$, the result is similar. Assume, for example, that $b^\pm(x^0) > 0$. Then there exists a unique solution to (1.4) that exits $H$ immediately (there may be other solutions that stay in $H$). Moreover, this solution exits to the upper half-space and the limit process $u^0$ equals this solution. We do not prove this result in this paper; the proof is similar to that in [20].
Case 2. (The vector field $a$ pushes towards the hyperplane.) Assume that $\gamma = 0$ and $\pm(a^\pm(x), n) < 0$, $x \in H$. It can be seen that any limit point of $\{u^\varepsilon\}$ must stay in $H$ with probability 1. It was proved in [20] that the limit process $u^0$ satisfies an ODE on $H$ with the drift $P_H\bigl(p_+(x)a^+(x) + p_-(x)a^-(x)\bigr)$, where $P_H$ is the orthogonal projection onto $H$ and the coefficients $p_\pm(x)$ are equal to $p_\pm(x) = \frac{|a^\mp_d(x)|}{|a^+_d(x)| + |a^-_d(x)|}$. Note that this multidimensional result has no one-dimensional analogue: in dimension one the limit is the zero process, whereas in the multidimensional case the first $(d-1)$ coordinates may change while the $d$-th coordinate stays zero.
The idea of the proof was to analyze the time spent by $u^\varepsilon$ in the upper and lower half-spaces. Since any limit process stays in $H$ and $u^\varepsilon$ is close to $H$ for small $\varepsilon$, the times spent in the upper and lower half-spaces in a neighborhood of $x \in H$ are proportional, in absolute value, to the $d$-th coordinates $a^-_d(x)$ and $a^+_d(x)$, respectively (they are nonzero if $\gamma = 0$). Note that the proof in [20] was independent of the type of noise. The small noise might be an arbitrary process that (a) ensures existence of a solution and (b) converges to 0 uniformly in probability as $\varepsilon \to 0$ (however, the corresponding results were formulated for Brownian noise only).
The proof from [20] does not work if $a_d(x) \to 0$ as $x$ approaches $H$. The time spent in the upper and lower half-spaces may depend on the rate of decay of $a_d$ in a neighborhood of $H$. In this paper we prove the result when $a$ satisfies assumptions A1, A2 with $\gamma \in (0, 1)$, $b^+(x) < 0$ and $b^-(x) > 0$ for $x \in H$.
It appears that if we scale the vertical coordinate as $\varepsilon^{-\delta}u^{d,\varepsilon}(t)$ for a special choice of $\delta > 0$, then $(u^{1,\varepsilon}(t), \dots, u^{d-1,\varepsilon}(t))$ and $\varepsilon^{-\delta}u^{d,\varepsilon}(t)$ can be considered as components of a Markov process evolving in a "slow" and a "fast" time, respectively. Hence the description of the limit process for $\{u^\varepsilon\}$ is closely related to the averaging principle for Markov processes. We will see that the limit process satisfies an ODE on $H$ whose coefficients are averages of functions of $a^\pm_k$, $k = 1, \dots, d-1$, with respect to the stationary distribution of the scaled vertical component with the other components frozen. The idea of using a scaling in small-noise problems was applied effectively in the one-dimensional case when the drift is a power-type function and the noise is a Lévy α-stable process or even more general.
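The choice of the scaling exponent can be motivated by a short balance computation. The sketch below is a heuristic under assumption A1 with a bounded, non-degenerate noise coefficient; the symbols $v$, $\tilde v$ and $\kappa_\varepsilon$ are notation introduced here, and the value $\delta = 2/(1+\gamma)$ is what this heuristic suggests rather than a quotation from the text.

```latex
% Vertical component v := u^{d,\varepsilon}, with drift of the form A1 (assumed):
\[
  dv = |v|^{\gamma}\, b(u^\varepsilon)\,dt + \varepsilon\,\bigl(\sigma(u^\varepsilon)\,dW\bigr)_d ,
  \qquad \tilde v := \varepsilon^{-\delta} v ,
\]
\[
  d\tilde v
  = \varepsilon^{-\delta(1-\gamma)}\,|\tilde v|^{\gamma}\, b(u^\varepsilon)\,dt
  + \varepsilon^{1-\delta}\,\bigl(\sigma(u^\varepsilon)\,dW\bigr)_d .
\]
% Drift and noise act on the same time scale when the drift rate equals
% the squared noise amplitude:
\[
  \varepsilon^{-\delta(1-\gamma)} = \bigl(\varepsilon^{1-\delta}\bigr)^{2}
  \iff \delta = \frac{2}{1+\gamma},
  \qquad
  \kappa_\varepsilon := \varepsilon^{\frac{2(1-\gamma)}{1+\gamma}} \to 0 ,
\]
\[
  d\tilde v
  = \kappa_\varepsilon^{-1}\,|\tilde v|^{\gamma}\, b(u^\varepsilon)\,dt
  + \kappa_\varepsilon^{-1/2}\,\bigl(\sigma(u^\varepsilon)\,dW\bigr)_d .
\]
```

With this choice the scaled vertical component runs in the fast time $\kappa_\varepsilon^{-1}t$, while the remaining coordinates keep the original time scale $t$, which is the two-scale structure of an averaging problem.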
x ∈ H, then the limit process may be non-Markov and satisfy certain equation [20] that depends somehow on a Wiener process W (that formally should disappear in a limit equation).
The paper is organized as follows. In section 2 we formulate the problem and the main results. The proofs for the cases when the drift pushes outwards H and towards H are given in §3 and §4, respectively.
In subsection 2.3 we also formulate an averaging principle, which is a quite general and new result. The proof of the averaging principle is postponed to section 5.
Singular diffusions: analytic and stochastic approaches between the University of Potsdam and the Institute of Mathematics of the National Academy of Sciences of Ukraine.

Main results
Let us represent $u^\varepsilon(t)$ as a pair $(X^\varepsilon(t), Y^\varepsilon(t))$, where $Y^\varepsilon$ is the last coordinate of $u^\varepsilon$ and $X^\varepsilon$ consists of the first $d-1$ coordinates. Below we study only the general problem for the pair $(X^\varepsilon(t), Y^\varepsilon(t))$, which can easily be reformulated for $u^\varepsilon$. For notational convenience, we assume below that $X^\varepsilon$ is a $d$-dimensional process rather than a $(d-1)$-dimensional one.
The general setup is the following. Let $X^\varepsilon$, $Y^\varepsilon$ be stochastic processes with values in $\mathbb{R}^d$ and $\mathbb{R}$, respectively. Assume that the pair $(X^\varepsilon, Y^\varepsilon)$ satisfies the following SDE where $B$, $W$ are Wiener processes (multidimensional and one-dimensional, respectively) that may be dependent. Denote We assume that the domains of $\psi_\pm$, $\varphi_\pm$ are the whole space $x \in \mathbb{R}^d$, $y \in \mathbb{R}$, although we use their values on the corresponding half-spaces only. The functions $\psi$, $\varphi$ may have a jump discontinuity on $H$.
B2. $\varphi_\pm(x, 0) \neq 0$ for any $x \in \mathbb{R}^d$;
B3. $\beta(x, y) = \beta_+(x, y)\mathbf{1}_{y\geq 0} + \beta_-(x, y)\mathbf{1}_{y<0}$, where $\beta_\pm$ are bounded, continuous functions separated from zero on the whole space $\mathbb{R}^d \times \mathbb{R}$; the function $b$ is bounded and continuous in $(\mathbb{R}^d \times \mathbb{R}) \setminus H$;
B4. $\gamma \in (0, 1)$.
Under assumptions B1-B4 there exists a weak solution to (2.1). Indeed, it follows from standard compactness arguments that there exists a weak solution to dX Note that all coefficients may be discontinuous on $H$, but the processes spend zero time there with probability 1, so any redefinition of the coefficients on $H$ does not affect the equations. Using a time-change argument, see for example [13], we get a solution to Finally, Girsanov's theorem yields the existence of a weak solution to (2.1).
Remark 2.1. If b is non-degenerate, then existence of a solution can be proved without transformation of time arguments.

Repulsion from the hyperplane
In this subsection we assume that ϕ ± (x, 0) > 0 for all x ∈ R d . Then sgn(y)ϕ(x, y)y γ > 0 for y ≠ 0, and the drift pushes away from the hyperplane R d × {0}.
The proof is given in §3.
Remark 2.2. If ±ϕ ± (x, 0) > 0 (or ±ϕ ± (x, 0) < 0) for all x ∈ R d , then the limit process is (X + (t), Y + (t)) (respectively (X − (t), Y − (t)) ) with probability 1.
Remark 2.3. If we have the inequalities ϕ + (x 0 , 0) > 0 and ϕ − (x 0 , 0) < 0 only at the initial point (and hence in some neighborhood, by continuity of the coefficients), then the functions (X ± (t), Y ± (t)) are well defined up to the moment τ ± H := inf{t > 0 : Y ± (t) = 0} of the first return to H. In this case we have the convergence in distribution for the stopped processes: . The proof is essentially the same, but it additionally involves routine localization arguments.

Attraction to the hyperplane
In this subsection we assume that ϕ ± (x, 0) < 0 for all x ∈ R d .
Suppose that assumptions B1-B4 hold true and ψ ± are locally Lipschitz in x for any fixed y.
Theorem 2.2. For any T > 0 we have the uniform convergence in probability where X(t) is a solution to the following ODE The proof is given in §4.
where π (x) is the stationary distribution for the SDE Hence, i.e., the drift of the limit equation is the averaging of ψ ± over the stationary distribution of an SDE with frozen x variable. The corresponding relation between the averaging principle and averaging of coefficients in the limit equation for the small noise perturbation problem will be seen from the proof.
In the next subsection we formulate an averaging principle, which is applied in the proof of Theorem 2.2. We consider more general SDEs than (2.1) because the idea of the proof is universal. The corresponding result may be of independent interest.

Averaging
For $\varepsilon > 0$, let the processes $X^\varepsilon(t)$, $Y^\varepsilon(t)$ take values in $\mathbb{R}^d$, $\mathbb{R}^k$ and have the form where $B^\varepsilon_t$, $W^\varepsilon_t$ are Brownian motions and $N^\varepsilon(du, dt)$, $Q^\varepsilon(dz, dt)$ are Poisson point measures on a common filtered probability space $(\Omega^\varepsilon, \mathcal{F}^\varepsilon, \mathsf{P}^\varepsilon)$, and the random measures $N^\varepsilon(du, dt)$, $Q^\varepsilon(dz, dt)$ have the intensity measures $\nu^\varepsilon(du)\,dt$ and $\varepsilon^{-1}\mu^\varepsilon(dz)\,dt$, respectively. These random measures enter the system in the partially compensated form, which is quite typical for Lévy-driven SDEs; what is a bit unusual is the choice of the cutoff functions $1_{|u|\leq\rho}$, $1_{|z|\leq\rho}$ with the number $\rho > 0$ to be specified separately. This choice will become clear later, when we describe the limit behavior of the Lévy measures $\nu^\varepsilon(du)$, $\mu^\varepsilon(dz)$ as $\varepsilon \to 0$. Note that here and below we do not assume uniqueness of a solution to the prelimit equation (2.4).
The factor $\varepsilon^{-1}$ in the intensity measure for $Q^\varepsilon(dz, dt)$ and the factors $\varepsilon^{-1}$, $\varepsilon^{-1/2}$ at the integrals w.r.t. $ds$ and $dW^\varepsilon_s$ in the equation for $Y^\varepsilon$ mean that the component $Y^\varepsilon$ evolves at the 'fast' time scale $\varepsilon^{-1}t$, which we will also call the 'microscopic' time scale. The component $X^\varepsilon$ evolves at the 'slow', or 'macroscopic', time scale $t$; its evolution involves the deterministic term, two stochastic terms (continuous and partially compensated jump parts), and a residual term $\xi^\varepsilon$, on which we do not impose any structural assumptions and only require it to be asymptotically small in the following sense:
H 0 . (Negligibility of the residual term). The process $\xi^\varepsilon(t)$ is an adapted càdlàg process, and for any $T > 0$, sup
The aim of this subsection is to obtain the averaging principle (AP) for the 'slow' component $X^\varepsilon$. Let us stress that the framework we adopt is quite general; in particular,
• the two-scale system (2.4) is fully coupled in the sense that the coefficients of the 'slow' component depend on the 'fast' one, and vice versa;
• the noises for the 'slow' and the 'fast' component are allowed to be dependent;
• the coefficients of the 'slow' component can be discontinuous.
Let us introduce further assumptions on the system (2.4). Note that all the assumptions listed below are quite natural and non-restrictive. H 1 . (Bounds for the coefficients). There exists a constant C such that for all values of x, y, u, z.
In addition, for any $R > 0$ there exists a constant $C_R$ such that
H 2 . (Bounds for the Lévy measures). There exist constants $C$ and $p > 0$ such that
To introduce the next condition, let us define the weak convergence of a family of Lévy measures on $\mathbb{R}^m$ in the following way: Condition (2.5) yields that the cutoff functions $1_{|u|\leq\rho}$, $1_{|z|\leq\rho}$ used in (2.4) are a.s. continuous w.r.t. the measures $\nu(du)$, $\mu(dz)$, respectively. Note that there exists an at most countable set of levels $\rho$ for which (2.5) fails, hence one can always choose $\rho$ to satisfy this condition. Of course, changing the cutoff level would change the drift coefficients correspondingly.
Next, assume that the drift of the fast component attracts to the origin. H 5 . (The drift condition for the microscopic dynamics) There exist κ > 0 and c, r > 0 such that In addition, the balance condition holds: where p is introduced in the assumption H 2 .
Consider a family of 'frozen microscopic equations' where W is a Wiener process and Q(dz, dt) is an independent Poisson point measure with the intensity measure µ(dz)dt. For the corresponding 'frozen dynamics' we introduce a separate family of assumptions.
F 0 . (The 'frozen microscopic dynamics' is well defined and Feller). For any x and any initial value y(0) = y, the SDE (2.8) has a unique weak solution, which is a Markov process. Furthermore we denote the corresponding family of Markov processes by y (x) , x ∈ R d , and write P (x) t (y, dy ′ ) for the corresponding family of transition probabilities.
We also denote by $P^{\mathrm{frozen}}$ the semigroup of operators corresponding to the two-component process $(x, y^{(x)})$ in which the first component is constant and the second one is the Markov process specified above. We assume that this semigroup is Feller. For this family, we assume the following mixing property, which is actually the local Dobrushin condition, uniform in the parameter $x$; see [16, Section 2]. F 1 . (The 'frozen microscopic dynamics' is locally mixing). There exists $h > 0$ such that, for any $R > 0$ there exists $\rho = \rho_R > 0$ such that, for any $x, y_1, y_2$ with $|x| \leq R$, t (y, dy ′ ) denotes the transition probability of the process $y^{(x)}$, and the total variation distance between probability measures is defined as We note that assumptions F 1 , H 5 ensure that, for each $x \in \mathbb{R}^d$, the laws of $y^{(x)}_t$ converge to the invariant probability measure (IPM) $\pi^{(x)}(dy)$ with an explicit rate; see Proposition 5.1 below.
For the coefficients of the 'slow' component, we assume a weaker analogue of H 3 , where the convergence and continuity of the limiting coefficients may fail on an exceptional set, which should be negligible, in a sense. H 6 . (The coefficients of the slow component are convergent). There exist functions a(x, y), σ(x, y), c(x, y, u) and an open set B ⊂ R d × R k such that, for any compact set K ⊂ B, a ε (x, y) → a(x, y) and σ ε (x, y) → σ(x, y) as ε → 0 uniformly on K, and for any R > 1 In addition, the functions a(x, y), σ(x, y), and c(x, y, u) are continuous on B and B × (R m \ {0}), respectively.
Define the averaging of the limiting drift coefficient for the macroscopic component w.r.t. the family of IPMs for the frozen microscopic one: Next, consider the limiting diffusion matrix and compensated/non-compensated jump kernels for the macroscopic component, and introduce the corresponding averaged characteristics as Finally, we introduce an auxiliary technical assumption. A 0 . The averaged coefficients a(x), b(x) are continuous. The averaged Lévy kernels K (ρ) (x, dv), Remark 2.5. It is easy to give a sufficient condition for A 0 to hold. Namely, it is enough to assume, in addition to t (y, dy ′ ) are continuous in x w.r.t. the total variation convergence for each y ∈ R k , t ≥ t 0 . Then, because of the convergence (5.6), the same continuity holds for the family of the IPMs π (x) (dy). The latter continuity, combined with H 1 , H 2 , H 4 , and H 6 , yields the required continuity of the averaged coefficients. Now we are ready to formulate our main statement.
in probability and {Y ε (0)} be bounded in probability.
Then the family {X ε , ε > 0} is weakly compact in D([0, ∞), R d ), and any of its weak limit points as ε → 0 is a solution to the martingale problem (L, where ϕ ∈ C ∞ 0 . If the martingale problem (2.9) is well posed, then X ε converges weakly as ε → 0 to its unique solution with X(0) = x 0 .

Proof of Theorem 2.1
The proof almost copies the proof of Theorem 3.1 in [20], so we only sketch its main steps.
Step 1. The sequence {(X ε , Y ε )} is weakly relatively compact. This follows from the boundedness of the functions ϕ, ψ, b, β. Therefore, to prove the theorem it suffices to verify that any subsequence {(X εn , Y εn )} contains a sub-subsequence {(X εn k , Y εn k )} that converges to the desired limit. Without loss of generality we will assume that {(X ε , Y ε )} is itself weakly convergent.
Step 2. Estimate for the time spent by Y ε in a neighborhood of 0. We will use the following general statement.
Lemma 3.1. Assume that processes $\{\eta^\varepsilon(t)\}$ satisfy the following SDE where $|\gamma| < 1$, and $a^\varepsilon(t)$, $b^\varepsilon(t)$ are $\mathcal{F}_t$-adapted processes such that Then there is a constant $K = K(A, C_1, C_2)$ such that The proof of the lemma is quite standard; we postpone it to the Appendix. Without loss of generality we will assume that where $c_{1,2,3}$ are some positive constants. This assumption does not restrict generality, since the general case can be treated by localization. Under this additional assumption, Lemma 3.1 applied to $\tau^\varepsilon(\delta) := \inf\{t \geq 0 : |Y^\varepsilon(t)| \geq \delta\}$ and the Chebyshev inequality yield
Remark 3.1. It can be seen from the construction of $Y^\pm$ that inequality (3.2) is also valid for $\tau^\pm(\delta) := \inf\{t \geq 0 : |Y^\pm(t)| \geq \delta\}$.
Step 3. We see from (3.2) that with high probability the random variable $\tau^\varepsilon(\delta)$ is dominated by $\delta^{\frac{1-\gamma}{2}}$. It follows from standard moment estimates for SDEs that for small $t$ we have where the constant $C$ can be selected independently of $\varepsilon \in [0, 1]$.
Step 4. We denote by $(X^{x,y}(t), Y^{x,y}(t))$ a solution to the corresponding ODE that starts from $x \in \mathbb{R}^d$, $y \neq 0$. This solution never hits $\mathbb{R}^d \times \{0\}$, recall (3.1). The pair $(X^{x,y}(t), Y^{x,y}(t))$ is well defined because at all other points the coefficients satisfy the local Lipschitz condition.
To estimate I 1 we need the following statement on integral equations. Let f (t) = (f X (t), f Y (t)) be a non-random continuous function, and let the functions X ±,x,y (f ) , Y ±,x,y (f ) satisfy the integral equation Remark 3.2. We do not assume that a pair X ±,x,y is a unique solution. Recall also that the domain of ψ ± , ϕ ± is the whole space.
The proof of the lemma is standard. Notice that if α is small enough, then Y ±,x,y (f ) (t) ≠ 0, t ∈ [0, T ], and the coefficients of the integral equations are locally Lipschitz continuous if y ≠ 0.
Let ω be such that Y ε (τ ε (δ)) = δ. Then Since b and β are bounded we have the uniform convergence in probability: for any δ > 0.
where p ± are defined in (2.2).
Here we used the following Since ν was arbitrary, this completes the proof of Lemma 3.3 and Theorem 2.1.
Without loss of generality we will assume that where c is a constant. The general case can be considered using a localization. Hence, condition H 5 is satisfied with κ = γ.

Consider equation with frozen coefficients
Existence and uniqueness of a weak solution to the equation with frozen coefficients, as well as the strong Markov property, follow from [10]. Hence condition F 0 holds true.
To verify condition F 1 , we modify the argument from [16, Section 3.3.2]. Because the diffusion coefficient in (4.4) is discontinuous, we do not have a good reference to state that the transition probability density $p^{(x)}(t, y, y')$ is continuous in $x, y, y'$. To overcome this minor difficulty we use the following localization argument. Consider the SDE $dy^{(x,+)}(t) = \varphi_+(x, 0)\,(|y^{(x,+)}(t)| \wedge 2)^{\gamma}\,\mathrm{sgn}(y^{(x,+)}(t))\,dt + \beta_+(x, 0)\,dW(t)$ (4.5). This is an SDE with a constant diffusion coefficient and a bounded, Hölder continuous drift coefficient, hence the standard analytic theory (e.g. [11]) yields that its transition probability density $p^{(x,+)}(t, y, y')$ is continuous in $x, y, y'$. Then for $y_0 = 1$ and every $t_0 > 0$ it holds that Combining these two estimates we see that there exist $t_0 > 0$ and $r > 0$ small enough, so that sup in the RHS we could actually take any number $> \frac{1}{3} + \frac{1}{3} = \frac{2}{3}$. This proves the local Dobrushin condition in a small ball centered at $y_0 = 1$. To extend this condition to a large ball $|y| \leq R$, we use another standard argument, based on the support theorem. Namely, $y^{(x)}$ can be represented as an image of a Brownian motion under a time change and a change of measure; see [13]. Since the Wiener measure on $C_0(0, \infty)$ has full topological support, it is easy to show using this representation that, for any $t_1 > 0$, there exists $\delta > 0$ such that Take $h = t_0 + t_1$ and, for $x, y_1, y_2$ with $|x| \leq R$, $|y_1| \leq R$, $|y_2| \leq R$, consider two processes $Y^1_t$, $Y^2_t$ which start at $y_1, y_2$, respectively, solve (4.4) independently up to the time $t_1$, and then provide the maximal coupling probability on the time interval $[t_1, t_1 + t_0]$, conditioned on their values at the time $t_1$ (such processes can be constructed using the Coupling Lemma for probability kernels, [16, Theorem 2.2.4]). Then h (y 2 , dz 2 ) for any $|x| \leq R$, $|y_1| \leq R$, $|y_2| \leq R$, which completes the proof of F 1 .
The invariant probability measure π (x) (dy) equals, see [14,Exercise 5.40]: The averaged coefficient
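For orientation, the stationary law of a one-dimensional diffusion of this type can be written down from the standard speed-measure formula. The sketch below assumes the frozen equation has the attracting form $dy = -c_+\,y^{\gamma}\,dt + \beta_+\,dW$ for $y>0$ and $dy = c_-\,|y|^{\gamma}\,dt + \beta_-\,dW$ for $y<0$; the shorthand $c_\pm := |\varphi_\pm(x,0)|$, $\beta_\pm := \beta_\pm(x,0) > 0$, the normalizing constant $Z$ and the symbol $\bar\psi$ are notation introduced here, so the formulas are an illustration under these assumptions rather than a quotation of the displays above.

```latex
% Invariant density \propto \sigma^{-2}(y)\exp\bigl(2\int_0^y b(z)/\sigma^2(z)\,dz\bigr)
% for drift b and (piecewise constant) diffusion coefficient \sigma:
\[
  \pi^{(x)}(dy)
  = \frac{1}{Z}\,\frac{1}{\beta_\pm^{2}}
    \exp\Bigl(-\frac{2c_\pm\,|y|^{1+\gamma}}{(1+\gamma)\,\beta_\pm^{2}}\Bigr)\,dy ,
  \qquad \pm y > 0 .
\]
% Mass of the upper half-line (the Gamma-function factors cancel):
\[
  \pi^{(x)}\bigl((0,\infty)\bigr)
  = \frac{c_+^{-\frac{1}{1+\gamma}}\,\beta_+^{-\frac{2\gamma}{1+\gamma}}}
         {c_+^{-\frac{1}{1+\gamma}}\,\beta_+^{-\frac{2\gamma}{1+\gamma}}
          + c_-^{-\frac{1}{1+\gamma}}\,\beta_-^{-\frac{2\gamma}{1+\gamma}}} .
\]
% Averaged drift of the slow component: \psi_\pm(x,0) weighted by the occupation
% of the upper and lower half-lines.
\[
  \bar\psi(x)
  = \psi_+(x,0)\,\pi^{(x)}\bigl((0,\infty)\bigr)
  + \psi_-(x,0)\,\pi^{(x)}\bigl((-\infty,0)\bigr) .
\]
```

As a consistency check, for $\gamma = 0$ the weight of the upper half-line reduces to $c_-/(c_+ + c_-)$, which agrees with the occupation-time proportions recalled in the Introduction for the case studied in [20].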

Proof of Theorem 2.3
The weak compactness of the family {X ε , ε > 0} in D([0, ∞), R d ) follows, in a standard way, from the negligibility assumption H 0 and the boundedness assumptions H 1 , H 2 . Under the assumptions of the theorem, for any C ∞ 0 -function ϕ the function Lϕ is continuous and bounded. Hence, in order to prove that any weak limit point of the family {X ε , ε > 0} as ε → 0 solves the MP (2.9), it is enough to show that, for any C ∞ 0 -function ϕ, any s 1 . . . , s q < s < t, and any continuous and bounded function Φ : we denote by E ε the expectation w.r.t. P ε . Denote Observe that Lϕ is a bounded and continuous function. So, by H 0 relation (5.1) is equivalent to Then by the Itô formula we have Applying H 0 once again, we get that, to prove (5.1) and (5.3), it is enough to prove, for any s 1 . . . , s q < t, Before proving (5.5), we formulate and prove two auxiliary statements.

Auxiliaries, I: uniform ergodic rate for the frozen microscopic dynamics
Proposition 5.1. Let conditions H 1 − H 5 , F 0 , F 1 hold. If κ ∈ (0, 1) and p > 0 are from these conditions, then for every R > 0 there exists C such that for any x, y with |x| ≤ R, |y| ≤ R If κ ≥ 1, then there exists a > 0 such that, for every R > 0 and any x, y with |x| ≤ R, |y| ≤ R, with a constant C depending on R.
Proof. The required statement is actually obtained, though not in this precise form, in [16, Section 3]. The difference between the current situation and the one studied in [16] is that the ergodic rates were obtained there for individual processes (while here we have a family indexed by $x$) and separately for diffusions and Lévy-driven SDEs (while here we have both types of noise involved simultaneously). This difference is not crucial, and we just give a short outline of the argument, referring to [16] for details. The convergence conditions H 3 , H 4 yield that the bounds from the conditions H 1 , H 2 and the drift condition H 5 remain true for the limiting coefficients $A(x, y)$, $\Sigma(x, y)$, $C(x, y, z)$ and Lévy measure µ(du). Then we have the following: if $V \in C^2$ is a function such that $V(y) \geq 1$ and $V(y) = |y|^p$, $|y| \geq 2$, then for any $x \in \mathbb{R}^d$ the semimartingale decomposition $V(y^{(x)}(t)) = V(y^{(x)}(0)) + \int_0^t AV(x, y^{(x)}(s))\,ds + (\text{martingale part})$ holds, where the function $AV(x, y)$ satisfies with some constants $C_V, a_V > 0$. For the proof of this statement, see [17], Proposition 2.5. Given (5.7), (5.8) we can proceed analogously to [16, Sections 3.3, 3.4]. Namely, for $\kappa \in (0, 1)$ we use [ where $h$ is the same as in the assumption F 1 , $C_V, c_V > 0$ are some new constants, and V is a new function which is equivalent to V in the sense that, for some positive constants c 1 , c 2 = y (x) (kh), k ≥ 0 for the process y (x) , see [16, Section 2.8]. Combined with the local Dobrushin condition assumed in F 1 , we get by [16, Corollary 2.8.10] for $\kappa \in (0, 1)$ the inequality where we have used the identity Since the Lyapunov condition and the local Dobrushin condition are uniform in $x$, the constant $C$ here can be chosen uniformly for x ∈ R d ; one can easily check this following line by line the proofs of [ t (y, dy ′ ) − π (x) (dy ′ ) T V is non-increasing in t and V (y) is locally bounded, this completes the proof of the required statement in the case $\kappa \in (0, 1)$.
That is, to prove the required weak convergence it is enough to prove that {y εn } is weakly compact in D([0, T ], R k ).
To prove the weak compactness, we use $L^2$-moment bounds for the increments of the process $y^{\varepsilon_n}$ combined with a truncation of the large jumps. Namely, by H 2 for any fixed $\delta > 0$ there exists Thus it is enough to prove weak compactness for every 'truncated' family $\{y^{\varepsilon_n, Q}\}$, $Q > 0$, where $y^{\varepsilon, Q}$ satisfies an analogue of (5.10) with the integral for $q^\varepsilon$ taken over $\{|z| \leq Q\}$ instead of $\mathbb{R}^l$. For such a 'truncated' family, applying [17, Proposition 2.5] we get that $|y^{\varepsilon_n, Q}(s)|^2 = |y^{\varepsilon_n, Q}(0)|^2 + \int_0^s H(r)\,dr + (\text{martingale part})$ (5.15), where $H$ is bounded. Combining this with the maximal martingale inequality, we get that $|y^{\varepsilon_n, Q}(s)|^2$ is bounded. Since the coefficient $A^\varepsilon(x, y)$ is bounded locally in $y$, the above bound and the (uniform) bounds for $C^\varepsilon$, $\mu^\varepsilon$ from H 1 , H 2 yield the required weak compactness of $\{y^{\varepsilon_n}\}$. Summarizing all the above, we have that $\{y^{\varepsilon_n}\}$ weakly converges to $P^*$. Combined with (5.14), this contradicts (5.13) and proves the required statement.
The following lemma collects several simple statements used in the proof.
To prove statement (d), we first note that each function χ j is continuous by the assumption F 0 . These functions converge monotonically, at each x ∈ R d , to the function where the last identity holds by the assumption H 6 . Then the required uniform convergence follows from Dini's theorem.
To prove statement (e), we first use statements (c) and (d) to get uniformly for x with |x| ≤ R. Then the required statement follows by the identity .
Statement (f) can be obtained using the same 'truncation of large jumps' argument as in the proof of Proposition 5.2 and the bounds from the assumptions H 1 , H 2 ; we omit the details.
This proves (5.5) and completes the entire proof.
Let $x > 0$ be arbitrary. Changing the variables $s := z^{\gamma+1}/\varepsilon^{2}$ and $t := y^{\gamma+1}/\varepsilon^{2}$ we get v ε (x) = 2ε Therefore, we get from (6.1) the following equivalence for any fixed $x \neq 0$ as $\varepsilon \to 0$: where $K$ is a constant independent of $\delta$. This yields that for any fixed $\delta \geq 0$: This completes the proof of the lemma.