Invariant measures for stochastic functional differential equations

We establish new general sufficient conditions for the existence of an invariant measure for stochastic functional differential equations and for exponential or subexponential convergence to the equilibrium. The obtained conditions extend Veretennikov--Khasminskii conditions for SDEs and are optimal in a certain sense.


Introduction
While ergodic properties of stochastic differential equations (SDEs) are by now fairly well understood, much less is known about ergodic properties of stochastic functional (or delay) differential equations (SFDEs). In this article we establish new general sufficient conditions for the existence of an invariant measure for SFDEs and obtain estimates for the rate of convergence to the equilibrium. In general, SFDEs have a rather peculiar ergodic behavior that can be very different from that of SDEs. Let us briefly describe its main features. First of all, as was shown in [24], an SFDE might have a reconstruction property. Namely, consider the equation
$$dX^{(x)}(t) = f\big(X^{(x)}(t-1)\big)\,dt + g\big(X^{(x)}(t-1)\big)\,dW(t), \quad t \ge 0, \tag{1.1}$$
where f: R → R is a Lipschitz function, g: R → R is a positive, strictly increasing, bounded Lipschitz function, x: [−1, 0] → R is a continuous function and W is a one-dimensional Brownian motion. It turns out that if for some N > 0 (which may be arbitrarily large) one observes a single piece of the trajectory {X(t, ω), t ∈ [N, N + 1]}, then with probability 1 one can reconstruct the initial condition {X(t), t ∈ [−1, 0]} of the SFDE. Clearly, this is not the case for SDEs. As a consequence, the solution to (1.1) is not a strong Feller process and does not have a mixing property. Indeed, if x ≠ y, then the measures Law{X^{(x)}(t), t ∈ [N, N + 1]} and Law{X^{(y)}(t), t ∈ [N, N + 1]} are mutually singular for any N > 0. Therefore, one cannot hope to construct a classical coupling between these measures to show asymptotic stability.
SFDEs might also have a resonance property. Consider a delay version of the classical Ornstein-Uhlenbeck process,
$$dX(t) = -\lambda X(t-1)\,dt + dW(t), \quad t \ge 0, \tag{1.2}$$
where λ > 0. Then, contrary to the non-delay case, for large enough λ (more precisely, for λ ≥ π/2) this equation does not have an invariant measure [14]. Moreover, for large λ the solution oscillates with rapidly increasing amplitude.
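The resonance phenomenon behind (1.2) can be observed numerically by discretising the equation with an explicit Euler-Maruyama scheme. The sketch below is only an illustration under choices of our own (initial segment x ≡ 1 on [−1, 0], step size dt = 0.01, horizon T = 60, and the hypothetical helper names simulate_delay_ou and late_amplitude); it is not part of the analysis in [14]. With the noise switched off it exhibits the deterministic dichotomy: for λ = 1 < π/2 the solution decays, while for λ = 2 > π/2 it oscillates with growing amplitude.

```python
import math
import random


def simulate_delay_ou(lam, T=60.0, dt=0.01, noise=1.0, seed=0):
    """Euler-Maruyama sketch for dX(t) = -lam * X(t - 1) dt + noise * dW(t).

    The initial segment is x(s) = 1 for s in [-1, 0] (an arbitrary choice).
    Returns the grid values of X on [-1, T], spaced dt apart.
    """
    rng = random.Random(seed)
    k = round(1.0 / dt)           # grid steps per unit of delay
    n_steps = round(T / dt)
    path = [1.0] * (k + 1)        # X(-1), X(-1 + dt), ..., X(0)
    for _ in range(n_steps):
        current = path[-1]        # X(t)
        delayed = path[-1 - k]    # X(t - 1)
        dw = rng.gauss(0.0, math.sqrt(dt)) if noise else 0.0
        path.append(current - lam * delayed * dt + noise * dw)
    return path


def late_amplitude(path, dt=0.01, window=4.0):
    """max |X(t)| over the last `window` units of time."""
    return max(abs(v) for v in path[-(round(window / dt) + 1):])


# Deterministic skeleton (noise = 0): lam = 1 lies below pi/2, lam = 2 above.
print(late_amplitude(simulate_delay_ou(1.0, noise=0.0)))
print(late_amplitude(simulate_delay_ou(2.0, noise=0.0)))
```

With the default noise = 1.0 the same function produces sample paths of (1.2) itself; for λ > π/2 one should see the noisy oscillations blow up in the way described above.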
Due to the above-mentioned challenges, the question of existence of an invariant measure and of the rate of convergence to the equilibrium remained open even for the relatively simple SFDE (1.1) when f is not affine. In the current paper we answer this question.
Let us recall that there are two quite general approaches that are used to study the ergodic properties of Markov processes. The first approach is based on functional inequalities, see, e.g., [1]. The second approach is based on the concept of small sets and utilizes the coupling method, see, e.g., [18]. Using these techniques, it was shown that if the drift vector field of an SDE points towards the origin (the so-called Veretennikov-Khasminskii condition), then, under some further non-degeneracy assumptions, the SDE has a unique invariant measure and converges to it in total variation, see [4], [13], [26]. More general SDEs are treated in [11].
Unfortunately, these methods are not applicable to SFDEs because of their lack of mixing. Note, though, that ergodic properties of affine SFDEs can be treated by comparison with the deterministic case and by studying the fundamental solutions, see [7], [14], [17], [20]. However, this technique does not work for non-affine SFDEs either.
Some sufficient conditions for the existence of an invariant measure for SFDEs are obtained in [12, Theorem 3]. Note, though, that these conditions may be quite hard to verify in practice.
To overcome these difficulties and to derive verifiable sufficient conditions M. Hairer, J. Mattingly and M. Scheutzow suggested a new approach targeted specifically at Markov processes with bad mixing properties [10]. They introduced a new concept of a d-small set, and showed that under certain conditions (much weaker than mixing) a Markov process has a unique invariant measure and converges to it. The price to pay is that this convergence occurs in the Wasserstein metric rather than in total variation. This approach was further developed in [3].
In this paper we apply this general approach to SFDEs. The main obstacle here is to construct a proper Lyapunov function. Due to the memory property it is much more challenging than in the SDE case. Indeed, a solution to an SFDE is an infinite dimensional Markov process with non-locally compact state space and rather involved generator. We develop a new technique inspired by some ideas from [23].
The obtained result can be formulated as follows: one has to check that the drift vector field f(x) points towards the origin only for "typical" x. This extends and generalizes the corresponding theorems for SFDEs in [23], [10], [3]. The obtained conditions and rates are optimal in a certain sense. We explain our result in more detail in Section 2.
Note also that there is an alternative fruitful approach, also suitable for SFDEs, which was suggested and developed in [10], [16]. It is based on the generalized coupling method. Using this approach it is possible to establish uniqueness of an invariant measure and asymptotic stability under some natural conditions. However, this approach does not directly yield results on the existence of an invariant measure or on the convergence rate. Therefore we do not use it here.
The paper is organized as follows. We formulate and discuss our main results in Section 2. Section 3 contains specific applications of our results to different SFDEs as well as some counterexamples. All proofs are placed in Section 4.

Main results
We assume that all random objects are defined on a common probability space (Ω, F, P). Fix r > 0 and positive integers d, m, and let C := C([−r, 0], R^d) be the space of continuous functions endowed with the supremum norm ‖·‖. We study the stochastic functional differential equation
$$dX(t) = f(X_t)\,dt + g(X_t)\,dW(t), \quad t \ge 0, \qquad X_0 = x, \tag{2.1}$$
where f: C → R^d and g: C → R^{d×m} are measurable functions, W is an m-dimensional Brownian motion, the initial condition x ∈ C, and we use the standard notation X_t(s) := X(t + s), s ∈ [−r, 0], t ≥ 0. For a matrix M ∈ R^{d×m} we denote by |||M||| its Frobenius norm, that is, |||M||| := (Σ_{i,j} M_{ij}^2)^{1/2}. For a real a we put a^+ := max(a, 0). We suppose that the drift and diffusion of (2.1) satisfy the following condition.

Assumption A1. The drift f is continuous and bounded on bounded subsets of C. The diffusion g is non-degenerate, that is, for any x ∈ C the matrix g(x) admits a right inverse g^{−1}(x) and sup_{x∈C} |||g^{−1}(x)||| < ∞. Furthermore, f satisfies the one-sided Lipschitz condition and g is Lipschitz. Namely, there exists C > 0 such that for any x, y ∈ C we have
$$\langle f(x)-f(y),\, x(0)-y(0)\rangle \le C\|x-y\|^2, \qquad |||g(x)-g(y)|||^2 \le C\|x-y\|^2.$$

It follows from [21] that under Assumption A1 SFDE (2.1) has a unique strong solution. Moreover, this solution X = (X_t)_{t≥0} is a strong Markov process with state space (C, B(C)), see Proposition 4.1 below. We denote the transition probabilities of X by P_t(x, ·), where t ≥ 0, x ∈ C.
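As an illustration of Assumption A1, consider the hypothetical scalar drift f(x) = −x(0)^3 + x(−r) (so d = 1; this drift is our own example and does not appear in the paper). Assuming the one-sided condition in A1 reads ⟨f(x) − f(y), x(0) − y(0)⟩ ≤ C‖x − y‖², this f satisfies it with C = 1, because (a³ − b³)(a − b) ≥ 0, although f is not globally Lipschitz. The sketch below, with hypothetical helper names, checks both facts on random paths sampled on a grid of [−r, 0].

```python
import random


def drift(path):
    """Hypothetical drift f(x) = -x(0)**3 + x(-r); path[0] = x(-r), path[-1] = x(0)."""
    return -path[-1] ** 3 + path[0]


def sup_dist(u, v):
    """Supremum distance ||u - v|| between two grid paths."""
    return max(abs(a - b) for a, b in zip(u, v))


def one_sided_gap(u, v, C=1.0):
    """C ||u - v||^2 - <f(u) - f(v), u(0) - v(0)>; nonnegative if the one-sided bound holds."""
    return C * sup_dist(u, v) ** 2 - (drift(u) - drift(v)) * (u[-1] - v[-1])


def lipschitz_ratio(u, v):
    """|f(u) - f(v)| / ||u - v||; unbounded for this f, so f is not Lipschitz."""
    return abs(drift(u) - drift(v)) / sup_dist(u, v)


rng = random.Random(1)
worst = min(
    one_sided_gap([rng.uniform(-10, 10) for _ in range(20)],
                  [rng.uniform(-10, 10) for _ in range(20)])
    for _ in range(5_000)
)
print(worst >= -1e-9)
print(lipschitz_ratio([10.0] * 5, [10.1] * 5))
```

The small negative tolerance only absorbs floating-point rounding; mathematically the gap is nonnegative for every pair of paths.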
In this article we study invariant probability measures of X. In what follows, we drop the word "probability" and refer to these measures simply as invariant measures.
It was shown in [10, Theorems 3.1 and 3.7] (see also [16, Section 6.1]) that under A1, X has at most one invariant measure and, if it has one, the transition probabilities converge weakly to this measure. Note, however, that A1 does not guarantee the existence of an invariant measure of X. Indeed, the equation … satisfies A1 but does not have an invariant measure. Also, Assumption A1 alone does not imply any bound on the convergence rate, see [10, Remark 3.4].
We will provide two different sets of conditions for the existence of an invariant measure for SFDE (2.1) and present upper bounds for the rate of convergence to the equilibrium. To formulate our results we need to introduce some notation.
Let (E, B(E)) be a Polish space. Recall that the Wasserstein (or Kantorovich) distance between two probability measures µ, ν on (E, B(E)) is defined as
$$W_d(\mu,\nu) := \inf \mathsf{E}\, d(X, Y),$$
where d is a lower semicontinuous metric on E and the infimum is taken over all random variables X, Y distributed as µ and ν, respectively. If d is the discrete metric, that is, $d(x, y) = \mathbb{1}(x \neq y)$, then the Wasserstein distance coincides with the total variation distance, which is defined by
$$d_{TV}(\mu,\nu) := \inf \mathsf{P}(X \neq Y),$$
where again the infimum is taken over all random variables X, Y distributed as µ and ν, respectively. In this paper we consider only bounded distances d. In this case convergence in total variation implies convergence in the Wasserstein metric; if, in addition, d metrizes the topology of E, the latter is also equivalent to weak convergence (see, e.g., [2]).
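For intuition, both distances can be computed exactly for uniform empirical measures on the real line with a few atoms. In the sketch below (helper names are ours; the bounded metric min(1, |x − y|/ρ) is just an example) we use that, by the Birkhoff-von Neumann theorem, an optimal coupling of two uniform measures with n atoms each can be taken to be a permutation, so brute force over permutations gives W_d exactly for small n. With the discrete metric the result coincides with the maximal-coupling formula d_TV(µ, ν) = 1 − Σ_z min(µ{z}, ν{z}).

```python
from collections import Counter
from itertools import permutations


def wasserstein_uniform(xs, ys, d):
    """W_d between the uniform empirical measures on xs and ys (equal length).

    An optimal coupling can be taken to be a permutation (Birkhoff-von
    Neumann), so this brute force is exact, but only feasible for small n.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    return min(sum(d(x, ys[j]) for x, j in zip(xs, perm)) / n
               for perm in permutations(range(n)))


def total_variation(xs, ys):
    """inf P(X != Y) = 1 - sum_z min(mu{z}, nu{z}) for the same empirical measures."""
    cx, cy = Counter(xs), Counter(ys)
    return 1.0 - sum(min(cx[z], cy[z]) for z in cx) / len(xs)


def discrete_metric(x, y):
    return 0.0 if x == y else 1.0


def bounded_metric(x, y, rho=1.0):
    """An example of a bounded metric on the real line: min(1, |x - y| / rho)."""
    return min(1.0, abs(x - y) / rho)


xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.5]
print(wasserstein_uniform(xs, ys, discrete_metric), total_variation(xs, ys))
print(wasserstein_uniform(xs, ys, bounded_metric))
```

Since min(1, |x − y|/ρ) ≤ 𝟙(x ≠ y), the Wasserstein distance for the bounded metric never exceeds the total variation distance, which is the first implication stated above.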
Throughout the paper, we take the space C as the state space E. For x ∈ C we denote the diameter of the range of x by
$$D(x) := \sup_{s,t\in[-r,0]} |x(s) - x(t)|.$$
As in [10, Section 5], we consider the following family of distances on C:
$$d_\rho(x,y) := 1 \wedge \frac{\|x-y\|}{\rho}, \quad x, y \in C,$$
where ρ > 0. We are now in a position to present our main results. We consider two different groups of conditions which are sufficient for the existence of an invariant measure and for exponential or subexponential convergence to the equilibrium.
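For scalar paths (d = 1) sampled on a grid, the diameter D(x) and the distances d_ρ can be computed as follows. This is a sketch assuming the form d_ρ(x, y) = 1 ∧ ‖x − y‖/ρ for the distances of [10, Section 5]; the function names are ours. On a grid, max − min is only a lower bound for the diameter of the underlying continuous path. Note that d_ρ is nonincreasing in ρ, which is the comparison d_ρ ≤ d_δ for δ ≤ ρ used in Section 4.

```python
def diameter(path):
    """D(x) = sup_{s,t} |x(s) - x(t)| for a scalar path given by grid values.

    For d = 1 the range of x is an interval, so D(x) = max x - min x.
    """
    return max(path) - min(path)


def sup_dist(u, v):
    """||u - v|| = sup_s |u(s) - v(s)| on the common grid."""
    return max(abs(a - b) for a, b in zip(u, v))


def d_rho(u, v, rho):
    """Assumed form of the bounded distance: min(1, ||u - v|| / rho)."""
    return min(1.0, sup_dist(u, v) / rho)


x = [0.0, 2.0, -1.0, 1.0]
y = [0.0, 2.5, -1.0, 0.8]
print(diameter(x))                          # 3.0
print(d_rho(x, y, 1.0), d_rho(x, y, 0.25))  # 0.5 1.0
```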
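Assumption A2 (Exponential convergence). The diffusion g is globally bounded and the drift f is sublinear. The latter means that the growth condition (2.2) holds with some constant β ∈ [0, 1). Furthermore, there exist constants σ, M > 0 and a function κ: R_+ → R_+ with lim_{z→∞} κ(z)z^{−β} = ∞ such that
$$\Big\langle f(x), \frac{x(0)}{|x(0)|}\Big\rangle \le -\sigma \quad \text{for all } x \in C \text{ with } |x(0)| \ge M \text{ and } D(x) \le \kappa(|x(0)|). \tag{2.3}$$

Assumption A3 (Subexponential convergence). The diffusion g and the drift f are globally bounded. Furthermore, there exist α ∈ (0, 1), σ > 0, M > 0 and a function κ: R_+ → R_+ such that the extended Veretennikov-Khasminskii condition (2.4) holds.

We will also present results concerning convergence in the total variation distance. To state them we need an additional assumption on the structure of the drift and the diffusion.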
Assumption A4 (Convergence in total variation). The drift f is globally Lipschitz and the diffusion g depends on x only through x(0).

Theorem 2.1. Suppose that Assumptions A1 and A2 hold. Then SFDE (2.1) has a unique invariant measure π and the transition probabilities P_t(x, ·) converge to it exponentially fast in the Wasserstein metric. That is, for any ρ > 0 there exist C > 0, λ_1 > 0, λ_2 > 0 such that for all x ∈ C we have
$$W_{d_\rho}(P_t(x,\cdot), \pi) \le C e^{\lambda_1 |x(0)| + D(x)} e^{-\lambda_2 t}, \quad t \ge 0. \tag{2.5}$$
Moreover, if Assumption A4 holds in addition, then the convergence in the Wasserstein metric in (2.5) can be replaced by convergence in the total variation metric.
It is interesting to compare this theorem with the corresponding result for SDEs. Recall that in the non-delay case the following condition is sufficient [25] for the existence and uniqueness of an invariant measure and for exponential convergence of the transition probabilities in total variation:
$$\Big\langle f(y), \frac{y}{|y|}\Big\rangle \le -\sigma \quad \text{for all } y \in \mathbb{R}^d \text{ with } |y| \ge M, \tag{2.6}$$
where M > 0, σ > 0. In other words, for |y| large enough the drift f(y) should point towards the origin. Therefore, condition (2.3) is a direct analogue of (2.6) for SFDEs, and we call it the extended Veretennikov-Khasminskii condition.
Note that it is sufficient to check (2.3) only for trajectories x with a not too large diameter. This is quite important, as it makes verifying (2.3) in practice much easier, see Section 3. The intuition is as follows. As one can see from the results of Section 4 below, for large enough n, with high probability D(X_n) is at most of order |X(n)|^β regardless of the initial condition. Thus, it is very unlikely that the trajectory has a much bigger diameter. Even if this happens, one can simply wait until the trajectory has a smaller diameter, and then the drift points towards the origin. Hence one has to check the extended Veretennikov-Khasminskii condition only for "typical" trajectories. Note that the additional assumption lim_{z→∞} κ(z)z^{−β} = ∞ is optimal, see Section 3 for counterexamples.
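To illustrate why it suffices to check (2.3) on the region D(x) ≤ κ(|x(0)|), take d = 1 and the hypothetical delay drift f(x) = h(x(−r)), where h(z) = −|z|^γ sign z for |z| ≥ 1 (as in Example 3.1) and, by our own choice, h(z) = −z for |z| < 1; we do not claim that this is equation (3.1). With κ(z) = z^{(1+γ)/2} and γ = 0.5 (playing the role of β), a typical path with small diameter sees an inward drift, while a path whose delayed value has the opposite sign sees an outward drift but lies outside the region where (2.3) is required. All names and parameter values below are illustrative.

```python
import math

GAMMA = 0.5  # hypothetical exponent; plays the role of beta in the growth bound


def h(z, gamma=GAMMA):
    """-|z|^gamma sign z for |z| >= 1; linear inside (our own extension)."""
    if abs(z) >= 1.0:
        return -math.copysign(abs(z) ** gamma, z)
    return -z


def radial_drift(path):
    """<f(x), x(0)/|x(0)|> for d = 1 and f(x) = h(x(-r)); path[0] = x(-r)."""
    return h(path[0]) * math.copysign(1.0, path[-1])


def kappa(z, gamma=GAMMA):
    return z ** ((1.0 + gamma) / 2.0)


def in_checked_region(path, M=2.0):
    """True if |x(0)| >= M and D(x) <= kappa(|x(0)|), i.e. where (2.3) is required."""
    x0 = abs(path[-1])
    return x0 >= M and max(path) - min(path) <= kappa(x0)


typical = [100.0 + 0.5 * math.sin(k) for k in range(50)]  # small diameter
wild = [-100.0] + [100.0] * 49                             # huge diameter
print(in_checked_region(typical), radial_drift(typical))
print(in_checked_region(wild), radial_drift(wild))
```

The "wild" path has an outward radial drift, so (2.3) could not hold for all x ∈ C; it is excluded only because its diameter exceeds κ(|x(0)|).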
The convergence in the Wasserstein metric in (2.5) cannot be replaced by convergence in total variation without the additional Assumption A4. This is due to the reconstruction property discussed above. If the diffusion does not depend on the past, then the SFDE does not have the reconstruction property and the convergence occurs in total variation.
Let us also mention that one cannot hope to replace (2.3) by something like Indeed, the delayed Ornstein-Uhlenbeck equation (1.2) satisfies this assumption, but it does not have an invariant measure. Let us move on to our second main result that concerns subgeometrical convergence.
Theorem 2.2. Suppose that Assumptions A1 and A3 hold. Then SFDE (2.1) has a unique invariant measure π and the transition probabilities P_t(x, ·) converge to it subexponentially in the Wasserstein metric. That is, for any ρ > 0 there exist … (2.7) Moreover, if Assumption A4 holds in addition, then the convergence in the Wasserstein metric in (2.7) can be replaced by convergence in the total variation metric.
We see that in the subgeometric case it is also enough to check the extended Veretennikov-Khasminskii condition only for trajectories with a not too large diameter. The explanation is the same. It is worth mentioning that, since the drift f pushes towards the origin more weakly than in the exponential case, one has to check (2.4) on a slightly bigger set of x than just the "typical" trajectories.
We would also like to mention that the rate of convergence on the right-hand side of (2.7) matches the corresponding rate for the SDE case, which cannot be improved, see [8, Section 7.1].
The proofs of Theorems 2.1 and 2.2 are postponed to Section 4.
Convention on constants. Throughout the paper, we denote by C a positive constant whose value may change from line to line.

Examples and Counterexamples
In this section we present a number of examples showing how the theoretical results of Section 2 can be used to study convergence of SFDEs. In addition, we provide some counterexamples that show the optimality (in a certain sense) of Assumptions A2 and A3.
We begin with the following example.
where the memory r ≥ 0, h: R → R is a smooth function such that h(z) = −|z|^γ sign z for |z| ≥ 1, γ ∈ (−1, 1), and the diffusion g is bounded, Lipschitz and non-degenerate.

This implies that there exists large enough
, then A2 holds. Therefore, by Theorem 2.1 SFDE (3.1) has a unique invariant measure and converges to it exponentially in the Wasserstein metric. If γ ∈ (−1, 0), then A3 holds. In this case we apply Theorem 2.2. We obtain that (3.1) still has a unique invariant measure but converges to it subexponentially with the rate given in (2.7).
Remark 3.1. In Example 3.1 it was crucial that it suffices to check condition (2.3) or (2.4) only for x ∈ C with a not "too large" diameter. Evidently, these conditions are not satisfied for all x ∈ C. Thus, the exponential/subexponential ergodicity of (3.1) cannot be obtained from [10, Remark 5.2] or [3, Theorem 3.3].
Example 3.2. Using the same method we can study more general equations. Let d, m ∈ N, r ≥ 0. We are interested in ergodic properties of the SFDE … We see that the drift and diffusion of equation (3.3) satisfy Assumption A1, and thus this equation has a unique strong solution.
In order to verify A2 and A3, we again choose κ(z) := z^{(1+γ)/2}. We consider the Jordan decomposition of the measure µ, … where µ^+ and µ^− are two finite nonnegative measures, and note that for any x ∈ C with … By our assumptions, we have c_1 > 0. The verification of A2 and A3 is completed exactly as in Example 3.1. Thus, applying Theorems 2.1 and 2.2, we obtain that X has a unique invariant measure and converges to it exponentially if γ ∈ [0, 1) and subexponentially if γ ∈ (−1, 0). Now we move on and present some counterexamples that demonstrate a certain optimality of the conditions in Theorems 2.1 and 2.2.
First, we consider the case β = 0. The next example shows that in this case it may happen that no invariant measure exists if the drift and diffusion satisfy all the conditions of Theorem 2.1 with the only exception of the condition lim_{z→∞} κ(z) = ∞ … where f is a Lipschitz continuous function which takes values in [−1, A] and satisfies f(x) = A whenever D(x) ≥ N + 1, and ⟨f(x), x(0)⟩ ≤ −|x(0)| whenever D(x) ≤ N and |x(0)| ≥ 1. We claim that A > 0 can be chosen in such a way that for every fixed initial condition we have
$$\lim_{t\to+\infty} X(t) = +\infty \quad \text{a.s.} \tag{3.4}$$
This would imply, in particular, that X does not have an invariant measure.
To verify the claim we fix the initial condition x ∈ C and introduce an auxiliary sequence … Since the drift f is bounded from below by −1, we derive for n ∈ Z_+, n ≥ 1 … where we also used the fact that the memory r = 2, and hence X(n) − X(n − 1) ≥ N + 1 implies that D(X_t) ≥ N + 1 for all t ∈ [n, n + 1]. Recall that by definition … By the strong law of large numbers, … where ξ denotes a standard Gaussian random variable. Hence, by taking A large enough, we get Y(n)/n → 1 a.s. as n → ∞. Since X(n) ≥ Y(n), this yields (3.4); thus the claim is proved and the process X does not have an invariant measure.
Next we consider the case β ∈ (0, 1). We show that the condition lim z→∞ κ(z)z −β = ∞ in A2 cannot be replaced with the condition lim inf z→∞ κ(z)z −β N. Since we need to construct an example with unbounded κ, the proof here will be different from the proof in Example 3.3.
Consider the following stochastic delay equation for d = m = 1, r = 2: … N x(0)^β. Similarly to Example 3.3, one can easily extend f in such a way that Assumptions A1 and A2 hold with the only exception that condition (2.3) is satisfied for all x ∈ C with D(x) ≤ (N − 1)|x(0)|^β and |x(0)| ≥ 1. Let us prove that SFDE (3.6) does not have an invariant measure.
Proof. We consider two different cases. If x(0) < 0, then … Now we go back to our equation (3.6). Define the "bad" set … Let us prove that if the process X starts from any initial condition in G, then it tends to infinity with positive probability. Put τ := inf{t ≥ 0 : X(t) = 1} and W_* := inf_{s∈[0,1]} W(s). Note that if X_0 ∈ G, then, thanks to Lemma 3.2, we have D(X_s) ≥ N|X(s)|^β for any s ∈ [0, 1]. Hence, it follows from the definition of f that … for any x ∈ G. This and (3.6) imply that if X_0 ∈ G, then for any s ∈ [0, 1] we have … Therefore, on the set {W_* ≥ −X(0) + 1} we have τ ≥ 1. We employ this observation together with (3.7) to deduce for any x ∈ G … where in the fourth transition we used the fact that x(0) ≥ 1 and hence … We apply the Markov property of X to get for any x ∈ G … where we defined recursively y_0 := x(0) and y_n := y_{n−1} + 2N y_{n−1}^β. Since β > 0 and y_n ≥ x(0) + n, we see that there exists a large enough Z_0 ≥ 0 such that the right-hand side of (3.8) is positive whenever x(0) ≥ Z_0. Thus for any x ∈ G′ := G ∩ {x(0) ≥ Z_0} we have P_x(lim_{n→∞} X(n) = +∞) > 0. This implies by [23, Theorem 3a and 3c] that X does not have an invariant measure.
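The recursion y_0 := x(0), y_n := y_{n−1} + 2N y_{n−1}^β from the proof above is easy to explore numerically. The sketch assumes, for the check, that x(0) ≥ 1 and 2N ≥ 1: then every increment is at least 2N ≥ 1, which gives y_n ≥ x(0) + n, and for β ∈ (0, 1) the sequence grows roughly like ((1 − β)2Nn)^{1/(1−β)}, in line with the ODE y′ = 2N y^β. The helper names y_sequence and ode_growth are hypothetical.

```python
def y_sequence(y0, N, beta, n):
    """Return [y_0, ..., y_n] with y_k = y_{k-1} + 2 N y_{k-1}**beta."""
    ys = [y0]
    for _ in range(n):
        ys.append(ys[-1] + 2.0 * N * ys[-1] ** beta)
    return ys


def ode_growth(N, beta, n):
    """Leading-order growth ((1 - beta) 2 N n)**(1 / (1 - beta)) from y' = 2 N y**beta."""
    return ((1.0 - beta) * 2.0 * N * n) ** (1.0 / (1.0 - beta))


ys = y_sequence(1.0, 1.0, 0.5, 200)
print(all(ys[k] >= 1.0 + k for k in range(len(ys))))  # y_n >= x(0) + n
print(ys[-1] / ode_growth(1.0, 0.5, 200))              # close to 1
```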
Example 3.5. Finally, let us mention that the condition that the diffusion g depends on x only through x(0) in Assumption A4 also cannot be dropped. Indeed, consider again SFDE (3.1) with γ = 0, g(x) = g(x(−r)), r > 0, x ∈ C and g : R → R + is a bounded increasing and strictly positive function. This equation satisfies Assumptions A1 and A2 and its drift is Lipschitz. Nevertheless, as shown in [24] this equation converges to its invariant measure only weakly and not in total variation. Hence without this additional assumption, one cannot replace convergence in the Wasserstein metric in (2.5) and (2.7) by the convergence in total variation.

Proofs of the Theorems 2.1 and 2.2
Until the end of this section, without loss of generality and to simplify the notation, we assume that the memory r = 1. In Section 4.1 we establish general lemmas that are used in the proofs of our main results. In Sections 4.2 and 4.3 we prove Theorems 2.1 and 2.2, respectively.

General tools
First let us verify that the strong solution to SFDE (2.1) indeed has the Markov property. While this statement is well known for Lipschitz drift and diffusion, we were not able to find in the literature a proof of the Markov property of SFDEs with a one-sided Lipschitz drift. Thus we provide it here for the sake of completeness.
Since the function f is arbitrary, (4.1) would imply the Markov property for X. … is a G_{t,s}/B(C)-measurable function. Introduce now a function Φ: C × Ω → C, (x, ω) ↦ X_t^{(s,x)}(ω). By the above, for any fixed x ∈ C the function Φ(x, ·) is G_{t,s}/B(C)-measurable. By [10, Proposition 5.4], there exists … Therefore Φ(x, ·) is continuous in probability with respect to x. Since the space C is Polish, [… where we also used the fact that X_s is F_s-measurable and the σ-algebras F_s and G_{t,s} are independent. Now let us prove (4.1). It follows from the measurability properties of Φ established above that f(Φ(X_s, ·)) is σ(X_s, G_{t,s})-measurable. Using again the independence of F_s and G_{t,s} and a standard approximation argument (see, e.g., [19, Theorem 7 … Similarly, E(f(X_t) | X_s) = E(f(Φ(X_s, ·)) | X_s) = E f(Φ(x, ·))|_{x=X_s}, and therefore identity (4.1) holds.
To establish the strong Markov property we employ again bound (4.3). This inequality and the Portmanteau theorem imply that the process X is Feller. Since it has also continuous trajectories, it is strongly Markov [22, Theorem 3.3.1].
As mentioned above, our approach for establishing ergodicity is based on Lyapunov functions. The propositions below state that if one is able to construct a "good" Lyapunov function, then SFDE (2.1) possesses all the required ergodic properties. These propositions essentially follow from the corresponding results in [3] and [10].
Recall that by P_t we denote the Markov semigroup associated with the strong solution to (2.1).
Let us check that this chain satisfies all the conditions of [3, Theorem 2.1]. By iterating (4.4) n_0 times, we see that … Therefore the first condition of [3, Theorem 2.1] holds. As explained above, the space (C, d_δ) is a complete separable metric space; therefore the second condition is also met. It follows from estimates (4.6), (4.7), and our assumption lim_{‖x‖→∞} V(x) = +∞ that the third and fourth conditions of [3, Theorem 2.1] are also satisfied.
Thus, all conditions of [3, Theorem 2.1] are met. Hence the skeleton chain has a unique invariant measure π, and there exist constants C_1 > 0, C_2 > 0 such that … where we also used the fact that d_ρ ≤ d_δ. Now by a standard argument (see, e.g., [3, p. 550]), we see that the measure π is also the unique invariant measure for the original Markov semigroup P_t and that bound (4.5) holds. Proof. We begin by observing that, thanks to the additional Assumption A4, the Markov semigroup P_t satisfies the Harnack inequality. Namely, it follows from [27, Theorem 4.1] (see also [5, Theorem 1.1]) that for any t > 1 and p > p_0 large enough, there exists C = C(p) such that … Therefore for any x, y ∈ C, A ∈ B(C) we have … Thus, we have the following bound on the total variation distance.
Now, similarly to the proof of Proposition 4.2, we fix t = 2 and consider the skeleton Markov chain with the transition kernel P(x, A) := P_2(x, A), x ∈ C, A ∈ B(C).
It follows from the above that P satisfies all the assumptions of [… Therefore, if t = 2n + s, where n ∈ Z_+ and s ∈ [0, 2], then for any x ∈ C we derive
$$d_{TV}(P_t(x,\cdot), \pi) = d_{TV}(P_{2n+s}(x,\cdot), P_s\pi) \le d_{TV}(P_{2n}(x,\cdot), \pi) \le \frac{C_1(1+V(x))}{\big(\Psi(H_\Psi^{-1}(C_2 t))\big)^{1-\varepsilon}},$$
where we made use of the invariance of π (so that π = P_sπ) and the nonexpanding property of the total variation metric. This completes the proof of the proposition.
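The last step uses that π = P_sπ and that a Markov kernel cannot increase the total variation distance. On a finite state space this can be checked directly. The sketch below is our own illustration: a randomly generated 4-state kernel plays the role of P_s, the helper names are hypothetical, and we use d_TV(µ, ν) = ½ Σ_i |µ_i − ν_i|, which equals inf P(X ≠ Y) for measures on a finite set.

```python
import random


def tv(mu, nu):
    """Total variation distance 0.5 * sum_i |mu_i - nu_i| on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def step(mu, P):
    """One step of the chain: (mu P)_j = sum_i mu_i P[i][j]."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(P[0]))]


def random_distribution(rng, n):
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]


rng = random.Random(42)
P = [random_distribution(rng, 4) for _ in range(4)]  # row-stochastic kernel

# Nonexpansion: tv(mu P, nu P) - tv(mu, nu) should never be positive.
worst = max(
    tv(step(mu, P), step(nu, P)) - tv(mu, nu)
    for mu, nu in ((random_distribution(rng, 4), random_distribution(rng, 4))
                   for _ in range(1000))
)

# Invariant pi by power iteration; then tv(delta_0 P^n, pi) is nonincreasing in n.
pi = [0.25] * 4
for _ in range(500):
    pi = step(pi, P)
mu, dists = [1.0, 0.0, 0.0, 0.0], []
for _ in range(10):
    dists.append(tv(mu, pi))
    mu = step(mu, P)
print(worst <= 1e-12, dists)
```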
The following two lemmas describe the behaviour of D(X_t). They provide key estimates that will be used in the sequel. Lemma 4.4. Suppose that Assumption A1 holds. Assume that the drift f satisfies the growth condition (2.2) with β ∈ [0, 1) and that the diffusion g is globally bounded. Then there exists a constant C > 0 such that for any λ ≥ 0 we have … (4.8) … Finally, there exist constants C_1, C_2 > 0 such that for any z > 0 we have … Proof. We begin by observing that for any x ∈ C … where we denoted M(t) := ∫_0^t g(X_s) dW(s). We make use of the growth condition (2.2) and the estimate D(X_s) ≤ D(x) + D(X_1), valid for all s ∈ [0, 1], to derive … where we also used the fact that β < 1, and hence for some C_β > 0 one has Cz^β ≤ z/4 + C_β for all z ≥ 0. Substituting (4.12) into (4.11), we get … To estimate the exponential moments of sup_{0≤t≤1} |M(t)| we use the Dambis-Dubins-Schwarz theorem and the global boundedness of g. It follows that … where B_1, B_2, ..., B_d are (possibly dependent) one-dimensional Brownian motions and … Therefore, applying the Cauchy-Schwarz inequality, we get for any λ ≥ 0, x ∈ C
E_x exp{λ sup_{0≤t≤1} |M(t)|} ≤ …,
where by B we denoted a standard Brownian motion. This together with (4.13) implies (4.8).
Arguing as above, we see that there exist constants C > 0, λ_0 > 0 such that for any … This together with (4.13) implies (4.9). Estimate (4.10) follows directly from (4.9) and the Chebyshev inequality.
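The elementary inequality Cz^β ≤ z/4 + C_β used in the proof of Lemma 4.4 (and its variant γz^β ≤ z + C_γ in Section 4.2) holds with an explicit constant. For β ∈ (0, 1) the concave function z ↦ Cz^β − az (slope a > 0) is maximised at z* = (Cβ/a)^{1/(1−β)}, which gives the optimal constant a z*(1 − β)/β; for β = 0 one can take the constant C. The sketch below (the helper name c_beta is ours) computes this constant and checks it on a grid.

```python
def c_beta(C, beta, slope=0.25):
    """Smallest c with C * z**beta <= slope * z + c for all z >= 0, 0 <= beta < 1.

    For beta in (0, 1), z -> C z**beta - slope z is concave with maximiser
    z* = (C beta / slope)**(1 / (1 - beta)), so c = slope * z* (1 - beta) / beta.
    """
    if beta == 0:
        return C
    z_star = (C * beta / slope) ** (1.0 / (1.0 - beta))
    return slope * z_star * (1.0 - beta) / beta


grid = [k * 0.01 for k in range(100_000)]  # z in [0, 1000)
c = c_beta(3.0, 0.5)
print(c, max(3.0 * z ** 0.5 - 0.25 * z for z in grid) <= c + 1e-9)
```

With slope = 1.0 the same function gives a valid C_γ for γz^β ≤ z + C_γ; for instance 2√z ≤ z + 1.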

Proof of Theorem 2.1
To prove Theorem 2.1 we use the following Lyapunov function: … where the parameters λ > 0, γ > 0 are to be chosen later. To avoid technicalities we assume that the function κ from Assumption A2 is increasing and concave. Clearly, this is not a restriction at all: if Assumption A2 is satisfied, then there exists an increasing concave function κ̃ such that A2 is also satisfied with κ̃ in place of κ. It follows that there exists a constant C_κ ≥ 1 such that κ(t) ≤ C_κ(t + 1) for any t ≥ 0.
First, let us prove that V is a Lyapunov function on the set where D(x) is relatively big compared with |x(0)|. The heuristics here is as follows. As we explained in the introduction, it is not typical for the process X to have a large diameter. Thus, D(X t ) will decrease with high probability. This will also cause the decrease of the Lyapunov function V . Formally we have the following lemma.
The case when the initial diameter is "small" is much more complicated, and more precise estimates are needed. In this case D(X_t) stays at the same level, and the Lyapunov function V decreases due to the decrease of |X(t)|. To formalize these ideas we will use the following version of the Gronwall inequality. … Proof. Consider the function g(t) := e^{−θt}(f(0) − r/θ) + r/θ, 0 ≤ t ≤ T. Clearly, g(0) = f(0) and for any 0 ≤ s ≤ u ≤ T we have … Hence by [11, Proposition 9.2], we have f(t) ≤ g(t) for t ∈ [0, T]. … (4.21) We start treating this case with the following key lemma. … We want to apply a version of Gronwall's lemma to the function u ↦ E_x ϕ(X(u)).
First of all, we observe that this function is finite. Indeed, thanks to (4.8), we have for any x ∈ C, 0 ≤ u ≤ 1
$$E_x\varphi(X(u)) \le C + E_x e^{|X(u)|} \le C + e^{|x(0)|}\, E_x e^{D(X_1)} < \infty.$$
We make use of Assumption A2 and apply Itô's lemma. We have for |y| ≥ M
$$\frac{\partial\varphi(y)}{\partial y_i} = \lambda e^{\lambda|y|}|y|^{-1}y_i; \qquad \frac{\partial^2\varphi(y)}{\partial y_i\,\partial y_j} = \lambda^2 e^{\lambda|y|}|y|^{-2}y_iy_j - \lambda e^{\lambda|y|}|y|^{-3}y_iy_j, \quad i \ne j;$$
$$\frac{\partial^2\varphi(y)}{\partial y_i^2} = \lambda^2 e^{\lambda|y|}|y|^{-2}y_i^2 + \lambda e^{\lambda|y|}\big(|y|^{-1} - |y|^{-3}y_i^2\big).$$
Thus, we derive for any x ∈ C, 0 ≤ s ≤ u ≤ 1 … where we denoted … We use the boundedness of g, the definition of ϕ, and estimate (4.8) to derive … Thus (M(t))_{t≥0} is a martingale and … To estimate I_1 we assume that |x(0)| ≥ M_0: … We continue this calculation in the following way. Recall that κ(z)/z^β → +∞ as z → +∞. Therefore for any B > 0 there exists N = N(B) > 0 such that … where C_κ was defined at the beginning of Section 4.2. Hence Lemma 4.5 implies that there exist constants C_1, C_2 such that for any B > 0 there exists N = N(B) > 0 such that for any t ∈ [0, 1] … By Lemma 4.4, for any x ∈ C, t ∈ [0, 1] we have … Using again the fact that κ(z)/z^β → +∞ and combining the above estimate with (4.24) and (4.25), we see that there exist constants C_3 > 0, C_4 > 0 such that for any B > 0 there exists N_1 = N_1(B) > 0 such that … Next, we estimate the integrand in I_2 by applying Hölder's inequality to its three factors. The first and third factors are estimated as above. Further, for any x ∈ C, t ∈ [0, 1] … where in the last inequality we used Lemma 4.4. Thus, there exist constants C_5 > 0, C_6 > 0 such that for any B > 0 there exists N_2 = N_2(B) > 0 such that … For M̃ > M, x ∈ C we estimate the integrand in I_4 as follows. … (4.28) Combining (4.23), (4.26), (4.27) and (4.28) with (4.22), we see that there exist constants C_7 > 0, C_8 > 0 such that for any B > 0 there exists … Now we choose M large enough and λ > 0 small enough so that … Clearly, θ is independent of B and N_3. Recall also that we have checked at the beginning of the proof that E_x ϕ(X(u)) < ∞ for any u ∈ [0, 1]. Thus, by Lemma 4.7 (a version of the Gronwall inequality) we get … Lemma 4.9. Suppose that Assumptions A1 and A2 hold.
Then there exist λ > 0, γ > 0, c_1 ∈ (0, 1), c_2 > 0 such that the function V defined in (4.15) satisfies the following inequality: … Proof. First we note that for any λ > 0, γ > 0, x ∈ C we have … We take ν > 0 and ρ ∈ (0, 1) as in Lemma 4.8 and put λ := ν/2. Then, thanks to … Thus, it remains to show that on C_{B,N} the second factor on the right-hand side of (4.30) is smaller than 1 + ρ/2. Without loss of generality we assume that N(B) is large enough so that BN^β ≤ N (otherwise we can take a larger N(B)). Using the inequality |X(1)| ≥ |x(0)| − D(X_1), we deduce for any x ∈ C_{B,N}
E_x e^{2(D(X_1) − γ|X(1)|^β)} … + 1 + E_x e^{2(D(X_1) − γ|X(1)|^β)} …
where we have also used the fact that for some C_γ > 0 we have γz^β ≤ z + C_γ for all z ≥ 0. We continue this estimate using Lemma 4.4. Recall that on C_{B,N} we also have D(x) ≤ |x(0)|, thanks to our additional assumption on N(B). Therefore we derive for any … and the constant C depends neither on γ nor on B. Thus, taking γ = γ(C) large enough and combining (4.30), (4.31) and (4.32), we see that for any B > 0 there exists a constant N_1(B) such that on C_{B,N_1} we have … Now, with such λ and γ in hand, we apply Lemma 4.6 with ε = 1 − ρ/4. We get that there exist B = B(λ, γ) > 0, L = L(λ, γ) > 0 such that … Together with (4.33) this bound implies that for some N_2 = N_2(λ, γ), … Finally, if |x(0)| ≤ N_2 and D(x) ≤ L_1, then by (4.17) … This completes the proof of the lemma.
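Two computational ingredients of this section can be sanity-checked numerically. First, the comparison function g(t) = e^{−θt}(f(0) − r/θ) + r/θ from the proof of Lemma 4.7 solves g′ = r − θg; assuming Lemma 4.7 is the usual comparison statement (a function with f(u) − f(s) ≤ ∫_s^u (r − θf(v)) dv stays below g), a function built to satisfy this inequality should lie below g. Second, the derivatives of ϕ(y) = e^{λ|y|} (for |y| ≥ M) used in the proof of Lemma 4.8 can be compared with central finite differences, including the diagonal entries. The parameter values, the slack function and all helper names below are arbitrary choices of ours.

```python
import math


# --- Comparison (Gronwall-type) bound ------------------------------------
def comparison_bound(f0, theta, r, t):
    """g(t) = exp(-theta t) (f0 - r/theta) + r/theta, solving g' = r - theta g."""
    return math.exp(-theta * t) * (f0 - r / theta) + r / theta


def slack(t):
    """Nonnegative defect (arbitrary choice) making f a strict subsolution."""
    return 0.5 * (1.0 + math.sin(7.0 * t))


def subsolution(f0, theta, r, T=1.0, n=10_000):
    """Euler scheme for f' = r - theta f - slack(t), so f(u) - f(s) <= int_s^u (r - theta f)."""
    dt = T / n
    f, out = f0, [(0.0, f0)]
    for k in range(n):
        f += (r - theta * f - slack(k * dt)) * dt
        out.append(((k + 1) * dt, f))
    return out


# --- Derivatives of phi(y) = exp(lam |y|) --------------------------------
def phi(y, lam):
    return math.exp(lam * math.sqrt(sum(c * c for c in y)))


def hessian_formula(y, lam):
    """Closed-form Hessian of phi for y != 0 (off-diagonal and diagonal entries)."""
    r = math.sqrt(sum(c * c for c in y))
    e = math.exp(lam * r)
    H = [[lam ** 2 * e * yi * yj / r ** 2 - lam * e * yi * yj / r ** 3 for yj in y]
         for yi in y]
    for i in range(len(y)):
        H[i][i] += lam * e / r  # extra diagonal term from d/dy_i (y_i / |y|)
    return H


def hessian_fd(y, lam, h=1e-4):
    """Central finite-difference Hessian of phi."""
    d = len(y)

    def shifted(i, j, si, sj):
        z = list(y)
        z[i] += si * h
        z[j] += sj * h
        return phi(z, lam)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1) - shifted(i, j, -1, 1)
              + shifted(i, j, -1, -1)) / (4 * h * h) for j in range(d)]
            for i in range(d)]


path = subsolution(3.0, 2.0, 1.0)
print(all(f <= comparison_bound(3.0, 2.0, 1.0, t) for t, f in path))
y, lam = [1.0, -2.0, 0.5], 0.3
print(max(abs(a - b) for ra, rb in zip(hessian_formula(y, lam), hessian_fd(y, lam))
          for a, b in zip(ra, rb)))
```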

Proof of Theorem 2.2
Now we move on to the subgeometric case. Until the end of this section we fix the constants α, σ, M and the function κ from Assumption A3. As above, without loss of generality we assume that κ is an increasing concave function. We work with the Lyapunov function … where λ_1 > 0, λ_2 > 0, and ψ: R_+ → R_+ is an increasing continuous concave function. We will specify λ_1, λ_2 and the function ψ later. As before, we consider two cases: the "small" diameter case (where we gain from the decrease of the first factor in the Lyapunov function) and the "large" diameter case (where we gain from the decrease of the second factor). We start with the second case. Recall the definition of λ_0 from Lemma 4.4.
By the above, if D(x) > (R + ψ(|x(0)|))^{1/2}, then E_x V(X_1) ≤ V(x)/2. Now we move on to the "small" diameter case. Recall the definition of the constant C_κ from the beginning of Section 4.2.
This together with (4.46) proves the lemma.
Proof of Theorem 2.2. Let V be the Lyapunov function defined in (4.34) with the parameters specified in Lemma 4.12. Let Ψ: R_+ → R_+ be a differentiable concave increasing function such that Ψ(z) = z(log z)^{(2α−2)/α} for large enough z (more precisely, for z ≥ N, where N > 0 is the same as in Lemma 4.12). It follows from (4.42) that for x ∈ C with |x(0)| ≥ N
$$E_xV(X_1) \le V(x) - C_1V(x)(\log V(x))^{(2\alpha-2)/\alpha} + C_2.$$
This together with (4.36) implies that for any x ∈ C … Thus, condition (4.4) holds with the function Ψ defined above. Therefore Theorem 2.2 follows from Propositions 4.2 and 4.3.
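To see what kind of decay the function Ψ of the proof of Theorem 2.2 produces, one can compute the quantities entering the subgeometric bounds explicitly. The sketch below assumes (i) that Ψ(z) = z(log z)^{(2α−2)/α} for all z > 1, although in the proof this form is only imposed for z ≥ N, and (ii) the normalisation H_Ψ(u) := ∫_1^u dz/Ψ(z), which is our assumption since H_Ψ is not defined in this section. Under these assumptions the substitution s = log z gives H_Ψ(u) = (α/(2 − α))(log u)^{(2−α)/α}, hence H_Ψ^{−1}(t) = exp(((2 − α)t/α)^{α/(2−α)}) and 1/Ψ(H_Ψ^{−1}(t)) = s^{(2−2α)/α}e^{−s} with s = ((2 − α)t/α)^{α/(2−α)}, a stretched-exponential decay. The code (helper names ours) compares the closed form of H_Ψ with a numerical integral.

```python
import math


def psi(z, alpha):
    """Psi(z) = z (log z)^((2 alpha - 2) / alpha) for z > 1."""
    return z * math.log(z) ** ((2.0 * alpha - 2.0) / alpha)


def H(u, alpha):
    """Closed form of H_Psi(u) = int_1^u dz / Psi(z) (assumed normalisation)."""
    return alpha / (2.0 - alpha) * math.log(u) ** ((2.0 - alpha) / alpha)


def H_numeric(u, alpha, n=100_000):
    """Midpoint rule after substituting s = log z: integrand s^((2 - 2 alpha) / alpha)."""
    L = math.log(u)
    ds = L / n
    p = (2.0 - 2.0 * alpha) / alpha
    return ds * sum(((k + 0.5) * ds) ** p for k in range(n))


def H_inverse(t, alpha):
    return math.exp(((2.0 - alpha) / alpha * t) ** (alpha / (2.0 - alpha)))


def rate(t, alpha):
    """1 / Psi(H_Psi^{-1}(t)): the decay factor, up to constants and the exponent 1 - eps."""
    return 1.0 / psi(H_inverse(t, alpha), alpha)


alpha = 0.5
print(H(1e6, alpha), H_numeric(1e6, alpha))
print([rate(t, alpha) for t in (10.0, 100.0, 1000.0)])
```

For α = 0.5 this gives s = (3t)^{1/3}, that is, a decay of order exp(−(3t)^{1/3}) up to a polynomial factor.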