Quantitative contraction rates for Markov chains on general state spaces

We investigate the problem of quantifying contraction coefficients of Markov transition kernels in Kantorovich ($L^1$ Wasserstein) distances. For diffusion processes, relatively precise quantitative bounds on contraction rates have recently been derived by combining appropriate couplings with carefully designed Kantorovich distances. In this paper, we partially carry over this approach from diffusions to Markov chains. We derive quantitative lower bounds on contraction rates for Markov chains on general state spaces that are powerful if the dynamics is dominated by small local moves. For Markov chains on $\mathbb{R}^d$ with isotropic transition kernels, the general bounds can be used efficiently together with a coupling that combines maximal and reflection coupling. The results are applied to Euler discretizations of stochastic differential equations with non-globally contractive drifts, and to the Metropolis adjusted Langevin algorithm for sampling from a class of probability measures on high dimensional state spaces that are not globally log-concave.


Introduction
In recent years, convergence bounds for Markov processes in Kantorovich ($L^1$ Wasserstein) distances have emerged as a powerful alternative to more traditional approaches based on the total variation distance [36], spectral gaps and $L^2$ bounds [27,9,10], or entropy estimates [27,9,1]. In particular, Hairer, Mattingly and Scheutzow have developed an analogue of Harris' Theorem assuming only local strict contractivity in a Kantorovich distance on the "small" set and a Lyapunov condition combined with non-strict contractivity outside, cf. [22,24]. Meanwhile, there have been numerous extensions and applications of their result [25,6,3,11].
In [29], Joulin and Ollivier have shown that strict Kantorovich contractivity of the transition kernel implies bounds for the variance and concentration estimates for ergodic averages of a Markov chain. Their results have since been extended to cover more general frameworks by Paulin [38]. More recently, Pillai and Smith [39] as well as Rudolf and Schweizer [40] have developed a perturbation theory for Markov chains that are contractive in a Kantorovich distance, cf. also Huggins and Zou [26] as well as Johndrow and Mattingly [28] for related results. These works show that variants of the results in [29] carry over to perturbations of the original chain, thus paving the way for a much broader range of applications.
All the works mentioned above assume that, at least locally, strict contractivity holds w.r.t. an $L^1$ Wasserstein distance based on some underlying distance function on the state space of the Markov chain. The contraction rate is the key quantity in the resulting bounds, and it is hence important to develop applicable methods for quantifying contraction rates.
Contractivity with respect to the $L^1$ Wasserstein distance based on the Euclidean distance in $\mathbb{R}^d$ is sometimes interpreted as non-negative Ricci curvature of the Markov chain w.r.t. this metric [41,29,37]. This is a strong condition that is often not satisfied in applications. However, in many cases it is still possible to obtain contractivity with respect to a Kantorovich distance in which the underlying distance function has been modified accordingly. This allows for applying the results from [29] to a significantly broader class of examples. For diffusion processes, a corresponding approach to quantitative contraction rates in appropriately designed metrics has been developed systematically in recent years in a series of papers [15,17,44,19,20], see also [4,5,42] for previous results. The approach has been extended to Lévy driven SDEs in [33,32], see also [31,43].
Below we propose a corresponding approach for Markov chains on general metric state spaces. The approach is powerful in situations where the dynamics is dominated by small, local moves. This will be demonstrated below for Euler schemes for non-globally contractive stochastic differential equations, as well as for the Metropolis-adjusted Langevin Algorithm (MALA). In these cases, the Ricci curvature condition required in [29] is not satisfied in the standard L 1 Wasserstein distance and hence the construction of an alternative metric is required. For dynamics dominated by large or global moves, our approach does not apply in the form presented here. Sometimes, related approaches can be used nevertheless, see e.g. [2] for the construction of a contractive distance for Hamiltonian Monte Carlo.

Main results
Let p(x, dy) be a Markov transition kernel on a separable metric space (S, d).
For probability measures $\mu$ and $\nu$ on $S$, the Kantorovich distance ($L^1$ Wasserstein distance) $W_\rho(\mu, \nu)$ based on the underlying distance function $\rho$ is defined as
$$W_\rho(\mu, \nu) = \inf \mathbb{E}[\rho(X, Y)].$$
Here the infimum is over all couplings of $\mu$ and $\nu$, i.e., over all random variables $X, Y$ defined on a common probability space $(\Omega, \mathcal{A}, P)$ such that $P \circ X^{-1} = \mu$ and $P \circ Y^{-1} = \nu$.
For $f_0 \equiv 0$, $a = 1$ and $V \equiv 0$, $W_\rho$ coincides with the total variation distance $d_{TV}(\mu, \nu)$ (or with $d_{TV}(\mu, \nu)/2$, depending on the convention used in the definition of the total variation distance), whereas for $f_0(r) = r$, $a = 0$ and $V \equiv 0$, $W_\rho$ is the standard $L^1$ Wasserstein distance $W_d$ on $(S, d)$. The distance functions we consider are in between these two extremes. Notice, however, that if $a > 0$ then
(2.3) $d_{TV}(\mu, \nu) \le a^{-1} W_\rho(\mu, \nu)$,
and if $f(r) \ge br$ for some constant $b > 0$ then $W_d(\mu, \nu) \le b^{-1} W_\rho(\mu, \nu)$. Therefore, in these cases, contraction properties w.r.t. $W_\rho$ directly imply upper bounds for the total variation distance and for the $L^1$ Wasserstein distance w.r.t. the metric $d$.
We now assume that we are given a Markovian coupling of the transition probabilities $p(x, \cdot)$ ($x \in S$) in the form of measurable maps $X', Y' : \Omega \to S$, defined on a measurable space $(\Omega, \mathcal{A})$, and a probability kernel $(x, y, A) \mapsto P_{x,y}(A)$, defined for $x, y \in S$ and $A \in \mathcal{A}$, such that for any $x, y \in S$,
(2.5) $X' \sim p(x, \cdot)$ and $Y' \sim p(y, \cdot)$ under $P_{x,y}$.
For probability measures $\mu$ on $S$ and $\gamma$ on $S \times S$, let $(\mu p)(B) = \int \mu(dx)\, p(x, B)$ for $B \in \mathcal{B}(S)$, and $P_\gamma(A) = \int \gamma(dx\, dy)\, P_{x,y}(A)$ for $A \in \mathcal{A}$. Note that if $\gamma$ is a coupling of two probability measures $\mu$ and $\nu$ on $S$, then under $P_\gamma$ the joint law of $(X', Y')$ is a coupling of the probability measures $\mu p$ and $\nu p$. Our goal is to derive explicit bounds of the form
(2.7) $E_{x,y}[\rho(X', Y')] \le (1 - c)\, \rho(x, y)$ for any $x, y \in S$,
where $c$ is a strictly positive constant. Here the choice of the metric $\rho$ is adapted in order to maximize the value of $c$ in our bounds. If (2.7) holds, then the transition kernel $p$ is a strict contraction w.r.t. the distance $W_\rho$.
Proof. Let $\mu$ and $\nu$ be probability measures on $S$ and suppose that $\gamma$ is a coupling of $\mu$ and $\nu$. Then, under $P_\gamma$, the joint law of $(X', Y')$ is a coupling of $\mu p$ and $\nu p$. Therefore, by (2.7),
$$W_\rho(\mu p, \nu p) \le E_\gamma[\rho(X', Y')] = \int E_{x,y}[\rho(X', Y')]\, \gamma(dx\, dy) \le (1 - c) \int \rho\, d\gamma.$$
The assertion follows by taking the infimum over all couplings of $\mu$ and $\nu$.
In the terminology of Joulin and Ollivier [29], (2.8) says that the Markov chain has a Ricci curvature lower bound c on the metric space (S, ρ). By general results, such a bound has many important consequences including quantitative convergence to a unique equilibrium [18], upper bounds on biases and variances as well as concentration inequalities for ergodic averages [29,38], a central limit theorem for ergodic averages [30], robustness under perturbations [39,40,26,28], etc. However, in applications, it is usually not clear how to choose a distance function ρ such that we have good bounds for c. This is the problem addressed in this paper for the case of a "local dynamics" where the Markov chain is mainly making "small" moves. Depending on whether or not the probability measures p(x, ·) and p(y, ·) have a significant overlap for x close to y, we suggest two different approaches.
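A minimal numerical illustration of such a Ricci curvature lower bound (our own toy example, not taken from the paper): for the Gaussian autoregressive kernel $p(x, \cdot) = N((1 - \eta)x, \sigma^2)$ on $\mathbb{R}$, the synchronous coupling that feeds the same noise to both chains contracts distances by exactly the factor $1 - \eta$, i.e., (2.7) holds with $c = \eta$ for the Euclidean distance.

```python
import numpy as np

# Toy example (not from the paper): for the autoregressive kernel
# p(x, .) = N((1 - eta) * x, sigma^2), the synchronous coupling
# X' = (1 - eta) x + Z, Y' = (1 - eta) y + Z (same noise Z) gives
# |X' - Y'| = (1 - eta) |x - y| exactly, i.e. a Ricci curvature
# lower bound c = eta w.r.t. the Euclidean distance.

rng = np.random.default_rng(0)
eta, sigma = 0.25, 1.0

def coupled_step(x, y):
    z = sigma * rng.normal()          # shared noise increment
    return (1 - eta) * x + z, (1 - eta) * y + z

x, y = 5.0, -3.0
r0 = abs(x - y)
x1, y1 = coupled_step(x, y)
# contraction by exactly the factor (1 - eta) in a single step
assert abs(abs(x1 - y1) - (1 - eta) * r0) < 1e-12
```

For non-globally contractive dynamics, no such exact contraction holds in the Euclidean distance, which is precisely why the modified metrics below are needed.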
2.1. Contractivity with positive coupling probability. Our first two general results apply in situations where the probability measures p(x, ·) and p(y, ·) have a significant overlap if x and y are sufficiently close. In this case we can always consider a coupling ((X ′ , Y ′ ), P x,y ) of the transition probabilities such that P x,y [X ′ = Y ′ ] > 0 for x close to y. This enables us to obtain strict contractivity in metrics that have a total variation part, i.e., the function f defining the underlying distance has a discontinuity a > 0 at 0.
Hence $\beta(r)$ is an upper bound for the expectation of the increase $\Delta R$ of the distance during a single transition step of coupled Markov chains with initial states $x$ and $y$ such that $d(x, y) = r$. Similarly, $\alpha(r)$ is a lower bound for distance-decreasing fluctuations of $\Delta R$, and $\pi(r)$ is a lower bound for the coupling probability. We make the following assumptions on $\alpha$, $\beta$ and $\pi$:
(A1) There exists a constant $r_0 \in (0, \infty)$ such that
(a) $\inf_{r \in (0, r_0]} \pi(r) > 0$, and
(b) $\inf_{r \in (r_0, s)} \alpha(r) > 0$ for any $s \in (r_0, \infty)$.
Then for any $x, y \in S$,
$$E_{x,y}[\rho(X', Y')] \le (1 - c)\, \rho(x, y),$$
where $c$ is an explicit strictly positive constant defined in (3.13) below.
The proof is given in Section 3. Explicit expressions for the function $f$ and the contraction rate $c$ depending only on $\alpha$, $\beta$, $\pi$ and $\varepsilon$ are given in Subsection 3.1. Although these expressions are somewhat involved, they can be applied to derive quantitative bounds in concrete models. In particular, the asymptotic dependence of the contraction rate on parameters of the model can often be made explicit. This will be demonstrated for the Euler scheme in Section 2.4.
By Lemma 2.1, Theorem 2.2 implies that the transition kernel $p$ is contractive with rate $c$ w.r.t. the $W_\rho$ distance on probability measures on $S$. Since the function $f$ defined in (3.8) is bounded from below by a multiple both of $1_{(0,\infty)}$ and of the identity, the theorem yields quantitative bounds for convergence to equilibrium both w.r.t. the total variation distance and the standard $L^1$ Wasserstein distance.
The assumption (A3) imposed in Theorem 2.2 is sometimes too restrictive. By a modification of the metric, it can be replaced by the Lyapunov condition (A4). In (A4)(b) we use the convention that the value of the fraction is $+\infty$ if $\beta(r) \le 0$.
Here $f$ is the concave increasing function defined in (3.17) below, and the constant $M \in \mathbb{R}_+$ is defined in (3.19). Then for any $x, y \in S$,
$$E_{x,y}[\rho(X', Y')] \le (1 - c)\, \rho(x, y),$$
where $c$ is an explicit strictly positive constant defined in (3.24) below.
The proof of the theorem is given in Section 3 and explicit expressions for the function f and the constants M and c in terms of α, β, π, ε, V, C and λ are provided in Subsection 3.2.
The idea of adding a Lyapunov function to the metric appears for example in [21] and has been further worked out in the diffusion case in [19]. Theorem 2.3 can be seen as a more quantitative version of Theorem 4.8 in [24], which is an extension of the classical Harris' Theorem. Note, however, that contractivity in our result is expressed in an additive metric ρ, as opposed to the multiplicative semimetric used in [24]; see also [19] for a more detailed discussion on these two types of metrics. An application of Theorem 2.3 to the Euler scheme is given in Theorem 6.1 below.

2.2. Contractivity without positive coupling probability. The assumption that there is a significant overlap between the measures $p(x, \cdot)$ and $p(y, \cdot)$ for $x$ close to $y$ is sometimes too restrictive. For example, it may cause a bad dimension dependence of the resulting bounds in high-dimensional applications. Therefore, we now state an alternative contraction result that applies even when the coupling probability $\pi(r)$ vanishes for all $r$.
We now impose conditions (B1)–(B3) on $\alpha$ and $\beta$. Thus we no longer assume a positive coupling probability for $r < r_0$. Instead, we require in (B1) and (B2) that $\alpha(r) = \Omega(r)$ and $\beta(r)/\alpha(r) = O(1)$ as $r \downarrow 0$. These assumptions can be verified, for example, for Euler schemes if the coupling is constructed carefully. We will do this in Section 2.4 for Euler discretizations of SDEs with contractive drifts, whereas for more general drifts we will follow a slightly different approach.
Then for any $x, y \in S$,
$$E_{x,y}[\rho(X', Y')] \le (1 - c)\, \rho(x, y),$$
where $c$ is an explicit strictly positive constant defined in (4.12) below.
The proof is given in Section 4. Notice that in contrast to Theorem 2.2 and Theorem 2.3, the function $f$ in Theorem 2.4 does not have a jump at $0$, i.e., the Kantorovich metric $W_\rho$ does not contain a total variation part. This corresponds to the fact that under Assumptions (B1), (B2) and (B3), it cannot be expected in general that the coupled Markov chains meet in finite time.

2.3. Stability under perturbations. Contractions in Kantorovich distances can sometimes be carried over to small perturbations of a given Markov chain. For instance, in Subsection 2.5 we will deduce contractivity for the Metropolis adjusted Langevin algorithm from corresponding properties of the Euler proposal chain. Suppose as above that $((X', Y'), P_{x,y})$ is a Markovian coupling of the transition probabilities $p(x, \cdot)$ and $p(y, \cdot)$. Moreover, let $((\tilde X, \tilde Y), \tilde P_{x,y})$ be a corresponding coupling of $\tilde p(x, \cdot)$ and $\tilde p(y, \cdot)$ for another (perturbed) Markov transition kernel $\tilde p$ on $S$. Here we assume that for given $x, y \in S$, $(X', Y')$ and $(\tilde X, \tilde Y)$ are defined on a common probability space. We start with a simple observation: if there exists a metric $\rho$ on $S$ and a constant $c \in (0, \infty)$ such that (2.7) holds for $x, y \in S$, and the perturbation is sufficiently small, then contractivity carries over to $\tilde p$. In applications it is often difficult or even impossible to verify Condition (2.28) for $x$ very close to $y$. If $\tilde P_{x,y}[\tilde X = \tilde Y] > 0$ for $x$ close to $y$, then this condition can be relaxed.
Assume that $\bar p > 0$ and $b \le c f(r_0)/4$, and let $\tilde \rho$ be the correspondingly modified metric. Then the perturbed transition kernel $\tilde p$ is again a strict Kantorovich contraction w.r.t. $\tilde \rho$. The proof is given in Section 5. In Section 7 we will apply Theorem 2.5 to our results for the Euler scheme in order to obtain contractivity for the Metropolis adjusted Langevin algorithm (MALA).
Note that Theorem 2.5 is related to the perturbation results in [39,40,28]. In all these papers, a Kantorovich contraction in some metric is assumed for the initially given unperturbed Markov chain. Then, in [39], the authors obtain bounds on the distance to equilibrium of a perturbed Markov chain in the same Kantorovich metric. In [40,28], the metric also remains unchanged, but the object of interest is a bound on the distance between a perturbed and the unperturbed chain. A related result in continuous time, giving bounds on the distance between invariant measures of a perturbed and an unperturbed diffusion, has been obtained in [26]. In contrast to these results, we consider a perturbed metric in Theorem 2.5, but we obtain a stronger result showing that the perturbed Markov chain is again contractive w.r.t. the modified metric.
2.4. Application to Euler schemes. We now show how to apply the general methods developed above to Euler discretizations of stochastic differential equations of the form
(2.35) $dX_t = b(X_t)\, dt + dB_t$,
where $(B_t)_{t \ge 0}$ is a Brownian motion in $\mathbb{R}^d$, and $b : \mathbb{R}^d \to \mathbb{R}^d$ is a Lipschitz continuous vector field. Quantifying contraction rates for Euler discretizations is important in connection with the derivation of error bounds for the unadjusted Langevin algorithm (ULA), cf. [7,14,13,12,8] for corresponding results. Such applications of the techniques presented below will be discussed in detail in the upcoming paper [34] by the second author. The transitions of the Markov chain for the Euler scheme with step size $h > 0$ are given by
(2.36) $x \mapsto \hat x + \sqrt{h}\, Z$, where $\hat x := x + h\, b(x)$ and $Z \sim N(0, I_d)$.
The corresponding transition probabilities are given by
(2.37) $p(x, \cdot) = N(\hat x, h I_d)$ for any $x \in \mathbb{R}^d$,
i.e., the transition density from $x$ is the Gaussian density $\varphi_{\hat x, hI}$. In the case $b \equiv 0$, the Markov chain is a Gaussian random walk. We consider the coupling of two transitions of the Euler chain from $x$ and $y$, respectively, given by
(2.40) $X' = \hat x + \sqrt{h}\, Z$, and $Y' = X'$ if $U \le \varphi_{\hat y, hI}(X')/\varphi_{\hat x, hI}(X')$, $Y' = Y'_{\mathrm{refl}}$ otherwise,
where $Z \sim N(0, I_d)$ and $U \sim \mathrm{Unif}(0, 1)$ are independent random variables, and $Y'_{\mathrm{refl}}$ is obtained by adding to $\hat y$ the increment $\sqrt{h}\, Z$ added to $\hat x$, reflected at the hyperplane between $\hat x$ and $\hat y$. (Figure 1 illustrates this construction of the coupling of $p(x, \cdot)$ and $p(y, \cdot)$: given the value of $X'$, we set $Y' = X'$ with the maximal probability $\min(1, \varphi_{\hat y, hI}(X')/\varphi_{\hat x, hI}(X'))$.)
Both $(X', Y'_{\mathrm{refl}})$ and $(X', Y')$ are couplings of the probability measures $p(x, \cdot)$ and $p(y, \cdot)$. For the coupling (2.40), $Y' = X'$ with the maximal probability $\min(1, \varphi_{\hat y, hI}(X')/\varphi_{\hat x, hI}(X'))$. Furthermore, in the case where $Y' \ne X'$, the coupling coincides with the reflection coupling, i.e., $Y' = Y'_{\mathrm{refl}}$. The resulting combination of reflection coupling and maximal coupling is an optimal coupling of the Gaussian measures $p(x, \cdot)$ and $p(y, \cdot)$ w.r.t. any Kantorovich distance based on a metric $\rho(x, y) = f(|x - y|)$ with $f$ concave, cf. [35] for the one-dimensional case. We will not use the optimality here, but it shows that (2.40) is an appropriate coupling to consider if we are interested in contraction properties for single transition steps of the Markov chain.
Remark 2.6 (Relation to reflection coupling of diffusion processes). A reflection coupling of two copies of a diffusion process satisfying a stochastic differential equation of the form (2.35) is given by
(2.42) $dY_t = b(Y_t)\, dt + (I_d - 2 e_t e_t^\top)\, dB_t$ for $t < T$, and $Y_t = X_t$ for $t \ge T$,
where $e_t = (X_t - Y_t)/|X_t - Y_t|$ and $T$ denotes the coupling time. Hence the noise increment is reflected up to the coupling time, whereas after time $T$, $X_t$ and $Y_t$ move synchronously. Our coupling in discrete time has a similar effect. If $\hat x$ and $\hat y$ are far apart, then the transition densities $\varphi_{\hat x, hI}$ and $\varphi_{\hat y, hI}$ have little overlap, and hence reflection coupling is applied with very high probability. If, on the other hand, $\hat x$ and $\hat y$ are sufficiently close, then with a non-negligible probability, $X' = Y'$. Once both Markov chains have reached the same position, they stick together since their transition densities coincide subsequently. In this sense, the coupling (2.40) is a natural discretization of reflection coupling. Indeed, we would expect that as $h \downarrow 0$, the coupled Markov chains with time rescaled by a factor $h$ converge in law to the reflection coupling (2.42) of the diffusion processes.
On the other hand, a coupling of Markov chains in which jumps are always reflected (i.e., a coupling without the positive probability of jumping to the same point) would converge as h ↓ 0 to a reflection coupling of diffusions in which the coupled processes do not follow the same path after the coupling time.
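The coupling construction around (2.40) can be sketched in code. The following is our own minimal illustration (the function name `coupled_euler_step` is ours): one coupled transition of the Euler chain, combining maximal coupling of the Gaussian kernels $N(\hat x, h I_d)$ and $N(\hat y, h I_d)$ with reflection of the noise increment when the maximal coupling fails.

```python
import numpy as np

# Sketch of one coupled Euler transition, following the construction
# described around (2.40). Not the paper's code; an illustrative reading.
rng = np.random.default_rng(1)

def coupled_euler_step(x, y, b, h):
    xh = x + h * b(x)                  # deterministic drift step from x
    yh = y + h * b(y)                  # deterministic drift step from y
    z = rng.normal(size=x.shape)
    x_new = xh + np.sqrt(h) * z        # X' ~ N(xh, h I)
    if np.allclose(xh, yh):
        return x_new, x_new            # kernels coincide: couple exactly
    # log of the density ratio phi_{yh,hI}(X') / phi_{xh,hI}(X')
    log_ratio = (np.dot(x_new - xh, x_new - xh)
                 - np.dot(x_new - yh, x_new - yh)) / (2 * h)
    if np.log(rng.uniform()) <= log_ratio:
        return x_new, x_new            # maximal coupling: Y' = X'
    e = (xh - yh) / np.linalg.norm(xh - yh)
    z_refl = z - 2 * np.dot(e, z) * e  # reflect noise at the hyperplane
    return x_new, yh + np.sqrt(h) * z_refl   # Y' = Y'_refl ~ N(yh, h I)
```

By construction, both outputs have the correct Gaussian marginals, and once the chains have met they remain together, mirroring Remark 2.6.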
We assume that under the probability measure $P_{x,y}$, $(X', Y')$ is the coupling of $p(x, \cdot)$ and $p(y, \cdot)$ introduced above. In the notation from Section 2.2, we define corresponding functions $\alpha$ and $\beta$, as well as modified functions $\tilde \alpha$ and $\tilde \beta$. Notice that the definitions of $\tilde \beta$ and $\tilde \alpha$ differ from those of $\beta$ and $\alpha$ given in (2.20) and (2.21), since $\tilde \beta$ and $\tilde \alpha$ take into account only the coupled random walk transition step from $(\hat x, \hat y)$ to $(X', Y')$, but not the deterministic transition from $(x, y)$ to $(\hat x, \hat y)$.
Assumptions. In our main result for the Euler scheme we assume that there exist constants $J \in [0, \infty)$ and $K, L, R \in (0, \infty)$ such that the following conditions hold:
(C1) One-sided Lipschitz condition: $(x - y) \cdot (b(x) - b(y)) \le J\, |x - y|^2$ for any $x, y \in \mathbb{R}^d$.
(C2) Strict contractivity outside a ball: $(x - y) \cdot (b(x) - b(y)) \le -K\, |x - y|^2$ for any $x, y \in \mathbb{R}^d$ with $|x - y| \ge R$.
(C3) Global Lipschitz condition: $|b(x) - b(y)| \le L\, |x - y|$ for any $x, y \in \mathbb{R}^d$.
Notice that (C1) holds automatically with $J = L$ by (C3). Note, however, that we can often choose $J$ much smaller than $L$. The global Lipschitz condition is required for the stability of the Euler scheme, but the constant $L$ will affect our lower bound for the contraction rate only in a marginal way. On the other hand, our bound on the contraction rate will depend in an essential way on the one-sided Lipschitz constant $J$.
The bounds provided in the next lemma are crucial to apply the techniques developed above to the Euler scheme.
The proof of the lemma is contained in Section 6.
Contractive case. First, we consider the case where the deterministic part of the Euler transition is a contraction, i.e., $|\hat x - \hat y| \le |x - y|$ for all $x, y$. In this simple case, we can prove a rather sharp result. We choose a metric of the form
(2.53) $\rho_a(x, y) = a\, 1_{x \ne y} + f_a(|x - y|)$.
Here $a$ is a non-negative constant, $R$ is chosen as in Assumption (C2), and $g_a : [0, R] \to \mathbb{R}$ is an appropriately chosen decreasing function (see (6.14) for $a = 0$ and (6.20) for $a > 0$). The resulting $f_a$ is a concave increasing function satisfying $r/2 \le f_a(r) \le r$. In particular, the distance $\rho_0$ is equivalent to the Euclidean distance.
and let $\rho_a$ be defined by (2.53) with $g_a$ specified in (6.14) and (6.20), respectively. Let $h_0$ be defined in terms of the explicit constant $c_0$ in Lemma 2.7. Then (2.54) and (2.55) hold, and if $h \in (0, h_0)$, then the contraction bound (2.58) holds. The proof, based on Theorem 2.2 for $a > 0$ and Theorem 2.4 for $a = 0$, is given in Section 6.
Remark 2.9 (Dependence on parameters and dimension). The lower bound for the contraction rate in (2.58) is of the correct order $\Omega(h \min(R^{-2}, K))$. This corresponds to the optimal contraction rate $\Theta(\min(R^{-2}, K))$ for the corresponding diffusion process, see [17, Lemma 1 and Remark 5]. Note also that the lower bound for the contraction rate does not depend on the dimension $d$ provided the parameters $R$, $K$ and $L$ can be chosen independently of $d$.
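A quick numerical sanity check (our own, not from the paper): for the linear drift $b(x) = -Kx$ with $hK < 1$, the deterministic part of the Euler step is a global contraction with factor exactly $1 - hK$, which is consistent with the $\Omega(h \min(R^{-2}, K))$ scaling discussed in Remark 2.9.

```python
import numpy as np

# Sanity check for the globally contractive case: with b(x) = -K x and
# h K < 1, the drift map x -> x + h b(x) = (1 - h K) x contracts
# Euclidean distances by exactly the factor (1 - h K).
K, h = 2.0, 0.1
b = lambda v: -K * v

x, y = np.array([1.0, -2.0]), np.array([0.5, 3.0])
xh, yh = x + h * b(x), y + h * b(y)
assert np.isclose(np.linalg.norm(xh - yh),
                  (1 - h * K) * np.linalg.norm(x - y))
```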
General case. We now turn to the general, not globally contractive case. Here it is no longer possible to obtain contractivity w.r.t. a metric satisfying (2.55), but we can still choose a metric that is comparable to the Euclidean distance and apply the theorems above. We illustrate this at first by applying Theorem 2.2. We now choose a metric $\rho_a$ defined by (2.60), where $a$ is a non-negative constant, $R$ and $c_0$ are chosen as in Assumption (C2) and Lemma 2.7, respectively, and the functions $\varphi$ and $g_a$ are specified in (2.61) and (6.20), respectively. Note that except for the additional factor $\varphi(R) = \exp(-c_0^{-1} \Lambda (R^2 + 2\sqrt{h} R))$, the expression for the contraction rate $c_2(a)$ is similar to the one for the rate $c_1(a)$ in the contractive case. The proof, based on Theorem 2.2, is given in Section 6. If the interval $[2\sqrt{h}, \Phi(R)]$ is empty, the theorem can still be applied with $R$ replaced by a slightly larger value. It is also possible to replace Condition (C2) by a Lyapunov condition and apply Theorem 2.3 instead of Theorem 2.2. A corresponding result for the Euler scheme is given in Section 6, cf. Theorem 6.1.
Remark 2.11 (Dependence on parameters and dimension). The lower bound for the contraction rate in (2.64) does not depend on the dimension d provided the parameters R, K and Λ can be chosen independent of d. Moreover, by choosing h sufficiently small, we can ensure that Λ is close to the one-sided Lipschitz constant J. Hence the global Lipschitz constant L is only required for controlling the step size h, whereas the contraction properties for sufficiently small h can be controlled essentially by one-sided Lipschitz bounds. This is important since in many applications, only a one-sided Lipschitz condition is satisfied globally. In this case, our approach can still be applied on a large ball if the step size is chosen sufficiently small depending on the radius of the ball and the growth of the local Lipschitz constant.
The explicit expression for the metric in Theorem 2.10 is a bit complicated. As an alternative, we can use a simplified metric without a discontinuity that is sufficient to derive bounds of a similar order as for the metric used above, whenever condition (C2) is satisfied. We assume $hL \le 1/6$, and we set $q$ and $r_1$ as in (2.65), with $R$ and $L$ as in Assumptions (C2) and (C3). Let $c_0$ denote the explicit constant in Lemma 2.7, and consider a simplified metric of the form $\rho(x, y) = f(|x - y|)$ with a concave increasing function $f$. Then the contraction bound stated in Theorem 2.12 holds. The proof of the theorem is contained in Section 6.
Remark 2.13. Again, the lower bound $c_2$ for the contraction rate only depends on $R$, $K$ and $\Lambda$. Furthermore, note that $r \exp(-q r_1) \le f(r) \le r$ for all $r \ge 0$, and hence the metric $\rho$ is comparable to the Euclidean distance. As a consequence, Theorem 2.12 implies weak contractivity in the standard $L^1$ Wasserstein distance. Note also that the function $f$ depends on the discretization parameter $h$ via $q$ and $r_1$. It is, however, possible to modify the definition of $f$ so that it no longer depends on $h$, at the cost of getting a worse constant $c_2$. We refer the interested reader to [34], where similar bounds are used with a metric independent of $h$.
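One concrete concave function with the comparability property stated in Remark 2.13 (our own illustration; the actual $f$ of Theorem 2.12 may differ) is $f(r) = \int_0^r \exp(-q \min(s, r_1))\, ds$: the integrand lies between $e^{-q r_1}$ and $1$, so $r e^{-q r_1} \le f(r) \le r$.

```python
import numpy as np

# Illustrative concave increasing function satisfying
# r * exp(-q * r1) <= f(r) <= r; not necessarily the exact f of the paper.
q, r1 = 3.0, 0.5

def f(r):
    if r <= r1:
        return (1 - np.exp(-q * r)) / q            # int_0^r e^{-q s} ds
    # beyond r1 the integrand is the constant e^{-q r1}, so f grows linearly
    return (1 - np.exp(-q * r1)) / q + np.exp(-q * r1) * (r - r1)

for r in [0.1, 0.4, 1.0, 5.0]:
    assert r * np.exp(-q * r1) <= f(r) <= r        # comparability with |.|
```

Concavity with an exponentially decaying derivative on a bounded interval, followed by linear growth, is exactly the shape used throughout Sections 3, 4 and 6.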
Theorem 2.12 can be extended to cover pseudo metrics based on functions that are strictly convex at infinity. This allows for obtaining upper L 2 bounds for Euler schemes under similar assumptions as above. Such bounds are applied to the analysis of Multi-level Monte Carlo algorithms in the upcoming paper [34].
2.5. Application to MALA. The Metropolis-adjusted Langevin algorithm is a Metropolis-Hastings method for approximate sampling from a given probability measure $\mu$ where the proposals are obtained by an Euler discretization of an overdamped Langevin SDE. In [16], the dimension dependence of contraction rates of MALA chains w.r.t. standard Kantorovich distances has been studied for a class of strictly log-concave probability measures that have a density w.r.t. a Gaussian reference measure. Our goal is to provide a partial extension of these results to non-log-concave measures. By considering the MALA transition step as a perturbation of the Euler proposals, we obtain contraction rates w.r.t. a modified Kantorovich distance provided the discretization time step is of order $h = O(d^{-1})$.
We consider a similar setup as in [16]: $\mu$ is a probability measure on $\mathbb{R}^d$ with density proportional to $\exp(-U)$, where $U$ is defined in terms of a function $V \in C^4(\mathbb{R}^d)$. We assume that we are given a norm $\|\cdot\|_-$ on $\mathbb{R}^d$, as well as finite constants $C_n \in [0, \infty)$, $p_n \in \{0, 1, 2, \ldots\}$, and $K_c, R_c \in (0, \infty)$, such that conditions (2.75) and (2.76) hold for any $n \in \{1, \ldots, 4\}$. Here (2.76) can be interpreted as strict convexity of $U$ outside a Euclidean ball.
Remark 2.14. (i) For discretizations of infinite-dimensional models, $\|\cdot\|_-$ is typically a finite-dimensional approximation of a norm that is almost surely finite w.r.t. the limit measure in infinite dimensions, see for instance [16, Example 1.6]. Correspondingly, we may assume that the measure concentrates on a ball of a fixed radius w.r.t. $\|\cdot\|_-$. This will be relevant for the application of Theorem 2.16 below, which states uniform contractivity on such balls. (ii) Condition (2.75) is the same condition that has been assumed in the strictly convex case in [16]. (iii) In (2.76), we assume strict convexity outside a ball of fixed radius w.r.t. the Euclidean norm and not w.r.t. $\|\cdot\|_-$. Such a bound can be expected to hold with $R_c$ independent of the dimension if, for example, the non-convexity occurs only in finitely many directions. The application of a coupling approach in situations where (2.76) does not hold requires more advanced techniques, see e.g. [44].
The transition step of a Metropolis-Hastings chain with proposal density $p(x, y)$ and target distribution $\mu(dx) = \mu(x)\, dx$ is given by $x \mapsto X'$ if $U \le \alpha(x, X')$, and $x \mapsto x$ otherwise, where $x$ is the previous position, $X'$ is the proposed move,
$$\alpha(x, y) = \min\left(1, \frac{\mu(y)\, p(y, x)}{\mu(x)\, p(x, y)}\right)$$
is the Metropolis-Hastings acceptance probability, and $U \sim \mathrm{Unif}(0, 1)$ is a uniform random variable that is independent of $X'$. We consider a Gaussian proposal with step size $h \in (0, 2)$; the proposal is a transition step of a semi-implicit Euler discretization of the Langevin equation. The reason for considering the semi-implicit instead of the explicit Euler approximation is that under appropriate conditions, the acceptance probability is then close to one.
Lemma 2.15 (Upper bounds for rejection probability). Suppose that (2.75) holds and let $k \in \mathbb{N}$. Then there exists an explicit polynomial $P_k : \mathbb{R}^2 \to \mathbb{R}_+$ of degree $\max(p_3 + 3, 3p_2 + 2)$ bounding the rejection probability for any $x \in \mathbb{R}^d$ and $h \in (0, 2)$.
The proof of the lemma is given in [16, Proposition 1.7]. The polynomials $P_k$ are explicit. Their coefficients depend only on the constants $C_2$, $C_3$, $p_2$ and $p_3$ in (2.75) and on the moments involved. Therefore, the results in the last section apply to the proposal chain, thus yielding a contraction rate of order $\Omega(h)$. Since the rejection probability is of higher order, we can then apply the perturbation result in Theorem 2.5 to prove corresponding contractivity for the MALA chain. To this end, we consider the coupling $(\tilde X, \tilde Y)$ of transition steps of the MALA chain from positions $x, y \in \mathbb{R}^d$ given by (2.77), i.e.,
$$\tilde X = X' \text{ if } U \le \alpha(x, X'), \ \tilde X = x \text{ otherwise}; \qquad \tilde Y = Y' \text{ if } U \le \alpha(y, Y'), \ \tilde Y = y \text{ otherwise},$$
where $(X', Y')$ is the (optimal) coupling for the proposal steps considered in (2.40), and $U \sim \mathrm{Unif}(0, 1)$ is independent of both $X'$ and $Y'$. Hence, the proposals are coupled optimally, and the same uniform random variable $U$ is used to decide about acceptance or rejection for each of the two steps. Nevertheless, in general $(\tilde X, \tilde Y)$ is not an optimal coupling of the corresponding MALA transition probabilities.
The function $f$ and the constants $c_3$ and $h_0$ depend only on $R$ and on the values of the constants $C_n$, $p_n$, $K_c$, $R_c$ in assumptions (2.75) and (2.76).
The proof of Theorem 2.16 is given in Section 7.
Remark 2.17. The theorem shows that by choosing the step size of order $\Theta(d^{-1})$, a contraction rate of the same order holds on balls w.r.t. $\|\cdot\|_-$ provided conditions (2.75) and (2.76) are satisfied. In the strictly convex case, it has been shown in [16] by a synchronous coupling that a corresponding result holds even for step sizes of order $\Theta(1)$ if the Euclidean norm in (2.82) is replaced by $\|\cdot\|_-$. One could hope for a similar result in the not globally convex case, but the combination of reflection coupling with a different norm leads to further difficulties. A possibility to overcome these difficulties might be the two-scale approach developed in [44].
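The coupled MALA transition described in this subsection can be sketched as follows. This is a hedged illustration with hypothetical names; for brevity we use the explicit-Euler Langevin proposal $N(x - \frac{h}{2}\nabla U(x), h I_d)$, whereas the paper works with a semi-implicit variant, and we couple the proposals by the maximal/reflection construction of Section 2.4. The key feature from the text is that the same uniform variable decides acceptance for both chains.

```python
import numpy as np

# Sketch of one coupled MALA step: coupled Gaussian proposals plus a
# SHARED accept/reject uniform, as described around (2.77). Our own
# illustration; proposal is explicit-Euler, unlike the paper's variant.
rng = np.random.default_rng(2)

def log_q(x_to, x_from, grad_U, h):
    # log proposal density N(x_from - (h/2) grad U(x_from), h I), up to constants
    m = x_from - 0.5 * h * grad_U(x_from)
    return -np.dot(x_to - m, x_to - m) / (2 * h)

def log_accept(x, xp, U_pot, grad_U, h):
    la = (-U_pot(xp) + U_pot(x)
          + log_q(x, xp, grad_U, h) - log_q(xp, x, grad_U, h))
    return min(0.0, la)               # log of the MH acceptance probability

def coupled_mala_step(x, y, U_pot, grad_U, h):
    mx = x - 0.5 * h * grad_U(x)      # proposal means
    my = y - 0.5 * h * grad_U(y)
    z = rng.normal(size=x.shape)
    xp = mx + np.sqrt(h) * z          # proposal for the first chain
    if np.allclose(mx, my):
        yp = xp                       # identical kernels: couple exactly
    else:
        lr = (np.dot(xp - mx, xp - mx) - np.dot(xp - my, xp - my)) / (2 * h)
        if np.log(rng.uniform()) <= lr:
            yp = xp                   # maximal coupling succeeded
        else:
            e = (mx - my) / np.linalg.norm(mx - my)
            yp = my + np.sqrt(h) * (z - 2 * np.dot(e, z) * e)  # reflection
    u = rng.uniform()                 # shared accept/reject variable
    x_new = xp if np.log(u) <= log_accept(x, xp, U_pot, grad_U, h) else x
    y_new = yp if np.log(u) <= log_accept(y, yp, U_pot, grad_U, h) else y
    return x_new, y_new
```

As noted in the text, this coupling is generally not optimal for the MALA transition probabilities, but once the chains have met they accept or reject together and hence stay together.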

Proofs of Theorems 2.2 and 2.3
In this section, we prove the first two theorems. We first specify the explicit choice of the metric and the explicit values of the contraction rate c. The reason for choosing the metric this way will become clear by the subsequent proofs of the theorems.
3.1. Choice of the metric in Theorem 2.2. For $r, s > 0$, we consider the intervals $I_r$ and corresponding dual intervals. For $r \in (r_0, \infty)$ we set
(3.6) $r_1 := \sup\{r > 0 : \gamma(r) > 0\}$,
where $\sup \emptyset = 0$. By Assumption (A3), we can choose $\gamma$ such that $r_1$ is finite. We also fix a constant $r_2 \in (r_1, \infty)$; the value of $r_2$ will be determined in condition (3.12) below. The underlying metric we consider is of the form (2.15), built from functions $\varphi$, $\Phi$ and $g$ defined in terms of $\gamma$ and $\alpha$. Assumption (A3) ensures that a constant $r_2$ satisfying (3.12) exists. Indeed, for $r \ge r_1$, $\gamma$ vanishes, whence $\varphi$ is constant and $\Phi$ is linear. By the definition of $\alpha$, we see that $\alpha$ is uniformly bounded by $\varepsilon^2$. Therefore, the value on the right-hand side of (3.12) goes to zero as $r_2 \to \infty$, and (3.12) holds for large $r_2$ by (A3). The contraction rate $c$ is now given by (3.13). Note that (3.13) guarantees that $g(r) \ge 1/2$ for $r \le r_2$.

3.2. Choice of the metric in Theorem 2.3. Now suppose that (A1), (A2) and (A4) hold. In this case we modify the definitions accordingly. We also fix a constant $r_2 \in (r_1, \infty)$; the value of $r_2$ will be determined by condition (3.21) below. By (A4)(b) and since $\varphi \le 1$, there always exists a finite $r_2$ such that (3.21) holds. To optimize the estimates, we choose $r_2$ as small as possible, i.e., we set $r_2$ as in (3.22), and the function $g$ is defined by a corresponding integral expression.
Note that (3.24) and (3.19) guarantee that $g(r) \ge 1/2$ for $r \le r_2$. In the minimum defining $c$, the first term guarantees contractivity for $r \le r_0$, the second term is used for all $r$, the third term guarantees contractivity for $r \ge r_2$, and the last term ensures contractivity with rate $c$ for $r_0 < r \le r_1$.

3.3. Proof of Theorem 2.2 and Theorem 2.3.
Since the arguments are similar, we prove both theorems simultaneously, distinguishing cases where needed. In the situation of Theorem 2.2, we set $M = 0$. Let $x, y \in S$ be such that $r = d(x, y) > 0$.
where $I_r = ((r - \varepsilon)_+, r)$. By taking expectations, we obtain the corresponding bound. Our goal is to compensate the second term by the first term for $r \le r_0$ and by the last term for $r_0 < r \le r_2$ (and possibly by a Lyapunov part for $r \ge r_2$). In order to verify (2.16) and (2.18), we now distinguish three cases.
Case $r \in (r_0, r_2)$. Since $f' = g\varphi$ on $(0, r_2)$, both summands in the resulting decomposition are negative, since $g$ and $\varphi$ are decreasing. Our choice of $\varphi$ guarantees the required bound for the first summand, and since $\varphi' \le 0$ and $g$ is decreasing, the second summand can be estimated as well. Indeed, by the definition of $\varphi$ and $\gamma$, we have for $s \in I_r$
$$\varphi'(s) = -\gamma(s)\varphi(s) \le -\gamma(r)\varphi(r).$$
To prove that the right-hand side is bounded from above by $-c f(r)$, it is sufficient to show that
$$c\,(a + \Phi(r)) \le -\beta(r)\, \varphi(r_1)/2 \quad \text{for any } r \ge r_2.$$
We claim that this holds by the definition of $r_2$: the claim follows from the definition of $c$ together with (3.12). Finally, we show contractivity for $r \ge r_2$ under the conditions in Theorem 2.3. Here we use (3.38) and (3.26), together with the fact that $c \le \lambda/4$ by its definition. Moreover, due to our choice of $r_2$ in (3.22) and since $f' \le \varphi$, and due to our choice of $c$ in (3.24) and since $f \le \Phi$, the required estimates hold. Hence (3.45) is indeed satisfied for $r \ge r_2$, and the proof is complete.

Proof of Theorem 2.4
For proving Theorem 2.4, we proceed in a similar way as in the proofs of Theorem 2.2 and Theorem 2.3 above. Suppose that conditions (B1), (B2) and (B3) hold. Now, the intervals I r , r ∈ (0, ∞), are given by (2.19), and we consider the dual intervals Î s , s ∈ (0, ∞), defined by (4.1). Note the additional factor 4 that has been introduced for technical reasons for s < 2r 0 . In applications, this will usually not affect the bounds too much, as typically r 0 is a small constant. Condition (4.6) can always be satisfied by choosing r 0 small enough. As in (3.6), we set (4.7) r 1 := sup{r > 0 : γ(r) > 0}, where sup ∅ = 0. Similarly as below (3.6), by Assumption (B3), we can choose γ such that r 1 is finite. The metric is chosen similarly as in the proof of Theorem 2.2 above, where now a = 0. We define ϕ, Φ, g and f accordingly; here the constant r 2 is chosen such that (4.9) holds, and the contraction rate c is given by (4.12).

Proof of Theorem 2.4. Let x, y ∈ R d and r = d(x, y). For r ≥ r 2 , (2.26) follows in the same way as in the proof of Theorem 2.2 (with a = 0). The crucial assumption for this is (4.10), which holds due to (B3), by analogy to (3.12) in the proof of Theorem 2.2, which holds due to (A3). Now assume that r < r 2 . To prove (2.26), we show two intermediate inequalities. The first inequality follows similarly as in the proof of Theorem 2.2, cf. (3.26). To prove the second inequality, note the form of f ′ on (0, r 2 ). By (4.14) it is sufficient to show that ϕ, g and c have been chosen in such a way that (4.15) and (4.16) hold. Then, by (4.13), we can conclude. We first verify (4.15). This condition is satisfied provided (4.17) holds, which for s ≥ 2r 0 can be verified directly. For s < 2r 0 we have to argue differently, since, in general, Î s is not contained in (s, ∞) in this case. Observe first that if sup Î s (γgϕ) ≤ 0, then (4.17) holds trivially since ϕ is decreasing. Hence it is sufficient to consider the case sup Î s (γgϕ) > 0.
Noting that gϕ ≤ 1, we have by (4.5) an upper bound on sup Î s (γgϕ), and hence, since g ≥ 1/2, a corresponding bound on the relevant integral. Thus, (4.17) holds for s < 2r 0 as well, by (4.6). We have thus shown that (4.17), and hence (4.15), are satisfied. It remains to verify (4.16). This condition holds provided a certain integral inequality is satisfied, or, equivalently, provided g satisfies a corresponding differential inequality. The function g has been chosen in (4.11) in such a way that this condition is satisfied.

Proof of perturbation result
We now prove the perturbation result in Theorem 2.5. Let x, y ∈ S with x ≠ y. By (2.33), (2.30), (2.31) and (2.32), we obtain a chain of estimates; note that in the second inequality we have used that f is a contraction. For d(x, y) < r 0 we obtain (5.1). For d(x, y) ≥ r 0 , we use the fact that b = cf (r 0 )/4, and hence (5.2) follows. The assertion of Theorem 2.5 follows from (5.1) and (5.2).

Proof of results for the Euler scheme
In this section, we prove the contraction results for the Euler scheme.

Proof of Lemma 2.7 (i), (ii) and (iii).
We start with reduction steps. At first, we observe that the definitions of β̂(x, y), α̂(x, y) and π̂(x, y) only depend on r̂ = |x̂ − ŷ| and R ′ = |X ′ − Y ′ |. Thus, the assertions (i), (ii), (iii) are statements about the coupled random walk transition step (x̂, ŷ) → (X ′ , Y ′ ) defined by (2.40), and we may assume w.l.o.g. that (x̂, ŷ) = (x, y). Furthermore, r̂ and the law of R ′ under P x,y are invariant under translations and rotations of the underlying state space R d . Therefore, we may even assume w.l.o.g. that x̂ = x = 0 and ŷ = y = re 1 , where r = r̂ and e 1 , . . . , e d denotes the canonical basis of R d . Then the coupled transition step takes an explicit form, and in particular, R ′ is distributed as in the one-dimensional case, so that we may assume w.l.o.g. that d = 1.
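As an illustration of the coupled transition step, the following Python sketch implements a standard combination of maximal and reflection coupling of the Gaussian increments (our reading of the construction in (2.40); the exact definition there may differ in details) and verifies by simulation that the second component has the correct marginal N(ŷ, h I d ):

```python
import math, random

random.seed(1)

def coupled_step(xh, yh, h):
    """One maximal/reflection coupling step for N(xh, h*I) and N(yh, h*I).

    This is a standard construction, offered only as a sketch of the
    coupled transition used in the paper, not a verbatim copy of (2.40)."""
    d = len(xh)
    rhat = math.sqrt(sum((a - b) ** 2 for a, b in zip(xh, yh)))
    Z = [random.gauss(0.0, 1.0) for _ in range(d)]
    X = [a + math.sqrt(h) * z for a, z in zip(xh, Z)]
    if rhat == 0.0:
        return X, list(X)
    e = [(a - b) / rhat for a, b in zip(xh, yh)]
    eZ = sum(ei * zi for ei, zi in zip(e, Z))
    c = rhat / math.sqrt(h)
    # maximal part: set Y' = X' with probability min(1, density ratio)
    if random.random() <= math.exp(-0.5 * ((eZ + c) ** 2 - eZ ** 2)):
        return X, list(X)
    # reflection part: reflect the increment in the direction e
    Zr = [zi - 2.0 * eZ * ei for zi, ei in zip(Z, e)]
    Y = [b + math.sqrt(h) * z for b, z in zip(yh, Zr)]
    return X, Y

h, N = 0.1, 100000
xh, yh = [0.0, 0.0], [0.5, 0.0]
ys = [coupled_step(xh, yh, h)[1] for _ in range(N)]
mean = [sum(y[k] for y in ys) / N for k in range(2)]
var = [sum((y[k] - mean[k]) ** 2 for y in ys) / N for k in range(2)]
```

By construction, the accepted part carries the overlap min(p x̂ , p ŷ ) of the two Gaussian densities, and the reflected part carries the remainder, so both marginals are exact.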
We are now left with a simple one-dimensional problem where x = 0, y = r, and r̂ = r = |x − y|. The coupling is given by (6.4), where Z ∼ N(0, 1) and U ∼ Unif(0, 1) are independent. Hence X ′ ∼ N(0, h), and the conditional probability, given Z, that the two components couple can be computed explicitly. Here we have used in the third step that the integrand is symmetric w.r.t. t = r/2, i.e., invariant under the transformation t → r − t. Thus β̂(x, y) = E x,y [R ′ − r̂] = 0, which proves Assertion (i).
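Assertion (i) can also be checked by simulation. The following Python sketch implements the one-dimensional coupling step as we read (6.4) (the components merge with the maximal probability, and otherwise Y ′ is the reflection of X ′ about r/2; a Unif(0, 1) variable drives the accept step), and confirms that E x,y [R ′ ] ≈ r, i.e., β̂(x, y) ≈ 0:

```python
import math, random

random.seed(7)

def coupled_distance(r, h):
    """Distance R' = |X' - Y'| after one coupled Gaussian step at distance r.

    Sketch of the one-dimensional maximal/reflection coupling (our reading
    of (6.4)): X' = sqrt(h) Z; the components merge with the maximal
    probability min(1, phi(Z - c)/phi(Z)), c = r/sqrt(h), and otherwise
    Y' = r - sqrt(h) Z is the reflection of X' about the midpoint r/2."""
    Z = random.gauss(0.0, 1.0)
    c = r / math.sqrt(h)
    if random.random() <= math.exp(-0.5 * ((Z - c) ** 2 - Z ** 2)):
        return 0.0  # merged: X' = Y'
    return abs(2.0 * math.sqrt(h) * Z - r)

r, h, N = 0.5, 0.04, 200000
samples = [coupled_distance(r, h) for _ in range(N)]
mean_R = sum(samples) / N                           # should be close to r
merge_frac = sum(1 for s in samples if s == 0.0) / N
```

For these parameters c = r/√h = 2.5, so the merge probability equals the Gaussian overlap 2Φ(−c/2) ≈ 0.211, and the symmetry argument in the proof gives E[R ′ ] = r exactly.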
Next, we are going to prove the lower bound for α̂(x, y). Recall from (2.43) and (2.44) that I r = (0, r + √h) for r < √h and I r = (r − √h, r) for r ≥ √h. We first consider the case r ≥ √h. Similarly as above, we obtain a corresponding lower bound. Here we have used in the last step that s → s(u − s/2) is decreasing for s ≥ u, and r/ √h ≥ 1 ≥ u for u ∈ [0, 1/2]. Note that in the second step we only use the reflection behaviour of the coupling. This is due to the fact that the contribution from jumping to the same point would be of negligible order in h. Now assume r < √h. Here, we have used in the last step that for r < √h and u ∈ [−1/2, 0], the relevant exponent s lies in [−1, 0], and hence e s − 1 ≤ (1 − e −1 )s. By combining (6.5) and (6.6), we obtain α̂(x, y) ≥ c 0 √h min(r, √h). This proves Assertion (ii).
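The elementary estimate invoked in the last step follows from convexity of the exponential on [−1, 0]:

```latex
% The graph of the convex function s \mapsto e^s lies below the chord
% joining (-1, e^{-1}) and (0, 1), i.e.
e^{s} \;\le\; 1 + (1 - e^{-1})\,s \qquad \text{for } s \in [-1, 0],
% and hence
e^{s} - 1 \;\le\; (1 - e^{-1})\,s \qquad \text{for } s \in [-1, 0].
```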
Therefore, Assertion (iii) holds as well.
(i) We first consider the case where r = |x − y| > R. By the choice of h 0 in the statement of the theorem, h ≤ K/L 2 . Therefore, by Lemma 2.7, we obtain a corresponding bound. Since f a is concave with f ′ a ≥ 1/2, we immediately obtain that (6.11) is bounded from above by −Khr/4, and hence, as f a (r) ≤ r and r > R, the claimed contraction holds in this case. (ii) Now suppose r ≤ R. Since r̂ ≤ r by (2.52), we can apply the arguments in the proofs of Theorems 2.2 and 2.4 with α and β replaced by the corresponding quantities α̂ and β̂ for the coupled random walk transition (x̂, ŷ) → (X ′ , Y ′ ), with r 2 and r 1 replaced by R. Indeed, note that since the case r > R has already been considered above, we only need to use the parts of the proofs of Theorems 2.2 and 2.4 concerned with the case r ≤ R, and thus Assumptions (A3) and (B3) are not required. We consider first a = 0. In this case, we can proceed as in the proof of Theorem 2.4 with r 0 = √h. By Lemma 2.7, we can choose α(r) = c 0 √h min(r, √h) in order to satisfy (4.4), (4.5), (4.6), (4.9), (4.11) and (4.12). Here Î s is defined by (4.1). With these choices we obtain, as in the proof of Theorem 2.4, a contraction estimate for f 0 , where f 0 is defined by (2.53). Noting that r̂ ≤ r by (2.52), the bounds in (6.13) and (6.15) now imply the claimed estimate for r ≤ R. It only remains to show c ≥ c 1 (0)h. Suppose first that s < 2 √h = 2r 0 . Then the corresponding term can be bounded directly. By (6.14), (6.17) and (6.18) we see that c ≥ c 1 (0)h. The assertion for a = 0 now follows by (6.11), (6.16) and (6.19). Now consider the case a ≥ √h. Here we can proceed as in the proof of Theorem 2.2 with r 0 = ε = √h. We now choose the intervals I r and the dual intervals Î s according to (3.1) and (3.2), i.e., I r = ((r − √h) + , r) and Î s = (max(s, √h), s + √h). By Lemma 2.7, we can choose α, β, γ, ϕ and Φ as above so that conditions (3.3), (3.4), (3.5), (3.7), (3.9), (3.10), (3.11), (3.13) and (3.14) are satisfied. In particular, choosing a ≥ √h = r 0 guarantees that (3.10) is satisfied, since β̂(x, y) ≤ 0 for all x, y ∈ R d . Note that for u ∈ Î s we have α(u) ≥ c 0 h, because u ≥ √h.
Setting the constant as in (6.20), we obtain the claimed bound, where ρ a is defined by (2.53). The bound c ≥ c 1 (a) follows as in (6.18) and (6.19). Here we have used that, by the assumptions, h 0 L ≤ 1 and h 0 L 2 ≤ K. Moreover, the assumption on h 0 implies 1/(4L √h 0 ) ≥ R since r 2 > R. Hence by (3.3), γ(r) ≤ 0 for r ≥ R, and thus (3.4), (3.6) and (3.9) are satisfied with the corresponding choices. For a ≥ 2 √h, condition (3.10) is satisfied by (6.22), (6.23), and by the assumption on a. In order to verify (3.12), we need to choose r 2 ≥ r 1 = R such that (6.25) holds. To this end, note that for r ≥ R, we have Φ(r) = Φ(R) + (r − R)ϕ(R). Furthermore, since 1/(4L √h) ≤ r 2 by assumption, on [R, r 2 ] we can use the formula for α given in (6.22). Hence (6.25) is satisfied if a corresponding explicit inequality holds. Since we assume that a ≤ Φ(R), this condition holds if we choose (6.27) r 2 = R + 2c 0 /K.
Hence from Theorem 2.2 we obtain E x,y [ρ a (X ′ , Y ′ )] ≤ (1 − c)ρ a (x, y) with c given by (3.13), for ρ a (x, y) = a 1 x≠y + f a (|x − y|), where (6.28) f a (r) = ∫ 0 r ϕ(s ∧ r 2 ) g a (s ∧ r 2 ) ds with ϕ given by (6.24) and g a = g given by (3.14). Moreover, we can easily bound the second quantity appearing in the definition (3.13) of c. Indeed, for s < r 2 and u ∈ Î s we can use the formula for α given in (6.22). Since ϕ(s) ≥ ϕ(R) and Φ(u) ≤ u, we obtain the corresponding bound, and this implies the assertion by (6.27).

In the following variation of Theorem 2.10, Condition (C2) is replaced by a Lyapunov condition:

Theorem 6.1 (Euler scheme, general case with Lyapunov condition). Suppose that Conditions (C1) and (C3) are satisfied and that the transition kernel p of the Euler scheme satisfies Assumption (A4)a with a Lyapunov function V , i.e., there exist constants C, λ > 0 such that pV ≤ (1 − λ)V + C. Moreover, assume that h 0 is small enough, in particular h 0 ≤ (16L 2 r 2 2 ) −1 , where r 1 , r 2 > 0 are constants specified in (6.31) and (6.35). Suppose further that a ∈ (2 √h, r 2 ) and let ρ a (x, y) = (a + (M/(2C))(V (x) + V (y))) 1 x≠y + f a (|x − y|) with M given by (6.32) and f a defined in (6.36). Let the contraction rate c 1 be defined correspondingly, with ϕ given by (6.30). Then for all h ∈ (0, h 0 ), a contraction estimate of the form E x,y [ρ a (X ′ , Y ′ )] ≤ (1 − c 1 h)ρ a (x, y) holds.

Example 6.2. It is easy to see that if the drift b satisfies a linear growth condition |b(x)| 2 ≤ L 0 (1 + |x| 2 ) for all x ∈ R d with a constant L 0 > 0 (which is implied by (C3) with L 0 = 2 max(L 2 , |b(0)| 2 )), and a dissipativity condition with constants M 1 , M 2 > 0, then the transition kernel p of the Euler scheme satisfies the Lyapunov condition pV ≤ (1 − λ)V + C with the Lyapunov function V (x) = |x| 2 and constants λ = 2hM 2 − h 2 L 0 and C = h 2 L 0 + 2hM 1 + hd, whenever h < 2M 2 /L 0 . Since the quadratic function satisfies the growth condition required in Theorem 6.1, and the dissipativity condition (6.29) is significantly weaker than Assumption (C2), we can apply this result to more general cases than the ones covered by Theorems 2.10 and 2.12.
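For a concrete instance of Example 6.2, the following Python sketch takes the illustrative drift b(x) = −x, for which the growth condition holds with L 0 = 1 and dissipativity holds in the form ⟨x, b(x)⟩ ≤ M 1 − M 2 |x| 2 with M 2 = 1 and any M 1 > 0 (this reading of (6.29), like the choice of constants, is our assumption). Since one Euler step gives E|X ′ | 2 = |x + h b(x)| 2 + hd exactly, the Lyapunov bound can be verified without sampling:

```python
# Exact verification of the Lyapunov bound from Example 6.2 for the
# Euler scheme X' = x + h b(x) + sqrt(h) xi, xi ~ N(0, I_d),
# with the illustrative drift b(x) = -x and V(x) = |x|^2.
d, h = 3, 0.05
L0, M1, M2 = 1.0, 0.1, 1.0      # |b|^2 <= L0(1+|x|^2), <x,b(x)> <= M1 - M2|x|^2
lam = 2 * h * M2 - h * h * L0   # lambda = 2 h M2 - h^2 L0
C = h * h * L0 + 2 * h * M1 + h * d

def b(x):
    return [-xi for xi in x]

def V(x):
    return sum(xi ** 2 for xi in x)

def pV(x):
    # E|x + h b(x) + sqrt(h) xi|^2 = |x + h b(x)|^2 + h d  (exact)
    bx = b(x)
    return sum((xi + h * bi) ** 2 for xi, bi in zip(x, bx)) + h * d

# check pV <= (1 - lambda) V + C on a grid of test points
points = [[t, -2 * t, 0.5 * t] for t in [i * 0.25 for i in range(-40, 41)]]
ok = all(pV(x) <= (1 - lam) * V(x) + C + 1e-12 for x in points)
```

Here h = 0.05 < 2M 2 /L 0 = 2, so λ > 0 as required.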
In order for (3.19) to be satisfied, it is sufficient to choose M small enough. Note that ϕ(s) ≥ ϕ(r 1 ) for all s > 0, and hence, if we choose (6.32) M = h c 0 ϕ(r 1 )/(4(r 1 + 1)), then condition (3.19) is indeed satisfied. Note that r 1 + 1 is chosen here instead of r 1 in order to prevent the value of M from being too large when r 1 is very small (or even zero). Condition (3.20) can now be written out explicitly, and we see that (6.33) holds for all a ≥ 2 √h. It remains to verify condition (3.21), for which we need a corresponding inequality to hold for all r ≥ r 2 . Since ϕ is decreasing, using the choice of M in (6.32), we see that it is sufficient to bound a term of order Λr for all r ≥ r 2 .
This finishes the proof.
We have chosen q in (2.67) such that c 0 q = 7ΛR ≥ 6Λr 1 .
By (2.65) and (2.67), we see that the constant c 2 has been defined in (2.69) in such a way that (6.46) holds true, and thus the assertion (2.71) is indeed satisfied.
(ii) r̂ ≤ √h. Noting that f ′ (r) ≤ 1 and f (r) ≤ r̂, we see by (6.44) that the corresponding estimate holds in this case as well.

Furthermore, it is not difficult to see that the proof of Theorem 2.12 carries over to our slightly modified setup. Therefore, similarly to Theorem 2.12, we can find for any fixed R ∈ (0, ∞) a concave strictly increasing function f with f (0) = 0 and constants c 2 > 0, h 0 > 0 such that for h ∈ (0, h 0 ) the corresponding contraction estimate holds for any x, y ∈ B − R . We now want to apply the perturbation result in Theorem 2.5. Setting d(x, y) = |x − y| and ρ(x, y) = f (|x − y|), we see that condition (2.30) holds with c = c 2 h. Moreover, by Lemma 7.1 below, there exists a constant p > 0 depending only on R, such that for h 0 sufficiently small and h ∈ (0, h 0 ), condition (2.32) is satisfied for any x, y ∈ B − R with r 0 = √h. Thus, to apply Theorem 2.5, it remains to show that (2.31) holds with a constant b ≥ 0 satisfying the required smallness condition. To this end, notice that for x, y ∈ B − R , the relevant difference can be controlled. Furthermore, since X̃ = X ′ if the proposal is accepted and X̃ = x otherwise, we obtain by (7.1) and Lemma 2.15, that for any x ∈ B − R and h ∈ (0, 2), a corresponding bound holds. Since the right hand side in (7.3) is of order Ω(h 3/2 ), we conclude that (7.3) holds for dh < h 1 , provided h 1 ∈ (0, ∞) is chosen sufficiently small. Hence, Theorem 2.5 applies, and by (2.34) we obtain the claimed estimate for any x, y ∈ B − R and h < h 1 d −1 , where c 3 = min(c 2 /8, p/(4h)) and f̃ (r) = f (r) + 2bp −1 1 r>0 . By (7.5), the first probability on the right hand side is bounded by 1 − p 0 for |x − y| ≤ √h. Moreover, by Lemma 2.15, there exists a finite constant c ′ ∈ (0, ∞) such that for any x, y ∈ B − R and h ∈ (0, 2), (7.7) P x,y [X̃ ≠ X ′ ] = E x,y [1 − α h (x, X ′ )] ≤ c ′ h 3/2 . A corresponding upper bound holds for P x,y [Ỹ ≠ Y ′ ]. Hence, by combining (7.5), (7.6) and (7.7), we conclude that there exist constants h 0 > 0 and p = p 0 /2 > 0 such that P x,y [X̃ ≠ Ỹ ] ≤ 1 − p for any h ∈ (0, h 0 ) and x, y ∈ B − R with |x − y| ≤ √h.
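The bound (7.7) says that a single accept-reject step rejects with probability of order h 3/2 . This scaling can be observed numerically: the following Python sketch estimates E[1 − α h ] for a MALA step with a standard Gaussian target in one dimension (the target, the starting point and the step sizes are illustrative choices) and checks that the rejection probability shrinks by a factor of roughly 4 3/2 = 8 when h is divided by 4:

```python
import math, random

random.seed(3)

def log_pi(x):          # illustrative target: standard Gaussian
    return -0.5 * x * x

def grad_log_pi(x):
    return -x

def rejection_prob(x, h, n):
    """Monte Carlo estimate of E[1 - alpha_h(x, X')] for one MALA step
    with proposal X' ~ N(x + (h/2) grad log pi(x), h)."""
    total = 0.0
    for _ in range(n):
        xp = x + 0.5 * h * grad_log_pi(x) + math.sqrt(h) * random.gauss(0, 1)
        # log proposal densities q(x, xp) and q(xp, x), up to the same constant
        fwd = -((xp - x - 0.5 * h * grad_log_pi(x)) ** 2) / (2 * h)
        bwd = -((x - xp - 0.5 * h * grad_log_pi(xp)) ** 2) / (2 * h)
        log_a = log_pi(xp) + bwd - log_pi(x) - fwd
        if log_a < 0:                       # alpha = min(1, exp(log_a))
            total += 1.0 - math.exp(log_a)
    return total / n

x0, n = 0.5, 200000
rej_big = rejection_prob(x0, 0.2, n)
rej_small = rejection_prob(x0, 0.05, n)
```

At these moderate step sizes the observed ratio lies somewhat below the asymptotic value 8, but well above the factor 4 that a first-order O(h) scaling would give.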