Wasserstein and total variation distance between marginals of Lévy processes

We present upper bounds for the Wasserstein distance of order $p$ between the marginals of Lévy processes, including Gaussian approximations for jumps of infinite activity. Using the convolution structure, we further derive upper bounds for the total variation distance between the marginals of Lévy processes. Connections to other metrics like Zolotarev and Toscani-Fourier distances are established. The theory is illustrated by concrete examples and an application to statistical lower bounds.


Introduction
Lévy processes form the prototype of continuous-time processes with a continuous diffusion part and a jump part. In applications, there is considerable interest in disentangling these parts based on discrete observations. While Aït-Sahalia and Jacod [1], among many others, propose an asymptotically (as the observation distances become smaller) consistent test on the presence of jumps for general semimartingale models, Neumann and Reiß [18] argue that, already inside the class of α-stable processes with α ∈ (0, 2], no uniformly consistent test exists. The subtle, but important, difference is the uniformity over the class of processes. Mathematically, the difference is that on the Skorokhod path space D([0, T]) the α-stable processes, α ∈ (0, 2), induce laws singular to that of Brownian motion (α = 2), while their respective marginals at t_k = kT/n, k = 0, …, n, for n fixed, have equivalent laws, which even converge in total variation distance as α → 2 to those of Brownian motion. It is our aim here to shed some light on the geometry of the marginal laws of one-dimensional Lévy processes and to quantify the distance of the marginal laws non-asymptotically as a function of the respective Lévy characteristics (b, σ², ν). The marginals form, of course, infinitely divisible distributions, but we prefer here the process point of view, which is sometimes more intuitive.
Let us recall the fundamental result by Gnedenko and Kolmogorov [12]: the marginals of a sequence of Lévy processes converge weakly to those of a limiting Lévy process if and only if the corresponding characteristic triplets converge in a suitable sense.
As a particular example, consider the compound Poisson process with Lévy measure (δ_{−ε} + δ_ε)/(2ε²), which has jumps of size ε and −ε, each at intensity 1/(2ε²). Then, as ε ↓ 0, the marginals converge to those of a standard Brownian motion, which can also be derived from Donsker's theorem. Below, we shall be able to quantify this rate of convergence for general Lévy processes in terms of the (stronger) p-Wasserstein distances W_p. The derived Gaussian approximation of the small jump part relies on the fine analysis by Rio [20] of the approximation error in Wasserstein distance for the central limit theorem. This is the subject of Theorem 9, of which the following is a simplified statement:

Result. Let X^S(ε) be a Lévy process with characteristics (0, 0, ν_ε), where ν_ε is a Lévy measure with support in [−ε, ε]. Introducing σ̄²(ε) = ∫_{−ε}^{ε} x² ν_ε(dx), there exists a constant C depending only on p such that

W_p(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ C min(√t σ̄(ε), ε) ≤ Cε.
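For a concrete feeling of the rates involved, the Gaussian approximation in the above Result can be checked by simulation. The following sketch is illustrative only: the parameter choices and the use of scipy's sample-based W₁ estimator are ours, and the empirical distance carries sampling noise of order n^{−1/2}. It simulates the marginal of the compound Poisson process with Lévy measure (δ_{−ε} + δ_ε)/(2ε²), for which σ̄²(ε) = 1, and compares it with N(0, t).

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
t, n_samples = 1.0, 200_000

for eps in [0.5, 0.1, 0.02]:
    lam = 1.0 / eps**2                    # total jump intensity nu_eps(R) = 2 * 1/(2 eps^2)
    n_jumps = rng.poisson(lam * t, size=n_samples)
    pos = rng.binomial(n_jumps, 0.5)      # each jump is +eps or -eps with probability 1/2
    x_t = eps * (2 * pos - n_jumps)       # marginal X^S_t(eps); sigma_bar^2(eps) = 1
    gauss = rng.normal(0.0, np.sqrt(t), size=n_samples)
    w1 = wasserstein_distance(x_t, gauss)
    print(f"eps={eps:5.2f}  empirical W_1 = {w1:.4f}  (bound: C*eps)")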
A Gaussian approximation of the small jumps of Lévy processes has already been employed, for example when simulating trajectories of Lévy processes with infinite Lévy measure (see e.g. [4]).
Sometimes we can even obtain bounds on the total variation distance, which is particularly meaningful for statistical purposes, especially testing. The bound currently available in the literature is due to Liese [16].

Theorem 2 ([16, Cor. 2.7]). For Lévy processes X¹ and X² with characteristics (b₁, σ₁², ν₁) and (b₂, σ₂², ν₂), respectively, introduce the squared Hellinger distance of the Lévy measures (put ν₀ = ν₁ + ν₂):

H²(ν₁, ν₂) := ∫ (√(dν₁/dν₀) − √(dν₂/dν₀))² dν₀.

Then the total variation distance between the laws of X¹_t and X²_t is bounded in terms of tH²(ν₁, ν₂) and the distance between the Gaussian characteristics.

Note that the bound is very loose or even trivial in the case ν₂ = 0 and λ₁ = ν₁(ℝ) > 1/t because then tH²(ν₁, ν₂) = tλ₁ > 1. So this bound does not allow one to deduce a total variation approximation of Brownian motion by jump processes of infinite jump activity like α-stable processes with α ↑ 2. In fact, for pure jump Lévy processes these bounds are analogous to the bounds by Mémin and Shiryayev [17] in the path space D([0, T]), where pure jump processes and Brownian motion have singular laws (for other results on distances on D([0, T]) see e.g. [6, 9, 14, 15]). Our main idea is to use the convolution structure of the laws to transfer bounds from Wasserstein to total variation distance. This strategy is implemented for Lévy processes with a non-zero Gaussian component (but without any restriction on the Lévy measures, which can be infinite and even of infinite variation) and yields Theorem 14:

Result. For Lévy processes X¹ and X² with characteristics (b_j, σ_j², ν_j) and σ_j > 0, j = 1, 2, we obtain, for all t > 0 and ε ∈ [0, 1], an explicit bound on the total variation distance between the laws of X¹_t and X²_t in terms of these quantities, where ν_j^ε = ν_j(· \ (−ε, ε)) and λ_j(ε) = ν_j^ε(ℝ).

The results proven in this paper provide further insight into the geometry of the space of discretely observed Lévy processes. At the same time, their non-asymptotic character finds fruitful applications in nonparametric statistics when proving general lower bounds in a minimax sense. The technology is shown at work in Section 5.2, making the original proof by Jacod and Reiß [13] for volatility estimation under high activity jumps simpler and much more transparent.
The results are stated in dimension one. After the first version of this paper was completed, however, new results on non-asymptotic multidimensional central limit theorems in Wasserstein distances have appeared (see e.g. [2]). Since a (special form of a) central limit theorem was the main technical tool in our proof of Theorem 9, a multidimensional extension of our findings is a promising direction for future research. Another potentially fruitful line of research would be to go beyond the independence structure of the increments and consider the general framework of semimartingales. Lévy processes are the basic building blocks for these more general processes and it is common to use this easier setting as a first step towards a more general proof; however, the techniques used in this paper rely heavily on the independence structure and do not directly extend to this more general framework.
The paper is organized as follows. In Section 2 we review basic properties of the Wasserstein distances and discuss their relationship with the Zolotarev and Toscani-Fourier distances. Then we recall the main non-asymptotic bounds for the Wasserstein distances in the CLT and introduce Lévy processes. Section 3 derives bounds between marginals of Lévy processes in Wasserstein distance. The main focus is on the small jump part, which is treated in Theorem 9 and for which the tightness of the bounds is discussed in detail, first for concrete examples and then more generally using a lower bound via the Toscani-Fourier distance. Main results are presented in Section 3.3. Section 4 introduces properties of the total variation distance and then shows how bounds in Wasserstein or Toscani-Fourier distance transfer under convolution to total variation bounds, see e.g. Proposition 4 and Proposition 7. For Gaussian convolutions the different bounds are first compared and then applied to the marginals of Lévy processes. Section 5 is devoted to the application of the total variation bounds for proving the minimax-optimality of integrated volatility estimators in the presence of jumps proposed in [13].

The Wasserstein distances
Let (X, d) be a Polish metric space. Given p ∈ [1, ∞), let P_p(X) denote the space of all Borel probability measures µ on X such that the moment bound

∫_X d(x₀, x)^p µ(dx) < ∞

holds for some (and hence all) x₀ ∈ X.

Definition 1. Given p ≥ 1, for any two probability measures µ, ν ∈ P_p(X), the Wasserstein distance of order p between µ and ν is defined by

W_p(µ, ν) := inf E[d(X′, Y′)^p]^{1/p}, (1)

where the infimum is taken over all jointly distributed random variables X′ and Y′ having laws µ and ν, respectively. We abbreviate W_p(X, Y) = W_p(L(X), L(Y)) for random variables X, Y with laws L(X), L(Y) ∈ P_p(X).
The following lemma collects some properties of the Wasserstein distances that we will use throughout the paper. For a proof, the reader is referred to [25], Chapter 6.

Lemma 1. The Wasserstein distances have the following properties:
(1) For all p ≥ 1, W_p(·, ·) is a metric on P_p(X).
(2) For 1 ≤ p ≤ q, W_p(µ, ν) ≤ W_q(µ, ν).
(3) Given a sequence (µ_n)_{n≥1} and a probability measure µ in P_p(X), W_p(µ_n, µ) → 0 holds if and only if (µ_n)_{n≥1} converges to µ weakly and ∫ d(x₀, x)^p µ_n(dx) → ∫ d(x₀, x)^p µ(dx) for some (and hence all) x₀ ∈ X.
(4) The infimum in (1) is actually a minimum; i.e., there exists a pair (X*, Y*) of jointly distributed X-valued random variables with L(X*) = µ and L(Y*) = ν such that W_p(µ, ν) = E[d(X*, Y*)^p]^{1/p}.

Following the terminology used in [26], the Wasserstein distances are ideal metrics, since they possess the following two properties.
Lemma 2. Let X be a separable Banach space. For any three X-valued random variables X, Y, Z, with Z independent of X and Y, the inequality

W_p(X + Z, Y + Z) ≤ W_p(X, Y)

holds. Furthermore, for any real constant c, we have

W_p(cX, cY) = |c| W_p(X, Y). (2)

Proof. Lemma 1 guarantees the existence of two random variables X*, Y*, independent of Z, such that W_p(X, Y) = E[‖X* − Y*‖^p]^{1/p}. We have

W_p(X + Z, Y + Z) ≤ E[‖(X* + Z) − (Y* + Z)‖^p]^{1/p} = E[‖X* − Y*‖^p]^{1/p} = W_p(X, Y).

The equality (2) follows by homogeneity of the expectation.
An immediate corollary of Lemma 2 is the subadditivity of the metric W_p under independence (or, equivalently, under convolution of laws).

Corollary 1. If X₁, …, X_n are independent random variables, as well as Y₁, …, Y_n, then

W_p(∑_{i=1}^n X_i, ∑_{i=1}^n Y_i) ≤ ∑_{i=1}^n W_p(X_i, Y_i).

Proof. By induction, it suffices to prove the case n = 2. Let X̃₂ be a random variable equal in law to X₂ and independent of Y₁ and of X₁. By means of Lemma 2 we have

W_p(X₁ + X̃₂, Y₁ + X̃₂) ≤ W_p(X₁, Y₁) and W_p(Y₁ + X̃₂, Y₁ + Y₂) ≤ W_p(X₂, Y₂).

Hence, by the triangle inequality, W_p(X₁ + X₂, Y₁ + Y₂) ≤ W_p(X₁, Y₁) + W_p(X₂, Y₂).

A useful property of the Wasserstein distances is their good behaviour with respect to products of measures.
Lemma 3. Let µ₁, …, µ_n and ν₁, …, ν_n be probability measures in P_p(ℝ) and equip ℝⁿ with the metric d_p(x, y) = (∑_{i=1}^n |x_i − y_i|^p)^{1/p}. Then

W_p(µ₁ ⊗ ⋯ ⊗ µ_n, ν₁ ⊗ ⋯ ⊗ ν_n)^p ≤ ∑_{i=1}^n W_p(µ_i, ν_i)^p.

Proof. By Lemma 1 there exist optimal pairs (X_i*, Y_i*) with W_p(µ_i, ν_i) = E[|X_i* − Y_i*|^p]^{1/p}; choosing these pairs independently across coordinates yields a coupling of the two product measures whose cost is exactly ∑_{i=1}^n W_p(µ_i, ν_i)^p. In the case where µ₁ = ⋯ = µ_n and ν₁ = ⋯ = ν_n, one may take the optimal pairs (X_i*, Y_i*) to be identically distributed. The conclusion readily follows.

The distance W₁ is commonly called the Kantorovich-Rubinstein distance and it can be characterized in many different ways. Some useful properties of the distance W₁ are the following.

Proposition 1 (See [10]). Let X and Y be integrable real random variables. Denote by µ and ν their laws and by F and G their cumulative distribution functions, respectively. Then the following characterizations of the Wasserstein distance of order 1 hold:

W₁(µ, ν) = ∫_ℝ |F(x) − G(x)| dx = sup_ψ { ∫_ℝ ψ dµ − ∫_ℝ ψ dν },

the supremum being taken over all ψ satisfying the Lipschitz condition |ψ(x) − ψ(y)| ≤ |x − y| for all x, y ∈ ℝ. The second identity is generally called the Kantorovich-Rubinstein formula.
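Both characterizations are convenient numerically. The following minimal sketch (our illustration, using exponential laws whose CDFs are explicit and are stochastically ordered, so W₁ equals the difference of the means) computes W₁ once through the CDF formula and once from samples.

import numpy as np
from scipy.stats import expon, wasserstein_distance

# W_1 via the CDF characterization: integral of |F(x) - G(x)| dx
x = np.linspace(0.0, 60.0, 600_001)
F = expon(scale=1.0).cdf(x)              # Exp(1), mean 1
G = expon(scale=2.0).cdf(x)              # Exp(1/2), mean 2
w1_cdf = np.trapz(np.abs(F - G), x)      # equals |1 - 2| = 1 here

rng = np.random.default_rng(1)
w1_emp = wasserstein_distance(rng.exponential(1.0, 200_000),
                              rng.exponential(2.0, 200_000))
print(w1_cdf, w1_emp)                    # both close to 1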

Wasserstein, Zolotarev and Toscani-Fourier distances
Let µ, ν be two probability measures on ℝ endowed with the distance d(x, y) = |x − y|, x, y ∈ ℝ. Writing p > 0 as p = m + α with m ∈ ℕ₀ and 0 < α ≤ 1, denote by F_p the Hölder class of real-valued bounded functions f on ℝ which are m-times differentiable with

|f^{(m)}(x) − f^{(m)}(y)| ≤ |x − y|^α for all x, y ∈ ℝ.

Definition 2. The Zolotarev distance Z_p between µ and ν is defined by

Z_p(µ, ν) := sup_{f ∈ F_p} | ∫ f dµ − ∫ f dν |.

Remark 1. It is easy to see that the functional Z_p is a metric. For p = 0 the metric Z_p is defined by the relation Z₀ = lim_{p→0} Z_p and F₀ is the set of Borel functions satisfying the condition |f(x) − f(y)| ≤ 𝕀_{x≠y}. Thanks to the characterisation of the total variation given in Property 6 below, it follows that Z₀(µ, ν) = ‖µ − ν‖_TV. Also, by means of the Kantorovich-Rubinstein formula, recalled in Proposition 1, we have Z₁(µ, ν) = W₁(µ, ν).
The following result shows that the Wasserstein distance of order p is bounded by the p-th root of the Zolotarev distance Z p . This fact, together with Theorem 4 below, will be a useful tool to control the Wasserstein distances between the increments of compound Poisson processes.
Theorem 3 (See [20], Theorem 3.1). For any p ≥ 1 there exists a positive constant c_p such that, for any pair (µ, ν) of laws on the real line with finite absolute moments of order p,

W_p(µ, ν) ≤ (c_p Z_p(µ, ν))^{1/p}.

Theorem 4. Let (X_i)_{i≥1} and (Y_i)_{i≥1} be sequences of i.i.d. random variables and let N be an integer-valued random variable independent of the random variables from both sequences. Then

Z_p(∑_{i=1}^N X_i, ∑_{i=1}^N Y_i) ≤ E[N] Z_p(X₁, Y₁).

Theorem 5 (See [26], Theorem 1.4.2). Let X and Y be integrable real random variables with laws µ and ν, respectively. Then the following characterization of the Zolotarev distance holds for any p ≥ 1:

Z_p(µ, ν) = (1/Γ(p)) ∫_ℝ | E[(t − X)_+^{p−1}] − E[(t − Y)_+^{p−1}] | dt,

where Γ denotes the Gamma function.
Let P₁ and P₂ be two probability measures on the real line. We will denote by ϕ₁ (resp. ϕ₂) the characteristic function of P₁ (resp. P₂), i.e. ϕ_j(u) = ∫_ℝ e^{iux} P_j(dx), j = 1, 2.
Also, denote by L_b(ℝ) (resp. L_b(ℂ)) the class of real-valued (resp. complex-valued) bounded functions on ℝ with Lipschitz norm bounded by 1.
Definition 3. For s > 0, the Toscani-Fourier distance of order s between P₁ and P₂, denoted by T_s, is defined as

T_s(P₁, P₂) := sup_{u ∈ ℝ \ {0}} |ϕ₁(u) − ϕ₂(u)| / |u|^s.

The distance introduced in Definition 3 first appeared in [8], under the name "Fourier-based metrics", to study the trend to equilibrium for solutions of the space-homogeneous Boltzmann equation for Maxwellian molecules. After that, it has been used in several other works, especially in connection with kinetic theory; see [3] for an overview. In [25], T₂ is called the "Toscani distance".
Proposition 2. Let µ and ν be probability measures in P₁(ℝ). Then T₁(µ, ν) ≤ W₁(µ, ν).

Proof. Thanks to Lemma 1 and Proposition 1, the Kantorovich-Rubinstein formula extends to the complex-valued test functions in L_b(ℂ). For all u ∈ ℝ \ {0}, let us consider the function Ψ_u(x) = e^{iux}/u and observe that the Lipschitz norm of Ψ_u is 1. It immediately follows that

|ϕ₁(u) − ϕ₂(u)| / |u| = | ∫ Ψ_u dµ − ∫ Ψ_u dν | ≤ W₁(µ, ν),

and taking the supremum over u ≠ 0 concludes the proof.
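The inequality T₁ ≤ W₁ is easy to test numerically. In the following sketch (ours, with Gaussian laws so that the characteristic functions are explicit) the supremum is approximated on a finite grid.

import numpy as np

# T_1(mu, nu) = sup_{u != 0} |phi_mu(u) - phi_nu(u)| / |u| on a grid,
# for mu = N(0,1) and nu = N(0.3,1), for which W_1 = 0.3 exactly.
u = np.linspace(-20.0, 20.0, 4001)
u = u[u != 0.0]
phi_mu = np.exp(-u**2 / 2)
phi_nu = np.exp(1j * 0.3 * u - u**2 / 2)
T1 = np.max(np.abs(phi_mu - phi_nu) / np.abs(u))
print(T1)   # about 0.3, consistent with T_1 <= W_1 = 0.3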

Wasserstein distances in the central limit theorem
The class of Wasserstein metrics proves to be very useful in estimating the convergence rate in the central limit theorem. We recall some results. Let (Y_i)_{i≥1} be a sequence of centred i.i.d. random variables with finite and positive variance σ², and denote by µ_n the law of n^{−1/2} ∑_{i=1}^n Y_i. For centred random variables with finite absolute third moment, Esseen [5] proved the following result.

Theorem 6 (See e.g. [19], Theorem 16). For any n ≥ 1,

W₁(µ_n, N(0, σ²)) ≤ E[|Y₁|³] / (2σ²√n).

The constant 1/2 in this inequality cannot be improved. A bound for the Wasserstein distances of order r ∈ (1, 2] is due to Rio [20]:

Theorem 7 (See [20], Theorem 4.1). For any n ≥ 1 and any r ∈ (1, 2], there exists some positive constant C depending only on r such that

W_r(µ_n, N(0, σ²)) ≤ C E[|Y₁|^{r+2}]^{1/r} / (σ^{2/r} √n).

For r > 2 and i.i.d. random variables with a finite absolute moment of order r, we have the following:

Theorem 8 (See [22]). For any n ≥ 1 and r > 2, there exists some positive constant C, depending only on r, such that

W_r(µ_n, N(0, σ²)) ≤ C E[|Y₁|^r]^{1/r} n^{1/r − 1/2}.

If one only assumes a finite absolute moment of order r, this rate cannot be improved. In particular, under this assumption, the classical rate of convergence n^{−1/2} cannot be recovered for r > 2. For that reason, from now on we will only focus on the case r ∈ [1, 2].

Lévy processes
Let X be a Lévy process with characteristic triplet (b, σ², ν) and denote by P_{(bt, σ²t, νt)} the infinitely divisible law of its marginal X_t, with characteristics (bt, σ²t, νt). X can be characterised via the Lévy-Itô decomposition (see [23]), that is, via a canonical representation with independent components: for all ε ∈ (0, 1],

X_t = bt + σW_t + X^S_t(ε) + X^B_t(ε),

where W is a standard Brownian motion, ∆X_s := X_s − lim_{r↑s} X_r is the jump at time s of X, X^S(ε) is a pure jump martingale containing only the compensated small jumps (those of absolute value at most ε) and X^B(ε) is a finite variation part containing the jumps larger in absolute value than ε.
For a given Lévy process X with Lévy measure ν we define an auxiliary function σ̄²: ℝ₊ → ℝ₊ capturing the variance induced by the small jumps:

σ̄²(ε) := ∫_{−ε}^{ε} x² ν(dx).
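To make the decomposition concrete, the following simulation sketch (ours; the truncated Lévy density ν(dx) = |x|^{−1−α} 𝕀_{0<|x|≤1} dx, the parameter values and the Gaussian proxy for X^S(ε) are illustrative assumptions) samples the marginal X_t component by component.

import numpy as np

rng = np.random.default_rng(5)
b, sigma, alpha, eps, t, n = 0.1, 0.3, 1.2, 0.01, 1.0, 20_000

lam = 2 * (eps**-alpha - 1) / alpha              # nu({eps < |x| <= 1})
sigma_bar2 = 2 * eps**(2 - alpha) / (2 - alpha)  # int_{|x|<=eps} x^2 nu(dx)

gauss = sigma * np.sqrt(t) * rng.normal(size=n)          # sigma * W_t
small = np.sqrt(t * sigma_bar2) * rng.normal(size=n)     # Gaussian proxy for X^S_t(eps)
# big jumps X^B_t(eps): compound Poisson; |jump| sampled by inverting nu on (eps, 1];
# signs are symmetric, so this part needs no compensation
k = rng.poisson(lam * t, size=n)
big = np.zeros(n)
for i in np.nonzero(k)[0]:
    u = rng.random(k[i])
    mag = (eps**-alpha - u * (eps**-alpha - 1.0)) ** (-1.0 / alpha)
    big[i] = np.sum(rng.choice([-1.0, 1.0], k[i]) * mag)

x_t = b * t + gauss + small + big
print(x_t.mean(), x_t.var())   # mean ~ b*t; variance ~ sigma^2*t + t*sigma_bar2 + t*int_{eps<|x|<=1} x^2 nu(dx)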

Wasserstein distances for Lévy processes
Let X^j, j = 1, 2, be two Lévy processes with characteristics (b_j, σ_j², ν_j). As we will see later, thanks to Corollary 1 and the Lévy-Itô decomposition, in order to control W_p(X¹_t, X²_t) it is enough to separately control the Wasserstein distances between two Gaussian random variables as well as W_p(L(X^{j,S}_t(ε)), N(0, tσ̄_j²(ε))) and W_p(X^{1,B}_t(ε), X^{2,B}_t(ε)). A bound for the Wasserstein distances between Gaussian distributions is given by the comonotone coupling: for Z ∼ N(0, 1),

W_p(N(m₁, s₁²), N(m₂, s₂²)) ≤ |m₁ − m₂| + |s₁ − s₂| E[|Z|^p]^{1/p}.

Upper bounds for W_p(L(X^{j,S}_t(ε)), N(0, tσ̄_j²(ε))) and W_p(X^{1,B}_t(ε), X^{2,B}_t(ε)) will be the subject of Sections 3.1 and 3.2, respectively.
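In dimension one this coupling is in fact optimal for p = 2, where W₂(N(m₁, s₁²), N(m₂, s₂²))² = (m₁ − m₂)² + (s₁ − s₂)² holds exactly; a quick Monte Carlo sketch (ours) confirms it.

import numpy as np

rng = np.random.default_rng(2)
m1, s1, m2, s2 = 0.0, 1.0, 0.7, 1.5
z = rng.normal(size=1_000_000)
x, y = m1 + s1 * z, m2 + s2 * z                 # comonotone coupling
w2_coupling = np.sqrt(np.mean((x - y) ** 2))    # cost of this coupling
w2_exact = np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)
print(w2_coupling, w2_exact)                    # agree up to Monte Carlo error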

Distances between marginals of small jump Lévy processes
Let X be a Lévy process with Lévy measure ν and denote by X^S(ε) the Lévy process associated with the small jumps of X, following the notation introduced in Section 2.4. The main result of this section is the following.

Theorem 9. Let X^S(ε) be a Lévy process with characteristics (0, 0, ν_ε), where ν_ε is a Lévy measure supported in [−ε, ε], and let p ∈ [1, 2]. Then there exists a constant C, depending only on p, such that for all t > 0

W_p(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ C min(√t σ̄(ε), ε). (4)

Remark 2. The inequality

W₂(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ 2√t σ̄(ε)

is clear from the definition of W₂, noting that tσ̄²(ε) is the second moment of both arguments. The interest of Theorem 9 lies in the bound W_p(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ Cε, which after renormalisation yields

W_p(L(X^S_t(ε)/(√t σ̄(ε))), N(0, 1)) ≤ Cε/(√t σ̄(ε)).

Thus, not surprisingly in view of the central limit theorem, the Gaussian approximation improves as t grows. Also, whenever σ̄²(ε) is much larger than ε², a Gaussian approximation is valid, e.g. for α-stable processes with α > 0 and ε small; see Example 1 below.

Remark 3. The upper bound (4) gives in general the right order. Indeed, let us consider for ε > 0 the Lévy measure ν_ε = (δ_{−ε} + δ_ε)/(2ε²), for which σ̄²(ε) = 1, and let us develop the case p = 1. Applying the scheme of proof proposed in [20] (see the proof of Theorem 5.1 there), we show that there exists a constant K > 0 such that

W₁(L(X^S_t(ε)), N(0, t)) ≥ K min(√t, ε).

To see that, we consider the cases t ≤ ε² and t > ε² separately.
• t ≤ ε²: From the definition of the Wasserstein distance of order 1 it follows that the distance is bounded from below by a constant multiple of √t.
• t > ε²: Again, by the definition of the Wasserstein distance of order 1, we find a lower bound involving the random variable (√t/ε)N with N ∼ N(0, 1). Since in this case (√t/ε)N has variance at least one, there exists a constant K such that the lower bound Kε holds.
For p ∈ (1, 2], W_p is even larger than W₁, so the same lower bound applies. For the case p = 2 see also [7].

Example 1. Let us illustrate Theorem 9 for the class of α-stable-like Lévy processes with Lévy density proportional to 1/|x|^{1+α}, α ∈ [0, 2). For all ε ∈ (0, 1], let us denote by X^S(ε) the Lévy process describing the small jumps and by ν𝕀_{[−ε,ε]} its Lévy measure, i.e.

ν_ε(dx) = (C_α / |x|^{1+α}) 𝕀_{[−ε,ε]}(x) dx
for some constant C_α. In particular, we have

σ̄²(ε) = ∫_{−ε}^{ε} x² (C_α/|x|^{1+α}) dx = (2C_α/(2 − α)) ε^{2−α}.

Therefore an application of Theorem 9 guarantees the existence of a constant C, possibly depending on p and α, such that

W_p(L(X^S_t(ε)/(√t σ̄(ε))), N(0, 1)) ≤ C ε^{α/2}/√t. (6)

Equation (6) validates the intuition that a Gaussian approximation of the small jumps is the better the more active the small jumps are. Indeed, the approximation in (6) is better when α is larger.
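The dependence on α is elementary to tabulate; the following lines (ours, with C_α = 1 for simplicity) evaluate σ̄(ε) and the relative error in (6).

import numpy as np

t, eps = 1.0, 1e-3
for alpha in [0.5, 1.0, 1.5, 1.9]:
    sigma_bar = np.sqrt(2.0 / (2.0 - alpha)) * eps ** (1.0 - alpha / 2.0)
    rel_err = eps / (np.sqrt(t) * sigma_bar)    # ~ eps^(alpha/2)/sqrt(t), smaller for larger alpha
    print(f"alpha={alpha:3.1f}  sigma_bar(eps)={sigma_bar:.3e}  relative error ~ {rel_err:.3e}")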
Let us now prove Theorem 9. For that we need to recall the following moment identity.

Lemma 5 (See [21], Lemma 6). Let X be a Lévy process with characteristics (0, 0, ν) and ∫ x² ν(dx) < ∞. Then, whenever both sides are well defined,

E[X_t f(X_t)] = t ∫_ℝ E[f(X_t + x) − f(X_t)] x ν(dx).
Proof of Theorem 9. Let us introduce n random variables defined by Y_j := √n (X^S_{tj/n}(ε) − X^S_{t(j−1)/n}(ε)), j = 1, …, n. The Y_j's are i.i.d. centred random variables with variance equal to tσ̄²(ε) and such that X^S_t(ε) = n^{−1/2} ∑_{j=1}^n Y_j. An application of Theorems 6 and 7 (using the fact that Y_j has the same law as √n X^S_{t/n}(ε) and the homogeneity property of the Wasserstein distances stated in Lemma 2) bounds W_p(L(X^S_t(ε)), N(0, tσ̄²(ε))) in terms of E[|Y₁|^{p+2}]. Let us now argue that

lim sup_{n→∞} n^{−p/2} E[|Y₁|^{p+2}] ≤ t σ̄²(ε) ε^p.

Indeed, applying Lemma 5 to truncations of the function f(x) = x³ at level R > ε and taking the limit as R → ∞, we obtain

E[(X^S_t(ε))⁴] = t ∫_{|x|<ε} x⁴ ν(dx) + 3t² σ̄⁴(ε) ≤ t ε² σ̄²(ε) + 3t² σ̄⁴(ε).

Together with Hölder's inequality, E[|Y₁|^{p+2}] ≤ E[Y₁²]^{(2−p)/2} E[Y₁⁴]^{p/2}, this yields

E[|Y₁|^{p+2}] ≤ (tσ̄²(ε))^{(2−p)/2} (n t ε² σ̄²(ε) + 3t² σ̄⁴(ε))^{p/2};

dividing by n^{p/2} and letting n → ∞, we conclude. It follows that, for p = 1, Theorem 6 yields W₁(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ Cε. Moreover, by definition of the Wasserstein distance of order 1 and denoting by N a centred Gaussian random variable with variance tσ̄²(ε), we have

W₁(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ E[|X^S_t(ε)|] + E[|N|] ≤ 2√t σ̄(ε).

Similarly, by means of Theorem 7, for p ∈ (1, 2],

W_p(L(X^S_t(ε)), N(0, tσ̄²(ε))) ≤ lim sup_{n→∞} C E[|Y₁|^{p+2}]^{1/p} / ((tσ̄²(ε))^{1/p} √n) ≤ Cε.

The upper bound (4) follows by taking the minimum of the two estimates.

Theorem 9 can be used to bound the Wasserstein distances between the increments of the small jumps of two Lévy processes.
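The fourth-moment identity used above is easy to verify by simulation for the two-point measure ν_ε = (δ_{−ε} + δ_ε)/(2ε²), for which both integrals are explicit; the following direct Monte Carlo check is ours.

import numpy as np

# E[(X^S_t(eps))^4] = t * int x^4 nu(dx) + 3 t^2 sigma_bar^4(eps)
rng = np.random.default_rng(6)
t, eps, n = 2.0, 0.2, 4_000_000
lam = 1.0 / eps**2                     # nu_eps(R)
k = rng.poisson(lam * t, size=n)
x = eps * (2 * rng.binomial(k, 0.5) - k)
m4_mc = np.mean(x**4)
m4_th = t * eps**2 + 3 * t**2          # int x^4 nu = eps^2, sigma_bar^2(eps) = 1
print(m4_mc, m4_th)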

Distances between random sums of random variables
Theorem 10. Let (X_i)_{i≥1} and (Y_i)_{i≥1} be sequences of i.i.d. real random variables with finite absolute moments of order p and let N, N′ be integer-valued random variables such that N is independent of (X_i)_{i≥1} and N′ is independent of (Y_i)_{i≥1}. Then

W_p(∑_{i=1}^N X_i, ∑_{i=1}^{N′} Y_i) ≤ min{ (c_p E[N] Z_p(X₁, Y₁))^{1/p}, E[N^p]^{1/p} W_p(X₁, Y₁) } + E[|Y₁|^p]^{1/p} W_p(N, N′),

with the constant c_p from Theorem 3.
Proof. By the triangle inequality,

W_p(∑_{i=1}^N X_i, ∑_{i=1}^{N′} Y_i) ≤ W_p(∑_{i=1}^N X_i, ∑_{i=1}^{N″} Y_i) + W_p(∑_{i=1}^{N″} Y_i, ∑_{i=1}^{N′} Y_i), (7)

where N″ is independent of (Y_i)_{i≥1} and with the same law as N. Thanks to Theorems 3 and 4, the first summand in (7) is bounded by (c_p E[N] Z_p(X₁, Y₁))^{1/p}. Alternatively, this summand can be estimated via Jensen's inequality joined with the fact that, conditionally on N = n, Corollary 1 gives W_p(∑_{i=1}^n X_i, ∑_{i=1}^n Y_i) ≤ n W_p(X₁, Y₁); therefore,

W_p(∑_{i=1}^N X_i, ∑_{i=1}^{N″} Y_i) ≤ E[N^p]^{1/p} W_p(X₁, Y₁).

To control the second summand, we proceed similarly: coupling N″ and N′ optimally, the extra |N″ − N′| summands contribute at most E[|Y₁|^p]^{1/p} W_p(N″, N′), which, by noting L(N″) = L(N), concludes the proof.
In the preceding theorem one term is bounded alternatively by the Zolotarev or the Wasserstein distance between X 1 and Y 1 . The difference is the factor in front which is either the first or the pth moment of N . If N is likely to be large, then better bounds can be obtained by profiting from the variance stabilisation for centred sums. Since the larger jumps are not our main issue, this is not pursued further.
In the Poisson case the moments and the Wasserstein distances can be easily analysed.

Proposition 3. Let N and N′ be two Poisson random variables of mean λ and λ′, respectively. Let us denote by m_{(p,ℓ)} the moment of order p of a Poisson random variable of mean ℓ, i.e. m_{(p,ℓ)} := E[P^p] for P ∼ Poisson(ℓ). Then the following upper bound holds for p ≥ 1:

W_p(N, N′) ≤ (m_{(p,|λ−λ′|)})^{1/p}.

In particular,

W₁(N, N′) ≤ |λ − λ′| (8)

and

W₂(N, N′) ≤ (|λ − λ′| + (λ − λ′)²)^{1/2}. (9)

Proof. Without loss of generality, let us suppose λ′ ≥ λ and let N″ be a Poisson random variable with mean λ′ − λ, independent of N. Thanks to Lemma 2 we have

W_p(N, N′) = W_p(N, N + N″) ≤ E[(N″)^p]^{1/p} = (m_{(p,λ′−λ)})^{1/p}.

To deduce (8) and (9) we use the fact that m_{(1,ℓ)} = ℓ and m_{(2,ℓ)} = ℓ + ℓ², together with Hölder's inequality.
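The coupling in the proof is essentially sharp: since W₁(N, N′) ≥ |E[N] − E[N′]|, the bound (8) is attained. A short empirical check (ours):

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
lam1, lam2 = 2.0, 3.5
n1 = rng.poisson(lam1, 500_000)
n2 = rng.poisson(lam2, 500_000)
print(wasserstein_distance(n1, n2))   # close to |lam1 - lam2| = 1.5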

First main result
We will use the notation introduced in Section 2.4. In accordance with that, for any given Lévy process X^j with characteristics (b_j, σ_j², ν_j), X^{j,B}(ε) will be a compound Poisson process with Lévy measure ν_j(dx)𝕀_{(ε,∞)}(|x|), i.e.

X^{j,B}_t(ε) = ∑_{i=1}^{N^j_t} Y_i^j,

where N^j is a Poisson process of intensity λ_j(ε) := ν_j(ℝ \ (−ε, ε)), independent of the sequence of i.i.d. random variables (Y_i^j)_{i≥1} with common law ν_j(dx)𝕀_{(ε,∞)}(|x|)/λ_j(ε). Recall from Proposition 3 that m_{(p,ℓ)} denotes the moment of order p of a Poisson random variable of mean ℓ.
Theorem 11. Let X^j, j = 1, 2, be Lévy processes with characteristics (b_j, σ_j², ν_j). Then, for every ε ∈ (0, 1], t > 0 and Z ∼ N(0, 1), combining Corollary 1, the Gaussian coupling bound, Theorem 9 and the decomposition of Section 2.4,

W_p(X¹_t, X²_t) ≤ |b₁ − b₂| t + (|σ₁ − σ₂| + |σ̄₁(ε) − σ̄₂(ε)|) √t E[|Z|^p]^{1/p} + C (min(√t σ̄₁(ε), ε) + min(√t σ̄₂(ε), ε)) + W_p(X^{1,B}_t(ε), X^{2,B}_t(ε)),

where C is the constant from Theorem 9.

We now address the problem of how to compute the Wasserstein distance between n given increments of two Lévy processes. To that end, fix a time span T > 0 and a sample size n ∈ ℕ, and consider the samples (X^1_{kT/n} − X^1_{(k−1)T/n})_{k=1}^n and (X^2_{kT/n} − X^2_{(k−1)T/n})_{k=1}^n. From Lemma 3 we know that we can measure the distance between these two random vectors in terms of the Wasserstein distance between the marginals. This observation combined with Theorem 11 allows us to obtain an upper bound for the Wasserstein distance of order p between the increments of these Lévy processes.
Corollary 3. Let X^j, j = 1, 2, be two Lévy processes with characteristics (b_j, σ_j², ν_j). Then, with respect to the ℓ_r-metric on ℝⁿ given by d_r(x, y) = (∑_{k=1}^n |x_k − y_k|^r)^{1/r},

W_p((X^1_{kT/n} − X^1_{(k−1)T/n})_{k=1}^n, (X^2_{kT/n} − X^2_{(k−1)T/n})_{k=1}^n) ≤ n^{1/r} [ |b₁ − b₂| T/n + (|σ₁ − σ₂| + |σ̄₁(ε) − σ̄₂(ε)|) √(T/n) E[|Z|^p]^{1/p} + C (min(√(T/n) σ̄₁(ε), ε) + min(√(T/n) σ̄₂(ε), ε)) + W_p(X^{1,B}_{T/n}(ε), X^{2,B}_{T/n}(ε)) ],

where C is a constant depending only on p. The term W_p(X^{1,B}_{T/n}(ε), X^{2,B}_{T/n}(ε)) can be bounded as in Theorem 10 with t = T/n.
In the Euclidean case r = 2 we see that in the bound for the Wasserstein distance the drift part disappears as n → ∞ (T fixed), while the Gaussian part remains invariant and the Gaussian approximation of the small jumps gives an error of order min(σ̄_j(ε), n^{1/2} ε). The bound on the larger jumps scales as n^{1/2}(T/n + (T/n)^{1/2}) (for p = 1 even as T/n^{1/2}), so that the entire bound on the Wasserstein distance remains bounded as n → ∞.

Lower bounds
Applying the general lower bound established in Proposition 2 to Lévy processes, we get the following result.

Corollary 4. Let W be a Brownian motion and let X^ε be a pure jump Lévy process with jumps of absolute value less than ε ∈ (0, 1]. This means that X^ε has Lévy triplet (0, 0, ν_ε), supp(ν_ε) ⊆ [−ε, ε] and characteristic function

E[e^{iuX^ε_t}] = exp( t ∫ (e^{iux} − 1 − iux) ν_ε(dx) ).

Then Proposition 2 yields

W₁(L(X^ε_t), L(σ̄(ε)W_t)) ≥ T₁(L(X^ε_t), L(σ̄(ε)W_t)) = sup_{u≠0} | exp(t ∫ (e^{iux} − 1 − iux) ν_ε(dx)) − exp(−tσ̄²(ε)u²/2) | / |u|.

By the bound given in Proposition 2 we usually do not lose in approximation order, as the following lower bound examples demonstrate.

We conclude that also in this case W₁(L(Y_t(ε)), N(0, t)) ≥ K min(√t, ε) holds for some positive constant K, independent of t and ε.

Notation and some useful properties
Let (X, F) be a measurable space and let µ and ν be two probability measures on (X, F).

Definition 4. The total variation distance between µ and ν is defined as

‖µ − ν‖_TV := 2 sup_{A ∈ F} |µ(A) − ν(A)|.

Lemma 6. The total variation distance has the following properties.
(1) ‖·‖_TV is a metric on the space of probability measures on (X, F), bounded by 2.
(2) ‖µ − ν‖_TV = 2 inf P(X ≠ Y), the infimum being taken over all jointly distributed pairs (X, Y) with marginal laws µ and ν.
(3) If µ and ν admit densities f_µ and f_ν with respect to a common σ-finite measure λ, then ‖µ − ν‖_TV = ∫ |f_µ − f_ν| dλ.
(4) Subadditivity under convolution: ‖µ₁ * µ₂ − ν₁ * ν₂‖_TV ≤ ‖µ₁ − ν₁‖_TV + ‖µ₂ − ν₂‖_TV.
(5) For real random variables and constants a ≠ 0, b, ‖L(aX + b) − L(aY + b)‖_TV = ‖L(X) − L(Y)‖_TV.
(6) ‖µ − ν‖_TV = sup_{‖f‖_∞ ≤ 1} | ∫ f dµ − ∫ f dν |.
Remark 4. Let X be a discrete set, equipped with the Hamming metric d(x, y) = 𝕀_{x≠y}. In this case, thanks to Property 2 above, for any probability measures µ and ν on X we have

W₁(µ, ν) = inf P(X ≠ Y) = ½ ‖µ − ν‖_TV.

The total variation distance does not always bound the Wasserstein distance, because the latter is also influenced by large distances. However, thanks to the following classical result, one can get some control on W_p given a bound on the total variation distance.

Theorem 12 (See [25], Theorem 6.13). Let µ and ν be two probability measures on a Polish space (X, d). Let p ∈ [1, ∞) and x₀ ∈ X. Then

W_p(µ, ν) ≤ 2^{1/p′} ( ∫_X d(x₀, x)^p |µ − ν|(dx) )^{1/p}, with 1/p + 1/p′ = 1.

In particular, if p = 1 and the diameter of X is bounded by D, then W₁(µ, ν) ≤ D ‖µ − ν‖_TV. In Proposition 4 we will show an inequality that can be thought of as an inverse of the one above. Namely, the total variation distance between two measures convolved with a common measure can be bounded by a multiple of the Wasserstein distance of order 1.

Wasserstein distance of order 1 and total variation distance
Recall that a real function g is of bounded variation if its total variation norm is finite, i.e.

‖g‖_BV := sup_{P ∈ 𝒫} ∑_{i=1}^{n_P} |g(x_i) − g(x_{i−1})| < ∞,

where the supremum is taken over the set 𝒫 = {P = (x₀, …, x_{n_P}) : x₀ < x₁ < ⋯ < x_{n_P}} of all finite ordered subsets of ℝ. We will denote by BV(ℝ) the space of functions of bounded variation. We now state a lemma that will be useful in the following.
Lemma 7. Let g be a right-continuous real function of bounded variation and let F ⊆ {φ: ℝ → ℝ measurable and bounded}. For φ ∈ F define h_φ(t) := ∫_ℝ φ(x) g(x − t) dx. Then every h_φ is Lipschitz continuous with

‖h_φ‖_Lip ≤ ‖φ‖_∞ ‖g‖_BV. (10)

Proof. The proof is an easy consequence of the following classical results on Lebesgue-Stieltjes measures:
1. For every right-continuous function g: ℝ → ℝ of bounded variation there exists a unique signed measure µ such that

µ((a, b]) = g(b) − g(a) for all a < b. (11)

2. Let φ ∈ L^∞(ℝ) and let g ∈ BV(ℝ) be a right-continuous function. Let µ be the finite signed measure associated to g as in (11). Then ∫ φ(t − y) µ(dy) is well defined, measurable in t ∈ ℝ and bounded in absolute value by ‖φ‖_∞ ‖g‖_BV.

More precisely, let µ be the finite signed measure associated to g. It is enough to prove that t ↦ ∫ φ(t − y) µ(dy) is the weak derivative of h_φ, since then, using Point 2 above, we deduce that ‖h_φ‖_Lip = ‖∫ φ(· − y) µ(dy)‖_∞ ≤ ‖φ‖_∞ ‖g‖_BV and hence (10). The claim above follows by Fubini's theorem: for all T > 0, integrating t ↦ ∫ φ(t − y) µ(dy) over [0, T] and interchanging the order of integration shows that ∫ φ(t − y) µ(dy) is the weak derivative of t ↦ ∫ φ(u)(g(t − u) − lim_{x→−∞} g(x)) du, as desired.
Proposition 4. Let µ and ν be two measures on (ℝ, B(ℝ)) and let G be a measure absolutely continuous with respect to the Lebesgue measure, admitting a density g of bounded variation. Then the total variation distance between the convolution measures µ * G and ν * G is bounded by

‖µ * G − ν * G‖_TV ≤ ‖g‖_BV W₁(µ, ν).

Proof. By Property 6 of Lemma 6 (and approximation),

‖µ * G − ν * G‖_TV = sup_φ | ∫ φ d(µ * G) − ∫ φ d(ν * G) |,

the supremum being taken over compactly supported functions φ with ‖φ‖_∞ ≤ 1. Denote by h_φ(t) = ∫_ℝ φ(x) g(x − t) dx. From the last equality it follows that

∫ φ d(µ * G) − ∫ φ d(ν * G) = ∫ h_φ dµ − ∫ h_φ dν;

hence, applying Lemma 7 to F = {φ: ℝ → ℝ : ‖φ‖_∞ ≤ 1 with compact support} and the Kantorovich-Rubinstein formula of Proposition 1, we deduce that

‖µ * G − ν * G‖_TV ≤ sup_{φ ∈ F} ‖h_φ‖_Lip W₁(µ, ν) ≤ ‖g‖_BV W₁(µ, ν).

The upper bound established in Proposition 4 is sharp. To see that, let us consider the following example.
Example 2. Let µ = δ₀, ν = δ_ε and G = N(0, 1) for some ε > 0. Denoting by ϕ the density of a random variable N ∼ N(0, 1) and by Φ its cumulative distribution function, we have

‖µ * G − ν * G‖_TV = ∫ |ϕ(x) − ϕ(x − ε)| dx = 2(Φ(ε/2) − Φ(−ε/2)) = √(2/π) ε + O(ε³).

At the same time it is easy to see that W₁(µ, ν) = ε and ‖g‖_BV = √(2/π). Therefore the upper bound established in Proposition 4 is exactly the correct estimate up to the first order.
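Both sides of the comparison are explicit here, so the first-order sharpness can be displayed directly (sketch ours).

import numpy as np
from scipy.stats import norm

# ||mu*G - nu*G||_TV = 2*(Phi(eps/2) - Phi(-eps/2)) versus the bound ||g||_BV * W_1 = sqrt(2/pi)*eps
for eps in [0.5, 0.1, 0.01]:
    tv_exact = 2 * (norm.cdf(eps / 2) - norm.cdf(-eps / 2))
    bound = np.sqrt(2 / np.pi) * eps
    print(f"eps={eps:5.2f}  TV = {tv_exact:.5f}  bound = {bound:.5f}")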

Total variation distance and Toscani-Fourier distances
For any Lebesgue density f introduce its Fourier transform F f(u) := ∫_ℝ e^{iux} f(x) dx.
A first elementary result linking the total variation distance between convolution measures to Toscani-Fourier metrics is the following.
Proposition 5. Let µ, ν and G be probability measures and suppose that their characteristic functions ϕ_µ, ϕ_ν, ϕ_G are differentiable. Assume that G has a Lebesgue density g with m-th weak derivative g^{(m)}. Then, for all k, j, r ∈ {1, …, m}, the total variation distance ‖µ * G − ν * G‖_TV is bounded, for some numerical constant C > 0, by a sum of Toscani-Fourier distances of orders k, j and r between µ and ν, multiplied by the norms ‖g^{(·)}‖₂ and ‖(xg(x))^{(·)}‖₂.
Proof. First of all, remark that if any one of the quantities ‖g^{(·)}‖₂, ‖(xg(x))^{(·)}‖₂ appearing above is infinite, then there is nothing to prove. Therefore, from now on, we will assume that they are all finite. Since G admits a density g with respect to Lebesgue measure, µ * G and ν * G have densities g * µ and g * ν.
Using the Cauchy-Schwarz inequality, we bound ‖g * µ − g * ν‖_{L¹} by C(‖g * µ − g * ν‖_{L²} + ‖x (g * µ − g * ν)(x)‖_{L²}) for some numerical constant C > 0. For all k > 0 an application of the Plancherel identity yields

‖g * µ − g * ν‖_{L²} = (2π)^{−1/2} ‖(ϕ_µ − ϕ_ν) F g‖_{L²} ≤ (2π)^{−1/2} T_k(µ, ν) ‖u^k F g(u)‖_{L²}.

Hence this term is controlled by T_k(µ, ν) ‖g^{(k)}‖₂. In the same way we also obtain the analogous estimate for ‖x (g * µ − g * ν)(x)‖_{L²}, with the norms ‖(xg(x))^{(·)}‖₂ in place of ‖g^{(·)}‖₂, and we conclude as before that for all r, j > 0 this second term is controlled by the Toscani-Fourier distances of orders r and j. It remains to apply the inverse Fourier transform.
Using a different set of hypotheses, one can also establish the following relation between the total variation distance and the Toscani-Fourier distance.
Proposition 6. Let µ, ν and G be real probability measures absolutely continuous with respect to the Lebesgue measure. Let f_µ, f_ν and g denote their densities, and let F_µ and F_ν denote the cumulative distribution functions of µ and ν. Suppose that F g ∈ L¹ and that F_µ − F_ν ∈ L¹. Further suppose that the graphs of f_µ * g and f_ν * g intersect in at most N points. Then

‖µ * G − ν * G‖_TV ≤ (N/π) ‖F g‖_{L¹} T₁(µ, ν).

Proof. As in the proof of Proposition 4, let us introduce the function h_φ̃ with φ̃ := sign(f_µ * g − f_ν * g). Using an integration by parts and the Plancherel identity, we get

‖µ * G − ν * G‖_TV = ∫ (f_µ * g − f_ν * g)(u) φ̃(u) du = ∫ (F_µ − F_ν)(x) h′_φ̃(x) dx = (1/2π) ∫ F(F_µ − F_ν)(u) conj(F h′_φ̃(u)) du. (12)

Also observe that F(F_µ − F_ν)(u) = (ϕ_µ(u) − ϕ_ν(u))/(−iu), so that |F(F_µ − F_ν)(u)| ≤ T₁(µ, ν). Let us denote by −∞ = x₀ < x₁ < ⋯ < x_N < x_{N+1} = +∞ the points of intersection between the graphs of f_µ * g and f_ν * g. In particular, φ̃(u) = ± ∑_{i=0}^N (−1)^i 𝕀_{[x_i, x_{i+1})}(u), with the sign depending on the sign of f_µ * g − f_ν * g on (−∞, x₁). Thus h′_φ̃ is a signed sum of at most 2N translates and reflections of g. In particular, we get that

|F h′_φ̃(u)| ≤ 2N |F g(u)|.

This fact, together with (12), concludes the proof.
Let us observe that another way to link the total variation distance between convolution measures to the Toscani-Fourier distance is offered by Theorem 2.21 in [3] joined with Proposition 4. More precisely, Theorem 2.21 in [3] states that, under appropriate hypotheses on µ and ν, W₁(X, Y) is bounded by a maximum of two powers of T₂(X, Y), where X ∼ µ and Y ∼ ν. Therefore, from Proposition 4, it follows that ‖µ * G − ν * G‖_TV ≤ ‖g‖_BV W₁(µ, ν) is in turn bounded by a power of T₂(µ, ν), where g denotes the density of G. Using some ideas from the proof of Theorem 2.21 in [3] we will be able to prove the following general result.
Proposition 7. Let µ, ν ∈ P_j(ℝ), j ≥ 1, and let G be a measure absolutely continuous with respect to the Lebesgue measure. Suppose that the density g of G is j-times weakly differentiable with j-th derivative g^{(j)} ∈ L². Then

‖µ * G − ν * G‖_TV ≤ C T_j(µ, ν)^{2j/(2j+1)},

where C depends only on j, on ‖g^{(j)}‖₂ and on the absolute moments of order j of µ, ν and G.

Proof. Denote by f the difference of the densities of µ * G and ν * G. For all R > 0 we split ∫ |f| into the regions {|x| ≤ R} and {|x| > R}. By the Cauchy-Schwarz inequality,

∫_{|x|≤R} |f(x)| dx ≤ √(2R) ‖f‖_{L²},

and, using the Plancherel identity and the properties of the Fourier transform (‖u^j F g(u)‖_{L²} = √(2π) ‖g^{(j)}‖₂), we deduce that

‖f‖_{L²} = (2π)^{−1/2} ‖(ϕ_µ − ϕ_ν) F g‖_{L²} ≤ T_j(µ, ν) ‖g^{(j)}‖₂.

The region {|x| > R} is controlled via the Markov inequality by means of the moments of order j. Taking R proportional to T_j(µ, ν)^{−2/(2j+1)}, it follows that ‖µ * G − ν * G‖_TV ≤ C T_j(µ, ν)^{2j/(2j+1)}.

Remark 5.
To better understand the upper bounds presented above, let us specialise to the case G = N(0, σ²). In order to compare the results presented in Propositions 5-7, one first computes the relevant norms of the Gaussian density g and of its derivatives, all of which are explicit negative powers of σ. We are now able to compare the previous results for independent random variables Z ∼ N(0, σ²), X ∼ µ, Y ∼ ν.

• Proposition 5 for T₁: with a numerical constant C > 0, independent of the laws of X, Y, Z, the total variation distance ‖L(X + Z) − L(Y + Z)‖_TV is bounded by a multiple of T₁(X, Y), with factors given by negative powers of σ.
• Proposition 6: if N is the number of intersections between the graphs of the densities of X + Z and Y + Z, the bound is of order N σ^{−1} T₁(X, Y).
• Proposition 7 for T₁: the bound is of order T₁(X, Y)^{2/3}, up to a constant depending on σ and on the first absolute moments.
• Proposition 7 for T₂: the bound is of order T₂(X, Y)^{4/5}, up to a constant depending on σ and on the second moments.
• Proposition 4 + Theorem 2.21 in [3]: the bound is of order a lower power of T₂(X, Y).

We see that Proposition 7 gives a much tighter bound than Proposition 4 + Theorem 2.21 in [3] when T₂(X, Y) is small.

Main total variation results
As was the case in Section 3, in order to obtain an upper bound for the total variation distance between the marginals X¹_t and X²_t of two Lévy processes, it is enough to separately control the total variation distance between Gaussian distributions, between the small jumps combined with the corresponding Gaussian component and the matching Gaussian law, and finally between the big jumps. The latter can be controlled by means of the following result.
Theorem 13. Let (X_i)_{i≥1} and (Y_i)_{i≥1} be sequences of i.i.d. random variables a.s. different from zero and let N, N′ be two Poisson random variables with N (resp. N′) independent of (X_i)_{i≥1} (resp. (Y_i)_{i≥1}). Denote by λ (resp. λ′) the mean of N (resp. N′). Then

‖L(∑_{i=1}^N X_i) − L(∑_{i=1}^{N′} Y_i)‖_TV ≤ 2(1 − e^{−|λ−λ′|}) + min(λ, λ′) ‖L(X₁) − L(Y₁)‖_TV.

Proof. Without loss of generality, let us suppose that λ ≥ λ′ and write λ = α + λ′, α ≥ 0. By the triangle inequality,

‖L(∑_{i=1}^N X_i) − L(∑_{i=1}^{N′} Y_i)‖_TV ≤ ‖L(∑_{i=1}^N X_i) − L(∑_{i=1}^{N″} X_i)‖_TV + ‖L(∑_{i=1}^{N″} X_i) − L(∑_{i=1}^{N′} Y_i)‖_TV, (13)

where N″ is a random variable independent of (X_i)_{i≥1} and with the same law as N′. The first summand in (13) can be bounded as follows. Let P be a Poisson random variable independent of N″ and (X_i)_{i≥1}, with mean α. Then L(∑_{i=1}^N X_i) = L(∑_{i=1}^{N″+P} X_i) and

‖L(∑_{i=1}^{N″+P} X_i) − L(∑_{i=1}^{N″} X_i)‖_TV ≤ ‖L(N″ + P) − L(N″)‖_TV,

where the last bound follows by subadditivity of the total variation distance. By definition, it is easy to see that

‖L(N″ + P) − L(N″)‖_TV ≤ 2P(P ≥ 1) = 2(1 − e^{−α}).

In order to bound the second summand in (13), we condition on N′ and use again the subadditivity of the total variation joined with the fact that L(N′) = L(N″):

‖L(∑_{i=1}^{N″} X_i) − L(∑_{i=1}^{N′} Y_i)‖_TV ≤ E[N′] ‖L(X₁) − L(Y₁)‖_TV = λ′ ‖L(X₁) − L(Y₁)‖_TV.

The treatment of the small jumps is the subject of the following result.

Proposition 8. Let X be a pure jump Lévy process with Lévy measure ν and introduce ν_ε = ν𝕀_{|x|≤ε}. Then, for all Σ > 0 and ε ∈ (0, 1], denoting by W a Brownian motion independent of X^S(ε), we have

‖L(X^S_t(ε) + ΣW_t) − N(0, t(Σ² + σ̄²(ε)))‖_TV ≤ √(2/π) C min(√t σ̄(ε), ε)/(Σ√t),

with the constant C from Theorem 9.

Proof. This follows by applying first Proposition 4 with G = N(0, Σ²t), for which ‖g‖_BV = √(2/π)/(Σ√t), and then Theorem 9.
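The smallness asserted in Proposition 8 can be visualised by simulation. The following rough sketch is ours: the two-point small-jump measure, the parameters and the histogram estimate of the L¹-distance between densities are illustrative choices, and the histogram itself introduces discretisation and sampling error.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
t, eps, Sigma, n = 1.0, 0.1, 0.5, 2_000_000
# small jumps: compound Poisson with nu_eps = (delta_{-eps}+delta_{eps})/(2 eps^2), sigma_bar^2 = 1
k = rng.poisson(t / eps**2, size=n)
small = eps * (2 * rng.binomial(k, 0.5) - k)
sample = small + Sigma * np.sqrt(t) * rng.normal(size=n)

edges = np.linspace(-6.0, 6.0, 241)
hist, _ = np.histogram(sample, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
target = norm.pdf(centers, scale=np.sqrt(t * (Sigma**2 + 1.0)))
tv_est = np.sum(np.abs(hist - target)) * (edges[1] - edges[0])
print(tv_est)   # small, consistent with a bound of order eps/(Sigma*sqrt(t))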
As a consequence of the above estimates on the Wasserstein distances, we obtain a bound for the total variation distance of the marginals of Lévy processes with non-zero Gaussian components.
Theorem 14. With the same notation as in Theorem 11 and Section 2.4, for Lévy processes X¹ and X² with σ₁, σ₂ > 0, the total variation distance between L(X¹_t) and L(X²_t) is bounded by the sum of three contributions: a term comparing the two Gaussian parts (with variances σ_j² + σ̄_j²(ε)), the two small-jump terms from Proposition 8, and the big-jump term from Theorem 13 applied with means tλ_j(ε). The proof follows from Proposition 8, the classical bound for the total variation distance between two Gaussian distributions, and Theorem 13.
Another useful result follows directly from Proposition 7 with j = 1 and allows us to bound the total variation distance between the marginals of Lévy processes with positive Gaussian part by the Toscani-Fourier distance of the same Lévy processes with a smaller Gaussian part.

Lower bounds in the minimax sense
One of the main goals of statistics is to estimate a quantity of interest from the data. There are different criteria that can be used to judge the quality of an estimator; in nonparametric statistics it is common to use a minimax approach. Let us recall the classical setting. From the data (X₁, …, X_n) one wants to recover a quantity of interest θ (e.g. θ is the density of the observations, the regression function, the Lévy density, the diffusion coefficient, etc.). In practice θ is unknown (but supposed to belong to a certain parameter space Θ) and one needs to estimate it via an estimator (a measurable function of the data) θ̂_n = θ̂_n(X₁, …, X_n). To measure the accuracy of the estimator one computes the minimax risk

R*_n := inf_{T_n} sup_{θ∈Θ} E[d²(T_n, θ)],

where the infimum is taken over all possible estimators T_n of θ and d is a semi-distance on Θ. Furthermore, one says that a positive sequence (ψ_n)_{n≥1} is an optimal rate of convergence of estimators on (Θ, d) if there exist constants C < ∞ and c > 0 such that

lim sup_{n→∞} ψ_n^{−2} R*_n ≤ C (upper bound) and lim inf_{n→∞} ψ_n^{−2} R*_n ≥ c (lower bound).
The goal is then to construct an estimator θ*_n such that

sup_{θ∈Θ} E[d²(θ*_n, θ)] ≤ C′ ψ_n², (14)

where (ψ_n)_{n≥1} is the optimal rate of convergence and C′ < ∞ is a constant. The usual way to proceed is to build an estimator θ̂_n of θ and to start the investigation of its performance via an upper bound like (14). This is important since the first thing to check is that the considered estimator is at least consistent, which is automatically implied if sup_{θ∈Θ} E[d²(θ, θ̂_n)] → 0. After that, a natural question is whether one could construct a better estimator (in terms of the rate of convergence on the class (Θ, d)). In order to ensure that it is not possible to improve on the estimator already constructed, one has to prove a lower bound; that is, one needs to prove that the rate of convergence of any other possible estimator of θ cannot be faster than the rate obtained in the upper bound. This is in general a difficult task and we refer to Chapter 2 in [24] for general techniques for proving lower bounds. Without recalling all the steps needed to prove a lower bound following [24], let us stress here that one of the fundamental ingredients is a fine upper bound for the total variation distance or other distances between measures. To that end, the estimates in Section 4.4 can be of general interest for proving lower bounds in the minimax sense.
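To fix ideas, recall how such a bound enters Le Cam's two-point method (a standard reduction, recalled here in simplified form with the normalisation ‖·‖_TV ≤ 2 used above): if θ₀, θ₁ ∈ Θ generate data distributions P₀ and P₁, then for any estimator T_n,

max_{j=0,1} E_j[d²(T_n, θ_j)] ≥ (d(θ₀, θ₁)²/4) (1 − ½ ‖P₀ − P₁‖_TV),

since d(T_n, θ₀)² + d(T_n, θ₁)² ≥ d(θ₀, θ₁)²/2 pointwise and ∫ min(dP₀, dP₁) = 1 − ½ ‖P₀ − P₁‖_TV. Hence a two-point lower bound of order ψ_n² follows as soon as one exhibits θ₀, θ₁ with d(θ₀, θ₁) ≍ ψ_n whose observation laws are provably close in total variation, which is exactly what the estimates of Section 4.4 provide for Lévy increments.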
One situation when this general procedure applies is the following, where we show how to simplify the arguments used in [13] in order to prove the desired lower bound for an estimator of the integrated volatility.

5.2 How to simplify the proof of the lower bound in [13]

In [13], the authors consider a one-dimensional Itô semimartingale X with characteristics (B, C, ν) of the form

B_t = ∫₀ᵗ b_s ds, C_t = ∫₀ᵗ c_s ds, ν(ds, dx) = ds F_s(dx).

They assume that X belongs to the class S^r_A of all Itô semimartingales that satisfy

|b_t| + c_t + ∫ (|x|^r ∧ 1) F_t(dx) ≤ A for all t ∈ [0, 1].
Their goal is to estimate the integrated volatility C(X)₁ at time 1 from the high-frequency observations X_{i/n}, i = 0, …, n. They have an upper bound for an estimator of C(X)₁ and they want to prove that the rate of convergence attained by that estimator is optimal. To that aim they need to prove that any uniform rate ψ_n for estimating C(X)₁ satisfies ψ_n ≥ (n log n)^{−(2−r)/2} if r > 1.