Concentration inequalities for Stochastic Differential Equations with additive fractional noise

In this paper, we establish concentration inequalities both for functionals of the whole solution on an interval $[0, T]$ of an additive SDE driven by a fractional Brownian motion with Hurst parameter $H \in (0, 1)$ and for functionals of discrete-time observations of this process. Then, we apply this general result to specific functionals related to discrete and continuous-time occupation measures of the process.


Introduction
In this article, we consider the solution $(Y_t)_{t \geq 0}$ of the following $\mathbb{R}^d$-valued Stochastic Differential Equation (SDE) with additive noise:
$$dY_t = b(Y_t)\,dt + \sigma\,dB_t, \tag{1.1}$$
with $B$ a $d$-dimensional fractional Brownian motion (fBm) with Hurst parameter $H \in (0, 1)$. We are interested in the long-time concentration of the law of the solution $Y$. A well-known way to address this type of problem is to prove $L^1$-transportation inequalities. Let us make this precise. Let $(E, d)$ be a metric space equipped with a $\sigma$-field $\mathcal{B}$ such that the distance $d$ is $\mathcal{B} \otimes \mathcal{B}$-measurable. Given $p \geq 1$ and two probability measures $\mu$ and $\nu$ on $E$, the Wasserstein distance is defined by
$$W_p(\mu, \nu) = \inf_{\pi} \left( \int_{E \times E} d(x, y)^p \, d\pi(x, y) \right)^{1/p},$$
where the infimum runs over all the probability measures $\pi$ on $E \times E$ with marginals $\mu$ and $\nu$. The entropy of $\nu$ with respect to $\mu$ is defined by
$$H(\nu|\mu) = \begin{cases} \displaystyle\int \log\left(\frac{d\nu}{d\mu}\right) d\nu & \text{if } \nu \ll \mu, \\ +\infty & \text{otherwise.} \end{cases}$$
Then, we say that $\mu$ satisfies an $L^p$-transportation inequality with constant $C \geq 0$ (denoted $\mu \in T_p(C)$) if for any probability measure $\nu$,
$$W_p(\mu, \nu) \leq \sqrt{2C\,H(\nu|\mu)}. \tag{1.2}$$
The concentration of measure is intrinsically linked to the above inequality when $p = 1$. This fact was first emphasized by K. Marton [11,10], M. Talagrand [15], and Bobkov and Götze [1], and amply investigated by M. Ledoux [9,8]. Indeed, it can be shown (see [9] for a detailed proof) that (1.2) for $p = 1$ is actually equivalent to the following: for any $\mu$-integrable $\alpha$-Lipschitz function $F$ (real valued) we have, for all $\lambda \in \mathbb{R}$,
$$\mathbb{E}\left[\exp\left(\lambda\left(F(X) - \mathbb{E}[F(X)]\right)\right)\right] \leq \exp\left(\frac{C\alpha^2\lambda^2}{2}\right) \tag{1.3}$$
with $\mathcal{L}(X) = \mu$. This upper bound naturally leads to concentration inequalities through the classical Markov inequality. For several years, $L^1$ (and $L^2$, since $T_2(C)$ implies $T_1(C)$) transportation inequalities have been widely studied, in particular for diffusion processes (see for instance [4,16,6]). For SDEs driven by more general Gaussian processes, S. Riedel established transportation cost inequalities in [12] using Rough Path theory. However, his results do not give long-time concentration, which is our focus here.
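To make the link with concentration explicit, here is a short worked derivation of the standard Chernoff argument. It is a sketch which assumes that (1.3) reads $\mathbb{E}[\exp(\lambda(F(X) - \mathbb{E}[F(X)]))] \leq \exp(C\alpha^2\lambda^2/2)$ with $C, \alpha > 0$; the threshold $r > 0$ is a symbol introduced only for this illustration.

```latex
% Chernoff bound deduced from (1.3); r > 0 is an illustrative threshold.
\begin{align*}
\mathbb{P}\big(F(X) - \mathbb{E}[F(X)] > r\big)
  &\leq e^{-\lambda r}\,
        \mathbb{E}\Big[e^{\lambda (F(X) - \mathbb{E}[F(X)])}\Big]
  && \text{(Markov inequality, } \lambda > 0\text{)} \\
  &\leq \exp\Big(-\lambda r + \frac{C\alpha^2\lambda^2}{2}\Big)
  && \text{(by (1.3))} \\
  &= \exp\Big(-\frac{r^2}{2C\alpha^2}\Big)
  && \text{(choosing } \lambda = \tfrac{r}{C\alpha^2}\text{)}.
\end{align*}
```

The choice $\lambda = r/(C\alpha^2)$ minimizes the quadratic exponent, and applying the same bound to $-F$ controls the lower deviations.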
In the setting of fractional noise, T. Guendouzi [7] and B. Saussereau [14] have studied transportation inequalities with different metrics in the case where $H \in (1/2, 1)$. In particular, B. Saussereau made an important contribution: he proved $T_1(C)$ and $T_2(C)$ for the law of $(Y_t)_{t \in [0,T]}$ in various settings and obtained a large-time asymptotic result in the case of a contractive drift. Our first motivation for this work was to get equivalent results in a discrete-time context, i.e. for $\mathcal{L}((Y_{k\Delta})_{1 \leq k \leq n})$ for a given step $\Delta > 0$, and then long-time concentration inequalities for the occupation measure, i.e. for $\frac{1}{n}\sum_{k=1}^{n} f(Y_{k\Delta})$ (where $f$ is a general real-valued Lipschitz function). Indeed, in a statistical framework we only have access to discrete-time observations of the process $Y$, and such a result could be meaningful in that context. To the best of our knowledge, this type of result is unknown in the fractional setting. We first tried to adapt the methods used in [14] in several ways, for example: find a distance such that $(y_t)_{t \in [0,T]} \mapsto (y_{k\Delta})_{1 \leq k \leq n}$ is Lipschitz and prove $T_1(C)$ with this metric. But the constants obtained in the $L^1$-transportation inequalities were not sharp enough, so we could not deduce large-time asymptotics as B. Saussereau did. In [4], H. Djellout, A. Guillin and L. Wu explored transportation inequalities in the diffusive case, in both continuous and discrete-time settings. In particular, for the discrete-time case, they used a kind of tensorization of the $L^1$-transportation inequality, for which the Markovian nature of the process was essential. However, they prove $T_1(C)$ through its equivalent formulation (1.3) and to this end, they apply a decomposition of the functional in (1.3) into a sum of martingale increments, namely: with $X = (Y_{k\Delta})_{1 \leq k \leq n}$ and $Y$ is the solution of (1.1) when $B$ is the classical Brownian motion.
This decomposition has inspired the approach described in this paper: instead of proving an $L^1$-transportation inequality (1.2), we prove its equivalent formulation (1.3) by using a similar decomposition and the series expansion of the exponential function. Through this strategy, we prove several results under an assumption of contractivity on the drift term $b$ in (1.1). First, in a discrete-time setting, we work in the space $(\mathbb{R}^d)^n$ endowed with the $L^1$ metric and we show that for any $\alpha$-Lipschitz functional $F : (\mathbb{R}^d)^n \to \mathbb{R}$ and for any $\lambda > 0$, In a similar way, we consider the space of continuous functions $C([0, T], \mathbb{R}^d)$ endowed with the $L^1$ metric and we prove that for any $\alpha$-Lipschitz functional $\tilde{F}$ : . From these inequalities, we deduce some general concentration inequalities and large-time asymptotics for occupation measures. Let us note that we have no restriction on the Hurst parameter $H$ and we retrieve the results given by B. Saussereau for $H \in (1/2, 1)$ in a continuous setting and also the result given in [4] for $H = 1/2$, namely for diffusions.
The paper is organised as follows. In the next section, we describe the assumptions on the drift term and we state the general theorem about concentration, namely Theorem 2.2. Then, in Subsection 2.3, we apply this result to specific functionals related to the occupation measures (both in a discrete-time and in a continuous-time framework). Section 3 outlines our strategy of proof, which is carried out in Sections 4 and 5.

Notations
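The usual scalar product on $\mathbb{R}^d$ is denoted by $\langle \cdot, \cdot \rangle$ and $|\cdot|$ stands either for the Euclidean norm on $\mathbb{R}^d$ or the absolute value on $\mathbb{R}$. We denote by $\mathcal{M}_d(\mathbb{R})$ the space of real matrices of size $d \times d$. For a given $n \in \mathbb{N}^*$ and $(x, y) \in (\mathbb{R}^d)^n \times (\mathbb{R}^d)^n$, we denote by $d_n$ the following $L^1$-distance:
$$d_n(x, y) := \sum_{k=1}^{n} |x_k - y_k|. \tag{2.1}$$
Analogously, for a given $T > 0$ and $(x, y) \in C([0, T], \mathbb{R}^d)^2$, we denote by $d_T$ the following $L^1$-distance:
$$d_T(x, y) := \int_0^T |x_t - y_t| \, dt. \tag{2.2}$$
Finally, if $F$ is a Lipschitz function between two metric spaces, we denote by $\|F\|_{\mathrm{Lip}}$ its Lipschitz norm.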

Assumptions and general result
Let $B$ be a $d$-dimensional fractional Brownian motion (fBm) with Hurst parameter $H \in (0, 1)$ defined on $(\Omega, \mathcal{F}, \mathbb{P})$ and transferred from a $d$-dimensional Brownian motion $W$ through the Volterra representation (see e.g. [3,2]) In the sequel, the distribution of $W$ will be denoted by $\mathbb{P}_W$.
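We consider the following $\mathbb{R}^d$-valued stochastic differential equation driven by $B$:
$$Y_t = x + \int_0^t b(Y_s)\,ds + \sigma B_t, \quad t \geq 0. \tag{2.6}$$
Here $x \in \mathbb{R}^d$ is a given initial condition, $B$ is the aforementioned fractional Brownian motion and $\sigma \in \mathcal{M}_d(\mathbb{R})$.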
We work under the following assumption: and there exist constants $\alpha, L > 0$ such that:  [4,14] for instance). At this stage, a more general framework seems elusive.
We are now in a position to state our results for general functionals $F$ and $\tilde{F}$. First, we prove a result on the exponential moments of $F_Y$ and $\tilde{F}_Y$ which is crucial to get Theorem 2.2.

Remark 2.2. Let us note that this proposition is actually equivalent to $L^1$-transportation inequalities as mentioned in the introduction. More precisely, item
From Proposition 2.1, we deduce the following concentration inequalities: Let $H \in (0, 1)$ and $\Delta > 0$. Let $n \in \mathbb{N}^*$, $T \geq 1$ and $d_n$, $d_T$ be the metrics defined respectively by (2.1) and (2.2). Then, (i) there exists $C_{H,\Delta,\sigma} > 0$ such that for all Lipschitz functions $F : ((\mathbb{R}^d)^n, d_n) \to (\mathbb{R}, |\cdot|)$ and for all $r \geq 0$, (2.10) and for all $r \geq 0$, (2.11)
Proof. We use the Markov inequality and Proposition 2.1. Then, we optimize in $\lambda$ to get the result.
In the following subsection, we outline our main application of Theorem 2.2, for which long-time concentration holds.

Long time concentration inequalities for occupation measures
We now apply our general result to specific functionals to get the following theorem.

Theorem 2.3.
Let $H \in (0, 1)$ and $\Delta > 0$. Let $n \in \mathbb{N}^*$ and $T \geq 1$. Then, and for all $r \geq 0$,
Proof. We apply Theorem 2.2 with the following functions $F$ and $\tilde{F}$: which are respectively $\frac{\|f\|_{\mathrm{Lip}}}{n}$-Lipschitz with respect to $d_n$ (defined by (2.1)) and $\frac{\|f\|_{\mathrm{Lip}}}{T}$-Lipschitz with respect to $d_T$ (defined by (2.2)).
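The Lipschitz constants quoted in this proof can be checked directly. The computation below is a sketch under two assumptions which are not spelled out in the surviving text: that the functionals are the empirical means $F(x) = \frac{1}{n}\sum_{k=1}^n f(x_k)$ and $\tilde{F}(x) = \frac{1}{T}\int_0^T f(x_t)\,dt$, and that $d_n$ and $d_T$ are the unnormalized $L^1$-distances $\sum_{k=1}^n |x_k - y_k|$ and $\int_0^T |x_t - y_t|\,dt$.

```latex
% Assumed forms: F(x) = (1/n) sum_k f(x_k), d_n(x,y) = sum_k |x_k - y_k|,
% and their continuous-time analogues on [0,T].
\begin{align*}
|F(x) - F(y)|
  &\leq \frac{1}{n} \sum_{k=1}^{n} |f(x_k) - f(y_k)|
   \leq \frac{\|f\|_{\mathrm{Lip}}}{n} \sum_{k=1}^{n} |x_k - y_k|
   = \frac{\|f\|_{\mathrm{Lip}}}{n}\, d_n(x, y), \\
|\tilde{F}(x) - \tilde{F}(y)|
  &\leq \frac{1}{T} \int_0^T |f(x_t) - f(y_t)| \, dt
   \leq \frac{\|f\|_{\mathrm{Lip}}}{T} \int_0^T |x_t - y_t| \, dt
   = \frac{\|f\|_{\mathrm{Lip}}}{T}\, d_T(x, y).
\end{align*}
```

The factors $1/n$ and $1/T$ are what allow the resulting bounds to improve as $n$ and $T$ grow.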

Sketch of proof
Recall that $F_Y$ and $\tilde{F}_Y$ are defined by (2.7). The key element to get the bounds (2.8) and (2.9) is to decompose $F_Y$ and $\tilde{F}_Y$ into a sum of martingale increments as follows. Let $(\mathcal{F}_t)_{t \geq 0}$ be the natural filtration associated to the standard Brownian motion $W$ from which the fBm is derived through (2.4).
With these definitions, we have: where $\lceil T \rceil$ denotes the least integer greater than or equal to $T$.
With this decomposition in hand, we first estimate the conditional exponential moments of the martingale increments $M_k - M_{k-1}$ and $\tilde{M}_k - \tilde{M}_{k-1}$ to get Proposition 2.1. This is the purpose of Proposition 5.2 for which the proof is based on the following lemma: Then for all $\lambda > 0$,
Proof. Since $X$ is centered, by using the series expansion of the exponential function, we have: Since for all $t \in \mathbb{R}$, |t| so that: Hence, we have in (3.3): which concludes the proof.
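The opening step of this proof can be written out as follows. This is only a sketch of the standard argument, assuming that $X$ is centered, that $\lambda > 0$, and that the exponential series may be integrated term by term (for instance when $\mathbb{E}[e^{\lambda |X|}] < \infty$); the moment bounds in the hypothesis of the lemma are then inserted term by term.

```latex
% First step of the series-expansion argument for a centered X and lambda > 0.
\begin{align*}
\mathbb{E}\big[e^{\lambda X}\big]
  &= 1 + \lambda\,\mathbb{E}[X]
       + \sum_{p \geq 2} \frac{\lambda^p\,\mathbb{E}[X^p]}{p!}
  && \text{(series expansion)} \\
  &\leq 1 + \sum_{p \geq 2} \frac{\lambda^p\,\mathbb{E}\big[|X|^p\big]}{p!}
  && \text{(since } \mathbb{E}[X] = 0 \text{ and } X^p \leq |X|^p\text{)}.
\end{align*}
```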
Remark 3.1. The previous proof follows the proof of Lemma 1.5 in Chapter 1 of [13]. We chose to give the details here since this step is crucial to get our main results.
Finally, the end of the proof of Proposition 2.1 (i) is based on the following implication: if there exists a deterministic sequence $(u_k)$ such that The same arguments are used for item (ii) of Proposition 2.1.
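An implication of this type is usually obtained by iterating the tower property of conditional expectation. The sketch below assumes a conditional bound of the form $\mathbb{E}[e^{\lambda(M_k - M_{k-1})} \mid \mathcal{F}_{k-1}] \leq e^{\lambda^2 u_k}$ almost surely for $1 \leq k \leq n$; this exact form is an assumption made for illustration, since the statement did not survive in the text above.

```latex
% Iterating the tower property; the conditional bound exp(lambda^2 u_k)
% on each increment is an assumed form of the hypothesis.
\begin{align*}
\mathbb{E}\Big[e^{\lambda (M_n - M_0)}\Big]
  &= \mathbb{E}\Big[e^{\lambda (M_{n-1} - M_0)}\,
       \mathbb{E}\big[e^{\lambda (M_n - M_{n-1})} \,\big|\, \mathcal{F}_{n-1}\big]\Big] \\
  &\leq e^{\lambda^2 u_n}\,
       \mathbb{E}\Big[e^{\lambda (M_{n-1} - M_0)}\Big] \\
  &\leq \cdots \leq \exp\Big(\lambda^2 \sum_{k=1}^{n} u_k\Big).
\end{align*}
```

The first equality uses that $M_{n-1} - M_0$ is $\mathcal{F}_{n-1}$-measurable, and the fact that the sequence $(u_k)$ is deterministic is what allows each factor to be pulled out of the expectation.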
Sections 4 and 5 are devoted to the proof of Proposition 2.1. The first step, detailed in Section 4, consists in giving a new expression for the martingale increments and controlling them. The second step, which is outlined in Section 5.1, focuses on managing the conditional moments of these increments to get Proposition 5.2. The proof of Proposition 2.1 is finally achieved in Section 5.2.
Throughout the paper, constants may change from line to line and may depend on σ without being specified.

Control of the martingale increments
For the sake of clarity, we set $\Delta = 1$ in the sequel, so that by (2.7) we have $t_k = k$. When $\Delta > 0$ is arbitrary, the arguments are the same: it suffices to apply a rescaling.
Through equation (2.6) and the fact that $b$ is Lipschitz continuous, for all $t \geq 0$, $Y_t$ can be seen as a measurable functional of the time $t$, the initial condition $x$ and the Brownian motion $(W_s)_{s \in [0,t]}$. Denote by $\Phi$ : (4.1) Now, let $k \geq 1$, we have With exactly the same procedure, we get Let us introduce now some notations. First, for all $t \geq 0$ set $u := t - k + 1$, then for all $u \geq 0$, we define otherwise, . We then have and , we deduce from (4.4) and (4.5) that for all $u \geq 0$ In the remainder of the section, we proceed to a control of the quantity $|X_u - \tilde{X}_u|$. We have the following upper bound on $|X_u - \tilde{X}_u|$:
Proposition 4.1. There exists $C_H > 0$ such that for all $u > 0$ and $k \in \mathbb{N}^*$, where $X_u - \tilde{X}_u$ is defined in (4.6), $\Psi_H$ is defined by In where $\Psi_H$ is defined in Proposition 4.1.
Proof. Let $u \geq 2$. In the following inequalities, we make use of Hypothesis 2.1 on the function $b$ and of the elementary Young inequality $\langle a, b \rangle \leq \frac{1}{2}\left(\varepsilon |a|^2 + \frac{1}{\varepsilon} |b|^2\right)$ with $\varepsilon = 2\alpha$. By (4.6), We then apply Gronwall's lemma to obtain (4.8) Now, we set for all $v \geq 2$, We apply an integration by parts to $\varphi_k$ taking into account that W Recall that by (4.8), our goal here is to manage To control each term involving $I_1$, $I_2$ and $I_3$ in (4.11), we will need the following inequality:

Inequality (4.12) is obtained through Lemma 4.2 and the elementary inequalities
Proof. It is enough to apply an integration by parts and then use that sup v∈ [2,u] to conclude the proof.
It remains to show how the terms involving $I_1$, $I_2$ and $I_3$ in (4.11) can be reduced to the term (4.12). Let us begin with $I_1$ which is straightforward: Then, using the definition of $I_2$, (4.14) Finally, where the last inequality is given by the following fact: there exists $C_H > 0$ such that for all k = 1, sup It remains to combine the three above inequalities (4.13), (4.14) and (4.15) with (4.12) to get the following in (4.11): Putting this inequality into (4.8) gives the result (we can replace $u - 1$ by $u$, the inequality remains true when $u \geq 2$ up to a constant).

When k = 1
where Ψ H is defined in Proposition 4.1.
Proof. The proof begins as in the proof of Lemma 4.1. We have through inequality (4.8): Then, we use Lemma 4.2 in the previous inequality, which gives: This inequality combined with (4.16) concludes the proof (we can replace $u - 1$ by $u$, the inequality remains true when $u \geq 2$ up to a constant).

Second case: $u \in [0, 2]$
The idea here is to use Gronwall's lemma in its integral form. By Hypothesis 2.1, $b$ is $L$-Lipschitz so that: Then, for $u \in [0, 2]$, For all $k \geq 1$ and for all $v \in [0, 2]$, we set The inequality (4.18) combined with Lemma 4.1 and Lemma 4.3 finally proves Proposition 4.1. (ii) There exist $C, \zeta > 0$ such that for all $k \in \mathbb{N}^*$ and for all $p \geq 2$,

Conditional moments of the martingale increments
where $\psi_{n,k}$ := n−k+1 To prove this result, we first need the following intermediate outcome. defined by (4.19). Then, for all $p \geq 2$, there exists $C > 0$ such that $\Psi_H(u, k)$ and $\Psi_H$ is defined in Proposition 4.1.
The same occurs for $\tilde{M}$ instead of $M$ by replacing $F$ by $\tilde{F}$ and $\psi_{n,k}$ by $\psi'$
Proof. For the sake of simplicity, assume that $\|F\|_{\mathrm{Lip}} = 1$. By inequality (4.2), we have for all $p \geq 2$, Now, we use Proposition 4.1 and for the sake of clarity we set Then, by Jensen's inequality, Recall that $W^{(k)} = (W_{s+k-1} - W_{k-1})_{s \geq 0}$ and thus $W^{(k)}$ is independent of $\mathcal{F}_{k-1}$. Then, We denote by $\mathcal{F}^{(k)}$ the filtration associated to $W^{(k)}$, we rewrite Using the elementary inequality $(a + b)^{1/p} \leq a^{1/p} + b^{1/p}$, we finally get: and the proof is over since $W^{(k)}$ and $\tilde{W}^{(k)}$ have respectively the same distribution as $W^{(1)}$ and $\tilde{W}^{(1)}$.

Proof of Proposition 2.1
We have the following result: Proposition 5.2. (i) There exist $C', \zeta > 0$ such that for all $k \in \mathbb{N}^*$ and for all $\lambda > 0$, (ii) There exist $C', \zeta > 0$ such that for all $k \in \mathbb{N}^*$ and for all $\lambda > 0$,
Proof. Let us prove (i). From $\mathbb{E}[M_k - M_{k-1} \mid \mathcal{F}_{k-1}] = 0$ and Proposition 5.1, we immediately get the result by using Lemma 3.1.

A Sub-Gaussianity of the supremum of the Brownian motion
Consequently, for all $p \geq 2$, Proof. sup Therefore for all $x \geq 0$, we have P sup P(X > x)dx for non-negative random variables and a simple change of variable.
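The passage from a sub-Gaussian tail to moment bounds can be illustrated by the following computation. It is a sketch for a generic non-negative random variable $X$ satisfying $\mathbb{P}(X > x) \leq K e^{-x^2/(2c)}$ for all $x \geq 0$; the constants $K, c > 0$ are hypothetical placeholders and not the constants of the statement above.

```latex
% Moments from a sub-Gaussian tail; K and c are hypothetical constants.
\begin{align*}
\mathbb{E}[X^p]
  &= \int_0^{\infty} p\,x^{p-1}\,\mathbb{P}(X > x)\,dx
  && \text{(valid for } X \geq 0\text{)} \\
  &\leq K p \int_0^{\infty} x^{p-1} e^{-x^2/(2c)}\,dx \\
  &= K p \cdot \frac{1}{2}\,(2c)^{p/2} \int_0^{\infty} y^{p/2 - 1} e^{-y}\,dy
  && \text{(change of variable } y = \tfrac{x^2}{2c}\text{)} \\
  &= \frac{K}{2}\,(2c)^{p/2}\,p\,\Gamma\Big(\frac{p}{2}\Big).
\end{align*}
```

A bound of the form $\zeta^{p/2}\,p\,\Gamma(p/2)$, up to a multiplicative constant, is the kind of moment growth used in the series-expansion argument of Section 3.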

B Uniform sub-Gaussianity of G (k)
α, [0,2] In this section, we consider the following Gaussian processes: for all k ∈ N * , where (W t ) t∈[0,T ] is a d-dimensional Brownian motion and K H is defined by (2.5).
Remark B.1. Since we are interested in the law of G (k) , we have replaced W (k) by W in the expression of G (k) given by (4.19).
First, we have the following control on the second moment of G (k) -increments.
⊲ Second case: $k = 1$
Let us divide this part of the proof into three new cases: First, consider $0 \leq v' < v \leq 1$, then $G^{(1)}$ coincides in law with the fractional Brownian motion: Secondly, for $1 \leq v' < v \leq 2$, by (B.5):