Sub-exponential convergence to equilibrium for Gaussian driven Stochastic Differential Equations with semi-contractive drift

The convergence to the stationary regime is studied for Stochastic Differential Equations driven by an additive Gaussian noise and evolving in a semi-contractive environment, i.e. when the drift is contractive only outside a compact set but has no repulsive regions. In this setting, we develop a synchronous coupling strategy to obtain sub-exponential bounds on the rate of convergence to equilibrium in Wasserstein distance. Then, by a coalescent coupling close to the terminal time, we derive a similar bound in total variation distance.


Introduction
We consider a class of non-Markovian Stochastic Differential Equations (SDE) in R^d, d ≥ 1, driven by a Gaussian process (G_t)_{t≥0} with stationary increments, of the following form:

dX_t = b(X_t) dt + σ dG_t,   (1)

where b : R^d → R^d is an (at least) continuous vector field and σ a d × d constant invertible matrix. In a seminal paper, Hairer [10] built a Markovian structure above such SDEs driven by fractional Brownian motion (fBm) with the help of the following Mandelbrot-Van Ness decomposition of the fBm:

B^H_t = α_H ∫_{−∞}^0 ((t − u)^{H−1/2} − (−u)^{H−1/2}) dW_u + α_H ∫_0^t (t − u)^{H−1/2} dW_u,

where (W_t)_{t∈R} is a two-sided Brownian motion, H ∈ (0, 1) is the Hurst parameter and α_H a normalisation constant. A series of ergodicity results on the existence and uniqueness of the invariant distribution were then established, including rates of convergence to equilibrium in total variation distance. A significant stream of literature followed, focusing in particular on: the elaboration of an ergodic theory for SDEs with extrinsic memory [11,21], extensions to SDEs with multiplicative fractional noise [7,8,12], etc. Moreover, the recent developments in statistical estimation for fractional SDEs [13,18,19] benefited from this theory. The strategy in [10] is to develop a coalescent coupling method in this setting. We recall that coalescent coupling means that one tries to stick together two coupled paths X and Y of the SDE starting from different initial conditions. However, since the SDE is not Markovian, or more precisely since the increments of the fBm depend on the whole past, a non-trivial coupling is needed for the paths X and Y to remain together after being stuck. Unfortunately, getting the paths together generates a very large waiting time between the coupling attempts, and this constraint leads to very slow rates of convergence, of order t^{−(α−ε)} for any ε > 0, where α ∈ (0, 1/4].
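As a concrete illustration of the fractional noise discussed above, the following sketch samples an fBm exactly on a grid from its covariance function, a standard Cholesky construction (function names and parameters are ours, not from the paper); it also checks the stationary-increment identity Var(B_t − B_s) = |t − s|^{2H} directly on the covariance matrix.

```python
import numpy as np

def fbm_covariance(times, H):
    """Covariance R(s, t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2 of fBm."""
    s = times[:, None]
    t = times[None, :]
    return 0.5 * (s**(2 * H) + t**(2 * H) - np.abs(t - s)**(2 * H))

def sample_fbm(times, H, rng):
    """Exact sampling of fBm at the given times via Cholesky factorisation."""
    R = fbm_covariance(times, H) + 1e-12 * np.eye(len(times))  # tiny jitter
    return np.linalg.cholesky(R) @ rng.standard_normal(len(times))

rng = np.random.default_rng(0)
times = np.linspace(0.01, 1.0, 100)
for H in (0.3, 0.5, 0.7):
    R = fbm_covariance(times, H)
    # Stationary increments: Var(B_t - B_s) = |t - s|^{2H}, read off from R.
    i, j = 10, 60
    var_incr = R[j, j] + R[i, i] - 2 * R[i, j]
    assert abs(var_incr - abs(times[j] - times[i])**(2 * H)) < 1e-10
    path = sample_fbm(times, H, rng)  # one exact sample path
```

Larger H produces visibly smoother, more persistent paths, in line with the heavier memory discussed above.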

Notations
The usual scalar product on R^d is denoted by ⟨·, ·⟩ and, for x = (x_1, . . . , x_d) ∈ R^d, |x| stands for the Euclidean norm. M_d stands for the space of diagonal square matrices of size d, endowed with any norm ‖·‖. For probability measures ν and µ on R^d and r ≥ 1, we denote by W_r(ν, µ) the r-Wasserstein distance between ν and µ, defined by:

W_r(ν, µ) = inf { E[|X − Y|^r]^{1/r} : L(X) = ν, L(Y) = µ }.

The total variation distance is defined by

‖ν − µ‖_TV = sup_{A ∈ B(R^d)} |ν(A) − µ(A)|,

where B(R^d) denotes the Borel σ-field on R^d, and for a probability distribution π, π(f) := ∫ f(x) π(dx). Denote by C([0, ∞), R^d) the space of continuous functions from [0, ∞) with values in R^d. We introduce a Wasserstein-type distance on the space P(C([0, ∞), R^d)) of probabilities on C([0, ∞), R^d), defined for P and Q ∈ P(C([0, ∞), R^d)) and r ≥ 1 by:

W_r^∞(P, Q) = inf { E[sup_{t≥0} |X_t − Y_t|^r]^{1/r} : L(X_.) = P, L(Y_.) = Q },

where X_. = (X_t)_{t≥0} and Y_. = (Y_t)_{t≥0}. For T > 0, one will also use the notation X_{T+.} := (X_{T+t})_{t≥0}. Observe that W_r^∞ certainly induces a topology on P(C([0, ∞), R^d)) which is stronger than the usual weak topology induced by the topology of uniform convergence on compact sets. The following norms on functional spaces will be encountered: for continuous functions (d(t))_{t∈[0,1]} and (w(t))_{t∈R_+},

‖d‖_{∞,[0,1]} := sup_{t∈[0,1]} |d(t)|   and   ‖w‖_{1/2+ε,∞} := sup_{t≥0} |w(t)| / (1 + t^{1/2+ε}).

We frequently use the letter C to represent a positive real number whose value may change from line to line. The expressions a ∨ b and a ∧ b, where a and b are real numbers, stand respectively for the maximum and the minimum of a and b.
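The two distances just introduced can be illustrated on toy examples. The following minimal sketch (function names and examples are ours) computes the one-dimensional empirical W_1 via monotone rearrangement, i.e. sorting both samples, and the total variation distance between probability vectors on a finite set:

```python
import numpy as np

def wasserstein1_1d(xs, ys):
    """W_1 between two equal-size empirical measures on R: on the real line
    the optimal coupling pairs the sorted samples (monotone rearrangement)."""
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))

def total_variation(p, q):
    """TV distance between two probability vectors on a finite set:
    sup_A |p(A) - q(A)| = (1/2) * sum_i |p_i - q_i|."""
    return 0.5 * float(np.sum(np.abs(np.asarray(p) - np.asarray(q))))

# A unit translation costs exactly 1 in W_1 ...
assert wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]) == 1.0
# ... while TV only sees how much mass must move, not how far:
assert total_variation([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]) == 0.5
```

The contrast between the two assertions is the reason the paper treats Wasserstein and total variation bounds by different couplings.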

The fractional case
For the sake of clarity, we choose to focus first on the case where (G_t)_{t≥0} is a standard d-dimensional fractional Brownian motion with Hurst parameter H ∈ (0, 1). In this case, a rigorous definition of invariant distribution has been introduced in [10]. More precisely, with the help of the Mandelbrot-Van Ness representation (see Section 2.3 for background) and a two-sided version of the noise process denoted by (B^H_t)_{t∈R}, the process (X_t, (B^H_{s+t})_{s≤0})_{t≥0} can be realised through a Feller transformation (Q_t)_{t≥0}. In particular, an initial distribution of the dynamical system (Y, B^H) is a distribution µ_0 on R^d × W, where W is an appropriate Hölder space (cf. Appendix A). Rephrased in more probabilistic terms, an initial distribution is the distribution of a couple (Y_0, (B^H_s)_{s≤0}). For an initial distribution µ, one denotes by L((X^µ_t)_{t≥0}) the distribution on C([0, +∞), R^d) of the process starting from µ. Then, such an initial distribution is called an invariant distribution if it is invariant under the transformation Q_t for every t ≥ 0. With a slight abuse of language, one says that two invariant distributions ν_1 and ν_2 are equivalent if L((X^{ν_1}_t)_{t≥0}) = L((X^{ν_2}_t)_{t≥0}). As mentioned before, we assume in this paper that the drift term is contractive only outside a compact set and that there are no "repulsive" regions. This corresponds to the first two items of the following assumption:
(C1): (C1 i) For all x, y ∈ R^d, ⟨x − y, b(x) − b(y)⟩ ≤ 0.
(C1 ii) There exist κ, R > 0 such that for all x, y ∈ B(0, R)^c, ⟨x − y, b(x) − b(y)⟩ ≤ −κ|x − y|^2.
(C1 iii) b is locally Lipschitz with polynomial growth: there exist C, N > 0 such that for all x ∈ R^d, |b(x)| ≤ C(1 + |x|^N).
Moreover, we recall that σ is always assumed to be invertible in this paper.
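As a sanity check, the three items of (C1) can be verified numerically on a toy one-dimensional drift, b(x) = −4x³, i.e. b = −∇U with U(x) = x⁴, a convex potential which is uniformly convex outside any neighbourhood of the origin. The drift and the constants (R = 1, κ = 4) are our illustrative choices, not from the paper:

```python
import numpy as np

def b(x):
    # b = -U' with U(x) = x^4: convex, uniformly convex outside a compact set.
    return -4.0 * x**3

xs = np.linspace(-5.0, 5.0, 201)
for x in xs:
    for y in xs:
        inner = (x - y) * (b(x) - b(y))       # <x - y, b(x) - b(y)>
        assert inner <= 1e-12                  # (C1 i): never repulsive
        if abs(x) >= 1.0 and abs(y) >= 1.0:    # (C1 ii) with R = 1, kappa = 4,
            # since x^2 + x*y + y^2 >= (x^2 + y^2)/2 >= 1 on B(0,1)^c:
            assert inner <= -4.0 * (x - y)**2 + 1e-9
    assert abs(b(x)) <= 4.0 * (1.0 + abs(x)**3)  # (C1 iii): polynomial growth
```

Inside B(0, 1) the inner product can vanish (take x, y near 0), so (C1 ii) genuinely fails there: the drift is only semi-contractive.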
Under (C1), it can be shown that existence and uniqueness hold for the solution to (1) (despite the fact that b is only locally Lipschitz continuous) and that (1) can be embedded into a Feller Markov process (see Appendix A for details). Furthermore, existence and uniqueness hold for the invariant distribution ν when σ is invertible (see e.g. [10]). We denote by Q_ν the (unique) distribution of the whole process and by ν̄ the first marginal of ν. Note that if (Y_t)_{t≥0} denotes a stationary solution to (1), then ν̄ = L(Y_t) for any t ≥ 0.
As mentioned before, when b = −∇U where U : R^d → R is C^2, (C1) is fulfilled when U is convex everywhere, uniformly strictly convex outside a compact subset of R^d, and has partial derivatives with polynomial growth. We are now in a position to state our first main result.

Theorem 1. Assume (C1) and let X_0 ∈ L^{2(1+υ)} for some υ > 0. Then, for any ǫ ∈ (0, 1 − H), there exists a positive constant C such that for all t > 0,

W_2(L(X_t), ν̄) ≤ C t^{−γ}   with   γ = 2(1 − H − ǫ) / (1 + υ^{−1} + 2(1 − H − ǫ)),   (3)

and the same bound holds for the functional distance:

W_2^∞(L(X_{t+.}), Q_ν) ≤ C t^{−γ}.   (4)

In the particular case where X_0 has moments of any order, letting υ go to ∞ leads to the result announced in the introduction: for any ε > 0, there exists a positive constant C such that

W_2(L(X_t), ν̄) ≤ C t^{−(2(1−H)/(3−2H) − ε)}.

Note that the rate of convergence decreases with H, which is reasonable since the memory increases with H.
Remark 2.1. The functional generalisation (4) is an obvious consequence of (3). Actually, our proof is based on a synchronous coupling, and (C1) ensures that if X and Y are two solutions built with the same fractional Brownian motion, then t ↦ |X_t − Y_t| is almost surely non-increasing; in particular, if Y is a stationary solution, the supremum of |X_s − Y_s| over s ≥ t is attained at s = t. Let us remark that in the Markovian literature, the functional generalisation always holds independently of the choice of coupling, which explains why this point is rarely mentioned.
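The monotonicity used in Remark 2.1 is easy to observe on a simulation: under a synchronous (same-noise) coupling, the noise cancels in the difference X − Y, so the gap evolves by the drift alone and cannot increase when the drift is monotone. A minimal Euler sketch (drift, step size and seed are our choices):

```python
import numpy as np

def b(x):
    return -x**3  # monotone drift: (x - y) * (b(x) - b(y)) <= 0

rng = np.random.default_rng(1)
dt, n = 1e-3, 5000
x, y = 2.0, -1.5                    # two different initial conditions
gaps = [abs(x - y)]
for _ in range(n):
    dg = np.sqrt(dt) * rng.standard_normal()  # SAME noise increment for both
    x += b(x) * dt + dg
    y += b(y) * dt + dg
    gaps.append(abs(x - y))
gaps = np.array(gaps)
# The noise cancels in X - Y, so the gap solves an ODE driven by the
# monotone drift and is non-increasing along the whole trajectory:
assert np.all(np.diff(gaps) <= 1e-12)
```

Note that this coupling alone only gives non-expansion; the contraction needed for a rate comes from the excursions outside the compact set, as developed in Section 5.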
When convergence holds in Wasserstein distance, a classical method to deduce total variation bounds is to wait long enough for the paths to get close and then to attempt a coalescent coupling (once only). This strategy can be applied in the fractional case and a suitable calibration of the parameters leads to the following result:

Theorem 2. Let the assumptions of Theorem 1 be in force and assume that b is Lipschitz continuous when H > 1/2. Let ν_t denote the law of X_t. Then,
(i) For any ǫ > 0, there exists C > 0 such that ‖ν_t − ν̄‖_TV ≤ C t^{−γ}, where γ is the exponent obtained in Theorem 1.
The proof of this theorem is achieved in Subsection 7.2.
Remark 2.2. One is thus able to preserve the orders obtained in Wasserstein distance. This property can be interpreted as follows: the cost for sticking the paths is negligible with respect to the one that is needed to get the paths close.
Remark 2.3. It is worth noting that here, the functional result is not a trivial consequence of the marginal one. Actually, (5) requires proving that the paths can remain stuck between t and +∞ and, more precisely, showing that the cost of this requirement is small enough.
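The total variation bounds above ultimately rest on the coupling inequality ‖L(X_t) − L(Y_t)‖_TV ≤ P(X_t ≠ Y_t), which is saturated by a maximal coupling. The following finite-state sketch makes this explicit (the construction is standard; the code and the example vectors are ours):

```python
import numpy as np

def maximal_coupling_matrix(p, q):
    """Joint law C of (X, Y) with marginals p and q minimising P(X != Y).
    The overlap min(p, q) is placed on the diagonal; the residual masses are
    coupled independently, which realises P(X != Y) = TV(p, q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = np.minimum(p, q)
    s = m.sum()                      # shared (diagonal) mass
    C = np.diag(m)
    if s < 1.0:
        C += np.outer(p - m, q - m) / (1.0 - s)
    return C

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
C = maximal_coupling_matrix(p, q)
tv = 0.5 * np.abs(p - q).sum()
assert np.allclose(C.sum(axis=1), p) and np.allclose(C.sum(axis=0), q)
# Coupling inequality TV <= P(X != Y), with equality for this coupling:
assert np.isclose(1.0 - np.trace(C), tv)
```

In the non-Markovian setting of the paper, the difficulty is precisely that such a sticking event must be sustained for all later times, which is why the coalescent coupling carries an extra cost.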

The general case
Consider now the general case where the driving process is a purely non-deterministic R^d-valued Gaussian process G = (G^{(1)}, . . . , G^{(d)}) with stationary increments. We assume that the components of G are independent (although this assumption can be lifted, see Remark 2.7).
Here, the purely non-deterministic property means that each component admits a moving-average representation of the following type: there exists an M_d-valued function (G(t))_{t∈R} = (g_i(t))_{t∈R, i∈{1,...,d}} with G(t) = 0 for all t > 0, such that

G_t = ∫_R (G(u − t) − G(u)) dW_u,   (6)

where W is a standard two-sided R^d-valued Brownian motion and G satisfies

∫_R ‖G(u − t) − G(u)‖^2 du < +∞ for every t > 0.   (7)

Remark 2.4. The "purely non-deterministic" property is usually defined in a slightly more general way for non-Gaussian processes, but we show in Appendix B that it is equivalent to the above moving-average representation in the Gaussian case. This explains our slight abuse of language. In fact, this assumption means that there is no time-dependent deterministic drift in the noise process (in a sense made precise in Appendix B). We could have taken such a drift into account, assuming some growth and regularity conditions, but at the cost of heavier notations; we thus chose not to.
In order to be able to extend our main results to the general case, we introduce a set of assumptions on the kernel G.
• For H_i < H'_i ∈ (0, 1), a kernel g_i which behaves like (−u)^{H_i − 1/2} near 0^− and like (−u)^{H'_i − 1/2} at −∞ yields a process G^{(i)} with the local regularity of an H_i-fBm and the long-range dependence of an H'_i-fBm.
Remark 2.6. We make a few comments on (C2): (C2 i) is equivalent to saying that there exists i such that g_i is not zero on a set of positive Lebesgue measure. It is a fairly natural condition which ensures that the law of G^{(i)} has full support in C(R_+) (see [3]). Besides, up to a shift in the definition of G, we can assume that supp(G) ∩ [−1, 0] ≠ ∅. (C2 ii) refers to the memory of the noise process, and α plays an important role in our theorems. For instance, let us consider a one-dimensional fBm of parameter H. In this case, α = 1/2 − H. Keeping in mind that the memory of the fBm increases with H, one can thus interpret the parameter α as follows: the weight of the memory decreases when α increases. (C2 iii) implies that G is a.s. Hölder-continuous with a Hölder exponent depending on ζ (see further). Note that ζ can be negative. Having in mind that G is integrable near 0^− (due to (7)), (C2 iii) mostly states that G is not too "pathological" around 0^−. The case of an H-fBm corresponds to ζ = 1/2 − H. Note also that ζ does not appear in the rate of convergence γ (see Theorem 3 below).
By Proposition A.2, under (C1) and (C2), SDE (1) almost surely admits a unique solution, owing to the a.s. continuity of (G_t)_{t≥0}. The definition of invariant distribution is similar to the one recalled in the fractional case. The idea is to build a stochastic dynamical system over the SDE through the moving-average representation (6) of (G_t)_{t≥0} (which corresponds to the Mandelbrot-Van Ness representation when G is an fBm) and, this way, to embed (X_t)_{t≥0} into a Feller Markov process on the product space R^d × W, where W denotes an appropriate Hölder space. We go back to this construction in Appendix A. Then, for the existence of an invariant distribution in the general case, we refer to Proposition A.4 where we prove that existence holds under (C1) and (C2). As for uniqueness, it will be given by the main theorem (the coupling method used to evaluate the rate of convergence is also a way to prove uniqueness of the invariant distribution). We are now able to provide the extension of Theorem 1 to the general case.
This result thus emphasises that the rate of convergence depends mainly on the long-time parameter α. When the process is an fBm, one retrieves Theorem 1 since, in this case, α = 1/2 − H.
Remark 2.7. Following carefully the proof of this theorem, one can see that it is still true for a noise process with dependent components. In particular, the formulation of Assumption (C2) is valid for this more general setting. The main non-trivial modification concerns the support of the law of G in that case. This question is addressed in Remark 5.6.
Now, let us focus on the generalisation of Theorem 2, which reveals an additional difficulty. Actually, the proof of Theorem 2 is based on an explicit construction of the coupling of the fBms, which in turn requires being able to build the coupling between the underlying Brownian motions (of the Mandelbrot-Van Ness representation). More precisely, this comes down to solving the following problem: for a given kernel g : (−∞, 0) → R and a given (smooth enough) function ϕ, find a function Ψ such that:

ϕ(t) = ∫_0^t g(s − t) Ψ(s) ds   for all t > 0.   (8)

When g(t) = (−t)^{H−1/2}, this equation has an explicit solution (see Lemma 4.2 of [10]) given by

Ψ(t) = c_H (d/dt) ∫_0^t (t − s)^{1/2−H} ϕ(s) ds,

where c_H is a real constant. In other words, one is able to invert explicitly the operator related to (8) in this case. In the general case, a way to overcome this absence of explicit form would be to prove the invertibility of the operator and to provide some related properties, which is a priori a difficult problem. In Subsection 7.4, we give heuristics on how to find Ψ in general. However, we choose here to only provide a set of ad hoc conditions which are sufficient to extend Theorem 2:
(C3): For each i ∈ {1, . . . , d} and any C^1-function ϕ : (0, +∞) → R, there exists a function Ψ_ϕ : (0, +∞) → R solving (8) for the kernel g_i, and there exists a C^1-function h_i : (−∞, 0) → R such that for any ϕ ∈ C^1((0, +∞); R), Ψ_ϕ is given in terms of h_i. Moreover, one of the two following statements, (C3 i) or (C3 ii), holds true (see (9)).
When H > 1/2, Assumption (C3 i ) holds whereas (C3 ii ) holds when H < 1/2.
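In the fractional case, the inversion underlying (C3) can be checked numerically: for g(t) = (−t)^{H−1/2} with H < 1/2, equation (8) becomes a weakly singular Abel-type integral equation, which a product-integration rule (exact integration of the kernel over each grid cell, piecewise-constant unknown) turns into a lower-triangular linear system. The discretisation and the test pair below are our illustrative choices:

```python
import numpy as np

H = 0.3                  # H < 1/2: (t - s)^{H - 1/2} is an integrable
a = H + 0.5              # Abel-type singularity; a = H + 1/2 in (0, 1)
N = 200
s = np.linspace(0.0, 1.0, N + 1)   # grid; the unknown Psi is piecewise constant

def solve_abel(phi_vals):
    """Solve phi(t_i) = int_0^{t_i} (t_i - u)^{H - 1/2} Psi(u) du for Psi by
    forward substitution, integrating the kernel exactly over each cell."""
    psi = np.zeros(N)
    for i in range(1, N + 1):
        t = s[i]
        w = ((t - s[:i])**a - (t - s[1:i + 1])**a) / a   # exact cell weights
        psi[i - 1] = (phi_vals[i] - w[:-1] @ psi[:i - 1]) / w[-1]
    return psi

# Known pair: Psi = 1 gives phi(t) = t^{H + 1/2} / (H + 1/2).
phi = s**a / a
psi = solve_abel(phi)
assert np.max(np.abs(psi - 1.0)) < 1e-6
```

This mirrors the explicit fractional-derivative inversion available for the fBm kernel; for a general kernel satisfying only (C2), no such closed form is available, which is precisely why (C3) is introduced.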
We then have the following result:

Overview of the proof of the theorems

Decomposition of the driving process
To understand the memory structure of the Gaussian process (G_t)_{t≥0}, one can consider the Mandelbrot-Van Ness type representation equivalent to (6), where (W_t)_{t∈R} is a two-sided R^d-valued Brownian motion and G(u) is a diagonal matrix with entries g_i(u) satisfying (C2). For τ ≥ 0, this representation immediately gives rise to the decomposition

G_{τ+t} − G_τ = D_t(τ) + Z_t(τ),   with   D_t(τ) := ∫_{−∞}^τ (G(u − τ − t) − G(u − τ)) dW_u   and   Z_t(τ) := ∫_τ^{τ+t} G(u − τ − t) dW_u,   (12)

where the process D is seen as the "past" component encoding the "memory" of W, while Z stands for the "innovation" process (when looking at G after time τ). For given τ > θ ≥ −∞ and ∆ ≥ 0, we subdivide D into (D_t(θ, τ))_{t≥0} and (D^∆_t(θ))_{t≥0}, respectively defined for all t ≥ 0 by

D_t(θ, τ) := ∫_θ^τ (G(u − τ − t) − G(u − τ)) dW_u   and   D^∆_t(θ) := ∫_{−∞}^θ (G(u − θ − ∆ − t) − G(u − θ − ∆)) dW_u.

Hence, for ∆ = τ − θ, G reads

G_{τ+t} − G_τ = D^∆_t(θ) + D_t(θ, τ) + Z_t(τ).

With an adequate choice of θ and τ, this is the decomposition of the noise between "remote" past and "recent" past that we shall use. Finally, the components of the previously defined processes are denoted by D^{(i)}_t(τ), Z^{(i)}_t(τ), and so on.
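In a discrete-time approximation of the moving-average representation, the past/innovation splitting above is an exact algebraic identity (the kernel vanishes on [0, ∞)), which the following sketch verifies for the fBm kernel; the grid, kernel and parameters are our illustrative choices:

```python
import numpy as np

# Discrete moving average: G_t = sum_u (g(u - t) - g(u)) dW_u, with the
# fBm kernel g(u) = (-u)^{H - 1/2} for u < 0 and g = 0 on [0, +inf).
H, h = 0.3, 0.05
def g(u):
    return np.where(u < 0, (-np.minimum(u, -1e-12))**(H - 0.5), 0.0)

rng = np.random.default_rng(2)
u = np.arange(-200.0, 10.0, h) + h / 2      # midpoints of the time cells
dW = np.sqrt(h) * rng.standard_normal(u.size)

def G(t):
    return float(np.sum((g(u - t) - g(u)) * dW))

tau, t = 2.0, 1.5
# Past component D and innovation Z of the increment G_{tau+t} - G_tau:
past = u <= tau
D = float(np.sum((g(u[past] - tau - t) - g(u[past] - tau)) * dW[past]))
Z = float(np.sum(g(u[~past] - tau - t) * dW[~past]))
# The splitting is exact: for u > tau, g(u - tau) = 0, so nothing is lost.
assert abs((G(tau + t) - G(tau)) - (D + Z)) < 1e-10
```

Conditionally on the past of W, the increment G_{τ+t} − G_τ is the independent Gaussian Z_t(τ) shifted by the "memory" drift D_t(τ), which is exactly the picture used in the proofs below.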

Strategy of proof for the convergence in Wasserstein distance
This subsection gives an overview of the proof of Theorem 3 (of which Theorem 1 is the special case of fractional Brownian motion). We already pointed out that existence and uniqueness hold for the invariant measure ν of (X_t, (G_{s+t})_{s≤0})_{t≥0}, where X is the solution to (1) and (G_t)_{t∈R} denotes a Gaussian process of the form (6) satisfying (C2). Now consider a synchronous coupling of the SDE (1):

dX_t = b(X_t) dt + σ dG_t,   dY_t = b(Y_t) dt + σ dG_t,   (13)

with generalised initial condition

µ̃(dx, dy, dw) = µ_1(w, dx) µ_2(w, dy) P_G(dw),

where P_G denotes the distribution of (G_t)_{t≤0} on a Hölder-type space (see Appendix A) and the transition probabilities µ_1(·, dx) and µ_2(·, dy) correspond respectively to the conditional distributions of X_0 and Y_0 given (G_t)_{t≤0}. Furthermore, one assumes that the law of X_0, here denoted by µ_1, satisfies the moment condition given in Theorems 1 to 4, and that µ_2 ⊗ P_G = ν, where ν denotes the unique invariant distribution. Hence, (Y_t)_{t≥0} is stationary and, in particular, L(Y_t) = ν̄, where ν̄ denotes the first marginal of the invariant distribution ν. As a consequence,

W_2(L(X_t), ν̄)^2 ≤ E[|X_t − Y_t|^2],   (14)

and the strategy is now to control the right-hand side of the previous inequality. To this end, let (τ_k)_{k∈N} be any non-decreasing sequence of stopping times. The following inequality, obtained through the Hölder inequality and the fact that t ↦ |X_t − Y_t| is a.s. non-increasing under the synchronous coupling, is the starting point of our proof. Assuming that the expectations below are finite, we have for υ > 0 and all t ≥ 1,

E[|X_t − Y_t|^2] ≤ Σ_{k∈N} E[|X_{1+τ_k} − Y_{1+τ_k}|^{2(1+υ)}]^{1/(1+υ)} P(τ_k ≤ t − 1 < τ_{k+1})^{υ/(1+υ)}.   (15)

In Section 4, we build an increasing sequence of stopping times (τ_k)_{k≥1} such that the three conditions gathered in (16) hold for every k ≥ 1, where η ∈ (0, 1) and K is a given positive number independent of k. Condition (16) means that at time τ_k, the supremum norm of the memory term of the decomposition introduced in Subsection 3.1 is bounded with a large probability. Roughly, the consequence is that the dynamics of the SDE between τ_k and τ_{k+1} is not so far from a standard diffusion perturbed by a controlled drift term. Such a property is certainly of interest if one is able to obtain some probabilistic bounds on the sequence (τ_k)_{k≥1}.
More precisely, one can build a sequence (τ_k)_{k≥1} such that condition (16) holds and such that, for any p ∈ N, the p-th moment of τ_k is finite, with a bound growing polynomially in k (see (17)). This is the aim of Section 4. With such a rough view, one hopes to obtain a contraction property between τ_k and τ_{k+1}. More precisely, we shall prove that for any p ≥ 2,

E[|X_{1+τ_{k+1}} − Y_{1+τ_{k+1}}|^p] ≤ ρ E[|X_{1+τ_k} − Y_{1+τ_k}|^p],   (18)

where ρ lies in (0, 1) (and is independent of k). Establishing such a property will be the purpose of Section 5 below. The fundamental idea there is to send X far enough from the origin, into a region where exponential contraction happens independently of the position of Y. This is achieved using the support of the process (Z_t(τ))_{t∈[0,1]} defined in (12), so that reaching this region happens with positive probability. The final step in the proof of Theorem 3 happens in Section 6. In view of (17), a sub-exponential bound for P(τ_{k+1} > t − 1) is computed, which, combined with (18) and injected into (15) and then (14), yields the expected result.

Strategy of proof for the convergence in total variation distance
From the definition of the total variation distance, we have the following inequality:

‖L(X_t) − ν̄‖_TV ≤ P(X_t ≠ Y_t).

Using the synchronous coupling of the noises used so far up to time t − 1, we have seen that we are able to control the L²-distance between X and Y and hence to lower-bound the probability that X and Y are close at time t − 1. This coupling is very convenient as it is, in some sense, "free of the past". Then, when X_{t−1} and Y_{t−1} are close, the idea to get bounds in total variation is to show that the cost of the coalescent coupling between t − 1 and t is "small" (or, equivalently, that the probability that X_t = Y_t is high). This part is achieved using a Girsanov-type argument close to [10]: one exhibits a (random) function ϕ defined on [t − 1, t] such that, if the driving Gaussian processes G and G̃ of X and Y differ by the drift ϕ on [t − 1, t] (on a subset Ω_1 of Ω), then the paths stick at time t. Then, the Girsanov theorem is applied to the underlying Wiener processes involved in G and G̃ (this step requires Assumption (C3) in the general case) and an optimisation of the parameters shows that the order of the Wasserstein rate of convergence is preserved in total variation. To extend the result to the functional setting ((5) in the fractional case), the additional step is to show that the cost of the (non-trivial) coupling which is necessary to ensure that X and Y stay together after time t is also small when X_{t−1} and Y_{t−1} are close.

Construction and properties of (τ_k)_{k∈N}

The aim of this section is to exhibit a sequence of stopping times which satisfies (16). We also obtain that the probability tails of these stopping times decrease algebraically fast. First, we give the following simple consequences of (C2).
Lemma 4.1. Let G be an M_d-valued function satisfying (C2) and let W be the a.s. continuous version of a two-sided R^d-valued Brownian motion. Then the bounds a)-d) below hold, with a constant C > 0 valid for all r ≤ 1.
Proof. Note that the proof reduces to a one-dimensional problem, since it suffices to prove the above claims for each diagonal element of G independently. Hence, let g be any of the diagonal entries of G and remark that g satisfies (C2). Starting with the proof of b), it follows from (C2 ii) that the stated bound holds. As a consequence, one can prove a). Noticing that g′(r) = ∫_{−∞}^r g″(u) du for r ≤ −1, the proof of the first part of c) follows along the same lines. Then the two cases depend on the integrability of (−r)^{−(α+1)} at infinity. If α > 0, then it is integrable and the result follows. If α ≤ 0, then one has to integrate between r and −1, which adds a constant to the upper bound on g.
The inequalities of d) are consequences of (C2 iii): the case ζ < −1 is direct, while if ζ ≥ −1, we can no longer integrate between r and 0. The bounds on g follow by exactly the same method.

Construction
We propose an iterative construction of the stopping times. First, fix τ_0 = 0 and assume that τ_0, . . . , τ_{k−1} have been defined for some k ≥ 1. With the constant α > −1/2 from (C2), let us consider, for T ≥ 2 and ǫ ∈ (0, α + 1/2), a quantity controlled through the weighted norm ‖·‖_{1/2+ǫ,∞} (recall the definition of this norm from Section 2.1). It will be convenient to write α_ǫ = α + 1/2 − ǫ. Then, let us define ∆_k through (19), where C_{1,α_ǫ} = C_1/α_ǫ and C_1 is the constant in (C2 ii). Finally, set

τ_k = 1 + τ_{k−1} + ∆_k.

Of course the first condition of (16) is satisfied for this construction of (τ_k)_{k∈N}. The next proposition shows that, with this choice of (τ_k)_{k∈N}, the second condition of (16) is also satisfied.

Proposition 4.2.
With the notations of Subsection 3.1 and (τ_k)_{k∈N} as above, the following inequality holds for any k ≥ 1, almost surely:
Proof. In view of Lemma 4.1, we can integrate by parts the following expression: where we used the fact that the boundary term vanishes in the limit. Hence, by a change of variables, we obtain the bound below, where the second inequality is a straightforward consequence of Assumption (C2 ii).
For any T ≥ 2, we observe that the required bound holds by our choice of ∆_k in (19), which gives the desired result.
The following technical lemma will be useful in the proof of the second main proposition of this section. Consider the process (R_t)_{t∈[0,1]} defined by (20). The same conclusion as for G holds for R. In particular, there exists K_R > 0 such that the stated bound holds.
Proof. We shall prove that inequality (21) holds for any t ∈ [0, 1]. As a consequence of (21), the stationarity of the increments and the Gaussian property of G, it follows that for any p ≥ 1 and any s, t ∈ [T, T + 1], the p-th moment of the increments is controlled. Hence, by Kolmogorov's continuity criterion, G has a Hölder-continuous modification of the corresponding order. It is then clear that the bound (21) also holds for R, and so does the Hölder continuity. Besides, since the random variable M has a finite first moment, the desired inequality follows from Markov's inequality. We now prove Inequality (21). It is enough to prove it for a single component of G, so fix i ∈ {1, . . . , d} and decompose the quantity of interest into three terms I_1, I_2 and I_3. For I_1, Lemma 4.1 c) and d) are used. Now for I_3, Lemma 4.1 d) applies; since we assumed that t ∈ [0, 1], this always yields I_3 ≤ C t^{1−2ζ}. Finally, I_2 is bounded similarly to I_3, and this gives the expected result.
Remark 4.4. When ζ < −1, one could argue that R is more than h-Hölder continuous for some h < 1. Since we do not want (and do not need) to discuss Hölder regularity for Hölder exponents larger than 1, we stick to the possibly non-optimal statement of the above lemma.
We are now ready to prove that the third condition of (16) is satisfied.
Proposition 4.5. With the notations of Subsection 3.1 and (τ_k)_{k∈N} as above, there exist K ∈ R_+ and η > 0 such that the following inequality holds true for any k ≥ 1:
Proof. We divide D(1 + τ_{k−1}, τ_k) into two parts. By integration by parts again, the first term in the RHS of the previous equality can be rewritten, and we deduce an upper bound from it. In view of (C2 ii) and Lemma 4.1 c), we obtain a bound involving a supremum S_{k,ǫ} of the underlying Brownian motion. Now consider the event on which this supremum is at most S_{H,ǫ}, where we shall readily determine a suitable constant S_{H,ǫ}.
In view of (23), it suffices to find S_{H,ǫ} such that S_{k,ǫ} ≤ S_{H,ǫ} happens with positive probability. Since we know by Fernique's theorem that E exp{λ‖W‖²_{1/2+ǫ,∞}} < ∞ for λ small enough, it follows from Markov's inequality that the tail of ‖W‖_{1/2+ǫ,∞} decays fast for all s ≥ 0. Hence we deduce from (24) that, for S_{H,ǫ} large enough, the required inequality holds true. Let us define p_S := P(‖W‖_{1/2+ǫ,∞} ≤ S_{H,ǫ}) and observe that p_S is independent of k. The second term in (22) is D_t(τ_k − 1, τ_k), which is controlled through the process R defined in (20). From Lemma 4.3, we recall the corresponding bound with probability p_R. The independence of the processes involved then allows us to combine these two estimates, where p_S and p_R are as in the previous paragraphs. This concludes the proof.

Probability tails of τ k
Proposition 4.6. With the previous notations, we have that for all p ≥ 1 and for all t > 0:
Proof. Recall that in the previous subsection we defined τ_k = 1 + τ_{k−1} + ∆_k, with ∆_k as in (19). Hence, for any p ≥ 1, the moments of τ_k are controlled by those of the ∆_j. Using the moment bounds on the variables S_{j,ǫ}, j ∈ N, the desired result readily follows from Markov's inequality.

Contraction between successive stopping times
where Z is as in (12) with τ = 0. Define the following sets, for any r > 0, and set the following hitting times, for any continuous process X and Borel set E:

T_E(X) := inf{t ≥ 0 : X_t ∈ E}.

From the definition of R in (C1 ii), we know that contraction happens on C_r.
Proof. Owing to (C1), it is enough to prove the result when |y| ≤ R. Let R̄ be a positive number strictly greater than R and assume that |x| ≥ R̄. For a given β ∈ (0, 1], set the quantity below. By (C1), the first bound holds. On the other hand, for any small positive ε, a complementary bound holds. As a consequence, for all |x| ≥ R̄ and y ∈ R^d, the combined estimate follows. Since β(R̄) → 1 as R̄ goes to infinity, we can fix R̄_0 such that for any R̄ ≥ R̄_0, β(R̄) ≥ 3/4. Let R̄ ≥ R̄_0 and fix ε = κ/2. Then, take R̄ large enough in such a way that the estimate holds for any |x| ≥ R̄. Then, the result holds with κ̃ = κ/4.
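The conclusion of this lemma, contraction towards an arbitrary y once |x| is large enough, can be checked numerically on a toy gradient drift satisfying (C1), b(x) = −4x³ (our choice); here the constant κ̃ = 3R̄² comes from the elementary bound x² + xy + y² ≥ (3/4)x²:

```python
import numpy as np

def b(x):
    return -4.0 * x**3   # toy drift satisfying (C1), with b = -U', U(x) = x^4

# Once |x| >= R_bar, the drift is contractive towards ANY y, including
# points inside the ball B(0, R) where (C1 ii) gives no information:
R_bar = 2.0
kappa_tilde = 3.0 * R_bar**2   # from x^2 + x*y + y^2 >= (3/4) * x^2
xs = np.concatenate([np.linspace(-6.0, -R_bar, 50), np.linspace(R_bar, 6.0, 50)])
for x in xs:
    for y in np.linspace(-5.0, 5.0, 101):
        inner = (x - y) * (b(x) - b(y))
        assert inner <= -kappa_tilde * (x - y)**2 + 1e-9
```

This is exactly the mechanism exploited in the proof of the contraction: pushing X outside a large ball with the innovation Z makes the coupled pair contract regardless of where Y sits.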
Before stating the next lemma, which is crucial to prove the contraction, we recall the definition of the Cameron-Martin space of Z^{(i)}. We refer to Chapter 8.4 and Appendix F in [14] for a general account of the link between the Cameron-Martin space and its realisation as a reproducing kernel Hilbert space. For any t ∈ [0, 1], consider the associated kernel function. There exists a Hilbert space H(Z^{(i)}) of functions on T, where the completion is taken with respect to the norm induced by the inner product. Recall from Remark 2.6 that, without loss of generality, we can assume that the support of G intersects [−1, 0]. Moreover, in view of (C2 i), Z has at least one non-degenerate component. Let us assume, without loss of generality, that the first component is non-degenerate.
Proof. As a consequence of the Cameron-Martin formula for Gaussian measures [15, p. 216], it suffices to prove that the probability on the RHS of the corresponding inequality is positive. For this, consider the pseudo-metric d_{Z^{(1)}} induced by Z^{(1)} and its entropy number N(ε), where the infimum runs over all n-tuples of d_{Z^{(1)}}-balls of radius at most ǫ.
Here, for h as in Lemma 4.3, we deduce from (21) a bound on the entropy number. Note that if there is a map F such that N(ε) ≤ F(ε) and if there exist 1 < c_1 ≤ c_2 < ∞ such that c_1 F(ε) ≤ F(ε/2) ≤ c_2 F(ε) for any ε > 0, then P(sup_{t∈[0,1]} |Z^{(1)}_t| ≤ ε) > 0 for any ε > 0, and thus the small-ball probability is positive. We now turn to the second part of this proof. Let t_1 ∈ (0, 1) be such that E[(Z^{(1)}_{t_1})²] > 0. Therefore, the function ϕ built from t_1 belongs to H(Z^{(1)}).
Remark 5.3. When G is a fractional Brownian motion, Z is the so-called Riemann-Liouville process. In that case, it is known that the Cameron-Martin space of Z is equivalent to that of the fBm, which is dense in C_0([0, 1]). Thus, by a general result on Gaussian measures, the support of P_Z is C_0([0, 1]) (see for instance Theorem 3.6.1 in [1]), which implies the conclusions of Lemma 5.2.
Proposition 5.4. (i) There exist some positive η and δ, depending only on K, such that for any x ∈ R^d and any (d_t)_{t∈[0,1]} ∈ C_0(K), there exist random times 0 ≤ T_1 < T_2 ≤ 1 such that the process X^{x,d} defined by (25) satisfies the following property with probability greater than η:
(ii) If (C1) holds, there exists ρ > 0 such that for all p ≥ 2 and for all x, y ∈ R^d:
Proof. (i) The proof is based on Lemma 5.2 and on the fact that Z is almost surely α-Hölder continuous for a given positive α ∈ (0, 1) (for this last point, see Equation (26) and proceed as in Lemma 4.3). According to the assumptions made at the beginning of this section, Z^{(1)} is the component of Z with a non-degenerate support; let ϕ and t_0 < t_1 ∈ [0, 1] be as in Lemma 5.2. The first idea is to build a deterministic path ϕ : R → R which, up to ε, guarantees reaching a contraction region. We emphasise that the path ϕ is built carefully in order to avoid dependency on the parameters, in particular on the initial condition x and on d. This leads to very rough controls (the arguments could be refined in view of quantitative bounds): we calibrate a value C_1 such that, for a small ε and for all processes (d_t)_{t∈[0,1]} with ‖d‖_{∞,[0,1]} ≤ K, the following holds. To this end, let us remark that it is enough to prove the property when |x| ≤ R̄ + K + 1. In this case, assume that T_1 > t_1. Then (X^{x,d}_t)_{t∈[0,t_1]} ⊂ B(0, R̄ + K + 1) and hence the bound (28) follows, where, for a given positive r, ‖b‖_{∞,r} = sup_{x∈B(0,r)} |b(x)|. But if we set the constants accordingly, we remark that the right-hand member of (28) is greater than R̄ + K + 1 on the event Ω_1, which leads to a contradiction on Ω_1. More precisely, if ϕ is a deterministic path such that |ϕ(t_1)| = C_1, then the property holds for any x ∈ R^d and any admissible d. If T_2 ≥ 1, the proof is achieved. Otherwise, we argue on Ω_1. Let α ∈ (0, 1). For a given C_2 > 0, define the event Ω_2. If ω ∈ Ω_1 ∩ Ω_2, the stated estimates hold for all such ω. We can now conclude the proof. Let η := P(Ω_1)/2. By Lemma 5.2, η > 0.
Let α > 0 be such that Z is α-Hölder continuous. Then, there exists C_2 large enough such that P(Ω_2) ≥ 1 − η. For this value, we set δ = δ(C_2). Then, by construction, the announced statement is true on Ω_1 ∩ Ω_2. This concludes the proof of (i).
(ii) Let F be the random (a.s. differentiable) function defined by F(t) := |X^{x,d}_t − X^{y,d}_t|^p. By the first statement and Lemma 5.1, we obtain that on Ω_ϕ, for every t ∈ [T_1, T_2], F′(t) ≤ −(p/2) κ̃ F(t). Hence F contracts exponentially on [T_1, T_2]. But since ⟨x − y, b(x) − b(y)⟩ ≤ 0, the mapping t ↦ |X^{x,d}_t − X^{y,d}_t|^p is non-increasing everywhere, and hence the contraction obtained on [T_1, T_2] extends to [0, 1]. It follows that the resulting contraction factor can be chosen independently of p. This concludes the proof.
We now assume that we are given a sequence satisfying (16).
Proposition 5.5. Assume that τ_0 = 0 and that (τ_k)_{k∈N} is a sequence of stopping times which satisfies (16). Then there exists ρ ∈ (0, 1) such that (18) holds true for any p ≥ 2 and any k ∈ N.
Proof. Denote by Ω^k_R the event, of the form {‖·‖_{∞,[0,1]} ≤ K}, on which the recent past is controlled. It is now clear that by "recent past" we mean events that are σ(W_s − W_{1+τ_k}, 1 + τ_k ≤ s ≤ τ_{k+1})-measurable. Thus on Ω^k_R the whole past is controlled, since (Ω^k_R)^c is σ(W_s − W_{1+τ_k}, 1 + τ_k ≤ s ≤ τ_{k+1})-measurable and, as mentioned previously, the remote past is controlled by construction. Hence we get, in view of Proposition 5.4(ii), the announced contraction on Ω^k_R. Hence, we can use again the independence between Ω^k_R and X_{1+τ_k} − Y_{1+τ_k} to deduce from the previous inequalities the desired result, with ρ := 1 − η + ρ_0 η ∈ (0, 1), where ρ_0 is the constant of Proposition 5.4(ii).
Remark 5.6. Note that the assumption on the independence of the components of G only appeared in Lemma 5.2. Thus, in order to extend Theorem 3 to the case where the components of G may be dependent, observe first that Z^{(1)} is now a sum of d independent processes, at least one of them having a non-degenerate support. It should be clear that the first part of Lemma 5.2 is unchanged once H(Z^{(1)}) is identified. Thus we claim that H(Z^{(1)}) is now spanned by the corresponding family of kernel functions and is still non-degenerate, in view of Remark 2.6. Hence, taking now t_1 such that E[(Z^{(1)}_{t_1})²] > 0, the rest of the proof follows accordingly.

Proof of Theorems 1 and 3
Recall that Theorem 1 is a special case of Theorem 3 for a fractional noise. Hence we present the proof of the latter, which is built as follows. In Subsections 6.1 and 6.2, we consider the L²-control related to the synchronous coupling of solutions to the SDE starting from x and y respectively. In the first one, we show that the L²-distance admits some polynomial bounds. Then, owing to a universal argument, we prove that this control extends to a sub-exponential bound. Finally, the proof of Theorem 3 is achieved in Subsection 6.3, where we integrate our bounds with respect to the invariant distribution.

Proposition 6.1. There exists ̺ > 0 such that for any p ≥ 1 and any t > 0:

Polynomial bounds
Proof. Using Equation (15), Proposition 5.5 and Proposition 4.6, we obtain the desired result.

From polynomial to sub-exponential bounds
Since the result of Proposition 6.1 is true for any large p, we shall obtain a sub-exponential rate of convergence (see Proposition 6.3 below) by looking carefully at C p . Before doing so, we need to recall Fernique's theorem, which gives the existence of λ 0 ∈ R + such that, for any λ < λ 0 , We will need the following consequence of Fernique's theorem: Lemma 6.2. Let β < 2 and B ∈ R + . Then there exists C > 0 such that for any λ < λ 0 ,

Now from Markov's inequality and Fernique's theorem,
Distinguishing between the cases where λx 2/β−1 − B is larger or smaller than 1, we get the desired result.
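The Markov-inequality step can be made explicit. The following is a sketch under the stated form of Fernique's theorem, i.e. assuming E exp(λ‖G‖²) < ∞ for every λ < λ 0 , where ‖·‖ denotes the relevant Gaussian norm:

```latex
% Tail bound for the Gaussian norm, from Markov's inequality:
\mathbb{P}\bigl(\|G\| > x\bigr)
 = \mathbb{P}\bigl(e^{\lambda\|G\|^{2}} > e^{\lambda x^{2}}\bigr)
 \le e^{-\lambda x^{2}}\,\mathbb{E}\,e^{\lambda\|G\|^{2}},
 \qquad 0 < \lambda < \lambda_{0}.
```

Lemma 6.2 is then obtained by combining this tail with the case distinction described in its proof.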
Proposition 6.3. Let the assumptions of Theorem 1 hold and let (X, Y ) be the solution to (13) starting from (X 0 , Y 0 ) ∈ (L 2(1+υ) ) 2 . Then for any ε ∈ (0, α + 1/2), there exists C > 0 such that and recall that C ̺,p,υ = M υp/(1+υ) Σ k∈N e −̺k/(1+υ) (k+1) pυ/(1+υ) . Let also P be a countable subset of (1, +∞) and (α p ) p∈P a sequence of positive numbers, to be chosen in the next paragraph. By Proposition 6.1, we have that Hence Let Γ denote the Gamma function and γ υ = (1 + υ −1 )γ, where γ is given in the proposition. We choose α p of the form where the value of A will be set later. Let also P = {γ υ n : n ∈ N such that γ υ n ≥ 1}. We obtain for some positive c γ υ which is independent of t. According to (30), this yields Hence it remains to prove that Σ p∈P α p < ∞. We observe that Hence there are two constraints: i) that γ υ /(αε) < 2, to ensure that the expectation is finite for any k, and ii) that γ cannot be too large, to ensure that the previous series converges.
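The mechanism that upgrades the polynomial bounds of Proposition 6.1 into a sub-exponential one can be sketched in a simplified form. The constants A, γ > 0 below are hypothetical placeholders (the actual constants C ̺,p,υ in the proof are more involved); the point is only the optimisation over p:

```latex
% Simplified version of the polynomial-to-sub-exponential trick.
% Assume, for every p \ge 1 and t > 0,
\mathbb{E}|X_t - Y_t|^2 \;\le\; A^{p}\, p^{\gamma p}\, t^{-p}
  \;=\; \exp\bigl( p\,(\log A + \gamma \log p - \log t) \bigr).
% Minimising the exponent over p (set its derivative
% \log A + \gamma \log p + \gamma - \log t to zero) yields
p^{*} = e^{-1}\,(t/A)^{1/\gamma},
% and, for t large enough so that p^{*} \ge 1, plugging p^{*} back in gives
\mathbb{E}|X_t - Y_t|^2 \;\le\; \exp\bigl( -\gamma e^{-1} (t/A)^{1/\gamma} \bigr),
% i.e. a sub-exponential bound of order \exp(-c\, t^{1/\gamma}).
```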

Proof of Theorem 3
First, notice that the existence of the stationary law ν̄ of (1) is given by Proposition A.4. One can thus consider a random variable Y 0 ∼ ν̄ and Y the solution to (1) started from Y 0 . According to Proposition A.4, Y 0 has moments of any order, thus Proposition 6.3 applies and we obtain: In view of (14), Equation (3) of Theorem 1 now follows (for any noise satisfying (C2)). As for the functional version (4), it is an easy consequence of the previous result and of the fact that the mapping t → E|X t − Y t | 2 is non-increasing (see Remark 2.1). This concludes the proof of Theorem 3.

From Wasserstein to Total Variation Bounds
In this part, the aim is to prove Theorems 2 and 4. As mentioned before, the idea of the proof is the following: for a given t ≥ 0, first use the rate of convergence in Wasserstein distance by letting the fBms be identical until time t − 1. Then, attempt a coalescent coupling between times t − 1 and t and hope that the fact that the paths are very close leads to a high probability of success (by success, we mean that X t = Y t ). Such a strategy works if one is able to obtain a precise estimate of the probability of success at time 1 for two paths starting from two points x and y. Let us remark that the non-Markovian feature of the process leads to some specific difficulties. For instance, a strategy like the mirror coupling seems difficult to use here, since such a coupling only ensures that the paths meet in a finite time (which can be controlled); unfortunately, the price to pay for the paths to remain stuck seems too costly in this case. We thus follow the strategy initiated by Hairer [10], based on the addition of an adapted drift term and on the Girsanov theorem. However, we will see that such an approach works for the fractional Brownian motion, for which the Volterra kernel has an explicit inverse, whereas we will need to add ad hoc assumptions in the general case.
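As a toy illustration of the Wasserstein part of this strategy, the sketch below simulates two one-dimensional paths driven by the same noise (a synchronous coupling). Everything in it is a simplifying assumption: the Gaussian noise is replaced by a standard Brownian motion, and the drift b (zero on [−1, 1], contractive outside) is a hypothetical example of a semi-contractive vector field, not one taken from the paper.

```python
import numpy as np

def b(x):
    # Hypothetical semi-contractive drift: b vanishes on the compact set
    # [-1, 1] (no contraction there) and is contractive outside of it.
    return -x + np.clip(x, -1.0, 1.0)

def synchronous_coupling(x0, y0, T=20.0, n=20000, seed=0):
    """Euler scheme for two copies of dZ = b(Z) dt + dW driven by the SAME
    Brownian increments; the noise cancels in the difference X - Y."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x, y = x0, y0
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))
        x += b(x) * dt + dw
        y += b(y) * dt + dw
    return x, y

x, y = synchronous_coupling(5.0, -5.0)
# |x - y| has contracted from 10 to (at most) about the diameter of the
# non-contractive region; further decay relies on excursions of the
# common noise, which is the source of the sub-exponential rates.
print(abs(x - y))
```

Since the drift difference b(X) − b(Y) vanishes when both paths sit in the flat region, the distance cannot decay exponentially there; it is the excursions of the common noise outside the compact set that drive the slower, sub-exponential convergence.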

A first general property
The first step is independent of the Gaussian kernel. The idea is to identify a drift term which, when added to the Gaussian noise of one of the components, yields sticking at time 1.
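The mechanism behind such a drift can be sketched as follows. This is a plausible reconstruction in the spirit of [10, Lemma 5.8] (the function ϕ S actually built in the next proposition may differ): writing ρ(t) = y t − x t and z(t) = |ρ(t)|², one cancels the drift difference and adds a term with a sub-linear power.

```latex
% Sketch (hypothetical choice of the sticking drift), for \beta \in (0,1):
\varphi_S(t) = b(x_t) - b(y_t) - \kappa\,\frac{\rho(t)}{|\rho(t)|^{\beta}},
\qquad \text{with the convention } 0/|0|^{\beta} = 0 .
% Then \rho'(t) = b(y_t)-b(x_t)+\varphi_S(t) = -\kappa\,\rho(t)/|\rho(t)|^{\beta}, so
z'(t) = 2\,\langle \rho(t), \rho'(t)\rangle = -2\kappa\, z(t)^{1-\beta/2},
\qquad\text{hence}\qquad
z(t)^{\beta/2} = \bigl(z(0)^{\beta/2} - \kappa\beta\, t\bigr)_{+} .
% Choosing \kappa \ge z(0)^{\beta/2}/\beta forces z(1) = 0, i.e. x_1 = y_1.
```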
Proposition 7.1. For a given function ϕ : R + → R, denote by E ϕ (x, y) the SDE defined by starting from (x, y). Then, one can build a C 1 -function (ϕ S (t)) t≥0 , adapted with respect to σ(G s , s ∈ (−∞, t)), such that the (well-defined) solution (x t , y t ) t≥0 to E ϕ S (x, y) satisfies x 1 = y 1 a.s. and such that where c does not depend on (x, y) and ϕ S . Furthermore, if b is Lipschitz continuous, then for any β ∈ (0, 1/2), there exists a positive constant c such that Proof. To build the function (ϕ S (t)) t∈[0,1] , one slightly adapts the proof of [10, Lemma 5.8]. More precisely, one sets ρ(t) = y t − x t and remarks that if ϕ S is continuous, then ρ is a C 1 -function which solves Let us notice that ρ is certainly a random function depending on (x t ) t∈[0,1] and thus on (G t ) t∈[0,1] . Then, set z(t) = |ρ(t)| 2 . Owing to Assumption (C1), Let β ∈ (0, 1). Setting, with the convention 0/|0| β = 0, one obtains: In particular, z(1) = |y 1 − x 1 | 2 = 0. Furthermore, there exists c independent of x and y such that But, if b is Lipschitz continuous, a constant c exists such that The result follows. Now, we need to control the corresponding underlying Wiener increments related to the moving-average representation (6). More precisely, let (x(t), x̃(t)) t≥0 be a couple of solutions to where (G, G̃) is a couple of two-sided Gaussian processes with kernel G and underlying two-sided Wiener processes (W, W̃ ), as in (6). We also assume that and hence that (G̃ t ) t≤0 = (G t ) t≤0 a.s. With these notations, one needs to answer the following question: if, on a subset of Ω, G̃ t = G t + ∫_0^t ϕ S (s)ds, what must be the corresponding relationship between W and W̃ (on this same subset of Ω)? At this stage, we choose to separate the fractional and general cases:

The fractional case
The proof of Theorem 2 is achieved at the end of this section and follows from the next two propositions. Here, we will denote the couple (G, G̃) introduced in (34) by (B H , B̃ H ). and such that the following bound holds true: where c is a positive deterministic constant independent of x and y.
(ii) Let (x t ,x t ) t≥0 denote a solution to (34). There exists a constant C > 0 such that for any x, y ∈ R d such that |x − y| ≤ 1, where κ is defined in (i).
(iii) Furthermore, there exists a constant C > 0 such that for any x, y ∈ R d such that |x − y| ≤ 1, where for some path (z(t)) t≥0 and a given T > 0, z T +. = (z T +t ) t≥0 (and κ is defined in (i)). Remark 7.3. When H > 1/2, the result is still true for any κ ∈ (0, 1). Since it has no impact on the final exponent, we choose to state the result with κ = 1/2. The third statement emphasises the fact that one is able to keep the paths together up to infinity and that the cost is of the same order as the one for sticking the positions. Let us remark that, in contrast to [10] where the strategy of proof is based on a series of attempts, there is only one attempt here. This has several consequences on the proof. First, in the sticking part (corresponding to (ii)), a standard "optimal coupling" can be used, since one does not need to worry about what happens when the coupling attempt fails. More precisely, it is not necessary to build a coupling strategy where one controls the distance between the underlying Wiener processes in case of failure. Similarly, in (iii), where the idea is to keep the paths together, the strategy of [10] was to try to obtain this property successively on a series of intervals (whose lengths increase exponentially) in order to preserve the possibility of retrying the attempt in case of failure. Here, the fact that there is only one attempt implies that the coupling strategy is built in such a way that at time 1, there are two possibilities: staying together up to infinity or failing.
Proof. (i) Once again, the proof follows the lines of [10]. More precisely, by (9) Thus, if H < 1/2, where in the last line we used the controls established in Proposition 7.1. When H > 1/2, one uses the last statement of Lemma 4.2 of [10]: (ii) By construction, for any couple (W, W̃ ) of Brownian motions on [0, 1], the corresponding couple of solutions satisfies: As a consequence, where Υ(w) = w + ∫_0^· Ψ S (s)ds, and where P W denotes the Wiener distribution on C([0, 1], R d ) (the last equality is obtained by a maximal coupling, see e.g. [6]). By Girsanov's theorem, we know that Υ * P W is absolutely continuous with respect to P W , with density D defined P W -a.s. by: where we choose to write Ψ S = Ψ w S in order to keep in mind that Ψ S is not deterministic. Thus, and hence, Let M > 0 and denote by We have: But, Then, by Jensen's inequality and the fact that |x − y| ≤ 1, one deduces from (i) that On the other hand, by the Cauchy-Schwarz and Chebyshev inequalities, ∫ (1 + D(w)) 2 P W (dw).

But ∫ (1 + D(w)) 2 P W (dw) ≤ 3 + ∫ D(w) 2 P W (dw). As a consequence, there exists a constant C independent of M such that for any x, y such that |x − y| ≤ 1, To conclude, it is now sufficient to set M = 1 (for instance).
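For reference, the Girsanov density invoked in part (ii) has the classical form below. This is a sketch, with Ψ w S the adapted drift of the proof and the required integrability taken for granted:

```latex
% Classical Girsanov density of \Upsilon_* P_W with respect to P_W:
D(w) = \exp\!\left( \int_0^1 \bigl\langle \Psi^w_S(s), \mathrm{d}w(s) \bigr\rangle
        \;-\; \frac{1}{2}\int_0^1 \bigl|\Psi^w_S(s)\bigr|^2\,\mathrm{d}s \right),
\qquad P_W\text{-a.s.}
```

In particular, D integrates to 1 against P W , which is what makes the expansion of ∫ (1 + D(w)) 2 P W (dw) in the proof work.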
In other words, we suppose that the positions have stuck at time 1. In order to keep the paths together after time 1, we need that Then, let us remark that by (32) where c H is a positive constant. By construction, (Ψ S (t)) t≥1 is a σ(G s , s ≤ 1)-measurable function which satisfies a.s.: where C is a deterministic constant independent of x and y. By Proposition 7.1, we deduce that Combining this with the first statement, one deduces that a universal constant C exists such that for every x, y such that |x − y| ≤ 1, By the same strategy as in the proof of (ii), one deduces the result. Actually, by construction, and hence, denotes the Wiener distribution on C([0, ∞), R d ). Then, following the lines of (ii), one deduces from (38) that which yields the result. Proposition 7.4. Let t > 1 and assume that (X, Y ) is a couple of solutions of the fractional SDE such that (W̃ s ) s≤t−1 = (W s ) s≤t−1 and such that there exists c 1 > 0 such that Then, there exist a constant c 2 > 0 and a coupling of (W̃ s ) s∈[t−1,+∞) and (W s ) s∈[t−1,+∞) such that Proof. Let ε ∈ (0, 1]. Since we assume that (W̃ s ) s≤t−1 = (W s ) s≤t−1 , we deduce from Proposition 7.2 that the increments of (W, W̃ ) can be built on [t − 1, ∞) in such a way that if |X t−1 − Y t−1 | ≤ 1, Then, By Markov's inequality and (39), On the other hand, by (40), In order to optimise, we choose ε in such a way that The result follows.
Proof of Theorem 2: Let us recall that the first Wasserstein estimate of Theorem 1 is obtained through a synchronous coupling. Hence, (39) holds with ρ = γ (defined in Theorem 1). Theorem 2 is then a direct consequence of Proposition 7.4.

The general case
First, assume that h j satisfies (C3 i ) and let δ > 0. We have: The fact that lim t→0 h j (t) = 0 implies that the first term on the right-hand side goes to 0 as δ → 0. As a consequence, But using that h j is C 1 on (0, 1], Then, by the integrability condition on h ′ j in Assumption (C3 i ) and by Proposition 7.1, one deduces that a constant c exists such that for every t ∈ (0, 1], Second, consider the case where h j satisfies (C3 ii ). By Proposition 7.1, ϕ S is C 1 on [0, 1]. By an integration by parts, one obtains: Since lim t→0 − I h j (t) = 0 and since h j is locally integrable (by Assumption (C3 ii )), a similar argument as before shows that Then, since h j belongs to L 2 ([−1, 0], R), one deduces from Proposition 7.
From what precedes and from (41), one deduces that a constant c exists such that But, d/dt and hence, using Jensen's inequality, By Proposition 7.1, one deduces that for x, y such that |x − y| ≤ 1,

About the existence of h i in (C3)
As mentioned before, the verification of Assumption (C3) seems to be a difficult problem that we choose not to address in this paper. Nevertheless, in this section, we show that this problem (at least the existence of h i ) can be connected with the inversion of the Laplace transform of the kernel G. For the sake of simplicity, let us consider the one-dimensional case and assume that Assumption (C3) is fulfilled. Then, plugging (11) into (10) and dropping the index i for short, one gets Let us for instance treat the case of h satisfying (C3 i ). Then one gets This equality holds for any ϕ ∈ C 1 (R + ; R d ) and for any t ≥ 0 if and only if or equivalently that Denoting the Laplace transform of a function f by L f (p) = ∫_0^{+∞} e −pt f (t) dt, p > 0, and applying it to both sides of this equality, it follows that Hence, it would suffice to find h such that L h (p) = 1/(p 2 L g (p)). However, it is generally a difficult matter to find, or even to prove the existence of, the inverse Laplace transform. For instance, the Bromwich-Wagner formula provides a general criterion to invert the Laplace transform [17, p. 268].
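For specific kernels this inversion can at least be carried out formally. The sketch below uses only the classical identity L_{t^{a−1}}(p) = Γ(a) p^{−a} (valid for a > 0); it is purely formal and says nothing about whether the resulting h satisfies (C3):

```latex
% Formal inversion for the fractional kernel g(t) = t^{H - 1/2}, H \in (0,1):
\mathcal{L}_g(p) = \Gamma\bigl(H + \tfrac{1}{2}\bigr)\, p^{-(H + 1/2)},
\qquad\text{hence}\qquad
\mathcal{L}_h(p) = \frac{1}{p^2 \mathcal{L}_g(p)}
  = \frac{p^{H - 3/2}}{\Gamma\bigl(H + \tfrac{1}{2}\bigr)} .
% Since H - 3/2 < 0, inverting with the same identity (a = 3/2 - H > 0) gives
h(t) = \frac{t^{1/2 - H}}{\Gamma\bigl(\tfrac{3}{2} - H\bigr)\,\Gamma\bigl(H + \tfrac{1}{2}\bigr)} .
```

Note that the candidate t^{1/2−H} is singular at 0 when H > 1/2 and unbounded at infinity when H < 1/2, which already hints at why Assumption (C3) is delicate to verify.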
To illustrate the limitations of this approach, and the reason why we do not develop this question further, we take the example of the fractional kernel g(t) = t H−1/2 . In that case, one has L g (p) ≈ p −H−1/2 , and the map

Acknowledgement. The authors are thankful to Benjamin Arras, who suggested the trick to go from a polynomial to a sub-exponential bound.

A Invariant distribution of Gaussian driven SDEs
In this section, we give some details about the definition and the existence of an invariant distribution for general Gaussian driven SDEs (see [4] for a similar but more probabilistic definition). As mentioned before, we use the construction of [10] (devised for fractional SDEs), which builds a stochastic dynamical system (SDS) over SDE (1). Denote by C ∞ 0 (R − ) the set of C ∞ -functions w from (−∞, 0] to R with compact support such that w(0) = 0, and set, for a given ρ ∈ (0, 1), The map w → ‖w‖ ρ;α,ζ defines a norm on C ∞ 0 (R − ), and one denotes by H ρ;α,ζ the closure of C ∞ 0 (R − ) for the norm ‖·‖ ρ;α,ζ . When (ζ − α) + is removed from the previous definition (or if (ζ − α) + = 0), we simply write H ρ and ‖·‖ ρ for its norm; it is proven in Lemma 3.5 of [10] that H ρ is a Polish space for any ρ ∈ (0, 1). The first step of the construction of the SDS consists in considering the Volterra-type operator related to the kernel G. Following the lines of [10], we expect to be able, for each i ∈ {1, . . . , d}, to give a "regular" construction of the moving-average operator D g i related to (6), where for a function g : (−∞, 0] → R and a smooth function w : R → R with compact support, the operator D g is defined by Proposition A.1. Assume that g is a one-dimensional kernel satisfying (C2) and let ρ ∈ (2ζ ∨ 0, 1) and ρ̃ = 2 ∧ (ρ − 2ζ). Then the linear operator D g is bounded (continuous) from H ρ to H ρ̃;α,ζ .
Proof. Our proof closely follows the one from [10, Lemma 3.6], the difference lying in the use of assumption (C2) on the general kernel g. We have to prove that D g is bounded, i.e. that for any w ∈ C ∞ 0 (R − , R), ‖D g w‖ ρ̃;α,ζ ≤ C ‖w‖ ρ . Without loss of generality, let t > s and set h = t − s. Assume first that h ∈ [0, 1].
We shall use the exact same stationary noise process as the one constructed in Lemma 3.10 of [10], namely where W is the Wiener measure on H ρ (which is in fact H ×d ρ , by a slight abuse of notation), (P t ) t≥0 is the transition semigroup associated with W (for which W is the only invariant measure) and (θ t ) t≥0 is an appropriate shift operator (see [10, pp. 722-723] for precise definitions). The second step is to show some existence, uniqueness and regularity properties related to SDE (1). To this end, consider for any T > 0, for each x ∈ R d and each g ∈ H ρ̃ , the solution Ξ(x, g) of the following ODE: We have the following property: is a well-defined function. Furthermore, Ξ is locally Lipschitz continuous on R d × C([0, T ], R d ).
Proof. This result corresponds to Lemma 3.9 of [10]. The only difference lies in the assumptions on the drift function, which are slightly more general in our setting (more precisely, we make no assumption on the derivative of b). We thus provide some details. First, let x ∈ R d and g ∈ C([0, T ], R d ). For a given t 0 > 0, let F be the map from C([0, t 0 ], R d ) to C([0, t 0 ], R d ) defined by F (y)(t) = x + ∫_0^t b(y(s))ds + g(t), t ∈ [0, t 0 ]. Let A r,x := {y : y(0) = x, ‖y − x‖ ∞,[0,t 0 ] ≤ r}. For y ∈ A r,x , the fact that b is locally Lipschitz continuous implies that there exists a constant C r 0 ,x such that for any r ∈ (0, r 0 ] and any t ∈ [0, t 0 ], so that for a small enough t 0 , the set A r,x is stable under the map F . Furthermore, it can be checked that for t 0 small enough, the map F is also contractive on A r,x , so that by the Banach fixed-point theorem, existence and uniqueness classically hold for Ξ(x, g) on C([0, t 0 ], R d ). But, owing to Lemma A.3 below, there exists a constant C T depending only on T such that Then, a maximality argument shows that Ξ(x, g) is well-defined on [0, T ].
Let us now prove the local Lipschitz property. For any positive r 1 and r 2 , set B = {(x, g) : |x| ≤ r 1 , ‖g‖ ∞,[0,T ] ≤ r 2 }. Using the fact that the control of the solutions established in (42) is locally uniform in the variable (x, g) (and available for t 0 = T ), one deduces that a constant C exists such that for any (x, g) and (y, g̃) ∈ B, By a Gronwall argument, this implies that (x, g) → Ξ(x, g) is Lipschitz continuous on B.
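The fixed-point construction used in the proof above can be sketched numerically. The grid, the drift b(y) = −y and the perturbation g below are hypothetical illustration choices; the code merely iterates the map F(y)(t) = x + ∫_0^t b(y(s))ds + g(t) on a discretisation of [0, T].

```python
import numpy as np

def picard_solve(b, x0, g, T=1.0, n=1000, iters=60):
    """Iterate the fixed-point map F(y)(t) = x0 + int_0^t b(y(s)) ds + g(t)
    on a uniform grid (left-rectangle rule for the integral)."""
    t = np.linspace(0.0, T, n + 1)
    gt = g(t)
    dt = T / n
    y = x0 + gt  # initial guess
    for _ in range(iters):
        # Left-rectangle approximation of int_0^t b(y(s)) ds on the grid.
        integral = np.concatenate(([0.0], np.cumsum(b(y[:-1]) * dt)))
        y = x0 + integral + gt
    return t, y

# Sanity check on a case with a closed form: b(y) = -y and g = 0 give
# y(t) = x0 * exp(-t), up to the discretisation error of the grid.
t, y = picard_solve(lambda y: -y, 1.0, lambda s: np.zeros_like(s))
print(abs(y[-1] - np.exp(-1.0)))
```

On [0, T] with a Lipschitz drift, the iterates converge at factorial speed (the classical (LT)^k / k! Picard bound), so a few dozen iterations are ample here.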
Then, the following controls hold true: for any T ≥ 0, a constant C exists such that for any t ∈ [0, T ], Proof. First, (C1) implies that a constant β exists such that for any x, y ∈ R d , where C denotes a positive constant. Then, let h denote the function defined by h(t) = e αt |x(t) − y(t)| 2 .
We have The first statement follows. As for the second one, it is a direct consequence of Gronwall's lemma.
For a given T ≥ 0, let R T denote the shift operator from C((−∞, 0], R d ) to C([0, T ], R d ) defined by: for every t ∈ [0, T ], R T u(t) = u(t − T ) − u(−T ). This operator is needed to recover the increments of G, in the following (at least formal) sense: for a given t 0 ≥ 0, In view of what precedes, one can now realise the SDE through the mapping ξ : R + × R d × H ρ → R d , (t, x, w) → Ξ(x, R t D g w)(t).
Following the lines of [10], one deduces from Proposition A.1, Proposition A.2 and the embedding H ρ̃;α,ζ ↪ C(R − ; R) (for any ρ̃ > 0) that ξ defines a stochastic dynamical system (SDS) over SDE (1) and the stationary noise process defined above. The embedding of the SDE into this SDS structure leads to the definition of a homogeneous Feller Markov transition (see [10] for details), and thus to invariant distributions on R d × H ρ (related to this transition). We have the following result: We denote by L 2 g this class of functions and denote by D * g the operator which acts on φ ∈ L 2 g as follows: D * g φ(s) = ∫_0^{+∞} (φ(s) − φ(s + u)) g ′ (−u) du.
Lemma A.5. For any t > 0, the function defined by φ t (s) := 1 [0,t] (s) e s−t , s ∈ R, belongs to L 2 g . Hence, if G is the one-dimensional Gaussian noise with kernel g constructed on the two-sided Wiener process W , we have Besides, Proof. To prove that lim ε→0 ∫_ε^{+∞} (φ t (s) − φ t (s + u)) g ′ (−u) du exists, we use the continuous differentiability of φ t on (0, t) and the fact that |u g ′ (−u)| ≤ C u −ζ/2 for u near 0 (see Lemma 4.1 d)), which is integrable. Thus φ t belongs to the domain of D * g (as does any continuously differentiable function). Next we prove that (hence in particular that D * g φ t ∈ L 2 (R) for any t ≥ 0). We have that For the first term on the right-hand side of (47), it is clear that its supremum for t ∈ [0, 1] is finite. Thus we assume in the following that t ≥ 1. This reads Thus there exists C > 0 independent of t ∈ R + such that We recall the following facts: • there exists C > 0 (independent of t and s) such that |∫_s^{s+1} (1 − e u−s ) g ′ (s − u) du| ≤ C (in view of Lemma 4.1 d)); • C g := sup s∈(−∞,−1] |g(s)| < ∞ (as a consequence of (C2 ii )); • ∫_{−1}^0 g(s) 2 ds < ∞ (see (7)); and deduce from them that (recall that C can change from line to line) Thus we focus on the remaining term, and using (C2 ii ) we get: (1 + t − s) −2(α+1) ds, which is bounded uniformly in t since α > −1/2. Therefore the second term in the RHS of (47) is bounded for t ∈ R + and so we have proven (46).

B.1 Decomposition between purely nondeterministic and deterministic processes (Wold decomposition)
In the following definition, sp A denotes the closure in L 2 (Ω) of the vector space spanned by A ⊂ L 2 (Ω). The representation of stochastic processes as the sum of a deterministic and a purely nondeterministic process was an active field of research in the 1950s and 1960s, after the seminal work of Karhunen. We quote the following result, which is well suited to the framework of this paper. where W is an R d -valued standard Brownian motion and G is an M d -valued function satisfying (7).
Proof. This proof is a generalisation of the proof of [2, Theorem 4.2], which relies on the integral representation of stationary processes given in [9]. Let Y be the process defined from G as in Proposition B.3. Then the result from [9] implies that there exist an R d -valued standard Brownian motion and a kernel G̃ such that G̃ ∈ L 2 (R) and supp G̃ ⊆ R − , and (Y t ) t∈R