Total variation estimates for the TCP process

The TCP window size process appears in the modeling of the famous Transmission Control Protocol used for data transmission over the Internet. This continuous time Markov process takes its values in [0, \infty), is ergodic and irreversible. The sample paths are piecewise linear deterministic and the whole randomness of the dynamics comes from the jump mechanism. The aim of the present paper is to provide quantitative estimates for the exponential convergence to equilibrium, in terms of the total variation and Wasserstein distances.

transmitted, then W is increased by 1, otherwise it is divided by 2 (detection of a congestion). As shown in [DGR02,GRZ04,OKM96], a correct scaling of this process leads to a continuous time Markov process, called the TCP window size process. This process $X = (X_t)_{t \ge 0}$ has $[0, \infty)$ as state space and its infinitesimal generator is given, for any smooth function $f : [0, \infty) \to \mathbb{R}$, by
$$Lf(x) = f'(x) + x\,\bigl(f(x/2) - f(x)\bigr). \tag{1}$$
The semi-group $(P_t)_{t \ge 0}$ associated to $(X_t)_{t \ge 0}$ is classically defined by
$$P_t f(x) = \mathbb{E}\bigl(f(X_t) \mid X_0 = x\bigr)$$
for any smooth function $f$. Moreover, for any probability measure $\nu$ on $[0, \infty)$, $\nu P_t$ stands for the law of $X_t$ when $X_0$ is distributed according to $\nu$.
The process $X_t$ increases linearly between jump times that occur with a state-dependent rate $X_t$. The first jump time of $X$ starting at $x \ge 0$ has the law of $\sqrt{2E + x^2} - x$ where $E$ is a random variable with exponential law of parameter 1. In other words, the law of this jump time has a density with respect to the Lebesgue measure on $(0, +\infty)$ given by
$$f_x : t \mapsto (x + t)\,e^{-t^2/2 - xt}, \tag{2}$$
see [CMP10] for further details. The sample paths of $X$ are deterministic between jumps, the jumps are multiplicative, and the whole randomness of the dynamics relies on the jump mechanism. Of course, the randomness of $X$ may also come from a random initial value. The process $(X_t)_{t \ge 0}$ appears as an Additive Increase Multiplicative Decrease process (AIMD), but also as a very special Piecewise Deterministic Markov Process (PDMP). In this direction, [MZ09] gives a generalization of the scaling procedure to interpret various PDMPs as limits of discrete time Markov chains. In the real world (Internet), the AIMD mechanism allows a good compromise between the minimization of network congestion time and the maximization of mean throughput. One could consider more general processes (introducing a random multiplicative factor or modifying the jump rate) but their study is essentially the same as that of the present process.
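The jump-time representation above lends itself to exact simulation. The following Python sketch is only an illustration: the names `first_jump_time` and `simulate_tcp`, the seed, the time horizon and the sample size are our own choices and do not come from the cited works. Each jump time is drawn as $\sqrt{2E + x^2} - x$ with $E$ exponential of parameter 1, and the state is halved at each jump.

```python
import math
import random


def first_jump_time(x, e):
    """First jump time from x >= 0 given an Exp(1) draw e: sqrt(2e + x^2) - x."""
    return math.sqrt(2.0 * e + x * x) - x


def simulate_tcp(x0, horizon, rng):
    """Return X_horizon for one path of the TCP process started at x0."""
    x, t = x0, 0.0
    while True:
        tau = first_jump_time(x, rng.expovariate(1.0))
        if t + tau >= horizon:
            # No further jump before the horizon: linear growth at speed 1.
            return x + (horizon - t)
        t += tau
        x = (x + tau) / 2.0  # grow linearly during tau, then divide by 2


rng = random.Random(0)  # arbitrary seed, for reproducibility only
samples = [simulate_tcp(1.0, 20.0, rng) for _ in range(2000)]
second_moment = sum(s * s for s in samples) / len(samples)
print(second_moment)
```

Inverting the survival function $t \mapsto e^{-t^2/2 - xt}$ is what makes the simulation exact: no time discretization is involved.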
The TCP window size process $X$ is ergodic and admits a unique invariant law $\mu$, as can be checked using a suitable Lyapunov function (for instance $V(x) = 1 + x$, see e.g. [BCG08,CD08,MT93,HM11] for the Meyn-Tweedie-Foster-Lyapunov technique). It can also be shown that $\mu$ has a density on $(0, +\infty)$ given by
$$x \in (0, +\infty) \mapsto \frac{\sqrt{2/\pi}}{\prod_{n \ge 0}\bigl(1 - 2^{-(2n+1)}\bigr)} \sum_{n \ge 0} \frac{(-1)^n\, 2^{2n}}{\prod_{k=1}^{n}(2^{2k} - 1)}\, e^{-2^{2n-1}x^2} \tag{3}$$
(this explicit formula is derived in [DGR02,Prop. 10], see also [GRZ04,MZ09,MZ06,GK09] for further details). In particular, one can notice that the density of $\mu$ has a Gaussian tail at $+\infty$ and that all its derivatives are null at the origin. Nevertheless, this process is irreversible since time reversed sample paths are not sample paths and it has infinite support (see [LP11] for the description of the reversed process). In [RR96], explicit bounds for the exponential rate of convergence to equilibrium in total variation distance are provided for generic Markov processes in terms of a suitable Lyapunov function, but these estimates are very poor even for classical examples such as the Ornstein-Uhlenbeck process.
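The series for the density of $\mu$ can be checked numerically. The sketch below assumes that the density reads $c \sum_{n \ge 0} a_n e^{-2^{2n-1}x^2}$ with $a_n = (-1)^n 2^{2n} / \prod_{k=1}^{n}(2^{2k} - 1)$ and $c = \sqrt{2/\pi} / \prod_{n \ge 0}(1 - 2^{-(2n+1)})$; this is our reading of the formula. The truncation levels, the integration range and the helper names are arbitrary choices made for the illustration.

```python
import math

N_TERMS = 30  # truncation of the series (assumption: the tail is negligible)

# Normalising constant sqrt(2/pi) / prod_{n>=0} (1 - 2^{-(2n+1)}).
NORM = math.sqrt(2.0 / math.pi)
for n in range(60):
    NORM /= 1.0 - 2.0 ** (-(2 * n + 1))

# Coefficients a_n, computed through a_n = -4 a_{n-1} / (4^n - 1).
COEFFS = [1.0]
for n in range(1, N_TERMS):
    COEFFS.append(-4.0 * COEFFS[-1] / (4.0 ** n - 1.0))


def invariant_density(x):
    """Truncated series for the density of the invariant law at x >= 0."""
    return NORM * sum(a * math.exp(-0.5 * 4.0 ** n * x * x)
                      for n, a in enumerate(COEFFS))


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n sub-intervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


mass = trapezoid(invariant_density, 0.0, 12.0, 12000)
m2 = trapezoid(lambda x: x * x * invariant_density(x), 0.0, 12.0, 12000)
print(mass, m2)
```

Under these assumptions the total mass should be close to 1 and the second moment close to 2, and the value of the series at the origin should vanish up to rounding.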
They can be improved following [RT00] if the process under study is stochastically monotone, that is, if its semi-group $(P_t)_{t \ge 0}$ is such that $x \mapsto P_t f(x)$ is nondecreasing as soon as $f$ is nondecreasing. Unfortunately, because the jump rate is a nondecreasing function of the position, the TCP process is not stochastically monotone. Moreover, we will see that our coupling provides better estimates for the example studied in [RT00]. The work [CMP10] was a first attempt to use the specific dynamics of the TCP process to get explicit rates of convergence of the law of $X_t$ to the invariant measure $\mu$. The answer was partial and a bit disappointing since the authors did not succeed in proving explicit exponential rates.
Our aim in the present paper is to go one step further by providing exponential rates of convergence for several classical distances: the Wasserstein distance of order $p \ge 1$ and the total variation distance. Let us briefly recall some definitions.
Definition 1.1. If $\nu$ and $\tilde\nu$ are two probability measures on $\mathbb{R}$, we will call a coupling of $\nu$ and $\tilde\nu$ any probability measure on $\mathbb{R} \times \mathbb{R}$ such that the two marginals are $\nu$ and $\tilde\nu$. Let us denote by $\Gamma(\nu, \tilde\nu)$ the set of all the couplings of $\nu$ and $\tilde\nu$.
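Definition 1.2. For every $p \ge 1$, the Wasserstein distance $W_p$ of order $p$ between two probability measures $\nu$ and $\tilde\nu$ on $\mathbb{R}$ with finite $p$-th moment is defined by
$$W_p(\nu, \tilde\nu) = \Bigl(\inf_{\Pi \in \Gamma(\nu, \tilde\nu)} \int_{\mathbb{R}^2} |x - y|^p \,\Pi(dx, dy)\Bigr)^{1/p}.$$
Definition 1.3. The total variation distance between two probability measures $\nu$ and $\tilde\nu$ on $\mathbb{R}$ is given by
$$\|\nu - \tilde\nu\|_{TV} = \inf_{\Pi \in \Gamma(\nu, \tilde\nu)} \int_{\mathbb{R}^2} \mathbf{1}_{\{x \ne y\}} \,\Pi(dx, dy).$$
It is well known that, for any $p \ge 1$, the convergence in Wasserstein distance of order $p$ is equivalent to weak convergence together with convergence of all moments up to order $p$, see e.g. [Rac91,Vil03]. A sequence of probability measures $(\nu_n)_{n \ge 1}$ bounded in $L^p$ which converges to $\nu$ in total variation norm converges also for the $W_p$ metrics. The converse is false: if $\nu_n = \delta_{1/n}$ then $(\nu_n)_{n \ge 1}$ converges to $\delta_0$ for the distance $W_p$ whereas $\|\nu_n - \delta_0\|_{TV}$ is equal to 1 for any $n \ge 1$.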
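The counterexample $\nu_n = \delta_{1/n}$ can be reproduced on finitely supported measures. The sketch below is an illustration with helper names of our own (`wasserstein_p`, `total_variation`). It relies on two standard facts that are assumptions as far as the present text is concerned: on the real line the monotone (sorted) coupling is optimal for $W_p$, $p \ge 1$, and the infimum of $\mathbb{P}(X \ne \tilde X)$ over couplings equals half the $\ell^1$ distance between the probability mass functions.

```python
def wasserstein_p(xs, ys, p=1.0):
    """W_p between two empirical measures on R with the same number of atoms.

    Uses the monotone coupling: i-th smallest atom paired with i-th smallest.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)


def total_variation(mu, nu):
    """Total variation distance between two laws given as {atom: mass} dicts."""
    atoms = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(a, 0.0) - nu.get(a, 0.0)) for a in atoms)


for n in (1, 10, 100):
    w = wasserstein_p([1.0 / n], [0.0], p=2.0)
    tv = total_variation({1.0 / n: 1.0}, {0.0: 1.0})
    print(n, w, tv)
```

The Wasserstein distance goes to 0 as $n$ grows while the total variation distance stays equal to 1.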
Any coupling $(X, \tilde X)$ of $(\nu, \tilde\nu)$ provides an upper bound for these distances. One can find in [Lin92] a lot of efficient ways to construct smart couplings in many cases. In the present work, we essentially use the coupling that was introduced in [CMP10]. Firstly, we improve the estimate for its rate of convergence in Wasserstein distances from polynomial to exponential bounds.
For any $\tilde\lambda < \lambda$, any $p \ge 1$ and any $t_0 > 0$, there is a constant $C = C(p, \tilde\lambda, t_0)$ such that, for any initial probability measures $\nu$ and $\tilde\nu$ and any $t \ge t_0$, Secondly, we introduce a modified coupling to get total variation estimates.
Theorem 1.5. For any $\tilde\lambda < \lambda$ and any $t_0 > 0$, there exists $C$ such that, for any initial probability measures $\nu$ and $\tilde\nu$ and any $t \ge t_0$, where $\lambda$ is given by (5).
Remark 1.6. In both Theorems 1.4 and 1.5, no assumption is required on the moments or on the regularity of the initial measures. Note however that, following Remark 3.4, one can obtain contraction-type bounds when the initial measures $\nu$ and $\tilde\nu$ have moments of sufficient orders. In particular they hold uniformly over the Dirac measures. If $\tilde\nu$ is chosen to be the invariant measure $\mu$, these theorems provide exponential convergence to equilibrium.
The remainder of the paper is organized as follows. We derive in Section 2 precise upper bounds for the moments of the invariant measure $\mu$ and the law of $X_t$. Section 3 and Section 4 are respectively devoted to the proofs of Theorem 1.4 and Theorem 1.5. Unlike the classical approach "à la" Meyn-Tweedie, our total variation estimate is obtained by applying a Wasserstein coupling for most of the time, then trying to couple the two paths in one attempt. This idea is then adapted to other processes: Section 5 deals with two simpler PDMPs already studied in [PR05, LP09, CMP10, RT00] and Section 6 is dedicated to diffusion processes.

Moment estimates
The aim of this section is to provide accurate bounds for the moments of $X_t$. In particular, we establish below that any moment of $X_t$ is bounded uniformly over the initial value $X_0$. Let $p > 0$ and $\alpha_p(t) = \mathbb{E}(X_t^p)$. Then one has by direct computation
$$\alpha_p'(t) = p\,\alpha_{p-1}(t) - (1 - 2^{-p})\,\alpha_{p+1}(t). \tag{6}$$

Moments of the invariant measure
Equation (6) implies in particular that, if $m_p$ denotes the $p$-th moment of the invariant measure $\mu$ of the TCP process ($m_p = \int x^p \,\mu(dx)$), then for any $p > 0$
$$m_{p+1} = \frac{p}{1 - 2^{-p}}\, m_{p-1}.$$
It gives all even moments of $\mu$: $m_2 = 2$, $m_4 = 48/7$, . . . and all the odd moments in terms of the mean. Nevertheless, the mean itself cannot be explicitly determined. Applying the same technique to $\log X_t$, one gets the relation $\log(2)\, m_1 = m_{-1}$. With Jensen's inequality, this implies that $1/\sqrt{\log 2} \le m_1 \le \sqrt{2}$.
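The even moments of $\mu$ are easy to iterate exactly. The sketch below assumes the stationary recursion $m_{p+1} = p\, m_{p-1} / (1 - 2^{-p})$, which is what one obtains by applying the generator to $x \mapsto x^p$ and integrating against $\mu$; the function name `even_moments` is ours.

```python
from fractions import Fraction
import math


def even_moments(k_max):
    """Exact even moments m_0, m_2, ..., m_{2 k_max} of the invariant law.

    Assumes the recursion m_{p+1} = p / (1 - 2^{-p}) * m_{p-1} with m_0 = 1.
    """
    m = {0: Fraction(1)}
    for p in range(1, 2 * k_max, 2):  # p = 1, 3, 5, ...
        m[p + 1] = Fraction(p) / (1 - Fraction(1, 2 ** p)) * m[p - 1]
    return m


moments = even_moments(3)
print(moments[2], moments[4], moments[6])

# Bounds on the mean: 1/sqrt(log 2) <= m_1 <= sqrt(2).
lower, upper = 1.0 / math.sqrt(math.log(2.0)), math.sqrt(2.0)
print(lower, upper)
```

The first two values agree with $m_2 = 2$ and $m_4 = 48/7$; the bounds on $m_1$ evaluate to roughly 1.20 and 1.41.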

Uniform bounds for the moments at finite times
The fact that the jump rate goes to infinity at infinity gives bounds on the moments at any positive time that are uniform over the initial distribution.
Let us now fix $t > 0$. If $\beta_p(t) \le 0$, then $\alpha_p(t) \le (2p)^{p/2}$ and the lemma is proven. We assume now that $\beta_p(t) > 0$. By the previous remark, this implies that the function $s \mapsto \beta_p(s)$ is strictly decreasing, hence positive, on the interval $[0, t]$. Consequently, for any $s \in [0, t]$, Integrating this last inequality gives a lower bound on $1/\beta_p(t)$, hence the lemma.
Let us derive from Lemma 2.1 some upper bounds for the right tails of $\delta_x P_t$ and $\mu$.
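Corollary 2.2. For any $t > 0$ and $r \ge 2e(1 + 1/t)$, one has
$$\mathbb{P}_x(X_t \ge r) \le \exp\Bigl(-\frac{t}{2e(t + 1)}\, r\Bigr). \tag{7}$$
Moreover, if $X$ is distributed according to the invariant measure $\mu$ then, for any $r \ge \sqrt{2e}$, one has
$$\mathbb{P}(X \ge r) \le \exp\Bigl(-\frac{r^2}{4e}\Bigr).$$
Proof. Let $t > 0$ and $a = 2(1 + 1/t)$. Notice that, for any $p \ge 1$, $\mathbb{E}_x(X_t^p)$ is smaller than $(ap)^p$. As a consequence, for any $p \ge 1$ and $r \ge 0$, $\mathbb{P}_x(X_t \ge r) \le \exp\bigl(p \log(ap/r)\bigr)$.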
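Assuming that $r \ge ea$, we let $p = r/(ea)$ to get:
$$\mathbb{P}_x(X_t \ge r) \le e^{-r/(ea)} = \exp\Bigl(-\frac{t}{2e(t + 1)}\, r\Bigr).$$
For the invariant measure, the upper bound is better: $\mathbb{E}(X^p) \le (2p)^{p/2}$. Then, the Markov inequality provides that, for any $p \ge 1$,
$$\mathbb{P}(X \ge r) \le \exp\Bigl(\frac{p}{2} \log\frac{2p}{r^2}\Bigr).$$
As above, if $r^2 \ge 2e$, one can choose $p = r^2/(2e)$ to get the desired bound.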
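The choice $p = r/(ea)$ in the proof is the minimizer of $p \mapsto p \log(ap/r)$, and the resulting bound is $e^{-r/(ea)}$. The short check below illustrates this on one arbitrary pair $(t, r)$; the function name `tail_bound` and the numerical values are ours.

```python
import math


def tail_bound(r, a, p):
    """Markov-type bound exp(p log(a p / r)), valid when E_x(X_t^p) <= (a p)^p."""
    return math.exp(p * math.log(a * p / r))


t = 1.0
a = 2.0 * (1.0 + 1.0 / t)
r = 5.0 * math.e * a           # any r >= e * a keeps the optimal p >= 1
p_star = r / (math.e * a)      # optimal exponent (equals 5 here, up to rounding)

best = tail_bound(r, a, p_star)
print(best, math.exp(-r / (math.e * a)))
print(tail_bound(r, a, p_star - 0.5), tail_bound(r, a, p_star + 0.5))
```

Both neighbouring exponents give a larger, hence weaker, bound.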
Remark 2.3. A better deviation bound should be expected from the expression (3) of the density of µ. Indeed, one can get a sharp result (see [CMP10]). However, in the sequel we only need the deviation bound (7).

Exponential convergence in Wasserstein distance
This section is devoted to the proof of Theorem 1.4. We use the coupling introduced in [CMP10]. Let us briefly recall the construction and the dynamics of this stochastic process on $\mathbb{R}_+^2$ whose marginals are two TCP processes. It is defined by the following generator
$$Lf(x, y) = (\partial_x + \partial_y) f(x, y) + y\,\bigl(f(x/2, y/2) - f(x, y)\bigr) + (x - y)\,\bigl(f(x/2, y) - f(x, y)\bigr)$$
when $x \ge y$, and by the symmetric expression for $x < y$. We will call the dynamical coupling defined by this generator the Wasserstein coupling of the TCP process (see Figure 1 for a graphical illustration of this coupling). This coupling is the only one such that the lower component never jumps alone. Let us give the pathwise interpretation of this coupling. Between two jump times, the two coordinates increase linearly with rate 1. Moreover, two "jump processes" are simultaneously in action:
1. with a rate equal to the minimum of the two coordinates, they jump (i.e. they are divided by 2) simultaneously,
2. with a rate equal to the distance between the two coordinates (which is constant between two jump times), the bigger one jumps alone.
The existence of non-simultaneous jumps implies that the generator does not act in a good way on functions directly associated to Wasserstein distances. To see this, let us define $V_p(x, y) = |x - y|^p$. When we compute $LV_p$, the term coming from the deterministic drift disappears (since $V_p$ is unchanged under the flow), and we get for $x/2 \le y \le x$:
$$LV_p(x, y) = -y\,(1 - 2^{-p})\,(x - y)^p + (x - y)\,\bigl((y - x/2)^p - (x - y)^p\bigr).$$
For example, choosing $p = 1$ gives:
$$LV_1(x, y) = -\tfrac{3}{2}\,(x - y)^2 = -\tfrac{3}{2}\,V_1(x, y)^2.$$
This shows that $\mathbb{E}[|X_t - Y_t|]$ decreases, but only gives a polynomial bound: the problem comes from the region where $x - y$ is already very small.
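Choosing $p = 2$, we get, for $x/2 \le y \le x$,
$$LV_2(x, y) = -\tfrac{3}{4}\, y\,(x - y)^2 + (x - y)\,\bigl((y - x/2)^2 - (x - y)^2\bigr).$$
The effect of the jumps of $X$ is even worse: if $x = 1$ and $x - y$ is small, $LV_2(x, y)$ is positive ($\approx (1/4)(x - y)$).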
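For $p = 1/2$, the situation is in fact more favorable: indeed, for $0 < y \le x$,
$$LV_{1/2}(x, y) = -x\,\bigl(1 - \varphi(y/x)\bigr)\,V_{1/2}(x, y), \qquad \varphi(u) = \frac{u}{\sqrt{2}} + \sqrt{(1 - u)\,|u - 1/2|}.$$
By a direct computation, one gets that
$$M := \max_{u \in [0, 1]} \varphi(u) = \frac{\sqrt{2}\,(3 + \sqrt{3})}{8} \approx 0.8365.$$
Hence, when $0 < y \le x$,
$$LV_{1/2}(x, y) \le -\lambda\, x\, V_{1/2}(x, y) \tag{8}$$
with $\lambda = 1 - M \approx 0.1635$. This would give an exponential decay for $V_{1/2}$ if $x$ was bounded below: the problem comes from the region where $x$ is small and the process does not jump.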
To overcome this problem, we replace $V_{1/2}$ with the function the two positive parameters $\alpha$ and $x_0$ being chosen below. The negative slope of $\psi$ for $x$ small will be able to compensate the fact that the bound from (8) tends to 0 with $x$, hence to give a uniform bound. Indeed, for $0 < y \le x$, Finally, as soon as $(1 + \alpha)M < 1$, one has By direct computations, one gets that the best possible choice of parameters $\alpha$ and $x_0$ is $\alpha = 1/\sqrt{M} - 1 \approx 0.0934$ and $x_0 = \sqrt{2}$. We obtain finally, for any $x, y > 0$, a bound with rate $\lambda \approx 0.1208$. Hence, directly from (9), for any $x, y > 0$ Immediate manipulations lead to the following estimate.
Remark 3.2. The upper bound obtained in Proposition 3.1 is compared graphically to the "true" function $t \mapsto \mathbb{E}_{x,y}|X_t - Y_t|^{1/2}$ in Figure 2. By linear regression on this data, one gets that the exponential speed of convergence of this function is on the order of 0.4. Note also that this method can be adapted to any $V_p$ with $0 < p < 1$, giving even better (but less explicit) speed of convergence for some $p \ne 1/2$: we estimated numerically that the best value for $\lambda$ would be approximately 0.1326, obtained for $p$ close to 2/3.
We may now deduce from Proposition 3.1 estimates for the Wasserstein distance between the laws of $X_t$ and $Y_t$. Then, for any $t_0 > 0$ and any $\theta \in (0, 1)$, there exists $C(p, t_0, \theta) < +\infty$ such that, for any initial conditions $(X_0, Y_0)$ and for all $t \ge t_0$, where $\lambda$ is defined by (11).
Proof of Theorem 3.3. Let $p \ge 1$. For any $0 < \theta < 1$, Hölder's inequality gives, for any $t \ge 0$, Thanks to Lemma 2.1 and the inequality $(a + b)^q \le 2^{q-1}(a^q + b^q)$, when $q \ge 1$, one gets Then, it suffices to use Equation (13) to conclude the proof with
Remark 3.4. Let us remark that we can obtain "contraction-type bounds" using Equation (12) instead of (13): for any $p \ge 1$, any $0 < \theta < 1$, any $t \ge 0$ and any $x, y > 0$, We then obtain that if $\nu$ and $\tilde\nu$ have finite $\theta/2$-moments then for $p \ge 1$ and $t \ge 0$, which still allows a control by some Wasserstein "distance" (in fact, this is not a distance, since $\theta/2 < 1$) of the initial measures.
Remark 3.5. We estimated numerically the exponential speed of convergence:
• of the function $t \mapsto \mathbb{E}_{x,y}(|X_t - Y_t|)$ for the Wasserstein coupling (by Monte Carlo method and linear regression). It seems to be on the order of 0.5 (we obtained 0.48 for $x = 2$, $y = 10$, $m = 10\,000$ copies, and linear regression between times 2 and 10);
• of the Wasserstein distance $t \mapsto W_1(\delta_x P_t, \delta_y P_t)$, using the explicit representation of this distance for measures on $\mathbb{R}$ to approximate it by the $L^1$-distance between the two empirical quantile functions. It is on the order of 1.6 (we get 1.67 for $x = 2$, $y = 0.5$, $m = 1\,000\,000$ copies, and linear regression on 20 points until time 4).
In conclusion, our bound from Theorem 3.3 seems reasonable (at least when compared to those given by [RR96], see section 4.2 below), but is still off by a factor of 4 from the true exponential speed of convergence. Since the coupling itself seems to converge approximately 3 times slower than the true Wasserstein distance, one probably needs to find another coupling to get better bounds.
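The Monte Carlo experiments mentioned in this section can be sketched as follows. The code is an illustration under our own assumptions, not the implementation used for the reported figures: it simulates the Wasserstein coupling from its pathwise description (joint jumps at rate equal to the smaller coordinate, solitary jumps of the larger coordinate at rate equal to the gap), using the fact that the total jump rate is then the larger coordinate. The names `simulate_coupling` and `mean_sqrt_distance`, the seed and the sample size are hypothetical choices.

```python
import math
import random


def simulate_coupling(x0, y0, horizon, rng):
    """Return (X_horizon, Y_horizon) for the Wasserstein coupling."""
    x, y, t = x0, y0, 0.0
    while True:
        hi = max(x, y)
        # Total jump rate at time s after the last event is hi + s.
        tau = math.sqrt(2.0 * rng.expovariate(1.0) + hi * hi) - hi
        if t + tau >= horizon:
            dt = horizon - t
            return x + dt, y + dt
        t += tau
        x, y = x + tau, y + tau
        hi, lo = max(x, y), min(x, y)
        if rng.random() * hi < lo:
            x, y = x / 2.0, y / 2.0      # joint jump, probability lo / hi
        elif x >= y:
            x = x / 2.0                  # the larger coordinate jumps alone
        else:
            y = y / 2.0


def mean_sqrt_distance(x0, y0, horizon, n_copies, rng):
    """Monte Carlo estimate of E|X_t - Y_t|^{1/2} at t = horizon."""
    total = 0.0
    for _ in range(n_copies):
        x, y = simulate_coupling(x0, y0, horizon, rng)
        total += math.sqrt(abs(x - y))
    return total / n_copies


rng = random.Random(1)  # arbitrary seed
estimate = mean_sqrt_distance(2.0, 10.0, 8.0, 500, rng)
print(estimate, math.sqrt(8.0))
```

Repeating the estimate on a grid of times and regressing its logarithm on $t$ gives an empirical decay rate, which is the kind of experiment described in the remarks above.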
Remark 3.6. Let us end this section by mentioning a parallel work by B. Cloez [Clo12], which uses a completely different approach, based on a particular Feynman-Kac formulation, to get some related results.

Exponential convergence in total variation distance
In this section, we provide the proof of Theorem 1.5 and we compare our estimate to the ones that can be deduced from [RR96].

From Wasserstein to total variation estimate
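The fact that the evolution is partially deterministic makes the study of convergence in total variation quite subtle. Indeed, the law $\delta_x P_t$ can be written as a mixture of a Dirac mass at $x + t$ (the first jump time has not occurred) and an absolutely continuous measure (the process jumped at some random time in $(0, t)$):
$$\delta_x P_t = p_t(x)\,\delta_{x+t} + \bigl(1 - p_t(x)\bigr)\,\mathcal{L}\bigl(X_t \mid X_0 = x,\ T_1^x \le t\bigr),$$
where, according to Equation (2),
$$p_t(x) := \mathbb{P}(T_1^x \ge t) = e^{-t^2/2 - xt}. \tag{14}$$
This implies that the map $y \mapsto \|\delta_x P_t - \delta_y P_t\|_{TV}$ is not continuous at point $x$ since one has, for any $y \ne x$,
$$\|\delta_x P_t - \delta_y P_t\|_{TV} \ge p_t(x) \vee p_t(y).$$
Nevertheless, one can hope that The lemma below makes this intuition more precise.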
Lemma 4.1. Let $t \ge \varepsilon \ge x - y > 0$. There exists a coupling $\bigl((X_t)_{t \ge 0}, (Y_t)_{t \ge 0}\bigr)$ of two TCP processes driven by (1) and starting at $(x, y)$ such that the probability $\mathbb{P}(X_s = Y_s,\ s \ge t)$ is larger than where $f_x$ is defined in Equation (2) and $\alpha(x) := \int_0^\infty e^{-u^2/2 - ux}\,du$. Moreover, for any $x_0 > 0$ and $\varepsilon > 0$, let us define Then,
Proof. The idea is to construct an explicit coalescent coupling starting from $x$ and $y$. The main difficulty comes from the fact that the jumps are deterministic. Assume for simplicity that $y < x$. Let us denote by $(T_k^x)_{k \ge 1}$ and $(T_k^y)_{k \ge 1}$ the jump times of the two processes. If then the two paths are at the same place at time $T_1^x$ since in this case The law of $T_1^x$ has a density $f_x$ given by (2). As a consequence, the density of $T_1^y + x - y$ is given by $s \mapsto f_y(s - x + y)$. The best way of obtaining the first equality in (17) before a time $t \ge x - y$ is to realize an optimal coupling of the two continuous laws of $T_1^x$ and $T_1^y + x - y$. This makes these random variables equal with probability Assume now that $0 \le x - y \le \varepsilon \le t$. For any $s \ge x - y$, one has As a consequence, if $0 \le x - y \le \varepsilon \le t$, where $p_t(x)$ is defined in (14) and $\alpha(x) = \int_0^\infty e^{-u^2/2 - ux}\,du$. Finally, one has to get a lower bound for the probability of the set $\{T_2^y - T_1^y \ge x - y\}$. For this, we notice that $z \in [0, +\infty) \mapsto \mathbb{P}(T_1^z \ge s) = p_s(z) = e^{-s^2/2 - sz}$ is decreasing and that $Y_{T_1^y} \le (y + t)/2$ as soon as $T_1^y \le t$. As a consequence, This provides the bound (15). The uniform lower bound on the set $A_{x_0,\varepsilon}$ is a direct consequence of the previous one.
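The function $\alpha(x) = \int_0^\infty e^{-u^2/2 - ux}\,du$ appearing in Lemma 4.1 can be evaluated in closed form. Completing the square gives $\alpha(x) = \sqrt{\pi/2}\; e^{x^2/2}\, \mathrm{erfc}(x/\sqrt{2})$; this identity is a standard Gaussian computation added here for convenience, and it is not stated in the text above. The sketch below compares it with a plain trapezoidal quadrature; the function names and the quadrature parameters are ours.

```python
import math


def alpha_closed(x):
    """alpha(x) = sqrt(pi/2) * exp(x^2/2) * erfc(x / sqrt(2))."""
    return math.sqrt(math.pi / 2.0) * math.exp(x * x / 2.0) * math.erfc(x / math.sqrt(2.0))


def alpha_quad(x, upper=40.0, n=40000):
    """Trapezoidal approximation of the integral of exp(-u^2/2 - u x) on [0, upper]."""
    h = upper / n

    def g(u):
        return math.exp(-u * u / 2.0 - u * x)

    return h * (0.5 * (g(0.0) + g(upper)) + sum(g(i * h) for i in range(1, n)))


for x in (0.0, 0.5, 2.0):
    print(x, alpha_closed(x), alpha_quad(x))
```

For large $x$ the product $e^{x^2/2}\,\mathrm{erfc}(x/\sqrt{2})$ may overflow before it underflows, so the closed form above is meant for moderate values of $x$ only.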
Let us now turn to the proof of Theorem 1.5.
Proof of Theorem 1.5. We are looking for an upper bound for the total variation distance between $\delta_x P_t$ and $\delta_y P_t$ for two arbitrary initial conditions $x$ and $y$. To this end, let us consider the following coupling: during a time $t_1 < t$ we use the Wasserstein coupling. Then during a time $t_2 = t - t_1$ we try to stick the two paths using Lemma 4.1. Let $\varepsilon > 0$ and $x_0 > 0$ be as in Lemma 4.1. If, after the time $t_1$, one has where $A_{x_0,\varepsilon}$ is defined by (16), then the coalescent coupling will work with a probability greater than the one provided by Lemma 4.1. As a consequence, the coalescent coupling occurs before time $t_1 + t_2$ with a probability greater than Moreover, From the deviation bound (7), we get that, for any $x_0 \ge 2e(1 + 1/t_1)$, The estimate of Proposition 3.1 concerning the Wasserstein coupling ensures that for any $0 < t_0 \le t_1$. As a consequence, the total variation distance between $\delta_x P_t$ and $\delta_y P_t$ is smaller than In order to get a more tractable estimate, let us assume that $t_2 \le x_0$ and use that $1 - e^{-u} \le u$ to get Finally let us set Obviously, for $\varepsilon$ small enough, $x_0 \ge \max(t_2, 2e(1 + 1/t_1))$ and $t_1 \ge t_0$. Then, one gets that One can now express $\varepsilon$ as a function of $t_1$ to get that there exists $K = K(t_0) > 0$ such that
$$\|\delta_x P_{t_1 + t_2} - \delta_y P_{t_1 + t_2}\|_{TV} \le K\,(1 + t_1)\,e^{-\frac{2\lambda}{3} t_1}.$$
Since $t_2 = (4\lambda/3)\,t_1$, one gets that This provides the conclusion of Theorem 1.5 when the initial measures are Dirac masses. The generalisation is straightforward.

A bound via small sets
We describe here briefly the approach of [RR96] and compare it with the hybrid Wasserstein/total variation coupling described above. The idea is once more to build a successful coupling between two copies $X$ and $Y$ of the process. In algorithmic terms, the approach is the following:
• let $X$ and $Y$ evolve independently until they both reach a given set $C$,
• once they are in $C$, try to stick them together,
• repeat until the previous step is successful.
To control the time to come back to $C \times C$, [RR96] advocates an approach via a Lyapunov function. The second step works with positive probability if the set is "pseudo-small", i.e. if one can find a time $t$, an $\alpha > 0$ and probability measures $\nu_{xy}$ such that, for all $(x, y) \in C^2$, $\mathcal{L}(X_t \mid X_0 = x) \ge \alpha\,\nu_{xy}$ and $\mathcal{L}(X_t \mid X_0 = y) \ge \alpha\,\nu_{xy}$.
The convergence result can be stated as follows.
Then for A = Λ δ + e −δt sup x∈C V , A = Ae δt , and for any r < 1/t , If A is finite, this gives exponential convergence: just choose r small enough so that (A ) rt e −δt decreases exponentially fast.
To compute explicit bounds we have to make choices for $C$ and $V$ and estimate the corresponding value of $\alpha$. Our best efforts for the case of the TCP process only give decay rates of the order $10^{-14}$. We believe this order cannot be substantially improved even by fine-tuning $C$ and $V$.

Two other models
This section is devoted to the study of two simple PDMPs. The first one is a simplified version of the TCP process where the jump rate is assumed to be constant and equal to λ. It has been studied with different approaches: PDE techniques (see [PR05,LP09]) or probabilistic tools (see [LL08,OK08,CMP10]). The second one is studied in [RT00]. It can also be checked that our method gives sharp bounds for the speed of convergence to equilibrium of the PDMP which appears in the study of a penalized bandit algorithm (see Lemma 7 in [LP08]).

The TCP model with constant jump rate
In this section we investigate the long time behavior of the TCP process with constant jump rate given by its infinitesimal generator:
$$Lf(x) = f'(x) + \lambda\,\bigl(f(x/2) - f(x)\bigr).$$
The jump times of this process are the ones of a homogeneous Poisson process with intensity $\lambda$. The convergence in Wasserstein distance is obvious.
Lemma 5.1 ([PR05,CMP10]). For any $p \ge 1$,
$$W_p(\nu P_t, \tilde\nu P_t) \le W_p(\nu, \tilde\nu)\,e^{-\lambda_p t} \quad\text{with}\quad \lambda_p = \frac{\lambda\,(1 - 2^{-p})}{p}. \tag{20}$$
Remark 5.2. The case $p = 1$ is obtained in [PR05] by PDEs estimates using the following alternative formulation of the Wasserstein distance on $\mathbb{R}$. If the cumulative distribution functions of the two probability measures $\nu$ and $\tilde\nu$ are $F$ and $\tilde F$ then
$$W_1(\nu, \tilde\nu) = \int_{\mathbb{R}} |F(x) - \tilde F(x)|\,dx.$$
The general case $p \ge 1$ is obvious from the probabilistic point of view: choosing the same Poisson process $(N_t)_{t \ge 0}$ to drive the two processes provides that the two coordinates jump simultaneously and
$$|X_t - Y_t| = |x - y|\,2^{-N_t}.$$
As a consequence, since the law of $N_t$ is the Poisson distribution with parameter $\lambda t$, one has
$$\mathbb{E}\bigl(|X_t - Y_t|^p\bigr) = |x - y|^p\,\mathbb{E}\bigl(2^{-pN_t}\bigr) = |x - y|^p\,e^{-\lambda(1 - 2^{-p})t}.$$
This coupling turns out to be sharp. Indeed, one can compute explicitly the moments of $X_t$ (see [LL08,OK08]): for every $n \ge 0$, every $x \ge 0$, and every $t \ge 0$, where $\theta_n = \lambda(1 - 2^{-n}) = n\lambda_n$ for any $n \ge 1$. Obviously, assuming for example that $x > y$, As a consequence, the rate of convergence in Equation (20) is optimal for any $n \ge 1$. Nevertheless this estimate for the Wasserstein rate of convergence does not provide on its own any information about the total variation distance between $\delta_x P_t$ and $\delta_y P_t$. It turns out that this rate of convergence is the one of the $W_1$ distance. This is established by Theorem 1.1 in [PR05]. It can be reformulated in our setting as follows.
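The key computation behind the synchronous coupling is the expectation of $2^{-pN_t}$ for a Poisson variable $N_t$ with parameter $\lambda t$, which equals $e^{-\lambda t (1 - 2^{-p})}$. The sketch below checks this identity by direct summation of the Poisson series; the function name, the truncation level and the parameter values are arbitrary choices of ours.

```python
import math


def mean_pow2_poisson(lam, t, p, n_max=200):
    """E[2^{-p N_t}] for N_t ~ Poisson(lam * t), by summing the series."""
    mean = lam * t
    term = math.exp(-mean)       # P(N_t = 0)
    total = 0.0
    for n in range(n_max):
        total += term * 2.0 ** (-p * n)
        term *= mean / (n + 1)   # P(N_t = n + 1) from P(N_t = n)
    return total


lam, t = 1.5, 2.0
for p in (1, 2, 3):
    closed_form = math.exp(-lam * t * (1.0 - 2.0 ** (-p)))
    print(p, mean_pow2_poisson(lam, t, p), closed_form)
```

Taking the $p$-th root of the closed form yields the contraction factor $e^{-\lambda(1 - 2^{-p})t/p}$ for the distance $W_p$ between two Dirac initial conditions.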
Let us provide here an improvement of this result by a probabilistic argument.
Proposition 5.4. For any $x, y \ge 0$ and $t \ge 0$, As a consequence, for any measure $\nu$ with a finite first moment and $t \ge 0$,
Remark 5.5. Note that the upper bound obtained in Equation (22) is non-null even for $x = y$. This is due to the persistence of a Dirac mass at any time, which implies that taking $y$ arbitrarily close to $x$ for initial conditions does not make the total variation distance arbitrarily small, even for large times.
Proof of Proposition 5.4. The coupling is a slight modification of the one used to control the Wasserstein distance. The paths of $(X_s)_{0 \le s \le t}$ and $(Y_s)_{0 \le s \le t}$ starting respectively from $x$ and $y$ are determined by their jump times $(T_n^X)_{n \ge 0}$ and $(T_n^Y)_{n \ge 0}$ up to time $t$. These sequences have the same distribution as the jump times of a Poisson process with intensity $\lambda$.
Let $(N_t)_{t \ge 0}$ be a Poisson process with intensity $\lambda$ and $(T_n)_{n \ge 0}$ its jump times with the convention $T_0 = 0$. Let us now construct the jump times of $X$ and $Y$. Both processes make exactly $N_t$ jumps before time $t$. If $N_t = 0$, then $X_s = x + s$ and $Y_s = y + s$ for $0 \le s \le t$.
Assume now that $N_t \ge 1$. The $N_t - 1$ first jump times of $X$ and $Y$ are the ones of $(N_t)_{t \ge 0}$: In other words, the coupling used to control the Wasserstein distance (see Lemma 5.1) acts until the penultimate jump time $T_{N_t - 1}$. At that time, we have Then we have to define the last jump time for each process. If they are such that then the paths of $X$ and $Y$ are equal on the interval $(T_{N_t}^X, t)$ and can be chosen to be equal for any time larger than $t$.
Recall that conditionally on the event $\{N_t = 1\}$, the law of $T_1$ is the uniform distribution on $(0, t)$. More generally, if $n \ge 2$, conditionally on the set $\{N_t = n\}$, the law of the penultimate jump time $T_{n-1}$ has a density $s \mapsto n(n - 1)\,t^{-n}\,(t - s)\,s^{n-2}\,\mathbf{1}_{(0,t)}(s)$ and conditionally on the event $\{N_t = n,\ T_{n-1} = s\}$, the law of $T_n$ is uniform on the interval $(s, t)$.
Conditionally on $N_t = n \ge 1$ and $T_{n-1}$, $T_n^X$ and $T_n^Y$ are uniformly distributed on $(T_{n-1}, t)$ and can be chosen such that
This coupling provides that For any $n \ge 2$, This equality also holds for $n = 1$. Thus we get that since $N_t$ is distributed according to the Poisson law with parameter $\lambda t$. This provides the estimate (22). To treat the case of general initial conditions and to get (23), we combine the coupling between the dynamics constructed above with the choice of the coupling of the initial measures $\mu$ and $\nu$ as a function of the underlying Poisson process $(N_t)_{t \ge 0}$: the time horizon $t > 0$ being fixed, if $N_t = 0$, one chooses for $\mathcal{L}(X_0, Y_0)$ the optimal total variation coupling of $\nu$ and $\mu$; if $N_t \ge 1$, one chooses their optimal Wasserstein coupling. One checks easily that this gives an admissible coupling, in the sense that its first (resp. second) marginal is a constant rate TCP process with initial distribution $\nu$ (resp. $\mu$). And one gets with this construction, using the same estimates as above in the case where $N_t \ge 1$: which clearly implies (23).

A storage model example
In [RT00], Roberts and Tweedie improve the approach from [RR96] via Lyapunov functions and minorization conditions in the specific case of stochastically monotone processes. They get better results on the speed of convergence to equilibrium in this case. They give the following example of a storage model as a good illustration of the efficiency of their method. The process $(X_t)_{t \ge 0}$ on $\mathbb{R}_+$ is driven by the generator
$$Lf(x) = -\beta x f'(x) + \alpha \int_0^\infty \bigl(f(x + y) - f(x)\bigr)\,e^{-y}\,dy.$$
In words, the current stock $X_t$ decreases exponentially at rate $\beta$, and increases at random exponential times by a random (exponential) amount. Let us introduce a Poisson process $(N_t)_{t \ge 0}$ with intensity $\alpha$ and jump times $(T_i)_{i \ge 0}$ (with $T_0 = 0$) and a sequence $(E_i)_{i \ge 1}$ of independent random variables with law $\mathcal{E}(1)$ independent of $(N_t)_{t \ge 0}$. The process $(X_t)_{t \ge 0}$ starting from $x \ge 0$ can be constructed as follows: for any $i \ge 0$,
$$X_t = e^{-\beta(t - T_i)}\,X_{T_i} \ \text{ if } T_i \le t < T_{i+1}, \qquad X_{T_{i+1}} = e^{-\beta(T_{i+1} - T_i)}\,X_{T_i} + E_{i+1}.$$
Proposition 5.6. For any $x, y \ge 0$ and $t \ge 0$, Moreover, if $\mu$ is the invariant measure of the process $X$, we have for any probability measure $\nu$ with a finite first moment and $t \ge 0$,
Remark 5.7. In the case $\alpha = \beta$, the upper bound (24) becomes
Remark 5.8 (Optimality). Applying $L$ to the test function $f(x) = x^n$ allows us to compute recursively the moments of $X_t$. In particular, This relation ensures that the rate of convergence for the Wasserstein distance is sharp. Moreover, the coupling in total variation distance requires at least one jump. As a consequence, the exponential rate of convergence cannot be greater than $\alpha$. Thus, Equation (24) provides the optimal rate of convergence $\alpha \wedge \beta$.
Remark 5.9 (Comparison with previous work). By way of comparison, the original method of [RT00] does not seem to give these optimal rates. The case $\alpha = 1$ and $\beta = 2$ is treated in that paper (as an illustration of Theorem 5.1 there), with explicit choices for the various parameters needed in this method. With these choices, in order to get the convergence rate, one first needs to compute the quantity $\theta$ (defined in Theorem 3.1 of [RT00]), which turns out to be approximately 5.92. The result that applies is therefore the first part of Theorem 4.1 (Equation (27)) of [RT00], and the convergence rate is given by β defined by Equation (22) of [RT00]. The computation gives the approximate value 0.05, which is off by a factor 20 from the optimal value $\alpha \wedge \beta = 1$.
Proof of Proposition 5.6. Firstly, consider two processes $X$ and $Y$ starting respectively at $x$ and $y$ and driven by the same randomness (i.e. Poisson process and jumps). Then the distance between $X_t$ and $Y_t$ is deterministic:
$$X_t - Y_t = (x - y)\,e^{-\beta t}.$$
Obviously, for any $p \ge 1$ and $t \ge 0$,
$$W_p(\delta_x P_t, \delta_y P_t) \le |x - y|\,e^{-\beta t}.$$
Let us now construct explicitly a coupling at time $t$ to get the upper bound (24) for the total variation distance. The jump times of $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ are the ones of a Poisson process $(N_t)_{t \ge 0}$ with intensity $\alpha$ and jump times $(T_i)_{i \ge 0}$. Let us now construct the jump heights $(E_i^X)_{1 \le i \le N_t}$ and $(E_i^Y)_{1 \le i \le N_t}$ of $X$ and $Y$ until time $t$. If $N_t = 0$, no jump occurs. If $N_t \ge 1$, we choose $E_i^X = E_i^Y$ for $1 \le i \le N_t - 1$ and $E_{N_t}^X$ and $E_{N_t}^Y$ in order to maximise the probability This maximal probability of coupling is equal to As a consequence, we get that The law of $T_n$ conditionally on the event $\{N_t = n\}$ has the density $u \mapsto n\,\dfrac{u^{n-1}}{t^n}\,\mathbf{1}_{[0,t]}(u)$.

This ensures that
Since the law of $N_t$ is the Poisson distribution with parameter $\alpha t$, one has

This ensures that
which completes the proof. Finally, to get the last estimate, we proceed as follows: if N t is equal to 0, a coupling in total variation of the initial measures is done, otherwise, we use the coupling above (the method is exactly the same as for the equivalent result in Proposition 5.4, see its proof for details).
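The first step of the proof, driving both copies of the storage model with the same Poisson process and the same jump heights, can be illustrated by a direct simulation. The sketch below is ours (function name `storage_pair`, seed and parameter values are arbitrary); it only illustrates the synchronous coupling, not the final total variation coupling of the last jump.

```python
import math
import random


def storage_pair(x, y, alpha, beta, horizon, rng):
    """Return (X_horizon, Y_horizon) for two storage processes sharing
    the same jump times (rate alpha) and the same Exp(1) jump heights."""
    t = 0.0
    while True:
        wait = rng.expovariate(alpha)
        if t + wait >= horizon:
            decay = math.exp(-beta * (horizon - t))
            return x * decay, y * decay
        decay = math.exp(-beta * wait)   # exponential decrease between jumps
        jump = rng.expovariate(1.0)      # common jump height
        x = x * decay + jump
        y = y * decay + jump
        t += wait


rng = random.Random(3)  # arbitrary seed
x0, y0, alpha, beta, horizon = 3.0, 1.0, 1.0, 2.0, 3.0
x_t, y_t = storage_pair(x0, y0, alpha, beta, horizon, rng)
print(x_t - y_t, (x0 - y0) * math.exp(-beta * horizon))
```

Whatever the realization of the noise, the two printed numbers agree up to rounding, which is the deterministic contraction at rate $\beta$ used for the Wasserstein bound.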

The case of diffusion processes
Let us consider the process $(X_t)_{t \ge 0}$ on $\mathbb{R}^d$ solution of
$$dX_t = A(X_t)\,dt + \sigma(X_t)\,dB_t, \tag{25}$$
where $(B_t)_{t \ge 0}$ is a standard Brownian motion on $\mathbb{R}^n$, $\sigma$ is a smooth function from $\mathbb{R}^d$ to $\mathcal{M}_{d,n}(\mathbb{R})$ and $A$ is a smooth function from $\mathbb{R}^d$ to $\mathbb{R}^d$. Let us denote by $(P_t)_{t \ge 0}$ the semi-group associated to $(X_t)_{t \ge 0}$. If $\nu$ is a probability measure on $\mathbb{R}^d$, $\nu P_t$ stands for the law of $X_t$ when the law of $X_0$ is $\nu$. Under ergodicity assumptions, we are interested in getting quantitative rates of convergence of $\mathcal{L}(X_t)$ to its invariant measure in terms of classical distances (Wasserstein distances, total variation distance, relative entropy, . . . ). Remark that if $A$ is not in gradient form (even if $\sigma$ is constant), $X_t$ is not reversible and the invariant measure is usually unknown, so that it is quite difficult to use functional inequalities such as Poincaré or logarithmic Sobolev to get a quantitative rate of convergence in total variation or Wasserstein distance (using for example Pinsker's inequality or more generally transportation-information inequalities). Therefore the only general tool seems to be Meyn-Tweedie's approach, via small sets and Lyapunov functions, as explained in Section 4.2. However, we have seen that in practical examples the resulting estimate can be quite poor.
The main goal of this short section is to recall the known results establishing the decay in Wasserstein distance and then to propose a strategy to derive control in total variation distance.

Decay in Wasserstein distance
The coupling approach to estimate the decay in Wasserstein distance was recently put forward, see [CGM08] and [BGM10] or [Ebe11]. It is robust enough to deal with nonlinear diffusions or hypoelliptic ones. In [BGG12], the authors approach the problem directly, by differentiating the Wasserstein distance along the flow of the SDE.
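As a toy illustration of the coupling approach, one may drive two Euler-Maruyama discretizations of the same SDE with the same Brownian increments (synchronous coupling). The sketch below is written under our own assumptions: dimension one, constant $\sigma$, the Ornstein-Uhlenbeck drift $A(x) = -x$, and a hypothetical function name `euler_synchronous`. It is not claimed to reproduce the couplings of the cited works. With a constant $\sigma$ the noise cancels in the difference, which contracts deterministically.

```python
import math
import random


def euler_synchronous(x, y, drift, sigma, step, n_steps, rng):
    """Euler-Maruyama scheme for two copies driven by the same increments."""
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(step))  # shared Brownian increment
        x, y = (x + drift(x) * step + sigma * dB,
                y + drift(y) * step + sigma * dB)
    return x, y


def ou_drift(x):
    """Drift A(x) = -x of the Ornstein-Uhlenbeck process (illustrative choice)."""
    return -x


rng = random.Random(5)  # arbitrary seed
x0, y0, step, n_steps = 2.0, -1.0, 0.01, 1000
x_t, y_t = euler_synchronous(x0, y0, ou_drift, 1.0, step, n_steps, rng)
print(x_t - y_t, (x0 - y0) * (1.0 - step) ** n_steps)
```

For this linear drift the difference after $n$ steps is exactly $(x_0 - y_0)(1 - h)^n$ for step size $h$, the discrete counterpart of an exponential decay at rate 1 in every Wasserstein distance.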
Let us gather some of the results in these papers in the following statement.

Total variation estimate
If $\sigma = 0$ in Equation (25), the process $(X_t)_{t \ge 0}$ is deterministic and its invariant measure is a Dirac mass at the unique point $\bar x \in \mathbb{R}^d$ such that $A(\bar x) = 0$. As a consequence, for any $x \ne \bar x$, $\|\delta_x P_t - \delta_{\bar x}\|_{TV} = 1$ and $H(\delta_x P_t \mid \delta_{\bar x}) = +\infty$.
A non-zero variance is needed to get a convergence estimate in total variation distance. Classically, the Brownian motion creates regularity and density. There are a lot of results giving regularity, in terms of initial points, of semigroup in small time. Let us quote the following result of Wang, which holds for processes living on a manifold.
Lemma 6.3 ([Wan10]). Suppose that $\sigma$ is constant and denote by $\eta$ the infimum of its spectrum. If $A$ is a $C^2$ function such that $\frac{1}{2}\bigl(\mathrm{Jac}\,A + \mathrm{Jac}\,A^T\bigr) \le K I_d$ then there exists $K_\eta$ such that, for small $\varepsilon > 0$,
Remark 6.4.
• There are many proofs leading to this kind of results, see for example Aronson [Aro67] for pioneering works, and [Wan10] using Harnack's and Pinsker's inequalities.
• Note that in [GW12], an equivalent bound was given for the kinetic Fokker-Planck equation but with $\varepsilon$ replaced by $\varepsilon^3$.
Now that we have a decay in Wasserstein distance and a control on the total variation distance after a small time, we can use the same idea as for the TCP process. As a consequence, we get the following result.
Theorem 6.5. Assume that $\sigma$ is constant. Under Points 1. or 2. of Proposition 6.1, one has, for any $\nu$ and $\tilde\nu$ in $\mathcal{P}_1(\mathbb{R}^d)$, Under Point 3. of Proposition 6.1, one has, for any $\nu$ and $\tilde\nu$ in $\mathcal{P}_2(\mathbb{R}^d)$,
Proof. Using first Lemma 6.3 and then Point 1. of Proposition 6.1, we get $\|\nu P_t - \tilde\nu P_t\|_{TV} = \|\nu P_{t-\varepsilon} P_\varepsilon - \tilde\nu P_{t-\varepsilon} P_\varepsilon\|_{TV}$ The proof of the second assertion is similar, except for the use of Point 3. of Proposition 6.1 in the second step.