Quantitative ergodicity for some switched dynamical systems

We provide quantitative bounds for the long time behavior of a class of Piecewise Deterministic Markov Processes with state space Rd \times E where E is a finite set. The continuous component evolves according to a smooth vector field that switches at the jump times of the discrete coordinate. The jump rates may depend on the whole position of the process. Under regularity assumptions on the jump rates and stability conditions for the vector fields we provide explicit exponential upper bounds for the convergence to equilibrium in terms of Wasserstein distances. As an example, we obtain convergence results for a stochastic version of the Morris-Lecar model of neurobiology.


Introduction and main results
Piecewise deterministic Markov processes (PDMPs in short) are intensively used in many applied areas (molecular biology [31], storage modelling [7], Internet traffic [19,22,23], neuronal activity [29,8], populations growth models [27]...). Roughly speaking, a Markov process is a PDMP if its randomness is only given by the jump mechanism: in particular, it admits no diffusive dynamics. This huge class of processes has been introduced by Davis (see [14,15]) in a general framework. Several works [11,18,12] deal with their long time behavior (existence of an invariant probability measure, Harris recurrence, exponential ergodicity...). In particular, it is shown in [13] that the behavior of a general PDMP can be related to the one of the discrete time Markov chain made of the positions at the jump times of the process and of an additional independent Poisson process. Nevertheless, this general approach does not seem to provide quantitative bounds for the convergence to equilibrium. Recent papers have tried to establish such estimates for some specific PDMPs (see [10,20,3]) or continuous time Markov chains (see [9]).
In the present paper, we investigate the long time behavior of an interesting subclass of PDMPs that plays a role in molecular biology (see [28,31,8,35]). We consider a PDMP on R d × E where E is a finite set. The first coordinate moves continuously on R d according to a smooth vector field that depends on the second coordinate whereas the second coordinate jumps with a rate that may depend on the first one. This class of Markov processes is reminiscent of the so-called iterated random functions in the discrete time setting (see [17] for a good review of this topic).
Let E be a finite set, (a(·, i, j)) i,j∈E 2 be n 2 nonnegative continuous functions on R d , and, for any i ∈ E, F i : R d → R d be a smooth vector field such that the ordinary differential equation has a unique and global solution t → ϕ i t (x) on [0, ∞) for any initial condition x ∈ R d . Let us consider the Markov process (Z t ) t 0 = ((X t , I t )) t 0 on R d × E defined by its extended generator L as follows: for any smooth function f : R d × E → R (see [15] for full details on the domain of L). Definition 1.1. In the sequel, η t (resp. µ t ) stands for the law A simple case occurs when a(x, i, j) can be written as λ(x, i)P (i, j) for (λ(·, i)) i∈E a set of nonnegative continuous functions, and P an irreducible stochastic matrix, in which case: Let us describe the dynamics of this process in this simple case, the general case being similar. Assume that (X 0 , I 0 ) = (x, i) ∈ R d × E. Before the first jump time T 1 of I, the first component X is driven by the vector field F i and then X t = ϕ i t (x). The time T 1 can be defined by: where E 1 is an exponential random variable with parameter 1. Since the paths of X are deterministic between the jump times of I, the randomness of T 1 comes from the one of E 1 and If we assume that λ := inf (x,i) λ(x, i) > 0 then the process I jumps infinitely often.
At time T 1 , the second coordinate I performs a jump with the law P (i, ·) and the vector field that drives the evolution of X is switched. . .

Remark 1.3.
In general, I is not a Markov process on its own since its jump rates depend on X. In this paper, we will study both the simple -Markovian -case and the general case.
The main goal of the present work is to provide quantitative bounds for the long time behavior of ergodic processes driven by (1.2) thanks to the construction of explicit couplings. We need to introduce a distance on the set of probability measures to quantify this convergence. Definition 1.4. Let P p be the set of probability measures on R d × E such that the first marginal has a finite p th moment. If η ∈ P p and Z is a random variable on R d × E with distribution η, we denote by X and I the two coordinates of Z in R d and E respectively and by µ and ν their distributions.
Let us now define the distance W p on P p as follows: for η,η ∈ P p , The distance W p is made of a mixture of the Wasserstein distance for the first component and the total variation distance for the second one. Recall that for every p 1, the Wasserstein distance W p between two laws µ andμ on R d with finite p th moment is defined by where the infimum runs over all the probability measures Π on R d × R d with marginals µ andμ (such measures are called couplings of µ andμ). It is well known that for any p 1, the convergence in W p Wasserstein distance is equivalent to weak convergence together with convergence of all moments up to order p. However, two probability measures can be both very close in the W p sense and singular. Choose for example µ = δ 0 andμ = δ ε . In this case, W p (µ,μ) = ε and µ −μ TV = 1.
See e.g. [30,34] for further details and properties for Wasserstein distances.
Estimates for the Wasserstein distances do not require (nor provide) any information about the support of the invariant measure (which is the set of the recurrent points).
This set may be difficult to determine and if the initial distribution of X is not supported by this set, the law of X t and the invariant measure may be singular. To illustrate this fact, one can consider the following trivial example: The process (X, I) is ergodic and the first marginal µ of its invariant measure is supported by the segment {ρa ; ρ ∈ [0, 1]} (it is shown in [7] that µ is a Beta distribution).
Despite its extreme simplicity, this process does not in general converge in total variation: if Z 0 = (0, 1), the law of X t is singular with the invariant measure for any t 0, so L(X t ) − µ TV is always equal to 1. On the contrary, W p (L(X t ), µ) goes to 0 exponentially fast (see below).
We are able to get explicit rates of convergence in two situations. Firstly, if the jump rates of I does not depend on X, then the vector fields (F i ) i∈E will be assumed to satisfy an averaged exponential stability. Secondly, if the jump rates of I are assumed to be Lipschitz functions of X, then the vector fields (F i ) i∈E will be assumed to satisfy a uniform exponential stability.
Electron. Commun. Probab. 17 (2012), no. 56, 1-14. ecp.ejpecp.org Remark 1.5. These conditions are not necessary to ensure the ergodicity of Z. In [2], it is shown that, for constant jump rates and under Hörmander-like conditions on the vectors fields (F i ) i∈E , the invariant probability measure is unique and absolutely continuous with respect to the Lebesgue measure on R d . In a similar setting, exponential rates of convergence to equilibrium are obtained in [6]. But these estimates rely on compactness arguments and are not explicit.

Constant jump rates
If the jump rates of I do not depend on X, then (I t ) t 0 is a Markov process on the finite space E and (X s ) 0 s t is a deterministic function of (I s ) 0 s t . Many results are available both in the discrete time setting (see [26,33,21,25,1]) and in the continuous time setting (see [24,16,4]). Moreover, if Assumption 1.8 (see below) does not hold, [5] provides a simple example of surprising phase transition for a switching of two exponentially stable flows that can be explosive (when the jump rates are sufficiently large). Assumption 1.6. Assume that the jump rates (a(·, i, j)) i∈E do not depend on x and that I is an irreducible Markov process on E. Let us denote by ν its invariant probability measure.
Remark 1.7. If a(·, i, j) does not depend on x, we can always write a(i, j) = λ(i)P (i, j) with P (·, ·) a Markov transition matrix.
where ν is defined in Assumption 1.6.
Firstly, one can establish that the process X is bounded in some L p space. Lemma 1.9. Under Assumptions 1.6 and 1.8, there exists κ > 0 such that, for any q < κ, Let us now turn to the long time behavior estimate. Theorem 1.10. Assume that Assumptions 1.6 and 1.8 hold. Let p < q < κ and denote by s the conjugate of q: q −1 + s −1 = 1. Assume that µ 0 andμ 0 admit a finite q th moment smaller than m. Then, where ρ and θ p are positive constants depending only on the Markov chain I.
Corollary 1.11. Under the hypotheses of Theorem 1.10, the process Z admits a unique invariant measure η ∞ and

Non constant jump rates
Let us now turn to the case when the jump rates of I depend on X. We will assume that the a(x, i, j) are smooth in the x variable and that each vector field F i has a unique stable point. Assumption 1.12. There exist a > 0 and κ > 0 such that, for any x,x ∈ R d and i, j ∈ E, The lower bound condition insures that the second -discrete -coordinate of Z changes often enough (so that the second coordinates of two independent copies of Z coincide sufficiently often). This is a rather strong condition that can be relaxed in certain situations. If for example a(x, i, j) = λ(x, i)P ij as in (1.2), Lipschitz continuous λ bounded from below and an aperiodic P are sufficient to get a similar rate of convergence.
(1.4) Assumption 1.13 ensures that, for any i ∈ E, As a consequence, the vector fields F i has exactly one critical point σ(i) ∈ R d . Moreover it is exponentially stable since, for any x ∈ R d , In particular, X cannot escape from a sufficiently large ball (this implies that the (continuous) functions a(·, i, j) are also bounded from above along the trajectories of X).
More precisely, the following estimate holds.
Lemma 1.14. Under Assumptions 1.12 and 1.13, the process Z cannot escape from the compact setB(0, r) × E whereB(0, r) is the (closed) ball centered in 0 ∈ R d with radius r given by In particular the support of any invariant measure is included inB(0, r).
Let us now state our main result which establishes the quantitative exponential ergodicity of the process Z under Assumptions 1.12 and 1.13.
with p = e −2rκ/α and e = exp(1) and where b depends on the coalescence time of two independent processes defined on E as the second coordinates of independent copies of Z. To obtain this result we couple two paths of our process and compare their distance to a real-valued process that can pass instantly (less and less often) from small to large values (2r+1). This may seem rough, but in the general case, nothing much better can be done : if one of the flows is very strongly attractive, two paths may indeed very rapidly diverge. For particular examples, or under additional assumptions on the flows, it must be possible to get better rates.
Section 2 is dedicated to the proof of Theorem 1.10. Theorem 1.15 is established in Section 3. As an example of the applicability of this result, we show in Section 4 that the stochastic version of the Morris-Lecar model introduced in [35] satisfies our assumptions, so that explicit bounds may be obtained for its convergence to equilibrium.

Constant jump rates
The aim of this section is to prove Theorem 1.10. Assumption 1.6 ensures that (I t ) t 0 is an irreducible Markov process on the finite space E. Its generator A is the matrix defined by A(i, i) = −λ(i) and A(i, j) = λ(i)P (i, j) for i = j. Let us denote by ν ∞ its unique invariant probability measure. The study of the long time behavior of I is classical: since I takes its values in a finite set, it is quite simple to construct a coalescent coupling of two processes starting from different points.

Lemma 2.1 ([32]
). If Assumption 1.6 holds then there exists ρ > 0 such that for any i, j ∈ E, where (I t ) t 0 and (Ĩ t ) t 0 are two independent Markov processes with infinitesimal generator A starting respectively at i and j and T = inf t 0 : I t =Ĩ t is the first intersection time. ecp.ejpecp.org

Moments estimates
In this section we prove Lemma 1.9 and get an L p estimate for |X t |. For any p 2 and ε > 0, Thanks to Gronwall's Lemma, we get that This study has been already performed in [4]. Let us state the precise result. We denote by A p the matrix A−pB where A is the infinitesimal generator of I and B is the diagonal matrix with diagonal (α(1), . . . , α(n)) and associate to A p the quantity Re γ. (2. 2) The long time behavior of e(p, t) is characterised by θ p as follows. For any p > 0, there exist 0 < C 1 (p) < 1 < C 2 (p) < +∞ such that, for any any t > 0, C 1 (p)e −θpt e(p, t) C 2 (p)e −θpt .
See [4] for further details.
If p < κ, then t → E(|X t | p ) is bounded as soon as E(|X 0 | p ) is finite. This concludes the proof of Lemma 1.9.

Convergence rate
Let us now get the upper bound for the Wasserstein distance W p for some p < κ. Assume firstly that the initial law are two Dirac masses at (x, i) and (x, i). It is easy to construct a good coupling of the two processes (X, I) and (X,Ĩ): since the jump rates of I do not depend on X, one can choose I andĨ equal! As a consequence, for any p 2, Quantitative ergodicity for some switched dynamical systems As a consequence, Let us now turn to a general initial condition. Choose (x, i) and (x, j) in R d × E and consider the following coupling: the two processes evolve independently until the intersection time T of the second coordinates. Then, I andĨ are chosen to be equal for ever: 1). This term will turn to be negligible in W p (η t ,η t ). Now fix t > 0 and β ∈ (0, 1) and decompose: Choose q ∈ (p, κ) and define r = q/p and s as the conjugate of r. By Hölder's inequality and Lemma 1.9, we get Moreover, At last, one has to optimize over β ∈ (0, 1). With This concludes the proof of Theorem 1.10.

Non constant jump rates
Let us now turn to the proof of Theorem 1.15. In this section we do not assume that the jump rates depend only on the discrete component. Thus, the coupling is more subtle since once I andĨ are equal, they can go apart with a positive rate. The main idea is the following. If I andĨ are equal, the distance between X andX decreases exponentially fast and then it should be more and more easier to make the processes I andĨ jump simultaneously (since the jump rates are Lipschitz functions of X). This idea has been used in a different framework in [10,3].
This section is organized as follows. Firstly we prove Lemma 1.14 that ensures that the process X cannot escape from a sufficiently large ball. In particular, the support of the invariant law of X is included in this ball. Then we construct the coupling of two processes Z = (X, I) and Z = (X,Ĩ) driven by the same infinitesimal generator (1.2) with different initial conditions. At last we compare the distance between Z andZ to an companion process that goes to 0 exponentially fast.
Proof of Lemma 1.14. Settingx = 0 in (1.4) ensures that, for ε ∈ (0, α), In other words, As a consequence, In particular, X cannot escape from the centered closed ball with radius r = √ M /α.

The coupling
Let us construct a Markov process on (R d × E) 2 with marginals driven by (1.2) starting respectively from (x, i) and (x, j). This is done via its infinitesimal generator L which is defined as follows: y,x, j)).
• if i = j: where (·) + and (·) − stand respectively for the positive and negative parts. Notice that if f depends only on (x, i) or on (x, j), then Lf = Af . Let us explain how this coupling works. When I andĨ are different, the two processes (X, I) and (X,Ĩ) evolve independently. If I =Ĩ then two jump processes are in competition: a single jump vs two simultaneous jumps. The rate of arrival of a single jump equals to i ∈E |a(x, i, i ) − a(x, i, i )|. It is bounded above by κ|x −x|. The rate of arrival of a simultaneous jump is given by i ∈E (a(x, i, i ) ∧ a(x, i, i )). Assume firstly that X 0 andX 0 belong to the ballB(0, r) where r is given by (1.5). Let us define, for any t 0, Electron. Commun. Probab. 17 (2012), no. 56, 1-14. ecp.ejpecp.org The process (∆ t ) t 0 is not Markovian. Nevertheless, as long as I =Ĩ, ∆ decreases with an exponential rate which is greater than α. If a single jump occurs, then ∆ is increased by 1 and it can continuously increase (since the continuous parts are driven by two different vector fields). Nevertheless ∆ is bounded above by D + 1 with D = 2r. At the next coalescent time T c of two independent copies of I, ∆ jumps to ∆−1 and then decreases exponential fast once again (as long as the discrete components coincide).
There exists b > 0 such that T c is (stochastically) smaller than E(b) (for example, if E = {0, 1}, then T c is equal to the minimum of the jump times of the two independent processes which are both stochastically smaller than a random variable of law E(a) and T c is stochastically smaller than E(2a)).

The companion process
Theorem 3.1. For any t 0, Remark 3.2. If α goes to ∞, then γ goes to d whereas γ ∼ pα/b if b goes to ∞.
Proof. Starting from D + 1, the process U jumps after a random time with law E(b) to d and then goes to zero exponentially fast until it (possibly) goes back to D + 1. The first jump time T starting from D can be constructed as follows: let E be a random variable with exponential law E(1). Then In other words, the cumulative distribution function F T of T is such that, for any t 0, Let us define p = e −Dκ/α . The law of T is the mixture with respective weights p and 1 − p of a Dirac mass at +∞ and a probability measure on R with density and cumulative distribution function Starting at D, U will return to D with probability 1−p. The Markov property ensures that the number N of returns of U to D is a random variable with geometric law with parameter p. The length of a finite loop from D to D can be written as the sum S + E where the law of S has the density function f given in (3.2), the law of E is the exponential measure with parameter b and S and E are independent. Lemma 3.3. The variable S is stochastically smaller than an exponential random variable with parameter α i.e. for any t 0, Proof. For any t 0, This ensures the stochastic bound.
As a consequence, the Laplace transform L S of S with density f is smaller than the one of an exponential variable with parameter α: for any s < α, If L e is the Laplace transform of S + E, then, for any s < α ∧ b, we have Classically, for any s ∈ R such that (1 − p)L e (s) < 1, one has Let us denote by the two roots of ξ 2 − (α + b)ξ + pαb = 0. Notice that γ < α ∧ b <γ. For any s < γ, one has (1 − p)L e (s) < 1 and Let us now turn to the control of E(U t |U 0 = D). The idea is to discuss whether H > βt or not for some β ∈ (0, 1) (and then to choose β as good as possible): • if H < βt, then U t e −(1−β)αt , • the event {H βt} has a small probability for large t since H has a finite Laplace transform on a neighbourhood of the origin.

Example
The Morris-Lecar model introduced in [ Here we focus on a model described in [35], to which we refer for additional information. This model is motivated by the fact that m and n, being proportions of open channels, are better coded as discrete variables. More precisely, we fix a large integer K (the total number of channels) and define a PDMP (V, u 1 , u 2 ) with values in R × {0, 1/K, 2/K . . . , 1} 2 as follows.
Firstly, the potential V evolves according to where C and I are positive constants (the capacitance and input current), the g i and V i are positive constants (representing conductances and equilibrium potentials for different types of ions), u 3 (t) is equal to 1 and u 1 (t), u 2 (t) are the (discrete) proportions of open channels for two types of ions.
These two discrete variables follow birth-death processes on {0, 1/K, . . . , 1} with birth rates α 1 , α 2 and death rates β 1 , β 2 that depend on the potential V : Let us check that our main result can be applied in this example. Formally the process is a PDMP with d = 1 and the finite set E = {0, 1/K, . . . , 1} 2 . The discrete process (u 1 , u 2 ) plays the role of the index i ∈ E, and the fields F (u1,u2) are defined (on R) by (4.1) by setting u 1 (t) = u 1 , u 2 (t) = u 2 . The constant term u 3 g 3 in (4.1) ensures that the uniform dissipation property (1.4) is satisfied: for all (u 1 , u 2 ), The Lipschitz character and the bound from below on the rates are not immediate.
Indeed the jump rates (4.2) are not bounded from below if V is allowed to take values in R. However, a direct analysis of (4.1) shows that V is essentially bounded : all the fields F (u1,u2) point inward at the boundary of the (fixed) line segment S = [0, max(V 1 , V 2 , V 3 + (I + 1)/g 3 u 3 )], so if V (t) starts in this region it cannot get out. The necessary bounds all follow by compactness, since α i (V ) and β i (V ) are C 1 in S and strictly positive.