Fluctuations of the Empirical Measure of Freezing Markov Chains

In this work, we consider a finite-state inhomogeneous-time Markov chain whose transition probabilities tend to decrease over time. This can be seen as a cooling of the dynamics of an underlying Markov chain. We are interested in the long-time behavior of the empirical measure of this freezing Markov chain. Some recent papers provide almost sure convergence and convergence in distribution in the case of the freezing speed $n^{-\theta}$, with different limits depending on whether $\theta<1$, $\theta=1$ or $\theta>1$. Using stochastic approximation techniques, we generalize these results to any freezing speed, and we obtain a sharper characterization of the limit distribution, as well as rates of convergence and functional convergence.


Introduction
Let $(i_n)_{n\ge1}$ be an inhomogeneous-time Markov chain with state space $\{1,\dots,D\}$ and the following transitions when $i \ne j$:
$$P(i_{n+1}=j \mid i_n=i) = q_n(i,j), \qquad q_n(i,j) = p_n\bigl(q(i,j) + r_n(i,j)\bigr),$$
where $(p_n)_{n\ge1}$ is a decreasing sequence converging toward some $p\in[0,1]$, the remainders $r_n(i,j)$ tend to $0$ (fast enough) and $q$ is the discrete generator of some $\{1,\dots,D\}$-valued ergodic Markov chain. This model is related to the simulated annealing algorithm, and the sequence $(p_n)_{n\ge1}$ can be interpreted as the cooling scheme of an underlying Markov chain generated by $q$. If $p<1$, since $\lim_{n\to+\infty} q_n(i,j) = p\,q(i,j)$, the probability that $(i_n)_{n\ge1}$ moves decreases over time, whence the name freezing Markov chain. As pointed out in [14, Section 1] or in [17, Section 2], this type of inhomogeneous Markov chain is naturally related to MCMC algorithms, Bernoulli trials (like the GEM model) or urn models.
The behavior of $(i_n)_{n\ge1}$ is simple enough to understand, and depends on the summability of the sequence $(p_n)_{n\ge1}$. The chain $(i_n)_{n\ge1}$ converges in distribution to the unique invariant probability $\nu$ associated to $q$ if $\sum_{n=1}^{\infty} p_n = +\infty$ (see Theorem 2.4 below). On the other hand, if $\sum_{n=1}^{\infty} p_n < +\infty$, the Markov chain freezes along the way, as a consequence of the Borel-Cantelli Lemma. Hence, we shall assume that $\sum_{n=1}^{\infty} p_n = +\infty$, so that we can investigate the convergence of the empirical distribution $x_n = \frac{1}{n}\sum_{k=1}^{n} \delta_{i_k}$.
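Both regimes are easy to visualize by simulation. The sketch below uses a hypothetical 3-state generator $q$, a freezing speed $p_n = n^{-\theta}$ with $\theta = 0.2$ and $r_n = 0$, all chosen for illustration only; it shows the empirical distribution $x_n$ approaching $\nu$ when $\sum_n p_n = +\infty$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete generator q of a 3-state ergodic chain:
# Id + Q is an irreducible stochastic matrix.
Q = np.array([[-0.6, 0.3, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.4, -0.8]])
D = Q.shape[0]

# Invariant measure nu: the probability vector with nu @ Q = 0.
w, v = np.linalg.eig((np.eye(D) + Q).T)
nu = np.real(v[:, np.argmax(np.real(w))])
nu /= nu.sum()

# Freezing chain: off-diagonal transitions q_n(i,j) = p_n * q(i,j),
# here with p_n = n**(-theta) and r_n = 0.
theta, n_steps = 0.2, 100_000
i, counts = 0, np.zeros(D)
for n in range(1, n_steps + 1):
    p_n = min(1.0, n ** -theta)
    probs = p_n * Q[i]
    probs[i] = 0.0
    probs[i] = 1.0 - probs.sum()   # probability of staying put
    i = rng.choice(D, p=probs)
    counts[i] += 1

x_n = counts / n_steps             # empirical distribution of the chain
print(np.abs(x_n - nu).max())      # small: x_n is close to nu
```

With $\theta > 1$ instead, $\sum_n p_n < +\infty$ and the same loop freezes in one state after finitely many moves, as predicted by the Borel-Cantelli argument.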
The problem of the convergence of this empirical measure can be traced back to the thesis of Dobrušin [15], and several questions are still open, as pointed out in the recent article [17]. Some results can be obtained from the general theory developed in [31, 29], and [14, 17] study the present model. In particular, convergence results are only obtained for a freezing rate of the form $p_n = a/n^{\theta}$ (and $r_n(i,j) = 0$). More precisely,
• if $\theta < 1$ then $(x_n)_{n\ge1}$ converges to $\nu$ in probability; see [14, Theorem 1.2];
• if $\theta < 1/2$, then $(x_n)_{n\ge1}$ converges to $\nu$ a.s. This can be extended to $1/2 \le \theta < 1$ when the state space contains only two points; see [14, Theorem 1.2] and [17, Corollary 2];
• if $\theta < 1$ and $D = 2$, then, up to an appropriate scaling, the empirical measure $(x_n)_{n\ge1}$ converges in distribution to a Gaussian distribution; see [17, Theorem 2];
• if $\theta = 1$ then $(x_n)_{n\ge1}$ converges in distribution, and the moments of the limit probability are explicit. If $q$ corresponds to the complete graph (see Section 4) then this limit probability is the Dirichlet distribution. When $D = 2$, this covers classic distributions such as Beta, uniform, Arcsine or Wigner distributions;
• when $D = 2$, some convergence results are established for $(x_n)_{n\ge1}$ for general sequences $(p_n)_{n\ge1}$, under technical conditions that we find hard to check in practice; see [17, Theorem 3].
We shall refer to the case $\theta < 1$ as standard, since it is related to classic laws of large numbers and central limit theorems. This case was called subcritical in [17], in comparison with the critical case $\theta = 1$. Since we slightly generalize this critical case here, the term non-standard will be preferred from now on. In the present article, we generalize the aforementioned results by proving that, in the standard case, if $\sum_{n=1}^{\infty} (p_n n^2)^{-1} < +\infty$ then $(x_n)_{n\ge1}$ converges to $\nu$ a.s., and we also give weaker conditions for convergence in probability; this is the purpose of Theorem 2.10. Under slightly stronger assumptions and up to a rescaling, we obtain convergence of $(x_n)_{n\ge1}$ to a Gaussian distribution with explicit variance in Theorem 2.11. Finally, if $p_n \sim a/n$, then $(x_n)_{n\ge1}$ converges in distribution exponentially fast to a limit probability (see Theorem 2.8). This distribution is characterized as the stationary measure of a piecewise-deterministic Markov process (PDMP), possesses a density with respect to the Lebesgue measure and satisfies a system of transport equations; see Propositions 3.1 and 3.4. Furthermore, Corollary 3.9 links the standard and non-standard settings by providing a convergence of the rescaled stationary measure of the PDMP to a Gaussian distribution as the switching accelerates. We also investigate the complete graph dynamics in Section 4 and are able to derive explicit results in Propositions 4.1 and 4.2. Most of our convergence results also come with quantitative speeds and functional convergences.
In contrast with the Pólya urn model (see for instance [22]), where the random sequence converges almost surely to a random variable (thus converges in distribution), Theorems 2.8 and 2.11 provide convergences in distribution which are not a consequence of an almost sure convergence. However, note that, by letting $p_n = 1$ for all $n \ge 1$, we can recover classic limit theorems for homogeneous-time Markov chains (see [24]). Furthermore, the remainder term $r_n(i,j)$ enables us to deal with different freezing schemes (see Remark 2.3). In particular, the proofs in [14] and [17] are mainly based on the method of moments, which is why more stringent assumptions are considered there. Our approach is completely different, and is based on the theory of asymptotic pseudotrajectories detailed in [3] and revisited in [4].

EJP 23 (2018), paper 2.
Briefly, a sequence is an asymptotic pseudotrajectory of a flow if, for any given time window, the sequence and the flow starting from the same point evolve close to each other (see for instance [6, 3]). This definition can be formalized for dynamical systems and be extended to discrete sequences of probabilities and continuous Markov semigroups. This theory allows us to derive the behavior of the sequence of empirical measures $(x_n)_{n\ge1}$ from that of auxiliary continuous-time Markov processes. The interested reader may find illustrations of this phenomenon in [4, Figures 3.1, 3.2 and 3.3]; see also Figure 4. In the present paper, depending on whether we work in a standard or non-standard setting, these processes are either a diffusive process or a switching PDMP. The careful study of these limit processes is of interest per se, and is done in Section 3. More precisely, Gaussian distributions appear naturally since we deal with an Ornstein-Uhlenbeck process generated by $L_O$ defined in (1.1), with $p$ and $h$ respectively defined in Assumption 2.1 and in (2.6). On the contrary, we shall see that, in a non-standard framework, the empirical measure is linked to a PDMP, called the exponential zig-zag process, generated by $L_Z$ defined in (1.3). These Markov processes shall be defined and studied more rigorously in Section 3. In particular, besides some classic long-time properties (regularity, invariant measure, rate of convergence...), we prove in Theorem 3.7 the convergence of the exponential zig-zag process to the Ornstein-Uhlenbeck process when the frequency of jumps accelerates, i.e. when $a \to +\infty$.
The rest of this paper is organized as follows. In Section 2, we specify the notation and assumptions mentioned earlier, which will be used throughout the paper. We also state convergence results for $(x_n)_{n\ge1}$, namely Theorems 2.8, 2.10 and 2.11. We study the long-time behavior of the two auxiliary Markov processes in Section 3 and investigate the case of the complete graph in Section 4, for which it is possible to get explicit formulas. The paper is then concluded with the proofs of the main theorems in Section 5.

Notation
We shall use the following notation throughout the paper:
• For a multi-index $N = (N_1, \dots, N_d) \in \mathbb{N}^d$, we identify an integer $N$ with the multi-index $(N, \dots, N)$. Likewise, for any $x \in \mathbb{R}^d$, let $|x| = \sum_{i=1}^{d} |x_i|$.
• For some multi-index $N$ and an open set $U \subseteq \mathbb{R}^d$, $\mathcal{C}^N(U)$ is the set of functions $f : U \to \mathbb{R}$ which are $N_i$ times continuously differentiable in the direction $i$; for $f \in \mathcal{C}^N(U)$ and a multi-index $j \le N$, $f_j$ denotes the corresponding partial derivative of $f$. When there is no ambiguity, we write $\mathcal{C}^N$ instead of $\mathcal{C}^N(U)$, and denote by $\mathcal{C}^N_b$ and $\mathcal{C}^N_c$ the respective sets of bounded $\mathcal{C}^N$ functions and of compactly supported $\mathcal{C}^N$ functions.
• We denote by L (X) the probability distribution of a random vector X, and we identify the measures over {1, . . . , D} with the 1 × D real-valued matrices. Let L be the Lebesgue measure over R D .
• If $\mu, \nu$ are probability measures and $f$ is a function, we write $\mu(f) = \int f(x)\,\mu(dx)$. For a class of functions $F$, we define $d_F(\mu,\nu) = \sup_{f \in F} |\mu(f) - \nu(f)|$. Note that, for every class of functions $F$ considered in this paper, convergence in $d_F$ implies (and is often equivalent to) convergence in distribution (see [4, Lemma A.1]). In particular, let $\mathcal{W}$ and $d_{TV}$ be respectively the Wasserstein distance and the total variation distance, obtained for suitable classes $F$.
• We write, for $n \ge 1$, $u_n = O(v_n)$ if there exists some bounded sequence $(h_n)_{n\ge1}$ such that $u_n = h_n v_n$. Moreover, if $\lim_{n\to+\infty} h_n = 0$, we write $u_n = o(v_n)$, and if $\lim_{n\to+\infty} h_n = 1$, we write $u_n \sim v_n$.

Assumptions and main results
Let $D$ be a positive integer and $(i_n)_{n\ge1}$ be a $\{1,\dots,D\}$-valued inhomogeneous-time Markov chain such that, for all $i \ne j$, $P(i_{n+1} = j \mid i_n = i) = q_n(i,j)$. The following assumption, which will be in force in the rest of the paper, describes the behavior of the transitions $q_n$ as time goes by.

Assumption 2.1 (Freezing speed). Assume that the matrix $\mathrm{Id}+q$ is irreducible and, for $n \ge 1$ and $i \ne j$,
$$q_n(i,j) = p_n\bigl(q(i,j) + r_n(i,j)\bigr), \tag{2.1}$$
where $(p_n)$ is a sequence decreasing to $p \in [0,1]$ such that $\sum_{n=1}^{\infty} p_n = +\infty$, and $\lim_{n\to+\infty} r_n(i,j) = 0$. For $i \ne j$, assume $q(i,j) \ge 0$ and $q_n(i,j) \ge 0$.

Note that we do not require $(p_n)_{n\ge1}$ to converge to $0$. Of course, if $p > 0$, then the series $\sum_n p_n$ trivially diverges; as pointed out in the introduction, if this series converges then the problem is trivial. In fact, if $p_n = 1$ and $r_n(i,j) = 0$ for all integers $i, j, n$, then the freezing Markov chain $(i_n)_{n\ge1}$ is a classic Markov chain. When $p = 0$, the dynamics of Assumption 2.1 corresponds to the lazier and lazier random walk introduced in [4].

Remark 2.2 (Indecomposable matrices). If $\mathrm{Id}+q$ is only indecomposable¹, one can write, up to a permutation matrix $P$,
$$P^{-1}(\mathrm{Id}+q)P = \begin{pmatrix} A & 0 \\ B & A' \end{pmatrix},$$
where $A, A'$ are square matrices. Indecomposability allows such a decomposition, as long as $B$ has a nonzero entry. In any case, $\mathrm{Id}+q$ possesses a unique absorbing class of states on which it is irreducible. Using the Perron-Frobenius Theorem (see [21, Theorem 2, p. 53]), the matrix $\mathrm{Id}+q$ possesses a unique invariant measure $\nu$, and the associated chain converges toward it under aperiodicity assumptions (see also Remark 3.3). Note that aperiodicity hypotheses are not relevant for the freezing Markov chain whenever $p < 1$, since the freezing scheme automatically provides aperiodicity to the Markov chain.
Under Assumption 2.1, $\mathrm{Id}+q$ possesses a unique invariant distribution $\nu$, characterized by $\nu q = 0$; let $\nu \in \Delta$ be its associated vector.¹

¹Throughout this paper, a Markov chain (or its associated transition matrix) is said to be indecomposable if it admits a unique recurrent class. The algebraic term indecomposable also exists for matrices, and is sometimes mistaken for irreducibility.

Remark 2.3 (Interpretation of the term $r_n(i,j)$). The remainder $r_n(i,j)$ in (2.1) can either model small perturbations of the main freezing speed $p_n q(i,j)$, or a multiscale freezing scheme with $p_n$ being the slowest freezing speed.

The following result characterizes the long-time behavior of the inhomogeneous Markov chain $(i_n)_{n\ge1}$.

Theorem 2.4 (Convergence of the freezing Markov chain). Under Assumption 2.1, $(i_n)_{n\ge1}$ converges in distribution to $\nu$.

Note that, with $\gamma_n = 1/n$, the empirical measure satisfies the recursion
$$x_{n+1} = \frac{\gamma_{n+1}}{\gamma_n}\, x_n + \gamma_{n+1}\, e_{i_{n+1}}, \tag{2.4}$$
that the vector $x_n$ belongs to the simplex $\Delta$ and that $(x_n, i_n) \in E = \Delta \times \{1,\dots,D\}$. We highlight the fact that, in general, the sequence $(x_n)_{n\ge1}$ is not a Markov chain by itself, but $(x_n, i_n)_{n\ge1}$ is.
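The recursion (2.4) with $\gamma_n = 1/n$ is nothing but the running-average update of the empirical measure; a short sanity check (with i.i.d. symbols standing in for the chain $(i_n)_{n\ge1}$, an arbitrary stand-in) confirms that iterating it reproduces $\frac{1}{n}\sum_{k\le n} e_{i_k}$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
seq = rng.integers(0, D, size=1000)       # stand-in for the chain (i_n)

# Iterate x_{n+1} = (gamma_{n+1}/gamma_n) x_n + gamma_{n+1} e_{i_{n+1}}
# with gamma_n = 1/n, starting from x_1 = e_{i_1}.
x = np.zeros(D)
x[seq[0]] = 1.0
for n, i in enumerate(seq[1:], start=1):  # n is the current index
    g_next = 1.0 / (n + 1)
    x = (g_next * n) * x                  # gamma_{n+1}/gamma_n = n/(n+1)
    x[i] += g_next

empirical = np.bincount(seq, minlength=D) / len(seq)
print(np.abs(x - empirical).max())        # zero up to rounding error
```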

Remark 2.5 (Interpretation of $\Delta$). The map $x \mapsto x^\top$ is a natural bijection between $\Delta$ and the set of probability measures over $\{1,\dots,D\}$. Then, the sequence $(x_n)_{n\ge1}$ can be viewed as the sequence of empirical measures of the Markov chain $(i_n)_{n\ge1}$. From that viewpoint, we highlight the fact that the $L^1$ norm over $\Delta$ can be interpreted (up to a multiplicative constant) as the total variation distance: indeed, for any $x, \tilde x \in \Delta$, $d_{TV}(x^\top, \tilde x^\top) = \frac{1}{2}\,|x - \tilde x|$.

Following [3, 4], and given sequences $(\gamma_n)_{n\ge1}$, $(\varepsilon_n)_{n\ge1}$, we define the parameter $\lambda(\gamma, \varepsilon)$, which rules the speed of convergence in the context of standard fluctuations. Finally, we need to introduce a fundamental tool in the study of the standard fluctuations: the matrix $h$, which is the solution of the multidimensional Poisson equation (2.6). With the help of the Perron-Frobenius Theorem (see [21, Theorem 2, p. 53]), it is easy to see that $h$ is well-defined.
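For concreteness, the well-posedness of $h$ can be checked numerically. The normalization below is an assumption (the paper's equation (2.6) may differ by a sign or a centering convention): we solve, columnwise, $q\,h = \mathbf{1}\nu - \mathrm{Id}$, where $\mathbf{1}\nu$ is the rank-one matrix whose rows all equal $\nu$; the right-hand side is orthogonal to the left kernel of $q$, exactly as guaranteed by the Perron-Frobenius Theorem:

```python
import numpy as np

# Hypothetical generator Q and its invariant measure nu (nu @ Q = 0).
Q = np.array([[-0.6, 0.3, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.4, -0.8]])
D = Q.shape[0]
w, v = np.linalg.eig((np.eye(D) + Q).T)
nu = np.real(v[:, np.argmax(np.real(w))])
nu /= nu.sum()

# Poisson equation, one convention (assumed): for each coordinate j, solve
#   (Q H)(i, j) = nu_j - 1_{i = j},   i.e.   Q H = 1 nu - Id.
# Q is singular, so use least squares, then center H so that nu @ H = 0.
rhs = np.outer(np.ones(D), nu) - np.eye(D)
H, *_ = np.linalg.lstsq(Q, rhs, rcond=None)
H -= np.outer(np.ones(D), nu @ H)          # Q @ (1 c) = 0, residual unchanged

print(np.abs(Q @ H - rhs).max())           # ~ 0: H solves the equation
```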
Throughout the paper, we shall treat two different cases, which entail different limit behaviors for the fluctuations of $(x_n)_{n\ge1}$ or $(y_n)_{n\ge1}$. Each of these cases corresponds to one of the two following assumptions.
Note that, under Assumption 2.6, the sequences $(\gamma_n)_{n\ge1}$ and $(p_n)_{n\ge1}$ are equivalent up to a multiplicative constant and the scaling $(\alpha_n)_{n\ge1}$ is trivial, hence we are not interested in the behavior of $(y_n)_{n\ge1}$.
then, denoting by $\rho$ the spectral gap of $\mathrm{Id}+q$, for any $v < \theta$ there exist a class of functions $F$ defined in (5.4) and a positive constant $C$ such that $d_F(\mathcal{L}(x_n, i_n), \pi) \le C n^{-v}$. It should be noted that our approach for the study of the long-time behavior of $(x_n, i_n)_{n\ge0}$ also provides functional convergence for some interpolated process $(X_t, I_t)_{t\ge0}$ defined in (5.3) (see Lemma 5.1, of which Theorem 2.8 is a straightforward consequence). Moreover, the speed of convergence provided by Theorem 2.8 can be stated as follows: for any function $f : \Delta \times \{1,\dots,D\} \to \mathbb{R}$, twice differentiable in the first variable, there exists a constant $C_f$ such that the corresponding bound holds.

Remark 2.9 (Is it possible to generalize Assumption 2.6?). This remark leans heavily on the proof of Theorem 2.8 and may be omitted at first reading. It is interesting to wonder whether it is possible to obtain non-standard fluctuations for a more general freezing speed $(p_n)_{n\ge1}$. To that end, let us try to mimic the computations of the proof of Theorem 2.8 for any vanishing sequences $(\gamma_n)_{n\ge1}$ and $(\tilde\gamma_n)_{n\ge1}$. Our method being based on asymptotic pseudotrajectories, the limit of the rescaled process of $(x_n, i_n)_{n\ge1}$ belongs to a certain class of PDMPs which can be attained if, and only if, the conditions (2.7) hold. Without loss of generality, one can choose $\gamma_n = \tilde\gamma_n$ and $C_2 = C_3 = 1$.
Then, the third term of (2.7) entails $\gamma_n = (n + o(1))^{-1}$ as $n \to +\infty$, which in turn implies $p_n = C_1 n^{-1} + o(n^{-1})$ when injected in the first term of (2.7). Also, note that assuming $A < 1$ or $\theta > 1$ in Theorem 2.8 would not provide better speeds of convergence, since one would obtain a speed of the same form.

Theorem 2.10 (Convergence in the standard case). Under Assumptions 2.1 and 2.7, $\lim_{n\to+\infty} x_n = \nu$ in probability, or equivalently in $L^1$. Moreover, if $\sum_{n=1}^{\infty} \gamma_n^2 p_n^{-1} < +\infty$, then $\lim_{n\to+\infty} x_n = \nu$ a.s. Moreover, if $\lambda = \lambda(\gamma, \gamma/p) \wedge \lambda(\gamma, R) > 0$, then, for any $v < \lambda$ there exists a constant $C > 0$ such that $|x_n - \nu| \le C n^{-v}$ a.s.

Theorem 2.11 (Standard fluctuations). Under Assumptions 2.1 and 2.7, $(y_n)_{n\ge1}$ converges in distribution to the Gaussian distribution $\mathcal{N}\bigl(0, \Sigma^{(p,\Upsilon)}\bigr)$.

The precise proofs of the main results are deferred to Section 5. As pointed out in the introduction, our proofs of Theorems 2.8 and 2.11 rely on comparing $(x_n)_{n\ge1}$ and $(y_n)_{n\ge1}$ with auxiliary continuous-time Markov processes, using the theory of asymptotic pseudotrajectories (in Lemma 5.1) and the SDE method (in Section 5.2). Then, these discrete Markov chains will inherit some properties of the Markov processes that we investigate in Section 3. In particular, the results we use provide functional convergence of the rescaled interpolating processes to the auxiliary Markov processes.
It should be noted that assuming that $(p_n)_{n\ge1}$ is decreasing, $\lim_{n\to+\infty} p_n = 0$ and $\sum_{n=1}^{\infty} p_n = +\infty$ does not imply in general that $p_{n+1} \sim p_n$. A slight modification of the proof shows that, if $p_{n+1}$ is not equivalent to $p_n$, we have to assume the existence of a sequence $(\beta_n)_{n\ge1}$ satisfying a suitable comparability condition, and such that the sequences $(\gamma_n^2 \beta_n^2 p_n^{-1})_{n\ge1}$ and $(\beta_n \gamma_n)_{n\ge1}$ are decreasing; then the conclusion of Theorem 2.11 holds.

The auxiliary Markov processes
In this section, we study the ergodicity of the processes arising as limits of the freezing Markov chain from Section 2. We also study their invariant measures, and provide explicit formulas when it is possible.

The exponential zig-zag process
In this section, we investigate the asymptotic properties of the exponential zig-zag process, which arises from the non-standard scaling of the Markov chain $(i_n)_{n\ge1}$. To this end, let $(X_t, I_t)_{t\ge0}$ be the strong solution of the SDE (3.1) (see [23]), with values in $E$, whose continuous component satisfies
$$dX_t = (e_{I_t} - X_t)\,dt,$$
and whose discrete component $I$ jumps from $i$ to $j$ at the epochs of independent Poisson processes $N^{i,j}$ of intensity $a\,q(i,j)\mathbf{1}_{\{i\ne j\}}$. Thus, the infinitesimal generator of this process is $L_Z$ defined in (1.3) (see e.g. [18, 13, 25]). Actually, the exponential zig-zag process is a PDMP; the interested reader can consult [13, 8] for a detailed construction of the process $(X, I)$. Let us describe briefly its dynamics: setting $I_0 = i$, the process possesses a continuous component $X$ which is exponentially attracted to the vector $e_i$. The discrete component $I_t$ is piecewise-constant, and jumps from $i$ to $j$ following the epochs of the processes $N^{i,j}$, which in turn leads the continuous component to be attracted to $e_j$ (see Figure 1 for sample paths of the exponential zig-zag process, and Figure 3 for a typical path in the framework of Section 4.2).
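A direct simulation of these dynamics is straightforward. The sketch below uses the complete-graph rates $q(i,j) = \theta_j$ of Section 4 (an arbitrary choice for illustration) and exploits the fact that, between jumps, the flow is explicit: $X_{t+s} = e_i + (X_t - e_i)e^{-s}$. The long-run time average of $X$ is expected to be close to $\nu$, the mean of $e_I$ at equilibrium:

```python
import numpy as np

rng = np.random.default_rng(3)

# Complete-graph rates (an assumption for illustration): q(i,j) = theta_j
# for i != j, so that nu_i = theta_i / |theta|.
theta = np.array([1.0, 2.0, 3.0])
nu = theta / theta.sum()
a, T = 2.0, 5000.0

# Exponential zig-zag process: between jumps, dX_t = (e_I - X_t) dt;
# I jumps from i to j at rate a * q(i,j).
x, i, t = np.full(3, 1.0 / 3.0), 0, 0.0
integral = np.zeros(3)                      # running integral of X_t
while t < T:
    rates = a * theta.copy()
    rates[i] = 0.0
    s = rng.exponential(1.0 / rates.sum())  # time until the next jump
    s = min(s, T - t)
    e_i = np.zeros(3); e_i[i] = 1.0
    # exact integral of X over the inter-jump interval of length s
    integral += s * e_i + (x - e_i) * (1.0 - np.exp(-s))
    x = e_i + (x - e_i) * np.exp(-s)        # explicit flow
    i = rng.choice(3, p=rates / rates.sum())
    t += s

print(np.abs(integral / T - nu).max())      # time average of X close to nu
```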
The following result might be seen as a direct consequence of [7, Theorem 1.10] or [11, Theorem 1.4], although these articles do not provide explicit rates of convergence, which are useful for instance in the proof of Corollary 3.9.

Figure 1: Sample paths on $[0,5]$ of the exponential zig-zag process for $X_0 = (1/3, 1/3, 1/3)$, $q(i,j) = 1$ and $a = 0.5$, $a = 2$, $a = 20$ (from left to right).

Proposition 3.1 (Ergodicity).
The exponential zig-zag process $(X_t, I_t)_{t\ge0}$ admits a unique stationary distribution $\pi$. If $\rho$ is the spectral gap of $q$, then, for any $v < a\rho(1+a\rho)^{-1}$, there exists a constant $C > 0$ such that the distance between $\mathcal{L}(X_t, I_t)$ and $\pi$ is bounded by $Ce^{-vt}$. Note that the speed of convergence provided in Proposition 3.1 can be improved when $D = 2$, since we are able to use more refined couplings (see Proposition 4.5).
for any function $f$ smooth enough. Now, let us establish the absolute continuity of this invariant distribution with respect to the Lebesgue measure $\mathbb{L}$.

Lemma 3.2 (Absolute continuity of the exponential zig-zag process). Let $K \subset \mathring{\Delta}$ be a compact set. There exist constants $t_0, c_0 > 0$ and a neighborhood $V$ of $K$ such that, for any $(x,i) \in E$ and for all $t \ge t_0$, the law of $X_t$ started from $(x,i)$ dominates $c_0\,\mathbb{L}$ on $V$.

Remark 3.3 (When $\mathrm{Id}+q$ is only indecomposable). This remark echoes Remark 2.2 and describes the behavior of the Markov chain $(x_n, i_n)_{n\ge1}$ when $\mathrm{Id}+q$ is reducible but indecomposable. In that case, Proposition 3.1 holds as well. However, $\mathrm{Id}+q$ possesses a unique recurrent class which is strictly contained in $\{1,\dots,D\}$, the vector $\nu$ possesses at least one zero and belongs to the frontier of the simplex $\Delta$, and $\pi(\mathring{\Delta}) = 0$. It is then impossible to obtain an equivalent of Proposition 3.1 with a convergence in total variation; when $\mathrm{Id}+q$ is irreducible, this is possible using techniques inspired from [10]. If $\mathrm{Id}+q$ is indecomposable, one can obtain equivalents of Lemma 3.2 and Proposition 3.4 below by replacing the Lebesgue measure $\mathbb{L}$ on $\mathbb{R}^D$ by the Lebesgue measure on the linear subspace spanned by the recurrent class of $\mathrm{Id}+q$.
Proof of Lemma 3.2. The proof is mainly based on Hörmander-type conditions for switching dynamical systems obtained in [1, 8]. Using the notation of [8], let $F_i : x \mapsto e_i - x$; then, if $D \ge 3$, the strong bracket condition is satisfied. Using the Markov property, this holds for every $t \ge t_0$, which entails (3.6).
we obtain that $\pi$ admits an absolutely continuous part (note that uniqueness comes from Proposition 3.1). Since $\pi$ cannot have both an absolutely continuous part and a singular one (see [1, Theorem 6]), $\pi$ admits a density with respect to the Lebesgue measure, which entails (3.7).
Now, let us characterize the function $\varphi$. Using Lemma 3.5, for any $1 \le i \le D$, we can rewrite (3.9) accordingly; it follows that $\varphi$ is the solution of (3.8).

The Ornstein-Uhlenbeck process
In this short section, we recall a classic property of multidimensional Ornstein-Uhlenbeck processes, which is useful to characterize the behavior of $(y_n)_{n\ge1}$ in a standard setting. Thus, we define $(Y_t)_{t\ge0}$ as the strong solution of the SDE (3.10), with values in $\mathbb{R}^D$, where $W$ is a standard $D$-dimensional Brownian motion and $(\Sigma^{(p,\Upsilon)})^{1/2}$ is the square root of the positive-definite symmetric matrix $\Sigma^{(p,\Upsilon)}$, i.e. $(\Sigma^{(p,\Upsilon)})^{1/2}\bigl((\Sigma^{(p,\Upsilon)})^{1/2}\bigr)^\top = \Sigma^{(p,\Upsilon)}$. The process $Y$ is a classic Ornstein-Uhlenbeck process with infinitesimal generator $L_O$ defined in (1.1). Such processes have already been thoroughly studied, so we present only the following proposition, which quantifies the speed of convergence of $Y$ to its equilibrium.

Proof of Proposition 3.6. First, a straightforward integration by parts shows that, for any $f \in \mathcal{C}^2_c$, $\mathcal{N}\bigl(0, \Sigma^{(p,\Upsilon)}\bigr)(L_O f) = 0$, so that $\mathcal{N}\bigl(0, \Sigma^{(p,\Upsilon)}\bigr)$ is an invariant measure for the Ornstein-Uhlenbeck process $Y$.
It is well known and easy to check that
$$Y_t = e^{-t} Y_0 + \int_0^t e^{s-t}\,\bigl(\Sigma^{(p,\Upsilon)}\bigr)^{1/2}\, dW_s,$$
where $W$ is a standard Brownian motion. Consequently, if we consider $\tilde Y$ another Ornstein-Uhlenbeck process generated by $L_O$ and driven by the (same) Brownian motion $W$, then $Y_t - \tilde Y_t = e^{-t}(Y_0 - \tilde Y_0)$, which entails the uniqueness of the invariant probability distribution as well as the exponential ergodicity of the process.
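The coupling argument of the last step can be spelled out. Assuming, as the contraction at rate $1$ used throughout suggests, that the drift of (3.10) is $-y$, subtracting the two equations driven by the same Brownian motion removes the noise term:

```latex
% Synchronous coupling: Y and \tilde{Y} solve (3.10) with the same W.
Y_t - \tilde{Y}_t
  = (Y_0 - \tilde{Y}_0) - \int_0^t (Y_s - \tilde{Y}_s)\,\mathrm{d}s
\quad\Longrightarrow\quad
Y_t - \tilde{Y}_t = e^{-t}\,(Y_0 - \tilde{Y}_0).
% Taking an optimal coupling of the initial conditions then yields the
% Wasserstein contraction
\mathcal{W}\bigl(\mathcal{L}(Y_t), \mathcal{L}(\tilde{Y}_t)\bigr)
  \le e^{-t}\,\mathcal{W}\bigl(\mathcal{L}(Y_0), \mathcal{L}(\tilde{Y}_0)\bigr),
% and choosing \mathcal{L}(\tilde{Y}_0) = \mathcal{N}(0, \Sigma^{(p,\Upsilon)})
% gives exponential ergodicity.
```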

Acceleration of the jumps
The current section links Sections 3.1 and 3.2 in the following sense: slow freezing leads from the Markov chain to the exponential zig-zag process, fast freezing leads from the Markov chain to the Ornstein-Uhlenbeck process, and the acceleration of the jumps leads from the exponential zig-zag process to the Ornstein-Uhlenbeck process.
Indeed, we prove in Theorem 3.7 the convergence of the (rescaled) exponential zig-zag process to a diffusive process as the jump rates go to infinity. Such results are fairly standard and are already known in the cases of (linear) zig-zag processes (see [19, 9]) or of particle transport processes (see [12]). Heuristically, since there are more frequent jumps, the process tends to concentrate around its mean $\nu$, and the effect of the discrete component fades away. This phenomenon can be seen in Figure 1. We shall end this section with Corollary 3.9, which provides the convergence of the stationary distribution of the exponential zig-zag process toward a Gaussian distribution.
To this end, let $(a_n)_{n\ge1}$ be a sequence of positive numbers such that $a_n \to +\infty$ as $n \to +\infty$ and, for any integer $n$, let $(X^{(n)}, I^{(n)})$ be the exponential zig-zag process with jump parameter $a_n$; set $Y^{(n)} = \sqrt{a_n}(X^{(n)} - \nu)$. By Dynkin's formula, for fixed $n$, the processes $(M_t(k))_{t\ge0}$ and $(N_t(k,l))_{t\ge0}$ are local martingales with respect to the filtration generated by $(X^{(n)}, I^{(n)})$. By integration by parts, note that $I^{(n)}$ is a Markov process on its own, generated by $L^{(n)}$. In other words, for any $t > 0$, we can write $I^{(n)}_t = I_{a_n t}$ a.s., for some pure-jump Markov process $(I_t)_{t\ge0}$. Using the ergodicity of $(I_t)_{t\ge0}$ together with $\lim_{n\to+\infty} a_n = +\infty$, we conclude.

Remark 3.8 (Heuristics for a direct Taylor expansion of the generator). As for many limit theorems for Markov processes, one would like to predict the convergence of the exponential zig-zag process to the Ornstein-Uhlenbeck diffusion from a Taylor expansion of the generator. Let us describe here a quick heuristic argument based on [12], which justifies the particular choice of functions $\varphi_k$ in the proof of Theorem 3.7. For the sake of simplicity, let us work in the setting of Section 4.2, that is, the generator of $(X_t, I_t)_{t\ge0}$ contains the drift term $(e_i - x)\cdot\nabla_x f(x)$, which cannot be rescaled to converge to some diffusive operator. We need an approximation $f_a$ of $f$ in the sense that $\lim_{a\to+\infty} f_a = f$ and $L_Z f_a$ has the form of a second-order operator. Here, $\sum_j \nu_j g_j(x) - g_i(x) = \nu_1 - \mathbf{1}_{\{i=1\}}$ does not depend on $x$, and neither does the function $h$, which is thus defined by (2.6).

From Proposition 3.1, for any fixed $n \ge 1$, the process $(X^{(n)}, I^{(n)})$ converges in distribution to its stationary measure $\pi^{(n)}$.

Proof of Corollary 3.9. Let $n \ge 1$ and $t \ge 0$. Up to a constant, $d_F$ is the Fortet-Mourier distance and metrizes the weak convergence. Fix $t \ge 0$ and let $Y$ be an Ornstein-Uhlenbeck process with generator $L_O$ and initial condition $0$. Using the definition of $d_F$ and Proposition 3.1, it remains to check that the term $\mathcal{W}(\delta_0, \pi^{(n)})$ is uniformly bounded.
To that end, by Hölder's inequality and Proposition 3.6, the remaining term is bounded by a quantity which goes to $0$ as $t \to +\infty$.

Complete graph
In this section, we consider a particular case of freezing Markov chain, where all the states are connected, and the jump rate to a state does not depend on the position of the chain. This example of Markov chain has already been studied in the literature, for instance in [14]. Section 4.1 deals with the general D-dimensional case, for which most of the results of Section 3 can be written explicitly, notably the invariant measure of the exponential zig-zag process, which is a mixture of Dirichlet distributions (see Figure 2). Section 4.2 studies more deeply the case D = 2, where we can refine the speed of convergence provided in Proposition 3.1.

General case
Throughout this section, following [14], we assume that there exists a positive vector $\theta \in (0,+\infty)^D$ such that, for any $1 \le i, j \le D$ with $i \ne j$,
$$q(i,j) = \theta_j, \tag{4.1}$$
and we will recover [14, Theorem 1.4]. If $D = 2$, let us highlight that an irreducible matrix $\mathrm{Id}+q$ automatically satisfies (4.1) (if $\mathrm{Id}+q$ is indecomposable then this is true as soon as $q(1,2)q(2,1) \ne 0$).

Proof of Proposition 4.1. If $q$ satisfies (4.1), it is straightforward that its invariant distribution $\nu$ is given by $\nu_i = \theta_i |\theta|^{-1}$ for any $1 \le i \le D$. The convergence of $(i_n)_{n\ge1}$ to $\nu$ and of $(x_n, i_n)_{n\ge1}$ to some distribution $\pi$ are direct corollaries of Theorems 2.4 and 2.8. Moreover, Proposition 3.4 holds, hence $\pi$ satisfies (3.7), and it is clear that the corresponding Dirichlet-type density is the unique (up to a multiplicative constant) solution of (3.8), which entails the claimed expression for $\pi$.

In the framework of (4.1), it is also possible to obtain explicitly the solution of the Poisson equation related to $q$, as well as the covariance matrix of the limit distribution in the standard setting. This is the purpose of the following result, whose proof is straightforward using Theorem 2.11 together with the expressions (1.2) and (2.6).
Finally, let us emphasize the fact that Corollary 3.9 provides an interesting convergence of rescaled Dirichlet distributions, when considered in the particular case of the complete graph: for any vector $\theta \in (0,+\infty)^D$, if $(X_n)_{n\ge1}$ is a sequence of independent random variables such that $\mathcal{L}(X_n) = \mathcal{D}(a_n\theta)$, then $\lim_{n\to+\infty} \sqrt{a_n}(X_n - \nu) = \mathcal{N}\bigl(0, \mathrm{diag}(\nu) - \nu\nu^\top\bigr)$ in distribution.
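This convergence is easy to probe numerically. For finite $a$, the covariance of $\mathcal{D}(a\theta)$ is exactly $(\mathrm{diag}(\nu) - \nu\nu^\top)/(a|\theta|+1)$, so the matrix $\mathrm{diag}(\nu) - \nu\nu^\top$ is indeed the shape of the limiting covariance; the Monte Carlo check below (arbitrary $\theta$ and $a$, chosen for illustration) compares the empirical covariance to this exact formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters for illustration.
theta = np.array([1.0, 2.0, 3.0])
a = 50.0
alpha = a * theta
nu = theta / theta.sum()          # invariant measure of the complete graph

# Exact covariance of D(a*theta): (diag(nu) - nu nu^T) / (a|theta| + 1);
# its shape matrix diag(nu) - nu nu^T is the one appearing in the limit.
target = (np.diag(nu) - np.outer(nu, nu)) / (alpha.sum() + 1.0)

samples = rng.dirichlet(alpha, size=200_000)
emp_cov = np.cov(samples, rowvar=False)

print(np.abs(emp_cov - target).max())  # small Monte Carlo error
```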

The turnover algorithm
In this subsection, we consider the turnover algorithm introduced in [17]. This algorithm studies the empirical frequency of heads when a coin is turned over with a certain probability, instead of being tossed as usual. The authors provide various convergences in distribution for this proportion, depending on the asymptotic behavior of the turnover probability, which corresponds to $(p_n)_{n\ge1}$ in the present paper. However, this turnover algorithm can be seen as a particular case of freezing Markov chain, and can then be written as the stochastic algorithm defined in (2.4), in the special case $D = 2$. Since $x_n(1) = 1 - x_n(2)$, there is only one relevant variable in this section, which belongs to $[0,1]$. Note that we are in the framework of Section 4.1, with $\theta_1 = q(2,1)$ and $\theta_2 = q(1,2)$, and that Propositions 4.1 and 4.2 hold. In particular, we have $\nu_i = \theta_i(\theta_1+\theta_2)^{-1}$. Then, for any $y \in \mathbb{R}$ and $(x,i) \in [0,1]\times\{1,2\}$, the infinitesimal generators defined in (1.1) and (1.3) take a one-dimensional form, and the limit variance is, at first glance, different from the variance provided in [17] (under our notation). The factor $a^2$ comes from the fact that [17] studies the behavior of $a^{-1}y_n$. The factor $2$ comes from the choice of normalization mentioned earlier, since $x_n \in [0,1]$ while the quantity considered in [17] belongs to $[-1,1]$.
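The turnover dynamics itself takes a few lines to simulate. The sketch below makes the arbitrary choice $q(1,2) = q(2,1) = 1$ and $p_n = \min(1, 1/n)$, so that Proposition 4.1 predicts a $\beta(1,1)$, i.e. uniform, limit for the head frequency; it looks at the law of $x_n$ across independent runs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Turnover chain on {0, 1}: at step n the coin is turned over with
# probability p_n = min(1, a / n); here a = q(1,2) = q(2,1) = 1, so the
# head frequency is expected to be asymptotically uniform on [0, 1].
def head_frequency(n_steps, a=1.0):
    i = rng.integers(0, 2)          # random initial side (for symmetry)
    heads = 0
    for n in range(1, n_steps + 1):
        if rng.random() < min(1.0, a / n):
            i = 1 - i               # turnover
        heads += i
    return heads / n_steps

freqs = np.array([head_frequency(2_000) for _ in range(1_000)])
print(freqs.mean(), freqs.std())    # mean near 1/2, spread near 1/sqrt(12)
```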
Whenever D = 2, it is easier to visualize the dynamics of (X, I) (see Figure 3), and we can improve the results of Proposition 3.1 concerning the speed of convergence of the exponential zig-zag process to its stationary measure π.
Since the inter-jump times of the exponential zig-zag process are spread out, it is also possible to show convergence in total variation with a method similar to [10, Proposition 2.5]. Note that, following Proposition 4.1, the limit distribution of $(X_t)_{t\ge0}$ is the first margin of $\pi$, namely $\beta(a\theta_1, a\theta_2)$.
Proof of Proposition 4.5. Without loss of generality, let us assume that $\theta_1 \ge \theta_2$, that is $v = a\theta_1$. Using Proposition 4.1, it is clear that $\pi$ is the limit distribution of $(X, I)$. Let us turn to the quantification of the ergodicity of the process. Since the flow is exponentially contracting at rate $1$, one can expect the Wasserstein distance of the spatial component $X$ to decrease exponentially. The only issue is to bring $I$ to its stationary measure first. So, consider the Markov coupling $(X, I), (\tilde X, \tilde I)$ of $L_Z$ on $E \times E$, which evolves independently if $I \ne \tilde I$, and otherwise follows the same flow with common jumps. We set $T_0 = 0$ and denote by $T_n$ the epoch of its $n$-th jump. If $I_0 \ne \tilde I_0$, the first jump is not common a.s., but in any case, since $D = 2$, $I_{T_1} = \tilde I_{T_1}$ a.s. and $\mathcal{L}(T_1) = \mathcal{E}(v)$. Note that if $\mathcal{L}(I_0) = \mathcal{L}(\tilde I_0)$, we may let $I_0 = \tilde I_0$, so that the coupling $(X, I), (\tilde X, \tilde I)$ always has common jumps. Letting $(X_0, \tilde X_0)$ be the optimal Wasserstein coupling entails the Wasserstein contraction. The results above hold for any initial conditions $(\tilde X_0, \tilde I_0)$. Then, let $\mathcal{L}(\tilde X_0, \tilde I_0) = \pi$ to achieve the proof; in particular, $\mathcal{L}(\tilde I_0) = \nu = \frac{\theta_1}{\theta_1+\theta_2}\delta_1 + \frac{\theta_2}{\theta_1+\theta_2}\delta_2$.

Proofs
In this section, we provide the proofs of the main results of this paper that were stated throughout Section 2.
Proof of Theorem 2.4. Under Assumption 2.1, let us first assume that $p > 0$. The matrix $\mathrm{Id}+q$ is irreducible, and so is $\mathrm{Id}+pq$. Moreover, $\nu$ is also the invariant measure of $\mathrm{Id}+pq$, and the Perron-Frobenius Theorem entails that there exist $C > 0$ and $\rho \in (0,1)$ such that, for every $n \ge 1$ and $i \in \{1,\dots,D\}$, $d_{TV}\bigl(\delta_i(\mathrm{Id}+pq)^n, \nu\bigr) \le C\rho^n$. Now, let us prove that $(i_n)_{n\ge1}$ is an asymptotic pseudotrajectory of the dynamical system induced by $\mathrm{Id}+pq$. The limit set of such a system being contained in every global attractor (see [3, Theorems 6.9 and 6.10]), we have
$$d_{TV}\bigl(\delta_{i_n}(\mathrm{Id}+pq), \delta_{i_n}(\mathrm{Id}+q_n)\bigr) \le |p_n - p| + \sum_{j \ne i_n} |r_n(i_n, j)| \le |p_n - p| + \sum_{i,j=1}^{D} |r_n(i,j)|, \tag{5.1}$$
and the right-hand side of (5.1) converges to $0$, which ends the proof.
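The bound (5.1) can be checked mechanically: for each row, the total variation distance between the kernels $\mathrm{Id}+pq$ and $\mathrm{Id}+q_n$ is at most $|p_n - p| + \sum_{i,j}|r_n(i,j)|$. A small numerical verification (hypothetical $q$, $p$, $p_n$ and $r_n$, chosen for illustration) confirms it:

```python
import numpy as np

# Hypothetical data for illustration: p_n decreasing to p > 0, small r_n.
Q = np.array([[-0.6, 0.3, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.4, -0.8]])
D = Q.shape[0]
p = 0.5

def kernel(pn, rn):
    """Stochastic matrix with off-diagonal entries pn*(q(i,j)+rn(i,j))."""
    K = pn * (Q + rn)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, 1.0 - K.sum(axis=1))
    return K

K_lim = kernel(p, np.zeros((D, D)))
worst = 0.0
for n in range(1, 200):
    pn = p + 1.0 / (n + 1)                 # decreasing to p
    rn = np.full((D, D), 1e-2 / (n + 1))   # vanishing remainder
    Kn = kernel(pn, rn)
    tv = 0.5 * np.abs(Kn - K_lim).sum(axis=1).max()  # worst row, TV distance
    bound = abs(pn - p) + rn.sum()
    worst = max(worst, tv - bound)
print(worst <= 1e-12)  # the bound (5.1) holds for every n
```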

Asymptotic pseudotrajectories in the non-standard setting
In this section, we prove Theorem 2.8 using results from [4], based on the theory of asymptotic pseudotrajectories for inhomogeneous-time Markov chains. Indeed, with the convention $\sum_{k=1}^{0} = 0$, let
$$\tau_n = \sum_{k=1}^{n} \gamma_k, \qquad m(t) = \sup\{k \ge 0 : \tau_k \le t\}, \tag{5.2}$$
and define the piecewise-constant processes accordingly. We shall show that, as $t \to +\infty$, the process $(X_t, I_t)_{t\ge0}$ converges in a way (see Figure 4) to the exponential zig-zag process $(X_t, I_t)_{t\ge0}$ solution of (3.1), which we already studied in Section 3.1. To that end, let $(P_t)_{t\ge0}$ be the Markov semigroup of $(X, I)$, let $N_1 = (2, \dots, 2, 0)$ and let $F$ be the class of functions defined in (5.4). Note that convergence with respect to $d_F$ implies convergence in distribution (see [4, Lemma A.1]). Moreover, the sequence of processes $((X_{n+t}, I_{n+t})_{t\ge0})_{n\ge1}$ converges in distribution, as $n \to +\infty$, toward $(X^{\pi}_t, I^{\pi}_t)_{t\ge0}$ in the Skorokhod space, where $(X^{\pi}, I^{\pi})$ is a process generated by $L_Z$ with initial condition $\pi$. The proof relies on the following ingredients:
• Convergence of a kind of discrete infinitesimal generator $L_n$, which characterizes the dynamics of $(X, I)$, to $L_Z$ defined in (1.3).
• Smoothness of the limit semigroup (P t ) t≥0 and control of its derivatives with respect to the initial condition of the process.
• Uniform boundedness of the moments of (x n , i n ) n≥1 up to some order, which is trivially satisfied here since E is compact.
Proof of Lemma 5.1. In what follows, the notation $O$ (as $n \to +\infty$) is uniform over $\Delta$, and we study the convergence of $L_n$ to $L_Z$ in the sense of [4]. Let $(x,i) \in E$ and $\chi_{N_1}(x,i) = \sum_{k=1}^{D} x_k^2$. We then turn to the study of the regularity of the limit semigroup, following [26]. Let $t > 0$ and note that $\|P_t f\|_\infty \le \|f\|_\infty$. Moreover, the process $(X, I)$ is the solution of the SDE (3.1) (we emphasize the dependence on the initial condition). Gathering those expressions, and since $f_N$ is bounded for every multi-index $N \le N_1$, it is clear that $P_t f \in \mathcal{C}^{N_1}$, with explicit bounds for any $j, k \le D$; hence the desired estimate holds for any $j \le N_1$. Finally, for any $n \ge 1$, $|x_n| \le 1$, so that the moments of $(x_n, i_n)_{n\ge1}$ are uniformly bounded.

ODE and SDE methods in the standard setting
In the present section, we successively provide proofs for Theorems 2.10 and 2.11. We shall prove the former with a method involving an asymptotic pseudotrajectory for some interpolated process, similarly to Section 5.1 and [5]. On the contrary, the fluctuations obtained for $(y_n)_{n\ge1}$ in Theorem 2.11 are obtained through a more classic result for stochastic algorithms, namely the SDE method developed in [16] (see also [27]).
Note that $\bigl(\gamma_n \sum_{j=1}^{D} q(i_n, j)h_j - \gamma_{n+1} \sum_{j=1}^{D} q(i_{n+1}, j)h_j\bigr)$ is the main term of a telescoping series. It remains to bound the norm of the sum of the terms $\gamma_{n+1}p_{n+1}^{-1}$; by Theorem 2.10, we obtain the desired estimate, where $\Sigma^{(p,\Upsilon)}$ is defined in (1.2). Classically, we should prove that $\lim_{n\to+\infty} r_n = 0$, in order to work in the framework of [16], which is quite difficult.
Nevertheless, rather than checking that $\lim_{n\to+\infty} r_n = 0$, it is sufficient³ to prove a decomposition $r_n = r^{(1)}_n + r^{(2)}_n$ as in (5.11). The sequence $(r_n)_{n\ge1}$ can be decomposed as in (5.12); the first line of (5.12) is a telescoping series and is bounded by $\alpha_n \gamma_{n+1}$, which goes to $0$.
The second line of (5.12) is bounded by
$$C \sum_{n=m(t)}^{m(t+T)} |\alpha_{n+2}\gamma_{n+1} - \alpha_{n+1}\gamma_n|, \tag{5.13}$$
for some $C > 0$. Since (5.13) is a telescoping series as well, and goes to $0$, we have established the announced decomposition (5.11). As a conclusion, the diffusive limit $(Y_t)_{t\ge0}$ is the solution of (3.10), which trivially admits $V : z \mapsto |z|^2$ as a Lyapunov function, as required in [16]. The only use of an assumption on the eigenelements of $\Sigma^{(p,\Upsilon)}$ would be to guarantee the existence and uniqueness of, and the convergence to, an invariant distribution for $Y$, which was already proved in Proposition 3.6.
³This assertion can be easily checked at the end of [16, p. 156], whose proof is based on usual arguments on diffusion approximation, such as [18]. The decomposition (5.11) is often assumed in more recent generalizations, see for instance [20]. Note that we cannot use [20] directly, which besides does not provide functional convergence.