Metastable Markov chains: from the convergence of the trace to the convergence of the finite-dimensional distributions

We consider continuous-time Markov chains which display a family of wells at the same depth. We provide sufficient conditions which entail the convergence of the finite-dimensional distributions of the order parameter to the ones of a finite state Markov chain. We also show that the state of the process can be represented as a time-dependent convex combination of metastable states, each of which is supported on one well.


INTRODUCTION
Several methods to prove the metastable behavior of Markov chains have been proposed in recent years [37,10,15,16,19,9,18]. Among them, the martingale approach [2,5,6] provides tools, based on the uniqueness of solutions of martingale problems, to derive the convergence of the trace of a reduced model. The two main results of this article state that, under slightly stronger assumptions, one can prove the convergence of the finite-dimensional distributions of the reduced model and express the state of the process as a time-dependent convex combination of metastable states.
Consider a sequence of finite sets (E N : N ≥ 1). The elements of E N are called configurations and are denoted by the Greek letters η, ξ, ζ. Let {η N (t) : t ≥ 0} be a continuous-time, E N -valued, irreducible Markov chain.
Trace process. Let η(t) be a process on a state space X. Fix a proper subset A of X. The trace of the process η(t) on the set A, denoted by η_A(t), is the process obtained from η(t) by stopping its evolution when it leaves the set A and by restarting it when it returns to the set A. More precisely, denote by T_A(t) the total time spent at A before time t:

T_A(t) = ∫_0^t 1{η(s) ∈ A} ds ,

where 1{B} represents the indicator of the set B. Note that the function T_A is piecewise differentiable and that its derivative takes only the values 1 and 0: it is equal to 1 when the process is in A, and to 0 when it is not. Let S_A(t) be the generalized inverse of T_A(t): S_A(t) = sup{s ≥ 0 : T_A(s) ≤ t}.
The trace process is defined as η A (t) = η(S A (t)). It is shown in [2, Proposition 6.1] that if η(t) is a continuous-time, irreducible Markov chain, then η A (t) is a continuous-time, A-valued, irreducible Markov chain whose jump rates can be expressed in terms of the probabilities of hitting times of the original chain.
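As an illustration, the time change defining the trace can be carried out explicitly on a piecewise-constant trajectory. The sketch below (plain Python; the helper `trace_path` is a name chosen here, not from the article) deletes the excursions outside A and glues the remaining time intervals together, which is exactly the effect of the random time change η(S_A(t)).

```python
def trace_path(times, states, A, horizon):
    """Compute the trace on A of a piecewise-constant path.

    `times[i]` is the jump time into `states[i]` (times[0] == 0.0);
    the path holds the value `states[i]` on [times[i], times[i+1]).
    Returns jump times and states of the traced path, obtained by
    deleting the excursions outside A and gluing the remaining pieces.
    """
    A = set(A)
    out_t, out_s, clock = [], [], 0.0
    for i, s in enumerate(states):
        t0 = times[i]
        t1 = times[i + 1] if i + 1 < len(times) else horizon
        if s in A:
            if not out_s or out_s[-1] != s:
                out_t.append(clock)
                out_s.append(s)
            clock += t1 - t0  # the traced clock advances only inside A
    return out_t, out_s
```

For a path holding state 1 on [0, 1), state 0 on [1, 1.5) and state 2 on [1.5, 3), the trace on A = {1, 2} holds 1 on [0, 1) and 2 from the glued time 1.0 on.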
Slow variables. Consider a partition E^1_N, . . . , E^n_N, ∆_N, n ≥ 2, of the set E_N, and let

ℰ_N = ∪_{1≤x≤n} E^x_N ,  Ĕ^x_N = ∪_{y≠x} E^y_N ,  so that E_N = ℰ_N ⊔ ∆_N .   (1.1)

Here and below we use the notation A ⊔ B to represent the union of two disjoint sets A, B: A ⊔ B = A ∪ B with A ∩ B = ∅. The set ∆_N has to be understood as a set which separates the sets E^x_N, which we will refer to as wells. Let φ_N : E_N → {0, 1, . . . , n}, Ψ_N : ℰ_N → {1, . . . , n} be the projections defined by

φ_N(η) = Σ_{x=1}^n x 1{η ∈ E^x_N} ,

and by Ψ_N = the restriction of φ_N to ℰ_N. Note that φ_N(η) = 0 for η ∈ ∆_N, while Ψ_N is not defined on the set ∆_N. In general, φ_N(η_N(t)) is not a Markov chain, but only a hidden Markov chain. We say that φ_N is a slow variable if there exists a time-scale (θ_N : N ≥ 1) for which the dynamics of φ_N(η_N(tθ_N)) is asymptotically Markovian. This property is made precise in conditions (H1), (H2) below.
Martingale approach. Let (θ N : N ≥ 1) be a time-scale, and denote by ξ N (t) the process η N (t) speeded-up by θ N : ξ N (t) = η N (tθ N ). Let D(R + , E N ) be the space of right-continuous functions ω : R + → E N with left-limits endowed with the Skorohod topology. Let P η = P N η , η ∈ E N , be the probability measure on the path space D(R + , E N ) induced by the Markov chain ξ N (t) starting from η. Expectation with respect to P η is represented by E η .
Let ξ_{ℰ_N}(t) be the trace of the process ξ_N(t) on the union of the wells ℰ_N = ∪_{1≤x≤n} E^x_N, and recall that this trace is a continuous-time, ℰ_N-valued Markov chain. Denote by X_N(t), X^T_N(t) the hidden Markov chains given by X_N(t) = φ_N(ξ_N(t)), X^T_N(t) = Ψ_N(ξ_{ℰ_N}(t)), respectively. Note that X_N(t) takes values in {0, 1, . . . , n}, while X^T_N(t) takes values in the set S := {1, . . . , n}. Moreover, X^T_N(t) is the trace of the process X_N(t) on the set S.
A theory has been developed recently in [2,5,6] to prove the convergence of the process X_N(t) = φ_N(ξ_N(t)) to a Markov chain. In the terminology introduced above, this corresponds to the identification of the slow variables of the process η_N(t) in the time-scale θ_N. Under some general assumptions, the approach yields that:

(H1) The dynamics X^T_N(t) = Ψ_N(ξ_{ℰ_N}(t)) is asymptotically Markovian: for all x ∈ S, and all sequences η_N ∈ E^x_N, under the measure P_{η_N} the process X^T_N(t) converges in the Skorohod topology to a Markov chain denoted by X(t);

(H2) The time spent outside the wells is negligible: for every t > 0,

lim_{N→∞} sup_{η∈E_N} E_η[ ∫_0^t 1{ξ_N(s) ∈ ∆_N} ds ] = 0 .

The first condition asserts that the trace on S of the process X_N(t) converges to a Markov chain, while the second one states that the amount of time the process X_N(t) spends outside S vanishes as N ↑ ∞, uniformly over initial configurations in E_N.
The second condition can be restated as: for every t > 0,

(1.2) lim_{N→∞} sup_{η∈E_N} E_η[ ∫_0^t 1{X_N(s) = 0} ds ] = 0 .

Soft topology. It is clear that the convergence of the process X_N(t) to X(t) in the Skorohod topology does not follow from conditions (H1) and (H2). Consider, for example, a continuous-time, S-valued Markov chain Y(t), and a sequence δ_N > 0, δ_N ↓ 0. Fix t_0 > 0, and define the process Y_N(t). The sequence of processes Y_N(t) fulfills properties (H1) and (H2), but Y_N(t) does not converge to Y(t) in the Skorohod topology; actually, not even the one-dimensional distributions converge. This example is artificial, but in almost all models in which a metastable behavior has been observed (cf. the examples of Section 4), due to the very short sojourns of X_N(t) at 0, the process X_N(t) cannot converge, in any of the Skorohod topologies, to X(t). To overcome this obstacle, a weaker topology, called the soft topology, has been proposed in [29]; in this topology the convergence does take place.

Convergence of the finite-dimensional distributions
We propose here an alternative. Instead of searching for a weak topology in which the convergence takes place when conditions (H1) and (H2) are fulfilled, we show that conditions (H1) and (H2), together with some further properties, entail the convergence of the finite-dimensional distributions of X_N(t) to those of X(t). The first main result of the article reads as follows.

Proposition 1.1. Beyond (H1) and (H2), suppose that condition (1.3) holds for all x ∈ S. Then, for all x ∈ S, and all sequences {η_N : N ≥ 1}, η_N ∈ E^x_N, under P_{η_N} the finite-dimensional distributions of X_N(t) converge to the finite-dimensional distributions of the chain X(t).
The proof of this result is presented in Section 2, together with several sufficient conditions for (1.3) to hold.

Metastability.
We have referred to properties (H1) and (H2) as the metastable behavior of the Markov chain η_N(t) in the time-scale θ_N. However, it has been pointed out that in mathematical physics metastability means the convergence of the state of the process. The second result of this note fills the gap between these two concepts by establishing that properties (H1), (H2), together with conditions (M1), (M2) below, lead to the convergence of the state of the process to a convex combination of states supported on the wells E^x_N. The precise statement of this result requires some notation.
Denote by µ_N the unique stationary state of the chain ξ_N(t), and by µ^y_N, y ∈ S, the probability measure µ_N conditioned to E^y_N:

µ^y_N(η) = µ_N(η) / µ_N(E^y_N) for η ∈ E^y_N , and µ^y_N(η) = 0 otherwise.

Note that µ^y_N is defined on E_N and it is supported on E^y_N.
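For concreteness, conditioning a measure to a well is the elementary operation sketched below (an illustrative helper written for this note, not part of the article's formal development).

```python
def condition(mu, subset):
    """Condition the probability measure mu (a dict state -> mass) to the
    given subset: the conditioned measure equals mu(eta)/mu(subset) on the
    subset and 0 outside it, as in the definition of mu^y_N."""
    z = sum(mu[e] for e in subset)           # mu(subset)
    return {e: (mu[e] / z if e in subset else 0.0) for e in mu}
```

For example, conditioning {'a': 0.1, 'b': 0.3, 'c': 0.6} to {'a', 'b'} gives mass 0.25 to 'a', 0.75 to 'b' and 0 to 'c'.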
Reflected process. For x ∈ S, denote by {ξ^{R,x}_N(t) : t ≥ 0} the Markov chain ξ_N(t) reflected at E^x_N. This is the Markov chain obtained from ξ_N(t) by forbidding jumps from E^x_N to its complement (E^x_N)^c. This mechanism produces a new Markov chain whose state space is E^x_N, and which might be reducible. We assume that for each x ∈ S the reflected chain ξ^{R,x}_N(t) is irreducible and that µ^x_N is its unique stationary state. In the reversible case this latter assumption follows from the irreducibility. In the non-reversible case, if the Markov chain η_N(t) is a cycle chain (cf. [20,34]), it is easy to define the sets E^x_N so that the reflected chain on E^x_N is irreducible. Let (S^{R,x}_N(t) : t ≥ 0) be the semigroup of the Markov chain ξ^{R,x}_N(t).

Trace process. Similarly, we denote by ξ^{T,x}_N(t) the trace on E^x_N of the process ξ_N(t), and by (S^{T,x}_N(t) : t ≥ 0) the semigroup of the Markov chain ξ^{T,x}_N(t).

Mixing times. Denote by ‖µ − ν‖_TV the total variation distance between two probability measures defined on the same denumerable set Ω:

‖µ − ν‖_TV = Σ_{η∈Ω} [µ(η) − ν(η)]^+ ,

where x^+ = max{x, 0} denotes the positive part of x ∈ R. Hereafter, the set Ω will be either the set E_N or one of the wells E^x_N. Denote by T^{N,R,x}_mix, T^{N,T,x}_mix the (1/4)-mixing times of the reflected and trace processes, respectively:

T^{N,R,x}_mix = inf{ t ≥ 0 : max_{η∈E^x_N} ‖δ_η S^{R,x}_N(t) − µ^x_N‖_TV ≤ 1/4 } ,

with the analogous definition for T^{N,T,x}_mix, where δ_η stands for the Dirac measure concentrated on the configuration η.
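The total variation distance and the (1/4)-mixing time of a small chain can be computed numerically. The sketch below is illustrative only: it uses the formula ‖µ − ν‖_TV = Σ (µ(x) − ν(x))^+, and approximates the semigroup by a small Euler step of the generator; the helper names `tv_distance`, `mixing_time` and the step size `dt` are choices made here, not objects from the paper.

```python
import numpy as np

def tv_distance(mu, nu):
    """||mu - nu||_TV = sum_x (mu(x) - nu(x))^+ (half the L1 distance)."""
    return float(np.sum(np.maximum(np.asarray(mu) - np.asarray(nu), 0.0)))

def mixing_time(Q, mu, eps=0.25, dt=0.01, t_max=200.0):
    """Smallest t (on a dt-grid) with max_eta ||delta_eta exp(tQ) - mu||_TV <= eps,
    for a generator matrix Q (rows sum to zero) with stationary measure mu.
    The semigroup is approximated by iterating the step I + dt*Q, so dt must
    be small compared with the inverse of the holding rates."""
    n = Q.shape[0]
    P = np.eye(n) + dt * Q
    t, S = 0.0, np.eye(n)
    while t < t_max:
        S, t = S @ P, t + dt
        if max(tv_distance(S[i], mu) for i in range(n)) <= eps:
            return t
    return np.inf
```

For the two-state chain with both rates equal to 1, the distance from δ_0 at time t is e^{−2t}/2, so the (1/4)-mixing time is (log 2)/2 ≈ 0.347, which the scan recovers up to the grid resolution.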
Hitting times. For a subset A of E_N, denote by H_A, H^+_A the hitting time of A and the time of the first return to A:

H_A = inf{ t ≥ 0 : ξ_N(t) ∈ A } ,  H^+_A = inf{ t ≥ τ_1 : ξ_N(t) ∈ A } ,

where τ_1 represents the time of the first jump of the chain ξ_N(t): τ_1 = inf{ t ≥ 0 : ξ_N(t) ≠ ξ_N(0) }. Let (α_N : N ≥ 1), (β_N : N ≥ 1) be two sequences of positive numbers. The relation α_N ≪ β_N means that lim_{N→∞} α_N/β_N = 0.

In the next result, we assume that for each x ∈ S there exists a set B^x_N ⊂ E^x_N fulfilling the following conditions:

(M1) For every δ > 0,

lim_{N→∞} sup_{η∈E^x_N} P_η[ H_{B^x_N} > δ ] = 0 .

(M2) There exists a sequence of positive numbers (ε_N : N ≥ 1) such that

(1.9) lim_{N→∞} sup_{η∈B^x_N} P_η[ H_{(E^x_N)^c} ≤ 2ε_N ] = 0 ,

(1.10) lim_{N→∞} max_{η∈B^x_N} ‖ δ_η S^{R,x}_N(ε_N) − µ^x_N ‖_TV = 0 .

Condition (M1) requires that, once in E^x_N, the process reaches the set B^x_N quickly. Additionally, condition (M2) imposes that it takes longer to leave the set E^x_N, when starting from B^x_N, than it takes to mix in E^x_N. Slightly more precisely, condition (M2) requests the existence of a time-scale ε_N, longer than the mixing time of the reflected process and shorter than the exit time from the set E^x_N. Note, however, that in condition (1.10) the initial configuration belongs to the set B^x_N, while in the definition of the mixing time the initial configuration may be any element of the set E^x_N. In any case, condition (1.10) is in force if ε_N ≫ T^{N,R,y}_mix.

Assume that the chain is reversible. Fix y ∈ S, denote by p^{R,y}_t(ζ, ξ) the transition probabilities of the reflected process ξ^{R,y}_N(t), and fix η ∈ B^y_N. By definition,

‖ δ_η S^{R,y}_N(t) − µ^y_N ‖_TV = Σ_ζ µ^y_N(ζ) [ f_t(ζ) − 1 ]^+ ,

where f_t(ζ) = p^{R,y}_t(η, ζ)/µ^y_N(ζ) and t = ε_N. By the Schwarz inequality and a decomposition of f_t along the eigenfunctions of the generator of the reflected process (cf. equation (12.5) in [35]), the square of the previous expression is bounded by exp{−2λ_{R,y} t} ‖f_0‖²_{µ^y_N}, where λ_{R,y} represents the spectral gap of ξ^{R,y}_N(t) and ‖f_0‖_{µ^y_N} the norm of f_0 in L²(µ^y_N). Since ‖f_0‖²_{µ^y_N} = 1/µ^y_N(η), taking t = ε_N we conclude that

‖ δ_η S^{R,y}_N(ε_N) − µ^y_N ‖_TV ≤ (1/µ^y_N(η))^{1/2} e^{−λ_{R,y} ε_N} .
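The bound above can be checked numerically on a small reversible chain. In the sketch below (illustrative only, with a chain chosen here for simplicity), the generator is symmetric, so its stationary measure is uniform; the spectral gap is read off an eigendecomposition, and the total variation distance from every starting point is compared with µ(η)^{−1/2} e^{−λt}.

```python
import numpy as np

# Reversible birth-and-death generator on {0,1,2,3} with rates 1 in both
# directions; Q is symmetric, so the stationary measure mu is uniform.
n = 4
Q = np.zeros((n, n))
for i in range(n - 1):
    Q[i, i + 1] = Q[i + 1, i] = 1.0
Q -= np.diag(Q.sum(axis=1))
mu = np.full(n, 1.0 / n)

w, V = np.linalg.eigh(Q)          # real spectrum (Q symmetric)
gap = -np.sort(w)[-2]             # spectral gap lambda

def semigroup(t):
    """exp(tQ) via the eigendecomposition of the symmetric generator."""
    return V @ np.diag(np.exp(t * w)) @ V.T

# Check ||delta_eta S(t) - mu||_TV <= mu(eta)^{-1/2} exp(-lambda t).
for t in (0.5, 1.0, 2.0):
    S = semigroup(t)
    for eta in range(n):
        tv = 0.5 * np.abs(S[eta] - mu).sum()
        bound = np.exp(-gap * t) / np.sqrt(mu[eta])
        assert tv <= bound + 1e-12
```

For this chain the spectral gap equals 2 − √2, and the bound holds with room to spare at every starting configuration.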
Therefore, in the reversible case, condition (1.10) of (M2) is fulfilled provided

lim_{N→∞} max_{η∈B^y_N} (1/µ^y_N(η))^{1/2} e^{−λ_{R,y} ε_N} = 0 .

Proposition 1.2. Assume, besides (H1), (H2), (M1) and (M2), that condition (1.12) holds and that one of three further conditions (a), (b), (c) is in force, (a) requiring the existence of a constant 0 < C_0 < ∞ such that (1.14) holds for all y, z ∈ S and N ≥ 1. Then, for every t > 0, x ∈ S and every sequence {η_N : N ≥ 1}, η_N ∈ E^x_N,

lim_{N→∞} ‖ δ_{η_N} S_N(t) − Σ_{y∈S} P_x[X(t) = y] µ^y_N ‖_TV = 0 ,

where (S_N(t) : t ≥ 0) represents the semigroup of the Markov chain ξ_N(t).
The article is organized as follows. Propositions 1.1 and 1.2 are proved in Sections 2 and 3, respectively. In Section 4 we show that the assumptions of these propositions are in force for four different classes of dynamics. In the last section, we present a general bound, in terms of capacities (which can be evaluated by the Dirichlet and the Thomson principles), for the probability that the hitting time of a set is smaller than a given value. Throughout this article, c_0 and C_0 are finite positive constants, independent of N, whose values may change from line to line.

CONVERGENCE OF THE FINITE-DIMENSIONAL DISTRIBUTIONS
In this section, we prove Proposition 1.1, and we present some sufficient conditions for (1.3). We will use the shorthand T_N(t) for the time spent by the process ξ_N(t) in the union of the wells ∪_{1≤x≤n} E^x_N before time t. Likewise, we will denote the generalized inverse of T_N(t) by S_N(t). Note that condition (H2) can be restated in terms of T_N(t).

The proof of Proposition 1.1 is based on the next technical result, which provides an estimate for the distribution of the trace process X^T_N in terms of the distribution of the process X_N.

Lemma 2.1. Assume conditions (H1) and (H2). Then the estimate below holds for all N ≥ 1, δ > 0, y ∈ S, η ∈ E_N, and r > 3δ.

Proof. Fix N ≥ 1, δ > 0, y ∈ S, η ∈ E_N and r > 3δ. By definition of X^T_N, and since S_N(r − 3δ) ≥ r − 3δ, introduce the event

A_N(r, δ, y) = { X_N(s) = y for some r − 3δ ≤ s ≤ r − 2δ } .

Denote by H the first time the process X_N(s) hits the point y after r − 3δ. By the strong Markov property, the second term on the right-hand side of the penultimate equation can be estimated, and, recalling from (1.1) the definition of Ĕ^y_N, the resulting probability is bounded by a quantity J_N(y, δ). By condition (H1), J_N(y, δ) vanishes as N → ∞ and then δ → 0. To complete the proof of the lemma, it remains to define the remainder R_N accordingly. □

Denote by P_x, x ∈ S, the probability measure on D(R_+, S) induced by the Markov chain X(t) starting from x. Since P_x[X(t) ≠ X(t−)] = 0 for all t ≥ 0, the finite-dimensional distributions of X^T_N converge to the ones of X(t).

Proof of Proposition 1.1. We prove the result for one-dimensional distributions; the extension to higher order is immediate. Fix x, y ∈ S, r > 0, and a sequence {η_N : N ≥ 1}, η_N ∈ E^x_N. By assumption (H1), by Lemma 2.1, and by (1.3), the inequality in the penultimate formula must be an identity for each y ∈ S, which completes the proof of the proposition. □
In many examples, however, it is not true that the right-hand side vanishes, uniformly over configurations in E_N, as N → ∞ and then δ → 0. In condensing zero-range processes, or in random walks in a potential field, starting from certain configurations in a valley E^x_N, in a time interval [0, δ] the chain ξ_N(s) visits the set ∆_N many times, and the right-hand side of the previous inequality, for such configurations η, is close to 1.
Proof. Fix x ∈ S, η ∈ E^x_N and s > 0. Multiplying and dividing the probability by the appropriate factor yields the bound. In particular, condition (1.3) follows from the assumption of the lemma.
The next condition is satisfied by random walks in a potential field [11,31,34,33], illustrated by Example 4.3. It is instructive to think of the sets B^x_N ⊂ E^x_N below as the deep part of the well E^x_N. The first condition requires the process to reach the set B^x_N quickly, while the second one imposes that, starting from B^x_N, it does not attain the set ∆_N in a short time interval.
By the strong Markov property, and since s belongs to the interval [2δ, 3δ], the first term on the right-hand side is bounded by a quantity that vanishes, which completes the proof of the lemma.
In Lemma 2.7 below we present some conditions which imply this estimate for all x ∈ S. Recall from (1.4), (1.5) that µ^x_N represents the stationary measure µ_N conditioned to E^x_N, and S^{R,x}_N(t) the semigroup of the reflected process on E^x_N. The third set of conditions which yield (1.3) relies on the next estimate.

Lemma 2.5. For every 0 < T < δ < s, x ∈ S, and configuration η ∈ E^x_N, the estimate below holds.

Proof. Up to time T, we may couple the chain ξ_N(s) with the chain reflected at the boundary of E^x_N, which has been denoted by ξ^{R,x}_N(s). By the Markov property at time T, and replacing ξ_N(s) by ξ^{R,x}_N(s), the second term of the previous equation can be rewritten. By definition of the total variation distance, and since, by assumption, the stationary measure of the reflected process is given by µ^x_N, this sum is bounded above; the second term of the bound is handled similarly, which completes the proof of the lemma.

Corollary 2.6. Assume that, for each x ∈ S, the four conditions above hold. Then, condition (1.3) is in force.
Proof. The assertion of the corollary is a straightforward consequence of the assumptions and of Lemmata 2.4 and 2.5, the latter applied with η ∈ B^x_N.
Denote by λ_N(η), η ∈ E_N, the holding rates of the Markov chain ξ_N(t). For two disjoint subsets A, B of E_N, denote by cap_N(A, B) the capacity between A and B:

cap_N(A, B) = Σ_{η∈A} µ_N(η) λ_N(η) P_η[ H_B < H^+_A ] .

Similarly, for two disjoint subsets A, B of E^x_N, we represent by cap_{N,x}(A, B) the capacity between A and B for the trace process ξ^{T,x}_N(t):

cap_{N,x}(A, B) = Σ_{η∈A} µ^x_N(η) λ^{T,x}_N(η) P_η[ H_B < H^+_A ] ,

where λ^{T,x}_N(η) stands for the holding rates of the trace process ξ^{T,x}_N(t), and the hitting times refer to this trace process.
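For a small reversible chain, the capacity can be computed both from its definition and from the Dirichlet principle, and the two values agree. The sketch below is an illustrative computation: the chain is a nearest-neighbour walk chosen here for simplicity; it solves for the equilibrium potential h (equal to 1 on A, 0 on B, harmonic in between), evaluates cap_N(A, B) = Σ_{η∈A} µ(η) λ(η) P_η[H_B < H^+_A], and compares it with the Dirichlet form of h.

```python
import numpy as np

# Simple random walk on the path {0,...,4}: rates R, holding rates lam,
# uniform stationary measure mu.
n = 5
R = np.zeros((n, n))
for i in range(n - 1):
    R[i, i + 1] = R[i + 1, i] = 1.0
lam = R.sum(axis=1)                  # holding rates lambda_N(eta)
mu = np.full(n, 1.0 / n)
A, B = [0], [4]

# Equilibrium potential: h = 1 on A, 0 on B, harmonic in between.
h = np.zeros(n)
h[A] = 1.0
inner = [i for i in range(n) if i not in A + B]
P = R / lam[:, None]                 # embedded jump chain
M = np.eye(len(inner)) - P[np.ix_(inner, inner)]
b = P[np.ix_(inner, A)].sum(axis=1)
h[inner] = np.linalg.solve(M, b)

# Definition: P_eta[H_B < H_A^+] = sum_xi P(eta,xi) (1 - h(xi)).
cap = sum(mu[a] * lam[a] * sum(P[a, j] * (1 - h[j]) for j in range(n))
          for a in A)

# Dirichlet principle: cap(A,B) = (1/2) sum mu R (h(eta)-h(xi))^2 at h.
dirichlet = 0.5 * sum(mu[i] * R[i, j] * (h[i] - h[j]) ** 2
                      for i in range(n) for j in range(n))
```

Here h is linear, h(i) = (4 − i)/4, and both computations give cap({0}, {4}) = 1/20.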
The following lemma offers sufficient conditions, in terms of mixing-time or capacity estimates, for (2.5) to hold for any δ > 0.
The first term on the right-hand side of the preceding equation vanishes, as N → ∞, by (2.1). To prove the assertion of the lemma under assumption (2.6), note that the required estimate follows from Proposition A.2 in [5], where the last equality follows from identity (A.10) in [5].
Assume, now, that (2.7) is in force. The following argument is inspired by Theorem 6 in [1] and Theorem 1.1 in [38]. We include it here for completeness.
Recall from (1.5) that we denote by S^{T,x}_N(t) the semigroup of the trace process ξ^{T,x}_N(t). Since this estimate is uniform in η, we may iterate it, using the Markov property, to get (2.10). This expression vanishes, as N → ∞, if (2.7) is satisfied and if we choose ϑ_N according to (2.9). Finally, if the process is reversible, by Theorem 5 in [1] there exists a finite universal constant C_0 for which the corresponding bound holds. Hence, (2.5) follows from (2.8) by Markov's inequality.
The previous lemma evidences the importance of an upper bound for the mixing time of the trace process. This is the content of Remark 2.8 below.
Denote by R_N(η, ξ), η, ξ ∈ E_N, the jump rates of the Markov chain ξ_N(t), and by R^{T,x}_N(η′, ξ′), η′, ξ′ ∈ E^x_N, the jump rates of the trace process ξ^{T,x}_N(t). Assume that the Markov chain ξ_N(t) is reversible, and denote by D_N, D_{N,T,x} the Dirichlet forms of the processes ξ_N(t), ξ^{T,x}_N(t), respectively:

D_N(f) = (1/2) Σ_{η,ξ∈E_N} µ_N(η) R_N(η, ξ) [f(ξ) − f(η)]² ,

with the analogous expression for D_{N,T,x}(f), in which µ^x_N, R^{T,x}_N appear and the sum is carried over E^x_N. By replacing, in the first line of the previous formula, the measure µ_N by the conditioned measure µ^x_N, and by restricting the sum to configurations η, ξ ∈ E^x_N, we obtain the Dirichlet form of the reflected process, denoted by D_{N,R,x}(f).
Denote by T^{N,T,x}_rel, T^{N,R,x}_rel the relaxation times of the trace process ξ^{T,x}_N(t) and of the reflected process ξ^{R,x}_N(t), respectively:

T^{N,T,x}_rel = sup_g ‖g‖²_{µ^x_N} / D_{N,T,x}(g) ,  T^{N,R,x}_rel = sup_g ‖g‖²_{µ^x_N} / D_{N,R,x}(g) ,

where the supremum is carried over all functions g : E^x_N → R with mean zero with respect to µ^x_N, and ‖g‖_{µ^x_N} represents the L²(µ^x_N) norm of g: ‖g‖²_{µ^x_N} = Σ_{η∈E^x_N} g(η)² µ^x_N(η). Hence, the Dirichlet form corresponding to the trace on E^x_N dominates the Dirichlet form corresponding to the reflected process on E^x_N and, consequently, the relaxation time T^{N,T,x}_rel of the former is smaller than the relaxation time T^{N,R,x}_rel of the latter. Then, by the proof of [35, Theorem 12.3],

T^{N,T,x}_mix ≤ T^{N,T,x}_rel log( 4 / min_{η∈E^x_N} µ^x_N(η) ) .

The right-hand side of the preceding inequality, which is often used as an upper bound for the mixing time T^{N,R,x}_mix of the chain ξ_N(·) restricted to the well E^x_N, is thus also a bound for the mixing time of the trace process.

Remark 2.9. In many interesting cases, e.g. random walks in a potential field [11,31,34,33] or condensing zero-range processes [4,28], the set B^x_N may be taken as a singleton.
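The inequality relating the mixing time to the relaxation time can be verified numerically for a reversible chain. The sketch below is illustrative: the weights and rates are chosen here, the generator is symmetrized by diag(√µ) to extract the spectral gap, and the (1/4)-mixing time found by a grid scan is compared with T_rel log(4/min µ).

```python
import numpy as np

# Reversible birth-and-death chain on {0,...,4} with stationary measure
# proportional to the weights w (rates fixed by detailed balance).
w = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
mu = w / w.sum()
n = len(w)
Q = np.zeros((n, n))
for i in range(n - 1):
    Q[i, i + 1] = 1.0                 # up-rate 1
    Q[i + 1, i] = w[i] / w[i + 1]     # mu_i q_{i,i+1} = mu_{i+1} q_{i+1,i}
Q -= np.diag(Q.sum(axis=1))

# Symmetrize D Q D^{-1}, D = diag(sqrt(mu)): real spectrum for reversible Q.
D = np.sqrt(mu)
Sym = (D[:, None] * Q) / D[None, :]
vals, vecs = np.linalg.eigh(Sym)
gap = -np.sort(vals)[-2]
t_rel = 1.0 / gap

def worst_tv(t):
    """max over starting points of ||delta_eta exp(tQ) - mu||_TV."""
    expSym = (vecs * np.exp(t * vals)) @ vecs.T
    S = (expSym / D[:, None]) * D[None, :]     # undo the symmetrization
    return max(0.5 * np.abs(S[i] - mu).sum() for i in range(n))

t_mix = next(t for t in np.arange(0.0, 50.0, 0.01) if worst_tv(t) <= 0.25)
upper = t_rel * np.log(4.0 / mu.min())
```

The scan always terminates here, and t_mix stays below the bound t_rel log(4/min µ), consistent with the inequality quoted from [35].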

CONVERGENCE OF THE STATE
In this section, we prove Proposition 1.2. From now on, we assume that the number of valleys is fixed and that the sequence of Markov chains fulfills conditions (H1), (H2), (M1) and (M2).

Proof. It is enough to show that all assumptions of Corollary 2.6 are fulfilled. The first one follows from (2.8) and assumption (M1). The second one is assumption (1.12). Finally, the third and fourth follow from condition (M2).
The next result is a straightforward consequence of the previous lemma and of Proposition 1.1.

Proof of Proposition 1.2. The proof is divided into several steps. At each stage we write the main expression as the sum of a simpler one and a negligible remainder.
By the strong Markov property, using the notation ξ N B = ξ N (H B y N ), the first term in (3.2) is equal to

Let us now define the required quantities, and recall the definition of the time-scale ε_N introduced in condition (M2). Rewrite the previous sum accordingly; by (1.9), the remainder in this decomposition vanishes as N → ∞. By the Markov property, on the set {H_{∆_N} > ε_N} we may replace the chain ξ_N(t) by the chain reflected at E^y_N, denoted by ξ^{R,y}_N(t); the resulting sum can be rewritten with a remainder which, by (1.9) and an argument similar to the one following (3.3), is negligible.

Since, for every η ∈ B^y_N and ξ ∈ E_N, the first term of (3.4) can be rewritten up to a remainder, the first term in (3.5) can be written accordingly. The probability inside the expectation is bounded above, where Ĕ^y_N has been introduced in (1.1). On the other hand, the second term is controlled by assumption (H1) and (1.13).

Lemma 3.3 below shows that the first term in (3.6) can be rewritten up to a term whose lim sup vanishes as N → ∞ and then δ → 0. We may rewrite the sum in (3.7) with a remainder R_N. By (2.8) and condition (M1), for every 0 < δ < t, in view of the definition of p(η), the first term in (3.8) can be rewritten. Clearly, Σ_{ξ∈E_N} |R_N(t, δ, ξ)| is bounded by an expression in which the supremum is carried over real numbers r, s in [0, t], and this bound vanishes by assumption (H1) and Corollary 3.2.

Up to this point we proved the claimed decomposition, with a remainder whose lim sup vanishes as N → ∞ and then δ → 0. Therefore, in view of (3.9), (3.10) and Corollary 3.2, the proof of the proposition is complete.

Proof. For all ξ ∈ E^y_N, decompose as in (3.11). By (1.12), the first term of this sum vanishes as N → ∞. It remains to show that the second term also vanishes under assumption (a), (b) or (c).
Assume first that (a) holds. Then, by reversibility, the last term in (3.11) can be rewritten, and this expression vanishes, as N → ∞, by assumption (H1). This completes the proof of the lemma under hypothesis (a).

Assume now that condition (b) is in force. In this case, the last term in (3.11) is bounded by an expression which, again by assumption (H1), vanishes as N → ∞. This completes the proof of the lemma under hypothesis (b).

Assume, finally, that condition (c) is fulfilled. Note that, by Markov's inequality, the last expression vanishes as N → ∞ by (1.12). Define the stopping time σ_N appropriately. By repeating the arguments that led to (2.8) and (2.10), we obtain the required bound. This concludes the argument.

EXAMPLES
We present in this section four examples to illustrate the conditions introduced in the previous sections. The first example belongs to the class of models in which the metastable sets are singletons. In the second and third examples the metastable sets are not singletons, but the process visits all configurations of a metastable set before hitting a new metastable set; these processes are said to visit points. In the second example the assumptions of Lemma 2.3 are in force, but not in the third. For this latter class, we show that the conditions of Corollary 2.6 are fulfilled for an appropriate singleton set B^x_N. In the last example, the process does not visit all configurations of a metastable set before reaching a new metastable set. In these models the entropy plays an important role in the metastable behavior of the system. For this last model, we prove that the hypotheses of Lemma 2.4 hold.
The purpose of this section is not to show that the conditions of Lemmata 2.3, 2.4 or Corollary 2.6 are in force in great generality. Actually, in some cases, this requires lengthy arguments and a detailed analysis of the dynamics. We just want to convince the reader that this is possible: in other words, that one can deduce, from conditions (H1), (H2) and some additional reasonable assumptions, the convergence of the finite-dimensional distributions and the convergence of the state of the process.
In the arguments below we use the Dirichlet and the Thomson principles for the capacities between two disjoint subsets of E_N. We do not recall these results here and refer instead to [10, Section 7.3].

Example 4.1 (Inclusion process [23,8]). Denote by T_L, L ≥ 1, the discrete, one-dimensional torus with L points. The inclusion process, as well as the condensing zero-range process presented below, describes the evolution of particles on T_L. Denote by η_x, x ∈ T_L, the total number of particles at x, and by E_N the set of configurations on T_L with N particles:

(4.1) E_N = { η ∈ N^{T_L} : Σ_{x∈T_L} η_x = N } .

Let σ^{x,y}η be the configuration obtained from η by moving a particle from x to y:

(4.2) (σ^{x,y}η)_z = η_x − 1 if z = x , η_y + 1 if z = y , and η_z otherwise,

and set r(−1) = r(1) = 1, r(x) = 0 otherwise.
The inclusion process is clearly irreducible, and it is reversible with respect to a probability measure µ_N of product form. In this model the metastable sets E^x_N are singletons. This phenomenon occurs in many other models: for instance, in spin systems evolving in large, but fixed, volumes as the temperature vanishes (cf. the Ising model with an external field under the Glauber dynamics [36,40,3] and the Blume–Capel model with zero chemical potential and a small magnetic field [16,30,17]). It also occurs for random walks evolving among random traps [25,24].
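A minimal numerical sketch of the inclusion process follows. The jump rate η_x(d + η_y) for a nearest-neighbour move from x to y is the standard inclusion-process rate and is assumed here, since the display defining the generator is not reproduced above; with d small (playing the role of d_N), the stationary measure, computed as the null vector of the generator, indeed concentrates on the condensed configurations.

```python
import itertools
import numpy as np

def inclusion_generator(L, N, d):
    """Generator of the inclusion process on the torus T_L with N particles:
    a particle jumps from x to a nearest neighbour y at rate eta_x*(d + eta_y)
    (the standard inclusion-process rate, assumed here; d plays the role
    of the diffusion parameter d_N)."""
    states = [eta for eta in itertools.product(range(N + 1), repeat=L)
              if sum(eta) == N]
    idx = {eta: i for i, eta in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for eta in states:
        for x in range(L):
            if eta[x] == 0:
                continue
            for y in ((x + 1) % L, (x - 1) % L):
                xi = list(eta)
                xi[x] -= 1
                xi[y] += 1
                Q[idx[eta], idx[tuple(xi)]] += eta[x] * (d + eta[y])
    Q -= np.diag(Q.sum(axis=1))
    return states, Q

states, Q = inclusion_generator(L=3, N=4, d=1e-4)
vals, vecs = np.linalg.eig(Q.T)              # left null vector = stationary
v = np.real(vecs[:, np.argmin(np.abs(vals))])
mu = v / v.sum()
# Mass of the condensed configurations (all particles on one site).
condensed = sum(mu[i] for i, eta in enumerate(states) if max(eta) == 4)
```

For d = 10^{-4} essentially all stationary mass sits on the three configurations with all particles at one site, in line with the statement that the metastable sets are singletons.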
We claim that all hypotheses of Propositions 1.1 and 1.2 are in force. Actually, with the exception of (H1) and (H2), all assumptions hold trivially because the metastable sets are singletons. Set B^x_N = E^x_N = {ξ^{x,N}}.

A. Conditions (H1) and (H2). We already mentioned that assumptions (H1) and (H2) have been proved in [8]. Since the process has been speeded-up by θ_N = 1/d_N, τ_1 is an exponential random variable of rate 2N. It is thus enough to choose a sequence ε_N such that ε_N ≪ 1/N.

E. Condition (1.10) of (M2). This condition is empty because E^x_N = {ξ^{x,N}}; it holds for any sequence ε_N > 0.

F. Condition (1.12) of (M2). This is a consequence of [8, Proposition 2.1], which asserts that µ_N(ξ^{x,N}) → 1/L.

G. Conditions (a), (b) or (c). Assumption (a) of Proposition 1.2 is in force as the process is reversible.

Example 4.2 (Condensing zero-range process [4,28]). Let E_N, N ≥ 1, be the set given by (4.1). Fix α > 1, and define g : N → R_+ as g(0) = 0, g(1) = 1 and g(n) = a(n)/a(n − 1) for n ≥ 2.
Fix 1/2 ≤ p ≤ 1, and denote by p(·) the finite-range transition probability given by p(1) = p, p(−1) = 1 − p, p(x) = 0 otherwise. Recall from (4.2) the definition of the configuration σ^{x,y}η. The nearest-neighbor zero-range process associated to the jump rates {g(k) : k ≥ 0} and to the transition probability p(·) is the continuous-time, E_N-valued Markov process {η_N(t) : t ≥ 0} whose generator L_N acts on functions f : E_N → R as

(L_N f)(η) = Σ_{x,y∈T_L} g(η_x) p(y − x) [ f(σ^{x,y}η) − f(η) ] .

The Markov process corresponding to the generator L_N is irreducible. The invariant measure, denoted by µ_N, is of product form and is normalized by a constant Z_N.
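The reversibility of the zero-range dynamics with respect to the product measure can be checked directly on the rates. In the sketch below, the choice a(n) = n^α is an assumption made here (the text above leaves a(n) implicit), as is the product weight Π_x 1/a(η_x); detailed balance µ(η) g(η_x) = µ(σ^{x,y}η) g(η_y + 1) is verified numerically for the symmetric case p(1) = p(−1) = 1/2.

```python
import numpy as np

ALPHA = 2.5  # any alpha > 1; value chosen here for illustration

def a(n):
    # Assumed form a(n) = n^alpha (with the convention a(0) = 1).
    return float(n) ** ALPHA if n > 0 else 1.0

def g(n):
    """Jump rate: g(0) = 0, g(1) = 1, g(n) = a(n)/a(n-1) for n >= 2."""
    return 0.0 if n == 0 else a(n) / a(n - 1)

def weight(eta):
    """Unnormalized product weight prod_x 1/a(eta_x) (assumed stationary
    form for the symmetric dynamics; the normalization Z_N cancels in
    the detailed-balance identity)."""
    return np.prod([1.0 / a(k) for k in eta])

# Detailed balance across the move x -> y on a three-site configuration:
# mu(eta) g(eta_x) = mu(sigma^{x,y} eta) g(eta_y + 1).
eta = (3, 1, 2)
x, y = 0, 1
sigma = (eta[0] - 1, eta[1] + 1, eta[2])
lhs = weight(eta) * g(eta[x])
rhs = weight(sigma) * g(sigma[y])
assert abs(lhs - rhs) < 1e-12
```

The identity holds because the rates telescope: Π_{k=1}^n g(k) = a(n), so the product weight 1/a(·) compensates exactly one jump.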
Fix a sequence {ℓ_N : N ≥ 1} such that 1 ≪ ℓ_N ≪ N, and, following equation (3.2) in [4], let E^x_N = {η ∈ E_N : η_x ≥ N − ℓ_N} for each x ∈ T_L. The condensing zero-range process is an example of a process which visits points, in the sense that, starting from a well E^x_N, the dynamics visits all configurations of E^x_N before reaching another well.
Other examples of metastable dynamics which visit points are random walks in a potential field [14,11,31,34].
We show below that all hypotheses of Propositions 1.1 and 1.2 are in force. In certain cases we impose further assumptions on the dynamics, e.g. that it is reversible or that |S| = 2, to avoid lengthy arguments. The main tool in the proofs is the fact that the process visits points. Recall from (2.4) that we denote by cap_N(A, B) the capacity between two disjoint subsets A and B of E_N. Since ξ_N(t) is the process η_N(t) speeded-up by θ_N, by [4, Theorem 2.2], for any disjoint subsets A, B of S, the corresponding capacity converges, after rescaling, to C(A, B), the capacity between A and B for the random walk on S with transition probabilities p(y − x), x, y ∈ S.
A. Conditions (H1) and (H2). Assumptions (H1) and (H2) have been proved in [4] in the reversible case and in [28] in the totally asymmetric case, p = 1. We assume from now on that the process is reversible: p(1) = p(−1) = 1/2. B. Condition (1.3). We prove that the assumptions of Lemma 2.4 are in force for B x N = {ξ x,N }, where ξ x,N represents the configurations in which all particles are placed at site x.
Fix x ∈ S and η ∈ E^x_N. By the Markov inequality and [2, Proposition 6.10], by (H1) and page 806 in [4], and therefore by (4.3), the first assumption of Lemma 2.4 holds for every δ > 0; the second one holds for every s > 0.

D. Condition (1.9) of (M2). Since the exterior boundary of E^x_N is contained in a set of small measure, for some finite constant C_0, condition (1.9) of (M2) is fulfilled provided we choose ε_N θ_N ≪ ℓ^α_N. Indeed, by Corollary 5.4 and by monotonicity of capacities, and since the holding rates λ_N(η) are uniformly bounded by C_0 θ_N, denoting by ∂E^x_N the interior boundary of the set E^x_N, the previous sum is bounded by a constant times µ_N(∂E^x_N). An explicit computation shows that the measure of ∂E^x_N is bounded by ℓ^{−α}_N; the proof of this assertion is similar to the one of [4, Lemma 3.1] and is omitted.
Together with (4.6) and [4, Proposition 2.1], this gives (4.5). (In the case |S| = 2 it is possible to compute cap_N(ξ^{x,N}, ∆_N) exactly, and one finds that it is of order θ_N ℓ^{−(1+α)}_N; a factor 1/ℓ_N was lost in the first estimate of the preceding display.)

E. Condition (1.10) of (M2). The proof relies on an estimate of the spectral gap. We prove this condition in the case of two sites; the general case can be handled using the martingale approach developed by Lu and Yau [26, Appendix 2].
Assume that |S| = 2, and denote by λ_{R,1} the spectral gap of the process ξ_N(t) reflected at E^1_N = {0, . . . , ℓ_N}. We claim that (4.7) holds. On two sites, the zero-range process is a birth and death process, and the reflected process on E^1_N is the continuous-time Markov chain whose generator is given by the zero-range rates for all ζ ≠ N − ℓ_N, with g_{R,N}(N − ℓ_N) = 0 due to the reflection at E^1_N. Denote by µ^1_N the stationary measure µ_N conditioned to E^1_N. In order to prove (4.7), we have to show that there exists a finite constant C_0 such that, for all N ≥ 1 and all functions f : {0, . . . , ℓ_N} → R, the bound (4.8) holds, where ⟨f, g⟩_{µ^1_N} represents the scalar product in L²(µ^1_N).
Fix a function f : {0, . . . , ℓ_N} → R. By the Schwarz inequality, the sum over ξ′ is bounded by C_0 η^{1+α}. Hence, since µ^1_N(η) ≤ C_0 η^{−α}, changing the order of summations, the previous expression is seen to be less than or equal to a constant multiple of the Dirichlet form of the reflected process, because g is bounded below by a positive constant and the process is speeded-up by θ_N. This proves claim (4.8), and therefore (4.7).

G. Conditions (a), (b) or (c). Assumption (b) of Proposition 1.2 is in force since the corresponding bound holds for all x, y ∈ S.

Example 4.3 (Random walk in a potential field). In this example, the sets B^x_N are still reduced to singletons, B^x_N = {ξ^{x,N}}, but µ_N(ξ^{x,N}) → 0. To simplify the discussion as much as possible, we assume that the process is reversible and that the potential has two wells of the same height, but the arguments apply to the more general situations considered in [11,31,34].
Let Ξ be an open, bounded and connected subset of R^d with a smooth boundary ∂Ξ. Fix a smooth function F : Ξ ∪ ∂Ξ → R with three critical points: two local minima, denoted by m_1, m_2, at which F takes the same value, and a saddle point σ. Denote by Ξ_N the discretization of Ξ: Ξ_N = Ξ ∩ (N^{−1}Z)^d, N ≥ 1. Let µ_N be the probability measure on Ξ_N defined by

µ_N(η) = (1/Z_N) exp{−N F(η)} , η ∈ Ξ_N ,

where Z_N is the partition function Z_N = Σ_{η∈Ξ_N} exp{−N F(η)}. By equation (2.3) in [31], the partition function can be estimated in terms of the Hessian of F at the minima, where Hess F(x) represents the Hessian of F calculated at x and det Hess F(x) its determinant. Let {η_N(t) : t ≥ 0} be the continuous-time Markov chain on Ξ_N whose generator L_N is given by (4.11), where ‖·‖ represents the Euclidean norm of R^d.
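Since the display (4.11) defining the generator is not reproduced above, the sketch below uses Metropolis rates exp{−N (F(ξ) − F(η))^+} between nearest lattice neighbours as a stand-in; this choice, together with the toy double-well potential F, is an assumption made here only to illustrate that such rates are reversible with respect to µ_N ∝ e^{−N F}.

```python
import numpy as np

N = 50
# Toy double-well potential on [0,1]: two minima of the same height at the
# endpoints and a barrier in the middle (a stand-in for the paper's F).
F = lambda u: (1.0 - (2.0 * u - 1.0) ** 2) ** 2

grid = np.arange(N + 1) / N
mu = np.exp(-N * F(grid))
mu /= mu.sum()                      # mu_N(eta) = exp(-N F(eta)) / Z_N

def rate(i, j):
    """Metropolis rate exp(-N (F(j/N) - F(i/N))^+) between nearest
    neighbours (an assumed rate choice, reversible w.r.t. mu_N)."""
    if abs(i - j) != 1:
        return 0.0
    return np.exp(-N * max(F(grid[j]) - F(grid[i]), 0.0))

# Detailed balance: mu(i) r(i,j) == mu(j) r(j,i) on every edge, since both
# sides equal exp(-N max{F(i/N), F(j/N)}) / Z_N.
for i in range(N):
    lhs = mu[i] * rate(i, i + 1)
    rhs = mu[i + 1] * rate(i + 1, i)
    assert abs(lhs - rhs) <= 1e-9 * max(lhs, rhs)
```

The measure mu visibly concentrates, at rate e^{−N}, on the two wells, while the saddle in the middle carries exponentially small mass.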
Recall that m_i, i = 1, 2, represent the two local minima of F in Ξ, and σ the saddle point. Fix κ > 0 and denote by V_i, i = 1, 2, two balls of radius κ centered at the local minima. Assume that κ is small enough for sup_{x∈V_i} F(x) < H. Denote by E^i_N the discretization of the sets V_i. It has been proved in [31,34] that the process X^T_N(t) fulfills conditions (H1) and (H2). We claim that the assumptions of Propositions 1.1 and 1.2 are in force.
We prove condition (1.3) through Corollary 2.6, with B^i_N = {ξ^{i,N}}, where ξ^{i,N} is a point of Ξ_N which approximates the local minimum m_i.

A. First condition of Corollary 2.6. Fix η ∈ E^i_N. By the Markov inequality, it is enough to prove that the relevant expectation, divided by δ, vanishes. By [2, Proposition 6.10], the expectation is bounded by 1/cap_N(η, B^i_N). Consider a path (η_0 = η, η_1, . . . , η_M = ξ^{i,N}) from η to ξ^{i,N}. The factor θ_N appears because the process has been speeded-up. This expression vanishes as N → ∞ in view of (4.10), the definition of θ_N, and the values of F along the path. Since it is clear that µ_N(∆_N)/µ_N(E^i_N) → 0, we turn to the last two conditions of Corollary 2.6, which correspond to conditions (1.9) and (1.10) of (M2).

B. Condition (1.9). Let h_i = inf_{x∈∂V_i} F(x). We claim that this condition is in force provided ε_N is chosen appropriately. Since, under P_{ξ^{i,N}}, H_{(E^i_N)^c} = H_{∆_N}, we need to estimate P_{ξ^{i,N}}[H_{∆_N} ≤ 2ε_N]. By Corollary 5.4, this probability is bounded in terms of capacities, where ∂_−E^i_N stands for the inner boundary of E^i_N. By definition of E^i_N, the right-hand side of the penultimate formula is bounded above by C_0 ε_N θ_N N^d exp{−N[h_i − h]}, which proves the claim.
C. Condition (1.10). We claim that this condition is fulfilled provided $\varepsilon_N$ satisfies (4.13) for some $b > 0$.
We first estimate the spectral gap of the reflected process $\xi^{R,i}_N(t)$, denoted by $\lambda_{R,i}$. We claim that $\lambda_{R,i} \ge c_0\, \theta_N N^{-(d+1)}$. To prove this assertion, we have to show that for all $N \ge 1$ and all functions $f : E^i_N \to \mathbb{R}$,

$$\langle f, (-L^{R,i}_N) f \rangle_{\mu^i_N} \;\ge\; c_0\, \theta_N N^{-(d+1)}\, \mathrm{Var}_{\mu^i_N}(f), \qquad (4.14)$$

where $\langle f, g \rangle_{\mu^i_N}$ represents the scalar product in $L^2(\mu^i_N)$. For each $\eta \in E^i_N$, denote by $\gamma(\eta) = (\eta_0 = \eta, \dots, \eta_M = \xi_{i,N})$ a discrete version of the path from $\eta$ to $\xi_{i,N}$ given by $\dot{x}(t) = -(\nabla F)(x(t))$. This means that $\|\eta_{j+1} - \eta_j\| = 1/N$, $M \le C_0 N$, and $\eta_j$ is the closest point of the lattice $\Xi_N$ to $x(t_j)$ for some increasing sequence of times $\{t_j\}_{0 \le j \le M}$. Clearly, $f(\eta) - f(\xi_{i,N}) = \sum_{j=0}^{M-1} [f(\eta_{j+1}) - f(\eta_j)]$. In particular, since $M \le C_0 N$, by the Schwarz inequality, $[f(\eta) - f(\xi_{i,N})]^2 \le C_0 N \sum_{j=0}^{M-1} [f(\eta_{j+1}) - f(\eta_j)]^2$, where the last inequality follows from (4.15). Fix an edge $(\zeta, \zeta')$ and consider all configurations $\eta \in E^i_N$ whose path $\gamma(\eta)$ contains this pair (that is, $(\zeta, \zeta') = (\eta_j, \eta_{j+1})$ for some $0 \le j < M$). Of course, there are at most $|E^i_N| \le C_0 N^d$ such configurations. Hence, changing the order of summation, the previous sum is seen to be bounded above by a constant multiple of $N^{d+1}$ times the double sum over edges. This proves claim (4.14) since the double sum is equal to $(2/\theta_N)\, \langle f, (-L^{R,i}_N) f \rangle_{\mu^i_N}$.

We turn to the proof of condition (1.10). Fix a sequence $\varepsilon_N$ satisfying (4.13) for some $b > 0$. By (4.10), $\mu_N(\xi_{i,N}) \ge c_0 N^{-d/2}$. Hence, by (4.14), condition (1.10) follows.

In the example of [39, 6], the metastable behavior is not due to an energy landscape but to the presence of bottlenecks. After attaining a well, the system remains there a time long enough to relax inside the well before it hits a point from which it can jump to another well. In this example, to fulfill condition (M1) the set $B^x_N$ cannot be taken as a singleton.
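The argument above is an instance of the canonical-path (Poincaré) method: route every configuration to $\xi_{i,N}$, apply Cauchy–Schwarz along each path, and reorder the sum edge by edge. A small numerical sketch for a one-dimensional birth-death chain (the potential and all parameter values are illustrative) compares the resulting lower bound with the exact spectral gap:

```python
import math
import numpy as np

# Reversible birth-death chain on {0, ..., n} with pi ∝ exp(-beta * V) (illustrative).
n, beta = 30, 2.0
V = [(k / n - 0.3) ** 2 for k in range(n + 1)]
weights = np.exp(-beta * np.array(V))
pi = weights / weights.sum()

def rate(i, j):
    # Metropolis rates, reversible with respect to pi.
    return math.exp(-beta * max(V[j] - V[i], 0.0))

# Build the generator L.
L = np.zeros((n + 1, n + 1))
for i in range(n + 1):
    for j in (i - 1, i + 1):
        if 0 <= j <= n:
            L[i, j] = rate(i, j)
    L[i, i] = -L[i].sum()

# Exact spectral gap: second smallest eigenvalue of -L symmetrized in L^2(pi).
s = np.sqrt(pi)
A = (s[:, None] / s[None, :]) * (-L)
gap = np.sort(np.linalg.eigvalsh((A + A.T) / 2.0))[1]

# Canonical-path bound: route each x to 0 along the edges (j, j+1), j < x.  Then
#   Var(f) <= sum_j (df_j)^2 * sum_{x > j} x pi(x),   D(f) = sum_j Q_j (df_j)^2,
# so gap >= 1 / max_j [ sum_{x > j} x pi(x) / Q_j ].
Q = [pi[j] * rate(j, j + 1) for j in range(n)]
C = max(sum(x * pi[x] for x in range(j + 1, n + 1)) / Q[j] for j in range(n))
lower = 1.0 / C
print(lower, gap)
```

For the speeded-up chain of this section, each conductance carries the extra factor $\theta_N$, which is how $\theta_N$ enters the bound $\lambda_{R,i} \ge c_0\, \theta_N N^{-(d+1)}$.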
In many other models the entropy plays an important role in the metastable behavior. In the majority of them, the time-scale in which the metastable behavior is observed cannot be computed explicitly and is given in terms of the spectral gap or the expectation of hitting times. This is the case for polymers in the depinned phase [13, 12, 27], or for the evolution of a droplet in the Ising model with Kawasaki dynamics [7, 22].
We consider below a random walk on a graph $E_N$, illustrated in Figure 1 in the two-dimensional case. For $N \ge 1$, $d \ge 2$, let $I_N = \{0, \dots, N\}$. Let $\mu_N$ be the probability measure on $E_N$, where $Z_N$ is the normalizing factor. The measure $\mu_N$ is the unique stationary (in fact, reversible) state. Denote by $\theta_N$ the inverse of the spectral gap of this chain. By [39, Example 3.2.5], there exist constants $0 < c(d) < C(d) < \infty$ such that the corresponding bounds on $\theta_N$ hold for all $N \ge 1$. Recall that we denote by $\mathcal{C}$ the four corners of $E_N$. Let $\Delta_N$ be the set of points at graph distance less than $\ell_N$ from one of the corners, where $d(\eta, \xi)$ stands for the graph distance from $\eta$ to $\xi$.
We refer to Figure 1 for an illustration of these sets. Assumptions (H1) and (H2) for this model follow from the arguments presented in the references above, and the remaining condition is an easy consequence of (2.12). The following argument also works for $d = 2$.
Fix $\delta > 0$, $\eta \in E^0_N$, and recall that we denote by $\mathcal{C}$ the set of corners. Let $\varepsilon_N \ll 1$ be a sequence such that $N^2 \ll \varepsilon_N \theta_N$. Estimate (4.17) then follows from equation (6.18) in [25]. We may therefore assume that the process $\xi_N(t)$ does not hit $\mathcal{C}$ before time $\varepsilon_N$. On this event, we may couple $\xi_N(t)$ with a speeded-up random walk $\widetilde{\xi}_N(t)$ on $I^d_N$, in such a way that $\xi_N(t)$ hits $B^x_N$ when $\widetilde{\xi}_N(t)$ hits $J^d_N$. The claim, and in particular the first condition of Lemma 2.4, then follows from Theorem 5 in [1] applied to $\widetilde{\xi}_N(t)$.

B. Second condition of Lemma 2.4. The argument is based on the fact that the process relaxes to equilibrium inside each cube well before it hits the corners. Fix $\delta > 0$, $\delta < s < 3\delta$, $\eta \in E^0_N$, and let $\varepsilon_N$ be as in A, i.e. $N^2 \ll \varepsilon_N \theta_N \ll \theta_N$. By (4.17), we may insert the event $\{H_{\mathcal{C}} > \varepsilon_N\}$ inside the probability appearing in the second displayed equation of Lemma 2.4, and then apply the Markov property. On the set $\{H_{\mathcal{C}} > \varepsilon_N\}$, we may couple the process $\xi_N(t)$ with the speeded-up random walk reflected at $Q^0_N$. Denote by $P^0_N$ the distribution of this dynamics and by $E^0_N$ the corresponding expectation. Since the mixing time of the (speeded-up) random walk on $Q^0_N$ is of order $N^2/\theta_N \ll \varepsilon_N$, the resulting expression is bounded by the analogous quantity computed under $\mu^0_N$, the stationary state of the reflected random walk. As $\mu^0_N(\eta) \le C_0\, \mu_N(\eta)$, and since $\mu_N$ is the stationary state, the desired bound follows, which completes the proof of the second condition of Lemma 2.4.
The convergence of the finite-dimensional distributions has been addressed in [6]. We show below that conditions (M1) and (M2) are in force in dimension $d \ge 3$.

C. Condition (1.8). It is enough to prove that the probability appearing there vanishes for all $\delta > 0$. This has been proved above in step A.
D. Condition (1.9). Recall from (4.16) the definition of the set $B^x_N$. Up to the hitting time of the set $\Delta_N$, the process $\xi_N(t)$ behaves as the chain $\widetilde{\xi}_N(t)$ introduced below (4.17). It is therefore enough to prove condition (1.9) for this latter process. Let $\Delta^{(i)}_N$, $i = 1, 2$, be the simplexes introduced above. We have to show that estimate (4.19) holds for $i = 1, 2$, where $P_\eta$ stands for the distribution of $\widetilde{\xi}_N(t)$ starting from $\eta$. By symmetry, it suffices to do so for $i = 1$. Set $\gamma_N = \varepsilon_N^{-1}$, and denote by $\zeta^\star_N(t)$ the $\gamma_N$-enlargement of the process $\widetilde{\xi}_N(t)$. We refer to Section 5 for the definition of the enlargement and the statement of some of its properties. Denote by $P^\star_\eta$ the distribution of the process $\zeta^\star_N(t)$ starting from $\eta$, and by $V^\star$ the equilibrium potential between $\Delta^{(1)}_N$ and the enlarged set.

To bound the equilibrium potential $V^\star$, we follow a strategy proposed in [6]. We first claim estimate (4.20). Fix $L_N = 2\ell_N$, and let $f$ be the associated test function; below, $D^\star_N$ represents the Dirichlet form of the enlarged process $\zeta^\star_N(t)$. There are two contributions to the Dirichlet form $D^\star_N(F^\star)$. The first one corresponds to edges whose vertices belong to the set $\Lambda_N$. The other contribution is due to the edges between the sets $\Lambda_N$ and $\Lambda^\star_N$; since $F^\star$ is bounded by $1$, this contribution is easily estimated. This completes the proof of (4.20).
We turn to (4.19). Let $\prec$ be the partial order on $J^d_N$ defined by $\eta \prec \xi$ if $\eta_i \le \xi_i$ for $1 \le i \le d$. We may couple two copies of the process $\widetilde{\xi}_N(t)$, denoted by $\zeta^\eta_N(t)$, $\zeta^\xi_N(t)$ and starting from $\eta \prec \xi$, respectively, in such a way that $\zeta^\eta_N(t) \prec \zeta^\xi_N(t)$ for all $t \ge 0$. Suppose that (4.19) does not hold. There exist, therefore, $\delta > 0$, a subsequence $N_j$, still denoted by $N$, and a configuration $\eta_N \in J^d_N$ such that $V^\star(\eta_N) \ge \delta$. By the previous inequality and by definition of $J^d_N$, $V^\star(\xi) \ge \delta$ for all $\xi$ such that $\max_i \xi_i \le M_N$. In particular, comparing the resulting bound with (4.20), we deduce that $\delta^2 \gamma_N M^d_N \le C_0\, \ell^{d-2}_N \theta_N$, which is a contradiction since $\gamma_N = \varepsilon_N^{-1}$ and $\varepsilon_N \theta_N \ll M^d_N / \ell^{d-2}_N$.

E. Condition (1.10). It is well known that the mixing time of a random walk on a $d$-dimensional cube of side length $N$ is of order $N^2$, which proves that condition (1.10) is fulfilled since $\varepsilon_N \theta_N \gg N^2$.
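The $N^2$ mixing-time claim in step E reflects the fact that the spectral gap of the reflected walk on a segment of length $N$ scales as $N^{-2}$; a short numerical sketch checks this scaling:

```python
import math
import numpy as np

def reflected_gap(n):
    # Continuous-time simple random walk on {0, ..., n}, rate 1 across each edge,
    # reflected at the endpoints; its generator is minus the path-graph Laplacian.
    L = np.zeros((n + 1, n + 1))
    for i in range(n):
        L[i, i + 1] = L[i + 1, i] = 1.0
    L -= np.diag(L.sum(axis=1))
    return np.sort(np.linalg.eigvalsh(-L))[1]

# Exactly, gap = 2(1 - cos(pi/(n+1))) ≈ (pi/(n+1))^2, so gap * (n+1)^2 -> pi^2.
for n in (10, 20, 40):
    print(n, reflected_gap(n) * (n + 1) ** 2)
```

On the $d$-dimensional cube the coordinates of the reflected walk evolve independently, so the spectral gap of the product chain equals the one-dimensional gap and the relaxation time is still of order $N^2$.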
F. Last conditions of Proposition 1.2. Condition (1.12) is clearly in force by definition of $\Delta_N$. On the other hand, the chain is reversible.

APPENDIX
We present in this section a general estimate for the hitting time of a set in Markovian dynamics. Fix a finite set $E$ and let $\{\eta(t) : t \ge 0\}$ be a continuous-time, irreducible, $E$-valued Markov chain. Denote by $\pi$ the unique stationary state of the process, by $R(\eta, \xi)$, $\eta, \xi \in E$, its jump rates, and by $P_\eta$ its distribution starting from $\eta$.
We start with an elementary lemma.
Lemma 5.1. Let $X$, $T_\gamma$ be two independent random variables defined on some probability space $(\Omega, \mathcal{F}, P)$. Assume that $T_\gamma$ has an exponential distribution of parameter $\gamma > 0$. Then, for all $b > 0$, $P[X \le b] \le e^{\gamma b}\, P[X \le T_\gamma]$.
Proof. Since $X$ and $T_\gamma$ are independent, for every $b > 0$, $P[X \le T_\gamma] \ge P[X \le b,\, T_\gamma \ge b] = P[X \le b]\, P[T_\gamma \ge b]$. The last term is equal to $e^{-\gamma b}\, P[X \le b]$, which completes the proof of the lemma.
Note that if $X$ is an exponential random variable of parameter $\theta$, the inequality reduces to $1 - e^{-\theta b} \le e^{\gamma b}\, \theta/(\theta + \gamma)$. Hence, choosing $\gamma = 1/b$, if $\theta b$ is small, the inequality is sharp in the sense that the left-hand side is equal to $\theta b + O([\theta b]^2)$, while the right-hand side is equal to $e\, \theta b + O([\theta b]^2)$.
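For an exponential $X$ both sides of the lemma are explicit, so the inequality and the sharpness remark can be checked directly (a sketch; the parameter values are arbitrary):

```python
import math

# Lemma 5.1 with X ~ Exp(theta), T_gamma ~ Exp(gamma) independent:
#   P[X <= b]       = 1 - exp(-theta b)
#   P[X <= T_gamma] = theta / (theta + gamma)
def lhs(theta, b):
    return 1.0 - math.exp(-theta * b)

def rhs(theta, gamma, b):
    return math.exp(gamma * b) * theta / (theta + gamma)

# Sharpness: with gamma = 1/b and theta*b small, lhs ≈ theta*b while rhs ≈ e*theta*b.
theta, b = 1e-3, 1.0
print(lhs(theta, b), rhs(theta, 1.0 / b, b))
```

The ratio of the two sides converges to $e$ as $\theta b \to 0$, matching the expansion in the text.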
Taking into account that for every $\xi \in E$ we have $P^\star_\xi[H_{A^\star} > H_A] = 1$, because points $\eta^\star \in A^\star$ are only accessible from $\eta \in A$, the preceding computation gives the identity stated above. Denote by $\nu^\star_{A,B}$ the equilibrium measure between $A$ and $B$ for the chain $\xi^\gamma(t)$, which is concentrated on the set $A$.

If $A$ is a set with small measure with respect to the stationary measure, it is expected that, for most configurations $\eta \in E$, $H_A$ is approximately exponentially distributed under $P_\eta$. Let $\lambda^{-1}$ be its expectation, so that $P_\eta[H_A \le b] \approx 1 - \exp\{-b\lambda\} \approx b\lambda$, provided $b\lambda \ll 1$. On the one hand, by [5, Proposition A.2], $\lambda$ can be expressed in terms of $V^*_{\eta,A}$, the equilibrium potential between $\eta$ and $A$ for the time-reversed dynamics, and $\mathrm{cap}(\eta, A)$, the capacity between $\eta$ and $A$. If $E_\pi[V^*_{\eta,A}] \approx 1$ (for instance, because $\pi(\eta) \approx 1$), we conclude that $\lambda \approx \mathrm{cap}(\eta, A)$. On the other hand, choosing $\gamma = b^{-1}$ as the parameter for the enlarged process, a similar identity holds for every $\eta \in E$. Once more, if $E_{\pi^\star}[V^{\star,*}_{\eta,E^\star}] \approx 1$, we conclude that $b^{-1} \approx \mathrm{cap}^\star(\eta, E^\star)$, so that $P_\eta[H_A \le b] \approx \mathrm{cap}(\eta, A)/\mathrm{cap}^\star(\eta, E^\star)$. The next lemma establishes this estimate.

Lemma 5.3. Fix a proper subset $A$ of $E$. Then, for every $b > 0$ and $\eta \in E \setminus A$, the two bounds below hold.

Proof. Fix a proper subset $A$ of $E$, $b > 0$ and $\eta \in E \setminus A$. Fix $\gamma > 0$, and consider the $\gamma$-enlarged process. Denote by $H_{E^\star}$ the hitting time of the set $E^\star$. By definition of the enlargement, under $P^\star_\eta$, $H_{E^\star}$ has an exponential distribution of parameter $\gamma$ and is independent of $H_A$. Hence, by Lemma 5.1, $P_\eta[H_A \le b] \le e^{\gamma b}\, P^\star_\eta[H_A \le H_{E^\star}]$. The latter probability is the value of the equilibrium potential between $A$ and $E^\star$ computed at the configuration $\eta$, denoted hereafter by $V^\star_{A,E^\star}(\eta)$. By equation (3.3) in [30] and by (5.2), this expression is bounded as stated, which proves the first assertion of the lemma. We may also rewrite the right-hand side of (5.5) as a sum involving $\mathbf{1}\{\eta\}$, the indicator of the set $\{\eta\}$. By [5, Proposition A.2], this sum can be expressed in terms of $P^{\star,*}$, the distribution of the process $\xi^\gamma(t)$ reversed in time, and $\nu_{A,E^\star}$, the equilibrium measure given by (5.4). By definition of the enlarged process, for every initial condition $\eta \in E$, $H_{E^\star}$ has an exponential distribution of parameter $\gamma$. The resulting expression is thus bounded by $\gamma^{-1}\, \mathrm{cap}^\star(A, E^\star)$, which completes the proof of the lemma.