Metastable Markov chains

We review recent results on the metastable behavior of continuous-time Markov chains derived through the characterization of Markov chains as unique solutions of martingale problems.

We present in this review recent developments in the theory of metastable Markov chains. The goal of the theory is to describe the evolution of a Markov chain by a simpler dynamics, typically one whose state space is much smaller than the original one, while preserving the "macroscopic" features of the original process.
To illustrate the problem, we present in the next section an example which motivates the definitions of metastability introduced in Section 2. We then develop three general methods, based on the characterization of Markov chains as solutions of martingale problems, to derive the metastable behavior of these dynamics.
There are two recent and essential monographs on this subject. The first one, by Olivieri and Vares [110], addresses the problem from the perspective of large deviations theory, and the second one, by Bovier and den Hollander [31], uses potential-theoretic tools. We do not recall these approaches here and refer the reader to the books. The reader will also find there physical motivations, a historical account and an exhaustive list of references, three aspects which are overlooked here. We tried, though, to include in the references the articles published after 2015.
Throughout the article, all new notation and concepts are introduced in blue. We believe this will help the reader who may want to skip some introductory parts. We present in Sections 13 and 14 all results on Markov chains and potential theory used in the article. Comments on the method presented in this review are left to the end of Subsection 2.3.

A random walk in a graph
We present in this section an example of a Markov chain to motivate three different definitions of metastability. Denote by $E_N$, $N \ge 1$, the set shown in Figure 1. In this picture, each large square represents a $d$-dimensional discrete cube of length $N$, $\Lambda_N = \{1, \dots, N\}^d$, $d \ge 2$. Each pair of neighboring cubes has one and only one common point. In particular, $E_N$ has $4(N^d - 1)$ elements. Elements of $E_N$ are represented by the Greek letters $\eta$, $\xi$, $\zeta$, and are called points or configurations. Let $E_{j,N}$, $0 \le j \le 3$, be copies of $\Lambda_N$. The set $E_N$ is formed by the union of the sets $E_{j,N}$ in which some corner points have been identified. We denote by $E_{0,N}$ the north cube and proceed labeling the sets in clockwise order, so that $E_{3,N}$ represents the west cube.
Denote by $\eta_N(t)$ the continuous-time, $E_N$-valued Markov chain which waits a mean-one exponential time at each configuration and then jumps uniformly to one of the neighboring points. This Markov chain is clearly irreducible. Denote by $\deg(\eta)$, $\eta \in E_N$, the degree of the configuration $\eta$, that is, its number of neighbors. The measure $\pi_N$, defined by $\pi_N(\eta) = Z_N^{-1} \deg(\eta)$, where $Z_N$ is the normalizing constant which turns $\pi_N$ into a probability measure, satisfies the detailed balance conditions and is therefore the unique stationary state.
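As a quick sanity check, the following sketch (a hypothetical toy graph, not the set $E_N$ of the text) verifies that the chain with uniform jump rates $R(\eta, \xi) = 1/\deg(\eta)$ is reversible with respect to the measure proportional to the degree:

```python
from fractions import Fraction

# A hypothetical small undirected graph given by adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

deg = {v: len(nb) for v, nb in adj.items()}
Z = sum(deg.values())
pi = {v: Fraction(d, Z) for v, d in deg.items()}                # pi ∝ degree
R = {(v, w): Fraction(1, deg[v]) for v in adj for w in adj[v]}  # uniform jumps

# Detailed balance: pi(v) R(v,w) == pi(w) R(w,v) on every edge.
assert all(pi[v] * R[v, w] == pi[w] * R[w, v] for (v, w) in R)
```

Exact rational arithmetic makes the detailed balance identity hold with equality rather than up to floating-point error.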
The purpose of this section is to provide a synthetic description of the Markov chain $\eta_N(t)$. In this example, the reduced model is evident. Denote by $\Upsilon_N \colon E_N \to \{0,1,2,3\}$ the projection which sends a configuration in $E_{j,N}$ to $j$:
$$\Upsilon_N(\eta) \;=\; \sum_{j=0}^{3} j\, \chi_{E_{j,N}}(\eta),$$
where $\chi_A$ stands for the indicator function of the set $A$. The value of $\Upsilon_N$ at the intersections of the cubes is not important and can be set arbitrarily.
The derivation of the asymptotic evolution of the coarse-grained model is based on properties of random walks evolving on discrete cubes. Denote by $z_N(t)$ the symmetric, continuous-time random walk on $\Lambda_N$ [the process $\eta_N(t)$ restricted to $\Lambda_N$], and by $\pi_{\Lambda_N}$ its stationary state, the probability measure which gives weights proportional to the degrees of the vertices. It is well known, cf. [96, Proposition 10.13], that the mixing time of $z_N(t)$ is of order $N^2$ and that the time needed to hit a point at distance $N$ is of order $\alpha_N = N^2 \log N$ in dimension $2$, and $\alpha_N = N^d$ in dimension $d \ge 3$.
Assume that the chain starts at the center of the cube $E_{j,N}$. Denote by $B$ the set of points which belong to more than one cube, called hereafter the intersection points, and by $H^N_B$ the hitting time of $B$:
$$H^N_B \;=\; \inf\{\, t \ge 0 : \eta_N(t) \in B \,\}.$$
Since the mixing time is of order $N^2$ and the hitting time $H^N_B$ is of a much larger order, the chain equilibrates, or thermalizes, before reaching one of the corners of $E_{j,N}$. This means that the distribution of the chain approaches $\pi_{\Lambda_N}$ before attaining $B$. In particular, $\eta_N(t)$ loses track of its starting point before hitting one of the corners, and it reaches each of the two intersection points with a probability close to $1/2$.
After thermalizing inside the cube $E_{j,N}$, the random walk $\eta_N(t)$ wanders around $E_{j,N}$ for a length of time of order $\alpha_N$, and then attains a point in the intersection of $E_{j,N}$ with $E_{j \pm 1,N}$, where addition is performed modulo $4$. Denote this point by $\xi$, and assume, to fix ideas, that it belongs to $E_{j,N} \cap E_{j+1,N}$. Fix a sequence $(\ell_N : N \ge 1)$ such that $\ell_N \to \infty$, $\ell_N/N \to 0$. The precise choice of $\ell_N$ is not important. Denote by $V_N$ the set of points in $E_N$ which are at Euclidean distance $\ell_N$ or less from $\xi$. After hitting $\xi$, the random walk performs some short excursions from $\xi$ to $\xi$ which remain in $V_N$. Some of these excursions are contained in the set $E_{j,N}$ and some in $E_{j+1,N}$.
It takes a time of order $\ell_N^2$ for $\eta_N(t)$ to escape from $V_N$, that is, to reach a point in $V_N^c$, the complement of $V_N$. Note that $\ell_N^2$ is much smaller than $\alpha_N$, and so the escape time from $V_N$ is negligible in this time-scale.
Starting from a point at the external boundary of $V_N$, it takes a time of order $N^2 \log N$ in dimension $2$ and $N^d$ in dimension $d \ge 3$ to hit the set $B$ again. Since this time is much longer than the mixing time, once in $V_N^c$, before hitting the set $B$ again, the process equilibrates inside the cube. Thus, we are back to the initial situation, and we can iterate the previous argument to provide a complete description of the evolution of the random walk $\eta_N(t)$ among the cubes.
According to the previous analysis, the evolution of the random walk can be described as follows. Starting from a point not too close to the corners, the random walk equilibrates in the cube from which it starts before it reaches one of the intersection points. Since it has equilibrated, it reaches each of the two boundary points with equal probability. Then, after some short excursions close to the intersection point, it escapes from the corner to one of the neighboring cubes, with equal probability due to the symmetry of the set $E_N$. In particular, with probability $1/2$ the random walk returns to the cube from which it came when it hit the intersection point. The escape time being much shorter than the equilibration time, the small excursions around the intersection can be neglected in the asymptotic regime. After escaping, the process equilibrates in the cube where it finds itself, and we may iterate the description of the evolution.
Loss of memory being the essence of Markovian evolution, in the time-scale $\alpha_N$, the coarse-grained, speeded-up process should evolve as an $S := \{0,1,2,3\}$-valued, continuous-time Markov chain $Y(t)$ with holding rates equal to some $\lambda > 0$ and jump probabilities given by $p(j, j \pm 1) = 1/2$.
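The reduced model can be written down explicitly. The following sketch (with a placeholder value for $\lambda$) builds the generator of $Y(t)$ on $S = \{0,1,2,3\}$ and checks that the uniform measure is stationary, as the symmetry of $E_N$ dictates:

```python
import numpy as np

lam = 1.0          # some holding rate lambda > 0 (placeholder value)
n = 4              # S = {0, 1, 2, 3}
L = np.zeros((n, n))
for j in range(n):
    L[j, (j + 1) % n] = lam / 2        # jump probability p(j, j+1) = 1/2
    L[j, (j - 1) % n] = lam / 2        # jump probability p(j, j-1) = 1/2
    L[j, j] = -lam                     # holding rate lambda

assert np.allclose(L.sum(axis=1), 0)   # rows of a generator sum to zero
pi = np.full(n, 1 / n)
assert np.allclose(pi @ L, 0)          # the uniform measure is stationary
```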
In which sense can the coarse-grained process $Y_N(t) := \Upsilon_N(\eta_N(t))$ converge to such a Markov chain? Figure 2 presents a typical realization of the process $Y_N(t)$. The process remains a time interval of order $\alpha_N$ at a point $x \in S$ until $\eta_N(t)$ reaches an intersection point. At this time, $\eta_N(t)$ performs very short excursions [in the time-scale $\alpha_N$] into both neighboring squares. These short excursions are represented in Figure 2 by the bold rectangles, which indicate a large number of oscillations in a very short time interval. After many short excursions the random walk escapes from the boundary and remains in one of the neighboring cubes for a new time interval of order $\alpha_N$. These fluctuations in very short time intervals, represented by the black rectangles in Figure 2, rule out the possibility that $Y_N(t)$ converges in any of the Skorohod topologies. Thus, either we content ourselves with the convergence of the finite-dimensional distributions, or we need to adjust the trajectories of $Y_N(t)$ by removing these short excursions.
The first step consists in introducing a set $\Delta_N \subset E_N$ to separate the squares $E_{j,N}$. This procedure is illustrated in Figure 3, where $E^j_N$ represents $E_{j,N} \setminus \Delta_N$. The set $\Delta_N$ is not unique. We only require that it is small enough for the fraction of time spent in $\Delta_N$ to be negligible, but large enough for the process, starting from a point outside of $\Delta_N$, to equilibrate before it hits an intersection point.
In the example of this section, the set $E^k_N$ can be the set of points of $E_{k,N}$ which are at distance at least $\ell_N$ from the intersection points, or, as in Figure 3, the set of points at distance greater than $\ell_N$ from the faces of the cubes. Here, as above, $\ell_N$ is a sequence such that $\ell_N \to \infty$, $\ell_N/N \to 0$.

Figure 3: The sets $E^k_N$ are indicated in blue. The two red dots represent points in $E^0_N$ and $E^2_N$. The trace process $\eta^{\mathcal{E}}_N(t)$ may jump from one to the other. It therefore has long jumps, in contrast with the original random walk, which only jumps to nearest neighbors. The picture is misleading, as the annulus around each blue square is much smaller than the square.
In the next section, we propose two different types of amendments of the trajectories of $\eta_N(t)$ to achieve convergence of the coarse-grained model in the Skorohod topology.
Before we turn to that, consider the example shown in Figure 4. Assume that each line has $N$ points, counting the common intersection point. Consider a random walk evolving on this graph. The process waits a mean-one exponential time, at the end of which it jumps to one of its neighbors with equal probability. Since one-dimensional random walks on a set of $N$ points equilibrate in a time of order $N^2$, and since they hit a point at distance $N$ in the same time-scale, there is no separation of scales, and the argument presented above to claim the possibility of a synthetic description of the dynamics does not apply.

Metastability as model reduction
The phenomenon described in the previous section, in which a process remains for a long time in a set in which it equilibrates before attaining, in a very short transition, another set where the same behavior is observed, is shared by many different types of dynamics (cf. Section 15 for many examples).
For this reason, we present in a general framework the adjustments needed in the trajectory of the coarse-grained model to yield convergence in the Skorohod topology. Let $(E_N : N \ge 1)$ be a sequence of finite state spaces. Elements of $E_N$ are represented by the Greek letters $\eta$, $\xi$, $\zeta$. Denote by $\eta_N(t)$ a continuous-time, $E_N$-valued, irreducible Markov chain. Its generator is represented by $L_N$ and its unique stationary state by $\pi_N$. Therefore, for every function $f \colon E_N \to \mathbb{R}$,
$$(L_N f)(\eta) \;=\; \sum_{\xi \in E_N} R_N(\eta, \xi)\, [\, f(\xi) - f(\eta)\,],$$
where $R_N(\eta, \xi)$ stands for the jump rates.
For a nonempty subset $A$ of $E_N$, let $H_A$, resp. $H^+_A$, stand for the hitting time of the set $A$, resp. the return time to $A$:
$$H_A \;=\; \inf\{\, t \ge 0 : \eta_N(t) \in A \,\}, \qquad H^+_A \;=\; \inf\{\, t \ge \tau_1 : \eta_N(t) \in A \,\}; \quad (2.1)$$
in this formula, $\tau_1$ represents the time of the first jump of $\eta_N(t)$. Assume that $E_N$ contains $n > 1$ disjoint sets $E^1_N, \dots, E^n_N$, called valleys, separated by a set $\Delta_N$, so that $\{E^1_N, \dots, E^n_N, \Delta_N\}$ forms a partition of $E_N$. Let $S := \{1, \dots, n\}$, and denote by $\Phi_N \colon E_N \to S \cup \{d\}$ the projection which sends a configuration in $E^j_N$, $\Delta_N$ to $j$, $d$, respectively:
$$\Phi_N(\eta) \;=\; \sum_{j \in S} j\, \chi_{E^j_N}(\eta) \;+\; d\, \chi_{\Delta_N}(\eta).$$
Let $X_N(t)$ be the $(S \cup \{d\})$-valued process given by
$$X_N(t) \;=\; \Phi_N(\eta_N(t)). \quad (2.2)$$
In the example of the previous section, the trajectory of $X_N(t) = \Phi_N(\eta_N(t))$ resembles the one presented in Figure 2, with additional spikes due to very short excursions [in the time-scale $\alpha_N$] out of $E^k_N$ which occur far from the intersection points.

Last passage
The first adjustment of the trajectories which enables convergence in the Skorohod topology consists in removing the fast fluctuations by recording the last set $E^k_N$ visited by $\eta_N(t)$. For $t > 0$, denote by $\eta_N(t-)$ the left limit of $\eta_N$ at $t$. Let $\mathcal{E}_N := E^1_N \cup \cdots \cup E^n_N$, and let $X^V_N(t)$ be given by
$$X^V_N(t) \;=\; \Phi_N(\eta_N(w_N(t))),$$
where $w_N(t)$ represents the last time before $t$ the process was in one of the valleys $E^k_N$:
$$w_N(t) \;=\; \sup\{\, s \le t : \eta_N(s) \in \mathcal{E}_N \,\}.$$
If the set on the right-hand side is empty, we set $w_N(t) = 0$. This convention is not important, as we will always start the process from a configuration in $\mathcal{E}_N$. Note that $X^V_N(t)$ takes values in $S$. Since $H_{E^j_N}$ is of order $\alpha_N$, the trajectory of $X^V_N(t)$ is formed by a sequence of time intervals of this magnitude in which the process remains constant. The objections raised above to the convergence in the Skorohod topology are thus overturned, and we may expect, due to the loss of memory which emerges from the equilibration, that in the time-scale $\alpha_N$, $X^V_N(t)$ converges to an $S$-valued Markov chain in the Skorohod topology.
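At the level of trajectories, the last-passage transformation is elementary. The sketch below (with an assumed encoding of a path as (label, duration) pairs, 'd' standing for the separating set $\Delta_N$) computes the value of $X^V_N$ at a given time:

```python
def last_passage(path, t):
    """Value of the last-passage process at time t: the label of the last
    valley visited before t (0 if none, matching the convention w_N(t)=0)."""
    clock, last = 0.0, 0
    for label, dur in path:
        if clock >= t:
            break
        if label != 'd':          # ignore excursions in Delta_N
            last = label
        clock += dur
    return last

# Hypothetical trajectory: valley 0, short 'd' spike, valley 1, spike, valley 0.
path = [(0, 1.0), ('d', 0.01), (1, 2.0), ('d', 0.02), (0, 1.5)]
assert last_passage(path, 0.5) == 0
assert last_passage(path, 1.005) == 0   # inside Delta_N: keep the last valley
assert last_passage(path, 2.0) == 1
```

The spikes contribute nothing: the modified trajectory is constant on intervals of the order of the sojourn times in the valleys.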
Definition 2.1 (Metastability according to LP). The Markov chain $\eta_N(t)$ is said to be metastable, in the sense of last passage, in the time-scale $\theta_N$ if there exist a partition $\{E^1_N, \dots, E^n_N, \Delta_N\}$ of the state space $E_N$ and an $S$-valued, continuous-time Markov chain $X(t)$ such that

(LP1) For any $k \in S = \{1, \dots, n\}$ and any sequence $(\eta_N : N \ge 1)$, $\eta_N \in E^k_N$, the process $X^V_N(t\theta_N)$, starting from $\eta_N$, converges in the Skorohod topology to the chain $X(t)$ starting from $k$.

The sets $E^j_N$ are called valleys and the process $X(t)$ the reduced model.
The main difficulty in proving such a result lies in the fact that the process $\eta_N(w_N(t))$ is not Markovian. For this reason, we propose an alternative modification of the trajectory which preserves the Markov property. This method requires the notion of the trace of a process, which we present below in the context of continuous-time Markov chains taking values in a finite state space.

Trace process
Let $E$ be a finite set and let $\eta(t)$ be an irreducible, continuous-time, $E$-valued Markov chain. Denote by $R(\eta, \xi)$, $\eta \ne \xi \in E$, the jump rates of this chain, by $\lambda(\eta) = \sum_{\xi \in E} R(\eta, \xi)$ the holding rates, and by $\pi$ the unique stationary probability measure. Denote by $D([0,\infty), E)$ the space of right-continuous trajectories $\omega \colon [0,\infty) \to E$ which have left limits, endowed with the Skorohod topology [26]. This notation will be used below, without further comment, with $E$ replaced by other metric spaces. Let $\mathbf{P}_\eta$, $\eta \in E$, be the probability measure on $D([0,\infty), E)$ induced by the Markov chain $\eta(t)$ starting from $\eta$. Expectation with respect to $\mathbf{P}_\eta$ is represented by $\mathbf{E}_\eta$.
Fix a non-empty, proper subset $F$ of $E$ and denote by $T_F(t)$, $t \ge 0$, the total time the process $\eta(t)$ spends in $F$ on the time-interval $[0, t]$:
$$T_F(t) \;=\; \int_0^t \chi_F(\eta(s))\, ds,$$
where, we recall, $\chi_F$ represents the indicator function of the set $F$. Denote by $S_F(t)$ the generalized inverse of the additive functional $T_F(t)$:
$$S_F(t) \;=\; \sup\{\, s \ge 0 : T_F(s) \le t \,\}.$$
The irreducibility guarantees that, for all $t > 0$, $S_F(t)$ is finite almost surely. The process $T_F$ is continuous. It is either constant, when the chain visits configurations which do not belong to $F$, or it increases linearly. Figure 5 illustrates this behavior. Denote by $\eta^F(t)$ the trace of the chain $\eta(t)$ on the set $F$, defined by $\eta^F(t) := \eta(S_F(t))$. Taking the trace of the process corresponds to changing the time axis in Figure 5. When the process hits $F^c$, time is frozen until $\eta(t)$ reaches $F$ again, at which point the clock is restarted. In particular, $\eta^F(t)$ takes values in the set $F$.
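At the level of a piecewise-constant trajectory, the time change $S_F$ simply deletes the time spent outside $F$ and glues consecutive visits together. A minimal sketch (with an assumed encoding of a path as (state, duration) pairs):

```python
def trace(path, F):
    """Trace of a piecewise-constant trajectory on the set F: delete the
    time spent outside F and merge the resulting consecutive visits."""
    out = []
    for state, dur in path:
        if state in F:
            if out and out[-1][0] == state:      # same state after gluing
                out[-1] = (state, out[-1][1] + dur)
            else:
                out.append((state, dur))
    return out

# Hypothetical trajectory visiting a, b and an excursion state c.
path = [('a', 1.0), ('c', 0.5), ('a', 0.2), ('b', 2.0), ('c', 0.3), ('b', 0.1)]
# Excursions through c are cut out and the clock is glued back together:
assert trace(path, {'a', 'b'}) == [('a', 1.2), ('b', 2.1)]
```

This mirrors the $\{a, b\}$ example of Figure 5: the excursion through $c$ between the two visits to $a$ disappears, and the two visits merge into a single sojourn.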
It can be proven [15, Section 6] that $\eta^F(t)$ is an irreducible, continuous-time, $F$-valued Markov chain. The jump rates of the chain $\eta^F(t)$, denoted by $R^F(\eta, \xi)$, are given by
$$R^F(\eta, \xi) \;=\; \lambda(\eta)\, \mathbf{P}_\eta[\, H^+_\xi = H^+_F \,], \qquad \eta \ne \xi \in F, \quad (2.5)$$
where the hitting time $H_A$ and the return time $H^+_A$ have been introduced in (2.1). The unique stationary probability measure of the trace chain, denoted by $\pi_F(\eta)$, is the measure $\pi$ conditioned to $F$:
$$\pi_F(\eta) \;=\; \frac{\pi(\eta)}{\pi(F)}, \qquad \eta \in F. \quad (2.6)$$

Figure 5: An example of the transformation which maps the chain $\eta(t)$ into its trace on the set $\{a, b\}$. The first graph shows the trajectory of $\eta(t)$, the second one the function $T_F(t)$ for $F = \{a, b\}$, and the third one the trajectory $\eta^F(t) = \eta(S_F(t))$. Note that $S_F(t)$ is obtained from $T_F(t)$ by inverting the roles of the $x$ and $y$ axes.
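For a chain given by its generator matrix $L$, the generator of the trace on $F$ can be computed exactly as the stochastic complement $L_{FF} + L_{FF^c}(-L_{F^cF^c})^{-1}L_{F^cF}$. The sketch below (a randomly generated toy generator, not a model from the text) checks that this matrix is again a generator and that its stationary state is $\pi$ conditioned to $F$, as stated above:

```python
import numpy as np

# A toy irreducible generator with random positive off-diagonal rates.
rng = np.random.default_rng(0)
n = 5
R = rng.uniform(0.1, 1.0, (n, n))
np.fill_diagonal(R, 0.0)
L = R - np.diag(R.sum(axis=1))          # generator: off-diagonal rates R

# Stationary measure pi: solve pi L = 0 together with sum(pi) = 1.
A = np.vstack([L.T, np.ones(n)])
b = np.zeros(n + 1); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]

F, Fc = [0, 1, 2], [3, 4]
L_trace = L[np.ix_(F, F)] + L[np.ix_(F, Fc)] @ np.linalg.solve(
    -L[np.ix_(Fc, Fc)], L[np.ix_(Fc, F)])   # stochastic complement on F

assert np.allclose(L_trace.sum(axis=1), 0)       # rows of a generator
piF = pi[F] / pi[F].sum()                        # pi conditioned to F
assert np.allclose(piF @ L_trace, 0, atol=1e-8)  # stationary for the trace
```

The identity $\pi_F L^F = 0$ follows by eliminating the $F^c$-block from $\pi L = 0$, which is exactly what the Schur complement does.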

Metastability
We return to the chain $\eta_N(t)$ introduced at the beginning of this section. Denote by $\mathbf{P}^N_\eta$, $\eta \in E_N$, the probability measure on $D([0,\infty), E_N)$ induced by the Markov chain $\eta_N(t)$ starting from $\eta$. Expectation with respect to $\mathbf{P}^N_\eta$ is represented by $\mathbf{E}^N_\eta$. Denote by $\eta^{\mathcal{E}}_N(t)$ the trace of the process $\eta_N(t)$ on the set $\mathcal{E}_N = E^1_N \cup \cdots \cup E^n_N$. As explained in Figure 5, by taking the trace of $\eta_N(t)$ on $\mathcal{E}_N$ we first remove from the trajectory the time-intervals corresponding to the excursions in $\Delta_N$ (the intervals in black in the leftmost picture of Figure 5), and then we glue back the trajectory, as in the rightmost picture of this figure. This procedure removes the rapid fluctuations from the trajectory, providing an alternative definition of metastability.
Let $\Psi_N \colon \mathcal{E}_N \to S$ be the projection which sends a configuration in $E^j_N$ to $j$:
$$\Psi_N(\eta) \;=\; \sum_{j \in S} j\, \chi_{E^j_N}(\eta).$$
In contrast with $\Phi_N$, $\Psi_N$ is defined only on $\mathcal{E}_N$. Let $X^T_N(t)$ be the process given by
$$X^T_N(t) \;=\; \Psi_N(\eta^{\mathcal{E}}_N(t)).$$
Note that $X^T_N(t)$ is not a Markov chain, but just a hidden Markov chain. It corresponds to the trace on $S$ of the process $X_N(t)$ introduced in (2.2).
Definition 2.2 (Metastability). The Markov chain $\eta_N(t)$ is said to be metastable in the time-scale $\theta_N$ if there exist a partition $\{E^1_N, \dots, E^n_N, \Delta_N\}$ of the state space $E_N$ and an $S$-valued, continuous-time Markov chain $X(t)$ such that

(T1) For any $k \in S$ and any sequence $(\eta_N : N \ge 1)$, $\eta_N \in E^k_N$, the process $X^T_N(t\theta_N)$, starting from $\eta_N$, converges in the Skorohod topology to $X(t)$;

(T2) For every $t > 0$,
$$\lim_{N \to \infty}\, \max_{\eta \in \mathcal{E}_N}\, \mathbf{E}^N_\eta \Big[ \int_0^t \chi_{\Delta_N}(\eta_N(s\theta_N))\, ds \Big] \;=\; 0.$$

The first condition asserts that, in the time-scale $\theta_N$, the trace on $S$ of the process $X_N(t)$ converges to a Markov chain, while the second one states that, in this time-scale, the amount of time the process $X_N(t)$ spends outside $S$ is negligible, uniformly over initial configurations in $\mathcal{E}_N$. In particular, condition (T2) can be stated as
$$\lim_{N \to \infty}\, \max_{\eta \in \mathcal{E}_N}\, \mathbf{E}^N_\eta \Big[ \int_0^t \chi_{\{d\}}(X_N(s\theta_N))\, ds \Big] \;=\; 0.$$

Remark 2.3. The use of the word "metastability", instead of tunneling, to name the phenomenon described in the previous section might be inadequate. Metastability has been used to represent the transition from a metastable state to a stable one. This corresponds to the case in which the reduced model $X(t)$ takes values in a set with two elements, one being transient and the other absorbing. We allow ourselves this abuse of nomenclature.
Remark 2.4. The same sequence of Markov chains $(\eta_N(t) : N \ge 1)$ may have more than one metastable description. In a certain time-scale $\alpha_N$, one may observe transitions between shallow valleys, and in a much longer time-scale $\beta_N$, transitions between deeper valleys.

Remark 2.5. There are examples of Markov chains [63,62,77,78,13] with a countably infinite number of valleys. In these cases, the reduced model $X(t)$ is a continuous-time Markov chain on a countable state space. In this article, we restrict ourselves to the finite case to avoid technical issues with the martingale problem.
Remark 2.6. One of the main features of metastability is the fast transition between valleys. This information is encapsulated in condition (T2), which states that the time spent outside the valleys is negligible. In particular, the transition time between two valleys is negligible in the metastable time-scale.

Remark 2.7. All results presented in this review are in asymptotic form: they characterize the limiting behavior of the coarse-grained model. Quantitative estimates at fixed $N$ are important in concrete problems, for example, to describe synthetically a molecular dynamics which can be represented as a Markov chain in a very large, but fixed, state space. The problem consists in finding a reduced model which keeps the main features of the original chain. It might be interesting to adapt the approach presented here to this framework.
The transition path theory [54,101,33,98] has been designed for this set-up, as well as the intertwining method [11,8,9,10]. See also the results by Bianchi and Gaudillière [24].

Remark 2.8. In contrast with the pathwise approach [38,110], no attempt is made here to describe the transition path between two valleys.

Remark 2.9. In the example of the previous section, the process $X_N(t)$ remains constant on time-intervals of length of order $\alpha_N$. In this sense, $\Psi_N$ can be understood as a slow variable, since it evolves in a much longer time-scale than the original process, and metastability as the search for slow variables and the description of the evolution of these slow variables.

Remark 2.10. In most examples, such as the Ising model at low temperature [104,105], metastability is observed as a result of the presence of an energy barrier which the system has to overcome to reach a new region of the state space.

The example of the previous section is of a different nature. In this model, there is no energy landscape, but a bottleneck which creates a metastable behavior. Here, entropy [the number of configurations] determines the height of the barriers. Say, for example, that three squares are $3$-dimensional while the last one is $2$-dimensional. In this case, in the time-scale $N^3$, one observes an evolution among the $3$-dimensional cubes, and the last square can be included in the set $\Delta_N$, as the time spent there is of order $N^2 \log N$.

In other models, such as random walks in a potential field, both energy and entropy play a role.

Finite-dimensional distributions
Definition 2.1 describes the evolution of a modified version of the original process, and Definition 2.2 that of the trace. To avoid these tiny surgeries on the trajectories, we may turn to the convergence of the finite-dimensional distributions, an alternative adopted by Kipnis and Newman in [80] and by Sugiura [120,121]. Note that while $X_N(t)$ takes values in $S \cup \{d\}$, $X(t)$ is $S$-valued.
The article is organized as follows. We present, in Sections 4-7, a general scheme to derive the metastable behavior of a Markov chain in the sense of Definition 2.2 for dynamics which "visit points". This approach is based on the characterization of Markov chains as solutions of martingale problems, examined in Section 3. In the following two sections, an alternative approach is proposed for dynamics in which the entropy plays a role in the metastable behavior. In Section 10, we discuss tightness. In Section 11, we show that conditions (T1), (T2) entail metastability in the sense of last passage, and, in Section 12, we prove that these conditions, together with property (12.1), lead to the convergence of the finite-dimensional distributions. In Sections 13 and 14, we recall some general results on Markov chains and potential theory used in the article. In the last section, we list some dynamics which fall within the scope of the theory.

Martingale problems
The proof of condition (T1) in Definition 2.2 relies on the uniqueness of solutions of martingale problems, the subject of this section. To avoid technical problems, we restrict ourselves to the context of continuous-time Markov chains taking values in a finite state space $E$. We refer to the classical books [119,55] for further details.
Denote by $L$ the generator of the Markov chain $\eta(t)$: for every function $f \colon E \to \mathbb{R}$,
$$(Lf)(\eta) \;=\; \sum_{\xi \in E} R(\eta, \xi)\, [\, f(\xi) - f(\eta)\,].$$
It is well known that, for every $f \colon E \to \mathbb{R}$,
$$M_f(t) \;:=\; f(\eta(t)) \;-\; f(\eta(0)) \;-\; \int_0^t (Lf)(\eta(s))\, ds \quad (3.2)$$
is a zero-mean martingale in $(\Omega, (\mathcal{F}^o_t), \mathbf{P})$. It turns out that the converse is true. Let $A$ be the generator of an $E$-valued, irreducible, continuous-time Markov chain, and $\nu$ a probability measure on $E$.
Definition 3.1 (The martingale problem $(A, \nu)$). A probability measure $\mathbf{P}$ on $(\Omega, \mathcal{F})$ is a solution of the martingale problem associated to the generator $A$ and the measure $\nu$ if, for every $f \colon E \to \mathbb{R}$, the process $M_f$ given by (3.2), with $A$ in place of $L$, is a martingale under $\mathbf{P}$, and the law of the coordinate process at time $0$ under $\mathbf{P}$ is $\nu$.

The next result is a particular case of Theorem 4.4.1 in [55].

Theorem 3.2. The martingale problem $(A, \nu)$ admits a unique solution. Moreover, under this solution the coordinate process is a continuous-time Markov chain with generator $A$ and initial distribution $\nu$.

This result provides a simple strategy to prove condition (T1) of Definition 2.2. Fix $k \in S$, a sequence $\eta_N \in E^k_N$, and denote by $P_N$ the probability measure on $D([0,\infty), S)$ induced by the process $X^T_N(t\theta_N)$ and the measure $\mathbf{P}^N_{\eta_N}$. Prove first that the sequence $P_N$ is tight. Then, to characterize the limit points, show that they solve a martingale problem $(L, \delta_k)$, where $L$ is the generator of an $S$-valued Markov chain [guessed a priori] and $\delta_k$ the probability measure on $S$ concentrated on $k$. Tightness is postponed to Section 10, and uniqueness is discussed in the next sections.
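Dynkin's formula (3.2) can be checked numerically on a two-state chain, for which the semigroup $e^{tL}$ has a closed form (the rates $a$, $b$ below are arbitrary choices):

```python
import numpy as np

a, b = 1.5, 0.7                         # arbitrary jump rates (assumption)
L = np.array([[-a, a], [b, -b]])        # generator of the two-state chain
f = np.array([2.0, -1.0])               # an arbitrary test function

def P(t):
    """Transition semigroup exp(tL), in closed form for a two-state chain:
    spectral gap a+b, limiting matrix with rows (b, a)/(a+b)."""
    Pinf = np.array([[b, a], [b, a]]) / (a + b)
    return Pinf + np.exp(-(a + b) * t) * (np.eye(2) - Pinf)

t, m = 2.0, 20000
s = (np.arange(m) + 0.5) * (t / m)                 # midpoint grid on [0, t]
integral = sum(P(si) @ (L @ f) for si in s) * (t / m)

# Zero-mean martingale property of M_f: for each starting state eta,
#   E_eta[f(eta(t))] - f(eta) = E_eta[ int_0^t (Lf)(eta(s)) ds ].
assert np.allclose(P(t) @ f - f, integral, atol=1e-4)
```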

The martingale approach
We carry out in this section the strategy outlined in the previous section to prove the uniqueness of the limit points of the sequence $X^T_N(t)$. It is based on the uniqueness of solutions of martingale problems, presented above, and on the fact that limits of martingales are martingales, recalled below.
Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability space, $(\mathcal{F}_t : t \ge 0)$ a filtration, and $(M_N : N \ge 1)$ a sequence of martingales with respect to this filtration.

Lemma 4.1. Assume that, for each $t \ge 0$, $M_N(t)$ converges in $L^1(\mathbf{P})$ to a random variable $M(t)$. Then, $(M(t) : t \ge 0)$ is a martingale with respect to the filtration $(\mathcal{F}_t)$.

Proof. Fix $0 \le s < t$ and a bounded random variable $Y$, measurable with respect to $\mathcal{F}_s$. Since $M_N$ is a martingale, $\mathbf{E}[M_N(t)\, Y] = \mathbf{E}[M_N(s)\, Y]$. As $M_N(t) \to M(t)$ and $M_N(s) \to M(s)$ in $L^1(\mathbf{P})$ and $Y$ is bounded, we may pass to the limit and conclude that $\mathbf{E}[M(t)\, Y] = \mathbf{E}[M(s)\, Y]$, so that $M$ is a martingale. $\square$

Fix $k \in S$, a configuration $\eta_N$ in $E^k_N$, and denote by $P_N$ the probability measure on $D([0,\infty), S)$ induced by the process $X^T_N(t\theta_N)$ and the measure $\mathbf{P}^N_{\eta_N}$. The main result of this section asserts that all limit points of the sequence $P_N$ solve a martingale problem $(L, \delta_k)$, provided we can prove a local ergodic theorem and calculate the limit of the coarse-grained jump function, properties (P1) and (P2) formulated at the end of this section.
Fix a function $F \colon S \to \mathbb{R}$. As the trace process is a Markov chain,
$$(F \circ \Psi_N)(\eta^{\mathcal{E}}_N(t)) \;-\; (F \circ \Psi_N)(\eta^{\mathcal{E}}_N(0)) \;-\; \int_0^t (L_{\mathcal{E}_N} (F \circ \Psi_N))(\eta^{\mathcal{E}}_N(s))\, ds$$
is a martingale. In this formula, $L_{\mathcal{E}_N}$ represents the generator of the trace process $\eta^{\mathcal{E}}_N(t)$. Since $X^T_N(t\theta_N) = \Psi_N(\eta^{\mathcal{E}}_N(t\theta_N))$, changing variables this expression becomes
$$M_N(t) \;:=\; F(X^T_N(t\theta_N)) \;-\; F(X^T_N(0)) \;-\; \int_0^t \theta_N\, (L_{\mathcal{E}_N} (F \circ \Psi_N))(\eta^{\mathcal{E}}_N(s\theta_N))\, ds.$$
Denote by $R^T_N(\eta, \xi)$ the jump rates of the trace chain $\eta^{\mathcal{E}}_N(t)$. The expression inside the integral can be written as
$$\theta_N\, (L_{\mathcal{E}_N} (F \circ \Psi_N))(\zeta) \;=\; \sum_{\ell \in S} R^{(\ell)}_N(\zeta)\, [\, F(\ell) - F(\Psi_N(\zeta))\,],$$
where $R^{(\ell)}_N(\zeta)$ represents the jump rate from the configuration $\zeta$ to the set $E^\ell_N$ for the trace process speeded up by $\theta_N$:
$$R^{(\ell)}_N(\zeta) \;=\; \theta_N \sum_{\xi \in E^\ell_N} R^T_N(\zeta, \xi). \quad (4.1)$$
Up to this point, we have proved that the martingale $M_N(t)$ is equal to
$$F(X^T_N(t\theta_N)) \;-\; F(X^T_N(0)) \;-\; \int_0^t \sum_{\ell \in S} R^{(\ell)}_N(\eta^{\mathcal{E}}_N(s\theta_N))\, [\, F(\ell) - F(X^T_N(s\theta_N))\,]\, ds.$$
If the functions $R^{(\ell)}_N$ could be replaced, inside the time integral, by functions $r^{(\ell)}_N$ which are constant on each set $E^j_N$, the integrand would become a function of the coarse-grained process alone. Furthermore, if for all $j \ne \ell \in S$ the sequences $r^{(\ell)}_N(j)$ converged to some $r(j, \ell) \in \mathbb{R}_+$, one could replace in the previous formula $r^{(\ell)}_N(X^T_N(s\theta_N))$ by $r(X^T_N(s\theta_N), \ell)$ at the cost of a small error.
Therefore, under the two previous conditions, up to a negligible error,
$$F(X^T_N(t\theta_N)) \;-\; F(X^T_N(0)) \;-\; \int_0^t \sum_{\ell \in S} r(X^T_N(s\theta_N), \ell)\, [\, F(\ell) - F(X^T_N(s\theta_N))\,]\, ds \quad (4.3)$$
is a martingale. Denote by $P$ a limit point of the sequence $P_N$. Let $X(t)$ represent the coordinate process of $D([0,\infty), S)$: $X(t)(\omega) = \omega(t)$. Assume that $P[\, X(t-) = X(t)\,] = 1$ for all $t > 0$, where $X(t-) = \lim_{s < t,\, s \to t} X(s)$.
Suppose, without loss of generality, that $P_N$ converges to $P$. Let $L$ be the generator of the $S$-valued Markov chain associated to the jump rates $r$. As $P[\, X(t-) = X(t)\,] = 1$, the finite-dimensional projections are continuous (cf. equation (13.3) in [26]). Thus, since the expression in (4.3) is uniformly bounded, we may pass to the limit and conclude from Lemma 4.1 that
$$F(X(t)) \;-\; F(X(0)) \;-\; \int_0^t (LF)(X(s))\, ds$$
is a martingale under the measure $P$. Moreover, as $\eta_N \in E^k_N$, $P_N[X(0) = k] = 1$ for all $N$, so that $P[X(0) = k] = 1$. Therefore, $P$ is a solution of the $(L, \delta_k)$ martingale problem. By Theorem 3.2, this property characterizes $P$, and under this measure the coordinate process is a continuous-time Markov chain whose generator is $L$.
We summarize the conclusions of the previous analysis in Theorem 4.2 below. We first formulate the main hypotheses.

(P1) There exist functions $r^{(\ell)}_N \colon \mathcal{E}_N \to \mathbb{R}_+$, $\ell \in S$, which are constant on the sets $E^j_N$, $j \in S$, and such that, for every function $F \colon S \to \mathbb{R}$, $t > 0$, and sequence $\eta_N \in \mathcal{E}_N$,
$$\lim_{N \to \infty} \mathbf{E}^N_{\eta_N} \Big[ \int_0^t \big\{ R^{(\ell)}_N - r^{(\ell)}_N \big\}(\eta^{\mathcal{E}}_N(s\theta_N))\, [\, F(\ell) - F(X^T_N(s\theta_N))\,]\, ds \Big] \;=\; 0. \quad (4.4)$$

Denote by $r_N(j, \ell)$, $j \ne \ell \in S$, the value of $r^{(\ell)}_N$ on the set $E^j_N$; these non-negative real numbers are named the coarse-grained jump rates. Note from the formula for the martingale $M_N(t)$ that the values of $r_N(\ell, \ell)$ are unimportant. We assume that these rates converge:

(P2) There exist $r(j, \ell) \in [0, \infty)$ such that, for all $j \ne \ell \in S$,
$$\lim_{N \to \infty} r_N(j, \ell) \;=\; r(j, \ell). \quad (4.5)$$

Theorem 4.2. Fix $k \in S$, a sequence $\eta_N \in E^k_N$, and denote by $P_N$ the probability measure on $D([0,\infty), S)$ induced by the process $X^T_N(t\theta_N)$ and the measure $\mathbf{P}^N_{\eta_N}$. Assume that conditions (4.4) and (4.5) are in force. Then, every limit point $P$ of the sequence $P_N$ such that
$$P[\, X(t-) = X(t)\,] \;=\; 1 \quad \text{for all } t > 0$$
solves the $(L, \delta_k)$ martingale problem, where $L$ is the generator of the $S$-valued Markov chain whose jump rates are $r(j, \ell)$.
Note that we do not need to prove property (P1) with an absolute value inside the expectation. This observation simplifies considerably the proof of this replacement.
We present in Sections 5 and 6 sufficient conditions, formulated in terms of the stationary state and of capacities between the sets $E^j_N$, for conditions (P1), (P2) to hold. In Sections 8 and 9, we propose alternative proofs of the uniqueness of the limit points of the sequence $P_N$.

Local ergodicity
In this section, we provide sufficient conditions, formulated in terms of the stationary state and of capacities, to replace the jump rates $R^{(k)}_N$, introduced in (4.1), by functions which are constant on each valley. Here $\lambda_N(\eta)$ stands for the holding rate at $\eta$ of the Markov chain $\eta_N(t)$: $\lambda_N(\eta) = \sum_{\xi \in E_N} R_N(\eta, \xi)$.

Recall that
$$R^{(k)}_N(\eta) \;=\; \theta_N \sum_{\xi \in E^k_N} R^T_N(\eta, \xi),$$
where $R^T_N(\eta, \xi)$ represents the jump rates of the trace process. Thus, $R^{(k)}_N(\eta)$ is the rate at which the trace process jumps from $\eta$ to $E^k_N$, multiplied by $\theta_N$. In view of equation (2.5) for the jump rates of the trace process,
$$R^{(k)}_N(\eta) \;=\; \theta_N\, \lambda_N(\eta)\, \mathbf{P}^N_\eta[\, H^+_{\mathcal{E}_N} = H^+_{E^k_N} \,]. \quad (5.2)$$
In particular, $R^{(k)}_N$ vanishes in the interior of the sets $E^j_N$, $j \ne k$, where by interior we mean the set of configurations in $E^j_N$ all of whose neighbors belong to $E^j_N$. Hence, $R^{(k)}_N$ is a singular function. While it vanishes in the interior of the sets $E^j_N$, it assumes a large value at the boundary, because the right-hand side of (5.2) is multiplied by $\theta_N$.
The goal of this section is to replace the time integral of the singular function $R^{(k)}_N$ by the time integral of a very regular function, one which is constant on each set $E^j_N$. This replacement is expected to hold whenever the process equilibrates in the valleys $E^j_N$ before it jumps to a new one.
Let $f_N \colon \mathcal{E}_N \to \mathbb{R}$ be a sequence of real functions. Fix $t > 0$, and consider the time integral
$$\int_0^t f_N(\eta^{\mathcal{E}}_N(s\theta_N))\, ds.$$
The time integral can be decomposed according to the sojourns in the sets $E^j_N$. If the process equilibrates during these visits, by the ergodic theorem, we expect the integral of $f_N$ over these time-intervals to be close to the integral of the mean value of $f_N$ on these sets. Hence, let
$$g_N \;:=\; E_{\pi_{\mathcal{E}}}[\, f_N \,|\, \mathcal{G}_N \,],$$
where $\mathcal{G}_N$ represents the $\sigma$-algebra of subsets of $\mathcal{E}_N$ generated by the sets $E^j_N$, $j \in S$, and $\pi_{\mathcal{E}}$ the stationary state of the trace process $\eta^{\mathcal{E}}_N(t)$ [which, by (2.6), is the stationary state $\pi_N$ conditioned to $\mathcal{E}_N$]. Clearly, $g_N$ is constant on each set $E^j_N$. The function $g_N$ is the candidate, and one expects that, under certain conditions on the sequence $f_N$,
$$\lim_{N \to \infty} \mathbf{E}^N_{\eta_N} \Big[ \int_0^t \{ f_N - g_N \}(\eta^{\mathcal{E}}_N(s\theta_N))\, ds \Big] \;=\; 0.$$

Theorem 5.1. Assume that:

(a) For each $j \in S$ there exists a configuration $\xi_{j,N} \in E^j_N$ such that
$$\lim_{N \to \infty} \frac{1}{\theta_N}\, \sup_{\eta}\, \frac{\pi_N(E^j_N)}{\mathrm{cap}_N(\eta, \xi_{j,N})} \;=\; 0, \quad (5.4)$$
where the supremum is carried over all configurations $\eta \in E^j_N$, $\eta \ne \xi_{j,N}$.
(b) The sequence $g_N$ is uniformly bounded and is constant on each set $E^j_N$: there exist a finite constant $C_0$ and a sequence of functions $G_N \colon S \to \mathbb{R}$ such that $g_N = G_N \circ \Psi_N$ and $\max_{j \in S} |G_N(j)| \le C_0$ for all $N$.

Then, for all $t > 0$ and every sequence $\eta_N \in \mathcal{E}_N$,
$$\lim_{N \to \infty} \mathbf{E}^N_{\eta_N} \Big[ \int_0^t \{ f_N - g_N \}(\eta^{\mathcal{E}}_N(s\theta_N))\, ds \Big] \;=\; 0.$$

In the reversible case, this result follows from Corollary 6.5 and Proposition 6.10 in [15] and from the hypotheses of the theorem. In the nonreversible case, it follows from Corollary 6.5 in [15] and Proposition A.2 in [19].
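The conditional expectation $g_N = E_{\pi}[f_N \mid \mathcal{G}_N]$ is simply a block-averaging projection. A minimal sketch with hypothetical data (the partition and the measure below are made up for illustration):

```python
from fractions import Fraction

def project(f, pi, blocks):
    """Conditional expectation of f given the partition `blocks` under pi:
    on each block, replace f by its pi-weighted average."""
    g = {}
    for block in blocks:
        mass = sum(pi[x] for x in block)
        avg = sum(pi[x] * f[x] for x in block) / mass
        for x in block:
            g[x] = avg
    return g

pi = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
f = {1: 4, 2: 0, 3: 8}
g = project(f, pi, [[1, 2], [3]])
assert g[1] == g[2] == Fraction(8, 3)   # (1/2·4 + 1/4·0) / (3/4)
assert g[3] == 8
# The projection is constant on each block and preserves the pi-mean.
assert sum(pi[x] * g[x] for x in pi) == sum(pi[x] * f[x] for x in pi)
```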
Remark 5.2. The proof of this result takes advantage of the fact that the absolute value is outside of the expectation.
Remark 5.3. In particular, condition (a) of the theorem becomes that, for all $j \in S$, there exists a configuration $\xi_{j,N} \in E^j_N$ satisfying (5.4).

Remark 5.4. The configuration $\xi_{j,N}$ has no special role. By Theorem 2.7 in [15], if condition (5.4) holds for one configuration in $E^j_N$, it holds for all.
The coarse-grained jump function, denoted by $r^{(k)}_N$, is obtained by taking $f_N = R^{(k)}_N$ in this scheme: $r^{(k)}_N := E_{\pi_{\mathcal{E}}}[\, R^{(k)}_N \,|\, \mathcal{G}_N \,]$, where, for $k \ne j$, its value on the set $E^j_N$ is
$$r_N(j, k) \;:=\; \frac{1}{\pi_N(E^j_N)} \sum_{\eta \in E^j_N} \pi_N(\eta)\, R^{(k)}_N(\eta). \quad (5.5)$$

Remark 5.5. Hypothesis (a) of Theorem 5.1 requires the process to visit all configurations of the valley $E^j_N$ before it reaches a new one. Dynamics which display this behavior are said to "visit points". This class includes condensing zero-range processes [16,84,4,116], random walks in a potential field [91,92,93], and models in which the valleys are singletons, such as the inclusion process [25] or random walks evolving among random traps [63,62,77,78], but it does not contain the example of Section 1. For such dynamics, in which the entropy plays a role in the metastable behavior, a different approach is needed. This is discussed in Sections 8 and 9.

The coarse-grained jump rates
In this section, we investigate the asymptotic behavior of the coarse-grained jump rates $r_N(k, j)$, defined in (5.5). This is condition (4.5) of Theorem 4.2.

Reversible case
In the reversible case, we may express the coarse-grained jump rates $r_N(j, k)$ in terms of capacities. It follows from the explicit formulae (5.1), (5.5) and from an elementary argument taking advantage of the reversibility that
$$r_N(j, k) \;=\; \frac{\theta_N}{2\, \pi_N(E^j)} \big\{ \mathrm{cap}_N(E^j, \breve{E}^j) + \mathrm{cap}_N(E^k, \breve{E}^k) - \mathrm{cap}_N(E^j \cup E^k, \breve{E}^j \cap \breve{E}^k) \big\}, \quad (6.1)$$
where $\breve{E}^j := \mathcal{E}_N \setminus E^j_N$. Here and below we often write $E^j$, $\breve{E}^j$ for $E^j_N$, $\breve{E}^j_N$, respectively. Therefore, in the reversible case, one can compute the limit of the coarse-grained jump rates $r_N(j, k)$ if one can calculate the asymptotic behavior of $\pi_N(E^j)$ and of the capacities appearing in this formula.
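Capacities are computable in small examples. The sketch below (a toy birth-and-death chain, not one of the models of the text) solves for the equilibrium potential $h$, harmonic off $A \cup B$ with $h = 1$ on $A$ and $h = 0$ on $B$, and evaluates the capacity as the Dirichlet form of $h$; for a path graph this agrees with the series-conductance formula of electrical network theory:

```python
import numpy as np

# Toy birth-and-death chain on {0,...,5} with hypothetical rates.
n = 6
R = np.zeros((n, n))
for x in range(n - 1):
    R[x, x + 1] = 1.0       # rate to the right
    R[x + 1, x] = 2.0       # rate to the left
L = R - np.diag(R.sum(axis=1))

# Reversible stationary measure: pi(x+1)/pi(x) = R(x,x+1)/R(x+1,x) = 1/2.
pi = np.array([2.0 ** -x for x in range(n)]); pi /= pi.sum()

A, B = [0], [n - 1]
inner = [x for x in range(n) if x not in A + B]
h = np.zeros(n); h[A] = 1.0
# Harmonicity: (L h)(x) = 0 for x outside A ∪ B.
h[inner] = np.linalg.solve(L[np.ix_(inner, inner)],
                           -L[np.ix_(inner, A)] @ h[A])

# Capacity = Dirichlet form of h: (1/2) sum pi(x) R(x,y) (h(x)-h(y))^2.
cap = 0.5 * sum(pi[x] * R[x, y] * (h[x] - h[y]) ** 2
                for x in range(n) for y in range(n))
# Series conductances c(x,x+1) = pi(x) R(x,x+1) give cap = 32/1953 here.
assert np.isclose(cap, 32 / 1953)
```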

Nonreversible case
Summing over k ≠ j in (5.5) provides a formula for the coarse-grained holding rates, denoted by λ N (j): The expression on the right-hand side corresponds to the capacity between E j and Ȇ j . Therefore,
Remark 6.1. Equation (6.2) provides a formula for the magnitude of the scaling parameter θ N . To derive a non-trivial limit for the coarse-grained model X T N , time has to be rescaled by the inverse of the capacity between the sets E j and Ȇ j : The asymptotic behavior of the coarse-grained holding rates can be computed through formula (6.2) provided one can estimate the capacities and the measures of the valleys. Once this has been done, to compute the jump rates, it remains to estimate the jump probabilities.
Recall from Section 13 the definition of a collapsed chain. Fix j ∈ S, and denote by η C,j (t) the Markov chain obtained from the chain η N (t) by collapsing the valley E j to a point, denoted by j. The chain η C,j (t) takes values in E C,j N : induced by the collapsed process η C,j (t) starting from η. Expectation with respect to P C,j η is represented by E C,j η . By the last formula of the proof of [19, Proposition 3.4], for any k ∈ S, k ≠ j, Denote by P j the probability measure on D([0, ∞), S) induced by the reduced model X(t) starting from j. We present below a set of sufficient conditions which ensure that p N (j, k) converges to P j [H k < H S\{j,k} ]. This approach has been developed and gradually refined in [84,92,116], and it is based on the premise that the capacities can be calculated through the Thomson and the Dirichlet principles.
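The collapsing operation can be illustrated on a small reversible chain. The construction below is a sketch under our own assumptions (it does not reproduce the precise definition of Section 13): rates into the valley are summed, and rates out of the collapsed point are averaged under the stationary measure conditioned to the valley; in the reversible case the collapsed chain obtained in this way is again reversible with respect to the collapsed measure.

```python
import numpy as np

# Illustrative reversible chain built from symmetric conductances S:
# pi(x) R(x, y) = S(x, y).  The valley {0, 1} is collapsed to a single
# point; rates into it are summed, rates out of it are averaged under
# pi conditioned to the valley (our construction, see the caveat above).
n = 5
rng = np.random.default_rng(0)
pi = rng.random(n); pi /= pi.sum()
S = np.triu(rng.random((n, n)), 1); S = S + S.T
R = S / pi[:, None]
np.fill_diagonal(R, 0.0)

valley = [0, 1]
rest = [i for i in range(n) if i not in valley]
m = len(rest) + 1               # collapsed point gets index 0

pi_C = np.empty(m)
pi_C[0] = pi[valley].sum()      # collapsed measure
pi_C[1:] = pi[rest]

R_C = np.zeros((m, m))
for a, x in enumerate(rest):
    R_C[a + 1, 0] = R[x, valley].sum()                              # into the valley
    R_C[0, a + 1] = (pi[valley] @ R[valley, x]) / pi[valley].sum()  # out of it
    for b, y in enumerate(rest):
        R_C[a + 1, b + 1] = R[x, y]

# the collapsed chain is reversible w.r.t. pi_C: check detailed balance
flow = pi_C[:, None] * R_C
db = np.abs(flow - flow.T).max()
print(db)
```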
Denote by L 2 (π N ) the space of square-summable functions f : E N → R endowed with the scalar product · , · π N given by We assume that the generator L N of the Markov chain η N (t) satisfies a sector condition with a constant C 0 independent of N : For every f , g ∈ L 2 (π N ), Suppose that for fixed j, k ∈ S, k ≠ j, where cap S (A, B), A, B ⊂ S, represents the capacity with respect to the reduced model X(t).
We also assume that the capacities for the collapsed process η C,j (t) can be calculated: Denote by cap C,j N (A, B) the capacity between A, B ⊂ E C,j N , A ∩ B = ∅ induced by the collapsed process η C,j (t). We assume that the limit of the capacity The computation of the capacities requires test flows or test functions which approximate the optimal ones in the variational principles. It is thus implicitly assumed in hypotheses (6.4) and (6.5) that explicit expressions for such flows or functions are available. We assume below that there exists a sequence of functions V N j,k : where D N (f ) stands for the Dirichlet form of f : The last identity in (6.6) follows from the fact, proved in (14.6), that D N (h N E k ,Ȇ j,k ) = cap N (E k ,Ȇ j,k ) and from assumption (6.4).
We assume, furthermore, that V N j,k is constant in each valley E N : j,k to be small in these sets. We assume that (6.8)
Proposition 6.2. Fix j, k ∈ S, k ≠ j, and assume that conditions (6.3)-(6.8) are in force. Then, lim
The proof of this proposition is divided into several lemmata. Since V N j,k is constant on E j , we may collapse it to a function defined on E C,j N . Recall from (6.7) the value of V N j,k at E j and let V C,j j,k : E C,j N → [0, 1] be given by The dependence of V C,j j,k on N has been omitted.
Denote by h A,B the solution of the boundary value elliptic problem Note that the function h j,k is constant and equal to p N (j, k) on the set E j .
) π N and compute separately the limit of the four terms. By equation (13.14), Dirichlet form associated to the collapsed process. By (14.6), By assumption (6.6), the same result holds for V j,k in place of h j,k .
It remains to examine the cross terms. By (13.14), where π C,j N stands for the stationary measure π N collapsed at E j . Since h (j) E k ,Ȇ j,k is harmonic on ∆ N ∪ {j}, and since V C,j j,k vanishes on Ȇ j,k and coincides with h Using again the harmonicity of h , and the fact that it vanishes on E j,k , we may extend the sum to the entire set E C,j N and conclude, as at the beginning of the proof, that Since V j,k vanishes on Ȇ j,k , the first term on the right-hand side is equal to Since h j,k and V j,k are constant in E j , non-negative and bounded by 1, the absolute value of the second term on the right-hand side is less than or equal to By condition (6.8), this expression multiplied by θ N converges to 0 as N → ∞. Thus, by (6.6), Putting together all previous estimates yields the assertion.
Fix two non-empty subsets A, B of E N such that A ∩ B = ∅. Recall from Section 14 that we represent by C 1,0 (A, B) the space of functions f : E N → [0, 1] which are equal to 1 on A and 0 on B. Let f N j,k = h j,k − V j,k , and note that this function is constant on each valley E .
represents the capacity associated to the symmetric dynamics. By the sector condition, stated in assumption (6.3), and Lemma 14.12, this symmetric capacity is bounded below by which proves the assertion of the lemma.
The claim of the proposition follows from Lemma 6.4 and the fact that
Remark 6.5. Assumption (6.3) can be replaced by the hypothesis that
Proof. We only used the sector condition, assumption (6.3), in the proof of Lemma 6.4 to guarantee that θ N D N (F ) is bounded below by a strictly positive constant. Since

The negligible set ∆ N

We provide in this section sufficient conditions for assumption (LP2) or (T2) to hold. Recall from (5.5) that r N (k, j) represents the coarse-grained jump rates. Assume that they converge: Recall that we represent by X(t) the reduced model, the S-valued Markov chain whose jump rates are given by r(k, j). Denote by A ⊂ S the subset of S formed by the points which are absorbing for the reduced model X(t). The next result is Theorem 2.7 in [15] and Theorem 2.1 in [19].
Theorem 7.1. Assume that conditions (5.4) and (7.1) are in force. Assume, furthermore, that for all k ∈ A, t > 0, and that for all Then, property (LP2) is in force.
Remark 7.2. In the previous theorem, we may replace conditions (7.2), (7.3) by the assumption
In some spin dynamics, the valleys are formed by a few configurations, and the following simple argument applies.
Then, condition (LP2) is in force.
Proof. Fix t > 0. Clearly, dividing and multiplying by π N (η), Since π N is the stationary state, the previous expression is equal to t π N (∆ N )/π N (η), which proves the lemma.

The Poisson equation
We present here an alternative method to prove uniqueness of limit points of the sequence of measures P N introduced in Section 4. It relies on asymptotic properties of the solutions of Poisson equations. Assume that we are able to foretell the dynamics of the reduced model, and denote by L its generator. Fix a function F : S → R, and let G = LF . Denote by f , g : E N → R the functions given by The functions f , g are constant on each valley E N and vanish at ∆ N . The method presented below relies on the assumption that the solution f N of the Poisson equation
A solution of this equation exists only if g has zero-mean with respect to π N , which is not necessarily the case. Therefore, we need first to turn g into a zero-mean function and then to consider the solution of the Poisson equation. This is the content of conditions (A1), (A2).
Assume that there exists a sequence of functions g N : E N → R such that (A1) g N has zero-mean with respect to π N , vanishes on ∆ N and converges to g uniformly on E N ; (A2) Denote by f N the unique solution of the Poisson equation The natural candidate for g N in conditions (A1) and (A2) is the function g itself, but it does not have zero-mean. To fulfill this condition, denote by π the stationary state of the reduced model. We expect π N (E k N ) to converge to π(k). Hence,
Properties (A1), (A2) have been proved in [57,117] for elliptic operators on R d of the form L N f = e N V ∇ · (e −N V a∇f ) and in [94] for one-dimensional diffusions with periodic boundary conditions. It is an open problem to prove these conditions in the context of interacting particle systems, say for condensing zero-range processes.
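Conditions (A1), (A2) can be visualized on a toy chain. The sketch below (illustrative generator and right-hand side, with the time scale θ N set to 1; it is a finite-dimensional computation, not the asymptotic statement of (A2)) subtracts the π N -mean from g, which makes the singular Poisson equation solvable, and solves it by least squares.

```python
import numpy as np

# Illustrative generator.  g is turned into the mean-zero g_N of (A1);
# the Poisson equation L f = g_N is then solvable because g_N is
# orthogonal to the stationary state, and f is fixed by centering.
R = np.array([[0., 2., 0., 1.],
              [1., 0., 3., 0.],
              [0., 1., 0., 2.],
              [2., 0., 1., 0.]])
L = R - np.diag(R.sum(axis=1))

w, V = np.linalg.eig(L.T)
pi = np.real(V[:, np.argmin(np.abs(w))]); pi /= pi.sum()

g = np.array([1., -2., 0.5, 3.])
g_N = g - pi @ g                 # (A1): subtract the pi-mean

# (A2): L is singular (constants are in its kernel); least squares picks
# a solution, normalised here to have zero pi-mean.
f, *_ = np.linalg.lstsq(L, g_N, rcond=None)
f -= pi @ f
print(np.abs(L @ f - g_N).max())
```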
The main result of this section, Theorem 8.2 below, asserts that conditions (A1), (A2) guarantee uniqueness of limit points of the sequence P N . The proof of this result requires some preparation.
Let Q N η , η ∈ E N , be the probability measure on D([0, ∞), E N ) induced by the speeded-up process ξ N (t) := η N (tθ N ) starting from η. Keep in mind that the generator of this process is θ N L N . Denote by (F o t : t ≥ 0) the σ-algebra of subsets of D([0, ∞), E N ) generated by {η(s) : 0 ≤ s ≤ t}, where η(s) represents the coordinate process. Fix η ∈ E N and denote by {F η t : t ≥ 0} the usual augmentation of {F o t : t ≥ 0} with respect to Q N η . We refer to Section III.9 of [114] for a precise definition. The advantage of F η t with respect to F o t is that it is right-continuous: where the intersection is carried out over all q ∈ (0, ∞) ∩ Q. By definition of T E N , Hence, as the filtration is right-continuous, . Moreover, as the coordinate process corresponds to the distribution of ξ N (t), ξ E N (t) corresponds to the trace of the speeded-up process ξ N (t) on E N .
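The trace construction used throughout this section can be sketched in a few lines: time spent outside E N is excised from a simulated trajectory, and the visits to E N are glued together. The chain below (a complete graph with unit rates) and the helper names are illustrative choices of our own.

```python
import numpy as np

# Illustrative chain: complete graph on 4 states, all rates 1.
rng = np.random.default_rng(4)
n = 4
R = np.ones((n, n)); np.fill_diagonal(R, 0.0)
lam = R.sum(axis=1)              # holding rates

def trajectory(T, x0=0):
    """Simulate (state, holding time) pairs up to total time T."""
    t, x, path = 0.0, x0, []
    while t < T:
        tau = rng.exponential(1.0 / lam[x])
        path.append((x, min(tau, T - t)))
        t += tau
        x = rng.choice(n, p=R[x] / lam[x])
    return path

def trace(path, E):
    """Excise the time spent outside E; the visits to E are glued together."""
    return [(x, dt) for (x, dt) in path if x in E]

path = trajectory(10.0)
E = {0, 1}
tr = trace(path, E)
print(sum(dt for _, dt in tr))   # total time run by the trace process
```

Speeding up time by θ N and then taking the trace gives the same process as tracing first and speeding up afterwards, which is the commutation property discussed next.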
It is easy to check that we may commute the trace operation with the acceleration of the process: On the left-hand side, we first computed the trace of the chain η N (t) on E N and then accelerated it by θ N , while on the right-hand side we first speeded-up the chain η N (t) by θ N and then computed the trace of the result on E N . In particular, the process . We may now state the main result of this section.
Theorem 8.2. Fix k ∈ S and a sequence η N ∈ E k N . Assume that conditions (A1) and (A2) are in force for every function F : S → R. Then, every limit point P of the sequence P N such that for all t > 0 . (8.5) solves the (L, δ k ) martingale problem.
Proof. Fix a function F : S → R. Let f N : E N → R be the function given by assumption (A2). Then, is a martingale with respect to the filtration G N,η N t . Since g N vanishes on ∆ N , we may insert in the integral the indicator function of the set E N . Then, a change of variables yields that this integral is equal to Therefore, for all s ≥ 0, we may replace in the previous equation g N , f N by g, f , respectively, at a cost which vanishes as N → ∞. Therefore, is a martingale because G = LF . Since P N corresponds to the distribution of X T N , is a martingale under P N up to a small error. Let P be a limit point of the sequence P N satisfying (8.5), and assume, without loss of generality, that P N converges to P. By (8.5), the one-dimensional projections are continuous, and we may pass to the limit to obtain that M (t) is a martingale under P.
On the other hand, as η N ∈ E k N , P N [X(0) = k] = 1 for all N , so that P[X(0) = k] = 1. This proves that any limit point of the sequence P N satisfying (8.5) is a solution of the (L, δ k ) martingale problem.

Local ergodic theorem in L 2
It is not clear whether the scheme presented in the previous section can be applied to a large class of dynamics. The proof of condition (A2) is unclear even for the simple example of Section 2.
The method presented in Sections 4-6 also has a drawback. As the function f = k∈S F (k)χ E k N has a sharp interface, the jump rates R (k) N , which appear in the computation of L E N f , are singular functions, vanishing in the interior of the valleys and taking large values at the boundary. This lack of smoothness makes the proof of the local ergodic theorem more demanding.
Following [20], we propose below an alternative approach, in which we replace the indicator function χ E k N by "smooth" approximations obtained by solving the resolvent equation where L E N represents the generator of the trace process η E N (t), I the identity and γ N a suitable sequence of positive numbers. The resolvent equation (9.1) has a unique solution, denoted by u k N . Equation (9.12) provides a stochastic representation of the solution, different from the usual one given in terms of a time integral. This guarantees existence. Uniqueness can be proven as follows. Let u 1 , u 2 be two solutions, and set w = u 1 − u 2 . The function w solves (9.1) with a right-hand side equal to 0. Multiply both sides of the equation by w and integrate with respect to π E to get that
We prove in Lemmata 9.1, 9.2 that u k N is close to χ E k N and that the local ergodic theorem holds for L E N u k N if γ N is larger than the equilibration times in the valleys and smaller than the transition times between valleys.
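A toy version of this smoothing, assuming the resolvent equation takes the form (I − γ L)u = χ E k (our reading of (9.1), with the trace generator replaced by an illustrative generator): the matrix I − γ L is invertible because the spectrum of L is non-positive, which gives existence and uniqueness, and u interpolates between the sharp indicator (small γ) and its flat average (large γ).

```python
import numpy as np

# Illustrative generator: symmetric walk on a segment; chi is the
# indicator of a "valley" {0, 1, 2}.  (I - gamma L) is invertible since
# the spectrum of L is non-positive.
n = 6
R = np.zeros((n, n))
for i in range(n - 1):
    R[i, i + 1] = R[i + 1, i] = 1.0
L = R - np.diag(R.sum(axis=1))

chi = np.zeros(n); chi[:3] = 1.0
for gamma in (0.1, 10.0):
    u = np.linalg.solve(np.eye(n) - gamma * L, chi)
    print(gamma, np.round(u, 3))  # u interpolates between chi and its mean
```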

The enlarged process
We assume below that the reader is familiar with the results on enlarged and reflected chains summarized in Section 13.
We do not require below the process η N (t) to be reversible, but we impose certain conditions on the reflected processes. Denote by η R,k N (t) the process η N (t) reflected at E k N . Recall that this means that we forbid all jumps between E k N and its complement, and consider the resulting dynamics in E k N . Denote by π E k the stationary measure π N conditioned to E k N . We assume that for all k ∈ S the reflected process at E k N is irreducible and that π E k is a stationary state (and therefore the unique stationary state up to multiplicative constants). If the process is reversible, the second condition follows from the first one. By Lemma 13.7, this is also the case in the non-reversible setting if the valley E k N is formed by cycles. Denote by L R,k N the generator of the process η R,k N (t) and by t k,N rel the relaxation time of the symmetric part of the generator: where the infimum is carried over all zero-mean functions f : E k N → R.
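The relaxation time of a reflected chain can be computed numerically by symmetrizing the reflected generator in L 2 (π F ) and taking the inverse of its spectral gap. The reversible chain and the set F in the sketch below are illustrative choices of our own.

```python
import numpy as np

# Illustrative reversible chain from symmetric conductances; F = {0, 1, 2}.
n = 5
rng = np.random.default_rng(1)
S = np.triu(rng.random((n, n)), 1); S = S + S.T
pi = rng.random(n); pi /= pi.sum()
R = S / pi[:, None]; np.fill_diagonal(R, 0.0)

F = [0, 1, 2]
R_F = R[np.ix_(F, F)]                  # suppress jumps leaving F
L_F = R_F - np.diag(R_F.sum(axis=1))   # reflected generator
pi_F = pi[F] / pi[F].sum()             # conditioned measure

# symmetrise in L^2(pi_F): A = D^{1/2} L_F D^{-1/2}, D = diag(pi_F)
D = np.diag(np.sqrt(pi_F))
A = D @ L_F @ np.diag(1.0 / np.sqrt(pi_F))
ev = np.sort(np.linalg.eigvalsh((A + A.T) / 2))
gap = -ev[-2]                          # spectral gap of the reflected chain
print(1.0 / gap)                       # relaxation time
```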
Let E *,k N be copies of the sets E k N , k ∈ S, and set Denote by P : E N ∪ E * N → E N ∪ E * N the application which maps a configuration in E N , E * N , to its copy in E * N , E N , respectively. Fix a sequence γ N , and denote by ζ N (t) the γ N -enlargement of the trace process η E N (t). The process ζ N (t) is a Markov chain taking values in E N ∪ E * N and whose generator, denoted by L E N ,γ N , is given by In this formula, R T N (η, ξ) represents the jump rates of the trace process η E N (t). Hence, from a configuration η ∈ E * N the chain may only jump to P η, and this happens at rate 1/γ N . From a configuration η ∈ E N , besides the jumps of the original chain, the enlarged process may also jump to P η, and this happens at rate 1/γ N . The parameter γ N will be large, which makes the jumps between E N and E * N rare.
The stationary state of ζ N (t), denoted by π E , is given by where, recall, π E stands for the stationary state π N conditioned to E N . In dynamics in which the process jumps to a new valley before visiting all configurations in the valley, as configurations are not visited, it makes more sense to suppose that the dynamics starts from a distribution rather than from a configuration. Denote this initial distribution by ν N and assume that there exist ℓ ∈ S and a finite constant C 0 such that for all N ≥ 1 Note that the measure π E satisfies this condition. For two non-empty, disjoint subsets A, B of E N ∪ E * N , denote by cap (A, B) the capacity between A and B for the enlarged process. Consider two sequences (a N : N ≥ 1), (b N : N ≥ 1) of positive real numbers. We say that a N is much smaller than b N , a N ≺ b N , if lim N →∞ a N /b N = 0.
Lemma 9.1. Fix k ∈ S, two sequences of positive numbers γ N , θ N , and a sequence of probability measures ν N satisfying (9.2). Assume that γ N ≺ θ N and that there exists a finite constant C 0 such that for all N ≥ 1 Then, representing by u k N the solution of the resolvent equation (9.1),
Denote by ζ E N (t) the trace of the process ζ N (t) on E N , and by P *,γ N η , η ∈ E N ∪ E * N , the probability measure on D([0, ∞), E N ∪ E * N ) induced by the enlarged process starting from η. Let r N (j, k), j ≠ k ∈ S, be the coarse-grained jump rates at which the trace process ζ E N (t) jumps from E *,j N to E *,k N . By (5.5), these rates are given by where λ (η) represents the holding rates of ζ N (t). Since the enlarged process jumps from η ∈ E N to P η at rate 1/γ N , the previous expression is equal to According to Section 6, in the reversible case, the coarse-grained jump rates r N (j, k) can be expressed in terms of capacities, while in the non-reversible case they can be computed if there are good approximations of the equilibrium potential.
Assume, from now on, that these rates converge: There exist a time-scale θ N and jump rates r(j, k) such that lim N →∞ θ N r N (j, k) = r(j, k) for all j ≠ k ∈ S . (9.5) Condition (9.3) follows from this hypothesis since The sequence θ N represents the time-scale at which the process jumps between valleys. The proof of a metastable behavior is set up on the ground that this time-scale is much larger than the equilibration time inside the valleys. This hypothesis is formulated here by requiring the relaxation times of the processes reflected at the valleys to be much smaller than θ N : for all k ∈ S,
Lemma 9.2. Fix ℓ ∈ S, and a sequence of probability measures ν N satisfying (9.2).
Assume that for all j, k ∈ S, Let γ N be a sequence such that max j,k α j,k N ≺ γ N ≺ θ N , where α j,k N stands for the left-hand side of (9.8). Then, for all T > 0, k ∈ S, By (9.12) and a straightforward computation, where r N (j, k), j ≠ k, are the coarse-grained jump rates introduced in (9.4), and r N (j, j) = − k≠j r N (j, k). Thus, for every function F : S → R, Fix ℓ ∈ S and a sequence of probability measures satisfying conditions (9.2). Let P N be the probability measure on D([0, ∞), S) induced by the process X T N and the measure ν N . The next theorem is the main result of this section.
Theorem 9.3. Fix ℓ ∈ S and a sequence ν N of probability measures satisfying (9.2). Assume that conditions (9.5) and (9.8) are in force. Then, every limit point P of the sequence P N such that for all t > 0 . (9.10) solves the (L, δ ℓ ) martingale problem, where L is the generator of the S-valued Markov chain whose jump rates are r(j, k).
Theorem 9.3 describes the asymptotic evolution of the trace of the Markov chain η N (t) on E N . The next lemma shows that in the time scale θ N the time spent on the complement of E N is negligible. The proof is similar to the one of Lemma 7.3 and uses the Schwarz inequality and assumption (9.2) to replace ν N by π N .
for all j ∈ S. Fix ℓ ∈ S, and let {ν N : N ≥ 1} be a sequence of probability measures satisfying (9.2). Then, for every t > 0,
Remark 9.5. The introduction of the enlarged process is inspired by the definition of the soft hitting time of Bianchi and Gaudillière [24].
Remark 9.6. Hypothesis (9.8) can be divided in two: Assume (9.6), and suppose that there exist constants 0
Remark 9.7. Hypothesis (9.8) can be weakened as follows. Instead of fixing the same rate γ N for all valleys, we may choose a valley-dependent rate. This does not alter the stationary state, and it permits choosing larger parameters γ N for deeper valleys. Assumption (9.8) may also be weakened to admit one deep valley, all the other ones being shallow (cf. [20]).
Remark 9.8. In Subsection 15.5, we apply the method presented above to a polymer model examined by Caputo et al. in [37,35]. It can also be employed to derive the reduced model of the random walk presented in Section 2. We refer to [20]. Lacoin and Teixeira [83] followed this scheme to prove the metastable behavior of a polymer interface which interacts with an attractive substrate.
Proof of Theorem 9.3. Fix ℓ ∈ S and a sequence ν N of probability measures satisfying (9.2). Fix a function F : S → R and a limit point P of the sequence P N satisfying (9.10). Assume, without loss of generality, that P N converges to P. We claim that is a martingale under P, where L is the generator associated to the jump rates r(j, k) introduced in (9.5). Fix 0 ≤ s < t, q ≥ 1, 0 ≤ t 1 < · · · < t q ≤ s, and a bounded function g : S q → R. Let G = g(X(t 1 ), . . . , X(t q )), where X(s) represents the coordinate process of D([0, ∞), S). We shall prove that where E stands for the expectation with respect to P.
Fix a sequence γ N such that for all j, k ∈ S, which is possible in view of (9.8), and recall that we denote by u k N the solution of (9.1).
By the Markov property of the trace process η E (t), is a martingale. In particular, if Thus, by the penultimate equation and since X T By definition of H N and w k N , introduced just above (9.7), L E N H N = k F (k) w k N . Hence, by Lemma 9.2 and (9.9), where L N is the generator of an S-valued Markov chain given by At this point, the martingale has been expressed as a function of the process X T N (t). By definition of the measure P N , the previous expectation is equal to where, recall, X(t) represents the coordinate process in D([0, ∞), S) and E N the expectation with respect to P N . By assumption (9.5), (L N F )(k) converges to (LF )(k) for all k ∈ S. Therefore, as P N converges to P and in view of (9.10) [which guarantees that the finite-dimensional projections are continuous], passing to the limit, we get that This shows that (9.11) holds, and completes the proof of the theorem.

The resolvent equation
We examine in this subsection the asymptotic behavior of the solution of resolvent equation (9.1).
Fix γ N > 0 and consider the γ N -enlargement of the process η E N (t). Let h k N : E N ∪ E * N → [0, 1] be the equilibrium potential between the sets E *,k N and Ȇ *,k N : Since L E N ,γ N h k N = 0 on E N , we deduce that the restriction of h k N to E N solves the resolvent equation (9.1). Since the solution is unique, u k N = h k N on E N , and we have a simple stochastic representation of the solution of the resolvent equations.
Remark 9.9. The enlargement of the chain η E N (t) thus provides a stochastic representation of the resolvent equation (9.1).
Lemma 9.10. There exists a finite constant C 0 , independent of N , such that for all k ∈ S, Proof. Denote the left-hand side of the inequality by A N , and by B N the same expression with π E in place of π E . Since π E (η) = (1/2) π E (η), η ∈ E N , A N = 2B N . As u k N and h k N coincide on E N , we may replace the former by the latter. On the other hand, as where D N, (f ) represents the Dirichlet form of f with respect to the enlarged process ζ N (t).
By (14.6), . By assumption (9.3), the capacity is less than or equal to C 0 π E (E *,k N )/θ N for some finite constant C 0 . This proves the assertion because π E (E *,k
Proof of Lemma 9.1. Fix ℓ ∈ S, a sequence of probability measures ν N satisfying the hypotheses of the lemma and t > 0. Denote by S E (t), t ≥ 0, the semigroup associated to the trace process η E (t), and by f N t the Radon-Nikodym derivative dν N S E (t)/dπ E . By (13.8), Hence, by the Schwarz inequality, the square of the expectation appearing in the statement of the lemma is bounded above by By Lemma 9.10, the second term is bounded by C 0 γ N π E (E N )/θ N . Thus, by the assumption on the sequence of probability measures ν N , the previous displayed formula is bounded by C 0 γ N /θ N . This expression vanishes as N → ∞ by the hypothesis on γ N .

Local ergodicity
The proof of Lemma 9.2 is divided into several steps. Denote by · , · π E the scalar product in L 2 (π E ). For a zero-mean function f : E N → R, let f −1 be the H −1 norm of f associated to the generator L E N : where the supremum is carried over all functions h : E N → R. By [81, Lemma 2.4], for every function f : E N → R which has zero-mean with respect to π E , and every T > 0, Recall that we denote by π E k the stationary measure π N conditioned to E k N . Let L R,E k N be the generator of the reflected process η N (t) at E k N . For a function f : E k N → R which has zero-mean with respect to π E k , denote by f k,−1 the H −1 norm of f with respect to the generator L R,E k N : where the supremum is carried over all functions h : for any function h : E N → R. These expressions are not equal because two kinds of jumps appear on the right-hand side and do not appear on the left: The trace process may jump between valleys, and it may also perform a jump inside a valley (crossing the set ∆ N ) which is not possible in the original dynamics. It follows from the previous inequality and from the formulae for the H −1 norms that for every function f : E N → R which has zero-mean with respect to each measure (9.14)
Lemma 9.11. Let {ν N : N ≥ 1} be a sequence of probability measures on E N . Then, for every function f : E N → R which has zero-mean with respect to each measure π E j and for every T > 0,
Proof. By the Schwarz inequality, the expression on the left-hand side is bounded above by By (9.13) and by (9.14), the second expectation is bounded by as claimed.
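In the reversible case, the H −1 norm admits both a resolvent formula, f 2 −1 = ⟨f, (−L) −1 f⟩ π , and a variational characterization, sup h {2⟨f, h⟩ π − D(h)}. The sketch below (an illustrative reversible chain with our own notation) computes the resolvent value and checks numerically that the variational expression never exceeds it.

```python
import numpy as np

# Illustrative reversible chain; f is a mean-zero test function.
n = 6
rng = np.random.default_rng(5)
S = np.triu(rng.random((n, n)), 1); S = S + S.T
pi = rng.random(n); pi /= pi.sum()
R = S / pi[:, None]; np.fill_diagonal(R, 0.0)
L = R - np.diag(R.sum(axis=1))

f = rng.standard_normal(n); f -= pi @ f

# resolvent formula: solve -L u = f (solvable since pi @ f = 0) and take
# ||f||_{-1}^2 = <f, u>_pi; the additive constant in u is immaterial.
u, *_ = np.linalg.lstsq(-L, f, rcond=None)
norm2 = (pi * f) @ u

# variational formula: 2 <f, h>_pi - D(h) <= ||f||_{-1}^2 for every h
ok = True
for _ in range(200):
    h = rng.standard_normal(n)
    Dh = -(pi * h) @ (L @ h)           # Dirichlet form D(h)
    ok &= 2 * (pi * f) @ h - Dh <= norm2 + 1e-9
print(norm2, ok)
```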
Proof of Lemma 9.2. Fix ℓ ∈ S, and a sequence of probability measures ν N satisfying the hypotheses of the lemma. Fix k ∈ S. Since w k N − w k N has zero-mean with respect to each π E j , by the assumption on the sequence ν N and Lemma 9.11, the square of the expectation appearing in the statement of the lemma is bounded by Hence, by the spectral gap of the reflected process, Therefore, the sum appearing in (9.15) is bounded by By the hypotheses of the lemma, this expression vanishes as N ↑ ∞, which completes the proof.
Proof of Lemma 9.4. Fix ℓ ∈ S, and let ν N be a sequence of probability measures satisfying (9.2). By the Schwarz inequality, the square of the expectation appearing in the statement of the lemma is bounded above by By assumption (9.2), the first expectation is bounded by C 0 /π E (E N ). On the other hand, by the Schwarz inequality, the second expectation is less than or equal to The expression appearing in the penultimate displayed formula is thus bounded above by C 0 t 2 [ π N (∆ N )/π N (E N ) ], which concludes the proof of the lemma.

Tightness
In this section, we present sufficient conditions for the tightness of the sequence P N introduced in Theorems 4.2, 8.2 and 9.3. We need a slight generalization of Lemma 8.1. Recall the notation introduced just before this lemma. We proved there that for each t ≥ 0 and η ∈ E N , S E N (t) is a stopping time with respect to the filtration (F η t : t ≥ 0). Lemma 10.1. Let {G r : r ≥ 0} be the filtration given by G r = F η S E N (r) , and let τ be a stopping time with respect to {G r }. Then, S E N (τ ) is a stopping time with respect to {F η t }.
Proof. Fix a stopping time τ with respect to the filtration {G r }. This means that for We claim that {S E N (τ ) < t} ∈ F η t . Indeed, by (8.3), this event is equal to {T E N (t) > τ }, which can be written as By the penultimate displayed equation, each term belongs to F η t−(1/n) ⊂ F η t , which proves the claim.
We may conclude. Since where the intersection is carried out over all q ∈ (0, ∞) ∩ Q, and since the filtration {F η t } is right continuous, by the previous claim, Recall that ξ N (t) = η N (tθ N ), and the definition of the measure Q N η introduced just before Lemma 8.1. Expectation with respect to this measure is denoted by Q N η , as well. Note that η E N (tθ N ) = ξ E N (t).

Lemma 10.2. Suppose that for all
and that lim Then, the sequence of measures P N is tight. Moreover, every limit point P is such that P[ X(t) ≠ X(t−) ] = 0 for every t > 0.
Proof. Fix η ∈ E N . According to Aldous' criterion [26], we have to show that for every δ > 0, R > 0, where the supremum is carried over all stopping times τ bounded by R and all 0 ≤ a < a 0 . Since X T N (t) = Ψ N (ξ E N (t)), the previous probability can be written as

the expression in the previous displayed equation is bounded by
Fix b = 2a 0 so that b − a ≥ a 0 . Decompose this probability according to the event {S E N (τ + a) − S E N (τ ) > b} and its complement.
In other words, By Lemma 10.1, S E N (τ ) is a stopping time for the filtration {F η t }. Hence, by the strong Markov property and since ξ N (S E N (t)) belongs to E N for all t ≥ 0, By the Chebyshev inequality, a change of variables and by our choice of b, this expression is less than or equal to By assumption (10.1), this expression vanishes as N → ∞ for every a 0 > 0. We turn to the case {S E N (τ + a) − S E N (τ ) ≤ b}. On this set we have that Since S E N (τ ) is a stopping time for the filtration {F t } and since ξ N (S E N (t)) belongs to E N for all t, If ξ ∈ E j N , this latter event corresponds to the event {H(Ȇ j N ) ≤ b}. The maximum is thus bounded by By assumption (10.2), this expression vanishes as N → ∞ and then a 0 → 0. This completes the proof of the tightness. The same argument shows that for every t > 0, Hence, if P is a limit point of the sequence P N , lim a 0 →0 P[ X(t − a) ≠ X(t) for some 0 ≤ a ≤ a 0 ] = 0 .
This completes the proof of the second assertion of the lemma since
Conditions (10.1), (10.2) can be formulated in terms of capacities. The next result is Theorem 2.6 in [15] and Theorem 2.1 in [19]. Note that we do not require the process to be reversible.
Theorem 10.3. Let A ⊂ S be the set of absorbing points of the Markovian dynamics induced by the rates r(j, k). Assume that for all j ∈ A, t > 0, and that for all k ∈ S \ A, Then, conditions (10.1), (10.2) hold.
This result, which guarantees tightness, together with Theorems 4.2, 5.1 and Remark 5.3, which provide uniqueness, yield the convergence of the sequence X T N .
Theorem 10.4. Fix k ∈ S, a sequence η N ∈ E k N , and denote by P N the probability measure on D([0, ∞), S) induced by the process X T N (t) and the measure P N η N . Assume the hypotheses of Theorem 10.3. Then, the sequence P N converges to the solution of the (L, δ k ) martingale problem, where L is the generator of the S-valued Markov chain whose jump rates are r(j, k).

The last passage
We prove in this section that the last passage process, introduced in Definition 2.1, converges if conditions (T1), (T2) hold. In order to prove this statement, we first define a metric on the path space D([0, ∞), S ∪ {d}) which induces the Skorohod topology. Assume that 0 ∉ S, and identify the point d with 0 ∈ Z so that S ∪ {d} is a metric space with the metric induced by Z.
For each integer m ≥ 1, let Λ m denote the class of strictly increasing, continuous mappings of [0, m] onto itself. If λ ∈ Λ m , then λ 0 = 0 and λ m = m. In addition, consider the function
It follows from this result that the last-passage process X V N (t) converges whenever the trace process X T N (t) converges and (T2) is in force.

The finite-dimensional distributions
Recall the definition of the process X N (t) in (2.2), and the one of the reduced model X(t) introduced in Definition 2.1. The next result is Proposition 1.1 of [89]. With further mixing conditions, one can prove that the state of the process at time tθ N is a time-dependent convex combination of states supported in the valleys.
Denote by p t (j, k) the transition probabilities of the reduced model X(t), by π k N the measure π N conditioned to E k N , and by µ − ν TV the total variation distance between two probability measures µ and ν defined on E N . Let (S N (t) : t ≥ 0) be the semigroup associated to the Markov chain η N (t). Then, under mixing conditions specified in [89], for every j ∈ S and every sequence η N ∈ E j N , where δ η , η ∈ E N , stands for the Dirac measure concentrated on the configuration η.

Markov chains
We briefly present in this section some results on Markov chains used in the article. Fix a finite set E. Consider a continuous-time, E-valued, Markov chain (η(t) : t ≥ 0). Assume that the chain η(t) is irreducible and denote by π the unique stationary state.
Elements of E are represented by the letters η, ξ. Let P η , η ∈ E, be the probability measure on D([0, ∞), E) induced by the Markov chain η(t) starting from η. Recall from (2.1) the definition of the hitting time and the return time to a set.
Denote by L the generator of the Markov chain η(t), Let L 2 (π) be the set of square-summable functions f : E → R endowed with the scalar product · , · π given by Denote by L * the adjoint of the operator L in L 2 (π): For all functions f , g : E → R, An elementary computation yields that where the jump rates R * (η, ξ) satisfy The chain is said to be reversible if the generator L is self-adjoint: L * = L. It is reversible if and only if the jump rates satisfy the detailed balance conditions: The operator L * corresponds to the generator of a Markov chain, represented by η * (t), and called the adjoint or time-reversed process. The holding rates λ * (η) = ξ∈E R * (η, ξ) of this chain coincide with the original ones, λ * (η) = λ(η), and the jump probabilities p * (η ξ) satisfy the balance conditions Let L s be the symmetric part of the generator L: The operator L s is self-adjoint in L 2 (π) and it corresponds to the generator of the Markov chain whose jump rates, denoted by R s (η, ξ), are given by R s (η, ξ) = (1/2){R(η, ξ)+ R * (η, ξ)}. A simple computation shows that these rates satisfy the detailed balance conditions (13.2).
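The identities above, the equality of the holding rates λ* = λ and the detailed balance of the symmetric rates R s , can be checked numerically. The non-reversible chain below is an illustrative choice (its cycle products in the two directions differ, so it is genuinely non-reversible).

```python
import numpy as np

# Illustrative non-reversible chain.
R = np.array([[0., 3., 0., 1.],
              [1., 0., 2., 0.],
              [0., 1., 0., 3.],
              [2., 0., 1., 0.]])
L = R - np.diag(R.sum(axis=1))

w, V = np.linalg.eig(L.T)
pi = np.real(V[:, np.argmin(np.abs(w))]); pi /= pi.sum()

# adjoint rates: pi(eta) R*(eta, xi) = pi(xi) R(xi, eta)
R_star = (pi[None, :] * R.T) / pi[:, None]
R_s = 0.5 * (R + R_star)               # symmetric part

lam, lam_star = R.sum(axis=1), R_star.sum(axis=1)
flow = pi[:, None] * R_s
print(np.abs(lam - lam_star).max(), np.abs(flow - flow.T).max())
```

The first printed residual reflects the stationarity of π (inflow equals outflow at each state), the second the detailed balance of R s .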
Denote by D(f) the Dirichlet form of a function f : E → R:

D(f) = ⟨f, (−L)f⟩_π = ⟨f, (−L^s)f⟩_π . (13.5)

We leave to the reader the assignment of checking the last equality. An elementary computation shows that

D(f) = (1/2) Σ_{η,ξ∈E} π(η) R^s(η,ξ) [ f(ξ) − f(η) ]² . (13.6)

This formula holds even in the non-reversible case. In the sum, each unordered pair {η,ξ} ⊂ E, ξ ≠ η, appears twice. Denote by (S(t) : t ≥ 0) the semigroup associated to the generator L, so that (d/dt)S(t) = L S(t) = S(t) L. Fix a probability measure ν on E and let f_t be the Radon-Nikodym derivative of νS(t) with respect to π. We claim that

(d/dt) f_t = L* f_t . (13.7)

Indeed, fix a function g : E → R and consider the mean E_ν[g(η(t))], where E_ν represents the expectation with respect to the measure P_ν = Σ_{η∈E} ν(η) P_η. This expectation can be written as E_ν[g(η(t))] = ⟨f_t, g⟩_π. As (d/dt)S(t)g = S(t)Lg, taking the derivative on both sides of this identity we get that (d/dt)⟨f_t, g⟩_π = ⟨f_t, Lg⟩_π. The right-hand side can be written as ⟨f_t, Lg⟩_π = ⟨L*f_t, g⟩_π. Hence, for all functions g, (d/dt)⟨f_t, g⟩_π = ⟨L*f_t, g⟩_π, which proves claim (13.7).
By (13.7) and (13.1),

(d/dt) ⟨f_t, f_t⟩_π = 2 ⟨L*f_t, f_t⟩_π = 2 ⟨f_t, Lf_t⟩_π = −2 D(f_t) ≤ 0 .

The inequality follows from the positivity of the Dirichlet form derived in (13.6).
Integrating in time yields that ⟨f_t, f_t⟩_π − ⟨f_s, f_s⟩_π = −2 ∫_s^t D(f_r) dr. In particular, for all 0 ≤ s ≤ t,

⟨f_t, f_t⟩_π ≤ ⟨f_s, f_s⟩_π . (13.8)

The spectral gap of the generator, denoted by g, is the value of the smallest positive eigenvalue of the symmetric part of the generator:

g = inf_f D(f) / ⟨f, f⟩_π ,

where the infimum is carried over all functions f : E → R which are orthogonal to the constants, i.e., which have zero mean with respect to π: E_π[f] = ⟨f, 1⟩_π = 0.
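On a finite state space the spectral gap can be computed by diagonalizing the symmetric part. A minimal sketch (the three-state chain below is again an arbitrary example; conjugating L^s by √π produces a symmetric matrix with the same spectrum):

```python
import numpy as np

R = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
L = R - np.diag(R.sum(axis=1))

A = np.vstack([L.T[:2], np.ones(3)])            # pi L = 0, sum(pi) = 1
pi = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))

# Symmetric part L^s = (1/2)(L + L*), with R*(eta,xi) = pi(xi) R(xi,eta) / pi(eta).
Rstar = R.T * pi[None, :] / pi[:, None]
Rs = 0.5 * (R + Rstar)
Ls = Rs - np.diag(Rs.sum(axis=1))

# Conjugation by sqrt(pi) makes L^s a symmetric matrix (detailed balance for R^s).
d = np.sqrt(pi)
S = (d[:, None] * Ls) / d[None, :]
assert np.allclose(S, S.T)

eigs = np.sort(np.linalg.eigvalsh(-S))          # 0 = eigs[0] <= eigs[1] <= ...
gap = eigs[1]                                   # spectral gap g
assert abs(eigs[0]) < 1e-10 and gap > 0.0
```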

Reflected chain
Fix a non-empty, proper subset F of E. Denote by (η^{R,F}(t) : t ≥ 0) the Markov chain η(t) reflected at F. This is the F-valued process obtained from η(t) by forbidding all jumps between F and E \ F. The generator L^{R,F} of this Markov process is given by

(L^{R,F} f)(η) = Σ_{ξ∈F} R(η,ξ) [ f(ξ) − f(η) ] , η ∈ F .

Assume that the reflected process η^{R,F}(t) is irreducible. If the chain is reversible, it is easy to show that the conditioned probability measure π_F defined by

π_F(η) = π(η) / π(F) , η ∈ F , (13.9)

satisfies the detailed balance conditions (13.2) for the reflected process.
In general, π F may not be invariant. Consider, for example, an asymmetric random walk on the circle. The uniform measure is invariant, but its restriction to an interval I is not invariant for the process reflected at I. For cycle generators, however, it is possible to reflect the chain preserving the stationary state.
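The circle example can be checked in a few lines. The sketch below (NumPy; the drift p = 2, q = 1 and the interval F = {0, 1, 2} are arbitrary choices) verifies that the uniform measure is invariant for the asymmetric walk on Z_6 but that its restriction to F is not invariant for the reflected chain:

```python
import numpy as np

n, p, q = 6, 2.0, 1.0                    # asymmetric walk on the circle Z_6
R = np.zeros((n, n))
for x in range(n):
    R[x, (x + 1) % n] = p
    R[x, (x - 1) % n] = q
L = R - np.diag(R.sum(axis=1))

pi = np.full(n, 1.0 / n)                 # uniform measure
assert np.allclose(pi @ L, 0.0)          # invariant for the full chain

F = [0, 1, 2]                            # reflect at the interval F
RF = R[np.ix_(F, F)]
LF = RF - np.diag(RF.sum(axis=1))

piF = np.full(len(F), 1.0 / len(F))      # uniform measure conditioned to F
assert not np.allclose(piF @ LF, 0.0)    # NOT stationary for the reflected chain
```

The mass that used to flow around the circle now piles up at the right end of the interval, which is exactly the failure of stationarity the text describes.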

Cycle generators
The results of this subsection are taken from Section 4 of [95]. We refer to [93] for an application.

Cycle: A cycle is a sequence of distinct configurations (η_0, η_1, ..., η_{n−1}, η_n = η_0) whose initial and final configurations coincide. The number n is called the length of the cycle.

Cycle generator: A generator L is said to be a cycle generator associated to the cycle c = (η_0, η_1, ..., η_{n−1}, η_n = η_0) if there exist reals r_i > 0, 0 ≤ i < n, such that R(η,ξ) = r_i if η = η_i and ξ = η_{i+1} for some 0 ≤ i < n, and R(η,ξ) = 0 otherwise.
We denote this cycle generator by L_{c,r}, where r = (r_0, ..., r_{n−1}). Most of the time we omit the dependence on r and write L_{c,r} simply as L_c. Note that the associated chain is irreducible only if {η_0, η_1, ..., η_{n−1}} = E. Consider a cycle c = (η_0, η_1, ..., η_{n−1}, η_n = η_0) of length n ≥ 2 and let L_c be a cycle generator associated to c. Denote the jump rates of L_c by R(η_i, η_{i+1}) = r_i. A measure π is stationary for L_c if and only if

π(η_0) r_0 = π(η_1) r_1 = · · · = π(η_{n−1}) r_{n−1} . (13.10)

Sector condition: The next lemma asserts that every cycle generator satisfies a sector condition. The proof of this result can be found in [81, Lemma 5.5.8].
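Condition (13.10) is easy to test numerically: choosing r_i = 1/π(η_i) makes π(η_i) r_i constant along the cycle, so the resulting cycle generator is stationary for π while violating detailed balance. A small sketch (the measure π and the 4-cycle below are arbitrary choices):

```python
import numpy as np

pi = np.array([0.4, 0.3, 0.2, 0.1])        # target stationary measure
cycle = [0, 1, 2, 3]                       # cycle (eta_0, ..., eta_3, eta_0)
r = 1.0 / pi[cycle]                        # makes pi(eta_i) r_i constant: (13.10)

n = len(pi)
R = np.zeros((n, n))
for i, eta in enumerate(cycle):
    R[eta, cycle[(i + 1) % len(cycle)]] = r[i]
Lc = R - np.diag(R.sum(axis=1))

assert np.allclose(pi @ Lc, 0.0)           # (13.10) implies stationarity
# The dynamics is non-reversible: pi(eta) R(eta, xi) is not symmetric.
flux = pi[:, None] * R
assert not np.allclose(flux, flux.T)
```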
Lemma 13.1. Let L_c be a cycle generator associated to a cycle c of length n. Then, L_c satisfies a sector condition with constant 2n: for all f, g : E → R,

⟨L_c f, g⟩_π² ≤ 2n ⟨(−L_c)f, f⟩_π ⟨(−L_c)g, g⟩_π .

Cycle decomposition: The next lemma asserts that every generator L, stationary with respect to a probability measure π, can be decomposed as a sum of cycle generators which are stationary with respect to π.

Lemma 13.2. Every generator L, stationary with respect to a probability measure π, can be written as L = Σ_j L_{c_j}, where the L_{c_j} are cycle generators associated to cycles c_j which are stationary with respect to π.
As R_c(ζ, ζ′) ≤ R(ζ, ζ′), L_1 is the generator of a Markov chain. Since both L and L_c are stationary for π, so is L_1. Finally, if we draw an arrow from ζ to ζ′ whenever the jump rate from ζ to ζ′ is strictly positive, the number of arrows for the generator L_1 is equal to the number of arrows for the generator L minus 1 or 2. This procedure has therefore strictly decreased the number of arrows of L.
Once all k-cycles have been removed, 2 ≤ k < |E|, we have obtained a decomposition of L as L = Σ_k L_k + L̂, where L_k is the sum of k-cycle generators and is stationary with respect to π, and L̂ is a generator, stationary with respect to π, with no k-cycles, 2 ≤ k < |E|. If L̂ has an arrow, as it is stationary with respect to π and has no k-cycles, L̂ must be an |E|-cycle generator, providing the decomposition stated in the lemma.
Corollary 13.3. Every generator L, stationary with respect to a probability measure π, satisfies a sector condition with constant 2|E|.

Proof. Fix f and g : E → R. By Lemma 13.2, ⟨Lf, g⟩_π = Σ_j ⟨L_{c_j} f, g⟩_π, where L_{c_j} is a cycle generator, stationary with respect to π, associated to the cycle c_j. By Lemma 13.1 and by the Schwarz inequality, since all cycles have length at most |E|, the previous sum is bounded by [ 2|E| ⟨(−L)f, f⟩_π ⟨(−L)g, g⟩_π ]^{1/2}, as claimed.

Remark 13.4. A generator L is reversible with respect to π if and only if it has a decomposition in 2-cycles. Given a measure π on a finite state space, by introducing k-cycles satisfying (13.10) it is possible to define non-reversible dynamics which are stationary with respect to π. The previous lemma asserts that this is the only way to define such dynamics.

Remark 13.5. The cycle decomposition is not unique: a generator may admit two different decompositions into cycle generators, L = Σ_k L_{c_k, r_k} = Σ_k L_{ĉ_k, r̂_k}, in which π is a stationary state for all cycle generators. We leave the reader to find an example. However, in view of Lemma 13.1, it is natural to look for one which minimizes the length of the longest cycle.
Remark 13.6. In a finite set, the decomposition of a generator into cycle generators is very simple. The problem for countably-infinite sets is much more delicate. We refer to [65] for a discussion.
Let F be a proper subset of E and consider the chain reflected at F . The last result of this subsection provides sufficient conditions for the measure π conditioned to F to be a stationary state for the reflected process in the non-reversible case.
Lemma 13.7. Assume that the generator L can be written as a sum of cycle generators, L = Σ_{j=1}^p L_{c_j}, where c_1, ..., c_p are cycles and π is a stationary state for each L_{c_j}. Then, the measure π conditioned to F is stationary for the chain reflected at F if there exists a subset A of {1, ..., p} such that

L^{R,F} = Σ_{j∈A} L_{c_j} .

Proof. Since π is a stationary state for each L_{c_j}, it is also a stationary state for L^{R,F} = Σ_{j∈A} L_{c_j}. As the reflected process does not leave the set F, the measure π is stationary if and only if its restriction to F is stationary.

Enlarged chains
Let E′ be a copy of E. The elements of E′ are represented by the letters η′, ξ′. Denote by P : E ∪ E′ → E ∪ E′ the map which sends a configuration in E, resp. E′, to its copy in E′, resp. E.
Following [24], for γ > 0 denote by η^γ(t) the Markov process on E ∪ E′ whose jump rates R_γ(η,ξ) are defined as follows. At a state ξ′ in E′, the process may only jump to Pξ′, and this happens at rate 1/γ. In contrast, at a state ξ in E, the process η^γ(t) jumps with rate R(ξ,ζ) to a state ζ ∈ E, and jumps with rate 1/γ to Pξ. We call the process η^γ(t) the γ-enlargement of the process η(t).
The probability measure π is invariant for the enlarged process η^γ(t), and the enlarged process is reversible whenever the original chain is. Let F be a subset of E. Think of F as a valley. If γ is much larger than the mixing time, the distribution of η(H_{F′}), where F′ = {Pη : η ∈ F}, is very close to the stationary state conditioned to F.

Collapsed chains
Collapsing a chain consists in merging a subset of the state space into a single point and in defining a dynamics which keeps as many properties of the original evolution as possible. This is a well-known technique; see for instance [32,1].
Fix a subset A of E, and let E_A := [E \ A] ∪ {d}, where d stands for an extra configuration added to E and meant to represent the collapsed set A. Denote by (η^{C,A}(t) : t ≥ 0) the chain obtained from η(t) by collapsing the set A to the singleton {d}. This is the continuous-time Markov chain on E_A with jump rates R^{C,A}(η,ξ), η, ξ ∈ E_A, given by

R^{C,A}(η,ξ) = R(η,ξ) , R^{C,A}(η,d) = Σ_{ζ∈A} R(η,ζ) , R^{C,A}(d,ξ) = (1/π(A)) Σ_{ζ∈A} π(ζ) R(ζ,ξ) , (13.11)

for η, ξ ∈ E \ A. The collapsed chain {η^{C,A}(t) : t ≥ 0} inherits the irreducibility from the original chain. Denote by π^{C,A} the probability measure on E_A given by

π^{C,A}(d) = π(A) , π^{C,A}(η) = π(η) , η ∈ E \ A . (13.12)

Since Σ_{ξ∉A, ζ∈A} π(ξ) R(ξ,ζ) = Σ_{ξ∉A, ζ∈A} π(ζ) R(ζ,ξ), one checks that π^{C,A} is a stationary state, and therefore the unique invariant probability measure, for the collapsed chain η^{C,A}(t).
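The stationarity of π^{C,A} can be verified directly from the rates (13.11). A minimal sketch (the four-state chain and the collapsed set A = {2, 3} are arbitrary choices made for the illustration):

```python
import numpy as np

# Irreducible chain on E = {0, 1, 2, 3} (arbitrary rates).
R = np.array([[0.0, 2.0, 1.0, 0.0],
              [1.0, 0.0, 3.0, 1.0],
              [2.0, 1.0, 0.0, 2.0],
              [0.0, 2.0, 1.0, 0.0]])
L = R - np.diag(R.sum(axis=1))
pi = np.linalg.solve(np.vstack([L.T[:3], np.ones(4)]),
                     np.array([0.0, 0.0, 0.0, 1.0]))

Aset = [2, 3]                 # set collapsed to the point d
keep = [0, 1]                 # E \ A; the point d is indexed last
m = len(keep)
RC = np.zeros((m + 1, m + 1))
RC[:m, :m] = R[np.ix_(keep, keep)]                              # within E \ A
RC[:m, m] = R[np.ix_(keep, Aset)].sum(axis=1)                   # eta -> d
RC[m, :m] = pi[Aset] @ R[np.ix_(Aset, keep)] / pi[Aset].sum()   # d -> xi
LC = RC - np.diag(RC.sum(axis=1))

piC = np.append(pi[keep], pi[Aset].sum())    # measure (13.12)
assert np.allclose(piC @ LC, 0.0)            # pi^{C,A} is stationary
```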
The collapsed chain has to be understood as follows. Until the process hits the set A, it evolves as the original one. When it reaches this set, it immediately equilibrates: its position is redistributed according to the stationary distribution conditioned to A.
In particular, we may couple the collapsed process with the original one until the set A is reached, so that, for every η ∈ E \ A and B ⊂ E \ A,

P^{C,A}_η [ H_B < H_d ] = P_η [ H_B < H_A ] , (13.13)

where P^{C,A}_η represents the distribution of the collapsed chain η^{C,A}(t) starting from η. It follows from this identity and the explicit formulae for the jump rates and the stationary state that, for every B ⊂ E \ A,

cap(A, B) = cap^{C,A}(d, B) ,

where cap^{C,A}(d, B) represents the capacity between d and B for the collapsed chain.
This identity ceases to hold if we replace A by a set in E \ A because (13.13) is incorrect if d, A are replaced by a set D ⊂ E \ A.
Denote by L^{C,A} the generator of the chain η^{C,A}(t). Fix two functions f, g : E_A → R, and let F : E → R be given by

F(η) = f(η) , η ∈ E \ A ; F(η) = f(d) , η ∈ A ,

with a similar definition for G. We claim that

⟨L^{C,A} f, g⟩_{π^{C,A}} = ⟨LF, G⟩_π . (13.14)

Conversely, if F, G : E → R are two functions constant over A, (13.14) holds if we define f : E_A → R by f(η) = F(η), η ∈ E \ A, and f(d) equal to the constant value of F on A, with an analogous definition of g in terms of G. To prove (13.14), fix two functions f, g : E_A → R. By definition of L^{C,A},

⟨L^{C,A} f, g⟩_{π^{C,A}} = Σ_{η∈E_A} π^{C,A}(η) g(η) Σ_{ξ∈E_A} R^{C,A}(η,ξ) [ f(ξ) − f(η) ] .

In view of (13.11) and (13.12), this expression can be rewritten in terms of the original rates R and the measure π.
Since F(η) = f(η) for η ∈ E \ A, and F(ξ) = f(d) for ξ ∈ A, with similar identities for G, g in place of F, f, the last sum is equal to the same expression with f, g replaced by F, G. Since F is constant on A, we may add to this expression the sum over η ∈ A, ξ ∈ A, whose terms all vanish, to obtain that the last displayed expression is equal to ⟨LF, G⟩_π, which concludes the proof of the first assertion of (13.14). The second statement is obtained following the computation in the reverse order.
Potential theory

In this section, we present general results on the potential theory of continuous-time Markov chains used throughout the article. Reversible Markov chains can be interpreted in terms of electrical circuits. This description may provide some intuition on the notions introduced below, such as the Dirichlet form, the capacity or the equilibrium potential. We refer to the monographs of Doyle and Snell [52] and Gaudillière [66]. The analogy has been extended to the non-reversible context by Balázs and Folly [12].

The capacity
Fix two non-empty subsets A, B of E such that A ∩ B = ∅. The capacity between A and B, denoted by cap(A,B), is given by

cap(A,B) = Σ_{η∈A} π(η) λ(η) P_η[ H_B < H_A^+ ] . (14.1)

The capacity is monotone in the second coordinate: if B′ is a subset of E such that B ⊂ B′ and A ∩ B′ = ∅, then cap(A,B) ≤ cap(A,B′). By (13.3), for any sequence of configurations η_0, η_1, ..., η_n such that p(η_i, η_{i+1}) > 0, 0 ≤ i < n, the reversed jump probabilities are also positive: p*(η_n, η_{n−1}) · · · p*(η_1, η_0) > 0. In particular, for any η ∈ A, ξ ∈ B, it follows from (14.1) and irreducibility that cap(A,B) > 0.

A formula for the capacity
Recall the formula (13.6) for the Dirichlet form D(f) of a function f : E → R. Fix two disjoint subsets A, B of E: A ∩ B = ∅. Denote by h_{A,B} : E → R the equilibrium potential between A and B. It is the unique solution of the boundary-value elliptic problem

(Lh)(η) = 0 , η ∉ A ∪ B ; h(η) = 1 , η ∈ A ; h(η) = 0 , η ∈ B . (14.4)

It has a stochastic representation as

h_{A,B}(η) = P_η[ H_A < H_B ] . (14.5)

Since h_{A,B} is harmonic on (A ∪ B)^c, vanishes over B and is equal to 1 on A,

cap(A,B) = ⟨h_{A,B}, (−L) h_{A,B}⟩_π = D(h_{A,B}) . (14.6)

Indeed, by the representation (14.5) of the equilibrium potential and the strong Markov property at the first jump, for every η ∈ A,

− (L h_{A,B})(η) = λ(η) P_η[ H_B < H_A^+ ] ,

so that ⟨h_{A,B}, (−L)h_{A,B}⟩_π = − Σ_{η∈A} π(η) (L h_{A,B})(η) coincides with definition (14.1) of the capacity.
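On a finite state space the boundary-value problem (14.4) is a linear system, and the identity between the capacity and the Dirichlet form of the equilibrium potential can be checked directly. A sketch for a reversible birth-and-death chain (the measure π and the choice A = {0}, B = {4} are arbitrary):

```python
import numpy as np

# Reversible birth-and-death chain on {0,...,4} built from a target measure pi.
n = 5
pi = np.array([0.1, 0.3, 0.2, 0.25, 0.15])
R = np.zeros((n, n))
for x in range(n - 1):
    R[x, x + 1] = 1.0
    R[x + 1, x] = pi[x] / pi[x + 1]      # detailed balance (13.2)
L = R - np.diag(R.sum(axis=1))
assert np.allclose(pi @ L, 0.0)

A, B = [0], [4]
inner = [1, 2, 3]

# Equilibrium potential: Lh = 0 off A∪B, h = 1 on A, h = 0 on B.
h = np.zeros(n)
h[A] = 1.0
h[inner] = np.linalg.solve(L[np.ix_(inner, inner)],
                           -L[np.ix_(inner, A)].sum(axis=1))

# Capacity: cap(A,B) = -sum_{eta in A} pi(eta) (Lh)(eta), equal to D(h).
cap = -(pi[A] * (L @ h)[A]).sum()
D = 0.5 * np.sum(pi[:, None] * R * (h[None, :] - h[:, None]) ** 2)
assert np.isclose(cap, D) and cap > 0.0
```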

Flows
Denote by c(η,ξ) = π(η) R(η,ξ) the conductance of the oriented edge (η,ξ), and by c^s(η,ξ) its symmetric version:

c^s(η,ξ) = (1/2) [ c(η,ξ) + c(ξ,η) ] . (14.8)

Let E be the set of oriented edges (η,ξ) with c^s(η,ξ) > 0. An anti-symmetric function φ : E → R is called a flow. The divergence of a flow φ at η ∈ E is defined as

(div φ)(η) = Σ_ξ φ(η,ξ) ,

while its divergence on a set A ⊂ E is given by (div φ)(A) = Σ_{η∈A} (div φ)(η). The flow φ is said to be divergence-free at η if (div φ)(η) = 0. Denote by F the set of flows endowed with the scalar product given by

⟨φ, ψ⟩ = (1/2) Σ_{(η,ξ)} φ(η,ξ) ψ(η,ξ) / c^s(η,ξ) .

Remark 14.1. If the Markov chain is irreducible, the set of oriented edges turns E into a connected graph, which may be viewed as an electrical network. In this language, the stationary state corresponds to the non-negative function m : E → R_+ defined on the vertices which makes the anti-symmetric function ϕ_R(η,ξ) = m(η)R(η,ξ) − m(ξ)R(ξ,η) divergence-free at every vertex.

The Dirichlet and the Thomson principles
For a function f : E → R, denote by Φ_f, Φ*_f and Ψ_f the flows associated to f by the dynamics, the adjoint dynamics and the symmetrized dynamics, respectively:

Φ_f(η,ξ) = c(η,ξ) f(η) − c(ξ,η) f(ξ) , Φ*_f(η,ξ) = c(ξ,η) f(η) − c(η,ξ) f(ξ) , Ψ_f(η,ξ) = c^s(η,ξ) [ f(η) − f(ξ) ] . (14.9)

It follows from the definition of these flows that for all functions f, g : E → R,

⟨Ψ_f, Ψ_g⟩ = ⟨f, (−L^s) g⟩_π . (14.10)

Fix two disjoint subsets A, B of E and two real numbers a, b. Denote by C_{a,b}(A,B) the set of functions f : E → R which are equal to a on A and b on B. Let F_a(A,B) be the set of flows from A to B with strength a ∈ R: the flows φ which are divergence-free at every η ∉ A ∪ B and such that (div φ)(A) = a = −(div φ)(B). In particular, F_1(A,B) is the set of unitary flows from A to B.
Let h * A,B be the equilibrium potential corresponding to the adjoint dynamics. It is the solution of the elliptic problem (14.4) with the adjoint generator L * in place of L. It can be represented through the adjoint chain η * (t) by equation (14.5) with the obvious modifications.

Theorem 14.2 (Dirichlet principle). For any disjoint and non-empty subsets A, B of E,

cap(A,B) = inf_{f ∈ C_{1,0}(A,B)} inf_{φ ∈ F_0(A,B)} ‖ Φ_f − φ ‖² .

Furthermore, the unique optimizers of the variational problem are given by f = (1/2){h_{A,B} + h*_{A,B}} and the corresponding optimal flow.

Theorem 14.3 (Thomson principle), in turn, expresses the capacity as a supremum over unitary flows in F_1(A,B). Theorem 14.2 appeared in Gaudillière and Landim [67], and Theorem 14.3 is due to Slowik [118]. Similar Dirichlet and Thomson principles are available in the context of diffusion processes [90,85].

Remark 14.5. These variational formulae, expressed as infima or suprema, provide simple upper and lower bounds for the capacity. To obtain sharp bounds, good approximations of the harmonic functions are needed to produce test functions and test flows close to the optimal ones. In concrete examples, one of the difficulties is that the test flows constructed are never divergence-free, and a correction has to be introduced to remove the divergence of the test flow [91,93,116].

Reversible dynamics
In the reversible case, the conductance is symmetric: c(η,ξ) = c(ξ,η). In particular, all the flows Φ_f, Φ*_f, Ψ_f introduced in (14.9) coincide, and the optimal flow φ of Theorem 14.2 vanishes because the equilibrium potentials h*_{A,B}, h_{A,B} are equal. Hence, in the reversible case, we recover the Dirichlet principle for reversible dynamics:

cap(A,B) = inf_{f ∈ C_{1,0}(A,B)} ‖Ψ_f‖² = inf_{f ∈ C_{1,0}(A,B)} D(f) , (14.11)

where the last identity follows from (14.10). In the Thomson principle, the optimal function g vanishes, and we recover the Thomson principle for reversible dynamics:

cap(A,B) = sup_{φ ∈ F_1(A,B)} 1 / ‖φ‖² .

In the reversible case, the Thomson principle can also be expressed in terms of functions.
Lemma 14.6. We have that

1 / cap(A,B) = inf_f D(f) ,

where the infimum is carried over all functions f : E → R which are harmonic on (A ∪ B)^c and normalized so that Σ_{η∈A} π(η) (−Lf)(η) = 1.

Proof. Fix a function f : E → R such that (Lf)(η) = 0 for all η ∈ E \ (A ∪ B). By the Schwarz inequality and equation (13.6) for the Dirichlet form,

( (1/2) Σ_{η,ξ∈E} π(η) R(η,ξ) [h_{A,B}(ξ) − h_{A,B}(η)] [f(ξ) − f(η)] )² ≤ D(h_{A,B}) D(f) .

As the chain is reversible, the jump rates satisfy the detailed balance conditions (13.2). We may thus rewrite the sum appearing on the left-hand side as − Σ_η π(η) h_{A,B}(η) (Lf)(η). Since h_{A,B} = χ_A on A ∪ B and Lf = 0 on the complement, the previous sum is equal to − Σ_{η∈A} π(η) (Lf)(η).
We have thus proved that

( Σ_{η∈A} π(η) (−Lf)(η) )² ≤ cap(A,B) D(f) ,

with equality for f = h_{A,B}, which proves the lemma.

Remark 14.7. By inserting test functions, the previous lemma provides lower bounds for the capacity between two sets. In practical situations, however, it is almost impossible to find functions which are harmonic at every point of (A ∪ B)^c. But it might be possible to find functions which are almost harmonic, in the sense that Lf is small. The previous proof applied to an arbitrary test function yields, for every ε > 0, a similar lower bound with an extra error term controlled by the values of Lf on (A ∪ B)^c; here one uses Young's inequality 2ab ≥ −ε a² − ε^{−1} b² and the fact that the absolute value of the harmonic function is bounded by 1.

Dirichlet principle II
We provide in this subsection an alternative variational formula for the capacity in terms of functions only. Fix two disjoint subsets A, B of E. Let F_0(A,B)^⊥ be the set of flows in F which are orthogonal to all flows in F_0(A,B). By [97, Theorem 8.7], for every function f,

inf_{φ ∈ F_0(A,B)} ‖ Φ_f − φ ‖² = sup_{ψ ∈ F_0(A,B)^⊥} ⟨Φ_f, ψ⟩² / ‖ψ‖² , (14.12)

where the supremum is carried over all ψ ≠ 0. We may rewrite the right-hand side as a supremum over functions, which is more convenient.
Lemma 14.9. We have that F_0(A,B)^⊥ = {Ψ_f : f ∈ C(A,B)}, where C(A,B) = ∪_{a,b∈R} C_{a,b}(A,B) is the set of functions which are constant on A and constant on B.

Proof. Denote by A the set on the right-hand side. It is clear that every flow in A is orthogonal to F_0(A,B). Indeed, fix f ∈ C(A,B), equal to a on A and to b on B, and a flow φ ∈ F_0(A,B). Then,

⟨Ψ_f, φ⟩ = Σ_{η∈E} f(η) (div φ)(η) . (14.13)

As f is constant equal to a, b on A, B, respectively, and φ is divergence-free outside A ∪ B, this sum can be written as a (div φ)(A) + b (div φ)(B). Each of these sums vanishes because φ belongs to F_0(A,B).
Conversely, fix a flow φ orthogonal to all the flows Ψ_f, f ∈ C(A,B). In the first part of the proof, we showed that the scalar product ⟨Ψ_f, φ⟩ is equal to (14.13). Hence, for all a, b ∈ R and all f ∈ C_{a,b}(A,B), the sum (14.13) vanishes. Choosing f appropriately, we conclude that (div φ)(ξ) = 0 for all ξ ∉ A ∪ B, and that (div φ)(A) = (div φ)(B) = 0. This proves that φ belongs to F_0(A,B) and completes the proof of the lemma.
It follows from (14.12), the previous lemma and (14.10) that the infimum over flows appearing in the Dirichlet principle can be rewritten as a supremum over functions g in C(A,B); we replaced g by −g in the resulting expression to remove the minus sign in the first term. The previous argument allowed us to formulate in terms of functions a variational formula originally expressed through flows. Since, by (13.5), ⟨L^s g, g⟩_π = ⟨Lg, g⟩_π, in this formula we may replace L^s by L. This identity together with Theorem 14.2 provides a Dirichlet principle in terms of functions only. This is the content of the next result, Theorem 14.10. In contrast with the principle formulated in terms of flows, it involves an inf-sup instead of an inf-inf, which is simpler to estimate. Moreover, the optimal function is given by f = (1/2){h_{A,B} + h*_{A,B}}. Theorem 14.10 has been proved by Doyle [51] and, independently, by Gaudillière and Landim [67]. A version in the context of diffusions is due to Pinsky [111,112].

Remark 14.11. It is also possible to transform the variational problem inf_{g ∈ C_{0,0}(A,B)} into a supremum over flows satisfying certain identities. The resulting variational formula does not seem to be useful.

Sector condition
Recall from (13.4) that we denote by L^s the symmetric part of the operator L in L²(π): L^s = (1/2)(L + L*). This operator is self-adjoint in L²(π) and the corresponding Markov chain, denoted by η^s(t), is reversible. Moreover, for every function f : E → R,

⟨f, (−L^s) f⟩_π = ⟨f, (−L) f⟩_π .

Therefore, the Dirichlet form associated to the operator L^s, denoted by D^s(f) and defined by the left-hand side of the previous equation, coincides with the Dirichlet form of the original process.
In particular, if we represent by cap^s(A,B) the capacity between two disjoint, non-empty subsets A, B with respect to the chain η^s(t), by (14.11),

cap^s(A,B) = inf_{f ∈ C_{1,0}(A,B)} D(f) .

Hence, as h_{A,B} belongs to C_{1,0}(A,B), by (14.6) and the previous identity,

cap^s(A,B) ≤ D(h_{A,B}) = cap(A,B) . (14.14)

It turns out that a converse inequality holds if the generator satisfies a sector condition. Recall that a generator L satisfies a sector condition with constant C_0 if for all functions f, g : E → R,

⟨Lf, g⟩_π² ≤ C_0 ⟨(−L)f, f⟩_π ⟨(−L)g, g⟩_π .

The next result, Lemma 14.12, states that the capacity between two sets can be estimated by the symmetric capacity between these sets if the generator satisfies a sector condition: cap(A,B) ≤ C cap^s(A,B) for a constant C which depends only on C_0. Inequality (14.14) asserts that the height of a valley for a non-reversible dynamics is smaller than the one for its reversible version. Therefore, non-reversible dynamics mix faster than their reversible counterparts.
Remark 14.14. When the state space E is finite, the generator always satisfies a sector condition (cf. Corollary 13.3), but Lemma 14.12 holds in the context of countably-infinite state spaces and diffusions.

Recurrence
We assume in this section that the set E is countably infinite. A classical problem in the theory of Markov chains is to determine whether a chain is recurrent or not. Potential theory is a powerful tool in this framework.
Here is an open problem, for instance. Consider the random walk in random environment evolving on Z² as follows. For each line l(k) = {(x,k) : x ∈ Z} flip a fair coin. If it comes up heads, on this line the random walk may only jump to the right, while it may only jump to the left if it comes up tails. This is represented by drawing an arrow from (x,k) to (x+1,k) for each x ∈ Z if the coin shows heads, or from (x,k) to (x−1,k) if it shows tails. Do the same thing for each column to obtain a graph as in Figure 6.
As illustrated in Figure 6, each point (x,y) in Z² is the tail of two arrows. Denote by η(t) the random walk on Z² which waits a mean-one exponential time at each site of Z² and then jumps with equal probability along one of the two arrows. Is this random walk recurrent? In view of this example, consider a chain η(t) defined on a countably-infinite space E which is irreducible, and assume that there exists a stationary state, denoted by π. Note that π may not be summable, as in the example above. We assume, though, that π is explicitly known, because all estimates below involve π. This is clearly a strong hypothesis, and in many cases a stationary state is not known.
Recall that the Markov chain η(t) is recurrent if and only if there exists a configuration η ∈ E such that P_η[H_η^+ = ∞] = 0. There is nothing special about η: if this identity holds for some configuration, then, by irreducibility, it holds for every configuration. Let (B_n : n ≥ 1) be a sequence of finite subsets of E containing η and increasing to E, η ∈ B_n ⊂ B_{n+1}, ∪_n B_n = E. Then,

P_η[ H_η^+ = ∞ ] = lim_n P_η[ H_{B_n^c} < H_η^+ ] .

By definition (14.1) of the capacity, for any finite set B containing η,

P_η[ H_{B^c} < H_η^+ ] = cap({η}, B^c) / M(η) ,

where M(ξ) = π(ξ) λ(ξ), λ(ξ) being the holding rate at ξ. Hence, the Markov chain η(t) is recurrent if and only if there exist a configuration η ∈ E and a sequence of finite subsets B_n containing η and increasing to E such that

lim_n cap({η}, B_n^c) = 0 . (14.15)

The proof of the recurrence is thus reduced to the estimation of the capacity between a configuration and the complement of a finite set.
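Criterion (14.15) can be seen in action for the rate-one nearest-neighbor walk on Z: with respect to the counting measure, cap({0}, {−n, n}) equals 2/n (two chains of n unit resistors in parallel), which vanishes as n → ∞, so the one-dimensional walk is recurrent. A numerical sketch of this computation, solving the finite boundary-value problem for the equilibrium potential:

```python
import numpy as np

def cap_to_boundary(n):
    """cap({0}, {-n, n}) for the rate-one nearest-neighbor walk on Z,
    computed with respect to the counting measure (pi = 1)."""
    size = 2 * n + 1
    idx = lambda x: x + n
    L = np.zeros((size, size))
    for x in range(-n, n):
        L[idx(x), idx(x + 1)] += 1.0
        L[idx(x + 1), idx(x)] += 1.0
    L -= np.diag(L.sum(axis=1))
    # Lh = 0 on the interior minus {0}; h(0) = 1, h(-n) = h(n) = 0.
    inner = [idx(x) for x in range(-n + 1, n) if x != 0]
    h = np.zeros(size)
    h[idx(0)] = 1.0
    h[inner] = np.linalg.solve(L[np.ix_(inner, inner)],
                               -L[np.ix_(inner, [idx(0)])].sum(axis=1))
    return -(L @ h)[idx(0)]          # cap = -pi(0) (Lh)(0), with pi = 1

caps = {n: cap_to_boundary(n) for n in (2, 4, 8, 16)}
assert all(np.isclose(caps[n], 2.0 / n) for n in caps)   # cap -> 0: recurrence
```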
Of course, if condition (14.15) holds for some configuration η ∈ E and for some sequence of finite subsets B n containing η and increasing to E, it also holds for all configurations ξ ∈ E and for all sequences of finite subsets C n containing ξ and increasing to E.
The next two results, taken from [67], follow from the previous observation, the estimate (14.14) and Lemma 14.12. Recall from the previous subsection that η^s(t) stands for the reversible version of the process η(t), whose generator is the operator L^s introduced in (13.4). It follows from these results, cf. [67], that an irreducible Markov chain on a countable state space E which admits a stationary measure is recurrent if the Markov chain η^s(t) is recurrent and if the asymmetric part of the conductances is suitably controlled by the symmetric part, where the symmetric conductance c^s has been introduced in (14.8), and the asymmetric one is given by c_a(η,ξ) = (1/2) [ c(η,ξ) − c(ξ,η) ].
Benjamini and Hermon [74,21] used Theorem 14.15 to investigate the recurrence of non-backtracking random walks and to show that for every transient, nearest-neighbor Markov chain on a graph, the graph formed by the vertices it visited and edges it crossed is a.s. recurrent for simple random walk.

Examples
We present in this section some dynamics whose metastable behavior has been derived with the arguments presented in the article.

Random walks in a potential field
We describe the reversible version of the dynamics. The non-reversible one is obtained by replacing 2-cycles, in the terminology of Subsection 13.2, by k-cycles.
Let Ξ be an open and bounded subset of R^d, and denote by ∂Ξ its boundary, which is assumed to be a smooth manifold. Fix a twice continuously differentiable function F : Ξ ∪ ∂Ξ → R. We assume that the second partial derivatives of F are Lipschitz continuous; that all the eigenvalues of the Hessian of F at the critical points which are local minima are strictly positive; and that the Hessian of F at the critical points which are neither local minima nor local maxima has one strictly negative eigenvalue, all the other ones being strictly positive. In dimension 1, this assumption requires the second derivative of F at the local maxima to be strictly negative. Finally, we assume that for every x ∈ ∂Ξ, (∇F)(x) · n(x) > 0, where n(x) represents the exterior normal to the boundary of Ξ, and x · y the scalar product of x, y ∈ R^d. This hypothesis guarantees that F has no local minima at the boundary of Ξ.
Denote by Ξ_N the discretization of Ξ: Ξ_N = Ξ ∩ (1/N)Z^d. The elements of Ξ_N are represented by the symbols x = (x_1, ..., x_d), y and z. Let µ_N be the probability measure on Ξ_N proportional to e^{−N F(x)}, and let (η_N(t) : t ≥ 0) be the continuous-time Markov chain on Ξ_N which jumps from x to y only if ‖y − x‖ = 1/N, where ‖·‖ represents the Euclidean norm of R^d. The jump rates were chosen for the measure µ_N to be reversible for the dynamics. We restrict our attention here to the evolution among the shallowest valleys; one can infer from this discussion the general case, which can be found in [91]. Denote by M the set of local minima and by S the set of saddle points of F in Ξ. Let S_1 be the set of the lowest saddle points. We represent by z_1, ..., z_n the elements of S_1, S_1 = {z_1, ..., z_n}. Denote by H the common height of the saddle points in S_1: H = F(z_i), 1 ≤ i ≤ n. Let Ω be the level set Ω = {x ∈ Ξ : F(x) ≤ H}. The set Ω can be written as a disjoint union of connected components, where Ω_j ∩ Ω_k = ∅, j ≠ k, and where each set Ω_j is connected. Some connected components may not contain any saddle point in S_1, and some may contain more than one saddle point. Denote by Ω_j, 1 ≤ j ≤ m, the connected components which contain a point in S_1.
Each component Ω j is a union of valleys, Ω j = W j,1 ∪ · · · ∪ W j,mj . The sets W j,a are defined as follows. LetΩ j be the interior of Ω j . Each set W j,a is the closure of a connected component ofΩ j . The intersection of two valleys is a subset of the set of saddle points: W j,a ∩ W j,b ⊂ S 1 . Figure 7 illustrates the valleys of two connected components.
Each valley W a contains exactly one local minimum of F , denoted by m a . Let h a = F (m a ).
Let θ̂_a = H − h_a > 0, a ∈ S, be the depth of the valley W_a. The depths θ̂_a provide the time-scales at which a metastable behavior is observed. Let θ_1 < θ_2 < · · · < θ_p, p ≤ |S|, be the increasing enumeration of the values of the sequence θ̂_a, a ∈ S. The chain exhibits a metastable behavior on p different time-scales in the set Ω. Let T_q = {a ∈ S : θ̂_a = θ_q}, 1 ≤ q ≤ p, so that T_1, ..., T_p forms a partition of S, and let S_q = T_q ∪ · · · ∪ T_p, 1 ≤ q ≤ p.
Define the projection Ψ^q_N by Ψ^q_N(x) = a if x belongs to E^a_N for some a ∈ S_q, and Ψ^q_N(x) = 0 otherwise. Note that Ψ^q_N(x) = 0 for all points x which do not belong to ∪_{a∈S_q} E^a_N. Denote by X^q_N(t) the projection of the Markov chain η_N(t) by Ψ^q_N: X^q_N(t) = Ψ^q_N(η_N(t)). The theory presented in Sections 4-6 yields the existence, for each 1 ≤ q ≤ p, of a time-scale β^q_N and a S_q-valued Markov chain X^q(t) with the following property. For each a ∈ S_q and sequence of configurations x_N in E^a_N, starting from x_N, the finite-dimensional distributions of the rescaled process X̄^q_N(t) = X^q_N(t β^q_N) converge to those of X^q(t). The time-scales β^q_N can be explicitly computed and are related to the capacities between valleys.
We refer to [91,92,93,90] for more details. This model is at the origin of the study of metastability from a dynamical point of view. The first results can be traced back at least to Hood [76], van't Hoff [75], Arrhenius [7], Eyring [58] and Kramers [82]. We refer to the recent books by Olivieri and Vares [110] and Bovier and den Hollander [31] and to the review by Berglund [23] for references and alternative derivations of these results.

Spin dynamics
Since the seminal paper by Cassandro, Galves, Olivieri and Vares [38], which introduced the pathwise approach to metastability, the metastable behavior of many spin dynamics have been derived in different ways. We do not review here the main results, but just illustrate the theory developed in the previous sections with one example. We again refer the reader to [110,31] for a complete list of references on the subject.
The Blume-Capel model was introduced in [27,36] to study the 3 He -4 He phase transition. One can think as a system of particles with spins. The value σ(x) = 0 corresponds to the absence of particles, while σ(x) = ±1 to the presence of a particle with spin equal to ±1.
Fix a magnetic field h ∈ R and a chemical potential λ ∈ R, and denote by H : Ω_L → R the Hamiltonian given by

H(σ) = Σ_{{x,y}} [ σ(y) − σ(x) ]² − λ Σ_{x∈Λ_L} σ(x)² − h Σ_{x∈Λ_L} σ(x) ,

where the first sum is carried over all unordered pairs of nearest-neighbor sites of Λ_L. Denote by µ_β the Gibbs measure associated to the Hamiltonian H at inverse temperature β. This is the probability measure on Ω_L given by

µ_β(σ) = (1/Z_β) e^{−β H(σ)} , (15.2)

where Z_β is the partition function, the normalization constant which turns µ_β into a probability measure. We refer to [47] for a description of the ground states, the configurations which minimize the Hamiltonian H, according to the values of the parameters h and λ. In all cases, the ground states form a subset of the set {−1, 0, +1}, where −1, 0, +1 represent the configurations of Ω_L with all spins equal to −1, 0, +1, respectively.
The continuous-time Metropolis dynamics at inverse temperature β is the Markov chain on Ω_L, denoted by {σ_t : t ≥ 0}, whose infinitesimal generator L_β acts on functions f : Ω_L → R as

(L_β f)(σ) = Σ_{x∈Λ_L} Σ_{±} R_β(σ, σ^{x,±}) [ f(σ^{x,±}) − f(σ) ] .

In this formula, σ^{x,±} represents the configuration obtained from σ by modifying the spin at x, σ^{x,±}(x) = σ(x) ± 1, the sum being taken modulo 3 with values in {−1, 0, +1}, and σ^{x,±}(y) = σ(y) for y ≠ x. The jump rates R_β are given by

R_β(σ, σ^{x,±}) = exp{ −β [ H(σ^{x,±}) − H(σ) ]_+ } ,

where a_+, a ∈ R, stands for the positive part of a: a_+ = max{a, 0}. The Gibbs measure µ_β introduced in (15.2) satisfies the detailed balance conditions (13.2) and is therefore reversible for the dynamics.
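Reversibility of the Metropolis rates follows from the identity e^{−βH(σ)} e^{−β[H(σ′)−H(σ)]_+} = e^{−β max{H(σ),H(σ′)}}, which is symmetric in σ, σ′. The brute-force check below is a sketch (pure Python): the 2×2 torus, the parameter values and the precise form of the Hamiltonian are assumptions made for the illustration, and on this tiny torus each nearest-neighbor edge is counted twice, which is harmless for the check.

```python
import itertools, math

Lside, lam, h, beta = 2, 0.0, 0.5, 1.0

def H(sigma):
    # Assumed Blume-Capel Hamiltonian: sum over n.n. pairs of (s_x - s_y)^2
    # - lam * sum s_x^2 - h * sum s_x (edges counted twice on the 2x2 torus).
    e = 0.0
    for x in range(Lside):
        for y in range(Lside):
            s = sigma[x][y]
            e += (s - sigma[(x + 1) % Lside][y]) ** 2
            e += (s - sigma[x][(y + 1) % Lside]) ** 2
            e += -lam * s * s - h * s
    return e

def rate(s_from, s_to):
    # Metropolis rate: exp(-beta [H(to) - H(from)]_+)
    return math.exp(-beta * max(H(s_to) - H(s_from), 0.0))

configs = [((c[0], c[1]), (c[2], c[3]))
           for c in itertools.product((-1, 0, 1), repeat=4)]
checked = 0
for s1 in configs:
    for s2 in configs:
        ndiff = sum(s1[i][j] != s2[i][j] for i in range(2) for j in range(2))
        if ndiff == 1:                     # single-site updates only
            lhs = math.exp(-beta * H(s1)) * rate(s1, s2)
            rhs = math.exp(-beta * H(s2)) * rate(s2, s1)
            assert abs(lhs - rhs) < 1e-12  # detailed balance (13.2)
            checked += 1
assert checked > 0
```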
Assume from now on that the chemical potential vanishes, λ = 0, and that the magnetic field h is small and positive, 0 < h < 2. In this situation, the configurations −1, 0 are local minima of the Hamiltonian, while the configuration +1 is a global minimum. Moreover, H(0) < H(-1).
Assume that 2/h is not an integer, and let n_0 = ⌊2/h⌋, where ⌊a⌋ stands for the integer part of a ∈ R_+. Denote by R_c the set of configurations with n_0(n_0 + 1) + 1 0-spins forming, in a background of −1-spins, an n_0 × (n_0 + 1) rectangle with an extra 0-spin attached to the longest side of this rectangle. This means that the extra 0-spin is surrounded by three −1-spins and one 0-spin which belongs to the longest side of the rectangle.
It is proved in [87,88] that, as the temperature vanishes, starting from −1 the process visits the set R_c before hitting 0 or +1. The set R_c represents the energetic barrier which has to be surmounted to pass from −1 to {0, +1}. Fix ξ ∈ R_c and let θ_β be the time-scale given by an Arrhenius-type formula, θ_β = C e^{β [H(ξ) − H(−1)]} [1 + o_β(1)] for an explicit constant C > 0, where o_β(1) is a remainder which vanishes as β → ∞.
The metastable behavior of this model has been explored by Cirillo and Olivieri [47], Manzo and Olivieri [100], and more recently by Cirillo and Nardi [44], and Cirillo, Nardi and Spitoni [46]. The mean-field Potts model is another spin dynamics in which the spins may take more than two values. It has been examined recently in [92] and by Nardi and Zocca in [103].

Zero range processes
Denote by N the set of non-negative integers, N = {0, 1, 2, ...}, by T_L, L ≥ 1, the discrete, one-dimensional torus with L points, and by η the elements of N^{T_L}, called configurations. The total number of particles at x ∈ T_L for a configuration η ∈ N^{T_L} is represented by η_x. Let E_N, N ≥ 1, be the set of configurations with N particles: E_N = {η : Σ_{x∈T_L} η_x = N}. Fix α > 1, and define g : N → R_+ as g(0) = 0, g(1) = 1 and g(n) = a(n)/a(n−1), n ≥ 2, where a(0) = 1, a(n) = n^α, n ≥ 1. In this way, Π_{i=1}^n g(i) = a(n), n ≥ 1, and {g(n) : n ≥ 2} is a strictly decreasing sequence converging to 1 as n ↑ ∞.
Fix 1/2 ≤ p ≤ 1, and denote by p(x) the transition probability given by p(1) = p, p(−1) = 1 − p, p(x) = 0 otherwise. Let σ^{x,y}η be the configuration obtained from η by moving a particle from x to y:

(σ^{x,y}η)_x = η_x − 1 , (σ^{x,y}η)_y = η_y + 1 , (σ^{x,y}η)_z = η_z , z ≠ x, y .

The nearest-neighbor zero-range process associated to the jump rates {g(k) : k ≥ 0} and the transition probability p(x) is the continuous-time, E_N-valued Markov process {η^N(t) : t ≥ 0} whose generator L_N acts on functions f : E_N → R as

(L_N f)(η) = Σ_{x,y∈T_L} g(η_x) p(y − x) [ f(σ^{x,y}η) − f(η) ] .

Hence, if there are k particles at site x, at rate p g(k), resp. (1 − p) g(k), one of them jumps to the right, resp. to the left. Since g(k) decreases to 1 as k → ∞, the more particles there are at some site x, the slower they jump, but the rate remains bounded below by 1.
This Markov process is irreducible. The stationary probability measure, denoted by π_N, is given by π_N(η) = (1/Z_N) Π_{x∈T_L} 1/a(η_x), where Z_N is the normalizing constant. Fix a sequence {ℓ_N : N ≥ 1} such that 1 ≪ ℓ_N ≪ N, and let E^x_N, x ∈ T_L, be the set of configurations in which all but ℓ_N particles sit at x: E^x_N = {η ∈ E_N : η_x ≥ N − ℓ_N}. According to equation (3.2) in [15], for each x ∈ T_L, π_N(E^x_N) → 1/L as N ↑ ∞. Denote by η^{ℰ_N}(t) the trace of the process η_N(t) on ℰ_N = ∪_x E^x_N, and let Ψ_N : ℰ_N → T_L be given by Ψ_N(η) = x if η ∈ E^x_N. Under some further conditions on the sequence ℓ_N, it can be proven, following the method presented in Sections 4–6, that the time-rescaled coarse-grained process X_N(t) = Ψ_N(η^{ℰ_N}(tN^{1+α})) converges to a T_L-valued Markov chain X(t). The jump rates of the reduced model X(t) are proportional to the capacities of the random walk on the discrete torus with L points which jumps to the right with probability p and to the left with probability 1 − p. Moreover, in the time scale N^{1+α}, the time spent by the process η_N(t) on ∆_N = E_N \ ℰ_N is negligible.
This model has been introduced by Evans [56]; Godrèche examined the dynamics of the condensate in [68]. Its metastable behavior has been derived in [16,84,116]. The reduced model is a T_L-valued Markov chain whose jump rates are proportional to the capacities of the underlying random walk associated to p(·).
The nucleation phase of this model has been described in [14]. Armendáriz, Grosskinsky and Loulakis [4] considered the case in which the total number of sites increases with the number of particles, keeping a constant density. In this situation, the reduced model is a Lévy process.
In some dynamics, the condensate is formed instantaneously as the size of the system grows; see Waclaw and Evans [122], and Chau, Connaughton and Grosskinsky [39].

Random walks among random traps
Let (G N : N ≥ 1), G N = (V N , E N ), be a sequence of possibly random, finite, connected graphs defined on a probability space (Ω, F, P), where V N represents the set of vertices and E N the set of unoriented edges. Assume that the number of vertices, |V N |, converges to +∞ in P-probability. To fix ideas, one can consider the d-dimensional discrete torus with N d points.
Assume that on the same probability space (Ω, F, P) we are given a collection {W^N_j : j ≥ 1}, N ≥ 1, of i.i.d. random variables, independent of the random graph G_N, whose common distribution belongs to the basin of attraction of an α-stable law, 0 < α < 1. Hence, for all N ≥ 1 and j ≥ 1, P[W^N_j > t] = t^{−α} L(t), where L is a slowly varying function at infinity. For each N ≥ 1, re-enumerate in decreasing order the weights W^N_1, ..., W^N_{|V_N|}: Ŵ^N_j = W^N_{σ(j)}, 1 ≤ j ≤ |V_N|, for some permutation σ of the set {1, ..., |V_N|}, with Ŵ^N_j ≥ Ŵ^N_{j+1} for 1 ≤ j < |V_N|. Let (x^N_1, ..., x^N_{|V_N|}) be a random enumeration of the vertices of G_N and define W^N_{x^N_j} = Ŵ^N_j, 1 ≤ j ≤ |V_N|, turning G_N = (V_N, E_N, W_N) into a finite, connected, vertex-weighted graph.
Consider, for each N ≥ 1, a continuous-time random walk {η_N(t) : t ≥ 0} on V_N which waits a mean-W^N_x exponential time at site x, after which it jumps to one of its neighbors with uniform probability. The generator L_N of this walk is given by (L_N f)(x) = (1/W^N_x) (1/deg(x)) Σ_{y∼x} [f(y) − f(x)] for every f : V_N → R, where y ∼ x means that {x, y} belongs to the set of edges E_N, and deg(x) stands for the degree of x: deg(x) = #{y ∈ V_N : y ∼ x}.
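For concreteness, the two ingredients of the walk, heavy-tailed weights and a mean-W_x exponential holding time followed by a uniform jump, can be sketched in Python. This is an illustrative sketch; the function names are ours.

```python
import random

def pareto_weight(alpha):
    """Sample a weight with tail P[W > t] = t**(-alpha), t >= 1, a law
    in the basin of attraction of an alpha-stable distribution."""
    return (1.0 - random.random()) ** (-1.0 / alpha)

def trap_walk_step(x, weights, neighbors):
    """One move of the walk: wait a mean weights[x] exponential time at
    x, then jump to a uniformly chosen neighbor of x."""
    wait = random.expovariate(1.0 / weights[x])
    return random.choice(neighbors[x]), wait
```

On the d-dimensional discrete torus, `neighbors[x]` would be the list of the 2d nearest neighbors of x.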
Let Ψ_N : V_N → {1, ..., |V_N|} be given by Ψ_N(x^N_j) = j. It has been proved for a class of random graphs that there exists a time-scale θ_N for which the time-rescaled process X_N(t) = Ψ_N(η_N(tθ_N)) converges to a K-process.
To describe the dynamics of the K-process, consider two sequences of positive real numbers u = (u_k : k ≥ 1) and Z = (Z_k : k ≥ 1) satisfying suitable summability conditions. Consider the set N* = {1, 2, ...} ∪ {∞} of positive integers with an extra point denoted by ∞. We endow this set with the metric induced by the isometry φ : N* → R which sends n ∈ N* to 1/n and ∞ to 0. This makes N* into a compact metric space.
The K-process with parameter (Z k , u k ) can be informally described as follows. Being at k ∈ N, the process waits a mean Z k exponential time, at the end of which it jumps to ∞. Immediately after jumping to ∞, the process returns to N. The hitting time of any finite subset A of N is almost surely finite. Moreover, for each fixed n ≥ 1, the probability that the process hits the set {1, . . . , n} at the point k is equal to u k / 1≤j≤n u j . In particular, the trace of the K-process on the set {1, . . . , n} is the Markov process which waits at k a mean Z k exponential time at the end of which it jumps to j with probability u j / 1≤i≤n u i .
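The trace dynamics on a finite set {1, ..., n} described above admits a direct simulation sketch. This is illustrative Python with 0-indexed states; the function name is ours.

```python
import random

def k_trace_step(k, Z, u):
    """One step of the trace of the K-process on {0, ..., n-1}
    (0-indexed): wait a mean Z[k] exponential time at k, then jump
    to j with probability u[j] / sum(u)."""
    wait = random.expovariate(1.0 / Z[k])
    tot = sum(u)
    r, acc = random.random() * tot, 0.0
    for j, uj in enumerate(u):
        acc += uj
        if r <= acc:
            return j, wait
    return len(u) - 1, wait
```

The excursion through ∞ is invisible in the trace: it is absorbed into the jump distribution u_j / Σ_i u_i, exactly as in the description above.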
In contrast with the theory presented in the previous sections, here the reduced model takes values in a countably infinite space. Moreover, as Ψ_N is a bijection, the process X_N(t) is Markovian: there is no need to remove a piece of the state space by considering a trace, and one proves directly the convergence of the projected process to the reduced model.
The K-process has been introduced by Fontes and Mathieu [63], who also proved the convergence to the K-process of the trap model on the complete graph. Fontes and Lima [62] considered the case of the hypercube. These results have been extended to the d-dimensional torus, d ≥ 2, and to random graphs in [77,78]. More recently, Cortines, Gold and Louidor considered a continuous-time random walk on the two-dimensional discrete torus whose motion is governed by the discrete Gaussian free field [49].

A polymer in the depinned phase
Fix N ≥ 1 and denote by E_N the set of all lattice paths starting at 0 and ending at 0 after 2N steps: E_N = {η : {0, 1, ..., 2N} → Z : η_0 = η_{2N} = 0, |η_{j+1} − η_j| = 1 for 0 ≤ j < 2N}. Fix 0 < α < 1 and denote by η_N(t) the E_N-valued Markov chain whose generator L_N acts through the elementary corner flips η → η^{j,±}, where η^{j,±} represents the configuration which is equal to η at every site k ≠ j and which is equal to η_j ± 2 at site j.
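The elementary move η → η^{j,±} can be made concrete with a short Python sketch. Paths are stored as integer lists; the function name is ours, and the α-dependent jump rates of the chain are omitted.

```python
def corner_flip(eta, j, sign):
    """Return eta^{j,+/-}: the path equal to eta at every k != j and to
    eta[j] +/- 2 at j; None if the result is no longer a lattice path.
    The index j must be interior, since the endpoints are pinned at 0."""
    new = list(eta)
    new[j] += 2 * sign
    # the flipped path must keep nearest-neighbour increments
    if abs(new[j] - new[j - 1]) != 1 or abs(new[j + 1] - new[j]) != 1:
        return None
    return new
```

Only local minima can be flipped up and local maxima flipped down, which is why the moves are called corner flips.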
Denote by g_N the spectral gap of the chain. The exact asymptotic behavior of g_N is not known but, by [37, Theorem 3.5], g_N ≤ C(α)(log N)^8/N^{5/2} for some finite constant C(α).
Fix a sequence ℓ_N such that 1 ≪ ℓ_N ≪ N, and let E^1_N, E^2_N be the two valleys of the dynamics, formed essentially by the nonnegative and the nonpositive paths. By equation (2.27) in [35], π_N(E^1_N) = π_N(E^2_N) = (1/2) + o(1) for all N large enough. This shows that the chain equilibrates inside each valley in a much shorter time-scale than the one in which it jumps between valleys. Let ν_N be a sequence of probability measures concentrated on E^1_N and which fulfills conditions (9.2). Set θ_N = 1/g_N. The method presented in Section 9 yields that the time-rescaled coarse-grained process X^T_N(t) = X^T_N(tθ_N), introduced in condition (T1) of Definition 2.2, converges to the {1, 2}-valued Markov chain which starts from 1 and jumps from m to 3 − m at rate 1/2. Moreover, in the time scale θ_N, the time spent by the process η_N(t) outside the set E^1_N ∪ E^2_N is negligible. We refer the reader to [20] for the proofs.
The interest of this model is that the entropy plays an important role. In contrast with the models presented in the previous subsections, the metastable behavior is not determined by an energy landscape, but by a repulsion in a bottleneck region of the space. In particular, in the terminology introduced in Remark 5.5, this dynamics does not visit points and the method presented in Sections 4-6 does not apply.
Note that the metastable behavior has been derived without a precise knowledge of the time-scale at which it occurs. Of course, the jumps between valleys take place in the time-scale θ N , the inverse of the spectral gap, but the exact asymptotic behavior of g N is not known, and not needed in the proof of the metastable behavior of the dynamics.
This model has been introduced in [37,35]. The results described in this subsection are taken from [20].

Coalescing random walks
In contrast with the previous dynamics, in this example the reduced model takes values in a countably infinite state space. Let S = {1, 1/2, 1/3, ...} ∪ {0}, and let C^1(S) be the set of functions f : S → R of class C^1, that is, f ∈ C^1(S) is the restriction to S of a continuously differentiable function defined on R. For each f ∈ C^1(S), define Lf : S → R as (Lf)(1/n) = (n(n−1)/2) [f(1/(n−1)) − f(1/n)] for n ≥ 2, (Lf)(1) = 0, and (Lf)(0) = f′(0)/2. To define the metastable time-scale, consider two independent random walks (x^N_t)_{t≥0} and (y^N_t)_{t≥0} on T^d_N, both with jump probability given by p(·), starting from the uniform distribution. Let θ_N be the expected meeting time: θ_N = E[inf{t ≥ 0 : x^N_t = y^N_t}]. Since x^N_t − y^N_t evolves as a random walk speeded up by 2, θ_N is the expectation of the hitting time of the origin for a simple symmetric random walk speeded up by 2 which starts from the uniform measure. In a general graph, though, the time-scale should be given by (15.3), mutatis mutandis.
Consider a continuous-time random walk (x_t)_{t≥0} on Z^d with jump probabilities given by p(·) which starts from the origin. Assume that d ≥ 3, and denote by v_d the escape probability: v_d = P_0[H^+_0 = ∞]. It can be shown that θ_N = (1 + o(1)) N^d/(2 v_d). The factor 2 in the denominator appears because the process has been speeded up by 2. In particular, in d = 2, 1/π should be understood as (1/2)(2/π). We refer to [13] for a proof of this result. Consider the time-rescaled coarse-grained process X_N(t) = Ψ_N(A_N(θ_N t)), t ≥ 0, where A_N(t) ⊂ T^d_N denotes the set of sites occupied by the coalescing random walks at time t and Ψ_N(A) = 1/|A|.
Note that in this example we do not take the trace of the process on some set, but we just project it on a smaller state space.
Applying the ideas presented in the previous sections, it is proved in [13] that, starting from the configuration in which each site is occupied by a particle, X N (t) converges in the Skorohod topology to the Markov chain whose generator is given by L and which starts from 0.
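Assuming Kingman-type rates n(n − 1)/2, consistent with Cox's description of the coalescence time as a sum of independent exponential random variables, a trajectory of the limit chain started from a finite number n0 of particles can be sampled as follows. This is an illustrative Python sketch which ignores the entrance of the chain from 0; the function name is ours.

```python
import random

def death_chain_path(n0):
    """Sample a trajectory of the limiting chain started from 1/n0:
    from state 1/n the chain waits an exponential time of rate
    n*(n-1)/2, then jumps to 1/(n-1); it is absorbed at 1."""
    t, path, n = 0.0, [(0.0, 1.0 / n0)], n0
    while n > 1:
        t += random.expovariate(n * (n - 1) / 2.0)
        n -= 1
        path.append((t, 1.0 / n))
    return path
```

Since the rates grow quadratically in n, most of the absorption time is spent on the last few coalescence events, which is the reason the chain can start from 0 (infinitely many particles) and still reach any state 1/n in finite time.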
This model has been first considered by Cox [50], who proved that the coalescence time (the time at which all particles have coalesced into one) is asymptotically equal to a sum of independent exponential random variables. This result has been extended by Oliveira [106,107] to the case of transitive graphs. Related questions have been examined by Aldous and Fill [2], Durrett [53], Cooper, Frieze and Radzik [48], and Chen, Choi and Cox [40].

Further examples
We mention in this last subsection other models whose metastable behavior has been derived with the tools presented in the previous sections.
The metastable behavior of sequences of continuous-time Markov chains on a fixed finite state-space has been examined in [17,95]. This problem has been addressed with large deviations techniques by Scoppola [115], Olivieri and Scoppola [108,109], Manzo, Nardi, Olivieri and Scoppola [99], and Cirillo, Nardi and Sohier [45].
The evolution, in the zero-temperature limit, of a droplet in the Ising model under the conservative Kawasaki dynamics in a large two-dimensional square with periodic boundary conditions has been derived in [18,70]. The reduced model in this example is a two-dimensional Brownian motion on the torus.
Misturini [102] considered the ABC model on a ring in a strongly asymmetric regime. He derived the metastable behavior of the dynamics among the segregated configurations in the zero-temperature limit. Here, the reduced model is a Brownian motion.