Asymptotic behavior and aging of a low temperature cascading 2-GREM dynamics at extreme time scales

We derive scaling limit results for the Random Hopping Dynamics of the cascading two-level GREM at low temperature at extreme time scales. It is known that in the cascading regime there are two static critical temperatures. We show that there exists a (narrow) set of fine tuning temperatures; when they lie below the lowest static critical temperature, three distinct dynamical phases emerge below that critical temperature, with three different types of limiting dynamics depending on whether the temperature is (well) above, (well) below, or at a fine tuning temperature, all of which are given in terms of K processes. We also derive scaling limit results for temperatures between the lowest and the highest critical ones, as well as aging results for all the limiting processes mentioned above, by taking a second, small time limit.


Introduction
It is believed that activated dynamics of spin glasses, which occur at time-scales on which the process has time to escape deep valleys of a highly complex random energy landscape, exhibit aging at low temperature, the nature of which should be closely linked to the specific properties of this landscape [10], [9]. Over the past decades one type of aging behavior was isolated and identified, first in the REM [3], [4], [5], [20] whose complete dynamic phase diagram is now known [25], and later also in the p-spin SK models [2,13] but for restricted time-scales at which the dynamics does not have time to discover the full correlation structure of the random environment and, not being influenced by strong long-distance correlations, behaves essentially like a REM.
A question arose as to whether this REM-like behavior was truly universal or depended on the choice of dynamics. Indeed, all the papers quoted above study the same dynamics, the Random Hopping dynamics, whose trajectories do not depend on the random Hamiltonian, a choice that is clearly not satisfactory from the point of view of physics. Would a physically more realistic choice of dynamics change the nature of aging? The recent paper [24] answered in the negative for the key example of Metropolis dynamics of the REM, for all time-scales away from extreme time-scales at which the phase transition from aging to stationarity takes place. Nothing is known to date about Metropolis dynamics of the REM at extreme time-scales, whether from a mathematical, theoretical or numerical point of view [1], let alone Metropolis dynamics of p-spin SK models.
The present paper initiates the study of the aging dynamics of the GREM, a model with a built-in hierarchical structure for which both the process of extremes and the low temperature Gibbs measure are fully understood [15,14]. We consider here only the 2-GREM, which already enables us to identify new aging behaviors and an unforeseen mechanism, called the fine tuning mechanism, while avoiding the heaviness inherent in the model with finitely many hierarchies. Furthermore, we restrict our analysis to extreme time-scales, where the process undergoes a transition from aging to equilibrium and, in doing so, visits the configurations in the support of the Gibbs measure. The parameters of the model and the temperature are chosen such that this support has a fully cascading structure, a setting that would be qualified as "2-step replica symmetry breaking" in physics. Shorter time-scales, at which the dynamics is aging, will be considered in a follow-up paper. Finally, our choice of the dynamics is dictated by that of the time-scales. As already mentioned, Metropolis dynamics at extreme time-scales is out of reach. However, we do not study the plain Random Hopping dynamics considered in the earlier literature but a hierarchical version, the Hierarchical Random Hopping dynamics, which is closer to Metropolis dynamics. Based on our knowledge of the REM, it is reasonable to expect that our choice of dynamics allows one to correctly predict the aging of Metropolis dynamics of the GREM near the phase transition.
For most of this paper, we will be concerned with the scaling limit of the dynamics for the time-scales and parameters specified above. Having obtained the limiting dynamics, given by K-processes appropriate to each regime that emerges in the analysis (see Subsection 2.5), we then proceed to take a further small time limit from which aging results follow (see Subsection 1.3). Not only do we obtain a new, distinctive GREM-like aging behavior that goes beyond the known REM-like behavior, but we also isolate a new, temperature dependent fine tuning mechanism that gives rise to three distinct aging regimes (corresponding to three distinct K-processes) into which the dynamics can be tuned by adjusting the temperature. This completely new fine tuning mechanism, and the rich aging picture that emerges, were not predicted in the physics literature on GREM dynamics [10], [29], [9].

The model
We now specify our setting. Let V N = {−1, 1}^N , V Ni = {−1, 1}^{Ni} , σ = σ 1 σ 2 , σ i ∈ V Ni , i = 1, 2, and let N 1 = pN for some p ∈ (0, 1) and N 2 = N − N 1 ; we view σ i as the i-th hierarchy or level of σ. Given a ∈ (0, 1), set H N (σ) = H^{(1)} N (σ 1 ) + H^{(2)} N (σ), with H^{(1)} N (σ 1 ) = −√(aN) Ξ^{(1)}_{σ 1 } and H^{(2)} N (σ) = −√((1 − a)N) Ξ^{(2)}_σ , where {Ξ^{(1)}_{σ 1 }; σ 1 ∈ V N1 } and {Ξ^{(2)}_σ ; σ ∈ V N } are families of i.i.d. standard Gaussian random variables. We call random environment and denote by (Ω, F, P) the probability space on which the sequence of processes (H N (σ), σ ∈ V N ), N > 1, is defined. As usual, we call H N (σ) the Hamiltonian or energy of σ. We refer to the minima of H N (·) as low energy or ground state configurations. Since they are minima of H N (·), corresponding to the largest values of the Ξ variables, we will also refer to them as top configurations. Likewise for H^{(i)} N (·), i = 1, 2. The associated Gibbs measure at inverse temperature β > 0 is the (random) measure G β,N defined on V N through G β,N (σ) = e^{−βH N (σ)} /Z β,N , where Z β,N is a normalization.
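The setting can be sampled directly. The following is a minimal numerical sketch (toy sizes; the names and parameter values are ours, not the paper's, and the explicit sign convention for the Hamiltonian is an assumption, chosen to be consistent with the Boltzmann factors appearing later in the text):

```python
import numpy as np

# Hedged toy instance of the 2-GREM energies and Gibbs weights, assuming
# H_N(sigma) = -sqrt(a*N)*Xi1[s1] - sqrt((1-a)*N)*Xi2[s1,s2],
# with Xi1, Xi2 i.i.d. standard Gaussian fields.
rng = np.random.default_rng(0)

N, p, a, beta = 12, 0.4, 0.6, 1.5
N1 = int(p * N); N2 = N - N1

Xi1 = rng.standard_normal(2**N1)            # first-level variables, one per sigma_1
Xi2 = rng.standard_normal((2**N1, 2**N2))   # second-level variables, one per sigma

# Hamiltonian on the full configuration space V_N = V_N1 x V_N2
H = -np.sqrt(a * N) * Xi1[:, None] - np.sqrt((1 - a) * N) * Xi2

# Gibbs measure G_{beta,N}(sigma) = exp(-beta * H) / Z
w = np.exp(-beta * H)
G = w / w.sum()

# ground state = configuration of minimal energy (a "top" configuration)
s1, s2 = np.unravel_index(np.argmin(H), H.shape)
print(G.sum(), H[s1, s2] == H.min())
```

At these toy sizes one can inspect directly how the Gibbs mass concentrates on the low energy configurations as beta grows.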
Let us briefly recall the key features of the statics of the 2-GREM (see [8] for a detailed account). As regards the Hamiltonian, two scenarios may be distinguished, related to the composition of the ground state energies in terms of their first and second level constituents: the cascading phase, when a > p, where those energies are achieved by adding up the minimal energies of the two levels, so that to each first level ground state configuration there correspond many second level ground state configurations; and the complementary non-cascading phase, where the composition of the ground state energies is different, and to each first level constituent there corresponds a single second level constituent.
Consider now the case where a > p. The free energy exhibits two discontinuities at the critical temperatures, and the Gibbs measure behaves as follows. In the high-temperature region β < β cr 1 , no single configuration carries a positive mass in the limit N ↑ ∞, P-a.s.; here the measure resembles the high temperature Gibbs measure of the REM. On the contrary, in the low temperature region β > β cr 2 , the Gibbs measure becomes fully concentrated on the set of ground state configurations, yielding Ruelle's two-level probability cascade. In between, when β cr 1 < β < β cr 2 , an intermediate situation occurs in which the first level Hamiltonian variables "freeze" close to their ground state values, but not the second level ones, so that, once again, no single configuration carries a positive mass in the limit N ↑ ∞. To obtain a macroscopic mass, one must lump together an exponentially large number of second level configurations. In this paper we focus on the cascading phase (a > p) of the model at low temperature (β > β cr 2 ). We will also treat the case β cr 1 < β < β cr 2 . A dynamics whose trajectories ignore the random Hamiltonian could lead to an aging behavior different from that of a more "realistic" Glauber dynamics. Indeed, a moment's thought reveals that the hierarchical structure of the Hamiltonian of the GREM is reflected in the dynamics for typical choices of Glauber dynamics, such as Metropolis dynamics, but that this property is lost in the plain Random Hopping dynamics. The rates (1.4) retain this hierarchical property of Metropolis dynamics, but simplify the Metropolis rates within each given level by replacing them with the rates of the Random Hopping dynamics. We naturally call the resulting dynamics the Hierarchical Random Hopping dynamics (hereafter HRHD).
Following standard notation, σ ∼ σ′ indicates that d(σ, σ′) = 1, where d(·, ·) stands for the usual Hamming distance in V N ; we will below denote by d i the corresponding distance in V Ni , i = 1, 2. In other words, σ ∼ σ′ indicates that σ, σ′ differ in exactly one coordinate; we say in this context that σ, σ′ are (nearest) neighbors (in V N ). We recognize the graph whose vertices are V N and whose edges are the neighboring pairs of configurations of V N , abusively denoted also by V N , as the N -dimensional hypercube. Clearly, σ N is reversible w.r.t. G β,N .
It now remains to specify the time-scale on which we observe this process. As mentioned earlier, we are interested in extreme time-scales, where the dynamics is close to equilibrium. What we mean here by the dynamics being close to equilibrium at a given extreme time-scale is that the dynamics with time rescaled by that time-scale converges in distribution to a nontrivial Markov process which is ergodic in the sense of having an irreducible (countable) state space and a unique equilibrium distribution. The limiting dynamics is thus close to equilibrium, since it converges to equilibrium as time diverges, and it is in this sense that we say that the original dynamics is close to equilibrium at the extreme time-scale; see Remark 2.10 for a more precise discussion. These are also the time-scales that determine the phase separation line between aging and stationarity, in the sense that, taking time to zero, the process enters an aging regime.
For future reference we call P the law of σ N conditional on the σ-algebra F, i.e. for fixed realizations of the random environment, or P η when the initial configuration η is specified. We will denote by P ⊗ P η the probability measure obtained by integrating P η with respect to P. Expectations with respect to P, P and P ⊗ P µ are denoted by E, E and E ⊗ E µ , respectively, where µ is the uniform probability measure on V N .

Dynamical phase transitions
The distinct static phases of the cascading 2-GREM, determined by β = β cr i , i = 1, 2, are expected to exhibit different dynamical behaviors under the HRHD at extreme (and conceivably other) time scales. This will be seen when comparing the results of our analysis of the HRHD below the lowest critical temperature (β > β cr 2 ) on the one hand, and those for intermediate temperatures (β ∈ (β cr 1 , β cr 2 )), on the other hand. Another source of dynamical phase transition in the HRHD at extreme time scales is the fine tuning phenomenon discussed next.

Fine tuning; heuristics
There are two competing factors governing the behavior of the dynamics at extreme time-scales. One is the number of jumps it takes for the dynamics to leave a first level ground state configuration σ 1 ; this is a geometric random variable with mean 1 + (N 2 /N 1 ) exp{β √(aN) Ξ^{(1)}_{σ 1 }}. The other is the number of first level ground state configurations visited before a second level ground state configuration is found. At relatively high temperatures, the second number dominates, and so after leaving a ground state configuration σ, which it does at times of order exp{β √((1 − a)N) Ξ^{(2)}_σ } ≈ exp{β β * √((1 − p)(1 − a)) N }, σ N will visit many first level ground state configurations before it finds a second level ground state configuration. When it first finds such a second level configuration, say σ 2 , while in a first level ground state configuration σ 1 (meaning that it has first returned to an overall ground state configuration σ 1 σ 2 ), σ 1 will be effectively distributed proportionally to exp{β √(aN) Ξ^{(1)}_{σ 1 }}; this can be explained by a size-bias mechanism that operates in the selection of σ 1 . There is no such mechanism for the choice of σ 2 , and it is distributed uniformly.
On the other hand, at low enough temperatures, the first factor dominates, and while staying at a first level low energy configuration, the process has time to reach equilibrium at the second level, so at the time scale where we see (uniform) transitions between first level low energy configurations, the second level is in equilibrium. This is a longer time scale, made up of the many second level jump times accumulated until the first level is exited.
In a narrow strip of borderline temperatures, we see nontrivial dynamics at both levels going on at the same time scale (corresponding to jump times out of second level ground state configurations, roughly of magnitude exp{β β * √((1 − p)(1 − a)) N }, as at high temperatures).
In order for the above picture to represent the dynamics, we need the temperature to be below the lowest static phase transition temperature 1/β cr 2 , so that the time spent off the ground state configurations is negligible. Moreover, this three-phase dynamical picture will take place if (and only if) the borderline temperatures alluded to above, to be called fine tuning temperatures below, are (well) below the lowest static phase transition temperature; otherwise, we will see only one dynamical phase below that lowest critical temperature, namely the low temperature phase alluded to above.
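The borderline can be located by a back-of-the-envelope balance. Using the standard Gaussian extreme value asymptotics Ξ^{(1)}_max ≈ β * √(pN) (with β * = √(2 ln 2) as in Section 2), and comparing the two competing factors of the heuristics above, one recovers a value consistent with the definition of the fine tuning inverse temperature given in Section 2:

```latex
% number of jumps needed to exit a first level ground state:
\exp\{\beta\sqrt{aN}\,\Xi^{(1)}_{\max}\}\approx\exp\{\beta\beta_*\sqrt{ap}\,N\},
% number of second level configurations to be sampled for second level
% equilibration:
2^{N_2}=\exp\{(1-p)N\ln 2\}=\exp\bigl\{(1-p)\tfrac{\beta_*^2}{2}\,N\bigr\}.
% equating the two exponential rates gives the borderline:
\beta\,\beta_*\sqrt{ap}=(1-p)\tfrac{\beta_*^2}{2}
\quad\Longrightarrow\quad
\beta=\frac{1-p}{2\sqrt{ap}}\,\beta_*=:\beta_{FT}.
```

This is only a heuristic identification of the exponential orders; the precise, volume dependent definition of the fine tuning regime is given in Definition 2.1.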

Intermediate temperatures
For values of β between β cr 1 and β cr 2 , we investigate the behavior of the dynamics at a time scale at which we see transitions between the first level ground state configurations at times of order 1. In order that this time scale correspond to an extreme time scale (as stipulated above; see the penultimate paragraph of Subsection 1.1), we need a further restriction on the temperature, to be seen below. Under those conditions, the behavior of the dynamics of the first level configuration for intermediate temperatures is similar to that below the minimum of the lowest critical and the fine tuning temperatures.

Aging results
Let us briefly anticipate our main aging results, which hold in the case described in the last paragraph of Subsection 1.2.1, where the fine tuning temperatures are below the lowest static critical temperature. In this case, as mentioned above, we have three phases for the dynamics at extreme time scales below the lowest static critical temperature. As already briefly explained, our aging results in this paper are obtained by first taking the scaling limit of the dynamics at extreme time scales, thus obtaining ergodic processes, and next taking a small time limit of those processes, thus obtaining aging results. Let us suppose we have already taken the first, extreme time scale limit. We obtain a distinct dynamics in each of the three temperature ranges: above fine tuning, at fine tuning, and below fine tuning (see Theorems 2.4, 2.5, and 2.7 in Subsection 2.5).
For i = 1, 2, let us consider the events N i = N i (t w , t) = {Y i does not jump between times t w and t w + t}, (1.5) where Y i represents the i-th level marginal of the limiting process and t w , t > 0, and define from them Π, an analogue of a (limiting) two-time overlap function of the HRHD. In the regime considered in this subsection, we have the following (vanishing time limit) aging result: writing FT as shorthand for fine tuning, and with proper choices of α 1 , α 2 ∈ (0, 1), the limits are expressed in terms of Asl α (·), the generalized arcsine distribution function of parameter α. See Section 10 and also the last paragraph of Section 11 for details and other regimes.
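For the reader's convenience, we recall that the generalized arcsine distribution function of parameter α ∈ (0, 1) (that is, the distribution function of a Beta(α, 1 − α) random variable, the law familiar from REM aging) can be written as

```latex
\mathrm{Asl}_{\alpha}(u)
=\frac{\sin(\alpha\pi)}{\pi}\int_{0}^{u}x^{\alpha-1}(1-x)^{-\alpha}\,dx,
\qquad u\in[0,1].
```

For α = 1/2 this reduces to the classical arcsine law, Asl 1/2 (u) = (2/π) arcsin √u.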

A 2-GREM-like trap model
The idea behind the construction of trap models for low temperature glassy dynamics is as follows: the traps represent the ground state configurations and, assuming that at low temperature the dynamics spends virtually all of the time on those configurations (here an extreme time-scale is assumed), higher energy configurations are simply not represented, and all one needs to do is specify the times spent by the dynamics at each visit to a ground state configuration, and the transitions among those configurations, in such a way that the resulting process is Markovian. The simplest such model to be proposed in the study of aging was put forth in [10], with {1, . . . , M } as configuration space, mean waiting time at i given by X i , with X 1 , X 2 , . . . i.i.d. random variables in the domain of attraction of an α-stable law, α ∈ (0, 1), and uniform transitions among the configurations. This is the so-called REM-like trap model, or trap model on the complete graph. Models of a similar nature for the GREM were proposed in [10] and also in [29]. The scaling limit of the latter model for a fine tuning choice of level volumes was computed in [18], and its aging behavior away from fine tuning was studied in [23]. Out of our analysis of the HRHD in the cascading phase at low temperatures emerges the following GREM-like trap model on the ground state configurations of the GREM. The configuration space is represented by M 1 first level ground state configurations, labeled in decreasing order, and, for each of those configurations, M 2 second level ground state configurations, labeled in decreasing order. The transition probabilities p(x, y) between x = (x 1 , x 2 ) and y = (y 1 , y 2 ) are given in (1.7). The factor ψ ∈ [0, ∞] in (1.8) interpolates between higher temperatures, above fine tuning (ψ = 0), and low temperatures, below fine tuning (ψ = ∞); ψ ∈ (0, ∞) corresponds to borderline, fine tuning temperatures in the picture outlined above, to be described more precisely below.
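The REM-like trap model just recalled is simple enough to simulate. The following hedged sketch (all parameter values are illustrative, not taken from the paper) implements its two ingredients: heavy-tailed mean waiting times, here Pareto-type variables in the domain of attraction of an α-stable law, and uniform transitions:

```python
import numpy as np

# Sketch of the REM-like trap model of [10] on {1,...,M}: exponential waiting
# time with mean X_i at trap i, where X_i = U_i^(-1/alpha) has Pareto tails
# (domain of attraction of an alpha-stable law), and uniform jumps.
rng = np.random.default_rng(1)

M, alpha = 2000, 0.5
X = rng.uniform(size=M) ** (-1.0 / alpha)   # heavy-tailed mean waiting times

def run(t_max):
    """Return jump times and visited traps of one trajectory up to time t_max."""
    t, i = 0.0, rng.integers(M)
    times, sites = [0.0], [i]
    while t < t_max:
        t += rng.exponential(X[i])          # Exp waiting time with mean X_i
        i = rng.integers(M)                 # uniform transition
        times.append(t); sites.append(i)
    return np.array(times), np.array(sites)

times, sites = run(t_max=1000.0)
print(len(times))
```

With such trajectories one can estimate two-time quantities such as the probability of no jump in a window [t_w, t_w + t], whose scaling limit is the arcsine-law aging behavior discussed above.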
The factors γ 1 (·) correspond to the scaled M 1 maximal first-level Boltzmann factors exp{β √(aN) Ξ^{(1)}_{σ 1 }}. The time spent at each visit to x in the appropriate time scale is an exponential random variable with mean γ 2 (x), where, for each x 1 , γ 2 (x 1 , ·) corresponds to the scaled M 2 maximal second-level Boltzmann factors exp{β √((1 − a)N) Ξ^{(2)}_{σ 1 ·}}, with σ 1 the first level configuration labeled x 1 , as explained above. It must be said that this time scale is of the order of magnitude of the time needed for the dynamics to jump out of ground state configurations, and it is indeed extreme in the above sense only for fine tuning temperatures and above. In these cases, (1.7) indeed represents the transitions among ground states (in the extreme time scale). At lower temperatures, as explained in the discussion on dynamical phase transitions above, the extreme time scale is longer, with uniform transitions on the first level, with exponential waiting times, while on the second level the dynamics is a trivial product of equilibria at different times.
The results indicated above do not appear in, nor seem to be predicted by, the physics literature, which has focused on short time scales, where all levels age simultaneously, and thus no effect of the longer time dynamical phase transition is present. This matches our short extreme times aging results only at fine tuning, where that simultaneity takes place. Also, our GREM-like trap model differs from those considered in the literature (in [10,29]).

Organization
In Section 2 we make precise the notions introduced in this introduction, and formulate our scaling limit results for σ N on extreme time scales for β > β cr 2 . In Sections 3-6 we formulate and prove entrance law results leading in particular to the transition probabilities between ground state configurations described in (1.7). These results are key ingredients in the proofs of the above mentioned scaling limit results, which are undertaken in Sections 7-9. Section 10 is devoted to a brief discussion of the aging results that we obtain for the limit processes, as already mentioned. In Section 11 we briefly discuss results for the intermediate temperature phase (β cr 1 , β cr 2 ). An appendix contains definitions of the limit processes entering our scaling limit results, as well as auxiliary results.
Scaling limit of σ N . Main results

Choice of parameters
As mentioned above, we will study the cascading phase, which, we recall, corresponds to a > p (2.1). As regards temperatures, we want to take volume dependent ones (this is needed in order to capture the fine tuning phase transition). We also want low temperatures, which in the cascading phase correspond to β > β cr 2 , where the dependence of β on N is implicit. In order to describe that dependence, let us start by setting β * = √(2 ln 2), κ = (1/2)(ln ln 2 + ln 4π). Given a sequence ζ N < N^2 β *^2 /2 of real numbers, let β(a, p, N, ζ N ) be the solution in β of an equation involving β F T := ((1 − p)/(2 √(ap))) β * , the inverse of the fine tuning temperature. Depending on the behavior of ζ N we distinguish three types of temperature regimes. (Given two sequences s N and s̃ N , we write s N ∼ s̃ N iff lim N →∞ s N /s̃ N = 1. We also write s N = O(1), resp. s N = o(1), iff |s N | ≤ C < ∞ for some C and all N > 0, resp. lim N →∞ s N = 0.) Definition 2.1 (At/above/below fine tuning). We say that a sequence β −1 ≡ β −1 N > 0 of temperatures is in the fine tuning (FT) regime if there exists a finite real constant ζ and a convergent sequence ζ N ∼ ζ such that β = β(a, p, N, ζ N ). We say that β −1 is below fine tuning if there exists a sequence ζ − N satisfying ζ − N → −∞ as N → ∞ and such that β = β(a, p, N, ζ − N ). Finally, we say that β −1 is above fine tuning if there exists a sequence ζ + N such that ζ + N → +∞ and lim sup N →∞ ζ + N /N^2 < β *^2 /2, and β = β(a, p, N, ζ + N ). (Note that for β = β(a, p, N, ζ N ) to be a convergent sequence, ζ N /N^2 must be convergent.) In order to precisely describe our results, we start with some technical preliminaries. As described above, the way the ground state configurations are arranged in the cascading phase suggests the following relabeling of the state space V N .

Change of representation
For i = 1, 2, let D i = {1, . . . , 2^{Ni} }. Call ξ^{x 1 } 1 , x 1 ∈ D 1 , the vertices of V N1 ordered so that Ξ^{(1)}_{ξ^1 1 } ≥ Ξ^{(1)}_{ξ^2 1 } ≥ . . . (2.6) and, similarly, for each x 1 ∈ D 1 call ξ^{x 1 x 2 } 2 , x 2 ∈ D 2 , the vertices of V N2 ordered according to the decreasing values of the corresponding second level variables. This is a one-to-one mapping for almost every realization of Ξ. Let now X N = X N 1 X N 2 be the image of σ N in D = D 1 × D 2 under the inverse of ξ. This is the process we will state scaling limit results for. This alternative representation suits our purpose of taking scaling limits, mainly due to the convenience of working with a state space which naturally extends to the natural numbers, which will constitute the state space of the limiting processes. The class to which these processes belong, namely K processes, is described in the appendix. In the build up to those scaling limit results, let us next introduce scaling factors, and then the scaling limit of the environment.
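The relabeling by ranks can be illustrated numerically. In this hedged sketch (toy sizes; ranks are 0-based here, whereas the text labels configurations 1, 2, . . .), the map ξ and its inverse are simply argsort operations on the Gaussian fields:

```python
import numpy as np

# Sketch of the change of representation: configurations are relabeled by the
# rank of their level variables, in decreasing order. Xi1, Xi2 stand for the
# Gaussian fields of the model (toy instance).
rng = np.random.default_rng(2)
N1, N2 = 4, 6
Xi1 = rng.standard_normal(2**N1)
Xi2 = rng.standard_normal((2**N1, 2**N2))

# xi1[x1] = index (in V_N1) of the configuration with the x1-th largest Xi1
xi1 = np.argsort(-Xi1)
# for each first level configuration, order its second level extensions
xi2 = np.argsort(-Xi2, axis=1)

# inverse map: X1[sigma1] = rank label of sigma1 (the process X_N lives on ranks)
X1 = np.empty(2**N1, dtype=int)
X1[xi1] = np.arange(2**N1)

print(xi1[:3], X1[:3])
```

Since the Gaussian fields have continuous distributions, ties occur with probability zero, which is the "almost every realization" caveat in the text.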

Scalings
where, for i = 1, 2, u Ni is the scaling function for the maximum of 2^{Ni} i.i.d. standard Gaussians, u Ni (x) = β * √(N i) + x/(β * √(N i)) − (ln N i + 2κ)/(2 β * √(N i)). (2.11) For later use, set ψ N as in (2.12); clearly, in the fine tuning regime, ψ N converges to a finite positive limit (2.13).

Remark 2.3.
It follows from the above results that the Gibbs measure G β,N converges suitably to Gβ, the normalized γ, as N → ∞.

Scaling limit of X N
In order to have the three cases outlined in the heuristic discussion, namely, above, at and below fine tuning temperatures, we need that β F T > β cr 2 . The latter condition is equivalent to (2.2) once we replace 'lim inf' by 'lim' there. We recall that X N is a process in the random environment γ N . The limiting processes, which are K processes, described in the appendix, will naturally also be processes in random environment.
These functions will play the role of random environment for the limiting process in this case. Theorem 2.4 (Above fine tuning temperatures). As N → ∞, X N ⇒ K(f 1 , w 1 ), (2.21) where ⇒ stands for convergence in P ⊗ P µ -distribution. The convergence takes place on the Skorohod space of trajectories of both processes, with the J 1 metric.
See the definition of K(·, ·) in the first subsection of the appendix. To state our second theorem, we assume lim N →∞ ζ N = ζ for some real finite ζ.
Theorem 2.5 (At fine tuning temperatures). As N → ∞, X N converges, in the sense of ⇒ above, to a two-level K process. The convergence takes place in the Skorohod space of trajectories of both processes, with the J 1 metric.
See the definition of K 2 (·, ·) in the first subsection of the appendix. Remark 2.6. In order for the above mentioned two-level K-process to be well defined, we need to make sure that f 2 , w 2 satisfy (almost surely) the summability conditions (A.1), (A.4). This is a classic result for (A.1) (recall that w ≡ 1 in this case), and follows by standard arguments for (A.4) from the fact that α 1 < α 2 < 1 (as noted above, below (2.16)).
For our last theorem, we take ζ − N as in Definition 2.1.
Theorem 2.7 (Below fine tuning temperatures). As N → ∞, X N ⇒ X̃ 1 X̃ 2 , (2.24) where X̃ 1 ∼ K(f 3 , 1) and, given γ 2 and X̃ 1 = x 1 ∈ N, X̃ 2 is an i.i.d. family of random variables on N (indexed by time), each of which is distributed according to the weights given by γ^{x 1 } 2 . The marginal convergence of the first coordinate takes place in the Skorohod space of trajectories of both processes, with the J 1 metric, and the convergence of the second coordinate is in the sense of finite dimensional distributions only.
Remark 2.8. If condition (2.18) is not satisfied (within the cascading, low temperature regime treated in this paper), then we are below fine tuning temperatures and Theorem 2.7 holds for all β > β cr 2 . Remark 2.9. It is either known or follows readily from known results that the limiting processes in the above theorems are ergodic Markov processes, having the infinite volume Gibbs measure Gβ (see Remark 2.3 above) as their unique equilibrium distribution. See [21] for the case of the 2-level K-process, and [17] for the cases involving weighted/uniform K-processes.

Strategy
The proofs of all the above theorems have the same structure, reflecting the fact that at extreme time scales the dynamics lives on ground state configurations. In each case, there are three ingredients: 1. showing that the process spends virtually all of the time on ground state configurations; 2. getting sharp estimates on the time spent at each visit to a ground state configuration; 3. getting sharp estimates on the transition probabilities between ground state configurations.
Point 1 is quite simple in each case, resulting from straightforward expected value computations.
Point 2 is immediate for Theorems 2.4 and 2.5, since in those cases the time scale coincides with the order of magnitude of the waiting times at the ground state configurations. For Theorem 2.7, it is a longer time scale, corresponding to the time spent on the second level configurations before jumping from a given first level ground state configuration; the main point of this estimation involves deriving a law of large numbers for the number of visits to a given second level ground state configuration before the jump from the first level ground state configuration occurs -see (7.6).
As for Point 3, in Theorem 2.7 the transitions between first level ground state configurations are already well known to be approximately uniform; and this time scale is well above the equilibration time scale for the second level, so the transitions we see are between independent equilibrium configurations, which we get from known spectral gap estimates.
The bulk of our work comprises the estimation of the transition laws for Theorems 2.4 and 2.5. In the latter case, we rely heavily on a potential theoretic point of view, extending the lumping approach adopted in the analysis of the Random Hopping dynamics for the REM in [3], and further developed in [6]. See Proposition 3.1. There are many fine points to be considered here, making for a long analysis. The treatment of Theorem 2.4 demands a finer estimation on the one hand (see Proposition 3.3 and the discussion around its statement); on the other hand, a more direct approach works in this case.

Entrance law. Main result
In this and the next three sections we will mostly not be concerned with limits, so we find it convenient to revert to the original representation of σ N as a spin configuration. Given a subset A ⊂ V N , the hitting times τ A of A by the continuous and discrete time processes σ N and J * N are defined, respectively, as τ A = inf{t ≥ 0 : σ N (t) ∈ A} and τ A = inf{n ≥ 0 : J * N (n) ∈ A}. (3.1)

The Top
We then let the Top be the set T of top configurations. Note that we may also write T = ∪ x1∈M1 T x1 , where T x1 is defined for each x 1 ∈ M 1 below. Further introduce the canonical projection of T on V N1 , denoted T 1 . To each ξ^{x 1 } 1 in T 1 we associate the cylinder set W x1 . Clearly, T x1 is the restriction of T to this cylinder, T x1 = W x1 ∩ T . Finally, set (3.6)

Main entrance law results
From now on we fix (ζ N ), a sequence of real numbers such that β = β(a, p, N, ζ N ) > 0 for all N , and let ψ N be as in (2.12). For each x 1 ∈ M 1 and A ⊆ T x1 , define the quantity λ x1 N (A).
We will see that this quantity can be interpreted as the probability that, starting in W x1 , the process exits W x1 before finding an element of A. Note that λ x1 N (A) is a random variable. We use it to define the random probability measure ν 1 on M 1 that assigns to x 1 the corresponding mass. Similarly, given η ∈ T , we denote by ν 1 the random measure on M 1 defined analogously.

Proposition 3.1.
There exists a subset Ω̃ ⊂ Ω with P( Ω̃) = 1 such that, on Ω̃, for all N large enough, the following holds in the temperature domain determined by (ζ N ). At FT both limits hold weakly with respect to the environment. Below FT, there is a window of values of ζ − N for which both limits hold almost surely, and above which both hold in probability. One may readily check that the following window has these properties: ζ − N ≫ − log log N ; see Lemma 5.3 and its proof. Above FT, we have a mixed situation. For λ x1 N (T x1 ), there is a window above which the convergence is almost sure: ζ + N ≫ log N . And for ν N 1 (x 1 ), we need in addition the existence of lim N →∞ ζ N /N , and the convergence is weak. The asymptotics of the probabilities follow readily.
Getting the estimates in Proposition 3.1 above fine tuning when we do not have ζ N ≫ log N requires an extra level of precision, related to the fact that, in that regime, ν N 1 (·) is a quotient of vanishing terms. We state next a separate result dealing with this case. Since it is a limit result, we require the existence of lim N →∞ ζ N /N .
where the limit holds in distribution in (Ω, F, P).
The proof of Proposition 3.1 follows a strategy initiated in [3] for the study of aging in the REM and further developed in [6] with the main object of providing the tools to tackle more general spin glass models, such as the GREM. These tools are prepared in Section 4. They are used in Section 5 to prove basic probability estimates for the jump chain that, in turn, are the key ingredients of the proof of Proposition 3.1, concluded in Section 6, after which we prove Proposition 3.3.

Entrance law. Key tools
As announced above, the proof of Proposition 3.1 naturally draws upon [6] since that paper was designed precisely for that purpose. The strategy pursued in [6] (and initiated in [3] and [11]) consists, firstly, in reducing the probabilities of interest, say, the harmonic measure of a given set, A, to quantities which are functions only of the simple random walks J • Ni and, secondly, in using lumping techniques (a coarse graining of the configuration space that is built from the set A) to express these quantities through a new chain, called lumped chain. This lumped chain, which is again Markovian, can be thought of as a (discrete analogue of a) diffusion in a convex potential (whose properties strongly depend on A) and is finally studied with precision using the potential theoretic approach developed in [11,12]. Going back to the original state space, the resulting estimates on the harmonic measure of A come with conditions, which essentially relate to specific properties of the set A (e.g. its size and sparseness).
To see how the HRHD (1.4) links up with the simple random walk, first observe that it can be described alternatively through its jump chain, J * N , and jump rates, w N . Introducing suitable parameters, the invariant measure of the jump chain can be written explicitly, with Z * β,N a normalization making this measure a probability. In line with the strategy described above, this section has three parts. Subsection 4.1 gathers simple lemmata needed to link probabilities for the original chain σ N (with rates (1.4)) to probabilities for the jump chain J * N (with transitions (4.4)), and to link the latter to quantities depending only on the simple random walks J • Ni . In Subsection 4.2 we introduce the notion of lumped chains. Finally, in Subsection 4.3, we state and prove the properties of various sets that are needed in Section 5 to make use of the known lumped chain estimates from [6]. This last subsection can be skipped at first reading.
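The generic decomposition of a continuous-time chain into a jump chain plus exponential holding times, which underlies this section, can be sketched as follows (the rate matrix below is illustrative only and is not the paper's rates (1.4)):

```python
import numpy as np

# Generic sketch: a continuous-time Markov chain with rates w(x,y) decomposes
# into its jump chain, p(x,y) = w(x,y)/w(x) with w(x) = sum_y w(x,y), plus
# exponential holding times of rate w(x) at each state x.
rng = np.random.default_rng(3)

W = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [0.5, 0.5, 0.0]])     # illustrative rate matrix, zero diagonal
w = W.sum(axis=1)                   # total jump rate out of each state
P = W / w[:, None]                  # jump chain transition matrix

def step(x):
    """One move of the chain: holding time, then a jump-chain transition."""
    hold = rng.exponential(1.0 / w[x])
    y = rng.choice(len(w), p=P[x])
    return hold, y

hold, y = step(0)
print(hold > 0, y)
```

The point of the decomposition is exactly the one exploited in the text: hitting probabilities depend only on the jump chain, so estimates for J * N transfer to σ N once the holding times are controlled.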
For future reference, we call P * the law of the process J * N conditional on F. We denote by P •,i the law of J • Ni , i = 1, 2. If the initial state, say η, has to be specified we write P η , P * η and P •,i ηi . We will denote by P ⊗ P η the probability measure obtained by integrating P η with respect to P. Expectation with respect to P, P * , P •,i , P and P ⊗ P µ are denoted by E, E * , E •,i , E and E ⊗ E µ , respectively, where µ is the uniform probability measure on V N .

Comparison lemmata
Our starting point is the observation recorded in Lemma 4.1, whose proof is elementary and which we skip. The next two lemmata deal with two classes of events that can be expressed through just one of the simple random walks J•_{N_i} on V_{N_i}. The first are REM-like events that can be reduced to events for a REM (which is a 1-level GREM). Let π_i, i = 1, 2, denote the canonical projection of V_N onto V_{N_i}; P^π_1 denotes the law of the projection π_1 J*_N of the jump chain on V_{N_1}. By (4.4) this is a Markov chain with transition probabilities p^π_N(σ_1, ·). The lemma now easily follows. The next lemma deals with so-called level-2 events, namely, events whose trajectories are confined to a given cylinder set C(σ_1) and that can thus be expressed through just the simple random walk J•_{N_2} on V_{N_2}. Define the outer boundary of a set A ⊂ V_N accordingly. Then, for all σ ∈ C, (4.12) holds, and note that by (4.4), τ_∂C is a geometric random variable with success probability q*_N(σ_1). Thus, on the one hand, estimates on the probabilities P^{•,1}_{σ_1}(τ_A < τ_B) appearing in Lemma 4.2 and on the Laplace transform of Lemma 4.3 are derived in [6] using lumping techniques. In the one-dimensional case (when the set A reduces to a singleton), lumping reduces to the classical Ehrenfest chain. We recall below an expression for the probability generating function of hitting times of such a chain, appearing in [27] (see (4.28)-(4.29) in that reference) and needed later: for t ∈ [0, 1) and σ_2, (4.14).
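The geometric exit time noted after (4.12) can be illustrated numerically. The snippet below is a sketch with a hypothetical constant value standing in for q*_N(σ_1), not the actual chain: when every step leaves the cylinder independently with the same probability q, the exit time is geometric, hence has mean 1/q and is memoryless.

```python
import random

# Geometric exit-time sketch (hypothetical q standing in for q*_N(σ1)):
# if each step of the jump chain exits the cylinder C(σ1) independently with
# probability q, then τ_∂C is geometric with success probability q.
random.seed(1)
q = 0.05

def exit_time():
    t = 1
    while random.random() >= q:
        t += 1
    return t

samples = [exit_time() for _ in range(50000)]
mean = sum(samples) / len(samples)          # should be close to 1/q = 20

# memorylessness: P(τ > 2k | τ > k) ≈ P(τ > k) for k = 1/q
k = int(1 / q)
tail_k = sum(1 for s in samples if s > k) / len(samples)
tail_2k_given = (sum(1 for s in samples if s > 2 * k)
                 / max(1, sum(1 for s in samples if s > k)))
```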

Lumped chains and K-lumped chains
In this section we introduce certain functions of the simple random walks J•_{N_i}, where σ_{i,j} denotes the j-th Cartesian coordinate of σ_i. The image I_i ≡ m_i(J•_{N_i}) of the simple random walk J•_{N_i}, called the lumped chain, is again a Markov chain, now taking values in a discrete grid Γ_{N_i,d}. Different choices of the partition Λ_i yield different lumped chains. Given an integer n and a collection K = {η^1, . . . , η^x, . . . , η^n} of elements of V_{N_i}, x ∈ {1, . . . , n}, the so-called K-lumped chain is constructed as follows: denote by η_j the column vectors η_j ≡ (η^x_j)_{x=1,...,n} ∈ V_n, j ∈ {1, . . . , N_i}. Given an arbitrary labelling {e_1, . . . , e_k, . . . , e_d} of the set of all d = 2^n elements of V_n, we then set (4.17). We denote by m_{i,K} the function (4.15)-(4.16) resulting from (4.17), by I_{i,K} the associated K-lumped chain, and by P_{i,K} its law.
As mentioned earlier, the K-lumped chain should be thought of as a discrete analogue of a diffusion in a convex potential (at least for small values of d). The potential is very steep close to its global maxima, achieved at the 2^d vertices of V_d, and has a unique global minimum, henceforth called the origin. Furthermore, there is a one-to-one correspondence between the set K and its image m_{i,K}(K) ⊆ V_d.
Intuitively, this chain exhibits the following scenario, typical of the large deviation setting of Freidlin and Wentzell: before reaching any vertex of V_d it explores the bulk of the grid Γ_{N_i,d} and remains there for an exponentially (in N_i) long time, revisiting the origin an exponential number of times, until, at some random time, it travels quickly (in polynomial time) from the origin to a vertex z ∈ V_d, with almost uniform distribution.
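This scenario can be watched in the simplest case, the one-dimensional Ehrenfest chain (our simulation; the small value n = 10 is an arbitrary illustrative choice, the effect being exponential in n): started at the bottom of the well, the chain revisits it many times before it first reaches a vertex of the hypercube.

```python
import random

# Metastability sketch: Ehrenfest chain on {0, ..., n} (the lumped simple
# random walk on the hypercube, lumped by Hamming weight).  From state j the
# chain moves up with probability (n - j)/n and down with probability j/n.
# Started at the well bottom n/2, it returns there many times before the
# first visit to a vertex (state 0 or n).
random.seed(2)
n = 10

def run():
    j, visits, steps = n // 2, 0, 0
    while 0 < j < n:
        j += 1 if random.random() < (n - j) / n else -1
        steps += 1
        if j == n // 2:
            visits += 1
    return visits, steps

results = [run() for _ in range(200)]
mean_visits = sum(v for v, _ in results) / len(results)
mean_steps = sum(s for _, s in results) / len(results)
```

Even at this tiny size the chain typically makes on the order of a hundred returns to the origin of the well before its first visit to a vertex; for growing n both quantities grow exponentially.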
Analysing this behavior using the potential theoretic tools developed in [11,12] for the analysis of metastability, Ref. [6] gives (among other results) precise sufficient conditions for the hitting time of subsets K of V d to be asymptotically exponentially distributed, and for the hitting distribution to be uniform. These conditions essentially require that K should be sparse enough (see Definition 1.2 in [6]) and that the partition Λ i (K) does not contain too many small boxes Λ k i (K) (which would give flat directions to the potential). Because each vertex of V d has exactly one pre-image by m i,K these results and their conditions can be translated back for the original process in the original state space.
The conditions mentioned above, as well as the precision of the estimates of [6], are expressed through the following key quantities: each is a function of the Hamming distances between the elements of A (depending on N_i and d_i), whose definition is explicit but rather involved. We therefore refer to (3.5)-(3.8) of Section 3 of [6] for this definition. Its properties are analyzed in detail in Appendix A3 of [6].

Properties of the Top and other sets
The aim of this section is to prepare the ground for the use of the results of [6] by proving bounds for the quantities U Ni,di (K), U Ni,di (σ, K) and F Ni,di defined, respectively, in (4.19) and (4.20) above and in (3.5)-(3.8) of [6], for the three types of sets K that we will later encounter in our proofs: the Top, the Top plus a non random point, and large random subsets of V N1 .

The Top
Consider the partitions Λ_1(T_1) and Λ_2(T_{x_1}) induced, respectively, by T_1 and π_2 T_{x_1}, x_1 ∈ M_1, through (4.17). Let K be any of the sets T_1 or π_2 T_{x_1}. Proof. The proof is an easy adaptation of that of Lemma 4.2 of [22].
On Ω, for all large enough N the following holds: denoting by K any of the sets T_1, π_2 T_{x_1}, x_1 ∈ M_1, or ∪_{x_1∈M_1} π_2 T_{x_1}, we have the stated bounds for all η ∈ K and η̃ ∈ K with η ≠ η̃. Proof. This is the analogue of Lemma 2.12 of [BBG1] and is proved in the same way.
Lemma 4.6. With the notation of Lemma 4.5, the following holds on Ω for all large enough N: for all η ∈ K and η̃ ∈ K with η ≠ η̃. For any σ ∈ V_{N_i} and any subset A ⊂ V_{N_i}, set Lemma 4.7. With the notation of Lemma 4.5, the following holds on Ω for all large enough N: for all η ∈ K and all σ ∈ V_{N_i} \ K, with j = j(σ, K).
This is proven just as (4.25). In case (a) we write the corresponding decomposition. By (4.23) of Lemma 4.5 we may apply the bound just obtained in case (b) to the second term (namely the sum) in the right-hand side of (4.29), whereas the first term is bounded as in (4.27).

The Top and a non random point
We will frequently need to lump the simple random walk J•_{N_i} on V_{N_i} with sets of the form K ∪ σ_i ⊂ V_{N_i}, where σ_i ∈ V_{N_i} is arbitrary and K = T_1 (in which case i = 1) or K = π_2 T_{x_1} for some x_1 ∈ M_1 (in which case i = 2). We are now interested in the corresponding partition. Proof. This is a simple adaptation of the proofs of Lemmata 4.6 and 4.7.
for the associated lumped chain, defined in (5.3), (5.10), and (5.11) of [6] (see Lemma 5.3 and Lemma 5.5 of [6]). We do not repeat these arguments in the proofs of the statements of Section 5.1.

Basic estimates for the jump chains
This section is concerned with the jump chain only. We state and prove a collection of probability estimates that will later be shown, in Section 6, to form the building blocks of the proof of Proposition 3.1. This section relies heavily on Ref. [6] and on the preparations of Section 4.3.

Main estimates
We recall that the set Ω is defined in (4.21). From now on we drop the dependence on N in the notation.
ii) (Leaving the cylinder set W_{x_1}.) For every non-empty subset A ⊆ T_{x_1} and all σ ∈ W_{x_1} \ A, the following almost sure (but rough) bounds on the ranked variables γ_{N,1}(ξ_1^{x_1}) will enable us to further express the quantity λ_{x_1}(A). Proof of Lemma 5.3. By (2.9) and (2.6), the lemma then follows from the Borel–Cantelli lemma.

Lemma 5.4.
On Ω, for all but a finite number of indices N_1, we have (5.14). The lemma now follows.
We now turn to "inter-level motions". Proposition 5.6 (Inter-level motion). On Ω, for all large enough N, the following holds: for all η ∈ T, all σ ∈ V_N \ W, and some constant c > 0. It is not difficult to deduce from Proposition 5.6 the following. Corollary 5.7. On Ω, for all large enough N, we have, for all σ ∈ V_N \ W and all x_1 ∈ M_1,
where the last inequality is Theorem 3.2 of [6]. Inserting our estimates into (5.25) and combining the result with (5.20), we arrive at a bound which readily yields (5.5). The second assertion of Proposition 5.2 is a direct consequence of the first, observing that, where the last equality is (5.20), and using (5.31)-(5.32). The proof of Proposition 5.2 is now complete.
Proof of Proposition 5.6. Using (4.9), η = η_1 η_2 ∈ T can be written as {η} = C(η_2) ∩ C(η_1), where C(η_1) = W_{x_1} for some x_1 ∈ M_1. Thus, for the event {τ_η < τ_{W\η}} to take place, the chain must reach η "from within" the set C(η_2), without, of course, ever having visited W. Building on this observation we begin by establishing an a priori upper bound on the probability (5.16) that is valid for all starting points σ in V_N \ W.
Lemma 5.8. The following holds on Ω (see (4.21)) for all large enough N: for all η ∈ T and all σ ∈ V_N \ W. Proof. We use the renewal identity (see e.g. Corollary 1.9 in [11]).

(5.34)
To deal with the numerator we use reversibility and (4.5). Now, for the event {τ_σ < τ_W} to take place, the chain, starting in η, must exit η through the set C(η_2). Thus, setting the corresponding quantities, to bound the last probability in the right-hand side of (5.36) we simply observe that, on Ω, for all large enough N, the last equality is proved just as Proposition 5.1: namely, using first Lemma 4.2 with A = {σ_1}, B = T_1, to write P*_σ(τ_{C(σ_1)} < τ_W) = P^{•,1}_{σ_1}(τ_{σ_1} < τ_{T_1}), and using next Theorem 1.4 of [6] with the partition Λ_1(T_1 ∪ σ_1) induced by T_1 ∪ σ_1 (that is to say, the Top and a non random point), together with (4.30) and (4.31) of Lemma 4.8. It now remains to bound the denominator of (5.34). For this we simply decompose on the first step of the jump chain, the last equality following from (5.37). Since Σ_{σ̃∼σ} p*_N(σ, σ̃) = 1, we get the claimed bound.
This implies that the bound (5.16) holds for all σ ∈ V^+_N \ W. To extend this result to the entire set V_N \ W, observe that W ⊂ V^+_N (see the definition (3.4)-(3.6) of W) and decompose the probability in (5.16) according to whether the jump chain visits V^+_N \ W before η or not; namely, for σ ∈ V_N \ V^+_N, write (5.44). Call Q_1 and Q_2, respectively, the first and second probabilities in the right-hand side of (5.44). Then (5.45) holds, where the inequality in (5.45) follows from Corollary 5.9. To bound Q_2, note that, by Theorem 1.1 of [16], carefully collecting all estimates in its proof, we get (5.47) for some constant c > 0. It remains to bound the first probability in the right-hand side of (5.47). Notice that the mean hitting time of a given vertex in V^+_{N_1} by the simple random walk is to leading order 2^{N_1}. Assume first that dist(σ_1, η_1) > 1. Then, by the exponential Chebyshev inequality, for all u > 0, the last inequality follows from (7.7) of Theorem 7.2 of [6]. Choosing e.g. u = N_1^2 yields P^{•,1}_{σ_1}(τ_{η_1} < (N)/N_1) ≤ c/N_1^2 for some c > 0. Proceeding in a similar way when dist(σ_1, η_1) = 1, but using (7.6) of Theorem 7.2 of [6], we get P^{•,1}_{σ_1}(τ_{η_1} < (N)/N_1) ≤ c/N_1 for some c > 0. Hence, collecting our bounds, the claim follows. To prove (5.18), use (5.1) of Proposition 5.1 to bound the probability in its left-hand side and (5.17) to bound the second probability in its right-hand side.

Proof of Proposition 3.1
The set Ω̃ of Proposition 3.1 is chosen to be Ω̃ = Ω ∩ Ω^+_1 ∩ Ω', where Ω, Ω^+_1 and Ω' are defined, respectively, in (4.21), above (5.47), and in (5.9). Since each of these sets has full measure (see Lemma 4.4, Lemma 5.3 and the paragraph above (5.47)), P(Ω̃) = 1. From now on we assume that ω ∈ Ω̃. Lemma 4.1 will frequently be used without explicit mention.
Proof of Assertion (i) of Proposition 3.1. We first work out general expressions, valid at all temperatures, that relate the entrance probabilities P_σ(τ_η < τ_{T\η}), σ ∉ T, η ∈ T, to the basic REM-like, level-2 and inter-level probabilities estimated in Propositions 5.1, 5.2 and 5.6. To shorten the notation we write, given x_1 ∈ M_1 and η ∈ T_{x_1}, the following. Note that the probabilities (3.10), (3.11) and (3.12) are of the form P_η(σ), Q_η(σ), and R_η(σ), respectively. As the next lemma shows, both P_η(σ) and Q_η(σ) can be expressed as functions of R_η(σ), whereas, by a renewal-type argument, R_η(σ) itself solves a linear system of equations.

Lemma 6.2.
Under the assumptions and with the notation of Proposition 3.1, the linear system (6.4) has a unique solution, R*_η = (R*_η(σ))_{σ∈∂W}, that obeys (6.17). The proof of Lemma 6.2 makes use of the following two lemmata.
Lemma 6.3. The matrix A has the following properties: for each σ ∈ ∂W. Proof. Summing both sides of (6.4) over η ∈ T, and using that, by (6.1) and (6.2), Σ_{η∈T} R_η(σ) = 1 for all σ ∈ S_N \ W (6.19), yields the equality in (6.18). The first and final upper and lower bounds simply reflect the fact that A is a positive matrix and b_η a positive vector.
We are now ready to prove Lemma 6.2.
Proof of Lemma 6.2. Let us establish that if 1 − λ_{x_1}(T_{x_1}) is O(N^{-1}), then there exists a constant 0 < c < ∞ (depending on a_1, a_2) such that the stated bound holds for all x_1 ∈ M_1, all η ∈ T_{x_1}, and all σ ∈ V_N \ W. Inserting this rough bound in (6.23), and using again (5.18) to bound the resulting sum (the first term in (6.22) being bounded in (5.16)), we get the claim.
where 1 is the vector with all components equal to one and the inequalities hold componentwise, for each R*_η(σ), σ ∈ ∂W. Now, by (6.18) of Lemma 6.3 and Lemma 6.4, the claim of the lemma readily follows.
We are now ready to prove (3.10), (3.11) and (3.12). Clearly, (3.12) follows from (6.5), (6.17) of Lemma 6.2, and the bounds (6.22), which are valid for all x_1 ∈ M_1, all η ∈ T_{x_1} and all σ ∈ V_N \ W. Next, inserting (6.17) in (6.7) gives (6.28). Using (5.8) to express the probability in (6.28), and proceeding as in the proof of Lemma 6.2 to bound the term c_{T_{x_1}}(σ) (that is, the terms b_{T_{x_1}}(σ, η)) appearing in that expression, yields (3.11). Finally, inserting (6.17) in (6.6), using Lemma 6.2 to bound the two probabilities appearing there, and reasoning again as in the proof of Lemma 6.2 to bound the terms b_{T_{x_1}}(σ, η) and c_{T_{x_1}}(σ), proves (3.10). As for (i-4), it follows from Corollary 5.7. The proof of Assertion (i) of Proposition 3.1 is complete.

Proof of Proposition 3.3
There are two parts to the argument: a quite simple one, how the process leaves M; and the bulk of the argument, how it returns to M. The latter part would be quite straightforward (apart from a result about the random walk on the hypercube and Ehrenfest chains; see Lemma A.5) if we could assume that the returns both to first level and to second level ground state configurations are uniform at each instance, irrespective of the starting point. Of course this is only approximately the case at each instance, and there are exponentially many instances to be controlled; so, in the end, an extensive, careful analysis is indeed required, tailored to take advantage of existing bounds on the approximation of the law of the random walk on the hypercube by the uniform law.

Transition within M: leaving M
By the rules of our dynamics, when leaving x_1 x_2 ∈ M the probability to jump to x'_1 x_2 for some x'_1 ∼ x_1 equals q*_N(ξ^{x_1}) (recall (4.3)), which vanishes as N → ∞ for x_1 ∈ M_1. Once X̄_N leaves x_1 x_2 ∈ M and goes to some neighboring x_1 x'_2, while X̄_{N,1} rests, the number of jumps X̄_{N,2} would have to take before coming back to M_2 (recall from Lemma 4.5 that x'_2 may be assumed not to lie in M_2) is of the order of 2^{N_2} (by Corollary 1.8 of [6]), which in this temperature regime is much larger than 1/q*_N(ξ^{x_1}), the order of the number of jumps of X̄_N before X̄_{N,1} moves. The upshot is that, with probability tending to 1 as N → ∞, starting from M, X̄_N first leaves M in such a way that X̄_{N,1} leaves M_1 before X̄_N returns to M.

Transition within M: return to M
In the presentation of the arguments in the remainder of this subsection, we find it convenient to go back to the representation X̄_N of our process (introduced in a previous subsection). We then have that τ_i, i = 1, 2, . . ., represent the successive hitting times of M_1 by X̄_{N,1}. Notice that τ_1 = τ_W. Let also A_i, resp. A^y_i, i = 1, 2, . . ., denote the event that X̄_{N,2} hits M_2, resp. y ∈ M_2, during the i-th visit of X̄_{N,1} to M_1. Let I = min{i ≥ 1 : A_i occurs}. Below we will compute the limit as N → ∞ of P_σ(X̄_{N,1}(τ_I) = x, A^y_I), xy ∈ M. (6.36) From the discussion in Subsubsection 6.2.1, we may take σ ∉ W. The expression in (6.36) is not quite the probability in the left-hand side of (3.18), but close enough, in the sense that the two turn out to have the same limit, as will also be argued below, in the conclusion of our proof.
We will now couple X̄_N to a process X_N = X_{N,1} X_{N,2} which moves like X̄_N, except that at the times τ_i, i ≥ 1, it is uniformly distributed in the second coordinate. Lemma A.4 and Lemma 3.1 of [13] should in principle allow a direct argument for such a coupling, but there is a periodicity issue in the application of the latter result, so we find it convenient to go through an intermediate step in order to deal with that issue.
For the definition of X_{N,2}, we partition the trajectory of X̄_{N,2} into alternating portions: the first kind starts at a time τ_i for some i ≥ 1, when X̄_{N,1} = x_1 for some x_1 ∈ M_1, and ends when X̄_{N,1} jumps (away from x_1); the second kind are the remaining portions of the trajectory of X̄_{N,2}. Let us denote the number of steps in a given portion of the first kind by G (a geometric random variable). X_{N,2} will be defined so that it takes the same number of steps as X̄_{N,2} within each portion of their respective trajectories, in the same overall order, as follows. We let, and, defining the random times of the above paragraph in the same way (and with the same notation) for X_N as for X̄_N, we have that X_{N,2} is uniformly distributed on D_2 at the times τ_i, i ≥ 2, in such a way that, with probability tending to 1 as N → ∞, X_{N,2} = X̄_{N,2} for all times till τ_Ī. Notice that, since Ī ≤ I, Lemma A.4 holds also for Ī.
We now proceed to compute P_σ(X̄_{N,1}(τ_I) = x, A^y_I), x ∈ M_1. The first step is to replace X̄_N in that expression by X_N. The error in doing so is bounded above by the probability that there is a visit of X̄_{N,2} to y on either the first step of Υ_i or the last (extra) step of Υ_i for some i ≤ Ī. But this is readily checked to be o(1) by applying Lemma A.4 and Lemma 3.1 of [13].
For the next step, let I' = min{i ≥ 0 : A_{L_Ī+i} occurs}. Then, for x ∈ M_1, apart from an o(1) error according to the above paragraphs, we have the corresponding identity, where F_i is the event that X_{N,2} makes fewer than N_2^3 jumps between τ_{L_k+i−1} and τ_{L_k+i}, i = 1, 2, . . ., and all the other quantities and events in the latter probability are defined with X̄_N replaced by X_N. Using the Markov property, the right-hand side above can be written as (6.41). Removing the A^c's and using the Markov property again, we find that the expression in (6.41) is bounded above by (6.42), since the events F_1, F_2, . . . depend only on X_{N,1} and, starting at time L_k with the uniform distribution on D_2, its invariant distribution, X_{N,2}(L_k + ·) retains that same distribution. Now, from Lemma A.2 and Remark A.3 above, each probability of the form P(F_i | ·) in (6.42) is bounded above by c/N for some constant c. We also notice that the latter probability in the same expression equals π(x). Thus the second term within brackets at the bottom of (6.40) is bounded above by the corresponding sum, and since the latter sum is over probabilities, it equals 1. It follows that the expression within brackets at the bottom of (6.40) equals the displayed quantity, where o_{x',x} is o(1) for every x, x' ∈ M_1. Let us now consider π̃(z) for a given z ∈ M_1; arguing in the same way as for the expression within brackets at the bottom of (6.40), we find that it equals π(z) plus an error bounded above by the analogous quantity, where F̃_1, F̃_2, . . . are defined in the obvious way, parallel to F_1, F_2, . . . above. From (6.38), we have that π(x), x ∈ M_1, are all of the same order of magnitude. It follows that, where o_x is o(1) for every x ∈ M_1.
In particular, we have that P µ2 (Ā c i |X N 1 (τ i ) = x i ) ∼ 1 for all i = 1, 2, . . . and using also Corollary 1.5 of [6], we find that We note that the latter conditional probability does not depend on i = 1, 2, . . ..
The upshot of the above discussion is that the right-hand side of (6.39) may be written as (6.46), where R̄^k is the k-th power of the matrix R̄ = P̄(I − Π̄), I is the identity matrix on M_1, and Π̄ is the diagonal matrix on M_1 with entries {π̄(y), y ∈ M_1}. Now, by (6.38), we have that Π̄ ∼ εM_2Γ, with ε = (N_2/N_1) · 1/(c_{N,1} 2^{N_2+1}) and Γ = diag{γ_1(y), y ∈ M_1} the diagonal matrix on M_1 with entries {γ_1(y), y ∈ M_1}; thus ε is o(1), and from (6.45) we have that R̄ is a positive matrix for all large enough N. We may thus apply Perron–Frobenius theory to write the internal sum in (6.46) as (6.47), where ρ is the top eigenvalue of R̄, and S̄ = vw^T, with v, w the right and left eigenvectors of R̄ associated with ρ, normalized so that v^T w = 1. See Theorem 8.2.11, its proof, and the preceding and subsequent material of Section 2 of Chapter 8 of [26]. To check that ρ < 1 for all large enough N, we first note that R̄ is a perturbation of P̄, which is stochastic and thus has 1 as its top eigenvalue, and then resort to a standard perturbation result to the effect that ρ = 1 − εM_2 (w̄^T P̄ Γ v̄)/(w̄^T v̄) + Cε^2, where v̄ and w̄ are right and left eigenvectors of P̄ associated with the eigenvalue 1, and C is a constant. See Theorem IV.2.3 in [31]. Since the latter matrix is stochastic, we may take v̄ as the vector with all entries equal to 1. By (6.45) and again well known perturbation results, we may take w̄ ∼ v̄ (see Subsection V.2.3 of [31]), and from (6.45) we have that P̄ ∼ (1/M_1) 1, where 1 denotes the matrix with all entries equal to one; we thus get ρ = 1 − εM_2 γ̄_1 + Cε^2, where γ̄_1 = (1/M_1) Σ_{y∈M_1} γ_1(y). We then have that ρ < 1 for all large enough N.
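The Perron–Frobenius step above can be checked on a toy example. The sketch below is entirely ours (small made-up matrices standing in for P̄ and Π̄, not the actual objects of the proof): for a stochastic P and a diagonal Π with small positive entries, R = P(I − Π) is sub-stochastic, its spectral radius is below 1, and the geometric series of its powers converges to (I − R)^{-1}.

```python
# Toy Perron-Frobenius / geometric-series check (made-up matrices):
# R = P (I - Pi) with P stochastic and Pi diagonal with small entries has
# row sums strictly below 1, hence spectral radius < 1, and sum_k R^k
# converges to (I - R)^{-1}.
M = 4
P = [[1.0 / M] * M for _ in range(M)]          # uniform stochastic matrix
pi = [0.01 * (y + 1) for y in range(M)]        # small "absorption" weights
R = [[P[x][y] * (1 - pi[y]) for y in range(M)] for x in range(M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(M)) for j in range(M)]
            for i in range(M)]

# partial sums of the geometric series sum_k R^k, starting from the identity
S = [[float(i == j) for j in range(M)] for i in range(M)]
Rk = [row[:] for row in S]
for _ in range(2000):
    Rk = matmul(Rk, R)
    S = [[S[i][j] + Rk[i][j] for j in range(M)] for i in range(M)]

# check: (I - R) S ≈ I, i.e. the series sums to (I - R)^{-1}
I_minus_R = [[float(i == j) - R[i][j] for j in range(M)] for i in range(M)]
check = matmul(I_minus_R, S)
err = max(abs(check[i][j] - (i == j)) for i in range(M) for j in range(M))
row_sum = max(sum(row) for row in R)           # strictly below 1
```

The row sums of R equal 1 − Σ_y P[x][y]·π(y) < 1, which is the elementary reason the spectral radius drops below 1, mirroring the perturbation computation for ρ in the text.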
Again, Perron–Frobenius theory tells us that the expression within brackets in (6.47) decays exponentially fast in k, uniformly in N. Since π(x) and π̃(x) are o(1) for all x ∈ M_1, the infinite sum in (6.47) vanishes as N → ∞.
Again resorting to well known perturbation theory results, since R̄ is also a perturbation of (1/M_1) 1, the eigenvectors v and w may be taken as v ∼ v̄ = (1, . . . , 1) and w ∼ (1/M_1) v; it follows that S̄ ∼ (1/M_1) 1. The upshot is that the second summand in (6.47) is asymptotic to (6.48), and thus so is the left-hand side of (6.39).

Conclusion of the proof of Proposition 3.3
Let us show that P_{µ_2}(X̄_{N,1}(τ_I) = x, A^y_I) agrees with the entrance probability on the left-hand side of (3.18) up to an o(1) error. As it stands, the former probability is actually the probability that X̄_{N,2} visits y during the first visit of X̄_{N,1} to M_1 in which X̄_{N,2} hits M_2: the visit of X̄_{N,2} to y may not be the first one to M_2. However, this probability is clearly an upper bound for the entrance probability, and if we subtract the following probability, we obtain a lower bound. For i = 1, 2, . . ., let B_i denote the event that X̄_{N,2} hits M_2 at least twice during the i-th visit of X̄_{N,1} to M_1. We may then estimate P_{µ_2}(X̄_{N,1}(τ_I) = x, B_I) in the same way as above (starting in (6.39)), replacing A^y_I, A^y_{L_k+·} by B_I, B_{L_k+·}, respectively, and π(x) by P_{µ_2}(B_1 | X̄_{N,1}(τ_1) = x). But the latter is bounded above by the right-hand side of (A.12), which was shown above to be o(π(x)). We thus get that P_{µ_2}(X̄_{N,1}(τ_I) = x, B_I) is o(1), and subtracting it from P_{µ_2}(X̄_{N,1}(τ_I) = x, A^y_I) gives a lower bound for the entrance law; the right-hand side of (6.48) as the limit of the latter quantity follows.

Proof of Theorem 2.7
In this and the next two sections we present the proofs of the scaling limit theorems for X_N, one section per proof. We will use the results of Section 3 on entrance laws.
In the case of Theorems 2.4 and 2.5, two things remain to be established: that the process spends virtually all of its time at the top, and how much time is spent on each visit to a top configuration. The structure of the proof of Theorem 2.7 is not dissimilar: we control the time spent off the top states as before, then evaluate the time spent on visits of X̄_{N,1} to top first level states, and finally resort to a spectral gap argument to get the behavior on the second level.
Specifically, in this section we will concentrate on showing that: 1. X̄_N spends virtually all the time on M; 2. the time X̄_{N,1} spends on each visit to each x_1 ∈ M_1 is roughly exponential with mean f_3(x_1); 3. given an interval of constancy I = [a, b) of X̄_{N,1}, on which X̄_{N,1} = x_1 for some x_1 ∈ M_1, and t_1, . . . , t_k with a < t_1 < · · · < t_k < b for some k ≥ 1, the variables X̄_{N,2}(t_1), . . . , X̄_{N,2}(t_k) are roughly independent random variables taking values in N, each distributed roughly with probability weights given by γ_2(x_1, ·) normalized.
As anticipated in Subsection 2.6, Point 1 is quite straightforward, and so is Point 3 as far as first level transitions are concerned; second level transitions are readily dealt with through spectral gap estimates. The main work concerns Point 2, where the gist of the argument is a law of large numbers for the number of visits to a given second level ground state configuration before the jump from a given first level ground state configuration; see (7.6).
After arguing these points in variable detail, we sketch an argument on how they fit together in a proof of Theorem 2.7. We start with the second point, after a few remarks.
Let us notice that the total time spent by σ_N(·/c_{N,2}) on any single visit to a given σ_1 ∈ V_{N_1} can be written as a sum in which the e_j are i.i.d. mean one exponential random variables. Remark 7.1. It should be quite clear that (2.15) and (2.17) remain valid when we replace γ_{N,2} and γ_2 by γ̃_{N,2} and γ̃_2, respectively (see the paragraph right above the statement of Theorem 2.4); one reason for this is that the denominator in (7.2) tends to 1 as N → ∞ for every fixed x_1.

Remark 7.2.
We may indeed assume that all the convergences mentioned in the previous remark, which hold in distribution for the original environment, hold almost surely on another, suitable probability space for the environment (by Skorokhod's representation theorem). For convenience, we will assume below that we are in the full measure event of such a probability space on which those convergences take place, and omit further reference to it.

Time spent on top first level visits
with {T_j(x_2); j ≥ 0, x_2 ≥ 1} an i.i.d. family of mean one exponential random variables independent of G. One may readily check that (7.5) holds in distribution as N → ∞, where E is a mean one exponential random variable. We will next show that L_N(x_2) → 1 (7.6) in probability as N → ∞ for every x_2 ∈ M_2. This and (7.5) readily imply the corresponding convergence in distribution as N → ∞. From (7.5), since G is independent of the family of exponential random variables entering L_N(x_2), we may suppose that G is roughly equal to ĉ_{N,1} r, with r > 0 a real number, where ĉ_{N,1} = 1/c_{N,1}. So rather than L_N(x_2), we may consider instead L̃_N(x_2, r), defined in (7.8), and show that L̃_N(x_2, r) → 1 (7.9) in probability as N → ∞ for every r > 0. We now note that the sum in (7.8) may be understood as the time spent at ξ_2^{x_1 x_2} by a continuous time, space homogeneous, simple symmetric random walk on V_{N_2} during its first ĉ_{N,1} r jumps.
Let us consider the delayed renewal process associated with that random walk, consisting of the successive return times of the random walk to ξ_2^{x_1 x_2} after the initial time. We then have a delayed renewal process with renewal times E_1 + R_1, E_2 + R_2, . . ., with E_1, R_1, E_2, R_2, . . . independent, E_2, E_3, . . . and R_2, R_3, . . . identically distributed, E_2 a mean one exponential random variable, and R_2 distributed as the hitting time of ξ_2^{x_1 x_2} starting from a nearest neighbor of ξ_2^{x_1 x_2}. E_1 may either be distributed as E_2 or vanish, depending on whether or not the state of X̄_{N,2} at the beginning of the visit of X̄_{N,1} to x_1 was x_2. Similarly, R_1 may be distributed either as R_2 or as the hitting time of ξ_2^{x_1 x_2} by the random walk starting from a site of V_{N_2} which is neither ξ_2^{x_1 x_2} nor one of its neighbors. As will be clear below, neither (the distribution of) E_1 nor R_1 plays a role in the result. Let S_n = Σ_{i=1}^n (E_i + R_i) and S̄_n = Σ_{i=1}^n R_i, n ≥ 1. Let N_t be the counting process associated with S_n, namely N_t = N(t) = sup{n ≥ 0 : S_n ≤ t}, t ≥ 0, S_0 = 0. Notice that the sum in (7.8) is bounded below and above, respectively, by the two expressions in (7.10). We now claim that, in order to establish (7.9), it is enough to show that (1/K) S_Q → 1 (7.11) in probability as N → ∞, where K = K_N = ĉ_{N,1} r and Q = Q_N = ĉ_{N,1} r 2^{−N_2}. Indeed, from (7.11) and the law of large numbers satisfied by i.i.d. mean one exponential random variables, it readily follows that (1/K) S̄_Q → 1 in probability as N → ∞. This in turn readily implies that (1/Q) N_K → 1 in probability as N → ∞, and again the law of large numbers for i.i.d. mean one exponential random variables implies that either of the two expressions in (7.10), after division by Q, converges to 1 in probability as N → ∞. The claim is established.
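The renewal law of large numbers invoked here can be illustrated by simulation. The sketch below is ours, with generic inter-arrival laws (an exponential return time standing in for the actual hitting-time law of ξ_2^{x_1 x_2}): for renewal times E_1 + R_1, E_2 + R_2, . . ., the counting process satisfies N_t/t → 1/E[E + R], so over a horizon of K · E[E + R] one sees about K renewals.

```python
import random

# Renewal LLN sketch (generic inter-arrival laws, our stand-ins): with
# renewals E_i + R_i, E_i exponential mean one and R_i exponential with
# mean mean_R, the number of renewals by time t is close to t / E[E + R].
random.seed(4)
mean_R = 9.0                      # stand-in for the mean return time
mu = 1.0 + mean_R                 # E[E + R]
t_horizon = 20000.0

def count(t):
    n, s = 0, 0.0
    while True:
        s += random.expovariate(1.0) + random.expovariate(1.0 / mean_R)
        if s > t:
            return n
        n += 1

ratio = count(t_horizon) * mu / t_horizon   # should be close to 1
```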
We may ignore R_1 in the argument for (7.11), or take it identically distributed to R_2. We take the Laplace transform of the left-hand side of (7.11) as follows: for t > 0, (7.12) holds, where Ē denotes expectation with respect to the law of J•_{N_2}. It follows from Proposition 7.7.i.b of [6], after a simple adaptation to continuous time, that the expression within square brackets on the right of (7.12) can be written as 1 + õ(1), and (7.11) follows.

Equilibrium on the second level
Let us check the third point of the list outlined at the beginning of the section. We first remark that, during a constancy interval of X̄_{N,1} on which X̄_{N,1} = x_1 for a given x_1 ∈ D_1, X̄_{N,2} is the mapping via ξ of a continuous time simple random walk on the hypercube V_{N_2} with mean waiting time γ_{N,2}(x_1, x_2) at ξ_2^{x_1 x_2}, started from whichever second level configuration σ_N was in at the beginning of the interval. We denote this random walk by σ̄_{N,2}. Let us now briefly argue the claim that the time to reach equilibrium for that random walk is of order smaller than the length of the constancy interval, which we just saw to be of the order of the inverse of c̃_N = c_{N,1} 2^{N_2} c_{N,2}. After straightforward adjustments, we may check that the bound derived in [19] for the associated Metropolis dynamics of the REM applies to σ̄_{N,2}, and we find that, almost surely, the stated bound holds, where T_2 is the inverse of the spectral gap of σ̄_{N,2}. It follows that (7.15) holds, and that the maximum over σ_2 ∈ V_{N_2} in (7.16) is controlled, where G_{N_2} is the equilibrium Gibbs measure of σ̄_{N,2}, proportional to the corresponding Gibbs weights, and Z_{N_2} is the partition function associated with G_{N_2}.
From well known results on the existence and exact expression of the limit of (1/N) log of both factors inside the square root above, we have that, almost surely, that square root is bounded above by e^{cN} for some finite constant c, for all large enough N. It immediately follows from (7.15) and the above that, for times of the form t = s(c̃_N)^{−1}, s > 0, the left-hand side of (7.16) is almost surely bounded above by e^{cN} e^{−e^{dN}} for all large enough N, with d > 0 related to the left-hand side of (7.15); thus it almost surely vanishes as N → ∞. This and (2.15) in turn readily imply the claim of the third point made at the beginning of the section.
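For the bare simple random walk on the hypercube (without the random waiting times of σ̄_{N,2}), the relaxation estimate can be made completely explicit; the snippet below is a back-of-the-envelope check of the "spectral gap beats interval length" mechanism, entirely ours, not the bound of [19]. For the continuous time walk that flips a uniformly chosen coordinate at total rate 1, the coordinates decouple, P(σ_i(t) = σ_i(0)) = (1 + e^{−2t/n})/2, the spectral gap is 2/n, and the walk is close to uniform once t greatly exceeds (n/2) log n.

```python
import math

# Exact total-variation distance to uniform for the continuous-time simple
# random walk on {-1,+1}^n (flip a uniform coordinate at total rate 1).
# The law at time t is a product over coordinates; grouping configurations
# by Hamming distance k from the start gives the distance exactly.
n = 16

def tv_from_uniform(t):
    p = (1.0 + math.exp(-2.0 * t / n)) / 2.0   # per-coordinate agreement prob.
    tv = 0.0
    for k in range(n + 1):                      # k = Hamming distance from start
        prob = math.comb(n, k) * (p ** (n - k)) * ((1 - p) ** k)
        tv += abs(prob - math.comb(n, k) * 2.0 ** (-n))
    return tv / 2.0

early = tv_from_uniform(1.0)                          # far from uniform
late = tv_from_uniform(5 * (n / 2.0) * math.log(n))   # several relaxation times
```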

Preliminaries
We start with results about the number of visits to a given configuration σ by a random walk on V_N before it reaches a vertex σ' ≠ σ. There are two initial conditions of interest: equilibrium and σ itself. Let τ_σ = inf{k ≥ 1 : J•_N(k) = σ}, where J•_N is the random walk on V_N. We know from the elementary theory of Markov chains that (7.17) holds (see e.g. Theorems 1.7.5 and 1.7.6 in [28]; to get (7.17) we also use the fact that the uniform measure on the vertices of the hypercube is invariant for J•_N, which is moreover irreducible). Let µ denote the uniform invariant measure for J•_N. Proof. We write the left-hand side of (7.18) as follows for all large enough n, where the equality marked * is due to the reversibility of J•_N, and the final inequality follows from Theorem 1.6 of [6].
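The basic fact behind these visit counts can be checked by simulation. The sketch below is ours (a tiny hypercube and an arbitrary pair of antipodal vertices chosen for illustration): by the strong Markov property, the number of visits V to σ before the walk first hits a fixed σ' ≠ σ is exactly geometric, since each return attempt fails independently with the same probability, so the conditional tail ratios P(V > k+1)/P(V > k) are constant.

```python
import random

# Geometric visit-count sketch: simple random walk on {-1,+1}^n started at
# "start"; count visits to "start" (including the initial one) before the
# first visit to the antipodal vertex "target".
random.seed(5)
n = 6
start = (1,) * n
target = (-1,) * n

def visits_before_hit():
    sigma, v = start, 1          # count the initial visit
    while True:
        i = random.randrange(n)  # flip a uniformly chosen coordinate
        sigma = sigma[:i] + (-sigma[i],) + sigma[i + 1:]
        if sigma == target:
            return v
        if sigma == start:
            v += 1

samples = [visits_before_hit() for _ in range(3000)]

def tail(k):
    return sum(1 for s in samples if s > k) / len(samples)

# for a geometric law these two conditional tail ratios coincide
r1 = tail(1) / tail(0)
r2 = tail(2) / tail(1)
```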

Time outside T_1
Let us estimate the time spent by X^N_1 outside T_1 until the first visit to a vertex σ_1 ∈ T_1, and between two consecutive visits to σ_1. Let us fix x_1 ∈ M_1 and take σ_1 = ξ^1_{x_1}.
Let U denote the first such time; it can be written as in (7.19), where the exponential random variables indexed by j, k ≥ 0 are iid with mean 1, independent of all the other random variables.
We note that J°_{N_1} and J°_{N_2} are independent discrete time random walks on the hypercubes V_{N_1} and V_{N_2}, respectively, each starting from its respective equilibrium distribution. Thus (7.20) holds. The conditional expectation on its right-hand side may be written as in (7.21), and thus lim_{M_1→∞} lim sup_{N→∞} E(U) = 0. Let now W_i denote the time spent by X^N_1 outside T_1 between the i-th and (i+1)-st visits to σ_1 ∈ T_1, i ≥ 1. A similar reasoning yields (7.22), where we used (7.17).

Let now Υ̃_i be the time spent outside M by X^N_1 on its i-th visit to σ_1 ∈ T_1; recall the notation in the paragraph of (7.3). A reasoning similar to that leading to (7.20) and (7.22) yields (7.26). As a corollary to (7.26) and (7.7), we have (7.27), recalling that x_1 = (ξ^1)^{-1}(σ_1), where E is a mean one exponential random variable.

Conclusion of the proof of Theorem 2.7
Let us now fit together the above points into an argument for Theorem 2.7. From the first claim at the beginning of the section (argued in Subsection 7.3), it is enough to show that X^N_1 restricted to M_1 converges to X̂_1 restricted to M_1. We already know from (7.27) that the sojourn times of X^N_1 at the various vertices of M_1 converge in distribution to the respective sojourn times of X̂_1. We only have to argue that the jump probabilities of X^N_1 restricted to M_1 converge to the uniform jump probabilities of X̂_1 restricted to M_1; but that is established in (3.13). We then have that X^N_1 converges in distribution to X̂_1 in Skorohod space, and the full statement readily follows from the third point claimed at the beginning of the section (and argued in Subsection 7.2).
This concludes the proof of Theorem 2.7.

Proof of Theorem 2.4
We start by showing that X^N spends virtually all of its time on M.

Time spent by X^N outside M
We will show that the expected time spent by X^N outside M until the first visit to M, or between consecutive visits to M, is small. Indeed, we will argue that the first such time (the others can be treated similarly) vanishes in probability as M_1, M_2 → ∞, uniformly in N. We make this precise next.
We will show that lim_{M_1→∞} lim_{M_2→∞} lim sup_{N→∞} Ū = 0 in probability. Given the tightness result for I/[c_{N_1} 2^{N_2}] given in Lemma A.4, it is enough to show that (8.3) and (8.4) hold for all R. Let us first point out that, for every i ≥ 1, Û_i is stochastically bounded above by U/[c_{N_1} 2^{N_2}] (see (7.19) above); (8.3) then follows from this and (7.23).
where G(x_1) is a geometric random variable with mean (N_2/N_1) exp{β ·}. We find that the expectation on the right-hand side equals γ^N_1(x_1) times the sum over x_2 > M_2 of the corresponding γ̃ terms, and (8.4) follows upon substitution into (8.5) and into the left hand side of (8.4).

Conclusion of the proof of Theorem 2.4
Given the result in Subsection 8.1 and the usual constancy interval matching argument used to show convergence in Skorohod spaces, it is enough to show convergence of the transition probabilities among sites of M (for the process restricted to M, which is a Markov jump process) to those of the limit process restricted to M (also a Markov jump process), along with convergence of the respective sojourn times. The latter convergence is quite clear, and the former follows immediately from Proposition 3.3.

Proof of Theorem 2.5
We start by observing that X^N spends virtually all of its time on M, by virtually the same argument as below fine tuning. Indeed, the expressions corresponding to (7.19), (7.22) and (7.24) in the present regime are the same except for the factor c_{N_1} 2^{N_2}, which is absent here. But that factor is bounded as N → ∞, and so the arguments of Subsections 7.3.2 and 7.3.3 carry through.
We can then repeat the argument concluding the proof of Theorem 2.4 in Subsection 8.2, once we have the convergence of the transition probabilities of X^N restricted to M to those of the limiting 2-level K process restricted to M. For x, y ∈ N, let P_N(x, y) denote the transition probability of X^N|_M. Then, using the remark in Subsubsection 6.2.1 and Proposition 3.1.i-1 and i-2, we have that P_N(x, y) converges to the quantity P(x, y) given in (9.1).
It is then enough to argue that P(x, y) is the transition probability from x to y for the 2-level K process restricted to M. We do that next.
Let X|_M denote X restricted to M. We can construct X|_M as follows. Let X̃ denote the 1-level K process used in the construction of X, as at the end of Subsection A.1. Let us now construct a 2-level process X̄ in the same way as X, except that we use X̃|_{M_1} instead of X̃. One readily checks that:
1. X̃|_{M_1} is a Markov jump process on M_1 with uniform initial state, uniform transitions on M_1 (loops allowed), and jump rate at x_1 ∈ M_1 given by 1/γ̃_1(x_1);
3. letting X̃_1 denote the jump chain of X̃|_{M_1} and, for z ∈ D_2, defining the events A_n = {during the (n+1)-st sojourn period of X̃_1, X̃_2 visits M_2}, A^z_n = {during the (n+1)-st sojourn period of X̃_1, X̃_2 visits z before visiting M_2 \ {z}}, we have that, given X̃_1, the events A^{*_n}_n, n ≥ 0, with *_n = z or blank for each n, are independent, with respective (conditional) probabilities given by:
(a) in the case *_n = blank: P(N_T > 0), where N is a Poisson counting process with intensity M_2, and T is exponential with mean γ̃_1(X̃_1(n)), N and T independent; N_T then has a geometric distribution with success parameter 1/(1 + M_2 γ̃_1(X̃_1(n))) =: λ̃_{X̃_1(n)}, and thus P(N_T > 0) = 1 − λ̃_{X̃_1(n)};
(b) similarly, the probability of A^z_n given X̃_1 equals (1 − λ̃_{X̃_1(n)})/M_2.
Let P(x, y) denote the transition probability of X|_M from x to y. From the above, we conclude that if x_1 = y_1, then P(x, y) = P(A^{y_2}_0) + Σ_{n≥1} Σ_{w_1,…,w_{n−1}∈M_1} P(A^c_0, …, A^c_{n−1}, A^{y_2}_n, X̃_1(1) = w_1, …, X̃_1(n−1) = w_{n−1}, X̃_1(n) = x_1) (9.2), which is readily checked to equal the expression in (9.1) in this case. When x_1 ≠ y_1, the same expression as in (9.2) holds for P(x, y), except for the first term of the sum, which is absent; and it again agrees with (9.1) in this case.
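The geometric-distribution claim in item (a) above can be verified by a one-line conditioning computation; here is a sketch, writing γ̃_1 for γ̃_1(X̃_1(n)):

```latex
% With N a Poisson counting process of intensity M_2 and T an independent
% exponential random variable of mean \tilde\gamma_1, conditioning on T gives
\[
  P(N_T = 0)
  = E\!\left(e^{-M_2 T}\right)
  = \int_0^\infty e^{-M_2 t}\,\frac{e^{-t/\tilde\gamma_1}}{\tilde\gamma_1}\,dt
  = \frac{1}{1 + M_2 \tilde\gamma_1}
  = \tilde\lambda_{\tilde X_1(n)},
\]
% so that P(N_T > 0) = 1 - \tilde\lambda_{\tilde X_1(n)}, as claimed;
% iterating the same computation shows N_T is geometric with this
% success parameter.
```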

Aging in the K processes
As anticipated in the introduction, we will derive aging results for σ^N via a two-stage scaling limit. We first take the limit at the extreme time scale, where there is no aging, since σ^N is close to equilibrium at that time scale: this we have already done in our scaling limit theorems. In the second stage, we take a small time limit of the limiting K processes. We will be concerned with correlation functions involving only the clock processes of the limiting processes, so we take the second limit only for the relevant clocks. We keep the presentation brief, in particular at and below fine tuning, since the issues involved are quite clear in those regimes, and the technicalities are well known and fairly straightforward.
Let Y = (Y_1, Y_2) be the K process representing the scaling limit of either X^N or X̂^N in Theorems 2.4, 2.5 and 2.7. We assume Y(0) = ∞ in the first case, (∞, ∞) in the second case, and (∞, X̃_2) in the third case, where X̃_2 is as in Theorem 2.7. Given θ > 0, we are interested in taking the following limit
lim_{t_w, t → 0, t/t_w → θ} Π(t_w, t_w + t), (10.1)
where Π(·, ·) was defined in (1.6). Recall also N_i defined in (1.5).

Below fine tuning
This is the simplest case since, on the one hand, Y_2 jumps within every nonempty time interval, so the event N_2 has probability 0 and there is no point in considering it. On the other hand, Y_1 is a uniform K process with waiting function given by a Poisson process with intensity measure (c_1/x^{α_1+1}) dx for some constant c_1. It is well known (see e.g. [7]) that the limit in (10.1) is given by the generalized arcsine law Asl_{α_1}(1/(1 + θ)). This follows readily from the scaling limit of the clock process of Y_1 at small times. This issue will come up again in the other temperature regimes, so we let it rest for this regime.
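For the reader's convenience, the generalized arcsine law Asl_α used throughout has the standard (Dynkin-Lamperti) form, as in, e.g., [7]:

```latex
\[
  \mathrm{Asl}_\alpha(u)
  = \frac{\sin(\alpha\pi)}{\pi}
    \int_0^u x^{\alpha-1}(1-x)^{-\alpha}\,dx,
  \qquad u \in [0,1],\ \alpha \in (0,1),
\]
% evaluated at u = 1/(1+\theta) in the aging limits of this section.
```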

At fine tuning
We first notice that, for i = 1, 2, N_i = {R_i ∩ (t_w, t_w + t) = ∅}, where R_1 and R_2 are the ranges of the clock processes Γ^1 and Γ, respectively (see the end of Subsection A.1 below). It is a simple matter to check that in this case N_2 ⊂ N_1, so indeed Π(t_w, t_w + t) = p P(N_1) + (1 − p) P(N_2). Let us now point out that, as is well known, (10.1) follows from a small time scaling limit for the respective clocks in an appropriate topology. Let us first examine Γ; recall the definition at the end of Subsection A.1 and the statement of Theorem 2.5. As pointed out at the end of Subsection A.1, within intervals of constancy of X̃, the increments of Γ are those of a uniform K process with waiting function f(x_1, ·), where x_1 is the constant value of X̃ within the interval. We know that the clock process of a uniform K process with waiting function given by a Poisson process with intensity measure (c/x^{α+1}) dx, α ∈ (0, 1), c any constant in (0, ∞), converges in the J_1 Skorohod metric to an α-stable subordinator in the small time limit, for almost every realization of the Poisson process (see e.g. [7] with a = 0). This, and the fact that the f(x_1, ·) are given by Poisson processes, iid in x_1, with intensity measure (c_2/x^{α_2+1}) dx for some constant c_2, yields ε^{−1} Γ(ε^{α_2} ·) → S_2(·) in distribution in the J_1 Skorohod space as ε → 0 for a.e. realization of the environment, where S_2 is an α_2-stable subordinator. It readily follows that almost surely lim_{t_w,t→0, t/t_w→θ} P(R_2 ∩ (t_w, t_w + t) = ∅) = Asl_{α_2}(1/(1 + θ)). It can also be readily checked that ε^{−1} Γ^1(ε^{α_1 α_2} ·) → S^1 := S_1 ∘ S_2(·) in distribution in the J_1 Skorohod space as ε → 0 for a.e. realization of the environment, where S_1 is an α_1-stable subordinator independent of S_2. S^1 is thus an α_1 α_2-stable subordinator, and it follows that almost surely lim_{t_w,t→0, t/t_w→θ} P(R_1 ∩ (t_w, t_w + t) = ∅) = Asl_{α_1 α_2}(1/(1 + θ)).
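The stability claim for S^1 = S_1 ∘ S_2 follows from a short Laplace-transform computation; here is a sketch, with c_1, c_2 denoting the constants in the Laplace exponents of S_1, S_2:

```latex
% For independent stable subordinators with
% E(e^{-\lambda S_i(t)}) = e^{-c_i \lambda^{\alpha_i} t}, i = 1, 2,
% conditioning on S_2 gives
\[
  E\!\left(e^{-\lambda S_1(S_2(t))}\right)
  = E\!\left(e^{-c_1 \lambda^{\alpha_1} S_2(t)}\right)
  = e^{-c_2\, c_1^{\alpha_2}\, \lambda^{\alpha_1\alpha_2}\, t},
\]
% which is the Laplace transform of an
% \alpha_1\alpha_2-stable subordinator.
```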

Above fine tuning
We first point out that, right after the weighted K process Y jumps out of any state in N^2, it makes infinitely many jumps within any nonempty open time interval. This tells us that N_1 = N_2 = {R ∩ (t_w, t_w + t) = ∅}, where R is the range of Γ, the clock process of Y (see Subsection A.1 below).
Let us then derive the small time limit of Γ. Recall from [17] that we may write Γ as in (10.5), where N^x, x ∈ N^2, are independent Poisson counting processes with rates γ_1(x_1), respectively, and the T^x_i are iid mean one exponential random variables. We will now argue that ε^{−1} Γ(ε^{α_2} ·) → S_2(·) (10.6) in distribution in the J_1 Skorohod space as ε → 0 for a.e. realization of γ_1, γ_2, where S_2 is an α_2-stable subordinator. It is enough to establish this convergence for the truncated clock Γ̄(r) defined by the corresponding restricted sum (see Lemma 2.1 in [7]). Since this is a subordinator, in order to get the small time convergence it is enough to consider the Laplace exponent of the small time scaled Γ̄, given by φ̄_ε(λ), a sum of terms of the form (1 − e^{−λ ε^{−1} γ_2(x)}). (10.8) It is convenient at this point to write the scaled inner sum as ε^{α_2} Σ_u (1 − e^{−λ ε^{−1} γ^1_2(u)}).
Now standard large deviation estimates and an application of Campbell's theorem, showing that Z := Σ_{u∈[0,1]} (1 − e^{−λ ε^{−1} γ^1_2(u)}) has an exponential moment, imply that φ̄_ε(λ) → const × r λ^{α_2}, where const = E(Z) Σ_{x_1∈N} γ_1(x_1). Given γ_1, this is the Laplace exponent of an α_2-stable subordinator, and (10.6) follows. Thus lim_{t_w,t→0, t/t_w→θ} Π(t_w, t_w + t) = Asl_{α_2}(1/(1 + θ)). We end this section on aging by pointing out, as anticipated in the introduction, that the aging results in [29] are consistent with ours only in the fine tuning regime. As also anticipated earlier, this is explained by the shorter time scale considered in that reference. At those time scales, all levels are supposed to age simultaneously. That indeed also happens in the fine tuning regime at the short extreme scale considered in this section. Recall the discussion in Subsubsection 1.2.2.
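The convergence criterion underlying (10.6) can be summarized as follows; this is a standard fact about subordinators, stated here informally for orientation:

```latex
% A subordinator S is characterized by its Laplace exponent \varphi via
\[
  E\!\left(e^{-\lambda S(r)}\right) = e^{-r\,\varphi(\lambda)},
  \qquad \lambda \ge 0,
\]
% with \varphi(\lambda) = c\,\lambda^{\alpha} in the \alpha-stable case.
% Pointwise convergence of the Laplace exponents
% \bar\varphi_\varepsilon(\lambda) \to c\,\lambda^{\alpha_2} thus gives
% convergence of one-dimensional (and, by independence of increments,
% finite-dimensional) distributions; monotonicity of the paths then
% upgrades this to convergence in the J_1 Skorohod topology.
```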
We may understand the aging results in the other regimes treated in detail so far as follows. Below fine tuning, we already explained that the second level is well within equilibrium, so it does not age at the short extreme time scale (nor at somewhat shorter time scales). The aging behavior in that phase comes from the first level, with its characteristic α_1 exponent.
Above fine tuning, we have the opposite behavior: the first level by itself would be in equilibrium, thus not aging, and aging comes from the second level, with its characteristic α 2 exponent.

Scaling limit at intermediate temperatures
In this section, we briefly state and discuss our scaling limit and aging results for β ∈ (β^1_cr, β^2_cr). We will be rather sketchy, trusting the experienced reader to readily fill in the gaps with standard arguments. Recall from the discussion around (7.1) that the total time spent by σ^N(·) on a single visit to a given σ_1 ∈ V_{N_1} can be written as in (11.1), where again G is a geometric random variable with mean (N_2/N_1) exp{β ·}, and T_i, i ≥ 0, are iid standard exponential random variables.
For top first level configurations σ_1 = ξ^1_{x_1}, the factor in front of the sum above contributes a constant (1/(1 − p)) almost surely in the limit as N → ∞, and thus we are left with properly scaling the sum itself. At this point we may replace G by ((1−p)/p) (1/c_{N_1}) γ_1(x_1) T, with T a standard exponential random variable, independent of all the other remaining random variables, and then resort to Proposition 1.9.ii of [25], which gives the proper scaling of the sum, as well as conditions under which the scaled sum satisfies a law of large numbers. Since the result in [25] applies to the REM (which is indeed the model appearing in the above sum), we need to do some translation in terms of our parameters. Upon doing that, we find that the proper scaling is given by c̄_N = c_{N_1} exp{−β^2 N_2 (1 − a)}, and, provided β < ((1−p)/(2√(ap))) β^* =: β_{FT}, the law of large numbers (11.2) holds as N → ∞, in probability, for each t > 0. It follows that the sum in (11.1), scaled by c̄_N, converges in distribution as N → ∞ to γ̄_1(x_1) T, where γ̄_1(·) := (1/p) γ_1(·). Given also that the transitions among top first level configurations are asymptotically uniform, we find that the asymptotic motion among the top first level configurations is consistent with a (uniform) K process. We can state the following result.
Theorem 11.1 (Intermediate temperatures). If β ∈ (β^1_cr, β^2_cr) ∩ (0, β_{FT}), then (11.3) holds. In order that the conditions on β above be nonempty, we of course need β_{FT} > β^1_cr, which is equivalent to p < 1/3; in this case, it may or may not happen that β_{FT} < β^2_cr. In the former case, (11.3) holds only on a (nonempty) subinterval of (β^1_cr, β^2_cr), namely (β^1_cr, β_{FT}); otherwise, it holds on the full intermediate interval.
If β F T < β < β cr 2 , then Theorem 1.10 in [25] tells us that the sum on the left hand side of (11.2), when properly rescaled, converges weakly to a stable subordinator instead. This signals aging, and thus the time scale is not extreme. This clarifies a point raised in Subsubsection 1.2.2.
Under the conditions of the above theorem, the following aging result readily follows in the same way as in the previous section:
lim_{t_w,t→0, t/t_w→θ} Π̃_1(t_w, t_w + t) = Asl_{α_1}(1/(1 + θ)), (11.4)
where Π̃_1(t_w, t_w + t) is the probability that X̃_1 makes no jump within (t_w, t_w + t).

A.1 K-processes
Let D be a countably infinite set, let ∞ denote a point not in D, and set D̄ = D ∪ {∞}. Let f, w : D → (0, ∞) satisfy (A.1). Consider {C_x, x ∈ D}, an independent family of Poisson counting processes such that C_x has intensity w_x for each x ∈ D, with associated point processes S = {(θ_x(i), i ≥ 1), x ∈ D} (the event times of the respective counting processes). Let ω : R_+ → D̄ be such that ω(θ_x(i)) = x for x ∈ D, i ≥ 1, and ω(s) = ∞ if s ∉ S. We note that ω is well defined almost surely. Let {T_s, s ∈ R_+} be an iid family of mean 1 exponential random variables. Let now ν be the atomic measure on R_+ concentrated on S given by
ν({s}) = f(ω(s)) T_s, s ∈ S, (A.2)
let Γ be its distribution function, namely Γ : R_+ → R_+ with Γ(r) = ν([0, r]), and let ϕ be the right continuous inverse of Γ. Then, for t ≥ 0, let X(t) := ω(ϕ(t)).

(A.3)
We call X thus defined a K-process on D̄ with waiting time function f and weight function w, starting at ∞. Notation: X ∼ K(f, w). In the particular case w ≡ 1, we call X a uniform K-process on D̄ with waiting time function f, and use the notation X ∼ K(f, 1). We also call Γ the clock process of X. We next define 2-level K-processes, as follows. Let X̃ be a uniform K-process on D̄ with (w ≡ 1 and) f as above, such that (A.1) is satisfied. Let D′ be a countably infinite set and, as before, set D̄′ = D′ ∪ {∞}. Let f′ : D × D′ → (0, ∞) be such that the analogue of (A.1) holds. Let {C′_x, x ∈ D′} be an iid family of intensity 1 Poisson counting processes, independent of {C_x, x ∈ D}, with associated point processes S′ = {(θ′_x(i), i ≥ 1), x ∈ D′}.
Let ω′ : R_+ → D̄′ be such that ω′(θ′_x(i)) = x for x ∈ D′, i ≥ 1, and ω′(s) = ∞ if s ∉ S′. We note that ω′ is well defined almost surely.
Let ν be an atomic measure on R + concentrated on S as follows.
Given the realization of the (1-level) K-process X̃, let I be the set of maximal intervals of constancy of X̃ (maximal time intervals on which X̃ is constant). The second step of the above description amounts to constructing, within each such interval, say [a, b), a 1-level uniform K-process with waiting time function f′(x, ·), where x is the constant value of X̃ within that interval. This results in what can be seen as an excursion of a 2-level K-process X = (X_1, X_2) with X_1 ≡ x. This excursion takes place within the time interval [a′, b′), with a′ = Γ(a), b′ = Γ(b). Outside the union of all such intervals, X ≡ (∞, ∞).
We call Γ the clock process of X. We also call Γ^1 := Γ̃ ∘ Γ the clock process of X_1, where Γ̃ is the clock process of X̃.
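As a rough numerical illustration of the 1-level construction above, the following minimal sketch samples the atoms of ν and evaluates X(t) = ω(ϕ(t)). It assumes a finite truncation of D and a finite Poisson horizon (both hypothetical simplifications, not part of the construction itself), so that ∞ appears only beyond the truncated clock range rather than at accumulation points of small atoms.

```python
import random

def simulate_k_clock(f, w, horizon, rng):
    """Sample the atoms of the clock measure nu of a (truncated)
    K-process K(f, w): for each state x, Poisson(w[x]) event times
    theta_x(i) on [0, horizon], each carrying mass f[x] * Exp(1).
    Returns the atoms (s, x, mass) sorted by the Poisson time s."""
    atoms = []
    for x, wx in w.items():
        s = 0.0
        while True:
            s += rng.expovariate(wx)      # next event time of C_x
            if s > horizon:
                break
            atoms.append((s, x, f[x] * rng.expovariate(1.0)))
    atoms.sort()
    return atoms

def k_state(atoms, t):
    """Evaluate X(t) = omega(phi(t)): walk through the sojourn
    intervals determined by the atom masses; beyond the truncated
    clock's range the process is reported at infinity."""
    clock = 0.0
    for _, x, mass in atoms:
        if clock <= t < clock + mass:
            return x                      # t falls in the sojourn at x
        clock += mass
    return "infinity"
```

A quick use: with `f = {1: 2.0, 2: 1.0, 3: 0.5}` and `w = {1: 1.0, 2: 1.0, 3: 1.0}`, `simulate_k_clock(f, w, 5.0, random.Random(1))` yields the clock's atoms, and `k_state` recovers which state the process occupies at a given process time.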
Proof. Since H_1(y_1) = √(aN) Ξ_{y_1}, and {Ξ_{y_1}, y_1 ∈ D_1} are iid standard Gaussian random variables, we have that the left hand side of (A.7) dominates a binomial random variable with N_1 trials and probability of success Φ(−1/(β√a)) in each trial, where Φ is the standard Gaussian distribution function. Therefore, by standard large deviation bounds, there exists ε_1 > 0 such that the probability of the complement of (A.7) may be bounded above by c_1 2^{−ε_1 N_1} for some constant c_1. Now the probability of C^c_N may be bounded above as displayed, for some constant c_0 and ε_0 = ε_0(ε̄_0) > 0 such that ε_0 → 0 as ε̄_0 → 0; the result follows by choosing ε̄_0, ε_1 > 0 such that 0 < ε_0 < ε_1.
Lemma A.2. Let J_2 be the number of jumps of X^N_2 until X^N_1 reaches ξ^{−1}_1(T_1). Given σ_1 ∈ V_{N_1} such that d_1(σ_1, T_1) > ε_0 N_1, then, provided B_N and C_N occur and X^N_1 starts from ξ^{−1}_1(σ_1), we have that P_{σ_1}(J_2 ≤ N^3) vanishes exponentially fast as N → ∞; i.e., there exists R > 1 such that P_{σ_1}(J_2 < N^3) ≤ R^{−N}. (A.8)
Proof. We may write J_2 as in (A.9), where J°_{N_1} is the jump chain of X^N_1, namely a simple symmetric discrete time random walk on V_{N_1}, and G_j(σ_1), j ≥ 0, σ_1 ∈ V_{N_1}, are independent geometric random variables with success parameter q_N := r e^{−√N} ∧ 1. The right-hand side of (A.9) may be bounded stochastically from below as follows. Notice that, since B_N occurs, the chain d_1(J°_{N_1}(·), T_1), observed only when J°_{N_1} is at distance at most ε_0 N_1 from T_1, may be identified (in distribution) with a Markov chain Z on {0, 1, …, ε_0 N} which has the same transition probabilities as Y := d(J°_N(·), O), where J°_N is a simple symmetric random walk on a hypercube of dimension N and O is a given site of that hypercube, except that Z is lazy at ε_0 N: the jumps of Y to the right of ε_0 N are replaced by self jumps of Z at ε_0 N. Also, in this case, Z starts from ε_0 N. Since C_N occurs, every time J°_{N_1} jumps within distance at most ε_0 N of T_1, independently of all else, there is a probability of at least ε_1 that it lands on a site σ′_1 ∈ V_{N_1} such that q^*_N(σ′_1) ≤ q_N. The above justifies stochastically bounding the right-hand side of (A.9) from below as follows, where J_1 is the number of jumps Y takes to reach 0 starting from ε_0 N, and G_j, j = 0, 1, …, are iid random variables independent of J_1; G_0 is a mixture of two random variables: the first, with weight 1 − ε_1, is 0, and the other, with weight ε_1, is geometric with success parameter q_N. Now, P(G_0 ≥ N^3) = ε_1 (1 − q_N)^{N^3} ≥ ε̄_1 > 0 for all large enough N, and (A.8) follows with R = (1 − ε̄_1)^{−ε_0}.
Remark A.3. By Theorem 3.1 of [6], starting from any σ_i ∈ V_{N_i}, the probability that X^N_i does not travel a distance N_i/2 from σ_i before it returns to σ_i is bounded above by 1/N + 4/N^2.
Proof of Lemma A.5. We start by computing π(x). Let G denote the number of jumps of X^N_2 before X^N_1 leaves x. G is a geometric random variable with success parameter q^*_N(ξ^1_x), independent of τ_{σ_2}. Then, setting λ_N = N_1/(2 c_{N_1} γ^N_1(x)), and applying (4.13)-(4.14), we get an explicit expression for π(x), and the first claim of (6.38) follows upon noticing that the sum in its denominator is ∼ 2^{N_2+1}/N_2.
As for the second claim, we write π̄(x_1) = P(∪_{x_2∈M_2} H_{x_2} | X^N_1(0) = x_1), where H_{x_2} is the event that X^N_2 hits x_2 before the first jump of X^N_1. By the Bonferroni inequalities, we have (A.12). Since the summands in the central expression of (A.12) are identically equal to π(x_1), and the expression on its right-hand side equals π(x_1) Σ_{x_2 ≠ x′_2 ∈ M_2} P(H_{x′_2} | X^N_1(0) = x_1, H_{x_2}), it is enough to argue that each summand in the latter expression is o(1). But, given Lemma 4.5 above, each such summand is, apart from an o(1) error, the probability that, starting from the origin, an Ehrenfest chain on {0, …, N_2} passes by bN_2, with b = 1/3, before an independent time which is geometrically distributed with success probability q^*_N(x_1). Writing that probability as a moment generating function as above (see e.g. the first equality in (A.11)), and applying (4.13)-(4.14), we find that it equals the expression in (A.13). The quotient inside the latter sum is bounded above by 1, and thus (A.13) is bounded above by (A.14). As we saw above, the first term of this sum is ∼ γ_1(x_1) (N_2/N_1) (1/(c_{N_1} 2^{N_2})), which is o(1). The second term is readily checked to also be o(1), and the claim is established.
As for the second claim, we writeπ(x 1 ) = P(∪ x2∈M2 H x2 |X N 1 (0) = x 1 ), where H x2 is the event thatX N 2 hits x 2 before the first jump ofX N 1 . By the Bonferroni inequalities, we have that Since the summands on the central expression above are identically equal to π(x 1 ), and the expression on the right-hand side equals π(x 1 ) x2 =x 2 ∈M2 P(H x 2 |X N 1 (0) = x 1 , H x2 ), it is enough to argue that each summand in the expression above is an o(1). But, given Lemma 4.5 above, each such summand is, apart form an o(1) error, the probability that, starting from the origin, an Ehrenfest chain on {0, . . . , N 2 } passes by bN 2 , with b = 1/3, before an independent time which is geometrically distributed with success probability q * N (x 1 ). Writing that probability as a moment generating function as above (see e.g. the first equality in (A.11)), and applying (4.13, 4.14), we have that that equals The quotient inside the latter sum is bounded above by 1, and thus (A.13) is bounded above by (A.14) As we saw above the first term of this sum is ∼ γ 1 (x 1 ) N2 N1 1 c N 1 2 N 2 , which is an o(1). The second term is readily checked to also be an o(1), and the claim is established.