From Derrida's random energy model to branching random walks: from 1 to 3

We study the extremes of a class of Gaussian fields with in-built hierarchical structure. The number of scales in the underlying trees depends on a parameter alpha in [0,1]: choosing alpha=0 yields the random energy model by Derrida (REM), whereas alpha=1 corresponds to the branching random walk (BRW). When the parameter alpha increases, the level of the maximum of the field decreases smoothly from the REM- to the BRW-value. However, as long as alpha<1 strictly, the limiting extremal process is always Poissonian.


Introduction and main result
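The Gaussian fields we consider are constructed as follows. Let $\alpha \in [0,1]$ and $N \in \mathbb{N}$. We refer to the parameter $N$ as the size of the system. For $j = 1, \dots, N^{\alpha}$ and $\sigma_j = 1, \dots, \exp(N^{1-\alpha}\log 2)$, consider the vectors $\sigma = (\sigma_1, \dots, \sigma_{N^{\alpha}})$. (We assume, without loss of generality, that $N$ and $\alpha$ are such that $N^{\alpha}$ and $N^{1-\alpha}$ are both integers.) We refer to the indices $j = 1, \dots, N^{\alpha}$ as scales, and to the labels $\sigma$ as configurations. The space of configurations is denoted by $\Sigma_N^{(\alpha)}$. To each scale $j$ and each partial label $(\sigma_1, \dots, \sigma_j)$ we attach independent centered Gaussian random variables $X^{(\alpha,j)}_{\sigma_1, \dots, \sigma_j}$ with variance $N^{1-\alpha}$, and define the field
$$X^{(\alpha,N)}_{\sigma} \equiv \sum_{j=1}^{N^{\alpha}} X^{(\alpha,j)}_{\sigma_1, \dots, \sigma_j}, \qquad \sigma \in \Sigma_N^{(\alpha)}. \tag{1.1}$$
This is a centered Gaussian field with covariance $E\big[X^{(\alpha,N)}_{\sigma} X^{(\alpha,N)}_{\tau}\big] = (\sigma \wedge \tau)\, N^{1-\alpha}$, where $\sigma \wedge \tau \equiv \inf\{j \leq N^{\alpha} : (\sigma_1, \dots, \sigma_j) = (\tau_1, \dots, \tau_j) \text{ and } \sigma_{j+1} \neq \tau_{j+1}\}$. In spin glass terminology, $\sigma \wedge \tau$ is the overlap of the configurations $\sigma$ and $\tau$. In other words, the Gaussian field $X^{(\alpha,N)}$ is hierarchically correlated. The parameter $\alpha$ governs the number of scales in the underlying "trees". The choice $\alpha = 0$ yields the celebrated REM of Derrida [12]; in this case the tree consists of a single scale (only for this boundary case is the field uncorrelated). The choice $\alpha = 1$ yields the (classical) BRW, also known as the directed polymer on Cayley trees [14]: in this model, the number of scales grows linearly with the size of the system. In this sense, the fields $X^{(\alpha,N)}$ interpolate between REM and BRW (note that these boundary cases are, within our class, the least and the most correlated fields, respectively). See Figure 1 below for a graphical representation.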

Figure 1. Trees interpolating between REM and BRW
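The tree construction can be made concrete on a toy instance. The following is a minimal sketch, not part of the original argument: it assumes that each node of the tree carries an independent centered Gaussian of variance $N^{1-\alpha}$ and that a configuration's energy is the sum along its branch. The names `sample_field` and `overlap`, the zero-based labels, and the convention that the overlap of a configuration with itself equals the number of scales are illustrative choices.

```python
import itertools
import math
import random


def sample_field(scales, block, rng):
    """Sample a toy hierarchical field.

    scales plays the role of N^alpha, block that of N^(1-alpha), so N = scales * block.
    Labels sigma_j run over 0 .. 2^block - 1 (zero-based, unlike the text).
    Assumption: node increments are independent N(0, block) variables.
    Returns (field, increments): energies per configuration, Gaussians per tree node.
    """
    branching = 2 ** block            # exp(N^(1-alpha) * log 2) children per node
    std = math.sqrt(block)            # standard deviation of one increment
    increments = {}
    field = {}
    for sigma in itertools.product(range(branching), repeat=scales):
        energy = 0.0
        for j in range(1, scales + 1):
            prefix = sigma[:j]        # node of the tree at scale j
            if prefix not in increments:
                increments[prefix] = rng.gauss(0.0, std)
            energy += increments[prefix]
        field[sigma] = energy
    return field, increments


def overlap(sigma, tau):
    """Number of leading scales on which sigma and tau coincide."""
    k = 0
    for s, t in zip(sigma, tau):
        if s != t:
            break
        k += 1
    return k


if __name__ == "__main__":
    field, _ = sample_field(scales=2, block=2, rng=random.Random(0))
    print(len(field), "configurations; maximum =", max(field.values()))
```

With `scales=1` the sketch reduces to independent energies (the REM case), while `block=1` gives binary branching at every scale (the BRW case).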
A fundamental question in the study of random fields concerns the behavior of the extreme values in the limit of large system-size. The case of independent random variables is simple, and completely understood, see e.g. the classic [20]. On the other hand, the study of the extremes of correlated random fields is a much harder question. There is good reason to develop an extreme value theory for Gaussian fields defined on trees: besides being typically amenable to a detailed analysis (see e.g. [3,5,7,8,9,10,15,21]), Gaussian hierarchical fields should be some sort of "universal attractors" in the limit of large system-size; this claim is a major pillar of the Parisi theory [23], and has remained rather elusive to this day (see however [18] and references therein for some recent advances). Our main result provides a characterization of the weak limit of the extremes of the hierarchical field (1.1).
Theorem 1. Let $\alpha \in [0,1)$ and $\beta_c \equiv \sqrt{2\log 2}$. Set
$$a_N^{(\alpha)} \equiv \beta_c N - \frac{1+2\alpha}{2\beta_c}\log N,$$
and consider the random Radon measure on the real line
$$\Xi_N^{(\alpha)} \equiv \sum_{\sigma \in \Sigma_N^{(\alpha)}} \delta_{X^{(\alpha,N)}_{\sigma} - a_N^{(\alpha)}}.$$
Then $\Xi_N^{(\alpha)}$ converges weakly, as $N \uparrow \infty$, to a Poisson point process $\Xi$ with intensity measure $\mu(A) = \int_A e^{-\beta_c x}\,dx/\sqrt{2\pi}$.

Apart from the case $\alpha = 0$, the picture depicted in Theorem 1 seems to be new. There is good reason to leave out the case $\alpha = 1$: to clarify this, and to shed further light on our main result, a few remarks are in order. First, the theorem implies that $a_N^{(\alpha)}$ is the level of the maximum of the random field $X^{(\alpha,N)}$, and $\Xi_N^{(\alpha)}$ is then the extremal process. It readily follows from the convergence of the extremal process that the maximum of the field, recentered by its level, converges weakly to a Gumbel distribution. As expected in light of (say) Slepian's Lemma, the level of the maximum decreases when $\alpha$ (hence the amount of correlations) increases. However, this feature is only detectable at the level of the second order, logarithmic corrections; curiously, the pre-factor $1 + 2\alpha$ interpolates smoothly between the REM- and the BRW-values ("from 1 to 3"). Nevertheless, as long as $\alpha < 1$ strictly, and in spite of what might look at first sight like severe correlations, all our models fall into the universality class of the REM, which is indeed characterized by convergence towards Poissonian extremal processes. In the boundary case of the BRW, the picture is only partially correct: the logarithmic correction is still given by $a_N^{(\alpha)}$ with $\alpha = 1$, see [1,2,11], yet the weak limit of the maximum is no longer a Gumbel distribution [19], nor is the limiting extremal process a simple Poisson process [3,5,13,21].
We conclude this section with a sketch of the proof of our main result. A natural approach would be to choose $a_N^{(\alpha)}$ such that the expected number of extremal configurations in any given compact $A \subset \mathbb{R}$ is of order one in the large $N$-limit. However, with the level of the maximum as given by Theorem 1, classical Gaussian estimates readily yield
$$E\big[\Xi_N^{(\alpha)}(A)\big] \sim N^{\alpha} \mu(A), \qquad N \uparrow \infty,$$
which explodes as soon as $\alpha > 0$ strictly. The reason for this is easily identified: by linearity of the expectation, we are completely ignoring correlations, but these turn out to be strong enough to affect the level of the maximum. To overcome this problem, we rely on the multi-scale analysis which has emerged in the study of the extremes of branching Brownian motion (see e.g. [18]).

To formalize, we need some notation. First, for a given $\sigma \in \Sigma_N^{(\alpha)}$, we refer to the process $S^{\sigma} \equiv (S^{\sigma}_k,\ k = 0, \dots, N^{\alpha})$, where $S^{\sigma}_k$ is the sum of the contributions to $X^{(\alpha,N)}_{\sigma}$ coming from the first $k$ scales, as the path of a configuration. (The process $S^{\sigma}$ is a random walk with Gaussian increments, i.e. a discrete Brownian motion.) We refer to any function $F_N : \{0, \dots, N^{\alpha}\} \to \mathbb{R}$ as a barrier, and to the point process $\Xi^{(\alpha)}_{N,F_N}$ obtained from $\Xi_N^{(\alpha)}$ by retaining only those configurations whose paths stay below $F_N$ as the modified (extremal) process. A key step in the proof is to identify a barrier $E_N$, see (2.8) below for its explicit form, such that for any compact $A \subset \mathbb{R}$,
$$\lim_{N \to \infty} P\big[\Xi_N^{(\alpha)}(A) \neq \Xi^{(\alpha)}_{N,E_N}(A)\big] = 0. \tag{1.2}$$
This naturally entails that the weak limit of the extremal process and that of the modified process must coincide (provided one of the two exists). We will thus focus our attention on the modified process $\Xi^{(\alpha)}_{N,E_N}$, and prove that the mean of the process as well as its avoidance functions converge to the Poissonian limit as given by Theorem 1, to wit:
$$\lim_{N \to \infty} E\big[\Xi^{(\alpha)}_{N,E_N}(A)\big] = \mu(A), \tag{1.3}$$
$$\lim_{N \to \infty} P\big[\Xi^{(\alpha)}_{N,E_N}(A) = 0\big] = P\big[\Xi(A) = 0\big]. \tag{1.4}$$
By (1.3) and (1.4), it follows from Kallenberg's theorem on Poissonian convergence [17] that the modified process converges weakly to the Poisson point process $\Xi$; but by (1.2), the same must be true for the extremal process, settling the proof of Theorem 1.
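The blow-up of the naive first moment can be checked numerically. The sketch below is illustrative only and rests on assumptions read off from the surrounding text: $2^N$ configurations, each energy a centered Gaussian of variance $N$, critical value $\beta_c = \sqrt{2\log 2}$, and the level $\beta_c N - \frac{1+2\alpha}{2\beta_c}\log N$. The function names are hypothetical.

```python
import math

BETA_C = math.sqrt(2.0 * math.log(2.0))


def level(n, alpha):
    """Assumed level of the maximum: beta_c * N - (1 + 2 alpha) / (2 beta_c) * log N."""
    return BETA_C * n - (1.0 + 2.0 * alpha) / (2.0 * BETA_C) * math.log(n)


def gaussian_tail(x, variance):
    """P[Z > x] for a centered Gaussian Z with the given variance."""
    return 0.5 * math.erfc(x / math.sqrt(2.0 * variance))


def expected_count(n, alpha, lo, hi):
    """First moment: 2^N * P[X - level in [lo, hi]], ignoring all correlations."""
    a = level(n, alpha)
    prob = gaussian_tail(a + lo, n) - gaussian_tail(a + hi, n)
    return (2.0 ** n) * prob


def limit_intensity(lo, hi):
    """mu([lo, hi]) for the density exp(-beta_c x) / sqrt(2 pi)."""
    scale = BETA_C * math.sqrt(2.0 * math.pi)
    return (math.exp(-BETA_C * lo) - math.exp(-BETA_C * hi)) / scale


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75):
        for n in (100, 400, 900):
            ratio = expected_count(n, alpha, 0.0, 1.0) / limit_intensity(0.0, 1.0)
            print(f"alpha={alpha:4.2f}  N={n:4d}  E[count]/mu(A)={ratio:9.3f}  N^alpha={n ** alpha:9.3f}")
```

Under these assumptions the printed ratio tracks $N^{\alpha}$, which is of order one only for $\alpha = 0$; the barrier argument is what removes this factor for $\alpha > 0$.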
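The rest of the paper is devoted to the proof of (1.2), (1.3) and (1.4). Since $\alpha \in [0, 1)$ is fixed throughout, we lighten the notation by dropping the $\alpha$-dependence whenever no confusion can possibly arise, writing e.g. $a_N$ for $a_N^{(\alpha)}$, $\Xi_N$ for $\Xi_N^{(\alpha)}$, and $\Sigma_N$ for $\Sigma_N^{(\alpha)}$.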
Acknowledgements. This paper owes much to conversations with Bernard Derrida, who raised in particular the question whether models with an increasing number of scales can provide (possibly quantitative) insights into the fractal structure of the extremal process of BRW/branching Brownian motion. Unfortunately, our main result shows that this is not the case, at least as long as α < 1. It is tempting to believe that letting α depend on the size of the system (i.e. α(N ) → 1, as N ↑ ∞) gives rise to more interesting extremal processes.

Barriers, and the modified processes
The goal of this section is to construct the barrier $E_N$ to which we alluded in the introduction, and to give a proof of (1.2) and (1.3). In a first step, we construct a barrier which is not "optimal", but which provides important a priori information:
It then holds: lim

Proof. By Markov's inequality, and simple counting, it holds: (2.1)
By classical Gaussian estimates, the probability on the r.h.s. above is at most
Using this, and straightforward estimates, we get
which evidently vanishes in the large $N$-limit, since $\frac{3\alpha - 1}{2} < \beta_c$.
The above Lemma immediately implies that the weak limit of the extremal process $\Xi_N$ and the weak limit of the modified process $\Xi_{N,U_N}$ must necessarily coincide (provided one of the two exists). We now identify conditions under which this remains true for barriers which lie even lower than $U_N$.

Lemma 3. Consider a barrier $F_N$ with the following properties:
Then the weak limits of $\Xi_{N,F_N}$ and $\Xi_{N,U_N}$ coincide (provided one of the two exists).
Proof. The Lemma readily follows from the claim
The proof of (2.2) is straightforward. Simple rearrangements and subadditivity imply that for the probability of the complementary event, it holds:
Taking the complement, (2.2) immediately follows.
By the previous Lemma, and in view of a proof of the main theorem, it is crucial to identify conditions under which the means of the modified processes converge to a finite limit. This is done by

Proposition 4. For $A \subset \mathbb{R}$ compact, and $\mu$ as in Theorem 1, it holds:

Proof. By linearity of the expectation, and by conditioning on the "terminal event",
Let us focus on the conditional probability: we first write this as
Inspection of the covariances shows that the Gaussian vector $\big(S^{\sigma}_k - \frac{k}{N^{\alpha}} X_{\sigma},\ k = 1, \dots, N^{\alpha}\big)$ is, in fact, independent of $X_{\sigma}$. Using this, and rescaling by (2.5)
Again by inspection of the covariances, one immediately realizes that the law of the rescaled vector, indexed by $k = 1, \dots, N^{\alpha}$, is that of a (discrete) Brownian bridge of lifespan $N^{\alpha}$, starting and ending in 0. To lighten the notation, let $(B_{N^{\alpha}}(k),\ k \leq N^{\alpha})$ be such a Brownian bridge, and shorten
It thus holds:
One immediately checks that within our choice of the barrier $F_N$, and since $\alpha < 1$ strictly,
in which case it follows from the Lemmata in the Appendix that
uniformly for $x$ in compacts, and for $N \uparrow \infty$. Plugging this into (2.3), we have
The claim of the Proposition then immediately follows by straightforward estimates on the Gaussian density.
We can finally specify our choice of the barrier $E_N$ alluded to in the introduction. The optimal choice is (by far) not unique, and depends on an additional free parameter $\gamma$. The only requirement is that
With any $\gamma$ satisfying (2.7), and $U_N$ as in Lemma 2, we set
This choice of a barrier clearly satisfies the assumptions of Proposition 4 and also of Lemma 3. This has two fundamental consequences: first, the weak limit of the modified process $\Xi^{(\alpha)}_{N,E_N}$ and that of the extremal process $\Xi^{(\alpha)}_N$ must necessarily coincide (provided one of the two exists); second, the mean of the modified process converges to the alleged limit, i.e. (1.3) holds with $E_N$ as a barrier. Theorem 1 will thus follow as soon as we prove that the avoidance functions (1.4) also converge with the very same choice for the barrier. This will be done in the next section.

Before that, we briefly comment on the choice (2.8) of the barrier. (The discussion is intentionally informal: for details, the reader is referred e.g. to [18].) By Lemma 2, the path of extremal configurations (the process $k \mapsto S^{\sigma}_k$ for $\sigma$ such that $X_{\sigma} \approx a_N$) must necessarily satisfy the "$U_N$-barrier condition". As we have seen in Proposition 4, conditioning onto the terminal event turns the path into a Brownian bridge which is required to stay below 0 during its lifespan. It is well known that in order to achieve this, the bridge will behave, to a good approximation, as the negative of its modulus, $k \mapsto -|S^{\sigma}_k|$, which is typically much lower than the shift $-N^{\gamma} 1_{k \neq 0, N^{\alpha}}$ for $\gamma < (1 - \alpha)/2$ (this is the so-called entropic repulsion, see e.g. [4]). In other words, requiring that the paths stay below $E_N$ is no stricter a requirement than asking them to stay below $U_N$. On the other hand, restricting the analysis to configurations whose paths stay below $E_N$ forces the expected number of correlated extremal pairs to vanish in the large $N$-limit: this stands crucially behind the Chen-Stein method which we implement below.

Convergence of the avoidance functions
The goal of this section is to prove (1.4), which we recall reads
where $E_N$ is given by (2.8), $A$ is any compact set, and $\Xi$ is a Poisson point process with intensity measure $\mu(A) = \int_A e^{-\beta_c x}\,dx/\sqrt{2\pi}$. To do so, we will use the so-called Chen-Stein method [6, Theorem 1A]. We begin with a warm-up computation. In what follows, we write $E_N(\sigma)$ for the event that a configuration $\sigma$ satisfies the "$E_N$-barrier condition", more precisely:
Recall that for two configurations $\sigma, \tau \in \Sigma^{(\alpha)}_N$, we denote by $\sigma \wedge \tau$ their overlap, namely the first scale at which the two configurations do not coincide.
Lemma 5 (Extremal pairs). Let $A \subset \mathbb{R}$ be compact. With the above notations, it holds:

It follows from Lemma 5 that the energies of extremal configurations are, in fact, independent random variables. It will hardly come as a surprise that this feature stands behind the onset of the Poisson point process in the large $N$-limit.
Proof of Lemma 5. By linearity of the expectation, and re-arranging the ensuing sum according to the possible overlap-values, it holds:
Let us focus on the probability on the r.h.s. above: since $\sigma$ and $\tau$ coincide up to scale $K$, by conditioning on the "trunk" which is shared by $\sigma$ and $\tau$, we get
On the event appearing in (P) we drop the $E$-requirements: by independence of the paths after the "branching point", this leads to
This readily implies that the r.h.s. of (3.4) is at most (3.5)
where $\lambda$ denotes Lebesgue measure. The argument of the exponential in (3.5) is easily seen to be bounded by β c (3 ln (3.2), and using that
which evidently vanishes in the large $N$-limit.
We can now finally move to the last missing piece, namely a proof of convergence of the avoidance functions (3.1). As mentioned, the main technical device here will be the so-called Chen-Stein method [6, Theorem 1A]. To implement this, we need to introduce some notation. For compact $A \subset \mathbb{R}$, we shorten
and denote by $L_N(A)$ the law of the random variable $\Xi_{N,E_N}(A)$. For a (sigma-finite) measure $\nu$ on $\mathbb{R}$, we denote by $\mathrm{Pois}_{\nu(A)}$ the law of a Poisson random variable with mean $\nu(A)$. For two probability measures $\rho, \rho' \in M_1(\mathbb{R})$ on $\mathbb{R}$, we denote by $d_{TV}(\rho, \rho')$ their distance in total variation. In order to stick closely to the notation in [6], we write
and define, for given $\sigma \in \Sigma_N$,
For a last piece of notation, we shorten $p_{\sigma} \equiv E[I_{\sigma}]$.
Coming back to our main task of proving (3.1), with $\mu(A)$ as in Theorem 1, it holds:
The convergence of $\mu_N(A)$ towards $\mu(A)$ is guaranteed by Proposition 4; by virtue of simple properties of Poisson random variables, this convergence implies that the second term on the r.h.s. above vanishes in the limit of large $N$. Concerning the first term on the r.h.s. of (3.7): the Chen-Stein method [6, Theorem 1A] yields the bound
Since for any $\sigma \in \Sigma_N$, $p_{\sigma} = 2^{-N} \mu_N(A)$, one immediately gets
and by simple counting, (3.10)
Plugging (3.9) and (3.10) in (3.8) we get
the last equality by definition of $Z_{\sigma}$. Since $\mu_N(A)$ converges to a finite limit (by Proposition 4), the first two terms in the last display of (3.11) vanish in the limit of large $N$; the third term is exactly what was analyzed in Lemma 5, and therefore also vanishes. All in all, (3.7) vanishes, hence (3.1) holds and the proof of Theorem 1 is concluded.
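For orientation, the quality of a Chen-Stein type bound can be inspected in the simplest possible setting. The sketch below is a toy case, not the dependent setting treated above: it takes $n$ independent indicators with common success probability $p$, so that the neighbourhood terms are absent and only the sum of the $p_{\sigma}^2$ survives. The form $\min(1, 1/\lambda) \sum p_{\sigma}^2$ of the bound and all function names are assumptions made for illustration.

```python
import math


def binomial_pmf(n, p, k):
    """P[Bin(n, p) = k], computed in log-space for numerical stability (0 < p < 1)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson_pmf(lam, k):
    """P[Pois(lam) = k], computed in log-space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def tv_binomial_poisson(n, p):
    """Total variation distance between Bin(n, p) and Pois(n p)."""
    lam = n * p
    diff = sum(abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(n + 1))
    # Poisson mass above n, where the binomial has no mass at all.
    poisson_tail = max(0.0, 1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1)))
    return 0.5 * (diff + poisson_tail)


def chen_stein_bound_independent(n, p):
    """Assumed bound min(1, 1/lambda) * sum of p^2 for independent indicators."""
    lam = n * p
    return min(1.0, 1.0 / lam) * n * p * p


if __name__ == "__main__":
    mean = 1.5                        # stands in for mu_N(A)
    for big_n in (4, 8, 12):
        n = 2 ** big_n                # number of configurations
        p = mean / n                  # p_sigma = 2^(-N) * mu_N(A)
        print(big_n, tv_binomial_poisson(n, p), chen_stein_bound_independent(n, p))
```

In this toy case both the distance and the bound decay like $2^{-N}$, mirroring the role of the term built from $p_{\sigma} = 2^{-N}\mu_N(A)$ in the estimate above.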

Appendix
Fundamental ingredients in the proof of Theorem 1 are the estimates (2.6) on Brownian bridge probabilities appearing in the proof of Proposition 4; these are somewhat classical [16], sometimes going under the name of "ballot theorems". For the reader's convenience, we give here a short proof of the estimates as needed in our framework.

Lemma 6. Let $(\Delta_i)_{i \in \{0, \dots, n-1\}}$ be i.i.d. random variables having a density with respect to the Lebesgue measure and $(B_n(j),\ j \in \{1, \dots, n\})$ the related bridge, i.e. $B_n(j) \equiv \sum_{i=0}^{j-1} \Delta_i - \frac{j}{n} \sum_{i=0}^{n-1} \Delta_i$. Then
$$P\big[B_n(j) \leq 0,\ j = 1, \dots, n\big] = \frac{1}{n}. \tag{3.12}$$

Proof. We refer to $(\Delta_i)_{i \in \{0, \dots, n-1\}}$ as increments. The event in (3.12) is equivalent to the maximum of the bridge being attained at the origin. Let $m \in \{0, \dots, n-1\}$ be the position of the maximum; note that this is almost surely unique by the density-assumption. One readily checks that applying a cyclic permutation, say $\pi$, to the increments of the bridge shifts the position of the maximum to $\pi^{-1} m$. There is exactly one cyclic permutation, say $\hat{\pi}$, which shifts the position of the maximum to the origin, i.e. for which $\hat{\pi}^{-1} m = 0$. On the other hand, the distribution of the bridge is not affected by any permutation, hence $\hat{\pi}$ must be uniformly distributed among the $n$ possible cyclic permutations: since the event in (3.12) is equivalent to $\hat{\pi}$ being the identity, the Lemma follows.
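The cyclic-permutation argument lends itself to a direct numerical check. The following sketch is illustrative, with hypothetical function names and standard Gaussian increments as an assumed example of a law with a density. It verifies on random samples that exactly one cyclic shift of the increments produces a bridge staying below zero, and estimates the probability of that event by Monte Carlo.

```python
import random


def bridge(increments):
    """Bridge values B_n(1), ..., B_n(n-1); B_n(n) = 0 holds by construction."""
    n = len(increments)
    mean = sum(increments) / n
    path, partial = [], 0.0
    for j in range(1, n):
        partial += increments[j - 1] - mean   # subtracting the mean pins the endpoint at 0
        path.append(partial)
    return path


def stays_below_zero(increments):
    """True if the bridge built from the increments never exceeds zero."""
    return all(value <= 0.0 for value in bridge(increments))


def count_good_rotations(increments):
    """Number of cyclic shifts of the increments whose bridge stays below zero."""
    n = len(increments)
    return sum(stays_below_zero(increments[r:] + increments[:r]) for r in range(n))


def estimate_stay_below(n, trials, rng):
    """Monte Carlo estimate of P[bridge of length n stays below zero]."""
    hits = 0
    for _ in range(trials):
        increments = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += stays_below_zero(increments)
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(2024)
    sample = [rng.gauss(0.0, 1.0) for _ in range(10)]
    print("good rotations:", count_good_rotations(sample))
    print("estimate for n=10:", estimate_stay_below(10, 50000, rng), "vs 1/n =", 0.1)
```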
In other words, the probability that a discrete bridge stays below zero during its lifetime decays as the inverse of the length of the bridge. On the other hand, since our bridges have "square-root fluctuations", one expects that whether the bridge is required to stay below zero or below a straight line should not alter (much) the asymptotic behavior of these probabilities. This is indeed the case:
The proofs in the cases $\varepsilon > 0$ and $\varepsilon < 0$ are similar; we thus consider only the first case.