On the overlap distribution of Branching Random Walks*

In this paper, we study the overlap distribution and Gibbs measure of the Branching Random Walk with Gaussian increments on a binary tree. We first prove that the Branching Random Walk is 1 step Replica Symmetry Breaking and give a precise form for its overlap distribution, verifying a prediction of Derrida and Spohn. We then prove that the Gibbs measure of this system satisfies the Ghirlanda-Guerra identities. As a consequence, the limiting Gibbs measure has Poisson-Dirichlet statistics. The main technical result is a proof that the overlap distribution for the Branching Random Walk is supported on the set {0, 1}.


Introduction
In this paper, we study the Branching Random Walk (BRW), or directed polymer, on a binary tree. To fix notation, let $T_N$ be the binary tree of depth $N$ and let $\{g_v\}_{v \in T_N \setminus \emptyset}$ be a collection of i.i.d. standard Gaussian random variables indexed by this tree without its root. We define the Branching Random Walk by
$$H_N(v) = \sum_{u \in p(v)} g_u,$$
where $p(v)$ is the root-leaf path to $v$ excluding the root. Viewed as a Gaussian process on $\partial T_N$, $(H_N(v))_{v \in \partial T_N}$ is centered and has covariance structure
$$\mathbb{E} H_N(v) H_N(w) = |v \wedge w|,$$
where $v \wedge w$ denotes the least common ancestor of $v$ and $w$ and $|\alpha|$ is the depth of $\alpha \in T_N$. In particular, $\mathbb{E} H_N(v)^2 = N$.

One can think of the root-leaf paths on the tree as polymer configurations and $H_N$ as an energy. We denote the partition function corresponding to this polymer model by
$$Z_N(\beta) = \sum_{v \in \partial T_N} e^{\beta H_N(v)}$$
and the free energy by
$$F_N(\beta) = \frac{1}{N} \mathbb{E} \log Z_N(\beta).$$
If we denote the Gibbs measure by $G_{N,\beta}(v) = e^{\beta H_N(v)} / Z_N(\beta)$, then this induces a (random) probability measure on the leaves $\Sigma_N = \partial T_N$. In the following $\langle \cdot \rangle_N$ denotes integration with respect to this measure or the corresponding product measures. We will drop the subscript $N$ when it is unambiguous. Finally, let $R(v, w) = \frac{1}{N} |v \wedge w|$, and let $R_{12} = R(v^1, v^2)$, which we call the overlap between two polymers. An important object in the study of mean field spin glasses is the (mean) overlap distribution
$$\mu_{N,\beta}(A) = \mathbb{E} G_{N,\beta}^{\otimes 2}(R_{12} \in A). \qquad (1.1)$$

The Branching Random Walk was introduced to the mean field spin glass community in [17]. There, Derrida and Spohn argued that the statistical physics of this model should be similar to the Random Energy Model (REM). They predicted that the overlap distribution should consist of one atom at high temperature and two atoms at low temperature. In the language of Replica theory, it should be Replica Symmetric (RS) at high temperature and one step Replica Symmetry Breaking (1RSB) at low temperature.
Furthermore, they predicted that, as with the REM, the limiting Gibbs measure of the system should be a Ruelle Probability Cascade (see the discussion preceding Corollary 3.6 for a definition). As a consequence, it was suggested [15,17] that the BRW should serve as an intermediate toy model for spin glass systems, between the REM and the Sherrington-Kirkpatrick (SK) model, as it is still analytically tractable, while having a key feature of SK that the REM lacks: a strong local correlation structure.
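For small depths, the objects defined above (leaf energies, Gibbs weights, and the two-replica overlap law) can be computed exactly by brute force. The following is a minimal sketch, not part of the original argument. The function names, the choice of depth and inverse temperature, and the left-to-right leaf ordering are all illustrative assumptions. It uses the fact that the Gibbs probability that two replicas share an ancestor at depth $k$ equals the sum over depth-$k$ vertices of the squared Gibbs mass of the subtree below each vertex.

```python
import math
import random


def brw_leaf_energies(depth, rng):
    """Return H_N(v) for all 2**depth leaves, listed left to right."""
    energies = [0.0]
    for _ in range(depth):
        # Every vertex has two children; each child adds a fresh N(0, 1) increment.
        energies = [h + rng.gauss(0.0, 1.0) for h in energies for _ in range(2)]
    return energies


def gibbs_weights(energies, beta):
    """Gibbs weights exp(beta * H) / Z, stabilised by subtracting the maximum."""
    top = max(energies)
    unnormalised = [math.exp(beta * (h - top)) for h in energies]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def overlap_law(weights, depth):
    """Return p[k] = G x G(|v ^ w| = k) for k = 0..depth (overlap R_12 = k / depth)."""
    at_least = []
    masses = list(weights)  # Gibbs masses of subtrees rooted at the leaf level
    for _ in range(depth + 1):
        # Sum of squared subtree masses = P(common ancestor at this depth or deeper).
        at_least.append(sum(m * m for m in masses))
        # Merge siblings to obtain the subtree masses one level closer to the root.
        masses = [masses[i] + masses[i + 1] for i in range(0, len(masses) - 1, 2)]
    at_least.reverse()      # now at_least[k] corresponds to depth k
    at_least.append(0.0)    # no pair has a common ancestor deeper than the leaves
    return [at_least[k] - at_least[k + 1] for k in range(depth + 1)]


rng = random.Random(0)
depth, beta = 12, 2.0  # illustrative values only
law = overlap_law(gibbs_weights(brw_leaf_energies(depth, rng), beta), depth)
print([round(p, 3) for p in law])
```

This computes the overlap law for a single realisation of the disorder; averaging it over independent realisations approximates $\mu_{N,\beta}$ from (1.1). At such small depths the output should not be expected to match the limiting picture closely.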
The study of replica symmetry breaking in its various forms is the subject of major research in the mathematical spin glass community. As such it is of interest to have a few simple, but non-trivial examples in which Replica Symmetry Breaking can be seen essentially "by hand". In this paper, we give proofs of the predictions of Derrida and Spohn described above using a combination of arguments that are basic to both fields. In particular, we avoid the use and analysis of the extremal process.
We begin first with the study of Replica Symmetry and Replica Symmetry Breaking. Replica Symmetry above the critical temperature was proved by Chauvin and Rouault in [15]. Our contribution is proving Replica Symmetry Breaking below the critical temperature, and in particular obtaining the mass on the atom at 1.
weakly as measures.

We now turn toward the characterization of the Gibbs measure for this system. Our next result is to prove that the Gibbs measure satisfies a class of identities called the Approximate Ghirlanda-Guerra Identities, which will imply the Ruelle Probability Cascade structure.

Note that by Theorem 1.1, only the case β > β c is interesting in the above theorem.
Our proof is a version of the technique introduced by Bovier and Kurkova in [11,12] (see [10] for a textbook presentation) and is analogous to [5,6]. An immediate consequence of this is that the overlap array distribution for these systems converges to a Ruelle Probability Cascade, see Corollary 3.6. This also implies a mode of convergence of Gibbs measures and the convergence of the weights of balls in the support to those of a Poisson-Dirichlet process, which was first proved by Barral, Rhodes, and Vargas in greater generality by different methods [8]. This is explained in the discussion surrounding Corollary 3.6.
For experts in Branching Random Walks, we emphasize here the following point. Just as in the work of Arguin-Zindy, these methods allow us to obtain Poisson-Dirichlet statistics for the system without an analysis of the extremal process. In particular, we can avoid an analysis of the decoration (see, e.g., [27,21] for this terminology) thereby side-stepping a major technical hurdle.
The Approximate Ghirlanda-Guerra Identities (AGGI) have emerged as a unifying principle in spin glasses. Due to the characterization-by-invariance theory [24], we know that the limiting overlap distribution is an order parameter for models that satisfy the AGGIs, as originally predicted in the Replica Theoretic literature [22]. As such, it has become very important to find models that satisfy these identities in the limit. This has proved to be very difficult.
They are known to hold exactly for the generic mixed p-spin glass models [24], the REM, and the GREM [11,12]. These ideas have been extended to the 2D Gaussian Free Field and a class of Log-Correlated fields [5,6]. For many other models, however, we only know these results in a perturbative sense [16,19,24,25]. A contribution of this paper is the observation that the Branching Random Walk falls into the class of models for which these identities hold exactly.
We finally turn to the main technical step involved in the proofs of the above results. Just as with the REM, both of these predictions can be shown to follow from standard concentration and integration-by-parts arguments provided one can show that the model is at most 1RSB and that the top of the support is at 1 when it is 1RSB. To our knowledge this result is thought of as folklore in the Branching Random Walk community. For example, such a result follows from similar ideas to those in [4,5,6,21]. The proof of this result for similar models can be seen in [4] and [18]. In our setting, this is the content of the following proposition.

That is, for any weak limit $\mu$ we get that $\mu = m\delta_0 + (1 - m)\delta_1$ for some $m \in [0, 1]$.
In the Random Energy Model, this is a consequence of the second moment method combined with a large deviations estimate. In our setting, however, this argument breaks down due to the correlation structure of the Branching Random Walk. This is often explained by the seemingly innocuous observation that the sub-leading correction to the expected maximum (see the definition of $m_N$ in Section 2) is of the same order of magnitude as in the REM, but the pre-factor is 3/2 as opposed to 1/2. We point the reader to [3, Section 3.2] for a discussion of how this small change is a signature of a profound structural difference.
In the study of such models, this issue is dealt with by a truncated second moment method approach. In our setting, this takes the form of the tilted barrier estimates of Bramson, see Section 2.1.
Before turning to the proofs of the above, we make the following remarks.
Remark 1.5. In our setting, one does not need the full power of the Ghirlanda-Guerra identities to obtain the aforementioned characterization of the Gibbs measure. In particular, as a consequence of Proposition 1.4, the Approximate Ghirlanda-Guerra Identities are equivalent to an approximate form of Talagrand's identities [24], which characterize the Poisson-Dirichlet process through its moments. This is explained in more detail in the discussion surrounding Corollary 3.6.
Remark 1.6. These arguments do not depend greatly on the Gaussian nature of the problem. In particular, the main technical tool, Proposition 1.4, holds in fairly large generality (see Remark 2.1). The remaining results are essentially consequences of the sub-Gaussian tails of the model and an application of integration-by-parts. These results should extend to increments that have sub-Gaussian tails. For experts, we also note that if the decoration process has enough moments, the first two results follow by an application of the Bolthausen-Sznitman invariance (see [9], [24, Sect. 2.2]). As a study of the extendability of these results is not within the scope of this paper, we do not examine these questions further.

discussions regarding Branching Random Walks and for a careful reading of an early draft of this paper. This research was conducted while the author was supported by an NSF Graduate Research Fellowship DGE-1342536, and NSF Grants DMS-1209165 and OISE-0730136. Preparation of this manuscript was partially supported by NSF OISE-1604232.

The support of the overlap distribution
In this section, we will prove Proposition 1.4, namely that the support of the overlap distribution is the set {0, 1}. To this end, we introduce the following notation. Let
$$S_v(l) = \sum_{u \in p(v),\ |u| \le l} g_u$$
denote the walk corresponding to the BRW at vertex $v$. In this notation, $H_N(v) = S_v(N)$. Let $S(l)$ denote a random walk with standard Gaussian increments and let $P_z$ denote its law conditioned to start at $z$.
We think of the collection of walkers $(S_v)_{v \in \partial T_N}$ as a pack of walkers that branch at each time step. We call $M_N = \max_{v \in \partial T_N} S_v(N)$ the leader of the walkers. With slight (or, depending on your taste, great) exaggeration, we call $m_N = \beta_c N - \frac{3}{2\beta_c} \log N$ the location of the leader. To justify this simplification, we remind the reader of the result from [2] that the family $(M_N - m_N)_N$ is tight. Of course this does not necessarily show that $m_N$ is the true location of $M_N$. The true location will be an order 1 correction from this. Finally let $\lambda_N = m_N / N$ and $\Lambda(x) = x^2/2$. In the following we will drop the subscript $N$ whenever possible for readability, and we say $f \lesssim_a g$ if $f \le C(a) g$ where $C$ is a constant that depends at most on $a$.
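To get a feel for the centering $m_N$, one can compare it with simulated leaders at small depths. The sketch below is illustrative only. It assumes the value $\beta_c = \sqrt{2 \log 2}$, which is the standard critical value for a binary tree with standard Gaussian increments but is not stated in the text above. The function names and sample sizes are hypothetical choices.

```python
import math
import random

# Assumed critical inverse temperature for the binary tree with N(0, 1) increments.
BETA_C = math.sqrt(2.0 * math.log(2.0))


def leader_centering(n):
    """m_N = beta_c * N - (3 / (2 * beta_c)) * log N."""
    return BETA_C * n - 3.0 / (2.0 * BETA_C) * math.log(n)


def sample_leader(n, rng):
    """Simulate one branching random walk of depth n and return M_N."""
    energies = [0.0]
    for _ in range(n):
        # Each walker splits in two, and each copy takes an independent Gaussian step.
        energies = [h + rng.gauss(0.0, 1.0) for h in energies for _ in range(2)]
    return max(energies)


rng = random.Random(1)
for n in (6, 10, 12):
    gaps = [sample_leader(n, rng) - leader_centering(n) for _ in range(20)]
    # Tightness of M_N - m_N suggests these averages stay of order 1 as n grows.
    print(n, round(sum(gaps) / len(gaps), 2))
```

The depths reachable by brute force are far too small to see the logarithmic correction cleanly; the point is only to make the roles of $M_N$ and $m_N$ concrete.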
This section is organized as follows. First we prove a basic estimate that will be used throughout the section. We then prove the main estimates required to prove Proposition 1.4. We finally turn to the proof of Proposition 1.4. Before we start we make a brief remark regarding the extension of these results to more general increments than Gaussian.
Remark 2.1. The results of this section hold in more generality than we study here. For the interested reader, note that in the following we use sub-Gaussian tails (we believe that one can relax this, however computing the free energy in this setting becomes delicate), that the increments are i.i.d. and have support on (−1/2, 1/2), and finite moment generating function and rate functions, and that the tree is k-ary. In this setting, $m_N = N x + O(\log N)$ where $x$ uniquely solves $\Lambda^*(x) = \log k$ and $x > \mathbb{E}X$, and $\lambda$ uniquely achieves the equality in $\Lambda^*(x) + \Lambda(\lambda) = \lambda x$. For more on this see, e.g., [13]. To avoid unnecessary notation and technicalities, we will stick to the Gaussian case where we have the self-duality of the moment generating function, $\Lambda = \Lambda^*$.
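As a worked instance of the recipe in Remark 2.1, consider the Gaussian binary case treated in this paper. The computation below is a sketch under the assumption that $\Lambda$ is the log-moment generating function of a standard Gaussian increment $g$, that $\Lambda^*$ is its Legendre transform, and that $k = 2$. Identifying the resulting $x$ with $\beta_c$ is an assumption not spelled out in the text above.

```latex
\begin{align*}
\Lambda(\lambda) &= \log \mathbb{E}\, e^{\lambda g} = \frac{\lambda^2}{2},
\qquad
\Lambda^*(x) = \sup_{\lambda}\,\{\lambda x - \Lambda(\lambda)\} = \frac{x^2}{2},\\
\Lambda^*(x) = \log 2 \ \text{and}\ x > 0
&\ \Longrightarrow\ x = \sqrt{2\log 2},\\
\Lambda^*(x) + \Lambda(\lambda) = \lambda x
&\ \Longleftrightarrow\ \tfrac{1}{2}(x - \lambda)^2 = 0
\ \Longleftrightarrow\ \lambda = x .
\end{align*}
```

So in this case the speed $x$ and the tilt $\lambda$ coincide, which is consistent with the choice $\lambda_N = m_N/N$ made above.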

Tilted barrier estimates
In the following, we will repeatedly make use of a class of estimates called tilted-barrier estimates. These estimates are used frequently in the study of Branching Random Walks and Log-Correlated fields and were, to our knowledge, introduced by Bramson [14]. The goal of these estimates is to bound probabilities of the following type. We think of the underlying event as follows: there is a random walker, $S(l)$, which starts at $z$, and two lines, $\lambda l$ and $\lambda l + K$, which are barriers. Our goal is to compute the probability that the walker stays below the farther barrier, $\lambda l + K$, for the duration of its walk but ends in a window near the nearer barrier, $\lambda l$.
The idea of the estimate is to tilt the law of the walker, $P_z$, to a new measure, $Q_{-z}$, so that under $Q_{-z}$, the walk $\tilde{S}(l) = \lambda l - S(l)$ will be centered. The result will then follow by an application of the ballot theorem applied to $\tilde{S}$. We will make this precise in Proposition 1.4. Before proving this estimate, we first prove the relevant ballot theorem.
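The tilting step described above can be sketched as a standard exponential change of measure. This is an illustration under assumptions: the increments are standard Gaussian, the walk starts at $z$ and runs up to a time horizon written here as $T$, and the tilted measure is written simply as $Q$. The exact normalisation and notation used in the paper's own estimate may differ.

```latex
\begin{align*}
\frac{dQ}{dP_z}\bigg|_{\mathcal{F}_T}
  &= \exp\Big(\lambda\,(S(T) - z) - \frac{\lambda^2 T}{2}\Big),\\
P_z(A)
  &= E_Q\Big[\exp\Big(-\lambda\,(S(T) - z) + \frac{\lambda^2 T}{2}\Big)\,\mathbf{1}_A\Big]
  \qquad \text{for } A \in \mathcal{F}_T,\\
-\lambda\,(S(T) - z) + \frac{\lambda^2 T}{2}
  &= -\frac{\lambda^2 T}{2} + \lambda z + \lambda\,\tilde{S}(T),
  \qquad \tilde{S}(l) = \lambda l - S(l).
\end{align*}
```

Under $Q$ the increments of $S$ are Gaussian with mean $\lambda$ and unit variance, so $\tilde{S}$ has centered increments. On an event where $S(T)$ ends within a bounded window of $\lambda T$, the factor $e^{\lambda \tilde{S}(T)}$ is bounded, leaving the exponential cost $e^{-\lambda^2 T/2}$ times the $Q$-probability that a centered walk stays on one side of a barrier, which is what a ballot theorem controls.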
Gaussian and $X_0$ is the starting position. For any $A, B \in \mathbb{Z}$ with $0 \le A < A + 1 \le B$, and $z \ge 0$, we have that

Proof. Let $\tau_h = \min\{0 < k \le n : S(k) < -h\}$ with the convention that if the condition never happens, $\tau_h = n$. Define the time reversed, reflected walk, $S^r$. That is, $S^r(t) = S(t) - S(n)$. Let $\tau^r_h$ be the same hitting time as before for the reversed walk.
Observe that if $S(t) \ge 0$ for $t \in [n]$ and $S(n) \in [A, B]$, we must have that $\tau_0 \ge n/4$, $\tau^r_B \ge n/4$, and $S(n) \in [A, B]$. This is for the following reasons. The first condition follows immediately from the positivity. The second condition follows from the fact that if $\tau^r_B \le n/4 < n/2$, $S(n)$ splits as

Observe that under $Q_{-z}$, the walk $\tilde{S}$ has no drift, i.e., $E_{Q_{-z}} \tilde{S} = 0$, and starts at $-z$. Note that

By Lemma 2.2 and the choice of $\lambda$, we have that

Applications of concentration and tilted barrier estimates
In this subsection, we prove three estimates regarding the probability of the Branching Random Walker having walkers that behave pathologically. Before we begin, we remind the reader of the interpretation of H N in terms of the walkers S v , and the interpretation of m N as (essentially) the location of the leader M N = max v∈∂T N S v discussed at the beginning of this section.
In our first estimate, we will show that there is a barrier which it is unlikely for any walker to cross. In order for this probability to go to zero, we will need that the barrier drifts off to infinity logarithmically in $N$. This will follow from the Gaussian tails of the increments. To make this precise, define the event

which is the event that there is a leaf $v$ whose corresponding walker $S_v$ crosses the barrier $L_K$ at some time $l \le N$. We then have the following lemma.

Proof. By the union bound and the Gaussian tail inequality, we see that

which is the event that there is a leaf, $v$, whose corresponding walker, $S_v$, enters the window $\lambda l + [0, K]$ on the time scale of $N[\epsilon, 1 - \epsilon]$. The probability of this event is bounded as follows.
Lemma 2.6. For all $x, K \in \mathbb{Z}$, $x > 0$, $K \ge 1$, and $\epsilon \in (0, 1/2)$, we have that

Proof. By a union bound

The summand satisfies

(2.5)

We now compute the multiplicands in the summand. The first multiplicand can be controlled by the tilted barrier estimate (2.1) to get

To bound the second multiplicand, first let $T' = N - T$. Observe then that for all $z \in [0, K - 1]$, we have $K - z \ge 1$, so that the tilted barrier estimate yields

Plugging (2.6)-(2.7) into (2.5) and plugging this into (2.4) yields the desired bound.
Our last estimate (once combined with the above two estimates) shows that it is unlikely for there to be two walkers who branch on the genealogical time scale $T$ (that is, that there are two $v, w \in \partial T_N$ with $|v \wedge w| = T$) and both end only order 1 away from the leader $m_N$ (recall again the interpretation of $m_N$ from the beginning of the section). We make this precise in the following lemma and corollary.
First we fix a pair $v, w \in \partial T_N$ with $|v \wedge w| = T$ and bound the probability of this pathological event. To this end, define for such a pair $v, w$ the event

This is the event that the walkers corresponding to $v$ and $w$ stay below the barrier $\lambda l + K$ for all time, stay below the barrier $\lambda l$ on the time scale $tN$, and end in the window $[m_N - x, m_N + K]$. We control the probability of this event as follows.

Proof. Observe that

We bound the two multiplicands by application of the tilted barrier estimate. Observe that the first multiplicand satisfies

and that the second satisfies

Here we used that $K - z \ge 0$ (for us $z \le 0$). Combining the results then yields the desired estimate, namely

By an application of the union bound, we see that the previous estimate implies that it is rare for there to be any pairs of leaves with $|v \wedge w| = T$ that have this behavior.

Corollary 2.8.
Let $v_T, w_T \in \partial T_N$ be a pair satisfying $|v \wedge w| = T$. Under the conditions of Lemma 2.7, we have that for $\epsilon \in (0, 1/2)$

Proof of Proposition 1.4
The proof of Proposition 1.4 will now follow immediately from an application of the following two estimates. The first estimate says that below the critical temperature, the Gibbs measure gives no mass to points that are more than a large, but order 1, distance from the leader. This follows more or less immediately from the sub-Gaussian tails of the increments and the tightness of the (centered) leader, $M_N - m_N$.

Lemma 2.9. Let $\beta > \beta_c$. Then for each $x$,

Proof. Before we begin, we make the following useful definitions. For readability, we suppress dependence on $N$ whenever it is unambiguous. Let

and there is a universal constant $c_1$ such that for all $y, u$ with $0 \le u + y \le \sqrt{N}$, $u \ge -y$,
$$P\big(N^N_y \ge e^{\beta_c (y + u)}\big) \lesssim e^{-\beta_c u + C \log_+(y + u)}.$$
Let $x_0 = x_0(\alpha, \beta)$ be such that for all $y \ge x_0$, $c_1 \log((1 + \alpha) y) \le \frac{\beta \alpha}{2} y$. Then for all $y \in [x, u_N]$,
$$P\big(N^N_y \ge e^{\beta_c (1 + \alpha) y}\big) \lesssim e^{-\frac{\beta_c \alpha}{2} y}.$$
This implies that

Note that on the event $M_K$, we have $Z \ge e^{\beta m_N - \beta K}$ from which it follows that

Putting these together yields

Since $K$ was arbitrary, we may take $K = \frac{c(\alpha) x}{2\beta}$, and the result follows.
We now show that it is unlikely to have two points that are both order 1 away from the leader, but overlap strictly in (0, 1). The idea of this estimate is that for this to happen there must be two walkers, S v1 and S v2 , whose branching time is of order N and both land order 1 away from the leader. This event is rare by the above.
where the last inequality follows by Lemma 2.6 and Corollary 2.8. Choosing L = c log N for c large enough, and sending N → ∞ then yields the result after applying Lemma 2.6.
We now prove Proposition 1.4.
Proof of Proposition 1.4. By [15], it suffices to take $\beta > \beta_c$. Note that it suffices to show

The result then follows from Lemma 2.9 and Lemma 2.10 by taking $N \to \infty$ and then $x \to \infty$.

The Derrida-Spohn conjecture and the Ghirlanda-Guerra identities
In this section we prove the Derrida-Spohn conjecture and show that the Branching Random Walk satisfies the Ghirlanda-Guerra Identities.

Derrida-Spohn conjecture
The proof of the Derrida-Spohn conjecture will follow immediately after the following technical preliminaries. Recall first the following result of Chauvin and Rouault.
Furthermore, for any bounded measurable f on Σ n , .
As a consequence we have the following where µ is a limit point of the mean overlap measure.
Proof. Notice that Lemma 3.2 gives

where $\mu_{N,\beta}$ is as per (1.1). Since $F_N$ and $F$ are convex and $C^1$, and $F_N \to F$ point-wise on $\mathbb{R}_+$, we have that $F_N' \to F'$. This gives us the left-hand side of the desired equality.
Furthermore, since $f(x) = 1 - x$ is in $C([0, 1])$, and the sequence $\mu_{N,\beta} \in \Pr([0, 1])$ is necessarily tight, weak convergence applied to the last term yields the right-hand side of the desired equality.
Finally we observe that the limiting overlap distribution has $\operatorname{supp} \mu \subset \{0, 1\}$. We now turn to the proof of Theorem 1.1.
Proof of Theorem 1.1. By Corollary 3.3 and Proposition 1.4, we know that for any such weak limit, we get

Differentiating (3.1), equating, and solving for $m$, we get
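The computation in this proof can be sketched as follows. This is a reconstruction under two assumptions that are not visible in the text above: that the identity from Corollary 3.3 reads $F'(\beta) = \beta\,(1 - \int x \, d\mu)$, as its proof suggests, and that the limiting free energy takes its standard form for the binary tree with $\beta_c = \sqrt{2 \log 2}$.

```latex
\begin{align*}
F(\beta) &=
\begin{cases}
\log 2 + \dfrac{\beta^2}{2}, & \beta \le \beta_c,\\[6pt]
\beta_c\,\beta, & \beta > \beta_c,
\end{cases}
\qquad
\mu = m\,\delta_0 + (1 - m)\,\delta_1,\\
F'(\beta) &= \beta\Big(1 - \int x \, d\mu(x)\Big) = \beta\, m,\\
\beta \le \beta_c:&\quad \beta = \beta\, m \ \Longrightarrow\ m = 1,
\qquad \mu = \delta_0,\\
\beta > \beta_c:&\quad \beta_c = \beta\, m \ \Longrightarrow\ m = \frac{\beta_c}{\beta},
\qquad \mu = \frac{\beta_c}{\beta}\,\delta_0 + \Big(1 - \frac{\beta_c}{\beta}\Big)\delta_1 .
\end{align*}
```

The two branches of $F$ agree at $\beta_c$ since $\log 2 + \beta_c^2/2 = \beta_c^2$, and the low temperature weights are consistent with the parameter $\beta_c/\beta$ of the Poisson-Dirichlet process appearing in Remark 3.8.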

Ghirlanda-Guerra identities
We now turn to the proof of the Ghirlanda-Guerra Identities for these models. We need the following preliminary lemmas. Observe that a standard application of Gaussian concentration yields the following.
As a consequence of this, we find the Gibbs measure concentrates around a fixed energy level to order $N$.

Lemma 3.5. The intensive energy concentrates. In particular,

Proof. The result then follows from Lemma 3.4 after a modification of the proof of [24, Theorem 3.8].
We now turn to the proof of Theorem 1.2.
Proof of Theorem 1.2. Observe first that for $\beta \le \beta_c$, $\mu = \delta_0$ by Theorem 1.1 so that the identities are trivial in this setting. It suffices to study $\beta > \beta_c$. Furthermore, the limiting overlap distribution is supported on {0, 1} by Proposition 1.4, so it suffices to show the Approximate Ghirlanda-Guerra Identities for $p = 1$, since $R_{12}^p = R_{12}$ when $R_{12} \in \{0, 1\}$.
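For orientation, the identities in question have the following standard form in the spin glass literature (see, e.g., [24]). The precise statement of Theorem 1.2 is not reproduced in the text above, so the display below should be read as the usual formulation rather than a quotation. Here $f$ is assumed to be a bounded measurable function of the overlaps of $n$ replicas, and the identity is understood to hold in the limit $N \to \infty$.

```latex
\begin{equation*}
\mathbb{E}\big\langle f\, R_{1,n+1}^{p} \big\rangle
= \frac{1}{n}\,\mathbb{E}\langle f \rangle\,\mathbb{E}\big\langle R_{12}^{p} \big\rangle
+ \frac{1}{n}\sum_{k=2}^{n} \mathbb{E}\big\langle f\, R_{1,k}^{p} \big\rangle .
\end{equation*}
```

When the overlap only takes the values 0 and 1, every power $R^p$ with $p \ge 1$ equals $R$, so the family of identities indexed by $p$ collapses to the single case $p = 1$.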
The result then follows by a standard integration-by-parts argument.
Notice that if we apply Lemma 3.2 with $\Sigma_N = \partial T_N$, $x(\sigma) = H_N(\sigma)$, $y(\sigma) = \beta H_N(\sigma)$, and $C(\sigma^1, \sigma^2) = \beta N R_{12}$ it follows that

As a result,

This implies that

The Approximate Ghirlanda-Guerra identities have many deep consequences. We highlight one simple consequence regarding the limit of the overlap array distribution in these systems. Let $Q_N^\beta$ be the overlap array distribution corresponding to $G_N^\beta$. As $\{Q_N\}$ is a sequence of measures on the compact Polish space $[0, 1]^{\mathbb{N}^2}$, it is tight. Let $Q^\beta$ be any limit point of this sequence. We will show that it is the unique limit point and is given by what is called a 1RSB Ruelle Probability Cascade, which we define presently.
If $\theta = 1$, let $G = \delta_0$. Then $RPC(\zeta)$ is the overlap array distribution induced by $G$. One important result to note is that $RPC(\zeta)$ satisfies the Ghirlanda-Guerra identities [24, Section 2]. This is a consequence of a standard invariance property of the Poisson-Dirichlet process/Poisson point processes of Gumbel type. What we will show now is that as a consequence of Theorem 1.2, we will have that $Q = RPC(\mu_\beta)$. This is an immediate consequence of the characterization-by-invariance theory used in spin glasses [24], [28, Section 15.13]. In this setting, the proof is fairly elementary and does not require the full machinery of this theory. Furthermore, it illustrates some essential ideas for this method. For these reasons, and to make this presentation self-contained, we include the proof.

$\to Q^\beta$ where $Q^\beta$ is the overlap distribution corresponding to $RPC(\mu_\beta)$ where $\mu_\beta$ is as in (1.2).
The probability $Q(R^n = A)$ is computed by recursively applying the following cases:

The $Q$-probability of this event is either $\mu_\beta(R_{12} = 1)$ or $\mu_\beta(R_{12} = 0)$.

Case 2. $n \ge 3$ and $A$ is not the identity.
By weak exchangeability, we can assume that $A$ is block-diagonal, and that the blocks are arranged in decreasing size. Take the first block and suppose that it is of length $m \ge 2$. Let $R^n(m)$ denote the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $m$-th row and column of $R^n$ and similarly for $A(m)$.
By ultrametricity and the Ghirlanda-Guerra Identities combined with weak exchangeability, it follows that
$$Q(R^n = A) = Q(R^n(m) = A(m),\ R_{1m} = 1).$$
It thus suffices to compute $Q(R^{n-1} = A(m))$.

To see why, note that if $R^{n-1} = \mathrm{Id}$ and $R_{k,n} = 1$ for some $k \ne n$, then the remaining must all be zero. By the Ghirlanda-Guerra identities, the first event is
$$Q(R^{n-1} = \mathrm{Id},\ R_{1,n} = 0) = Q(R^{n-1} = \mathrm{Id}) \Big(1 - \frac{1}{n - 1} Q(R_{12} = 1)\Big).$$
The first term is then computed by Case 3. The second event is computed by Case 2.
By repeatedly applying these cases, the probability of any such event is reduced to a polynomial in $\mu(R_{12} = 1)$. Evidently, the same argument applied to $\tilde{Q}$ yields the same polynomials, and thus the desired result.
We now make the following remarks regarding how this can be understood to imply certain modes of convergence for the Gibbs measures.

Remark 3.7.
Observe that by the uniqueness portion of the Dovbysh-Sudakov theorem [23], this shows that the Gibbs measure $G_N$ sampling converges to $G$ as above in the sense of Austin [7].

Remark 3.8. We also observe that for $\beta > \beta_c$ we can recover a result like that from [8] mentioned in the introduction. In particular, if we partition $\partial T_N$ into groups of leaves with overlap at least $1 - \epsilon$, call them $(B_i)$, then the ranked weights $(G_{N,\beta}(B_i))$ converge in law to the ranked weights of $PD(\beta_c/\beta, 0)$. This follows from an approximation argument combined with Talagrand's identities (see, e.g., [20, Theorem 6.3.5] or [28, Section 15.4]).