Extending the Parisi formula along a Hamilton-Jacobi equation

We study the free energy of mixed $p$-spin spin glass models enriched with an additional magnetic field given by the canonical Gaussian field associated with a Ruelle probability cascade. We prove that this free energy converges to the Hopf-Lax solution of a certain Hamilton-Jacobi equation. Using this result, we give a new representation of the free energy of mixed $p$-spin models with soft spins.


Introduction
Let $(\beta_p)_{p \geqslant 2}$ be a sequence of real numbers and let $\xi(r) := \sum_{p \geqslant 2} \beta_p^2 r^p$ for $r \in \mathbb{R}$. We assume that the sequence $(\beta_p)_{p \geqslant 2}$ is such that $\xi$ is well defined on the entire real line; this assumption can be relaxed if needed by restricting the parameters of the models we work with. Denote by $(H_N(\sigma))_{\sigma \in \mathbb{R}^N}$ the centered Gaussian field with covariance $\mathbb{E}\, H_N(\sigma^1) H_N(\sigma^2) = N \xi(N^{-1} \sigma^1 \cdot \sigma^2)$. Let $P_N := P_1^{\otimes N}$ denote the $N$-fold product of $P_1$, a probability measure on $\mathbb{R}$ with bounded support. We aim to study the Gibbs measure built with respect to the energy function $H_N(\sigma)$ and the reference measure $P_N$. A quantity of fundamental interest is the limit free energy, referred to as (1.1) below. When the support of $P_1$ is $\{\pm 1\}$ and $\xi(r) = \beta^2 r^2$ (the Sherrington-Kirkpatrick model [27]), this limit was discovered by Parisi in a celebrated work [23,24]; see also [12]. The formula was then proved rigorously for general $\xi$ in [8,30,18], and was later extended in [16,22] to the current setting, where we only assume that the support of $P_1$ is bounded.
In order to further our understanding of this object, it was proposed in [15] (following [13,14]) to recast the limit free energy (1.1) as a particular value of the solution of a Hamilton-Jacobi equation. This solution depends on two parameters t ⩾ 0 and µ ∈ M(R + ), where M(R + ) denotes the set of Borel probability measures over R + . It was conjectured that an enriched version of the free energy, which would depend additionally on the parameters t ⩾ 0 and µ ∈ M(R + ), may converge to the same solution evaluated at these parameters.
The main purpose of this paper is to prove this conjecture. In order to state the result, we start by defining the enriched model precisely. We denote by $\mathcal{M}_b(\mathbb{R}_+)$ the subset of $\mathcal{M}(\mathbb{R}_+)$ of measures with bounded support. By [19, Theorem 2.17], one can associate a Ruelle probability cascade [25] to each probability measure on $[0,1]$; this Ruelle probability cascade is a random probability measure on the unit ball of a Hilbert space. We denote by $\mathcal{R}$ the Ruelle probability cascade corresponding to the uniform distribution over $[0,1]$, and by $(\alpha^\ell)_{\ell \geqslant 1}$ an i.i.d. sample from $\mathcal{R}$ (that is, the law of $(\alpha^\ell)_{\ell \geqslant 1}$ is $\mathcal{R}^{\otimes \infty}$). In particular, the law of the overlap $\alpha^1 \cdot \alpha^2$ under $\mathbb{E}\mathcal{R}^{\otimes 2}$ is the uniform distribution over $[0,1]$. Given a measure $\mu \in \mathcal{M}_b(\mathbb{R}_+)$, and conditionally on $\mathcal{R}$, let $z_\mu(\alpha)$ be a Gaussian process indexed by $\alpha \in \mathrm{supp}(\mathcal{R})$ with covariance $\mathbb{E}\, z_\mu(\alpha^1) z_\mu(\alpha^2) = \mu^{-1}(\alpha^1 \cdot \alpha^2)$. In the expression above and throughout the paper, we use the shorthand notation $\mu^{-1}(r)$, defined in (1.2) for every $r \in [0,1]$. To check that the Gaussian process $z_\mu(\alpha)$ exists, it suffices to verify that $\mu^{-1}(\alpha^1 \cdot \alpha^2)$ is a positive semidefinite kernel on $(\mathrm{supp}\, \mathcal{R})^2$, and this follows from the fact that the support of $\mathcal{R}$ is ultrametric. Moreover, $\mu^{-1}(r)$ is left-continuous and, thus, continuous at $r = 1$, which implies that the process $z_\mu(\alpha)$ is stochastically continuous on $\mathrm{supp}(\mathcal{R})$. As a result, it is jointly measurable (see e.g. [6, Theorem 3.3.1]) and we can define, for every $t \geqslant 0$ and $\mu \in \mathcal{M}_b(\mathbb{R}_+)$, the enriched free energy $F_N(t,\mu)$ in (1.3), where $z^\mu_i(\alpha)$ are independent copies of $z_\mu(\alpha)$ for $i \geqslant 1$ (conditionally on $\mathcal{R}$ and independent of $H_N$). For a measure $\mu$ of the discrete form (1.4), this quantity admits the alternative expression (1.6), where $(v_\alpha)_{\alpha \in \mathbb{N}^k}$ are the weights of the Ruelle probability cascade with parameters $(\zeta_\ell)_{1 \leqslant \ell \leqslant k}$, and $(z_{\alpha,i})_{i \geqslant 1}$ are independent copies of the Gaussian process with covariance $\mathbb{E}\, z_{\alpha^1} z_{\alpha^2} = q_{\alpha^1 \wedge \alpha^2}$, where $\alpha^1 \wedge \alpha^2 = \max\{\ell \geqslant 0 : \alpha^1_j = \alpha^2_j \text{ for } j \leqslant \ell\}$.
The quantities (1.3) and (1.6) are equal in this case because, by (the proof of) [19, Theorem 1.3] and standard properties of the Ruelle probability cascades (see [19, Theorem 4.4]), both quantities are equal to the same continuous functional of the distribution of the array $(\mu^{-1}(\alpha^\ell \cdot \alpha^{\ell'}))_{\ell,\ell' \geqslant 1}$ under $\mathbb{E}\mathcal{R}^{\otimes \infty}$ and, correspondingly, of the array $(q_{\alpha^\ell \wedge \alpha^{\ell'}})_{\ell,\ell' \geqslant 1}$ under $\mathbb{E}(\sum_{\alpha \in \mathbb{N}^k} v_\alpha \delta_\alpha)^{\otimes \infty}$; these distributions are equal, due to the property of the Ruelle probability cascades that the distribution of an overlap array is determined by the distribution of one overlap. Moreover, denoting by $D \geqslant 0$ the smallest real number such that the support of $P_1$ is contained in $[-\sqrt{D}, \sqrt{D}]$, one can check (see for instance [29], [15, Proposition 2.1], or Subsection 3.5 below) that $F_N(t, \cdot)$ satisfies the Lipschitz estimate (1.7) for every $\mu, \nu \in \mathcal{M}_b(\mathbb{R}_+)$. In view of this, we can, whenever convenient, replace the measure $\mu$ by an atomic measure. Finally, using again standard properties of the Ruelle probability cascades (see e.g. [31, Theorem 14.2.1] or [19, Theorem 2.9]), one can verify that $F_N(0, \mu)$ does not depend on $N$; we denote this quantity by $\psi(\mu) := F_N(0,\mu)$, which is (1.8). We will recall a somewhat more explicit expression for $\psi(\mu)$ in (2.3) below. Denote by $U$ a uniform random variable over $[0,1]$, and for every probability measure $\mu$ on $\mathbb{R}_+$, define $X_\mu := \mu^{-1}(U)$, where we recall that $\mu^{-1}$ is defined in (1.2). We also define, for every $s \in \mathbb{R}$, $\xi^*(s) := \sup_{r \geqslant 0} (rs - \xi(r))$.
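As a sanity check on the definition of $\xi^*$, the following minimal Python sketch approximates the supremum over $r \geqslant 0$ on a finite grid and compares it with the closed form available when $\xi(r) = \beta^2 r^2$, namely $\xi^*(s) = s^2/(4\beta^2)$ for $s \geqslant 0$ and $\xi^*(s) = 0$ for $s < 0$. The function names, the dictionary `betas`, and the grid parameters `r_max` and `n_grid` are assumptions made for this illustration and are not part of the paper.

```python
import numpy as np

def xi(r, betas):
    """xi(r) = sum_p beta_p^2 r^p; `betas` maps p -> beta_p (illustrative container)."""
    r = np.asarray(r, dtype=float)
    return sum(b**2 * r**p for p, b in betas.items())

def xi_star(s, betas, r_max=50.0, n_grid=200_001):
    """Grid approximation of xi*(s) = sup_{r >= 0} (r s - xi(r)).

    Only reliable when the maximiser lies in [0, r_max] (assumption of this sketch).
    """
    r = np.linspace(0.0, r_max, n_grid)
    return float(np.max(r * s - xi(r, betas)))

# Sherrington-Kirkpatrick-type case xi(r) = beta^2 r^2 with an arbitrary beta.
beta = 1.5
sk = {2: beta}
approx = xi_star(3.0, sk)
exact = 3.0**2 / (4 * beta**2)   # closed form s^2 / (4 beta^2) for s >= 0
print(approx, exact)
```

For negative $s$ the supremum is attained at $r = 0$, so `xi_star(-1.0, sk)` returns zero, in agreement with the restriction $r \geqslant 0$ in the definition.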
Our first goal is to prove the following conjecture from [15] (specialized to the case where P N is a product measure).
The motivation in [15] for this statement is that the right side of (1.9), seen as a function of $(t, \mu)$, solves a formal Hamilton-Jacobi equation. For discrete $\mu$ as in (1.4), one can check the relation (1.11), where $R_{1,2} := N^{-1} \sigma^1 \cdot \sigma^2$ is the overlap of $\sigma^1$ and $\sigma^2$. When $\xi$ is the square function, the right side of (1.11) can be interpreted as the conditional variance of the $\sigma$-overlap $R_{1,2}$ given the $\alpha$-overlap $\alpha^1 \cdot \alpha^2$. More generally, the right side of (1.11) is small if and only if the conditional distribution of the overlap $R_{1,2}$ given $\alpha^1 \cdot \alpha^2$ is concentrated. This evokes the synchronization phenomenon used in the proof of the Parisi formula by Talagrand in [30] along Guerra's interpolation [8] with nearly optimal parameters; see also [31]. The idea of using Hamilton-Jacobi techniques to study the replica-symmetric solution of the SK model was already used in [7], and one-step replica-symmetry-breaking analogues of the equation (1.11) were derived and studied in various models in [3,1].
The main step in the proof of Theorem 1.1, which is to pass to the limit N → ∞ for the left side of (1.9) and get some expression for the limit, is almost identical to the argument in [22] (specialized to the one-dimensional case), so we only outline the necessary modifications. The main tool is the synchronization mechanism developed in [20,21,22] based on the overlap ultrametricity proved in [17] for measures that satisfy the Ghirlanda-Guerra identities (and the fact that one has a lot of flexibility in enforcing these identities by way of small perturbations). The synchronization has been applied in a variety of situations, e.g. [9,4,11], and here we demonstrate another application. A particular synchronization that will be needed here is the one that forces the overlaps µ −1 (α 1 ⋅ α 2 ) and R 1,2 = N −1 σ 1 ⋅ σ 2 to be deterministic functions of their sum in the thermodynamic limit. Notice that we need to use a synchronization argument here even in the case of Ising spins.
The reader may rightfully wonder what to make of the term $\xi(N^{-1} |\sigma|^2)$ appearing in the exponential in (1.3), which was introduced for convenience but is otherwise a nuisance (except in the case of Ising spins, where it is deterministic and therefore causes no harm). The second goal of this paper is to explain how to remove this term and deduce from Theorem 1.1 the limit of the "untampered" free energy in (1.1). At present this is perhaps not as interesting as it sounds, since the proof of Theorem 1.1 could be modified to obtain the limit of the quantity without the term $\xi(N^{-1} |\sigma|^2)$ directly. However, it is likely that a more direct proof of Theorem 1.1 exists, in which case it is important to notice that Theorem 1.1 is indeed all the information needed to conclude. Moreover, we obtain in this way a somewhat different expression for the limit in (1.1) than that obtained in [16,22].
In order to state this second result, we introduce two more parameters to the energy and write, for every s, t ⩾ 0, µ ∈ M b (R + ) and h ∈ R, Notice that when s = 0, this quantity is of the form covered by Theorem 1.1, up to a redefinition of P N to absorb the term exp(h σ 2 ). We denote The intuition for this result is simple, and consists in writing the Hopf-Lax formula for the equation By setting s = t = 1, µ = δ 0 , and h = 0 in Theorem 1.2, we thus get the following new representation for the free energy of models with soft spins.
Corollary 1.3. The limit free energy can be written as Organization of the paper. In order to prove Theorem 1.1, we first state a different expression for the left side of (1.9) in Proposition 2.1 below. We then rewrite it in the form of the right side of (1.9) in Section 3, by reasoning similarly to what was done in [15] in the case µ = δ 0 . We next turn to the proof of Proposition 2.1 in Section 4. Finally, we provide the proof of Theorem 1.2 in Section 5.

Parisi formula
In this section, we present the structure of the argument for identifying the limit on the left side of (1.9) in the more "classical" form in which Parisi formulas are usually stated. As a preparation for stating the formula we will obtain, we provide an alternative description of the quantities $\psi$ and $\Psi$ appearing in the main statements of Section 1. Given a probability measure $\nu$ on $\mathbb{R}_+$, we write $\nu(s) := \nu([0, s])$. For every $\nu \in \mathcal{M}_b(\mathbb{R}_+)$ and $\lambda \in \mathbb{R}$, we denote by $\Phi_{\nu,\lambda} = \Phi_{\nu,\lambda}(t, x)$ the solution of the equation and we set Using classical properties of Ruelle probability cascades, one can verify that the functions $\psi$ and $\Psi$ defined in (1.8) and (1.13) respectively satisfy the identities (2.3), for every $\mu \in \mathcal{M}_b(\mathbb{R}_+)$ and $h \in \mathbb{R}$. Given a probability measure $\zeta$ on $\mathbb{R}_+$, let $\zeta_\mu$ denote the probability measure on $\mathbb{R}_+$ whose cumulative distribution function satisfies (2.4). In other words, the c.d.f. of $\zeta_\mu$ is Notice that it suffices to prove Theorem 1.1 for $t = 1$. Indeed, once the result is known in this case, we recover the general statement by replacing $\xi$ with $t\xi$. The main step towards the proof of Theorem 1.1 is the following result. For $\sigma_1$ distributed according to $P_1$, we denote by $d$ and $D$ the smallest and largest points of the support of the distribution of $\sigma_1^2$. We also write, for every $r \in \mathbb{R}$, $\theta(r) := r\xi'(r) - \xi(r)$.
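Since $\xi(r) = \sum_{p \geqslant 2} \beta_p^2 r^p$, the function $\theta$ can be computed term by term: $\theta(r) = \sum_{p \geqslant 2} (p-1) \beta_p^2 r^p$, which in particular is nonnegative for $r \geqslant 0$. The short Python sketch below checks this identity numerically for a finite mixture; the coefficient dictionary `betas` and all function names are hypothetical choices made for the illustration.

```python
def xi(r, betas):
    """xi(r) = sum_p beta_p^2 r^p for a finite mixture; `betas` maps p -> beta_p."""
    return sum(b**2 * r**p for p, b in betas.items())

def xi_prime(r, betas):
    """Derivative xi'(r) = sum_p p beta_p^2 r^(p-1)."""
    return sum(p * b**2 * r**(p - 1) for p, b in betas.items())

def theta(r, betas):
    """theta(r) = r xi'(r) - xi(r), as defined in the text."""
    return r * xi_prime(r, betas) - xi(r, betas)

def theta_termwise(r, betas):
    """Term-by-term form: sum_p (p - 1) beta_p^2 r^p."""
    return sum((p - 1) * b**2 * r**p for p, b in betas.items())

# Arbitrary illustrative coefficients for p = 2, 3, 4.
betas = {2: 1.0, 3: 0.5, 4: 0.25}
for r in (0.0, 0.3, 1.0, 2.0):
    print(r, theta(r, betas), theta_termwise(r, betas))
```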
We now outline the structure of the argument for obtaining Proposition 2.1. By the definitions of d and D, when σ ∼ P N = P ⊗N 1 , we have that N −1 σ 2 ∈ [d, D], and any point u ∈ [d, D] can be approximated by some N −1 σ 2 for large N and σ ∈ supp P N . For every u ∈ [d, D] and ε > 0, let The measure µ will be fixed throughout, so we keep the dependency of F ε N (u) on µ implicit in the notation. It is clear that, denoting we have that as ε > 0 tends to zero, Proposition 2.1 is a direct consequence of the following result.
The proof of Theorem 2.2 will be given in Section 4. Before doing this, we show in the next section how to deduce Theorem 1.1 from Proposition 2.1.

Hopf-Lax representation
In this section, we take the validity of Proposition 2.1 for granted, and show that it implies Theorem 1.1. We decompose the argument into five subsections.
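For readers less familiar with Hopf-Lax formulas, the following Python sketch evaluates the textbook version for the one-dimensional equation $\partial_t u + H(\partial_x u) = 0$ with convex $H$, namely $u(x,t) = \min_y \{ t L((x-y)/t) + g(y) \}$ with $L = H^*$. This is only an illustration of the general mechanism: the choices $H(p) = p^2/2$ and $g(y) = |y|$, the grid, and all names are assumptions of this sketch, and the sign and sup/inf conventions differ from those of the infinite-dimensional equation studied in this paper.

```python
import numpy as np

def hopf_lax(x, t, g, lagrangian, y_grid):
    """Grid evaluation of u(x, t) = min_y { t * L((x - y) / t) + g(y) } for t > 0."""
    return float(np.min(t * lagrangian((x - y_grid) / t) + g(y_grid)))

def huber(x, t):
    """Closed-form solution for H(p) = p^2/2 and g(y) = |y| (textbook example)."""
    return x**2 / (2 * t) if abs(x) <= t else abs(x) - t / 2

# Illustrative data: L is the convex conjugate of H(p) = p^2 / 2.
y_grid = np.linspace(-10.0, 10.0, 200_001)
lagrangian = lambda q: 0.5 * q**2
g = np.abs

for x, t in [(0.3, 1.0), (2.0, 1.0), (-1.5, 0.5)]:
    print(x, t, hopf_lax(x, t, g, lagrangian, y_grid), huber(x, t))
```

The grid minimum agrees with the closed form up to discretization error, provided the minimiser stays inside the grid.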

Removing the constraint on the support. Let us first show that we can remove the constraint $\mathrm{supp}(\nu) \subseteq [0, \xi'(u) + \mu^{-1}(1)]$ in $D_u$ without changing the value of the right side of (3.1).

Lipschitz continuity. We now show (3.5). Let us recall the definition of the set in (2.7) and define
Let us first suppose that $\nu$ is discrete. If we recall (3.4), the results of [22, Section 7] (specialized to the one-dimensional case) show that, for discrete $\nu$, Given $\nu, \tilde{\nu} \in \mathcal{M}_b(\mathbb{R}_+)$, we can interpolate between $f_N(\nu, \varepsilon)$ and $f_N(\tilde{\nu}, \varepsilon)$ by replacing (1). Then the derivative of $f_N(\nu, \varepsilon)$ in $t$ along this interpolation path equals where $\langle \cdot \rangle$ is the average with respect to the Gibbs measure on $\Omega^\varepsilon_N(u) \times \mathrm{supp}(\mathcal{R})$. Since, by the Cauchy-Schwarz inequality, $R_{1,2} \leqslant u + \varepsilon$ for $\sigma^1, \sigma^2 \in \Omega^\varepsilon_N(u)$, the above derivative is bounded by where we also used the fact that the distribution of $\alpha^1 \cdot \alpha^2 \sim U[0,1]$ under $\mathbb{E} G_N^{\otimes 2}$ is the same as under $\mathbb{E}\mathcal{R}^{\otimes 2}$ by the properties of the Ruelle probability cascades (see e.g. [19, Theorem 4.4]). This and (3.13) imply (3.5) for discrete $\nu$, and by extension for all $\nu \in \mathcal{M}_b(\mathbb{R}_+)$.

Proof of the Parisi formula
The goal of this section is to prove Theorem 2.2, which we recall implies Proposition 2.1. We first prove the upper and then the lower bound.

Upper bound.
The upper bound is proved by the standard Guerra replica-symmetry-breaking interpolation [8]. By the Lipschitz continuity (1.7), it is enough to consider discrete $\mu$ and suppose that the infimum in Theorem 2.2 is taken also over discrete distributions $\zeta \in \mathcal{M}_{0,u}$ such that $\zeta^{-1}(1) = u$. Given such $\zeta$, let $z(\alpha)$ and $y(\alpha)$ be independent Gaussian processes (conditionally on $\mathcal{R}$) indexed by $\alpha \in \mathrm{supp}(\mathcal{R})$ with covariances , and let $z_i(\alpha)$ be independent copies of $z(\alpha)$ for $i \geqslant 1$. We assume these processes to also be independent of $H_N$ and $z^\mu_i(\alpha)$, conditionally on $\mathcal{R}$. Consider an interpolating free energy, for $t \in [0,1]$, where the interpolation Hamiltonian is defined by One can see that Since $R_{1,1} = N^{-1} |\sigma^1|^2 \in (u - \varepsilon, u + \varepsilon)$ whenever $\sigma^1 \in \Omega^\varepsilon_N(u)$ and we also assumed that . Therefore, by the usual Gaussian integration by parts, where $\langle \cdot \rangle$ is the average with respect to the Gibbs measure When $\xi$ is only convex on $\mathbb{R}_+$, one can add a small perturbation that enforces the Ghirlanda-Guerra identities and, as a result, enforces asymptotic positivity of $R_{1,2}$ (see [28] or [19, Chapter 3]).
On the other hand, again, by the standard properties of the Ruelle probability cascades (recall the notation in (2.8)), Putting everything together shows that for all discrete distributions ζ ∈ M 0,u such that ζ −1 (1) = u. Since continuous extension of P(ζ µ , λ) to all ζ ∈ M 0,u not necessarily satisfying ζ −1 (1) = u is exactly (1)) 2 (this is analogous to why the term − 1 2 µ −1 (1) σ 2 was included in the definition of F N (t, µ)), this finishes the proof of the upper bound.
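The Gaussian integration by parts invoked in the interpolation argument rests on the elementary identity $\mathbb{E}[g F(g)] = \mathbb{E}[F'(g)]$ for a standard Gaussian $g$ and smooth $F$ of moderate growth. The Python sketch below checks it with Gauss-Hermite quadrature for the illustrative choice $F = \sin$, for which both sides equal $e^{-1/2}$. The helper name `gaussian_mean`, the quadrature degree, and the test function are assumptions of this example and play no role in the proof.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_mean(f, deg=40):
    """E[f(g)] for g ~ N(0, 1), via Gauss-Hermite quadrature with weight exp(-x^2 / 2)."""
    x, w = hermegauss(deg)
    # The weights integrate against exp(-x^2 / 2), so divide by sqrt(2 pi) to normalise.
    return float(np.dot(w, f(x)) / np.sqrt(2.0 * np.pi))

# Integration by parts with F = sin: E[g sin(g)] = E[cos(g)] = exp(-1/2).
lhs = gaussian_mean(lambda g: g * np.sin(g))
rhs = gaussian_mean(lambda g: np.cos(g))
print(lhs, rhs, np.exp(-0.5))
```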

Lower bound.
The proof of the lower bound is identical to the one-dimensional case of [22], with some simplifications due to the one-dimensional nature of our problem and one minor modification to account for the presence of the term ∑ N i=1 σ i z µ i (α) that we will now explain.
The main effect of this term is that the cavity fields (in the first term) of the Aizenman-Sims-Starr representation will be of the form . To understand the distribution of the array (C , ′ ) , ′ ⩾1 under the Gibbs measure that arises in the cavity computation, we can use the synchronization mechanism from [20] to synchronize the overlaps R 1,2 and q 1,2 ∶= µ −1 (α 1 ⋅ α 2 ). This can be done by including terms in the perturbation Hamiltonian with covariances given by monomials R n 1,2 q m 1,2 and then use Theorem 4 in [20] to show that both R 1,2 and q 1,2 are non-decreasing 1-Lipschitz functions of their sum in the thermodynamic limit.
If we think of the sum S 1,2 ∶= R 1,2 + q 1,2 as the quantile transform ν −1 (U ) of ν = L(S 1,2 ) and uniform U ∼ U [0, 1], then both R 1,2 and q 1,2 are non-decreasing functions of U , which means they must be quantile transforms of their distributions. The distribution of q 1,2 is µ for all N by the properties of the Ruelle probability cascades ( [19,Theorem 4.4]) and, thus, in the limit. If the limiting distribution of R 1,2 (as usual, along some subsequence) is ζ then (recalling (2.4)) . This means that C , ′ ∼ ζ µ . Similarly, the cavity fields y(σ) coming from the Onsager correction in the second term in the Aizenman-Sims-Starr scheme will have covariance in the thermodynamic limit. If ζ −1 (1) = u then the lower bound one obtains by the cavity computation is equal to For general ζ ∈ M 0,u , we again appeal to the fact that (4.3) is a continuous extension from general ζ of P(ζ µ , λ) for ζ satisfying ζ −1 (1) = u.
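The step from "non-decreasing functions of $U$" to "quantile transforms" can be illustrated on atomic measures: if two variables are both non-decreasing functions of the same uniform variable, then the quantile function of their sum is the sum of their quantile functions. The Python sketch below verifies this on a grid of values of $U$. The two atomic laws, the grid size, and the helper `quantile` (which uses the left-continuous convention $\inf\{s : \mu([0,s]) \geqslant r\}$, assumed here to match (1.2)) are all choices made for this illustration; the precise relation used in the proof is the one recalled from (2.4).

```python
import numpy as np

def quantile(atoms, weights, r):
    """Left-continuous quantile inf{s : mu([0, s]) >= r} of an atomic measure."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(atoms)
    atoms, cdf = atoms[order], np.cumsum(weights[order])
    idx = np.searchsorted(cdf, r, side="left")      # first index with cdf >= r
    return atoms[np.minimum(idx, len(atoms) - 1)]   # clamp against rounding at r = 1

# Midpoint grid standing in for the uniform variable U.
n = 1000
u = (np.arange(n) + 0.5) / n

# Two illustrative atomic laws, both realised as non-decreasing functions of U.
R = quantile([0.0, 0.4, 1.0], [0.2, 0.3, 0.5], u)
q = quantile([0.0, 2.0], [0.5, 0.5], u)
S = R + q

# Law of the sum read off the grid, and its own quantile transform.
s_atoms, counts = np.unique(S, return_counts=True)
S_from_law = quantile(s_atoms, counts / n, u)
print(np.array_equal(S_from_law, S))
```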

Proof of Theorem 1.2
The goal of this section is to prove Theorem 1.2. We obtain this result by combining Theorem 1.1 with the observation in (1.14) that $F_N$ satisfies a Hamilton-Jacobi equation, up to a small error term. Denote by $\langle \cdot \rangle$ the Gibbs measure Similarly to the observations in [13, Section 1] concerning the Curie-Weiss model (with $\xi$ replaced by the square function there), we have Notice in particular that $\partial_h F_N \geqslant 0$, $\partial_h^2 F_N \geqslant 0$, and since the support of the measure $P_1$ is assumed to be bounded, the derivatives $\partial_s F_N$ and $\partial_h F_N$ are bounded uniformly in $N$. Moreover, since $\xi$ is locally Lipschitz continuous, there exists a constant $C < \infty$ such that, for every $N$, We fix $t \geqslant 0$ and $\mu \in \mathcal{M}_b(\mathbb{R}_+)$, and denote by $f = f(s, h) : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ the candidate limit for $F_N$, namely where we set $\widetilde{\Psi}$ Notice that we do not display the dependency of $f$ and $\widetilde{\Psi}$ on $t \geqslant 0$ and $\mu \in \mathcal{M}_b(\mathbb{R}_+)$; we allow ourselves to do this since these parameters will be kept fixed throughout the section. For the same reason, from now on, we write $F_N(s, h)$ in place of $F_N(s, t, \mu, h)$.
Recalling that, by Theorem 1.1, the quantityΨ(h ′ ) is the limit of F N (0, h ′ ), and using (5.1) and (5.2), it is clear thatΨ is uniformly Lipschitz continuous, nondecreasing, and convex. One can check that these properties transfer to the function f : it is uniformly Lipschitz continuous over R + × R, and for each fixed s ⩾ 0, the mapping h ↦ f (s, h) is nondecreasing and convex (see for instance [ Our goal is to show that F N (s, h) converges to f (s, h). While we refrain from writing down a general statement, we list here all the properties of these functions that will be used below: (1) the functions are uniformly Lipschitz, with a common Lipschitz constant; (2) the functions are nondecreasing and convex in h; (3) for each h, we have lim N →∞ F N (0, h) = f (0, h); (4) the function f satisfies the equation (5.5) almost everywhere, while the function F N satisfies the same equation, up to an error that we will show to be small after integration in h, uniformly over s.
Proof of Theorem 1.2. We split the proof into two steps.
Step 1. We write down an equation for the difference between F N and f and state some elementary bounds. We denote so that, almost everywhere in R + × R, Let φ ∈ C ∞ (R) be a smooth function such that φ(0) = 0 and φ ′ ⩽ 1, and define v N ∶= φ(w N ). By the chain rule, we have It will be convenient to be allowed to differentiate b N in h. In order to make this rigorous, we regularize b N a bit, by convolution with a smooth kernel. Let ζ ∈ C ∞ c (R) be a smooth function with compact support such that ∫ R ζ = 1, and for each ε > 0, One can check that for each fixed N and s ⩾ 0, the function b N,ε (s, ⋅) converges to b N (s, ⋅) almost everywhere in R as ε tends to zero (see for instance [5,Theorem C.5.7]). Moreover, and ∂ 2 h f ε , are all nonnegative, and ξ ′′ maps R + to R + , we deduce that Notice also that, since F N and f are Lipschitz with a common Lipschitz constant, we have that b N,ε L ∞ is bounded uniformly over N and ε. We write Step 2. We fix S ⩾ 1 for the remainder of the proof, and study the quantity where we kept it implicit in the notation that the functions in the integrands are evaluated at (s, h). We now estimate the contribution of each term on the right side in turn. By the definition of R and an integration by parts, we have Recalling also that for each fixed N , we have that b N,ε (s⋅) converges to b N (s, ⋅) almost everywhere, and using the dominated convergence theorem, we see that Summarizing, we have shown that, for almost every s ∈ [0, S], Recalling that φ ′ ⩽ 1 and using (5.3), we deduce that Recalling the definition of v N , fixing s = S 2 , this implies in particular, up to a redefinition of C < ∞, Notice also that the constant C < ∞ does not depend on our choice of function φ such that φ ′ ⩽ 1. We thus deduce that Finally, by the dominated convergence theorem, the integral on the right side converges to 0 as N tends to infinity. We have thus shown that Recall that R > 0 and that our choice of S ⩾ 1 was arbitrary. 
To conclude for the pointwise convergence of F N to f , it suffices to use the Lipschitz regularity of F N . Explicitly, for every ε > 0, we can write and we have seen above that the last integral converges to the corresponding integral with F N replaced by f as N tends to infinity. Moreover, by the Lipschitz continuity of F N , Hence, sending N to infinity first and then ε to zero allows us to conclude that for each s ⩾ 0 and h ∈ R, we have indeed lim N →∞ F N (s, h) = f (s, h).