On the accuracy of the normal approximation for the free energy in the REM

In the present paper we consider the fluctuations of the free energy in the random energy model (REM) on a moderate deviation scale. We find that for high temperatures the normal approximation holds only in a narrow range of scalings away from the CLT. For scalings of higher order, probabilities of moderate deviations decay faster than exponentially.


Introduction
The random energy model (REM for short) is a disordered spin system from statistical mechanics, introduced by Derrida in 1980 [4,5]. It is a toy model describing a system of $N$ particles that can assume one of the $2^N$ accessible states from the set $S_N = \{-1,+1\}^N$, called the configuration space. The energy of a state $\sigma \in S_N$ is given by $H(\sigma) = -\sqrt{N}\, X_\sigma$, where $X_\sigma$ is an $\mathcal{N}(0,1)$-distributed random variable, and the energies of different states are assumed to be independent; that is, $(H(\sigma))_{\sigma \in S_N}$ is (for fixed $N$) a family of i.i.d. normally distributed random variables. Despite its far-reaching simplifications, the REM is an important model in statistical mechanics and has been studied intensively over the last decades. More recent expositions of the model can be found in the books [1,13].
In the following, let $(\Omega, \mathcal{F}, P)$ be the probability space on which the triangular array of independent $\mathcal{N}(0,1)$-distributed random variables $(X_\sigma : \sigma \in S_N,\ N \in \mathbb{N})$ is defined. The probability of observing a configuration $\sigma \in S_N$ of the $N$-particle system is given by the random Gibbs measure
$$P_{N,\beta}(\sigma) := \frac{e^{-\beta H(\sigma)}}{Z_N(\beta)},$$
where $\beta > 0$ is the inverse temperature and $Z_N(\beta)$ is a random normalization given by
$$Z_N(\beta) := \sum_{\sigma \in S_N} e^{-\beta H(\sigma)} = \sum_{\sigma \in S_N} e^{\beta \sqrt{N} X_\sigma},$$
which is called the partition function. Obviously, the minus sign in the definition of the random Hamiltonian $H(\sigma)$ and the minus sign in the definition of the Gibbs measure cancel each other; however, it is convention to use them.
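In statistical mechanics, one is interested in the existence of the so-called free energy $F_N(\beta) := \frac{1}{N} \log Z_N(\beta)$ in the limit $N \to \infty$ in an appropriate sense. Note that this definition of the free energy differs from the one used by physicists by the factor $-\beta^{-1}$, which is constant and, therefore, omitted by mathematicians. A complete result on the existence of the free energy in the sense of almost sure convergence and convergence in $L^p$ was proved by Olivieri and Picco in 1984 [11] and reads as follows:
Theorem 1.1 ([11]). For every $\beta > 0$,
$$\lim_{N \to \infty} F_N(\beta) = F(\beta) := \begin{cases} \log 2 + \beta^2/2, & \beta \le \beta_c, \\ \beta \sqrt{2 \log 2}, & \beta > \beta_c, \end{cases} \qquad \beta_c := \sqrt{2 \log 2}, \tag{1}$$
$P$-almost surely and in $L^p(\Omega, \mathcal{F}, P)$ for any $1 \le p < \infty$.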
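To make these definitions concrete, the following minimal sketch samples one realisation of the REM for a small $N$ and evaluates the partition function, the Gibbs weights and $\frac{1}{N}\log Z_N(\beta)$. It assumes $Z_N(\beta) = \sum_{\sigma} e^{\beta\sqrt{N}X_\sigma}$, i.e. the normalization that turns $P_{N,\beta}$ into a probability measure; the function names `sample_rem`, `log_partition`, `gibbs_weights` and `free_energy` are illustrative and not taken from the paper.

```python
import math
import random


def sample_rem(n, rng):
    """Draw the 2**n i.i.d. N(0,1) variables X_sigma, one per configuration."""
    return [rng.gauss(0.0, 1.0) for _ in range(2 ** n)]


def log_partition(xs, n, beta):
    """log Z_N(beta), assuming Z_N(beta) = sum_sigma exp(beta*sqrt(N)*X_sigma).

    Uses the log-sum-exp trick so that large exponents do not overflow.
    """
    exponents = [beta * math.sqrt(n) * x for x in xs]  # equals -beta*H(sigma)
    m = max(exponents)
    return m + math.log(sum(math.exp(e - m) for e in exponents))


def gibbs_weights(xs, n, beta):
    """Gibbs probabilities exp(-beta*H(sigma)) / Z_N(beta)."""
    log_z = log_partition(xs, n, beta)
    return [math.exp(beta * math.sqrt(n) * x - log_z) for x in xs]


def free_energy(xs, n, beta):
    """Finite-volume free energy (1/N) * log Z_N(beta)."""
    return log_partition(xs, n, beta) / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, arbitrary choice
    n, beta = 10, 0.5       # illustrative parameters
    xs = sample_rem(n, rng)
    print("F_N(beta)          =", free_energy(xs, n, beta))
    print("log 2 + beta^2 / 2 =", math.log(2) + beta ** 2 / 2)
```

For such small $N$ the sampled value is only a rough indication of the limiting behaviour; the sketch is meant to fix notation rather than to reproduce any result quantitatively.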
The convergence in $L^1$ implies that the quenched free energy $E F_N(\beta)$ also converges to $F(\beta)$ and, consequently,
$$\lim_{N \to \infty} |F_N(\beta) - E F_N(\beta)| = 0$$
holds $P$-almost surely, which is why the free energy of the REM is said to be a self-averaging quantity. Moreover, the annealed free energy is given by
$$\frac{1}{N} \log E Z_N(\beta) = \log 2 + \frac{\beta^2}{2}$$
and, therefore, the quenched free energy and the annealed free energy coincide in the limit $N \to \infty$ if $\beta \le \beta_c$. This breaks down for $\beta > \beta_c$, where the quenched free energy is strictly less than the annealed free energy. Beyond this, a precise picture of the deviations and fluctuations of the free energy has already been obtained. In view of Theorem 1.1, it is a natural first step to ask for refinements of this limit theorem on the level of large deviations and, therefore, we briefly recall what a large deviation principle (LDP) is. For a thorough introduction to the field we refer to the books [3,8]. Let $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ be a measurable space, consisting of a Hausdorff topological space $\mathcal{X}$ endowed with the Borel $\sigma$-field $\mathcal{B}_{\mathcal{X}}$. In addition, let $\gamma_n \to \infty$ be a sequence of real numbers and $I : \mathcal{X} \to [0, \infty]$ be a lower semicontinuous function. A sequence of random variables $(X_n)_{n \in \mathbb{N}}$ defined on some probability space $(S, \mathcal{A}, \mathbb{P})$ with values in $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ is said to satisfy the large deviation principle (LDP for short) with speed $\gamma_n$ and rate function $I$ if
$$-\inf_{x \in A^{\circ}} I(x) \le \liminf_{n \to \infty} \frac{1}{\gamma_n} \log \mathbb{P}(X_n \in A) \le \limsup_{n \to \infty} \frac{1}{\gamma_n} \log \mathbb{P}(X_n \in A) \le -\inf_{x \in \overline{A}} I(x)$$
for all $A \in \mathcal{B}_{\mathcal{X}}$, where $A^{\circ}$ and $\overline{A}$ denote the interior and the closure of $A$. The rate function $I$ is said to be good if the level sets $\{x \in \mathcal{X} : I(x) \le c\}$ are compact subsets of $\mathcal{X}$ for all $c \in \mathbb{R}$.
As already hinted at, the probabilities of $O(1)$-deviations from the limiting free energy $F(\beta)$ have already been quantified. In [9], Fedrigo, Flandoli and Morandin proved a large deviation theorem for the free energy, which is stated next:
Theorem 1.2 ([9]). The sequence of random variables $(\frac{1}{N} \log Z_N(\beta))_{N \in \mathbb{N}}$ satisfies the LDP with speed $N$ and good rate function $I$ given by , where $F(\beta)$ is the limit of the free energy defined in (1).
Note that large deviation techniques can also be used to prove (1) $P$-a.s. via Varadhan's Lemma (cf. [6]).
While Theorem 1.2 describes the atypical behavior of $F_N(\beta)$ by studying the probabilities of large deviations, the typical behavior is described by theorems on its fluctuations, i.e. by theorems on distributional convergence of the properly rescaled free energy. This has been done by Bovier, Kurkova and Löwe in [2]. They proved the existence of multiple phase transitions on the level of distributional convergence and found that the fluctuations of the free energy are exponentially small. What is more, they are Gaussian if and only if $\beta \le \sqrt{\log 2/2}$:
Theorem 1.3 ([2]).
Remark 1.1. Since it will be of some importance for the present paper, we briefly sketch the course of action followed in [2]: Using the Taylor expansion $\log(1 + x) = x + o(x)$ for $x \to 0$, the authors reduce the proof of a limit theorem for the rescaled logarithm of $Z_N(\beta)/E Z_N(\beta)$ to a limit theorem for the normalized sum
$$2^{-N/2} \sum_{\sigma \in S_N} Y_N(\sigma), \tag{2}$$
where $Y_N(\sigma) = (e^{\beta \sqrt{N} X_\sigma} - e^{N \beta^2/2})/e^{N \beta^2}$ are (for each $N$) i.i.d. random variables with mean zero and variance $s_N^2 = 1 - e^{-N \beta^2} \to 1$ as $N \to \infty$. Next, the authors show that $Y_N(\sigma)$ satisfies Lindeberg's condition if $\beta < \sqrt{\log 2/2}$, and obtain Theorem 1.3 (i) by means of the CLT for triangular arrays. However, for $\beta = \sqrt{\log 2/2}$, $Y_N(\sigma)$ does not satisfy Lindeberg's condition, which is related to the fact that the tails of $Y_N(\sigma)$ become too heavy, and the behaviour of the sum $\sum_{\sigma} Y_N(\sigma)$ is dominated by extremal events. Yet, the authors can still prove convergence in distribution to a normal distribution and obtain Theorem 1.3 (ii). It is worth noting that the authors also obtain complete results for $\beta > \sqrt{\log 2/2}$, where non-standard limiting distributions occur, which is due to the fact that $Y_N(\sigma)$ has even more weight in its tails in these cases.
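The moment claims in Remark 1.1 can be checked numerically. The sketch below is an illustration only and not part of [2]: it simulates $2^{-N/2}\sum_{\sigma} Y_N(\sigma)$ for a small $N$ and compares the empirical mean and variance with $0$ and $s_N^2 = 1 - e^{-N\beta^2}$. The helper names, the seed and the parameter values are arbitrary choices.

```python
import math
import random


def normalized_sum(n, beta, rng):
    """One draw of 2**(-n/2) * sum_sigma Y_N(sigma), where
    Y_N(sigma) = (exp(beta*sqrt(n)*X) - exp(n*beta**2/2)) / exp(n*beta**2)."""
    mean_term = math.exp(n * beta ** 2 / 2)  # E exp(beta*sqrt(n)*X)
    scale = math.exp(n * beta ** 2)
    root_n = math.sqrt(n)
    total = 0.0
    for _ in range(2 ** n):
        x = rng.gauss(0.0, 1.0)
        total += (math.exp(beta * root_n * x) - mean_term) / scale
    return total / 2 ** (n / 2)


def empirical_moments(n, beta, reps, seed=0):
    """Empirical mean and variance of the normalized sum over `reps` draws."""
    rng = random.Random(seed)
    draws = [normalized_sum(n, beta, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    n, beta = 8, 0.3  # illustrative values with beta < sqrt(log(2)/2)
    mean, var = empirical_moments(n, beta, reps=2000)
    print("empirical mean    :", mean)
    print("empirical variance:", var)
    print("s_N^2             :", 1 - math.exp(-n * beta ** 2))
```

Since the summands are i.i.d. with variance $s_N^2$ and there are $2^N$ of them, the normalization $2^{-N/2}$ leaves the variance of the sum equal to $s_N^2$, which is what the comparison is meant to show.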
In view of Theorem 1.3, we ask the following question: can the tail probabilities be approximated by the tails of a normal distribution even for growing $t$, that is, does the approximation remain valid even for $t_N \to \infty$? It is well known (use e.g. (8)) that a standard normal random variable $X$ satisfies
$$\lim_{t \to \infty} \frac{1}{t^2} \log P(X \ge t x) = -\frac{x^2}{2}$$
for any $x > 0$ and, thus, we ask for the validity of
$$\lim_{N \to \infty} \frac{1}{t_N^2} \log P\Bigl( \frac{1}{t_N}\, e^{\frac{N}{2}(\log 2 - \beta^2)} \log \frac{Z_N(\beta)}{E Z_N(\beta)} \ge x \Bigr) = -\frac{x^2}{2} \tag{3}$$
for any $x > 0$ or, more generally, for the existence of the LDP with speed $t_N^2$ and Gaussian rate function $I(x) = x^2/2$. Large deviation results for the remaining scalings between those of the CLT and the LDP are commonly referred to as moderate deviation results in the literature, since one asks for deviations of $F_N(\beta)$ of order $o(1)$ from $F(\beta)$. In like manner, LDPs for scalings that are between those of the CLT and the LDP are called moderate deviation principles (MDPs).
However, we will stick to the term LDP, since the formal definitions of the LDP and MDP are the same. Note that moderate deviations for mean-field models from statistical mechanics have already been studied (cf. e.g. [10,12]). We will show in this article that (3) holds if and only if $t_N = o(\sqrt{N})$, that is, (3) holds only in a small range of scalings close to the CLT scaling. This is particularly interesting since it is at odds with the general picture of moderate deviations obtained from the case of partial sums of standardized i.i.d. random variables $(X_i)_{i \in \mathbb{N}}$. The prototypical answer for this case is that $(t_n \sqrt{n})^{-1} \sum_{i=1}^{n} X_i$ satisfies, under suitable conditions, the LDP with speed $t_n^2$ and Gaussian rate function $I(x) = x^2/2$ for the whole range of scalings between the corresponding CLT and LLN (see [7] for a necessary and sufficient condition for this type of moderate deviations). In particular, the rate function does not depend on the moderate deviation scaling.
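The Gaussian tail asymptotics behind the rate function $I(x) = x^2/2$ can be checked numerically. The following sketch evaluates $t^{-2}\log P(X \ge tx)$ for a centred normal variable $X$ with a given variance, using only the complementary error function from the standard library; the function names are illustrative. The values of $t$ are kept moderate because the tail probabilities underflow in double precision for very large arguments.

```python
import math


def gaussian_tail(y, variance=1.0):
    """P(X >= y) for X ~ N(0, variance), via the complementary error function."""
    return 0.5 * math.erfc(y / math.sqrt(2.0 * variance))


def scaled_log_tail(t, x, variance=1.0):
    """(1/t**2) * log P(X >= t*x); tends to -x**2 / (2*variance) as t grows."""
    return math.log(gaussian_tail(t * x, variance)) / t ** 2


if __name__ == "__main__":
    x = 1.0
    for t in (2.0, 5.0, 10.0, 20.0):
        # Columns: t, variance 1 (limit -x**2/2), variance 1/2 (limit -x**2).
        print(t, scaled_log_tail(t, x), scaled_log_tail(t, x, variance=0.5))
```

The convergence is slow: the error is of order $\log t / t^2$, which comes from the polynomial prefactor in the Gaussian tail.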
The main result of the present paper reads as follows, where $t_N \to \infty$ is from now on a diverging sequence of real numbers:
Theorem 1.4 (Moderate deviations for the free energy in the REM).
(i) Let $\beta < \sqrt{\log 2/2}$. Then
$$\frac{1}{t_N}\, e^{\frac{N}{2}(\log 2 - \beta^2)} \log \frac{Z_N(\beta)}{E Z_N(\beta)}$$
satisfies the LDP. If $t_N = o(\sqrt{N})$, then the corresponding speed is $t_N^2$ and the good rate function is $I(x) = x^2/2$. Otherwise, if $\liminf_{N \to \infty} t_N/\sqrt{N} > 0$, the LDP holds for any speed $\gamma_N = o(N)$ with the good rate function
$$I(x) = \begin{cases} 0, & x = 0, \\ \infty, & x \neq 0. \end{cases} \tag{4}$$
(ii) Let $\beta = \sqrt{\log 2/2}$. Then the same random variable satisfies, for any scaling $t_N = o(\sqrt{\log N})$, the LDP with speed $t_N^2$ and good rate function $I$ given by $I(x) = x^2$.
Remark 1.2.
1. Note that the restriction $\gamma_N = o(N)$ is natural in view of the LDP (Theorem 1.2): if one considers deviations of lower order than in the LDP, then the speed of convergence to zero of these probabilities is of lower order than the speed occurring in the LDP, which was $N$ in our case.
2. The degenerate rate function appearing in (4) reflects the superexponential decay of moderate deviation probabilities in the case of overscaling.
3. Observe that for $\beta = \sqrt{\log 2/2}$ the obtained rate function is $I(x) = x^2$, which matches the fact that the limiting distribution in the CLT is $\mathcal{N}(0, 1/2)$ and that a $\mathcal{N}(0, 1/2)$-distributed random variable $X$ satisfies $\lim_{t \to \infty} t^{-2} \log P(X \ge t x) = -x^2$ for any $x > 0$.

Proof of Theorem 1.4
This section is devoted to the proof of Theorem 1.4, which is based on the following idea: As a first step, we follow the idea of the proof of the CLT and use the approximation $\log(1 + x) = x + o(x)$ for $x \to 0$ to reduce the proof of the LDP for
$$\frac{1}{t_N}\, e^{\frac{N}{2}(\log 2 - \beta^2)} \log \frac{Z_N(\beta)}{E Z_N(\beta)} \tag{5}$$
to the proof of the LDP for
$$\frac{1}{t_N}\, 2^{-N/2} \sum_{\sigma \in S_N} Y_N(\sigma). \tag{6}$$
To that end, we will show in Lemma 2.1 that the random variables (5) and (6) are exponentially equivalent (for a definition see e.g. Definition 4.2.10 in [3]), since it is known that exponentially equivalent random variables satisfy the same LDP (see e.g. Theorem 4.2.13 in [3]). Then, we are left to prove the LDP for the random variable (6), where $(Y_N(\sigma);\ \sigma \in S_N,\ N \in \mathbb{N})$ is a triangular array of independent random variables, which were defined in (2). However, the random variable $Y_N(\sigma)$ does not have finite exponential moments, which is why we again use the concept of exponential equivalence to switch to the truncated random variables $Y^t_N(\sigma)$ (see Lemma 2.2), which can be studied by means of the Gärtner-Ellis theorem (cf. e.g. Theorem 2.3.6 in [3]).
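The linearization step can be made explicit by a short computation. The following worked identity is a sketch under two assumptions that are read off from the definitions rather than quoted from the text: the partition function is $Z_N(\beta) = \sum_{\sigma} e^{\beta\sqrt{N}X_\sigma}$, and $Y_N(\sigma)$ is the centred and rescaled Boltzmann weight from Remark 1.1.

```latex
% Assumptions: Z_N(\beta) = \sum_{\sigma \in S_N} e^{\beta\sqrt{N} X_\sigma},
% Y_N(\sigma) = (e^{\beta\sqrt{N} X_\sigma} - e^{N\beta^2/2}) / e^{N\beta^2}.
\begin{align*}
E Z_N(\beta) &= 2^N e^{N\beta^2/2}, \\
\frac{Z_N(\beta)}{E Z_N(\beta)}
  &= 1 + 2^{-N} e^{-N\beta^2/2} \sum_{\sigma \in S_N}
     \bigl( e^{\beta\sqrt{N} X_\sigma} - e^{N\beta^2/2} \bigr) \\
  &= 1 + 2^{-N} e^{N\beta^2/2} \sum_{\sigma \in S_N} Y_N(\sigma) \\
  &= 1 + e^{-\frac{N}{2}(\log 2 - \beta^2)} \cdot 2^{-N/2}
     \sum_{\sigma \in S_N} Y_N(\sigma).
\end{align*}
```

For $\beta^2 < \log 2$ the prefactor $e^{-\frac{N}{2}(\log 2 - \beta^2)}$ decays exponentially, so the expansion $\log(1 + x) = x + o(x)$ suggests that $e^{\frac{N}{2}(\log 2 - \beta^2)} \log(Z_N(\beta)/E Z_N(\beta))$ behaves like $2^{-N/2}\sum_{\sigma} Y_N(\sigma)$; making this precise on the relevant exponential scale is the purpose of the exponential equivalence argument.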
We prepare the proof of Theorem 1.4 by stating and proving the above-mentioned lemmata:
Lemma 2.1. The random variables (5) and (6) are exponentially equivalent for any speed $\gamma_N = o(N)$.
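Proof. Let $\varepsilon > 0$ and $T_{\beta,N}$ : for $N$ sufficiently large, where we have made use of Markov's inequality to obtain the last but one line. A direct calculation yields and, therefore,
Lemma 2.2. Let $\beta \le \sqrt{\log 2/2}$ and assume $t_N = o(\sqrt{N})$ if $\beta < \sqrt{\log 2/2}$ and $t_N = o(\sqrt{\log N})$ if $\beta = \sqrt{\log 2/2}$. Then $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y_N(\sigma)$ and $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y^t_N(\sigma)$ are exponentially equivalent on the scale $t_N^2$.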
Proof. We get where $\sigma_0 \in S_N$ and Making use of the standard estimate (8), which holds for all $x > 0$, we get Note that $(\beta - \log 2/(2\beta))^2 > 0$ if and only if $\beta \neq \sqrt{\log 2/2}$, so that the last line follows from the conditions imposed on the asymptotic behavior of $t_N$.
Now that we have gathered all preliminary results, we can start with the proof of this article's main theorem:
Proof of Theorem 1.4. We start with the proof of the first part of (i) and of (ii). To that end, let $\beta \le \sqrt{\log 2/2}$ and assume $t_N = o(\sqrt{N})$ if $\beta < \sqrt{\log 2/2}$ and $t_N = o(\sqrt{\log N})$ if $\beta = \sqrt{\log 2/2}$. By means of Lemma 2.1 and Lemma 2.2 it suffices to prove the desired LDP for $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y^t_N(\sigma)$. This follows directly from the Gärtner-Ellis theorem once we have proved (9) for any $\sigma_0 \in S_N$, this follows, using the Taylor expansion log(1 + which we are going to prove in the sequel. To that end, we calculate the asymptotics of the first three moments of $Y^t_N(\sigma_0)$ and get
Ad (10): With $c_N(\beta)$ (cf. (7)) we have Using the standard estimate (8) for a Gaussian random variable, we see and which yields (10) as where we have used in the last line that $\log 2/(2\beta^2) - 1 = 0$ and $(\beta - \log 2/(2\beta))^2 = 0$ if $\beta = \sqrt{\log 2/2}$, and $(\beta - \log 2/(2\beta))^2 > 0$ if $\beta < \sqrt{\log 2/2}$.
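The exact form of the standard estimate (8) is not reproduced in the text above. A commonly used version, assumed here purely for illustration, is the pair of Mills-ratio bounds $\frac{x}{1+x^2}\varphi(x) \le P(X \ge x) \le \frac{\varphi(x)}{x}$ for $x > 0$, where $X$ is standard normal and $\varphi$ is its density. The sketch below compares both bounds with the tail probability computed from the complementary error function; the function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """P(X >= x) for X ~ N(0,1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_bounds(x):
    """Classical bounds x/(1+x^2)*phi(x) <= P(X >= x) <= phi(x)/x for x > 0."""
    if x <= 0:
        raise ValueError("the bounds are stated for x > 0 only")
    return x / (1.0 + x * x) * phi(x), phi(x) / x


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0, 10.0):
        lower, upper = mills_bounds(x)
        # Columns: x, lower bound, tail probability, upper bound.
        print(x, lower, upper_tail(x), upper)
```

Both bounds share the factor $e^{-x^2/2}$ and differ only by a polynomial factor in $x$, which is why estimates of this type determine tail probabilities on a logarithmic scale.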
Ad (12): For every $\varepsilon > 0$ it is (1), where we have used the same argument as in (13) to derive the last line. Thus, with the help of (11) we see $\lim_{N \to \infty}$ which yields (12) as $\varepsilon$ was arbitrary. Now, we see that (9) follows with the help of (10) and (11) from Since $\lambda t_N 2^{-N/2} Y^t_N(\sigma_0)$ is bounded by $\lambda$, it can easily be seen, using the Lagrange form of the remainder in Taylor's formula, that which finishes the proofs of the first part of (i) and of (ii) with the help of (12).
For the second part of (i), let $\gamma_N = o(N)$ be an arbitrary speed. It suffices to prove for any $\varepsilon > 0$. The validity of (14) follows directly from for any $\delta > 0$ by the first part of (i), which we proved above. Finally, this also yields the validity of (15) as (14)
Whether the random variable (5) satisfies an LDP in the case $\beta = \sqrt{\log 2/2}$, $\liminf_{N \to \infty} t_N/\sqrt{\log N} > 0$, is still an open question. By Lemma 2.1 this random variable is exponentially equivalent to $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y_N(\sigma)$, and it can even be shown that $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y^t_N(\sigma)$ satisfies the LDP with speed $t_N^2$ and rate function $I(x) = x^2$ under the natural condition $t_N = o(\sqrt{N})$. However, one can show that in this case $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y_N(\sigma)$ and $t_N^{-1} 2^{-N/2} \sum_{\sigma \in S_N} Y^t_N(\sigma)$ are not exponentially equivalent, since the tails of $Y_N(\sigma)$ become too heavy and extremal events start to dominate the behavior of the sum. This is the same effect that can be observed in the CLT, where it causes a breakdown of the standard CLT.