Near-extreme eigenvalues in the beta-ensembles

For beta-ensembles with convex polynomial potentials, we prove a large deviation principle for the empirical spectral distribution seen from the rightmost particle. This modified spectral distribution was introduced by Perret and Schehr (J. Stat. Phys. 2014) to study the crowding near the maximal eigenvalue, in the case of the GUE. We also prove convergence of fluctuations.


Introduction
In random matrix models, the most popular statistic is the empirical spectral distribution (ESD). For an N × N matrix M_N with real eigenvalues (λ_1, ..., λ_N), it is

\mu^{(N)} = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i}. \qquad (1.1)

The first step of the asymptotic study is to prove the convergence of µ^(N), and also of the so-called integrated density of states Eµ^(N). The limiting distribution σ is most often compactly supported. A second step is to prove the convergence of the largest eigenvalue λ^(N) = max(λ_1, ..., λ_N) to the right end of the support of σ. At a more precise level, it is sometimes possible to establish large deviations. In the so-called β-models, the density of the eigenvalues is

\frac{1}{Z_N^{V,\beta}} \, |\Delta(\lambda)|^{\beta} \exp\Big(-\frac{\beta N}{2} \sum_{i=1}^{N} V(\lambda_i)\Big), \qquad (1.2)

where \Delta(\lambda) = \prod_{i<j} (\lambda_j - \lambda_i) is the Vandermonde determinant. Under convenient assumptions on the potential V, the ESDs satisfy a large deviation principle (LDP) with speed βN²/2 and good rate function

I_V(\mu) = \int V \, d\mu - \Sigma(\mu) - c_V, \qquad (1.3)

where Σ is the logarithmic entropy

\Sigma(\mu) = \iint \ln(|x-y|) \, d\mu(x) \, d\mu(y), \qquad (1.4)

and

c_V = \inf_{\mu \in \mathcal M_1(\mathbb R)} \Big( \int V \, d\mu - \Sigma(\mu) \Big). \qquad (1.5)

Moreover, I_V achieves its minimum 0 at a unique probability measure µ_V, which is compactly supported and is consequently the limit of µ^(N).
The most famous example is the Gaussian Unitary Ensemble, which corresponds to V(x) = x²/2 and β = 2. The limiting distribution µ_V is then the semicircle distribution

\sigma_{SC}(dx) = \frac{1}{2\pi} \sqrt{4-x^2} \, \mathbf 1_{[-2,2]}(x) \, dx, \qquad (1.6)

and the large deviation result is due to [2], with c_V = 3/4. Moreover, under appropriate conditions again, the support of µ_V is an interval [a_V, b_V] and the maximal eigenvalue λ^(N) converges to b_V.
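This Gaussian β-model can be sampled numerically via the Dumitriu–Edelman tridiagonal representation. The sketch below (illustration only, not from the paper; the function name `sample_beta_ensemble` and the scaling by √(βN) are our choices, tuned so that the ESD converges to the semicircle law (1.6) on [−2, 2]):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def sample_beta_ensemble(N, beta, rng):
    """Eigenvalues with density prop. to |Delta(lam)|^beta exp(-(beta*N/2) sum lam_i^2/2),
    i.e. V(x) = x^2/2, via the (scaled) Dumitriu-Edelman tridiagonal model."""
    # Diagonal: centered Gaussians with variance 2/(beta*N).
    diag = np.sqrt(2.0 / (beta * N)) * rng.standard_normal(N)
    # Off-diagonal: chi variables with decreasing degrees beta*(N-1), ..., beta.
    off = np.sqrt(rng.chisquare(beta * np.arange(N - 1, 0, -1))) / np.sqrt(beta * N)
    return eigh_tridiagonal(diag, off, eigvals_only=True)

rng = np.random.default_rng(0)
lam = sample_beta_ensemble(2000, beta=2.0, rng=rng)
# The semicircle on [-2, 2] has mean 0 and second moment 1, and b_V = 2.
print(lam.max(), (lam ** 2).mean())
```

For β = 2 this reduces to the (tridiagonalized) GUE; the same sampler works for any β > 0.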
To analyze the "crowding" phenomenon near the largest eigenvalue, Perret and Schehr proposed in [10] and [11] to study the empirical measure

\mu_N = \frac{1}{N} \sum_{k=1}^{N} \delta_{\lambda^{(N)} - \lambda^{(k)}}, \qquad (1.7)

where λ^(1) < λ^(2) < ... < λ^(N) are the eigenvalues of M_N ranked increasingly. They considered the Gaussian case with the Dyson values β = 1, 2, 4 and made a complete study of Eµ_N in the limit N → ∞, both in the bulk and at the edge.
In the present paper, we consider more general potentials V, namely convex polynomials of even degree. We first prove that µ_N converges in probability to the pushforward ν_V of µ_V by the mapping x ↦ b_V − x. Then we prove that the family of distributions of (µ_N)_N satisfies the LDP with speed N² and a "new" rate function, which we call I_V^{DOS}, referring to the name "density of states near the maximum" given by Perret and Schehr to Eµ_N. There are two striking facts. The first is that the LDP is obtained for a Wasserstein topology (and not for the usual weak topology); this ensures in particular that the rate function is lower semicontinuous. The second is that the LDP is weak, i.e. the large deviation upper bound holds only for compact sets, not for all closed sets. This implies that we could not deduce the convergence to the limit from the LDP as usual. In the Gaussian case, we have V(x) = x²/2 and I_V^{DOS} admits the explicit expression (1.9). Section 2 is devoted to LDPs: Proposition 2.7 and Corollary 2.8 study the pair (λ^(N), µ_N), which prepares the main result, the LDP for (µ_N)_N in Theorem 2.9. The proofs are in Section 3. To complete the description of the asymptotic behavior of µ_N, we study its fluctuations in Section 4.

Assumptions and main result
To begin with, let us recall the definition of the Wasserstein distance: for p ≥ 1 and probability measures µ, ν with finite p-th moment,

d_{W_p}(\mu, \nu) = \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int |x-y|^p \, d\pi(x,y) \Big)^{1/p},

where Π(µ, ν) is the set of probabilities on R² with first marginal µ and second marginal ν.
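On the real line the optimal coupling is the monotone (quantile) one, which makes d_{W_p} easy to compute between empirical measures. A minimal numerical illustration (the helper `wp` is ours; `scipy.stats.wasserstein_distance` computes W_1):

```python
import numpy as np
from scipy.stats import wasserstein_distance  # W_1 for (weighted) empirical measures

# W_1 between an empirical measure and its translate by c equals |c|:
# the optimal coupling pi in Pi(mu, nu) shifts every atom by c.
x = np.array([0.0, 1.0, 2.0])
d1 = wasserstein_distance(x, x + 0.5)

def wp(u, v, p):
    """W_p between two empirical measures with the same number of atoms,
    via the quantile (sorted) coupling, optimal in dimension one."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)) ** p) ** (1.0 / p)

d2 = wp(x, x + 0.5, p=2)
print(d1, d2)  # both equal 0.5 for a pure translation
```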
Besides, we denote by d the usual distance for the weak topology, given by bounded Lipschitz functions. It is known that d(µ, ν) ≤ d_{W_1}(µ, ν). We assume that the sequence of distributions of (λ^(N))_N satisfies a large deviation principle with speed βN²/2 and rate function J_V^−, on the left of b_V, i.e. for deviations towards (−∞, b_V) (see [14]).
We are now interested in the behavior of µ_N. First, we have the following convergence result.

Proposition 2.6. Denote by τ_c µ the pushforward of the probability µ by the map x ↦ c − x. Then, as N → ∞, µ_N converges weakly in probability to the probability measure

\nu_V = \tau_{b_V} \mu_V. \qquad (2.5)

Our main result governs the large deviations of the pair (λ^(N), µ_N). We equip M_1^p(R_+) with d_{W_p} and denote by B(µ; δ) the ball of radius δ around µ.
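Proposition 2.6 can be checked numerically in the Gaussian case, where ν_V is the image of the semicircle law under x ↦ 2 − x. The sketch below (our construction; it uses the tridiagonal sampler for β = 2 and the standard fact that the semicircle on [−2, 2] is the affine image 4·Beta(3/2, 3/2) − 2):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
N, beta = 2000, 2.0
# Gaussian beta-ensemble via the scaled Dumitriu-Edelman tridiagonal model.
diag = np.sqrt(2.0 / (beta * N)) * rng.standard_normal(N)
off = np.sqrt(rng.chisquare(beta * np.arange(N - 1, 0, -1))) / np.sqrt(beta * N)
lam = eigh_tridiagonal(diag, off, eigvals_only=True)

mu_N = lam.max() - lam                      # spectrum seen from the top eigenvalue
# nu_V = tau_{b_V} mu_V with b_V = 2: pushforward of the semicircle by x -> 2 - x.
semicircle = 4.0 * rng.beta(1.5, 1.5, size=200_000) - 2.0
nu_V = 2.0 - semicircle
d = wasserstein_distance(mu_N, nu_V)
print(d)  # small: mu_N is W_1-close to nu_V
```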
We define I_V(c, µ) in (2.6); since Σ is invariant under the transformation τ_c, it admits an equivalent expression in terms of τ_c µ.

Proposition 2.7. For any c ∈ R and µ ∈ M_1^p(R_+), we have the large deviation lower bound (2.8) and the corresponding upper bound (2.9).

Corollary 2.8. The sequence of distributions of (λ^(N), µ_N)_N satisfies a weak LDP on R × M_1^p(R_+), equipped with the product topology, at speed βN²/2 with rate function I_V.
From these results, we may deduce on the one hand a weak LDP for the random measure µ N , and on the other hand a conditional LDP for µ N , knowing λ (N ) .
∫ V dτ_c ν. (2.11) The properties of I_V^{DOS} and G_V are governed by the following lemma: the functional is well defined on M_1^q(R), with values in [0, +∞], and lower semicontinuous for the W_q topology. We cannot prove directly that the upper bound is true for closed sets. Nevertheless, we can prove exponential tightness in a weaker topology, conditionally on λ^(N) remaining bounded, which leads to:
EJP 21 (2016), paper 52.
The conditional large deviations are ruled by the following theorem.
Then the lim sup bound follows. By invariance, we recover Point 3 of Prop. 2.5.
As noticed in Prop. 2.1 and Rem. 2.3 of [9], there is a unique µ such that J_V(c) = I_V(µ); let us call it µ_c. In the Gaussian case, its explicit expression is given in [5] (up to some notational changes).
Remark 2.14. Let us notice that, for fixed c, I_V(c, ·) and J_V(c, ·) may be seen as conditional rate functions. From (2.6) on the one hand, and from (2.16) together with the above remark on the other hand, we deduce the two corresponding identities.

Remark 2.15. Let us now give some additional comments on the Gaussian case. As we have seen above, the rate function I_V^{DOS} is zero for every probability of the form µ = τ_b µ_SC, b ≥ 2, and is thus not a convex rate function. Notice that this particular functional is lower semicontinuous not only for d_{W_1} but also for the weak topology. This is a consequence of the lower semicontinuity of Var. To prove this fact, use the representation Var(µ) = ½ E[(X − Y)²], where X and Y are two independent µ-distributed real random variables, and then apply Fatou's lemma.
A nice consequence is that the weak LDP satisfied by µ N holds also in the weak topology (see p. 127 Remark (b) in [6]).
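The variance representation invoked in Remark 2.15, Var(µ) = ½ E[(X − Y)²] for X, Y independent and µ-distributed, is a standard identity; here is a quick numerical check on a three-atom measure (illustration only):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # atoms of mu, with uniform weights
var = np.mean((x - x.mean()) ** 2)
# (1/2) E[(X - Y)^2] over independent X, Y ~ mu: average over all pairs of atoms.
half_pair = 0.5 * np.mean((x[:, None] - x[None, :]) ** 2)
print(var, half_pair)  # both equal 2/3
```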

Proofs
In this section, we begin with the proofs of the easiest results and we end with the proof of the main result.

Proof of Corollary 2.4
The set K_M = {µ : ∫ |x|^p dµ(x) ≤ M} is compact for the weak topology and is used to prove the exponential tightness of µ^(N) in [1], p. 78. Actually, K_M is also compact for the q-Wasserstein distance for q < p (see the Appendix). It is then enough to apply Theorem 4.2.4 of [6].
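For completeness, the weak compactness of this set reduces to a one-line Markov bound (tightness), closedness coming from the lower semicontinuity of µ ↦ ∫ |x|^p dµ:

```latex
\mu \in K_M = \Big\{ \mu : \int |x|^p \, d\mu(x) \le M \Big\}
\;\Longrightarrow\;
\mu\big(|x| > R\big) \;\le\; \frac{1}{R^p} \int |x|^p \, d\mu(x) \;\le\; \frac{M}{R^p}
\;\xrightarrow[R \to \infty]{}\; 0,
```

uniformly in µ ∈ K_M, so that K_M is tight and Prokhorov's theorem applies.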

Proof of Proposition 2.6
Let f be a bounded Lipschitz function with Lipschitz constant and uniform bound at most 1. From the convergence of µ^(N) to µ_V and of λ^(N) to b_V, we deduce that µ_N converges to ν_V.

Proof of Lemma 2.10
1) The uniqueness of κ_V comes from the convexity of V.
2) Write V(x) = a_p x^p + F(x), where F is a polynomial of degree at most p − 1 and a_p > 0. The function m_p(ν) is lower semicontinuous, as the supremum of the continuous functions ν ↦ ∫ (|x|^p ∧ M) dν(x).
3) The functions κ_V(ν) and m_k(ν), k ≤ p − 1, are continuous in ν for d_{W_q}. Therefore, G_V is lower semicontinuous.
4) We refer to [1] for the same properties of I_V, using e.g. for the positivity that I_V^{DOS}(µ) = I_V(τ_{κ_V(µ)}(µ)). From [2], −Σ(µ) is lower semicontinuous for the topology of weak convergence, and therefore for the stronger W_q topology.
Finally, G_V is lower semicontinuous from 3).
5) First notice that τ_b µ_V has support in R_+ iff b ≥ b_V. From (2.10) and (2.6) we have the claimed expression, and this infimum is 0, reached at c = b, since τ_b is an involution.
We could also have argued that, since the sequence µ_N converges to τ_{b_V} µ_V and I_V^{DOS} is the rate function in the LDP for µ_N, this ensures that I_V^{DOS} vanishes at τ_{b_V} µ_V.

Proof of Theorem 2.9
It is enough to take c = κ_V(µ) in the lower bound (2.8) of Proposition 2.7 to obtain the lower bound. For the upper bound, we take F = R in (2.9).

Proof of Proposition 2.7
Since the potential V is assumed to be a convex polynomial, it is bounded below by V_min. Changing V into V − V_min changes c_V into c_V − V_min, so in what follows we may and shall assume V ≥ 0.
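This reduction is harmless because the two changes cancel in the rate function (a one-line check, with c_V the normalizing constant in the definition of I_V, using ∫ dµ = 1):

```latex
I_{V - V_{\min}}(\mu)
 = \int (V - V_{\min}) \, d\mu - \Sigma(\mu) - \big(c_V - V_{\min}\big)
 = \int V \, d\mu - \Sigma(\mu) - c_V
 = I_V(\mu).
```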
Therefore, it is enough to prove the weak LDP for the measure Q̃_N. The proof consists of two parts: the lower bound and the upper bound.

Proof of the lower bound
We need an approximation lemma whose second statement is an easy consequence of Lemma 3.3 in [2] (see also [1], p. 79). Indeed, the statement is given there for the distance of weak convergence; since the measure ν (and therefore its approximation) has compact support, the same holds for the Wasserstein distance.
ii) Let ν be a probability measure on a compact set of R_+, with no atoms, and let (x_{i,N}) be the sequence of real numbers defined by ν((−∞, x_{i,N}]) = i/N. Then, for any δ > 0 and N large enough, the stated approximation holds. It is easy to see that µ_M converges weakly to µ as M tends to ∞. Moreover, by the dominated convergence theorem, the p-th moments converge as well. This implies the convergence in W_p distance (see Proposition 5.3, ii)). To prove the lower bound (2.8), we will repeat almost verbatim the proof of [1], pp.
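The quantile discretization of Lemma 3.1 ii) is easy to visualize numerically. A sketch (our illustration, assuming the standard choice x_{i,N} = i/N-quantile of ν, taken here for ν = Uniform[0,1], whose quantile function is the identity):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Quantile points x_{i,N} with nu((-inf, x_{i,N}]) = i/N, i = 1, ..., N-1,
# for nu = Uniform[0,1]; their empirical measure approximates nu in W_1.
N = 1000
x_quant = np.arange(1, N) / N                      # the points x_{i,N}
grid = (np.arange(200_000) + 0.5) / 200_000        # fine grid approximating nu
d = wasserstein_distance(x_quant, grid)
print(d)  # of order 1/N
```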
79-81, but follow step by step the role played by λ^(N). We assume that I(c, µ) < ∞, so that µ has no atoms. We can also assume that µ is compactly supported, by considering the µ_M defined in Lemma 3.1; one can check that I(c, µ_M) → I(c, µ).
Recall the representation (3.1), where the λ^(k) denote the increasing sequence of eigenvalues.
We can write L_N as in (3.2). Since on ∆_N the (y_i) and the (x_{i,N}) both form increasing sequences, we have the lower bound (3.4), and we use the same lower bound as in [1] for the Vandermonde term. For the second term we use that the y_i and x_{i,N} are positive. We get: since we have assumed that µ is compactly supported, the sets {x_{i,N}, 1 ≤ i ≤ N − 1} are uniformly bounded, and by continuity of V, for some constant C_1, the claimed estimate holds. On the other hand, from the choice of the x_{i,N}, we have the matching bound. Finally, the product in (3.4) can be handled exactly as in [1], p. 80. We conclude with the expected lower bound.

Proof of the upper bound
We start as in the proof of the lower bound with the representation (3.1). Formula (3.3) can be rewritten as (3.5). We have assumed V ≥ 0, so the second term in the third line of (3.5) is nonpositive. For the first term of the same line, notice the stated bound. We bound the term in the second line of (3.5) by (N − 1) ∫ |x| dµ_N(x). On the event {µ_N ∈ B}, we have
which is a lower semicontinuous function of ν.
We obtain the stated estimate, and since Σ_M increases to Σ as M goes to infinity, this yields the upper bound (2.9).
The same is true for F = (−∞, a]. Now, take F a non-empty closed set.
The last term in the above equation is inf_{c∈F} ∫ V(c − x) dµ(x).

Proof of Proposition 2.11 and Theorem 2.12
From Corollary 2.8, we have the lim sup bound as soon as F is compact. To extend this property to closed sets, we follow the classical route and prove exponential tightness. With our assumptions on the potential V, there exist c_1, c_2 > 0 such that the stated bound holds. Let a < b and C = sup{|a|, |b|}. For N ≥ 2, using the convexity of x ↦ x^p, we get the required estimate. It remains to use the exponential tightness for the ESD µ^(N) (see [1], p. 77), where it is shown that the stated bound holds, with M̃ an affine function of M. From Lemma 3.2, (3.9) is satisfied for F a closed set of M_1^q(R_+).

Proof of Proposition 2.11
By Proposition 2.5, we know the large deviation behavior on a compact interval ∆, and then, for a closed set F, (3.11) follows from (3.9) for closed sets. Now, we use the easy bound on inf_{µ∈F, c∈∆}.

Proof of Theorem 2.12
We use Proposition 2.5 to estimate the probabilities of the conditioning events.
Fluctuations
We want to study the fluctuations of µ_N around its limit ν_V given in (2.5). There are two contributions: the fluctuations of the largest eigenvalue and those of the ESD. This yields a dichotomy according to the behavior of the test function. For the sake of simplicity, we choose a simple assumption on the test function f, which is far from optimal. For V and β we introduce a new assumption, where TW_β denotes the Tracy-Widom distribution of index β (see [12] for a definition), and where ⇒ denotes convergence in distribution. 2. If V satisfies Assumption 2.2 and β > 0, and if ν_V(f) = 0, then the stated convergence holds, where γ_V is a signed measure on [a_V, b_V] given by formula (3.54) in [8].
Let us notice, from Remark 3.5 in [8], that in the Gaussian case V(x) = x²/2, the expression simplifies. Proof. Let H > b_V − a_V and let K_H be the random event defined accordingly. We make a Taylor expansion of f. Adding up, the two sources of fluctuations are the convergences of ε_N (rescaled) and of ∆_N(f).
• The first term converges to zero, thanks to (4.8).
• The second term is bounded by E[(|R_N(f)| ∧ 2) 1_{K_H}], which tends to zero since, on K_H, each of these terms tends to zero in probability, thanks to (4.9) and (4.7).
• The third one is bounded by 2P((K H ) c ) which tends to zero, since the extreme eigenvalues tend to the endpoints of the support.
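The Tracy-Widom fluctuation of λ^(N) at scale N^{-2/3} entering part 1 of the theorem can be observed numerically. A sketch for the Gaussian case with β = 2 (our illustration; the function `lam_max` and the tridiagonal model are the same assumptions as in the earlier sampler):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lam_max(N, beta, rng):
    """Largest eigenvalue of the Gaussian beta-ensemble, tridiagonal model,
    normalized so that the spectrum fills [-2, 2]."""
    diag = np.sqrt(2.0 / (beta * N)) * rng.standard_normal(N)
    off = np.sqrt(rng.chisquare(beta * np.arange(N - 1, 0, -1))) / np.sqrt(beta * N)
    return eigh_tridiagonal(diag, off, eigvals_only=True).max()

rng = np.random.default_rng(1)
N, beta = 200, 2.0
# Rescaled fluctuations N^{2/3}(lambda_max - b_V), b_V = 2, approx TW_2-distributed.
samples = np.array([N ** (2 / 3) * (lam_max(N, beta, rng) - 2.0) for _ in range(200)])
print(samples.mean())  # O(1) and negative (TW_2 has mean about -1.77)
```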

Appendix
We give some properties of the Wasserstein distance d_{W_p}:

d_{W_p}(\mu, \nu) = \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int |x-y|^p \, d\pi(x,y) \Big)^{1/p},

where Π(µ, ν) is the set of probabilities on R² with first marginal µ and second marginal ν. For p = 1, it admits the dual representation

d_{W_1}(\mu, \nu) = \sup_{\|f\|_{Lip} \le 1} \Big| \int f \, d\mu - \int f \, d\nu \Big|.
We now give a characterization of the convergence of probabilities in the topology induced by d_{W_p} on M_1^p(R). We refer to [13, Def. 6.8 and Theorem 6.9]. In the following, we denote by µ_n → µ the weak convergence of probabilities, i.e. convergence against bounded continuous functions. The condition in iv) is the condition of tightness, or relative compactness, in (M_1^p(R), d_{W_p}). In particular, it follows that, for any M ∈ R, the set K_M := {µ : ∫ |x|^p dµ(x) ≤ M} is compact in (M_1^q(R), d_{W_q}) for any q < p.
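The need for the moment condition in iv) can be seen on a standard escaping-mass example (our illustration, not from the paper): µ_n = (1 − 1/n) δ_0 + (1/n) δ_n converges weakly to δ_0, yet d_{W_1}(µ_n, δ_0) = 1 for every n, because the vanishing mass 1/n travels a distance n.

```python
from scipy.stats import wasserstein_distance

# mu_n = (1 - 1/n) delta_0 + (1/n) delta_n versus delta_0: weights are
# normalized by scipy, and W_1 equals n * (1/n) = 1 for every n.
for n in [10, 100, 1000]:
    d = wasserstein_distance([0.0, float(n)], [0.0],
                             u_weights=[1 - 1 / n, 1 / n], v_weights=[1.0])
    print(n, d)  # d == 1.0 in each case
```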