Convergence of the eigenvalue density for beta-Laguerre ensembles on short scales

In this note, we prove that the normalized trace of the resolvent of the beta-Laguerre ensemble eigenvalues is close to the Stieltjes transform of the Marchenko-Pastur (MP) distribution with very high probability, for values of the imaginary part greater than m^{-1+\epsilon}. As an immediate corollary, we obtain convergence of the one-point density to the MP law on short scales. The proof serves to illustrate some simplifications of the method introduced in our previous work to prove a local semi-circle law for Gaussian beta-ensembles.


Introduction
Consider an m×n matrix X whose entries are i.i.d. complex Gaussian random variables with mean 0 and variance E|X_{ij}|^2 = 1. The m×m matrix H = XX^*, where the star denotes the conjugate transpose, is a Wishart matrix [23]. It is a classical result of random matrix theory that the distribution of the eigenvalues λ_1, ..., λ_m of H is given by the density f_2(λ) on R^m:

f_2(λ) = Z_{2,m}^{−1} ∏_{i<j} |λ_i − λ_j|^2 ∏_{i=1}^m λ_i^{n−m} e^{−λ_i},   λ_i > 0.

Here Z_{2,m} is a normalization factor. Suppose the limit m/n → d exists for 0 < d ≤ 1.
Then the empirical distribution function F_{2,n} of the eigenvalues of the rescaled matrix H/n converges to the Marchenko-Pastur distribution, with density

ρ_{MP,d}(x) = sqrt((λ_+ − x)(x − λ_−)) / (2πdx) for λ_− ≤ x ≤ λ_+,   λ_± = (1 ± √d)^2,

and ρ_{MP,d}(x) = 0 otherwise. In fact, V.A. Marchenko and L.A. Pastur [18] showed that weak convergence of the eigenvalue distribution also holds when the entries of X are i.i.d. but not necessarily Gaussian. A. Edelman and I. Dumitriu [7] have introduced an infinite family of tridiagonal random matrix models, termed β-Laguerre matrices, which generalize the Wishart model and have the explicit eigenvalue density

(1) f_{β,a}(λ) = Z_{β,m}^{−1} ∏_{i<j} |λ_i − λ_j|^β ∏_{i=1}^m λ_i^{a−1−β(m−1)/2} e^{−λ_i/2}

for 0 < β < ∞. Here a > β(m − 1)/2 is a real parameter, and Z_{β,m} is another normalization factor. The (Gaussian) Wishart eigenvalue distribution corresponds to the Laguerre ensemble with β = 2 and a = βn/2. Almost-sure convergence of the eigenvalue density to the Marchenko-Pastur distribution with parameter d = lim mβ/(2a) was established for the eigenvalue distributions of the β-Laguerre matrix models by the moment method in [6]; that is, for any a < b:

F_{β,n}(b) − F_{β,n}(a) → ∫_a^b ρ_{MP,d}(x) dx

almost surely as n → ∞. F_{β,n}(x) is the eigenvalue distribution function, defined as in the case β = 2 above. The object of the present note is to extend these results on convergence of the eigenvalue distribution of general β-Laguerre ensembles to short intervals [a_m, b_m] such that b_m − a_m = O(m^{−1+ϵ}), for ϵ > 0 arbitrary.
The main result is the following:

Theorem 1. Let β > 0 and 0 < d ≤ 1. Set a = mβ/(2d). Let δ, κ, ϵ > 0 be positive parameters, and E ∈ (λ_− + κ, λ_+ − κ). Denote by N_I the number of eigenvalues of the β-Laguerre ensemble of size m with parameter a in the interval I. For any k > 0, there exists a constant C_{δ,κ,ϵ,k} such that:

The proof of Theorem 1 is obtained by combining Theorem 2 with Corollary 3 in Section 4. Following recent work of L. Erdős, B. Schlein, H.T. Yau and collaborators in the case of Wigner matrices (see for example [9], [10], [11], [12], [13], [14]), our method is based on the study of the resolvent matrix of the Edelman-Dumitriu tridiagonal models.
Let M_{β,d} denote a normalized tridiagonal β-Laguerre matrix (see Section 2 for details on notation), and let s_{β,d}(z) denote the normalized trace of its resolvent. The imaginary part of the trace of the resolvent provides an approximation for the eigenvalue distribution on scales comparable to the distance between z and the spectrum of M_{β,d}. In [20], we showed how a resolvent expansion, together with an iterative argument based on the Schur complement identity, could be used to derive a local version of the semi-circle law for the Gaussian β-ensembles. Tridiagonal models for these eigenvalue distributions appeared in [7]. In Proposition 2.1, we use a resolvent expansion to show that s_{β,d}(z) is close to the Stieltjes transform of ρ_{MP,d} for values of z away from the spectral edges and with ℑz > m^{−1/4+ϵ}. This implies that Theorem 1 holds for intervals I of size |I| ≥ m^{−1/4+ϵ}. The argument in the present work is substantially simpler than the corresponding one in [20], where we proved a local convergence result on the scale m^{−1/2+ϵ} using a resolvent expansion and asymptotics for Hermite polynomials derived by the Riemann-Hilbert method. Instead of attempting to exploit the cancellation due to oscillation of the normalized Laguerre polynomials, we use a general off-diagonal resolvent estimate (see Lemma 2.2). The computation of the limit of the normalized resolvent trace for the deterministic matrix corresponding to β = ∞ using Riemann-Hilbert asymptotics in [20] has been replaced by a less involved derivation; see Section 3. The iterative argument leading to Theorem 2 in Section 4 is similar to the one in [20]. Note that Theorem 2 is deduced from Propositions 4.3 and 4.4, without reference to Proposition 2.1. In contrast, the local result on the intermediate scale m^{−1/2+ϵ} in [20] was used as an input for an inductive argument to reach the scale m^{−1+ϵ} (see footnote 1).
Although it can be entirely replaced by the iteration in Section 4, we have chosen to present the argument for Proposition 2.1 because it provides an elementary alternative to the Schur complement approach to proving the Marchenko-Pastur law. The proof of Proposition 2.1 does not depend on the specific properties of the β-Laguerre ensembles other than concentration of the entries around their mean. It can be applied to fairly general tridiagonal models with independent entries to prove convergence of the eigenvalue distribution, and convergence up to some intermediate scale depending on the magnitude of the entries. We give examples of such extensions in Section 5.
We end this section with some references to previous literature. Local versions of the Marchenko-Pastur law for the eigenvalue distribution of XX^* when the entries of X are independent but not necessarily Gaussian have appeared in [4], [12], [21]. The first paper deals with the hard edge of the spectrum in case d = 1. The use of a perturbative expansion around deterministic matrices associated to Laguerre polynomials already appears in [8], where the authors study fluctuations of the spectrum in the large β limit. In [19], I. Popescu proves convergence and Gaussian fluctuations for the moments of tridiagonal matrices under a scaling assumption for the moments of the entries. In contrast to the result in [20] at the time of publication, Theorem 1 appears to be new for general β.

Footnote 1. Modulo minor changes, the inductive argument in [20] can also be used to obtain the semi-circle law down to scale n^{−1+ϵ}.

Tridiagonal models and resolvent expansion
We recall the main result from the work of A. Edelman and I. Dumitriu [7], which will be our starting point. Let m ∈ N, β > 0 and choose 0 < d ≤ 1. Let a = mβ/(2d). Consider the m×m bi-diagonal matrix B_β with diagonal entries χ_{2a}, χ_{2a−β}, ..., χ_{2a−β(m−1)} and subdiagonal entries χ_{β(m−1)}, χ_{β(m−2)}, ..., χ_β. Here the symbol χ_r, r > 0, represents a random variable with chi distribution with r degrees of freedom, defined by the probability density function

f_r(x) = 2^{1−r/2} x^{r−1} e^{−x^2/2} / Γ(r/2),   x > 0.

Γ denotes the Gamma function, defined for x > 0 by

Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.

The random variables appearing in the matrix B_β are all independent. By [7], the eigenvalue distribution of the real tridiagonal matrix M̃_{β,d} = B_β B_β^t is given by the density function f_{β,a} in (1). We will be concerned with the scaled β-Laguerre matrix model, defined by

M_{β,d} = (d/(βm)) M̃_{β,d}.

We will determine the behavior of the normalized trace of the resolvent, s_{β,d}(z), in a neighborhood of the real axis in the upper half-plane C_+ = {ℑz > 0}:

s_{β,d}(z) = (1/m) tr (M_{β,d} − z)^{−1}.

The imaginary part ℑs_{β,d}(z) is the Poisson integral of the eigenvalue distribution. As such, it is an approximation to the empirical eigenvalue distribution of M_{β,d} at scale ℑz. To prove Theorem 1, it will suffice to show that, with probability no less than 1 − C_{δ,κ,ϵ,k} m^{−k}, s_{β,d}(z) is close to s_{MP,d}(z), where

s_{MP,d}(z) = (1 − d − z + sqrt((z − 1 − d)^2 − 4d)) / (2dz)

is the Stieltjes transform of the Marchenko-Pastur distribution with parameter d (see [13], Lemma B.1 or [14] 7.1, as well as Corollary 3 below).
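The comparison between the normalized resolvent trace and the Marchenko-Pastur Stieltjes transform can be tried out numerically. The Python sketch below is only an illustration and not part of the argument. It assumes the standard form of the bidiagonal model from [7] (diagonal entries χ_{2a−β(i−1)}, subdiagonal entries χ_{β(m−i)}), the normalization d/(βm), and the closed form of the Stieltjes transform with denominator 2dz; the function names are hypothetical. The trace is computed with two continued-fraction recursions, which avoids any linear algebra library.

```python
import cmath
import random


def sample_beta_laguerre(m, beta, d, rng):
    """Diagonal and off-diagonal of the normalized tridiagonal matrix (d/(beta*m)) B B^t.

    Assumption: B has diagonal chi_{2a - beta*i} and subdiagonal chi_{beta*(m-1-i)},
    i = 0, 1, ..., with a = m*beta/(2d).
    """
    a = m * beta / (2.0 * d)
    scale = d / (beta * m)
    # chi_r^2 has the Gamma(shape=r/2, scale=2) distribution.
    x2 = [rng.gammavariate((2 * a - beta * i) / 2.0, 2.0) for i in range(m)]
    y2 = [rng.gammavariate(beta * (m - 1 - i) / 2.0, 2.0) for i in range(m - 1)]
    diag = [scale * (x2[i] + (y2[i - 1] if i > 0 else 0.0)) for i in range(m)]
    off = [scale * (x2[i] * y2[i]) ** 0.5 for i in range(m - 1)]
    return diag, off


def resolvent_trace(diag, off, z):
    """Normalized trace of (M - z)^{-1} for a symmetric tridiagonal matrix M."""
    m = len(diag)
    left = [0j] * m
    right = [0j] * m
    left[0] = diag[0] - z
    for i in range(1, m):
        left[i] = diag[i] - z - off[i - 1] ** 2 / left[i - 1]
    right[m - 1] = diag[m - 1] - z
    for i in range(m - 2, -1, -1):
        right[i] = diag[i] - z - off[i] ** 2 / right[i + 1]
    # Diagonal resolvent entry: 1 / (a_i - z - b_{i-1}^2/left_{i-1} - b_i^2/right_{i+1}).
    total = sum(1.0 / (left[i] + right[i] - (diag[i] - z)) for i in range(m))
    return total / m


def s_mp(z, d):
    """Stieltjes transform of the Marchenko-Pastur law (root with positive imaginary part)."""
    root = cmath.sqrt((z - 1 - d) ** 2 - 4 * d)
    s = (1 - d - z + root) / (2 * d * z)
    return s if s.imag > 0 else (1 - d - z - root) / (2 * d * z)


if __name__ == "__main__":
    rng = random.Random(0)
    diag, off = sample_beta_laguerre(2000, 2.0, 0.5, rng)
    z = 1.0 + 0.1j
    print(abs(resolvent_trace(diag, off, z) - s_mp(z, 0.5)))
```

Decreasing the imaginary part of z at fixed m makes the discrepancy grow, which is the numerical counterpart of the restriction on ℑz in the statements above.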
We begin by writing

(2) M_{β,d} = M_{∞,d} + Δ,

where M_{∞,d} is a deterministic tridiagonal matrix and Δ is a random tridiagonal matrix. For large m, all the entries of Δ are simultaneously small in magnitude, with overwhelming probability:

(3) max_{i,j} |Δ_{ij}| ≤ m^{−1/2+c}

for any c > 0. Here and below, we will say an event E = E(m) holds with overwhelming probability if, for each k, there is a constant C_k such that

P(E^c) ≤ C_k m^{−k}.

The bound (3) follows readily from the definitions and from properties of the χ_r and χ^2_r distributions, which are concentrated around their mean Eχ^2_r = r, with exponential tails.
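The concentration of χ^2_r around its mean r, which drives the smallness of the entries of Δ, is easy to observe empirically. The sketch below is a hypothetical illustration using only the Python standard library. It draws independent χ^2_r variables (as Gamma variables with shape r/2 and scale 2) and records the largest relative deviation from the mean, which can then be compared with r^{−1/2+c} for a small c > 0.

```python
import random


def max_relative_deviation(r, samples, rng):
    """Largest |chi_r^2 / r - 1| over independent draws of chi_r^2."""
    worst = 0.0
    for _ in range(samples):
        chi2 = rng.gammavariate(r / 2.0, 2.0)  # chi_r^2 ~ Gamma(r/2, scale 2)
        worst = max(worst, abs(chi2 / r - 1.0))
    return worst


if __name__ == "__main__":
    rng = random.Random(1)
    r = 10_000
    # Compare the observed deviation with the threshold r^{-1/2 + c}, here c = 1/4.
    print(max_relative_deviation(r, 1000, rng), r ** (-0.5 + 0.25))
```

The standard deviation of χ^2_r / r is sqrt(2/r), so the threshold r^{−1/2+c} exceeds typical fluctuations by a factor that grows like a power of r.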
2.1. The matrix M_{∞,d} and generalized Laguerre polynomials. The spectral theory of the symmetric matrix M_{∞,d} can be described explicitly in terms of Laguerre polynomials. For α > −1, the generalized Laguerre polynomials L^α_k, k = 0, 1, ..., are orthogonal polynomials with respect to the measure x^α e^{−x} dx on [0, ∞). The eigenvalues of M_{∞,d} are the normalized zeros of the m-th generalized Laguerre polynomial L^α_m. There is a complete set of corresponding eigenvectors v_i, and we denote by u_i = v_i/‖v_i‖ the normalized eigenvectors. To derive the above facts regarding the eigenvalues and eigenvectors of M_{∞,d}, we start from the three term recurrence relation (7) for the Laguerre polynomials L̃^α_k, normalized by a fixed choice of leading coefficient (see [3, Section 4.11, p. 143]). The polynomials L̃^α_k are related to the orthonormal polynomials L^α_k by the relation (8); see [3, Section 4.5]. Multiplying (7) by the appropriate normalizing factor and using (8) and the relation Γ(x + 1) = xΓ(x), we find the identity (9). Letting x = l_i, where l_i is one of the m (real) zeros of L^{α+1}_m(x), the eigenvalue relation (10) follows for the interior indices j. On the other hand, when j = 1, we use the recurrence relation [2, (22.7.30)] together with (7) before normalizing. Since L^α_m(l_i) = 0, equation (9) then simplifies. Similar reasoning when j = m shows that (10) holds for all j = 1, ..., m.

The resolvent expansion.
In this section we show that the normalized trace of the resolvent is well approximated by the Stieltjes transform of the Marchenko-Pastur density for ℑz > m^{−1/4+ϵ}, ϵ > 0. Specifically, we have the following with overwhelming probability.
The choice of the scale m^{−1/4+ϵ} in Proposition 2.1 is somewhat arbitrary, given that the inductive argument of Section 4 will gradually improve any initial convergence result to the optimal scale m^{−1+ϵ}. The exponent 1/4 − ϵ was chosen because it represents a scale accessible without detailed information on the size of the entries of the resolvent. Using the estimates for Laguerre polynomials in [16], and the method of [15] to obtain a lower bound for ‖v_i‖, one can prove sharper bounds than (15). The expansion (13) can then be summed for values of ℑz smaller than m^{−1/4}, but still much larger than m^{−1+ϵ}. We will not pursue this here.
The proof of Proposition 2.1 proceeds by a resolvent expansion comparing the trace of the resolvent of M_{β,d} to that of the deterministic matrix M_{∞,d}. The precise spectral information available for the matrix M_{∞,d} allows us to calculate the large m limit of the normalized trace s_∞(z), and to control the resolvent expansion (13). Starting from (2), write R = (M_{∞,d} − z)^{−1} and use the resolvent identity:

(M_{β,d} − z)^{−1} = R − RΔ(M_{β,d} − z)^{−1}.

Upon iteration, this yields

(M_{β,d} − z)^{−1} = Σ_{j=0}^{k−1} (−1)^j (RΔ)^j R + (−1)^k (RΔ)^k (M_{β,d} − z)^{−1}.

Taking traces and normalizing:

(13) s_{β,d}(z) = s_∞(z) + Σ_{j=1}^{k−1} (−1)^j (1/m) tr (RΔ)^j R + (−1)^k (1/m) tr (RΔ)^k (M_{β,d} − z)^{−1}.

By the estimate (24) for the normalized trace s_∞(z) of (M_{∞,d} − z)^{−1}, we can replace the first term on the right with s_{MP,d}(z), introducing an O_κ(m^{−ϵ}) error. It suffices to show that the remaining terms also decay at least at an O(m^{−ϵ}) rate as m → ∞.
Expand the k-th term of the sum in (13) to find: where i′_l = i_l, i_l + 1 or i_l − 1. Taking absolute values and using (3), this is bounded by with overwhelming probability. The next lemma gives an estimate for the entries R_{ij} of the resolvent which is efficient when i and j are widely separated.
Several results on exponential decay of the resolvent entries could be used to obtain Lemma 2.2. We will use a slight variant of the estimate of Combes-Thomas type developed by M. Aizenman in the context of localization for discrete random Schrödinger operators: , and let H be a self-adjoint operator on ℓ 2 (Γ) whose off-diagonal elements are exponentially summable: Then, for energies not in the spectrum of H, with Note that the hypothesis (16) differs from the one in Lemma II.1 in [1], but the same proof works also for Lemma 2.3 as stated.
To bound (14), we can assume that i_j = i′_j. Indeed, the bound (15) only changes by a constant factor when the index i is changed by a unit. Consider the sum over i_1 in (14): We split this sum into three regions: By (15), the first two sums are bounded by We are reduced to considering the sum The second factor in each summand of (17) is estimated as follows: The first factor on the right side of the inequality (18) is no greater than m^{1/8−ϵ/2}, since ‖u_i‖ = 1 and 1 In summary, (17) is bounded by . We proceed to sum over i_2 in (14): We can now repeatedly sum over i_3, ..., i_{k−1}, using (15) at every step. Each summation results in an additional factor of m^{1/2−ϵ}. Thus (14) is bounded by Using the Cauchy-Schwarz inequality as in (18), we have
Choosing c smaller than ϵ/2 in (3), this last quantity is O(m^{−kϵ/2}) with overwhelming probability. As for the final term in (13), it is bounded by for each i and j, provided ℑz > m^{−1/4+ϵ}. Performing each of the k sums using (15) as previously, we find that the sum is bounded by with overwhelming probability. Letting k in (13) be larger than 8/ϵ, we obtain with overwhelming probability. Combined with the approximation (24), the relation (20) implies Proposition 2.1.

Convergence for β = ∞
In this section, we identify the limit of the normalized resolvent trace

s_∞(z) = (1/m) Σ_{i=1}^m 1/(λ_i − z)

for z as close as m^{−1/2+ϵ} to the limiting spectrum, but away from its edges. The eigenvalues λ_i are given by (4). In [20], the corresponding quantity for Hermite polynomials was estimated using asymptotics for these orthogonal polynomials in the complex plane.
Here, we use a differential equation for s_∞(z) derived from the ODE satisfied by Laguerre polynomials ([2, (22.6.15)], [3, Section 4.5, (4.5.1)]): . Differentiating the ratio on the right of (21) and using (22) To solve the equation approximately, we treat the final two terms as error terms, and use the rough estimates These follow at once from |λ_i − z| ≥ m^{−1/2+ϵ}, so that ε = O(m^{−ϵ}). For z in the region we find the solutions The solution s_− satisfies ℑs_− > 0 for ℑz > 0, and we conclude that the right side of the equation is an approximation for s_∞: We expect that the error term can be replaced by O(1/m), but (24) is sufficient for our purposes.
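The root selection and the quality of the approximation for β = ∞ can be checked numerically. The following Python sketch rests on two labeled assumptions. First, M_{∞,d} is taken to be the matrix obtained from the bidiagonal model by replacing each χ_r by sqrt(r). Second, the limiting equation is taken to be the quadratic dz s^2 + (z + d − 1)s + 1 = 0, which is what the Schur complement relation (33) reduces to when a_{11}^2 = 1, a_{21}^2 = d and both resolvent entries are replaced by s. All function names are hypothetical.

```python
import cmath


def m_infinity(m, d):
    """Entries of the deterministic tridiagonal matrix (assumed form of M_{infinity,d})."""
    n = m / d  # plays the role of 2a/beta
    diag = [(d / m) * ((n - i) + ((m - i) if i > 0 else 0.0)) for i in range(m)]
    off = [(d / m) * ((n - i) * (m - 1 - i)) ** 0.5 for i in range(m - 1)]
    return diag, off


def resolvent_trace(diag, off, z):
    """Normalized trace of (M - z)^{-1} for a symmetric tridiagonal matrix M."""
    m = len(diag)
    left = [0j] * m
    right = [0j] * m
    left[0] = diag[0] - z
    for i in range(1, m):
        left[i] = diag[i] - z - off[i - 1] ** 2 / left[i - 1]
    right[m - 1] = diag[m - 1] - z
    for i in range(m - 2, -1, -1):
        right[i] = diag[i] - z - off[i] ** 2 / right[i + 1]
    return sum(1.0 / (left[i] + right[i] - (diag[i] - z)) for i in range(m)) / m


def quadratic_roots(z, d):
    """Both roots of d*z*s^2 + (z + d - 1)*s + 1 = 0."""
    root = cmath.sqrt((z - 1 - d) ** 2 - 4 * d)
    return (1 - d - z + root) / (2 * d * z), (1 - d - z - root) / (2 * d * z)


def s_mp(z, d):
    """The root with positive imaginary part, i.e. the Stieltjes transform."""
    s1, s2 = quadratic_roots(z, d)
    return s1 if s1.imag > 0 else s2


def quadratic_residual(s, z, d):
    return d * z * s * s + (z + d - 1) * s + 1


if __name__ == "__main__":
    m, d, z = 2000, 0.5, 1.0 + 0.05j
    diag, off = m_infinity(m, d)
    print(abs(resolvent_trace(diag, off, z) - s_mp(z, d)))
    print(abs(quadratic_residual(s_mp(z, d), z, d)))
```

Since the product of the two roots is 1/(dz), only one of them can be a Stieltjes transform of a probability measure for ℑz > 0, which is why selecting the root by the sign of its imaginary part is enough here.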

Inductive argument
In this section, we improve the convergence of the density on short scales from the level m^{−1/4+ϵ} to the optimal level of m^{−1+ϵ} by an inductive argument. where the domain D is defined as To prove Theorem 2, we need three facts about the tridiagonal models and Stieltjes transforms. The first can be found in [6]: such that the first row of Q is independent of Λ and consists of independent entries with χ_β distribution, normalized to unit norm.
The next corollary establishes the link between control of the Stieltjes transform and control of the eigenvalue density. See for example [13]: Finally, we shall need the following standard result. A proof can be found in [20]:

Lemma 4.2. Suppose the Marchenko-Pastur law holds at level m^a for some −1 < a ≤ 0, that is, for any c > 0, |s(z) − s_{MP,d}(z)| < c for sufficiently large m, with overwhelming probability. Then we have for any z such that m^{−1} < ℑz < m^a.
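The link between the Stieltjes transform and eigenvalue counts in short intervals can be illustrated by counting eigenvalues directly. The Python sketch below is an illustration under stated assumptions, not the argument of the paper. It assumes the standard bidiagonal model from [7] with the normalization d/(βm), and the Marchenko-Pastur density sqrt((λ_+ − x)(x − λ_−))/(2πdx) with λ_± = (1 ± √d)^2. Eigenvalues below a threshold are counted with a Sturm sequence (the number of negative pivots in the LDL^t factorization of M − x), and the count N_I is compared with m times the Marchenko-Pastur mass of I. All names are hypothetical.

```python
import math
import random


def sample_beta_laguerre(m, beta, d, rng):
    """Diagonal and off-diagonal of (d/(beta*m)) B B^t (assumed form of the model)."""
    a = m * beta / (2.0 * d)
    scale = d / (beta * m)
    x2 = [rng.gammavariate((2 * a - beta * i) / 2.0, 2.0) for i in range(m)]
    y2 = [rng.gammavariate(beta * (m - 1 - i) / 2.0, 2.0) for i in range(m - 1)]
    diag = [scale * (x2[i] + (y2[i - 1] if i > 0 else 0.0)) for i in range(m)]
    off = [scale * (x2[i] * y2[i]) ** 0.5 for i in range(m - 1)]
    return diag, off


def count_below(diag, off, x):
    """Number of eigenvalues of the tridiagonal matrix that are smaller than x."""
    pivot = diag[0] - x
    count = 1 if pivot < 0 else 0
    for i in range(1, len(diag)):
        if pivot == 0.0:
            pivot = 1e-300  # avoid division by zero on an exact hit
        pivot = diag[i] - x - off[i - 1] ** 2 / pivot
        if pivot < 0:
            count += 1
    return count


def mp_density(x, d):
    lam_minus = (1 - math.sqrt(d)) ** 2
    lam_plus = (1 + math.sqrt(d)) ** 2
    if x <= lam_minus or x >= lam_plus:
        return 0.0
    return math.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * math.pi * d * x)


def expected_count(m, d, lo, hi, steps=2000):
    """m times the Marchenko-Pastur mass of [lo, hi], by the midpoint rule."""
    h = (hi - lo) / steps
    return m * h * sum(mp_density(lo + (k + 0.5) * h, d) for k in range(steps))


if __name__ == "__main__":
    m, d = 4000, 0.5
    diag, off = sample_beta_laguerre(m, 2.0, d, random.Random(3))
    lo, hi = 0.9, 1.1
    n_interval = count_below(diag, off, hi) - count_below(diag, off, lo)
    print(n_interval, expected_count(m, d, lo, hi))
```

Shrinking the interval while keeping its length well above 1/m is the regime addressed by the local law; for intervals of length comparable to 1/m the count is no longer self-averaging.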
The proof of Theorem 2 is a combination of the following propositions:

Proposition 4.3. Suppose the Marchenko-Pastur law holds at level m^a for some −1 < a ≤ 0, and that we have for ℑz > m^a with overwhelming probability. Then, for any δ > 0, holds for z such that ℑz > m^{(a−1)/2+δ} with overwhelming probability.
For a given sequence X_n of random variables, we write X_n = o(1) with overwhelming probability if, for every c > 0, the event {|X_n| ≤ c} has overwhelming probability.
Proposition 4.4. Suppose that, with overwhelming probability, |R^{β,d}_{11}(z) − s_{MP,d}(z)| = o(1) for z such that ℑz > m^a for some −1 < a ≤ 0. Then we have an improved Marchenko-Pastur law, that is: for z such that ℑz > m^{(a−1)/2+δ} for any δ > 0 with overwhelming probability. What remains is the proof of the two propositions above.

Proof of Proposition 4.3. By Schur's complement, we have the following relation .
Here a_{11} denotes the normalized (1, 1)-entry of the bidiagonal matrix B_β, a_{21} the normalized (2, 1)-entry, and R̂^{β,d} the resolvent of the normalized matrix obtained by removing the first row and column of B_β. We remark that a_{11}^2 is distributed as (d/(βm)) χ^2_{2a} and a_{21}^2 is distributed as (d/(βm)) χ^2_{β(m−1)}. Lastly, a_{11}, a_{21} and R̂^{β,d} are independent. By the argument in Section 3, we need only show that R^{β,d}_{11} satisfies the approximate functional equality (31) with overwhelming probability for z such that ℜz ∈ [λ_− + κ, λ_+ − κ] and ℑz ≥ m^{(a−1)/2+δ}, with δ > 0 arbitrary. We first note that, due to the restriction on ℜz (in particular, ℜz > κ since λ_− > 0), equation (31) is equivalent to: where from now on we suppress the superscript β, d.
Rewriting equation (30), we have

(33) z a_{21}^2 R̂^{β,d}_{11} R^{β,d}_{11} + z R^{β,d}_{11} + a_{21}^2 R̂^{β,d}_{11} − a_{11}^2 R^{β,d}_{11} + 1 = 0.

We can further rewrite the above in a form from which it suffices to show that the estimates (35) and (36) hold for the set of z we are interested in. We first note that a_{21}^2 has distribution (d/(βm)) χ^2_{β(m−1)} and so |a_{21}^2 − d| ≤ Cm^{−1/2+δ} with overwhelming probability for all δ > 0. Similarly, a_{11}^2 has the distribution of (d/(βm)) χ^2_{βm/d} and so |a_{11}^2 − 1| ≤ Cm^{−1/2+δ} with overwhelming probability for all δ > 0. The assumption of the proposition implies that for all w with ℑw > m^a, we have |R^{β,d}_{11}(w)| ≤ C, for some constant C. By (35), which will be proved independently below, we also have The inequality |R^{β,d}_{11}(z)| ≤ (ℑw/ℑz) |R^{β,d}_{11}(w)|, for ℜz = ℜw and ℑz ≤ ℑw, now implies that approximation (36) holds with overwhelming probability whenever ℑz > m^{(a−1)/2+δ}. We turn our attention to (35). Firstly, we write and similarly where (q̂_1, ..., q̂_{m−1}) is the first row of the matrix of eigenvectors of M̂_β and λ̂_j are its eigenvalues, while (q_1, ..., q_m) is the first row of the matrix of eigenvectors of M_β and λ_j are its eigenvalues. We write the decomposition (37), where E_X denotes the expectation with respect to the random variables X. By Proposition 4.1, q and λ_j are independent, and so are q̂ and λ̂_j. We write the first term in (37) as Since q and λ_j are independent, we can condition on λ_j and apply Proposition 4.5 below to the sum to conclude that the right side of the previous equation is bounded with overwhelming probability by By Lemma 4.2, the latter is bounded by C(ℑz)^{−1} m^{−1/2+a/2+c} log m for some constant C and any c > 0. The fifth and sixth terms of equation (37) are similarly bounded. To deal with the middle two terms, we use the interlacing property of the eigenvalues of a matrix and its minor, and then the interlacing property of the eigenvalues of a matrix and a rank 1 perturbation, to obtain that these terms are bounded by 1/(mℑz).
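Relation (33) is an exact algebraic identity for the tridiagonal model, so it can be verified numerically on a single sample. The Python sketch below makes the following assumptions, none of which is confirmed beyond what is stated in the text: the bidiagonal matrix has the standard form from [7], the normalization is d/(βm), and the hatted resolvent is that of the normalized matrix B̂ B̂^t, where B̂ is B_β with its first row and column removed. The (1,1) resolvent entries are computed as continued fractions; the function names are hypothetical.

```python
import random


def r11(diag, off, z):
    """(1,1) entry of (M - z)^{-1} for a symmetric tridiagonal M, as a continued fraction."""
    g = diag[-1] - z
    for i in range(len(diag) - 2, -1, -1):
        g = diag[i] - z - off[i] ** 2 / g
    return 1.0 / g


def tridiagonal_from_bidiagonal(x2, y2, scale):
    """Entries of scale * B B^t from the squared diagonal x2 and squared subdiagonal y2 of B."""
    m = len(x2)
    diag = [scale * (x2[i] + (y2[i - 1] if i > 0 else 0.0)) for i in range(m)]
    off = [scale * (x2[i] * y2[i]) ** 0.5 for i in range(m - 1)]
    return diag, off


def schur_residual(m, beta, d, z, rng):
    """Left side of (33) evaluated on one sample; it should vanish up to rounding."""
    a = m * beta / (2.0 * d)
    scale = d / (beta * m)
    x2 = [rng.gammavariate((2 * a - beta * i) / 2.0, 2.0) for i in range(m)]
    y2 = [rng.gammavariate(beta * (m - 1 - i) / 2.0, 2.0) for i in range(m - 1)]
    diag, off = tridiagonal_from_bidiagonal(x2, y2, scale)
    # Minor model: drop the first row and column of B before forming B B^t.
    hat_diag, hat_off = tridiagonal_from_bidiagonal(x2[1:], y2[1:], scale)
    big_r = r11(diag, off, z)
    hat_r = r11(hat_diag, hat_off, z)
    a11_sq = scale * x2[0]
    a21_sq = scale * y2[0]
    return z * a21_sq * hat_r * big_r + z * big_r + a21_sq * hat_r - a11_sq * big_r + 1


if __name__ == "__main__":
    print(abs(schur_residual(200, 1.0, 0.5, 1.0 + 0.01j, random.Random(2))))
```

The identity combines the Schur complement formula for the (1,1) entry with the rank 1 update relating the minor of B B^t to B̂ B̂^t, which are the same two steps behind the interlacing bounds used for the middle terms of (37).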
We end the proof of Proposition 4.3 with the following simple variant of McDiarmid's inequality, a proof of which can be found in the appendix of [20]:

Proposition 4.5. Let X_1, ..., X_m be independent subgaussian random variables. Let Ω ⊂ R^m be such that (X_1, ..., X_m) ∈ Ω with overwhelming probability. Let F be a real function of m real variables such that |F(x) − F(x̃)| ≤ c_i whenever x, x̃ ∈ Ω differ only in the i-th coordinate, for all 1 ≤ i ≤ m, and such that, outside of Ω, F is bounded by a polynomial in m. Then for any λ > 0, one has P(|F(X) − E(F(X))| ≥ λσ) ≤ C exp(−cλ^2) for some absolute constants C, c > 0, where σ^2 = Σ_{i=1}^m c_i^2.
Proof of Proposition 4.4. The proof is similar to that of Proposition 4.3. By the assumption, it suffices to establish that |s(z) − R_{11}(z)| = o(1) with overwhelming probability. The difference inside the absolute value sign is Σ_j (1/m − q_j^2)/(λ_j − z). The statement now follows by another concentration argument and Lemma 4.2.

Some extensions
The resolvent expansion in Section 2 allows for a quick proof of convergence of the eigenvalue distribution for tridiagonal matrices with independent entries "close to" a Jacobi matrix whose limiting spectral density is known. Consider a sequence of random tridiagonal matrices given by uniformly in 1 ≤ j ≤ n with overwhelming probability, for some α > 0, where b̄_j = Eb_j, ā_j = Ea_j.
If the deterministic tridiagonal matrix n^{−α} EA_n has a limiting spectral density µ, that is, if the spectral measures µ_n of n^{−α} · EA_n converge weakly to some measure µ, then for z in any compact subset D of {ℑz > 0}. Using Lemma 2.2 as in the proof of Proposition 2.1 to perform a one-step resolvent expansion, we have, for z ∈ D: with overwhelming probability. Thus the eigenvalue distribution of n^{−α} A_n almost surely converges weakly to µ.
As an application, consider the following example from [19]. Consider the Jacobi matrix The moments of the rescaled matrix n^{−α} J_n converge to those of the Nevai-Ullman distribution [22], [17, Section 5]: (1/n) tr (n^{−α} J_n)^k → µ_k, By a standard density argument, where X_j, 1 ≤ j ≤ n − 1, and Y_j, 1 ≤ j ≤ n, are mean zero random variables whose k-th moment is bounded uniformly in j for all k, then for any β < α (41) holds for the matrix n^{−α} A_n. If we instead require that for k = 1, 2, then for any c > 0, we have uniformly in j, and a similar bound for a_j. The convergence (41) with µ = ν_α holds in probability. Popescu [19] assumes (42) holds for all k and obtains convergence of the moments of n^{−α} A_n as well as almost sure convergence to the limiting distribution. Note that [19] contains more general results that apply also to cases not easily accessible by our method. Given more precise information on the size of the entries of A_n or the rate of convergence of the Stieltjes transform for the deterministic model, one can improve on the above. To give a simple example, we introduce a "positive temperature" version of the Jacobi matrices associated to the orthogonal polynomials p_n(x), n ≥ 0, for the measures .
Let the random variable X_j have distribution given by g_{j,m}, with all the X_j independent. We have Here b̃_n is the coefficient in the three-term recurrence relation: x p_n(x) = b̃_{n+1} p_{n+1}(x) + ã_n p_n(x) + b̃_{n−1} p_{n−1}(x).
Note that ã_n = 0. The eigenvalues of the matrix are the (rescaled) zeros of the n-th orthogonal polynomial with respect to (43). Their limiting density can be explicitly computed; see [5, Eq. (2.4)]. It has the form where h_m(x) is a polynomial of degree 2m − 2. A calculation as in [20] using the Riemann-Hilbert asymptotics in [5] shows that the normalized trace of (J_n − z)^{−1} approximates the Stieltjes transform of the limiting density µ(dx) with precision O(1/n) for ℑz > n^{−1+ϵ} and z away from ±1. By the resolvent expansion argument of Section 2 and Corollary 3, convergence of the empirical eigenvalue distribution for (α_m n)^{−1/(2m)} A_n holds for intervals of size |I| ≥ n^{−1/(4m)+ϵ} strictly inside the bulk of the limiting density.