Poisson Statistics for the Largest Eigenvalues of Wigner Random Matrices with Heavy Tails

We study large Wigner random matrices in the case when the marginal distributions of matrix entries have heavy tails. We prove that the largest eigenvalues of such matrices have Poisson statistics.


1 Introduction and Formulation of Results
The main goal of this paper is to study the largest eigenvalues of Wigner real symmetric and Hermitian random matrices in the case when the matrix entries have heavy tails of distribution. We recall that a real symmetric Wigner random matrix is defined as a square symmetric $n \times n$ matrix with i.i.d. entries up from the diagonal, $A = (a_{jk})$, $a_{jk} = a_{kj}$, $1 \le j \le k \le n$, $\{a_{jk}\}_{j \le k}$ i.i.d. random variables (1). A Hermitian Wigner random matrix is defined in a similar way, namely as a square $n \times n$ Hermitian matrix with i.i.d. entries up from the diagonal, $A = (a_{jk})$, $a_{jk} = \bar a_{kj}$, $1 \le j \le k \le n$, $\{a_{jk}\}_{j \le k}$ i.i.d. complex random variables (2) (it is quite often assumed that the distribution of the diagonal matrix entries is different from the distribution of the off-diagonal entries). While our main results (with obvious modifications) are valid for Hermitian random matrices with i.i.d. entries as well, we restrict ourselves for most of the paper to the real symmetric case. The ensembles (1) and (2) were introduced by E. Wigner in the fifties ([33], [34], [35]), who considered the case of centered, identically distributed (up from the diagonal) matrix entries with the tail of the distribution decaying sufficiently fast that all moments exist. Wigner proved that under the above conditions the mathematical expectation of the empirical distribution function of $n^{-1/2}A$ converges as $n \to \infty$ to the semicircle law, i.e.
$$\mathbb{E}\,\frac{1}{n}\,\#\{1 \le i \le n : \lambda_i \le x\} \longrightarrow \int_{-\infty}^{x} f_{sc}(t)\,dt,$$
where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $n^{-1/2}A$, the density of the semicircle law is given by
$$f_{sc}(x) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}\;\mathbf{1}_{\{|x| \le 2\sigma\}},$$
and $\sigma^2$ is the second moment of the matrix entries. Wigner's result was subsequently strengthened by many mathematicians (see e.g. [17], [1], [24], [8]). In particular, Pastur ([24]) and Girko ([10]) proved that if $A^{(n)} = (a^{(n)}_{ij})$, $1 \le i, j \le n$, is an $n \times n$ random symmetric matrix with independent (not necessarily identically distributed) centered entries with the same second moment $\sigma^2$, then the necessary and sufficient condition for the convergence of the empirical distribution function of the eigenvalues of $n^{-1/2}A^{(n)}$ to the semicircle law has the Lindeberg-Feller form:
$$\frac{1}{n^2}\sum_{i,j=1}^{n} \mathbb{E}\Big[\big(a^{(n)}_{ij}\big)^2\,\mathbf{1}_{\{|a^{(n)}_{ij}| \ge \tau\sqrt n\}}\Big] \to 0 \quad \text{for every } \tau > 0.$$
The simplest case of (1) from the analytical viewpoint is when the entries $a_{ij}$, $1 \le i \le j \le n$, are independent Gaussian $N(0, 1 + \delta_{ij})$ random variables. The ensemble is then called the Gaussian Orthogonal Ensemble (GOE). We refer the reader to [21], chapter 6, for a discussion of the GOE ensemble. In a similar fashion the ensemble of Hermitian random matrices with independent and identically distributed (up from the diagonal) matrix entries is called the Gaussian Unitary Ensemble (GUE, see [21], chapter 5) and the ensemble of Hermitian self-dual matrices with quaternion entries is called the Gaussian Symplectic Ensemble (GSE, see [21], chapter 7). In the Gaussian ensembles one can study in detail the local statistical properties of the eigenvalues both in the bulk and at the edge of the spectrum. In the seminal papers [31], [32], Tracy and Widom proved that after proper rescaling the distribution of the largest eigenvalues in the GOE, GUE and GSE ensembles converges to what is now called the Tracy-Widom distribution. In particular they proved that for the largest eigenvalue of the GOE
$$\lim_{n\to\infty} \Pr\big(\lambda_{\max} \le 2\sqrt n + x\,n^{-1/6}\big) = F_1(x) = \exp\Big(-\frac{1}{2}\int_x^{\infty} q(y)\,dy - \frac{1}{2}\int_x^{\infty} (y - x)\,q^2(y)\,dy\Big),$$
and for the largest eigenvalue of the GUE ensemble
$$\lim_{n\to\infty} \Pr\big(\lambda_{\max} \le 2\sqrt n + x\,n^{-1/6}\big) = F_2(x) = \exp\Big(-\int_x^{\infty} (y - x)\,q^2(y)\,dy\Big),$$
where $q(x)$ is the solution of the Painlevé II differential equation $q''(x) = x\,q(x) + 2q^3(x)$ determined by the asymptotics $q(x) \sim \mathrm{Ai}(x)$ as $x \to +\infty$.
It was also established (see [31], [32], [9]) that after the rescaling $\lambda_i = 2\sqrt n + x_i\,n^{-1/6}$ at the edge of the spectrum the $k$-point correlation functions have a limit. In the GUE case the limiting $k$-point correlation functions have a determinantal form,
$$\rho_k(x_1, \dots, x_k) = \det\big(K_{\mathrm{Airy}}(x_i, x_j)\big)_{1 \le i, j \le k}, \qquad K_{\mathrm{Airy}}(x, y) = \frac{\mathrm{Ai}(x)\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\mathrm{Ai}(y)}{x - y},$$
where $K_{\mathrm{Airy}}$ is the so-called Airy kernel. In the GOE case the limiting $k$-point correlation functions have a Pfaffian form and can be written as $\rho_k(x_1, \dots, x_k) = \mathrm{Pf}\big(K(x_i, x_j)\big)_{1 \le i, j \le k}$, where $K(x, y)$ is a $2 \times 2$ matrix kernel whose entries involve the Airy kernel and the function $\epsilon(z) = \frac{1}{2}\,\mathrm{sign}(z)$. Similar formulas hold for the GSE. Soshnikov ([28], see also [29]) proved that the Tracy-Widom law for the largest eigenvalues is universal provided the laws of distribution of the matrix entries are symmetric and all moments exist and do not grow too fast. To the best of our knowledge there are no results proving universality in the bulk of the spectrum for real symmetric Wigner matrices. In the Hermitian case Johansson ([13]) proved universality in the bulk under the condition that the matrix entries have a Gaussian component.
In this paper we will study the spectral properties of Wigner real symmetric and Hermitian random matrices in the case when the matrix entries have heavy tails of distribution. In other words, in addition to the assumption that the $a_{jk}$ are i.i.d. real (complex) random variables up from the diagonal ($1 \le j \le k \le n$), we also assume that the probability distribution function satisfies
$$G(x) := \Pr\big(|a_{jk}| > x\big) = \frac{h(x)}{x^{\alpha}}, \qquad (12)$$
where $0 < \alpha < 2$ and $h(x)$ is a slowly varying function at infinity in the sense of Karamata ([14], [26]); in other words, $h(x)$ is a positive function for all $x > 0$ such that $\lim_{x\to\infty} h(tx)/h(x) = 1$ for all $t > 0$. The condition (12) means that the distribution of $|a_{ij}|$ belongs to the domain of attraction of a stable distribution with exponent $\alpha$ (see e.g. [11], Theorem 2.6.1). The case $h(x) = \mathrm{const}\,(1 + o(1))$ in (12) was considered on a physical level of rigor by Cizeau and Bouchaud in [4], who called such matrices "Lévy matrices" (see also [2] and [12] for some physical results on the unitary invariant Lévy ensembles). They argued that the typical eigenvalues of $A$ should be of order $n^{1/\alpha}$. Indeed, such a normalization makes the Euclidean norm of a typical row of $A$ of the order of a constant with high probability. Cizeau and Bouchaud suggested a formula for the limiting distribution of the eigenvalues normalized by $n^{1/\alpha}$. The formula for the limiting spectral distribution has a more complicated density than in the finite-variance case. Namely, they claimed that the density should be given by
$$f(x) = L^{C(x), \beta(x)}_{\alpha}(x), \qquad (13)$$
where $L^{C,\beta}_{\alpha}$ is the density of a centered Lévy stable distribution defined through its Fourier transform $\hat L(k)$:
$$\hat L(k) = \exp\Big(-C\,|k|^{\alpha}\big(1 + i\beta\,\mathrm{sign}(k)\tan(\pi\alpha/2)\big)\Big),$$
and $C(x)$, $\beta(x)$ satisfy a system of integral equations. Note that the density $f(x)$ in (13) is not a Lévy density itself, since $C(x)$, $\beta(x)$ are functions of $x$.
It was also argued in [4] that the density $f(x)$ should decay like $x^{-(1+\alpha)}$ at infinity, which suggests that the largest eigenvalues of $A$ (in the case (12) with $h(x) = \mathrm{const}$) should be of order $n^{2/\alpha}$, and not of order $n^{1/\alpha}$ like the typical eigenvalues.
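This heuristic, namely that the largest eigenvalue is governed by the single largest matrix entry, is easy to probe numerically. Below is a minimal simulation sketch, not part of the paper's argument: entries have Pareto-distributed magnitudes with $\alpha = 1$ and random signs, and the size $n$ and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 200, 1.0

# Symmetric matrix with heavy-tailed entries: Pr(|a_ij| > x) = x^{-alpha}
# for x >= 1 (classical Pareto), with independent random signs.
mag = 1.0 + rng.pareto(alpha, size=(n, n))
sgn = rng.choice([-1.0, 1.0], size=(n, n))
T = np.triu(mag * sgn)
A = T + T.T - np.diag(np.diag(T))   # symmetrize, keeping one copy of the diagonal

lam_max = np.linalg.eigvalsh(A)[-1]   # eigvalsh returns eigenvalues in ascending order
entry_max = np.abs(A).max()
ratio = lam_max / entry_max
print(ratio)   # close to 1: the top eigenvalue tracks the largest entry
```

For $\alpha < 2$ the maximum of the $\sim n^2/2$ entries grows like $n^{2/\alpha}$, which dominates the typical $n^{1/\alpha}$ scale, so the ratio above approaches 1 as $n$ grows.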
Recently, Soshnikov and Fyodorov ([30]) used the method of determinants to study the largest eigenvalues of a sample covariance matrix $A^t A$ in the case when the entries of a rectangular $m \times n$ matrix $A$ are i.i.d. Cauchy random variables. The main result of [30] states that the largest eigenvalues of $A^t A$ are of order $m^2 n^2$ and
$$\lim_{m,n\to\infty} E \prod_{i=1}^{n} \big(1 + z\tilde\lambda_i\big)^{-1} = \mathcal{E} \prod_{i} \big(1 + z x_i\big)^{-1} = \exp(-\sqrt z),$$
where $\tilde\lambda_i$, $i = 1, \dots, n$, are the eigenvalues of $\frac{1}{m^2 n^2} A^t A$, $z$ is a complex number with positive real part, and the branch of $\sqrt z$ on $D = \{z : \Re z > 0\}$ is such that $\sqrt 1 = 1$. Here $E$ denotes the mathematical expectation with respect to the random matrix ensemble of sample covariance matrices and $\mathcal{E}$ denotes the mathematical expectation with respect to the inhomogeneous Poisson random point process $\{x_i\}$ on the positive half-axis with the intensity $\frac{1}{\pi x^{3/2}}$. The convergence is uniform inside $D$ (i.e. it is uniform on the compact subsets of $D$).
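For the inhomogeneous Poisson process with intensity $\rho(x) = \frac{1}{\pi x^{3/2}}$ appearing here, averages of multiplicative functionals can be computed by the standard Poisson-process identity $\mathcal{E}\prod_i (1 + z x_i)^{-1} = \exp\big(-\int_0^\infty (1 - (1+zx)^{-1})\rho(x)\,dx\big)$, and the exponent evaluates to $\sqrt z$. The sketch below is only a numerical check of that integral (the calculus, not the random-matrix theorem); the substitution $x = u^2$ in the comment removes the endpoint singularity, though `quad` handles the original form as well.

```python
import numpy as np
from scipy.integrate import quad

def exponent(z):
    """Compute int_0^inf (1 - 1/(1+z x)) / (pi x^{3/2}) dx for real z > 0."""
    # Substituting x = u^2 turns the integral into (2/pi) * int_0^inf z/(1+z u^2) du,
    # a smooth integrand; the closed-form answer is sqrt(z).
    integrand = lambda u: (2.0 / np.pi) * z / (1.0 + z * u * u)
    val, _err = quad(integrand, 0.0, np.inf)
    return val

for z in (0.5, 1.0, 2.0):
    print(z, exponent(z), np.sqrt(z))   # the two last columns agree
```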
The goal of the paper is to study the distribution of the largest eigenvalues of real symmetric (1) and Hermitian (2) Wigner random matrices in the case of heavy tails (12). Let us introduce the normalization coefficient $b_n$, defined so that
$$n^2\,G(b_n x) \to 2\,x^{-\alpha} \quad \text{for all } x > 0, \qquad (20)$$
where $G$ is defined in (12) above. This can be achieved by selecting $b_n$ to be the infimum of all $t$ for which $G(t) \le 2/n^2$; then $n^{2/\alpha - \delta} = o(b_n)$ and $b_n = o(n^{2/\alpha + \delta})$ for any $\delta > 0$ as $n \to \infty$, and
$$\frac{n^2\,h(b_n)}{b_n^{\alpha}} \to 2 \quad \text{as } n \to \infty. \qquad (21)$$
Let us denote the eigenvalues of $b_n^{-1} A$ by $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. As we will see below, this normalization is chosen in such a way that the largest normalized eigenvalues are of the order of a constant while the vast majority of the eigenvalues go to zero. The main result of our paper is formulated in the next two theorems.
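For the simplest tail, $h \equiv c$ constant, the normalization can be made explicit: $G(x) = c\,x^{-\alpha}$ gives $b_n = (c\,n^2/2)^{1/\alpha}$, and then the maximum over the $N = n(n+1)/2$ independent entries satisfies $\Pr(\max \le b_n x) = (1 - G(b_n x))^N \to \exp(-x^{-\alpha})$. A minimal numeric sketch of this computation (the choices of $\alpha$, $c$, $n$ are arbitrary):

```python
import numpy as np

alpha, c = 1.5, 1.0
G = lambda x: c * x ** (-alpha)      # tail: Pr(|a| > x) = c x^{-alpha} for large x

def b(n):
    # choose b_n so that n^2 G(b_n) = 2, i.e. b_n = (c n^2 / 2)^{1/alpha}
    return (c * n ** 2 / 2.0) ** (1.0 / alpha)

n = 10 ** 4
N = n * (n + 1) // 2
assert abs(n ** 2 * G(b(n)) - 2.0) < 1e-12   # the defining property of b_n

for x in (0.5, 1.0, 2.0):
    exact = (1.0 - G(b(n) * x)) ** N          # Pr(max of N entries <= b_n x)
    limit = np.exp(-x ** (-alpha))            # the Frechet limit
    print(x, exact, limit)                    # the two columns nearly coincide
```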

Remark 1
It is easy to see (and will be shown later) that the r.h.s. of (22) also gives the limiting distribution of the maximum of the (properly normalized) matrix entries, i.e.
$$\lim_{n\to\infty} \Pr\Big(\max_{1 \le i \le j \le n} b_n^{-1}|a_{ij}| \le x\Big) = \exp\big(-x^{-\alpha}\big).$$
Similarly to Theorem 1.1 one can study the distribution of the second, third, fourth, etc., largest eigenvalues of $A$. The following general result holds; its analogue for the Hermitian matrices $b_n^{-1} A$ holds true as well. We remind the reader that a Poisson random point process on the real line with a locally integrable intensity function $\rho(x)$ is defined in such a way that the counting functions (i.e. the numbers of particles) for disjoint intervals $I_1, \dots, I_k$ are independent Poisson random variables with parameters $\int_{I_j} \rho(x)\,dx$, $j = 1, \dots, k$. Equivalently, one can define the Poisson random point process by requiring that the $k$-point correlation functions are given by the products of the one-point correlation functions (intensities), i.e. $\rho_k(x_1, \dots, x_k) = \prod_{j=1}^{k} \rho(x_j)$. Remark 3 The arguments of the proof are quite soft and can be extended without any difficulty to banded random matrices and some other classes of Hermitian and real symmetric random matrices with independent entries with heavy tails of distribution.
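The Poisson description of the exceedance counts can be illustrated by simulation: for $N = n(n+1)/2$ i.i.d. heavy-tailed entries, the number of normalized values falling in $(x, \infty)$ is approximately Poisson with mean $x^{-\alpha}$, so its mean and variance nearly coincide. A Monte Carlo sketch, not part of the proof ($\alpha = 1$, classical Pareto magnitudes; all parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.0
n = 60
N = n * (n + 1) // 2            # number of independent entries
b_n = N ** (1.0 / alpha)        # for Pr(|a| > t) = t^{-alpha}: N * G(b_n) = 1

trials, x = 2000, 1.0
counts = np.empty(trials)
for t in range(trials):
    a = 1.0 + rng.pareto(alpha, size=N)   # Pr(a > t) = t^{-alpha}, t >= 1
    counts[t] = np.sum(a > b_n * x)       # points of {b_n^{-1} a} in (x, inf)

# Poisson(x^{-alpha}) limit: mean and variance both close to x^{-alpha} = 1
print(counts.mean(), counts.var())
```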
Remark 4 For random Schrödinger operators the Poisson statistics of the eigenvalues in the localization regime were first proved by Molchanov in [23] (see also the paper by Minami [22]). There is a vast literature on the Poisson statistics of the energy levels of quantum systems in the case of regular underlying dynamics (see e.g. [3], [27], [16], [25], [5], [18], [19], [20]).
The rest of the paper is organized as follows. We prove Theorem 1.1 in section 2. The proof of Theorem 1.2 is presented in section 3.

2 Proof of Theorem 1.1
We will prove Theorems 1.1 and 1.2 in the real symmetric case; in the Hermitian case the arguments are essentially the same. We start by considering the distribution of the largest matrix entry. It follows from (20) that
$$\lim_{n\to\infty} \Pr\Big(\max_{1 \le i \le j \le n} b_n^{-1}|a_{ij}| \le x\Big) = \exp\big(-x^{-\alpha}\big) \qquad (24)$$
(see e.g. [15], Theorem 1.5.1).
Let us order the $N = n(n+1)/2$ i.i.d. random variables $|a_{ij}|$, $1 \le i \le j \le n$, i.e. $|a_{i_1 j_1}| \ge |a_{i_2 j_2}| \ge |a_{i_3 j_3}| \ge \dots \ge |a_{i_N j_N}|$, where $(i_l, j_l)$ are the indices of the $l$-th largest (in absolute value) matrix entry. We will use the notation $a^{(l)} = b_n^{-1}|a_{i_l j_l}|$, $l = 1, \dots, N$, for the normalized $l$-th largest absolute value of the matrix entries.
In this paper only the statistical properties of a finite number of the largest order statistics will be of importance. In particular, for any finite k the inequalities between the first k order statistics are strict with probability going to 1 as n → ∞. We start with a standard proposition that generalizes (24).
In other words, for any $\epsilon > 0$ the restriction of the point configuration $\{b_n^{-1}|a_{ij}|\}$ to the interval $[\epsilon, +\infty)$ converges in distribution on the cylinder sets to the inhomogeneous Poisson random point process with the intensity $\rho(x) = \alpha\,x^{-(1+\alpha)}$ (we refer the reader to [6], sections 2 and 3, for the definition and elementary properties of Poisson random point processes). The proof of the proposition is a straightforward generalization of the calculations in (24). For the convenience of the reader we sketch the proof of the lemma. We start with part a). We first remind the reader that by a classical result of Karamata ([14], see also [26] for a nice exposition of the subject) a slowly varying function at infinity can be represented on an interval $[B, +\infty)$, where $B$ is sufficiently large, as
$$h(x) = \exp\Big(\eta(x) + \int_B^x \frac{\varepsilon(t)}{t}\,dt\Big), \qquad (26)$$
where $\eta(x)$ is a bounded measurable function which has a limit at infinity and $\varepsilon(x)$ is a continuous function on $[B, +\infty)$ such that $\lim_{x\to\infty} \varepsilon(x) = 0$. Part a) then follows from (12), (21) and (26). In a similar fashion, in order to prove statement b) we have to show that a certain probability multiplied by $n(n-1)$ is $o(1)$, which again follows from (12), (21) and (26). The proof of part c) is very similar and is therefore left to the reader.
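As a side illustration of slow variation (of the definition, not of the lemma itself), one can check numerically that $h(x) = \log x$ satisfies the defining limit $h(tx)/h(x) \to 1$, while a genuine power $x^{\delta}$ does not; the functions and the evaluation point below are arbitrary choices.

```python
import math

def ratio(h, t, x):
    """h(t x) / h(x), which should tend to 1 as x -> inf when h is slowly varying."""
    return h(t * x) / h(x)

h_slow = math.log                 # log x is slowly varying
h_fast = lambda x: x ** 0.1       # x^0.1 is regularly varying with index 0.1, not slowly

x = 1e300                         # evaluate far out, where the limiting behavior shows
for t in (0.5, 2.0, 10.0):
    print(t, ratio(h_slow, t, x), ratio(h_fast, t, x))
# the middle column is ~1 for every t; the right column stays at t^0.1
```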
Let $(i_1, j_1)$ be the indices of the maximal (in absolute value) matrix entry. It follows from (24) and part a) of Lemma 1 that with probability going to 1 one has $|a_{i_1 j_1}| \ge b_n^{99/100}$ and $i_1 \ne j_1$. Let $f_1$ be a unit vector in $\mathbb{R}^n$ such that all its coordinates except the $i_1$-th and the $j_1$-th are zero, the $i_1$-th coordinate equals $1/\sqrt 2$, and the $j_1$-th coordinate equals $+1/\sqrt 2$ if $a_{i_1 j_1}$ is nonnegative and $-1/\sqrt 2$ otherwise. Then one can easily calculate the value of the quadratic form
$$(A f_1, f_1) = |a_{i_1 j_1}| + \tfrac{1}{2} a_{i_1 i_1} + \tfrac{1}{2} a_{j_1 j_1},$$
and it follows from (24) and Lemma 1 b) that with probability going to 1 the sum of the last two terms is much smaller than the first term; in particular, $(A f_1, f_1) = |a_{i_1 j_1}|(1 + o(1))$. Since $f_1$ is a unit vector, the largest eigenvalue of $A$ is clearly at least $(A f_1, f_1)$. To show that $|a_{i_1 j_1}|(1 + o(1))$ is also an upper bound on the largest eigenvalue we need one more lemma.
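The quadratic-form identity for $f_1$ can be checked directly on a toy matrix. The entries below are arbitrary; the dominant entry is made negative so that the sign convention for the $j_1$-th coordinate is exercised.

```python
import numpy as np

# Toy symmetric matrix whose largest |entry| is a_{0,2} = -5
A = np.array([[ 1.0,  0.5, -5.0, 0.2],
              [ 0.5, -0.3,  0.1, 0.0],
              [-5.0,  0.1,  2.0, 0.4],
              [ 0.2,  0.0,  0.4, 0.7]])
i1, j1 = 0, 2
s = 1.0 if A[i1, j1] >= 0 else -1.0

# Unit vector f_1: 1/sqrt(2) at i1, s/sqrt(2) at j1, zeros elsewhere
f1 = np.zeros(4)
f1[i1], f1[j1] = 1.0 / np.sqrt(2), s / np.sqrt(2)

quad_form = f1 @ A @ f1
formula = abs(A[i1, j1]) + 0.5 * A[i1, i1] + 0.5 * A[j1, j1]
lam_max = np.linalg.eigvalsh(A)[-1]
print(quad_form, formula, lam_max)
# quad_form equals formula exactly, and lam_max >= quad_form (Rayleigh quotient)
```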
Finally, we observe that for any fixed row $i$,
$$\Pr\Big(\sum_{j:\,|a_{ij}| \le b_n^{1-\epsilon}} |a_{ij}| > b_n^{1-\kappa}\Big) \to 0 \qquad (32)$$
for sufficiently small positive $\kappa > 0$. The proof of (32) is similar to the arguments given above and will be left to the reader. Since by part c) of Lemma 1 with probability going to 1 each row has at most one entry greater in absolute value than $b_n^{1-\epsilon}$, the desired upper bound on the largest eigenvalue follows. In the case $0 < \alpha < 1$ the consideration is similar. We again choose $\epsilon = \frac{1}{2M+1}$, where $M$ is an integer chosen so that $\epsilon < \frac{\alpha}{8}$, and $\gamma = \frac{1}{4M+2}$, so that with probability $1 - O(n\exp(-n^{\gamma}))$ we have for all rows $1 \le i \le n$ of the matrix $A$
$$\sum_{j:\,|a_{ij}| \le b_n^{1-\epsilon}} |a_{ij}| \le b_n^{1-\kappa}$$
for sufficiently large $n$. Combined with (32) this finishes the proof. Lemma 2 is proven. Remark 5 It follows from (24) and Lemma 2 that with probability going to 1,
$$\|A\|_{\infty} = |a_{i_1 j_1}|\,(1 + o(1)).$$
Theorem 1.1 now follows from the fact that the matrix norm $\|A\|_{\infty} = \max_i \sum_j |a_{ij}|$ is an upper bound for the largest eigenvalue of the matrix $A$.
3 Proof of Theorem 1.2

Let, as in section 2, $a^{(l)} = b_n^{-1}|a_{i_l j_l}|$, $l = 1, \dots, N = n(n+1)/2$, be the values $b_n^{-1}|a_{ij}|$, $1 \le i \le j \le n$, put in decreasing order. In particular, $a^{(1)} = \max_{ij} b_n^{-1}|a_{ij}|$. Similarly to section 2 we construct unit vectors $f_l \in \mathbb{R}^n$, $l = 1, 2, \dots$, such that all coordinates of $f_l$ except the $i_l$-th and the $j_l$-th are zero, the $i_l$-th coordinate equals $1/\sqrt 2$, and the $j_l$-th coordinate equals $+1/\sqrt 2$ if $a_{i_l j_l}$ is nonnegative and $-1/\sqrt 2$ otherwise. Then
$$A f_l = |a_{i_l j_l}|\,f_l + \frac{1}{\sqrt 2} \sum_{m \ne i_l, j_l} \big(a_{i_l m} \pm a_{j_l m}\big)\,e_m$$
(with the sign matching that of the $j_l$-th coordinate of $f_l$), where $e_m$ are the standard basis vectors in $\mathbb{R}^n$, i.e. all coordinates of $e_m$ are zero except for the $m$-th one, which is 1. Since $\big(\sum_{m \ne i_l, j_l} a_{i_l m}^2\big)^{1/2} \le \sum_{m \ne i_l, j_l} |a_{i_l m}|$, it follows from Proposition 1 and Lemma 2 that for any finite number of the largest values $a^{(l)}$, $1 \le l \le k$, one has $b_n^{-1} A f_l = a^{(l)} f_l + r_l$, where the Euclidean norms of $r_l$ are $o(1)$. Therefore, by an elementary argument from perturbation theory for Hermitian operators, $b_n^{-1} A$ has eigenvalues $a^{(l)}(1 + o(1))$, $1 \le l \le k$, for any finite $k$. The last thing we need to show is that (with probability going to 1) these eigenvalues are exactly the $k$ largest eigenvalues of $b_n^{-1} A$. In other words, so far we have proved that with probability going to 1 one has $\lambda_l \ge a^{(l)}(1 + o(1))$, $l = 1, \dots, k$. Our goal now is to prove the reverse inequalities $\lambda_l \le a^{(l)}(1 + o(1))$, $l = 1, \dots, k$. To simplify the exposition we will prove the result for the second eigenvalue; the reasoning in the general case is very similar. Let, as before, $(i_1, j_1)$ be the indices of the largest (in absolute value) matrix entry. Consider the $(n-1) \times (n-1)$ submatrix obtained from $A$ by deleting the $i_1$-th row and the $i_1$-th column, and denote this submatrix by $B$. It follows from Lemma 1 that $i_2 \ne i_1$, $j_2 \ne j_1$ with probability going to 1.
By the same reasoning as in the proof of Theorem 1.1 we have that the largest eigenvalue of $b_n^{-1} B$ is $a^{(2)}(1 + o(1))$, while by the interlacing property of the eigenvalues of $A$ and $B$ the largest eigenvalue of $B$ is not smaller than the second largest eigenvalue of $A$. Theorem 1.2 is proven.
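The interlacing step can be verified numerically: by the Cauchy interlacing theorem, the eigenvalues of a principal $(n-1) \times (n-1)$ submatrix of a real symmetric matrix interlace those of the full matrix, so in particular $\lambda_1(B) \ge \lambda_2(A)$. A sketch with an arbitrary random symmetric matrix (size, seed, and deleted index are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2.0                      # symmetric test matrix

i1 = 3                                   # delete the i1-th row and column
keep = [i for i in range(n) if i != i1]
B = A[np.ix_(keep, keep)]

evA = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues, descending
evB = np.sort(np.linalg.eigvalsh(B))[::-1]

# Cauchy interlacing: lambda_{k+1}(A) <= lambda_k(B) <= lambda_k(A) for all k
ok = bool(np.all(evA[1:] <= evB + 1e-10) and np.all(evB <= evA[:-1] + 1e-10))
print(ok)   # True
```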