Largest eigenvalues and eigenvectors of band or sparse random matrices

In this text, we consider an $N$ by $N$ random matrix $X$ such that all but $o(N)$ rows of $X$ have $W$ non identically zero entries, the other rows having fewer than $W$ entries (such as, for example, standard or cyclic band matrices). We always suppose that $1 \ll W \ll N$. We first prove that if the entries are independent, centered, have variance one, satisfy a certain tail upper-bound condition and $W \gg (\log N)^{6(1+\alpha)}$, where $\alpha$ is a positive parameter depending on the distribution of the entries, then the largest eigenvalue of $X/\sqrt{W}$ converges to the upper bound of its limit spectral distribution, that is $2$, as for Wigner matrices. This extends some previous results by Khorunzhiy and Sodin where fewer hypotheses were made on $W$, but more hypotheses were made about the law of the entries and the structure of the matrix. Then, under the same hypotheses, we prove a delocalization result for the eigenvectors of $X$, namely that most of them cannot be essentially localized on fewer than $W/\log(N)$ entries. This lower bound on the localization length should be compared with the recent result by Steinerberger, which states that the localization length in the edge is $\ll W^{7/5}$ or there is strong interaction between two eigenvectors in an interval of length $W^{7/5}$.


Introduction
Random band matrices (i.e. random Hermitian matrices with independent entries vanishing outside a band around the diagonal) have attracted much attention recently. Indeed, varying the bandwidth $W$ from $1$ to the full size shows (in the large size limit) a crossover between a strongly disordered regime, with localized eigenfunctions and weak eigenvalue correlation, and a weakly disordered regime, with extended eigenfunctions and strong eigenvalue repulsion. It is conjectured (and explained at a physical level of rigor by Fyodorov and Mirlin in [9]) that for Gaussian band matrices, the localization strength (i.e. the typical number of coordinates bearing most of the $\ell^2$ mass) of a typical eigenvector in the bulk of the spectrum should be of order $L \sim N \wedge W^2$, so that eigenvectors of the bulk should be localized (resp. extended) if $W \ll \sqrt{N}$ (resp. $W \gg \sqrt{N}$). The only rigorous result in the direction of localization is by Schenker [13], who proved that $L \ll W^8$ for Gaussian band matrices. On the other hand, delocalization in the bulk is proved by Erdős, Knowles, Yau and Yin [7] when $W \gg N^{4/5}$. In both regimes, it is known from Erdős and Knowles [5,6] that typically (i.e. disregarding a negligible proportion of eigenvectors) $L \gg W^{7/6} \wedge N$ for a certain class of random band matrices (with sub-exponential tails and symmetric distribution). We refer the reader to Spencer [15] and Erdős, Schlein and Yau [8] for a more detailed discussion of the localized and delocalized regimes.
Regarding the edges of the spectrum, little is known about the behavior of the extreme eigenvalues and the typical localization length of the associated eigenvectors. As far as the limit of the largest eigenvalue is concerned, Khorunzhiy proved in [11] that for Gaussian band matrices, if $(\log N)^{3/2} \ll W \ll N$, then the extreme eigenvalues converge to the bounds $u_\pm$ of the support of the limiting spectral measure (which is the semicircle law). For matrices with cyclic band structure and Bernoulli entries, Sodin extended this result to the case where $\log N \ll W \ll N$ in [14], where he proved important results about the fluctuations of the extreme eigenvalues around their limits. Concerning the localization length $L$ of the eigenvectors associated to the extreme eigenvalues, one could conjecture the following on the basis of the Thouless argument as explained in [9]. For eigenvectors associated to eigenvalues $\lambda$ close to, e.g., the bottom edge $u_-$, the localization strength should behave as $L \sim N \wedge W^2(\lambda - u_-)$. Sodin's statement [14] combined with the results of Erdős, Knowles, Yau and Yin [7] suggests that this should hold true as soon as $W \gg N^{5/6}$. Moreover, Steinerberger proved recently in [16] that for matrices with Bernoulli entries and cyclic band structure, with probability tending to one, we have either $L \ll W^{7/5}$ or there is strong interaction between two eigenvectors in an interval of length $W^{7/5}$. Let us also mention that in the quite different framework of band matrices with heavy tailed entries, a transition between the localized and the delocalized regime at the edge was proved by the authors of the present text in [3].
In this text, we consider a random $N \times N$ Hermitian matrix $X$ such that most rows of $X$ have $W$ non identically zero entries (such as, for example, standard or cyclic band matrices). We always suppose that $1 \ll W \ll N$. We first prove that if the entries are independent, centered, have variance one, satisfy a certain tail upper-bound condition and $W \gg (\log N)^{6(1+\alpha)}$, where $\alpha$ is a positive parameter depending on the distribution of the entries, then the largest eigenvalue of $X/\sqrt{W}$ converges to the upper bound of its limit spectral distribution, that is $2$, as for Wigner matrices. This extends the above mentioned results by Khorunzhiy and Sodin [11,14] where fewer hypotheses were made on $W$, but more hypotheses were made about the law of the entries (they use in a crucial way the fact that the entries are symmetrically distributed) and about the structure of the matrix (in our result, we only need that most rows have $W$ non zero entries, whatever the positions of these entries in the matrix). Then, under closely related hypotheses, we prove a delocalization result for the eigenvectors of $X$, namely that most of them cannot be essentially localized on fewer than $W/\log N$ entries. This lower bound on the localization length should be compared with the recent result by Steinerberger in [16], which states that the localization length (here we use the word length rather than strength because in [16], the author considers intervals carrying most of the $\ell^2$-mass of the eigenvectors) in the edge is $\ll W^{7/5}$ or there is strong interaction between two eigenvectors in an interval of length $W^{7/5}$. The paper is organized as follows: our main results are stated in the next section,

hal-00863653, version 3 -18 Jan 2014
Theorem 2.4 is proved in Section 3, Theorem 2.9 is proved in Section 4, and some technical results needed here are proved in Section 5 and in the appendix.
Notation. Here, $A \ll B$ means that $A/B \to 0$ as $N \to \infty$. Let $\|X\|$ denote the spectral radius of the Hermitian matrix $X$ and $\lambda_{\max}(X)$ denote its largest eigenvalue.

Main results
We make the following hypotheses.
Hypothesis 2.1. The matrix $X = (X_{ij})$ is an $N \times N$ Hermitian random matrix (depending implicitly on $N$) with independent entries (modulo the symmetry) and such that on each row of $X$, the number of non identically zero entries is $\le W$, with equality on all but $o(N)$ rows. All non identically zero entries of $X$ are centered with variance one. Moreover, there exist constants $C \in [0, +\infty)$ and $\alpha \in [0, +\infty)$ such that for all $k \ge 2$,
Then the following theorem has been proved under weaker moment hypotheses in [4] (using the resolvent approach), but can easily be reproved here using a standard moment method as in [1,2].
Our first result is the following one.
Theorem 2.4. Suppose that Hypothesis 2.1 holds and that $W \gg (\log N)^{6(1+\alpha)}$, with $\alpha$ the constant of (2). Then as $N \to \infty$, we have the convergence in probability
$$\lambda_{\max}(X/\sqrt{W}) \longrightarrow 2. \qquad (4)$$
Remark 2.5. This theorem extends some results of [11] and [14]. In these papers, the convergence (4) is proved under the respective hypotheses $W \gg (\log N)^{3/2}$ and $W \gg \log N$, but for some particular models of matrices: in [11] the matrices considered are Gaussian and in [14], they have Bernoulli distributed entries. Both make crucial use of the fact that the entries are symmetrically distributed. Moreover, in both papers, the authors also suppose and rely heavily on a particular position of the non zero entries of $X$. We do not make such a hypothesis here.
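As a numerical illustration of the convergence of the largest eigenvalue of $X/\sqrt{W}$ to $2$ (it plays no role in the proofs), the following sketch builds a cyclic band matrix with independent $\pm 1$ entries and runs a plain power iteration. The sizes $N = 200$ and $W = 21$, the seed, the number of iterations and all function names are assumptions chosen for the example. The quantity returned is a lower bound on the spectral radius (not on $\lambda_{\max}$ itself), and for such a small $W$ it is only expected to lie in the vicinity of $2$.

```python
import math
import random


def cyclic_band_matrix(n, half_width, rng):
    """Symmetric matrix with +-1 entries at cyclic distance <= half_width.

    Stored sparsely: rows[i] is a dict {j: X_ij}. Requires n > 2 * half_width,
    so that each row has exactly W = 2 * half_width + 1 entries.
    """
    rows = [dict() for _ in range(n)]
    for i in range(n):
        for d in range(half_width + 1):
            j = (i + d) % n
            x = rng.choice((-1.0, 1.0))
            rows[i][j] = x
            rows[j][i] = x  # enforce symmetry
    return rows


def matvec(rows, v):
    """Product of the sparse matrix with the vector v."""
    return [sum(x * v[j] for j, x in row.items()) for row in rows]


def spectral_radius_lower_bound(rows, steps, rng):
    """Power iteration: returns |X v| / |v| after `steps` multiplications.

    This ratio never exceeds the spectral radius and increases with `steps`.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(len(rows))]
    ratio = 0.0
    for _ in range(steps):
        norm_v = math.sqrt(sum(a * a for a in v))
        v = [a / norm_v for a in v]
        v = matvec(rows, v)
        ratio = math.sqrt(sum(a * a for a in v))
    return ratio


rng = random.Random(0)          # illustrative seed
N, HALF_WIDTH = 200, 10         # illustrative sizes
W = 2 * HALF_WIDTH + 1
X = cyclic_band_matrix(N, HALF_WIDTH, rng)
estimate = spectral_radius_lower_bound(X, 60, rng) / math.sqrt(W)
print("estimated spectral radius of X / sqrt(W):", round(estimate, 3))
```

Increasing `HALF_WIDTH` (with `N` much larger than `W`) is the regime covered by the theorem; the small values used here only keep the pure-Python run short.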

To state our main result, a lower bound on the localization length of eigenvectors of X, we slightly modify the hypotheses. Let X be a random matrix satisfying Hypothesis 2.1. We make the following two assumptions.
For example, Hypothesis 2.6 is satisfied if X satisfies the hypotheses of Theorem 2.4 or those of the papers [11] and [14] (see Remark 2.5 above). However, we emphasize that Hypothesis 2.6, focused on the extreme eigenvalues, does not make (at least directly) any assumption on the maximal number of non zero entries per row of X (it may be N), nor on the relative growth of W with respect to N.
We also reinforce the assumption on the tail of the distribution of the entries. Let C > 0 be fixed.
Hypothesis 2.7. The entries $X_{ij}$ belong to the set $E_C$ of complex random variables $X$ such that
Note that this assumption reinforces (2) as $\alpha \le 1/2$. It is equivalent to the fact that there exist $\delta > 0$ and $K > 0$ such that
With a slight abuse of notation, we denote $E_C$ by $E_{\delta,K}$, as our proof mostly uses assumption (5).
The following theorem is the main result of this text.
Remark 2.8. The larger $L$ and $\eta$ are, the stronger the statement "there is no $(L, \eta)$-localized eigenvector" is.
Theorem 2.9. We suppose Hypotheses 2.1, 2.6 and 2.7. Fix $\eta \in (0, 1/2)$ and choose $L = L(N)$ such that Let $\lambda_1, \dots, \lambda_N$ be the eigenvalues of $X$ and let $v_1, \dots, v_N$ be some associated normalized eigenvectors. Then for any $\kappa$ such that $\eta/(1 - \eta) < \kappa < 1$,
Remark 2.10. The same proof also yields a version of this theorem where $\eta = \eta(N) \to 0$. In this case, $\kappa = \kappa(N)$ is allowed to tend to zero, thus the theorem gives a lower bound on the localization length of most eigenvectors of $X$.
Remark 2.11. The estimate in Theorem 2.9 is almost sharp, as shown by the case where X is the block diagonal matrix formed with [N/W ](+1) GUE matrices of size W (or at most W for the last block).
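The block diagonal example can be made concrete with a short sketch. Two assumptions are made here and are not taken from the text: the blocks are real symmetric Gaussian matrices rather than GUE matrices (to stay within real arithmetic), and "localized on $L$ entries up to $\eta$" is read as "some $L$ coordinates carry at least a fraction $1 - \eta$ of the $\ell^2$ mass". All names and sizes are illustrative. The code checks that a vector supported on one block stays supported on that block under multiplication by $X$, which is why eigenvectors of such a matrix can be chosen with support of size at most $W$.

```python
import random


def block_diagonal_matrix(n, w, rng):
    """Dense symmetric n x n matrix made of independent Gaussian blocks of size <= w."""
    x = [[0.0] * n for _ in range(n)]
    for start in range(0, n, w):
        end = min(start + w, n)
        for i in range(start, end):
            for j in range(i, end):
                g = rng.gauss(0.0, 1.0)
                x[i][j] = g
                x[j][i] = g
    return x


def matvec(x, v):
    """Product of the dense matrix x with the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def smallest_support(v, eta):
    """Smallest number of coordinates carrying at least (1 - eta) of the l2 mass of v."""
    masses = sorted((abs(a) ** 2 for a in v), reverse=True)
    total = sum(masses)
    carried = 0.0
    for count, mass in enumerate(masses, start=1):
        carried += mass
        if carried >= (1.0 - eta) * total:
            return count
    return len(masses)


rng = random.Random(1)   # illustrative seed
N, W = 20, 5             # illustrative sizes
X = block_diagonal_matrix(N, W, rng)

# Start from a vector supported on the first block and apply X repeatedly.
v = [rng.gauss(0.0, 1.0) if i < W else 0.0 for i in range(N)]
for _ in range(10):
    v = matvec(X, v)

print("coordinates needed for 90% of the mass:", smallest_support(v, 0.1))
```

The helper `smallest_support` can be applied to any vector, so it can also be used to inspect eigenvectors computed by other means.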

Proof of Theorem 2.4
The proof goes along the same lines as the proof of Theorem 2 in the paper [10] by Füredi and Komlós (see also Theorem 2.1.22 in [1]). First note that by Theorem 2.3, we already know that for any $\eta > 0$, For any $\eta > 0$, for any $k \ge 1$, hence it suffices to find a sequence $k = k(N)$ such that for any $\eta > 0$, We have where the sum is over collections $i = (i_1, \dots, i_{2k})$ such that for all $\ell$, $i_\ell \in \{1, \dots, N\}$. For each $i$, let $G_i$ be the simple, non oriented graph with vertex set $\{i_1, \dots, i_{2k}\}$ and edges $\{i_\ell, i_{\ell+1}\}$ ($1 \le \ell \le 2k$, with the convention $i_{2k+1} = i_1$). For the expectation in the RHT above to be non zero, we need all edges to be visited at least twice by the path $i$ (because the $X_{ij}$'s are centered) and the edges $\{i_\ell, i_{\ell+1}\}$ to be such that $X_{i_\ell, i_{\ell+1}}$ is non identically zero. The symmetric group $S_N$ acts on the set of $i$'s by $\sigma \cdot (i_1, \dots, i_{2k}) := (\sigma(i_1), \dots, \sigma(i_{2k}))$. Following Section 2.1.3 of [1], we denote by $\mathcal{W}_{2k,t}$ the set of equivalence classes, under the action of $S_N$, of $i$'s such that all edges of $G_i$ are visited at least twice by the path $i$ and $G_i$ has exactly $t$ vertices (this set is actually stable under this action).
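Note that for $\mathcal{W}_{2k,t}$ to be non empty, we need to have $t \le k + 1$. Indeed, $G_i$ is always connected, hence its number of vertices is at most its number of edges plus one, and its number of edges is at most $k$ since each edge is visited at least twice by a path of length $2k$.
Note that for any $w \in \mathcal{W}_{2k,t}$, the number of $i$'s in the class $w$ (along which all the entries of $X$ are non identically zero) is at most $N W^{t-1}$.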
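Both remarks can be checked by brute force on tiny cases. The sketch below is only an illustration under stated assumptions: indices run over $\{0, \dots, n-1\}$, the non identically zero entries are taken to form a cyclic band of width $W = 2h + 1$ (one admissible structure among many), classes under relabelling are represented by their "first occurrence" pattern, and all function names are hypothetical. The second check counts, in each class, only the paths along which every entry is non identically zero, which is the count that matters for the expectation.

```python
from collections import Counter
from itertools import product


def closed_paths(n, k):
    """All index tuples i = (i_1, ..., i_2k) with entries in {0, ..., n-1}."""
    return product(range(n), repeat=2 * k)


def edges(i):
    """Undirected edges {i_l, i_(l+1)} of the closed path, with i_(2k+1) = i_1."""
    m = len(i)
    return [frozenset((i[l], i[(l + 1) % m])) for l in range(m)]


def max_vertices_when_edges_repeat(n, k):
    """Largest number of distinct vertices among closed paths of length 2k
    in which every edge is visited at least twice."""
    best = 0
    for i in closed_paths(n, k):
        visits = Counter(edges(i))
        if all(c >= 2 for c in visits.values()):
            best = max(best, len(set(i)))
    return best


def pattern(i):
    """Canonical representative of the class of i under relabelling of the indices."""
    labels = {}
    return tuple(labels.setdefault(v, len(labels)) for v in i)


def class_sizes_respect_bound(n, half_width, k):
    """Check that each class contains at most n * W^(t-1) paths staying in the band."""
    w = 2 * half_width + 1

    def in_band(edge):
        a, b = min(edge), max(edge)
        return min(b - a, n - (b - a)) <= half_width

    sizes = Counter()
    for i in closed_paths(n, k):
        if all(in_band(e) for e in edges(i)):
            sizes[pattern(i)] += 1
    # max(p) + 1 is the number t of distinct vertices in the class p
    return all(size <= n * w ** max(p) for p, size in sizes.items())


print(max_vertices_when_edges_repeat(4, 2), max_vertices_when_edges_repeat(4, 3))
print(class_sizes_respect_bound(6, 1, 2))
```

The enumeration is exponential in $k$, so it is only meant for very small $n$ and $k$.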

It follows from the previous remarks that
Now, let us fix $t \in \{1, \dots, k + 1\}$, $w \in \mathcal{W}_{2k,t}$ and $i \in w$. Let us denote by $l$ (resp. $m$) the number of edges of $G_i$ visited exactly twice (resp. at least three times). Obviously, $2l + 3m \le 2k$. Moreover, as $l + m$ is the number of edges of $G_i$, by connectedness of $G_i$ again, we have $t \le l + m + 1$. So $6t \le 6m + 6l + 6 = 2(3m + 2l) + 2l + 6 \le 4k + 2l + 6$, so $2k - 2l \le 6(k - t + 1)$. Now, notice that as the $X_{ij}$'s have variance one, $\mathbb{E} X_{i_1 i_2} \cdots X_{i_{2k} i_1}$ can be reduced to the expectation of a product of $2k - 2l$ $X_{ij}$'s, hence by (2) and Hölder's inequality, As a consequence, Now, we shall use Lemma 2.1.23 of [1], which states that $\#\mathcal{W}_{2k,t} \le 4^k (2k)^{6(k-t+1)}$ as soon as $t \le k + 1$ (the case $t = k + 1$ is technically not contained in Lemma 2.1.23 of [1], but follows from Equation (2.1.20) and Lemma 2.1.3 of the same book). It follows that where the last inequality is true as soon as $W > (2k(6Ck)^{\alpha})^6$. Then it is easy to see that (7) holds for $k = k(N)$ such that $\log N \ll k \ll W^{\frac{1}{6(1+\alpha)}}$.
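The elementary inequality $2k - 2l \le 6(k - t + 1)$ used above only relies on the two constraints $2l + 3m \le 2k$ and $t \le l + m + 1$. As a sanity check, the following sketch verifies it exhaustively over all admissible integer triples $(l, m, t)$ for small $k$; the function name and the range of $k$ are arbitrary choices made for the example.

```python
def counting_inequality_holds(k_max):
    """Check 2k - 2l <= 6(k - t + 1) for all integers l, m >= 0 and t >= 1
    with 2l + 3m <= 2k and t <= l + m + 1, for every k in 1..k_max."""
    for k in range(1, k_max + 1):
        for l in range(0, k + 1):
            # largest m compatible with 2l + 3m <= 2k
            for m in range(0, (2 * k - 2 * l) // 3 + 1):
                for t in range(1, l + m + 2):
                    if 2 * k - 2 * l > 6 * (k - t + 1):
                        return False
    return True


print(counting_inequality_holds(12))
```

The extremal case is $m = 0$, $l = k$, $t = k + 1$, where both sides vanish: this corresponds to paths in which every edge is visited exactly twice.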

Proof of Theorem 2.9
Before proving Theorem 2.9, we shall need the following theorem and its corollary. The proof of Theorem 4.1 is postponed to Section 5.
Theorem 4.1. Under Hypotheses 2.1 and 2.7, there are constants $c_2 = c_2(\delta, K) > 0$ and $C_2 = C_2(\delta, K) < \infty$ independent of all the other parameters such that for all $t > 0$,
Let us denote by $\rho(X)$ the spectral radius of $X$ and, for $L \ge 1$, by $\rho_L(X)$ the maximum spectral radius of its $L \times L$ principal submatrices (a principal submatrix is a submatrix chosen by extracting a certain set of columns and the same set of rows, but this set does not need to be an interval).
Corollary 4.2. Under Hypotheses 2.1 and 2.7, there exist $t < \infty$ and $c_3 > 0$ such that
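Proof. The number of ways to choose an $L \times L$ principal submatrix is $\le N^L = e^{L \log N}$. For each submatrix $S$, $\mathbb{P}(\rho(S) \ge t\sqrt{N \log N}) \le \exp\{-c_2(t^2 \log N - C_2)L\}$, hence by the union bound, $\mathbb{P}(\rho_L(X) \ge t\sqrt{N \log N}) \le \exp\{-((c_2 t^2 - 1)\log N - c_2 C_2)L\}$, thus if $c_2 t^2 > 1$, then (9) holds for a certain $c_3 > 0$.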
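The arithmetic of this union bound can be sketched numerically: the logarithm of the product of the number of submatrices (at most $N^L$) and the single-submatrix bound $\exp\{-c_2(t^2 \log N - C_2)L\}$ is negative and of order $L \log N$ once $c_2 t^2 > 1$ and $N$ is large. The numerical values of $c_2$, $C_2$, $t$, $N$ and $L$ below are purely hypothetical, since the actual constants depend on $\delta$ and $K$ and are not made explicit here; the function name is an assumption as well.

```python
import math


def log_union_bound(n, size, t, c2, big_c2):
    """Logarithm of N^L * exp(-c2 * (t^2 * log N - C2) * L), with L = size."""
    log_count = size * math.log(n)                               # log of N^L
    log_single = -c2 * (t * t * math.log(n) - big_c2) * size     # one submatrix
    return log_count + log_single


# The exact number of L x L principal submatrices is C(N, L) <= N^L.
print(math.comb(30, 5), "<=", 30 ** 5)

# Hypothetical constants with c2 * t^2 = 4 > 1: the bound decays.
print(log_union_bound(1000, 10, t=2.0, c2=1.0, big_c2=3.0))

# Hypothetical constants with c2 * t^2 = 0.4 < 1: the bound is useless (> 0).
print(log_union_bound(1000, 10, t=2.0, c2=0.1, big_c2=3.0))
```

This is why the corollary asks for a fixed $t$ with $c_2 t^2 > 1$: the entropy term $L \log N$ coming from the choice of the submatrix must be dominated by the tail estimate.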
To prove Theorem 2.9, we shall also need the following lemma (see Lemma 4.2 in [3]).
Let us now prove Theorem 2.9.
Remark 5.2. If X is a random N × N matrix whose maximum number of non identically zero entries per row is W (like for a band matrix with band width W ), then (10) remains true with N replaced by W everywhere (for some constants still depending only on δ and K).
Proof. We denote by $X_1, \dots, X_N$ the columns of $X$. We have, for any $\tau$, $C$ as in Lemma 6.2 of the appendix, So the lemma is proved.
Lemma 5.3. For any fixed $0 < \varepsilon < 1/4$, there exists a family $(z_i)_{i \in I}$ of elements of the unit ball of $\mathbb{C}^N$ such that $|I| \le (2/\varepsilon)^{2N}$ and any element of the unit sphere of $\mathbb{C}^N$ is within a distance at most $\varepsilon$ of one of the $z_i$'s. Moreover, for any positive $N \times N$ Hermitian matrix $P$,
The first part of this lemma is well known (see e.g. [12]), whereas its second part follows from the fact that for any vectors of the unit ball $z$, $z_i$, and specifying $z$ to be an eigenvector associated to $\lambda_{\max}$.
Let us now prove Theorem 4.1.
Proof. By Lemma 5.1, we know that there are constants $c_1$, $C_1$ depending only on $\delta$, $K$ such that for any $z \in \mathbb{C}^N$ with $\|z\| \le 1$, Now, by Lemma 5.3, we have As a consequence,
for any $\delta > 0$.
Proof. The second inequality follows from the fact that for any $y \ge 1$, we have the inequality $1 + y(e^{r^2/\delta} - 1) \le e^{r^2 y/\delta}$ (this is obvious with the series expansion of $\exp$).
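Writing $c = r^2/\delta > 0$, the inequality reads $1 + y(e^c - 1) \le e^{cy}$ for $y \ge 1$, and it follows term by term from $y^n \ge y$ in the series of $e^{cy}$. The sketch below is a simple numerical check on a grid; the grid, the tolerance and the function name are arbitrary choices made for the example. Equality holds at $y = 1$, so a small relative tolerance absorbs floating-point rounding there.

```python
import math


def exponential_gap(c, y):
    """Return exp(c * y) - (1 + y * (exp(c) - 1)), expected >= 0 for c > 0, y >= 1."""
    return math.exp(c * y) - (1.0 + y * math.expm1(c))


def inequality_holds_on_grid():
    """Check the inequality for c in {0.05, ..., 3.0} and y in {1, 1.25, ..., 5}."""
    for a in range(1, 61):
        c = 0.05 * a
        for b in range(0, 17):
            y = 1.0 + 0.25 * b
            # relative tolerance for the equality case y = 1
            if exponential_gap(c, y) < -1e-9 * math.exp(c * y):
                return False
    return True


print(inequality_holds_on_grid())
```

For $y < 1$ the inequality is reversed (for instance at $c = 1$, $y = 1/2$), which is why the hypothesis $y \ge 1$ is needed.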
So let us prove the first inequality. Note that up to a replacement of $X$ by $rX$ and of $\delta$ by $\delta/r^2$, we may suppose that $r = 1$.
The case where $\mathbb{E}[e^{\delta X^2}] = \infty$ is obvious, hence we focus on the other case, which allows us to expand all sums with the moments of $X$.
Indeed, both terms are equal for $x = 0$ and the derivative of 2RHT−2LHT is $e^x - 3e^{-x} + 2$, which is increasing, hence has the same sign as $x$.
Let $a \cdot b$ denote the standard scalar product of two complex vectors and let $|\cdot|$ denote the associated norm.
Lemma 6.2. Let us fix $\delta, K > 0$. Then there are $\tau = \tau(\delta, K) > 0$ and $C = C(\delta, K) > 0$ such that for all $N \ge 1$, all $z \in \mathbb{C}^N$ such that $|z| \le \tau$, for any random vector $Y$ taking values in $\mathbb{C}^N$ with independent components in the set $E_{\delta,K}$ defined at Hypothesis 2.7,
Proof. First step: Let us first prove the result for $Y$ having independent components in $E^{\mathbb{R}}_{\delta,K} := \{Y \in E_{\delta,K}\,;\ Y \text{ is real-valued}\}$ and $z \in \mathbb{R}^N$. Let $\tau_{\mathbb{R}} > 0$ be such that for any $t \in [0, \tau_{\mathbb{R}})$, we have $12t^2K/\delta < 1$ and $(1 - 12t^2K/\delta)^{-1/2} \le e^{12t^2K/\delta}$.
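Such a $\tau_{\mathbb{R}}$ exists and can be made explicit. Writing $u = 12t^2K/\delta$, the function $f(u) = u + \frac{1}{2}\log(1 - u)$ satisfies $f(0) = 0$ and $f'(u) = 1 - \frac{1}{2(1-u)} \ge 0$ on $[0, 1/2]$, so $(1 - u)^{-1/2} \le e^u$ whenever $u \le 1/2$. One admissible choice (an assumption made for this sketch, not necessarily the value used in the proof) is therefore $\tau_{\mathbb{R}} = \sqrt{\delta/(24K)}$. The code below checks both conditions on a grid of values of $t$; the sample values of $\delta$ and $K$ and the function names are illustrative.

```python
import math


def tau_real(delta, k_const):
    """One admissible threshold: 12 * t^2 * K / delta <= 1/2 for all t <= tau_real."""
    return math.sqrt(delta / (24.0 * k_const))


def conditions_hold(t, delta, k_const):
    """Check 12 t^2 K / delta < 1 and (1 - 12 t^2 K / delta)^(-1/2) <= exp(12 t^2 K / delta)."""
    u = 12.0 * t * t * k_const / delta
    return u < 1.0 and (1.0 - u) ** -0.5 <= math.exp(u)


DELTA, K = 2.0, 3.0   # illustrative values of the parameters
tau = tau_real(DELTA, K)
grid_ok = all(conditions_hold(tau * s / 100.0, DELTA, K) for s in range(100))
print(round(tau, 4), grid_ok)
```

The second condition fails for $u$ close to $1$ (for instance at $u = 0.9$), so some restriction on $t$ beyond $12t^2K/\delta < 1$ is indeed needed.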
Let g be a standard real Gaussian variable, independent of the other variables, let E g denote the expectation with respect to g and let E denote the expectation with respect to all other variables than g.

Second step:
To extend this result to the complex case, just decompose Y and z into real and imaginary parts and use Hölder's inequality to see that the constants τ = τ R √ 8 and C = 4C R are suitable in the general case.