On the least singular value of random symmetric matrices

Let $F_n$ be an $n$ by $n$ symmetric matrix whose entries are bounded by $n^{\gamma}$ for some $\gamma>0$. Consider a randomly perturbed matrix $M_n=F_n+X_n$, where $X_n$ is a random symmetric matrix whose upper diagonal entries $x_{ij}$ are iid copies of a random variable $\xi$. Under a very general assumption on $\xi$, we show that for any $B>0$ there exists $A>0$ such that $P(\sigma_n(M_n)\le n^{-A})\le n^{-B}$. The proof uses an inverse-type result concerning the concentration of quadratic forms, which is of independent interest.


1. Introduction
Let $F_n$ be an $n$ by $n$ matrix whose entries are bounded by $n^{O(1)}$. Consider a randomly perturbed matrix $M_n = F_n + X_n$, where $X_n$ is a random matrix whose entries are iid copies of a random variable $\xi$. It has been shown, under a very general assumption on $\xi$, that the least singular value of $M_n$ cannot be too small.

Theorem 1.1. [30, Theorem 2.1] Assume that $M_n = F_n + X_n$, where the entries of $F_n$ are bounded by $n^{\gamma}$, and the entries of $X_n$ are iid copies of a random variable of zero mean and unit variance. Then for any $B > 0$, there exists $A > 0$ such that
$$P(\sigma_n(M_n) \le n^{-A}) \le n^{-B}.$$
Here $\sigma_n(M_n)$ is the smallest singular value of $M_n$, defined as
$$\sigma_n(M_n) := \min_{\|x\| = 1} \|M_n x\|.$$
The dependence among the parameters in Theorem 1.1 was made explicit in [32]. Under the stronger assumption that $\xi$ has sub-Gaussian distribution, Rudelson and Vershynin [21] obtained an almost best possible estimate on the tail bound of $\sigma_n(M_n)$. For more results regarding this random matrix ensemble we refer the reader to [21,30,32].
One important application of Theorem 1.1 is a polynomial bound for the condition number of $M_n$. The condition number $\kappa(M) = \sigma_1(M)/\sigma_n(M)$ of a matrix $M$ plays a crucial role in numerical linear algebra. The resulting bound implies that if one perturbs a fixed matrix $F$ of small spectral norm by a (very general) random matrix $X_n$, the condition number of the resulting matrix will be relatively small with high probability. This fact has some nice applications in theoretical computer science; see for instance [23,24] for further discussions of these applications.
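As a quick numerical illustration of this phenomenon (a sketch, not from the paper: the rank-one choice of $F$, the dimension, and the Gaussian choice of $\xi$ below are arbitrary), perturbing even a singular $F$ by an iid random matrix yields a moderate condition number:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# A fixed matrix that is badly conditioned on its own: rank 1, so kappa(F) is infinite.
F = np.outer(np.ones(n), np.ones(n))

# Random iid perturbation (Gaussian here; the theorem allows much more
# general distributions of zero mean and unit variance).
X = rng.standard_normal((n, n))
M = F + X

s = np.linalg.svd(M, compute_uv=False)
kappa = s[0] / s[-1]
print(f"sigma_1 = {s[0]:.2f}, sigma_n = {s[-1]:.4f}, kappa = {kappa:.1f}")
```

With high probability $\sigma_n(M)$ is only polynomially small, so $\kappa(M)$ is polynomially bounded, in contrast with $\kappa(F) = \infty$.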
Another popular model of random matrices is that of random symmetric matrices; this is one of the simplest models that has non-trivial correlations between the matrix entries. A significant new difficulty in the study of the singularity of X n (or of M n in general) is that the symmetry ensures that det(X n ) is a quadratic function of each row, as opposed to the regular random ensembles in which det(X n ) is a linear function of each row.
Answering an old question of Weiss, a recent result of Costello, Tao and Vu [3] shows that $X_n$ is non-singular with high probability.
Our result implies a polynomial bound for the condition number of $M_n$, assuming that the spectral norm of $X_n$ is polynomially bounded (which is the case for most matrix ensembles).

Corollary 1.7. With the same assumptions as in Theorem 1.5, assume furthermore that $\|X_n\| = n^{O(1)}$. Then for any $B > 0$, there exists $A > 0$ such that
$$P(\kappa(M_n) \ge n^{A}) \le n^{-B}.$$

As another application, we show that the determinant of a random symmetric matrix is concentrated around its mean with high probability.

Corollary 1.8. Assume that the upper diagonal entries $x_{ij}$ of $X_n$ are iid copies of a random variable $\xi$ of zero mean and unit variance, and that there is a constant $C > 0$ such that $P(|\xi| \le C) = 1$. Assume furthermore that the entries $f_{ij}$ of the symmetric matrix $F_n$ also satisfy $|f_{ij}| \le C$.

This corollary refines an important case of [31, Theorem 34] obtained by Tao and Vu by a different method, which in turn complements previously known results on the concentration of the determinant of non-symmetric random matrices (cf. [1,4,9,27]).
Notation. For a matrix $M$ we use the notations $r_i(M)$ and $c_j(M)$ to denote its $i$-th row vector and its $j$-th column vector respectively; we use the notation $(M)_{ij}$ to denote its $(i,j)$ entry.
Here and later, asymptotic notations such as $O$, $\Omega$, $\Theta$, and so forth, are used under the assumption that $n \to \infty$. A notation such as $O_C(.)$ emphasizes that the hidden constant in $O$ depends on $C$. If $a = \Omega(b)$, we write $b \ll a$ or $a \gg b$. If $a = \Omega(b)$ and $b = \Omega(a)$, we write $a \asymp b$.
2. The approach to prove Theorem 1.5

For the sake of simplicity, we will prove our result under the following condition. In fact, because $\xi$ has unit variance, we have $P(|x_{ij}| \ge n^{B+1}) = O(n^{-2B-2})$.
Thus, we can assume that |x ij | ≤ n B+1 at the cost of an additional negligible term o(n −B ) in probability.
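Spelling out this routine truncation step: since $\mathbf{E}|x_{ij}|^2 = 1$, Chebyshev's inequality gives
$$P\big(|x_{ij}| \ge n^{B+1}\big) \le \frac{\mathbf{E}|x_{ij}|^2}{n^{2B+2}} = n^{-2B-2},$$
and a union bound over the at most $n^2$ entries of $X_n$ yields
$$P\big(\exists\, i,j :\ |x_{ij}| \ge n^{B+1}\big) \le n^2 \cdot n^{-2B-2} = n^{-2B} = o(n^{-B}).$$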
We next assume that $\sigma_n(M_n) \le n^{-A}$. Thus $M_n x = y$ for some vectors $x, y$ with $\|x\| = 1$ and $\|y\| \le n^{-A}$. There are two cases to consider.

Case 1. $\det(M_n) = 0$. Note that this case is relevant only when $\xi$ has discrete distribution.
We first show that it is enough to consider the case of M n having rank n − 1, thanks to the following result.
We deduce Lemma 2.1 from a useful observation by Odlyzko, whose simple proof is presented in Appendix A.
Lemma 2.2 (Odlyzko's lemma, [19]). Let $H$ be a linear subspace in $\mathbb{R}^n$ of dimension at most $k \le n$. Then
$$P\big((f_1 + x_1, \dots, f_n + x_n) \in H\big) \le (1 - c_3)^{(n-k)/2},$$
where the $f_i$ are fixed and the $x_i$ are iid copies of $\xi$.
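In the Bernoulli case the bound takes the classical form $P(v \in H) \le 2^{-(n-k)}$, since a subspace of dimension $k$ contains at most $2^k$ sign vectors. The following brute-force check (an illustration with small, arbitrary parameters, not part of the paper) verifies this exactly for one subspace:

```python
import itertools
import numpy as np

def membership_probability(H_basis, n):
    """Exact P(v in span(H_basis)) for v uniform on {-1, +1}^n."""
    H = np.array(H_basis, dtype=float)
    base_rank = np.linalg.matrix_rank(H)
    hits = 0
    for v in itertools.product([-1, 1], repeat=n):
        # v lies in the span iff appending it does not increase the rank.
        if np.linalg.matrix_rank(np.vstack([H, v])) == base_rank:
            hits += 1
    return hits / 2 ** n

n, k = 10, 3
rng = np.random.default_rng(1)
H = rng.integers(-3, 4, size=(k, n))   # a subspace of dimension at most k
p = membership_probability(H, n)
print(p, 2.0 ** -(n - k))              # Odlyzko: p <= 2^{-(n-k)}
```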
Proof. (of Lemma 2.1) View $M_{n+1}$ as the matrix obtained by adding the first row and first column to $M_n$. Let $H$ be the vector space of dimension $k$ spanned by the row vectors of $M_n$. Then, by Lemma 2.2, the probability that the subvector formed by the last $n$ components of the first row of $M_{n+1}$ does not belong to $H$ is at least $1 - (1 - c_3)^{(n-k)/2}$; on this event the rank strictly increases. In general, the same argument applies for each $1 \le t \le n - k$. Because the rows (and columns) added to $M_{n+t-1}$ at each step (to create $M_{n+t}$) are independent, these estimates can be multiplied.

Next we show that, in the case of $M_n$ having rank $n-1$, it suffices to assume that $\mathrm{rank}(M_{n-1}) \ge n-2$, thanks to the following simple observation.
Lemma 2.3. Assume that $M_n$ has rank $n-1$. Then there exists $1 \le i \le n$ such that the removal of the $i$-th row and the $i$-th column of $M_n$ results in a symmetric matrix $M_{n-1}$ of rank at least $n-2$.
Proof. (of Lemma 2.3) Without loss of generality, assume that the last $n-1$ rows of $M_n$ span a subspace of dimension $n-1$. Then the matrix obtained from $M_n$ by removing the first row and the first column has rank at least $n-2$.
Without loss of generality, we assume that the matrix M n−1 obtained from M n by removing its first row and first column has rank at least n−2. We next express det(M n ) as a quadratic function of its first row (m 11 , . . . , m 1n ) as follows.
where $c_{11}(M_n)$ is the first cofactor of $M_n$, while the $c_{ij}(M_{n-1})$ are the corresponding cofactors of the matrix $M_{n-1}$.
It is crucial to note that, since $M_{n-1}$ has rank at least $n-2$, at least one of the cofactors $c_{ij}(M_{n-1})$ is non-zero. Roughly speaking, our approach consists of two main steps.
• Step 1. Assume that then there is a strong additive structure among the cofactors c ij (M n−1 ) of M n−1 .
• Step 2. The probability, with respect to M n−1 , that there is a strong additive structure among the c ij (M n−1 ) is negligible.
We will execute Step 1 by proving Theorem 2.5 below (as a special case).
Step 2 will be carried out by proving Theorem 2.6.
Let $C(M_n) = (c_{ij}(M_n))$ be the matrix of the cofactors of $M_n$. Since $M_n$ is symmetric, $C(M_n)$ is symmetric and $C(M_n) M_n = \det(M_n) I$; thus $\|C(M_n) y\| = \|C(M_n) M_n x\| = |\det(M_n)| \|x\| = |\det(M_n)|$.
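The identity used here is the classical cofactor (adjugate) relation $M\, C(M)^{T} = \det(M)\, I$; a small numerical sanity check (purely illustrative, with an arbitrary $4 \times 4$ symmetric matrix):

```python
import numpy as np

def cofactor_matrix(M):
    """C with C[i, j] = (-1)^(i+j) * det(M with row i and column j removed)."""
    n = M.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

rng = np.random.default_rng(2)
A = rng.integers(-2, 3, size=(4, 4)).astype(float)
M = A + A.T                      # a symmetric test matrix
C = cofactor_matrix(M)

# Adjugate identity: M C^T = det(M) I; for symmetric M the cofactor matrix is symmetric.
print(np.allclose(M @ C.T, np.linalg.det(M) * np.eye(4)))
```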
We infer that, without loss of generality, For j ≥ 2, we write where M n−1 is the matrix obtained from M n by removing its first row and first column, and c ij (M n−1 ) are the corresponding cofactors of M n−1 .
Hence, by the Cauchy-Schwarz inequality, by Condition 2, and by the bound $|f_{ij}| \le n^{\gamma}$ on the entries of $F_n$, we obtain the desired estimate; similarly for $j = 1$. It then follows from (1), (2) and (3) that, for proving Theorem 1.5, it suffices to justify the following result.
To prove Theorem 2.4, we again express det(M n ) as a quadratic form of its first row.
Roughly speaking, our approach in this case also consists of two main steps.
• Step 1. Assume that then there is a strong additive structure among the cofactors c ij .
• Step 2. The probability, with respect to M n−1 , that there is a strong additive structure among the c ij is negligible.
We now state our main supporting lemmas.
Theorem 2.5 (Step 1). Let $0 < \epsilon < 1$ be a given constant. Assume that $\sum_{i,j} a_{ij}^2 = 1$, where $a_{ij} = a_{ji}$, and
$$\sup_a P_{x_2, \dots, x_n}\Big(\Big|\sum_{2 \le i,j \le n} a_{ij} x_i x_j - a\Big| \le n^{-A}\Big) \ge n^{-B}$$
for some sufficiently large integer $A$, where the $x_i$ are iid copies of $\xi$. Then there exists a vector $u = (u_1, \dots, u_{n-1})$ satisfying the following properties.
• There exists a generalized arithmetic progression Q of rank O B,ǫ (1) and size n O B,ǫ (1) that contains at least n − 2n ǫ components u i .
• All the components u i , and all the generators of the generalized arithmetic progression are rational numbers of the form p/q, where |p|, |q| ≤ n A/2+O B,ǫ (1) .
We refer the reader to Section 3 for a definition of generalized arithmetic progression. Theorem 2.5, which is at the heart of the paper, follows from our study of the inverse Littlewood-Offord problem for quadratic forms (Sections 3-6).
We next proceed to the second step of the approach showing that the probability for M n−1 having the above properties is negligible.
Theorem 2.6 (Step 2). With respect to M n−1 , the probability that there exists a vector u as in Theorem 2.5 is exp(−Ω(n)).
The rest of the paper is organized as follows. We state and prove the inverse Littlewood-Offord result for quadratic forms throughout Sections 3-6. As an application, we prove Theorem 2.5 in Section 7 and conclude Theorem 2.6 in Section 8. We will prove Corollary 1.8 in Section 9.

3. The inverse Littlewood-Offord result for quadratic forms
A classical result of Erdős [6] and Littlewood-Offord [16] asserts that if the $a_i$ are real numbers of magnitude $|a_i| \ge 1$, then the probability that the random sum $\sum_{i=1}^n a_i x_i$ concentrates on an interval of length one is of order $O(n^{-1/2})$, where the $x_i$ are iid copies of a Bernoulli random variable. This remarkable inequality has generated an impressive wave of research, particularly from the early 1960s to the late 1980s. We refer the reader to [7,8,11,12,13,14,15,20,22,25,33] for further reading regarding these developments.
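For Bernoulli signs, the extremal case of the Erdős bound is $a_1 = \dots = a_n = 1$, where the concentration probability equals $\binom{n}{\lfloor n/2 \rfloor} 2^{-n} \sim (2/(\pi n))^{1/2}$. A short exact computation (illustrative sketch, not from the paper):

```python
from math import comb, pi, sqrt

def max_concentration(n):
    """Largest P(sum of n iid +-1 signs equals a fixed value)."""
    # The sum takes value n - 2k with probability C(n, k) / 2^n.
    return max(comb(n, k) for k in range(n + 1)) / 2 ** n

n = 1000
p = max_concentration(n)       # attained for a_1 = ... = a_n = 1
print(p, sqrt(2 / (pi * n)))   # matches the asymptotic (2/(pi n))^{1/2}
```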
Motivated by inverse theorems from additive combinatorics (see [33,Chapter 5]), Tao and Vu brought a new view to the problem: find the underlying reason as to why the concentration probability of n i=1 a i x i on a short interval is large.
Typical examples of a i that have large concentration probability are generalized arithmetic progressions (GAPs).
A set $Q$ is a GAP of rank $r$ if it can be expressed in the form
$$Q = \{g_0 + k_1 g_1 + \dots + k_r g_r :\ K_i' \le k_i \le K_i \text{ for all } 1 \le i \le r\}.$$
It is convenient to think of $Q$ as the image of an integer box $B := \{(k_1, \dots, k_r) : K_i' \le k_i \le K_i\}$ under the map $\Phi(k_1, \dots, k_r) := g_0 + k_1 g_1 + \dots + k_r g_r$. The numbers $g_i$ are the generators of $Q$, the numbers $K_i'$ and $K_i$ are the dimensions of $Q$, and $\mathrm{Vol}(Q) := |B|$ is the volume of $B$. We say that $Q$ is proper if this map is one to one, or equivalently if $|Q| = \mathrm{Vol}(Q)$. For non-proper GAPs, we of course have $|Q| < \mathrm{Vol}(Q)$. If $K_i' = -K_i$ for all $i \ge 1$ and $g_0 = 0$, we say that $Q$ is symmetric.
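To make the definition concrete, here is a toy construction (with hypothetical parameters, purely illustrative) that checks properness by comparing $|Q|$ with $\mathrm{Vol}(Q)$:

```python
import itertools

def gap(g0, generators, dims):
    """The GAP {g0 + sum k_i g_i : K'_i <= k_i <= K_i} and its volume."""
    ranges = [range(Kp, K + 1) for (Kp, K) in dims]
    elements = {g0 + sum(k * g for k, g in zip(ks, generators))
                for ks in itertools.product(*ranges)}
    volume = 1
    for (Kp, K) in dims:
        volume *= K - Kp + 1
    return elements, volume

# Rank-2 symmetric GAP with generators 1 and 10 and |k_i| <= 3: proper,
# because distinct (k_1, k_2) give distinct values k_1 + 10 k_2.
Q, vol = gap(0, [1, 10], [(-3, 3), (-3, 3)])
print(len(Q), vol)    # |Q| = Vol(Q) = 49

# Generators 1 and 2 collide heavily, so this GAP is not proper: |Q| < Vol(Q).
Q2, vol2 = gap(0, [1, 2], [(-3, 3), (-3, 3)])
print(len(Q2), vol2)  # 19 distinct values versus volume 49
```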
A closer look at the definition of GAPs reveals that if the $a_i$ are very near to the elements of a GAP of rank $O(1)$ and size $n^{O(1)}$, then the probability that $\sum_{i=1}^n a_i x_i$ concentrates on a short interval is of order $n^{-O(1)}$, where the $x_i$ are iid copies of a Bernoulli random variable.
It was shown by Tao and Vu [26,30,32], in an implicit way, that these are essentially the only examples that have high concentration probability. An explicit and optimal version has been given in a recent paper by the current author and Vu.
We say that a is δ-close to a set Q if there exists q ∈ Q such that |a − q| ≤ δ.
Theorem 3.1 (Inverse Littlewood-Offord theorem for linear forms, [18]). Let $0 < \epsilon < 1$ and $B > 0$. Let $\beta > 0$ be a parameter that may depend on $n$. Suppose that not all $a_i$ are zero, and
$$\rho := \sup_a P\Big(\Big|\sum_{i=1}^n a_i x_i - a\Big| \le \beta\Big) \ge n^{-B},$$
where $x = (x_1, \dots, x_n)$, and the $x_i$ are iid copies of a random variable $\xi$ satisfying Condition 1. Then the following holds. For any number $n'$ between $n^{\epsilon}$ and $n$, there exists a proper symmetric GAP $Q$ such that:
• All but at most $n'$ of the $a_i$ are $\beta$-close to $Q$.
• $Q$ has small rank, $r = O_{B,\epsilon}(1)$, and small cardinality $|Q| \le \max\big(O_{B,\epsilon}(\rho^{-1}/\sqrt{n'}), 1\big)$.
• There is a non-zero integer $p = O_{B,\epsilon}(\sqrt{n'})$ such that all steps $g_i$ of $Q$ have the form $g_i = \beta p_i/p$ with $p_i \in \mathbb{Z}$.

In this and all subsequent theorems, the hidden constants could also depend on $c_1, c_2, c_3$ of Condition 1. We could have written $O_{c_1,c_2,c_3}(.)$ everywhere, but these notations are somewhat cumbersome, and this dependence is not our focus, so we omit them.
An immediate corollary of Theorem 3.1 is the $\beta$-net theorem of Tao and Vu [30], which plays a crucial role in their resolution of the circular law conjecture in random matrix theory.
To prove Theorem 2.5, we need an inverse-type result for the high concentration probability of the quadratic form $\sum_{i,j} a_{ij} x_i x_j$. A classical approach is to pass the problem to the corresponding bilinear form.
Theorem 3.2 (Inverse Littlewood-Offord theorem for bilinear forms). Let $0 < \epsilon < 1$ and $B > 0$. Let $\beta > 0$ be a parameter that may depend on $n$. Suppose that $\sum_{i,j=1}^n a_{ij}^2 = 1$ and
$$\sup_a P\Big(\Big|\sum_{i,j} a_{ij} x_i (y_j + f_j) - a\Big| \le \beta\Big) \ge n^{-B},$$
where $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, the $x_i$ and $y_i$ are iid copies of a random variable $\xi$ satisfying Condition 1, and the $f_i$ are fixed. Then there exist a set $I_0$ of size $O_{B,\epsilon}(1)$, a set $I$ of size at least $n - 2n^{\epsilon}$, and integers $k \ne 0$ and $k_{ii_0}$, $i_0 \in I_0$, $i \in I$, all bounded by $n^{O_{B,\epsilon}(1)}$, such that the following holds. Let $R$ be the matrix defined by $(R)_{ii} = k$ and $(R)_{ii_0} = k_{ii_0}$ for each $i \in I$; the other entries $k_{ij}$ of $R$ are zero, except the diagonal terms $k_{ii}$, $i \notin I$, which are defined to be one. Then the conclusion below holds for any $i$-th row, $i \in I$, of the matrix $A' = RA$, where $f = (f_1, \dots, f_n)$.
One may apply Theorem 3.1 to (5) to conclude that most of the components of $r_i(A')$ are very close to a GAP of rank $O(1)$ and size $n^{O(1)}$. However, we prefer to keep (5) as it is, since it is more convenient to use.

Roughly speaking, Theorem 3.2 states that if $\sup_a P(|\sum_{i,j} a_{ij} x_i y_j - a| \le \beta)$ is large, then there exist a few rows $r_{i_1}(A), \dots, r_{i_r}(A)$ of $A$ such that most of the remaining rows of $A$ can be written in the form $k_1 r_{i_1}(A) + \dots + k_r r_{i_r}(A) + a$, where $|k_i| = n^{O(1)}$, and most of the components of $a$ are very close to a GAP of rank $O(1)$ and size $n^{O(1)}$.
We now state our main inverse-type result for high concentration probability of quadratic forms.
Theorem 3.3 (Inverse Littlewood-Offord theorem for quadratic forms). Let $0 < \epsilon < 1$ and $B > 0$. Let $\beta > 0$ be a parameter that may depend on $n$. Suppose that $\sum_{i,j=1}^n a_{ij}^2 = 1$, where $a_{ij} = a_{ji}$, and
$$\sup_a P\Big(\Big|\sum_{i,j} a_{ij} x_i x_j - a\Big| \le \beta\Big) \ge n^{-B}.$$
Then there exists a matrix $R$ as in Theorem 3.2 such that the conclusion of Theorem 3.2 holds for every row $r_i(RA)$, $i \in I$.

Remark 3.4. All the results presented in this section are in the continuous setting. Similar results in the discrete setting were obtained in [17]. The reader may also consult that paper for more examples and motivations.

4. A rank reduction argument and the full rank assumption
This section, which is self-contained, provides a technical lemma that we will need in later sections. Informally, it says that if we can find a proper symmetric GAP that contains a given set, then we can assume this containment is non-degenerate.
We consider P together with the map Φ : P → R r which maps k 1 g 1 + · · · + k r g r to (k 1 , . . . , k r ). Because P is proper, this map is bijective.
We know that $P$ contains $U$, but we do not know yet that $U$ is non-degenerate in $P$, in the sense that the set $\Phi(U)$ has full rank in $\mathbb{R}^r$. In the latter case, we say that $U$ spans $P$.
Theorem 4.1. Assume that $U$ is a subset of a proper symmetric GAP $P$ of rank $r$. Then there exists a proper symmetric GAP $Q$ that contains $U$ such that the following holds.
• $U$ spans $Q$; that is, $\Phi(U)$ has full rank in $\mathbb{R}^{\mathrm{rank}(Q)}$.
To prove Theorem 4.1, we will rely on the following lemma.

Lemma 4.2 (Progressions lie inside proper progressions).
There is an absolute constant $C$ such that the following holds. Let $P$ be a GAP of rank $r$ in $\mathbb{R}$. Then there is a symmetric proper GAP $Q$ of rank at most $r$ containing $P$ with $|Q| \le r^{Cr^3} |P|$.
Suppose that Φ(U ) does not have full rank, then it is contained in a hyperplane of R r . In other words, there exist integers α 1 , . . . , α r whose common divisor is one and α 1 k 1 + · · · + α r k r = 0 for all (k 1 , . . . , k r ) ∈ Φ(U ).
Without loss of generality, we assume that $\alpha_r \ne 0$. We select $w$ so that $g_r = \alpha_r w$, and let $P'$ be the GAP generated by $g_1 - \alpha_1 w, \dots, g_{r-1} - \alpha_{r-1} w$ with the same dimensions. The new symmetric GAP $P'$ will continue to contain $U$, because for any $(k_1, \dots, k_r) \in \Phi(U)$ we have $\sum_{i=1}^r k_i g_i = \sum_{i=1}^{r-1} k_i (g_i - \alpha_i w)$, using the relation $\alpha_1 k_1 + \dots + \alpha_r k_r = 0$. Also, note that the volume of $P'$ is $2^{r-1} K_1 \dots K_{r-1}$, which is less than the volume of $P$.
We next use Lemma 4.2 to guarantee that P ′ is symmetric and proper without increasing the rank.
Iterate the process if needed. Because the rank of the newly obtained proper symmetric GAP decreases strictly after each step, the process must terminate after at most r steps.

5. Proof of Theorem 3.2
For minor technical reasons, it is convenient to assume ξ to be a discrete random variable. The continuous case of the theorem can be recovered from the discrete one by a standard argument (approximating the continuous distribution by a discrete one while holding n fixed).
For short, we denote the vector (a i1 , . . . , a in ) by a i . We begin by applying Theorem 3.1.
Lemma 5.1. Let $\epsilon < 1$ and $B$ be positive constants. Assume that
$$\rho := \sup_a P_{x,y}\Big(\Big|\sum_{i,j} a_{ij} x_i (y_j + f_j) - a\Big| \le \beta\Big) \ge n^{-B}.$$
Then the following holds with probability at least $3\rho/4$ with respect to $y = (y_1, \dots, y_n)$.
There exist a proper symmetric GAP $Q_y$ of rank $O_{B,\epsilon}(1)$ and size $\max(O_{B,\epsilon}(\rho^{-1}/n^{\epsilon/2}), 1)$ and a set $I_y$ of $n - n^{\epsilon}$ indices such that $\langle a_i, y \rangle$ is $\beta$-close to $Q_y$ for each $i \in I_y$.
Proof. (of Lemma 5.1) For short we write $\rho_y := \sup_a P_x\big(\big|\sum_{i,j} a_{ij} x_i (y_j + f_j) - a\big| \le \beta\big)$. We say that a vector $y = (y_1, \dots, y_n)$ is good if $\rho_y \ge \rho/4$. We call $y$ bad otherwise.
Let $G$ denote the collection of good vectors. We are going to estimate the probability of a randomly chosen vector $y = (y_1, \dots, y_n)$ being bad by an averaging method.
Thus, the probability of a randomly chosen $y$ belonging to $G$ is at least $3\rho/4$. Consider a good vector $y \in G$. By definition, we have $\rho_y \ge \rho/4$. Next, if $\langle a_i, (y + f) \rangle = 0$ for all $i$, then the conclusion of the theorem holds trivially. Otherwise, we apply Theorem 3.1 to the sequence $\langle a_i, (y + f) \rangle$, $i = 1, \dots, n$, obtaining a proper symmetric GAP $Q_y$ of rank $O_{B,\epsilon}(1)$ and size $\max(O_{B,\epsilon}(\rho^{-1}/n^{\epsilon/2}), 1)$, together with its elements $q_i(y)$ such that $|\langle a_i, y \rangle - q_i(y)| \le \beta$ for all $i$ in a set $I_y$ of size at least $n - n^{\epsilon}$.
We now work with the q i (y). By Theorem 4.1, we may assume that the q i (y) span Q y .
Next, for each $y \in G$, we choose from $I_y$ indices $i_{y_1}, \dots, i_{y_s}$ such that the $q_{i_{y_j}}(y)$ span $Q_y$, where $s$ is the rank of $Q_y$. We note that $s = O_{B,\epsilon}(1)$ for all $y$.
Consider the tuples $(i_{y_1}, \dots, i_{y_s})$ for all $y \in G$. Because there are $O_{B,\epsilon}(n^s) = n^{O_{B,\epsilon}(1)}$ possibilities these tuples can take, there exist a tuple, say $(1, \dots, r)$ with $r = s$ (by rearranging the rows of $A$ if needed), and a subset $G'$ of $G$ satisfying $(i_{y_1}, \dots, i_{y_s}) = (1, \dots, r)$ for all $y \in G'$. For each $1 \le i \le r$, we express $q_i(y)$ in terms of the generators of $Q_y$ for each $y \in G'$: $q_i(y) = c_{i1}(y) g_1(y) + \dots + c_{ir}(y) g_r(y)$, where $c_{i1}(y), \dots, c_{ir}(y)$ are integers bounded by $n^{O_{B,\epsilon}(1)}$, and the $g_i(y)$ are the generators of $Q_y$.
We will show that there are many y that correspond to the same coefficients c ij .
Next, because |I y | ≥ n − n ǫ for each y ∈ G ′′ , by an averaging argument, there is a set I of size at least n − 2n ǫ such that for each i ∈ I we have P y (i ∈ I y , y ∈ G ′′ ) ≥ P y (y ∈ G ′′ )/2.
Fix an arbitrary row a of index from I. We concentrate on those y ∈ G ′′ where the index of a belongs to I y .
Because $q(y) \in Q_y$ (where $q(y)$ is the element of $Q_y$ that is $\beta$-close to $\langle a, y \rangle$), we can write $q(y) = c_1(y) g_1(y) + \dots + c_r(y) g_r(y)$, where the $c_i(y)$ are integers bounded by $n^{O_{B,\epsilon}(1)}$.
For short, for each $i$ we denote by $v_i$ the vector $(c_{i1}, \dots, c_{ir})$; we will also denote by $v_{a,y}$ the vector $(c_1(y), \dots, c_r(y))$.
Next, because each coefficient of the identity above is bounded by $n^{O_{B,\epsilon}(1)}$, there exists a subset $G''_a$ of $G''$ such that all $y \in G''_a$ correspond to the same identity. In other words, there exist integers $k_1, \dots, k_r$ depending on $a$, all bounded by $n^{O_{B,\epsilon}(1)}$, such that $k q(y) + k_1 q_1(y) + \dots + k_r q_r(y) = 0$ for all $y \in G''_a$.
Note that $k$ is independent of $a$ and $y$. It is crucial to note that each $q_i(y)$ is $\beta$-close to $\langle a_i, (y + f) \rangle$. It follows from (11) that for all $y \in G''_a$,
$$\big|k \langle a, (y+f) \rangle + k_1 \langle a_1, (y+f) \rangle + \dots + k_r \langle a_r, (y+f) \rangle\big| \le n^{O_{B,\epsilon}(1)} \beta.$$
Next we introduce the following matrix.
The other entries of R are zero, except the diagonal terms (R) ii , where i / ∈ I, which are set to be one.
Let $A' := RA$. It follows from (12) that the claimed bound holds for any $i \in I$, and we obtain the conclusion of the theorem in the case where $\xi$ has a discrete distribution.
To recover the continuous case, we approximate ξ by a discrete distribution while holding β, n, f ′ i and a ij fixed. By taking the limit, we again obtain (15) for some R.

6. Proof of Theorem 3.3
In this section we will use the results from Section 5 to prove Theorem 3.3.
Let $U$ be a random subset of $\{1, \dots, n\}$, where $P(i \in U) = 1/2$ independently for each $i$. Let $A_U$ be the $n$ by $n$ matrix defined by $(A_U)_{ij} = a_{ij}$ if exactly one of $i, j$ belongs to $U$, and $(A_U)_{ij} = 0$ otherwise. We first apply the following lemma.
Lemma 6.1 (Concentration for bilinear forms controls concentration for quadratic forms).
We refer the reader to Appendix B for a proof of this lemma.
By the definition of ξ, it is clear that the random variable ξ − ξ ′ also satisfies Condition 1 (with different positive parameters). We next apply Theorem 3.2 to (16) to obtain the following lemma.
Note that Lemma 6.2 holds for all U ⊂ [n]. We will establish a similar result for the original matrix A.
Next, let $I$ be the collection of indices $i$ which belong to at least half of the $2^n$ index sets $I_U$. Fix an $i \in I$. Consider the tuples $(k_{ii_0}(U), i_0 \in I_0)$ over those $U$ with $i \in I_U$. Because there are only $n^{O_{B,\epsilon}(1)}$ possibilities such tuples can take, there must be a tuple, say $(k_{ii_0}, i_0 \in I_0)$, such that $(k_{ii_0}(U), i_0 \in I_0) = (k_{ii_0}, i_0 \in I_0)$ for at least $2^n/n^{O_{B,\epsilon}(1)}$ sets $U$.
Because |I 0 | = O B,ǫ (1), it is easy to see that there is a way to partition I 0 into I ′ 0 ∪ I ′′ 0 such that there are 2 n /n O B,ǫ (1) sets U above satisfying that I ′′ 0 ⊂ U and U ∩ I ′ 0 = ∅. Let U I ′ 0 ,I ′′ 0 denote the collection of these U .
By passing to a subset of $\mathcal{U}_{I_0', I_0''}$ if needed, we may assume that either $i \notin U$ for all $U \in \mathcal{U}_{I_0', I_0''}$, or $i \in U$ for all such $U$. Without loss of generality, we assume the latter, that $i \in U$. (The other case can be treated similarly.)
By the definition of $A_U$, and because $I_0' \cap U = \emptyset$ and $I_0'' \subset U$, for $i_0' \in I_0'$ and $i_0'' \in I_0''$ we can write the quadratic form accordingly. Next, by Lemma 6.2, the corresponding bound holds for each $U \in \mathcal{U}_{I_0', I_0''}$. By applying the Cauchy-Schwarz inequality, we obtain the desired estimate with $z_j := (u_j - u_j') y_j$. Note that the $u_j - u_j'$ are iid copies of $\eta^{(1/2)}$. Hence the $z_j$ are iid copies of $\eta^{(1/2)} (\xi - \xi')$, where $\eta^{(1/2)}$ is independent of $\xi$ and $\xi'$.
Clearly, we may assume that I ∩ I 0 = ∅ by throwing away those i ∈ I that belong to I 0 . We introduce the following matrix R.
for each $i \in I$; the other entries of $R$ are zero, except the diagonal terms $(R)_{ii}$, $i \notin I$, which are defined to be one.
It is clear that the matrix A ′ = RA satisfies the conclusion of Theorem 3.3, completing the proof.

7. Proof of Theorem 2.5
We first apply Theorem 3.3 to the $a_{ij}$ to obtain a matrix $A'$ and a set $I$ with $|I| \ge n - 2n^{\epsilon}$ such that the conclusion of Theorem 3.3 holds for any $i \in I$. Ideally, our next step is to apply Theorem 3.1 to the rows $r_i(A')$. However, this application is meaningful only when $\|r_i(A')\|$ is relatively large. Investigating the degenerate case is our next goal. We consider two cases.
Next, because $\sum_j \|c_j(A)\|^2 = 1$, there exists an index $j_0$ such that $\|c_{j_0}(A)\| \ge n^{-1/2}$. Consider this column vector.
It follows from (19) that, for any $i \in I$, the components $c_{j_0}(i)$ of $c_{j_0}(A)$ belong to a GAP generated by the $c_{j_0}(j)/k$, $j \in I_0$, up to an error $K$. This suggests the following approximation.
For each $j \notin I$, we approximate $c_{j_0}(j)$ by a number $v_j$ of the form $(1/\lfloor 2K^{-1} \rfloor) \cdot \mathbb{Z}$ such that $|v_j - c_{j_0}(j)| \le K$, and define $v$ accordingly. Furthermore, by Condition 2, and because $\langle c_{j_0}(A), r_i(M_{n-1}) \rangle = 0$ for $i \ne j_0$, we infer a corresponding bound for $v$. Note that $\|v\| \gg n^{-1/2}$. Set $u := \lfloor 1/\|v\| \rfloor \cdot v$; we then obtain the following.
• There exists a GAP of rank $O_{B,\epsilon}(1)$ and size $n^{O_{B,\epsilon}(1)}$ that contains at least $n - 2n^{\epsilon}$ components $u_i$.
• All the components u i , and all the generators of the GAP are rational numbers of the form p/q, where |p|, |q| ≤ n A/2+O B,ǫ (1) .
Also, it follows from (18) that the $z_i$ satisfy Condition 1; hence Theorem 3.1 applied to (20) implies that $v$ can be approximated by a vector $u$ as follows.
• There exists a GAP of rank O B,ǫ (1) and size n O B,ǫ (1) that contains at least n − n ǫ components u i .
• All the components u i , and all the generators of the GAP are rational numbers of the form p/q, where |p|, |q| ≤ n A/2+O B,ǫ (1) .
Note that, by the approximation above, we have $\|u\| \asymp 1$ and $|\langle u, r_i(M_{n-1}) \rangle| \le n^{-A/2 + O_{B,\epsilon}(1)}$ for at least $n - O_{B,\epsilon}(1)$ row vectors of $M_{n-1}$.

8. Proof of Theorem 2.6
We first bound the number $N$ of vectors $u$ satisfying the conclusion of Theorem 2.5.
Because each GAP is determined by its generators and dimensions, the number of such $Q$ is bounded by $n^{O_{A,B,\epsilon}(1)}$. Next, for a given $Q$ of rank $O_{B,\epsilon}(1)$ and size $n^{O_{B,\epsilon}(1)}$, there are at most $n^{n - 2n^{\epsilon}} |Q|^{n - 2n^{\epsilon}} = n^{O_{B,\epsilon}(n)}$ ways to choose the $n - 2n^{\epsilon}$ components $u_i$ that $Q$ contains.
Hence, we obtain the key bound $N \le n^{O_{A,B,\epsilon}(n)}$. Therefore, for our task of proving Theorem 2.6, it would be ideal if we could show that the probability $P_{\beta_0}(u)$ that $|\langle u, r_i(M_{n-1}) \rangle| \le \beta_0$ for $n - O_{B,\epsilon}(1)$ rows of $M_{n-1}$ is smaller than $\exp(-\Omega(n))/N$ for each $u$, where $\beta_0 := n^{-A/2 + O_{B,\epsilon}(1)}$ is the bound from the conclusion of Theorem 2.5.
Roughly speaking, our strategy is to classify the vectors $u$ into two classes: one consists of those $u$ with very small $P_{\beta_0}(u)$, whose contribution is thus negligible; the other consists of those $u$ with relatively large $P_{\beta_0}(u)$. To deal with the $u$ of the second type, we will not control $P_{\beta_0}(u)$ directly but pass to a class of new vectors $u'$ that are also almost orthogonal to many rows of $M_{n-1}$, while the probabilities $P_{\beta_0}(u')$ are relatively smaller than $P_{\beta_0}(u)$. More details follow.
8.1. Technical reductions and key observations. By paying a factor of $n^{O_{B,\epsilon}(1)}$ in probability, we may assume that $|\langle u, r_i(M_{n-1}) \rangle| \le \beta_0$ for the first $n - O_{B,\epsilon}(1)$ rows of $M_{n-1}$ (because $M_{n-1}$ is symmetric, this is the case of most correlations among the entries). Also, by paying another factor of $n^{n^{\epsilon}}$ in probability, we may assume that the first $n_0$ components $u_i$ of $u$ belong to a GAP $Q$, and that $u_{n_0} \ge 1/(2\sqrt{n-1})$, where $n_0 := n - 2n^{\epsilon}$. We refer to the remaining $u_i$ as exceptional components. Note that these extra factors do not affect our final bound $\exp(-\Omega(n))$.
A crucial observation is that, by exposing the rows of $M_{n-1}$ one by one, and owing to the symmetry, the probability $P_{\beta}(u)$ that $|\langle u, r_i(M_{n-1}) \rangle| \le \beta$ for all $i \le n - O_{B,\epsilon}(1)$ can be bounded by a product of row-wise concentration probabilities. Also, because of Condition 1 and $u_{n_0} \ge 1/(2\sqrt{n-1})$, a uniform bound on each factor holds for any $\beta < c_1/(2\sqrt{n-1})$. Next, let $C$ be a sufficiently large constant depending on $B$ and $\epsilon$. We classify the vectors $u$ into two classes $\mathcal{B}$ and $\mathcal{B}'$, depending on whether $P_{\beta_0}(u) \ge n^{-Cn}$ or not.
Because of (30), and because $C$ is large enough, the total contribution of $u \in \mathcal{B}'$ is at most $N n^{-Cn} = \exp(-\Omega(n))$. For the rest of the section, we focus on $u \in \mathcal{B}$. Depending on the distribution of the sequences $(\rho^{(i)}_{\beta_j}(u))$, we consider two cases.

8.2. Approximation for degenerate vectors.
Let $\mathcal{B}_1$ be the collection of $u \in \mathcal{B}$ satisfying the following property: any $n' = n^{1-\epsilon}$ components $u_{i_1}, \dots, u_{i_{n'}}$ among $u_1, \dots, u_{n_0}$ have large concentration probability. For concision we set $\beta = n^{-B-4}$. It follows from Theorem 3.1 that, among any $u_{i_1}, \dots, u_{i_{n'}}$, there are at least $n'/2 + 1$ components that belong to an interval of length $2\beta$ (because our GAP now has only one element). A simple argument then implies that there is an interval of length $2\beta$ that contains all but at most $n' - 1$ of the components $u_i$. (To prove this, arrange the components in increasing order; then all but perhaps the first $n'/2$ and the last $n'/2$ components belong to an interval of length $2\beta$.)
Thus there exists a vector $u'$ with components in $(2\beta) \cdot \mathbb{Z}$ satisfying the following conditions.
Because of the approximation, whenever $|\langle u, r_i(M_{n-1}) \rangle| \le \beta_0$ we have $|\langle u', r_i(M_{n-1}) \rangle| \le \beta'$. It is clear, from the bounds on $\beta$ and $\beta_0$, that $\beta' \le c_2/(2\sqrt{n-1})$, and thus (23) applies. Now we bound the number of $u'$ obtained from the approximation. First, there are $O(n^{n - n_0 + n'}) = O(n^{2n^{1-\epsilon}})$ ways to choose those $u_i'$ that take the same common value, and there are just $O(\beta^{-1})$ ways to choose that value. The remaining components belong to the set $(2\beta) \cdot \mathbb{Z}$, and thus there are at most $O((\beta^{-1})^{n - n_0 + n'}) = n^{O_{A,B,\epsilon}(n^{1-\epsilon})}$ ways to choose them.
Hence we obtain the total bound

Recall from (22) that
Roughly speaking, the reason we truncated the product here is that whenever $i \le n_0 - n^{1-\epsilon}$ and $\beta_k$ is small enough, the terms $\rho^{(i)}_{\beta_k}(u)$ are smaller than $(n')^{-1/2 + o(1)}$, owing to (26). This fact will allow us to gain some significant factors when applying Theorem 3.1.
Note that $\pi_{\beta_k}(u)$ increases with $k$, and recall that $\pi_0(u) \ge n^{-Cn}$. Thus, by the pigeonhole principle, there exists $k_0 := k_0(u) \le C\epsilon^{-1}$ such that the quantities $\pi_{\beta_{k_0}}(u)$ and $\pi_{\beta_{k_0+1}}(u)$ are comparable. It is crucial to note that $A$ was chosen to be sufficiently large compared to $O_{B,\epsilon}(1)$ and $C$. Having mentioned the upper bound on $\rho^{(i)}_{\beta_{k_0}}(u)$, we now turn to its lower bound: because of Condition 2, and because $|u_i| \le 1$, a trivial lower bound holds for any $\beta \ge \beta_0$ and $i \le n_0 - n'$. With all the necessary settings above, we now classify $u$ based on the distributions of the $\rho^{(i)}_{\beta_{k_0}}(u)$. For each $0 \le k_0 \le C\epsilon^{-1}$ and each tuple $(m_0, \dots, m_K)$ satisfying $m_0 + \dots + m_K = n_0 - n^{1-\epsilon}$, we let $\mathcal{B}^{(m_0, \dots, m_K)}_{k_0}$ denote the collection of those $u$ from $\mathcal{B}_2$ that satisfy the following conditions.
• There are exactly $m_k$ terms of the sequence $(\rho^{(i)}_{\beta_{k_0}}(u))$ belonging to the interval $I_k$.
• For the remaining components $u_i$, we simply approximate them by the closest point in $\beta_{k_0} \cdot \mathbb{Z}$.
We have thus provided an approximation of $u$ by $u'$ satisfying the following properties.
(3) All the $u_i'$, including the generators of $Q_k$, belong to the set $\beta_{k_0} \cdot \{p/q : |p|, |q| \le n^{A/2 + O_{B,\epsilon}(1)}\}$.
Hence, in order to justify Theorem 2.6 in the case $u \in \mathcal{B}_2$, it suffices to establish the bound (28), where in the last inequality we used (27).
We recall the definition of $\mathcal{B}^{(m_1, \dots, m_K)}_{k_0}$. In the next step we bound the size of $\mathcal{B}'^{(m_1, \dots, m_K)}_{k_0}$. After locating the $Q_k$, the number $N_1$ of ways to choose the $u_i'$ from each $Q_k$ is bounded using $|Q_k| = O(\rho_k^{-1}/n^{1/2 - \epsilon})$.
The remaining components $u_i'$ can take any value from the set $\beta_{k_0} \cdot \{p/q : |p|, |q| \le n^{A/2 + O_{B,\epsilon}(1)}\}$, so the number $N_2$ of ways to choose them is bounded accordingly. Putting the bounds for $N_1$ and $N_2$ together, we obtain a bound $N$ for $|\mathcal{B}'^{(m_1, \dots, m_K)}_{k_0}|$. It follows from (29) and (30) that the contribution of each class is $\exp(-\Omega(n))$. Summing over the choices of $k_0$ and $(m_1, \dots, m_K)$, we obtain the desired bound, completing the proof of Theorem 2.6.
9. Proof of Corollary 1.8

Assume that the upper diagonal entries of $M_n$ satisfy the conditions of Corollary 1.8. We denote by $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ the real eigenvalues of $M_n$.
Our first ingredient is the following special form of the spectral concentration result of Guionnet and Zeitouni.
Following [4] and [9], we will apply the above theorem to the cut-off functions $f_{\epsilon}^+(x) := \log(\max(\epsilon, x))$ and $f_{\epsilon}^-(x) := \log(\max(\epsilon, -x))$, for some $\epsilon > 0$ to be determined. The main reason we have to truncate the log function is that it is not Lipschitz. Note that $f_{\epsilon}^+$ and $f_{\epsilon}^-$ both have Lipschitz constant $\epsilon^{-1}$. Although they are not convex, it is easy to write them as differences of convex functions of Lipschitz constant $O(\epsilon^{-1})$, and so Lemma 9.1 applies. Thus the following holds for $\delta \gg (\epsilon n)^{-1}$. Roughly speaking, (32) implies that $\prod_{\lambda_i \in S_{\epsilon}^- \cup S_{\epsilon}^+} |\lambda_i|$ is well concentrated around its mean. It thus remains to control the factor $R := \prod_{|\lambda_i| \le \epsilon} |\lambda_i|$. We will bound $R$ away from zero, relying on Theorem 1.5 and Lemma 9.2 below.
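The Lipschitz claim for the cut-off is easy to check numerically (an illustration with an arbitrary $\epsilon$, not part of the paper): the steepest slope of $f_{\epsilon}^+$ is $1/\epsilon$, attained just above the truncation point $x = \epsilon$.

```python
import numpy as np

eps = 0.1
f_plus = lambda x: np.log(np.maximum(eps, x))  # cut-off version of log(x)

# Finite-difference slopes over a fine grid; the function is constant
# below eps and behaves like log(x) (slope 1/x <= 1/eps) above it.
x = np.linspace(-1.0, 2.0, 200001)
slopes = np.abs(np.diff(f_plus(x)) / np.diff(x))
print(slopes.max(), 1 / eps)  # the maximum slope stays below 1/eps
```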
Our next goal is the following result.
Proposition 9.3. With probability $1 - n^{-\omega(1)}$, the required lower bound on $R$ holds. Let us complete the proof of Corollary 1.8 assuming Proposition 9.3.
It remains to prove Proposition 9.3.
Proof. (of Proposition 9.3) Set $\delta := \log n/(\epsilon n)$, which satisfies the condition of Lemma 9.1 once $n$ is sufficiently large. We have $E(U) = 0$; thus, by Jensen's inequality and by (32), the corresponding moment bound follows. It then follows from (37), together with (36), that with probability $1 - n^{-\omega(1)}$ the claimed estimate holds.

Appendix A. Proof of Lemma 2.2

Assume that $v_1, \dots, v_k \in \mathbb{R}^n$ are linearly independent vectors that span $H$. Also, without loss of generality, we assume that the subvectors $(v_{11}, \dots, v_{1k}), \dots, (v_{k1}, \dots, v_{kk})$ span a full space of dimension $k$.
If $u \in H$, then there exist $\alpha_1, \dots, \alpha_k$ such that $u = \alpha_1 v_1 + \dots + \alpha_k v_k$. Note that $\alpha_1, \dots, \alpha_k$ are uniquely determined once the first $k$ components of $u$ are exposed. Thus we have
$$P\big((f_1 + x_1, \dots, f_n + x_n) \in H\big) \le \prod_{i = k+1}^{n} \sup_a P(f_i + x_i = a) \le (\sqrt{1 - c_3})^{\,n-k},$$
where in the last estimate we use the fact (which follows from Condition 1) that $\sup_a P(\xi = a) \le \sqrt{1 - c_3}$.
We next write the relevant Fourier-analytic bound. Note that $\exp(-\pi x^2) = \int_{\mathbb{R}} e(xt) \exp(-\pi t^2)\, dt$. Consider $x$ as $(x_U, x_{\bar{U}})$, where $x_U, x_{\bar{U}}$ are the subvectors corresponding to $i \in U$ and $i \notin U$ respectively. By the Cauchy-Schwarz inequality we obtain an estimate in which $y_U = x_U - x_U'$ and $z_{\bar{U}} = x_{\bar{U}} - x'_{\bar{U}}$, whose entries are iid copies of $\xi - \xi'$.

Thus we have
Because $a'_{ij} = a'_{ji}$, we can write the last term as
$$\int_{\mathbb{R}} E_{y_U, z'_{\bar{U}}, y'_U, z_{\bar{U}}}\, e\Big(\Big(\sum_{i \in U,\, j \in \bar{U}} a'_{ij} y_i z_j + \sum_{j \in \bar{U},\, i \in U} a'_{ji} (-z'_j) y'_i\Big) t\Big) \exp(-\pi t^2)\, dt,$$
where $v := (y_U, -z'_{\bar{U}})$ and $w := (y'_U, z_{\bar{U}})$.