Low rank perturbations of large elliptic random matrices

We study the asymptotic behavior of outliers in the spectrum of bounded rank perturbations of large random matrices. In particular, we consider perturbations of elliptic random matrices which generalize both Wigner random matrices and non-Hermitian random matrices with iid entries. As a consequence, we recover the results of Capitaine, Donati-Martin, and F\'eral for perturbed Wigner matrices as well as the results of Tao for perturbed random matrices with iid entries. Along the way, we prove a number of interesting results concerning elliptic random matrices whose entries have finite fourth moment; these results include a bound on the least singular value and the asymptotic behavior of the spectral radius.


Introduction
In this note, we investigate the asymptotic behavior of outliers in the spectrum of bounded rank perturbations of large random matrices. We begin by introducing the empirical spectral distribution of a square matrix.
The eigenvalues of a N × N matrix M are the roots in C of the characteristic polynomial det(M − zI), where I is the identity matrix. We let λ_1(M), ..., λ_N(M) denote the eigenvalues of M. In this case, the empirical spectral measure µ_M is given by
µ_M := (1/N) Σ_{i=1}^N δ_{λ_i(M)}.
The corresponding empirical spectral distribution (ESD) is given by
F^M(x, y) := (1/N) #{1 ≤ i ≤ N : Re(λ_i(M)) ≤ x, Im(λ_i(M)) ≤ y}.
Here #E denotes the cardinality of the set E. If the matrix M is Hermitian, then the eigenvalues λ_1(M), ..., λ_N(M) are real. In this case the ESD is given by
F^M(x) := (1/N) #{1 ≤ i ≤ N : λ_i(M) ≤ x}.
Given a random N × N matrix Y_N, an important problem in random matrix theory is to study the limiting distribution of the empirical spectral measure as N tends to infinity. We will use asymptotic notation, such as O, o, Ω, under the assumption that N → ∞. See Section 2.2 for a complete description of our asymptotic notation.
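For concreteness, these definitions translate directly into code; a minimal sketch (the helper name `esd` is ours, purely illustrative):

```python
import numpy as np

def esd(M, x, y):
    """Empirical spectral distribution F^M(x, y): the fraction of
    eigenvalues lambda of M with Re(lambda) <= x and Im(lambda) <= y."""
    lam = np.linalg.eigvals(M)
    return np.mean((lam.real <= x) & (lam.imag <= y))

# Example: a diagonal matrix with eigenvalues 1, 2, 3, 4.
M = np.diag([1.0, 2.0, 3.0, 4.0])
value = esd(M, 2.5, 0.0)  # two of the four eigenvalues have Re <= 2.5
```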
1.1. Random matrices with independent entries. We consider two ensembles of random matrices with independent entries. We first define a class of Hermitian random matrices with independent entries originally introduced by Wigner [52]. Definition 1.1 (Wigner random matrices). Let ξ be a complex random variable with mean zero and unit variance, and let ζ be a real random variable with mean zero and finite variance. We say Y_N is a Wigner matrix of size N with atom variables ξ, ζ if Y_N = (y_ij)_{i,j=1}^N is a random Hermitian N × N matrix that satisfies the following conditions.
• {y ij : 1 ≤ i ≤ j ≤ N } is a collection of independent random variables.
• {y ij : 1 ≤ i < j ≤ N } is a collection of independent and identically distributed (iid) copies of ξ. • {y ii : 1 ≤ i ≤ N } is a collection of iid copies of ζ.
The prototypical example of a Wigner real symmetric matrix is the Gaussian orthogonal ensemble (GOE). The GOE is defined by the probability distribution
P(dM) = (1/Z_N^(β)) exp(−(β/4) tr M²) dM    (1.1)
on the space of N × N real symmetric matrices when β = 1, where dM refers to the Lebesgue measure on the N(N + 1)/2 different elements of the matrix. Here Z_N^(β) denotes the normalization constant. So for a matrix Y_N = (y_ij)_{i,j=1}^N drawn from the GOE, the elements {y_ij : 1 ≤ i ≤ j ≤ N} are independent Gaussian random variables with mean zero and variance 1 + δ_ij.
The classical example of a Wigner Hermitian matrix is the Gaussian unitary ensemble (GUE). The GUE is defined by the probability distribution given in (1.1) with β = 2, but on the space of N × N Hermitian matrices. Thus, for a matrix Y N = (y ij ) N i,j=1 drawn from the GUE, the N 2 different real elements of the matrix, {Re(y ij ) : 1 ≤ i ≤ j ≤ N } ∪ {Im(y ij ) : 1 ≤ i < j ≤ N }, are independent Gaussian random variables with mean zero and variance (1+δ ij )/2. A classical result for Wigner random matrices is Wigner's semicircle law [5,Theorem 2.5].
Theorem 1.2 (Wigner's Semicircle law). Let ξ be a complex random variable with mean zero and unit variance, and let ζ be a real random variable with mean zero and finite variance. For each N ≥ 1, let Y_N be a Wigner matrix of size N with atom variables ξ, ζ, and let A_N be a deterministic N × N Hermitian matrix with rank o(N). Then the ESD of 1/√N (Y_N + A_N) converges almost surely to the semicircle distribution F_sc as N → ∞, where F_sc has density
ρ_sc(x) := (1/(2π)) √(4 − x²) 1_{[−2,2]}(x).
Remark 1.3. Wigner's semicircle law holds in the case when the entries of Y_N are not identically distributed (but still independent) provided the entries satisfy a Lindeberg-type condition. See [5, Theorem 2.9] for further details.
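A quick numerical illustration of the semicircle law (Gaussian entries, chosen only for convenience; any atom variables as in Theorem 1.2 would do, and the tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 600
# Real symmetric Wigner matrix: iid mean-zero, unit-variance entries
# above the diagonal; iid mean-zero diagonal.
G = rng.standard_normal((N, N))
Y = np.triu(G, 1)
Y = Y + Y.T + np.diag(rng.standard_normal(N))
lam = np.linalg.eigvalsh(Y / np.sqrt(N))

# Nearly all eigenvalues should lie in [-2, 2], the support of F_sc,
# and the second moment should match  int x^2 dF_sc(x) = 1.
frac_in_support = np.mean(np.abs(lam) <= 2.05)
second_moment = np.mean(lam ** 2)
```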
We now consider an ensemble of random matrices with iid entries.
Definition 1.4 (iid random matrices). Let ξ be a complex random variable. We say Y N is an iid random matrix of size N with atom variable ξ if Y N is a N × N matrix whose entries are iid copies of ξ.
When ξ is a standard complex Gaussian random variable, Y N can be viewed as a random matrix drawn from the probability distribution P(dM ) = 1 π N 2 e − tr(M M * ) dM on the set of complex N × N matrices. Here dM denotes the Lebesgue measure on the 2N 2 real entries of M . This is known as the complex Ginibre ensemble. The real Ginibre ensemble is defined analogously. Following Ginibre [28], one may compute the joint density of the eigenvalues of a random N × N matrix Y N drawn from the complex Ginibre ensemble.
Mehta [37,38] used the joint density function obtained by Ginibre to compute the limiting spectral measure of the complex Ginibre ensemble. In particular, he showed that if Y_N is drawn from the complex Ginibre ensemble, then the ESD of 1/√N Y_N converges to the circular law
F_circ(x, y) := µ_circ({z ∈ C : Re(z) ≤ x, Im(z) ≤ y}),
where µ_circ is the uniform probability measure on the unit disk in the complex plane. Edelman [22] verified the same limiting distribution for the real Ginibre ensemble. For the general (non-Gaussian) case, there is no formula for the joint distribution of the eigenvalues and the problem appears much more difficult. The universality phenomenon in random matrix theory asserts that the spectral behavior of a random matrix does not depend on the distribution of the atom variable ξ in the limit N → ∞. In other words, one expects that the circular law describes the limiting ESD of a large class of random matrices (not just Gaussian matrices). The first rigorous proof of the circular law for general (non-Gaussian) distributions was by Bai [3,5]. He proved the result under a number of assumptions on the moments and smoothness of the atom variable ξ. Important results were obtained more recently by Pan and Zhou [41] and Götze and Tikhomirov [31]. Using techniques from additive combinatorics, Tao and Vu [46] were able to prove the circular law under the assumption that E|ξ|^{2+ε} < ∞ for some ε > 0. Recently, Tao and Vu [47,48] established the law assuming only that ξ has finite variance.
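The circular law is easy to visualize numerically; a small sketch for the complex Ginibre ensemble (the thresholds below are illustrative, not sharp):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
# Complex Ginibre: iid complex Gaussian entries with unit variance
# (real and imaginary parts each of variance 1/2).
Y = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
lam = np.linalg.eigvals(Y / np.sqrt(N))

spectral_radius = np.max(np.abs(lam))        # should be close to 1
frac_near_disk = np.mean(np.abs(lam) <= 1.05)
```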
For any matrix M, we denote the Hilbert–Schmidt norm ‖M‖_2 by the formula
‖M‖_2 := √(tr(MM*)) = (Σ_{i,j} |m_ij|²)^{1/2}.    (1.2)
Theorem 1.5 (Tao–Vu, [48]). Let ξ be a complex random variable with mean zero and unit variance. For each N ≥ 1, let Y_N be a N × N matrix whose entries are iid copies of ξ, and let A_N be a N × N deterministic matrix. If rank(A_N) = o(N) and sup_{N≥1} (1/N²)‖A_N‖_2² < ∞, then the ESD of 1/√N (Y_N + A_N) converges almost surely to the circular law F_circ as N → ∞.

1.2.
Outliers in the spectrum. From Theorem 1.2 and Theorem 1.5, we see that the low rank perturbation A_N does not affect the limiting ESD. In other words, the majority of the eigenvalues remain distributed according to the semicircle law or the circular law, respectively. However, the perturbation A_N may create one or more outliers.
Let Y_N be a N × N random matrix whose entries are iid copies of ξ. When the atom variable ξ has finite fourth moment, one can compute the asymptotic behavior of the spectral radius [5, Theorem 5.18]. We remind the reader that the spectral radius of a square matrix is the largest eigenvalue in absolute value. Theorem 1.6 (No outliers for iid matrices). Let ξ be a complex random variable with mean zero, unit variance, and finite fourth moment. For each N ≥ 1, let Y_N be a N × N random matrix whose entries are iid copies of ξ. Then the spectral radius of 1/√N Y_N converges to 1 almost surely as N → ∞. In [49], Tao computes the asymptotic location of the outlier eigenvalues for bounded rank perturbations of iid random matrices. Theorem 1.7 (Outliers for small low rank perturbations of iid matrices, [49]). Let ξ be a complex random variable with mean zero, unit variance, and finite fourth moment. For each N ≥ 1, let Y_N be a N × N random matrix whose entries are iid copies of ξ, and let C_N be a deterministic matrix with rank O(1). Let ε > 0, and suppose that for all sufficiently large N, there are no eigenvalues of C_N in the band {z ∈ C : 1 + ε < |z| < 1 + 3ε}, and there are j eigenvalues λ_1(C_N), ..., λ_j(C_N) for some j = O(1) in the region {z ∈ C : |z| ≥ 1 + 3ε}. Then, almost surely, for sufficiently large N, there are precisely j eigenvalues λ_1(1/√N Y_N + C_N), ..., λ_j(1/√N Y_N + C_N) of 1/√N Y_N + C_N in the region {z ∈ C : |z| ≥ 1 + 2ε}, and after labeling these eigenvalues properly,
λ_i(1/√N Y_N + C_N) = λ_i(C_N) + o(1)
for each 1 ≤ i ≤ j. Recently, Benaych-Georges and Rochet [11] obtained an analogous result for finite rank perturbations of random matrices whose distributions are invariant under the left and right actions of the unitary group. Benaych-Georges and Rochet also study the fluctuations of the outlier eigenvalues.
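A quick simulation of Theorem 1.7, with Gaussian entries and a rank-one perturbation whose single nonzero eigenvalue is 3 (all numerical thresholds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
Y = rng.standard_normal((N, N))
C = np.zeros((N, N))
C[0, 0] = 3.0          # rank-one perturbation, eigenvalue 3 > 1

lam = np.linalg.eigvals(Y / np.sqrt(N) + C)
outliers = lam[np.abs(lam) > 1.2]   # the bulk stays near the unit disk
```

With this setup the theorem predicts exactly one eigenvalue outside the disk, located near 3.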
Similar results have also been obtained for Wigner random matrices. When the atom variables have finite fourth moment, the asymptotic behavior of the spectral radius can be computed [5, Theorem 5.2]. Theorem 1.8 (No outliers for Wigner matrices). Let ξ be a complex random variable with mean zero, unit variance, and finite fourth moment, and let ζ be a real random variable with mean zero and finite variance. For each N ≥ 1, let Y_N be a Wigner matrix of size N with atom variables ξ, ζ. Then the spectral radius of 1/√N Y_N converges to 2 almost surely as N → ∞. The asymptotic location of the outliers for bounded rank perturbations of Wigner matrices and other classes of self-adjoint random matrices have also been determined. In fact, the fluctuations of the outlier eigenvalues can be explicitly computed. We refer the reader to [8,9,10,17,18,19,23,24,35,36,42,43,44] and references therein for further details. Theorem 1.9 (Outliers for small low rank perturbations of Wigner matrices, [44]). Let ξ be a real random variable with mean zero, unit variance, and finite fourth moment, and let ζ be a real random variable with mean zero and finite variance. For each N ≥ 1, let Y_N be a Wigner matrix of size N with atom variables ξ, ζ. Let k ≥ 1. For each N ≥ k, let C_N be a N × N deterministic Hermitian matrix with rank k and nonzero eigenvalues λ_1(C_N), ..., λ_k(C_N), where k, λ_1(C_N), ..., λ_k(C_N) are independent of N. Let S = {1 ≤ i ≤ k : |λ_i(C_N)| > 1}. Then we have the following.
• For all i ∈ S, after labeling the eigenvalues of 1/√N Y_N + C_N properly,
λ_i(1/√N Y_N + C_N) → λ_i(C_N) + 1/λ_i(C_N)
in probability as N → ∞.
• For all i ∈ {1, ..., k} \ S, after labeling the eigenvalues of 1/√N Y_N + C_N properly,
λ_i(1/√N Y_N + C_N) → 2 if λ_i(C_N) > 0 and λ_i(1/√N Y_N + C_N) → −2 if λ_i(C_N) < 0,
in probability as N → ∞. Remark 1.10. Under additional assumptions on the atom variables ξ, ζ, the convergence in Theorem 1.9 can be strengthened to almost sure convergence [17].
Non-Hermitian finite rank perturbations of random Hermitian matrices have been studied in the mathematical physics literature. We refer the reader to [25,26,27] and references therein for further details.
1.3. Elliptic random matrices. We consider the following class of random matrices with dependent entries that generalizes the ensembles introduced above. These so-called elliptic random matrices were originally introduced by Girko [29,30]. Definition 1.11 (Condition C1). Let (ξ_1, ξ_2) be a random vector in R², where both ξ_1, ξ_2 have mean zero and unit variance. We set ρ := E[ξ_1 ξ_2]. Let {y_ij}_{i,j≥1} be an infinite double array of real random variables. For each N ≥ 1, we define the N × N random matrix Y_N = (y_ij)_{i,j=1}^N. We say the sequence of random matrices {Y_N}_{N≥1} satisfies condition C1 with atom variables (ξ_1, ξ_2) if the following hold:
• {y_ii : 1 ≤ i} ∪ {(y_ij, y_ji) : 1 ≤ i < j} is a collection of independent random elements,
• {(y_ij, y_ji) : 1 ≤ i < j} is a collection of iid copies of (ξ_1, ξ_2),
• {y_ii : 1 ≤ i} is a collection of iid random variables with mean zero and finite variance.
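For intuition, a matrix satisfying condition C1 can be simulated by drawing the off-diagonal pairs (y_ij, y_ji) from a bivariate distribution with correlation ρ; a minimal sketch with Gaussian atom variables (an assumption made only for the simulation):

```python
import numpy as np

rng = np.random.default_rng(3)
N, rho = 400, 0.5

# Correlated Gaussian atom variables (xi_1, xi_2) via a Cholesky factor.
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

Y = np.zeros((N, N))
iu = np.triu_indices(N, 1)
pairs = chol @ rng.standard_normal((2, len(iu[0])))
Y[iu] = pairs[0]                      # entries y_ij, i < j
Y[(iu[1], iu[0])] = pairs[1]          # mirrored entries y_ji
Y[np.diag_indices(N)] = rng.standard_normal(N)

# Empirical correlation of the pairs (y_ij, y_ji) should be close to rho.
emp_rho = np.corrcoef(Y[iu], Y[(iu[1], iu[0])])[0, 1]
```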
Remark 1.13. Let ξ be a real random variable with mean zero and unit variance. For each N ≥ 1, let Y N be a N × N random matrix whose entries are iid copies of ξ. Then {Y N } N ≥1 is a sequence of random matrices that satisfy condition C1.
If {Y_N}_{N≥1} is a sequence of random matrices that satisfies condition C1, then it was shown in [40] that the limiting ESD of 1/√N Y_N is given by the uniform distribution on the interior of an ellipse. The same conclusion was shown to hold by Naumov [39] for elliptic random matrices whose atom variables satisfy additional moment assumptions.

Main results
In this note, we consider the outliers of perturbed elliptic random matrices. In particular, we consider versions of Theorem 1.6, Theorem 1.7, Theorem 1.8, and Theorem 1.9 for elliptic random matrices whose entries have finite fourth moment.
Definition 2.1 (Condition C0). Let (ξ_1, ξ_2) be a random vector in R², where both ξ_1, ξ_2 have mean zero and unit variance. We set ρ := E[ξ_1 ξ_2]. For each N ≥ 1, let Y_N be a N × N random matrix. We say the sequence of random matrices {Y_N}_{N≥1} satisfies condition C0 with atom variables (ξ_1, ξ_2) if the following conditions hold:
• {Y_N}_{N≥1} satisfies condition C1 with atom variables (ξ_1, ξ_2),
• E[ξ_1⁴] < ∞ and E[ξ_2⁴] < ∞.
Here and in the sequel, for −1 < ρ < 1, E_ρ denotes the ellipse
E_ρ := {x + √−1 y ∈ C : x²/(1 + ρ)² + y²/(1 − ρ)² ≤ 1}.
We will also define the neighborhoods E_ρ,δ := {z ∈ C : dist(z, E_ρ) ≤ δ} for any δ > 0. We first consider a version of Theorem 1.6 and Theorem 1.8 for elliptic random matrices. Because of the elliptic shape of the limiting ESD, it is not enough to just consider the spectral radius.
Theorem 2.2 (No outliers for elliptic random matrices). Let {Y N } N ≥1 be a sequence of random matrices that satisfies condition C0 with atom variables (ξ 1 , ξ 2 ), where ρ = E[ξ 1 ξ 2 ]. Let δ > 0. Then, almost surely, for N sufficiently large, all the eigenvalues of 1 √ N Y N are contained in E ρ,δ . Theorem 1.14 and Theorem 2.2 immediately imply the following asymptotic behavior for the spectral radius of elliptic random matrices. Corollary 2.3 (Spectral radius of elliptic random matrices). Let {Y N } N ≥1 be a sequence of random matrices that satisfies condition C0 with atom variables (ξ 1 , ξ 2 ), where ρ = E[ξ 1 ξ 2 ]. Then the spectral radius of 1 √ N Y N converges almost surely to 1 + |ρ| as N → ∞.
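A numerical illustration of Corollary 2.3, using Gaussian atom variables (an assumption for the simulation only); for ρ = 1/2 the limiting spectral radius is 1 + |ρ| = 1.5:

```python
import numpy as np

rng = np.random.default_rng(4)
N, rho = 700, 0.5
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

# Elliptic random matrix with correlated off-diagonal pairs.
Y = np.zeros((N, N))
iu = np.triu_indices(N, 1)
pairs = chol @ rng.standard_normal((2, len(iu[0])))
Y[iu] = pairs[0]
Y[(iu[1], iu[0])] = pairs[1]
Y[np.diag_indices(N)] = rng.standard_normal(N)

lam = np.linalg.eigvals(Y / np.sqrt(N))
spectral_radius = np.max(np.abs(lam))   # expected to approach 1 + |rho|
```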
We now consider the analogue of Theorem 1.7 and Theorem 1.9 for elliptic random matrices. Figure 1 shows an eigenvalue plot of a perturbed elliptic random matrix as well as the location of the outlier eigenvalues predicted by the following theorem. Theorem 2.4 (Outliers for low rank perturbations of elliptic random matrices). Let k ≥ 1 and δ > 0. Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2), where ρ = E[ξ_1 ξ_2], and let C_N be a deterministic N × N matrix with rank at most k and ‖C_N‖ = O(1). Suppose for N sufficiently large, there are no nonzero eigenvalues of C_N which satisfy
λ_i(C_N) + ρ/λ_i(C_N) ∈ E_ρ,3δ \ E_ρ,δ with |λ_i(C_N)| > 1,
and there are j eigenvalues λ_1(C_N), ..., λ_j(C_N) for some j ≤ k which satisfy
λ_i(C_N) + ρ/λ_i(C_N) ∈ C \ E_ρ,3δ with |λ_i(C_N)| > 1.
Then, almost surely, for N sufficiently large, there are exactly j eigenvalues of 1/√N Y_N + C_N in the region C \ E_ρ,2δ, and after labeling the eigenvalues properly,
λ_i(1/√N Y_N + C_N) = λ_i(C_N) + ρ/λ_i(C_N) + o(1)
for each 1 ≤ i ≤ j. We now consider the case of elliptic random matrices with nonzero mean, which we write as 1/√N Y_N + µ√N φ_N φ_N*, where {Y_N}_{N≥1} is a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2), µ is a fixed nonzero complex number (independent of N), and φ_N is the unit vector φ_N := 1/√N (1, ..., 1)*. This corresponds to shifting the entries of Y_N by µ (so they have mean µ instead of mean zero). The elliptic law still holds for this rank one perturbation of 1/√N Y_N, thanks to Theorem 1.14. In view of Theorem 2.4, we show there is a single outlier for this ensemble near µ√N.
Theorem 2.8 (Outlier for elliptic random matrices with nonzero mean). Let δ > 0. Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2), where ρ = E[ξ_1 ξ_2], and let µ be a nonzero complex number independent of N. Then almost surely, for sufficiently large N, all the eigenvalues of 1/√N Y_N + µ√N φ_N φ_N* are contained in E_ρ,δ, except for a single outlier eigenvalue which takes the value µ√N (1 + o(1)). Remark 2.9. A version of Theorem 2.8 was proved by Füredi and Komlós [24] for a class of real symmetric Wigner matrices. Moreover, Füredi and Komlós study the fluctuations of the outlier eigenvalue. Tao [49] verified Theorem 2.8 when Y_N is a random matrix with iid entries.
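Theorem 2.4 predicts outliers near λ_i(C_N) + ρ/λ_i(C_N). A quick check with Gaussian atom variables, ρ = 1/2, and a rank-one perturbation with eigenvalue 2, so the predicted outlier sits near 2 + 1/4 = 2.25 (thresholds illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, rho, theta = 500, 0.5, 2.0
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

Y = np.zeros((N, N))
iu = np.triu_indices(N, 1)
pairs = chol @ rng.standard_normal((2, len(iu[0])))
Y[iu] = pairs[0]
Y[(iu[1], iu[0])] = pairs[1]
Y[np.diag_indices(N)] = rng.standard_normal(N)

C = np.zeros((N, N))
C[0, 0] = theta                     # rank one, eigenvalue theta > 1

lam = np.linalg.eigvals(Y / np.sqrt(N) + C)
outlier = lam[np.argmax(np.abs(lam))]
predicted = theta + rho / theta     # = 2.25
```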
One of the keys to proving Theorem 2.4 and Theorem 2.8 is to control the least singular value of a perturbed elliptic random matrix. Let M be a N × N matrix. The singular values of M are the eigenvalues of |M| := √(MM*). We let σ_1(M) ≥ ··· ≥ σ_N(M) ≥ 0 denote the singular values of M. In particular, the largest and smallest singular values are
σ_1(M) = sup_{‖x‖=1} ‖Mx‖, σ_N(M) = inf_{‖x‖=1} ‖Mx‖,
where ‖x‖ denotes the Euclidean norm of the vector x. We let ‖M‖ denote the spectral norm of M. It follows that the largest and smallest singular values can be written in terms of the spectral norm. Indeed, σ_1(M) = ‖M‖ and σ_N(M) = 1/‖M^{−1}‖ provided M is invertible. We now consider a lower bound for the least singular value of perturbed elliptic random matrices of the form 1/√N Y_N − zI, where I denotes the identity matrix. A lower bound of the form
σ_N(1/√N Y_N − zI) ≥ N^{−A},
for some A > 0, was shown to hold with high probability in [39,40]. Below, we consider only the case when z is outside the ellipse E_ρ and thus obtain a constant lower bound independent of N. Theorem 2.10 (Least singular value bound). Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2), where ρ = E[ξ_1 ξ_2]. Let δ, M > 0, and let z ∈ C with z ∉ E_ρ,δ and |z| ≤ M. Then there exists c > 0 (depending only on δ, M, ρ) such that almost surely, for N sufficiently large, σ_N(1/√N Y_N − zI) ≥ c.
In fact, Theorem 2.2 follows immediately from Theorem 2.10.
Proof of Theorem 2.2. We note that z is an eigenvalue of 1/√N Y_N if and only if det(1/√N Y_N − zI) = 0. On the other hand,
|det(1/√N Y_N − zI)| = ∏_{i=1}^N σ_i(1/√N Y_N − zI).
Thus, we conclude that z is an eigenvalue of 1/√N Y_N if and only if σ_N(1/√N Y_N − zI) = 0. The claim therefore follows from Theorem 2.10.
The condition number σ_1(M)/σ_N(M) of a N × N matrix M plays an important role in numerical linear algebra (see for example [7]). As a consequence of Theorem 2.10, we obtain the following bound for the condition number of perturbed elliptic random matrices that satisfy condition C0. Corollary 2.11 (Condition number bound). Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2), where ρ = E[ξ_1 ξ_2], and let z ∈ C with z ∉ E_ρ,δ for some δ > 0. Then there exists C > 0 (depending on z) such that almost surely, for N sufficiently large,
σ_1(1/√N Y_N − zI)/σ_N(1/√N Y_N − zI) ≤ C.
Proof. In view of Theorem 2.10, it suffices to show that σ_1(1/√N Y_N − zI) ≤ C almost surely for N sufficiently large. Since
σ_1(1/√N Y_N − zI) ≤ (1/√N)‖Y_N‖ + |z|,
it suffices to show that ‖Y_N‖ = O(√N) almost surely for N sufficiently large. The claim now follows from Lemma 3.3 below. Indeed, the bound on the spectral norm of Y_N has previously been obtained in [39] and follows from [5, Theorem 5.2].

2.1.
Overview. In order to prove Theorem 2.4, we will make use of Sylvester's determinant identity: det(I + AB) = det(I + BA), (2.2) where A is a N ×k matrix and B is a k ×N matrix. In particular, the left-hand side of (2.2) is a N × N determinant, while the right-hand side is a k × k determinant.
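Identity (2.2) is easy to sanity-check numerically; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
N, k = 8, 2
A = rng.standard_normal((N, k))
B = rng.standard_normal((k, N))

lhs = np.linalg.det(np.eye(N) + A @ B)   # N x N determinant
rhs = np.linalg.det(np.eye(k) + B @ A)   # k x k determinant
```

The two determinants agree up to floating-point error, even though one is 8 × 8 and the other 2 × 2.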
To outline the main idea, which is based on the arguments of Benaych-Georges and Rao [8], consider the rank one perturbation C_N = vu*. In order to study the outlier eigenvalues, we will need to solve the equation
det(1/√N Y_N + vu* − zI) = 0.
From (2.2), we find that this is equivalent to solving
1 + u*(1/√N Y_N − zI)^{−1} v = 0.
Thus, the problem of locating the outlier eigenvalues reduces to studying the resolvent
G_N(z) := (1/√N Y_N − zI)^{−1}.
We develop an isotropic limit law in Section 5 to compute the limit of u* G_N v; this limit law is inspired by the isotropic semicircle law developed by Knowles and Yin [35,36] for Wigner random matrices. Namely, in Theorem 5.1 we show that not only does the trace of G_N(z) almost surely converge to some function m(z) (defined in (4.3)), but arbitrary bilinear forms u* G_N(z) v almost surely converge to m(z) u*v. However, instead of working with G_N directly, it will often be more convenient to work with the 2N × 2N Hermitian matrix
Ξ_N := [[0, 1/√N Y_N − zI], [(1/√N Y_N − zI)*, 0]]
and its resolvent (Ξ_N − ηI)^{−1}. In fact, the eigenvalues of Ξ_N are given by the singular values ±σ_1(1/√N Y_N − zI), ..., ±σ_N(1/√N Y_N − zI). Thus, for Im(η) > 0, the matrix Ξ_N − ηI is always invertible. Moreover, when η = 0, the resolvent becomes
(Ξ_N)^{−1} = [[0, ((1/√N Y_N − zI)*)^{−1}], [(1/√N Y_N − zI)^{−1}, 0]].
In other words, we can recover G_N by letting η tend to zero. Similarly, we will bound the least singular value of 1/√N Y_N − zI and prove Theorem 2.10 by studying the eigenvalues of the resolvent (Ξ_N − ηI)^{−1} when Im(η) = N^{−β} for some β > 0.
The paper is organized as follows. We present our preliminary tools in Section 3 and Section 4. In particular, Section 3 contains a standard truncation lemma; in Section 4, we study the stability of a fixed point equation which will determine the asymptotic behavior of the diagonal entries of G N . In Section 5, we apply the truncation lemma from Section 3 to reduce both Theorem 2.4 and Theorem 2.10 to the case where we only need to consider elliptic random matrices whose entries are bounded. We also introduce an isotropic limit law for G N and prove Theorem 2.8 in Section 5. Finally, we complete the proof of Theorem 2.10 in Section 6 and complete the proof of Theorem 2.4 in Section 7.
A number of auxiliary proofs and results are contained in the appendix. Appendix A contains a somewhat standard proof of the truncation lemma from Section 3. Appendix B contains a large deviation estimate for bilinear forms. In Appendix C, we study some additional properties of a limiting spectral measure which was analyzed in [40].
2.2. Notation. We use asymptotic notation (such as O, o, Ω) under the assumption that N → ∞. We use X ≲ Y, Y ≳ X, Y = Ω(X), or X = O(Y) to denote the bound X ≤ CY for all sufficiently large N and for some constant C. Notations such as X ≲_k Y and X = O_k(Y) mean that the hidden constant C depends on another constant k. X = o(Y) or Y = ω(X) means that X/Y → 0 as N → ∞.
An event E, which depends on N , is said to hold with overwhelming probability if P(E) ≥ 1 − O C (N −C ) for every constant C > 0. We let 1 E denote the indicator function of the event E. E c denotes the complement of the event E.
We let ‖M‖ denote the spectral norm of M, and ‖M‖_2 denotes the Hilbert–Schmidt norm of M (defined in (1.2)). We let I_N denote the N × N identity matrix. Often we will just write I for the identity matrix when the size can be deduced from the context. For a square matrix M, we let tr_N M := (1/N) tr M. We write a.s., a.a., and a.e. for almost surely, Lebesgue almost all, and Lebesgue almost everywhere respectively. We use √−1 to denote the imaginary unit and reserve i as an index.
We let C and K denote constants that are non-random and may take on different values from one appearance to the next. The notation K p means that the constant K depends on another parameter p.
Acknowledgments. The authors are grateful to Alexander Soshnikov for many useful discussions and Yan Fyodorov for references. They are particularly thankful to Terry Tao for helpful discussions and enthusiastic encouragement. The authors would also like to thank the anonymous referees for their valuable comments and corrections.

Preliminary tools and notation
In this section, we consider a number of tools we will need to prove our main results. We also introduce some new notation, which we will use throughout the paper.
Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2). We will work with the resolvent G_N defined by
G_N(z) := (1/√N Y_N − zI)^{−1}    (3.1)
and its trace, denoted
m_N(z) := tr_N G_N(z).    (3.2)
In order to work with the resolvent, we will need control of the spectral norm ‖G_N‖. We bound the spectral norm of G_N for |z| sufficiently large by bounding the spectral norm of 1/√N Y_N in the next subsection. When working with G_N, we will take advantage of the following well known resolvent identity: for any invertible N × N matrices A and B,
A^{−1} − B^{−1} = A^{−1}(B − A)B^{−1}.    (3.3)
Suppose A is an invertible square matrix. Let u, v be vectors. If 1 + v*A^{−1}u ≠ 0, from (3.3) one can deduce the Sherman–Morrison rank one perturbation formula (see [33, Section 0.7.4]):
(A + uv*)^{−1} = A^{−1} − (A^{−1} u v* A^{−1})/(1 + v* A^{−1} u)
and
(A + uv*)^{−1} u = A^{−1} u/(1 + v* A^{−1} u).
From [33, Section 0.7.3], we obtain the inverse of a block matrix and Schur's complement:
[[A, B], [C, D]]^{−1} = [[(A − BD^{−1}C)^{−1}, −(A − BD^{−1}C)^{−1}BD^{−1}], [−D^{−1}C(A − BD^{−1}C)^{−1}, D^{−1} + D^{−1}C(A − BD^{−1}C)^{−1}BD^{−1}]], (3.6)
where A, B, C, D are matrix sub-blocks and D, A − BD^{−1}C are non-singular. In the case that A, D − CA^{−1}B are invertible, we obtain the analogous formula in terms of the Schur complement D − CA^{−1}B. It follows from the block matrix inversion formula that the entries of a resolvent can be expressed in terms of Schur complements; we will use this fact repeatedly. 3.1. Bounds on the spectral norm. We begin with the following deterministic bound.
Lemma 3.1 (Spectral norm of the resolvent for large |z|). Let M be a N × N matrix that satisfies ‖M‖ ≤ K. Then, for |z| > K,
‖(M − zI)^{−1}‖ ≤ 1/(|z| − K).
Proof. By writing out the Neumann series
(M − zI)^{−1} = −(1/z) Σ_{k=0}^∞ (M/z)^k,
we obtain ‖(M − zI)^{−1}‖ ≤ (1/|z|) Σ_{k=0}^∞ (K/|z|)^k = 1/(|z| − K). Remark 3.2. If H is a Hermitian matrix, we have ‖(H − zI)^{−1}‖ ≤ 1/|Im(z)| provided Im(z) ≠ 0.
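The identities and norm bounds above are easy to sanity-check numerically; a small sketch (the test matrices are artificial, chosen to be well conditioned):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 6
A = 10 * np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = 8 * np.eye(N) + 0.1 * rng.standard_normal((N, N))
Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)

# Resolvent identity: A^{-1} - B^{-1} = A^{-1} (B - A) B^{-1}.
res_err = np.max(np.abs((Ainv - Binv) - Ainv @ (B - A) @ Binv))

# Sherman-Morrison: (A + u v*)^{-1} = A^{-1} - A^{-1}u v*A^{-1}/(1 + v*A^{-1}u).
u = rng.standard_normal((N, 1))
v = rng.standard_normal((N, 1))
denom = 1.0 + (v.T @ Ainv @ u).item()
sm = Ainv - (Ainv @ u @ v.T @ Ainv) / denom
sm_err = np.max(np.abs(sm - np.linalg.inv(A + u @ v.T)))

# Lemma 3.1: if ||M|| <= K and |z| > K, then ||(M - zI)^{-1}|| <= 1/(|z| - K).
M = rng.standard_normal((N, N))
K = np.linalg.norm(M, 2)
z = K + 1.5
bound_holds = (np.linalg.norm(np.linalg.inv(M - z * np.eye(N)), 2)
               <= 1.0 / (abs(z) - K) + 1e-12)
```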
We will use the following estimate for the spectral norm. We note that the bound in Lemma 3.3 below is not sharp, but will suffice for our purposes.
Proof. We write
Y_N = (Y_N + Y_N*)/2 + (Y_N − Y_N*)/2.
By assumption, the diagonal entries of the matrix (Y_N + Y_N*)/2 have mean zero and finite variance. The above-diagonal entries are iid copies of (ξ_1 + ξ_2)/2. Thus the above-diagonal entries have mean zero and variance
(1/4) E[(ξ_1 + ξ_2)²] = (1 + ρ)/2.
Moreover, the above-diagonal entries have finite fourth moment:
(1/16) E[(ξ_1 + ξ_2)⁴] ≤ E[ξ_1⁴] + E[ξ_2⁴] < ∞.
By [5, Theorem 5.2], we obtain ‖(Y_N + Y_N*)/2‖ = O(√N) a.s.
The claim follows from the bounds above and (3.10).

3.2.
Hermitization. In order to study the spectrum of a non-normal matrix it is often useful to instead consider the spectrum of a family of Hermitian matrices. We define the Hermitization of an N × N matrix X to be an N × N matrix with entries that are 2 × 2 block matrices. The ij-th entry is the 2 × 2 block
[[0, X_ij], [X̄_ji, 0]].
We note the Hermitization of X can be conjugated by a 2N × 2N permutation matrix to
[[0, X], [X*, 0]].
Let X_N := 1/√N Y_N and define H_N to be the Hermitization of X_N. We will generally treat H_N as an N × N matrix with entries that are 2 × 2 blocks, but occasionally it will instead be useful to consider H_N as a 2N × 2N matrix.
Additionally, we define the 2 × 2 matrix
q := [[η, z], [z̄, η]]    (3.11)
with η = E + √−1 t ∈ C₊ := {w ∈ C : Im(w) > 0} and z ∈ C. We define the Hermitized resolvent to be the inverse of H_N minus the block-diagonal 2N × 2N matrix whose diagonal 2 × 2 blocks all equal q. Note that this is the usual resolvent of the Hermitization of X_N − zI, hence it inherits the usual properties of resolvents. For example, its operator norm is bounded from above by t^{−1}. We will use the Hermitized resolvent extensively in Section 6 to estimate the least singular value of X_N − zI and in Section 7.2 to estimate the expectation of bilinear forms involving G_N(z).
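The key spectral fact used here, that the Hermitization of M := X_N − zI (in its permuted 2N × 2N form) has eigenvalues ±σ_1(M), ..., ±σ_N(M), can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 5
z = 0.7 + 0.3j
X = rng.standard_normal((N, N)) / np.sqrt(N)
M = X - z * np.eye(N)

# 2N x 2N Hermitization [[0, M], [M*, 0]]: its spectrum is {+/- sigma_i(M)}.
H = np.block([[np.zeros((N, N)), M],
              [M.conj().T, np.zeros((N, N))]])
eig = np.sort(np.linalg.eigvalsh(H))
sv = np.linalg.svd(M, compute_uv=False)
expected = np.sort(np.concatenate([sv, -sv]))
gap = np.max(np.abs(eig - expected))
```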

3.3.
Truncation. Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2). Instead of working with Y_N directly, we will work with a truncated version of this matrix. Specifically, we will work with a matrix Ŷ_N whose entries are truncated versions of the original entries of Y_N. Here 1_E denotes the indicator function of the event E. For a truncation parameter L > 0, we define the truncated entries ŷ_ij for i ≠ j by truncating the entries y_ij at level L and recentering, and we set ŷ_ii := 0 for all i ≥ 1. We set Ŷ_N := (ŷ_ij)_{i,j=1}^N. We also introduce the notation Ĝ_N for the corresponding resolvent. We verify the following standard truncation lemma.
Lemma 3.4 (Truncation). Let {Y_N}_{N≥1} be a sequence of random matrices that satisfies condition C0 with atom variables (ξ_1, ξ_2). Then there exist constants C_0, L_0 > 0 such that the following holds for all L > L_0.
• {Ŷ_N}_{N≥1} is a sequence of random matrices that satisfies condition C1 with atom variables (ξ̂_1, ξ̂_2).
• a.s., one has the bounds and
• a.s., one has
The proof of Lemma 3.4 follows somewhat standard arguments; we present the proof in Appendix A.
For the truncated matrices Ŷ_N, we have the following bound on the spectral norm. 3.4. Martingale inequalities. The following standard bounds were originally proven for real random variables; the extension to the complex case is straightforward.
Lemma 3.6 (Rosenthal's inequality, [16]). Let {x_k} be a complex martingale difference sequence with respect to the filtration {F_k}. Then, for p ≥ 2,
E|Σ_k x_k|^p ≤ K_p ( E( Σ_k E[|x_k|² | F_{k−1}] )^{p/2} + Σ_k E|x_k|^p ).
Lemma 3.7 (Burkholder's inequality, [16]). Let {x_k} be a complex martingale difference sequence with respect to the filtration {F_k}. Then, for p ≥ 1,
E|Σ_k x_k|^p ≤ K_p E( Σ_k |x_k|² )^{p/2}.
Lemma 3.9 (Lemma 6.11 of [5]). Let {F_n} be an increasing sequence of σ-fields and {X_n} a sequence of random variables. Write 3.5. Concentration of bilinear forms. We establish the following large deviation estimate for bilinear forms, which is a consequence of Lemma B.1 from Appendix B.
Lemma 3.10 (Concentration of bilinear forms). Let (x, y) be a random vector in C², where x, y both have mean zero, unit variance, and satisfy max{|x|, |y|} ≤ L a.s. Let (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) be iid copies of (x, y), and set X = (x_1, x_2, ..., x_N)^T and Y = (y_1, y_2, ..., y_N)^T. Let B be a N × N random matrix, independent of X and Y. Then for any integer p ≥ 2, there exists a constant K_p > 0 such that, for any t > 0, the bound (3.18) holds. In particular, if ‖B‖ ≤ N^{1/4} a.s., then (3.19) holds for any integer p ≥ 2.
Proof. We first note that (3.19) follows from (3.18) by taking t = N^{−1/8} and applying the deterministic bound
|X*BY| ≤ ‖B‖ ‖X‖ ‖Y‖.
It remains to prove (3.18). By Markov's inequality, it suffices to show (3.20) for any integer p ≥ 2. We will use Lemma B.1 from Appendix B to verify (3.20). By conditioning on the matrix B (which is independent of X and Y), we apply Lemma B.1 and obtain the desired bound. 3.6. ε-nets. We introduce ε-nets as a convenient way to discretize a compact set. Let ε > 0. A set X is an ε-net of a set Y if for any y ∈ Y, there exists x ∈ X such that ‖x − y‖ ≤ ε. We will need the following well-known estimate for the maximum size of an ε-net.
Lemma 3.11. Let D be a compact subset of {z ∈ C : |z| ≤ M}. Then D admits an ε-net of size at most (1 + 2M/ε)². Proof. Let N be a maximal ε-separated subset of D. That is, |z − w| ≥ ε for all distinct z, w ∈ N, and no subset of D strictly containing N has this property. Such a set can always be constructed by starting with an arbitrary point in D and at each step selecting a point that is at least distance ε away from those already selected. Since D is compact, this procedure will terminate after a finite number of steps. We now claim that N is an ε-net of D. Suppose to the contrary that it is not. Then there would exist z ∈ D at distance at least ε from all points in N. In other words, N ∪ {z} would still be an ε-separated subset of D. This contradicts the maximality of N.
We now proceed by a volume argument. At each point of N we place a ball of radius ε/2. By the triangle inequality, it is easy to verify that all such balls are disjoint and lie in the ball of radius M + ε/2 centered at the origin. Comparing the volumes gives
|N| (ε/2)² ≤ (M + ε/2)²,
and hence |N| ≤ (1 + 2M/ε)². Similarly, if I is an interval on the real line with length |I|, then I admits an ε-net of size at most 1 + |I|/ε.
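The greedy construction in the proof is directly implementable; a sketch for a finite sample of the disk (the helper name `greedy_eps_net` is ours):

```python
import numpy as np

rng = np.random.default_rng(9)

def greedy_eps_net(points, eps):
    """Greedily build a maximal eps-separated subset of a finite set of
    complex points; by maximality it is an eps-net of that set."""
    net = []
    for p in points:
        if all(abs(p - q) >= eps for q in net):
            net.append(p)
    return net

M, eps = 1.0, 0.2
pts = rng.uniform(-M, M, (4000, 2))
pts = pts[:, 0] + 1j * pts[:, 1]
pts = pts[np.abs(pts) <= M]          # finite stand-in for the disk

net = greedy_eps_net(pts, eps)
size_bound = (1 + 2 * M / eps) ** 2  # volume-argument bound: 121
covered = all(min(abs(p - q) for q in net) < eps for p in pts)
```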

Stability of the fixed point equation
We will study the limit of the sequence of functions {m_N}_{N≥1} (defined in (3.2)). As is standard in random matrix theory, we will not compute the limit explicitly, but instead show that the limit satisfies a fixed point equation. In particular, we will show that the limiting function satisfies
m(z) = 1/(−z − ρ m(z)),    (4.1)
equivalently ρ m(z)² + z m(z) + 1 = 0. In this section, we study the stability of (4.1) for −1 ≤ ρ ≤ 1. We begin with a few preliminary results.
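To make the role of the fixed point equation concrete, here is a short formal computation (a sketch under the convention that m(z) → 0 as |z| → ∞; the rigorous treatment occupies this section and Section 5) showing how it produces the outlier locations λ + ρ/λ appearing in Theorem 2.4:

```latex
% Fixed point equation for the limiting trace:
\rho\, m(z)^{2} + z\, m(z) + 1 = 0 .
% For a rank-one perturbation with eigenvalue \lambda, the outlier
% criterion  1 + \lambda\, m(z) = 0  forces  m(z) = -1/\lambda .
% Substituting into the quadratic:
\frac{\rho}{\lambda^{2}} - \frac{z}{\lambda} + 1 = 0
\quad\Longrightarrow\quad
z = \lambda + \frac{\rho}{\lambda} ,
% which degenerates to z = \lambda when \rho = 0 (the iid case) and to
% z = \lambda + 1/\lambda when \rho = 1 (the Wigner case).
```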
Since (4.1) can be written as a quadratic polynomial, the solution of (4.1) has two branches when ρ ≠ 0. We refer to the two branches as the solutions of (4.1). Lemma 4.3 (Solutions of (4.1)). Consider equation (4.1). Then one has the following.
Finally, it is straightforward to check that m(z) = (−z + √(z² − 4ρ))/(2ρ) is the only solution of (4.1) that satisfies (4.2).
For the remainder of the paper, we let m(z) be the unique solution of equation (4.1) that satisfies (4.2). Proof. From (4.3) and (4.5), we have Since ±2√ρ ∈ E_ρ by Lemma 4.2, we conclude that Then there exist ε, C, c > 0 (depending only on δ, M, ρ) such that the following holds. Suppose m satisfies (4.6) for all z ∈ D. If |ε_1(z)|, |ε_2(z)| ≤ ε for all z ∈ D, then: Proof. When ρ = 0, we note that Thus we obtain the bound |m(z)| ≤ 5/2.
Assume ρ ≠ 0. Let C be a large positive constant such that C > 100M and Then ε < (49/100)C by construction. We will show that |m(z)| ≤ C/|ρ| for all z ∈ D. Suppose to the contrary that |m(z)| > C/|ρ| for some z ∈ D. Then which contradicts the assumption that C² > 2|ρ|. We conclude that |m(z)| ≤ C/|ρ| for all z ∈ D.
Using the bound above, we have Proof. Since m(z) satisfies (4.1), the claim follows from Lemma 4.5 by taking ε 1 (z) = ε 2 (z) = 0 (alternatively, one can derive the bounds directly from (4.1) and obtain an explicit expression for C, c in terms of δ, ρ, M ).
Then there exist ε, C > 0 (depending only on δ, M, ρ) such that the following holds. Let m be a continuous function on D that satisfies (4.6) for all z ∈ D. If |ε_1(z)|, |ε_2(z)| ≤ ε for all z ∈ D, then exactly one of the following holds: Proof. First we consider the case ρ = 0. For ε ≤ 1/2, we have that Assume −1 ≤ ρ ≤ 1 with ρ ≠ 0. By Lemma 4.5, there exist ε, C > 0 such that if |ε_1(z)|, |ε_2(z)| ≤ ε/2 for all z ∈ D, then |m(z)| ≤ C for all z ∈ D. By rearranging (4.6), we then obtain for all z ∈ D. From Lemma 4.4, we obtain for all z ∈ D. Combining (4.7) and (4.8), we obtain the quadratic inequality For ε sufficiently small, the two possibilities above are distinct. Because m − m is continuous and since D is connected, a continuity argument implies that exactly one of the possibilities above holds for all z ∈ D.
We also verify that m(z) is a continuous function of ρ.
Proof. In order to denote the dependence on ρ, we let m ρ (z) be the function defined by for −1 ≤ ρ ≤ 1. Since the roots of a (monic) polynomial are continuous functions of the coefficients (see [20, 50]), m ρ (z) depends continuously on ρ for ρ ≠ 0. Subtracting (4.9) from the equation above yields for |ρ| ≤ ε. Since |z| > 2, we conclude that m ρ (z) is continuous at ρ = 0.
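Under the same assumed form of (4.1) (ρ m² + z m + 1 = 0; an assumption, since the display is missing here), the content of the lemma can be illustrated numerically: for |z| > 2 the Stieltjes-transform branch m ρ (z) converges to the ρ = 0 solution −1/z as ρ → 0, even though the equation degenerates from quadratic to linear there.

```python
import numpy as np

def m_rho(z, rho):
    # Stieltjes-transform branch of rho*m^2 + z*m + 1 = 0 (assumed form of (4.1));
    # at rho = 0 the equation degenerates to the linear equation z*m + 1 = 0.
    if rho == 0:
        return -1 / z
    d = np.sqrt(z ** 2 - 4 * rho + 0j)
    roots = [(-z + d) / (2 * rho), (-z - d) / (2 * rho)]
    # For |z| > 2 the branch points +-2*sqrt(rho) stay inside |z| <= 2,
    # so the branch with m ~ -1/z is well separated from the other one.
    return min(roots, key=lambda m: abs(m + 1 / z))

z = 3.0
for rho in [0.5, 0.1, 0.01, 0.001]:
    print(rho, abs(m_rho(z, rho) - m_rho(z, 0)))  # shrinks with rho
```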

Truncation arguments and the isotropic limit law
In this section, we begin the proof of Theorem 2.4 and Theorem 2.10 by reducing to the case where we only need to consider the truncated matrices {Ŷ N } N ≥1 .
5.1. Isotropic limit law. This subsection is devoted to Theorem 2.4. We will prove Theorem 2.4 using the following isotropic limit law, which is inspired by the isotropic semicircle law developed by Knowles and Yin [35, 36].
Assuming Theorem 2.10 and Theorem 5.1, we complete the proof of Theorem 2.4. By the singular value decomposition, we write Lemma 5.2 (Eigenvalue criterion). Let z be a complex number that is not an eigenvalue of 1 √ N Y N . Since 1 √ N Y N − zI is invertible by assumption, we rewrite the above equation as The claim now follows from (2.2) and (3.1).
Remark 5.3. The proof of Lemma 5.2 actually reveals that provided the denominator does not vanish. Versions of this identity have appeared in previous publications, including [2, 8, 9, 10, 18]. Following Tao in [49], we define the functions where m(z) is defined in (4.3). Both f and g are meromorphic functions outside E ρ that are asymptotically equal to 1 at infinity. By Lemma 5.2, the zeroes of f coincide with the eigenvalues of 1 √ N Y N + C N lying outside E ρ . From (5.1) we see that the multiplicity of any such eigenvalue is equal to the degree of the corresponding zero of f . It follows from (2.2) that where λ 1 (C N ), . . . , λ k (C N ) are the non-trivial eigenvalues of C N (some of which may be zero).
In order to study the zeroes of g, we consider the values of z ∉ E ρ for which Indeed, for 0 < |λ| ≤ 1, there does not exist z ∉ E ρ which solves (5.2); for |λ| > 1, (5.2) holds if and only if This follows from (4.1) and an analytic continuation argument. By Theorem 2.2 (which was proved in Section 2 assuming Theorem 2.10 holds), it follows that a.s., for N sufficiently large, all the eigenvalues of 1 √ N Y N are contained in E ρ,δ . By Rouché's theorem, in order to prove Theorem 2.4, it suffices to show that a.s. sup One technical issue that arises when ρ ≠ 0 is that the solution of (4.1) has two distinct analytic branches m and m 2 . In order to overcome this obstacle, we make the following observations.
It follows that (5.2) has no solution outside the ellipse when |ρ| ≤ |λ| ≤ 1. Furthermore, for |λ| sufficiently large, one can deduce the solution (5.3) for the branch m and then extend to the region |λ| > 1 by analytic continuation. Similarly, one can show that (5.2) has no solution outside the ellipse when |λ| < |ρ|. We also refer the reader to [19] for the location of the outlier eigenvalues. Indeed, (5.3) can be obtained using techniques from free probability. Let µ sc,ρ be the semicircle distribution with variance ρ and let µ circ,1−ρ be the uniform distribution on the disk centered at the origin in the complex plane with radius √(1 − ρ). Consider the sum S ρ + C 1−ρ with S ρ and C 1−ρ free random variables. Outside of the ellipse, the Stieltjes transform of µ ρ can be expressed as the Stieltjes transform of the circular law evaluated at the subordination function F (z) = z + ρm(z). This can be seen by adding the R-transforms together and inverting to obtain the Stieltjes transform. The inverse function of F is H(z) = z + ρ/z, which is precisely the function appearing in (5.3).
The function H plays the same role here as in [19]. Since we are only interested in solutions outside the ellipsoid, the domain of H is restricted to |z| > 1.
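The prediction z = λ + ρ/λ = H(λ) for the outlier can be observed in simulation. The sketch below uses Gaussian atom variables and a rank-one perturbation θuuᵀ with θ > 1; the parameters N, ρ, θ are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, rho, theta = 1000, 0.5, 3.0  # theta > 1, so one outlier is expected

# Elliptic Gaussian ensemble: unit-variance entries with E[y_ij y_ji] = rho,
# built from symmetric and antisymmetric parts of an iid Gaussian matrix.
A = rng.standard_normal((N, N))
S = (A + A.T) / np.sqrt(2)
K = (A - A.T) / np.sqrt(2)
Y = np.sqrt((1 + rho) / 2) * S + np.sqrt((1 - rho) / 2) * K

u = np.zeros(N)
u[0] = 1.0
X = Y / np.sqrt(N) + theta * np.outer(u, u)  # rank-one additive perturbation

eig = np.linalg.eigvals(X)
outlier = eig[np.argmax(np.abs(eig))]
print(outlier, theta + rho / theta)  # outlier ~ H(theta) = theta + rho/theta
```

The bulk fills the ellipse with real semi-axis 1 + ρ = 1.5 here, so the outlier near H(3) ≈ 3.17 is well separated from the spectrum.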
We now reduce the proof of Theorem 5.1 to the case where we only need to consider the truncated matrices {Ŷ N } N ≥1 . We let m̂(z) be the function given by (4.3) with ρ replaced by ρ̂.
Theorem 5.5 (Isotropic limit law for Ŷ N ). Let {Y N } N ≥1 be a sequence of random matrices that satisfies condition C0 with atom variables (ξ 1 , ξ 2 ). Let ε > 0. Let L > 0, and consider the truncated random matrices {Ŷ N } N ≥1 from Lemma 3.4. For each N ≥ 1, let u N and v N be unit vectors in C N . Fix z ∈ C with 5 ≤ |z| ≤ 6. Then a.s., for N sufficiently large, We now prove Theorem 5.1 assuming Theorem 2.10 and Theorem 5.5.
Set ε′ := min{ε/100, ε}, and let N be an ε′-net of D. By Lemma 3.11, |N | = O(1). By Theorem 5.5, we have a.s., for N sufficiently large, Furthermore, by Lemma 3.4 and Lemma 4.8 (taking L sufficiently large), we have a.s., for N sufficiently large, We now extend this bound to all z ∈ D. By Lemma 3.1, Lemma 3.3, and (3.3), we have a.s., for N sufficiently large, for all z, w ∈ D. Fix a realization in which (5.5) and (5.6) hold. Choose w ∈ D.
Then there exists z ∈ N with |z − w| ≤ ε′. Thus, from (5.6), we have On the other hand, from the uniform continuity of m, we have Combining the bounds above with (5.5), we conclude that, for N sufficiently large, for any fixed realization in which (5.5) and (5.6) hold. Thus, by Lemma 3.4, we obtain a.s., for N sufficiently large, by taking L sufficiently large. Combining the bound above with (5.7), we conclude that a.s., for N sufficiently large, Since ε is arbitrary, we in fact obtain that a.s.
as N → ∞. This implies that a.s., for N sufficiently large, On the other hand, by (5.8), we have a.s., for N sufficiently large, Combining (5.9) and (5.10), we obtain (5.4), and the proof is complete.
We will prove Theorem 5.5 in Section 7.
Proof. By Markov's inequality and the Borel-Cantelli lemma, it suffices to show that We write We now consider the pairs for which E[y i1j1 y i2j2 y i3j3 y i4j4 ] is nonzero. From Definition 1.11, we see that each pair (i s , j s ) must correspond to some (i r , j r ) or (j r , i r ) for s ≠ r. Counting all such pairs yields (5.11).
Define the functions By (5.3), it follows that g has precisely one zero outside E ρ located at µ √ N + ρ/(µ √ N ). By Lemma 5.2 and Theorem 2.2, a.s. the eigenvalues of X N + µ √ N ϕ N ϕ * N outside E ρ,δ correspond to the zeroes of f . From Theorem 5.1, we see that a.s.
uniformly for z ∉ E ρ,δ . We conclude that if f has a zero outside E ρ,δ , it must tend to infinity with N . Thus, for the remainder of the proof, we restrict our attention to the region |z| ≥ 5. It remains to show that f a.s. has exactly one zero in the region |z| ≥ 5, taking the value z = µ √ N + o(1). By writing out the Neumann series and applying Lemma 3.3, we obtain a.s.
uniformly for |z| ≥ 5. Thus, by Lemma 5.6, we conclude that a.s., uniformly for |z| ≥ 5. Let ε > 0; from Rouché's theorem, we conclude that a.s., for N sufficiently large, f has exactly one zero in the disk of radius ε centered at µ √ N . Let z be any zero of f outside E ρ,δ . Since z tends to infinity with N , we apply (5.12) and obtain a.s.
Therefore, we conclude that a.s., for N sufficiently large, f has precisely one zero outside E ρ,δ taking the value z = µ √ N + o(1), and the proof is complete.
provided all the relevant matrices on the right-hand side are invertible. From Theorem 5.7, we have a.s., for N sufficiently large, It thus suffices to show that a.s., for N sufficiently large, for some constant C > 0. From Lemma 3.4, it follows that a.s., for N sufficiently large, ∥X N − X̂ N ∥ ≤ C/L for some constant C > 0. Thus, by taking L sufficiently large, we conclude that Thus, by the Neumann series, we obtain a.s., for N sufficiently large, and the proof is complete.
We now reduce to the case where z is fixed (as opposed to taking the infimum over uncountably many complex numbers). We proceed using an ε-net argument and the following theorem.
Proof of Theorem 5.7. Let δ > 0, and let c > 0 be the constant from Theorem 5.8. We first note that a.s., for N sufficiently large, by Lemma 3.1 and Lemma 3.3. Thus, it suffices to show that a.s., for N sufficiently large, Let N be a c/10-net of the compact region D. By Lemma 3.11, |N | = O(1). Thus, by applying Theorem 5.8 to each z ∈ N , we obtain a.s., for N sufficiently large, We now extend this bound to all z ∈ D. Fix a realization in which (5.14) holds. Choose z ∈ D. Then there exists z′ ∈ N with |z − z′| ≤ c/10. By Weyl's perturbation theorem (see for instance [12]), Thus, we conclude that for any realization in which (5.14) holds. The proof of the theorem is complete.
We will prove Theorem 5.8 in Section 6.

5.4.
Notation. It remains to prove Theorem 5.5 and Theorem 5.8. As such, for the remainder of the paper we only consider the truncated matrices {Ŷ N } N ≥1 from Lemma 3.4 for some arbitrarily large fixed constant L > 0. Thus, we drop the decorations from our notation and simply write Y N , X N , G N for the matrices Ŷ N , X̂ N , Ĝ N . Similarly, we write m N (z) for the function m̂ N (z); we also write m(z) for the function m̂(z).

Least singular value bound
This section is devoted to Theorem 5.8. For this entire section we work with fixed z satisfying the hypothesis of Theorem 5.8.
6.1. Hermitization. Recall the Hermitization H N and its resolvent R N (q) defined in Section 3.2.
In this section, for any matrix H with entries that are 2 × 2 blocks, we mean tr N (H) = (1/N) Σ i H ii , where H ii is the i-th diagonal 2 × 2 block of H. When working with N × N matrices with entries that are 2 × 2 blocks, we use superscripts to refer to entries of the 2 × 2 blocks. Additionally, when forming an N × N matrix whose ij-th entry is the ab-th entry (a, b ∈ {1, 2}) of the ij-th 2 × 2 block, we also use superscripts. For example, R 21 is the N × N matrix formed by taking each R ij block and replacing it by its (2,1)-entry.
Let Γ N (q) := tr N (R N ). By the symmetry of the matrix H N , Σ l R 22 ll = Σ l R 11 ll , i.e. Γ 11 N = Γ 22 N . Let a N (q) := Γ 11 N (q), b N (q) := Γ 12 N (q), and c N (q) := Γ 21 N (q). From the calculations in [40] (see also [39]), it follows that Γ N (q) converges almost surely, for each fixed q, to a limit Γ(q), the 2 × 2 matrix with rows (a(q), b(q)) and (c(q), a(q)). This block matrix Stieltjes transform satisfies the fixed point equation where Σ is the operator on 2 × 2 matrices defined by The fixed point equation should be viewed as a matrix version of (4.1). For more information on the use of this block matrix resolvent, we refer the reader to [14, 15] and the references within.
For an N × N matrix A, let ν A denote the symmetric empirical measure built from the singular values of A. That is, where σ 1 (A) ≥ · · · ≥ σ N (A) are the singular values of A. The measure ν A is also the empirical spectral measure of the Hermitization of A. It was established in [40] that ν X N −zI converges almost surely to a probability measure ν z as N → ∞. In Appendix C, we study the properties of Γ(q) and ν z . In particular, we will establish the following bound on the support of ν z when z is outside the ellipsoid.
Theorem 6.1. Fix −1 < ρ < 1 and let δ > 0. Then there exists c > 0 such that In the case ρ = 1, a lower bound on the singular values follows from [5, Chapter 5]; the ρ = −1 case can be obtained by symmetry.
Remark 6.3. We give a complete proof of Theorem 6.1 in Appendix C. We quickly describe an alternative proof using techniques from free probability. From [40] and the work of Voiculescu [51], one can study the limiting measure ν z by considering the distribution of where S and C are free non-commutative random variables, S is a semi-circular variable, and C is a circular variable. Indeed, Biane and Lehner [13] showed that the spectrum of √ ρS + √ 1 − ρC is the ellipsoid E ρ . Therefore, for any z ∈ C with dist(z, E ρ ) ≥ δ, it follows that 0 is not in the spectrum of A z , and hence 0 is not in the support of the distribution of A z . A continuity argument then implies that for any M > 0, there exists some c > 0 such that for a.e. z ∈ C with |z| ≤ M and dist(z, E ρ ) ≥ δ, we have ν z ([−c, c]) = 0.
Proving Theorem 5.8 is equivalent to showing that a.s. ν X N −zI ([0, c]) = 0 for some c > 0. By Theorem 6.1, we choose c such that ν z ([0, 2c]) = 0. In order to show that ν X N −zI ([0, c]) = 0, we will show that a N (q) is close to a(q) for q as in (3.11) with η = E + √ −1 t, E ∈ [0, c], and t sufficiently small. As a N (q) and a(q) are the Stieltjes transforms of ν X N −zI and ν z , respectively, at the point η, this will allow us to compare the two measures. The equations involving a N (q) depend crucially on b N (q) and c N (q), so it is actually more straightforward to show Γ N (q) is close to Γ(q). We should note that the empirical spectral measure µ X N can be recovered by the formula −πµ X N = lim η=√ −1 t→0 ∂ z b N . This formula only uses purely imaginary η. We consider more general η, and a connection to the empirical spectral measure does not seem to be available.
In order to show that almost surely there are no singular values of X N − zI less than c, we follow the ideas of Bai and Silverstein [4]. First we prove an a priori bound on Γ N (q)−Γ(q), then use martingale inequalities to bound Γ N (q)−E[Γ N (q)], and finally bound E[Γ N (q)]−Γ(q). Because of the correlations between X ij and X ji we don't directly study the Stieltjes transform of the empirical spectral measure of (X N −zI) * (X N −zI), but instead consider the linearized problem and study Γ N (q). Similar linearization tricks have been used to study eigenvalues of polynomials of Wigner matrices (e.g. see [1,34]).
Since the vector space of 2 × 2 matrices is finite dimensional, all norms on it are equivalent. Therefore · in this section can be any norm, but the reader might find it useful to think of it as the max of the entries of the matrix. In order to show a 2 × 2 matrix converges, it suffices to show that each entry of the matrix converges. We will often employ this strategy.
We conclude this section with some useful matrix identities and notation.
We let H i denote the i-th column (of 2 × 2 blocks) of H N . By Schur's Complement, the diagonal entries of the resolvent are Recall that the diagonal elements of X N , and hence of H N , have been set to zero. Let γ (i) Summing over i gives the trace: Lemma 6.4. There exist α, β > 0 such that if q is as in (3.11) with t N ≥ N −β , then almost surely The proof will show that we can take α = 1/3 and β = 1/16; these values are not optimal, but are sufficient for our purposes. We will require that α + β < 1/2 and β < α. We define the modified resolvent Ř, where e i is the N × 1 vector (of 2 × 2 blocks) whose i-th block is the identity matrix and whose other entries are zero. The difference between the trace of Ř and that of the resolvent (both viewed as 2N × 2N matrices) is small. Indeed, by the resolvent identity, the trivial bound on the resolvent, (3.8), shows the operator norm of the difference is bounded by 2t N −1 . Thus, we obtain the estimate Since we assume t N ≥ N −β , this term is deterministically bounded by CN β−1 uniformly for E ∈ [0, c] and 1 ≤ i ≤ N . We now bound γ (i) with a and b either 1 or 2, and a′ = a + 1 (mod 2), b′ = b + 1 (mod 2). The final estimate uses that N times the operator norm of a self-adjoint matrix bounds its trace. The trivial bound shows the operator norm is bounded by t N −2 . Then by Chebyshev's inequality and the union bound, In order for this term to converge to zero we require that α + β < 1/2. Then p can be chosen large enough to make the right-hand side summable. An application of the Borel-Cantelli lemma implies almost sure convergence.
Since α + β < 1/2 implies that β − 1 < −α, we conclude the proof of the lemma. Now we state and prove our a priori bound. Proof. First note that it is sufficient to prove the estimate on S N . If |E − E′| ≤ δ, then ∥Γ N (q) − Γ N (q′)∥ = ∥Γ N (q)(q − q′)Γ N (q′)∥ ≤ δt N −2 . Therefore showing that Γ N (q) − Γ(q) = O(N −β ) for E ∈ S N with t N > N −β implies the bound Γ N (q′) − Γ(q′) = O(N −β ) for all q′ with E ∈ [0, c].

We introduce the notation
Let Λ N be the event that max 1≤i≤N i N ≤ N −α . By Lemma 6.4, 1 Λ N = 1 almost surely. With this notation we rewrite (6.4) as For sufficiently large N we have the bound, Thus, we can solve for −(q + Σ(Γ N (q))) −1 1 Λ N :
Sinceq1 Λ N = (q + Σ( N ))1 Λ N and Σ( N ) 1 Λ N converges to zero, we can choose N sufficiently large such that the imaginary part of the diagonal entries ofq are almost surely greater than zero, yielding the almost sure bound Γ(q) ≤ K.
Then using that ∥Γ N (q)∥ ≤ t N −1 , we obtain almost surely We conclude for η such that 1 > K|t N | −1 and E ∈ [0, c], and then for all η with t N > N −β and E ∈ [0, c] by analytic continuation, that almost surely Γ N (q) = Γ(q) + N . Now we define the Stieltjes transforms of the measures ν X N −zI and ν z restricted to [−2c, 2c] and its complement to be , and observe that 1/((x − E) 2 + t N 2 ) forms a uniformly bounded, equicontinuous family as a function of E ∈ [0, c] for x ∈ [−2c, 2c]. Furthermore, since ν X N −zI converges almost surely to ν z by the calculations in [40] (see also [39]), we can conclude that a.s. sup E∈S N t N −1 (Im(a N in (q)) − Im(a in (q))) −→ 0.
Combining this estimate with Lemma 6.5 gives that a.s.
We conclude this section with a bound on the number of singular values less than c, and then turn this into a bound on the trace of the resolvent. Let T N be a t N -net of [0, c]. Using the inequality So we conclude on the almost sure event Λ N that there are o(N 1−β ) eigenvalues in the interval [0, c]. We will require a similar a priori bound on the number of small eigenvalues for the (N − 1) × (N − 1) submatrices X (i) N , defined by removing the i-th row and column of X N . Thus, we define the event Λ i that ν X (i) N −zI ([0, c]) = o(N −β ). By the interlacing theorem, Λ N ⊂ Λ i , so Λ i also occurs almost surely.
For k = 1, . . . , N , let E k be averaging with respect to the first k rows and columns of X N , and let E 0 be the identity. Since ν X N −zI ([0, c]) is bounded and almost surely o(N −β ), and E k [ν X N −zI ([0, c])] forms a martingale, we can apply Lemma 3.9 to obtain the almost sure estimate Repeating the argument shows this estimate also holds for the submatrices. Now we use the spectral theorem to turn this bound on the number of singular values into a bound on the trace of powers of the resolvent. In order for this bound to be useful we will increase the imaginary part of η.
This term is O(1) if p = 1, 2 because t N = N −β/4 . The same argument bounds the (2,1) and (2,2) terms. The above computation also verifies the lemma when a row or column has been removed.
Before proceeding with the proof, we define the relevant notation and give a lemma containing crude estimates.
Applying (3.7) to R N (which we view as an N × N matrix) yields Note that by Schur's Complement (3.6) the first term is an entry of the resolvent: We define . In order to study R ii we introduce a non-random 2 × 2 matrix Note that this is not actually an entry of a resolvent. In order to control the fluctuations of R ii , we use the resolvent identity to compare R ii with R ii : This motivates the definition γ (i) We remind the reader γ (i) Redefine S N to be an N −2 -net of the interval [0, c]. Once again it suffices to prove the theorem for E ∈ S N . Lemma 6.9. For a, b ∈ {1, 2} and p ≥ 2: There exists some K such that for all large N , We note that part of the use of the first inequality is the equality γ (i) Proof. Using the martingale inequality, (3.7), and the bound on the trace of a matrix and a submatrix, (6.5), we can bound any entry as Combining this estimate with (6.6) leads to To bound R ii , we begin with Combining this estimate with the trivial bound |R ii | ≤ |t N | −1 leads to: The last term is bounded for η in our domain, and the proof is complete.
Proof of Theorem 6.8. To control Γ N (q) − E[Γ N (q)], we rewrite it as a sum of martingale differences. Using the martingale difference decomposition and the formula for the differences of traces of submatrices (6.10), we have (6.16). To complete the proof it suffices to show that for arbitrary ε > 0, Recalling that To estimate (6.17) we iteratively apply (6.11), leading to: After applying this expansion to (6.17), it suffices to show that each entry of the 2 × 2 block converges to zero almost surely. By the triangle inequality, it suffices to bound an arbitrary product of entries of the blocks in the expansion. For the remainder of this section, we use lower case superscripts starting with the beginning of the alphabet to denote the values 1 or 2.
To bound the R ii γ (i) term, we apply Rosenthal's inequality (Lemma 3.6), the bound on moments of quadratic forms (Lemma B.1), the bound on R ii (6.15), and the a priori bound (6.9). We obtain which is summable for large p. Recall that b′ = b + 1 (mod 2). The same estimates are used to bound the R ii ζ (i) term. In order to bound R ii γ (i) N , we begin with Burkholder's inequality (Lemma 3.7) and then bound R ii ζ (i) N by (6.5) and the bound on R ii given in (6.15), along with the Cauchy-Schwarz inequality and the estimate on quadratic forms (6.14).
The estimate of the R ii γ (i) ) term is done the same way.
Choosing p large enough in the above estimates to make the right-hand sides summable, an application of Borel-Cantelli shows that almost surely . We now show that for q as in (3.11) with t N = N −β/4 and β as in Lemma 6.4, We begin in a similar fashion to the a priori estimates with Schur's Complement, from which we will subtract −(q + Σ(E[Γ N (q)])) −1 . We will apply the resolvent identity, add and subtract Σ(E[Γ (1) N (q)]), and repeatedly apply the identity This leads to the expansion Note that the third line is zero, because E[γ Before proving the lemma, note that (6.18) will follow from a straightforward application of this lemma, the triangle inequality, Hölder's inequality, and the estimates ∥Γ N ∥ ≤ t N −1 , ∥R ii ∥ ≤ K, and the estimate on quadratic forms (6.14).
Proof. Using the formula for the difference between traces, (6.10), the bound on the trace of the resolvent, (6.9), and the bound on quadratic forms, (3.18), we obtain The first term of (6.20) is bounded from a direct calculation and the second term uses the martingale difference decomposition and the expansions of the previous estimates.
The final estimate follows from and the boundedness of E[Γ N (q)]. The proof of the lemma is complete.
Then the arguments of Lemma 6.6 can be repeated to prove (6.18).
Taking differences for different k 1 and k 2 gives Repeating this for all values of k and splitting the integral over two regions leads to: The first integrand forms a uniformly bounded, equicontinuous family, so the integral converges to zero by the weak convergence of ν X N −zI to ν z . The summand is uniformly bounded away from zero when evaluated at a singular value in the interval [0, c]. So we conclude that almost surely there are no singular values in the interval [0, c].

Isotropic limit law
This section is devoted to Theorem 5.5. We divide the proof of Theorem 5.5 into the following three steps.
(1) Showing that the diagonal entries of G N (z) converge uniformly to m(z).
(2) Establishing a rate of convergence of the off-diagonal entries of G N (z) to zero.
By Lemma 3.5, it follows that Ω N holds with overwhelming probability. We establish the following convergence result for the diagonal entries of G N (z).
Lemma 7.1 (Diagonal entries). Let ε > 0. Then, for N sufficiently large, Proof. We introduce the following notation. For any 1 ≤ i ≤ N , we let Y (i) N be the (N − 1) × (N − 1) matrix formed from Y N by removing the i-th row and i-th column. We let r i denote the i-th row of Y N with the i-th entry removed; let c i denote the i-th column of Y N with the i-th entry removed. We let Ȳ (i) N denote the N × N matrix formed from Y N by setting the entries in the i-th row and i-th column to zero. We define Ǧ on the event Ω N . Fix 1 ≤ i ≤ N and z ∈ C with |z| ≥ 5. Let ε > 0. By the Schur complement (since the diagonal entries of Y N are zero), we have that We observe that (r T i , c i ) and G (i) N are independent, and on Ω N because |z| ≥ 5. Observe that Ȳ (i) N − Y N has rank at most 2. By the resolvent identity, we have ≤ C N on the event Ω N . We conclude that, for N sufficiently large, with overwhelming probability. By (7.3) and (7.4), it follows that with overwhelming probability. Let D be a compact, connected set that satisfies If ρ = 0, we additionally assume that there exists z 0 ∈ D with Such a choice of z 0 in (7.6) always exists by (4.2). We now extend (7.5) to all z ∈ D. Let N be an ε-net of D. By Lemma 3.11, |N | = O(1). Thus, by the union bound, we have with overwhelming probability. Fix a realization in the event Ω N such that (7.7) holds. By (3.3) and (7.1), for all z, z′ ∈ D. Let z ∈ D. Then there exists z′ ∈ N with |z − z′| ≤ ε. Thus, we have Therefore, we conclude that ≤ Cε with overwhelming probability. By the union bound, we have with overwhelming probability. Thus, with overwhelming probability, If ρ ≠ 0, we conclude that with overwhelming probability. We now obtain this bound in the case that ρ = 0 by applying Lemma 4.7. In view of (7.6) and Lemma 3.1, we have on the event Ω N . Thus, by Lemma 4.7, we conclude that (7.10) holds with overwhelming probability for any −1 ≤ ρ ≤ 1. By (7.8), (7.9), and (7.10), we obtain with overwhelming probability.
Since Ω N holds with overwhelming probability, we have sup with overwhelming probability. By Lemma 4.6 and (7.1), we conclude that, for N sufficiently large, The proof of the lemma is complete.

7.2.
Off-diagonal entries. Let H N be the Hermitization of X N as in Section 6. Once again we will view H N as an N × N matrix of 2 × 2 blocks. We reuse the notation from Section 6 with q = q(z, η). Let R 21 N (η, z) be the N × N matrix with (R 21 N (η, z)) ij = R 21 ij (η, z). We begin by noting that, when defined, u * N G N (z)v N = u * N (R 21 N (0, z))v N . Just as in Section 6, when we only needed to control R 11 ii but found it easier to instead control the block R ii , here we will estimate the 2 × 2 block R ij for i ≠ j in order to control R 21 ij . We should note that many of our estimates will involve the norm ∥R N (z, η)∥, but on the event that there are no eigenvalues outside the ellipse this norm is O(1). Lemma 7.2 (Off-diagonal entries). Fix z, η ∈ C with 5 ≤ |z| ≤ 6 and Im(η) > 0.
Proof. We begin with Schur's complement, (3.6), with A being the upper 1 × 1 block (of 2 × 2 blocks), D being the lower (N − 1) × (N − 1) block, and B and C being the corresponding off-diagonal blocks. Then for i = 1, Other elements of R N can be computed by permuting the rows and columns of H N before applying Schur's complement.
Combining the identities (generalized to an arbitrary element) for i ≠ j leads to Additionally, recall that the diagonal entries of the resolvent are We begin with (7.11), then apply (6.11) two times, and finally (7.12), to obtain The first term is zero because E[H il ] = 0. We estimate the other terms as in Section 6 and bound each entry of the 2 × 2 blocks. Each entry is a sum of products of entries from the blocks. Thus, by the triangle inequality, it suffices to bound arbitrary products of each block's entries. As before, lower case superscripts from the beginning of the alphabet are all either 1 or 2.
To bound the third term we apply Hölder's inequality and directly compute the moments: (7.14) We begin estimating the second term by averaging over the i-th row and column of H.
We now apply the Cauchy-Schwarz inequality with (6.15) to get a weaker bound than desired. Once the weaker bound is proven, we will return to (7.15) and prove the desired bound.
which combined with (7.14) implies E[R ij ] = O(N −1 ). Returning to (7.15), applying (6.11) and (7.13) leads to: The first term uses the just-verified O(N −1 ) bound and the second uses the Cauchy-Schwarz inequality and a direct computation.
We now establish the following concentration result.

Lemma 7.3 (Concentration of bilinear forms).
Let ε > 0. Fix z ∈ C with 5 ≤ |z| ≤ 6. Then a.s., for N sufficiently large, The proof below is based on the arguments of Bai and Pan [6]. Let ε > 0 and fix 5 ≤ |z| ≤ 6. We will drop the dependence on z and simply write G N to denote the matrix G N (z). We introduce the following notation. Let X (k) N be the matrix obtained from X N by replacing all elements in the k-th column and k-th row with zero. Define G (k) Let r k be the k-th row of X N ; let c k be the k-th column of X N . Let E k denote the conditional expectation given r k+1 , . . . , r N , c k+1 , . . . , c N . Let e 1 , . . . , e N denote the standard basis of C N . Let . We will take advantage of the fact that all the elements of the k-th column and k-th row of G (k) N are zero except that the (k, k)-th element is −1/z. Thus, It follows from the definitions above that Define the events Ω (k) We let 1 Ω (k) N denote the indicator function of the event Ω (k) and η (k) We now collect a variety of preliminary calculations and bounds we will need to complete the proof.
(i) By (3.4) and (7.17), we have Similarly, we obtain 1 By the Schur complement, we have that Thus, on the event Ω N , we have |α (k) On the event Ω c N , α (k) N = 1. Therefore, we conclude that a.s., |α Similarly, we have a.s., (ii) By the Burkholder inequality (Lemma 3.7), for any p > 2, we have Similarly, N . (iv) By (3.4) and (7.17), we have (vii) We note that the entries of r k and c k have mean zero, variance 1/N , and are a.s. bounded by 4L/ √ N . Moreover, (r T k , c k ) and G (k) N are independent. Thus, by Lemma B.1 in Appendix B, for any p ≥ 2, we have (viii) By the bounds in (i), we have sup 1≤k≤N γ (k) Thus, by the Burkholder inequality, for any p ≥ 2, we have We now complete the proof of the lemma. Indeed, it suffices to show that, for any p > 2, We begin by decomposing u * In view of (ii) and the fact that Ω N holds with overwhelming probability, it suffices to show that, for any p > 2, By the resolvent identity, we have By (iii), (iv), (v), and (7.16), we decompose Similarly, by (iii), (iv), and (7.16), we have Therefore, in order to complete the proof, it suffices to show that, for any p > 2, We bound each term individually. By Rosenthal's inequality (Lemma 3.6), we have, for any p > 2, Here we used Lemma B.3 from Appendix B to verify that, for any p ≥ 2, Similarly, by another application of Rosenthal's inequality, one obtains, for any p > 2, we apply (viii) to obtain E|φ N 121 | p = O L,p (N −p/2 ) for any p ≥ 2. By (i), (vii), and Rosenthal's inequality, we have, for any p > 2, By definition of η (k) N , we have From (i), we have and thus, by the Burkholder inequality, we have, for any p ≥ 2, On the other hand, by (vii) and Rosenthal's inequality, for any p > 2, we conclude that The proof of the lemma is complete.
7.4. Proof of Theorem 5.5. We are now ready to prove Theorem 5.5 using the results of the previous subsections.
Since Ω N holds with overwhelming probability, it suffices to show that a.s., for N sufficiently large, The first term is a.s. less than ε/8 by Lemma 7.3. The second term is bounded by noting that R 21 N (0, z) = G N (z) and using (3.3) to conclude that Thus, it suffices to show that We will verify (7.20) by considering the diagonal entries and off-diagonal entries of R 21 N (η, z) separately. For the diagonal terms we write by the Cauchy-Schwarz inequality. By (7.19) and Lemma 7.1, we have Thus, it suffices to show that Since Ω N holds with overwhelming probability, we have (say) by the deterministic bound R N (η, z) ≤ Im(η) −1 . Thus, it suffices to show that From Lemma 7.2 and the Cauchy-Schwarz inequality, we see that and the proof is complete.
Appendix A. Truncation of elliptic random matrices
In this appendix, we establish Lemma 3.4.
We take L 0 > 1 sufficiently large such that, for each i ∈ {1, 2}, Var(ξ i ) ≥ 1/2 for all L > L 0 . Assume L > L 0 . Then (3.14) follows by an application of the triangle inequality. Moreover, ξ̂ 1 , ξ̂ 2 have mean zero and unit variance by construction. Thus, {Ŷ N } N ≥1 is a sequence of random matrices that satisfies condition C0 with atom variables (ξ̂ 1 , ξ̂ 2 ). We now make use of the following bounds: if ψ is a random variable with finite fourth moment, then We note that Thus, by the Cauchy-Schwarz inequality and (A.2), we obtain for some constant C > 0 depending on M 4 . Similarly, we have for i ∈ {1, 2}.
for some constant C > 0 depending on M_4. Consider the second term on the left-hand side. We write We now apply [ The proof of the lemma is complete.
Appendix B. A large deviation estimate for bilinear forms

This section is devoted to proving a large deviation estimate for bilinear forms. Throughout this section, we let K_p denote a constant that depends only on p. These constants are non-random and may take on different values from one appearance to the next.
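As a hedged numerical illustration of the kind of estimate proved here (the matrix B and all parameters below are ours, not the paper's), a bilinear form x^T B x with independent mean-zero, unit-variance entries of x concentrates around its mean tr B:

```python
# Monte Carlo sketch: x^T B x concentrates near tr(B) when the entries of x
# are independent with mean zero and unit variance.  The choice of B, the
# dimension, and the sample count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 30
# A fixed deterministic matrix B (identity plus a small perturbation).
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))

samples = []
for _ in range(2000):
    x = rng.standard_normal(n)   # independent N(0,1) entries
    samples.append(x @ B @ x)    # the bilinear form x^T B x

mean_form = np.mean(samples)
print(abs(mean_form - np.trace(B)))  # small: the form concentrates near tr(B)
```

The fluctuations of a single sample are of order ‖B‖_F, so averaging 2000 samples brings the empirical mean well within unit distance of tr B.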
Proof of Lemma B.1. Let {F_i}_{i=0}^N denote the sequence of increasing σ-algebras defined by F_i = σ(x_1, y_1, x_2, y_2, . . . , x_i, y_i) for i = 1, 2, . . . , N. Following the usual convention, we let F_0 denote the trivial σ-algebra. We will make continual use of this filtration throughout the proof.

We begin by writing (B.1) We will bound each of the three terms on the right-hand side of (B.1) separately. We begin with the first term. By Lemma 3.6, Here we have used where λ_1(BB^*), . . . , λ_N(BB^*) denote the eigenvalues of BB^*. Combining the bounds above yields

We now consider the second term on the right-hand side of (B.1). By Lemma 3.6, We will bound each of the terms on the right-hand side separately. For the first term, we write Applying Lemma 3.8 and Lemma B.3, we have

≤ K_p ((tr BB^*)^{p/2} + µ_{2p} tr(BB^*)^{p/2}) ≤ K_p ((µ_4 tr BB^*)^{p/2} + µ_{2p} tr(BB^*)^{p/2}).
For the second term, we apply Lemma 3.6 and obtain We now note that Combining the two bounds above, we obtain The third term on the right-hand side of (B.1) is similarly bounded. The proof of the lemma is complete.
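The eigenvalue comparison involving λ_1(BB^*), . . . , λ_N(BB^*) above is presumably the following elementary fact (our phrasing): for p ≥ 2,

```latex
\[
  \operatorname{tr}\big[(BB^*)^{p/2}\big]
  = \sum_{i=1}^N \lambda_i(BB^*)^{p/2}
  \;\le\; \Big( \sum_{i=1}^N \lambda_i(BB^*) \Big)^{p/2}
  = (\operatorname{tr} BB^*)^{p/2},
\]
```

since ∑ a_i^q ≤ (∑ a_i)^q for nonnegative reals a_i and q ≥ 1.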

Appendix C. Properties of the limiting measure
This section is devoted to studying the limiting distribution of the singular values of (1/√N)Y_N − zI, where z ∈ C and {Y_N}_{N≥1} is a sequence of random matrices that satisfy condition C0 with atom variables (ξ_1, ξ_2). In particular, this section contains the proof of Theorem 6.1. Throughout this section, we fix ρ := E[ξ_1 ξ_2] with −1 < ρ < 1. Let E_ρ be the ellipsoid defined in (1.3). We let √−1 denote the imaginary unit and reserve i as an index.
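As a hedged numerical illustration of the elliptic law underlying E_ρ (the Gaussian atom variables, the dimension, and the value of ρ below are our assumptions, not the paper's), the eigenvalues of (1/√N)Y_N for an elliptic ensemble with correlation ρ settle inside the ellipse with semi-axes 1 + ρ and 1 − ρ:

```python
# Sample an N x N elliptic random matrix: off-diagonal pairs (Y_ij, Y_ji)
# are bivariate Gaussian with correlation rho, diagonal entries independent.
# Its eigenvalues, rescaled by 1/sqrt(N), fill the ellipse E_rho with
# semi-axes 1 + rho (real direction) and 1 - rho (imaginary direction).
import numpy as np

rng = np.random.default_rng(1)
N, rho = 400, 0.5

Y = np.zeros((N, N))
for i in range(N):
    Y[i, i] = rng.standard_normal()
    for j in range(i + 1, N):
        a, b = rng.standard_normal(2)
        Y[i, j] = a                                  # correlated pair:
        Y[j, i] = rho * a + np.sqrt(1 - rho**2) * b  # Corr(Y_ij, Y_ji) = rho

eigs = np.linalg.eigvals(Y / np.sqrt(N))
# Ellipse membership score: values <= 1 lie inside E_rho.
score = (eigs.real / (1 + rho))**2 + (eigs.imag / (1 - rho))**2
print(score.max())
```

At finite N the extreme eigenvalues can protrude slightly beyond the ellipse, but the maximal membership score stays close to 1.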
Remark C.1. Many of the results in this section also hold when ρ = ±1 (although the proofs are different). In particular, Theorem 6.1 holds when ρ = ±1; see Remark 6.2 for further details.
Let a_N(η, z) be the Stieltjes transform of ν_{(1/√N)Y_N − zI} (defined in (6.2)). That is, for each z ∈ C, for η ∈ C^+ := {w ∈ C : Im(w) > 0}. We study the limiting distribution of the singular values by characterizing the limiting Stieltjes transform. We begin with the following lemma.
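For concreteness, a common way to write such a transform (our notation; the normalization in (6.2) may differ) is the following. Assuming ν denotes the symmetrized empirical singular value distribution of (1/√N)Y_N − zI,

```latex
\[
  a_N(\eta, z)
  = \int_{\mathbb{R}} \frac{d\nu(x)}{x - \eta}
  = \frac{1}{2N}\operatorname{tr}\big(H_N(z) - \eta I\big)^{-1},
  \qquad
  H_N(z) =
  \begin{pmatrix}
    0 & \tfrac{1}{\sqrt{N}} Y_N - zI \\
    \big(\tfrac{1}{\sqrt{N}} Y_N - zI\big)^* & 0
  \end{pmatrix},
\]
```

since the eigenvalues of the Hermitization H_N(z) are exactly the singular values of (1/√N)Y_N − zI together with their negatives.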
The proof of the lemma is complete.
Remark C.3. One can also use (C.4) and (C.5) to solve for b and c. Indeed, from Similarly, from (C.5), we have

Remark C.4. Fix z ∈ C. If (C.1) holds for all η with Im(η) > 0, then a, b, c can be viewed as functions of η. In this case, an upper bound for a can be obtained (see (C.11)). In fact, in view of Lemma C.7, a can be uniformly bounded from above for all Im(η) > 0. Thus, one can use (C.2) and Remark C.3 to obtain uniform upper bounds on b, c for all Im(η) > 0.
Proof. Fix z ∈ C. Since almost surely ‖(1/√N)Y_N − zI‖ = O_z(1), the sequence of measures is almost surely tight. Existence now follows from a subsequence argument and by applying [5, Theorem B.9] and Lemma C.2.
For the remainder of the section, we fix −1 < ρ < 1 and let ν_z denote the unique probability measure from Lemma C.6. Let a(η, z) be its Stieltjes transform, defined by (C.9) for all η ∈ C^+. It follows from Lemma C.2, Lemma C.6, and the calculations in [40] (see also [39]) that a_N(η, z) converges almost surely to a(η, z) as N → ∞ for each fixed z ∈ C and η ∈ C^+. By [5, Theorem B.9], the sequence of measures given in (C.10) converges almost surely to ν_z for each fixed z ∈ C. We now derive some properties of ν_z.
Proof. Fix z ∈ C. Since almost surely ‖(1/√N)Y_N − zI‖ = O_z(1) by Lemma 3.3, it follows that ν_z is compactly supported.
Choose C′ sufficiently large such that ν_z is supported on [−C′/2, C′/2]. Let C > 0 be the corresponding constant such that (C.11) holds. For any finite interval I ⊂ R, it follows from [5, Theorem B.8] that

ν_z(I) ≤ 2C|I|. (C.14)

Here we used the fact that the continuity points of the function x → ν_z((−∞, x]) are dense in R. It follows from (C.14) that ν_z has a bounded density. As the roots of a polynomial depend continuously on the coefficients (see [20, 50]), (C.8) and [5, Theorem B.10] imply that ν_z has a continuous density.
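The passage from a bounded Stieltjes transform to (C.14) is standard; as a sketch in our notation, if |a(η, z)| ≤ C for all η ∈ C^+, then for any interval I = [x − ε, x + ε],

```latex
\[
  \nu_z(I)
  \;\le\; 2\varepsilon \, \operatorname{Im}\, a\big(x + \sqrt{-1}\,\varepsilon,\, z\big)
  \;\le\; 2\varepsilon\, C
  \;=\; C|I|
  \;\le\; 2C|I|,
\]
```

using Im a(x + √−1 ε, z) = ∫ ε dν_z(t)/((t − x)^2 + ε^2) ≥ ν_z(I)/(2ε).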
Remark C.9. The measure µ_z, defined in the proof of Lemma C.8 above, is the almost sure limit of the empirical spectral measures built from the eigenvalues of ((1/√N)Y_N − zI)((1/√N)Y_N − zI)^*.

Lemma C.11 follows from a simple indirect proof; we leave the details to the reader.

Using Lemma C.8, we now verify Theorem 6.1.

Proof of Theorem 6.1. Since ν_z is the almost sure limit of the measures in (C.10), it suffices to show that there exists c > 0 such that ν_z([0, c]) = 0 for all z ∈ C with dist(z, E_ρ) ≥ δ. For each z ∈ C, we define x_z := sup{x ≥ 0 : ν_z([0, x]) = 0}.
By Lemma C.7, the set above is nonempty, and hence x_z ≥ 0 for all z ∈ C.
We remind the reader that, for |z| sufficiently large, the least singular value of (1/√N)Y_N − zI is trivially bounded below almost surely by Lemma 3.1, because we have the almost sure bound ‖(1/√N)Y_N‖ = O(1) from Lemma 3.3. Thus, it suffices to prove the theorem for all z in a compact set D ⊂ {z ∈ C : dist(z, E_ρ) ≥ δ}.
We now claim that x_z is continuous in z. Indeed, since ν_z is the almost sure limit of the measures in (C.10), we obtain almost surely as N → ∞. Thus, by Weyl's perturbation bound (see for instance [12]), for |z − z′| ≤ x_z, we have almost surely as N → ∞. We conclude that ν_{z′}([0, x_z − |z − z′|]) = 0 and hence We note that (C.24) trivially holds when |z − z′| > x_z. Repeating the argument with z and z′ reversed, we obtain |x_z − x_{z′}| ≤ |z − z′|.
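The Lipschitz estimate rests on Weyl's perturbation bound for singular values: since (A − zI) and (A − z′I) differ by (z′ − z)I, whose operator norm is |z − z′|, each singular value shifts by at most |z − z′|. A hedged numerical check (the test matrix and points z, z′ below are ours):

```python
# Weyl's bound: |s_k(A - zI) - s_k(A - z'I)| <= ||(z' - z)I|| = |z - z'|
# for every k, with singular values taken in decreasing order.
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n)) / np.sqrt(n)
z, zp = 0.3 + 0.4j, 0.5 + 0.1j

s1 = np.linalg.svd(A - z * np.eye(n), compute_uv=False)
s2 = np.linalg.svd(A - zp * np.eye(n), compute_uv=False)
max_shift = np.max(np.abs(s1 - s2))
print(max_shift, abs(z - zp))  # max_shift never exceeds |z - z'|
```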
We conclude that x_z is continuous in z. Since D is compact, it suffices to show that x_z > 0 for all z ∈ D. The claim now follows from Lemma C.8.