Local circular law for the product of a deterministic matrix with a random matrix

It is well known that the empirical spectral measure of the eigenvalues of a rescaled square non-Hermitian random matrix with independent entries satisfies the circular law. We consider the product $TX$, where $T$ is a deterministic $N\times M$ matrix and $X$ is a random $M\times N$ matrix with independent entries having zero mean and variance $(N\wedge M)^{-1}$. We prove a general local circular law for the empirical spectral distribution (ESD) of $TX$ at any point $z$ away from the unit circle, under the assumptions that $N\sim M$ and that the matrix entries $X_{ij}$ have sufficiently high moments. More precisely, if $z$ satisfies $||z|-1|\ge \tau$ for an arbitrarily small $\tau>0$, the ESD of $TX$ converges to $\tilde \chi_{\mathbb D}(z)\, dA(z)$, where $\tilde \chi_{\mathbb D}$ is a rotation-invariant function determined by the singular values of $T$ and $dA$ denotes the Lebesgue measure on $\mathbb C$. The local circular law is valid around $z$ up to scale $(N\wedge M)^{-1/4+\epsilon}$ for any $\epsilon>0$. Moreover, if $|z|>1$ or the matrix entries of $X$ have vanishing third moments, the local circular law is valid around $z$ up to scale $(N\wedge M)^{-1/2+\epsilon}$ for any $\epsilon>0$.


Introduction
Circular law for non-Hermitian random matrices. The study of the eigenvalue spectrum of non-Hermitian random matrices goes back to the celebrated paper [19] by Ginibre, where he calculated the joint probability density of the eigenvalues of a non-Hermitian random matrix with independent complex Gaussian entries. The joint density is integrable with an explicit kernel (see [19,28]), which allowed him to derive the circular law for the eigenvalues. For the Gaussian random matrix with real entries, the joint distribution of the eigenvalues is more complicated but still integrable, which also leads to a proof of the circular law [6,10,18,35].
For random matrices with non-Gaussian entries, there is no explicit formula for the joint distribution of the eigenvalues. However, in many cases the eigenvalue spectrum of a non-Gaussian random matrix behaves similarly to the Gaussian case as $N \to \infty$; this is known as the universality phenomenon. A key step in this direction was made by Girko in [20], where he partially proved the circular law for non-Hermitian matrices with independent entries. The crucial insight of that paper is the Hermitization technique, which allowed Girko to translate the convergence of complex empirical measures of a non-Hermitian matrix into the convergence of logarithmic transforms of a family of Hermitian matrices, or, more precisely, of
$$\operatorname{Tr} \log\big[(X-z)^\dagger (X-z)\big] = \log\big[\det\big((X-z)^\dagger(X-z)\big)\big], \qquad (1.1)$$
with $X$ the random matrix and $z \in \mathbb{C}$. Due to the singularity of the logarithm at $0$, the small eigenvalues of $(X-z)^\dagger(X-z)$ play a special role. The estimate on the smallest singular value of $X-z$ was not obtained in [20], but this gap was remedied later in a series of papers. Bai [1,2] analyzed the ESD of $(X-z)^\dagger(X-z)$ through its Stieltjes transform and handled the logarithmic singularity by assuming bounded density and bounded high moments for the entries of $X$. Lower bounds on the smallest singular value were given by Rudelson and Vershynin [31,32], and subsequently by Tao and Vu [36], Pan and Zhou [30] and Götze and Tikhomirov [21] under weakened moment and smoothness assumptions. The final result was presented in [38], where the circular law was proved under the optimal $L^2$ assumption. These papers studied the circular law in the global regime, i.e. the convergence of the ESD on subsets containing $\eta N$ eigenvalues for some small constant $\eta > 0$. Later, in a series of papers [7,8,39], Bourgade, Yau and Yin proved the local version of the circular law up to the optimal scale $N^{-1/2+\epsilon}$ under the assumption that the distributions of the matrix entries satisfy a uniform sub-exponential decay condition. In [37], local universality was proved by Tao and Vu under the assumption that the first four moments match those of a Gaussian random variable.
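As a sanity check, the identity (1.1) is easy to test numerically. The following minimal sketch (the size $N$, random seed and test point $z$ are illustrative choices of ours, not taken from the paper) compares $\log\det\big[(X-z)^\dagger(X-z)\big]$ with the sum of the logarithms of the eigenvalues of the Hermitization:

```python
# Numerical check of Girko's Hermitization identity (1.1).
# N, the seed and z are illustrative placeholders.
import numpy as np

N = 200
rng = np.random.default_rng(0)
X = rng.standard_normal((N, N)) / np.sqrt(N)   # entries with variance 1/N
z = 0.3 + 0.4j

Y = X - z * np.eye(N)
sign, logabsdet = np.linalg.slogdet(Y)
lhs = 2 * logabsdet                            # log det((X-z)^†(X-z)) = 2 log|det(X-z)|
eigs = np.linalg.eigvalsh(Y.conj().T @ Y)      # eigenvalues of the Hermitization
rhs = np.sum(np.log(eigs))
print(lhs, rhs)                                # agree up to floating-point error
```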
In this paper, we study the ESD of the product of a deterministic $N\times M$ matrix $T$ with a random $M\times N$ matrix $X$, where we assume $N \sim M$. In Figure 1, we plot the eigenvalue distribution of $TX$ when $T$ has two distinct singular values (apart from the trivial zero singular values). The goal of this paper is to prove a local circular law for the ESD of $TX$ at any point $z$ away from the unit circle. Following the idea in [7], the key ingredients of the proof are (a) an upper bound for the largest singular value of $TX-z$, (b) a lower bound for the smallest singular value of $TX-z$, and (c) rigidity of the singular values of $TX-z$. The upper bound for the largest singular value can be obtained by controlling the norm of $TX-z$ through a standard large deviation estimate (see e.g. [9,27,33] and (2.64)). The lower bound for the smallest singular value of $TX-z$ follows from the results in e.g. [32] and [36] (see also Lemma 2.23). Thus the bulk of this paper is devoted to establishing (c).
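For readers who want to reproduce a Figure-1-type picture, here is a minimal simulation sketch. The choice of two singular values $\sqrt{0.5}$ and $\sqrt{1.5}$ (so that the eigenvalues of $\Sigma = TT^\dagger$ average to $1$, as in the normalization (2.5)) is ours and need not match the figure:

```python
# Simulating the ESD of TX for a diagonal T with two distinct singular values.
import numpy as np
import matplotlib.pyplot as plt

N = M = 1000
rng = np.random.default_rng(1)
d = np.where(np.arange(N) < N // 2, np.sqrt(0.5), np.sqrt(1.5))
T = np.diag(d)                                 # a diagonal T suffices, cf. (1.3)
X = rng.standard_normal((M, N)) / np.sqrt(N)   # variance (N ∧ M)^{-1}
mu = np.linalg.eigvals(T @ X)

plt.scatter(mu.real, mu.imag, s=2)
plt.gca().set_aspect("equal")
plt.show()
```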
Basic ideas. To obtain the rigidity of the singular values of $TX-z$, we study the ESD of $Q := (TX-z)^\dagger(TX-z)$ using the Stieltjes transform, as in [7]. We normalize $X$ so that its entries have variance $(N\wedge M)^{-1}$. Then $Q$ is an $N\times N$ Hermitian matrix whose eigenvalues are typically of order $1$. We denote its resolvent by $R(w) := (Q-w)^{-1}$, where $w = E + i\eta$ is a spectral parameter with positive imaginary part $\eta$. Then the Stieltjes transform of the ESD of $Q$ equals $N^{-1}\operatorname{Tr} R(w)$, and we have the convergence estimate
$$N^{-1}\operatorname{Tr} R(w) \approx m_c(w) \qquad (1.2)$$
with high probability for large $N$. Here $m_c$ is the Stieltjes transform of the asymptotic eigenvalue density, and the convergence in (1.2) is referred to as the averaged law. By taking the imaginary part of (1.2), it is easy to see that a control of the Stieltjes transform yields a control of the eigenvalue density on the small scale of order $\eta$ around $E$ (a window containing order $\eta N$ eigenvalues). A local law is an estimate of the form (1.2) for all $\eta \gg N^{-1}$. Such local laws have become a cornerstone of modern random matrix theory. In [16], a local law was first derived for Wigner matrices. Subsequently, in [7], a local law for the resolvent of $(X-z)^\dagger(X-z)$ was established to prove the local circular law.
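The averaged law (1.2) can likewise be probed numerically: the sketch below computes the empirical Stieltjes transform $N^{-1}\operatorname{Tr}R(w)$ of $Q$ at a single spectral parameter (all concrete values are placeholder choices; $T$ is taken to be the identity for simplicity):

```python
# Empirical Stieltjes transform N^{-1} Tr R(w) of Q = (TX-z)^†(TX-z).
import numpy as np

N = 500
rng = np.random.default_rng(2)
X = rng.standard_normal((N, N)) / np.sqrt(N)
z = 0.5
Y = X - z * np.eye(N)                          # here T = I for simplicity
Q = Y.conj().T @ Y

w = 1.0 + 0.05j                                # w = E + i*eta, with eta >> 1/N
R = np.linalg.inv(Q - w * np.eye(N))
print(np.trace(R) / N)                         # approximates m_c(w) for large N
```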
In generalizing the proof of [7] to our setting, a main difficulty is that the entries of $TX$ are not independent. We will use a new comparison method proposed in [24], which roughly states that if the local laws hold for $R(w)$ with Gaussian $X$, then they also hold for general $X$. For definiteness, we assume $N = M$ for now, so that $T$ is a square matrix with singular value decomposition $T = UDV$. For Gaussian $X = X^{\rm Gauss}$, we have $V X^{\rm Gauss} U \overset{d}{=} \widetilde X^{\rm Gauss}$, where $\widetilde X^{\rm Gauss}$ is another Gaussian random matrix. Then for the determinant in (1.1),
$$\det(TX^{\rm Gauss} - z) = \det(DVX^{\rm Gauss}U - z) \overset{d}{=} \det(D\widetilde X^{\rm Gauss} - z). \qquad (1.3)$$
The problem is now reduced to the study of the singular values of $D\widetilde X^{\rm Gauss} - z$, which has independent entries. Notice that the entries of $D\widetilde X^{\rm Gauss}$ are not identically distributed, which makes our proof considerably more complicated. However, this issue can be handled, e.g. as in [14], where a local law was obtained for generalized Wigner matrices with non-identically distributed entries.
To use the comparison method of [24], it turns out that the averaged local law (1.2) is not sufficient. We have to control not only the trace of $R(w)$ but the matrix $R(w)$ itself, by showing that $R(w)$ is close to some deterministic matrix $\Pi(w)$, provided that $\eta \gg N^{-1}$. This closeness can be established in the sense of individual matrix entries, $R_{ij}(w) \approx \Pi_{ij}(w)$ (see e.g. [7,17]); we call such an estimate an entrywise local law. More generally, in [4,25] the following closeness was established for generalized matrix entries:
$$\langle v, R(w) u\rangle \approx \langle v, \Pi(w) u\rangle, \qquad \eta \gg N^{-1}, \quad \forall\, \|v\|_2 = \|u\|_2 = 1. \qquad (1.4)$$
We call the estimate (1.4) an anisotropic local law. (If $\Pi$ is a scalar matrix, (1.4) is also referred to as an isotropic local law, in the sense that $R(w)$ is approximately isotropic for large $N$.) This kind of anisotropic local law is what is needed to apply the method of [24]. Here we outline the three steps to establish the anisotropic local law for $Q = (TX-z)^\dagger(TX-z)$: (A) the entrywise local law and averaged local law when $T$ is diagonal (Theorem 2.18); (B) the anisotropic local law when $T$ is diagonal (Theorem 2.18); (C) the anisotropic local law and averaged local law when $T$ is a general (rectangular) matrix (Theorem 2.19). In performing step (A), our proof is based on the methods in [7]. However, our multivariable self-consistent equations and their solutions are much more complicated here. Thus a key part of the proof is to establish some basic properties of the asymptotic eigenvalue density and to prove the stability of the self-consistent equations under small perturbations; this requires some new ideas and analytic techniques (see Appendix A). In performing step (B), we apply and extend the polynomialization method developed in [4, Section 5]. Finally, as remarked around (1.3), step (B) implies the anisotropic local law for Gaussian $X$ and general $T$. Based on this fact, we perform step (C) using a self-consistent comparison argument from [24]. With the averaged local law proved in step (C), we can prove the local circular law for $TX$. In general, the averaged local law we obtain only holds up to the non-optimal scale $\eta \gg (N\wedge M)^{-1/2}$. As a result, we can only prove the local circular law for $TX$ up to the scale $(N\wedge M)^{-1/4+\epsilon}$. A new observation is that this non-optimal averaged local law already implies the optimal local circular law for $TX$ outside the unit circle (i.e. for $|z| > 1$; see Section 2.4). To prove the optimal local circular law inside the unit circle (i.e. for $|z| < 1$), we need the optimal averaged local law up to the scale $\eta \gg (N\wedge M)^{-1}$, which can be obtained under the extra assumption that the entries of $X$ have vanishing third moments.
Conventions. The fundamental large parameter is $N$, and we assume that $M$ is comparable to $N$ (see (2.1)). All quantities that are not explicitly constant may depend on $N$, and we usually omit $N$ from our notation. We use $C$ to denote a generic large positive constant, which may depend on fixed parameters and whose value may change from one line to the next. Similarly, we use $c$ or $\epsilon$ to denote a generic small positive constant. If a constant depends on a quantity $a$, we write $C(a)$ or $C_a$ to indicate this dependence. We use $\tau > 0$ in various assumptions to denote a small positive constant, and $\zeta$, $\tau'$ to denote constants that depend on $\tau$ and may be chosen arbitrarily small. All constants $C$, $c$ and $\epsilon$ may depend on $\tau$; we neither indicate nor track this dependence.
For any (complex) matrix $A$, we use $A^\dagger$ to denote its conjugate transpose, $A^T$ its transpose, $\|A\|$ its operator norm and $\|A\|_{HS}$ its Hilbert-Schmidt norm. We use the notation $v = (v_i)_{i=1}^n$ for a vector in $\mathbb{C}^n$, and denote its Euclidean norm by $|v| = \|v\|_2$. We usually write the $n\times n$ identity matrix $I_n$ simply as $1$ when this causes no confusion.
For two quantities $A_N$ and $B_N > 0$ depending on $N$, we use the notations $A_N = O(B_N)$ and $A_N \sim B_N$ to mean $|A_N| \le C B_N$ and $C^{-1} B_N \le |A_N| \le C B_N$, respectively, for some positive constant $C > 0$. We use $A_N = o(B_N)$ to mean $|A_N| \le c_N B_N$ for some positive sequence $c_N \to 0$ as $N \to \infty$. If $A_N$ is a matrix, we use the notations $A_N = O(B_N)$ and $A_N = o(B_N)$ to mean $\|A_N\| = O(B_N)$ and $\|A_N\| = o(B_N)$, respectively.

We assume that
$$\tau \le N/M \le \tau^{-1} \qquad (2.1)$$
for some small $\tau > 0$. We assume the entries $X_{i\mu}$ of $X$ are independent (not necessarily identically distributed) random variables satisfying
$$\mathbb{E} X_{i\mu} = 0, \qquad \mathbb{E}|X_{i\mu}|^2 = (N\wedge M)^{-1} \qquad (2.2)$$
for all $1 \le i \le M$, $1 \le \mu \le N$. For definiteness, in this paper we focus on the case where all matrix entries are real. However, our results and proofs also hold, after minor changes, in the complex case if we assume in addition $\mathbb{E} X_{i\mu}^2 = 0$ for $X_{i\mu} \in \mathbb{C}$. We assume that for all $p \in \mathbb{N}$ there is an $N$-independent constant $C_p$ such that
$$\mathbb{E}\big|(N\wedge M)^{1/2} X_{i\mu}\big|^p \le C_p \qquad (2.3)$$
for all $1 \le i \le M$, $1 \le \mu \le N$. We define $\Sigma := TT^\dagger$, and assume that the eigenvalues of $\Sigma$ satisfy
$$\tau^{-1} \ge \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{N\wedge M} \ge \tau \qquad (2.4)$$
and that all other eigenvalues are $0$. We can normalize $T$ by multiplying it with a scalar so that
$$\frac{1}{N\wedge M}\sum_{i=1}^{N\wedge M} \sigma_i = 1. \qquad (2.5)$$
We summarize these basic assumptions here for future reference (Assumption 2.1).
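As a quick illustration of the normalization (2.5), the following sketch rescales a given $T$ so that the nonzero eigenvalues of $\Sigma = TT^\dagger$ average to $1$ (the random $T$ here is only a placeholder; in the paper $T$ is deterministic):

```python
# Enforcing the normalization (2.5) on T.
import numpy as np

N, M = 300, 400
rng = np.random.default_rng(3)
T = rng.standard_normal((N, M))
sigma = np.linalg.eigvalsh(T @ T.conj().T)         # the N ∧ M eigenvalues of Sigma (N <= M here)
T = T / np.sqrt(sigma.mean())                      # rescale so the sigma_i average to 1
print(np.linalg.eigvalsh(T @ T.conj().T).mean())   # ≈ 1.0
```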

The main theorem
Our main result is Theorem 2.6. To state it, we need to define the asymptotic eigenvalue density for $Q$. We first introduce the self-consistent equations; the asymptotic eigenvalue density will be closely related to their solutions. Define
$$\rho_\Sigma := \frac{1}{N\wedge M}\sum_{i=1}^{N\wedge M} \delta_{\sigma_i} \qquad (2.6)$$
as the empirical spectral density of $\Sigma$. Let $n := |\operatorname{supp} \rho_\Sigma|$ be the number of distinct nonzero eigenvalues of $\Sigma$, which we denote by
$$\tau^{-1} \ge s_1 > s_2 > \cdots > s_n \ge \tau. \qquad (2.7)$$
Let $l_i$ be the multiplicity of $s_i$. By (2.5), the $l_i$ and $s_i$ satisfy the normalization conditions
$$\frac{1}{N\wedge M}\sum_{i=1}^{n} l_i = 1, \qquad \frac{1}{N\wedge M}\sum_{i=1}^{n} l_i s_i = 1. \qquad (2.8)$$
For each $w \in \mathbb{C}_+ := \{w \in \mathbb{C} : \operatorname{Im} w > 0\}$, we define the self-consistent equations (2.9)-(2.10) for $(m_1, m_2)$; the first of these takes the form $\frac{1}{m_2} = -w(1+m_1) + |z|^2(\cdots)$. The next lemma states that the solution of (2.11) in $\mathbb{C}_+$ is unique if $z$ is away from the unit circle. It is proved in Appendix A.3.
Lemma 2.2. Fix $z \in \mathbb{C}$ such that $|z| \neq 1$. For $w \in \mathbb{C}_+$, there exists at most one analytic function $m_{1c,z,\Sigma}(w) : \mathbb{C}_+ \to \mathbb{C}_+$ such that (2.11) holds and $w\, m_{1c,z,\Sigma}(w) \in \mathbb{C}_+$. Moreover, $m_{1c,z,\Sigma}(w)$ is the Stieltjes transform of a positive integrable function $\rho_{1c}$ with compact support in $[0, \infty)$.
As a convention, for $w \in \mathbb{C}_+$ we take $\sqrt{w}$ to be the branch with positive imaginary part. Define $m := \sqrt{w}(1+m_1)$ and $m_c := \sqrt{w}(1+m_{1c})$; equation (2.11) can then be rewritten in terms of $m$. The following lemma gives the basic structure of $\operatorname{supp}\rho_{1,2c}$. Its proof is given in Appendix A.1.
(i) We say that the edge $e_k \neq 0$, $k = 1, \dots, 2L$, is regular if the conditions (2.18)-(2.19) hold; the first of these reads
$$\min_{1\le i\le n}\big\{|m_c(e_k) - a_i(e_k)|,\ |m_c(e_k) - b_i(e_k)|,\ |m_c(e_k) + c_i(e_k)|\big\} \ge \epsilon$$
for some constant $\epsilon > 0$. In the case $|z|^2 \le 1-\tau$, we always call $e_{2L} = 0$ a regular edge.
(ii) We say that the bulk component $[e_{2k}, e_{2k-1}]$ is regular if for any fixed $\tau' > 0$ there exists a constant $c(\tau, \tau') > 0$ such that the density of $\rho_{1c}$ in $[e_{2k}+\tau', e_{2k-1}-\tau']$ is bounded from below by $c$.
Remark 1: The edge regularity condition (i) has previously appeared (in possibly slightly different forms) in several works on sample covariance matrices and Wigner matrices [3,11,23,24,26,29]. The conditions (2.18) and (2.19) guarantee a regular square-root behavior of $\rho_{1c}$ near $e_k$ and ensure that the gap in the spectrum of $\rho_{1c}$ adjacent to $e_k$ does not close for large $N$ (Lemma A.5):
$$\min_{l\neq k}|e_l - e_k| \ge \epsilon \qquad (2.20)$$
for some constant $\epsilon > 0$. The bulk regularity condition (ii) was introduced in [24]; it imposes a lower bound on the density of eigenvalues away from the edges. Without it, one could have points in the interior of $\operatorname{supp}\rho_{1c}$ with arbitrarily small density, and our arguments would fail.

Remark 2:
The regularity conditions in Definition 2.4 are stable under perturbations of $|z|$ and $\rho_\Sigma$. In particular, fix $\rho_\Sigma$ and suppose the regularity conditions are satisfied at $z = z_0$ with $\tau \le ||z_0|^2 - 1| \le \tau^{-1}$. Then for sufficiently small $c > 0$, the regularity conditions hold uniformly in $z \in \{z : ||z| - |z_0|| \le c\}$. For a detailed discussion, see the remark at the end of Section A.3.
We will use the following notion of stochastic domination, which was first introduced in [12] and subsequently used in many works on random matrix theory, such as [4,5,7,13,14,24]. It simplifies the presentation of the results and their proofs by systematizing statements of the form "ξ is bounded by ζ with high probability up to a small power of N ".

Definition 2.5 (Stochastic domination). (i) Let
$$\xi = \big(\xi^{(N)}(u) : N \in \mathbb{N},\ u \in U^{(N)}\big), \qquad \zeta = \big(\zeta^{(N)}(u) : N \in \mathbb{N},\ u \in U^{(N)}\big)$$
be two families of nonnegative random variables, where $U^{(N)}$ is a possibly $N$-dependent parameter set. We say $\xi$ is stochastically dominated by $\zeta$, uniformly in $u$, if for any (small) $\epsilon > 0$ and (large) $D > 0$,
$$\sup_{u \in U^{(N)}} \mathbb{P}\Big[\xi^{(N)}(u) > N^{\epsilon}\, \zeta^{(N)}(u)\Big] \le N^{-D}$$
for large enough $N \ge N_0(\epsilon, D)$, and we use the notation $\xi \prec \zeta$. Throughout this paper, stochastic domination will always be uniform in all parameters that are not explicitly fixed (such as matrix indices, and $w$ and $z$ taking values in some compact sets). Note that $N_0(\epsilon, D)$ may depend on quantities that are explicitly constant, such as $\tau$ and the $C_p$ in (2.1), (2.3) and (2.4).
(ii) If for some complex family $\xi$ we have $|\xi| \prec \zeta$, we also write $\xi \prec \zeta$ or $\xi = O_\prec(\zeta)$. We also extend the definition of $O_\prec(\cdot)$ to matrices in the weak operator sense as follows. Let $A$ be a family of complex square random matrices and $\zeta$ a family of nonnegative random variables. Then we use $A = O_\prec(\zeta)$ to mean $\|A\| \prec \zeta$, where $\|A\|$ is the operator norm of $A$.
(iv) We say that an event $\Xi$ holds with high probability if $1 - \mathbf{1}(\Xi) \prec 0$.
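The notion of stochastic domination can be illustrated with a one-line Monte Carlo experiment: for standard Gaussian families one has $\max_i |X_i| \prec 1$, since the maximum grows only like $\sqrt{2\log N}$, which is eventually smaller than $N^\epsilon$ for any fixed $\epsilon > 0$. A quick check (ours, for illustration only, with $\epsilon = 1/4$):

```python
# Illustration of stochastic domination: max_i |X_i| ≺ 1 for Gaussian X_i.
import numpy as np

rng = np.random.default_rng(8)
for N in (10**3, 10**4, 10**5):
    m = np.abs(rng.standard_normal(N)).max()
    print(N, round(m, 3), round(N**0.25, 3))   # max ~ sqrt(2 log N) << N^{1/4}
```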
In the following, we denote the eigenvalues of $TX$ by $\mu_j$, $1 \le j \le N$. We are now ready to state our main theorem, the general local circular law for $TX$.
Theorem 2.6 (Local circular law for $TX$). Suppose Assumption 2.1 holds, and $\tau \le ||z_0|^2 - 1| \le \tau^{-1}$ for all $N$ ($z_0$ may depend on $N$). Suppose $\rho_\Sigma$ (defined in (2.6)) and $|z_0|$ are such that all the edges and bulk components of $\rho_{1c}$ are regular in the sense of Definition 2.4. We assume in addition that the entries of $X$ have a density bounded by $N^{C_2}$ for some $C_2 > 0$. Let $F$ be a smooth non-negative function, which may depend on $N$, such that $\|F\|_\infty \le C_1$, $\|F'\|_\infty \le N^{C_1}$ and $F(z) = 0$ for $|z| \ge C_1$, for some constant $C_1 > 0$ independent of $N$. Let $F_{z_0,a}(z) = K^{2a} F(K^a(z - z_0))$, where $K := N\wedge M$. Then $TX$ has $N - K$ trivial zero eigenvalues, and for the other eigenvalues $\mu_j$, $1 \le j \le K$, we have the estimate (2.21) for any $a \in (0, 1/4]$. Here $\tilde\chi_{\mathbb D}$ is defined in (2.22) in terms of $\rho_{2c} = \rho_{2c,z,\Sigma}$, which is defined in (2.12). If $1+\tau \le |z_0|^2 \le 1+\tau^{-1}$, or if the entries of $X$ have vanishing third moments,
$$\mathbb{E} X_{i\mu}^3 = 0, \qquad (2.23)$$
then the stronger estimate (2.24) holds for any $a \in (0, 1/2]$. If $N = M$, the bounded density condition on the entries of $X$ is not necessary.
Remark 1: Note that $F_{z_0,a}(z) = K^{2a} F(K^a(z - z_0))$ is an approximate delta function, obtained by rescaling $F$ to size of order $K^{-a}$ around $z_0$. Thus (2.21) gives the general circular law up to scale $K^{-1/4+\epsilon}$, while (2.24) gives the general circular law up to scale $K^{-1/2+\epsilon}$. The function $\tilde\chi_{\mathbb D}$ in (2.22) gives the distribution of the eigenvalues of $TX$. It is rotationally symmetric, because $\rho_{2c}(x, z)$ only depends on $|z|$ (see (2.9) and (2.10)). When $T$ is the identity matrix, $\tilde\chi_{\mathbb D}$ becomes the indicator function $\chi_{\mathbb D}$ of the unit disk $\mathbb D$, and we recover the well-known local circular law for $X$ [7]. For general $T$, we do not yet have a detailed understanding of $\tilde\chi_{\mathbb D}$; this will be one of the topics of our future study. Also, we have assumed that $z$ is strictly away from the unit circle. Our proof may extend to the case $||z| - 1| = o(1)$ given a better understanding of the solutions $m_{1,2c}$ of equations (2.9) and (2.10).
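For concreteness, here is a minimal sketch of the rescaled test function $F_{z_0,a}$ from Theorem 2.6, built from a standard smooth bump (the specific bump $F$ is our placeholder choice; the theorem only requires the stated bounds on $F$ and $F'$):

```python
# The rescaled test function F_{z0,a}(z) = K^{2a} F(K^a (z - z0)).
import numpy as np

def F(z):
    """Smooth bump supported in |z| < 1 (placeholder choice)."""
    r2 = np.abs(np.asarray(z)) ** 2
    out = np.zeros_like(r2, dtype=float)
    inside = r2 < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r2[inside]))
    return out

def F_z0_a(z, z0, a, K):
    return K ** (2 * a) * F(K ** a * (np.asarray(z) - z0))

# Averaging F_z0_a over the eigenvalues mu_j probes the ESD at scale K^{-a}.
```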
Remark 2: As explained in the Introduction, the basic strategy of this paper is first to prove the anisotropic local law for the resolvent of $Q$ when $X$ is Gaussian, and then to obtain the anisotropic local law for general $X$ through comparison with the Gaussian case. Without (2.23), our comparison arguments do not give the anisotropic local law up to the optimal scale, so we can only prove the weaker bound (2.21). We will try to remove this assumption in future work.

Example 2.8 (Continuous limit). Suppose $\rho_\Sigma$ is supported in some interval $[a, b] \subset (0, \infty)$, and that $\rho_\Sigma$ converges in distribution to some measure $\rho_\infty$ that is absolutely continuous and whose density satisfies $\tau \le d\rho_\infty(E)/dE \le \tau^{-1}$ for $E \in [a, b]$. Then there are only a small number (independent of $n$) of connected components of $\operatorname{supp}\rho_{1c}$, and all the edges and bulk components are regular. See the remark at the end of Section A.1.

Hermitization and local laws for resolvents
In the following, we use the notation
$$Y = Y_z := TX - zI, \qquad (2.25)$$
where $I$ is the identity matrix. Following Girko's Hermitization technique [20], the first step in proving the local circular law is to understand the local statistics of the singular values of $Y$. In this subsection, we present the main local estimates concerning the resolvents $(YY^\dagger - w)^{-1}$ and $(Y^\dagger Y - w)^{-1}$; these results will be used later to prove Theorem 2.6. Our local laws can be formulated in a simple, unified fashion using a $2N\times 2N$ block matrix which is a linear function of $X$. We index its rows and columns by $\mathcal I := \mathcal I_1 \cup \mathcal I_2$, where $\mathcal I_1 = \{1, \dots, N\}$ and $\mathcal I_2 = \{\bar 1, \dots, \bar N\}$, and for $i \in \mathcal I_1$ we write $\bar i$ for its partner index in $\mathcal I_2$.

Definition 2.10 (Groups). For an $\mathcal I\times\mathcal I$ matrix $A$, we define the $2\times 2$ matrix $A_{[ij]}$ as
$$A_{[ij]} := \begin{pmatrix} A_{ij} & A_{i\bar j} \\ A_{\bar i j} & A_{\bar i\bar j} \end{pmatrix}. \qquad (2.26)$$
We call $A_{[ij]}$ a diagonal group if $i = j$, and an off-diagonal group otherwise.
Definition 2.11 (Linearizing block matrix). For $w := E + i\eta \in \mathbb{C}_+$, we define the $\mathcal I\times\mathcal I$ matrix
$$H(w) = H(T, X, z, w) := \begin{pmatrix} -w & \sqrt{w}\, Y \\ \sqrt{w}\, Y^\dagger & -w \end{pmatrix}, \qquad (2.27)$$
where we take the branch of $\sqrt{w}$ with positive imaginary part. Define the $\mathcal I\times\mathcal I$ matrix
$$G(w) = G(T, X, z, w) := H(w)^{-1}, \qquad (2.28)$$
as well as the $\mathcal I_1\times\mathcal I_1$ and $\mathcal I_2\times\mathcal I_2$ matrices
$$G_L(w) := (YY^\dagger - w)^{-1}, \qquad G_R(w) := (Y^\dagger Y - w)^{-1}. \qquad (2.29)$$
Throughout the following, we frequently omit the argument $w$ from our notation.
By Schur's complement formula, it is easy to see that
$$G = \begin{pmatrix} G_L & w^{-1/2}\, G_L Y \\ w^{-1/2}\, Y^\dagger G_L & G_R \end{pmatrix}. \qquad (2.30)$$
Therefore a control of $G$ immediately yields controls of the resolvents $G_L$ and $G_R$.
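The block structure (2.30) is easy to verify numerically. The sketch below builds $H(w)$ as in (2.27) (as reconstructed above) and checks that the diagonal blocks of $G = H^{-1}$ are exactly the resolvents $G_L$ and $G_R$; all sizes and parameters are illustrative:

```python
# Checking the Schur-complement block identity (2.30) for G = H(w)^{-1}.
import numpy as np

N = 100
rng = np.random.default_rng(4)
Y = rng.standard_normal((N, N)) / np.sqrt(N) - 0.5 * np.eye(N)  # Y = X - z with z = 0.5
w = 0.8 + 0.1j
sw = np.sqrt(w)                                # branch with positive imaginary part

H = np.block([[-w * np.eye(N), sw * Y],
              [sw * Y.conj().T, -w * np.eye(N)]])
G = np.linalg.inv(H)
GL = np.linalg.inv(Y @ Y.conj().T - w * np.eye(N))
GR = np.linalg.inv(Y.conj().T @ Y - w * np.eye(N))
print(np.allclose(G[:N, :N], GL), np.allclose(G[N:, N:], GR))   # True True
```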
In the following, we only consider the case $N \le M$; the case $N > M$, as we will see, is built easily upon it. We introduce a deterministic matrix $\Pi$, which will be shown to be close to $G$ with high probability.

Definition 2.12 (Deterministic limit of $G$). Suppose $N \le M$ and $T$ has a singular value decomposition
$$T = U\bar D V, \qquad \bar D = (D, 0), \qquad (2.31)$$
where $D = \operatorname{diag}(d_1, d_2, \dots, d_N)$ is a diagonal matrix. Define $\pi_{[i]c}$ to be the $2\times 2$ matrix given by (2.32). Let $\Pi_d$ be the $2N\times 2N$ matrix with $(\Pi_d)_{[ii]} = \pi_{[i]c}$ and all other entries zero. Define $\Pi$ through (2.33).

Definition 2.13 (Averaged variables). Suppose $N \le M$. Define the averaged random variables $m_1$ and $m_2$ as in (2.34), and define $\pi_{[i]}$ to be the corresponding $2\times 2$ matrix.

Remark: Note that under the above definitions, $m_{1,2}$ encode the Stieltjes transform of the empirical eigenvalue density of $YY^\dagger$ and $Y^\dagger Y$. Moreover, we will see from the proof that $m_{1,2c}$ are the almost sure limits of $m_{1,2}$ as $N \to \infty$.

The following two propositions summarize the properties of $\rho_{1,2c}$ and $m_{1,2c}$ that are needed to understand the main results of this section. They are proved in Appendix A. In Fig. 2 we plot $\rho_{2c}$ for the example from Fig. 1 in the cases $|z| > 1$ and $|z| < 1$, respectively. (Figure 2: the densities $\rho_{2c}(x, z)$ for $|z| = 0.5,\ 0.75,\ 1.2,\ 1.5$.)
(iii) Suppose $e_j$ is a nonzero regular edge. If $j$ is even, then $\rho_{1c}(x) \sim \sqrt{x - e_j}$ as $x \downarrow e_j$; if $j$ is odd, then $\rho_{1c}(x) \sim \sqrt{e_j - x}$ as $x \uparrow e_j$.
The same results also hold for $\rho_{2c}$. In addition, $\rho_{2c}$ is a probability density.
We will consistently use the notation $E + i\eta$ for the spectral parameter $w$. In this paper, we regard the quantities $E(w)$ and $\eta(w)$ as functions of $w$ and usually omit the argument $w$. We now define several spectral domains of $w$ that will be used in the proof.

Definition 2.16 (Spectral domains). Fix a small constant $\zeta > 0$, which may depend on $\tau$. Unless otherwise indicated, the spectral parameter $w$ is always assumed to lie in the fundamental domain
$$\mathbf D = \mathbf D(\zeta, N) := \big\{w \in \mathbb{C}_+ : 0 \le E \le \zeta^{-1},\ N^{-1+\zeta}|m_{2c}|^{-1} \le \eta \le \zeta^{-1}\big\}. \qquad (2.39)$$
Given a regular edge $e_k$, we define the subdomain $\mathbf D^e_k = \mathbf D^e_k(\zeta, \tau', N)$ as in (2.40)-(2.41). Corresponding to a regular bulk component $[e_{2k}, e_{2k-1}]$, we define the subdomain
$$\mathbf D^b_k = \mathbf D^b_k(\zeta, \tau', N) := \big\{w \in \mathbf D(\zeta, N) : E \in [e_{2k}+\tau', e_{2k-1}-\tau']\big\}. \qquad (2.42)$$
We also need the following domain with large $\eta$,
$$\mathbf D_L = \mathbf D_L(\zeta) := \big\{w \in \mathbb{C}_+ : 0 \le E \le \zeta^{-1},\ \eta \ge \zeta^{-1}\big\}, \qquad (2.43)$$
and the subdomain $\mathbf D^o$ of $\mathbf D \cup \mathbf D_L$ defined in (2.44). We call $\mathbf S$ a regular domain if it is a regular $\mathbf D^e_k$ or $\mathbf D^b_k$ domain, a $\mathbf D^o$ domain, or a $\mathbf D_L$ domain.

Remark: In the definition of $\mathbf D$, we have suppressed the explicit $w$-dependence. Notice that when $|z|^2 < 1-\tau$, since $|m_{2c}| \sim |w|^{-1/2}$ as $w \to 0$, we allow $\eta \sim |w| \sim N^{-2+2\zeta}$ in $\mathbf D$. In the definition of $\mathbf D^e_k$, the condition $E \ge 0$ matters only for the edge at $0$ when $|z|^2 \le 1-\tau$.

Now we are prepared to state the various local laws satisfied by $G$ defined in (2.28); see Definition 2.17 for the notions of entrywise, anisotropic and averaged local laws. The local laws for $G$ with a general $T$ will be built upon the following result for diagonal $T$.

Theorem 2.18 (Local laws when $T$ is diagonal). Fix $\tau \le ||z|^2 - 1| \le \tau^{-1}$. Suppose Assumption 2.1 holds, $N = M$, and $T = D := \operatorname{diag}(d_1, \dots, d_N)$ is a diagonal matrix. Let $\mathbf S$ be a regular domain. Then the entrywise local law, anisotropic local law and averaged local law hold with parameters $(D, X, z, \mathbf S)$.

Now suppose that $N \le M$ and $T$ is an $N\times M$ matrix such that the eigenvalues of $\Sigma$ satisfy (2.4) and (2.5). Consider the singular value decomposition $T = U\bar D V$, where $U$ is an $N\times N$ unitary matrix, $V$ is an $M\times M$ unitary matrix, and $\bar D = (D, 0)$ is an $N\times M$ matrix with $D = \operatorname{diag}(d_1, d_2, \dots, d_N)$. Then we have $TX - z = UDV_1X - z$, where $V_1$ is the $N\times M$ matrix and $V_2$ the $(M-N)\times M$ matrix defined through $V = \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$. If $X = X^{\rm Gauss}$ is Gaussian, then $V_1 X^{\rm Gauss} \overset{d}{=} \widetilde X^{\rm Gauss} U^\dagger$, with $\widetilde X^{\rm Gauss}$ an $N\times N$ Gaussian random matrix. Then by the definition of $G$ in (2.28),
$$G(T, X^{\rm Gauss}, z, w) \overset{d}{=} \begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix} G(D, \widetilde X^{\rm Gauss}, z, w) \begin{pmatrix} U^\dagger & 0 \\ 0 & U^\dagger \end{pmatrix}. \qquad (2.50)$$
Since the anisotropic local law holds for $G(D, \widetilde X^{\rm Gauss}, z, w)$ by Theorem 2.18, we immediately get the anisotropic local law for $G(T, X^{\rm Gauss}, z, w)$. The next theorem states that the anisotropic local law holds for general $TX$ provided it holds for $TX^{\rm Gauss}$.

Theorem 2.19 (Anisotropic local law when $N \le M$). Fix $\tau \le ||z|^2 - 1| \le \tau^{-1}$. Suppose Assumption 2.1 holds and $N \le M$. Let $T = U\bar D V$ be a singular value decomposition of $T$, where $\bar D = (D, 0)$ with $D = \operatorname{diag}(d_1, d_2, \dots, d_N)$. Let $\mathbf S$ be a regular domain. Then the anisotropic local law and averaged local law hold with parameters $(T, X, z, \mathbf S \cap \widehat{\mathbf D})$, where $\widehat{\mathbf D}$ consists of the $w$ with $\eta \ge N^{-1/2+\zeta}|m_{2c}|^{-1}$. If in addition (2.23) holds, then the anisotropic local law and averaged local law hold with parameters $(T, X, z, \mathbf S)$.
Finally, we turn to the case $N > M$. Suppose $T = U\bar D V$ is a singular value decomposition of $T$, where $U$ is an $N\times N$ unitary matrix, $V$ is an $M\times M$ unitary matrix, and $\bar D = \begin{pmatrix} D \\ 0 \end{pmatrix}$ is an $N\times M$ matrix with $D = \operatorname{diag}(d_1, d_2, \dots, d_M)$. Let $U = (U_1, U_2)$, where $U_1$ has size $N\times M$ and $U_2$ has size $N\times(N-M)$. Following Girko's idea of Hermitization [20], to prove the local circular law in Theorem 2.6 when $N > M$, it suffices to study $\det(TX - z)$ (see (2.52) below), for which we have the factorization (2.51). Comparing with (2.49), we see that this case is reduced to the $N \le M$ case, the only difference being that the extra factor $(-z)^{N-M}$ corresponds to the $N-M$ trivial zero eigenvalues of $TX$. Thus we make the following claim.
Claim 2.20. The $N \le M$ case of Theorem 2.6 implies the $N > M$ case of Theorem 2.6.
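The $(-z)^{N-M}$ factor corresponds to a simple rank fact that can be checked directly: for $N > M$, the $N\times N$ matrix $TX$ has rank at most $M$, hence at least $N-M$ zero eigenvalues. A quick numerical confirmation (sizes are illustrative):

```python
# For N > M, the spectrum of TX contains N - M trivial zero eigenvalues.
import numpy as np

N, M = 120, 80
rng = np.random.default_rng(5)
T = rng.standard_normal((N, M))
X = rng.standard_normal((M, N)) / np.sqrt(M)   # variance (N ∧ M)^{-1} = 1/M
mu = np.linalg.eigvals(T @ X)
print(np.sum(np.abs(mu) < 1e-8))               # = N - M = 40 (up to roundoff)
```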

Proof of Theorem 2.6
By Claim 2.20, it suffices to consider $N \le M$. Our main tool will be Theorem 2.19, and a major part of the proof follows [7, Section 5]. The following lemma (Lemma 2.21) collects basic properties of stochastic domination $\prec$, which will be used tacitly during the proof and throughout this paper. (i) If $\xi_1 \prec \zeta_1$ and $\xi_2 \prec \zeta_2$, then $\xi_1 + \xi_2 \prec \zeta_1 + \zeta_2$. (ii) If $\xi_1(u) \prec \zeta_1(u)$ uniformly in $u \in U$ and $\xi_2(u) \prec \zeta_2(u)$ uniformly in $u \in U$, then $\xi_1(u)\xi_2(u) \prec \zeta_1(u)\zeta_2(u)$ uniformly in $u \in U$.
(iii) Suppose that $\Psi(u) \ge N^{-C}$ is deterministic and that $\xi(u)$ is a nonnegative random variable satisfying $\mathbb{E}\xi(u)^2 \le N^C$ for all $u$. Then, if $\xi(u) \prec \Psi(u)$ uniformly in $u$, we have $\mathbb{E}\xi(u) \prec \Psi(u)$ uniformly in $u$.
Girko's Hermitization technique [20] can be reformulated as follows (see e.g. [22]): for any smooth function $g$, we have the identity (2.52), where $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ are the ordered eigenvalues of $Y(z)Y^\dagger(z)$. For $g = F_{z_0,a}$, we use the new variable $\xi = N^a(z - z_0)$ to rewrite this identity accordingly. Define the classical location $\gamma_j(z)$ of the $j$-th eigenvalue of $Y(z)Y^\dagger(z)$ through (2.55). By Proposition 2.14, we have a deterministic bound on the $\gamma_j$ for any $\delta > 0$. Thus we obtain (2.21) if we can prove (2.56) for $b = 1/2$, and we obtain (2.24) if we can prove (2.56) for $b = 0$, when $1+\tau \le |z_0|^2 \le 1+\tau^{-1}$ or the assumption (2.23) holds. We need the following lemma, which is a consequence of Theorem 2.19. Recall (2.16) and (2.20): the number of components $L$ is of order $1$, and each component $[e_{2k}, e_{2k-1}]$ contains order $N$ of the $\gamma_j$'s. We define the classical number $N_k$ of eigenvalues to the left of the edge $e_k$, $1 \le k \le 2L$, as in (2.57); note that $N_{2L} = 0$, $N_1 = N$ and $N_{2k+1} = N_{2k}$ for $1 \le k \le L-1$.

Lemma 2.22. (i) If the averaged local law holds with parameters $(T, X, z, \mathbf D(\zeta, N) \cap \widehat{\mathbf D}(\zeta, N))$ for arbitrarily small $\zeta$, then the following estimates hold. For any $e_{2k} > 0$ and $N_{2k} + N^{1/2+\epsilon} \le j \le N_{2k-1} - N^{1/2+\epsilon}$, the rigidity estimate (2.58) holds; in the case $|z|^2 \le 1-\tau$ with $e_{2L} = 0$, the analogous estimate (2.59) holds. Moreover, if $1+\tau \le |z|^2 \le 1+\tau^{-1}$, then for any fixed $0 < c < e_{2L}$,
$$\#\{j : 0 < \lambda_j < c\} \prec 1. \qquad (2.60)$$
(ii) If the averaged local law holds with parameters $(T, X, z, \mathbf D(\zeta, N))$ for arbitrarily small $\zeta$, then the following stronger estimates hold. For any $e_{2k} > 0$ and $N_{2k} + N^{\epsilon} \le j \le N_{2k-1} - N^{\epsilon}$, we have (2.61); in the case $|z|^2 \le 1-\tau$ with $e_{2L} = 0$, we have (2.62) for any $N_{2L} + N^{\epsilon} \le j \le N_{2L-1} - N^{\epsilon}$.

Through a standard large deviation estimate, we have the bound (2.64) on the norm of $TX - z$ (see e.g. [9,27,33]), where $c_0, C_0 > 0$ are constants; this yields (2.65). Together with Lemma 2.23 concerning the smallest singular value of $TX - z$, we can control the contribution of the extreme eigenvalues. Since $|\log \gamma_j| \prec 1$ by Proposition 2.14, we conclude (2.70). Then we use (2.58) and the bound (2.65) to estimate the sum over the bulk indices, and conclude (2.71). Using $\gamma_j \sim 1$, (2.60) and (2.73), we obtain the desired estimate.

Proof. To prove (2.73), we need to prove the estimate (2.74) for any $\epsilon, C > 0$. In the case $N = M$ without the bounded density assumption, we have $\lambda_1(z) \ge \tau \lambda_1'(z)$, where $\lambda_1'(z)$ is the smallest singular value of $X - T^{-1}z$. Following [32] or [36, Theorem 2.1], we have $|\log \lambda_1'(z)| \prec 1$, which proves (2.73). Now we turn to the case $N < M$ with the bounded density assumption. By (2.49) we have
$$TX - z = UD(V_1X - D^{-1}U^{-1}z) =: UD\widetilde Y(z).$$
Hence it suffices to control the smallest singular value of $\widetilde Y(z)$, which we call $\widetilde\lambda_1(z)$. Notice that the columns $\widetilde Y_1, \dots, \widetilde Y_N$ of $\widetilde Y(z)$ are independent vectors. From the variational characterization of the smallest singular value, we easily obtain a lower bound on $\widetilde\lambda_1(z)$ in terms of $\min_k |\langle \widetilde Y_k, u_k\rangle|$, where $u_k$ is the unit normal vector of $\operatorname{span}\{\widetilde Y_l,\ l \neq k\}$ and hence is independent of $\widetilde Y_k$. By conditioning on $u_k$, we immediately obtain a bound that is much stronger than (2.74). Here we have used Theorem 1.2 of [34] to conclude that, for fixed $u_k$, $\langle \widetilde Y_k, u_k\rangle$ has density bounded by $CN^{C_3}$.
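The variational bound used in this proof is the standard deterministic estimate $s_{\min}(A) \ge N^{-1/2}\min_k \operatorname{dist}(A_k, \operatorname{span}\{A_l,\ l\neq k\})$ for a square matrix $A$ with columns $A_k$. A small numerical sketch of this inequality (our illustration, with placeholder data):

```python
# Smallest singular value vs. column-to-span distances.
import numpy as np

N = 60
rng = np.random.default_rng(6)
A = rng.standard_normal((N, N)) / np.sqrt(N)
smin = np.linalg.svd(A, compute_uv=False)[-1]

dists = []
for k in range(N):
    others = np.delete(A, k, axis=1)
    coef, *_ = np.linalg.lstsq(others, A[:, k], rcond=None)
    resid = A[:, k] - others @ coef            # residual has length |<A_k, u_k>|
    dists.append(np.linalg.norm(resid))
print(smin >= min(dists) / np.sqrt(N))         # True (deterministic inequality)
```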

Outline of the paper
The rest of this paper is devoted to the proof of Theorems 2.18 and 2.19. In Section 3, we collect the basic tools that we use throughout the proof. In Section 4, we perform step (A) by proving the entrywise local law and averaged local law of Theorem 2.18 under the assumption that $T$ is diagonal: we first prove a weak version of the entrywise local law in Sections 4.1-4.3, and then upgrade it to the strong entrywise local law and averaged local law in Sections 4.4-4.5.
In Section 5, we perform step (B) by proving the anisotropic local law of Theorem 2.18, using the entrywise local law established in Section 4. Finally, in Section 6 we carry out step (C): using Theorem 2.18, we prove Theorem 2.19 via a self-consistent comparison method. The first part of Appendix A establishes the basic properties of $\rho_{1,2c}$ stated in Lemma 2.3 and Proposition 2.14. In Sections A.2 and A.3, we establish some key estimates on $m_{1,2c}$ and the stability of the self-consistent equation (2.11) on regular domains.

Basic tools
In this preliminary section, we collect various identities and estimates that we use throughout the following. For $J \subset \mathcal I$, we define the minor $H^{(J)} := \{H_{st} : s, t \in \mathcal I\setminus J\}$, and correspondingly $G^{(J)} := (H^{(J)})^{-1}$. Let $[J] := \{s \in \mathcal I : s \in J \text{ or } \bar s \in J\}$. We also denote $H^{[J]} := \{H_{st} : s, t \in \mathcal I\setminus[J]\}$ and $G^{[J]} := (H^{[J]})^{-1}$. We abbreviate $(\{s\}) = (s)$, $(\{s,t\}) = (st)$, $[\{s\}] = [s]$ and $[\{s,t\}] = [st]$.
Notice that, by definition, we have the following resolvent identities. (i) For $i \in \mathcal I_1$ and $\mu \in \mathcal I_2$, we have (3.1); for $i \neq j \in \mathcal I_1$ and $\mu \neq \nu \in \mathcal I_2$, we have (3.2)-(3.3). (ii) For $i \in \mathcal I_1$ and $\mu \in \mathcal I_2$, we have (3.4). (iii) For $r \in \mathcal I$ and $s, t \in \mathcal I\setminus\{r\}$,
$$G^{(r)}_{st} = G_{st} - \frac{G_{sr} G_{rt}}{G_{rr}}. \qquad (3.5)$$
(iv) All of the above identities hold for $G^{(J)}$ instead of $G$, for $J \subset \mathcal I$.
Proof. All these identities can be proved using Schur's complement formula; they have previously been derived and summarized in e.g. [14,15,17].

(ii) We also have the identities (3.8)-(3.9), as well as
$$G_{[ii]}^{-1} = \big(G^{[k]}_{[ii]}\big)^{-1} - G_{[ii]}^{-1}\, G_{[ik]}\, G_{[kk]}^{-1}\, G_{[ki]}\, \big(G^{[k]}_{[ii]}\big)^{-1}. \qquad (3.10)$$
(iii) All of the above identities hold for $G^{[J]}$ instead of $G$, for $J \subset \mathcal I$.
Proof. These identities can be proved using Schur's complement formula. The details are left to the reader.
Next we introduce the spectral decomposition of $G$. Let
$$Y = \sum_{k=1}^N \sqrt{\lambda_k}\, \xi_k \zeta_k^\dagger$$
be a singular value decomposition of $Y$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N \ge 0$ and $\{\xi_k\}_{k=1}^N$, $\{\zeta_k\}_{k=1}^N$ are orthonormal bases of $\mathbb{C}^{\mathcal I_1}$ and $\mathbb{C}^{\mathcal I_2}$, respectively. Then by (2.30), we have the spectral representation
$$G_L = \sum_{k=1}^N \frac{\xi_k \xi_k^\dagger}{\lambda_k - w}, \qquad G_R = \sum_{k=1}^N \frac{\zeta_k \zeta_k^\dagger}{\lambda_k - w}, \qquad (3.11)$$
together with the analogous representations of the off-diagonal blocks of $G$.

Definition 3.4 (Generalized entries). For $v, w \in \mathbb{C}^{\mathcal I}$, $s \in \mathcal I$ and an $\mathcal I\times\mathcal I$ matrix $A$, we denote $A_{vw} := \langle v, Aw\rangle$, $A_{vs} := \langle v, Ae_s\rangle$, $A_{sw} := \langle e_s, Aw\rangle$, where $e_s$ is the standard unit vector.
Given vectors $v \in \mathbb{C}^{\mathcal I_1}$ and $w \in \mathbb{C}^{\mathcal I_2}$, we always identify them with their natural embeddings $\begin{pmatrix} v \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ w \end{pmatrix}$ in $\mathbb{C}^{\mathcal I}$; the exact meaning will be clear from the context.
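The spectral representation (3.11) can be checked numerically from the SVD of $Y$; the sketch below verifies the formula for the upper-left block $G_L$ (illustrative sizes and parameters):

```python
# Verifying G_L(w) = sum_k xi_k xi_k^† / (lambda_k - w) from the SVD of Y.
import numpy as np

N = 80
rng = np.random.default_rng(7)
Y = rng.standard_normal((N, N)) / np.sqrt(N) - 0.7 * np.eye(N)
w = 0.5 + 0.2j

xi, s, _ = np.linalg.svd(Y)                    # Y = sum_k s_k xi_k zeta_k^†
lam = s ** 2                                   # eigenvalues of Y Y^†
GL = np.linalg.inv(Y @ Y.conj().T - w * np.eye(N))
GL_spec = (xi / (lam - w)) @ xi.conj().T       # sum_k xi_k xi_k^† / (lam_k - w)
print(np.allclose(GL, GL_spec))                # True
```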
The following estimates hold uniformly for $w \in \mathbf D(\tau, N)$: we have the deterministic bounds (3.13), and for $v \in \mathbb{C}^{\mathcal I_1}$ and $w \in \mathbb{C}^{\mathcal I_2}$ we have the bounds (3.14)-(3.17) on weighted sums of generalized entries. All of these estimates remain true with $G^{(J)}$ in place of $G$, for $J \subset \mathcal I$.
Proof. The estimates in (3.13) follow from (3.11): for any unit vectors $x, y \in \mathbb{C}^{\mathcal I_1}$, and similarly for unit vectors $x \in \mathbb{C}^{\mathcal I_1}$ and $y \in \mathbb{C}^{\mathcal I_2}$, the spectral representation bounds the corresponding generalized entries; the same argument applies to the other two blocks of $G$, which implies (3.13). It is straightforward to extend the proof to $\partial_w G$, where the factor $\eta^{-2}$ comes from the $(\lambda_k - w)^{-2}$ factors in $\partial_w G$. For (3.14), we use (3.11) to compute $\sum_{i\in\mathcal I_1}|G_{iw}|^2$; similarly, we can prove the identity for $\sum_{\mu\in\mathcal I_2}|G_{\mu w}|^2$ and (3.15). For identity (3.16), we first compute $|G_{iw}|^2$ using (3.11), and then use (2.30) and (3.18). Identity (3.17) can be proved in a similar way.
The following lemma gives useful large deviation bounds; see Theorem B.1 and Lemmas B.2-B.4 in [13] for the proof, and also Theorem C.1 of [14].
If the coefficients $(a^{(N)}_{ij})$ and $(b^{(N)}_i)$ depend on some parameter $u$, then all of the above estimates are uniform in $u$.
We have stated some basic properties of $\rho_{1,2c}$ and $m_{1,2c}$ in Lemma 2.3 and Proposition 2.14. We now collect further estimates on $m_{1,2c}$ that will be used in the proof; the next lemma is proved in Appendix A.2. For $w = E + i\eta \in \mathbf D$, we define the distance to the spectral edges
$$\kappa := \min_{k} |E - e_k|. \qquad (3.21)$$
Notice that in the case $|z| < 1$, we do not take into account the edge at $e_{2L} = 0$.
Case 3. Suppose $e_k \neq 0$ is a regular edge. Then the corresponding estimates hold for $w \in \mathbf D^e_k(\zeta, \tau', N)$, provided $\tau' > 0$ is small enough.

Case 4. Suppose $|z|^2 \le 1-\tau$. We take $e_{2L} = 0$ and $\tau' > 0$ small enough. Then for $w \in \mathbf D^{e_{2L}}(\zeta, \tau', N)$ we have (3.26) for some constant $t > 0$, and
$$\operatorname{Im} m_{1,2c} \sim |w|^{-1/2}. \qquad (3.27)$$
In Cases 1-4, we have the uniform bound (3.29), where $c > 0$ is some constant that may depend on $\tau$ and $\tau'$; in Case 5, we have the analogous bound (3.30). Note that the uniform bounds (3.29) and (3.30) guarantee that the matrix entries of $\Pi(w)$ remain bounded. We have the following lemma, which is proved in Appendix A.2.
Lemma 3.8. In Cases 1-4 of Lemma 3.7 we have the estimate (3.31), and in Case 5 of Lemma 3.7 we have (3.32). In all the cases of Lemma 3.7, the bound (3.33) holds uniformly in $w$ and in any deterministic unit vector $v \in \mathbb{C}^{\mathcal I}$.
The self-consistent equation (2.11) can be written as $\Upsilon(w, m_1) = 0$ (3.34). The stability of (3.34) roughly says that if $\Upsilon(w, m_1)$ is small and $m_1(w') - m_{1c}(w')$ is small for $w' := w + iN^{-10}$, then $m_1(w) - m_{1c}(w)$ is small. For an arbitrary $w \in \mathbf D$, we define the discrete set
$$L(w) := \{w\} \cup \big\{w' \in \mathbf D : \operatorname{Re} w' = \operatorname{Re} w,\ \operatorname{Im} w' \in [\operatorname{Im} w, 1] \cap (N^{-10}\mathbb{N})\big\}. \qquad (3.36)$$
Thus, if $\operatorname{Im} w \ge 1$ then $L(w) = \{w\}$, and if $\operatorname{Im} w < 1$ then $L(w)$ is a one-dimensional lattice with spacing $N^{-10}$, plus the point $w$ itself. Obviously, we have $|L(w)| \le N^{10}$.
Definition 3.9 (Stability of (3.34)). We say that (3.34) is stable on $\mathbf D$ if the following holds. Suppose that $N^{-2}|m_{1c}| \le \delta(w) \le (\log N)^{-1}|m_{1c}|$ for $w \in \mathbf D$, that $\delta$ is Lipschitz continuous with Lipschitz constant at most $N^4$, and moreover that for each fixed $E$ the function $\eta \mapsto \delta(E+i\eta)$ is non-increasing for $\eta > 0$. Suppose that $u_1 : \mathbf D \to \mathbb{C}$ is the Stieltjes transform of a positive integrable function. Let $w \in \mathbf D$, and suppose that for all $w' \in L(w)$ we have the bound (3.37). Then (3.38) holds for some constant $C > 0$ independent of $w$ and $N$. We say that (3.34) is stable on $\mathbf D_L$ if, for $0 \le \delta(w) \le (\log N)^{-1}|m_{1c}|$, (3.37) implies the analogous bound, for some constant $C > 0$ independent of $w$ and $N$.
This stability condition has previously appeared in [4,7,24]; in [24], for example, it was established under various regularity assumptions. In the following lemma, we establish stability on each regular domain; the proof is presented in Appendix A.3. This lemma leaves out the case $|w|^{1/2} + |z|^2 = o(1)$, which we handle in a different way in Section 4.5.

Entrywise local law when T is diagonal
In this section we prove the entrywise local law and averaged local law of Theorem 2.18 when $T$ is diagonal. The proof is similar to previous proofs of entrywise local laws, e.g. in [4,5,7,24]. We basically follow the ideas in [7], and provide the necessary details for the parts that differ from the previous proofs.
The main novel observation of this section is that the self-consistent equations (2.9) and (2.10) can be "derived" from the random matrix model by an application of Schur's complement formula. It is helpful to give a heuristic argument here. We introduce the conditional expectation $\mathbb{E}_{[i]}[\cdot] := \mathbb{E}[\cdot\,|\,H^{[i]}]$, i.e. the partial expectation with respect to the randomness of the $i$-th and $\bar i$-th rows and columns of $H$. For the diagonal group $G_{[ii]}$, we formally ignore the random fluctuations in (3.6) to obtain the approximate identity (4.1), where we use the definitions of $m_1$ and $m_2$ in (2.34). The 11 entry of (4.1) gives the equation (4.2), from which we obtain an expression for $G_{ii}$. Summing this expression over $i$ gives (2.9); multiplying (4.2) by $|d_i|^2$ and summing over $i$ gives the self-consistent equation (2.10). In this section we give a rigorous justification of these approximations. Before we start the proof, we make the following remark. In this section we mainly focus on the domain $\mathbf D$; on the domain $\mathbf D_L$ the proofs are much simpler, and we only describe them briefly. The parameter $z$ can be either inside or outside the unit circle. Recalling Lemmas 3.7 and 3.10, the domain $\mathbf D$ can be divided roughly into four cases: $w$ near a nonzero regular edge, $w \to 0$, $w$ in the bulk, and $w$ outside the spectrum. In this section we only consider the case $|z|^2 \le 1-\tau$, since it covers all four behaviors. Notice that in this case $|m_{1,2c}(w)| \sim |w|^{-1/2}$ for $w$ in any compact set of $\mathbb{C}_+$, by Proposition 2.15. Also, due to the remark above Lemma 3.10, in Sections 4.1-4.4 we assume $|w|^{1/2} + |z|^2 \ge c$ for some $c > 0$; we handle the case $|w|^{1/2} + |z|^2 = o(1)$ in Section 4.5.
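To make the heuristic concrete, the sketch below runs a damped fixed-point iteration on the system suggested by the Schur-complement computation above, i.e. $G_{ii} \approx \big[-w(1+|d_i|^2 m_2) + |z|^2/(1+m_1)\big]^{-1}$, averaged with weights $1$ and $|d_i|^2$. We stress that this update map is our reading of the garbled displays and stands in for the exact equations (2.9)-(2.10); it is an assumption, not the paper's verbatim system.

```python
# Damped fixed-point iteration for a two-variable self-consistent system
# (the update map is an assumption standing in for (2.9)-(2.10)).
import numpy as np

def solve_sc(w, z, d, iters=5000, damp=0.8):
    m1, m2 = 1j, 1j                            # start in the upper half-plane
    for _ in range(iters):
        g = 1.0 / (-w * (1.0 + np.abs(d) ** 2 * m2) + np.abs(z) ** 2 / (1.0 + m1))
        m1_new = g.mean()                      # "summing over i"
        m2_new = (np.abs(d) ** 2 * g).mean()   # "multiplying by |d_i|^2, then summing"
        m1 = damp * m1 + (1.0 - damp) * m1_new
        m2 = damp * m2 + (1.0 - damp) * m2_new
    return m1, m2

# Example: two singular values as in Figure 1, evaluated in the upper half-plane.
d = np.array([np.sqrt(0.5)] * 50 + [np.sqrt(1.5)] * 50)
print(solve_sc(w=1.0 + 0.05j, z=0.5, d=d))
```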

The self-consistent equations
To begin with, we prove the following weak version of the entrywise local law.
For the purpose of the proof, we define the following random control parameters.
For $J \subseteq \mathcal I$, define the averaged variables $m^{(J)}_{1,2}$ in analogy with (2.34). The averaged error and the random control parameters $\Lambda$, $\theta$ and $\Psi_\theta$ are defined in (4.7). Remark: By (2.4), we immediately get $\theta = O(\Lambda)$, since $|m_1 - m_{1c}| \le \tau^{-1}\Lambda$ and $|m_2 - m_{2c}| \le \Lambda$.
We introduce the $Z$ variables
$$Z^{[J]}_{[i]} := (1 - \mathbb{E}_{[i]})\big(G^{[J]}_{[ii]}\big)^{-1}.$$
By the identity (3.6), we have the representation (4.10), where $C = C(\tau)$ is a constant depending only on $\tau$.
Proof. For $i \in \mathcal I_1$, we estimate $|m_1 - m^{[J]}_1|$, where in the first step we use (3.5), and in the second and third steps the identity (3.15); similarly, using (3.5) and (3.16), we obtain the analogous estimate. By induction on the indices in $[J]$, we can prove (4.12). The proof for $m_2$ is similar.
for $s \neq t \in \{1, 2\}$ (4.14), uniformly in $w \in \mathbf D \cup \mathbf D_L$. In particular, these imply (4.15) and (4.16).

Proof. Applying the large deviation Lemma 3.6 to $Z_{[i]}$ in (4.10), we bound $(Z_{[i]})_{11}$, where in the third step we use the identity (3.14); similarly, we can bound $(Z_{[i]})_{22}$ using Lemma 3.6 and (3.15). For the off-diagonal part, we use Lemma 3.6 and (3.17), and similarly we can estimate $(Z_{[i]})_{21}$. Now we prove (4.15). By the definitions (4.7) and using (4.11), we estimate $(Z_{[i]})_{11}$; we can estimate $(Z_{[i]})_{22}$ and the third term in (4.14) in a similar way. For Cases 1-4 in Lemma 3.7, we have $|m_{1c}| \sim 1$ for $|w| \sim 1$, $\operatorname{Im} m_{1c} \sim |w|^{-1/2} \sim |m_{1c}|$ for $|w| \to 0$, and $\eta \le C \operatorname{Im} m_{1c}$. The second term in (4.14) is then estimated analogously. This concludes (4.15). Finally, the estimate (4.16) follows directly from (4.13), (4.14) and (3.13).

The large η case
It remains to prove Proposition 4.1 on the domain $\mathbf D$. We fix $E$ and apply a continuity argument in $\eta$, by first showing that the rough bound $\Lambda \le |w|^{-1/2}(\log N)^{-1}$ of Lemma 4.5 holds for large $\eta$. To start the argument, we first establish estimates on $G$ when $\eta \sim 1$. The next lemma, a trivial consequence of (3.13), gives a deterministic bound on $G$ for some $C > 0$; this estimate also holds if we replace $G$ with $G^{(J)}$ for $J \subset \mathcal I$. We then use the expansion around $\pi_{[i]}$ (see (4.21)), where $\|\pi^{-1}_{[i]}\| = O(1)$ and $\|\varepsilon_{[i]}\| \prec N^{-1/2}$. Since $G_{[ii]} = O(1)$, we can expand (4.31) to obtain (4.32) up to an error $O_\prec(N^{-1/2})$. The 11 and 22 entries of (4.32) lead to the equations (4.33) and (4.34). Our goal is to prove that $\operatorname{Im} m_{1,2} \ge C(\log N)^{-1}$ with high probability for some $C > 0$.
Using the spectral decomposition (3.11), we note that for $l > 1$ we have two complementary bounds on $\operatorname{Im} m_2$; summing them and optimizing over $l$, we obtain (4.35). Assume that $\operatorname{Im} m_2 \le C(\log N)^{-1}$. Then by (4.8) we also have $\operatorname{Im} m_1 \le C\tau^{-1}(\log N)^{-1}$, and from (4.35) we get $|m_2| \le C(\log N)^{-1/2}$. Together with $\operatorname{Im} w = \eta \ge c$ and $\operatorname{Im}\big[|z|^2/(1+m_1)\big] < 0$, (4.33) then gives a lower bound with high probability. On the other hand, using $\operatorname{Im}\big[|z|^2/(1+|d_i|^2 m_2)\big] < 0$, (4.34) implies $\operatorname{Im} m_2 \ge c$ with high probability for some $c > 0$. This contradicts $\operatorname{Im} m_2 \le C(\log N)^{-1}$. Thus $\operatorname{Im} m_2 \ge C(\log N)^{-1}$ with high probability for some $C > 0$, which also implies $\operatorname{Im} m_1 \ge C(\log N)^{-1}$ by (4.8).

Proof of the weak entrywise local law
In this subsection, we finish the proof of Proposition 4.1 on the domain $\mathbf D$. We fix the real part $E$ of $w = E + i\eta$ and decrease the imaginary part $\eta$. Recall that Lemma 4.5 is based on the condition $\Lambda \le |w|^{-1/2}(\log N)^{-1}$ (i.e. the event $\Xi$), which so far has only been established for large $\eta$ in (4.30). We now show this condition also holds for small $\eta$, using a continuity argument. It is convenient to introduce the random function $v(w) := \max_{w' \in L(w)}(\cdots)$, where $L(w)$ is defined in (3.36). Fix a regular domain $\mathbf S$, an $\epsilon < \zeta/4$ and a large $D > 0$. Our goal is to prove that with high probability there is a gap in the range of $v$, i.e.
for all $w \in \mathbf S$ and large enough $N \ge N(\epsilon, D)$.
Suppose $v(w) \le N^{-\epsilon}$. Then it is easy to verify the defining condition of $\Xi$ for all $w' \in L(w)$; hence $\{v(w) \le N^{-\epsilon}\} \subset \Xi(w')$ for all $w' \in \mathbf S \cap L(w)$. Then by (4.19), for each $w' \in \mathbf S \cap L(w)$ there exists an $N_0 = N_0(\epsilon, D)$ such that the corresponding estimate holds for all $N > N_0$; taking a union bound, we obtain it simultaneously for all $w' \in \mathbf S \cap L(w)$. We now apply Lemma 3.10. If $\kappa \ll 1$ (recall (3.21)), then $|w| \sim 1$ and the stability estimate holds for all $w' \in L(w)$; if $\kappa \ge c > 0$ for some constant $c > 0$, then the analogous estimate holds for all $w' \in L(w)$. Combining the two cases, we obtain a uniform estimate for all $w' \in L(w)$. By (4.19), the corresponding high-probability bound holds for all $w' \in \mathbf S \cap L(w)$; combining it with (4.44), we find an $N_1 = N_1(\epsilon, D)$ such that, taking a union bound over $L(w)$, (4.39) holds for all $N \ge \max\{N_0, N_1\}$. Now we conclude the proof of Proposition 4.1 by combining (4.39) with the large-$\eta$ estimate (4.30). We choose a lattice $\Delta \subset \mathbf S$ with $|\Delta| \le N^{20}$ such that for any $w \in \mathbf S$ there is a $w' \in \Delta$ with $|w' - w| \le N^{-9}$, and take a union bound over $\Delta$. Since $v$ has Lipschitz constant bounded by, say, $N^6$, the bound extends from $\Delta$ to all of $\mathbf S$. Combining with (4.30), we see that there exists $N_2 = N_2(\epsilon, D)$ such that the gap estimate holds for $N > N_2$. Since $\epsilon$ and $D$ are arbitrary, this shows that $v$ stays on its lower branch uniformly in $w \in \mathbf S$, i.e. (4.48) holds. In particular, for all $w \in \mathbf S$ the event $\Xi$ holds with high probability. Now using (4.23) and (4.48), we get (4.49). To conclude Proposition 4.1, it remains to prove the estimate for the off-diagonal entries. By (4.11), it is not hard to see that the diagonal estimates are stable under removing indices, for any $|J| \le l$ with fixed $l \in \mathbb{N}$. Thus we have $G^{[J]}_{[ii]} = O(|w|^{-1/2})$ and $\big(G^{[J]}_{[ii]}\big)^{-1} = O(|w|^{1/2})$ with high probability. For $i \neq j \in \mathcal I_1$, using (3.8) and the above diagonal estimates, we bound the off-diagonal groups, where, as in the proof of Lemma 4.4, we use Lemmas 3.5 and 3.6.

Proof of the strong entrywise local law
In this section, we finish the proof of the (strong) entrywise local law of Theorem 2.18 on the domain $\mathbf D$, under the condition $|w|^{1/2} + |z|^2 \ge c$. In Lemma 4.5, we proved an error estimate for the self-consistent equations of $m_{1,2}$ that is linear in $\Psi_\theta$. The core of the proof is to improve this estimate to one quadratic in $\Psi_\theta$. For the sequence of random variables $Z_{[i]}$, we define the averaged quantities $[Z]$ and $\langle Z\rangle$. The following lemma is an improvement of Lemma 4.5.
Lemma 4.8. Fix $|z|^2 \le 1-\tau$. Then for $w \in \mathbf D$ we have $\Upsilon(w, m_1) \prec |w|^{1/2}\Psi_\theta^2 + \|[Z]\| + \|\langle Z\rangle\|$, and for $w \in \mathbf D_L$ we have $\Upsilon(w, m_1) \prec (N\eta)^{-1} + \|[Z]\| + \|\langle Z\rangle\|$.

Proof. The proof is almost the same as that of Lemma 4.5; we only point out the differences. First consider $w \in \mathbf D$. By Proposition 4.1, the event $\Xi$ holds with high probability; hence, without loss of generality, we may assume $\Xi$ holds throughout the proof. Using (3.9), we get (4.57). By Proposition 4.1, (3.31) and (4.51), we obtain the relevant a priori bounds, and by Lemma 3.7 it is easy to verify the required estimates on $m_{1,2c}$. Then, following the arguments of Lemma 4.5, we obtain the desired result on $\Xi$. For $w \in \mathbf D_L$, the proof is similar, using (4.4).
In the following lemma, we prove stronger bounds on $[Z]$ and $\langle Z\rangle$ by keeping track of the cancellations due to the average over the index $i$. The proof is given in Appendix B.
Now we finish the proof of the entrywise local law and averaged local law on the domain $\mathbf D$. By Proposition 4.1, we can choose the input bound in Lemma 4.9 accordingly, and then use the stability Lemma 3.10. Here, if $\sqrt{\kappa+\eta} \ge (\log N)^{-1}$ we use the first branch of the stability estimate; if $\sqrt{\kappa+\eta} \le (\log N)^{-1}$ we have $\operatorname{Im}(m_{1c} + m_{2c}) = O(\sqrt{\kappa+\eta})$, which gives the same conclusion. We then use (4.53) to get (4.60). Repeating the previous steps with the new estimate (4.60), we get an improved bound after $l$ iterations. This implies the averaged local law $\theta \prec (N\eta)^{-1}$, since $l$ can be arbitrarily large. Finally, as in (4.49) and (4.51), we obtain the bound for $i \neq j$. This concludes the entrywise local law and averaged local law of Theorem 2.18 when $|w|^{1/2} + |z|^2 \sim 1$. When $w \in \mathbf D_L$, we have already proved the entrywise law (see the remark after (4.28)); we can also prove an analogue of Lemma 4.9, and the averaged local law then follows from Lemma 3.10. We leave the details to the reader.

Proof of Theorem 2.18 when |z| and |w| are small
In the previous proof, we did not include the case where $|w|^{1/2} + |z|^2 \le \epsilon$ for some sufficiently small constant $\epsilon > 0$; the only reason is that Lemma 3.10 does not apply in this case. In this subsection, we deal with this problem. The main idea is to use a different set of self-consistent equations, which has the desired stability when $|w|$ and $|z|$ are small. Multiplying (4.24) by $|d_i|^2$ and summing over $i$ yields (4.62). Recall that $\Sigma := DD^\dagger = D^\dagger D$. We introduce a new matrix $\widetilde H$ and define $\widetilde G := \widetilde H^{-1}$. By Schur's complement formula, we can compute the upper-left and lower-right blocks of $\widetilde G$, and we write $m_{1,2}$ in another way in terms of them. We apply the arguments of the proof of Lemma 4.5 to $\widetilde H$, obtain the corresponding estimate, and plug it into (4.65) to get (4.67). We take the equations (4.62) and (4.67) as our new self-consistent equations, namely
$$\mathbf 1(\Xi)\, f_1(m_1, m_2) = \mathbf 1(\Xi)\, O(\Psi_\theta), \qquad \mathbf 1(\Xi)\, f_2(m_1, m_2) = \mathbf 1(\Xi)\, O(\Psi_\theta), \qquad (4.68)$$
where $f_{1,2}$ are defined in (4.69)-(4.70). According to the following lemma, this system of self-consistent equations is stable when $|w|$ and $|z|^2$ are small enough: there exists an $\epsilon > 0$ such that if $|w|^{1/2} + |z|^2 \le \epsilon$, we have
$$|u_1(w) - m_{1c}(w)| + |u_2(w) - m_{2c}(w)| \le C\delta \qquad (4.71)$$
for some constant $C > 0$ independent of $w$, $z$ and $N$.
Proof. The proof depends on the estimate of the Jacobian at $(m_{1c}, m_{2c})$. By (3.26) and (A.35), we can compute the Jacobian, where $t_0 = \big(N^{-1}\sum_{i=1}^n l_i/s_i\big)^{-1}$. We conclude stability by expanding $f_{1,2}(u_1, u_2)$ around $(m_{1c}, m_{2c})$ and using a fixed-point argument, as in the proof of Lemma 3.10 in Section A.3.
With this stability lemma, we can repeat the arguments of the previous subsections to prove the entrywise local law and averaged local law when $|w|^{1/2} + |z|^2 \le \epsilon$.

Anisotropic local law when T is diagonal
In this section we prove the anisotropic local law of Theorem 2.18 when $T$ is diagonal. The basic ideas of the proof follow [4, Section 5], and the core of our proof is a novel way to perform the combinatorics. By Definition 2.17 (ii) and the definition of the matrix norm, it suffices to prove the following proposition for generalized entries of $G$.
The proof of Lemma 5.2 is based on the polynomialization method developed in [4, Section 5]. Again, we only give the proof for $w \in \mathbf D$; when $w \in \mathbf D_L$, the proof is almost the same.

Rescaling and partition of indices
For our purposes, it is convenient to define the rescaled matrices $R^{[J]}$ (as in (5.4)) for any $J \subset \mathcal I$ with $|J| \le l$ for some fixed $l$. Consequently, we define the control parameter
$$\Phi := |w|^{1/2}\Psi. \qquad (5.5)$$
By the entrywise law, for $w \in \mathbf D$ the entries of $R$ satisfy the bound (5.6) under the above scaling. To prove Lemma 5.2, it is then equivalent to prove (5.7). We expand the product in (5.7) as a sum over index choices. Formally, we regard $\{i_1, \dots, i_p, j_1, \dots, j_p\}$ as a set of $2p$ (index) variables taking values in $\mathcal I_1$. Let $\mathcal B_p$ be the collection of all partitions of $\{i_1, \dots, i_p, j_1, \dots, j_p\}$ such that $i_k$ and $j_k$ are not in the same block for all $k = 1, \dots, p$. For $\Gamma \in \mathcal B_p$, let $n(\Gamma)$ be the number of its blocks, and define a set of $\mathcal I_1$-valued variables $L(\Gamma) := \{b_1, \dots, b_{n(\Gamma)}\}$.
Now it is convenient to regard $\Gamma$ as a symbol-to-symbol function such that each $\Gamma^{-1}(b_k)$ is a block of the partition. Then we can rewrite the sum accordingly, where $\Sigma^*$ denotes the summation subject to the condition that the values of $b_1, \dots, b_n$ are ordered as $b_1 < b_2 < \dots < b_n$. We pick one term from this summation and denote it by $\Delta(\Gamma)$.

Notation: For any $b_k \in L$, we define a corresponding $\mathcal I_2$-valued variable $\bar b_k$ in the obvious way, and we denote $[L] := \{b_1, \dots, b_n, \bar b_1, \dots, \bar b_n\}$.
For notational convenience, we will also use the letters $i, j, k, l$ to denote symbols in $L$.

String and string operators
During the proof, we will frequently use the resolvent identities of Lemma 5.3 for the rescaled matrix $R$. In this section, we expand the $R$ variables in $\Delta(\Gamma)$ using these identities. During the expansion, we need to distinguish carefully between an algebraic expression and its value as a random variable.
Definition 5.4 (Strings). Let $\mathcal A$ be an alphabet containing all symbols that may appear during the expansion, such as $R^{[J]}_{[ij]}$, $\big(R^{[J]}_{[ij]}\big)^{-1}$, $S_{[ij]}$, $u^\dagger_{[i]}$ and $v_{[j]}$ for $i, j \in L(\Gamma)$ and $J \subset L(\Gamma)$. We define a string $\mathbf s$ to be a formal expression consisting of symbols from $\mathcal A$, and we denote by $\widehat{\mathbf s}$ the random variable it represents. Let $\mathcal M$ be the collection of all possible strings. We denote the empty string by $\emptyset$.
Given a string $\mathbf s$, an expansion of the $R$'s in it produces a different string $\mathbf s'$; however, the two represent the same random variable, $\widehat{\mathbf s} = \widehat{\mathbf s'}$. During the proof, we will add more elements to $\mathcal A$ (see the symbols in (5.32)).
To perform the expansions in a systematic way, we define the following operators acting on strings. We call the symbols $R^{[J]}_{[ij]}$ and $\big(R^{[J]}_{[ij]}\big)^{-1}$ maximally expanded if $J \cup \{i, j\} = L$, and we call a string $\mathbf s$ maximally expanded if all the $R$ symbols in it are maximally expanded.

Definition 5.6. Define the function $F_{\rm d\text{-}max} : \mathcal M \to \mathbb{N}$ (where the subscript "d-max" stands for "distance to being maximally expanded") through
$$F_{\rm d\text{-}max}\Big(\big(R^{[J]}_{[ij]}\big)^{*}\Big) = \big|L\setminus(J\cup\{i,j\})\big|,$$
where $*$ can be $1$ or $-1$, extended to strings by summing over the $R$ symbols. Define another function $F_{\rm off} : \mathcal M \to \mathbb{N}$, with $F_{\rm off}(\Omega)$ the number of off-diagonal symbols in $\Omega$.
By off-diagonal symbols, we mean terms of the form $A_{st}$ with $s \notin \{t, \bar t\}$, or $A_{[ij]}$ with $i \neq j$, e.g. $R^{[J]}_{[ij]}$ and $S_{[ij]}$ with $i \neq j$. Later we will define other types of off-diagonal symbols (see (5.32)). Note that an $R$ symbol is maximally expanded if and only if $F_{\rm d\text{-}max}(R) = 0$, and a string $\Omega$ is maximally expanded if and only if $F_{\rm d\text{-}max}(\Omega) = 0$. The next two lemmas are almost trivial consequences of Definition 5.5. For $\rho$, we have the estimate (5.20), where $a$ is the number of maximally expanded off-diagonal $R$'s in $\Omega$.

Expansion of the strings
For simplicity of notation, throughout the rest of this section we omit the complex conjugates on the right-hand side of (5.11) (if we keep the complex conjugates, the proof is the same but with slightly heavier notation). Suppose the right-hand side of (5.11) is represented by a string $\Omega_\Delta$. Given a binary word $\mathbf w = a_1a_2\cdots a_m$ with $a_i \in \{0, 1\}$, we define the operation $(\Omega)_{\mathbf w}$ as the corresponding composition of operators, where $b_{qn+r} := b_r$ (recall (5.8)) for any $1 \le r \le n$ and $q \in \mathbb{N}$; thus a binary word $\mathbf w$ uniquely determines an operator composition. By (5.17), $(\Omega_\Delta)_{\mathbf w 0} + (\Omega_\Delta)_{\mathbf w 1} = (\Omega_\Delta)_{\mathbf w}$, and so we get
$$\sum_{|\mathbf w| = m} (\Omega_\Delta)_{\mathbf w} = \Omega_\Delta$$
for any $m \ge 1$, where $|\mathbf w|$ is the length of $\mathbf w$.
Proof. We use $m_0$ to denote the number of 0's in $\mathbf w$ and $m_1$ the number of 1's; furthermore, we write $m^{(1)}_0$ for the number of 0's corresponding to nontrivial $\tau_0$'s. Using $F_{\rm d\text{-}max}(\Omega_\Delta) = np$, we get the rough estimate $m^{(1)}_0 + m_1 < n(p + 6l_0)$. By the pigeonhole principle, $\mathbf w$ contains at least $n$ consecutive 0's corresponding to trivial $\tau_0$'s. This indicates that $(\Omega_\Delta)_{\mathbf w}$ is maximally expanded, which gives a contradiction.
This lemma shows that all strings with sufficiently many off-diagonal symbols contribute at most $\Phi^p$; it only remains to handle the maximally expanded strings. Define a diagonal symbol through (5.26), with the error term (5.27) bounded by $O_\prec(\Phi)$ by (4.15) and the averaged local law. Now, for all maximally expanded $(\Omega_\Delta)_{\mathbf w}$ with $|\mathbf w| = (n^2+1)(p + 6l_0)$, denote by $\sigma((\Omega_\Delta)_{\mathbf w})$ the expression obtained after plugging in (5.26) and (5.27) without the tail terms. Similarly to Lemma 5.10, we have
$$\sum_{\Gamma\in\mathcal B_p}\ \sum_{b_l\in\mathcal I_1,\, l=1,\dots,n(\Gamma)} \Bigg|\mathbb{E}\sum_{\substack{|\mathbf w| = (n^2+1)(p+6l_0) \\ (\Omega_\Delta)_{\mathbf w}\ \text{maximally expanded}}} \Big((\Omega_{\Delta(\Gamma)})_{\mathbf w} - \sigma\big((\Omega_{\Delta(\Gamma)})_{\mathbf w}\big)\Big)\Bigg| \le C_{p,\zeta}\,\Phi^p.$$
From the above bound and Lemmas 5.9 and 5.10, we see that to prove (5.7) it suffices to show
$$\sum_{\Gamma\in\mathcal B_p}\ \sum_{b_l\in\mathcal I_1,\, l=1,\dots,n(\Gamma)} \Bigg|\mathbb{E}\sum_{\substack{|\mathbf w| = (n^2+1)(p+6l_0) \\ (\Omega_\Delta)_{\mathbf w}\ \text{maximally expanded}}} \sigma\big((\Omega_{\Delta(\Gamma)})_{\mathbf w}\big)\Bigg| \le C_{p,\zeta}\,\Phi^p. \qquad (5.28)$$
We write $\sigma((\Omega_\Delta)_{\mathbf w})$ as a sum of monomials in terms of the $S_{[ij]}$, labeled by an index $\mathbf i$ as in (5.29). Notice that after plugging (5.29) into (5.28), the number of summands $M(\mathbf w, \Delta(\Gamma), \mathbf i)$ inside the expectation depends only on $p$ and $\zeta$. Thus, to show (5.28), it suffices to prove the following lemma.
Lemma 5.11. Fix any $\Gamma \in \mathcal B_p$ and any binary word $\mathbf w$ with $|\mathbf w| = (n^2+1)(p + 6l_0)$. Suppose $(\Omega_\Delta)_{\mathbf w}$ is maximally expanded, and let $M(\mathbf w, \Delta(\Gamma))$ be a monomial in $\sigma((\Omega_{\Delta(\Gamma)})_{\mathbf w})$. Then the corresponding bound holds with a constant $C_{p,\zeta}$ that depends only on $p$ and $\zeta$.
For the rest of this section, we fix a $\Gamma \in \mathcal B_p$ and a maximally expanded $(\Omega_{\Delta(\Gamma)})_{\mathbf w}$ with $|\mathbf w| = (n^2+1)(p + 6l_0)$, and we fix a monomial $M(\mathbf w, \Delta(\Gamma))$ in $\sigma((\Omega_{\Delta(\Gamma)})_{\mathbf w})$. Let $\Omega_M$ be the string form of $M(\mathbf w, \Delta(\Gamma))$ in terms of the $S_{[ij]}$. Now we decompose $S_{[ij]}$ as in (5.31), where we define the new symbols in $\mathcal A$ listed in (5.32), labeled by an index $\mathbf i$.

Proof. By the Cauchy-Schwarz inequality, the claim reduces to a second-moment bound; using the value of $h$, it hence suffices to prove (5.40). The key is to extract the factor $N^{-h/2}$ from $\mathbb{E}\, Q(\mathbf w, \Delta(\Gamma))$. For this purpose, we need to keep track of the indices in $L$ during the expansion.
Definition 5.14. Define a function $F_{\rm in} : L\times\mathcal M \to \mathbb{N}$, with $F_{\rm in}(l, \Omega)$ the number of times $l$ or $\bar l$ appears as an index of an off-diagonal $R$ or $S$ symbol in $\Omega$.
The following lemma follows immediately from Definition 5.5 and the expansions performed to obtain $\Omega_Q$ from $(\Omega_\Delta)_{\mathbf w}$. Let $\Omega^X_Q$ be the substring of $\Omega_Q$ containing only the $S^X$ symbols, and $\Omega^R_Q$ the substring containing only the $S^R$ symbols. Define
$$\mathcal V := \{l \in L \mid F_{\rm in}(l, \Omega_\Delta) = 1\}, \qquad (5.44)$$
$$\mathcal V_0 := \{l \in L \mid F_{\rm in}(l, \Omega_\Delta) = 1 \text{ and } F_{\rm in}(l, \Omega^X_Q) = 0\}, \qquad (5.45)$$
$$\mathcal V_1 := \{l \in L \mid F_{\rm in}(l, \Omega_\Delta) = 1 \text{ and } F_{\rm in}(l, \Omega^X_Q) \ge 2\}. \qquad (5.46)$$
Let $n_X$ be the number of off-diagonal $S^X$ symbols in $\Omega^X_Q$, and $n_R$ the number of off-diagonal $S^R$ symbols in $\Omega^R_Q$. Notice that $n_o := n_X + n_R$ is the total number of off-diagonal symbols in $\Omega_Q$.

Introduction of graphs and conclusion of the proof
We introduce graphs to conclude the proof of (5.40). We use a connected graph, denoted $\mathcal G_{Q_0}$, to represent the string $\Omega_Q$. The indices in $[L]$ are represented by black nodes in $\mathcal G_{Q_0}$. The $S^X_{st}$ or $S^R_{st}$ symbols in $\Omega_Q$ are represented by edges connecting the nodes $s$ and $t$. We also define colors for the nodes and edges, where the color set for nodes is {black, white} and the color set for edges is $\{S^X, S^R, X, R\}$. In $\mathcal G_{Q_0}$, all nodes are black, all $S^X$ edges are assigned the color $S^X$ and all $S^R$ edges the color $S^R$. A possible graph is shown in Fig. 3. In this subsection, we identify an index with its node representation, and a symbol with its edge representation.
Definition 5.16. Define the function $\deg$ on the node set $[L]$, where $\deg(l)$ is the number of $S^R$ edges attached to the node $l$.
By Lemma 5.15, we see that for any $l \in \mathcal V_0$,
$$F_{\rm in}(l, \Omega_Q) = \deg(l) + \deg(\bar l) = 1 \pmod 2. \qquad (5.47)$$
Now we expand the $S^R$ edges. Take the $S^R_{ij}$ edge as an example (recall (5.34)). We replace the $S^R_{ij}$ edge with an $R$-group, defined as follows. We add two white nodes to represent the summation indices $\bar k, l \notin [L]$, two $X$-colored edges to represent $X_{ik}$ and $X_{lj}$, and an $R$-colored edge connecting $\bar k$ and $l$ to represent $\begin{pmatrix} 0 & R^{[L]}_{kl} \\ 0 & 0 \end{pmatrix}$. We call the subgraph consisting of the three new edges and their nodes an $R$-group; if $i = j$ we call it a diagonal $R$-group, and otherwise an off-diagonal $R$-group. We expand all $S^R$ edges in $\mathcal G_{Q_0}$ into $R$-groups and call the resulting graph $\mathcal G_{Q_1}$. For example, after expanding the $S^R$ edges in Fig. 3, we get the graph in Fig. 4. In the graph $\mathcal G_{Q_1}$, the $R$ edges, $X$ edges and $S^X$ edges are mutually independent, since the $R$ symbols are maximally expanded and the white nodes are distinct from the black nodes. Notice that each white node represents a summation index. As we did for the black nodes, we first partition the white nodes into blocks and then assign values to the blocks when performing the summation. (Figure 4: the resulting graph $\mathcal G_{Q_1}$ after expanding each $S^R$ in Fig. 3 into $R$-groups.)
Let $W$ be the set of all white nodes in $G_{Q_1}$, and let $\mathcal W$ be the collection of all partitions of $W$. Fix a partition $\gamma \in \mathcal W$ and denote its blocks by $W_1, \ldots, W_{m(\gamma)}$. If two white nodes of some off-diagonal R-group happen to lie in the same block, then we merge the two nodes into one diamond white node (Fig. 5a). All the other white nodes are called normal (Fig. 5b). Let $n_R^{(d)}$ be the number of diamond nodes (which is at most the number of diagonal $R$-edges in $G_{Q_1}$). Then we trivially have that the number of white nodes equals twice the number of R-groups minus $n_R^{(d)}$. We now consider the black nodes in $V_0$; WLOG, we assume these nodes are $b_1, \ldots, b_{|V_0|}$. To have nonzero expectation, each white block must contain at least two white nodes. Therefore, for each $k = 1, \ldots, |V_0|$, there exists a block connecting to $b_k$ which contains at least 3 white nodes. Call such a block $W(b_k)$, and denote by $A(b_k)$ the set of white nodes adjacent to $b_k$ in $W(b_k)$. (Note that the $W(b_k)$'s or $A(b_k)$'s are not necessarily distinct.) WLOG, let $W_1, \ldots, W_d$ be the distinct blocks among all the $W(b_k)$'s. Define
$$V_{00} := \{b_k \mid A(b_k) \text{ has no normal white nodes},\ 1 \le k \le |V_0|\},$$
and
$$V_{01} := \{b_k \mid A(b_k) \text{ has at least one normal white node},\ 1 \le k \le |V_0|\}.$$
The following lemma gives the key estimates we need.
Notice that for $b_k \in V_{00}$, $A(b_k)$ contains at least three diamond white nodes, while each of these white nodes is shared with another $b_l$. Thus we trivially have $|V_{00}| \le n_R^{(d)}$. Now we prove (5.50). A diamond white node is connected to two black nodes, and a normal white node is connected to one black node. Hence a diamond white node belongs to two sets $A(b_{k_1})$, $A(b_{k_2})$, and a normal white node belongs to exactly one set $A(b_k)$. Therefore, for each $i = 1, \ldots, d$, if $W_i$ contains exactly one $A(b_k)$ we get one estimate; otherwise, if $W_i$ contains more than one $A(b_k)$, we get the other. Here the first inequality can be understood as follows. For each black node $b_k$ with $A(b_k) \subseteq W_i$, we count the number of white nodes in $A(b_k)$ and add them together. During the counting, we assign weight $1$ to a normal white node and weight $1/2$ to a diamond white node (since it is shared by two different black nodes). If $b_k \in V_{00}$, there are at least three diamond white nodes in $A(b_k)$, with total weight $\ge 3/2$; if $b_k \in V_{01}$, there is at least one normal white node and two other white nodes in $A(b_k)$, with total weight $\ge 2$. Thus $\sum_{b_k : A(b_k) \subseteq W_i} \big( 2 \cdot \mathbf 1_{V_{01}}(b_k) + \tfrac{3}{2} \cdot \mathbf 1_{V_{00}}(b_k) \big)$ is bounded by the number of white nodes in $W_i$. Summing $|W_i|$ over $i$ gives the claimed bound. For the other $m - d$ blocks, each of them contains at least two white nodes, and we conclude using (5.49) in the last step. This proves (5.50). For $b_k \in V_{00}$, $A(b_k)$ contains at least three white nodes from off-diagonal R-groups. Recalling (5.41)–(5.42), only $\tau^{(k)}_1$ may increase $F_{\mathrm{in}}$. Thus $w$ contains $\tau^{(b_k)}_1$ for each $b_k \in V_1 \cup V_2$ (recall the definition of $V_1$ in (5.46)). Therefore, by (5.22), (5.37) and the fact that $V_{00}$ and $V_1$ are disjoint, we obtain the desired bound. This proves the first inequality of (5.51).
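As a worked instance of the weight count (our example, not from the text): suppose a block $W_i$ contains $A(b_1)$ with $b_1 \in V_{00}$ (three diamond nodes) and $A(b_2)$ with $b_2 \in V_{01}$ (one normal node plus two diamond nodes), with one diamond node shared between $A(b_1)$ and $A(b_2)$. Then $W_i$ contains at least $4$ diamond and $1$ normal white nodes, and
$$\sum_{b_k : A(b_k) \subseteq W_i} \Big( 2 \cdot \mathbf 1_{V_{01}}(b_k) + \tfrac{3}{2} \cdot \mathbf 1_{V_{00}}(b_k) \Big) = \tfrac{3}{2} + 2 = \tfrac{7}{2} \;\le\; 5 \;\le\; |W_i|,$$
consistent with the first inequality above.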
By (2.3) and (5.6), a diagonal $R$ edge contributes a factor $1$, an off-diagonal $R$ edge contributes $\Phi$, and an $S^X$ or $X$ edge contributes $N^{-1/2}$. Denoting the resulting total contribution accordingly and using Lemma 2.21, we obtain the required chain of estimates, where in the third step we used (5.50), in the fourth step $h = |V| = |V_1| + |V_{00}| + |V_{01}|$, in the fifth step $N^{-1/2} \le \Phi$ and (5.51), and in the last step (5.51). Thus we have proved (5.40), which concludes the proof of Proposition 5.1.
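Schematically (our rendering of this bookkeeping, not a display from the text), the contribution rules bound each summand by a product over the edges of $G_{Q_1}$:
$$\big|\text{value of } G_{Q_1}\big| \;\prec\; 1^{\#\{\text{diagonal } R\}} \cdot \Phi^{\#\{\text{off-diagonal } R\}} \cdot \big(N^{-1/2}\big)^{\#\{X\} + \#\{S^X\}},$$
after which the sums over block assignments are controlled by (5.50) and (5.51).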

Anisotropic local law: self-consistent comparison
In this section we prove Theorem 2.19. We first prove the anisotropic and averaged local laws under the vanishing third moment assumption (2.23). When $\eta \ge N^{-1/2+\zeta}|m_{2c}|^{-1}$, the anisotropic and averaged local laws can be established without assuming (2.23). For convenience, we only consider the case $w \in \mathbf D$ and $|z|^2 \le 1-\tau$ in this section; the proof for the other cases is almost the same. Notice that we have $\|\tilde\Pi\| = O(1)$ by (3.31). By the remark around (2.50), if $X = X^{\mathrm{Gauss}}$ is Gaussian, then (6.3) holds. Hence for a general $X$, it suffices to prove that
$$\big\| G(X, w) - G(X^{\mathrm{Gauss}}, w) \big\| \prec \Phi(w). \qquad (6.5)$$
Similar to Lemma 3.5, it is easy to prove the following estimates for $G$.
Lemma 6.1. Here $v_i$ is the $i$-th column vector of $V_1$. Let $u \in \mathbb R^{\mathcal I_1}$ and $w \in \mathbb R^{\mathcal I_2}$. Then we have, for some constant $C > 0$, the stated estimates.

Self-consistent comparison
Our proof basically follows the arguments in [24, Section 7] with some minor modifications, so we will not write down all the details of the proof. By polarization, it suffices to show the following proposition.

Proposition 6.2. The estimate (6.10) holds uniformly in $w \in \mathbf S$ and for all deterministic unit vectors $v \in \mathbb C^{\mathcal I}$.
We first assume that (2.23) holds; afterwards we will show how to modify the arguments to handle the case $\eta \ge N^{-1/2+\zeta}|m_{2c}|^{-1}$. The proof consists of a bootstrap argument from larger scales to smaller scales in multiplicative increments of $N^{-\delta}$, where
$$\delta \in \Big(0, \frac{\zeta}{2C_0}\Big), \qquad (6.11)$$
with $C_0 > 0$ a universal constant that will be chosen large enough in the proof. For any $\eta \ge |m_{1c}|^{-1} N^{-1+\zeta}$, we define
$$\eta_l := \eta N^{\delta l} \ \text{ for } l = 0, \ldots, L-1, \qquad \eta_L := 1, \qquad (6.12)$$
where $L = L(\eta) := \max\{l \in \mathbb N \mid \eta N^{\delta(l-1)} < 1\}$. Note that $L \le 2\delta^{-1}$. By (3.13), the function $w \mapsto G(w) - \tilde\Pi(w)$ is Lipschitz continuous in $\mathbf S$ with Lipschitz constant bounded by $CN^3$. Thus, to prove (6.10) for all $w \in \mathbf S$, it suffices to show that (6.10) holds for all $w$ in some discrete but sufficiently dense subset $\hat{\mathbf S} \subset \mathbf S$. We will use the following discretized domain $\hat{\mathbf S}$.
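A minimal numerical sketch (ours, with illustrative values of $N$, $\delta$, $\eta$, not from the paper) of the multiplicative scale ladder (6.12):

# Build the bootstrap scale ladder eta_l = eta * N^{delta*l}, eta_L = 1.
import math

N, delta = 10**6, 0.05                 # illustrative values
eta = N ** (-0.9)                      # illustrative starting scale

# L(eta) = max{ l : eta * N^{delta*(l-1)} < 1 }
L = max(l for l in range(1, 10**4) if eta * N ** (delta * (l - 1)) < 1)

scales = [eta * N ** (delta * l) for l in range(L)] + [1.0]
assert L <= 2 / delta                  # the bound L <= 2/delta noted above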
The bootstrapping is formulated in terms of two scale-dependent properties $(A_m)$ and $(C_m)$, defined on the subsets $\hat{\mathbf S}_m$. It is trivial to see that property $(A_0)$ holds. Moreover, it is easy to observe the following result.

Proof. This result follows from (3.33).
The key step is the following induction result.
Lemma 6.5. For any $1 \le m \le 2\delta^{-1}$, property $(A_{m-1})$ implies property $(C_m)$.
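Schematically (our reading of the induction; Lemma 6.4 presumably supplies the complementary step $(C_m) \Rightarrow (A_m)$), the bootstrap alternates the two lemmas down the scales:
$$(A_0) \;\overset{\text{Lem. 6.5}}{\Longrightarrow}\; (C_1) \;\overset{\text{Lem. 6.4}}{\Longrightarrow}\; (A_1) \;\Longrightarrow\; \cdots \;\Longrightarrow\; (C_m) \;\Longrightarrow\; (A_m) \;\Longrightarrow\; \cdots$$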
Combining Lemmas 6.4 and 6.5, we conclude that (6.14) holds for all $w \in \hat{\mathbf S}$. Since $\delta$ can be chosen arbitrarily small under the condition (6.11), we conclude that (6.10) holds for all $w \in \hat{\mathbf S}$, and Proposition 6.2 follows. What remains now is the proof of Lemma 6.5. Denote
$$F_v(X, w) := \big| G_{vv}(X, w) - \tilde\Pi_{vv}(w) \big|. \qquad (6.15)$$
By Markov's inequality, it suffices to prove the following lemma.
Lemma 6.6. Fix $p \in 2\mathbb N$ and $m \le 2\delta^{-1}$. Suppose that the assumptions of Proposition 6.2, (2.23) and property $(A_{m-1})$ hold. Then the claimed moment bound holds for all $w \in \hat{\mathbf S}_m$ and all deterministic unit vectors $v$.
In the following, we prove Lemma 6.6. First, in order to make use of the assumption $(A_{m-1})$, which involves spectral parameters in $\hat{\mathbf S}_{m-1}$, to obtain estimates for spectral parameters in $\hat{\mathbf S}_m$, we shall use the following rough bounds for $G_{xy}$.

Lemma 6.7. For any $w = E + i\eta \in \mathbf S$ and $x, y \in \mathbb C^{\mathcal I}$, we have
$$\big| G_{xy}(w) - \tilde\Pi_{xy}(w) \big| \prec N^{2\delta} \cdots,$$
where we write $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ for $x_1, y_1 \in \mathbb C^{\mathcal I_1}$ and $x_2, y_2 \in \mathbb C^{\mathcal I_2}$.
Proof. The proof is similar to that of [24, Lemma 7.12].
Lemma 6.8. Suppose $(A_{m-1})$ holds. Then the estimates (6.17) and (6.18) hold for all $w \in \hat{\mathbf S}_m$ and all deterministic unit vectors $v$.

Proof. Let $w = E + i\eta \in \hat{\mathbf S}_m$. Then $E + i\eta_l \in \hat{\mathbf S}_{m-1}$ for $l = 1, \ldots, L(\eta)$, and (6.13) gives $\operatorname{Im} G_{vv}(w) \prec 1$. The estimate (6.17) now follows immediately from Lemma 6.7. To prove (6.18), we remark that if $s(w)$ is the Stieltjes transform of any positive integrable function on $\mathbb R$, then the map $\eta \mapsto \eta \operatorname{Im} s(E + i\eta)$ is nondecreasing and the map $\eta \mapsto \eta^{-1} \operatorname{Im} s(E + i\eta)$ is nonincreasing. We apply these facts to $|w|^{-1/2} \operatorname{Im} G_{vv}(E + i\eta)$ and $\operatorname{Im} m_{1,2c}(E + i\eta)$ to get the claim for $w' = E + i\eta' \in \hat{\mathbf S}_{m-1}$, where we use $\Phi(w) := |w|^{1/2} \Psi(w)$ and the fact that $\eta \mapsto \Psi(E + i\eta)$ is nonincreasing, which is clear from the definition (2.45).
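For completeness, the monotonicity facts just used follow from a one-line computation (standard; we record it here): writing $s(E + i\eta) = \int \frac{\rho(x)}{x - E - i\eta}\, dx$ with $\rho \ge 0$,
$$\eta \operatorname{Im} s(E + i\eta) = \int \frac{\eta^2\, \rho(x)}{(x-E)^2 + \eta^2}\, dx, \qquad \frac{\operatorname{Im} s(E + i\eta)}{\eta} = \int \frac{\rho(x)}{(x-E)^2 + \eta^2}\, dx,$$
and for each fixed $x$ the first integrand is nondecreasing in $\eta$ while the second is nonincreasing.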
Now we apply the self-consistent comparison method of [24, Section 7] to prove Lemma 6.6. To organize the proof, we divide it into two short subsections.

Interpolation and expansion
Definition 6.9 (Interpolating matrices). Introduce the notation $X^0 := X^{\mathrm{Gauss}}$ and $X^1 := X$. Let $\rho^0_{i\mu}$ and $\rho^1_{i\mu}$ be the laws of $X^0_{i\mu}$ and $X^1_{i\mu}$, respectively, for $i \in \mathcal I_1^M$ and $\mu \in \mathcal I_2$. For $\theta \in [0,1]$, we define the interpolated law
$$\rho^\theta_{i\mu} := (1-\theta)\, \rho^0_{i\mu} + \theta\, \rho^1_{i\mu}.$$
We shall work on the probability space consisting of triples $(X^0, X^\theta, X^1)$ of independent $\mathcal I_1^M \times \mathcal I_2$ random matrices, where the matrix $X^\theta = (X^\theta_{i\mu})$ has independent entries with laws $\rho^\theta_{i\mu}$. For $\lambda \in \mathbb R$, $i \in \mathcal I_1^M$ and $\mu \in \mathcal I_2$, we define the matrix $X^{\theta,\lambda}_{(i\mu)}$ through
$$\big( X^{\theta,\lambda}_{(i\mu)} \big)_{j\nu} := \begin{cases} X^\theta_{j\nu} & \text{if } (j,\nu) \ne (i,\mu), \\ \lambda & \text{if } (j,\nu) = (i,\mu). \end{cases}$$
We also introduce the matrices
$$G^\theta(w) := G\big( X^\theta, w \big), \qquad G^{\theta,\lambda}_{(i\mu)}(w) := G\big( X^{\theta,\lambda}_{(i\mu)}, w \big),$$
according to (6.2) and Definition 2.11.
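As a toy illustration (ours) of Definition 6.9: an entry of $X^\theta$ can be sampled by first flipping a $\theta$-coin to choose between the two laws. The Gaussian and Bernoulli entry laws and the $1/\sqrt{N}$ normalization below are illustrative assumptions.

# Sample X^theta entrywise from rho^theta = (1-theta) rho^0 + theta rho^1:
# with probability 1-theta draw from the Gaussian law rho^0, otherwise from
# the general law rho^1 (here, illustratively, symmetric Bernoulli).
import numpy as np

def sample_X_theta(theta, M, N, rng):
    gauss = rng.standard_normal((M, N)) / np.sqrt(N)            # rho^0
    bern = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(N)    # rho^1
    mask = rng.random((M, N)) < theta                           # choose law
    return np.where(mask, bern, gauss)

rng = np.random.default_rng(0)
X_theta = sample_X_theta(0.3, 100, 120, rng)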
We shall prove Lemma 6.6 by interpolating between $X^0$ and $X^1$ through the matrices $X^\theta$. Lemma 6.6 holds for $X^0$ by the anisotropic law (6.3) (see the remark above (6.5)).

Lemma 6.10. Lemma 6.6 holds if $X = X^0$.
Using (6.19) and the fundamental theorem of calculus, we get the following basic interpolation formula.
Lemma 6.11. The interpolation formula (6.20) holds provided all of the expectations exist.
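In the self-consistent comparison method of [24, Section 7], the basic interpolation formula takes the following standard form, which matches the usage in (6.22) (we record it here as our reading of (6.20)):
$$\frac{d}{d\theta}\, \mathbb E\, F\big( X^\theta \big) = \sum_{i \in \mathcal I_1^M} \sum_{\mu \in \mathcal I_2} \Big[ \mathbb E\, F\big( X^{\theta, X^1_{i\mu}}_{(i\mu)} \big) - \mathbb E\, F\big( X^{\theta, X^0_{i\mu}}_{(i\mu)} \big) \Big].$$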
We shall apply Lemma 6.11 with $F(X) = F_v^p(X, w)$, for $F_v(X, w)$ defined in (6.15). The main work is devoted to proving the following self-consistent estimate for the right-hand side of (6.20).

Lemma 6.12. Fix $p \in 2\mathbb N$ and $m \le 2\delta^{-1}$. Suppose that (2.23) and $(A_{m-1})$ hold. Then the required estimate holds for all $\theta \in [0,1]$, all $w \in \hat{\mathbf S}_m$, and all deterministic unit vectors $v$.
Combining Lemmas 6.10, 6.11 and 6.12 with a Grönwall argument, we can conclude the proof of Lemma 6.6 and hence of Proposition 6.2.
In order to prove Lemma 6.12, we compare $X^{\theta, X^0_{i\mu}}_{(i\mu)}$ and $X^{\theta, X^1_{i\mu}}_{(i\mu)}$ via a common $X^{\theta, 0}_{(i\mu)}$; i.e., under the assumptions of Lemma 6.12, we will prove the estimate (6.22) for all $u \in \{0,1\}$, all $\theta \in [0,1]$, all $w \in \hat{\mathbf S}_m$, and all deterministic unit vectors $v$. Underlying the proof of (6.22) is an expansion approach which we describe below. Throughout the rest of the proof, we suppose that $(A_{m-1})$ holds; moreover, the rest of the proof is performed at a single $w \in \hat{\mathbf S}_m$. Define the $\mathcal I \times \mathcal I$ matrix $\Delta^\lambda_{(i\mu)}$ through
$$\big( \Delta^\lambda_{(i\mu)} \big)_{st} := \lambda\, \delta_{is}\, \delta_{\mu t} + \lambda\, \delta_{it}\, \delta_{\mu s}. \qquad (6.23)$$
Then, for any $\lambda, \lambda' \in \mathbb R$ and $K \in \mathbb N$, we have the expansion (6.24).
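The expansion (6.24) is, presumably, the standard resolvent expansion associated with the rank-two perturbation (6.23); under the convention $G = (H - w)^{-1}$ with $H^{\theta,\lambda'}_{(i\mu)} = H^{\theta,\lambda}_{(i\mu)} + \Delta^{\lambda'-\lambda}_{(i\mu)}$, it reads
$$G^{\theta,\lambda'}_{(i\mu)} = \sum_{k=0}^{K} (-1)^k\, G^{\theta,\lambda}_{(i\mu)} \big( \Delta^{\lambda'-\lambda}_{(i\mu)}\, G^{\theta,\lambda}_{(i\mu)} \big)^k + (-1)^{K+1}\, G^{\theta,\lambda'}_{(i\mu)} \big( \Delta^{\lambda'-\lambda}_{(i\mu)}\, G^{\theta,\lambda}_{(i\mu)} \big)^{K+1},$$
obtained by iterating the resolvent identity $G' = G - G'(H' - H)G$ a total of $K+1$ times.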
The following result provides a priori bounds for the entries of $G^{\theta,\lambda}_{(i\mu)}$.

Lemma 6.13. Suppose that $y$ is a random variable satisfying $|y| \prec N^{-1/2}$. Then the stated bounds hold for all $i \in \mathcal I_1^M$ and $\mu \in \mathcal I_2$.

Proof. See [24, Lemma 7.14].
In the following, for simplicity of notation, we write $f_{(i\mu)}(\lambda) := F_v^p\big( X^{\theta,\lambda}_{(i\mu)} \big)$, and we use $f^{(n)}_{(i\mu)}$ to denote the $n$-th derivative of $f_{(i\mu)}$. By Lemma 6.13 and the expansion (6.24), we get the following result, provided $C_0$ is chosen large enough in (6.11). Therefore we have, for $u \in \{0,1\}$, the corresponding expansion, where we used that $X^u_{i\mu}$ has vanishing first and third moments and that its variance is $1/N$. Thus, to show (6.22), we only need to prove (6.28) for $n = 4, 5, \ldots, 4p$, where we have used (2.3). In order to get a self-consistent estimate in terms of the matrix $X^\theta$ on the right-hand side of (6.28), we want to replace $X^{\theta,0}_{(i\mu)}$ by $X^\theta$. The next lemma accomplishes this: suppose that (6.29) holds for $n = 4, \ldots, 4p$; then (6.28) holds for $n = 4, \ldots, 4p$.
Proof. From (6.27) we get the identity (6.30). The result follows by repeatedly applying (6.30). The details can be found in [24, Lemma 7.16].

Conclusion of the proof with words
What remains now is to prove (6.29). In order to exploit the detailed structure of the derivatives on the left-hand side of (6.29), we introduce the following algebraic objects.
Definition 6.16 (Words). Fix $i \in \mathcal I_1^M$ and $\mu \in \mathcal I_2$. Let $\mathcal W$ be the set of words of even length in the two letters $\{\mathbf i, \boldsymbol\mu\}$. We denote the length of a word $w \in \mathcal W$ by $2n(w)$, with $n(w) \in \mathbb N$. We use bold symbols to denote the letters of words. For instance, $w = \mathbf t_1 \mathbf s_2 \mathbf t_2 \mathbf s_3 \cdots \mathbf t_n \mathbf s_{n+1}$ denotes a word of length $2n$. Define $\mathcal W_n := \{w \in \mathcal W : n(w) = n\}$ to be the set of words of length $2n$. We require that each word $w \in \mathcal W_n$ satisfies $\mathbf t_l \mathbf s_{l+1} \in \{\mathbf i \boldsymbol\mu, \boldsymbol\mu \mathbf i\}$ for all $1 \le l \le n$.
Next we assign to each letter $\ast$ its value $[\ast]$ through $[\mathbf i] := v_i$ and $[\boldsymbol\mu] := \mu$, where $v_i \in \mathbb C^{\mathcal I_1}$ is defined in Lemma 6.1 and is regarded as a summation index. Note that it is important to distinguish the abstract letter from its value, which is a summation index. Finally, to each word $w$ we assign a random variable $A_{v,i,\mu}(w)$ as follows. If $n(w) = 0$ we define it as in (6.31) with an empty product; if $n(w) \ge 1$, say $w = \mathbf t_1 \mathbf s_2 \mathbf t_2 \mathbf s_3 \cdots \mathbf t_n \mathbf s_{n+1}$, we define it as the product of resolvent entries in (6.31). The words are constructed such that, by (6.24), the derivative formula holds for $n = 0, 1, 2, \ldots$, which gives
$$\Big( \frac{\partial}{\partial X_{i\mu}} \Big)^n F_v^p(X) = (-\alpha)^n\, n! \sum_{n_1 + \cdots + n_p = n} \prod_{r=1}^{p/2} \frac{1}{n_r!\, n_{r+p/2}!} \Bigg( \sum_{w_r \in \mathcal W_{n_r}} \sum_{w_{r+p/2} \in \mathcal W_{n_{r+p/2}}} A_{v,i,\mu}(w_r)\, \overline{A_{v,i,\mu}(w_{r+p/2})} \Bigg).$$
Then, to prove (6.29), it suffices to show (6.32) for $4 \le n \le 4p$ and all words $w_1, \ldots, w_p \in \mathcal W$ satisfying $n(w_1) + \cdots + n(w_p) = n$. To avoid the unimportant notational complications coming from the complex conjugates, we in fact prove (6.33); the proof of (6.32) is essentially the same, with slightly heavier notation. Treating empty words separately, we find that it suffices to prove (6.34) for $4 \le n \le 4p$, $1 \le q \le p$, and words $w_r$ such that $n(w_0) = 0$, $\sum_r n(w_r) = n$ and $n(w_r) \ge 1$ for $r \ge 1$. To estimate (6.34) we introduce the quantity
$$R_s := |G_{v v_s}| + |G_{v_s v}| \quad \text{for } s \in \mathcal I, \qquad (6.35)$$
where as a convention we set $v_\mu := e_\mu$ for $\mu \in \mathcal I_2$.
Lemma 6.17. For $w \in \mathcal W$ we have the rough bound (6.36). Furthermore, for $n(w) \ge 1$ we have the bound (6.37), and for $n(w) = 1$ we have the better bound (6.38).

Proof. (6.36) follows immediately from the rough bound (6.17) and the definition (6.31). For (6.37), we split $A_{v,i,\mu}(w)$ into $G_{v[\mathbf t_1]} \big( G_{[\mathbf s_2][\mathbf t_2]} \cdots G_{[\mathbf s_n][\mathbf t_n]} \big)^{1/2}$ times $\big( G_{[\mathbf s_2][\mathbf t_2]} \cdots G_{[\mathbf s_n][\mathbf t_n]} \big)^{1/2} G_{[\mathbf s_{n+1}] v}$ and use the Cauchy–Schwarz inequality. (6.38) follows from the constraint $\mathbf t_1 \ne \mathbf s_2$ in the definition (6.31).
By the pigeonhole principle, if $n \le 2q-2$ then there exist at least two words $w_r$ with $n(w_r) = 1$: otherwise the $q$ words of positive length would give $n = \sum_r n(w_r) \ge 1 + 2(q-1) = 2q-1$. Therefore, by Lemma 6.17, we get the bound (6.39). Then, by Lemma 6.1, we obtain (6.40), where in the second step we used the two bounds in Lemma 6.8 together with $|w|^{-1/2}\eta = O(|w| \operatorname{Im} m_{1c})$ from Lemma 3.7, and in the last step the definition of $\Phi$. Using the same method we can get (6.41). Plugging (6.40) and (6.41) into (6.39), we bound the left-hand side of (6.34) accordingly. Using $\Phi \ge c N^{-1/2}$, we find that the left-hand side of (6.34) is bounded as claimed; here we used that $q \le n$ and $n \ge 4$. Choosing $C_0 \ge 25$, by (6.11) we have $N^{C_0\delta/2 + 12\delta} \le N^{\zeta/2}$ and hence $N^{C_0\delta/2 + 12\delta}\, \Phi \le 1$. Moreover, if $n \ge 4$ and $n \ge 2q-1$, then $n \ge q+2$. Therefore we conclude that the left-hand side of (6.34) is bounded by (6.42). Now (6.34) follows from Hölder's inequality. This concludes the proof of (6.29), hence of (6.22), and then of Lemma 6.5. This finishes the proof of Proposition 6.2 under the assumption (2.23).
In the rest of this section, we prove Proposition 6.2 when $\eta \ge N^{-1/2+\zeta}|m_{2c}|^{-1}$. In this case, we can verify that
$$\Phi \le N^{-1/4 - \zeta/2}. \qquad (6.43)$$
Following the previous arguments, we see that it suffices to prove the estimate (6.29) for $n = 3$. In other words, we need to prove the following lemma.
Lemma 6.18. Fix $1 \le m \le 2\delta^{-1}$ and $p \in 2\mathbb N$. Let $w \in \hat{\mathbf S}_m \cap \hat{\mathbf D}$ (recall (2.44)) and suppose that $(A_{m-1})$ holds. Then the estimate (6.44) holds.

Proof. The main new ingredient of the proof is a further iteration step at a fixed $w$. Suppose that
$$G - \tilde\Pi = O_\prec\big( N^{2\delta} \phi \big) \qquad (6.45)$$
for some $\phi \le 1$. By the a priori bound (6.17), (6.45) holds for $\phi = 1$. Assuming (6.45), we shall prove a self-improving bound of the form (6.46). Once (6.46) is proved, we can use it iteratively to get an increasingly accurate bound for the left-hand side of (6.14): after each step, we obtain a better a priori bound (6.45), in which $\phi$ is reduced by a factor of $N^{-\zeta/4}$. Hence, after $O(\zeta^{-1})$ iterations, we get (6.44). As in Section 6.1.2, to prove (6.46) it suffices to show the corresponding estimate in each of the three cases $q = 1, 2, 3$; each case can be proved as in [24, Lemma 12.7], and we leave the details to the reader. This concludes the proof of Lemma 6.18.

Averaged local law for $TX$
In this section we prove the averaged local law in Theorem 2.19. Again for convenience, we only consider the case $w \in \mathbf D$ and $|z|^2 \le 1-\tau$. First we assume that (2.23) holds. The anisotropic local law proved in the previous section gives a good a priori bound. In analogy to (6.15), we define
$$\tilde F(X, w) := |w|^{1/2}\, |m_2(w) - m_{2c}(w)| = \bigg| \frac{1}{N} \sum_{\nu \in \mathcal I_2} G_{\nu\nu}(w) - |w|^{1/2} m_{2c}(w) \bigg|.$$
Since $\Phi^2 = O\big( |w|^{1/2}/(N\eta) \big)$, it suffices to show that $\tilde F \prec \Phi^2$. Following the argument in Section 6.1, analogously to (6.29), we only need to prove the corresponding estimate (6.50) for all $n = 4, \ldots, 4p$; here $\delta > 0$ is an arbitrary positive constant. Analogously to (6.33), it suffices to prove (6.50) for $n = 4, \ldots, 4p$ and $\sum_r n(w_r) = n$. The only difference in the definition of $A_{v,i,\mu}(w)$ is that when $n(w) = 0$ it is defined accordingly. Similarly to (6.35), we define
$$R_{\nu,s} := |G_{\nu v_s}| + |G_{v_s \nu}|. \qquad (6.51)$$
By the anisotropic local law, $G - \tilde\Pi = O_\prec(\Phi)$. Hence, combining with Lemma 6.1 and (3.33), we get the corresponding bound. Using the anisotropic local law again, we get $G = O_\prec(1)$. Then we have
$$\bigg| \frac{1}{N} \sum_{\nu} (\cdots) \bigg| \prec \Phi^2 \quad \text{for } n(w) \ge 1. \qquad (6.53)$$
Following (6.53), for $n \ge 4$, the left-hand side of (6.50) is bounded as desired, and applying Hölder's inequality we conclude the proof. Next we prove the averaged local law when $\eta \ge N^{-1/2+\zeta}|m_{2c}|^{-1}$. Analogously to (6.50), it reduces to an estimate in which $q$ is the number of words with nonzero length. Again we can prove the three cases $q = 1, 2, 3$ as in [24, Lemma 12.8], and we leave the details to the reader. This concludes the proof of the averaged local law.
A Properties of $\rho_{1,2c}$ and stability of (2.11)

A.1 Proof of Lemma 2.3 and Proposition 2.14

We now prove Lemma 2.3. We start with a technical lemma for the function $f$ defined in (2.15).
Lemma A.1. For $w > 0$ and $|z| > 0$, $f$ can be written in the partial fraction form (A.1), where the poles and the coefficients satisfy the estimates (A.2)–(A.4).

Proof. The proof is based on elementary algebraic arguments. Let
$$p_i(m) := \sqrt{w}\, m^3 - (s_i + |z|^2)\, m^2 - \sqrt{w}\, |z|^2\, m + |z|^4.$$
It is easy to verify that its discriminant satisfies
$$\Delta = 18 (s_i + |z|^2)\, w |z|^6 + 4 (s_i + |z|^2)^3 |z|^4 + (s_i + |z|^2)^2 w |z|^4 + 4 w^2 |z|^6 - 27 w |z|^8 > 0.$$
Thus $p_i$ has three distinct real roots. From the form of $p_i$, we see that there are two positive roots and one negative root; call them $a_i > b_i > 0 > -c_i$. Now we perform the partial fraction expansion of the rational functions in (2.15), using
$$\sqrt{w}\, m^3 - (s_i + |z|^2)\, m^2 - \sqrt{w}\, |z|^2\, m + |z|^4 = \sqrt{w}\, (m - a_i)(m - b_i)(m + c_i).$$
Taking $s_i = 0$ in $p_i$, we call the resulting polynomial $p_0$; it has roots $m = \pm |z|$ and $m = |z|^2/\sqrt{w}$. By (2.7), we have $p_1 < p_2 < \ldots < p_n < p_0$ for all $m \ne 0$. Comparing the graphs of the $p_i$'s (as cubic functions of $m$) for $0 \le i \le n$, we get
$$\max\Big( |z|, \frac{|z|^2}{\sqrt{w}} \Big) < a_n < a_{n-1} < \ldots < a_1, \qquad 0 < b_1 < b_2 < \cdots$$
Thus we get (A.3). By these bounds, we see that $a_i^2 - |z|^2 > 0$, $b_i^2 - |z|^2 < 0$ and $-c_i^2 + |z|^2 > 0$, which, by (A.7), give that $A_i' > 0$, $B_i' > 0$ and $C_i' > 0$. Plugging (A.6) into $f$, we immediately get (A.1). Next we compare $p_i$ with
$$p_i' := \sqrt{w}\, m^3 - (s_i + |z|^2)\, m^2 - \sqrt{w}\, |z|^2\, m,$$
which has roots
$$m = 0 \quad \text{and} \quad m = \frac{(s_i + |z|^2) \pm \sqrt{(s_i + |z|^2)^2 + 4 w |z|^2}}{2\sqrt{w}}.$$
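A quick numerical sanity check (ours; the values of $s_i$, $|z|$, $w$ are illustrative) that the cubic $p_i$ indeed has two positive and one negative real root:

# Numerically verify the root structure of the cubic p_i for sample
# parameters (illustrative values, not from the paper).
import numpy as np

s_i, z_abs, w = 0.7, 1.5, 2.0
sw = np.sqrt(w)
coeffs = [sw, -(s_i + z_abs**2), -sw * z_abs**2, z_abs**4]
roots = np.sort(np.roots(coeffs).real)   # three real roots expected

# expect the pattern -c_i < 0 < b_i < a_i
assert roots[0] < 0 < roots[1] < roots[2]
print(roots)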
Since $p_i' < p_i$ for all $m$, we get (A.9) and (A.11), and from these we get (A.4). Then we compare $p_i$ with $p_i'' := \sqrt{w}\, m^3 - (s_i + |z|^2)\, m^2$, which has roots $m = 0$ and $m = (s_i + |z|^2)/\sqrt{w}$. Noticing that
$$w (b_i + c_i) \le 2\, \frac{s_i + |z|^2 + \sqrt{w}\, |z|}{w |z|},$$
we get (A.14). In (A.1), it is sometimes convenient to reorder the terms and rename the constants so as to write $f$ in the form (A.15), where all the constants $C_k$ and $C_l$ are positive, and we choose the ordering of the poles $x_k$, $-y_l$ accordingly. Clearly, $f$ is smooth on the $3n+1$ open intervals of $\mathbb R$ defined by
$$I_{-n} := (-\infty, -y_n), \quad I_{-k} := (-y_{k+1}, -y_k) \ (k = 1, \ldots, n-1), \quad I_0 := (-y_1, x_1),$$
$$I_k := (x_k, x_{k+1}) \ (k = 1, \ldots, 2n-1), \quad I_{2n} := (x_{2n}, +\infty).$$
Next, we introduce the multiset $\mathcal C$ of critical points of $f$ (as a function of $m$), using the convention that a nondegenerate critical point is counted once and a degenerate critical point twice. First we will prove the following elementary lemma about the structure of $\mathcal C$ (see Figs. 6 and 7).
Proof. We omit the dependence of $f$ on $w$ for now. By (A.15) we have an explicit expression for $f''$, from which we see that $f''$ is decreasing on each interval $I_k$ for $k = -n+1, \ldots, 2n-1$. Thus there is at most one point $m \in I_k$ with $f''(m) = 0$, and we conclude that $f$ has at most two critical points on $I_k$. By the boundary values of $f'$ on $\partial I_k$, we get $|\mathcal C \cap I_k| \in \{0, 2\}$ for $k = -n+1, \ldots, 2n-1$. For $m < -y_n$ we have $f''(m) < 0$, while for $m > x_{2n}$ we have $f''(m) > 0$. By the boundary values of $f'$ on $\partial I_{-n}$ and $\partial I_{2n}$, we see that $f'$ decreases from $1$ to $-\infty$ as $m$ increases from $-\infty$ to $-y_n$, while $f'$ increases from $-\infty$ to $1$ as $m$ increases from $x_{2n}$ to $+\infty$. Hence each of the intervals $(-\infty, -y_n)$ and $(x_{2n}, +\infty)$ contains a unique critical point, i.e. $|\mathcal C \cap I_{-n}| = |\mathcal C \cap I_{2n}| = 1$.
From this lemma, we deduce that $|\mathcal C| = 2p$ is even. We denote by $z_{2p}$ the critical point in $I_{-n}$, by $z_1$ the critical point in $I_{2n}$, and by $z_2 \ge \ldots \ge z_{2p-1}$ the $2p-2$ critical points in $I_{-n+1} \cup \ldots \cup I_{2n-1}$. For $k = 1, \ldots, 2p$, we define the critical values $h_k := f(z_k)$. The next lemma is crucial in establishing the basic properties of $\rho_{1c}$ (see e.g. Fig. 6).

Proof. Notice that if we multiply both sides of equation (2.14) by the product of all the denominators in $f$, we get a polynomial equation $P_w(m) = 0$, with $P_w$ a polynomial of degree $3n+1$. An immediate consequence is that, for any fixed $w > 0$ and $E \in \mathbb R$, the equation $f(\sqrt{w}, m) = E$ has at most $3n+1$ roots in $m$. This fact is useful in the proof of this lemma and of Lemma 2.3.
For i "´n, . . . , 2n, define the subset J i pwq :" tm P I i : B m f p ? w, mq ą 0u. From Lemma A.2, we deduce that if i "´n`1, . . . , 2n´1, then J i ‰ H if and only if I i contains two distinct critical points of f , in which case J i is an interval. Moreover, we have J´n " p´8, z 2p q and J 2n " pz 1 ,`8q. Next, we observe that for any´n ď i ă j ď 2n, we have f pJ i q X f pJ j q " H. Otherwise if there were E P f pJ i q X f pJ j q, we would have |tx : f pxq " Eu| ą 3n`1. We hence conclude that the sets f pJ i q,´n ď i ď 2n can be strictly ordered. The claim h 1 ě h 2 ě . . . ě h 2p is now reformulated as f pJ i q ă f pJ j q whenever i ă j and J i , J j ‰ H. (A.17) To prove (A.17), we use a continuity argument. Let t P p0, 1s and introduce It is easy to check (A.17) holds for small enough t ą 0. We claim that This is trivial for i "´n, 2n. Recall that for´n`1 ď i ď 2n´1, J t i ‰ H is equivalent to I i containing two distinct critical points. Moreover, B t B m f t pmq ă 0 in I´n`1 Y . . . Y I 2n´1 , from which we deduce that the number of distinct critical points in each I i , i "´n`1, . . . , 2n´1, does not decreases as t decreases. This proves (A.18).
Next, suppose that there exist $i < j$ such that $J_i, J_j \ne \emptyset$ and $f(J_i) > f(J_j)$. From (A.18), we deduce that $J_i^t, J_j^t \ne \emptyset$ for all $t \in (0,1]$. By a simple continuity argument, we get that $f^t(J_i^t) > f^t(J_j^t)$ for all $t \in (0,1]$. However, this is impossible for small enough $t$, as explained before (A.18). This concludes the proof of (A.17).
To prove the second statement of Lemma A.3, we only need to show that $h_1 \le C_0\big( \tau^{-1}|w|^{-1/2} + |z| \big) - \sqrt{w}$ and $h_{2p} \ge -C_0\big( \tau^{-1}|w|^{-1/2} + |z| \big) - \sqrt{w}$ for some absolute constant $C_0$. We only give the proof for $h_1$; the proof for $h_{2p}$ is similar. At $z_1$, we have $f(z_1) + \sqrt{w} \le (z_1 + y_n)(\cdots)$, where we use (A.19). Now we estimate $z_1 + y_n$. Again using (A.19), we have $C_l\, (z_1 - x_{2n})^{-2} \ge 1$.
Using the above estimates and (A.2)–(A.4), we obtain the claimed bound for some constant $C_0 > 0$ that does not depend on $\tau$.

Figure 6: The graphs of $f(\sqrt{w}, m)$ for the example from Figure 1, i.e. $\rho_\Sigma = 0.5\, \delta_{\sqrt{2/17}} + 0.5\, \delta_{4\sqrt{2/17}}$. We take $|z| = 1.5$, and $w = 10$ and $w = 0.01$ in the upper and lower graphs, respectively. In the lower graph, we only plot the five branches near $m = 0$; the remaining two branches are far away.
Proof of Lemma 2.3. Let $J(w) := \bigcup_{i=-n}^{2n} J_i(w)$. Given $w > 0$ such that $0 \in f(J(w))$, the set $\{m \in \mathbb R : f(\sqrt{w}, m) = 0\}$ has $3n+1$ points. Since $f(\sqrt{w}, m) = 0$ has at most $3n+1$ solutions in $m$, we deduce that $m_c(w)$ is real, and hence $m_{1c}(w)$ is also real. Since $m_{1c}$ is the Stieltjes transform of $\rho_{1c}$, we conclude that $w \notin \operatorname{supp} \rho_{1c}$. On the other hand, suppose $w > 0$ and $0 \notin f(J(w))$. Then the set of preimages $\{m \in \mathbb R : f(\sqrt{w}, m) = 0\} = \{m \in \mathbb R : P_w(m) = 0\}$ has $3n-1$ points. Since $P_w(m)$ is a polynomial of degree $3n+1$ with real coefficients, we conclude that $P_w$ has a unique root with positive imaginary part. By the uniqueness of the solution for $P_{w+i\eta}$ in $\mathbb C_+$ (Lemma 2.2) and the continuity of the roots of $P_{w+i\eta}$ in $\eta$, we conclude by taking $\eta \downarrow 0$ that $\operatorname{Im} m_c(w) > 0$ and $\operatorname{Im} m_{1c}(w) > 0$, i.e. $w \in \operatorname{supp} \rho_{1c}$. In sum, we get
$$\operatorname{supp} \rho_{1c} = \{w > 0 : 0 \notin f(J(w))\}. \qquad (A.20)$$
From Lemma A.3, we see that there exists an absolute constant $C_1 > 0$ such that if $w \ge C_1 \tau^{-1}$, then $h_1(w) \le C_0\big( \tau^{-1}|w|^{-1/2} + |z| \big) - \sqrt{w} < 0$. Hence, for fixed $w \ge C_1 \tau^{-1}$, we have $0 \in f(J_{2n}(w))$ and $w \notin \operatorname{supp} \rho_{1c}$ (see the upper graphs in Figs. 6 and 7). This shows that $\rho_{1c}$ is compactly supported in $[0, C_1 \tau^{-1}]$. Now we decrease $w$ so that $w < s_1 + |z|^2 + 1$; then, using (A.2) and continuity, there must be some $0 < w < C\tau^{-1}$ such that $0 \notin f(J(w))$. Thus $\operatorname{supp} \rho_{1c} \ne \emptyset$. By (A.20), it is not hard to see that $\operatorname{supp} \rho_{1c}$ is a disjoint union of (countably many) closed intervals with edges $C_1\tau^{-1} \ge e_1 \ge e_2 \ge \ldots$. Furthermore, for $e_i$ to be a boundary point, we must have that $0$ is a critical value of $f(\sqrt{e_i}, m)$, i.e. there is a unique critical point $m = m_c(e_i)$ such that the two equations in (A.22) hold; these are polynomial equations in $(\sqrt{w}, m)$ of degrees $3n+1$ and $6n$, respectively. By Bézout's theorem, there are at most finitely many solutions to (A.22). Hence there are finitely many $e_i$'s; call them $e_1 \ge e_2 \ge \ldots \ge e_{2L}$, where $L = L(n) \in \mathbb N$.

Figure 7: The graphs of $f(\sqrt{w}, m)$ for the example from Figure 1, i.e. $\rho_\Sigma = 0.5\, \delta_{\sqrt{2/17}} + 0.5\, \delta_{4\sqrt{2/17}}$. We take $|z| = 0.5$, and $w = 6$ and $w = 0.01$ in the upper and lower graphs, respectively. In the lower graph, we only plot the five branches near $m = 0$; the remaining two branches are far away.
To prove the statement about $e_{2L}$, we use Lemma A.4 below. This concludes the proof of Lemma 2.3.
By this lemma, the behavior of the leftmost edge $e_{2L}$ changes essentially when $z$ crosses the unit circle. From the following proof, we see that the singularity happens at $|z|^2 = N^{-1} \sum_{i=1}^n l_i s_i$. Thus the fact that the singular circle has radius $1$ comes from our normalization (2.5) for $T$.

Proof of Lemma A.4.
We first study equation (2.14) as $w \downarrow 0$ in the case $1 + \tau \le |z|^2 \le 1 + \tau^{-1}$. We calculate the derivative of $f$ as in (A.23). It is easy to see that $J_0 \ne \emptyset$ for all $w > 0$, since $\partial_m f(\sqrt{w}, 0) = 1 - |z|^{-2} > 0$ (see the lower graph in Fig. 6). Call the endpoints of $J_0$ $z_k(w) > 0$ and $z_{k+1}(w) < 0$. By the definition of $I_0$, we have $z_k < b_1 < |z|$. Suppose $z_k = o(|z|)$ as $w \to 0$; then (A.23) gives $0 = 1 - |z|^{-2} + o(1)$, a contradiction. Thus $z_k \sim |z|$ as $w \to 0$. Now, using $\partial_m f(\sqrt{w}, z_k) = 0$, we can derive the estimate (A.24) for some $C > 0$ independent of $w$, where in the second step we use that $\sqrt{w}\, z_k^3 - (s_i + |z|^2) z_k^2 - \sqrt{w}\, |z|^2 z_k + |z|^4 > 0$ and $\sqrt{w}\, z_k^3 - (s_i + |z|^2) z_k^2 - \sqrt{w}\, |z|^2 z_k < 0$, which follow from $0 < z_k < b_i$ for all $1 \le i \le n$. By (A.24), we can find $\epsilon$ small enough such that $f(\sqrt{w}, z_k) > 0$ for all $0 < w \le \epsilon$. In this case $0 \in f(J_0(w))$, and hence $w \notin \operatorname{supp} \rho_{1c}$. In fact, it is not hard to see that there is a solution $m_0 = \sqrt{w}\, |z|^2 / (|z|^2 - 1) + o(\sqrt{w}) \in I_0$ such that $f(\sqrt{w}, m_0) = 0$ and $\partial_m f(\sqrt{w}, m_0) > 0$. This proves the first statement of Lemma A.4. Now we study equation (2.14) when $|z|^2 \le 1-\tau$ and $w \to 0$. For later purposes, we allow $w$ to be complex and prove a more general result than what we need for this lemma. Setting $w = 0$ in equation (2.14), we get $m = 0$ or the equation $g(m) = 0$. It is easy to see that $g$ is smooth and decreasing on each of the intervals $K_1, \ldots, K_{n+1}$, where $K_i$ has right endpoint $|z|^4/(s_i + |z|^2)$ for $i = 2, \ldots, n$, and
$$K_{n+1} := \Big( \frac{|z|^4}{s_n + |z|^2}, \infty \Big).$$
By the boundary values of $g$ on these intervals, we see that $g$ has exactly one zero on each interval $K_i$ for $i = 1, \ldots, n$, and no zero on $K_{n+1}$. Since $g(x) = 0$ is equivalent to a polynomial equation of degree $n$, it has at most $n$ solutions, and we conclude that all of them are real. Obviously the zeros on the intervals $K_i$ are positive for $i = 2, \ldots, n$. Now we study the zero on $K_1$. Observe that $g(0) = 1 - |z|^{-2} < 0$ (as $|z|^2 \le 1-\tau$), so the zero on $K_1$ is negative; call it $-t$. Moreover, we can verify that $g(-\tau^{-1}) > 0$ by (A.26), so $t < \tau^{-1}$. If $|z|^2 \ge \tau/2$, then by the concavity of $g$ on $K_1$ we get the corresponding estimate. It is easy to see that there exist constants $c_1, \tau' > 0$ such that the stated bound holds. First we consider the case $|z| \ge \epsilon > 0$.
To prove Proposition 2.14, we need the following lemma, which is a consequence of the edge regularity conditions (2.18) and (2.19).
Proof. Denote $m_k := m_c(e_k)$ and let $w \to e_k$. Notice that, by Lemma 2.3, if $e_k \ne 0$ we have
$$\epsilon \le e_k \le C\tau^{-1}. \qquad (A.42)$$
Then we expand $f$ around $(\sqrt{e_k}, m_k)$, with the coefficients controlled by (A.31) and (A.1). Since $|\sqrt{w} - \sqrt{e_k}| \sim |w - e_k|$ and $|m_c(w) - m_k| \sim |m_{1c}(w) - m_{1c}(e_k)|$, this proves the first part of the lemma. By (A.48), if $w$ is real and $|w - e_k| \le \tau'$, the square-root term has a definite sign. Thus, on a sufficiently small interval $U = [e_k - \delta, e_k + \delta]$, $m_c(w)$ has positive imaginary part for $w$ on one side of $e_k$, and $m_c(w)$ is real for $w$ on the other side. Hence $U$ does not contain another edge. This shows that $\min_{l \ne k} |e_l - e_k| \ge \delta$.
Proof of Proposition 2.14. The properties of $\rho_{1c}$ have been proved in Lemmas 2.3, A.4 and A.5, and are included in Definition 2.4. Since $\operatorname{supp} \rho_{2c} = \operatorname{supp} \rho_{1c}$ by the discussion after Lemma 2.2, we immediately get property (i) for $\rho_{2c}$. That $\rho_{2c}$ is a probability measure follows from the definition of $m_2$ in (2.34) and the fact that $m_{2c}$ is the almost sure limit of $m_2$. Properties (ii) and (iv) for $\rho_{2c}$ can be obtained easily by plugging $m_{1c}$ into (2.9). To prove property (iii) for $\rho_{2c}$, we need to know the behavior of $\operatorname{Im} m_{2c}(w)$ as $w \to e_j$ along the real line. By (2.9), it suffices to prove that if $|x - e_j| \le \tau'$ for some small enough $\tau' > 0$, then
$$\big| -w (1 + m_{1c})^2 + |z|^2 \big| = \big| m_c^2 - |z|^2 \big| \ge \epsilon$$
for some constant $\epsilon > 0$. Suppose that $|m_c^2(w) - |z|^2| = o(1)$. Plugging $m_c$ into $\partial_m f(\sqrt{w}, m_c)$ in (A.23) and using condition (2.18) leads to a contradiction.

A.2 Proof of Lemmas 3.7 and 3.8

We first prove Lemma 3.7; we consider the five cases separately. By the regularity condition of Definition 2.4 (ii), we immediately get $\operatorname{Im} m_{1c} \sim 1$ (A.52). Since $\operatorname{Im} m_{1c} \le |1 + m_{1c}| \le C$ by Proposition 2.15, we get $|1 + m_{1c}| \sim 1$. Notice that $w m_{1c}$ can be expressed as an integral against $\rho_{1c}$. By the same argument as above, and using the fact that $x \ge \tau'$ for $x \in [e_{2k} + \tau', e_{2k-1} - \tau']$, we get
$$\operatorname{Im}(w m_{1c}) = \operatorname{Im} \int_{\mathbb R} \frac{x\, \rho_{1c}(x, z)}{x - w}\, dx \sim 1.$$
Since the imaginary parts of $-w$ and $|z|^2/(1 + m_{1c})$ are both negative, we get the corresponding lower bound (A.53). Using the bounds for $m_{1c}$ and $\operatorname{Im} m_{1c}$ proved above, it is easy to see that
$$\Big| -w (1 + m_{1c}) + \frac{|z|^2}{1 + m_{1c}} \Big| = O(1). \qquad (A.54)$$
Equations (A.53) and (A.54) together give that $\operatorname{Im} m_{2c} \sim 1$ and $|m_{2c}| \sim 1$. Similarly, we can also prove the analogous bounds and $\operatorname{Im}(w m_{2c}) \sim 1$. Now (3.29) follows from
$$\operatorname{Im}\Big( w + s_i\, w m_{2c} - \frac{|z|^2}{1 + m_{1c}} \Big) \ge s_i\, \operatorname{Im}(w m_{2c}).$$
(i) Suppose $E \sim 1$. We shall prove that
$$\min_i \big\{ |m_c(w) - a_i(w)|,\ |m_c(w) - b_i(w)|,\ |m_c(w) + c_i(w)| \big\} \ge \epsilon_1 \qquad (A.55)$$
for some constant $\epsilon_1$. This leads immediately to (3.29) in view of (A.56). For $p_i = \sqrt{E}\, m^3 - (s_i + |z|^2) m^2 - \sqrt{E}\, |z|^2 m + |z|^4$, it is not hard to prove that its roots $a_i(E)$, $b_i(E)$ and $-c_i(E)$ decrease as $E$ increases. Since $E \notin \operatorname{supp} \rho_{1c}$, we have $m_{1c}(E) \in \mathbb R$ and
$$m_{1c}'(E) = \int_{\mathbb R} \frac{\rho_{1c}(x, z)}{(x - E)^2}\, dx > 0.$$
So $m_{1c}(E)$ (and hence $m_c(E)$) increases as $E$ increases. If $e_k$ is the smallest edge bigger than $E$, then for those $a_i(E)$ bigger than $m_c(E)$ we have
$$a_i(E) - m_c(E) \ge a_i(e_k) - m_c(e_k) + \epsilon(\tau') \ge \epsilon(\tau'), \qquad (A.57)$$
using $|E - e_k| \ge \tau'$ (see (2.42)). On the other hand, if $e_{k-1}$ is the largest edge smaller than $E$, then for those $a_i(E)$ smaller than $m_c(E)$ we have
$$m_c(E) - a_i(E) \ge m_c(e_{k-1}) - a_i(e_{k-1}) + \epsilon(\tau') \ge \epsilon(\tau').$$
The same argument applies to the $b_i$ and $-c_i$, and we obtain (A.59) for $E \in (e_{2k-1}, e_{2k})$ for some $k$. Now we are only left with the case $E < e_{2L}$, the leftmost edge, when $|z|^2 \ge 1+\tau$. In this case, we have seen in the proof of Lemma A.4 that $0 < m_c(E) < b_i(E)$ for all $i$. Thus we can use (A.57) to get lower bounds for $|m_c(E) - a_i(E)|$ and $|m_c(E) - b_i(E)|$. Since $c_i(E) \sim 1$ in this case (e.g. by (A.4), using $E, |z| \sim 1$), the bound $|m_c(E) + c_i(E)| \ge \epsilon$ is trivial. Again we get the estimate (A.59).
Then we consider $w = E + i\eta$ with $\eta \le c_1$. First, it is easy to check that $a_i(E + i\eta)$, $b_i(E + i\eta)$ and $c_i(E + i\eta)$ are continuous in $\eta$. On the other hand, for $m_c(E + i\eta)$ we have the corresponding derivative bound by the condition $\operatorname{dist}(E, \operatorname{supp} \rho_{1c}) \ge \tau'$. Thus we immediately get $|m_c(E + i\eta) - m_c(E)| = O(\eta)$.
Hence, as long as $c_1$ is small enough, (A.55) holds, which further gives (3.29).
Finally, we have $|m_{2c}| \sim 1$ for $w \in \mathbf D_o$ and $\eta \le c_1$ by Proposition 2.15.
Case 3: For a regular edge $e_k \ne 0$, we always have $e_k \ge \epsilon$ for some $\epsilon > 0$ by Lemma A.4. Thus we always have $|w| \sim 1$ for $w = E + i\eta \in \mathbf D_{e_k}(\zeta, \tau', N)$, as long as $\tau'$ is sufficiently small. If $\eta \sim 1$, the desired bounds follow as before.
We still need to prove the estimates for $\operatorname{Im} m_c$ near the edge. We have the expansion
$$m_c(w) - m_k = C_k(w)\, (w - e_k)^{1/2} + D_k(w),$$
with $C_k > 0$, $C_k \sim 1$, $|D_k| = O(|w - e_k|)$ and $\operatorname{Im} D_k \sim \eta$. Then, for $E \ge e_k$, we have
$$\operatorname{Im} m_c(E + i\eta) = \operatorname{Im}(\kappa + i\eta)^{1/2} + O(\eta) \sim \frac{\eta}{\sqrt{\kappa + \eta}}.$$
If $k$ is even, the proof is the same, except that in this case $m_c(w) - m_k = C_k(w)\, (e_k - w)^{1/2} + D_k(w)$. For $m_{1c}(w)$ and $m_{2c}(w)$, we get the conclusion by noticing that $w \approx e_k$ and
$$\operatorname{Im} m_{1c} = \operatorname{Im}\big( w^{-1/2} m_c \big) \sim \operatorname{Im} m_c(w),$$
together with the analogous relation for $\operatorname{Im} m_{2c}$. For the proof of Lemma 3.8, write
$$u := \begin{pmatrix} U^* & 0 \\ 0 & U^* \end{pmatrix} v, \qquad u^{[i]} := \begin{pmatrix} u_i \\ u_{\bar i} \end{pmatrix}.$$
To control $\operatorname{Im} \Pi_{vv}$, it is enough to bound $\big\langle u^{[i]}, \pi_{[i]c}\, u^{[i]} \big\rangle$ for each $i$. We first consider Cases 1–4 of Lemma 3.7. By the definition of $\pi_{[i]c}$ in (2.32), $\operatorname{Im} \pi_{ii,c}$ equals $|u_i|^2$ times an explicit imaginary part, where in the second step we use (3.29) and $|1 + m_{1c}| \sim |w|^{-1/2}$. In the first three cases of Lemma 3.7, we have $|w| \sim 1$ and $\operatorname{Im} w = O(\operatorname{Im} m_{1c})$, which give $\operatorname{Im} \pi_{ii,c} \le C\, \operatorname{Im}(m_{1c} + m_{2c})$. In Case 4 of Lemma 3.7, we use $|\operatorname{Im} w| + |\operatorname{Re} w| + |1 + m_{1c}|^{-2} = O(|w|)$ and $\operatorname{Im} m_{1,2c} \sim |w|^{-1/2}$ to get $\operatorname{Im} \pi_{ii,c} \le C\, \operatorname{Im}(m_{1c} + m_{2c})$. Similarly, we have the bound $\operatorname{Im} \pi_{\bar i \bar i, c} \le C\, \operatorname{Im}(m_{1c} + m_{2c})$. Finally, the remaining cross term can be estimated by similar methods:
$$\cdots \le C\, \operatorname{Re}(\bar u_i u_{\bar i} z)\, \operatorname{Im}(m_{1c} + m_{2c}) \le C \big( |u_i|^2 + |u_{\bar i}|^2 \big) \operatorname{Im}(m_{1c} + m_{2c}).$$

A.3 Proof of Lemma 3.10 and Lemma 2.2
We first prove Lemma 3.10. In the proof, we also use the following equivalent formulation of stability, expressed in terms of $m = \sqrt{w}(1 + m_1)$, $u = \sqrt{w}(1 + u_1)$ and $f(\sqrt{w}, m)$: suppose the assumptions of Definition 3.9 hold, let $w \in \mathbf D$, and suppose that for all $w' \in L(w)$ we have $|f(\sqrt{w}, u)| \le |w|^{1/2} \delta(w)$; then the conclusion (A.62) holds.

Case 1: We take over the notation of Definition 3.9 and abbreviate $R := f(\sqrt{w}, u)$, so that $|R| \le |w|^{1/2} \delta$. Then we write the equation $f(\sqrt{w}, u) - f(\sqrt{w}, m_c) = R$ in the quadratic form (A.63), where, using (A.1), $\alpha$ and $\beta$ can be expressed as in (A.64) and (A.65). We shall prove that
$$|\alpha| + |\partial_u \alpha| \le C, \qquad |\beta| \ge \epsilon_1, \qquad (A.66)$$
for some $\epsilon_1 > 0$. Using (A.67) and (A.68), we immediately get $|\alpha| + |\partial_u \alpha| + |\beta| \le C$. What remains is the proof of the lower bound $|\beta| \ge c$. If $\operatorname{Im} w \ge \epsilon$ for some constant $\epsilon > 0$, the lower bound follows from Lemma A.6 below; if $\operatorname{Im} w \le \epsilon$ for a sufficiently small $\epsilon$, it follows from Lemma A.7 below. Given the bound (A.66), it is then easy to prove (A.62) by a fixed point argument. This proves the stability of (3.34).

Lemma A.6. Suppose that $\operatorname{Im} w \sim 1$ and $|m_c| \sim \operatorname{Im} m_c \sim 1$.
Then $|\partial_m f(\sqrt{w}, m_c)| \ge c$ for some constant $c > 0$.
Proof. Using (2.13), $m_c = \sqrt{w}(1 + m_{1c})$ and the conditions $\operatorname{Im} w \sim 1$, $\operatorname{Im} m_c \sim 1$, we can derive the identity (A.72) for $\partial_m f(\sqrt{w}, m_c)$, where we use the equation $f(\sqrt{w}, m_c) = 0$ in the derivation. By our assumption, the left-hand side of (A.72) can be arbitrarily small. For the right-hand side of (A.72), we have $|m_c| \sim 1$ and
$|\sqrt{w}\, m_c - |z|^2| \sim 1$ (because $\operatorname{Im}(\sqrt{w}\, m_c) = \operatorname{Im}(w + w m_{1c}) \sim 1$). Thus, if $|m_c - i|z|| \ge c'$ for some constant $c' > 0$, we have $|m_c^2 + |z|^2| \sim 1$ and
$$\bigg| \frac{(\sqrt{w}\, m_c - |z|^2)\, |z|^2\, m_c}{(m_c^2 - |z|^2)\, (m_c\, a - \sqrt{w}\, b)} \bigg| \sim 1,$$
which gives a contradiction. Thus we must have the lower bound $|\partial_m f(\sqrt{w}, m_c)| \ge c$ if $|m_c - i|z|| \ge c'$. It remains to deal with the case $|m_c - i|z|| \le c'$ for some sufficiently small $c'$. Notice that $|z| \sim 1$ in this case, and we have the expansion (A.73). Denote $L_i := (s_i + |z|^2)|z|^2 + |z|^4 - 2i\sqrt{w}\, |z|^3$. Since $i\sqrt{w} = i(x + iy) = ix - y$ with $x, y > 0$ and $x, y \sim 1$, we have $\operatorname{Re} L_i > 0$, $\operatorname{Im} L_i < 0$ and $|\operatorname{Re} L_i|, |\operatorname{Im} L_i| \sim 1$. Furthermore, $\operatorname{Im} L_i^2 < 0$ and $|\operatorname{Im} L_i^2| \sim 1$. Thus each fraction $4|z|^4/L_i^2$ in (A.73) has positive imaginary part, and all these imaginary parts are of order $1$. Therefore, by (A.69), we get $|\partial_m f(\sqrt{w}, i|z|)| \ge c$ for some $c > 0$. Using (3.29), it is easy to see that $\partial_m f(\sqrt{w}, m_c) = \partial_m f(\sqrt{w}, i|z|) + O(|m_c - i|z||)$.
Thus, in the case $|m_c - i|z|| \to 0$, we can still find $c > 0$ such that $|\partial_m f(\sqrt{w}, m_c)| \ge c$.
Lemma A.7. Suppose that $w \in \mathbf D_{b_k}$ and $\operatorname{Im} w \le \epsilon$. Then, for sufficiently small $\epsilon > 0$, we have $|\partial_m f(\sqrt{w}, m_c)| \sim 1$.
Proof. By the expression for $\partial_m f$, we look at, for example, the term in which $m_c - a_i =: |m_c - a_i| e^{i\theta_i}$ appears. Using $\operatorname{Im} m_c \sim 1$, it is easy to see that $\operatorname{Re}(1 - e^{-2i\theta_i}) \ge c'$ for some constant $c' > 0$. Applying the same estimates to the $B$, $C$ terms in (A.76), we get the corresponding lower bound, where we use (3.29). Combining with (A.77), we see that $|\partial_m f(w, m_c(w))| \sim 1$ for small enough $\epsilon$.

Case 2:
We mimic the argument in the proof of Case 1. We see that it suffices to prove $|\alpha| + |\partial_u \alpha| \le C$ and $|\beta| \sim 1$ for $\alpha, \beta$ defined in (A.64) and (A.65) and $|u - m_c| \le (\log N)^{-1/3}$. Using (3.29), it is not hard to prove that $|\alpha| + |\partial_u \alpha| + |\beta| \le C$. What remains is the proof of the lower bound $|\beta| \ge c$. For the case $\operatorname{Im} w \sim 1$, it follows from Lemma A.6. If $w \to 0$ in the case $|z|^2 \ge 1+\tau$, then $m_c(w) = O(\sqrt{w}) \to 0$ by (3.23). Thus we can use (A.23) to get directly that $\partial_m f(\sqrt{w}, m_c) = 1 - |z|^{-2} + O(\sqrt{w}) \ge c$.
Case 3: The case $\operatorname{Im} w \ge \tau'$ can be proved with the same method as in the proof of Case 1. Hence we only consider the case $|w - e_k| \le 2\tau'$ in the following. Note that $|w| \sim 1$ in this case. Suppose that the two assumptions in (A.78) hold. Then we conclude, for small enough $\tau'$, that $|\beta| \sim |w - e_k|^{1/2} \sim \sqrt{\kappa + \eta}$.
With the estimate (A.79) at hand, we now proceed exactly as in the proof of [4, Lemma 4.5], by solving the quadratic equation (A.63) for $u - m_c$ explicitly. We select the correct solution by a continuity argument, using that (A.62) holds by assumption at $z + iN^{-10}$. The second assumption of (A.78) is obtained by continuity from the estimate on $|u - m_c|$ at the neighboring point $z + iN^{-10}$. We refer to [4, Lemma 4.5] for the full details. This concludes the proof.

Case 4:
The case $\operatorname{Im} w \ge \tau'$ can be proved using the same method as in the proof of Case 1. We are now left with the case $|w| \le 2\tau'$ for some sufficiently small $\tau'$. First we assume that $|z| \ge c > 0$ for some small $c > 0$. Then, mimicking the argument in the proof of Case 1, we see that it suffices to prove $|\alpha| + |\partial_u \alpha| \le C$ and $|\beta| \sim 1$ when $|u - m_c| \le (\log N)^{-1/3}$. Using (3.29), it is not hard to prove that $|\alpha| + |\partial_u \alpha| + |\beta| \le C$. The lower bound $|\beta| \ge c$ can be obtained easily from (A.32).
It is also easy to verify the corresponding perturbation identity. By (3.29), we find that $\partial_t f(\sqrt{E}, m_c(E), 0) = O(1)$, while by (A.66), $|\partial_m f(\sqrt{E}, m_c(E), 0)| = |\beta| \sim 1$. Thus $\partial_t m_c(E, 0) = O(1)$. A simple extension of this argument shows that $m_c(E, t) = m_c(E) + O(t)$, and hence $\operatorname{Im} m_c(E, t)$ is bounded from below by some $c_1 = c_1(\tau, \tau')$. Thus we conclude that if Definition 2.4 (ii) holds for some $\rho_\Sigma$, then it holds for all $\rho_{\Sigma, t}$ with $t$ in some fixed small interval around zero. Obviously, the above arguments also work for perturbations of $|z|$.

B Proof of Lemma 4.9
Our proof of (4.59) is an extension of [4, Lemma 4.9], [7, Lemma 7.3] and [14, Theorem 4.7]. Here we only prove the bound for $\|[Z]\|$; the proof for $\|\langle Z \rangle\|$ is exactly the same. For $i \in \mathcal I_1$, we define $P_i := E_{[i]}$ and $Q_i := 1 - P_i$. Recalling that $Z_{[i]} = Q_i G^{-1}_{[ii]}$, we need to prove the claimed bound for $w \in \mathbf D$. For $J \subset \mathcal I$, we define $\pi^{[J]}_{[i]}$ by replacing $m_{1,2}$ in (2.36) with $m^{[J]}_{1,2}$ defined in (4.6). As in (4.58), we can prove the analogous comparison bound for $m^{[J]}_{1,2}$. Thus, if we abbreviate $B_i := |w|^{1/2}\, Q_i\big( \pi^{[i]}_{[i]}\, G^{-1}_{[ii]}\, \pi^{[i]}_{[i]} \big)$, it suffices to prove that $B := N^{-1} \sum_i B_i \prec \Phi_o^2$. We estimate $B$ by bounding the $p$-th moment of its norm by $\Phi_o^{2p}$ for $p = 2n$ with $n \in \mathbb N$, i.e. $\mathbb E \|B\|^p \prec \Phi_o^{2p}$; the lemma then follows from Chebyshev's inequality. Using $\|K K^*\| = \|K\|^2$ for any square matrix $K$, we get, for $p = 2n$,
$$\operatorname{Tr}(B B^*)^n \ge \big\| B B^* \big\|^n = \|B\|^{2n}.$$
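To spell out the last reduction (a standard step; the $N^\epsilon$ bookkeeping below is ours): once $\mathbb E \|B\|^p \prec \Phi_o^{2p}$ is known for every fixed even $p$, Markov's inequality gives, for any fixed $\epsilon > 0$,
$$\mathbb P\big( \|B\| > N^\epsilon \Phi_o^2 \big) \le \frac{\mathbb E \|B\|^p}{N^{\epsilon p}\, \Phi_o^{2p}} \le N^{-\epsilon p / 2}$$
for large enough $N$; taking $p$ large makes the right-hand side smaller than any fixed power of $N^{-1}$, which is precisely the statement $B \prec \Phi_o^2$.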
Thus it suffices to prove the corresponding bound on $\mathbb E \operatorname{Tr}(B B^*)^n$. This estimate can be proved with the same method as in [14, Appendix B], the only complication being that $\pi^{[i]}$ is random and depends on $i$. In principle, this can be handled by using (3.9) and (3.10) to put any indices $j, k, \ldots \in \mathcal I_1$ (that we wish to include) into the superscripts of $\pi^{[i]}$. This leads to a minor modification of the proof in [14, Appendix B]. Here we describe the basic ideas of the proof, without writing down all the details. The proof is based on a decomposition of the space of random variables using $P_s$ and $Q_s$. It is evident that $P_s$ and $Q_s$ are projections, $P_s + Q_s = 1$, and all of these projections commute with one another. For a set $J \subset \mathcal I$, we denote $P_J := \prod_{s \in J} P_s$ and $Q_J := \prod_{s \in J} Q_s$. Let $p = 2n$ and introduce the shorthand $\tilde B_{k_s} := B_{k_s}$ for odd $s \le p$ and $\tilde B_{k_s} := B^*_{k_s}$ for even $s \le p$. Introducing the notations $\mathbf k = (k_1, k_2, \ldots, k_p)$ and $\{\mathbf k\} = \{k_1, k_2, \ldots, k_p\}$, we can write the trace as a sum over $\mathbf k$ of expectations of products of the $\tilde B_{k_s}$. We take $t = 3$ as an example to describe the ideas behind the proof of (B.6). Using (3.9), we get
$$\pi^{[1]}_{[1]} = \pi^{[12]}_{[1]} + |w|^{1/2}\, (\cdots)$$