Local law for the product of independent non-Hermitian matrices with independent entries

We consider products of independent square non-Hermitian random matrices. More precisely, let X(1), . . . , X(n) be random matrices with independent entries (real or complex with independent real and imaginary parts) with zero mean and variance 1/N. O'Rourke and Soshnikov showed that the empirical spectral distribution of the product X(1)X(2)···X(n) converges to the n-th power of the circular law. We prove that if the entries of the matrices X(1), . . . , X(n) satisfy a uniform subexponential decay condition, then in the bulk the convergence of the ESD holds up to the optimal scale.


Introduction
In this paper we study the spectrum of the product of non-Hermitian random matrices with independent entries.
The study of the spectrum of non-Hermitian random matrices dates back to 1965, when Ginibre [10] calculated the joint density function of the eigenvalues of an N × N non-symmetric random matrix with independent standard Gaussian entries (the Ginibre ensemble). A similar result for the product of independent complex Ginibre matrices was obtained by Akemann and Burda in [3]. One crucial property of random matrices with Gaussian entries is their determinantal structure, from which exact formulas can be obtained for many important quantities characterising the distribution of the eigenvalues (such as the k-point correlation functions). If the entries of the matrix are not Gaussian, then we usually do not have exact formulas for the distribution of the eigenvalues at finite N. Nevertheless, in many cases, as N goes to infinity, the spectrum of models with non-Gaussian entries behaves similarly to the Ginibre case. This is known as the universality phenomenon. The aim of this article is to show that universality holds for certain local properties of products of non-Hermitian random matrices. We now give a brief review of some known universality results for non-Hermitian random matrices.
Global regime. Using the exact formula for the eigenvalue density from [10], it can be shown that the empirical spectral measure of the Ginibre ensemble with entries normalised to have variance N^{-1} converges weakly to the uniform distribution on the unit disk. The corresponding universality result, known as the Circular Law theorem and proven in a series of papers between 1985 and 2010 (see [19] for the final version), states that if the entries of the matrix are independent with zero mean and variance N^{-1}, then the empirical spectral distribution (ESD) converges weakly to the uniform distribution on the unit disk. The global regime for products was studied by Götze-Tikhomirov [11] and O'Rourke-Soshnikov [15], who established that the ESD of the product of n independent non-Hermitian random matrices with normalised entries converges weakly to the n-th power of the circular law. Note that in [15] an additional (2 + ε)-moment assumption was used.
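The global statement can be illustrated numerically. The following sketch (not part of the paper's argument; the matrix size and the diagnostic are our choices) samples a product of n independent matrices with variance-1/N entries and checks a consequence of convergence to the n-th power of the circular law: if λ = ζ^n with ζ circular-law distributed, then |λ|^{2/n} should be approximately uniform on [0, 1].

```python
import numpy as np

# Product of n independent matrices with iid mean-0, variance-1/N entries.
# Its ESD should approach the n-th power of the circular law, so the
# statistics |lambda|^(2/n) should look roughly Uniform(0,1).
rng = np.random.default_rng(0)
N, n = 400, 3
P = np.eye(N)
for _ in range(n):
    P = P @ (rng.standard_normal((N, N)) / np.sqrt(N))
eigs = np.linalg.eigvals(P)
u = np.abs(eigs) ** (2.0 / n)   # approximately Uniform(0,1) for large N
```

The empirical mean of `u` is then close to 1/2, in agreement with the limiting radial law.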
Intermediate and local regimes. The global regime deals with weak convergence, which concerns subsets containing cN eigenvalues for some c > 0. In other words, we normalise the eigenvalues so that the limiting ESD has compact support. If, on the other hand, we change the normalisation of the matrix in such a way that for any compact set K the number of eigenvalues situated in K is much smaller than N, we enter the mesoscopic, or intermediate, regime. The smallest scale on which we can expect the linear statistics to have deterministic limiting behaviour is obtained by multiplying the matrix by √N. In this microscopic regime each compact set in the bulk contains only a finite number of eigenvalues.
There has been remarkable progress recently in the study of universality in the intermediate and local regimes for non-Hermitian matrices. In [20] Tao and Vu proved universality of the k-point correlation functions (see [20] or [16] for the definition) under the assumptions that the distributions of the entries of the matrix have exponentially vanishing tails and that their first four moments match those of a Gaussian random variable with zero mean and variance N^{-1}. The last assumption is crucial for [20], as the approach of Tao and Vu relies on the 4th moment comparison theorem.
In a series of papers [7], [8] and [21] Bourgade, Yau and Yin proved the universality of the local law up to the optimal scale (which can be interpreted as universality of the 1-point correlation function) without imposing the 4th moment matching condition. The goal of our article is to show a similar result for a product of independent non-Hermitian matrices. We now introduce some basic objects and fix the notation that will allow us to state both theorems precisely.
Let X_1, . . . , X_n be independent N × N matrices, X_a = (^a x_{ij})_{1≤i,j≤N}, with independent entries (real or complex with independent real and imaginary parts) having zero mean, variance N^{-1} and satisfying the uniform subexponential decay condition: there exists θ > 0 such that

max_{1≤a≤n} max_{1≤i,j≤N} P( √N |^a x_{ij}| > t ) ≤ θ^{-1} exp(−t^θ) for all t ≥ 1. (2)

Let f : C → R_+ be a smooth non-negative function with compact support, such that ‖f‖_∞ ≤ C and ‖f′‖_∞ ≤ N^C for some constant C > 0. For any d ∈ R_+ and z_0 ∈ C we define the N^{-d}-rescaling of f around z_0 by

f_{z_0}(z) := N^{2d} f( N^d (z − z_0) ).

For two N-dependent random variables A_N ∈ C and B_N ∈ R_+ we say that A is stochastically dominated by B, written A ≺ B, if for any ε > 0 and D > 0 we have P( |A_N| > N^ε B_N ) ≤ N^{-D} for all sufficiently large N.

Theorem 1 (Bourgade-Yau-Yin). Let μ_1, . . . , μ_N be the eigenvalues of X_1. Then for any d ∈ (0, 1/2], any τ > 0 and z_0 ∈ C with |z_0| ≤ τ^{-1},

| N^{-1} Σ_{j=1}^N f_{z_0}(μ_j) − π^{-1} ∫_{|z|≤1} f_{z_0}(z) dA(z) | ≺ N^{-1+2d} ‖Δf‖_{L^1},

where f_{z_0} is the N^{-d}-rescaling of f around z_0.
Theorem 2. Let μ_1, . . . , μ_N be the eigenvalues of X_1 X_2 · · · X_n. Then for any d ∈ (0, 1/2], any τ > 0 small enough and z_0 such that |z_0| ≥ τ and |1 − |z_0|| ≥ τ,

| N^{-1} Σ_{j=1}^N f_{z_0}(μ_j) − (nπ)^{-1} ∫_{|z|≤1} f_{z_0}(z) |z|^{2/n−2} dA(z) | ≺ N^{-1+2d} ‖Δf‖_{L^1},

where f_{z_0} is the N^{-d}-rescaling of f around z_0.

Remark 1. In the same manner as in [7], [8] and [21] we separate the study of the local law in the bulk from its study at the special points: on the edge of the spectrum and at the origin. In the latter cases the analysis of the stability of the self-consistent equations, which is crucial in our approach, cannot be carried out; these cases therefore require different tools (for example, "4th moment comparison"-type results) and are not considered in the present article.
Remark 2. Recently Ajanki, Erdös and Krüger proved that the local law holds up to the optimal scale for a very large class of Hermitian matrices (see [1] and [2]). Although the model considered in these two articles is very general, it does not contain the matrix (X − z) * (X − z) studied in the present article, and thus our result cannot be deduced directly from [1] and [2]. It would be interesting to know how the method of Ajanki, Erdös and Krüger can be adjusted in order to obtain the local law for the model considered in the present article.
Remark 3. Besides being an interesting mathematical problem in itself, the local law on the optimal scale is an important step towards the proof of the universality of the k-point correlation functions. Both known techniques developed to show local universality (i.e. either using the local relaxation flow or the 4th moment comparison theorem) rely on the initial estimates provided by the local law on the optimal scale. Therefore, one interesting application of the main result of the present article would be to prove the universality of the k-point correlation functions for products of non-Hermitian matrices.
Outline of the proof. We start with the linearization trick, which transforms the problem about the eigenvalues of the product X_1 · · · X_n into the study of the eigenvalues of a large block matrix X having X_1, . . . , X_n as blocks. This will later allow us to exploit the Schur complement formula to analyse the resolvent matrix. We show that the local law for the product is equivalent to the local circular law for the linearization matrix X. To study the non-Hermitian matrix X, we follow Girko's Hermitization technique, which asserts that it is enough to study the distribution of the singular values of the family of shifted matrices X − z, z ∈ C. Using the approach developed in [7] we show that our initial problem can be reduced to estimating the Stieltjes transform of the linearized Hermitized matrix (X − z)*(X − z), z ∈ C. In Sections 3 and 4 we fix the notation and introduce the tools that will be used in the proof of the Stieltjes transform concentration. The last section is devoted to the study of the Stieltjes transform of the matrix (X − z)*(X − z). We adapt the argument of Bourgade-Yau-Yin [7] to make it applicable in our setting. The main difference compared to [7], and thus the main technical difficulty, arises from the fact that we cannot work directly with the Stieltjes transform and have to study the concentration of its partial traces. Similar results, but for different values of the resolvent parameter, were obtained in [14]. Although the approach is similar to that of [14], many important statements have to be adjusted in order to obtain sufficiently strong estimates on a set which is large enough to imply the rigidity of the singular values of X − z.

Reduction to the Stieltjes transform concentration
Linearisation. Following Burda, Janik and Waclaw [9] we introduce a block cyclic matrix X, namely the nN × nN matrix whose block in position (a, a + 1), a ∈ Z/nZ, equals X_{a+1}, all other blocks being zero. The n-th power of the matrix X is an nN × nN block-diagonal matrix with the matrices X_{a+1} X_{a+2} · · · X_{a+n}, a ∈ Z/nZ, on the diagonal. The advantage of considering this matrix is that its entries are independent with zero mean. Also, we can rewrite (6) in terms of the eigenvalues of X, where we use a change of variable for the last term. Below we show that the stochastic domination of (8) by N^{-1+2d} ‖Δf‖_{L^1} is equivalent to the local circular law for the matrix X. But before that we use Girko's hermitization idea to transform the study of the non-Hermitian matrix X into the study of a family of Hermitian matrices, defined in the next section.
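The block-cyclic structure can be checked numerically. In the sketch below we assume the convention that the block of X in position (a, a + 1 mod n) equals X_{a+1} (our reading of the elided display); with this convention X^n is block diagonal with the cyclic products on the diagonal.

```python
import numpy as np

# Build the block cyclic linearization Y of X_1, ..., X_n: block (a, a+1 mod n)
# holds X_{a+1}, all other blocks are zero.  Then Y^n is block diagonal with
# the cyclic products X_{a+1} X_{a+2} ... X_{a+n} on the diagonal.
rng = np.random.default_rng(1)
N, n = 5, 3
X = [rng.standard_normal((N, N)) / np.sqrt(N) for _ in range(n)]  # X[0..n-1] = X_1..X_n
Y = np.zeros((n * N, n * N))
for a in range(n):
    b = (a + 1) % n
    Y[a*N:(a+1)*N, b*N:(b+1)*N] = X[b]
Yn = np.linalg.matrix_power(Y, n)
# The last diagonal block of Y^n is the full product X_1 X_2 X_3,
# and all off-diagonal blocks of Y^n vanish.
```

This is exactly why the eigenvalue problem for the product reduces to the eigenvalue problem for the single block matrix X.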
Hermitization. Girko's Hermitization technique relies on Green's formula for a function with compact support.
Lemma 1 (Green's formula, [18]). Let f, g : C → R be twice continuously differentiable functions and let R ⊂ C be a bounded set with C^1 boundary ∂R. Let A(z) be Lebesgue measure on C. Then

∫_R ( g Δf − f Δg ) dA(z) = ∫_{∂R} ( f ∂g/∂n − g ∂f/∂n ) ds,

where ∂/∂n denotes differentiation in the direction of the inner normal of R, and ds indicates integration with respect to the arc length of ∂R.
If we suppose that f has compact support, fix z′ ∈ C and take g(z) = − log |z − z′|, then Let μ̃_1, . . . , μ̃_{nN} denote the eigenvalues of X. Then using (10) we obtain Girko's hermitization formula where f̃(z) = f_{z_0}(z^n). Define Y_z := X − z and let λ_1 < λ_2 < . . . < λ_{nN} be the eigenvalues of Y_z* Y_z. We now show how the estimates of (8) can be obtained by studying the eigenvalues λ_j(z). Let ν_{z,N} be the family of empirical measures on the squared singular values of the matrices X − z and let m(z, w) be the Stieltjes transform of ν_{z,N},

m(z, w) = (nN)^{-1} Σ_{j=1}^{nN} ( λ_j(z) − w )^{-1}.

The convergence of m(z, w) to a limiting function m_c(z, w), as well as the weak convergence of ν_{z,N}, was shown in [14, Proposition 1]. Together with [5, Lemma 11.9], where the authors study properties of the function m_c(z, w), we have the following result.

Theorem 3. (1) For every z, the Stieltjes transform m(z, w) converges to m_c(z, w), which is the solution of the self-consistent equation (16) that satisfies Im m_c(z, w) > 0 if Im w > 0. (2) Almost surely, ν_{z,N} converges weakly to a limiting measure ν_z, uniformly in every bounded region of z.
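Assuming the standard definition m(z, w) = (nN)^{-1} Σ_j (λ_j(z) − w)^{-1}, the following snippet verifies numerically that this eigenvalue sum coincides with the normalized trace of the resolvent of Y_z* Y_z (the matrix size and the values of z, w are illustrative choices, not from the paper).

```python
import numpy as np

# Stieltjes transform of the empirical measure on the squared singular
# values of Y_z = X - z: eigenvalue-sum form vs normalized resolvent trace.
rng = np.random.default_rng(2)
M = 30                              # stands in for nN
z, w = 0.5 + 0.1j, 1.0 + 0.5j       # spectral parameters, Im w > 0
X = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2 * M)
Y = X - z * np.eye(M)
H = Y.conj().T @ Y                  # Hermitian, eigenvalues = squared singular values
lam = np.linalg.eigvalsh(H)
m_eig = np.mean(1.0 / (lam - w))
m_tr = np.trace(np.linalg.inv(H - w * np.eye(M))) / M
```

Since Im w > 0, the computed transform has positive imaginary part, matching the sign convention of Theorem 3.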
The next lemma shows how we can use the properties of ρ_z to reduce (6) to the problem of the rigidity of the singular values λ_j(z) around their classical locations.
Let γ_1 ≤ · · · ≤ γ_{nN} be the classical locations of the eigenvalues of Y_z* Y_z. Then for any ε > 0, Denote f̃(z) := f_{z_0}(z^n). Then where in the first two steps we used (20) and (18), next we applied integration by parts, and finally we used (19). From the properties of Wirtinger derivatives we have the following lemma. Using the change of variable ξ = N^d (z^n − z_0) and the above lemma we get that Therefore, to prove (6) it is enough to show that In order to obtain the estimate (22), we proceed as in [7], where a similar estimate was obtained for a matrix with iid entries. Firstly, we separate a relatively small number of the largest terms, which will be estimated by controlling the smallest singular value λ_1(z) and the properties of ρ_z. We shall need the following result, proven in [15].
uniformly in z.
From the properties of ρ_z (see [7], Proposition 3.1) we have that γ_1 ≥ CN^{-2}. Therefore we conclude that Define ϕ := (log N)^{log log N}. Note that ϕ grows more slowly than N^ε for any ε > 0. Then it is enough to show that In [7] it was shown that (26) can be obtained from the concentration of the Stieltjes transform, stated precisely in the following theorem.
Theorem 5. There exist δ > 0 and Q̃ such that for any τ ≤ |z| ≤ 1 − τ or 1 + τ ≤ |z| ≤ τ^{-1} the estimate (27) holds. We refer the reader to Section 5 of [7] for the detailed proof of the reduction. The rest of the article is devoted to the proof of Theorem 5.

Notations and definitions
We start by fixing the notation and giving necessary definitions. The main argument will follow the general framework proposed by Bourgade, Yau and Yin, therefore we try to keep our notation as close as possible to the notation used in [7].
Throughout the rest of the article, a and b will denote elements of Z/nZ. Let X_a, a ∈ Z/nZ, be independent N × N matrices whose entries have zero mean, variance N^{-1} and satisfy condition (2). Let X be defined by (7). For z ∈ C and w = E + √−1 η ∈ C_+ introduce the matrices We shall consider nN × nN matrices as consisting of N × N blocks indexed by (a, b). We shall use a left superscript to specify the submatrix: for example, for i, j ∈ {1, . . . , N}, ^{ab}G_{ij} := G_{i(a), j(b)}. We shall use the index a instead of aa for elements of the diagonal blocks, for example ^a G_{kl} := G_{k(a), l(a)}. The i(a)-th rows of the matrices X and Y_z will be denoted by x_{i(a)} and y_{i(a)} respectively; the corresponding columns of these matrices will be denoted by x^{i(a)} and y^{i(a)}.
Note that all these matrices depend on z ∈ C and w ∈ C + .

Now we can introduce
where m c was defined in Theorem 3.
We shall use C and c to denote different constants, that do not depend on N , w or z.
For ζ > 0 we say that an event Ξ_N holds with ζ-high probability if P(Ξ_N^c) ≤ exp(−ϕ^ζ) for all sufficiently large N.

Tools and Methods
This section collects some basic and classical tools which will be relevant to the proof of the main result. Note that in Lemmas 6, 7, 8 and 9 we deal with objects introduced in Section 3, while in Lemmas 10, 11 and 12 we recall properties of the function m_c, which was introduced in Theorem 3.

Linear Algebra
Lemma 4 (Schur complement formula, [12, Section 0.7.3]). Let A be an invertible matrix and let B be its inverse.

Divide the matrices A and B into blocks

A = ( A_{11} A_{12} ; A_{21} A_{22} ),   B = ( B_{11} B_{12} ; B_{21} B_{22} ),

so that blocks with the same index have the same size and the blocks on the diagonal are square submatrices. Then, provided A_{22} is invertible,

B_{11} = ( A_{11} − A_{12} A_{22}^{-1} A_{21} )^{-1}.
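In this partition, the standard form of the Schur complement formula (assumed here for concreteness) reads B_{11} = (A_{11} − A_{12} A_{22}^{-1} A_{21})^{-1}; a quick numerical check:

```python
import numpy as np

# Verify the Schur complement formula on a random well-conditioned matrix:
# if B = A^{-1} with conformal 2x2 block partitions, then
# B_11 = (A_11 - A_12 A_22^{-1} A_21)^{-1}.
rng = np.random.default_rng(3)
k, m = 3, 4
A = rng.standard_normal((k + m, k + m)) + (k + m) * np.eye(k + m)  # shift keeps A, A_22 invertible
B = np.linalg.inv(A)
A11, A12 = A[:k, :k], A[:k, k:]
A21, A22 = A[k:, :k], A[k:, k:]
schur = A11 - A12 @ np.linalg.inv(A22) @ A21
# B[:k, :k] and inv(schur) agree up to floating-point error
```
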
Lemma 5. Let A be a square matrix and let w be a complex number. If A*A − w is invertible, then

A ( A*A − w )^{-1} A* = 1 + w ( AA* − w )^{-1}.

Proof. Follows from the Woodbury matrix identity (see [12, Section 0.7.4]).
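The identity behind Lemma 5 is, in its standard push-through/Woodbury form (our assumption for the exact statement), A(A*A − w)^{-1}A* = 1 + w(AA* − w)^{-1}; it can be checked numerically on a random complex matrix:

```python
import numpy as np

# Check: A (A*A - w)^{-1} A* = I + w (AA* - w)^{-1}
# (a consequence of A f(A*A) = f(AA*) A, i.e. the push-through identity).
rng = np.random.default_rng(4)
M = 6
w = 0.7 + 0.3j
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
I = np.eye(M)
lhs = A @ np.linalg.inv(A.conj().T @ A - w * I) @ A.conj().T
rhs = I + w * np.linalg.inv(A @ A.conj().T - w * I)
```

This is the relation that lets one pass between the resolvents of A*A and AA*, which is how it is used for G and its companion below.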
The same is true for G.
The lemma now follows from the relation Proof. With the notation used in the proof of Lemma 7, for any i ∈ {1, . . . , nN}, where u_{ij} are the entries of the unitary matrix U. Therefore, the bound for G_{ii} follows from the fact that The proof for Ḡ_{ii} is similar.

McDiarmid's Concentration Inequality
Suppose that |F(u) − F(u′)| ≤ c_k if the vectors u and u′ differ only in the kth coordinate. Then for any t ≥ 0,

P( |F(u) − E F(u)| ≥ t ) ≤ 2 exp( −2t² / Σ_k c_k² ).
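The inequality above (stated in the standard bounded-differences form, which we assume here) can be illustrated empirically with F(u) equal to the mean of bounded coordinates:

```python
import numpy as np

# Empirical illustration of McDiarmid's bounded-differences inequality:
# if changing the k-th coordinate changes F by at most c_k, then
#     P(|F - EF| >= t) <= 2 exp(-2 t^2 / sum_k c_k^2).
# Take F(u) = mean of nvar coordinates u_k in [0,1], so c_k = 1/nvar.
rng = np.random.default_rng(5)
nvar, trials, t = 50, 20000, 0.1
U = rng.random((trials, nvar))
F = U.mean(axis=1)                        # E F = 1/2 exactly
emp_tail = np.mean(np.abs(F - 0.5) >= t)  # empirical tail probability
bound = 2.0 * np.exp(-2.0 * t**2 * nvar)  # sum_k c_k^2 = 1/nvar
```

The empirical tail is well below the McDiarmid bound, as expected; in the paper the inequality is applied to the partial traces of the resolvent as functions of the matrix columns.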

Abstract Decoupling Lemma
We use the notation Let Ξ be an event and p an even integer, which may depend on N . Suppose the following assumptions hold with some constants C 0 , c 0 > 0.
(i) There exist deterministic positive numbers X < 1 and Y such that for any set A ⊂ {1, 2, . . . , N } with i ∈ A and |A| ≤ p, Q A S i in Ξ can be written as the sum of two new random variables: and Then, under the assumptions (i), (ii), (iii) above, we have for some C > 0 and any sufficiently large N .

System of "self-consistent equations"
The aim of this section is to prove Theorem 8. We begin with three independent lemmas.
Define subsets of C. The next two lemmas are technical results which make manipulations with functions that approximate m_c(z, w) easier. Similar results, but for different values of z and w, were proven in [14, Lemmas 13 and 14]. By tracking the changes in the behaviour of m_c in different regions of z and w (see Lemmas 10 and 11), the arguments can be adapted without difficulty to the new setting; we therefore state these lemmas without proof.

Lemma 14.
There exist α > 0 small enough and C > 0 such that for any h_i : Lemma 15. Let ζ > 0. Then there exists Q_ζ > 0 such that for all sufficiently large N, for any T, U ⊂ {1, . . . , nN}, for any a ∈ Z/nZ and {i, j} ⊂ {1, . . . , N} such that {i(a), j(a)} ⊂ T (i = j is allowed), for z ∈ Z_τ and w ∈ S_{z,1,0}, with ζ-high probability and if i(a) ∉ U, then The above result remains valid if we take the matrix G^{(U,T)} and the rows y_{i(a)} instead of G^{(T,U)} and y_{i(a)}. From now on we fix α as in Lemma 14 and Q_ζ as in Lemma 15. To state and prove our next result we shall need some additional notation.
and we shall suppress the right superscript if T = ∅.
For any t > 0 define an N-dependent set We are now in a position to prove the main result of this section. Although the proof of this theorem mimics the argument used in [14, Theorem 6], for the reader's convenience we provide a complete proof here.
Theorem 8. For any ζ > 0 there exists Q̃_ζ > 0 such that the following implication is true for all z ∈ Z_τ and w ∈ S_{z,1,0}: if (59) holds with ζ-high probability, then (60)-(65) hold with ζ-high probability.
Proof. We begin with equation (61). Using (53) and taking the expectation with respect to y i(a) a G (i(a),∅) ii .
The i(a)th row and column of G (i(a),i(a)) are equal to zero by definition. Therefore ) kl x i(a)l(a+1 and from (57) Recall that by (32) and from (54) we get (61). We now apply (53) to [ a G ii ] −1 , take expectation with respect to the column y i(a) and use (61) We estimate a Z i using Lemma 15 and (61) as Then by (55) We conclude that

Now by (56)
and the equation (62) is proved.
If we sum the left- and right-hand sides of (62) over i ∈ {1, . . . , N} and divide by N, we get Using again (56) we have Equations (60), (63) and (65) can be proved in the same way. Theorem 8 is established.

Weak concentration
In this section we study the stability properties of the solutions of the system (64)-(65) and obtain an initial estimate for Λ. Although the derivation of the self-consistent equations (64)-(65) is similar to the case of one matrix considered in [7], or to the case of products of matrices on different sets of z and w (as in [14]), the analysis of this system in our setting requires much more technical effort. This is due to the fact that the matrix Γ defined below, which corresponds to the linearization of the system (64)-(65), is singular at λ_±. Therefore we need to study carefully the behaviour of ‖Γ^{-1}‖_∞ around these critical points.
As in [14] we start by linearizing the system (64)-(65). Suppose that condition (59) holds, i.e., for all a ∈ Z/nZ After expanding terms of the type (^a m_G)^{-1} or (1 + ^a m_G)^{-1} around (m_c)^{-1} or (1 + m_c)^{-1} respectively, we obtain the following system of linear equations with respect to ∆_a := (^a m_G − m_c) and ∆̄_a := (^a m_Ḡ − m_c) Recall that m_c satisfies the self-consistent equation (16). We end up with the following linear system We introduce the notation ∆ := (∆_1, . . . , ∆_n, ∆̄_1, . . . , ∆̄_n)^T, and where Thus we can rewrite the system (67)-(68) as We have the following proposition about the behaviour of the inverse of Γ, which is proven in the Appendix.
Proposition 1. There exist C, τ̃, ε > 0 such that the following holds. Case 1: if |z| ≥ 1 + τ, |w − λ_±| ≤ τ̃ and η ≥ ϕ^C/(N|m_c|), then Case 3: if τ ≤ |z| ≤ 1 − τ, |w| ≤ τ̃ and η ≥ ϕ^C/(N|m_c|), then Remark 4. From the proof of Proposition 1 we see that if z and w are close to the origin, then ‖Γ^{-1}‖_∞ behaves like (√|w| + |z|²)^{-1}. This singularity differs from the singularities obtained in Cases 1 and 2 of the above proposition, and the methods of the present article are not sufficient to study the stability of the system (64)-(65) in this case.
We now study the stability of the system (64)-(65). We show that for z ∈ Z̃_τ and w ∈ S_{z,δ,Q} there exists a gap in the range of Λ that depends on the error term in (64)-(65). Similarly to the case of one matrix, we identify three regimes of the range separation. Note that near the points λ_± we need an estimate for the error term that is decreasing in η; therefore, later we shall replace the random control parameter Ψ by a deterministic one.
Then there exists M > 0 big enough such that the following holds. Proof. Case 1. Suppose that |w − λ_±| ≥ M^{-1}. Then we are in one of the last two cases of Proposition 1. The condition τ ≤ |z| ensures that in these cases the norm of the matrix Γ^{-1} is bounded. If we linearise the system (64)-(65) with an O(Ψ) error term up to the first order and divide the equations by |m_c|, we obtain the following system Since ‖Γ^{-1}‖_∞ is bounded, we deduce that for any 1 ≤ a ≤ n which implies that Λ|m_c|^{-1} = O(Ψ|m_c|^{-1}).
Case 3. From (129) in the Appendix we know that if w = λ_±, then 1 − g_1 + g_2 = 0. If we rewrite the system (67)-(68) using (37) and (43), we get for a ∈ Z/nZ Consider now the matrix Γ(λ_±). We will show that rank Γ(λ_±) = 2n − 1. From (129) it follows that l_n(I − Γ_1 Γ_1^T) (the nth eigenvalue of (I − Γ_1 Γ_1^T)(λ_±) defined in (128)) is equal to zero. From the formula (128) we have that for Since g_1 ∼ g_2 ∼ 1 we deduce that (I − Γ_1 Γ_1^T)(λ_±) has only one vanishing eigenvalue. Using the formula for the determinant of block matrices we have that Therefore, if l_j is an eigenvalue of I − Γ_1 Γ_1^T, then 1 − √(1 − l_j) and 1 + √(1 − l_j) are eigenvalues of Γ. We conclude that Γ(λ_±) has 2n − 1 non-zero eigenvalues and that rank Γ(λ_±) = 2n − 1. The sum of each row or column of Γ(λ_±) is equal to zero, which implies that Ker Γ = {t(1, 1, . . . , 1)^T, t ∈ R}. Suppose for simplicity that the lower right (n − 1)-minor of Γ(λ_±) is invertible and denote this minor by Γ̃. Then gives a system of n linear equations with respect to ∆̃: can be solved, which implies that and also We now linearise the system (64)-(65) with an error term bounded by Ψ̃ up to the second order and expand the function m_c around λ_± according to (37) and (43). We end up with the following system for a ∈ Z/nZ Adding all these equations we get an equation for the sum of the squares of ∆_a and ∆̄_a, where we used (129). Suppose that (88) does not vanish. Then, using (87), we have that If Λ ≤ 2MΨ̃ and |w − λ_±| ≤ M^{3/2}Ψ̃, then and from (87) we have We conclude that Λ ≤ CM^{7/8}Ψ̃ ≤ MΨ̃ for M big enough. The last thing to show is that (88) is non-zero for any z ∈ Z̃_τ. This follows from (131) and (133) in the Appendix. The proposition is thus proven.
In the following proposition we estimate Λ in the case when η is of order O(1). The beginning of the proof is similar to the proof of [14, Lemma 17], but for the reader's convenience we provide it here with all the details.
Proof. First of all recall that by (32) Therefore, ^a m_G and ^a m_Ḡ, as functions of the columns x^k, 1 ≤ k ≤ nN, satisfy condition (48) for any w ∈ S_{z,δ,Q_ζ} ∩ {η = η_0}, and we can apply McDiarmid's concentration inequality, so that and similarly for ^a m_Ḡ. If we take t = c^{-1/2} ϕ^{ζ/2} N^{-1/2} in the above inequality, we get that for any w ∈ S_{z,δ,Q_ζ} Let X̂ be an nN × nN random matrix having the same block structure as the matrix X but with iid non-zero entries. Suppose that ^{a,a+1}X̂_{kl} has the same distribution as ^{12}X̂_{11} for all a ∈ Z/nZ and 1 ≤ k, l ≤ N. Denote the corresponding resolvent matrices by Ĝ and Ḡ̂. It was shown in [14, Lemma 3] that and from [15, Lemma 14] we know that Therefore, where we used that m_Ĝ = m_Ḡ̂ and that by the definition of Q_ζ (see the proof of Lemma 15) Q_ζ > ζ. By the same argument as in Theorem 8 we can thus show that with ζ-high probability But from the relations (32) and (93) we have and similarly ^{a+1}m Repeating the proof of [7, Lemma 6.12] we can show that |1 + m_G| is large enough with respect to the error term of order O(ϕN^{-1/2}). Therefore we rewrite (94) as The entries of the resolvent matrix are bounded by η^{-1}. We end up with the following equation Now we can conclude as in [7, Lemma 6.12] that sup_{w ∈ S_{z,δ,Q_ζ} ∩ {η=η_0}} with ζ-high probability. The result follows using (93). Now, following [7], we establish a preliminary estimate for Λ that shows that the bound (59) holds for any z ∈ Z̃_τ and w ∈ S_{z,δ,Q_ζ}. We shall use Proposition 2 with the function as Ψ̃.
Theorem 9. For any ζ > 0 and any z ∈ Z̃_τ, the bound (99) holds uniformly in w ∈ S_{z,δ,Q_ζ}. Proof. Following the approach used by Bourgade, Yau and Yin in [7], we prove the theorem in two steps. Firstly, we show that with ζ-high probability the bound holds on an N^{-K}-net in S_{z,δ,Q_ζ} for K > 0 big enough. Next, we use the continuity properties of Λ to extend the result to the whole set. The first part is a bootstrapping-type argument. We fix E and consider first η = O(1), for which (99) holds by Proposition 3. Then we show that if we decrease η, condition (59) still holds, and thus we can apply Proposition 2 to get a gap in the range of possible values of Λ. From the continuity properties of Λ we deduce that Λ stays below the gap and that the weak estimate (99) holds for this smaller choice of η. We continue these iterations as long as E + √−1 η stays in S_{z,δ,Q_ζ}. We now provide the detailed proof. Let K > 0 and let η > 0 be such that E + √−1(η − N^{-K}) ∈ S_{z,δ,Q_ζ}. Since z and E are fixed, we can introduce the simplified notation Λ(η) := Λ(z, E + √−1 η). Suppose first that we are not in the neighbourhood of λ_± and that Then Note that |w|^{-1/2}/(Nη) From the definition of S_{z,δ,Q_ζ} (see (28)) we have that on this set η ≥ N^{-2}. According to Lemma 9, if we take K > 0 big enough, then there exists N_0 ∈ N such that for all N ≥ N_0 Moreover, from the boundedness of |m_c|^{-1} ∼ |w|^{1/2} and the fact that we obtain that By (103) we see that Proposition 2 can be applied to Λ(η − N^{-K}), and by (104) we see that Suppose now that we are close to λ_± and that we have Note that in this case |w| ∼ 1. Therefore, as in (101), we have that for N sufficiently large. Again, by Proposition 2 and the continuity of Λ we get that We have shown that there exists K > 0 such that if (99) holds for E + √−1 η ∈ S_{z,δ,Q_ζ} with ζ-high probability, then with ζ-high probability (99) holds for E + √−1(η − N^{-K}) with the same constant in O(·), as long as From Proposition 3 we know that (99) holds for any w ∈ S_{z,δ,Q_ζ} ∩ {η = ε}.
Starting from those w ∈ S_{z,δ,Q_ζ} ∩ Θ(K) which are close to {η = ε}, we can step by step decrease the imaginary part of w and show that the bound (99) holds for all w ∈ S_{z,δ,Q_ζ} ∩ Θ(K) with ζ-high probability. We can finish the proof by using Lemma 9 and the continuity properties of m_c to extend (99) to the whole set S_{z,δ,Q_ζ}.
An important consequence of Theorem 9 is that for z ∈ Z_τ and w ∈ S_{z,δ,Q_ζ}, with ζ-high probability all the approximate equations (60)-(65) hold. We can expand this set of relations by adding approximate equations for the individual entries of the resolvent matrices of the minors.
Corollary 1. For any ζ, τ > 0, for any z ∈ Z_τ and w ∈ S̃_{z,δ,Q_ζ}, with ζ-high probability the following holds and similarly when interchanging the rôles of G, U and Ḡ, T.
Proof. We use an argument similar to that in the proof of Corollary 1 in [14], which relies on the Schur complement formula, Theorem 9 and Lemmas 13, 14, 15 and 6.

Strong concentration
In this section we finish the proof of Theorem 5. Note that due to Theorem 9 the initial bound (59) for Λ holds on the set S_{z,δ,Q_ζ} with ζ-high probability, and thus on this set Theorem 8 and Proposition 2 hold. Recall that in the proof of Theorem 9 we used the stability of the system of approximate equations (64)-(65) to obtain the following estimates for Λ, depending on the error term in the self-consistent equations: suppose that the error terms in (64)-(65) are bounded by Ψ̃, a deterministic function strictly decreasing in η near the points w = λ_±; suppose that Λ ≤ Ψ̃ for some η = O(1); then with ζ-high probability, for any w ∈ S_{z,δ,Q_ζ}: (i) if w is far enough from λ_±, then Λ ≤ Ψ̃; (ii) if w is in a neighbourhood of λ_±, then Λ ≤ (Ψ̃)^{1/2}. Therefore, improving the bound on the error terms in (64)-(65) will lead to a better estimate of Λ.
Following the idea from [7], we can use Theorem 9 to obtain a system of second-order self-consistent equations and similarly for Ẽ_a. In the next two lemmas we estimate E_a and Ẽ_a, where Λ̃ is some deterministic estimate for Λ satisfying Λ̃ ≤ α|m_c|. The arguments in Lemmas 16 and 17 are similar to the proof of Lemma 18 in [14] and Lemma 7.3 in [7] respectively; therefore we provide here only a sketch of the proof, indicating the ideas used but omitting the technical details.
Lemma 16. With ζ-high probability, for any w ∈ S_{z,δ,Q_ζ}, Σ_{a=1}^n Proof. Firstly, we use Lemma 8 to rewrite E^{(1)}_a or Ẽ^{(1)}_a in a form that is easy to bound using the estimates for the entries of the resolvent matrix obtained in Corollary 1. For example, and By Corollary 1 all the off-diagonal entries of G are bounded by ϕ^{2Q_ζ} Ψ, and for any j ∈ {1, . . . , nN}, G_{jj} ∼ m_c. Therefore we deduce that with ζ-high probability.
To bound the second term in (112), we rewrite it using Lemma 8. Proceeding as in the proof of Lemma 18 in [14] and using the estimates of Corollary 1, we can show that this term is bounded by ϕ^{4Q_ζ} Ψ² |m_c|^{-1}.
All the other estimates follow using a similar argument.
Lemma 17. Let ζ > 0. Suppose that for any w ∈ S_{z,δ,Q_ζ}

Λ ≤ Λ̃ ≤ α|m_c| (116)

with probability at least 1 − e^{−p_N}, where ϕ ≤ p_N ≤ ϕ^{2ζ}. Then with probability at least 1 − e^{−p_N (log N)^{-2}}, for any w ∈ S_{z,δ,Q_ζ}, Σ_{a=1}^n Proof. The main tool in the proof of this lemma is the Abstract decoupling lemma (Theorem 7). We apply it in the following setting: let I = {1, . . . , nN} and consider the random variables S_i = (w ^a G_{ii})^{-1}. Then ^a Z_i = Q_i S_i, and it is enough to show that conditions (49)-(51) in the hypothesis of the Abstract decoupling lemma hold with X = ϕ^{4Q_ζ} Ψ[Λ̃]|m_c|^{-1} and Y = |m_c|. Condition (50) can be verified using the uniform subexponential decay condition on the entries of the matrix X, and condition (51) holds by the assumption of the lemma. Therefore, we need to show that the decomposition (49) holds for any subset A ⊂ {1, . . . , N} with |A| ≤ p_N. It was shown in [7] that the existence of such a decomposition can be deduced from the following set of estimates holding with ζ-high probability for any w ∈ S_{z,δ,Q_ζ}: Note that (i) and (ii) follow from Theorem 9, (iii) follows from Lemma 15 and (iv) was proven in Corollary 1. Therefore, we can repeat the argument used by Bourgade, Yau and Yin in the proof of the strong estimates for the Stieltjes transform of a single non-Hermitian matrix to find the decomposition (49) in our setting. See [7, Section 7.2] for the detailed proof.
We can now finish the proof of Theorem 5. Consider first the case when w is far enough from λ_±. From Theorem 9 we know that the initial bound holds, therefore we can take Suppose that |w − λ_±| ≥ M^{-1}. Then from Propositions 2 and 3 and Lemmas 16 and 17 we get that In the case |w − λ_±| ≤ M^{-1} the iteration procedure used by Bourgade, Yau and Yin in [7] is applicable in our setting. Note that in this case |m_c| ∼ 1 and Im m_c ∼ |w − λ_±|^{1/2}. The idea is that if, for example, But and thus with probability at least 1 − e^{−ϕ^ζ (log N)^{-2}}. Note that the (log N)^{-1} factor in the exponent appears due to (51). If we repeat the above procedure with the error term Ψ[Λ_0], we get again a better estimate that holds with probability 1 − e^{−ϕ^ζ (log N)^{-2}}. It was shown in [7, Section 7.1] that if we iterate K := log log N / log 2 times, we obtain that with probability at least 1 − e^{−ϕ^{ζ/2}}. Note that a similar argument applies in the regime when |w − λ_±| ≤ M^{3/2} ϕ^{4Q_ζ} Ψ[Λ_k] for some k ∈ {0, 1, . . . , K}. Therefore, we deduce that (27) holds for all w ∈ S_{z,δ,Q_ζ}.

A Proof of Proposition 1
First of all, we can easily verify that if I − Γ_1 Γ_1^T is invertible, then where in the last equality we used that Γ_1^T Γ_1 = Γ_1 Γ_1^T = Circulant(g_1² + g_2², −g_1 g_2, 0, . . . , 0, −g_1 g_2). Therefore, Since g_1 ∼ g_2 ∼ 1, we get that all the non-zero entries of the matrix Γ are of order 1. Thus, we deduce that The following lemma allows us to calculate directly the eigenvalues of I − Γ_1 Γ_1^T and the entries of (I − Γ_1 Γ_1^T)^{-1}.
This concludes the proof of (70).
Remark 5. If we take |z| = 1 + τ, then the coefficient in front of w − λ_+ is bounded away from zero, while the coefficient in front of w − λ_− is of order O(τ) as τ → 0.
and this implies the claimed bound, which concludes the proof of Case 3.
Using equation (16) again, we have Then (144) is equivalent to We now take the imaginary part of the above equation. Note that we consider the case η = 0, w = E.
If E ∈ (max{0, λ_−}, λ_+), then Im m_c > 0. Thus we can divide by Im m_c and obtain (148). Consider now the real part of (146). Together with (148) we obtain (150), and (148) and (150) give us (151). Therefore, we have rewritten equation (144) as the system (150)-(151). Together with the real and imaginary parts of (16) we obtain the following system of equations. Equations (e), (a) and (d) imply that d_1 and d_2 satisfy the equation while (e) and (b) give another equation. From the definition of d_1 and d_2 we have that d_1 = d_2 + |z|² − 1. We end up with the following system We are now going to fix z satisfying condition (i) or (ii), and we will consider d_2 as a function of E. We will show that for any E satisfying condition (i) or (ii) and for any ω ∈ [−1, 1] this system does not have a solution equal to |z|^{-2} + 1 − 2w. Case 1: 1 + τ ≤ |z| ≤ τ^{-1}. We fix z. Define the following functions