Circular law for the sum of random permutation matrices

Let $P_n^1,\dots, P_n^d$ be $n\times n$ permutation matrices drawn independently and uniformly at random, and set $S_n^d:=\sum_{\ell=1}^d P_n^\ell$. We show that if $\log^{12}n/(\log \log n)^{4} \le d=O(n)$, then the empirical spectral distribution of $S_n^d/\sqrt{d}$ converges weakly to the circular law in probability as $n \to \infty$.


Introduction
For an n × n matrix M_n let λ_1(M_n), λ_2(M_n), …, λ_n(M_n) be its eigenvalues. We define the empirical spectral distribution (esd) of M_n as

$L_{M_n} := \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(M_n)}.$

For a sequence of random probability measures {µ_n}_{n∈N}, supported on the complex plane, we say that µ_n converges weakly to a limiting probability measure µ, in probability, if for every bounded continuous function f : C → R,

$\int f\, d\mu_n \to \int f\, d\mu \quad \text{in probability}.$   (1.1)

If (1.1) holds almost surely we say that µ_n converges weakly to µ, almost surely.

We are concerned in this paper with the esd of certain random, non-normal matrices, defined as follows. For a positive integer n, let π_n^i, i = 1, 2, …, d, denote i.i.d. permutations, distributed uniformly on the symmetric group S_n. Let P_n^i denote the associated permutation matrices, i.e., P_n^ℓ(i, j) := I(π_n^ℓ(i) = j) for ℓ ∈ [d], i, j ∈ [n], where for any integer m we denote [m] := {1, 2, …, m}. For d an integer, define S_n^d as

$S_n^d := \sum_{\ell=1}^d P_n^\ell.$

Note that S_n^d can be viewed as the adjacency matrix of a d-regular directed multigraph. For two sequences of positive reals {a_n} and {b_n} we say that a_n = O(b_n) (or a_n = o(b_n)) if for some universal constant C, lim sup_{n→∞} a_n/b_n ≤ C (respectively, = 0). We say that a_n = ω(b_n) if b_n = o(a_n). The main result of this paper is the following theorem.
Theorem 1.1. If log^{12} n/(log log n)^4 ≤ d = O(n), then the esd of S_n^d/√d converges weakly to the uniform distribution on the unit disk in the complex plane, in probability, as n → ∞.
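For illustration only (this is not part of the proof), one can sample S_n^d for moderate n and d and inspect the eigenvalues of S_n^d/√d; the sketch below does this in NumPy, with n and d chosen purely for speed. The single large eigenvalue √d coming from the all-ones eigenvector is removed before comparing with the unit disk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 30

# S_n^d: sum of d i.i.d. uniform n-by-n permutation matrices.
S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0

# Eigenvalues of S_n^d / sqrt(d).
eigs = np.linalg.eigvals(S / np.sqrt(d))

# The all-ones vector is an eigenvector with eigenvalue d/sqrt(d) = sqrt(d);
# drop this single "trivial" outlier before comparing with the unit disk.
eigs = np.delete(eigs, np.argmax(np.abs(eigs)))

frac_in_disk = np.mean(np.abs(eigs) <= 1.1)
mean_sq = np.mean(np.abs(eigs) ** 2)  # circular law predicts E|lambda|^2 = 1/2
```

At these (small) values of n and d the bulk of the spectrum already fills the unit disk quite uniformly, even though the theorem's hypothesis on d is far from satisfied.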
We refer to this result as the weak circular law for sums of permutations.

Remark 1.2. One expects the conclusion of Theorem 1.1 to hold almost surely. However, the estimate on the smallest singular value of S_n^d/√d − zI contained in Theorem 2.1 below is not sharp enough to allow for the application of the Borel–Cantelli lemma. On the other hand, other estimates in the paper, and in particular the concentration inequalities and the estimates on moderately small singular values (see Section 2 for definitions), are not an obstacle to the application of Borel–Cantelli.
Remark 1.3. Theorem 1.1 is established for d ≥ log^{12} n/(log log n)^4. One expects its conclusion to hold as soon as d = ω(1). Obvious obstacles to proving this by our methods are that the minimal singular value estimate, Theorem 2.1 below, requires d = ω(log^8 n) to be useful, and our loop equations main theorem, Theorem 2.6, is only effective when d grows like a power of log n. Proving Theorem 1.1 for d = ω(1) remains a major challenge and seems to require new ideas. It is possible that one could use the methods of [29] to relax the assumptions in Theorem 2.1 to d = ω(1).
1.1. Background: esd's for non-normal matrices. The study of the esd for random Hermitian matrices can be traced back to Wigner [42, 43], who showed that the esd's of n × n Hermitian matrices with i.i.d. centered entries of variance 1/n (upper diagonal) satisfying appropriate moment bounds (e.g., Gaussian) converge to the semicircle distribution. The conditions on finiteness of moments were removed in subsequent work; see e.g. [5, 34] and the references therein. We refer to the texts [30, 21, 39, 3, 5] for further background and a historical perspective.
Wigner's proof employed the method of moments: one notes that the moments of the semicircle law determine it, and then one computes by combinatorial means the expectation (and variance) of the trace of powers of the matrix. This method (as well as related methods based on evaluating the Stieltjes transform of the esd) fails for non-normal matrices, since moments do not determine the esd.
An analogue of Wigner's semicircle law in the non-normal regime is the following circular law theorem.

Circular law. Let M_n be an n × n matrix with i.i.d. entries of zero mean and unit variance. Then the esd of M_n/√n converges to the uniform distribution on the unit disk in the complex plane.
The circular law was posed as a conjecture, based on numerical evidence, in the 1950's. For the case that the entries have a complex Gaussian distribution it can be derived from Ginibre's explicit formula for the joint density function of the eigenvalues [23, 30]. The case of real Gaussian entries, where a similar formula is available, was settled by Edelman [18]. For the general case, when there is no such formula, the problem remained open for a very long time. An approach to the problem, which eventually played an important role in the resolution of the conjecture, was suggested by Girko in the 1980's [24], but mathematically it contained significant gaps. The first non-Gaussian case (assuming existence of a density for the entries) was rigorously treated by Bai [4], and after a series of partial results (see [12] and the references therein), the circular law conjecture was established in its full generality in the seminal work of Tao and Vu [41].
Theorem 1.4 (Circular law for i.i.d. entries [41, Theorem 1.10]). Let M_n be an n × n random matrix whose entries are i.i.d. copies of a fixed (independent of n) complex random variable x with zero mean and unit variance. Then the esd of M_n/√n converges weakly to the uniform distribution on the unit disk in the complex plane, both in probability and in the almost sure sense.
A remarkable feature of Theorem 1.4 is its universality: the asymptotic behavior of the esd is insensitive to the specific details of the entry distributions as long as they are i.i.d. and have zero mean and unit variance. It also extends to the sparse set-up. Namely, consider a matrix of i.i.d. entries where each entry is the product of a zero mean and unit variance random variable and an independent Bernoulli(p) random variable. From the two concurrent works of Götze and Tikhomirov [25] and Tao and Vu [40] it follows that if p decays polynomially in n, i.e. p ≥ n^{ε−1} for some ε > 0, then the limit is still the circular law. Later, Wood [44] relaxed the moment assumptions on the entries. A recent article by Basak and Rudelson [7] shows that the same limit continues to hold when p decays at a poly-logarithmic rate. In all these works the entries of the matrix still enjoy independence, and this feature plays a key role in the proofs. In [11], Bordenave, Caputo and Chafaï studied random Markov generators, where one puts i.i.d. entries in the off-diagonal positions and sets each diagonal entry to be the negative of the corresponding row-sum, showing that the limit law is a free additive convolution of the circular law and a Gaussian random variable. Their result covers sparse ensembles, including the Markov generator of a directed Erdős–Rényi graph with edge probability p(n) = ω(n^{−1} log^6 n).
Circular laws for matrices with less independence between the entries were subsequently proved in [10], [1], [33], [2], and [32]. In particular, in [32] Nguyen showed that the esd of a uniformly chosen random doubly stochastic matrix converges weakly to the circular law. Since the adjacency matrix of a random d-regular directed graph (digraph), scaled by 1/d, is a random doubly stochastic matrix, one is naturally led to the question of establishing the limits of the esd for such matrices. This was addressed in recent work of the second author [17], where it was shown that the circular law holds for adjacency matrices of random regular digraphs assuming a poly-log(n) lower bound on the degree.
A completely different story emerges when one replaces the Ginibre matrices by other models whose distribution is invariant under the action of some large group (note that Ginibre matrices are indeed invariant under right or left multiplication by unitary matrices). The study of such invariant models was initiated by Feinberg and Zee [20], who evaluated non-rigorously the limit of the esd for such matrices and showed various properties of the limit, e.g. that it is supported on a single ring in the complex plane. By using a variant of Girko's method adapted to the unitary group, this was put on a rigorous basis by Guionnet, Krishnapur and Zeitouni [26], who evaluated the limit of the esd for a matrix of the form UD, where D is diagonal satisfying some assumptions and U is a random Haar-distributed unitary, and showed that it coincides with the Brown measure of the associated limiting operators (an improved version appears in [37]). Building on this, and closer to the topic of this paper, Basak and Dembo [6] showed that the esds of the sum Û_n^d of d i.i.d. Haar-distributed unitary/orthogonal matrices converge to a probability measure µ_d whose density with respect to Lebesgue measure is given by

$\frac{d^2(d-1)}{\pi\,(d^2-|z|^2)^2}\,\mathbf{1}\{|z|\le\sqrt d\},$

which coincides with the Brown measure of a sum of d free Haar unitaries. Note that from this one easily concludes the existence of a sequence d = d(n) so that the esd of Û_n^{d(n)}/√(d(n)) converges to the circular law.
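The displayed density of µ_d was lost in this excerpt; we have restated it as we believe it appears in [6] (the Haagerup–Larsen form of the Brown measure of a sum of d free Haar unitaries), and as a consistency check one can verify in one line that it is indeed a probability density:

```latex
\int_{|z|\le\sqrt d}\frac{d^2(d-1)}{\pi\,(d^2-|z|^2)^2}\,dm(z)
 = d^2(d-1)\int_0^{\sqrt d}\frac{2r\,dr}{(d^2-r^2)^2}
 = d^2(d-1)\left[\frac{1}{d^2-r^2}\right]_0^{\sqrt d}
 = d^2(d-1)\left(\frac{1}{d^2-d}-\frac{1}{d^2}\right) = 1.
```

Note also that rescaling by √d sends the support {|z| ≤ √d} to the unit disk, consistent with the circular law limit when d → ∞.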
We finally get to our model: it sits at the intersection of sparse models of regular directed (multi)graphs and the sums of unitaries treated in [6]. Indeed, from the point of view of the latter we replace unitary matrices which are Haar-distributed on the full unitary group by unitaries which are Haar-distributed on the subgroup of permutation matrices. In this case a formal application of Girko's method leads one to expect convergence to µ_d (if d is fixed, see e.g. [12]) or to the circular law when d = ω(1) (after rescaling by √d). The goal of this paper is to establish that the latter indeed holds, at least when d does not grow too slowly or too rapidly.

Remark 1.5. Our methods are not sharp enough to handle the case of constant d, both for the reasons mentioned in Remark 1.3 and the fact that the loop equations for fixed d are much more complicated. See however the recent work [8] for progress in this direction for random d-regular graphs of sufficiently large fixed degree.
We end this section by pointing out that for fixed d, the random regular digraph model considered in [17] is contiguous with the sum-of-permutations model conditioned to have no parallel edges (i.e. with the matrix conditioned to have no entries larger than 1, an event which occurs with positive probability) [31, 27]. However, we are unaware of any quantitative contiguity results that allow d to grow with n. Given such a result (allowing d to grow faster than log^{12} n) it could be possible to deduce the main result of [17] from Theorem 1.1, for some range of d; however, this would require a quantitative version of Theorem 1.1 with failure probability smaller than the probability for the sum of permutations to yield a 0–1 matrix, which is of order exp(−cd^2).

1.2. Outline of the paper. In Section 2 we provide a brief outline of the proof techniques of Theorem 1.1. We begin Section 2 with a short description of Girko's method, which in a nutshell consists of focusing attention on the logarithmic potential of the esd of S_n^d/√d. This is done by analyzing the Hermitian matrix

$T_n(z) := (zI_n - S_n^d/\sqrt d)(zI_n - S_n^d/\sqrt d)^*$

with z ∈ C (hereafter, for any n × n matrix B_n and z ∈ C, for brevity, we often write z − B_n to denote zI_n − B_n). To implement Girko's method one requires good control on the smallest singular value of T_n(z) as well as on its smallish singular values. The required control on the smallest singular value is derived in Theorem 2.2, and an outline of its proof can be found in Section 2.2. The desired control on the smallish singular values is obtained in Theorem 2.6 by controlling the difference of the Stieltjes transform of the esd of T_n^{1/2}(z) at the finite n level and at the putative limit, all the way up to (almost) the real line. An outline of the proof of Theorem 2.6 is given in Section 2.3.
For Theorem 2.2, to control the smallest singular value of a matrix A_n we need to control the infimum of ‖A_n u‖_2 over all u in the unit sphere. To this end, we break the sphere into the set of "flat" vectors and its complement, where a vector is said to be flat if it is close in ℓ^2 norm to a vector with a large number of equal components (for a precise formulation see Definition 2.4). The infimum over flat vectors is taken care of in Section 3, and the infimum over the remaining vectors is treated in Section 4.
Section 5 and Section 6 are devoted to controlling certain traces of polynomials in S_n^d and to deriving concentration inequalities for Lipschitz functions of sums of permutations, respectively. We then turn to the control of the Stieltjes transform of the esd of T_n^{1/2}(z). In Section 7.1 we show that the Stieltjes transform satisfies an (approximate) fixed point equation, first in expectation and then, using the concentration results of Section 6, also with high probability. In Section 7.2 we then finish the proof of Theorem 2.6 using the stability of the fixed point equation, an a priori lower bound on the Stieltjes transform of the esd of T_n^{1/2}(z) far away from the real line, and a bootstrap argument. Finally, in Section 8, combining Theorem 2.2 and Theorem 2.6, and using a replacement principle (see Lemma 8.1), we finish the proof of Theorem 1.1.
1.3. Notational conventions. We write C^J for the subspace of vectors in C^n supported on J ⊂ [n], and write B_J, S_J for the closed Euclidean unit ball and sphere in this subspace. If J = [n], we write B^n, S^{n−1} for brevity. Given v ∈ C^n and J ⊂ [n], v_J denotes the projection of v to C^J. 1 = 1_n denotes the n-dimensional vector with all components equal to one, and consequently 1_J denotes the vector with jth component equal to 1 for j ∈ J and 0 otherwise. For x, y ∈ R we sometimes write x ∧ y to mean min(x, y).

Preliminaries and proof outline
2.1. Proof overview. In this section we provide an outline of the proof of Theorem 1.1. As we go along we introduce necessary definitions and notation.
The standard technique to analyze the asymptotics of the esd of a non-normal matrix is Girko's method [24]. The basis of this method is the following identity, which is a consequence of Green's theorem: for any polynomial P(z) = ∏_{i=1}^n (z − λ_i) and any test function ψ ∈ C_c^2(C),

$\sum_{i=1}^n \psi(\lambda_i) = \frac{1}{2\pi}\int_{\mathbb C} \Delta\psi(z)\,\log|P(z)|\,dm(z),$

where m is the Lebesgue measure on C and ∆ denotes the two-dimensional Laplacian. Applying this identity with the characteristic polynomial of M_n gives

$\int \psi\, dL_{M_n} = \frac{1}{2\pi n}\int_{\mathbb C} \Delta\psi(z)\,\log|\det(zI_n - M_n)|\,dm(z).$

Next, associate with any n-dimensional non-Hermitian matrix M_n and every z ∈ C the 2n-dimensional Hermitian matrix

$M_n^z := \begin{pmatrix} 0 & zI_n - M_n \\ (zI_n - M_n)^* & 0 \end{pmatrix}.$

The eigenvalues of M_n^z are merely ±1 times the singular values of zI_n − M_n. Therefore, denoting by ν_n^z the esd of M_n^z, we have that

$\frac1n \log|\det(zI_n - M_n)| = \langle \mathrm{Log}, \nu_n^z\rangle,$

where for any probability measure µ on R, ⟨Log, µ⟩ := ∫_R log|x| dµ(x). Therefore we have the following key identity:

$\int \psi\, dL_{M_n} = \frac{1}{2\pi}\int_{\mathbb C} \Delta\psi(z)\,\langle \mathrm{Log}, \nu_n^z\rangle\,dm(z).$   (2.3)

The utility of Eqn. (2.3) lies in the following general recipe for proving convergence of L_{M_n} for a given family of non-Hermitian random matrices {M_n}:

Step 1: Show that for (Lebesgue almost) every z ∈ C, as n → ∞, the measures ν_n^z converge weakly, in probability, to some measure ν^z.
Step 2: Justify that ⟨Log, ν_n^z⟩ → ⟨Log, ν^z⟩ in probability.
Step 3: A uniform integrability argument allows one to convert the z-a.e. convergence of ⟨Log, ν_n^z⟩ to the convergence of ∫_C ∆ψ(z)⟨Log, ν_n^z⟩ dm(z), for a suitable collection S ⊆ C_c^2(C) of (smooth) test functions ψ. Consequently, it then follows from (2.3) that for each fixed, non-random ψ ∈ S,

$\int \psi\, dL_{M_n} \to \frac{1}{2\pi}\int_{\mathbb C} \Delta\psi(z)\,\langle \mathrm{Log}, \nu^z\rangle\,dm(z),$

in probability.
Step 4: Upon checking that f(z) := ⟨Log, ν^z⟩ is smooth enough to justify the integration by parts, one has that for each fixed, non-random ψ ∈ S,

$\int \psi\, dL_{M_n} \to \frac{1}{2\pi}\int_{\mathbb C} \psi(z)\,\Delta f(z)\,dm(z),$

in probability. For S large enough, this implies the weak convergence of the esds L_{M_n} to a limit which has the density (1/2π)∆f with respect to Lebesgue measure on C, in probability.
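The hermitization identity underlying the recipe above is easy to check numerically. The following sketch (illustration only; here M_n is an arbitrary test matrix, not the model of this paper) builds the 2n × 2n matrix M_n^z with blocks zI − M_n and its conjugate transpose off the diagonal, and verifies both that its spectrum is ± the singular values of zI − M_n and that ⟨Log, ν_n^z⟩ equals (1/n) log|det(zI − M_n)|.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
z = 0.4 + 0.3j

# A toy non-normal matrix; any square matrix works for this identity.
M = rng.standard_normal((n, n)) / np.sqrt(n)

# Hermitization: M_n^z = [[0, zI - M], [(zI - M)^*, 0]].
A = z * np.eye(n) - M
Mz = np.block([[np.zeros((n, n)), A], [A.conj().T, np.zeros((n, n))]])

herm_eigs = np.sort(np.linalg.eigvalsh(Mz))
svals = np.sort(np.linalg.svd(A, compute_uv=False))

# Eigenvalues of M_n^z are +/- the singular values of zI - M.
sym_check = np.allclose(herm_eigs[n:], svals)

# <Log, nu_n^z> = (1/2n) sum log|eig(M_n^z)| = (1/n) log|det(zI - M)|.
lhs = np.mean(np.log(np.abs(herm_eigs)))
rhs = np.log(np.abs(np.linalg.det(A))) / n
log_check = abs(lhs - rhs) < 1e-8
```

This is exactly why controlling singular values of zI − M_n (a Hermitian problem) gives access to the log-determinant, and hence, via (2.3), to the esd of the non-normal M_n.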
To prove Theorem 1.1 our plan is to establish Steps 1–4 for M_n = S_n^d/√d. As has been the case for other models of random matrices, Step 2 is the most challenging part. Since ν_n^z is the esd of a Hermitian matrix, one can use tools such as the method of moments or the Stieltjes transform to deduce Step 1. However, log(•) being unbounded both near zero and near infinity, the conclusion of Step 1 is not enough to establish Step 2; one needs additional control on the large as well as the small singular values of z − S_n^d/√d. To this end, we first note that the limit of the esd of S_n^d/√d, the circular law, is compactly supported. Therefore one can actually check that establishing Steps 1–4 for z in a large ball in the complex plane is enough to complete the proof of Theorem 1.1.
Next note that each row-sum and column-sum of S_n^d is d, and hence the maximal singular value of z − S_n^d/√d is at most |z| + √d for any z in a large ball. One can also easily show that the trace of S_n^d (S_n^d)^*/nd is bounded with high probability (see Section 5), which can be used to show that ν_n^z integrates x^2, and hence log(x), near infinity.
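The trace bound just mentioned is cheap to check by simulation: Tr S_n^d (S_n^d)^*/(nd) equals the average squared singular value of S_n^d/√d, and it stays close to 1. A minimal sketch (the values of n and d are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 400, 12

# Sample S_n^d as a sum of d uniform permutation matrices.
S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0

# Tr S S^T / (nd): the mean squared singular value of S / sqrt(d).
ratio = np.trace(S @ S.T) / (n * d)
```

Heuristically, each row of S_n^d places d unit entries nearly uniformly among n columns, so the squared row norm is d plus an O(d^2/n) collision term, giving a ratio of roughly 1 + (d − 1)/n.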
Most of this paper is devoted to obtaining bounds on the small singular values of S_n^d/√d − z. First, one needs a lower bound on the smallest singular value. This is derived in Theorem 2.1. The idea behind the proof of Theorem 2.1 is outlined in Section 2.2.

Next we need to show that there are not too many singular values near zero. Equivalently, we need to show that the total mass of a small interval I around zero under the esd of M_n^z is not too large. That mass can be estimated by obtaining bounds on the Stieltjes transform of the esd at a distance from the real line which is commensurate with the length of I (for example, see Lemma 8.3). In Section 2.3 we provide an outline of how to achieve the desired bounds on the Stieltjes transform of M_n^z (see Theorem 2.6).
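The way a Stieltjes-transform bound controls the mass near zero (as in Lemma 8.3) is elementary: since Im m(iη) = ∫ η/(x² + η²) dν(x) ≥ ν([−η, η])/(2η), one always has ν([−η, η]) ≤ 2η Im m(iη). A numerical sanity check of this deterministic inequality on a generic Hermitian matrix (illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Any Hermitian matrix works; take a Wigner-type example.
H = rng.standard_normal((n, n))
H = (H + H.T) / np.sqrt(2 * n)

eigs = np.linalg.eigvalsh(H)
eta = 0.05

# Stieltjes transform of the esd at i*eta.
m = np.mean(1.0 / (eigs - 1j * eta))

# Mass of [-eta, eta] under the esd.
mass = np.mean(np.abs(eigs) <= eta)

# nu([-eta, eta]) <= 2*eta*Im m(i*eta), since each eigenvalue in the
# interval contributes at least 1/(2*eta*n) to Im m.
bound_holds = mass <= 2 * eta * m.imag
```

In the paper's setting ν is ν_n^z, so a lower bound on the distance of ξ = iη from the real line at which m_n(ξ) can be controlled dictates how short an interval around zero can be handled.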
where s n (•) denotes the smallest singular value.
We deduce Theorem 2.1 from the following more general result, Theorem 2.2, which concerns the smallest singular value of S_n^d + Z_n for a fixed perturbation matrix Z_n. First we introduce some notation for an n × n matrix M_n.

Remark 2.3. After modifying γ_0 slightly, we see that it is enough to prove Theorem 2.2 under the additional assumption that d ≤ n.
On a high level, the proof of Theorem 2.2 follows the general strategy of the recent work [17] of the second author, which establishes a similar result with S_n^d replaced by a uniform random 0–1 matrix constrained to have all row and column sums equal to d. We now motivate some of the main ideas of this strategy.
From the definition of the smallest singular value we have

$s_n(S_n^d + Z_n) = \inf_{u \in S^{n-1}} \|(S_n^d + Z_n)u\|_2.$   (2.9)

We note that 1 is an eigenvector of (S_n^d + Z_n)^*(S_n^d + Z_n) with eigenvalue |d + ζ|^2. A short argument then shows that to obtain (2.8) it suffices to control the infimum of ‖(S_n^d + Z_n)u‖_2 for u ∈ S^{n−1} ∩ 1^⊥. Denoting the rows of S_n^d + Z_n by R_1, …, R_n, we have

$\|(S_n^d + Z_n)u\|_2^2 = \sum_{i=1}^n |R_i \cdot u|^2.$   (2.10)

Thus, for a fixed vector u ∈ S^{n−1} ∩ 1^⊥, the task of controlling the probability that (S_n^d + Z_n)u concentrates near the origin will involve bounding the probability that the scalar random variables R_i • u concentrate near zero.
First we briefly review the argument from [36] for the case where S_n^d is replaced by a matrix X_n with i.i.d. centered entries ξ_ij of unit variance. In this case we have

$R_i \cdot u = \sum_{j=1}^n \xi_{ij} u_j + w,$

where w ∈ C is a deterministic quantity involving the entries of u and Z_n. Then we can bound P(|R_i • u| ≤ t) for small t > 0 using standard anti-concentration estimates. For instance, we have the following Berry–Esséen-type bound (see Lemma 4.8): for fixed nonzero v ∈ C^n and any r ≥ 0,

$\sup_{w \in \mathbb C}\, \mathbb P\Big(\Big|\sum_j \xi_j v_j - w\Big| \le r\Big) = O\Big(\frac{r}{\|v\|_2} + \frac{\|v\|_3^3}{\|v\|_2^3}\Big).$   (2.11)

For this bound to be effective when applied to u, we need u to be "spread" in the sense that there is a set J ⊂ [n] with |J| ≥ cn such that |u_j| ∼ 1/√n for all j ∈ J. After conditioning on the variables ξ_ij with j ∉ J, (2.11) gives

$\mathbb P(|R_i \cdot u| \le t) = O(t + n^{-1/2}).$   (2.12)

For m ∈ [n] and ρ > 0, the set of (m, ρ)-compressible unit vectors is defined to be the ρ-neighborhood of the set of m-sparse vectors in the sphere:

$\mathrm{Comp}(m, \rho) := \{u \in S^{n-1} : \exists\, v \in \mathrm{Sparse}(m),\ \|u - v\|_2 \le \rho\},$   (2.13)

where Sparse(m) denotes the set of vectors supported on at most m coordinates. For m ≥ cn and ρ of constant order, one can show that incompressible vectors u ∈ S^{n−1} \ Comp(m, ρ) are spread in the above sense.
Thus, (2.12) is effective for incompressible vectors. While we only have a crude anti-concentration bound for compressible vectors, the bound can be tensorized to give a lower bound on ‖(X_n + Z_n)u‖_2, holding with probability exponentially close to one, for any fixed compressible vector u. Then, from the fact that Comp(m, ρ) has low metric entropy (i.e. it can be covered by a relatively small number of small balls), one can apply the union bound over a suitable net to lower bound inf_{u ∈ Comp(c_1 n, c_2)} ‖(X_n + Z_n)u‖_2 with high probability, if c_1, c_2 are sufficiently small constants.

After obtaining uniform control on ‖(X_n + Z_n)u‖_2 for u ∈ Comp(c_1 n, c_2), an averaging argument shows that in order to obtain an estimate of the form (2.8), it suffices to get a bound on P(|R_i • u| ≤ t) for an arbitrary fixed row R_i and u ∈ S^{n−1} \ Comp(c_1 n, c_2). But this now follows from (2.12). See [36] for the detailed presentation of this argument.

The distribution of S_n^d necessitates a somewhat modified approach, and in particular a different notion of structure than compressibility. In order to make use of the anti-concentration estimate (2.11) we will consider pairs of rows R_{i_1}, R_{i_2}. For each ℓ ∈ [d], conditioning on the remaining n − 2 rows of P_n^ℓ fixes π_n^ℓ({i_1, i_2}). It follows that the i_1-st row of P_n^ℓ is e_j, where j is drawn uniformly from π_n^ℓ({i_1, i_2}) and e_k denotes the k-th standard basis vector. Since the matrices {P_n^ℓ}_{ℓ∈[d]} are independent, it is then possible to express

$R_{i_1} \cdot u = \frac12 \sum_{\ell \in [d]} \xi_\ell\, \big(u_{\pi_n^\ell(i_1)} - u_{\pi_n^\ell(i_2)}\big) + w,$   (2.14)

where {ξ_ℓ}_{ℓ∈[d]} are i.i.d. Rademacher variables and w ∈ C is some quantity that is deterministic under conditioning on the rows [n] \ {i_1, i_2} of all of the matrices {P_n^ℓ}_{ℓ∈[d]}. By the discussion under (2.10), we can then get a bound on P(|R_{i_1} • u| ≤ t) for small t > 0 via the Berry–Esséen-type bound (2.11), which will be effective when the vector of differences (u_{π_n^ℓ(i_1)} − u_{π_n^ℓ(i_2)})_{ℓ∈[d]} is spread. This motivates the following:

Definition 2.4. For m ∈ [n] and ρ ∈ (0, 1), define the set of (m, ρ)-flat vectors

$\mathrm{Flat}(m, \rho) := \{u \in S^{n-1} : \exists\, \lambda \in \mathbb C,\ v \in \mathrm{Sparse}(m),\ \|u - (\lambda \mathbf 1 + v)\|_2 \le \rho\}$

(where the set Sparse(m) was defined in (2.13)). We denote the mean-zero flat vectors by Flat_0(m, ρ) := Flat(m, ρ) ∩ 1^⊥. For non-integral x ≥ 0 we will sometimes abuse notation and write Sparse(x), Flat(x, ρ), etc. to mean Sparse(⌊x⌋), Flat(⌊x⌋, ρ).
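The role of spread coefficient vectors in Berry–Esséen-type bounds such as (2.11) can be seen in a toy Monte Carlo experiment (illustration only, not tied to the specific constants of Lemma 4.8): a Rademacher walk with spread coefficients rarely lands near any fixed point, while one driven by a 2-sparse coefficient vector returns to 0 with probability 1/2.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, r = 100, 20000, 0.1

# i.i.d. Rademacher variables, one row per trial.
xi = rng.choice([-1.0, 1.0], size=(trials, n))

v_spread = np.full(n, 1.0 / np.sqrt(n))  # "spread": |v_j| ~ 1/sqrt(n) everywhere
v_sparse = np.zeros(n)
v_sparse[:2] = 1.0 / np.sqrt(2)          # 2-sparse unit vector

# Estimated small-ball probabilities P(|sum_j xi_j v_j| <= r).
p_spread = np.mean(np.abs(xi @ v_spread) <= r)
p_sparse = np.mean(np.abs(xi @ v_sparse) <= r)
```

For the spread vector the walk is approximately Gaussian, so the small-ball probability is of order r + n^{−1/2}; for the sparse vector the two steps cancel exactly half the time. This is precisely why the differences u_{π^ℓ(i_1)} − u_{π^ℓ(i_2)} in (2.14) must be large for many ℓ, i.e. why non-flat vectors are needed.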
Our first task is to get a lower bound on inf_{u ∈ Flat_0(m,ρ)} ‖(S_n^d + Z_n)u‖_2 holding with high probability for a suitable choice of m, ρ, which we obtain in Proposition 2.5 below. For a parameter K ≥ 1 define the boundedness event B_K (recall our notation (2.7)). We will eventually take K = n^{γ_0} for an arbitrary fixed γ_0 ≥ 1 (cf. Section 4.4). For m ∈ [n] and ρ ∈ (0, 1) (possibly depending on n), define the event (2.18).

Proposition 2.5 (Invertibility over flat vectors). There exist absolute constants C_{2.5}, c_{2.5}, c'_{2.5} > 0 such that the following holds for any γ ≥ 1 and all n sufficiently large depending on γ.
Section 3 is devoted to the proof of Proposition 2.5, and we defer discussion of the proof ideas to that section.
The remainder of the proof of Theorem 2.2 is given in Section 4. Having obtained control on flat vectors, our aim will then be to reduce the problem to obtaining an anti-concentration estimate on R_{i_1} • u, which we express as in (2.14), for a fixed row R_{i_1} and fixed u ∈ S^{n−1} ∩ 1^⊥ ∩ Flat(m, ρ)^c. (Actually we will consider dot products of the form (R_{i_1} − R_{i_2}) • u, but these can also be expressed in the form (2.14).) As in the i.i.d. setting discussed above, this can be accomplished by an averaging argument, but the argument here is more delicate due to the dependencies among the entries of S_n^d. We adapt an approach used in [29] for the invertibility problem for random regular digraphs. The vector u must be chosen to be almost orthogonal to the span of the rows {R_i : i ∉ {i_1, i_2}}, and we want to ensure that the differences u_{π_n^ℓ(i_1)} − u_{π_n^ℓ(i_2)} are large for a large number of ℓ ∈ [d]. If the indices π_n^ℓ(i_1), π_n^ℓ(i_2) were independent of u, then it would be relatively easy to show that, because u is non-flat, a random choice of i_1, i_2 will give us a large number of large differences, on average. However, since both u and π_n^ℓ(i_1), π_n^ℓ(i_2) are fixed by the conditioning on the remaining rows, the argument requires some care. See Lemma 4.4 for the details. Having reduced to consideration of a random walk of the form (2.14) with a large number of large differences u_{π_n^ℓ(i_1)} − u_{π_n^ℓ(i_2)}, we can conclude using the Berry–Esséen-type bound (2.11); this is done in Lemma 4.6. In Section 4.4 we combine all of these elements to complete the proof of Theorem 2.2.

2.3. Control on the Stieltjes transform. We begin this section by fixing some notation. Denote C^+ := {ξ ∈ C : Im ξ > 0}. Fixing any z ∈ B_C(0, R), for some R > 0, and ξ ∈ C^+, we define the Green function G_n(•) as the resolvent associated with z − S_n^d/√d. Instead of working with the Green function G_n(•), we will see that it is easier to work with its symmetrized version

$G(S_n^d, \xi, z) := \big(H_n(z) - \xi I_{2n}\big)^{-1}, \qquad H_n(z) := \begin{pmatrix} 0 & z - S_n^d/\sqrt d \\ (z - S_n^d/\sqrt d)^* & 0 \end{pmatrix}.$   (2.20)

We next define the Stieltjes transform of the symmetrized empirical measure of the singular values of z − S_n^d/√d,

$m_n(\xi) := \frac{1}{2n} \operatorname{Tr} G(S_n^d, \xi, z).$

Recall that the eigenvalues of the matrix H_n(z) are ±1 times the singular values of z − S_n^d/√d (cf. (2.22)). Our goal is to show that m_n converges to a limit m_∞ which is the Stieltjes transform of a probability measure on R and satisfies the equation (2.23). As explained above, we need a bit more: we need to control the difference m_n(ξ) − m_∞(ξ). The proof of Theorem 1.1 only requires such control for ξ purely imaginary. This is achieved in Theorem 2.6 below.
Theorem 2.6. Fix any sufficiently small ε > 0 and z ∈ B_C(0, 1 − ε), and take any sequence of reals ̟ = ̟(n) as above. Then there exist a constant C_{2.6}, depending only on ε, absolute constants c_{2.6}, C̃_{2.6}, C̄_{2.6}, and an event Ω_n, with P(Ω_n) → 1 as n → ∞, such that for all large n, on the event Ω_n, we have the stated control on |m_n(ξ) − m_∞(ξ)|.

Remark 2.7. In Theorem 2.6 we treat the case when ξ is purely imaginary, which simplifies some of the computations. One can use a similar idea as in the proof of Theorem 2.6 to control the difference of m_n(ξ) and m_∞(ξ) for all ξ ∈ C^+ with Im ξ ≥ (log n)^{−C} for some C > 0. The key is to establish stability of the equation (2.23) for all ξ ∈ C^+. Since the proof of Theorem 1.1 does not require such control we do not attempt it here.
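The self-consistent equation (2.23) is not reproduced in this excerpt; for orientation, the sketch below assumes it takes the standard form known for the hermitization of circular-law models, m(|z|² − (ξ + m)²) = ξ + m, which is cubic in m (consistent with Section 2.3's remark that P is a cubic polynomial) and reduces to the semicircle equation at z = 0. Under that assumption, the code compares the root in C⁺ with the empirical m_n(ξ) for a sampled S_n^d; all parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 400, 25
z, xi = 0.3 + 0.0j, 0.6j   # spectral parameters (xi in the upper half-plane)

# Empirical symmetrized Stieltjes transform m_n(xi) for z - S_n^d/sqrt(d).
S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0
svals = np.linalg.svd(z * np.eye(n) - S / np.sqrt(d), compute_uv=False)
sym = np.concatenate([svals, -svals])      # eigenvalues of the hermitization
m_emp = np.mean(1.0 / (sym - xi))

# Assumed limiting equation m*(|z|^2 - (xi+m)^2) = xi + m, rewritten as
#   -m^3 - 2*xi*m^2 + (|z|^2 - xi^2 - 1)*m - xi = 0.
coeffs = [-1.0, -2 * xi, abs(z) ** 2 - xi ** 2 - 1.0, -xi]
roots = np.roots(coeffs)
m_lim = roots[np.argmax(roots.imag)]       # the root in the upper half-plane

# Sanity check: at z = 0, xi = i the equation reduces to the semicircle
# equation m^2 + xi*m + 1 = 0, whose C+ root is i*(sqrt(5)-1)/2.
sc_roots = np.roots([-1.0, -2j, 0.0, -1j])
m_sc = sc_roots[np.argmax(sc_roots.imag)]
```

The agreement between m_emp and m_lim at moderate n, d is only heuristic evidence for the assumed form; the paper's actual equation (2.23) should be consulted for the precise statement.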
The key to the proof of Theorem 2.6 is to establish that m_n(ξ) satisfies an approximate version of the equation (2.23). That is, we need to show that P(m_n(ξ)) ≈ 0, where P is the cubic polynomial appearing in (2.23). To show this, it is easier to work with $\bar m_n(\xi)$, the Stieltjes transform of the symmetrized version of the empirical measure of the singular values of $z - \bar S_n^d/\sqrt d$, where the entries of $\bar S_n^d$ are now centered. Then concentration bounds for Lipschitz functions of permutations under the Hamming metric allow us to consider only $P(\mathbb E\, \bar m_n(\xi))$.
To show that $P(\mathbb E\, \bar m_n(\xi)) \approx 0$ we start with a function related to $G(\bar S_n^d)$, where $G(\bar S_n^d)$ is defined by replacing $S_n^d$ with $\bar S_n^d$ in (2.20). Then we use the resolvent identity and the fact that {P_n^ℓ} are independent to identify the dominant and negligible terms. This yields an approximate equation involving $\mathbb E\, \bar m_n(\xi)$ and an auxiliary variable. To remove the auxiliary variable we derive another approximate equation.
However, this alone does not yield Theorem 2.6. Because P(•) is a cubic polynomial, bounds on $P(\bar m_n(\xi))$ do not immediately translate into bounds on $\bar m_n(\xi)$ itself. Moreover, the bound on $P(\bar m_n(\xi))$ depends implicitly on a bound on $\bar m_n(\xi)$ (see Lemma 7.1). To overcome this difficulty, in Lemma 7.6 we show that if $\bar m_n(\xi)$ is bounded below, then a bound on P(•) can be translated into a bound on the difference between $\bar m_n(\xi)$ and $\bar m_\infty(\xi)$. On the other hand, we can easily show that the desired bounds on $\bar m_n(\xi)$ hold when ξ is far away from the real line. This gives Theorem 2.6 when ξ is away from the real line.
To propagate the above bound to all ξ ∈ S_{ε,̟} we use a bootstrap argument. In the random matrix literature the bootstrap argument has been used on many occasions to prove local laws for different random matrix ensembles. Specifically, Erdős, Schlein, and Yau [19] used it to prove the local semicircle law for Wigner matrices down to the optimal scale. Subsequently it was generalized to prove local laws for other ensembles of random matrices (see [9] and references therein).
To carry out the above scheme for ξ ∈ C^+ such that Im ξ is small, we note that by Lipschitz continuity and the boundedness property of m_∞(ξ) derived in Lemma 7.8, the bounds on m_n(ξ) translate to bounds on the same quantity with ξ replaced by ξ′, whenever |Im ξ − Im ξ′| is small. These bounds on m_n(ξ′), together with Lemma 7.1, yield the desired bound on |m_n(ξ′) − m_∞(ξ′)|. Repeating this scheme we obtain the desired result for all ξ ∈ S_{ε,̟}.
We note that in the work [17] on the spectrum of the adjacency matrix A_{n,d} of a random d-regular digraph, a completely different argument is used to obtain quantitative control on the Stieltjes transforms g_{ξ,z}(A_{n,d}) = (1/n) Tr G(A_{n,d}, ξ, z). There the approach is by comparison, first replacing A_{n,d} with an i.i.d. 0–1 Bernoulli matrix B_{n,p} with entries of mean p = d/n, and then replacing B_{n,p} with a suitably rescaled real Ginibre matrix G_n (for which the desired bounds are known to hold), showing that g_{ξ,z} changes by a negligible amount at each replacement. The comparison between g_{ξ,z}(B_{n,p}) and g_{ξ,z}(G_n) is done using the standard Lindeberg swapping argument, whose use in random matrix theory goes back to Chatterjee [14]. The comparison of g_{ξ,z}(A_{n,d}) with g_{ξ,z}(B_{n,p}) is done by conditioning, basically showing that g_{ξ,z}(B_{n,p}) concentrates near its expected value with failure probability smaller than the probability that B_{n,p} lies in 𝒜_{n,d}, the set of adjacency matrices of d-regular digraphs. Since A_{n,d} is uniform on 𝒜_{n,d}, obtaining a lower bound for the latter probability amounts to the enumerative problem of estimating the cardinality of 𝒜_{n,d}, which can be solved with known techniques. It is possible that this comparison approach could be adapted to the current setup, first replacing S_n^d with a discrete random matrix M_n^d having i.i.d. Poisson entries, and then replacing M_n^d with a Gaussian matrix. However, as S_n^d is not drawn uniformly from a set of matrices, the first step would not reduce to an enumeration problem as it did for A_{n,d}, and hence this step appears more challenging. Instead we would need a coupling between S_n^d and M_n^d, together with a lower bound on the probability that they are close in an appropriate norm. It is likely that a proof along these lines, even if doable, would be somewhat lengthier than the approach taken in the present article.

Invertibility over flat vectors
In this section we prove Proposition 2.5. Throughout this section and Section 4 we let S_n^d and Z_n be as in the statement of Theorem 2.2, except that some lemmas and propositions are stated under additional assumptions on the range of d. (Recall from Remark 2.3 that we are free to assume d ≤ n; also note that Theorem 2.2 trivially holds for d ≤ log^8 n.) The general approach is similar to the proof in [17], and indeed we make use of two lemmas from that work (Lemma 3.5 and Lemma 3.6). However, the differences between the distribution of S_n^d and the adjacency matrix A_{n,d} of a uniform random regular digraph cause the proof here to differ in most of the particulars. We have attempted to structure the proof in roughly the same way as in [17], and use Lemma 3.1 to encapsulate the parts of the proof which are most different from that work. On a technical level, the proof here is somewhat simpler, as the joint independence of the permutations π_n^ℓ allows us to avoid the difficult coupling constructions of [17], as well as the use of heavy-powered graph discrepancy results.

3.1. Anti-concentration for the image of a fixed vector. To lighten notation we will drop subscripts n from π_n, π_n^ℓ in this section. We begin by obtaining lower tail bounds for the norm of (S_n^d + Z_n)u for a fixed vector u ∈ S^{n−1}.

Lemma 3.1 (Image of a fixed vector). There exist absolute constants c_{3.1}, c'_{3.1} > 0 such that the following holds. Let d ≥ 1, and let u ∈ C^n be such that there are disjoint sets J_1, J_2 ⊂ [n] satisfying (3.1). Then (3.2) holds.

Remark 3.2. We note that (3.2) is essentially optimal when md is small compared with n, at least for the case Z_n = 0 (and we are aiming for estimates that are uniform in Z_n). Indeed, ‖S_n^d u‖_2^2 = Σ_i |R_i • u|^2, where R_i is the ith row of S_n^d. When md = o(n) the number of "good" rows R_i whose support overlaps the support of u will be roughly md on average (in fact it concentrates near md, as will be shown in the proof). For each good row R_i we will have E|R_i • u|^2 ≈ 1/n, since the overlap of supports is of order 1 on average, and coordinates u_j are typically of size 1/√n. This means we should expect ‖S_n^d u‖_2^2 ≈ md/n, and (3.2) gives a lower bound at this scale. However, the bound is suboptimal when m ≈ n, in which case there will be roughly ≍ n good rows with overlaps of order d, which suggests E‖S_n^d u‖_2^2 ≈ d in this case. Thus, we expect (3.2) to hold with min(√(md/n), 1) replaced with √(md/n).
The above lemma is a quick consequence of Lemma 3.3 below. First we need some notation. We write J := J_1 ∪ J_2, and for each k ∈ [d] we set and Using the pointwise bound

Proof of Lemma 3.3. Fix u as in the statement of the lemma. To lighten notation we drop the dependence on u from X_k(u), W_k(u) and write X_k, W_k.
First we note that for any ℓ ∈ [d], j_1 ∈ J_1, and j_2 ∈ J_2, we have Indeed, fixing ((π^ℓ)^{−1}(j))_{j∉{j_1,j_2}}, we see that for any i ∈ (π^ℓ)^{−1}({j_1, j_2}), either π^ℓ(i) = j_1 or π^ℓ(i) = j_2, each with equal probability. Thus, under the conditioning in (3.6) we have, with equal probability, where ∆_i is some non-random quantity depending on π^{(ℓ)}, ((π^ℓ)^{−1}(j))_{j∉{j_1,j_2}}. Using the assumption (3.1) and the triangle inequality, we immediately deduce (3.6). Now using (3.6), ) + e^{−1/16} =: 1 − q. (3.7) Now we establish the claim for the case m = 1, in which case J = {j_1, j_2}, and for each k (recall our assumption d ≤ n from Remark 2.3). Thus we have that B := Σ_{k=2}^{d_1} I(B_k) is stochastically dominated by a sum of i.i.d. indicator variables with expectation O(c_3). From the Chernoff bound it thus follows that (3.9) holds, taking c_3 sufficiently small. Let us denote the complement of this event by G. On G, there exists a set G ⊂ [d_1] with |G| ≥ d_1/2 such that the sets {I_k}_{k∈G} are pairwise disjoint. We take G to be the largest such set (in the event of a tie we pick one in some measurable fashion). We have, for some constant c′ > 0, establishing the lemma for the case m = 1. Now assume m ≥ 2. In fact we are now free to assume m ≥ C_0 for some absolute constant C_0 > 0 to be specified later. Indeed, for m ≤ C_0 we can simply pass to singleton subsets of J_1, J_2 and apply the case m = 1 (adjusting the constant c_1).
We next show that for any fixed k ∈ [d], where (3.11) Note that the expectation in (3.10) is only taken over part of the randomness of the permutation π^k. The idea of the proof is that after some further conditioning we can reduce to using only the randomness of π^k on M_k pairwise disjoint sets T_1, …, T_{M_k} ⊂ I_k \ U^{(k)} of size two, and the action of π^k on these sets can be realized as the application of M_k independent transpositions. Thus, we can extract a subsequence of M_k rows R_{i_j} that are jointly independent under the conditioning, and apply the bound (3.7) to each one.
We turn to the details. Fix k ∈ [d] and write Î_k^a := I_k^a \ U^{(k)} for a = 1, 2. For given m_0 ∈ N and U ⊂ [n], let T(m_0, U) be the collection of all sequences T := (T_j)_{j=1}^{m_0} of pairwise disjoint 2-sets (Since π^{−1}(J_1), π^{−1}(J_2) are disjoint, this is the event that they bisect each of the sets T_j for 1 ≤ j ≤ m_0.) Conditional on π^{(k)} and M_k, for any where in the penultimate line we noted that, under the conditioning and restriction, the M_k rows are jointly independent, and in the last line we applied (3.7). Now, undoing the conditioning on M_k yields (3.10) as desired.
Define the decreasing sequence of sigma algebras and set F_d to be the trivial sigma algebra. In words, conditioning on F_k fixes the permutations π^{k+1}, …, π^d, along with the values π^ℓ(i) for ℓ ≤ k and all i in the preimages of where the penultimate equality follows upon noting that and applying the tower property of conditional expectation, and in the last step we have used that M_k is F_{k−1}-measurable. Iterating this bound over 2 ≤ k ≤ d_0 and combining with (3.13), we obtain Thus, where Next we will show that for any L ⊂ [d_0], for some absolute constant c > 0. Assuming (3.15), we have from (3.14) that where the last inequality is obtained by taking the constant C_0 > 0 sufficiently large and thus m ≥ C_0. This yields (3.5) and hence Lemma 3.3. It only remains to establish (3.15). Since the variables M_ℓ are exchangeable we may take On the other hand, since the sets I_ℓ are independent and uniformly distributed over the 2m-subsets of [n], we have for each ℓ ≤ k, where we took the constant c_0 sufficiently small. Hence, We have thus shown ). The latter probability can be shown to be at most e^{−cmk} by an argument using stochastic domination and the Chernoff bound, similar to what was done in (3.8)–(3.9). This gives (3.15) and hence the claim.
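As a quick sanity check on the scales discussed in Remark 3.2 (the m ≈ n regime), one can simulate ‖S_n^d u‖_2 directly. The sketch below is illustrative only: it assumes Z_n = 0 and uses the identity (P_n^ℓ u)_i = u_{π^ℓ(i)}, which follows from P_n^ℓ(i, j) = I(π^ℓ(i) = j), so that S_n^d u can be computed without forming any n × n matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20

# Mean-zero "flat" unit vector: entries +-1/sqrt(n), half of each sign.
u = np.concatenate([np.ones(n // 2), -np.ones(n // 2)]) / np.sqrt(n)

# (P u)_i = u_{pi(i)} since P(i, j) = I(pi(i) = j), so S u is a sum of
# d coordinate-permuted copies of u; no n x n matrix is ever formed.
Su = sum(u[rng.permutation(n)] for _ in range(d))

norm_sq = float(np.dot(Su, Su))
ratio = norm_sq / d  # the m ~ n heuristic predicts E||S u||_2^2 = d here
```

For this particular u the expectation E‖S_n^d u‖_2^2 = d holds exactly (the cross terms between distinct permutations vanish in expectation), so the ratio concentrates near 1.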

3.2. Weak control on flat vectors.
In this subsection we establish the following lemma, which already implies Proposition 2.5 when d ≥ n/log n, but is weaker for smaller values of d. Recall the events E_K(m, ρ) from (2.18).

Lemma 3.4 (Invertibility over flat vectors, weak version).
There are absolute constants c_{3.4}, c′_{3.4}, c″_{3.4} > 0 such that the following holds. Let γ ≥ 1 and

We will need the following lemma from [17].

Lemma 3.5 ([17]). For every m ∈ [n] and ρ ∈ (0, 1), the set Flat_0(m, ρ) admits a ρ-net of cardinality at most (C_{3.5} n/(mρ^2))^m for some absolute constant C_{3.5} > 0.
Proof of Lemma 3.4. Our plan is to use Lemma 3.1 first to obtain a bound on ‖(S_n^d + Z_n)u‖_2 for an arbitrary but fixed u ∈ Flat(m_0, ρ_0), where ρ_0 := c/(K√m_0) for some c to be determined during the course of the proof. Then, using Lemma 3.5, we claim that the metric entropy of Flat(m_0, ρ_0) is small enough to allow us to take a union bound.
In order to apply Lemma 3.1 we need to find disjoint sets J_1 and J_2 such that |u_{j_1} − u_{j_2}| is large for every j_1 ∈ J_1 and j_2 ∈ J_2. To this end, consider an arbitrary vector u ∈ Flat_0(m_0, ρ_0). By definition, there exist λ ∈ C, v ∈ Sparse(m_0) and w ∈ ρ_0 B_C(0, 1) such that u = v + λn^{−1/2}1 + w. First we claim that ‖v + w‖_2 ≥ 1/2. (3.17) Indeed, by the triangle inequality, On the other hand, by the assumption u ∈ S^{n−1} ∩ 1^⊥ and an application of the Cauchy–Schwarz inequality, we get and so |λ| ≤ ‖v + w‖_2. Combined with (3.18) this gives (3.17).
It follows that there exists j_1 ∈ J with On the other hand, since Σ_{j∈J^c} |w_j|^2 ≤ ‖w‖_2^2 ≤ ρ_0^2, it follows from the pigeonhole principle that there exists j_2 ∈ J^c such that where we have used the fact that m_0 = o(n) and the definition of ρ_0. Now using the triangle inequality we have To complete the proof of the lemma we then apply Lemma 3.1 with n/m_0. Recalling that u ∈ Flat_0(m_0, ρ_0) was arbitrary, we conclude the bound sup_{u∈Flat_0(m_0,ρ_0)} where we also use the fact that d ≤ n. Now by Lemma 3.5 we may fix a ρ_0-net Σ_0(m_0, ρ_0) ⊂ Flat_0(m_0, ρ_0) for Flat_0(m_0, ρ_0) of cardinality at most (C_{3.5}n/(m_0ρ_0^2))^{m_0}. On the event E_K(m_0, ρ_0) we have ‖(S_n^d + Z_n)v‖_2 ≤ ρ_0K√d for some v ∈ Flat_0(m_0, ρ_0). Letting u ∈ Σ_0(m_0, ρ_0) be such that ‖u − v‖_2 ≤ ρ_0, by the triangle inequality we have where in the last step we have used the fact that E_K(m_0, ρ_0) ⊂ B(K). Thus, by the union bound, where in the last step we choose c′_{3.4} sufficiently small. This completes the proof of the lemma.
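The entropy-versus-probability bookkeeping behind this union bound can be made concrete with a few lines of arithmetic. The constants C and c below are illustrative stand-ins (not the paper's c′_{3.4}, C_{3.5}): the net has at most (C n/(m ρ²))^m points, each failing with probability at most e^{−cmd}, so the union bound succeeds once d exceeds a constant multiple of log(n/(mρ²)).

```python
import math

# Illustrative parameters (not the paper's constants): net cardinality
# bound (C*n/(m*rho**2))**m versus a per-point failure probability e^(-c*m*d).
C, c = 10.0, 0.01          # hypothetical absolute constants
n = 10**6
m = 100
rho = 1.0 / math.sqrt(m)

log_net_size = m * math.log(C * n / (m * rho ** 2))  # log of net cardinality
d = 200 * math.ceil(math.log(n))                     # d of order log n suffices here
log_fail = -c * m * d                                # log of one-point failure prob

# Union bound: total failure probability <= exp(log_net_size + log_fail).
log_union = log_net_size + log_fail
```

With these values log_union is negative, i.e. the per-point tail beats the metric entropy, which is the mechanism of the proof above.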
3.3. Proof of Proposition 2.5. In this subsection we upgrade the weak control on flat vectors obtained in Lemma 3.4 to obtain Proposition 2.5, by iterative application of Lemma 3.7 below. The idea is that once we have shown S_n^d + Z_n is well-invertible over Flat(m_0, ρ_0) for some small value of m_0 ∈ [n], we can exploit the improved anti-concentration properties of vectors in S^{n−1} \ Flat(m_0, ρ_0). (Here and in the sequel, by saying that a matrix A is well-invertible over a subset B we mean that with high probability a good lower bound on ‖Au‖_2 holds for all u ∈ B.) This allows us to beat the increased metric entropy cost for Flat(m_1, ρ_1) for some m_1 > m_0 that exceeds m_0 by a factor of (essentially) d, and some ρ_1 > 0 somewhat smaller than ρ_0. We can iterate this roughly log_d n times to obtain control on Flat(m, ρ) with m essentially of size n (up to logarithmic corrections). A similar iterative approach was used in the sparse i.i.d. setup in [25] (with the sets Flat(m_0, ρ_0) replaced by sets of vectors lying close to m_0-sparse vectors).
For deducing the improved anti-concentration properties as we increment the parameter m, we will need the following lemma from [17].

where c_{3.6} > 0 is some absolute constant.

Lemma 3.7 (Incrementing control on flat vectors).
There exist absolute constants c_{3.7}, c′_{3.7}, c″_{3.7} > 0 such that the following holds. Let γ ≥ 1, and let m′, ρ′ satisfy (3.24).

Proof. Let m_⋆, m′, ρ_⋆, ρ′ be as in the statement of the lemma (note that the lemma holds vacuously for d ≤ log^2 n by the assumptions (3.24)). Since the event E_K(m, ρ) is monotone in the parameters m, ρ, we may and will assume that the upper bounds in (3.24) hold with equality. First we will argue Indeed, consider an arbitrary fixed element By the assumed upper bound on m_⋆ we can apply Lemma 3.6 to obtain disjoint sets By deleting elements from J_1 and J_2 we may assume |J_1| = |J_2| = m_⋆. Now we apply Lemma 3.1 to obtain where we have used the fact that m_⋆d ≤ n. Since u was arbitrary, (3.26) follows.
As in the proof of Lemma 3.4, we conclude by an application of the union bound. Indeed, using Lemma 3.5 we fix a ρ′-net By similar reasoning as in the proof of Lemma 3.4, on the event taking c′_{3.7} sufficiently small, we also have that 2ρ′ Therefore, applying the union bound and (3.26), we deduce Since K ≤ n^γ, and ρ_⋆ and m′ satisfy (3.23) and (3.24) respectively, we further obtain that taking c_{3.7} sufficiently small completes the proof of the lemma.
Proof of Proposition 2.5. We may and will assume throughout that n is sufficiently large depending on γ. In the sequel, we will frequently use the observation that the events E_K(m, ρ) are monotone increasing in the parameters m and ρ.
For k ≥ 0, set where c_{2.5} := c′_{3.4} ∧ c′_{3.7}, and denote Note that m_k is an increasing sequence by our assumption d ≥ log^3 n. From Lemma 3.4 and the monotonicity of E_K(m, •), we have From the definitions of k_⋆ and m_k, and using the fact that d ≥ log^3 n, we see that for a sufficiently large constant C > 0. By the monotonicity of E_K(•, ρ), Thus, applying the union bound, (3.32) where we interpret the last sum as zero if k_⋆ = 0. From (3.31) we have for n sufficiently large. Thus, we can apply Lemma 3.7 with m_⋆ = n/d and ρ_⋆ = ρ_{k_⋆+1} to bound For the case that k_⋆ ≥ 1, since where c is a sufficiently small positive constant. From (3.31) we have ρ_{k_⋆+2} ≥ n^{−C′γ log_d n} for a sufficiently large constant C′ > 0. This completes the proof of the proposition.
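The step count of this iteration is easy to tabulate. The sketch below assumes, as described in Section 3.3, that each application of Lemma 3.7 increases m by a factor of essentially d; the values of n, d and the starting value m_0 are illustrative, not the paper's.

```python
import math

# Sketch of the iteration scheme: starting from a small m_0, each application
# of Lemma 3.7 multiplies m by (essentially) a factor of d, so reaching
# m ~ n/d takes on the order of log_d(n) steps.  All values are illustrative.
n = 10**8
d = 10**4
m = 16  # hypothetical starting value m_0

steps = 0
while m < n / d:
    m = m * d  # assumed increment: m_{k+1} = m_k * d
    steps += 1
```

For these values a single step already reaches m ≥ n/d, consistent with the bound of roughly log_d n iterations.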

Invertibility over non-flat vectors
Having shown that S_n^d + Z_n is well-invertible over vectors in Flat_0(m, ρ) with m essentially of size n (up to log factors), it remains to control the infimum of ‖(S_n^d + Z_n)u‖_2 over the non-flat vectors u ∈ S^{n−1} ∩ 1^⊥ ∩ Flat(m, ρ)^c. The metric entropy of the set of non-flat vectors is too large to take union bounds, so a different approach must be used to reduce to the consideration of (S_n^d + Z_n)u for a fixed vector u. We follow [36] in using an averaging argument, which in the setting of i.i.d. matrices reduces the problem to the consideration of a dot product R_i · u for a single row vector R_i and a unit vector u that is orthogonal to the span of the remaining rows (and hence may be treated as fixed).
In the present setting, in order to use random transpositions we must consider a fixed pair of rows R_{i_1}, R_{i_2} and the dot product (R_{i_1} − R_{i_2}) · u. Here u is a unit vector that is (almost) orthogonal to the remaining n − 2 rows as well as to R_{i_1} + R_{i_2}. The lack of independence between the rows makes the argument considerably more delicate than in [38]. In particular, the vectors R_{i_1}, R_{i_2} and u all depend on the rows {R_i : i ≠ i_1, i_2}, and we need to ensure that, after conditioning on these n − 2 rows, the vector u is not flat on the supports of R_{i_1} and R_{i_2}. To overcome this we will adapt an argument of Litvak et al. that was used to bound the singularity probability for adjacency matrices of random regular digraphs [29]. Specifically, we define "good overlap events" O_{i_1,i_2} on which we may select an appropriate (almost-)normal vector u that has "high variation" on the supports of R_{i_1}, R_{i_2}; see Definition 4.3. In Lemma 4.4 we show that, if we restrict to the events that (1) S_n^d + Z_n is well-invertible over flat vectors, and (2) S_n^d has no holes, in the sense that the nonzero entries are uniformly distributed in all sufficiently large submatrices, then the events O_{i_1,i_2} hold for a constant proportion of pairs i_1, i_2 ∈ [n]. Event (1) holds with high probability by Proposition 2.5, while the no-holes property (2) is shown to hold with high probability in Section 4.1. We can then restrict to O_{i_1,i_2} for some fixed i_1, i_2 by an averaging argument, at which point we can control the dot product (R_{i_1} − R_{i_2}) · u using a Berry–Esséen-type bound. As in the previous section, the arguments are similar to those in the work [17] for random regular digraphs, but differ in many particulars due to the different nature of the distribution of S_n^d.

4.1. The no-holes property. In the graph theory literature, a graph is said to enjoy a discrepancy property if the number of edges between all sufficiently large pairs of vertex sets U, V is roughly δ|U||V|, where δ is the density of the graph. In terms of the adjacency matrix, this says that all sufficiently large submatrices have roughly the same density. We will need a one-sided version of this property, called the no-holes property, to hold for S_n^d with high probability: namely, that all sufficiently large submatrices have density at least half of the expected value. In fact, we will need this property to hold for all matrices {S Combining this with the union bound, Since d ≤ n the result immediately follows.
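The no-holes property is cheap to test by simulation. The sketch below is illustrative (the set sizes and the threshold 1/2 are taken from the informal description above, not from the precise statement of the lemma): it samples S_n^d, draws random pairs of large index sets U, V, and checks that the number of edges from U to V is at least half of its expectation d|U||V|/n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 20
perms = [rng.permutation(n) for _ in range(d)]  # pi_ell; edge i -> pi_ell(i)

def edges(U_mask, V_mask):
    # e(U, V) = sum_ell #{i in U : pi_ell(i) in V}
    return sum(int(np.sum(V_mask[p[U_mask]])) for p in perms)

size = n // 4
ratios = []
for _ in range(20):
    U = np.zeros(n, dtype=bool); U[rng.choice(n, size, replace=False)] = True
    V = np.zeros(n, dtype=bool); V[rng.choice(n, size, replace=False)] = True
    expected = d * size * size / n
    ratios.append(edges(U, V) / expected)

min_ratio = min(ratios)
```

At this scale the edge counts concentrate tightly around their mean, so every sampled ratio comfortably exceeds 1/2.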
Remark 4.2. It is interesting to note that the dual property, that S_n^d has no dense patches with high probability, was a crucial ingredient in the work of Kahn–Szemerédi [22] on the mirror problem of proving an upper tail bound for the second largest singular value of S_n^d (i.e. the operator norm of the centered matrix S_n^d − (d/n)11^*).

4.2. Good overlap via an averaging argument. In this and the next subsection we make use of the following notation: for distinct i_1, i_2 ∈ [n] we denote that is, the sigma algebra of events generated by all but the i_1-th and i_2-th rows of each permutation matrix Here (S_n^d + Z_n)^{(i_1,i_2)} denotes the matrix obtained by removing rows i_1, i_2 from S_n^d + Z_n. We note that the event O_{i_1,i_2}(k, ρ, t) is F(i_1, i_2)-measurable. Indeed, conditioning on F(i_1, i_2) fixes (S_n^d + Z_n)^{(i_1,i_2)} as well as the pairs {π_n^ℓ(i_1), π_n^ℓ(i_2)}_{ℓ∈[d]}, and the latter determine the vector For each pair of distinct indices i_1, i_2 ∈ [n] we choose an F(i_1, i_2)-measurable random vector u^{(i_1,i_2)} ∈ S^{n−1} ∩ 1^⊥ and an F(i_1, i_2)-measurable random set L(i_1, i_2) ⊂ [d] which, on the event O_{i_1,i_2}(k, ρ, t), satisfy the stated properties (a)–(c) for u, L; off this event we define u^{(i_1,i_2)} and L(i_1, i_2) arbitrarily (but in an F(i_1, i_2)-measurable way).
For m ≥ 1 and ρ, t > 0 we define the "good" event that S_n^d + Z_n is well-invertible over mean-zero flat vectors: for some absolute constant c_{4.4} > 0.
Remark 4.5. The condition t ≤ |d + ζ|√n is needed in order to rule out the possibility that 1 is an approximate minimal singular vector of S_n^d + Z_n. This is best seen by choosing ζ = −d.

Proof of Lemma 4.4. Suppose the event on the left-hand side of (4.8) holds. Let u, v ∈ S^{n−1} be the respective eigenvectors of (S_n^d + Z_n)^*(S_n^d + Z_n) and (S_n^d + Z_n)(S_n^d + Z_n)^* associated to the eigenvalue s_{min}(S_n^d + Z_n)^2. By our assumptions on Z_n we have that 1 is also an eigenvector of these matrices, with eigenvalue |d + ζ|^2. Then since by assumption, it follows that u and 1 are associated to distinct eigenvalues of (S_n^d + Z_n)^*(S_n^d + Z_n) and hence u ⊥ 1; we similarly have that v ⊥ 1. We have thus located vectors u, v Furthermore, by the restriction to G(m, ρ, t) we have that u, v In the first stage of the proof, we show that there is a large number of "good" pairs (i_1, i_2). We begin with (2), counting pairs (i_1, i_2) that are "good" with respect to u. Since u ∈ S^{n−1} \ Flat(m, ρ), by Lemma 3.6 there exist disjoint sets

(4.11)
For i ∈ [n] and α ∈ {1, 2}, write We will use our restriction to the no-holes event D(c_{4.4}md/n, m/4) to show that I(u) is large. First, let a contradiction. Hence, (4.13) holds. Now for i_1 ∈ [n] let We claim that for any Indeed, suppose towards a contradiction that |I_2(i_1)^c| ≥ m/4 for some i_1 ∈ I_1. From (4.10) we have |J_2| ≥ m, so by our restriction to D(c_{4.4}md/n, m/4), Now we count pairs that are "good" with respect to v. For i_1 ∈ [n] write (for any vector v′ and J′ ⊂ [n] we write v′_{J′} to denote the projection of v′ onto the coordinates indexed by J′), and we have But since w ∈ Sparse(m), this contradicts the assumption that v ∉ Flat(m, ρ). Thus, putting Using the bound (4.15) we have Now we show that O_{i_1,i_2}(c_{4.4}md/n, ρ/4, t) holds for all (i_1, i_2) ∈ I′(u, v) (in fact it holds for all (i_1, i_2) ∈ I(u)). Indeed, the vector u and the set L = L_1(i_1) ∩ L_2(i_2) witness the conditions (a)–(c) from Definition 4.3, as we now demonstrate. The condition that |L| ≥ c_{4.4}md/n follows from the definition of I(u). Condition (a) follows from (4.11) and the definitions of L_1(i_1), L_2(i_2). Finally, (b) and (c) follow easily from (4.9) and the triangle inequality: A key point here is that while u and L = L_1(i_1) ∩ L_2(i_2) witness that the event O_{i_1,i_2}(c_{4.4}md/n, ρ/4, t) holds, we cannot take these to be u^{(i_1,i_2)} and L(i_1, i_2), respectively, as u and L are not themselves measurable with respect to F(i_1, i_2). Now it remains to show that the occurrence of all the events on the left-hand side of (4.8) implies also the occurrence of the event
By several applications of the Cauchy–Schwarz inequality and the fact that O_{i_1,i_2}(c_{4.4}md/n, ρ/4, t) holds, we have Using the triangle inequality, recalling the definition of Ĩ(v), and using the fact that max_i |v_i| ≤ ‖v‖_2 = 1, we further obtain where in the second-to-last inequality we have used property (c) of the event O_{i_1,i_2}(c_{4.4}md/n, ρ/4, t).
Combining and rearranging we have We have thus shown that on the event Taking expectations on both sides and rearranging yields the desired bound.

4.3. Anti-concentration for random walks. In the previous section we essentially reduced our task to obtaining an anti-concentration estimate for the random variable (R_{i_1} − R_{i_2}) · u^{(i_1,i_2)}. We accomplish this in the following lemma (recall our notation (4.6)).
Remark 4.7.In the proof we will only use the lower bound |L(i 1 , i 2 )| ≥ k and property (a) for u (i 1 ,i 2 ) and L(i 1 , i 2 ) from Definition 4.3, which is why the bound is independent of the parameter t.
We will need the following standard anti-concentration bound of Berry–Esséen type; see for instance [16, Lemma 2.7] (the condition there of κ-controlled second moment is easily verified to hold with κ = 1 for a Rademacher variable).

Lemma 4.8 (Berry–Esséen-type small-ball inequality). Let v ∈ C^n be a fixed nonzero vector and let ξ_1, …, ξ_n be independent Rademacher variables. There exists an absolute constant C_{4.8} such that for any r ≥ 0,

Proof of Lemma 4.6. By symmetry we may take (i_1, i_2) = (1, 2), and assume the event holds. This fixes the vector u^{(1,2)} and the set For ease of notation we write u = u^{(1,2)} and L = L(1, 2) for the remainder of the proof. Let r ≥ 0. Our aim is to show for some sufficiently large constant C. Let ξ_1, …, ξ_d be i.i.d. Rademacher variables, independent of all other variables, and for each ℓ ∈ [d] put where we recall that τ_{(i_1,i_2)} denotes the transposition that switches i_1, i_2, and we interpret τ Turning to prove (4.19), we note where Since |L| = k, by the pigeonhole principle there must exist some j_⋆ such that For all ℓ ∈ L^{(j_⋆)} we have |v_ℓ| ≥ ρ/√n and so Moreover, since the components of v vary by at most a factor of 2 on L^{(j_⋆)}, we also have, where P_{L^{(j_⋆)}} denotes the law of {ξ_ℓ}_{ℓ∈L^{(j_⋆)}}. Applying this bound to the expression (4.20) (after conditioning on {ξ_ℓ : ℓ ∉ L^{(j_⋆)}} and absorbing the resulting deterministic summands into the scalar z), we obtain (4.19) as desired.

4.4. Proof of Theorem 2.2. Now we combine the results of this section and Section 3 to complete the proof of Theorem 2.2. Fix γ_0 ≥ 1 and let Γ_0 = C_{2.2}γ_0 log_d n, with C_{2.2} an absolute constant to be chosen sufficiently large. We may and will assume that n is sufficiently large depending on γ_0. By Remark 2.3 we may assume log^8 n ≤ d ≤ n (4.24) (the desired bound holds trivially for smaller values of d). Recall the boundedness event B(K) from (2.17). From our hypotheses and the fact that Thus the event B(n Now using Lemma 4.4 we have, for some constant C_{γ_0} depending only on γ_0,
Taking C_{2.2} ≥ 3C_{2.5} and combining (4.26)–(4.29), we conclude the desired bound. The proof of Theorem 2.2 is now complete.
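The small-ball behavior described in Lemma 4.8 can be illustrated by Monte Carlo. The sketch below estimates P(|Σ_j ξ_j v_j| ≤ r) for a flat unit vector v and compares it against a bound of the assumed shape C(r + n^{−1/2}); the constant 1.2 is illustrative, not the lemma's C_{4.8}.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, r = 400, 10000, 0.55

v = np.ones(n) / np.sqrt(n)                     # flat unit vector, ||v||_2 = 1
xi = rng.choice([-1.0, 1.0], size=(trials, n))  # Rademacher signs
sums = xi @ v                                   # samples of sum_j xi_j v_j

p_hat = float(np.mean(np.abs(sums) <= r))  # empirical small-ball probability at z = 0
bound = 1.2 * (r + 1.0 / np.sqrt(n))       # assumed-shape Berry-Esseen bound
```

Here Σ_j ξ_j v_j is approximately standard Gaussian, so the empirical probability is close to P(|N(0,1)| ≤ r) and sits below the linear-in-r bound, as the lemma predicts.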

Control on traces
In this short section, we derive simple estimates on traces for permutation matrices and for S_n^d(S_n^d)^*. We begin with the following simple estimate. Let π_n be a uniformly chosen random permutation on [n], and let P_n denote the corresponding permutation matrix.

Lemma 5.1. With notation as above,

Proof. Let N_ℓ denote the number of cycles of length ℓ in π_n. Note that Tr P_n = N_1. Thus, the event {Tr P_n ≥ k} is the union, over choices of k indices, of the events that those k indices are all fixed points of the permutation π_n, and therefore Now let S_n^d be as in (1.2). We have the following lemma.
Lemma 5.2. With notation as above, there exist absolute constants c_{5.2}, C′_{5.2}, and C_{5.2} so that for any d ≥ C_{5.2}. In particular, there exists an absolute constant C̃_{5.2} so that

Proof. Note that Therefore, using that P_n^i(P_n^j)^* with i ≠ j is distributed like P_n, and that for fixed i they are independent of each other, we get from (5.1) that From (5.1) we have that E(e^{Tr P_n^i}) ≤ e^e, and therefore, by independence and Markov's inequality, Substituting into (5.6), we obtain that which completes the proof.
Note that Lemma 5.2 together with (5.1) imply that with for some absolute constant c ′ , and d and x sufficiently large.Indeed, and the conclusion follows by a union bound and the estimates in (5.3) and (5.7).
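The mechanism behind Lemma 5.1, Tr P_n = N_1 (the number of fixed points), is easy to simulate: N_1 is approximately Poisson(1), and the tail bound P(Tr P_n ≥ k) ≤ 1/k! can be checked empirically. A minimal sketch:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)
n, samples = 500, 5000

# Tr P_n = number of fixed points of a uniform permutation (approx. Poisson(1)).
traces = np.array([int(np.sum(rng.permutation(n) == np.arange(n)))
                   for _ in range(samples)])

mean_trace = float(traces.mean())       # exactly 1 in expectation
tail_3 = float(np.mean(traces >= 3))    # Lemma 5.1 bound: <= 1/3! = 1/6
```

The empirical mean sits near 1 and the empirical tail at k = 3 falls well below 1/6 (the Poisson(1) value is about 0.08), consistent with the union-bound computation in the proof.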

Concentration for resolvent sub-traces
In this section we derive concentration bounds on the traces of the diagonal and off-diagonal blocks of the resolvent G(S_n^d). To prove Theorem 2.6 we will need to consider the resolvent of S_n^d shifted by certain deterministic matrices. Hence, we introduce the following notation. Let M_n := M be a deterministic matrix of size n × n. Fix ξ ∈ C\R, z ∈ C and define Then, for i, j = 1, 2 and u ≥ 0 we have for some constant c_{6.1} > 0, depending only on C_0.
The following is an immediate corollary of Theorem 6.1.
Corollary 6.2. With notation as in Theorem 6.1, there exists an n_0 so that if Im ξ > n^{−1/16} and n > n_0 then, for i, j = 1, 2, We first prove Corollary 6.2 using Theorem 6.1; the proof of Theorem 6.1 follows afterwards.
Proof of Corollary 6.2.
in Theorem 6.1 gives that for x > 0 we have This completes the proof upon using integration by parts.
We next establish Theorem 6.1, using a standard martingale approach. Specifically, we will apply a consequence of Azuma's inequality from [28] that is conveniently phrased for our setting. This will reduce the task to bounding the change in n^{−1} Tr F_{ij}^M(ξ) under the application of a transposition to one of the permutations π_n^ℓ. Define the Hamming distance between two permutations π, σ ∈ S_n as follows: We extend this to a Hamming metric on the product space S_n^d in the natural way: for two sequences π

Lemma 6.3 (Concentration for Hamming–Lipschitz functions). Let f : S_n^d → C be an L-Lipschitz function with respect to the Hamming metric (6.3), and let π = (π^ℓ)_{ℓ∈[d]} be a uniform random element of S_n^d. Then, for any u ≥ 0,

Proof. First we note that it is enough to prove that (6.4) holds for 1-Lipschitz functions. Next, splitting f(π) into real and imaginary parts and applying the pigeonhole principle and the union bound, it suffices to show that for f a real-valued 1-Lipschitz function on S_n^d, By Chebyshev's inequality, (6.5) would follow if, for any λ > 0, For d = 1, the inequality (6.6) follows as in the proof of [28, Corollary 4.3], using that Lemma 4.1 there actually controls the Laplace transform and not just the probabilities. To prove the case of general d, we use tensorization. For an arbitrary 1-Lipschitz function f: where we recall that π_{<k} := (π^ℓ)_{ℓ∈[k−1]}. For any fixed i ∈ [d] and π_{<i}, the function h_i, viewed as a function of π^i, is a 1-Lipschitz function with respect to the Hamming metric, while E_i[h_i] = 0, where E_i denotes the expectation with respect to π^i. Therefore, applying the d = 1 case of (6.6) we obtain, for any i ∈ [d], Since f − Ef = Σ_{i=1}^d h_i and the h_i are measurable with respect to π_{<i+1}, iterating the above bound gives (6.6). Lemma 6.3 reduces our task to showing that the normalized traces of F_{ij}(ξ) are L-Hamming–Lipschitz for an appropriate L.
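Hamming–Lipschitz concentration for d = 1 can be illustrated with a classical permutation statistic. The sketch below is not tied to the resolvent functional of the theorem; it uses the linear statistic f(π) = Σ_i v_i u_{π(i)}, which changes by at most 2·max|u|·max|v|·2 = 4/n under a single transposition, so concentration of Azuma type predicts fluctuations of order 1/√n.

```python
import numpy as np

rng = np.random.default_rng(4)
n, samples = 500, 3000

# A Hamming-Lipschitz statistic of a random permutation: f(pi) = sum_i v_i u_{pi(i)}.
# Swapping two values of pi changes at most two summands, each by at most
# 2*max|u|*max|v| = 2/n, so f is (4/n)-Lipschitz in the Hamming metric.
u = np.concatenate([np.ones(n // 2), -np.ones(n // 2)]) / np.sqrt(n)
v = rng.permutation(u)  # another flat, mean-zero unit vector

vals = np.array([float(np.dot(v, u[rng.permutation(n)])) for _ in range(samples)])
emp_std = float(vals.std())
```

For these mean-zero unit vectors the exact variance is 1/(n − 1), so the empirical standard deviation lands near 1/√n, matching the Lipschitz heuristic.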
For this task we will make use of the following:

Lemma 6.4 (Resolvent identity). Let A and B be two Hermitian matrices, and let ξ ∈ C\R. Then

(A − ξ)^{−1} − (B − ξ)^{−1} = (A − ξ)^{−1}(B − A)(B − ξ)^{−1}.

More generally, for any two invertible matrices C and D we have

C^{−1} − D^{−1} = C^{−1}(D − C)D^{−1}.

As mentioned above, we need to show that H_n(•) is an L-Lipschitz function of π = (π_n^1, …, π_n^d) with respect to the Hamming distance (6.3), for an appropriate value of L. By the triangle inequality it suffices to show that it is L-Lipschitz as a function of π_n^ℓ with respect to the Hamming distance (6.2) on S_n, for arbitrary fixed ℓ ∈ [d].
To this end, we define and π̃_n^ℓ is some fixed but arbitrary permutation of [n]. We similarly define F̃_{ij}^M(ξ) and H̃_n(ξ). Now using the resolvent identity we note that where Therefore, where and 0_n is the n × n matrix of zeros. To simplify (6.8) further, we note that the (k, n + k′)-th entry is nonzero only if one of π_n^ℓ(k) and π̃_n^ℓ(k) equals k′. Hence, using the triangle inequality and recalling the definition of for some k, k′ ∈ [2n]. Here e_m denotes the canonical basis vector which has a one in the m-th position. Since |Im ξ| ≤ C_0 and ‖M‖ ≤ C_0, we have the operator norm bounds Now combining (6.9), (6.10), and (6.8), we obtain (6.11). This shows that we can apply Lemma 6.3 with f(π) = H_n(ξ) and L = 16C_0^4/(n√d(Im ξ)^2), and the result follows.
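Lemma 6.4 is a short algebraic identity and can be verified numerically. The sketch below checks both forms, the resolvent version for Hermitian A, B and the general version for arbitrary invertible C, D, on random matrices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, xi = 50, 0.3 + 1.0j

# Random Hermitian test matrices.
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (Y + Y.conj().T) / 2

I = np.eye(n)
RA = np.linalg.inv(A - xi * I)
RB = np.linalg.inv(B - xi * I)

# Resolvent identity: (A - xi)^{-1} - (B - xi)^{-1} = (A - xi)^{-1}(B - A)(B - xi)^{-1}.
err_resolvent = float(np.max(np.abs((RA - RB) - RA @ (B - A) @ RB)))

# General form: C^{-1} - D^{-1} = C^{-1}(D - C)D^{-1} for invertible C, D.
C = rng.standard_normal((n, n))
D = rng.standard_normal((n, n))
Ci, Di = np.linalg.inv(C), np.linalg.inv(D)
err_general = float(np.max(np.abs((Ci - Di) - Ci @ (D - C) @ Di)))
```

Both errors are at the level of floating-point roundoff, confirming the identities used repeatedly in the Lipschitz estimate above.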

Proof of the local law
In this section we prove Theorem 2.6. The proof consists of two key components. First we derive an approximate fixed point equation for m̃_n(ξ), the Stieltjes transform of the symmetrized version of the empirical measure of the singular values of z − S_n^d/√d. Since the fixed point equation has degree three, it is not a priori immediate that m̃_n(ξ) is close to the correct solution of the fixed point equation. To tackle this, we need certain properties of the roots of that cubic equation. We also need to employ a bootstrap argument to quantify the difference between m̃_n(ξ) and its limit m̃_∞(ξ) as Im ξ approaches zero.

7.1. Derivation of the approximate fixed point equation. The main technical result of this subsection is the following lemma.
Then, there exists an event Ω_n(ξ) with Recalling the definition of G(S_n^d) (see (2.20)), we observe that m_n(ξ) and m̃_n(ξ) are the normalized traces of the resolvents of two Hermitian matrices that differ by a finite-rank perturbation. Therefore, one can use the following result to bound the difference between m_n(ξ) and m̃_n(ξ). Its proof is a simple application of Cauchy's interlacing inequality; we include it for completeness.

Lemma 7.3. Let A_i, i = 1, 2, be two n × n Hermitian matrices such that rank(A_1 − A_2) ≤ C_1 for some absolute constant C_1. For i = 1, 2 and ξ ∈ C\R, let m_n^{A_i}(ξ) denote the Stieltjes transform of the empirical law of the eigenvalues of A_i. That is, Equipped with Lemma 7.3 and assuming Lemma 7.2, we now prove Lemma 7.1.
Proof of Lemma 7.1. Using Lemma 7.3 and the trivial bounds Therefore, Lemma 7.2 implies that where we have used Lemma 7.3 again and the fact that n Im ξ ≥ 1. It remains to show that with high probability; this will complete the proof of the lemma. To this end, applying Theorem 6.1 with M = 0 there, using the trivial bound |m̃_n(ξ)| ≤ 1/Im ξ again, and the triangle inequality, we obtain (7.5) and, yielding (7.4), the desired probability bounds (7.5)–(7.6). This completes the proof of the lemma. It now remains to prove Lemma 7.2. As we will see below, we will first derive an approximate fixed point equation involving E m̃_n(ξ) and an auxiliary quantity E ν_n(ξ), where An additional equation will then be derived to eliminate E ν_n(ξ) from the first equation. To obtain these two equations we will need to consider the expectations of the entries of products of matrices that are functions of centered permutation matrices. Hence, it will be useful to introduce the following notation. For ease of writing, for any permutation π_n uniformly distributed on S_n, we denote Equipped with the above notation, we have the following lemma.
Lemma 7.4. Let M := M_n be a 2n × 2n deterministic matrix. Then (i)

Proof. Recalling (7.7), we make the following observations: and we deduce from the above that Using (7.9) and a similar argument as above, we also deduce that where the last step follows from (7.11). Thus, part (i) of the lemma now follows upon plugging the bounds (7.12)–(7.13) into (7.10). To prove (iii) we apply (7.8)–(7.9), (7.11), and the Cauchy–Schwarz inequality to deduce that This yields part (iii). The proofs of parts (ii) and (iv) follow from similar arguments and hence are omitted.
We will apply Lemma 7.4 by setting P = P_n^ℓ for some ℓ ∈ [d], with M a function of S_n^{d,(ℓ)} := Σ_{j≠ℓ} P_n^j, and Recall the following result regarding the inverse of a block matrix.
where we have used the fact that the entries of P_n^ℓ are centered. Applying Lemma 7.4 we also note that where the last step follows from (7.14) and the standard operator norm bound ‖G^{(ℓ)}(S_n^d)‖ ≤ 1/Im ξ. Therefore, considering the (n + i, n + i)-th entry of both sides of (7.16), taking an average over i ∈ [n], and then taking expectation over the randomness of {P_n^ℓ}, upon using (7.17) we obtain where
Using the resolvent identity once again, we observe that for any ℓ ∈ [d], where the last inequality follows from the facts that ‖G(S_n^d)‖, ‖G^{(ℓ)}(S_n^d)‖ ≤ (Im ξ)^{−1} and ‖P_n^ℓ − EP_n^ℓ‖ ≤ 2. Thus the term E_1 = O(d^{−1/2}(Im ξ)^{−3}), which in particular implies that the first term on the right-hand side of (7.19) is the dominant term. Using (7.20) we also note that |m̃_n(ξ)^2 − m̃^{(ℓ)}(ξ)^2| ≤ 4d^{−1/2}(Im ξ)^{−3}. Hence, from (7.19), upon using the facts that d = O(n) and Im ξ ≤ C_0, we deduce where the last step follows from Corollary 6.2 upon taking (recall that 1 is the n-dimensional vector consisting of all ones) and observing that d^{1/2}(Im ξ)^3 = O(n^{1/2}) = o(n^{3/4}). Note that (7.21) involves E ν_n(ξ). To derive the desired approximate fixed point equation for E m̃_n(ξ), one needs to eliminate E ν_n(ξ) from (7.21). To this end, consider the (i, n + i)-th entry of both sides of (7.16), take an average over i ∈ [n], and proceed similarly to the steps leading to (7.19) to deduce that where and the last step follows from the operator norm bounds (7.20); by the resolvent identity we also have that ). On the other hand, an application of Corollary 6.2 and the Cauchy–Schwarz inequality yields that Therefore, the approximate equation (7.22) simplifies to Finally, multiplying both sides of (7.21) by (E m̃_n(ξ) − ξ), using (7.23), and recalling that Im ξ ≤ C_0 and |z| ≤ R, we arrive at (7.2). This completes the proof of the lemma.
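The objects in this derivation can be formed directly at small size. The sketch below assumes the convention m̃_n(ξ) = (1/2n) Tr (H − ξ)^{−1} for the Hermitization H of z − S_n^d/√d (the paper's normalization and sign convention may differ), and checks the basic structural facts used above: at ξ = iη the transform is purely imaginary, reflecting the symmetry of the singular-value law, and is bounded by 1/Im ξ.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, z, eta = 200, 10, 0.5, 0.5

# S_n^d as a dense matrix (fine at this size): a sum of d permutation matrices.
S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0

W = z * np.eye(n) - S / np.sqrt(d)
# Hermitization: the eigenvalues of H are +/- the singular values of W.
H = np.block([[np.zeros((n, n)), W], [W.T, np.zeros((n, n))]])

xi = 1j * eta
G = np.linalg.inv(H - xi * np.eye(2 * n))
m = np.trace(G) / (2 * n)  # Stieltjes transform of the symmetrized singular-value law
```

Since the spectrum of H is symmetric about 0, the paired terms 1/(λ − iη) + 1/(−λ − iη) are purely imaginary, so m has vanishing real part up to roundoff.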
This means that m̃_µ(iη) = −ix for some x > 0. Therefore, Thus for any symmetric probability measure µ on R, the map η ↦ P(m̃_µ(iη), iη) is essentially a cubic polynomial over the reals. Since m̃_n(ξ) and m̃_∞(ξ) are both Stieltjes transforms of symmetric probability measures, and we need to control their difference only when ξ is purely imaginary, it is enough to derive properties of the roots of the equation where δ, η > 0.

(i) There exists a unique positive root x_⋆ of the equation Q(x) = 0.
(ii) For any c_0 > 0, inf

Proof. Since Q(0) = −η < 0 and lim_{x→∞} Q(x) = ∞, the number of roots of the equation Q(x) = 0 in the interval (0, ∞) is either one or three. If the number of positive roots of the equation Q(x) = 0 were three, then Rolle's theorem would imply that there exists x_0 ∈ (0, ∞) such that Q″(x_0) = 0, which is clearly a contradiction, as we note that Q″(x) = 3x^2 + 4η > 0 for all x ∈ R. Thus there exists a unique x_⋆ ∈ (0, ∞) such that Q(x_⋆) = 0. Turning to the second part of the lemma, we note that where the last equality follows from the fact that Q(x_⋆) = 0. Since x, x_⋆, η > 0, we have that, for all x ≥ c_0. This completes the proof of the lemma. Recalling (7.24) we see that for any symmetric probability measure µ, P(m̃_µ(iη), iη) = iQ(x, η), where m̃_µ(iη) = −ix. Therefore, Lemma 7.6(i) implies that there is a unique symmetric probability measure µ_∞ such that its Stieltjes transform m̃_∞(ξ) satisfies the fixed point equation P(m̃) = 0. The second part of Lemma 7.6 ensures that for all η > 0, and in particular

Proof. We set where C is chosen to be sufficiently large, and for brevity we write Ŝ_n^d := S_n^d/√d. Recalling that d = O(n) and |z| ≤ 1, it follows from (5.8) that for C large, for some absolute constant c′, establishing the desired assertion on the probability bound for Ω_{7.7,n}^c. Now note that
The desired lower bound on $|\widetilde m_n(\xi)|$ on the event $\Omega_{7.7,n}$ now follows upon setting $C_{7.7} = C$.
When $\operatorname{Im}\xi$ is close to zero we cannot use Lemma 7.7. In that case, the desired bound on $|\widetilde m_n(\xi)|$ can be obtained by showing that it is close to $\widetilde m_\infty(\xi)$ and then obtaining bounds on $|\widetilde m_\infty(\xi)|$, which we derive in the lemma below. From [13, Eqn. (4.9)] we note that whenever $\operatorname{Im}(\xi^2) > 0$, for some constants $c$ and $C$ depending only on $\varepsilon$. When $\operatorname{Im}(\xi^2) < 0$ we note that $m_\infty(\xi^2) = \overline{m_\infty(\overline{\xi^2})}$, and therefore (7.28) also holds for all $\xi$ such that $\operatorname{Im}(\xi^2) < 0$. Multiplying both sides of (7.28) by $|\xi|$ and using the relation between $\widetilde m_\infty(\cdot)$ and $m_\infty(\cdot)$, we establish the desired conclusion for $\widetilde m_\infty(\cdot)$ for all $\xi$ such that $\operatorname{Re}\xi \ne 0$. We extend the conclusion to all $\xi$ such that $\operatorname{Re}\xi = 0$ by continuity of $\widetilde m_\infty(\cdot)$ on $\mathbb{C}^+$.
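The relation between $\widetilde m_\infty$ and $m_\infty$ invoked here is the general symmetrization identity $\widetilde m_\mu(\xi) = \xi\, m_{\mu^2}(\xi^2)$ relating the transform of the symmetrized measure to that of the squared sample. It can be checked directly on an empirical measure (a sketch with arbitrary sample data, using the convention $m(\xi) = \int (\xi - t)^{-1} d\mu(t)$ consistent with $m(i\eta) = -ix$ above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 3.0, size=500)  # arbitrary nonnegative sample ("singular values")
xi = 0.7 + 0.9j                      # any point in the upper half plane

# Stieltjes transform of the symmetrized sample {+x_i, -x_i} ...
m_sym = 0.5 * np.mean(1.0 / (xi - x) + 1.0 / (xi + x))
# ... equals xi times the transform of the squared sample at xi^2.
m_sq = np.mean(1.0 / (xi ** 2 - x ** 2))
print(abs(m_sym - xi * m_sq))  # ~ 0
```

The identity is exact: $\frac{1}{\xi - t} + \frac{1}{\xi + t} = \frac{2\xi}{\xi^2 - t^2}$, summed over the sample.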
Equipped with all the ingredients, we are now ready to prove Theorem 2.6.
Proof of Theorem 2.6. Recall that, where we set $C_{2.6} = 2 C_{7.7}$. We need to show that $\widetilde m_n(\xi)$ is close to $\widetilde m_\infty(\xi)$ uniformly for all $\xi \in \mathcal S_{\varepsilon,\varpi}$. Consider a decreasing sequence of positive reals $\{\eta_i\}_{i=0}^N$ such that $\eta_0 = C_{2.6}$, $1/(2n) < \eta_i - \eta_{i+1} < 1/n$, and $\eta_N \in \mathcal S_{\varepsilon,\varpi}$. Note that $N = O(n)$. Denote $\Upsilon_n(\xi) := 3 C_{7.1} \max\{d^{-1/2}, n^{-1/4}\log n\}\,(\operatorname{Im}\xi)$. This, together with Lemma 7.7, further implies that on the event $\Omega_{7.7,n} \cap \Omega_n(\xi_0)$ we have the stated bound. Therefore, Lemma 7.8 and the triangle inequality yield, on the event $\Omega_{7.7,n} \cap \Omega_n(\xi_0)$ and for all large $n$, the stated estimate. Note that we also have, for all large $n$, the complementary estimate, where we use the fact that $\operatorname{Im}\xi_0 > \operatorname{Im}\xi_N \ge (\log n)^{-2}$. Now we are ready to carry out the bootstrap argument. Indeed, applying Lemma 7.1 again and using the above inequality, for all $\xi = i\eta$ with $\eta \in [\eta_1, \eta_0]$, on the event $\Omega_{7.7,n} \cap \Omega_n(\xi_0)$, where in the last step we have used (7.32). We complete the proof by induction. Indeed, we denote $\Omega_j := \cap_{i=0}^{j-1} \Omega_n(\xi_i) \cap \Omega_{7.7,n}$. By the induction hypothesis we assume that (7.34) holds for all $\xi = i\eta$ with $\eta \in [\eta_k, \eta_0]$ on the event $\Omega_k$. $\int f\, dm$ for all smooth functions $f$ supported on $\mathcal D_\varepsilon$, where we recall $m(\cdot)$ is the Lebesgue measure on $\mathbb C$. Since $\varepsilon > 0$ is arbitrary and the circular law is supported on $B_{\mathbb C}(0, 1)$, the above is enough to conclude the weak convergence of $L_{S_n^d/\sqrt d}$ (for more details see the proof of Theorem 1.1).
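The weak convergence of $L_{S_n^d/\sqrt d}$ just discussed can be visualized empirically. The simulation sketch below (parameters arbitrary; an illustration, not part of the proof) samples $S_n^d$ and computes its spectrum: apart from the trivial Perron eigenvalue, which equals exactly $\sqrt d$ after normalization since $S_n^d \mathbf 1 = d\,\mathbf 1$, the eigenvalues spread out over the unit disk.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 400, 9  # arbitrary illustrative sizes

# Sample S = sum of d i.i.d. uniform permutation matrices.
S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0

ev = np.linalg.eigvals(S / np.sqrt(d))
# S 1 = d 1, so one eigenvalue of S/sqrt(d) equals sqrt(d) = 3 exactly,
# and the spectral radius of S is at most its row sum d.
frac_inside = np.mean(np.abs(ev) <= 1.2)
print(np.max(np.abs(ev)), frac_inside)
```

At these finite sizes the bulk is already close to the unit disk; the single outlier at $\sqrt d$ does not affect the limiting esd.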
We now turn our attention to the proof of Lemma 8.1. A key tool is the following dominated convergence theorem.

Lemma 8.2 ([41, Lemma 3.1]). Let $(\mathcal X, \mu)$ be a finite measure space. For each integer $n \ge 1$, let $f_n : \mathcal X \to \mathbb R$ be random functions which are jointly measurable with respect to $\mathcal X$ and the underlying probability space. Assume that:
(i) there exists $\delta > 0$ such that $\int_{\mathcal X} |f_n(x)|^{1+\delta}\, d\mu(x)$ is bounded in probability;
(ii) for $\mu$-almost every $x \in \mathcal X$, $f_n(x)$ converges to zero in probability.
Then $\int_{\mathcal X} f_n(x)\, d\mu(x)$ converges to zero in probability.
With the help of Lemma 8.2, one can check that the proof of Lemma 8.1 follows from an easy adaptation of the alternative proof of [41, Theorem 2.1] sketched in [41, Section 3.6]. We provide a short proof for completeness. ... for some other positive finite constant $C'$. Finally, using assumption (i) of Lemma 8.1 and Weyl's comparison inequality for the second moment (cf. [41, Lemma A.2]), we see that assumption (i) of Lemma 8.2 is satisfied. Thus, recalling (8.2), the proof is completed upon applying Lemma 8.2.
Now we are almost ready to complete the proof of Theorem 1.1. Recall that, as mentioned earlier, the control on the Stieltjes transform derived in Theorem 2.6 provides the necessary estimates on the number of singular values near zero; the following lemma does that job. We now proceed to the proof of Theorem 1.1. The idea behind the proof is the following. From Theorem 2.1 we have that $s_n(S_n^d/\sqrt d - z)$ is not very small with high probability. Therefore we can exclude a small region near zero while computing $\langle \mathrm{Log}, \nu_n^z \rangle$, where we recall that $\nu_n^z$ is the esd of $S_n^{d,z}$ and $S_n^{d,z}$ was defined in (2.21). Then we use Theorem 2.6 to show that the integral of $\log|\cdot|$ around zero, with respect to the probability measure $\nu_n^z$, is negligible. Using Theorem 2.6 we also deduce that $\{\nu_n^z\}$ converges weakly, which in combination with the last observation yields Step 2 of Girko's method. Then, applying the replacement lemma, we finish the proof. Below we make this idea precise. ... on the event $\Omega_n \cap \Omega_n'$ (recall the definition of $\Omega_n$ from the statement of Theorem 2.6), where we used the fact that $d \ge (\log n)^{12}/(\log\log n)^4$.
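The log-determinant that drives Girko's method factorizes over singular values, $\log|\det(M - zI)| = \sum_i \log s_i(M - zI)$, which is exactly why control of the small singular values is what is needed. A quick numerical check of this identity on a sampled $S_n^d/\sqrt d$ (sizes and $z$ arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, z = 120, 6, 0.3 - 0.5j  # arbitrary illustrative parameters

S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0

M = S / np.sqrt(d) - z * np.eye(n)
sv = np.linalg.svd(M, compute_uv=False)

# Girko's step: log|det M| equals the sum of the log singular values.
sign, lhs = np.linalg.slogdet(M)   # slogdet avoids overflow in det
rhs = np.sum(np.log(sv))
print(lhs, rhs)
```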
Next, using integration by parts, it is easy to check that for any probability measure $\mu$ on $\mathbb R$ and $0 \le a_1 < a_2 < 1$, ... This in particular implies that $\nu_n^z$ converges weakly to $\nu_\infty^z$, in probability (for example, apply Montel's theorem in conjunction with [3, Theorem 2.4.4(c)]), where $\nu_\infty^z$ is the probability measure corresponding to the Stieltjes transform $m_\infty(\xi)$. Therefore $\int |\log|x||\, d\nu_\infty^z(x)$, in probability, (8.8) for any positive $R$. Recall that for $z \in \mathcal D_\varepsilon$ the support of $\nu_\infty^z$ is contained in $[-7, 7]$. On the other hand, using that $\log|x|/|x|$ is decreasing for $|x| > e$, we have that, where $C$ is an absolute constant. Since $\delta > 0$ is arbitrary and $\tau_\delta \to 0$ as $\delta \to 0$, combining the above bounds, and taking the matrix in Lemma 8.1 to be $A_n/\sqrt n$, we see that assumption (ii) there is satisfied; assumption (i) of Lemma 8.1 follows from (5.3). Hence, using Lemma 8.1 and the circular law for i.i.d. complex Gaussian matrices (which follows from e.g. [4], but essentially goes back to Ginibre [23]), we obtain that for every $\varepsilon > 0$ and every $f_\varepsilon \in C_c^2(\mathbb C)$ supported on $\mathcal D_\varepsilon$, $\int f_\varepsilon(z)\, d\mu_n(z) \to \frac{1}{\pi}\int f_\varepsilon(z)\, dm(z)$, in probability, (8.13) where for brevity we denote $\mu_n := L_{S_n^d/\sqrt d}$. To finish the proof it now remains to show that one can extend the convergence in (8.13) to all $f \in C_c^2(\mathbb C)$. That is, we need to show that for any $\delta > 0$ and $f \in C_c^2(\mathbb C)$

Date: April 5, 2018.
* Partially supported by grant 147/15 from the Israel Science Foundation. ‡ Partially supported by NSF postdoctoral fellowship DMS-1606310. § Partially supported by grant 147/15 from the Israel Science Foundation.

2.2. Control on the smallest singular value. The following result provides the required lower bound on the smallest singular value of $\frac{1}{\sqrt d} S_n^d - z$.

Theorem 2.1. Fix any $R > 0$ and let $z \in B_{\mathbb C}(0, R) := \{z' \in \mathbb C : |z'| \le R\}$. Assume $1 \le d \le n^{100}$. There exists $C_{2.1} < \infty$ depending only on $R$ and an absolute constant $c_{2.1} > 0$ such that

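The smallest-singular-value quantity controlled by Theorem 2.1 is easy to compute for simulated matrices. The sketch below (arbitrary illustrative sizes; not part of the proof) evaluates $s_{\min}(S_n^d/\sqrt d - z)$ at a few points $z$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 150, 5  # arbitrary illustrative sizes

S = np.zeros((n, n))
for _ in range(d):
    S[np.arange(n), rng.permutation(n)] += 1.0

mins = []
for z in [0.0, 0.5 + 0.5j, -0.9j]:
    M = S / np.sqrt(d) - z * np.eye(n)
    # singular values come back in decreasing order; [-1] is s_min
    s_min = np.linalg.svd(M, compute_uv=False)[-1]
    mins.append(s_min)
    print(z, s_min)
```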
where $c_{7.1}$ is an absolute constant and $C_{7.1}$ depends only on $C_0$ and $R$. Since we have the concentration bounds of Theorem 6.1, as we will see below it will be enough to show that inequality (7.1) holds for $\mathbb E \widetilde m_n(\xi)$. To show this, it will be convenient to consider the Stieltjes transform of the symmetrized version of the empirical measure of the singular values of $z - \bar S_n^d/\sqrt d$, where $\bar S_n^d$ is the centered version of $S_n^d$. For ease of writing, let us denote $\bar S_n^d := \sum_{\ell=1}^d \bar P_n^\ell$, where $\bar P_n^\ell := P_n^\ell - \mathbb E P_n^\ell$ for $\ell \in [d]$, and $\{P_n^\ell\}$ are i.i.d. uniformly distributed permutation matrices. Define the resolvent as
$$G(\bar S_n^d) := G(\bar S_n^d, \xi, z) := \Big(\xi I_{2n} - \begin{pmatrix} 0 & zI_n - \bar S_n^d/\sqrt d \\ \bar z I_n - (\bar S_n^d)^*/\sqrt d & 0 \end{pmatrix}\Big)^{-1},$$
and denote $\widetilde m_n(\xi) := \frac{1}{2n}\operatorname{Tr} G(\bar S_n^d)$.

Lemma 7.2 (Loop equation for the sum of centered permutation matrices). Fix $\xi \in \mathbb C^+$ such that $n^{-1/16} \le \operatorname{Im}\xi \le C_0$ for some $C_0 > 0$. Fix $z \in B_{\mathbb C}(0, R)$ for some $R < \infty$. Then there exists a constant $C_{7.2}$, depending on $C_0$ and $R$, such that

Lemma 7.8 (Properties of $\widetilde m_\infty$). Fix any $\varepsilon > 0$ and let $z \in B_{\mathbb C}(0, 1-\varepsilon)$. Fix any $\xi \in \mathbb C^+$ such that $|\xi| \le \varepsilon^{-1}$. Then there exists $\varepsilon_0 > 0$ such that for any $\varepsilon < \varepsilon_0$ there exist constants $c_{7.8}$ and $C_{7.8}$, depending only on $\varepsilon$, such that $c_{7.8} \le |\widetilde m_\infty(\xi)| \le C_{7.8}$.

Proof. The proof of this lemma follows from [13, Lemma 4.3]. There they analyzed properties of the solution $m_c(\xi)$ of the cubic equation $m(1+m)^2\xi + (1 - |z|^2)m + 1 = 0$, which has nonnegative imaginary part for all $\xi \in \mathbb C^+$. In [4] it was shown that for any $\xi \in \mathbb C^+$, $-m_c(\xi)$ is the Stieltjes transform of the limiting distribution of the empirical measure of the singular values of $z - A_n/\sqrt n$, where $A_n$ is an $n \times n$ matrix of i.i.d. entries satisfying certain moment assumptions. The limiting measure is the same in our set-up; therefore $m_\infty(\xi) = -m_c(\xi)$ on $\mathbb C^+$. Since $\widetilde m_\infty(\xi) = \xi m_\infty(\xi^2)$, we use the relation between $m_\infty(\xi)$ and $m_c(\xi)$ to extract the properties of $\widetilde m_\infty(\xi)$.
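The cubic for $m_c(\xi)$ quoted above can be solved numerically. The sketch below (choice of $\xi$ and $z$ arbitrary, for illustration) expands $m(1+m)^2\xi + (1-|z|^2)m + 1$ into $\xi m^3 + 2\xi m^2 + (\xi + 1 - |z|^2)m + 1$ and picks the root with the largest imaginary part, i.e. the candidate with nonnegative imaginary part:

```python
import numpy as np

xi = 0.3 + 1.2j  # a point in the upper half plane
z = 0.4 + 0.1j   # |z| < 1

# m (1+m)^2 xi + (1-|z|^2) m + 1 = 0 expanded as a cubic in m.
a2 = abs(z) ** 2
coeffs = [xi, 2 * xi, xi + 1 - a2, 1.0]
roots = np.roots(coeffs)

# The solution of interest is the root with nonnegative imaginary part.
m_c = max(roots, key=lambda r: r.imag)
residual = abs(np.polyval(coeffs, m_c))
print(m_c, residual)
```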
and $\|\cdot\|_2$ denotes the $\ell^2$ norm. Fix an arbitrary $\gamma_0 \ge 1$. Let $1 \le d \le n^{\gamma_0}$, and let $Z_n$ be a deterministic $n \times n$ matrix such that $\|Z_n|_{\mathbf 1^\perp}\| \le n^{\gamma_0}$ and $Z_n \mathbf 1 = \zeta \mathbf 1$, $Z_n^* \mathbf 1 = \bar\zeta \mathbf 1$ for some $\zeta \in \mathbb C$. There exist $C_{2.2} < \infty$ depending only on $\gamma_0$ and an absolute constant $c_{2.2} < \infty$ such that. In the proof of Theorem 2.2 it will be convenient to assume $d \le n$. We now show how to reduce to this case (in fact, we could reduce assuming $d \le c_0 n$ for any fixed constant $c_0 > 0$).
Therefore, $\widetilde m_n(\xi)$ is the Stieltjes transform of the symmetrized version of the empirical measure of the singular values of $z - \bar S_n^d/\sqrt d$, and one has $\widetilde m_n$ ...

..., we have $|L_1(i_1)| < dm/2n$, which contradicts the fact that $i_1 \in I_1$. This establishes (4.14). From (4.13) and (4.14) it follows that ...

... for $3 \le i \le n$ and $1 \le \ell \le d$, so that $\widetilde S_n^d$ agrees with $S_n^d$ on the third through $n$-th rows. We denote the first two rows of $\widetilde S_n^d$ by $R_1$ and $R_2$. By replacing $S_n^d$ ...

Note that $\Upsilon_n(\xi) = o(1)$ for all $\xi \in \mathcal S_{\varepsilon,\varpi}$. Now applying Lemma 7.1 we see that on the event $\Omega_n(\xi_0)$ we have ...

Combining (8.7)-(8.10) we deduce that $\langle \mathrm{Log}, \nu_n^z \rangle \to \langle \mathrm{Log}, \nu_\infty^z \rangle$, in probability. (8.11) Now the remainder of the proof is completed using Lemma 8.1. Indeed, consider $A_n$, the $n \times n$ matrix with i.i.d. centered Gaussian entries of variance one. It is well known that, for Lebesgue-almost-all $z$, $\frac{1}{n} \log|\det(A_n/\sqrt n - zI_n)| \to \langle \mathrm{Log}, \nu_\infty^z \rangle$, almost surely. (8.12) For example, one can obtain a proof of (8.12) using [12, Lemma 4.11, Lemma 4.12], [13, Theorem 3.4], and [35, Lemma 3.3]. Thus setting $D = \mathcal D_\varepsilon$, $B$