Universality of the least singular value for sparse random matrices

We study the distribution of the least singular values associated to an ensemble of sparse random matrices. Our motivating example is the ensemble of $N\times N$ matrices whose entries are chosen independently from a Bernoulli distribution with parameter $p$. These matrices represent the adjacency matrices of random Erd\H{o}s-R\'enyi digraphs and are sparse when $p\ll 1$. We prove that in the regime $pN\gg 1$, the distribution of the least singular values is universal in the sense that it is independent of $p$ and equal to the distribution of the least singular values of a Gaussian matrix ensemble. We also prove the universality of the joint distribution of multiple small singular values. Our methods extend to matrix ensembles whose entries are chosen from arbitrary distributions that may be correlated, complex valued, and have unequal variances.


Introduction
Random real symmetric and complex Hermitian matrices have been intensely studied since Wigner's discovery that, in the large N limit, their eigenvalue densities are universal and follow the semicircle distribution. Recent investigations have culminated in a proof of the Wigner-Dyson-Mehta conjecture, which asserts the universality of the local eigenvalue statistics in the limit [9,14,18].
In the case of non-Hermitian matrices, there has been a similar study of the singular values. Given an N × N matrix M, its singular values are the eigenvalues of √(M†M), which we label 0 ≤ λ_1 ≤ · · · ≤ λ_N. (1.1) Traditionally, one studies the squares of the singular values, that is, the eigenvalues of M†M. In the bulk, the limiting distribution of the squared singular values is universal under fairly general hypotheses and follows the Marchenko-Pastur law [1,4,27]. Averaged-energy universality for the local correlation functions around a fixed energy in the bulk was shown in [20,28], and [28] also showed universality for the largest singular value λ_N. However, the methods in [28] do not suffice to prove universality of the least singular value. The analysis of this case is more subtle because the density of the Marchenko-Pastur distribution has a singularity at the origin.

Early work on the least singular value of random matrices was motivated by the analysis of algorithms in computer science. An important recent development in this area is the method of smoothed analysis, which estimates the practical performance of algorithms [30]. In these applications, it is important to estimate the probability that λ_1 is small for various random matrix models. Since the inverse of the least singular value of a matrix equals the operator norm of its inverse, such estimates control the probability that the inverse has large norm. We refer the reader to [29] for an introduction to this line of research. Recent results include [31] on Bernoulli matrices, [13] on structured random matrices, [33] on matrices with i.i.d. entries shifted by a deterministic matrix, and [5,26] on sparse matrices.
The universality of the least singular value distribution was recently considered in [32] from a viewpoint inspired by the method of property testing in computer science and combinatorics. The main result is that if ξ is a real random variable with Eξ = 0 and Eξ² = 1, and such that E|ξ|^C < ∞ for some sufficiently large absolute constant C, then the distribution of the least singular values of the ensemble of N × N random matrices M_N with entries chosen i.i.d. with distribution ξ/√N satisfies

P(N λ_1(M_N) ≤ r) = 1 − e^{−r²/2−r} + O(N^{−c}), (1.2)

for some c > 0. Also given in [32] are analogous results for complex matrices and for the joint distribution of multiple smallest singular values. However, these results require that the entries are independent and have equal variances.

This paper studies the universality of the least singular value from the same dynamical viewpoint that was used to prove the Wigner-Dyson-Mehta conjecture. Our motivating example is the ensemble of N × N matrices whose entries are chosen independently from a Bernoulli distribution with parameter p. These matrices represent the adjacency matrices of random Erdős-Rényi digraphs, which are directed graphs on N vertices where each possible directed edge is present with probability p. Such matrices are sparse when p ≪ 1, and our result implies that the distribution of the least singular value is universal in the regime pN ≫ 1. We also apply our method to prove universality of the least singular value for matrices whose entries have unequal variances and weak correlations.
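The limit in (1.2) can be checked numerically. The sketch below samples centered, standardized Bernoulli matrices (one concrete instance of the ensembles considered here) and compares the empirical distribution of N λ_1 with F(r) = 1 − e^{−r²/2−r}; the values of N, p, and the sample count are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Monte Carlo sketch of (1.2): for i.i.d. centered entries of variance 1/N,
# P(N * lambda_1 <= r) should approach F(r) = 1 - exp(-r^2/2 - r).
rng = np.random.default_rng(0)
N, p, samples = 200, 0.3, 200

def least_singular_value(rng, N, p):
    # Centered, unit-variance Bernoulli entries, scaled so Var = 1/N.
    A = (rng.random((N, N)) < p).astype(float)
    A = (A - p) / np.sqrt(p * (1 - p) * N)
    return np.linalg.svd(A, compute_uv=False)[-1]

vals = np.array([N * least_singular_value(rng, N, p) for _ in range(samples)])
r = 1.0
empirical = np.mean(vals <= r)
limit = 1 - np.exp(-r**2 / 2 - r)  # approximately 0.777
print(empirical, limit)
```

At this size the empirical fraction already sits close to the limiting value, consistent with the O(N^{−c}) error term.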

Overview and main result
In this section we state and prove our main theorem on the universality of the least singular value for sparse matrices, invoking several preliminary results proved in the forthcoming sections. We follow the three-step method [18][19][20][21] and take advantage of recent technical improvements [22][23][24].
Step 1 is to obtain local control of the singular values of a deformed ensemble, and in particular to prove their rigidity, which is necessary for the stochastic analysis in the second step.
Step 2 is to obtain short time universality. We prove that for a sparse matrix, the least singular value is universal after time t = N −1+ε when the singular values are evolved according to the singular value analogue of Dyson Brownian motion.
Step 3 is to remove the time evolution and prove universality for the original ensemble.
Section 3 contains the main estimate of this paper, the analysis of the time evolution of the singular values, which is necessary for Step 2. We use coupling and homogenization ideas developed in [9,23]. In Section 4 we carry out Step 1, using the ideas of [24].
Step 3 is accomplished in Section 5 by a Green function comparison theorem, following [10]. In Section 6 we discuss how to extend our methods to matrices with correlated entries, and in Appendix A we verify that the SDE governing the evolution of the singular values is well posed.
We suppose for the remainder of this paper that the entry distributions of all random matrices considered are absolutely continuous with respect to Lebesgue measure, so that their singular values are distinct almost surely and can be strictly ordered. The case of a general matrix H is dealt with by considering H(ε) = H + εV, where V is a Gaussian matrix. The entries of H(ε) have distributions that are absolutely continuous for ε > 0, since their laws are convolutions with a Gaussian distribution. All results below may be extended to H by taking the limit as ε goes to 0 and using Weyl's inequality for singular values. where w = N^{−1/2}(1, . . . , 1)†, f is a parameter (which may depend on N) such that 0 ≤ f ≤ N^{1/2}, and B_N is a real matrix with independent entries b_ij satisfying the stated moment bounds for all k. We assume that q satisfies N^α ≤ q ≤ N^{1/2} (2.3) for some α > 0, that the moment bounds hold for some universal constants c and C, and that the matrices S_N are doubly stochastic. Remark.
The assumption that the variance matrices S N are doubly stochastic is made only for convenience so that we are in the usual case where the limiting global distribution of the singular values is a quarter circle. We address the technically more involved case of matrices with correlated entries in Section 6, and our discussion there subsumes the case of matrices with independent entries and a non-stochastic variance matrix. Further, the moment condition in Definition 2.1 could be relaxed slightly, following [22], without affecting our work.
Definition 2.2. Given a random matrix with eigenvalues {λ_i} and E_1 ≤ E_2, define the eigenvalue counting function n(E_1, E_2) = |{i : E_1 < λ_i < E_2}|. (2.5)

Our main result shows that the least singular value of a sparse random matrix ensemble is universal in the large N limit. The Wigner ensemble case of this result was proved in Theorem 1.3 of [32]. where c > 0 is an absolute constant uniform in r.
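The counting function of Definition 2.2 is elementary, and the proof below hinges on events of the form {n(E_1, E_2) = 0}. A minimal sketch, on an illustrative symmetric test matrix:

```python
import numpy as np

def counting_function(eigs, E1, E2):
    # n(E1, E2) = #{i : E1 < lambda_i < E2}, as in Definition 2.2.
    return int(np.sum((eigs > E1) & (eigs < E2)))

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
eigs = np.linalg.eigvalsh((A + A.T) / np.sqrt(2))
# The event {n(E1, E2) = 0} says no eigenvalue falls in (E1, E2); this is how
# the proof of the main theorem encodes "the least singular value is not
# small" for the symmetrized matrix.
print(counting_function(eigs, eigs.min() - 1.0, eigs.max() + 1.0))
```

The window (eigs.min() − 1, eigs.max() + 1) contains the whole spectrum, so the printed count is the matrix dimension.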
Proof. For any N, let G_N be an N × N matrix with independent entries of distribution N(0, N^{−1}). We set M_t = M_N + √t G_N and let λ_1(t) be the least singular value of M_t, suppressing dependence on N. Let G′_N be a matrix independent from G_N with the same entry distributions, and let λ̃_1(t) be the least singular value of G_t = G′_N + √t G_N. Define the block matrices

Note that the eigenvalues of H_t are precisely the singular values of M_t and their negatives, and similarly for X_t. For any ε > 0, Lemma 5.10 on short time universality shows that with overwhelming probability we have (2.8) for some N-dependent parameter t_N satisfying t_N ≤ N^{−1+ε} and some σ > 0.

We conclude by removing the time evolution. This is accomplished by comparing smoothed versions of the eigenvalue counting functions using results in Section 5. Let n_t be the eigenvalue counting function for H_t and ñ_t be the counting function for X_t, as in Definition 2.2. Set E = r/N, y = N^{−1−σ}, η = yN^{−32σ}. (2.9) Recasting (2.8) in terms of these counting functions gives that, for any D > 0,

P(ñ_{t_N}(−E − y, E + y) = 0) − N^{−D} ≤ P(n_{t_N}(−E, E) = 0) ≤ P(ñ_{t_N}(−E + y, E − y) = 0) + N^{−D}.
(2.10) The conclusion of Lemma 5.13 is that

where q is a smooth cutoff function defined in Section 5, and

Further, Lemma 5.15 shows that there exist ε > 0 and c > 0 such that for all t ≤ N^ε/N,

where the last inequality holds by another application of Lemma 5.13. Fix this ε and the corresponding t_N ≤ N^ε/N for the rest of the proof. After adjusting c downward and combining this display with the previous one, we have for large enough N

P(n_{t_N}(−E, E) = 0) ≤ P(n_0(−E + 2y, E − 2y) = 0) + N^{−c}. (2.14)

Similar reasoning, interchanging the roles of n_0 and n_t, gives

We conclude, after setting ỹ = 2y, that there exists c > 0 such that, for large enough N,

Rephrased in the language of cumulative distribution functions, this is

Since λ̃_1 is the least singular value of a matrix with i.i.d. entries, we may use Theorem 1.3 in [32] to control its distribution. After changing the variable of integration, we have for some small c > 0 that

Note that (2.18) shows

Together with (2.17), this completes the proof.
Since Theorem 3.2 holds not just for the smallest singular value, but also for the smallest k singular values when k is at most a small power of N , the same proof shows the universality of the joint distribution of the smallest k singular values for fixed k. This is the content of the following theorem. While we do not state the universal distribution explicitly, an exact expression can be derived, and we refer the reader to Section 6 of [32] for details.
be an ensemble with independent entries of distribution N (0, N −1 ). Then uniformly in all choices of (E i ) ∈ R k , for some c > 0 and large enough N .
Finally, we mention that analogues of our results hold when the matrix entries are complex valued. The proofs are essentially identical.

Short time universality
In order to state our main result, we introduce the following notation. We fix δ_1 > 0 and let g and G be N-dependent parameters such that We consider a deterministic matrix V of initial data and let W be a matrix of i.i.d. real random variables with distribution N(0, N^{−1}). We define Let {s_i(t)}, with i running from −N to N and the zero index omitted, be the eigenvalues of H_t, which are the singular values of M_t along with their negatives. We set

Definition 3.1. With g and G defined as above, we say V is (g, G)-regular if for |E| ≤ G and η ∈ [g, 10], for large enough N, and if there exists a constant C_V such that

We now state the main result of this section. Let W′ be a random matrix with the same entry distribution as W and define W_t = W′ + √t W.
Theorem 3.2. Fix σ > 0, and let V be a (g, G)-regular deterministic matrix. Let {s i (t)} be the singular values of M t and {r i (t)} be the singular values of W t . We fix parameters 0 < ω 1 < ω 0 and times t 0 = N −1+ω 0 , t 1 = N −1+ω 1 , with the restrictions that Then, there exist ω, δ > 0 such that the following holds with overwhelming probability. For i < N ω and t a = t 0 + t 1 , we have (3.6)
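The symmetrization used throughout, that the eigenvalues of the block matrix H = [[0, M], [M^T, 0]] are the singular values of M together with their negatives, is easy to verify numerically. A minimal sketch, with an illustrative matrix size:

```python
import numpy as np

# Check: eig([[0, M], [M^T, 0]]) = {±singular values of M}.
rng = np.random.default_rng(2)
N = 5
M = rng.standard_normal((N, N)) / np.sqrt(N)
H = np.block([[np.zeros((N, N)), M], [M.T, np.zeros((N, N))]])
eigs = np.sort(np.linalg.eigvalsh(H))
svals = np.linalg.svd(M, compute_uv=False)
expected = np.sort(np.concatenate([svals, -svals]))
print(np.allclose(eigs, expected))  # True
```

This identity is what allows the paper to transfer eigenvalue techniques (rigidity, DBM, local laws) to singular values.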

Preliminaries
We will prove Theorem 3.2 by analyzing the SDE that governs the evolution of the singular values when the initial matrix V undergoes a Gaussian perturbation: Given initial data (s i (0)) N i=1 , the SDE for the perturbed singular values is Technical information about this SDE is contained in Appendix A, where we show existence and uniqueness of strong solutions and verify that it represents the claimed evolution of the singular values. The arguments used in the appendix may also be used to show existence and uniqueness of strong solutions for the rest of the SDEs considered in this section.
Our main idea is to analyze a symmetrized version of (3.8). Define s_{−k}(t) and B_{−k}(t) for Then the SDE (3.8) is equivalent to Note that we have labeled the particles from −1 to −N and 1 to N, so that the zero index is omitted. Unless otherwise stated, this will be our convention throughout this section, and we will no longer note this omission explicitly.
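A simple Euler discretization illustrates the structure of the symmetrized flow. The sketch below evolves 2N particles with antisymmetric driving noise B_{−i} = −B_i and with the j = −i repulsion term omitted, as in (3.9); the drift and noise normalizations are illustrative choices, not the paper's exact constants. The point of the check is that the symmetry s_{−i} = −s_i is preserved by the dynamics.

```python
import numpy as np

rng = np.random.default_rng(3)
N, dt, steps = 8, 1e-6, 200

# Symmetric initial configuration; labels run -N..-1, 1..N (no zero index).
idx = np.concatenate([-np.arange(N, 0, -1), np.arange(1, N + 1)])
s = idx / N

for _ in range(steps):
    dB = rng.standard_normal(N) * np.sqrt(dt)
    dB = np.concatenate([-dB[::-1], dB])          # coupled noise: B_{-i} = -B_i
    drift = np.zeros_like(s)
    for a in range(2 * N):
        for b in range(2 * N):
            # skip j = i, and skip the *missing* j = -i repulsion term
            if a == b or idx[a] == -idx[b]:
                continue
            drift[a] += 1.0 / (s[a] - s[b])
    s = s + dB / np.sqrt(N) + drift * dt / (2 * N)

print(np.max(np.abs(s + s[::-1])))  # symmetry s_{-i} = -s_i survives
```

Since the drift and the coupled noise are both antisymmetric under i → −i, the symmetrization is an invariant of the flow, which is what makes the reduction to (3.9) possible.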
Note also that (3.9) is the same as a standard DBM, except that it is missing the repulsion term for j = −i. In particular, for the least singular value i = 1, the force deflecting this particle from the origin is weaker than in the usual DBM. This complicates the analysis of (3.9), since we are now missing an important regularizing effect, and level repulsion estimates such as the one used in [24] to study the short time behavior of DBM seem out of reach. However, in the more recent work [23], DBM was analyzed using a homogenization argument to prove fixed energy universality after a short time. The key observation here is that, after a careful examination of that proof, the missing term is negligible in the homogenization, so the same technique may be applied to the SDE (3.8).
In this section, we begin by studying the differences s i (t) − r i (t). By demanding that the SDEs for the r i and s i are driven by the same Brownian motions, the Brownian motion term vanishes in the equation for the differences, which renders it considerably easier to work with. This coupling approach was developed in [9,24]. Here we use a refinement of this idea, introduced in [23], and construct a continuous interpolation. Next, we obtain precise control over this difference equation by comparing its semigroup elements to the solution kernel of an appropriate integral equation, which represents the homogenized version of the DBM. Theorem 3.2 then follows from a short calculation in Subsection 3.6 exploiting cancellation in the kernel coming from the symmetrization of the SDE.
Remark. In this section we deal only with the real case, where we apply a real Gaussian perturbation. If we consider complex matrices, the appropriate SDE is The only difference here is that the diffusion term loses a factor of √2. The argument in the complex case is the same as in the real case, and the main result of this section, Theorem 3.2, holds with no difference.

Definitions
The following notion will be used throughout. Given a probability measure ρ(E) dE and some even number 2N, we define the classical particle locations γ_i as follows, suppressing dependence on N. For i ≥ 1, we set For i ≤ −1, we set In accordance with our labeling convention, we do not define γ_0. Our labeling is chosen so that γ_1 = 0 for a symmetric distribution centered at 0. We recall that the Stieltjes transform of an N × N matrix M with eigenvalues λ_i is given by (3.13)
For α ∈ [0, 1] we introduce a continuous interpolation between s_i(t) and r_i(t): (3.14) The initial values are given by More precisely, we construct solutions for a countable dense set of α ∈ [0, 1] and use the continuity of the map from initial data to solution paths to construct solutions for the remaining α. This approach avoids the possibility that uncountably many measure zero sets accumulate. See Appendix A.2 for details. In particular, z_i(t, 0) = r_i(t_0 + t) and z_i(t, 1) = s_i(t_0 + t). The time shift is introduced to simplify the notation. Using the deformed local law, Theorem 4.5, we will establish rigidity of the particles after the short time t_0. Hence, the time shift means the z_i(t) are rigid for times t ≥ 0, which is notationally convenient. See Lemma 3.6 for the precise statement.
We define the γ i (t) to be the classical locations associated to the eigenvalue density of the symmetrization of the deformed matrix V + √ tW , which has eigenvalues {s i (t)} N i=−N . We let γ sc i denote the classical locations for the semicircle law.

Interpolating Measures
We would like to introduce a family of measures ρ(E, t, α) dE which will interpolate between the densities of the initial data. One approach is to take the free convolution of z i (0, α) with the semicircle law. However, the resulting measures are not regular enough for our purposes.
The following two lemmas assert that we can construct interpolating measures with better regularity properties. The first concerns the regularity of the ρ(E, t, α), and the second verifies that these measures interpolate between the densities of the two initial ensembles. Here m(z, t, α) is the Stieltjes transform of ρ(E, t, α) dE, and we let γ_i(t, α) denote the classical locations for ρ(E, t, α) dE.
The set G α used in the lemma is defined as follows. We fix q * ∈ (0, 1) and set k 0 to be the largest index such that Then G α is defined as Lemma 3.4. A family of interpolating measures ρ(E, t, α) dE exists (in a sense made precise by the following lemma) such that the following holds. Let δ > 0. For |E| ≤ N −δ t 0 , t ≤ N −δ t 0 , and N −1+δ ≤ η ≤ 10 all of the following estimates are true with overwhelming probability.
Define d(i, j) = |i − j| if ij > 0 and d(i, j) = |i − j| − 1 if ij < 0. This is just to define an appropriate distance for our indexing, since no element is indexed with 0. The proof of the following lemma is the same as Lemma 3.4 in [23].
Lemma 3.5. The following estimates hold, with overwhelming probability, for the ρ(E, t, α) dE constructed in the previous lemma. We have for ε > 0 and ω_1 < ω_0/2, For |j|, |k| ≤ N^{ω_0/2} and any choice of t ≤ 10t_1, We further have an analogue of Lemma 3.5 in [23], giving rigidity and a local law. Here we will choose a centered interval Ĉ_q of indices, asymptotically of size qGN, so that particles with indices in Ĉ_q can be controlled uniformly in α. Precisely, we define k_1 to be the largest integer such that and set, for q ∈ (0, 1), We let m_N(z, t, α) be the Stieltjes transform of the z_i(t, α).
The key input to the proof of this lemma is a deformed local law for small times t. In the eigenvalue context this law was shown in [24]. We prove the needed singular value version of this result in Section 4. Excepting this change, the proof is identical to the one described in Appendices A and B of [23].

Short Range Equation
Here we follow Section 3.3 of [23]. Define a centered process (3.30) Note that this differs from (3.35) of [23] because we use the classical location γ_1 to center the particles, as we have no γ_0. The new classical locations are We recall from [25] that Then the SDE that governs the z̃_i is Because we do not have good control over the extremal particles, and because we are interested only in particles near the origin, it is convenient to introduce a short range cutoff. We fix q_* ∈ (0, 1) and ω_l, ω_A > 0 such that We choose the parameters in this way so that the error in the forthcoming Lemma 3.7 is o(1/N). Given q ∈ (0, 1), we define and we let A_{q*}(i) be the indices j such that (i, j) ∈ A_{q*} and A^c_{q*}(i) be the indices j such that (i, j) ∉ A_{q*}. Define ẑ_i(t, α) as the solution to, for |i| ≤ N^{ω_A}, (3.37) Here the initial condition is ẑ_i(0) = z̃_i(0).
The ẑ_i are good approximations to the z̃_i. This is the content of the following lemma, whose proof is identical to the proof of Lemma 3.7 in [23]. (3.38)

Short Range Kernel
Here we follow Section 3.4 in [23] and introduce a coupled parabolic equation with no Brownian motion term. We define u_i = ∂_α ẑ_i(t, α), where the differentiation is justified in Appendix A.2. We have ∂_t u_i = −(Bu)_i + ξ_i; the operator B is implicitly defined by this equation, and ξ_i is an error term that vanishes for |i| ≤ N^{ω_A}. Because it vanishes for small values of i, we will show its effect on particles near the origin is negligible. We let U be the semigroup associated to B. Its elements U_ij(s, t) are defined so that, if v(t) is any solution of the system ∂_t v_i = −(Bv)_i, then for any times t, s ≥ 0, (3.41) Note that the U_ij are random.

Finite speed estimates
We state two results on the decay of the semigroup elements U ij . Their proofs are straightforward adaptations of those given in Section 4 of [23] for the kernel U (B) in that reference.

Homogenization
We recall some definitions from [23] necessary for the homogenization argument. We fix a constant ε_B > 0 such that ω_A − ε_B > ω_l, and fix an integer a such that 0 < |a| ≤ N^{ω_A−ε_B}. We also define the deterministic particle locations γ^f_j = j(2Nρ_sc(0))^{−1}. We consider a particular solution w of the short range equation that will play a key role: (3.45) Define the cutoff η_l = N^{ω_l}(2Nρ_sc(0))^{−1}. We will need the kernel p_t(x, y) of the following equation The main result of this subsection is that this kernel is a good approximation to elements of the short range semigroup. We choose parameters s_0 and s_1 such that We now collect some bounds on these objects. We let p^{(k)}_t(x, y) denote the kth derivative in x. The following lemma is Lemma 3.11 in [23].
Additionally, if |i| ≥ |a| + N^{ω_l+ε_1}, then for any D > 0 we have with overwhelming probability

Proof. From the definition of B and an examination of min_i w_i, we see that all w_i(t) are non-negative for t ≥ 0. Hence The second estimate is a consequence of Lemma 3.9, and the third estimate is (3.109) in [23]. The last estimate follows from applying Lemma 3.8.

and for any

Proof. See the beginning of the proof of Lemma 3.13 in [23].

Semigroup Estimate
In the lemma below, we temporarily break with our convention of omitting the zero index in order to match the notation of [23]. This unusual indexing arises because we will need to apply the lemma in the context of an even number of particles labeled −N to −1 and 1 to N. We think of shifting the positive indices of w and f down by 1, so we will define, for i ≥ 0, Then the largest positive index in the following sum will not have a matching negative index, but we will see that this makes no difference to our application below.
Lemma 3.13. For any ε 1 , ε 2 , D > 0 and α, there exists an event F α such that P(F α ) ≥ 1−N −D and on which the following estimate holds.
Proof. We omit the superscript for this proof. Lemma 3.12 in [23] gives (3.61) Note the additional terms in the sum where j = −(i + 1). We separate them out and bound them.
(3.62) In the last inequality we used Lemma 3.11, and ε > 0 is arbitrary. Now we are done, because

The next lemma is the key estimate for the homogenization. It is here that we deal with the missing repulsion term in the symmetrized equation (3.9).

Lemma 3.14. For t ≥ s_1, the Itô differential of ‖w(t) − f(t)‖_2^2 takes the form where M_t is a martingale and X_t is a process defined by the above equality. Fix ε, D > 0. Then additionally, for every α there is an event F_α such that P(F_α) ≥ 1 − N^{−D} and on which the following bounds hold. For s_1 ≤ t ≤ 9t_1, For any u_1 and u_2 with 9t_1 > u_2 > u_1 ≥ s_1,

Proof. This result is the analogue of Lemma 3.13 in [23], and most of the proof goes through with only notational changes. We comment on one important difference. The term (3.137) becomes, using the notation of that reference, The difference is that if −i ∈ A_{q*} \ A_{2}(i), we omit the j = −i term. We will show that in this case the term is negligible, so that it can be reinserted and the proof can be completed in the same way. By the mean value theorem, there exists ξ ∈ R such that Then, using Lemma 3.6, Lemma 3.12, and the hypothesis on i, we have the bound This is the same size as the error obtained for (3.137) and (3.138) in the bound (3.142) of [23].
We obtain the following lemma by integrating.
We now obtain a time averaged version of our desired result.
Proof. We follow the proof of Theorem 3.15 in [23]. There are two changes. First, we must modify the bound in display (3.165) to use our kernel B. Recall that we omit the index i = 0 in w and f. As in our discussion of Lemma 3.13, we shift the positive indices down one in order to apply a Sobolev inequality from Appendix D of [23]. As in the proof of Theorem 3.15 of that reference, we apply the Sobolev inequality to obtain The first two terms are bounded as in the proof in [23]. In particular, using rigidity, we see the second is The third term is, using Lemma 3.11 and the analogous bound for f, (3.76) This error is bounded by N^{−1+δ} for some small δ > 0, which is smaller than the errors obtained when bounding (3.165) in [23]. Hence replacing the old kernel with the new one in the proof is permissible.
Second, in (3.170) we apply our Lemma 3.13 and change (3.171) to use our B.
The removal of the time average as in Theorem 3.16 of [23] goes through without change, since it only uses the abstract properties of U and the bounds in Lemma 3.10.
Theorem 3.17. Fix a and i such that Fix also ε, D > 0. There exists an event F α such that P(F α ) ≥ 1 − N −D and on which the following bound holds.
Then as in the proof of Theorem 3.10 in [23], choosing s 0 and s 1 in Theorem 3.17 such that N s 0 = (N s 1 ) 2/3 and N s 1 = (N t 1 ) 9/10 yields the final homogenization result.

Conclusion
Here we largely follow the argument in Section 3.7 of [23]. We will frequently need to use the above bounds, which hold for fixed α on a set of large probability, for all α ∈ [0, 1] simultaneously when bounding an integrand. This is accomplished using Lemma E.1 and the Remark that follows it in Appendix E of [23].
Recalling u_i = ∂_α ẑ_i, we have

We now use the finite speed of propagation estimates to show that the initial data far from zero makes a negligible contribution to u_i(t_1, α). We perform this estimate in two steps. First, we remark that u is a solution of where ξ satisfies the bound with overwhelming probability for 0 ≤ t ≤ 1. We define v_i as the solution Then by the Duhamel formula, We fix δ_B > 0 and consider i such that |i| ≤ N^{ω_A−δ_B}. By Lemma 3.8 and (3.83), we obtain

For the second step, we fix ε_a > 0 and consider w defined as As in the first step, Lemma 3.8 shows the terms with |j| > N^{ω_A} are negligible. Fix ε_b > 0 such that ε_b < ε_a. Then by Lemma 3.9, (3.90) Therefore, it suffices to estimate Recall that by hypothesis the initial data are symmetric, z_j = −z_{−j}. Hence we just need to estimate and accrue an error of Summing over j ≤ N t_1 N^{ε_a}, then using Lemma 3.10 and the normalization of p_t, gives (3.97) Choosing ε, ε_2, ε_a > 0 small enough shows that for large enough N, for some small δ > 0. Finally, using Lemma 3.6 we have with overwhelming probability for arbitrarily small ε > 0. Hence

Deformed local law
In this section we prove the deformed local law, Theorem 4.5, necessary for the proof of Lemma 3.6. The required ideas are already present in [24], which proved a deformed local law for symmetric matrices. This section consists of adapting these arguments to the singular value case using symmetrization.
The structure of this section is as follows. In Subsection 4.1 we define some notions necessary for the rest of the section, state the main result, and compute Green function elements. In Subsection 4.2 we establish some preliminary estimates, then prove Lemma 4.9, a weak version of the deformed local law. Finally, in Subsection 4.3, we use a fluctuation averaging argument to upgrade this weak law and prove Theorem 4.5.

Deformed Semicircle Law
It is well known that many symmetric matrix ensembles have a macroscopic eigenvalue density that, in the large N limit, obeys the semicircle law, with density The free convolution of the semicircle law with a deterministic diagonal matrix is characterized by a self-consistent equation for its Stieltjes transform, where again the i = 0 term is omitted. There is a unique solution to this equation, and it is the Stieltjes transform of a measure absolutely continuous with respect to Lebesgue measure. We call the associated density ρ^{(N)}_{fc,t}. These facts and basic properties of the free convolution are proved in [8]. We often write m_{fc,t} and ρ_{fc,t} for these quantities, omitting the dependence on N.
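Since the defining display is elided here, the sketch below solves what we take to be the standard self-consistent equation for the symmetrized data {±v_i}, namely m(z) = (1/2N) Σ_{i≠0} 1/(v_i − z − t m(z)), by simple fixed-point iteration; the data v, the spectral parameter z, and the iteration count are illustrative choices.

```python
import numpy as np

def m_fc(v_sym, z, t, iters=500):
    # Fixed-point iteration for m = mean(1/(v - z - t*m)); the iteration
    # stays in the upper half-plane and contracts for Im z not too small.
    m = 1j
    for _ in range(iters):
        m = np.mean(1.0 / (v_sym - z - t * m))
    return m

v = np.array([0.5, 1.0, 1.5, 2.0])
v_sym = np.concatenate([v, -v])          # {±v_i}, zero index omitted
z, t = 0.1 + 1.0j, 1.0
m = m_fc(v_sym, z, t)
residual = m - np.mean(1.0 / (v_sym - z - t * m))
print(abs(residual), m.imag > 0)

# Consistency check: for v = 0 and t = 1 the equation reduces to
# m = 1/(-z - m), i.e. m^2 + z*m + 1 = 0, the semicircle Stieltjes transform.
m0 = m_fc(np.zeros(8), z, 1.0)
print(abs(m0**2 + z * m0 + 1))
```

The v = 0 check recovers the semicircle Stieltjes transform, matching the statement that the t → 0 density is governed by the semicircle law.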
We now state a result on the stability of m fc,t , proved in Lemma 7.2 of [24]. Recall the notion of (g, G)-regularity introduced in Definition 3.1.
The constants do not depend on σ or q.
for all v i . Both statements hold uniformly in N .
(ii) We have

Stochastic Domination
We recall the notion of stochastic domination introduced in [15].
be two sets of nonnegative random variables, where U^{(N)} is a possibly N-dependent parameter set. We say that X is stochastically dominated by Y, uniformly in u, if for all ε > 0 and D > 0 we have

sup_{u ∈ U^{(N)}} P( X^{(N)}(u) > N^ε Y^{(N)}(u) ) ≤ N^{−D}

for large enough N ≥ N_0(ε, D). The stochastic domination is always uniform in all parameters that are not explicitly stated. If X is stochastically dominated by Y, uniformly in u, we write X ≺ Y. If for some complex family X we have |X| ≺ Y, we also write X = O_≺(Y).
We say a sequence of events E = (E^{(N)}) holds with high probability if 1 − 𝟙(E) ≺ 0.
We also recall the basic properties of ≺.
The following large deviations estimates will be important for our work. Proofs may be found in, for example, [7].
for all p ∈ N with some constants µ p . In the following statements, Ψ can be any random variable.
If all of the above random variables depend on an index u and the hypotheses of (i) -(iii) are uniform in u, then so are the conclusions.

Model and main result
We consider, as in Section 3, a deterministic initial data matrix V. We let W be a matrix of i.i.d. random variables with distribution N(0, N^{−1}) and define Recall that the eigenvalues of H_t, which we will call λ_i(t), are the singular values of M_t together with their negatives. We also define suppressing the dependence of m_N on t in our notation.
In what follows we fix δ 1 > 0 and always suppose that V is (g, G)-regular, as defined in Definition 3.1. We also suppose z ∈ D and t ∈ T σ , which are defined by fixing σ > 0, q ∈ (0, 1), and setting The following theorem is the main result of this section.
Theorem 4.5. Let H_t be defined as above for some (g, G)-regular V. Fix q ∈ (0, 1) and σ > 0. Uniformly for t ∈ T_σ and z ∈ D, we have (4.8)

Reduction to diagonal V
In the proof below, we will assume that V is diagonal with entries {v i } N i=1 . If V is a general deterministic matrix, it has a singular value decomposition V = ADB † with D diagonal and A, B unitary. Since the Gaussian ensemble W defined above is invariant under multiplication on the left and right by unitary matrices, we have for a Gaussian ensemble W ′ with the same entry distribution as W . Hence V + √ tW and D + √ tW ′ have the same distribution of eigenvalues, and we have shown that without loss of generality we may consider diagonal initial data.
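The deterministic half of this reduction, that V + X and D + A†XB share the same singular values for any fixed X once V = ADB†, can be checked directly (the distributional step, that A†WB has the law of W for Gaussian W, is the other ingredient). A sketch with illustrative sizes:

```python
import numpy as np

# Writing V = A D B^T (SVD), we have V + X = A (D + A^T X B) B^T, so the
# two matrices have identical singular values for every fixed X.
rng = np.random.default_rng(4)
n = 6
V = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))        # stands in for sqrt(t) * W
A, d, Bt = np.linalg.svd(V)            # V = A @ diag(d) @ Bt
D = np.diag(d)
s1 = np.linalg.svd(V + X, compute_uv=False)
s2 = np.linalg.svd(D + A.T @ X @ Bt.T, compute_uv=False)
print(np.allclose(np.sort(s1), np.sort(s2)))  # True
```

Combined with the orthogonal invariance of the Gaussian ensemble, this is why diagonal initial data may be assumed without loss of generality.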

Green Functions
We define the Green function matrix (4.10) For concreteness, we compute G 1,1 . The other G i,i are analogous. We often write G ij for G i,j . Applying the Schur complement formula with index set T = (1, N + 1) yields where

(4.14) Notice that A 12 = A 21 , as G (T) is symmetric. Further, we may apply the Schur complement formula to the (N − 1) × (N − 1) blocks of G (T) and take the trace to see so EA 11 = EA 22 = m (T) (z). From the formula for the inverse of a 2 × 2 matrix, we find (4.17) Set (4.20) Set, for i > 0, We define and E 4 similarly. Then . (4.24) Define Our expression for G ii is, with K i and B i defined similarly,
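The Schur complement computation above can be illustrated numerically. For simplicity the sketch uses the single-index form of the formula (the text uses the two-index set T = (1, N + 1)); the matrix and spectral parameter are illustrative.

```python
import numpy as np

# Schur complement formula for a resolvent entry:
# G_ii = 1 / (h_ii - z - sum_{a,b != i} h_ia G^{(i)}_ab h_bi),
# where G^{(i)} is the resolvent of the minor with row/column i removed.
rng = np.random.default_rng(5)
n = 7
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
z = 0.3 + 0.2j
G = np.linalg.inv(H - z * np.eye(n))

i = 0
keep = np.arange(1, n)
Hi = H[np.ix_(keep, keep)]                 # minor H^{(i)}
Gi = np.linalg.inv(Hi - z * np.eye(n - 1)) # G^{(i)}
h = H[i, keep]
schur = 1.0 / (H[i, i] - z - h @ Gi @ h)
print(np.isclose(G[i, i], schur))  # True
```

The two-index version used in the text is the same computation with a 2 × 2 block in place of the scalar h_ii.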

Weak law
We prove some bounds necessary for our bootstrap argument.
Proof. This follows as in the proof of Lemma 7.5 in [24], with the minor change that we are using m instead of m fc,t in the definition of g i . However, if |m fc,t − m| ≤ (ηN ) −1/2 , then by Lemma 4.1 we also have a lower bound Im m ≥ c, and this suffices to complete the proof.
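The proofs in this subsection repeatedly invoke Ward's identity for resolvents, Σ_j |G_ij|² = Im G_ii / η for z = E + iη. This standard identity is easy to confirm numerically on an illustrative symmetric matrix:

```python
import numpy as np

# Ward's identity: sum_j |G_ij|^2 = Im G_ii / eta for G = (H - z)^{-1},
# H real symmetric, z = E + i*eta.
rng = np.random.default_rng(6)
n = 8
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
E, eta = 0.2, 0.1
G = np.linalg.inv(H - (E + 1j * eta) * np.eye(n))
i = 2
lhs = np.sum(np.abs(G[i, :]) ** 2)
rhs = G[i, i].imag / eta
print(np.isclose(lhs, rhs))  # True
```

The identity follows from the resolvent identity G − G* = 2iη G G*, and is what converts row sums of |G_ij|² into the single diagonal quantity Im G_ii that the local law controls.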
Proof. We first consider B 1 . Note that by the assumption we have a stability bound for m similar to Lemma 4.1, and this shows that it suffices to bound r and A 22 − m (T) . The first inequality below follows from the eigenvalue interlacing lemma (see, for example, Lemma 7.5 in [18]). For the second, we apply Lemma 4.4 and then Ward's identity to obtain The final bound follows from the assumption on m − m fc,t and the upper bound on Im m fc,t in Lemma 4.1. Note that these bounds are independent of i. We now consider K 1 . By the stability bound (4.2), and by the large deviations bound Lemma 4.4, It remains to bound E 3 and E 4 . We treat only E 3 , as E 4 is similar. Using (4.2), and noting that min(ct, η) ≤ |v 1 − z − tm fc,t |, which implies √ cηt ≤ |v 1 − z − tm fc,t |, The r term was already bounded, and similar large deviations arguments apply to A 12 and E 2 .
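Ward's identity, Σ j |G ij | 2 = Im G ii /η for a symmetric matrix, is used repeatedly above; a quick numerical check with an illustrative matrix (a sanity check only, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
M = rng.normal(size=(n, n))
M = (M + M.T) / 2                  # symmetric matrix, so the resolvent is normal
E, eta = 0.2, 0.05
z = E + 1j * eta
G = np.linalg.inv(M - z * np.eye(n))

# Ward's identity: sum_j |G_ij|^2 = Im G_ii / eta, for each fixed row i
i = 0
lhs = np.sum(np.abs(G[i, :]) ** 2)
rhs = G[i, i].imag / eta
```

The identity follows from the resolvent identity G G † = (G − G † )/(2iη), valid whenever M is symmetric.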
Because w 11 has subexponential decay, Again, these bounds are independent of i. For the final claim, we apply the Cauchy-Schwarz inequality and note We first bound the first factor on the right side of (4.37). By the stability bound Lemma 4.1, we get Taking imaginary parts in the equation that defines m fc,t gives (4.41) We now consider the second factor in (4.37). We may sum the representation (4.26) for G ii , and then use Lemma 4.6 to control the sum of the errors. There is a factor of log N that is absorbed by the stochastic domination.
Taking imaginary parts in (4.42) gives Then, using Lemma 4.1 to write 2 Im m ≥ Im m − R for z ∈ D and absorbing the error term, we have Putting these bounds on each factor together, the expression we want to control is bounded above by (4.46) We also prove a priori bounds.
Lemma 4.8. If η ≥ 1, then, Proof. The first two bounds are proved as before, except we use the trivial estimate instead of |m − m fc,t | ≤ N −c to estimate Im m (T) . For the last bound, we use Then We now prove the weak local deformed law at the optimal scale using a bootstrapping argument. Our presentation follows [7]. Proof. First, note that both m and m fc,t are N 2 -Lipschitz continuous on D. For m this is well known, and for m fc,t this is Lemma A.1 of [23]. It then suffices to prove the statement for the lattice D̂ = D ∩ (N −3 Z 2 ). We will verify at the end of the proof that for z ∈ D̂ with η ≥ 1 the claim follows from Lemma 4.8. We proceed on the assumption that the claim is true for such z.
Hence for z = z 1 , in (4.55) we have |R| ≺ (N η) −1/2 , and the second factor on the left side is bounded below for large enough N . To see this, write (4.58) The first term is bounded below by Lemma 4.1, and the error term is o(N −c ) by Lemma 4.7, because it equals Now we may apply this reasoning sequentially for all k such that z k ∈ D. Note that the c > 0 used to verify the hypothesis of Lemma 4.7 can be chosen to be the same for each step, so this lemma needs to be invoked only once. The conclusion follows by noting that P(∩ k Ω k ) can be made larger than 1 − N −D 1 for any D 1 by taking D large enough.
For z ∈ D̂ with η ≥ 1, we can use the same argument with the bounds in Lemma 4.8, which hold unconditionally, so there is no need for a bootstrapping argument. In particular, we do not need to use Lemma 4.6, since we have the trivial bound |g i | ≤ η −1 ≤ 1.

Strong Law
We now improve the bound of Lemma 4.9 using fluctuation averaging. For any random variable X, let Q i X denote the conditional expectation of X with respect to the ith column and ith row of M t . For I = (i, i + n), J = (j, j + n), and any index set T containing pairs (k, k + n), set With i ≤ n and T = (i, n + i), Schur's complement formula gives and then Lemma 4.10 gives (4.66) Define the deterministic quantities (4.67) Lemma 4.11. We have Proof. Note that with |ε k | ≺ (N η) −1/2 by the weak law Lemma 4.9 and eigenvalue interlacing. We conclude using the same algebraic manipulations as in Section 4.1.5.
Lemma 4.14. The following claims hold for z ∈ D and T, S with |T|, |S| ≤ log N , and such that T and S are composed of pairs (k, k + n).
(i) We have where E is a matrix such that ∥E∥ ≺ (|g i | + |g −i |)(N η) −1 and this bound is uniform in i. Hence and therefore (4.77) (iv) For I ≠ J, and T not containing I or J, and therefore

(4.81)
and hence (4.78) holds for any superscripts T and S, and hence (4.80) holds for any combination of superscripts on the Green function elements. Proof.
(i) We have Here we have used the Cauchy interlacing theorem at most log N times to split off the error |r| ≺ (N η) −1 , and the log N factor can be absorbed by the stochastic domination. The first claim now follows from the same algebraic manipulations as in our discussion of the Green functions. The second claim follows from the stability bound Lemma 4.1 and the weak law Lemma 4.9.
(ii) The first claim follows from using the representation of G −1 II developed in our discussion of the Green function, only now applied in the same way to G (T) II . (iii) The first claim is just a special case of (ii). The second follows from the explicit computation of the entries of G II in Section 4.1.5 and the stability bound |g i | ≤ Ct −1 .
(iv) We first establish a Green function identity. We apply the Schur complement formula on the index set I = (i, i + n) as in Section 4.1.5, except now we concentrate on the upper off-diagonal block. We obtain which implies We bound just the first entry of G II G IJ , as the rest are similar. This entry is In the last step, we used the bound Im G (I) kk ≤ Im m (I) ≤ log(N ) and absorbed the log N factor into the stochastic domination. For the second claim, we have a similar identity, obtained in the same way using the other off-diagonal block in the Schur complement formula.
We can expand this using the first identity to obtain As shown in Section 4.1.5, up to an O(1) factor, we have The middle factor has a norm that is stochastically dominated by where we used the large deviations bounds Lemma 4.4 and the Ward identity as before. The lemma follows from the bounds |g i |, |g −i | ≤ min(t −1 , η −1 ). The third claim follows from the first two. Clearly we could repeat this argument for any T satisfying the given hypotheses.
(v) First, we show this implies the modification of (4.78). We want to bound the relevant Green function entries using the explicit representation there.
We record an elementary fact for later use. The following arguments are based on the proof of Lemma 7.15 in [24]. (4.95) Proof. We proceed as in the proof of Lemma 4.7 in Appendix B of [16], by computing moments. We first consider the case p = 2. Recall that if Y is a random variable independent of the ith row and ith column, then EQ i (X)Y = EQ i (XY ) = 0. We have By Lemma 4.14, and the fact that, up to a constant, the norm of a matrix bounds its trace, we have (4.98) For A 2 , we note that S i is deterministic, so we may move it inside the Q i .
By Lemma 4.13, we can write Recall that if Y is a random variable independent of the jth row and jth column, then EQ j (X)Y = EQ j (XY ) = 0. So the first term will cancel when multiplied against the Q j (·) term. We are left with multiplying two terms of the following form.
Using Lemma 4.14 again, we see that this is stochastically dominated by (N η) −1 (|g i | + |g −i |). Specifically, we use (4.80) on the middle three terms and (4.76) on the outside pairs. Then Now we sum over i and j separately, picking up logarithmic factors by Lemma 4.12, and obtain the bound We now discuss even p > 2. Our strategy, as in [24], is to adapt the proof of Theorem 4.7 in [16] to the deformed case. Applying Lemma 4.15 to M = S i Q i G −1 II S i p yields, using the notation in [24], We now apply the algorithm in [16] to decompose each term, using the analogous substitutions (which also hold with any superscript T added to the Green function matrices) We see that these substitutions create non-commutative polynomials in the Green functions with the properties that inverted Green function matrices alternate with non-inverted ones, and that the lower indices pair across adjacent terms. The algorithm yields binary strings σ k and an expansion into monomials.
(4.108) Set Φ = (N η) −1/2 . We now claim it suffices to establish the key bound where s is the number of lone labels. This is proved in Lemma 4.17, which follows this proof. Assuming this lemma, we now conclude the proof. It follows from Lemma 4.17 that for a partition Γ with l = |Γ|, where the d l are the sizes of the blocks in Γ. Then |f i | ≤ η −1 implies the bound (4.112) Proof. We claim that if b(σ k ) is the number of ones appearing in σ k , then If b(σ k ) = 0, then we have a single term of the form S i k Q i k (G (T ) II ) −1 S i k , and using Lemma 4.14 we have If b(σ k ) ≥ 1, we can proceed similarly, except each interior pair of diagonal and off-diagonal terms gains a factor of Ψ by (4.78), and there are b(σ k ) of these. Now we have various cases.
(i) One of the monomials (F i k ) σ k is not maximally expanded. Then (F i k ) σ k contains ≥ 2p off-diagonal Green function entries, and b(σ k ) ≥ 2p − 1 ≥ 2r + s. The result follows from the previous discussion.
(ii) Every monomial is maximally expanded, and for every lone label a there is a label b ∈ {1, . . . , p} \ {a} such that the monomial (F i b ) contains an off-diagonal resolvent entry with lower index I a . Then b(σ k ) ≥ s and again we are finished.
(iii) Every monomial is maximally expanded, and there is a lone label without a matching lower index in another monomial, say i 1 . Then, since the S i are deterministic and the other terms are maximally expanded, the other S i k Q i k (F i k ) σ k S i k are independent of the i 1 row and column. Then we can write

Proof. This follows from Lemma 4.16 and Markov's inequality.
Proof of Theorem 4.5. We consider each term in (4.66), average over the index i, and take the trace. For the first term we get g i + g −i + error, (4.118) where It remains to show the contributions from averaging the last two terms are negligible. The second term is dealt with by using Lemma 4.11 to make the replacement and then Lemma 4.18 to bound the averaged error from this replacement. The largest terms in the error have the form By Lemma 4.14, the third term is Again by Lemma 4.14, the average over the third term is negligible. Note the average over the |g i | ± |g −i | picks up a negligible log N factor, as above. Finally, we have where |R| ≺ (N η) −1 , and we can repeat the proof of Lemma 4.9 with the improved error (N η) −1 in place of (N η) −1/2 to conclude.

Removal of time evolution
In this section we show how to complete the proof of universality given the main homogenization result Theorem 3.2. In Subsection 5.1 we prove a local law for sparse matrices, which is necessary in the following subsections. In Subsection 5.2 we prove short-time universality for sparse random matrices. Finally, in Subsection 5.3 we show how to remove the time evolution through a Green function comparison argument.

Sparse local law
For clarity, in this subsection we consider just the case of sparse ensembles where the variances are equal, s ij = N −1 ; the case of a general doubly stochastic variance matrix satisfying the conditions in Definition 2.1 can be handled with minor modifications. The key point is that in each of these cases the limiting spectral distribution is the semicircle law. More general variance matrices (which give rise to new limiting distributions), along with correlated entries, are considered in Section 6. We prove a weak local law for sparse matrices following the methods initiated in [17]. We recall that our model is M = B + f |w⟩⟨w|, using the notation of Section 2. The next lemma implies it is enough to prove such a law for B. Let the singular values of M be (µ i ) N i=1 and the singular values of B be (λ i ) N i=1 . Lemma 5.1. The singular values of M and B are interlaced. Proof. Note that M is a rank-one perturbation of B. The result follows from Weyl's inequality and majorization for singular values.

Proof. This follows from Markov's inequality.
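The interlacing in Lemma 5.1 can be illustrated numerically: for a rank-one perturbation, Weyl's inequality for singular values gives µ i ≤ λ i+1 and λ i ≤ µ i+1 in ascending order. A sketch with illustrative Gaussian data (not the sparse model itself):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 12
B = rng.normal(size=(N, N))
u = rng.normal(size=(N, 1))
w = rng.normal(size=(N, 1))
M = B + u @ w.T                    # rank-one perturbation, analogous to f|w><w|

lam = np.sort(np.linalg.svd(B, compute_uv=False))   # ascending singular values of B
mu = np.sort(np.linalg.svd(M, compute_uv=False))    # ascending singular values of M

# Interlacing: mu_i <= lam_{i+1} and lam_i <= mu_{i+1} (1-based, ascending)
inter1 = bool(np.all(mu[:-1] <= lam[1:] + 1e-12))
inter2 = bool(np.all(lam[:-1] <= mu[1:] + 1e-12))
```

Both inequalities follow from Weyl's bound s i+j−1 (X + Y ) ≤ s i (X) + s j (Y ) with Y of rank one, applied in each direction.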
We also require a slight modification of Lemma 3.8 from [17].
Lemma 5.3. Let a 1 , . . . , a N be centered and independent random variables satisfying for all p. Then for any A i ∈ C and B ij ∈ C, where σ 2 i is the variance of a i and Further, if a 1 , . . . , a N and b 1 , . . . , b N are independent random variables satisfying the above moment condition, then for B ij ∈ C we have Repeating the Green function calculations for the deformed case, we have (5.12) For any δ > 0, set (5.13) We will proceed largely as in the proof of the deformed law. The key difference is that we have better stability for m sc (see Lemma 6.2 in [18]), so we can prove the local law on a larger spectral domain.
Lemma 5.4. Suppose z ∈ D δ . Let φ be the indicator function of some event, which may depend on z.
Proof. For concreteness, we consider B 1 , but our bounds will be uniform in i. By the stability bound for m sc and the hypothesis on m, we have |(−z − m) −1 | ≤ C. By Cauchy's interlacing lemma, r ≤ C(N η) −1 . Using Lemma 5.3 and reasoning as in the proof of the deformed law, we have Combining these completes the proof for the B i . For the K i , it remains to bound E 3 and E 4 . As these are similar to what was done before, we just sketch the proof for E 3 .
The r term was already bounded. Large deviations arguments using Lemma 5.3 suffice to bound A 12 and E 2 , and h 11 is bounded using Lemma 5.2. Combining these completes the proof. For the final bound, we write .
(5.18) The first term equals 1 − m 2 sc and by Lemma 6.2 of [18] it is bounded above by a constant in D δ . The second term is O(N −c ), which follows from the hypotheses and the bounds in the aforementioned lemma.
Lemma 5.5. For z ∈ D δ , Let φ be the indicator function of some event, which may depend on z. If φ(Λ o + Λ d ) ≺ N −c for some c > 0, then Proof. The first claim is proved the same way as in display (3.39) in [17]. We use the explicit expression (5.10) for G ii above to compute The claim follows by fixing i and averaging over j.
The second claim is proved as in Lemma 3.13 of [17]. We use By hypothesis we find By the Ward identity, where the last inequality follows from repeated use of the above. Taking the maximum over i ≠ j gives which implies the claim.
Lemma 5.6. If η ≥ 2, then, Proof. The proof is similar to the above using the trivial estimates for η ≥ 2 as in Lemma 4.8.
Lemma 5.7. If η ≥ 2, Proof. The bound on Λ o follows from the same calculation as in Lemma 5.5, where we now use Lemma 5.6 to bound the error terms and the trivial estimate |G ij | ≤ η −1 to bound the Green function entries. For Λ d we estimate using (5.10), where we used |z + m sc | = |m sc | −1 ≥ 2 and |m − m sc | ≤ 1. (5.32) The conclusion follows by combining the Λ d terms on the left and using Lemma 5.6 to bound R. Let H be the symmetric 2N × 2N block matrix formed from M , and define m̃ to be the Stieltjes transform of H.
Proof. As noted above, it is enough to establish the theorem for m, the Stieltjes transform of K. Define, following the proof of Lemma 4.9, the lattice D̂ δ = D δ ∩ (N −3 Z 2 ). We have already shown in Lemma 5.7 that the claim holds for z ∈ D̂ δ with η ≥ 2. As in the proof of the deformed weak law, it suffices to prove the result holds uniformly for elements of the lattice D̂ δ with η < 2. Define η k = 2 − kN −3 and z k = E + iη k . Fix σ > 0 and D > 0, and define It is well known that m sc satisfies the self-consistent equation m sc (z) = (−z − m sc (z)) −1 . Using Lemma 5.6, we may Taylor expand (5.10) to find where |R ′ | ≺ q −1 + (N η) −1/2 . By the stability estimate Im m sc ≥ c, and working on the set Ω 0 to control Λ, we have where |R| ≺ q −1 + (N η) −1/2 . Subtracting the two self-consistent equations yields This shows that Λ d (z 1 ) ≤ N −c for some c > 0, and similar reasoning applies to Λ o . Then, by Lemma 5.4, the coefficient of m − m sc in the self-consistent equation is bounded below and |R| ≺ q −1 + (N η) −1/2 . Hence Λ(z 1 ) ≺ (N η) −1/2 + q −1 . We conclude by Lemma 5.5 that We may apply this reasoning sequentially for all k such that z k ∈ D δ . The conclusion follows by noting that P(∩ k Ω k ) can be made larger than 1 − N −D 1 for any D 1 by taking D large enough.
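The self-consistent equation for m sc can be solved in closed form and checked against the Stieltjes transform of the semicircle density; a short numerical sketch (a sanity check only, not part of the proof):

```python
import numpy as np

# The semicircle Stieltjes transform solves m_sc(z) = 1/(-z - m_sc(z)),
# i.e. m_sc^2 + z m_sc + 1 = 0, with the branch Im m_sc > 0 for Im z > 0.
def m_sc(z):
    root = np.sqrt(z * z - 4 + 0j)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

z = 0.3 + 0.1j
m = m_sc(z)
residual = abs(m - 1.0 / (-z - m))

# Compare with the Stieltjes transform of the density sqrt(4 - x^2)/(2 pi) on [-2, 2]
x = np.linspace(-2.0, 2.0, 400001)
rho = np.sqrt(np.maximum(4.0 - x**2, 0.0)) / (2.0 * np.pi)
integral = np.sum(rho / (x - z)) * (x[1] - x[0])
```

Both roots of the quadratic solve the fixed-point equation exactly; the branch with positive imaginary part is the one matching the integral representation.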

Short time universality for sparse matrices
Let M N be a sparse matrix ensemble and define H by forming a 2N × 2N symmetric block matrix as in (2.7): We will now define the perturbed matrix H t . In order to accommodate the possibly unequal variance structure, we do not simply add a Gaussian matrix. Instead, we evolve the nonzero entries of H according to the following Ornstein-Uhlenbeck dynamics from [10].
Here the B ij are independent Brownian motions. The entries of the evolved matrix H t satisfy We choose this dynamics because it preserves the mean and variance of the initial entries, and because the resulting entries are Gaussian divisible. With r = min{N s ij } and G a 2N × 2N symmetrized version of an N × N ensemble of independent standard Gaussian variables, the B̃ ij are a family of symmetric, independent Brownian motions. Note that √ t ≍ r(1 − exp(−t/r)) (5.46) because r is bounded below. Before invoking Theorem 3.2, we prove a lemma that helps show sparse matrices satisfy the hypothesis of being (g, G)-regular. We now obtain short time universality for the new dynamics. Proof. Recall that for any t we have a local law for H (1) t on the optimal scale g = N −1+ν for any ν > 0. Further, by invoking Lemma 5.9, this matrix is (g, G)-regular for any such g. The additional factor of 1 + O(t) does not affect the (g, G)-regularity of the singular values if t ≤ N −1−ε by the argument at the end of the proof of Lemma 6.3 in [22]. In summary, we find that H (1) t is (g, G)-regular for any g = N −1+ν with overwhelming probability. By making ν small enough and choosing constants appropriately in the statement of Theorem 3.2, we may take t a ≤ N −1+ε in the statement of that theorem and apply it to complete the proof. More precisely, we apply Theorem 3.2 after conditioning on the entries of H (1) ta . Because the hypotheses of Theorem 3.2 hold for the singular values of H (1) ta with overwhelming probability, by the weak law Lemma 5.8 and Lemma 5.9, we obtain the conclusion with overwhelming probability.
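The mean- and variance-preservation property of the Ornstein-Uhlenbeck dynamics can be read off from the moment equations of the explicit solution. A minimal sketch, assuming the standard OU form dh = −(1/2)h dt + √ s dB (the precise normalization in the displays above may differ):

```python
import numpy as np

# Assumed illustrative OU form: dh = -(1/2) h dt + sqrt(s) dB, whose solution is
# h(t) = exp(-t/2) h(0) + a Gaussian term of variance s (1 - exp(-t)).
s = 0.25              # target entry variance, playing the role of s_ij
t = 1.7
mean0, var0 = 0.0, s  # initial entry is centered with variance s

mean_t = np.exp(-t / 2) * mean0
var_t = np.exp(-t) * var0 + s * (1.0 - np.exp(-t))
```

Starting from a centered entry of variance s, the mean stays zero and the variance stays exactly s for all t, while the Gaussian component makes the entry Gaussian divisible.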

Green function comparison
We now follow the argument in Section 4 of [28] to control the distribution of the least singular value of a stable matrix ensemble H N in terms of Green functions. Fix a matrix H from this ensemble. We retain the notation H t for the dynamics in the previous subsection.
We fix ε, r > 0 and set We also consider a time parameter t. We will be interested in the case 0 ≤ t ≤ N ε 0 /N for some small ε 0 > 0. For t ≤ N ε 0 /N , note that H t still satisfies the weak local law at the optimal scale. As described in the proof of Theorem 6.3 of [22], it is a consequence of the weak local law (in particular the fact that Im m(z) is bounded down to the optimal scale) that there exists C such that, for any interval I with length |I| ≥ N −1+δ , holds with overwhelming probability.
Lemma 5.11. Fix t such that 0 ≤ t ≤ N ε 0 /N and ε > 0. With overwhelming probability, Proof. By the argument in the proof of Lemma 6.1 in [34], where d 1 = |E − x| + η 1 and d 2 = | − E − x| + η 1 , and the right side is bounded by a constant if min{d i } ≤ l and is O(η 1 /l) if min{d i } ≥ l. By (5.52), we obtain, on a set of probability greater than 1 − N −D , We now describe how to bound Tr f 2 (H t ); the term Tr f 1 (H t ) is similar. For x ≥ E + l, we have We consider the N intervals where the first is of a different size than the rest. By (5.52), each interval contains at most N ε eigenvalues. We also consider the interval [3, ∞). Using this decomposition, we obtain (5.60) This completes the proof.
Lemma 5.12. Fix t such that 0 ≤ t ≤ N ε 0 /N and ε > 0. There is a constant C such that, with overwhelming probability, Proof. We see that Lemma 5.11 holds with E replaced by y ∈ [E − l, E + l]. Recall l 1 = lN 2ε . Hence, with high probability, By (5.52), the two counting functions in the above expression are at most CN ε , so we obtain an error of N −ε . A matching lower bound is proved similarly.
As in [28], we fix a smooth function q : R → R + such that q(x) is decreasing for x ≥ 0, q(x) = 1 for |x| ≤ 1/9, and q(x) = 0 for |x| ≥ 2/9. Lemma 5.13. Fix t such that 0 ≤ t ≤ N ε 0 /N and ε > 0. For any D > 0, we have Proof. When Lemma 5.12 holds, n(−E, E) = 0 implies Tr χ E−l 1 ⋆ θ η 1 (H t ) ≤ 1/9 with overwhelming probability. Hence and Markov's inequality applied to q(Tr χ E−l 1 ⋆ θ η 1 (H t )) yields Also, again using Lemma 5.12, We require the following modification of Lemma A.1 in [10], which asserts the continuity of the above dynamics. The proof is essentially the same. The deformed matrix θ ab H t is defined as (θ ab H t ) kl = f + θ ab kl (h kl (t) − f ), where θ ab kl = 1 if {k, l} ≠ {a, b} and θ ab kl is some number in [0, 1] otherwise, and we impose the symmetry condition θ ab ab = θ ab ba . We define the index set I to be the entries in the off-diagonal blocks of the 2N × 2N symmetrized matrix: where the supremum is taken over deformations in the off-diagonal block indices (i, j) ∈ I. Then We will use Lemma 5.14 to study the expressions appearing above involving ∫ Im m(y + iη 1 ) dy. Proof. It suffices to control the derivatives of

(N/π) ∫_{−E−l}^{E+l} Im m(y + iη 1 ) dy (5.76) in order to apply Lemma 5.14, since the derivatives of q are bounded independently of N . Recall here that η 1 = N −1−99ε is below the natural scale. By Lemma 5.16, which is proved below, we have for any D > 0 and s ∈ [0, t], with a deterministic upper bound of CN 3(1+100ε) . Then, moving the derivatives inside the integral, bounding the resulting integrand, and using the fact that the factor of N is canceled because the integral is over an interval of length O(N −1 ), we obtain where σ 1 can be made as small as desired by adjusting ε. Hence B = O(N 1−σ ) for some σ > 0. We conclude by choosing ε small enough to make Bt = o(1).
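Smoothed counting expressions of the form (N/π) ∫ Im m(y + iη 1 ) dy approximate the number of eigenvalues in the integration window once η 1 is below the local eigenvalue spacing. A numerical sketch with an illustrative Wigner-type matrix, evaluating the integral in closed form:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
A = rng.normal(size=(n, n))
M = (A + A.T) / np.sqrt(2 * n)      # Wigner-type matrix, spectrum roughly [-2, 2]
eigs = np.linalg.eigvalsh(M)

# Choose an interval [a, b] whose endpoints sit between consecutive eigenvalues
a = (eigs[49] + eigs[50]) / 2
b = (eigs[149] + eigs[150]) / 2
exact = 100                          # eigenvalues with indices 50..149 lie in (a, b)

# (n/pi) * Integral_a^b Im m(y + i eta) dy with m(z) = (1/n) Tr (M - z)^{-1};
# each eigenvalue contributes a Lorentzian, integrated exactly via arctan.
eta = 1e-4
count = np.sum(np.arctan((b - eigs) / eta) - np.arctan((a - eigs) / eta)) / np.pi
```

With eta far below the spacing, each eigenvalue inside the window contributes nearly 1 and each eigenvalue outside nearly 0, so the smoothed count matches the exact count up to a small error.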
Lemma 5.16. With the notation above, Proof. We first suppose that each deformed matrix θ ab H s has Green function elements G ij (z) bounded by a constant C, independent of all other parameters, for energies E ∈ [−1, 1] and η ≥ N −1+ε , with overwhelming probability. We fix s ∈ [0, t] and indices a, b. Let G(z) be the Green function for this matrix. Defining Γ accordingly, with η = N −1+ε , we have Γ(E + iη) ≤ C for some constant C with overwhelming probability. Given this control over the individual Green function elements, the argument used in the proof of Lemma 5.2 in [22] proves the first claim. The second claim follows from the trivial deterministic bound We now show that we have a uniform constant bound on the deformed G ij . This follows from the proof of Theorem 6.3 in [22]. Note that in that reference the bound is obtained with overwhelming probability for fixed s, and a standard stochastic continuity argument gives the bound uniformly in s.

Random matrices with correlated entries
In this section, we prove the universality of the least singular value for a class of non-symmetric square random matrices with correlated entries. We first sketch the proof of a local law for the smallest singular values. The method is based on [12], with only slight differences. We then show how to adapt the method of the previous section on the removal of the time evolution. As a consequence, universality holds for the smallest singular value. The same result can be proved for non-Hermitian random matrices with complex entries in the same way.

Correlated model
Consider a family of centered real random variables (x ij ) 1≤i,j≤N that satisfy We introduce a sparsity parameter q = N τ for some τ ∈ (0, 1]. Assume that for any p ≥ 2, there is a constant µ p < +∞ such that Furthermore, we assume that ξ has a profile, in the sense that there is a function φ : [0, 1] 2 × Z 2 → R such that ξ ijkl = φ(i/N, j/N, k − i, l − j), and that φ is piecewise Hölder-continuous with respect to the first two variables. We impose exponential decay on the correlations. For any index set A ⊆ {(i, j) : 1 ≤ i, j ≤ N } we define F A to be the σ-algebra generated by (X ij ) (i,j)∈A . For any two index sets A, B we define their distance d(A, B). We assume that there are universal constants c 1 , c 2 > 0 such that for any random variables Z 1 ∈ F A , Z 2 ∈ F B with Var[Z 1 ] = Var[Z 2 ] = 1, the following inequality holds: Corr[Z 1 , Z 2 ] ≤ c 1 exp(−c 2 d(A, B)).
This is a weaker condition than the finite-ranged correlation in [12], because every pair of entries may be correlated, although exponentially weakly. Nevertheless, the same concentration estimates hold for linear combinations and quadratic forms (see Lemma 3.6 in [12] for the finite-ranged case).
which is bounded by exp(−c log 2 N ) for some constant c > 0. Therefore the first inequality is reduced to proving We split the sum into ⌊log 2 N ⌋ parts, each part being a sum of weakly correlated random variables. Specifically, write S l := Σ i A i⌊log 2 N ⌋+l , so that Heuristically, each S l can be viewed as a sum of independent random variables, because the summands have very weak correlation with each other. Let Ã i , x̃ ki be independent copies of A i and x ki , and let S̃ l be defined likewise by replacing A i , x ki with their copies. It is easy to see that for any p ≥ 2, E|S l | p = E|S̃ l | p + O(c p exp(−c log 2 N )),
by expanding out |S l | p and collecting all the cross terms, which are exponentially small. This implies that |S l | ≺ |S̃ l | + O(exp(−c log 2 N )).
For S̃ l one can apply Lemma A.1 in [17] to get Therefore, summarizing the estimates above, we see The factor log 2 N can be absorbed into ≺ by definition. This proves the first inequality. The second inequality follows from a very similar argument. One only needs to split the sum into O(log 2 N ) parts, each of which is a sum of weakly correlated random variables. The weak correlation does not worsen the estimate, as we saw in the proof of the first inequality.
As usual, we write the symmetrized version of X, namely a 2N by 2N matrix H defined by It is easy to see that H is a special case of the model in [12], without the positive definite condition (see Definition 2.2 in [12]), since H has many zero entries. However, we can still consider an alternative positive definiteness condition in the current case, formulated in terms of the tensor ξ ijkl . We say that ξ is positive definite with lower bound c 0 > 0 if Σ (N ) ≥ c 0 for all N .

Self-consistent equation
As usual we define the Green function G(z) through We introduce two stochastic control parameters: With the help of Lemma 6.1 and repeating the argument of Lemma 3.9 in [12], one can prove an estimate that is the same as Lemma 3.9 in [12]: Here the O ≺ notation is in the entrywise sense. We omit the details, but point out that the only difference in the proof is that the error term here is bigger by a factor of O(log 2 N ), which is negligible in the context of stochastic domination. It remains to show that G is close to the solution M of the following equation.
The whole argument in [12] goes through except for the part where the positive-definiteness condition on the tensor ξ is used to prove the stability of (6.4) (that is, the solution is stable under small perturbation of the equation). We now discuss the necessary changes.
In [12] the equation was transformed into a continuous version. It was shown by a discretization argument that (6.4) is stable in the bulk if and only if the following continuous equation is stable in the bulk.
Here C 0 is a universal constant depending on c 1 , c 2 > 0 in (6.1).
Proof. For any (s, t) ∈ [0, 1] 2 , take an arbitrary real continuous function g ∈ C([0, 1] 2 ) with |g| 2 = 1. For each N ∈ N define a random variable By the definition of positive definiteness and the decay of correlations, we have c 0 ≤ Var Y N ≤ C 0 . One can explicitly compute the variance of Y N : Letting N → ∞ and using the fact that c 0 ≤ Var Y N ≤ C 0 , we have Since g was arbitrary, we conclude that φ̃(θ, ϑ, s, t) ∈ [c 0 , C 0 ]. The stability of equation (6.5) is very similar to the case in [12] and was analyzed in [2]. It follows from Proposition 3.10 (ii) in [2] that the following estimate holds. Proposition 6.4. Let u solve (6.5) and let u ′ solve a perturbed version of (6.5), namely There exist universal constants ε > 0 and C > 0 such that if |Re z| ≤ ε, Im z ∈ (0, 10], and ∥u − u ′ ∥ ∞ ≤ ε, then ∥u ′ − u∥ ∞ ≤ C∥r∥ ∞ . Via the discretization method in [12], we can prove the stability of equation (6.4). Below, ∥A∥ ∞ means max i,j |A ij |. Proposition 6.5. Let M be the solution to (6.4) and let M ′ solve a perturbed version of (6.4), namely M ′ (−z − Ξ(M ′ )) = I + R.

Universality
Let (B ij (t)) 1≤i,j≤N be a family of Brownian motions with the same correlation structure as (x ij ): Define x ij (t) by the SDE dx ij = dB ij − (x ij /2) dt.
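A family of Gaussians (or Brownian increments) with a prescribed correlation structure can be realized from independent ones via a square-root (Cholesky) factor of the covariance. A minimal sketch with a hypothetical exponentially decaying covariance, illustrating the decay condition imposed above:

```python
import numpy as np

# Hypothetical covariance with exponentially decaying correlation (AR(1)-type):
# Sigma_{kl} = rho^{|k-l|}. Then x = L g, with L the Cholesky factor and g a
# standard Gaussian vector, has covariance Sigma, and
# Corr[x_k, x_l] = rho^{|k-l|} = exp(-c2 |k-l|) with c2 = -log(rho).
m = 20
rho = 0.5
idx = np.arange(m)
dist = np.abs(idx[:, None] - idx[None, :])
Sigma = rho ** dist

L = np.linalg.cholesky(Sigma)        # Sigma is positive definite for |rho| < 1
recovered = L @ L.T                  # covariance of x = L g
c2 = -np.log(rho)
decay_ok = bool(np.all(np.abs(Sigma) <= np.exp(-c2 * dist) + 1e-12))
```

The same square-root construction applied to the increments of independent Brownian motions produces a correlated family (B ij (t)) with the desired covariance.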
We show that when t ≪ N −1 √ q, the evolution does not affect the local statistics of the smallest singular values. Following the argument in [12], we prove the same result as Lemma 6.1 in [12]. Lemma 6.8. Let x = (x k ) 1≤k≤m be an array of real centered random variables such that sup k E[|x k | 3 ] ≤ κ 3 3 and Corr[Z 1 , Z 2 ] ≤ c 1 exp(−c 2 d) for all nontrivial random variables Z 1 ∈ σ(x 1 , · · · , x k ), Z 2 ∈ σ(x k+d , · · · , x m ) and any 1 ≤ k ≤ k + d ≤ m. Let f be a C 2 function on R m with ∥D 2 f ∥ ∨ κ 3 ∨ m ≤ N 100 . Then, Proof. If f is a linear function in x, then the equality is exact, without error terms. In general, define T := {j : |i − j| ≤ log 2 N }, U := {j : |i − j| ≤ 2 log 2 N }. (1 − t)∂ kl f (x (T) + t(x − x (T) ))x k x l dt.
We expand the second term further, Therefore, f (x) can be written as Here θx := (x k 1 k∈U θ k + x k 1 k∉U ). Therefore, we can compute The first and second terms are expectations of products of weakly correlated random variables. So, Using Taylor expansion again, we can replace ∂ kl f (x (U) ) by ∂ kl f (x) at the cost of a small error term. Therefore, We can similarly imitate the proof of Lemma 6.2 of [12] to prove the following. The rest of the argument is essentially the same as the one in Section 5. Similarly to Section 5.2, we may decompose the correlated OU dynamics (6.7) as x ij (t) = x̃ ij (t) + w ij , (6.10) where the (w ij ) are i.i.d. Gaussian and the x̃ ij (t) have a positive definite correlation structure.
Then we see the local law holds for X̃(t) = (x̃ ij (t)), and the rest of the arguments in Section 5 go through, since they do not rely on the structure of the matrix. The only necessary change is that in the proof of Lemma 5.16 we need a different method to show the (non-deformed) Green function entries are uniformly bounded by a constant down to the scale N −1+δ . Using the local law, this reduces to showing the entries of M are bounded, and this is a consequence of the definition of M and the analogue of Lemma 4.20 in [12]. The regularity necessary for Theorem 3.2 is provided by Corollary 6.7 and the analogue of Lemma 5.9 for the correlated model in this section. The analogue is proved by splitting into log 4 (N ) weakly correlated matrices and applying the argument in the proof of Lemma 5.9. We state our conclusion as a theorem.
Theorem 6.10. For the class of correlated sparse matrices whose correlation comes from a positive definite profile function, as defined above, the conclusion of Theorem 2.3 holds.

A Singular value dynamics
This appendix collects information on the SDE
We follow almost exactly the arguments in Section 4.3 of [3], explaining the necessary changes. We also show the solutions of this equation are the singular values of a matrix Brownian motion process. Throughout, N will be fixed, and λ(t) = (λ 1 (t), . . . , λ N (t)).
Proof. In the proof of Lemma 4.3.3 in [3], replace the given definition of f with Then the estimates (4.3.6) given there still hold, and As in [3], we define (A.5) We have the identities x i (u i,1 (x) + u i,1 (x)) = N (N − 1) (A.6) The equation becomes (suppressing the λ R i in the arguments of the u functions): df (λ R (t)) = Now we are finished, as in [3], by a Borel-Cantelli argument.
We now explain why solutions to (A.1) have the same distribution as the singular values of $H_t$ from (2.7). We recall the SDE for the eigenvalues of $H_t^\dagger H_t$ given in Appendix C of [10].
Existence and uniqueness of solutions to (A.11) were shown in [11]. An application of Itô's lemma to (A.1) shows that $\lambda_k^2$ almost solves the SDE given in [10], except that here we do not always choose the positive square root of $\lambda_k^2$. However, we obtain a weak solution by noting that $\lambda^2$ solves (A.11) with the Brownian motions chosen as $\tilde{B}_k = \operatorname{sgn}(\lambda_k) B_k$. Hence the solutions of (A.1) have the desired distribution.
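As a sanity check on this distributional identity, one can simulate a discretized matrix Brownian motion and track its ordered singular values. The normalization $dH_t = dB_t/\sqrt{N}$ used below is an assumption for illustration, since (2.7) is not reproduced here; the point is only that the singular value paths stay nonnegative and move continuously.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt, steps = 50, 1e-3, 100

# discretized matrix Brownian motion H_t (normalization is hypothetical)
H = rng.standard_normal((N, N)) / np.sqrt(N)
paths = [np.sort(np.linalg.svd(H, compute_uv=False))]
for _ in range(steps):
    H += np.sqrt(dt / N) * rng.standard_normal((N, N))  # increment ~ dB/sqrt(N)
    paths.append(np.sort(np.linalg.svd(H, compute_uv=False)))
paths = np.array(paths)

# by Weyl's inequality each singular value moves by at most the operator
# norm of the increment, so the sampled paths change only slightly per step
max_step = np.abs(np.diff(paths, axis=0)).max()
```

Matching the law of these paths with the solutions of (A.1) is exactly the content of the weak-solution argument above.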

A.2 Interpolation
Here we provide the details of the construction of the interpolated solutions (3.14) and show they are differentiable with respect to α.
We first construct solutions $z_i(t, \alpha)$ for $\alpha \in \mathbb{Q} \cap [0, 1]$ using the argument in the previous subsection. Because there are countably many solutions, each of which exists individually except possibly on a set of measure zero in the probability space $\Omega$, they all exist and satisfy the SDE simultaneously on a set of full measure $E_1 \subset \Omega$. For $\alpha_1, \alpha_2 \in \mathbb{Q} \cap [0, 1]$, define $\tilde{u}$. Suppose $\tilde{u}_k(t)$ is a particle with $\tilde{u}_k(t) = \max_i \tilde{u}_i(t)$. Because the particles are ordered, the coefficients $B_{ij}$ are positive, so $\partial_t \tilde{u}_k(t) \le 0$. We conclude that $\|\tilde{u}(t)\|_\infty$ is nonincreasing, and
$$\|\tilde{u}(t)\|_\infty \le (\alpha_1 - \alpha_2)\big( \|z(0, 1)\|_\infty + \|z(0, 0)\|_\infty \big) \qquad (A.16)$$
holds uniformly for all $t$ on $E_1$. Since $z_i(t, \alpha)$ is Lipschitz in $\alpha$ (with Lipschitz constant depending on the random initial data), it extends uniquely to a (random) function $z(t, \alpha)$ continuous in $\alpha \in [0, 1]$. Further, since the uniform limit of continuous functions is continuous, the Lipschitz estimate shows that the paths in the variable $t$ are continuous for all $\alpha$.
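The maximum principle invoked above can be seen in a toy discretization. In the sketch below, the coupling matrix `B` and the dynamics $\partial_t u_k = \sum_j B_{kj}(u_j - u_k)$ are hypothetical stand-ins for the coefficients in the difference equation: with nonnegative $B_{kj}$ and a small time step, each Euler update is a convex combination of the old values, so $\|u\|_\infty$ cannot increase.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt, steps = 10, 1e-3, 2000

B = rng.random((n, n))          # nonnegative couplings (hypothetical)
u = rng.standard_normal(n)      # initial particle configuration
norms = [np.abs(u).max()]
for _ in range(steps):
    # du_k = sum_j B_kj (u_j - u_k); at the maximal particle every term
    # is <= 0, and at the minimal particle every term is >= 0, so the
    # sup norm of the configuration is monotone under the flow
    u = u + dt * (B * (u[None, :] - u[:, None])).sum(axis=1)
    norms.append(np.abs(u).max())
```

The discrete monotonicity requires `dt` times the row sums of `B` to stay below 1, the discrete analogue of the positivity of the coefficients used in the argument for (A.16).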
Fix $\alpha_0 \in [0, 1]$. If $\tilde{z}_i(t, \alpha_0)$ is a solution a.s., then the same reasoning that led to (A.16) shows that $\tilde{z}_i(t, \alpha_0, \omega) = z_i(t, \alpha_0, \omega)$ for a set of $\omega$ of full measure in $\Omega$. (Note, however, that this set of full measure may vary with the choice of $\alpha_0$.) By Fubini's theorem, $z_i(t, \alpha, \omega)$ is a solution for a set of $(\omega, \alpha)$ of full measure in the product space $\Omega \times [0, 1]$. This completes the construction of the interpolated solutions.
Hence this relation holds for every $\omega \in E_1$ and therefore almost surely.