Non-Hermitian random matrices with a variance profile (I): deterministic equivalents and limiting ESDs

For each n, let A_n = (σ_ij) be an n × n deterministic matrix with nonnegative entries and let X_n = (X_ij) be an n × n random matrix with i.i.d. centered entries of unit variance. We study the asymptotic behavior of the empirical spectral distribution μ_n^Y of the rescaled entry-wise product Y_n = (n^{-1/2} σ_ij X_ij). For our main result we provide a deterministic sequence of probability measures μ_n, each described by a family of Master Equations, such that the difference μ_n^Y − μ_n converges weakly in probability to the zero measure. A key feature of our results is to allow some of the entries σ_ij to vanish, provided that the standard deviation profiles A_n satisfy a certain quantitative irreducibility property. An important step is to obtain quantitative bounds on the solutions to an associated system of Schwinger–Dyson equations, which we accomplish in the general sparse setting using a novel graphical bootstrap argument.


Introduction
For an n × n matrix M with complex entries and eigenvalues λ_1, …, λ_n ∈ C (counted with multiplicity and labeled in some arbitrary fashion), the empirical spectral distribution (ESD) is given by

μ_M := (1/n) Σ_{i=1}^n δ_{λ_i}.   (1.1)

A seminal result in non-Hermitian random matrix theory is the circular law, which describes the asymptotic global distribution of the spectrum for matrices with i.i.d. entries of finite variance. The following strong form of the circular law was established by Tao and Vu [60], and is the culmination of the work of many authors [30,48,31,15,17,33,50,61]; see the survey [19] for a detailed historical account.

Theorem 1.1 (Circular law). Let ξ be a complex random variable of zero mean and unit variance, and for each n let X_n = (X_ij^(n)) be an n × n matrix whose entries are i.i.d. copies of ξ. Then almost surely, the ESDs μ_{n^{-1/2} X_n} converge weakly to the circular measure

μ_circ(dx dy) := (1/π) 1_{{x² + y² ≤ 1}} dx dy.
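As a quick numerical illustration of Theorem 1.1, the following minimal sketch (numpy assumed) samples an i.i.d. matrix and compares the mass its ESD places on centered disks with the circular-law value μ_circ({|z| ≤ s}) = s²:

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
X = rng.standard_normal((n, n))           # i.i.d. real N(0,1): zero mean, unit variance
lam = np.linalg.eigvals(X / np.sqrt(n))   # eigenvalues of n^{-1/2} X_n

# The circular law predicts mass s^2 in the disk of radius s, for 0 <= s <= 1.
for s in (0.25, 0.5, 0.75, 1.0):
    print(f"s = {s:4.2f}   ESD mass: {np.mean(np.abs(lam) <= s):.3f}   predicted: {s**2:.3f}")
```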
Recently there has been increasing interest in extending the arguments of [47,57] to matrix models with more structured distributions. In the context of neural networks, where random matrices are used to model the synaptic matrix, the work [53] considered perturbed i.i.d. matrices of the form X_n + M_n, where M_n is a fixed matrix whose entries take a fixed positive value μ_+ within a fixed proportion of columns, with all remaining entries taking a fixed negative value μ_−. The motivation was to conform to Dale's law, which states that neurons are either inhibitory or excitatory. In this case M_n is a rank-one perturbation; as was later shown rigorously in work of Tao [58], low rank perturbations do not affect the limiting spectral distribution, but may lead to the creation of outlier eigenvalues.
Several recent works have studied the limiting spectral distribution for random matrices of the form

A_n ⊙ X_n = (σ_ij X_ij)   (1.2)

(suitably rescaled), where A_n = (σ_ij) is a fixed (deterministic) standard deviation profile and ⊙ denotes the entry-wise product. From a modeling perspective the σ_ij can reflect the varying degrees of interaction between species/neurons. In theoretical ecology, the works [7,8] considered asymmetric standard deviation profiles, i.e. allowing σ_ij ≠ σ_ji, in order to create more realistic predator–prey cascading relationships. In neuroscience, the works [6,5] considered matrices A_n partitioned into a bounded number of block submatrices having constant entries within each block, in order to model networks with a bounded number of cell types. We also note that, predating these works, Girko [32, Chapters 25–26] (see also the references therein) studied non-Hermitian matrices with standard deviation and mean profiles and provided canonical equations to describe the limiting spectral densities.
Some works have also gone beyond matrices with independent entries of specified mean and variance, for instance considering products and sums of deterministic matrices with a random matrix having i.i.d. entries [1], or allowing correlations between the entries X_ij, X_ji [9]. We also mention that, parallel to the study of non-Hermitian matrices, there have been many works devoted to Hermitian random matrices with a variance profile, both of Wigner and Gram type; see for instance Girko [32, Chapters 7–8], Shlyakhtenko [56], Guionnet [35], Anderson and Zeitouni [12], Hachem et al. [36], and Ajanki et al. [2].
As has been pointed out in the ecology literature [8], a key feature that is missing from the literature on models of the form (1.2) is to allow A n to have zero entries.
Indeed, the nodes in large real-world ecological or neural networks do not interact with all other nodes. One fix has been to take A_n with i.i.d. Bernoulli(p) indicator entries, independent of X_n, i.e. to model the support of the network by a sparse Erdős–Rényi digraph. As was shown by Wood [64], the circular law still holds for A_n ⊙ X_n (after rescaling by (pn)^{−1/2}) if p ≥ n^{α−1} for any fixed α ∈ (0, 1]. However, the valence of the nodes in the resulting network is highly concentrated around pn, while the valence distribution for real-world networks is highly non-uniform [8]. With the ability to set A_n deterministically one can reflect some known underlying geometry of the network. In the present work, our main focus is to understand the asymptotic global spectral distribution under minimal assumptions on the variance profile. Key features of our results, which also present significant challenges in the proofs, are to allow a large number of the entries σ_ij to be zero, as well as to allow asymmetry (i.e. σ_ij ≠ σ_ji). As an example, we are able to handle variance profiles satisfying a robust irreducibility condition (see Definition 2.7), which includes matrices that are close in the cut norm to an irreducible graphon (Lemma 2.5).
As compared with the proof of the circular law, the identification and description of a limiting measure is significantly more involved. In this article we prove the existence of a sequence of deterministic measures, called deterministic equivalents, which asymptotically approximate the random ESDs. In particular, we obtain non-trivial information even when the ESDs themselves do not converge to a limit. The identification of the deterministic equivalents involves the analysis of a (cubic) polynomial system of Master Equations determined by the variance profile. A relative of the Master Equations known as the Quadratic Vector Equation was studied in recent work of Ajanki, Erdős and Krüger and of Alt, Erdős and Krüger on the spectrum of Hermitian matrices with a variance profile [3,10].
Since the initial release of this paper, a local law version of our main statement (Theorem 2.3) was proved in [11], under the restrictions that the entries σ_ij of the standard deviation profile are uniformly bounded below by a positive constant and that the distribution of the matrix entries possesses a bounded density and has all of its moments finite. In that setting it is also proved that the density of the deterministic equivalents is positive and bounded on its support. The latter properties may no longer hold if the standard deviation profile has zero entries or is not uniformly bounded below. In these cases, the limiting distribution may exhibit a wider variety of behavior, such as a blowup or vanishing of the density at zero, or a point mass at zero; see [22].
It is by now well known that the study of ESDs for non-Hermitian random matrices is intimately connected with proving the quantitative invertibility of such matrices, that is, establishing lower tail estimates for small singular values. The possible sparsity of the matrices considered here gives rise to significant challenges for this task. Bounds on the smallest singular value sufficient for our purposes were established by the first author in [23]. In the present work we obtain control on the remaining small singular values from Wegner-type bounds, which are established by a quantitative analysis of the Master Equations. Specifically, the key is to show that solutions to the Regularized Master Equations (see (2.7)) are bounded uniformly in the spectral scale parameter t and the dimension n. For this task we use an iterative graphical bootstrap argument (reminiscent of bootstrap arguments from the theory of differential equations) that exploits expansion properties of a directed graph naturally associated to the variance profile. (Graph expansion properties were also key for the analysis of the smallest singular value in [23].) We discuss these aspects of the proof in more detail after presenting the results in Section 2.

The model
In this article we study the following general class of random matrices with non-identically distributed entries.

Definition 1.2 (Random matrix with a variance profile). For each n ≥ 1, let A_n be a (deterministic) n × n matrix with entries σ_ij^(n) ≥ 0, let X_n be a random matrix with i.i.d. entries X_ij^(n) ∈ C satisfying E X_ij^(n) = 0 and E |X_ij^(n)|² = 1, and set

Y_n = (1/√n) A_n ⊙ X_n,

where ⊙ is the matrix Hadamard (entry-wise) product, i.e. Y_n has entries Y_ij^(n) = (1/√n) σ_ij^(n) X_ij^(n). The empirical spectral distribution of Y_n is denoted by μ_{Y_n}. We refer to A_n as the standard deviation profile and to V_n := (1/n) A_n ⊙ A_n = ((σ_ij^(n))²/n) as the normalized variance profile. Our goal is to describe the asymptotic behavior of the ESDs μ_{Y_n} given a sequence of standard deviation profiles A_n which can be sparse, and may not converge in any sense to a limiting standard deviation profile.
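To make the model concrete, here is a minimal sampler sketch (the helper name sample_Y and the block profile below are illustrative choices; real Gaussian entries are used for X_n):

```python
import numpy as np

def sample_Y(A, rng):
    """Sample Y_n = (1/sqrt(n)) A ⊙ X_n as in Definition 1.2,
    with i.i.d. standard real Gaussian entries for X_n."""
    n = A.shape[0]
    return A * rng.standard_normal((n, n)) / np.sqrt(n)

rng = np.random.default_rng(1)
n = 800
A = np.ones((n, n))            # start from the i.i.d. profile
A[: n // 2, n // 2:] = 2.0     # stronger interactions in one block
A[n // 2:, : n // 2] = 0.0     # a macroscopic proportion of vanishing entries
lam = np.linalg.eigvals(sample_Y(A, rng))
print("spectral radius of Y_n:", np.abs(lam).max())
```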

Master equations and deterministic equivalents
The main result of this article states that under certain assumptions on the sequence of standard deviation profiles A_n and on the distribution of the entries of X_n, there exists a sequence of deterministic probability measures (μ_n), the deterministic equivalents of the spectral measures μ_{Y_n}, such that μ_{Y_n} ∼ μ_n in probability as n → ∞ (see Section 2.1.2 for this notation).
The measures μ_n are described by a polynomial system of Master Equations. Denote by V_n^T the transpose of V_n and by ρ(V_n) its spectral radius. For a parameter s ≥ 0, the Master Equations are the following system of 2n + 1 equations in the 2n unknowns q_1, …, q_n, q̃_1, …, q̃_n:

q_i = (V_n q̃)_i / (s² + (V_n q̃)_i (V_n^T q)_i),   i ∈ [n],
q̃_i = (V_n^T q)_i / (s² + (V_n q̃)_i (V_n^T q)_i),  i ∈ [n],   (1.5)
(1/n) Σ_{i∈[n]} q_i = (1/n) Σ_{i∈[n]} q̃_i,

where q, q̃ are the n × 1 column vectors with components q_i, q̃_i, respectively. If s² ≥ ρ(V_n), it can be proved that the Master Equations admit the unique trivial solution (q, q̃) = 0. Provided that 0 < s² < ρ(V_n) and that the matrix V_n is irreducible, it can be shown that the Master Equations admit a unique positive solution (q, q̃), which depends only on s. The solution s ↦ (q(s), q̃(s)) is continuous on (0, ∞). With this definition of q and q̃, the deterministic equivalent μ_n is defined as the radially symmetric probability distribution over C satisfying

μ_n({z ∈ C : |z| ≤ s}) = 1 − (1/n) q̃(s)^T V_n q(s),   s > 0.
It readily follows that the support of μ_n is contained in the disk of radius ρ(V_n)^{1/2}.
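The deterministic equivalent can be computed numerically. The following minimal sketch iterates the regularized fixed point of Proposition 2.1 below at a small t > 0 (Remark 4.1 guarantees convergence of such iterations) and then evaluates the radial CDF of μ_n; the function solve_master and its parameters are illustrative choices:

```python
import numpy as np

def solve_master(V, s, t=1e-9, iters=5000):
    """Fixed-point iteration for the (regularized) Master Equations:
      q_i  = ((V q~)_i + t) / (s^2 + ((V q~)_i + t)((V^T q)_i + t)),
      q~_i = ((V^T q)_i + t) / (s^2 + ((V q~)_i + t)((V^T q)_i + t)).
    A small t > 0 regularizes the t = 0 system (1.5); cf. (2.7) and Remark 4.1."""
    n = V.shape[0]
    q, qt = np.ones(n), np.ones(n)
    for _ in range(iters):
        a = V @ qt + t            # (V q~)_i + t
        b = V.T @ q + t           # (V^T q)_i + t
        q, qt = a / (s**2 + a * b), b / (s**2 + a * b)
    return q, qt

n = 200
V = np.ones((n, n)) / n           # constant profile: the circular law case
for s in (0.3, 0.6, 0.9):
    q, qt = solve_master(V, s)
    F = 1.0 - (qt @ (V @ q)) / n  # radial CDF of the deterministic equivalent
    print(f"s = {s}: F_n(s) = {F:.4f}  (circular law: s^2 = {s**2:.4f})")
```

For the constant profile this recovers F_n(s) ≈ s², the radial CDF of the circular law; see Example 2.1 below.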
General notation

We write [n] := {1, …, n}. The Lebesgue measure on C will be denoted by ℓ(dz) or dx dy. For x, y ∈ R we write max(x, y) = x ∨ y and min(x, y) = x ∧ y. The cardinality of a finite set S is denoted by |S|. For S ⊂ [n], and when clear from the context, we will abbreviate S^c = [n] \ S.

Matrices
We denote by 1_n the n × 1 vector of 1's. Given two n × 1 vectors u, v, we denote their scalar product ⟨u, v⟩ = Σ_{i∈[n]} ū_i v_i.
For a given matrix A, denote by A^T its transpose and by A^* its conjugate transpose. Denote by I_n the n × n identity matrix; if clear from the context, we omit the dimension. For a ∈ C, and when clear from the context, we sometimes write a instead of aI, and similarly a^* instead of (aI)^* = ā I. The notation A ⪈ 0 stands for A ≥ 0 (entry-wise) and A ≠ 0. Given a matrix A, ‖A‖ refers to its spectral norm, and |||A|||_∞ to its max-row norm, defined as

|||A|||_∞ := max_{i∈[n]} Σ_{j=1}^n |A_ij| = max_{‖u‖_∞ ≤ 1} ‖Au‖_∞.

We denote the spectral radius of an n × n matrix A by

ρ(A) := max{ |λ| : λ is an eigenvalue of A }.

If A is a square matrix, we write Im(A) = (A − A^*)/(2i). For an n-dimensional vector a = (a_i), diag(a_i) denotes the n × n diagonal matrix with the a_i as its diagonal elements.

Convergence of measures
Given probability distributions ν_n, ν over some set X (= R or C), we denote the weak convergence of ν_n to ν by ν_n →w ν as n → ∞. If ν_n is random, ν_n →w ν almost surely (resp. in probability) stands for the fact that for all f ∈ C_c(X), ∫ f dν_n → ∫ f dν almost surely (resp. in probability) as n → ∞.
Let (µ n ) and (ν n ) be deterministic sequences of probability distributions over X , and let (ν n ) be tight, i.e. for all ε > 0, one can find a compact set K ε such that sup n ν n (X \ K ε ) ≤ ε.
We will denote by μ_n ∼ ν_n as n → ∞ the fact that the signed measure μ_n − ν_n converges weakly to zero, i.e. ∫ f dμ_n − ∫ f dν_n → 0 for all f ∈ C_c(X). If the sequence (μ_n) is random while (ν_n) is deterministic and tight, then μ_n ∼ ν_n almost surely (resp. in probability) stands for ∫ f dμ_n − ∫ f dν_n → 0 as n → ∞ almost surely (resp. in probability) for all f ∈ C_c(X).

Stieltjes transforms
Let μ be a nonnegative finite measure on R and let

g_μ(z) := ∫_R μ(dλ)/(λ − z),   z ∈ C⁺ := {z ∈ C : Im(z) > 0},

be its Stieltjes transform. The following properties are standard: g_μ is analytic on C⁺, Im g_μ(z) ≥ 0 for z ∈ C⁺, |g_μ(z)| ≤ μ(R)/Im(z), and lim_{y→∞} −iy g_μ(iy) = μ(R). Moreover, if μ is a probability measure, then −(z + g_μ(z))^{−1} is the Stieltjes transform of a probability measure; see for instance [63, Theorem B.3].
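As a small illustration of how the Stieltjes transform recovers densities — via the inversion (1/π) Im g_μ(x + it) → density of μ as t ↓ 0 — the following sketch (illustrative only, numpy assumed) tests this on the eigenvalue measure of a Wigner matrix against the semicircle density:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
G = rng.standard_normal((n, n))
H = (G + G.T) / np.sqrt(2 * n)    # Wigner matrix; ESD ~ semicircle on [-2, 2]
x = np.linalg.eigvalsh(H)

def g_mu(z):
    """Stieltjes transform of the empirical measure (1/n) sum_j delta_{x_j}."""
    return np.mean(1.0 / (x - z))

t = 0.05
for u in (0.0, 1.0, 1.9):
    print(f"x = {u}: (1/pi) Im g(x+it) = {g_mu(u + 1j * t).imag / np.pi:.4f}, "
          f"semicircle = {np.sqrt(max(4 - u**2, 0)) / (2 * np.pi):.4f}")
```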

Graph-theoretic notation
Given an n × n non-negative matrix A = (σ_ij) we form a directed graph Γ = Γ(A) on the vertex set [n] that puts an edge i → j whenever σ_ij > 0. We denote the out-neighborhood of a vertex i ∈ [n] in the graph Γ by N_Γ(i) := {j ∈ [n] : i → j}. For δ ∈ (0, 1) we denote the associated set of densely-connected out-neighbors of a set S ⊂ [n] by

N_Γ^{(δ)}(S) := { j ∈ [n] : |{ i ∈ S : j ∈ N_Γ(i) }| ≥ δ|S| },

the set of vertices receiving edges from at least a δ-proportion of S. To obtain quantitative results we will generally work with the graph associated to the matrix

A(σ_0) = (σ_ij 1_{σ_ij ≥ σ_0}),   (2.6)

which only keeps the entries exceeding a fixed cutoff parameter σ_0 > 0, setting the remaining entries to zero.
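In code these objects are easy to manipulate. The following sketch builds Γ(A(σ_0)) and evaluates the δ-dense out-neighborhood in the form written above (j ∈ N^{(δ)}(S) when j receives edges from at least a δ-proportion of S); the helper names are illustrative, and this encoding of N^{(δ)} follows the reconstruction above:

```python
import numpy as np

def out_neighborhood(A, i, sigma0):
    """N_Gamma(i) in the graph of A(sigma0): the j with sigma_ij >= sigma0."""
    return np.flatnonzero(A[i] >= sigma0)

def dense_out_neighbors(A, S, delta, sigma0):
    """N^(delta)(S): vertices j with at least delta*|S| in-edges from S in Gamma(A(sigma0))."""
    B = A >= sigma0                  # adjacency matrix of Gamma(A(sigma0))
    counts = B[S, :].sum(axis=0)     # for each j, #{i in S : i -> j}
    return np.flatnonzero(counts >= delta * len(S))

A = np.triu(np.ones((6, 6)), 1) + np.eye(6, k=-5)   # a small cyclic-ish profile
print(out_neighborhood(A, 0, sigma0=0.5))
print(dense_out_neighbors(A, S=np.array([0, 1, 2]), delta=0.5, sigma0=0.5))
```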

Model assumptions
We will establish results concerning sequences of matrices Y n as in Definition 1.2 under various additional assumptions on A n and X n , which we now summarize. We note that many of our results only require a subset of these assumptions.
For our main result we need an additional moment assumption on the distribution of the entries of X_n, labeled A0 below.
Remark 2.1. Assumption A0 is needed to apply the results from [23] to bound the smallest singular value as in (2.10). It is also used in Section 3 to quantitatively bound the difference between our random measures and their deterministic equivalents, which is crucial for obtaining logarithmic integrability of singular value distributions. This latter step can likely be accomplished with fewer moments, but we do not pursue this.
Remark 2.2 (Convention). While we will keep the value σ max generic in the statements, we will always set it to 1 in the proofs, with no loss of generality.
In order to state the next key assumption, we need to introduce the following Regularized Master Equations, which are a specialization of the Schwinger–Dyson equations of Girko's Hermitized model associated with Y_n (see Section 2.5 for further discussion).
The following is proved in Section 4.1.
Proposition 2.1 (Regularized Master Equations). Let n ≥ 1 be fixed, let A_n be an n × n nonnegative matrix and write V_n = (1/n) A_n ⊙ A_n. Let s, t > 0 be fixed, and consider the following system of equations in the n × 1 vectors r = (r_i) and r̃ = (r̃_i):

r_i = ((V_n r̃)_i + t) / (s² + ((V_n r̃)_i + t)((V_n^T r)_i + t)),   i ∈ [n],
r̃_i = ((V_n^T r)_i + t) / (s² + ((V_n r̃)_i + t)((V_n^T r)_i + t)),  i ∈ [n].   (2.7)

Denote by r the concatenated 2n × 1 vector (r^T, r̃^T)^T. Then the system (2.7) admits a unique solution r = r(s, t) with r, r̃ > 0, and this solution satisfies the trace identity (1/n) Σ_{i∈[n]} r_i = (1/n) Σ_{i∈[n]} r̃_i.

A2 (Admissible variance profile). For each s > 0 there exists a constant C = C(s) < ∞, independent of n, such that the solutions of the Regularized Master Equations (2.7) satisfy

(1/n) Σ_{i∈[n]} r_i(s, t) ≤ C   for all n ≥ 1 and t ∈ (0, 1].
A variance profile V_n for which the previous estimate holds is called admissible.

Remark 2.3. Assumption A2 may seem obscure at first sight, since checking whether a variance profile is admissible requires solving the Regularized Master Equations. In particular, it is not clear a priori whether this assumption is compatible with some sparsity. In Section 2.4 we provide sufficient conditions on the variance profile V_n which imply A2, namely A3 (lower bound on the entries of V_n), A4 (symmetric V_n) and A5 (robust irreducibility for V_n).

Statement of the results
Recall the Master Equations (1.5), and notice that these equations are obtained from the Regularized Master Equations (2.7) by letting the parameter t go to zero.
Notice, however, that the condition (1/n) Σ_i q_i = (1/n) Σ_i q̃_i is now required for uniqueness, whereas for (2.7) it is a consequence of the system.
In order to prove the existence of solutions q, q̃ to the Master Equations, we need to assume that the standard deviation profile A_n is irreducible. This is equivalent to assuming the associated digraph Γ(A_n) is strongly connected (recall the notation from Section 2.1.4). This causes no loss of generality, as we can conjugate Y_n by an appropriate permutation matrix to put A_n in block-upper-triangular form with irreducible blocks on the diagonal; the spectrum of Y_n is then the union of the spectra of the diagonal block submatrices.

Theorem 2.2 (Master equations). Let n ≥ 1 be fixed, let A_n be an n × n nonnegative matrix and write V_n = (1/n) A_n ⊙ A_n. Assume that A_n is irreducible. Then the following hold:
1. For s² ≥ ρ(V_n) the system (1.5) has the unique solution q(s) = 0.
2. For s² ∈ (0, ρ(V_n)) the system (1.5) has a unique nontrivial solution q(s) = (q(s)^T, q̃(s)^T)^T ⪈ 0, which is in fact strictly positive.
3. The function s ↦ q(s) defined in parts (1) and (2) is continuous on (0, ∞) and is continuously differentiable on (0, ρ(V_n)^{1/2}) ∪ (ρ(V_n)^{1/2}, ∞).

Remark 2.4 (Convention). Above and in the sequel we abuse notation and write q = q(s) to mean a solution of equation (1.5), understood to be the nontrivial solution for s ∈ (0, ρ(V_n)^{1/2}).
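The block-triangular reduction described before Theorem 2.2 is straightforward numerically. The following sketch (the helper name irreducible_blocks is illustrative) groups the strongly connected components of Γ(A_n) with scipy and permutes A_n accordingly; the blocks may still need a topological reordering to make the form exactly upper-triangular:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def irreducible_blocks(A):
    """Group [n] by the strongly connected components of the digraph Gamma(A):
    conjugating A by the returned permutation puts the irreducible components
    into contiguous diagonal blocks."""
    _, labels = connected_components(csr_matrix(A > 0), connection="strong")
    return np.argsort(labels, kind="stable"), labels

# Reducible profile: no edges from the second half back to the first.
n = 6
A = np.ones((n, n)); A[n // 2:, : n // 2] = 0.0
perm, labels = irreducible_blocks(A)
print("number of irreducible blocks:", labels.max() + 1)
print(A[np.ix_(perm, perm)])
```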
The main result of this paper is the following.

Theorem 2.3 (Main result). Let (Y_n)_{n≥1} be a sequence of random matrices as in Definition 1.2, and assume A0, A1 and A2 hold. Assume moreover that A_n is irreducible for all n ≥ 1.
1. There exists a sequence of deterministic measures (µ n ) n≥1 on C such that µ Y n ∼ µ n in probability.
2. Let q(s)^T = (q(s)^T q̃(s)^T) be as in Theorem 2.2, and for s ∈ (0, ∞) put

F_n(s) := 1 − (1/n) q̃(s)^T V_n q(s).

Then F_n extends to an absolutely continuous function on [0, ∞) which is the CDF of a probability measure with support contained in [0, ρ(V_n)^{1/2}] and continuous density on (0, ρ(V_n)^{1/2}).
Remark 2.5 (Almost sure convergence under different hypotheses). As with Theorem 1.1, a key component of the proof is a lower tail estimate for the smallest singular value of the scalar shifts Y_n − z I_n, of the form

P( s_min(Y_n − z I_n) ≤ n^{−β} ) ≤ n^{−α}   (2.10)

for some α, β > 0, holding for a.e. fixed z ∈ C. (Crucially, we do not need such an estimate for every z ∈ C, as A2 only requires s = |z| > 0 and allows variance profiles for which (2.10) is false when z = 0.) Such a bound is available for arbitrary fixed z ≠ 0 and α > 0 a small constant by recent work of the first author [23]; see Proposition 6.1. Obtaining (2.10) with α > 1 would immediately improve conclusion (1) to almost sure convergence by an application of the Borel–Cantelli lemma.
Such improvements are already available under stronger assumptions on A n and X n .
Remark 2.6 (Density of μ_n versus density of F_n). In the previous theorem, by the classical polar change of coordinates, the density ϕ_n of μ_n for 0 < |z| < ρ(V_n)^{1/2} is given by the formula

ϕ_n(z) = F_n'(|z|) / (2π|z|).
As an illustration we show how our results recover the circular law for matrices with i.i.d. entries.
Example 2.1 (The circular law). Consider a standard deviation profile A_n with all entries equal to 1, and assume that A0 holds. It is well known in this case that μ_{Y_n} →w μ_circ in probability (and even almost surely), where μ_circ stands for the circular law with density π^{−1} 1_{{|z|≤1}}. We can recover this result with Theorem 2.3. In this case V_n = (1/n) 1 1^T, and by symmetry both systems (2.7) and (1.5) simplify into single scalar equations:

r = (r + t) / (s² + (r + t)²)   and   q = q / (s² + q²),   (2.11)

where r_i = r̃_i = r and q_i = q̃_i = q for all i ∈ [n]. From the first equation, one can prove that r(s, t) ≤ 1 for t ∈ (0, 1]. In fact, the first equation gives r((r + t)² + s²) = r + t, hence r(r + t)² ≤ r + t, so r(r + t) ≤ 1 and in particular r² ≤ 1. Hence A2 is fulfilled. The second equation has the unique nontrivial solution q(s) = √(1 − s²) for 0 < s < 1. Consequently, F_n(s) = 1 − q(s)² = s² for s ≤ 1. From Remark 2.6, we conclude the desired convergence.
In the next example, we prove that a doubly stochastic normalized variance profile is admissible and that the associated deterministic equivalent µ n is the circular law.
Example 2.2 (Doubly stochastic variance profile). Assume that the matrix V_n is doubly stochastic, i.e. V_n 1 = V_n^T 1 = 1. Then one quickly verifies that the vectors r = r1 and q = q1, with r, q as in (2.11), satisfy the Regularized Master Equations and the Master Equations, respectively. As a consequence, A2 can be established as in Example 2.1. If moreover A1 holds and the variance profile V_n is irreducible for all n ≥ 1, then one can apply Theorem 2.3 with μ_n equal to the circular law.

Remark 2.7. Note that under A1 the doubly stochastic condition implies that the number of non-zero entries in each row and column is linear in n.
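As a numerical check of the preceding discussion (and of Theorem 2.4 below), here is a minimal sketch with a circulant band profile — an illustrative choice whose normalized variance profile is doubly stochastic:

```python
import numpy as np

rng = np.random.default_rng(3)
n, w = 1000, 100                # band half-width w: 2w+1 nonzero sigma_ij per row
i = np.arange(n)
d = np.abs(i[:, None] - i[None, :])
dist = np.minimum(d, n - d)     # circulant (wrap-around) distance
A = np.sqrt(n / (2 * w + 1)) * (dist <= w)   # then V = A⊙A/n is doubly stochastic
Y = A * rng.standard_normal((n, n)) / np.sqrt(n)
lam = np.linalg.eigvals(Y)
for s in (0.5, 0.8, 1.0):
    print(f"s = {s}: ESD mass in disk = {np.mean(np.abs(lam) <= s):.3f}  (circular law: {s**2:.3f})")
```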
In the following theorem, we relax the irreducibility assumption, which requires some additional argument.
Theorem 2.4 (The circular law for doubly stochastic variance profiles). Let (Y_n)_{n≥1} be a sequence of random matrices as in Definition 1.2, and assume A0 and A1 hold. Suppose also that the normalized variance profiles V_n are doubly stochastic, i.e. V_n 1 = V_n^T 1 = 1 for all n. Then μ_{Y_n} ∼ μ_circ in probability. This parallels results of Girko [32, §7.11, §8.2] and Anderson and Zeitouni [12] (see also [56], [35] for the Gaussian case) for random Hermitian matrices with a doubly stochastic variance profile, which obey the Marchenko–Pastur or Wigner semi-circle laws.

Sufficient conditions for admissibility
Hereafter we introduce a series of assumptions that can be checked directly on the matrices (V_n), without solving the Regularized Master Equations a priori. Each of these assumptions implies A2.
The simplest such assumption is to enforce uniform positivity of the variances, which allows one to bypass some of the most technical portions of our argument. This assumption was also made in the recent work [11]. We generalize A3 below with the expansion-type condition A5.

A3 (Uniform lower bound). There exists σ_min > 0 such that σ_ij^(n) ≥ σ_min for all i, j ∈ [n] and all n ≥ 1.

Proposition 2.5. Let A = (σ_ij) be an n × n matrix with entries σ_ij ≥ σ_min > 0 for some σ_min > 0. Let r > 0 be the unique solution of the Regularized Master Equations (2.7). Then (1/n) Σ_{i∈[n]} r_i(s, t) ≤ 1/σ_min. Consequently, if (A_n) = (σ_ij^(n)) is a sequence of standard deviation profiles as in Definition 1.2 for which A3 holds, then A2 is satisfied, i.e. V_n is admissible.
A4 (Symmetric variance profile). For all n ≥ 1, the normalized variance profile (or equivalently the standard deviation profile) is symmetric: V_n^T = V_n.

Proposition 2.6. Let A = (σ_ij) be a symmetric matrix with nonnegative entries, and let r > 0 be the unique solution of the Regularized Master Equations (2.7). Then r_i(s, t) ≤ 1/(2s) for all i ∈ [n]. Consequently, if (A_n) = (σ_ij^(n)) is a sequence of standard deviation profiles as in Definition 1.2 for which A4 holds, then A2 is satisfied.
We now introduce the following strengthening of the irreducibility assumption, which can be understood as a kind of expansion condition on an associated directed graph.

Definition 2.7 (Robust irreducibility). For δ, κ ∈ (0, 1) we say that a nonnegative n × n matrix A is (δ, κ)-robustly irreducible if conditions (2.13) and (2.14) hold. For comparison, a nonnegative n × n matrix A is irreducible if and only if N_{A^T}(S) ∩ S^c ≠ ∅ for all S ⊂ [n] with 1 ≤ |S| ≤ n − 1. Thus, a matrix A satisfying the conditions of Definition 2.7 is "robustly irreducible" in the sense that A remains irreducible even after setting a small linear proportion of its entries equal to zero.

Remark 2.8 (Relation to broad connectivity). In their work on permanent estimators, Rudelson and Zeitouni assume a stronger expansion-type condition on A(σ_0) which they call broad connectivity [54]. The conditions for a nonnegative square matrix A to be (δ, κ)-broadly connected are the same as in Definition 2.7, except that (2.14) is replaced by a stronger expansion condition.

A5 (Robust irreducibility). There exist constants σ_0, δ, κ ∈ (0, 1) such that for all n ≥ 1, the matrix A_n(σ_0) is (δ, κ)-robustly irreducible.

Note that A5 allows variance profiles with a large proportion of vanishing entries, and is implied by A3.
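Definition 2.7 is formulated through the quantitative conditions (2.13)–(2.14), but the informal description above — irreducibility surviving the deletion of a small linear proportion of entries — can be probed empirically. The following sketch illustrates this informal property only, not the precise definition:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def is_irreducible(A):
    """A nonnegative square matrix is irreducible iff its digraph is strongly connected."""
    n_comp, _ = connected_components(csr_matrix(A > 0), connection="strong")
    return n_comp == 1

rng = np.random.default_rng(4)
n, kappa = 300, 0.02
A = (rng.random((n, n)) < 0.1).astype(float)   # sparse profile: ~10% nonzero entries
A_del = A.copy()
A_del[rng.random((n, n)) < kappa] = 0.0        # zero out a small proportion of entries
print("irreducible before deletion:", is_irreducible(A))
print("irreducible after deletion: ", is_irreducible(A_del))
```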
Theorem 2.8. Consider a sequence of standard deviation profiles A n = (σ (n) ij ) as in Definition 1.2, and assume that A1 and A5 hold. Then A2 holds, i.e. V n is admissible.
It turns out that the mere irreducibility of V_n provides a weaker form of A2 (see Proposition 2.9 in Section 5). The main difference is that the constant C there depends on n and may blow up with n.
We now state two general classes of standard deviation profiles satisfying A5 that include many examples not covered by A3, A4 or Example 2.3. The classes are in a similar spirit, essentially saying that A = (σ_ij) is approximately controlled from below by a block matrix whose nonzero blocks form an irreducible "pattern", and which are "regular" in the sense that the nonzero entries in the blocks are uniformly distributed in a certain sense. In what follows we write

e_A(S, T) := |{ (i, j) ∈ S × T : σ_ij > 0 }|,   ρ_A(S, T) := e_A(S, T) / (|S||T|)

for the number of nonzero entries and the density of the sub-matrix A_{S,T} := (σ_ij)_{i∈S, j∈T}. We recall a common assumption from the extremal combinatorics literature (see for instance [45, Definition 1.6]). For ε, ρ_0 ∈ (0, 1), we say that an m × n matrix A is (ε, ρ_0)-super-regular if for every S ⊂ [m], T ⊂ [n] with |S| ≥ εm, |T| ≥ εn we have ρ_A(S, T) ≥ ρ_0. Informally, this says that all sub-matrices of A of relative size ε have density (proportion of non-zero entries) at least ρ_0.

Definition 2.10 (Block-pseudorandom irreducibility). For δ, ε, ρ_0 ∈ (0, 1) and K ∈ N, we say that a non-negative n × n matrix A is (δ, K, ε, ρ_0)-block-pseudorandomly irreducible, or (δ, K, ε, ρ_0)-BPI for short, if (2.13) holds, and if there is a partition of [n] into sets T_1, …, T_K with |T_q| ≥ n/(2K) for each 1 ≤ q ≤ K, and a K × K irreducible 0/1 matrix M = (m_pq), such that for each (p, q) ∈ [K]² with m_pq = 1, the sub-matrix A_{T_p,T_q} is (ε, ρ_0)-super-regular.
Thus, the BPI condition is met if we can partition [n] into a bounded number of parts of roughly equal size, such that the partitioned matrix contains an irreducible "pattern" of super-regular blocks. We note the ways in which the assumption that A(σ 0 ) is BPI generalizes Example 2.3: 1. The blocks do not need to be the same size.
2. The matrix A does not have to be constant within blocks (and can in fact have a large proportion of zero entries in every block, as well as an arbitrary proportion of entries larger than σ_0).

3. A is arbitrary outside the sub-matrices A_{T_p,T_q} with m_pq = 1 (whereas in Example 2.3 the entries of A were zero there).
We also note that this assumption generalizes the block fully indecomposable assumption from [3] (see Definition 2.9 there), which makes a stronger assumption on the connectivity pattern M than irreducibility (that it is fully indecomposable) and also requires entries to be uniformly bounded below in blocks for which m pq = 1.
Lemma 2.4. Let δ, ε, ρ_0 ∈ (0, 1) and K ∈ N with ε < δ/8, and let A be a (δ, K, ε, ρ_0)-BPI matrix. Then A is (δ', κ)-robustly irreducible, where δ', κ ∈ (0, 1) depend only on δ, K, ε and ρ_0.

Proof. Condition (2.13) ensures that (2.14) holds whenever |S| ≤ δn/4; since n ≥ |S| we conclude that (2.14) holds in this case with κ = δ/4. Assume now that |S| > δn/4. Let G = {q ∈ [K] : |S ∩ T_q| ≥ (1/2)|S||T_q|/n}. By the pigeonhole principle, G is nonempty. Suppose first that G ≠ [K]. Then, since M is irreducible, there exists (p, q) ∈ G^c × G such that m_pq = 1. Since |S ∩ T_q| ≥ |S||T_q|/(2n) ≥ (δ/8)|T_q|, and since ε < δ/8, we deduce from the super-regularity of A_{T_p,T_q} that |N_A(i) ∩ S| ≥ (ρ_0/(4K))|S| for at least (1 − ε)|T_p| values of i ∈ T_p, and hence for at least |T_p|/4 values of i ∈ S^c, where we used that q ∈ G in the former count and that p ∈ G^c in the latter. The claim follows in this case.
It remains to handle the case that |S| > δn/4 and G = [K]. In this case, repeating the lines above shows that |N_A(i) ∩ S| ≥ (ρ_0/(4K))|S| for at least (1 − ε)|T_p| values of i ∈ T_p, for every 1 ≤ p ≤ K (because for every p ∈ [K] we must have m_pq = 1 for at least one value of q). Summing over p, (2.14) follows in this case as well.

Our next sufficient condition for robust irreducibility involves notions from the theory of graph limits, in particular the concept of a graphon. For further background we refer to [46]. For our purposes, a graphon is a (Lebesgue) measurable function W : [0, 1]² → [0, 1]. We equip the set of graphons with the cut norm

‖W‖_□ := sup_{S,T} | ∫_{S×T} W(x, y) dx dy |,

where the integral is taken with respect to the product Lebesgue measure, and the supremum is taken over all pairs of measurable subsets S, T of [0, 1]. To a non-negative n × n matrix A = (σ_ij) we associate the graphon W_A which is equal to σ_ij on the set ((i−1)/n, i/n] × ((j−1)/n, j/n]. We say that a graphon U is a stepfunction if there is a measurable partition P = {S_1, …, S_K} of [0, 1] and a K × K matrix M = (m_pq) such that U(x, y) = m_pq for all p, q ∈ [K] and all (x, y) ∈ S_p × S_q. We call P and M the partition and pattern, respectively, associated with U.

Definition 2.11. Let K ∈ N and α, σ_* ∈ (0, 1). Say that a graphon W is (K, α, σ_*)-irreducible if there exists a stepfunction U with W ≥ U pointwise, and such that the following hold for the partition P = {S_1, …, S_K} and pattern M associated with U: 1. each part S_p has Lebesgue measure at least α; 2. the nonzero entries of M are at least σ_*; 3. M is irreducible.

Lemma 2.5. Let A = (σ_ij) be a non-negative n × n matrix and let K ∈ N and δ, α, σ_* ∈ (0, 1). Assume (2.13) holds, and that there is a (K, α, σ_*)-irreducible graphon W such that ‖W_A − W‖_□ is smaller than a sufficiently small constant depending only on K, δ, α and σ_*. Then A is (δ_0, κ)-robustly irreducible, for δ_0, κ ∈ (0, 1) depending only on K, δ, α and σ_*.

Proof. Arguing as in the proof of Lemma 2.4, it suffices to verify the condition (2.14) for sets S with |S| > δn/4; for smaller sets (2.14) follows from (2.13). We henceforth assume |S^c| ≥ δn/2. Identifying S with a subset Ŝ of [0, 1] and comparing with the stepfunction U, we may choose a set of parts L ⊂ [K] associated with Ŝ, in such a way that both L and L^c are nonempty, since |Ŝ|, |Ŝ^c| ≥ δ/4. Since M is irreducible we must have m_pq ≥ σ_* for some (p, q) ∈ L × L^c.

A computation using the cut-norm approximation of W_A by W then yields e_A(S^c, S) ≥ δ_0 n² for a constant δ_0 > 0 depending only on K, δ, α and σ_*, and (2.14) follows.

Outline of the proof
As is well known in the literature devoted to large non-Hermitian random matrices, the spectral behavior of such matrices can be studied with the help of the so-called Girko Hermitization procedure, which is intimately related to the logarithmic potential of their spectral measure [31]. By definition, the logarithmic potential U_μ of a probability measure μ on C which integrates log|·| near infinity is the function from C to (−∞, ∞] defined by

U_μ(z) := − ∫_C log|λ − z| μ(dλ).

The probability measure μ can be recovered by the formula μ = −(1/2π) ΔU_μ, valid in the set D'(C) of Schwartz distributions, which means that

∫_C ψ dμ = −(1/2π) ∫_C Δψ(z) U_μ(z) ℓ(dz)   for all ψ ∈ C_c^∞(C).

Now, setting μ = μ_{Y_n}, the logarithmic potential can be written as

U_{μ_{Y_n}}(z) = −(1/n) log|det(Y_n − z)| = − ∫_0^∞ log(x) L_{n,z}(dx),

where L_{n,z} := (1/n) Σ_{i=1}^n δ_{s_{i,z}} is the empirical distribution of the singular values s_{1,z} ≥ ⋯ ≥ s_{n,z} ≥ 0 of Y_n − z. For technical reasons, it is easier to consider the symmetrized empirical distribution of the singular values

Ľ_{n,z} := (1/2n) Σ_{i=1}^n (δ_{s_{i,z}} + δ_{−s_{i,z}}),   (2.17)

for which a similar identity holds:

U_{μ_{Y_n}}(z) = − ∫_R log|x| Ľ_{n,z}(dx).

This identity is at the heart of Girko's strategy. In a word, it shows that in order to evaluate the asymptotic behavior of the spectral distribution μ_{Y_n}, we can focus on the asymptotic behavior of Ľ_{n,z} for almost all z ∈ C. By considering Ľ_{n,z}, we are in the more familiar world of Hermitian matrices. Informally, for all z ∈ C, we will find a sequence (ν̌_{n,z})_{n∈N} of deterministic probability measures such that Ľ_{n,z} ∼ ν̌_{n,z}, and we will set h_n(z) := −∫_R log|x| ν̌_{n,z}(dx). A formal passage to the limit then gives that for all ψ ∈ C_c^∞(C),

∫ ψ dμ_{Y_n} + (1/2π) ∫ Δψ(z) h_n(z) ℓ(dz) → 0,

showing that h_n(z) is the logarithmic potential of a probability measure μ_n, and in particular that μ_{Y_n} ∼ μ_n. Further smoothness properties of h_n(z) will finally yield the properties of μ_n stated in Theorem 2.3.
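The identity U_{μ_{Y_n}}(z) = −∫ log|x| Ľ_{n,z}(dx) holds exactly at every finite n, since |det(Y_n − z)| is both the product of the distances |λ_i − z| and the product of the singular values of Y_n − z. A minimal numerical check:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
Y = rng.standard_normal((n, n)) / np.sqrt(n)
z = 0.4 + 0.3j

lam = np.linalg.eigvals(Y)
U_from_eigs = -np.mean(np.log(np.abs(lam - z)))      # -(1/n) sum log|lambda_i - z|

s = np.linalg.svd(Y - z * np.eye(n), compute_uv=False)
U_from_svals = -np.mean(np.log(s))                   # -(1/n) sum log s_{i,z}

print(U_from_eigs, U_from_svals)   # equal up to rounding: both are -(1/n) log|det(Y-z)|
```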
We provide more details hereafter with precise pointers to the article's results.

Study of the associated Hermitian model
This topic is covered in Section 3.
Given z ∈ C, we establish the existence of a sequence (ν̌_{n,z})_{n∈N} of deterministic probability measures such that Ľ_{n,z} ∼ ν̌_{n,z} almost surely. To this end, we introduce the 2n × 2n Hermitian matrix

Y_n^z := [ [0, Y_n − z], [(Y_n − z)^*, 0] ],   (2.18)

whose spectral measure is Ľ_{n,z} [42, Theorem 7.3.7]. The asymptotic analysis of Ľ_{n,z} relies on the resolvent

R_n(z, η) := (Y_n^z − η I_{2n})^{−1},   η ∈ C⁺.   (2.19)

The rigorous use of the Stieltjes transform for the study of ESDs of Hermitian random matrices goes back to Pastur [52], and was further developed by Bai to obtain quantitative results [13,14]. Beginning with the seminal works [26,27] of Erdős, Schlein and Yau, this approach has been used to show that the semicircle law governs the spectral distribution of Wigner matrices down to near-optimal scales. In these works, the basic strategy is to use resolvent identities to show that the Stieltjes transform approximately satisfies a fixed-point equation, sometimes called the Schwinger–Dyson (or master-loop) equation. This approach was extended to Hermitian matrices with a doubly stochastic variance profile [28]. However, for Hermitian matrices with more general variance profiles and non-zero mean, it becomes necessary to consider a system of equations that are approximately satisfied by the individual diagonal entries of the resolvent.
In Section 3 we derive the Schwinger–Dyson equations for our setting, cf. Proposition 3.1: for η ∈ C⁺,

p_i = −(η + (V_n p̃)_i) / ((η + (V_n p̃)_i)(η + (V_n^T p)_i) − |z|²),
p̃_i = −(η + (V_n^T p)_i) / ((η + (V_n p̃)_i)(η + (V_n^T p)_i) − |z|²),   i ∈ [n],   (2.20)

with unique solution the 2n × 1 vector p = (p^T, p̃^T)^T satisfying Im p > 0. We prove that there exists a probability distribution ν̌_{n,z} whose Stieltjes transform g_{ν̌_{n,z}} is given by g_{ν̌_{n,z}}(η) = (1/n) Σ_{i∈[n]} p_i(η). In Theorem 3.2, it is established that for all η ∈ C⁺, g_{Ľ_{n,z}}(η) − g_{ν̌_{n,z}}(η) → 0 almost surely, which in particular implies that Ľ_{n,z} ∼ ν̌_{n,z} a.s. In Proposition 3.6 we obtain a quantitative estimate along the imaginary axis of the form

| E g_{Ľ_{n,z}}(it) − g_{ν̌_{n,z}}(it) | ≤ n^{−1/2} t^{−c}   (2.21)

for some integer c.
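The system (2.20), in the form displayed above, lends itself to the fixed-point iteration of part (2) of Proposition 3.1. The following minimal sketch (solve_SD and the smooth test profile are illustrative choices) compares the deterministic quantity (1/n) Σ_i p_i(η) with the empirical Stieltjes transform of Ľ_{n,z}:

```python
import numpy as np

def solve_SD(V, z, eta, iters=2000):
    """Fixed-point iteration for the Schwinger-Dyson equations (2.20),
    as displayed above; eta lies in the upper half-plane."""
    n = V.shape[0]
    p, pt = 1j * np.ones(n), 1j * np.ones(n)
    for _ in range(iters):
        a = eta + V @ pt
        b = eta + V.T @ p
        den = a * b - abs(z) ** 2
        p, pt = -a / den, -b / den
    return p, pt

rng = np.random.default_rng(6)
n = 400
A = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(n) / n)[:, None] * np.ones((1, n))
V = A**2 / n
z, eta = 0.5, 0.5j

p, pt = solve_SD(V, z, eta)
Y = A * rng.standard_normal((n, n)) / np.sqrt(n)
s = np.linalg.svd(Y - z * np.eye(n), compute_uv=False)
sym = np.concatenate([s, -s])                 # support of the symmetrized measure
print("g_det =", p.mean(), "  g_emp =", np.mean(1.0 / (sym - eta)))
```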
We note that recently there has been much work analyzing the Schwinger–Dyson equations corresponding to Hermitian random matrices with mean and variance profiles satisfying a range of assumptions. For the centered case one is led to the so-called Quadratic Vector Equation, which has been thoroughly analyzed in works of Ajanki, Erdős and Krüger and of Alt, Erdős and Krüger [3,10]; the application to universality for local spectral statistics was carried out in [2]. Very recently they have made the extension to matrices with correlated entries, which involves the study of the so-called Matrix Dyson Equation [4]. In another recent work, He, Knowles and Rosenthal prove an approximate (matrix-valued) self-consistent equation for resolvents of Hermitian random matrices with arbitrary mean and variance profile, which covers the structure of the model (2.18) [39]. However, they assume the entries have all moments finite, and their aim is to obtain a local law at the optimal scale. In the present work, a sub-optimal quantitative analysis of the system (2.20) under limited moment assumptions will suffice for our purposes of understanding the spectrum of the associated non-Hermitian model Y_n at the global scale.
From the spectral measures Ľ_{n,z} ∼ ν̌_{n,z} to the spectral measures μ_{Y_n} ∼ μ_n via the associated logarithmic potentials
This topic is covered in Sections 3 (partly) and 6.
The fact that Ľ_{n,z} ∼ ν̌_{n,z} a.s. does not ensure that the random logarithmic potential U_{μ_{Y_n}}(z) becomes close to the deterministic logarithmic potential h_n(z) (assuming the latter is well defined). Essentially, this is due to the fact that x ↦ log|x| is unbounded near zero and infinity. While the singularity at infinity is easily handled using the almost sure tightness of the measures Ľ_{n,z}, the singularity at zero presents a major technical challenge (indeed, this hurdle was the reason it took decades to establish the circular law under the optimal hypotheses). We show that under the admissibility assumption A2, x ↦ log|x| is ν̌_{n,z}-integrable, and that for all τ, τ' > 0 there is ε > 0 small enough such that

P( |∫_{−ε}^{ε} log|x| Ľ_{n,z}(dx)| > τ ) < τ'   for all large n.   (2.22)

Together with the almost sure tightness and the weak convergence Ľ_{n,z} ∼ ν̌_{n,z}, we can show that U_{μ_{Y_n}}(z) − h_n(z) → 0 in probability. The almost sure convergence is an open problem not covered in this article. The proof of (2.22) is based on two ingredients. The first one is a result from [23] by the first author giving a lower tail estimate for the smallest singular value of Y_n − z for arbitrary fixed z ∈ C \ {0} under the sole assumption A1 on the standard deviation profile A_n. For the second ingredient, established in Section 6, we show that there exist two constants C, γ_0 > 0 such that for all x > 0,

E Ľ_{n,z}([−x, x]) ≤ C x^{γ_0}.

Bounds of this form on the expected density of states for random Hermitian operators are sometimes referred to as local Wegner estimates (after the work [62]). Their application to the convergence of the empirical spectral distribution for non-Hermitian random matrices goes back to Bai's proof of the circular law [15]; our presentation of the argument is closer to the one in [34].
Such an estimate follows from control on Im E g_{Ľ_{n,z}}(it) for small t > 0, which is obtained in two steps. First, sufficient control of E g_{Ľ_{n,z}}(it) − g_{ν̌_{n,z}}(it) is already provided by the estimate (2.21).
Second, we rely on A2 to argue that g_{ν̌_{n,z}}(it) is bounded independently of n and t. For this task, a variation of the Schwinger–Dyson equations, namely the Regularized Master Equations introduced in Proposition 2.1, is obtained simply by setting η = it in the Schwinger–Dyson equations (2.20); in this case p ∈ iR^{2n}, and r = (r^T, r̃^T)^T is defined as r(t) = Im p(it). Hence by A2,

Im g_{ν̌_{n,z}}(it) = (1/n) Σ_{i∈[n]} r_i(t) ≤ C

for some C < ∞ independent of n and t (depending only on |z| and the parameters in our assumptions).

Description of the deterministic probability measure µ n
This is covered in Sections 4 and 7.
We have proved so far that μ_{Y_n} ∼ μ_n in probability, where μ_n is the probability measure whose logarithmic potential U_{μ_n}(z) coincides with h_n(z). It remains to establish the properties of μ_n stated in Theorem 2.3.
In Section 4 we prove Theorem 2.2. Our approach to obtaining the solution q(s) of (1.5) is through the Regularized Master Equations (2.7). Since these equations are obtained by a simple transformation of the Schwinger–Dyson equations (2.20), by our work in Section 3 we know that (2.7) has a unique solution r(s, t) satisfying r > 0, where we write s = |z|. We then show that the pointwise limit r*(s) := lim_{t↓0} r(s, t) exists, and moreover that q = r* is the unique solution to (1.5). Having properly defined q(s), our main task is to show that the distribution −(2π)^{−1} Δh_n(z) in fact defines a density on the set D := {z ∈ C : |z| ≠ 0, |z| ≠ ρ(V_n)^{1/2}} and to provide an expression for this density. The general approach towards solving this problem can be found in the physics literature (see [29]). Define on C × (0, ∞) the functions

U_{Y_n}(z, t) := −(1/2) ∫_R log(x² + t²) Ľ_{n,z}(dx)   and   U_n(z, t) := −(1/2) ∫_R log(x² + t²) ν̌_{n,z}(dx).

For fixed t > 0, these functions can be seen as regularized versions of the logarithmic potentials U_{μ_{Y_n}}(z) and U_{μ_n}(z) respectively, to which they converge as t ↓ 0 in D'(C). On the other hand, let us consider again the resolvent R_n(z, η) introduced in (2.19).
Using the well-known formula for the inverse of a partitioned matrix [42, §0.7.3] and writing

R_n(z, η) =: [ [G(z, η), F(z, η)], [F̃(z, η), G̃(z, η)] ],   (2.25)

we get, by setting η = it, that ∂_z U_{Y_n}(z, t) coincides with n^{−1} tr F_n(z, it). Relying on the asymptotic analysis of the resolvent R_n made in Section 3, we can then easily obtain an expression for ∂_z U_n(z, t) by considering the asymptotic behavior of n^{−1} tr F_n(z, it). We then conclude by studying the resulting expression in the limit t ↓ 0. Section 7.1 is devoted to these questions.
In Section 7.2 we conclude the proof of Theorem 2.4. As we noted above, the key is that (2.23) is easily obtained under the double stochasticity assumption by examining the explicit solution r to the Regularized Master Equations.

Sufficient conditions for A2 to hold
This topic is covered in Section 5.
While (2.23) can be proved in a few lines under A3 (see Proposition 2.5) or A4 (see Proposition 2.6), establishing such a bound under the more general robust irreducibility assumption A5 is significantly more technical. Here it is helpful to view the standard deviation profile in terms of the associated directed graph Γ(A(σ_0)) (which was defined in Section 2.1). The basic idea is that the equations (2.7) relate the size of the components r_i(t), r̃_i(t) at a vertex i to the sizes of the components at neighboring vertices. Assuming towards a contradiction that r_{i_0}(t) is large at some vertex i_0, we can use the robust irreducibility assumption to propagate the property of having large r_i(t) to most of the other vertices i. We can also use the equations (2.7) to show that r̃_i(t) will consequently be small for most i. However, this contradicts the crucial trace identity

(1/n) Σ_{i∈[n]} r_i(t) = (1/n) Σ_{i∈[n]} r̃_i(t).

See Section 5 for further details.
We remark that under the stronger broad connectivity assumption on the standard deviation profile (see Remark 2.8), Wegner-type estimates that are sufficient for the purposes of this paper were obtained by the first author by a completely different argument, following a geometric approach introduced by Tao and Vu in [60] -see [24, Theorem 4.5.1].

Open questions
Relaxing the robust irreducibility assumption While control on the smallest singular value is proved under very general conditions (see Proposition 6.1), we have made the additional robust irreducibility assumption A5 in order to handle the other small singular values via Wegner estimates. Would it be possible to lighten this assumption?
Almost sure convergence One may want to upgrade the convergence µ Y n ∼ µ n in probability in Theorem 2.3 to almost sure convergence, as discussed in Remark 2.5.
Extension to sparse models While our assumptions allow any fixed proportion of the entries σ ij to be zero, Assumption A1 requires the number of non-zero entries to be a constant proportion of the total number of entries. Indeed, otherwise by the Weyl comparison inequality (cf. e.g. [41, Theorem 3.3.13]), the empirical spectral distributions µ Y n converge weakly in probability to δ 0 , the point mass at the origin. To obtain a nontrivial limit would require a rescaling of the matrices Y n , which amounts to rescaling A n to have entries of growing size.
A1 is required both to bound the smallest singular value of the shifted random matrices and to prove effective bounds on the Stieltjes transform. We expect that our results should extend to certain matrices with density ∼ n^{ε−1} for arbitrary fixed ε ∈ (0, 1), suitably rescaled. An interesting first case to consider is random band matrices with shrinking bandwidth. The limit of the empirical distribution of the singular values was recently computed in [43], but bounds on the smallest singular value were not considered.
Extension to allow heavy-tailed entries In a similar direction, it would be interesting to prove an analogue of Theorem 2.3 for the case that the entries X ij lie in the basin of attraction of an α-stable law for some α ∈ (0, 2). In this case we expect the deterministic equivalents µ n will not have compact support. The limiting empirical distribution of singular values for such matrices (allowed to be rectangular with bounded eccentricity) was studied by Belinschi, Dembo and Guionnet in [18]. For the case that the entries are i.i.d. the limiting empirical spectral distribution was established by Bordenave, Caputo and Chafaï in [20].

Asymptotics of singular value distributions
Recall that in Section 2.5 we introduced the Hermitian matrix Y_n^z in (2.18), whose spectral measure is Ľ_{n,z} in (2.17). We also introduced the resolvent R(z, η) of Y_n^z in (2.19) and labeled its blocks in (2.25), writing in particular

G = G(z, η),  G̃ = G̃(z, η)  and  F = F(z, η).   (3.1)

The main objective of this section is to provide deterministic counterparts of the normalized traces of these matrix functions. Given the matrices G and G̃, it will be convenient to introduce the vectors

g = (G_{11}, …, G_{nn})^T,  g̃ = (G̃_{11}, …, G̃_{nn})^T  and  g = (g^T, g̃^T)^T.   (3.2)

We begin by deriving the Schwinger–Dyson equations, a system of equations approximately satisfied by the diagonal entries of the matrices in (3.1). We then show that the Schwinger–Dyson equations have a unique solution corresponding to Stieltjes transforms of probability measures, and analyze the properties of the solution. Finally, we estimate the difference between (3.1) and the true solution of the Schwinger–Dyson equations, which in turn is used to estimate the difference between the empirical spectral measure of Y_n^z and its deterministic counterpart.

Notation 3.1. Let α_n = α_n(z, η) and β_n = β_n(z, η) be complex sequences such that there exist a constant C > 0 and integers c_0, c_1, all independent of η and n but possibly depending on z, such that

|α_n| ≤ C (1 + |η|^{c_0}) (1 + Im(η)^{−c_1}) |β_n|.

We denote this by α_n = O_η(β_n). If α_n = α_n^i and β_n = β_n^i depend on some extra parameter i ∈ I, then the notation O_η(·) in α_n^i = O_η(β_n^i) must be understood as uniform in i. If α_n and β_n are vectors or matrices, the notation α_n = O_η(β_n) corresponds to a uniform entry-wise relation.

Derivation of the Schwinger-Dyson equations
In this subsection we specialize to the case that the entries of X are i.i.d. standard complex Gaussian variables. Later we will compare a general matrix with a Gaussian matrix, at which point we will label the Gaussian matrix and associated quantities with a superscript N ; however, we omit the superscript in the present subsection.
For a resolvent R as defined in (2.19) with complex entries, the standard resolvent differentiation formulas hold true and will be needed in the sequel. We will also rely heavily on the variance estimates provided in Proposition A.1. Applying the integration by parts formula for complex Gaussian random variables (see for instance [51]) and plugging the result into (3.5) yields (3.6); arguing similarly with the help of the analogous integration by parts formula, valid for i > n, we obtain the companion relations (3.7)–(3.9). Notice that equations (3.6)–(3.9) can be compactly written as (3.10). Using the Cauchy–Schwarz inequality and the estimates in Proposition A.1 to decorrelate the relevant terms, and using the property (2.2) twice, we arrive at the perturbed equation (3.11) for the entries of g. Combining similarly equations (3.6) and (3.9), and decorrelating when needed with the help of Proposition A.1, we obtain the companion equation (3.12). We now introduce an unperturbed version of equations (3.11) and (3.12).

Schwinger-Dyson equations
In this section we introduce the Schwinger–Dyson equations (2.20). Notice that these equations already appear in [4], from which we will deduce properties of their solutions.
To write these more compactly, for any two n × 1 vectors b and b̃ with complex components we write b = (b^T, b̃^T)^T, and we use the matrix Υ (see (3.13)) and the map J. Both Υ and J depend on z as well (and, to be even more precise, only on |z|); we will not indicate this dependence in the sequel. We now collect properties of solutions to (2.20).
Proposition 3.1.
1. The system (2.20) admits a unique solution p = (p^T, p̃^T)^T satisfying Im p > 0.
2. For any initial vector p_0 with Im p_0 > 0, the iterations p_{k+1} = J(p_k) converge to this solution p as k → ∞.
3. For all z ∈ C and i ∈ [n], the functions η ↦ p_i(η) and η ↦ p̃_i(η) are Stieltjes transforms of symmetric probability measures on R, respectively denoted by μ_i and μ̃_i. In particular, writing r_i(t) := Im p_i(it) and r̃_i(t) := Im p̃_i(it), we have r_i(t), r̃_i(t) > 0 for all t > 0.
4. For all η ∈ C⁺, Σ_{i∈[n]} p_i(η) = Σ_{i∈[n]} p̃_i(η).
5. The function η ↦ (1/n) Σ_{i∈[n]} p_i(η) is the Stieltjes transform of a symmetric probability measure ν̌_{n,z}. We denote this Stieltjes transform by η ↦ g_{ν̌_{n,z}}(η).
We henceforth refer to the solution p = p(η), Im p > 0, as the solution to the Schwinger–Dyson equations.
Proof. Parts (1) and (2) are a direct application of the Earle–Hamilton theorem [38], which was first used (to the authors' knowledge) in random matrix theory in [40]. We now address parts (3) and (5). Writing p(it) = i r(t) with r(t) = (r_i(t)) and r_i(t) > 0, and similarly p̃(it) = i r̃(t) with r̃(t) = (r̃_i(t)) and r̃_i(t) > 0, notice for future use that

lim_{t→∞} t r_i(t) = 1   and   lim_{t→∞} t r̃_i(t) = 1.   (3.17)

Parts (3) and (5) of the proposition are established.
We now prove part (4), that is, Σ_i p_i = Σ_i p̃_i. Going back to the system (2.20) and summing the equations over i, one obtains an identity of the form η(Σ_i p_i − Σ_i p̃_i) = 0; since η ≠ 0, we get the desired result.

Theorem 3.2. Assume A0 and A1 hold. Then for all z ∈ C, (ν̌_{n,z})_n is tight, and Ľ_{n,z} ∼ ν̌_{n,z} almost surely. Moreover, for any ε > 0, x ↦ log|x| is ν̌_{n,z}-integrable on the set {|x| ≥ ε}.

Asymptotics of the spectral measure Ľ_{n,z} and the Hermitian resolvent
We will sometimes refer to ν̌_{n,z} as the deterministic equivalent of Ľ_{n,z}. The proof of Theorem 3.2 is postponed to Section 3.5. Notice that the first part (Ľ_{n,z} ∼ ν̌_{n,z}) is a variation of classical results; see for example [36]. It will be a direct consequence of the forthcoming theorem on the asymptotics of the Hermitian resolvent.
In order to get some insight into the asymptotics of the spectral measure μ_{Y_n}, we need more than the asymptotics of Ľ_{n,z}. We rewrite hereafter the Schwinger–Dyson equations of Proposition 3.1 in a form more suitable for the forthcoming analysis. In what follows, the dependence on |z| is implicit and will be recalled when necessary.
We now introduce the deterministic equivalents of F and G, defined in (3.1). Let p = (p, p̃) be the solution of the Schwinger–Dyson equations (2.20). Define the n × n diagonal matrices P := diag(p), P̃ := diag(p̃), Θ and Θ̃. After easy massaging, the Schwinger–Dyson equations p = J(p, η) are equivalent to a fixed-point equation for a 2n × 2n matrix S built from these diagonal matrices, whose diagonal n × n blocks provide the deterministic equivalents of G and G̃; the block B(z, η), the deterministic counterpart of F, can be made explicit in a similar fashion.
Theorem 3.3. Assume A0 and A1 hold. Then almost surely, for every z ∈ C and η ∈ C⁺,

(1/n) tr G(z, η) − (1/n) Σ_{i∈[n]} p_i(η) → 0   as n → ∞.

Moreover, there exist α, β > 0 such that for t ∈ (n^{−α}, n^{β}) and n large enough, with η = it, the expectation of this difference is bounded in absolute value by n^{−1/2} t^{−c} for some integer c. The rate provided along the imaginary axis is not optimal, but is sufficient for our purposes.
The proof of Theorem 3.3 immediately follows from Propositions 3.4, 3.5 and 3.6 stated hereafter.

Proposition 3.4. Assume A0 holds and let z ∈ C and η ∈ C⁺. Then almost surely,

(1/n) tr G(z, η) − (1/n) E tr G(z, η) → 0   as n → ∞.

Proof. This is a direct application of [19, Lemma 4.21].
To manage the expectation terms n^{−1} E tr(·), we introduce the Gaussian counterparts of the quantities of interest. Consider a family of i.i.d. standard complex Gaussian random variables (X_ij^N ; 1 ≤ i, j ≤ n), where X_ij^N = (U + iU')/√2 with U, U' independent real N(0, 1) random variables. Notice in particular that E (X_ij^N)² = 0 and E |X_ij^N|² = 1.

Proposition 3.5. Assume A0 and A1 hold. Let z ∈ C and η ∈ C⁺. Then

(1/n) E tr G(z, η) − (1/n) E tr G^N(z, η) → 0   as n → ∞.

The proof of Proposition 3.5 relies on fairly standard arguments and is thus postponed to Appendix A.3.

Proposition 3.6. Assume A1 holds, and let z ∈ C and η ∈ C⁺. Then

(1/n) E tr G^N(z, η) − (1/n) Σ_{i∈[n]} p_i(η) → 0   as n → ∞.

Moreover, there exist α, β > 0 such that for t ∈ (n^{−α}, n^{β}) and n large enough, with η = it, the same difference is bounded in absolute value by n^{−1/2} t^{−c} for some integer c. The proof of Proposition 3.6 follows hereafter in Section 3.4. From Theorem 3.3, we will deduce the asymptotic behavior of the empirical distribution L_{n,z} of the singular values of Y_n − z by analyzing the convergence of n^{−1} tr G(z, η). Moreover, for any ε > 0, the asymptotic behavior of ∫_{|x|≥ε} log|x| Ľ_{n,z}(dx) will be identified.

Proof of Proposition 3.6
We first prove the convergence to zero. Recall that V = (A/√n) ⊙ (A/√n), that g = (g^T, g̃^T)^T is introduced in (3.2), and that p is the solution of the Schwinger–Dyson equations; define the error ε_∞(η) as the maximal entry-wise deviation of E g(η) from p(η). Recall also the definition of Υ in (3.13). For given n × 1 vectors b and b̃, we write b = (b^T, b̃^T)^T as before.
From the perturbed equations one obtains an expression for E G_ii − p_i, and a similar expression holds for E G̃_ii − p̃_i. Taking into account the straightforward resolvent bounds, this yields a self-improving inequality for ε_∞(η) with factor K((|η|² + |z|²)/Im⁴(η) + 1/Im⁶(η)), where K is an absolute constant (depending on σ_max). Letting η ∈ C⁺ be chosen in such a way that K((|η|² + |z|²)/Im⁴(η) + 1/Im⁶(η)) < 1, one has ε_∞(η) → 0 as n → ∞. On the other hand, E G_ii − p_i is analytic and uniformly bounded over the compact subsets of C⁺. Therefore, every converging subsequence converges towards an analytic function which coincides with the zero function on a sufficiently large open subset of C⁺, and hence is equal to the zero function over C⁺. The same applies to E G̃_ii − p̃_i. This proves that ε_∞(η) → 0 as n → ∞ for every η ∈ C⁺.
We have proved so far that the desired estimate holds for n ≥ n_0(α, β) large enough and η ∈ Γ_n(α, β), and we are now in a position to conclude. The same arguments apply verbatim to the companion term involving G̃, using (3.7) and (3.19), and one can handle similarly the term (1/n) tr E F(z, η) − (1/n) tr B(z, η). This completes the proof of Proposition 3.6.

Proof of Theorem 3.2
The convergence Ľ_{n,z} ∼ ν̌_{n,z} is a direct consequence of Theorem 3.3. Now, it is easy to prove with the help of the law of large numbers that, almost surely, the sequence (Ľ_{n,z})_n is tight. This, together with Proposition 3.1-(5), yields the remaining statements. The proof is complete.

Regularized Master Equations
Our first step is to introduce the so-called Regularized Master Equations, which are obtained from the Schwinger–Dyson equations (2.20) by taking η = it and substituting p(it) = i r(t). Given two n × 1 vectors a and ã with nonnegative components and fixed numbers s > 0 and t ≥ 0, let a = (a^T, ã^T)^T and define the diagonal matrix

Ψ(a, t) := diag( (s² + ((V ã)_i + t)((V^T a)_i + t))^{−1} ; i ∈ [n] ),

together with the map I(·, t) sending a to the vector with components [Ψ(a, t)(V ã + t1)]_i and [Ψ(a, t)(V^T a + t1)]_i, so that the Regularized Master Equations (2.7) read r = I(r, t).

Remark 4.1. We shall also prove that for any initial vector r_0 ⪈ 0, the iterations r_{k+1} = I(r_k, t) converge to r as k → ∞.
Proof of Proposition 2.1. This follows immediately from Proposition 3.1 by setting p(it) = i r(t) and p̃(it) = i r̃(t).

Proof of Theorem 2.2
In the following propositions, we recall some standard Perron–Frobenius properties of nonnegative and irreducible matrices: the spectral radius is an eigenvalue associated with a positive eigenvector, the Perron eigenvalue of an irreducible matrix is algebraically simple, and a strictly subinvariant vector forces the spectral radius to be strictly less than one.

Lemma 4.1. Let V be nonnegative and irreducible, and fix s > 0. Then

sup_{t ∈ (0,1]} ‖r(s, t)‖_∞ < ∞.

In particular, r(s, t) admits an accumulation point for fixed s > 0 as t ↓ 0.
Next we show that any accumulation point provided by the above lemma constitutes a solution to the Master Equations (1.5):

Lemma 4.2.
1. Let s > 0. If r* = (r*, r̃*) ≥ 0 is an accumulation point of r(s, t) as t ↓ 0, then r* is a solution of (1.5).
2. If moreover s² ∈ (0, ρ(V)), then r* ≠ 0.
Proof. The proof of part (1) is straightforward.
Lemma 4.3. Assume that V is irreducible. Then:
1. If s² ≥ ρ(V), then q = 0 is the unique solution of (1.5).
2. If s² ∈ (0, ρ(V)), then (1.5) admits a unique nontrivial solution q ⪈ 0; moreover q, q̃ > 0.

Proof. We first prove Item (1). Observe that Ψ(q)V^T is a nonnegative irreducible matrix for all q ≥ 0. Assume that q ≠ 0. Then ρ(Ψ(q)V^T) = 1 and q > 0 by Proposition 4.2. From the equation 1^T q = 1^T q̃ obtained from (1.5), we have q̃ ≠ 0; therefore q̃ > 0 by an argument similar to the one used for q. Consequently, Ψ(q) < s^{−2} I entry-wise, leading to the contradiction

1 = ρ(Ψ(q)V^T) < ρ(s^{−2} V^T) = s^{−2} ρ(V) ≤ 1,

where the strict inequality is due to Proposition 4.1. Hence q = 0. We now turn to Item (2). The argument showing that q ≠ 0 implies q, q̃ > 0 is identical to Item (1).
The first step towards establishing uniqueness of the solution is showing that if q = (q^T, q̃^T)^T and q' = ((q')^T, (q̃')^T)^T are two positive solutions such that q ≠ q', then Ψ(q) ≠ Ψ(q'). Assume the contrary. The equation q̃ = Ψ(q)V^T q shows that 1 is the Perron–Frobenius eigenvalue of the irreducible matrix Ψ(q)V^T (Proposition 4.2). Since its eigenspace has dimension one, we get that q' = α q for some α > 0. A similar argument shows that q̃' = α̃ q̃ for some α̃ > 0. Using the assumption Ψ(q) = Ψ(q') again and inspecting the expressions of these terms, we get that α = α̃^{−1}. Moreover, the equations 1^T q = 1^T q̃ and 1^T q' = 1^T q̃' show that α = α̃. This implies that q = q', a contradiction.
To establish uniqueness, consider again the two positive solutions q ≠ q'. Write Ψ = diag(ψ_i) = Ψ(q) and Ψ' = diag(ψ'_i) = Ψ(q'), and define the vectors ϕ, ϕ̃ and their similarly defined analogues ϕ', ϕ̃'. It holds by the irreducibility of V that ϕ, ϕ' > 0. Writing the equations satisfied by ϕ_i and ϕ̃_i and comparing the two solutions, we obtain, for every i ∈ [n], a bound on ε_i := |(ϕ_i − ϕ'_i)/√(ϕ_i ϕ'_i)|, together with a similar inequality for ε̃_i := |(ϕ̃_i − ϕ̃'_i)/√(ϕ̃_i ϕ̃'_i)|. It results that the vector ε = (ε_1, …, ε_n, ε̃_1, …, ε̃_n)^T satisfies the inequality ε ≤ K_{q,q'} ε for a certain nonnegative 2n × 2n matrix K_{q,q'}. By applying the Cauchy–Schwarz inequality to the scalar products ⟨x_m, 1⟩, m = 1, …, 2n, where x_m is the m-th row of K_{q,q'}, we get that K_{q,q'} 1 ≤ 1 entry-wise. Since ϕ, ϕ' > 0 and V is irreducible, it holds that for any (i, j) ∈ [2n]² there exists k ∈ [n] such that [K^k_{q,q'}]_{ij} > 0, implying that K_{q,q'} is irreducible. Relying on these results, we will show that there exists i ∈ [n] such that ⟨x_i, 1⟩ < 1, i.e. the Cauchy–Schwarz inequality is strict for this row vector. Proposition 4.2 will then show that ρ(K_{q,q'}) < 1. In consequence, the only solution to the inequality ε ≤ K_{q,q'} ε is ε = 0, contradicting the assertion q ≠ q', and the uniqueness of the nontrivial solution of (1.5) follows.
Recalling that Ψ(q) ≠ Ψ(q'), there exists ℓ ∈ [n] such that ϕ_ℓ ϕ̃_ℓ ≠ ϕ'_ℓ ϕ̃'_ℓ. Since V is irreducible, no column of V is zero. Therefore, we can choose i ∈ [n] whose row in K_{q,q'} receives a contribution from the index ℓ, and form the associated vector v_i together with its analogue v'_i (with the obvious notations). Consider also the 2 × 2 matrix M_2 built from the ℓ-th coordinates of these vectors. Since det(M_2) ≠ 0 by the choice of ℓ, the vectors v_i and v'_i are not collinear. Therefore, the Cauchy–Schwarz inequality applied to the i-th row of K_{q,q'} is strict, as announced.

Lemma 4.3 is proved.
It remains to establish Theorem 2.2-(3). This is a consequence of the following lemma, which also provides an expression for ∇q(s) on (0, ρ(V)^{1/2}).

Lemma 4.4.
Assume that the nonnegative n × n matrix V is irreducible. Then the function s ↦ q(s) is continuous on (0, ∞), and is continuously differentiable on (0, ρ(V)^{1/2}) ∪ (ρ(V)^{1/2}, ∞); on this set the matrix A(s) has full column rank, and ∇q(s) = 2s A(s)^{−L} b(s), where A(s)^{−L} denotes a left inverse of A(s).

Proof. We already know that q(s) = 0 on [ρ(V)^{1/2}, ∞), and we can easily check that ∇q(s) = 2s A(s)^{−L} b(s) on (ρ(V)^{1/2}, ∞). Let us show that q(s) is continuous on (0, ρ(V)^{1/2}). Fix s ∈ (0, ρ(V)^{1/2}). When u belongs to a small neighborhood of s in (0, ρ(V)^{1/2}), the function q(u) is bounded by Lemma 4.1, since q(u) is the limit as t ↓ 0 of the bounded function q(u, t). Let u_k → s be such that q(u_k) → q*. The vector q* is clearly a solution to (1.5). Observing that ρ(Ψ(q(u_k))V) = 1, we get by the continuity of the spectral radius that ρ(Ψ(q*)V) = 1. If q* were equal to zero, then we would have ρ(Ψ(q*)V) = ρ(s^{−2}V) > 1, a contradiction. Therefore q* ≠ 0, and by Lemma 4.3-(2), q* = q(s), since q(s) is the only nonnegative and nonzero solution to (1.5). To obtain the continuity of q(s) on (0, ∞), all that remains to prove is that q(u) → 0 as u ↑ ρ(V)^{1/2}. Relying on Lemma 4.1, take a sequence u_k ↑ ρ(V)^{1/2} such that q(u_k) → q*. Then the same argument as in the proof of Lemma 4.3-(1) shows that q* = 0.
To establish the differentiability of q(s) on (0, ρ(V)^{1/2}), we start by writing the equation satisfied by q_i(s) in a form amenable to differentiation. Doing a similar derivation for q̃_i(s), we get the equation q(s) = N(s) q(s) for a certain nonnegative matrix N(s). As in the proof of Lemma 4.3-(2), we can show that N(s) is irreducible. Thus, the Perron–Frobenius eigenvalue of N(s) is equal to one, it is algebraically simple, and its associated eigenspace is generated by q(s). Now, given two real numbers s, s' ∈ (0, ρ(V)^{1/2}) with s ≠ s', we write q_i = q_i(s) and q'_i = q_i(s'), with the same notational shortcut for all the other quantities, and compare the equations satisfied by q_i − q'_i and q̃_i − q̃'_i. This gives a linear system for the differences; using in addition the identity 1^T q = 1^T q̃, we conclude that q(s) is differentiable at any s ∈ (0, ρ(V)^{1/2}), with gradient 2s A(s)^{−L} b(s). The continuity of this gradient follows from the continuity of A(s) and b(s) and the fact that A(s) has full column rank for any s ∈ (0, ρ(V)^{1/2}).

Bounding solutions to the Regularized Master Equations via graphical bootstrapping
In this section we are concerned with establishing bounds on the solution $\vec r(s,t) \succ 0$ of the Regularized Master Equations (2.7) that are uniform in the regularization parameter $t > 0$. Here we view the standard deviation profile $A$ and all parameters as fixed. Hence, we fix $n \geq 1$ and consider an arbitrary nonnegative $n \times n$ matrix $A = (\sigma_{ij})$. Putting $V = \frac{1}{n} A \odot A$ (the entrywise square of $A$, scaled by $1/n$) and fixing $s, t > 0$, we let $\vec r = \vec r(s,t)$ denote the unique solution of the Regularized Master Equations satisfying $\vec r \succ 0$, as provided by Proposition 2.1.

Some preparation and proofs of Propositions 2.5 and 2.6
We begin by recording some key estimates and identities that will be used repeatedly in the sequel. We may write the Regularized Master Equations (2.7) in component form (every quantity appearing in a denominator is positive for $t > 0$, so we are free to take reciprocals).
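Systems of this shape are convenient to solve numerically by fixed-point iteration. The following minimal sketch does so for a schematic stand-in for (2.7): the component form below, the helper name `solve_rme`, and all constants are our illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def solve_rme(V, s, t, iters=5000):
    """Damped fixed-point iteration for a schematic regularized system:
        r_i  <- a_i / (s^2 + a_i * b_i),   rt_i <- b_i / (s^2 + a_i * b_i),
    with a = V @ rt + t and b = V.T @ r + t.  This mimics the structure
    of (2.7); the exact component form in the text may differ in detail."""
    n = V.shape[0]
    r, rt = np.full(n, 1.0), np.full(n, 1.0)
    for _ in range(iters):
        a = V @ rt + t
        b = V.T @ r + t
        denom = s ** 2 + a * b
        # damping encourages convergence of the iteration
        r = 0.5 * r + 0.5 * a / denom
        rt = 0.5 * rt + 0.5 * b / denom
    return r, rt

rng = np.random.default_rng(1)
n = 50
A = rng.uniform(0.5, 1.5, size=(n, n))   # a generic positive profile
V = A * A / n                            # normalized variance profile
s, t = 0.7, 1e-3
r, rt = solve_rme(V, s, t)

# residual check: the output solves the fixed-point system to high accuracy
a, b = V @ rt + t, V.T @ r + t
denom = s ** 2 + a * b
print(np.abs(r - a / denom).max(), np.abs(rt - b / denom).max())
```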
Additionally, from Proposition 2.1 we have the trace identity $\frac{1}{n} \sum_{j=1}^n r_j = \frac{1}{n} \sum_{j=1}^n \tilde r_j$ (5.3).
for all $i \in [n]$. We can similarly bound the corresponding product.

Proof of Proposition 2.6. If $V = \frac1n A \odot A$ is symmetric, then the $2n$ regularized master equations merge into $n$ equations, since $\vec r = \vec{\tilde r}$. Indeed, since $V^{\mathsf T} = V$, if $(\vec r^{\,\mathsf T}, \vec{\tilde r}^{\,\mathsf T})^{\mathsf T}$ is a solution of the Regularized Master Equations, then so is $(\vec{\tilde r}^{\,\mathsf T}, \vec r^{\,\mathsf T})^{\mathsf T}$. By uniqueness, $\vec r = \vec{\tilde r}$, and the Regularized Master Equations reduce to $n$ equations. Our main objective now is to establish the following proposition, which immediately yields Theorem 2.8.
Proposition 5.1. Assume $\sigma_{ij} \leq \sigma_{\max}$ for all $i, j \in [n]$ and some $\sigma_{\max} < \infty$, and that $A(\sigma_0)$ is $(\delta, \kappa)$-robustly irreducible for some $\sigma_0, \delta, \kappa \in (0,1)$ (see Definition 2.7). For every fixed $z \in \mathbb C \setminus \{0\}$ there exists a constant $K = K(z, \sigma_0, \sigma_{\max}, \delta, \kappa) < \infty$ such that $\frac1n \sum_{i=1}^n r_i(|z|, t) \leq K$ uniformly in $t > 0$.

In the proof of Proposition 2.5 we were able to pass from a lower bound on a single component $r_i$ to an upper bound on the average $\frac1n \sum_{j=1}^n r_j$ in one line. This is not possible when we allow some of the variances $\sigma_{ij}^2$ to be zero. We will instead employ a bootstrap-type argument, which we roughly outline as follows:
1. Assume towards a contradiction that $\frac1n \sum_{i=1}^n r_i$ is large. By the pigeonhole principle there exists $i_0 \in [n]$ such that $r_{i_0}$ is large.
2. Use the estimates (5.4)–(5.6) together with assumptions on the connectivity properties of the associated directed graph to iteratively "grow" the set of indices $i$ for which we know $r_i$ is large (see the sketch after this outline).
3. Once we have shown $r_i$ is large for (almost) all $i \in [n]$, it follows by (5.5) that $\tilde r_i$ is small for (almost) all $i \in [n]$. We then apply the trace constraint (5.3) to derive a contradiction.
We emphasize that the key idea of our proofs of bounds on $\frac1n \sum_i r_i$ is to play (5.5) against the trace constraint (5.3).
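The combinatorial core of this outline, growing the index set along the graph of the profile, can be visualized with a small toy computation. The sketch below only models step 2: starting from a single pigeonhole index, it propagates membership along edges $\sigma_{ij} \geq \sigma_0$. The function name and the toy profile are ours, and the quantitative loss incurred at each growth step in the real proof is not modeled.

```python
import numpy as np

def bootstrap_growth(sigma, sigma0, i0):
    """Toy model of step 2: the set S of indices where r_i is known to be
    large, grown from a single pigeonhole index i0 along the edges
    sigma_ij >= sigma0 of the directed graph Gamma(A(sigma0)).  The
    shrinking threshold (alpha -> alpha') of the actual proof is ignored."""
    n = sigma.shape[0]
    S = {i0}
    while True:
        new = {j for j in range(n) if j not in S
               and any(sigma[i, j] >= sigma0 for i in S)}
        if not new:
            return S
        S |= new

rng = np.random.default_rng(2)
n = 12
sigma = (rng.random((n, n)) < 0.3).astype(float)   # a sparse profile
sigma += np.roll(np.eye(n), 1, axis=1)             # add a cycle: irreducible
print(sorted(bootstrap_growth(sigma, sigma0=0.5, i0=0)))  # all of [n]
```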
Recall the graph-theoretic notation from Section 2.1. In this section we abbreviate the notation for the in- and out-neighborhoods of a vertex $i$ in the graph $\Gamma = \Gamma(A(\sigma_0))$, writing $N^{(\delta)}(i)$ for $N^{(\delta)}_+(i)$ and similarly defining $N^{(\delta)}_-(i)$. For parameters $\alpha, \beta > 0$ we define the sets $S_\alpha$ and $T_\beta$, where here and in the sequel we write $\|\vec\varphi\|_\infty = \max_{i \in [n]} \varphi_i$ and similarly $\|\vec{\tilde\varphi}\|_\infty = \max_{i \in [n]} \tilde\varphi_i$. (Note that by (5.2) we have $\|\vec\varphi\|_\infty \geq t > 0$.)

Qualitative boundedness
In this subsection we establish Proposition 2.9 and Lemma 4.1, which give an $n$-dependent bound on the components of $\vec r, \vec{\tilde r}$ assuming only that the standard deviation profile is irreducible. This was used in Section 4, together with a compactness argument, to establish existence of solutions to the Master Equations. While Lemma 4.1 follows from Proposition 5.1 under the robust irreducibility assumption A5 (which we assume for our main result), we prove this lemma separately for two reasons: • to show that A5 is not needed for the conclusion of Theorem 2.2, and • to provide a cartoon for the more technical proof of Proposition 5.1.
We will also establish some auxiliary lemmas that will be reused in the proof of Proposition 5.1 (in particular Lemmas 5.3 and 5.4). Proposition 2.9 and Lemma 4.1 are an immediate consequence of the following.

Proposition 5.2. For any fixed $0 < \sigma_0 \leq \sigma_{\max}$ and $s_0 > 0$ there is a constant $K_0(n, s_0, \sigma_0, \sigma_{\max})$ such that the following holds. Let $s \geq s_0$, and suppose $A$ is irreducible with $\sigma_{ij} \in \{0\} \cup [\sigma_0, \sigma_{\max}]$ for all $1 \leq i, j \leq n$. Then $\frac1n \sum_{i=1}^n r_i \leq K_0$.

We now begin the proof of Proposition 5.2. First we dispose of the case that $s > 2\sigma_{\max}$ (Claim 5.1): rearranging, we obtain $r_{i^*} \leq t/(s^2 - \sigma_{\max}^2)$, which combines with (5.6) to give the desired uniform bound for $r_i$, $i \in [n]$. The same bound is obtained for $\tilde r_i$ along similar lines.
Without loss of generality we take $\sigma_{\max} = 1$. By Claim 5.1 we may assume $0 < s_0 \leq s \leq 2$. We may also assume $t \leq 1$: otherwise it follows from (5.6) that $\frac1n \sum_{i=1}^n r_i < 1$ and we are done. Let $K > 0$ be a constant, to be chosen later depending on $n$, $\sigma_0$ and $s_0$ but independent of $t$, and assume
$\frac1n \sum_{i=1}^n r_i \geq K. \qquad (5.10)$
We will derive a contradiction for $K$ sufficiently large.
In the following lemma we use the irreducibility of $A_n$ to show that if $T_\beta$ is non-empty for some sufficiently small $\beta$, then $T_{C_0\beta} = [n]$ for a somewhat larger threshold $C_0\beta$. This will allow us to assume a uniform lower bound on the components $r_i$.
If $T_{C_0\beta} = [n]$ for some $\beta \leq \beta_0$, then by the trace identity (5.3) we obtain an upper bound. On the other hand, from (5.4) we have a lower bound for all $j \in [n]$. In particular, $\frac1n \sum_{j=1}^n r_j$ is then bounded, which contradicts (5.10) if $K$ is sufficiently large. Hence we may assume $T_{\beta_0}$ is empty, with $\beta_0(\sigma_0, n)$ as in Lemma 5.2.

Proof. From our assumption and (5.15), since $\|\vec\varphi\|_\infty \geq 2$ and $t \leq 1$, we have $\frac1n \sum_{j=1}^n r_j \geq \frac12 \|\vec\varphi\|_\infty$. The claim now follows by (5.14).

Next we seek to show that we can enlarge $S_\alpha$ by lowering $\alpha$. By irreducibility we can find a vertex $i_* \in S_\alpha$ that is connected to $S_\alpha^c$. We can use this to show that the average of the components $r_k$ over $S_\alpha^c$ is bounded below by $c\, r_{i_*}$ for some small $c > 0$ depending on $\alpha, n, s, \sigma_0$. From the pigeonhole principle we obtain $k \in S_\alpha^c$ with $r_k \geq c\, r_{i_*} \geq c\alpha \|\vec\varphi\|_\infty$. Taking $\alpha' = c\alpha$, we will then have shown $|S_{\alpha'}| \geq |S_\alpha| + 1$.
We begin by relating the values of $\vec r, \vec{\tilde r}$ on a fixed set of vertices $S \subset [n]$ to the values taken on $S^c$. For an $n \times n$ matrix $M$ and nonempty $S, T \subset [n]$, we write $M_{S \times T}$ for the $|S| \times |T|$ submatrix of $M$ with entries indexed by $S \times T$. The following lemma will also be used in the proof of Proposition 5.1.
Lemma 5.4. The relevant matrix restricted to $S \times S$ is invertible, and we denote its inverse by $W^S$. In terms of $W^S$, the restrictions of $\vec r, \vec{\tilde r}$ to $S$ and $S^c$ satisfy (5.20). Furthermore, the entries of $W^S$ satisfy the following bounds. For all $i, j \in S$,
$W^S_{ij} \geq 0. \qquad (5.21)$
For all $j \in S$,
$\sum_{i \in S} W^S_{ij} \leq r_j / t, \qquad (5.22)$
and a further bound holds if (5.17) holds for some $\beta_0 > 0$.
Proof. Expanding the inverse as a convergent Neumann series shows that $W^S$ is well defined. Furthermore, since all of the matrices in this series have nonnegative entries, (5.21) follows.
In particular, $r_j \geq t \sum_{i \in S} W^S_{ij}$, giving (5.22), and a further identity rearranges to give (5.23).
Let us conclude the proof of Proposition 5.2 using the above lemmas. Putting $\alpha_0 = 1/4$, by Lemma 5.3 we have $S_{\alpha_0} \neq \emptyset$. Taking $K$ sufficiently large depending on $s_0, \sigma_0$ and $n$, we can iterate Lemma 5.5 at most $n$ times to find $\alpha = \alpha(s_0, \sigma_0, n) > 0$ such that $S_\alpha = [n]$,
and we are done.
Proof of Lemma 5.5. We write $W = W^{S_\alpha}$. From the first equation in (5.20), for any $i \in S_\alpha$ we have an expansion of $r_i$ over $S_\alpha^c$. Suppose first that (5.27) holds for some $i \in S_\alpha$. Then from (5.22), where in the third inequality we applied (5.4), rearranging gives $\|\vec\varphi\|_\infty \leq 2|S|/(s\alpha) \leq \sqrt{2}\, n/(s_0 \alpha)$ in this case. On the other hand, from (5.15) we have $K \leq \|\vec\varphi\|_\infty / s_0^2$, and we obtain a contradiction if $K > \sqrt{2}\, n/(s_0^3 \alpha)$. Suppose now that (5.27) does not hold for any $i \in S_\alpha$. Then rearranging (5.26) yields a bound for any $i \in S_\alpha$. From the assumption that $A_n$ is irreducible, a corresponding bound holds for all $j \in S_\alpha$. Inserting this bound in (5.28), where in the second inequality we applied the bounds $\sigma_{ij} \leq 1$ for all $i, j \in [n]$ and $\tilde r_j \leq (s^2 \alpha \|\vec\varphi\|_\infty)^{-1}$ for all $j \in S_\alpha$ (by (5.5)), and rearranging, the pigeonhole principle gives $k \in S_\alpha^c$ satisfying the resulting bound. Setting $\alpha' = \alpha^2 s_0^2 \beta_0 \sigma_0^2 / (2n^3)$ we have $k \in S_{\alpha'}$. Also, since $\alpha' < \alpha$ we have $S_\alpha \subset S_{\alpha'}$. Hence $|S_{\alpha'}| \geq |S_\alpha| + 1$, as desired.

Quantitative boundedness
Now we prove Proposition 5.1. By rescaling the variance profile $V$ we may take $\sigma_{\max} = 1$. By Claim 5.1 we may assume $s \in (0, 2]$. As in the proof of Proposition 5.2 we may assume $t \leq 1$. We may also assume $n$ is sufficiently large depending on $s, \sigma_0, \delta$ and $\kappa$. In the remainder of the section we make use of the asymptotic notation $O(\cdot), \lesssim, \gtrsim$, allowing implied constants to depend on the parameters $s, \sigma_0, \delta$ and $\kappa$, but not on $n$ or $t$.
As in the proof of Proposition 5.2 we assume (5.10) holds for some K > 0 and aim to derive a contradiction for K sufficiently large depending on s, σ 0 , δ and κ. The argument follows the same general outline as the proof in the previous subsection. We will reuse Lemmas 5.3 and 5.4 as stated, but we will need versions of Lemmas 5.2 and 5.5 with constants independent of n.

Lower bounding $r_i$
The following is an analogue of Lemma 5.2.
for some $\beta_0(s, \sigma_0, \delta, \kappa) > 0$. Note that we are now free to use the estimates in Lemma 5.4 with this value of $\beta_0$.

Upper bounding $r_i$
Here our task is essentially to modify the proof of Lemma 5.5 to show we can take $\alpha'$ sufficiently small and independent of $n$ such that $|S_{\alpha'} \setminus S_\alpha| \gtrsim n$, rather than merely nonempty. We can then conclude the proof by iterating this fact a bounded number of times. Let us summarize the key new ideas. In the proof of Lemma 5.5 we used the irreducibility assumption to find an element $i_* \in S_\alpha$ such that the average of the components $r_k$ over $S_\alpha^c$ was $\gtrsim_{\alpha, n} r_{i_*} \geq \alpha \|\vec\varphi\|_\infty$ (see (5.30)). In a similar spirit, Lemma 5.8 below controls the average of $r_k$ over $k \in S_\alpha^c$ from below by the average of $r_i$ over $i \in U_0$, for a set $U_0 \subset S_\alpha$ that is densely connected to $S_\alpha^c$. By averaging over a large set $U_0$ we are able to use the full strength of the bounds in Lemma 5.4 and avoid any dependence of the constants on $n$.
Proceeding naïvely, one can then use Lemma 5.8 to deduce $|S_{c_0 \alpha^2} \setminus S_\alpha| \geq c_0 \alpha |S_\alpha^c|$ for a sufficiently small constant $c_0 = c_0(s, \sigma_0, \delta, \kappa) > 0$. However, when iterating this bound over a sequence of values $\alpha_{k+1} = c_0 \alpha_k^2$, the sets $S_{\alpha_k}$ grow by an exponentially decreasing proportion of $n$, so this is not enough to find a value of $\alpha$ for which $|S_\alpha|$ is close to $n$.
Instead, in Lemma 5.9 we are able to grow $S_\alpha$ by a constant factor using a nested iteration argument, which we now describe (a toy numerical comparison of the two schemes follows below). We would like to find some value of $\alpha' \in (0, \alpha)$ for which
$|S_{\alpha'} \setminus S_\alpha| \geq c |S_\alpha| \qquad (5.35)$
where $c > 0$ is a small constant. Suppose that (5.35) fails. By the expansion assumption, we know that $S_\alpha$ contains a fairly large set $U = N^{(\delta)}_-(S_\alpha^c) \cap S_\alpha$ (of size at least $\min(|S_\alpha|, \kappa |S_\alpha^c|)$) that is densely connected to $S_\alpha^c$. In particular, if $c$ is sufficiently small depending on $\kappa$, then $U$ must have large overlap with $S_{\alpha'}$. Denoting the overlap by $U_0$, Lemma 5.8 can now be applied to deduce a further growth bound, for some $c_0 = c_0(s, \sigma_0, \delta, \kappa) > 0$ sufficiently small. The key point is that the constant of proportionality on the right-hand side is independent of $\alpha'$. Hence, for fixed $\alpha$, as long as (5.35) fails we can iteratively lower $\alpha'$ to increase $|S_{\alpha'} \setminus S_\alpha|$ by an amount proportional to $\alpha |S_\alpha^c|$, until eventually (5.35) holds. This whole procedure can then be iterated a bounded number of times to obtain $\alpha$ such that $|S_\alpha|$ is close to $n$.
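To see concretely why the naïve scheme stalls while the nested scheme succeeds, here is a toy computation; the constants $c_0$, $c$ and the initial proportions are illustrative choices of ours, not those of the proof.

```python
# Naive scheme: alpha_{k+1} = c0 * alpha_k^2; the per-step gain in |S|/n
# is ~ c0 * alpha_k * (1 - |S|/n), and alpha_k collapses doubly
# exponentially, so the total gain stays bounded away from 1.
c0, alpha, size = 0.1, 0.25, 0.1
for _ in range(20):
    size += c0 * alpha * (1 - size)
    alpha = c0 * alpha ** 2
print("naive scheme stalls at |S|/n ~", round(size, 4))

# Nested scheme: each round grows |S| by a constant factor (1 + c),
# so a bounded (logarithmic) number of rounds reaches a proportion near 1.
c, size, rounds = 0.05, 0.1, 0
while size < 0.95:
    size *= 1 + c
    rounds += 1
print("nested scheme reaches", round(size, 4), "after", rounds, "rounds")
```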
Having motivated the key ideas, we turn now to the proofs.
Lemma 5.8. Let $\alpha \in (0,1)$ and suppose that $0 < |S_\alpha| \leq (1 - \delta/2) n$. If $K$ is sufficiently large depending on $\alpha, s, \sigma_0, \delta, \kappa$, then the conclusion below holds for any admissible choice of $U_0$. Rearranging, we obtain a chain of bounds, where in the second bound we applied the robust irreducibility assumption and our assumed bounds on $U_0$ and $S_\alpha$, and in the last bound we used that both $S_\alpha$ and its complement are of linear size in $n$. From our assumption (5.10), (5.4) and the above, it follows that, taking $K$ sufficiently large depending on $\alpha, s, \sigma_0, \delta$ and $\kappa$, we may assume (5.37) holds.

From (5.20) and Lemma 5.4 we have
Rearranging, we obtain a bound on $\sum_{i \in U_0} W^{S_\alpha}_{ij}$, which we substitute into (5.39); in the second inequality we applied (5.5) to bound $\tilde r_j \leq 1/(s^2 \alpha \|\vec\varphi\|_\infty)$ for all $j \in S_\alpha$.
The result now follows by rearranging.
From (5.46) and the robust irreducibility assumption (specifically the condition (2.13)), combined with the previous two displays, we obtain $r_j \leq \frac{2}{\delta \sigma_0^2 \alpha \|\vec\varphi\|_\infty}$ for all $j \in S_\alpha^c$. Together with (5.47) and (5.4), we obtain a contradiction if $K$ is sufficiently large depending on $s, \sigma_0, \delta$ and $\kappa$. It follows that (5.10) fails for sufficiently large $K$, which concludes the proof of Proposition 5.1.
Remark 5.1. We note that in the above proof we only applied the expansion bound (2.14) to sets of size at least δn/10.

Proof of Theorem 2.3-(1): tail estimates and asymptotics of the logarithmic potential
The main purpose of this section is to show that the logarithmic potential $U_{\mu_{Y_n}}$ is close to $h_n(z) = -\int_{\mathbb R} \log|x| \, \check\nu_{n,z}(dx)$ for large $n$, and moreover that $h_n$ is the logarithmic potential of a probability measure $\mu_n$. Recall that in Theorem 3.2 we have already established the almost sure convergence of the truncated potentials. Thus, we need to show that these measures uniformly integrate the singularity of $x \mapsto \log|x|$ at 0. The proof has two main ingredients. The first is a result from [23] by the first author (stated in Proposition 6.1 below) that provides control on the smallest singular value of $Y_n - z$.
The second is the control of the remaining small singular values of $Y_n - z$ via the quantity $\mathbb E\, \mathrm{Im}\, g_{\check L_{n,z}}(it)$ when $t$ is close to zero. We refer to Step 2 in Section 2.5 for an outline of this argument.
Finally, to obtain the deterministic equivalents $\mu_n$ for the ESDs $\mu_{Y_n}$ we rely on a meta-model argument, which has been used before in [25,49]. The idea is that for fixed $n$ we can define a sequence $\{Y^{(m)}_n\}_{m \geq 1}$ of $nm \times nm$ random matrices as in Definition 1.2, where the standard deviation profiles $A^{(m)}_n$ are obtained by replacing each entry $\sigma_{ij}$ of $A_n$ by an $m \times m$ block with entries all equal to $\sigma_{ij}$. We can then show that the logarithmic potentials of the associated ESDs converge to $h_n$ as $m \to \infty$, which will allow us to deduce that $h_n$ is itself the logarithmic potential of a probability measure. This argument is described in more detail in Section 6.2 below.
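The block-replication step is exactly a Kronecker product with the all-ones matrix; a minimal sketch (the helper name `meta_profile` is ours):

```python
import numpy as np

def meta_profile(A, m):
    """m-fold meta-model profile: each entry sigma_ij of A becomes an
    m x m block with constant value sigma_ij."""
    return np.kron(A, np.ones((m, m)))

A = np.array([[1.0, 0.0],
              [0.5, 2.0]])
print(meta_profile(A, 3).shape)  # (6, 6): an nm x nm profile, n=2, m=3
```

Since every entry is replicated in constant blocks, the solution of the Master Equations for the meta-model reduces to that of the original profile, which is consistent with the identity $\check\nu^{(m)}_{n,z} = \check\nu_{n,z}$ used below.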

Control on small singular values
The following result, obtained by one of the authors in [23], gives an estimate on the lower tail of the smallest singular value $s_{n,z}$ of $Y_n - z$ (with $\beta = \beta(\alpha)$). An easy argument also gives (6.1) for arbitrary fixed $\alpha > 0$ and $\beta(\alpha)$ under A3, replacing A0 with a bounded density assumption; see [19, Section 4.4]. For the case that the entries $X_{ij}$ are real Gaussian variables and $A_n(\sigma_0)$ is $(\delta, \kappa)$-broadly connected for some fixed $\sigma_0, \delta, \kappa \in (0,1)$ (see Definition 2.15), (6.1) holds with arbitrary $\alpha > 0$ and $\beta = \alpha + 1$.
$\mathbb E\, \check L_{n,z}((-x, x)) \leq C (x \vee n^{-\gamma_0}). \qquad (6.3)$
Proof. We rely on the following elementary estimate for the Stieltjes transform of a probability measure $\mu$ (see for instance [34, Lemma 15]):
$\mu((-t, t)) \leq 2t \,\mathrm{Im}\, g_\mu(it). \qquad (6.4)$
Recall that $\mathrm{Im}\, g_{\check\nu_{n,z}}(it) = n^{-1} \sum_{i \in [n]} r_i(|z|, t)$. The first Wegner estimate (6.2) is a straightforward consequence of Assumption A2 and the estimate (6.4). We now establish the second Wegner estimate (6.3), and first prove a preliminary bound holding for all $n \geq 1$. For $t \geq 1$, $\mathbb E\, \mathrm{Im}\, g_{\check L_{n,z}}(it) \leq 1$ by the mere definition of a Stieltjes transform. Assume $t < 1$ and recall that $\mathbb E\, \mathrm{Im}\, g_{\check L_{n,z}}(it) = n^{-1} \operatorname{tr} \mathrm{Im}\, \mathbb E G(z, it)$. By Theorem 3.3, there exist constants $c_0, C > 0$ such that the corresponding expansion holds. By A2, we therefore get that $\mathbb E\, \mathrm{Im}\, g_{\check L_{n,z}}(it) \leq C (t^{-c_0} n^{-1/2} + 1)$.
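The estimate (6.4), that $\mu((-t,t)) \leq 2t\,\mathrm{Im}\, g_\mu(it)$, is easy to check numerically. In the minimal sketch below we take for $\mu$ the empirical measure of Gaussian samples, an arbitrary stand-in choice.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)   # stand-in samples from a measure mu

for t in [0.05, 0.2, 1.0]:
    small_ball = np.mean(np.abs(x) < t)       # mu((-t, t))
    im_g = np.mean(t / (x ** 2 + t ** 2))     # Im g_mu(it)
    print(f"t={t}: mu((-t,t))={small_ball:.4f} <= 2t Im g(it)={2*t*im_g:.4f}")
```

The bound follows since $t/(x^2+t^2) \geq 1/(2t)$ whenever $|x| < t$, which is the one-line computation behind (6.4).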

Comparison of logarithmic potentials via a meta-model
We now turn to the task of finding the measures $\mu_n$ from Theorem 2.3, which serve as a sequence of deterministic equivalents for the ESDs $\mu_{Y_n}$. A first idea is to try to show that for every $\psi \in C_c^\infty(\mathbb C)$, the corresponding smoothed linear statistic is close to $-(2\pi)^{-1} \int \Delta\psi(z)\, h_n(z)\, dz$. However, there is a difficulty in directly applying this approach, related to the fact that $\check L_{n,z}$ does not converge in general without further assumptions on the variance profile matrices $V_n$.
To circumvent this difficulty, we rely on a meta-model argument, which has been used in [25,49], and which we now describe. Let $n$ be fixed, consider the standard deviation profile $A_n = (\sigma_{ij})$ and the normalized variance profile $V_n = (\frac1n \sigma_{ij}^2)$, and form the meta-model $Y^{(m)}_n$, where $\vec p_m$ and $\vec{\tilde p}_m$ are $nm \times 1$ vectors. As an important consequence, we have $[\vec p\,]_i = g_{\check\nu_{n,z}}(\eta)$ and $\check\nu^{(m)}_{n,z} = \check\nu_{n,z}$. We now state our proposition giving the existence of the measures $\mu_n$; its proof will occupy the main part of the remainder of this section.

Proposition 6.3. 1. The function $h_n$ is well defined, and for every compact set $K \subset \mathbb C$ the stated bound holds. Moreover, $h_n(z)$ coincides with the logarithmic potential $U_{\mu_n}(z)$ of a probability measure $\mu_n$ on $\mathbb C$.
2. For $\mu_n$ as defined in part (1), there exists a constant $C > 0$, independent of $n$, such that the stated bound holds for all $M > 0$.

We will rely on the following lemmas, whose proofs are respectively deferred to Appendices A.5 and A.6.

Lemma 6.1. Let $(\Omega, \mathcal F, \mathbb P)$ be a given probability space, $\zeta$ a finite positive measure on $\mathbb C$, and $f_n : \Omega \times \mathbb C \to \mathbb R$ measurable functions satisfying
$\sup_n \int_{\Omega \times \mathbb C} |f_n(\omega, z)|^{1+\alpha}\, \mathbb P \otimes \zeta(d\omega \times dz) \leq C \qquad (6.8)$
for some constants $\alpha, C > 0$.
Let $g : \mathbb C \to \mathbb R$ be a measurable function such that, for $\zeta$-almost all $z \in \mathbb C$, the convergence (6.9) holds.

Lemma 6.2. Let $(\zeta_n)$ be a sequence of random probability measures on $\mathbb C$. Assume that almost surely $(\zeta_n)$ is tight, and that there exists a locally integrable function $h : \mathbb C \to \mathbb R$ such that the stated convergence holds for all $\psi \in C_c^\infty(\mathbb C)$.

The proof consists of the following three steps:
1. To show that for every $z \in \mathbb C \setminus \{0\}$, $x \mapsto \log|x|$ is $\check\nu_{n,z}$-integrable and the convergence (6.11) holds.
2. To show that the function $h_n(z)$ is measurable.
3. To show that for any compact set $K \subset \mathbb C$, the bound (6.12) holds for some constant $C > 0$ independent of $n$.
Once these three steps are proved, the assumptions of Lemma 6.1 are fulfilled, and the lemma yields the desired convergence for every $\psi \in C_c^\infty(\mathbb C)$. It then remains to apply Lemma 6.2 to conclude that $h_n$ is the logarithmic potential of a probability distribution $\mu_n$ on $\mathbb C$, and point (1) of Proposition 6.3 will be proved. Let us address Step 1. To prove the convergence in (6.11), we separately consider the integrals defining the logarithmic potentials in the regions where $|x|$ is greater than and less than $\varepsilon$.
Taking into account the convergence (6.7) and applying Theorem 3.2 to the sequence $(\check L^{(m)}_{n,z})_m$, we deduce that $x \mapsto \log|x|$ is $\check\nu_{n,z}$-integrable near infinity, where (a) follows from Wegner's estimate (6.2) in Corollary 6.2. Let $(s_{i,z})_{i \in [mn]}$ be the singular values of $Y^{(m)}_n - z$, ordered as $s_{1,z} \geq \cdots \geq s_{mn,z}$. For $z \in \mathbb C \setminus \{0\}$ and $\beta > 0$ the exponent as in Proposition 6.1, we introduce the relevant event and bound its probability for all $\tau > 0$. On the other hand, another application of the same Wegner estimate, followed by Markov's inequality, shows that for all $\tau, \tau' > 0$ we can choose $\varepsilon > 0$ small enough so that the probability that the integral near zero exceeds $\tau$ is less than $\tau'$ for $m$ large enough. Gathering this result with (6.13) and (6.14) yields (6.11), and Step 1 is proved.
We now address Step 2 and study the measurability of $h_n(z)$. Recall the regularized logarithmic potentials defined in (2.24), and consider also the analogous function associated with $\check\nu_{n,z}$.
Given $z, z' \in \mathbb C$, the Hoffman–Wielandt theorem applied to $Y^{(m)}_n - z$ and $Y^{(m)}_n - z'$ shows that for any fixed $t > 0$ the family $\{z \mapsto U_Y^{n,m}(z, t)\}_{m \geq 1}$ is uniformly equicontinuous. Since from Theorem 3.2 we have $U_Y^{n,m}(z,t) \to U_n(z,t)$ almost surely as $m \to \infty$, it follows that $z \mapsto U_n(z,t)$ is continuous for any fixed $t > 0$. Finally, since $x \mapsto \log|x|$ is $\check\nu_{n,z}$-integrable near zero for any $z \neq 0$ by (6.14), the measurability of $h_n$ follows, and Step 2 is proved. We now address Step 3 and prove (6.12). Observe that on any compact set $K \subset \mathbb C$ there exists a constant $C_K$ such that the required domination holds; taking the expectation of the resulting inequality finally yields (6.12). Step 3 is proved.
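The Hoffman–Wielandt (Mirsky) bound invoked in Step 2, $\sum_i (s_{i,z} - s_{i,z'})^2 \leq \|(z - z') I\|_F^2 = n |z - z'|^2$, is straightforward to check numerically; a minimal sketch, where the matrix and the points $z, z'$ are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
M = rng.standard_normal((n, n)) / np.sqrt(n)
z, zp = 0.3 + 0.1j, 0.32 + 0.08j

s  = np.linalg.svd(M - z  * np.eye(n), compute_uv=False)
sp = np.linalg.svd(M - zp * np.eye(n), compute_uv=False)
lhs = np.sum((s - sp) ** 2)          # sum of squared singular value gaps
rhs = n * abs(z - zp) ** 2           # ||(z - z') I||_F^2
print(lhs <= rhs, lhs, rhs)          # the Hoffman-Wielandt bound holds
```

This Lipschitz control in $z$, uniform in $m$, is exactly what delivers the equicontinuity used above.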
We now prove point (2) of Proposition 6.3. Given $M > 0$, we get from Lemma 6.1 a bound in which $C > 0$ is independent of $n$. Let $\psi$ be a nonnegative $C_c^\infty(\mathbb C)$ function equal to one for $|z| < M$ and to zero for $|z| > M + 1$. As a byproduct of Lemma 6.1, the claimed bound follows.
Proposition 6.3 is proved.

Conclusion of the proof of Theorem 2.3-(1)
We can now complete the proof of Theorem 2.3-(1) and prove that $\mu_{Y_n} \sim \mu_n$ in probability, with $\mu_n$ defined in Proposition 6.3. By Proposition 6.3, the sequence $(\mu_n)$ is tight. It remains to prove that for all $\varphi \in C_c(\mathbb C)$, $\int \varphi \, d\mu_{Y_n} - \int \varphi \, d\mu_n \to 0$ in probability. By the density of $C_c^\infty(\mathbb C)$ in $C_c(\mathbb C)$, it is enough to show this for $\varphi \in C_c^\infty(\mathbb C)$. By mimicking the proof of Proposition 6.3, with $Y^{(m)}_n$ and $m$ replaced by $Y_n$ and $n$ respectively, we straightforwardly obtain that $U_{\mu_{Y_n}}(z) - U_{\mu_n}(z) \to 0$ in probability for every $z \in \mathbb C \setminus \{0\}$. This proof also shows that $\sup_n \mathbb E \int_K |U_{\mu_{Y_n}}(z)|^2 \, dz < \infty$ for all compact sets $K \subset \mathbb C$. We also know by Proposition 6.3 that $\sup_n \int_K |U_{\mu_n}(z)|^2 \, dz < \infty$. The result now follows from Lemma 6.1. For the remainder of this section, we set
$b_n(z, t) := -\frac{z}{2n} \operatorname{tr} \Psi(\vec r(|z|, t), t)$ and $b_n(z) := -\frac{z}{2n} \operatorname{tr} \Psi(\vec q(|z|))$,
where $\Psi(\cdot, t)$ and $\Psi(\cdot)$ are defined in (4.1), $\vec r(\cdot, t)$ is defined in Proposition 2.1, and $\vec q(\cdot)$ is defined in Theorem 2.2.
Lemma 7.1. Under the same assumptions as in Theorem 2.3, the function $z \mapsto b_n(z)$ is locally integrable on $\mathbb C$, and $\partial_{\bar z} U_{\mu_n}(z) = b_n(z)$ in $\mathcal D'(\mathbb C)$.
Proof. Recall the definition of $U_n(z,t)$ in (2.24). Recall also that by Proposition 6.3-(1), the probability measure $\mu_n$ is such that $h_n = U_{\mu_n}$. We first prove that $U_n(z,t) \to U_{\mu_n}(z)$ in $L^1_{\mathrm{loc}}$ as $t \downarrow 0$ (7.2). Recall from the proof of Proposition 6.3 that $z \mapsto U_n(z,t)$ is continuous for any fixed $t > 0$. It is moreover clear from the expressions of $U_n(z,t)$ and $U_{\mu_n}(z)$ that $U_n(z,t) \uparrow U_{\mu_n}(z)$ as $t \downarrow 0$. Recall that $U_{\mu_n}(z)$, being a logarithmic potential, is locally integrable (as can be seen by Fubini's theorem). Thus, given a fixed $t_0 > 0$, $0 \leq U_n(z,t) - U_n(z,t_0) \leq U_{\mu_n}(z) - U_n(z,t_0)$ for $0 < t \leq t_0$, and (7.2) immediately follows from the monotone convergence theorem. By a property of convergence in $\mathcal D'(\mathbb C)$, this implies the convergence of the distributional derivatives $\partial_{\bar z} U_n(z,t) \to \partial_{\bar z} U_{\mu_n}(z)$ (7.3). We now prove that for all $t > 0$,
$\partial_{\bar z} U_n(z,t) = b_n(z,t) \qquad (7.4)$
in $\mathcal D'(\mathbb C)$. We shall rely on a meta-model argument. Recall the meta-model $Y^{(m)}_n$ introduced in (6.6), its limiting property (6.7), and the definition of $U_Y^{n,m}(z,t)$ in (2.24). Fix $t > 0$. By Theorem 3.2, $U_Y^{n,m}(z,t) \to U_n(z,t)$ almost surely as $m \to \infty$ for all $z \in \mathbb C$. Furthermore, recalling the notation $(s_{i,z})_{i \in [mn]}$ for the singular values of $Y^{(m)}_n - z$, we obtain a domination bound, where (a) follows from the elementary inequality $2^{-1} \log^2(1+x) \leq x$, valid for $x \geq 0$. In particular, this implies a uniform integrability bound on every compact set $K \subset \mathbb C$. By Lemma 6.1, we get that $U_n(\cdot, t)$ is locally integrable on $\mathbb C$ and that the associated convergence holds for all $\psi \in C_c^\infty(\mathbb C)$. An integration by parts along with Jacobi's formula shows that for all $\omega \in \Omega$, the distributional derivative $\partial_{\bar z} U_Y^{n,m}(z,t)$ coincides with the pointwise derivative. On the other hand, we know from Theorem 3.3 that $\partial_{\bar z} U_Y^{n,m}(z,t) \to b_n(z,t)$ almost surely as $m \to \infty$, for all $z \in \mathbb C$. Moreover, from a singular value decomposition of $Y^{(m)}_n - z$ we easily see that $|\partial_{\bar z} U_Y^{n,m}(z,t)| \leq (4t)^{-1}$. Consequently, we get by Lemma 6.1 again the analogous convergence in $\mathcal D'(\mathbb C)$. Comparing with (7.5), we obtain that $\partial_{\bar z} U_n(z,t) = b_n(z,t)$ in $\mathcal D'(\mathbb C)$. We now take the limit $t \downarrow 0$ in (7.4). Since $|b_n(z,t)| \leq |2z|^{-1}$, the dominated convergence theorem yields $b_n(z,t) \to b_n(z)$ in $\mathcal D'(\mathbb C)$. Combining this convergence with (7.3) and (7.4), we obtain the desired result.
In order to characterize the probability measure $\mu_n$, we use the equation $\mu_n = -(2\pi)^{-1} \Delta U_{\mu_n}$ and rely on the smoothness properties of $\Delta U_{\mu_n}$ that can be deduced from Lemma 4.4. We recall that $\vec q(s)$ is defined in the statement of Theorem 2.2. The support of $\mu_n$ is contained in $\{z : |z| \leq \rho(V)^{1/2}\}$. Finally, $F_n$ is absolutely continuous on $(0, \infty)$ and has a continuous density on $(0, \rho(V)^{1/2})$.
Before entering the proof, we note that the rotational invariance of $\mu_n$ can be "guessed" from the form of the Schwinger–Dyson equations of Proposition 3.1. Indeed, from these one sees that the Stieltjes transform $g_{\check\nu_{n,z}}(\eta) = n^{-1} \operatorname{tr} P(|z|, \eta)$ of $\check\nu_{n,z}$ depends on $z$ only through its absolute value, and the same is therefore true of $U_{\mu_n}(z)$. It is easy to check that this yields the rotational invariance of $\mu_n$.

Proof of Theorem 2.4
In Example 2.2 it was proved that Theorem 2.4 holds under the additional assumption that the matrices $A_n$ are irreducible. For the general case, by conjugating $Y_n$ by a permutation matrix we may assume $A_n$ takes a block upper triangular form whose diagonal blocks $A^{(1)}_n, \dots, A^{(m)}_n$ are square irreducible matrices of respective dimensions $n_1 \geq \cdots \geq n_m$. Indeed, for a general nonnegative matrix $A_n$ we can achieve this form with the upper triangular blocks not necessarily zero, but these are forced to be zero by the stochasticity condition. Also, by A1 and the row-sum constraint applied to the last row of $A_n = (\sigma_{ij})$,
$1 = \frac1n \sum_{j=1}^n \sigma_{ij}^2 \leq \frac{n_m}{n} \sigma_{\max}^2,$
so in fact we have $n_1, \dots, n_m \geq n / \sigma_{\max}^2$ (7.8). In particular, each diagonal block is irreducible, the variance profile associated with each block $Y^{(k)}_n$ is doubly stochastic, and $n_k \to \infty$ as $n \to \infty$ (by (7.8)).
Thus, for each $k$ the ESD $\mu^{(k)}_n$ of $Y^{(k)}_n$ converges weakly in probability to $\mu_{\mathrm{circ}}$. Since $\mu_{Y_n}$ is the weighted sum $\mu_{Y_n} = \frac{n_1}{n} \mu^{(1)}_n + \cdots + \frac{n_m}{n} \mu^{(m)}_n$, we get that $\mu_{Y_n}$ converges weakly in probability to $\mu_{\mathrm{circ}}$. This concludes the proof of Theorem 2.4.
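Theorem 2.4 is easy to probe numerically. The sketch below uses one particular doubly stochastic variance profile of our own choosing, $\sigma_{ij}^2 = 1 + \cos(2\pi(i+j)/n)$, which is bounded, vanishes along one anti-diagonal, and has every row and column sum of $V_n = (\sigma_{ij}^2/n)$ equal to one; under the theorem the spectrum should fill the unit disk.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
# sigma_ij^2 = 1 + cos(2*pi*(i+j)/n): each row/column of V_n sums to 1
# because the n-th roots of unity sum to zero.
sigma2 = 1.0 + np.cos(2.0 * np.pi * (i + j) / n)
Y = np.sqrt(sigma2) * rng.standard_normal((n, n)) / np.sqrt(n)
eigs = np.linalg.eigvals(Y)
print("max |lambda|      :", np.abs(eigs).max())          # close to 1
print("share in unit disk:", np.mean(np.abs(eigs) <= 1))  # close to 1
```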

A.1 Stieltjes transform of a symmetric probability measure
We note that a symmetric probability distribution $\check\nu$ on $\mathbb R$ satisfies $\check\nu(A) = \check\nu(-A)$ for each Borel set $A \subset \mathbb R$. Proof. The necessity is obvious from the definition of the Stieltjes transform and from the fact that $\check\nu(d\lambda) = \check\nu(-d\lambda)$. To prove the sufficiency, we use the Perron inversion formula, which says that for any function $\varphi \in C_c(\mathbb R)$,
$\int_{\mathbb R} \varphi(x)\, \check\nu(dx) = \lim_{\varepsilon \downarrow 0} \frac1\pi \int_{\mathbb R} \varphi(x)\, \mathrm{Im}\, g_{\check\nu}(x + i\varepsilon)\, dx.$
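The inversion formula can be tested numerically. The following minimal sketch recovers the semicircle density, an arbitrary choice of symmetric measure, from the empirical Stieltjes transform of a Wigner matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
H = rng.standard_normal((n, n))
H = (H + H.T) / np.sqrt(2 * n)   # Wigner matrix; ESD ~ semicircle on [-2, 2]
lam = np.linalg.eigvalsh(H)

def g(z):
    """Empirical Stieltjes transform g(z) = (1/n) sum_i 1/(lambda_i - z)."""
    return np.mean(1.0 / (lam - z))

eps = 0.05
for x in [0.0, 1.0, 1.9]:
    recovered = g(x + 1j * eps).imag / np.pi            # inversion formula
    exact = np.sqrt(max(4.0 - x ** 2, 0.0)) / (2 * np.pi)
    print(f"x={x}: recovered ~ {recovered:.4f}, semicircle {exact:.4f}")
```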

A.2 Variance estimates
In this section we collect, without proofs, a number of standard variance estimates. Let $(Y_n)$ be a sequence of matrices as in Definition 1.2; in the sequel we drop the subscript $n$. Denote by $(\vec e_i)$ the standard basis vectors. We introduce the notations $Y = (\vec y_1, \cdots, \vec y_n)$ and $Q(\eta^2)$.
Recall the definition of matrices R and G in (2.19).
Proposition A.1. Let A0 and A1 hold, and let $\Delta$ be an $n \times n$ deterministic diagonal matrix. Then the following estimates hold: $\operatorname{var}(R_{ij}) = O_\eta(n^{-1})$ for $1 \leq i, j \leq 2n$, among further estimates of the same kind. These estimates can be obtained as in the proof of [49, Proposition 6.3]; see also the references therein.
As a direct corollary of the previous proposition, we have: Proof. Let

A.3 Proof of Proposition 3.5
Notice first that it is sufficient to prove the desired estimate where $C$ is any of the following $2n \times 2n$ matrices. We follow the approach used in [59] and perform an entry-by-entry interpolation between $Y$ and $Y^N$. Notice that $Z_1 = Y$; by convention, we denote $Z_{n^2+1} = Y^N$. Redenoting the resolvent $R(z, \eta)$ defined in (2.19) as $R^Y$ to express the dependence on $Y$ (thus, $R^N = R^{Y^N}$), we write the interpolation step, where $\vec e_i$ is the $i$-th canonical vector of $\mathbb C^n$. These matrices are (at most) rank-two matrices that are both independent of $W_m$. We now use the identity $R^{Z_m} = R^{W_m} - R^{Z_m} \Delta_m R^{W_m}$ three times, and similarly for the companion term. The first two terms on the right-hand sides of these equations are identical. Hence $\vec v(\theta) \succ 0$, and $v(\theta)_i$ is bounded away from zero.
In order to prove $\vec p \succ 0$, we argue as follows: the $p_i$'s are Stieltjes transforms of probability measures $\mu_i$. These probability measures are tight; see for instance Proposition 3.1-(v). In particular, there exists a real number $K$ such that $\mu_i([-K, K]) \geq \frac12$ for every $i$. Hence $\vec p \succ 0$.

A.6 Proof of Lemma 6.2
Let $(\psi_k)_{k \geq 1}$ be a sequence of smooth compactly supported functions, dense in $C_c(\mathbb C)$ for the supremum norm $\|\psi\|_\infty = \sup_{z \in \mathbb C} |\psi(z)|$. By the diagonal extraction procedure, one can find a subsequence $(\zeta_{n'})$ such that, with probability one, $(\zeta_{n'})$ is tight and
$\int \psi_k \, d\zeta_{n'} \xrightarrow[n' \to \infty]{} -\frac{1}{2\pi} \int \Delta\psi_k(z)\, h(z)\, dz \quad \text{for all } k \geq 1.$
Thus, on this set of probability one, the tight sequence $(\zeta_{n'})$ has a unique non-random limit point $\zeta$, and this limit point satisfies $\zeta = -(2\pi)^{-1} \Delta h$ in $\mathcal D'(\mathbb C)$, the set of Schwartz distributions. With this at hand, we get from the assumption that $\int \psi_k(z)\, \zeta_n(dz) \to \int \psi_k(z)\, \zeta(dz)$ for all $k \geq 1$. By a density argument, we thus get that $\int \varphi(z)\, \zeta_n(dz) \to \int \varphi(z)\, \zeta(dz)$ for every $\varphi \in C_c(\mathbb C)$.