Around the circular law

These expository notes are centered around the circular law theorem, which states that the empirical spectral distribution of a nxn random matrix with i.i.d. entries of variance 1/n tends to the uniform law on the unit disc of the complex plane as the dimension $n$ tends to infinity. This phenomenon is the non-Hermitian counterpart of the semi circular limit for Wigner random Hermitian matrices, and the quarter circular limit for Marchenko-Pastur random covariance matrices. We present a proof in a Gaussian case, due to Silverstein, based on a formula by Ginibre, and a proof of the universal case by revisiting the approach of Tao and Vu, based on the Hermitization of Girko, the logarithmic potential, and the control of the small singular values. Beyond the finite variance model, we also consider the case where the entries have heavy tails, by using the objective method of Aldous and Steele borrowed from randomized combinatorial optimization. The limiting law is then no longer the circular law and is related to the Poisson weighted infinite tree. We provide a weak control of the smallest singular value under weak assumptions, using asymptotic geometric analysis tools. We also develop a quaternionic Cauchy-Stieltjes transform borrowed from the Physics literature.

These expository notes are split in seven sections and an appendix. Section 1 introduces the notion of eigenvalues and singular values and discusses their relationships. Section 2 states the circular law theorem. Section 3 is devoted to the Gaussian model known as the Complex Ginibre Ensemble, for which the law of the spectrum is known and leads to the circular law. Section 4 provides the proof of the circular law theorem in the universal case, using the approach of Tao and Vu based on the Hermitization of Girko and the logarithmic potential. Section 5 presents some models related to the circular law and discusses an algebraic-analytic interpretation via free probability. Section 6 is devoted to the heavy tailed counterpart of the circular law theorem, using the objective method of Aldous and Steele and the Poisson Weighted Infinite Tree. Finally, section 7 lists some open problems. The notes end with appendix A devoted to a novel general weak control of the smallest singular value of random matrices with i.i.d. entries, with weak assumptions, well suited for the proof of the circular law theorem and its heavy tailed analogue.
All random variables are defined on a unique common probability space (Ω, A, P). A typical element of Ω is denoted ω. Table 1 gathers most frequently used notations.

Two kinds of spectra
The eigenvalues of A ∈ M n (C) are the roots in C of its characteristic polynomial P A (z) := det(A − zI). We label them λ 1 (A), . . . , λ n (A) so that |λ 1 (A)| ≥ · · · ≥ |λ n (A)| with growing phases. The spectral radius is |λ 1  . This turns out to be useful because the mapping A → H A is linear in A, in contrast with the mapping A → √ AA * . Geometrically, the matrix A maps the unit sphere to an ellipsoid, the half-lengths of its principal axes being exactly the singular values of A. The operator norm or spectral norm of A is Ax 2 = s 1 (A) while s n (A) = min The rank of A is equal to the number of non-zero singular values. If A is non-singular then s i (A −1 ) = s n−i (A) −1 for all 1 ≤ i ≤ n and s n (A) = s 1 (A −1 ) −1 = A −1 −1 2→2 .
A Figure 1. Largest and smallest singular values of A ∈ M2(R). Taken from [33].
Since the singular values are the eigenvalues of a Hermitian matrix, we have variational formulas for all of them, often called the Courant-Fischer variational formulas [82,Th. log natural Neperian logarithm function (we never use the notation ln) Table 1. Main frequently used notations. Ax, y .
Most useful properties of the singular values are consequences of their Hermitian nature via these variational formulas, which are valid on R n and on C n . In contrast, there are no such variational formulas for the eigenvalues in great generality, beyond the case of normal matrices. If the matrix A is normal 1 (i.e. A * A = A * A) then for every 1 ≤ i ≤ n, s i (A) = |λ i (A)|.
Beyond normal matrices, the relationships between the eigenvalues and the singular values are captured by a set of inequalities due to Weyl [154] 2 , which can be obtained by using the Schur unitary triangularization 3 , see for instance [82,Theorem 3.3

.2 page 171].
Theorem 1.1 (Weyl inequalities). For every n × n complex matrix A and 1 ≤ k ≤ n, (1.1) The reversed form n i=n−k+1 s i (A) ≤ n i=n−k+1 |λ i (A)| for every 1 ≤ k ≤ n can be deduced without much difficulty (exercise!). Equality is achieved for k = n and we have Note that µ A and ν A are supported respectively in C and R + . There is a rigid determinantal relationship between µ A and ν A , namely from (1.2) we get log |λ| dµ A (λ) = 1 n = log(s) dν A (s). 1 In these notes, the word normal is always used in this way, and never as a synonym for Gaussian. 2 Horn [80] showed a remarkable converse to Weyl's theorem: if a sequence s1 ≥ · · · ≥ sn of non-negative real numbers and a sequence λ1, . . . , λn of complex numbers of non increasing modulus satisfy to all Weyl's inequalities (1.1) then there exists A ∈ Mn(C) with eigenvalues λ1, . . . , λn and singular values s1, . . . , sn. 3 If A ∈ Mn(C) then there exists a unitary matrix U such that U AU * is upper triangular.
This identity is at the heart of the Hermitization technique used in sections 4 and 6.
The singular values are quite regular functions of the matrix entries. For instance, the Courant-Fischer formulas imply that the mapping A → (s 1 (A), . . . , s n (A)) is 1-Lipschitz for the operator norm and the ℓ ∞ norm in the sense that for any A, B ∈ M n (C), (1.6) Recall that M n (C) or M n (R) are Hilbert spaces for the scalar product A · B = Tr(AB * ). The norm · 2 associated to this scalar product, called the trace norm 4 , satisfies to In the sequel, we say that a sequence of (possibly signed) measures (η n ) n≥1 on C (respectively on R) tends weakly to a (possibly signed) measure η, and we denote This type of convergence does not capture the behavior of the support and of the moments 5 .
Example 1.2 (Spectra of non-normal matrices). The eigenvalues depend continuously on the entries of the matrix. It turns out that for non-normal matrices, the eigenvalues are more sensitive to perturbations than the singular values. Among non-normal matrices, we find non-diagonalizable matrices, including nilpotent matrices. Let us recall a striking example taken from [138] and [11,Chapter 10]. Let us consider A, B ∈ M n (R) given by We have λ 1 (A) = · · · = λ κn (A) = 0 and thus In contrast, B n = κ n I and thus λ k (B) = κ 1/n n e 2kπi/n for all 1 ≤ k ≤ n which gives µ B Uniform{z ∈ C : |z| = 1} 4 Also known as the Hilbert-Schmidt norm, the Schur norm, or the Frobenius norm. 5 Note that for empirical spectral distributions in random matrix theory, most of the time the limit is characterized by its moments, and this allows to deduce weak convergence from moments convergence.
as soon as κ 1/n n → 1 (this allows κ n → 0). On the other hand, from the identities AA * = diag(1, . . . , 1, 0) and BB * = diag(1, . . . , 1, κ 2 n ) we get s 1 (A) = · · · = s n−1 (A) = 1, s n (A) = 0 and s 1 (B) = · · · = s n−1 (B) = 1, s n (B) = κ n for large enough n, and therefore, for any choice of κ n , since the atom κ n has weight 1/n, We refer to the books [82] and [65] for more details on basic properties of the singular values and eigenvalues of deterministic matrices. The sensitivity of the spectrum to perturbations of small norm is captured by the notion of pseudo-spectrum. Namely, for a matrix norm · and a positive real ε, the ( · , ε)-pseudo-spectrum of A is defined by If A is normal then its pseudo-spectrum for the operator norm · 2→2 coincides with the ε-neighborhood of its spectrum. The pseudo-spectrum can be much larger for non-normal matrices. For instance, if A is the nilpotent matrix considered earlier, then the asymptotic (as n → ∞) pseudo-spectrum for the operator norm contains the unit disc if κ n is well chosen. For more, see the book [151].

Quarter circular and circular laws
The variance of a random variable Z on C is Var(Z) = E(|Z| 2 ) − |E(Z)| 2 . Let (X ij ) i,j≥1 be an infinite table of i.i.d. random variables on C with variance 1. We consider the square random matrix X := (X ij ) 1≤i,j≤n as a random variable in M n (C). We write a.s., a.a., and a.e. for almost surely, Lebesgue almost all, and Lebesgue almost everywhere respectively.
We start with a reformulation in terms of singular values of the classical Marchenko-Pastur theorem for the "empirical covariance matrix" 1 n XX * . This theorem is universal in the sense that the limiting distribution does not depend on the law of X 11 . Theorem 2.1 (Marchenko-Pastur quarter circular law). a.s. ν n −1/2 X Q 2 as n → ∞, where Q 2 is the quarter circular law 6 on [0, 2] ⊂ R + with density x → π −1 √ 4 − x 2 1 [0,2] (x). 6 Actually, it is a quarter ellipse rather than a quarter circle, due to the normalizing factor 1/π. However, one may use different scales on the horizontal and vertical axes to see a true quarter circle, as in figure 2.
The n −1/2 normalization factor is easily understood from the law of large numbers: s i (X) 2 = 1 n 2 Tr(XX * ) = 1 n 2 n i,j=1 |X i,j | 2 → E(|X 1,1 | 2 ). (2.1) The central subject of these notes is the following counterpart for the eigenvalues.
Note that if Z is a complex random variable following the uniform law on the unit disc {z ∈ C : |z| ≤ 1} then the random variables Re(Z) and Im(Z) follow the semi circular law on [−1, 1], but are not independent. Additionally, the random variables |Re(Z)| and |Im(Z)| follow the quarter circular law on [0, 1], and |Z| follows the law with density ρ → 1 2 ρ1 [0,1] (ρ). We will see in section 5 that the notion of freeness developed in free probability is the key to understand these relationships. An extension of theorem 2.1 is the key to deduce theorem 2.2 via a Hermitization technique, as we will see in section 4.
The circular law theorem 2.2 has a long history. It was established through a sequence of partial results during the period 1965-2009, the general case being finally obtained by Tao and Vu [150]. Indeed Mehta [113] was the first to obtain a circular law theorem for the expected empirical spectral distribution in the complex Gaussian case, by using the explicit formula for the spectrum due to Ginibre [53]. Edelman was able to prove the same kind of result for the far more delicate real Gaussian case [41]. Silverstein provided an argument to pass from the expected to the almost sure convergence in the complex Gaussian case [84]. Girko worked on the universal version and came with very good ideas such as the Hermitization technique [54,56,58,59,60]. Unfortunately, his work was controversial due to a lack of clarity and rigor 8 . In particular, his approach relies implicitly on an unproved uniform integrability related to the behavior of the smallest singular values of random matrices. Let us mention that the Hermitization technique is also present in the work of Widom [155] on Toeplitz matrices and in the work of Goldsheid and Khoruzhenko [63]. Bai [10] was the first to circumvent the problem in the approach of Girko, at the price of bounded density assumptions and moments assumptions 9 . Bai improved his approach in his book written with Silverstein [11]. His approach involves the control of the speed of convergence of the singular values distribution.Śniady considered a universal version beyond random matrices and the circular law, using the notion of * -moments and Brown measure of operators in free probability, and a regularization by adding an independent Gaussian Ginibre noise [138]. Goldsheid and Khoruzhenko [64] used successfully the logarithmic potential to derive the analogue of the circular law theorem for random non-Hermitian tridiagonal matrices. The smallest singular value of random matrices was the subject of an impressive activity culminating with the works of Tao and Vu [145] and of Rudelson and Vershynin [128], using tools from asymptotic geometric analysis and additive combinatorics (Littlewood-Offord problems). These achievements allowed Götze and Tikhomirov [66] to obtain the expected circular law theorem up to a small loss in the moment assumption, by using the logarithmic potential. Similar ingredients are present in the work of Pan and Zhou [116]. At the same time, Tao and Vu, using a refined bound on the smallest singular value and the approach of Bai, deduced the circular law theorem up to a small loss in the moment assumption [146]. As in the works of Girko, Bai and their followers, the loss was due to a sub-optimal usage of the Hermitization approach. 7 It is not customary to call it instead the "disc law". The terminology corresponds to what we actually draw: a circle for the circular law, a quarter circle (actually a quarter ellipse) for the quarter circular law, even if it is the boundary of the support in the first case, and the density in the second case. See figure 2. 8 Girko's writing style is also quite original, see for instance the recent paper [61]. 9 . . . I worked for 13 years from 1984 to 1997, which was eventually published in Annals of Probability.
It was the hardest problem I have ever worked on. Zhidong Bai, interview with Atanu Biswas in 2006 [36].
In [150], Tao and Vu finally obtained the full circular law theorem 2.2 by using the full strength of the logarithmic potential, and a new control of the count of the small singular values which replaces the speed of convergence estimates of Bai. See also their synthetic paper [147]. We will follow essentially their approach in section 4 to prove theorem 2.2. Figure 2. Illustration of universality in the quarter circular law and the circular law theorems 2.1 and 2.2. The plots are made with the singular values (upper plots) and eigenvalues (lower plot) for a single random matrix X of dimension n = 1000. On the left hand side, X 11 follows a standard Gaussian law on R, while on the right hand side X 11 follows a symmetric Bernoulli law on {−1, 1}. Since X has real entries, the spectrum is symmetric with respect to the real axis. A striking fact behind such simulations for the eigenvalues (lower plots) is the remarkable stability of the numerical algorithms for the eigenvalues despite the sensitivity of the spectrum of non-normal matrices. Is it theŚniady regularization of Brown measure theorem [138] at work due to floating point approximate numerics?
The a.s. tightness of µ n −1/2 X is easily understood since by Weyl's inequality we obtain The convergence in the couple of theorems above is the weak convergence of probability measures with respect to continuous bounded functions. We recall that this mode of convergence does not capture the convergence of the support. It implies only that a.s. lim n→∞ s 1 (n −1/2 X) ≥ 2 and lim n→∞ |λ 1 (n −1/2 X)| ≥ 1.
The asymptotic factor 2 between the operator norm and the spectral radius indicates in a sense that X is a non-normal matrix asymptotically as n → ∞ (note that if X 11 is absolutely continuous then X is absolutely continuous and thus XX * = X * X a.s. which means that X is non-normal a.s.). The law of the modulus under the circular law has density ρ → 2ρ1 [0,1] (ρ) which differs completely from the shape of the quarter circular law figure 3. The integral of "log" for both laws is the same.  This difference indicates the asymptotic non-normality of these matrices. The integral of the function t → log(t) is the same for both distributions.

Gaussian case
This section is devoted to the case where X 11 ∼ N (0, 1 2 I 2 ). From now on, we denote G instead of X in order to distinguish the Gaussian case from the general case. We say that G belongs to the Complex Ginibre Ensemble. The Lebesgue density of the n × n random where A * the conjugate-transpose of A. This law is a Boltzmann distribution with energy This law is unitary invariant, in the sense that if U and V are n × n unitary matrices then U GV and G are equally distributed. If H 1 and H 2 are two independent copies of 10 The argument is based on Gelfand's formula: if A ∈ Mn(C) then |λ1(A)| = lim k→∞ A k 1/k for any norm · on Mn(C) (recall that all norms are equivalent in finite dimension). In the same spirit, the Yamamoto theorem states that lim k→∞ si(A k ) 1/k = |λi(A)| for every 1 ≤ i ≤ n, see [82,Theorem 3.3.21]. GUE 11 then (H 1 + iH 2 )/ √ 2 has the law of G. Conversely, the matrices (G + G * )/ √ 2 and (G − G * )/ √ 2i are independent and belong to the GUE. The singular values of G are the square root of the eigenvalues of the positive semidefinite Hermitian matrix GG * . The matrix GG * is a complex Wishart matrix, and belongs to the complex Laguerre Ensemble (β = 2). The empirical distribution of the singular values of n −1/2 G tends to the Marchenko-Pastur quarter circular distribution (Gaussian case in theorem 2.1). This section is rather devoted to the study of the eigenvalues of G, and in particular to the proof of the circular law theorem 2.2 in this Gaussian settings. Lemma 3.1 (Diagonalizability). For every n ≥ 1, the set of elements of M n (C) with multiple eigenvalues has zero Lebesgue measure in C n×n . In particular, the set of nondiagonalizable elements of M n (C) has zero Lebesgue measure in C n×n .
Proof. If A ∈ M n (C) has characteristic polynomial P A (z) = z n + a n−1 z n−1 + · · · + a 0 , then a 0 , . . . , a n−1 are polynomials of the entries of A. The resultant R(P A , P ′ A ) of P A , P ′ A , called the discriminant of P A , is the determinant of the (2n−1)×(2n−1) Sylvester matrix of P A , P ′ A . It is a polynomial in a 0 , . . . , a n−1 . We have also the Vandermonde formula Consequently, A has all eigenvalues distinct if and only if A lies outside the proper polynomial hyper-surface {A ∈ C n×n : R(P A , P ′ A ) = 0}. Since G is absolutely continuous, we have a.s. GG * = G * G (non-normality). Additionally, lemma 3.1 gives that a.s. G is diagonalizable with distinct eigenvalues. Following Ginibre [53] -see also [114,49,Chapter 15] and [96] -one may then compute the joint density of the eigenvalues λ 1 (G), . . . , λ n (G) of G by integrating (3.1) over the non-eigenvalues variables. The result is stated in theorem 3.2 below. It is worthwhile to mention that in contrast with Hermitian unitary invariant ensembles, the computation of the spectrum law is problematic if one replaces the square potential by a more general potential, see [96]. The law of G is invariant by the multiplication of the entries with a common phase, and thus the law of the spectrum of G has also the same property. In the sequel we set ∆ n := {(z 1 , . . . , z n ) ∈ C n : |z 1 | ≥ · · · ≥ |z n |}. Theorem 3.2 (Spectrum law). (λ 1 (G), . . . , λ n (G)) has density n!ϕ n 1 ∆n where In particular, for every symmetric Borel function F : C n → R, We will use theorem 3.2 with symmetric functions of the form The Vandermonde determinant comes from the Jacobian of the diagonalization, and can be interpreted as an electrostatic repulsion. The spectrum is a Gaussian determinantal point process, see [83,Chapter 4]. 11 Up to scaling, a random n × n Hermitian matrix H belongs to the Gaussian Unitary Ensemble (GUE) when its density is proportional to H → exp(− 1 2 Tr(H 2 )) = exp(− 1 2 n i=1 |Hii| 2 − 1≤i<j≤n |Hij | 2 ). Equivalently {Hii, Hij : 1 ≤ i ≤ n, i < j ≤ n} are indep. and Hii ∼ N (0, 1) and Hij ∼ N (0, 1 2 I2) for i = j.

Theorem 3.3 (k-points correlations).
Let z ∈ C → γ(z) = π −1 e −|z| 2 be the density of the standard Gaussian N (0, 1 2 I 2 ) on C. Then for every 1 ≤ k ≤ n, the "k-point correlation" In particular, by taking k = n we get Recall that if µ is a random probability measure on C then Eµ is the deterministic probability measure defined for every bounded measurable f by Theorem 3.4 (Mean circular Law). Eµ n −1/2 G C 1 as n → ∞, where C 1 is the circular law i.e. the uniform law on the unit disc of C with density z → π −1 1 {z∈C:|z|≤1} .
Proof. From theorem 3.3, with k = 1, we get that the density of Eµ G is The n in front of ϕ n,1 is due to the fact that we are on the complex plane C = R 2 and thus d √ nxd √ ny = ndxdy. Here is the start of the elementary calculus: for r 2 < n, By taking r 2 = | √ nz| 2 we obtain the convergence of the density uniformly on compact subsets, which implies in particular the weak convergence.
The sequence (H k ) k∈N forms an orthonormal basis (orthogonal polynomials) of square integrable analytic functions on C for the standard Gaussian on C. The uniform law on the unit disc (known as the circular law) is the law of √ V e 2iπW where V and W are i.i.d. uniform random variables on the interval [0, 1]. This point of view can be used to interpolate between complex Ginibre and GUE via Girko's elliptic laws, see [100,90,19].
We are ready to prove the complex Gaussian version of the circular law theorem 2.2.
Proof. We reproduce Silverstein's argument, published by Hwang [84]. The argument is similar to the quick proof of the strong law of large numbers for independent random variables with bounded fourth moment. It suffices to establish the result for compactly supported continuous bounded functions. Let us pick such a function f and set Suppose for now that we have By monotone convergence (or by the Fubini-Tonelli theorem), s. which implies lim n→∞ S n − ES n = 0 a.s. Since lim n→∞ ES n = S ∞ by theorem 3.4, we get that a.s.
Finally, one can swap the universal quantifiers on ω and f thanks to the separability of the set of compactly supported continuous bounded functions C → R equipped with the supremum norm. To establish (3.2), we set Next, we obtain, with i 1 ,... running over distinct indices in 1, . . . , n, The first three terms of the right hand side are O(n −2 ) since max 1≤i≤n |Z i | ≤ f ∞ . Finally, some calculus using the expressions of ϕ n,3 and ϕ n,4 provided by theorem 3.3 allows to show that the remaining two terms are also O(n −2 ). See Hwang [84, p. 151].
It is worthwhile to mention that one can deduce the circular law theorem 3.5 from a large deviations principle, bypassing the mean circular law theorem 3.4, see section 5.
Following Kostlan [97] (see also Rider [122] and [83]) the integration of the phases in the joint density of the spectrum given by theorem 3.2 leads to theorem 3.6 below.
. . , E k are i.i.d. exponential random variables of unit mean, we get, for every r > 0, The law of large numbers suggests that r = 1 is a critical value. The central limit theorem suggests that λ 1 (n −1/2 G) behaves when n ≫ 1 as the maximum of i.i.d. Gaussians, for which the fluctuations follow the Gumbel law. A quantitative central limit theorem and the Borel-Cantelli lemma provides the follow result. The full proof is in Rider [122]. Moreover, if γ n := log(n/2π) − 2 log(log(n)) then where G is the Gumbel law with cumulative distribution function x → e −e −x on R.
The convergence of the spectral radius was obtained by Mehta [114, chapter 15 page 271 equation 15.1.27] by integrating the joint density of the spectrum of theorem 3.2 over the set 1≤i≤n {|λ i | > r}. The same argument is reproduced by Hwang [84, pages 149-150]. Let us give now an alternative derivation of theorem 3.4. From theorem 3.7, the sequence (Eµ n −1/2 G ) n≥1 is tight and every accumulation point µ is supported in the unit disc. From theorem 3.2, such a µ is rotationally invariant, and from theorem 3.6, the image of µ by z ∈ C → |z| has density r → 2r1 [0,1] (r) (use moments!). Theorem 3.4 follows immediately.
It is remarkable that the large eigenvalues in modulus of the complex Ginibre ensemble are asymptotically independent, which gives rise to a Gumbel fluctuation, in contrast with the GUE and its delicate Tracy-Widom fluctuation, see [90] for an interpolation.
Remark 3.8 (Real Ginibre Ensemble). Ginibre considered also in his paper [53] the case where C is replaced by R or by the quaternions. These cases are less understood than the complex case due to their peculiarities. Let us focus on the Real Ginibre Ensemble, studied by Edelman and his collaborators. The expected number of real eigenvalues is equivalent to 2n/π as n → ∞, see [42], while the probability that all the eigenvalues are real is exactly 2 −n(n−1)/4 , see [ [41]. The analogue of the weak circular law theorem 3.4 was proved by Edelman [42,Theorem 6.3]. More information on the structure of the Real Ginibre Ensemble can be found in [2], [27], and in [96] and [49,Chapter 15]. 12 Here Γ(a, λ) stands for the probability measure on R+ with Lebesgue density x → λ a Γ(a) −1 x a−1 e −λx . 13 Here χ 2 (n) stands for the law of V 2 2 where V ∼ N (0, In).
On overall, one can remember that the Complex Ginibre Ensemble is in a way "simpler" than the GUE while the Real Ginibre Ensemble is "harder" than the GOE. Real Ginibre ≥ GOE ≥ GUE ≥ Complex Ginibre Figure 4. Histograms of real eigenvalues of 500 i.i.d. copies of n −1/2 X with n = 300. The left hand side graphic corresponds to the standard real Gaussian case X 11 ∼ N (0, 1), while the right hand side graphic corresponds to the symmetric Bernoulli case X 11 ∼ 1 2 (δ −1 + δ 1 ). See remark 3.8.
Remark 3.9 (Quaternionic Ginibre Ensemble). The quaternionic Ginibre Ensemble was considered at the origin by Ginibre [53]. It has been recently shown [18] by using the logarithmic potential that there exists an analogue of the circular law theorem for this ensemble, in which the limiting law is supported in the unit ball of the quaternions field.

Universal case
This section is devoted to the proof of the circular law theorem 2.2 following [150]. The universal Marchenko-Pastur theorem 2.1 can be proved by using powerful Hermitian techniques such as truncation, centralization, the method of moments, or the Cauchy-Stieltjes trace-resolvent transform. It turns out that all these techniques fail for the eigenvalues of non-normal random matrices. Indeed, the key to prove the circular law theorem 2.2 is to use a bridge pulling back the problem to the Hermitian world. This is called Hermitization.
Actually, and as we will see in sections 5 and 6, there is a non-Hermitian analogue of the method of moments called the * -moments, and there is an analogue of the Cauchy-Stieltjes trace-resolvent in which the complex variable is replaced by a quaternionic type variable. 4.1. Logarithmic potential and Hermitization. Let P(C) be the set of probability measures on C which integrate log |·| in a neighborhood of infinity. The logarithmic potential U µ of µ ∈ P(C) is the function U µ : C → (−∞, +∞] defined for all z ∈ C by (4.1) For instance, for the circular law C 1 of density π −1 1 {z∈C:|z|≤1} , we have, for every z ∈ C, see e.g. [130]. Let D ′ (C) be the set of Schwartz-Sobolev distributions. We have P(C) ⊂ D ′ (C). Since log |·| is Lebesgue locally integrable on C, the Fubini-Tonelli theorem implies that U µ is Lebesgue locally integrable on C. In particular, U µ < ∞ a.e. and U µ ∈ D ′ (C). Let us define the first order linear differential operators in D ′ (C) and the Laplace operator ∆ = 4∂∂ = 4∂∂ = ∂ 2 x + ∂ 2 y . Each of these operators coincide on smooth functions with the usual differential operator acting on smooth functions. By using Green's or Stockes' theorems, one may show, for instance via the Cauchy-Pompeiu formula, that for any smooth and compactly supported function ϕ : C → R, where z = x + iy. Now (4.4) can be written, in D ′ (C), In other words, 1 2π log |·| is the fundamental solution of the Laplace equation on R 2 . Note that log |·| is harmonic on C \ {0}. It follows that in D ′ (C), This means that for every smooth and compactly supported "test function" ϕ : C → R, where z = x+ iy. This means that − 1 2π U · is the Green operator on R 2 (Laplacian inverse). Lemma 4.1 (Unicity). For every µ, ν ∈ P(C), if U µ = U ν a.e. then µ = ν.
Proof. Since U µ = U ν in D ′ (C), we get ∆U µ = ∆U ν in D ′ (C). Now (4.5) gives µ = ν in D ′ (C), and thus µ = ν as measures since µ and ν are Radon measures. Note more generally that the lemma remains valid if U µ = U ν + h for some harmonic h ∈ D ′ (C).
If A is a n × n complex matrix and P A (z) := det(A− zI) is its characteristic polynomial, for every z ∈ C \ {λ 1 (A), . . . , λ n (A)}. We have also the alternative expression 14 One may retain from this determinantal Hermitization that for any A ∈ M n (C), knowledge of ν A−zI for a.a. z ∈ C ⇒ knowledge of µ A Note that from (4.5), for every smooth compactly supported function ϕ : C → R, The identity (4.7) bridges the eigenvalues with the singular values, and is at the heart of the next lemma, which allows to deduce the convergence of µ A from the one of ν A−zI . The strength of this Hermitization lies in the fact that contrary to the eigenvalues, one can control the singular values with the entries of the matrix using powerful methods such as the method of moments or the trace-resolvent Cauchy-Stieltjes transform. The price paid here is the introduction of the auxiliary variable z. Moreover, we cannot simply deduce the convergence of the integral from the weak convergence of ν A−zI since the logarithm is unbounded on R + . We circumvent this problem by requiring uniform integrability. We recall that on a Borel measurable space (E, E), a Borel function f : E → R is uniformly integrable for a sequence of probability measures (η n ) n≥1 on E when lim t→∞ sup n≥1 {|f |>t} |f | dη n = 0.
We will use this property as follows: if η n η as n → ∞ for some probability measure η and if f is continuous and uniformly integrable for (η n ) n≥1 then f is η-integrable and lim n→∞ f dη n = f dη.
Remark 4.2 (Weak convergence and uniform integrability in probability). Let T be a topological space such as R or C, and its Borel σ-field T . Let (η n ) n≥1 be a sequence of random probability measures on (T, T ) and η be a probability measure on (T, T ). We say that η n η in probability if for all bounded continuous f : T → R and any ε > 0, This is implied by the a.s. weak convergence. We say that a measurable function f : T → R is uniformly integrable in probability for (η n ) n≥1 when for any ε > 0, We will use this property as follows: if η n η in probability and if f is uniformly integrable for (η n ) n≥1 in probability then f is η-integrable and f dη n converges in probability to f dη. This will be helpful in section 6 together with lemma 4.3 in order to circumvent the lack of almost sure bounds on small singular values for heavy tailed random matrices.
The idea of using Hermitization goes back at least to Girko [54]. However, the proofs of lemmas 4.3 and 4.5 below are inspired from the approach of Tao and Vu [150]. . Let (A n ) n≥1 be a sequence of complex random matrices where A n is n × n for every n ≥ 1. Suppose that there exists a family of (non-random) probability measures (ν z ) z∈C on R + such that, for a.a. z ∈ C, a.s.
(i) ν An−zI ν z as n → ∞ (ii) log is uniformly integrable for (ν An−zI ) n≥1 . Then there exists a probability measure µ ∈ P(C) such that (j) a.s. µ An µ as n → ∞ (jj) for a.a. z ∈ C, Moreover, if the convergence (i) and the uniform integrability (ii) both hold in probability for a.a. z ∈ C (instead of for a.a. z ∈ C, a.s.), then (j-jj) hold with the a.s. weak convergence in (j) replaced by the weak convergence in probability.
Proof of lemma 4.3. Let us give the proof of the a.s. part. We first observe that one can swap the quantifiers "a.a." on z and "a.s." on ω in front of (i-ii). Namely, let us call P (z, ω) the property "the function log is uniformly integrable for ν An(ω)−zI n≥1 and ν An(ω)−zI ν z ". The assumptions of the lemma provide a measurable Lebesgue negligible set C in C such that for all z ∈ C there exists a probability one event E z such that for all ω ∈ E z , the property P (z, ω) is true. From the Fubini-Tonelli theorem, this is equivalent to the existence of a probability one event E such that for all ω ∈ E, there exists a Lebesgue negligible measurable set C ω in C such that for all z ∈ C ω , the property P (z, ω) is true.
From now on, we fix an arbitrary ω ∈ E. For every z ∈ C ω , let us define the probability measure ν := ν z and the triangular arrays (a n,k ) 1≤k≤n and (b n,k ) 1≤k≤n by a n,k := |λ k (A n (ω) − zI)| and b n,k := s k (A n (ω) − zI).
Note that µ An(ω)−zI = µ An(ω) * δ −z . Thanks to the Weyl inequalities (1.1) and to the assumptions (i-ii), one can use lemma 4.5 below, which gives that (µ An(ω) ) n≥1 is tight, that for a.a. z ∈ C, log |z − ·| is uniformly integrable for (µ An(ω) ) n≥1 , and that Consequently, if the sequence (µ An(ω) ) n≥1 admits two probability measures µ ω and µ ′ ω as accumulation points for the weak convergence, then both µ ω and µ ′ ω belong to P(C) and U µω = U = U µ ′ ω a.e., which gives µ ω = µ ′ ω thanks to lemma 4.1. Therefore, the sequence (µ An(ω) ) n≥1 admits at most one accumulation point for the weak convergence. Since the sequence (µ An(ω) ) n≥1 is tight, the Prohorov theorem implies that (µ An(ω) ) n≥1 converges weakly to some probability measure µ ω ∈ P(C) such that U µω = U a.e. Since U is deterministic, it follows that ω → µ ω is deterministic by lemma 4.1 again. This achieves the proof of the a.s. part of the lemma. The proof of the "in probability" part of the lemma follows the same lines, using this time the "in probability" part of lemma 4.5.

Remark 4.4 (Weakening uniform integrability in lemma 4.3)
. The set of z in C such that z is an atom of Eµ An for some n ≥ 1 is at most countable, and has thus zero Lebesgue measure. Hence, for a.a. z ∈ C, a.s. for all n ≥ 1, z is not an eigenvalue of A n . Thus for a.a. z ∈ C, a.s. for all n ≥ 1, Hence, assumption (ii) in the a.s. part of lemma 4.3 holds if for a.a. z ∈ C, a.s. where f = log. Similarly, regarding "in probability" part of lemma 4.3, one can replace the sup by lim in the definition of uniform integrability in probability.
The following lemma is in a way the skeleton of proof of lemma 4.3 (no matrices). It states essentially a propagation of a uniform logarithmic integrability for a couple of triangular arrays, provided that a logarithmic majorization holds between the arrays. Lemma 4.5 (Logarithmic majorization and uniform integrability). Let (a n,k ) 1≤k≤n and (b n,k ) 1≤k≤n be two triangular arrays in R + . Define the discrete probability measures δ a n,k and ν n : If the following properties hold (i) a n,1 ≥ · · · ≥ a n,n and b n,1 ≥ · · · ≥ b n,n for n ≫ 1, a n,i for every 1 ≤ k ≤ n for n ≫ 1, (iv) ν n ν as n → ∞ for some probability measure ν, (v) log is uniformly integrable for (ν n ) n≥1 , then (j) log is uniformly integrable for (µ n ) n≥1 (in particular, (µ n ) n≥1 is tight), and in particular, for every accumulation point µ of (µ n ) n≥1 , Moreover, assume that (a n,k ) 1≤k≤n and (b n,k ) 1≤k≤n are random triangular arrays in R + defined on a common probability space such that (i-ii-iii) hold a.s. and (iv-v) hold in probability. Then (j-jj) hold in probability.
Proof. An elementary proof can be found in [23,Lemma C2]. Let us give an alternative argument. Let us start with the deterministic part. From the de la Vallée Poussin criterion (see e.g. [37,Theorem 22]), assumption (v) is equivalent to the existence of a non-decreasing convex function J : On the other hand, assumption (i-ii-iii) implies that for every real valued function ϕ such that t → ϕ(e t ) is non-decreasing and convex, we have, for every 1 ≤ k ≤ n, see [82,Theorem 3.3.13]. Hence, applying this for k = n and ϕ = J, We obtain by this way (j). Statement (jj) follows trivially. We now turn to the proof of the "in probability" part of the lemma. Arguing as in [37,Theorem 22], the statement (v) of uniform convergence in probability is equivalent to the existence for all δ > 0 of a non-decreasing convex function J δ : Since J δ is non-decreasing and convex we deduce as above This proves (j). Statement (jj) is then a consequence of remark 4.2.
Remark 4.6 (Logarithmic potential and Cauchy-Stieltjes transform). We may define the Cauchy-Stieltjes transform m µ : C → C ∪ {∞} of a probability measure µ on C by Since 1/|·| is Lebesgue locally integrable on C, the Fubini-Tonelli theorem implies that m µ (z) is finite for a.a. z ∈ C, and moreover m µ is locally Lebesgue integrable on C and thus belongs to D ′ (C). Suppose now that µ ∈ P(C). The logarithmic potential is related to the Cauchy-Stieltjes transform via the identity In particular, since 4∂∂ = 4∂∂ = ∆ as operators on D ′ (C), we obtain, in D ′ (C), Thus we can recover µ from m µ . Note that for any ε > 0, m µ is bounded on If supp(µ) is one-dimensional then one may completely recover µ from the knowledge of m µ on D ε as ε → 0. Note also that m µ is analytic outside supp(µ), and is thus characterized by its real part or its imaginary part on arbitrary small balls in the connected components of supp(µ) c . If supp(µ) is not one-dimensional then one needs the knowledge of m µ inside the support to recover µ. If A ∈ M n (C) then m µ A is the trace of the resolvent For non-Hermitian matrices, the lack of a Hermitization identity expressing m µ A in terms of singular values explains the advantage of the logarithmic potential U µ A over the Cauchy-Stieltjes transform m µ A for spectral analysis.
Remark 4.7 (Logarithmic potential and logarithmic energy). The term "logarithmic potential" comes from the fact that U µ is the electrostatic potential of µ viewed as a distribution of charged particles in the plane C = R 2 [130]. The so called logarithmic energy of this distribution of charged particles is The circular law minimizes E(·) under a second moment constraint [130]. If supp(µ) ⊂ R then E(µ) matches up to a sign and an additive constant the Voiculescu free entropy for one variable in free probability theory [153, Proposition 4.5] (see also the formula 5.1).
Remark 4.8 (From converging potentials to weak convergence). As for the Fourier transform, the pointwise convergence of logarithmic potentials along a sequence of probability measures implies the weak convergence of the sequence to a probability measure. We need however some strong tightness. More precisely, if (µ n ) n≥1 is a sequence in P(C) and if U : C → (−∞, +∞] is such that (i) for a.a. z ∈ C, lim n→∞ U µn (z) = U (z), (ii) log(1 + |·|) is uniformly integrable for (µ n ) n≥1 , then there exists µ ∈ P(C) such that U µ = U a.e. and µ = − 1 2π ∆U in D ′ (C) and µ n µ.
Let K ⊂ C be an arbitrary compact set. We choose r = r(K) ≥ 1 large enough so that the ball of radius r − 1 contains K, and therefore for every z ∈ K and λ ∈ C, The couple of inequalities above, together with the fact that the function (log |·|) 2 is locally Lebesgue integrable on C, imply, by using Jensen and Fubini-Tonelli theorems, where z = x + iy as usual. Since the de la Vallée Poussin criterion is necessary and sufficient for uniform integrability, this means that the sequence (U µn ) n≥1 is locally uniformly Lebesgue integrable. Consequently, from (i) it follows that U is locally Lebesgue integrable and that U µn → U in D ′ (C). Since the differential operator ∆ is continuous in D ′ (C), we find that ∆U µn → ∆U in D ′ (C). Since ∆U ≤ 0, it follows that µ := − 1 2π ∆U is a measure (see e.g. [79]). Since for a sequence of measures, convergence in D ′ (C) implies weak convergence, we get µ n = − 1 2π ∆U µn µ = − 1 2π ∆U . Moreover, by assumptions (ii) we get additionally that µ ∈ P(C). It remains to show that U µ = U a.e. Indeed, for any smooth and compactly supported ϕ : C → R, since the function log |·| is locally Lebesgue integrable, the Fubini-Tonelli theorem gives  be a deterministic sequence such that M n ∈ M n (C) for every n. If ν Mn ρ as n → ∞ for some probability measure ρ on R + then there exists a probability measure ν ρ on R + which depends only on ρ and such that a.s. ν n −1/2 X+Mn ν ρ as n → ∞.
Theorem 4.9 appears as a special case of the work of Dozier and Silverstein for information plus noise random matrices [39]. Their proof relies on powerful Hermitian techniques such as truncation, centralization, trace-resolvent recursion via Schur block inversion, leading to a fixed point equation for the Cauchy-Stieltjes transform of ν ρ . It is important to stress that ν ρ does not depend on the law of X 11 (recall that X 11 has unit variance). One may possibly produce an alternative proof of theorem 4.9 using free probability theory. For completeness, we will give in sub-section 4.5 a proof of corollary 4.10. Note also that for z = 0, we recover the quarter circular Marchenko-Pastur theorem 2.1.
It remains to check the uniform integrability assumption (ii) of lemma 4.3. From Markov's inequality, it suffices to show that for all z ∈ C, there exists p > 0 such that a.s.
The second statement in (4.9) with p ≤ 2 follows from the strong law of large numbers (2.1) together with (1.6), which gives The first statement in (4.9) concentrates most of the difficulty behind theorem 2.2. In the next two sub-sections, we will prove and comment the following couple of key lemmas taken from [150] and [146] respectively.
In particular there exists b > 0 which may depend on d such that a.s. for n ≫ 1, For ease of notation, we write s i in place of s i (n −1/2 X −zI). Applying lemmas 4.11-4.12 with M = −zI and M = −z √ nI respectively, we get, for any c > 0, z ∈ C, a.s. for n ≫ 1, The first term of the right hand side is a Riemann sum for 1 0 s −p ds which converges as soon as 0 < p < 1. We finally obtain the first statement in (4.9) as soon as 0 < p < min(γ/b, 1). Now the Hermitization lemma 4.3 ensures that there exists a probability measure µ ∈ P(C) such that a.s. µ Y µ as n → ∞ and for all z ∈ C, Since ν z does not depend on the law of X 11 (we say that it is then universal), it follows that µ also does not depend on the law of X 11 , and therefore, by using the circular law theorem 3.5 for the Complex Ginibre Ensemble we obtain that µ is the uniform law on the unit disc. Alternatively, following Pan and Zhou [116,Lemma 3], one can avoid the knowledge of the Gaussian case by computing the integral of ∞ 0 log(s) dν z (s) which should match the formula (4.2) for the logarithmic potential of the uniform law on the unit disc. Proof of lemma 4.11. We follow the original proof of Tao and Vu [150]. Up to increasing γ, it is enough to prove the statement for all 2n 1−γ ≤ i ≤ n − 1 for some γ ∈ (0, 1) to be chosen later. To lighten the notations, we denote by s 1 ≥ · · · ≥ s n the singular values of Y := n −1/2 X + M . We fix 2n 1−γ ≤ i ≤ n − 1 and we consider the matrix Y ′ formed by the first m := n − ⌈i/2⌉ rows of √ nY . Let s ′ 1 ≥ · · · ≥ s ′ m be the singular values of Y ′ . By the Cauchy-Poincaré interlacing 15 , we get n −1/2 s ′ n−i ≤ s n−i Next, by lemma 4.14 we obtain Now H j is independent of R j and dim(H j ) ≤ n − i 2 ≤ n − n 1−γ , and thus, for the choice of γ given in the forthcoming lemma 4.13, (note that the exponential bound in lemma 4.13 kills the polynomial factor due to the union bound over i, j). Consequently, by the first Borel-Cantelli lemma, we obtain that a.s. for n ≫ 1, all 2n 1−γ ≤ i ≤ n − 1, and all 1 ≤ j ≤ n − ⌈i/2⌉, Finally, (4.10) gives s 2 n−i ≥ (i 2 )/(32n 2 ), i.e. the desired result with c 0 := 1/(4 √ 2).
Lemma 4.13 (Distance of a random vector to a subspace). There exist γ > 0 and δ > 0 such that for all n ≫ 1, 1 ≤ i ≤ n, any deterministic vector v ∈ C n and any subspace H The exponential bound above is obviously not optimal, but is more than enough for our purposes: in the proof of lemma 4.11, a large enough polynomial bound on the probability suffices.
Proof. The argument is due to Tao We may thus directly suppose without loss of generality that v = 0 and that E(X ik ) = 0. Then, it is easy to check that The lemma is thus a statement on the deviation probability of dist(R, H). We first perform a truncation. Let 0 < ε < 1/3. Markov's inequality gives Hence, from Hoeffding's deviation inequality 16 , for n ≫ 1, It is thus sufficient to prove that the result holds by conditioning on denote the conditional expectation given E m and the filtration F m generated by X i,m+1 , . . . , X i,n . Let W be the subspace spanned by ]. Next we have Now, let us consider the disc D := {z ∈ C : |z| ≤ n ε } and define the D m → R + convex function f : x → dist((x, 0, . . . , 0), W ). From the triangle inequality, f is 1-Lipschitz: . 16 If X1, . . . , Xn are independent and bounded real r.v. with di := max(Xi) − min(Xi), and if Sn := X1 + · · · + Xn, then P(Sn − ESn ≤ tn) ≤ exp(−2n 2 t 2 /(d 2 1 + · · · + d 2 n )) for any t ≥ 0. See [110,Th. 5.7].
We deduce from Talagrand's concentration inequality 17 that where M m is the median of dist(Y, W ) under E m . In particular, Also, if P denotes the orthogonal projection on the orthogonal of W , we find We choose some 0 < γ < ε. Then, from the above expression for any 1/2 < c < 1 and The following lemma, taken from [150, Lemma A4], is used in the proof of lemma 4.11.
Lemma 4.14 (Rows and trace norm of the inverse). Now we take M = AA * and I = {i}, and we note that The desired formula follows by taking the sum over i ∈ {1, . . . , m}.

4.4.
Smallest singular value. This sub-section is devoted to lemma 4.12 which was used in the proof of theorem 2.2 to get the uniform integrability in lemma 4.3.
The full proof of lemma 4.12 by Tao and Vu in [146] is based on Littlewood-Offord type problems. The main difficulty is the possible presence of atoms in the law of the entries (in this case X is non-invertible with positive probability). Regarding the assumptions, the finite second moment hypothesis on X 11 is not crucial and can be considerably weakened. For the sake of simplicity, we give here a simplified proof when the law of X 11 has a bounded density on C or on R (which implies that X + M is invertible with probability one). In lemma A.1 in Appendix A, we prove a general statement of this type at the price of a weaker probabilistic estimate which is still good enough to obtain the uniform integrability "in probability" required by lemma 4.3.
Proof of lemma 4.12 with bounded density assumption. It suffices to show the first statement since the last statement follows from the first Borel-Cantelli lemma used with a > 1.
For every x, y ∈ C n and S ⊂ C n , we set x · y := x 1 y 1 + · · · + x n y n and x 2 := √ x · x and dist(x, S) := min y∈S x − y 2 . Let R 1 , . . . , R n be the rows of X + M and set and consequently, by the union bound, for any u ≥ 0, Let us fix 1 ≤ i ≤ n. Let Y i be a unit vector orthogonal to R −i . Such a vector is not unique, but we may just pick one which is independent of R i . This defines a random variable on the unit sphere S n−1 = {x ∈ C n : x 2 = 1}. By the Cauchy-Schwarz inequality, where π i is the orthogonal projection on the orthogonal complement of R −i . Let ν i be the distribution of Y i on S n−1 . Since Y i and R i are independent, for any u ≥ 0, Let us assume that X 11 has a bounded density ϕ on C. Since y 2 = 1 there exists an index j 0 ∈ {1, . . . , n} such that y j 0 = 0 with |y j 0 | −1 ≤ √ n. The complex random variable R i · y is a sum of independent complex random variables and one of them is X ij 0 y j 0 , which is absolutely continuous with a density bounded above by √ n ϕ ∞ . Consequently, by a basic property of convolutions of probability measures, the complex random variable R i · y is also absolutely continuous with a density ϕ i bounded above by √ n ϕ ∞ , and thus Therefore, for every b > 0, we obtain the desired result (the O does not depend on M ) This scheme remains indeed valid in the case where X 11 has a bounded density on R.
Lemma 4.16 (Rows and operator norm of the inverse). Let A be a complex n × n matrix with rows R 1 , . . . , R n . Define the vector space R −i := span{R j : j = i}. We have then Proof of lemma 4.16. The argument, due to Rudelson and Vershynin, is buried in [128]. Since A and A ⊤ have same singular values, one can consider the columns C 1 , . . . , C n of A instead of the rows. For every column vector x ∈ C n and 1 ≤ i ≤ n, the triangle inequality and the identity Ax = x 1 C 1 + · · · + x n C n give If x 2 = 1 then necessarily |x i | ≥ n −1/2 for some 1 ≤ i ≤ n and therefore s n (A) = min Conversely, for every 1 ≤ i ≤ n, there exists a vector y with y i = 1 such that where we used the fact that y 2 2 = |y 1 | 2 + · · · + |y n | 2 ≥ |y i | 2 = 1. Remark 4.17 (Assumptions for the control of the smallest singular value). In the proof of lemma 4.12 with the bounded density assumption, we have not used the assumption on the second moment of X 11 nor the assumption on the norm of M .

4.5.
Convergence of singular values measure. This sub-section is devoted to corollary 4.10. The proof is divided into five steps.
Step One: Concentration of singular values measure. First, it turns out that it is sufficient to prove the convergence to ν z of Eν n −1/2 X−z . Indeed, for matrices with independent rows, there is a remarkable concentration of measure phenomenon. More precisely, recall that the total variation norm of f : R → R is defined as where the supremum runs over all sequences (x k ) k∈Z such that x k+1 ≥ x k for any k ∈ Z. If f = 1 (−∞,s] for some s ∈ R then f TV = 1, while if f has a derivative in L 1 (R), f TV = |f ′ (t)| dt. The following lemma is extracted from [26], see also [74]. If M is a n × n complex random matrix with independent rows (or with independent columns) then for any f : R → R going to 0 at ±∞ with f TV ≤ 1 and every t ≥ 0, It is worth to mention that if M has independent entries which satisfy a uniform sub-Gaussian behavior, then for all Lipschitz function, the concentration of f dν M has a rate n 2 and not n, see e.g. the work of Guionnet and Zeitouni [71].
Proof. If A, B ∈ M n (C) and if F A (·) := ν A ((−∞, ·)) and F B (·) := ν B ((−∞, ·)) are the cumulative distribution functions of the probability measures ν A and ν B then it is easily seen from the Lidskii inequality for singular values 18 that Now for a smooth f : R → R, we get, by integrating by parts, Since the left hand side depends on at most 2n points, we get, by approximation, for every measurable function f : From now on, f : R → R is a fixed measurable function with f TV ≤ 1. For every row vectors x 1 , . . . , x n in C n , we denote by A(x 1 , . . . , x n ) the n × n matrix with rows x 1 , . . . , x n and we define F : (C n ) n → R by ...,xn) .
For any i ∈ {1, . . . , n} and any row vectors . , x n )) ≤ 1 and thus Finally, the desired result follows from the McDiarmid-Azuma-Hoeffding concentration inequality for bounded differences 19 applied to the function F and to the random variables R 1 , . . . , R n (the rows of M ).
Step Two: Truncation and centralization. In the second step, we prove that it is sufficient to prove the convergence for entries with bounded support. More precisely, we define where κ = κ n is a sequence growing to infinity. Then if Y = (Y ij ) 1≤i,j≤n , we have from Hoffman-Wielandt inequality (1.8), By assumption E|X ij | 2 1 {|X ij |>κ} goes to 0 as κ goes to infinity. Hence, by the law of large numbers, the right hand side of the above inequality converges a.s. to 0. On the left hand side we recognize the square of the Wasserstein W 2 coupling distance 20 between ν n −1/2 Y −zI and ν n −1/2 X−zI . Since the convergence in W 2 distance implies weak convergence, we deduce that it is sufficient to prove the convergence of Eν n −1/2 Y −zI to ν z . Next, we turn to the centralization by setting Then if Z = (Z ij ) 1≤i,j≤n , we have from the Lidskii inequality for singular values, In particular, it is sufficient to prove the convergence of Eν n −1/2 Z−zI to ν z . In summary, in the remainder of this sub-section, we will allow the law of X 11 to depend on n but we will assume that EX 11 = 0 , P(|X 11 | ≥ κ n ) = 0 and E|X 11 | 2 = σ 2 n , (4.14) 19 If X1, . . . , Xn are independent r.v. in X1, . . . , Xn and if f : X1 × · · · × Xn → R is a measurable function then P(|f (X1, . . . , Xn) − Ef (X1, . . . , Xn)| ≥ t) ≤ 2 exp(−2t 2 /(c 2 1 + · · · + c 2 n )) for any t ≥ 0, where We refer to McDiarmid [110]. 20 The W2 distance between two probability measures η1, η2 on R is W2(η1, η2) := inf E(|X1 − X2| 2 ) 1/2 where the inf runs over the set of r.v. (X1, X2) on R × R with X1 ∼ η1 and X2 ∼ η2. In the case where η1 = 1 n n i=1 δa i with 0 ≤ ai ր and η2 = 1 where κ = κ n = o( √ n) is a sequence growing to infinity and σ = σ n goes to 1 as n goes to infinity.
Step Three: Linearization. We use a popular linearization technique: we remark the identity of the Cauchy-Stieltjes transform, for η ∈ C + , whereν(·) = (ν(·) + ν(−·))/2 is the symmetrized version of a measure ν, and Through a permutation of the entries, this matrix H(z) is equivalent to the matrix q(z, η) := η z z η and for every 1 ≤ i, j ≤ n, Note that B(z) ∈ M n (M 2 (C)) ≃ M 2n (C) is Hermitian and its resolvent is denoted by Then R(q) ∈ M n (M 2 (C)) and, by (4.15), we deduce that We set It is easy to check that Hence, in order to prove that Eν n −1/2 X−zI converges, it is sufficient to prove that Ea(q) converges to, say, α(q) which, by tightness, will necessarily be the Cauchy-Stieltjes transform of a symmetric measure.
Step Four: Approximate fixed point equation. We use a resolvent method to deduce an approximate fixed point equation satisfied by a(q). Schur's block inversion (4.12) gives where Q ∈ M n−1,1 (M 2 (C)), is the resolvent of a minor. We denote by F n−1 the smallest σ-algebra spanned by the variables (X ij ) 1≤i,j≤n−1 . We notice that R is F n−1 -measurable and is independent of Q. If E n [ · ] := E[ · |F n−1 ], we get, using (4.14) and (4.16) Recall that B(z) is a minor of B(z). We may thus use interlacing as in (4.13) for the function f = (· − η) −1 , and we find .
Hence, we have checked that with ε 1 2 = o(1) (note here that q(z, η) is fixed). Moreover, we define Also, by (4.14) E|X 2 ij − σ 2 | 2 ≤ 2κ 2 σ 2 . Then, an elementary computation gives Also, we note by lemma 4.18 that a(q) is close to its expectation: Thus, the matrix has a norm which converges to 0 in expectation as n → ∞. Now, we use the identity Hence, since the norms of q + E a 0 0 a −1 and R nn are at most Im(η) −1 , we get . In other words, using exchangeability, Step with α = α(q) ∈ C + . We find Hence, α is a root of a polynomial of degree 3. Hence, to conclude the proof of corollary 4.10, it is sufficient to prove that there is unique symmetric measure whose Cauchy-Stieltjes transform is solution of this fixed point equation. For any η ∈ C + , it is simple to check that this equation has a unique solution in C + which can be explicitly computed. Alternatively, we know from (4.17) and Montel's theorem that η ∈ C + → α(q(z, η)) ∈ C + is analytic. In particular, it is sufficient to check that there is a unique solution in C + for η = it, with t > 0. To this end, we also notice from (4.17) that α(q) ∈ iR + for q = q(z, it). Hence, if h(z, t) = Im(α(q)), we find Thus, h = 0 and The right hand side in a decreasing function in h on (0, ∞) with limits equal to +∞ and 0 at h → 0 and h → ∞. Thus, there is a unique solution of the above equation.
We have thus proved that E a b b a converges. The proof of corollary 4.10 is over.
4.6. The quaternionic resolvent: an alternative look at the circular law.
Motivation. The aim of this sub-section is to develop an efficient machinery to analyze the spectral measures of a non-Hermitian matrix which avoids a direct use of the logarithmic potential and the singular values. This approach is built upon methods in the physics literature, e.g. [47,69,127,126]. As we will see, it is a refinement of the linearization procedure used in the proof of corollary 4.10. Recall that the Cauchy-Stieltjes transform of a measure ν on R is defined, for η ∈ C + , as The Cauchy-Stieltjes transform characterizes every probability measure on R, and actually, following Remark 4.6, every probability measure on C. However, if the support of the measure is not one-dimensional, then one needs the knowledge of the Cauchy-Stieltjes transform inside the support, which is not convenient. For a probability measure on C, it is tempting to define a quaternionic Cauchy-Stieltjes transform. For q ∈ H + , where This transform characterizes the measure: in D ′ (C), where ∂ is as in (4.3) and q(z, η) := η z z η .
If A ∈ M n (C) is normal then M µ A can be recovered from the trace of a properly defined quaternionic resolvent. If A is not normal, the situation is however more delicate and needs a more careful treatment.
Definition of quaternionic resolvent. For further needs, we will define this quaternionic resolvent in any Hilbert space. Let H be an Hilbert space with inner product ·, · . We define the Hilbert space H 2 = H × Z/2Z. For x = (y, ε) ∈ H 2 , we setx = (y, ε + 1). In particular, this transform is an involutionx = x. There is the direct sum decomposition For x ∈ H, if Π x : H 2 → C 2 denotes the orthogonal projection on ((x, 0), (x, 1)), for x, y ∈ D(A), we find The operator B will be called the bipartized operator of A, it is an Hermitian operator (i.e. for all x, y ∈ D(B), Bx, y = x, By ). If B is essentially self-adjoint (i.e. it has a unique self-adjoint extension), we may define the quaternionic resolvent of A for all q ∈ H + as Indeed, if q = q(z, η), we note that R A is the usual resolvent at η of the essentially selfadjoint operator B(z) = B − I H ⊗ q(z, 0). Hence R A inherits the usual properties of resolvent operators (analyticity in η, bounded norm). We define If H is separable and (e i ) i≥1 is a canonical orthonormal basis of H, we simply write R ij instead of R A (q) e i e j , i, j ∈ V . Finally, if A ∈ M n (C), we set If A is normal then it can be checked that R(q) kk ∈ H + and Γ A (q) = M µ A (q). However, if A is not normal, this formula fails to hold. However, the next lemma explains how to recover anyway µ A from the resolvent.
Lemma 4.19 (From quaternionic transform to spectral measures). Let A ∈ M n (C) and q = q(z, η) ∈ H + . Then, Moreover, mν A−z (η) = a(q) and, in D ′ (C), Proof. For ease of notation, assume that z = 0 and set τ (·) = 1 n Tr(·). If P is the permutation matrix associated to the permutation σ(2k − 1) = k, σ(2k) = n + k, we get Hence, Notice that Note also that µ A * A = µ AA * implies that Finally, since τ is a trace, Applying the above to A − z, we deduce the first two statements.
For the last statement, we write the corresponding identity and then, from the Jacobi formula (see remark 4.6 for the definition of ∂ and ∂̄), we obtain the stated expression in terms of Γ_A(q(z, it)).
The function t ↦ ∫ log|s + it| dν_{A−z}(s) decreases monotonically to ∫ log(s) dν_{A−z}(s) as t ↓ 0. Hence, the stated convergence holds in distribution, and the conclusion follows.
Girko's Hermitization lemma revisited. There is a straightforward extension of Girko's lemma 4.3 that uses the quaternionic resolvent.
Lemma 4.20 (Girko Hermitization). Let (A_n)_{n≥1} be a sequence of complex random matrices defined on a common probability space, where A_n takes its values in M_n(C). Assume that for all q ∈ H+ there exists Γ(q) ∈ H+ such that, for a.a. z ∈ C and all η ∈ C+, with q = q(z, η),
(i′) a.s. (respectively in probability) Γ_{A_n}(q) converges to Γ(q) as n → ∞;
(ii) a.s. (respectively in probability) log(·) is uniformly integrable for (ν_{A_n−zI})_{n≥1}.
Then there exists a probability measure µ ∈ P(C) such that
(j) a.s. (respectively in probability) µ_{A_n} ⇝ µ as n → ∞;
(jj′) in D′(C), µ is given by the formula above.
Note that, by lemma 4.19, assumption (i′) implies assumption (i) of lemma 4.3: the limit probability measure ν_z is characterized by m_{ν_z}(η) = a(q).
The potential interest of lemma 4.20 lies in the formula for µ: it avoids any use of the logarithmic potential.
Concentration. The quaternionic resolvent enjoys a simple concentration inequality, exactly as for the empirical singular values measure.

Lemma 4.21 (Concentration for the quaternionic resolvent). If A is a random matrix in M_n(C) with independent rows (or columns), then for any q = q(z, η) ∈ H+ and t ≥ 0, the stated concentration inequality holds.
Proof. Let M, N ∈ M_n(C) with bipartized matrices B, C ∈ M_{2n}(C). From the resolvent identity, for any q ∈ H+, the difference D of the resolvents can be expressed as displayed. It follows that D has rank r ≤ rank(B − C) = 2 rank(M − N). Also, recall that the operator norm of D is at most 2 Im(η)^{-1}. Hence, using the singular value decomposition of D and the Cauchy-Schwarz inequality, we obtain precisely (4.19). The remainder of the proof is now identical to the proof of lemma 4.18: we express Γ_A(q) − EΓ_A(q) as a sum of bounded martingale differences.
Computation for the circular law. As pointed out in [127], the circular law is easily found from the quaternionic resolvent. Indeed, using lemma 4.21 and the proof of corollary 4.10, we get, for all q ∈ H + , a.s.
In the proof of corollary 4.10, we have checked that for η = it, α(q) = ih(z, t) ∈ iR+, where h solves the equation recalled above. We deduce easily the expression of the limit, and a proof of the lemma follows by using the argument in the proof of lemma 4.3.

Using their replacement principle, Tao and Vu have proved in [150] that the universality of the limit spectral measures of random matrices goes far beyond the circular law. We state it here in a slightly stronger form than the original version; see [22].
Theorem 5.2 (Universality principle for shifted matrices). Let X and G be the random matrices considered in sections 3 and 4, obtained from infinite tables with i.i.d. entries. Consider a deterministic sequence (M_n)_{n≥1} such that M_n ∈ M_n(C) and, for some p > 0, lim sup_{n→∞} ∫ s^p dν_{M_n}(s) < ∞.

5.2. Related models.
We give a list of models related to the circular law theorem 2.2.
Sparsity. The circular law theorem 2.2 may remain valid if one allows the entries law to depend on n. This extension contains for instance sparse models in which the law has an atom at 0 with mass p n → 1 at a certain speed, see [66,146,156].
Outliers. The circular law theorem 2.2 allows the blow up of an arbitrary (asymptotically negligible) fraction of the extremal eigenvalues. Indeed, it was shown by Silverstein [137] that if E(|X_11|⁴) < ∞ and E(X_11) ≠ 0, then the spectral radius |λ_1(n^{-1/2}X)| tends to infinity at speed √n and has Gaussian fluctuations. This observation of Silverstein is the basis of [31]; see also the ideas of Andrew [8]. More recently, Tao studied in [143] the outliers produced by various types of perturbations, including general additive perturbations.
Sums and products. The scheme of proof of theorem 2.2 (based on Hermitization, the logarithmic potential, and uniform integrability) turns out to be quite robust. It allows one, for instance, to study the limit of the empirical distribution of the eigenvalues of sums and products of random matrices, see [22], and also [67] in relation with Fuss-Catalan laws. We may also mention [115]. The crucial step lies in the control of the small singular values.
Cauchy and the sphere. It is well known that the ratio of two independent standard real Gaussian variables is a Cauchy random variable, which has heavy tails. The complex analogue of this phenomenon leads to a complex Cauchy random variable, which is also the image law, by the stereographic projection, of the uniform law on the sphere. The matrix analogue consists in starting from two independent copies G_1 and G_2 of the Complex Ginibre Ensemble and considering the random matrix Y = G_1^{-1} G_2. The limit of µ_Y was analyzed by Forrester and Krishnapur [50]. Note that Y does not have i.i.d. entries.
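This spherical phenomenon is easy to observe numerically. The following minimal numpy sketch (matrix size, seed, and the tail test at t = 3 are arbitrary choices of ours) samples Y = G_1^{-1} G_2 and compares the empirical tail of |λ| with the tail 1/(1 + t²) of the rotation invariant law of density 1/(π(1 + |z|²)²), the stereographic image of the uniform law on the sphere.

    import numpy as np

    n = 500
    rng = np.random.default_rng(0)

    def ginibre(n, rng):
        # complex Ginibre matrix with entries of variance 1
        return (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

    G1, G2 = ginibre(n, rng), ginibre(n, rng)
    lam = np.linalg.eigvals(np.linalg.solve(G1, G2))   # spectrum of G1^{-1} G2
    # for the density 1/(pi (1+|z|^2)^2), P(|lambda| > t) = 1/(1 + t^2)
    print((np.abs(lam) > 3.0).mean(), 1 / (1 + 3.0**2))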
Random circulant matrices. The eigenvalues of a non-Hermitian circulant matrix are linear functionals of the matrix entries. Meckes [112] used this fact together with the central limit theorem in order to show that if the entries are i.i.d. with finite positive variance then the scaled empirical spectral distribution of the eigenvalues tends to a Gaussian law. We can imagine a heavy tailed version of this phenomenon with α-stable limiting laws.
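Since the eigenvalues of a circulant matrix are the discrete Fourier transform of its first row, no eigensolver is needed to observe this Gaussian limit. A small sketch along these lines (the sample size and the uniform entry law are our arbitrary choices):

    import numpy as np

    n = 4096
    rng = np.random.default_rng(1)
    c = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)   # i.i.d., mean 0, variance 1
    # eigenvalues of the circulant matrix with first row c are the DFT of c
    lam = np.fft.fft(c) / np.sqrt(n)                   # CLT scaling
    # each of Re and Im should be asymptotically Gaussian with variance 1/2
    print(np.var(lam.real), np.var(lam.imag))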
Single ring theorem. Let D ∈ M n (R + ) be a random diagonal matrix and U, V ∈ M n (C) be two independent Haar unitary matrices, independent of D. The law of X := U DV * is unitary invariant by construction, and ν X = µ D (it is a random SVD). Assume that µ D tends to some limiting law ν as n → ∞. It was conjectured by Feinberg and Zee [48] that µ X tends to a limiting law which is supported in a centered ring of the complex plane, i.e. a set of the form {z ∈ C : r ≤ |z| ≤ R}. Under some additional assumptions, this was proved by Guionnet, Krishnapur, and Zeitouni [70] by using the Hermitization technique and specific aspects such as the Schwinger-Dyson non-commutative integration by parts. Guionnet and Zeitouni have also obtained the convergence of the support in a more recent work [73]. The Complex Ginibre Ensemble is a special case of this unitary invariant model. Very recently, Khoruzhenko discovered a new and relatively simple model (quadratized rectangular Ginibre matrix) which gives rise to a single ring.
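A quick simulation displays the ring. The sketch below samples Haar unitaries via the QR factorization of a Ginibre matrix (with the usual phase correction) and takes ν = (δ_1 + δ_3)/2; the radius formulas r = (∫ x^{-2} dν(x))^{-1/2} ≈ 1.34 and R = (∫ x² dν(x))^{1/2} ≈ 2.24 are our recollection of [70], not a statement taken from the text above.

    import numpy as np

    def haar_unitary(n, rng):
        # QR of a complex Ginibre matrix, with phase fix, yields a Haar unitary
        Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
        Q, R = np.linalg.qr(Z)
        d = np.diag(R)
        return Q * (d / np.abs(d))

    n = 400
    rng = np.random.default_rng(2)
    U, V = haar_unitary(n, rng), haar_unitary(n, rng)
    D = np.diag(rng.choice([1.0, 3.0], size=n))        # mu_D -> (delta_1 + delta_3)/2
    lam = np.linalg.eigvals(U @ D @ V.conj().T)
    print(np.abs(lam).min(), np.abs(lam).max())        # ring radii: ~1.34 and ~2.24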
Large deviations and logarithmic potential with external field. The circular law theorem 3.5 for the Complex Ginibre Ensemble can be seen as a special case of the circular law theorem for unitary invariant random matrices with eigenvalue density proportional to the expression above, where V : C → R is a smooth potential growing fast enough at infinity. Taking the logarithm of this density, we discover an empirical version of the logarithmic energy functional E(·) defined in (4.8), penalized by the "external" potential V. Indeed, it has been shown by Hiai and Petz [119] (see also Ben Arous and Zeitouni [17]) that the Complex Ginibre Ensemble satisfies a large deviations principle at speed n² for the weak topology on the set of symmetric probability measures (with respect to conjugacy), with the good rate function given above. This rate function achieves its minimum 0 at the point µ = C_1. This is consistent with the fact that the circular law C_1 is the minimizer of the logarithmic energy among the probability measures on C with fixed variance, see the book of Saff and Totik [130]. Note that this large deviations principle gives an alternative proof of the circular law for the Ginibre Ensemble, thanks to the first Borel-Cantelli lemma.
Dependent entries. According to Girko, in relation to his "canonical equation K20", the circular law theorem 2.2 remains valid for random matrices with independent rows under some natural hypotheses [57]. A circular law theorem is available for random Markov matrices including the Dirichlet Markov Ensemble [23], and for random matrices with i.i.d. log-concave isotropic rows (footnote 22) [1]. Another Markovian model consists in a non-Hermitian random Markov generator with i.i.d. off-diagonal entries, which gives rise to a new limiting spectral distribution, possibly not rotationally invariant, which can be interpreted using free probability theory, see [24]. Yet another model, related to projections, in which each row has a zero sum, is studied in [143]. To end this tour, let us mention another kind of dependence which comes from the truncation of random matrices with dependent entries, such as Haar unitary matrices. Namely, let U be distributed according to the uniform law on the unitary group U_n (we say that U is Haar unitary). Dong, Jiang, and Li have shown in [38] that the empirical spectral distribution of the diagonal sub-matrix (U_ij)_{1≤i,j≤m} tends to the circular law if m/n → 0, while it tends to the arc law (the uniform law on the unit circle {z ∈ C : |z| = 1}) if m/n → 1. Other results of the same flavor can be found in [89].
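The Dong-Jiang-Li dichotomy is visible on small simulations. In the sketch below (sizes and seed are arbitrary; the √(n/m) rescaling in the m/n → 0 regime is our normalization guess, chosen so that the limiting support is the unit disc), we truncate one Haar unitary in both regimes:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1200
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    U = Q * (np.diag(R) / np.abs(np.diag(R)))          # Haar unitary matrix
    for m in (60, 1150):                               # m/n small vs m/n close to 1
        lam = np.linalg.eigvals(U[:m, :m])
        scale = np.sqrt(n / m) if m < n / 2 else 1.0
        print(m, np.quantile(np.abs(scale * lam), [0.5, 0.95]))
    # m/n -> 0: rescaled moduli fill [0, 1] (circular law);
    # m/n -> 1: moduli concentrate near 1 (arc law)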
Tridiagonal matrices. The limiting spectral distributions of random tridiagonal Hermitian matrices with i.i.d. entries are not universal and depend on the law of the entries; see [120] for an approach based on the method of moments. The non-Hermitian version of this model was studied by Goldsheid and Khoruzhenko [64] by using the logarithmic potential. Indeed, the tridiagonal structure produces a three-term recursion on characteristic polynomials which can be written as a product of random 2 × 2 matrices, leading to the use of a multiplicative ergodic theorem to show the convergence of the logarithmic potential (which appears as a Lyapunov exponent). In particular, neither the Hermitization nor the control of the smallest and small singular values is needed here. Indeed, the approach relies directly on remark 4.8. Despite this apparent simplicity, the structure of the limiting distributions may be incredibly complicated and mathematically mysterious, as shown in the Bernoulli case by the physicists Holz, Orland, and Zee [78].
5.3. Free probability interpretation. As we shall see, the circular law and its extensions have an interpretation in free probability theory, a sub-domain of operator algebra theory. Before going further, let us briefly recall certain classical notions of operator algebra. We refer to Voiculescu, Dykema and Nica [152] for a complete treatment of free non-commutative variables; see also the book by Anderson, Guionnet, and Zeitouni for the link with random matrices [7]. In the sequel, H is a Hilbert space and we consider a pair (M, τ) where M is an algebra of bounded operators on H, stable under the adjoint operation *, and where τ : M → C is a linear map such that τ(1) = 1 and τ(aa*) = τ(a*a) ≥ 0.
Definition of the Brown measure. For a ∈ M, define |a| = √(aa*). For b a self-adjoint element of M, we denote by µ_b the spectral measure of b: it is the unique probability measure on the real line satisfying, for every integer k ∈ N, τ(b^k) = ∫ x^k dµ_b(x). Also, if a ∈ M, we define ν_a = µ_{|a|}.
Then, in the spirit of (4.7), the Brown measure [30] of a ∈ M is the unique probability measure µ_a on C which satisfies, for almost all z ∈ C, ∫ log|z − λ| dµ_a(λ) = ∫ log(s) dν_{a−z}(s).
In distribution, it is given by the formula above (footnote 23). The fact that this definition indeed produces a probability measure requires a proof, which can be found in [76]. Our notation is consistent: first, if a is self-adjoint, then the Brown (spectral) measure coincides with the spectral measure. Secondly, if M = M_n(C) and τ := (1/n) Tr is the normalized trace on M_n(C), then we retrieve our usual definitions of ν_A and µ_A. It is interesting to point out that the identity (5.2), which is a consequence of the definition of the eigenvalues when M = M_n(C), serves as a definition in the general setting of von Neumann algebras.
Beyond bounded operators, and as explained in Brown [30] and in Haagerup and Schultz [76], it is possible to define, for a class M̄ ⊃ M of closed densely defined operators affiliated with M, a probability measure on C called the Brown spectral measure of a ∈ M̄.
Failure of the method of moments. For non-Hermitian matrices, the spectrum does not necessarily belong to the real line and, in general, the limiting spectral distribution is not supported in the real line. The problem here is that the moments are not enough to characterize laws on C. For instance, if Z is a complex random variable following the uniform law C_κ on the centered disc {z ∈ C : |z| ≤ κ} of radius κ, then for every integer r ≥ 1, E(Z^r) = 0, and thus C_κ is not characterized by its moments. Any rotationally invariant law on C with light tails shares with C_κ the same sequence of null moments. One can try to circumvent the problem by using "mixed moments", which uniquely determine µ by the Weierstrass theorem. Namely, for every A ∈ M_n(C), if A = U T U* is the Schur unitary triangularization of A, then for all integers r, r′ ≥ 0, with z = x + iy and τ = (1/n) Tr, the mixed moments ∫ z^r z̄^{r′} dµ_A(z) differ in general from τ(A^r (A*)^{r′}). Indeed, equality holds true when T = T*, i.e. when T is diagonal, i.e. when A is normal. This explains why the method of moments loses its strength for non-normal operators.
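The vanishing of all analytic moments of the uniform law on a disc is immediate by rotation invariance, and easy to see numerically. A small sketch (the sample size is arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    u, v = rng.random(10**6), rng.random(10**6)
    z = np.sqrt(u) * np.exp(2j * np.pi * v)     # uniform on the unit disc
    # all moments E(Z^r), r >= 1, vanish by rotation invariance...
    print([np.round((z**r).mean(), 3) for r in (1, 2, 3)])
    # ...but the mixed moment E(Z Zbar) = E|Z|^2 = 1/2 is informative
    print((z * z.conj()).mean().real)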
To circumvent the problem, one may think about using the notion of ⋆-moments. Note that if A is normal, then for every word A^{ε_1} · · · A^{ε_k}, where ε_1, . . . , ε_k ∈ {1, *}, we have τ(A^{ε_1} · · · A^{ε_k}) = τ(A^{k_1}(A*)^{k_2}), where k_1 and k_2 are the numbers of occurrences of A and A* respectively.
⋆-distribution. The ⋆-distribution of a ∈ M is the collection of all its ⋆-moments τ(a^{ε_1} · · · a^{ε_n}), where n ≥ 1 and ε_1, . . . , ε_n ∈ {1, *}. The element c ∈ M is circular when it has the ⋆-distribution of (s_1 + is_2)/√2, where s_1 and s_2 are free semi circular variables with spectral measure of Lebesgue density x ↦ (2π)^{-1} √(4 − x²) 1_{[−2,2]}(x). The ⋆-distribution of a ∈ M allows one to recover the moments of |a − z|² = (a − z)(a − z)* for all z ∈ C, and thus ν_{a−z} for all z ∈ C, and thus the Brown measure µ_a of a. Actually, for a random matrix, the ⋆-distribution contains, in addition to the spectral measure, information on the eigenvectors of the matrix.
We say that a sequence of matrices (A_n)_{n≥1}, where A_n takes its values in M_n(C), converges in ⋆-moments to a ∈ M if all ⋆-moments converge to the ⋆-moments of a ∈ M. For example, if G ∈ M_n(C) is our complex Ginibre matrix, then a.s., as n → ∞, n^{-1/2} G converges in ⋆-moments to a circular element.
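This convergence can be tested numerically: for a circular element c, the ⋆-moments τ((cc*)^k) are the Catalan numbers (the moments of the Marchenko-Pastur law of cc*), while words with unbalanced occurrences of c and c* have vanishing trace. A sketch (size and seed are our arbitrary choices):

    import numpy as np
    from math import comb

    n = 1500
    rng = np.random.default_rng(5)
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    a = G / np.sqrt(n)
    tau = lambda M: np.trace(M).real / n         # normalized trace
    aa = a @ a.conj().T
    for k in (1, 2, 3):
        catalan = comb(2 * k, k) // (k + 1)      # 1, 2, 5
        print(k, round(tau(np.linalg.matrix_power(aa, k)), 3), catalan)
    print(round(float(np.abs(np.trace(a @ a)) / n), 3))   # unbalanced word: ~ 0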
Discontinuity of the Brown measure. Due to the unboundedness of the logarithm, the Brown measure µ_a depends discontinuously on the ⋆-moments of a [20,138]. The limiting measures are perturbations by "balayage". A simple counter-example is given by the matrices of example 1.2. For random matrices, this discontinuity is circumvented in the Girko Hermitization by requiring a uniform integrability, which turns out to be a.s. satisfied by the random matrices n^{-1/2} X in the circular law theorem 2.2.
However, Śniady [138, Theorem 4.1] has shown that it is always possible to regularize the Brown measure by adding an additive noise. More precisely, if G is as above and (A_n)_{n≥1} is a sequence of matrices where A_n takes its values in M_n(C), and if the ⋆-moments of A_n converge to the ⋆-moments of a ∈ M as n → ∞, then a.s., as n → ∞, µ_{A_n + t n^{-1/2} G} converges to µ_{a+tc}, where c is a circular element free of a. In particular, by choosing a sequence t_n going to 0 sufficiently slowly, it is possible to regularize the Brown measure: a.s. µ_{A_n + t_n n^{-1/2} G} converges to µ_a. Note that the universality theorem 5.2 shows that the same result holds if we replace G by our matrix X. We refer to Ryan [129] and references therein for the analysis of the convergence in ⋆-moments. See also the forthcoming book of Tao [144]. The Śniady theorem was revisited recently by Guionnet, Wood, and Zeitouni [72].

Heavy tailed entries and new limiting spectral distributions
This section is devoted to the study of the analogues of the quarter circular and circular law theorems 2.1-2.2 when X_11 has an infinite variance (and thus heavy tails). The approach, taken from [26], involves many ingredients including the Hermitization of section 4. To lighten the notations, we often abridge A − zI into A − z for an operator or matrix A and a complex number z.

6.1. Heavy tailed analogs of quarter circular and circular laws. We now come back to an array X := (X_ij)_{1≤i,j≤n} of i.i.d. random variables on C. We lift the hypothesis that the entries have a finite second moment: we will assume that
• for some 0 < α < 2, lim_{t→∞} t^α P(|X_11| ≥ t) = 1; (6.1)
• as t → ∞, the conditional probability P(X_11/|X_11| ∈ · | |X_11| ≥ t) converges to a probability measure on the unit circle S¹ := {z ∈ C : |z| = 1}.
The law of the entries then belongs to the domain of attraction of an α-stable law. An example is obtained when |X_11| and X_11/|X_11| are independent with |X_11| = |S|, where S is a real symmetric α-stable random variable. Another example is given by X_11 = εW^{-1/α} with ε and W independent, such that ε is supported in S¹ while W is uniform on [0, 1]. Interest in this type of random matrices started with the work of the physicists Bouchaud and Cizeau [28]. One might think that the analog of the Ginibre Ensemble is a matrix with i.i.d. α-stable entries. It turns out that this random matrix ensemble is not unitary invariant, and there is no explicit expression for the distribution of its eigenvalues. This lack of comparison with a canonical ensemble makes the analysis of the limit spectral measures more delicate. We may first wonder what the analog of the quarter circular law theorem 2.1 is. This question has been settled by Belinschi, Dembo and Guionnet [15] (built upon the earlier work of Ben Arous and Guionnet [16]).

Theorem 6.1 (Singular values of heavy tailed random matrices). There exists a probability measure ν_α on R+ such that a.s. ν_{n^{-1/α}X} ⇝ ν_α as n → ∞.
This probability measure ν_α depends only on α. It does not have a known explicit closed form, but it has been studied in [16,25,15]. We know that ν_α has a bounded continuous density f_α on R+, which is analytic on some neighborhood of ∞. The explicit value of f_α(x) is only known for x = 0. But, more importantly, we have an explicit description of the tail: ν_α inherits the tail behavior of the entries. The measure ν_α is a perturbation of the quarter circular law: it can be proved that ν_α converges weakly to the quarter circular law as α converges to 2. Contrary to the finite variance case, the n^{-1/α} normalization cannot be understood from the computation of the second moment, since the latter diverges. A proof of the tightness of ν_{n^{-1/α}X} requires some extra care that we will explain later on. However, at a heuristic level, we may remark that if R_1, . . . , R_n denote the rows of n^{-1/α}X, then for each k, ‖R_k‖² = n^{-2/α} Σ_{i=1}^n |X_{ki}|² converges weakly to a non-negative α/2-stable random variable. Hence the n^{-1/α} normalization stabilizes the norm of each row of X.
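Sampling such a matrix is straightforward, since U^{-1/α}, with U uniform on [0, 1], has exactly the tail t^{-α} of (6.1). The sketch below (size, α, and the probed thresholds are arbitrary choices of ours) computes the singular values of n^{-1/α}X and probes the polynomial tail of ν_α, in contrast with the compact support of the quarter circular law:

    import numpy as np

    n, alpha = 1000, 1.5
    rng = np.random.default_rng(6)
    # P(|X_11| > t) = t^{-alpha} for t >= 1, symmetric signs: model (6.1)
    X = rng.random((n, n)) ** (-1 / alpha) * rng.choice([-1.0, 1.0], size=(n, n))
    s = np.linalg.svd(n ** (-1 / alpha) * X, compute_uv=False)
    for x in (2.0, 4.0, 8.0):
        # polynomial decay of the empirical tail, roughly a factor 2^alpha per step
        print(x, (s > x).mean())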
Following [26], we may also investigate the behavior of the eigenvalues of X. Here is the analogue of the circular law theorem 2.2 for our heavy tailed entries matrix model.

Theorem 6.2 (Eigenvalues of heavy tailed random matrices). There exists a probability measure µ_α on C such that, in probability, µ_{n^{-1/α}X} ⇝ µ_α as n → ∞. Moreover, if X_11 has a bounded density, then the convergence is almost sure.
We believe that theorem 6.2 can be upgraded to an a.s. weak convergence, but our method does not catch this due to slow "in probability" controls on small singular values.
Again, the measure µ_α depends only on α and is not known explicitly. However, it is isotropic and has a bounded continuous density with respect to the Lebesgue measure dxdy on C: dµ_α(z) = g_α(|z|) dxdy. The value of g_α(r) is explicit for r = 0. As r → ∞, g_α(r) is, up to a multiplicative constant, exponentially small in r. This exponential decay is quite surprising and contrasts with the power tail behavior of f_α. It indicates that X is typically far from being a normal matrix. Also, we see that the eigenvalues limit spectrum is more concentrated than the singular values limit spectrum. In fact, in the finite variance case, the phenomenon is already present: the quarter circular law has support [0, 2], while the circular law has support the unit disc. Again, the measure µ_α is a perturbation of the circular law: µ_α converges weakly to the circular law as α converges to 2. The proof of theorem 6.2 will follow the general strategy of Girko's Hermitization. Lemma 4.3 gives a characterization of the limit measure in terms of its logarithmic potential. Here, it turns out to be not so convenient for analyzing the measure µ_α. We will rather use the quaternionic version of Girko's Hermitization, i.e. lemma 4.20. For statement (i′) in lemma 4.20, we will prove a generalized version of theorem 6.1.

Theorem 6.3 (Singular values of heavy tailed random matrices). For all z ∈ C there exists a probability measure ν_{α,z} on R+ such that a.s. ν_{n^{-1/α}X−z} ⇝ ν_{α,z} as n → ∞. Moreover, with the notations used in lemma 4.20, for all q = q(z, η) ∈ H+, there exists Γ(q) ∈ H+ such that a.s. Γ_{n^{-1/α}X}(q) converges to Γ(q) and Γ(q)_{11} = m_{ν_{α,z}}(η).
Objective method: sparse random graphs and trees. The strategy for proving theorem 6.3, borrowed from [26], will differ significantly from the proof of theorem 2.1. We will prove that n^{-1/α}X converges, in some sense, as n → ∞, to a limit random operator A defined on the Hilbert space ℓ²(N). This will be done by using the "objective method" initially developed by Aldous and Steele in the context of randomized combinatorial optimization, see [6]. We build an explicit operator on Aldous' Poisson Weighted Infinite Tree (PWIT) and prove that it is the local limit of the matrices n^{-1/α}X in an appropriate sense. While Poisson statistics arise naturally, as in all heavy tailed phenomena, the fact that a tree structure appears in the limit is roughly explained by the observation that the non-vanishing entries of the rescaled matrix n^{-1/α}X can be viewed as the adjacency matrix of a sparse random graph which locally looks like a tree. In particular, the convergence to the PWIT is a weighted-graph version of familiar results on the local tree structure of Erdős-Rényi random graphs.
Free probability. We note finally that it is possible to associate to the PWIT a natural operator algebra M with a tracial state τ . Then for some operator a affiliated to M, the probability measure µ α is equal to the Brown measure µ a of a, and ν α = µ |a| = ν a is the singular value measure of a. See the work of Aldous and Lyons [5, 106, Example 9.7 and Sub-Section 5], and the recent work of Male [107].
In summary, it is sufficient to prove that for some p > 0, a.s.
lim sup_{n→∞} ∫ s^p dν_{n^{-1/α}X}(s) < ∞, (6.3)
and (6.2) will follow. We shall use the following Schatten bound: for all 0 < p ≤ 2 and every A ∈ M_n(C),
∫ s^p dν_A(s) ≤ (1/n) Σ_{i=1}^n ‖R_i‖^p,
where R_1, . . . , R_n are the rows of A and ‖·‖ is the Euclidean norm (for a proof, see Zhan [158, proof of Theorem 3.32]). The above inequality is an equality if p = 2 (for p > 2, the inequality is reversed). For our matrix A = n^{-1/α}X, the right-hand side is a sum of i.i.d. variables, and from (6.1), Y_{k,n} = n^{-2/α} Σ_{i=1}^n |X_{ki}|² is in the domain of attraction of a non-negative α/2-stable law. The strategy of proof of (6.3) is now clear: we may expect, and it is possible to prove, that for q small enough, lim sup_{n→∞} E Y_{k,n}^{4q} < ∞.
Then, the classical proof of the strong law of large numbers for independent random variables bounded in L 4 implies (6.3).
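The Schatten bound, as reconstructed above, is a majorization inequality (the squared row norms are majorized by the squared singular values, and t ↦ t^{p/2} is concave), and it can be checked numerically:

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((50, 50))
    s = np.linalg.svd(A, compute_uv=False)
    rows = np.linalg.norm(A, axis=1)                 # Euclidean norms of the rows
    for p in (0.5, 1.0, 2.0):
        lhs, rhs = (s**p).mean(), (rows**p).mean()
        print(p, lhs <= rhs + 1e-12, lhs, rhs)       # equality at p = 2 (Frobenius)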
Uniform integrability. We will prove statement (ii) of lemma 4.20 in probability. Fix z ∈ C. Using (6.2), we shall prove the uniform integrability, in probability, of min(log, 0) for (ν_{n^{-1/α}X−z})_{n≥1}. From Markov's inequality, it is sufficient to prove the bound (6.4) below for some c > 0. Arguing as in the finite variance case, the latter will in turn follow from two lemmas, lemmas 6.4 and 6.5 below. The second one asserts that the i-th smallest singular value of the random matrix n^{-1/α}X + M is at least of order (i/n)^{2α/(α+2)} in a weak sense. This is not optimal, but it is enough.

Lemma 6.5 (Count of small singular values). There exist 0 < γ < 1 and c_0 > 0 such that for all M ∈ M_n(C), there exists an event F_n with lim_{n→∞} P(F_n) = 1 such that, on F_n, the stated lower bound holds for all n^{1−γ} ≤ i ≤ n − 1 and n ≫ 1.
Let us first check that these two lemmas imply (6.4) (and thus statement (ii) of lemma 4.20). Let us define the event E_n := F_n ∩ {s_n(n^{-1/α}X − z) ≥ n^{-b}}. Since the event E_n has probability tending to 1, the proof of (6.4) follows from the corresponding bound on E_n. For simplicity, we write s_i instead of s_i(n^{-1/α}X − zI). Since s_n ≥ n^{-b} has probability tending to 1, by lemma 6.5, for all n^{1−γ} ≤ i ≤ n − 1, the singular value s_i is bounded below as stated there. Then, for 0 < p ≤ 2, using Jensen's inequality, we obtain the displayed estimate.
In this last expression we discover a Riemann sum. It is uniformly bounded if p < γ/b and p < 2α/(α + 2). The uniform bound (6.4) follows.
Proof of lemma 6.4. The probability that s_1(X) ≥ n^{1+p} is upper bounded by the probability that one of the entries of X is larger than n^p. From Markov's inequality and the union bound, for p large enough, this event has probability at most 1/n. In particular, s_1(X + M) ≤ s_1(X) + s_1(M) is at most 2n^q for q = max(p, d), with probability at least 1 − 1/n. The statement is then a corollary of lemma A.1. Note: a simplified proof in the bounded density case may be obtained by adapting the proof of lemma 4.12 (see [26]).
Sketch of proof of lemma 6.5. We now comment on the proof of lemma 6.5; the detailed argument is quite technical and is omitted here. It can be found in extenso in [26]. First, as in the finite variance case, the proof reduces to deriving a good lower bound on dist(X_1, W), where X_1 is the first row of X, W is a vector space of co-dimension n − d ≥ n^{1−γ} (in R^n or C^n), and P is the orthogonal projection on the orthogonal complement of W. However, in the finite variance case, dist²(X_1, W) concentrates sharply around its average n − d. Here the situation is quite different: for instance, if W = vect(e_{n−d+1}, . . . , e_n), then (n − d)^{−2/α} dist²(X_1, W) is close in distribution to a non-negative α/2-stable random variable, say S.
On the other hand, if U is an n × n unitary matrix uniformly distributed on the unitary group (normalized Haar measure), and if W is the span of its last d row vectors, then it can be argued that dist²(X_1, W) is close in distribution to c(n − d) n^{2/α − 1} S. Hence, contrary to the finite variance case, the order of magnitude of the distance of X_1 to the vector space W depends on the geometry of W with respect to the coordinate basis. We have proved lower bounds on this distance which are universal in W. More precisely, for any 0 < γ < α/4 there exists c_1 > 0 such that, for some event G_n with P(G_n^c) ≤ c_1 n^{−(1−2γ)/α}, the stated lower bound holds. The above holds for n − d ≥ n^{1−γ}. We have crucially used the fact that for all p > 0, ES^{−p} is finite, i.e. the non-negative α/2-stable law is flat in the neighborhood of 0.
Note: the result implies that the vector space W = vect(e n−d+1 , . . . , e n ) reaches the worst possible order of magnitude. Unfortunately, the upper bound on the probability of the event G c n is not good enough, and we also have to define the proper event F n given in lemma 6.5. This event F n satisfies P(F c n ) ≤ c exp(−n δ ) for some δ > 0 and c > 0.
6.3. The objective method and the Poisson Weighted Infinite Tree (PWIT).
Local convergence. We now describe our strategy to obtain the convergence of EΓ_{n^{-1/α}X}. It is an instance of the objective method: we prove that our sequence of random matrices converges locally to a limit random operator. To do this, we first notice that an n × n complex matrix M can be identified with a bounded operator on ℓ²(N). With an abuse of notation, and without further notice, we will identify our matrices with their associated bounded operators on ℓ²(N). The precise notion of convergence that we will use is the following (definition 6.6): (A_n, u) converges locally to (A, v) if there exists a sequence of bijections σ_n : N → N, mapping v to u, such that σ_n^{-1} A_n σ_n φ converges to Aφ for all φ ∈ D(N). With a slight abuse of notation, we have used the same symbol σ_n for the linear isometry σ_n : ℓ²(N) → ℓ²(N) induced in the obvious way. Note that the local convergence is the standard strong convergence of the operator σ_n^{-1} A_n σ_n to A, where the re-indexing of N preserves a distinguished element. It is a local convergence in the following sense: if P(x, y) is a non-commutative polynomial, then the definition implies the convergence of the matrix elements of P(A_n, A_n*) at the distinguished element to those of P(A, A*). We shall apply this definition to random operators A_n and A on ℓ²(N): to be precise, in this case we say that (A_n, u) → (A, v) in distribution if there exists a random bijection σ_n as in definition 6.6 such that σ_n^{-1} A_n σ_n φ converges in distribution to Aφ for all φ ∈ D(N), where a random vector ψ_n ∈ ℓ²(N) converges in distribution to ψ if lim_{n→∞} Ef(ψ_n) = Ef(ψ) for all bounded continuous functions f : ℓ²(N) → R. Finally, we may without harm replace N by an infinite countable set V. All definitions carry over by considering any bijection from N to V: namely ℓ²(V), for v ∈ V the unit vector e_v, D(V), and so on.
The Poisson Weighted Infinite Tree (PWIT). We now define our limit operator on an infinite rooted tree with random edge-weights, the Poisson weighted infinite tree (PWIT) introduced by Aldous [4], see also [6].
The PWIT is the random weighted rooted tree defined as follows. The vertex set of the tree is identified with N^f := ∪_{k≥0} N^k by indexing the root as N^0 = {ø}, the offspring of the root as N and, more generally, the offspring of some v ∈ N^k as (v1), (v2), . . . ∈ N^{k+1} (for short, we write (v1) in place of (v, 1)). In this way, the set of v ∈ N^k identifies the k-th generation. We then define T as the tree on N^f with (non-oriented) edges between the offspring and their parents (see figure 6).
We may define a random operator A on D(N^f) by the formula displayed above for all v ∈ N^f \ {ø}, where a(v) denotes the ancestor of v. This defines a proper operator on D(N^f). Indeed, since {y_{v1}, y_{v2}, . . .} is a homogeneous Poisson point process of intensity 2 on R+, we have a.s. lim_{k→∞} y_{vk}/k = 2. We thus find that ‖Ae_v‖² is a.s. finite, and similarly for ‖Ae_ø‖².
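The PWIT just described is straightforward to sample down to any finite depth, since the points of a homogeneous Poisson process on R+ are the cumulative sums of i.i.d. exponential random variables. Below is a minimal sketch; the truncation to m children per vertex, the unit rate of the exponentials, and the omission of the phases/signs carried by the entries are simplifications of ours:

    import numpy as np

    alpha, m, depth = 1.5, 3, 2          # illustration parameters (arbitrary)
    rng = np.random.default_rng(8)

    def children(v):
        # y_v1 < y_v2 < ... : points of a Poisson point process on R_+,
        # realized as cumulative sums of i.i.d. exponentials
        y = np.cumsum(rng.exponential(size=m))
        # weight on the edge from v to its k-th child: y_vk^{-1/alpha}
        return [(v + (k + 1,), y[k] ** (-1 / alpha)) for k in range(m)]

    weights = {}                         # edge weights, indexed by the child vertex
    frontier = [()]                      # the root ø is the empty tuple
    for _ in range(depth):
        nxt = []
        for v in frontier:
            for child, w in children(v):
                weights[child] = w
                nxt.append(child)
        frontier = nxt
    print(len(weights), list(weights.items())[:3])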
Sketch of proof. We start with some intuition behind theorem 6.7. The presence of Poisson point processes is an instance of the Poisson behavior of extreme order statistics. If V_{11} ≥ V_{12} ≥ · · · ≥ V_{1n} are the order statistics of the vector (|X_11|, . . . , |X_1n|), then it is well known that the random variable n^{-1/α}(V_{11}, V_{12}, . . . , V_{1n}, 0, 0, . . .), in the space of non-increasing infinite sequences, converges weakly, for the finite dimensional convergence, to (x_1^{-1/α}, x_2^{-1/α}, . . .), where x_1 ≤ x_2 ≤ . . . are the points of a homogeneous Poisson point process of intensity 1 on R+. As observed by LePage, Woodroofe and Zinn [101], this fact follows easily from a beautiful representation for the order statistics of i.i.d. random variables. Namely, if G(u) = P(|X_11| > u) is (one minus) the distribution function of |X_11|, then (V_{11}, . . . , V_{1n}) has the law of (G^{-1}(x_1/x_{n+1}), . . . , G^{-1}(x_n/x_{n+1})), where G^{-1}(u) = inf{y > 0 : G(y) ≤ u}, u ∈ (0, 1). To obtain the convergence to (6.6), it remains to notice that G^{-1}(u) ∼ u^{-1/α} as u → 0, and x_n ∼ n a.s. as n → ∞.
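The representation is easy to exercise numerically; the following sketch checks that, after the n^{-1/α} scaling, the top order statistics of the Pareto-type model G(u) = u^{-α} (u ≥ 1) sit close to the points x_k^{-1/α}:

    import numpy as np

    n, alpha = 10**5, 1.5
    rng = np.random.default_rng(9)
    x = np.cumsum(rng.exponential(size=n + 1))      # Poisson points x_1 < x_2 < ...
    # LePage-Woodroofe-Zinn: order statistics as G^{-1}(x_k / x_{n+1});
    # here G(u) = u^{-alpha} on [1, infinity), so G^{-1}(t) = t^{-1/alpha}
    ordered = (x[:n] / x[n]) ** (-1 / alpha)        # V_11 >= V_12 >= ..., in law
    print(n ** (-1 / alpha) * ordered[:3])          # ~ x_k^{-1/alpha}, k = 1, 2, 3
    print(x[:3] ** (-1 / alpha))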
More generally, we may reorder non-increasingly the vector ((X_11, X_11), (X_12, X_21), . . . , (X_1n, X_n1)) and find a permutation π ∈ S_n realizing this ordering. Then, the random variable (in the space of infinite sequences in C² of non-increasing norm) n^{-1/α}((X_{1π(1)}, X_{π(1)1}), (X_{1π(2)}, X_{π(2)1}), . . . , (X_{1π(n)}, X_{π(n)1}), (0, 0), . . .) converges weakly, for the finite dimensional convergence, to a limit expressed in terms of the Poisson process above. In particular, we may define a bijection σ_n such that σ_n(ø) = 1, σ_n(k) = π(k) for k ≠ π^{-1}(1), and σ_n arbitrary otherwise. Then, for this sequence σ_n, we may check that n^{-1/α} σ_n^{-1} X σ_n e_ø converges weakly to Ae_ø in ℓ²(N^f). This is not good enough, since we aim at the convergence for all φ ∈ D(N^f), not only for e_ø. In particular, the above argument does not explain the presence of a tree in the limit operator. Note however that, from what precedes, only the entries such that |X_ij| ≥ δn^{1/α} will matter for the operator convergence (for some small δ > 0). By assumption, this happens with probability equivalent to c/n, where c = c(δ) ∼ δ^{-α}. In other words, if we define G as the oriented graph on {1, . . . , n} such that the oriented edge (i, j) is present if |X_ij| ≥ δn^{1/α}, then G is an oriented Erdős-Rényi graph (each oriented edge is present independently of the others and with equal probability). An elementary computation shows that the expected number of oriented cycles in G containing 1 and of length k is equivalent to c^k/n. This implies that there are no short cycles in G around a typical vertex. At a heuristic level, this locally tree-like structure of random graphs explains the presence of the infinite tree T in the limit.
We are not going to give the full proof of theorem 6.7; for details, we refer to [25,26]. The strategy is as follows. For an integer m, define J_m = ∪_{k=0}^m {1, · · · , m}^k ⊂ N^f and consider the matrix A_{|m} obtained as the projection of the random operator A on J_m. We prove that for every integer m there exists an injection π_m from J_m to {1, . . . , n} such that π_m(ø) = 1 and the projection of n^{-1/α}X on π_m(J_m) converges weakly to A_{|m}. The conclusion of theorem 6.7 follows by extracting a sequence m_n → ∞ such that the latter holds.
To construct such an injection π_m, we explore the entries of X: we first consider the m largest entries of the vector ((X_12, X_21), . . . , (X_1n, X_n1)) in (C²)^{n−1}, whose indices are denoted by i_1, . . . , i_m. We then look at the m largest entries of ((X_{i_1 j}, X_{j i_1}))_{j ∉ {1, i_1, . . . , i_m}}, whose indices are i_{1,1}, . . . , i_{1,m}. We repeat this procedure iteratively until we have discovered |J_m| indices, and we define the injection π_m by π_m(v) = i_v. The fact that the restriction of n^{-1/α}X to (i_v)_{v∈J_m} converges weakly to A_{|m} can be proved by developing the ideas presented above.
Continuity of the quaternionic resolvent for local convergence. Note that theorem 6.7 has a potential interest for us only if we know how to link the local convergence of definition 6.6 to the convergence of the quaternionic resolvent introduced in sub-section 4.6.
Recall that an operator B on a dense domain D(B) is Hermitian if for all x, y ∈ D(B), ⟨x, By⟩ = ⟨Bx, y⟩. This operator is essentially self-adjoint if there is a unique self-adjoint operator B_1 on D(B_1) ⊃ D(B) such that for all x ∈ D(B), B_1x = Bx (i.e. B_1 is an extension of B).

Lemma 6.8 (From local convergence to resolvents). Assume that (A_n) and A satisfy the conditions of definition 6.6 and (A_n, u) → (A, v) for some u, v ∈ N. If the bipartized operator B of A is essentially self-adjoint, then for all q ∈ H+ the resolvents converge accordingly.

Proof. Fix z ∈ C and let B_n(z) = B_n − q(z, 0) ⊗ I, where B_n is the bipartized operator of A_n. By construction, for all φ ∈ D(B) = D(N × Z/2Z), σ_n^{-1} B_n(z) σ_n φ converges to B(z)φ (this is the strong operator convergence). The proof is then a direct consequence of [121, Theorem VIII.25(a)]: in this framework, strong operator convergence implies strong resolvent convergence. Namely, for all φ, ψ ∈ D(B) and η ∈ C+, ⟨φ, (σ_n^{-1} B_n(z) σ_n − ηI)^{-1} ψ⟩ → ⟨φ, (B(z) − ηI)^{-1} ψ⟩. We conclude by applying this to φ, ψ ∈ {e_v, ê_v}.

Remark 6.9 (A non-self-adjoint Hermitian operator). A key assumption in the above lemma is the essential self-adjointness of the bipartized limit operator. A local limit of Hermitian matrices will necessarily be Hermitian. It may not, however, always be the case that the limit is essentially self-adjoint. Since any bounded Hermitian operator is essentially self-adjoint, for an example we must look at unbounded operators. Let (a_k)_{k∈N} be a sequence on R+ and define on D(N), for k ≥ 2, Be_k = a_k e_{k+1} + a_{k−1} e_{k−1},
while Be_1 = a_1 e_2. In matrix form, B is a tridiagonal symmetric infinite matrix. The work of Stieltjes [141] implies that B is essentially self-adjoint if and only if Σ_{k≥1} a_k^{-1} = ∞.
6.4. Skeleton of the main proofs. All ingredients have finally been gathered. The skeleton of the proof of theorems 6.2, 6.3 and of the characterization of µ_α and ν_{α,z} is as follows:
(1) By lemma 4.21, for all q ∈ H+, a.s. Γ_{n^{-1/α}X}(q) − EΓ_{n^{-1/α}X}(q) converges to 0 in norm.
(2) Since X has exchangeable rows, for all q ∈ H+, EΓ_{n^{-1/α}X}(q) = E R_{n^{-1/α}X}(q)_{11}.
(3) We prove in sub-section 6.5 that the bipartized operator B of the random operator A of sub-section 6.3 is a.s. essentially self-adjoint.
(4) It follows, by theorem 6.7 and lemma 6.8, that E R_{n^{-1/α}X}(q)_{11} converges to E R_A(q)_{øø}.
(5) By lemma 4.19, a.s. ν_{n^{-1/α}X−z} ⇝ ν_{α,z} as n → ∞, where ν_{α,z} is characterized by m_{ν_{α,z}}(η) = a(q).
(6) We know from sub-section 6.2 that statement (ii) of lemma 4.20 holds for n^{-1/α}X in probability. Then, in probability, µ_{n^{-1/α}X} ⇝ µ_α as n → ∞, where µ_α is characterized, in D′(C), by the formula of lemma 4.20.
(7) We analyze in sub-section 6.5 the resolvent R_A(q)_{øø} to obtain the properties of ν_{α,z} and µ_α.
(8) Finally, when X_12 has a bounded density, we improve the convergence to an almost sure one (in sub-section 6.6).
6.5. Analysis of the limit operator. This sub-section is devoted to items 3 and 7 which appear above in the skeleton of proof of theorems 6.2 and 6.3.

Self-adjointness.
Here we check the self-adjointness of the bipartized operator B of A.
Proposition 6.10 (Self-adjointness of bipartized operator on PWIT). Let A be the random operator defined by (6.5). With probability one, B is essentially self-adjoint.
This proposition relies on the following criterion of self-adjointness (see [26] for a proof).
Lemma 6.11 (A criterion of essential self-adjointness). Define the operator A on D(V) as displayed above. Assume also that there exists a sequence of connected finite subsets (S_n)_{n≥1} of V such that S_n ⊂ S_{n+1}, ∪_n S_n = V, and, for every n and every v ∈ S_n, a uniform control holds over the boundary set {u ∉ S_n : {u, v} ∈ E}. Then the bipartized operator B of A is essentially self-adjoint.
We will use a simple lemma on Poisson processes (for a proof, see [25, Lemma A.4]).
Proof of proposition 6.10. For κ > 0 and v ∈ N^f, we define τ_v as above. The variables (τ_v) are i.i.d. and, by lemma 6.12, there exists κ > 0 such that Eτ_v < 1. We fix such a κ. Now, we put a green color on all vertices v such that τ_v ≥ 1 and a red color otherwise. We consider an exploration procedure starting from the root which stops at red vertices and goes on at green vertices. More formally, define the sub-forest T^g of T where we put an edge between v and vk if v is a green vertex and 1 ≤ k ≤ τ_v. Then, if the root ø is red, we set S_1 = {ø}. Otherwise, the root is green, and we consider T^g_ø = (V^g_ø, E^g_ø), the subtree of T^g that contains the root. It is a Galton-Watson tree whose offspring distribution is that of τ_ø. Thanks to our choice of κ, T^g_ø is almost surely finite. Consider L^g_ø, the leaves of this tree (i.e. the set of vertices v in V^g_ø such that for all 1 ≤ k ≤ τ_v, vk is red). The set defined above satisfies the condition of lemma 6.11. We define the outer boundary of {ø}, and of a general connected set S, as above. Now, for each vertex u_1, . . . , u_k ∈ ∂_τ S_1, we repeat the above procedure for the rooted subtrees T_{u_1}, . . . , T_{u_k}. Iteratively, we may thus almost surely define an increasing connected sequence (S_n) of vertices with the properties required for lemma 6.11.
Computation of the resolvent. As explained in sub-section 6.4, the properties of the measures µ_α and ν_{α,z} can be deduced from the analysis of the limit resolvent operator. Resolvents are notoriously easy to compute on trees. More precisely, let T = (V, E) be a tree, let A, B be as in lemma 6.11, and let ø ∈ V be a distinguished vertex of V (in graph language, we root the tree T at ø). For each v ∈ V \ {ø}, we define V_v ⊂ V as the set of vertices whose unique path to the root ø contains v. We define T_v = (V_v, E_v) as the subtree of T spanned by V_v. We may consider A_v, the projection of A on V_v, and B_v, the bipartized operator of A_v. Finally, we note that if B is self-adjoint then so is B_v(z) for every z ∈ C. The next lemma is an operator analog of the Schur block inversion formula (4.12).

Lemma 6.13 (Resolvent on a tree). Let A, B be as in lemma 6.11. If B is self-adjoint, then for any q = q(z, η) ∈ H+, the recursive formula displayed above holds.

We come back to our random operator A defined on the PWIT and its quaternionic resolvent R_A(q). We analyze the random variable a(z, η) defined via the resolvent at the root, R_A(q)_{øø}. The random variable a(z, η) solves a nice recursive distribution equation (RDE). This type of recursion is typical of combinatorial observables defined on random rooted trees. More precisely, we define the measure on R+ displayed above.

Lemma 6.14 (Recursive distribution equation). For all q = q(z, η) ∈ H+, if L_q is the distribution on C+ of a(z, η), then L_q solves the stated equation in distribution.

Proof. This is a simple consequence of lemma 6.13. Indeed, for k ∈ N, we define T_k as the subtree of T spanned by kN^f. With the notation of lemma 6.13, for k ∈ N, we consider the corresponding resolvent entries; then, by lemma 6.13, we get the recursion. Now the structure of the PWIT implies that (j) a_k and c_k have common distribution L_q, and (jj) the variables (a_k, c_k)_{k∈N} are i.i.d. Also, the thinning property of Poisson point processes implies the stated identification of the driving point process. Even if (6.8) looks complicated at first sight, for η = it it is possible to solve it explicitly. First, for t ∈ R+, a(z, it) is purely imaginary and we set h(z, t) = Im(a(z, it)) = −i a(z, it) ∈ (0, t^{-1}].
The crucial ingredient is a well-known and beautiful lemma (lemma 6.15 below). It can be derived from a representation of stable laws, see e.g. LePage, Woodroofe, and Zinn [101] and also Panchenko; here S is the positive α/2-stable random variable with the Laplace transform (6.10).

Proof of lemma 6.15. Recall the classical formulas displayed above, for y ≥ 0, η > 0 and 0 < η < 1 respectively. From the Lévy-Khinchin formula we deduce the stated identity for s ≥ 0.

Hence, by lemma 6.15, we may rewrite (6.8) as displayed, where S and S′ are i.i.d. variables with common Laplace transform (6.10), and the function y = y(|z|², t) = E[h^{α/2}]^{2/α} is the unique solution of the stated equation in y (since the left-hand side is decreasing in y, the solution is unique). In the above equations, it is also possible to consider the limits as t ↓ 0.
As explained in section 6.4, this implies that, in D ′ (C), µ α is equal to Eb(·, it).
Using (6.9), after a simple computation, we find the expression of the density g_α of µ_α at z. After more computations, it is even possible to study the regularity of y*, find the explicit solution at 0, and obtain an asymptotic equivalent as r → ∞. All these results can then be translated into properties of µ_α. We will not pursue these computations here; they are done in [26]. We may simply point out that the weak convergence of µ_α to the circular law as α → 2 is a consequence of the fact that the non-negative α/2-stable random variable S/Γ(1 − α/2)^{2/α} converges to a Dirac mass as α → 2 (see (6.10)).

6.6. Improvement to almost sure convergence. Let ν_{α,z} be as in theorem 6.3. In order to improve the convergence to an a.s. one, it is sufficient to prove that for all z ∈ C, a.s.
We have already proved that this convergence holds in probability. It is thus sufficient to prove that there exists a deterministic sequence L n such that a.s.
Now, thanks to the bounded density assumption and remark 4.17, one may use lemma 4.12 for the matrix X − n^{1/α}zI in order to show that there exists a number b > 0 such that a.s., for n ≫ 1, the smallest singular value is at least n^{-b}. Similarly, up to an increase of b if needed, we also get from (6.2) a companion bound, a.s. for n ≫ 1.

Now, we consider the function
From what precedes, a.s. for n ≫ 1, (6.14) holds. The total variation of f_n is bounded by c log n for some c > 0. Hence, by lemma 4.18, choosing L_n as the corresponding expectation, we obtain a deviation bound which is summable in n. In particular, from the first Borel-Cantelli lemma, a.s., lim_{n→∞} ( ∫ f_n(s) dν_{n^{-1/α}X−zI}(s) − L_n ) = 0.

Open problems
We list in this section some open problems related to the circular law theorem.
Universality of Gaussian Ensembles. The universality dogma states that if a real or complex functional of X is sufficiently symmetric and depends on sufficiently many entries, then it is likely that this functional behaves asymptotically (n → ∞) as in the Gaussian case (the Ginibre Ensemble here), as soon as the first moments of X_11 match certain Gaussian moments (depending on the functional). This can be understood as a sort of non-linear central limit theorem. Among interesting functionals, we find for instance the following:
• the spectral radius (which has Gumbel fluctuations for the Complex Ginibre Ensemble);
• the argument of λ_1(X) (which is uniform on [0, 2π] for the Complex Ginibre Ensemble);
• the law of λ_n(n^{-1/2}X) (see [49, Chapter 15] for the Complex Ginibre Ensemble); the square of the smallest singular value s_n(n^{-1/2}G)² of the Complex Ginibre Ensemble follows an exponential law [40], and this result is asymptotically universal [149];
• gap probabilities and Voronoï cells (see [3] and [62] for the Ginibre Ensemble);
• linear statistics of µ_X (some results are available, such as [124,125,27,123]);
• the empirical distribution of the real eigenvalues of n^{-1/2}X when X_11 is real (which tends to the uniform law on [−1, 1] for the Real Ginibre Ensemble);
• the unitary matrix in the polar decomposition (Haar unitary for the Complex Ginibre Ensemble); this is connected to the R-diagonal concept in free probability theory [75];
• if X_11 has an infinite fourth moment, then the eigenvalues of largest modulus blow up and are asymptotically independent (Poisson statistics at some scale);
• a large deviations principle for µ_X at speed n², which includes as a special case the one obtained for the Complex Ginibre Ensemble by Hiai and Petz [119] (see also Ben Arous and Zeitouni [17] and references therein); the analogous question for Hermitian models (Wigner and GUE) is also open, and the answer depends on the chosen scale, the class of deviations, and the topology.
One may group most of these functionals by considering the spectrum as a point process.
It is also possible to consider universality beyond i.i.d. entries models. For instance, if X has exchangeable entries as a random vector of C^{n²} and if X satisfies suitable mean-variance normalizations, then we expect that Eµ_X tends to the circular law, due to a Lindeberg type phenomenon; see [34] for the Hermitian case (Wigner). Similarly, if X, as a random vector of C^{n²}, is log-concave (see footnote 22) and isotropic (i.e. its covariance matrix is the identity), then we expect that Eµ_X tends to the circular law; see [1] for i.i.d. log-concave rows. Since the indicator of a convex set is a log-concave measure, one may think about the Birkhoff polytope (the convex hull of permutation matrices) and ask if the circular law holds for random uniform doubly stochastic matrices, see [32] and [35].
Variance profile. We may consider the matrix Y defined by Y_ij = X_ij σ(i/n, j/n), where σ : [0, 1]² → [0, 1] is a measurable function. The measure µ_{n^{-1/2}Y} should converge a.s. to a limit probability measure µ_σ on C. For finite variance Hermitian matrices, this question has been settled by Khorunzhy, Khoruzhenko, Pastur and Shcherbina [95], and for heavy tailed Hermitian matrices by Belinschi, Dembo, and Guionnet [15]. Girko also has results on the singular values of random matrices with a variance profile.
Oriented r-regular graphs and Kesten-McKay measure. Random oriented graphs are the host of many open problems. For example, for integers n ≥ r ≥ 3, an oriented r-regular graph is a graph on n vertices such that every vertex has r incoming and r outgoing oriented edges. Consider the adjacency matrix A of a random oriented r-regular graph sampled from the uniform measure (there exist suitable simulation algorithms using matchings of half-edges). It is conjectured that as n → ∞, a.s. µ_A converges to the probability measure with density z ↦ (1/π) r²(r − 1)/(r² − |z|²)² on {z ∈ C : |z| ≤ √r}. It turns out that this probability measure is also the Brown measure of the free sum of r unitaries, see Haagerup and Larsen [75]. The Hermitian (actually symmetric) version of this measure is known as the Kesten-McKay distribution for random non-oriented r-regular graphs, see [94,111]. We recover the circular law when r → ∞, up to renormalization.
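A convenient proxy for simulation is the sum of r independent uniform permutation matrices (a multigraph variant of the uniform model — our choice here, not the text's). The sketch below compares the eigenvalue moduli with the conjectured support {|z| ≤ √r}, apart from the trivial eigenvalue r coming from the all-ones vector:

    import numpy as np

    n, r = 2000, 3
    rng = np.random.default_rng(10)
    A = np.zeros((n, n))
    for _ in range(r):
        A[np.arange(n), rng.permutation(n)] += 1.0   # add a permutation matrix
    lam = np.linalg.eigvals(A)
    mods = np.sort(np.abs(lam))
    # top modulus is the trivial eigenvalue r; the bulk should stay near sqrt(r)
    print(mods[-1], mods[-2], np.sqrt(r), (mods[:-1] <= np.sqrt(r) + 0.1).mean())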
Invertibility of random matrices. The invertibility of random matrices is one of the keys behind the circular law theorem 2.2. Let us consider the case where X_11 is Bernoulli ½(δ_{−1} + δ_1). A famous conjecture by Spielman and Teng (related to their work on smoothed analysis of algorithms [140,139]) states that there exists a constant 0 < c < 1 such that P(√n s_n(X) ≤ t) ≤ t + c^n for n ≫ 1 and any small enough t ≥ 0. This was almost solved by Rudelson and Vershynin [128] and Tao and Vu [149]. In particular, taking t = 0 gives P(s_n(X) = 0) ≤ c^n. This positive probability of being singular does not contradict the asymptotic invertibility, since by the first Borel-Cantelli lemma, a.s. s_n(X) > 0 for n ≫ 1. Regarding the constant c above, it has been conjectured years ago that P(s_n(X) = 0) = (1/2 + o(1))^n. This intuition comes from the probability of equality of two rows, which implies that P(s_n(X) = 0) ≥ (1/2)^n. Many authors contributed to the analysis of this difficult non-linear discrete problem, starting from Komlós, Kahn, and Szemerédi. The best result to date is due to Bourgain, Vu, and Wood [29], who proved that P(s_n(X) = 0) ≤ (2^{-1/2} + o(1))^n.

Roots of random polynomials. The random matrix X has i.i.d. entries and its eigenvalues are the roots of its characteristic polynomial. The coefficients of this random polynomial are neither independent nor identically distributed. Beyond random matrices, let us consider a random polynomial P(z) = a_0 + a_1 z + · · · + a_n z^n, where a_0, . . . , a_n are independent random variables. By analogy with random matrices, one may ask about the behavior, as n → ∞, of the roots λ_1(P), . . . , λ_n(P) of P in C, and in particular the behavior of their empirical measure (1/n) Σ_{i=1}^n δ_{λ_i(P)}. The literature on this subject is quite rich and takes its roots in the works of Littlewood and Offord, Rice, and Kac. We refer to Shub and Smale [136], Azaïs and Wschebor [9], and Edelman and Kostlan [43,44] for (partial) reviews. As for random matrices, the case where the coefficients are real is more subtle due to the presence of real roots. Regarding the complex case, the zeros of Gaussian analytic functions are the subject of a recent monograph [83], in connection with determinantal processes. Various cases are considered in the literature, including the following three families:
• Kac polynomials, for which (a_i)_{0≤i≤n} are i.i.d.;
• Binomial polynomials, for which a_i = √(C(n, i)) b_i with (b_i)_{0≤i≤n} i.i.d.;
• Weyl polynomials, for which a_i = b_i/√(i!) with (b_i)_{0≤i≤n} i.i.d.
Geometrically, the complex number z is a root of P if and only if the vectors (1, z, . . . , z^n) and (a_0, a_1, . . . , a_n) are orthogonal in C^{n+1}, and this connects the problem to Littlewood-Offord type problems [102] and small ball probabilities. Regarding Kac polynomials, Kac [92,91] has shown, in the real Gaussian case, that the asymptotic number of real roots is about (2/π) log(n) as n → ∞. Kac obtained the same result when the coefficients are uniformly distributed [93]. Hammersley [77] derived an explicit formula for the k-point correlation of the roots of Kac polynomials. The real roots of Kac polynomials were extensively studied by Maslova [109,108], Ibragimov and Maslova [85,87,88,86], Logan and Shepp [104,105], and by Shepp and Farahmand [132]. Shparo and Shur [135] have shown that the empirical measure of the roots of Kac polynomials with light tailed coefficients tends, as n → ∞, to the uniform law on the unit circle {z ∈ C : |z| = 1} (the arc law). Further results were obtained by Shepp and Vanderbei [133], Zeitouni and Zelditch [157], Shiffman and Zelditch [134], and by Bloom and Shiffman [21]. If the coefficients are heavy tailed, then the limiting law concentrates on the union of two centered circles, see [68] and references therein. Regarding Weyl polynomials, various simulations and conjectures have been made [52,45]. For instance, if (b_i)_{0≤i≤n} are i.i.d. standard Gaussian, it was conjectured that the asymptotic behavior of the roots of the Weyl polynomials is analogous to the Ginibre Ensemble. Namely, the empirical distribution of the roots tends, as n → ∞, to the uniform law on the centered disc of the complex plane (circular law), and moreover, in the real Gaussian case, there are about (2/π)√n real roots as n → ∞, and their empirical distribution tends as n → ∞ to a uniform law on an interval, as for the Real Ginibre Ensemble, see remark 3.8. The complex Gaussian case was considered by Leboeuf [98] and by Peres and Virág [118], while the real roots of the real Gaussian case were studied by Schehr and Majumdar [131]. Beyond the Gaussian case, one may try to use the companion matrix of P (footnote 24) and the logarithmic potential approach. Numerical simulations reveal strange phenomena depending on the law of the coefficients, but we do not know whether they are purely numerical artifacts. Note that if the coefficients are all real and positive, then there are no positive real roots. The heavy tailed case is also of interest (rings?).

Footnote 24. The companion matrix M of Q(z) := c_0 + c_1 z + · · · + c_{n−1} z^{n−1} + z^n is the n × n matrix with null entries except M_{i,i+1} = 1 and M_{n,i} = c_{i−1} for every i. The characteristic polynomial of M is Q.
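Both behaviors are easy to visualize with numpy's root finder. In the sketch below (degrees, seed, and the log-space rescaling of the Weyl coefficients, used to keep the companion matrix well conditioned, are our choices), the Kac roots cluster near the unit circle, while the rescaled Weyl roots fill the unit disc, whose radial q-quantile is √q:

    import numpy as np

    rng = np.random.default_rng(11)
    # Kac polynomial of degree 200: i.i.d. standard Gaussian coefficients
    kac_roots = np.roots(rng.standard_normal(201))
    print(np.quantile(np.abs(kac_roots), [0.1, 0.5, 0.9]))    # all near 1 (arc law)

    # Weyl polynomial of degree n: a_i = b_i / sqrt(i!), roots rescaled by sqrt(n)
    n = 100
    i = np.arange(n + 1)
    logfact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n + 1)))))
    logc = 0.5 * (i * np.log(n) - logfact)   # coefficients of P(sqrt(n) w), log scale
    coeff = rng.standard_normal(n + 1) * np.exp(logc - logc.max())
    weyl_roots = np.roots(coeff[::-1])       # np.roots expects highest degree first
    print(np.quantile(np.abs(weyl_roots), [0.1, 0.5, 0.9]))   # ~ 0.32, 0.71, 0.95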

Appendix A. Invertibility of random matrices with independent entries
This appendix is devoted to the proof of a general statement (lemma A.1 below) on the smallest singular value of random matrix models with independent entries. It follows from lemma A.1 below that if X = (X_ij)_{1≤i,j≤n} is a random matrix with i.i.d. entries such that X_11 is not constant and E(|X_11|^κ) < ∞ for some arbitrarily small real number κ > 0, then for any γ > 0 there exists a real number β > 0 such that, for any n ≫ 1 and any deterministic matrix M ∈ M_n(C) with s_1(M) ≤ n^γ, lim_{n→∞} P(s_n(X + M) ≤ n^{-β}) = 0.
Both the assumptions and the conclusion are strictly weaker than the result of Tao and Vu. It is enough for the proof of the circular law in probability and its heavy tailed analogue.
Lemma A.1 (Smallest singular value of random matrices with independent entries). If (X_ij)_{1≤i,j≤n} is a random matrix with independent and non-constant entries in C, and if a > 0 is a positive real number such that b := min_{1≤i,j≤n} P(|X_ij| ≤ a) > 0 and σ² := min_{1≤i,j≤n} Var(X_ij 1_{{|X_ij| ≤ a}}) > 0, then there exists c = c(a, b, σ) > 1 such that for any M ∈ M_n(C), n ≥ c, s ≥ 1, 0 < t ≤ 1,
P( s_n(X + M) ≤ t n^{-1/2} ; s_1(X + M) ≤ s ) ≤ c log(cs) ( t s² + n^{-1/2} ).
The proof of lemma A.1 follows mainly from [103,128]. These works have already been used in the proof of the circular law, notably in [66]. As we shall see, the term 1/√n comes from the rate in the Berry-Esseen theorem. Following [103], it could probably be improved by using finer results on the Littlewood-Offord problem [148]. Note, however, that lemma A.1 is sufficient for proving convergence in probability of spectral measures. We emphasize that there is no moment assumption on the entries in lemma A.1. However, (weak) moment assumptions may be used in order to obtain an upper bound on the quantity P(s_1(X + M) ≥ s). Also, the variance (of the truncated variables) σ may depend on n: this allows one to deal with sparse matrix models (not considered here).
For the proof of the circular law and its heavy tailed analogue, lemma A.1 can be used typically with t = 1/(s 2 √ n) and s = n r large enough such that with high probability s 1 (X + M ) ≤ s. In contrast with the Tao and Vu result, lemma A.1 cannot provide a summable bound usable with the first Borel-Cantelli lemma due to the presence of 1/ √ n.
Let us give the idea behind the proof of lemma A.1. A geometric intuition says that the smallest singular value of a random matrix can be controlled by the minimum of the distances of each row to the span of the remaining rows. The distance of a vector to a subspace can be controlled with the scalar product of the vector with a unit norm vector belonging to the orthocomplement of the subspace. Also, when the entries of the matrix are independent, this boils down, by conditioning, to the control of a small ball probability involving a linear combination of independent random variables. The coefficients in this combination are the components of the orthogonal vector. The asymptotic behavior of this small ball probability depends in turn on the structure of these coefficients. When the coefficients are well spread, we expect an asymptotically Gaussian behavior, thanks to the central limit theorem, more precisely its quantitative weighted version called the Berry-Esseen theorem. We will follow this scheme while keeping the geometric picture in mind.
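The comparison between s_n and the row distances can be checked directly; the standard two-sided bound n^{-1/2} min_i dist(R_i, H_i) ≤ s_n(A) ≤ min_i dist(R_i, H_i), with H_i the span of the other rows, is our formulation of the geometric intuition above. A small numerical sketch:

    import numpy as np

    n = 60
    rng = np.random.default_rng(12)
    A = rng.standard_normal((n, n))
    sn = np.linalg.svd(A, compute_uv=False)[-1]
    dists = []
    for i in range(n):
        others = np.delete(A, i, axis=0).T          # columns span H_i
        coef, *_ = np.linalg.lstsq(others, A[i], rcond=None)
        dists.append(np.linalg.norm(A[i] - others @ coef))   # dist(R_i, H_i)
    d = min(dists)
    print(d / np.sqrt(n) <= sn <= d, sn, d)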
The proof of lemma A.1 is divided into two parts which correspond to a subdivision of the unit sphere S n−1 of C n . Namely, for some real positive parameters δ, ρ > 0 that will be fixed later, we define the set of sparse vectors Sparse := {x ∈ C n : card(supp(x)) ≤ δn} and we split the unit sphere S n−1 into a set of compressible vectors and the complementary set of incompressible vectors as follows: Comp := {x ∈ S n−1 : dist(x, Sparse) ≤ ρ} and Incomp := S n−1 \ Comp.
We will use the variational formula: for $A \in \mathcal{M}_n(\mathbb{C})$,
$$s_n(A) = \min_{x \in S^{n-1}} \|Ax\|_2 = \min\Big(\min_{x \in \mathrm{Comp}} \|Ax\|_2\,,\ \min_{x \in \mathrm{Incomp}} \|Ax\|_2\Big).$$

Compressible vectors. Our treatment of compressible vectors differs significantly from [103, 128] (it gives however a weaker statement). We start with a variation around lemma 4.13.
By assumption, for $1 \le k \le m$, $\mathbb{E}_m Y_k = 0$ and $\mathbb{E}_m |Y_k|^2 \ge \sigma^2$. The function $f : y \mapsto \operatorname{dist}(y, W)$ is convex and $1$-Lipschitz. Hence, Talagrand's concentration inequality gives, for all $t \ge 0$,
$$\mathbb{P}_m\big(|f(Y) - M_m| \ge t\big) \le c \exp\big(-t^2/(c\,a^2)\big),$$
where $M_m$ is the median of $f$ under $\mathbb{P}_m$. In particular, $M_m$ is, up to an additive constant depending only on $a$, at least $(\mathbb{E}_m f(Y)^2)^{1/2}$. Also, if $P$ denotes the orthogonal projection on the orthogonal of $W$, we find
$$\mathbb{E}_m f(Y)^2 = \mathbb{E}_m \|P Y\|_2^2 \ge \sigma^2\big(m - \dim(W)\big).$$
The latter, for $n$ large enough, is lower bounded by $c\sigma^2 n$ if $\delta_0 = b/4$.
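The mechanism at work here can be observed numerically: for a vector with bounded, centered, non-degenerate independent coordinates and a fixed subspace $W$ of proportionally small dimension, the distance $\operatorname{dist}(Y, W)$ concentrates at the scale $\sigma\sqrt{n - \dim W}$. The uniform entries, dimensions, and sample size below are illustrative choices, not the lemma's constants.

```python
# Monte Carlo sketch: concentration of the distance to a fixed subspace.
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 100                                      # d = dim(W), proportionally small
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))     # orthonormal basis of a fixed W
sigma = np.sqrt(1.0 / 3.0)                           # std of Uniform(-1, 1)

dists = []
for _ in range(2000):
    Y = rng.uniform(-1.0, 1.0, size=n)               # bounded centered coordinates
    dists.append(np.linalg.norm(Y - Q @ (Q.T @ Y)))  # dist(Y, W)
dists = np.array(dists)
print(f"mean={dists.mean():.2f}  target={sigma * np.sqrt(n - d):.2f}  std={dists.std():.3f}")
```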
Let $0 < \varepsilon < 1$ and $s \ge 1$ be as in lemma A.2. We fix from now on the value of $\rho$ as a function of $\varepsilon, \sigma, \delta$ and $s$ (in particular, $\rho \le 1/4$). The parameter $0 < \delta < 1$ is still to be specified: at this stage, we simply assume that $\delta < \delta_0$. We note that if $A \in \mathcal{M}_n(\mathbb{C})$ and $y \in \mathbb{C}^n$ is such that $\operatorname{supp}(y) \subset \pi \subset \{1, \ldots, n\}$, then
$$\|A y\|_2 = \|A_{|\pi}\, y_{|\pi}\|_2 \ge s_n(A_{|\pi})\, \|y\|_2,$$
where $A_{|\pi}$ is the $n \times |\pi|$ matrix formed by the columns of $A$ selected by $\pi$. We deduce that, on compressible vectors, $\|Ax\|_2$ is controlled from below by the quantities $s_n(A_{|\pi})$ with $|\pi| \le \delta n$. However, by the Pythagoras theorem, for any $x \in \mathbb{C}^{|\pi|}$,
$$\|A_{|\pi}\, x\|_2 \ge \max_{i \in \pi} |x_i| \operatorname{dist}(C_i, H_i),$$
where $C_i$ is the $i$-th column of $A$ and $H_i := \operatorname{span}\{C_j : j \in \pi,\ j \ne i\}$. In particular, $s_n(A_{|\pi}) \ge \min_{i \in \pi} \operatorname{dist}(C_i, H_i)/\sqrt{|\pi|}$. Now, we apply this bound to $A = X + M$. Since $H_i$ has dimension at most $n\delta$ and is independent of $C_i$, by lemma A.2 and the union bound, the event that $\operatorname{dist}(C_i, H_i) \ge \varepsilon \sigma \sqrt{n}$ for every $i \in \pi$ has probability at least $1 - c n \delta \exp(-c \sigma^2 n)$ for $n \gg 1$. Hence
$$\mathbb{P}\big(s_n((X+M)_{|\pi}) \le \varepsilon \sigma \sqrt{\delta}\big) \le c n \delta \exp(-c \sigma^2 n).$$
Therefore, using the union bound over the supports $\pi$ with $|\pi| \le \delta n$ and our choice of $\rho$, we deduce the desired lower bound on $\min_{x \in \mathrm{Comp}} \|(X+M)x\|_2$.

Incompressible vectors: small ball probability. Now, we come back to our matrix $X + M$: let $C$ be the $k$-th column of $X + M$ and $H$ be the span of all the columns but $C$. Our goal in this sub-section is to establish a small ball bound, referred to as (A.4), on $\mathbb{P}(\operatorname{dist}(C, H) \le t)$, valid for all $t \ge 0$. To this end, we consider a random vector $\zeta$ taking its values in $S^{n-1} \cap H^{\perp}$, which is independent of $C$. Such a random vector $\zeta$ is not unique; we just pick one and call it the orthogonal vector (to the subspace $H$). We have $\operatorname{dist}(C, H) \ge |\langle \zeta, C \rangle|$. We have now reached the final preparation step before the use of the Berry-Esseen theorem. This step consists in the reduction to a case where, for a fixed set of coordinates, both the components of $\zeta$ and the random variables $X_{ik} + M_{ik}$ are well controlled. Namely, if $\zeta \in \mathrm{Incomp}$, let $\pi \subset \{1, \ldots, n\}$ be as in lemma A.3, associated to the vector $\zeta$. Then, conditioned on $\{\zeta \in \mathrm{Incomp}\}$, from Hoeffding's deviation inequality, the event that $\operatorname{card}\{i \in \pi : |X_{ik}| \le a\} \ge b|\pi|/2$ has conditional probability at least $1 - \exp(-|\pi| b^2/2) \ge 1 - \exp(-c\delta n)$ (recall that $\zeta$, hence $\pi$, are independent of $C$). In summary, using our choice of $\delta, \rho$, by lemma A.5 and (A.5), in order to prove (A.4) it is sufficient to bound, for all $t \ge 0$, the conditional small ball probability $\mathbb{P}_m(|\langle \zeta, C \rangle| \le t)$, where $\mathbb{P}_m(\cdot) = \mathbb{P}(\cdot \,|\, E_m, \mathcal{F}_m)$ is the conditional probability given $\mathcal{F}_m$, the $\sigma$-algebra generated by all the variables but $(X_{1k}, \ldots, X_{mk})$, $m = \lfloor \delta b n/4 \rfloor$, and $E_m$ is the event that the first $m$ coordinates are the well controlled ones, i.e. (after relabeling) $|X_{ik}| \le a$ for $1 \le i \le m$. We may write
$$\langle \zeta, C \rangle = \sum_{i=1}^{m} \bar{\zeta}_i X_{ik} + u,$$
where $u$ is $\mathcal{F}_m$-measurable, hence independent of $(X_{1k}, \ldots, X_{mk})$. It follows that
$$\mathbb{P}_m\big(|\langle \zeta, C \rangle| \le t\big) \le \sup_{z \in \mathbb{C}} \mathbb{P}_m\Big(\Big|\sum_{i=1}^{m} \bar{\zeta}_i X_{ik} - z\Big| \le t\Big).$$
The idea, originating from [103], is now to use the rate of convergence given by the Berry-Esseen theorem to upper bound this last expression.
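The inequality $\operatorname{dist}(C, H) \ge |\langle \zeta, C \rangle|$ is in fact an equality here, since $H$ almost surely has codimension one; the following sketch (with arbitrary Gaussian entries) checks this numerically by extracting a unit vector of $H^{\perp}$ from a full QR decomposition.

```python
# Sketch: dist(C, H) equals |<zeta, C>| for a codimension-one span H.
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 0
A = rng.standard_normal((n, n))
C = A[:, k]
H_basis = np.delete(A, k, axis=1)                    # n x (n-1): columns spanning H
Q, _ = np.linalg.qr(H_basis, mode="complete")        # full orthonormal basis of R^n
zeta = Q[:, -1]                                      # unit vector orthogonal to H
proj = Q[:, :-1] @ (Q[:, :-1].T @ C)                 # orthogonal projection onto H
print(f"dist(C,H)={np.linalg.norm(C - proj):.6f}  |<zeta,C>|={abs(zeta @ C):.6f}")
```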
Lemma A.6 (Small ball probability via Berry-Esseen theorem). There exists $c > 0$ such that if $Z_1, \ldots, Z_n$ are independent centered complex random variables then, with $\tau^2 := \sum_{i=1}^n \mathbb{E}|Z_i|^2$, for all $t \ge 0$,
$$\sup_{z \in \mathbb{C}} \mathbb{P}\Big(\Big|\sum_{i=1}^{n} Z_i - z\Big| \le t\Big) \le c\left(\frac{t}{\tau} + \frac{\sum_{i=1}^{n} \mathbb{E}|Z_i|^3}{\tau^3}\right).$$
Proof of lemma A.6. If $|\sum_i Z_i - z| \le t$ then $|\sum_i \operatorname{Re}(Z_i) - \operatorname{Re}(z)| \le t$, and similarly with $\operatorname{Im}$. Hence, up to losing a factor $2$, we can assume without loss of generality that the $Z_i$'s are real random variables. Then, if $G$ is a real centered Gaussian random variable with variance $\tau^2$, the Berry-Esseen theorem asserts that
$$\sup_{x \in \mathbb{R}} \Big|\mathbb{P}\Big(\sum_{i=1}^{n} Z_i \le x\Big) - \mathbb{P}(G \le x)\Big| \le c\,\frac{\sum_{i=1}^{n} \mathbb{E}|Z_i|^3}{\tau^3}.$$
In particular, for all $t \ge 0$ and $x \in \mathbb{R}$,
$$\mathbb{P}\Big(x \le \sum_{i=1}^{n} Z_i \le x + t\Big) \le \mathbb{P}(x \le G \le x + t) + 2c\,\frac{\sum_{i=1}^{n} \mathbb{E}|Z_i|^3}{\tau^3}.$$
We conclude by using the fact that $G$ has a density upper bounded by $1/\sqrt{2\pi\tau^2}$.
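The content of lemma A.6 can be tested by simulation: for a weighted sum of independent signs, the empirical small ball probability is dominated, up to a constant, by $t/\tau + \sum_i \mathbb{E}|Z_i|^3/\tau^3$. The Rademacher variables, the weights, and the parameters below are illustrative choices.

```python
# Monte Carlo sketch of lemma A.6 with Z_i = w_i * X_i, X_i Rademacher signs.
import numpy as np

rng = np.random.default_rng(5)
n, t, z = 200, 0.5, 1.0
w = rng.uniform(0.5, 1.0, size=n)                    # well spread deterministic weights
tau = np.linalg.norm(w)                              # tau^2 = sum E|Z_i|^2 (E X_i^2 = 1)
third = np.sum(np.abs(w) ** 3)                       # sum E|Z_i|^3

X = rng.choice([-1.0, 1.0], size=(20_000, n))        # independent signs
S = X @ w
p_hat = np.mean(np.abs(S - z) <= t)                  # empirical small ball probability
print(f"P_hat={p_hat:.4f}  t/tau + third/tau^3 = {t / tau + third / tau**3:.4f}")
```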
Decompose now $\pi$ into blocks $\pi_1, \ldots, \pi_L$ according to the dyadic level of the magnitude of the coordinates of $\zeta$, with $L$ of order $\log(cs)$ (this is the source of the logarithmic factor in lemma A.1). From the pigeonhole principle, there exists $j$ such that $|\pi_j| \ge m/L$. On this block, we have a lower bound on the variance $\tau^2$ of $\sum_{i \in \pi_j} \bar{\zeta}_i X_{ik}$ under $\mathbb{P}_m$, and an upper bound on the third moments $\sum_{i \in \pi_j} \mathbb{E}_m |\zeta_i X_{ik}|^3$. We deduce by (A.6) and lemma A.6 that (by changing the value of $c$), for all $t \ge 0$, the announced small ball estimate holds. The proof of (A.4) is complete.
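The pigeonhole step admits a one-line numerical illustration: splitting the coordinates of $\zeta$ on $\pi$ into $L$ dyadic levels of magnitude, at least one level must capture a fraction $1/L$ of the coordinates. Everything below (the distribution of the magnitudes, the value of $L$) is an illustrative stand-in.

```python
# Sketch of the dyadic decomposition and the pigeonhole choice of pi_j.
import numpy as np

rng = np.random.default_rng(6)
m, L = 500, 12                                       # illustrative sizes
zeta_pi = rng.uniform(2.0 ** -L, 1.0, size=m)        # stand-in for |zeta_i|, i in pi
levels = np.floor(-np.log2(zeta_pi)).astype(int)     # dyadic level of each coordinate
counts = np.bincount(np.clip(levels, 0, L - 1), minlength=L)
j = counts.argmax()                                  # the most populated level
print(counts[j] >= m / L)                            # pigeonhole: True
```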