On the Marchenko-Pastur and circular laws for some classes of random matrices with dependent entries

In the first part of the article we prove limit theorems of Marchenko-Pastur type for the average spectral distribution of random matrices with dependent entries satisfying a weak law of large numbers, uniform bounds on moments and a martingale-like condition investigated previously by Götze and Tikhomirov. Examples include log-concave unconditional distributions on the space of matrices. In the second part we specialize to random matrices with independent isotropic unconditional log-concave rows for which (using the Tao–Vu replacement principle) we prove the circular law.


Introduction
One of the main points of interest of the theory of random matrices is universality, i.e. the question to what extent the limiting objects appearing in the theory are common for various random matrix models. Recently a lot of progress has been achieved in this direction including e.g. the proof of the circular law in a general form [36] as well as significant weakening of the assumptions leading to the Tracy-Widom distribution for the operator norm [33,30] or the sine law for local eigenvalues statistics (see e.g. [15,34]). Most of these results have been obtained for classical models of random matrices, i.e. models in which all the entries (or all the entries above the diagonal) of the matrix are independent random variables. Such models may be considered one of the cornerstones of the theory of random matrices (the other one being the family of invariant ensembles, which we are not going to discuss here).
There have also been some results concerning models in which the independence assumption is weakened or completely abandoned. Already in the seminal paper by Marchenko and Pastur [26] one considers independent rows rather than independent entries, compensating for the lack of independence with some moment assumptions. One of the important examples considered in [26] is a random matrix with independent rows distributed uniformly on the unit sphere. This was generalized by Yin and Krishnaiah [37] to spherically symmetric distributions. Quite recently Aubrun [6] obtained the Marchenko-Pastur law for matrices with independent rows distributed uniformly on the $\ell_p^n$ ball, which was subsequently generalized by Pajor and Pastur [27] to matrices with independent rows distributed according to an arbitrary isotropic log-concave measure.
Among other interesting results of Wigner and Marchenko-Pastur type there are those by Götze and Tikhomirov, dealing with matrices satisfying certain martingale-like conditions, without any assumptions on the independence of the entries ([18,20]). On the other hand Anderson and Zeitouni [4] constructed models of random matrices with locally dependent entries for which the limiting spectral distribution is different from that of classical random matrices. Similar results have also been obtained for structured random matrices (like Toeplitz or Hankel matrices, see [12]).
As for the circular law, it is not our aim to list all the historical developments concerning this problem, so we will only mention that most of the existing results concern random matrices with independent entries (see e.g. [16,17,8,7,28,19,36]) and, to the best of our knowledge, the only article concerning nonindependent entries is [9], which examines the case of random Markov matrices.
The aim of this paper is to provide some rather simple examples of random matrices with nonindependent entries for which classical limit theorems concerning the limiting spectral distribution still hold. We will focus on the Marchenko-Pastur and circular laws.
The amount of dependence we allow for varies from one result to another. Sometimes, when dealing only with the expected spectral distribution, we will not assume any independence, but only martingale like conditions in the spirit of Götze and Tikhomirov, a weak law of large numbers and sufficiently strong integrability. The proofs of these results follow the classical moment approach and are completely elementary.
For the almost sure limit theorems we will assume independence of rows or columns of the matrix (these types of results will be easy corollaries from the results for the expected spectral distribution and well known facts on the concentration of the Stieltjes transform).
Finally for the circular law we will assume that the rows are independent and distributed according to an unconditional isotropic log-concave measure (which is needed in the proof to ensure good concentration and bounds on the smallest singular value of the matrix). In this case again our results will follow easily from the results on the expected spectral distribution with use of the Tao-Vu replacement principle and recent general results on log-concave measures.
The organisation of the paper is as follows. In Section 2 we discuss limit theorems for symmetric nonnegative definite random matrices. The main results of this section are gathered in Subsection 2.2 (Theorems 2.3, 2.4).
In Section 3 we apply some of the results of Section 2 to prove the circular law for random matrices with independent log-concave unconditional rows (Theorem 3.4).
For simplicity we restrict our attention to matrices with real-valued entries (even for the circular law). However straightforward modifications of our arguments lead to counterparts of all the results in the complex case.
Acknowledgement The author would like to thank Sasha Sodin for valuable comments concerning an early version of the paper and the anonymous Referees whose comments helped improve the presentation of results.

Probabilistic assumptions
Let $(N_n)_{n\ge 1}$ be a sequence of positive integers such that $\lim_{n\to\infty} n/N_n = y \in (0,\infty)$ and for each $n \ge 1$ consider random matrices $A_n = [X^{(n)}_{ij}]_{1\le i\le n,\,1\le j\le N_n}$. Let us assume that the following assumptions are satisfied
(A1) for every $k \in \mathbb{N}$, $\sup_n \max_{i\le n,\, j\le N_n} \mathbb{E}|X^{(n)}_{ij}|^k < \infty$,
[...]
Remark Note that the assumption (A3) may be read as 'the Euclidean norm of a random row of the matrix $A_n$, normalized by $\sqrt{N_n}$, and the Euclidean norm of a random column, normalized by $\sqrt{n}$, both converge in probability to 1'. It is obviously implied by the condition [...]

Examples and discussion of the assumptions
We would now like to list some examples of random matrices satisfying assumptions (A1)-(A3) and to relate these conditions to other assumptions considered in the literature.
Let us first recall the following definition.
Definition 2.1. An n-dimensional random vector X is called isotropic if it is centered and its covariance matrix is equal to identity.

1. Obviously if the entries of the matrix are independent with mean zero, variance one and uniformly bounded moments of all orders, then the assumptions (A1)-(A3) are satisfied (the assumption (A3) follows e.g. from Chebyshev's inequality). This example is not of particular interest from our point of view as the convergence of spectral distributions of matrices generated by independent random variables is well known under much weaker integrability assumptions.

2. If the law of $A_n$ in $\mathbb{R}^{nN_n}$ (which we will identify with the space of $n \times N_n$ matrices in a natural way) is log-concave (see Definition 3.2 below) and isotropic (with respect to the standard basis), then the assumptions (A1), (A3) are satisfied. If one also assumes unconditionality (see Definition 3.1) with respect to the standard basis, one obtains assumption (A2), although obviously (A2) is weaker than unconditionality.
Assumption (A1) follows from the so-called Borell's lemma (see [11]), which states that log-concave variables are exponentially integrable. Assumption (A3) (or even the stronger condition (A3')) follows from Klartag's concentration results (see Theorem 3.5 in Section 3.2) and the fact that marginals of isotropic log-concave measures are also isotropic and log-concave.
In particular this class of examples includes matrices sampled from the $\ell_p$ balls in $\mathbb{R}^{N_n n}$ in isotropic position, i.e. from the sets [...] Here $c_{p,k,n}$ are constants of order $(kn)^{1/p}$. Another class of matrices covered by this case is matrices with independent log-concave unconditional rows.
We would like to note that the Marchenko-Pastur theorem has been recently proven by Pajor and Pastur in the case of general matrices with independent log-concave isotropic rows or columns (not necessarily unconditional). With our approach we will be able to show that the expected spectral measure of $\frac{1}{N_n} A_n A_n^T$ converges to the Marchenko-Pastur distribution even if the rows are not independent (however we additionally have to assume (A2)).

3. The above examples of vectors with independent entries and with independent log-concave rows/columns have one common feature. Namely, the convergence of spectral distribution of $\frac{1}{N_n} A_n A_n^T$ can be proven by means of the Stieltjes transform and the bound of the form [...] (1), where $X$ is any row of $A_n$ and $M$ is any matrix with $\|M\|_{\ell_2\to\ell_2} \le 1$. In the case of independent entries this bound is straightforward, for log-concave rows it has been proven by Pajor and Pastur in [27]. It lies also at the core of the original proof by Marchenko and Pastur. The importance of concentration of measure for quadratic forms for the limit theorems for spectra of random matrices of sample covariance type has also been recently emphasized by El Karoui [14].
Here we would like to present a simple class of random matrices with independent rows, which satisfy assumptions (A1)-(A3) but do not satisfy the above bound.
Choose $k \in \mathbb{N}$ and assume for simplicity that for all $n$, $N_n$ is divisible by $k$. Divide the set $\{1,\dots,N_n\}$ into $k$ disjoint sets of equal cardinality (say $I_1,\dots,I_k$). Let $\mu_n$, $n = 1, 2, \dots$ be isotropic probability measures on $\mathbb{R}^{N_n/k}$, satisfying (A1) and (A2), with Euclidean norm concentrated around $\sqrt{N_n/k}$ (i.e. $\mu_n(\{x : ||x| - \sqrt{N_n/k}| > \varepsilon\}) \to 0$ for all $\varepsilon > 0$). Let us also identify $\mathbb{R}^{N_n}$ with $\mathbb{R}^{I_1} \times \dots \times \mathbb{R}^{I_k}$. Let $\delta_n$ be a random variable distributed uniformly on $\{1,\dots,k\}$ and $Y_n$ a random vector independent of $\delta_n$, distributed according to $\mu_n$. Finally let $X^{(n)} = \sqrt{k}\,(Y_n \mathbf{1}_{\{\delta_n=1\}}, Y_n \mathbf{1}_{\{\delta_n=2\}}, \dots, Y_n \mathbf{1}_{\{\delta_n=k\}})$. In other words we select the set of nonzero coordinates of $X^{(n)}$ uniformly among $I_1,\dots,I_k$, distribute the nonzero coordinates according to $\mu_n$ and finally rescale the vector to make it isotropic in $\mathbb{R}^{N_n}$. Now define the rows of $A_n$, $X^{(n)}_1,\dots,X^{(n)}_n$, as independent copies of $X^{(n)}$. By construction, the matrix $A_n$ with rows $X^{(n)}_1,\dots,X^{(n)}_n$ satisfies (A1)-(A3) (the condition (A1) follows from the fact that $k$ is fixed). However if we let $M$ be the matrix of the orthogonal projection on (say) $\mathbb{R}^{I_1}$, we get [...] which is of the order $n^2$. Thus in this case (1) is not satisfied.
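The construction above can be explored numerically. The following is a minimal sketch under two assumptions that are not taken from the text: that the bound (1) controls the variance of the quadratic form $\langle MX, X\rangle$ on the scale $o(N_n^2)$, and that $\mu_n$ is the uniform law on the sphere of radius $\sqrt{N_n/k}$ (one admissible choice). The helper names `block_rows` and `variance_ratio` are hypothetical.

```python
import numpy as np

def block_rows(m, N, k, rng):
    """m independent copies of X = sqrt(k) * (Y 1{delta=1}, ..., Y 1{delta=k})."""
    block = N // k
    G = rng.standard_normal((m, block))
    # Assumption: mu_n is the uniform law on the sphere of radius sqrt(N/k).
    Y = np.sqrt(block) * G / np.linalg.norm(G, axis=1, keepdims=True)
    delta = rng.integers(0, k, size=m)          # index of the active block
    X = np.zeros((m, N))
    for b in range(k):
        rows = np.flatnonzero(delta == b)
        X[rows, b * block:(b + 1) * block] = np.sqrt(k) * Y[rows]
    return X

def variance_ratio(X, block):
    """Var(<MX, X>) / N^2, with M the projection onto the first `block` coordinates."""
    q = np.sum(X[:, :block] ** 2, axis=1)
    return float(q.var() / X.shape[1] ** 2)

rng = np.random.default_rng(0)
k, m = 2, 4000
results = {}
for N in (100, 400):
    X = block_rows(m, N, k, rng)
    G = rng.standard_normal((m, N))             # rows with independent entries
    results[N] = (variance_ratio(X, N // k), variance_ratio(G, N // k))
    print(N, results[N])
```

For the block model the ratio should stay near $(1/k)(1-1/k)$ as $N$ grows, while for rows with independent Gaussian entries it decays like $1/N$.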
Let us also mention that matrices with not necessarily independent entries, satisfying assumption (A2), have recently been considered by Götze and Tikhomirov (see [18,20]). Their results, concerning convergence of the expected spectral measure, were obtained with use of the Stein method and work under the assumption of finiteness of the second moments of the entries, which is much milder than our assumption (A1). Nevertheless, the other assumptions introduced in their paper are much more technical, martingale-like conditions (although still rather natural in view of the CLT theory for martingales). In particular it may be checked that the example constructed above does not satisfy the assumptions (1.11) and (1.12) of Theorem 1.1 in [20]. It is also relatively easy to construct examples of matrices which satisfy (A1)-(A3) but fail the assumption (1.10) of [20]. From this point of view, some of the results we present may be seen as complements of the theorem by Götze and Tikhomirov. It would be interesting to find a theorem covering both results.

4. Obviously assumptions of the type (A3'), which are just a quantitative version of the weak law of large numbers, have been studied in the literature. It is classical and goes back to Khintchine that it is enough to assume that $(X^{(n)}_{ij})^2$ are negatively correlated and have uniformly bounded second moment (which would follow from (A1)). One can also consider matrices satisfying all sorts of mixing conditions (see e.g. [22,32]). Let us finally notice that if we have random matrices $A_n = [X^{(n)}_{ij}]$ satisfying (A1) and (A3), then the matrices $[X^{(n)}_{ij}\varepsilon_{ij}]$, where $\varepsilon_{ij}$ are independent Rademacher variables, independent of $A_n$, will satisfy (A1)-(A3). In many cases, under stronger regularity of $A_n$, we may multiply $X^{(n)}_{ij}$ by more general sequences of independent mean zero variables. It is not our aim to go into details here, but rather to argue that assumptions (A1)-(A3) are satisfied by many models when independence is replaced by weaker conditions of partial independence, so we will leave the details to the Reader.

Main results for the symmetric case
Marchenko-Pastur Law
Recall that the spectral measure of an $n \times n$ symmetric matrix $H$ is a probability measure on the real line defined as
$$\frac{1}{n}\sum_{i=1}^{n} \delta_{\lambda_i},$$
where $\lambda_1 \le \dots \le \lambda_n$ are the eigenvalues of $H$ and $\delta_x$ stands for the Dirac mass at $x$.
Recall also the following [...]
In consequence, if $L_n$ is the spectral measure of the matrix $M_n$, then the (non-random) measure $\mathbb{E}L_n$ converges weakly as $n \to \infty$ to the Marchenko-Pastur law with parameter $y$.
Remark If one additionally assumes that the rows (or columns) of the matrix are independent, by concentration properties of the Stieltjes transform of the spectral distribution (which for the Reader's convenience we formulate in the Appendix) one can strengthen the above results to an almost sure convergence of L n . In the sequel we will not need the stronger version, however we will need a corresponding statement for a related model of random matrices described below, for which we will present a detailed proof.
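The convergence of moments can be checked numerically on a model with dependent entries. The sketch below assumes that $M_n = \frac{1}{N_n}A_nA_n^T$ and that the $k$-th moment of the Marchenko-Pastur law with parameter $y$ is $\sum_{r=0}^{k-1}\frac{1}{r+1}\binom{k}{r}\binom{k-1}{r}y^r$ (the standard formula); the rows are taken uniform on the sphere of radius $\sqrt{N_n}$, one of the classical examples mentioned in the Introduction. The function names are hypothetical.

```python
import numpy as np
from math import comb

def mp_moment(k, y):
    """k-th moment of the Marchenko-Pastur law with parameter y (standard formula)."""
    return sum(comb(k, r) * comb(k - 1, r) * y ** r / (r + 1) for r in range(k))

def sphere_rows(n, N, rng):
    """n independent rows, each uniform on the sphere of radius sqrt(N) in R^N."""
    G = rng.standard_normal((n, N))
    return np.sqrt(N) * G / np.linalg.norm(G, axis=1, keepdims=True)

def empirical_moments(A, kmax):
    """(1/n) tr M^k for M = A A^T / N and k = 1..kmax, via the eigenvalues of M."""
    n, N = A.shape
    eig = np.linalg.eigvalsh(A @ A.T / N)
    return [float(np.mean(eig ** k)) for k in range(1, kmax + 1)]

rng = np.random.default_rng(1)
n, N = 200, 400
y = n / N
emp = empirical_moments(sphere_rows(n, N, rng), 4)
theo = [mp_moment(k, y) for k in range(1, 5)]
for k, (e, t) in enumerate(zip(emp, theo), start=1):
    print(k, round(e, 3), round(t, 3))
```

Entries within a row are dependent here (the row norm is fixed), yet the empirical moments should already be close to the limiting ones for moderate $n$.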

Square matrices shifted by a multiple of identity
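We will now consider the case $N_n = n$ and the empirical spectral distribution of the matrix
$$H_n = H_n(z) = \Big(\frac{1}{\sqrt{n}}A_n - z\,\mathrm{Id}\Big)\Big(\frac{1}{\sqrt{n}}A_n - z\,\mathrm{Id}\Big)^*,$$
where $z$ is a complex number (note that for $z = 0$ the problem of spectral distribution of $H_n$ reduces to the Marchenko-Pastur theorem).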
The interest in this type of random matrices stems from the fact that they play an important role in the proof of the circular law. In the second part of the article we will use the facts established in this section to prove the circular law for random matrices with independent log-concave unconditional rows.
Theorem 2.4. Assume that $N_n = n$ and a sequence of random matrices $A_n$ satisfies assumptions (A1)-(A3). Then for any $k \in \mathbb{N}$,
$$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}\operatorname{tr} H_n^k = \mu_k(|z|^2),$$
where $\mu_k(|z|^2)$ is a function depending only on $|z|^2$ and not on the distribution of $H_n$.

Corollary 2.5. Assume that N n = n, and A n is a sequence of random matrices with independent rows satisfying assumptions (A1) -(A3). Let L n (z) be the spectral measure of the matrix H n (z). For every z ∈ C, with probability one, L n (z) converges weakly to a non-random measure which does not depend on the distribution of the rows of A n (in particular is the same as for the case of matrices with independent standard Gaussian entries).
Remark Note that (as already mentioned) one half of the assumption (A3) follows from the independence of the rows and boundedness of moments of the entries.
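The universality asserted in Theorem 2.4 and Corollary 2.5 can be illustrated by comparing the moments of $H_n(z)$ for two different row distributions. This is only a sketch: it assumes $H_n(z) = W_nW_n^*$ with $W_n = n^{-1/2}A_n - z\,\mathrm{Id}$, and the helper names are hypothetical.

```python
import numpy as np

def sphere_rows(n, rng):
    """n independent rows, each uniform on the sphere of radius sqrt(n) in R^n."""
    G = rng.standard_normal((n, n))
    return np.sqrt(n) * G / np.linalg.norm(G, axis=1, keepdims=True)

def shifted_moments(A, z, kmax):
    """(1/n) tr H^k for H = W W^*, W = A / sqrt(n) - z Id, k = 1..kmax."""
    n = A.shape[0]
    W = A / np.sqrt(n) - z * np.eye(n)
    s = np.linalg.svd(W, compute_uv=False)   # eigenvalues of H are s ** 2
    return [float(np.mean(s ** (2 * k))) for k in range(1, kmax + 1)]

rng = np.random.default_rng(2)
n, z = 200, 0.5 + 0.5j
gauss = shifted_moments(rng.standard_normal((n, n)), z, 3)
sphere = shifted_moments(sphere_rows(n, rng), z, 3)
for k, (g, s) in enumerate(zip(gauss, sphere), start=1):
    print(k, round(g, 3), round(s, 3))
```

The first moment equals $\frac{1}{n}\|W_n\|_{HS}^2$ and should be close to $1 + |z|^2$ in both models; the higher moments should agree between the two models up to small finite-size errors.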

Combinatorial facts
Trees
Let T = (V, E, r) be a rooted tree. Divide the set V into two disjoint classes U and D, where U is the set of vertices whose distance from the root is even, while D is the set of vertices whose distance from the root is odd. Thus each edge e ∈ E joins a vertex from D (call it v) with a vertex from U (call it u). We will denote such an edge by e = (u → v).
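The partition of the vertices into $U$ and $D$ and the induced orientation of the edges can be sketched as follows. The adjacency-dictionary representation and the function names are illustrative choices, not part of the text.

```python
from collections import deque

def split_by_parity(adjacency, root):
    """Return (U, D): vertices at even / odd distance from the root of a tree."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    U = {v for v, d in depth.items() if d % 2 == 0}
    D = {v for v, d in depth.items() if d % 2 == 1}
    return U, D

def oriented_edges(adjacency, root):
    """List every edge as (u, v) with u in U and v in D, i.e. e = (u -> v)."""
    U, _ = split_by_parity(adjacency, root)
    return sorted((u, v) for u in U for v in adjacency[u])

# A path c - a - r together with a second child b of the root r.
tree = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
U, D = split_by_parity(tree, "r")
edges = oriented_edges(tree, "r")
print(sorted(U), sorted(D), edges)
```

Since neighbours in a tree differ in depth by exactly one, every edge has exactly one end in $U$, so each edge is listed once.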

Consider the set $I_n^T$ of functions [...]
Finally define [...]
We will prove the following easy proposition. [...]
Proof. In the course of the proof we will allow all the constants to depend on the constants in assumption (A1) and $y$ without stating it explicitly. Thus if we write e.g. $O_a(1)$, we mean that the implicit constant depends only on $y$, the constants in (A1) and the parameter $a$.
We will proceed by induction with respect to the size of the tree. If $|V| = 1$ then clearly $\zeta_n(T) = 1$ for all $n$. Let us thus assume that $|V| > 1$ and that the proposition holds for all trees of size smaller than $|V|$. Let us consider an arbitrary leaf $w$ of the tree $T$, distinct from the root. Let $x$ be the unique neighbour of $w$. We will consider in detail only the case when $w \in D$, the other case is similar. Let $\tilde T = (\tilde V, \tilde E, r)$ be the tree obtained from $T$ by deleting the vertex $w$ together with the adjacent edge $e = (x \to w)$. Let $\tilde I_n$ be the set of multi-indices $i_{\tilde V}$: [...] We have [...] For $n$ large enough (depending on $T$) $I_n^{\tilde T} = \tilde I_n$, and there are only $|D| - 1$ choices of $i_w$ such that $(i_{\tilde V}, i_w) \notin I_n^T$. Moreover, by the generalized Hölder inequality and the assumption (A1), for every [...] is bounded by a number independent of $n$. Therefore for large $n$, [...] where in the last inequality we used the fact that $|I_n^{\tilde T}|$ [...] For every $\varepsilon > 0$ and every $i_{\tilde V} \in I_n^{\tilde T}$ we have [...] By assumption (A1), the triangle inequality in $L_p$ and generalized Hölder's inequality [...] are bounded by a constant $C_T$, depending only on $T$ and the constants in (A1). In consequence, by (3) and (4), we get that for large $n$, [...] Now, by the formula for cardinality of $I_n^{\tilde T}$ and the fact that for each $i_x$ there are at most [...] Note that by the Cauchy-Schwarz inequality, [...] and thus by (A3) we get [...] which in combination with the induction assumption proves that $\zeta_n(T) \to 1$ as $n \to \infty$.
The other case to consider, when $w \in U$, is analogous, one simply uses the other part of assumption (A3) to show that $\zeta_n(T)$ and $\zeta_n(\tilde T)$ are close.

Γ-trees
For the purpose of proving Theorem 2.4 it will be convenient to consider a special class of trees, which we introduce below in an abstract form together with a theorem on the asymptotic behaviour of counterparts of quantities ζ n investigated in the previous section. The proofs will be very similar to those for ζ n , the main difficulty being a slightly more involved notation.
Definition We define a Γ-tree as a rooted tree $T = (V, E, r)$ possessing the following additional structure
1. The set $V$ is partitioned into two sets $S$ and $O$. The elements of $S$ (resp. $O$) will be called special (resp. ordinary) vertices.
2. Each edge adjacent to a special vertex is given an orientation in such a way that
• if $r \in O$ then for any special vertex $u$ such that $u$ is the only special vertex on the path [...]
Partition The orientation of paths between elements of $S$ allows us to further partition $O$ into two sets $U$ and $D$ in the following way. [...]
• if $v_l$ is the last special vertex on the path [...] then $u \in D$ iff ($m - l$ is odd and $v_l \to v_{l+1}$) or ($m - l$ is even and $v_{l+1} \to v_l$).
Notice that every edge e with both ends in O has one end v in D and the other end u in U. We will assign to such edges the orientation u → v. This way we have given orientation to all the edges of T . Whenever we want to stress the orientation of the edge with ends u, v we will write e = (u → v).
Definition of $\xi_n(T)$ For a fixed sequence of random $n \times n$ matrices $A_n = [X^{(n)}_{ij}]_{i,j\le n}$ and any Γ-tree $T$ we define a number $\xi_n(T)$ in the following way. [...]
Example Consider a tree $T$ on four vertices 1, 2, 3, 4, in which 1 is the root, [...]
Proposition 2.7. Assume that $N_n = n$ and the sequence of random matrices $A_n$ satisfies the assumptions (A1) and (A3). Then for every Γ-tree $T$, [...]
Proof. Since the argument is very similar to the proof of Proposition 2.6, we will present only a sketch. We proceed by induction with respect to the number of vertices in $T$. For $|V| = 1$, we have $\xi_n(T) = 1$ for all $n$. If $|V| > 1$, we consider an arbitrary leaf $v$ of $T$ different from the root. Let us notice that by removing this vertex together with the adjacent edge, we obtain a new tree $\tilde T$ endowed with a structure of a Γ-tree, inherited from $T$. By arguments very similar to those given for Proposition 2.6 one can show that $\xi_n(T) = \xi_n(\tilde T) + o_T(1)$, which ends the induction step (again one has to consider two cases depending on the orientation of the edge adjacent to $v$ and for each of them use one of the assertions given in assumption (A3)).

Proof of Theorem 2.3
Since the Marchenko-Pastur distribution is determined by its moments, which are given by the right hand side of (2), it is enough to prove the first part of the theorem. A major part of the proof will follow the classical approach for matrices with independent entries, which will be complemented by Proposition 2.6.
Definition of ∆ graphs We will work with the class of ∆ graphs, following the definition given in [7] but slightly changing the formalism to better suit our needs.

1. For two sequences $i = (i_1,\dots,i_k)$ and $j = (j_1,\dots,j_k)$ of integers (not necessarily distinct) we define a ∆-graph $G(i,j)$ as a bipartite graph $(I_i, I_j, E)$, such that $I_i = \{i_1,\dots,i_k\}$ (upper indices), $I_j = \{j_1,\dots,j_k\}$ (lower indices) and the set $E$ of edges consists of $k$ directed edges from $i_u$ to $j_u$ ($u = 1,\dots,k$) and $k$ directed edges from $j_u$ to $i_{u+1}$ ($u = 1,\dots,k$), where we set $i_{k+1} = i_1$. We will also label the edges from 1 to $2k$ in the order $(i_1,j_1), (j_1,i_2), (i_2,j_2), \dots, (i_k,j_k), (j_k,i_1)$ (which clearly allows for the reconstruction of the indices $i$, $j$ from the graph $G(i,j)$). We stress that $I_i$ and $I_j$ may not be disjoint, but their common elements will be nevertheless treated as different objects when considered as upper and lower vertices of the graph.

3. Following the classical approach we will now introduce an equivalence relation on the pairs of indices $(i,j)$.
We will say that two pairs $(i,j)$ and $(i',j')$ are isomorphic if there exist injective functions $f$, $g$ from $I_i$, $I_j$ onto $I_{i'}$, $I_{j'}$ respectively such that for $u = 1,\dots,k$, $f(i_u) = i'_u$ and $g(j_u) = j'_u$.

4. We will call graphs $G(i,j)$ and $G(i',j')$ isomorphic (which we will denote by $G(i,j) \sim G(i',j')$) iff $(i,j)$ and $(i',j')$ are isomorphic.

5. Let $\Delta(k)$ be the set of representatives for the isomorphism classes of ∆ graphs $G(i,j)$ such that $i = (i_1,\dots,i_k)$, $j = (j_1,\dots,j_k)$ and $i_l \in \{1,\dots,k\}$, $j_l \in \{k+1,\dots,2k\}$. Note that any graph based on two sequences of length $k$ is isomorphic to a graph in $\Delta(k)$ (it is enough to properly relabel the vertices).
We will also define for $\Delta \in \Delta(k)$, $I_n^\Delta$ to be the set of all indices $i = (i_v)_{v \in V(\Delta)}$ such that
• for any two distinct upper vertices $v, w$, $i_v \ne i_w$,
• for any two distinct lower vertices $v, w$, $i_v \ne i_w$,
• for any upper vertex $v$, $i_v \in \{1,\dots,n\}$,
• for any lower vertex $v$, $i_v \in \{1,\dots,N_n\}$.
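The ∆-graph of point 1 and the relabelling used to pick representatives can be sketched in a few lines. This is an illustration only: it assumes that two pairs are isomorphic exactly when their upper and lower sequences agree after relabelling by order of first appearance, and it labels both classes from 1 rather than using $\{1,\dots,k\}$ and $\{k+1,\dots,2k\}$. All function names are hypothetical.

```python
def delta_graph(i, j):
    """Labelled edge list of G(i, j): edge 2u-1 is i_u -> j_u, edge 2u is j_u -> i_{u+1}."""
    k = len(i)
    assert len(j) == k
    edges = []
    for u in range(k):
        upper, lower = ("up", i[u]), ("low", j[u])   # tags keep equal labels apart
        next_upper = ("up", i[(u + 1) % k])           # cyclic convention i_{k+1} = i_1
        edges.append((upper, lower))
        edges.append((lower, next_upper))
    return edges

def recover_indices(edges):
    """Read the sequences (i, j) back from the labelled edge list."""
    i = tuple(src[1] for src, _ in edges[0::2])
    j = tuple(dst[1] for _, dst in edges[0::2])
    return i, j

def canonical_form(i, j):
    """Relabel upper and lower vertices separately by order of first appearance."""
    def relabel(seq):
        seen = {}
        return tuple(seen.setdefault(x, len(seen) + 1) for x in seq)
    return relabel(i), relabel(j)

edges = delta_graph((1, 2), (3, 3))
print(edges)
print(canonical_form((5, 7, 5), (2, 2, 9)))
```

Two pairs of sequences with the same canonical form give isomorphic graphs in the sense sketched here, so the canonical forms can serve as representatives of the isomorphism classes.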
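With the above notation we can write
$$\frac{1}{n}\mathbb{E}\operatorname{tr} M_n^k = \frac{1}{nN_n^k}\sum_{\Delta\in\Delta(k)}\ \sum_{i\in I_n^\Delta} \mathbb{E}\prod_{e\in E(\Delta)} X^{(n)}_{i_{u(e)}\, i_{d(e)}},$$
where $E(\Delta)$ is the set of edges of $\Delta$ and $u(e)$, $d(e)$ denote the upper and the lower end-vertex of the edge $e$.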
Now, still following [7], we can divide $\Delta(k)$ into three classes $\Delta_i(k)$, $i = 1, 2, 3$, where
• $\Delta_1(k)$ is the class of graphs, in which to each down edge there corresponds exactly one up edge with the same vertices and after merging the corresponding up and down edges and disregarding the orientation, one obtains a tree with $k + 1$ vertices,
• $\Delta_2(k)$ is the class of graphs in which after disregarding the orientation of edges there is an edge of multiplicity one,
• $\Delta_3(k)$ is the class of all the remaining graphs.
Note that by assumption (A2) for $\Delta \in \Delta_2(k)$ and every $i \in I_n^\Delta$, the expectation of the corresponding product of entries vanishes. Moreover, using connectedness, it is easy to see that each $\Delta \in \Delta_3(k)$ has at most $k$ vertices and therefore $|I_n^\Delta| \le \max(n, N_n)^k$. Since the expectation of each corresponding product is bounded by a constant depending only on $k$ and the constants in (A1), this implies that graphs from $\Delta_3(k)$ have no asymptotic contribution to $\frac{1}{n}\mathbb{E}\operatorname{tr} M_n^k$. Thus, just as in the classical case of independent entries, we are left with the analysis of the contribution coming from $\Delta_1(k)$. This is where we will apply Proposition 2.6.
For $r = 0,\dots,k-1$ let $\Delta_1(k,r)$ be the class of those graphs in $\Delta_1(k)$ which have exactly $r + 1$ upper vertices (which implies that there are $k - r$ lower vertices). It is well known (see e.g. Lemma 3.3 in [7]) that $|\Delta_1(k,r)| = \frac{1}{r+1}\binom{k}{r}\binom{k-1}{r}$. For each $\Delta \in \Delta_1(k,r)$ let $T(\Delta)$ be the rooted tree obtained from $\Delta$ in a way described when introducing the class $\Delta_1(k)$ (we choose the vertex $i_1$ to be the root). We have [...]
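As a sanity check of the counting formula, the first few values can be worked out by hand. The second column below assumes, as the reference to the right hand side of (2) suggests, that the limiting $k$-th moment equals $\sum_{r=0}^{k-1}|\Delta_1(k,r)|\,y^r$; this reading is an assumption, not a statement taken from the text.

```latex
\begin{align*}
k=1:&\quad |\Delta_1(1,0)| = 1, & \sum_{r}|\Delta_1(1,r)|\,y^r &= 1,\\
k=2:&\quad |\Delta_1(2,0)| = |\Delta_1(2,1)| = 1, & \sum_{r}|\Delta_1(2,r)|\,y^r &= 1 + y,\\
k=3:&\quad \bigl(|\Delta_1(3,r)|\bigr)_{r=0}^{2} = (1,3,1), & \sum_{r}|\Delta_1(3,r)|\,y^r &= 1 + 3y + y^2.
\end{align*}
```

For $y = 1$ the sums are $1, 2, 5$, the first Catalan numbers, in agreement with the classical count of trees contributing in the square case.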

Proofs of Theorem 2.4 and Corollary 2.5
Theorem 2.4 will also be proved with use of ∆-graphs. Let us remark that the proof of the corresponding theorem for matrices with independent entries given in [16] or [7], as well as its generalization in [13], use Stieltjes transform. However in [7] the authors mention that a combinatorial proof is possible and leave the details to the Reader as an exercise. Many of the formalities introduced below may be seen as a solution to this (rather involved) exercise and one of many possible ways to formalize the underlying combinatorics. We are not aware of any description of this particular problem in the literature (for specific models of random matrices the result may follow from general facts in free probability), but obviously in view of well known proofs of Wigner or Marchenko-Pastur theorems the methodology is rather standard and we do not claim any novelty here. Our main contribution is the observation given in Proposition 2.7 and its implications for the proof of Theorem 2.4 available beyond the case of independent random variables.
The combinatorial construction below will be introduced in full generality, however to illustrate it we provide two concrete examples which, while being relatively simple, capture the essential part of the argument. So as not to obscure the general idea we present these examples after the proof of Theorem 2.4.

1. For the proof of Theorem 2.4 let us again consider the bipartite graphs $G(i,j)$ as introduced in point 1 of Section 2.4. Although the basic definition of $G(i,j)$ remains the same, because of the shift of the random matrices by $z\,\mathrm{Id}$ we are forced to consider a different combinatorial structure on the family of graphs, in particular to distinguish a special class of perpendicular edges (as in [7], Chapter 10) and to change the notion of isomorphism (in a way which will preserve perpendicular edges).

2. In addition to the partition of edges into the classes of up and down edges we will call an edge perpendicular if its two end-vertices are equal (i.e. have equal labels) or skew if they are distinct. We will denote the set of perpendicular up (resp. down) edges by $UP(\Delta)$ (resp. $DP(\Delta)$) and the set of skew edges by $S(\Delta)$.

3. The pairs $(i,j)$ and $(i',j')$ are said to be isomorphic if there exist injective functions $f$, $g$ from $I_i$, $I_j$ onto $I_{i'}$, $I_{j'}$ respectively such that for $u = 1,\dots,k$, [...]
We will call two graphs $G(i,j)$ and $G(i',j')$ isomorphic iff the pairs $(i,j)$, $(i',j')$ are isomorphic in the above sense.

4. Let $\Delta(k)$ be the set of representatives for the isomorphism classes of ∆ graphs $G(i,j)$ where $i = (i_1,\dots,i_k)$, $j = (j_1,\dots,j_k)$ and $i_l, j_l \in \{1,\dots,2k\}$. Similarly as in the previous section any graph based on two sequences of length $k$ is isomorphic to a graph in $\Delta(k)$.
We will also define for $\Delta \in \Delta(k)$, $I_n^\Delta$ to be the set of all indices $i\colon V(\Delta) \to \{1,\dots,n\}$ such that
• for any two distinct upper indices $v, w$, $i_v \ne i_w$,
• for any two distinct lower indices $v, w$, $i_v \ne i_w$,
• for any edge $e$, $i_{u(e)} = i_{d(e)}$ iff $e$ is perpendicular.
Finally let us denote $W_n = n^{-1/2}A_n - z\,\mathrm{Id} = (w_{ij})_{i,j\le n}$. Note that to simplify the notation we have suppressed the superscript $(n)$ denoting the dependence of the random coefficients on $n$. In what follows we will keep the same convention and write $X_{ij}$ instead of $X^{(n)}_{ij}$. We have [...]
For a fixed $\Delta$ let $\Delta'$ be a graph obtained from $\Delta$ by replacing each pair of vertices connected with a perpendicular edge by one vertex and removing corresponding perpendicular edges (while keeping all the skew edges). Then $\Delta'$ has $|S(\Delta)|$ edges and is connected. For fixed $\Delta$ all the summands in the internal sum over $i$ on the right hand side above are bounded by a constant depending only on $k$ and $z$. Moreover $|I_n^\Delta| \le n^{|V(\Delta')|}$. Thus graphs $\Delta$ such that $\Delta'$ has fewer than $|S(\Delta)|/2 + 1$ vertices have no asymptotic contribution to $n^{-1}\mathbb{E}\operatorname{tr} H_n^k$. On the other hand let us notice that if for a skew edge $e = (v,w)$ of multiplicity 1, $(w,v)$ is not an edge of $\Delta'$, then the corresponding variable $X_{i_{u(e)} i_{d(e)}}$ appears in the product above exactly once and by (A2), the expectation of the product vanishes. Thus the only graphs with nonzero asymptotic contribution are those $\Delta$'s for which each skew edge $e$ treated as an undirected edge appears at least twice and $\Delta'$ has at least $|S(\Delta)|/2 + 1$ vertices. But the former condition means that the number $a$ of edges of the graph $\Delta''$ obtained from $\Delta'$ by identifying corresponding up and down edges (i.e. edges with the same end points in $\Delta'$) is at most $|S(\Delta)|/2$. Thus together with the number $b$ of vertices of $\Delta''$ it satisfies the inequality $b \ge a + 1$. Since $\Delta''$ is also connected, this means that it is a tree (and $b = a + 1 = |S(\Delta)|/2 + 1$). Moreover, since the cycle in $\Delta'$ inherited from $\Delta$ corresponds to a walk in $\Delta''$ which visits all vertices and returns to the vertex of departure, it means that all skew edges in $\Delta'$ appear exactly twice, once as an up edge and once as a down edge.
One can also see that among the perpendicular edges connecting any two vertices of ∆ there are equal numbers of up-edges and down-edges. Therefore we have as can be seen by expanding the product over e ∈ U P(∆) ∪ DP(∆) into a sum.
Proof of Corollary 2.5. Let $\bar L_n = \mathbb{E}L_n$. By Theorem 2.4, for any $k \in \mathbb{N}$ we have the convergence $\int x^k\,\bar L_n(z)(dx) \to \mu_k(|z|^2)$. This already implies tightness of $\bar L_n$ and existence of a probability measure $L_\infty$ with moments $\mu_k(|z|^2)$ together with a subsequence $\bar L_{n_k}$ converging to $L_\infty$. Since in the special case of i.i.d. Gaussian entries the measure $\bar L_n$ is known to converge to a compactly supported measure, we conclude that the sequence of moments $\mu_k(|z|^2)$ determines the measure $L_\infty$ and in fact the whole sequence $\bar L_n$ converges to $L_\infty$.
Recall that the Stieltjes transform of a probability measure $\nu$ on $\mathbb{R}$ is the function $S_\nu\colon \mathbb{C}_+ \to \mathbb{C}$ given by the formula
$$S_\nu(\alpha) = \int_{\mathbb{R}} \frac{1}{\lambda - \alpha}\,\nu(d\lambda).$$
It is classical that for probability measures $\nu$, $\nu_n$ on $\mathbb{R}$, the pointwise convergence $S_{\nu_n} \to S_\nu$ implies the weak convergence of $\nu_n$ to $\nu$ (see e.g. [7], Chapter 12).
Let $S_n\colon \mathbb{C}_+ \to \mathbb{C}$ be the Stieltjes transform of $L_n$. By Lemma 4.1 in the Appendix and the Borel-Cantelli Lemma $|S_n(\alpha) - \mathbb{E}S_n(\alpha)| \to 0$ almost surely. Thus with probability one $S_n(\alpha) - \mathbb{E}S_n(\alpha) \to 0$ for every $\alpha \in D$, where $D$ is a countable dense subset of $\mathbb{C}_+$. But $\mathbb{E}S_n(\alpha)$ is the Stieltjes transform of $\bar L_n$ and thus converges to $S_{L_\infty}(\alpha)$, the Stieltjes transform of $L_\infty$, which shows that with probability one, for every $\alpha \in D$, $S_n(\alpha) \to S_{L_\infty}(\alpha)$. Since $S_n$'s are analytic on $\mathbb{C}_+$ and jointly bounded on compact subsets of $\mathbb{C}_+$, a standard application of Montel's theorem shows that with probability one $S_n(\alpha) \to S_{L_\infty}(\alpha)$ for all $\alpha \in \mathbb{C}_+$, which implies that with probability one, $L_n$ converges weakly to $L_\infty$.
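The concentration of the Stieltjes transform used in this proof can be observed numerically. The sketch below assumes the convention $S_\nu(\alpha) = \int (\lambda - \alpha)^{-1}\,\nu(d\lambda)$ and $H_n(z) = W_nW_n^*$ with $W_n = n^{-1/2}A_n - z\,\mathrm{Id}$, and uses i.i.d. Gaussian entries for simplicity; the function names are hypothetical.

```python
import numpy as np

def stieltjes(eigs, alpha):
    """S(alpha) = (1/n) * sum_i 1 / (lambda_i - alpha) for the empirical spectral measure."""
    return np.mean(1.0 / (eigs - alpha))

def spectrum(n, z, rng):
    """Eigenvalues of H = W W^*, W = G / sqrt(n) - z Id, G with i.i.d. N(0,1) entries."""
    W = rng.standard_normal((n, n)) / np.sqrt(n) - z * np.eye(n)
    return np.linalg.svd(W, compute_uv=False) ** 2

rng = np.random.default_rng(3)
alpha = 1.0 + 0.5j
gaps, values = {}, {}
for n in (50, 200, 400):
    a = stieltjes(spectrum(n, 0.3, rng), alpha)   # first independent sample
    b = stieltjes(spectrum(n, 0.3, rng), alpha)   # second independent sample
    gaps[n], values[n] = abs(a - b), a
    print(n, gaps[n])
```

Two independent realizations give nearly the same value of $S_n(\alpha)$ for large $n$, which is the behaviour the argument relies on before passing to a countable dense set of $\alpha$.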
3 The circular law for matrices with independent log-concave unconditional columns

The main result
Before we formulate the main theorem of this section let us recall the basic definitions. Log-concave measures are usually defined in terms of Brunn-Minkowski inequalities. However, as proved by Borell (see [11] for general theory), log-concave measures not supported on a proper hyperplane can be equivalently characterized in terms of densities. Since we will deal only with isotropic measures (see Definition 2.1) we will therefore use the following [...]
For an $n \times n$ matrix $A$ let $\mu_A$ denote the spectral measure of $A$ defined as
$$\mu_A = \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i},$$
where $\lambda_i$ are (complex) eigenvalues of $A$.
Recall now the classical circular law in its most general form, obtained recently by Tao and Vu [36].
Theorem 3.3. Let $A_n = [X_{ij}]_{i,j\le n}$, where $(X_{ij})_{i,j<\infty}$ is an infinite array of independent mean zero, variance one random variables. Then with probability one, the spectral measure $\mu_{\frac{1}{\sqrt{n}}A_n}$ converges weakly to the uniform distribution on the unit disc in $\mathbb{C}$.
The main result of this section is the following version of the circular law which allows one to replace the independence assumption on the entries by a geometric condition of log-concavity and unconditionality.
Theorem 3.4. Let $A_n$ be a sequence of $n \times n$ random matrices with independent rows $X^{(n)}_1,\dots,X^{(n)}_n$ (defined on the same probability space). Assume that for each $n$ and $i \le n$, $X^{(n)}_i$ has a log-concave unconditional isotropic distribution. Then, with probability one, the spectral measure $\mu_{\frac{1}{\sqrt{n}}A_n}$ converges weakly to the uniform distribution on the unit disc in $\mathbb{C}$.
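A quick simulation illustrates the statement of Theorem 3.4 for one concrete row distribution. The sketch assumes the normalization $\frac{1}{\sqrt{n}}A_n$ and takes rows uniform on the Euclidean ball of radius $\sqrt{n+2}$, which is log-concave, unconditional and isotropic; the function names are hypothetical.

```python
import numpy as np

def ball_rows(n, rng):
    """n independent rows, each uniform on the Euclidean ball of radius sqrt(n + 2) in R^n."""
    G = rng.standard_normal((n, n))
    directions = G / np.linalg.norm(G, axis=1, keepdims=True)
    # The radius of a uniform point in the unit ball has the law of U ** (1/n).
    radii = np.sqrt(n + 2) * rng.random((n, 1)) ** (1.0 / n)
    return radii * directions

def fraction_in_disc(eigs, r):
    """Fraction of eigenvalues with modulus at most r."""
    return float(np.mean(np.abs(eigs) <= r))

rng = np.random.default_rng(4)
n = 300
A = ball_rows(n, rng)
eigs = np.linalg.eigvals(A / np.sqrt(n))
fractions = {r: fraction_in_disc(eigs, r) for r in (0.5, 0.8, 1.2)}
print(fractions)
```

Under the uniform law on the unit disc the disc of radius $r \le 1$ has mass $r^2$, so the printed fractions should be close to $0.25$, $0.64$ and $1$.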

Preliminary facts on log-concave measures
In this section we will gather the results on log-concave measures which will be used in the proof of Theorem 3.4 (some of them have been already briefly mentioned in Section 2 during the discussion of assumptions (A1)-(A3)).
Let us start with the concentration result for the Euclidean norm (which we will denote by | · |) obtained by Klartag [24], which will ensure that the condition (A3) is satisfied, allowing us to use the results of Section 2.
Theorem 3.5. Let $X$ be an isotropic log-concave random vector in $\mathbb{R}^n$. There exist universal positive constants $C$ and $c$ such that for all $\varepsilon \in (0, 1)$,

The next theorem is a recent small ball inequality by Paouris [29, Theorem 6.2].

Theorem 3.6. Let $X$ be an isotropic log-concave random vector in $\mathbb{R}^n$ and let $A$ be an $n \times n$ real nonzero matrix. Then for $y \in \mathbb{R}^n$ and $\varepsilon \in (0, c_1)$, where $c_1 > 0$ is a universal constant.
Recall that the singular values of an $m \times n$ matrix $A$ are the eigenvalues of $\sqrt{A^* A}$ (denote them by $\sigma_1 \ge \dots \ge \sigma_n$). In particular $\sigma_1 = \sup_{x \in S^{n-1}} |Ax| = \|A\|_{2 \to 2}$ (the operator norm of the matrix) and $\sigma_n = \inf_{x \in S^{n-1}} |Ax|$.
Similarly as in the classical case of matrices with i.i.d. entries, we will need control on the smallest and the largest singular value of the matrix $A_n$. To control the latter we will use a recent result from [3] specialized to square matrices. (We would like to remark in passing that for the purpose of proving the circular law, a weaker estimate following e.g. from Theorem 3.5 would be enough; we state the strong estimate since it gives the optimal bound on the operator norm.)
Theorem 3.7. If $A_n$ is an $n \times n$ matrix with independent log-concave isotropic rows, then with probability at least $1 - \exp(-c\sqrt{n})$, $\|A_n\|_{2 \to 2} \le C\sqrt{n}$, where $C$, $c$ are universal constants.
The smallest singular value of random matrices with independent entries has been thoroughly investigated e.g. in [31,35,25]. For random matrices with independent log-concave rows/columns it has been considered in [2,1]. For the purpose of proving the circular law, it is enough to have a relatively crude bound on the smallest singular value of the matrix $A_n - z\mathrm{Id}$, which we provide below. A similar argument in the case of matrices with independent heavy-tailed entries can be found e.g. in [10].

Proposition 3.8. If $A_n$ is an $n \times n$ matrix with independent log-concave isotropic rows, and $M_n$ is an $n \times n$ deterministic complex matrix, then with probability at least $1 - n^{-2}$, the smallest singular value of $A_n + M_n$ is greater than $cn^{-3.5}$, where $c$ is a universal constant.
Proof. Denote the smallest singular value of $A_n + M_n$ by $\sigma_n$. Denote the rows of $A_n$ by $X_i$ and the rows of $M_n$ by $Y_i$ ($i = 1, \dots, n$). It is classical (see e.g. [31,10]) that
$$\sigma_n \ge n^{-1/2} \min_{i \le n} \operatorname{dist}(X_i + Y_i, H_i),$$
where $H_i$ is the linear span of all the vectors $X_j + Y_j$ except for the $i$-th one. Thus Consider any $i \le n$. Since $H_i$ is independent of $X_i$, by the Fubini theorem we have P(dist( . Let $\eta = x + iy$, where $x, y \in \mathbb{R}^n$, be any unit vector perpendicular to $H_i$ (selected in a measurable way). Then Since $|x|^2 + |y|^2 = 1$, at least one of the vectors $x$, $y$ has Euclidean norm not smaller than $1/\sqrt{2}$. Without loss of generality assume that $|x| \ge 1/\sqrt{2}$.
But 〈X i , x〉 is a log-concave one-dimensional variable of variance at least 1/2. It is well known that such variables have densities bounded from above by an absolute constant (see [21]). Therefore , which together with (5) proves the proposition.
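The exponents in Proposition 3.8 can be traced with a short computation. This is only a sketch under two assumptions of ours: that the classical bound is used in the form $\sigma_n \ge n^{-1/2}\min_{i}\operatorname{dist}(X_i + Y_i, H_i)$, and that the bounded density of $\langle X_i, x\rangle$ yields $\mathbb{P}(\operatorname{dist}(X_i + Y_i, H_i) \le \varepsilon) \le C\varepsilon$ for a universal constant $C$ (a symbol introduced here).

```latex
\begin{align*}
\mathbb{P}\big(\sigma_n \le \varepsilon n^{-1/2}\big)
  &\le \sum_{i=1}^{n} \mathbb{P}\big(\operatorname{dist}(X_i + Y_i, H_i) \le \varepsilon\big)
   \le C n \varepsilon, \\
\varepsilon = \frac{1}{C n^{3}}
  &\implies
  \mathbb{P}\big(\sigma_n \le C^{-1} n^{-3.5}\big) \le n^{-2}.
\end{align*}
```

With $c = C^{-1}$ this matches the bound $cn^{-3.5}$ and the probability $1 - n^{-2}$ in the statement; the union bound over $n$ rows costs one power of $n$ and the factor $n^{-1/2}$ accounts for the remaining half power.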

Proof of Theorem 3.4
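The proof will be based on the following replacement principle due to Tao and Vu [36].

Theorem 3.9. Let $A_n$, $B_n$ be sequences of $n \times n$ random matrices. Assume that
(i) the expression $\frac{1}{n^2}\|A_n\|_{HS}^2 + \frac{1}{n^2}\|B_n\|_{HS}^2$ is almost surely bounded,
(ii) for almost all $z \in \mathbb{C}$, $\frac{1}{n}\log\big|\det\big(\frac{1}{\sqrt{n}}A_n - z\mathrm{Id}\big)\big| - \frac{1}{n}\log\big|\det\big(\frac{1}{\sqrt{n}}B_n - z\mathrm{Id}\big)\big|$ converges almost surely to 0.
Then $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges almost surely to 0.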
To apply the above theorem, let $B_n = (g_{ij})_{i,j \le n}$, where $(g_{ij})_{i,j}$ is an infinite array of independent standard Gaussian variables (we may assume that they are defined on the same probability space as the vectors $X_i^{(n)}$ defining the matrices $A_n$). We will prove that the hypotheses of Theorem 3.9 are satisfied. To this end we will very closely follow the main steps of the proof given by Tao and Vu for the independent case [36]. Due to the lack of independence of all the entries, we cannot use their tools verbatim; however, at each step we will be able to replace them by appropriate counterparts available in the log-concave setting (in fact, given the 'log-concave toolbox' of Section 3.2, all the steps will be easier in our case).
Once we check the assumptions of Theorem 3.9, the proof will be concluded, since $\mu_{\frac{1}{\sqrt{n}} B_n}$ converges almost surely to the uniform measure on the unit disc.
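The aim is thus to prove that for any $z \in \mathbb{C}$, with probability one,
$$\frac{1}{n}\log\Big|\det\Big(\frac{1}{\sqrt{n}}A_n - z\mathrm{Id}\Big)\Big| - \frac{1}{n}\log\Big|\det\Big(\frac{1}{\sqrt{n}}B_n - z\mathrm{Id}\Big)\Big| \to 0.$$
To simplify the notation, from now on we will suppress the superscript $(n)$ and denote the row vectors of $\frac{1}{\sqrt{n}}A_n - z\mathrm{Id}$ by $Z_1, \dots, Z_n$. Similarly, let $Y_1, \dots, Y_n$ be the row vectors of $\frac{1}{\sqrt{n}}B_n - z\mathrm{Id}$. Since
$$\Big|\det\Big(\frac{1}{\sqrt{n}}A_n - z\mathrm{Id}\Big)\Big| = \prod_{i=1}^{n} \operatorname{dist}(Z_i, V_i),$$
where $V_i$ is the span of $Z_1, \dots, Z_{i-1}$, and similarly
$$\Big|\det\Big(\frac{1}{\sqrt{n}}B_n - z\mathrm{Id}\Big)\Big| = \prod_{i=1}^{n} \operatorname{dist}(Y_i, U_i),$$
where $U_i$ is the span of $Y_1, \dots, Y_{i-1}$, the goal is to prove that
$$\frac{1}{n}\sum_{i=1}^{n} \log \operatorname{dist}(Z_i, V_i) - \frac{1}{n}\sum_{i=1}^{n} \log \operatorname{dist}(Y_i, U_i) \to 0$$
with probability one.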
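The reduction from determinants to distances rests on the base-times-height formula $|\det M| = \prod_i \operatorname{dist}(Z_i, V_i)$. The following Python sketch checks it numerically for a small Gaussian matrix shifted by $z\,\mathrm{Id}$; all function names, the size $n = 8$ and the value of $z$ are illustrative choices of ours.

```python
import math
import random


def inner(u, v):
    """Hermitian inner product <u, v> = sum_k u_k * conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def row_distances(rows):
    """dist(Z_i, span(Z_1, ..., Z_{i-1})) for each row, by Gram-Schmidt."""
    basis, dists = [], []
    for z in rows:
        r = list(z)
        for q in basis:
            c = inner(r, q)  # coefficient of r along the unit vector q
            r = [a - c * b for a, b in zip(r, q)]
        d = math.sqrt(inner(r, r).real)  # length of the orthogonal remainder
        dists.append(d)
        basis.append([a / d for a in r])
    return dists


def log_abs_det(rows):
    """log |det| by Gaussian elimination with partial pivoting."""
    a = [[complex(v) for v in r] for r in rows]
    n = len(a)
    total = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]  # a row swap only changes the sign of det
        total += math.log(abs(a[k][k]))
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return total


def shifted_matrix(n, z, rng):
    """The matrix n^{-1/2} B_n - z Id with i.i.d. standard Gaussian entries in B_n."""
    s = math.sqrt(n)
    return [
        [rng.gauss(0.0, 1.0) / s - (z if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    m = shifted_matrix(8, 0.3 + 0.2j, random.Random(1))
    print(log_abs_det(m), sum(math.log(d) for d in row_distances(m)))
```

The two printed numbers should agree up to rounding error, which is the identity used to pass from log-determinants to sums of logarithms of distances.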
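Let us now recall the following identity.

Lemma 3.10. Let $1 \le n' \le n$ and let $M$ be a full rank $n' \times n$ matrix with singular values $\sigma_1(M) \ge \dots \ge \sigma_{n'}(M) > 0$ and rows $X_1, \dots, X_{n'}$. For each $j \le n'$ let $W_j$ be the span of the rows $X_k$, $k \ne j$. Then
$$\sum_{j=1}^{n'} \sigma_j(M)^{-2} = \sum_{j=1}^{n'} \operatorname{dist}(X_j, W_j)^{-2}.$$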
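The negative second moment identity, $\sum_j \sigma_j(M)^{-2} = \sum_j \operatorname{dist}(X_j, W_j)^{-2}$ for a full-rank matrix $M$ with rows $X_j$ and $W_j$ the span of the remaining rows, can be sanity-checked numerically. The sketch below (real matrices only, function names ours) computes the left-hand side as the trace of $(MM^{T})^{-1}$, which equals $\sum_j \sigma_j^{-2}$, and the right-hand side by explicit orthogonal projection.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dist_to_span(x, others):
    """Distance from x to the span of `others` (assumed linearly independent)."""
    basis = []
    for w in list(others) + [x]:
        r = list(w)
        for q in basis:
            c = dot(r, q)
            r = [a - c * b for a, b in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        basis.append([a / norm for a in r])
    # x was processed last, so `norm` is the length of its orthogonal remainder.
    return norm


def trace_of_inverse(g):
    """Trace of the inverse of a square matrix, by Gauss-Jordan elimination."""
    n = len(g)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(g)
    ]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        piv = aug[k][k]
        aug[k] = [v / piv for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[k])]
    return sum(aug[i][n + i] for i in range(n))


def both_sides(rows):
    """Return (sum of sigma_j^{-2}, sum of dist(X_j, W_j)^{-2}) for the given rows."""
    gram = [[dot(u, v) for v in rows] for u in rows]  # M M^T
    lhs = trace_of_inverse(gram)
    rhs = sum(
        dist_to_span(x, rows[:j] + rows[j + 1:]) ** -2
        for j, x in enumerate(rows)
    )
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(2)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(5)]
    print(both_sides(rows))
```

The check works because the $j$-th diagonal entry of $(MM^{T})^{-1}$ is exactly $\operatorname{dist}(X_j, W_j)^{-2}$; the code computes the two sides by independent routes, so agreement is a genuine test rather than a tautology.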
Apply the above lemma to $M = M_n = \frac{1}{\sqrt{n}}A_n - z\mathrm{Id}$ (respectively $M = M_n = \frac{1}{\sqrt{n}}B_n - z\mathrm{Id}$). Note that by Proposition 3.8 and the Borel-Cantelli lemma, with probability one $\sigma_i(M_n) \ge cn^{-4}$ for all sufficiently large $n$ and $i \le n$. Thus with probability one, Moreover, by Theorem 3.5, with probability one for large $n$ the vectors $Z_i$, $Y_i$ have Euclidean lengths of the order $O_z(1)$, which yields that with probability one for large $n$ and all $i \le n$, To obtain (6) it is thus sufficient to prove that with probability one, Here we again follow Tao and Vu and divide the proof of the above convergence into two lemmas.

Proof of Lemma 3.11
It is clearly enough to consider the part of the sum in question corresponding to the $Z_i$'s. Let us first notice that (7) implies that with probability one, for δ < , Thus it is sufficient to deal with the negative part of $\log \operatorname{dist}(Z_i, V_i)$. We will show that for any $i \le n - n^{0.99}$, for some universal constant $c > 0$.
To this end we will prove the following lemma, which is a counterpart of Proposition 5.1 in [36].
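Proof. Recall that $\sqrt{n}Z_i = X_i - \sqrt{n}ze_i$. If we denote by $P$ the orthogonal projection onto $W^\perp$, we get
$$\operatorname{dist}(Z_i, W) = |PZ_i| = \frac{1}{\sqrt{n}}\big|PX_i - \sqrt{n}zPe_i\big|.$$
Since $W$ is of dimension $d$, we have $\operatorname{rank} P = n - d$, hence $\|P\|_{HS} = \sqrt{n - d}$, whereas $\|P\|_{2 \to 2}$ is clearly equal to 1. The only difficulty in applying Theorem 3.6 with $y = \sqrt{n}zPe_i$ and $A = P$ is that the matrix $P$ and the vector $\sqrt{n}zPe_i$ are complex. This can, however, be easily overcome by identifying $\mathbb{C}^n$ with $\mathbb{R}^{2n}$, writing $y = (\operatorname{Re}\sqrt{n}zPe_i, \operatorname{Im}\sqrt{n}zPe_i)$ and noticing that $PX_i = A(X_i, \tilde{X}_i)$, where $\tilde{X}_i$ is an independent copy of $X_i$ and
$$A = \begin{pmatrix} \operatorname{Re}P & 0 \\ \operatorname{Im}P & 0 \end{pmatrix}.$$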
We have $\|A\|_{HS} = \|P\|_{HS} = \sqrt{n - d}$ and $\|A\|_{2 \to 2} \le 1$, which together with Theorem 3.6 ends the proof of the lemma.
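For completeness, the two norm computations used in this proof can be written out as follows. This is a routine verification in the notation of the proof, where $P$ is the orthogonal projection onto $W^\perp$, $A$ is the real $2n \times 2n$ block matrix with first block column $(\operatorname{Re}P, \operatorname{Im}P)$ and zero second block column, and $u, v \in \mathbb{R}^n$ are arbitrary vectors introduced here.

```latex
\begin{align*}
\|P\|_{HS}^2 &= \operatorname{tr}(P^* P) = \operatorname{tr}(P) = \operatorname{rank} P = n - d, \\
\|A\|_{HS}^2 &= \|\operatorname{Re}P\|_{HS}^2 + \|\operatorname{Im}P\|_{HS}^2
              = \sum_{j,k} |P_{jk}|^2 = \|P\|_{HS}^2, \\
|A(u, v)|^2 &= |(\operatorname{Re}P)u|^2 + |(\operatorname{Im}P)u|^2 = |Pu|^2 \le |u|^2 \le |(u, v)|^2 .
\end{align*}
```

The first line uses $P^* P = P^2 = P$ for an orthogonal projection, and the last line gives $\|A\|_{2 \to 2} \le 1$.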
Since with probability one $V_i$ is of dimension $i - 1$ and is independent of $X_i^{(n)}$, we can apply Lemma 3.13 conditionally on $V_i$ to get (10). Thus by the Borel-Cantelli lemma, with probability one, for $n$ large enough and all $1 \le i \le n - n^{0.99}$ we have $\operatorname{dist}(Z_i, V_i) \ge c\sqrt{1 - i/n}$ and so for δ < 2 we get

Remark
In fact, for the purpose of proving the circular law, instead of the small ball inequality by Paouris one could use Klartag's thin shell inequality in the argument above (however, the argument in this case seems to be slightly more technical).

Proof of Lemma 3.12
Let $n' = (1 - \delta)n$ and let $A_{n,n'}$ (resp. $B_{n,n'}$) be the matrix with rows $\sqrt{n}Z_1, \dots, \sqrt{n}Z_{n'}$ (resp. $\sqrt{n}Y_1, \dots, \sqrt{n}Y_{n'}$). Let $L_{\frac{1}{n}A_{n,n'}A_{n,n'}^*}$ (resp. $L_{\frac{1}{n}B_{n,n'}B_{n,n'}^*}$) be the spectral distribution of $\frac{1}{n}A_{n,n'}A_{n,n'}^*$ (resp. $\frac{1}{n}B_{n,n'}B_{n,n'}^*$). Similarly as in [36], one can show that (9) is equivalent to where $\nu_{n,n'} = L_{\frac{1}{n}A_{n,n'}A_{n,n'}^*} - L_{\frac{1}{n}B_{n,n'}B_{n,n'}^*}$.
The proof of (11) consists in splitting the integral range into several regions.

Final remarks
We would like to point out that the assumption on unconditionality of the rows of the matrix $A_n$ was used only in the proof of Corollary 2.5; all the other ingredients in the proof of the circular law work well in the case of general isotropic log-concave vectors (in particular one can replace unconditionality with assumption (A2)).
Finally let us mention that both the small ball inequality (Theorem 3.6), used by us to obtain (10), and the condition (A3), necessary to apply Corollary 2.5, can be replaced e.g. by a Poincaré inequality, i.e. by the assumption that $\operatorname{Var} f(X_i) \le C\,\mathbb{E}|\nabla f(X_i)|^2$ for all smooth functions $f \colon \mathbb{R}^n \to \mathbb{R}$ (see e.g. [5] for a general exposition), or by some other sufficiently strong concentration result for Lipschitz functions (it is a long-standing conjecture that isotropic log-concave measures satisfy the Poincaré inequality with a universal constant [23]). If we keep the assumption of unconditionality and instead of log-concavity we assume that all one-dimensional marginals of the row vectors of the matrix $A_n$ have densities bounded by a universal constant (or just a constant depending polynomially on the dimension), we can obtain a suitable version of Proposition 3.8 and repeat the whole proof of the circular law above to get the following theorem.

Theorem 3.14. Let $A_n$ be a sequence of $n \times n$ random matrices with independent isotropic unconditional rows $X_i^{(n)}$, such that
• the law of $X_i^{(n)}$ satisfies the Poincaré inequality with a constant independent of $n$,
• there exists $\gamma < \infty$ such that for all $x \in S^{n-1}$ and all $n$, $i$ 〈X
Then the empirical spectral distribution of $\frac{1}{\sqrt{n}}A_n$ converges almost surely to the uniform measure on the unit disk.
Remark After this paper was accepted for publication the author has been able to remove the assumption of unconditionality from Theorems 3.4 and 3.14. The proof relies heavily on the Stieltjes transform techniques (motivated by [13]) and will be presented in a separate article.

Appendix. Concentration of the Stieltjes transform for matrices with independent rows
We will now present a concentration inequality for the Stieltjes transform of the spectral distribution of random matrices of the form $AA^*$, where $A$ is a random matrix with independent rows. The argument is pretty standard and we include it here just for the sake of completeness. Its various versions can be found e.g. in [7, p. 313] in the case of matrices with independent entries; however (as noted in [14]), in fact just independence of rows is required.

Proof. Let $\mathbb{E}_k$ denote the expectation with respect to the last $n - k$ rows of the matrix $A_n$ and let $X_k$ be the $k$-th row of $A$, $W_k$ the matrix consisting of the remaining $n - 1$ rows of $A$ and $H_k = W_k W_k^*$. Then, as one can easily check, We have which implies that $|\gamma_k| \le 2y^{-1}$.