Limit theorems for linear eigenvalue statistics of overlapping matrices

The paper proves several limit theorems for linear eigenvalue statistics of overlapping Wigner and sample covariance matrices. It is shown that the covariance of the limiting multivariate Gaussian distribution is diagonalized by choosing the Chebyshev polynomials of the first kind as the basis for the space of test functions. For Chebyshev polynomials of sufficiently high degree, the covariance of the linear statistics depends only on the first two moments of the matrix entries. The proofs are based on a graph-theoretic interpretation of the Chebyshev linear statistics as sums over non-backtracking cyclic paths.


INTRODUCTION
Recently there has been a surge of interest in the spectral properties of overlapping submatrices of large random matrices. A seminal study is due to Baryshnikov [3], who derived the joint eigenvalue distribution for principal minors of the Gaussian Unitary Ensemble (GUE) and related this distribution to the last passage percolation problem. Later, Johansson and Nordenstam [14] established the determinantal structure of this joint distribution, and their results were generalized in [10], [11] and [17] to other unitarily invariant ensembles of random matrices. Recently, Borodin [4] and Reed [18] obtained limit theorems for eigenvalue statistics of overlapping real Gaussian matrices. We extend these results further to Wigner and sample covariance matrices, which lack rotational invariance.

1.1. Wigner matrices. In our Wigner matrix model, A_N and B_N are two Hermitian matrices which are submatrices of an infinite Hermitian matrix X. The matrix X has independent identically distributed entries above the main diagonal and independent entries (with a possibly different distribution) on the main diagonal. We impose the following assumptions on the matrix entries. Let the off-diagonal entries of X be real random variables with all moments finite, and assume that E(X_ij) = 0, E(X_ij^2) = 1, and E(X_ij^4) = m_4. In addition, assume that the diagonal entries have all moments finite and, in particular, E(X_ii) = 0.

We think of A_N and B_N as sequences of matrices of increasing size, and we assume that there is a parameter t_N that approaches infinity as N → ∞ and that the quantities a^(N)/t_N, b^(N)/t_N, and ∆^(N)/t_N approach some positive limits a, b, and ∆, respectively.
We define the normalized matrices as follows. The normalization is chosen in such a way that the empirical distribution of eigenvalues of A_N and B_N converges to a distribution supported on the interval [−1, 1]. If f : R → R is a test function, then we define the corresponding linear statistic for the matrix A_N as N(f, A_N) = Σ_i f(λ_i), where λ_i are the eigenvalues of the matrix A_N; that is, it is the linear statistic of the eigenvalues of A_N after rescaling. We define linear statistics for the matrix B_N similarly. Let us also define the centered statistics N_o(f, A_N) := N(f, A_N) − E N(f, A_N). We are interested in the joint distribution of these linear statistics when N is large. Recall that the Chebyshev polynomials of the first kind are defined by the formula T_k(cos θ) = cos kθ. We will prove the following result.
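As a numerical sketch of this setup (the Gaussian entries, the matrix size N = 400, and the normalization 1/(2√N) are our illustrative assumptions, since the paper's display is not reproduced above), one can check that the rescaled spectrum concentrates on [−1, 1] and evaluate a Chebyshev linear statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400

# Sample a real Wigner matrix: i.i.d. N(0,1) entries above the diagonal,
# symmetrized, with an independent N(0,1) diagonal.
X = rng.standard_normal((N, N))
H = np.triu(X, 1)
H = H + H.T + np.diag(rng.standard_normal(N))

# Hypothetical normalization 1/(2*sqrt(N)) puts the limiting spectrum on [-1, 1].
A = H / (2.0 * np.sqrt(N))
eigs = np.linalg.eigvalsh(A)

# Linear eigenvalue statistic N(T_2, A) for the test function T_2(x) = 2x^2 - 1.
T2 = lambda x: 2.0 * x**2 - 1.0
stat = np.sum(T2(eigs))

print(eigs.min(), eigs.max(), stat)
```

For this sample the extreme eigenvalues land close to ±1, in line with the claim about the support of the limiting spectral distribution.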
Theorem 1.1. Assume that A_N and B_N are real Hermitian overlapping matrices as above, and let T_k(x) denote the Chebyshev polynomials of the first kind. As N → ∞, the joint distribution of the centered linear statistics N_o(T_k, A_N) and N_o(T_l, B_N) converges to a bivariate Gaussian distribution with the covariance given by formula (1).

For the author, the motivation for this model came from the paper by Borodin [4], where a similar problem was considered and solved for matrices whose entries have the same first four moments as the Gaussian random variable. Formula (1) shows an interesting fact: the covariance structure of Chebyshev polynomials of degree higher than two depends only on the first two moments of the off-diagonal matrix entries.

FIGURE 2. Overlapping non-symmetric matrices

1.2. Sample covariance matrices. Next, we study the singular values of non-symmetric overlapping matrices. This is equivalent to studying the eigenvalues of certain sample covariance matrices. Namely, let A_N and B_N be two submatrices of an infinite random matrix X with independent identically distributed entries. Let the entries of X be real random variables with all moments finite and assume that E(X_ij) = 0, E(X_ij^2) = 1, and E(X_ij^4) = m_4. Suppose that A_N has the dimensions shown in Figure 2. As before, we think of A_N and B_N as sequences of matrices of increasing size, and we assume that there is a parameter t_N that approaches infinity as N → ∞ and that the quantities a_1^(N)/t_N, a_2^(N)/t_N, b_1^(N)/t_N, b_2^(N)/t_N, ∆_1^(N)/t_N, and ∆_2^(N)/t_N approach some positive limits a_1, a_2, b_1, b_2, ∆_1, and ∆_2, respectively.
Let us define the normalized sample covariance matrices W_{A_N} and W_{B_N}, which are rescaled versions of A_N A_N^* and B_N B_N^* shifted by a multiple of the identity matrix. Again, the normalization is chosen in such a way that the empirical distribution of eigenvalues of W_{A_N} and W_{B_N} converges to a distribution supported on the interval [−1, 1]. If f : R → R is a test function, then we define the corresponding linear statistic N(f, W_{A_N}) = Σ_i f(λ_i), where λ_i are the eigenvalues of the matrix W_{A_N}. Hence, this is a linear statistic of the squares of the singular values of A_N. We define linear statistics for the matrix W_{B_N} similarly, as well as the centered statistics.

Theorem 1.2. Assume that A_N and B_N are real random overlapping matrices whose entries have the moments described above, and let T_k(x) denote the Chebyshev polynomials of the first kind. As N → ∞, the joint distribution of the centered linear statistics N_o(T_k, W_{A_N}) and N_o(T_l, W_{B_N}) converges to a bivariate Gaussian distribution with explicitly given covariance.

1.3. CLT for continuously differentiable functions. In order to extend our results to continuously differentiable functions, we have to restrict the scope to models with matrix entries that satisfy the Poincaré inequality. The reason is that we need a measure concentration inequality which has been proven only for these models. Recall that a matrix entry X_ij satisfies the Poincaré inequality if there is a constant c > 0 such that for every continuously differentiable function f(x) we have Var(f(X_ij)) ≤ c E(|f′(X_ij)|^2). For example, the Poincaré inequality holds for models with Gaussian entries or with complex entries uniformly distributed on the unit circle, but not for the model with ±1 entries. In this section we consider only the case of Wigner matrices with real entries that satisfy the Poincaré inequality. The results can be extended to the case of sample covariance matrices and to the complex case.
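The Gaussian case of the Poincaré inequality can be checked numerically. It is classical that the standard Gaussian satisfies Var f(Z) ≤ c E[f′(Z)²] with c = 1; the sketch below verifies this for the test function f(x) = sin(x) by integrating against the N(0,1) density (the function choice and grid are ours):

```python
import numpy as np

# Numerical check of the Gaussian Poincare inequality Var f(Z) <= c * E[f'(Z)^2]
# (c = 1 for the standard Gaussian), with the test function f(x) = sin(x).
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density

def integrate(y):
    # trapezoidal rule on the fixed grid; the tails beyond |x| = 10 are negligible
    return (y.sum() - 0.5 * (y[0] + y[-1])) * dx

mean_f = integrate(np.sin(x) * phi)
var_f = integrate((np.sin(x) - mean_f) ** 2 * phi)
energy = integrate(np.cos(x) ** 2 * phi)

print(var_f <= energy)  # True
```

Here the exact values are Var(sin Z) = (1 − e⁻²)/2 ≈ 0.432 and E[cos² Z] = (1 + e⁻²)/2 ≈ 0.568, so the inequality holds with room to spare. A two-point ±1 distribution admits no such bound: a smooth f can vary between the atoms while f′ vanishes at both, which is why the ±1 model is excluded.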
First, let us define the coefficients f_k in the expansion of a function f over the Chebyshev polynomials. Let F be the linear subspace of functions f : R → R which are differentiable with continuous derivative on the interval I_δ = [−1 − δ, 1 + δ].

Theorem 1.3. Assume that A_N and B_N are overlapping real Hermitian matrices with the moments of the matrix entries described in Section 1.1, and assume that the matrix entries satisfy the Poincaré inequality. Then, for every pair of functions f and g in F, the linear statistics N_o(f, A_N) and N_o(g, B_N) converge in distribution to a bivariate Gaussian random variable with covariance matrix V.

Let us put the results of this paper in a more general perspective. Linear statistics of sample covariance matrices were first investigated by Jonsson [15], who established the joint CLT for the moments of the eigenvalues. A recent contribution is by Anderson and Zeitouni [2], who extended the study of linear statistics to a very general class of matrices with independent entries of varying variance. They derived a formula for the covariance of linear eigenvalue statistics, proved a CLT for continuously differentiable test functions, and noted a relation to the Chebyshev polynomials. For more restricted classes of matrices, namely for Gaussian and unitarily invariant matrices, important results about linear statistics were established in Diaconis-Shahshahani [8], Costin-Lebowitz [6], Johansson [13], Soshnikov [20] and [21], and Diaconis-Evans [7].
The main novelty of our results is that they extend the investigation to the spectra of overlapping submatrices. In addition, they explain the central role of the Chebyshev polynomials of the first kind by relating these polynomials to the number of non-backtracking tailless paths on a finite graph. This extends previous results by Feldheim and Sodin [9], which were concerned with the Chebyshev polynomials of the second kind.
In addition, we have found that the covariance of Chebyshev polynomials of degree higher than two depends only on the variance of the matrix entries, not on their higher moments. This result holds for both Wigner and sample covariance matrices. To see the significance of this result, recall that by the "four moment theorem" of Tao and Vu [23], we can expect the covariance matrix in the CLT for linear statistics to depend only on the first four moments of the matrix entries. However, the fact that the covariance for high-degree Chebyshev polynomials actually depends only on the first two moments is surprising, and it appears not to have been noted before in the literature.
The rest of the paper is organized as follows. In Section 2, we show how the Chebyshev polynomials of the first kind are related to non-backtracking tailless paths. In Section 3, we prove the CLT results for simpler models in which the entries are either ±1 or uniformly distributed on the unit circle (Theorems 3.1 and 3.2). In Section 4, we prove Theorems 1.1 and 1.2. Section 5 is devoted to the proof of Theorem 1.3, and Section 6 concludes.

THE CHEBYSHEV T-POLYNOMIALS AND NON-BACKTRACKING PATHS
The Chebyshev polynomials of the first and second kind are defined by the formulas T_k(cos θ) = cos(kθ) and U_k(cos θ) = sin((k + 1)θ)/sin θ, respectively. For k ≥ 2, both the T and the U polynomials satisfy the same three-term recursion; for example, for the U polynomials it reads U_k(x) = 2xU_{k−1}(x) − U_{k−2}(x). The initial conditions are T_0(x) = 1 and T_1(x) = x for the T polynomials, and U_0(x) = 1 and U_1(x) = 2x for the U polynomials.
The T and U polynomials can be related as follows: where ε = 1 if k is odd and ε = 0 if k is even.
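Both recursions, and one standard cross-relation between the two families, can be verified numerically. (The display relating T and U above is not reproduced here, so the identity 2T_k = U_k − U_{k−2}, valid for k ≥ 2, is offered as a cross-check rather than as the paper's exact formula.)

```python
import numpy as np

# Build T_k and U_k via the shared three-term recursion p_k = 2x p_{k-1} - p_{k-2},
# storing coefficient arrays in lowest-degree-first order.
def cheb(kind, kmax):
    polys = [np.array([1.0]),              # T_0 = U_0 = 1
             np.array([0.0, 1.0 * kind])]  # T_1 = x (kind=1), U_1 = 2x (kind=2)
    for _ in range(2, kmax + 1):
        a = np.polynomial.polynomial.polymul([0.0, 2.0], polys[-1])  # 2x * p_{k-1}
        b = np.pad(polys[-2], (0, len(a) - len(polys[-2])))
        polys.append(a - b)
    return polys

T = cheb(1, 6)
U = cheb(2, 6)

# Cross-check the relation 2*T_k = U_k - U_{k-2} at a few sample points (k = 5).
x = np.linspace(-1, 1, 7)
lhs = 2 * np.polynomial.polynomial.polyval(x, T[5])
rhs = (np.polynomial.polynomial.polyval(x, U[5])
       - np.polynomial.polynomial.polyval(x, U[3]))
print(np.allclose(lhs, rhs))  # True
```

The recursion reproduces the familiar closed forms, e.g. T_3(x) = 4x³ − 3x and U_3(x) = 8x³ − 4x.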
2.1. Wigner matrices. Let G be a (d + 1)-regular graph with the vertex set V = {1, . . . , n}. We say that a matrix A is a generalized adjacency matrix for G if it is Hermitian, |A_uv| = 1 whenever (u, v) is an edge of G, and A_uv = 0 otherwise. A path γ is a sequence of vertices (u_0, u_1, . . . , u_k) in which consecutive vertices are adjacent in the graph G. The length of this path is k. The path is called non-backtracking if u_{j+1} ≠ u_{j−1} for all j. It is closed if u_k = u_0. A closed path is non-backtracking tailless if it is non-backtracking and u_{k−1} ≠ u_1.
Theorem 2.1 and the following remark are essentially due to Feldheim and Sodin [9]. (See Lemma II.1.1 and Claim II.1.2 in their paper, where this result is proved for complete graphs.)

Theorem 2.1 (Feldheim-Sodin). Suppose that A is a generalized adjacency matrix for a (d + 1)-regular graph G. Then for all k ≥ 1, the sum over all non-backtracking length-k paths from u to v of the products of the corresponding matrix entries equals the (u, v)-entry of P_k(A), where P_k(x) is the polynomial defined for k = 1, 2 as P_1(x) := x, P_2(x) := x^2 − (d + 1), and for k ≥ 3 by the recursion P_k(x) = xP_{k−1}(x) − dP_{k−2}(x).

Remark: P_k(x) can be expressed in terms of Chebyshev's U-polynomials.

Theorem 2.2. Suppose that A is a generalized adjacency matrix for a (d + 1)-regular graph G. Then the following identity holds for all k ≥ 1, with a separate form when k is odd, where the sum on the left-hand side is over all closed non-backtracking tailless paths (u_0, u_1, . . . , u_{k−1}, u_0) of length k.
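For the special case of an ordinary 0/1 adjacency matrix, Theorem 2.1 says that P_k(A) counts non-backtracking paths. This can be checked by brute force on a small regular graph; the sketch below uses the Petersen graph, and the recursion P_1 = x, P_2 = x² − (d + 1), P_k = xP_{k−1} − dP_{k−2} is our reading of the (unreproduced) display above:

```python
import numpy as np

# The Petersen graph is 3-regular, so d + 1 = 3 and d = 2 in the notation above.
n, d = 10, 2
edges = ([(i, (i + 1) % 5) for i in range(5)]            # outer 5-cycle
         + [(i, i + 5) for i in range(5)]                # spokes
         + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]) # inner pentagram
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

def nbt_counts(A, kmax):
    """Brute-force non-backtracking path counts via a DP on directed edges."""
    n = len(A)
    # state[(p, c)] = vector over start vertices of NBT paths ending with edge (p, c)
    state = {(u, v): np.eye(n)[u] for u in range(n) for v in range(n) if A[u, v]}
    counts = [None, A.copy()]
    for _ in range(2, kmax + 1):
        new_state = {}
        for (p, c), vec in state.items():
            for nxt in range(n):
                if A[c, nxt] and nxt != p:  # extend without backtracking
                    new_state[(c, nxt)] = new_state.get((c, nxt), 0) + vec
        state = new_state
        Nk = np.zeros((n, n))
        for (p, c), vec in state.items():
            Nk[:, c] += vec
        counts.append(Nk)
    return counts

# Assumed recursion: P_1 = A, P_2 = A^2 - (d+1) I, P_k = A P_{k-1} - d P_{k-2}.
P = [None, A, A @ A - (d + 1) * np.eye(n)]
for k in range(3, 7):
    P.append(A @ P[-1] - d * P[-2])

NBT = nbt_counts(A, 6)
ok = all(np.allclose(NBT[k], P[k]) for k in range(1, 7))
print(ok)  # True
```

The subtraction of (d + 1) in P_2 removes the walks that immediately backtrack, and the factor d in the recursion removes, for each shorter path, the d ways of appending a backtracking detour.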

Proof: Let
where the sum is over all closed non-backtracking tailless paths of length k that start from u_0. Also, define the analogous sum over all closed non-backtracking tailless paths of length k − 2 that start from u_1, except that the paths that start with (u_1, u_0) and those that end with (u_0, u_1) are excluded. Then we can write an expansion in which the sums are over non-backtracking paths that start with (u_0, . . . , u_s) and return along the same path in the opposite direction. For example, the second term corresponds to paths that start at u_0, go to u_1, then perform a non-backtracking tailless closed path of length k − 2, and then return to u_0.
Next, we sum this expression over all possible u_0 and count how many times a term corresponding to a particular non-backtracking tailless path of length k − 2s, say γ = (v_0, v_1, . . . , v_{k−2s−1}, v_0), enters the resulting expression. It is easy to see that such a term must be a part of the sum in which u_0 is now arbitrary. In order for γ to be counted by a term indexed by (u_0, . . . , u_s) in this sum, the non-backtracking path (u_0, . . . , u_s) has to satisfy the following constraints. Hence, we can write the corresponding identity. In addition, Σ_{v_0} Q_k(A, v_0) = 0 for k = 1 and k = 2. Then formulas (5), (8), and (9) immediately imply the claim.

2.2. Sample covariance matrices.
A bipartite graph is a graph whose vertices belong to two sets, V and W, such that the vertices in V are connected only to vertices in W and vice versa. A bipartite graph is (c + 1, d + 1)-regular if every vertex in V is connected to c + 1 vertices in W, and every vertex in W is connected to d + 1 vertices in V.
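As a small illustration of this definition, a complete bipartite graph K_{n,m} is (m, n)-biregular: with |V| = n and |W| = m, every vertex of V has degree m (so c + 1 = m) and every vertex of W has degree n (so d + 1 = n). The sizes below are arbitrary.

```python
# Sanity check: the complete bipartite graph K_{4,3} is (3, 4)-biregular.
n, m = 4, 3  # |V| = 4, |W| = 3 (hypothetical small sizes)
V = [("v", i) for i in range(n)]
W = [("w", j) for j in range(m)]
edges = [(v, w) for v in V for w in W]

deg = {x: 0 for x in V + W}
for v, w in edges:
    deg[v] += 1
    deg[w] += 1

print(all(deg[v] == m for v in V) and all(deg[w] == n for w in W))  # True
```

This is the case relevant below, since a matrix with no zero pattern is a generalized adjacency matrix of a complete bipartite graph.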
Let G be a (c + 1, d + 1)-regular bipartite graph with |V| = n and |W| = m, and consider an n-by-m matrix A. We identify row indices with elements of V and column indices with elements of W, and we say that the n-by-m matrix A is a generalized adjacency matrix for the bipartite graph G. Let us define the following object, where the summation is over all non-backtracking paths.

The following two results are essentially due to Feldheim and Sodin [9]. (However, their expressions for R_k in terms of Chebyshev polynomials are incorrect. Compare their Lemma IV.1.1 and Claim IV.1.2.)

Theorem 2.3. Suppose that the matrix A is a generalized adjacency matrix for a (c + 1, d + 1)-regular bipartite graph G. Then for all k ≥ 1, the identity holds with polynomials R_k(x), which for k = 1, 2 are defined explicitly (one of the defining expressions being x + (c + 1)c), and for k ≥ 3 by the following recursion.

Proof: It is easy to check the statement for k = 1.
and sum over v_{k−1} and w_k. We use definition (10); each of these elements is counted d − 1 times (once for each possible choice), and each is counted cd times if k ≥ 3 and (c + 1)d times if k = 2 (every element is counted once for each admissible choice). It is also easy to check the formula for k = 2.

Lemma 2.4 gives an explicit expression for these polynomials in terms of the Chebyshev polynomials, where U_k are the Chebyshev polynomials of the second kind and U_k := 0 for k < 0. The proof of the lemma is by verifying that the formula satisfies the recursion and that it gives the correct initial terms of the recursion.

Now, let us define the rescaled polynomials, where T_k are the Chebyshev polynomials of the first kind.

Theorem 2.5. Suppose that the matrix A is a generalized adjacency matrix for a (c + 1, d + 1)-regular bipartite graph G with vertex set V ∪ W. Assume c ≥ d. Then for all k ≥ 1 the identity holds, where the sum is over all closed non-backtracking tailless paths. Here we interpret the sum as being over all closed non-backtracking tailless paths of length 2k that originate at v_1 ∈ V, then proceed along an edge different from (v_1, w_1), and end along an edge different from (w_1, v_1).
We define S_k(w_1) and S_k(w_1, v_0) similarly, except that now the paths start from a vertex in W. Then we can write the following identity, where the sums are over non-backtracking paths.
Let us sum this identity over v_0 and consider, for example, a specific path; this path is counted d − 1 times (once for each possible v_0 not equal to v_1 and not equal to v_{k−1}). By a similar counting we find that the following identity holds. The map in question defines a bijection from the non-backtracking tailless paths that start with a vertex in V to the non-backtracking tailless paths that start with a vertex in W. It remains to check that if we define the polynomial S_k(x) by the formula in which the polynomials are as in Lemma 2.4, then the required identity is true. This can be verified by using the identity (5).

LINEAR STATISTICS OF POLYNOMIALS FOR SIMPLE MODELS
We will use the results of the previous section to prove Theorems 3.1 and 3.2 below.
3.1. Wigner matrices. In this section we consider the following simple model. Assume that the diagonal entries are zero. We consider two cases for off-diagonal entries. Either the entries are uniformly distributed on the unit circle or they take values ±1 with probability 1/2.
In this section the normalized matrices A_N are defined as follows. This choice of normalization is clearly not essential for first-order asymptotics; however, it makes some formulas shorter. Recall that if f : R → R is a test function, then we define its linear statistic for the matrix A_N as before. We set β = 1 for the model with ±1 entries and β = 2 for the model with entries on the unit circle.
Proof of Theorem 3.1: We will only compute the covariance. The proof that the other moments are determined by the covariance follows from a similar combinatorial analysis, which is sketched below in the proof of the corresponding theorem for the sample covariance matrices. Let us estimate the following object, where the sums are over all pairs of non-backtracking tailless ("NBT") cyclic paths of length k.

Consider the sum and assign a type and a reduced type to each term in it. The type of a term is a graph together with a pair of paths on this graph. The graph is formed by the edges of the NBT closed paths (v_0, . . . , v_{k−1}, v_0) and (v′_0, . . . , v′_{k−1}, v′_0), and the paths are those induced by these two paths. We understand that the original labels of the vertices are removed in the type graph; in other words, the type is an equivalence class defined up to isomorphisms induced by arbitrary relabellings of the vertices. For example, any two disjoint cyclic NBT paths (in which no vertex except the initial one is repeated) correspond to the same type. The reduced type is simply the type graph with the information about the paths removed.
Note that due to the expectation sign, the only types that bring a non-zero contribution to the sum are those in which every edge is traversed an even number of times by the NBT paths.
In addition, if a term has a disconnected reduced type graph, then its contribution is cancelled by the contribution of the corresponding term from the product of the sums EA(v_0, . . . , v_{k−1}, v_0) and EB(v′_0, . . . , v′_{k−1}, v′_0). Hence we only need to consider terms with a connected reduced type graph.
Finally, we make the crucial observation that since the paths are non-backtracking and tailless, every vertex in the reduced type graph has degree greater than or equal to two.
Consider the sum of all terms whose reduced type graph is the cycle on k vertices. Since both paths lie on this cycle, the vertices must have labels from the rows and columns that are common to the matrices A and B. Hence, the number of valid labelings of the reduced type graph is asymptotically close to (∆^(N))^k. Next, note that each reduced type contains 2k different types, since every NBT path can start from any of the k possible vertices and can have either the same or the opposite orientation. In the case β = 1 both orientations contribute, and in the case β = 2 only one orientation contributes. Hence, the total number of terms in this reduced type can be estimated, and each term of this type brings a contribution of 1 to the sum (14). After taking into account the normalization, we find the contribution of this reduced type to the covariance.

The next step is to show that the contribution of all other types is negligible when N is large. This is straightforward. Indeed, if any edge is traversed by the NBT paths more than twice, then the total number of edges in the reduced type graph is < k, and hence the number of vertices is also less than k (using the fact that each vertex must have degree at least 2). The total number of paths that connect r vertices is smaller than r^r. Hence the number of terms with this reduced type is of order O(k^k t_N^{k−1}) or less, which implies that these terms give a negligible contribution for large N provided that k^k ≪ t_N. This holds if k ≪ log N/(log log N), and in particular for fixed k. Next, suppose that every edge is traversed exactly twice by the NBT paths, so that the number of edges is k, but the graph is not a cycle; then one of the vertices must have degree at least 3 (since we ruled out disconnected graphs and all vertices have degree at least 2).
Since the sum of the degrees is twice the number of edges, it follows again that the number of vertices is < k and the number of terms in this reduced type is bounded by O(k^k t_N^{k−1}). Hence the contribution of these terms is negligible. For k ≠ l, we find that the contributions of all types are negligible. Therefore, by using Theorem 2.2, we obtain the limit of the covariance.
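The counting step above, in which each cyclic reduced type contains 2k types (k starting vertices times two orientations), can be checked by brute force on the cycle graph C_k itself. The enumeration below is a sketch of that count, not the paper's argument:

```python
# Brute-force count of closed non-backtracking tailless walks of length k on the
# cycle graph C_k; the expected answer is 2k (k starting vertices x 2 orientations).
def closed_nbt_tailless(k):
    nbrs = {i: [(i - 1) % k, (i + 1) % k] for i in range(k)}
    total = 0

    def dfs(path):
        nonlocal total
        if len(path) == k + 1:
            # closed, and tailless: the walk returns to its start without
            # retracing its first step
            if path[-1] == path[0] and path[-2] != path[1]:
                total += 1
            return
        prev = path[-2] if len(path) >= 2 else None
        for nxt in nbrs[path[-1]]:
            if nxt != prev:  # non-backtracking condition
                dfs(path + [nxt])

    for start in range(k):
        dfs([start])
    return total

print(closed_nbt_tailless(5), closed_nbt_tailless(6))  # 10 12
```

On a cycle, a non-backtracking walk is forced to keep moving in its initial direction, which is exactly why only the start vertex and the orientation remain as free choices.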
The condition on k and l arises because there are no cycles of length 1 or 2 in the underlying graph.
3.2. Sample covariance matrices. In this section we assume that the entries of the matrices A_N and B_N 
are either ±1 with equal probability (which we call β = 1, or the real case) or uniformly distributed on the unit circle (which we call β = 2, or the complex case).
Let us define the normalized sample covariance matrices, rescaled and shifted by a multiple of the identity matrix, in the following form. The normalization is chosen in such a way that the empirical distribution of eigenvalues of W_{A_N} and W_{B_N} converges to a distribution supported on the interval [−1, 1]. The choice of a is not essential for the asymptotics, but it makes some formulas shorter. Define the linear statistic, where λ_i are the eigenvalues of the matrix W_{A_N}; the linear statistics for the matrix W_{B_N} are defined similarly.

We will prove Theorem 3.2 in a slightly different form. First, recall the definition (12) of the rescaled Chebyshev polynomials, where T_k are the Chebyshev polynomials of the first kind. (Here we explicitly indicate that T_k depends on the parameters c and d.) Next, recall that there is a parameter t_N that approaches infinity as N → ∞ and that the quantities a_1^(N)/t_N, a_2^(N)/t_N, b_1^(N)/t_N, b_2^(N)/t_N, ∆_1^(N)/t_N, and ∆_2^(N)/t_N approach some positive limits a_1, a_2, b_1, b_2, ∆_1, and ∆_2, respectively.
Define the rescaled linear statistics X_k(A_N) as in (16); X_k(B_N) is interpreted similarly. We will show that the following result holds; it is easy to check that it is equivalent to the result of Theorem 3.2.

Proposition 3.3. As N → ∞, the distribution of the rescaled linear statistics X_k(A_N) and X_l(B_N) converges to a bivariate Gaussian distribution, with the covariance vanishing unless min{k, l} > 1.
Proof of Proposition 3.3: Note that the matrix A_N is a generalized adjacency matrix for a complete bipartite graph G, whose vertex sets have a_1^(N) and a_2^(N) elements. Since the remaining terms are not random, we only need to understand the joint distribution of the normalized sums S_k(A_N) := (t_N)^{−k} Σ A(v_0, w_1, v_1, . . . , w_k, v_0) and S_l(B_N) := (t_N)^{−l} Σ B(v_0, w_1, v_1, . . . , w_l, v_0). In particular, we need to show that in the limit of large N this distribution is Gaussian and to compute its covariance. The pattern of the argument is well known (see, for example, Section 2.1 in Anderson, Guionnet, Zeitouni [1]), and for this reason we will be concise.
Consider the case k = l. (The case k ≠ l is similar and will be omitted.)

Covariance
We are interested in estimating the following object, where the sum is over all pairs of non-backtracking tailless ("NBT") cyclic paths of length 2k each. Note that for k = 1 the number of such paths is zero; in what follows we assume k > 1.

Consider the sum and assign a type and a reduced type to each term in it. This is done as for Wigner matrices, with a small modification. Namely, since the original graph is bipartite, the reduced type graphs are also bipartite, and we keep the information about the partition. In other words, while we remove the labels from the vertices, we still keep the information about whether a given vertex corresponds to a row or a column of the matrix. (This information is not very important for this model; however, it will be useful later in a model with more general matrix entries.)

First, note that due to the expectation sign, the only types that bring a non-zero contribution to the sum are those in which every edge is traversed an even number of times by the NBT paths. In addition, if a term has a disconnected reduced type graph, then its contribution is cancelled by the contribution of the corresponding term from the product of the sums EA(v_0, . . . , w_k, v_0) and EB(v_0, . . . , w_k, v_0). Hence we only need to consider terms with a connected reduced type graph.
Crucially, since the paths are non-backtracking and tailless, every vertex in the reduced type graph must have degree greater than or equal to two. Now, consider the sum of all terms whose reduced type graph is the cycle on 2k vertices. Since both paths lie on this cycle, the vertices must have labels from the rows and columns that are common to the matrices A and B. Hence, the number of valid labelings of the reduced type graph is asymptotically close to (∆_1^(N) ∆_2^(N))^k. Next, note that each reduced type contains 2k different types, since every NBT path can start from any of the k possible vertices in V and can have either the same or the opposite orientation. In the case β = 1 both orientations contribute, and in the case β = 2 only one orientation contributes. Hence, the total number of terms belonging to this reduced type can be estimated, and each term of this type brings a contribution of 1 to the sum. After taking into account the normalization, we find the contribution of this reduced type to the covariance.

The next step is to show that the contribution of all other types is negligible when N is large. This is straightforward. Indeed, if any edge is traversed by the NBT paths more than twice, then the total number of edges in the reduced type graph is < 2k, and hence the number of vertices is also less than 2k (using the fact that each vertex must have degree at least 2). The total number of paths that connect r vertices is smaller than r^r. Hence the number of terms with this reduced type is of order O(k^{2k} t_N^{2k−1}) or less, which implies that these terms give a negligible contribution for large N provided that k^{2k} ≪ t_N. This holds if k ≪ log N/(2 log log N), and in particular for fixed k. Next, suppose that each edge is traversed twice by the NBT paths but the reduced type graph is not a cycle.
Then one of the vertices must have degree at least 3 (since we ruled out disconnected graphs and since all vertices must have degree at least 2). Since the sum of the degrees is twice the number of edges, it follows again that the number of vertices is < 2k and the number of terms in this reduced type is bounded by O(k^{2k} t_N^{2k−1}). Hence the contribution of these terms is negligible. Therefore, we arrive at the claimed limit for the covariance, and the companion formula is obtained similarly.

Higher moments
The argument for the higher moments is similar. Consider an m-th joint moment, where m_a + m_b = m. The type of a term in the expansion of this moment is given by a graph and m non-backtracking tailless paths, which correspond to the factors in the product. Non-negligible contributions come from the terms whose reduced type is a union of disjoint cycles of length 2k. These cycles must be traversed by the NBT paths exactly twice, which implies, in particular, that m must be even. This defines a matching on the set of NBT paths. Every pair in this matching contributes to the limit of the m-th moment a factor equal to the limiting covariance of the terms corresponding to the two NBT paths. Hence, the resulting sum coincides exactly with the Wick formula for the higher moments of the Gaussian distribution (see Zee [24]).
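In its simplest instance, the Wick formula says that an even Gaussian moment is a sum over pairings (perfect matchings): for a single standard Gaussian every pair contributes E[Z²] = 1, so E[Z^(2m)] equals the number of matchings of 2m points, which is (2m − 1)!!. This is exactly the combinatorial structure appearing above, with the limit covariances playing the role of the pair contributions.

```python
# Count perfect matchings of a set of points by pairing the first point with
# every possible partner and recursing; for 2m points this gives (2m-1)!!.
def matchings(points):
    if not points:
        return 1
    first, rest = points[0], points[1:]
    total = 0
    for i in range(len(rest)):
        remaining = rest[:i] + rest[i + 1:]
        total += matchings(remaining)  # pair `first` with rest[i]
    return total

def double_factorial(n):
    return 1 if n <= 0 else n * double_factorial(n - 2)

m = 4
print(matchings(list(range(2 * m))), double_factorial(2 * m - 1))  # 105 105
```

For m = 4 both counts give 105, matching the eighth moment E[Z⁸] = 7!! of the standard Gaussian.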

MORE GENERAL MATRICES
4.1. Wigner matrices. Proof of Theorem 1.1: Following Borodin [4], we can write an expression for the limit of the covariance of tr(A_N/√t_N)^k and tr(B_N/√t_N)^l as N → ∞. Here C_k := (2k)!/(k!(k + 1)!) are the Catalan numbers. Among their other applications in combinatorics, they count the number of rooted planar trees with k edges.
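The Catalan numbers can be computed in two equivalent ways that mirror their role here: the closed form above, and the rooted-planar-tree recursion obtained by splitting a tree at the first subtree of the root. A quick cross-check:

```python
from math import comb

# Closed form: C_k = binom(2k, k) / (k + 1).
def catalan_formula(k):
    return comb(2 * k, k) // (k + 1)

# Rooted planar trees with k edges: C_k = sum_i C_i * C_{k-1-i}, splitting the
# tree into the first subtree of the root (i edges) and the rest (k-1-i edges).
def catalan_trees(k, memo={0: 1}):
    if k not in memo:
        memo[k] = sum(catalan_trees(i) * catalan_trees(k - 1 - i) for i in range(k))
    return memo[k]

print([catalan_formula(k) for k in range(7)])  # [1, 1, 2, 5, 14, 42, 132]
print(all(catalan_formula(k) == catalan_trees(k) for k in range(12)))  # True
```

The tree recursion is the combinatorial counterpart of the moment computation: the leading term of E tr(A_N/√t_N)^{2k} counts paths that trace rooted planar trees with k edges, traversing each edge twice.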
For the convenience of the reader, let us explain the second term in this expression; the other terms can be obtained by a similar combinatorial analysis. Let us call "higher-order terms" all those terms in the expression for the covariance of tr(A_N/√t_N)^k and tr(B_N/√t_N)^l that involve moments of the matrix entries of order higher than the second.

Proof: Recall that every term in the expression for the covariance of tr(A_N/√t_N)^k and tr(B_N/√t_N)^l is coded by two paths, γ_1 and γ_2, of lengths k and l, respectively.
These paths trace out graphs G_1 and G_2. Let them have e_1 and e_2 edges, respectively, and let f of these edges be common to both graphs. Then the union of these graphs, G_1 ∪ G_2, has e_1 + e_2 − f edges. We can assume that G_1 ∪ G_2 is connected, since otherwise the term that corresponds to (γ_1, γ_2) would give a zero contribution to the covariance. Hence, the number of vertices in G_1 ∪ G_2 equals e_1 + e_2 − f + ε, where ε = 1 if G_1 ∪ G_2 is a tree, ε = 0 if it contains exactly one cycle as a subgraph, and ε < 0 in all other cases.
Every edge in G_1 ∪ G_2 must be traversed at least twice by the union of the paths γ_1 and γ_2. Since we are interested in the contribution of the higher-order terms, at least one edge must be traversed 3 times or more. Hence we have k + l − 2(e_1 + e_2 − f) = δ, where δ = 1 if one of the edges is traversed 3 times and all others twice, δ = 2 if one of the edges is traversed 4 times and all others twice or if two of the edges are traversed 3 times and all others twice, and δ > 2 in all other cases.
The contribution of the type (G_1 ∪ G_2, γ_1, γ_2) to the covariance can then be computed. Let R_{kl} be the additional term in the expression for the covariance of the statistics N_o(x^k, W_{A_N}) and N_o(x^l, W_{B_N}) which is due to the moments of the matrix entries of order higher than the second, and consider its limit as N → ∞. Let us expand this expression as a sum of products of matrix entries and take the expectation of each term in the expansion. We observe that the only reduced graph types that involve entry moments higher than the second and that bring non-negligible contributions for large N are given by two trees glued along an edge. One of these trees has k edges and the other has l edges. The paths have lengths 2k and 2l, respectively, and they traverse every edge of the corresponding tree twice.

Now suppose that the tree with k edges has t_1 row vertices and k + 1 − t_1 column vertices, and the tree with l edges has t_2 row vertices and l + 1 − t_2 column vertices. (To avoid confusion, note that for simplicity we set the matrix size parameter t_N equal to N here and use the letter t for a different purpose.) Then the contribution of this reduced graph type can be computed, and as N → ∞ it converges to the expression in (19). In order to calculate the number of these reduced graph types, we need the following lemma. Hence, the number of the reduced types with contribution (19) is obtained. (The prefactor kl corresponds to the choice of the tree edges that are glued together in the reduced type.) Therefore, the total contribution of the higher entry moments to (18) converges. Next, we apply the binomial theorem. By using (20), we obtain the formula in the claim of the proposition. In fact, R_k(a_1, a_2) can be computed explicitly, and surprisingly, its value does not depend on a_1 or a_2.
Proof: Recall the definition of the Narayana polynomials. Hence, if we take x = a_1/a_2, then we obtain the corresponding expression. Next, we use the fact that the N_s(x) are related to a particular case of the Jacobi polynomials. (This fact was apparently first noted in Proposition 6 of [16].) By substituting this identity into the previous formula, we get the result.

From (1) and (3), we can conclude that Y must coincide with W. Since this is true for every limit point Y, we will be able to infer that N_o(f, A_N) and N_o(g, B_N) converge to W as N → ∞.
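The Narayana polynomials appearing in this proof can be sketched numerically. In the common convention (the paper's exact display is not reproduced above, so this convention is an assumption), the coefficients are the Narayana numbers N(k, s) = (1/k)·binom(k, s)·binom(k, s − 1), and evaluating the polynomial at x = 1 recovers the Catalan number C_k:

```python
from math import comb

# Narayana numbers N(k, s) = (1/k) * binom(k, s) * binom(k, s-1); the division
# by k is exact, since the Narayana numbers are integers.
def narayana(k, s):
    return comb(k, s) * comb(k, s - 1) // k

def narayana_poly(k, x):
    return sum(narayana(k, s) * x**s for s in range(1, k + 1))

catalan = lambda k: comb(2 * k, k) // (k + 1)

print([narayana(4, s) for s in range(1, 5)])  # [1, 6, 6, 1]
print(all(narayana_poly(k, 1) == catalan(k) for k in range(1, 10)))  # True
```

The refinement by the parameter s is what allows the row-vertex/column-vertex counts t_1 and t_2 of the glued trees to be tracked separately before the surprising cancellation of a_1 and a_2.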
Before proceeding to implement this plan, let us derive some preliminary results. First, we will need some additional facts about expansions in Chebyshev polynomials. Consider the change of variable x = cos θ, where θ ∈ [−π, π], and define F (θ) = f (cos θ). If f is absolutely continuous on [−1, 1], then F (θ) is absolutely continuous on [−π, π]. The coefficients f n in the expansion of f in Chebyshev polynomials coincide with the Fourier coefficients in the Fourier expansion of F (θ). First, we are going to show that if f is continuously differentiable, then Σ_{n=1}^∞ n |f n |² < ∞. This will show that the entries of the covariance matrix V, defined in the statement of Theorem 1.3, are finite. In fact, this holds for a more general class of functions, namely, for the Sobolev class W^{1,p}.
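The identification of Chebyshev coefficients with Fourier coefficients of F (θ) = f (cos θ) can be checked numerically. The sketch below is an illustration under stated assumptions: the test function f(x) = x³ and the normalization f_n = (2/π) ∫₀^π F(θ) cos(nθ) dθ for n ≥ 1 are choices made here, not taken from the paper. The computed coefficients recover the exact expansion x³ = (3/4) T₁(x) + (1/4) T₃(x).

```python
import numpy as np

# Chebyshev coefficients of f on [-1, 1], computed as Fourier cosine
# coefficients of F(theta) = f(cos(theta)).
f = lambda x: x**3                      # illustrative test function
theta = np.linspace(0.0, np.pi, 20001)
F = f(np.cos(theta))
h = theta[1] - theta[0]

def cheb_coeff(n):
    # f_n = (2/pi) * \int_0^pi F(theta) cos(n theta) d theta,  n >= 1,
    # evaluated with the trapezoid rule on a uniform grid.
    y = F * np.cos(n * theta)
    return (2.0 / np.pi) * h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

coeffs = [cheb_coeff(n) for n in range(1, 5)]
# For f(x) = x^3 the exact values are f_1 = 3/4, f_2 = 0, f_3 = 1/4, f_4 = 0.
```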
The proof is by verifying that ⟨c, d⟩ K is a scalar product.
We define P f,m (x) as the m-th partial sum of the Chebyshev expansion of f. Then we obtain a bound in which c is a constant. Hence, from (24) and (25), it follows that ‖f − P f,m ‖ Lip → 0, where ‖·‖ Lip is the Lipschitz norm on the interval I δ . (For differentiable functions, the Lipschitz norm is defined by ‖h‖ Lip := sup x∈I δ |h (x)| + sup x∈I δ |h′ (x)|.) In addition, by (23) and (24), we have, in particular by the triangle inequality, that ‖f ‖ K − ‖P f,m ‖ K → 0 as m → ∞. We define P g,m (x) similarly.
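The convergence ‖f − P f,m ‖ Lip → 0 for smooth f can be illustrated numerically. The sketch below is a stand-in under assumptions: it uses Chebyshev interpolation at Chebyshev nodes in place of the paper's truncation P f,m , the test function exp, and the Lipschitz-type norm sup|f − P| + sup|f′ − P′| on [−1, 1].

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = np.exp          # smooth test function (its derivative is also exp)
fp = np.exp
xs = np.linspace(-1.0, 1.0, 2001)

def lip_error(deg):
    # Interpolate f at deg+1 Chebyshev nodes (chebfit with a square
    # Vandermonde matrix is interpolation), then measure the
    # Lipschitz-type norm sup|f - P| + sup|f' - P'| on [-1, 1].
    nodes = np.cos(np.pi * (np.arange(deg + 1) + 0.5) / (deg + 1))
    coef = C.chebfit(nodes, f(nodes), deg)
    err0 = np.max(np.abs(f(xs) - C.chebval(xs, coef)))
    err1 = np.max(np.abs(fp(xs) - C.chebval(xs, C.chebder(coef))))
    return err0 + err1

# The Lipschitz-type error decays rapidly as the degree m grows.
errors = [lip_error(m) for m in (4, 8, 12)]
```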
The diagonal entries of this matrix converge to the diagonal entries of the matrix V, defined in the statement of Theorem 1.3. The off-diagonal term (V m ) 12 can be written as a sum of three terms. The first two terms are small by an application of the Schwarz inequality for the scalar product ⟨·, ·⟩ K , and the third term coincides with V 12 . Hence we can conclude that the norm of V m − V converges to zero as m → ∞. This implies that the Gaussian distributions W m converge to the Gaussian distribution W with the covariance matrix V. This finishes the first step of our proof. In order to prove tightness for the family of distributions of {N o (f, A N ) , N o (g, B N )} (with respect to N ), we are going to show that the norms of their covariance matrices are bounded. In fact, it is enough to prove that the variances of each of N o (f, A N ) and N o (g, B N ) are bounded, since then the covariance is bounded automatically.
Here, we rely heavily on the Poincaré inequality property ("PI") of the matrix entries. The essential feature of the PI property is that it is well behaved with respect to taking products of measures. By definition, a measure η on R has the PI property if, for some c η > 0 and all differentiable f, Var η (f) ≤ c η ∫ |f′|² dη. By approximation, this can be further extended to the case when the test function is Lipschitz. In particular, if h is a Lipschitz function on R, then we have the corresponding variance bound. Next, recall that N (f, A N ) = Tr f (A N / √(4a (N ))). By using the properties of the PI with respect to scaling and taking products, we find that the joint distribution of the matrix entries of A N / √(4a (N )) satisfies the PI with the constant c/a (N ). At the same time, if the function f (x) is Lipschitz on R, then the function Tr f (X) is Lipschitz on the space of Hermitian matrices (see Lemma 1.2 in [12]). Hence, by using (26), we obtain a bound on the variance of N (f, A N ) in which C is an absolute constant and c η depends only on the distribution of the matrix entries. A complication arises since, under our assumptions, f is Lipschitz only on the interval I δ = [−1 − δ, 1 + δ]. Outside of I δ , we only know that it has polynomial growth. Fortunately, in this situation we can write f as a sum of two functions, f = f 1 + f 2 , with f 1 Lipschitz and bounded everywhere on R, ‖f 1 ‖ Lip < ∞, and f 2 vanishing on I δ/2 := [−1 − δ/2, 1 + δ/2] and having polynomial growth. Then (27) can be applied to bound the variance of the f 1 term. In addition, from the results about the spectra of Wigner matrices, it is known that the probability that A N / √(4a (N )) has an eigenvalue outside of I δ/2 becomes exponentially small in N as N grows. This implies that E [Tr f 2 (A N / √(4a (N )))]² → 0 as N → ∞.
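The boundedness of the variance of the linear statistic, which the Poincaré-inequality argument guarantees, can be observed in a quick simulation. The sketch below is illustrative only: the Gaussian (GOE-like) entries, the scaling A N /(2√N), the smooth test function f(x) = x², and the Monte Carlo parameters are all assumptions made here for speed, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_statistic_var(N, f, reps=300):
    # Monte Carlo estimate of Var[ Tr f(A_N / (2 sqrt(N))) ] for a real
    # symmetric Wigner matrix with off-diagonal entry variance 1,
    # so that the spectrum concentrates on [-1, 1].
    vals = np.empty(reps)
    for r in range(reps):
        g = rng.standard_normal((N, N))
        a = (g + g.T) / np.sqrt(2.0)          # symmetrize; off-diag var 1
        eig = np.linalg.eigvalsh(a / (2.0 * np.sqrt(N)))
        vals[r] = np.sum(f(eig))
    return vals.var()

# The variance of the linear statistic stays O(1) as N grows,
# consistent with the Poincare-inequality bound discussed above.
v_small = linear_statistic_var(60, lambda x: x**2)
v_large = linear_statistic_var(120, lambda x: x**2)
```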
A similar argument holds for the random variable Tr g (B N / √(4b (N ))), and therefore the norm of the covariance matrix of these two random variables is bounded. This shows that the distributions of the pairs {N o (f, A N ) , N o (g, B N )} form a tight family and concludes the second step of our proof.
Next, let Y be a limit point for the distributions Y N of {N o (f, A N ) , N o (g, B N )} , so that Y N k → Y in distribution for a sequence of N k . We are going to estimate the difference between the characteristic functions of the distributions Y and W m .
For convenience, we will assume that all relevant random variables are realized on a single probability space, so that (by the Skorokhod representation theorem) convergence in distribution becomes almost sure convergence. In this realization, let Y N k and W m,N k denote random variables that have the same joint distributions as {N o (f, A N k ) , N o (g, B N k )} and {N o (P f,m , A N k ) , N o (P g,m , B N k )} , respectively. The variables Y N k and W m,N k converge almost surely to random variables Y and W m that have the distributions Y and W m , respectively. Then we obtain an estimate in which the inequality follows from Fatou's lemma.
Next, note that (27) yields a lim sup bound which implies, for the first component of the vector Y N k − W m,N k , the bound (28). A similar expression can be written for the second component of Y N k − W m,N k . Now, suppose ξ 1 and ξ 2 are two real random variables with finite variances. Then their characteristic functions can be compared, where the first inequality in this comparison is a consequence of inequality II.3.14 on p. 278 in Shiryaev [19]. By applying this inequality to the two components of the vector Y N k − W m,N k , and by using (28) and its analogue for the function g, we find:
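For completeness, here is the standard elementary route to such a characteristic-function comparison (the exact form and constant in Shiryaev's inequality II.3.14 may differ; the chain below is a sketch starting from the bound |e^{ix} − e^{iy}| ≤ |x − y|):

```latex
\left| \mathbb{E}\, e^{it\xi_1} - \mathbb{E}\, e^{it\xi_2} \right|
  \le \mathbb{E} \left| e^{it\xi_1} - e^{it\xi_2} \right|
  \le |t| \, \mathbb{E} \left| \xi_1 - \xi_2 \right|
  \le |t| \left( \mathbb{E} \left( \xi_1 - \xi_2 \right)^2 \right)^{1/2},
```

where the last step is the Cauchy–Schwarz inequality.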

which implies that

|E e^{itY} − E e^{itW m}| ≤ c t² max{ ‖f − P f,m ‖ Lip , ‖g − P g,m ‖ Lip }.
By our choice, as m → ∞, P f,m and P g,m converge to f and g, respectively, in the Lipschitz norm. Hence, the random variables W m converge in distribution to Y. This concludes the third and final step of our proof. As explained before, these three steps imply that {N o (f, A N ) , N o (g, B N )} converges in distribution to the Gaussian random vector W.

CONCLUSION
We computed the joint distribution of the eigenvalue statistics for two random matrix models. For both Wigner and sample covariance matrices, we found that the covariance of the Chebyshev polynomials of the first kind has a diagonal structure. The variances are different from those found in Borodin's paper for Gaussian matrices. This is in agreement with the results of Anderson and Zeitouni in [2] and of Tao and Vu in [23], which suggest that the linear statistic covariances should depend on the first four moments of the matrix entries. However, somewhat surprisingly, we find that for high-degree Chebyshev polynomials the covariance depends only on the variance of the matrix entries.