Fluctuation of Eigenvalues for Random Toeplitz and Related Matrices

Consider random symmetric Toeplitz matrices $T_{n}=(a_{i-j})_{i,j=1}^{n}$ with matrix entries $a_{j}, j=0,1,2,...,$ being independent real random variables such that \be \mathbb{E}[a_{j}]=0, \ \ \mathbb{E}[|a_{j}|^{2}]=1 \ \ \textrm{for}\,\ \ j=0,1,2,...,\ee (homogeneity of 4-th moments) \be{\kappa=\mathbb{E}[|a_{j}|^{4}],}\ee \noindent and further (uniform boundedness)\be\sup\limits_{j\geq 0} \mathbb{E}[|a_{j}|^{k}]=C_{k}<\iy\ \ \ \textrm{for} \ \ \ k\geq 3.\ee Under the assumption of $a_{0}\equiv 0$, we prove a central limit theorem for linear statistics of eigenvalues for a fixed polynomial with degree $\geq 2$. Without the assumption, the CLT can be easily modified to a possibly non-normal limit law. In a special case where $a_{j}$'s are Gaussian, the result has been obtained by Chatterjee for some test functions. Our derivation is based on a simple trace formula for Toeplitz matrices and fine combinatorial analysis. Our method can apply to other related random matrix models, including Hankel matrices and product of several Toeplitz matrices in a flavor of free probability theory etc. Since Toeplitz matrices are quite different from the Wigner and Wishart matrices, our results enrich this topic.


Introduction and main results
Toeplitz matrices emerge in many aspects of mathematics and physics and also in plenty of applications, see Grenander and Szegö's book [17] for a detailed introduction to deterministic Toeplitz matrices. The study of random Toeplitz matrices with independent entries is proposed by Bai in his review paper [3]. Since then, the literature around the asymptotic distribution of eigenvalues for random Toeplitz and related matrices is very large, including the papers of Basak and Bose [6], Bose et al. [8,9], Bryc et al. [10], Hammond and Miller [18], Kargin [23], Liu and Wang [26], Massey et al. [27], etc. We refer to [6] for recent progress. However, the study of fluctuations of eigenvalues for random Toeplitz matrices is quite little, to the best of our knowledge, the only known result comes from Chatterjee [11] in the special case where the matrix entries are Gaussian distributions. In this paper we will derive a central limit theorem (CLT for short) for linear statistics of eigenvalues of random Toeplitz and related matrices, for some of which the asymptotic distributions of eigenvalues are also new, including sparse Toeplitz and Hankel matrices and singular values of powers of Toeplitz matrices.
In the literature fluctuations of eigenvalues for random matrices have been extensively studied. The investigation of central limit theorems for linear statistics of eigenvalues of random matrices dates back to the work of Jonsson [22] on Gaussian Wishart matrices. Similar work for the Wigner matrices was obtained by Sinai and Soshnikov [30]. For further discussion on Wigner (band) matrices and Wishart matrices and their generalized models, we refer to Bai and Silverstein's book [4], recent papers [2,11] and the references therein. For another class of invariant random matrix ensembles, Johansson [21] proved a general result which implies CLT for linear statistics of eigenvalues. Recently, Dumitriu and Edelman [15] and Popescu [29] proved that CLT holds for tridiagonal random matrix models.
Another important contribution is the work of Diaconis et al. [14,13], who proved similar results for random unitary matrices. These results are closely connected to Szegö's limit theorem (see [20]) for the determinant of Toeplitz matrices with (j, k) entry g(j − k), where g(k) = 1 2π 2π 0 g(e iθ )e −ikθ d θ, see [12,7] and references therein for connections between random matrices and Toeplitz determinants. Here we emphasize that Szegö's limit theorem implies a CLT. Now we turn to our model. The matrix of the form T n = (a i−j ) n i,j=1 is called a Toeplitz matrix. If we introduce the Toeplitz or Jordan matrices B = (δ i+1,j ) n i,j=1 and F = (δ i,j+1 ) n i,j=1 , respectively called the "backward shift" and "forward shift" because of their effect on the elements of the standard basis {e 1 , · · · , e n } of R n , then an n × n matrix T can be written in the form a j F j if and only if T is a Toeplitz matrix where a −n+1 , · · · , a 0 , · · · , a n−1 are complex numbers [24]. It is worth emphasizing that this representation of a Toeplitz matrix is of vital importance as the starting point of our method. The "shift" matrices B and F exactly present the information of the traces. Consider a Toeplitz band matrix as follows. Given a band width b n < n, let (1.2) η ij = 1, |i − j| ≤ b n ; 0, otherwise.
Then a Toeplitz band matrix is (1.3) T n = (η ij a i−j ) n i,j=1 . Moreover, the Toeplitz band matrix T n can also be rewritten in the form (1.4) T n = bn j=0 a −j B j + bn j=1 a j F j = a 0 I n + bn j=1 a −j B j + a j F j , where I n is the identity matrix. Obviously, a Toeplitz matrix can be considered as a band matrix with the bandwidth b n = n − 1. In this paper, the basic model under consideration consists of n × n random symmetric Toeplitz band matrices T n = (η ij a i−j ) n i,j=1 in Eq. (1.3). We assume that a j = a −j for j = 1, 2, · · · , and {a j } ∞ j=1 is a sequence of independent real random variables such that (1.5) E[a j ] = 0, E[|a j | 2 ] = 1 for j = 1, 2, · · · , (homogeneity of 4-th moments) and further (uniform boundedness) In addition, we also assume a 0 ≡ 0 (we will explain in Remarks 1.3 and 5.5 below!) and the bandwidth b n → ∞ but b n /n → b ∈ [0, 1] as n → ∞. Set A n = Tn √ bn , a linear statistic of eigenvalues λ 1 , · · · , λ n of A n is a function of the form where f is some fixed function. In particular, when f (x) = x p we write (1.9) These ω p , p = 2, 3, . . . , are our main objects. Note that 1 n n j=1 λ p j − E[λ p j ] converges weakly to zero as n −→ ∞, moreover under the condition we have a strong convergence, see [6,23,26]. We remark that the fluctuations are of order 1 √ bn while those for Wigner matrices are of order 1 n , more like the case of classical central limit theorem. This shows that the correlations between eigenvalues for Toeplitz matrices are much weaker than those for Wigner matrices (another different phenomenon is that the limiting distribution for random Toeplitz matrices has unbounded support, see [10,18]). The potential reasons for this phenomenon may come from the fact that the order of the number of independent variables is O(n) for Toeplitz matrices while it is O(n 2 ) for Wigner matrices. On the other hand, the eigenvalues of random Toeplitz matrices are obviously not independent, so it is different from the case of CLT for independent variables.
In special case that the matrix entries a j are Gaussian distributions, by using his notion of 'second order Poincaré inequalities' Chatterjee in [11] proved the following theorem: Theorem ( [11], Theorem 4.5) Consider the Gaussian Toeplitz matrices T n = (a i−j ) n i,j=1 , i.e. a j = a −j for j = 1, 2, · · · , and {a j } ∞ j=0 is a sequence of independent standard Gaussian random variables. Let p n be a sequence of positive integers such that p n = o(log n/ log log n). Let A n = T n / √ n, then, as n → ∞, converges in total variation to N (0, 1).
The CLT also holds for tr(f (A n )), when f is a fixed nonzero polynomial with nonnegative coefficients.
The author remarked that the above theorem is only for Gaussian Toeplitz matrices based on the obvious fact: considering the function f (x) = x, CLT may not hold for linear statistics of non-Gaussian Toeplitz matrices. The author also remarked that the above theorem says nothing about the limiting formula of the variance Var (tr(A pn n )). However, we assert that CLT holds for a test function f (x) = x 2p even for non-Gaussian Toeplitz matrices. When f (x) = x 2p+1 the fluctuation is Gaussian if and only if the diagonal random variable a 0 is Gaussian. Moreover, if we suppose a 0 ≡ 0, we can obtain CLT for any fixed polynomial test functions. On the other hand, we can calculate the variance in terms of integrals associated with pair partitions. Unfortunately, our method fails to deal with the test function f (x) = x pn , where p n depends on n.
Our study is inspired by the work of Sinai and Soshnikov [30], but new ideas are needed since the structure of Toeplitz matrices is quite different from that of Wigner matrices. In addition, our method can apply to other related random matrix models, including Hermitian Toeplitz matrices, Hankel matrices, sparse Toeplitz and Hankel matrices, singular values of powers of Toeplitz matrices, and product of several matrices in a flavor of free probability theory. Now we state the main theorem of our paper as follows.
For every p ≥ 2, we have (1.12) ω p −→ N (0, σ 2 p ) in distribution as n → ∞. Moreover, for a given polynomial we also have in distribution as n → ∞. Here the variances σ 2 p and σ 2 Q will be given in section 4. From the proof of our main theorem, we can easily derive an interesting result concerning product of independent variables whose subscripts satisfy certain "balance" condition. When p = 2, it is a direct result from the classical central limit theorem. Here we state it but omit its proof, see Remark 5.6 for detailed explanation. Corollary 1.2. Suppose that a j = a −j for j = 1, 2, · · · , and {a j } ∞ j=1 is a sequence of independent random variables satisfying the assumptions (1.5)-(1.7). For every p ≥ 2, converges in distribution to a Gaussian distribution N (0, σ 2 p ). Here the variance σ 2 p as in section 4 corresponding to the case b = 0.
where A n (0) denotes the matrix with a 0 = 0. It is easy to see that Here the n is the standard normal distribution, independent of a 0 , and M p−1 is the (p-1)-th moment in Theorem 3.1. Since M p = 0 ⇐⇒ p is odd, if a 0 is neither a constant a.s. nor Gaussian then the fluctuation is not Gaussian for odd p.
The remaining part of the paper is arranged as follows. Some integrals associated with pair partitions are defined in section 2. Sections 3, 4 and 5 are devoted to the proof of Theorem 1.1. We will extend our main result to other models closely connected with Toeplitz matrices in section 6.

Integrals associated with pair partitions
In order to calculate the moments of the limit distribution and the limiting covariance matrix of random variables ω p , we first review some basic combinatorical concepts, and then define some integrals associated with pair partitions. (1) We call π = {V 1 , · · · , V r } a partition of [n] if the blocks V j (1 ≤ j ≤ r) are pairwise disjoint, non-empty subsets of [n] such that [n] = V 1 ∪· · ·∪V r . The number of blocks of π is denoted by |π|, and the number of elements of V j is denoted by |V j |.
(2) Without loss of generality, we assume that V 1 , · · · , V r have been arranged such that s 1 < s 2 < · · · < s r , where s j is the smallest number of V j . Therefore we can define the projection π(i) = j if i belongs to the block V j ; furthermore for two elements p, q of [n] we write p ∼ π q if π(p) = π(q).
(3) The set of all partitions of [n] is denoted by P(n), and the subset consisting of all pair partitions, i.e. all |V j | = 2, 1 ≤ j ≤ r, is denoted by P 2 (n). Note that P 2 (n) is an empty set if n is odd.
(4) Suppose p, q are positive integers and p + q is even, we denote a subset of P 2 (p + q) by P 2 (p, q), which consists of such pair partitions π: there exists 1 ≤ i ≤ p < j ≤ p + q such that i ∼ π j (we say that there is one crossing match in π).
For other cases of p and q, we assume P 2,4 (p, q) is an empty set.
Now we define several types of definitive integrals associated with π ∈ P 2 (p, q) or π ∈ P 2,4 (p, q). For the reader's convenience, we suggest to omit them for the moment and refer to them when they are needed in sections 3 and 4. Let the parameter b ∈ [0, 1].
To every partition π ∈ P 2,4 (p, q), we construct a projective relation between two groups of unknowns y 1 , . . . , y p+q and x 1 , . . . , x p+q 2 −1 as follows: Then two kinds of integrals with Type II are defined respectively by

Mathematical expectation
In this section, we will review some results about the moments of the limiting distribution of eigenvalues in [26], for the convenience of the readers and further discussion.
Let us first give a lemma about traces of Toeplitz band matrices. Although its proof is simple, it is very useful in treating random matrix models closely related to Toeplitz matrices. Lemma 3.2. For Toeplitz band matrices T l,n = (η ij a l,i−j ) n i,j=1 with the bandwidth b n where a l,−n+1 , · · · , a l,n−1 are complex numbers and l = 1, . . . , p, we have the trace formula j l ) and the summation J runs over all possibilities that J ∈ {−b n , . . . , b n } p . Proof. For the standard basis {e 1 , · · · , e n } of the Euclidean space R n , we have Repeating T l,n 's effect on the basis, we have By tr(T 1,n · · · T p,n ) = n i=1 e t i T 1,n · · · T p,n e i , we complete the proof.
We will mainly use the above trace formula in the case where T 1,n = · · · = T p,n = T n . Since a 0 ≡ 0, from Kronecker delta symbol in the trace formula of (3.2), it suffices to consider these J = (j 1 , . . . , j p ) ∈ {±1, . . . , ±b n } p with the addition of p k=1 j k = 0. We remark that a 0 ≡ 0 is not necessary to Theorem 3.1. In fact it is sufficient to ensure Theorem 3.1 if all finite moments of random variable a 0 exist and its expectation is zero.
For J ∈ {±1, . . . , ±b n } p , we construct a set of numbers with multiplicities We call S J the projection of J.
The balanced J's can be classified into three categories. Category 1 (denoted by Γ 1 (p)): J is said to belong to category 1 if each of its components is coincident with exactly one other component of the opposite sign. It is obvious that Γ 1 (p) is an empty set when p is odd.
Category 2 (Γ 2 (p)) consists of all those vectors such that S J has at least one number with multiplicity 1.
Category 3 (Γ 3 (p)) consists of all other balanced vectors in {±1, . . . , ±b n } p . For J ∈ Γ 3 (p), either S J has one number of at least 3 multiplicity, or each of S J has multiplicity 2 but at least two of the components are the same, which are denoted respectively by Γ 31 (p) and Γ 32 (p).
We are now ready to prove Theorem 3.1.
Proof of Theorem 3.1. By Lemma 3.2, we have By the definition of the categories and the assumptions on the entries of the random matrices, we obtain 2 = 0.
Next, we divide 3 into two parts For J ∈ Γ 3 (2k), we denote the number of distinct elements of S J by t. By the definition of the category, we have t ≤ k. Note that the random variables whose subscripts have different absolute values are independent. Once we have specified the distinct numbers of S J , the subscripts j 1 , · · · , j 2k are determined in at most 2 2k k 2k ways. If J ∈ Γ 31 (2k), then t ≤ 2k−1 2 . Again by independence and the assumptions on the matrix elements (1.7), we find We can choose the other at most k − 1 distinct numbers, which determine j p0 = j q0 . This shows that there is a loss of at least one degree of freedom, thus the contribution of such terms is O(n −1 ), i.e.
Since the main contribution comes from the category 1, each term E[a J ] = 1 for For fixed π ∈ P 2 (2k), can be considered as a Riemann sum of the definite integral As in the above arguments, by Lemma 3.2 and the assumptions on the matrix elements (1.7), we have This completes the proof.
Remark 3.4. When b = 0, we can easily get M 2k = 2 k (2k − 1)!!. This is just the 2k-moment of the normal distribution with variance 2, which is also obtained independently by Basak and Bose [6] and Kargin [23]. However, for b > 0 it is quite difficult to calculate M 2k because the integrals in the sum of (3.1) are not all the same for different partitions π's, some of which are too hard to evaluate.

covariance
In this section we evaluate the covariance of ω p and ω q . Recall (4.1) where the summation J,J ′ runs over all balanced vectors J ∈ {±1, . . . , ±b n } p and The main result of this section can be stated as follows: when p + q is even and when p + q is odd.
When p = q we denote σ p,q by σ 2 p , where σ p denotes the standard deviation. From the above theorem, we can obtain the variance of ω Q in Theorem 1.1 By the independence of matrix entries, the only non-zero terms in the sum of (4.2) come from pairs of balanced vectors J = (j 1 , . . . , j p ) and J ′ = (j ′ 1 , . . . , j ′ q ) such that (i) The projections S J and S J ′ of J and J ′ have at least one element in common; (ii) Each number in the union of S J and S J ′ occurs at least two times.
Definition 4.2. Any pair of balanced vectors J = (j 1 , . . . , j p ) and J ′ = (j ′ 1 , . . . , j ′ q ) satisfying (i) is called correlated. If j u ∈ J(j u is a component of J) and |j u | ∈ S J S J ′ , then j u is called a joint point of the ordered correlated pair.
To observe which correlated pairs lead to the main contribution to the covariance, we next construct a balanced vector of dimension (p + q − 2) from each correlated pair J of dimension p and J ′ of dimension q. Although the corresponding map of correlated pairs to such balanced vectors is not one to one, the number of preimages for a balanced vector of dimension (p + q − 2) is finite (only depending on p and q). We will study the resulting balanced vectors in a similar way in section 3.
Proof of Theorem 4.1. Let us first construct a map from the ordered correlated pair J and J ′ as follows. Let j u ∈ J be the first joint point(whose subscript is the smallest) of the ordered correlated pair J and J ′ , and let j ′ v be the first element we construct a vector L = (l 1 , . . . , l p+q−2 ) such that we proceed as in the way above. We call this process of constructing L from J and J ′ a reduction step and denote it by L = J |ju| J ′ . Remark 4.3. From the construction above, for any joint point of J and J ′ a reduction step can be done in the same way. Given θ ∈ S J S J ′ , when saying J θ J ′ , we mean that there exists certain joint point of J and J ′ j u satisfying |j u | = θ and J θ J ′ is the vector after this reduction step. In this section, j u is always the first joint point. While in section 5, j u may denote other joint points which is clear in the context.
Notice that the reduction might cause the appearance of one number with multiplicity 1 in S L , although each number in the union of S J and S J ′ occurs at least two times. If so, the resulting number with multiplicity 1 in S L must be coincident with the joint point j u . In addition, to estimate which terms lead to main contribution to higher moments of tr(A p n ), we will use the reduction steps and mark the appearance of the numbers with multiplicity 1 in section 5.
Next, we assume we have a balanced vector L of dimension (p + q − 2), we shall estimate in how many different ways it can be obtained from correlated pairs of dimensions p and q. First, we have to choose some component l u in the first half of the vector, 1 ≤ u ≤ p such that We also have to choose some component If j u is the joint point of the constructed correlated pair J and J ′ and j ′ v is the corresponding element in J ′ , then the pair {J, J ′ } or {J, −J ′ } is the pre-image of L. Note that since when u = v = 1 the conditions (4.6) and (4.8) are satisfied, the pre-image of L always exists. A simple estimation shows that the number of pre-images of L is at most 2pq, not depending on n (we will see this fact plays an important role in the estimation of higher moments in section 5).
Since there is at most one element with multiplicity 1 in S L , if there is one number with multiplicity 1, then the number will be determined by others because of the balance of L. Consequently, the degree of freedom for such terms is at most . Therefore, the sum of these terms will be O(b −1/2 n ), which can be omitted. Now we suppose each number in S L occurs at least two times. Recall the procedure in section 3, and we know that the main contribution to the covariance (4.2) comes from the L ∈ Γ 1 (p + q − 2), which implies E[ω p ω q ] = o(1) when p + q is odd. When p + q is even, for L ∈ Γ 1 (p + q − 2) the weight So far we have found such terms leading to the main contribution, now we calculate the variance. Based on whether or not the fourth moment appears, we evaluate the covariance. If the fourth moment doesn't appear, then j 1 , . . . , j p , j ′ 1 , . . . , j ′ q match in pairs. In the abstract, by their subscripts they can be treated as pair partitions of {1, 2, . . . , p, p + 1, . . . , p + q} but with at least one crossing match (i.e., P 2 (p, q) as in section 2). Thus, for every π ∈ P 2 (p, q), the summation can be a Riemann sum and its limit becomes f − I (π) ( it is f + I (π) when the first coincident components in J and J ′ have the same sign). On the other hand, if the fourth moment does appear, then j 1 , . . . , j p , j ′ 1 , . . . , j ′ q match in pairs except that there exist a block with four elements. Therefore, from the balance of p k=1 j k = 0 and p k=1 j ′ k = 0, we know that the main contribution must come from such partitions: j 1 , . . . , j p and j ′ 1 , . . . , j ′ q both form pair partitions; the block with four elements take respectively from a pair of j 1 , . . . , j p and j ′ 1 , . . . , j ′ q . Otherwise, the degree of freedom decreases by at least one. Similarly, for every π ∈ P 2,4 (p, q), the corresponding summation can be a Riemann sum and its limit becomes f − II (π) (it is f + II (π) when the first coincident components in J and J ′ have the same sign).
In a similar way as in section 3, noting that the coincident components in J and J ′ may have the same or opposite sign, we conclude with the notations in section 2 that as n −→ ∞. This completes the proof.

5.
Higher Moments of tr(A p n ) Let B n,p denote the set of all balanced vectors J = (j 1 , . . . , j p ) ∈ {±1, . . . , ±b n } p . Let B n,p,i (1 ≤ i ≤ n) be a subset of B n,p such that J ∈ B n,p,i if and only if With these notations, Lemma 3.2 could be rewritten as To finish the proof of Theorem 1.1, it is sufficient to show that given p 1 , p 2 , · · · , p l ≥ 2 and l ≥ 1, as n → ∞ we have where {g p } p≥2 is a Gaussian family with covariances σ p,q = E[g p g q ].
Then a CLT for follows, with the variance The main idea is rather straightforward: in an analogous way to the one used in Eq. (4.2), we will deal with  Remember that two balanced vectors J = (j 1 , . . . , j p ) and J ′ = (j ′ 1 , . . . , j ′ q ) are called correlated if the corresponding projections S J and S J ′ of J and J ′ have at least one element in common.
Definition 5.1. Given a set of balanced vectors {J 1 , J 2 , . . . , J l }, a subset of balanced vectors J mj 1 ,J mj 2 ,. . . ,J mj t is called a cluster if 1) for any pair J mi ,J mj from the subset one can find a chain of vectors J ms , also belongs to the subsets, which starts with J mi ends with J mj , such that any two neighboring vectors are correlated; 2) the subset J mj 1 ,J mj 2 ,. . . ,J mj t cannot be enlarged with the preservation of 1).
It is clear that the vectors corresponding to different clusters are disjoint. By this reason the mathematical expectation in (5.5) decomposes into the product of mathematical expectations corresponding to different clusters. We shall show that the leading contribution to (5.5) comes from products where all clusters consist exactly of two vectors, as is stated in Lemma 5.2 below. where the product ⋆ in (5.6) is taken over l vectors which exactly form a cluster.
For fixed p, all the involved moments no higher than p are O(1) because of uniform boundedness of matrix entries. On the other hand, 0 ≤ l t=1 I Jt ≤ 1. So to prove Lemma 5.2, we just need to count the number of terms in (5.6). As before, to complete the estimation it suffices to replace B n,p,i by B n,p . That is, it suffices to prove where the summation ⋆ is taken over l vectors which exactly form a cluster. Instead of Lemma 5.2, we will prove ).
Notice that we list the condition (iii) which looks redundant from J ∈ {±1, . . . , ±b n } p to emphasize the importance of 0 (i.e., the diagonal matrix entry a 0 ). In fact, if p 1 , p 2 , · · · , p l are all even, even for these J ∈ {0, ±1, . . . , ±b n } p but under the conditions (i) and (ii), we can still get the above estimation.
Proof of Lemma 5.3. The intuitive idea of the proof is as follows: from condition (i) in Lemma 5.3, regardless of the correlating condition, the cardinality of B p (denoted by card|B p | for short) is O(b p 1 +p 2 +···+p l 2 n ). We could say that the freedom degree is . But each correlation means two vectors share a common element so that it will decrease the freedom degree by one. To form a cluster we need l − 1 Thus we obtain the desired estimation in Lemma 5.3. However, unfortunately adding one correlation does not necessarily lead the freedom degree to decrease by one. Some may be redundant. So we have to make use of the correlations more efficiently.
As in section 4, we do the reduction steps as long as the structure of the cluster is preserved. To say precisely, we start from checking J 1 and J 2 . If j u is the first joint point of J 1 and J 2 satisfying the condition that J 1 |ju| J 2 can still form a cluster with the other vectors, then we do this reduction step. If this kind of j u does not exist, we turn to check J 1 and J 3 in the same way. After each reduction step, we have a new cluster of vectors J 1 , J 2 , . . . , J l . We continue to check J 1 and J 2 as before. If we cannot do any reduction step, we stop.
Suppose that we did m reduction steps in total. Then we have a new cluster of vectors J ′ 1 , J ′ 2 , . . . , J ′ l ′ and the dimension of J ′ i is p ′ i for 1 ≤ i ≤ l ′ . From the reduction process, for any θ ∈ S J ′ α 1 S J ′ α 2 , J ′ α1 θ J ′ α2 cannot form a cluster with the other vectors. The resulting cluster still satisfies condition (ii) in Lemma 5.3. However, the condition (i) may fail because the joint point of a pair of correlated vectors could be triple multiplicity, thus after a reduction step its multiplicity becomes one.
Note that after a reduction step the number of pre-images of the resulting vector only depends on the dimensions of the involving vectors, not depending on n. Thus we only need to estimate the degree of freedom of the reduced vectors J ′ 1 , J ′ 2 , . . . , J ′ l ′ . Since after one reduction step the total dimension of vectors will decrease by two and the number of vectors will decrease by one, we have and (5.10) l = l ′ + m.
Denote by l 0 the number of single multiplicity elements in Since one reduction step will add at most one element with single multiplicity, therefore (5.11) l 0 ≤ m.
Below, we will proceed according to two cases: l ′ > 1 and l ′ = 1.
In the case l ′ > 1 To complete the proof of this case, we need some definitions and notations. Let U be the set consisting of all elements which belongs to at least two of and denote the number of vectors in H θ by h θ = card|H θ |. Obviously, {J ′ i |i ≤ l ′ } = θ∈U H θ . We notice the following three facts. Fact 1: For any θ ∈ U, h θ ≥ 3. In fact, from the definition of U, we know h θ ≥ 2. If h θ = 2, the two vectors in H θ could still be reduced to one vector and the reduction doesn't affect their connection with the other vectors, which is a contradiction with our assumption that J ′ 1 , J ′ 2 , . . . , J ′ l ′ cannot be reduced. Fact 2: For any θ ∈ U and J ′ i ∈ H θ , the multiplicity of θ in S J ′ i is one . Otherwise, there are two vectors belonging to H θ , for example, J ′ 1 , J ′ 2 ∈ H θ but S J ′ 1 has two θ's, then J ′ 1 θ J ′ 2 could be a reduction step and J ′ 1 θ J ′ 2 , J ′ 3 , . . . , J ′ l ′ still forms a cluster.
Fact 3: For any different elements θ and γ in l ′ i=1 S J ′ i , card|H θ H γ | ≤ 1. Otherwise, suppose J ′ i and J ′ j belongs to H θ H γ and J ′ k is an element of H θ other than J ′ i and J ′ j . From Fact 1, J ′ k must exist. Now J ′ i θ J ′ k can form a reduction step since J ′ i θ J ′ k could be correlated with J ′ j by γ and other correlations won't be broken.
Choose a minimal dominating set denoted by U 0 , which means that any proper subset of U 0 is not a dominating set. Since U is a finite set, U 0 must exist and Once we have already known the elements of other than θ j , thus θ j will be determined by the balance of J ′ kj . Set then the different way of choice of ). From Eqs. (5.9) and (5.10), we have Note that m − l 0 ≥ 0 from Eq. (5.11). If h > l ′ , we have If h = l ′ and t = 1, the analysis is easy but a little complex. We will deal with it later. Now we focus on the situation that h = l ′ and t > 1. In this case, since . . , J ′ l ′ form a cluster, without lost of generality, we could assume that ( We know that γ ∈ U\U 0 since H θi H θj = ∅ for any 1 ≤ i = j ≤ t. From Fact 3, card|H θi H γ | = 0 or 1.

Given the elements of
, θ i can be decided by the the balance of some J ′ ∈ H θi \H γ . Then γ can be decided by any vector in H γ . Thus we have +l0 n ) ways to decide J ′ 1 , J ′ 2 , . . . , J ′ l ′ . From Eq.(5.12), one gets If h = l ′ and t = 1, which means that θ 1 appears exactly one time in each S Ji (1 ≤ i ≤ l ′ ), we will divide the situation into three subcases. Case I: l 0 > 0. Without loss of generality, we suppose α ∈ J ′ 1 is an element with single multiplicity in If the elements except for θ 1 and α are known, θ 1 could be determined from the balance of J ′ 2 and then α could be determined from the balance of J ′ 1 . So we have ways of choice in sum. From h θ1 = h = l ′ and Eqs. (5.9) and (5.10), we get Case II: l 0 = 0 and m > 0. As case I above, θ 1 is determined by other elements. To determine all the elements ways. From Eqs. (5.9) and (5.10), we have Case III: l 0 = 0 and m = 0.
In this case, we cannot do any reduction step. We claim that there exists γ ∈ S J1 other than γ = θ 1 such that the number of +γ and − γ in γ ∈ S J1 are not equal (when p 1 is even we can always find some γ other than θ 1 such that γ occurs odd times in S J1 ). Otherwise, θ 1 = 0 because of the balance of J 1 , which contradicts condition (iii) in Lemma 5.3.
To determine all the elements except for θ 1 and γ, we have So we can determine θ 1 from the balance of J 2 . Then γ will be determined by the balance of J 1 . Since h θ1 = l, we have Now we have completed the proof in the case of l ′ > 1.
In the case l ′ = 1 We also divide this case into two subcases. Case I ′ : l 0 > 0. Suppose α ∈ J ′ 1 is an element with single multiplicity in If the elements other than α are known, α could be determined from the balance of J ′ 1 . So we have ) ways. From l ′ = 1 and Eqs. (5.9), (5.10) and (5.11), we have Case II ′ : l 0 = 0. In this case, every element in S J ′ 1 appears at least twice. Thus we have totally ways. From l ′ = 1 and Eqs. (5.9) and (5.10), it follows that since l ≥ 3. Now we have completed the proof in the case of l ′ = 1 . The proof of Lemma 5.3 is then complete.
Remark 5.5. From the analysis of case III above, we could also understand why the technical but necessary condition of a 0 ≡ 0 is assumed in the introduction. But when p is even, the assumption of a 0 ≡ 0 is not necessary.
p is odd and the fluctuation can be stated as in Corollary 6.5 (For conciseness we replace b n by n there). From the above point of view and the extended results, our method gives a mechanism of producing a CLT.

Extensions to other models
In this section, we will make use of our method to deal with some random matrix models closely related to Toeplitz matrices, including Hermitian Toeplitz band matrices, Hankel band matrices, sparse Toeplitz and Hankel matrices, generalized Wishart-type Toeplitz matrices, etc. Besides, we will also consider the case of several random matrices in a flavor of free probability theory. Before we start our extensions, we first review and generalize the key procedures or arguments in calculating mathematical expectation (the distribution of eigenvalues), covariance and higher moments in sections 3, 4, 5 respectively as follows: 1)"Good" trace formulae. See Lemma 3.2. The simple formula both represents the form of the matrix and translates our object of matrix entries to its subscripts j 1 , . . . , j p , which are integers satisfying some homogeneous equation. In the present paper, we mainly encounter such homogeneous equations as where τ l can take ±1. We write J = (j 1 , . . . , j p ).
2)"Balanced" vectors via some homogeneous equations. We can generalize the concept of balance: a vector J = (j 1 , . . . , j p ) is said to be balanced if its components satisfy one of finite fixed equations with the form of (6.1). From the balance of a vector, one could determine an element by knowing the other ones and solve one of the equations.
3) Reduction via a joint point. Eliminate the joint point from two correlated vectors in some definite way, and one gets a new balanced vector. 4) Choose a minimal dominating set by which the freedom degree can be reduced case-by-case. We should particularly be careful about 0.
The CLT is essentially a consequence of the fact that we can omit the terms who have a cluster consisting of more than two vectors. Thus if the argument in section 5 is valid, the CLT is true. In the following models, we will establish a good trace formulae in each case and then balanced vector and reduction step can be defined in a natural way. With these equipment the analysis in section 5 is still valid after a small adaption.
The mathematical expectation and covariance vary from case to case. But the way to calculate them is similar with each other. We will define some suitable integrals associated with pair partitions similar to those in section 2, after which we can state the results concisely.
We will only state main results but not give their proofs in detail, because the proofs would have been overloaded with unnecessary notations and minor differences. When necessary, we might point out some differences of proofs.
6.1. Hermitian Toeplitz band matrices. The case is very similar to real symmetric Toeplitz band matrices, except that we now consider n-dimensional complex Hermitian matrices T n = (η ij a i−j ) n i,j=1 . We assume that Re a j = Re a −j and Im a j = −Im a −j for j = 1, 2, · · · , and {Re a j , Im a j } j∈N is a sequence of independent real random variables such that and further (uniform boundedness) In addition, we also assume a 0 ≡ 0 and the bandwidth b n → ∞ but b n /n → b ∈ [0, 1].
Theorem 6.1. With above assumptions and notations, Theorem 1.1 also holds for random Hermitian Toeplitz band matrices.
Remark 6.2. The distribution of eigenvalues for this case has been proved to be the same as real case in [26]. The covariance like in (4.2) is slightly different from real case because E[a 2 j ] = 0, which is given by f − II (π) + f + II (π) . (6.5) 6.2. Hankel band matrices. A Hankel matrix H n = (h i+j−1 ) n i,j=1 is closely related to a Toeplitz matrix. Explicitly, let P n = (δ i−1,n−j ) n i,j=1 the "backward identity" permutation, then for a Toeplitz matrix of the form T n = (a i−j ) n i,j=1 and a Hankel matrix of the form H n = (h i+j−1 ) n i,j=1 , P n T n is a Hankel matrix and P n H n is a Toeplitz matrix. In this paper we always write a Hankel band matrix H n = P n T n where T n = (η ij a i−j ) n i,j=1 is a Toeplitz band matrix with bandwidth b n and the matrix entries a −n+1 , · · · , a 0 , · · · , a n−1 are real-valued, thus H n is a real symmetric matrix.
For Hankel band matrices, as in Toeplitz case we also have a trace formula and its derivation is similar, see [26]. From the above trace formula, our method can apply to the case that p is even. We consider random Hankel matrices satisfying the following assumptions: assuming that {a j : j ∈ Z} is a sequence of independent real random variables such that (homogeneity of 4-th moments) and further (uniform boundedness) In addition, we also assume the bandwidth b n → ∞ but b n /n → b ∈ [0, 1] as n → ∞.
Theorem 6.4. Let H n be a real symmetric ((6.6)-(6.8)) random Hankel band matrix with the bandwidth b n , where b n /n → b ∈ [0, 1] but b n → ∞ as n → ∞. Set . Then (6.10) ζ p −→ N (0,σ 2 p ) in distribution as n → ∞. Moreover, for a given polynomial q j x j with degree p ≥ 1, set , we also have (6.13) ζ Q −→ N (0,σ 2 Q ) in distribution as n → ∞. Here the variancesσ 2 p andσ 2 Q will be given below. We remark that the limit of 1 n tr(H n / √ b n ) p can be calculated in the same way as in Toeplitz case, see also [26]. Next, we derive briefly the variancesσ 2 p andσ 2 Q . By Lemma 6.3, rewrite (6.14) Since a j and a −j are independent when j = 0, S J , the projection of J, should be J itself (forget the order of its components).
The only difference is the definition of balance and projection. Using the same technique in section 4, we get 4(2p,2q) g II (π) (6.16) as n → ∞.
Here for a fixed pair partition π, we construct a projective relation between two groups of unknowns y 1 , . . . , y 2p+2q and x 1 , . . . , x p+q as follows: If π ∈ P 2 (2p, 2q), let If π ∈ P 2,4 (2p, 2q), let Because of the existence of the characteristic function in the integrals of type I and II, we see that g I (π) = 0 if and only if Denote the subset of P 2 (2p, 2q) consisting of this kind of π by P I 2 (2p, 2q). g II (π) = 0 if and only if Denote the subset of P 2,4 (2p, 2q) consisting of this kind of π by P II 2,4 (2p, 2q). From the definition of balance, if we denote V i = {i 1 , i 2 , i 3 , i 4 } to be the block with four elements and 1 ≤ i 1 < i 2 ≤ 2p < i 3 < i 4 ≤ 2q, then i 1 + i 2 and i 3 + i 4 must be odd. Moreover, j + k is odd provided j, k ∈ V i and i ∼ π j.
From the discussion above, we get π∈P II 2,4 (2p,2q) g II (π), (6.20) Similar to Corollary 6.5, we get another central limit theorem for product of independent random variables. Corollary 6.5. Suppose that {a j : j ∈ Z} is a sequence of independent random variables satisfying the assumptions (6.6)-(6.8). For every p ≥ 1, Remark 6.6. The derivation of higher moments can be calculated in the same way as in section 5. Careful examination shows that the argument in case III in section 5 (the existence of γ) is the only part depending on the definition of balance. However, since 2p and 2q are even, the dimensions of all vectors involved are even. Remark 5.5 still applies so that one needn't care much about the definition of balance. Moreover, a 0 needn't equal to 0. The same thing happens in section 6.4.

Sparse Toeplitz and Hankel matrices.
A sparse matrix is a matrix in which some entries are replaced by 0, which occurs in some application background. A sparse random matrix provides a more natural and relevant description of the complex system in nuclei physics, we refer to [4] for an introduction. A sparse Toeplitz matrix can be expressed as follows. Let we define (6.24) ξ ij = 1, |i − j| ∈ B n ; 0, otherwise.
Then a sparse Toeplitz matrix is defined by (6.25) T n = (ξ ij a i−j ) n i,j=1 , and the corresponding Hankel matrix is H n = P n T n as before. Note that for Hankel matrices we can define a more general case with minor adaptations: . . , ±(n − 1)} and (6.26) ξ ij = 1, i − j ∈ B n ; 0, otherwise.
In addition, we assume that there exists a sequence {b n }with b n With the above assumptions, we claim: Theorem 6.7. Under the same conditions therein, Theorems 1.1 and 6.1 also hold for sparse random Toeplitz matrices, and Theorem 6.4 also holds for sparse random Hankel matrices.
We remark that in sparse matrix case, the only difference is integral interval of variables, for example, the moments in Eq. (3.1) become 6.4. Singular values of powers of Toeplitz matrices. For a real or complex (random) Toeplitz band matrix T n = (η ij a i−j ) n i,j=1 with the bandwidth b n , writing T * n for the adjoint matrix of T n , we consider the product T * s n T s n of powers with some s ∈ {1, 2, . . . , }. When the Toeplitz T n is replaced by a n × n matrix, the norm of product of random matrices in the above form was studied in [5], when investigating the limiting behavior of solutions to large systems of linear equations. Recently, the limiting distribution of the product of random matrices has been studied in [1] and its moments is known as Fuss-Catalan numbers. Set For real Toeplitz case, we assume that {a j : j ∈ Z} is a sequence of independent real random variables satisfying ((6.6)-(6.8)). For complex Toeplitz case, we assume that {Re a j , Im a j } j∈Z is a sequence of independent real random variables such that (6.30) E[a j ] = 0, E[|a j | 2 ] = 1 and E[a 2 j ] = 0 for j ∈ Z, (homogeneity of 4-th moments) and further (uniform boundedness) In addition, we also assume the bandwidth b n → ∞ but b n /n → b ∈ [0, 1] as n → ∞. Note that we have the trace formulas as in Eq. Here J = (j 1 , . . . , j 2ps ) ∈ {−b n , . . . , b n } 2ps , I J = 2ps k=1 χ [1,n] (i + k l=1 (−1) [ l−1 s ]0 j l ) and a J = 2ps l=1ã j l . [x] 0 denotes the largest integer not larger than x, and (6.34)ã j l = a j l , if [ l−1 s ] 0 is even; a j l , otherwise.
On the other hand, we remark that when s = 1 the real Toeplitz case reads (6.35) tr (W (1) where H n denotes the Hankel matrix as in Theorem 6.4. Analogous to Theorem 6.4, we have the following theorem for real and complex Toeplitz matrices: Theorem 6.8. Under the above assumptions, for a given polynomial in distribution as n → ∞. Hereσ 2 Q denotes the variance. Remark 6.9. Because all the involved vectors have even dimensions, according to Remark 6.6, the assumption of a 0 = 0 is not necessary.
We can evaluate the varianceσ 2 Q as in the Hankel case, but its expression is very redundant so we omit it. Here we only gives the limit of p-moment, i.e., Here for a given partition π ∈ P 2 (2ps), the unknowns {y 1 , . . . , y 2ps } and {x 1 , . . . , x ps } satisfies the relation: (6.39) y i = y j = x π(i) whenever i ∼ π j. each of which consists of s 1's or s (-1)'s. Then the pair partitions satisfying (6.40) correspond to the pair matches of these 2ps 1 or -1 such that each pair has one 1 and one -1. The total number of this kind of partitions is (ps)!. Thus, in the special case b = 0, the p-th moment in (6.38) equals 2 ps (ps)!. By the property of Γ-function, one knows that 2 ps (ps)! is the p-th moment of the density f (x) = 1 2s x 1−s s exp(− 1 2 x 1 s ) on [0, ∞). However, unfortunately when s > 1 the density f (x) is not uniquely determined by its moments, see [16]. Thus we cannot say that the distribution of eigenvalues of W (s) n converges weakly to f (x). 6.5. Product of several Toeplitz matrices. In free probability theory, one usually considers the limit of joint moments of several independent matrices [19,28]. Based on this point of view, the first version [25] of [26] by two of the authors, provided the limit joint distribution of several independent Toeplitz and Hankel matrices. In the present paper, we will prove Gaussian fluctuation of the joint moments for symmetric Toeplitz matrices (the same result proves right for Hermitian Toeplitz case and Hankel matrices ). More precisely, given independent symmetric Toeplitz band matrices T 1,n , T 2,n , . . . , T r,n , we study the limit and fluctuation of tr(T i1,n T i2,n · · · T ip,n ) for 1 ≤ i 1 , i 2 , . . . , i p ≤ r.
Theorem 6.11. Let T 1,n , T 2,n , . . . , T r,n be independent real symmetric random Toeplitz band matrices, each of which satisfies the assumptions in Theorem 1.1. with the above notations, we have n tr(T i1,n T i2,n · · · T ip,n ) − E[tr(T i1,n T i2,n · · · T ip,n )] .