SPECTRUM OF RANDOM TOEPLITZ MATRICES WITH BAND STRUCTURE
DOI: 10.1214/ECP.v14-1492

This paper considers the eigenvalues of symmetric Toeplitz matrices with independent random entries and band structure. We assume that the entries of the matrices have zero mean and a uniformly bounded fourth moment, and we study the limit of the eigenvalue distribution when both the size of the matrix and the width of the band with non-zero entries grow to infinity. It is shown that if the ratio of the bandwidth to the matrix size converges to zero, then the limit of the eigenvalue distributions is Gaussian. If the ratio converges to a positive limit, then the distributions converge to a non-Gaussian distribution that depends only on this limit ratio. A formula for the fourth moment of this distribution is derived.


Introduction
In the recent papers (Bryc et al., 2006) and (Hammond and Miller, 2005), it was shown that the eigenvalue distribution of Toeplitz matrices with i.i.d. entries converges to a universal law as the size of the matrix grows. The limit law is not Gaussian, which is all the more surprising if we compare this result with the result of (Bose and Mitra, 2002), which says that the limit law is Gaussian for circulants, a closely related ensemble of matrices. This comparison suggests that the difference in the behavior of the limit law is due to the entries in the upper-right and lower-left corners of the matrices. For this reason, it is interesting to study what happens if the entries in these corners are set equal to zero, that is, if the matrices have a band structure. It turns out that if the ratio of the band width to the matrix size decreases to zero as the matrices grow, then it is possible to adjust the matrices in such a way that the resulting matrix is a circulant, and the effect of the modification on the eigenvalue distribution is negligible. This allows us to show that in this case the limit eigenvalue distribution is Gaussian. In contrast, if the ratio of the band width to the matrix size approaches a positive limit as the size of the matrix grows, then the limit eigenvalue law is not Gaussian. The proof of this result is by the method of moments and by an explicit calculation of the kurtosis of the limit law. It turns out that the kurtosis is different from the Gaussian value of 3 for all width/size ratios except one. This exceptional ratio requires a calculation of higher moments. Numerical estimation suggests that even in this case the limit distribution is not Gaussian.
Besides the papers mentioned above, the spectrum of random Toeplitz matrices is investigated in (Bose and Sen, 2007), which studies the case when the entries of the Toeplitz matrices are non-centered; in (Massey et al., 2007), where the main interest is in random Toeplitz matrices with palindromic structure; and in (Meckes, 2007), which studies the norm of random Toeplitz matrices. In addition, (Chatterjee, 2009) shows that the fluctuations of the eigenvalue distribution around the limit are Gaussian. The rest of the paper is organized as follows. Section 2 treats the case when the ratio of the band width to the matrix size decreases to zero as the size of the matrix grows, and Section 3 treats the case when this ratio tends to a non-zero limit.

Vanishing band-to-size ratio
We consider finite symmetric Toeplitz matrices with band structure. In other words, we assume that the entries of an $N$-by-$N$ symmetric matrix $X_N$, which we denote $X_{ij}$, depend only on the difference $i - j$, and that they are zero if this difference is large: $X_{ij} = a_{|i-j|}$ and $a_k = 0$ if $k > M$. It is understood that $a_k$ depends on $N$; we suppress this dependence in order to make the notation less cumbersome. If $\lambda_1, \dots, \lambda_N$ are the eigenvalues of the matrix $X_N$, counted with multiplicities, then we define the empirical probability measure of the eigenvalues as
$$\mu_N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i},$$
where $\delta_\lambda$ denotes the unit point mass at $\lambda$. We are interested in the behavior of the measures $\mu_N$ as both $N$ and $M$ grow to infinity. Let the cumulative distribution function of $\mu_N$ be denoted $F_N(x)$.
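For concreteness, the ensemble can be sketched numerically. The following Python snippet is our own illustration, not part of the paper's argument; it assumes Gaussian entries $a_k$ with variance $1/M$ (a normalization consistent with the variance computations of Section 2), while the paper only requires zero mean, symmetric distributions, and uniformly bounded fourth moments of $\sqrt{M}\,a_k$.

```python
import numpy as np

def banded_toeplitz(N, M, rng):
    """Symmetric N-by-N Toeplitz matrix X with X[i, j] = a_|i-j| and a_k = 0 for k > M."""
    a = rng.standard_normal(M + 1) / np.sqrt(M)  # assumed normalization: Var a_k = 1/M
    i, j = np.indices((N, N))
    d = np.abs(i - j)
    # entries outside the band of width M are set to zero
    return np.where(d <= M, a[np.minimum(d, M)], 0.0)

rng = np.random.default_rng(0)
N, M = 300, 30
X = banded_toeplitz(N, M, rng)
eigenvalues = np.linalg.eigvalsh(X)  # mu_N puts mass 1/N at each eigenvalue
```

The empirical measure $\mu_N$ is then the uniform distribution on the array `eigenvalues`.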
Our first result is that if the size of the matrix grows faster than the width of the band, that is, if $M \to \infty$ and $M/N \to 0$ as $N \to \infty$, then the spectral distributions $F_N(x)$ converge to the Gaussian distribution with mean zero and variance 2 (Theorem 2.1).
Our strategy of proof is to relate the eigenvalue distributions of Toeplitz matrices and circulants. Namely, we will modify the upper-right and lower-left corners of the matrix $X_N$ so that the resulting matrix $Y_N$ is a circulant and the eigenvalue distribution of $Y_N$ is close to the eigenvalue distribution of $X_N$. See (Gray, 1972) for an explanation of this method in the deterministic setting; we adapt it to the case of random Toeplitz matrices. Let us introduce a measure of distance between probability distributions (cf. (Trotter, 1984)). If $F(x)$ and $G(x)$ are two distributions, then the distance in distribution is defined by
$$d(F, G)^2 = \inf \mathbb{E}(X - Y)^2,$$
where the infimum is over all random variables $X$ and $Y$ that are defined on the same probability space and have the distributions $F$ and $G$, respectively. The usefulness of this distance stems from the fact that if $d(F_N, G_N) \to 0$, then for every bounded continuous function $f(x)$,
$$\int f(x)\, dF_N(x) - \int f(x)\, dG_N(x) \to 0.$$
For the proof, see (Trotter, 1984). We are going to use this result in the following way. We will take the spectral distribution of $Y_N$ as $G_N$, and we will show that $d(F_N, G_N) \to 0$ in probability. This implies that for each $t$,
$$F_N(t) - G_N(t) \to 0 \quad (1)$$
in probability. Note that $F_N(t)$ and $G_N(t)$ are positive random variables bounded by 1, and for each $N$ these two random variables are defined on the same probability space. This fact and the convergence in probability in (1) imply that
$$\mathbb{E}\, F_N(t) - \mathbb{E}\, G_N(t) \to 0 \quad \text{and} \quad \operatorname{Var} F_N(t) - \operatorname{Var} G_N(t) \to 0.$$
Hence, in order to prove the theorem it remains to show that $\mathbb{E}\, G_N(\sqrt{2}\,x)$ converges pointwise to the standard Gaussian distribution function $\Phi(x)$ and that $\operatorname{Var} G_N(x) \to 0$.
In order to prove that $d(F_N, G_N) \to 0$ in probability, we use a result from (Hoffman and Wielandt, 1953). Let the (normalized) Frobenius norm of a matrix $A$ be defined by
$$\|A\|_2 = \left( \frac{1}{N} \sum_{i,j=1}^{N} |A_{ij}|^2 \right)^{1/2},$$
where $N$ is the size of the matrix $A$.
Lemma 2.2. Let $\Lambda(X)$ and $\Lambda(Y)$ be the empirical distributions of the eigenvalues of symmetric matrices $X$ and $Y$. Then
$$d(\Lambda(X), \Lambda(Y)) \le \|X - Y\|_2.$$
For the proof see (Hoffman and Wielandt, 1953) or (Trotter, 1984).
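Lemma 2.2 is easy to check numerically (a sketch of ours, not from the paper): for two empirical measures with the same number of atoms, the optimal $L^2$ coupling pairs the sorted values, so $d(\Lambda(X), \Lambda(Y))$ can be computed directly and compared with the normalized Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = rng.standard_normal((N, N)); X = (X + X.T) / 2        # a symmetric matrix
Y = X + 0.05 * np.diag(rng.standard_normal(N))            # a symmetric perturbation of X

# the optimal L2 coupling of two N-atom empirical measures pairs the sorted values
lam_X, lam_Y = np.linalg.eigvalsh(X), np.linalg.eigvalsh(Y)   # sorted ascending
d = np.sqrt(np.mean((lam_X - lam_Y) ** 2))

frob = np.linalg.norm(X - Y) / np.sqrt(N)   # normalized Frobenius norm ||X - Y||_2
assert d <= frob + 1e-12                    # Hoffman-Wielandt inequality
```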
Let $Y_N$ be the circulant with entries $(Y_N)_{ij} = a_{|i-j|} + a_{N-|i-j|}$; since $a_k = 0$ for $k > M$, the second term is non-zero only in the upper-right and lower-left corners. For the random matrices $X_N$ and $Y_N$, we compute:
$$\mathbb{E}\, \|X_N - Y_N\|_2^2 = \frac{1}{N} \sum_{i,j:\, |i-j| \ge N-M} \mathbb{E}\, a_{N-|i-j|}^2 = \frac{M(M+1)}{N} \cdot \frac{1}{M} \le \frac{2M}{N},$$
where we used the normalization $\mathbb{E}\, a_k^2 = 1/M$. This expression converges to zero as $N \to \infty$. By using the independence of the random variables $a_i$, we can also compute $\operatorname{Var} \|X_N - Y_N\|_2^2$. Hence, if we use the uniform boundedness of $\mathbb{E}(\sqrt{M}\, a_i)^4$ and the equality $\mathbb{E}\, a_i^2 = 1/M$, it follows that $\|X_N - Y_N\|_2^2 \to 0$ and, therefore, $d(F_N, G_N)$ converges to zero in probability. It remains to investigate $G_N(t)$, the empirical distribution function of the eigenvalues of $Y_N$. These matrices are circulants, and we can use the results and methods of (Bose and Mitra, 2002) in order to show that $\mathbb{E}\, G_N(\sqrt{2}\,x)$ converges pointwise to the standard Gaussian law and $\operatorname{Var} G_N(x) \to 0$. For completeness, we give a detailed argument. The eigenvalues of $Y_N$ can be computed explicitly as
$$\lambda_{k,N} = a_0 + 2 \sum_{l=1}^{M} a_l \cos\left( \frac{2\pi k l}{N} \right), \qquad k = 1, \dots, N. \quad (2)$$
Let us write the random function $G_N(x)$ as follows:
$$G_N(x) = \frac{1}{N} \sum_{k=1}^{N} \mathbf{1}_{(-\infty, x]}(\lambda_{k,N}).$$
In the following, $[x]$ denotes the integer which is closest to $x$, with ties broken in favor of smaller integers, and $\{x\} := |x - [x]|$ denotes the distance from $x$ to the nearest integer.
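The eigenvalue formula (2) for symmetric circulants can be verified numerically. The sketch below is our own illustration (with Gaussian entries of assumed variance $1/M$); it builds a symmetric circulant from the coefficients $a_k$ and compares its eigenvalues with the cosine formula.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 50, 12
col = np.zeros(N)                        # first column of the circulant Y_N
col[0] = rng.standard_normal() / np.sqrt(M)
a = rng.standard_normal(M) / np.sqrt(M)  # a_1, ..., a_M (assumed variance 1/M)
col[1:M + 1] = a
col[N - M:] = a[::-1]                    # symmetry: entry N - k equals a_k

# Y[i, j] depends only on (i - j) mod N, so Y is a circulant; symmetric by construction
Y = np.array([[col[(i - j) % N] for j in range(N)] for i in range(N)])

k = np.arange(N)
lam_formula = col[0] + 2 * sum(a[l - 1] * np.cos(2 * np.pi * k * l / N)
                               for l in range(1, M + 1))
assert np.allclose(np.sort(lam_formula), np.linalg.eigvalsh(Y))
```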
Lemma 2.3. For every $\epsilon > 0$ there exists a constant $c(\epsilon)$ such that if $\{2k/N\} > \epsilon$, then
$$\sup_x \left| \mathbb{P}\!\left(\lambda_{k,N} \le \sqrt{2}\,x\right) - \Phi(x) \right| \le \frac{c(\epsilon)}{\sqrt{M}}, \quad (3)$$
where $c(\epsilon)$ does not depend on $k$, $M$, or $N$.
Proof: Let $\theta_k$ denote $2\pi k / N$. Formula (2) implies that each $\lambda_{k,N}$ is the sum of $M + 1$ independent random variables with zero mean and variance equal to
$$\frac{4}{M} \cos^2(l \theta_k) = \frac{2}{M} \left( 1 + \cos 2l\theta_k \right)$$
for $l = 1, \dots, M$, and equal to $1/M$ for $l = 0$. The variance of this sum is
$$\operatorname{Var} \lambda_{k,N} = 2 + \frac{1}{M} + \frac{2}{M} \sum_{l=1}^{M} \cos 2l\theta_k.$$
If $\{2k/N\} > \epsilon$, the cosine sum is bounded uniformly in $M$ and $N$, which implies that $\operatorname{Var} \lambda_{k,N} \to 2$ as $N$ and $M$ grow to infinity. By the Central Limit Theorem, $\lambda_{k,N}$ converges to the Gaussian distribution with zero mean and variance 2. Since by assumption the fourth moment of $\sqrt{M}\, a_k$ is uniformly bounded, the Berry-Esseen bound is applicable, and we can conclude that inequality (3) holds for those $k$ which satisfy the condition $\{2k/N\} > \epsilon > 0$. QED.
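The variance computation in this proof can be checked directly: for $k$ with $\{2k/N\}$ bounded away from zero, the Dirichlet-kernel bound keeps the cosine sum $O(1)$, so $\operatorname{Var} \lambda_{k,N}$ is close to 2. A small numerical check (our illustration, with hypothetical values of $N$, $M$, and $k$):

```python
import numpy as np

N, M = 1000, 200
k = 137                      # {2k/N} = 0.274 is bounded away from 0
theta = 2 * np.pi * k / N

# Var lambda_{k,N} = 1/M + sum_{l=1..M} (2/M) * (1 + cos(2*l*theta))
l = np.arange(1, M + 1)
var = 1 / M + np.sum((2 / M) * (1 + np.cos(2 * l * theta)))
assert abs(var - 2) < 0.05   # the cosine sum stays O(1), so the correction is O(1/M)
```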
Using the lemma, we can write
$$\mathbb{E}\, G_N(\sqrt{2}\,x) = \frac{1}{N} \sum_{k=1}^{N} \mathbb{P}\!\left( \lambda_{k,N} \le \sqrt{2}\,x \right).$$
The number of indices $k \le N$ with $\{2k/N\} \le \epsilon$ is at most $2\epsilon N + O(1)$, and Lemma 2.3 applies to the remaining indices. Hence, for every $\delta > 0$ we can choose $\epsilon$ sufficiently small and then $M$ sufficiently large and ensure that
$$\left| \mathbb{E}\, G_N(\sqrt{2}\,x) - \Phi(x) \right| \le \delta.$$
This shows that, as $M$ and $N$ grow, $\mathbb{E}\, G_N(\sqrt{2}\,x)$ converges pointwise to the standard Gaussian law. The next step is to investigate the variance/covariance structure of $b_{k,N} = \mathbf{1}_{(-\infty, x]}(\lambda_{k,N})$. Hence, we need to estimate the expressions
$$\operatorname{Cov}(b_{k,N}, b_{l,N}) = \mathbb{P}(\lambda_{k,N} \le x,\ \lambda_{l,N} \le x) - \mathbb{P}(\lambda_{k,N} \le x)\, \mathbb{P}(\lambda_{l,N} \le x). \quad (4)$$
Again we can take an arbitrarily small $\epsilon > 0$ and require that $\{(k - l)/N\} > \epsilon$. An argument as in the proof of Proposition 10.3.2 on p. 344 in (Brockwell and Davis, 1991) shows that the distribution of $(\lambda_{k,N}, \lambda_{l,N})$ is asymptotically Gaussian with the limit covariance matrix proportional to the identity matrix. Moreover, by assumption the fourth moment of $\sqrt{M}\, a_k$ is uniformly bounded, and therefore, by the multivariate Berry-Esseen theorem, the speed of the convergence to the Gaussian law is bounded by $c(\epsilon)/\sqrt{M}$. Hence, the difference in (4) can be bounded from above by $c(\epsilon)/\sqrt{M}$. This implies that for sufficiently large $M$ the covariance of $b_{k,N}$ and $b_{l,N}$ can be made arbitrarily small ($\le \delta$) for all pairs $k$ and $l$ such that $\{(k - l)/N\} > \epsilon$. Therefore,
$$\operatorname{Var} G_N(x) = \frac{1}{N^2} \sum_{k,l} \operatorname{Cov}(b_{k,N}, b_{l,N}) \le \delta + 2\epsilon + o(1)$$
for all sufficiently large $N$. Since $\delta$ is arbitrary, we can conclude that $\operatorname{Var} G_N(x) \to 0$ as $N$ and $M$ grow to infinity. This completes the proof of Theorem 2.1.
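As a sanity check on Theorem 2.1 (our own illustration, assuming Gaussian entries with variance $1/M$), one can compare the empirical spectral distribution of a single draw with small $M/N$ against the Gaussian law with variance 2:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(3)
N, M = 2000, 50                          # M/N = 0.025: the vanishing-ratio regime
a = rng.standard_normal(M + 1) / np.sqrt(M)
i, j = np.indices((N, N))
d = np.abs(i - j)
X = np.where(d <= M, a[np.minimum(d, M)], 0.0)

lam = np.sort(np.linalg.eigvalsh(X))
ecdf = np.arange(1, N + 1) / N
gauss2 = np.array([0.5 * (1 + erf(x / 2)) for x in lam])  # N(0, 2) cdf at eigenvalues
ks = np.max(np.abs(ecdf - gauss2))       # Kolmogorov-type distance
assert ks < 0.2                          # loose tolerance for a single finite draw
```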

Positive band-to-size ratio
Now let us ask what happens if the ratio of the band width to the matrix size tends to a positive constant as the size of the matrix grows. In this section we assume that $N/M \to c$ as $N \to \infty$, where $c \ge 1$ is a constant.
Theorem 3.1. Suppose that $N/M \to c$ as $N \to \infty$, where $c \ge 1$. Then there exists a probability distribution $\Psi_c(x)$ such that
$$\mathbb{E} \int x^n\, dF_N(x) \to \int x^n\, d\Psi_c(x)$$
for every positive integer $n$. The distribution $\Psi_c(x)$ is non-Gaussian for all $c \ne 9/4$.
As we will see in the course of the proof, the variance of the limit distribution is $2 - 1/c$, and its fourth moment is $12 - \frac{32}{3} \cdot \frac{1}{c}$ if $c > 2$, and $-\frac{4}{3} c^2 + 8c - 4$ if $c \le 2$. It is very likely that the distribution is non-Gaussian for all $c$. In particular, numerical evaluations suggest that if $c = 9/4$, then the 6th moment of the limit distribution is very close to $400/27 = 15 - 5/27$, and therefore the limit distribution is not Gaussian. (This result is obtained by a Monte-Carlo evaluation of the volumes of the polytopes $P(\pi)$ defined below. The standard error of the Monte-Carlo estimate of the 6th moment is 0.003.) Before we start the proof, let us develop some machinery for the method of moments. Clearly,
$$\mathbb{E} \int x^k\, dF_N(x) = \mathbb{E}\, N^{-1} \operatorname{Tr} X^k.$$
We can write $N^{-1} \operatorname{Tr} X^k$ as follows:
$$N^{-1} \operatorname{Tr} X^k = N^{-1} \sum_{a, b, \dots, y, z} X_{ab} X_{bc} \cdots X_{yz} X_{za},$$
where $a, b, \dots, y, z$ are $k$ numbers that take values between 1 and $N$. They show the row position of a particular entry $X_{ab}$; we will call these numbers "positions", so, for example, $a$ is the starting position. Define $x_1 = b - a$, $x_2 = c - b$, and so on, up to $x_k = a - z$. The indices $x_i$ are differences of positions. We will call them "step sizes", since they show how the "positions" of the elements $X_{ij}$ change. They also have an additional meaning, since they show the diagonal in which the entry $X_{ij}$ is located. In particular, the random variable $a_{x_i}$ is non-zero only if the step size $x_i$ is between $-M$ and $M$. Write:
$$N^{-1} \operatorname{Tr} X^k = N^{-1} \sum_{a, x_1, \dots, x_k} \theta(a, x_1, \dots, x_k)\, a_{x_1} a_{x_2} \cdots a_{x_{k-1}} a_{x_k}, \quad (6)$$
where the summation is over all $a$ between 1 and $N$ and all $x_1, \dots, x_k$ such that every $x_i$ is between $-M$ and $M$. Here $\theta(a, x_1, \dots, x_k) = 1$ if $a$ and $x_1, \dots, x_k$ determine a valid sequence of positions $a, b, \dots, y, z$ (that is, every partial sum $a + x_1 + \cdots + x_i$ lies between 1 and $N$, and $x_1 + \cdots + x_k = 0$), and $\theta(a, x_1, \dots, x_k) = 0$ otherwise.
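The variance and fourth-moment formulas above can be sanity-checked by simulation (our illustration, assuming Gaussian entries with variance $1/M$). For $c = 4$ the predictions are $2 - 1/c = 1.75$ and $12 - 32/(3c) \approx 9.33$; the moments are computed from traces and averaged over a few independent draws:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 800, 200                        # c = N/M = 4
reps = 8
m2s, m4s = [], []
for _ in range(reps):
    a = rng.standard_normal(M + 1) / np.sqrt(M)
    i, j = np.indices((N, N))
    d = np.abs(i - j)
    X = np.where(d <= M, a[np.minimum(d, M)], 0.0)
    X2 = X @ X
    m2s.append(np.trace(X2) / N)       # second moment of mu_N
    m4s.append(np.trace(X2 @ X2) / N)  # fourth moment of mu_N
m2, m4 = np.mean(m2s), np.mean(m4s)
assert abs(m2 - 1.75) < 0.3
assert abs(m4 - (12 - 32 / (3 * 4))) < 2.5   # loose tolerances: finite N, M and noise
```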
In the following we write $\vec{x}$ for the sequence $x_1, \dots, x_k$. Let $E_k$ denote the value of the limit $\lim_{N \to \infty} \mathbb{E}\, N^{-1} \operatorname{Tr} X^k$. Since the distributions of the $a_i$ are symmetric by assumption, $E_k = 0$ if $k$ is odd. Consider $E_{2k}$. Let $\pi$ denote the partition of the $2k$ indices of the elements $x_i$ such that indices $i$ and $j$ belong to the same block if and only if $x_i = x_j$ or $x_i = -x_j$. By an argument as in (Bryc et al., 2006) and (Hammond and Miller, 2005), it is easy to see that only partitions with two-element blocks give a non-negligible contribution to $E_{2k}$ as $M$ and $N$ approach infinity and $M/N$ approaches a constant. These partitions are pairings of the $2k$ indices $x_i$. Moreover, (Bryc et al., 2006) and (Hammond and Miller, 2005) showed that another simplification is possible. It is as follows. If $x_i$ is in the same pair as $x_j$, then it must be that $x_i = x_j$ or $x_i = -x_j$. In the first case, we will write $\epsilon_\tau(x_i) = \epsilon_\tau(x_j) = 1$, where $\tau$ denotes the pairing. In the second case, we will write $\epsilon_\tau(x_i) = \epsilon_\tau(x_j) = -1$.
It turns out that we can restrict attention to the case in which $\epsilon_\tau(x_i) = -1$ for all $i$; the contribution of the other sequences is negligible. Let $\vec{x} = (x_1, \dots, x_{2k}) \in \tau$ mean that the sequence $x_1, \dots, x_{2k}$ agrees with the pairing $\tau$ and that the choice of $\epsilon_\tau$ is $\epsilon_\tau(x_i) = -1$ for all $x_i$. We can think about the position $a$ and the sequence $x_1, \dots, x_{2k}$ as a random walk on the integer lattice $\mathbb{Z}^2$: the horizontal axis denotes time, and the vertical axis denotes the position of a particle. The random walk starts at the point $(0, a)$ and at each step makes a jump of size $x_i$. Note that the pair $(a, \vec{x})$ is valid (i.e., $\theta(a, \vec{x}) = 1$) if and only if the random walk stays between the bounds of 1 and $N$. In order to show the non-Gaussianity of the limit distribution, it is sufficient to compute the limits of the second and the fourth moments. It turns out that for $c > c_0 = 9/4$ the limit kurtosis is greater than 3, and for $c < c_0$ the kurtosis is smaller than 3.
Lemma 3.2. $\lim_{N \to \infty} \mathbb{E} \int x^2\, dF_N(x) = 2 - 1/c$.
Proof: We consider two cases. In the first case, $c \ge 2$. In this case, we need to consider three possible regions for the initial position $a$. The first one is $1 \le a \le M$, the second region is $M < a < N - M$, and the third one is $N - M \le a \le N$. If the initial position is in the first region, then $x_1$ can take values from $-a + 1$ to $M$, and $x_2 = -x_1$. Hence the total number of valid combinations $(a, \vec{x})$ is asymptotically equivalent to
$$\sum_{a=1}^{M} (M + a) \sim \frac{3}{2} M^2.$$
By symmetry, this is also true for the third region. For the second region, $x_1$ can take values from $-M$ to $M$, and the total number of combinations $(a, \vec{x})$ is asymptotically equivalent to $2M(N - 2M) \sim 2(c - 2) M^2$. Hence the total number of all possible $(a, \vec{x})$ for all regions is asymptotically equivalent to $(2c - 1) M^2$.
We should divide this by $NM \sim cM^2$. Hence, we obtain that the variance of $\mu_N$ converges to $2 - 1/c$. In the second case, $c < 2$. Then again we have three possible regions for the initial position. The first one is $1 \le a \le N - M$, the second one is $N - M < a < M$, and the third one is $M \le a \le N$. For the first region, $x_1$ can be between $-a + 1$ and $M$. Hence, the number of possible combinations $(a, \vec{x})$ is
$$\sum_{a=1}^{N-M} (M + a) \sim \left( (c - 1) + \frac{(c - 1)^2}{2} \right) M^2.$$
The same estimate holds for the third region by symmetry. For the second region, $x_1$ can be between $-a + 1$ and $N - a$. Hence, we estimate the number of possible combinations $(a, \vec{x})$ as $N(2M - N) \sim c(2 - c) M^2$. Hence, the total number of all possible $(a, \vec{x})$ is again asymptotically equivalent to $(2c - 1) M^2$. After dividing by $NM$, we find that the variance of $\mu_N$ converges to $2 - 1/c$. QED.
Lemma 3.3. $\lim_{N \to \infty} \mathbb{E} \int x^4\, dF_N(x) = 12 - \frac{32}{3} \cdot \frac{1}{c}$ if $c > 2$, and $-\frac{4}{3} c^2 + 8c - 4$ if $c \le 2$.
Proof: Let $c \ge 4$. We have five essentially different regions for the initial position $a$:
$$1 \le a \le M, \quad M < a \le 2M, \quad 2M < a < N - 2M, \quad N - 2M \le a < N - M, \quad N - M \le a \le N.$$
In addition, we have 3 different pairings:
$$\tau_1 = \{(x_1, x_2), (x_3, x_4)\}, \quad \tau_2 = \{(x_1, x_4), (x_2, x_3)\}, \quad \tau_3 = \{(x_1, x_3), (x_2, x_4)\}.$$
The proof proceeds by straightforward computations in each of the possible cases. We start with the first pairing. In region 1, the variables $x_1$ and $x_3$ can take values from $-a + 1$ to $M$, and the variables $x_2$ and $x_4$ are determined by $x_1$ and $x_3$. In regions 2, 3, and 4, the variables $x_1$ and $x_3$ can take all values between $-M$ and $M$. Region 5 is symmetric to region 1, so the total number of possible $(a, \vec{x})$ combinations for the first pairing is asymptotic to
$$2 \sum_{a=1}^{M} (M + a)^2 + 4M^2 (N - 2M) \sim \left( 4c - \frac{10}{3} \right) M^3.$$
A similar computation shows that the total for the second pairing is also asymptotic to $(4c - 10/3) M^3$. For the third pairing, the calculations for regions 2 and 3 are the same as for the second pairing, while the calculation for region 1 is different; regions 4 and 5 are symmetric to regions 2 and 1, respectively. Hence, the total number of possible $(a, \vec{x})$ combinations for the third pairing is asymptotic to $(4c - 4) M^3$. The total number of all possible $(a, \vec{x})$ combinations for all pairings is
$$\left( 12c - \frac{32}{3} \right) M^3.$$
Dividing this by $NM^2 \sim cM^3$, we obtain that the limit of the fourth moment of the distribution $\mu_N$ is $12 - \frac{32}{3} \cdot \frac{1}{c}$. The computation in the remaining cases is similar. QED.
Lemma 3.4. For every positive integer $l$,
$$\lim_{N \to \infty} \mathbb{E}\, N^{-1} \operatorname{Tr} X^{2l} = \frac{1}{c} \sum_{\pi \in P_2(2l)} \operatorname{Vol}(P(\pi)),$$
where $P_2(2l)$ denotes the set of pairings of $\{1, \dots, 2l\}$ and the polytopes $P(\pi)$ are defined below.
Proof: We know that for odd $k$, $\mathbb{E}\, N^{-1} \operatorname{Tr} X^k = 0$. We have also computed the limit for $k = 2$ and $4$ in the lemmas above. For arbitrary even $k$, we have the following argument. Let $k = 2l$. We have shown above that to compute $\lim \mathbb{E}\, N^{-1} \operatorname{Tr} X^k$, we can restrict attention to those terms in the expansion of the trace that correspond to pairings of the $2l$ indices.
All other terms give an asymptotically negligible contribution. Consider a pairing $\pi$ of the $2l$ indices. We will think about it as a product of $l$ disjoint transpositions in the symmetric group $S_{2l}$ operating on the numbers $\{1, 2, \dots, 2l\}$. For this pairing we define a $2l$-by-$l$ matrix $M_\pi$ in the following way. Let us order the transpositions in $\pi$ according to the smallest elements in these transpositions. For example, for $\pi = (16)(23)(45)$, the first transposition is $\tau_1 = (16)$, the second one is $\tau_2 = (23)$, and the third one is $\tau_3 = (45)$. Let the $k$-th transposition be $\tau_k = (i_k, j_k)$, where $i_k < j_k$. Then define the matrix elements $(M_\pi)_{i_k k} = 1$ and $(M_\pi)_{j_k k} = -1$, and set all other entries in the $k$-th column of the matrix $M_\pi$ equal to zero. This defines the matrix $M_\pi$. Next define another matrix $\widetilde{M}_\pi$ as follows. Let $T$ be the $2l$-by-$2l$ lower-triangular matrix such that $T_{ij} = 1$ if $i \ge j$. Define $\widetilde{M}_\pi := T M_\pi$. Intuitively, the first row of $\widetilde{M}_\pi$ is the same as the first row of $M_\pi$, the second row of $\widetilde{M}_\pi$ is the sum of the first and the second rows of $M_\pi$, the third row of $\widetilde{M}_\pi$ is the sum of the first, second, and third rows of $M_\pi$, and so on. Consider the space $\mathbb{R}^{l+1}$, where each point has coordinates $(a, x_1, \dots, x_l)$. Let $P_N(\pi)$ be the convex polytope in $\mathbb{R}^{l+1}$ defined by the following set of $3l$ inequalities (not all of which are independent of the others):
$$\mathbf{1}_{2l} \le a \mathbf{1}_{2l} + \widetilde{M}_\pi x \le N \mathbf{1}_{2l}, \qquad -M \mathbf{1}_l \le x \le M \mathbf{1}_l,$$
where $\mathbf{1}_k$ denotes the vector with $k$ components all equal to 1. It is easy to see that the meaning of these inequalities is that the positions of a random walk with the starting position $a$ and steps determined by $x_1, \dots, x_l$ are all between 1 and $N$, and that the steps are restricted to be between $-M$ and $M$. Note that these are the same requirements that we need to impose to make sure that $\theta(a, \vec{x}) = 1$ in (6). This implies that the contribution of the terms corresponding to the pairing $\pi$ towards $\mathbb{E}\, N^{-1} \operatorname{Tr} X^k$ equals the number of integer points that belong to the polytope $P_N(\pi)$, multiplied by $N^{-1} M^{-l}$.
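The construction of $M_\pi$ and $\widetilde{M}_\pi = T M_\pi$ can be made concrete (a sketch with our own helper name, using the example $\pi = (16)(23)(45)$ from the text):

```python
import numpy as np

def pairing_matrices(pairs):
    """Build M_pi (2l-by-l) and T * M_pi for a pairing given as a list of l (i, j) tuples."""
    pairs = sorted((min(p), max(p)) for p in pairs)   # order by smallest elements
    l = len(pairs)
    M_pi = np.zeros((2 * l, l), dtype=int)
    for k, (i, j) in enumerate(pairs):
        M_pi[i - 1, k] = 1                            # step x_k is taken at time i
        M_pi[j - 1, k] = -1                           # and undone (epsilon = -1) at time j
    T = np.tril(np.ones((2 * l, 2 * l), dtype=int))   # T[i, j] = 1 for i >= j
    return M_pi, T @ M_pi

M_pi, TM = pairing_matrices([(1, 6), (2, 3), (4, 5)])
# row m of T * M_pi is the sum of the first m rows of M_pi,
# i.e., the walk's displacement after m steps
assert np.all(TM == np.cumsum(M_pi, axis=0))
assert np.all(TM[-1] == 0)   # the walk returns to its starting position
```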
Let $\widetilde{P}_N(\pi)$ be the polytope $P_N(\pi)$ rescaled by $1/M$. Then, as $N \to \infty$, the polytopes $\widetilde{P}_N(\pi)$ converge to a convex polytope $P(\pi)$, which is defined by the inequalities
$$0 \le \alpha \mathbf{1}_{2l} + \widetilde{M}_\pi y \le c \mathbf{1}_{2l}, \qquad -\mathbf{1}_l \le y \le \mathbf{1}_l,$$
where $(\alpha, y_1, \dots, y_l)$ are the rescaled coordinates. Hence the number of integer points inside the polytope $P_N(\pi)$ can be estimated as $M^{l+1}$ multiplied by the volume of $P(\pi)$, and the error of this estimate is $O(M^l)$. Therefore,
$$\mathbb{E}\, N^{-1} \operatorname{Tr} X^k \to \frac{1}{c} \sum_{\pi \in P_2(2l)} \operatorname{Vol}(P(\pi)).$$

QED.
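The Monte-Carlo evaluation of the volumes $\operatorname{Vol}(P(\pi))$ mentioned after Theorem 3.1 can be sketched for $2l = 4$ (our own illustration). For each of the three pairings, the walk constraints are imposed on uniform samples from the bounding box $[0, c] \times [-1, 1]^2$; the estimate of $\frac{1}{c} \sum_\pi \operatorname{Vol}(P(\pi))$ should reproduce the fourth moment $12 - 32/(3c)$:

```python
import numpy as np

c, n = 4.0, 200_000
rng = np.random.default_rng(5)
alpha = rng.uniform(0, c, n)          # rescaled starting position a / M
y1 = rng.uniform(-1, 1, n)            # rescaled free steps x_1 / M and x_2 / M
y2 = rng.uniform(-1, 1, n)

def inside(*positions):
    """Fraction of samples whose rescaled walk stays in [0, c]."""
    ok = np.ones(len(alpha), dtype=bool)
    for p in positions:
        ok &= (p >= 0) & (p <= c)
    return np.mean(ok)

box = 2 * 2 * c  # volume of the sampling box [0, c] x [-1, 1]^2
vol = box * (
    inside(alpha + y1, alpha + y2)                     # pairing (12)(34)
    + inside(alpha + y1, alpha + y1 + y2)              # pairing (14)(23)
    + inside(alpha + y1, alpha + y1 + y2, alpha + y2)  # pairing (13)(24), the crossing one
)
m4_estimate = vol / c
assert abs(m4_estimate - (12 - 32 / (3 * c))) < 0.3
```

The same scheme with $2l = 6$ and the 15 pairings of six indices is what the estimate of the 6th moment at $c = 9/4$ refers to.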
Proof of Theorem 3.1: Lemma 3.4 shows the convergence of the expectations of the moments of the eigenvalue distributions $F_N(x)$. The limits are less than or equal to the moments of the Gaussian distribution which we would have obtained if we set all $\theta(a, \vec{x})$ equal to 1 in formula (6). This implies the first claim of the theorem. Lemmas 3.2 and 3.3 imply that the kurtosis of the limit distribution is different from 3 for $c \ne 9/4$. Hence, if $c \ne 9/4$, then the limiting distribution is not Gaussian. QED.