Another look at the moment method for large dimensional random matrices

The methods to establish the limiting spectral distribution (LSD) of large dimensional random matrices include the well known moment method, which invokes the trace formula. Its success has been demonstrated for several types of matrices, such as the Wigner matrix and the sample variance covariance matrix. In a recent article, Bryc, Dembo and Jiang (2006) [7] establish the LSD for random Toeplitz and Hankel matrices using the moment method. They perform the necessary counting of terms in the trace by splitting the relevant sets into equivalence classes and relating the limits of the counts to certain volume calculations. We build on their work and present a unified approach. This helps provide relatively short and easy proofs for the LSD of common matrices while at the same time providing insight into the nature of different LSD and their interrelations. By extending these methods we are also able to deal with matrices with appropriately dependent entries.


Introduction
Random matrices with increasing dimension are called large dimensional random matrices (LDRM). They appear in many different areas of science: high dimensional data analysis, communication theory, dynamical systems, number theory, finance, combinatorics, diffusion processes, just to name a few.
In this article we deal only with real symmetric matrices, so that all their eigenvalues are real. If λ is an eigenvalue of multiplicity m of an n × n matrix A_n, then the Empirical Spectral Measure puts mass m/n at λ. Note that if the entries of A_n are random, then this is a random probability measure. If λ_1, λ_2, ..., λ_n are all the eigenvalues, then the empirical spectral distribution function (ESD) F_{A_n} of A_n is given by

F_{A_n}(x) = (1/n) Σ_{i=1}^n I(λ_i ≤ x).

Let {A_n} be a sequence of square matrices with the corresponding ESD {F_{A_n}}. The Limiting Spectral Distribution (or measure) (LSD) of the sequence is defined as the weak limit of the sequence {F_{A_n}}, if it exists. If {A_n} are random, the limit is in the "almost sure" or "in probability" sense.
For several matrices the LSD is known to exist, and there are several existing methods to establish such limits. One of the most fundamental is the method of moments: suppose {Y_n} is a sequence of random variables with distribution functions {F_n} such that E(Y_n^h) → β_h for every positive integer h, and {β_h} satisfies Carleman's condition (see Feller, 1966, page 224) [10]:

Σ_{h=1}^∞ β_{2h}^{−1/(2h)} = ∞.

Then there exists a distribution function F such that β_h(F) = β_h for all h, and {Y_n} (or equivalently {F_n}) converges to F in distribution. We will often write β_h in short when the underlying distribution F is clear from the context.
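As a concrete check (our illustration, not from the paper): the standard Gaussian even moments β_{2h} = (2h)!/(2^h h!) satisfy Carleman's condition, since β_{2h} ≤ (2h)^h forces each Carleman term to be at least (2h)^{−1/2}, and those terms sum to infinity.

```python
from math import factorial

# Gaussian even moments beta_{2h} = (2h)!/(2^h h!) = (2h-1)!!.
# Since beta_{2h} <= (2h)^h, each Carleman term beta_{2h}^{-1/(2h)}
# is at least (2h)^{-1/2}, and sum_h (2h)^{-1/2} diverges.
for h in range(1, 50):
    beta_2h = factorial(2 * h) // (2 ** h * factorial(h))
    term = beta_2h ** (-1.0 / (2 * h))
    assert beta_2h <= (2 * h) ** h
    assert term >= (2.0 * h) ** -0.5
```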
Suppose {A_n} is such a sequence of matrices and let (by a slight abuse of notation) β_h(A_n), for h ≥ 1, denote the h-th moment of the ESD of A_n. Suppose there is a sequence of nonrandom {β_h} satisfying Carleman's condition such that

(M1) E[β_h(A_n)] → β_h for every h;
(M2) Var[β_h(A_n)] → 0 for every h; or the stronger
(M4) Σ_n E[β_h(A_n) − E β_h(A_n)]^4 < ∞ for every h.

Then the LSD is the distribution F with moments {β_h}: in probability under (M1) and (M2), and almost surely under (M1) and (M4) (via the Borel-Cantelli Lemma).

Before we discuss further details of the moment method and our unified approach, we describe some random matrices found in the literature on LDRM. The LSD of these matrices are given elsewhere in this article. The first five matrices are scaled by n^{−1/2} to have a nontrivial LSD. Let {x_0, x_1, ...} be a sequence of real random variables with mean zero and variance σ². We take σ = 1 without loss of generality. Let N denote the set of all positive integers and Z_≥ the set of all nonnegative integers.
1. Toeplitz matrix. The n × n random (symmetric) Toeplitz matrix T_n with inputs {x_i} is the matrix whose (i, j)-th entry is x_{|i−j|}. So it is given by

T_n =
[ x_0      x_1      x_2      ...  x_{n−2}  x_{n−1} ]
[ x_1      x_0      x_1      ...  x_{n−3}  x_{n−2} ]
[ x_2      x_1      x_0      ...  x_{n−4}  x_{n−3} ]
[ ...                                              ]
[ x_{n−1}  x_{n−2}  x_{n−3}  ...  x_1      x_0     ]
2. Hankel Matrix. Similarly, the (symmetric) Hankel matrix with inputs {x_i} is the matrix whose (i, j)-th entry is x_{i+j}.
Note that we have used an unconventional indexing for our convenience. It may be noted that for the above two matrices, the (i, j)-th entry satisfies a_{i,j} = x_{L(i,j)}, where L : N² → Z_≥ is a function independent of n. The concept of such a link function is going to be very crucial in our arguments.
3. Reverse Circulant. The (i, j)-th element of the (symmetric) reverse circulant matrix is x_{(i+j) mod n}. Again, this is an unconventional indexing.
4. Symmetric Circulant. The symmetric version of the usual circulant matrix may be defined as the circulant matrix whose first row is (x_0, x_1, x_2, ..., x_2, x_1), each subsequent row being the previous row shifted cyclically by one position.
So, the (i, j)-th element of the matrix is given by x_{n/2 − |n/2 − |i−j||}.
5. Palindromic matrices. These may be defined as (symmetric) matrices whose first row is a palindrome. For example, consider the symmetric Toeplitz matrix defined earlier and impose the restriction that its first row is the palindrome (x_0, x_1, x_2, ..., x_2, x_1, x_0). This imposes restrictions on the other entries and we obtain the palindromic Toeplitz matrix PT_n; see Massey, Miller and Sinsheimer (2006) [15] [MMS].
It may be noted that the n × n principal minor of PT_{n+1} is SC_n. Likewise we may define the palindromic versions of the other matrices.
The next two matrices are the most studied matrices in the theory of random matrices. Here the input is a double sequence. Though the entries can easily be reindexed as a single sequence, for certain reasons we prefer to deal with double sequences of input for these matrices.
6. Wigner matrix. The (symmetric) Wigner matrix W_n is the n × n matrix whose (i, j)-th entry is x_{ij} = x_{ji}; for example, its last row is (x_{1n}, x_{2n}, x_{3n}, ..., x_{n(n−1)}, x_{nn}).

7. Sample covariance matrix (S matrix). Suppose {x_jk : j, k = 1, 2, ...} is a double array of i.i.d. real random variables with mean zero and variance 1. In the LDRM literature, the matrix S_n = n^{−1} X_n X_n^T, where X_n = ((x_ij))_{1≤i≤p, 1≤j≤n}, is called a sample covariance matrix (in short, an S matrix). We do not centre the matrices at the sample means, as is conventional in the statistics literature. This, however, does not affect the LSD due to the scaling factor n^{−1} in (10).
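The catalog above is easy to realize numerically. The sketch below (our illustration; names are ours) builds the four single-sequence matrices from one input sequence, using 0-based indices as in the unconventional indexing noted above, and checks their symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
x = rng.standard_normal(2 * n)      # single input sequence {x_0, x_1, ...}

i, j = np.indices((n, n))
T = x[np.abs(i - j)]                # Toeplitz: (i, j)-th entry x_{|i-j|}
H = x[i + j]                        # Hankel: (i, j)-th entry x_{i+j}
RC = x[(i + j) % n]                 # reverse circulant: x_{(i+j) mod n}
SC = x[n // 2 - np.abs(n // 2 - np.abs(i - j))]   # symmetric circulant

for M in (T, H, RC, SC):
    assert np.allclose(M, M.T)      # all four are symmetric
assert T[0, 4] == x[4] and H[1, 2] == x[3]
```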
To avoid problems of nonexistence of moments and to keep the discussion simple, assume that the random entries are uniformly bounded. This assumption may be relaxed to the existence of the first two moments by resorting to suitable truncation arguments (see Theorem 2). Note that the h-th moment of the ESD of an n × n matrix A_n with eigenvalues λ_1, λ_2, ..., λ_n has the following nice form:

β_h(A_n) = (1/n) Σ_{i=1}^n λ_i^h = (1/n) Tr(A_n^h).

Thus the major hurdle in establishing (M1) is computing at least the leading term in the expansion

E[Tr(A_n^h)] = Σ_{1≤i_1,i_2,...,i_h≤n} E[ a_{i_1,i_2,n} a_{i_2,i_3,n} ··· a_{i_{h−1},i_h,n} a_{i_h,i_1,n} ],

where a_{i,j,n} denotes the (i, j)-th element of A_n.
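The trace formula is the engine of the whole method; a minimal numerical check (ours, not from the paper) on a random symmetric ±1 matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
G = rng.choice([-1.0, 1.0], size=(n, n))
A = (np.triu(G) + np.triu(G, 1).T) / np.sqrt(n)   # symmetric, scaled by n^{-1/2}

h = 4
eigs = np.linalg.eigvalsh(A)
beta_h_esd = np.mean(eigs ** h)                             # h-th moment of the ESD
beta_h_trace = np.trace(np.linalg.matrix_power(A, h)) / n   # trace formula
assert np.isclose(beta_h_esd, beta_h_trace)
```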
We need to do likewise for (M2) and (M4). These computations are available in the literature on a case by case basis. Every new matrix is a new (often difficult) combinatorial counting problem. Nevertheless, the moment method has been successfully applied for the Wigner, the sample covariance and the F matrices, and recently also for Toeplitz, Hankel and Markov matrices. See Bai (1999) [2] for some of the arguments in connection with Wigner, sample covariance and F matrices. For the arguments concerning Toeplitz, Hankel and Markov matrices see Bryc, Dembo and Jiang (2006) [7] [BDJ] and Hammond and Miller (2005) [12]. For the palindromic Toeplitz matrix, see [MMS] [15].
Suppose that {A n } are defined through an input sequence {x i } as in the first five matrices or through an input double sequence as in the last two matrices. Our primary goal is to present a unified way of dealing with such matrices.
Assume that the inputs are uniformly bounded, independent, with mean zero and variance σ 2 . We first argue that in general, for matrices with suitable structure (that is with appropriate link function(s), L, say), the summands which have three or more powers of any inputs are indeed negligible in (M1). See Lemma 1. We also establish a general method to deal with (M4). See Lemma 2.
Thus only those summands in the trace formula potentially contribute where any input appears exactly twice. It also then follows from (M1) that the limit will depend solely on σ 2 and the nature of the link function (which determines the number and nature of contributing terms) but not on the distribution of the entries. Moreover, if a few entries of the matrix are different in nature from the rest, then that will not affect the LSD (see Theorem 1).
Then we show how the problem of existence of the limit of the contributing summands may be approached using the volume method of [BDJ] [7]. Using the link function, we define an equivalence relation between the terms and the total sum is split into two iterated sums. One of these is a finite sum (over the equivalence classes, called words) and for any fixed word, the limit of the other sum is calculated. This leads to the identification of different types of words, and their roles. Depending on the nature of the matrix, only certain words contribute positively. For some matrices, this contribution is constant across contributing words. For some others, each word contributes positively and in a nonconstant fashion (see Table 1).
It is worth noting that under suitable restriction on the L function(s), Carleman's condition is automatically satisfied (see Theorem 3).
Then we utilise and extend our general framework to deal with dependent entries. We establish some new LSD with dependent entries in Theorems 10 and 11 via suitable extensions of the earlier Lemmata (see Lemma 9 and Lemma 10).

Main results
We first prove two robustness results (Theorems 1 and 2) in Section 2. These show that in general we may assume that the entries are uniformly bounded and the same LSD persists if we change a few entries of the matrix.
In Section 3 we introduce the unified method. Lemma 1 essentially shows that only square terms contribute to the limit moments and Lemma 2 helps in verifying (M4). Theorem 3 shows that under suitable assumption on the link function, the existence of LSD and a.s. convergence of the moments are equivalent for uniformly bounded inputs. It also implies Carleman's condition and the sub Gaussian nature of the LSD.
We then apply our general approach to provide relatively short and easy proofs of the LSD of the matrices discussed above. The details for the Wigner matrix are discussed in Section 4.1. The S matrix is discussed in Section 4.2: the case p/n → y ≠ 0 is presented in Section 4.2.1 and the case p/n → 0 in Section 4.2.2. It turns out that the so called "Catalan words" are the only contributing words in these cases.
Next we deal with the Toeplitz and Hankel matrices in Section 4.3. We provide a succinct presentation of the [BDJ] [7] arguments in Section 4.3.1. All possible words contribute to the Toeplitz limit but we discover that only "symmetric words" matter for the Hankel limit. Section 4.3.2 develops interesting results on the moments of the Toeplitz and Hankel limits.
As a byproduct, a quick proof of the unboundedness of the Hankel LSD is given in Section 4.3.3.
The volume method is applied to the reverse circulant in Section 4.4, and in particular, again only symmetric words contribute.
Section 4.5 deals with matrices which have Gaussian limits. These include the symmetric circulant, the palindromic Toeplitz and Hankel, and a doubly symmetric Hankel matrix. For the symmetric circulant, we show that all words contribute one each to the limit, and hence its LSD is Gaussian. We exploit its close connection to the other three matrices to conclude that they also have Gaussian limits.
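The Gaussian limit for the symmetric circulant can be eyeballed by simulation; the sketch below (ours, one Gaussian input at a moderate n) checks that the low-order moments of the ESD are near the standard Gaussian moments 0, 1, 0, 3.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1024
x = rng.standard_normal(n)
i, j = np.indices((n, n))
SC = x[n // 2 - np.abs(n // 2 - np.abs(i - j))] / np.sqrt(n)  # scaled symmetric circulant
eigs = np.linalg.eigvalsh(SC)

# low-order ESD moments approach the standard Gaussian moments 0, 1, 0, 3
assert abs(np.mean(eigs)) < 0.2
assert abs(np.mean(eigs ** 2) - 1.0) < 0.2
assert abs(np.mean(eigs ** 4) - 3.0) < 0.6
```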
Finally, in Section 5 we discuss an extension of our approach to suitable dependent inputs. We do not aim to provide fully general results but discuss a few specific situations. We establish that the same LSD obtains for suitably dependent input sequences with mean zero and variance 1 (see Theorem 10). We also show that the LSD for Toeplitz and Hankel matrices exist when the input sequence is the linear process X_i = Σ_{j=0}^∞ a_j ε_{i−j}, where {ε_j} is i.i.d. with mean zero and variance 1 and {a_j} is a suitable sequence (see Theorem 11). We provide their limit moments in terms of the limit moments when the input sequence is i.i.d.

Robustness of LSD
Most of the general results will be stated and proved when the {x i } are i.i.d. However, it must be intuitively clear that if we change a "few" of the entries then that should not alter the LSD. In our first theorem, we prove one such general robustness result. The reader will be able to establish more general non i.i.d. versions by simple modifications of our arguments.
The moment method arguments given later, crucially use the assumption of boundedness of the random variables. In the second theorem of this section, we will justify why we may always assume this condition without loss of generality. Once this is established, then in the rest of the paper we invariably provide the arguments only for the bounded case.
For the proof of the following two Theorems, we shall use the bounded Lipschitz metric, defined on the space of probability measures as

d_BL(μ, ν) = sup { |∫ f dμ − ∫ f dν| : ||f||_∞ + ||f||_L ≤ 1 },

where ||f||_∞ = sup_x |f(x)| and ||f||_L = sup_{x≠y} |f(x) − f(y)|/|x − y|. Recall that convergence in d_BL implies the weak convergence of measures.
We need the following two facts. Fact 1 is an estimate of the metric distance d BL in terms of trace. A proof may be found in Bai and Silverstein (2006) [3] or Bai (1999) [2] and uses Lidskii's theorem (see Bhatia, 1997, page 69) [5]. Fact 2 is the well known Cauchy's interlacing inequality (see Bhatia, 1997, page 59) [5] and its consequence.
Fact 2. (a) (Interlacing inequality) Suppose C is an n × n symmetric real matrix with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n, and D is its (n − 1) × (n − 1) principal submatrix with eigenvalues μ_1 ≥ μ_2 ≥ ... ≥ μ_{n−1}. Then λ_1 ≥ μ_1 ≥ λ_2 ≥ μ_2 ≥ ... ≥ μ_{n−1} ≥ λ_n. As a consequence, if A, B are n × n symmetric real matrices, then ||F^A − F^B||_∞ ≤ n^{−1} rank(A − B).

(b) Suppose A and B are p × n real matrices. Let X = AA^T and Y = BB^T. Then ||F^X − F^Y||_∞ ≤ p^{−1} rank(A − B).

Theorem 1. Let {A_n} be a sequence of p × n random matrices and let {E_n} be a sequence of perturbation matrices of the same order whose nonnull entries come from a triangular array of random variables {ε_{i,n} : 1 ≤ i ≤ k_n}.

(a) Suppose p = n, {A_n, E_n} are symmetric and the LSD of {n^{−1/2} A_n} is F almost surely. If either of (H1) or (H2) below holds, the LSD of {n^{−1/2}(A_n + E_n)} is also F almost surely.
(b) Suppose p/n → y ∈ (0, ∞) as n → ∞ and n^{−1} A_n A_n^T has LSD G almost surely. Suppose further that (pn)^{−1} Tr(A_n A_n^T) is almost surely bounded. If either of (H1) or (H2) below holds, the LSD of {n^{−1} H_n H_n^T} is also G almost surely, where H_n = A_n + E_n.
Now suppose (H2) holds. Let Ẽ_n be the matrix obtained from E_n by replacing each ε_{i,n} with ε_{i,n} I(|ε_{i,n}| ≤ w(n)) for 1 ≤ i ≤ k_n. On the other hand, by the rank inequality, it suffices to show that rank(E_n − Ẽ_n)/n → 0 almost surely. Let p_{in} = P(|ε_{i,n}| > w(n)) and P_n = Σ_{i=1}^{k_n} p_{in}. Note that rank(E_n − Ẽ_n) is at most the number of nonzero entries of E_n − Ẽ_n, which is bounded by α_n Σ_{i=1}^{k_n} I(|ε_{i,n}| > w(n)). By Bernstein's inequality (see Chow and Teicher (1997) [9]), we have, for any δ > 0,

P( Σ_{i=1}^{k_n} [I(|ε_{i,n}| > w(n)) − p_{in}] > b_n ) ≤ exp( −b_n² / (2[P_n + b_n]) ),

where b_n = (nδ − α_n P_n)/α_n. Using the given conditions, b_n ≥ K log n for some large K, and exp(−b_n²/(2[P_n + b_n])) ≤ C n^{−(1+∆)} for some C, ∆ > 0. The result now follows from the Borel-Cantelli Lemma.
The above arguments may be modified to prove (b); we give only the main steps. First assume (H1) holds. Then the conclusion follows since (pn)^{−1} Tr(E_n E_n^T) → 0 a.s. (under (H1)) and (pn)^{−1} Tr(A_n A_n^T) = O_{a.s.}(1). This proves (b) when (H1) holds. Now suppose (H2) holds. Let Ẽ_n be the matrix obtained from E_n by replacing each ε_{i,n} with ε_{i,n} I(|ε_{i,n}| ≤ w(n)) for 1 ≤ i ≤ k_n. Noting that (pn)^{−1} Tr(Ẽ_n Ẽ_n^T) → 0 a.s., the same inequalities as above apply; on the other hand, the rank inequality again controls the contribution of E_n − Ẽ_n. Now we may proceed as in the proof of part (a). Note that here the matrices E_n and Ẽ_n are not symmetric, but the above arguments remain valid. We omit the details. 2

Remark 1. Suppose {ε_{i,n}} is row-wise independent and sup_{i,n} E|ε_{i,n}|^{4+η} < ∞ for some η > 0.
Then the almost sure boundedness in (H1) holds by the SLLN of row-wise independent random variables.
Now we prove a theorem showing that we may work with bounded random variables. Let {A_n} be a sequence of p × n random matrices whose input is either a single or a double sequence of random variables.

Theorem 2. Let A_n, k_n and α_n be as above, with k_n → ∞ and k_n α_n = O(n²).
(a) Let p = n and let {A_n} be symmetric. Suppose that for every bounded, mean zero and variance one i.i.d. input sequence (or double sequence), F_{n^{−1/2} A_n} converges a.s. to some fixed nonrandom distribution F. Then the same limit continues to hold if the input sequence is i.i.d. with variance one.
(b) Let p/n → y ∈ (0, ∞) as n → ∞. Suppose that for every bounded, mean zero and variance one i.i.d. input sequence (or double sequence), F_{n^{−1} A_n A_n^T} converges a.s. to some fixed nonrandom distribution G. Then the same limit continues to hold if the input sequence is i.i.d. with variance one.
Proof. We will only consider the case when the input is a single sequence {x 0 , x 1 , x 2 , . . .}. The other case is similar.
By the rank inequality, subtracting the rank one matrix E(x_0) 1 1^T if necessary, we may assume that {x_i} have mean zero. For t > 0, denote

μ(t) = E[x_0 I(|x_0| ≤ t)]  and  σ²(t) = Var(x_0 I(|x_0| ≤ t)).

Since E(x_0) = 0 and E(x_0²) = 1, we have μ(t) → 0 and σ(t) → 1 as t → ∞, and σ²(t) ≤ 1. Define the bounded random variables

x*_i = ( x_i I(|x_i| ≤ t) − μ(t) ) / σ(t).

It is easy to see that E[x_0² I(|x_0| > t)] = 1 − σ²(t) − μ(t)² → 0 as t tends to infinity. Further, {x*_i} are i.i.d. bounded, mean zero and variance one random variables. Denote by A*_n the matrix A_n constructed from the sequence {x*_i}_{i≥0}. We now prove (a). By the triangle inequality and (13), d_BL(F_{n^{−1/2} A_n}, F_{n^{−1/2} σ(t) A*_n}) is controlled by n^{−2} Tr[(A_n − σ(t) A*_n)²]. Now, by the hypotheses on A_n, using the strong law of large numbers, this bound tends to zero as n → ∞ followed by t → ∞, and the proof of (a) is complete.
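The truncation step in this proof is elementary to check by simulation; in the sketch below (ours, not from the paper) empirical analogues of μ(t) and σ(t) are used, so that x* is bounded with mean zero and variance one by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
t = 2.0
x = rng.standard_normal(10 ** 6)          # stand-in for an unbounded input
xt = x * (np.abs(x) <= t)                 # x I(|x| <= t)
mu_t, sigma_t = xt.mean(), xt.std()       # empirical mu(t), sigma(t)
x_star = (xt - mu_t) / sigma_t            # bounded, mean zero, variance one

assert np.abs(x_star).max() <= (t + abs(mu_t)) / sigma_t + 1e-12
assert abs(x_star.mean()) < 1e-9
assert abs(x_star.var() - 1.0) < 1e-9
```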
To prove (b), let {x*_i}, {A*_n} be as above. Now we need to use inequality (14) and the facts that there exists C > 0 such that, almost surely, lim sup_n (np)^{−1} Tr(A_n A_n^T) ≤ C, lim sup_n (np)^{−1} Tr(A*_n (A*_n)^T) ≤ C for all t > 0, and lim sup_{t→∞} lim sup_n (np)^{−1} Tr(B_n B_n^T) = 0, where B_n = A_n − σ(t) A*_n. The rest of the argument is as above. We omit the details. 2

The volume method: a unified approach

As we pointed out earlier, the entries of the matrices are taken from an input sequence. We now develop this common thread in a systematic manner. As discussed, we will assume the input sequence to be independent and uniformly bounded with mean zero and variance one unless otherwise mentioned.
Link function L. Let d be a positive integer and let L_n : {1, 2, ..., n}² → Z_≥^d, n ≥ 1, be a sequence of functions, symmetric in their arguments, such that L_{n+1}(i, j) = L_n(i, j) whenever 1 ≤ i, j ≤ n.
The sequence of matrices {A_n} under consideration will always be such that the (i, j)-th entry of A_n is of the form a_{i,j} = x_{L_n(i,j)}. To simplify notation we shall write L_n = L and call it the link function; likewise, by abuse of notation, we write N² as the common domain of {L_n}. The link functions for the different matrices discussed so far are:

(i) Wigner matrix: L(i, j) = (min(i, j), max(i, j));
(ii) Toeplitz matrix: L(i, j) = |i − j|;
(iii) Hankel matrix: L(i, j) = i + j;
(iv) Reverse circulant: L(i, j) = (i + j) mod n;
(v) Symmetric circulant: L(i, j) = n/2 − |n/2 − |i − j||;
(vi) Sample variance covariance matrix: here the relevant link is a pair of functions, L_1(i, j) = (i, j) for the entries of X_n and L_2(i, j) = (j, i) for those of X_n^T.

We will primarily consider matrices where we need only one link function. The approach is easily modified to include matrices such as the S, which require two L functions.
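These link functions can be checked mechanically; the snippet below (our illustration) verifies symmetry in the arguments and previews the boundedness property used later: for each of these links, at most two values of l solve L(k, l) = t for fixed k and t, while min(i, j) fails this.

```python
n = 50
L_wigner   = lambda i, j: (min(i, j), max(i, j))
L_toeplitz = lambda i, j: abs(i - j)
L_hankel   = lambda i, j: i + j
L_revcirc  = lambda i, j: (i + j) % n
L_symcirc  = lambda i, j: n // 2 - abs(n // 2 - abs(i - j))

links = [L_wigner, L_toeplitz, L_hankel, L_revcirc, L_symcirc]
# each link is symmetric in its arguments
assert all(L(i, j) == L(j, i)
           for L in links for i in range(1, n + 1) for j in range(1, n + 1))

def delta(L):
    """Largest number of l in {1..n} giving the same value L(k, l), over k."""
    counts = {}
    for k in range(1, n + 1):
        for l in range(1, n + 1):
            counts[(k, L(k, l))] = counts.get((k, L(k, l)), 0) + 1
    return max(counts.values())

assert all(delta(L) <= 2 for L in links)
# the counterexample L(i, j) = min(i, j): #{l : min(1, l) = 1} = n, unbounded
assert delta(lambda i, j: min(i, j)) == n
```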
Circuits: Any function π : {0, 1, 2, ..., h} → {1, 2, ..., n} with π(0) = π(h) will be called a circuit. The length l(π) of π is taken to be h. A circuit depends on h and n but we will suppress this dependence. Define

X_π = x_{L(π(0),π(1))} x_{L(π(1),π(2))} ··· x_{L(π(h−1),π(h))}.

The conditions (M1) and (M4) may be written in terms of circuits; for example, (M1) takes the form n^{−(1+h/2)} Σ_{π: π circuit} E(X_π) → β_h.

Matched circuits: Any value L(π(i − 1), π(i)) is said to be an L-value of π, and π is said to have an edge of order e (1 ≤ e ≤ h) if it has an L-value repeated exactly e times. If π has at least one edge of order one, then E(X_π) = 0. Thus only those π with all e ≥ 2 are relevant. Such circuits will be said to be L-matched (in short, matched). For any such π, given any i, there is at least one j ≠ i such that L(π(i − 1), π(i)) = L(π(j − 1), π(j)).
To deal with (M2) or (M4), we need multiple circuits. The following notions will be useful: k circuits π 1 , π 2 , · · · , π k are said to be jointly matched if each L-value occurs at least twice across all circuits. They are said to be cross-matched if each circuit has at least one L-value which occurs in at least one of the other circuits.
The following extension of L-matching will be used, specially when the inputs are dependent.
(L, f) matching: Given a function f, the circuit π is said to be (L, f)-matched if for each i there is some j ≠ i such that f(L(π(i − 1), π(i)) − L(π(j − 1), π(j))) = 0. The L-matching introduced earlier is obtained as a special case with f(x) = x. The concepts of joint matching and cross matching can be similarly extended.
Words: Equivalence classes may be identified with partitions of {1, 2, · · · , h}: to any partition we associate a word w of length l(w) = h of letters where the first occurrence of each letter is in alphabetical order. For example, if h = 5, then the partition {{1, 3, 5}, {2, 4}} is represented by the word ababa.
The class Π(w): Let w[i] denote the i-th entry of w. The equivalence class corresponding to w will be denoted by

Π(w) = { π : w[i] = w[j] ⇔ L(π(i − 1), π(i)) = L(π(j − 1), π(j)) }.

The number of partition blocks corresponding to w will be denoted by |w|. If π ∈ Π(w), then clearly #{L(π(i − 1), π(i)) : 1 ≤ i ≤ h} = |w|. The notion of order e edges, matching and nonmatching carries over from circuits to words in a natural manner. For instance, ababa is matched. The word abcadbaa is nonmatched, has edges of order 1, 2 and 4, and the corresponding partition is {{1, 4, 7, 8}, {2, 6}, {3}, {5}}. As pointed out, it will be enough to consider only matched words. The total number of matched words of length 2k having only order 2 edges (so that |w| = k) equals (2k)!/(2^k k!).

The class Π*(w): Define, for any (matched) word w,

Π*(w) = { π : w[i] = w[j] ⇒ L(π(i − 1), π(i)) = L(π(j − 1), π(j)) }.

Note that Π(w) ⊆ Π*(w). However, as we will see, Π*(w) is equivalent to Π(w) for asymptotic considerations, but is easier to work with.
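The passage from circuits to words is purely mechanical; the helper below (ours, not from the paper) canonicalizes a sequence of L-values into its word, and confirms the count (2k)!/(2^k k!) of matched words with only order 2 edges for k = 2.

```python
from math import factorial
from itertools import product

def word_of(values):
    """Canonical word of a sequence of L-values: first occurrences of
    letters appear in alphabetical order."""
    letters, out = {}, []
    for v in values:
        letters.setdefault(v, chr(ord('a') + len(letters)))
        out.append(letters[v])
    return ''.join(out)

assert word_of([7, 2, 7, 2, 7]) == 'ababa'

def only_order2(w):
    return all(w.count(c) == 2 for c in w)

# matched words of length 2k with only order 2 edges number (2k)!/(2^k k!)
k = 2
words = {word_of(p) for p in product(range(2 * k), repeat=2 * k)}
count = sum(1 for w in words if only_order2(w))
assert count == factorial(2 * k) // (2 ** k * factorial(k))   # aabb, abab, abba
```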
The L function in all of the above cases satisfies the following property, which will be crucially used in the proofs of the LSD of the different matrices.

Property B: Δ(L, f) = sup_n sup_t sup_{1≤k≤n} #{ l : 1 ≤ l ≤ n, f(L(k, l) − t) = 0 } < ∞. In words, for every k, the number of l producing any given value is uniformly bounded.
An example of an (L, f) which does not satisfy Property B is L(i, j) = min(i, j) with f(x) = x: for t = k, every l ≥ k satisfies min(k, l) = t, so the count grows with n.
The following Lemma helps to reduce the number of terms in step (M1). We need the version with general f to tackle dependent inputs later. Let N_{h,3+} be the number of (L, f) matched circuits on {1, 2, ..., n} of length h with at least one edge of order ≥ 3.

Lemma 1. (a) If (L, f) satisfies Property B, then

N_{h,3+} = O(n^{⌊(h+1)/2⌋}).   (19)

As a consequence, as n → ∞,

n^{−(1+h/2)} N_{h,3+} → 0.   (20)

(b) Suppose {A_n} is a sequence of n × n random matrices with input sequence {x_i} which is uniformly bounded, independent, with mean zero and variance 1, and (L, f) with f(x) = x satisfies Property B. Then

(i) if h = 2k + 1,

lim_n n^{−(1+h/2)} E[Tr(A_n^h)] = 0;   (21)

(ii) if h = 2k, then for every word w with only order 2 edges,

lim_n n^{−(1+k)} |Π*(w) − Π(w)| = 0,   (22)

and, provided any of the last two limits below exists,

lim_n n^{−(1+k)} E[Tr(A_n^{2k})] = Σ_{w with only order 2 edges} lim_n n^{−(1+k)} |Π(w)| = Σ_{w with only order 2 edges} lim_n n^{−(1+k)} |Π*(w)|.   (23)

Proof. Relation (19) is proved in [BDJ] [7] for the L-functions corresponding to the Toeplitz and Hankel matrices when f(x) = x. A careful scrutiny of their proof reveals that the same argument works in the present case. Relation (20) is an immediate consequence.
Relation (22) follows immediately since if π ∈ Π * (w) \ Π(w) then π must have an edge of order at least 4 and we can apply (20).
From the mean zero and independence assumptions, only matched circuits contribute, so that (provided the limit below exists)

lim_n n^{−(1+h/2)} E[Tr(A_n^h)] = lim_n n^{−(1+h/2)} Σ_{π matched} E(X_π).   (24)

By Hölder's inequality, E|X_π| is bounded uniformly over all circuits π. Therefore, from part (a), matched circuits which have edges of order three or more do not contribute to the limit in (24). For a word w with only order 2 edges, E(X_π) = 1 for every π ∈ Π(w). So, provided the limit below exists,

lim_n n^{−(1+h/2)} E[Tr(A_n^h)] = Σ_{w with only order 2 edges} lim_n n^{−(1+h/2)} |Π(w)|.   (25)

Taking h = 2k establishes (23).
If h = 2k + 1, note that there cannot be any matched word of odd length with only order 2 edges; hence the sum in (25) is empty, proving (21). The proof is now complete. 2

Define, for each fixed matched word w of length 2k with |w| = k,

p(w) = lim_n n^{−(1+k)} |Π*(w)|,   (26)

whenever the limit exists. For any fixed word, this limit will be positive and finite only if the number of elements in the set is of exact order n^{k+1}. From Lemma 1, it follows that the limiting (2k)-th moment is then the finite sum

β_{2k} = Σ_{w: l(w)=2k, |w|=k} p(w).

Theorem 3 may be used to verify Carleman's condition for {β_{2k}}.
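As an illustration of (26) (our example; the values are those obtained later for the Toeplitz link): the three matched words of length 4 are aabb, abab and abba, with p(aabb) = p(abba) = 1 and p(abab) = 2/3, so the limiting fourth moment of the Toeplitz LSD is 8/3. A quick simulation is consistent with this:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x = rng.choice([-1.0, 1.0], size=n)    # bounded, mean 0, variance 1
i, j = np.indices((n, n))
T = x[np.abs(i - j)] / np.sqrt(n)      # scaled Toeplitz matrix

# 4th moment of the ESD via the trace formula; limit is 1 + 1 + 2/3 = 8/3
m4 = np.trace(np.linalg.matrix_power(T, 4)) / n
assert abs(m4 - 8.0 / 3.0) < 0.2
```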
The next Lemma helps to verify (M4). Let Q_{h,4} be the number of quadruples of circuits (π_1, π_2, π_3, π_4) of length h which are jointly matched and cross-matched with respect to (L, f).

Lemma 2. (a) If (L, f) satisfies Property B, then

Q_{h,4} = O(n^{2h+2}).   (27)

(b) If, in addition, the input sequence is independent, uniformly bounded and mean zero, then

E[ n^{−(1+h/2)} ( Tr(A_n^h) − E Tr(A_n^h) ) ]^4 = O(n^{−2}),

so that (M4) holds.
Proof. Again, relation (27) is proved in [BDJ] [7] for L-functions corresponding to Toeplitz and Hankel matrices when f (x) = x and the same proof works here. We omit the details.
To prove (b), we write the fourth moment as

E[ Tr(A_n^h) − E Tr(A_n^h) ]^4 = Σ_{π_1, π_2, π_3, π_4} E[ Π_{j=1}^4 ( X_{π_j} − E X_{π_j} ) ].

If (π_1, π_2, π_3, π_4) are not jointly matched, then one of the circuits, say π_j, has an L-value which does not occur anywhere else. Also note that E(X_{π_j}) = 0. Hence, using independence, the corresponding term vanishes. Further, if (π_1, π_2, π_3, π_4) is jointly matched but not cross-matched, then one of the circuits, say π_j, is only self-matched, that is, none of its L-values is shared with those of the other circuits; by independence, that term again factorizes and vanishes. The remaining terms number at most Q_{h,4} and are uniformly bounded. Therefore, by part (a),

E[ n^{−(1+h/2)} ( Tr(A_n^h) − E Tr(A_n^h) ) ]^4 = n^{−(4+2h)} O(Q_{h,4}) = O(n^{−2}),

proving the Lemma completely.
2 We now explore the question of when the limit in (26) exists.
Vertex and generating vertex: Any π(i) is called a vertex. This vertex is said to be generating if either i = 0 or w[i] is the position of the first occurrence of a letter. For example, if w = abbcab then π(0), π(1), π(2), π(4) are generating vertices. By Property B of L given earlier, a circuit of a fixed length is completely determined, up to finitely many choices, by its generating vertices. Obviously, the number of generating vertices in π is |w| + 1. Hence we obtain the simple but crucial estimate |Π*(w)| = O(n^{|w|+1}).
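A small helper (ours, not from the paper) that reads off the generating vertices of a word:

```python
def generating_positions(w):
    """Indices i such that pi(i) is generating: i = 0 or the i-th letter
    of w (1-based) is the first occurrence of that letter."""
    seen, gen = set(), [0]
    for i, c in enumerate(w, start=1):
        if c not in seen:
            seen.add(c)
            gen.append(i)
    return gen

assert generating_positions('abbcab') == [0, 1, 2, 4]
# the number of generating vertices is |w| + 1
assert len(generating_positions('abbcab')) == len(set('abbcab')) + 1
```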

Remark 2.
Indeed, in all of the examples given so far, it turns out that we can find a set Π**(w) ⊆ Π*(w) so that (i) lim_n n^{−(k+1)} (|Π*(w)| − |Π**(w)|) = 0 and (ii) for any π ∈ Π**(w), once the generating vertices are fixed there is at most one choice for the other vertices. So p(w) ≤ 1. As a further consequence, β_{2k} ≤ (2k)!/(2^k k!). Thus the limit moments are dominated by the Gaussian moments (see also Table 1). A general result for link functions satisfying Property B is given in Theorem 3.
In that case, {β_{2k}} satisfies Carleman's condition and hence G is sub-Gaussian. Even when the limit in (30) does not exist, the ESD of {n^{−1/2} A_n} is tight almost surely, and any subsequential limit L satisfies all the properties of G listed above.
Proof. For convenience, we write F_n for F_{n^{−1/2} A_n}. Let F̄_n = E(F_n) be the expected empirical spectral distribution function. Since Property B holds, we immediately have the following: a circuit π ∈ Π*(w) is determined, up to finitely many choices, by its |w| + 1 generating vertices. Having fixed the generating vertices arbitrarily, we have at most Δ(L, f) choices for each of the remaining k vertices. Thus,

|Π*(w)| ≤ Δ(L, f)^k n^{k+1} for every w with l(w) = 2k and |w| = k.

On the other hand, it is easy to see that for any word w with |w| < k, since the number of generating vertices is at most k, |Π*(w)| = O_k(n^k), where the O_k term may involve k.
Combining, we get

β_{2k} ≤ Δ(L, f)^k (2k)!/(2^k k!)

for all k ≥ 1, and by (c) the even moments of F_n converge a.s. to the corresponding moments of G. Since this bound gives β_{2k}^{−1/(2k)} ≥ c/√k for some c > 0, Σ_k β_{2k}^{−1/(2k)} = ∞ and hence G satisfies Carleman's condition. The rest of the proof is now immediate by standard moment method arguments. 2

Remark 3. It follows easily from the above theorem that, in all the theorems of the next section which establish the LSD for different matrices, the moments of the ESD (and also their expectations) converge almost surely to the corresponding moments of the LSD if the inputs are uniformly bounded.
Each matrix has its own contributing words: Of all the matrices we have introduced so far, the symmetric circulant, the palindromic Toeplitz, the palindromic Hankel and the doubly symmetric Hankel matrices achieve the full limit; that is, p(w) = 1 for all words. For all the other matrices, only certain words contribute in the limit. Below we define two special types of words which arise.
Catalan words: A matched word w with l(w) = 2k and |w| = k will be called a Catalan word if (i) there is at least one double letter and (ii) sequentially deleting all double letters eventually leads to the empty word. For example, aabbcc and abccbdda are Catalan words, whereas abab and abccab are not. The following result is well known.

Lemma 3. The number of Catalan words of length 2k is (2k)!/((k + 1)! k!), the k-th Catalan number.
Proof. Let us sketch a random walk proof which will be useful while discussing the S matrix.
Mark the first occurrence of any letter by +1 and the second occurrence by −1, obtaining a sequence (u_1, u_2, ..., u_{2k}) of ±1 of length 2k. For example, the Catalan words abba and abccbdda are represented respectively by (1, 1, −1, −1) and (1, 1, 1, −1, −1, 1, −1, −1). This actually provides a bijection between the Catalan words of length 2k and the sequences (u_1, u_2, ..., u_{2k}) satisfying the following three conditions: (I) each u_i ∈ {+1, −1}, (II) there are equal numbers of +1 and −1 in the sequence, and (III) Σ_{i=1}^l u_i ≥ 0 for all l ≥ 1. We omit the details. By the reflection principle, the total number of such paths is easily seen to be (2k)!/((k + 1)! k!). 2

Symmetric words: A matched word w of length 2k with |w| = k is symmetric if each letter occurs exactly once in an odd and exactly once in an even position. Otherwise it is called asymmetric. The following result is immediate.
Lemma 4. There are k! symmetric words of length 2k and any Catalan word is a symmetric word.
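Both counts, and the inclusion of Catalan words among symmetric words, can be verified by brute force for small k (our sketch; the double-letter deletion implements the definition above):

```python
from math import factorial
from itertools import product

def word_of(values):
    letters, out = {}, []
    for v in values:
        letters.setdefault(v, chr(ord('a') + len(letters)))
        out.append(letters[v])
    return ''.join(out)

def is_catalan(w):
    """Sequentially delete double letters; Catalan iff the empty word remains."""
    w, changed = list(w), True
    while changed:
        changed = False
        for i in range(len(w) - 1):
            if w[i] == w[i + 1]:
                del w[i:i + 2]
                changed = True
                break
    return not w

def is_symmetric(w):
    """Each letter occurs once in an odd and once in an even position."""
    pos = {}
    for i, c in enumerate(w):
        pos.setdefault(c, []).append(i % 2)
    return all(sorted(p) == [0, 1] for p in pos.values())

k = 3
pair_words = {word_of(p) for p in product(range(2 * k), repeat=2 * k)}
pair_words = {w for w in pair_words if all(w.count(c) == 2 for c in w)}
n_cat = sum(1 for w in pair_words if is_catalan(w))
n_sym = sum(1 for w in pair_words if is_symmetric(w))
assert n_cat == factorial(2 * k) // (factorial(k + 1) * factorial(k))  # Catalan number
assert n_sym == factorial(k)                                           # k! symmetric words
assert all(is_symmetric(w) for w in pair_words if is_catalan(w))       # Catalan => symmetric
```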
As we shall see later, to achieve a nontrivial LSD as n → ∞ (and p → ∞), the scaling for all the matrices is n^{−1/2}, except for S. For S, the scaling is 1 when p/n → y ≠ 0, ∞. The centering in all the above cases is zero. If p/n → 0, then the S matrix has scaling √(n/p) and centering I_p. The role of the different words will become clear once we deal with the different matrices in detail. Table 1 summarizes the results: we provide the value of p(w) for the different types of matched w, and the last column gives the limiting moments.

Examples
In the previous section we have developed a general theory based on the volume method of [BDJ] [7]. We now proceed to apply this machinery to different matrices introduced in Section 1.

Wigner matrix: the semicircular law
The semicircular law W arises as the LSD of n^{−1/2} W_n. It has the density function

f_W(x) = (1/(2π)) √(4 − x²), −2 ≤ x ≤ 2.   (32)

All its odd moments are zero. The even moments are given by

β_{2k}(W) = (2k)!/(k! (k + 1)!),

the k-th Catalan number. Wigner (1955) [20] assumed the entries {x_i} to be i.i.d. real Gaussian and established the convergence of the E(ESD) of n^{−1/2} W_n to the semicircular law (32). Assuming the existence of finite moments of all orders, Grenander (1963, pages 179 and 209) [11] established the convergence of the ESD in probability. Arnold (1967) [1] obtained almost sure convergence under the finiteness of the fourth moment of the entries. Bai (1999) [2] generalised this result by considering Wigner matrices whose entries above the diagonal are not necessarily identically distributed and have no moment restrictions except finite variance.
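A numerical sanity check (ours, not from the paper) that the density in (32) integrates to one, has vanishing odd moments and has the Catalan numbers as even moments:

```python
import numpy as np
from math import factorial

xs = np.linspace(-2.0, 2.0, 200001)
dens = np.sqrt(np.clip(4.0 - xs ** 2, 0.0, None)) / (2.0 * np.pi)

def trapezoid(y, x):
    """Composite trapezoid rule (kept explicit for portability)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

assert abs(trapezoid(dens, xs) - 1.0) < 1e-6             # total mass 1
for k in range(1, 5):
    m2k = trapezoid(xs ** (2 * k) * dens, xs)
    catalan = factorial(2 * k) // (factorial(k) * factorial(k + 1))
    assert abs(m2k - catalan) < 1e-3                     # beta_2k = Catalan number
assert abs(trapezoid(xs ** 3 * dens, xs)) < 1e-6         # odd moments vanish
```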
Recall that the entries of the Wigner matrix satisfy a_{i,j} = x_{L(i,j)}, where L(i, j) = (min(i, j), max(i, j)).
This L function satisfies Property B with f (x) = x.
We are now ready to state and give a short proof of the LSD of the Wigner matrix.
Theorem 4. Let {w_ij : 1 ≤ i ≤ j, j ≥ 1} be a double sequence of independent random variables with E(w_ij) = 0 and E(w_ij²) = 1 for all i ≤ j, which are either (i) uniformly bounded or (ii) identically distributed. Let W_n be an n × n Wigner matrix with entries w_ij. Then with probability one, F_{n^{−1/2} W_n} converges weakly to the semicircular law W given in (32).
Proof. In view of Theorems 2 and 3, it is enough to assume (i) and prove the required almost sure weak convergence.
Fix a matched word w with |w| = k and l(w) = 2k, having only order 2 edges. If w[i] = w[j], then L(π(i − 1), π(i)) = L(π(j − 1), π(j)), and from the form of the Wigner link function this yields one of the two constraints:

R1: π(i − 1) = π(j − 1) and π(i) = π(j), or R2: π(i − 1) = π(j) and π(i) = π(j − 1).

For any matched word w, there are k such constraints. Since each constraint is either R1 or R2, there are at most 2^k choices in all. Let λ denote a typical choice of the k constraints and Π*_λ(w) the subset of Π*(w) corresponding to λ, so that we have the finite disjoint representation Π*(w) = ∪_λ Π*_λ(w). Fix w and λ. For π ∈ Π*_λ(w), consider the graph with (2k + 1) vertices π(0), π(1), ..., π(2k). By abuse of notation, π(i) thus denotes both a vertex and its numerical value. Vertices within the following pairs are connected with a single edge: (i) the pairs (π(i − 1), π(j − 1)) and (π(i), π(j)) if w[i] = w[j] yields constraint R1; (ii) the pairs (π(i − 1), π(j)) and (π(i), π(j − 1)) if w[i] = w[j] yields constraint R2; (iii) the pair (π(0), π(2k)).
So, the graph has a total of (2k + 1) edges. These may include loops and double edges. In any connected component, the numerical values of the vertices are of course equal.
Clearly, the number of generating vertices for π equals the number of connected components of the graph, and this is bounded by (k + 1). Now suppose λ is such that the graph has (k + 1) connected components. By the pigeonhole principle, there must exist a vertex, say π(i), which is connected to itself. Note that this is possible if and only if w[i] = w[i + 1] and R2 is satisfied. But this implies that there is a double letter w[i]w[i + 1] in w. Consider the reduced word w′ of length 2(k − 1) obtained by removing this double letter. We claim that the reduced word still has a double letter.
To show this, in the original graph, coalesce the vertices π(i − 1) and π(i + 1). Delete the vertex π(i) and remove the R2 constraint edges (π(i − 1), π(i + 1)) and (π(i), π(i)), but retain all the other earlier edges. For example, any other edge that might have existed earlier between π(i − 1) and π(i + 1) is now a loop. This gives a new graph with 2k + 1 − 2 = 2(k − 1) + 1 vertices and k connected components. Proceeding as before, there must be a self-edge, implying a double letter yy in w′. Proceeding inductively, after k steps we are left with just a single vertex with a loop. In other words, w is Catalan and all constraints are R2.
Conversely, it is easy to verify that if w is Catalan and all constraints in λ are R2, then the number of connected components in the graph is indeed (k + 1). To see this, essentially retrace the steps given above. First identify a double letter. This gives an R2 constraint. Remove it and proceed inductively. For example, coalesced vertices will fall in the same connected component. We omit the details.
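The double-letter reduction used in the last two paragraphs is easy to mechanize. The following Python sketch (our illustration, not part of the original argument) tests whether a pair-matched word is Catalan by repeatedly deleting a double letter:

```python
def is_catalan(word):
    """Decide whether a pair-matched word is Catalan by repeatedly
    deleting a double letter, mirroring the reduction argument above."""
    w = list(word)
    while w:
        for i in range(len(w) - 1):
            if w[i] == w[i + 1]:
                del w[i:i + 2]  # remove the double letter
                break
        else:
            return False  # no double letter left but word is nonempty
    return True

print(is_catalan("abba"), is_catalan("aabb"), is_catalan("abab"))
# prints: True True False
```

The word abab, whose constraints force an extra identification, is the smallest non-Catalan pair-matched word.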
Denote by λ_0 the case when all constraints are R2. On the other hand, suppose w is Catalan and not all constraints are R2, or w is not Catalan and λ is arbitrary. Then the corresponding graph has at most k connected components. Now, (34) follows by combining (35), (36) and (37), and the proof is complete. 2 Corollary 4.1. Let {w_{ij} : 1 ≤ i ≤ j, j ≥ 1} be a double sequence of independent random variables such that {w_{ij} : i < j} are i.i.d. with Var(w_{12}) = 1. Suppose there exists ε_n → 0 such that n^{1/2} ε_n → ∞ and Σ_{i=1}^n P(|w_{ii}| ≥ n^{1/2} ε_n) = o(n). Let W_n be an n × n Wigner matrix with the entries w_{ij}. Then with probability one, F_{n^{-1/2} W_n} converges weakly to the semicircular law.
Proof of Corollary 4.1. Let w̃_{ii} be an i.i.d. sequence (independent of {w_{ij} : i < j}) having the same distribution as w_{12}. Let W̃_n be the matrix obtained from W_n by replacing the diagonal entries w_{ii} with w̃_{ii}. Then W̃_n is a standard Wigner matrix and F_{n^{-1/2} W̃_n} converges to the semicircular law.
Take E_n^{(1)} and E_n^{(2)} as diagonal matrices with entries {−w̃_{ii}} and {w_{ii}} respectively. Then (H2) of Theorem 1 is satisfied with k_n = n, α_n = 1 and w(n) = n^{1/2} ε_n. Writing W_n = W̃_n + E_n^{(1)} + E_n^{(2)} and appealing to Theorem 1 twice, the corollary follows. 2 Remark 4. If the diagonal entries are i.i.d. μ where μ is an arbitrary probability measure and the off-diagonal entries are i.i.d. ν with variance 1, then W continues to be the LSD (see Bai and Silverstein, 2006) [3]. The weakest known general condition under which convergence to the semicircle law holds was given by Pastur (1972) [16] (see Bai (1999) [2] or Chatterjee (2005) [8] for a completely new approach). It states that if {w_{ij}}_{1≤i,j≤n} are independent mean zero, variance one random variables, possibly depending on n, satisfying the Lindeberg-type condition (1/n^2) Σ_{i,j} E[w_{ij}^2 1(|w_{ij}| ≥ η n^{1/2})] → 0 for every η > 0, then W continues to be the LSD. The above corollary supplements all these results.
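As a quick numerical sanity check (our addition; the matrix size n = 1000, the Gaussian entries, and the seed are arbitrary choices), one can simulate a Wigner matrix and compare the low-order moments of its ESD with the semicircle moments β_2(W) = 1 and β_4(W) = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
a = rng.standard_normal((n, n))
W = np.triu(a) + np.triu(a, 1).T            # real symmetric Wigner matrix
eigs = np.linalg.eigvalsh(W / np.sqrt(n))   # eigenvalues of n^{-1/2} W_n
m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
# m2 should be close to 1 and m4 close to 2 (the Catalan numbers C_1, C_2)
```

For n of this size the trace moments concentrate sharply, so a single realization already lands close to the limiting values.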

Remark 5.
When p/n → y > 0, if {x ij } are i.i.d. with mean zero and variance 1, the same limit in Theorem 5(a) above continues to hold. This is due to Theorem 2(b).
But when p/n → 0, the i.i.d. unbounded case does not fall under the set up of Theorem 2(b), and the finite level truncation used in Theorem 2 is not the optimal one. It requires different levels of truncation and more refined analysis. See Bai (1988). His arguments show that if the fourth moment of the underlying variables is finite, then the limit in Theorem 2(b) continues to hold. For simplicity we have dealt with only the bounded case. In a separate paper we will deal with general matrices of the form XX′, where we shall discuss this issue in more detail.
The moments of the Marčenko-Pastur law with ratio index y > 0 are given by (see Bai, 1999 [2] or Bai and Silverstein, 2006 [3]) β_k(MP_y) = Σ_{t=0}^{k−1} (1/(t+1)) C(k,t) C(k−1,t) y^t, where C(·,·) denotes the binomial coefficient. The following lemma establishes the connection between these moments and the semicircle moments. This result is also available in Jonsson (1982) [13].
Lemma 5. For every positive integer k, Σ_{t=0}^{k−1} (1/(t+1)) C(k,t) C(k−1,t) = (1/(k+1)) C(2k,k) = β_{2k}(W). Proof of Lemma 5. Observe that (1/(t+1)) C(k,t) C(k−1,t) = (1/k) C(k,t) C(k,t+1), so by Vandermonde's identity the left side of the above expression equals (1/k) C(2k, k−1). The latter is indeed the right side of the above expression. If y = 1 (for example if p = n), then β_k(MP_y) = β_{2k}(W) for all k ≥ 1.
Suppose X ∼ semicircular law and Y ∼ Marčenko-Pastur law (with y = 1); then from the above, Y =_d X^2. If y ≤ 1 then we still have the inequality β_k(MP_y) ≤ β_{2k}(W).
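The identity and the inequality above can be checked mechanically. The sketch below (ours, assuming the standard form of the Marčenko-Pastur moments as displayed earlier) verifies both for small k:

```python
from math import comb

def mp_moment(k, y):
    """beta_k(MP_y) via the standard moment formula (assumed form)."""
    return sum(comb(k, t) * comb(k - 1, t) * y**t / (t + 1) for t in range(k))

def semicircle_moment_2k(k):
    """beta_{2k}(W) = Catalan number C_k."""
    return comb(2 * k, k) // (k + 1)

for k in range(1, 8):
    assert mp_moment(k, 1.0) == semicircle_moment_2k(k)   # y = 1: equality
    assert mp_moment(k, 0.5) <= semicircle_moment_2k(k)   # y <= 1: inequality
```

All quantities involved are small integers, so the floating-point sums are exact here.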
A circuit π now has the non-uniform range 1 ≤ π(2m) ≤ p, 1 ≤ π(2m + 1) ≤ n. It is said to be matched if it is matched within the same L_i, i = 1, 2, or across.
In the arguments that follow, we shall profitably use some of the earlier developments for the Wigner matrix. To do this, for any given word w, let Π(w) be the possibly larger class of circuits with the range 1 ≤ π(i) ≤ max(p, n), 1 ≤ i ≤ 2k. Likewise define Π*(w).
Thus from (41), we have a nontrivial contribution in the limit only when w is Catalan and hence λ = λ_0, and we need to compute lim n^{-k} p^{-1} |Π*_{λ_0}(w)|.
Incidentally, we already know that |Π*_{λ_0}(w)| ∼ [max(p, n)]^{k+1} for every Catalan word. The difference now is that in Π*_{λ_0}(w), the circuits are of non-uniform range. The number of Catalan words is now different from the Wigner case and the limit will depend on the word. We proceed as follows.
Suppose the Catalan word w is generated by (t + 1) independent even vertices (with range p) and (k − t) independent odd vertices (with range n), 0 ≤ t ≤ k − 1. Note that for such a word, |Π*_{λ_0}(w)| ∼ p^{t+1} n^{k−t}, so that n^{-k} p^{-1} |Π*_{λ_0}(w)| → y^t. Let M_{t,k} = the number of Catalan words with (t + 1) independent even vertices and (k − t) independent odd vertices. Then the total limiting contribution equals Σ_{t=0}^{k−1} M_{t,k} y^t. To obtain M_{t,k}, recall the relation between Catalan words and sequences of +1 and −1 given in the proof of Lemma 3 of Section 3. Now note that a +1 in an even position produces an independent even vertex. So to get (t + 1) independent even vertices, we should look for those sequences having exactly t many +1 in the even positions. Thus M_{t,k} = the number of sequences of length 2k satisfying (I)-(III) (given in the proof of Lemma 3 in Section 3) which also have t many +1 in the even positions. Claim. M_{t,k} = C(k−1,t)^2 − C(k−1,t+1) C(k−1,t−1) = (1/(t+1)) C(k,t) C(k−1,t).
Thus, we conclude that β_k(MP_y) = Σ_{t=0}^{k−1} M_{t,k} y^t. This proves the convergence of the moments of the expected ESD.
Note that (M4) does not follow directly from the development of Section 3, since the S matrix has two link functions. Nevertheless, we have seen above its close connection to Wigner matrices, and (M4) can be easily verified by appropriate modifications of those arguments. We omit the details. This proves the almost sure convergence of the ESD, and the proof of Part (a) is complete once we establish the above Claim.
Proof of Claim, based on the reflection principle. Each valid sequence represents a simple random walk starting at the origin (0, 0) and ending at (2k, 0) such that (i) there are exactly t upward moves among the even steps and (ii) no part of the walk is below the x-axis. This implies that u_1 = +1 and u_{2k} = −1.
First we count the number of paths without considering constraint (ii). There are C(k−1,t) ways of choosing t many plus ones from the (k − 1) even steps, and for each of them there are another C(k−1,t) ways of choosing t many minus ones from the (k − 1) odd steps (see Case 1 in Table 2). So, the total number of such choices is C(k−1,t)^2.
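For small k the count in the Claim can be verified by brute force. The following Python sketch (our illustration) enumerates the ±1 sequences directly and compares the count with the closed form (1/(t+1)) C(k,t) C(k−1,t):

```python
from itertools import product
from math import comb

def M(t, k):
    """Brute-force count of +-1 sequences of length 2k with zero sum,
    nonnegative partial sums, and exactly t many +1 in the even positions."""
    count = 0
    for seq in product((1, -1), repeat=2 * k):
        partial, ok = 0, True
        for s in seq:
            partial += s
            if partial < 0:
                ok = False
                break
        if ok and partial == 0:
            # even positions 2, 4, ..., 2k are indices 1, 3, ..., 2k-1
            if sum(1 for p in range(1, 2 * k, 2) if seq[p] == 1) == t:
                count += 1
    return count

for k in range(1, 7):
    for t in range(k):
        assert M(t, k) == comb(k, t) * comb(k - 1, t) // (t + 1)
```

For instance, for k = 3 the five Dyck-type sequences split as M(0,3) = 1, M(1,3) = 3, M(2,3) = 1, summing to the Catalan number C_3 = 5.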
Again from the estimate (44), we must have |V| = 2m + 1, |E| = 2m, |V_2| = m and |V_1| = m + 1. Now, observe the following: (a) |V_2| = m implies a pair partition of the odd vertices. Denote it by a word w of length k. (b) Each pair in E must occur exactly twice.
(d) Note that (b) and (c) together imply that E X_π = 1. Now suppose they are different from the rest of the odd vertices. If we fix a word w, then independently of w, there are exactly N_1(n) = n(n − 1) ⋯ (n − m + 1) choices of odd vertices satisfying the pairing imposed by w.
By (45) and (b), they have to be matched in pairs among themselves. Also, (c) rules out the possibility that the first pair is matched with the second and that the third is matched with the fourth. So the other two combinations are the only possibilities. It is easy to verify that this is the same as saying that, for the even vertices, L(π(2i − 2), π(2i)) = L(π(2j − 2), π(2j)), where L is the Wigner link function.
Let π*(i) = π(2i). This is a circuit π* of length k. Equation (46) says that the circuit π* is L-matched. Let Π*(w) be the set of all circuits π* satisfying the Wigner link function. Note that the count is of the exact order or of negligible order depending on whether w is or is not Catalan. This proves the convergence of the expected ESD for the bounded case. We again omit the verification of (M4). That would establish the almost sure convergence of the ESD.

Toeplitz and Hankel matrices with independent inputs
[BDJ] [7] established the LSD of Toeplitz and Hankel matrices when the input is i.i.d. with finite second moment. See also Hammond and Miller (2005) [12]. We may prove the following version.
Theorem 6. Suppose {x_i} are independent with mean zero and variance 1, and are either (i) uniformly bounded or (ii) identically distributed. Then with probability one, the ESD of n^{-1/2} T_n and n^{-1/2} H_n converge weakly as n → ∞ to nonrandom symmetric measures, say T and H respectively.

Remark 6.
[BDJ] [7] also prove that T and H have unbounded support and that H is not unimodal. The first six moments of T are 0, 1, 0, 8/3, 0 and 11. Recently, Hammond and Miller (2005) [12] obtained some useful bounds for the moments of the ESD. Note that the first term on the right of their bound is the Gaussian (2k)-th moment. Also, compare this with the general bound (31).
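A small simulation (our addition; the size n = 1000 and the seed are arbitrary) illustrates the stated moments of T: the second and fourth moments of the ESD of n^{-1/2} T_n should be near 1 and 8/3.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)
idx = np.arange(n)
T = x[np.abs(idx[:, None] - idx[None, :])]   # Toeplitz: T_{ij} = x_{|i-j|}
eigs = np.linalg.eigvalsh(T / np.sqrt(n))
m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
# expected: m2 near beta_2(T) = 1 and m4 near beta_4(T) = 8/3
```

The fourth moment 8/3 < 3 reflects the fact that T is lighter-tailed than the Gaussian at this order, consistent with the bound above.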

The volume method proof
As always, without loss of generality, assume the variables are independent and uniformly bounded. We may then appeal to Theorem 2 to conclude the result for the unbounded i.i.d. case. We reproduce in brief the volume method arguments of [BDJ] [7]. Though they assume the i.i.d. structure, their arguments go through for the present case of uniformly bounded independent inputs.
Recall that the L-functions for the Toeplitz and Hankel matrices are respectively given by L(i, j) = |i − j| and L(i, j) = i + j, and they both satisfy Property B.
Further, they showed that {β_{2k}(T)} satisfies Carleman's condition (this also follows immediately from our general Theorem 3).
To obtain an expression for β_{2k}(T), they proceeded as follows. For each matched word w with |w| = k, they showed that the relevant limit reduces to counting |Π**(w)| (the first equality follows from Lemma 1). Let v_i = π(i)/n. The number of elements in Π**(w) can then be expressed in terms of (v_0, v_1, ..., v_{2k}). Since n^{-(1+k)} |Π**(w)| is nothing but a (k + 1)-dimensional Riemann sum for the function in (49), it converges to the corresponding integral p_T(w). Similar arguments lead to the analogous conclusion for the Hankel case. We can proceed similarly to write v_i as a linear combination L_i^H(v_S) of the generating vertices for all i ∉ S. As before, realizing n^{-(k+1)} |Π*(w)| as a Riemann sum, we can conclude that it converges to the expression given below. We emphasize that, unlike in the Toeplitz case, we do not automatically have L_{2k}^H(v_S) = v_{2k} for every word w. Combining all these observations, we have β_{2k}(H) = Σ_{w matched, l(w)=2k, |w|=k} p_H(w).
In the next section, we investigate the moments of T and H and their interrelations. (b) Suppose w is a Catalan word of length 2k and its set of generating vertices is denoted by S. Note that it is enough to show, for both Toeplitz and Hankel, that for each j ∉ S there exists i ∈ S, i < j, with v_j = v_i. (c) If w is not a symmetric word, then p_H(w) = 0.

Toeplitz and Hankel moments
To prove the above assertion, we apply induction on |w|. So assume that the claim holds for all words w′ such that |w′| < |w|. Let i and i + 1 be the positions of the first occurrence of a double letter xx in w. This implies that i ∈ S and i + 1 ∉ S. Recall the notation v_i = π(i)/n. Note that, for both Toeplitz and Hankel, the restriction due to the presence of this double letter xx translates into the following: v_{i−1} = v_{i+1}, while at the same time v_i is arbitrary (between 0 and 1).
Clearly w can be written as w = w_1 xx w_2, where the word w′ := w_1 w_2 is again a Catalan word with |w′| = k − 1. Further, note that v = (v_1, v_2, ..., v_{i−1}, v_{i+2}, ..., v_{2k}) satisfies the (k − 1) equations imposed by w′. Also, if v_s in v corresponds to a generating vertex of w′, it also corresponds to a generating vertex of w (of course, w has an extra generating vertex corresponding to v_i). Since |w′| < |w|, by the induction hypothesis the assertion holds for w′. It is now easy to see that it also holds for w.
(c) We will check that v_0 = L_{2k}^H(v_S) if and only if w is a symmetric word. Thus if w is not a symmetric word, p_H(w) will be the integral of a k-dimensional object in a (k + 1)-dimensional space, and hence equal to zero, proving (c).
and where j_s, 1 ≤ s ≤ k, are written in ascending order. Then j_k = 2k. To show the converse, let v_0 = L_{2k}^H(v_S). Since the matching gives t_{i_s} = t_{j_s}, we have v_{2k} = v_{2k} + Σ_{s=1}^k α_s (t_{i_s} − t_{j_s}) for all α_s ∈ {−1, 1}. Fix α_k = 1. Having fixed α_k, α_{k−1}, ..., α_{s+1}, we choose the value of α_s as follows: (a) if j_s + 1 ∈ {i_m, j_m} for some m > s, set α_s = α_m if j_s + 1 = i_m and set α_s = −α_m if j_s + 1 = j_m; (b) if there is no such m, choose α_s arbitrarily. By this choice of {α_s} we ensure that in v_{2k} + Σ_{s=1}^k α_s (t_{i_s} − t_{j_s}) the coefficient of each v_i, i ∉ S, cancels out. Thus, from uniqueness, this expression equals L_{2k}^H(v_S). Therefore, by hypothesis, v_{2k} + Σ_{s=1}^k α_s (t_{i_s} − t_{j_s}) − v_0 = 0, and hence all the coefficients of {v_i} on the left side are zero. Since t_{2k} = v_{2k−1} + v_{2k} and the coefficient of t_{2k} is −1 on the left side, the coefficient of t_{2k−1} must be +1, so that the coefficient of v_{2k−1} is zero. But then, comparing the coefficient of v_{2k−2}, the coefficient of t_{2k−2} has to be −1. Proceeding in this manner, all t_i for odd i must have coefficient +1 and all t_i for even i must have coefficient −1. These imply that the indices i_s and j_s occur exactly once in an odd and once in an even position. That is, the word is symmetric.
This transformation does not change the value of the integral corresponding to p_H(w). Under this volume-preserving transformation, (p_1, p_2, p_3, p_4, ..., p_{2k}) maps to (q_1, −q_2, q_3, −q_4, ..., −q_{2k}). Now a typical equation in the Hankel case looks like p_r = p_s if w[r] = w[s] before the transformation. Since w is symmetric, one of r and s is odd and the other is even. So, under the transformation the equation becomes (−1)^{r−1} q_r = (−1)^{s−1} q_s, or q_r + q_s = 0, which is nothing but the corresponding Toeplitz equation for the word w. Hence, p_H(w) = p_T(w). This proves (e).
2 Tables 3 and 4 provide a full list of partition words with their type, corresponding range of integration, and their volumes up to order 6. The second column shows the role of Catalan and symmetric words. In [7], the unboundedness of the LSD has been proved separately for T and H. Now, with Theorem 7 in hand, we can give a short combined proof of this fact.

Application of the volume method to reverse circulant
Assuming that E|x_i|^3 < ∞, Bose and Mitra (2003) [6] showed that the LSD R of n^{-1/2} RC_n has the density f_R(x) = |x| exp(−x^2), −∞ < x < ∞, with moments β_{2k+1}(R) = 0 and β_{2k}(R) = k! for all k ≥ 0.
This was established by them using the normal approximation. We state the following stronger result and provide a quick proof using the volume method. Proof of Claim 1. Fix any (pair-matched) word w giving a partition (i_s, j_s), 1 ≤ s ≤ k.
It is obvious that π(j) can be determined uniquely from the above equation since 1 ≤ π(j) ≤ n. We proceed inductively from left to right to get the whole circuit (i.e. π(0) = π(2k)) from only the independent vertices, uniquely.
This completes the proof of Claim 2 and the proof of the Theorem is complete.
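A simulation (ours; the indexing convention L(i, j) = (i + j) mod n is one common choice for the reverse circulant, and n and the seed are arbitrary) illustrates the limiting moments β_{2k}(R) = k!:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.standard_normal(n)
idx = np.arange(n)
RC = x[(idx[:, None] + idx[None, :]) % n]    # reverse circulant: x_{(i+j) mod n}
eigs = np.linalg.eigvalsh(RC / np.sqrt(n))
m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
# limiting moments beta_{2k}(R) = k!: so m2 near 1! = 1 and m4 near 2! = 2
```

The fourth moment near 2 already distinguishes R from both the semicircle (β_4 = 2 as well) and the Gaussian (β_4 = 3) at higher orders, since β_6(R) = 6 while β_6(W) = 5.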

Doubly symmetric and palindromic Hankel and Toeplitz matrices
Recall that the symmetric circulant has the link function n/2 − |n/2 − |i − j||. This may be considered as a "doubly symmetric" version of the Toeplitz matrix whose link function is |i − j|.
Likewise we may consider the doubly symmetric Hankel matrix DH_n, whose link function is the analogous symmetrized version of the Hankel link function. On the other hand, [MMS] [15] define a (symmetric) matrix to be palindromic if its first row is a palindrome. In particular, they discussed the palindromic Toeplitz matrix PT_n and also the palindromic Hankel matrix PH_n. They establish the Gaussian limit for F_{n^{-1/2} PT_n} and F_{n^{-1/2} PH_n}. Their approach is as follows: it is known from Hammond and Miller (2005) [12] that the LSD for T_n is linked to solutions of some Diophantine equations with certain obstructions.
[MMS] show that for P T n , these restrictions are absent, yielding the Gaussian limit. They also observe a direct relation between P H n and P T n to obtain the same conclusion for the former.
Combining their observations and ours, we have the following result with a short proof.
Theorem 9. (a) For any input sequence, if the LSD of any one of n^{-1/2} DH_n and n^{-1/2} PH_n exists, then the other also exists and they are equal.
(b) For any input sequence, (i) if the LSD of any one of n^{-1/2} PT_n and n^{-1/2} SC_n exists, then the other also exists and they are equal, and (ii) (PT_n)^{2k} = (PH_n)^{2k} for every k.
(c) If the input sequence {x_i} is independent with mean zero and variance 1, and is either (i) uniformly bounded or (ii) identically distributed, then the LSD of all four matrices n^{-1/2} PT_n, n^{-1/2} SC_n, n^{-1/2} PH_n and n^{-1/2} DH_n is the standard Gaussian G with β_{2k}(G) = (2k)!/(2^k k!).
Proof. Note that all the above four matrices are closely related to each other. In particular, (i) the n × n principal minor of DH_{n+3} is PH_n, and (ii) the n × n principal minor of SC_{n+1} is PT_n.
Hence by the interlacing inequality, the claims for the equality of the LSD follow immediately.
For the rest of the proof, we may as usual, without loss, assume that the variables are uniformly bounded. Further, from Theorem 3, if the LSD exist then they have mean zero and the LSD may be identified by the convergence of the expected moments of the ESD.
Let J_n be the matrix with entries 1 on the main anti-diagonal and zero elsewhere. Then it is easy to see that (PH_n)J_n = J_n(PH_n) = PT_n, and since J_n^2 = I_n, (PT_n)^2 = (PH_n)^2. This shows that all four matrices will have the same LSD.
It is thus enough to show that the even moments of the symmetric circulant converge to the Gaussian moments.
There are (2k)!/(2^k k!) (pair) matched words w with l(w) = 2k. Hence it is enough to show that for each such word, lim n^{-(k+1)} |Π*(w)| = 1. If w[i] = w[j] for i < j, then |n/2 − |u_i|| = |n/2 − |u_j||. This leads to the following six possibilities in all: u_i − u_j ∈ {0, n, −n} or u_i + u_j ∈ {0, n, −n}. We now show that the first three possibilities are asymptotically negligible. Proof. Let (i_1, j_1), (i_2, j_2), ..., (i_k, j_k) denote the pair partition corresponding to the word w, i.e. w[i_l] = w[j_l], 1 ≤ l ≤ k. Suppose, w.l.o.g., u_{i_k} − u_{j_k} ∈ {0, n, −n}. Clearly a circuit π becomes completely specified if we know π(0) and all the 'slopes' {u_i}.
As already observed, if we fix some value for u_{i_l}, there are at most six options for u_{j_l}. We may choose the values of π(0), u_{i_1}, u_{i_2}, ..., u_{i_{k−1}} in O(n^k) ways and then we may choose the values of u_{j_1}, u_{j_2}, ..., u_{j_{k−1}} in O(6^k) ways. For any such choice, from the sum restriction Σ_{i=1}^{2k} u_i = π(2k) − π(0) = 0 we know u_{i_k} + u_{j_k}, and on the other hand, by hypothesis, u_{i_k} − u_{j_k} ∈ {0, +n, −n}. Thus the pair (u_{i_k}, u_{j_k}) has at most 6 possibilities. Thus there are at most O(n^k) circuits with the given restrictions and the proof of the Lemma is complete.
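The Gaussian limit in Theorem 9(c) can also be observed numerically. The sketch below (ours, with arbitrary n and seed) builds a symmetric circulant via its link function and checks the first two even moments against the Gaussian values 1 and 3:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.standard_normal(n)
idx = np.arange(n)
d = np.abs(idx[:, None] - idx[None, :])
SC = x[n // 2 - np.abs(n // 2 - d)]          # link function n/2 - |n/2 - |i-j||
eigs = np.linalg.eigvalsh(SC / np.sqrt(n))
m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
# Gaussian moments: m2 near 1, m4 near beta_4(G) = 3
```

By Theorem 9, the palindromic Toeplitz and Hankel versions would give the same limiting moments.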

Matrices with dependent entries
We now illustrate the volume method with a few new results for matrices with dependent entries.
Theorem 10. Suppose x_t = ε_t ε_{t+1} ⋯ ε_{t+d−1}, where {ε_i} are independent with mean zero and variance 1 and are either (i) uniformly bounded or (ii) identically distributed. Let A_{n,d} = ((x_{L(i,j)})) be a sequence of random matrices with a one-dimensional link function L having Property B with f(x) = x. Suppose the ESD of n^{-1/2} A_{n,d} converges to a nonrandom G almost surely when d = 1. Then the same LSD continues to hold almost surely for any d ≥ 2.
Since there is dependence between the {x i }, we will have to deal with more complicated matchings.
Matched circuits: Let L be a link function. Consider the d × h matrix M_π = ((m_{i,j})), where m_{i,j} = L(π(j − 1), π(j)) + (i − 1), 1 ≤ i ≤ d, 1 ≤ j ≤ h. We now say that π is d-matched if every element of M_π appears at least twice. This notion is extended to d-joint matching and d-cross matching in the obvious way. Note the following facts: 1. No two entries belonging to the same column of M_π can be equal.
Let NM_{h,3+} = the number of d-matched circuits of length h with at least one entry of M_π repeated at least thrice.
(b) there exists a constant K_{h,d} for which the corresponding bound holds. (c) Suppose x_t = ε_t ε_{t+1} ⋯ ε_{t+d−1}, where {ε_i} are uniformly bounded, independent with mean zero and variance 1. Let A_{n,d} = ((x_{L(i,j)})) be a sequence of random matrices with a link function L having Property B with f(x) = x. Then for every h, the required estimate holds and hence (M4) holds.
Proof. Let g : ℕ → ℤ_{≥0} be defined as g(x) = ⌊x/d⌋. Obviously 0 has only d many preimages under g. So (L, g) satisfies Property B. If π is as given, then from the observations made earlier, π has at least one (L, g)-match of order three or more. So an application of Lemma 1 completes the proof of (a).
(b) follows immediately from Lemma 2 by noting that the set under consideration is contained in {(π_1, π_2, π_3, π_4) : they are jointly and cross-matched with respect to g}.
The proof of (c) uses (b) and we skip the details. 2 Lemma 8. Each d-matched circuit π with only pair matchings is also pair-matched w.r.t. L, and vice versa. Hence if l(π) = h is odd, then no d-matched circuit π can be pair-matched.
Proof. Let α_1 = min_t L(π(t − 1), π(t)). Since π is d-matched, there exist at least two entries from the matrix M_π whose values equal α_1. These entries have to come from the first row, since the elements in each column are strictly increasing. Let 1 ≤ j_1 ≠ j_2 ≤ h be such that L(π(j_1 − 1), π(j_1)) = L(π(j_2 − 1), π(j_2)) = α_1. Then the j_1-th column of the matrix M_π will be entirely matched with the j_2-th column. Now, since M_π contains only pair-matchings, no entry in the columns j_1 and j_2 can share the same value with entries from the rest of the matrix. Now drop these two columns and repeat the above procedure. Clearly, if h is odd, we cannot have a d-pair-matching. On the other hand, if h is even, we have concluded that the columns of M_π are pairwise matched with one another. This automatically induces an L-pair-matching for π and we are done. The other direction is trivial. Next, by the trace formula, E[β_h(F_{n,d})] = n^{-(1+h/2)} Σ_π E[X_π], where X_π = Π_{i=1}^h ε_{L(π(i−1),π(i))} ε_{L(π(i−1),π(i))+1} ⋯ ε_{L(π(i−1),π(i))+d−1}. Lemma 7(a) and Lemma 8 imply that if h is odd, then lim E[β_h(F_{n,d})] = 0 and hence, for every d, lim β_h(F_{n,d}) = 0 almost surely. Now suppose h = 2k is even. Let Π(w) be as defined in Section 3 for ordinary L-matching. From Theorem 3, almost surely, lim n^{-(k+1)} Σ_{w:|w|=k} |Π(w)| = lim β_{2k}(n^{-1/2} A_{n,1}) = lim E[β_{2k}(n^{-1/2} A_{n,1})] = β_{2k}(G).
On the other hand, Lemma 7 and Lemma 8 imply that for all d the even moments have the same limits. This proves the theorem for the uniformly bounded case. Now assume that {x_i} is i.i.d. Then, by following the arguments given in the proof of Theorem 2, it may be shown that the same LSD persists.
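Theorem 10 can be illustrated for d = 2 (our sketch; the Rademacher inputs, size n, and seed are arbitrary choices): a Toeplitz matrix built from x_t = ε_t ε_{t+1} should exhibit the same limiting moments as the independent-entry Toeplitz matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 1000, 2
eps = rng.choice([-1.0, 1.0], size=n + d - 1)   # bounded, mean 0, variance 1
x = eps[:n] * eps[1:n + 1]                       # x_t = eps_t * eps_{t+1}, d = 2
idx = np.arange(n)
T = x[np.abs(idx[:, None] - idx[None, :])]       # Toeplitz with dependent entries
eigs = np.linalg.eigvalsh(T / np.sqrt(n))
m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
# Theorem 10: same limits as the independent case, m2 near 1 and m4 near 8/3
```

Since the entries here are ±1, the second moment equals 1 exactly, while the fourth moment should approach β_4(T) = 8/3.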
2 We now explore another dependent situation. Though the following results can possibly be extended to other matrices, we restrict our attention to Toeplitz and Hankel matrices. Theorem 11. Suppose X_t = Σ_{j=0}^∞ a_j ε_{t−j}, where {a_j} satisfies Σ_j |a_j| < ∞ and {ε_i} are independent with mean zero and variance 1 such that either (i) {ε_i} are uniformly bounded, or (ii) they are identically distributed and Σ_j j a_j^2 < ∞. Then with probability one, the ESD of T_n/√n and H_n/√n, with entries coming from {X_t}, converge weakly to non-random symmetric probability measures T_a and H_a respectively. These LSD do not depend on the distribution of ε_1. Further, the (2k)-th moments of T_a and H_a are expressible in terms of the moments of T and H, where T and H are as defined in Theorem 6.

Remark 7.
Observe that A_{2k} ≤ A^{2k} where A = Σ_j |a_j|. Thus it follows from Theorem 11 that the limiting moments are bounded above by the Gaussian moments, that is, β_{2k}(T_a), β_{2k}(H_a) ≤ E(Z^{2k}) where Z ∼ N(0, A^2). In some easy cases {A_{2k}} can be explicitly calculated. For example, if X_t = ε_t + θε_{t−1} then A_{2k} = Σ_{i=0}^k C(k,i) … Proof of Theorem 11. We provide an outline of the proof. First assume that {ε_j} is bounded by B. This implies that {X_t} is also uniformly bounded by AB. In the rest of the proof, L will stand for either the Toeplitz or the Hankel link function, and B_n will stand for T_n or H_n.
Consider the string of equalities in (66). After interchanging the order of summation, the RHS of (66) equals a sum over matched words w with |w| = k, multiplied by a factor involving the sums Σ_m a_m a_{m+j}, j ≥ 0 (67). The term in the big parenthesis does not involve w. It is also equal to the factor given in the statement of Theorem 11. It is easy to show that Carleman's condition is satisfied.
Also, by using Lemma 10, it can be shown that the convergence of moments happens almost surely. So we have established the result for the bounded case.
Let us now turn to the case when {ε_j} are i.i.d. mean zero, variance one, but not necessarily uniformly bounded. From Phillips and Solo (1992), if Σ_j j a_j^2 < ∞ then we have the following SLLN: (1/n) Σ_{t=1}^n X_t^2 → Σ_j a_j^2 almost surely (68). Now we may give the following truncation argument to reduce to the earlier case.
Let {ε*_i} be the truncated version of {ε_i} as in (15). Let X*_t = Σ_{j=0}^∞ a_j ε*_{t−j}. We have already established the theorem for T*_n, the Toeplitz matrix constructed from {X*_t}. Following the steps of Theorem 2 and using (68), it can be easily shown that lim sup_n d_{BL}(F_{n^{-1/2} T_n}, F_{n^{-1/2} T*_n}) → 0 almost surely as the truncation level goes to infinity. Thus the theorem remains true for the Toeplitz matrix constructed using {X_t}. This proves the theorem completely. 2