CLT for Linear Spectral Statistics of Wigner Matrices

In this paper we prove that, under a sixth-moment condition, the spectral empirical process of Wigner matrices indexed by a set of functions with continuous fourth-order derivatives on an open interval including the support of the semicircle law converges weakly in finite dimensions to a Gaussian process.


Introduction and Main result
Random matrix theory originates from the development of quantum mechanics in the 1950s. In quantum mechanics, the energy levels of a system are described by the eigenvalues of a Hermitian operator on a Hilbert space. Hence physicists working on quantum mechanics became interested in the asymptotic behavior of these eigenvalues, and random matrix theory has since become a very popular topic among mathematicians, probabilists and statisticians. The pioneering result, the famous semicircle law for Wigner matrices, was established in [19].
A real Wigner matrix W_n = (x_{ij}) is an n × n symmetric matrix whose entries on and above the diagonal are independent real random variables with mean zero, the off-diagonal entries having unit variance. The set of these real Wigner matrices is called the Real Wigner Ensemble (RWE).
A complex Wigner matrix is an n × n Hermitian matrix whose entries above the diagonal are independent complex random variables with E x_{ij} = 0, E|x_{ij}|^2 = 1 and E x_{ij}^2 = 0, and whose diagonal entries are independent real random variables with mean zero. The set of these complex Wigner matrices is called the Complex Wigner Ensemble (CWE).
The empirical distribution F_n generated by the n eigenvalues of the normalized Wigner matrix n^{-1/2} W_n is called the empirical spectral distribution (ESD) of the Wigner matrix. The semicircle law states that F_n converges a.s. to the distribution F with density

F'(x) = \frac{1}{2\pi}\sqrt{4 - x^2}, \qquad |x| \le 2.

Various versions of this convergence were later investigated. See, for example, [1], [2].
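The semicircle law is easy to observe numerically. The following sketch is our illustration, not part of the paper's argument; it assumes a GOE-type real Wigner matrix with standard normal entries, normalizes by n^{-1/2}, and compares a linear eigenvalue statistic with its semicircle-law value ∫ x^2 dF(x) = 1.

```python
import numpy as np

def normalized_wigner(n, rng):
    """Sample a GOE-type real Wigner matrix and normalize by sqrt(n)."""
    a = rng.standard_normal((n, n))
    w = (a + a.T) / np.sqrt(2)  # symmetric; off-diagonal entries have variance 1
    return w / np.sqrt(n)

rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(normalized_wigner(500, rng))

# The ESD concentrates on [-2, 2], the support of the semicircle law, and
# (1/n) * sum(lambda_i^2) approximates the second moment of F, which equals 1.
print(round(eigs.max(), 2), round(float(np.mean(eigs**2)), 3))
```

The largest eigenvalue sits near the edge 2 of the support, and the empirical second moment is close to 1 already for n = 500.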
Clearly, one method of refining the above approximation is to establish the rate of convergence, which was studied in [3], [10], [12], [13], [18], [5] and [8]. Although the exact convergence rate remains unknown for Wigner matrices, Bai and Yao [6] proved that the spectral empirical process of Wigner matrices indexed by a set of functions analytic on an open domain of the complex plane including the support of the semi-circle law converges to a Gaussian process under fourth moment conditions.
To investigate the convergence rate of the ESD of a Wigner matrix, one needs to take f to be a step function. However, much evidence shows that the empirical process associated with a step function cannot converge in any metric space; see Chapter 9 of [8]. Naturally, one may ask whether it is possible to derive the convergence of the spectral empirical process of Wigner matrices indexed by a class of functions under the weakest possible smoothness assumptions. This may help us gain a deeper understanding of the exact convergence rate of the ESD to the semicircle law.
In this paper, we consider the empirical process of Wigner matrices indexed by a set of functions with continuous fourth-order derivatives on an open interval including the support of the semicircle law. More precisely, let C^4(U) denote the set of functions f : U → R with continuous fourth-order derivatives, where U is an open interval including the interval [−2, 2], the support of F(x). The empirical process G_n = {G_n(f)} indexed by C^4(U) is given by

G_n(f) = n \int f(x) \, d\big(F_n(x) - F(x)\big).

In order to give a unified treatment of Wigner matrices, we define the parameter κ with values 1 and 2 for the complex and real Wigner matrices respectively. Also set β = E(|x_{12}|^2 − 1)^2 − κ. Our main result is as follows.
Then the spectral empirical process G_n = {G_n(f) : f ∈ C^4(U)} converges weakly in finite dimensions to a Gaussian process G whose mean function and covariance function are expressed in terms of the family {T_l, l ≥ 0} of Chebyshev polynomials.
θ = \int f(x) \, dF(x) can be regarded as a population parameter. The linear spectral statistic \hat θ = \int f(x) \, dF_n(x) is then an estimator of θ. We remind the reader that Theorem 1.1 has a strong statistical meaning in applications. For example, in quantum mechanics, a Wigner matrix is a discretized version of a random linear transformation in a Hilbert space, and the semicircle law is derived under ideal assumptions. Quantum physicists may therefore want to test the validity of these ideal assumptions; to this end, they may suitably select one or several f's so that the corresponding θ's characterize the semicircle law. Using the limiting distribution of G_n(f) = n(\hat θ − θ), one may perform a statistical test of the ideal hypothesis. Obviously, one cannot apply the classical limiting distribution of \sqrt n(\hat θ − θ) to the above test, since here the fluctuation of \hat θ is of order 1/n rather than 1/\sqrt n. Remark 1.3. Checking the proof of Theorem 1.1, one finds that it still holds under the assumption that the fourth moments of the off-diagonal entries are bounded, provided the approximation G_n(f_m) of G_n(f) is of the desired order. Thus, the assumption of finite 6th moments is only needed in deriving the convergence rate of ‖F_n − F‖. Furthermore, the assumption on the fourth derivative of f is also related to the convergence rate of ‖F_n − F‖. If the convergence rate of ‖F_n − F‖ is improved and/or proved under weaker conditions, then the result of Theorem 1.1 would hold under the correspondingly weakened conditions. We conjecture that the result remains true if we only assume that the fourth moments of the off-diagonal elements of the Wigner matrix are bounded and that f has a continuous first derivative.
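The n (rather than √n) scaling can be seen in simulation. The sketch below is our illustration with assumed GOE-type entries: it computes G_n(f) = n(\hat θ − θ) for the test function f(x) = x^2, for which θ = ∫ x^2 dF(x) = 1, over independent replications; the fluctuations stay of order one as predicted by a CLT at scale n.

```python
import numpy as np

def G_n(n, rng):
    """G_n(f) = n * (theta_hat - theta) for f(x) = x^2, where theta = 1."""
    a = rng.standard_normal((n, n))
    w = (a + a.T) / np.sqrt(2)  # GOE-type real Wigner matrix
    # theta_hat = (1/n) tr((W/sqrt(n))^2) = mean of the squared eigenvalues
    theta_hat = np.sum((w / np.sqrt(n))**2) / n
    return n * (theta_hat - 1.0)

rng = np.random.default_rng(1)
samples = np.array([G_n(100, rng) for _ in range(200)])
# The replicated statistic has O(1) spread (and a nonzero O(1) mean);
# no further sqrt(n) normalization is needed.
print(round(float(samples.mean()), 2), round(float(samples.std()), 2))
```

For f(x) = x^2 the statistic reduces to a sum of squares of the entries, which makes the O(1) fluctuation of n(\hat θ − θ) transparent.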
Pastur and Lytova [17] studied the asymptotic distribution of n \int f(x) \, d(F_n(x) − E F_n(x)) under the conditions that the fourth cumulant of the off-diagonal elements of the Wigner matrix is zero and that the Fourier transform of f has a finite 5th moment, which implies that f has five continuous derivatives. Moreover, they assume f is defined on the whole real line. These conditions are stronger than ours. Remark 1.4. The strategy of the proof is to use Bernstein polynomials to approximate functions in C^4(U). This will be done in Section 2. The problem is then reduced to the analytic case, which has been intensively discussed in Bai and Yao [6]. But the functions in [6] are independent of n, so one can choose a fixed contour and prove that the Stieltjes transforms tend to a limiting process. In the present case, the Bernstein polynomials depend on n through their increasing degrees and thus are not uniformly bounded on any fixed contour. Therefore, we cannot simply invoke the results of Bai and Yao [6]; we have to choose a sequence of contours approaching the real axis so that the approximating polynomials are uniformly bounded on the corresponding contours. On this sequence of contours, the Stieltjes transforms do not have a limiting process, so Theorem 1.1 cannot follow directly from [6] and we have to find alternative ways to prove the CLT. Remark 1.5. It has also been found in the literature that so-called second-order freeness has been proposed and investigated in free probability; interested readers are referred to Mingo et al. [14;15;16]. We shall not be much involved in this direction in the present paper. As a matter of fact, we only comment here that both freeness and second-order freeness are defined on sequences of random matrices (or ensembles) for which the limits of expectations of the normalized traces of all powers of the random matrices, and the limits of covariances of unnormalized traces of powers of the random matrices, exist.
Therefore, for a single sequence of random matrices, the existence of these limits has to be verified, and the verification is basically equivalent to the moment convergence method. The results obtained in [14] are in some sense equivalent to those of [17].
The paper is organized as follows. The Bernstein polynomial approximation is presented in Section 2. The truncation and renormalization step is in Section 3. We derive the mean function of the limiting process in Section 4. The convergence of the empirical process is proved in Section 5.

Bernstein polynomial approximation
It is well known that if f(y) is a continuous function on the interval [0, 1], the Bernstein polynomials

B_m(y) = \sum_{k=0}^{m} f(k/m) \binom{m}{k} y^k (1 - y)^{m-k}

converge to f(y) uniformly on [0, 1]. For the function f ∈ C^4(U), there exists a > 2 such that [−a, a] ⊂ U. Make a linear transformation mapping [−a, a] onto [0, 1]. Since h(y) := y(1 − y) \tilde f''(y), with \tilde f denoting f after this change of variables, has a second-order derivative, we can use the Bernstein polynomial approximation once again. Therefore, G_n(f) can be split into three parts: G_n(f) = ∆_1 + ∆_2 + ∆_3. For ∆_3, by Lemma 6.1 given in the Appendix, it is of negligible order. Here and in the sequel, the notation Z_n = O_p(c_n) means that for any ε > 0 there exists an M > 0 such that sup_n P(|Z_n| > M c_n) < ε. Similarly, Z_n = o_p(c_n) means that for any ε > 0, lim_n P(|Z_n| > ε c_n) = 0.
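The uniform convergence of Bernstein polynomials, and the O(1/m) rate implicit in the argument above, can be sketched numerically. The snippet below is our illustration under the assumption f(y) = e^y on [0, 1]; it builds B_m directly from the defining sum and checks that the uniform error decreases as the degree m grows.

```python
import math
import numpy as np

def bernstein(f, m, y):
    """Evaluate the degree-m Bernstein polynomial of f at points y in [0, 1]."""
    ks = np.arange(m + 1)
    coef = [f(k / m) * math.comb(m, k) for k in ks]
    # B_m(y) = sum_k f(k/m) C(m,k) y^k (1-y)^(m-k)
    return sum(c * y**k * (1 - y)**(m - k) for k, c in zip(ks, coef))

y = np.linspace(0.0, 1.0, 401)
f = np.exp
err = {m: float(np.max(np.abs(bernstein(f, m, y) - f(y)))) for m in (50, 200)}
print(err[50], err[200])  # the uniform error shrinks roughly like 1/m
```

For smooth f the error behaves like f''(y) y(1 − y)/(2m), which is exactly the quantity h(y) entering the second approximation step above.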
Note that f_m(x) and h_m(x) are both analytic. From the result proved in Section 5, applied with f_m replaced by h_m, we obtain that ∆_2 is negligible as well. It thus suffices to consider ∆_1 = G_n(f_m). In the above, f_m(x) and g_m(y) are only defined on the real line. Clearly, the two polynomials can be regarded as analytic functions on complex regions. Let γ_m be the contour formed by the boundary of a rectangle containing [−a, a].

Truncation
As proposed in [9], to control the fluctuations of the extreme eigenvalues under condition (1.2), we truncate the variables at a convenient level without changing their weak limit.
Condition (1.2) implies the existence of a sequence η_n ↓ 0 along which the truncated moments vanish at the required rate. We first truncate the variables as \hat x_{ij} = x_{ij} 1_{\{|x_{ij}| \le η_n \sqrt n\}} and then normalize them to \tilde x_{ij}. Let \hat F_n and \tilde F_n denote the ESDs of the Wigner matrices n^{-1/2}(\hat x_{ij}) and n^{-1/2}(\tilde x_{ij}), and \hat G_n and \tilde G_n the corresponding empirical processes, respectively. First of all, by (1.2), the truncation changes the entries only on an event of vanishing probability. Therefore, there exist positive constants M_1 and M_2 such that, for all large n and all m, the corresponding bounds hold, where \hat λ_{nj} and \tilde λ_{nj} are the jth largest eigenvalues of the Wigner matrices n^{-1/2}(\hat x_{ij}) and n^{-1/2}(\tilde x_{ij}), respectively.
Therefore, the weak limit of the variables G_n(f_m) is not affected if we substitute the normalized truncated variables \tilde x_{ij} for the original x_{ij}.
After the normalization, the variables \tilde x_{ij} all have mean 0 and the same absolute second moments as the original variables. But for the CWE, the condition E x_{ij}^2 = 0 no longer holds after these manipulations. Fortunately, we have the estimate E \tilde x_{ij}^2 = o(n^{-1} η_n^2), which is good enough for our purposes.
For brevity, in the sequel we still use x_{ij} to denote the truncated and normalized variables \tilde x_{ij}.

Preliminary formulae
Recall that for a distribution function H, its Stieltjes transform is defined by

s_H(z) = \int \frac{1}{x - z} \, dH(x), \qquad \mathrm{Im}(z) \ne 0.

The Stieltjes transform s(z) of the semicircle law F is given by

s(z) = -\frac{1}{2}\Big(z - \mathrm{sgn}(\mathrm{Im}(z)) \sqrt{z^2 - 4}\Big), \qquad \mathrm{Im}(z) \ne 0,

which satisfies the equation s(z)^2 + z s(z) + 1 = 0. Here and throughout the paper, \sqrt z for a complex number z denotes the square root of z with positive imaginary part.
Define D = (n^{-1/2} W_n − z I_n)^{-1}. Let α_k be the kth column of W_n with x_{kk} removed and W_n(k) the submatrix extracted from W_n by removing its kth row and kth column. Define D_k = (n^{-1/2} W_n(k) − z I_{n−1})^{-1}. Let A^* denote the conjugate transpose of a matrix or vector A. The Stieltjes transform s_n(z) of F_n then has the representation

s_n(z) = \frac{1}{n} \mathrm{tr}\, D.

Throughout this paper, M may denote different constants on different occasions, and ε_n denotes a sequence of numbers converging to 0 as n goes to infinity.
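These formulae are easy to sanity-check numerically. The sketch below is ours, with an assumed GOE-type Wigner matrix: it verifies that s(z) as defined above satisfies the quadratic equation s(z)^2 + z s(z) + 1 = 0, and that s_n(z) = n^{-1} tr D is close to s(z) already for moderate n.

```python
import cmath
import numpy as np

def sqrt_upper(w):
    """Square root of w with positive imaginary part (the paper's convention)."""
    r = cmath.sqrt(w)
    return r if r.imag > 0 else -r

def s(z):
    """Stieltjes transform of the semicircle law, for Im(z) != 0."""
    sgn = 1 if z.imag > 0 else -1
    return -0.5 * (z - sgn * sqrt_upper(z * z - 4))

z = 1.0 + 1.0j
# s(z) solves s^2 + z s + 1 = 0, and maps the upper half-plane into itself.
assert abs(s(z)**2 + z * s(z) + 1) < 1e-12
assert s(z).imag > 0

rng = np.random.default_rng(2)
n = 500
a = rng.standard_normal((n, n))
w = (a + a.T) / np.sqrt(2)
D = np.linalg.inv(w / np.sqrt(n) - z * np.eye(n))  # resolvent D
s_n = np.trace(D) / n
print(abs(s_n - s(z)))  # small for moderate n, away from the real axis
```

The sign convention in sqrt_upper matches the paper's choice of the square root with positive imaginary part, which is what makes s(z) a genuine Stieltjes transform.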

The mean function
Let λ_ext(n^{-1/2} W_n) denote both the smallest and the largest eigenvalues of the matrix n^{-1/2} W_n (defined with the truncated and normalized variables). For η_0 < (a − 2)/2, define the event A_n = {|λ_ext(n^{-1/2} W_n)| ≤ 2 + η_0}. Then, by the Cauchy integral formula, G_n(f_m) admits a contour-integral representation on γ_m. Since P(A_n^c) = o(n^{-t}) for any t > 0 (see the proof of Theorem 2.12 in [4]), the indicator 1_{A_n} can be inserted at the cost of a term that vanishes in probability. It suffices to consider ∆. In the remainder of this section, we handle the asymptotic mean function of ∆. The convergence of the random part of ∆ will be given in Section 5.
To this end, we split the contour integral into two parts, R_1 and R_2, and determine their limits, where here and in the sequel γ_mh denotes the union of the two horizontal parts of γ_m and γ_mv the union of the two vertical parts. The limit of R_1 is given in the following proposition.
for both the CWE and RWE.
Proof. Since the f_m(z) are analytic functions, by [6] we have the stated asymptotics, where a ≃ b stands for a/b → 1 as n → ∞. As f_m(t) → f(t) uniformly on t ∈ [−a, a] as m → ∞, the claimed limit follows. Since f_m(z) is bounded on and inside γ_m, in order to prove R_2 → 0 as m → ∞ it is sufficient to control b_{nA}(z) uniformly, where b_{nA}(z) = n(E[s_n(z) 1_{A_n}] − s(z)).
We first consider the case z ∈ γ_mh. By the resolvent identity, b_{nA}(z) can be decomposed into four terms S_1, S_2, S_3 and S_4. In order to analyze S_1, S_2, S_3 and S_4, we present some facts.
where we have used Lemma 6.1. This implies which follows from the proof of (4.3) in [7], and which is derived from Lemma 6.2.

Fact 3.
From (3.2) we obtain the estimate for S_4, and similarly for S_3. For S_1, we will prove its limit. By (4.7) and Fact 3, we have the expansion below, where the o(1) is uniform for z ∈ γ_mh.
Thus, using Fact 1 and Fact 2, for z ∈ γ_mh we obtain the bound below. To estimate S_{11}, we use Lemma 6.1. Finally, we deal with S_2. By the previous estimate for ε_k(z), and by the proof of (4.3) in [7], the remainder is of smaller order; hence we neglect this term. In order to estimate the expectation in (4.11), we introduce the notation α_k = (x_{1k}, x_{2k}, ..., x_{k−1,k}, x_{k+1,k}, ..., x_{nk})^* and D_k(z) = (d_{ij}). Note that α_k and D_k(z) are independent.
Here we need to consider the difference between the CWE and RWE.
For the RWE, all the original and truncated variables have the properties E x_{ij} = 0 and E|x_{ij}|^2 = 1. For the CWE, the truncated variables satisfy the same moment conditions up to the error E x_{ij}^2 = o(n^{-1} η_n^2). We introduce the parameter κ with values 1 and 2 for the CWE and RWE, and β = E(|x_1|^2 − 1)^2 − κ, which allow us to use a single unified expression for both ensembles. Summarizing the estimates of S_i, i = 1, ..., 4, we obtain the expansion of b_{nA}(z). Note that b_n(z) = n[E s_n(z) − s(z)] = 2 sgn(Im(z)) n δ_n(z), and that the second equation can be rewritten with the factor sgn(ℑ(z)) isolated. From the above two equations, we conclude the claimed asymptotic expansion, uniformly on γ_mh.
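The parameter β measures the deviation of the entry distribution from the Gaussian case: for real N(0, 1) entries E(|x|^2 − 1)^2 = E x^4 − 2E x^2 + 1 = 2 = κ(RWE), while for standard complex Gaussian entries |x|^2 is Exp(1)-distributed, so E(|x|^2 − 1)^2 = Var(|x|^2) = 1 = κ(CWE); hence β = 0 for both Gaussian ensembles. A quick Monte Carlo check of this (our illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000

# RWE case: real standard normal entries, kappa = 2
x_real = rng.standard_normal(N)
beta_rwe = float(np.mean((np.abs(x_real)**2 - 1)**2)) - 2

# CWE case: complex entries with E|x|^2 = 1 (real/imag parts N(0, 1/2)), kappa = 1
x_cplx = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
beta_cwe = float(np.mean((np.abs(x_cplx)**2 - 1)**2)) - 1

print(round(beta_rwe, 2), round(beta_cwe, 2))  # both approximately 0
```

For non-Gaussian entries β is generally nonzero and enters the limiting mean and covariance as an extra term.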
Then, (4.6) is proved by noticing that |b_n(z) − b_{nA}(z)| ≤ n v^{-1} P(A_n^c) = o(n^{-t}) for any fixed t. Now we proceed to the proof of (4.5). We shall find a set Q_n such that P(Q_n^c) = o(n^{-1}) and the required bound holds on Q_n. By the continuity of s(z), there are positive constants M_l and M_u such that the stated bounds hold for all z ∈ γ_mv. It is trivially seen that A_n ⊆ A_{nk} (see [11]). Noting the independence of A_{nk} and α_k, we obtain the bound below, where E_k denotes the expectation taken only with respect to α_k. This gives P(B_n^c) = o(n^{-1}) as n → ∞. Define Q_n = A_n ∩ B_n; then P(Q_n) → 1 as n → ∞. Similar to (4.7), we have for z ∈ γ_mv the corresponding expansion, where δ_{nQ}(z) is defined analogously. We shall prove that b_{nQ}(z) is uniformly bounded for all z ∈ γ_mv. Noticing that |z^2 − 4| > (a − 2)^2 and the fact that P(Q_n^c) = o(1), we only need to show that n δ_{nQ}(z) is uniformly bounded for z ∈ γ_mv. (4.14) Rewriting n δ_{nQ}(z) as an expansion, we first bound its leading term. Here, the result follows from the facts that 1/β_k is uniformly bounded when Q_n occurs, that n^{-1} \mathrm{tr}\, D_k(z) is bounded when A_{nk} occurs, and that P(Q_n^c) = o(1/n). From this and the fact that E[s_n(z) 1_{Q_n}] → s(z) uniformly on γ_mv, we conclude that the first term in the expansion of n δ_{nQ}(z) is bounded.
By a similar argument, one can prove that the second term of the expansion of n δ_{nQ}(z) is uniformly bounded on γ_mv. Therefore, we have Proposition 4.2. \int_{γ_{mv}} f_m(z)\, n\big(E[s_n(z) 1_{Q_n}] − s(z)\big)\, dz → 0 as n → ∞.
Therefore, to complete the proof of (4.5), we only need to show Proposition 4.3. \int_{γ_{mv}} f_m(z)\, n\big(s_n(z) 1_{Q_n} − E[s_n(z) 1_{Q_n}]\big)\, dz → 0 in probability as n → ∞.
We postpone the proof of Proposition 4.3 to the next section.

Convergence of ∆ − E∆
Let \mathcal F_k = σ(x_{ij}, k + 1 ≤ i, j ≤ n) for 0 ≤ k ≤ n and E_k(·) = E(· | \mathcal F_k). Based on the decreasing filtration (\mathcal F_k), we have the following well-known martingale decomposition

s_n(z) - E s_n(z) = \sum_{k=1}^{n} (E_{k-1} - E_k)\, s_n(z).

This decomposition gives the expression below, where q_k(z) = \frac{1}{\sqrt n} x_{kk} − \frac{1}{n}\big(α_k^* D_k(z) α_k − \mathrm{tr}\, D_k(z)\big). At first, we note that the leading term is negligible. Next, we note that R_32 is a sum of martingale differences. Thus, we have (5.15). When z ∈ γ_mv and A_{nk} occurs, we have |n^{-1} \mathrm{tr}\, D_k(z) \bar D_k(z)| ≤ η_0^{-2}. Also, z + n^{-1} \mathrm{tr}\, D_k(z) → z + s(z) uniformly. Further, |z + s(z)| has a positive lower bound on γ_mv. These facts, together with (5.15), imply that R_32 → 0 in probability.
The proof of Proposition 4.3 is the same as the proofs of R_32 → 0 and R_33 → 0, so we omit the details.
Note that when z ∈ γ_mh, E_k q_k(z) = 0. A Taylor expansion of the log function then implies the decomposition below, where the o_p(1) follows from Condition 5.1. Clearly Y_{nk} ∈ \mathcal F_{k−1} and E_k Y_{nk} = 0. Hence {Y_{nk}, k = 1, 2, ..., n} is a martingale difference sequence and \sum_{k=1}^n Y_{nk} is a sum of martingale differences. To save notation, we still use G_n(f_m) to denote \frac{1}{2πi} \sum_{k=1}^n Y_{nk} from now on. In order to apply the CLT to G_n(f_m), we need to check two conditions: a Lyapunov-type condition (Condition 5.1), and the convergence of the conditional covariance to a constant limit, which guarantees convergence to a complex normal. For simplicity, we consider two functions f, g ∈ C^4(U) and show that Cov_k[G_n(f_m), G_n(g_m)] converges in probability, where f_m and g_m denote their Bernstein polynomial approximations. We first give the proof of Condition 5.1 with p = 3.
Since f'_m(z) is bounded on γ_mh, it suffices to prove the corresponding moment bound for \sum_{k=1}^n q_k(z). For all z ∈ γ_mh, by Lemma 6.2, we have the estimate below, and the above then gives the claim. The conditional covariance is computed next. For z ∈ γ_mh, it splits into two terms Q_1 and Q_2. For Q_1, we obtain the required limit. For Q_2, since s(z_1) s(z_2) f'_m(z_1) g'_m(z_2) is bounded on γ_mh × γ_mh, in order to prove Q_2 → 0 in probability it is sufficient to prove that Γ_n(z_1, z_2) converges in probability to Γ(z_1, z_2) uniformly on γ_mh × γ_mh. Decompose Γ_n(z_1, z_2) accordingly; the convergence is uniform on γ_mh × γ_mh. In the following, we consider the limit of the first part. As proposed in [6], we use the following decomposition. Let e_j = (0, ..., 1, ..., 0)' ∈ \mathbb R^{n−1}, j = 1, 2, ..., k − 1, k + 1, ..., n, whose jth (or (j − 1)th) element is 1 and whose other elements are 0, according as j < k (or j > k). Then we obtain the expansion below, in which we introduce an auxiliary matrix. First note that the term T^* is proportional to the term on the left-hand side. We now evaluate the contributions of the remaining four terms to the sum.
The term T_1 → s(z_2) in L_2. Again, T_{2b} can be shown to be negligible. As for the remaining term T_{2a}, we use the bound

E\big|X^* C X - \mathrm{tr}\, C\big|^p \le K_p \Big( \big(E|x_1|^4 \, \mathrm{tr}(C C^*)\big)^{p/2} + E|x_1|^{2p} \, \mathrm{tr}\,(C C^*)^{p/2} \Big),

which is Lemma 2.7 in [7].
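Lemma 2.7 of [7] controls the fluctuation of the quadratic form X^* C X around its centering tr C; the centering itself, E[X^* C X] = tr C for standardized entries, is elementary and can be checked numerically. The sketch below is ours, assuming i.i.d. standard normal entries and an arbitrary fixed symmetric matrix C.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 5
B = rng.standard_normal((p, p))
C = (B + B.T) / 2  # a fixed symmetric test matrix

# X has i.i.d. entries with mean 0 and variance 1, so E[X^T C X] = tr C;
# the lemma bounds the p-th moment of the deviation in terms of tr(C C^*).
X = rng.standard_normal((200_000, p))
quad = np.einsum('ni,ij,nj->n', X, C, X)  # one quadratic form per sample row
print(round(abs(float(quad.mean()) - float(np.trace(C))), 3))
```

For Gaussian entries the variance of X^T C X is 2 tr(C^2), which is the p = 2 case of the trace bound in the lemma.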