A functional CLT for partial traces of random matrices

In this paper we show a functional central limit theorem for the sum of the first $\lfloor t n \rfloor$ diagonal elements of $f(Z)$ as a function of $t$, for $Z$ a random real symmetric or complex Hermitian $n\times n$ matrix. The result holds for orthogonally or unitarily invariant distributions of $Z$, in the cases when the linear eigenvalue statistic $\operatorname{tr} f(Z)$ satisfies a CLT. The limit process interpolates between the fluctuations of individual matrix elements such as $f(Z)_{1,1}$ and those of the linear eigenvalue statistic. It can also be seen as a functional CLT for processes of randomly weighted measures.


Introduction
It is the purpose of this paper to add a new perspective to the central limit theorem for linear eigenvalue statistics. The main objects are the eigenvalues $\lambda_1, \dots, \lambda_n$ of a random real symmetric or complex Hermitian matrix $Z$. Given a test function $f$, the linear statistic of these eigenvalues, denoted by $X^{(n)}_1(f)$, is $\operatorname{tr}(f(Z)) = f(\lambda_1) + \dots + f(\lambda_n)$. For many distributions of eigenvalues and smooth enough functions we have, after centering, the convergence in distribution to a normal random variable,
$$X^{(n)}_1(f) - \mathbb{E}[X^{(n)}_1(f)] \xrightarrow[n \to \infty]{d} \mathcal{N}(0, \sigma_1^2(f)). \tag{1.1}$$
Over the last two decades, CLTs for linear eigenvalue statistics have grown into a hugely popular field of study within random matrix theory. To give a partial overview, the convergence in (1.1) was proven for invariant matrix models or orthogonal polynomial ensembles [Joh98, Pas06, Shc08, KS10, DP12, Dui15, BD17], for general Wigner or Wishart matrices [BS08, LP09a, Shc13], for matrices of compact groups [Joh97, Sos00], and for non-Hermitian matrices [RS06, NP10]. Comparing (1.1) with classical CLTs, for example for sums of independent random variables, it is remarkable that there is no additional scaling factor $n^{-1/2}$. This phenomenon is usually attributed to the strong dependence structure of the eigenvalues. Indeed, the classical orthogonal polynomial ensembles have a joint eigenvalue density containing the Vandermonde determinant $\Delta(\lambda) = \prod_{i<j} |\lambda_i - \lambda_j|$, which leads to a repulsion of eigenvalues. It was shown however by [CL95, Sos02] that in general the variance of the linear eigenvalue statistic does not remain bounded for non-smooth test functions $f$.
One sees a very different picture when, instead of the trace, we consider the fluctuations of an individual matrix element $f(Z)_{1,1}$. Limit theorems for such entries have been considered by [LP09b, LP11, PRS12, ORS13]. The random variable $f(Z)_{1,1}$ depends not only on the distribution of the eigenvalues, but also on the eigenvectors.
We will assume the matrix of eigenvectors to be Haar distributed on the orthogonal group (for real $Z$) or on the unitary group (for complex $Z$), and to be independent of the eigenvalues. This is satisfied for the prominent case of unitarily invariant ensembles (see Section 2.1). Then the central limit theorem takes the form
$$\sqrt{n}\,\big( f(Z)_{1,1} - \mathbb{E}[f(Z)_{1,1}] \big) \xrightarrow[n \to \infty]{d} \mathcal{N}(0, \sigma_0^2(f)). \tag{1.2}$$
Unlike for the full trace, an additional scaling is necessary. Although one might expect $f(Z)_{1,1}$ to scale as $\frac{1}{n} \operatorname{tr} f(Z)$, the fluctuations of the former random variable are much larger. We remark that in our setting the convergence (1.2) is in fact a consequence of (1.1) (see Theorem 2.1).
In this paper, we show that we can in some sense interpolate between the different CLTs in (1.2) and (1.1) by summing a varying number of diagonal elements. The main object of interest is thus the partial trace
$$X^{(n)}_t(f) = \sum_{i=1}^{\lfloor tn \rfloor} f(Z)_{i,i} = \sum_{k=1}^{n} w_{k,t}\, f(\lambda_k), \qquad t \in [0,1], \tag{1.3}$$
where the weights $w_{k,t}$ are norms of projections of the eigenvectors (see (2.3)). In our main result, Theorem 2.2, we show that in a setting where the convergence (1.1) of the linear eigenvalue statistic holds, the process
$$\big( X^{(n)}_t(f) - \mathbb{E}[X^{(n)}_t(f)] \big)_{t \in [0,1]} \tag{1.4}$$
converges as $n \to \infty$ in distribution to a centered Gaussian process. The variance of the limit process at time $t$ is given by $(t - t^2)\sigma_0^2(f) + t^2 \sigma_1^2(f)$. That is, the fluctuations interpolate between the limit variances of the CLTs in (1.2) and (1.1) and, unless $\sigma_0^2(f) = \sigma_1^2(f)$, the limit is not a Brownian motion.
A core argument in the proof is the independence of eigenvalues and eigenvectors. Assuming a convergence as in (1.1), the main task is then to handle the fluctuations induced by summing a varying number of entries of the eigenvector matrix. The main ingredient for this is a functional limit theorem for sums over subblocks of Haar distributed matrices proven by [DMR12, BDMR14]. This result itself relies on a powerful theorem of [MŚS07], which allows one to evaluate higher order cumulants for entries of Haar matrices.
Our strategy also allows us to prove a functional CLT for (1.4) when, instead of the mean $\mathbb{E}[X^{(n)}_t(f)]$, one centers by the expectation conditioned on the eigenvalues. The result is the quenched convergence in Theorem 2.3, which gives a convergence in distribution under the law of the eigenvector matrices, valid for almost all (sequences of) eigenvalues. With this centering, the limit process is a Brownian bridge. This also shows that the results are not restricted to the random matrix setting, but can also be viewed in the framework of randomly weighted sums, when the weights come from Haar distributed matrices as in (1.3).
For example, the functional CLT of Theorem 2.3 is also true for deterministic sequences λ i or more general point processes, see Remark 2.4.
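The variance formula $(t - t^2)\sigma_0^2(f) + t^2\sigma_1^2(f)$ stated above can be explored with a small sketch. The function name `limit_variance` and the numerical values of the two variances are illustrative assumptions, not quantities from the paper. The sketch checks the two endpoints of the interpolation and the fact that the variance is linear in $t$, as for a Brownian motion, exactly when the two variances coincide.

```python
def limit_variance(t, sigma0_sq, sigma1_sq):
    """Variance at time t of the limit process, as stated in the introduction.

    sigma0_sq: limit variance of the rescaled single entry (hypothetical value).
    sigma1_sq: limit variance of the linear eigenvalue statistic (hypothetical value).
    """
    return (t - t * t) * sigma0_sq + t * t * sigma1_sq


# At t = 1 only the variance of the full linear statistic remains.
full_trace_variance = limit_variance(1.0, 2.0, 5.0)

# For small t the variance behaves like t * sigma0_sq (single-entry regime).
small_t_ratio = limit_variance(1e-6, 2.0, 5.0) / 1e-6

# If both variances agree, the variance is t * sigma^2, as for Brownian motion.
equal_case = [limit_variance(t, 3.0, 3.0) for t in (0.25, 0.5, 0.75)]
```

With unequal variances the map $t \mapsto$ `limit_variance(t, ...)` has a nonzero quadratic term, which is one way to see why the limit cannot be a Brownian motion in that case.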
Convergence of partial traces has been considered before in a few papers for particular distributions of random matrices. If $Z$ is unitary and $f$ is the identity, a functional limit theorem for the partial trace was proven in [D'A00]. A more general way of summing entries of unitary matrices was considered in [DDN03]. In [Rai98], real symmetric matrices are considered and the statement of Theorem 2.3 is proven under a strong moment condition on the $\lambda_i$, using zonal polynomials. Using the arguments of Section 3.3, this would lead to a convergence of (1.4), again under higher moment conditions.
This paper is structured as follows. In Section 2, we state and discuss our main assumptions and state our results. The proofs can be found in Section 3 and a lengthy variance computation is contained in Section 4.

Acknowledgments:
The author is very grateful to Maurice Duits for several helpful discussions and for inspiring the author to investigate the partial traces.

Random ensembles and main results
Let us begin with a closer look at the partial trace. When $Z = Z^{(n)}$ is an $n \times n$ complex Hermitian matrix, by the spectral theorem we may write $Z^{(n)} = U^{(n)} \Lambda^{(n)} (U^{(n)})^*$, where $U^{(n)}$ is an $n \times n$ unitary matrix, $\Lambda^{(n)}$ is real diagonal with the eigenvalues $\lambda_1, \dots, \lambda_n$ on the diagonal and $A^*$ denotes the conjugate transpose of $A$. If $Z^{(n)}$ is real symmetric, $U^{(n)}$ is orthogonal instead. With this decomposition, we have for the partial trace as defined in (1.3),
$$X^{(n)}_t(f) = \sum_{i=1}^{\lfloor tn \rfloor} \big( U^{(n)} f(\Lambda^{(n)}) (U^{(n)})^* \big)_{i,i} = \sum_{i=1}^{\lfloor tn \rfloor} \sum_{k=1}^{n} |U^{(n)}_{i,k}|^2 f(\lambda_k). \tag{2.1}$$
The main object of our study is then the random non-negative finite measure $X^{(n)}_t$ defined by
$$X^{(n)}_t = \sum_{k=1}^{n} w_{k,t}\, \delta_{\lambda_k}, \tag{2.2}$$
with $\delta_z$ the Dirac measure in $z$, where the weights are given by
$$w_{k,t} = \sum_{i=1}^{\lfloor tn \rfloor} |U^{(n)}_{i,k}|^2. \tag{2.3}$$
Here $\mu(f)$ is just the shorthand notation for $\int f \, d\mu$. Note that the total mass of $X^{(n)}_t$ is given by $\lfloor tn \rfloor$. The representation (2.2) shows that statements about the partial trace are in fact statements about a weighted version of the classical empirical eigenvalue distribution, which we denote by $\hat\mu^{(n)}$ and which corresponds to all weights being equal to $n^{-1}$. In (2.2), the weight of $\lambda_k$ is the squared Euclidean norm of the first $\lfloor tn \rfloor$ entries of the corresponding eigenvector. Setting $t = 1$, all weights in (2.2) become 1, so that $\hat\mu^{(n)} = \frac{1}{n} X^{(n)}_1$. In other words, $n \hat\mu^{(n)}(f)$ is the linear eigenvalue statistic.
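The identity behind the representation (2.2) can be checked numerically. The following is a minimal sketch in pure Python, assuming a real symmetric matrix, the test function $f(x) = x^2$, and an orthogonal matrix obtained by Gram-Schmidt orthogonalization of independent Gaussian vectors (a standard way to sample from the Haar measure). The helper names `haar_orthogonal` and `partial_trace_check` are hypothetical and not taken from the paper. The sketch compares the sum of the first $\lfloor tn \rfloor$ diagonal entries of $Z^2$ with the weighted sum of the $\lambda_k^2$, where the weight of $\lambda_k$ is the squared Euclidean norm of the first $\lfloor tn \rfloor$ entries of the $k$-th eigenvector.

```python
import math
import random


def haar_orthogonal(n, rng):
    """Return n orthonormal vectors (the columns of U) via Gram-Schmidt on Gaussian vectors."""
    cols = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            # Remove the component of v along the already constructed column u.
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        cols.append([b / norm for b in v])
    return cols  # cols[k][i] is the entry U[i][k]


def partial_trace_check(n=8, t=0.5, seed=0):
    """Compare the partial trace of Z^2 with its weighted-sum representation."""
    rng = random.Random(seed)
    cols = haar_orthogonal(n, rng)
    lam = [rng.gauss(0.0, 1.0) for _ in range(n)]  # arbitrary eigenvalues (assumption)
    # Z = U diag(lam) U^T, built entry by entry.
    z = [[sum(cols[k][i] * lam[k] * cols[k][j] for k in range(n))
          for j in range(n)] for i in range(n)]
    m = math.floor(t * n)
    # Direct computation: first m diagonal entries of Z^2.
    direct = sum(sum(z[i][j] * z[j][i] for j in range(n)) for i in range(m))
    # Weighted representation: weight of lam[k] is the squared norm of the
    # first m entries of the k-th eigenvector.
    weights = [sum(cols[k][i] ** 2 for i in range(m)) for k in range(n)]
    weighted = sum(w * l * l for w, l in zip(weights, lam))
    return direct, weighted, sum(weights), m
```

Both returned values agree up to rounding, and the weights sum to $\lfloor tn \rfloor$, the total mass of the measure.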
Another prominent eigenvalue measure is the spectral measure
$$\mu^{(n)}_1 = \sum_{k=1}^{n} |U^{(n)}_{1,k}|^2\, \delta_{\lambda_k}.$$
The spectral measure can be obtained from the partial trace by $\mu^{(n)}_1 = X^{(n)}_{1/n}$. Although for classical ensembles of random matrices the measures $\hat\mu^{(n)}$ and $\mu^{(n)}_1$ have the same limit in probability as $n \to \infty$, the fluctuations around this limit are very different, which becomes evident in the different central limit theorems in (1.1) (for $n\hat\mu^{(n)}$) and in (1.2) (for $\mu^{(n)}_1$). The additional randomness of the weights in the spectral measure leads to substantially larger fluctuations. Let us remark that a similar behavior can be observed on the scale of large deviations: while $\hat\mu^{(n)}$ satisfies a large deviation principle with speed $n^2$, see [BAG97] or [AGZ10], for $\mu^{(n)}_1$ this is reduced to speed $n$ [GR11, GNR16].

Assumptions
In order to present the results for complex and real matrices in a unified expression, we follow the classical notation of [Dys62] and introduce the parameter $\beta$, where $\beta = 1$ if $U^{(n)}$ is real and orthogonal and $\beta = 2$ if $U^{(n)}$ is complex and unitary. Let $\beta' = \beta/2$. We will always make the following assumption:
(A1) The matrices $U^{(n)}$ and $\Lambda^{(n)}$ are independent and $U^{(n)}$ is Haar distributed on the unitary group ($\beta = 2$) or on the orthogonal group ($\beta = 1$).
Under assumption (A1), we can write the distribution of $(U^{(n)}, \Lambda^{(n)})$ as $\mathbb{P} = \mathbb{P}_H \otimes \mathbb{P}_\Lambda$, where $\mathbb{P}_H$ is the Haar measure on the unitary group (or on the orthogonal group if $\beta = 1$) and $\mathbb{P}_\Lambda$ is the distribution of the eigenvalues. We denote expectation with respect to $\mathbb{P}_H$ and $\mathbb{P}_\Lambda$ by $\mathbb{E}_H$ and $\mathbb{E}_\Lambda$, respectively. Any convergence in distribution will be under $\mathbb{P}$ unless we specify otherwise. Without loss of generality, assume that all matrices $(U^{(n)}, \Lambda^{(n)})$ for $n \ge 1$ are defined on a common probability space. While the distribution of $U^{(n)}$ is completely specified by (A1), we need that the empirical measure of the eigenvalues converges to a deterministic limit. Except for Theorem 2.3, we also assume a CLT for the linear eigenvalue statistic. Note that the next two assumptions are also conditions on the test function $f : \mathbb{R} \to \mathbb{R}$.
(A2) There exists a deterministic probability measure $\nu$ such that $\hat\mu^{(n)}$ converges weakly to $\nu$ $\mathbb{P}_\Lambda$-almost surely. Furthermore,
Let us comment on the assumptions above. Suppose the matrix $Z^{(n)}$ is distributed with density proportional to with respect to the Lebesgue measure in each independent real entry in $X$. The potential $V : \mathbb{R} \to (-\infty, \infty]$ is supposed to be continuous and satisfy the growth (or confinement) condition The density (2.4) implies that assumption (A1) is satisfied and that the eigenvalues have a joint density proportional to with respect to the Lebesgue measure on $\mathbb{R}^n$ (see [Meh04]). It follows from the large deviation principle of [BAG97] that the empirical eigenvalue distribution $\hat\mu^{(n)}$ converges exponentially fast to a compactly supported measure $\nu$. Since the probability of deviating from the limit in the weak topology decays exponentially fast, the weak convergence holds almost surely on any joint probability space. That is, assumption (A2) is satisfied for any $f$ continuous and bounded. If moreover $\nu$ is supported by a single interval and the effective potential attains its infimum only on the support of $\nu$, then the largest and smallest eigenvalues each satisfy a large deviation principle [BADG01, APS01]. This implies that the probability of the extremal eigenvalues deviating from the support of $\nu$ decays exponentially, and one easily obtains that (A2) also holds for continuous $f$ growing at most polynomially at infinity.
Turning to assumption (A3), it was shown in [Joh98, KS10] that when $\beta = 2$, $V$ is real analytic, $\nu$ is supported by a single interval and $f$ is continuously differentiable in a neighborhood of the support of $\nu$ and growing at most polynomially, then (A3) is satisfied. For $\beta = 1$, the conditions on $V$ are more restrictive. [Shc08] gives a list of conditions on $V$, adding for example edge regularity, under which (A3) holds for the same class of $f$ as in the complex case.
As already mentioned in the introduction, the CLT in (A3) (and also assumption (A2)) has been proven not only for random matrices with density (2.4), but for a large variety of models, for example general Wigner or Wishart matrices. Such matrices in general have no Haar distributed matrix of eigenvectors, so that assumption (A1) fails to hold. However, given a random matrix $Z^{(n)}$ satisfying (A2) and (A3), we may take $U^{(n)}$ Haar distributed on the orthogonal or unitary group, independent of $Z^{(n)}$, and define $\tilde Z^{(n)} = U^{(n)} Z^{(n)} (U^{(n)})^*$. Then the matrix $\tilde Z^{(n)}$ trivially has a Haar distributed matrix of eigenvectors independent of the eigenvalues. The second and third assumptions continue to hold, since the eigenvalues are unchanged, so that $\tilde Z^{(n)}$ satisfies all assumptions.
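The randomization of eigenvectors described above can be illustrated with a short sketch. It assumes the real symmetric case, samples the orthogonal matrix by Gram-Schmidt orthogonalization of Gaussian vectors, and uses the traces of $Z$ and $Z^2$ as a proxy for the eigenvalues, since the standard library has no eigenvalue solver. All function names are hypothetical helpers introduced for this illustration.

```python
import math
import random


def haar_orthogonal(n, rng):
    """Orthogonal matrix (list of orthonormal rows) from Gram-Schmidt on Gaussian vectors."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        rows.append([b / norm for b in v])
    return rows


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def randomize_eigenvectors(z, rng):
    """Return U Z U^T for an independently sampled orthogonal U."""
    u = haar_orthogonal(len(z), rng)
    return matmul(matmul(u, z), transpose(u))


rng = random.Random(0)
n = 6
# A real symmetric matrix with independent Gaussian entries on and above the diagonal.
upper = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
z = [[upper[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]
z_tilde = randomize_eigenvectors(z, rng)
```

Since conjugation by an orthogonal matrix preserves the spectrum, any linear eigenvalue statistic of `z_tilde` equals that of `z`; the sketch verifies this for $f(x) = x$ and $f(x) = x^2$ only.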
Finally, let us remark that the method in the present paper also works if a weak convergence as in (A3) holds with a non-Gaussian limit, but to stay within the framework of CLTs for the linear eigenvalue statistic, we restrict the presentation to the Gaussian case.

Results
The following first theorem can be seen as a preview on the process convergence in Theorem 2.2 and highlights already the different effects the weights and eigenvalues have on the fluctuations. It shows a CLT for the weighted spectral measure or a single entry of the trace, more precisely, the asymptotic normality of √ n(µ 1 , as defined in the beginning of Section 2. The random weights are responsible for the weak convergence of the first term on the right hand side, while under (A3) the second term has fluctuations of smaller order, and vanishes in the limit. Moreover, although both terms depend on the eigenvalues, they are asymptotically independent.
Let us remark that for $Z$ a random matrix satisfying (A1) and (A2), the first convergence in Theorem 2.1 may be rewritten as and if the distribution of $Z$ also satisfies (A3), then the second convergence in Theorem 2.1 is equivalent to (1.2).
As described in the introduction, the main objective is to show how the fluctuations of the linear eigenvalue statistic emerge from summing individual matrix elements. So now we consider the process (2.10) as a random element of $D[0, 1]$, equipped with the Skorokhod topology and the Borel $\sigma$-algebra. Our main result is then the following theorem.
Theorem 2.2 Under assumptions (A1), (A2) and (A3), the process X (n) (f ) converges as n → ∞ towards the continuous centered Gaussian process X (f ) with covariance The proof of Theorem 2.2 relies on a decomposition of the process X (n) (f ) into a sum of two processes similar to (2.8). We have is the process centered with respect to P H and and This decomposition has a similar effect as (2.8). The elements of the unitary matrix U are the main source for the fluctuations of W (n) (f ) and this process is asymptotically independent of Z (n) (f ). Since by assumption (A3), Z (n) t (f ) converges to a Gaussian multiplied by t, this will result in the convergence of the sum. The main step in the proof of Theorem 2.2 is then the following functional limit theorem for the process W (n) (f ). Note that assumption (A3) is not needed for this part.
1,n | 2 ) has a homogeneous Dirichlet distribution Dir n (β ′ ), which is defined by the Lebesgue density for the first n − 1 coordinates proportional to The uniform distribution on the standard simplex corresponds thus to β = 2. We will prove the CLT for weights following the general distribution Dir n (β ′ ) for any β ′ > 0, since it makes no difference in the proof. The starting point is the observation that the Dirichlet distribution can be generated by self-normalizing a vector of independent gamma random variables. More precisely, let γ 1 , . . . , γ n be independent random variables with distribution Gamma(β ′ ), then γ 1 γ 1 + · · · + γ n , . . . , γ n γ 1 + · · · + γ n ∼ Dir n (β ′ ). (3.1) where γ 1 , . . . , γ n are independent gamma distributed with parameters (β ′ , 1) and mean β ′ . Define the non-negative measurẽ where we used the independence of the weights and the independence of weights and eigenvalues and we take |t| < β ′ ||f || −1 ∞ . Expanding the logarithm as log(1 + x) = x − x 2 /2 + r(x) with |r(x)| ≤ |x| 3 for |x| ≤ 1/2 this gives with |R n (t, f )| ≤ √ n −1 (β ′ ) 2 |t| 3 ||f || 3 ∞ for n large enough. By Assumption (A2),μ (n) (f 2 ) converges to ν(f 2 ) = ν(f 2 )−ν(f ) 2 almost surely with respect to P Λ , so that by dominated convergence In order to come back to the original measure µ By the strong law of large numbers,μ 1 , it suffices to show that the second term in (3.7) vanishes in probability. Sinceμ (n) (f ) converges almost surely to ν(f ) = 0, this will follow if √ n(μ (n) 1 (1) − 1) is bounded in L 2 (P), which is easily checked by This implies then that the last term in (3.7) vanishes in probability and then by (3.6) the left hand side converges to N (0, (β ′ ) −1 ν(f 2 )) in distribution. This proves the first convergence in Theorem 2.1. The second convergence in Theorem 2.1 will follow from Lemma 3.1 below. 
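The gamma representation (3.1) used at the start of this proof can be sketched as follows. This is a minimal illustration, assuming Python's `random.gammavariate(alpha, beta)` with shape `alpha` $= \beta'$ and scale `beta` $= 1$; the function name `dirichlet_from_gammas` is hypothetical. It normalizes independent gamma variables by their sum to obtain a vector of weights on the standard simplex.

```python
import random


def dirichlet_from_gammas(n, beta_prime, rng):
    """Sample weights by self-normalizing n independent Gamma(beta_prime, 1) variables."""
    gammas = [rng.gammavariate(beta_prime, 1.0) for _ in range(n)]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(42)
weights_real = dirichlet_from_gammas(5, 0.5, rng)     # beta = 1, so beta' = 1/2
weights_complex = dirichlet_from_gammas(5, 1.0, rng)  # beta = 2: uniform on the simplex
```

By symmetry each weight has mean $1/n$, which is the weight every eigenvalue receives in the empirical eigenvalue distribution.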
To apply it to the present setting, we may set P 1 = P H , P 2 = P Λ , (3.9) By assumption, Y (n) converges in distribution under P Λ to Y ∼ N (0,σ 2 (f )). From 3.4 we get P Λ -almost surely for any t ∈ (−β ′ ||f || −1 ∞ , β ′ ||f || −1 ∞ ). Since the moment generating functions are continuous, almost sure convergence for fixed t implies almost sure pointwise convergence, which implies that the convergence (3.11) holds with X ∼ N (0, σ 2 0 (f )). Lemma 3.1 implies then the convergence of , this finishes the proof. ✷ Lemma 3.1 Let (Ω 1 × Ω 2 , G, P 1 ⊗ P 2 ) be a probability space and X (n) : Ω 1 × Ω 2 → Ω ′ and Y (n) : Ω 2 → Ω ′ random variables, where Ω ′ is a separable metric space with Borel σ-algebra. If Y (n) converges to Y in distribution under P 2 and (3.11) P 2 -almost surely for any bounded continuous F : Ω ′ → R, where E 1 , E is the expectation with respect to P 1 , P 1 ⊗ P 2 respectively, then in distribution under P 1 ⊗ P 2 , with X and Y independent.
Proof: The main observation is that functions F : Ω ′ × Ω ′ → R with F (x, y) = F (x)G(y) and F, G bounded continuous are sufficient to determine convergence in distribution, see Lemma 4.1 in [HJ77]. For such F, G, we have The first term vanishes by dominated convergence using (3.11), the second one by the convergence of Y (n) under P 2 . for some real γ m , 1 ≤ m ≤ M, and a 1 < b 1 ≤ a 2 < · · · ≤ b m such that ν((−∞, ·]) is continuous at all a i , b i . The last condition only excludes countable many points for the choice of a i , b i and in particular still allows to approximate any f ∈ L 2 (ν). Let U (n) be a sequence of n × n unitary or orthogonal Haar distributed matrices. We denote by W (n) a process indexed by subsets A × B of {1, . . . , n} 2 , such that If A and/or B are of the form {1, . . . , ⌊tn⌋} with t ∈ [0, 1], we replace the corresponding index by t.
It was shown in [DMR12] that for suitable index sets, $W^{(n)}$ converges to $\sqrt{2/\beta}\,B$, where $B$ is a bivariate tied-down Brownian bridge, a centered Gaussian process on $[0, 1]^2$ with continuous paths and covariance
$$\mathbb{E}[B(s,t)\,B(s',t')] = (s \wedge s' - ss')(t \wedge t' - tt'). \tag{3.15}$$
Theorem 3.2 ([DMR12], Thm 1.1) As $n \to \infty$, the process ( W be the normalized number of eigenvalues $\le s$, then we claim that where $\overset{\mathbb{P}_H}{=}$ denotes equality in distribution under $\mathbb{P}_H$. To see this, let $\pi$ be a permutation of $\{1, \dots, n\}$ such that $\lambda_{\pi(1)} \le \dots \le \lambda_{\pi(n)}$. If $\Pi$ is the permutation matrix with entries $\Pi_{i,j} = \mathbf{1}_{\{\pi(i) = j\}}$, then $\Pi$ is orthogonal. By the invariance of the Haar measure, we have and the last line equals the right hand side of (3.18). The equality in distribution in (3.19) also holds when both sides are viewed as a function of $t$, which implies (3.18). We are now almost in a position to apply Theorem 3.2.

A subordination argument
We defined all unitary $U^{(n)}$, $n \ge 1$, and therefore also all $W^{(n)}$, $n \ge 1$, on a common probability space. By the Skorokhod representation theorem, there exists a modification of this space such that $W^{(n)} \to \sqrt{2/\beta}\,B$ almost surely, with respect to a measure we again denote by $\mathbb{P}_H$. The product structure implied by assumption (A1) allows us to extend this to a product space with law $\mathbb{P}_H \otimes \mathbb{P}_\Lambda$ such that ( W with $s \in S$. Since $B$ is uniformly continuous, the convergence of Theorem 3.2 holds with respect to the supremum norm on $D([0, 1]^2)$, which implies that the first term in (3.23) vanishes as $n \to \infty$. Since $F^{(n)}(s) \to F(s)$ for $s \in S$ and using again the uniform continuity of $B$, the second term vanishes as well. By the bound in (3.22) the convergence $W^{(n)}(h) \to W(h)$ follows $\mathbb{P}_H \otimes \mathbb{P}_\Lambda$-almost surely in $D([0, 1])$. The product structure of the extended probability space then implies that for any bounded continuous $G$ we get $\mathbb{P}_\Lambda$-almost surely, that is, (3.20) holds.
Since $B$ is a centered Gaussian process with continuous paths, the same holds for $W(h)$. To calculate the covariance we first note that according to (3.15), For $m = \ell$ the minima in (3.25) all cancel and this reduces to Summing over $m, \ell$, this yields for the covariance (3.26) That is, $W(h) = \sigma_1(h) B$, with $B$ a standard Brownian bridge. It remains to replace the elementary function $h$ as in (3.13) by an arbitrary $f$.

Extension to general f
Let f ∈ L 2 (ν), G be a bounded uniformly continuous functional from D([0, 1]) to R, and ε > 0. Denoting now by d the Skorokhod J 1 -metric on D([0, 1]), let δ < 1 be so small In order to extend the convergence of W (n) (h) with h as in the previous sections replaced by f , we need an a-priori estimate on the distance of the processes W (n) (h) and W (n) (f ). The proof is postponed to the end of this section.
Lemma 3.3 There exists a constant $c > 0$ such that for $\eta > 0$ and $g : \mathbb{R} \to \mathbb{R}$ measurable,
Note that for $g$ satisfying (A2), the upper bound in Lemma 3.3 is equal to $c\,\sigma_0^2(g)/\eta^2$. We may approximate $f$ by a piecewise constant function $h = h_\varepsilon$ as in (3.13), such that $\|f - h\|_{L^2(\nu)} \le \delta^2 \varepsilon$. We want to apply Lemma 3.3 with $g = f - h$; however, in order to control the upper bound we need to control $\hat\mu^{(n)}(fh)$. For this, we write $f = f_+ - f_-$ with $f_\pm \ge 0$, and assume the positive and negative parts $f_+$ and $f_-$ are approximated by $h_+$ and $h_-$ respectively, with $h_\pm \ge 0$ and such that $h_\pm \le f_\pm$. Then we can estimate such that (3.28) By assumption (A2), $\mu^{(n)}(f^2) \to \nu(f^2)$ $\mathbb{P}_\Lambda$-almost surely and the elementary form of $h$ as in (3.13) implies $\mu^{(n)}(h^2) \to \nu(h^2)$ as well. This implies that (3.28) converges $\mathbb{P}_\Lambda$-almost surely to $\nu(f^2 - h^2) \le 2 \|f - h\|_{L^2(\nu)} \|f\|_{L^2(\nu)}$. We have $\mathbb{P}_\Lambda$-almost surely and $W(f)$ are Gaussian processes with covariance $(s \wedge t - st)\sigma_1^2(h_\varepsilon)$ and $(s \wedge t - st)\sigma_1^2(f)$, respectively, and if $\varepsilon \to 0$ and (3.31) The combination of (3.30) and (3.31) shows that we may replace $h$ in (3.24) by any $f \in L^2(\nu)$, so that $W^{(n)}(f)$ converges to $W(f)$ in distribution under $\mathbb{P}_H$, for $\mathbb{P}_\Lambda$-almost all $\lambda$. ✷
Proof of Lemma 3.3: We write By the invariance of the Haar distribution, the vector of increments $(Y_{1,n}, \dots, Y_{n,n})$ is exchangeable under $\mathbb{P}_H$, meaning that any permutation of the $Y_{i,n}$ has the same distribution. Corollary 2 in [Pru98] shows that there exists a universal constant $c > 0$, such that