A Note on Cumulant Technique in Random Matrix Theory

We discuss the cumulant approach to spectral properties of large random matrices. In particular, we study in detail the joint cumulants of high traces of large unitary random matrices and prove Gaussian fluctuation for pair-counting statistics with non-smooth test functions.


Introduction
Random Matrix Theory has its origins in the works of statisticians in the 1920s and nuclear physicists in the 1950s. In the pioneering papers [1][2][3], Eugene Wigner introduced an ensemble of random matrices that now have his name and computed the limiting spectral distribution. The main ingredient of the proof was the method of moments that allowed Wigner to study asymptotics of the traces of powers of a random symmetric (Hermitian) matrix with independent identically distributed (i.i.d.) components. Since then, the method of moments has been successfully used to study spectral properties of large random matrices on many occasions. It works particularly well when a random matrix has many independent components. We refer the reader to [4][5][6][7][8][9][10][11][12][13] and references therein.
The purpose of this paper is to discuss several applications of the cumulant technique in Random Matrix Theory. The cumulant approach is especially useful if point correlation functions are given by the determinantal or Pfaffian formulas.
The paper is organized as follows. In the Methods section, we give a brief overview of the known results. Several novel results related to the joint cumulants of traces of high powers and pair counting statistics of eigenvalues of a large random unitary matrix are formulated and proved in the Results section. The Discussion section is devoted to brief comments on the future lines of research.
Throughout the paper, the notation a N = O(b N ) means that the ratio a N /b N is bounded from above in absolute value. The notation a N = o(b N ) means that a n /b N → 0 as N → ∞. The notation a N = Ω(b N ) means that the ratio a N /b N is bounded from above in absolute value by a power of log N. Finally, we sometimes use the notation a ∨ b for the maximum of a and b.

Determinantal Point Process and Cluster Functions
The ideas of the cumulant approach in Random Matrix Theory go back at least to the 1995 paper [14] by Costin and Lebowitz, which studied a so-called sine random point process, namely, a determinantal random point process with the correlation kernel K(x, y) = sin(π(x − y)) π(x − y) .
In other words, the k-point correlation functions of the random point process are given by ρ k (x 1 , . . . , x k ) = det[K(x i , x j )] 1≤i,j≤k , k ≥ 1.
We refer the reader to [15,16] for an introduction to determinantal random point processes.
The sine random point process appears as a scaling limit of many ensembles of random matrices, including the Circular Unitary Ensemble defined below in (11).
The main result of [14] states that #[0, L], the number of particles in an interval [0, L], has Gaussian fluctuation in the limit L → ∞ with logarithmically growing variance. Namely, To study the limiting distribution of the counting random variable #[0, L], Costin and Lebowitz suggested using the so-called cluster (Ursell) k-point functions. For k = 1, 2, 3, the cluster functions are given by For arbitrary k ≥ 1, one has where the sum at the r.h.s. of (4) is over all partitions π of the set {1, 2, . . . , k} into blocks B 1 , . . . , B m , |π| = m stands for the number of blocks in a partition π, and |B i | denotes the cardinality of a block B i . For determinantal random point processes, (2) and (4) imply where the sum in the first line is over all permutations in the symmetric group S k , and the sum in the second line is over all (k − 1)! cyclic permutations in S k , with the first of such cyclic permutations being 1 → 2 → 3 → . . . → k → 1. Cluster point functions are closely related to the cumulants of #[0, L]. Denote by I k (L) the integral of the k-point cluster function over the k-dimensional cube [0, L] k : Furthermore, denote by κ j (L), j = 1, 2, 3, . . . , the cumulants of the counting random variable #[0, L]. Recall that the moment and cumulant generating functions are related by It follows from (4) and (6) that the j-th cumulant of #[0, L] can be written as a linear combination of the integrals I k (L), 1 ≤ k ≤ j. Namely, the following relation holds for the generating functions: To prove the Central Limit Theorem for the normalized random variable , it is sufficient to show that since (8) would imply that all cumulants starting with the third one of the normalized counting random variable ] go to zero as L → ∞.
It is not hard to see that if a random point process is such that Var#[0, L] grows linearly in L and the cluster function integrals satisfy a routine application of (7) and (8) finishes the proof of the Central Limit Theorem for the counting function. However, the sine random point process exhibits a more delicate behavior-the variance of the number of particles grows only logarithmically, namely: Thus, a more subtle analysis of the asymptotics of I j (L) is required for j > 2. The proof ( [14]) of the Central Limit Theorem (3) follows from the bounds Remarkably, the result can be generalized to the case of any determinantal random point field on a locally compact Hausdorff phase space E, equipped with a σ-finite measure µ on the Borel σ-algebra provided the correlation kernel is Hermitian and locally trace class and the variance goes to infinity. Namely, the following holds: Theorem 1 ( [17,18]). Let (X, F , P n ) be a family of determinantal random point fields with hermitian locally trace class correlation kernels K n and I n be a family of Borel subsets of E with compact closure. Then, if Var n #(I n ) → ∞ as n → ∞, the normalized counting random variable #(I n )−E#(I n ) √ Var#(I n ) converges in distribution to a standard normal N(0, 1).

Remark 1.
The multivariate version of the theorem also holds (see [17,18]). It is worth noting that the univariate result allows a very nice probabilistic interpretation ( [19]). Namely, one can show that the counting random variable #(I) is equal in distribution to a sum of independent non-identically distributed Bernoulli random variables with the probabilities of success given by the eigenvalues of the integral operator K I f (x) = I K(x, y) f (y)µ(dy) on L 2 (I, µ). However, we are not aware of a simple probabilistic interpretation of the multivariate CLT result.
Analogous results hold for linear statistics L f = ∑ i f (x i ), where f is a bounded measurable test function, e.g., with a compact support, and the summation is over all points of a random point field.
Theorem 2 ( [17,18]). Let (X, F , P n ) be a family of determinantal random point fields with hermitian locally trace class correlation kernels K n and f n be a bounded, measurable, compactly supported function on the phase space E such that for any ε > 0 and some δ > 0 as n → ∞. Then, the normalized linear statistic L fn −EL fn √ VarL fn converges in distribution to N(0, 1) as n → ∞.
For additional results in this direction, we refer the reader to [20][21][22]. Moreover, the connection between integrals of cluster point functions and cumulants can be exploited to study significantly more challenging problems regarding empirical spectral distribution of spacings and extreme spacings in determinantal random point processes (see, e.g., [23,24]).

Linear Statistics in Classical Compact Groups
For many ensembles of large random matrices, the variance of a linear statistic either stays finite or grows slower than any power of the dimension, provided the test function is sufficiently smooth. Therefore, Theorem 2 is not typically applicable even if the point correlation functions are determinantal. It is instructive to consider classical compact groups. In this paper, we focus our attention on random unitary matrices. However, many results can be extended to the orthogonal and symplectic matrices without much difficulty.
Let V be a unitary matrix chosen at random with respect to the Haar measure on the unitary group U(N). We are interested in studying statistical properties of the eigenvalues of V, which we denote by e iθ j , 1 ≤ j ≤ N, −π ≤ θ 1 , . . . , θ N < π. The joint probability density of the eigenvalues is given by the formula ( [25]): The joint distribution of the eigenvalues is known as the Weyl measure, and the ensemble is known in Random Matrices as the Circular Unitary Ensemble ( [26][27][28]). Remarkably, if a test function f on the unit circle is sufficiently smooth, the variance of the linear statistics ∑ N j=1 f (θ j ) stays finite as N → ∞. Moreover, 29,30]). Let f be a real-valued function on the unit circle satisfying are the Fourier coefficients of f . Then, Remark 2. One can show that the result of Theorem 3 follows from the Strong Szego Theorem for Toeplitz determinants ( [30]).

Remark 3.
It follows from Theorem 3 that the real and imaginary parts of the traces of the powers of a random unitary matrix, Tr V k = ∑ N j=1 e ikθ j , k ≥ 1, converge to independent normal random variables with mean zero and variance k/2 as N → ∞. In ( [31]), Johansson proved that the rate of convergence to normal law is O(N −δN ) for some δ > 0. Recently, the results have been further improved in [32,33].
Many joint moments of the traces of powers of a random unitary matrix V coincide with the corresponding joint moments of the limiting normal random variables. Namely, let Z j , j ≥ 1, be a sequence of independent standard complex normals. Then, for any k ≥ 1 and non-negative integers a j , b j , 1 ≤ j ≤ k satisfying one has ( [29]): where the delta symbol δ ab is one if a = (a 1 , . . . , a k ) coincides with b = (b 1 , . . . , b k ) and is zero otherwise. For a beautiful survey of this and related results for random matrices from classical compact groups, we refer the reader to [34].
To better understand (16) and related phenomena, one can study the joint cumulants of the traces of powers of a random matrix. Let {X α∈A } be a family of random variables. The joint cumulants are defined as (see, e.g., [35]): where the sum is over all partitions π of the set {i 1 , . . . , i n }, B goes over the list of all blocks of the partition π, and |π| denotes the number of blocks in the partition. We recall that the joint moments are expressed in terms of the cumulants as We will use the notation T N,k := Tr(V k ) for the traces of powers of V and denote by For a determinantal random point process with a correlation kernel K(x, y), the cumulants of a linear statistic L f = ∑ i f (x i ) can be computed as ( [18]): For the CUE, the point correlation functions are given by the determinantal Formula (2) with the correlation kernel This allows one to obtain the following formula in the CUE case, provided the test function f is sufficiently smooth ( [36], see also [37][38][39]): that can be rewritten as where J N is a piece-wise linear function defined by the following formula Since the joint cumulants are symmetric and multilinear, it follows from (22) and (23)
It should be emphasized that the exact combinatorial structure manifested in Lemma 1 is specific to the classical compact groups and the sine determinantal point process.
To finish the proof of (14) and (16), using the cumulants, one observes that for the expression for J N (p 1 , . . . , p m ; k 1 , . . . , k p ) simplifies, namely: One then uses a combinatorial lemma from [36] to show that all joint cumulants Remark 5. The lemma is related to the combinatorial proof of the Strong Szego Theorem. We refer the reader to [41][42][43][44] for recent applications of the cumulant method for determinantal processes to study linear statistics of Hermitian unitary invariant random matrices and free fermions processes.

Multivariate Linear Statistics and Number Theory Connections
This subsection is devoted to the discussion of multivariate linear statistics of the form where θ = (θ 1 , . . . , θ N ) ∈ T N comes from the CUE (or, in general, the Circular Beta Ensemble with arbitrary β > 0), the scaling factor L N satisfies 1 ≤ L N ≤ N, and f is a sufficiently smooth test function defined on the unit circle (when L N = 1) or the real line (when L N → ∞.) The study of such multivariate linear statistics in [40,[45][46][47] was motivated by connections between Random Matrices and Number Theory (see, e.g., [48][49][50][51] and references therein.) The local case (L N = N) is of a particular interest for the CUE since the multivariate linear statistic where f ∈ C ∞ c (R) and (θ − φ) c is the phase difference on the unit circle (i.e., the difference modulo 2π) and corresponds to multivariate linear statistics of the (rescaled) zeros of the Riemann zeta function studied by Montgomery [52,53], Hejhal [54], and Rudnick and Sarnak [55] (see, e.g., [40]). We also point out a recent related preprint [56], where multivariate linear statistics have been studied for the determinantal random process with the projection correlation kernel on the unit sphere S d , d > 1 .
To study the limiting distribution of (30), we write using the Fourier transform where k n+1 = − ∑ n j=1 k j , andf (t) = 1 (2π) n/2 R n f (x)e −it·x dx, t = (t 1 , . . . , t n ). The proof of the Gaussian fluctuation for (S N ( f ) − ES N ( f ))/ √ N presented in [40] relies on a detailed study of the joint cumulants κ (N) p (k 1 , . . . , k p ), p > 1, for the arguments k i = O(N), 1 ≤ i ≤ p. It is combinatorial in nature. One of the key ingredients of the proof is the fact that joint cumulants scale with N. Namely, does not depend on N. Moreover, it is a piece-wise linear, bounded function of t 1 , . . . , t p on the hyperplane t 1 + . . . + t p = 0 (it is identically zero when t 1 + . . . + t p = 0.) (32) can be written for p > 1 and ∑
We finish this section with a discussion of the multivariate linear statistics for nonsmooth functions. For simplicity, we assume that n = 1 and restrict our attention to the global regime (L N = 1). Then, In [45], we proved that if f is an even real-valued function on the unit circle such that both f belongs to L 2 (T), then where ϕ m are i.i.d. exponential random variables with E(ϕ m ) = 1. We note that the condition ∑ k |f k | 2 k 2 < ∞ is necessary and sufficient for (36) to hold. If the series ∑ k |f k | 2 k 2 is slowly diverging so that the sequence of partial sums V N = ∑ |k|≤N |f k | 2 k 2 is slowly varying in the sense of Karamata ( [57]), then the Central Limit Theorem holds ( [46]): The case of the linearly growing variance will be studied in the next section, where we will consider a class of even test functions f for whichf k k → C = 0 as k → ∞. The values of the cumulant function κ (N) p (k 1 , . . . , k p ) for k i = O(N), 1 ≤ i ≤ p play an important role in the analysis.

Results
We study the Circular Unitary Ensemble of random matrices (11). In particular, we are interested in the joint distribution of the traces of high powers of a unitary matrix V. As before, we denote by κ In what follows, k i = O(N), 1 ≤ i ≤ p, and we are looking for scenarios when the joint cumulants vanish.
We start by considering the joint cumulants of Tr V k and Tr V −k for k ≥ N. While the distribution in this case is well understood (see, e.g., [58]), it is instructive to consider the cumulant approach that will be further expanded later in this section. We are interested in the values of the joint cumulants κ (N) It follows from Lemma 1 that κ (N) p (k, . . . , k, −k, . . . , −k) = 0 unless p is even and p = 2q. We then have k 1 = . . . k q = k and k q+1 = . . . = k 2q = −k. We will use the Formula (26).
unless each subset B j has an equal number of elements equal to k and −k, in which case the l.h.s. of (37) is zero. A routine combinatorial analysis produces ..,q m ): q 1 +...+q m =q, q 1 ,...,q m ≥1 q! q 1 ! · · · q m ! 2 . (38) Therefore, the cumulant generating function C N (x, y) can be written as (39) where is the modified Bessel function of the first kind. A more elementary way to derive (39) relies on the observation by Rains ([58]) that the kth powers of the eigenvalues of V are i.i.d. random variables uniformly distributed on the unit circle provided k ≥ N. Thus, Tr V k , for k ≥ N, is given by the sum of N i.i.d. uniform random variables on the unit circle. The traces of different powers are still correlated for finite N. However, a significant portion of the joint cumulants vanishes. Proposition 1. Let s i , 1 ≤ i ≤ l be l > 1 distinct positive integers and a i , b i be non-negative integers such that a i + b i ≥ 1 for all i. Suppose that Let p = ∑ l 1 (a i + b i ) and (k 1 , . . . , k p ) ∈ Z p be such that the first a 1 coordinates of the vector are equal to s 1 , the next b 1 coordinates are equal to −s 1 , the further next a 2 coordinates are equal to s 2 , the following b 2 coordinates are equal to −s 2 , etc, . . . , and the last b l coordinates are equal to −s l . Then, where {ϕ i } are i.i.d. uniform random variables on the unit circle, and δ nm is the delta symbol.
Proof of Proposition 1. While the proof does not depend on the value of l, we will assume that l = 2 in order to simplify the notations. It follows from (41) that the joint cumulant is zero unless since otherwise ∑ p 1 k i = 0. Therefore, from now on, we can assume that (44) holds. As before, we will use (26) to compute the joint cumulants. We call a subset B j balanced if the number of times s 1 appears in it is equal to the number of times −s 1 appears in it, and the same holds for s 2 and −s 2 . It follows from (41) that unless unless each subset B j is balanced, the l.h.s. of (37) is N. Otherwise, the l.h.s. of (37) is zero. We obtain that for a 1 = b 1 = c and a 2 = b 2 = d, Therefore, the coefficient in front of (xy) c (zw) d in the cumulant generating function coincides with the corresponding coefficient in the power series expansion of where {φ i }, {η i } are i.i.d. uniform random variables on the unit circle, and φ denotes the complex conjugate. The identity (43) follows from the derived identity for the cumulants and the Formula (18), expressing moments in terms of cumulants.
To illustrate the utility of the cumulant approach, we formulate and prove below the Central Limit Theorem for pair-counting statistics for a class of non-smooth test functions. We consider where (θ 1 , . . . , θ N ) is a random CUE point configuration, and f is an even real-valued function on the unit circle such that Remark 6. f (θ) = log | sin(θ/2)| satisfies the above test function requirements.
Theorem 4. Let f be a real even function on the unit circle, satisfying (46). Then, where the limiting variance can be computed as Proof of Theorem 4. We start with the following formula for the variance of S N ( f ): 45]). Let f be a real even function on the unit circle such that f ∈ L 2 (T). Then, Since the test function f satisfies (46), we have ∑ 1≤s≤N−1 and the two double sums in (51), up to a negligible error, are equal to the Riemann sums of converging integrals. As a result, where The integrals in (52) can be trivially simplified as which leads to (48) and (49).
To prove the theorem, we will study asymptotics of the higher moments of S N ( f ). We start by truncating f . Namely, we replace it byf (x) = ∑ |k|≤N log Nfk e ikx . It follows from (50) and (51) (N). Therefore, without a loss of generality, we can assume that the Fourier coefficientsf k are zero for |k| > N log N. Whenever it does not lead to ambiguity, we will use the notation f for the truncated version of a test function. Recall that where we use the notation T N,k = Tr V k . Let us fix a positive integer m > 2. We have as ∑ * π ∏ B∈π κ(T N,k i : i ∈ B), where the sum is over all partitions π of {1, . . . , 2m} that do not contain atoms and two-element subsets of the form {i, i + m}, i = 1, . . . , m. Therefore, For a given partition π, denote and Clearly, Following [40,45], we denote if and only if there is a sequence of blocks B 1 , . . . , B q , q ≥ 1, of the partition π such that for some 1 ≤ i 1 , . . . , i q−1 ≤ m. We call a partition π optimal if the cardinality of every equivalence class is 2. We claim that for each optimal partition π, the corresponding sum Σ π = σ m N m/2 × (1 + o(1)), and for each suboptimal partition, Σ π = o(N m/2 ). The first part of the claim immediately follows from the variance computations above, as the m−dimensional sum (57) then factorizes into the product of m/2 two-dimensional sums, each equal σ 2 N(1 + o(1)).
Next, we will show that the suboptimal partitions give negligible contributions o(N m/2 ). One of the key ingredients is the following upper bound on S π . Lemma 5. Let π be a partition of {1, . . . , 2m} that does not contain atoms and two-element subsets of the form {i, i + m}, i = 1, . . . , m. Then, i.e., S π ≤ N m/2 (log N) d for some d and all sufficiently large N. Moreover, if π contains at least one block of size less than 4, then Proof of Lemma 4. The proof uses mathematical induction on m. Without a loss of generality, we can assume that the equivalence relation ∼ π has only one equivalence class. Otherwise, the sum S π factorizes over the equivalence classes, and the argument is applicable to each equivalence class separately. If π does not contain an equivalence class of size less than 4, the bound (63) follows since the number of terms in the product on the r.h.s. of (58) is not bigger than m/2, each cumulant is O(N), and the partial sums of the harmonic series grow logarithmically.
Suppose now that π contains a block of size 2, say B 1 = {i, j + m}, 1 ≤ i = j ≤ m. Then, {i + m, j} cannot be a block of the partition. Consider the block of the partition that contains i + m. Without a loss of generality, we can assume that this block is B 2 , i.e., i + m ∈ B 2 . Construct a partition π of the set {1, . . . , 2m} \ {i, i + m}, where we discard B 1 and replace i + m in B 2 by j + m. It follows from the construction that S π ≤ S π and by the inductive hypothesis, S π = Ω(N m−1 2 ). Now, suppose that π has a block of size 3, say B 1 = {i, j, l + m}, 1 ≤ i, j, l ≤ m, l = i, j. Consider the block of π that contains l, say l ∈ B 2 . Construct a partition π of the set {1, . . . , 2m} \ {l, l + m}, where we discard B 1 and replace l in B 2 by i and j. Then, S π ≤ S π and, again, applying the inductive hypothesis, S π = Ω(N m−1 2 ).

Remark 7.
A similar argument shows that if π contains a block of size 4 of the form {i, j, l, l + m}, 1 ≤ i, j, l ≤ m, then S π = Ω(N m−1 2 ). Finally, if π contains a block of size 4 of the form B 1 = {i, j, n + m, l + m}, 1 ≤ i, j, n, l ≤ m, such that {i, j} = {n, l}, we consider two cases, namely, |k n | > |k l | and |k n | ≤ |k l |. In the first case, we construct a partition π of the set {1, . . . , 2m} \ {n, n + m}, where we discard B 1 and replace n in its block by i, j, and l + m. In the second case, we do the same procedure with l instead of n. In both cases, the sum is bounded from above by a sum of dimension one less, and we again obtain S π = Ω(N m−1 2 ) by the inductive assumption. Now we are ready to finish the proof of the theorem. Recall that we could assume without a loss of generality that the equivalence relation ∼ π has only one equivalence class. Note that (59) and (63), Lemma 4 and Remark 7 prove Σ π = o(N m/2 ) for all suboptimal partitions π, except the ones that comprise the blocks of size 4 of the form {i, j, j + m, l + m}, 1 ≤ i, j, l ≤ m, i = l. Since k i + k j + k j+m + k l+m = 0, and k j+m = −k j , we obtain k l+m = −k i . It follows from the direct computations (see, e.g., [40]) that κ (N) In particular, |κ Then, Since the fourth cumulant function vanishes unless the sum of the arguments is zero, one has k 1 = k 3 = k 5 = . . . = k m−1 . Moreover, the fourth cumulant function vanishes unless the sum of the absolute values of the arguments is at least 2N. Therefore, |k 1 | + |k 2i | ≥ N, 1 ≤ i ≤ m, and |Σ π | ≤ Const Splitting the summation with respect to k 1 into two parts, corresponding to |k 1 | ≤ √ N and |k 1 | > √ N correspondingly, one arrives at the bound:

Discussion
We have discussed several applications of the cumulant technique in Random Matrix Theory, specifically for the ensembles with determinantal k-point correlation functions. The suggested approach to the fluctuations of multivariate linear statistics of the eigenvalues of random unitary matrices can be extended to other classes of test functions and other classical groups.