Linear statistics of the circular $\beta$-ensemble, Stein's method, and circular Dyson Brownian motion

We study the linear statistics of the circular $\beta$-ensemble with a Stein's method argument, where the exchangeable pair is generated through circular Dyson Brownian motion. This generalizes previous results obtained in such a way for the CUE and provides a novel approach for studying linear statistics of $\beta$-ensembles. This approach allows studying simultaneously a collection of linear statistics whose number grows with the dimension of the ensemble. Also this approach requires estimating only low order moments of the linear statistics.


Introduction
The goal of this note is to study linear statistics of the circular β-ensemble (which we will usually denote by CβE, or CβE(n) if we wish to stress the dimension). More precisely, if $(e^{ix_1},\dots,e^{ix_n})$ is a realization of the $n$-dimensional CβE, we shall study the Wasserstein-1 distance of the law of
$$T_d=\Big(\sum_{j=1}^n e^{ikx_j}\Big)_{k=1}^d \tag{1}$$
to the law of
$$\Big(\sqrt{\tfrac{2}{\beta}}\,\sqrt{j}\,Z_j\Big)_{j=1}^d,$$
where $Z_j$ are i.i.d. standard complex Gaussians.
Our main result will be that if $d$ grows slowly enough with $n$, the distance goes to zero as $n\to\infty$. Our approach is to apply Stein's method, for which we generate an exchangeable pair through circular Dyson Brownian motion. The estimates needed to apply Stein's method involve some low order moments of $T_d$, for which we can make use of results of [15].
The motivation for this approach comes from [11,9], where a similar approach is used for β = 2 (as well as the circular real ensemble and circular quaternion ensemble, i.e. the Haar measure on the orthogonal and symplectic groups), though the relevant dynamics is interpreted through the heat kernel on the unitary group which does not generalize so obviously to other values of β.
The fact that finite collections of such linear statistics converge jointly in law to independent Gaussians with suitable variances is certainly known: the approach of [16] should be easily adapted to the circular case, and more recently such a result was proven in [15] (for other work related to the linear statistics of the CβE, see e.g. [23,10]). What our approach offers is a rate of convergence, as well as the possibility of studying the joint convergence of linear statistics whose number grows with n. The rate we obtain is likely to be extremely far from the true one: in the case of the CUE the rate is known to be superexponential, see [17], which is much faster than the one our approach suggests. Another benefit of this approach is that one needs to estimate only rather few moments. To the author's knowledge, such results are not known for the CβE. Moreover, this approach through Stein's method coupled with Dyson Brownian motion has the potential to be applied to other β-ensembles.
The outline of this note is the following: we begin by recalling the definition of the CβE and the relevant Wasserstein distance, and by stating our main result. Next we recall the approach in [9] for multivariate complex normal approximation and the definition of circular Dyson Brownian motion, and point out which estimates we need in order to apply Stein's method in our case. These estimates involve the generator of circular Dyson Brownian motion acting on certain power sums, which is simple to calculate exactly, along with moment bounds for power sums, which can be obtained from results in [15]. Finally, as an application of the results of [15], we point out a limit theorem for the logarithm of the characteristic polynomial of the CβE. This is very similar to a result of [14] for the CUE.
Acknowledgements: The author wants to thank two anonymous referees for helpful comments about the article, as well as K. Kytölä for useful discussions. This work was supported by the Academy of Finland.

The circular β ensemble, the Wasserstein distance, and the main result
The purpose of this section is to state our main result. To do this, we recall the definition of the CβE and the Wasserstein-1 distance.
and β > 0. The n-dimensional CβE is the probability measure on $\Delta_n$ whose density with respect to Lebesgue measure is proportional to
$$\prod_{1\le j<k\le n}\big|e^{ix_j}-e^{ix_k}\big|^{\beta},$$
where the normalization constant is a Selberg integral and can be evaluated exactly.
Remark 2. We will often identify $[0,2\pi)$ with the unit circle and $\Delta_n$ with a subset of the $n$-fold product of the unit circle with itself.
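For orientation, the closed form of the normalization constant can be sketched as follows. This is a hedged illustration rather than the formula as printed in the source: it assumes the measure is written on unordered angles in $[0,2\pi)^n$ (the convention for $\Delta_n$ may differ from this by a factor of $n!$), and the symbol $Z_{n,\beta}$ is introduced here only for the sketch.

```latex
% Assumption: unordered angles x in [0,2\pi)^n; Z_{n,\beta} is notation introduced for this sketch.
\[
  \frac{1}{Z_{n,\beta}} \prod_{1\le j<k\le n} \bigl|e^{ix_j}-e^{ix_k}\bigr|^{\beta}\, dx_1\cdots dx_n,
  \qquad
  Z_{n,\beta} = (2\pi)^n\, \frac{\Gamma(1+\beta n/2)}{\Gamma(1+\beta/2)^n}.
\]
% Sanity checks: n = 1 gives Z_{1,\beta} = 2\pi, and \beta = 2 (CUE) gives Z_{n,2} = (2\pi)^n\, n!.
```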
The Wasserstein-1 distance is a metric on the space of random variables with finite first absolute moment taking values in a fixed underlying space (which we take to be Euclidean, though more general cases are possible). Convergence with respect to it is equivalent to convergence in law along with convergence of the first absolute moment. Let us recall its two equivalent definitions (see e.g. Chapter 6 in [24] for more information on Wasserstein distances):
Definition 3. The Wasserstein-1 distance between the laws of two $\mathbb{R}^d$-valued (or $\mathbb{C}^d$-valued, which is the case we are actually interested in) random variables $X$ and $Y$ is
$$d_W(X,Y)=\inf E|X-Y|,$$
where the infimum is over all couplings of $X$ and $Y$.
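An equivalent definition of the metric (a result due to Kantorovich and Rubinstein) is given by
$$d_W(X,Y)=\sup_{f}\big|E f(X)-E f(Y)\big|,$$
where the supremum is over all 1-Lipschitz functions $f$.
We can now state our main result.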
where $Z_j$ are i.i.d. standard complex Gaussians. Then
Remark 5. As in [9], we could consider instead of $T_d$ a vector of the form where also $r$ grows with $n$, and one will get constraints on $r$ and $d$ for the vanishing of the Wasserstein distance with similar methods as those we use. For simplicity, we only consider the case of $T_d$.
Remark 6. One can use this result to study linear statistics of functions on the unit circle with nice enough regularity by Fourier expanding them and applying our result.

Stein's method and circular Dyson Brownian motion
We'll give a short informal sketch of the Stein's method argument for multivariate normal approximation that will be relevant for us. For a detailed treatment, see e.g. [20]. After this, we shall state the precise theorem (that appears in [9]) that we shall make use of. Next we shall review the definition and some basic properties of circular Dyson Brownian motion and how it ties into our Stein's method argument.
Stein's method.
For simplicity we'll consider the case of real normal variables (the complex one follows from this). Let us assume that $\Sigma$ is a symmetric positive definite $d\times d$ matrix. We'll denote by $Y$ a $d$-dimensional vector of i.i.d. standard Gaussians and by $Y_\Sigma$ we denote $\sqrt{\Sigma}\,Y$.
We'll also use the following notation: by $\langle\cdot,\cdot\rangle_{HS}$ we denote the Hilbert-Schmidt inner product of two matrices,
$$\langle A,B\rangle_{HS}=\mathrm{Tr}(AB^*), \tag{12}$$
where $B^*$ denotes the Hermitian conjugate of $B$. We'll denote by $\|\cdot\|_{HS}$ the corresponding norm.
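We will then make use of the following facts (see [20]).
Fact (Fact 1). We have
$$E\big[\langle\mathrm{Hess}f(Y_\Sigma),\Sigma\rangle_{HS}-\langle Y_\Sigma,\nabla f(Y_\Sigma)\rangle\big]=0$$
for each $f\in C^2(\mathbb{R}^d)$ for which the above integrand is in $L^1$ (with respect to the randomness). Here $\mathrm{Hess}f$ is the Hessian matrix of $f$, and the second inner product is the Euclidean inner product of $\mathbb{R}^d$.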
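As a sanity check, the following sketch works out the one-dimensional case of Gaussian integration by parts, $\sigma^2 E f''(Y_\Sigma)=E[Y_\Sigma f'(Y_\Sigma)]$, which is what the first fact reduces to for $d=1$ if it is the standard Stein identity of [20]. The symbol $\varphi_\sigma$ for the $N(0,\sigma^2)$ density is introduced here only for illustration.

```latex
% One-dimensional instance: d = 1, \Sigma = \sigma^2, Y_\Sigma = \sigma Y with Y ~ N(0,1).
% \varphi_\sigma is the N(0,\sigma^2) density (notation introduced for this sketch).
\begin{align*}
  \mathbb{E}\bigl[Y_\Sigma f'(Y_\Sigma)\bigr]
    &= \int_{\mathbb{R}} y\, f'(y)\, \varphi_\sigma(y)\, dy
     = -\sigma^2 \int_{\mathbb{R}} f'(y)\, \varphi_\sigma'(y)\, dy \\
    &= \sigma^2 \int_{\mathbb{R}} f''(y)\, \varphi_\sigma(y)\, dy
     = \sigma^2\, \mathbb{E}\bigl[f''(Y_\Sigma)\bigr].
\end{align*}
% First step: \varphi_\sigma'(y) = -(y/\sigma^2)\varphi_\sigma(y).
% Second step: integration by parts; boundary terms vanish under the integrability assumption on f.
```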
Fact (Fact 2). If $g\in C^\infty(\mathbb{R}^d)$, then
$$h(x)=\int_0^1\frac{1}{2t}\Big[E\,g\big(\sqrt{t}\,x+\sqrt{1-t}\,Y_\Sigma\big)-E\,g(Y_\Sigma)\Big]\,dt$$
is a solution to the differential equation
$$\langle x,\nabla h(x)\rangle-\langle\mathrm{Hess}\,h(x),\Sigma\rangle_{HS}=g(x)-E\,g(Y_\Sigma).$$
Let us now assume that we have a random vector $X$ for which we wish to show that the law of $X$ is close to that of $Y_\Sigma$ in the sense of the Wasserstein distance, and let us further assume that we have another random vector $X'$ on the same probability space as $X$ with $X'\stackrel{d}{=}X$. Moreover, let us assume that for some invertible deterministic matrix $\Lambda$ and some random vector $V$. We'll want to think of $X'$ being close to $X$ so that when, for example, Taylor expanding $f(X')$ around $X$ for some function $f$, we can ignore high enough order terms. Also we assume that where $\Sigma$ is again our deterministic symmetric positive definite matrix and where we Taylor expanded $\nabla f(X')-\nabla f(X)$ around $X$ and $\dots$ denotes higher order terms in the expansion. Noting that and conditioning on $X$, we find that From Fact 2, we then find As $\mathrm{Hess}f$ and $\nabla f$ are bounded, if we can control $\Lambda^{-1}M$ and $\Lambda^{-1}V$ (and the higher order terms), this suggests that we can control the Wasserstein distance. This is indeed the case.
We'll actually construct a one parameter family of the vectors $X'$ through Dyson Brownian motion started from an independent CβE realization, and the closeness of $X$ and $X'$ will come from the $t\to0$ limit. Let us state the actual theorem for the Stein's method argument in the following form (see Theorem 1.3 in [9] and Theorem 4 in [20] for proofs).
Theorem 7 (Döbler and Stolz, Meckes). Let $W, W_t$ (for $t>0$) be $\mathbb{C}^d$-valued $L^2(P)$ random vectors on the same probability space $(\Omega,\mathcal{A},P)$ such that for vector whose entries are i.i.d. standard complex Gaussians. Suppose that there exist non-random matrices $\Lambda,\Sigma\in\mathbb{C}^{d\times d}$ such that $\Lambda$ is invertible and $\Sigma$ is positive definite. Assume further that there exist a random vector $R\in\mathbb{C}^d$, random matrices $S,T\in\mathbb{C}^{d\times d}$, and a deterministic function $s:(0,\infty)\to\mathbb{R}$ with the following properties for each $\epsilon>0$. Then where $\|\cdot\|_{op}$ denotes the operator norm: for $A\in\mathbb{C}^{d\times d}$, $\|A\|_{op}=\sup_{|x|=1}|Ax|$.
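The displayed formulas of the heuristic computation preceding Theorem 7 can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation of the source. It takes the pair $(X,X')$ to be exchangeable, writes the two conditional-moment hypotheses in the form suggested by the conditions of Theorem 7, and takes $f$ to solve $\langle x,\nabla f(x)\rangle-\langle\mathrm{Hess}f(x),\Sigma\rangle_{HS}=g(x)-E\,g(Y_\Sigma)$ as in Fact 2. The names $\Lambda$, $V$, $M$ follow the text.

```latex
% Assumed form of the two hypotheses (sign chosen so that \Lambda has positive entries later on):
\begin{align*}
  \mathbb{E}(X'-X \mid X) &= -\Lambda X + V, &
  \mathbb{E}\bigl((X'-X)(X'-X)^{T} \mid X\bigr) &= 2\Lambda\Sigma + M .
\end{align*}
% Exchangeability of (X,X') makes the first expectation below vanish; then Taylor expand
% \nabla f(X') - \nabla f(X) around X and condition on X.
\begin{align*}
  0 &= \mathbb{E}\bigl\langle \Lambda^{-1}(X'-X),\, \nabla f(X') + \nabla f(X) \bigr\rangle \\
    &= \mathbb{E}\bigl\langle \mathrm{Hess} f(X),\, \Lambda^{-1}(X'-X)(X'-X)^{T} \bigr\rangle_{HS}
       + 2\,\mathbb{E}\bigl\langle \Lambda^{-1}(X'-X),\, \nabla f(X) \bigr\rangle + \dots \\
    &= \mathbb{E}\bigl\langle \mathrm{Hess} f(X),\, 2\Sigma + \Lambda^{-1}M \bigr\rangle_{HS}
       + 2\,\mathbb{E}\bigl\langle -X + \Lambda^{-1}V,\, \nabla f(X) \bigr\rangle + \dots
\end{align*}
% Rearranging and inserting the equation solved by f:
\[
  \mathbb{E}\, g(X) - \mathbb{E}\, g(Y_\Sigma)
  = \mathbb{E}\bigl[\langle X, \nabla f(X)\rangle - \langle \mathrm{Hess} f(X), \Sigma\rangle_{HS}\bigr]
  = \tfrac12\,\mathbb{E}\bigl\langle \mathrm{Hess} f(X),\, \Lambda^{-1}M \bigr\rangle_{HS}
    + \mathbb{E}\bigl\langle \Lambda^{-1}V,\, \nabla f(X) \bigr\rangle + \dots
\]
```

Taking the supremum over 1-Lipschitz $g$ on the left is what ties this identity to the Wasserstein distance, which is why control of $\Lambda^{-1}M$ and $\Lambda^{-1}V$ suffices.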
Remark 8. As noted in [9], we can replace the estimate for $E(|W_t-W|^2\mathbf{1}_{|W_t-W|>\epsilon})$ by the weaker one
$$\lim_{t\to0}\frac{1}{s(t)}E|W_t-W|^3=0.$$

Circular Dyson Brownian motion.
In this section we define circular Dyson Brownian motion and point out how it ties into our Stein's method argument.

Circular Dyson Brownian motion was introduced by Dyson [8] and is discussed for example in [23]. Its existence is proven in [4]. It is a model for diffusing particles confined to the unit circle and interacting with each other through a logarithmic repulsion. The main result of [4] is that one can make the following definition:
Definition 9. Circular Dyson Brownian motion is the process $((x_1(t),\dots,x_n(t)))_{t\ge0}$ which is the unique strong solution to the system of stochastic differential equations
$$dx_j(t)=\frac{\beta}{2}\sum_{k\neq j}\cot\Big(\frac{x_j(t)-x_k(t)}{2}\Big)\,dt+\sqrt{2}\,db_j(t)$$
for $j=1,\dots,n$. Here $b_j$ are i.i.d. standard Brownian motions.
Remark 10. It is proven in [4] that for $\beta\ge1$ the particles almost surely do not collide (so $x_i(t)\neq x_j(t)$ for $i\neq j$ for all $t$), but for $\beta\in(0,1)$ they almost surely do.
In the following remark we'll informally recall some basic facts from diffusion theory applied to our setting.
Remark 11. As we are dealing with continuous semimartingales, we can make use of Itô's lemma, and general facts from diffusion theory hold. In particular, a simple application of Itô's lemma implies that for some fixed $x(0)\in\Delta_n$ and $C^2$ function $f$ we have
$$E_{x(0)}f(x(t))=f(x(0))+E_{x(0)}\int_0^t[L_\beta f](x(s))\,ds, \tag{27}$$
where $E_{x(0)}$ denotes expectation with respect to the law of the process started from $x(0)$, and $L_\beta$ can be viewed as the infinitesimal generator of the process:
$$L_\beta=\sum_{j=1}^n\frac{\partial^2}{\partial x_j^2}+\frac{\beta}{2}\sum_{j=1}^n\sum_{k\neq j}\cot\Big(\frac{x_j-x_k}{2}\Big)\frac{\partial}{\partial x_j}. \tag{28}$$
As for $\beta<1$ there can be collisions, there is some care to be taken about what the precise domain of the infinitesimal generator is (for example, if $f\in C^2(\Delta_n)$ is a function in the domain of the generator, then one must have that lim ...).
From (27) we see that if $\rho_t(x;x(0))$ is the density of the law of $x(t)$ started at $x(0)$, then it satisfies the equation $\partial_t\rho_t=L_\beta^*\rho_t$, where $L_\beta^*$ denotes the formal adjoint of $L_\beta$. This implies that the CβE is a stationary distribution for circular Dyson Brownian motion. To see this, note that for
$$\rho(x)=\frac{1}{C_{n,\beta}}\,e^{-\beta\sum_{j<k}V(x_j-x_k)},$$
where $C_{n,\beta}$ is a normalization constant and $V(x)=-\log|2\sin\frac{x}{2}|$, a simple calculation shows that the solution $\rho_t(x;x(0))$ with initial data given by the CβE, $\rho_0(x;x(0))=\rho(x)$, is $\rho_t(x;x(0))=\rho(x)$. In other words, the CβE is a stationary distribution for circular Dyson Brownian motion.
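The stationarity claim of Remark 11 can be checked formally in a few lines. The sketch below is not the calculation of the source, which appears to rely on an identity lost in this copy. It assumes the generator has the form $\Delta+b\cdot\nabla$ with the drift $b$ written in the first comment, a normalization inferred from the value of $\Lambda$ in (52), and it ignores the boundary issues at collisions.

```latex
% Assumption: L = \Delta + b\cdot\nabla with drift
%   b_j(x) = (\beta/2) \sum_{k \ne j} \cot((x_j - x_k)/2),
% a normalization inferred from \Lambda in (52), not quoted from the source.
% With \rho(x) \propto \exp(-\beta \sum_{j<k} V(x_j - x_k)) and V(x) = -\log|2\sin(x/2)|:
\begin{align*}
  \partial_{x_j} \log\rho(x)
    &= \beta \sum_{k\neq j} \partial_{x_j}\log\Bigl|2\sin\tfrac{x_j-x_k}{2}\Bigr|
     = \frac{\beta}{2}\sum_{k\neq j}\cot\frac{x_j-x_k}{2} = b_j(x), \\
  L^{*}\rho
    &= \Delta\rho - \nabla\cdot(b\,\rho)
     = \Delta\rho - \nabla\cdot(\nabla\rho) = 0 .
\end{align*}
% The second line uses b\rho = \rho\,\nabla\log\rho = \nabla\rho.
% This is a formal computation, valid away from the collision set x_j = x_k.
```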
Let us now prove our main estimates required for applying Theorem 7. This entails estimating E(f (x(t))|x(0)), when f is a function relevant to Theorem 7. This will be done through estimates on L β f for relevant f . For β = 2 [11,9] make use of similar results for the heat kernel of the unitary group found in [21,18] with a different kind of approach.
Lemma 12. Let $x(0)$ be distributed according to the CβE and independent of $(b_j(t))$. Also let $k\in\mathbb{Z}$ and write for $x\in[0,2\pi]^n$, $p_k(x)=\sum_{j=1}^n e^{ikx_j}$. Then and where $\mathrm{sgn}(k)=k/|k|$ for $k\neq0$ and $L_\beta$ is the operator from (28). Moreover, for $k,l\in\mathbb{Z}$ and In both cases, the convergence is in $L^1$ with respect to the law of the CβE.
Proof. Let us first establish the claims about the action of $L_\beta$ on $p_k$ and $p_kp_l$. We have from (28) Then note that We thus conclude that For $k\in\mathbb{Z}_+$, we expand the difference quotient and find (using for example $p_0(x)=n$) which was the claim for $k>0$. For $k=0$ the claim is clear, and for $k<0$ it follows by complex conjugating the $k>0$ case.
For calculating $L_\beta[p_kp_l]$, we note that if we write $\Delta=\sum_{j=1}^n\frac{\partial^2}{\partial x_j^2}$, then in general for twice differentiable functions $f$ and $g$ one has
$$\Delta(fg)=f\,\Delta g+g\,\Delta f+2\sum_{j=1}^n\frac{\partial f}{\partial x_j}\frac{\partial g}{\partial x_j}.$$
The first order part of $L_\beta$ satisfies a normal product rule so we find which was the claim concerning the action of $L_\beta$ on $p_kp_l$.
Let us now turn to the convergence part. From (27) we find that for any fixed As we've seen that $L_\beta p_k$ is a polynomial in the variables $e^{ix_j}$, we see that $\sup_x|[L_\beta p_k](x)|$ is a finite number depending on $n$ and $k$. Thus the random variable we're taking an expectation of on the right side of the equation is uniformly bounded in $t$, and by the continuity of $t\mapsto x(t)$ at $t=0$, it converges to zero almost surely as $t\to0$. Thus by the dominated convergence theorem (applied to the $E_x$ integral) we conclude that the left side of the equation tends to zero. The same argument implies that the left side of the equation is bounded in $x$ by a constant depending only on $n$ and $k$. Hence, integrating over $x$ with respect to the law of the CβE, we can apply the dominated convergence theorem again to obtain $L^1$ convergence with respect to the law of the CβE. The argument for the $L^1$ convergence of the $p_kp_l$-term is similar.
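The computation of $L_\beta p_k$ for $k\ge1$ described in the proof can be sketched explicitly. The sketch assumes that $L_\beta$ is normalized as in the first comment, which is inferred from the value of $\Lambda$ in (52) rather than quoted from the source. The shorthand $z_j=e^{ix_j}$ is introduced here only for illustration.

```latex
% Assumptions: L_\beta = \Delta + (\beta/2) \sum_j \sum_{l \ne j} \cot((x_j - x_l)/2)\,\partial_{x_j}
% (normalization inferred from (52)); z_j = e^{i x_j} is shorthand for this sketch; k \ge 1.
\begin{align*}
  \Delta p_k &= -k^2 p_k, \qquad
  \cot\frac{x_j-x_l}{2} = i\,\frac{z_j+z_l}{z_j-z_l}, \\
  \frac{\beta}{2}\sum_{j\neq l}\cot\frac{x_j-x_l}{2}\,\partial_{x_j}p_k
    &= -\frac{\beta k}{2}\sum_{j\neq l}(z_j+z_l)\,\frac{z_j^{k}}{z_j-z_l}
     = -\frac{\beta k}{4}\sum_{j\neq l}(z_j+z_l)\,\frac{z_j^k-z_l^k}{z_j-z_l} \\
    &= -\frac{\beta k}{4}\sum_{j\neq l}(z_j+z_l)\sum_{m=0}^{k-1} z_j^{m} z_l^{k-1-m}
     = -\frac{\beta k}{2}\Bigl( n\,p_k - k\,p_k + \sum_{m=1}^{k-1} p_m\,p_{k-m} \Bigr),
\end{align*}
% Second equality: symmetrize in j <-> l. Last equality: \sum_{j \ne l} z_j^a z_l^b = p_a p_b - p_{a+b}, p_0 = n.
\[
  L_\beta p_k = -\frac{nk\beta}{2}\,p_k - k^2\Bigl(1-\frac{\beta}{2}\Bigr)p_k
                - \frac{\beta k}{2}\sum_{m=1}^{k-1} p_m\,p_{k-m}.
\]
% Checks: the coefficient nk\beta/2 matches \Lambda_{k,k} in (52); for \beta = 2 the k^2 term vanishes;
% for n = 1 the formula gives L_\beta p_k = -k^2 p_k, as it must since then L_\beta = \partial_x^2.
```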

Moment estimates of power sums for the CβE
Before checking the conditions for Theorem 7, we need some moment estimates on power sums. We need a simplified version of the main result in [15] (their results are analogous to those of [6], though extended from the unitary case to general β through Jack polynomial theory):
Theorem 13 (Jiang and Matsumoto). Let $0\le m\le n$, and for $0\le m\le n$ with $0\le j,k\le m$,
In most of our applications, we will have $m=o(n)$ and this becomes
Corollary 14. For $0\le m=o(n)$ and $n$ large enough
Proof. This follows directly from the definition of $A$ and $B$ noting that for $m=o(n)$

Proof of Theorem 4
We can now check the conditions required for Theorem 7 and make the estimates needed for the proof of Theorem 4. Let us begin by checking the conditions for Theorem 7.
The first condition for Theorem 7 involved the conditional expectation of $W_t$ given $W$: where $\Lambda\in\mathbb{C}^{d\times d}$ with entries
$$\Lambda_{k,l}=\delta_{k,l}\,\frac{nk\beta}{2}, \tag{52}$$
and $R\in\mathbb{C}^d$ with entries for $j,k\in\mathbb{Z}_+$. To calculate this, we expand the product and consider each term separately: and (57). Thus Making use of Lemma 12, we find (59). We then write this as (60), or in other words
$$\Sigma_{k,l}=\frac{2}{\beta}\,\delta_{k,l}\,k. \tag{62}$$
Moreover $S\in\mathbb{C}^{d\times d}$ with entries and a similar argument yields or (again with convergence in $L^1$) as $t\to0$.
Following Remark 8, it is enough for us to estimate $E|W_t-W|^3$ (which for a diffusion one would expect to behave as $t^{3/2}$, but we still outline an argument for checking it directly), which in turn we can bound from above by $\sqrt{E|W_t-W|^2\,E|W_t-W|^4}$. Conditioning on $W$ and using (59), one finds $E|W_t-W|^2=O(t)$ as $t\to0$ and using similar arguments (in particular, the fact

The Wasserstein-1 distance.
Thus the conditions for Theorem 7 are met ($\Lambda$ is invertible and $\Sigma$ positive definite) and we have where $Z$ is a $d$-dimensional vector of i.i.d. standard complex Gaussians, $\|\cdot\|_{op}$ denotes the operator norm (with respect to the underlying Euclidean norm), $|\cdot|$ the Euclidean norm, and $\|\cdot\|_{HS}$ the Hilbert-Schmidt norm. Let us check what these quantities are.
As $\Lambda$ and $\Sigma$ are diagonal matrices, we note that $\|\Lambda^{-1}\|_{op}=\frac{2}{\beta n}$ and $\|\Sigma^{-1/2}\|_{op}=\sqrt{\beta/2}$. We'll estimate $E|R|$ by $\sqrt{\sum_kE|R_k|^2}$ and recall that $R_k$ consisted of two types of terms, $R_k=A_k+B_k$, for which we write $|R_k|^2\le2(|A_k|^2+|B_k|^2)$ and estimate these separately. We use a similar estimate for the Hilbert-Schmidt norms. More precisely, recalling the definition of $R$, $S$, and $T$ we have We then make use of Corollary 14 to get bounds on these: .
Proof. Plugging Corollary 14 into (74), we find (80). The first sum is of order $d^6$. For the second sum, we note that For the third sum, we note that For $S$ we find (plugging Corollary 14 into (75) and using similar arguments as for $R$)

The logarithm of the characteristic polynomial of the CβE
One of the results proven in [14] is a limit theorem: using mainly results of [6], they prove that in a suitable Sobolev space of distributions, the real and imaginary parts of the logarithm of the characteristic polynomial of the CUE converge jointly in law to a pair of log-correlated Gaussian fields (in fact these can be understood as the restriction of the two-dimensional Gaussian Free Field to the unit circle, with a suitable convention for the "zero mode").
As the results in [15] generalize those of [6] to $\beta\neq2$, one can prove a similar result for the characteristic polynomial of the CβE, though the estimates aren't quite as strong in the $\beta\neq2$ case, so one does not have quite as good a control on the roughness of the field: one needs to consider slightly larger Sobolev spaces than in the $\beta=2$ case. We'll give a brief argument for a proof of this fact here. First we recall the definition of the relevant Sobolev spaces.
and equip it with the inner product (we write $f=(f_k)_{k\in\mathbb{Z}}$ and $g=(g_k)_{k\in\mathbb{Z}}$) With this inner product, $H^s$ is a separable Hilbert space. We denote by $\|\cdot\|_s$ the corresponding norm. We then define our limiting object:

standard complex Gaussians and write formally
Remark 19. One can check that for any $\epsilon>0$, the above series converges almost surely in $H^{-\epsilon}$, so $X$ can be understood as an element of $H^{-\epsilon}$.
We can now state the relevant limit theorem whose proof is essentially identical to that in [14]. A similar argument also appears in [13] so we give only a brief proof.
Theorem 20. Let $s>1/2$, $\beta>0$, and
$$P_n(\theta)=\prod_{j=1}^n\big(1-e^{i(x_j-\theta)}\big),$$
where $(e^{ix_j})_{j=1}^n$ is distributed according to the CβE(n). Moreover, let
$$X_n(\theta)=\mathrm{Re}\log P_n(\theta)\quad\text{and}\quad Y_n(\theta)=\mathrm{Im}\log P_n(\theta), \tag{93}$$
where the branch of $\log$ is such that $\mathrm{Im}\log(1-e^{i(x_j-\theta)})\in(-\pi/2,\pi/2]$ for all $j$. Then $(X_n,Y_n)$ converges in law in $H^{-s}\times H^{-s}$ to $(\sqrt{2/\beta}\,X,\sqrt{2/\beta}\,Y)$ where $X$ is the field defined above and
Proof. Following [14], we begin with the remark that expanding the logarithm gives (as an element of $H^{-\epsilon}$) This implies that Thus in the Fourier basis, we have convergence in the sense of finite dimensional distributions (as [15] or Theorem 4 imply the convergence of, say, finite collections of the Fourier coefficients).
By Prokhorov's theorem, to prove convergence it is then enough to prove tightness. For this, one uses the fact that the unit ball in $H^{-s'}$ is compact in $H^{-s}$ for $0<s'<s$. Let us then note that by Theorem 13, if we take some small $\epsilon\in(0,1)$, there exists a constant $C$ such that for $0\le j\le(1-\epsilon)n$, $E|p_j(x)|^2\le Cj$, while for $j\ge(1-\epsilon)n$ we trivially have $E|p_j(x)|^2\le n^2$.
Thus picking $s'\in(1/2,s)$ we have This is bounded as the first sum converges as $n\to\infty$ and the second one is $O(n^{1-2s'})$. A similar bound holds for $Y_n$. Tightness then follows from the compactness of the unit ball mentioned above, and Markov's inequality.
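The bound behind this tightness argument can be sketched as follows. The conventions are assumptions, since the definitions of the Sobolev norm and of the Fourier expansion are not recoverable from this copy. The sketch takes $\|f\|_s^2=\sum_{k\in\mathbb{Z}}(1+k^2)^s|f_k|^2$ and the expansion of $\log P_n$ written in the comment, so constants may differ from the source by convention-dependent factors.

```latex
% Assumptions (conventions not recoverable from the text):
%   \|f\|_{s}^2 = \sum_{k\in\mathbb{Z}} (1+k^2)^{s} |f_k|^2,
%   \log P_n(\theta) = -\sum_{k\ge1} \frac{1}{k}\, p_k(x)\, e^{-ik\theta},
% so that the k-th and (-k)-th Fourier coefficients of X_n have modulus |p_k(x)|/(2k).
\begin{align*}
  \mathbb{E}\,\|X_n\|_{-s'}^2
    &= \frac12 \sum_{k\ge1} (1+k^2)^{-s'}\, \frac{\mathbb{E}|p_k(x)|^2}{k^2} \\
    &\le \frac{C}{2} \sum_{1\le k\le(1-\epsilon)n} k^{-1-2s'}
       + \frac{n^2}{2} \sum_{k>(1-\epsilon)n} k^{-2-2s'} .
\end{align*}
% The first sum is bounded uniformly in n because 1 + 2s' > 1.
% The tail sum is O(n^{-1-2s'}), so the second term is O(n^{1-2s'}), which tends to zero for s' > 1/2.
```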
An interesting question is whether one could use stronger results on the linear statistics to give a stronger sense for this convergence. Here we essentially only used convergence of finite collections of the linear statistics and in no way made use of the fact that the number of them may grow with $n$. If one were able to extend $d$ in Theorem 4 from $o(n^{2/7})$ to something close to $n$, it seems conceivable that one could estimate, for example, the distance of the maximum of the field $X_n$ to the maximum of the truncation of the field $X$. Indeed, the superexponential rate of convergence for a single linear statistic proven in e.g. [17] suggests that our bounds are likely to be far from optimal, so perhaps something like this could be possible.