Concentration for Coulomb gases on compact manifolds

We study the non-asymptotic behavior of a Coulomb gas on a compact Riemannian manifold. This gas is a symmetric n-particle Gibbs measure associated to the two-body interaction energy given by the Green function. We encode such a particle system by its empirical measure. Our main result is a concentration inequality in Kantorovich-Wasserstein distance inspired by the work of Chafaï, Hardy and Maïda on the Euclidean space. Their proof involves large deviation techniques together with an energy-distance comparison and a regularization procedure based on the superharmonicity of the Green function. This last ingredient is not available on a manifold. We solve this problem by using the heat kernel and its short-time asymptotic behavior.


Introduction
We shall consider the model of a Coulomb gas on a Riemannian manifold introduced in [6, Subsection 4.1] and study its non-asymptotic behavior by obtaining a concentration inequality for the empirical measure around its limit. Let us describe the model and the main theorem of this article.
Let (M, g) be a compact Riemannian manifold with volume measure π. We suppose, for simplicity, that π(M) = 1 so that π ∈ P(M), where P(M) denotes the space of probability measures on M. We endow P(M) with the topology of weak convergence, i.e. the smallest topology such that µ ↦ ∫_M f dµ is continuous for every continuous f : M → R. Let ∆ be the Laplace-Beltrami operator on M. A Green function G : M × M → (−∞, ∞] for ∆ is a symmetric function such that, writing G_x = G(x, ·),

(1.1)   ∆G_x = 1 − δ_x

in the sense of distributions, for every x ∈ M. In particular, ∫_M G_x dπ does not depend on x ∈ M and the Green function is unique up to an additive constant. See [1, Chapter 4] for a proof of these results. We will denote by G the Green function for ∆ such that

∫_M G_x dπ = 0   for every x ∈ M.

For x ∈ M the function G_x may be thought of as the potential generated by a particle located at x in the manifold M together with a 'uniform' negative background charge. The total energy of a system of n particles of charge 1/n (each particle coming with a 'uniform' negative background charge) would be H_n : M^n → (−∞, ∞] defined by

(1.2)   H_n(x_1, ..., x_n) = (1/n²) Σ_{1≤i<j≤n} G(x_i, x_j).

Given a sequence of inverse temperatures β_n > 0, the associated Coulomb gas is the Gibbs probability measure

(1.3)   dP_n = (1/Z_n) e^{−β_n H_n} dπ^{⊗n},

where Z_n is such that P_n(M^n) = 1. This can be thought of as the Riemannian generalization of the usual Coulomb gas model described in [14] or [4]. In the particular case of the round two-dimensional sphere, it is known (see [9]) that if β_n = 4πn² the probability measure P_n describes the eigenvalues of the quotient of two independent n × n matrices with independent Gaussian entries. Define H : P(M) → (−∞, ∞] by

H(µ) = (1/2) ∫∫_{M×M} G(x, y) dµ(x) dµ(y).

This is a convex lower semicontinuous function (see [6, Subsection 4.1]). Let i_n : M^n → P(M) be the empirical measure map

i_n(x_1, ..., x_n) = (1/n) Σ_{i=1}^n δ_{x_i}.

If β_n/n → ∞, the main result of [6] tells us that {i_n∗(P_n)}_{n≥2} satisfies a large deviation principle with speed β_n and rate function H − inf H. In particular, if F is a closed subset of P(M) we have

(1.4)   lim sup_{n→∞} (1/β_n) log P_n(i_n ∈ F) ≤ −inf_F (H − inf H),

or, equivalently,

P_n(i_n ∈ F) ≤ exp(−β_n inf_F (H − inf H) + o(β_n)).

The aim of this article is to understand the o(β_n) term for some family of closed sets F. Suppose we choose some metric d on P(M) that induces the topology of weak convergence. The unique minimizer of H is µ_eq = π (see Theorem 3.1), so a natural family of closed sets consists of the sets

F_r = {µ ∈ P(M) : d(µ, µ_eq) ≥ r}

for r > 0.
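As an aside not in the original argument, the objects (1.2)-(1.3) are easy to evaluate numerically. The following minimal Python sketch (all names hypothetical) works on the round two-sphere, using the kernel −(1/(2π)) log‖x − y‖, which has the same logarithmic singularity as the sphere's Green function (an assumption of this sketch; an additive correction shifts H_n by a constant and does not change the Gibbs measure).

```python
import numpy as np

def green_sphere(x, y):
    # Hypothetical kernel for this sketch: -(1/(2*pi)) * log ||x - y||
    # has the same logarithmic singularity as the Green function of the
    # round two-sphere (they differ by a smooth bounded additive part).
    return -np.log(np.linalg.norm(x - y)) / (2 * np.pi)

def H_n(points):
    # Two-body energy (1.2): H_n(x) = n^{-2} * sum_{i<j} G(x_i, x_j).
    n = len(points)
    return sum(green_sphere(points[i], points[j])
               for i in range(n) for j in range(i + 1, n)) / n**2

def gibbs_weight(points, beta):
    # Unnormalized density of P_n in (1.3) with respect to pi^{tensor n}.
    return np.exp(-beta * H_n(points))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))                    # n = 8 points drawn ...
x /= np.linalg.norm(x, axis=1, keepdims=True)  # ... uniformly on the sphere

# H_n is symmetric, so the Gibbs weight is permutation invariant.
perm = rng.permutation(8)
print(np.isclose(gibbs_weight(x, beta=64.0), gibbs_weight(x[perm], beta=64.0)))
```

Sampling from P_n itself would require e.g. a Metropolis chain on (S²)^n; the snippet only evaluates the unnormalized density.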
As P(M) is compact and H is lower semicontinuous, inf_{µ∈F_r} (H(µ) − inf H) is attained and strictly positive, so the large deviation inequality is not vacuous. We would like a simple expression in terms of r for the leading term, so instead of using inf_{µ∈F_r} (H(µ) − inf H) we will use a simple function of r. Let d_g denote the Riemannian distance. The metric we shall use on P(M) is the Kantorovich (or 1-Wasserstein) metric

(1.5)   W_1(µ, ν) = inf_Π ∫∫_{M×M} d_g(x, y) dΠ(x, y),

where the infimum runs over all probability measures Π on M × M with first marginal µ and second marginal ν. Our main result is the following.

Theorem 1.1 (Concentration inequality for Coulomb gases). Let d be the dimension of M. If d = 2 there exists a constant C > 0 that does not depend on the sequence {β_n}_{n≥2} such that for every n ≥ 2 and r ≥ 0

P_n(W_1(i_n, π) ≥ r) ≤ exp(−β_n r²/4 + (β_n/(8π)) (log n)/n + C β_n/n).

If d ≥ 3 there exists a constant C > 0 that does not depend on the sequence {β_n}_{n≥2} such that for every n ≥ 2 and r ≥ 0

P_n(W_1(i_n, π) ≥ r) ≤ exp(−β_n r²/4 + C β_n n^{−2/d}).

In fact, by a slight modification we will also prove the following generalization.
Theorem 1.2 (Concentration inequality for Coulomb gases in a potential). Take a twice continuously differentiable function V : M → R and define H : P(M) → (−∞, ∞] by

H(µ) = (1/2) ∫∫_{M×M} G(x, y) dµ(x) dµ(y) + ∫_M V dµ.

Then H has a unique minimizer that will be called µ_eq. Suppose P_n is defined by (1.3), with H_n now given by

H_n(x_1, ..., x_n) = (1/n²) Σ_{1≤i<j≤n} G(x_i, x_j) + (1/n) Σ_{i=1}^n V(x_i),

and let d be the dimension of M. If d = 2 there exists a constant C > 0 that does not depend on the sequence {β_n}_{n≥2} such that for every n ≥ 2 and r ≥ 0

P_n(W_1(i_n, µ_eq) ≥ r) ≤ exp(−β_n r²/4 + (β_n/(8π)) (log n)/n + n D(µ_eq‖π) + C β_n/n).
If d ≥ 3 there exists a constant C > 0 that does not depend on the sequence {β_n}_{n≥2} such that for every n ≥ 2 and r ≥ 0

P_n(W_1(i_n, µ_eq) ≥ r) ≤ exp(−β_n r²/4 + n D(µ_eq‖π) + C β_n n^{−2/d}).

To prove Theorem 1.1 we follow [4], in turn inspired by [12] (see also [13]). We proceed in three steps. The first part, described in Section 2, may be used in any measurable space, but it demands an energy-distance comparison and a regularization procedure. The energy-distance comparison will be explained in Section 3 and it may be extended to include Green functions of some Laplace-type operators. The regularization by the heat kernel, in Section 4, will use a short-time asymptotic expansion. It may apply to more general kinds of energies for which a short-time asymptotic expansion of the corresponding heat kernel is known. Having acquired all the tools, Section 5 will complete the proof of Theorem 1.1 and, by a slight modification, of Theorem 1.2.
2. Link to an energy-distance comparison and a regularization procedure

In this section M may be any measurable space, π any probability measure on M and H_n : M^n → (−∞, ∞] any measurable function bounded from below. Given β_n > 0 we define the Gibbs probability measure P_n by (1.3). Let H : P(M) → (−∞, ∞] be any function that has a unique minimizer µ_eq ∈ P(M). It shall be thought of as the rate function of some Laplace principle as in [6]. Consider a metric d on P(M) that induces the topology of weak convergence and define F_r = {µ ∈ P(M) : d(µ, µ_eq) ≥ r} for r > 0. We want to obtain a non-asymptotic inequality similar to (1.4) with an explicit o(β_n) term. For this, we consider the following assumption.
Assumption A. We will say that an increasing convex function f : [0, ∞) → [0, ∞) satisfies Assumption A if

f(d(µ, µ_eq)) ≤ H(µ) − H(µ_eq)   for every µ ∈ P(M).

Under Assumption A, (1.4) implies

P_n(d(i_n, µ_eq) ≥ r) ≤ exp(−β_n f(r) + o(β_n)).

We would like to understand the o(β_n) term and find a bound that does not depend on r. Denote by D(·‖π) : P(M) → (−∞, ∞] the relative entropy with respect to π, also known as the Kullback-Leibler divergence, i.e. D(µ‖π) = ∫_M ρ log ρ dπ if dµ = ρ dπ, and D(µ‖π) = ∞ when µ is not absolutely continuous with respect to π. The following result is the first part of the method mentioned in Section 1.
Theorem 2.1 (General concentration inequality). Suppose we have two real numbers a_n and b_n such that there exists a measurable function R : M^n → P(M) with the following properties:

• for every x = (x_1, ..., x_n) ∈ M^n we have H(R(x)) ≤ H_n(x) + a_n;
• for every x = (x_1, ..., x_n) ∈ M^n we have d(R(x), i_n(x)) ≤ b_n.

Let us denote e_n = ∫_{M^n} H_n dµ_eq^{⊗n} and e = H(µ_eq).
If f is an increasing convex function that satisfies Assumption A then

P_n(d(i_n, µ_eq) ≥ r) ≤ exp(−2β_n f(r/2) + n D(µ_eq‖π) + β_n (e_n − e) + β_n a_n + β_n f(b_n)).
Proof. We first prove the following two results. The first lemma is the analogue of [4, Lemma 4.1].

Lemma 2.2 (Lower bound of the partition function). We have the following lower bound:

Z_n ≥ exp(−β_n e_n − n D(µ_eq‖π)).

Proof. We may assume D(µ_eq‖π) < ∞. By the Gibbs variational inequality ∫ e^g dπ^{⊗n} ≥ exp(∫ g dµ − D(µ‖π^{⊗n})), applied with g = −β_n H_n and µ = µ_eq^{⊗n},

Z_n = ∫_{M^n} e^{−β_n H_n} dπ^{⊗n} ≥ exp(−β_n ∫_{M^n} H_n dµ_eq^{⊗n} − D(µ_eq^{⊗n}‖π^{⊗n})) = exp(−β_n e_n − n D(µ_eq‖π)).
The second lemma will help us in the regularization step.

Lemma 2.3 (Distance after regularization). For every x ∈ M^n such that d(i_n(x), µ_eq) ≥ r we have

f(d(R(x), µ_eq)) ≥ 2f(r/2) − f(b_n).

Proof. By the triangle inequality and the second hypothesis on R,

d(R(x), µ_eq) ≥ d(i_n(x), µ_eq) − d(R(x), i_n(x)) ≥ r − b_n.

Since (r − b_n)/2 + b_n/2 = r/2, convexity gives f(r/2) ≤ (f(r − b_n) + f(b_n))/2, so that

f(d(R(x), µ_eq)) ≥ f(r − b_n) ≥ 2f(r/2) − f(b_n),

where we have used that f is increasing and convex.

We can now conclude the proof of Theorem 2.1. For every x ∈ M^n such that d(i_n(x), µ_eq) ≥ r we have

H_n(x) ≥ H(R(x)) − a_n ≥_{(*)} e + f(d(R(x), µ_eq)) − a_n ≥_{(**)} e + 2f(r/2) − f(b_n) − a_n,

where in (*) we have used Assumption A and in (**) we have used Lemma 2.3. Therefore, using Lemma 2.2,

P_n(d(i_n, µ_eq) ≥ r) = (1/Z_n) ∫_{{d(i_n, µ_eq) ≥ r}} e^{−β_n H_n} dπ^{⊗n}
≤ exp(β_n e_n + n D(µ_eq‖π) − β_n e − 2β_n f(r/2) + β_n f(b_n) + β_n a_n),

which is the desired bound.
In the next section we return to the case of a compact Riemannian manifold and study an energy-distance comparison that will imply Assumption A.

Energy-distance comparison in compact Riemannian manifolds
We keep the notation of Section 1. The Kantorovich metric W_1 defined in (1.5) can be written as

W_1(µ, ν) = sup { ∫_M f dµ − ∫_M f dν : f : M → R is 1-Lipschitz }.

This result is known as the Kantorovich-Rubinstein theorem (see [15, Theorem 1.14]). In the case of a Riemannian manifold, by a smooth approximation argument such as the one in [2], we can prove that

W_1(µ, ν) = sup { ∫_M f dµ − ∫_M f dν : f ∈ C^∞(M), |∇f| ≤ 1 }.

The next theorem gives the energy-distance comparison required to satisfy Assumption A. It is the analogue of the corresponding statement in [4].

Theorem 3.1 (Comparison between distance and energy). Suppose that µ_eq ∈ P(M) is a probability measure on M such that H(µ_eq) ≤ H(µ) for every µ ∈ P(M). Then

(3.1)   W_1(µ, µ_eq)² ≤ 2 (H(µ) − H(µ_eq))

for every µ ∈ P(M). This implies, in particular, that H has a unique minimizer and that Assumption A is satisfied by f(r) = r²/2. Furthermore, µ_eq = π.
For a signed measure η on M let

E(η) = ∫∫_{M×M} G(x, y) dη(x) dη(y)

whenever this integral is well defined, and let F denote the set of signed measures of finite energy, so that E(µ) = 2H(µ) whenever µ ∈ P(M) ∩ F. We can also notice that if µ, ν ∈ P(M) are such that H(µ) and H(ν) are finite then ∫∫_{M×M} G(x, y) dµ(x) dν(y) < ∞ by the convexity of H, the measure µ − ν belongs to F and

(3.2)   E(µ − ν) = E(µ) − 2 ∫∫_{M×M} G(x, y) dµ(x) dν(y) + E(ν).

We begin by proving the following result, which may be seen as a comparison of distances where the 'energy distance' between two probability measures µ, ν ∈ P(M) of finite energy is defined as √(E(µ − ν)). This is the analogue of [4, Theorem 1.1].

Lemma 3.2 (Comparison between W_1 and the energy distance). For every µ, ν ∈ P(M) such that H(µ) and H(ν) are finite we have

W_1(µ, ν)² ≤ E(µ − ν).
Proof. First suppose that µ and ν are differentiable, i.e. suppose they have differentiable densities with respect to π, and let

U(y) = ∫_M G(x, y) d(µ − ν)(x)

be the potential of µ − ν. Then, as remarked in (1.1), we know that U is differentiable and ∆U = −(µ − ν), in the sense that ∆U is the density of −(µ − ν) with respect to π (the constant term of (1.1) vanishes since µ − ν has total mass zero). We also know that

E(µ − ν) = ∫_M U d(µ − ν) = −∫_M U ∆U dπ = ∫_M |∇U|² dπ.

Then, for every f ∈ C^∞(M) with |∇f| ≤ 1,

∫_M f d(µ − ν) = −∫_M f ∆U dπ = ∫_M ⟨∇f, ∇U⟩ dπ ≤ ∫_M |∇U| dπ ≤ (∫_M |∇U|² dπ)^{1/2} = √(E(µ − ν)).

Taking the supremum over such f, this implies that W_1(µ, ν) ≤ √(E(µ − ν)). In general, let µ, ν ∈ P(M) be such that H(µ) and H(ν) are finite. Take two sequences {µ_n}_{n∈N} and {ν_n}_{n∈N} of differentiable probability measures that converge to µ and ν respectively and such that E(µ_n) → E(µ) and E(ν_n) → E(ν) (see [3] for a proof of their existence) and proceed by a limit argument.
The next step towards Theorem 3.1 is a fact that works for general two-body interactions, i.e. G need not be a Green function.

Lemma 3.3 (Minimizer comparison). Suppose that µ_eq is a probability measure such that H(µ_eq) ≤ H(µ) for every µ ∈ P(M). Then, for every µ ∈ P(M) such that H(µ) < ∞, we have E(µ − µ_eq) ≤ E(µ) − E(µ_eq).
Proof. As H(µ) and H(µ_eq) are finite we use (3.2) to notice that

E(µ − µ_eq) = E(µ) − 2 ∫∫_{M×M} G(x, y) dµ(x) dµ_eq(y) + E(µ_eq),

so it is enough to prove that ∫∫_{M×M} G(x, y) dµ(x) dµ_eq(y) ≥ E(µ_eq). If this affirmation were not true then, defining µ_t = (1 − t)µ_eq + tµ = µ_eq + t(µ − µ_eq), we would see that the linear term in t of E(µ_t), namely 2t(∫∫_{M×M} G(x, y) dµ(x) dµ_eq(y) − E(µ_eq)), is strictly negative. This means that E(µ_t) < E(µ_eq) for t > 0 small, which is a contradiction since H(µ_t) ≥ H(µ_eq). Now we may complete the proof of Theorem 3.1.
Proof of Theorem 3.1. Let µ_eq be a minimizer of H and let µ ∈ P(M) be a probability measure on M. If H(µ) is infinite there is nothing to prove. If it is not, Lemmas 3.2 and 3.3 give

W_1(µ, µ_eq)² ≤ E(µ − µ_eq) ≤ E(µ) − E(µ_eq) = 2(H(µ) − H(µ_eq)),

which is (3.1).
To prove that H has a unique minimizer suppose that μ̃_eq is another minimizer and use inequality (3.1) with µ = μ̃_eq to get W_1(μ̃_eq, µ_eq) = 0 and, thus, μ̃_eq = µ_eq. Finally, since ∫_M G_x dπ = 0 we have E(µ − π) = E(µ) for every µ ∈ P(M) ∩ F, while E(µ − π) ≥ 0 as in the proof of Lemma 3.2, so H(µ) = E(µ)/2 ≥ 0 = H(π); hence π is the minimizer, i.e. µ_eq = π.
In the next section we study a way to regularize the empirical measures in the sense of the hypotheses of Theorem 2.1.

Heat kernel regularization of the energy
In this section the main tool is the heat kernel for ∆. A proof of the following proposition may be found in [5, Chapter VI].

Proposition 4.1 (Heat kernel). There exists a unique smooth function p : (0, ∞) × M × M → R, written p_t(x, y), such that

∂_t p_t(x, y) = ∆_y p_t(x, y)   and   lim_{t→0} ∫_M f(y) p_t(x, y) dπ(y) = f(x) for every continuous f : M → R,

for every x, y ∈ M and t > 0. Such a function will be called the heat kernel for ∆. It is non-negative; it is mass preserving, i.e. ∫_M p_t(x, y) dπ(y) = 1 for every x ∈ M and t > 0; it is symmetric, i.e.

p_t(x, y) = p_t(y, x)

for every x, y ∈ M and t > 0; and it satisfies the semigroup property, i.e. ∫_M p_t(x, y) p_s(y, z) dπ(y) = p_{t+s}(x, z) for every x, z ∈ M and t, s > 0. Furthermore, lim_{t→∞} p_t(x, y) = 1 uniformly in x and y.
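As a concrete check (not part of the paper), these properties can be verified numerically in the simplest compact example, the flat circle R/Z with normalized length measure, where p_t(x, y) = Σ_{k∈Z} e^{−4π²k²t} e^{2πik(x−y)}; the truncation K and the grid size below are choices of this sketch.

```python
import numpy as np

def heat_kernel(t, x, y, K=200):
    # Heat kernel for d^2/dx^2 on the circle R/Z with normalized length
    # measure: p_t(x, y) = sum_k exp(-4 pi^2 k^2 t) exp(2 pi i k (x - y)),
    # truncated at |k| <= K (the imaginary parts cancel, hence the cosine).
    k = np.arange(-K, K + 1)
    return np.sum(np.exp(-4 * np.pi**2 * k**2 * t) *
                  np.cos(2 * np.pi * k * (x - y)))

grid = np.linspace(0, 1, 2000, endpoint=False)

# Mass preservation: the integral of p_t(x, .) against pi is 1.
mass = np.mean([heat_kernel(0.01, 0.3, y) for y in grid])
print(round(mass, 6))  # 1.0

# Semigroup property: integrating p_t(x, .) p_s(., z) gives p_{t+s}(x, z).
lhs = np.mean([heat_kernel(0.01, 0.3, y) * heat_kernel(0.02, y, 0.7)
               for y in grid])
rhs = heat_kernel(0.03, 0.3, 0.7)
print(abs(lhs - rhs) < 1e-6)  # True
```

The equally spaced grid integrates trigonometric polynomials of these degrees exactly, which is why both checks succeed to machine precision.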
Let p be the heat kernel for ∆. For each point x ∈ M and t > 0 define the probability measure µ^t_x ∈ P(M) by

(4.1)   dµ^t_x = p_t(x, ·) dπ, or, more precisely, dµ^t_x(y) = p_t(x, y) dπ(y).

Then we define R_t : M^n → P(M) by

R_t(x_1, ..., x_n) = (1/n) Σ_{i=1}^n µ^t_{x_i},

and we want to find the a_n and b_n of the hypotheses of Theorem 2.1 for R = R_t. We begin by looking for b_n.

4.1. Distance to the regularized measure.
Proposition 4.2 (Distance to the regularized measure). There exists a constant C > 0 such that for all t > 0 and x ∈ M^n,

W_1(R_t(x), i_n(x)) ≤ C √t.
Proof. The following arguments are very similar to those in [11] and they will be repeated for the convenience of the reader. As W_1 is convex in each of its arguments,

W_1(R_t(x), i_n(x)) ≤ (1/n) Σ_{i=1}^n W_1(µ^t_{x_i}, δ_{x_i}).

Then, we will try to find a constant C > 0 such that W_1(δ_x, µ^t_x) ≤ C√t for every x ∈ M. As the only coupling between δ_x and µ^t_x is their product we see that

W_1(δ_x, µ^t_x) = ∫_M d_g(x, y) dµ^t_x(y).

In fact we will study the 2-Kantorovich distance between δ_x and µ^t_x, namely

W_2(δ_x, µ^t_x)² = ∫_M d_g(x, y)² dµ^t_x(y) =: D_t(x),

which dominates W_1 by the Cauchy-Schwarz inequality. If we prove that there exists a constant C > 0 such that D_t(x) ≤ Ct for every x ∈ M we are done. For this we use the following description of the radial part of the Markov process generated by ∆ (see [8]).

Lemma 4.3 (Radial process). Let x ∈ M, let X be the Markov process generated by ∆ started at x, and let r be the function r(y) = d_g(x, y). Then r is differentiable π-almost everywhere and there exists a non-decreasing process L and a one-dimensional Euclidean Brownian motion β such that for every t ≥ 0

r(X_t) = √2 β_t + ∫_0^t ∆r(X_s) ds − L_t,

where ∆r is the π-almost everywhere defined Laplacian of r.

Applying Lemma 4.3 and Itô's formula to r² and then taking expected values we get

E[r(X_t)²] ≤ ∫_0^t E[(2r∆r)(X_s) + 2] ds,

where we are using the notation of Lemma 4.3 and we have dropped the non-positive term coming from −L. By [8, Corollary 3.4.5] we know that r∆r is bounded on M and, as D_t(x) = E[r(X_t)²], we obtain

(4.2)   D_t(x) ≤ C t,

where the constant C does not depend on x. Now we will look for the a_n of the hypotheses of Theorem 2.1.
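Numerically (an aside, not in the paper), the bound (4.2) can be observed on the flat circle R/Z: the process generated by d²/dx² has E[r(X_t)²] = 2t before the wrapping matters, so D_t(x)/t should be close to 2 for small t. Truncation and grid sizes are choices of this sketch.

```python
import numpy as np

def heat_kernel(t, x, y, K=200):
    # Heat kernel of d^2/dx^2 on R/Z with normalized length measure.
    k = np.arange(-K, K + 1)
    return np.sum(np.exp(-4 * np.pi**2 * k**2 * t) *
                  np.cos(2 * np.pi * k * (x - y)))

def D(t, x, grid):
    # D_t(x) = int d_g(x, y)^2 p_t(x, y) dpi(y) on the circle, where
    # the geodesic distance is d_g(x, y) = min(|x - y|, 1 - |x - y|).
    d = np.minimum(np.abs(grid - x), 1 - np.abs(grid - x))
    return np.mean(d**2 * np.array([heat_kernel(t, x, y) for y in grid]))

grid = np.linspace(0, 1, 2000, endpoint=False)
for t in [0.0005, 0.001, 0.002]:
    print(round(D(t, 0.3, grid) / t, 2))  # ~ 2.0 for each t
```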

4.2. Comparison between the regularized and the non-regularized energy.
Proposition 4.4 (Energy comparison). If d = 2 there exists a constant C > 0 such that, for every n ≥ 2, t ∈ (0, 1] and x ∈ M^n,

H(R_t(x)) ≤ H_n(x) + (1/(8πn)) log(1/t) + C(1/n + t).

If d > 2 there exists a constant C > 0 such that, for every n ≥ 2, t ∈ (0, 1] and x ∈ M^n,

H(R_t(x)) ≤ H_n(x) + C(t^{−d/2+1}/n + t).

To compare H(R_t(x)) and H_n(x) we will write, for x = (x_1, ..., x_n) ∈ M^n,

H(R_t(x)) = (1/(2n²)) Σ_{i,j} ∫∫_{M×M} G(α, β) dµ^t_{x_i}(α) dµ^t_{x_j}(β).

Let us define

G_t(x, y) = ∫∫_{M×M} G(α, β) dµ^t_x(α) dµ^t_y(β).

Then we may write

H(R_t(x)) = (1/(2n²)) Σ_{i≠j} G_t(x_i, x_j) + (1/(2n²)) Σ_{i=1}^n G_t(x_i, x_i).

So we want to compare G_t and G. The idea we shall use is that if G is the kernel of the operator Ḡ and p_t is the kernel of the operator P̄_t then G_t is the kernel of the operator P̄_t Ḡ P̄_t. Using the eigenvector decomposition we can see that

(4.3)   Ḡ = ∫_0^∞ (P̄_s − Π_0) ds,

where Π_0 is the orthogonal projection onto e_0, the eigenvector of eigenvalue 0, i.e. the constant function equal to one. Then

P̄_t Ḡ P̄_t = ∫_0^∞ P̄_t (P̄_s − Π_0) P̄_t ds = ∫_0^∞ (P̄_{s+2t} − Π_0) ds,

where we have used the semigroup property of t ↦ P̄_t, the fact that P̄_t e_0 = e_0 and P̄*_t = P̄_t. We will prove the previous idea in a somewhat different but very related way. We begin by proving the analogue of (4.3).

Proposition 4.5 (Green function and heat kernel). For every x, y ∈ M with x ≠ y, the function t ↦ p_t(x, y) − 1 is integrable on (0, ∞) and

G(x, y) = ∫_0^∞ (p_t(x, y) − 1) dt,

while ∫_0^∞ (p_t(x, x) − 1) dt = ∞ for every x ∈ M.

Proof. To prove the integrability of t ↦ p_t(x, y) − 1 we will need to know the behavior of p_t for large and short t. For the large-time behavior we have the following result.
Lemma 4.6 (Large-time behavior). Let λ be the first strictly positive eigenvalue of −∆. There exists a constant C > 0 such that for every t ≥ 1 and x, y ∈ M,

|p_t(x, y) − 1| ≤ C e^{−λt}.

Proof. We follow the same arguments as in the proof of [7, Corollary 3.17]. By the semigroup property, the symmetry of p_t and the Cauchy-Schwarz inequality we get

|p_{2T}(x, y) − 1| = |∫_M (p_T(x, z) − 1)(p_T(y, z) − 1) dπ(z)| ≤ ‖p_T(x, ·) − 1‖_{L²} ‖p_T(y, ·) − 1‖_{L²}.

If λ is the first strictly positive eigenvalue of −∆ and f ∈ L²(M) we get

‖P̄_t f − ∫_M f dπ‖_{L²} ≤ e^{−λt} ‖f − ∫_M f dπ‖_{L²}.

If we choose f = p_{T/2}(x, ·) − 1 we obtain

‖p_T(x, ·) − 1‖_{L²} ≤ e^{−λT/2} ‖p_{T/2}(x, ·) − 1‖_{L²}   and   ‖p_{T/2}(x, ·) − 1‖²_{L²} = p_T(x, x) − 1,

where we have used the semigroup property for the last equality. Similarly, we get the analogous bound in y, and the claim follows.

Lemma 4.7 (Short-time behavior). There exist constants C_1 and C_2 such that for every t ∈ (0, 1) and x, y ∈ M we have

p_t(x, y) ≤ C_1 t^{−d/2} e^{−d_g(x, y)²/(C_2 t)}.

The integrability of t ↦ p_t(x, y) − 1 when x ≠ y and the fact that ∫_0^∞ (p_t(x, x) − 1) dt = ∞ for every x ∈ M can be obtained from Lemma 4.7 and Lemma 4.6.
Using Lemma 4.6 and the dominated convergence theorem we obtain the continuity of the function (x, y) ↦ ∫_1^∞ (p_t(x, y) − 1) dt at any (x, y) ∈ M × M. By the dominated convergence theorem and Lemma 4.7 we obtain the continuity of the function (x, y) ↦ ∫_0^1 (p_t(x, y) − 1) dt at every (x, y) such that x ≠ y. Using Fatou's lemma we obtain the continuity of (x, y) ↦ ∫_0^1 (p_t(x, y) − 1) dt at (x, y) such that x = y, where the value is +∞. So, we get that

K(x, y) := ∫_0^∞ (p_t(x, y) − 1) dt

is lower semicontinuous on M × M, continuous outside the diagonal, and symmetric. The following lemma assures that K(x, ·) is integrable for every x ∈ M.

Lemma 4.8 (Integrability of K). For every x ∈ M the function K(x, ·) is integrable and ∫_M K(x, y) dπ(y) = 0.

Proof. By Lemma 4.7 and Lemma 4.6, ∫_0^∞ ∫_M |p_t(x, y) − 1| dπ(y) dt < ∞, so Fubini's theorem applies and, as p_t is mass preserving,

∫_M K(x, y) dπ(y) = ∫_0^∞ ∫_M (p_t(x, y) − 1) dπ(y) dt = 0.

We can now finish the proof of Proposition 4.5. For φ ∈ C^∞(M), the heat equation and the symmetry of p give

d/dt ∫_M p_t(x, y) φ(y) dπ(y) = ∫_M p_t(x, y) ∆φ(y) dπ(y).

Equivalently we have

d/dt ∫_M p_t(x, y) φ(y) dπ(y) = ∫_M (p_t(x, y) − 1) ∆φ(y) dπ(y),

since ∫_M ∆φ dπ = 0, and integrating in t from zero to infinity we obtain

∫_M K(x, y) ∆φ(y) dπ(y) = ∫_M φ dπ − φ(x),

i.e. ∆K_x = 1 − δ_x in the sense of distributions. By a polarization identity we have that, for every φ, ψ ∈ C^∞(M),

∫∫_{M×M} K(x, y) φ(x) ψ(y) dπ(x) dπ(y) = ∫_M ⟨∇Φ, ∇Ψ⟩ dπ,

where Φ = ∫_M K(·, y) φ(y) dπ(y) and Ψ = ∫_M K(·, y) ψ(y) dπ(y). Together with Lemma 4.8, this identifies K with the Green function G normalized by ∫_M G_x dπ = 0, which proves Proposition 4.5.

We now turn to the regularized Green function G_t and prove the following.

Proposition 4.9 (Regularized Green function). For every t > 0 and x, y ∈ M,

G_t(x, y) = ∫_{2t}^∞ (p_s(x, y) − 1) ds.

Proof. Take the time (i.e. with respect to t) derivative (denoted by a dot above the function).
We will study the first term of the sum (the second being analogous).
It equals

∫∫_{M×M} G(α, β) ∆_α p_t(x, α) p_t(y, β) dπ(α) dπ(β) = ∫∫_{M×M} ∆_α G(α, β) p_t(x, α) p_t(y, β) dπ(α) dπ(β)
= 1 − ∫_M p_t(x, β) p_t(y, β) dπ(β) = 1 − p_{2t}(x, y),

where in the last line we have used the symmetry and the semigroup property of p. Using again the symmetry of p we get Ġ_t(x, y) = −2p_{2t}(x, y) + 2, and by integrating we obtain

G_t(x, y) = G_s(x, y) + ∫_{2s}^{2t} (−p_u(x, y) + 1) du

for every 0 < s < t < ∞. As a consequence of the uniform convergence of Proposition 4.1 we can see that µ^t_x and µ^t_y defined in (4.1) converge to π as t goes to infinity. Fix any T > 0. As G_{T+s}(x, y) = ∫∫_{M×M} G_T(α, β) dµ^s_x(α) dµ^s_y(β) for any s > 0 and as G_T is continuous we obtain

lim_{t→∞} G_t(x, y) = ∫∫_{M×M} G_T(α, β) dπ(α) dπ(β) = 0,

and then, letting t → ∞ above,

G_s(x, y) = ∫_{2s}^∞ (p_u(x, y) − 1) du,

which proves Proposition 4.9. Using Propositions 4.5 and 4.9 we conclude the following inequality; an analogous result can be found in [10].

Corollary 4.10 (Off-diagonal behavior). For every n ≥ 2, t > 0 and (x_1, ..., x_n) ∈ M^n,

(1/(2n²)) Σ_{i≠j} G_t(x_i, x_j) ≤ H_n(x_1, ..., x_n) + t.

Proof. As the heat kernel is non-negative, by Propositions 4.5 and 4.9 we have that, for every x, y ∈ M with x ≠ y,

G_t(x, y) = G(x, y) − ∫_0^{2t} (p_u(x, y) − 1) du ≤ G(x, y) + 2t,

and the claim follows by summing over the pairs i ≠ j.

What is left to understand is Σ_{i=1}^n G_t(x_i, x_i). This will be achieved using Proposition 4.9 and the short-time asymptotic expansion of the heat kernel. A particular case is mentioned in [10, Lemma 5.3].

Proposition 4.11 (Diagonal behavior). If d = 2 there exists a constant C > 0 such that for every t ∈ (0, 1] and x ∈ M,

G_t(x, x) ≤ (1/(4π)) log(1/t) + C.

If d > 2 there exists a constant C > 0 such that for every t ∈ (0, 1] and x ∈ M,

G_t(x, x) ≤ C t^{−d/2+1}.

Proof. By the asymptotic expansion of the heat kernel (see for instance [5, Chapter VI.4]) we have that there exists a constant C̃ > 0 (independent of x and t) such that, for t ≤ 1,

|p_t(x, x) − (4πt)^{−d/2}| ≤ C̃ t^{−d/2+1}.

We know by Proposition 4.9 that

G_t(x, x) = ∫_{2t}^∞ (p_s(x, x) − 1) ds = ∫_{2t}^4 (p_s(x, x) − 1) ds + G_2(x, x),

and G_2(x, x) is bounded uniformly in x by continuity, while on [1, 4] the integrand p_s(x, x) − 1 is bounded by a constant. It is then enough to bound ∫_{2t}^1 (p_s(x, x) − 1) ds when 2t < 1. In the case d = 2 we obtain that, for t ∈ (0, 1],

∫_{2t}^1 (p_s(x, x) − 1) ds ≤ ∫_{2t}^1 ((4πs)^{−1} + C̃) ds ≤ (1/(4π)) log(1/t) + C̃,

so that G_t(x, x) ≤ (1/(4π)) log(1/t) + C, where C is C̃ plus a bound, independent of x, for G_2(x, x) and for the integral over [1, 4]. In the case d > 2 we use that s^{−d/2+1} ≤ s^{−d/2} for s ∈ (0, 1] and that G_2(x, x) is bounded from above to obtain a constant C such that, for t ∈ (0, 1],

G_t(x, x) ≤ ((4π)^{−d/2} + C̃) ∫_{2t}^1 s^{−d/2} ds + C' ≤ C t^{−d/2+1}.

Knowing the diagonal and off-diagonal behavior of the regularized Green function we can proceed to prove Proposition 4.4.

Proof of Proposition 4.4. Take x = (x_1, ..., x_n) ∈ M^n. Then if d = 2 we have

H(R_t(x)) = (1/(2n²)) Σ_{i≠j} G_t(x_i, x_j) + (1/(2n²)) Σ_{i=1}^n G_t(x_i, x_i)
≤ H_n(x) + t + (1/(2n)) ((1/(4π)) log(1/t) + C)
≤ H_n(x) + (1/(8πn)) log(1/t) + C(1/n + t),

where we have used Corollary 4.10 and Proposition 4.11.
If d > 2 we proceed in the same way to get

H(R_t(x)) ≤ H_n(x) + t + (C/(2n)) t^{−d/2+1}.

Having acquired all the tools to apply Theorem 2.1 to the case of a Coulomb gas on a compact Riemannian manifold, the next section is devoted to the proof of the main theorem and its almost immediate extension.
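Before moving on, here is a numerical illustration (not in the paper) of Proposition 4.5 on the flat circle R/Z, where the Green function with ∫ G_x dπ = 0 has the closed form G(x, y) = B_2({x − y})/2, B_2(u) = u² − u + 1/6 being the second Bernoulli polynomial; the truncation parameters are choices of this sketch.

```python
import numpy as np

def heat_kernel(t, theta, K=400):
    # p_t on R/Z as a Fourier sum; theta = x - y.
    k = np.arange(1, K + 1)
    return 1 + 2 * np.sum(np.exp(-4 * np.pi**2 * k**2 * t) *
                          np.cos(2 * np.pi * k * theta))

def green_via_heat(theta, T=2.0, steps=4000):
    # Proposition 4.5 on the circle: G(x, y) = int_0^infty (p_t - 1) dt,
    # truncated at T (the tail is exponentially small by Lemma 4.6),
    # computed by the midpoint rule.
    ts = (np.arange(steps) + 0.5) * (T / steps)
    return sum(heat_kernel(t, theta) - 1 for t in ts) * (T / steps)

def green_exact(theta):
    # Closed form on R/Z: G = B_2({theta})/2 with B_2(u) = u^2 - u + 1/6.
    u = theta % 1.0
    return (u**2 - u + 1.0 / 6.0) / 2.0

print(abs(green_via_heat(0.4) - green_exact(0.4)) < 1e-3)  # True
```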

Proof of the concentration inequality for Coulomb gases
Proof of Theorem 1.1. First, we notice that e_n = ∫_{M^n} H_n dµ_eq^{⊗n} = ((n−1)/n) e = 0, since µ_eq = π and e = H(π) = 0. To use Theorem 2.1 we define f(r) = r²/2, which satisfies Assumption A by Theorem 3.1, and R = R_t with t = 1/n if d = 2 and t = n^{−2/d} if d ≥ 3. By Proposition 4.4 we may take

a_n = (1/(8πn)) log n + C₀/n   if d = 2,   a_n = C₀ n^{−2/d}   if d ≥ 3,

and by Proposition 4.2 we may take b_n = C₁√t, so that f(b_n) = C₁²t/2, for every x ∈ M^n and n ≥ 2. Since D(µ_eq‖π) = D(π‖π) = 0 and 2f(r/2) = r²/4, we can apply Theorem 2.1 to obtain the desired result with C = C₁²/2 + C₀. Finally we present the proof of Theorem 1.2.
Proof of Theorem 1.2. To apply Theorem 2.1 we notice that Assumption A is satisfied by f(r) = r²/2. Indeed, Theorem 3.1 is still true for this new H except for the characterization of the minimizer; in particular, H has a unique minimizer µ_eq. By a calculation we can see that

e − e_n = (1/(2n)) ∫∫_{M×M} G(x, y) dµ_eq(x) dµ_eq(y),

which is of order 1/n and will be absorbed by the constant C. To meet the hypotheses of Theorem 2.1, we also need to compare (1/n) Σ_{i=1}^n V(x_i) with ∫_M V dR_t, i.e. to control ∫_M V dµ^t_x − V(x). By using the relation

∫_M V dµ^t_x = E[V(X_t)] = V(x) + ∫_0^t E[∆V(X_s)] ds,

where X is the Markov process with generator ∆ starting at x, we obtain

∫_M V dµ^t_x − V(x) ≤ Ĉ t,

where Ĉ is some upper bound for ∆V, and thus

H(R_t(x)) ≤ H_n(x) + a_n + Ĉ t,

with a_n as in the proof of Theorem 1.1. In conclusion, if we choose R = R_t with t = 1/n if d = 2 and t = n^{−2/d} if d ≥ 3, the extra term Ĉt is of the same order as the previous ones, so that we can apply Theorem 2.1.
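The choice of t in the two proofs above balances the regularization error against the distance error. A sketch of the optimization, consistent with the bounds of Section 4 (all constants absorbed into ≍):

```latex
% Balancing the error terms of Theorem 2.1 with R = R_t (a sketch; the
% constants from Propositions 4.2 and 4.4 are absorbed into \asymp).
% For d \ge 3 the two t-dependent contributions to the exponent are
%   \beta_n a_n \asymp \beta_n \frac{t^{1-d/2}}{n}
%   \qquad\text{and}\qquad
%   \beta_n f(b_n) \asymp \beta_n t .
% They are of the same order precisely when
\[
  \frac{t^{1-d/2}}{n} \asymp t
  \iff t^{d/2} \asymp \frac1n
  \iff t \asymp n^{-2/d},
\]
% which yields the error term \beta_n n^{-2/d}. For d = 2 the diagonal term
% is logarithmic, and t = 1/n gives \beta_n \log n/(8\pi n) up to O(\beta_n/n).
```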