Central limit theorems in the geometry of numbers

We investigate in this paper the distribution of the discrepancy of various lattice counting functions. In particular, we prove that the number of lattice points contained in certain domains defined by products of linear forms satisfies a Central Limit Theorem. Furthermore, we show that the Central Limit Theorem holds for the number of rational approximants for weighted Diophantine approximation in $\mathbb{R}^d$. Our arguments exploit chaotic properties of the Cartan flow on the space of lattices.


INTRODUCTION
Let $\Omega_T \subset \mathbb{R}^d$ be an increasing family of compact domains, and let $\mathcal{L}_d$ denote the space of lattices in $\mathbb{R}^d$ with covolume one, endowed with the unique $\mathrm{SL}_d(\mathbb{R})$-invariant probability measure $\lambda_d$. We consider the counting function $\Lambda \mapsto |\Lambda \cap \Omega_T|$ on $\mathcal{L}_d$. Under mild assumptions on the domains $\Omega_T$, one may ask whether it is possible to derive more precise information about the asymptotic behavior of $|\Lambda \cap \Omega_T|$ for generic lattices $\Lambda$.
The following naive heuristics might give an idea of what to expect. Let us decompose $\Omega_T$ into smaller pieces $\Omega_T^{(i)}$. Provided that $\Omega_T^{(i_1)}$ and $\Omega_T^{(i_2)}$ are "far apart", it seems plausible to conjecture that the random variables $\Lambda \mapsto |\Lambda \cap \Omega_T^{(i_j)}|$, for $j = 1, 2$, on $\mathcal{L}_d$ are "almost independent". Thus, one might wonder whether $|\Lambda \cap \Omega_T|$ behaves like a sum of independent random variables.
Some classical results of W. Schmidt motivated our line of study. In [14,15], Schmidt showed that for generic lattices $\Lambda \in \mathcal{L}_d$, $|\Lambda \cap \Omega_T| = \mathrm{vol}(\Omega_T) + O_{\Lambda,\varepsilon}\big(\mathrm{vol}(\Omega_T)^{1/2+\varepsilon}\big)$ for all $\varepsilon > 0$, and thus the counting function $\Lambda \mapsto |\Lambda \cap \Omega_T|$ indeed exhibits cancellations of the same order as a sum of independent random variables. Remarkably, the argument in [14] implicitly follows the heuristic approach outlined above and proves some form of pairwise independence using arithmetic considerations. The aim of this work is to establish a Central Limit Theorem (CLT) in this setting, at least under some additional assumptions on the domains $\Omega_T$. We stress that it is unlikely that a CLT holds for general domains; for instance, the counting of lattice points in the regions $\Omega_T = \{(x, y) \in \mathbb{R}^2 : 0 < x < Ty \text{ and } 1 < y < 2\}$ is closely related to the distribution of averages for the horocyclic flow on $\mathcal{L}_2$, which do not admit a CLT (see e.g. [7]).
In this paper we shall consider domains defined by products of linear forms on $\mathbb{R}^d$. Such domains can be tessellated using images of a small number of regular tiles under a family of diagonal matrices in $\mathrm{SL}_d(\mathbb{R})$, which allows us to use dynamical arguments developed in our recent work [4]. The crucial ingredients in our approach are quantitative estimates on higher-order correlations established in our joint work with Einsiedler [3]. Besides generic lattices in the space of lattices $\mathcal{L}_d$, we also consider the family of lattices which arises in many problems in the theory of Diophantine approximation.

Distribution of values for products of linear forms
We fix a collection of linearly independent linear forms $L_1, \dots, L_d : \mathbb{R}^d \to \mathbb{R}$ with $d \geq 3$ and consider the product form $N(x) = L_1(x) \cdots L_d(x)$. Our aim is to analyze the distribution of the values $N(x)$ when $x$ belongs to a lattice in $\mathbb{R}^d$. We fix an interval $(a, b) \subset \mathbb{R}^+$, and for $T \geq 1$, we define the domains for some $c = c(L_1, \dots, L_d) > 0$, and by [15], almost all unimodular lattices $\Lambda$ in $\mathbb{R}^d$ satisfy We shall investigate how the error term (also known as the discrepancy) in this formula behaves.
Our first result shows that the error term admits a Central Limit Theorem. We have currently verified our argument for $d \geq 4$, but it might be possible to optimize the estimates to deal with the case $d = 3$ as well.
A more general version of this theorem can be established along similar lines. Instead of considering linear forms, let $L_i : \mathbb{R}^d \to \mathbb{R}^{d_i}$, $1 \leq i \leq k$, be a family of linear maps, and set where $\|\cdot\|$ denotes the Euclidean norm. We shall assume that $d_1 + \cdots + d_k = d$ and the map We show that a version of Theorem 2.2 still holds for these domains. In the case $k = 2$, such a result was also established in [5], using a different method, which does not seem to generalize to $k \geq 3$.

Spiraling
Motivated by the paper [1], we shall also study "spiraling" of the lattice points contained in the regions (2.2), that is, the distribution of their angular components. We denote by $\omega_i : \mathbb{R}^{d_i} \setminus \{0\} \to S^{d_i - 1}$ the radial projections, and for a lattice $\Lambda$ in $\mathbb{R}^d$ and a Borel set $D \subset S$, we define It is not hard to show (see [1]) that for almost every unimodular lattice $\Lambda$ in $\mathbb{R}^d$, and for any Borel subset $D \subset S$, We prove that if some regularity is imposed on $D$, then a suitable Central Limit Theorem also holds. Currently, the argument has been verified for $d \geq 4$, but it might be possible to optimize the estimates further to deal with the case $d = 3$.
Theorem 2.2 (CLT for spiraling). For $d \geq 4$ and for every domain $D \subset S$ with piecewise smooth boundary, there exists an explicit $\sigma = \sigma(D) > 0$ such that for every $u \in \mathbb{R}$,

Diophantine approximation
Let us now discuss the distribution of integral solutions of some inequalities which arise in the theory of Diophantine approximation. We start with Diophantine approximation on the real line, which is better understood due to the theory of continued fractions. Fix $c > 0$, and for $x \in \mathbb{R}$, we consider the Diophantine inequality with $(p, q) \in \mathbb{Z} \times \mathbb{N}$, and the corresponding counting function $N_T(x) = |\{(p, q) \in \mathbb{Z} \times \mathbb{N} : 1 \leq q \leq T \text{ and } p/q \text{ is a solution of (2.3)}\}|$. It is known (see, for instance, [14]) that for almost every $x \in [0, 1]$, Fuchs showed in [8] that the discrepancy in this formula satisfies the Central Limit Theorem, that is to say, there exists $\sigma > 0$ such that for every $u \in \mathbb{R}$, as $T \to \infty$. We stress that the correct normalization in (2.4) has caused some confusion in the previous works [10,11,12]; the additional $(\log \log T)$-factor arises here because a certain counting function on $\mathcal{L}_2$ is not square-integrable. This non-integrability issue does not appear in higher dimensions, whence this additional normalization factor should disappear. An analogue of this result for simultaneous Diophantine approximation has been recently established in [5].
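As a rough numerical illustration of the counting function $N_T(x)$, the following Python sketch counts the pairs $(p, q)$ with $1 \leq q \leq T$. Since the display (2.3) is not reproduced above, the sketch assumes the classical inequality $|x - p/q| < c/q^2$; the function name `count_approximants` and the use of exact rational arithmetic are choices made for this illustration only.

```python
import math
from fractions import Fraction


def count_approximants(x, c, T):
    """Count pairs (p, q) with 1 <= q <= T and |x - p/q| < c / q^2.

    ASSUMPTION: the inequality above stands in for (2.3), which is not
    shown in the text. Pairs (p, q) are not required to be coprime.
    """
    total = 0
    for q in range(1, T + 1):
        # |x - p/q| < c/q^2 is equivalent to q*x - c/q < p < q*x + c/q.
        lo = q * x - Fraction(c) / q
        hi = q * x + Fraction(c) / q
        first = math.floor(lo) + 1   # smallest integer strictly above lo
        last = math.ceil(hi) - 1     # largest integer strictly below hi
        if last >= first:
            total += last - first + 1
    return total


if __name__ == "__main__":
    # Exact input: x = 1/2, c = 1, denominators up to 3.
    print(count_approximants(Fraction(1, 2), 1, 3))
```

Passing `x` as a `Fraction` keeps the strict inequalities exact; with a floating-point `x` the count near the boundary of an interval may be affected by rounding.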
In this paper we consider the following more general problem in weighted Diophantine approximation. Let us fix a collection of weights $0 < w_1, \dots, w_d < 1$ with $w_1 + \cdots + w_d = 1$, and constants $c_1, \dots, c_d > 0$. Given a vector $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, we are interested in understanding the asymptotics of solutions for the system of Diophantine inequalities defined by The number of solutions is given by One can show, using Schmidt's arguments in [14], that for almost every $x \in [0, 1]^d$, We prove here that the error term in (2.6) satisfies the Central Limit Theorem. In the special case when all weights $(w_i)$ are equal, this result was established in [5].

Theorem 2.3 (CLT for Diophantine approximation).
For $d \geq 2$, there exists an explicit $\sigma > 0$ such that for every $u \in \mathbb{R}$,
In this note we outline the proofs of Theorems 2.1-2.3. Details will be published elsewhere.

The space of lattices and Siegel transforms
We denote by $\mathcal{L}_d$ the space of lattices in $\mathbb{R}^d$ with covolume one. We recall that $\mathcal{L}_d$ can be realised as a homogeneous space $\mathcal{L}_d \simeq \mathrm{SL}_d(\mathbb{R})/\mathrm{SL}_d(\mathbb{Z})$, so that it is equipped with the unique invariant probability measure $\lambda_d$. Given a bounded Borel measurable function $f : \mathbb{R}^d \to \mathbb{R}$ with compact support, we define its Siegel transform $\hat{f} : \mathcal{L}_d \to \mathbb{R}$ by $\hat{f}(\Lambda) = \sum_{v \in \Lambda \setminus \{0\}} f(v)$. The starting point of our approach is the observation that the counting functions in Theorems 2.1-2.3 can be realized as certain averages of suitable Siegel transforms. This idea is simpler to explain in the setting of Theorem 2.3, so let us begin by focusing on this case. Let and let $\chi$ denote the characteristic function of the domain $\{(x, y) \in \mathbb{R}^d \times \mathbb{R} : 1 \leq y < 2 \text{ and (2.5) holds}\}$.
Then one can readily check that for every $T = 2^N$ with $N \geq 1$, where the lattice $\Lambda_x$ is defined in (1.1). This basic formula allows us to study the distribution of $N_T(x)$ using dynamics on the space of lattices. More precisely, we shall use an approximation of the form (3.1) with $\chi$ replaced by a smooth function $f_\varepsilon$ that approximates $\chi$ well in the $L^1$- and $L^2$-senses.
The counting functions in Theorems 2.1 and 2.2 can also be approximated along similar lines, but the formulas are more complicated; in particular, one-parameter subgroups of diagonal matrices are no longer enough to achieve an approximation of $N_T$ as in (3.1). Let $A_d$ denote the subgroup of diagonal matrices in $\mathrm{SL}_d(\mathbb{R})$, and set $\theta_r = \mathrm{diag}(1, \dots, 1, e^r)$. We shall show that for suitably chosen smooth compactly supported functions $f_{\varepsilon,T}$ on $\mathbb{R}^d$ and finite subsets $B(r, T)$ of $A_d$, in the $L^1$- and $L^2$-norms for $(\mathcal{L}_d, \lambda_d)$. Our arguments from now on depend crucially on the fact (which will be explained in more detail below) that for a smooth compactly supported function $\phi$ on $\mathcal{L}_d$, the functions $\{\phi(a\Lambda) : a \in A_d\}$ are "weakly independent".
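To make the notion of a Siegel transform concrete, here is a minimal Python sketch in dimension two. It assumes the standard definition $\hat{f}(\Lambda) = \sum_{v \in \Lambda \setminus \{0\}} f(v)$, represents a lattice by two basis vectors, and requires the caller to supply a radius outside of which $f$ vanishes; the function name `siegel_transform` and this interface are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def siegel_transform(f, basis, radius):
    """Sum f over the nonzero points of the planar lattice Z*v1 + Z*v2.

    ASSUMPTIONS: basis = (v1, v2) with v1 = (a, b), v2 = (c, d) linearly
    independent, and f(v) = 0 whenever the Euclidean norm of v exceeds radius.
    """
    (a, b), (c, d) = basis
    det = a * d - b * c
    # If v = m*v1 + n*v2 = (x, y), then m = (x*d - y*c)/det and
    # n = (y*a - x*b)/det, so Cauchy-Schwarz bounds |m| and |n|.
    m_max = math.ceil(radius * math.hypot(c, d) / abs(det))
    n_max = math.ceil(radius * math.hypot(a, b) / abs(det))
    total = 0
    for m, n in product(range(-m_max, m_max + 1), range(-n_max, n_max + 1)):
        if m == 0 and n == 0:
            continue  # the origin is excluded from the sum
        total += f((m * a + n * c, m * b + n * d))
    return total


def disc_indicator(v):
    """Characteristic function of the open disc of radius 1.5."""
    return 1 if math.hypot(v[0], v[1]) < 1.5 else 0


if __name__ == "__main__":
    # Two lattices of covolume one: Z^2 and a stretched copy of it.
    print(siegel_transform(disc_indicator, ((1, 0), (0, 1)), 1.5))
    print(siegel_transform(disc_indicator, ((2, 0), (0, 0.5)), 1.5))
```

For a characteristic function the Siegel transform is simply a lattice point count, and the two lattices above already give different values for the same disc; this dependence on the lattice is what the averages in the text are built from.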

The method of cumulants
There exists today a plethora of different techniques to establish convergence to the Gaussian distribution. One of the first such techniques, if not the first, is nowadays often referred to as the Method of Moments, and was used by Chebyshev to prove the classical Central Limit Theorem. We refer to [2] for a modern exposition of this technique. An essentially equivalent technique, but better tailored for problems pertaining to Gaussian distributions, was later developed by Fréchet and Shohat, and goes under the name "Method of Cumulants". Let us briefly survey this method. Given bounded random variables $X_1, \dots, X_r$, their joint cumulant is defined by the curious expression
$$\mathrm{Cum}^{(r)}(X_1, \dots, X_r) = \sum_{\mathcal{P}} (-1)^{|\mathcal{P}|-1} (|\mathcal{P}|-1)! \prod_{I \in \mathcal{P}} \mathbb{E}\Big(\prod_{i \in I} X_i\Big),$$
where the sum is taken over all partitions $\mathcal{P}$ of the set $\{1, \dots, r\}$. We also set $\mathrm{Cum}^{(r)}(X) = \mathrm{Cum}^{(r)}(X, \dots, X)$ for a single bounded random variable $X$. The cumulants have many useful combinatorial properties (see [16]). For instance, if there exists a non-trivial partition $\{1, \dots, r\} = I \sqcup J$ such that the collections $\{X_i : i \in I\}$ and $\{X_j : j \in J\}$ are independent of each other, then
$$\mathrm{Cum}^{(r)}(X_1, \dots, X_r) = 0. \qquad (3.3)$$
Furthermore, a bounded random variable $X$ with mean zero is normally distributed if and only if $\mathrm{Cum}^{(r)}(X) = 0$ for $r \geq 3$. In what follows, we shall use the following useful criterion due to Fréchet and Shohat [6] to establish our Central Limit Theorems. Then for every $u \in \mathbb{R}$,
In our recent work [4], we used the Method of Cumulants to establish a general Central Limit Theorem for group actions which are exponentially mixing of all orders. Here we essentially follow the approach developed in [4], but substantial modifications will have to be made in order to handle more general averaging schemes, as well as unbounded test functions.
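The partition formula for joint cumulants can be checked on small examples. The following Python sketch evaluates it for random variables on a finite probability space with uniform weights, using exact rational arithmetic; the helper names `set_partitions` and `joint_cumulant`, and the representation of a random variable as a list of its values, are assumptions made for this illustration.

```python
import math
from fractions import Fraction


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either add `first` to one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it form a block of its own.
        yield [[first]] + partition


def joint_cumulant(variables):
    """Joint cumulant of X_1, ..., X_r on a uniform finite sample space.

    variables[i][k] is the value of X_{i+1} at the k-th sample point.
    """
    r, n = len(variables), len(variables[0])

    def moment(block):
        # E(prod_{i in block} X_i) under the uniform measure.
        return sum(
            Fraction(math.prod(variables[i][k] for i in block)) for k in range(n)
        ) / n

    total = Fraction(0)
    for partition in set_partitions(list(range(r))):
        p = len(partition)
        weight = (-1) ** (p - 1) * math.factorial(p - 1)
        total += weight * math.prod(moment(block) for block in partition)
    return total


if __name__ == "__main__":
    X = [0, 0, 1, 1]  # two independent fair coin flips on a
    Y = [0, 1, 0, 1]  # four-point sample space
    print(joint_cumulant([X, X]))        # variance of X
    print(joint_cumulant([X, X, Y]))     # vanishes, since X and Y are independent
    print(joint_cumulant([X, X, X, X]))  # fourth cumulant of X
```

The second call illustrates the vanishing of joint cumulants across independent collections, and the last one shows that a coin flip has a non-zero fourth cumulant, in line with the fact that it is not Gaussian.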

Estimates on the higher-order correlations
Let us now prepare the asymptotic formulas for higher-order correlations that will be used to estimate the cumulants and the variance. These formulas will be formulated in terms of Sobolev norms $S_k$, $k \geq 1$, defined for smooth compactly supported functions on the space $\mathcal{L}_d$ (see [3]). In the proofs of Theorems 2.1 and 2.2, we will use estimates on correlations for the action on $\mathcal{L}_d$ of the group of diagonal matrices $A_d \subset \mathrm{SL}_d(\mathbb{R})$. If we fix an invariant metric $\rho$ on $A_d \cong \mathbb{R}^{d-1}$, then the following result is a special case of [3, Th. 1.1].

Theorem 3.2 (Exponential multiple mixing of all orders).
For every $r \geq 2$, there exists an integer $k_r$ such that for all $k \geq k_r$, there is $\delta_{r,k} > 0$ with the property that for all $\phi_1, \dots, \phi_r \in C_c^\infty(\mathcal{L}_d)$ and $a_1, \dots, a_r \in A_d$, where $D(a_1, \dots, a_r) = \min\{\rho(a_i, a_j) : i \neq j\}$.
In order to study weighted Diophantine approximation (2.5), we need to analyze the distribution of orbits for the one-parameter semigroup $a_w(t) = \mathrm{diag}(e^{w_1 t}, \dots, e^{w_d t}, e^{-t})$, $t > 0$, for lattices contained in the subset We denote by $\sigma_d$ the measure on $Y_d$ induced by the Lebesgue measure on $[0, 1]^d$. We establish the following asymptotic formula for the higher-order correlations of the measures $a_w(t)_* \sigma_d$, generalizing the work of Kleinbock and Margulis [9].
Theorem 3.3. For every $r \geq 2$ and $k \geq k_r$, there exists $\delta'_{r,k} > 0$ such that for every $\phi_1, \dots, \phi_r \in C_c^\infty(\mathcal{L}_{d+1})$ and $t_1, \dots, t_r > 0$,
We note that $Y_d$ is an unstable manifold for the one-parameter semigroup but not for the semigroup $a_w(t)$, unless the weights $w_i$ are all equal. The quantitative equidistribution for the translates $a_w(t) Y_d$ has been established by Kleinbock and Margulis in [9], and we refine their argument to deal with higher-order correlations. The proof of Theorem 3.3 goes by induction on $r$ and uses quantitative equidistribution of the measures $a_w(t)_* \sigma_d$ combined with non-divergence estimates for the unipotent flows.
From Theorem 3.3, we deduce the following non-divergence estimate: This corollary will be used to construct bounded approximations for Siegel transforms.

Bounded approximations
It might now be tempting to try to apply Proposition 3.1, combined with Theorems 3.2 and 3.3, to the approximations (3.1) and (3.2) directly. However, we stress that the Siegel transform $\hat{f}$ of a smooth compactly supported function $f$ on $\mathbb{R}^d$ gives an unbounded function on the space of lattices $\mathcal{L}_d$. Moreover, the Sobolev norms $S_k(\hat{f})$ in §3.3 are infinite. In order to deal with these issues, we shall use that $\hat{f} \in L^p(\lambda_d)$ for $p < d$ and show that one can approximate $\hat{f}$ by a family of functions and This observation will allow us to exploit the estimates from Subsection 3.3 to analyze the variance and the cumulants of higher orders. In the setting of Theorems 2.1 and 2.2, we shall use the approximation (3.2) and consider where $\phi^L_{\varepsilon,T}$ is the bounded approximation for $\hat{f}_{\varepsilon,T}$. Because of (3.7), the parameter $L = L(T)$ can be chosen so that After this choice has been made, it suffices to analyze convergence in distribution of $Z^*_T$.
In the proof of Theorem 2.3, we consider and its approximation where $\phi^L_\varepsilon$ denotes the bounded approximation to $\hat{f}_\varepsilon$ as above, and $f_\varepsilon$ is a smooth approximation to the characteristic function $\chi$ (recall the notation from Subsection 3.1). Here the parameters $\varepsilon = \varepsilon(T)$ and $L = L(T)$ can be chosen so that To arrange (3.9), we use the non-divergence estimate established in Corollary 3.4 and the following uniform bound
$$\sup_{n \geq 1} \big\| \hat{f}_\varepsilon \circ a^n \big\|_{L^2(\sigma_d)} < \infty. \qquad (3.10)$$
In order to prove (3.10), we interpret the $L^2$-norm arithmetically and reduce this estimate to a problem of counting solutions of certain Diophantine equations.
Ultimately, we shall show that $Z^*_T$ converges to the Normal Law using Proposition 3.1. Our main tools are the estimates on higher-order correlations from §3.3. We note that the bounds in our computations will depend on the parameters $\varepsilon$, $T$, $L$, and thus the explicit forms of the error terms in Theorems 3.2 and 3.3 are essential for this purpose.

Well-separated tuples and estimating the cumulants
By linearity, the estimates on cumulants arising in the proofs of Theorems 2.1 and 2.2 reduce to the following basic problem, which is discussed in more detail in our paper [4]. Given $\phi_1, \dots, \phi_r \in C_c^\infty(\mathcal{L}_d)$ and $(a_1, \dots, a_r) \in A_d^r$, we wish to estimate averages of cumulants of the form
$$\mathrm{Cum}^{(r)}_{\lambda_d}(\phi_1 \circ a_1, \dots, \phi_r \circ a_r) \qquad (3.11)$$
as $(a_1, \dots, a_r)$ varies over certain subsets of $A_d^r$.
The idea is to decompose $A_d^r$ into finitely many regions where the cumulants can be estimated separately. These regions are defined as follows. Recall that $\rho$ is a fixed invariant metric on $A_d$. For $I, J \subset [r]$ and $a = (a_1, \dots, a_r) \in A_d^r$, we set $\rho^I(a) = \max\{\rho(a_i, a_j) : i, j \in I\}$ and $\rho_{I,J}(a) = \min\{\rho(a_i, a_j) : i \in I, j \in J\}$.
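The two quantities just introduced, together with the separation $D(a_1, \dots, a_r)$ from Theorem 3.2, are easy to compute for explicit tuples. The Python sketch below identifies $A_d$ with $\mathbb{R}^{d-1}$ via logarithmic coordinates and takes $\rho$ to be the sup-norm distance; both choices, as well as the function names, are assumptions made only for this illustration (any invariant metric would serve equally well), and indices are 0-based.

```python
from itertools import combinations


def rho(a, b):
    """ASSUMED metric: sup-norm distance between coordinate vectors."""
    return max(abs(s - t) for s, t in zip(a, b))


def rho_within(a, I):
    """Diameter of the cluster {a_i : i in I} (the quantity rho^I)."""
    return max(rho(a[i], a[j]) for i in I for j in I)


def rho_between(a, I, J):
    """Smallest distance between the clusters I and J (the quantity rho_{I,J})."""
    return min(rho(a[i], a[j]) for i in I for j in J)


def separation(a):
    """D(a_1, ..., a_r): smallest distance between two distinct entries."""
    return min(rho(a[i], a[j]) for i, j in combinations(range(len(a)), 2))


if __name__ == "__main__":
    # Three elements of a one-dimensional group, in logarithmic coordinates.
    a = [(0.0,), (1.0,), (10.0,)]
    print(rho_within(a, [0, 1]))        # the first two entries are clustered
    print(rho_between(a, [0, 1], [2]))  # and far from the third entry
    print(separation(a))
```

In this example the partition $\{\{1, 2\}, \{3\}\}$ of the index set is clustered at scale 1 and separated at scale 9, which is the kind of configuration to which the mixing estimates are applied block by block.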
We estimate the cumulants on $\Delta_{\mathcal{Q}}(\alpha, \beta)$ using the exponential multiple mixing property established in Theorem 3.2.
To prove this lemma, we introduce cumulants "conditioned" on a given partition $\mathcal{Q}$. For partitions $\mathcal{P}$ and $\mathcal{Q}$, we set $\mathcal{P} \wedge \mathcal{Q} = \{P \cap Q : P \in \mathcal{P}, Q \in \mathcal{Q}\}$. We define (3.12) Comparing (3.11) and (3.12), we realize that they are approximately equal for tuples $(a_1, \dots, a_r) \in \Delta_{\mathcal{Q}}(\alpha, \beta)$ with suitably chosen $\alpha$ and $\beta$ because according to Theorem 3.2. The second step in the proof of Lemma 3.5 utilizes the fact that when $\mathcal{Q}$ is a non-trivial partition of $[r]$, then $\mathrm{Cum}^{(r)}_{\lambda_d, \mathcal{Q}}(\phi_1 \circ a_1, \dots, \phi_r \circ a_r) = 0$, which is a combinatorial version of (3.3) (see Proposition 8.1 in [4]). This leads to the estimate in Lemma 3.5.
In order to apply Lemma 3.5, we decompose $A_d^r$ into regions where the tuples $(a_1, \dots, a_r)$ are "well-separated" or "clustered" on certain scales. We show (cf. [4, Prop. 6.2]) that for suitably chosen parameters $0 = \alpha_0 < \beta_0 < \alpha_1 < \cdots < \beta_{r-1} < \alpha_r$, we have a decomposition where the union is taken over the partitions $\mathcal{Q}$ of $\{1, \dots, r\}$ with $|\mathcal{Q}| \geq 2$. It turns out to be possible to choose the parameters $\alpha_j, \beta_j$ in such a way that Lemma 3.5 can be applied to the averages of the cumulants over subsets of $\Delta_{\mathcal{Q}}(\alpha_j, \beta_j)$ to conclude that they are negligible. Now it remains to estimate the average over a subset of $\Delta(\alpha_r)$. Since we can choose $\alpha_r$ quite small, the latter average can be estimated by bounding the number of terms.
The above argument requires some modifications for the proof of Theorem 2.3 because we need to take into account the estimator $D'$ in Theorem 3.3. It will be convenient to embed $A_d^r$ in $A_d^{r+1}$ by $a \mapsto (e, a)$ and define subsets $\Delta_{\mathcal{Q}}(\alpha, \beta)$ of $A_d^r$ with respect to this embedding for partitions $\mathcal{Q}$ of $\{0, 1, \dots, r\}$. As before, we use the decomposition (3.13). When the partition $\mathcal{Q}$ is non-trivial and different from $\{\{0\}, \{1, \dots, r\}\}$, we are able to modify the proof of Lemma 3.5 using Theorem 3.3 and estimate the cumulants $\mathrm{Cum}^{(r)}_{\sigma_d}(\phi_1 \circ a^{n_1}, \dots, \phi_r \circ a^{n_r})$ when $(a^{n_1}, \dots, a^{n_r}) \in \Delta_{\mathcal{Q}}(s, c_{r,k} s)$. When $\mathcal{Q} = \{\{0\}, \{1, \dots, r\}\}$, we observe that Theorem 3.3 implies that $\mathrm{Cum}^{(r)}_{\sigma_d}(\phi_1 \circ a^{n_1}, \dots, \phi_r \circ a^{n_r}) \approx \mathrm{Cum}^{(r)}_{\lambda_d}(\phi_1 \circ a^{n_1}, \dots, \phi_r \circ a^{n_r})$ when $(a^{n_1}, \dots, a^{n_r}) \in \Delta_{\mathcal{Q}}(s, c_{r,k} s)$, and the latter cumulant has already been estimated.
Finally, we have to deal with the average over $(a^{n_1}, \dots, a^{n_r}) \in \Delta(\alpha_r)$. For this purpose, we modify the function $Z^*_T$ in such a way that its convergence in distribution is not affected. We set where the parameter $M = M(N) \to \infty$ is chosen so that In particular $Z^*_T$ and $Z^{**}_T$ have the same distributional limits, and thus it suffices to establish convergence in distribution of $Z^{**}_T$. Choosing $M = M(N)$ appropriately, we can further arrange so that averages over subsets of $\Delta(\alpha_r)$ in the cumulant calculations Cum Then using Theorem 3.2 with $r = 2$, we deduce that (3.14) converges to It should be noted that this argument has to be applied to the family of functions $\phi^L_{\varepsilon,T} \circ \theta_r$, introduced in §3.4, with suitably chosen parameters. The explicit form of the error term in Theorem 3.2 still allows us to justify convergence of (3.14). where $\psi(n, y) = \phi(a^n y) - \int_{Y_d} (\phi \circ a^n) \, d\sigma_d$ for smooth compactly supported functions $\phi$ on $\mathcal{L}_{d+1}$. We shall show that (3.16) converges to as $n \to \infty$. A more tedious analysis, which utilizes the explicit quantitative bounds from Theorem 3.3 with $r = 1, 2$, allows us to conclude that (3.18) converges to This leads to the formula (3.17). More precisely, this argument will be applied to the family of functions $\phi^L_\varepsilon$, introduced in §3.4, but the explicit form of the error term in Theorem 3.3 allows us to handle this.
Finally, we note that the expressions (3.15) and (3.17) can be computed explicitly in our setting using Rogers' formula [13]. In particular, we conclude that the obtained variances are positive.