Sticky central limit theorems at isolated hyperbolic planar singularities

We derive the limiting distribution of the barycenter $b_n$ of an i.i.d. sample of $n$ random points on a planar cone with angular spread larger than $2\pi$. There are three mutually exclusive possibilities: (i) (fully sticky case) after a finite random time the barycenter is almost surely at the origin; (ii) (partly sticky case) the limiting distribution of $\sqrt{n} b_n$ comprises a point mass at the origin, an open sector of a Gaussian, and the projection of a Gaussian to the sector's bounding rays; or (iii) (nonsticky case) the barycenter stays away from the origin and the renormalized fluctuations have a fully supported limit distribution, which is usually Gaussian but not always. We conclude with an alternative, topological definition of stickiness that generalizes readily to measures on general metric spaces.


Introduction
It has recently been observed that means of large samples from well-behaved probability distributions on metric spaces that are not smooth Riemannian manifolds are sometimes constrained to lie in subsets of low dimension, and that central limit theorems in such cases consequently behave non-classically, with components of limiting distributions supported on thin subsets of the sample space (Hotz et al., 2013; Barden et al., 2013; Basrak, 2010). Our results here continue this line of investigation with the first complete description of "sticky" behavior at a singularity of codimension 2.
More precisely, we prove laws of large numbers (Theorem 1.12; see Section 5 for proofs and more details) as well as central limit theorems (Section 1.4; proofs in Section 6) for Fréchet means of probability distributions (Definitions 1.6 and 1.7) on metric spaces possessing the simplest geometric singularities in codimension 2. The spaces are surfaces homeomorphic to the Euclidean plane $\mathbb{R}^2$ that are locally flat everywhere except at a single cone point, where the angle sum (the length of a circle of radius 1 about the cone point) exceeds $2\pi$ (see Section 1.1 for precise definitions). Thus the surface is planar, the singularity is isolated, and its geometry is hyperbolic in the sense of being negatively curved; hence the title of this paper.
The asymptotic behavior splits into three cases, called fully sticky, partly sticky, and nonsticky (Definition 1.8 and Proposition 1.10), according to whether the mean lies stably at the singularity (Theorem 1.13), unstably at the singularity (Theorem 1.14), or away from the singularity (Theorem 1.15), respectively. Specific examples illustrating the sticky phenomena, including subtle non-local effects of the singular negative curvature when the mean lies in the smooth stratum (Example 2.5), occupy Section 2. In contrast to the usual strong law asserting almost-sure convergence of empirical means to a population mean, sticky strong laws also describe the limiting behavior of the supports of the laws of empirical means. In the sticky case this support degenerates, in the sense made precise in Theorem 1.12, after a finite random time. Our sticky central limit theorems assert that the limiting distributions are mixtures of parts of Gaussians and collapsed (i.e., projected) parts of Gaussians. Even in the nonsticky case, the limiting laws can fail to be Gaussian (Example 2.5), which may come as a surprise: although the space is locally Euclidean near the mean, the conclusion of Theorem 2.3 of Bhattacharya and Patrangenaru (2005) need not hold.
Concluding our analysis is a topological characterization of stickiness for measures on isolated planar hyperbolic singularities (Theorem 7.6), as opposed to the algebraic one in terms of moments (Definitions 1.7 and 1.8) used for the rest of the exposition. Thinking topologically leads to a very general notion of stickiness (Definition 7.10), which we include with an eye toward sampling from more general geometrically or topologically singular spaces. We have in mind stratified spaces (see Goresky and MacPherson (1988) or Pflaum (2001)), suitably metrized, noting that (for example) every real semialgebraic variety admits a canonical Whitney stratification with finitely many semialgebraic strata (Gibson et al., 1976, Section 2.7).
A motivating example of such stratified sample spaces comes from evolutionary biology, where the objects are phylogenetic trees. The space of such objects is CAT(0) (or equivalently, globally nonpositively curved) (Billera et al., 2001) and therefore has many desirable features where geometric probability is concerned (Sturm, 2003). Barden et al. (2013) treat the space $T_4$ of phylogenetic trees with four leaves. The singularity of $T_4$ at its central cone point is a (non-disjoint) union of a certain number of copies of an isolated planar singularity with angle sum $5\pi/2 > 2\pi$. Therefore some features of our results are present in the central limit theorem at the cone point of $T_4$ (Barden et al., 2013, Theorem 5.2), which identifies the support of the limit measure in each right-angled orthant as a cone over an interval. However, the limit measure additionally exhibits non-classical behavior at the boundary of its support, where mass concentrates on the edges and even more on the origin. The simpler nature of an isolated planar singularity, which lacks the global combinatorial complexity of tree space, allows us to discover these boundary components and characterize them by identifying the limit measure as the convex projection of a Gaussian distribution (Theorem 1.14).
While the strong laws of large numbers on quasi-metric spaces by Ziezold (1977) and on manifolds by Bhattacharya and Patrangenaru (2003) require the existence of a population mean, and hence square-integrability of the underlying law, fully sticky strong laws do not need a population mean: no square-integrability is required. Curiously, a (fully) sticky central limit theorem can consequently hold in the absence of any population mean at all (Example 2.4). That said, the greater challenge lies in the partly sticky case, where, as for the multivariate central limit theorem, for its analogues on manifolds by Landsman (1996, 1998) and Bhattacharya and Patrangenaru (2005), and on certain stratified spaces by Huckemann (2011), square-integrability is still required.
In addition to the theoretical interest in the asymptotic behavior of means on stratified spaces, another driving motivation is the need to devise inferential statistical methods based on the asymptotic behavior of Fréchet sample means and similar mean quantities, e.g. Holmes (2003); Aydın et al. (2009); Nye (2011); Skwerer et al. (2013). This type of development is exemplified, in the form of confidence intervals on the spider, by Hotz and Le (2014).
Many parts of this paper are rather technical, though elementary, and require the buildup of notation in Sections 3 and 4, as we fold the isolated planar singularity onto $\mathbb{R}^2$. The behavior of first moments under folding and rotation is essential to understand the limiting location of barycenters on the singular space $K$ (which we call the kale) and their limiting laws on $K$ as well as on $\mathbb{R}^2$, which are described by certain sectors where a first folded moment is non-negative.
where $\alpha > 2\pi$ is the angle sum at the isolated point $0$, called the origin, the sole point at which the metric is not locally Euclidean. Points are specified by polar coordinates $p = (r, \theta) \in K$ for a radius $r > 0$ and angle $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, and the origin is often expressed as $0 = (0, 0)$ or $0 = (0, \theta)$ for any $\theta \in \mathbb{R}/\alpha\mathbb{Z}$; that is, the origin is viewed as lying at zero radius along every ray emanating from it. The circle $\mathbb{R}/\alpha\mathbb{Z}$, a group under addition, has the natural uniform metric defined by
$$|\theta - \theta'|_\alpha = \min_{n \in \mathbb{Z}} |n\alpha + \theta - \theta'|.$$
Note that $|\theta - \theta'|_\alpha \le \alpha/2$. Denote by $d(p_1, p_2)$ the metric on $K$ defined, for $p_i = (r_i, \theta_i)$, by
$$d(p_1, p_2) = \begin{cases} \sqrt{r_1^2 + r_2^2 - 2 r_1 r_2 \cos|\theta_1 - \theta_2|_\alpha} & \text{if } |\theta_1 - \theta_2|_\alpha \le \pi, \\ r_1 + r_2 & \text{if } |\theta_1 - \theta_2|_\alpha \ge \pi. \end{cases}$$
When one of the points is the origin, so that one of the radii vanishes, both cases apply, and in that situation the distance equals the other radius. Geometrically, $K$ is the metric cone over a circle of length $\alpha$ placed at distance 1 from the cone point $0$.
If we allowed $\alpha = 2\pi$, then this construction would yield $K = \mathbb{R}^2$ with the Euclidean metric. If we allowed $\alpha < 2\pi$, then this construction would be a right circular ("ice cream") cone with angle sum $\alpha$ at the apex. The cases where the angle sum $\alpha$ is bigger than, equal to, or smaller than $2\pi$ correspond to the curvature at the origin being negative, flat, or positive, respectively. The name "kale" derives from the negative curvature of that particular leafy vegetable.
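Definition 1.2. If $I \subset \mathbb{R}/\alpha\mathbb{Z}$ is any interval of angles, define the sector $C_I = \{(r, \theta) \in K \mid r \ge 0 \text{ and } \theta \in I\}$, the cone over $I$ from the origin. (If $I$ is closed, then $C_I$ is a closed subset of $K$.)
Definition 1.3. For a fixed angle $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, the folding map $F_\theta : K \to \mathbb{R}^2$ centered at $\theta$ is given in polar coordinates on $K$ by
$$F_\theta(r, \theta') = \begin{cases} (0, 0) & \text{if } r = 0, \\ \big(r\cos(\theta' - \theta),\ r\sin(\theta' - \theta)\big) & \text{if } r > 0 \text{ and } |\theta' - \theta|_\alpha \le \pi, \\ (-r, 0) & \text{if } r > 0 \text{ and } |\theta' - \theta|_\alpha \ge \pi, \end{cases}$$
where in the second case $\theta' - \theta$ denotes the representative of the difference whose absolute value equals $|\theta' - \theta|_\alpha$.
When $|\theta' - \theta|_\alpha = \pi$ the second and third cases agree. A simple geometric description of the folding map is given in terms of light and shadow as follows; cf. also Figure 1. The set $I_\theta = \{(r, \theta') \in K \mid r > 0 \text{ and } |\theta' - \theta|_\alpha > \pi\} \subset K$ is the part of $K$ invisible from the angle $\theta$. The complement $K \setminus I_\theta$ is the part visible from $\theta$. The complement $K \setminus \overline{I}_\theta$ of the closure of the invisible part is fully visible, and the set $\overline{I}_\theta \setminus I_\theta$ of boundary points outside of $I_\theta$ is partly visible. The shadow of any set $A \subseteq \mathbb{R}/\alpha\mathbb{Z}$ is

Figure 1. Fix points $p \ne 0 \ne p'$ with angles $\theta$ and $\theta'$ on the kale $K$. Left: The shadow $I_\theta$ of $p$ is the interior of the sector of points whose shortest paths to $p$ pass through the origin. In other words, as seen from $p$, the origin casts the shadow $I_\theta$. All these points are invisible from $p$. (For future reference, with notation as in (1.18) and Lemma 4.3, including the upper dashed line gives $I^+_\theta$ and including the lower dashed line gives $I^-_\theta$.) Right: Under the folding map $F_\theta$ centered at angle $\theta$, the shadow collapses to the negative horizontal axis.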
The terminology referring to (in)visibility and shadow is motivated as follows. Imagine placing a light source at a point $p = (r, \theta)$. If rays of light (geodesics) in $K$ are obstructed by the origin, then $I_\theta$ is the set of points in the shadow cast by the origin. Alternatively, imagine light emanating from sources within $I_\theta$: an observer at $(r, \theta)$ is not able to resolve the image, since all light rays arriving at the observer have merged at the origin.
Remark 1.5. The folding map $F_\theta$ is the unique continuous map $K \to \mathbb{R}^2$ that preserves all distances from points on the ray at angle $\theta$ to other points in $K$; cf. Lemma 3.2. In particular, it preserves the distance to the origin. The folding map $F_\theta$ collapses the part of $K$ invisible from $\theta$ to the negative horizontal axis of $\mathbb{R}^2$ and maps the fully visible part of $K$ bijectively onto the complement of the closed negative horizontal axis.
The folding map $F_\theta$ is the "logarithm map" from $K$ to the tangent space at any point with positive radius along the ray at angle $\theta$. In smooth manifolds, log maps are right inverses of exponential maps; on a complete manifold the latter are globally defined on the tangent space at a point $p$, while the former are only defined on a neighborhood of $p$. Here, the singularity of the metric at $0 \in K$ prevents exp from being well defined, whereas uniqueness of geodesics in $K$ (that is, the absence of a cut locus) makes the log map globally defined on $K$.
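The metric $d$ and the folding map are easy to experiment with numerically. The following Python sketch is illustrative only: the helper names `signed_diff`, `kale_dist` and `fold` are ours, points are stored as `(r, angle)` pairs with angles read modulo $\alpha$, and the law of cosines is written in an algebraically equivalent, cancellation-free form. It checks the distance-preservation property stated in Remark 1.5 on random pairs in which one point lies on the ray at angle $\theta$.

```python
import math
import random


def signed_diff(phi, theta, alpha):
    """Representative of phi - theta in [-alpha/2, alpha/2)."""
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def kale_dist(p, q, alpha):
    """Cone metric d on K for points p = (r1, t1), q = (r2, t2)."""
    (r1, t1), (r2, t2) = p, q
    gap = abs(signed_diff(t2, t1, alpha))   # |t1 - t2|_alpha
    if gap >= math.pi:                      # the shortest path runs through the origin
        return r1 + r2
    # law of cosines, rewritten as (r1 - r2)^2 + 4 r1 r2 sin^2(gap/2) to avoid cancellation
    return math.sqrt((r1 - r2) ** 2 + 4 * r1 * r2 * math.sin(gap / 2) ** 2)


def fold(p, theta, alpha):
    """Folding map F_theta : K -> R^2 (Definition 1.3)."""
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    if abs(d) >= math.pi:                   # p is not fully visible from theta
        return (-r, 0.0)
    return (r * math.cos(d), r * math.sin(d))


def isometry_defect(alpha, trials=10_000, seed=0):
    """Largest |d(p, q) - |F_theta(p) - F_theta(q)|| with p on the ray at theta."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        theta = rng.uniform(0, alpha)
        p = (rng.uniform(0.1, 5.0), theta)                  # p lies on the ray at angle theta
        q = (rng.uniform(0.0, 5.0), rng.uniform(0, alpha))  # q is an arbitrary point of K
        folded = math.dist(fold(p, theta, alpha), fold(q, theta, alpha))
        worst = max(worst, abs(kale_dist(p, q, alpha) - folded))
    return worst


print(isometry_defect(alpha=2.5 * math.pi))  # of rounding-error size
```

Pairs in which neither point lies on the ray at $\theta$ need not keep their distance: two distinct points of the shadow $I_\theta$ at the same radius are both sent to the same point of the negative horizontal axis.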
1.2. Barycenters and folded first moments. Let µ be a Borel probability measure on K.
Our main results concern statistics of random points drawn independently from the measure $\mu$ on $K$. We assume throughout that $\mu$ satisfies the integrability condition
$$\int_K d(0, p)\, d\mu(p) < \infty. \qquad (1.2)$$
Because $K$ is not a linear space, the mean of a probability distribution on $K$ cannot be defined using addition, as it can be in $\mathbb{R}^2$. Instead, we use the notion of barycenter of a distribution $\mu$. If the second moment condition (square-integrability)
$$\int_K d(0, p)^2\, d\mu(p) < \infty \qquad (1.3)$$
holds, then the function
$$\Gamma(p) = \frac{1}{2} \int_K d(p, q)^2\, d\mu(q) \qquad (1.4)$$
is finite for all $p \in K$, and it has a unique minimizer (proved later, at Corollary 4.13). This leads to the following definition.
Definition 1.6. Under the second moment condition (1.3), the unique minimizer of $\Gamma$ is the barycenter of $\mu$, denoted by $\bar b$.
It is possible to extend this definition in a consistent way to the setting where only the integrability condition (1.2) holds for µ rather than the stronger square-integrability condition (1.3); see Definition 1.11. For now, we only say enough to state this generalization of Definition 1.6, postponing the full discussion to Section 4.
Under the folding map $F_\theta : K \to \mathbb{R}^2$, the measure $\mu$ pushes forward to a probability measure $\tilde\mu_\theta = \mu \circ F_\theta^{-1}$ on $\mathbb{R}^2$. The family of measures $\{\tilde\mu_\theta\}_{\theta \in \mathbb{R}/\alpha\mathbb{Z}}$ on $\mathbb{R}^2$ allows us to deduce properties of the measure $\mu$ on $K$. For points $z \in \mathbb{R}^2$, use Cartesian coordinates $z = (z_1, z_2)$; the context should prevent any confusion with the radial representation $(r, \theta)$ of points in $K$. Back in $\mathbb{R}^2$, denote by $e_1 = (1, 0)$ and $e_2 = (0, 1)$ the standard basis vectors, and by "$\cdot$" the standard inner product. The mean of $\tilde\mu_\theta$ in $\mathbb{R}^2$ can be defined in the usual way, as follows.
Definition 1.7. For $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, the first moment of $\mu$ folded about $\theta$ (or equivalently, the mean of $\mu$ folded about $\theta$) is
$$m_\theta = (m_{\theta,1}, m_{\theta,2}) = \int_{\mathbb{R}^2} z\, d\tilde\mu_\theta(z) = \int_K F_\theta(p)\, d\mu(p) \in \mathbb{R}^2.$$
The integrability condition (1.2) implies that the first moment $m_\theta$ is finite and that $\theta \mapsto m_\theta$ is continuous.
Definition 1.8. Fix a probability distribution $\mu$ on $K$ and let $\mathcal{K} \subset \mathbb{R}/\alpha\mathbb{Z}$ be the subset on which $m_{\theta,1} \ge 0$. The distribution $\mu$ is (i) fully sticky if $\mathcal{K}$ is empty; (ii) partly sticky if $\mathcal{K}$ is non-empty and $m_{\theta,1} = 0$ on its entirety; and (iii) nonsticky if $\mathcal{K}$ has non-empty interior and $m_{\theta,1} > 0$ on $\operatorname{int}(\mathcal{K})$.
The measure $\mu$ is sticky if it is either fully sticky or partly sticky. When $\mu$ is partly sticky, a direction $\theta$ is sticky if $m_{\theta,1} < 0$ and fluctuating if $m_{\theta,1} \ge 0$.
Notice that since $\theta \mapsto m_{\theta,1}$ is continuous, the set $\mathcal{K}$ from Definition 1.8 is always closed. To rule out certain pathologies, we always assume the following nondegeneracy condition.
Assumption 1.9. The measure $\mu$ is nondegenerate in the sense that
$$\mu(R_{\theta,\theta'}) < 1 \quad \text{for all } \theta, \theta' \in \mathbb{R}/\alpha\mathbb{Z} \text{ with } |\theta - \theta'|_\alpha \ge \pi, \qquad (1.5)$$
where, for angles $\theta, \theta' \in \mathbb{R}/\alpha\mathbb{Z}$, $R_{\theta,\theta'} = \{(r, \theta) \mid r \ge 0\} \cup \{(r, \theta') \mid r \ge 0\}$ is the union of the two rays at angles $\theta$ and $\theta'$.
If nondegeneracy does not hold, then $\mu(R_{\theta,\theta'}) = 1$ for some pair of angles $\theta, \theta' \in \mathbb{R}/\alpha\mathbb{Z}$ such that $|\theta - \theta'|_\alpha \ge \pi$: all of the mass is concentrated on two rays separated by an angle of at least $\pi$. Since $|\theta - \theta'|_\alpha \ge \pi$ means that $(1, \theta') \in \overline{I}_\theta$ (or equivalently that $(1, \theta) \in \overline{I}_{\theta'}$), it is not hard to show that this scenario is metrically equivalent to the case $K = \mathbb{R}$.
The terms fully sticky, partly sticky, and nonsticky in Definition 1.8 are mutually exclusive. The following result shows that under minimal assumptions every distribution is covered by one of these three cases; this is essentially Proposition 4.11.
Proposition 1.10. If $\mu$ is a probability measure on $K$ that is integrable (1.2) and nondegenerate (1.5), then $\mu$ is either fully sticky, partly sticky, or nonsticky. Furthermore, if $\mu$ is partly sticky, then the interval $[A, B]$ on which $m_{\theta,1} \ge 0$ has length $|A - B| < \pi$; if $\mu$ is nonsticky, then $|A - B| \le \pi$ and the function $\theta \mapsto m_{\theta,1}$ is strictly concave on its interior $(A, B)$.
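For a probability measure with finitely many atoms, Assumption 1.9 reduces to a finite check: the measure is degenerate exactly when at most one ray carries mass away from the origin, or when exactly two rays do and their angles are at least $\pi$ apart. The sketch below is a minimal illustration under these assumptions; the helper names are hypothetical, and atoms are grouped into rays by exact equality of their angles modulo $\alpha$.

```python
import math


def angle_gap(a, b, alpha):
    """|a - b|_alpha, the distance between two angles on the circle R / alpha Z."""
    d = (a - b) % alpha
    return min(d, alpha - d)


def is_nondegenerate(atoms, alpha):
    """Check Assumption 1.9 for a probability measure [(weight, (r, angle)), ...].

    Mass at the origin lies on every ray, so only the atoms with r > 0 matter;
    atoms are grouped into rays by exact equality of angle % alpha."""
    rays = {}
    for w, (r, phi) in atoms:
        if r > 0 and w > 0:
            key = phi % alpha
            rays[key] = rays.get(key, 0.0) + w
    if len(rays) <= 1:
        return False   # pair the (at most one) charged ray with a ray at angular distance pi
    if len(rays) >= 3:
        return True    # any two rays miss the mass carried by a third ray
    a, b = rays
    return angle_gap(a, b, alpha) < math.pi


alpha = 2.5 * math.pi
print(is_nondegenerate([(0.5, (1.0, 0.0)), (0.5, (2.0, math.pi + 0.3))], alpha))  # False
print(is_nondegenerate([(0.5, (1.0, 0.0)), (0.5, (2.0, math.pi - 0.3))], alpha))  # True
```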
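Given Proposition 1.10, the three cases of Definition 1.8 can be told apart by the sign of $\max_\theta m_{\theta,1}$: negative (fully sticky), zero (partly sticky), or positive (nonsticky). The following Python sketch evaluates $\theta \mapsto m_{\theta,1}$ on a uniform grid for a measure with finitely many atoms. The helper names and the three test measures are ours, chosen for illustration. A grid can miss a fluctuating interval that consists of a single angle, so the partly sticky test measure places that angle at $\theta = 0$, which lies on the grid.

```python
import math


def signed_diff(phi, theta, alpha):
    """Representative of phi - theta in [-alpha/2, alpha/2)."""
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def fold(p, theta, alpha):
    """Folding map F_theta : K -> R^2 for p = (r, angle)."""
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    if abs(d) >= math.pi:
        return (-r, 0.0)
    return (r * math.cos(d), r * math.sin(d))


def folded_moment(atoms, theta, alpha):
    """m_theta = sum of w * F_theta(p) over the atoms (w, p) of mu."""
    pts = [(w, fold(p, theta, alpha)) for w, p in atoms]
    return (sum(w * x for w, (x, _) in pts), sum(w * y for w, (_, y) in pts))


def classify(atoms, alpha, n_grid=3600, tol=1e-9):
    """Classify mu by the sign of max_theta m_{theta,1}, evaluated on a grid."""
    best = max(folded_moment(atoms, k * alpha / n_grid, alpha)[0] for k in range(n_grid))
    if best < -tol:
        return "fully sticky"      # m_{theta,1} < 0 in every direction
    if best <= tol:
        return "partly sticky"     # the maximum is (numerically) zero
    return "nonsticky"             # some direction has m_{theta,1} > 0


a1 = 2.5 * math.pi
partly = [(0.5, (1.0, 0.0)), (0.25, (1.0, math.pi)), (0.25, (1.0, a1 - math.pi))]
nonsticky = [(0.5, (1.0, 0.0)), (0.5, (1.0, 0.5))]
a2 = 3.5 * math.pi
fully = [(1 / 3, (1.0, k * a2 / 3)) for k in range(3)]

print(classify(partly, a1), classify(nonsticky, a1), classify(fully, a2))
# partly sticky nonsticky fully sticky
```

In the partly sticky example, $m_{\theta,1} = \tfrac14(\cos\theta - 1)$ for small $|\theta|$, so the fluctuating set is the single angle $\theta = 0$.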
We are now in a position to generalize the concept of barycenter in K to the setting where µ only satisfies the integrability condition (1.2) but not the square-integrability condition (1.3).
Definition 1.11. If the probability distribution $\mu$ satisfies (1.2) and is sticky (either fully or partly sticky), then set the mean of $\mu$ equal to the origin $0$. If $\mu$ is nonsticky, then set the mean of $\mu$ equal to the point $(m_{\theta^*,1}, \theta^*) \in K$, where $\theta^*$ maximizes the function $\theta \mapsto m_{\theta,1}$.
In light of Proposition 1.10, the mean of µ is well defined for all distributions that satisfy the integrability and nondegeneracy assumptions; the second moment condition used in the definition of the barycenter is not necessary to define a mean. In Corollary 4.13 we show that when the barycenter is defined, the mean of µ coincides with its barycenter.
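For finitely supported measures, Definition 1.11 suggests a direct numerical recipe: maximize $\theta \mapsto m_{\theta,1}$ and return the origin if the maximum is not positive. The sketch below compares that recipe with a brute-force evaluation of $\Gamma$ at random points. It assumes the normalization $\Gamma(p) = \frac12\int_K d(p,q)^2\,d\mu(q)$ (a constant factor does not move the minimizer), uses a grid search over $\theta$, and its helper names are hypothetical.

```python
import math
import random


def signed_diff(phi, theta, alpha):
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def kale_dist(p, q, alpha):
    (r1, t1), (r2, t2) = p, q
    gap = abs(signed_diff(t2, t1, alpha))
    if gap >= math.pi:
        return r1 + r2
    return math.sqrt((r1 - r2) ** 2 + 4 * r1 * r2 * math.sin(gap / 2) ** 2)


def fold(p, theta, alpha):
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    return (-r, 0.0) if abs(d) >= math.pi else (r * math.cos(d), r * math.sin(d))


def mean_on_kale(atoms, alpha, n_grid=20_000):
    """Mean of Definition 1.11 for atoms [(weight, (r, angle)), ...], via a grid over theta."""
    best_theta, best_m1 = 0.0, -math.inf
    for k in range(n_grid):
        theta = k * alpha / n_grid
        m1 = sum(w * fold(p, theta, alpha)[0] for w, p in atoms)
        if m1 > best_m1:
            best_theta, best_m1 = theta, m1
    return (0.0, 0.0) if best_m1 <= 0 else (best_m1, best_theta)


def frechet(p, atoms, alpha):
    """Gamma(p) = 1/2 * sum of w * d(p, q)^2 over the atoms (w, q)."""
    return 0.5 * sum(w * kale_dist(p, q, alpha) ** 2 for w, q in atoms)


def beats_random_points(atoms, alpha, trials=2000, seed=0):
    """True if Gamma at the computed mean is (numerically) no larger than at random points."""
    rng = random.Random(seed)
    g_mean = frechet(mean_on_kale(atoms, alpha), atoms, alpha)
    return all(g_mean <= frechet((rng.uniform(0, 3), rng.uniform(0, alpha)), atoms, alpha) + 1e-6
               for _ in range(trials))


a = 2.5 * math.pi
two = [(0.5, (1.0, 0.0)), (0.5, (1.0, 0.5))]
a2 = 3.5 * math.pi
fully = [(1 / 3, (1.0, k * a2 / 3)) for k in range(3)]
print(mean_on_kale(two, a), mean_on_kale(fully, a2))  # approximately (0.9689, 0.25), then (0.0, 0.0)
```

For the two-atom measure both points lie within angle $\pi$ of each other, so the mean is the ordinary planar mean $(\cos 0.25, 0.25)$ in polar coordinates; for the three-atom measure every folded first moment has negative first component and the mean is the origin.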
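1.3. Empirical measures and the law of large numbers. For a given set of points $\{p_n\}_{n=1}^N \subset K$, define the empirical measure
$$\mu_N = \frac{1}{N} \sum_{n=1}^N \delta_{p_n},$$
the averaged sum of unit point masses at the points $p_n$. This is a Borel probability measure on $K$, and all results of the previous section apply to $\mu_N$. Let $b_N = b(p_1, \dots, p_N)$ be the barycenter of $\mu_N$, which is uniquely defined by Corollary 4.13. For $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, write $\eta_{\theta,N} \in \mathbb{R}^2$ for the folded average
$$\eta_{\theta,N} = \frac{1}{N} \sum_{n=1}^N F_\theta(p_n).$$
The folded first moments of $\mu_N$, which we denote by $m^N_\theta \in \mathbb{R}^2$, are defined by
$$m^N_\theta = \int_K F_\theta(p)\, d\mu_N(p).$$
Comparing these formulas to (1.7), the folded average is evidently equivalent to the first moment of the empirical measure:
$$\eta_{\theta,N} = m^N_\theta \quad \text{for all } \theta \in \mathbb{R}/\alpha\mathbb{Z}. \qquad (1.8)$$
An important issue in our analysis is whether the folded average $\eta_{\theta,N}$ is close to the folded barycenter $F_\theta b_N$, that is, whether "averaging commutes with folding". These two points in $\mathbb{R}^2$ need not coincide; the relation between $\eta_{\theta,N}$ and $F_\theta b_N$ is addressed later in Lemma 4.15.
Henceforth, let $\{p_n\}_{n=1}^N$ be a collection of independent random points on $K$, each distributed according to $\mu$. More precisely, let $\{p_n(\omega) \mid n = 1, \dots, N\}$ be independent, identically distributed $K$-valued random variables, each distributed according to $\mu$ over a probability space $(\Omega, \mathcal{A}, \mathbb{P})$. Their barycenter $b_N(\omega) = b(p_1(\omega), \dots, p_N(\omega))$ is also a random variable taking values in $K$. For each $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, let $m^N_\theta = m^N_\theta(\omega)$ be the random first moments associated with the empirical measures $\mu_N = \mu_N(\omega) = \frac{1}{N} \sum_{n=1}^N \delta_{p_n(\omega)}$. As before, denote by $m_\theta$ the deterministic folded means of $\mu$ in Definition 1.7. For any angle $\theta$, the random vectors $F_\theta(p_n)$ are i.i.d. in $\mathbb{R}^2$ with mean $m_\theta$. By the usual strong law of large numbers for $\mathbb{R}^2$-valued random variables,
$$m^N_\theta \to m_\theta \quad \mathbb{P}\text{-almost surely as } N \to \infty, \text{ for each } \theta \in \mathbb{R}/\alpha\mathbb{Z}. \qquad (1.9)$$
Translating back into a law of large numbers in $K$ for the random barycenters $b_N$, the behavior in the first two cases is strikingly different from the typical law of large numbers in a Euclidean space. The following result is proved in Section 5.
Theorem 1.12 (Law of Large Numbers on $K$). Assume that $\mu$ satisfies the integrability condition (1.2). Exactly one of the following holds, depending on how sticky $\mu$ is.
(i) If $\mu$ is fully sticky, then there is a random integer $N^*(\omega)$ such that $b_N = 0$ for all $N \ge N^*(\omega)$, $\mathbb{P}$-almost surely.
(ii) If $\mu$ is partly sticky, then $b_N \to 0$ $\mathbb{P}$-almost surely; moreover, for every open interval $T \subset \mathbb{R}/\alpha\mathbb{Z}$ containing the interval $\mathcal{K}$ on which $m_{\theta,1} = 0$, there is a random integer $N^*(\omega)$ such that $b_N \in C_T$ for all $N \ge N^*(\omega)$, $\mathbb{P}$-almost surely.
(iii) If $\mu$ is nonsticky, then $b_N \to \bar b$ $\mathbb{P}$-almost surely, where $\bar b$ is the mean of $\mu$ from Definition 1.11.
The theorem implies that for every sticky direction $\theta$, the empirical mean $b_N$ stops fluctuating along the ray $\{(r, \theta) \mid r > 0\}$ after some random but finite $N^*$; this is the phenomenon that we refer to as "stickiness". In fluctuating directions, the empirical mean $b_N$ continues to vary as $N \to \infty$, although the magnitude of the movement goes to zero asymptotically.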
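A three-point sample already shows that averaging need not commute with folding. In the hypothetical sample below ($\alpha = 3.5\pi$ and three unit-radius points spaced $\alpha/3$ apart, so any two of them are more than $\pi$ apart), the first component of every folded first moment is negative. By Definition 1.11 and Corollary 4.13 the barycenter $b_N$ is therefore the origin, so $F_0 b_N = (0, 0)$, whereas the folded average is $\eta_{0,N} = (-1/3, 0)$. The helper names are ours.

```python
import math


def signed_diff(phi, theta, alpha):
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def fold(p, theta, alpha):
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    return (-r, 0.0) if abs(d) >= math.pi else (r * math.cos(d), r * math.sin(d))


def folded_average(points, theta, alpha):
    """eta_{theta,N}: the Euclidean average of the folded points F_theta(p_n)."""
    folded = [fold(p, theta, alpha) for p in points]
    n = len(folded)
    return (sum(x for x, _ in folded) / n, sum(y for _, y in folded) / n)


alpha = 3.5 * math.pi
sample = [(1.0, k * alpha / 3) for k in range(3)]

# Largest first component of eta_{theta,N} over a grid of directions theta.
max_m1 = max(folded_average(sample, k * alpha / 3600, alpha)[0] for k in range(3600))
eta0 = folded_average(sample, 0.0, alpha)

# max_m1 < 0, so b_N is the origin, yet eta0 is far from F_0(b_N) = (0, 0).
print(round(max_m1, 6), eta0)
```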
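The sticky law of large numbers is easy to observe in simulation. The sketch below is illustrative only: the helper names and the two atomic measures are ours, and the empirical barycenter is computed as the mean of Definition 1.11 of the empirical measure (which Corollary 4.13 identifies with $b_N$), by a grid search over $\theta$. For the fully sticky measure, $b_N$ lands exactly on the origin once $N$ is moderately large; for the nonsticky one it settles near the population mean $(\cos 0.25, 0.25)$.

```python
import math
import random


def signed_diff(phi, theta, alpha):
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def fold(p, theta, alpha):
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    return (-r, 0.0) if abs(d) >= math.pi else (r * math.cos(d), r * math.sin(d))


def empirical_atoms(atoms, n, rng):
    """Empirical measure of n i.i.d. draws from an atomic mu, stored as reweighted atoms."""
    idx = rng.choices(range(len(atoms)), weights=[w for w, _ in atoms], k=n)
    counts = [0] * len(atoms)
    for i in idx:
        counts[i] += 1
    return [(c / n, p) for c, (_, p) in zip(counts, atoms) if c > 0]


def barycenter(atoms, alpha, n_grid=2000):
    """Mean of Definition 1.11 of a finite measure, via a grid search over theta."""
    best_theta, best_m1 = 0.0, -math.inf
    for k in range(n_grid):
        theta = k * alpha / n_grid
        m1 = sum(w * fold(p, theta, alpha)[0] for w, p in atoms)
        if m1 > best_m1:
            best_theta, best_m1 = theta, m1
    return (0.0, 0.0) if best_m1 <= 0 else (best_m1, best_theta)


a_fully = 3.5 * math.pi
fully = [(1 / 3, (1.0, k * a_fully / 3)) for k in range(3)]
a_non = 2.5 * math.pi
nonsticky = [(0.5, (1.0, 0.0)), (0.5, (1.0, 0.5))]

rng = random.Random(1)
for n in (10, 100, 1000, 10_000):
    b_fully = barycenter(empirical_atoms(fully, n, rng), a_fully)
    b_non = barycenter(empirical_atoms(nonsticky, n, rng), a_non)
    print(n, b_fully, tuple(round(x, 3) for x in b_non))
```

For small $N$ the empirical weights of the fully sticky measure can be unbalanced enough to push $b_N$ off the origin; as the weights concentrate near $1/3$, every folded first moment becomes negative and $b_N$ sticks at $0$.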
1.4. Central Limit Theorems. The central limit theorems in this section describe the asymptotic behavior of the properly normalized fluctuations of b N about the mean of µ. Due to the non-standard nature of the sticky law of large numbers, it is not surprising that the central limit theorem also takes a different form in sticky cases. Even in the nonsticky case, the central limit theorem is non-standard. Each of the three possibilities in Proposition 1.10 is covered in a separate theorem; these three theorems are proved in Section 6.
1.4.1. Fully sticky case. The simplest case is the fully sticky case, where there are asymptotically no fluctuations in any direction. On $K$ define the scaling $\beta(r, \theta) = (\beta r, \theta)$ for arbitrary $\beta \ge 0$, so that $F_{\theta'}(\beta(r, \theta)) = \beta F_{\theta'}(r, \theta)$ for all $\theta, \theta' \in \mathbb{R}/\alpha\mathbb{Z}$ and $r, \beta \ge 0$. Let $\nu_N$ denote the distribution of the rescaled empirical means:
$$\nu_N(U) = \mathbb{P}\big(\sqrt{N}\, b_N \in U\big) \qquad (1.10)$$
for all Borel sets $U \subset K$.
Theorem 1.13. If a probability measure $\mu$ on $K$ is fully sticky, then the rescaled empirical mean measures $\{\nu_N\}_{N=1}^\infty$ from (1.10) converge in the total variation norm (and hence weakly) to the point measure $\delta_0$ as $N \to \infty$. In particular, for any bounded measurable function $\phi : K \to \mathbb{R}$,
$$\lim_{N \to \infty} \mathbb{E}\big[\phi(\sqrt{N}\, b_N)\big] = \phi(0).$$
In this fully sticky case, the term "central limit theorem" is a bit of a misnomer, since there are no asymptotic fluctuations. In fact, Theorem 1.13 would still be true if we replaced $\sqrt{N}$ in (1.10) with any other positive scaling factor depending on $N$. The next two cases require a bit more notation and setup.

1.4.2. Partly sticky case. Assume the second moment condition (1.3). Since the mean $\bar b$ of $\mu$ lies at the origin $0$ in the partly sticky case, again consider the rescaled empirical measure $\nu_N$ defined by (1.10). The limit of $\nu_N$ is another measure on $K$, constructed as follows.
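Recall that $\mathcal{K} = [A, B]$ is the interval of fluctuating directions (Definition 1.8); let $\theta^*$ be its midpoint and $\rho = (B - A)/2$ its half-length, so that $0 \le \rho < \pi/2$ by Proposition 1.10. Let $g$ denote the law of the multivariate normal random variable on $\mathbb{R}^2$ having mean zero and covariance matrix
$$\Sigma = \int_K \big(F_{\theta^*}(p) - m_{\theta^*}\big)\big(F_{\theta^*}(p) - m_{\theta^*}\big)^{\top} d\mu(p). \qquad (1.12)$$
This matrix is well defined due to the square-integrability condition (1.3). Denote by $D_\rho \subset \mathbb{R}^2$ the closed sector
$$D_\rho = \{(r\cos\vartheta, r\sin\vartheta) \in \mathbb{R}^2 \mid r \ge 0 \text{ and } |\vartheta| \le \rho\} = F_{\theta^*}(C_{[A,B]}), \qquad (1.13)$$
and by $\tilde P_\rho : \mathbb{R}^2 \to D_\rho$ the convex projection
$$\tilde P_\rho(q) = \arg\min_{z \in D_\rho} \|q - z\|. \qquad (1.14)$$
Since $F_{\theta^*}$ restricts to a bijection from $C_{[A,B]}$ onto $D_\rho$, the measure $h_{\theta^*}$ on $K$ is obtained by transporting back to $C_{[A,B]}$ the pushforward of the normal measure $g$, whose covariance matrix is defined in (1.12), under the projection $\tilde P_\rho$ to $D_\rho$:
$$h_{\theta^*}(U) = g\Big(\tilde P_\rho^{-1}\big(F_{\theta^*}(U \cap C_{[A,B]})\big)\Big) \quad \text{for Borel sets } U \subset K. \qquad (1.15)$$
Figure 2 illustrates the construction in an example.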
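The convex projection $\tilde P_\rho$ in (1.14) has a simple closed form, because $D_\rho$ is a convex cone bounded by two rays: points inside the sector are fixed, points in the polar cone go to the origin, and all other points are projected orthogonally onto the nearer edge. The sketch below (hypothetical helper names) implements this and estimates how a Gaussian is split among the origin, the two edges and the interior of $D_\rho$. For illustration only, $g$ is taken to be the isotropic standard Gaussian, whereas in Theorem 1.14 its covariance is given by (1.12). In the isotropic case the exact fractions are $1/2 - \rho/\pi$, $1/2$ and $\rho/\pi$; with $\rho = 0$ the origin and the ray each receive mass $1/2$, consistent with Example 2.2.

```python
import math
import random


def project_to_sector(q, rho):
    """Euclidean projection of q onto D_rho = {(r cos t, r sin t) : r >= 0, |t| <= rho}, 0 <= rho < pi/2."""
    x, y = q
    if x == 0.0 and y == 0.0:
        return (0.0, 0.0)
    if abs(math.atan2(y, x)) <= rho:
        return (x, y)                                         # already in the sector
    ux, uy = math.cos(rho), math.copysign(math.sin(rho), y)   # edge on the same side as q
    t = x * ux + y * uy
    if t <= 0.0:
        return (0.0, 0.0)                                     # q lies in the polar cone of D_rho
    return (t * ux, t * uy)                                   # orthogonal projection onto that edge


def split_of_gaussian(rho, n=100_000, seed=0):
    """Monte Carlo fractions of an isotropic standard Gaussian sent to origin, edges, interior."""
    rng = random.Random(seed)
    origin = edges = interior = 0
    for _ in range(n):
        q = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        z = project_to_sector(q, rho)
        if z == (0.0, 0.0):
            origin += 1
        elif z == q:
            interior += 1
        else:
            edges += 1
    return origin / n, edges / n, interior / n


print(split_of_gaussian(math.pi / 6))  # roughly (1/3, 1/2, 1/6)
```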
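Theorem 1.14. If a measure $\mu$ on $K$ is partly sticky and square-integrable (1.3), then the rescaled empirical mean measures $\{\nu_N\}_{N=1}^\infty$ from (1.10) converge weakly to the measure $h_{\theta^*}$ from (1.15) as $N \to \infty$, where $\theta^*$ is the midpoint of the interval $\mathcal{K} = [A, B]$ in Definition 1.8. That is, for any continuous, bounded function $\phi : K \to \mathbb{R}$,
$$\lim_{N \to \infty} \mathbb{E}\big[\phi(\sqrt{N}\, b_N)\big] = \int_K \phi\, dh_{\theta^*}. \qquad (1.16)$$
The measure $h_{\theta^*}$ is supported on the closed sector $C_{[A,B]}$. The limit distribution $h_{\theta^*}$ can be decomposed into a singular part and an absolutely continuous part: $h_{\theta^*} = h_{\mathrm{sing}} + h_{\mathrm{abs}}$. The absolutely continuous part is the restriction of a Gaussian to the set $\operatorname{int}(C_{[A,B]})$; if $A = B$, then $C_{[A,B]}$ has no interior and $h_{\mathrm{abs}} = 0$. The singular part $h_{\mathrm{sing}}$ is supported on the boundary $\partial C_{[A,B]}$, and it includes an atom $w\,\delta_0$ at the origin whose weight, with $\rho = (B - A)/2$, is
$$w = g\big(\{(r\cos\vartheta, r\sin\vartheta) \in \mathbb{R}^2 \mid r > 0 \text{ and } \rho + \pi/2 \le |\vartheta| \le \pi\}\big),$$
the $g$-mass of the set of points that the convex projection onto $D_\rho$ sends to the origin.

Figure 2. The sector $C_{[A,B]}$ is that sector in $K$ centered at $\theta^*$ that is spanned by the angles $\theta$ for which $m_{\theta,1} = 0$. For $N$ larger than a finite but random number, … The sector $D_\rho$ is the image of $C_{[A,B]}$ under the folding map centered at $\theta^*$. With a Gaussian $g$ centered at $0 \in \mathbb{R}^2$, up to this bijection, the limiting measure is $g$ on $\operatorname{int}(D_\rho)$ together with the pushforward of $g$ on $\mathbb{R}^2 \setminus D_\rho$ to $\partial D_\rho$ under the convex projection $\tilde P_\rho$. The dashed arrows show the directions of this convex projection.
However, not all of the mass in the singular part lies at the origin; $h_{\mathrm{sing}}$ also distributes mass continuously on the edges of the sector $C_{[A,B]}$.

1.4.3. Nonsticky case. When $\mu$ is nonsticky, the mean of $\mu$ is $\bar b = (r^*, \theta^*) \in K$, where $r^* = m_{\theta^*,1} > 0$ and $\theta^*$ is the unique angle for which $m_{\theta^*,1} = \max_\theta m_{\theta,1}$.
In particular, this means that $\bar b \ne 0$, so the limit measure obtained by renormalizing fluctuations of $b_N$ lives on the tangent space at $\bar b$, which is isomorphic to $\mathbb{R}^2$, not on $K$ as in the sticky cases.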
With $\theta^*$ fixed, the family of random variables $\{m^N_{\theta^*}\}_{N=1}^\infty$ satisfies a standard central limit theorem in $\mathbb{R}^2$. Specifically, let $g$ be the law of a multivariate normal random variable on $\mathbb{R}^2$ with zero mean and covariance matrix
$$\Sigma = \int_K \big(F_{\theta^*}(p) - m_{\theta^*}\big)\big(F_{\theta^*}(p) - m_{\theta^*}\big)^{\top} d\mu(p).$$
This matrix is well defined under the square-integrability condition (1.3). The standard central limit theorem implies that as $N \to \infty$ the law of the random variable $\sqrt{N}\,(m^N_{\theta^*} - m_{\theta^*})$ in $\mathbb{R}^2$ converges weakly to $g$.
Although it is reasonable to expect that $F_{\theta^*} b_N$ would satisfy the same central limit theorem, this might in fact not be the case, depending on whether the closed shadow $\overline{I}_{\theta^*}$ carries mass. Define $\kappa \ge 0$ to be the random variable where (cf. Figure 1 for converge weakly to $g$ as $N \to \infty$. That is, for any continuous, bounded function $\varphi : \mathbb{R}^2 \to \mathbb{R}$, is not Gaussian in the limit; see Example 2.5.
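The covariance matrix in the nonsticky case is the ordinary covariance of the folded points $F_{\theta^*}(p)$ under $\mu$. As a sketch (hypothetical helper names, and a hand-picked two-atom measure whose mean direction $\theta^* = 0.25$ is known in closed form), the code below computes this matrix exactly and compares it with a Monte Carlo estimate of the covariance of $\sqrt{N}\,(m^N_{\theta^*} - m_{\theta^*})$.

```python
import math
import random


def signed_diff(phi, theta, alpha):
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def fold(p, theta, alpha):
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    return (-r, 0.0) if abs(d) >= math.pi else (r * math.cos(d), r * math.sin(d))


def folded_covariance(atoms, theta, alpha):
    """Mean m_theta and covariance of F_theta(p) under an atomic mu [(weight, (r, angle)), ...]."""
    pts = [(w, fold(p, theta, alpha)) for w, p in atoms]
    mx = sum(w * x for w, (x, _) in pts)
    my = sum(w * y for w, (_, y) in pts)
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in pts)
    syy = sum(w * (y - my) ** 2 for w, (_, y) in pts)
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in pts)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def simulated_covariance(atoms, theta, alpha, n=200, reps=2000, seed=0):
    """Sample covariance of sqrt(n) * (m^n_theta - m_theta) over independent replications."""
    rng = random.Random(seed)
    (mx, my), _ = folded_covariance(atoms, theta, alpha)
    folded = [fold(p, theta, alpha) for _, p in atoms]
    weights = [w for w, _ in atoms]
    zs = []
    for _ in range(reps):
        draws = rng.choices(folded, weights=weights, k=n)
        zx = math.sqrt(n) * (sum(x for x, _ in draws) / n - mx)
        zy = math.sqrt(n) * (sum(y for _, y in draws) / n - my)
        zs.append((zx, zy))
    # the exact mean of each z is 0, so second moments are not re-centered
    cx = sum(zx * zx for zx, _ in zs) / reps
    cy = sum(zy * zy for _, zy in zs) / reps
    cxy = sum(zx * zy for zx, zy in zs) / reps
    return ((cx, cxy), (cxy, cy))


alpha = 2.5 * math.pi
two = [(0.5, (1.0, 0.0)), (0.5, (1.0, 0.5))]
theta_star = 0.25
print(folded_covariance(two, theta_star, alpha)[1])
print(simulated_covariance(two, theta_star, alpha))  # close to the exact matrix above
```

Here both atoms are folded to points with the same first coordinate $\cos 0.25$, so all of the variance sits in the second coordinate and equals $\sin^2 0.25$.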

Examples
Here are a few examples illustrating some phenomena described by the limit theorems.
Example 2.1 (Partly sticky). Fix $\alpha > 2\pi$ and $\theta^* \in \mathbb{R}/\alpha\mathbb{Z}$. Let $K \ge 3$ be an odd integer. Let $\mu$ be the sum of $K$ atoms having mass $1/K$ at the points That is, In this case $m_{\theta,1} \le 0$ for all $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, while $m_{\theta,1} = 0$ if and only if $|\theta - \theta^*|_\alpha \le \pi/K$. The limit distribution $h_{\theta^*}$ is supported on the sector $C_{[\theta^* - \pi/K,\, \theta^* + \pi/K]}$, including a singular part at the origin with weight 1/2 − k/π 2 and on ∂C [− π

Figure 3. Example 2.1 in the case $K = 5$.
In the limit, Example 2.1 gives the following.
Example 2.2 (Partly sticky with singular limit measure). Fix $\alpha > 2\pi$ and $\theta^* \in \mathbb{R}/\alpha\mathbb{Z}$. Suppose $\mu$ is uniform on the set Then $m_{\theta,1} \le 0$ for all $\theta \in \mathbb{R}/\alpha\mathbb{Z}$, while $m_{\theta,1} = 0$ only for $\theta = \theta^*$. The limit distribution $h_{\theta^*}$ puts an atom of mass 1/2 at the origin and half a Gaussian on the ray $\{(r, \theta^*) \mid r > 0\}$. In particular, $h_{\theta^*}$ has no absolutely continuous part. As in Example 2.1, the limit distribution does not vary with $\alpha$, provided that $\alpha > 2\pi$.
Example 2.3 (Embedding the spider). Suppose $\alpha > K\pi$. Then there are angles $\theta_k \in \mathbb{R}/\alpha\mathbb{Z}$ for $k = 1, \dots, K$ such that $|\theta_k - \theta_j|_\alpha > \pi$ for all $j \ne k$. Working with measures supported on the union of the rays at angles $\theta_1, \dots, \theta_K$ is equivalent to working with probability distributions on the spider with $K$ legs (that is, an open book of dimension 1 with $K$ leaves; cf. Hotz et al. (2013)) by mapping the ray $\{(r, \theta_k) \in K \mid r > 0\}$ to a leg of the spider.
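The embedding in Example 2.3 can be made concrete. When $\alpha > K\pi$, one admissible choice (ours, for illustration) is the equally spaced angles $k\alpha/K$, which are pairwise more than $\pi$ apart; the kale distance between points on these rays then coincides with the spider distance, namely $|r - s|$ along a common leg and $r + s$ across legs. A sketch with hypothetical helper names:

```python
import math
import random
from itertools import combinations


def signed_diff(phi, theta, alpha):
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def kale_dist(p, q, alpha):
    (r1, t1), (r2, t2) = p, q
    gap = abs(signed_diff(t2, t1, alpha))
    if gap >= math.pi:
        return r1 + r2
    return math.sqrt((r1 - r2) ** 2 + 4 * r1 * r2 * math.sin(gap / 2) ** 2)


def spider_angles(alpha, K):
    """Equally spaced angles k * alpha / K; pairwise more than pi apart when alpha > K * pi."""
    if alpha <= K * math.pi:
        raise ValueError("need alpha > K * pi")
    return [k * alpha / K for k in range(K)]


def legs_are_separated(alpha, K):
    thetas = spider_angles(alpha, K)
    return all(abs(signed_diff(b, a, alpha)) > math.pi for a, b in combinations(thetas, 2))


def spider_dist(leg1, r1, leg2, r2):
    """Metric of the K-legged spider: along one leg, or through the body otherwise."""
    return abs(r1 - r2) if leg1 == leg2 else r1 + r2


def max_embedding_error(alpha, K, trials=5000, seed=0):
    """Largest gap between the kale metric and the spider metric on random point pairs."""
    rng = random.Random(seed)
    thetas = spider_angles(alpha, K)
    worst = 0.0
    for _ in range(trials):
        i, j = rng.randrange(K), rng.randrange(K)
        r1, r2 = rng.uniform(0, 5), rng.uniform(0, 5)
        d_kale = kale_dist((r1, thetas[i]), (r2, thetas[j]), alpha)
        worst = max(worst, abs(d_kale - spider_dist(i, r1, j, r2)))
    return worst


print(legs_are_separated(3.2 * math.pi, 3), max_embedding_error(3.2 * math.pi, 3))
```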

Folding isolated hyperbolic planar singularities
This section elaborates on the geometric structure of the kale K defined in (1.1).
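Lemma 3.1 (Openness of visibility). If $p$ is fully visible from the angle $\theta_0$, then it is fully visible from all $\theta$ sufficiently close to $\theta_0$. The same is true for invisibility.
Proof. The sets $I_\theta$ and $K \setminus \overline{I}_\theta$ are open. More explicitly, for $p = (r, \theta')$ with $r > 0$, full visibility from $\theta$ means $|\theta' - \theta|_\alpha < \pi$ and invisibility means $|\theta' - \theta|_\alpha > \pi$; both conditions are open in $\theta$.
Recall that $d_2 : \mathbb{R}^2 \times \mathbb{R}^2 \to [0, \infty)$ denotes the Euclidean metric on $\mathbb{R}^2$. The following lemma follows easily from the definitions of $F_\theta$ and the metric $d$ on $K$.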

Barycenters and first moments of probability measures on the kale
This section describes properties of the functions $\theta \mapsto m_\theta$ and $\theta \mapsto m^N_\theta$; the behavior of these functions aids in understanding how the barycenters $b_N$ behave in the limit $N \to \infty$. Recall that the barycenter is the minimizer of $\Gamma(p)$, defined in (1.4). To motivate what comes next and better explain the connection between barycenters and the first component $m_{\theta,1}$ of folded means $m_\theta$, we recall the analogous calculation for $\mathbb{R}^n$. Define $\gamma : \mathbb{R}^n \to [0, \infty]$ by
$$\gamma(x) = \frac{1}{2} \int_{\mathbb{R}^n} \|x - y\|^2\, d\nu(y)$$
for a given probability measure $\nu$ on $\mathbb{R}^n$. The barycenter of $\nu$ in this Euclidean setting is the point $x \in \mathbb{R}^n$ that minimizes $\gamma(x)$. Observe that, for $x \ne 0$,
$$\gamma(x) = \gamma(0) + \frac{1}{2}\|x\|^2 - \|x\| \int_{\mathbb{R}^n} \hat x \cdot y\, d\nu(y), \qquad (4.21)$$
where $\hat x = x/\|x\|$ is the unit vector in the direction of $x$. Hence if $\nu$ is square-integrable, so that $\gamma(0) < \infty$, then the minimizer of $\gamma$ lies in the direction $\hat x$ that maximizes
$$\int_{\mathbb{R}^n} \hat x \cdot y\, d\nu(y) = \hat x \cdot m, \qquad (4.22)$$
and at a distance from the origin equal to the maximum value of (4.22). Here $m \in \mathbb{R}^n$ is the mean of $\nu$. Hence if $\hat x^*$ is the maximizing direction, then the barycenter can be written in polar coordinates $(r, \hat x)$ as $(m \cdot \hat x^*, \hat x^*)$. Since (4.22) is maximized by $\hat x^* = m/\|m\|$ whenever $m \ne 0$, it follows that the solution is the usual mean in Euclidean space. Even when the term $\gamma(0)$ in (4.21) is infinite, it is reasonable to take this as the definition of mean. To make the maximization of (4.22) well defined, one only needs to assume $\nu$ is integrable rather than square-integrable. A similar calculation can be done in the kale setting. Since the folding map rotates the direction $\theta$ back to the direction $e_1$ in the Euclidean plane, $m_{\theta,1}$ is exactly analogous to (4.22). The following lemma proves the expression analogous to (4.21) in the setting of $K$.
Recalling the definition of $w^\pm(\theta)$ from (1.18), observe that $0 \le w^\pm(\theta) \le \bar r$ holds for all $\theta$ because the integrand is nonnegative and $I^\pm_\theta \subset K$. Also, as a consequence of (4.25), Since $\mu$ is a probability measure, due to $\sigma$-additivity only countably many of the open rays $\{(r, \theta) \mid 0 < r < \infty\}$ for $\theta \in \mathbb{R}/\alpha\mathbb{Z}$ carry positive $\mu$-mass; of course, this includes in particular the case where $\mu$ has a density. Consequently, $w^+$ and $w^-$ are continuous and identical almost everywhere with respect to the measure on $\mathbb{R}/\alpha\mathbb{Z}$ induced by Lebesgue measure on $[0, \alpha)$, and so is $\theta \mapsto D^\pm m_{\theta,2}$.
Proposition 4.11. Assuming integrability (1.2) and nondegeneracy (Assumption 1.9), the subset of $\mathbb{R}/\alpha\mathbb{Z}$ on which $m_{\theta,1} \ge 0$ is a closed interval that is exactly one of the following: (i) empty, (ii) of length $< \pi$, with $m_{\theta,1} = 0$ on its entirety, or (iii) of length $\le \pi$, with $m_{\theta,1}$ strictly concave (and hence strictly positive) on its interior.
The length of the interval depends on µ as well as on α.
Finally, assume $\max_\theta m_{\theta,1} = 0$. Fix a left boundary point $A$ and a right boundary point $B$ of $\mathcal{K}$; note that $A = B$ is possible. Corollary 4.10 again teaches that $B - A \le \pi$. By hypothesis, Hence (4.27) takes the forms These formulas, plus the choices of $A$ and $B$ as left and right endpoints, imply that $m_{\theta,1} < 0$ for all $\theta \in [A - \pi, A) \cup (B, B + \pi]$. In words, every left endpoint of $\mathcal{K}$ is preceded by, and every right endpoint of $\mathcal{K}$ is followed by, an interval of length at least $\pi$ on which $m_{\theta,1} < 0$. Since $|A - B| \le \pi$, the interval $[A, B]$ contains no endpoints of $\mathcal{K}$ other than $A$ and $B$ themselves. Therefore $m_{\theta,1} = 0$ for all $\theta \in [A, B]$. Corollary 4.10 prevents $m_{\theta,1} \ge 0$ for $\theta$ outside of $[A - \pi, B + \pi]$. Except for showing the strict inequality $|B - A| < \pi$, this completes the proof that $\max_\theta m_{\theta,1} = 0$ forces conclusion (ii). Suppose, then, that $|B - A| = \pi$. Corollary 4.4 implies that $\mu(C_{[A,B]}) = 1$. If $\theta^*$ is the midpoint of the interval $[A, B]$, then the measure $\tilde\mu_{\theta^*}$ is supported on the half-plane $H^+ = \{z \in \mathbb{R}^2 \mid z_1 \ge 0\} = F_{\theta^*}(C_{[A,B]})$. Since $\int z_1\, d\tilde\mu_{\theta^*}(z) = m_{\theta^*,1} = 0$ and $z_1 \ge 0$ on $H^+$, the measure $\tilde\mu_{\theta^*}$ is actually supported on the single line $\partial H^+$, that is, $\mu(R_{A,B}) = 1$. This contradicts the nondegeneracy hypothesis. Therefore $|B - A| < \pi$, as desired.
In either case, the mean of µ in Definition 1.11 coincides with the barycenter of µ.
Proof. Use the explicit expression for $\Gamma(p)$ from Lemma 4.1 and Corollary 4.12; minimize over $r$ and $\theta$.
We conclude this section with important estimates relating the folded averages $\eta_{\theta,N}$ from (1.7) to the folded barycenters $F_\theta b_N$ of empirical distributions on $K$.
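The explicit expression for $\Gamma$ can be sanity-checked numerically. Directly from the metric $d$ and the folding map, $d((r, \theta), q)^2 = r^2 + d(0, q)^2 - 2r\, F_\theta(q)_1$ for every $q \in K$. With the normalization $\Gamma(p) = \frac12 \int_K d(p, q)^2\, d\mu(q)$ (an assumption for this sketch; another positive constant rescales $\Gamma$ without moving its minimizer) one therefore expects $\Gamma(r, \theta) = \Gamma(0) + \frac12 r^2 - r\, m_{\theta,1}$. This is our own derivation, offered as a plausible reading of Lemma 4.1 rather than a quotation of it. The Python sketch below (hypothetical helper names, random atomic measures) compares both sides.

```python
import math
import random


def signed_diff(phi, theta, alpha):
    return (phi - theta + alpha / 2) % alpha - alpha / 2


def kale_dist(p, q, alpha):
    (r1, t1), (r2, t2) = p, q
    gap = abs(signed_diff(t2, t1, alpha))
    if gap >= math.pi:
        return r1 + r2
    return math.sqrt((r1 - r2) ** 2 + 4 * r1 * r2 * math.sin(gap / 2) ** 2)


def fold(p, theta, alpha):
    r, phi = p
    if r == 0:
        return (0.0, 0.0)
    d = signed_diff(phi, theta, alpha)
    return (-r, 0.0) if abs(d) >= math.pi else (r * math.cos(d), r * math.sin(d))


def frechet_identity_defect(alpha, n_atoms=7, trials=500, seed=0):
    """Max |Gamma(r, theta) - (Gamma(0) + r^2/2 - r * m_{theta,1})| over random inputs."""
    rng = random.Random(seed)
    raw = [rng.random() for _ in range(n_atoms)]
    total = sum(raw)
    atoms = [(w / total, (rng.uniform(0, 5), rng.uniform(0, alpha))) for w in raw]
    gamma0 = 0.5 * sum(w * q[0] ** 2 for w, q in atoms)   # d(0, q) is the radius of q
    worst = 0.0
    for _ in range(trials):
        r, theta = rng.uniform(0, 4), rng.uniform(0, alpha)
        gamma = 0.5 * sum(w * kale_dist((r, theta), q, alpha) ** 2 for w, q in atoms)
        m1 = sum(w * fold(q, theta, alpha)[0] for w, q in atoms)
        worst = max(worst, abs(gamma - (gamma0 + 0.5 * r * r - r * m1)))
    return worst


print(frechet_identity_defect(2.5 * math.pi))  # of rounding-error size
```

Minimizing $\frac12 r^2 - r\, m_{\theta,1}$ over $r \ge 0$ gives $r = \max(m_{\theta,1}, 0)$, which is how the mean of Definition 1.11 arises as the minimizer of $\Gamma$.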

Proof of the sticky law of large numbers
The standard law of large numbers for folded averages in $\mathbb{R}^2$ states that $m^N_\theta \to m_\theta$ as $N \to \infty$. It holds uniformly in $\theta$, as follows.
Theorem 5.2. Let $T \subset \mathbb{R}/\alpha\mathbb{Z}$ be a closed subset such that $m_{\theta,1} < 0$ for all $\theta \in T$. Then there is a random integer $N^*(\omega)$ such that $b_N(\omega) \notin C^+_T$ for all $N \ge N^*(\omega)$, $\mathbb{P}$-almost surely. In particular, if $\mu$ is fully sticky, then there is a random integer $N^*(\omega)$ such that $b_N = 0$ for all $N \ge N^*(\omega)$, $\mathbb{P}$-almost surely. Similarly, if $\mu$ is partly sticky and $T \subset \mathbb{R}/\alpha\mathbb{Z}$ is any open interval containing the maximal interval where $m_{\theta,1} = 0$, as described in Propositions 1.10 and 4.11, then $b_N \in C_T$ for all $N \ge N^*(\omega)$, $\mathbb{P}$-almost surely.
Proof. Since $T$ is closed and $\theta \mapsto m_{\theta,1}$ is continuous, there is $\epsilon > 0$ such that $\sup_{\theta \in T} m_{\theta,1} < -\epsilon < 0$. By Lemma 5.1 there is a random integer $N^*(\omega)$ such that, almost surely, $m^N_{\theta,1} < -\epsilon/2$ for all $\theta \in T$ and all $N \ge N^*(\omega)$. Now, $b_N$ is the unique minimizer of
$$\Gamma_N(p) = \frac{1}{2} \int_K d(p, q)^2\, d\mu_N(q).$$
Since the empirical measures $\{\mu_N\}_{N=1}^\infty$ are square-integrable (even if $\mu$ is not),
$$\Gamma_N(r, \theta) = \Gamma_N(0) + \frac{1}{2} r^2 - r\, m^N_{\theta,1}$$
by Lemma 4.1. Therefore, if $\theta \in T$, $r > 0$, and $N \ge N^*(\omega)$, then almost surely
$$\Gamma_N(r, \theta) - \Gamma_N(0) = \frac{1}{2} r^2 - r\, m^N_{\theta,1} > \frac{1}{2} r^2 + \frac{\epsilon r}{2} > 0.$$
Hence the minimizer $b_N$ lies outside of $C^+_T$ almost surely.
By a very similar argument, Corollary 4.13 and Lemma 5.1 together imply the following, which we state without proof. It is also a consequence of the strong law of Ziezold (1977).
We now give the proof of the law of large numbers on $K$ (Theorem 1.12) by collecting various results we have already proved.
Proof of Theorem 1.12. The fully sticky case is immediate from Theorem 5.2. Consider the partly sticky case. By Corollary 4.13 applied to the empirical measure $\mu_N$, the empirical barycenter is $b_N = (m^N_{\theta_N,1}, \theta_N)$ whenever $\max_\theta m^N_{\theta,1} > 0$, where $\theta_N$ maximizes $\theta \mapsto m^N_{\theta,1}$, and $b_N = 0$ otherwise. Combining this fact with Lemma 5.1 leads to the conclusion that
$$\limsup_{N \to \infty} d(0, b_N) \le \max\Big(0, \max_\theta m_{\theta,1}\Big)$$
holds $\mathbb{P}$-almost surely. In the partly sticky case, $m_{\theta,1} \le 0$ for all $\theta$. Thus $b_N \to 0$ holds $\mathbb{P}$-almost surely. The other statements in the partly sticky case follow from Theorem 5.2. Finally, consider the nonsticky case. Convergence $b_N \to \bar b$ again follows from the representation $b_N = (m^N_{\theta_N,1}, \theta_N)$, where $\theta_N$ maximizes $\theta \mapsto m^N_{\theta,1}$. By Lemma 5.1, $\mathbb{P}$-almost surely any maximizer $\theta_N$ of $\theta \mapsto m^N_{\theta,1}$ converges, as $N \to \infty$, to the maximizer of $\theta \mapsto m_{\theta,1}$, which is unique in the nonsticky case. By definition of $\bar b$, this implies that $b_N \to \bar b$, $\mathbb{P}$-almost surely.

Proofs of the central limit theorems
This section contains proofs of the three central limit theorems: Theorem 1.13, Theorem 1.14, and Theorem 1.15. First comes the fully sticky case, which follows almost immediately from Theorem 1.12.
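Proof of Theorem 1.13. Let $N^*$ be the random integer from Theorem 1.12, which has the property that, $\mathbb{P}$-almost surely, $b_N(\omega) = 0$ for all $N \ge N^*(\omega)$. If $\phi : K \to \mathbb{R}$ is any bounded measurable function, then
$$\Big|\mathbb{E}\big[\phi(\sqrt{N}\, b_N)\big] - \phi(0)\Big| = \Big|\mathbb{E}\big[\big(\phi(\sqrt{N}\, b_N) - \phi(0)\big)\mathbf{1}_{\{N < N^*\}}\big]\Big| \le 2\|\phi\|_\infty\, \mathbb{P}(N < N^*).$$
Since $N^*$ is almost surely finite, $\mathbb{P}(N < N^*) \to 0$ as $N \to \infty$, which concludes the proof. Since the bound on the right-hand side depends on $\phi$ only through its supremum norm, it also implies convergence in the total variation norm.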
Next comes the proof of the central limit theorem in the partly sticky case.
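Proof of Theorem 1.14. Let $[A, B]$ be the interval on which $m_{\theta,1} = 0$, so $m_{\theta,1} < 0$ for all $\theta \notin [A, B]$ by hypothesis. Let $\varepsilon \in (0, \pi/4)$. By Theorem 5.2 there is an integer $N^*(\omega, \varepsilon)$ such that, almost surely, $b_N \in C_{[A-\varepsilon,B+\varepsilon]}$ for all $N \ge N^*(\omega,\varepsilon)$. Consequently, $E\,\varphi_1(\sqrt{N}\, b_N) - E\,\varphi_2(\sqrt{N}\, b_N) \to 0$ as $N \to \infty$ for bounded functions $\varphi_1$ and $\varphi_2$ that agree on $C_{[A-\varepsilon,B+\varepsilon]}$. For any continuous bounded function $\varphi_1 : K \to \mathbb{R}$, there is a continuous bounded function $\phi : \mathbb{R}^2 \to \mathbb{R}$ such that the composite $\phi \circ F_{\theta^*}$ agrees with $\varphi_1$ on $C_{[A-\varepsilon,B+\varepsilon]}$. Therefore, it suffices to prove (1.16) for functions of the form $\varphi = \phi \circ F_{\theta^*}$, where $\phi : \mathbb{R}^2 \to \mathbb{R}$ is continuous and bounded.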
Each term in this sum is the average of $N$ independent random variables in $\mathbb{R}^2$, and each term has zero mean since $E(m^N_A) = m_A = 0$ and $E(m^N_B) = m_B = 0$ by hypothesis. The Chebyshev inequality, together with square-integrability, implies the estimate (6.38), whose right-hand side is at most $\delta/2$ for $\varepsilon = C\delta^{3/2}$, with a constant $C$ that depends only on $\mu$.
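By Theorem 5.2 there is an integer $N^*(\omega, \varepsilon)$ such that $b_N \in C_{[A-\varepsilon,B+\varepsilon]}$ if $N \ge N^*(\omega, \varepsilon)$, for almost all $\omega$. In particular, given $\delta > 0$ there is an integer $N_{\varepsilon,\delta}$ such that $P\bigl(b_N \notin C_{[A-\varepsilon,B+\varepsilon]}\bigr) \le \delta/2$ for all $N \ge N_{\varepsilon,\delta}$. Setting $N_\delta = N_{\varepsilon,\delta}$ for $\varepsilon = C\delta^{3/2}$ and combining with (6.38) yields the desired claim (6.35).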
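A hypothetical example makes the point mass at the origin in the partly sticky limit visible. Take radius-1 atoms on the rays at angles $0$, $\pi$, $2\pi$ of a cone with angle $3\pi$, now with weights $(1/2, 1/4, 1/4)$, and take $m_{\theta,1}$ to be the integrated signed projection $s\cos\Delta$ (or $-s$ when the angular separation $\Delta \ge \pi$). Then $\max_\theta m_{\theta,1} = 0$, attained only in the direction of the heavy ray, so the law is partly sticky. In this example $d(b_N, 0) = \max\{0, \max_i (2n_i/N - 1)\}$, where $n_i$ counts sample points on ray $i$, so $\sqrt{N}\, d(b_N, 0)$ converges to the positive part of a standard Gaussian: an atom of mass $1/2$ at the origin plus a half-Gaussian along one ray (the limit lives on a single ray because all the mass sits on three rays through the cone point). The sketch evaluates the binomial probabilities in log space with `math.lgamma`.

```python
import math


def binom_pmf(n, k, p):
    """Binomial(n, p) mass at k, computed in log space to avoid overflow."""
    log_mass = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_mass)


def prob_scaled_radius_at_most(t, n, weights):
    """P(sqrt(n) * d(b_n, 0) <= t) for the hypothetical three-ray law.

    Here d(b_n, 0) = max(0, max_i (2 n_i / n - 1)), where n_i is the number of
    sample points on ray i. For t >= 0 the events {n_i > c} with c >= n / 2 are
    disjoint (at most one ray holds more than half the sample), so the
    probability equals 1 - sum_i P(n_i > c).
    """
    c = n * (1.0 + t / math.sqrt(n)) / 2.0
    k0 = math.floor(c) + 1  # smallest count strictly above c
    tail = sum(binom_pmf(n, k, w) for w in weights for k in range(k0, n + 1))
    return 1.0 - tail


def std_normal_cdf(t):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


partly_sticky = (0.5, 0.25, 0.25)  # the ray at angle 0 carries exactly half the mass
n = 10001
print("P(b_n = 0) =", prob_scaled_radius_at_most(0.0, n, partly_sticky))
for t in (0.5, 1.0, 2.0):
    print(t, prob_scaled_radius_at_most(t, n, partly_sticky), std_normal_cdf(t))
```

Swapping in weights $(1/3, 1/3, 1/3)$ sends $P(b_N = 0)$ to 1 (fully sticky), while a weight above $1/2$ on one ray sends it to 0 (nonsticky).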
We conclude with the proof of the central limit theorem for the nonsticky case.
7. Topological definition of sticky mean

7.1. Topological version for kale. Let $M_1$ be the set of all finite Borel measures $\mu$ on $K$ satisfying the integrability condition (1.2). This section considers how the mean (or barycenter) of a measure $\mu \in M_1$ varies under perturbations of the measure. For this reason, we temporarily modify the notation $m_{\theta,1}$ to $m_{\theta,1}(\mu)$, to reflect the measure $\mu$ being used. It is then easy to see that for $\mu, \nu \in M_1$,
$$m_{\theta,1}(\mu + \nu) = m_{\theta,1}(\mu) + m_{\theta,1}(\nu). \qquad (7.45)$$
Two measures $\mu, \nu \in M_1$ are considered equivalent if they differ only in their total mass, meaning that there is a constant $c > 0$ with $\mu = c\nu$. Denote the space of equivalence classes by $\overline{M}_1$. Endow $M_1$ with the topology generated by the Wasserstein metric, defined by
$$\rho(\mu, \nu) = \sup_{f \in \mathrm{Lip}_1} \Bigl| \int_K f \, d\mu - \int_K f \, d\nu \Bigr|,$$
where $\mathrm{Lip}_1$ is the set of real-valued, Lipschitz-continuous functions on $K$ with Lipschitz constant 1. This topology extends to $\overline{M}_1$ by declaring the distance between $\mu$ and $\nu$ to be the Wasserstein distance $\rho(\mu, \nu)$ when $\mu$ and $\nu$ are normalized so that $\mu(K) = \nu(K) = 1$. Now comes the first in a sequence of results leading us to a definition of sticky and nonsticky that is more topological than Definition 1.8.
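Identity (7.45) holds because $m_{\theta,1}(\mu)$ is an integral against $\mu$, and rescaling a measure does not move its mean, so the mean is well defined on equivalence classes. Both facts are easy to check numerically for finite atomic measures. The sketch represents a measure as a list of (weight, (radius, angle)) atoms, so list concatenation is the sum of measures. The cone angle $3\pi$, the grid search over directions, and the formula used for $m_{\theta,1}$ (signed projection $s\cos\Delta$, or $-s$ when the angular separation $\Delta \ge \pi$) are assumptions of the sketch.

```python
import math

ALPHA = 3 * math.pi  # hypothetical total cone angle (> 2*pi)


def ang_dist(a, b, alpha=ALPHA):
    """Angular separation of directions a and b on the circle of length alpha."""
    d = abs(a - b) % alpha
    return min(d, alpha - d)


def m1(theta, measure, alpha=ALPHA):
    """m_{theta,1}(mu) for a finite atomic measure [(weight, (radius, angle)), ...]."""
    total = 0.0
    for w, (s, b) in measure:
        delta = ang_dist(theta, b, alpha)
        total += w * (s * math.cos(delta) if delta < math.pi else -s)
    return total


def mean(measure, alpha=ALPHA, grid=720):
    """Grid-search mean of the normalized measure mu / mu(K), or the origin."""
    mass = sum(w for w, _ in measure)
    thetas = [k * alpha / grid for k in range(grid)]
    best_theta = max(thetas, key=lambda t: m1(t, measure, alpha))
    best = m1(best_theta, measure, alpha) / mass
    return (0.0, 0.0) if best <= 0.0 else (best, best_theta)


mu = [(0.5, (1.0, 0.3)), (1.5, (2.0, 4.0))]
nu = [(2.0, (0.7, 6.5))]

# Additivity (7.45): concatenating atom lists adds the measures.
for theta in (0.0, 2.0, 5.0):
    print(theta, m1(theta, mu + nu), m1(theta, mu) + m1(theta, nu))

# Equivalent measures (mu and 4 * mu) have the same mean; the factor 4 is a
# power of two, so the floating-point comparison below is exact.
scaled = [(4.0 * w, x) for w, x in mu]
print(mean(mu), mean(scaled))
```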
Lemma 7.1. Let $\mu \in M_1$ be fully sticky. There exists an open neighborhood $U$ of $\mu$ such that $\nu \in U$ implies (i) $\nu$ is fully sticky and (ii) $\mu$ and $\nu$ have the same mean.
Definition 7.3. Fix a measure $\mu \in M_1$. A measure $\nu \in M_1$, thought of as a direction, is
1. sticky for $\mu$ if $\mu$ and $\mu + \varepsilon\nu$ have the same mean for all sufficiently small $\varepsilon > 0$;
2. fluctuating for $\mu$ if $\mu$ and $\mu + \varepsilon\nu$ have different means for all sufficiently small $\varepsilon > 0$.
Since normalization changes neither the mean of a measure nor whether it is fully sticky, partly sticky, or nonsticky, one could replace $\mu + \varepsilon\nu$ by $(1 - \varepsilon)\mu + \varepsilon\nu$ in the above definition. The latter has the advantage of producing a probability measure if both $\mu$ and $\nu$ were initially so.
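It is convenient to have a specific class of perturbations at our disposal. Note that for the unit measure $\delta_p$ supported at the point $p = (1, \theta')$, one has $m_{\theta,1}(\delta_p) = \cos\Delta$ if the angular separation $\Delta$ between the directions $\theta$ and $\theta'$, measured around the cone point, is at most $\pi$, and $m_{\theta,1}(\delta_p) = -1$ otherwise.

Lemma 7.4. Any nonsticky or partly sticky $\mu \in M_1$ has a fluctuating direction in $M_1$.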
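To see Definition 7.3 at work, the sketch below perturbs two hypothetical probability measures by a unit point mass, in the normalized form $(1-\varepsilon)\mu + \varepsilon\nu$. It takes $m_{\theta,1}(\delta_p)$ to be $\cos\Delta$ for angular separation $\Delta < \pi$ from $p$ and $-1$ otherwise, on a cone of hypothetical angle $3\pi$ (our reading of the construction, not the paper's code). The three-ray measure with weight $1/3$ on each of the rays at angles $0$, $\pi$, $2\pi$ then has $m_{\theta,1} \equiv -1/3$; for $\varepsilon < 1/4$ every perturbed measure still has $m_{\theta,1} < 0$ in all directions, so its mean stays at the origin and $\delta_p$ is a sticky direction. For a point mass at $(1, 0)$, which is nonsticky, moving a fraction $\varepsilon$ of the mass to $(1, 1)$ rotates the mean for every $\varepsilon > 0$, a fluctuating direction in line with Lemma 7.4. Means are located by a grid search over directions.

```python
import math

ALPHA = 3 * math.pi  # hypothetical total cone angle (> 2*pi)


def ang_dist(a, b, alpha=ALPHA):
    """Angular separation of directions a and b on the circle of length alpha."""
    d = abs(a - b) % alpha
    return min(d, alpha - d)


def m1(theta, measure, alpha=ALPHA):
    """m_{theta,1} of an atomic measure given as [(weight, (radius, angle)), ...]."""
    total = 0.0
    for w, (s, b) in measure:
        delta = ang_dist(theta, b, alpha)
        total += w * (s * math.cos(delta) if delta < math.pi else -s)
    return total


def mean(measure, alpha=ALPHA, grid=720):
    """Grid-search mean (max m_{theta,1}, argmax) of a probability measure, or the origin."""
    thetas = [k * alpha / grid for k in range(grid)]
    best_theta = max(thetas, key=lambda t: m1(t, measure, alpha))
    best = m1(best_theta, measure, alpha)
    return (0.0, 0.0) if best <= 0.0 else (best, best_theta)


def mix(mu, nu, eps):
    """The perturbation (1 - eps) mu + eps nu."""
    return [((1 - eps) * w, x) for w, x in mu] + [(eps * w, x) for w, x in nu]


delta_p = [(1.0, (1.0, 0.0))]  # unit point mass at p = (1, 0)
print([round(m1(t, delta_p), 3) for t in (0.0, math.pi / 3, math.pi, 1.5 * math.pi)])

# Fully sticky: weight 1/3 at radius 1 on each of the rays 0, pi, 2*pi.
sticky = [(1 / 3, (1.0, 0.0)), (1 / 3, (1.0, math.pi)), (1 / 3, (1.0, 2 * math.pi))]
print(mean(mix(sticky, delta_p, 0.2)), mean(mix(sticky, delta_p, 0.5)))

# Nonsticky: a point mass at (1, 0); shifting mass to (1, 1) moves the mean.
nonsticky = [(1.0, (1.0, 0.0))]
print(mean(mix(nonsticky, [(1.0, (1.0, 1.0))], 0.1)))
```

For the sticky measure the threshold $\varepsilon = 1/4$ is specific to this example; past it the same direction moves the mean, which is why Definition 7.3 only asks about sufficiently small $\varepsilon$.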
The above lemmas combine with the fact that all measures in $M_1$ are either fully sticky, partly sticky, or nonsticky (Proposition 4.11 and Definition 1.8) to prove the following theorem, which could be seen as an alternative definition of the terms "fully sticky", "partly sticky", and "nonsticky" for finite measures on $K$.

Remark 7.7. Theorem 7.6 shows that the behavior described in the law of large numbers (Theorem 1.12) is to be expected. As $N$ gets large, the empirical measure $\mu_N = \frac{1}{N}\sum_{n=1}^{N} \delta_{p_n}$ converges to $\mu$ in the topology generated by $\rho$ if the $p_n$ are chosen independently according to $\mu$ (for instance, combine (Villani, 2009, Theorem 6.9) with the standard weak convergence of empirical measures). If $\mu$ is fully sticky, then eventually $\mu_N$ lies in a neighborhood of $\mu$ in which all measures have the same mean. On the other hand, if $\mu$ is nonsticky, then nearby measures have different means than $\mu$, and hence the mean of $\mu_N$ fluctuates. When $\mu$ is partly sticky, sometimes $\mu_N$ lies in a set of measures sharing their mean with $\mu$, and sometimes it lies in a set of measures having different means than $\mu$.
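Remark 7.8. Endowing $M_1$ instead with the topology generated by the open neighborhoods $U_{\mu,\varepsilon} = \{\nu \in M_1 : \sup_\theta |m_{\theta,1}(\mu) - m_{\theta,1}(\nu)| < \varepsilon\}$ maintains the truth of the above results. However, using the standard weak topology on measures, which is coarser than the Wasserstein topology, would cause the topological characterization of stickiness to fail: every weak neighborhood of $\mu$ contains measures that place a small mass far from the origin and therefore have different means.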
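The neighborhoods $U_{\mu,\varepsilon}$ are governed by the distance $\sup_\theta |m_{\theta,1}(\mu) - m_{\theta,1}(\nu)|$. The sketch approximates this supremum on a grid of directions for a hypothetical three-ray measure (weight $1/3$ at radius 1 on the rays at angles $0$, $\pi$, $2\pi$ of a cone with angle $3\pi$), taking $m_{\theta,1}$ to be the integrated signed projection $s\cos\Delta$, or $-s$ when the angular separation $\Delta \ge \pi$ (an assumption of the sketch). Mixing in a unit point mass at $(1, 0)$ with weight $\varepsilon$ moves the measure by exactly $4\varepsilon/3$ in this distance. By contrast, the same weight placed at radius $1/\varepsilon^2$ gives a measure that is weakly close to $\mu$ for small $\varepsilon$ yet far in this distance, and whose mean leaves the origin; this illustrates why the weak topology is too coarse for the characterization of stickiness.

```python
import math

ALPHA = 3 * math.pi  # hypothetical total cone angle (> 2*pi)


def ang_dist(a, b, alpha=ALPHA):
    """Angular separation of directions a and b on the circle of length alpha."""
    d = abs(a - b) % alpha
    return min(d, alpha - d)


def m1(theta, measure, alpha=ALPHA):
    """m_{theta,1} of an atomic measure [(weight, (radius, angle)), ...]."""
    total = 0.0
    for w, (s, b) in measure:
        delta = ang_dist(theta, b, alpha)
        total += w * (s * math.cos(delta) if delta < math.pi else -s)
    return total


def moment_distance(mu, nu, alpha=ALPHA, grid=720):
    """Grid approximation of sup_theta |m_{theta,1}(mu) - m_{theta,1}(nu)|."""
    thetas = [k * alpha / grid for k in range(grid)]
    return max(abs(m1(t, mu, alpha) - m1(t, nu, alpha)) for t in thetas)


def perturb(mu, point, eps):
    """(1 - eps) mu + eps * (unit point mass at point)."""
    return [((1 - eps) * w, x) for w, x in mu] + [(eps, point)]


sticky = [(1 / 3, (1.0, 0.0)), (1 / 3, (1.0, math.pi)), (1 / 3, (1.0, 2 * math.pi))]
for eps in (0.3, 0.1, 0.01):
    near = perturb(sticky, (1.0, 0.0), eps)          # distance 4 * eps / 3
    far = perturb(sticky, (1.0 / eps**2, 0.0), eps)  # weakly close, moment-far
    print(eps, moment_distance(sticky, near), moment_distance(sticky, far))
```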
7.2. Topological definition for arbitrary metric spaces. Suppose $K$ is a metric space, and let $M$ be a set of probability measures on $K$.
Example 7.9. When $M = M_1$ is the set of Borel probability measures on $K$ satisfying the integrability condition (1.2), different topologies on $M$ are induced by the Wasserstein metric and by the sets $U_{\mu,\varepsilon}$ in Remark 7.8. The standard weak topology is yet another possibility.
Definition 7.10. Let $M$ be a set of measures on a metric space $K$, and assume $M$ carries a given topology. A mean is a continuous assignment $M \to \{\text{closed subsets of } K\}$. A measure $\mu$ sticks to a closed subset $C \subseteq K$ if every neighborhood of $\mu$ in $M$ contains a nonempty open subset consisting of measures whose mean sets are contained in $C$.
Continuity implies that the mean of $\mu$ is contained in $C$ if $\mu$ sticks to $C$.
Example 7.11. This paper has investigated measures on the kale K, which can stick to the subset C = {0} consisting of the origin. The notion of "mean" here is Definition 1.11, which assigns to each measure a single point; this assignment is continuous by Lemma 4.3.
In spaces of interest, integrability conditions such as those in Section 1 would imply existence of means. However, means in general metric spaces, even nice ones such as compact Riemannian manifolds, need not be single points. In other words, the general analogue of the minimization problem in Section 1.2 could have multiple solutions. For instance, the mean set of the uniform measure on a sphere is the entire sphere, whereas each sample mean is almost surely unique (cf. Remark 2.6 in Bhattacharya and Patrangenaru (2003)). Section 5 of Hotz and Huckemann (2014) gives an example of a measure on the circle whose mean set is a proper circular arc. In fact, this can be viewed as the limiting case of measures with unique means, whose central limit theorems feature arbitrarily slow convergence rates. Uniqueness of means on the kale stems from its negative curvature; see, for example, (Sturm, 2003, Proposition 4.3).
Remark 7.12. In the language of earlier sections, Definition 7.10 only sets forth the notion of "sticky", which includes both the fully sticky and partly sticky cases. In the generality of Definition 7.10, a measure $\mu$ would be said to fully stick to $C$ if some open neighborhood of $\mu$ consists entirely of measures whose means are contained within $C$. It would not be required that the means of the measures in such a neighborhood equal $b_\mu$, or even intersect $b_\mu$ at all. In the case where $K$ is an open book (Hotz et al., 2013), for example, means are unique and measures can stick to the spine, but nothing prevents the mean of a sticky measure from moving along the spine.
The set of partly sticky measures would be defined as those that are sticky but not fully sticky. Definition 7.10 implies that the set of partly sticky measures is the topological boundary of the set of sticky measures.
It remains open to characterize which metric spaces (to be concrete, say, among the topologically stratified spaces; see Goresky and MacPherson (1988) or Pflaum (2001)) admit measures that stick to subsets of measure 0. Given such a sticky situation, first goals would be to prove laws of large numbers and central limit theorems, contrasting the fully, partly, and nonsticky cases. The limiting measures in such results would be singular analogues of Gaussian distributions; it is not clear which properties of Gaussian distributions are the right ones to lift so as to characterize the building blocks of limiting measures in general.