Geometry of Sample Spaces

In statistics, independent, identically distributed random samples do not carry a natural ordering, and their statistics are typically invariant with respect to permutations of their order. Thus, an $n$-sample in a space $M$ can be considered as an element of the quotient space of $M^n$ modulo the permutation group. The present paper takes this definition of sample space and the related concept of orbit types as a starting point for developing a geometric perspective on statistics. We aim at deriving a general mathematical setting for studying the behavior of empirical and population means in spaces ranging from smooth Riemannian manifolds to general stratified spaces. We fully describe the orbifold and path-metric structure of the sample space when $M$ is a manifold or path-metric space, respectively. These results are non-trivial even when $M$ is Euclidean. We show that the infinite sample space exists in a Gromov-Hausdorff type sense and coincides with the Wasserstein space of probability distributions on $M$. We exhibit Fr\'echet means and $k$-means as metric projections onto 1-skeleta or $k$-skeleta in Wasserstein space, and we define a new and more general notion of polymeans. This geometric characterization via metric projections applies equally to sample and population means, and we use it to establish asymptotic properties of polymeans such as consistency and asymptotic normality.


Introduction
Following the pioneering developments of directional statistics [33] and shape statistics [35,36,19], there is a growing need in many application domains for the statistical analysis of populations of objects in complicated non-Euclidean spaces. One can cite for instance tree-spaces in biology [8], Riemannian manifolds and Lie groups, including diffeomorphism groups, in medical image analysis and computer vision [45,47,49], or more generally stratified spaces [42]. With the choice of a relevant distance, a natural generalization of the central values of a population of objects in these spaces is the Fréchet $p$-mean, that is, the set of minimizers of the mean distance to the power $p$ [25]. While $p = 2$ is often used because it corresponds to the usual arithmetic mean, lower values of $p$ down to $p = 1$, which defines the median ("valeur équiprobable" in Fréchet's words), are also often useful for robust statistics.
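To make the definition of the Fréchet $p$-mean concrete, here is a minimal sketch that searches a finite set of candidate points for the minimizers of the mean $p$-th power distance. The names `frechet_p_means`, `abs_dist`, and the integer grid of candidates are illustrative assumptions, not part of the paper; the result is set-valued because minimizers need not be unique.

```python
def frechet_p_means(sample, candidates, dist, p=2, tol=1e-12):
    """Return all candidates minimizing the mean of dist(c, x)**p over the sample."""
    def cost(c):
        return sum(dist(c, x) ** p for x in sample) / len(sample)

    costs = [(cost(c), c) for c in candidates]
    best = min(value for value, _ in costs)
    # The Frechet p-mean is a set: keep every candidate attaining the minimum.
    return [c for value, c in costs if value <= best + tol]


def abs_dist(a, b):
    """Distance on the real line (hypothetical choice of the space M)."""
    return abs(a - b)


sample = [0, 0, 10]   # two points at 0 and one outlier
grid = range(11)      # candidate means restricted to the integers 0..10

mean_2 = frechet_p_means(sample, grid, abs_dist, p=2)  # grid point closest to 10/3
mean_1 = frechet_p_means(sample, grid, abs_dist, p=1)  # the median
```

On this toy sample the choice $p = 1$ ignores the outlier, while $p = 2$ is pulled towards it, which illustrates why smaller values of $p$ are of interest for robust statistics.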
This paper develops a general mathematical setting to study the behavior of empirical and population Fréchet $p$-means in spaces ranging from smooth Riemannian manifolds to general stratified spaces. We start from the key observation that independent, identically distributed (i.i.d.) random samples do not carry a natural ordering, and their statistics are typically invariant with respect to permutations of their order. Thus, an $n$-sample in a space $M$ can naturally be considered as an element of the quotient space $M^n/S_n$ of $n$-tuples modulo the permutation group $S_n$. This space shall accordingly be called sample space. The paper takes this definition as a starting point for developing a geometric perspective on statistics, guided by the notion of orbit type. This way, we provide a theoretical basis for further investigations on unordered samples in non-Euclidean spaces.
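As a small illustration of the passage from ordered $n$-tuples to unordered samples, the following sketch represents an element of $M^n/S_n$ by the multiset of its points. The helper names `to_sample` and `sample_mean` are hypothetical, and the points are assumed to be hashable Python values.

```python
from collections import Counter


def to_sample(configuration):
    """Forget the ordering: map an n-tuple to its multiset of points."""
    return frozenset(Counter(configuration).items())


def sample_mean(configuration):
    """A permutation-invariant statistic of a real-valued configuration."""
    return sum(configuration) / len(configuration)


x = (1.0, 2.0, 2.0)
y = (2.0, 1.0, 2.0)   # a permutation of x: same sample
z = (1.0, 1.0, 2.0)   # same support, different multiplicities: different sample

same_sample = to_sample(x) == to_sample(y)
different_sample = to_sample(x) != to_sample(z)
```

Two configurations define the same sample exactly when they differ by a permutation, so any statistic that factors through `to_sample` is automatically permutation invariant.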
1.1. Background. For non-positively curved spaces in the sense of Alexandrov, the 2-mean is always unique when it exists [48]. For positively curved Riemannian manifolds, considerable effort has been spent in determining the convexity conditions on the distribution that ensure uniqueness [34,11,1]. However, many very useful distributions such as wrapped or truncated Gaussian distributions on the tangent spaces do not fulfill these conditions even if they have a unique Fréchet mean.
Asymptotic properties of the sample mean for distributions on Riemannian manifolds with a unique population Fréchet mean were studied by Bhattacharya and Patrangenaru [5,6,7]. In particular, they showed the consistency of the sample Fréchet mean $\bar{x}_n$ of $n$ i.i.d. samples of a random variable $x$ for large sample sizes (law of large numbers), building on a strong consistency result of [55]. Under the Karcher and Kendall convexity conditions for the uniqueness of the population mean $\bar{x}$, the Bhattacharya-Patrangenaru central limit theorem (CLT) further states that the random variables $u_n = \sqrt{n} \log_{\bar{x}}(\bar{x}_n)$ converge in distribution to the Gaussian $N(0, H^{-1} \mathrm{Cov}(x) H^{-1})$ in the tangent space at $\bar{x}$ whenever the expected Hessian $H$ of half the Riemannian squared distance at the population mean $\bar{x}$ is invertible. This type of CLT based on the delta method was further generalised in [37] to non-i.i.d. variables and in [29] to summary statistics other than the mean, such as principal geodesics.
In non-manifold stratified spaces of negative curvature, an intriguing phenomenon was discovered 10 years ago: the Fréchet mean may be sticky on singular strata [28]. A regular random variable (that is, one not fully concentrated on singular strata) whose Fréchet mean is located on a singular stratum is said to have a sticky mean if a sufficiently small variation of that random variable continues to have its Fréchet mean on the singular stratum. In other words, the singular strata are attractive. It is surprising that a CLT can still be derived under these conditions [28]. This suggests that some regularity can be used for deriving CLTs in more general settings. Stickiness does not seem to happen in positive curvature. For instance, Kendall shape spaces in three or higher dimensions are stratified, but the Fréchet mean of regular random variables was shown to belong to the top regular stratum (manifold-stability) [30]. In other words, singular strata of that kind are repulsive.
More recently, an apparently opposite unusual behavior of the CLT was discovered with smeary means, where the empirical Fréchet means converge at an asymptotic rate lower than $\sqrt{n}$; see, e.g., [21]. Other results show that intermediate repulsive or attractive behaviours can happen on Riemannian manifolds, controlled either by the curvature [46,20] or by the topology [31]. Thus, classical tests based on asymptotic results for Euclidean spaces might be biased, which is a critical problem for many applications. This highlights the need for a new mathematical framework to study the distribution of the empirical Fréchet mean, either in the small sample regime or asymptotically.
While considering $n$-samples disregarding ordering is not new, the literature is sparse in linking geometric properties of the quotient space to sample statistics. In the Euclidean case, de Finetti's theorem [23,16] and the theory of Hewitt and Savage [27] on exchangeability and presentability characterized distributions invariant to finite permutations, leading to central limit theorems based on exchangeability instead of independence [13,9,38]. We here develop a similar theory using additional geometric structures.
1.2. Overview and results. The convenient level of generality that we adopt is that of path-metric spaces [26,10], see A.1, where the distance is given by the infimum of the length of curves joining the two points; for complete path-metric spaces the infimum is a minimum, see A.2.
We first describe in Section 2 the orbifold (resp. path-metric) structure of the sample space $M^n/S_n$ when $M$ is a manifold (resp. a path-metric space). These results are non-trivial even when $M$ is Euclidean but well known in the realm of reflection groups and Weyl chambers. The sample space $M^n/S_n$ can be stratified by the number of pairwise distinct points. The regular part $(M^n/S_n)_{\mathrm{reg}}$ contains the unordered configurations where the $n$ points are distinct. The lower-dimensional strata are called the $q$-skeleta, see 2.2, and comprise unordered configurations with at most $q < n$ distinct points. A finer stratification classifying orbit types is based on the partition $(k) = (k_1 \ge \dots \ge k_q)$ of $n$ describing the numbers of identical points; see 2.5 and 2.6. Sub-partitioning (distinguishing some of the points that were previously identified) gives a half-ordering on partitions, which are thus organized in a geometric lattice structure. The orbit-type stratum $(M^n/S_n)_{(n)}$ with the smallest partition $(n)$ is the diagonal $\{x : x_1 = \dots = x_n\}$, where all points coincide. This is the 1-skeleton, which can be identified with the base manifold $M$. At the other end of the lattice, the regular orbit stratum $(M^n/S_n)_{(1,\dots,1)} = (M^n/S_n)_{\mathrm{reg}}$ is the open, dense, connected, and locally connected subset of all unordered configurations with $n$ distinct points. The closure of $(M^n/S_n)_{(k)}$ in $M^n/S_n$ is the disjoint union of all $(M^n/S_n)_{(k')}$ with $(k') \le (k)$; see 2.10. The $q$-skeleton of $M^n/S_n$ is the union of all orbit strata $(M^n/S_n)_{(k)}$ corresponding to all partitions $(k) = (k_1 \ge \dots \ge k_p)$ with length $p \le q \le n$. The projection to $q$-skeleta and orbit strata will be used in Section 5 to characterize the Fréchet $p$-mean and to define a generalization called polymeans.
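The stratification by skeleta and by partitions described above can be sketched in a few lines. The function names `orbit_type` and `in_skeleton` are illustrative assumptions; points of $M$ are modelled as hashable Python values.

```python
from collections import Counter


def orbit_type(configuration):
    """Partition of n listing the multiplicities of the distinct points, largest first."""
    return tuple(sorted(Counter(configuration).values(), reverse=True))


def in_skeleton(configuration, q):
    """True if the configuration consists of at most q distinct points."""
    return len(set(configuration)) <= q


x = ("a", "b", "a", "c")        # 4 points, 3 of them distinct
diagonal = ("a", "a", "a")      # all points coincide: partition (n)
regular = ("a", "b", "c")       # all points distinct: partition (1, ..., 1)
```

Both functions depend only on the multiset of points, so they are well defined on unordered samples. The length of `orbit_type(x)` is the number of distinct points, which is why the $q$-skeleton collects all partitions of length at most $q$.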
Section 3 investigates the metric properties of the sample spaces when we assume that $M$ is a complete path-metric space. The $L^p$ metric $d_p(x, y) = \big(\frac{1}{n} \sum_{i=1}^n d(x_i, y_i)^p\big)^{1/p}$ with $p \in [1, \infty)$ on $M^n$ induces a canonical quotient metric on the sample space $(M^n/S_n, \bar{d}_p)$, which is then a complete path-metric space; see 3.2. Moreover, orbit-type strata have convex closures, and a minimizing geodesic in the sample space $(M^n/S_n, \bar{d}_p)$ is the projection of a minimizing geodesic in the configuration space $(M^n, d_p)$. When $M$ is Riemannian and $p = 2$, one can show that geodesics are more regular at interior points than at their end points; see 3.7. However, this assertion is generally wrong for non-Riemannian complete path-metric spaces, for instance for the 3-spider; see 3.8. This lack of regularity could be linked to stickiness.
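The following sketch computes the metric $d_p$ on configurations and the induced quotient metric on samples by brute force over all permutations. It assumes $M = \mathbb{R}$ with the absolute-value distance as a stand-in for a general path-metric space, and the names `d_p` and `quotient_d_p` are hypothetical. The factorial cost restricts it to very small $n$.

```python
from itertools import permutations


def d_p(x, y, p, dist=lambda a, b: abs(a - b)):
    """Normalized L^p distance between two ordered n-point configurations."""
    n = len(x)
    return (sum(dist(a, b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)


def quotient_d_p(x, y, p, dist=lambda a, b: abs(a - b)):
    """Distance between the unordered samples of x and y: best matching over S_n."""
    indices = range(len(y))
    return min(
        d_p(x, [y[i] for i in sigma], p, dist)
        for sigma in permutations(indices)
    )


ordered = d_p((0, 1), (1, 0), 2)             # the two tuples differ as configurations
unordered = quotient_d_p((0, 1), (1, 0), 2)  # but they define the same sample
matched = quotient_d_p((0, 5), (6, 1), 1)    # optimal matching pairs 0-1 and 5-6
```

Minimizing over permutations of one argument suffices because $S_n$ acts by isometries on $(M^n, d_p)$.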
In order to investigate sub-samples (bootstrap) and infinite samples together in the same space, we show in 4.7 that the sample space $(M^n/S_n, \bar{d}_p)$ is isometric to the space of mixtures of $n$-atomic measures (the empirical law of the samples) endowed with the $p$-Wasserstein metric. Moreover, the infinite sample space $\lim_{n\to\infty} M^n/S_n$ exists in a weakened Gromov-Hausdorff type sense and coincides with the $p$-Wasserstein space $(P_p(M), \bar{d}_p)$ of $p$-integrable probability distributions on $M$; see 4.8. The extension of skeleta and orbit-type strata to infinite sample spaces can then be done easily: the $q$-skeleton in the infinite-sample space $P_p(M)$ is the subset $P(M)_q$ of all probability distributions with at most $q$ support points; see 4.11. Similarly, for any partition $(k) := (w_1 \ge \dots \ge w_q)$ consisting of non-negative weights $w_i$ summing up to 1, the $(k)$-stratum in the infinite-sample space $P_p(M)$ is the subset of mixtures $P = \sum_{i=1}^q w_i \delta_{x_i} \in P(M)_q$ with $q$ distinct points $x_i$. It is interesting to note that such a mixture of $q$ Diracs is realized in a finite sample space for some $n$ if the weights are all rational, but irrational weights can only be achieved in the infinite-sample limit.
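The remark on rational weights can be made explicit: a mixture of Diracs with rational weights is the empirical law of a configuration whose size is the least common multiple of the denominators. The sketch below is an illustration under that reading; the function name `mixture_to_sample` is an assumption, and strictly positive weights are required here for simplicity.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def mixture_to_sample(points, weights):
    """Realize sum_i w_i * delta_{x_i} with rational weights as an n-point configuration."""
    weights = [Fraction(w) for w in weights]
    if sum(weights) != 1 or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive rationals summing to 1")
    # Smallest n such that every n * w_i is an integer.
    n = reduce(lambda a, b: a * b // gcd(a, b), (w.denominator for w in weights), 1)
    configuration = []
    for x, w in zip(points, weights):
        configuration.extend([x] * int(w * n))
    return tuple(configuration)


two_atoms = mixture_to_sample(["a", "b"], [Fraction(1, 3), Fraction(2, 3)])
three_atoms = mixture_to_sample(
    ["a", "b", "c"], [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
)
```

No such finite configuration exists when some weight is irrational, which is the reason the corresponding strata only appear in the infinite-sample limit.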
With this setting, we are in a position to exhibit in Section 5 empirical and population Fréchet means as metric projections onto the 1-skeleton in sample space or Wasserstein space, and we define a new and more general notion of empirical and population polymeans by the projection onto the $q$-skeleta $(M^n/S_n)_q$ or onto the $(k)$-strata $(M^n/S_n)_{(k)}$. These polymeans can be interpreted as the clusters of the well-known $k$-means clustering algorithm: the $k$ distinct points are the cluster centroids (we also call them the unweighted polymeans) and the weights $w_i$ are the relative masses of the clusters. As everything is defined for $p$-integrable distributions ($p \ge 1$), our definitions are actually valid for general Fréchet $p$-means and $p$-power $k$-means. Since $q$-skeleta and $(k)$-strata are closed in all sample spaces, as well as in the $p$-Wasserstein space, the existence of empirical and population polymeans is ensured. Uniqueness is a much harder problem. In the Riemannian case with $p = 2$, recent results on the regularity of the singular set of the distance to a sufficiently regular set show that empirical polymeans of i.i.d. samples with an absolutely continuous law are almost surely unique. This partly extends the previous result of [4] on the uniqueness of the empirical Fréchet $p$-mean.
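The interpretation of polymeans as $k$-means clusters can be illustrated in the simplest case $M = \mathbb{R}$, $p = 2$. The sketch below is not the paper's construction: it relies on the standard fact that optimal one-dimensional clusters for squared distances are contiguous blocks of the sorted sample, and searches all such blocks exhaustively. The function name `polymean_1d` and the returned triple are assumptions.

```python
from itertools import combinations


def polymean_1d(sample, q):
    """Closest point of the q-skeleton to a real-valued sample for p = 2.

    Returns (centroids, weights, squared_distance), searching all splits of the
    sorted sample into at most q contiguous clusters.
    """
    xs = sorted(sample)
    n = len(xs)
    best = None
    for r in range(1, min(q, n) + 1):
        for cuts in combinations(range(1, n), r - 1):
            bounds = (0,) + cuts + (n,)
            clusters = [xs[a:b] for a, b in zip(bounds, bounds[1:])]
            centroids = [sum(c) / len(c) for c in clusters]
            cost = sum(
                (v - m) ** 2 for c, m in zip(clusters, centroids) for v in c
            ) / n
            # Strict improvement only, so ties keep the smaller number of clusters.
            if best is None or cost < best[2] - 1e-12:
                best = (centroids, [len(c) / n for c in clusters], cost)
    return best


centroids, weights, cost = polymean_1d([0, 0, 1, 10, 11], q=2)
mean_only = polymean_1d([0, 0, 1, 10, 11], q=1)
```

For `q=1` the search reduces to the arithmetic mean, that is, the projection onto the 1-skeleton; for larger `q` the weights are the relative cluster masses $k_j/n$.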
We turn in Section 6 to probability distributions on sample spaces. It turns out that the correct space of infinite samples is not the quotient space $M^{\mathbb{N}}/S_{(\mathbb{N})}$ but the space $P(M)$ of probability distributions on $M$. Indeed, using this definition one obtains, as in the theory of Hewitt and Savage [27], that probability distributions on infinite sample spaces correspond exactly to symmetric probability distributions on configuration spaces, which in turn correspond exactly to mixtures of product distributions. This definition is also in line with the infinite-sample limit 4.8. The analogous statement for random variables instead of probability distributions is that random samples correspond exactly (possibly after passing to an extended probability space) to conditionally i.i.d. random configurations; see 6.6.
This setting allows us to establish in Section 7 asymptotic properties of polymeans. We first show that the empirical $q$-means are strongly consistent estimators for the population $q$-means, in the sense that any accumulation point of the sets of empirical $q$-means is a population $q$-mean. Thus, when the population $q$-mean is unique, any measurable selection of empirical $q$-means converges in probability to the population $q$-mean, and we may inquire about the rate of convergence. We derive in 7.4 an upper bound on the convergence rate of empirical $q$-means to the population $q$-mean. The bound depends first on the convergence rate in Wasserstein space of empirical distributions (a well-studied subject) and second on the subspace geometry of the $q$-skeleton within Wasserstein space (a purely geometric question). It remains an open problem whether the bound is sharp and whether $q$-means are asymptotically normal after a suitable normalization. However, when $M$ is a Riemannian manifold, we establish in 7.6 the asymptotic normality of unweighted $q$-means for any $p \ge 1$ under mild conditions (null measure of the union of the cut loci of the centroids and of their "medial axis", and non-degenerate expected Hessian of the power $p$ distance to the closest centroid). We further refine this central limit theorem in 7.7 from i.i.d. to exchangeable sequences under some additional conditional independence assumptions.
In the appendix we collect some tools from path-metric geometry.
1.3. Open problems and future work. Our framework opens the door to many further investigations by linking two traditionally distinct strands of literature, namely, statistics on manifolds and orbifold or path-metric geometry. Tools from these fields can be fruitfully combined. The setup is fully general and applies to curved spaces and more general stratified spaces, as needed in the previously cited applications. It also encompasses Fréchet $p$-means and not only the classical 2-mean, which opens the way to many useful asymptotic results for robust statistics.
Our results also suggest that the non-standard convergence rates in the CLT are not only due to the geometry of $M$ but also to the subspace geometry of the $k$-skeleta within the sample spaces. For instance, considering the Fréchet mean as a projection on the 1-skeleton casts a new geometric light on the uniqueness problem: in a Riemannian manifold, it is unique whenever there is no mass on the singular set of the distance function to the 1-skeleton. Thus, one can conjecture that the geometry of the "medial axis" of the $q$-skeleton in $p$-Wasserstein space controls the uniqueness of the polymeans and that advances on the subspace geometry of this set within Wasserstein space would extend this uniqueness theorem to more general settings.
Likewise, the rate of convergence of the empirical 2-mean towards the population 2-mean is controlled by the eigenvalues of the expected Hessian of the squared distance (Corollary 7.7). The convergence rate towards the limiting distribution in the direction of an eigenvector falls below $\sqrt{n}$ whenever the corresponding eigenvalue vanishes. Conversely, stickiness could be induced by eigenvalues going to infinity. This last behavior cannot happen in smooth Riemannian manifolds, but it can be approached by concentrating the curvature at singular points. This could be a way to study stickiness on smoothable manifolds. For i.i.d. samples with distribution $P$, we conjecture that these conditions could be linked to the convexity or concavity of the geodesic distance in Wasserstein space from $P$ to the polymean in the $k$-skeleton, and thus that they can be controlled by some kind of Ma-Trudinger-Wang (MTW) condition [22].

Orbit type stratification of sample spaces
Let $M$ be a topological space. For any natural number $n \in \mathbb{N}_{>0}$, the permutation group $S_n$ of $n$ symbols acts on the $n$-fold product $M^n$ by permutation of the components. In symbols, we shall write $x^\sigma := x \circ \sigma$ for the action of $\sigma \in S_n$ on $x \in M^n$.
Definition 2.1 (Configurations and samples). An $n$-point configuration or ordered $n$-sample is an element of $M^n$, and this space is called (ordered) configuration space. An $n$-sample is an element of the quotient space $M^n/S_n$, and this space is called sample space or unordered configuration space. The projection is denoted by $\pi : M^n \to M^n/S_n$.
Note that this definition of configuration spaces differs from the one commonly used in topology, where the points are required to be pairwise distinct. The set of pairwise distinct points is an open subset of $M^n$, and its fundamental group in the case $M = \mathbb{R}^2$ is the braid group. In contrast, we also consider the case where only $q < n$ points are mutually distinct:
Definition 2.2 (Skeleta). A configuration $(x_1, \dots, x_n)$ is said to belong to the $q$-skeleton if it consists of at most $q \in \mathbb{N}$ distinct points $x_i$. As the number of distinct points is $S_n$-invariant, there is a corresponding notion of $q$-skeleta of samples.
The name skeleton is taken from the theory of simplicial complexes and cell complexes. The filtration of sample space into skeleta is rather coarse, and finer stratifications are needed to fully describe the local geometry of sample space. This is done next.
Definition 2.3 (Orbifolds [52]). A Hausdorff topological space $O$ is an orbifold if the following data are given:
• An open cover $(U_i)$ of $O$ which is closed under forming finite intersections.
• For each $i$ there is an open subset $V_i \subset \mathbb{R}^N$ which is invariant under a faithful linear action of a finite group $G_i$ on $\mathbb{R}^N$ and a $G_i$-invariant quotient map $\pi_i$ : [...] and a gluing map [...]
Lemma 2.4 (Orbifold structure of sample space). If $M$ is a manifold, then the sample space $M^n/S_n$ is an orbifold.
Proof. For any $x \in M^n$, choose a chart $(U_i, u_i : U_i \to \mathbb{R}^m)$ such that whenever [...]
The proof of 2.4 shows more generally that the quotient space of a smooth manifold with respect to a properly discontinuous action of a group is an orbifold; in this case it is sometimes called a developable or (by Thurston) a good orbifold. To understand the orbifold structure of sample space, one has to describe the different orbit types.
Definition 2.5 (Orbit types). The orbit type of an ordered sample $x \in M^n$ is defined as the conjugacy class of its isotropy group $(S_n)_x := \{\sigma \in S_n : x^\sigma = x\}$. As the orbit type is $S_n$-invariant, there is a corresponding notion of orbit types of samples in $M^n/S_n$.
The following theorem classifies the orbit types of sample space. It turns out that there are many different orbit types, one for each partition of the integer $n$. This highlights the complicated geometry of sample space.
Theorem 2.6 (Classification of orbit types). The orbit types in the configuration space $M^n$ are exactly given by the integer partitions of $n$ of the form [...]
Proof. This follows from the fact that a point $x \in M^n$ is fixed by a permutation if and only if [...] and all other $x_i$ being distinct. Here [...] is the cycle type of the permutation $\sigma$. For our purpose we enlarge the cycle type to [...], and similarly for the other cycles in $\sigma$. Thus, the isotropy group of any $x$ as above is conjugate to the subgroup $S_{k_1} \times S_{k_2} \times \dots \times S_{k_p}$. Its conjugacy class is described by the cycle type $(k_1, \dots, k_p)$ with $k_1, \dots, k_p \in \mathbb{N}_{>0}$, and equivalently by its enlargement to a partition of $n$. □
The configuration space $M^n$ and the sample space $M^n/S_n$ are stratified by orbit type.
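For small $n$ the classification can be checked by brute force. The sketch below enumerates the isotropy group of a configuration and compares its order with $\prod_i k_i!$, the order of $S_{k_1} \times \dots \times S_{k_p}$. The function names are illustrative assumptions, and the convention $x \circ \sigma$ is implemented as index composition.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def isotropy_group(x):
    """All permutations sigma of the indices with x o sigma = x."""
    n = len(x)
    return [
        sigma
        for sigma in permutations(range(n))
        if all(x[sigma[i]] == x[i] for i in range(n))
    ]


def predicted_order(x):
    """Order of S_{k_1} x ... x S_{k_p}, where the k_i are the multiplicities in x."""
    return prod(factorial(k) for k in Counter(x).values())


examples = [("a", "b", "a"), ("a", "a", "a"), ("a", "b", "c"), ("a", "a", "b", "b")]
orders = [len(isotropy_group(x)) for x in examples]
```

The isotropy group consists exactly of the permutations that shuffle indices within each block of coinciding points, which is why its order only depends on the multiplicities.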
Definition 2.7 (Orbit-type strata). Let $(H)$ denote the conjugacy class of any subgroup $H$ of $S_n$ corresponding to a partition $(k)$. We write $(M^n)_{(H)}$ and $(M^n)_{(k)}$ for the stratum of all points in $M^n$ of orbit type $(H)$ and $(k)$, respectively. Similarly, we write $(M^n/S_n)_{(H)}$ and $(M^n/S_n)_{(k)}$ for the corresponding stratum in $M^n/S_n$.
Lemma 2.8 (Orbit-type strata). The stratum $(M^n)_{(k)}$ with $(k) = (k_1 \ge \dots \ge k_q)$ consists of all $x = (x_1, \dots, x_n)$ such that $k_1$ of the $x_i$ are equal to $y_1 \in M$, $k_2$ of the remaining $x_i$ are equal to $y_2 \ne y_1$ in $M$, and so on, until the remaining $k_q$ of the $x_i$ are equal to $y_q \in M$, and all $y_i$ are distinct. Thus, $(M^n)_{(k)}$ is the disjoint union of its connected components, which are all homeomorphic to the open subset of pairwise distinct points in $M^q$.
Proof. This follows from the description of orbit types in the proof of 2.6. □
Definition 2.9 (Half-ordering of orbit types). For two conjugacy classes $(H)$ and $(H')$ of subgroups $H$ and [...]. Note that the half-order between partitions is the inverse of the half-order between the corresponding conjugacy classes. The diagonal $\{x : x_1 = \dots = x_n\}$ is the stratum with the largest conjugacy class $(S_n)$ and the smallest partition $(n)$. The projection onto the corresponding stratum in $M^n/S_n$ is a homeomorphism. The regular stratum is the open and dense subset of all configurations $x$ with mutually distinct components $x_i$. It has as orbit type the smallest conjugacy class $(\{\mathrm{Id}\})$ and the largest partition $(1, \dots, 1)$. [...] is open, dense, connected, and locally connected; it will also be denoted by [...]
Note that for $q \le n$, the $q$-skeleton of $M^n/S_n$ is the union of all orbit strata [...]
Proof. This follows from the description of the orbit-type strata given in 2.8, since at the boundary some distinct [...] be a partition describing the orbit type $(H)$ with $H := S_{k_1} \times \dots \times S_{k_q}$. Then the projection $(M^n)_{(k)} \to S_n/N_{S_n}(H)$ defines a topological fiber bundle, where $N_{S_n}(H)$ is the normalizer of $H$ in $S_n$, and where for any $\sigma \in S_n$, the fiber over $\sigma.N_{S_n}(H)$ [...]
Proof. The proof in [43, 29.22], although given for smooth manifolds, is purely topological and applies here without change. □
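The base $S_n/N_{S_n}(H)$ of the fiber bundle above is a finite set, and its cardinality can be checked for small $n$. The sketch below assumes the standard fact that the normalizer of $S_{k_1} \times \dots \times S_{k_q}$ has order $\prod_i k_i! \cdot \prod_j m_j!$, where $m_j$ counts how often each value occurs among the $k_i$, and compares the resulting index with a direct enumeration of index patterns (which positions of a configuration coincide). All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def orbit_type(x):
    """Multiplicities of the distinct points of x, largest first."""
    return tuple(sorted(Counter(x).values(), reverse=True))


def count_patterns_formula(k):
    """Index n! / (prod k_i! * prod m_j!) of the normalizer of the Young subgroup."""
    multiplicities = Counter(k)
    denominator = prod(factorial(ki) for ki in k) * prod(
        factorial(m) for m in multiplicities.values()
    )
    return factorial(sum(k)) // denominator


def count_patterns_bruteforce(k):
    """Count the ways in which n indices can coincide with block sizes given by k."""
    n, q = sum(k), len(k)
    target = tuple(sorted(k, reverse=True))
    patterns = set()
    for x in product(range(q), repeat=n):
        if orbit_type(x) == target:
            blocks = frozenset(
                frozenset(i for i in range(n) if x[i] == v) for v in set(x)
            )
            patterns.add(blocks)
    return len(patterns)


partitions_of_4 = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
counts = [count_patterns_formula(k) for k in partitions_of_4]
```

Each index pattern corresponds to one copy of the set of pairwise distinct points in $M^q$ from Lemma 2.8, so the count gives the number of such copies making up the stratum $(M^n)_{(k)}$.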

Path metrics on sample spaces
The category of path-metric spaces is ideally suited for the description of sample spaces because it is well-behaved under quotients. We refer to the appendix for the definition of path metrics and some of their properties, and to the book of Gromov [26] and also [10] or [3] for further details. Throughout this section, $d$ is a complete path metric on the separable topological space $M$, $n \in \mathbb{N}_{>0}$, and $p \in [1, \infty)$.
There are many choices of metrics on the configuration space $M^n$ which are consistent with the product topology. The following lemma describes some of them.
Lemma 3.1 (Path metrics on configuration spaces). The following is a complete path metric on the configuration space $M^n$: $d_p(x, y) = \big(\frac{1}{n} \sum_{i=1}^n d(x_i, y_i)^p\big)^{1/p}$, $x, y \in M^n$. The identity on $M^n$ is Lipschitz continuous between any of the metrics $d_p$ [...]
Note that $d_p(x, y) = \|d(x, y)\|_{L^p}$, where $\|\cdot\|_{L^p}$ denotes the $L^p$ norm of functions on the space $\{1, \dots, n\}$ with the uniform probability distribution. The choice of normalizing constant $\frac{1}{n}$ is motivated by this probabilistic interpretation, as well as the large-sample limits in 4.3 and 4.8.
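The Lipschitz equivalence of the metrics $d_p$ for different exponents can be illustrated numerically. The inequality checked below, $d_p \le d_q \le n^{1/p - 1/q} d_p$ for $p \le q$, is the standard comparison of normalized $L^p$ norms on an $n$-point probability space; it is given here as an assumption about what the lemma asserts, and the exact constants in the paper may differ. The sketch uses $M = \mathbb{R}$ and a hypothetical function name `d_p`.

```python
def d_p(x, y, p):
    """Normalized L^p distance between two real-valued n-point configurations."""
    n = len(x)
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)


x = (0.0, 1.0, 4.0, 2.0)
y = (3.0, 1.0, 0.0, 2.5)
n = len(x)
p, q = 1.0, 2.0

lower = d_p(x, y, p)                          # d_p is the smaller metric for p <= q
value = d_p(x, y, q)
upper = n ** (1.0 / p - 1.0 / q) * lower      # Lipschitz constant depends on n
```

The upper constant grows with $n$, so this equivalence is not uniform in the sample size.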
Proof. Completeness of $(M^n, d_p)$ follows from completeness of $(M, d)$. As $M$ is a path-metric space, there exists by A. [...] Then one obtains for the configuration $z := c(x, y)$ by applying the $L^p$ norm that $\max\{d_p(x, z), d_p(z, y)\} \le r d_p(x, y)$. This implies by A.3 that $d_p$ is a path metric on $M^n$. The identity $M^n \to M^n$ is Lipschitz continuous under any of the metrics $d_p$ because [...] $x, y \in M^n$. □
The complete path metric $d_p$ on the ordered sample space $M^n$ induces a canonical quotient metric on the sample space $M^n/S_n$. As the permutation group $S_n$ acts isometrically on $(M^n, d_p)$, this quotient metric is again complete and admits a particularly simple description, as shown next.
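Lemma 3.2 (Quotient metrics on sample spaces). The following quotient metric is a complete path metric on the sample space $M^n/S_n$: $\bar{d}_p(\bar{x}, \bar{y}) = \min_{\sigma \in S_n} d_p(x, y \circ \sigma)$, where $\bar{x}, \bar{y} \in M^n/S_n$ and $x, y \in M^n$ with $\pi(x) = \bar{x}$, $\pi(y) = \bar{y}$.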
Proof. The fibers of the projection are the orbits of the permutation group $S_n$, which acts isometrically on $(M^n, d_p)$. Therefore, the metric $\bar{d}_p$ is a path metric [10, Lemma 3.3.6]. Moreover, this metric is complete: Given a Cauchy sequence in $M^n/S_n$, take a subsequence such that the distances between subsequent points are summable. Lift the sequence to $M^n$ such that distances between subsequent points are preserved. Then the lift is a Cauchy sequence, which converges thanks to the completeness of $M^n$. □
Recall that a subset of a metric space is called convex if the restriction of the metric to this subset is a finite complete path metric [10, Definition 3.6.5]. If the surrounding space carries a complete path metric, then this is equivalent to the subset being totally geodesic, i.e., any two points in the subset can be connected by a minimizing geodesic in the subset.
[...]
Proof. [...] to the open subset of all pairwise distinct points in $M^q$. This homeomorphism is even an isometry (up to a normalizing constant) under the $d_p$ metrics on $M^n$ and $M^q$, respectively. Thus, the closure $K$ is homeomorphic to $M^q$. As $(M^q, d_p)$ is a complete path-metric space by 3.1, it follows that $K$ is a convex subset of $(M^n, d_p)$. The projection $\pi : M^n \to M^n/S_n$ restricts to an isometry $\pi : K \to (M^n/S_n)_{(k)}$. It follows that $\bar{d}_p$ restricts to a complete path metric on the closure of the stratum $(M^n/S_n)_{(k)}$. Therefore, by definition, the closure of the stratum $(M^n/S_n)_{(k)}$ is a convex subset of $(M^n/S_n, \bar{d}_p)$. □
Example 3.4 (Lack of strict convexity). The closure of a connected component of an orbit stratum in $M^n$ need not be strictly convex in the sense that each minimal geodesic connecting two points in this stratum lies also in the stratum.
Proof. Let $c_1$ and $c_2$ be two distinct meridian geodesics in the 2-sphere $M = S^2$, which connect the north pole $N$ to the south pole $S$. Then $c = (c_1, c_2)$ is a minimizing geodesic between the points $(N, N)$ and $(S, S)$ in $M^2$. These points belong to the closed and connected orbit stratum [...]
[...] As all summands are non-negative by the triangle inequality, they vanish. Equivalently, by Lemma A.5, all components $c_i : [0, 1] \to M$ are constant-speed minimizing geodesics.
For $p = 1$, one uses a similar argument for the energy-action pairs $(d, \ell)$ and $(d_1, \ell_1)$, where $\ell$ is the length functional in $(M, d)$, and $\ell_1$ is the length functional in $(M^n, d_1)$. However, in this case, a curve is minimizing for these energy-action pairs if and only if it is a geodesic, regardless of whether it has constant speed or not. □
Theorem 3.6 (Geodesics between samples). Let $M$ be a connected complete locally compact path-metric space. Then any minimizing geodesic in the sample space $(M^n/S_n, \bar{d}_p)$ is the projection of a minimizing geodesic in the configuration space $(M^n, d_p)$, which we call its horizontal lift.
We next consider the special case where $M$ is a finite-dimensional manifold with Riemannian metric $g$ and complete geodesic distance $d$. Then $\frac{1}{n}(g \oplus \dots \oplus g)$ is an $S_n$-invariant Riemannian metric on $M^n$, whose geodesic distance is the metric $d_2$ on $M^n$. The quotient space $M^n/S_n$ carries a rich differential-geometric structure, which is described in detail in [43, Sections 29 and 30]. In particular, one obtains by differential-geometric arguments that a minimal geodesic segment is more regular at interior points than at the end points. This is formalized in the following theorem.
Theorem 3.7 (Interior regularity of Riemannian geodesics [2, 3.5 and 3.4]). Let $M$ be a finite-dimensional manifold with complete Riemannian metric $g$, and let $M^n$ be the product manifold with the product Riemannian metric $\frac{1}{n}(g \oplus \dots \oplus g)$. Then, for any lift $c : [0, 1] \to M^n$ of a minimal geodesic segment in $M^n/S_n$, the isotropy group $(S_n)_{c(t)}$ of an interior point of $c$ is contained in the isotropy groups $(S_n)_{c(0)}$, $(S_n)_{c(1)}$ of the end points.
Thus, for any subgroup $H \le S_n$, the set $(M^n/S_n)_{\le (H)}$ of orbits with orbit type smaller or equal to $(H)$ is a strictly convex subset of $M^n/S_n$. This means that any minimal geodesic segment between two points in $(M^n/S_n)_{\le (H)}$ lies entirely in $(M^n/S_n)_{\le (H)}$. In particular, the regular orbit-type stratum in $M^n/S_n$ is a strictly convex open dense subset. Recall for comparison that $(M^n/S_n)_{\ge (H)}$ is convex by 3.3 but may not be strictly convex by 3.4.
[...] is not contained in the trivial isotropy group of $c(0) = (x, y)$. See the related discussion in [51, Chapter 8]. Note that the 'curvature' of the spider at 0 is $-\infty$. □

Infinite configuration and sample spaces
This section exhibits configuration spaces as spaces of random variables and sample spaces as spaces of probability distributions. Moreover, it identifies large-sample limits of these spaces. Throughout this section, $(M, d)$ is a separable connected complete path-metric space, and $p \in [1, \infty)$.
Definition 4.1 (Random variables). For any complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we write $L^p(\Omega, M)$ for the space of all measurable functions $x : \Omega \to M$ which satisfy for one (or equivalently, all) $o \in M$ that $\|d(x, o)\|_{L^p(\Omega)} < \infty$. We endow the space $L^p(\Omega, M)$ with the metric $d_p(x, y) = \|d(x, y)\|_{L^p(\Omega)}$, $x, y \in L^p(\Omega, M)$.
Proof. A configuration $x \in M^n$ is precisely a function $x : \{1, \dots, n\} \to M$, and the metrics $d_p$ defined on $M^n$ and $L^p(\{1, \dots, n\}, M)$ coincide. □
The description of configurations as random variables allows one to pass to a large-sample limit. Similar results are shown in [39]. In the following lemma, $(0, 1)$ denotes the unit interval with the Lebesgue measure and could, for all purposes, be replaced by any standard probability space.
Lemma 4.3 (Infinite configurations). The configuration spaces $(M^n, d_p)$ are isometrically embedded in the complete path-metric space $(L^p((0, 1), M), d_p)$ and converge to it in the following sense: for any compact $K \subset L^p((0, 1), M)$, [...]
The lemma would imply pointed Gromov-Hausdorff convergence of $(M^n, d_p)$ to the space $L^p((0, 1), M)$ if the uniform convergence on compacts could be strengthened to uniform convergence on bounded sets. However, this is not the case, as one easily verifies by considering functions $x$ of the form $n^{1/p} 1_{[0, 1/n]}$ for large $n$.
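The isometric embedding of $n$-point configurations as step functions can be illustrated at the discrete level: viewing an $n$-tuple as a step function on a grid that is $m$ times finer amounts to repeating each entry $m$ times, and the normalized metric $d_p$ is unchanged. The sketch assumes $M = \mathbb{R}$; the names `d_p` and `refine` are hypothetical.

```python
def d_p(x, y, p):
    """Normalized L^p distance between two real-valued n-point configurations."""
    n = len(x)
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)


def refine(x, m):
    """View an n-tuple as an (n*m)-tuple: a step function sampled on a finer grid."""
    return tuple(v for v in x for _ in range(m))


x = (0.0, 2.0)
y = (1.0, 5.0)

coarse = d_p(x, y, 2)                      # distance in M^2
fine = d_p(refine(x, 3), refine(y, 3), 2)  # distance of the images in M^6
```

Without the normalizing constant $\frac{1}{n}$ the two values would differ by a factor $m^{1/p}$, which is one way to see why the normalization is needed for large-sample limits.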
Proof. The isometric embedding of $(M^n, d_p) \cong L^p(\{1, \dots, n\}, M)$ into $L^p((0, 1), M)$ is given by the identification of $n$-tuples with piecewise constant functions on $(0, 1)$. It remains to prove the convergence. Let $\epsilon > 0$. By the compactness of $K$, there are $m \in \mathbb{N}$ and $x_1, \dots, x_m \in L^p((0, 1), M)$ such that the open $d_p$-balls $B_{\epsilon/3}(x_i)$ cover $K$. Let $o \in M$. By the dominated convergence theorem, there is $r > 0$ such that the configurations $y_i \in L^p((0, 1), M)$ defined by [...] For any $n \in \mathbb{N}$, let $E_n : L^p((0, 1), F) \to L^p((0, 1), F)$ be the conditional expectation with respect to the sigma-algebra generated by the intervals $[\frac{j-1}{n}, \frac{j}{n})$, $j \in \{1, \dots, n\}$. Then, for sufficiently large $n$, the configurations $z_i := E_n(y_i)$ satisfy $d_p(y_i, z_i) \le \epsilon/3$ for all $i \in \{1, \dots, m\}$. Let $A : F \to M$ be the metric projection from $f \in F$ to the nearest point $A(f) \in M$, and let $A_* : L^p((0, 1), F) \to L^p((0, 1), M)$ be the pushforward along $A$. Then the configurations $w_i := A_* z_i$ satisfy for all $i \in \{1, \dots, m\}$ that [...] It follows that every $x \in K$ is $\epsilon$-close to some $w_i \in L^p(\{1, \dots, n\}, M)$. □
Recall that any continuous curve $c : [0, 1] \to L^p((0, 1), M)$ has a jointly measurable version $c : [0, 1] \times (0, 1) \to M$; see e.g. [15, Proposition 3.2]. Then the sample paths of $c$ are the measurable functions $c(\cdot, \omega) : [0, 1] \to M$, $\omega \in (0, 1)$.
Lemma 4.4 (Geodesics between infinite configurations). $(L^p((0, 1), M), d_p)$ is a complete path-metric space. For $p > 1$, a continuous curve $c : [0, 1] \to L^p((0, 1), M)$ is a constant-speed minimizing geodesic in $(L^p((0, 1), M), d_p)$ if and only if almost all of its sample paths are constant-speed minimizing geodesics in $M$.
Proof. To show that $L^p(\Omega, M)$ is a complete path-metric space, we proceed as in the proof of 3.1, noting that the point $c = c(a, b)$ can be chosen as a measurable function of $a, b$. Indeed, this follows from a measurable selection theorem [17] [...] where $E$ is the expectation with respect to the Lebesgue measure on $(0, 1)$. Equivalently, the following property holds almost surely: for all rational numbers [...] By A.3 this implies for almost every $\omega \in (0, 1)$ that the sample path is parameterized by constant speed. In particular, any such sample path can be extended continuously to all real numbers in $[0, 1]$. Thus, we have established that $c$ has a version whose sample paths are almost surely constant-speed minimizing geodesics. Moreover, this property is equivalent to the previous ones. □
On finite probability spaces, the statement about geodesics in 4.4 extends to $p = 1$ if the constant-speed condition is omitted, as shown in 3.5. However, this is not the case on infinite probability spaces, as the following example shows.
is a constant-speed minimizing geodesic in L 1 ((0, 1), M ), but none of its sample paths are continuous.□ Definition 4.6 (probability distributions).Let P p (M ) denote the space of all probability distributions P on M which satisfy for one (equivalently, all) o ∈ M that ∥d(o, •)∥ L p (P ) < ∞.We endow P p (M ) with the Wasserstein metric, where the infimum is over all probability distributions R on M × M with marginals P, Q.Moreover, we write P n (M ) for the subset of all atomic probability distributions of the form 1 n n i=1 δ xi , where δ xi is the Dirac measure centered at x i ∈ M .
As an aside, the set P n (M ) of atomic distributions can equivalently be characterized as the set of {0, 1/n, . . ., 1}-valued probability measures. This equivalence uses the separability of M and is shown in A.6. The following lemma identifies samples with probability distributions, namely, with their empirical laws.

Lemma 4.7 (Samples as probability distributions). For any n ∈ N, the sample space (M n /S n , dp ) is isometric to the space (P n (M ), dp ) of atomic probability distributions.
Proof.Samples x = π(x) ∈ M n /S n are naturally identified with atomic probability distributions where the last minimum is over all atomic probability distributions R ∈ P n (M ×M ) with marginal laws P and Q.By Birkhoff's theorem, one may equivalently take the minimum over the larger set of all (not necessarily atomic) probability distributions R on M × M with marginal laws P and Q [44, Proposition 1.3.1].This shows that the right-hand side equals dp (P, Q).Therefore, the identification of samples with probability distributions is an isometry.□ Here M n /S n is identified with the subset P n (M ) of P p (M ) using 4.7. Proof Recall from 3.2 that the sample space (M n /S n , dp ) is the path-metric quotient of the configuration space (M n , d p ) with respect to the action of permutation group of {1, . . ., n}.A similar statement applies to infinite sample and configuration spaces, as shown in the following lemma.In analogy to 3.2, let π : L p ((0, 1), M ) → P p (M ) be the map from random variables to their law or, in more analytic terms, the pushforward of the Lebesgue measure along the given measurable function.Moreover, let Aut((0, 1)) be the automorphism group of the probability space (0, 1), i.e., the group of bi-measurable measure-preserving functions from (0, 1) to itself.Lemma 4.9 (Quotient structure).The Wasserstein metric dp on P p (M ) is a quotient metric: where P, Q ∈ P p (M ) and x, y ∈ L p ((0, 1), M ) with π(x) = P , π(y) = Q.
Proof. The first equality holds because any coupling R in Definition 4.6 of the Wasserstein metric is the joint law of some random variables x, y ∈ L p ((0, 1), M ).
The second equality holds because the action of Aut((0, 1)) is nearly transitive on the fibers of π in the following sense [12, Lemma 6.4]: for all x, y ∈ L p ((0, 1), M ) with π(x) = π(y) and all ϵ > 0, there exists σ ∈ Aut((0, 1)) such that

The following theorem generalizes 3.6 from finite to infinite configurations and samples, respectively.

Theorem 4.10 (Geodesics between infinite samples). Let M be a connected complete locally compact path-metric space. Then any minimizing geodesic in the infinite sample space (P p (M ), dp ) is the projection of a minimizing geodesic in the configuration space (L p (Ω, M ), d p ), which we call its horizontal lift.
Proof.This is proven in [51,Corollary 7.22] along the same lines as 3.6, i.e., using Lagrangian energy-action pairs.The horizontal lift is called displacement interpolation there.□ Skeleta and orbit-type strata of finite sample spaces M n /S n were defined in 2.1 and 2.7, respectively.Via the isometry 4.7 to atomic probability distributions and the isometric embedding 4.8 into p-integrable probability distributions, one obtains straight-forward extensions to skeleta and orbit-type strata of infinite sample spaces, as defined next.Definition 4.11 (Infinite skeleta and orbit-type strata).For any q ∈ N, the qskeleton in the infinite-sample space P p (M ) is the subset P(M ) q of all probability distributions whose support is a set of at most q points.Similarly, for any partition (w) := (w 1 ≥ • • • ≥ w q ) of 1 consisting of non-negative real numbers w i summing up to 1, the (w)-stratum in the infinite-sample space P p (M ) is the subset of all P = q i=1 w i δ xi ∈ P(M ) q with distinct points x i .The measure P is called regular if the points x i are distinct and the weights w i are strictly positive.
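To make the identification of samples with empirical laws in 4.7 concrete, the following minimal sketch computes the quotient distance between two n-samples by brute force over all reorderings. It rests on assumptions that are not taken from the text: M is the real line with d(a, b) = |a − b|, the configuration distance is normalized by 1/n (in line with the identification of n-tuples with piecewise constant functions on (0, 1)), and the function names `config_distance` and `sample_distance` are hypothetical.

```python
from itertools import permutations


def config_distance(x, y, p=2.0):
    """Normalized l^p distance between two ordered configurations on the real line."""
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / len(x)) ** (1.0 / p)


def sample_distance(x, y, p=2.0):
    """Quotient distance: minimize config_distance over all reorderings of y."""
    if len(x) != len(y):
        raise ValueError("both samples must have the same size n")
    return min(config_distance(x, perm, p) for perm in permutations(y))


x = (0.0, 1.0, 5.0)
y = (5.0, 0.0, 2.0)
print(config_distance(x, y))  # compares the points in the given order
print(sample_distance(x, y))  # compares the underlying unordered samples
```

The brute-force minimum over n! reorderings is only feasible for very small n; it is meant to mirror the definition of the quotient metric, not to be efficient. By 4.7, the value returned by `sample_distance` should agree with the Wasserstein distance between the two empirical laws under the stated normalization.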

Means and polymeans
In this section, we generalize Fréchet means [25] and k-means [40] to polymeans using the path-metric structure of sample space.Background and further references on Fréchet means can be found in the textbook [47].Throughout this section, we consider the configuration space (M n , d p ) and sample space (M n /S n , dp ) of a connected complete path-metric space (M, d) for some n ∈ N and p ∈ [1, ∞).The following definition introduces polymeans as metric projections onto certain subsets of sample space M n /S n , namely q-skeleta (M n /S n ) q (see 2.2) or (k)-strata (M n /S n ) (k) (see 2.8).
Definition 5.1 (Polymeans). For any q ∈ N, a q-mean of a sample is a dp -nearest point in the q-skeleton of sample space. Similarly, for any partition (k) of n, a (k)-mean of a sample is a dp -nearest point in the closure of the (k)-stratum.
Recall that the q-skeleton is closed, and the closure of the (k)-stratum is the union of all (k ′ )-strata with (k ′ ) ≤ (k). This ensures the existence of q-means and (k)-means, as shown next. One should be aware that a q-mean might consist of fewer than q distinct points, and similarly a (k)-mean might have orbit type (k ′ ) with (k ′ ) ≤ (k).

Lemma 5.2 (Existence of polymeans). If M is a complete locally compact path-metric space, then every sample x ∈ M n /S n has a q-mean and a (k)-mean, for each q ∈ N >0 and orbit type (k).

Proof. For sufficiently large r > 0, the closed ball B r (x) has non-empty intersection with the q-skeleton. By the Hopf-Rinow theorem A.2, this intersection is compact and therefore contains a point of minimal dp -distance to x. The argument for the (k)-stratum is similar. □

Generic configurations have unique polymeans, as shown next. Here generic is understood in a measure-theoretic sense, i.e., up to null sets with respect to a given Riemannian volume form.

Lemma 5.3 (Uniqueness of polymeans).
Let M be a complete finite-dimensional Riemannian manifold, and assume that p = 2. Then the configurations x ∈ M n such that π(x) has more than one q-mean or more than one (k)-mean are a null set with respect to the Riemannian volume form.
Proof.We consider M n as a complete Riemannian manifold with Riemannian distance d 2 .Let K be the q-skeleton or the (k)-stratum in M n , and let C be the set of all points in M n whose distance to K is realized by more than one geodesic (sometimes called the medial axis).At any point in C, the squared distance function to K is non-differentiable [41,Remark 3.6].These points of non-differentiability constitute a C 2 -rectifiable set [41,Proposition 3.7].Thus, its subset C has vanishing measure.□ We next show that the definition of polymeans extends the definition of Fréchet p-means.
Proof.Recall that the 1-skeleton in sample space M n /S n consists of all ȳ = π(y, . . ., y) with y ∈ M and coincides with the orbit-type stratum (M n /S n ) (n) , where (n) denotes the partition of n of length 1.Thus, 1-means coincide with (n)-means and minimize, for a given x = π(x) in M n /S n , the functional over all ȳ = π(y, . . ., y) in the 1-skeleton of M n /S n .Minimizers of the right-hand side, seen as a function of y ∈ M , are exactly Fréchet means.Thus, a point y ∈ M is a Fréchet mean of a configuration x ∈ M n if and only if the sample π((y, . . ., y)) ∈ M n /S n is a 1-mean, or equivalently an (n)-mean, of π(x) ∈ M n /S n .□ k-mean clustering remains a very popular method in cluster analysis, more than 60 years after [40,32].Like the Fréchet p-mean, it can be generalized with the power p of the distance [54].We show below that this corresponds to our geometric definition of polymeans.
Example 5.5 (k-means).q-means correspond exactly to k-means clustering for k = q ∈ N.
Proof. Let x, ȳ ∈ M n /S n with ȳ belonging to the q-skeleton. Then there are lifts x, y ∈ M n such that π(x) = x, π(y) = ȳ, and d p (x, y) = dp (x, ȳ). The set {1, . . ., n} can be partitioned into non-empty subsets A 1 , . . ., A q such that y i = y j for any i, j ∈ A k and k ∈ {1, . . ., q}. Then The left-hand side is minimized by q-means ȳ, and the right-hand side is minimized by partitions A 1 , . . ., A k and k-means (y 1 , . . ., y k ) with k = q. Therefore, the q-mean and k-mean problems are equivalent. As an aside, the q-mean vector ȳ does not encode the optimal correspondence between points x i and y i , and the k-mean vector (y 1 , . . ., y k ) does not encode the multiplicities #A i . However, this information can be retrieved easily by matching each point x j to the nearest point In this situation, xi are called clusters or sub-samples of sizes k i .
Lemma 5.7 (Polymeans as clusters).If ȳ is a q-mean of x, then there are clus- , then the partition can be chosen such that each cluster xi has size k i .
Proof.Let A 1 , . . ., A q be a partition of {1, . . ., n} as in the proof of 5.6.Then the clusterings xi = π((x j ) j∈Ai ) and ȳi = π((y j ) j∈Ai ) have the desired property.□ Lemma 5.7 exhibits polymeans as weighted means, where the weights correspond to the cluster sizes, normalized by the total number of samples.The same interpretation is obtained by identifying polymeans with atomic measures via 4.7.In some situations it may be advantageous to consider unweighted polymeans, which encode only the locations but not the weights of the clusters.The following definition describes q such clusters located at mutually distinct points y 1 , . . ., y q ∈ M .Recall that the ensemble of such mutually distinct point configurations modulo permutations is the regular stratum (M q /S q ) reg .Definition 5.8 (Unweighted q-means).For any q ∈ N, an unweighted q-mean of a sample x = π(x) ∈ M n /S n is a regular q-sample z ∈ (M q /S q ) reg which minimizes the functional Unweighted q-means may fail to exist for a given q ∈ N >0 because the regular stratum (M q /S q ) reg is not closed.It is, however, open and dense.Thus, for any given q ∈ N >0 , there always exists an unweighted q ′ -mean with q ′ ≤ q.The definitions of weighted and unweighted polymeans are consistent with each other in the following sense.Lemma 5.9 (Relation between weighted and unweighted q-means).Let x ∈ M n /S n , and let z 1 , . . ., z q be distinct points in M .Then π(z 1 , . . ., z q ) ∈ (M q /S q ) reg is an unweighted q-mean of x if and only if π((z 1 , . . ., z 1 k1 times , . . ., z q , . . ., z q kq times )) ∈ M n /S n is a q-mean of x for some integer weights k i summing up to n.
Proof. This follows directly from the definitions. □

Skeleta and orbit-type strata in infinite sample space P p (M ) were defined in 4.11. This yields the following straightforward extensions to polymeans of infinite samples.
Definition 5.10 (Population polymeans). A population q-mean of an infinite sample P ∈ P p (M ) is a dp -nearest point in the q-skeleton of P p (M ). Similarly, for any partition (k) := (k 1 ≥ • • • ≥ k q ) consisting of non-negative real numbers k i summing up to 1, a population (k)-mean of P ∈ P p (M ) is a dp -nearest point in the (k)-stratum of P p (M ). Moreover, an unweighted population q-mean of P ∈ P p (M ) is a dp -nearest point in the regular stratum of P q (M ).
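The equivalence between q-means and k-means clustering for finite samples (5.5) can be illustrated by a small brute-force search. The sketch below is an illustration under assumptions that are not taken from the text: M is the real line, p = 2 (so that the best center of each cluster is its arithmetic mean), the cost is normalized by 1/n, and the helper name `q_mean_bruteforce` is hypothetical. The returned centers play the role of an unweighted q-mean, and the centers together with the cluster sizes play the role of the weighted q-mean.

```python
from itertools import product


def q_mean_bruteforce(xs, q):
    """Best fit of xs by at most q points on the real line, for p = 2.

    Returns (cost, centers, sizes), where cost is the mean squared distance.
    """
    n = len(xs)
    best = None
    for labels in product(range(q), repeat=n):  # every assignment of points to clusters
        clusters = [[x for x, lab in zip(xs, labels) if lab == k] for k in range(q)]
        clusters = [c for c in clusters if c]  # fewer than q distinct points is allowed
        centers = [sum(c) / len(c) for c in clusters]  # Euclidean 2-mean of each cluster
        cost = sum((x - m) ** 2 for c, m in zip(clusters, centers) for x in c) / n
        if best is None or cost < best[0] - 1e-15:
            order = sorted(range(len(centers)), key=lambda i: centers[i])
            best = (cost, [centers[i] for i in order], [len(clusters[i]) for i in order])
    return best


xs = [0.0, 0.2, 0.1, 5.0, 5.2, 5.1]
print(q_mean_bruteforce(xs, 1))  # q = 1: the arithmetic mean, a Frechet 2-mean
print(q_mean_bruteforce(xs, 2))  # q = 2: two clusters of three points each
```

The search visits q^n assignments and is therefore only usable for tiny samples; practical k-means algorithms use heuristics instead and are not described in the text.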

Random samples
Throughout this section, we consider the configuration space (M n , d p ) and sample space (M n /S n , dp ) of a separable complete path-metric space (M, d) for some n ∈ N and p ∈ [1, ∞).We use the letter P to designate probability distributions.Thus, P(M n /S n ) is the set of probability distributions on sample space, and P(M n ) is the set of all probability distributions on configuration space.Moreover, we write P(M n ) Sn for the subset of symmetric probability distributions, where symmetry means S n -invariance.Lemma 6.1 (Distributions of samples).Probability distributions on sample space M n /S n correspond exactly to symmetric probability distributions on configuration space M n .Proof.We claim that the projection from configuration onto sample space induces a bijection P(M n ) Sn ∋ P → π * P ∈ P(M n /S n ).To prove the claim, we will construct an inverse of this map by randomization over the S n -orbit using the probability kernel This kernel is S n -invariant and consequently descends to a probability kernel which maps samples x to uniform distributions on their fibers π −1 (x) in configuration space.The two kernels are related by K = K • π.For any probability distribution P on M n /S n , we write K(x) P (dx) for the composition of the kernel K with the probability distribution P .Formally, this is a measure-valued Pettis integral.Then the map where r σ : M n ∋ x → x σ ∈ M n is the action of the permutation σ on the configuration space, and where the last equality follows from the symmetry of P .□ Hewitt and Savage [27,Section 12] characterized the set of extremal points within the convex set of symmetric probability distributions on M n , for short, extremal distributions.Moreover, they proved that every symmetric probability distribution is a mixture of extremal distributions and called such mixtures presentable.As a corollary to Lemma 6.1, one obtains an elementary proof of these facts.The more widely studied case of infinite configurations is 
discussed in 6.3 and 6.4.Proof.The map (6.1.2) is a linear bijection and therefore maps extremal points in its domain to extremal points in its range.The extremal points in the domain are easily identified as the Dirac measures.The image of a Dirac measure δ x with x = π(x) ∈ M n /S n is the distribution 1 n! σ∈Sn δ xσ .The range of the map (6.1.2) consists of mixtures of such distributions, i.e., presentable distributions.Moreover, as (6.1.2) is surjective, all symmetric distributions are presentable.□ The following lemma characterizes distributions of infinite samples, thereby generalizing the corresponding result 6.1 for finite samples.The full permutation group S N of the natural numbers is too large for our purpose.Instead, we consider the infinite permutation group S (N) := n∈N S n , which acts upon the infinite configuration space M N := n∈N M .A probability distribution on M N is called symmetric if it is S (N) -invariant, and the set of symmetric distributions is denoted by P(M N ) S (N) .The correct space of infinite samples, which leads to a generalization of 6.1, is not the quotient space M N /S (N) , but the space P(M ).This is demonstrated in Example 6.5 and is in line with the limiting result 4.8.Lemma 6.3 (Distributions of infinite samples).Probability distributions on the infinite sample space P(M ) correspond exactly to symmetric probability distributions on the configuration space M N .Proof.For some fixed point o ∈ M , define a projection from infinite configuration space to infinite sample space as follows: δ xi , if the weak limit exists, The push-forward along this projection restricts to the following map from symmetric distributions to probability distributions on infinite sample space P(M ): π * : P(M N ) S (N) → P(P(M )).
We claim that the map π * is an inverse of the map where P N := n∈N P denotes the product distribution on M N , and where the integral is a measure-valued Pettis integral.Note that the distributions on the righthand side are laws of conditionally i.i.d.sequences of M -valued random variables.To prove the claim, we appeal to the infinite-sample version 6.4 of the Hewitt-Savage theorem, which states that symmetric distributions coincide exactly with presentable distributions, i.e., with Pettis integrals as above.For any P ∈ P(M ), the weak law of large numbers implies π(x) = P for P N -almost every x ∈ M N .This implies π * (P N ) = δ P .Consequently, every Q ∈ P(P(M )) satisfies This proves the claim and establishes the desired one-to-one correspondence.□ The above proof uses the well-known Hewitt-Savage theorem [27], which is a generalization of Corollary 6.2 to infinite sample spaces.As before, presentable distributions are defined as mixtures of extremal distributions, i.e., of extremal points in the convex set of symmetric distributions.Theorem 6.4 (Infinite Hewitt-Savage theorem [27]).The extremal points in the convex set P(M N ) S (N) of symmetric distributions are exactly the product distributions P N := n∈N P with P ∈ P(M ).Moreover, all symmetric distributions on M N are presentable.
This result is asymptotically consistent with its finite-sample counterpart 6.2. Indeed, by the Diaconis-Freedman theorem [18], symmetric distributions on M n are close to mixtures of product distributions for large n. More precisely, the total variation distance from k-dimensional marginal distributions of elements of P(M n ) Sn to mixtures of product distributions is at most k(k − 1)/n.
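The bound k(k − 1)/n can be checked numerically on a small example. The sketch below compares the k-dimensional marginal of an extremal symmetric distribution (the uniform distribution on all reorderings of a fixed configuration, i.e., drawing without replacement) with the k-fold product of the empirical law (drawing with replacement). It assumes the convention that total variation is the supremum over events of the difference in probability; the source does not state which convention it uses. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def marginal_without_replacement(urn, k):
    """First k coordinates of a uniformly random reordering of the configuration `urn`."""
    draws = list(permutations(range(len(urn)), k))
    law = Counter()
    for idx in draws:
        law[tuple(urn[i] for i in idx)] += Fraction(1, len(draws))
    return law


def product_of_empirical_law(urn, k):
    """Law of k independent draws from the empirical distribution of `urn`."""
    n = len(urn)
    law = Counter()
    for idx in product(range(n), repeat=k):
        law[tuple(urn[i] for i in idx)] += Fraction(1, n ** k)
    return law


def total_variation(mu, nu):
    """sup over events of |mu(A) - nu(A)|, i.e. half the l^1 distance."""
    return sum(abs(mu[key] - nu[key]) for key in set(mu) | set(nu)) / 2


urn, k = (0, 0, 1, 1), 2
tv = total_variation(marginal_without_replacement(urn, k), product_of_empirical_law(urn, k))
print(tv, "<=", Fraction(k * (k - 1), len(urn)))
```

Exact rational arithmetic via `Fraction` avoids rounding issues in the comparison. The example only tests one particular product distribution, so it illustrates the bound rather than the full statement about mixtures.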
The following example shows that the correspondence 6.3 between probability distributions on sample space and symmetric probability distributions on configuration space fails if the sample space is defined as M N /S (N) instead of P(M ).

Example 6.5 (Infinite sample space).
There is a probability distribution on M N /S (N) which does not correspond to any symmetric probability distribution on M N .Proof.In analogy to 6.3 we say that a probability distribution Q on M N /S (N) corresponds to a symmetric probability distribution P on M N if Q = π * P , where π : M N → M N /S (N) is the canonical projection.For any such P , the weak limit lim n→∞ 1 n n i=1 δ xi exists for P -almost every x ∈ M N , as shown in the proof of 6.3.Moreover, this limit is invariant under the action of S (N) on M N because every permutation in S (N) affects only finitely many indices.Thus, if Q corresponds to some P , then the limit lim n→∞ 1 n n i=1 δ xi is well-defined and exists for Q-almost every x ∈ M N /S (N) .However, it is easy to construct a distribution Q on M N /S (N) which does not have this property.Indeed, assuming that M contains at least two points, one may construct a sequence of points x i ∈ M such that 1 n n i=1 δ xi does not converge weakly as n → ∞.Then Q := δ x with x := π(x) is the desired counter-example.□ We next investigate random samples and random configurations.For this purpose, we fix a probability space (Ω, F, P) on which all random variables are defined.A random configuration is a random variable in M n or M N , depending on the finite versus infinite case.Similarly, a random sample is a random variable in M n /S n or P(M ), respectively.A random configuration is called exchangeable if its law is symmetric, i.e., invariant under permutations in S n or S (N) , respectively.The following characterization is analogous to 6.1-6.4.Corollary 6.6 (Random configurations and samples).Random samples correspond exactly (possibly after passing to an extended probability space) to exchangeable configurations, which in turn correspond exactly to conditionally i.i.d.M -valued random variables.This statement applies to finite and infinite configurations and samples, respectively.
Proof.This can be shown in analogy to 6.1-6.4,working with random variables instead of their laws.The extension of the probability space is necessary, unless the given probability space is already sufficiently rich, for implementing the random ordering in the proof of 6.1 and the i.i.d.sampling in the proof of 6.3.□
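The randomization over the S n -orbit used in the proof of 6.1 can be spelled out for finitely supported distributions. The following sketch is an illustration under assumptions: distributions are stored as dictionaries from configurations (tuples) to exact masses, samples are represented by sorted tuples (which requires the points of M to be orderable), and the names `symmetrize` and `push_to_samples` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def symmetrize(law):
    """Spread the mass of each configuration uniformly over its reorderings."""
    out = Counter()
    for config, mass in law.items():
        perms = list(permutations(config))  # all n! reorderings, repeats included
        for perm in perms:
            out[perm] += mass / len(perms)
    return out


def push_to_samples(law):
    """Pushforward along the projection: forget the ordering of each configuration."""
    out = Counter()
    for config, mass in law.items():
        out[tuple(sorted(config))] += mass
    return out


law = {("a", "b", "b"): Fraction(1, 4), ("c", "a", "b"): Fraction(3, 4)}
sym = symmetrize(law)
print(push_to_samples(sym) == push_to_samples(law))  # same distribution on samples
```

In this toy setting, `symmetrize` produces a permutation-invariant distribution on configurations, and projecting it back to samples recovers the distribution one started from, which is the content of the bijection in 6.1 restricted to finitely supported laws.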

Asymptotic properties of polymeans
Polymeans, similar to Fréchet means [29], satisfy a law of large numbers and a central limit theorem under suitable conditions, as shown next. We refer to 5.1, 5.8, and 5.10 for their definition. Throughout this section, (M, d) is a separable complete connected path-metric space, p ∈ [1, ∞), and q ∈ N >0 . The space M , as well as topological products and quotients thereof, is endowed with the corresponding Borel sigma algebra. For some probability distribution P ∈ P p (M ), we consider a sequence of independent P -distributed random variables (x i ) i∈N defined on a complete probability space (Ω, F, P). The corresponding n-samples are denoted by xn := π(x 1 , . . ., x n ) ∈ M n /S n . We write µ n ⊂ (M n /S n ) q for the set of q-means of xn , ȳn ∈ (M n /S n ) q for a measurable selection of q-means of xn , and zn ∈ (M q /S q ) reg for a measurable selection of unweighted q-means of xn . It will be convenient to identify the samples xn , ȳn , zn with their empirical laws P n , Q n , R n , respectively, using the isometry 4.7 between M n /S n and P n (M ). The population counterparts of the above empirical objects are denoted by µ 0 , ȳ0 , z0 , Q 0 , and R 0 , respectively. Note that all of these objects belong to one and the same path-metric space P p (M ) thanks to the isometric embedding 4.8 of finite into infinite sample spaces.
Definition 7.1 (Strong consistency [55]).The empirical q-means µ n are called strongly consistent estimators for the set µ 0 of population q-means if Note that strong consistency is equivalent to the following statement: with probability 1, any accumulation point of the sets µ n belongs to µ 0 .Lemma 7.2 (Strong consistency).The empirical q-means µ n are strongly consistent estimators for the population q-means µ 0 .
This statement is a consequence of the Gamma-convergence of the functionals which are minimized by µ n and µ 0 , respectively, as shown in the following proof.A similar argument is used in [55] and [29,Theorem A.3].These proofs are longer because implications of Gamma-convergence are re-proven there.
Proof. The empirical q-means µ n are the minimizers of the functional Similarly, the population q-means µ 0 are the minimizers of the functional The empirical laws P n converge to the population law P in the Wasserstein metric dp by [44, Proposition 2.2.6]. We claim that this implies Gamma-convergence F n → F . To prove the claim, note that for any converging sequence Moreover, any Q ∈ P p (M ) q can be approximated in the dp -distance by a sequence Q n ∈ (M n /S n ) q . Indeed, Q is of the form $Q = \sum_{i=1}^{q} w_i \delta_{x_i}$ for some x i ∈ M and w i ∈ [0, 1], and the approximations Q n may be defined by rounding the weights to the nearest multiples of 1/n. For any such approximating sequence This proves that F n Gamma-converges to F . Thus, the accumulation points of F n -minimizers are F -minimizers, which is exactly strong consistency. □

If the empirical q-means are strongly consistent and the population q-mean is unique, then any measurable selection Q n of empirical q-means converges in probability to the population q-mean Q 0 . In this situation one may inquire about the rate of convergence Q n → Q 0 . As an auxiliary first step, the following lemma shows that Q n possesses the same best-approximation property as Q 0 , up to some error terms. Controlling these error terms leads to the convergence rate established subsequently in 7.4.

Lemma 7.3 (Error bound). Assume that P ∈ P 2p (M ), let Q 0 ∈ P(M ) q be a q-mean of P , assume that Q 0 is distinct from P , and for each n ∈ N, let Q n ∈ P n (M ) q be a q-mean of the empirical law P n . Then dp (P, Q n ) − dp (P, Q 0 ) ≤ dp (P n , P ) + O P (n −1/2 ).
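The rounding step in the proof of strong consistency, which approximates a finitely supported distribution by one whose weights are multiples of 1/n, can be sketched as follows. Plain rounding of each weight need not preserve the total mass, so this sketch uses largest-remainder rounding instead; that choice, as well as the function name `round_weights`, is an assumption made for illustration and is not taken from the text.

```python
from fractions import Fraction


def round_weights(weights, n):
    """Round non-negative weights summing to 1 to integer counts summing to n.

    The approximating distribution puts mass counts[i] / n on the i-th support point.
    """
    scaled = [Fraction(w) * n for w in weights]
    counts = [int(s) for s in scaled]  # floor, because every s is non-negative
    leftover = n - sum(counts)
    # Hand the remaining atoms to the largest fractional parts.
    by_remainder = sorted(
        range(len(weights)), key=lambda i: scaled[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


thirds = [Fraction(1, 3)] * 3
print(round_weights(thirds, 10))  # three counts that sum to 10
```

Each rounded weight differs from the original one by less than 1/n, so the weights converge as n grows while the support points stay fixed.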
Note that the first assumption guarantees for P -almost every x ∈ M the existence of the Riemannian gradient of the function ρ(x, •) at z0 .Indeed, the only points x where the gradient may fail to exist are the points z0,i , their cut loci Cut(z 0,i ), and the locations which are closest to more than one z0,i .A further condition of [21] to be verified for all x ∈ M is that the function ρ(x, •) is uniformly continuous on bounded domains with respect to the metric dp on M q /S q .This follows from the estimate |ρ(x, π(z ′ )) − ρ(x, π(z))| ≤ max i∈{1,...,q} min j∈{1,...,q} d(z i , z ′ j ) p ≤ q dp (π(z), π(z ′ )) p .
Thus, we have verified the conditions of [21,Theorem 11], and it follows that the sequence zn or equivalently R n is asymptotically normal.□ The asymptotic normality of unweighted q-means generalizes from independent to exchangeable observations x 1 , x 2 , . . .under certain conditions.Equivalently, as shown in 6.6, the observations can be seen as random elements in an infinite sample space.
Proof.By the infinite Hewitt-Savage theorem 6.4 and its Corollary 6.6, the exchangeable sequence x 1 , x 2 , . . . is i.i.d.conditionally on some sigma algebra G.It follows from 7.6 that conditionally on G, the sequence R 1 , R 2 , . . . is asymptotically normal with mean R 0 and covariance Σ, for some G-measurable symmetric bilinear form Σ on the tangent space of (M q /S q ) reg at R 0 .The covariance Σ can be computed explicitly as follows.Let ρ be defined as in the proof of 7.6, and recall that the gradient of the function ρ(x, •) evaluated at R 0 exists for P -almost every x ∈ M .Therefore, for any i ∈ N >0 , one may define the random variable X i as the gradient of the random function ρ(x i , •) evaluated at R 0 .Accordingly, X i is a random variable with values in the tangent space of (M q /S q ) reg at R 0 .Let H denote the Hessian of the function P ρ at R 0 .Thanks to the non-degeneracy assumption in 7.6, H is an automorphism on the tangent space of (M q /S q ) reg at R 0 , and we denote its inverse by H−1 .Then the covariance Σ is given by [21,Theorem 11] To ensure that Σ is deterministic, we make the following assumption: (1) show that (1) is equivalent to Minimizing curves for (E, A) are exactly constant-speed minimizing geodesics.
Proof.Properties (1) and ( 3) of Lagrangian actions hold by definition.Property (2) can be verified as follows: as (M, d) is a path-metric space, the definition of the energy implies for any real numbers s ≤ t and points x, y ∈ M that E s,t (x, y) = inf Estimating the right-hand side using Hölder's inequality yields E s,t (x, y) ≤ inf A s,t (c).
For constant-speed curves, Hölder's inequality is an equality. Moreover, any continuous curve can be reparameterized to constant speed. Therefore, the preceding inequality is actually an equality. This shows (2). The statement about minimizing curves hinges on the following Hölder inequality: for all u ≤ v ≤ w in the domain of a continuous curve c : Equivalently, by the above Hölder inequality, it holds for all u ≤ v ≤ w in [s, t] that E u,v (c(u), c(v)) + E v,w (c(v), c(w)) = E u,w (c(u), c(w)), which means that c minimizes the energy-action pair (E, A). □

Lemma A.6 (Atomic distributions). Let M be a metric space or, more generally, a first-countable space. Then the set P n (M ) coincides with the set of {0, 1/n, . . ., 1}-valued probability distributions on M .
Proof.Clearly, every distribution in P n (M ) takes values in {0, 1/n, . . ., 1}.Conversely, assume that P is a {0, 1/n, . . ., 1}-valued probability distribution.Let x ∈ M , and let (U i ) i∈N be a decreasing basis of open neighborhoods of x.If min i∈N P (U i ) vanishes, then it vanishes for sufficiently large i, and consequently x does not belong to the support of P .Otherwise, P ({x}) = min i∈N P (U i ) ≥ 1 n , which can be the case for only finitely many x ∈ M .Therefore, the support of P is a finite set.It follows that P is a weighted sum of Dirac measures at distinct points in M .Necessarily, the weights are multiples of 1/n.□ 3 for any r > 1/2 and any a, b ∈ M a point c = c(a, b) ∈ M such that max{d(a, c), d(c, b)} ≤ rd(a, b).

Lemma 3.3 (Convexity of orbit-type strata). Connected components of orbit-type strata in the configuration space (M n , d p ) have convex closures. Moreover, orbit-type strata in the sample space (M n /S n , dp ) have convex closures.
a constant-speed minimizing geodesic, and let x ∈ π −1 (c(0)). For each m ∈ N we construct a curve c m ∈ C([0, 1], M n ) as follows: Set c m (0) := x, and then inductively, for each k ∈ {0, . . ., m − 1}, choose c m |[k/m, (k + 1)/m] as a constant-speed minimizing geodesic from c m (k/m) to the orbit π −1 (c((k + 1)/m)), until c m reaches the orbit π −1 (c(1)). The family {c m : m ∈ N} is equicontinuous because the curves c m have constant speed. Moreover, all curves c m take values in the closed ball of radius dp (c(0), c(1)) around x, which is compact by the Hopf-Rinow theorem A.2. Thus, by the Arzelà-Ascoli theorem [53, Theorem 43.15], the set {c m : m ∈ N} is pre-compact in the topology of uniform convergence and therefore has a cluster point c ∈ C([0, 1], M n ). The cluster point satisfies π • c = c because the curves c m satisfy π(c m (k/m)) = c(k/m) for all 0 ≤ k ≤ m. By construction, c is a minimizing geodesic. □

Example 3.8 (Lack of interior regularity). The assertion of 3.7 is wrong for non-Riemannian complete path-metric spaces.

Proof. Let (M, d) be an open book space, for example the 3-spider, one of the simplest tree spaces [8]. We choose 3 points x, y, z on the 3 lines with the same distance from the center 0. Let c : [0, 2] → M 2 be the minimal geodesic from c(0) = (x, y) via c(1) = (0, 0) to c(2) = (z, z). Then the isotropy group S 2 of c(1) and c(2) r, satisfy d p (x i , y i ) ≤ ϵ/3 for all i ∈ {1, . . ., m}. Let F be the Banach space of continuous bounded functions on B r (o) with the uniform norm. Then (B r (o), d) embeds isometrically into F via the map B r (o) ∋ a → d(a, •) ∈ F . Thus, B r (o) may be seen as a subset of F . Moreover, F is separable because B r (o) is separable.
because the set Γ := (a, c, b) ∈ M 3 : max{d(a, c), d(c, b)} ≤ αd(a, b) is Polish, the projection Γ ∋ (a, c, b) → (a, b) ∈ M 2 is continuous, and the inverse image of any (a, b) ∈ M 2 under this projection is compact.To prove the statement about geodesics, we proceed as in the proof of 3.5 and associate Lagrangian energyaction pairs (E, A) and (E p , A p ) to (M, d) and (L p ((0, 1), M ), d p ), respectively.By A.3 a continuous curve c : [0, 1] → L p ((0, 1), M ) is a length-minimizing constantspeed geodesic if and only if it satisfies for all u ≤ v ≤ w in [0, 1] that

Lemma 4.8 (Infinite samples). The sample spaces (M n /S n , dp ) are isometrically embedded in the complete path-metric space (P p (M ), dp ). For M locally compact, they converge to (P p (M ), dp ) in the following sense: for any compact K ⊂ P p (

Corollary 6.2 (Finite Hewitt-Savage theorem). The extremal points in the convex set P(M n ) Sn of symmetric distributions are exactly of the form 1 n! σ∈Sn δ xσ , x ∈ M n . Moreover, all symmetric probability distributions on M n are presentable.
Theorem 3.5 (Geodesics between configurations). A continuous curve c : [0, 1] → M n is a constant-speed minimizing geodesic in (M n , d p ) with p ∈ (1, ∞) if and only if its component curves c 1 , . . ., c n : [0, 1] → M are constant-speed minimizing geodesics in (M, d). For p = 1 a similar statement holds without the requirement of constant speed.
2 ) (2) , but the geodesic c does not lie in (M 2 ) (2) . □

Proof. For p > 1, we associate Lagrangian energy-action pairs (E, A) and (E p , A p ) to (M, d) and (M n , d p ), respectively, as in A.5: