Towards a bilipschitz invariant theory

Consider the quotient of a Hilbert space by a subgroup of its automorphisms. We study whether this orbit space can be embedded into a Hilbert space by a bilipschitz map, and we identify constraints on such embeddings.


Introduction
Given a Hilbert space and a subgroup of its automorphisms (isometric linear bijections), we wish to embed the corresponding orbit space into a Hilbert space by a bilipschitz map. We are motivated by the analysis of data that resides in such an orbit space. For example:
Graphs. The adjacency matrix of an unlabeled graph on $n$ vertices is a member of $\mathbb{R}^{n\times n}$, but this matrix representation is only unique up to the conjugation action of $S_n$.
Point clouds. A point cloud consisting of $n$ vectors in $\mathbb{R}^d$ can be represented as a member of $\mathbb{R}^{d\times n}$ up to the right action of $S_n$ (i.e., column permutation).
Landmarks. Landmarks of a biological specimen can be represented in $\mathbb{R}^{3\times n}$, with the left action of $SO(3)$ giving different representations of the same specimen.
Audio signals. An audio signal can be modeled as a real-valued function in $L^2(\mathbb{R})$, but different time delays of the same signal arise from an orthogonal action of $\mathbb{R}$.
In each of the settings above, data is represented in a Hilbert space $V$, but each data point is naturally identified with the other members of its orbit under a group $G$ of automorphisms of $V$. To correctly measure distance, it is more appropriate to treat the data as residing in the orbit space $V/G := \{G \cdot x : x \in V\}$. However, most existing data processing algorithms were designed for data in a Hilbert space, not an orbit space. Luckily, as the following section demonstrates, a bilipschitz embedding of the orbit space into a Hilbert space will enable one to use such algorithms. This motivates the study of whether a given orbit space can be embedded into a Hilbert space by a bilipschitz map. In this paper, we identify several necessary and sufficient conditions for such embeddings, and in some cases, we explicitly construct bilipschitz embeddings that minimally distort the quotient distance.
Section 3 defines the metric quotient $V/\!/G$, whose points are the topological closures of $G$-orbits in $V$. This matches the intuition that salient features are continuous functions of the data; also, while $V/G$ might only be a pseudometric space, $V/\!/G$ is always a metric space. Section 4 shows how a bilipschitz invariant on the sphere can be extended to a bilipschitz invariant on the entire space. In Section 5, we show that every bilipschitz invariant map is not differentiable. This represents a significant departure from classical invariant theory [56,76,84,12], which studies polynomial (hence differentiable) invariants. Section 6 then uses the extension from Section 4 to construct bilipschitz invariants for finite $G \le O(d)$ from bilipschitz polynomial invariants on the sphere, which in turn exist precisely when $G$ acts freely on the sphere (hence rarely). In Section 7, we use the semidefinite program from [67] to show that some of our extensions of polynomial invariants deliver the minimum possible distortion of $V/\!/G$ into a Hilbert space. Finally, Sections 8 and 9 estimate the Euclidean distortion of infinite-dimensional Hilbert spaces modulo permutation and translation groups, respectively.

Related work
The literature considers many ways of processing data in an orbit space. In what follows, we discuss some relevant examples, which we broadly categorize according to their inspiration.

Deep learning
First, we discuss how modern tools in machine learning currently process data in orbit spaces. (In short, while there has been some effort to construct Lipschitz feature maps, there has been little work to date on bilipschitz feature maps.) Much of today's excitement in machine learning comes on the heels of deep neural networks achieving super-human performance in various tasks. For data with translation invariance, such as images and audio signals, it is empirically beneficial for the early layers of a neural network to exhibit convolutional structure [83,37,63,52]. Since convolutions are equivariant to translation, one might think of these early layers as isolating translation-equivariant features of the data. The real-world success of convolutional neural networks inspired the development of more principled versions of this fundamental idea. For example, Mallat introduced the scattering transform [70,24,25,89], which iteratively alternates between taking a wavelet decomposition and a pointwise absolute value. The scattering transform is invariant to translation and stable to diffeomorphism, and it has since been generalized to other settings [49,78,35].
A modern goal in this vein is to develop deep learning tools to accommodate non-Euclidean data, such as data in orbit spaces. This is the charge of geometric deep learning [23,22]. As an example, consider the task of classifying molecules according to whether they are harmful to humans. We can represent the molecule as a graph, with vertices representing atoms and edges representing bonds, and then we can train a graph neural network for this classification task. By design, graph neural networks are invariant to relabeling the vertices of the input graph. As such, they either fail to separate certain pairs of nonisomorphic graphs, or they are slow (assuming the graph isomorphism problem is hard). The standard message-passing graph neural networks distinguish isomorphism classes of graphs as well as the Weisfeiler-Leman test [92,96,82,59], while other approaches achieve better separation [73,21,58].

Invariant theory
One may express any group-invariant classifier as the composition of a group-invariant feature map with a classifier on the feature domain. The feature map is most expressive if it separates all pairs of distinct orbits. Under mild conditions on the group (e.g., if the group is finite), Hilbert [56] showed that there exists a polynomial map into finite-dimensional space that separates the orbits; in fact, one may take the coordinate functions to be generators of the algebra of invariant polynomials. Modern work in invariant theory constructs smaller separating sets of polynomials with bounds on their degrees [43,40,41,42]. (Warning: polynomial invariants are considered to be separating if they separate orbits as well as the entire ring of polynomial invariants, which might fail to distinguish certain orbits if the group is not compact.) Such polynomial maps play a key role in both invariant and equivariant machine learning [86,87,19].
Polynomial invariants are also used to solve the orbit recovery problem. For this problem, there is a known group $G$ acting linearly on a vector space $V$, and the task is to recover the orbit $G \cdot x$ of an unknown point $x \in V$ from data of the form $\{y_i = g_i x + e_i\}_{i=1}^n$, where each $g_i \in G$ and $e_i \in V$ is drawn independently at random. (In some settings, this problem is also known as multi-reference alignment and can be viewed as a sub-problem of cryo-electron microscopy.) Much of the recent work on this problem applies the method of moments, where one uses the data to estimate the moment $m_k(x) := \mathbb{E}_{g \sim \mathrm{Unif}(G)} (gx)^{\otimes k}$ for various $k$, and then estimates the orbit $G \cdot x$ from these moments [12,79,18,17,45]. Notably, the coordinates of $m_k(x)$ are $G$-invariant polynomials of $x$.
Polynomial invariants have been investigated for well over a century, and the large body of existing work makes them particularly nice to interact with in theory. Unfortunately, they can be rather finicky in practice. As an example, consider the Viète map, which sends any tuple $\{a_i\}_{i=1}^n$ of complex numbers to the coefficients of the corresponding degree-$n$ monic polynomial $\prod_{i=1}^n (z - a_i)$. This defines a homeomorphism $\mathbb{C}^n/S_n \to \mathbb{C}^n$ with the elementary symmetric polynomials as coordinate functions [93]. While the continuity of the inverse map is most cumbersome to demonstrate where the polynomial has repeated roots, the inverse map is also unstable elsewhere, perhaps most famously at Wilkinson's polynomial $\prod_{k=1}^{20} (z - k)$ [94]. For this polynomial, perturbing the coefficient of $z^{19}$ by machine precision produces a sizable error in several of the roots. Considering we are motivated by computational applications, this suggests that we pursue a more stable family of invariants.
The theory that was developed to study this particular family of invariants has recently inspired the construction of new families of invariants for many more group actions. For example, [28,27] modified polynomial separating invariants for abelian groups in order to be Lipschitz, [9] identified bilipschitz permutation invariants based on sorting, and [44,3] discovered semialgebraic separating invariants for a variety of groups. Finally, [29,71,72] introduced and studied max filtering, which directly generalizes phase retrieval to produce easy-to-compute semialgebraic separating invariants for any closed group. Furthermore, max filtering invariants are frequently bilipschitz in the quotient metric, thereby avoiding the stability problem exemplified by Wilkinson's polynomial.

Metric embeddings
Consider the following fundamental question: Given metric spaces $M$ and $N$, does there exist a bilipschitz map $M \to N$, i.e., a metric embedding of $M$ into $N$? This is an active area of research; see [75] for example. In the cases where a metric embedding exists, one seeks metric embeddings of minimal distortion, that is, the quotient of upper and lower Lipschitz bounds. If $M$ is finite and $N$ is $|M|$-dimensional Euclidean space, the distortion-minimizing embeddings can be determined by semidefinite programming [67,68]. Building on ideas from [61], Eriksson-Bique [46] constructed a metric embedding of $\mathbb{R}^d/G$ for any finite $G \le O(d)$ into a Euclidean space. (This construction was later generalized in [97].) Furthermore, the dimension of the target space and the distortion of the map are both bounded by functions of $d$. Note that any metric embedding $\mathbb{R}^d/G \to \mathbb{R}^n$ determines a $G$-invariant map $\mathbb{R}^d \to \mathbb{R}^n$ that separates $G$-orbits in a stable way. To date, Eriksson-Bique's map and max filtering are the only known methods of constructing bilipschitz invariants for arbitrary finite subgroups of $O(d)$.

Bilipschitz maps and their applications
Given a map $f : X \to Y$ between metric spaces $X$ and $Y$, then provided $X$ has at least two points, we may take $\alpha, \beta \in [0, \infty]$ to be the largest and smallest constants (respectively) such that
$$\alpha \cdot d_X(x, x') \le d_Y(f(x), f(x')) \le \beta \cdot d_X(x, x') \qquad \text{for all } x, x' \in X.$$
These are the (optimal) lower and upper Lipschitz bounds, respectively. (Notice that $\alpha$ and $\beta$ are not well defined when $X$ has fewer than two points.) The distortion of $f$ is the quotient $\beta/\alpha$, with the convention that division by zero is infinite. A map with finite upper Lipschitz bound is called Lipschitz, a map with positive lower Lipschitz bound is called lower Lipschitz, and a map with finite distortion is called bilipschitz. Observe that Lipschitz functions are necessarily uniformly continuous, and lower Lipschitz functions are necessarily invertible on their range with uniformly continuous inverse.
Bilipschitz maps are particularly useful in the context of data science. Indeed, while many data science tools have been developed for data in Euclidean space, in many cases, data naturally arises in other metric spaces. Given a metric space $X$ and a map $f : X \to \mathbb{R}^d$, one may pull back these Euclidean tools through $f$ to accommodate data in $X$. Furthermore, we can estimate the quality of this pullback in terms of the bilipschitz bounds of $f$. Three examples of this phenomenon follow.
Example 1 (Nearest neighbor search). Fix data $\{x_i\}_{i \in I}$ in a metric space $X$ and a parameter $\lambda \ge 1$. The $\lambda$-approximate nearest neighbor problem takes as input some $x \in X$ and outputs $j \in I$ such that
$$d_X(x, x_j) \le \lambda \cdot \min_{i \in I} d_X(x, x_i).$$
To date, many algorithms solve this in the case where $X$ is Euclidean [4]. When $X$ is non-Euclidean, we may combine such a black box with a map $f : X \to \mathbb{R}^d$ of distortion $c$ to solve the $c\lambda$-approximate nearest neighbor problem in $X$. Indeed, find $j \in I$ such that
$$\|f(x) - f(x_j)\| \le \lambda \cdot \min_{i \in I} \|f(x) - f(x_i)\|.$$
Then, writing $\alpha$ and $\beta$ for the bilipschitz bounds of $f$, we have
$$d_X(x, x_j) \le \tfrac{1}{\alpha} \|f(x) - f(x_j)\| \le \tfrac{\lambda}{\alpha} \min_{i \in I} \|f(x) - f(x_i)\| \le \tfrac{\beta\lambda}{\alpha} \min_{i \in I} d_X(x, x_i) = c\lambda \cdot \min_{i \in I} d_X(x, x_i).$$
Example 2 (Clustering). Given data $\{x_i\}_{i \in I}$ in a metric space $X$ and a parameter $k \in \mathbb{N}$, we seek to partition the data into $k$ clusters. There are many clustering objectives to choose from, but the most popular choice when $X = \mathbb{R}^d$ is the $k$-means objective: This objective is popular in Euclidean space since Lloyd's algorithm is fast and works well in practice. In addition, $k$-means++ delivers an $O(\log k)$-competitive solution to the $k$-means problem [7]. When $X$ is non-Euclidean, we may combine a $\lambda$-competitive solver for $\mathbb{R}^d$ with a map $f : X \to \mathbb{R}^d$ of distortion $c$ to obtain a $c^2\lambda$-competitive solver for $X$. Indeed, suppose
Example 3 (Visualization). To visualize data $\{x_i\}_{i=1}^n$ in a metric space $X$, one might apply multidimensional scaling [64] to represent the data as points in a Euclidean space of dimension $k \in \{2, 3\}$. The multidimensional scaling objective is Here, , where $I$ is the identity matrix and $J$ is the matrix of all ones. The minimizer can be obtained by taking the eigenvalue decomposition $g(D)$ Note that this requires one to compute the top eigenvalues and eigenvectors of an $n \times n$ matrix. Meanwhile, if $X = \mathbb{R}^d$, this is equivalent to running principal component analysis on $\{x_i\}_{i=1}^n$, which only requires computing the top eigenvalues and eigenvectors of a $d \times d$ matrix, and so the runtime is linear in $n$. This speedup is available to non-Euclidean $X$ given a map $f : X \to \mathbb{R}^d$ with bilipschitz bounds $\alpha$ and $\beta$. In particular, let is an orthogonal projection in the space of symmetric matrices, we have As such, the faster solution introduces an additive error that is smaller when $\alpha$ and $\beta$ are both closer to 1. This approach was used in [29] to visualize how the shapes of U.S. Congressional districts are distributed.
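To make the pullback in Example 1 concrete, the following is a minimal sketch, not taken from the paper, for the orbit space $\mathbb{R}^n/S_n$. It assumes the sorting map as the embedding $f$ (sorting-based permutation invariants are mentioned in the related work above); sorting is an isometry on this orbit space by the rearrangement inequality, so here $c = 1$. The function names and the toy data are hypothetical, and the brute-force quotient distance is included only to check the answer.

```python
import itertools
import math

def embed(x):
    """Sorting map R^n -> R^n; invariant under permutations of the entries."""
    return sorted(x)

def euclidean(a, b):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

def quotient_distance(x, y):
    """Brute-force distance between the S_n-orbits of x and y."""
    return min(euclidean(x, p) for p in itertools.permutations(y))

def nearest_neighbor(query, data):
    """Index of the data point whose embedded image is closest to embed(query)."""
    fq = embed(query)
    return min(range(len(data)), key=lambda i: euclidean(fq, embed(data[i])))

# Hypothetical toy data: three orbits in R^3 / S_3.
data = [(3.0, 1.0, 2.0), (0.0, 5.0, 5.0), (9.0, 8.0, 7.0)]
query = (2.1, 2.9, 1.2)

j = nearest_neighbor(query, data)
brute = min(range(len(data)), key=lambda i: quotient_distance(query, data[i]))
print(j, brute)
```

Any Euclidean nearest neighbor routine could replace the linear scan in `nearest_neighbor`; the scan is used here only to keep the sketch self-contained.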

The metric quotient
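In this section, we identify an appropriate notion of quotient for metric spaces. Given a group $G$ acting on a metric space $X$, one may consider the set of orbits
$$X/G := \{G \cdot x : x \in X\}.$$
This quotient is endowed with a pseudometric $d_{X/G}$ defined by
$$d_{X/G}(G \cdot x, G \cdot y) := \inf \sum_{i=1}^n d_X(p_i, q_i),$$
where the infimum is taken over all $n \in \mathbb{N}$ and $p_1, q_1, \ldots, p_n, q_n \in X$ such that $x \sim p_1$, $q_i \sim p_{i+1}$ for each $i \in \{1, \ldots, n-1\}$, and $q_n \sim y$. Here, $p \sim q$ means that $p$ and $q$ belong to the same $G$-orbit.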
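Example 4. Suppose $X = \mathbb{R}^2$ with Euclidean distance and $G = \{\begin{bmatrix} t & 0 \\ 0 & 1/t \end{bmatrix} : t \neq 0\}$. Then the members of $X/G$ are the origin, the two punctured coordinate axes, and the hyperbolas $\{(a, b) : ab = c\}$ for $c \neq 0$. Furthermore, $d_{X/G} \equiv 0$ since both axes are arbitrarily close to the origin and each hyperbola is arbitrarily close to both axes.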
In this paper, we are primarily interested in cases where G acts by isometries on X, in which case an easy argument akin to the proof of Theorem 4 in [57] gives that Example 5. Suppose X = R 2 with Euclidean distance and G = SO(2).Then the members of X/G are given by the origin: Furthermore, the reverse triangle inequality implies Example 6. Suppose X is the Hilbert space ℓ 2 (N; R) of square summable real-valued sequences and G is the group of bijections N → N. Then d X/G does not define a metric on X/G.Indeed, suppose x ∈ X is entrywise nonzero, and observe We may continue in this way to construct a sequence in G • x that converges to the right shift y := (0, x 1 , x 2 , x 3 , x 4 , . ..), but y ̸ ∈ G • x since x is entrywise nonzero.Thus, Recall that a pseudometric space determines a metric space by identifying points of distance zero.This motivates the definition of the metric quotient: One may show that d X/ /G is a metric on X//G.Here, we use // to signify that we are taking two quotients: we mod out by G, and then by the zero set of d X/G .This notation has been used previously in the literature to denote the geometric invariant theory quotient [74], a notion that directly inspired the authors' definition of metric quotient.A similar quotient is defined in [91] for complete metric spaces modulo general equivalence classes.This quotient uses the same approach of identifying points with pseudometric distance zero, but then takes the metric completion of the result.In the special case where X is complete and G acts on X by isometries, one can show that X//G is already complete, and so these notions of quotient coincide.
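As a numerical sanity check on Example 5, the sketch below approximates the quotient distance for $G = SO(2)$ acting on $\mathbb{R}^2$ and compares it with $|\,\|x\| - \|y\|\,|$, the value suggested by the reverse triangle inequality. It assumes the standard fact that, for an isometric action, the quotient distance is the infimum of $d_X(x, gy)$ over $g \in G$; the angle grid and the function names are illustrative choices, not part of the paper.

```python
import math

def rotate(p, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def orbit_distance(x, y, steps=3600):
    """Approximate inf over rotations g of |x - g y| on a uniform grid of angles."""
    return min(math.dist(x, rotate(y, 2 * math.pi * k / steps))
               for k in range(steps))

x, y = (3.0, 4.0), (0.0, 1.0)
approx = orbit_distance(x, y)                      # grid approximation
exact = abs(math.hypot(*x) - math.hypot(*y))       # | ||x|| - ||y|| |
print(approx, exact)
```

The grid value can only overestimate the infimum, so `approx` sits slightly above `exact` and converges to it as `steps` grows.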
The metric quotient has a categorical interpretation. For this, we view $X$ as an object in some concrete category $C$. For a group $G$ acting on $X$, a morphism $\pi : X \to Y$ in $C$ is called a categorical quotient if (C1) $\pi$ is $G$-invariant, and (C2) every $G$-invariant morphism $X \to Z$ uniquely factors through $\pi$.
The categorical quotient separates the $G$-orbits in $X$ as well as possible for a $G$-invariant morphism in $C$. Categorical quotients are unique up to canonical isomorphism.
Lemma 7. Consider any set $\Omega$ of functions $[0, \infty] \to [0, \infty]$ that satisfies the following: (i) the identity function belongs to $\Omega$, (ii) $\Omega$ is closed under composition, and (iii) every $\omega \in \Omega$ is weakly increasing, upper semicontinuous, and vanishing at zero.
Then $\Omega$ determines a category $C$ whose objects are all metric spaces and whose morphisms are all functions $f : Y \to Z$ for which there exists $\omega \in \Omega$ such that
$$d_Z(f(y), f(y')) \le \omega(d_Y(y, y')) \qquad \text{for all } y, y' \in Y,$$
i.e., $f$ admits $\omega$ as a modulus of continuity. Furthermore, for any group $G$ acting by isometries on $X$, the map $X \to X/\!/G$ defined by $x \mapsto [x]$ is a categorical quotient in $C$.
For example, the metric quotient is a categorical quotient for each category of metric spaces with one of the following choices of morphisms: In this paper, we are primarily interested in the Lipschitz category.In this category, a modification of the following proof shows that X → X//G is a categorical quotient for any group G acting on X, even if not by isometries.
Proof of Lemma 7. First, C is a category by (i) and (ii).Indeed, the identity map on any metric space admits the identity function ω id ∈ Ω as a modulus of continuity.Also, if To show that π : X → X//G defined by π(x) = [x] is a categorical quotient for C, we first verify that π is a morphism in C: Next, we verify (C1).Indeed, for every x ∈ X and g ∈ G, we have d X/G (G • x, G • gx) = 0, and so gx ∈ [x], i.e., π(gx) = [gx] = [x] = π(x).Thus, π is G-invariant.Finally, we verify (C2).Consider any G-invariant morphism f : X → Z, with modulus of continuity ω f ∈ Ω.Then for any x, x ′ ∈ X, it holds that inf (1) where the second step follows from the assumptions in (iii) that ω f is weakly increasing and upper semicontinuous.Now suppose [x] = [x ′ ].Then we may apply (1) to get where the last step follows from the assumption in (iii) that ω f is vanishing at zero.Thus, f is constant on the level sets of π, which uniquely determines f ↓ : X//G → Z such that f = f ↓ • π.It remains to show that f ↓ is a morphism in C. To this end, (1) gives It is worth noting some sufficient conditions for the orbits of G to be closed in X.First, the orbits are closed whenever G is finite.Next, if X is a normed vector space and G is a subgroup of linear isometries, then the orbits of G are closed in X whenever G is compact in the strong operator topology.As an example in which G acts on X by isometries but X//G ̸ = X/G, one may take X = R 2 and G ≤ O(2) to be the group of rotations by rational multiples of π.Indeed, in this example, the orbit of any unit vector is a dense but proper subset of the unit circle.
Suppose G acts on X by isometries.A continuous G-invariant map f : X → Y is constant on each orbit G • x, and by continuity, it is also constant on the closure Throughout, we reserve the superscript downarrow to denote this induced factor map over the metric quotient.Also, we write X/G instead of X//G when they coincide, but in such cases, we typically write [x] instead of G • x for brevity.In the same spirit, we write This paper is primarily concerned with real Hilbert spaces modulo subgroups of the corresponding orthogonal group.Unless stated otherwise, V and W denote nontrivial real Hilbert spaces, S(V ) is the unit sphere in V , and G is a subgroup of the orthogonal group O(V ).Whenever we have a complex inner product, we conjugate on the left.Notably, every complex Hilbert space V ′ is isometric to a real Hilbert space V ′′ with inner product Re⟨•, •⟩ V ′ , and furthermore, U (V ′ ) ≤ O(V ′′ ).For this reason, we do not lose generality by restricting our attention to real Hilbert spaces, though as we will see, some important examples are more naturally expressed in terms of complex Hilbert spaces.
This naturally emerges in the cross term when expanding the square of the quotient metric: For a complex Hilbert space and a group of unitaries, max filtering is defined by passing to the underlying real Hilbert space: Re⟨p, q⟩.
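The cross-term remark can be checked numerically for a finite group. The sketch below assumes the definition of max filtering from [29], namely $\langle\langle [x], [y] \rangle\rangle := \sup_{g \in G} \langle x, gy \rangle$ for a group of isometries, and verifies $d([x],[y])^2 = \|x\|^2 - 2\langle\langle [x],[y] \rangle\rangle + \|y\|^2$ for the planar rotations by multiples of $2\pi/3$. The group, the test points, and the function names are illustrative.

```python
import math

def rotation_group(order):
    """Rotations of the plane by multiples of 2*pi/order, as functions."""
    def make(theta):
        c, s = math.cos(theta), math.sin(theta)
        return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])
    return [make(2 * math.pi * k / order) for k in range(order)]

def inner(p, q):
    return p[0] * q[0] + p[1] * q[1]

def max_filter(x, y, group):
    """Assumed definition: the largest inner product between x and the orbit of y."""
    return max(inner(x, g(y)) for g in group)

def quotient_distance(x, y, group):
    return min(math.dist(x, g(y)) for g in group)

G = rotation_group(3)
x, y = (1.0, 2.0), (3.0, 1.0)

lhs = quotient_distance(x, y, G) ** 2
rhs = inner(x, x) - 2 * max_filter(x, y, G) + inner(y, y)
print(lhs, rhs)
```

Minimizing $\|x - gy\|^2 = \|x\|^2 - 2\langle x, gy \rangle + \|y\|^2$ over $g$ is the same as maximizing the cross term, which is why the two printed values agree up to rounding.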

Homogeneous extension
We motivate this section with an example: Example 10.Take V := R d and G := {± id} ≤ O(V ), and consider the G-invariant map f : x → xx ⊤ .The induced map f ↓ : R d /G → R d×d is injective since the column space and trace of the matrix f ↓ ([x]) are given by which together determine {±x} = [x].However, Figure 1 (left) illustrates that f ↓ is not lower Lipschitz.As we will see in Theorem 21(b), this is an artifact of the differentiability of f at the origin.In fact, we can say more: for any unit vector u ∈ R d and c ≥ 0, we have The limit c → 0 establishes the lower Lipschitz bound α = 0, and c → ∞ establishes β = ∞.While f ↓ fails to be bilipschitz at points "far" from S(R d )/G, Figure 1 (middle) indicates that f ↓ is bilipschitz when restricted to S(R d )/G.This suggests that f is only problematic in the radial direction, which can be corrected by instead mapping g : cu → cuu ⊤ for any unit vector u ∈ R d and c ≥ 0. This extends f | S(R d ) to all of R d in a radially isometric way: Furthermore, Figure 1 (right) indicates that this extension is bilipschitz on all of R d /G.
The above example motivates the following definition: See Figure 2 for an illustration.A related notion appears in the proof of Lemma 3.5 in [46].It is routine to verify that the homogeneous extension is well defined.We note that [28] considers homogeneous extensions of functions with codomain W (see Definition 2 in [28]).By instead taking the codomain to be S(W ), the lower Lipschitz bound of f quantifies the non-parallel property (see Definition 1 in [28]), which in turn produces a lower Lipschitz bound of the homogeneous extension; this is a quantitative version of Proposition 1(b) in [28].In fact, we obtain optimal bilipschitz bounds in Theorem 13 below (cf. the upper Lipschitz bound given in Theorem 4 of [28] and the distortion bound given in Lemma 3.5 of [46].).
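The radial correction in Example 10 can be explored numerically. The following sketch compares $f : x \mapsto xx^\top$ with $g : cu \mapsto cuu^\top$ on $\mathbb{R}^3/\{\pm\mathrm{id}\}$, measuring outputs in the Frobenius norm. The random sample, the seed, and the helper names are assumptions made for illustration; the ratios against the origin are one simple way to see the radial failure of $f$ and are not claimed to be the computation used in the paper.

```python
import math
import random

def f(x):
    """Polynomial invariant x -> x x^T."""
    return [[a * b for b in x] for a in x]

def g(x):
    """Radial correction c*u -> c * u u^T for unit u and c >= 0, with g(0) = 0."""
    c = math.sqrt(sum(a * a for a in x))
    if c == 0.0:
        return [[0.0] * len(x) for _ in x]
    return [[a * b / c for b in x] for a in x]

def frobenius(A, B):
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

def quotient_distance(x, y):
    """Distance between the orbits {x, -x} and {y, -y}."""
    return min(math.dist(x, y), math.dist(x, [-b for b in y]))

def ratio(h, x, y):
    return frobenius(h(x), h(y)) / quotient_distance(x, y)

# Along a ray, f distorts distances to the origin by the factor c; g does not.
zero = [0.0, 0.0, 0.0]
for c in (0.01, 1.0, 100.0):
    print(c, ratio(f, [c, 0.0, 0.0], zero), ratio(g, [c, 0.0, 0.0], zero))

# On random pairs, the ratios for g stay inside a bounded interval.
rng = random.Random(0)
pairs = [([rng.gauss(0, 1) for _ in range(3)], [rng.gauss(0, 1) for _ in range(3)])
         for _ in range(1000)]
g_ratios = [ratio(g, x, y) for x, y in pairs]
print(min(g_ratios), max(g_ratios))
```

For this particular $g$, a direct computation with $x = au$, $y = bv$ and $s = |\langle u, v \rangle|$ shows that the squared ratio lies between 1 and 2, so the sampled ratios fall in $[1, \sqrt{2}]$.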
When G is trivial, Lemma 12 reduces to the following (surprisingly unfamiliar) identity: In this setting, a and b need not be nonnegative for the identity to hold.
In what follows, we say a group $G$ acts topologically transitively on a topological space $X$ if there exists $x \in X$ such that the orbit $G \cdot x$ is dense in $X$. For example, the group of rotations by rational multiples of $\pi$ acts topologically transitively on the unit circle. By Lemma 8, when $G$ acts by isometries on a metric space $X$, topological transitivity is equivalent to $X/\!/G$ being a singleton set. Since bilipschitzness is ill-defined for maps on the singleton metric space, the following result isolates the topologically transitive case for completeness.
(a) If G acts topologically transitively on S(V ), then for every f : S(V )//G → S(W ), the homogeneous extension f ⋆ : V //G → W is an isometry.
Proof.For (a), S(V )//G is a singleton set, so the image of f consists of a single unit vector w ∈ S(W ).Fix u, v ∈ S(V ) and a, b ≥ 0. Then d([u], [v]) = 0, and so Lemma 12 gives For (b), fix u, v ∈ S(V ) and a, b ≥ 0. Then Lemma 12 gives Since f has lower Lipschitz bound α, we then have where the last step applies Lemma 12.In the case where min{α, 1} = 1, this bound is sharp when [u] = [v] and a ̸ = b, by Lemma 12.In the case where min{α, 1} = α, this bound is sharp when a = b = 1 since α is the optimal lower Lipschitz bound for f by assumption.A similar argument delivers the optimal upper Lipschitz bound.
In practice, one might encounter $f$ with optimal lower Lipschitz bound $\alpha > 1$, in which case the homogeneous extension $f^\star$ has distortion $\beta$, which is strictly larger than the distortion $\beta/\alpha$ of $f$. In this case, we can first modify $f$ by lifting outputs into an extra dimension:
Lemma 14. Let $X$ be a metric space. Given $f : X \to S(W)$ with optimal bilipschitz bounds $\alpha$ and $\beta$, then for each $t \in (0, 1]$, the map $f_t : X \to S(W \oplus \mathbb{R})$ defined by $f_t(x) = (t f(x), \sqrt{1 - t^2})$ has optimal bilipschitz bounds $t\alpha$ and $t\beta$.
Proof.The result follows from the fact that ∥f Given f : S(V )//G → S(W ) with optimal bilipschitz bounds α > 1 and β, then for every t ∈ [β −1 , α −1 ], the homogeneous extension (f t ) ⋆ also has distortion β/α.In the case where β < 1, we do not have an approach to make homogeneous extension preserve distortion.Next, we show how a generalization of the same lift can be used to convert a bounded bilipschitz map S(V )//G → W into a bilipschitz map S(V )//G → S(W ⊕ R): Lemma 15.Let X be a metric space.Given bounded g : X → W with bilipschitz bounds α and β, then for each t ∈ (0, ∥g∥ −1 ∞ ), the map g t : X → S(W ⊕ R) defined by g t (x) = (tg(x), 1 − ∥tg(x)∥ 2 ) has (possibly suboptimal) bilipschitz bounds tα and Proof.The lower Lipschitz bound follows from the fact that ∥g t (x)−g t (y)∥ ≥ t∥g(x)−g(y)∥.For the upper Lipschitz bound, put r := ∥g∥ ∞ and consider the map The derivative of h t reveals that its optimal upper Lipschitz bound is c t := (1 − t 2 r 2 ) −1/2 • tr.We apply this bound and the reverse triangle inequality to obtain and the result follows.
Notably, Lemma 15 produces a map with near identical distortion when t is small, but taking t too small will send the upper Lipschitz bound below 1, in which case homogeneous extension will increase the distortion.
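The lift in Lemma 14 is easy to test numerically: appending the constant coordinate $\sqrt{1 - t^2}$ keeps outputs on the unit sphere and scales every pairwise distance by exactly $t$. The sketch below checks both facts on two hypothetical unit vectors standing in for values of $f$; it does not cover the lift in Lemma 15, whose last coordinate is not constant.

```python
import math

def lift(w, t):
    """Map a unit vector w to (t*w, sqrt(1 - t^2)), a unit vector one dimension up."""
    return [t * a for a in w] + [math.sqrt(1.0 - t * t)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

# Two unit vectors playing the role of f(x) and f(y).
u = [1.0, 0.0, 0.0]
v = [0.6, 0.8, 0.0]
t = 0.5

lu, lv = lift(u, t), lift(v, t)
before = math.dist(u, v)
after = math.dist(lu, lv)
print(norm(lu), norm(lv), before, after)
```

Since the appended coordinate is the same for every input, it cancels in differences, which is why `after` equals `t * before`.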
In what follows, we describe a few examples of bilipschitz maps over S(V )//G that can be homogeneously extended to bilipschitz maps over V //G by Theorem 13(b).
Example 16 (cf.Example 10).Fix a nontrivial real Hilbert space V , take G := {± id} ≤ O(V ), and consider f : then G acts transitively on S(V ), and so Theorem 13(a) applies.Otherwise, for each x, y ∈ S(V ) with [x] ̸ = [y], we have and so taking t := |⟨x, y⟩| gives Since t takes values in [0, 1), it follows that f has optimal bilipschitz bounds 1 and √ 2. By Theorem 13(b), the homogeneous extension has the same optimal bilipschitz bounds.By Corollary 36, this is the minimum possible distortion for a Euclidean embedding of V /G.By contrast, in the case where V = ℓ 2 , max filtering approaches fail to deliver bilipschitz invariants [26,1].
Example 17. Fix a nontrivial complex Hilbert space V , take G := {ω •id : |ω| = 1} ≤ U (V ), and consider f : then G acts transitively on S(V ), and so Theorem 13(a) applies.Otherwise, following the argument in Example 16, then for any x, y ∈ S(V ), we have and f has optimal bilipschitz bounds 1 and √ 2. By Theorem 13(b), the homogeneous extension f ⋆ has the same optimal bilipschitz bounds.By Corollary 37, this is the minimum distortion for a Euclidean embedding of V /G.

Non-differentiability of bilipschitz invariants
We motivate this section with an example: Example 19.By Proposition 9, max filtering produces bilipschitz invariants modulo finite groups of O(d).However, these invariants fail to be differentiable.Indeed, x → ⟨⟨[z], [x]⟩⟩ is non-differentiable precisely at the boundaries of the Voronoi diagram of the orbit G • z.For an explicit example, take V := R 2 , templates z 1 := (1, 1), z 2 := (1, 2), and z 3 := (3, 1), and given G ≤ O(2), consider the max filtering invariant We consider two cases: when G consists of rotations by multiples of 2π/3, and when G is the dihedral group of order 6.See Figure 3 for an illustration.The points at which f is not differentiable form rays emanating from the origin.Later in this section, we show that every point with a nontrivial stabilizer in G (i.e., every non-principal point) is necessarily a In both parts of Figure 3 (and in fact, whenever G is nontrivial), the origin is one such non-principal point.In the example on the left, this is the only non-principal point, whereas on the right, every nondifferentiable point is a non-principal point.In what follows, we show how bilipschitzness requires such non-differentiability of max filtering invariants.
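Non-differentiability at the origin can be observed directly. The sketch below assumes the max filtering invariant has coordinates $x \mapsto \max_{g \in G} \langle z_i, gx \rangle$ for the three templates of Example 19, with $G$ the rotations by multiples of $2\pi/3$. If this map were differentiable at 0 with derivative $A$, then $f(tv)/t$ and $f(-tv)/t$ would converge to $Av$ and $-Av$ as $t \to 0^+$, so their sum would vanish. The direction $v$, the step size, and the function names are illustrative.

```python
import math

TEMPLATES = [(1.0, 1.0), (1.0, 2.0), (3.0, 1.0)]

def rotations(order):
    """(cos, sin) pairs for the rotations by multiples of 2*pi/order."""
    return [(math.cos(2 * math.pi * k / order), math.sin(2 * math.pi * k / order))
            for k in range(order)]

def max_filter_map(x, order=3):
    """Coordinates max_g <z_i, g x> over rotations g by multiples of 2*pi/order."""
    values = []
    for z in TEMPLATES:
        best = max(z[0] * (c * x[0] - s * x[1]) + z[1] * (s * x[0] + c * x[1])
                   for c, s in rotations(order))
        values.append(best)
    return values

v = (1.0, 0.0)
t = 1e-6
forward = [a / t for a in max_filter_map((t * v[0], t * v[1]))]
backward = [a / t for a in max_filter_map((-t * v[0], -t * v[1]))]
print(forward)
print(backward)
```

Both difference quotients are strictly positive in every coordinate, because the inner products $\langle z, gx \rangle$ sum to zero over this group and so their maximum is positive for nonzero $x$. The sums therefore cannot tend to zero.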
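Recall that a function $f : V \to W$ is Fréchet differentiable at $x \in V$ if there exists a bounded linear map $A : V \to W$ such that
$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - Ah\|}{\|h\|} = 0,$$
in which case we say $A$ is the Fréchet derivative at $x$, which we denote by $Df(x) := A$; see [33] for more information. If $V$ and $W$ are Euclidean, then the matrix representation of $Df(x)$ is the Jacobian of $f$ at $x$.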
Proof.By G-invariance, we have f (gx) = f (x) and f (gx + h) = f (x + g −1 h), and so Change variables g −1 h → h and apply Fréchet differentiability at x to obtain the result.
Theorem 21. Suppose $x \in V$ is fixed by some nonidentity member of $G$, and consider any $G$-invariant map $f : V \to W$ that is Fréchet differentiable at $x$. Then the following hold.
(a) There exists a unit vector v ∈ V orthogonal to x such that Df (x)v = 0.
In words, a bilipschitz invariant cannot be differentiable at any point with a nontrivial stabilizer.In particular, if G is nontrivial, no bilipschitz invariant is differentiable at 0. For each example in Section 4, G acts freely on V \ {0}, and so we can get away with 0 being the only point at which f is not differentiable.
For (b), take v ∈ V from (a).Since G is finite or x = 0, either the members of [x] have some minimum pairwise distance δ > 0, or [x] is a singleton set and we may take Thus, f ↓ is not lower Lipschitz.For (c), we argue as in (b) with a different perturbation of x.Take v from (a) and put h(t) := x cos(t) + v sin(t) − x so that x + h(t) ∈ S(V ) for all t.An argument like above shows d([x + h(t)], [x]) = ∥h(t)∥ when |t| is sufficiently small.We apply the triangle inequality, the definition of the Fréchet derivative, and the fact that Df (x)v = 0 to get lim inf Considering ∥h(t) − tv∥ 2 = (cos t − 1) 2 + (sin t − t) 2 and ∥h(t)∥ 2 = (cos t − 1) 2 + sin 2 t, the lim inf above is zero, and so f ↓ | S(V )/G is not lower Lipschitz.
Example 22. As an application, consider the problem of multi-reference alignment [16].
Here, $G$ is the group of cyclic permutations, and the task is to estimate an unknown $[x] \in \mathbb{C}^d/G$ given data of the form $\{g_i x + z_i\}_{i=1}^n$, where the $g_i$'s are drawn uniformly from $G$ and the $z_i$'s are independent complex Gaussian vectors with large variance. If $G$ were trivial, then one could estimate $x$ with the sample average. Since $G$ is nontrivial, we are inclined to "average out the noise" in an invariant domain.
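One popular invariant is the bispectrum $B : \mathbb{C}^d \to \mathbb{C}^{d \times d}$ defined by
$$B(x)_{k\ell} := \hat{x}_k \, \overline{\hat{x}_\ell} \, \hat{x}_{\ell - k},$$
where $\hat{x}$ denotes the discrete Fourier transform of $x$, and $\ell - k$ is interpreted modulo $d$. The approach in [16] performs multi-reference alignment by first using the data to find an estimate $y$ of $B(x)$, and then using $y$ to estimate $[x]$. For the second step, they construct a function $\psi$ such that $\psi(B(x)) = [x]$ whenever $\hat{x}$ is everywhere nonzero, and they show that $\psi$ is locally Lipschitz.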
In what follows, we show that ψ is not Lipschitz as a consequence of Theorem 21(b).Suppose otherwise that ψ is β-Lipschitz, and let U denote the vectors in C d with everywhere nonzero discrete Fourier transform.Since U is dense in C d and B (and hence B ↓ ) is continuous, we have inf i.e., B ↓ is 1/β-lower Lipschitz.Since G fixes the constant functions and B is differentiable as a map between real Hilbert spaces, this contradicts Theorem 21(b).(Note that in this argument, the points used to break a Lipschitz bound on ψ are precisely those with vanishing Fourier coefficients, perhaps because at least some of these orbits are not even separated by B.) In general, a left inverse to a translation-invariant map f cannot be Lipschitz unless f is not differentiable at the constant functions (and at any other nontrivially periodic function).Such non-differentiability is afforded by max filtering, which in turn delivers bilipschitz invariants [29].
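The invariance of the bispectrum under cyclic shifts can be confirmed with a short computation. The sketch below assumes the indexing convention $B(x)_{k\ell} = \hat{x}_k \overline{\hat{x}_\ell} \hat{x}_{\ell - k}$ with indices modulo $d$, a common convention in the multi-reference alignment literature, and uses a naive discrete Fourier transform. The test vector and the function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a list of complex numbers."""
    d = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / d) for n in range(d))
            for k in range(d)]

def bispectrum(x):
    """B(x)[k][l] = xhat[k] * conj(xhat[l]) * xhat[(l - k) mod d] (assumed convention)."""
    xh = dft(x)
    d = len(x)
    return [[xh[k] * xh[l].conjugate() * xh[(l - k) % d] for l in range(d)]
            for k in range(d)]

def cyclic_shift(x, s):
    """Shift entries s places to the right, wrapping around (0 < s < len(x))."""
    return x[-s:] + x[:-s]

x = [1 + 2j, -0.5j, 3.0, 0.25 - 1j, 2j]
shifted = cyclic_shift(x, 2)

gap = max(abs(a - b)
          for ra, rb in zip(bispectrum(x), bispectrum(shifted))
          for a, b in zip(ra, rb))
print(gap)
```

A shift by $s$ multiplies $\hat{x}_k$ by a unit phase $\omega^{ks}$, and the three phases in each entry cancel since $k - \ell + (\ell - k) = 0$. Constant vectors are fixed by every shift, which is the nontrivial stabilizer used in the argument above.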

Bilipschitz polynomial invariants
Classical invariant theory is concerned with polynomial maps, thanks in part to the following well-known result.
Proposition 23. For each finite $G \le GL(d)$, there exists an injective polynomial map $\mathbb{R}^d/G \to \mathbb{R}^n$ for some $n \in \mathbb{N}$.
In this section, we evaluate polynomial maps in terms of bilipschitzness.
Proof. Since the origin has a nontrivial stabilizer and $f$ is differentiable at the origin, Theorem 21 gives that $f^\downarrow$ is not lower Lipschitz.
Next, suppose f is not affine linear.We will show that f ↓ is not upper Lipschitz.Select a coordinate function f i (x) of degree k ≥ 2, and let g(x) denote the homogeneous component of f i (x) of degree k.Since g(x) is nonzero, there exists v ∈ R d such that g(v) ̸ = 0. Note that g(0) = 0 by homogeneity, and so v ̸ = 0. Then g(tv) = t k g(v), and so f i (tv) = t k g(v)+O(t k−1 ) for large t.Thus, ∥f Finally, suppose f is affine linear and write f (x) = Ax + b.Since G is nontrivial, there exists g ∈ G and y ∈ R d such that z := gy ̸ = y.Then f (y) = f (z), meaning Ay + b = Az + b, and so While we cannot expect polynomial invariants to be bilipschitz, there is some hope of applying ideas from Section 4 to obtain bilipschitz maps from polynomial invariants by homogeneous extension.In fact, all of the examples from Section 4 were obtained in this way.Considering Lemma 15 and Theorem 13(b), it suffices to seek G-invariant polynomial maps f : R d → R n for which f ↓ | S(R d )/ /G is bilipschitz.By Theorem 21(c), this is not possible when G is finite unless it acts freely on S(R d ).(Indeed, for each example in Section 4, G acts freely.)This is a strong condition, and it implies that every abelian subgroup of G is cyclic; see Theorems 5. (c) For any G-invariant smooth immersion g : M → R n , the corresponding map g ↓ : M/G → R n is a smooth immersion.
If in addition M is a connected Riemannian manifold and G acts by isometries2 , then the following also hold: (d) M/G has a unique Riemannian metric tensor such that π is a Riemannian covering.
(e) The Riemannian distance on M/G coincides with the quotient metric d M/G .
Proposition 28.For Riemannian manifolds X and Y with X compact, every smooth embedding X → Y is bilipschitz.
Proposition 29.Given a k-dimensional compact smooth semialgebraic submanifold M of R n and m > 2k, then for a generic linear map L : R n → R m , the restriction L| M is a smooth embedding.
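Proof of Theorem 25. First, by Theorem 21(c), we have both (b)⇒(a) and (c)⇒(a). For (a)⇒(b), we first claim that for every $u \in S(\mathbb{R}^d)$, there exists a $G$-invariant polynomial map $p_u : \mathbb{R}^d \to \mathbb{R}^d$ such that $Dp_u(u)$ is invertible. To see this, for each $i \in \{1, \ldots, d\}$, take any polynomial $q_{u,i} : \mathbb{R}^d \to \mathbb{R}$ such that $\nabla q_{u,i}(gu) = g e_i$ for all $g \in G$. Such a polynomial exists by Proposition 26; indeed, $\{gu\}_{g \in G}$ are distinct since $G$ acts freely on $S(\mathbb{R}^d)$ by assumption. Let $\overline{q}_{u,i}$ denote the $G$-invariant polynomial map obtained by applying the Reynolds operator to $q_{u,i}$. Then
$$\overline{q}_{u,i}(x) = \frac{1}{|G|} \sum_{g \in G} q_{u,i}(gx), \qquad \nabla \overline{q}_{u,i}(x) = \frac{1}{|G|} \sum_{g \in G} g^{\top} \nabla q_{u,i}(gx).$$
Evaluating at $x = u$ then gives
$$\nabla \overline{q}_{u,i}(u) = \frac{1}{|G|} \sum_{g \in G} g^{\top} g e_i = e_i.$$
Then taking $p_u(x) := (\overline{q}_{u,1}(x), \ldots, \overline{q}_{u,d}(x))$ gives $Dp_u(u) = \operatorname{id}$, as desired.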
Next, by the continuity of $x \mapsto \det(Dp_u(x))$, we have that $Dp_u(x)$ is invertible on an open neighborhood $N_u$ of $u$. By compactness, the open cover $\{N_u\}_{u \in S(\mathbb{R}^d)}$ of $S(\mathbb{R}^d)$ has a finite subcover $\{N_u\}_{u \in F}$. Then the polynomial map $r_1 : x \mapsto \{p_u(x)\}_{u \in F}$ has the property that $Dr_1(x)$ is injective for every $x \in S(\mathbb{R}^d)$. Next, let $r_2$ denote any polynomial $G$-invariant map that separates $G$-orbits as in Proposition 23. The polynomial map $r : S(\mathbb{R}^d) \to \mathbb{R}^n$ defined by $r(x) = (r_1(x), r_2(x))$ is an immersion due to the $r_1$ component, and so Proposition 27 gives that $S(\mathbb{R}^d)/G$ is a smooth manifold and $r^{\downarrow}$ is a smooth immersion. Furthermore, the $r_2$ component ensures that $r^{\downarrow}$ is injective. As such, $r^{\downarrow}$ is a smooth embedding by Proposition 4.22 in [65].

Finally, the fact that $r^{\downarrow}$ is bilipschitz follows from Proposition 28. Specifically, to use this result, we note two things. First, since $G$ acts freely, $S(\mathbb{R}^d)/G$ is a Riemannian manifold by Proposition 27. Second, the Euclidean distance on $S(\mathbb{R}^d)$ is equivalent to the standard Riemannian distance, and so the quotient Euclidean distance on $S(\mathbb{R}^d)/G$ is equivalent to the quotient Riemannian distance, which in turn equals the Riemannian distance on $S(\mathbb{R}^d)/G$ by Proposition 27.
Finally, for (a)⇒(c), since G acts freely, we have that S(R d )/G is a smooth manifold by Proposition 27.Next, the above shows that we have a bilipschitz polynomial map r ↓ : S(R d )/G → R n .The lower Lipschitz bound ensures that r ↓ is a smooth embedding.As such, M := im(r ↓ ) is a compact smooth submanifold of R n of dimension d − 1.Furthermore, M is the image of the semialgebraic set S(R d ) under a polynomial map, and so it is semialgebraic.By Proposition 29, L| M is a smooth embedding for a generic linear map L : R n → R 2d−1 , in which case L • r ↓ is the desired map.(Indeed, bilipschitzness follows from Proposition 28 as before.) While our proof of (a)⇒(b) in Theorem 25 was not constructive, the following result provides a construction in the special case where G is abelian.Considering G ≤ O(d) ≤ U (d), we may focus on the complex case without loss of generality.Then the spectral theorem affords C d with an orthonormal basis that simultaneously diagonalizes every element of G, as in the hypotheses below.(b) For each i ∈ {1, . . ., d}, the character χ i is an isomorphism of G onto its image.(c) For each i, j ∈ {1, . . ., d}, there exists a smallest integer m ij ≥ 0 for which χ i χ Furthermore, when (c) holds, the polynomial map f : Notably, (the proof of) Proposition 5.2.1 in [43] establishes that the related map is G-invariant and p ↓ is injective.Later, [28] established that while p ↓ | S(C d )/G is Lipschitz, when d ≥ 3 and |G| ≥ 3, this function is not lower Lipschitz.With this context, Theorem 30 can be interpreted as making the injective map p ↓ | S(C d )/G lower Lipschitz by including additional coordinate functions for i > j.
Proof of Theorem 30.For (a)⇒(b), suppose G acts freely on S(C d ).Then 1 is not an eigenvalue of any nonidentity g ∈ G.It follows that for each i, the character χ i has trivial kernel.Then the first isomorphism theorem gives G ∼ = im(χ i ).
For (b)⇒(c), observe that G is cyclic, and given a generator h of G, then for each i, χ i (h) is a primitive |G|th root of unity.Fix i and j.Then there exists an integer m ≥ 0 such χ j (h) m = χ i (h) −1 , and so χ i (h k )χ j (h k ) m = (χ i (h)χ j (h) m ) k = 1 for every k.The claim then follows from the least integer principle.
For (c)⇒(a), consider any g ∈ G for which there exists x ∈ S(C d ) such that gx = x.Then 1 is an eigenvalue of g, meaning there exists j such that χ j (g) = 1.Our assumption then gives χ i (g) = χ i (g)χ j (g) m ij = 1 for all i.Thus, g = id, i.e., G acts freely on S(C d ).
Finally, suppose (c) holds.Then for every g ∈ G, we have . Fix x ∈ S(C d ), select an index j at which x j ̸ = 0, and let A denote the d × d submatrix of the (complex) Jacobian Df (x) corresponding to row indices {(i, j)} d i=1 .Then A = diag u + ve ⊤ j , where The matrix determinant lemma gives and so Df (x) is injective.Then the corresponding real Jacobian R 2d → R 2d 2 is also injective, as desired.
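The equivalence of (a), (b), and (c) in Theorem 30 can be sanity-checked by brute force on small diagonal groups. The sketch below assumes $G$ is generated by a single diagonal matrix $\operatorname{diag}(\omega^{a_1}, \ldots, \omega^{a_d})$ with $\omega = e^{2\pi i/n}$, so that each group element is recorded by its tuple of exponents modulo $n$; the function names and the tested ranges are hypothetical choices made for illustration, and condition (c) is read as "$\chi_i \chi_j^{m}$ is trivial for some integer $m \geq 0$".

```python
from itertools import product


def group_elements(exponents, n):
    """Exponent tuples (mod n) of the cyclic diagonal group generated by diag(w**a_1, ..., w**a_d)."""
    return {tuple((a * k) % n for a in exponents) for k in range(n)}


def acts_freely(G):
    """(a): no nonidentity element has eigenvalue 1, i.e., a zero exponent."""
    return all(all(t) for t in G if any(t))


def characters_faithful(G):
    """(b): each coordinate character is injective on G."""
    d = len(next(iter(G)))
    return all(len({t[i] for t in G}) == len(G) for i in range(d))


def has_exponents(G, n):
    """(c): for all i, j there is m >= 0 with chi_i * chi_j**m trivial on G."""
    d = len(next(iter(G)))
    return all(
        any(all((t[i] + m * t[j]) % n == 0 for t in G) for m in range(n))
        for i in range(d) for j in range(d)
    )


# Compare the three conditions over all pairs of exponents for small n.
consistent = all(
    acts_freely(G) == characters_faithful(G) == has_exponents(G, n)
    for n in range(2, 8)
    for exps in product(range(n), repeat=2)
    for G in [group_elements(exps, n)]
)
print(consistent)
```

Searching $m$ over `range(n)` is enough because $\chi_j^m$ is periodic in $m$ with period dividing $n$. This only exercises cyclic groups with $d = 2$; it is a consistency check, not a substitute for the proof.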

Maps of minimum distortion
The Euclidean distortion of a metric space X, denoted by c 2 (X) ∈ [1, ∞], is the infimum of c for which there exists a Hilbert space H and a bilipschitz map X → H of distortion c.
In particular, c 2 (X) < ∞ if and only if X admits a bilipschitz embedding into some Hilbert space.Euclidean distortion is finitely determined: Proof.The inequality ≥ holds since restricting a bilipschitz function over X to F can only improve the bilipschitz constants.For the other direction, suppose that for every finite F ⊆ X and every ϵ > 0, one can embed F into a Hilbert space with distortion at most c + ϵ.
Then the span of the image of this embedding has dimension at most |F | < ∞, and so the Hilbert space can be taken to be ℓ 2 without loss of generality.Then X embeds into an ultra power H of ℓ 2 with distortion at most c by Theorem 4.6 in [50].Furthermore, H is a Hilbert space by Theorem 3.3(ii) in [54].
In fact, Proposition 31 can be strengthened as follows: Lemma 32.Given a dense subset Y of a metric space X, it holds that Proof.By Proposition 31, it suffices to prove The inequality ≥ follows from the containment Y ⊆ X.For the reverse inequality, fix γ > 1.
Since Y is dense in X, for each E := {x 1 , . . ., x m } ⊆ X, we may select F := {y 1 , . . ., y m } ⊆ Y close enough to E so that g : E → F defined by g : x i → y i has distortion at most γ.Furthermore, there exists f : The desired bound follows by taking the supremum of both sides and recalling that γ > 1 was arbitrary.
Proposition 31 focuses our attention to finite metric spaces, in which case [67] observed that Euclidean distortion can be computed by semidefinite programming: Proposition 33.Given a finite metric space X, then c 2 (X) 2 is the infimum of t for which there exists a positive semidefinite matrix Conversely, each Q ⪰ 0 determines an embedding f : X → R X up to post-composition by an orthogonal transformation.
One may apply weak duality, as was done in [67], to obtain the following result.Here, we let D X ∈ R X×X denote the matrix defined by (D X ) xy = d X (x, y) 2 , while Q + and Q − denote the entrywise positive and negative parts of a matrix Q, respectively.Proposition 34 (Corollary 3.5 in [67]).Given a finite metric space X, consider any bilipschitz map f : X → ℓ 2 and any positive semidefinite

Furthermore, if equality holds with
We take inspiration from the proof of Claim 2.1 in [68] to prove the following: Theorem 35.Take X = R/Z with any translation-invariant metric d for which the function Proof.First, we show the first equality.Take any A, B ∈ X with A ̸ = B.After swapping A and B as necessary, we may express A = s + Z and B = t + Z for some s, t ∈ R with s − t ∈ (0, 1  2 ].Then Since g is monotonically decreasing, it follows that f has optimal bilipschitz bounds Next, given an even number n ∈ N, the cyclic group C := Z/nZ enjoys an injective homomorphism ϕ : C → X, through which we may pull back d to endow C with a metric d.Explicitly, Then the optimal bilipschitz bounds of f • ϕ are obtained by minimizing and maximizing the quantity .
In particular, our assumptions on g and the fact that n is even together imply that f • ϕ has optimal bilipschitz bounds .
Furthermore, the triangle inequality gives d( , and so Since g is monotonically decreasing, we conclude that the limit in (2) is at most π 2 .By identifying C ∼ = {0, . . ., n − 1} with arithmetic modulo n, we define Q ∈ R C×C by Then Q1 = 0 and Q is positive semidefinite since its eigenvalues are nonnegative (this is verified in Claim 2.3 of [68]).Furthermore, since D C and Q are both circulant, we have and so By Proposition 34, it follows that dist(f • ϕ) = c 2 (C).Since ϕ is an isometric embedding, the order-n subgroup X n of X also has Euclidean distortion g( 1 n )/g( 1 2 ).Overall, we have , which implies the result.
We may use Theorem 35 to establish that the embeddings in Section 4 achieve the minimum possible distortion: Corollary 36.Given a real Hilbert space V of dimension at least 2, consider the group Proof.Denote X = R/Z, select orthonormal u, v ∈ V , and define ϕ : Then ϕ is injective.Let d denote the pullback of the quotient metric on V /G.Then for t ∈ (0, 1 2 ], we have and |e 2πit − 1| 2 = 4 − 4 cos 2 (πt), and so Since g is decreasing in t, Theorem 35 gives that Finally, take f ⋆ as constructed in Example 16.Then which implies the result.
A nearly identical proof gives the following: Corollary 37. Given a complex Hilbert space V of dimension at least 2, consider the group Corollary 38.Given V := C, consider the group G := ⟨e 2πi/r ⟩ ≤ U (1) for some r ∈ N.
Proof.Denote X = R/Z, and define ϕ : ]. Let d denote the pullback of the quotient metric on V /G.One may show the function g from Theorem 35 is given by g(t) = sin(πt) sin(πt/r) , which is decreasing by a calculus-based argument like the one in Example 18.Then Theorem 35 gives that c 2 (X) matches the distortion of the homogeneous extension of (a lift of) the map f from Example 18.The result follows.
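The quantities in this proof are easy to evaluate numerically. The following sketch assumes the reading of Theorem 35 suggested by its proof, namely that the distortion equals $\lim_{t \to 0} g(t) / g(\tfrac{1}{2})$; with $g(t) = \sin(\pi t)/\sin(\pi t/r)$ this gives $r \sin(\pi/(2r))$. The function names and the grid-based monotonicity check are illustrative assumptions, not part of the original argument.

```python
import math


def g(t, r):
    """Ratio g(t) = sin(pi t) / sin(pi t / r) from the proof, for t in (0, 1/2]."""
    return math.sin(math.pi * t) / math.sin(math.pi * t / r)


def distortion(r):
    """Assumed formula: lim_{t->0} g(t) = r divided by g(1/2) = 1 / sin(pi / (2r))."""
    return r * math.sin(math.pi / (2 * r))


def is_decreasing(r, steps=1000):
    """Check on a uniform grid of (0, 1/2] that g(., r) is weakly decreasing."""
    values = [g(0.5 * (k + 1) / steps, r) for k in range(steps)]
    return all(a >= b for a, b in zip(values, values[1:]))


for r in range(1, 8):
    print(r, is_decreasing(r), distortion(r))
```

For $r = 2$ the formula evaluates to $\sqrt{2}$, which agrees with the bounds $[a, \sqrt{2}a]$ reported in Figure 1, and as $r$ grows the value increases toward $\pi/2$.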
Lemma 39.Given nontrivial real Hilbert spaces V 1 and V 2 and groups Proof.First, we prove that ≥ holds.Given a Hilbert space H and a (G The desired bound follows.Next, we prove that ≤ holds.If the right-hand side is infinite, then we are done.Otherwise, for each i ∈ {1, 2}, there exists a Hilbert space H i and a G i -invariant map f i : V i → H i of finite distortion.Scale as necessary so that the optimal bilipschitz bounds for f ↓ i equal 1 and β i , and define f : and similarly, f ↓ has lower Lipschitz bound 1.Thus, dist(f This result is similar in spirit to Lemma 3.2 in [46].We can use Lemma 39 (and its proof) to produce more examples of embeddings of minimum distortion.For example, we may express . This suggests taking f 1 : x → x and f 2 : y → |y| so that f : (x, y) → (x, |y|).In fact, the resulting map f ↓ isometrically embeds Unlike the previous examples, this distortion-minimizing map is not the homogeneous extension of a polynomial on the sphere, although it is positively homogeneous.

Quotients by permutation
In this section, we consider the real Hilbert space $V := \ell^2(\mathbb{N}; \mathbb{R}^d)$ of sequences of vectors in $\mathbb{R}^d$ whose norms are square summable. The group $G := S_\infty$ of bijections $\mathbb{N} \to \mathbb{N}$ acts on the index set of these sequences, and we are interested in the metric space $V//G$. As indicated in Example 6, the orbits in this example are not closed, and so unlike the other examples we have considered, this metric quotient does not simply reduce to the honest quotient $V/G$. We will see that this space has an isometric embedding into $\ell^2$ when $d = 1$, but has no bilipschitz embedding into any Hilbert space when $d \geq 3$. We start by characterizing the members of the metric quotient:

Proof. Fix $x \in \ell^2(\mathbb{N}; \mathbb{R}^d)$, and let $A_x$ denote the right-hand side of (3). By Lemma 8, $[x] = \overline{S_\infty \cdot x}$. As such, our task is to show that $A_x = \overline{S_\infty \cdot x}$.
First, we show that A x ⊆ S ∞ • x.Fix y ∈ A x , and let σ be as in (3).Let i 1 < i 2 < • • • denote the (possibly finite) list of members of supp(x).For each n ∈ N, consider the index set I n := {i j : j ≤ n}, select any π n ∈ S ∞ such that π n (i) = σ(i) for every i ∈ I n , and put p n := x • π −1 n .Then p n ∈ S ∞ • x for every n, and we claim that p n → y.Indeed, the bound x that converges to y.For each s > 0, consider the threshold function θ s : ℓ 2 (N; R d ) → ℓ 2 (N; R d ) defined by θ s (z) i := z i if ∥z i ∥ ≥ s 0 otherwise, and denote t(s) := ∥x − θ s (x)∥ ∞ .Then either t(s) = 0 or t(s) > 0, in which case t(s) = sup{∥x i ∥ : ∥x i ∥ < s} = sup{∥x i ∥ : t(s)/2 < ∥x i ∥ < s} < s, since x ∈ ℓ 2 (N; R) has finitely many terms x i with ∥x i ∥ > t(s)/2.Either way, t(s) < s.
In fact, for every g ∈ S ∞ , we have θ s (gx) = gθ s (x), and so t(s is bounded away from zero, too.Since {p n } ∞ n=1 is Cauchy, it follows that θ s (p m ) = θ s (p n ) for all sufficiently large m and n.That is, for each s > 0, we have θ s (p n ) = θ s (y) for all n ≥ N (s).From this information, one may construct σ : supp(x) → supp(y), for example, σ(i) = π N (⌈∥x i ∥ −1 ⌉ −1 ) (i), such that y i = x σ −1 (i) for all i ∈ supp(y).Thus, y ∈ A x .
As a consequence of Lemma 40, we can establish that there is no orthogonal group $G$ such that $\ell^2(\mathbb{N}; \mathbb{R})//S_\infty = \ell^2(\mathbb{N}; \mathbb{R})/G$. Indeed, let $e_i$ denote the $i$th standard basis element in $\ell^2(\mathbb{N}; \mathbb{R})$. Then $[e_1] = \{e_i : i \in \mathbb{N}\}$. Thus, given $G$ such that $G \cdot e_1 = [e_1]$, it must hold that every element of $G$ permutes the standard basis, i.e., $G \leq S_\infty$. But then every member of the $G$-orbit of $x := \{\tfrac{1}{n}\}_{n=1}^\infty$ is entrywise nonnegative, whereas $[x]$ contains members (such as the forward shift of $x$) with entries equal to $0$.
In what follows, we construct an isometric embedding of $\ell^2(\mathbb{N}; \mathbb{R})//S_\infty$ in $\ell^2(\mathbb{N}; \mathbb{R})$. Given $x \in \ell^2(\mathbb{N}; \mathbb{R})$, let $x^+$ and $x^-$ denote the positive and negative parts of $x$ (respectively) so that $x = x^+ - x^-$, and in the case where $x$ is entrywise nonnegative, let $x_{(n)}$ denote its $n$th largest entry.
Theorem 41.Consider Φ : ℓ 2 (N; R)//S ∞ → ℓ 2 (N; R) defined by if n is even The following example illustrates how Φ sorts the positive and negative parts of x: For the reverse inequality, consider any p 0 , q 0 ∈ ℓ 2 (N; R).We claim there exists a sequence ), and From this, it follows that ∥Φ([x]) − Φ([y])∥ ℓ 2 ≤ ∥p 0 − q 0 ∥ ℓ 2 for every p 0 ∈ [x] and q 0 ∈ [y], and taking the infimum of both sides gives as desired.We construct {p n } ∞ n=1 and {q n } ∞ n=1 so as to satisfy (Given a sequence x and natural numbers s < t, we write x [s:t] := (x s , . . ., x t ).)In what follows, we describe the map (p n , q n ) → (p n+1 , q n+1 ).Take Since a, b ∈ ℓ 2 (N; R), we have sup a ≥ 0 and sup b ≥ 0. If sup a = 0 and sup b = 0, put (Here and below, we express p n+1 and q n+1 as row vectors of a 2 × ∞ matrix for simplicity.)If sup a > 0 and sup b = 0, then we take any i ∈ arg max(a) and put Similarly, if sup a = 0 and sup b > 0, then we take any i ∈ arg max(b) and put Finally, if sup a > 0 and sup b > 0, then we take any i ∈ arg max(a) and j ∈ arg max(b).If i = j, then we put Otherwise, if i ̸ = j, then we put Next, we verify that ∥p n+1 − q n+1 ∥ ℓ 2 ≤ ∥p n − q n ∥ ℓ 2 by cases.For (5), equality holds.For (6), equality holds if (q n ) n+i = 0. 
Otherwise, the condition sup a > 0 and sup b = 0 implies that (p n ) n+i and (q n ) n+i have opposite sign.Thus, from which it follows that ∥p n+1 − q n+1 ∥ ℓ 2 ≤ ∥p n − q n ∥ ℓ 2 .The analysis for ( 7) is identical.For (8), equality holds.Finally, for (9), we treat the case where n is odd, as the even case is similar.Since i ∈ arg max(a) and j ∈ arg max(b), we have (p n ) n+i ≥ (p n ) n+j and (q n ) n+j ≥ (q n ) n+i .Then (p n ) n+i ((q n ) n+j − (q n ) n+i ) ≥ (p n ) n+j ((q n ) n+j − (q n ) n+i ), and rearranging gives It follows that ⟨p n+1 , q n+1 ⟩ ≥ ⟨p n , q n ⟩, and since ∥p n+1 ∥ ℓ 2 = ∥p n ∥ ℓ 2 and ∥q n+1 ∥ ℓ 2 = ∥q n ∥ ℓ 2 , we have Finally, we verify that p n → Φ([p 0 ]) and q n → Φ([q 0 ]).A case-by-case inductive argument establishes (4), and so and similarly, ∥q n − Φ([q 0 ])∥ ℓ 2 → 0.
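The isometry in Theorem 41 can be illustrated on finitely supported sequences. The sketch below makes two assumptions that are not taken verbatim from the text: the odd coordinates of $\Phi([x])$ hold the sorted entries of $x^+$ and the even coordinates hold the negated sorted entries of $x^-$; and for sequences supported on the first $m$ indices, the quotient distance is computed by brute force over permutations after padding with $m$ zeros. The names `phi` and `quotient_distance` are hypothetical.

```python
from itertools import permutations
import math
import random


def phi(x):
    """Assumed form of the sorting map: interleave sorted positive and (negated) negative parts."""
    m = len(x)
    pos = sorted((max(v, 0.0) for v in x), reverse=True)
    neg = sorted((max(-v, 0.0) for v in x), reverse=True)
    out = []
    for k in range(m):
        out.extend([pos[k], -neg[k]])
    return out


def quotient_distance(x, y):
    """Brute-force min over permutations, after padding both inputs with len(x) zeros."""
    m = len(x)
    xp = list(x) + [0.0] * m
    yp = list(y) + [0.0] * m
    return min(math.dist(xp, perm) for perm in permutations(yp))


x = [3.0, -1.0, 2.0]
y = [-2.0, 0.5, -4.0]
example_gap = abs(math.dist(phi(x), phi(y)) - quotient_distance(x, y))

# Repeat on random inputs of length 3 (720 permutations per pair).
random.seed(0)
max_gap = 0.0
for _ in range(20):
    a = [random.gauss(0, 1) for _ in range(3)]
    b = [random.gauss(0, 1) for _ in range(3)]
    max_gap = max(max_gap, abs(math.dist(phi(a), phi(b)) - quotient_distance(a, b)))
print(example_gap, max_gap)
```

Padding with $m$ zeros lets every nonzero entry of one sequence be matched to a zero of the other, which is the only freedom the infinite index set adds for finitely supported inputs.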
While c 2 (ℓ 2 (N; R)//S ∞ ) = 1, we can use ideas from the proof of Theorem 2 in [5] to show that c 2 (ℓ 2 (N; R d )//S ∞ ) = ∞ for every d ≥ 3. We do not know how to treat the d = 2 case, but ideas in [6] might be relevant.L 2 (Ω n , ν n ) is separable and the mapping G n × (Ω n , ν n ) → X n given by (g, ω) → g • ω is a measure space isomorphism.Then These examples suggest a fundamental problem.
Problem 48. Determine whether there exists a bilipschitz embedding of $\ell^2(\mathbb{Z})/\mathbb{Z}$ into some Hilbert space, and if so, find the exact value of $c_2(\ell^2(\mathbb{Z})/\mathbb{Z})$.
One may use the semidefinite program in Proposition 33 to establish where X is a finite subspace of ℓ 2 (Z)/Z.This is the extent of our progress on Problem 48.We first prove Lemma 43.Our argument is inspired by [15].It involves spaces of vectorvalued functions.Given a measure space (X, µ) and a Hilbert space H, a function φ : X → H is measurable when the pre-image of every open set in H is measurable in X. Equivalently, for every v ∈ H, the complex-valued function x → ⟨φ(x), v⟩ is measurable on X [80].We consider measurable functions φ, ψ : X → H to be equivalent if they differ only on a set of measure zero.Then L 2 (X; H) is defined as the space of all equivalence classes of measurable functions φ : X → H for which X ∥φ(x)∥ 2 H dµ(x) < ∞.It is a Hilbert space with inner product ⟨φ, ψ⟩ := X ⟨φ(x), ψ(x)⟩ H dµ(x).When X is a group, one may define left translation in this space, and under the appropriate hypotheses, this operator is continuous: Proposition 49.Let G be a (multiplicative) second countable locally compact group, and let H be a separable Hilbert space.Given g ∈ G, let L g denote the left translation operator that transforms φ : G → X to L g φ : G → X defined by (L g φ)(h) = φ(g −1 h) for h ∈ G.The following hold for any φ, ψ ∈ L 2 (G; H).Proposition 49 is standard, but for the sake completeness, we give its proof in the appendix.In (a), it is vital that H is separable.For example, consider G = R and H = ℓ 2 (R).Choose any nonzero continuous function f ∈ L 2 (R), and let φ : G → H be the function with φ(x) = f (x)δ x , where δ x ∈ ℓ 2 (R) is the point mass at x ∈ R. For any y ̸ = 0, it holds that ⟨L y φ, φ⟩ = 0, and thus ∥L y φ − φ∥ = 2∥φ∥ = 2∥f ∥.Hence, the function y → L y φ is discontinuous at 0. Proof of Lemma 43.We use multiplicative notation for the group operation in G. 
Given g ∈ G and φ ∈ L 2 G; L 2 (Ω, ν) , let L g φ denote the left translation of φ defined by (L g φ)(h) = φ(g −1 h) for h ∈ G.We claim there is a unitary isomorphism L 2 (X) → L 2 G; L 2 (Ω, ν) that converts the action of G on L 2 (X) into left translation.To see this, we first note that L 2 (G) is separable since G is second countable; see (16.2) and (16.12) in [55].Then Theorem II.10 in [81] gives an isomorphism and ω ∈ Ω.This proves the claim.
By the extreme value theorem, M = Re⟨L h φ, ψ⟩ for some h ∈ K. Then (11) implies Proof of Theorem 44.Consider the dense subset c 00 (Z)/Z = [a] ∈ ℓ 2 (Z)/Z : a k = 0 for all but finitely many k ∈ Z .
Given finite F ⊆ c 00 (Z)/Z, select N ∈ N such that each [a] ∈ F has a representative with support contained in {0, . . ., N }.For each n > 2N , we will construct an isometric embedding First, we construct a critical set B ⊆ X n , whose normalized indicator function will emulate a point mass δ 0 ∈ ℓ 2 (Z).Let µ be the measure in X n .Express the group operation in G n multiplicatively, and write |S| for the Haar measure of Borel S ⊆ G n .By hypothesis, there exists h ∈ G n that generates a discrete subgroup H ≤ G n of order at least 2N + 1.Then there is a neighborhood U ⊆ G n of 1 that fails to intersect H \ {1}.By Proposition 2.1 of [48], U contains a neighborhood V ⊆ G n of 1 such that V V ⊆ U , and V in turn contains a compact neighborhood W ⊆ G n of 1 such that W −1 = W and W W ⊆ V .(While Proposition 2.1 of [48] does not directly provide compactness of W , this may be obtained by replacing W with a compact neighborhood of 1 contained in W and then intersecting with its inverse.)Since W is compact and contains an open set, it has finite positive measure by Proposition 2.19 of [48].Meanwhile, since ν n is σ-finite and nonzero, there exists measurable Next, we claim that for every g ∈ G n , there is at most one k ∈ H for which g Indeed, suppose g Since Ω n is a fundamental domain for the free action of G n , it follows that x i = y i and gv By the conditions on U , V , and W , we conclude that k where the last step uses the facts that G n is abelian and preserves measure in X n .In the inner sum, every term vanishes besides the one with j = i + k, by (12).Therefore, For each g ∈ G n , apply (12) to choose an index c(g) ∈ {−N, . . ., N } such that g•B ∩h i B = ∅ for every i ∈ {−N, . . ., N } with i ̸ = c(g).(Here we use the fact that h has order at least 2N + 1.) Then every term in the inner sum of ( 14) vanishes besides the one with i = j − c(g) when it is present, so that This completes the proof.
each k ∈ {1, . . ., d}, our treatment of the previous case delivers a polynomial function p such that ∇p(u i ) = v i for all i ∈ {1, . . ., n}.Denoting L : x → Ax and p := p • L, then since the Jacobian of p at x is ∇p(x) ⊤ , the chain rule gives that is, ∇p(u i ) = v i for every i ∈ {1, . . ., n}, as desired.

B Proof of Proposition 27
First, the finite group $G$ acts properly on $M$ by Proposition 21.5 in [65]. Parts (a) and (b) are given by Theorem 21.10 in [65]. For (c), Theorem 4.29 in [65] gives that $g^{\downarrow}$ is smooth.
To see it is an immersion, we first apply the chain rule to g = g ↓ • π at an arbitrary point x ∈ M : Since g is an immersion, Dg(x) is injective.Since π is a submersion, Dπ(x) is surjective.By part (a), the domain and codomain of Dπ(x) have the same dimension.Consequently, Dπ(x) is a bijection.By (16), is injective, and so g ↓ is an immersion.Next, (d) is given by Proposition 2.32 in [66].For (e), let d R denote the Riemannian distance in M/G, that is  for every p ∈ [x] and q ∈ [y], and the desired bound follows from taking an infimum.To show that π is 1-Lipschitz, take any admissible curve γ : I → M from x to y.Then by Since ϵ > 0 is arbitrary, the desired bound follows.

C Proof of Proposition 28
We present an argument from Moishe Kohan [62].First, we show that any smooth map f : X → Y is upper Lipschitz.Take any p, q ∈ X and let c : [0, d X (p, q)] → X denote the unit-speed parameterization of a shortest path from p to q.Then d Y (f (p), f (q))≤ To prove lower Lipschitz, we further assume that f : X → Y is an embedding.To this end, we first show that f is locally lower Lipschitz in the sense that there exist α, ϵ > 0 such that d Y (f (p), f (q)) ≥ αd X (p, q) whenever d X (p, q) < ϵ.To accomplish this, we take Z := im(f ) and show that the inverse map f −1 : (Z, d Y ) → (X, d X ) is δ-locally L-Lipschitz, in which case we may take α := 1/L and ϵ := δ/β.Our approach uses Riemannian tools to analyze arbitrary u, v ∈ Z with d Y (u, v) sufficiently small, and so we start by constructing a compact submanifold with boundary K ⊆ Y that contains Z along with all length-minimizing geodesics in Y between sufficiently close points in Z. Since Z is a compact submanifold of Y , it has a normal injectivity radius r > 0. That is, letting B r ν Z denote the members of the normal bundle ν Z of Z in Y with norm smaller than r, then the restriction of the normal exponential map exp Z : ν Z → Y to B r ν Z is a diffeomorphism onto its image N ⊆ Y , which in turn is an open tubular neighborhood of Z. Invert this diffeomorphism and project onto Z to obtain a smooth retraction g : N → Z. 
Applying exp Z to the closure of B r/2 ν Z gives a compact submanifold with boundary K ⊆ N .Next, we find δ > 0 such that d K (u, v) = d Y (u, v) for every u, v ∈ Z with d K (u, v) < δ.By Theorem 6.17 in [66], every z ∈ Z is contained in an open set B z ⊆ K that is geodesically convex.This means that for every u, v ∈ B z , the distance d Y (u, v) equals the length of the unique shortest path in Y from u to v, which in turn is entirely contained in B z ⊆ K, and so d K (u, v) = d Y (u, v).Considering {B z } z∈Z is an open cover of the compact metric space (Z, d K ), then Lebesgue's number lemma produces δ > 0 such that every u, v ∈ Z with d K (u, v) < δ necessary satisfies u, v ∈ B z for some z ∈ Z, in which case d K (u, v) = d Y (u, v).The previous argument then metric maps: Ω = {t → t} Lipschitz maps: Ω = {t → ct : c ≥ 0} Hölder continuous maps: Ω = {t → ct α : c ≥ 0, α > 0} uniformly continuous maps: Ω = {weakly increasing, upper semicontinuous functions [0, ∞] → [0, ∞] that vanish at zero}.

Figure 1: Illustration of Example 10. Here, we take $V := \mathbb{R}^2$ and $G := \{\pm\operatorname{id}\} \leq O(2)$. The horizontal and vertical axes represent input and output distances, respectively. (left) Draw a million pairs of vectors $x, y \in \mathbb{R}^2$ with standard Gaussian distribution and plot the output distance $\|xx^\top - yy^\top\|_F$ versus the input distance $d([x],[y])$. One can show that if the input distance is $a > 0$, then the output distance can take any value in $[a^2/\sqrt{2}, \infty)$. The lower bound is depicted in red. Notably, the map $[x] \mapsto xx^\top$ is neither Lipschitz nor lower Lipschitz. (middle) Draw a million pairs of vectors $x, y \in \mathbb{R}^2$ uniformly from the unit circle and plot the output distance $\|xx^\top - yy^\top\|_F$ versus the input distance $d([x],[y])$. One can show that if the input distance is $a > 0$, then the output distance resides in the interval $[a, \sqrt{2}a]$. These bounds are depicted in red. Notably, the map $[x] \mapsto xx^\top$ is bilipschitz when restricted to $S(\mathbb{R}^2)/G$. (right) Draw a million pairs of vectors $x, y \in \mathbb{R}^2$ with standard Gaussian distribution and plot the output distance $\|\tfrac{1}{\|x\|}xx^\top - \tfrac{1}{\|y\|}yy^\top\|_F$ versus the input distance $d([x],[y])$. One can show that if the input distance is $a > 0$, then the output distance resides in the interval $[a, \sqrt{2}a]$. These bounds are depicted in red. Notably, the map $[x] \mapsto \tfrac{1}{\|x\|}xx^\top$ is bilipschitz. More generally, Theorem 13 gives that the homogeneous extension of a bilipschitz map is bilipschitz.
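The bounds described for the right panel can be reproduced with a small simulation. This is a minimal sketch under the caption's setup ($V = \mathbb{R}^2$, $G = \{\pm\operatorname{id}\}$, Gaussian samples); the function names, the sample size of 10,000 (rather than a million), and the seed are illustrative choices.

```python
import math
import random


def quotient_distance(x, y):
    """Distance between the orbits {x, -x} and {y, -y}."""
    return min(math.dist(x, y), math.dist(x, [-v for v in y]))


def homogeneous_lift(x):
    """Flattened matrix x x^T / ||x||, the homogeneous extension of x -> x x^T on the circle."""
    norm = math.hypot(*x)
    return [a * b / norm for a in x for b in x]


random.seed(1)
ratios = []
for _ in range(10_000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = [random.gauss(0, 1), random.gauss(0, 1)]
    # Euclidean distance of flattened matrices equals the Frobenius distance.
    out_dist = math.dist(homogeneous_lift(x), homogeneous_lift(y))
    ratios.append(out_dist / quotient_distance(x, y))

lower, upper = min(ratios), max(ratios)
print(lower, upper)
```

Every observed ratio should lie in $[1, \sqrt{2}]$, matching the red lines in the figure; the sampled extremes approach but need not attain these bounds.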

Figure 3: Illustration of Example 19.Max filtering invariants are piecewise linear.In this example, the orbits of the templates z 1 , z 2 , z 3 ∈ R 2 determine points of non-differentiability (namely, the boundaries of their Voronoi cells) in the resulting max filtering invariant.On the left, the group G ≤ O(2) consists of rotations by multiples of 2π/3, while on the right, G is the dihedral group of order 6.Considering Theorem 21(b), the non-differentiability of max filtering is an artifact of its bilipschitzness.
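For readers who want to experiment with the left panel's setting, here is a minimal sketch of a max filtering invariant, assuming the definition $x \mapsto \max_{g \in G} \langle gx, z \rangle$ for a template $z$ (as in [29]) and taking $G$ to be the rotations of $\mathbb{R}^2$ by multiples of $2\pi/3$. The templates and the test point are hypothetical values chosen only for illustration.

```python
import math


def rotate(x, angle):
    """Rotate a point of the plane counterclockwise by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


GROUP_ANGLES = [2 * math.pi * k / 3 for k in range(3)]   # rotations by multiples of 2*pi/3
TEMPLATES = [(1.0, 0.0), (0.3, 0.8), (-0.5, 0.2)]        # hypothetical templates z_1, z_2, z_3


def max_filter(x, z):
    """Assumed definition: maximum of <g x, z> over the group elements g."""
    return max(gx[0] * z[0] + gx[1] * z[1]
               for gx in (rotate(x, a) for a in GROUP_ANGLES))


def invariant(x):
    """Max filtering invariant of x against all templates."""
    return [max_filter(x, z) for z in TEMPLATES]


x = (0.7, -1.1)
gx = rotate(x, 2 * math.pi / 3)
gap = max(abs(a - b) for a, b in zip(invariant(x), invariant(gx)))
print(invariant(x), gap)
```

Each coordinate is a maximum of finitely many linear functions of $x$, which is why the invariant is piecewise linear and fails to be differentiable where the maximizing group element changes.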

3.1 and 5.3.2 in [95]. Nevertheless, nonabelian examples exist, and furthermore, they have been classified; see Chapter 6 of [95]. Interestingly, every one of these examples admits a polynomial $f : \mathbb{R}^d \to \mathbb{R}^n$ for which $f^{\downarrow}|_{S(\mathbb{R}^d)/G}$ is bilipschitz:

Theorem 25. For finite $G \leq O(d)$, the following are equivalent:
(a) $G$ acts freely on $S(\mathbb{R}^d)$.
(b) There exists a bilipschitz polynomial map $S(\mathbb{R}^d)/G \to \mathbb{R}^n$ for some $n \in \mathbb{N}$.
(c) There exists a bilipschitz polynomial map $S(\mathbb{R}^d)/G \to \mathbb{R}^{2d-1}$.

If $G \leq O(d)$ is finite and acts freely on $S(\mathbb{R}^d)$, then one may combine Theorem 25(c) with Lemma 15 and Theorem 13(b) to obtain a bilipschitz map $\mathbb{R}^d/G \to \mathbb{R}^{2d}$. By comparison, max filtering delivers injective maps $\mathbb{R}^d/G \to \mathbb{R}^{2d}$, and while these maps are conjectured to be bilipschitz, this is currently only known for certain choices of $G$ [29, 72]. (Max filtering is known to provide bilipschitz invariants $\mathbb{R}^d/G \to \mathbb{R}^n$ for large $n$.) We will prove Theorem 25 with the help of a few (essentially known) propositions, whose proofs can be found in the appendix.

Proposition 26. Given distinct $u_1, \ldots, u_n \in \mathbb{R}^d$ and not necessarily distinct $v_1, \ldots, v_n \in \mathbb{R}^d$, there exists a polynomial function $p : \mathbb{R}^d \to \mathbb{R}$ such that $\nabla p(u_i) = v_i$ for every $i \in \{1, \ldots, n\}$.

Proposition 27. Let $G$ be a finite group acting smoothly and freely on a smooth manifold $M$, and consider the quotient map $\pi : M \to M/G$ defined by $\pi(x) = [x]$.
(a) $M/G$ is a topological manifold of the same dimension as $M$.
(b) There exists a unique smooth structure on $M/G$ for which $\pi$ is a smooth submersion.

Theorem 30. Given a finite abelian $G \leq U(d)$, choose any orthonormal basis $\{u_i\}_{i=1}^d$ of $\mathbb{C}^d$ for which there exist characters $\{\chi_i\}_{i=1}^d$ of $G$ such that $g u_i = \chi_i(g) u_i$ for all $g \in G$ and $i \in \{1, \ldots, d\}$. The following are equivalent:
(a) $G$ acts freely on $S(\mathbb{C}^d)$.

Before proving Theorem 44, we consider several examples.

Example 45. For each $n \in \mathbb{N}$, let $G_n := X_n := C_n \leq \mathbb{T}$ act on itself by multiplication, and take $\Omega_n := \{1\}$ for a fundamental domain. Then Theorem 44 implies $c_2(\ell^2(\mathbb{Z})/\mathbb{Z}) \leq \liminf_{n \to \infty} c_2(\ell^2(C_n)/C_n)$.

Example 46. The rotation group $G := SO(2) \cong \mathbb{T}$ has discrete cyclic subgroups of all finite orders, and it acts freely on the punctured plane $X := \mathbb{R}^2 \setminus \{0\}$. Then the fundamental domain $\Omega := \{(r, 0) : r > 0\}$ has measure $d\nu_n(r, 0) := r\,dr$, and thanks to polar coordinates, there is a measure space isomorphism $G \times \Omega \cong X$. (Here we scale Haar measure on $SO(2)$ to have total measure $2\pi$.) Taking $G_n = G$ and $\Omega_n = \Omega$ for every $n$ in Theorem 44 gives $c_2(\ell^2(\mathbb{Z})/\mathbb{Z}) \leq c_2(L^2(\mathbb{R}^2)/SO(2))$.

Example 47. Let G be a second countable locally compact group, and let G ≤ G be a closed abelian subgroup with arbitrarily large discrete cyclic subgroups. (For instance, G = G = T, or G = R and G = Z.) Consider the action of G on X := G by left multiplication. By Theorem 3.6 in [60], there is a measurable fundamental domain Ω ⊆ G with a σ-finite measure ν such that $L^2(\Omega, \nu)$ is separable and the mapping G × (Ω, ν) → G given by (g, ω) → gω is a measure space isomorphism. Then applying Theorem 44 with $G_n = G$ and $\Omega_n = \Omega$ for every $n$ gives $c_2(\ell^2(\mathbb{Z})/\mathbb{Z}) \leq c_2(L^2(G)/G)$. In particular, all of the following are bounded below by $c_2(\ell^2(\mathbb{Z})/\mathbb{Z})$:

(a) The function $G \to L^2(G; H)$ that maps $g \mapsto L_g\varphi$ is continuous.
(b) The function $G \to \mathbb{C}$ that maps $g \mapsto \langle L_g\varphi, \psi\rangle$ resides in $C_0(G)$.
bilipschitz. By Propositions 27 and 28, it suffices to show that $f^{\downarrow}|_{S(\mathbb{C}^d)/G}$ is injective and $f|_{S(\mathbb{C}^d)}$ is an immersion. Injectivity follows from (the proof of) Proposition 5.2.1 in [43]. To prove immersion, assume $\{u_i\}_{i=1}^d$ is the standard basis without loss of generality so that f