A statistical test of isomorphism between metric-measure spaces using the distance-to-a-measure signature

We introduce the notion of DTM-signature, a measure on R that can be associated with any metric-measure space. This signature is based on the distance-to-a-measure (DTM) function introduced in 2009 by Chazal, Cohen-Steiner and Mérigot. It leads to a pseudo-metric between metric-measure spaces that is bounded above by the Gromov-Wasserstein distance. This pseudo-metric is used to build a statistical test of isomorphism between two metric-measure spaces from the observation of two N-samples. The test is based on subsampling methods and comes with theoretical guarantees: it is proven to be of the correct level asymptotically. Moreover, when the measures are supported on compact subsets of R^d, rates of convergence are derived for the L1-Wasserstein distance between the distribution of the test statistic and its subsampling approximation. These rates depend on a parameter ρ > 1. In addition, we prove that the type II error is bounded above by exp(−CN^{1/ρ}), with C proportional to the square of the aforementioned pseudo-metric between the metric-measure spaces. Under some geometric assumptions, we also derive lower bounds for this pseudo-metric. An algorithm is proposed for the implementation of this statistical test, and its performance is compared to that of other methods through numerical experiments.


Introduction
Very often, data comes in the form of a set of points from a metric space. A natural question, given two such sets of data, is to decide whether they are similar. For example, do they come from the same distribution? Are their shapes similar? From the seminal two-sample tests of Kolmogorov-Smirnov, specific to measures on R or their extensions to R^d, to the more recent kernel two-sample tests of Gretton et al. [29], where the data are mapped into a reproducing kernel Hilbert space and then compared through the maximum mean discrepancy, the literature on two-sample testing is abundant and prolific. Note that an overview of Wasserstein-distance-based two-sample tests appears in [40].
Unfortunately, testing equality of two measures from samples may be compromised when the data are not embedded into the same space, or if the two systems of coordinates in which they are expressed differ. The signature we introduce in this paper is based on the distance to a measure, which is defined in [13] as a generalisation of the function distance to a compact set, and can be defined as follows.
Let (X, δ) be a metric space, equipped with a Borel probability measure µ. Given l in [0, 1], the pseudo-distance function δ_{µ,l} is defined at any point x of X by

δ_{µ,l}(x) = inf{r ≥ 0 | µ(B̄(x, r)) > l},

where B̄(x, r) denotes the closed ball of radius r centred at x. The distance to the measure µ with mass parameter m in (0, 1] is then defined by

d_{µ,m}(x) = (1/m) ∫_0^m δ_{µ,l}(x) dl.

This function can be easily computed when the measure of interest is uniform on a finite set of N points, µ̂_N = (1/N) ∑_{i=1}^N δ_{X_i} with the X_i's in a metric space (X, δ). Indeed, in this case, δ_{µ̂_N,l}(x) is the distance between x and its ⌈lN⌉-th nearest neighbour in the sample, denoted by X^{(⌈lN⌉)}. As a consequence, the distance to the measure µ̂_N with mass parameter m = k/N for some k in {1, . . . , N} at a point x of X satisfies

d_{µ̂_N,m}(x) = (1/k) ∑_{j=1}^k δ(x, X^{(j)}),

the average distance from x to its k nearest neighbours in the sample. For two probability measures µ and ν over R, the L1-Wasserstein distance can be rewritten as the L1-norm between the cumulative distribution functions of the measures, F_µ : t ↦ µ((−∞, t]) and F_ν, or equivalently, as the L1-norm between the quantile functions, F_µ^{-1} : s ↦ inf{x ∈ R | F_µ(x) ≥ s} and F_ν^{-1}; see for instance [8, Theorems 2.9 and 2.10] and the references therein. Thus, for empirical measures on R, its computation is easy.
Its complexity is the same as that of a sort. The computation of the statistic and of the subsampling distribution (see Section 2) is then also easy.
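As a minimal illustration (our own sketch, not the paper's code), the empirical DTM and the one-dimensional L1-Wasserstein distance can be computed as follows for samples on R; for m = k/N the k nearest neighbours carry mass exactly m, and for general m the last neighbour should in principle receive a fractional weight, which we neglect here:

```python
import numpy as np

def dtm(sample, x, m):
    """Distance of x to the empirical measure on `sample` (points of R),
    with mass parameter m = k/N: the average distance from x to its
    k nearest neighbours in the sample."""
    sample = np.asarray(sample, dtype=float)
    k = max(1, int(np.ceil(m * len(sample))))
    dists = np.sort(np.abs(sample - x))  # 1-d metric; adapt for R^d
    return float(dists[:k].mean())

def wasserstein1(u, v):
    """W1 between two empirical measures on R with the same number of
    atoms: mean absolute difference of the sorted samples, i.e. the
    L1-norm between the quantile functions."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))
```

Both routines are dominated by a sort, matching the complexity claim above.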
As mentioned above, the strategy we use to build the test is subsampling, which is close to bootstrap. Such methods were first introduced by Efron [22] in 1979, mainly to derive confidence intervals, and have often been used since, even in the domain of topological data analysis for the distance-to-a-measure function [14]; see [41] for a main reference on asymptotic bootstrap and [3] for non-asymptotic bootstrap. But, as aforementioned, the choice n = N is unsuccessful: bootstrap fails both experimentally and theoretically, at least for our choice of statistic. Just as Politis and Romano in [39], we instead use only a small part of the sample to approximate the distribution of the statistic, although we also use only a small part of the points to build the statistic. Thus, this method relates to subsampling.
In order to prove that the distribution of the statistic and the subsampling distribution are close, we derive an upper bound for the Wasserstein distance between the two. Such a method was already used in [5]. It is then enough to prove the convergence of the distribution of the statistic to some continuous distribution to establish that our test is asymptotically of the proper level, that is, valid.
The paper is organized as follows. In Section 2, we construct the statistical test and provide the main results of the paper: we state assumptions under which the test is proven to be asymptotically of the correct level, derive some non-asymptotic bounds for the expectation of the L1-Wasserstein distance between the distribution of the statistic and the subsampling distribution, and provide a lower bound for the power of the test. This lower bound depends on a discriminative quantity, a pseudo-distance between mm-spaces, which is studied in Section 3 in different contexts. In that section, the pseudo-distance is proven to be bounded above by the Gromov-Wasserstein and L1-Wasserstein distances; thus, the statistical test is stable under Wasserstein noise. In Section 4, we propose an algorithm to implement the test, and some numerical experiments illustrate the fact that our method works; we give an example for which our method even performs better than another method. Finally, in Section 5, we expose three ideas for isomorphism testing methods linked to our test, and show that one does not work at all whereas the other two could lead to major improvements.
The statistical tests we propose are based on the observation of two samples, an N-sample from µ and an N′-sample from ν. To simplify notation, we assume that N = N′, but the methods proposed also work when N differs from N′. More importantly, we have to keep the same n in both cases, as defined below.
Given an N-sample X_1, X_2, . . . , X_N from the measure µ, we denote µ̂_N = (1/N) ∑_{i=1}^N δ_{X_i} and µ̂_n = (1/n) ∑_{i=1}^n δ_{X_i} for some n ≤ N. As well, we define ν̂_N and ν̂_n from an N-sample from ν. We recall that d_{µ̂_N,m}(µ̂_n) is the discrete distribution (1/n) ∑_{i=1}^n δ_{d_{µ̂_N,m}(X_i)}, and that we compare signatures with W_1, the L1-Wasserstein distance.
Note that for two isomorphic mm-spaces (X, δ, µ) and (Y, γ, ν), the distributions L_{N,n,m}(µ, µ), L_{N,n,m}(ν, ν) and (1/2)L_{N,n,m}(µ, µ) + (1/2)L_{N,n,m}(ν, ν) are equal. The notation (1/2)L_1 + (1/2)L_2 stands for the distribution of a random variable generated according to L_1 with probability 1/2 and according to L_2 with probability 1/2. The three aforementioned distributions correspond to the distribution of the test statistic T_{N,n,m}(µ, ν); see Lemma C.1 in the Appendix.
The test we deal with in this paper is then φ_{N,n,m} = 1_{T_{N,n,m}(µ,ν) ≥ q_{1−α,N,n,m}}, with q_{1−α,N,n,m} the (1−α)-quantile of the subsampling distribution.
The null hypothesis H_0 is rejected if φ_{N,n,m} = 1, that is, if the L1-Wasserstein distance between the two empirical signatures d_{µ̂_N,m}(µ̂_n) and d_{ν̂_N,m}(ν̂_n) is too high.
Note that it is equivalent to compute a p-value p̂_{N,n,m} from the subsampling distribution and the test statistic, namely the proportion of the subsampling distribution that is not smaller than T_{N,n,m}(µ, ν). The statistical test then consists in rejecting the hypothesis H_0 if the p-value is not larger than α, that is, φ_{N,n,m} = 1_{p̂_{N,n,m} ≤ α}.

A test of asymptotic level α
In this section, we consider two isomorphic mm-spaces (X, δ, µ) and (Y, γ, ν) (we may write (µ, ν) ∼ H_0). In order to assert the validity of the test φ_{N,n,m}, the probability P_{(µ,ν)∼H_0}(φ_{N,n,m} = 1) of rejecting H_0 must be bounded above by α. In this section, under mild assumptions, we prove that the test φ_{N,n,m} is of asymptotic level α, that is,

limsup_{N→∞} P_{(µ,ν)∼H_0}(φ_{N,n,m} = 1) ≤ α.

We will prove (Lemma 2.1 and Lemma 2.2) that the test is of asymptotic level α when the distribution L_{N,n,m}(µ, µ) converges weakly to some atomless distribution L̄, and when its approximation L*_{N,n,m}(µ̂_N, µ̂_N) from the sample satisfies that W_1(L*_{N,n,m}(µ̂_N, µ̂_N), L̄) converges in probability to 0.
Let G_{µ,m} and G′_{µ,m} be two independent Gaussian processes with covariance kernel κ(s, t) = F_{d_{µ,m}(µ)}(s)(1 − F_{d_{µ,m}(µ)}(t)) for s ≤ t, with F_{d_{µ,m}(µ)} the cumulative distribution function of d_{µ,m}(µ). The limit distribution L̄ is actually given by the distribution of ‖G_{µ,m} − G′_{µ,m}‖_1, the integral over R of the absolute value of the difference between the two Gaussian processes. When the distributions µ and ν are compactly supported, these convergences occur under the assumptions of the following theorem.

Theorem 2.1. Let (X, δ, µ) and (Y, γ, ν) be two mm-spaces, with µ and ν compactly supported. Let n be such that n/N = o(1) and assume that, when N goes to infinity, L_{N,n,m}(µ, µ) converges weakly to L(‖G_{µ,m} − G′_{µ,m}‖_1) and W_1(L*_{N,n,m}(µ̂_N, µ̂_N), L(‖G_{µ,m} − G′_{µ,m}‖_1)) converges to 0 in probability. Then the test φ_{N,n,m} is of asymptotic level α.

The first assumption of Theorem 2.1 can be readily checked in some specific cases. Among the distributions supported on compact subsets of R^d, Theorem 2.2 deals with distributions that are regular in the following sense. We say that a measure µ is (a, b)-standard with positive parameters a and b if, for any positive radius r and any point x of the support of µ, we have µ(B(x, r)) ≥ min{1, ar^b}, with B(x, r) = {y ∈ X | δ(x, y) < r}. The assumption of (a, b)-standardness has been widely used in the context of set estimation and Topological Data Analysis [18, 20, 21, 17, 15, 24]. For compactly-supported distributions on R^d, and among them for (a, b)-standard distributions, the assumptions of Theorem 2.1 are satisfied as soon as n remains small enough with respect to N, as follows.
Theorem 2.2. Let µ and ν be two Borel probability measures supported on compact subsets of R^d. We set N = cn^ρ for some c > 0 and ρ > 1.

The statistical test
with the additional assumption that L(‖G_{µ,m} − G′_{µ,m}‖_1) is atomless.
Morally, checking the assumption "L(‖G_{µ,m} − G′_{µ,m}‖_1) is atomless" boils down to verifying the following assumption:

A_m: the DTM-signature d_{µ,m}(µ) is not a Dirac mass.

The reason for this assertion is the following. The process (G_{µ,m}(t) − G′_{µ,m}(t))_{t∈R} is a Gaussian process with covariance kernel given by 2F_{d_{µ,m}(µ)}(s)(1 − F_{d_{µ,m}(µ)}(t)) for s ≤ t. Let I be the smallest interval containing the support of the measure d_{µ,m}(µ). Note that for every t ∉ I, G_{µ,m}(t) − G′_{µ,m}(t) = 0, since its variance 2F_{d_{µ,m}(µ)}(t)(1 − F_{d_{µ,m}(µ)}(t)) is then equal to 0. As a consequence, ‖G_{µ,m} − G′_{µ,m}‖_1 is the integral of |G_{µ,m}(t) − G′_{µ,m}(t)| over the interval I. This interval is reduced to a single point when d_{µ,m}(µ) is a Dirac mass; in this case, ‖G_{µ,m} − G′_{µ,m}‖_1 is constant, equal to 0. The interval is non-trivial as soon as d_{µ,m}(µ) is not a Dirac mass; in this case, L(‖G_{µ,m} − G′_{µ,m}‖_1) is atomless, as the distribution of the integral of continuous random variables. A rigorous proof of this intuitive result is beyond the scope of this paper. Even so, it should be noted that (G_{µ,m}(t) − G′_{µ,m}(t))_{t∈R} has the same distribution as (√2 B(F_{d_{µ,m}(µ)}(t)))_{t∈R}, where (B(s))_{s∈[0,1]} is the Brownian bridge. Continuity of the distribution of ‖B‖_1 follows from former work in the literature: Johnson and Killeen [31] derived an expression for the cumulative distribution function of L(‖B‖_1) that depends on the Airy function and is continuous, and Rice [32] derived an expression for its density.
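The continuity of L(‖B‖_1) can also be observed numerically. The sketch below (our own, using an arbitrary grid discretisation of the bridge) draws samples of ‖B‖_1 = ∫_0^1 |B(s)| ds; since E|B(s)| = √(2s(1−s)/π), the empirical mean should approach √(2π)/8 ≈ 0.313.

```python
import numpy as np

def bridge_l1_norm(n_steps, rng):
    """One draw of ||B||_1 for a Brownian bridge B on [0, 1],
    discretised on a regular grid (trapezoidal rule)."""
    t = np.linspace(0.0, 1.0, n_steps + 1)
    dW = rng.normal(scale=np.sqrt(1.0 / n_steps), size=n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])  # Brownian motion
    B = W - t * W[-1]                           # pinned at both endpoints
    absB = np.abs(B)
    return float(np.sum((absB[:-1] + absB[1:]) * 0.5 / n_steps))

rng = np.random.default_rng(0)
draws = np.array([bridge_l1_norm(200, rng) for _ in range(2000)])
# the empirical mean should be close to sqrt(2*pi)/8 ~ 0.313
```

The spread of the draws gives a rough picture of the (continuous) Johnson-Killeen distribution.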
The measures µ for which assumption A_m is satisfied for no m ∈ [0, 1] are such that d_{µ,m}(x) = d_{µ,m}(y) for every x, y ∈ Supp(µ) and m ∈ [0, 1], that is:

µ(B̄(x, r)) = µ(B̄(y, r)) for every x, y ∈ Supp(µ) and every r ≥ 0. (2.1)

For instance, the discrete distributions for which A_m is never satisfied are exactly the distributions that are uniform on a finite set of points (Equation (2.1) with r = 0) and that satisfy: "for every x, y ∈ Supp(µ) and every d > 0, the number of points in Supp(µ) at distance d from y is the same as the number of points in Supp(µ) at distance d from x". Another example of a measure µ such that A_m is never satisfied is given by any uniform distribution on a sphere in R^d; in this case, (2.1) is trivially satisfied.
Characterizing all distributions for which A_m is not satisfied for some fixed m ∈ [0, 1] is not simple, and the examples are more abundant than the aforementioned ones. For instance, µ = 0.2δ_0 + 0.2δ_1 + 0.3δ_5 + 0.3δ_7 does not satisfy A_{0.4}. For such examples, the isomorphism test on two samples from µ with parameter m will not work. Nonetheless, it is possible to circumvent this issue by modifying the datasets, as follows.
Let D be a continuous and compactly-supported isotropic distribution on R^d; for instance, the restriction of the normal distribution N(0, 1) to a ball centred at 0. Then the convolutions of two isomorphic distributions µ and ν (in R^d) with D are isomorphic and satisfy assumption A_m for every m ∈ (0, 1). In practice, this strategy consists in replacing the sample (X_i)_{i∈[[1,N]]} from µ with the sample (X_i + Z_i)_{i∈[[1,N]]}, for Z_i ∼ D independent random variables, independent of (X_i)_{i∈[[1,N]]}. For testing purposes, the same procedure should be applied to the sample from ν.
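This smoothing step can be sketched as follows (our own helper; the truncation radius is an arbitrary choice, and rejection sampling is one of several ways to draw from the truncated normal):

```python
import numpy as np

def perturb(sample, radius=1.0, seed=None):
    """Add i.i.d. noise drawn from N(0, I) restricted to the ball
    B(0, radius): a continuous, compactly-supported, isotropic D."""
    rng = np.random.default_rng(seed)
    X = np.atleast_2d(np.asarray(sample, dtype=float))
    Z = rng.standard_normal(X.shape)
    # rejection sampling: redraw the rows falling outside B(0, radius)
    bad = np.linalg.norm(Z, axis=1) > radius
    while bad.any():
        Z[bad] = rng.standard_normal((int(bad.sum()), X.shape[1]))
        bad = np.linalg.norm(Z, axis=1) > radius
    return X + Z
```

The same call, with the same radius, should be applied to both samples before testing.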
when n and N go to ∞. But also, the L1-Wasserstein metric between the subsampling distribution and L̄ must converge to 0 in probability. Note that it is sufficient to prove these convergences for L_{N,n,m}(µ, µ) and L*_{N,n,m}(µ̂_N, µ̂_N). Indeed, the L1-Wasserstein distance W_1 is a metric for weak convergence and satisfies, for any distributions L_1, L′_1, L_2 and L′_2,

W_1((1/2)L_1 + (1/2)L_2, (1/2)L′_1 + (1/2)L′_2) ≤ (1/2)W_1(L_1, L′_1) + (1/2)W_1(L_2, L′_2).

This is a straightforward consequence of the definition of the L1-Wasserstein distance with transport plans. Theorem 2.1 then follows from the following two lemmas.

and when N goes to infinity.
Proof. See the Appendix, Section C.3.
Convergence of the distribution of the statistic, and of its subsampling approximation, to a fixed distribution is not sufficient to prove that the test has the correct level: continuity of the limit distribution L̄ is also required.

Proof. See the Appendix, Section C.3.
The power of the test is 1 − P_{(µ,ν)}(φ_{N,n,m} = 0), where P_{(µ,ν)}(φ_{N,n,m} = 0) stands for the probability that φ_{N,n,m} = 0 when the test is built from samples from two general mm-spaces (X, δ, µ) and (Y, γ, ν). If the spaces are not isomorphic, we want the test to reject H_0 with high probability; that is, we want the power to be as large as possible. Here, we give a lower bound for the power or, more precisely, an upper bound for the type II error P_{(µ,ν)}(φ_{N,n,m} = 0).

Theorem 2.3. Let µ and ν be two Borel measures supported on X and Y, two compact subsets of R^d. We assume that the mm-spaces (X, δ, µ) and (Y, γ, ν) are non-isomorphic and that the DTM-signature is discriminative for some m in (0, 1], meaning that the pseudo-metric W_1(d_{µ,m}(µ), d_{ν,m}(ν)) is positive. We choose N = n^ρ with ρ > 1. Then, for all positive ε, there exists N_0 depending on µ and ν such that, for all N ≥ N_0, the type II error P_{(µ,ν)}(φ_{N,n,m} = 0) is bounded above by exp(−CN^{1/ρ}), with C proportional to the square of the pseudo-metric W_1(d_{µ,m}(µ), d_{ν,m}(ν)).

Proof. See the Appendix, Section C.6.
In order to have high power, that is, to reject H_0 more often when the mm-spaces are not isomorphic, we need n to be big enough, that is, ρ small enough. Recall that n has to be small enough for the distribution of the statistic and its subsampling version to be close. Some compromise must therefore be made. Moreover, the choice of m for the test should depend on the geometry of the mm-spaces. The tuning of these parameters from the data is still an open question.
Moreover, note that the power of the test is strongly related to the pseudo-metric W_1(d_{µ,m}(µ), d_{ν,m}(ν)): the test is powerful when this pseudo-metric is high with respect to the diameters of the supports of the signatures, and does not discriminate between measures when it is low. In the following section, we derive some upper and lower bounds for the pseudo-metric W_1(d_{µ,m}(µ), d_{ν,m}(ν)) under some geometric assumptions.

Stability of the DTM-signatures
In this section, we prove stability results for the DTM-signature. These results all rely on the stability of the distance-to-a-measure function itself.
Proposition 3.1 (Stability, in [13] for R^d, in [9] for metric spaces). For two mm-spaces (X, δ, µ) and (Y, δ, ν) embedded into the same metric space, we have

‖d_{µ,m} − d_{ν,m}‖_∞ ≤ (1/m) W_1(µ, ν).

In [35], Mémoli proposes a metric on the quotient space of mm-spaces by the relation of isomorphism, the Gromov-Wasserstein distance.
It is defined by

GW((X, δ, µ), (Y, γ, ν)) = (1/2) inf_{π∈Π(µ,ν)} ∫_{X×Y} ∫_{X×Y} Γ_{X,Y}(x, y, x′, y′) dπ(x, y) dπ(x′, y′),

with Γ_{X,Y}(x, y, x′, y′) = |δ(x, x′) − γ(y, y′)|. Here, Π(µ, ν) stands for the set of transport plans between µ and ν, that is, the set of Borel probability measures π on X × Y satisfying π(A × Y) = µ(A) and π(X × B) = ν(B) for all Borel sets A in X and B in Y.
The DTM-signature turns out to be stable with respect to this Gromov-Wasserstein distance.

Proposition 3.2. The pseudo-metric W_1(d_{µ,m}(µ), d_{ν,m}(ν)) is bounded above by the Gromov-Wasserstein distance GW((X, δ, µ), (Y, γ, ν)), up to a multiplicative constant depending only on m.

Proof. See the Appendix, Section B. The proof is relatively similar to the ones given by Mémoli in [35] for other signatures.
It follows directly that two isomorphic mm-spaces have the same DTM-signature. Whenever the two mm-spaces are embedded into the same metric space, we also get stability with respect to the L1-Wasserstein distance.

Proposition 3.3. If (X, δ, µ) and (Y, δ, ν) are two metric-measure spaces embedded into some metric space (Z, δ), then

W_1(d_{µ,m}(µ), d_{ν,m}(ν)) ≤ (1 + 1/m) W_1(µ, ν).

Proof. First notice that, for all π in Π(µ, ν),

W_1(d_{µ,m}(µ), d_{ν,m}(ν)) ≤ ∫_{Z×Z} |d_{µ,m}(x) − d_{ν,m}(y)| dπ(x, y).

Then, since d_{ν,m} is 1-Lipschitz,

|d_{µ,m}(x) − d_{ν,m}(y)| ≤ |d_{µ,m}(x) − d_{ν,m}(x)| + δ(x, y) ≤ ‖d_{µ,m} − d_{ν,m}‖_∞ + δ(x, y).

Taking π an optimal transport plan for W_1(µ, ν), the result follows from Proposition 3.1.
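This Wasserstein stability can be checked numerically for empirical measures on R. The sketch below (our own) verifies the inequality W_1(signatures) ≤ (1 + 1/m) W_1(µ, ν), which combines the (1/m)-stability of the DTM with its 1-Lipschitz property; it holds in particular for empirical measures, for which the one-dimensional W_1 is exactly the mean absolute difference of the sorted samples:

```python
import numpy as np

def dtm_values(sample, points, m):
    """Empirical DTM d_{mu_hat, m} at the given points (1-d), with
    m = k/N: average distance to the k nearest sample points."""
    sample = np.asarray(sample, dtype=float)
    k = max(1, int(np.ceil(m * len(sample))))
    return np.array([np.sort(np.abs(sample - x))[:k].mean() for x in points])

def w1(u, v):
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

rng = np.random.default_rng(1)
m = 0.2
X = rng.uniform(0.0, 1.0, 200)
Y = X + 0.05 * rng.standard_normal(200)   # a Wasserstein-small perturbation
sig_X = dtm_values(X, X, m)               # signature d_{mu,m}(mu)
sig_Y = dtm_values(Y, Y, m)
lhs = w1(sig_X, sig_Y)                    # distance between signatures
rhs = (1.0 + 1.0 / m) * w1(X, Y)          # stability bound
```

The inequality lhs ≤ rhs holds deterministically, whatever the two samples.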

Discriminative properties of the DTM-signatures
The DTM-signature is stable but unfortunately does not always discriminate between mm-spaces. Indeed, in the following counter-example from [35] (Example 5.6), two non-isomorphic mm-spaces share the same signatures for all values of m.

Example 3.1. We consider two graphs made of 9 vertices each, clustered in three groups of 3 vertices, such that each vertex is at distance exactly 1 from each vertex of its group and at distance 2 from any other vertex. We assign a mass to each vertex; the two distributions are depicted in Figure 1. The ensuing mm-spaces are not isomorphic, since any one-to-one and onto measure-preserving map would send at least one pair of vertices at distance 1 from each other to a pair of vertices at distance 2 from each other, and thus would not be an isometry. Moreover, note that the DTM-signatures associated with the graphs are equal, since the total mass of each cluster is exactly equal to 1/3.
Nevertheless, the signature can be discriminative in some cases. In the following, we give lower bounds for the L 1 -Wasserstein distance between two signatures under different alternatives.
We will prove in Proposition 3.4 that this pseudo-distance is proportional to |1 − λ| when we consider a metric space (X, δ, µ) and its dilatation (X, λδ, µ) with a factor λ > 0. More generally, it is possible to discriminate between two uniform distributions supported on compact subsets O and O′ of R^d with different Lebesgue volumes Leb_d(O) and Leb_d(O′).
In Proposition 3.5, we provide a lower bound for the distance between such signatures that is proportional to |Leb_d(O)^{1/d} − Leb_d(O′)^{1/d}|. Note that this bound is, in a sense, optimal, since we recover the factor |1 − λ| of Proposition 3.4 when O′ is the image of O under a dilatation of parameter λ.
Moreover, two uniform distributions on compact sets with the same Lebesgue volume might also have different signatures, as illustrated by Example 3.3. Indeed, according to Proposition 3.6, whenever the inner offsets O^ε = {x ∈ O | inf_{y∈∂O} ‖x − y‖_2 ≥ ε} and O′^ε have different Lebesgue volumes for some ε > 0, the signatures associated with some parameter m depending on ε will be different. Consequently, it is possible to discriminate between a "thin" subset of R^d, or a set with an irregular boundary, and a "fat" set, or a set with a regular boundary (for instance, a set with a large reach, as defined in Section 3.2.3) such as a ball.
The signatures are also sensitive to the density of distributions. A distribution with density bounded above by some constant C (a uniform distribution, for instance) can be discriminated from distributions whose density is larger than C on sets that are large enough. Lower bounds for the distance between such signatures are derived in Propositions 3.7 and 3.8; the main idea is to exhibit a set of points of positive ν-measure on which the two distance functions differ. One might use such a strategy to prove that two signatures are different in many other situations than the ones handled in this paper.

When the distances are multiplied by some positive real number λ
Let λ be some positive real number. The DTM-signature discriminates between two mm-spaces that are isomorphic up to a dilatation of parameter λ, for λ ≠ 1.
Proposition 3.4. The L1-Wasserstein distance between the DTM-signatures of (X, δ, µ) and of its dilatation (X, λδ, µ) is equal to |1 − λ| E[d_{µ,m}(X)], for X a random variable of law µ.
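Since all distances in (X, λδ) are scaled by λ, the DTM scales by λ as well, so the W_1 distance between the signature and its dilated version equals |1 − λ| times the mean of the signature. A quick numerical check (our own sketch, on a one-dimensional sample):

```python
import numpy as np

def dtm_signature(sample, m):
    """Empirical DTM-signature: the values d_{mu_hat, m}(X_i) (1-d)."""
    s = np.asarray(sample, dtype=float)
    k = max(1, int(np.ceil(m * len(s))))
    return np.array([np.sort(np.abs(s - x))[:k].mean() for x in s])

rng = np.random.default_rng(2)
X = rng.uniform(size=100)
lam = 1.7
sig = dtm_signature(X, m=0.1)
sig_dil = dtm_signature(lam * X, m=0.1)   # sample from the dilated space
w1 = float(np.mean(np.abs(np.sort(sig) - np.sort(sig_dil))))
# w1 coincides with |1 - lam| * sig.mean() up to floating-point error
```

The identity holds exactly here because scaling a one-dimensional sample by λ scales every nearest-neighbour distance by λ.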

The case of uniform measures on non-empty bounded open subsets of R d
The DTM-signature discriminates between two uniform measures on two non-empty bounded open subsets of R^d with different Lebesgue volumes, provided that m is small enough.
Proof. See the Appendix, Section A.2.
Proposition 3.5 states that two uniform distributions on compact sets with different Lebesgue volumes have different signatures. Nonetheless, when these Lebesgue volumes are equal, it might still be possible to detect a difference between the measures by considering the volumes of the inner offsets.
Proof. This proposition is a direct consequence of Proposition A.1 in the Appendix.
This proposition can be applied to the following simple cases.
In this example, m = 2ε. More generally, "thin" sets and "fat" sets can be discriminated. Proof. The set O^ε is empty, but O′^ε is not.
The signatures of two sets that have the same volume and are very close (in terms of the Hausdorff metric, for instance) might be different, provided that the boundary of the first set is regular whereas the boundary of the second is not. For instance, a rectangle and a biscuit-nantais-shaped rectangle have different signatures, since the inner offsets of the biscuit-nantais-shaped rectangle have smaller Lebesgue volume than the inner offsets of the rectangle.
The aforementioned examples deal with measures whose support has dimension d in R^d. Nonetheless, it is sometimes also possible to prove that uniform distributions on submanifolds of R^d have different signatures. For the same reason, a segment and a spiral with the same length have different signatures, whatever the value of m. This kind of example is investigated in the numerical experiments. In the following, we highlight the fact that signatures capture the density variations of distributions.
We can consider the λ-super-level sets of the density f, denoted by {f ≥ λ}. Again, we denote by {f ≥ λ}^ε the set of points belonging to {f ≥ λ} whose distance to ∂{f ≥ λ} is at least ε.
Then we get the following lower bound for the L 1 -Wasserstein distance between the two signatures:

Proposition 3.7. Under these hypotheses, a lower bound holds for W_1(d_{µ,m}(µ), d_{ν,m}(ν)), in which ω_d stands for Leb_d(B(0, 1)), the Lebesgue volume of the unit d-dimensional ball.
Proof. See the Appendix, Section A.3.
It should be noted that when ν = µ_O, f is constant, equal to 1. Then, for λ > 1, the sets {f ≥ λ}^ε are empty; as a consequence, the lower bound obtained in Proposition 3.7 is zero. Another simple example is the following.
The density of ν with respect to µ is given by f, which is equal to 1/2 on B(0, 1)\B(0, 1/2) and to 5/2 on B(0, 1/2). This set is non-empty if and only if λ ≥ 4m.
It is maximal at λ = 5/2. The proof of Proposition 3.7 extends to non-uniform distributions on R^d that are not necessarily supported on the same set.
Then, a lower bound for W_1(d_{µ,m}(µ), d_{ν,m}(ν)) is given in Proposition 3.8. Actually, Proposition 3.7 is a consequence of Proposition 3.8: it suffices to replace g_max with 1/Leb_d(O), the value of the density of µ_O with respect to the Lebesgue measure on O. In the following, we work within the framework of Proposition 3.7.

When the density f is Hölder
In order to get additional results about discrimination, we need to define a quantity characterising the complexity of the set O. This is the notion of reach of an open set, defined from its medial axis.
Its reach, Reach(O), is then defined as the distance between its boundary ∂O and its medial axis M(O), that is,

Reach(O) = inf{‖x − y‖_2 | x ∈ ∂O, y ∈ M(O)}.

In the following, we assume that Reach(O) > 0 and that f is χ-Hölder on O, with positive parameters χ ∈ (0, 1] and L > 0, that is,

|f(x) − f(y)| ≤ L‖x − y‖_2^χ for all x, y in O.

Then, for m small enough, the DTM-signature is discriminative.
Under the assumptions of Proposition 3.7, if one of the following conditions is satisfied, then the quantity W_1(d_{µ,m}(µ), d_{ν,m}(ν)) is positive. Moreover, under any of these conditions, we get a lower bound on this pseudo-metric. Here, ω_d stands for Leb_d(B(0, 1)), the Lebesgue volume of the unit d-dimensional ball.
Proof. See the Appendix, Section A.3.
This proposition displays different intervals of values of m for which the DTM-signatures are discriminative. These intervals depend on the reach of O. Indeed, if m is small enough with respect to Reach(O), then the distance to the measure µ_O is easier to approximate on the whole set O, and is even known on most of the set (see Proposition A.1), and is thus easier to compare with the distance to the measure ν.
The Hölder hypothesis provides some continuity of the density: the distance to the measure µ_O then takes close values at any two points that are close enough. The main idea is to use the fact that the density of ν is not constant. For particular values of m, we can then exhibit a set of points of positive µ_O-measure on which the distance to the measure ν is smaller than the minimum of the distance to the measure µ_O.

Morally, this proposition consists in proving that, for well-chosen values of m, the distance to the measure ν is smaller than the minimum of the distance to the measure µ_O on a set of positive µ_O-measure.
For each of the three intervals (a, b), the value of a or b is computed from the inequality ‖f‖_{∞,O} > λ + 2Lε(λ)^χ for some well-chosen λ, as follows. For the first interval, we take λ = 1; for the second one, we take λ such that ε(λ) = Reach(O); for the last one, we take the λ that minimizes the function λ ↦ λ + 2Lε(λ)^χ on R^+. The second value, b or a, is computed according to the additional constraints on the λs: λ ≥ 1 and ε(λ) ≤ Reach(O).
In particular, the first interval is empty when the inequality ‖f‖_{∞,O} > λ + 2Lε(λ)^χ fails at λ = 1. The last interval is empty when the function λ ↦ λ + 2Lε(λ)^χ attains its minimum at a point λ that is smaller than 1 or that violates the constraint ε(λ) ≤ Reach(O). This proposition can be applied to concrete cases, proving the existence of some mass parameters m for which the DTM-signature is discriminative.

Proof. The density f of the multivariate normal distribution N(0, σ²I) restricted to B(0, 1) is Lipschitz; as a consequence, it is possible to apply Proposition 3.9 with the parameter χ = 1. The proof is deferred to the Appendix, Section A.3.
The previous examples provide several relevant cases where the DTM-signature turns out to be discriminative. Thus, the test of isomorphism will be powerful for some distributions.

Numerical experiments
In this section, we first describe the procedure to implement the statistical test of isomorphism. Then, we illustrate the validity of the method by providing some numerical approximations of the type-I error and the power of the test for various examples. We also compare our test to a more basic statistical test of isomorphism.

The algorithm
The procedure for the statistical test is as follows.
In the code, if Z = {Z_1, Z_2, . . . , Z_n}, then we use the notation 1_Z for the measure (1/n) ∑_{i=1}^n δ_{Z_i}.
# Compute W_sub, an N_sub-sample from the subsampling law
Let W_sub be empty;
for j in 1..N_sub/2:
    Let dtmX_1 and dtmX_2 be two independent n-samples from dtmX, with replacement;
    Let dtmY_1 and dtmY_2 be two independent n-samples from dtmY, with replacement;
    Add √n W_1(1_{dtmX_1}, 1_{dtmX_2}) and √n W_1(1_{dtmY_1}, 1_{dtmY_2}) to W_sub;

# Compute pval, the p-value of the statistical test
Let pval be equal to the proportion of elements in W_sub bigger than T;

Recall that the L1-Wasserstein distance W_1 is simply equal to the L1-norm of the difference between the cumulative distribution functions, which is easy to implement in the discrete case. As explained in the Introduction, in order to compute the distance to an empirical measure on an N-sample at a point x, it is sufficient to search for its k = mN nearest neighbours in the sample, where m ∈ [0, 1] is the mass parameter. The distance to the empirical measure can also be computed with the R function dtm, with tuning parameter r = 1, from the package TDA [23].
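For readers who prefer Python to the paper's R code, the whole procedure can be sketched as follows (our own implementation for one-dimensional samples; helper names are ours, and the DTM uses the k-nearest-neighbour formula from the Introduction):

```python
import numpy as np

def dtm_at(sample, points, m):
    """d_{mu_hat_N, m} at the given points, with m = k/N (1-d)."""
    s = np.asarray(sample, dtype=float)
    k = max(1, int(np.ceil(m * len(s))))
    return np.array([np.sort(np.abs(s - x))[:k].mean() for x in points])

def w1(u, v):
    """W1 between same-size empirical measures on R (sorted pairing)."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

def dtm_test(X, Y, m=0.05, n=20, n_sub=1000, seed=None):
    rng = np.random.default_rng(seed)
    dtmX = dtm_at(X, X, m)          # DTM w.r.t. the whole sample
    dtmY = dtm_at(Y, Y, m)
    # statistic: sqrt(n) * W1 between the signatures of the n first points
    T = np.sqrt(n) * w1(dtmX[:n], dtmY[:n])
    # subsampling law: n-samples with replacement within each signature
    W_sub = []
    for _ in range(n_sub // 2):
        W_sub.append(np.sqrt(n) * w1(rng.choice(dtmX, n), rng.choice(dtmX, n)))
        W_sub.append(np.sqrt(n) * w1(rng.choice(dtmY, n), rng.choice(dtmY, n)))
    pval = float(np.mean(np.array(W_sub) >= T))
    return T, pval
```

Rejecting H_0 when pval ≤ α reproduces the decision rule φ_{N,n,m} = 1_{p̂_{N,n,m} ≤ α}.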

An example in R 2
In this subsection, we compare the statistical test of this paper (DTM) with a statistical test (KS) which consists in applying a Kolmogorov-Smirnov two-sample test to the two empirical DTM-signatures, given an N-sample X = {X_1, X_2, . . . , X_N} from an mm-space (X, δ, µ) and an N-sample from (Y, γ, ν). We apply our isomorphism test to measures supported on spirals in R². For some shape parameter v ∈ R^+, the measure µ_v is the distribution of the random vector (R sin(vR) + 0.03S, R cos(vR) + 0.03S′), with R, S and S′ independent random variables, S and S′ from the standard normal distribution N(0, 1) and R uniform on (0, 1). In the following experiments, we choose µ = µ_10 and ν = µ_20. This leads to the empirical measures µ̂_N and ν̂_N. In Figure 4, we plot the cumulative distribution function of the measure d_{µ̂_N,m}(µ̂_N), that is, the function F defined for all t in R as the proportion of the X_i in X satisfying d_{µ̂_N,m}(X_i) ≤ t. It approximates the true cumulative distribution function associated with the DTM-signature d_{µ,m}(µ). As well, we plot the cumulative distribution function of the measure d_{ν̂_N,m}(ν̂_N).
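The spiral samples can be generated as follows (our own sketch, directly transcribing the definition of µ_v above):

```python
import numpy as np

def spiral_sample(N, v, seed=None):
    """N-sample from mu_v: (R sin(vR) + 0.03 S, R cos(vR) + 0.03 S'),
    with R uniform on (0, 1) and S, S' standard normal, independent."""
    rng = np.random.default_rng(seed)
    R = rng.uniform(0.0, 1.0, N)
    S, S2 = rng.standard_normal(N), rng.standard_normal(N)
    return np.column_stack([R * np.sin(v * R) + 0.03 * S,
                            R * np.cos(v * R) + 0.03 * S2])
```

Larger v produces a more tightly coiled spiral of the same diameter, which is what the DTM-signature with small m detects.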
The signatures are different. Thus, for the choice of parameter m = 0.05, the DTM-signature discriminates well between the measures µ = µ_10 and ν = µ_20. The signature with parameter m = 0.05 provides local information about the measure µ_v. The spiral with v = 10 is less coiled than the spiral with v = 20: at a point of the spiral, catching 5 percent of the points requires a larger radius for v = 20 than for v = 10. As a consequence, d_{µ_10,0.05}(µ_10) takes smaller values than d_{µ_20,0.05}(µ_20), as illustrated by Figure 4. Note that an m close to 1 would not be relevant in such an example. Indeed, d_{µ,1}(x) roughly corresponds to the distance from the point x to the expectation of the measure µ. Since the spirals have the same diameter, the signatures would be very close for m close to 1.
Note that a small m would not be appropriate to discriminate between a spiral and its uncoiled version (a noisy sample generated around a segment): the local behaviour of the measure would be the same. In this case, a larger choice of m would be more appropriate.
In Figure 5, for m = 0.05 and n = 20, we first generate 1000 independent realisations of the random variable √n W_1(d_{µ̂_N,m}(µ̂_n), d_{µ̂′_N,m}(µ̂′_n)), where µ̂_N and µ̂′_N are independent empirical measures from µ_10, N = 2000, and µ̂_n and µ̂′_n are the empirical measures associated with the first n points of the samples. We plot the empirical cumulative distribution function associated with this 1000-sample.
As well, from two fixed N-samples from µ_10, leading to two empirical distributions µ̂_N and µ̂′_N, we generate a set of N_sub = 1000 random variables √n W_1(d_{µ̂_N,m}(µ*_n), d_{µ̂′_N,m}(µ′*_n)), as explained in the algorithm of Section 4.1, and plot its cumulative distribution function. Note that the two cumulative distribution functions are close. This means that the (1−α)-quantile of the distribution of the test statistic is well approximated by the (1−α)-quantile of the subsampling distribution. In order to approximate the type-I error and the power, we repeat the test procedure DTM described in Section 4.1 1000 times independently. At each step, we sample N = 2000 points from the measures µ = µ_10 and ν = µ_v to approximate the power, or twice from µ_v to approximate the type-I error. We select the parameters α = 0.05, m = 0.05, n = 20, and repeat subsampling N_sub = 1000 times. Then, we retain either H_0 or H_1. The type-I error or power approximation is simply the proportion of times the hypothesis H_0 was rejected among the 1000 independent experiments. We also approximate the power of the method KS after repeating its test procedure 1000 times independently. Note that, by construction, the test KS is truly of level α = 0.05. Figure 8 contains the numerical values obtained using the R software. It turns out that our isomorphism test DTM has level close to α = 0.05 and is powerful. For parameters v ≥ 20, our test is even more discriminative than the test KS.

An example in $\mathbb{R}^{28\times 28}$
In this subsection, we use our statistical test of isomorphism to compare the distribution of the digits "2" and the distribution of the digits "5" from the MNIST handwritten digits database. Each digit is represented by a picture of $28 \times 28$ pixels with grey levels, meaning that each digit $X = (x_1, x_2, \ldots, x_{28\times 28})$ can be seen as an element of $\mathbb{R}^{28\times 28}$, where $x_i$ is the grey level of the $i$-th pixel. We equip $\mathbb{R}^{28\times 28}$ with the Euclidean metric. The point of the statistical test here is not to test whether a "2" and a "5" are isometric, but whether the distribution of the "2"s and the distribution of the "5"s (which are distributions on compact subsets of $\mathbb{R}^{28\times 28}$) are equal up to an isomorphism. These distributions are clearly not equal, since their supports are different, but it is not obvious that no rigid transformation between the set of "2"s and the set of "5"s preserves the measures, as for instance a simple permutation of the pixels.
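As a minimal sketch of this vectorisation (assuming arrays of shape (N, 28, 28) holding the grey levels; the function name is ours):

```python
import numpy as np

def as_point_cloud(images):
    """Flatten 28x28 grey-level images into points of R^{28*28} = R^{784};
    the Euclidean metric on these vectors is the metric used by the test."""
    x = np.asarray(images, dtype=float)
    return x.reshape(len(x), -1)  # shape (N, 784)
```

The two flattened clouds can then be fed to the isomorphism test as ordinary samples in $\mathbb{R}^{784}$.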
The statistical test is based on the observation of 5958 samples of "2" and 5421 samples of "5". In order to assess the validity of the test, we repeat 1000 times the experiment consisting in randomly splitting the set of "2"s into two parts and applying the statistical test to these two samples. The type-I error approximation is the proportion of times the hypothesis $H_0$ was rejected. We do the same with the set of "5"s. We repeat these experiments for different values of $n \in \{10, 20, 30, 50, 75, 100, 200\}$, with fixed $m = 0.1$ and $N_{sub} = 1000$ subsampling repetitions.
These results are encouraging, since they show that the test does not discriminate between two samples of "2"s (respectively, between two samples of "5"s) with probability 0.95. Thus, the type-I error is of order 0.05. We choose the parameter $n = 100$ to perform the test between the sample of "2"s and the sample of "5"s. We get a p-value equal to 0, which means that we reject $H_0$ at any level $\alpha$. So we can conclude that the distribution of the "2"s and the distribution of the "5"s in the MNIST database are not isomorphic.
Unlike the spirals, there is a priori no intuition about how to choose a parameter $m$ that discriminates between the distribution of the "2"s and the distribution of the "5"s. In this case, one may choose some intermediate parameter $m$ (for instance $m = 0.1$), which captures neither too local nor too global information about $\mu$. A better idea is to refer to Theorem 2.3 and the remarks below it. One may plot the quantity appearing there as a function of $m$; a suitable choice of $m$ to discriminate between the distributions would be any maximiser of this function.
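A possible numerical implementation of this selection rule, under the assumption that both samples have the same size and with helper names of our own choosing, is to scan a grid of mass parameters and keep the value maximising the distance between the empirical DTM-signatures:

```python
import numpy as np

def signature_w1(X, Y, m):
    """L1-Wasserstein distance between the empirical DTM-signatures of two
    equal-size samples, for mass parameter m (k = ceil(m * N) neighbours)."""
    def sig(cloud):
        k = max(1, int(np.ceil(m * len(cloud))))
        return np.array([np.sqrt(np.sort(np.sum((cloud - x) ** 2, axis=1))[:k].mean())
                         for x in cloud])
    # W1 on R between equal-size samples = mean gap between order statistics
    return float(np.abs(np.sort(sig(X)) - np.sort(sig(Y))).mean())

def most_discriminating_m(X, Y, grid):
    """Return the mass parameter on the grid maximising the distance
    between the two empirical signatures."""
    return max(grid, key=lambda m: signature_w1(X, Y, m))
```

This is only a heuristic sketch: in practice one would compare the observed distance to its subsampling fluctuations rather than its raw value.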

The Kolmogorov-Smirnov test applied to empirical DTM-signatures
In order to test isomorphism between two mm-spaces $(\mathcal{X}, \delta, \mu)$ and $(\mathcal{Y}, \gamma, \nu)$ from two $N$-samples, one could think of applying a two-sample Kolmogorov-Smirnov test directly to the empirical DTM-signatures. However, the values of the empirical distance to a measure at the sample points are not independent, so the Kolmogorov-Smirnov quantiles are no longer valid and the level of the test is not controlled. Thus, applying a Kolmogorov-Smirnov test to empirical DTM-signatures is to be avoided.

A different value of n for the test statistic and for the subsampling distribution
In [39], Politis and Romano propose subsampling methods consisting in approximating the distribution of a statistic with values of the statistic built from smaller subsets of the data. For our statistical test, since the distribution of the statistic and the subsampling distribution converge weakly to the same distribution under some assumptions (see Lemma 2.1), we can imagine fixing two parameters $n$ and $l$ smaller than $N$, choosing as a test statistic the statistic built on $l$ points, and approximating its distribution with the subsampling distribution built from $n$-point subsamples. If we do so, consider $(a, b)$-standard measures supported on compact subsets of $\mathbb{R}^d$ and choose $N = n^\rho$ and $l = n^\beta$ with $1 < \beta < \rho$; then the test is asymptotically of level $\alpha$, provided that the cumulative distribution function of $\|G_{\mu,m} - G'_{\mu,m}\|_1$ is continuous.
Moreover, the proof of Theorem 2.3, which provides an upper bound for the type-II error, can be generalised to this case, leading to an upper bound for the type-II error of the test $\phi_{N,n,m} = \mathbb{1}_{\sqrt{l}\,W_1(d_{\hat\mu_N,m}(\hat\mu_l),\, d_{\hat\nu_N,m}(\hat\nu_l)) \geq \hat q_{1-\alpha,N,n,m}}$ if $n$ is big enough. Note that this is a real improvement for the power.
Nonetheless, the upper-bounds for the L 1 -Wasserstein distance between the test statistic distribution and the subsampling distribution in Proposition 2.1 and Proposition 2.2 cannot be generalised easily.
The following experiments emphasise the fact that the subsampling distribution does not necessarily approximate the distribution of the test statistic well if $n$ and $l$ are different. For the parameters $n = 20$ and $l \in \{20, 50, 200\}$, we have repeated 1000 times the experiment consisting in computing the p-value of our statistical test from two 2000-samples on a spiral with shape parameter $v = 10$, subsampling $N_{sub} = 1000$ times and with mass parameter $m = 0.05$. We sorted these p-values and plotted the associated cumulative distribution function. In this experiment, the hypothesis $H_0$ is satisfied, so the p-values should be uniformly distributed. Moreover, they are independent. Thus, the curve we obtained should lie close to the diagonal. This is not the case when $l$ is too far from $n$.
However, we get a power equal to 1 when choosing $l = 200$ or $l = 50$ instead of $l = 20$, which is much better than the 0.884 obtained in Section 4.2 from the same experiment with $l = n = 20$.
Such a procedure should not be used, despite the improvement of power. Indeed, we have not proved the existence of a non-asymptotic control of a distance between the distribution of the test statistic and the subsampling distribution. Moreover, the experiments emphasize that these distributions are too different to get a test of type-I error not greater than α.

The one-sample Kolmogorov-Smirnov test of uniformity applied to p-values
A major problem of the statistical test proposed in this paper is that the hypothesis retained truly depends on the arbitrary selection, among the $\binom{N}{n}^2$ possible pairs of $n$-samples, of the two $n$-samples used to build the test statistic. Indeed, the p-value defined in Section 2 is random in the sense that different p-values can be associated to the same two $N$-samples. Moreover, the power is not that high, because the test statistic uses only $n$ points, and $n$ can be very small in comparison to $N$.
As an example, in Figure 13, we split an $N = 2000$-sample $\mathcal{X} = \{X_1, X_2, \ldots, X_N\}$ from the distribution $\mu_{10}$ on the spiral with shape parameter $v = 10$ into $N/n = 100$ disjoint subsets of size $n = 20$.
As well, we split $\mathcal{Y}$, an $N = 2000$-sample from $\mu_{10}$, into $N/n = 100$ disjoint subsets $Y_1, Y_2, \ldots, Y_{N/n}$. Then, with the notation of the algorithm in Section 4.1, we consider the p-values $\hat p_1, \hat p_2, \ldots, \hat p_{N/n}$, each built from one pair of subsets. Note that the $N/n$ p-values would be independent if we replaced $d_{\hat\mu_N,m}$ by $d_{\mu_{10},m}$ in the computation of the test statistic, and if the subsampling distribution were replaced with the true distribution of the statistic. In practice, when $N$ is big enough, we are close to these assumptions. Then the p-values $\hat p_1, \hat p_2, \ldots, \hat p_{N/n}$ should behave like independent random variables uniformly distributed on $[0, 1]$.
In Figure 13, we have sorted these $N/n = 100$ p-values, which were built after repeating subsampling $N_{sub} = 1000$ times and with mass parameter $m = 0.05$. They seem to be uniform on $[0, 1]$; indeed, their associated cumulative distribution function lies close to the diagonal.
We use this randomness to propose the following method (DTM-KS) to improve the power of our statistical test. We apply a one-sample Kolmogorov-Smirnov test of uniformity on $[0, 1]$ to the p-values $\hat p_1, \hat p_2, \ldots, \hat p_{N/n}$. In Figure 16, we evaluate the type-I error and the power of this new method, with the same procedure as in Section 4.2 and the same parameters.
In Figure 17, we evaluate the type-I error and the power of the testing method (DTM-KS2) consisting in applying a one-dimensional Kolmogorov-Smirnov test of uniformity on $[0, 1]$ to the p-values $\hat p_1, \hat p_2, \ldots, \hat p_{100}$, where $\hat p_i$ is obtained from the test statistic built on two independent $n$-samples drawn without replacement from $\mathcal{X}$ and $\mathcal{Y}$ respectively. The procedure and the parameters are the same as in Section 4.2. These procedures lead to major improvements in power, but the type-I error degrades.
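The aggregation step shared by DTM-KS and DTM-KS2 can be sketched as follows; to stay dependency-free we hand-roll the one-sample Kolmogorov-Smirnov statistic against Uniform[0, 1] together with its asymptotic Kolmogorov p-value (the function names are ours):

```python
import numpy as np

def ks_uniform(pvalues):
    """One-sample KS statistic of `pvalues` against Uniform[0,1], with the
    asymptotic Kolmogorov p-value (adequate for ~100 p-values, as used here)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    # sup-norm gap between the empirical CDF and the diagonal
    d = max(np.max(i / n - p), np.max(p - (i - 1) / n))
    t = np.sqrt(n) * d
    k = np.arange(1, 101)
    pval = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * (k * t) ** 2))
    return float(d), float(min(max(pval, 0.0), 1.0))

def dtm_ks(pvalues, alpha=0.05):
    """Aggregation sketch: reject the isomorphism hypothesis H0 when the
    collected p-values are significantly non-uniform on [0, 1]."""
    _, pval = ks_uniform(pvalues)
    return pval < alpha
```

Under $H_0$ the p-values are close to uniform, so the KS test rarely rejects; under $H_1$ they pile up near 0 and uniformity is rejected.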

Concluding remarks and perspectives
This paper opens a new horizon of statistical tests based on shape signatures. It could be of interest to adapt this kind of method to other signatures, when possible. In the future, it could even be interesting to build statistical tests based on several different signatures, leading to an even better discrimination. Regarding the test proposed in this paper itself, the geometric and statistical problem of choosing the best parameters to use in practice remains an open, tough and engaging question.
We denote by $\mu_O$ the uniform measure on $O$, that is, $\mu_O = \frac{\mathrm{Leb}_d(\,\cdot\, \cap O)}{\mathrm{Leb}_d(O)}$, with $\mathrm{Leb}_d$ the Lebesgue measure on $\mathbb{R}^d$.
We also define the medial axis of $O$, $\mathcal{M}(O)$, as the set of points in $O$ having at least two projections onto $\partial O$. That is, $\mathcal{M}(O) = \{x \in O \mid \exists\, y \neq z \in \partial O,\ \|x - y\|_2 = \|x - z\|_2 = \mathrm{d}(x, \partial O)\}$. Its reach, $\mathrm{Reach}(O)$, is the distance between its boundary $\partial O$ and its medial axis $\mathcal{M}(O)$. That is, $\mathrm{Reach}(O) = \inf\{\|x - y\|_2 \mid x \in \partial O,\ y \in \mathcal{M}(O)\}$. If $K$ is a compact subset of $\mathbb{R}^d$, it is standard to define its reach as $\mathrm{Reach}(K^c)$, the reach of its complement in $\mathbb{R}^d$. See [25] to get more familiar with these notions.
In the following, $\omega_d$ stands for $\mathrm{Leb}_d(B(0, 1))$, the Lebesgue volume of the unit $d$-dimensional ball.
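For reference, this volume admits the closed form

```latex
\omega_d \;=\; \mathrm{Leb}_d\big(B(0,1)\big) \;=\; \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}+1\right)},
\qquad \omega_1 = 2,\quad \omega_2 = \pi,\quad \omega_3 = \tfrac{4}{3}\pi .
```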

A.1. The distance to uniform measures
Here, we derive some properties of the spaces $(O, \|\cdot\|_2, \mu_O)$. We give a lower bound for the minimum of the distance to the measure $\mu_O$ and a description of the points attaining this bound. First, we state a technical lemma proposed by Lieutier in [33].

We are going to show that this class contains a maximal element by using Zorn's lemma. For this, we need to show that the partially-ordered set $S$ is inductive, which means that any non-empty totally-ordered subclass $T$ of $S$ is bounded above by some element of $S$. Let $T$ be a non-empty totally-ordered subclass of $S$. Set $R = \sup\{r > 0 \mid \exists\, y \in O,\ B(y, r) \in T\}$, the supremum of the radii of all balls in $T$. Since $T$ is non-empty and $O$ is bounded, $R$ is positive and finite. Let $(y_k)_{k \in \mathbb{N}}$ be a sequence of centres of balls in $T$ converging to a point $y$ in $\mathbb{R}^d$, such that the sequence of associated radii $(r_k)_{k \in \mathbb{N}}$ is non-decreasing with limit $R$. Since $T$ is totally ordered and the radii are non-decreasing, for every $K \in \mathbb{N}$, $\bigcup_{k \leq K} B(y_k, r_k) = B(y_K, r_K)$. Then the union $\bigcup_{k \in \mathbb{N}} B(y_k, r_k)$ is equal to $B(y, R)$. Thus, $B(y, R)$ belongs to $S$ and is an upper bound for $T$. So the class $S$ is inductive and, thanks to Zorn's lemma, it contains a maximal element. It follows that:

Proof. Note that for all positive $l$ smaller than $m$, we have: Moreover, In particular, we get for these values of $l$ that: More precisely, if the set $O^{(m,O)}$ is non-empty, then the minimal value of the distance to the measure is given by Moreover, the points at minimal distance are exactly the points of $O^{(m,O)}$. This is the statement of the proposition.

Then, the definition of the $L_1$-Wasserstein metric as the $L_1$-norm between the cumulative distribution functions yields that: Assume for instance that $d_{\min} \leq d'_{\min}$. Since cumulative distribution functions are non-decreasing, it follows that: Thus, when $m$ is smaller than $\frac{\omega_d}{2^d}$, the DTM-signature discriminates between $\mu_O$ and $\mu_{O'}$. Moreover, the $L_1$-Wasserstein distance between the signatures is bounded below by:

A.3. The DTM-signature to discriminate between uniform and non-uniform measures
Proof of Proposition 3.7: As for Proposition A.1, we get that for any point $x$ in $O$: which concludes the proof.

C.1. A lemma
Lemma C.1 (Equality of empirical signatures under the isomorphism assumption). If $(\mathcal{X}, \delta, \mu)$ and $(\mathcal{Y}, \gamma, \nu)$ are two isomorphic mm-spaces, then the distributions of the random variables are equal. Here, the empirical measures are all independent, and the measures $\hat\mu_N$ and $\hat\mu_n$ are built from samples from $\mu$.

C.2. $L_1$-Wasserstein distance between the laws of interest
Proof. Let $(X_1, X_2, \ldots, X_N)$ be an $N$-sample of law $\mu$, and $\hat\mu_N$ the associated empirical measure. We can upper bound the $L_1$-Wasserstein distance between the subsampling distribution and the distribution of interest. We bound the term C.1 by and the term C.3 by This is proved in the three following lemmas.

Lemma C.3 (Study of term C.3). We have
Proof. To bound this $L_1$-Wasserstein distance, we choose as a transport plan the law of the random vector with $\hat\mu_n$, $\hat\mu'_n$, $\hat\mu_{N-n}$ and $\hat\mu'_{N-n}$ independent empirical measures of law $\mu$. Then the $L_1$-Wasserstein distance is bounded by which is not bigger than:

Lemma C.4 (Study of term C.2). We have

Proof. Let $\pi$ be the optimal transport plan associated to $W_1(d_{\mu,m}(\mu), d_{\mu,m}(\hat\mu_N))$; see the definition of the $L_1$-Wasserstein distance in terms of transport plans. From an $n$-sample of law $\pi$, we get two empirical distributions $d_{\mu,m}(\hat\mu_n)$ and $d_{\mu,m}(\mu^*_n)$. Independently, from another $n$-sample of law $\pi$, we get $d_{\mu,m}(\hat\mu'_n)$ and $d_{\mu,m}(\mu^{*\prime}_n)$. The $L_1$-Wasserstein distance is then bounded by Now notice that, if we denote $\hat\mu_n = \sum_{i=1}^n \frac{1}{n}\delta_{Y_i}$ and $\mu^*_n = \sum_{i=1}^n \frac{1}{n}\delta_{Z_i}$, we have: So the $L_1$-Wasserstein distance is not bigger than with $(d_{\mu,m}(Y), d_{\mu,m}(Z))$ of law $\pi$, so we get the upper bound:

Lemma C.5 (Study of term C.1). We have

Proof. It is the same proof as for the first lemma, except that $\hat\mu_N$ is fixed.
Lemma C.6. Let $\nu$, $\mu$ and $\mu'$ be some measures over some metric space $(\mathcal{X}, \delta)$; we have: Proof. We choose the transport plan $(d_{\mu,m}(Y), d_{\mu',m}(Y))$ for $Y$ of law $\nu$.
Thanks to Proposition 3.1 and to the fact that the distance to a measure is 1-Lipschitz, we can derive another upper bound depending only on the $L_1$-Wasserstein distance between the measure $\mu$ and its empirical versions: The rates of convergence of the $L_1$-Wasserstein distance between a Borel probability measure on the Euclidean space $\mathbb{R}^d$ and its empirical version are faster when the dimension $d$ is low; see [26]. Thus, we prefer to use the first bound for regular measures. In this case, we use rates of convergence for the distance to a measure, derived in [18]. For regular measures, in some cases, the bound in Lemma C.2 is better than the bound in Corollary C.1. for $s \leq t$; see [4] or part 3.3 of [8]. Thanks to Theorem 2.8, p. 23, in [7], since $L_1 \times L_1$ is separable and $\hat\mu_n$ and $\hat\mu'_n$ are independent, the random vector

C.3. An asymptotic result
converges weakly to $(G_{\mu,m}, G'_{\mu,m})$, with $G_{\mu,m}$ and $G'_{\mu,m}$ independent Gaussian processes.
Since the map $(x, y) \mapsto x - y$ is continuous in $L_1$, the mapping theorem states that $\sqrt{n}\,\big(F_{d_{\mu,m}(\hat\mu_n)} - F_{d_{\mu,m}(\hat\mu'_n)}\big)$ converges weakly to the Gaussian process $G_{\mu,m} - G'_{\mu,m}$ in $L_1$.
Once more, we use the mapping theorem with the continuous map $x \mapsto \|x\|_1$ and the definition of the $L_1$-Wasserstein distance as the $L_1$-norm between cumulative distribution functions to get that: We then get the convergence of moments following the same method as for Theorem We deduce that: Moreover, we have the bound: Since $\to 0$ when $N \to \infty$, we have that: Finally, with the same arguments as for Lemma C.2, we get that: Since $\mu$ is compactly supported, $d_{\mu,m}(\mu)$ is also compactly supported. Moreover, Theorem 3.2 of Bobkov and Ledoux [8] states that, for any probability $P$ on $\mathbb{R}$ with cumulative distribution function $F_P$, It follows that $W_1\big(L^*_{N,n,m}(\hat\mu_N, \hat\mu'_N),\, \mathcal{L}(\|G_{\mu,m} - G'_{\mu,m}\|_1)\big)$ converges to 0 in probability.
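The definition of the $L_1$-Wasserstein distance on $\mathbb{R}$ as the $L_1$-norm of the difference of cumulative distribution functions, used throughout this proof, can be checked numerically; the following sketch (function name ours) integrates $|F_a - F_b|$ over a grid:

```python
import numpy as np

def w1_via_cdfs(a, b, grid_size=10_000):
    """W1 between two samples on R, computed as the L1-norm of the
    difference of their empirical CDFs (left Riemann sum on a common grid)."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    lo, hi = min(a[0], b[0]), max(a[-1], b[-1])
    t = np.linspace(lo, hi, grid_size)
    Fa = np.searchsorted(a, t, side="right") / len(a)
    Fb = np.searchsorted(b, t, side="right") / len(b)
    return float(np.sum(np.abs(Fa - Fb)[:-1] * np.diff(t)))
```

For equal-size samples this agrees, up to discretisation error, with the mean absolute gap between the sorted samples.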

Proof of Lemma 2.2:
Let $\varepsilon < \alpha$ and $\eta$ be two positive numbers.

Proof of part 2 of Proposition 2.1:
We may assume that the diameter $D_\mu$ of the support of the measure $\mu$ equals 1. Indeed, if we apply a dilatation to the measure to make the diameter of its support equal to 1, then the quantity $W_1\big(L_{N,n,m}(\mu, \mu),\, L^*_{N,n,m}(\hat\mu_N, \hat\mu'_N)\big)$ is simply multiplied by the parameter of the dilatation. By using Corollary C.1 and Theorem 1 of [26], we have a bound for the expectation: for some positive constant $C$ depending on $\mu$.
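The dilatation argument rests on the fact that $W_1$ on $\mathbb{R}$ scales linearly under dilations; a quick numerical check of this homogeneity (helper name ours):

```python
import numpy as np

def w1_sorted(a, b):
    """W1 between two equal-size samples on R via order statistics."""
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

# dilating both samples by a factor c > 0 multiplies the distance by c
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
c = 3.7
assert abs(w1_sorted(c * a, c * b) - c * w1_sorted(a, b)) < 1e-12
```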
for some positive constants $C$, $C'$ and $C''$ depending on $\mu$.
We conclude the proof with the Borel-Cantelli Lemma.

C.5. The case of (a, b)-standard measures
Let $\mu$ be a Borel probability measure supported on a connected compact subset $\mathcal{X}$ of $\mathbb{R}^d$. We assume this measure to be $(a, b)$-standard for some positive numbers $a$ and $b$. In this part, we derive rates of convergence in probability and in expectation for the quantity $\|d_{\hat\mu_N,m} - d_{\mu,m}\|_{\infty,\mathcal{X}}$. Thanks to these results, we can derive upper bounds and rates of convergence in expectation for $W_1\big(L_{N,n,m}(\mu, \mu),\, L^*_{N,n,m}(\hat\mu_N, \hat\mu'_N)\big)$. We finally propose a choice of the parameter $N$, depending on $n$, for which the weak convergence of $L_{N,n,m}(\mu, \mu)$ holds.

In order to derive an upper bound for $\|d_{\hat\mu_N,m} - d_{\mu,m}\|_{\infty,\mathcal{X}}$, as in [18], we use the fact that the function distance to a measure is 1-Lipschitz and that $\mathcal{X}$ is compact, which means that we can compute a bound by upper-bounding the difference $|d_{\hat\mu_N,m}(x) - d_{\mu,m}(x)|$ over a finite number of points $x$ of $\mathcal{X}$. Thanks to the following lemma, the minimal number of points needed for this purpose is not bigger than $\frac{(4 D_\mu \sqrt{d} + \lambda)^d}{\lambda^d}$:

Lemma C.8. Let $\mu$ be a measure supported on a compact subset $\mathcal{X}$ of $\mathbb{R}^d$, and for $\lambda > 0$ denote $N(\mu, \lambda) = \inf\{N \in \mathbb{N} \mid \exists\, x_1, x_2, \ldots, x_N \in \mathcal{X},\ \bigcup_{i \in [[1,N]]} B(x_i, \lambda) \supset \mathcal{X}\}$. Then, we have:

Proof. The idea is to put a grid on the hypercube containing $\mathcal{X}$ with edges of length $D_\mu$. The grid is a union of small hypercubes with edges of length $\frac{\lambda}{\sqrt{d}}$, so that the number of such small hypercubes into which the big one is split is not greater than Then, each time the intersection between $\mathcal{X}$ and some small hypercube is non-empty, we keep one element of the intersection; we denote by $x_i$ the element associated to the $i$-th hypercube. Finally, each point $x$ in $\mathcal{X}$ belongs to a small hypercube, and its distance to the corresponding $x_i$ is smaller than $\sqrt{\sum_{k=1}^d \big(\frac{\lambda}{\sqrt{d}}\big)^2} = \lambda$.

We thus derive upper bounds for $\|d_{\hat\mu_N,m} - d_{\mu,m}\|_{\infty,\mathcal{X}}$:

Proposition C.1 (Upper bound for $\|d_{\hat\mu_N,m} - d_{\mu,m}\|_{\infty,\mathcal{X}}$). We have,

Proof.
Since the function distance to a measure is 1-Lipschitz, we get that: for the family $(x_i)_i$ associated to a grid whose sides are of length $\frac{\lambda}{\sqrt{d}}$. In order to get upper bounds for $\mathbb{E}\big[\|d_{\hat\mu_N,m} - d_{\mu,m}\|_{\infty,\mathcal{X}}\big]$, we use the same trick as in [18], which is:

Lemma C.9. Let $X$ be a random variable such that:

From this lemma, we can derive the following lemma, which we combine with the rates of convergence of the $L_1$-Wasserstein distance between the empirical and the true distribution in [8] to get the following result. If $m \geq \frac{1}{2}$, then for $n$ big enough we have, for some constants depending on $a$ and $b$:
For some $\kappa = n^{\gamma}$ with $\gamma \in \big(0, \frac{1}{2}\big)$ to be chosen later, we first bound above the quantile $\hat q_{1-\alpha}$ with high probability.
Then, under the assumption $W_1\big(\mathcal{L}\big(\sqrt{n}\,W_1(d_{\mu,m}(\hat\mu_n), d_{\mu,m}(\hat\mu'_n))\big),\, L^*_{N,n,m}(\hat\mu_N, \hat\mu'_N)\big) \leq \kappa$, we have $W_1\big(\mathcal{L}(\|G_{\mu,m} - G'_{\mu,m}\|_1),\, L^*_{N,n,m}(\hat\mu_N, \hat\mu'_N)\big) \leq \kappa + 1$. We can do the same thing for $\nu$. Thus, we get that for $n$ big enough and under the previous assumptions: We need to notice that, with similar arguments as for Lemma C.2, we have: