Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology

This work is motivated by the study of local protein structure, which is defined by two variable dihedral angles that take values from probability distributions on the flat torus. Our goal is to provide the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications due to changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Due to its adaptability to the space geometry, we focus on the Wasserstein distance as a metric between distributions. We extend existing results of the theory of Optimal Transport to the $d$-dimensional flat torus $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem. Moreover, we propose different approaches for two-sample goodness-of-fit testing for the one and two-dimensional case, based on the Wasserstein distance. We prove their validity and consistency. We provide an implementation of these tests in \textsf{R}. Their performance is assessed by numerical experiments on synthetic data and illustrated by an application to protein structure data.


Introduction
When it comes to measure the distance between two probability distributions, the well known Wasserstein distance, derived from the theory of Optimal Transport (OT), provides both strong theoretical guarantees -it metrizes weak convergence [66]-and attractive empirical performance [50]. Most of the applications of such theory are related to the very active field of machine learning, notably in the framework of generative networks [2], robustness [59] or fairness [21], among others. From a statistical point of view, one of the main caveats of the theory of OT comes from the curse of dimensionality: the rate of convergence of the empirical Wasserstein distance decreases as n −1/d with the dimension [26]. Another important issue is the asymptotic behavior of the fluctuations of the empirical optimal transport cost. For probability measures supported in R d , it has been proved, using Efron-Stein's inequality that, for the cost L 2 , the difference √ n(W 2 2 (P n , Q) − EW 2 2 (P n , Q)) is asymptotically Gaussian [22]. Recently, the proofs have been extended to some general costs in R d , including the cost L p , for p > 1 [19]. Concerning statistical goodness-of-fit tests based on Wasserstein distance, the one-sample case has already been addressed in [30] and, when the probability distributions are defined over R, two-sample tests can be derived from [17,47].
In this paper, we focus on the d-dimensional flat torus T d := R d /Z d where, even from the purely theoretical point of view, OT has not been completely addressed, besides the work in [14], [42] or, more recently, in [39]. However, this space appears naturally when the probability measures are periodic (e.g. for distributions of angles). The main objective of this work is (1) to extend recent existing OT results to the space of probability measures on the flat torus P(T d ), especially a Central Limit Theorem (CLT) for the fluctuations of the empirical optimal transport cost, and (2) to address in particular the two-dimensional case, by constructing two-sample goodness-of-fit tests based on the Wasserstein distance.
Our motivation for extending the theory of OT to T 2 comes from the investigation of proteins. Understanding the relationships between protein sequence, structure and function is the main goal of Structural Biology. In addition to its scientific importance, a better understanding of these relationships is essential for applications in diverse areas, such as biomedicine and biotechnology. The conformational state of a protein can be defined by a vector of angles, corresponding to rotations around the chemical bonds between the atoms that constitute its "backbone". This vector contains two values per amino-acid, ϕ and ψ, which follow a certain distribution, and which are usually represented using the so-called Ramachandran plots [53] (see also Figure 3). The analysis of these distributions has several important applications, such as the validation or refinement of protein structures determined from biophysical techniques [46,38], the prediction of some biophysical measurements to complement experiments [60], and the development of potential energy models or scoring methods for protein structure modeling, prediction and design [4,55,63].
In this context, the definition of a suitable distance between distributions on T 2 is essential. This would allow to quantify the expected magnitude of structural effects associated with local changes in the sequence, and therefore to develop improved versions of the aforementioned modeling and prediction techniques. Nevertheless, this has not been done satisfactorily in previous works. For example, significant differences between two laws are stated after visual comparison of two empirical distributions in [55] and [60], and the Hellinger distance is used to compare distributions on a non-periodic [−π, π] × [−π, π] in [63]. Powerful statistical tests remain to be defined and implemented in order to state such differences, being based on a metric that takes geometry into consideration. As many other commonly-used metrics, Hellinger distance ignores the underlying geometry of the space. Here, we propose to use the Wasserstein distance, whose advantageous geometrical and mathematical properties are described in [50], [65] and [66], to define goodness-of-fit testing techniques for two measures on T 2 , allowing a more accurate study of the distribution of protein local conformations.
The paper is organized as follows: • Section 2 starts by introducing the general framework of measures on the flat torus in general dimension, followed by the precise formulation of the optimal transport problem. Section 2.1 is devoted to the study of the shape of the solutions, recalling that they are the gradients of periodic convex functions and showing the uniqueness of the potential in Corollary 2.2. Section 2.2 proves through Theorem 2.5 that the optimal transport potentials converge, up to an additive constant, when the measures converge weakly. This result implies that the method of [22] based on Efron-Stein's inequality can be applied to derive a Central Limit Theorem, see Theorem 2.6 in Subsection 2.3. Finally, we show how the previously defined CLT does not allow the definition of an asymptotic test. • Section 3 shows how Wasserstein distance can be used to define two-sample goodness-of-fit tests in the two-dimensional flat torus. We propose two testing approaches. The first one, introduced in Subsection 3.1, consists in testing the equality of two measures projected into a finite number of closed geodesics on T 2 . The second, presented in Subsection 3.2, is a conservative procedure based on upper-bounding the exact p-values. This is possible thanks to a concentration inequality given in Theorem 3.7, together with faster convergence rates for the expectation. • Section 4 reports numerical experiments illustrating the relevance of these theoretical results, first with synthetic data and then with real data from protein structures, showing that our methods behave well in both cases.
To facilitate reading, the proofs are relegated to the Appendix, but in some cases the intuition behind the proof is provided in the main text for clarity.

Optimal transport in
For each x ∈ R d we denote asx ∈ T d its equivalence class and reserve the notation τ for the canonical projection map x → τ (x) =x.
The topology of the quotient space is defined as the finest one that makes τ continuous. With this topology, the space T d is a Polish space with the distance derived from the Euclidean norm ∥ · ∥, Note that the last claim is true since the projection map τ is in fact a metric identification, (R d , ∥ · ∥) is a Banach space and Z d is a closed subset, then it is complete, metrizable through d and separable. Set p > 1. For two probability measures P, Q ∈ P(T d ), a probability measure π ∈ P(T d × T d ) is said to be an optimal transport plan for the cost d p between P and Q if it solves The Kantorovich problem (2.2) can be formulated in a dual form, as follows The element ψ ∈ L 1 (P ) is said to be an optimal transport potential from P to Q for the cost d p if there exists φ ∈ L 1 (Q) such that the pair (ψ, φ) solves (2.3). Recall from [66] that the solutions of (2.3) are pairs (f, g) of d p -conjugate d p -concave functions. This means that Furthermore, since T d is a Polish space, then Theorem 4.1 in [66] implies that there exists a solution π * of (2.2). Additionally, Theorem 5.10 in [66] establishes that supp(π * ) is d p -cyclically monotone. This means that for any finite sequence {(x k , y k )} n k=1 ⊂ supp(π * ) and any bijection σ : {1, . . . , n} → {1, . . . , n}, the following inequality holds: Note that, if Q is a probability measure in T d , its support is defined as the closed set supp(Q) ⊂ T d composed byx ∈ T d such that for any neighborhood Ux ofx it holds that Q(Ux) > 0. The interior of the support is denoted by X Q .
With the same obvious notation we can define a ∥ · ∥ p -cyclically monotone set. Note that for p = 2, ∥ · ∥ 2 -cyclical monotonicity is equivalent to the concept of cyclical monotonicity in convex analysis, described in [56]. Recall that a set A ⊂ R d ×R d is cyclically monotone if for every finite sequence {(x k , y k )} n k=1 ⊂ A and every bijection σ : {1, . . . , n} → {1, . . . , n} it holds that n k=1 ⟨x k , y k ⟩ ≥ n k=1 ⟨x k , y σ(k) ⟩.
Consequently, the concept of d p (resp. ∥·∥ p ) -cyclical monotonicity is the natural generalization, to other spaces and costs, of cyclical monotonicity.
In some cases, that we will study later on, there exists some measurable map T such that the optimal transport plan π satisfies π = (I × T ) #P , where the symbol T #P denotes the push forward measure of P through T , which is defined by T #P (A) := P (T −1 (A)), for all measurable A ⊂ T d , and I denotes the identity map. Therefore, the problem becomes equivalent to the following Monge formulation: (2.5) 2.1. Existence of ∥ · ∥ p -cyclically monotone mappings.
A cyclically monotone map is the natural generalization of a non decreasing function in the real line (as being the gradient of a convex function, see [56]). Cyclical monotonicity provides a powerful tool for statistical studies, see [30,18,13] among others. The existence of cyclically monotone maps between probability measures in R d has been investigated, in parallel, by [15] and [12], with the restrictive assumption of finite second order moment, relaxed in [43]. For periodic measures, the celebrated result of [14] showed the existence. The concept of cyclically monotone map also appears naturally when solving an optimal transport problem with quadratic cost in R d . Therefore, for any potential cost ∥ · ∥ p , the natural generalization is the one of ∥ · ∥ p -cyclically monotone. In fact, [28] proved the existence of a ∥·∥ p -cyclically monotone mapping between probability measures with finite moment of order p > 1. To the authors' knowledge, no previous work has dealt with the existence of ∥ · ∥ p -cyclically monotone mappings between periodic probability measures. Consequently, the main result of this section is Theorem 2.1, which shows the existence and uniqueness of a ∥ · ∥ pcyclically monotone preserving map S p between periodic measures, for p > 1, and relates it with the solution of (2.5). Then, Theorem 2.2 guarantees, under certain assumptions of regularity on the support of P , that the solution of (2.3) is unique up to an additive constant. Note that, in practice, a probability P ∈ P(T d ) defines a periodic measure periodic if it is the natural extension of some probability measure P ∈ P(T d ).
As anticipated, the goal of this section is to show the existence of ∥· ∥ p -cyclically monotone mappings between two periodic measures µ P , µ Q ∈ M(R d ) absolutely continuous w.r.t. the Lebesgue measure on R d , denoted as µ P , µ Q ≪ ℓ d . As commented before, [14] established the existence of a ∥ · ∥ 2 -cyclically monotone map (which a.s. is the gradient of a convex function φ) such that ∇φ#µ P = µ Q . Theorem 1.25 in [57] entails that there is a unique solution of the Monge problem in the torus, described by the relation T = x − ∇f (x), where the sum is to be intended modulo Z d and f is an optimal transport potential for the quadratic cost. Note that this is a quite similar relation (between potentials and transport) to the one in the quadratic transport problem in R d .
The proof of Theorem 2.1 starts by realizing that since T d is a Polish space, then Theorem 4.1 in [66] implies that there exists a solution π * of (2.2). Furthermore, Theorem 5.10 in [66] establishes that supp(π * ) is d p -cyclically monotone, which implies that the set (2.6) is cyclically monotone. Corollary 3.5 in [28] implies that this cyclically monotone set is contained in the graph of a ∥ · ∥ p -differential of a ∥ · ∥ p -concave function φ p (defined as in (2.4) but replacing d p with ∥ · ∥ p ).
In conclusion, the a.s. uniqueness of this ∥ · ∥ p -differential ends the proof.
Theorem 2.1. Let P, Q ∈ P(T d ) be probability measures such that µ P ≪ ℓ d . Then, there exists a unique solution T p of (2.5). Moreover, there exists a µ Pa.e. defined ∥ · ∥ p -cyclically monotone map S p such that • the relation T p • τ = τ • (S p ) holds µ P -almost surely, • and S p #µ P = µ Q .
The following result gives the uniqueness, up to additive constants, of the optimal transport potential, where the assumptions are given with respect to its associated periodic measures. In particular, we need to have negligible boundary of µ P which means that the boundary of its support has Lebesgue measure 0, ℓ d (∂ supp(µ P )) = 0. The proof investigates the intrinsic relation between the optimal transport potentials and the previously described T p , which allows the use of general results for the uniqueness of ∥ · ∥ p -concave functions (see [19]) which have the same gradient a.s. in a connected domain of R d .
Theorem 2.2. Let P, Q ∈ P(T d ) be probability measures with connected support such that their associated periodic measures satisfy µ P , µ Q ≪ ℓ d with negligible boundary. Then, there exists a unique, up to an additive constant, d p -concave function f p solution of (2.3).
The assumption of connected support can be relaxed, via [62,Theorem 2], to the setting where both measures have disconnected support. If the supports of µ P and µ Q decompose into closures of connected open components where I is finite index set and J is a countable index set, then, assuming for all non-empty proper I ′ ⊂ I and J ′ ⊂ J that The importance of Corollary 2.3 mainly lies in that it enables the study of the asymptotic behavior of the potential, allowing us to apply Arzelá-Ascoli like reasoning, as explained in the following section.

Asymptotic behaviour
This section deals with the asymptotic properties of the transport map and potentials. We consider two sequences of probability measures {α n } n∈N , {β n } n∈N ⊂ P(T d ) converging weakly to P and Q respectively, α n w − → P and β n w − → Q.
Since T d is compact, here the weak convergence is in the sense that for every continuous function h ∈ C(T d ), h(x)dα n (x) → h(x)dP (x). Once again, thanks to that compactness the existence of moments of any order is always fulfilled for P ∈ P(T d ). As a consequence, Theorem 7.12 in [65] implies that α n w − → P if and only if the p-Wasserstein distance W p (α n , P ) := (T p (α n , P )) 1 p tends to 0. An analogous reasoning implies the convergence T p (α n , β n ) → T p (P, Q) for the two-sample case.
The idea of this section is to take advantage of the fact that any d p -concave function f is continuous whereby it is finite. Moreover, it has bounded continuity modulus, so we can apply Arzelá-Ascoli's Theorem by fixing the constants.
, with respect to the metric (2.1). The proof of the next Theorem first proceeds by choosing the sequence {a n } n∈N to guarantee the uniform boundedness of the sequence {(f n , g n )} n∈N of solutions of (2.3). This, together with Lemma 2.4 and Arzelá-Ascoli's Theorem, implies that {(f n , g n )} n∈N is relatively compact. The uniqueness of solutions of (2.3), described in Theorem 2.2, allows us to conclude. Theorem 2.5. Let P, Q ∈ P(T d ) be probability measures with connected supports whose associated periodic measures satisfy µ P , µ Q ≪ ℓ d with negligible boundary. Let {α n } n∈N and {β n } n∈N ⊂ P(T d ) be two sequences of probability measures converging weakly to P and Q respectively. Denote by (f n , g n ) (resp. (f, g)) the solution of the dual problem between α n and β n (resp. P and Q). Then there exists a sequence of real numbers {a n } n∈N such that f n + a n → f uniformly on the compact sets of X P .

Asymptotic normality
This section is devoted a proof of a Central Limit Theorem (CLT) for the fluctuations of the empirical optimal transport cost. Recall that the previous section proves that, under certain regularity assumptions, there exists a unique optimal transport potential from P to Q. Let f p be such a potential. We will use Efron-Stein's inequality to derive that √ n (T p (P n , Q) − ET p (P n , Q)) w −→ N (0, σ 2 p (P, Q)), with σ 2 p (P, Q) = Var(f p (X)). (2.9) Then, we will see that the same holds in the two sample case. The idea is not new: it has already been used with the same goal in [22] for the quadratic cost in R d , and in its extension to general costs in [19]. Moreover, when using regularized optimal transport, [45] showed that the same technique can be applied. A similar result, but using the idea in [20] of differentiating the supremum in the functional sense by applying the general result of [16], yields also a CLT on the torus for p ≥ 2, see [34].
Theorem 2.6. Let P, Q ∈ P(T d ) be probability measures with connected supports such that their associated periodic measures satisfy µ P , µ Q ≪ ℓ d with negligible boundary. Then, for any p > 1, we have and, if m = m(n) satisfies that m −→ +∞ and n n+m → λ ∈ (0, 1) as n → ∞, where σ 2 p (P, Q) and σ 2 p (Q, P ) are defined in (2.9) and satisfy It is clear that the limit of Theorem 2.6 degenerates to 0 when P = Q. Suppose now that P ̸ = Q are satisfying the assumption of Theorem 2.6. The limit, in the one-sample case, is degenerate if and only if Var(f p (X)) = 0. Since the optimal transport potentials are unique up to additive constants, see Theorem 2.2, we can suppose that E(f p (X)) = 0. Thus, the degeneracy is equivalent to E(f p (X) 2 ) = 0, hence f p = 0 P -a.s. and the same holds for f d p p . This implies, in particular, that W p (P, Q) = 0 which occurs only if P = Q.
Our initial motivation to prove Theorem 2.6 was to find an asymptotic distribution of T p (P n , Q m ) allowing the definition of a two-sample goodness-of-fit test. Even for measures supported on the real line, the only asymptotic results account for the case P ̸ = Q, providing the asymptotic behaviour of the Wasserstein statistic under the alternative hypothesis. The idea of switching H 0 and H 1 and testing for similarities has been studied in several previous works, all considering measures supported on R. Gaussian deviations from the true distance T 2 (P, Q) are proved in [21], which allows testing of T 2 (P, Q) ≥ ∆ 0 , for a given threshold ∆ 0 . In the same way, the earlier work [27] introduced such an asymptotic test for assessing similarities based on the trimmed Wasserstein distance, allowing sample dependency.
Unfortunately, the same strategy can not be applied in our case, as the derived CLT for measures supported on T 2 (Theorem 2.6) only states Gaussian deviations from the mean. Indeed, if we use (2.10), we could consider the statistic where, in practice, the variance and expectation could be estimated by bootstrapping the given samples (as long as bootstrap consistency is ensured). The recent works of [33] and [39] show that, in small dimension -d = 2, 3 and at most 4-, the value ET p (P n , Q m ) can be substituted by the population T p (P, Q).
That gives rise to When P ̸ = Q, the statistics in (2.11), (2.12) converge in law to a standard Gaussian distribution. This is illustrated in Figure 8. However, one would expect the statistic to be stochastically larger under P ̸ = Q than under P = Q, allowing the distinction of the null and the alternative hypotheses. Nevertheless, due to the aforementioned degeneracy of Theorem 2.6 when P = Q, this condition fails to be satisfied and no asymptotic test can be implemented from this result. Further discussion about this issue can be found in Section 5. Therefore, the rest of this paper is devoted to alternative approaches to define suitable two-sample goodness-of-fit tests for measures supported on T 2 .

Two-sample goodness-of-fit tests
Let us first formulate the problem. Denote by (X 1 , . . . , X n ) and (Y 1 , . . . , Y m ) two independent and identically distributed random samples of laws P, Q ∈ P(T 2 ) respectively, and by P n , Q m their corresponding empirical probability measures. We aim to test via the the definition of a statistic T nm = T (P n , Q m ), representing an estimate of discrepancy between P n and Q m , together with the critical region where x i (resp. y j ) denotes a realization of X i (resp. Y j ) for i = 1, . . . , n (resp. j = 1, . . . , m). The critical value c nm (α) in (3.2) is given for a fixed significance level α by where F nm is the distribution function of the statistic T nm under H 0 . We are therefore considering the test Equivalently, a p-value for this test is p nm = 1−F nm (T nm ). Ideally, we would like T nm to be T p (P n , Q m ). However, knowing the distribution of the latter statistic under H 0 remains an open problem. The one-sample case in R d has recently been addressed in [30], but approaches for two-sample testing in arbitrary dimension, and for measures on more general spaces, have not already been proposed to the best of our knowledge. The lack of solutions may be explained by the intrinsic difficulty of characterizing the distribution of T p (P n , Q m ) when P = Q especially when the dimension is larger than one. In the next subsections, we propose two alternative approaches to define (3.4), both based on the 2-Wasserstein distance, that allow two-sample goodness-of-fit testing for measures on T 2 .

Geodesic projections into R/Z
Our first approach for testing the equality of two measures P , Q on R 2 /Z 2 is to test the equality of their geodesic projections. This bypasses the dimension problem and allows the implementation of testing techniques based on Wasserstein distance for one-dimensional spaces. Geodesics on T 2 are the images by the canonical projection τ of straight lines on R 2 [9]. Lines with irrational slope map to geodesics which are dense on T 2 , and only lines with rational slope map to closed geodesics on the torus, which are closed spirals isomorphic to R/Z (see [9, Figure VII.10] for an illustration). The strategy is to project P n and Q m into N g closed geodesics, and to test the equality of each pair of projected measures, which will be supported on R/Z. These geodesics can be chosen a priori by the practitioner, or sampled from the set of all closed geodesics on T 2 . We propose a sampling method in Appendix A.1. This method prioritizes simpler geodesics (that is, with a smaller number of revolutions over the torus) in order to ease computational implementation. The algorithm we used to project samples on T 2 to a given geodesic is described in Appendix A.2. To avoid repetition of the same test, and to ensure independence between the computed p-values, we require all the N g geodesics to be different.
In this section, we propose a two-sample Wasserstein test to assess the equality of two measures supported on the circle, and state how to combine the resulting N g -tuple of p-values into a global p-value for the bi-dimensional problem. From now on, to simplify notation, we will denote by T 2 any squared Wasserstein distance, the ground space being inferred from the corresponding measures.

Two-sample goodness-of-fit test on R/Z
Optimal Transport on the circle has been recently studied in detail in [32], where the limit laws of the one and two-sample empirical Wasserstein distance for measures on R/Z are derived. However, the considered statistics are not distribution-free, so that only one-sample goodness-of-fit tests can be derived from these results. Still, the authors of [32] also propose a b-out-of-n bootstrap approach, for b = o(n), to define a two-sample goodness-of-fit test. Unfortunately, type I error fails to be controlled since the bootstrapped p-value under the null hypothesis is (substantially) stochastically smaller than a uniform random variable. This can be observed by simple numerical experiments based on the implementation proposed by [32], for example by comparing two equallysized samples from a Uniform distribution. We believe that this is due to a lack of consistency of the two-sample bootstrap for the proposed statistic. In order to bypass this issue, we now propose a convenient alternative approach based on a distribution-free two-sample statistic.
Let P c , Q c ∈ P(R/Z) and P c n , Q c m be their corresponding empirical probability measures. We aim to test If R/Z is parameterized by the set [0, 1) with the geodesic distance the cumulative distribution functions of P c , Q c , denoted as F , G respectively, can be defined as in [32] as Then, we can write where the pseudo-inverse is defined as H −1 (s) = inf{t : H(t) > s}, for any distribution function H. The formulation (3.6) was first proved in [52] for discrete measures, and extended to arbitrary measures in [23]. It shows how the Optimal Transport problem on the circle reduces to the same problem on [0, 1) ⊂ R if both measures are relocated on the real line choosing as origin the minimizing element α. This is well illustrated in [32]. We first remark that if one of the two measures is the uniform law on R/Z, the infimum on (3.6) has an explicit formulation.
Lemma 3.1. Let P c ∈ P(R/Z), and F be its cumulative distribution function. Let U be the uniform distribution on R/Z. Then, where the optimal origin is given by If we replace P c and F by their empirical counterparts, P c n and F n , Lemma 3.1 allows the definition of the statistic T 2 (P c n , U ), which is distribution-free when P c = U . Lemma 3.2. Let P c ∈ P(R/Z), P c n be its empirical probability measure, and U be the uniform distribution on R/Z. Then, if P c = U , where B is a standard Brownian bridge, and the weak convergence is understood as convergence of probability measures on the space of right-continuous functions with left limits.
Lemma 3.2 can be used to define of a one-sample goodness-of-fit uniformity test, based on the squared Wasserstein distance on the circle. This would complement the work in [32], where such a test was introduced for the 1-Wasserstein distance. As our aim here is to define a two-sample test, we adapt the idea of [54] to compare two measures on the circle, by considering the 2-Wasserstein distance between G −1 m (F n ) and the uniform distribution. We can therefore consider the statistic which is also distribution-free when P c = Q c . The following result is the counterpart of Lemma 3.2.
Proposition 3.3. Let P c , Q c ∈ P(R/Z), having continuous and strictly increasing cumulative distribution functions. Let P c n , Q c m be their corresponding empirical probability measures, and F n , G m be their empirical cumulative distribution functions. If n m → λ when n, m → ∞ for some λ ∈ [0, ∞) then, under P c = Q c , it holds that Consequently, with the notation of the beginning of Section 3, we propose the test where the critical value c c nm (α) is given by with F c nm denoting the distribution function of T c nm under H 0 . Equivalently, a pvalue for this test is p c nm = 1−F c nm (T c nm ). Following Proposition 3.3, the critical value or, equivalently, the p-value for a given sample, can be approximated with arbitrary precision using a Monte Carlo algorithm. The following result guarantees the consistency of (3.8).
These individual p-values can be aggregated as follows: This aggregation is akin to the Bonferroni correction for Family Wise Error Rate (FWER) control in multiple testing [6]. As such, p Ng defined in (3.10) is a valid p-value for the two-dimensional test, regardless of the possible dependencies between the N g individual p-values. This implies that the two-dimensional test controls the type I error for any α > 0 (see Appendix B.2 for a proof). Regarding consistency under fixed alternatives, by construction, (N g -geod) will fail to detect differences between two measures on T 2 whose projected distributions are identical for all the N g geodesics considered. Therefore, π g nm,Ng will not be consistent under such alternatives, which, are arguably very unlikely in practice if N g is large enough. Otherwise, consistency is guaranteed.
Remark 3.6. The assumption in Proposition 3.3 that the projected measure P c ∈ P(R/Z) has continuous and strictly increasing cumulative distribution function is satisfied if the underlying measure P ∈ P(T 2 ) satisfies µ P ≪ ℓ 2 . See Appendix B.2 for a proof.
The time complexity of (N g -geod) is O(n + m). Indeed, n + m operations are needed to compute G m (F −1 n (t)) and F −1 n (G m (t)) for a given t. Therefore, computing the test statistic (3.7) can be done in O(n + m) operations, where the complexity constant depends on the number of subdivisions of [0, 1] set by the numerical integration method chosen to compute (3.7). Moreover, the time complexity of the algorithm described in Appendix A.1 to sample closed geodesics is also O(n + m) in practice, as a consequence of the distribution from which the geodesics are drawn. This is empirically illustrated in Figure 9.

p-value upper bounding
If we set T 2 (P n , Q m ) as the statistic T nm for the test (3.4), the p-value for a given sample would be given by where t nm denotes the statistic realization. The goal of this section is to find an upper bound for (3.11), which will itself be a valid p-value for (3.4) if it controls type I error (that is, if it remains with probability 1 − α over a fixed significance level α under H 0 ). We will also require the power of the corresponding test to tend to 1 under fixed alternatives. We start by upper bounding the deviations of the statistic from the mean. Using McDiarmid's inequality [44], we obtain the following result, which extends to the two-sample case the inequality in [67,Proposition 20], for the quadratic cost.
Theorem 3.7. Let P, Q ∈ P(T 2 ) and P n , Q m be two empirical probability measures of laws P , Q respectively. Then, for all t ∈ R, we have After that, we study the convergence speed of the expectation under the null hypothesis. Using directly the results exposed in [26], only bounds of order If Assumption 1 is satisfied, then from Lemma B.1 and Theorem 6.3. in [1] we can derive the following asymptotic bound for the two-sample null expectation. (3.14) Note that Assumption 1 is not especially restrictive. It is satisfied by any continuously differentiable density, whose connected support can be locally given by the graph of a continuously differentiable function. Examples include bivariate von Mises distributions or uniform distributions in connected smooth sets.
The idea to define the test is to combine Theorem 3.7 with Lemma 3.8 and upper bound (3.11) for sufficiently large sample sizes. If we take the limit for the expectation in (3.12) under the null, we have the following result. Proposition 3.9. Let P, Q ∈ P(T 2 ) and P n , Q m be two empirical probability measures of laws P , Q respectively. For all ε > 0, there exists N ε ∈ N such that for all n, m ≥ N ε , we have For a fixed ε > 0, the bound (3.15) can be used to define a test (3.4) for any α > 0 as follows: By Proposition 3.9, the test (UB) will control type I error for all n, m ≥ N ϵ . In practice, the threshold N ε depends on the unspecified constant hidden in (3.14), which is dragged from the results in [1]. The following result shows that, nevertheless, asymptotic consistency at level α of (UB) is guaranteed.
The last result ensures asymptotic consistency at level α if the two compared measures are further than ε in the squared 2-Wasserstein distance. This can be used to calibrate the sensibility of (UB) if the practitioner possesses some prior information about the differences that the test should accept. This would ensure smaller N ε without implying a power decrease. For the simulation and case studies presented here, we will set ε to the machine precision ε m = 2.2·10 −16 (for a standard double-precision floating-point format). The corresponding N ε should be affordable thanks to Lemma 3.8, responsible of the satisfactory power of (UB). Due to the improved convergence speed of the expectation, we will have sharp bounds (3.15) for reasonable sample sizes, allowing the detection of differences for our practical purposes. This is illustrated in Section 4.2.
The computational complexity of (UB) is given by the numerical algorithm solving the Optimal Transport problem. Here, we used the Fast Network Simplex for Optimal Transport [8], which has O((n + m) 2 ) time complexity and O((n + m) 2 ) memory cost, due to the cost matrix computation.

Numerical experiments
This section is devoted to assess the performance of the two-sample goodnessof-fit tests (N g -geod) and (UB), and to show how they can be implemented to evaluate differences on protein structure data. In Section 4.1 and 4.2, we evaluate the relative efficiency of both tests, comparing their performance with other methods not based on Optimal Transport. Section 4.3 illustrates one possible application to protein structure investigations, by stating statistical evidence of nearest neighbors effects on local protein conformations.

Small-sample performance
To make an informative analysis of the performance of tests (N g -geod) and (UB), we studied how their power function behaves for alternatives converging to the null hypothesis. We also assessed whether the proposed approach to define a Wasserstein test on the circle contributes to a better power. In particular, we compared the power function of (N g -geod) with variations of the same test. On the one hand, to evaluate whether the choice of an optimal origin to relocate the measures on [0, 1) is advantageous, we considered the same statistic (3.7) but with α 0 being random and uniformly chosen in [0, 1]. It is easy to check that the modified statistic is distribution-free under the null, by proceeding analogously to Proposition 3.3. On the other hand, to study whether the use of Wasserstein distance for the one-dimensional statistic contributes to a better power, we relocated the measures in [0, 1) (again after choosing a random origin on R/Z) and compared them with the well-known Anderson-Darling two-sample statistic. To study the effect of the number N g of geodesics, we performed the test (N g -geod) for N g ∈ {2, 3, 4, 5}. We also compared all the previous approaches with the two-dimensional extension of the Kolmogorov-Smirnov two-sample test proposed by Fasano and Franceschini [24], defined for measures supported on R 2 . This allows the assessment of whether taking into account the geometry of the underlying space contributes to a better performance.
For the small-sample case, we compared samples of size n = m = 50 drawn from a bivariate von Mises (bvM) distribution [41] of means µ = ν = 0.5, and concentration parameters κ 1 , κ 2 , κ 3 with equally-sized samples drawn from a uniform distribution on T 2 . The density of the bvM cosine model is given by where the explicit form of the normalization constant c(κ 1 , κ 2 , κ 3 ) is stated in [41]. The null hypothesis corresponds to the case κ 1 = κ 2 = κ 3 = 0. For the converging alternatives, we distinguished two scenarios:  The first conclusion that we can state after Figure 1 is that the test (UB) has zero power for small sample sizes. This was expected by Proposition 3.9, as large values of n, m are required to ensure sharp bounds. However, some interesting conclusions can be extracted regarding the other tests. First, the test (N g -geod) has power α under H 0 . Indeed, further simulations confirmed that the approach described in Section 3.1.2 ensures the uniformity of the combined p-value's null distribution. Together with the illustrated consistency of test (N g -geod), we can observe the considerable gain in power when comparing measures with the Wasserstein statistic (3.7) by choosing an optimal origin on the circle. The choice of a random origin ('Naive W-geodesic' curve) or the use of techniques that do not rely on Optimal Transport ('AD-geodesic' or Fasano-Franceschini curve) notably reduce the test power, specially when differences are presented on the dependence structure (Figure 1b). Finally, the choice of the number N g of geodesics seems to have an effect on power. As one could have expected, increasing the number of geodesic projections improves the test's ability to detect slighter differences. Consequently, the practitioner is entitled to indefinitely increase N g , paying back on computation time (or implementation complexity, if geodesics are randomly chosen, see Appendix A.1).

Asymptotic performance
This section is devoted to assess the suitability of the upper bound testing technique (UB) when large sample sizes are available. Here, we studied the relative efficiency of tests (UB) and (N g -geod) for the same converging alternatives as in Section 4.1, with n = m ∈ {1000, 1500, 2000}. Results are shown in Figure 2, where the parameter of interest (κ 1 = κ 2 or κ 3 ) took values in {0.1, 0.2, . . . , 4}, and the empirical power was estimated as the proportion of rejections at level α = 0.05 among 1000 repetitions of each test. Figure 2 shows that the test (UB) is powerful when sample sizes are large enough. As its corresponding p-value has been defined as an upper bound of the actual p-value (3.11), it will be quite a conservative test and, therefore, relatively less efficient than (N g -geod). This is illustrated in both panels. In any case, the test (UB) can be useful in practice. Besides the detection of big differences, the practitioner may be interested in the acceptance of small and controlled discrepancies between samples, which may be due, for instance, to experimental inaccuracies. In scenarios where a less conservative method as (N g -geod) may detect such differences, one might prefer to rely on a test method that allows slight dissimilarities and stands out only the more relevant ones. Consequently, even if the test (UB) is clearly less efficient than our first candidate (N g -geod), we believe it can be of interest in some practical scenarios, such as several situations appearing in Structural Biology problems. This is further discussed in Section 5.

Application to protein structure analysis
A method to accurately compare local structural preferences in conformational ensemble models of proteins is useful to investigate sequence-structure-function relationships, allowing for instance to understand the effect of mutations. The local structure of a protein is determined by two dihedral angles usually denoted by ϕ and ψ, which describe the conformational state of each amino acid residue along the sequence [11,37]. For most amino acid types (for all excepting proline and glycine), the distribution of ϕ and ψ angles is supported on the same subset of T 2 , which, even if there exist some physically forbidden regions due to strong repulsive forces between non-bonded atoms at short distance, is connected and has a smooth boundary. We can also assume that density is continuously differentiable and strictly positive in its support, so that Assumption 1 is satisfied. The aim of this section is to make use of the tests (N g -geod) and (UB) to show that the distribution of (ϕ, ψ) does not depend only on the amino acid type, but also on the sequence context, and particularly on the closest neighbors. This corresponds to rejecting Flory's isolated-pair hypothesis [25]. Even if the importance of the closest neighbors effect is widely accepted in the Structural Biology community [35,29,4,55,63], only purely descriptive methods have been employed to state so, and no goodness-of-fit techniques have been used to the best of our knowledge. For a given amino acid C, we denote by P C the distribution of (ϕ, ψ) supported on T 2 . If we take into account the identities L, R of C's left and right neighbors, the distribution of (ϕ, ψ) is now given by P LCR . The objective is to test to assess whether nearest neighbors significantly affect dihedral angles distributions. An example of two samples drawn from P C and P LCR is depicted in  . For the analysis presented here, we used a structural database of three-residue fragments (also called tripeptides) extracted from experimentallydetermined high-resolution protein structures [48]. The large available sample sizes allow us to illustrate the asymptotic behaviour of (UB). We selected the 71 tripeptides L-C-R for which the database contained more than 3000 points. For each one, we compared the corresponding sample of (ϕ, ψ) values with an equally-sized sample drawn from P C (sampled from the sub-database containing (ϕ, ψ) values from tripeptides having C as central amino-acid). The data were rescaled to [0, 1] × [0, 1] before applying the tests. As the p-values for the test (N g -geod) are computed by Monte Carlo simulation, they are lower-bounded by 1/N M C , where N M C is the number of Monte Carlo replicas [51]. This point is important here, as due to the large number of performed tests, we had to correct p-values for multiplicity [31]. The results are depicted in Figure 4, where we show the empirical cumulative distribution function of both tests' corrected p-values, for three increasing ranges of sample sizes. From Figure 4, we can state that the geodesic projection test (N g -geod) strongly rejects the null hypothesis at level α = 0.05 for the three considered sample size ranges, being all p-values truncated to the Monte Carlo precision. Repeating the same analysis for N g = 3, 4 did not change the shape of the (N g -geod) p-values curves, which was expected as higher values of N g yield a power increase. A clear asymptotic behaviour is observed for the upper bound technique (UB), as power at level α tends to one when sample sizes increase. Note that, for the largest range of sample sizes, (UB) is relatively more efficient than (N g -geod), due to the Monte Carlo truncation. Both procedures lead to rejection of the null hypothesis, and therefore to the statement that nearest neighbors effect on (ϕ, ψ) distributions is statistically significant. This analysis suggests that both (N g -geod) and (UB) are suitable for assessing differences on local protein structures, as the available sample sizes (which may be up to ∼ 10 5 in some practical scenarios) are large enough to state significant conclusions. Nearest neighbors effect

Discussion
The main goal of this work was to define suitable two-sample goodness-of-fit tests for measures on T 2 . This naturally led us to enrich the existing theoretical results [14,39,42] on Optimal Transport for periodic measures. In particular, we studied the shape of the solutions to the Monge problem (2.5), which allowed the extension of a Central Limit Theorem to T d , for any p > 1. Our original inspiration when first investigating these theoretical results was to use the Central Limit Theorem 2.6 to define a two-sample asymptotic test. However, the derived limit distribution degenerates when P = Q and prevents such an application. Nevertheless, the Wasserstein distance on T 2 for the quadratic cost was used to define two efficient testing techniques, which address our initial goals. The first approach bypasses the dimension problem by projecting the measures to closed geodesics on T 2 and subsequently test their equality. This required the investigation of how to project samples on closed geodesics and, moreover, how to conveniently sample closed geodesics. The answers we propose here, notably in Sections A.1 and A.2, together with their supplied practical implementations, may be of interest in further practical situations. Furthermore, they suggest one possible extension of the Sliced Wasserstein distance [7] to the two-dimensional flat torus. As closed geodesics on T 2 are isomorphic to R/Z, the equality of the projected measures is assessed through a two-sample Wasserstein test on the circle which, to the best of our knowledge, is the only efficient procedure proposed up to now.
The second proposed approach consists in upper-bounding the exact p-values (3.11). This is possible thanks to the derived concentration inequalities (3.12) for the two-sample empirical Wasserstein distance with the quadratic cost, and to the improved convergence speed of its expectation, as shown in Lemma 3.8. As with any upper-bounding technique, the corresponding test is conservative and only efficient for large sample sizes, which reduces its range of application. However, this test could be relevant in some practical scenarios. For example, Molecular Dynamics simulations (which simulate the temporal evolution of the structure of a protein using force-fields based on physical models), produce samples on T 2 that may present small and meaningless differences when re-running simulations multiple times with slightly different initial conditions. In such a situation, we expect that the first technique (N g -geod) will reject the equality of their corresponding distributions, while the conservative test (UB) will accept differences between independent replicas of the same simulation. Consequently, (UB) will only detect more important discrepancies, which are the only ones of interest for practical purposes.
Regarding the practical implementation of both tests, some differences appear with respect to computing time. The main advantage of (N g -geod) is the explicit formulation of Wasserstein distance on one-dimensional spaces, which avoids the use of any Optimal Transport solver. As a result, its time complexity is linear in the sample size. However, the statistic null-distribution must be simulated with the desired precision, which may slow down the procedure. Note that, in any case, this distribution can be simulated once and be tabulated for any further implementation. The time complexity of (UB) exclusively lies on the Optimal Transport solver chosen to compute Wasserstein distance. For very large sample sizes, this might lead to a substantially slower process.
The issue of two-sample goodness-of-fit testing studied in Section 3 remains largely open. Our contribution in this respect is to propose easily implementable goodness-of-fit testing approaches that are built on top of state-of-the-art tools in Optimal Transport. Finding the exact or asymptotic distribution of the Wasserstein statistic in general dimension remains one of the main unsolved problems of the theory of Optimal Transport, preventing the construction of more efficient two-sample goodness-of-fit tests. An asymptotic approach for measures supported on a finite set has been presented in [61] and, in the onedimensional case, [3] have obtained a CLT under the null P = Q for deviations of W p (P n , Q n ) from the true distance W p (P, Q) (instead of E(W p (P n , Q n ))). The results of [3] are already quite challenging mathematically, and extensions to higher dimensions are clearly beyond the scope of the present work. Altogether, we believe that the goodness-of-fit tests defined in this paper constitute a relevant building block for the study of the sequence-structure-function relationship in proteins, and in particular for Intrinsically Disordered Proteins (IDPs), allowing their structural investigation with mathematical guarantees. Furthermore, the interest of the techniques here presented may go beyond the Structural Biology community, as they allow solving the goodness-of-fit testing problem for two distributions lying in general periodic spaces, which appears in various application domains.

Code availability
The test approaches presented in this work are implemented in the R package torustest, available at https://github.com/gonzalez-delgado/torustest, together with the algorithms introduced in Appendix A. Empirical Wasserstein distances were computed using the R package transport [58].
This Section is devoted to address some practical questions that arise when defining the test proposed in Section 3.1. In Appendix A.1, we propose a sampling method to prevent the practitioner from explicitly choosing the N g geodesics, letting them be chosen randomly with respect to a given distribution. In Section A.2, we propose an algorithm to project a pair of samples on T 2 to a given closed geodesic.

A.1. Sampling closed geodesics
As the closed geodesics on T 2 are given by the canonical projections of straight lines on R 2 with rational slope, sampling from the set of all closed geodesics is equivalent to sampling from Q, which is a countable set. This prevents the sampling to be uniform, in the sense that geodesics can not be equiprobable. Indeed, if P(q) = c for all q ∈ Q, by countable additivity P(Q) = q∈Q c, which is zero if c = 0 and ∞ otherwise. In consequence, as we have to assign different weights to rational slopes, we will opt for simpler geodesics to be more probable, in order to ease computational implementations. To achieve so, we can consider the random variable Q = A/B, studied in detail in [49], where B follows a geometric distribution of parameter p and, for a given denominator B = b, A is uniform on {0, 1, . . . , b}. Note that Q maps into Q ∩ [0, 1]. As p increases, A and B take smaller values and the corresponding geodesics revolt less over the torus. Conversely, when p → 0, P(Q = q) → 0 for all q ∈ Q ∩ [0, 1], and Q is asymptotically equiprobable [49]. However, small values of p yield extremely high values of A and B and, consequently, unmanageable geodesics with a too-big number of revolutions. The distribution of Q for different values of the parameter p is illustrated in Figure 5. Here, we will ask p ≥ 0.1 for computational simplicity. Note that rationals in Q ∩ [0, 1] yield to straight lines in R 2 whose director vector (B, A) lies in the first (eq. fifth) octant. To cover all the set of closed geodesics, we uniformly assign an octant to each realization of Q and transform its coordinates appropriately. As we would like all the N g p-values to be independent, we must only accept samples with N g different geodesics. This may be a problem if N g is too big, and might require decreasing the value of p. Nevertheless, for a small number (≲ 30) of geodesics we can keep p ∼ 0.1 and easily get samples with no repetitions. If one needs to perform the test for large values of N g , we recommend to explicitly choose geodesics a priori to avoid this problem, leaving the sampling method for controlled values of N g . The complete sampling procedure is described in Algorithm 1, which takes N g and p as arguments and retrieves N g director vectors. In Algorithm 1, M Ng×2 (Z) denotes the set of (N g × 2)-matrices with entire entries and we definegcd as for a, b ∈ Z with b ̸ = 0.

A.2. Projection to a closed geodesic
Let a, b ∈ Z, with b ̸ = 0, and u = (a, b) the director vector of a straight line r 0 u containing the origin (0, 0). Let I a be the real interval (min(a, 0), max(a, 0)), being I b analogously defined. We aim to project a pair of samples into the geodesic given by the canonical projection of r 0 u . To do so, we first consider the finite set P u of the points in I a × I b where r 0 u cuts the lines x = z a , y = z b for z a ∈ I a ∩ Z and z b ∈ I b ∩ Z:

Algorithm 1 Geodesics sampling
An example is presented in Figure 6a where R is the one defined in the begging of Section 2. These two steps are depicted in Figure 7.
The last step is to relocate all the projections on R/Z. To do so, we put the segments L u ∩([0, 1]×[0, 1]) in order, following the spiral path. This corresponds to transfer back the points to the straight line r 0 u of Figure 6a. Let (x u , y u ) ∈ r p u ∈ L u . The element t u ∈ R/Z will be parameterized as  [66] implies that there exists a solution π * of (2.2). Additionally, Theorem 5.10 in [66] establishes that supp(π * ) is d p -cyclically monotone. More precisely, by Theorem 5.10 in [66], this support lies on the graph of the d p -differential These definitions of d p -differential and d p -concave functions apply verbatim to ∥ · ∥ p -differential and ∥ · ∥ p -concave functions with the obvious notation. Let Γ be the set defined in (2.6), {(x k + p k , y k + p k )} n k=1 ⊂ Γ be a sequence and σ : {1, . . . , n} → {1, . . . , n} be a bijection. Then, the definition of which means that Γ is ∥ · ∥ p −cyclically monotone. Therefore, Γ ⊂ ∂ ∥·∥ p φ p , for some ∥ · ∥ p -concave function φ p . Now, recall from Theorem 3.3 and Proposition 3.4 in [28], that 1. The set of differentiablity Since Γ ⊂ ∂ ∥·∥ p φ p , this means that, for all x ∈ dom(∇φ p ), there exists an unique y x = S p (x) such that (x, y x ) ∈ Γ. We observe that, due to the fact that µ P ≪ ℓ d , the measure γ * = (Id × S p )#µ P on R d × R d is well defined, its support is ∥ · ∥ p -cyclically monotone and its first marginal is µ P . We claim that the second marginal is µ Q . Let (x,ȳ) ∈ π * be such that x + p ∈ dom(∇φ p ), for all p ∈ Z d . Then, for any representative pair, call it (x, y) ∈ R d , there exist p, p ′ ∈ Z d such that (x + p, y + p ′ ) ∈ Γ. Since the relationȳ = S p (x) holds. Since x + p ∈ dom(∇φ p ), for all p ∈ Z d , which is the intersection of sets of full µ P -measure, the relationȳ = S p (x) happens µ P −a.e. This means that π * = (Id × S p (·))#P , which proves automatically the claim. Consequently, the existence is proven.
The uniqueness follows from the proof of Corollary 2.4. in [28]. Indeed, we can define the set S = π * solving (2.2) where Γ(π * ) is defined as in (B.1) for each π * solving (2.2). Therefore, taking any finite sequence {(x k +p k , y k +p k )} n k=1 ⊂ S, there exists at most n different probability measures π k , for k = 1, . . . , n, such that (x k +p k , y k +p k ) ∈ Γ(π k ). As all of them are solutions of (2.2) we have, due to the linearity of the optimization in (2.2) and the convexity of the set Γ(P, Q), that the mean π 0 = 1 n n k=1 π k is also a solution. Then, its support is contained in a d p -cyclically monotone set, and Γ(π 0 ) is ∥ · ∥ p -cyclically monotone, since it contains the sequence {(x k + p k , y k + p k )} n k=1 ⊂ S. Consequently, S is ∥ · ∥ p -cyclically monotone. To conclude, repeating the previous arguments, there exists a ∥ · ∥ p -concave function f S such that S ⊂ ∂ d f S . Moreover, for any other φ p , defined as before, it holds that ∂ d φ p ⊂ ∂ d f S . Then, the equality holds µ P -a.e. This proves the uniqueness of S p and, consequently, the one of T p Proof of Theorem 2.2. We set (x, y) ∈ Γ and observe that d(x,ȳ) = ∥x − y∥.
Since (x,ȳ) ∈ supp(π * ), Theorem 5.10 in [66] establishes that if (f, g) solves (2.3), it holds Since, for each (x, y), there exists p ∈ Z d such that d(x,ȳ) = ∥x − y − p∥, we can directly replace y by y + p in the infimum without altering any of the terms. This yields the equality and allows to define the following periodic ∥ · ∥ d -concave function in R d : We claim that ∇φ p = ∇φ p for µ P -a.e., which implies the equality of bothφ p and φ p , in each connected component of supp(µ P ). By assumption, supp(µ P ) = p∈Z d p + A is a union of connected sets. By periodicity we can restrict our study to the connected set A, where the claim yields ∇φ p = ∇φ p for ℓ d -a.e. We can apply Theorem 2.6 in [19] to conclude that φ p =φ p + C in A, thus in supp(µ P ). We prove now the claim. Let π * be a measure solving (2.2), we know (from Theorem 5.10 in [66]) that its support lies in the graph of ∂ d p f . Therefore, we can define the following ∥ · ∥ p -cyclically monotone set (note that this is true by repeating the same arguments as for Γ): which satisfies the relation Γ(π * ) ⊂ Γ(∂ d p f ), with the notation of the proof of Theorem 2.1. Recall that the relation Γ(π * ) ⊂ ∂ ∥·∥ p φ p also holds. Moreover, by definition we have Γ(∂ d p f ) ⊂ ∂ ∥·∥ pφ p . Since µ P -a.e. the sets ∂ ∥·∥ pφ p (x) and ∂ ∥·∥ p φ p (x) are singletons, and, for µ P -a.e. x, there exists at least one y ∈ R d such that (x, y) ∈ Γ(π * ), then ∂ ∥·∥ pφ p (x) = ∂ ∥·∥ p φ p (x). This im- p (x) are equal µ P -a.e., which proves the claim. Note that, under continuity of the optimal transport potential, their uniqueness only need to be fulfilled µ P -a.e.
Proof of Lemma 2.4. Setx,z ∈ dom(f ). Then, by definition The mean value theorem yields the inequality a p − b p ≤ p|a − b|(a p−1 + b p−1 ), which holds for any a, b ≥ 0. Then, the triangle inequality for d leads to where the d p−1 2 term comes from the trivial bound of the diameter of T d . This concludes the proof.
Proof of Theorem 2.5. Setp ∈ X P and assume that f p (p) = 0. Set ϵ m → 0 and consider the sequence of balls B ϵm (p) ⊂ supp(P ), centered atp with radius ϵ n . Since the ball is a continuity set of P , after Portmanteau's Theorem, P n w − → P implies that for each m there exists a n m such that P n gives mass to B ϵm (p) for all n ≥ m n . Then, we can extract a sequencep n →p such thatp n ∈ X Pn . As a consequence, we have that f n (p n ) ∈ R, and we can set a n = −f n (p n ) and define h n = f n + a n . Recall from Lemma 2.4 that all such functions are L-Lipschitz in their respective domains. Kirszbraun's Theorem (Theorem B in [36]) implies that, without loss of generality, we can consider that h n (resp. f p ) are 2p-Lipschitz functions defined in the whole T d . The previous reasoning implies that {h n } n∈N is point-wise bounded for the compact sequence {p n } n∈N . Since all such functions are 2p-Lipschitz, then Arzelá-Ascoli's Theorem concludes that every subsequence {h n k } k∈N admits a convergent subsequence {h n k j } j∈N . Let h be one of those limits. Note that the d p −conjugation is continuous in the sense for allȳ ∈ T d . By assumption, we have Then, the inequality | (h n − h)dα n | ≤ ∥h n − h∥ ∞ gives hdα + h d p dβ = 0. The function h is thus an optimal transport potential. The uniqueness described in Theorem 2.2 and the fact thatp n →p and h n (p n ) = f p (p) = 0 conclude that f p is the unique possible limit of such subsequences in dom(f p ).
Proof of Theorem 2.6. Note that as Theorem 2.5 holds, since probability measures are supported in a compact set, the torus, then the reasoning of [22] can be imitated. Here the main steps of the proof for the one-sample case are given.
For further details about the proof we refer to the original text. Efron-Stein inequality, see Chapter 3.1 in [10], states that if (X ′ 1 , . . . , X ′ n ) is an independent copy of (X 1 , . . . , X n ), then we have the bound Moreover, if X 1 , . . . , X n are i.i.d, such inequality can be written as Set the empirical measures P n = 1 n n k=1 δ X k and P ′ n = 1 n (δ X ′ 1 + n k=2 δ X k ), and the values R n = T p (P n , Q) − f p dP n and R ′ n = T p (P ′ n , Q) − f p dP ′ n . Let f n and f ′ n be solutions of the dual problem (2.3) of T p (P n , Q) and T p (P ′ n , Q) respectively. Then from (2.3) we derive that which together with Theorem 2.5 yields Since the probability measures are supported in the torus, which is compact, then n 2 E(R n − R ′ n ) 2 + → 0. Finally, we conclude by the so called Efron-Stein's inequality.

B.2. Proofs of Section 3
Proof of Lemma 3.1. If G is the distribution function of the uniform distribution on R/Z, we have that Plugging (B.2) in (3.6), we have where the optimal value for α can be found by analytically minimizing the function H(α) =  when P c = U , which concludes the proof.
Under the null hypothesis, G m #P n w → U . Thus, T 2 (G m #P n , U ) → 0 in probability (recall Section 2.2). In consequence, for every ε > 0 and every α > 0, we have n + m nm c c nm (α) ≤ ε (B.7) for sufficiently large n, m. On the other hand, when P c ̸ = Q c , T 2 (G m #P n , U ) → T 2 (G#P, U ) > 0 in probability which, together with (B.7), proves the result.
Size of (N g -geod). Let us prove that (N g -geod) controls the type I error at any significance level α > 0. Indeed, if H 0 denotes the null hypothesis (3.1), we have P H0 π g nm,Ng = 0 = P H0 where the first equality in (B.8) is ensured as every p i follows a uniform distribution under the null.
Proof of Proposition 3.5. Suppose that P c j ̸ = Q c j w.l.o.g. for some j ∈ {1, . . . , N g }. After (N g -geod), we have P(π g nm,Ng = 1) = P Then, as the right side of the previous inequality tends to 1 after Proposition 3.4, so does the left side, which ends the proof.
Proof of Remark 3.6. Suppose that µ P ≪ ℓ 2 and project with respect a given direction e 1 . As an immediate consequence of the monotone convergence theorem, ℓ 2 (A × R) = 0, for any Lebesgue null set A ⊂ R. Consequently, by hypothesis 0 = µ P (A × R) = µ 1 P (A). Here, µ 1 P is the projected measure of µ P to the direction e 1 . Then, for any null set B in R/Z the leveraged setB = s∈Z (s + B) is a Lebesgue null set, so that µ 1 P (B) = 0 and P c (B) = 0. Proof of Theorem 3.7. Note that T 2 (P n , Q m ) = T (X 1 , . . . , X n , Y 1 , . . . , Y m ) is a function of X 1 , . . . , X n and Y 1 , . . . , Y m . For each x 1 , . . . , x n , y 1 , . . . , y m ∈ T d and x ′ ∈ T d let π and π ′ be both joint measures such that Then we have that which implies where the second inequality comes from the fact that d 2 (x, y) ≤ 1/2 in T d . By symmetry we also obtain the reverse inequality. Doing the same with y ′ 1 and y 1 we obtain the bound 1 2m . By using McDiarmid's inequality, see [44], we derive that P (T 2 (P n , Q m ) − ET 2 (P n , Q m ) > t) ≤ exp − nm n + m 8t 2 .
Proof of Proposition 3.9. Let P = Q. After Lemma 3.8, for every ε > 0 these exists N ε ∈ N such that for all n, m ≥ N ε , ET 2 (P n , Q m ) ≤ ε. Using explicitly the convergence speed, we can find the relationship between ε and N ε : where C > 0 is an unspecified constant. Then, directly from Theorem 3.7, we can bound (3.11) as for all n, m ≥ N ε .
Proof of Proposition 3.10. Let us first prove that (UB) is asymptotically of level α. Let ε > 0 and N ε ∈ N such that for all n, m ≥ N ε , the test (UB) controls type I error. As we are taking the limit n, m → ∞, we can choose n, m large enough such that they surpass N ε . Then, consistency is ensured by Proposition 3.9.