On geometric probability distributions on the torus with applications to molecular biology

Abstract: In this paper, we study a family of probability distributions, alternative to the von Mises family, called Inverse Stereographic Normal Distributions. These distributions are counterparts of the Gaussian Distribution on S^1 (univariate) and T^n (multivariate). We discuss some key properties of the models, such as unimodality and closure with respect to marginalizing and conditioning. We compare this family of distributions to the von Mises family and the Wrapped Normal Distribution. Then, we discuss some inferential problems, introduce a notion of moments which is natural for inverse stereographic distributions and revisit a version of the CLT in this context. We construct point estimators, confidence intervals and hypothesis tests and briefly discuss sampling methods. Finally, we conclude with some applications to molecular biology and some illustrative examples. This study is motivated by the Protein Folding Problem and by the fact that a large number of proteins involved in DNA metabolism assume a toroidal shape with some amorphous regions.


Introduction
In recent years, statisticians have paid increasing attention to random variables taking values on manifolds. Among the most important statistical problems on manifolds, those arising from the analysis of circular, spherical and toroidal data play a fundamental role (see [15], [21] and [29] for some references). Motivated by important applications in molecular biology, we concentrate on circular and toroidal data.
The motivation behind this paper relates to the Protein Folding Problem (PFP), which is one of the major open problems in biochemistry. Protein folding is the physical process that a protein chain undertakes before reaching its final three dimensional shape (conformation). The shape of the protein ensures that the protein does its job properly, while a misfolding can be the cause of several neurodegenerative diseases and other types of diseases as well [37].
Since the physical process underlying protein folding is complicated, there has been limited success in predicting the final folded structure of a protein from its amino acid sequence. A better understanding of this would definitely have clinical impact and result in the design of efficient drug molecules for the cure of several diseases such as the ones mentioned above.
The PFP can be divided into two parts: the static problem and the dynamic problem. As explained in [6], the static problem concerns the prediction of the active conformation of the protein given only its amino acid sequence. The dynamic problem, instead, consists in rationalizing the folding process as a complex physical interaction. Both these problems have a precise mathematical formulation [6].
Some regions of the conformation of a protein may look amorphous and so require random models and probability distributions to be described properly. As a matter of fact, a large number of proteins involved in DNA metabolism with different evolutionary origin and catalyzing different reactions assume a toroidal shape [10], [11]. It has been argued that the preference towards the toroidal form may be due to special advantages in the DNA-binding [10], [11]. A long list of proteins share the toroidal form [10], [11]. The importance of the PFP and the DNA-binding process motivated statisticians to find appropriate statistical models that describe these phenomena.
The most famous circular distribution is the von Mises distribution. Interest in both the theory and its practical applications stimulated researchers to look for a good higher dimensional analogue of the one dimensional von Mises Distribution. The Bivariate von Mises Distribution (BVM) is a probability distribution describing a two dimensional random vector taking values on the torus T^2 := S^1 × S^1. It aims at representing an analogue on the torus of the Bivariate Normal Distribution (BVN). The Full Bivariate von Mises Distribution (FBVM) was first proposed by Mardia [17]. Some of its variants are currently used in the field of bioinformatics to formulate probabilistic models of protein structure.
The FBVM seems to be over-parametrized for being the toroidal counterpart of the BVN. In fact, the FBVM depends on eight parameters, while the BVN possesses only five parameters. This situation becomes even clearer in the high-concentration limit (κ → +∞) [26]. For this reason, several submodels have been proposed. Four commonly used variants of the bivariate von Mises distribution have been originally proposed by Mardia [18] and then revisited by Rivest [35] and also by Singh-Hnidzo-Demchuk [39]. They are models with a reduced number of parameters and are derived by setting to zero the off-diagonal elements of the interaction matrix. The following models are submodels of the FBVM which have been discussed in the literature: the Cosine Model with Positive Interaction [26], the Cosine Model with Negative Interaction [26], the Sine Model [39] and the Hybrid Model [13].
The use of these distributions has pros and cons. The pros: the von Mises distributions resemble the BVN in the high-concentration limit, they are closed with respect to conditioning, it is relatively easy to give unimodality conditions, and the parameters are easy to interpret, even when they do not exactly match those of the BVN. The cons: the family is not closed under marginalization (except in the high-concentration limit), and inference and prediction are not trivial and require numerical methods. For example, MLEs cannot be computed explicitly, but only via optimization algorithms. To overcome this problem, more advanced procedures, like pseudo-likelihood estimators, have been suggested [21].
It is a common belief that the geometry of the torus implies that a completely natural counterpart of the BVN on the torus is not available (see, for example, the discussion in [13]). Therefore, although von Mises models have proven successful, there is not yet a definite answer as to which of the proposed models is the "best" candidate to represent the toroidal counterpart of the BVN.
In this paper, we aim at giving an alternative to von Mises models, in both the univariate and the multivariate case. This alternative maintains the good properties of the von Mises distributions, like asymptotic normality, and also possesses some extra properties, such as the simplicity in estimation that the BVN enjoys in the Euclidean case and the von Mises distributions lack.
The candidate distribution in the univariate case of S^1 is called the Inverse Stereographic Normal Distribution (ISND). The name comes from the fact that the Stereographic Projection of this distribution is the Normal Distribution on R.
Definition 1.1. We say that a random variable Θ ∈ S^1 has an Inverse Stereographic Normal Distribution, denoted by ISN(μ, σ^2), if and only if its pdf is given by
\[ f_\Theta(\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{1}{1+\cos\theta}\,\exp\left(-\frac{1}{2\sigma^2}\left(\frac{\sin\theta}{1+\cos\theta}-\mu\right)^2\right) \]
for some μ ∈ R, σ > 0 and for any θ ∈ [−π, +π). The distribution ISN(0, 1) is called the Inverse Stereographic Standard Normal Distribution.
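As a quick numerical sanity check of Definition 1.1, the following minimal sketch evaluates the ISN density obtained from Θ = 2 arctan(X) with X ∼ N(μ, σ^2) and verifies that it integrates to one over [−π, +π). The function names `isn_pdf` and `integrate` and the midpoint rule are illustrative assumptions, not part of the original paper.

```python
import math

def isn_pdf(theta, mu=0.0, sigma2=1.0):
    """Density of ISN(mu, sigma2) at theta in [-pi, pi), with respect to d(theta)."""
    c = 1.0 + math.cos(theta)
    if c < 1e-12:
        # theta = -pi is the cut-point: the density extends continuously by 0 there
        return 0.0
    x = math.sin(theta) / c  # stereographic projection, equal to tan(theta / 2)
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / (c * math.sqrt(2.0 * math.pi * sigma2))

def integrate(f, a, b, n=20000):
    """Composite midpoint rule (an assumed, simple quadrature choice)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

total = integrate(isn_pdf, -math.pi, math.pi)
print("integral of the ISN(0, 1) density:", total)
```

The factor 1/(1 + cos θ) is the Jacobian of θ ↦ tan(θ/2), which is why the density is normalized with respect to dθ.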
In a similar manner, we can define on T^n := [−π, +π)^n the Multivariate Inverse Stereographic Normal Distribution (MISND).
In the following, we concentrate on the case n = 2, namely the Bivariate Inverse Stereographic Normal Distribution (BISND), since it is more relevant to the applications to molecular biology (see Subsection 6.3).

Remark 1.3.
Note that this is not the only way in which one can introduce a location parameter in the ISND. Previously, the location parameter for the ISND has been introduced as the location of the axis of symmetry of the distribution on the circle (see for example [1]). In our case μ does not play that role. Instead, μ corresponds to the mean value of the corresponding projected distribution on the real line. We refer to Section 4 for more comments on this.
There are several advantages of this approach, listed below not necessarily in order of importance:
• The Inverse Stereographic Projection suggests a natural way to construct distributions on manifolds, by transforming distributions on the Euclidean Space (see Section 2.1);
• The number of parameters of the BISND matches the number of parameters of the BVN, without imposing any further constraint, and there is a natural interpretation for the parameters of the ISND in terms of the parameters of the BVN (see Section 4);
• The BVM and BISND resemble each other in the high-concentration limit for certain ranges of parameters. The same holds for the ISND and the Wrapped Normal Distribution (WN). Moreover, the ISND approximates the BVN in the high-concentration limit (see Section 3);
• The Stereographic Projection suggests a more geometric counterpart of the Euclidean Moments which differs from the Circular Moments (see Section 4);
• The definition of moments using the Stereographic Projection helps address some interpretation problems in the parameters of directional variables (for further comments, we refer to Section 7);
• The MLEs for the ISND do not need numerical methods, since the estimates of the parameters are the transformed estimates of the parameters of the BVN (see Section 5);
• The Stereographic Projection allows one to transfer some test statistics, confidence intervals and hypothesis tests from the Euclidean space to manifolds (see Section 5).
The rest of this paper is organized as follows.
In Section 2, we collect some notation and describe some preliminary results. We describe the construction of general inverse stereographic projected distributions; we give results about the marginals of the ISND which are still ISND (Theorem 2.3), and conditionals which are ISND as well (Theorem 2.4); we give unimodality conditions for this family of distributions (Theorem 2.6).
In Section 3, we compare the ISND with the VM and the WN. In particular, we prove approximation results in the case of high-concentration limit which connect ISND, VM and WN for some subsets of the parameter space (Theorem 3.1, Theorem 3.2 and Theorem 3.3). Note that WN and VM are close also for moderate values of the concentration parameter κ, which in the past has been an argument for using the VM for circular data, due to its compact form.
In Section 4, we introduce Inverse Stereographic Moments (ISMs) and Inverse Stereographic Moment Generating Functions (ISMGFs) which lead naturally to a corresponding "intrinsic" CLT on the torus (Theorem 4.12). We compare ISMs with the classical Circular Moments and with the moments of Euclidean random variables.
In Section 5, we pass to Statistical Inference. We propose a way to do point estimation, confidence intervals and hypothesis testing for the model parameters following the lines of Euclidean cases.
In Section 6, we present some numerical examples and applications. In Subsection 6.1, we produce some plots of the ISND for some choices of the parameters, both in the unimodal and the multimodal cases. In Subsection 6.2, we compare numerically the BVM and the ISND, using the results in Section 3 for the corresponding ranges of the parameters. In Subsection 6.3, we give an application to problems in Molecular Biology, in particular to the problem of the distribution of dihedral angles on the torus.
We complete our analysis with a discussion (Section 7), the conclusion (Section 8) and the technical proofs (Appendix A: Proofs). In the supplementary material, we collect some of the codes (Appendix B: Supplementary Material) used in the applications.

Notation and preliminary results
We denote by Mat_{n×n} the set of square matrices of dimension n, and by Sym^+_{n×n} the set of positive symmetric matrices of dimension n. In Subsection 2.1, we briefly discuss the construction of Inverse Stereographic Projected Distributions. In Subsection 2.2, we present results about the marginals of the ISND, which are still ISND (Theorem 2.3), and the conditionals of the ISND, which are again ISND (Theorem 2.4). In Subsection 2.3, we give unimodality conditions for the ISND (Theorem 2.6).
As a consequence, a natural measure on the torus when the probability distribution is obtained through Inverse Stereographic Projection P is the pull-back measure of the Lebesgue measure on R, which in each angular coordinate reads
\[ P^{*}dx = \frac{d\theta}{1+\cos\theta}. \]
Remark 2.1. There are other possible multidimensional generalizations of the inverse stereographic projection. In the case of the sphere, the authors of [1] note that the Jacobian of the stereographic projection is non-trivial and they use a projection which produces a slightly simpler Jacobian at the expense of a slightly more complicated construction (see also [41]). When reduced to p = 2, namely S^1, the projections here and in [1] coincide.

Remark 2.2.
The Stereographic Projection is a conformal transformation and so it does not change angles. In our context, this is a very useful property as it means that the Stereographic Projection maps elliptical contours to elliptical contours.

Marginals and conditionals of MISND
In strong contrast with the Multivariate von Mises Distribution, the ISND is closed under marginalization.

Theorem 2.3. Suppose a random variable Θ = (Θ_1, ..., Θ_n) ∼ MISN(μ, Σ). Then, for every i = 1, ..., n, we have Θ_i ∼ ISN(μ_i, Σ_ii).
Proof. See Appendix A.

Unimodality and multimodality conditions
The ISND can be either unimodal or multimodal depending on the parameters μ and σ^2 (see the plots in Subsection 6.1). In this subsection, we give necessary and sufficient conditions for the ISND to be unimodal. Recall that in the case n = 1, Θ ∼ ISN(μ, σ^2) has a pdf given by
\[ f_\Theta(\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{1}{1+\cos\theta}\,\exp\left(-\frac{1}{2\sigma^2}\left(\frac{\sin\theta}{1+\cos\theta}-\mu\right)^2\right) \]
for some μ ∈ R, σ > 0 and for any θ ∈ [−π, +π). Note that, for θ = −π, the density has a removable singularity and it can be extended to a smooth function by choosing f_Θ(−π) = 0. This makes θ = −π a critical point of the density and hence the global minimum. Since S^1 is compact and f_Θ(θ) is not constant, it admits a maximum and a minimum distinct from the maximum. To ensure unimodality, we need to find μ and σ^2 such that f_Θ(θ) admits only one further critical point apart from θ = −π.
Note that f_Θ(θ) is differentiable for every θ ∈ (−π, +π). By direct computation, the first derivative of f_Θ(θ) is given by
\[ f_\Theta'(\theta) = \frac{f_\Theta(\theta)}{1+\cos\theta}\left(\sin\theta - \frac{1}{\sigma^2}\left(\frac{\sin\theta}{1+\cos\theta}-\mu\right)\right). \]
Therefore, we need to find conditions under which f_Θ'(θ) = 0 holds only once for θ ∈ (−π, +π). The case μ = 0 is simpler and has been treated in [1], where the authors proved that f_Θ(θ|μ, σ^2) is unimodal if and only if σ^2 ≤ 1/2. In this case, the maximum is achieved at θ = 0 and the minimum at θ = −π. The interpretation is that when the mass of the density is too spread out on the real line, it tends to accumulate on itself when mapped to the circle and produces more than one peak.

Proof. See Appendix A.
Remark 2.7. Note that the condition Δ(μ, σ^2) < 0 can be greatly simplified in the variables x := μ^2 and y : with x > 0 and y < 1. This inequality is quadratic in x and so it can be solved easily by max 0, for y < 1. Note that for y < 1, the argument of the square root is always positive. We refer to Figure 1 for a picture of the region Δ(x, y) = Δ(μ, σ^2) < 0.
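The unimodality condition can be explored numerically. Writing t = tan(θ/2) and setting the derivative of the ISN density to zero gives the cubic t^3 − μ t^2 + (1 − 2σ^2) t − μ = 0, which has a single real root exactly when its discriminant is negative. The sketch below counts the local maxima of the density on a grid and compares the result with the sign of that discriminant. Identifying this discriminant with the quantity Δ(μ, σ^2) of the text is an assumption, and the names `count_modes` and `cubic_discriminant` are illustrative.

```python
import math

def isn_pdf(theta, mu, sigma2):
    """ISN(mu, sigma2) density in theta, extended by 0 at the cut-point -pi."""
    c = 1.0 + math.cos(theta)
    if c < 1e-12:
        return 0.0
    x = math.sin(theta) / c  # equals tan(theta / 2)
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / (c * math.sqrt(2.0 * math.pi * sigma2))

def count_modes(mu, sigma2, n=4001):
    """Count strict local maxima of the density on an open grid in (-pi, pi)."""
    h = 2.0 * math.pi / (n + 1)
    vals = [isn_pdf(-math.pi + (i + 1) * h, mu, sigma2) for i in range(n)]
    return sum(1 for i in range(1, n - 1) if vals[i - 1] < vals[i] > vals[i + 1])

def cubic_discriminant(mu, sigma2):
    """Discriminant of t^3 - mu*t^2 + (1 - 2*sigma2)*t - mu (assumed to match Delta)."""
    a, b, c, d = 1.0, -mu, 1.0 - 2.0 * sigma2, -mu
    return (18.0 * a * b * c * d - 4.0 * b ** 3 * d + b * b * c * c
            - 4.0 * a * c ** 3 - 27.0 * a * a * d * d)

for mu, sigma2 in [(0.0, 0.4), (0.0, 1.0), (1.0, 0.4)]:
    print(mu, sigma2, count_modes(mu, sigma2), cubic_discriminant(mu, sigma2))
```

For μ = 0 the discriminant reduces to −4(1 − 2σ^2)^3, which is negative exactly when σ^2 < 1/2, in agreement with the threshold σ^2 ≤ 1/2 quoted from [1].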
It is important to recall that the inverse stereographic projection does not necessarily send critical points to critical points, and so we cannot study the critical points of f_Θ by studying the critical points of f_X and then transforming them. This makes the analysis of unimodality non-trivial. It is interesting to notice that, in the symmetric case, the inverse stereographic projection does not change the parity of the set of critical points. Indeed, we have the following lemma.

Lemma 2.9.
Suppose f X is a pdf on R and f Θ is a pdf on [−π, +π) such that Θ = 2 arctan(X) and such that both have a finite number of critical points. Moreover, suppose that f Θ (θ) = f Θ (−θ) for every θ ∈ [−π, +π). Then, both f Θ (θ) and f X (x) have an odd number of critical points.

A. Selvitella
Proof. Direct computation or direct consequence of the fact that the stereographic projection is a conformal transformation.

Comparison theorems: ISND vs Von Mises and Wrapped Normal
In this section, we compare the ISND with the VM and the WN. A classical argument to promote the use of the VM as a natural circular counterpart of the Normal Distribution is that, in the high-concentration limit (κ → +∞), the two distributions resemble each other. This approximation is valid in every L^p([−π, +π))-norm for p ∈ [1, +∞] and, as mentioned, does not uniquely identify the VM; in fact it is valid for several other distributions, like the truncated normal, and indeed it is valid for the ISND as well.

Theorem 3.1. Consider the two distributions
with θ ∈ R and κ^{-1} = 4σ^2. Then
The bivariate case for certain ranges of the parameters follows easily.
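The high-concentration comparison can be illustrated numerically. The sketch below assumes the two distributions are VM(0, κ) and ISN(0, σ^2) with κ^{-1} = 4σ^2, and measures their L^1 distance on [−π, +π) for two values of κ. The von Mises density is normalized numerically to avoid needing a Bessel function; the name `l1_distance` and the midpoint grid are illustrative choices.

```python
import math

def isn_pdf0(theta, sigma2):
    """ISN(0, sigma2) density in theta (theta is never exactly -pi on the grid used)."""
    c = 1.0 + math.cos(theta)
    x = math.sin(theta) / c  # equals tan(theta / 2)
    return math.exp(-x * x / (2.0 * sigma2)) / (c * math.sqrt(2.0 * math.pi * sigma2))

def l1_distance(kappa, n=20000):
    """L1 distance between VM(0, kappa) and ISN(0, 1/(4*kappa)) by the midpoint rule."""
    h = 2.0 * math.pi / n
    thetas = [-math.pi + (i + 0.5) * h for i in range(n)]
    # Unnormalized von Mises density, rescaled by exp(-kappa) for numerical stability
    vm = [math.exp(kappa * (math.cos(t) - 1.0)) for t in thetas]
    z = h * sum(vm)  # numerical normalizing constant
    sigma2 = 1.0 / (4.0 * kappa)
    return h * sum(abs(v / z - isn_pdf0(t, sigma2)) for v, t in zip(vm, thetas))

d10 = l1_distance(10.0)
d100 = l1_distance(100.0)
print("L1 distance, kappa = 10: ", d10)
print("L1 distance, kappa = 100:", d100)
```

The scaling κ^{-1} = 4σ^2 reflects that tan(θ/2) ≈ θ/2 near the mode, so a variance σ^2 on the projected line corresponds to a variance of about 4σ^2 in the angle.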

Theorem 3.2. Consider the following BVM:
defined for φ ∈ [−π, +π) and ψ ∈ [−π, +π). Here, κ_1 and κ_2 are the concentration parameters, and the matrix A = [a_11, a; a, a_22] ∈ Mat_{2×2} is the interaction matrix. Further, consider the BISND:
Proof. In the case where b = 0, the variables φ and ψ decouple and so the proof is a direct consequence of Theorem 3.1. The general case follows from a small modification of the proof of Theorem 3.1, which can be found in Appendix A.
Another consequence of Theorem 3.1 is that the ISND is close to the WN. This distribution is particularly relevant, as it is the unique solution to the heat equation on the circle and it is the only distribution preserving causality when used for kernel smoothing [12].

Theorem 3.3. Consider the two distributions
Proof. We have As κ → +∞, the first summand goes to zero because of equations (3.5.23) and (3.5.24) on page 58 of [15], while the second summand goes to zero because of Theorem 3.1.

ISMs, ISMGF and a version of the Central Limit Theorem
Classical circular (trigonometric) moments (CMs) have been studied in the literature in the context of inverse stereographic distributions (see [30], [31], [33] and the references therein). There, the authors do not seem to consider the inverse map and deal with classical CMs. As the results in [30], [31], [33] underline, CMs do not seem natural. In fact, the formulas for CMs derived in those papers are complicated functions of the parameters of their models. As a consequence, when related to the parameters, CMs lack interpretability for those distributions. This motivates the search for a notion of moments which better suits inverse stereographic distributions. We introduce the Inverse Stereographic Moments (ISMs) and compare them with CMs in Subsection 4.1. In Subsection 4.2, we introduce the Inverse Stereographic Moment Generating Function (ISMGF) and use ISMs to rephrase the Central Limit Theorem (CLT) in the context of T^n for n ≥ 1.

The inverse stereographic moments
The classical way in which Moments on the Circle are defined is the following.
This procedure has no connection with the extra structure coming from the Stereographic Projection, since the Stereographic Projection does not send the Lebesgue Measure on [−π, +π) to the Lebesgue Measure on R and also does not send polynomials defined on C ≅ R^2 to their restriction to S^1.
We propose to compute Moments in a way which is consistent with the Stereographic Projection, so that it is particularly suitable for inverse stereographic projected random variables defined on S^1. This procedure identifies in a geometrically natural way the circular counterpart of the moments defined in the Euclidean Space with respect to the Lebesgue Measure. A corresponding construction works in higher dimensions too.
Definition 4.2. Consider a random variable Θ defined on S^1 with pdf f_Θ(θ). Then, we define the Inverse Stereographic Moment (ISM) of Θ of order k with k ∈ N as
Therefore, if f_Θ(θ) comes from an Inverse Stereographic Projection, namely Θ = 2 arctan(X), then ISMs are definitely different from CMs, in particular in how they relate to the parameters of the underlying circular distribution. We give a small proposition on the comparison between CMs and ISMs for the ISND, but the relation between CMs and ISMs definitely deserves a deeper and more general investigation.
Proof. We have that where Φ is the error function (cdf of N (0, 1)). Therefore, the two moments are equal if and only if μ = 0 or σ 2 = 0.
1+sin θ ], due to oddness of the integrands. The case σ^2 = 0 represents the degenerate case where Θ = c ∈ R a.s. with respect to the Lebesgue measure on R.
A consequence of Proposition 4.4 is that a large ISM of order 2 affects a CM of order 1, but does not affect the ISM of order 1. This can be interpreted as some sort of orthogonality between the first and second ISMs and mimics the orthogonality of X̄ and S^2 for the MVN.
Proof. Direct computation plus the computations in the proof of Proposition 4.4.
Not all random variables defined on S 1 admit finite ISMs. For example, the Uniform Distribution on S 1 (Θ ∼ Unif(−π, +π)), which corresponds to the Cauchy Distribution on the Real Line through Inverse Stereographic Projection, does not admit any finite ISM, while all its CMs are finite. We have the following result.

Proposition 4.6. Consider a random variable
Proof. It is a direct consequence of the Taylor expansion of sin θ/(1 + cos θ) at +π, where sin θ
The definition of ISMs can be easily extended to random variables defined on T^n.
Definition 4.7. Consider a random variable Θ = (Θ_1, ..., Θ_n) ∈ T^n with pdf given by f_Θ(θ) and θ := (θ_1, ..., θ_n). Then, we define the Inverse Stereographic
Note that the interpretation is a little tricky because μ and Σ are not location and scale parameters for the variable Θ, but only for the variable X. In particular, it is easy to see that the MISN family is not closed under re-location and rescaling (affine transformations).

Remark 4.8.
An important consequence of this definition of moments is that we can compute moments of distributions coming from Inverse Stereographic Projections analytically and by performing simple integrations on R n . This seems a big simplification for estimation purposes, since the estimation of parameters would not need numerical optimization algorithms, in general. Note that the simplification in the computation is directly related to how simple the computation of the moments is in the corresponding random variable on the Euclidean space.
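The simplification described in Remark 4.8 can be checked numerically. The sketch below assumes that the ISM of order k is the integral of (sin θ/(1 + cos θ))^k against the density f_Θ, so that for Θ = 2 arctan(X) it coincides with the Euclidean moment E[X^k]. That reading of the definition and the function name `ism` are assumptions made for illustration.

```python
import math

def ism(k, mu, sigma2, n=20000):
    """Order-k inverse stereographic moment of ISN(mu, sigma2), by the midpoint rule.

    Assumed definition: integral over [-pi, pi) of (sin t / (1 + cos t))**k * f(t) dt.
    """
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi + (i + 0.5) * h
        c = 1.0 + math.cos(theta)
        x = math.sin(theta) / c  # stereographic projection, tan(theta / 2)
        dens = math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / (c * math.sqrt(2.0 * math.pi * sigma2))
        total += x ** k * dens
    return h * total

mu, sigma2 = 0.3, 0.2
m1 = ism(1, mu, sigma2)
m2 = ism(2, mu, sigma2)
print("ISM of order 1:", m1, "(Euclidean mean:", mu, ")")
print("ISM of order 2:", m2, "(Euclidean second moment:", mu ** 2 + sigma2, ")")
```

Under this reading, the first two ISMs of ISN(μ, σ^2) are μ and μ^2 + σ^2, the same closed forms as for the normal distribution on the real line.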

A CLT and the ISMGF
The ISMGF is defined in the following way.
for every t ∈ I, with 0 ∈ I ⊂ R.

Remark 4.10.
Applying the Stereographic Projection, we get Therefore, if f_Θ(θ) comes from an Inverse Stereographic Projection, namely Θ = 2 arctan(X), then This makes the ISMGF a natural choice. We can construct ISMGFs in higher dimensions as well. We concentrate on the case of T^n. Here, t_1, ..., t_n ∈ I, with 0 ∈ I ⊂ R.
We have the following theorem, which represents an "intrinsic" version of the CLT on S 1 .
with P being the Stereographic Projection. Then, in the sense of the ISMGF and so in distribution.
This is not the most general version of the CLT possible and the theorem works in higher dimensions as well.

Proof. See Appendix A.
Remark 4.13. The standard ISND is multimodal. However, we have that Therefore, we can tune the parameter τ 2 so that τ 2 < 1/2 and so the convergence in distribution is towards a unimodal ISND. This possibly surprising fact is a consequence of the fact that the stereographic projection (and in general any continuous function) does not preserve the number of modes. This connects to our previous discussion in Lemma 2.9 and Example 2.1.

Inference
In this section, we discuss some key inferential issues such as point estimation, confidence intervals and hypothesis testing and we briefly comment on some sampling methods for inverse stereographic probability distributions.
The first statistical problem that we address is parameter estimation.
Definition 5.1. Consider a statistic T_X on the Euclidean Space. Then, we call Inverse Stereographic Statistic the following real-valued or vector-valued function:
These estimators are natural for probability distributions defined through Inverse Stereographic Projection. Suppose we have a random sample Θ_i ∼ ISN(μ, σ^2), for i = 1, ..., n. We can define the Inverse Stereographic Sample Mean as
\[ \bar{\Theta}_S := P^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} P(\Theta_i)\right). \]
By the definition of P and since Θ_i ∼ ISN(μ, σ^2), for i = 1, ..., n, we have P(Θ_i) ∼ N(μ, σ^2), for i = 1, ..., n, and so (1/n) Σ_{i=1}^{n} P(Θ_i) ∼ N(μ, σ^2/n). Again, by the definition of P, we have Θ̄_S ∼ ISN(μ, σ^2/n). Note that similar considerations apply to the Sample Variance and other estimators. P(Θ̄_S) is a point estimator of μ. The MLEs can be computed very easily, because the extra term with respect to the likelihood of the MVN, appearing because of the pull-back measure, is parameter independent.
We can develop Interval Estimation for random variables distributed as an Inverse Stereographic Projected Distribution as well.
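The point estimators described above can be sketched in a few lines. The code projects the angles to the real line with P(θ) = tan(θ/2), computes the usual normal MLEs there, and maps the sample mean back to the circle. The synthetic data, the seed and the function names `project` and `isn_mle` are assumptions made only for this illustration.

```python
import math
import random

def project(theta):
    """Stereographic projection P: theta in (-pi, pi) -> tan(theta / 2) in R."""
    return math.tan(theta / 2.0)

def isn_mle(thetas):
    """MLEs of (mu, sigma2) for an ISN sample, plus the inverse stereographic sample mean."""
    xs = [project(t) for t in thetas]
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # MLE divides by n, not n - 1
    theta_bar_s = 2.0 * math.atan(mu_hat)  # P^{-1} applied to the projected mean
    return mu_hat, sigma2_hat, theta_bar_s

# Synthetic ISN(0.5, 0.25) sample: draw X ~ N(0.5, 0.25) and map it to the circle
rng = random.Random(0)
sample = [2.0 * math.atan(rng.gauss(0.5, 0.5)) for _ in range(20000)]
mu_hat, sigma2_hat, theta_bar_s = isn_mle(sample)
print(mu_hat, sigma2_hat, theta_bar_s)
```

No numerical optimization is involved: the Jacobian term of the likelihood does not depend on (μ, σ^2), so the maximizers are the normal MLEs of the projected data.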
Example 5.1. Suppose Θ ∼ ISN(μ, σ^2) is a random variable defined on S^1 with μ unknown and σ^2 known. Let P be the Stereographic Projection. A (1 − α)-Confidence Interval for μ can be constructed as follows. We have with Z ∼ N(0, 1) and Φ the cdf of N(0, 1). A possible choice is then c = −d = z_{α/2} σ/√n, with z_{α/2} the corresponding normal quantile. Therefore,
Remark 5.3. Note that Θ̄_S ∼ ISN(0, σ^2) if and only if P(Θ̄_S) ∼ N(0, σ^2), and so C(Θ̄_S) does not depend on P. This is an important property, because it makes C(Θ̄_S) "intrinsic" to S^1, namely independent of the charts used on S^1.
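A minimal sketch of the interval in Example 5.1, assuming the standard two-sided form with half-width z_{α/2} σ/√n on the projected scale. The function name `isn_mean_ci`, the toy data and the choice to also report the endpoints mapped back to the circle are illustrative assumptions.

```python
import math
from statistics import NormalDist

def isn_mean_ci(thetas, sigma2, alpha=0.05):
    """(1 - alpha) confidence interval for mu with sigma2 known.

    Returns the interval on the projected (real-line) scale and its image
    on the circle under the inverse stereographic projection.
    """
    xs = [math.tan(t / 2.0) for t in thetas]  # P(theta_i)
    n = len(xs)
    center = sum(xs) / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half = z * math.sqrt(sigma2 / n)
    lo, hi = center - half, center + half
    return (lo, hi), (2.0 * math.atan(lo), 2.0 * math.atan(hi))

# Toy data: angles whose projections are 0.1, 0.4, 0.2, 0.5, 0.3
thetas = [2.0 * math.atan(x) for x in (0.1, 0.4, 0.2, 0.5, 0.3)]
(x_lo, x_hi), (th_lo, th_hi) = isn_mean_ci(thetas, sigma2=0.04)
print("interval for mu:", (x_lo, x_hi))
print("image on the circle:", (th_lo, th_hi))
```

The interval is symmetric around the projected sample mean, but its image on the circle is generally not symmetric around Θ̄_S, and it stretches as the endpoints approach the cut-point.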

Remark 5.4.
There are two fundamental advantages in using this perspective: a theoretical and a practical one. From a theoretical point of view, everything is geometrically consistent and fully "intrinsic". From a practical point of view, the estimators are explicit and their distributions can be computed as explicitly as they can be computed for the corresponding projected distribution. Therefore, there is no extra need for numerical optimization. The main negative feature of these estimators derives from the fact that the way in which they approach the true values of the parameters is counter-intuitive in the Θ variables, because it is measured with P^* dx. We will discuss this in a particular example in Section 6, but it definitely deserves further investigation. See Figure 2 below, which shows how confidence bands become much worse close to the cut-point.
We turn the discussion to Hypothesis Tests.
Example 5.2. Suppose Θ ∼ ISN (μ, σ 2 ) is a random variable defined on S 1 with μ unknown and σ 2 known. Let P be the Stereographic Projection. We test the following hypothesis: Consider a random sample Θ 1 , . . . , Θ n ∼ ISN (μ, σ 2 ). We can use the inversion theorem and build a Rejection Region from the Confidence Intervals described above. For a fixed level α, the Most Powerful Unbiased Test (see [4]) has rejection region R(θ) This test has size α if P (H 0 is rejected |μ = μ 0 ) = α.
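The test of Example 5.2 can be sketched by inverting the confidence interval on the projected scale. The hypotheses are assumed here to be the two-sided pair H_0: μ = μ_0 against H_1: μ ≠ μ_0, since the displayed hypotheses are not available in the text; the function name `isn_z_test` and the toy data are also illustrative.

```python
import math
from statistics import NormalDist

def isn_z_test(thetas, mu0, sigma2, alpha=0.05):
    """Two-sided level-alpha z-test of H0: mu = mu0 for an ISN sample, sigma2 known.

    Returns (reject, z_stat), where z_stat is computed from the projected data.
    """
    xs = [math.tan(t / 2.0) for t in thetas]  # stereographic projection P(theta_i)
    n = len(xs)
    z_stat = (sum(xs) / n - mu0) / math.sqrt(sigma2 / n)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(z_stat) > z_crit, z_stat

# Toy data: angles whose projections are 0.1, 0.4, 0.2, 0.5, 0.3 (projected mean 0.3)
thetas = [2.0 * math.atan(x) for x in (0.1, 0.4, 0.2, 0.5, 0.3)]
reject_true, z_true = isn_z_test(thetas, mu0=0.3, sigma2=0.04)
reject_far, z_far = isn_z_test(thetas, mu0=1.0, sigma2=0.04)
print(reject_true, z_true)
print(reject_far, z_far)
```

Because the statistic depends on the data only through P(Θ_i), the test has the same size and power as the classical z-test for the projected normal sample.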
It is worth mentioning that it is easy to sample from Inverse Stereographic Projected Distributions. The strategy is to sample from the corresponding Euclidean Distribution and then map the sample to the circle or the torus via the inverse stereographic projection. To accomplish this, we can simply use Box-Muller Sampling or other classical algorithms (see [4]).
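The sampling strategy can be sketched as follows for the univariate case: draw a standard normal variate with the Box-Muller transform, shift and scale it, and apply θ = 2 arctan(x). The function names `box_muller` and `sample_isn`, the seed and the sample size are assumptions chosen for the illustration.

```python
import math
import random

def box_muller(rng):
    """One standard normal draw from two uniforms via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def sample_isn(mu, sigma2, size, seed=0):
    """Draw `size` angles from ISN(mu, sigma2) by inverse stereographic projection."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    return [2.0 * math.atan(mu + sigma * box_muller(rng)) for _ in range(size)]

draws = sample_isn(0.5, 0.25, 20000)
projected_mean = sum(math.tan(t / 2.0) for t in draws) / len(draws)
print("projected sample mean:", projected_mean)
```

For the torus, the same idea applies coordinate-wise after drawing a correlated Gaussian vector, for instance through a Cholesky factor of the covariance matrix.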

Applications and numerical examples
We give some numerical examples and applications of the theory developed in the previous sections.

Plots
We collect some plots in order to visualize the ISND and BISND and the dependency on the parameters of the location, scale and shape of the distribution. The parameter μ has a mixed role: it behaves more as a location parameter for small values, but it affects skewness and concentration a lot more for higher values. We refer to Figure 3 up to Figure 9 for the plots and some more comments.

Comparison between the VM and the ISND
We use Theorem 3.1 to illustrate that for κ → +∞ the VM distribution is well approximated by the ISND. For κ = 10, the densities of the ISN(0, (4κ)^{-1}) and the VM(0, κ) overlap (see Figure 10). A similar comparison can be done in the bivariate case, using Theorem 3.2. If we analyze the contour plots of the BISN(0, (4κ)^{-1}) and the BVM(0, κ) for κ = 10, the lines overlap (see Figure 11). Note that the VM is unimodal independently of its parameters μ and κ, while there are regimes where the ISND is bimodal (see Theorem 2.6). This of course happens when the concentration is not high, in agreement with the results of Section 3.
To overcome this lack of flexibility in the number of modes of the VM, people use mixtures of VM (see for example [26]). It is useful to know in which cases it is better to use a mixture of VM, at the expense of extra parameters, and when it is enough to use an ISND, which is more parsimonious. Note that a limited number of parameters is often associated with a better fit of the model and a better understanding of the phenomenon under study. From Figure 12 and Figure 13, we note that there are some differences in the shape of the bimodal ISND and the mixture of VM. When the peaks are far apart, the mixture of VM keeps its symmetry around the peaks, while the ISND becomes skewed towards the zero angle. When the peaks are close together, the mixture of VM separates the peaks more distinctly, while the ISND seems more suitable in the more complicated cases of almost indistinguishable peaks.

Applications to molecular biology
In this subsection, we consider some applications of our results to molecular biology.
To illustrate the distribution of the dihedral angles φ and ψ (sometimes called conformational angles or torsional angles) in the protein main chain, people use the so-called Ramachandran map. The Ramachandran map identifies a point in the protein main chain with a point on a flat square of the Euclidean plane R^2 with opposite sides identified. From the mathematical point of view, the Ramachandran map represents the embedding of the Flat Torus into the Euclidean space R^4. It turns out to be a very useful starting point in the inference process. We analyze the Ramachandran Plots of three proteins and model the distributions of the dihedral angles with inverse stereographic distributions.

Myoglobin PDB=1MNB
The first dataset we consider contains the Dihedral Angles of Myoglobin from the Protein Database [32]. Myoglobin is a globular protein whose function is to store molecular oxygen in muscles. Its secondary structure consists mostly of α-helices, so it is expected that the Ramachandran plot shows a highly concentrated region.
To fit the ISND model, we did not use the MLE, because it gave unintuitive results that were hard to interpret, possibly because the convergence results are phrased in norms in the variable tan(Θ/2) and not Θ (see also Remark 5.4 and Figure 2). The estimation problem definitely needs further investigation. We decided to attack the problem with a semi-parametric approach. We centred the dihedral angle ψ so that we could apply the Comparison Theorems of Section 3, and we fitted the parameter σ^2 with the value that brings the ISND pdf closest to the empirical density in L^∞-norm (which implies closeness in distribution), in agreement with the convergence results of Section 3 (see also Figure 2). We also made the following further simplification in the code. Reading between the lines of the proofs of the Comparison Theorems, one can see that away from the mode (the distribution of the ψ angle for 1MNB is unimodal) there is no need to match the parameters of the distributions being compared, since both densities decay to zero rapidly. The parameters matter most in the asymptotic shape of the mode. Therefore, it is reasonable to expect a good fit if the modes are close. Indeed, using this criterion, the fit seems quite good, as can be verified from Figure 14. The distance between the fitted distribution and the kernel density estimate of the empirical distribution is 0.01775444. This value has been found by a simple search over the parameter σ^2 with step size 10^{-3}. In terms of the mode structure, the model is also plausible.
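The search over σ^2 described above can be sketched as a one-dimensional grid search that minimizes the sup-norm distance between the ISN(0, σ^2) density and a target density evaluated on a grid of angles. This is only an illustration under stated assumptions: the target below is an exact ISN(0, 0.2) density standing in for a kernel density estimate of the centred angles, and the function names, grid size and search range are hypothetical.

```python
import math

def isn_pdf0(theta, sigma2):
    """ISN(0, sigma2) density in theta (the grid used below avoids the cut-point)."""
    c = 1.0 + math.cos(theta)
    x = math.sin(theta) / c  # equals tan(theta / 2)
    return math.exp(-x * x / (2.0 * sigma2)) / (c * math.sqrt(2.0 * math.pi * sigma2))

def fit_sigma2(thetas, target, step=1e-3, max_sigma2=1.0):
    """Grid search for the sigma2 minimizing the sup-norm distance to `target`."""
    best_sigma2, best_dist = None, float("inf")
    for k in range(1, int(round(max_sigma2 / step)) + 1):
        sigma2 = k * step
        dist = max(abs(isn_pdf0(t, sigma2) - y) for t, y in zip(thetas, target))
        if dist < best_dist:
            best_sigma2, best_dist = sigma2, dist
    return best_sigma2, best_dist

n_grid = 400
grid = [-math.pi + (i + 0.5) * 2.0 * math.pi / n_grid for i in range(n_grid)]
# Stand-in for the empirical density: in practice this would be a kernel
# density estimate of the centred dihedral angles evaluated on `grid`.
target = [isn_pdf0(t, 0.2) for t in grid]
sigma2_hat, dist = fit_sigma2(grid, target)
print("fitted sigma2:", sigma2_hat, "sup-norm distance:", dist)
```

With a real kernel density estimate the minimal distance is not zero, and the reported distance plays the role of the goodness-of-fit figure quoted in the text.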

Important Comment
The estimation in the Myoglobin (PDB=1MNB) example uncovers a very significant issue: since the MLE approach essentially fits a normal distribution on the real line, outliers on the circle, which the stereographic projection sends very far out, lead to serious problems. A possible remedy could be found in trimmed estimators, which are known to be more robust in such cases.
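The effect of a single observation close to the cut-point can be seen in a small numerical sketch. The angles below are made up for illustration, the projection is assumed to be $t = \tan(\theta/2)$, and the symmetric trimmed mean is just one possible robust choice, not an estimator studied in this paper.

```python
import math


def project(theta):
    """Stereographic coordinate t = tan(theta/2) of an angle in (-pi, pi)."""
    return math.tan(theta / 2.0)


def trimmed_mean(values, frac=0.1):
    """Mean after discarding a fraction `frac` of the points in each tail."""
    xs = sorted(values)
    k = int(len(xs) * frac)
    core = xs[k:len(xs) - k] if k > 0 else xs
    return sum(core) / len(core)


# Nine angles clustered near 0 and one outlier close to the cut-point at pi.
angles = [-0.3, -0.2, -0.1, 0.0, 0.05, 0.1, 0.15, 0.2, 0.3, 3.1]
projected = [project(a) for a in angles]

plain = sum(projected) / len(projected)   # dominated by tan(3.1 / 2)
robust = trimmed_mean(projected, frac=0.1)

# Map both location estimates back to the circle for comparison.
print(2.0 * math.atan(plain), 2.0 * math.atan(robust))
```

In this toy example the untrimmed mean is pulled far from the cluster by the single projected outlier, while the trimmed mean stays near the bulk of the data.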

Gomesin PDB=1KFP (Marginals and Bivariate ISND)
The second dataset we consider contains the dihedral angles of Gomesin from the Protein Data Bank [32]. Gomesin is a peptide isolated from the blood cells of a spider. The interest in analyzing this dataset is that the Ramachandran plot of the dihedral angles of this protein shows some skewness. Fitting the ISND as in the previous example gives the plot on the right in Figure 15 and a distance of 0.0006968093. However, the right tail is a little off and does not show a really good fit. To prevent this, we searched for the best pair of parameters $\mu$ and $\sigma^2$ jointly. The plot on the left in Figure 15 confirms a better result, and the distance in norm decreases to $2.409174 \times 10^{-6}$. This example shows a definite improvement in the fit when using the ISND instead of the VM in cases where the distribution of the dihedral angles is significantly skewed.
As for the joint distribution of the dihedral angles of Gomesin, we fitted a BISND. The MLEs of the parameters, with the corresponding 95% confidence intervals, are given by $\hat{\mu}_\phi = \ldots$ The original fit did not seem optimal, and heuristics for the choice of the cut-point did not seem to work. Therefore, we performed a grid search over both angles in steps of $\pi/16$ and obtained the fit in Figure 16, with cut-point at $P = (-9\pi/16, -14\pi/16)$. The BISND captures the skewness in the distribution of the dihedral angles, but seems confused by the presence of an extra observation, to which it assigns an extra mode. This example shows both the pros and the cons of using the BISND to fit data on $\mathbb{T}^2$, namely the extra flexibility versus the problem of choosing the cut-point. We refer to Section 7 for more comments on the choice of the cut-point. Note that, as shown in Figure 16, the chain does not seem long enough (153 dihedral angles) for a non-parametric method, like kernel density estimation, to take over.
We also fitted the BISND using the criterion we used to fit the marginals. The resulting distribution fits better close to the mode, but completely misses the dihedral angles in the zone $-\pi < \phi < 0$, $0 < \psi < \pi$, which is undesirable, since this is not a forbidden region of the Ramachandran plot. Considering the sensitivity of the fit to the cut-point, we tried to estimate the parameters using robust statistics, like the median, the interquartile range and the MCD of Rousseeuw and Van Driessen [36]. The resulting estimates are $\hat{\mu}_\phi = -1.134703$, $\hat{\mu}_\psi = -0.6737781$, $\hat{\sigma}_\phi = 0.1638183$, $\hat{\sigma}_\psi = 0.2821952$ and $\hat{\sigma}_{\phi,\psi} = 0.007492636$. Note that the largest change in the estimates occurs in the marginal variances, which, being now more robust, are responsible for the unimodality of the fit (see Figure 16).
We believe this last method gives the fit that is most representative of the distribution of the dihedral angles, even if the best goodness of fit is achieved by the non-parsimonious KDE. Further investigation is needed on this issue.

Important Comment
In general, it is not easy to provide a good parametric model for distributions on manifolds, especially because far fewer models are available there than on Euclidean space. For example, it is often hard to find a suitable model when the VM, WN and ISND do not fit well. With the next example, we outline a procedure which might be useful in such cases. Namely, we use the stereographic projection to send our data to the Euclidean space, we build a model there, and we project it back to the manifold.

Protein Structures from CDB
We consider a data set from the open access Conformational Angles DataBase (see [5]). It consists of 8190 conformational angles from 1208 PDB structures in 25% non-redundant protein chains. The experimental method taken into consideration is NMR (Nuclear Magnetic Resonance). The Ramachandran plot for the residue ALA is shown in Figure 17. The Shapiro-Wilk normality test rejects beyond every reasonable doubt the hypothesis of normality of the projected data set, and so we cannot conclude that these data are normally distributed after projection. At this point, without taking advantage of the projected data set, it does not seem easy to find a reasonable model which fits the data well. Let us see how the projected marginals behave under stereographic projection (see Figure 10). It seems reasonable to test whether $-X \sim \exp(\lambda)$ and $Y \sim \exp(\mu)$ for some $\lambda > 0$ and $\mu > 0$. To do this, we perform a Kolmogorov-Smirnov test.
Remark 6.1. Although this hypothesis has been formulated after looking at the data, here we are not interested in the result of the test, but in proposing test procedures. A similar point of view has already been taken in [23] (see Section 5.3 of that paper).
From the Kolmogorov-Smirnov test (see Appendix B), we cannot reject the null hypothesis, which is consistent with our model. Note that the rejection region has a more intuitive shape in the projected variables.
From the Ramachandran plot, it is clear that there are two peaks and that the data are clustered along two major circles of the torus. Our model, even if it fits well, does not see these differences, which translates into some lack of power of our test. This is also unsatisfactory from a biological point of view, since the model is good only in the region of right-handed α-helices, but is not sensitive enough in the area of left-handed α-helices. However, this dataset contains 1208 PDB structures, and so it would probably be too much to ask for a better performance from a model with so few parameters.
This example shows a possible procedure to adopt in cases where more common models, like the VM or the WN, do not provide a good fit and/or it is inconvenient to increase the number of parameters too much (using mixtures, for example).
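The three steps of the procedure (project, model on the line, project back) can be sketched as below. This is a minimal illustration under stated assumptions, not the analysis of Appendix B: the data are synthetic, the rate is fitted by maximum likelihood, and the p-value uses Kolmogorov's asymptotic series, which is only approximate when the parameter is estimated from the same sample. The function names are hypothetical.

```python
import math
import random


def stereographic(theta):
    """Project an angle in (-pi, pi) to the real line via t = tan(theta/2)."""
    return math.tan(theta / 2.0)


def ks_exponential(values):
    """KS statistic and asymptotic p-value against Exp(rate), rate fitted by MLE."""
    n = len(values)
    rate = n / sum(values)
    d = 0.0
    for i, x in enumerate(sorted(values)):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = math.sqrt(n) * d
    # Kolmogorov's asymptotic series; conservative here because the rate is estimated.
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return rate, d, min(1.0, max(0.0, p))


# Step 1: project. Synthetic angles whose projection X satisfies -X ~ Exp(2).
rng = random.Random(1)
angles = [2.0 * math.atan(-rng.expovariate(2.0)) for _ in range(1000)]
projected = [stereographic(a) for a in angles]

# Step 2: build and test the model on the real line.
rate_hat, d_stat, p_value = ks_exponential([-x for x in projected])


# Step 3: project the fitted model back to the circle.
def fitted_angle_cdf(theta):
    """P(Theta <= theta) for theta in (-pi, 0] under the fitted model."""
    return math.exp(rate_hat * math.tan(theta / 2.0))


print(rate_hat, d_stat, p_value)
```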

Remark 6.2.
Using the novel hybrid QTAIM Ramachandran plots (see [27]), we think that the BISND could potentially be useful in modelling the glycine amino acid monomer, which largely occupies the "forbidden" regions of the Ramachandran plot.
Remark 6.3. VMs are successfully used in Projective Shape Analysis (see for example [9] and [24]). It is definitely important to see whether the ISND family is a good model in that context as well. Because of the Comparison Theorems presented in Section 3, we expect similar performances in the high-concentration limit.

Discussion
In this section, we critically discuss the advantages and disadvantages of modelling distributions as Inverse Stereographic Distributions.
The cut-point. The construction of an inverse stereographic projected distribution might not seem completely satisfactory because of the ambiguity left in the choice of the cut-point. Consider for simplicity the case $n = 2$, namely $\Theta \in S^1$, and let $\theta_0$ be the North Pole of the corresponding stereographic projection. For generic $\alpha, \beta \in S^1$, the probability $P_\Theta(\Theta \in [\alpha, \beta])$ depends on $\theta_0$. This does not seem a desirable property. However, the choice of the North Pole corresponds to the choice of the point at infinity of the real line. That choice is also arbitrary and "breaks the symmetry" of the real line. Very similarly, the choice of the North Pole "breaks the symmetry" of the circle. An analogous argument applies to the choice of the origin of a coordinate system on the real line. Note that the problem of the cut-point is strictly connected to the fact that the circle $S^1$ is homeomorphic to the real projective line $P\mathbb{R}^1$, but not to the real line $\mathbb{R}$. This seems to us the intrinsic reason why this problem cannot be fully resolved.
A richer family of distributions can be considered by introducing a translation parameter $\theta_0$, in which case the fit can be optimized over $\theta_0$. From this perspective, the corresponding family of normal distributions might be considered more naturally on $P\mathbb{R}^1$, with $\theta_0$ corresponding to the point at infinity. The extra flexibility has proven useful in the applications considered in Section 6.
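The dependence of $P_\Theta(\Theta \in [\alpha, \beta])$ on the translation parameter can be made concrete with a short sketch. It assumes the parameterization $\Theta = \theta_0 + 2\arctan(X)$ with $X \sim N(\mu, \sigma^2)$, so that the cut-point sits at $\theta_0 + \pi$; this is an illustrative convention, and the values of $\mu$, $\sigma$, $\alpha$, $\beta$ below are arbitrary.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(alpha, beta, theta0, mu=0.0, sigma=0.5):
    """P(alpha <= Theta <= beta) when Theta = theta0 + 2*atan(X), X ~ N(mu, sigma^2).

    Assumes the arc [alpha, beta] does not contain the cut-point theta0 + pi.
    """
    lo = math.tan((alpha - theta0) / 2.0)
    hi = math.tan((beta - theta0) / 2.0)
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


# Same arc, same (mu, sigma), two different choices of theta0.
p_a = interval_probability(0.2, 1.0, theta0=0.0)
p_b = interval_probability(0.2, 1.0, theta0=1.5)
print(p_a, p_b)
```

With $\mu$ and $\sigma$ held fixed, the two probabilities differ, which is why $\theta_0$ can be treated as an additional parameter to optimize.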

Population and Sample ISMs
The definitions of ISMs turn out to be somewhat independent of the choice of the North Pole. To fix ideas, consider the case $n = 2$. The choice of the North Pole $N$ is as arbitrary as the choice of the origin or of the point at infinity of the real line. A line does not know anything about the system of coordinates that we put on it. In the same way, the circle has no intrinsically well-defined North Pole. However, we want characteristic quantities like the expected value and the variance to be as independent as possible of the choice of the North Pole.
If we choose a different North Pole $N' \neq N$, we can go from one stereographic projection to the other by a simple change in the angles, $\theta_i \to \theta_i + \theta_i^0$ for $i = 1, \dots, n$. This choice of $N'$ produces a new inverse stereographic projection and so a new pdf, $f_{\Theta'}(\theta + \theta^0) = f_\Theta(\theta)$. The measure also changes consistently, $\frac{d\theta}{1+\cos\theta} \to \frac{d\theta}{1+\cos(\theta+\theta^0)}$, and so it compensates for the change in the pdf. Choosing a geometric definition of moments in this way leaves every characteristic quantity invariant. There is still the dependence on the choice of the origin (or of the point at infinity) in the Euclidean space, and so the definition is not perfectly independent of the coordinate system. However, this sort of dependence on the coordinate system is the same as the Euclidean one, where it is well accepted and unavoidable.
If we turn our attention to the sample moments, instead of the population moments, we need to consider the possibility of data points crossing the cut-point. Note that the ISND density decays rapidly to zero in a neighbourhood of the cut-point, which makes the crossing less likely to happen. Nevertheless, confidence bands around the cut-point can get extremely large, as shown in Figure 2. Distributions with positive probability in a neighbourhood of the cut-point do not have finite population moments, as shown in Proposition 4.6.

ISMs vs CMs
The canonical definition of CMs runs into some problems because it does not take into consideration the different geometry of the circle and of the torus.

Consider the sample mean of two angles. If the angles are $\theta_1 = -0.01$ and $\theta_2 = +0.01$, we have no problem saying that the mean is 0. Suppose, instead, that the angles are $-\frac{\pi}{2}$ and $\frac{\pi}{2}$. Then it is not clear whether it is more reasonable to say that the mean is 0 or $\pi$.
Knowing the mean is much more meaningful in the Euclidean setting than in a periodic setting, where −∞ and +∞ join together. The ISMs address this issue by "breaking the symmetry" of the circle and keeping −∞ and +∞ distinct. In this way, the problem mentioned above no longer exists. This procedure also suggests that it might be more intuitive to give an interpretation of parameters of circular/toroidal distributions after stereographic projection of the distribution, as we have done above.
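The two-angle example can be checked numerically. The sketch below compares the usual circular mean direction (the argument of the resultant vector) with a location obtained by averaging in stereographic coordinates; it assumes that the first sample ISM is the average of $\tan(\theta_i/2)$, with cut-point at $\pi$, and the function names are hypothetical.

```python
import math


def circular_mean(angles):
    """Mean direction and mean resultant length of a sample of angles."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.atan2(s, c), math.hypot(c, s)


def stereographic_mean(angles):
    """Average tan(theta/2) on the real line, then map back to the circle."""
    m = sum(math.tan(a / 2.0) for a in angles) / len(angles)
    return 2.0 * math.atan(m)


near = [-0.01, 0.01]
far = [-math.pi / 2.0, math.pi / 2.0]

# For `far`, the resultant length is numerically zero, so the circular mean
# direction carries no information; the stereographic mean is still well defined.
direction_far, length_far = circular_mean(far)
print(circular_mean(near), stereographic_mean(near))
print(length_far, stereographic_mean(far))
```

The answer for the second pair depends on the cut-point: with the North Pole at $\pi$ the location is 0, which is exactly the symmetry-breaking choice discussed in the text.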
Parameter Estimation vs cut-point Another important issue which naturally arises from the above comments is how the parameter estimation is influenced by the choice of the cut-point.
Suppose we have a set of observations on $S^1$ highly concentrated around a point $\theta$ and we believe that the underlying probability distribution comes from an inverse stereographic family. If we believe that unimodality needs to be highlighted in our model, it is not advisable to choose the cut-point in the middle of the cluster, because that would risk splitting the single mode into two modes highly concentrated around the cut-point. It is indeed more reasonable to choose the cut-point opposite to the centre of the cluster.
For real-valued random variables, the parameter estimation is also influenced by the choice of the coordinate system, but in a linear way, and so the influence is more intuitive, especially in the case of a normal distribution. The stereographic projection is nonlinear and so does not preserve linearity. As a consequence, different choices of the cut-point lead to different parameter estimates.
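A small sketch shows what happens to a tight cluster under the two choices of cut-point. The five angles are made up, and the projection convention (North Pole sent to infinity, antipodal point sent to 0) is an assumption made for illustration.

```python
import math


def wrap(angle):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def project(angles, cut_point):
    """Stereographic coordinates with the North Pole at `cut_point`."""
    # The pole goes to infinity; the antipodal point cut_point - pi goes to 0.
    return [math.tan(wrap(a - cut_point + math.pi) / 2.0) for a in angles]


cluster = [2.9, 3.0, 3.1, 3.2, 3.3]   # tight cluster centred at 3.1
centre = 3.1

# Cut-point inside the cluster: the projected points are thrown far apart,
# on both sides of the real line.
bad = project(cluster, cut_point=math.pi)

# Cut-point opposite to the centre: the projected points stay close to 0.
good = project(cluster, cut_point=centre + math.pi)

print(bad)
print(good)
```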

Conclusion
In this paper, we have discussed several aspects of the ISND. We have derived some theoretical properties, like the unimodality conditions, and proved the Comparison Theorems with the VM and the WN. Then, we discussed the ISMs, the ISMGF and an intrinsic version of the CLT. We concluded with some illustrations and some applications to molecular biology.
ISMs bring a novel perspective to parametric statistics on manifolds and need to be investigated further. The performance of inverse stereographic estimators with respect to classical ones deserves future attention.
To verify the potential of ISND especially in higher dimensional models, further applications need to be tested. The possibility for the BISND to have more than two peaks might make the BISND flexible enough to describe the secondary structure of some of the proteins whose dihedral angles have multimodal distributions without the need of mixture models.

Appendix A: Proofs
In Appendix A, we give the proofs of the theorems presented in the paper.

Proof of Theorem 2.3
We give the proof just in the case n = 2, which is more interesting for the applications that we have in mind, but the proof for the general case follows in a similar way.
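We want to find conditions on $\mu$ and $\sigma^2$ so that this polynomial admits only one real solution in $t$. For this, we need to examine the discriminant of the polynomial. We recall the following lemma.
Lemma A.1. Let $P[t; a, b, c, d] = at^3 + bt^2 + ct + d$ be a cubic polynomial with real coefficients and $a \neq 0$, and let
$$\Delta = 18abcd - 4b^3d + b^2c^2 - 4ac^3 - 27a^2d^2$$
be its discriminant. If $\Delta > 0$, then $P$ has three distinct real roots; if $\Delta < 0$, then $P$ has one real root and two complex conjugate roots; if $\Delta = 0$, then $P$ has a multiple root and all its roots are real.
We use this lemma for the polynomial $P[t; a, b, c, d] = t^3 - \mu t^2 + (1 - 2\sigma^2)t - \mu$.
In the notation of Lemma A.1, we have
$$a = 1, \quad b = -\mu, \quad c = 1 - 2\sigma^2, \quad d = -\mu.$$
Therefore, the discriminant $\Delta = \Delta(\mu, \sigma^2)$ in this case becomes
$$\Delta(\mu, \sigma^2) = 18\mu^2(1 - 2\sigma^2) - 4\mu^4 + \mu^2(1 - 2\sigma^2)^2 - 4(1 - 2\sigma^2)^3 - 27\mu^2.$$
Note that for $\mu = 0$ we recover the case already treated above and in [1]. Lemma A.1 tells us that, when $\Delta(\mu, \sigma^2) < 0$, we have one single root in $t$, and so we have unimodality in $t$ and hence in $\theta$, and that, when $\Delta(\mu, \sigma^2) > 0$, we have three distinct roots in $t$, and hence we have multimodality in $t$ and so in $\theta$. The proof of the theorem is not complete yet, because it is still possible that, for $\Delta(\mu, \sigma^2) = 0$, there are three identical roots. Note that, since our polynomial is monic, to have three identical roots $\beta \in \mathbb{R}$ it needs to be of the form
$$(t - \beta)^3 = t^3 - 3\beta t^2 + 3\beta^2 t - \beta^3.$$
This implies the following conditions on $\mu$ and $\sigma^2$:
$$\mu = 3\beta, \qquad 1 - 2\sigma^2 = 3\beta^2, \qquad \mu = \beta^3.$$
Note that the first and third conditions together imply either $\beta = 0$, and so $\mu = 0$ and $\sigma^2 = \frac{1}{2}$, which has been treated before, or $\beta^2 = 3$. But $\beta^2 = 3$ together with the second condition implies $1 - 2\sigma^2 = 9$, which is a contradiction since $\sigma^2 > 0$. This completes the proof of the theorem.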
Remark A.2. The solutions of this system can be calculated explicitly and pretty easily using some symbolic software like Maple or Wolfram Alpha. However, they have a complicated form and do not add any extra insight. Therefore, we have decided to not report them in the paper.
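The sign condition on the discriminant is easy to evaluate numerically. The sketch below uses the standard discriminant of a cubic $at^3 + bt^2 + ct + d$, namely $18abcd - 4b^3d + b^2c^2 - 4ac^3 - 27a^2d^2$, applied to $t^3 - \mu t^2 + (1 - 2\sigma^2)t - \mu$; the function names are hypothetical, and the boundary case $\Delta = 0$ is deliberately not classified.

```python
def cubic_discriminant(a, b, c, d):
    """Discriminant of a*t^3 + b*t^2 + c*t + d."""
    return (18.0 * a * b * c * d - 4.0 * b ** 3 * d + b ** 2 * c ** 2
            - 4.0 * a * c ** 3 - 27.0 * a ** 2 * d ** 2)


def isnd_discriminant(mu, sigma2):
    """Discriminant of t^3 - mu*t^2 + (1 - 2*sigma2)*t - mu."""
    return cubic_discriminant(1.0, -mu, 1.0 - 2.0 * sigma2, -mu)


def has_single_critical_point(mu, sigma2):
    """True when the cubic has exactly one real root (strictly negative discriminant)."""
    return isnd_discriminant(mu, sigma2) < 0.0


# mu = 0: the cubic is t*(t^2 + 1 - 2*sigma2), so the threshold is sigma2 = 1/2.
print(isnd_discriminant(0.0, 0.25), has_single_critical_point(0.0, 0.25))
print(isnd_discriminant(0.0, 1.0), has_single_critical_point(0.0, 1.0))
```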

Proofs of the Comparison Theorems
Proof of Theorem 3.1. We have for $K$: $\ldots$ Consider the first argument of the max. We get (for $\kappa \gg 1$):
$$\frac{e^{\kappa \cos(\theta)}}{2\pi I_0(\kappa)} \ldots$$
With the choice of the parameters in Theorem 3.2, the BVM and the BISND match at second order near the origin. If we want the two families to match at cubic order near the origin, we need $a = 0$ (which is the coefficient of the term $-\theta_1 \theta_2^2 - \theta_2 \theta_1^3$). It is not possible for the two families to match at quartic order, since that would require $\kappa_1 = -12 b_{11}$ (coefficients of $\theta_1^4$) and $\kappa_2 = -12 b_{22}$ (coefficients of $\theta_2^4$), which would contradict the matching conditions at second order, $\kappa_1 = \frac{b_{11}}{4}$ and $\kappa_2 = \frac{b_{22}}{4}$.