Probabilistic modelling of general noisy multi-manifold data sets

The intrinsic nature of noisy and complex data sets is often concealed in low-dimensional structures embedded in a higher-dimensional space. A number of methodologies have been developed to extract and represent such structures in the form of manifolds (i.e. geometric structures that locally resemble continuously deformable intervals of R^j). Usually, a priori knowledge of the manifold's intrinsic dimensionality is required. Additionally, their performance can often be hampered by the presence of significant high-dimensional noise aligned along the low-dimensional core manifold. In real-world applications, the data can contain several low-dimensional structures of different dimensionalities. We propose a framework for dimensionality estimation and reconstruction of multiple noisy manifolds embedded in a noisy environment. To the best of our knowledge, this work represents the first attempt at detection and modelling of a set of coexisting general noisy manifolds by uniting two aspects of multi-manifold learning: the recovery and approximation of core noiseless manifolds and the construction of their probabilistic models. The easy-to-understand hyper-parameters can be manipulated to obtain an emerging picture of the multi-manifold structure of the data. We demonstrate the workings of the framework on two synthetic data sets presenting challenging features for state-of-the-art techniques in multi-manifold learning. The first data set consists of multiple sampled noisy manifolds of different intrinsic dimensionalities, such as the Möbius strip.


Introduction
Dimensionality reduction and density estimation of raw data are commonly used tools to extract information from complex and noisy data sets. Due to dependencies among the measured attributes of real-world data, the data is often distributed along low-dimensional structures in a higher-dimensional measurement space. This realisation has driven the development of a variety of Manifold Learning algorithms. Principal Component Analysis (PCA) [1] is a well understood and widely used linear dimensionality reduction scheme. However, by design, PCA cannot appropriately capture non-linear low-dimensional structures. This lack of flexibility has been addressed by non-linear dimensionality reduction algorithms such as Isomap [2] and Locally Linear Embedding (LLE, [3]), where the manifold is approximated by a neighbourhood graph and a neighbourhood-preserving map, respectively. Both methods take advantage of the definition of a manifold as a locally linear low-dimensional structure of dimension j. Many other Manifold Learning algorithms aiming to provide suitable approximations to low-dimensional data manifolds have been proposed. Examples include Laplacian eigenmaps [4], Hessian eigenmaps [5], Local Tangent Space Alignment (LTSA, [6]), C-Isomap [7] (an extension of Isomap to conformal embeddings), NMDS (a nonmetric formulation of Multidimensional Scaling (MDS) [8], [9], [10]), and Riemannian Manifold Learning (RML, [11]).
In [12] (and bibliographic references therein), a different angle, based on computational geometry, has been proposed in order to extract low-dimensional manifolds from data samples. Here, simplicial complexes (such as Delaunay triangulations, α-complexes and filtrations) are used for manifold reconstruction. With these techniques it is possible to infer geometrical and topological properties of data points that uniformly sample a single manifold embedded in a higher dimensional space.
While such techniques are potentially powerful and theoretically well grounded, their capability to naturally handle high-dimensional noise aligned along low-dimensional manifolds is limited. Besides the sensitivity to noise, it is often required that the intrinsic data dimensionality is known a priori.
Generative Topographic Mapping (GTM) [13] was proposed as a probabilistic formulation of the Self-Organizing Map [14]. Its main advantage is that instead of treating a noisy manifold as a core low-dimensional manifold to be discovered, plus some "noise" around it that somehow needs to be dealt with, it formulates a consistent manifold-aligned density model in the form of a constrained mixture of Gaussians, whose location parameters (means) are constrained to lie on a smooth manifold, most commonly a smooth image in the data space of a two-dimensional interval (the latent space). The original GTM formulation is trained in the maximum likelihood framework using the E-M algorithm [15]. Because of its sensitivity to initialization, a suitable initialization is required, e.g. using PCA, or assuming a latent space topology if known a priori [16]. Bayesian formulations of GTM have also been proposed [17].
Other global density estimators, whether non-parametric, e.g. Parzen windows [18] (and its extensions such as Manifold Parzen Windows [19] and Fast-Parzen Windows [20]), or semi-parametric, e.g. the Infinite Gaussian Mixture model [21], are not designed to extract a representation of the embedded low-dimensional structures. They may also be computationally expensive to train and/or evaluate.
To deal with complex data sets, where several manifolds of different dimensionalities can co-exist, generalizations of the previous methods have been developed: Multi-Manifold Discriminant Analysis (MMDA, [22]), Sparse-Manifold Clustering and Embedding (SMCE, [42]), Multi-Manifold Isomap (M-ISOMAP, [23]), Multi-manifold Proximity Embedding (MPE, [24]), Multi-Manifold LLE (MM-LLE, [25]), S-Isomap++ [26], and Hierarchical GTM [27]. However, being based on the same assumptions as their predecessors, these methods still need to be informed about the dimensionalities of the different manifolds and struggle when dealing with topologically complex, noisy structures. The works proposed in [28] and [29] are particularly relevant in terms of dimensionality estimation and clustering. The first methodology relies on the construction of Translation Poisson Mixture Models (TPMM) for estimating the dimensionality of local neighbourhoods in the data. However, by design, it is prone to splitting a single manifold if the local density of points changes throughout the manifold. The second methodology, Hidalgo, is a Bayesian extension of TWO-NN [30], a method based on the distances to the two nearest neighbours of each point. The dimensionality of individual points is obtained by maximization of the posterior distribution over the model's parameters. Despite the appealing formulation, the estimated complexity of O(N^2), N being the number of points in the data set, is not well suited for the applications considered in this study, where N is generally large.
We propose a framework for automated dimensionality estimation and reconstruction of multiple noisy manifolds embedded in a noisy environment. We generalize the GTM model so that densities aligned along arbitrary manifolds (even non-orientable ones, such as the Möbius strip) can be captured. This is achieved by replacing the simple Euclidean latent space (generally parametrized as a discretized interval of R^j) with an abstract graph reflecting the topology of the data manifold that, when embedded in the data space, provides a manifold skeleton around which the noise models can be organized. This work is inspired by [31], but extends and generalizes it threefold: (1) it proposes a new robust dimensionality index estimation for data points, (2) through a dedicated manifold crawling mechanism it allows for completely abstract manifold representations in the GTM latent space (instead of a regular grid) and (3) it has Gaussian noise components naturally aligned along the manifold, unlike the spherical noise models in the original GTM and [31]. Manifold-aligned noise models in GTM were also considered in [32], but under the assumption of a simple latent space structure in the form of a j-dimensional interval. The key idea was to impose larger and smaller variances in directions locally parallel and perpendicular, respectively, to the manifold. Since our latent space is a discrete structure (an abstract graph representing a skeleton of a given manifold), we formulate local noise models through kernel-based estimates of the local covariance matrix of the data, with a trainable scale parameter to allow for optimized overlapping of the neighbouring Gaussian components.
Our work presents a radical reformulation of GTM to capture and model low-dimensional general noisy manifolds through a dedicated abstract graph-structured latent space reflecting the core manifold structure, specific to each noisy manifold. Bacciu et al. [33] present another radical generalization of GTM in the reverse direction: this time keeping the original simple latent space structure, but allowing for an abstract structure in the data space, namely the space of trees.
The paper has the following organization: in section 2 we set up the scene, explain the broad outline of the methodology and introduce the synthetic data set on which the different steps of the methodology will be demonstrated. Section 3 introduces the core model of our methodology, Abstract GTM (AGTM), representing a density aligned along a single manifold. We explain how the abstract latent space graph representing the topology of the data manifold is extracted through manifold crawling, and how this graph is then embedded in the data space along with a suitable set of noise models. We also show how to calculate local curvature at the edges of the embedded graph, taking advantage of the smooth manifold description provided by AGTM. Section 4 defines our notion of a dimensionality index for individual points and extends the framework of section 3 to the multi-manifold case. We also offer a computationally efficient alternative to Multi-Manifold Crawling. The price to pay for gains in efficiency is weaker detection stability of low-dimensional manifold entities buried in the data. Section 5 presents an experimental comparison of our methodology with alternative multi-manifold learning and probabilistic modelling methods on two synthetic data sets. Even though our methodology aims to provide density models of multiple low-dimensional manifolds buried in the data, we organise the comparative experiments separately for the multi-manifold capture and probabilistic modelling aspects of our work. This is because, to the best of our knowledge, no other method exists that can simultaneously recover low-dimensional representations of an unknown number of manifolds while building their probabilistic models.
Our main contributions can be summarized as follows:
• Formulation of a new dimensionality index assigned to individual points, based on which a point cloud can be partitioned into background points and sets of points organised along noisy low-dimensional manifold structures;
• Development of a recursive crawling algorithm for the extraction of multiple low-dimensional, noisy manifolds embedded in a higher-dimensional space: Multi-Manifold Crawling;
• Extension of GTM's applicability to a broad class of manifolds (e.g. non-orientable or closed manifolds) by reformulating the latent space as an abstract graph and introducing a manifold-aligned noise attached to the embedded nodes of the latent graph: Abstract GTM. The abstract latent space and its embedding allow us to understand the important global structural features of the underlying manifolds, an important aspect of our methodology bringing manifold learning under the umbrella of Artificial Intelligence.
The methodology is applied in section 6 to an astrophysical data set resulting from a mixed N-body/Smoothed Particle Hydrodynamics numerical simulation of a dwarf galaxy falling into the gaseous halo of a galaxy cluster. The point cloud generated by the simulation presents non-linear, noisy, low-dimensional structures, providing an ideal test bed for our methodology. We extract and model two of the most significant manifolds, suggesting a possible scenario for the formation of new stars in such a disrupted dwarf galaxy. Section 8 summarizes the main achievements and concludes the paper.

Methodology overview
Consider a point cloud Q = {t_1, t_2, ..., t_L}, t_i ∈ R^d, containing points sampled from an unknown number of noisy lower-dimensional manifolds embedded in a noisy environment (e.g., points generated from a broad d-dimensional distribution).
As an example, Fig. 2 shows a data set in R^3 obtained from a collection of noisy two-dimensional (central panel) and one-dimensional (left panel, red points) manifolds. All underlying true manifolds are shown in Fig. 1: the two one-dimensional ones (Figs. 1a and 1b: parabolic arm and spiral) and the four two-dimensional manifolds (Figs. 1c-1f: cap (hyperbolic surface), 2-toroid, S-shaped surface, and Möbius strip). The noisy manifolds are embedded in uniform noise (Fig. 2, rightmost panel). We also included a uniformly sampled 3-dimensional ball (Fig. 2, left panel, green points). Characteristics of the manifolds can be found in Table 1, and details of their parametric forms and sampling are presented in Appendix A. Note that the total number of manifold points is only 33% of the whole data set, meaning that 67% of the points would ideally be discarded in the initial filtering process.
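To give a concrete feel for such samples, the following sketch generates a noisy Möbius strip point cloud in R^3 from the standard parametrization. The strip radius, width, sample size and noise level below are illustrative choices, not the values of Appendix A:

```python
import numpy as np

def sample_mobius(n=2000, noise=0.02, seed=0):
    """Sample points around a Moebius strip embedded in R^3.

    Standard parametrization: u in [0, 2*pi) runs around the strip,
    v in [-0.5, 0.5] runs across its width; isotropic Gaussian noise
    is added to emulate a noisy manifold sample.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 2.0 * np.pi, n)
    v = rng.uniform(-0.5, 0.5, n)
    x = (1.0 + v / 2.0 * np.cos(u / 2.0)) * np.cos(u)
    y = (1.0 + v / 2.0 * np.cos(u / 2.0)) * np.sin(u)
    z = v / 2.0 * np.sin(u / 2.0)
    pts = np.column_stack([x, y, z])
    return pts + rng.normal(scale=noise, size=pts.shape)

pts = sample_mobius()
print(pts.shape)  # (2000, 3)
```

Samples from the other manifolds (spiral, toroid, S-shaped surface, etc.) can be generated analogously from their parametric forms.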
In the following we give a brief outline of our methodology to robustly detect the manifolds and build their corresponding manifold-aligned density models. We first apply a physics-based diffusion method, the Structure-Aware Filtering technique (SAF) [34], that collapses points in close vicinity of dense structures onto them, resulting in a diffused data set Q̃ = {t̃_1, t̃_2, ..., t̃_L}, t̃_i ∈ R^d. The SAF method moves points towards high-density regions, enabling points in the vicinity of a noisy manifold to migrate towards its "spine" or "mean surface". Fig. 3 shows the noisy manifolds described previously (Figs. 3a-3f) together with the corresponding mean manifolds recovered via SAF (Figs. 3g-3l). Despite a few imperfections when compared to their ground truths (e.g. deformation of the spiral in the second ring from the bottom), the method generally reduces noise transverse to the manifolds. Assuming that the data structures to be modelled are more densely sampled than the noisy environment, we then filter the data sets Q and Q̃ by removing point couples (t_i, t̃_i) that have sparse neighbourhoods in both Q and Q̃. In particular, around each t_i ∈ Q and t̃_i ∈ Q̃ we construct hyperballs B(t_i; r) and B(t̃_i; r), respectively, in R^d of radius r > 0. In case a point t_i ∈ Q lies further apart from a manifold, both B(t_i; r) and B(t̃_i; r) will be sparsely populated. Hence, if B(t_i; r) and B(t̃_i; r) contain fewer points from Q and Q̃, respectively, than a pre-specified threshold τ > 0, the points t_i and t̃_i are removed from their corresponding data sets.
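The neighbourhood-count filtering step above can be sketched as follows, using k-d trees to count points inside the hyperballs. The radius r and threshold τ are placeholder values; SAF itself is assumed to have already produced the diffused set:

```python
import numpy as np
from scipy.spatial import cKDTree

def density_filter(Q, Q_diff, r=0.1, tau=10):
    """Remove point pairs whose r-neighbourhoods are sparse in BOTH
    the original set Q and the diffused set Q_diff.

    A pair (t_i, diffused t_i) is kept if at least one of the two
    hyperballs B(t_i; r), B(diffused t_i; r) holds >= tau points.
    """
    counts_q = cKDTree(Q).query_ball_point(Q, r, return_length=True)
    counts_d = cKDTree(Q_diff).query_ball_point(Q_diff, r, return_length=True)
    keep = (counts_q >= tau) | (counts_d >= tau)
    return Q[keep], Q_diff[keep]
```

Filtering both sets by the same boolean mask keeps the pairing (t_i, t̃_i) intact for the later dimensionality estimation step.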
Following this, the first task in capturing the multi-manifold structure in Q is to estimate the local dimensionality of the point cloud around each t_i, in the form of a dimensionality index δ_i (section 4.1). Using the dimensionality indices, we partition the data into subsets Q_j, Q̃_j according to the local dimensionalities j = 1, 2, ..., d. Since Q_j, Q̃_j can contain several distinct sampled manifolds of dimensionality j, we use a dedicated "manifold crawling" procedure operating on Q̃ to separate the individual manifolds (section 4.2). Moreover, the crawling also produces, for each manifold, a graph structure embedded in R^d representing a piece-wise linear "skeleton" approximation of the spine of the noisy manifold (section 3.2).
For every manifold, the associated graph will then function as the abstract latent space of a generalized form of Generative Topographic Mapping (GTM) [13] that we call Abstract GTM (AGTM). Using points in Q, the generalized GTM produces manifold-aligned density models (section 3.1). Finally, if a single summary model is required, the density models of all detected manifolds (AGTMs) can be grouped together in a hierarchical mixture model [27], representing in a concise manner the global density of the low-dimensional structures in the data set Q.

Density model of a single noisy manifold
In this section we deal with the construction of a manifold-aligned density model for a single manifold. For the sake of presentation clarity, slightly abusing mathematical notation, we use D = {t_1, t_2, ..., t_N} to denote the subset of points from Q belonging to that manifold. The set D̃ = {t̃_1, t̃_2, ..., t̃_N} contains the corresponding diffused points from Q̃. Section 3.2 presents a recursive algorithm (Manifold Crawling) capable of recovering the latent graph, both in its abstract form and as an embedded graph in the data space.

Abstract GTM
Let us consider a j-dimensional manifold M embedded in a higher-dimensional space R^d, j < d. In the following, we assume that the dimensionality j of the manifold M is known. We will later (section 4.1) provide a methodology for the estimation of the intrinsic dimensionality of points lying on low-dimensional manifolds. We consider D̃ = {t̃_1, t̃_2, ..., t̃_N}, a data sample from M obtained from the noisy manifold sample D through SAF (see section 2). The original GTM [13] assumes that the manifold M is an image of the j-dimensional interval [−1, +1]^j (latent space) under a smooth embedding y : [−1, +1]^j → R^d. To add noise to M, the latent space [−1, +1]^j is covered by a regular grid {x_i}_{i=1}^K of K points, whose images y(x_i) under y form a "skeleton" of M in the data space. A spherical Gaussian noise model is then positioned at each skeleton node y(x_i), thus representing the noisy manifold as a mixture of K Gaussians centred at y(x_i) (see Fig. 4a). The latent space grid structure can be represented by an abstract undirected graph G = (V, E), where each grid point x_i corresponds to a vertex v_i ∈ V and edges e_ij ∈ E connect vertices corresponding to neighbouring grid points x_i and x_j. The structure of G is directly reflected in the latent grid shown in Fig. 4b. We will generalize the GTM model in that the manifold skeleton will be obtained by direct embedding of the latent graph G into the data space through a parametrized mapping f. Individual Gaussian noise models will be centred at the images ṽ_i = f(v_i) ∈ R^d of the nodes of G. The manifold skeleton will then be formed by the embedded graph G̃ = (Ṽ, Ẽ), where Ṽ = {ṽ_1, ..., ṽ_K} and ẽ_ij ∈ Ẽ whenever e_ij ∈ E. This will enable us to naturally represent density models of noisy manifolds of much more intricate structure than that of a smoothly embedded low-dimensional interval. For example, the sample from a noisy Möbius strip shown in Fig. 4b (right panel) is captured by a Gaussian mixture with centres at fixed skeleton points in Fig. 4b (central panel). The skeleton points themselves are images under f of vertices v ∈ V (Fig. 4b, left panel). The abstract graph G is presented with the usual topological convention of identifying opposite sides of a rectangular grid with inversion of direction. The additional edges resulting from such an identification are shown in red.
Motivated by the original GTM model, we formulate the embedding f : V → R^d as a nonlinear model, linear in parameters. In particular, we use a set of M basis functions φ_m : V → R, m = 1, 2, ..., M, operating on the abstract latent space:

f(v; W) = W φ(v), φ(v) = (φ_1(v), ..., φ_M(v))^T, (1)

where W is a d × M matrix of weights. To formulate the basis functions φ_m, we construct a regular ε-net of the graph G (see e.g., [12]) with M nodes. The nodes c_m ∈ V of the ε-net form the centres of the corresponding basis functions:

φ_m(v) = exp( −D(v, c_m)² / σ² ),

where D(v, c_m) is the shortest-path distance in G between the vertices v and c_m. Analogously to the original GTM, to ensure smoothness of the embedding f, the scale parameter σ was set to 2ε.
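A minimal sketch of such graph-based basis functions, with shortest-path distances computed via scipy. The adjacency matrix, the choice of centre indices, and the scale value below are illustrative, not the ε-net construction of the text:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def graph_basis(adjacency, centres, sigma):
    """Gaussian basis functions on an abstract latent graph.

    adjacency : (K, K) 0/1 array for the unweighted latent graph G
    centres   : indices of the basis-function centre nodes c_m
    sigma     : basis-function scale (2*eps in the text)

    Returns Phi with Phi[i, m] = exp(-D(v_i, c_m)**2 / sigma**2),
    where D is the shortest-path distance in G.
    """
    D = shortest_path(csr_matrix(adjacency), directed=False, unweighted=True)
    return np.exp(-D[:, centres] ** 2 / sigma ** 2)

# Example: a 5-node path graph with centres at its two end nodes.
A = np.diag(np.ones(4), 1)
A = A + A.T
Phi = graph_basis(A, [0, 4], sigma=2.0)
print(Phi.shape)  # (5, 2)
```

Because distances are measured inside the graph rather than in a Euclidean latent interval, the same construction works unchanged for closed or non-orientable latent topologies.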
The manifold-aligned probabilistic model is a flat mixture model whose components are locally manifold-aligned multivariate Gaussians centred at the embedded vertices ṽ_i ∈ Ṽ:

p(t | W, ζ) = (1/K) Σ_{i=1}^K (2π)^{−d/2} |ζ_i Σ_i|^{−1/2} exp( −(1/2) Δt_i^T (ζ_i Σ_i)^{−1} Δt_i ),

with Δt_i = f(v_i; W) − t. Unlike in [32], we model the local manifold-aligned covariance matrix as a scaled version (scaled by ζ_i > 0) of the local covariance matrix Σ_i, estimated through a kernel-weighted average of outer products of the differences between the diffused points and the embedded vertex ṽ_i. The kernel scale parameter ε_i is determined as the average of the distances between the embedded vertex ṽ_i and the embeddings in R^d of its neighbouring vertices in G. As a latent variable model, AGTM can be trained to maximize the log-likelihood via the E-M algorithm. In the E-step, which is the same as in the original GTM model (in fact, in any mixture model), the responsibilities R_in are calculated as the posterior distribution over the nodes v_i ∈ V, given the data points t_n ∈ D [13]. The M-step updates for the parameters W and ζ are obtained by differentiating the expected value of the complete-data log-likelihood L(W, ζ) (equation (4)) with respect to the corresponding parameters and setting the derivatives to zero. The stationarity condition for W can be rewritten in matrix notation (equation (8)), where R_i is the (N × 1) column vector containing the responsibilities of component i for every point t_n.
We can solve equation (8) by taking advantage of the properties of the Kronecker product and the vec operator (see [35], sections 5.1 and 10.2). We now turn our attention to equation (6). Solving it for every ζ_i, the scale parameters ζ of the local covariance matrices are updated as

ζ_i = (1/d) · ( Σ_{n=1}^N R_in D²_i(t_n) ) / ( Σ_{n=1}^N R_in ),

where D²_i(t_n) is the squared Mahalanobis distance of t_n from the component centre under Σ_i. This has an intuitive interpretation: the scale parameter ζ_i of the i-th Gaussian mixture component is obtained as the (effective) squared mean Mahalanobis distance of the points around component i to its centre ṽ_i, per dimension. As an example, consider the noisy sample around the Möbius strip shown in Fig. 4b (right panel). Fig. 5a presents the embedded abstract graph and Fig. 5b an iso-surface of the density model given by the AGTM model trained on the sample.
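The E-step is the standard mixture-model posterior computation with equal component priors 1/K. The sketch below takes generic component covariances as input; the construction of the manifold-aligned Σ_i and the M-step are omitted:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(T, means, covs):
    """E-step of a flat Gaussian mixture with equal priors 1/K.

    T     : (N, d) data points
    means : list of K component means (embedded nodes f(v_i; W))
    covs  : list of K component covariances (zeta_i * Sigma_i)

    Returns R with R[i, n] = p(component i | t_n).
    """
    K, N = len(means), len(T)
    P = np.empty((K, N))
    for i in range(K):
        P[i] = multivariate_normal.pdf(T, mean=means[i], cov=covs[i])
    return P / P.sum(axis=0, keepdims=True)  # normalize over components
```

In a full E-M loop these responsibilities would feed the weighted least-squares update of W and the ζ_i update given above.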

Manifold crawling
The previous section specified how to obtain probabilistic models of noisy manifolds using AGTM. In the following, we describe a recursive algorithm that enables us to recover the abstract graph and the embedding function f, given the data set D̃. We will refer to this procedure as Manifold Crawling, or simply Crawling. Recall that the manifold dimensionality is j < d.

Initialization
Initially, a single "seed" t̃_0 ∈ D̃ is randomly chosen and Principal Component Analysis (PCA, [36]) is applied to its neighbourhood B(t̃_0; r) ∩ D̃. We assume that the unit-length eigenvectors obtained through PCA are ordered in descending order of their corresponding eigenvalues.
The first j eigenvectors u_1, ..., u_j span the tangent space to the manifold M at t̃_0, T_{t̃_0}M := span{u_1, ..., u_j}. For every eigenvector u_l, l = 1, ..., j, two new points are computed at a distance η·r from t̃_0 along the directions ±u_l:

z_l^± = t̃_0 ± η r u_l, (12)

where 0 < η < 1 is a step-size parameter. In order to keep the crawling adherent to the diffused sample D̃, the closest neighbour in D̃ to every z_l^± is found:

t̃_l^± = argmin_{t̃ ∈ D̃} ||z_l^± − t̃||.

We start the construction of the embedded graph G̃ by initializing the set of nodes and edges:

Ṽ = {t̃_0} ∪ {t̃_l^± | l = 1, ..., j}, Ẽ = {(t̃_0, t̃_l^±) | l = 1, ..., j}.

At the same time we initialize the construction of the abstract graph G by imposing that every node ṽ_l in Ṽ corresponds to a node v_l ∈ V in the abstract counterpart G of G̃. The node ṽ_l can be considered the image of v_l through an unknown embedding function f: ṽ_l = f(v_l). We now compute the degree of each node in the graph G̃ (analogously in G). The degree deg(v_i) of a node v_i is the number of edges incident to that node. We select the vertices having only one edge, V_1^E = {v ∈ V | deg(v) = 1}. After the initialization phase, each successive iteration is split into two phases: expansion and contraction. The two phases are alternated for each newly discovered node v_i ∈ V_1^E (blue dots in Fig. 6) having only one edge connecting it to its parent node v_p.
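The initialization step can be sketched as follows: local PCA on the seed's neighbourhood, a step of length η·r along each ±eigenvector, and a snap back to the nearest diffused sample point. Function and variable names are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def crawl_init(D_diff, seed_idx, r=0.3, eta=0.75, j=1):
    """Initialization of manifold crawling (sketch).

    Local PCA on B(t0; r) gives the tangent directions u_1..u_j;
    candidate points z = t0 +/- eta*r*u_l are snapped to their nearest
    neighbours in the diffused sample D_diff, giving the seed node and
    its first 2*j neighbours in the embedded graph.
    """
    tree = cKDTree(D_diff)
    t0 = D_diff[seed_idx]
    nb = D_diff[tree.query_ball_point(t0, r)]       # B(t0; r)
    eigvals, eigvecs = np.linalg.eigh(np.cov(nb.T))
    U = eigvecs[:, np.argsort(eigvals)[::-1][:j]]   # leading j eigenvectors
    nodes = [t0]
    for l in range(j):
        for sign in (+1.0, -1.0):
            z = t0 + sign * eta * r * U[:, l]
            _, idx = tree.query(z)                  # snap to the sample
            nodes.append(D_diff[idx])
    return np.array(nodes)
```

The returned points become the initial node set Ṽ, with edges from the seed to each snapped candidate.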

Expansion phase
Given a node v_i in V_1^E, the eigen-decomposition of its neighbourhood B(ṽ_i; r) ∩ D̃ is performed, yielding the first j leading eigenvectors g_1, ..., g_j. To preserve the crawling directions, the eigenvectors u_l^p of the parent node are projected onto the tangent space T_{ṽ_i}M = span{g_1, ..., g_j} and renormalized to unit length:

u_l = P u_l^p / ||P u_l^p||.

Here P = VV^+ is the orthogonal projection operator onto T_{ṽ_i}M, with V the matrix containing the g_l as columns and V^+ its pseudo-inverse (see Fig. 7). As in the initialization step, new points are then computed using equation (12) and their closest neighbours t̃_l^± in the data set D̃ are stored as candidate nodes to be added to Ṽ. The result of the expansion phase for a given iteration is shown in Fig. 6. The green points are the new candidates, and the dashed black lines represent the principal crawling directions. Construction of the set of new candidate nodes C(v_i) = {t̃_l^±, l = 1, ..., j} around v_i terminates the expansion phase.
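The direction-preserving projection above is a few lines of linear algebra; a sketch with illustrative argument names:

```python
import numpy as np

def project_directions(U_parent, G):
    """Expansion phase: project the parent's crawling directions onto
    the current local tangent space span(g_1..g_j) and renormalize.

    U_parent : (d, j) parent directions as columns
    G        : (d, j) local leading eigenvectors as columns
    """
    P = G @ np.linalg.pinv(G)   # orthogonal projector V V^+ onto span(G)
    W = P @ U_parent
    return W / np.linalg.norm(W, axis=0, keepdims=True)
```

Without this projection, the sign ambiguity of local eigenvectors could make the crawl step back towards its parent instead of advancing along the manifold.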

Contraction phase
In the contraction phase we check if the candidate nodes, together with Ṽ, can be collapsed into a smaller set of nodes.
For every new candidate t̃_c ∈ C(v_i) from the expansion phase, we check whether it lands within a small "tolerance" neighbourhood B(ṽ_e, β·r) of an already existing node ṽ_e of G̃. Here 0 < β < η < 1 is the tolerance scale parameter. If this is the case, the candidate is not added as a new node; instead, an edge connecting v_i to the existing node v_e is created (if not already present).

Initialization of the latent graph embedding f(v; W)
The node sets V and Ṽ will be used to initialize the embedding f(v; W) (equation (1)) of the abstract latent space G into the data space R^d by setting the weight matrix W through linear regression, so that f(v_i; W) ≈ ṽ_i for all corresponding node pairs. While the abstract graph G is unweighted, the edges in G̃ are weighted by the distance between the two connected nodes.
If needed, one can regularize the parameters W by applying ridge regression instead of linear regression, minimizing the cost function

E(W) = Σ_{i=1}^K ||f(v_i; W) − ṽ_i||² + Σ_{j=1}^M χ_j ||w_j||²,

where the w_j are the columns of W, and solving the system of equations

(Φ^T Φ + X) W^T = Φ^T Ṽ. (18)

Here, Φ is the (K × M) matrix of basis-function activations, Ṽ is the (K × d) matrix whose rows are the embedded node positions ṽ_i, and X is the diagonal matrix having as diagonal elements the regularization parameters {χ_j}_{j=1}^M. The solution to the system in equation (18) is given by the normal equation [37]:

Ŵ^T = (Φ^T Φ + X)^{−1} Φ^T Ṽ.

In order to recover a smooth mapping function with small optimal weights Ŵ, in our tests we chose χ_j = χ = 10^{−5} for all j = 1, ..., M.
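The ridge-regression initialization of W amounts to one regularized linear solve; a sketch with a common regularization strength χ for all basis functions, as used in the text:

```python
import numpy as np

def fit_embedding(Phi, V_emb, chi=1e-5):
    """Initialize W of f(v; W) = W * phi(v) by ridge regression, so that
    f(v_i; W) matches the embedded node positions.

    Phi   : (K, M) basis-function activations for the K latent nodes
    V_emb : (K, d) embedded node positions in data space
    chi   : regularization strength (chi_j = chi for all j)
    """
    M = Phi.shape[1]
    A = Phi.T @ Phi + chi * np.eye(M)        # (Phi^T Phi + X)
    W_T = np.linalg.solve(A, Phi.T @ V_emb)  # normal-equation solve
    return W_T.T                             # W is (d, M)
```

The small diagonal term keeps the solve well conditioned even when neighbouring basis functions overlap strongly.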

Local curvatures
As outlined in [27], having obtained a smooth embedding of the latent space, we can now estimate local curvatures along the edges of the abstract graph. Consider a node v_0 ∈ V and its edge e_{0,1} connecting it to a node v_1. Since G is an unweighted graph, we assume, without loss of generality, that the weight on the edge e_{0,1} is w_{0,1} = 1. We would like to estimate the effect of a small perturbation 0 < h ≪ 1 of the position of the node v_0 along the edge e_{0,1}, relative to the basis-function centres c_m, m = 1, ..., M.
We can consider e_{0,1} as a segment of unit length and estimate the change in tangent vectors as we move along the embedded image of e_{0,1}. It is crucial that the tangent vectors have normalised length (in our case, unit vectors) to remove the effects of parameterisation speed. The tangent vectors are estimated using finite differences. To that end, we insert five new nodes x_1, ..., x_5 along the edge e_{0,1} and connect them in a chain from node v_0 to node v_1, with edges e_{i,i+1} of weight h, where, as mentioned above, h > 0 is a small perturbation parameter (in this study, h = 0.01). The chain of newly defined nodes, connecting v_0 and v_1, replaces the edge e_{0,1} and acts as a discretization of the continuous segment in a small neighbourhood of the node v_0. We define a discretized "lifted line" ω(x_i) = f(x_i; W) given by the mapping f through the new nodes x_i, i = 1, ..., 5. Given this discretization, we can approximate the unit tangent vectors at ω(x_2) and ω(x_4) via central differences:

T(x_2) = (ω(x_3) − ω(x_1)) / ||ω(x_3) − ω(x_1)||, T(x_4) = (ω(x_5) − ω(x_3)) / ||ω(x_5) − ω(x_3)||. (21)

The curvature at ω(x_3) can be evaluated through the difference of the unit tangent vectors at ω(x_2) and ω(x_4), relative to the geodesic distance from ω(x_2) to ω(x_4), which can be written as

ϒ(x_3) = ||T(x_4) − T(x_2)|| / (||ω(x_3) − ω(x_2)|| + ||ω(x_4) − ω(x_3)||). (22)

This perturbation-based estimation can be regularised by observing that, for sufficiently small h, the distances between the embedded neighbouring nodes are approximately equal and can be represented by a common value h̃, leading to

ϒ(x_3) = ||T(x_4) − T(x_2)|| / (2 h̃). (23)

In practice, we calculate the mean length ℓ̃ of the edges of the abstract latent graph embedded in the data space and set h̃ = h · ℓ̃. The norm ϒ(x_3) quantifies the directional curvature of f in the vicinity of node v_0, along the edge e_{0,1}.
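The regularised curvature estimate of equation (23) reduces to a short function once the five lifted chain points are available; a sketch:

```python
import numpy as np

def edge_curvature(omega, h_tilde):
    """Directional curvature from five lifted chain points omega(x_1..x_5).

    Unit tangents at x_2 and x_4 are estimated by central differences;
    the curvature at x_3 is the norm of their difference divided by the
    separation 2*h_tilde (the regularised form of equation (23)).
    """
    omega = np.asarray(omega, dtype=float)
    t2 = omega[2] - omega[0]
    t2 /= np.linalg.norm(t2)            # unit tangent at omega(x_2)
    t4 = omega[4] - omega[2]
    t4 /= np.linalg.norm(t4)            # unit tangent at omega(x_4)
    return np.linalg.norm(t4 - t2) / (2.0 * h_tilde)
```

On points sampled from a circle of radius R with arc spacing h̃, this estimate returns approximately 1/R, consistent with the concentric-circles experiment described below.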
To demonstrate the workings of our local curvature estimation, we first performed a set of controlled experiments on concentric circles with known radii. We expect the estimated curvatures to approximate the inverse radius. After generating points on circles of radii r_0 = 1, 2, 4, we performed crawling and latent graph construction on each circle manifold. We then evaluated curvatures on all edges of the embedded graphs. The results are presented in Fig. 9. As expected, the curvatures closely follow the r_0^{−1} relation: the curvatures on circles of radii 1, 2 and 4 are approximately 1, 0.5 and 0.25, respectively. As a further illustrative example, we consider the case of a 2-dimensional manifold M, a "sphere with a bump", embedded in R^3, shown in Fig. 10, left panel. The manifold M can be expressed in terms of a spherical component C_S and a "peak" component C_P. The component C_S is sampled by a uniform distribution U(θ, φ) over the angular components of the spherical coordinate system, (θ, φ) ∈ I_θ × I_φ = [0, 2π] × [−π/2, π/2], with fixed radius r = 1. The "peak" component is an additive term to the coordinate r generated by a Gaussian distribution N(μ_P, C_P), with μ_P = (θ_P, φ_P) = (π/2, −π/4) and covariance matrix C_P = diag(π/90, π/180). In other words, we define a radius r(θ, φ) varying over the angular components of the coordinates. A sample around M is then obtained by converting the spherical coordinates to Cartesian coordinates x, y, z using: x = r sin θ cos φ; y = r sin θ sin φ; z = r cos θ.
By applying manifold crawling and AGTM to this sample, we recover a smooth manifold through the basis-function formulation and compute, for every edge in the abstract latent graph, a value of the curvature as defined in equation (23). Fig. 10 (central panel) presents the extracted graph with edges colour-coded by their curvature (we only show the curvature for the portion of the graph containing the peak and the adjacent spherical structure). The visualization in Fig. 10, right panel, presents the same portion of the graph as in the central one, but unfolded on a plane. Here the central region captures the extreme curvature of the culminating part of the peak. The curvature decreases to a minimum where the Gaussian peak smoothly approaches the sphere's surface, but steadily converges to the constant curvature of the spherical surface at the outer edges of the planar representation. This behaviour can also be verified in the central panel and is to be expected by construction of the data set.

Multi manifold learning
Until now we have only considered a single noisy manifold embedded in R^d. We now extend our method to multiple manifolds of possibly different dimensionalities, embedded in a noisy background. We will use as an illustrative example the data set Q described in section 2 and shown in Fig. 2. The data is processed by removing the low-density background "noise" and producing a pair of corresponding data sets Q and Q̃, as outlined in section 2. Recall that while Q̃ contains samples of the noisy manifolds diffused to their "core" or "spine", Q collects the original samples distributed along the manifolds. In order to build the initial skeleton model of individual manifolds by Manifold Crawling (section 3.2) and the subsequent density model around it through Abstract GTM (section 3.1), we are missing only one piece of information: the intrinsic dimensionality of each manifold, given a characteristic scale r.

Local dimensionality estimation
Around each t̃_i ∈ Q̃ we perform local Principal Component Analysis (PCA) using the points from B(t̃_i; r) ∩ Q̃, obtaining the eigenvalues λ_1^i ≥ λ_2^i ≥ ... ≥ λ_d^i of the local covariance matrix. The dimensionality index of t̃_i used in [31] (limited to 3-dimensional data and derived from [38]), denoted here by δ_i^O, was obtained from the relative magnitudes of these local eigenvalues. Here, we suggest a general method for computing the dimensionality index of points distributed in spaces of arbitrary finite dimension d, based on the renormalized eigenvalues

λ̃_k^i = λ_k^i / Σ_{l=1}^d λ_l^i,

so that the normalized eigen-spectrum λ̃^i = (λ̃_1^i, ..., λ̃_d^i) lives in the standard simplex S (in fact, due to the imposed eigenvalue order, in its sub-simplex S_0). Considering S as the simplex of multinomial distributions, the appropriate Riemannian geodesic distance is the Fisher distance [39]. In particular, for any λ̃^k, λ̃^l ∈ S,

d_J(λ̃^k, λ̃^l) = 2 arccos( Σ_{m=1}^d sqrt(λ̃_m^k λ̃_m^l) ). (26)

Note that the vertex s_j of S_0, with j equal non-zero entries 1/j followed by zeros, corresponds to the ideal normalized eigen-spectrum of a j-dimensional neighbourhood. Hence, we quantify the degree to which the neighbourhood of t̃_i resembles a j-dimensional hyperplane in R^d by the closeness of λ̃^i to s_j in terms of the geodesic distance (26),

δ_i^G = argmin_{1 ≤ j ≤ d} d_J(λ̃^i, s_j).

We also propose a 'soft' and spatially smoothed version of the dimensionality index. This is done by positioning an isotropic kernel on top of each vertex e_j of S, with the kernel scale κ set to the geodesic distance between the equidistant point s_d ∈ S (the 'centre' of S) and any vertex of S, i.e. κ = d_J(e_j, s_d), for any j. As explained above, due to the imposed eigenvalue order, the normalized eigen-spectra λ̃^i live in the sub-simplex S_0 of the standard simplex S. To transform S_0 to S, we calculate the barycentric coordinates [40] α^i of λ̃^i with respect to the vertices of S_0. The eigen-spectrum λ̃^i is thus mapped to the point in S with barycentric coordinates α^i. This leads to a soft kernel-based dimensionality distribution P_i over the candidate dimensionalities j = 1, ..., d.
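The hard index δ_i^G can be sketched directly from the definitions above: normalize the local eigen-spectrum, then find the ideal spectrum s_j closest in Fisher geodesic distance. The eigenvalue triples in the example are illustrative:

```python
import numpy as np

def fisher_distance(p, q):
    """Fisher geodesic distance on the probability simplex:
    d(p, q) = 2 * arccos(sum_m sqrt(p_m * q_m))."""
    return 2.0 * np.arccos(np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0))

def dim_index(eigvals):
    """Hard dimensionality index: the j whose ideal eigen-spectrum
    s_j = (1/j, ..., 1/j, 0, ..., 0) is geodesically closest to the
    normalized local eigen-spectrum."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    lam = lam / lam.sum()               # renormalized spectrum in S_0
    d = len(lam)
    best_j, best_dist = 1, np.inf
    for j in range(1, d + 1):
        s_j = np.zeros(d)
        s_j[:j] = 1.0 / j               # ideal j-dimensional spectrum
        dist = fisher_distance(lam, s_j)
        if dist < best_dist:
            best_j, best_dist = j, dist
    return best_j

print(dim_index([5.0, 4.9, 0.01]))  # -> 2 (an essentially planar neighbourhood)
```

The soft, kernel-based variant replaces the argmin by a distribution over j, which is then spatially smoothed as described next.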
It may happen that the normalized index distribution P_i changes abruptly across neighbouring points due to low sample density (e.g., at the manifold edges). We therefore spatially smooth P_i using a Gaussian filter, 8 i.e. the smoothed normalized index distribution P̃_i(j) is the Gaussian-kernel-weighted average of P_l(j), where the sum is taken over diffused points t̃_l in the spherical neighbourhood of t̃_i of radius r. The smoothed dimensionality index of t̃_i is then δ^S_i = argmax_{1 ≤ j ≤ d} P̃_i(j). Knowing a-priori the true dimensionality of each point, we provide a qualitative analysis of the different dimensionality index formulations in the form of confusion tables shown in Fig. 12. The confusion tables present percentages of correctly and incorrectly classified points depending on their true dimensionality and their corresponding dimensionality indices. We have also included a "Noise" class containing points generated as the lower-density background noise, which are supposed to be filtered out in the initial pre-processing step outlined in section 2. Left, middle and right panels show confusion matrices of the indices δ^O_i, δ^G_i and δ^S_i, respectively. Compared with δ^O_i, reformulating the dimensionality index as proximity of the renormalized eigen-spectrum to the "ideal" eigen-spectrum for each dimensionality 1 ≤ j ≤ d (index δ^G_i) stabilizes the estimation. The index is further stabilized by spatial smoothing of the soft version of δ^G_i, provided by the index δ^S_i.
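The eigen-spectrum-based ('hard') index δ^G can be sketched as follows. This is a minimal illustration under our own naming conventions: the ideal spectrum s_j is taken as the uniform distribution over the first j coordinates, and the Fisher distance on the simplex is the standard 2·arccos of the Bhattacharyya coefficient.

```python
import numpy as np

def fisher_distance(p, q):
    """Fisher (geodesic) distance between two points of the probability simplex."""
    # Clip guards against tiny negative values from floating-point round-off.
    s = np.clip(np.sum(np.sqrt(np.clip(p, 0, 1) * np.clip(q, 0, 1))), -1.0, 1.0)
    return 2.0 * np.arccos(s)

def dimensionality_index(neighbourhood):
    """Hard dimensionality index of a point from its r-neighbourhood.

    The normalized eigen-spectrum of the local PCA is compared (in Fisher
    distance) with the 'ideal' spectrum s_j of a j-dimensional hyper-plane:
    s_j has weight 1/j on the first j coordinates and 0 elsewhere.
    """
    X = neighbourhood - neighbourhood.mean(axis=0)
    cov = X.T @ X / max(len(X) - 1, 1)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending order
    lam = eigvals / eigvals.sum()                      # point on the simplex
    d = len(lam)
    dists = []
    for j in range(1, d + 1):
        s_j = np.zeros(d)
        s_j[:j] = 1.0 / j
        dists.append(fisher_distance(lam, s_j))
    return int(np.argmin(dists)) + 1                   # index in {1, ..., d}
```

The soft and smoothed variants replace the argmin by kernel weights on the simplex vertices and a spatial average over the r-neighbourhood.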

Multi-manifold crawling
Having obtained the dimensionality index δ^S_i for each point t̃_i ∈ Q̃ (and hence also for the corresponding point t_i ∈ Q), we partition the two data sets according to their dimensionality indices, Q_j = {t_i ∈ Q | δ^S_i = j} and Q̃_j = {t̃_i ∈ Q̃ | δ^S_i = j}. For a given 1 ≤ j < d, the data sets Q_j and Q̃_j contain the noisy and "crisp" samples, respectively, along manifolds of intrinsic dimension j. While the Manifold Crawling algorithm of section 3.2 can be used to crawl and assist model building of a single manifold, the method can be extended to crawl and isolate multiple manifolds of a given dimensionality j. We initialize the "to be processed" data set R as Q̃_j. The Manifold Crawling algorithm is then run on R with scale parameter r. This results in an embedded graph G_1 and an abstract graph G̃_1, forming the skeleton of an individual manifold of dimensionality j, together with the corresponding manifold-aligned data A_1; the points of A_1 are then removed from R. Here we assume that the separation between manifolds in Q̃ is greater than r. The manifolds sampled in Q̃_j could be separated by larger distances, so that the previous crawling run may not have reached them. Therefore, while R ≠ ∅, another seed is picked at random from R and a new crawling procedure is initiated. This produces graphs G_2 and G̃_2, accompanied by the corresponding manifold-aligned data A_2. After reduction of R as described above, the whole process is repeated while R ≠ ∅. To guarantee smoothness of the AGTM density estimates, we conservatively set the ε-net parameter for basis function positioning in the abstract graphs (see section 3.1) to ε = 1. Due to the relatively modest curvature of the low-dimensional structures, the crawling step size parameter η (see section 3.2) was set to η = 0.75. The tolerance scale parameter in the contraction phase of the crawling procedure was set to β = 0.4. As mentioned earlier, we used r = 0.1 for the characteristic scale parameter.
Fig. 13 shows the embedded graphs G (black lines) and overlaid AGTMs (red and blue pdf isosurfaces) of the manifolds in the example data set described in section 2, grouped by the intrinsic dimensionality of the corresponding manifolds. The characteristic scale r can be set based on prior knowledge, or after inspection of the data set. Alternatively, the multi-manifold learning method can be used as a semi-automatic exploratory tool by domain experts, where the focus and characteristic scale r of the structures to be mined can be varied continuously, starting from capturing coarse structures in the data (larger r) and proceeding with more detailed analysis, as potentially more and more low-dimensional manifolds start to appear with decreasing r.
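The outer loop of the multi-manifold procedure (pick a seed, crawl one manifold, remove its points from R, repeat while R ≠ ∅) can be sketched as below. For brevity, a breadth-first expansion through r-neighbourhoods stands in for a full crawling run; the real algorithm additionally builds the embedded/abstract graph pair and the manifold-aligned data.

```python
import numpy as np
from scipy.spatial import cKDTree

def crawl_component(points, tree, seed, r, allowed):
    """Simplified stand-in for one Manifold Crawling run: breadth-first
    expansion through r-neighbourhoods starting from the seed. Only the
    set of reached points is tracked here."""
    reached = {int(seed)}
    frontier = [int(seed)]
    while frontier:
        nxt = []
        for i in frontier:
            for j in tree.query_ball_point(points[i], r):
                if j in allowed and j not in reached:
                    reached.add(j)
                    nxt.append(j)
        frontier = nxt
    return reached

def multi_manifold_crawl(points, r, seed=0):
    """Outer loop: pick a random seed from the 'to be processed' set R,
    crawl one manifold, remove its points from R, repeat while R != empty."""
    rng = np.random.default_rng(seed)
    tree = cKDTree(points)
    remaining = set(range(len(points)))
    manifolds = []
    while remaining:
        s = rng.choice(sorted(remaining))
        component = crawl_component(points, tree, s, r, remaining)
        manifolds.append(sorted(component))
        remaining -= component
    return manifolds
```

With manifold separation greater than r, each iteration isolates exactly one manifold, so the number of manifolds is recovered automatically.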

An efficient alternative to multi-manifold crawling
One can approximate the results of Multi-manifold crawling by constructing an ε-neighbourhood graph on specific samples of the data set Q̃_j, with ε set to r, the radius used for local PCA in the crawling algorithm. We first cover Q̃_j via L hyper-balls B(t_k, c), k = 1, 2, ..., L, of radius c = r/2, centred at points t_k ∈ Q̃_j located using the recursive algorithm of [20]. Hence, Q̃_j = ∪_{k=1}^L S_k, where S_k ⊆ B(t_k, c) ∩ Q̃_j. For every set S_k, we define its centre s_k as its sample mean, obtaining the set of centres S = {s_1, ..., s_L}. By design, the minimum Euclidean distance between any pair of distinct centres s_m, s_n ∈ S, m ≠ n, is bounded from below by the covering construction. We now compute the distances between all centres, obtaining a square, symmetric matrix D_{m,n} = d_E(s_m, s_n), m, n = 1, ..., L, from which we can recover an initial estimate of the adjacency matrix A of the graph G. The adjacency matrix has non-zero elements only where the corresponding elements of D are smaller than r. This (initial) adjacency matrix A tends to be overly connected, because it does not take into consideration the alignment of the tangent spaces of neighbouring partition centres. In order to include this geometrical information in A, for every set S_k we compute its eigen-decomposition and approximate the local tangent space to the unknown manifold M at the centre s_k by the first j eigenvectors u_1, ..., u_j, collected in the matrix U_k = [u_1, ..., u_j]. For each pair of connected points (whose corresponding entry in the initial adjacency matrix is 1), we compute the angle between the corresponding tangent spaces [41] as |Θ_{m,n}| = arccos(|det M_{m,n}|), where M_{m,n} = U_m^T U_n ∈ R^{j×j}. We retain the existing edge only if the angle between the connected points satisfies |Θ_{m,n}| ≤ π/3. Alternatively, one could define an additional hyper-parameter 0 < T_θ < 1, so that smaller or larger angles are tolerated in the construction of the graph. The condition on the final adjacency matrix would then be A_{m,n} = 1 if and only if D_{m,n} < r and |det M_{m,n}| ≥ T_θ. In our case, we simply imposed T_θ = 2/3.
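The adjacency construction with tangent-space pruning can be sketched as follows. Helper names are ours; the recursive covering step of [20] is replaced here by user-supplied partition cells, from which centres and tangent bases are computed.

```python
import numpy as np

def tangent_basis(cluster, j):
    """First j principal directions of a partition cell (the columns of U_k)."""
    X = cluster - cluster.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:j].T                                   # shape: d x j

def pruned_adjacency(centres, bases, r, T_theta):
    """epsilon-graph on partition centres, with edges removed when local
    tangent spaces are misaligned: keep edge (m, n) only if the centres are
    closer than r AND |det(U_m^T U_n)| >= T_theta."""
    L = len(centres)
    A = np.zeros((L, L), dtype=int)
    for m in range(L):
        for n in range(m + 1, L):
            if np.linalg.norm(centres[m] - centres[n]) < r:
                M = bases[m].T @ bases[n]             # j x j alignment matrix
                if abs(np.linalg.det(M)) >= T_theta:
                    A[m, n] = A[n, m] = 1
    return A
```

Since |det(U_m^T U_n)| = cos of the angle between the two j-dimensional subspaces, the threshold T_θ directly controls the tolerated tangent-space misalignment.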
Having constructed the adjacency matrix A for a set Q̃_j, we split the graph G corresponding to A into its maximal connected components K_c = (V_c, E_c), c = 1, ..., N_G. Here, V_c and E_c are the sets of centres (nodes) and edges, respectively, of the component K_c. For each component we define the corresponding abstract graph K̃_c = (Ṽ_c, Ẽ_c) as in the previous section. We can now use the N_G extracted pairs of embedded and abstract graphs as initialization to AGTM, as an alternative to the Multi-manifold crawling initialization presented in the last section.
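Splitting G into its maximal connected components is a standard graph operation; a possible sketch using SciPy:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def split_into_components(A):
    """Split the graph G given by adjacency matrix A into its maximal
    connected components K_c, returning one list of node indices per
    component. Each index list, with the edges it induces, defines one
    embedded graph to be paired with its abstract counterpart for AGTM
    initialization."""
    n_comp, labels = connected_components(csr_matrix(A), directed=False)
    return [np.flatnonzero(labels == c).tolist() for c in range(n_comp)]
```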

Experimental comparison to existing methods
While multi-manifold AGTM covers both multi-manifold learning and probabilistic modelling simultaneously, most methodologies in the literature focus on one or the other aspect of capturing spatial structures in point clouds. We therefore split the evaluation of AGTM into comparative experiments along two lines: (1) recovery of a collection of smooth manifolds of possibly different dimensionalities and (2) full probabilistic modelling of the data distribution. We will use two main data sets. The first, D_1 = Q, is the data set presented in section 2, composed of six sampled manifolds of different dimensionalities embedded in R^3. We recall the presence of two non-linear 1-dimensional manifolds (curved spiral and hyperbolic arm) and four 2-dimensional manifolds (toroid, Möbius strip, parabolic cap, S-shape). The second data set, D_2, consists of three interlocked noisy toroids (Fig. 14). Both data sets are embedded in uniformly distributed noise over the domain. All methods in this comparison are applied to the data sets after the initial filtering methodology (section 2), obtaining both a diffused D̃_i and a noisy D_i version of each data set.

Multi-manifold learning
In this section we will compare Manifold Crawling with the efficient alternative graph construction presented in section 4.3, as well as with two main algorithms found in the literature for which code is publicly available: Sparse Manifold Clustering and Embedding (SMCE) [42] and Low Rank Neighbourhood Embedding (LRNE) [43]. We test the four methodologies on data sets D_1 and D_2 in order to verify their ability to deal with multiple manifolds of different dimensionalities and topologies. Both SMCE and LRNE are designed to cluster a mixture of low-dimensional manifolds embedded in a higher dimensional space, while recovering a low-dimensional embedding for each manifold. In both methods the number of manifolds in the data set has to be provided by the user. In contrast, Multi-manifold crawling and its alternative graph construction recover the number of manifolds in the data set automatically, once the characteristic size of the manifolds is provided. Fig. 15a presents a qualitative comparison 9 of the embeddings recovered by LRNE (second column), the efficient alternative to Manifold Crawling (third column) and Manifold Crawling (fourth column) on data set D_2. While the clustering performance is similar for all methods (first row), the embeddings recovered by LRNE do not explicitly display the properties of the three toroids. The graphs recovered both by Manifold Crawling and by its alternative graph construction provide much more insight into the topological structures underlying the sampled noisy manifolds. We have also tested the stability of Multi-manifold crawling and of the ε-graph alternative with respect to (hyper-)parameter settings. To that end, we evaluated the number of recovered manifolds over a range of hyper-parameter values (Fig. 15b, upper row). The blue boxplots show the statistics over the number of manifolds recovered by the ε-graph construction, for 30 different samplings of data set D_2 at a given radius.
The black compact boxplots 10 show the same measure for 30 iterations of Multi-manifold crawling over the same data set at the same radius, for different values of the parameter β. Not surprisingly, Multi-manifold crawling displays a higher stability than its alternative over a range of values of the parameter r, while little change is due to the parameter β for a given radius. While both methods converge to the correct number of manifolds (three in this case) starting at r = 0.1, the efficient ε-graph method vastly overestimates the number of manifolds when the radius is too small. The computational times are shown in Fig. 15b, lower row. This discrepancy in the quality of the results stems from the imposition, in the alternative methodology to crawling, of a "hard" angular threshold that controls the variability of the angles between neighbouring tangent spaces. Even a slight deviation from this threshold may prevent two nodes belonging to the same manifold from being connected. The efficient alternative critically relies on angle relations between local PCA subspaces; these subspace decompositions become increasingly unreliable with decreasing radius, while no such reliance on subspace angles is needed in the crawling procedure.
9 Parameters of each method were tuned to optimize performance on D_2.
10 For visualization purposes, the black compact boxplots are shown as hollow dots for the means and black vertical lines for the variances, plus smaller black dots for outliers where present.
The same comparison is shown in Figs. 16 and 17 for data set D_1. For both Crawling and its approximation, the clusters are obtained by applying the procedures outlined in sections 4.2 and 4.3, respectively. While in the case of data set D_2 LRNE was able to successfully recover the manifolds for a set of carefully tuned parameter configurations, the results proved unsatisfactory when applied to data set D_1. In particular, no parameter setting was able to distinguish between the Möbius strip and the parabolic cap, as shown in Fig. 16, top right panel, red points. All the resulting embeddings (Fig. 16, rightmost column), with the exception of the hyperbolic surface (cyan points, bottom panel) and the spiral (third top panel, blue points), fail to capture the low-dimensional structure of the corresponding manifold.
The first two columns of Fig. 16 illustrate the performance of Multi-manifold crawling and the ε-graph alternative on D_1. Although the representation of the 2-dimensional manifolds is coherent across the two methodologies, the ε-graph tends to prematurely break the 1-dimensional structures into smaller segments. This behaviour is confirmed over various values of the radius, as shown in Fig. 17, top row. Here, in contrast to data set D_2, convergence to the correct number of manifolds is achieved only by Multi-manifold crawling, while the efficient ε-graph alternative consistently overestimates this measure. This clearly demonstrates the enhanced accuracy and stability of Multi-manifold crawling, of course at the price of a higher computational time (Fig. 17, bottom row).
The application of SMCE with the set of parameters in Table 2 has proven unsuccessful on both data sets D 1 and D 2 .
In order to perform at least a qualitative comparison with our methodology, we applied Manifold crawling to the data set described in [42]. The data set consists of two noiseless, 3-foiled interlocked knots. The application of both methods to this data set proved equally satisfactory for various parameter settings. However, when a small amount of noise is introduced along the manifolds, the performance of SMCE greatly deteriorates, both in terms of clustering accuracy and quality of the embedding (Fig. 18a, top row). The results do not improve for any parameter setting when the data set is diffused and filtered beforehand. In contrast, Manifold crawling recovers the underlying ground truth by building two 1-dimensional, closed graphs and clustering the noisy data set accordingly (Fig. 18b).

Probabilistic modelling
In the last subsection we evaluated the construction of the embedded latent space of the AGTM model as a manifold detection and approximation method, specifically designed to handle general low-dimensional manifolds embedded in finite-dimensional Euclidean spaces. In this section we will treat AGTM (section 3) as a generative probabilistic model of data aligned along such (noisy) manifolds. As explained above, we needed to split the model evaluation into these two distinct strands since, to the best of our knowledge, we could not find a competitor that naturally spans both general manifold learning and probabilistic modelling.
We will compare AGTM (initialized with Manifold crawling or the efficient ε-graph alternative) with two probabilistic modelling techniques, namely Fast Parzen Window (FPW, [20]) and an unsupervised algorithm for learning finite mixture models from multivariate data [44]. For both data sets D_1 and D_2 we apply a 5-fold cross-validation scheme. Each method (except initialization via Multi-manifold crawling) is applied thirty times on the training folds for a given parameter setting, and the average log-likelihood per point of the resulting model is evaluated on the corresponding hold-out fold. The Multi-manifold crawling initialization is repeated five times on each training set, each run with a different β parameter. The results for the two data sets, multiple manifolds of different dimensionalities (D_1) and three interlocked toroids (D_2), are presented in Figs. 19b and 19a, respectively. The top row in each plot shows the average log-likelihood for different parameters as a boxplot over the initialization repetitions. The bottom row shows the computational time required for training each model. FPW is generally both better-scoring and faster than all other methods, although this is a natural consequence of it being an efficient unconstrained probabilistic modelling algorithm; the manifold structure was often lost in its set of mixture components. However, for both data sets, AGTM initialized with Multi-manifold crawling is comparable to FPW and outperforms both its alternative initialization and the algorithm proposed in [44] in terms of average log-likelihood. It is also worth noting that the computational time of AGTM is comparable to, if not lower than, that of the finite mixture method of [44]. In conclusion, AGTM provides a faithful probabilistic model of the data, while offering a clear interpretation of the low-dimensional general manifold structures along which the data is organized.
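The evaluation protocol, average held-out log-likelihood per point under k-fold cross-validation, can be sketched with a simple fixed-bandwidth Gaussian KDE standing in for the fitted model (AGTM, FPW and the mixture model of [44] would each supply their own density; the function names and bandwidth handling are ours):

```python
import numpy as np
from scipy.special import logsumexp

def avg_holdout_loglik(data, h, k=5, seed=0):
    """k-fold cross-validated average log-likelihood per point, scored under
    a fixed-bandwidth Gaussian KDE fitted on the training folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    folds = np.array_split(idx, k)
    d = data.shape[1]
    scores = []
    for f in range(k):
        test = data[folds[f]]
        train = data[np.concatenate([folds[g] for g in range(k) if g != f])]
        # log p(x) = logsumexp_i log N(x; t_i, h^2 I) - log n_train
        sq = np.sum((test[:, None, :] - train[None, :, :]) ** 2, axis=-1) / (2.0 * h * h)
        ll = logsumexp(-sq, axis=1) - 0.5 * d * np.log(2.0 * np.pi * h * h) - np.log(len(train))
        scores.append(ll.mean())
    return float(np.mean(scores))
```

The log-sum-exp formulation avoids numerical underflow when test points are far from all kernel centres.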

Experiments on a jellyfish galaxy
We will demonstrate our methodology on the analysis of the formation of a curious astronomical object, a "jellyfish galaxy", through a detailed astrophysical simulation of its formation. The term "jellyfish galaxy" refers to an observed galaxy showing signs of gas stripping [45], whose signatures are a dense "head" of mainly gas and stars and an elongated, gaseous, star-forming tail. These galaxies are usually observed when falling into large clusters of other galaxies, where the hot ionized gas filling the cluster is able to strip away "tentacles" of relatively cold gas from the galactic body.
The technique described above can be a valuable tool to investigate the behaviour of physical quantities in the head and in the gaseous tail. We will focus our study on identifying star formation regions in the head and in the tail of the galaxy for two main reasons: i) presence of new stars born in the head is an important indicator of how much the galactic gas is affected by the stripping pressure; ii) since stars are created from dense and cold regions of gas, the presence (or lack thereof) of stars formed in the gaseous tail carries information about how much the galaxy gas is mixed with the hot gas in the surrounding environment.
To quantify the star formation we will measure the intensity of the [C ii] emission line. Recent observations with the Herschel Space Observatory showed a tight correlation between the intensity of [C ii] and other well known tracers of Star Formation Rate (SFR) [46,47]. The study can provide useful insights into the formation scenario of galaxies infalling into a cluster [48]. As an example, the dwarf galaxy NGC 1427A in the Fornax cluster provides an interesting case whose formation scenario is still unclear and for which a generally accepted interpretation is lacking [49][50][51].
Starting from an existing suite of simulations of dwarf galaxies evolving in a Fornax-like cluster environment [51], we chose a single simulated snapshot representing an irregular, gas-rich galaxy exhibiting an elongated, star-forming gaseous tail during intense ram pressure stripping. The simulation is performed using a modified version of the mixed N-body/Smoothed Particle Hydrodynamics (SPH) code GADGET-2 [52,53].
In order to simulate the evolution of the galaxy at high resolution while reducing the computational cost of simulating the whole galaxy cluster, we take advantage of the moving box technique [54]. In this formulation, we follow the evolution of the galaxy closely by immersing a cubic volume ("box") enveloping the galaxy in the gravitational potential of the galaxy cluster. The gravitational potential is modelled via a spherically symmetric, static Navarro-Frenk-White profile (NFW, [55]). The box position and orientation are updated at every time-step so that its x-axis is always tangential to the orbit of the galaxy. Also at each time-step, new gas particles are injected into the simulation box from its open "front" face, and existing ones are removed when close to the opposite side of the box. This injection-ejection technique is used to simulate the motion of the galaxy through the cluster's gas and is crucial for studying processes such as gas stripping. The density and temperature of the injected particles change according to the position of the box within the cluster, following the radial profiles described in [56] and assuming hydrostatic equilibrium of the gas.
This setup allows us to study the evolution of the galaxy in high detail, while being consistent with the characteristics of the ambient space (physical properties of the galaxy cluster) at a comparatively lower computational cost.
The computational model in SPH simulations is based on a particle formulation of hydrodynamics, where each particle samples physical properties (mass, temperature, density, etc.) of a volume of radius r_N, the radius of the sphere containing its N neighbouring particles (the smoothing length). A continuous distribution of the physical variables over the full domain is then obtained by spatial smoothing with Gaussian kernels centred on each particle [57]. Associated with each gas particle (representing the corresponding volume) are values of physical quantities such as density, temperature and pressure. An estimate of the [C ii] emission in this kind of simulation is obtained by using evolved quantities of the gas (metallicity, density and temperature) as inputs to chemical evolution models of the radiating gas, taking into account its ionization equilibrium and ion level occupation model [58,53]. Here we focus on the state of the galaxy after 2.5 Gyrs of evolution since its injection into the galaxy cluster. The data set consists of 135530 gas particles (observations), each having 11 observed variables, 12 including the estimated [C ii] emission for the volume of gas sampled by the particle.
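The scatter-type smoothing that turns particle samples into a continuous field can be illustrated as follows. This is a simplified sketch with Gaussian kernels and adaptive smoothing lengths; actual SPH codes differ in kernel choice and normalization, and the function name is ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def sph_field(positions, values, query, N=32):
    """Kernel-smoothed field value at a query point: each particle i carries
    an adaptive smoothing length h_i (distance to its N-th nearest
    neighbour), and the field is the Gaussian-kernel-weighted average of the
    particle values."""
    tree = cKDTree(positions)
    dists, _ = tree.query(positions, k=N + 1)   # first neighbour is the particle itself
    h = dists[:, -1]                            # adaptive smoothing lengths r_N
    diff = query[None, :] - positions
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * h ** 2))
    return float(np.sum(w * values) / np.sum(w))
```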
Since the particles, by construction of the SPH methodology, fill the whole volume of the moving box, we first apply a physically motivated filtering by removing all particles whose density is ρ ≤ 10^−3 atoms/cm^3. This is a lower limit on the density for particles to be able to emit in the radio band, and thus to be observed as part of the galaxy. After this initial filtering, the remaining data set consists of 83772 particles. Their distribution over the box's volume is shown in Fig. 20.
Several low-dimensional structures are clearly visible, for example the long 1-dimensional manifold departing from the head and elongating along the x-axis of the simulation box. Visually inspecting the data set, we chose a radius r = 1 kpc as the characteristic scale parameter for the manifolds (shown as the small sphere in Fig. 20). The chosen radius is in agreement with the spatial resolution of recent observations of galaxies as distant as NGC 1427A. This parameter is fixed for preprocessing through diffusion and filtering (section 2), local dimensionality estimation (section 4.1) and manifold crawling. As mentioned earlier, the main body of the data set can be visually divided into head and tail parts. However, from the topological standpoint there is no justification for a clear segregation into 1-D manifolds in the tail and 2-D manifolds in the head. Dimensionality index estimation clearly identified 1-D structures in the tail, such as the elongated stream of particles starting from the head. However, points in the head were also predominantly identified as 1-D, 13 due to its complicated, intertwined filamentary structure. On the other hand, the distribution of 2-D points was more localized in the main body of the tail.

A multi manifold analysis of a dwarf jellyfish galaxy
We evaluate the performance of AGTM with both its initializations and compare it with the probabilistic modelling methods introduced above, using a 5-fold cross-validation scheme on the 1- and 2-dimensional gas particles. The average out-of-sample log-likelihood per point and the computational times are reported in Fig. 22 for all methods and their parameter settings. From the probabilistic modelling standpoint, all methods perform similarly, with FPW providing the best score. Both initializations of AGTM are comparable to the results obtained by the method described in [44], at lower computational effort. Initialization by crawling also shows a high stability of the results when compared to both its alternative initialization and the competing methodology. The ε-graph initialization, despite its larger sensitivity to the randomness of the sampling process, reaches higher values of the average log-likelihood than the initialization obtained via crawling. This behaviour is explained by the tendency of the former to break manifolds into sub-structures. Having more structures covering the same spatial region leads to a better performance of the probabilistic modelling algorithm. However, the information about low-dimensional structures retained by the sub-structures gradually deteriorates as this phenomenon occurs more frequently. The limiting case of this information loss is represented by FPW, where the distribution of the data set is globally captured with high efficiency, but no additional information on the low-dimensional local structures can be inferred. Having obtained the multi-manifold probabilistic profile of the gas particles in the tail of the jellyfish galaxy, it is possible to perform various kinds of detailed analysis of how physical properties vary along the manifolds. Here we concentrate on the curvature (Fig. 23), as defined in section 3.2.1, and on the star formation potential, by analysing the behaviour of the emission line [C ii] over the 1-D and 2-D structures in the jellyfish tail shown as black dots in Fig. 21, left and right panels, respectively: the gaseous stream of particles departing from the head and reaching half way through the tail, and the predominant 2-D structure in the tail. Fig. 23, left panel, shows two regions of high curvature for the 2-dimensional manifold presented in Fig. 21, right panel. The far right region of intense curvature is located towards the end of the tail, where the gas motion is more chaotic due to the motion of the galaxy through the halo of the galaxy cluster. However, the spherical region on the left side of the manifold, presenting a coherent curvature throughout its elongation (top circle of Fig. 23, left panel), suggests, as a possible cause of formation, an isotropic expansion typical of supernova remnants.
12 The number of observed variables is larger than the one presented here. We decided to focus on the ones relevant for our analysis.
13 We use loose terminology, referring to points with dimensionality index j as j-D points.
Taking advantage of the probabilistic nature of the AGTM model, we show in Fig. 24a the [C ii] emission profile along the 1-D manifold, computed at each graph node v_j as a weighted average over the particles t_i in the manifold, where the weights are the posterior probabilities of the node v_j, given the particles t_i. The analogous figure for the 2-D structure is presented in Fig. 24b.
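The posterior-weighted node profile can be computed as sketched below; the responsibilities p(v_j | t_i) are assumed to come, in log form, from the trained AGTM (the function name is ours).

```python
import numpy as np

def node_profiles(log_resp, quantity):
    """Posterior-weighted average of a physical quantity at each graph node:
    profile[j] = sum_i p(v_j | t_i) q(t_i) / sum_i p(v_j | t_i),
    with log_resp[i, j] the (unnormalized) log responsibility of node v_j
    for particle t_i."""
    # Normalize responsibilities over nodes j for each particle i,
    # subtracting the row maximum for numerical stability.
    resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    total = resp.sum(axis=0)            # total posterior weight per node
    return (resp.T @ quantity) / total
```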
In the 1-D case, the manifold is located at the outskirts of the jellyfish (Fig. 20, left panel), meaning that during the evolution it is more exposed to the surrounding gas of the galaxy cluster. This implies that the manifold is subject to a higher ram pressure than the tail, leading to a higher density and a lower temperature of the gas, the necessary conditions for the formation of new stars. These conditions are reflected in an increase of the [C ii] emission line over the middle section of the manifold, 3 < x < 6, thus informing us of an enhanced Star Formation Rate compared to the rest of the manifold. The 2-D structure shows an overall constant [C ii] intensity, whereas the region at 5 < x < 9 presents sharply higher values. The shape of this region is particularly interesting: it is, in fact, a hole with an almost spherical section. This structure, detected with AGTM and Manifold Crawling and showing strong [C ii] emission at its boundary, is the remnant of a supernova explosion. This process is modelled in the simulation via an injection of 10^51 erg of energy over a short amount of time and a transfer of metallic elements (in the case of our simulation, the model tracks iron and magnesium [53]) to the neighbouring gas particles. The metal-enriched gas particles (as in the surroundings of a supernova explosion) are then able to cool down more efficiently and show a strong [C ii] emission line.
Our methodology provides a powerful tool for extracting such information from the morphology of the gas particles and can be used to effectively calibrate feedback models 14 in simulations.
Such a detailed analysis of low-dimensional structures (remnants of galaxy interactions) is not currently possible with the tools routinely used to calibrate and analyse astrophysical simulations of galaxy evolution. As mentioned in section 4.2, the technique presented in this paper can be used as a semi-automatic exploratory tool by domain experts, where the focus and characteristic scale of the structures to be mined can be varied continuously, with the physical properties of interest analysed (after the necessary computations) and studied on the fly.

Detecting manifolds of higher dimensions
We performed an additional experiment on a 3-dimensional torus embedded in four dimensions. The torus is parametrized in the angle space (θ, φ, ψ) ∈ [0, 2π] × [0, 2π] × [0, 2π].
14 A feedback mechanism is any process that allows the exchange of energy, matter and/or momentum among galaxy components.
The 3-torus is covered by sampling the angle parameters with a large number of uniformly distributed values. We then applied the same procedure for producing a sample from the noisy manifold as used in the previous synthetic data experiments. After pre-processing the noisy data set by SAF and filtering, we proceeded to estimate the dimensionality index of every remaining point, finding that most of the structure was retained as 3-dimensional points. We then applied the Crawling algorithm, constructing an abstract-embedded pair of graphs. The embedded latent space is shown in Fig. 25a, projected onto two different triplets of dimensions. The global structure is clearly recovered through crawling. However, given that the abstract latent space was induced from a diffused sample of points not regularly covering the 3-torus, the internal structure of the manifold is not easily identifiable. To demonstrate more clearly the ability of the crawling algorithm to recover the underlying 3-dimensional manifold structure, we let the crawling operate on a data set created as a regular grid of points lying on the 3-torus. This way we can directly compare the ground-truth embedded latent space graph (obtained by connecting neighbouring grid points in the 3-torus topology, shown in a sub-sampled version in Fig. 25b, left panel) to the one recovered by crawling (Fig. 25b, right panel). The two embedded graphs (shown here in the projection corresponding to that of Fig. 25a, left panel) clearly sample the same manifold structure. As this example shows, the methodology can in principle be applied to higher dimensional manifolds embedded in even higher dimensional spaces. This remains true as the ambient dimension increases, provided the manifold is well sampled and local neighbourhoods are populated enough to perform PCA.
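For reproducing such an experiment, one possible explicit embedding of the 3-torus in R^4 is the iterated torus construction sketched below. This is an illustrative choice of parametrization and radii (R_1 > R_2 + R_3, R_2 > R_3), not necessarily the one used in this work.

```python
import numpy as np

def torus3_sample(n, R1=3.0, R2=1.0, R3=0.3, seed=0):
    """Sample points from an iterated-torus embedding of the 3-torus in R^4.

    Angles are drawn uniformly from [0, 2*pi)^3; the construction nests a
    small tube (radius R3) inside a torus tube (radius R2) inside a large
    circle of revolution (radius R1)."""
    rng = np.random.default_rng(seed)
    th, ph, ps = rng.uniform(0.0, 2.0 * np.pi, size=(3, n))
    rho2 = R2 + R3 * np.cos(ps)        # radius of the innermost tube
    rho1 = R1 + rho2 * np.cos(ph)      # radius of the outer revolution
    return np.stack([rho1 * np.cos(th),
                     rho1 * np.sin(th),
                     rho2 * np.sin(ph),
                     R3 * np.sin(ps)], axis=1)
```

The sample satisfies the implicit relation (sqrt((sqrt(x1² + x2²) − R1)² + x3²) − R2)² + x4² = R3², which can be used to verify the generator.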
However, the computation time of the crawling algorithm is bound to increase, since at each expansion phase, 2 j new children node estimates are created per parent node.

Conclusion
We presented a novel, semi-automated framework for denoising, dimensionality estimation, multi-manifold extraction and manifold-aligned density estimation from complex data sets containing samples from noisy manifolds of diverse dimensionalities embedded in a noisy environment. The framework uses only a few easy-to-understand hyper-parameters, such as the characteristic scale, that can be manipulated to obtain an emerging picture of the multi-manifold structure of the data. We illustrated the workings of the methodology on two synthetic data sets containing a number of manifolds, including a toroid and a Möbius strip, embedded in an extremely noisy environment. For each manifold we were able to estimate its intrinsic dimensionality and, using Abstract GTM, to extract its embedded and abstract graphs through Manifold Crawling, while constructing a probabilistic model describing its density. We provided an efficient, less computationally intensive alternative to the Crawling algorithm. While competitive with other manifold learning algorithms, this alternative proved less accurate than the slower, but more stable, crawling-based graph construction. We showed how the local curvature of the recovered manifolds can be computed by taking advantage of the smooth mapping function formulation of AGTM. We also performed a detailed comparison with both multi-manifold learning and probabilistic modelling algorithms widely used in the literature. The multi-manifold learning alternatives proved sensitive to noise, or impractical when dealing with multiple manifolds of different dimensionalities. The probabilistic modelling techniques, although capable of representing the data sets globally, are not designed to explicitly capture the general noisy low-dimensional structures along which the data can be organized.
AGTM proved equally capable of modelling the global distribution of the data sets, while still carrying meaningful information about their underlying low-dimensional manifolds.
The methodology was then applied to a complex data set containing simulated gas volume particles from a numerical simulation of a dwarf galaxy interacting with its host galaxy cluster. We have shown how the extracted 1-D and 2-D manifolds can help us to understand the nature of star formation inside such disrupted dwarf galaxies. Besides many other possible applications, our methodology offers new exciting ways of analysing details emerging from astrophysical simulations that are not possible with the current tools commonly used in astronomy.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
For each manifold M_i, i = 1, ..., 9, we first sample a large number N_GT ∼ 10^4 of points from its ground truth M_i, densely covering the whole structure. Around each sample point on the manifold we position a hyper-ball of fixed radius r_GT = 0.04. Let us denote the union of these hyper-balls by B_GT. To generate a sample from a noisy manifold, we first envelope the ground-truth manifold with a cuboid in the ambient space. The cuboid is then covered with a large number N_Cub ∼ 10^6 of points sampled from the uniform distribution and collected in S_Cub. The sample from the noisy manifold is then S_Cub ∩ B_GT.
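This sampling scheme can be sketched directly. The enclosing cuboid is taken here as the ground-truth bounding box padded by r_GT, an assumption on our part; the membership test for S_Cub ∩ B_GT reduces to a nearest-neighbour distance query.

```python
import numpy as np
from scipy.spatial import cKDTree

def noisy_manifold_sample(ground_truth, r_gt, n_cuboid, seed=0):
    """Sample from the noisy manifold as S_Cub intersected with B_GT:
    uniform points in an enclosing cuboid are kept iff they fall within
    r_gt of some ground-truth point (i.e. inside the union of hyper-balls
    B(m_i, r_gt))."""
    rng = np.random.default_rng(seed)
    lo = ground_truth.min(axis=0) - r_gt      # padded bounding box (our choice)
    hi = ground_truth.max(axis=0) + r_gt
    cub = rng.uniform(lo, hi, size=(n_cuboid, ground_truth.shape[1]))
    dist, _ = cKDTree(ground_truth).query(cub, k=1)
    return cub[dist <= r_gt]
```

With a densely sampled ground truth, the retained points lie within r_GT of the continuous manifold, yielding a uniform noisy tube around it.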
After extensive testing of hyper-parameters for the SAF technique, we found that the best results were achieved with r = 0.05, μ = 10^−3 and N_iter = 5.