Stable topological signatures for metric trees through graph approximations

for metric trees. This bound is tight if the metric distortion obtained through the graph and its maximal edge-weight are small. Through a case study of gene expression data, we demonstrate that our newly introduced diagrams provide novel quality measures and insights into cell trajectory inference. © 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
For the past decade, persistent homology [16], the most prominently used and studied tool within the field of Topological Data Analysis (TDA) [6], has led to many new applications in supervised and unsupervised machine learning. Many of the data sets to which persistent homology has been successfully applied were already at least partially structured, in the form of a simplicial complex, i.e., a higher-dimensional generalization of a graph. Examples of these include brain networks [17], meshes [14], and images [2,27]. Persistent homology then tracks topological changes over a filtration, i.e., a nested sequence of subcomplexes of the original complex.
In the case of point cloud data, the data is often sampled from a topological structure, the knowledge of which provides tremendous insight into the underlying structure or data generating process. However, the underlying topology is often difficult to reveal, due to the high dimensionality of the data, or noise. Since point clouds lack a naturally induced simplicial structure, computing their persistent homology is mostly feasible through the Vietoris-Rips filtration [22]. Unfortunately, this type of persistence (a measure of prominence or relevance of a topological feature) is often insufficient, as it merely detects gaps, cycles, voids, and higher-dimensional holes in the model. Thus, it is impossible to distinguish between point clouds sampled from a linear ('I'-shaped) and a bifurcating ('Y'-shaped) topology through this method.
We therefore develop a new foundation for learning topological patterns through graph approximations. These graphs will be used as a simplicial representation of the data. As will be shown in this paper, they allow us to learn a wider variety of topological patterns in metric trees, in theory and in practice.

Contributions
• We provide an intuitive introduction to, as well as a formal theoretical foundation for, studying topological patterns through 0-dimensional persistence of arbitrary graph approximations (Section 2).
• We show under which conditions functions lead to a nontrivial stability result for graph approximations, guaranteeing that our true and empirical persistence diagrams are close (Theorems 2.1 and 2.2). We provide two such functions quantifying powerful topological features of metric trees: the eccentricity and normalized centrality (Corollary 2.1).
• We introduce a novel application of our signatures that goes beyond standard topological inference, providing novel quality measures and insights to the field of cell trajectory inference (Section 3.2).
• We summarize how our method leads to and opens up new possibilities for learning topological patterns (Section 4).

Background on persistent homology
The concept of persistent homology has its roots in the field of algebraic topology [18]. Its computation requires two things: a simplicial complex $K$, and a filtration $F$ defined on $K$. A simplicial complex can be seen as a generalization of a graph that, apart from 0-simplices (nodes) and 1-simplices (edges), may also include 2-simplices (triangles), 3-simplices (tetrahedra), and so on. A simplicial complex $K$ is furthermore closed under inclusion, i.e., if $\sigma \in K$ and $\tau \subseteq \sigma$, then $\tau \in K$.
Fig. 2a illustrates these concepts by means of a point cloud data set $D$ sampled from the unit circle. Here, the filtration equals the Vietoris-Rips filtration $(\mathrm{VR}^k_\epsilon(D))_{\epsilon \ge 0}$, where $\mathrm{VR}^k_\epsilon(D)$ contains every simplex on points of $D$ whose pairwise distances are less than or equal to $\epsilon$, and of dimension less than or equal to $k$. If $k = 1$, we simply refer to the complex as the (Vietoris-)Rips graph. The number of $k$-dimensional holes in a complex is expressed by the Betti number $\beta_k$. In this sense, a 0-dimensional hole is a 'gap', and $\beta_0$ corresponds to the number of connected components, $\beta_1$ to the number of loops, $\beta_2$ to the number of voids, and so on.
Persistent homology quantifies topological changes through the birth and death of these holes across the filtration. E.g., in Fig. 2a, every data point corresponds to the birth of a connected component at the start of the filtration. By increasing $\epsilon$, points get connected to each other, resulting in the death of many of these components. From around $\epsilon = 0.75$, the complex consists of one connected component, as well as a loop representing the underlying cyclic structure. Increasing $\epsilon$ further, this loop gets 'filled in' through the 2-simplices, resulting in its death. The idea behind persistent homology and persistence is that holes persisting for a long range of consecutive values $\epsilon$ represent significant features of the topology underlying the point cloud. This is illustrated by the persistence diagrams $D_k$ (one for each considered dimension $k \in \{0, 1\}$ of holes) in Fig. 2b. Such a diagram is a multiset containing a point $(b, d)$ for each hole that was born at $\epsilon = b$ and died at $\epsilon = d$.
By definition, d = ∞ if a hole never dies. These points are usually displayed at the top of the diagram. Furthermore, by convention, a persistence diagram contains every point on the diagonal.
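The birth and death bookkeeping described above can be sketched for dimension 0 with a union-find structure. This is a minimal illustration, not the implementation used in the paper: the function name `rips_h0_persistence` and the sample of twelve evenly spaced points on the unit circle are hypothetical, and the sketch assumes that an edge enters the Vietoris-Rips filtration when $\epsilon$ reaches its Euclidean length.

```python
import math
from itertools import combinations


def rips_h0_persistence(points):
    """0-dimensional Vietoris-Rips persistence pairs (birth, death).

    Assumption: an edge enters the filtration at its Euclidean length.
    Every point is born at scale 0; a component dies at the length of the
    edge that merges it into another one. One component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the surviving component
    return pairs


# Hypothetical sample: 12 evenly spaced points on the unit circle.
circle = [
    (math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
    for k in range(12)
]
diagram = rips_h0_persistence(circle)
```

For this sample, eleven components die at the length of the chord between neighboring points, and one component has $d = \infty$. Detecting the loop itself would require 1-dimensional persistence, which this sketch does not compute.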
To understand one of the most important concepts in TDA (and in this paper), i.e., stability, we first need to introduce some definitions [1,22].
Definition 1.1. Let $D$ and $D'$ be two persistence diagrams. The bottleneck distance between them is defined as
$$d_b(D, D') := \inf_{\phi} \sup_{x} \| x - \phi(x) \|_\infty,$$
where $\phi$ ranges over all bijections from $D$ to $D'$, and $x$ ranges over all points in $D$. Since the diagrams include the diagonal, [...] is the infimum of the $\epsilon$ for which there exists an $\epsilon$-correspondence [...]
Stability ensures that if two finite metric spaces are close, their persistence diagrams obtained through the Vietoris-Rips filtrations are close as well. More formally, if $(X, d_X)$ and $(Y, d_Y)$ are two finite metric spaces, then [11] [...] Stability results formulated through the ground truth topology also exist for the Vietoris-Rips filtration, but their formulation tends to be more complicated [22].
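For very small diagrams, the bottleneck distance can be evaluated by brute force, which makes the role of the diagonal explicit. The sketch below is only illustrative: the function name `bottleneck_distance` is hypothetical, the diagrams are assumed to contain finitely many off-diagonal points with finite coordinates, and the diagonal is modelled by adding to each diagram the diagonal projections of the points of the other one. The running time is factorial in the diagram size, so this is unsuitable for real data.

```python
import math
from itertools import permutations


def bottleneck_distance(dgm_a, dgm_b):
    """Brute-force bottleneck distance between two small finite diagrams.

    Each diagram is a list of (birth, death) points with finite coordinates.
    """
    def proj(p):
        m = (p[0] + p[1]) / 2.0  # closest diagonal point in the max-norm
        return (m, m)

    # Augment both sides with diagonal points so that they have equal size.
    left = [(p, False) for p in dgm_a] + [(proj(q), True) for q in dgm_b]
    right = [(q, False) for q in dgm_b] + [(proj(p), True) for p in dgm_a]

    def cost(u, v):
        (p, p_diag), (q, q_diag) = u, v
        if p_diag and q_diag:
            return 0.0  # matching diagonal to diagonal is free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # max-norm

    best = math.inf
    for perm in permutations(range(len(right))):
        worst = max(
            (cost(left[i], right[j]) for i, j in enumerate(perm)),
            default=0.0,
        )
        best = min(best, worst)
    return best


# Hypothetical diagrams: one prominent point each, plus a short-lived point.
d_first = [(0.0, 4.0), (1.0, 1.5)]
d_second = [(0.0, 3.0)]
distance = bottleneck_distance(d_first, d_second)  # 1.0
```

In this example the prominent points are matched to each other at cost 1, while the short-lived point (1, 1.5) is matched to the diagonal at cost 0.25, so the maximum over the matching is 1.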

Related work
Persistent homology has already been used extensively in (un)supervised machine learning problems. In this context, it can be regarded as a feature engineering method, where its resulting persistence diagrams correspond to topological signatures, encoding structural information at varying scales in the data. Our purpose is not to outperform these methods, but rather to extend them to become applicable to a wider variety of data sets for which learning topological patterns remains an important challenge, in our case metric trees.
The main novelty of our introduced stability result (Theorem 2.2) is its generality in terms of the type of graph approximation, instead of its generality in terms of the dimension of persistent homology. In case of metric trees, we will show that the restriction to 0-dimensional persistence is indeed sufficient for revealing multifurcations and leaves. However, the restriction to particular graphs such as Rips graphs is often unfavorable in case of non-uniform density across our point cloud.
That being said, it is worth pointing out the differences between our work and the following.

TDA through functions
The idea of TDA through functions defined on point cloud data, and in particular, the eccentricity function (Corollary 2.1), is not novel. Indeed, Carlsson [7] previously discussed that (regular) persistent homology through the Vietoris-Rips complex may miss out on finding meaningful structure in many examples of point cloud data. He proposed a refinement under the name of functional persistence. The idea is to apply regular persistence to a subset of the data, obtained through thresholding according to a user defined function. E.g., by applying regular persistence to a subset of points sufficiently far away from the center of the point cloud, one may be able to deduce a flare-structured topology. However, his introduction to functional persistence is rather brief, and this method mainly serves as a visual inspection tool for individual data sets.
Fig. 1. Point cloud data sets sampled from (Left) an H-structured and (Right) an X-structured topology. The ground truth models are shown in red. As the middle branch of the H-structured topology is short relative to the amount of noise in the data, its underlying topology becomes difficult to distinguish from an X-structured topology. The purpose of our current work is to theoretically and practically quantify that these patterns are similar. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Chazal et al. [10] and Oudot [22] extend persistent homology of metric spaces to metric spaces equipped with a real-valued function $f$. They vary the Vietoris-Rips complex $\mathrm{VR}^k_\epsilon(D_\epsilon)$ alongside the sublevel sets $D_\epsilon := \{x \in D : f(x) \le \epsilon\}$ on which the simplicial complexes are constructed. Both the simplicial complexes as well as the data on which they are constructed are indexed through the same parameter $\epsilon$. Although their provided stability result applies to persistent homology in any dimension [22, Th. 7.11], it is restricted to Rips based filtrations.
Furthermore, this method might only make sense for discovering topological features other than components and cycles whenever the distance metric on D and the functional values f take on a similar scale. In this case, topological features will generally appear less prominent, as higher weight edges will be added at later times ( Fig. 3 c). Furthermore, inclusion of all vertices would then mean the simplicial complex has to be grown until all pairs of nodes are connected by an edge, making this method computationally less efficient.
Finally, Carrière et al. [9] present a stability result for sublevel filtrations constructed from the ground truth and (a pair of) Rips graphs constructed from a point cloud approximation. In particular Lemma 3.3 by [9] is closely related to our main result ( Theorem 2.2 ), and our restriction to Lipschitz functions is inspired by this result. The exact differences between both results will be pointed out in Remark 2.2 , after our main theorem.

Persistent local homology
The idea of persistent local homology [3] is to infer topological properties of stratified spaces (including metric trees), by studying persistent homology of the data after removing a neighborhood $B_r(x)$ of a particular point $x$. This is very similar to the concept of functional persistence, as discussed above. Since it is rather difficult to pinpoint a single suitable radius $r$, this parameter is often varied as well, resulting in a 1-parameter family of persistence diagrams also known as a persistence vineyard. Unfortunately, the current theoretical analysis of this method is again restricted to particular filtrations, such as Rips based filtrations or filtrations based on the Delaunay triangulation, the latter of which is challenging to compute in higher dimensional data [4]. Furthermore, existing implementations for computing persistence vineyards are limited (e.g., the Dionysus 1 library in C++), and well-studied methods for comparing persistence vineyards (similar to the bottleneck distance) are lacking.

Mapper
The Mapper algorithm mainly serves as a data visualization method, and has been successfully applied to metric trees [21]. The algorithm itself does not directly provide topological signatures, as we do in this paper.
Mapper is quite sensitive to its parameters [19] . Some work to overcome this issue has been performed by Dey et al. [13] , under the name of multiscale Mapper . The idea is to track the changes in homology of the output of the Mapper algorithm across a varying parameter sequence. However, similar to persistent homology through the Vietoris-Rips filtration, this method only tracks changes in the number of connected components or cycles in the global model.
The result of Mapper is commonly a graph. Hence our main result ( Theorem 2.2 ) can also be applied to study how well these graphs preserve topological information of metric trees. In line with this approach, prior results do allow one to quantify the degree of (in)stability of topological features (including leaves) obtained, in case the used clustering method (one is required by the Mapper algorithm) coincides with obtaining connected components in Rips (sub)graphs [8] .

Metric graph reconstruction
The case studies in our paper are graph (tree)-structured topologies, previously studied by Aanjaneya et al. [1]. This work strongly connects to ours on a theoretical level, as we also formally define the metric distortion we obtain through our graph approximation through the concept of $\epsilon$-correspondences. The major difference is that we do not require any assumptions on the underlying topology to provide our theoretical guarantee, which is the bound on the distance between our true and empirical topological signature (Theorem 2.2). By contrast, Aanjaneya et al. [1] require that the metric distortion is bounded by a function of the shortest branch length of the underlying topology to guarantee its reconstruction. For example, one cannot guarantee the correct reconstruction of an H-structured topology if the noise in the data is too high relative to the length of the middle branch. In this case, it may become difficult to distinguish the underlying topology from an X-structured topology, as illustrated in Fig. 1.

Persistent homology through graph approximations
[...] Fig. 3a misses out on capturing any topological information other than the underlying model being connected. We can however equip $D$ with a function $f$ that expresses how far a point is from the data center. To this end, we first constructed a 10NN graph $G$ from $D$, and then computed its negative eccentricity function $f = -E_G$, where $E_G := \max_{v \in V(G)} d_G(\cdot, v)$.
After rescaling both $f$ and the shortest path distance metric $d_G$ on $G$ to $[0, 1]$, the Rips based signature presented by Chazal et al. [10] for the metric space $(D, d_G)$ equipped with the resulting normalized centrality function captures some additional structural information. The three 'leaves' present in the topology underlying $D$ correspond to the three most elevated points in the diagram (Fig. 3c). However, the components representing these leaves merge quickly before reaching the center of bifurcation, due to the addition of higher weight edges that are not present in $G$. In contrast to this, (0-dimensional) persistent [...]
The purpose of this section is to provide a more formal theoretical foundation for this last type of persistence through graph 'approximations'. The term 'approximations' is to be loosely interpreted, in the sense that we are given some graph that is meant to capture topological information of the data. This can be a Rips graph, $k$NN graph, minimum spanning tree, or any type of neighborhood graph constructed from the data. Furthermore, this may also be the result of a (graph) model inference method such as the Mapper algorithm.
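The type of persistence discussed in this section, 0-dimensional persistence of a sublevel filtration defined by a node function on a given graph, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the name `sublevel_h0_persistence` is hypothetical, a node is assumed to enter the filtration at its function value, an edge at the larger value of its two endpoints, and points with zero persistence are dropped. The toy graph is a Y-shaped tree with hand-computed negative hop-count eccentricities.

```python
import math


def sublevel_h0_persistence(values, edges):
    """0-dimensional persistence of a sublevel filtration on a graph.

    values: dict mapping each node to its function value g(node)
    edges:  iterable of (u, v) node pairs
    Returns sorted (birth, death) pairs; zero-persistence pairs are dropped.
    """
    neighbors = {v: [] for v in values}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    parent = {}  # union-find forest over the nodes seen so far
    birth = {}   # node -> its value; read at roots as the component's birth

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    for v in sorted(values, key=values.get):  # sweep nodes by increasing value
        parent[v] = v
        birth[v] = values[v]
        for u in neighbors[v]:
            if u not in parent:
                continue  # neighbor is not yet in the sublevel set
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: the component that was born later dies at this merge.
            old, young = (ru, rv) if birth[ru] <= birth[rv] else (rv, ru)
            if birth[young] < values[v]:
                pairs.append((birth[young], values[v]))
            parent[young] = old
    roots = {find(v) for v in values}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)


# Y-shaped tree: center "c" with three arms of two nodes each.
edges = [("c", "a1"), ("a1", "a2"), ("c", "b1"), ("b1", "b2"),
         ("c", "d1"), ("d1", "d2")]
# Negative hop-count eccentricity: leaves -4, inner arm nodes -3, center -2.
values = {"a2": -4, "b2": -4, "d2": -4, "a1": -3, "b1": -3, "d1": -3, "c": -2}
diagram = sublevel_h0_persistence(values, edges)
```

In this toy example, the three leaves give birth to three components at value -4. Two of them die when the arms meet at the center (value -2), and the third one never dies.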
In Section 2.1 , we will illustrate the concept of stability through graph approximations, and discuss the main obstacles for introducing an immediate stability result. In Section 2.2 , we prove a new stability result for metric trees.

Stability through graph approximations
The following theorem states that for any correspondence $C$ between the points in a metric space $(X, d_X)$ and nodes in a graph $G$, and functions $f : X \to \mathbb{R}$, $g : V(G) \to \mathbb{R}$, one may bound the bottleneck distance between the diagrams for $f$ and $g$ by a value $m = \max\{a, b\}$, measuring how well $f$ and $g$ preserve the connectivity in their respective sublevel filtrations under $C$.
Theorem 2.1. Let $(X, d_X)$ be a connected metric space, $G$ a graph, $f : X \to \mathbb{R}$ a tame function, and $g : V(G) \to \mathbb{R}$. Let $a, b > 0$, and suppose $C \subseteq X \times V(G)$ is a correspondence with the following properties: [...] where $\cdot \sim \cdot$ denotes that two points are connected by a path in their respective space (topological or graph), and $G[U]$ denotes the subgraph of $G$ induced by the nodes $U \subseteq V(G)$. Then $d_b(\mathrm{Dgm}_0 F_f(X), \mathrm{Dgm}_0 F_g(G)) \le \max\{a, b\}$.
Proof. As most definitions in this proof are unimportant for the rest of our paper, they will be omitted for conciseness. First, observe that $\mathrm{Dgm}_0 F_g(G) = \mathrm{Dgm}_0 F_{|g|}(|G|)$, where $|G|$ is a geometric realization of $G$ and $|g|$ is obtained by extending $g$ on $|G|$ through linear interpolation [15,23]. Now let $T_f$ and $T_{|g|}$ be the merge trees of $f$ and $|g|$, respectively [20]. Note that their elements (points) are equivalence classes. Let $\mu := \max\{a, b\}$, and consider the mapping [...] where $y$ is any node of $G$ such that $(x, y) \in C$. Also consider [...] where $(x, y) \in C$ for some endpoint $y$ of the segment in $|G|$ including $\tilde{y}$, for which $g(y) = |g|(y) \le |g|(\tilde{y})$. It immediately follows that $\alpha_\mu$ and $\beta_\mu$ are $\mu$-compatible maps. Furthermore, since by assumption the time increment needed for two points to become connected in one space does not become larger for their corresponding points (under $C$) after an initial increment by $\mu$, $\alpha_\mu$ and $\beta_\mu$ are both continuous and in particular well-defined. The result now follows from [20, Th. 3].
Theorem 2.1 cannot yet be interpreted as a stability result. We must still express how the distance between the diagrams depends on the closeness of $(X, d_X)$ and $G$.
However, even if $(X, d_X)$ and $G$ are arbitrarily close in the sense of an $\epsilon$-correspondence $C$, and $f : X \to \mathbb{R}$ and $g : V(G) \to \mathbb{R}$ are arbitrarily well-preserved under this correspondence, there is generally no guarantee that the diagrams are close as well. This is illustrated by two example models and their graph approximations in Fig. 4.
In the first example (Fig. 4a), we constructed the fully connected graph $G$ on a translated sample $D$ of a continuous linear-structured metric space $(X, d_X)$. Due to the absence of curvature, the metric space $(V(G), d_G)$ well-approximates $(X, d_X)$ in the sense of an $\epsilon$-correspondence (we omit an actual value of $\epsilon$ as we believe the concept is clear). Since $G$ is fully connected, one connected component will be born in the filtration, and it will never die. This is illustrated by the persistence diagram in Fig. 4b, where we defined the filtration through the negative eccentricity function of $G$. Both for the ground truth model, as well as for $G$, the eccentricity function provides a smooth transition from the (underlying) leaves towards the center. However, the sublevel filtration for $(X, d_X)$ will start at two connected components, that only merge at the center of $X$.
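The effect in the first example can be reproduced on a small scale by counting connected components of induced sublevel subgraphs $G[U]$. The following is a simplified sketch, not the data behind Fig. 4: the eleven collinear sample points, the threshold, and the helper name `count_components` are hypothetical. On collinear points, the complete graph with Euclidean weights and the path graph induce the same shortest path metric and hence the same eccentricities, yet their sublevel sets have different connectivity.

```python
def count_components(nodes, edges):
    """Number of connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the node set
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # new component found: explore it depth-first
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adjacency[v] - seen:
                seen.add(u)
                stack.append(u)
    return components


# Hypothetical sample of a line segment: collinear points at 0, 1, ..., 10.
positions = list(range(11))
# Negative eccentricity with respect to the metric on the sample.
g = {p: -max(p - positions[0], positions[-1] - p) for p in positions}

complete_edges = [(p, q) for p in positions for q in positions if p < q]
path_edges = [(p, p + 1) for p in positions[:-1]]

threshold = -8  # the sublevel set only contains nodes near both ends
sublevel = [p for p in positions if g[p] <= threshold]

n_complete = count_components(sublevel, complete_edges)  # 1
n_path = count_components(sublevel, path_edges)          # 2
```

The sublevel set consists of the three nodes nearest to each end. In the complete graph these six nodes are directly connected, whereas in the path graph they form two components that only merge once the central nodes enter the filtration.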
The second example (Fig. 4c) illustrates a 'finer' approximation of $(X, d_X)$ through the Rips graph $R_{0.1}(D) := \mathrm{VR}^1_{0.1}(D)$ constructed from $D$. We now defined a function $f$ (resp. $g$) on $X$ (resp. $D$) that values 1 at every single point, apart from one point near the center where it values 1. Again, the filtration for $R_{0.1}(D)$ starts with one connected component (including all but one point), that never dies. The filtration for the ground truth model starts off with two connected components that merge only at the center as before.
The takeaway of the examples above is that to ensure stability, we need two things. First, we need to formalize how well our graph $G$ approximates the topology of the underlying space, both through the concept of $\epsilon$-correspondences, as well as through a distance measure between nodes connected through an edge.
Given a weighting function $w : E(G) \to \mathbb{R}^+$, we will use the maximum weight $w_{\max} := \max_{e \in E(G)} w(e)$ for this purpose. In practice, $w_{\max}$ will be low if the data is sufficiently densely sampled and $G$ is a neighborhood graph. Second, the functions used to define the filtration must be such that if $\epsilon$ and $w_{\max}$ are small, so are $a$ and $b$ from Theorem 2.1. Inspired by Lemma 3.3 by Carrière et al. [9], we will consider Lipschitz functions, where a real-valued function $f$ on a metric space $(X, d_X)$ is $c$-Lipschitz if $|f(x) - f(y)| \le c \, d_X(x, y)$ for all $x, y \in X$.
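Both quantities mentioned above, the maximum edge weight and a Lipschitz constant of a node function with respect to the shortest path metric, are straightforward to evaluate on a given weighted graph. The sketch below is illustrative only: the helper names and the four-node weighted path are hypothetical, and the Lipschitz constant is computed empirically as the largest ratio $|g(u) - g(v)| / d_G(u, v)$ over all pairs of distinct nodes of a connected graph.

```python
import heapq


def shortest_path_lengths(adjacency, source):
    """Dijkstra: weighted shortest path distances from `source`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in adjacency[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def max_edge_weight(weighted_edges):
    """The quantity w_max: the largest weight over all edges."""
    return max(w for _, _, w in weighted_edges)


def empirical_lipschitz_constant(values, adjacency):
    """Smallest c with |g(u) - g(v)| <= c * d_G(u, v) for all node pairs."""
    c = 0.0
    for v in adjacency:
        for u, d in shortest_path_lengths(adjacency, v).items():
            if u != v:
                c = max(c, abs(values[u] - values[v]) / d)
    return c


# Hypothetical weighted path graph 0 - 1 - 2 - 3.
weighted_edges = [(0, 1, 0.5), (1, 2, 1.0), (2, 3, 0.5)]
adjacency = {v: [] for v in range(4)}
for u, v, w in weighted_edges:
    adjacency[u].append((v, w))
    adjacency[v].append((u, w))

# Eccentricity of each node with respect to the shortest path metric.
eccentricity = {
    v: max(shortest_path_lengths(adjacency, v).values()) for v in adjacency
}
w_max = max_edge_weight(weighted_edges)
c = empirical_lipschitz_constant(eccentricity, adjacency)
```

The eccentricity function is 1-Lipschitz with respect to the metric it is computed from, by the triangle inequality, so the empirical constant cannot exceed 1 here.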

A new stability result for metric trees
In this section, we provide two closely-related functions to ensure stability for tree-structured topologies through graph approximations. These will be the (negative) eccentricity and the normalized centrality , the latter of which is scale-independent. The true persistence diagrams for these functions are extremely informative for metric trees. The birth of a component will always occur through a leaf, and its death through either a multifurcation or the center of the tree ( Fig. 3 d).

Definition 2.1.
A metric tree is a path metric space (X, d X ) that is homeomorphic to a 1-dimensional stratified space, for which there is a unique path between every two points. The radius of X is rad (X ) := min x ∈ X max y ∈ X d X (x, y ) .

Theorem 2.2. Let $(X, d_X)$ be a metric tree, and $G$ a positively weighted graph such that there exists an $\epsilon_X$-correspondence $C$ between $(X, d_X)$ and $(G, d_G)$ [...]
Since the functional distortion $\epsilon_f$ and Lipschitz constant $c$ remain the same after negating both functions, it suffices to show that the inequality holds for $-f$ and $-g$.
Take any $(x, u), (y, v) \in C$, let $P_{x,y} \subseteq X$ denote the unique path from $x$ to $y$ in $X$, and let $(u = p_0, p_1, \ldots, p_l = v)$ be a shortest path from $u$ to $v$ in $G$. For any $0 \le i \le l$, take $q_i$ such that $(q_i, p_i) \in C$, with $q_0 = x$, $q_l = y$. Now arbitrarily take $t \in \mathbb{R}$.
[...] (they are not included). If $x = y$, take any $z \in P_{x,y}$ that minimizes $f(z)$ over $P_{x,y}$.
Observe that necessarily $f(z) < t$. Now let $i := \max\{0 \le i < l : P_{q_i, P_{x,y}} \cap P_{z,y} = \emptyset \lor z = q_i\}$, where $P_{q_i, P_{x,y}} \subseteq X$ denotes the unique path from $q_i$ to (its closest point on) $P_{x,y}$ in $X$. It follows that [...]. The result now follows from Theorem 2.1.
Remark 2.1. The proof of Theorem 2.1 suggests that we can obtain even stronger comparisons by looking at the interleaving distance between the resulting merge trees, instead of the 0-dimensional persistence diagrams. Indeed, Morozov et al. [20] provide an example of two distinct merge trees for which the corresponding functions have the exact same persistence diagram. Unfortunately, computing interleaving distances between merge trees is currently computationally more challenging than computing bottleneck distances between persistence diagrams [26] .

Remark 2.2. For Rips graphs $G = R_{3\delta}(D)$, the bound in Theorem 2.2 reduces to the bound in Lemma 3.3 by Carrière et al. [9] for zeroth-order persistent homology, whenever $\epsilon_X / 2 \le w_{\max} \le 3\delta$. However, our result applies to any graph, and does not require that $w_{\max}$ dominates $\epsilon_X / 2$. Intuitive examples for which this is important include minimum spanning trees.
The convexity radius $\rho(X)$ states that for any open metric ball in $X$ of radius less than $\rho(X)$, any two points $x, y$ in this ball are connected by a unique shortest path on $X$. Similar to Lemma 3.3 by Carrière et al. [9], we expect that our result can be generalized to arbitrary length spaces by bounding $\epsilon_X$ through a function of the convexity radius $\rho(X)$ of $X$.
The following can now be straightforwardly derived.
Corollary 2.1. Let $(X, d_X)$ be a metric tree, and $G$ a positively weighted graph such that there exists an $\epsilon$-correspondence $C$ between $(X, d_X)$ and $(G, d_G)$. Let $E_X := \max_{x \in X} d_X(\cdot, x)$ be the eccentricity function, and $C_X :=$ [...] be the normalized centrality function on $X$ (define $E_G$ and $C_G$ analogously). Then [...] where the last inequality holds if $C_X$ and $C_G$ are well-defined.
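On a finite weighted graph, the eccentricity $E_G$ follows directly from all-pairs shortest path distances. The sketch below illustrates this on a toy tree. The formula for the normalized centrality is not recoverable from the text above, so the sketch ASSUMES the rescaling $1 - E_G / \max E_G$, which is consistent with the stated scale-independence and with the requirement that the function be well-defined, but is not confirmed by this document. All function names are hypothetical.

```python
def all_pairs_shortest_paths(n, weighted_edges):
    """Floyd-Warshall on an undirected, positively weighted graph (nodes 0..n-1)."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def eccentricity(dist):
    """E_G(v): the largest shortest path distance from v to any node."""
    return [max(row) for row in dist]


def normalized_centrality(ecc):
    """ASSUMED definition 1 - E / max(E); not confirmed by the text above."""
    top = max(ecc)
    if top == 0:
        raise ValueError("undefined for a graph with a single node")
    return [1.0 - e / top for e in ecc]


# Hypothetical Y-shaped tree: center 0 with leaves 1, 2, 3 at distances 1, 2, 3.
tree = [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0)]
ecc = eccentricity(all_pairs_shortest_paths(4, tree))
centrality = normalized_centrality(ecc)

# Rescaling all edge weights leaves the assumed centrality unchanged.
scaled = [(u, v, 10.0 * w) for u, v, w in tree]
centrality_scaled = normalized_centrality(
    eccentricity(all_pairs_shortest_paths(4, scaled))
)
```

Under this assumed definition the center of the toy tree obtains the highest value (about 0.4) and the two most remote leaves the value 0, and the values do not change when every edge weight is multiplied by the same constant.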

Experiments
In this section, we show how Theorem 2.2 can be applied in practice. We first illustrate this through synthetic data sampled from metric trees in Section 3.1 . In Section 3.2 , we provide novel insights and quality measures to the field of cell trajectory inference.

Synthetic data of metric trees
We considered four tree-structured topologies embedded in $\mathbb{R}^2$, and sampled 600 observations from each of them, by sampling uniformly from each branch a number of points proportional to the length of this branch. For each of these data sets, we applied a small amount of random 2-dimensional Gaussian noise, as well as a random rotation, three times. From each of these twelve resulting data sets, we constructed a Euclidean minimum spanning tree (MST), and computed the normalized centrality function. The resulting functions, MSTs, as well as the ground truth models, are shown in Fig. 5.
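The sampling procedure and MST construction described above can be sketched as follows. This is an illustration under assumptions, not the code used for the experiments: the branch coordinates, noise level, sample size, and seed are hypothetical, branches are assumed to be straight line segments, and the noise and rotation are applied once rather than three times.

```python
import math
import random


def sample_tree(branches, n_points, noise_sd, rng):
    """Sample points from straight branches, proportionally to branch length,
    then add Gaussian noise and apply one random rotation about the origin."""
    lengths = [math.dist(a, b) for a, b in branches]
    total = sum(lengths)
    points = []
    for (a, b), length in zip(branches, lengths):
        for _ in range(round(n_points * length / total)):
            t = rng.random()  # uniform position along the branch
            points.append((
                a[0] + t * (b[0] - a[0]) + rng.gauss(0, noise_sd),
                a[1] + t * (b[1] - a[1]) + rng.gauss(0, noise_sd),
            ))
    angle = rng.uniform(0, 2 * math.pi)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost to the tree
    link = [-1] * n        # tree node realizing that cost
    best[0] = 0.0
    edges = []
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[v] = True
        if link[v] >= 0:
            edges.append((link[v], v, best[v]))
        for u in range(n):
            if not in_tree[u]:
                d = math.dist(points[v], points[u])
                if d < best[u]:
                    best[u], link[u] = d, v
    return edges


# Hypothetical bifurcating model: three unit-length branches meeting at the origin.
branches = [((0, 0), (1, 0)), ((0, 0), (-1, 0)), ((0, 0), (0, 1))]
rng = random.Random(0)
points = sample_tree(branches, n_points=60, noise_sd=0.05, rng=rng)
mst = euclidean_mst(points)
```

The resulting tree has one edge fewer than the number of sampled points and can be passed, together with a node function such as the normalized centrality, to a sublevel persistence computation on graphs.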
The persistence diagrams obtained for the sublevel filtrations of the normalized centrality functions are shown in Fig. 6. Note that there may be overlapping points. As can be expected, there are many points in the persistence diagrams for the MSTs near the diagonal. This results from the MST not including any triangles (in the graph theoretical sense). Nevertheless, we observe that the highly elevated points in all our diagrams identify important structural information of the ground truth models. Fig. 7a visualizes the pairwise bottleneck distances between all diagrams. Fig. 7b shows a Multi-Dimensional Scaling (MDS) plot of this distance matrix. We see that similar shapes are clustered well together. We also note that the H-structured topologies are somewhat in the middle of the other topologies. This is as expected. E.g., the longer the middle branch of the corresponding model is, the closer this pattern is to an I-pattern. The shorter this branch is, the closer it is to an X-pattern.

Cell trajectory data
Cell trajectory inference considers the task of inferring a graph-structured model from gene expression data, to identify the differentiation process of the cells. Cells can be regarded as points in a (high-dimensional gene expression) space $\mathbb{R}^d$, and approximate (the embedding of) their underlying graph-structured model in this space. Some examples of cell trajectory data sets and their underlying models are illustrated in Fig. 9.
Cell trajectory inference is overall a very difficult task. Even the top ranked methods have a low performance on many data sets [25]. The purpose of this section is not to propose the use of our signatures (Corollary 2.1) as a new topological inference method for this type of data, but rather to use these to study why this problem is essentially so difficult. In particular, Vandaele et al. [28] recently showed that state-of-the-art cell trajectory inference methods struggle to approximate the geometry of the underlying model well, or commonly underestimate the number of leaves. To explain these difficulties, we proceed with an analysis similar to the one in Section 3.1.
We consider 131 synthetic and 57 real cell trajectory data sets with an underlying tree-structured model [5] . The number of cells ranged from 59 to 5018, and the number of genes from 373 to 23,658. A two-dimensional diffusion map embedding was computed for each data set, both for visualization purposes, as well as to reduce the effects of the curse of dimensionality on our neighborhood graph approximation [24] . A 10NN graph and its normalized centralities were computed from each embedding. Fig. 8 visualizes all cell trajectory data sets by means of an MDS plot of the pairwise bottleneck distances we obtained through topological persistence of our 10NN graphs. We illustrate twelve 'landmark' embeddings of cell trajectory data sets, as well as their ground truth models on these embeddings, and their obtained empirical persistence diagrams in Fig. 9 .
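A $k$NN graph such as the 10NN graphs used above can be built from an embedding as sketched below. The text does not specify how the neighbor relation is symmetrized, so this sketch ASSUMES that an undirected edge is added whenever one point is among the $k$ nearest neighbors of the other. The function name `knn_graph` and the toy points are hypothetical, and the quadratic-time neighbor search is only meant for small examples.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights.

    Assumption: an undirected edge {i, j} is added if j is among the k
    nearest neighbors of i, or i is among those of j.
    Returns a dict mapping (i, j) with i < j to the edge weight.
    """
    edges = {}
    for i, p in enumerate(points):
        # Distances to all other points; ties are broken by the point index.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in others[:k]:
            edges[(min(i, j), max(i, j))] = d
    return edges


# Hypothetical 2-dimensional embedding with one remote point.
embedding = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
graph = knn_graph(embedding, k=1)
w_max = max(graph.values())
```

In this toy example the remote point is attached through a single long edge, which dominates the maximum edge weight of the graph.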
Fig. 8. MDS plot of pairwise bottleneck distances of the persistence diagrams obtained through the 10NN graphs and normalized centralities. Each point corresponds to one cell trajectory data set. A loess curve (red) is fitted using the MDS1 coordinate as independent variable, and the average performance over all considered cell trajectory inference methods as dependent variable. The points with a black contour correspond to the data sets visualized in Fig. 9. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
First, observe that all linear cell trajectories are located near a linear curve on top of the MDS plot. This means that our chosen data representation does not artificially create more leaves than truthfully present. E.g., this is more often the case when we apply a PCA projection instead of a diffusion map embedding. However, many nonlinear trajectories are located near this curve as well. Near the right side of this curve, this is mainly due to branches being relatively short compared to a main linear trajectory (e.g., MDS (0, 0.6) in Fig. 9). These trajectories are indeed theoretically close to linear according to our chosen metric. On the left side of this curve, we find the more noisy data sets, where we fail to provide a good representation. Their persistence diagrams represent more 'blob'-like patterns (Fig. 9). Below this curve, we find the trajectories where we truthfully manage to identify additional branches. However, we note that it appears to be difficult to identify more than three leaves. This explains why cell trajectory inference methods commonly underestimate the true number of leaves [28]. Note that the 'boomerang' shape made up by all data sets in Fig. 8 also coincides with what we theoretically expect for our chosen metric. We note a continuous transition from blob-like patterns, towards linear patterns, towards patterns with leaves.
The fact that this shape takes a turn near the right can be theoretically explained through the definition of the bottleneck distance. As we only look at the maximal distance of a matching, the number of 'high' distances in such a matching does not matter. Blob-like patterns are as distant from linear patterns as they are from patterns with more leaves, according to this metric.
Finally, we fitted a loess curve (standard settings in R ) using the MDS1 coordinate as the independent and the average performance over 45 different cell trajectory inference methods as the dependent variable. This performance is measured through the geodesic distance preservation (correlation) metric introduced by Saelens et al. [25] . Fig. 8 shows a positive correlation (0.58) between these variables. Note that the choice of using the MDS1 coordinate is arbitrary in general. However, this choice supports our findings that on the left side of our MDS plot, we mainly find noisy data sets. Since every cell trajectory inference method uses a different algorithm or data representation (such as the type of dimensionality reduction or neighborhood graph), this can be seen as a quality measure of the data itself, independent of our chosen data representation.

Discussion and conclusion
We provided a novel foundation for quantifying topological patterns in metric trees through graph approximations, which led to new and direct stability results. Though these results currently only hold for metric trees, we opened up new possibilities to study which functions ensure stability by means of Theorems 2.1 and 2.2, and Remark 2.2. This may lead to further theoretical justification of recognizing a wider variety of patterns through graph approximations.
Rather than using our signatures for topological inference, we introduced a novel use for them in an exploratory data analysis setting. We developed insights into cell trajectory inference, and provided the first charting of such data sets that explains some of the difficulties this field is confronted with. We also provided a new way of quality measurement, that does not require ground truth knowledge. It will be interesting to investigate whether other types of signatures, such as those discussed in Section 1.3 , may find additional applications within this setting.
Since we consider sublevel filtrations on any given graph, we can choose to approximate our data through a small or sparse graph. For 0-dimensional persistence, this is computationally more efficient than Rips based signatures for metric spaces equipped with functions, which may require the construction of the complete graph on the data to include all nodes. Nevertheless, for larger data sets it may be interesting to explore approximation methods similar to the witness complexes for Rips based filtrations [12] .

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.