Stability for Inference with Persistent Homology Rank Functions

Persistent homology barcodes and diagrams are a cornerstone of topological data analysis that capture the "shape" of a wide range of complex data structures, such as point clouds, networks, and functions. However, their use in statistical settings is challenging due to their complex geometric structure. In this paper, we revisit the persistent homology rank function, which is mathematically equivalent to a barcode and persistence diagram, as a tool for statistics and machine learning. Rank functions, being functions, enable the direct application of the statistical theory of functional data analysis (FDA), a domain of statistics adapted for data in the form of functions. In practice, however, a key challenge they present compared with barcodes is the lack of stability guarantees, a property that is crucial to validate their use as a faithful representation of the data and therefore as a viable summary statistic. We fill this gap by deriving two stability results for persistent homology rank functions under a metric suitable for integration with FDA. We then study the performance of rank functions in functional inferential statistics and machine learning on real data applications, in both single- and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing non-persistence-based approaches.


Introduction
Topological data analysis (TDA) applies theory from algebraic topology to computational and data-analytic settings, and has enjoyed great success in applications in many fields, including biology (Emmett et al., 2016; Cang and Wei, 2017; Cang et al., 2018), medicine (Crawford et al., 2020; Biwer et al., 2017), physics (de Silva and Ghrist, 2007), economics (Gidea, 2017), and motion planning (Bhattacharya et al., 2015; Vasudevan et al., 2013), to name a few. A cornerstone methodology of TDA is persistent homology, which produces a summary statistic of data. Its widespread applicability stems from three properties: its flexibility to adapt to a variety of complex data structures; its interpretability within the scientific domains where the data arise; and its stability, which is the focus of this work. Stability provides a notion of faithfulness of the topological representation of the input data: it guarantees that a bounded perturbation of the input data results in a bounded perturbation of the topological representation captured by persistent homology. Stability is therefore a crucial property that validates the use of persistent homology in real data applications.
Persistent homology has its roots in several constructions and accordingly has several representations; the best known is the persistence diagram or, equivalently, the barcode. A less-used representation is the rank function, initially proposed in the early 1990s as the size function (Frosini, 1992). Size functions were used as a mathematical tool for shape and image analysis in computer vision and pattern recognition (Verri et al., 1993; Frosini and Landi, 1999; Landi and Frosini, 2002; Biasotti et al., 2008a; Di Fabio et al., 2009), and were reinterpreted in algebraic terms using a correspondence between size functions and formal series (Frosini and Landi, 2001); see Biasotti et al. (2008b) for a thorough survey on the theory of size functions. These algebraic topological formulations of rank functions coincide directly with parallel concepts from persistent homology.
Although rank functions were proposed before persistence diagrams and barcodes, and are in fact equivalent to them, their use has been comparatively restricted due to the difficulty in establishing stability results and a less comprehensive understanding of their effectiveness in practical data scenarios. Nevertheless, with the current active research interest in multiparameter persistent homology, rank functions are becoming increasingly relevant again, as they adapt inherently and directly to this higher-dimensional framework, where persistence diagrams and barcodes do not (Carlsson and Zomorodian, 2007). Additionally, unlike persistence diagrams and barcodes, rank functions, being functions, are naturally amenable to functional data analysis (FDA), a robust statistical field focused on analyzing data that take the form of functions, curves, and surfaces. This adaptability to FDA (see Ramsay (2002, 2005) for a thorough introduction to the field and its applications) offers a rich statistical toolkit not directly accessible with persistence diagrams and barcodes. FDA methods have previously been used in TDA by Crawford et al. (2020), who perform Gaussian process regression using a summary statistic constructed from a dynamic version of the Euler characteristic, an alternative topological invariant distinct from persistence barcodes and diagrams. Principal component analysis (PCA) for rank functions, an important dimension reduction technique in descriptive rather than inferential statistics, has also been studied by Robins and Turner (2016).
In our work, we study the performance of rank functions in inference tasks, which go beyond the descriptive analysis of Robins and Turner (2016) and, within the field of statistics, are arguably significantly more challenging: whereas descriptive statistics studies properties observed in a single sample of data, inferential statistics aims to infer information about, and provide guarantees on, the possibly infinite, unobserved population from observed data. To validate our findings, however, we first need to establish suitable stability properties of rank functions. Here, "suitable" means that stability must hold under a metric conducive to the application of FDA methods. Once this is achieved, we study the performance of rank functions in both single- and multiparameter persistent homology in inferential machine learning tasks on real data applications. We find a clear improvement in performance using rank functions compared to existing methods that do not incorporate persistent homology, as well as to other persistence-based methods.
The remainder of this paper is organized as follows. In Section 2, we provide background and a literature review on rank functions and discuss their relation to barcodes and persistence diagrams, as well as various metrics associated with these representations and their implications for stability. In Section 3, we present two stability results for rank functions with respect to a metric suitable for FDA. These stability guarantees motivate the applications of rank functions to inferential tasks on real data using FDA, presented in Sections 4 and 5. We close in Section 6 with a discussion of our findings and directions for future research. Proofs and further theoretical details are given in Appendix A.

Preliminaries: Persistent Homology
In this section, we review the background and existing literature on the theory of persistent homology and its metrics that is foundational to our work.

Persistence Modules and Rank Functions
The algebraic object that is the central focus of persistent homology theory is the persistence module, a functor from a poset category to the category of vector spaces, $M : (P, \le) \to \mathrm{Vec}$, also written as $M \in \mathrm{Vec}^{(P,\le)}$ (Bubenik and Scott, 2014; Bubenik et al., 2015; Kim and Mémoli, 2021). Unless otherwise specified, we assume that Vec is the category of finite-dimensional vector spaces and work with pointwise finite-dimensional (p.f.d.) persistence modules.
Arguably, the most relevant example is the persistent homology module of a filtered finite simplicial complex, first introduced by Edelsbrunner et al. (2002) and obtained as follows. Consider a filtration, i.e., a diagram $F \in \mathrm{Simp}^{(\mathbb{R},\le)}$ such that $F_x := F(x)$ is a finite simplicial complex for each $x \in \mathbb{R}$ and, for any $y \in \mathbb{R}$ with $x \le y$, the map $F(x \le y)$ is the inclusion $F_x \subseteq F_y$. A common example is the Vietoris-Rips filtration (Vietoris, 1927), which, for a metric space (X, d) and a finite subset $S \subset X$, is denoted by $\mathrm{VR}(S) = \{\mathrm{VR}_t(S) : t \in [0, +\infty)\}$. The simplicial complex $\mathrm{VR}_t(S)$ at filtration value t is the family of all simplices with vertices in S and diameter at most t. An example of a Vietoris-Rips filtration is given in Figure 1.
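To make the Vietoris-Rips construction concrete, here is a minimal Python sketch that enumerates simplices up to dimension 2 and assigns each its diameter as filtration value, following the convention that $\mathrm{VR}_t(S)$ contains the simplices of diameter at most t (Figure 1 instead parametrizes by ball radius, which halves the edge values). The names `vietoris_rips` and `vr_complex_at` are hypothetical, and the brute-force enumeration is only meant for small point sets.

```python
import itertools
import math

def vietoris_rips(points, max_dim=2):
    """Return (simplex, filtration value) pairs of the Vietoris-Rips filtration.

    A simplex enters at the largest pairwise distance among its vertices
    (its diameter), matching VR_t(S) = {simplices of diameter <= t}.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    simplices = []
    for k in range(max_dim + 1):                      # k-simplices have k + 1 vertices
        for verts in itertools.combinations(range(n), k + 1):
            diam = max((dist[i][j] for i, j in itertools.combinations(verts, 2)),
                       default=0.0)                   # vertices enter at 0
            simplices.append((verts, diam))
    # Sort by filtration value, then by dimension so that faces precede cofaces.
    simplices.sort(key=lambda s: (s[1], len(s[0])))
    return simplices

def vr_complex_at(simplices, t):
    """The simplicial complex VR_t(S): all simplices of diameter at most t."""
    return [verts for verts, value in simplices if value <= t]
```

For the four corners of a unit square, the complex at t = 1.0 consists of the four vertices and the four sides, i.e., a loop; at t = 1.5 the diagonals and all four triangles have entered and the loop is filled.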
Given that we are working with finite simplicial complexes, there is a finite discrete set of values at which the filtration changes. For each k ≥ 0, we obtain another diagram $H_k(F) \in \mathrm{Vec}^{(\mathbb{R},\le)}$ by setting
$$H_k(F)(x) := H_k(F_x), \qquad H_k(F)(x \le y) := H_k\big(F(x \le y)\big),$$
where $H_k : \mathrm{Simp} \to \mathrm{Vec}$ is the homology functor of order k ≥ 0 with coefficients in a field $\mathbb{F}$. We shall also refer to the diagram given by $\bigoplus_{k \ge 0} H_k(F)$ using the notation $H(F) \in \mathrm{Vec}^{(\mathbb{R},\le)}$.
Figure 1: Vietoris-Rips filtration of a point cloud of 30 points sampled from a circle with added Gaussian noise of scale 0.1, shown at 4 different filtration values (i.e., radii of the balls centered on the points). The simplicial complexes are built by adding an edge between any two points whose balls overlap (i.e., at distance at most twice the filtration value), and a k-simplex for each set of k + 1 pairwise connected points. From the third image to the fourth, a 1-cycle (loop) appears in the filtration, encircling the hole in the shape. This is the type of feature that persistent homology aims to capture.
The p.f.d. persistence module $H(F) \in \mathrm{Vec}^{(\mathbb{R},\le)}$ is called the persistent homology module of the filtration.
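For 0-dimensional homology, the barcode of the persistent homology module of a Vietoris-Rips filtration can be computed without any matrix reduction: components merge exactly as in Kruskal's minimum spanning tree algorithm. The following is a minimal sketch under the diameter convention above; `h0_barcode_vr` is a hypothetical name, and higher-dimensional homology would require a boundary-matrix reduction instead.

```python
import itertools
import math

def h0_barcode_vr(points):
    """0-dimensional barcode of the Vietoris-Rips filtration of a finite point set.

    Every vertex is born at 0. Edges are processed by increasing length; an edge
    joining two distinct components kills one of them (both were born at 0, so
    which one dies does not change the barcode). One component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(n), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # the edge merges two components
            parent[ri] = rj
            bars.append((0.0, length))      # one component dies at this scale
    bars.append((0.0, math.inf))            # the essential connected component
    return bars
```

For the collinear points (0, 0), (1, 0), (5, 0), the sketch returns the bars [0, 1), [0, 4) and [0, ∞): the edge of length 5 closes no new merge.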
The previous construction is dimensionality-independent, allowing its application to both 2D and 3D data: the Vietoris-Rips filtration relies solely on pairwise distances, making it adaptable to any dimension.We demonstrate its use with 2D data in Section 4 and with 3D data in Section 5.
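Observe that we obtain a family of linear maps connecting the vector spaces; a natural way to study its structure is therefore to consider the ranks of these maps. Let $\mathbb{R}^{2+} := \{(x, y) \in \mathbb{R}^2 : x \le y\}$.
Definition 1 (Rank Function). Given a p.f.d. persistence module $M \in \mathrm{Vec}^{(\mathbb{R},\le)}$, its rank function $\beta_M : \mathbb{R}^{2+} \to \mathbb{Z}_{\ge 0}$ is defined as
$$\beta_M(x, y) := \operatorname{rank} M(x \le y).$$
The space of rank functions will be denoted by $\mathcal{I}_1$.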
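Definition 1 can be evaluated directly when a module over a finite index set 0 < 1 < ... < n − 1 is given by matrices. The sketch below assumes rational coefficients (exact arithmetic via `fractions.Fraction`) and represents each structure map $M(i \le i+1)$ by a matrix; $\beta_M(i, j)$ is then the rank of the composite $M(i \le j)$. All function names are hypothetical.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank over the rationals of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue                            # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:    # eliminate the column elsewhere
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * c for a, c in zip(m[r], m[rank])]
        rank += 1
    return rank

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def rank_function(dims, maps, i, j):
    """beta_M(i, j) = rank of M(i <= j) for a module indexed by 0 < 1 < ... < n-1.

    dims[k] is dim M(k); maps[k] is the matrix of M(k <= k+1), of shape
    dims[k+1] x dims[k]. M(i <= j) is the composite maps[j-1] ... maps[i].
    """
    if not 0 <= i <= j < len(dims):
        raise ValueError("need 0 <= i <= j < len(dims)")
    if any(dims[k] == 0 for k in range(i, j + 1)):
        return 0                                # the map factors through a zero space
    if i == j:
        return dims[i]                          # identity map on M(i)
    composite = maps[i]
    for k in range(i + 1, j):
        composite = matmul(maps[k], composite)  # apply M(k <= k+1) after M(i <= k)
    return matrix_rank(composite)
```

In the test module, M(0) = F, M(1) = F², M(2) = F encodes one bar alive at indices 0 and 1 and another alive at 1 and 2, so $\beta_M(0, 2) = 0$: no bar spans both endpoints.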
Deterministic and probabilistic properties of rank functions have also been studied under the name of persistent Betti numbers, first introduced by Edelsbrunner et al. (2003). For instance, Duy et al. (2016) and Krebs and Polonik (2023) investigated central limit theorems for persistent Betti numbers, including their asymptotic normality and stabilization properties, under homogeneous Poisson and binomial processes and variants thereof. These results enable bootstrap procedures for persistent Betti numbers (Roycraft et al., 2023). Further, Botnan and Hirsch (2021) established the consistency and asymptotic normality of multiparameter persistent Betti numbers in large domains, an important foundation for constructing statistical hypothesis tests. It is worth highlighting that this line of work deals mainly with probabilistic properties of rank functions (i.e., persistent Betti numbers), as opposed to their statistical performance in real data analysis, which is the focus of this work.

Persistence Diagrams and Barcodes
An invariant of persistence modules assigns to each module a value that is the same for isomorphic modules; it is complete if, in addition, it takes different values on non-isomorphic modules. In single-parameter persistent homology, rank functions are equivalent to persistence diagrams and barcodes, two complete, discrete invariants obtained from the following distinct approaches.
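Persistence diagrams can be traced back to the study of discontinuities of rank functions (Frosini and Landi, 2001) and were later reinterpreted to visually capture the persistent homology of a filtered simplicial complex $H(F) \in \mathrm{Vec}^{(\mathbb{R},\le)}$ (Edelsbrunner et al., 2002). We introduce them for general persistence modules $M \in \mathrm{Vec}^{(\mathbb{R},\le)}$. Let $T = \{t_1, \ldots, t_\ell\} \subset \mathbb{R}$ be the discrete set of values at which the module changes, and consider a sequence $\{s_0, s_1, \ldots, s_\ell\}$ of real numbers interleaved with the elements of T:
$$s_0 < t_1 < s_1 < t_2 < \cdots < s_{\ell-1} < t_\ell < s_\ell.$$
For $1 \le i < j \le \ell$, define
$$\mu_i^j := \beta_M(s_i, s_{j-1}) - \beta_M(s_i, s_j) + \beta_M(s_{i-1}, s_j) - \beta_M(s_{i-1}, s_{j-1}). \tag{2.1}$$
Definition 2 (Persistence Diagram). The persistence diagram $\mathrm{Dgm}(M)$ of $M \in \mathrm{Vec}^{(\mathbb{R},\le)}$ is the multiset of points given by
$$\mathrm{Dgm}(M) := \{(t_i, t_j) : 1 \le i < j \le \ell,\ \mu_i^j > 0\},$$
where each point $(t_i, t_j)$ has multiplicity $\mu_i^j$, together with all the points of the diagonal $\partial = \{(x, y) \in \mathbb{R}^{2+} : x = y\}$, counted with infinite multiplicity. The space of persistence diagrams is denoted by $\mathcal{D}$.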
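The inclusion-exclusion formula (2.1) can be checked numerically. The sketch below uses hypothetical helper names, assumes all bars are finite, and takes the rank function as a Python callable; the interleaving values $s_k$ are chosen as midpoints between consecutive $t_k$. As reference rank function it uses the cumulative description of $\beta_M$ for half-open bars $[b, d)$: a bar contributes to $\beta_M(x, y)$ exactly when $b \le x$ and $y < d$.

```python
def rank_from_bars(bars, x, y):
    """beta(x, y) = #{[b, d) : b <= x and y < d}, i.e., bars alive on all of [x, y]."""
    return sum(1 for b, d in bars if b <= x and y < d)

def diagram_from_rank(beta, t):
    """Recover the finite points of a persistence diagram from a rank function via (2.1).

    beta is a callable beta(x, y); t = [t_1, ..., t_l] are the sorted values at which
    the module changes. Returns a dict {(t_i, t_j): multiplicity}.
    """
    l = len(t)
    # Interleaving values s_0 < t_1 < s_1 < ... < t_l < s_l: midpoints plus two end values.
    s = [t[0] - 1.0] + [(t[k] + t[k + 1]) / 2 for k in range(l - 1)] + [t[-1] + 1.0]
    diagram = {}
    for i in range(1, l + 1):                   # 1-based indices, as in the text
        for j in range(i + 1, l + 1):
            mu = (beta(s[i], s[j - 1]) - beta(s[i], s[j])
                  + beta(s[i - 1], s[j]) - beta(s[i - 1], s[j - 1]))
            if mu > 0:
                diagram[(t[i - 1], t[j - 1])] = mu
    return diagram
```

Starting from the bars [0, 2) and [1, 3), the round trip `diagram_from_rank(lambda x, y: rank_from_bars(bars, x, y), [0, 1, 2, 3])` recovers the two points (0, 2) and (1, 3), each with multiplicity 1.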
Figure 2 shows persistence diagrams for 0- and 1-dimensional homology, representing connected components and loops, respectively, for three 3D point clouds. The sphere exhibits no 1-dimensional points far from the diagonal, indicating the absence of non-trivial loops on its surface: every loop on a sphere can be contracted to a point. In contrast, the torus displays two persistent points representing its horizontal and vertical loops. The diagram of the Stanford Bunny (Turk and Levoy, 1994) (see https://faculty.cc.gatech.edu/~turk/bunny/bunny.html) is more complex to interpret and features one persistent 1-cycle.
The Structure Theorem, due to Zomorodian and Carlsson (2005) (with the version in full generality due to Crawley-Boevey (2015) and Botnan and Crawley-Boevey (2020)), asserts that any p.f.d. persistence module M is isomorphic to an essentially unique (up to reordering) finite direct sum of indecomposable persistence modules,
$$M \cong \bigoplus_{j} M_j.$$
For single-parameter persistence, each $M_j$ is an interval persistence module, i.e., there is a pair of values $b_j < d_j$, where $d_j$ may be infinite, such that $M_j(x)$ is a copy of the field for all $b_j \le x < d_j$ and zero elsewhere, with identity maps between the nonzero spaces. We denote these persistence modules by $I[b_j, d_j)$.
Definition 3 (Zomorodian and Carlsson, 2005; Carlsson and de Silva, 2010). The barcode of $M \in \mathrm{Vec}^{(\mathbb{R},\le)}$ is the list of indecomposables given by the Structure Theorem or, equivalently, the collection of intervals that define these indecomposables,
$$\{[b_j, d_j)\}_j.$$
The length $d_j - b_j$ of a bar is called its persistence.
A barcode translates to a persistence diagram by plotting the left and right endpoints of each interval persistence module as an ordered pair. A persistence diagram translates to a barcode by turning each point (x, y) with x < y into an interval persistence module beginning at x and ending at y. In this way, the persistence diagram is equivalent to a barcode, although the two definitions arise from different perspectives. Rank functions are also in bijection with persistence diagrams and barcodes. We have seen above how persistence diagrams are obtained from rank functions through the inclusion-exclusion formula (2.1). Conversely, rank functions can be seen as cumulative functions on persistence diagrams: the value $\beta_M(x, y)$ is the number of points (counted with multiplicity) in the region $(-\infty, x] \times (y, \infty)$ of the persistence diagram, that is, the number of bars $[b, d)$ with $b \le x$ and $y < d$. Figure 3 illustrates a persistence diagram and its corresponding rank function, where the equivalence between the two objects becomes apparent.
Figure 3: Persistence diagram and its corresponding rank function.

Metrics and Stability in Persistent Homology
Stability results, which are the main theoretical contribution of this paper, give bounds for metrics defined on invariants of persistence modules. Since various metrics exist in persistent homology, the appropriate choice of metric depends on the eventual goal.
Metrics on Barcodes and Persistence Diagrams.We recall the most widely used metrics to compare persistence diagrams in single-parameter persistent homology and their stability properties.
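Definition 4. The bottleneck distance between the persistence diagrams $D_1$ and $D_2$ is defined as
$$d_B(D_1, D_2) := \inf_{\phi} \sup_{x \in D_1} \| x - \phi(x) \|_\infty,$$
where $\|\cdot\|_\infty$ is the infinity norm $\ell^\infty$ on $\mathbb{R}^2$ and $\phi$ ranges over all bijections between $D_1$ and $D_2$.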
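For very small diagrams, Definition 4 can be evaluated by brute force. The usual device is to allow each point to be matched either to a point of the other diagram or to the diagonal, which has infinite multiplicity; the $\ell^\infty$ distance from $(b, d)$ to the diagonal is $(d - b)/2$. The sketch below (hypothetical function name, exponential running time) pads each diagram with diagonal slots and enumerates all bijections; practical implementations use geometric matching algorithms instead.

```python
import itertools
import math

def bottleneck(d1, d2):
    """Brute-force bottleneck distance between two small finite diagrams.

    Diagrams are lists of off-diagonal (birth, death) points. Each side is padded
    with one diagonal slot (None) per point of the other side, so that every
    partial matching with the diagonal is represented by a bijection.
    """
    n, m = len(d1), len(d2)
    left = list(d1) + [None] * m
    right = list(d2) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0                              # diagonal matched to diagonal
        if p is None or q is None:
            pt = q if p is None else p
            return (pt[1] - pt[0]) / 2              # l-infinity distance to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = math.inf
    for perm in itertools.permutations(range(n + m)):
        worst = max((cost(left[i], right[perm[i]]) for i in range(n + m)), default=0.0)
        best = min(best, worst)
    return best
```

For the single points (0, 2) and (0.5, 2.5), matching them directly costs 0.5 while sending both to the diagonal costs 1, so the bottleneck distance is 0.5.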
The first stability result in TDA concerns the bottleneck distance (Cohen-Steiner et al., 2007), proving that the map sending a tame function $f : X \to \mathbb{R}$, defining a filtration $F(x) := f^{-1}((-\infty, x])$, to its persistence diagram $\mathrm{Dgm}(H(F))$ is 1-Lipschitz with respect to the bottleneck distance for diagrams and the $L^\infty$ distance for functions.
The bottleneck distance can be generalized by replacing the supremum over matched pairs with an $\ell^p$ sum, which yields a stronger notion of proximity.
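Definition 5. For $1 \le p < \infty$ and $1 \le q \le \infty$, we define the $(p,q)$-Wasserstein distance between two persistence diagrams $D_1$ and $D_2$ as
$$W_{p,q}(D_1, D_2) := \inf_{\phi} \Big( \sum_{x \in D_1} \| x - \phi(x) \|_q^p \Big)^{1/p},$$
where $\|\cdot\|_q$ is the $\ell^q$ norm on $\mathbb{R}^2$ and $\phi$ ranges over all bijections between $D_1$ and $D_2$.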
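The same brute-force scheme gives the Wasserstein distance with $q = p$, the case used in this paper. Under the $\ell^p$ ground norm, the closest diagonal point to $(b, d)$ is the midpoint projection $((b+d)/2, (b+d)/2)$, at distance $2^{1/p}(d - b)/2$. This is again a sketch with a hypothetical name; for larger diagrams the minimization would be carried out with an assignment solver rather than by enumeration.

```python
import itertools

def wasserstein(d1, d2, p=1.0):
    """Brute-force W_{p,p} distance between two small finite diagrams."""
    n, m = len(d1), len(d2)
    left = list(d1) + [None] * m                    # None marks a diagonal slot
    right = list(d2) + [None] * n

    def to_diagonal(pt):
        # l_p distance from (b, d) to its midpoint projection on the diagonal.
        return 2 ** (1 / p) * (pt[1] - pt[0]) / 2

    def cost(a, b):
        if a is None and b is None:
            return 0.0
        if a is None:
            return to_diagonal(b)
        if b is None:
            return to_diagonal(a)
        return (abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p) ** (1 / p)

    best = min(sum(cost(left[i], right[perm[i]]) ** p for i in range(n + m))
               for perm in itertools.permutations(range(n + m)))
    return best ** (1 / p)
```

With p = 1, moving (0, 2) to (0.5, 2.5) costs 0.5 + 0.5 = 1, whereas deleting a point of persistence 2 into the diagonal costs 2.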
In our work, we will assume p = q and refer to this metric simply as the p-Wasserstein distance $W_p$.
Wasserstein metrics are widely used in applications, especially p-Wasserstein distances with p = 1, 2, and in some instances have revealed more insight in application settings than the bottleneck distance. For example, in a protein flexibility analysis study, Bramer and Wei (2020) show that $(p,\infty)$-Wasserstein metrics provide more accurate results when comparing two conjugate point clouds obtained using atom-specific persistent homology (Cang and Wei, 2017; Cang et al., 2018). Gamble and Heo (2010) explore the power of the Wasserstein distance in the context of statistical analysis of landmark-shape data. Gidea (2017) shows that the $W_{2,\infty}$ distance is able to detect early signs of critical transitions in financial data. Hamilton et al. (2022) study barcodes endowed with Wasserstein distances for protein folding data and compare them with the Gaussian integral tuned vector representation (Burley et al., 2021).
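Despite the desirable performance of Wasserstein distances in applications, their stability properties were not broadly studied until recently, due to the additional technicality required. Skraba and Turner (2021) established the following cellular Wasserstein Stability Theorem for p-Wasserstein metrics, which validates many of the existing results in applications and justifies their continued use.
Theorem 6 (Cellular Wasserstein Stability; Skraba and Turner, 2021). Let K be a finite CW complex and let $f, g : K \to \mathbb{R}$ be monotone functions, i.e., $f(\sigma) \le f(\tau)$ and $g(\sigma) \le g(\tau)$ whenever σ is a face of τ. Then, for all $p \ge 1$,
$$W_p(\mathrm{Dgm}(f), \mathrm{Dgm}(g)) \le \| f - g \|_p, \qquad \text{where } \| f - g \|_p^p = \sum_{\sigma \in K} |f(\sigma) - g(\sigma)|^p.$$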
This result also justifies using the Wasserstein distance as the reference metric when establishing stability for other representations of persistent homology.
Metrics on Rank Functions.Rank functions lie in the space of functions from R 2+ to R (Robins and Turner, 2016).By fixing a metric on the space R 2+ , we can then define the L p metric on this space of functions as follows: where ω is the measure on R 2+ corresponding to the fixed metric.For notational simplicity, we write ∥f − g∥ p = d L p (f, g), keeping in mind that this metric comes from the L p norm.This metric space is naturally endowed with a Hilbert structure for p = 2, which is a basic requirement for many FDA methodologies.
There are many choices for the metric as well as for the measure ω; however, the choice should prevent the pairwise distance between rank functions from being infinite. Infinite distances arise, for example, when the metric is taken to be the Euclidean distance restricted to $\mathbb{R}^{2+}$, so that ω is the Lebesgue measure. In that case, two rank functions have finite distance if and only if their infinite cycles have identical birth times.
Remark 7. In our work, such issues are circumvented by keeping in mind the goal of real data applications, where the posets over which we are defining our diagrams are always finite and in which the filtrations always end in a simplicial complex with trivial homology.This means that every cycle in the filtration is destroyed at some point, except for the 0-cycle representing the connected component of the space, which is always born at time 0. Thus, we can work with the Lebesgue measure without worrying about infinite distances between rank invariants: all bars in our barcodes will be finite, and therefore, every two rank functions will have finite pairwise distance, as desired.

$L^p$-Stability of Rank Functions
In this section, we present our two stability guarantees for rank functions endowed with $L^p$ metrics, with respect to the bottleneck distance and the 1-Wasserstein distance for persistence diagrams. We focus on the unweighted $L^p$ metric, as opposed to Skraba and Turner (2021), who study the weighted version, in which the weight function ϕ(·) inside the integral satisfies $\int_{\mathbb{R}} \phi(t)\,dt < \infty$. In particular, they choose $\phi(t) = e^{-t}$ and obtain the following stability result as a corollary of Theorem 6.
Corollary 8 (Skraba and Turner, 2021). Rank functions with the weighted $L^q$ metric (3.1) are 1-Lipschitz with respect to the p-Wasserstein distance between diagrams if and only if p = q = 1.
The weight function in (3.1) ensures finite distances between rank functions (Robins and Turner, 2016, Lemma 2.1.),which allows for the definition of an inner product structure on finite sets of rank functions and thus justifies the rank FPCA method proposed in Robins and Turner (2016).In our study, finiteness of L p distances between rank functions is guaranteed as explained in Remark 7, and the Hilbert space structure follows directly.As a result, the use of such a weight is not needed in our setting, and in fact it is not helpful for our purposes: its introduction fundamentally changes the expressions in the computations involving the metric, so that the proof of Corollary 8 by Skraba and Turner (2021) does not apply and cannot be replicated in our work.This also underlines the inherent differences between our contribution compared to Skraba and Turner (2021).
Other Hindrances to Rank Function Stability. Under the original name of size functions (Frosini, 1992), several notions of "pseudo-stability" were established. These were obtained under pseudometrics, where the distance between two distinct points can be zero; this makes pseudometrics less desirable for real data analyses (since it hinders interpretability) and therefore also makes pseudo-stability less desirable as a validating property to justify the use of rank functions as topological summary statistics. Examples include the deformation distance, which was adapted to persistence diagrams and is more widely known today as the 1-Wasserstein distance between persistence diagrams; the Hausdorff pseudo-distance, which similarly gave rise to the bottleneck distance between persistence diagrams; and the $L^p$ pseudo-distance, which exhibited an unstable nature and does not appear to have inspired any well-known distance in persistent homology.
In particular, it is worth noting that d'Amico et al. (2003) and d'Amico et al. (2010) renamed the Hausdorff pseudo-distance the matching distance to emphasize that its computation amounts to finding an optimal matching between multisets, as for the bottleneck distance, which was used to establish the first stability results for persistent homology. The matching distance was used to establish stability under noisy perturbations when restricted to a subset of size functions called reduced size functions.

Stability Under the Bottleneck Distance
The most straightforward way to achieve stability for rank functions is to restrict away from the diagonal, which is known to complicate the metric geometry of the space of persistence diagrams (Turner et al., 2014;Cao and Monod, 2022).To do this, we introduce a truncation of the rank function that will allow us to compare its sensitivity to noise to that of the bottleneck distance.
Definition 9. For any rank function β and any δ > 0, we define the δ-truncated rank function as
$$\beta^\delta(x, y) := \begin{cases} \beta(x, y) & \text{if } y - x \ge \delta, \\ 0 & \text{otherwise.} \end{cases}$$
In other words, the truncated rank function is just the rank function excluding a strip of width δ > 0 above the diagonal ∂ (see Definition 2). The truncated rank function locally satisfies a Hölder inequality for the $L^p$ norm with respect to the bottleneck distance on persistence diagrams.
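One consequence of Definition 9 is easy to check numerically: a bar $[b, d)$ with persistence $d - b \le \delta$ contributes nothing to $\beta^\delta$, because it only counts at points with $y - x < d - b$. Short, noise-like bars are therefore invisible to the truncated rank function. The sketch below uses hypothetical names and takes the excluded strip to be $\{y - x < \delta\}$.

```python
def rank_value(bars, x, y):
    """beta(x, y) = #{[b, d) : b <= x and y < d} for bars [b, d)."""
    return sum(1 for b, d in bars if b <= x and y < d)

def truncated_rank_value(bars, x, y, delta):
    """delta-truncated rank function: 0 in the strip y - x < delta, beta(x, y) elsewhere."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return rank_value(bars, x, y) if y - x >= delta else 0

# A long bar plus a short "noise" bar of persistence 0.05 <= delta = 0.1.
noisy = [(0, 1), (0.3, 0.35)]
clean = [(0, 1)]
grid = [i / 20 for i in range(21)]
same = all(truncated_rank_value(noisy, x, y, 0.1) == truncated_rank_value(clean, x, y, 0.1)
           for x in grid for y in grid if x <= y)
```

On this grid the truncated rank functions of the noisy and clean barcodes coincide, while the untruncated ones differ at (0.3, 0.3), where the short bar is alive.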
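Proposition 10 (Bottleneck Stability for Truncated Rank Functions). Let 1 ≤ p < ∞ and let M be a p.f.d. persistence module with finite intervals in its barcode decomposition. For every δ > 0, there exist $0 < \eta \le 1$ and $K_{M,p} > 0$ such that any persistence module N satisfying $d_B(\mathrm{Dgm}(M), \mathrm{Dgm}(N)) \le \eta$ also satisfies
$$\| \beta_M^\delta - \beta_N^\delta \|_p \le K_{M,p}\, d_B(\mathrm{Dgm}(M), \mathrm{Dgm}(N))^{1/p}. \tag{3.2}$$
In other words, the map $(\mathcal{D}, d_B) \to (\mathcal{I}_1, L^p)$ which sends each persistence diagram to its corresponding rank function is locally Hölder with exponent 1/p.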
Remark 11. The constant $K_{M,p}$ appearing in Proposition 10 is explicit; it depends on m, the number of points in Dgm(M), and on R, the maximum persistence of the points in Dgm(M). Notice that we can always obtain a bound similar to that in (3.2) in which the constant depends on both persistence modules M and N (see Appendix A). Proposition 10 refines this approach by obtaining a constant that depends only on the persistence module M. Nevertheless, an important limitation of Proposition 10 is that it discards the points close to the diagonal, an important component in the definition of persistence diagrams, even though it sheds light on the behavior of rank functions in discrete settings. The bounds appearing in the proof (see Appendix A) will be useful in our next derivations.

Stability Under the 1-Wasserstein Distance
As noted above, Proposition 10 holds only for points away from the diagonal, which highlights the differences in sensitivity to noise between $L^p$ norms on rank functions and the bottleneck distance on persistence diagrams. This observation was already made by Landi and Frosini (1997); we develop it further here with a formal study.
We now establish that full (untruncated) rank functions satisfy a stability property with respect to the 1-Wasserstein distance for persistence diagrams. As mentioned in Section 2.3, stability properties of the Wasserstein metric on persistence diagrams were not studied in detail until very recently, which limited their use as upper bounds in stability studies. We use the Cellular Wasserstein Stability Theorem (Theorem 6; Skraba and Turner, 2021) to establish stability for rank functions.
Theorem 12 (1-Wasserstein Stability for Rank Functions). Let $p \in \{1, 2\}$ and let M be a p.f.d. persistence module with finite intervals in its barcode decomposition. Then there exists a constant $C_{M,p} > 0$ such that, for any other p.f.d. persistence module N satisfying $W_1(\mathrm{Dgm}(M), \mathrm{Dgm}(N)) \le 1$, we have
$$\| \beta_M - \beta_N \|_p \le C_{M,p}\, W_1(\mathrm{Dgm}(M), \mathrm{Dgm}(N))^{1/p}. \tag{3.4}$$
In other words, the map $(\mathcal{D}, W_1) \to (\mathcal{I}_1, L^1)$ sending a persistence diagram to its corresponding rank function is locally Lipschitz, and the same map between the spaces $(\mathcal{D}, W_1) \to (\mathcal{I}_1, L^2)$ is locally Hölder with exponent 1/2.
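A single-bar example illustrates the orders of magnitude in Theorem 12; it is a worked example under simplifying assumptions, not part of the proof. Let M have the single bar $[b, d)$ with persistence $L = d - b$, and let N have the single bar $[b + \varepsilon, d + \varepsilon)$ with $0 < \varepsilon < L$. The rank functions are indicators of triangles,
$$\beta_M = \mathbf{1}\{b \le x \le y < d\}, \qquad \beta_N = \mathbf{1}\{b + \varepsilon \le x \le y < d + \varepsilon\},$$
whose intersection $\{b + \varepsilon \le x \le y < d\}$ has area $(L - \varepsilon)^2/2$. Hence, with the Lebesgue measure,
$$\|\beta_M - \beta_N\|_1 = 2\Big(\frac{L^2}{2} - \frac{(L - \varepsilon)^2}{2}\Big) = 2L\varepsilon - \varepsilon^2.$$
With p = q = 1, matching the two points directly costs $\varepsilon + \varepsilon = 2\varepsilon$, whereas sending both to the diagonal costs $L + L = 2L > 2\varepsilon$, so $W_1 = 2\varepsilon$. Therefore
$$\|\beta_M - \beta_N\|_1 = \Big(L - \frac{\varepsilon}{2}\Big) W_1 \le L\, W_1, \qquad \|\beta_M - \beta_N\|_2 = \sqrt{2L\varepsilon - \varepsilon^2} \le \sqrt{L\, W_1}.$$
The $L^1$ bound is linear in $W_1$ with a factor controlled by the persistence L, in line with $C_{M,1}$ depending only on the maximum persistence. For p = 2, the ratio $\|\beta_M - \beta_N\|_2 / W_1^{\alpha}$ behaves like $\varepsilon^{1/2 - \alpha}$ as $\varepsilon \to 0$, so it is unbounded for every $\alpha > 1/2$: in this example the Hölder exponent 1/2 in Theorem 12 cannot be improved.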
Remark 13. The constants appearing in Theorem 12 are explicit in terms of m, the number of points in Dgm(M), and R, the maximum persistence of the points in Dgm(M). Theorem 12 provides a stronger theoretical guarantee than Proposition 10, not only because it also covers the diagonal, but also because the constant $C_{M,p}$ in (3.4) is smaller than the constant $K_{M,p}$ in (3.2) for p = 1, 2. For p = 1, the latter depends on the number of points in the persistence diagram of M and on R, its maximum persistence, whereas the former depends only on R. For $C_{M,2}$ we maintain a dependence on the number of points in the diagram of M, but it still provides a tighter bound than $K_{M,2}$, since this dependence is squared instead of linear.

Inference with Rank Functions
In this section, we study the performance of rank functions in machine learning tasks on real data. Specifically, we focus on inferential tasks, namely classification and prediction, in the single-parameter setting.

Using Persistent Homology in Data Analysis
Persistent homology captured by persistence diagrams and barcodes fulfils the essential requirements for data analysis: interpretability (via the Structure Theorem) and stability (with various stability results available). Moreover, the space of persistence diagrams is known to support probability and statistics (Mileyko et al., 2011; Blumberg et al., 2014). Despite these desirable properties, challenges remain in using persistence diagrams and barcodes across the full scope of statistical analysis, mainly due to their complicated geometry, which results in, for example, non-unique geodesics and Fréchet means (Turner et al., 2014). As a consequence, there are, broadly speaking, two approaches to handling persistent homology in statistical data analysis. One approach develops new data-analytic methodology, such as machine learning algorithms and statistical models, that accommodates barcodes or persistence diagrams directly (e.g., Fasy et al. (2014); Reininghaus et al. (2015); Hofer et al. (2017)). The other approach vectorizes barcodes or persistence diagrams so that existing methods can be applied (e.g., Chazal et al. (2014); Bubenik (2015); Adams et al. (2017)).
Our approach diverges from both strategies by exploring rank functions as equivalent, alternative representations of persistent homology to leverage theory from functional data analysis (FDA). Methods for FDA, which were constructed to analyze data in the form of functions, are well-established in statistics and many of them arise as extensions of methodologies from multivariate data analysis. Rank functions equipped with the $L^2$ metric, as discussed above, form a metric space that admits a Hilbert structure, and thus become a viable data structure amenable to integrating FDA with persistent homology.
We emphasize here that we are not modifying the output of persistent homology, as vectorization methods do to persistence diagrams or barcodes, since rank functions are equivalent to barcodes and persistence diagrams.Equally, we are not developing new methodology to accommodate persistent homology, as the existing field of FDA is directly applicable to persistent homology captured by rank functions.

Functional Support Vector Machine on Single-Parameter Rank Functions
Our first study is the performance of rank functions in the single-parameter persistence setting in classification.Specifically, we study the clinical application of discerning heart rate variability between healthy individuals and post-stroke (acute ischemic) patients using a functional support vector machine (FSVM) (Rossi and Villa, 2006).
Functional Data Analysis and High Dimensionality.Functional data, where datasets are collections of functions, are inherently infinite dimensional, which means that the discrete realizations of the underlying surfaces or curves are high dimensional and can cause various problems, such as overfitting.
However, FDA methodologies are generally designed to cope with this high dimensionality; broadly, they circumvent the problem in two ways. One way is dimensionality reduction: projecting the data onto a small collection of orthogonal basis functions produces lower-dimensional vectors that are robust to the choice of discretization. The choice of basis depends on the underlying functions; e.g., Fourier basis functions can be used to approximate functions that exhibit cyclical behavior, while wavelet basis functions can be used to approximate functions that exhibit local fluctuations. Alternatively, a data-driven approach may be adopted; one of the most commonly used techniques is functional PCA (FPCA) (Dauxois et al., 1982), which is a descriptive (rather than inferential) technique and has previously been implemented on rank functions by Robins and Turner (2016). FPCA finds orthonormal basis functions ordered by the amount of variation in the data they explain, and projects onto the finite subset of them with the greatest variation.
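Projection onto an orthonormal basis can be illustrated with the Haar wavelets used later in this section. The sketch below computes the orthonormal Haar transform of a vector whose length is a power of 2, together with a 2D version for a rank function sampled on a square grid; the 2D scheme (full 1D transforms of every row, then of every column, often called the standard decomposition) is an assumption here. Keeping only the leading coefficients gives a low-dimensional representation. Function names are hypothetical.

```python
import math

def haar_1d(signal):
    """Orthonormal Haar wavelet transform of a list whose length is a power of 2.

    Output order: [overall approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of 2")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Coarser details are prepended, so the finest level ends up last.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details

def haar_2d(grid):
    """Standard 2D Haar decomposition: 1D transform of every row, then of every column."""
    rows = [haar_1d(row) for row in grid]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Because the transform is orthonormal, it preserves the Euclidean norm (Parseval), so distances between discretized rank functions are unchanged when all coefficients are kept; truncation then trades accuracy for dimension.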
The other way is to build into the methodology an invariance to the choice of grid points for evaluation, so that, as the number of grid points increases, convergence to an appropriate result is guaranteed by construction. This is known as the refinement invariance principle (Cox and Lee, 2008) and will not be our focus here.
Functional Support Vector Machines.Classical SVMs are supervised binary classification methods that seek to find the optimal boundary in the feature space which distinguishes between observations of the two categories in a way such that the distance to the boundary from any data point is maximized.For our FSVM application on rank functions, let {f 1 , f 2 , . . ., f N } be a collection of centered, discretized rank functions with corresponding labels (y i ) i=1,...,N ∈ {−1, 1} identifying the two groups.We adopt the soft margin approach to determine the boundary for conventional reasons, as it is used in most computational packages.Here, "soft" refers to certain deviations from the boundary being allowed for classification in the approach.The boundary can be defined for some function ψ ∈ H, H being a Hilbert space, and scalar b ∈ R as ⟨ψ, , where ζ i s are slack variables in the soft margin approach providing trade-offs between accuracy and overfitting.The optimal boundary is one which maximizes the margin given by 2 ∥ψ∥ and the optimization problem can be solved more easily through its dual formulation, by sequential minimal optimization (Platt, 1998).
In practice, however, not all data are linearly separable, in which case a workaround is to map the data to a higher-dimensional space where a clearer division between the two classes becomes observable. This technique is referred to as the kernel trick (Boser et al., 1992). Let ϕ be this feature map. Then the inner product in the previous optimization problem is replaced by a kernel function $K(f, g) = \langle \phi(f), \phi(g) \rangle$. Examples include the polynomial kernel of order d, $K(f, g) = (\langle f, g \rangle + 1)^d$, and the Gaussian radial basis function (GRBF) kernel, $K(f, g) = \exp(-\gamma \|f - g\|^2)$ (Rossi and Villa, 2006). These kernels are used in a wide range of biomedical applications, for example, in the identification of PTSD patients based on resting-state functional magnetic resonance imaging (fMRI) (Saba et al., 2022) and in the classification of brain functions from electroencephalographic signals (Xie et al., 2008). In these two examples, the GRBF kernel outperforms other kernels and classifiers.
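A minimal sketch of the two kernels above for discretized functions, assuming each rank function is flattened into a row vector on a uniform grid so that ⟨f, g⟩ is approximated by a Riemann sum (cell area times the Euclidean dot product). The names `l2_inner`, `polynomial_kernel`, and `grbf_kernel` are hypothetical.

```python
import numpy as np

def l2_inner(F, G, cell):
    """Riemann-sum approximation of <f, g>_{L2} for every pair of rows.
    F: (n, p) and G: (m, p) flattened functions; cell: area of one grid cell."""
    return cell * F @ G.T

def polynomial_kernel(F, G, cell, d):
    """K(f, g) = (<f, g> + 1)^d, evaluated for all row pairs."""
    return (l2_inner(F, G, cell) + 1.0) ** d

def grbf_kernel(F, G, cell, gamma):
    """K(f, g) = exp(-gamma ||f - g||^2), using ||f-g||^2 = <f,f> - 2<f,g> + <g,g>."""
    sq = (cell * np.sum(F ** 2, axis=1)[:, None]
          - 2.0 * l2_inner(F, G, cell)
          + cell * np.sum(G ** 2, axis=1)[None, :])
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off
```

The resulting Gram matrices can be passed to any SVM solver that accepts precomputed kernels, for instance scikit-learn's `SVC(kernel="precomputed")`.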
Data Description. The dataset we study consists of 86 sequences of 512 beat-to-beat time intervals (RR series) extracted from electrocardiograms in a clinical study of two groups of people in a similar age category: a control group of 46 healthy individuals and a group of 40 patients who had recently experienced stroke episodes (Gasecki et al., 2021; Narkiewicz et al., 2021). Stroke patients generally show reduced heart rate variability compared to healthy individuals (Lees et al., 2018). We aim to discern differences in heart rate variability between the two groups using persistent homology rank functions. For the computations, we first linearly interpolated between the points of each RR series to construct a continuous function of time, and then constructed a sublevel set filtration based on the height function in the positive y-direction. We then computed the zero-dimensional persistent homology rank function of each individual's RR series. An example of the steps in this process is visualized in Figure 4.
Training the FSVM Classifier and Evaluating Performance. On the set of rank functions computed from the data, we train FSVM classifiers using the linear kernel, the GRBF kernel, and polynomial kernels of three different degrees (2, 3, and 5). Since we are working with discretized functions, we consider both the rank functions as computed from the data and versions transformed by dimension reduction. We experiment with both a set of data-driven basis functions obtained from FPCA and a set of standard basis functions, the Haar wavelets. Haar wavelets have also been used in other inferential tasks in persistent homology; Häberle et al. (2023) use them in persistent homology density estimation.
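To make the rank-function computation in the Data Description concrete, here is a sketch (not necessarily our exact implementation) of the zero-dimensional sublevel set persistence of a linearly interpolated RR series and of its rank function on a grid. It relies on the fact that, for a piecewise-linear interpolation, the connected components of every sublevel set are determined by the sample values alone, so a single union-find pass over the samples with the elder rule suffices. We take the rank function to be β(x, y) = #{[b, d) : b ≤ x ≤ y < d}, the rank of M_x → M_y for half-open bars, set to zero when x > y; the helper names are hypothetical.

```python
import numpy as np

def sublevel_h0_barcode(values):
    """0-dim sublevel-set persistence of the piecewise-linear interpolation of
    `values` (elder rule). Returns (birth, death) pairs; the single essential
    bar has death = inf. Zero-length bars are omitted."""
    values = np.asarray(values, dtype=float)
    n = values.size
    parent = list(range(n))
    birth = values.copy()             # birth value of the component rooted at i
    added = np.zeros(n, dtype=bool)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    bars = []
    for i in np.argsort(values, kind="stable"):   # sweep the sublevel sets upward
        added[i] = True
        for j in (i - 1, i + 1):                  # neighbours on the time axis
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the component born later dies at values[i]
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    bars.append((birth[young], values[i]))
                parent[young] = old
    bars.append((values.min(), np.inf))
    return bars

def rank_function(bars, grid):
    """beta(x, y) = #{[b, d) : b <= x <= y < d} on grid x grid (zero if x > y)."""
    xs = np.asarray(grid, dtype=float)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    beta = np.zeros_like(X)
    for b, d in bars:
        beta += (b <= X) & (X <= Y) & (Y < d)
    return beta
```

For instance, the toy series [3, 1, 4, 0, 5, 2] has local minima 1, 0, and 2, giving the bars [1, 4), [2, 5), and the essential bar [0, ∞).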
We evaluate the performance of the binary classifiers using two metrics: accuracy and the area under the receiver operating characteristic curve (AUC-ROC). Both metrics are averaged over ten iterations of five-fold cross-validation.
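The evaluation protocol can be sketched without any machine learning library, assuming decision scores in which larger values favour the class labelled 1. AUC-ROC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half), and the stratified folds are reshuffled for each of the ten repetitions. `fit_score` stands for any user-supplied training routine; all names are hypothetical.

```python
import numpy as np

def auc_roc(scores, labels):
    """AUC-ROC via the Mann-Whitney statistic: fraction of (positive, negative)
    pairs where the positive scores higher, ties counted as 1/2.
    Labels are in {-1, 1}."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == -1]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def repeated_stratified_folds(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for n_repeats reshuffled stratified k-fold splits."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        fold_of = np.empty(labels.size, dtype=int)
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            fold_of[idx] = np.arange(idx.size) % n_splits  # deal class members round-robin
        for k in range(n_splits):
            yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

def cross_validated_scores(fit_score, X, y, **kw):
    """Mean accuracy and AUC-ROC over all folds. `fit_score(X_tr, y_tr, X_te)` is a
    hypothetical routine returning real-valued decision scores; predictions are
    the sign of the score."""
    accs, aucs = [], []
    for tr, te in repeated_stratified_folds(y, **kw):
        s = np.asarray(fit_score(X[tr], y[tr], X[te]), dtype=float)
        accs.append(np.mean(np.where(s >= 0, 1, -1) == y[te]))
        aucs.append(auc_roc(s, y[te]))
    return float(np.mean(accs)), float(np.mean(aucs))
```

With 46 controls and 40 patients, every test fold contains both classes, so the AUC-ROC is always defined.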
Results. The performance of the FSVM classifiers is summarized in Table 1. Overall, the FSVM classifiers with degree-two and degree-three polynomial kernels produce the highest accuracy (above 80%) among the kernels implemented, with average AUC-ROC values above 0.8, indicating excellent discrimination between the two categories. We also report runtimes for (a) the computation of the PH rank functions; (b) the training of the corresponding SVM; and (c) the computation of the accuracy and AUC-ROC over ten iterations of five-fold cross-validation, together with their averages. Although the linear SVM on PCA-projected data had a higher runtime and ran into convergence limitations within the specified maximum number of iterations, all computations remain manageable in terms of runtime.
In general, linear kernels perform less well in discriminating between functions of the two categories, except when the dimensionality of the rank functions is reduced by projecting onto their principal component functions.
In doing so, we considered only the first 30 principal component functions (those with the largest eigenvalues), which together explain 95% of the variation. For both transformations, working with lower-dimensional vectors and polynomial kernels yields similar, slight improvements in the accuracy and AUC-ROC of the classifiers compared to the original rank functions.
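The FPCA step can be sketched with a singular value decomposition of the centered data matrix, choosing the smallest number of components whose cumulative explained variance reaches a threshold (95% in the analysis above). This assumes the rank functions are discretized on a common uniform grid and flattened into rows; on such a grid the L2 inner product is a constant multiple of the Euclidean one, so the principal directions are unchanged. `fpca_project` is a hypothetical name.

```python
import numpy as np

def fpca_project(F, var_threshold=0.95):
    """F: (N, p) array, one discretized (flattened) rank function per row.
    Returns the scores on the fewest principal component functions whose
    cumulative explained variance reaches var_threshold, the components, and the mean."""
    F = np.asarray(F, dtype=float)
    mean = F.mean(axis=0)
    centered = F - mean
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    total = np.sum(s ** 2)
    if total == 0:
        raise ValueError("all functions are identical; nothing to project")
    cumulative = np.cumsum(s ** 2) / total
    k = min(int(np.searchsorted(cumulative, var_threshold)) + 1, Vt.shape[0])
    components = Vt[:k]                  # principal component functions (rows)
    scores = centered @ components.T     # low-dimensional classifier input
    return scores, components, mean
```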
We compare with Graff et al. (2021), who trained SVM classifiers on quadruples of clinical indices related to the RR series and on quadruples of features derived from persistence diagrams; we now discuss this analysis in further detail. The AUC-ROCs of the optimal models for both approaches, as reported in Graff et al. (2021), are given in Table 2. On average, the optimized classifier using rank functions performed better than those using standard heart rate variability indices in the frequency and time domains, which achieved average AUC-ROCs of only 0.79 and 0.75, respectively (Graff et al., 2021). For the persistence-based approach in Graff et al. (2021), a wide range of topological indices were extracted from the persistence diagrams. Some indices were typical, such as the total number of intervals, the sum of the lengths of all persistence intervals, and various means and standard deviations; others were less conventional, such as the persistent entropy (due to Atienza et al. (2020) and given by $-\sum_i \frac{\ell_i}{L} \log \frac{\ell_i}{L}$, where ℓ_i is the length of an interval and L is the sum of the lengths of all intervals), the indices frac_{5%}, frac_{100}, and frac_{200} (the numbers of intervals shorter than 5% of the length of the longest interval, 100 ms, and 200 ms, respectively), or the signal-to-noise ratio (the ratio of the summed lengths of "signal" intervals to the summed lengths of "noise" intervals, where signal intervals are those longer than 5% of the length of the longest interval and the remaining intervals are considered noise). Graff et al. (2021) also introduced new geometric measures called topological triangle indices, which are based on the triangular interpolation of RR interval histograms classically used in heart rate variability analysis; they construct a triangle on the persistence diagram with one side lying on the diagonal, enclosing the points of the diagram except for a small percentage of outliers, such that the triangle is as compact as possible. With these topological indices, optimized combinations of parameters achieved an AUC-ROC of up to 0.84. However, computing these indices is an involved process. For an additional comparison to a typical persistence-based approach, we also trained SVM classifiers on persistence images (Adams et al., 2017) and persistence landscapes (Bubenik, 2015), which are stable vectorizations computed from the barcodes. For persistence images, among SVMs with alternative kernels, sparse SVMs, and SVMs applied after dimensionality reduction with PCA, the SVM with a linear kernel performed best, with an accuracy of 68.5% and an AUC-ROC of 0.793. For persistence landscapes, the SVM with a GRBF kernel performed best, with an accuracy of 81.0% and an AUC-ROC of 0.903. Full results are shown in Appendix C.
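For reference, the following sketch computes several of the less conventional indices described above from a barcode whose interval lengths are in milliseconds: the persistent entropy, the frac counts, and the signal-to-noise ratio. Dropping infinite bars, using the natural logarithm, and requiring at least one finite bar are assumptions of this sketch (Graff et al. (2021) may use different conventions), and the function name is hypothetical.

```python
import numpy as np

def topological_indices(bars):
    """Selected barcode indices from finite (birth, death) pairs in ms.
    Infinite bars are dropped; at least one finite bar is required."""
    b = np.array([(s, e) for s, e in bars if np.isfinite(e)], dtype=float)
    lengths = b[:, 1] - b[:, 0]
    L = lengths.sum()
    p = lengths[lengths > 0] / L
    entropy = -np.sum(p * np.log(p))           # persistent entropy
    longest = lengths.max()
    signal = lengths > 0.05 * longest          # "signal" vs. "noise" intervals
    noise_sum = lengths[~signal].sum()
    return {
        "entropy": float(entropy),
        "frac_5pct": int(np.sum(lengths < 0.05 * longest)),
        "frac_100": int(np.sum(lengths < 100.0)),
        "frac_200": int(np.sum(lengths < 200.0)),
        "snr": float(lengths[signal].sum() / noise_sum) if noise_sum > 0 else float("inf"),
    }
```

Each index is a single scalar, which is what makes quadruples of such indices convenient SVM inputs, at the cost of discarding most of the diagram.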
In conclusion, rank functions, as a direct and equivalent representation of persistent homology (as opposed to a vectorized and manipulated one), outperform classification with persistence images and perform on par with, or slightly better than, both the much more involved approach of computing topological indices from Graff et al. (2021) and the approach using persistence landscapes.

Rank Functions in Multiparameter Persistent Homology
In this section, we explore the use of multiparameter persistent homology (Carlsson and Zomorodian, 2007) in real data applications.This is currently an active area of research in TDA, due to interpretive and computational difficulties.

Multiparameter Persistent Homology: The Struggle to Generalize Barcodes
There is an important distinction between single-parameter persistence modules, i.e., diagrams $M \in \mathrm{Vec}^{(\mathbb{R}, \le)}$ such as those we have considered so far, and multiparameter persistence modules (Carlsson and Zomorodian, 2009), i.e., diagrams indexed over the poset $(\mathbb{R}^n, \preceq)$. Here, ⪯ is the product order inherited from the total order on the reals, namely $(x_1, \dots, x_n) \preceq (y_1, \dots, y_n)$ if $x_i \le y_i$ for all $i = 1, \dots, n$. The construction discussed in Section 2.1 giving rise to persistent homology can be replicated for these posets to obtain multifiltrations and multiparameter persistent homology.
The Structure Theorem in Section 2.2 can be extended to general p.f.d. persistence modules indexed over a small category (Botnan and Crawley-Boevey, 2020), which includes the case of multiparameter persistence. As mentioned before, for single-parameter persistence the only indecomposable modules are interval modules, i.e., modules I[b, d) supported on intervals [b, d) ⊂ R; this allows barcodes to be defined as multisets of intervals and the intervals to be interpreted as the births and deaths of topological features. Although there is a natural extension of the concept of an interval to general posets, the indecomposable modules over these posets are not all interval modules, so no direct, parallel definition of a barcode exists. Moreover, it has been shown that no complete, discrete invariant exists for multiparameter persistence (Carlsson and Zomorodian, 2007).
Given the lack of complete invariants for multiparameter persistent homology, a central research interest has been the development of incomplete, interpretable, and computable invariants. Some strategies for defining incomplete invariants view n-dimensional persistence modules as n-graded modules over polynomial rings (Carlsson and Zomorodian, 2007) and capitalize on invariants already existing for such objects, such as minimal presentations and multigraded Betti numbers (Lesnick and Wright, 2015, 2022) or multigraded associated primes and local cohomology (Harrington et al., 2019). Several other proposals bypass the Structure Theorem entirely. Patel (2018) generalizes the Möbius inversion connecting rank functions and persistence diagrams in single-parameter persistence to define generalized persistence diagrams. Kim and Mémoli (2021) introduced generalized rank invariants, proving that they are the counterpart to generalized persistence diagrams under the Möbius inversion of Patel (2018) in the multiparameter setting. Developing a theory of modules over posets, Miller (2020) defined QR codes for n-dimensional modules. Lastly, using resolutions and rank-exact structures, Botnan et al. (2022) defined signed decompositions and signed barcodes, which extend single-parameter barcodes and include the generalized persistence diagrams of Kim and Mémoli (2021). As in single-parameter persistence, a third approach entails vectorizing the output of persistent homology by embedding the modules in a Hilbert space, although some of these vectorizations are known to lose information. Popular vectorization methods in multiparameter persistence include persistence landscapes (Vipond, 2020), persistence images (Carrière and Blumberg, 2020), and kernels (Corbet et al., 2019), among others.
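Remarkably, rank functions extend quite naturally to multiparameter persistence.
Definition 14 (Rank Invariant). Given a p.f.d. multiparameter persistence module $M \in \mathrm{Vec}^{(\mathbb{R}^n, \preceq)}$, its rank invariant is the function
$$\operatorname{rk} M : \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : x \preceq y\} \to \mathbb{N}, \qquad \operatorname{rk} M(x, y) = \operatorname{rank}\big(M(x \preceq y) : M_x \to M_y\big).$$
The space of rank invariants for n-dimensional persistence modules will be denoted by I_n.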
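For zero-dimensional homology, the rank invariant has a simple combinatorial description: for x ⪯ y, the rank of H_0(K_x) → H_0(K_y) is the number of connected components of K_y that contain at least one vertex of K_x. The sketch below uses this to evaluate the H_0 rank invariant of a one-critical bifiltered graph, in which each vertex and edge enters at a single grade in R² and edges enter no earlier than their endpoints. It is a small illustration only, not the algorithm implemented in RIVET; the function names and the grades in the check below are made up.

```python
import numpy as np

def preceq(a, b):
    """Product order on R^n: a ⪯ b iff a_i <= b_i for every coordinate i."""
    return bool(np.all(np.asarray(a) <= np.asarray(b)))

def h0_rank_invariant(vertex_grades, edges, x, y):
    """rank(H0(K_x) -> H0(K_y)) for a one-critical bifiltered graph and x ⪯ y.
    vertex_grades: list of grades in R^2; edges: list of (u, v, grade)."""
    if not preceq(x, y):
        raise ValueError("the rank invariant is defined only for x ⪯ y")
    n = len(vertex_grades)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    in_y = [preceq(g, y) for g in vertex_grades]
    for u, v, g in edges:                       # build the components of K_y
        if preceq(g, y) and in_y[u] and in_y[v]:
            parent[find(u)] = find(v)
    # components of K_y that contain a vertex already present in K_x
    roots = {find(i) for i, g in enumerate(vertex_grades) if preceq(g, x)}
    return len(roots)
```

For instance, with vertices at grades (0, 0), (1, 0), (0, 1) and edges {0, 1} at (1, 1) and {1, 2} at (2, 2), the rank at x = y = (1, 1) is 2 (two components), while from x = (0, 0) to y = (2, 2) it is 1.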
In any of these approaches, including rank invariants, applications to real data are in their infancy. A main obstacle is the lack of efficient software to compute the invariants, with current technology also restricted to two parameters. Rank invariants of biparameter persistence modules can be computed using RIVET (The RIVET Developers, 2020; Lesnick and Wright, 2015, 2022), currently the standard software for most strategies for defining multiparameter invariants. Developing efficient algorithms and optimizing existing software remains an active research area (Kerber and Nigmetov, 2019; Scaramuccia et al., 2020; Fugacci et al., 2023).
The question of metrics for rank invariants is as important as it is for rank functions. The well-established matching distance for rank invariants is defined by restricting multiparameter persistence modules to lines (d'Amico et al., 2003, 2006, 2010). The matching distance is known to be stable for rank invariants of filtrations obtained as sublevel sets of a function f : X → R^n, with X a triangulable space, with respect to the L^∞ distance between the two filter functions (Cerri et al., 2013). More generally, the matching distance is also known to be stable with respect to the interleaving distance (Lesnick, 2015; Landi, 2018), and it is computable in polynomial time (Kerber et al., 2019; Kerber and Rolle, 2021).
Despite its computability, a significant challenge in using the matching distance in applications, especially in inferential tasks, is that it does not induce a Hilbert space structure on the space of rank invariants, which is often needed in order to adapt FDA methods (e.g., Crawford et al. (2020)). Thus, in our real data application, we focus on the L^2 distance between rank invariants, which can be computed efficiently over a discretized grid and provides the Hilbert structure necessary to integrate with FDA methods.
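Once rank invariants have been evaluated on a common uniform grid, the L^2 distance reduces to a Riemann sum. For biparameter modules this grid is a four-dimensional array indexed by pairs of grid points (x, y); filling entries where x ⪯ y fails with zeros is a convention of this sketch. The function names are hypothetical, and how the arrays are produced is left out.

```python
import numpy as np

def l2_distance(f, g, spacing):
    """Riemann-sum approximation of ||f - g||_{L2} for rank invariants sampled on
    the same uniform grid; spacing is the grid step along each of the f.ndim axes."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    if f.shape != g.shape:
        raise ValueError("rank invariants must be sampled on the same grid")
    cell = spacing ** f.ndim                  # volume of one grid cell
    return float(np.sqrt(cell * np.sum((f - g) ** 2)))

def pairwise_l2(F, spacing):
    """Pairwise L2 distance matrix for a stack F of shape (N, *grid_shape)."""
    flat = np.asarray(F, dtype=float).reshape(len(F), -1)
    sq = np.sum(flat ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * flat @ flat.T + sq[None, :]
    cell = spacing ** (np.ndim(F) - 1)
    return np.sqrt(np.maximum(cell * d2, 0.0))  # clip tiny negative round-off
```

The distance matrix is all that a metric-based classifier such as k-nearest neighbors needs.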

Application of Biparameter Rank Functions in Lung Tumor Classification
We now demonstrate the inferential ability of the biparameter rank functions using nonparametric supervised learning methods on real data.The application focus is to predict lung tumor malignancies from computed tomography (CT) images, which has been studied previously by Vandaele et al. (2023) using single-parameter topological summary statistics.Here, we aim to show that using biparameter persistent homology captures additional distinguishing features of the tumor morphology, both on a local and global scale, which, together with the rank functions, leads to improved classification.
Data Description.We study images from the Lung Image Database Consortium (LIDC), which is freely available from The Cancer Imaging Archive (TCIA) (Armato III et al., 2011, 2015).From the LIDC data, we extract a subgroup of 70 chest CT scans, complete with annotations and masks, consisting of those with primary tumors that have either been diagnosed as benign (29) or malignant (41).
Following the approach in Vandaele et al. (2023), we convert the collection of CT scan images and masks into 3D point clouds of landmarks on the tumor surfaces by sampling, as shown in Figure 5. On the resulting point clouds, we compute the biparameter rank invariants using two types of bifiltrations, both of which are extensions of the Vietoris-Rips filtration, namely, the degree-Rips filtration and the height-Rips filtration.
The degree refers to the degree of connectivity measured at each vertex of the 1-skeleton, while the height is measured along the z coordinate, in the direction in which the tumor slices are stacked. Using the bifiltration captures prominent features as they develop on the tumor surface along both filtration functions.
Classification. We utilize the following two supervised classification methods:
• k-Nearest Neighbors (Cover and Hart, 1967): This algorithm is a fundamental classification technique for both multivariate and functional data, where the decision for a new datum is made by a majority vote among its k closest neighbors. The method adapts to general metric spaces, since proximity can be measured using various metrics; it has been studied in persistent homology by Marchese et al. (2017) and Cao et al. (2024). Here, we work with the rank invariants in L^2.
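• Functional Maximum Depth (López-Pintado and Romo, 2009): This method uses an extended notion of depth for functional data to classify curves and surfaces. For a collection of rank invariants f_1(x), ..., f_n(x), x ∈ $\mathcal{X}$, with $\mathcal{X}$ their domain, the band delimited by a subcollection f_{i_1}, ..., f_{i_j} is the region bounded by its pointwise lower and upper envelopes:
$$B(f_{i_1}, \dots, f_{i_j}) = \Big\{(x, z) : x \in \mathcal{X},\ \min_{r = i_1, \dots, i_j} f_r(x) \le z \le \max_{r = i_1, \dots, i_j} f_r(x)\Big\}.$$
The band depth of a function f is the proportion of subcollections whose band contains f, accumulated over subcollection sizes: for a fixed J with 2 ≤ J ≤ n, $BD_{n,J}(f) = \sum_{j=2}^{J} BD_n^{(j)}(f)$, where
$$BD_n^{(j)}(f) = \binom{n}{j}^{-1} \sum_{1 \le i_1 < \dots < i_j \le n} \mathbf{1}\big\{G(f) \subseteq B(f_{i_1}, \dots, f_{i_j})\big\}. \qquad (5.1)$$
Here, $G(f) = \{(x, f(x)) : x \in \mathcal{X}\}$ is the graph of f, so the indicator equals one exactly when f lies entirely within the band. For our application, we use the modified band depth MBD, where instead of the strict indicator function in (5.1) we consider the proportion of the domain on which f lies within the band:
$$MBD_{n,J}(f) = \sum_{j=2}^{J} \binom{n}{j}^{-1} \sum_{1 \le i_1 < \dots < i_j \le n} \frac{\omega\big(A(f; f_{i_1}, \dots, f_{i_j})\big)}{\omega(\mathcal{X})}, \qquad (5.2)$$
where ω is the Lebesgue measure on $\mathcal{X}$ and $A(f; f_{i_1}, \dots, f_{i_j}) \equiv \{x \in \mathcal{X} : \min_{r = i_1, \dots, i_j} f_r(x) \le f(x) \le \max_{r = i_1, \dots, i_j} f_r(x)\}$. Hence, a new invariant f is assigned to the class whose training sample gives f the largest modified band depth (5.2).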
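A minimal sketch of the MBD classifier, assuming J = 2 (the definition above allows any 2 ≤ J ≤ n) and rank invariants sampled on a common uniform grid, so that ω(A)/ω(X) is approximated by the fraction of grid points on which f lies inside the band. The function names are hypothetical, and each class needs at least two training functions.

```python
import numpy as np
from itertools import combinations

def modified_band_depth(f, sample):
    """MBD with J = 2: average over all pairs (f_i1, f_i2) in `sample` of the
    fraction of grid points where min(f_i1, f_i2) <= f <= max(f_i1, f_i2)."""
    f = np.asarray(f, dtype=float).ravel()
    S = np.asarray(sample, dtype=float).reshape(len(sample), -1)
    props = []
    for i, j in combinations(range(len(S)), 2):
        lo = np.minimum(S[i], S[j])          # pointwise lower envelope of the band
        hi = np.maximum(S[i], S[j])          # pointwise upper envelope of the band
        props.append(np.mean((lo <= f) & (f <= hi)))
    return float(np.mean(props))

def mbd_classify(f, samples_by_class):
    """Assign f to the class whose training sample gives it the largest MBD."""
    depths = {c: modified_band_depth(f, S) for c, S in samples_by_class.items()}
    return max(depths, key=depths.get)
```

The loop over pairs costs O(n² p) per query for n training functions of p grid values each, which is modest at the sample sizes considered here.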
In the dataset we study, each of the tumor images is classified as either benign or primary malignant.Our task is to use topological summaries of the images as predictors to determine whether a primary tumor is benign or malignant.We train the classifiers on the biparameter rank invariants computed from the whole dataset.Taking a 75/25 split of the data for training and testing and averaging over 50 iterations, we obtain the results in Table 3.Furthermore, 24 of the 29 CT scans of benign tumors and 17 of the 41 CT scans of malignant tumors were taken with added contrast material.Refining to this smaller set, we see further improvements in the predictive accuracies reported in Table 4.
Results. Without added contrast, our results show that by training a modified band depth (MBD) Overall, the MBD classifiers trained on the different bifiltrations perform better than the k-NN classifiers and also better than the optimized model in Vandaele et al. (2023), which achieved an AUC-ROC of 67.7 on this dataset.
Moreover, on the subset of data with added contrast material, we find that the k-NN classifiers achieved a better AUC-ROC with both bifiltrations than the optimal model in Vandaele et al. (2023), which had an average AUC-ROC of 78.0. In fact, the average AUC-ROC of the best k-NN classifier, based on the height-Rips filtration, was 83.0. Therefore, we find that the additional information captured by the bifiltrations leads to better predictions.

Discussion
In this paper, we revisited persistent homology, which provides a geometric representation of data and can be used as a tool for point cloud processing, as represented by rank functions in inferential, nonparametric FDA settings. In order to validate our findings from the data analyses, we derived stability conditions for rank functions endowed with an appropriate metric over function space for FDA implementation: namely, the L^2 distance, which provides a Hilbert structure on the space of rank functions. Stability of rank functions, alternatively known as persistent Betti numbers, was well established with respect to the matching distance (Cerri et al., 2013), while, to the best of our knowledge, a thorough understanding of the stability behavior of rank functions endowed with the L^p metric was previously missing from the literature. We fill this gap with Proposition 10, showing that we can compare to the bottleneck distance for barcodes only when restricting to points away from the diagonal, and with Theorem 12, where we also obtain bounds for rank functions with respect to the 1-Wasserstein distance. We also evaluated the performance of the topological representation of data as rank functions in two real-world applications and found that incorporating topological information outperforms previous non-topological methods, as well as other persistence-inspired approaches that use complicated constructions rather than equivalent representations of persistence diagrams. In addition to performing less well, these topological constructions based on persistent homology are more difficult to interpret and relate back to the original data. A particularly important contribution of this work that we highlight is the second application, where we used biparameter rank invariants (i.e., rank functions adapted to multiparameter persistent homology). The adaptation of multiparameter persistent homology to real data is still in its infancy and far from as widespread as in the single-parameter case because, given the lack of direct extensions of barcodes and persistence diagrams to higher dimensions, much of the work in recent years has been foundational and devoted to finding alternative invariants that capture as much information as possible from persistent homology. We have found that using rank invariants directly in our real data analysis and machine learning classification task provides excellent results, encouraging the use of this invariant in multiparameter persistent homology.
This naturally inspires several directions for future research. The first would be extending the theoretical stability results in Section 3 to multiparameter persistent homology represented by rank invariants. As in this work, such a result would be important to validate the experimental findings in this paper and to justify the continued use of multiparameter persistent homology in real data applications. Another important direction is comparative: given the multitude of invariants proposed in the literature on multiparameter persistent homology, understanding the performance of rank invariants in comparison to that of other existing invariants would provide a basis and guideline for invariant usage in real data applications. Specifically, we would like to know whether rank invariants, as a direct invariant obtained from persistent homology, are able to capture more information than other functional vectorizations that embed modules into Hilbert spaces. In the single-parameter setting, this is true, as we explored in this work.

A Proofs
We now give the proofs of the key theoretical results presented in Section 3.
Proof of Proposition 10. Let 1 ≤ p < ∞ and let M be a p.f.d. persistence module with barcode $\mathrm{Bar}(M) = \{[b_i, d_i) : 1 \le i \le m\}$. Let N be a p.f.d. persistence module such that $d_B(\mathrm{Dgm}(M), \mathrm{Dgm}(N)) < \eta$, with barcode $\mathrm{Bar}(N) = \{[\tilde b_j, \tilde d_j) : 1 \le j \le n\}$. By the definition of η, the optimal matching ϕ between points in Dgm(M) and Dgm(N) defined by the bottleneck distance matches all points outside of the diagonal in Dgm(M) to points in Dgm(N) outside of the diagonal. In addition, all the remaining points in Dgm(N), which are matched to the diagonal, are at an ℓ^∞-distance of less than δ/2 from their orthogonal projection onto the diagonal, which means that the truncated rank functions $\beta_\delta$ of the corresponding interval modules vanish. With these two facts and the additivity of rank functions, we obtain (A.1) for all 1 ≤ i ≤ m, where J ⊂ {1, ..., n} is the subset of indices of points in Dgm(N) matched to the diagonal.
We now obtain a bound for (A.1).For i ∈ {1, . . ., m}, define the sets where ∆ denotes the symmetric difference (see Figure 6 for some illustrative examples of D i ).
Notice that for (x, y) ∈ $D_i^c$ the truncated rank functions coincide, while they differ by exactly one for (x, y) ∈ $D_i$. This implies that the p-th power of the L^p distance between them equals ω(D_i), where ω denotes the Lebesgue measure in R^2.
The rectangles depicted in red dashed lines and green dotted lines in Figure 6 each have one side of length 6c).
The other side of both rectangles is bounded by Observe that this is also a bound for the lengths of the sides of the rectangle in the intersection.
In case 6c, by adding the Lebesgue measure of the rectangles, we get where ω(·) denotes the Lebesgue measure. In cases 6a and 6b, adding the Lebesgue measure of the rectangles and triangles that decompose the figures, we obtain As previously mentioned in Section 3, we can always obtain a bounding constant that depends on both modules M and N as follows: let R′ be the maximum of the lifetimes of the bars in the barcodes of M and N, so that $\omega(D_j) \le 2R' \cdot d_B(\mathrm{Dgm}(M), \mathrm{Dgm}(N))$ (see (A.2)), and then use this bound in (A.1).
In the proof of Proposition 10 above, we refine this strategy by bounding with a constant that only depends on the module M .
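The key observation in the proof, that on D_i the two rank functions differ by exactly one, means that for a single pair of interval modules the p-th power of the L^p distance between their rank functions equals the area of the symmetric difference of their supports. The sketch below checks this numerically for untruncated rank functions (β(x, y) = 1 if b ≤ x ≤ y < d), ignoring the δ-truncation used in the proof. Since each support is the right triangle {b ≤ x ≤ y < d} of area (d − b)²/2, and two such triangles intersect in the triangle associated with [max b, min d), the area has a closed form. The function names and grid parameters are hypothetical.

```python
import numpy as np

def interval_rank(b, d, X, Y):
    """Rank function of the interval module I[b, d): 1 on b <= x <= y < d, else 0."""
    return ((b <= X) & (X <= Y) & (Y < d)).astype(float)

def symdiff_area(b1, d1, b2, d2):
    """Exact Lebesgue measure of the symmetric difference of the two supports."""
    overlap = max(0.0, min(d1, d2) - max(b1, b2))   # leg of the intersection triangle
    return 0.5 * (d1 - b1) ** 2 + 0.5 * (d2 - b2) ** 2 - overlap ** 2

def lp_distance_grid(b1, d1, b2, d2, p=2, lo=-1.0, hi=3.0, m=801):
    """Riemann-sum approximation of ||beta_1 - beta_2||_p on [lo, hi]^2."""
    xs = np.linspace(lo, hi, m)
    h = xs[1] - xs[0]
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    diff = np.abs(interval_rank(b1, d1, X, Y) - interval_rank(b2, d2, X, Y))
    return (h * h * np.sum(diff ** p)) ** (1.0 / p)
```

For example, the bars [0, 1) and [0.5, 1.5) give a symmetric difference of area 0.5 + 0.5 − 0.25 = 0.75, so their L^p distance is 0.75^{1/p}, independently of how far apart the bars are in any other sense.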
A natural follow-up question is whether it is possible to achieve a bound such as that in (A.4), but with the following dependency with respect to the bottleneck distance: p for some values p ≥ 2. This would imply a Lipschitz stability condition (notice that for p = 1, Proposition 10 is actually a Lipschitz condition).In a similar vein to Corollary 8 by Skraba and Turner (2021), the answer to this question is negative, and the key to this fact is the following counterexample.
Given two persistence modules M, N ∈ $\mathrm{Vec}^{(\mathbb{R}, \le)}$ and 1 ≤ p < ∞, the p-landscape distance between them is defined as
$$\Lambda_p(M, N) = \|\lambda^M - \lambda^N\|_p = \Big(\sum_{k=1}^{\infty} \|\lambda_k^M - \lambda_k^N\|_p^p\Big)^{1/p},$$
where $\lambda^M = (\lambda_k^M)_{k \ge 1}$ denotes the persistence landscape of M. In Bubenik (2015), several stability results are established for landscapes endowed with this metric. Landscapes and rank functions are inherently different in nature: the former are a vectorization of persistence diagrams and barcodes (built from rank functions), while the latter are a direct and equivalent representation of diagrams and barcodes. Both have been used in real-data applications, and a main contribution of this work is the performance assessment of rank functions in inferential machine learning tasks. This raises the question of how the stability results associated with landscapes compare with those established in this work.
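Since landscapes are built from rank functions, a short sketch may help compare the two. It evaluates λ_k on a grid as the k-th largest "tent" function max(0, min(t − b, d − t)) over the bars, which for finite bars agrees with Bubenik's definition λ_k(t) = sup{m ≥ 0 : β^{t−m, t+m} ≥ k}, and approximates Λ_p by a Riemann sum. All landscapes beyond the number of bars vanish, so the infinite sum is finite. Only finite bars are handled, and the function names are hypothetical.

```python
import numpy as np

def landscapes(bars, grid):
    """lambda_k(t), k = 1..len(bars), on `grid` as the k-th largest tent
    function max(0, min(t - b, d - t)). Returns shape (len(bars), len(grid))."""
    t = np.asarray(grid, dtype=float)
    if len(bars) == 0:
        return np.zeros((0, t.size))
    tents = np.array([np.maximum(0.0, np.minimum(t - b, d - t)) for b, d in bars])
    return -np.sort(-tents, axis=0)   # sort each column in decreasing order

def landscape_distance(bars_m, bars_n, grid, p=2):
    """Riemann-sum approximation of the p-landscape distance on a uniform grid."""
    t = np.asarray(grid, dtype=float)
    h = t[1] - t[0]
    lam_m, lam_n = landscapes(bars_m, t), landscapes(bars_n, t)
    K = max(len(lam_m), len(lam_n))

    def pad(a):
        # landscapes beyond the number of bars are identically zero
        return np.vstack([a, np.zeros((K - len(a), t.size))])

    diff = np.abs(pad(lam_m) - pad(lam_n))
    return float((h * np.sum(diff ** p)) ** (1.0 / p))
```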
A first observation is that the p-landscape metric, introduced in Bubenik (2015), involves an infinite sum over the L p distances of these landscapes, which is a first distinction from the direct L p metrics that we consider over rank functions.Using the ∞-landscape distance, stability is then achieved with respect to the bottleneck distance between diagrams, which surpasses our Proposition 10 (Bubenik, 2015, Theorem 13).However, this is expected, since the persistence landscape is an incomplete invariant and thus sacrifices some information encompassed in the persistence diagram for improved stability in the L ∞ metric, while rank functions, as mentioned previously, are exactly equivalent to persistence diagrams and therefore comprise all topological information of the data captured by persistent homology.
Until recently, such a stability bound was the best possible, since stability of PH was only rigorously established for the bottleneck distance. However, thanks to new stability results for the p-Wasserstein distances established by Skraba and Turner (2021), stability is now possible with respect to these metrics. This is what we achieve in Theorem 12. Comparing this result to the p-landscape stability theorem (Bubenik, 2015, Theorem 16) is challenging due to the different settings and metrics. Theorem 16 of Bubenik (2015) considers filtrations over triangulable, compact metric spaces, a restriction we do not impose. In this setting, the p-landscape metric is compared to the L^∞ distance between filtering functions in sublevel-set filtrations.
Our work extends beyond sublevel-set filtrations, and our L p metrics over rank functions are thus not easily comparable to the p-landscape distances.

C HRV Classification Results using Persistence Images and Persistence Landscapes
We include here the results of SVM classification using the vectorization techniques of persistence images and persistence landscapes on the HRV data. Table 5 shows the average accuracy, AUC-ROC, and runtime (in seconds) of the SVM classifier using persistence images under various kernels, with and without dimensionality reduction using PCA. Table 6 shows the same quantities for the first five persistence landscapes λ_k, 1 ≤ k ≤ 5. In these tables, the runtime includes: (a) the computation of the PH barcodes for all data and, from them, of the corresponding vectorizations; (b) the training of the corresponding SVM; and (c) the computation of the accuracy and AUC-ROC over five-fold cross-validation. Experiments were run on an 11th Gen Intel Core i5-1135G7 processor with 16 GB RAM. Table 7 further shows the average accuracy and AUC-ROC of linear support vector classification (LSVC) and sparse LSVC on the data. Recall that whereas standard LSVC uses an L^2 penalty in the loss function, sparse LSVC uses the L^1 norm, which effectively reduces the dimensionality of the feature space (Zhu et al., 2003).

Figure 2: Examples of 3D point clouds and their corresponding persistence diagrams.

Figure 4: An example showing (a) the RR series for a healthy individual, (b) its respective persistence diagram and (c) its rank function.

Figure 5: An example showing the data extraction process.Annotated CT images (a) from the LIDC combined with masks (b) are converted into a 3D surface (c) from which we can sample a point cloud.

Figure 6: Example of domains D i (shaded in blue) on which rank functions differ and the sketched rectangles (delineated by red dashed lines and green dotted lines) indicate the bound of ω(D i ) when M = I[b i , d i ) and N = I[b ′ i , d ′ i ).

Table 1: Average accuracy, AUC-ROC, and runtimes of classifiers constructed on rank functions and projected rank functions with linear, GRBF, and polynomial kernels over ten iterations of five-fold cross-validation.

Table 2: SVM classification conducted on quadruples of persistence and non-persistence based features, as reported in Graff et al. (2021), using three-fold cross-validation and a standard scaler on the input data.

Table 3: Accuracy and AUC-ROC of classification between benign and malignant primary tumors in the LIDC dataset.

Table 4: Accuracy and AUC-ROC of classification between benign and malignant primary tumors in the LIDC dataset with added contrast material.