Persistent homology classification algorithm

Data classification is an important aspect of machine learning, as it is used to solve problems in a wide variety of contexts. There are numerous classifiers, but, as the no free lunch theorem implies, no single classifier performs best on all types of data. Topological data analysis is an emerging field concerned with the shape of data. One of its key tools for analyzing the shape or topological properties of a dataset is persistent homology, a method based on algebraic topology for estimating the topological features of a space of points that persist across several resolutions. This study proposes a supervised learning classification algorithm that makes use of persistent homology between training data classes, in the form of persistence diagrams, to predict the output category of new observations. The developed algorithm was validated on real-world and synthetic datasets, and its performance on these datasets was compared to that of the most widely used classifiers. Validation runs demonstrated that the proposed persistent homology classification algorithm performed on par with, if not better than, the majority of the classifiers considered.

vertices are elements of $K_0$, such that if $v \in K_0$ then $[v] \in K$, and if $\tau$ and $\sigma$ are simplices such that $\sigma \in K$ and $\tau \subset \sigma$, then $\tau \in K$. Moreover, the dimension of $K$ is the maximum of the dimensions of its elements.

Definition 4 Let $K_i$ denote the set of $i$-simplices in a simplicial complex $K$. The $n$-skeleton of $K$ is the union of the sets $K_i$ for all $i \in \{0, 1, 2, \ldots, n\}$. If $\sigma_1$ is a simplex of dimension $n_1$ and $\sigma_2$ is a simplex of dimension $n_2$, such that $\sigma_1 \subset \sigma_2$, then $\sigma_1$ is said to be a face of $\sigma_2$ of codimension $n_2 - n_1$.

A simplicial complex can be referred to as an abstract simplicial complex because of its abstract nature. However, one can interpret a finite simplicial complex geometrically as a subset of $\mathbb{R}^n$ for some natural number $n$. Such a subset is called a geometric realization, and it is unique up to a canonical piecewise-linear homeomorphism (Otter et al., 2017). That is, for a simplicial complex $K$, there exists a geometric simplicial complex $G$ whose vertices are in one-to-one correspondence with the vertices of $K$, and a subset of vertices of $G$ defines a simplex in $G$ if and only if it corresponds to the vertices of some simplex of $K$. Figure 1 shows the respective geometric realizations of simplicial complexes $A$ and $B$ in $\mathbb{R}^2$.

Note that a simplicial complex $\Delta$ can also be viewed as a topological space expressed as a quotient of a disjoint union of simplices by an equivalence relation that identifies certain faces of certain simplices.

A formal sum of $k$-simplices is called a $k$-chain, and the free abelian group having a collection of $k$-simplices as its basis is called a chain group.

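Let $X$ be a simplicial complex and $\Delta_k(X)$ be the free abelian group generated by the $k$-simplices of $X$. Elements of $\Delta_k(X)$ are called $k$-simplicial chains. For any $k \in \{1, 2, 3, \ldots\}$, define the boundary map as the linear map

$$\partial_k : \Delta_k(X) \to \Delta_{k-1}(X), \qquad \partial_k [v_0, \ldots, v_k] = \sum_{i=0}^{k} (-1)^i \, [v_0, \ldots, \hat{v}_i, \ldots, v_k],$$

where $\hat{v}_i$ indicates that the vertex $v_i$ is omitted. The boundary map $\partial_k$ maps each $k$-simplex to its boundary, which is the (signed) sum of its faces of codimension 1. The map $\partial_0$ is the zero map. It can be shown that $\partial_n \circ \partial_{n+1} = 0$, that is, the boundary of a boundary is always empty. Consequently, the image of $\partial_{n+1}$ is contained in the kernel of $\partial_n$.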
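As a small worked check of the identity $\partial_n \circ \partial_{n+1} = 0$, consider a single 2-simplex $[v_0, v_1, v_2]$. The computation below assumes the usual alternating-sum sign convention for the boundary of an oriented simplex, with $\partial_1 [a, b] = [b] - [a]$; it is an illustration only and is not taken from the original text.

```latex
\begin{align*}
\partial_2 [v_0, v_1, v_2] &= [v_1, v_2] - [v_0, v_2] + [v_0, v_1], \\
\partial_1 \bigl( \partial_2 [v_0, v_1, v_2] \bigr)
  &= \bigl([v_2] - [v_1]\bigr) - \bigl([v_2] - [v_0]\bigr) + \bigl([v_1] - [v_0]\bigr) = 0.
\end{align*}
```

Every vertex appears once with each sign, so the terms cancel in pairs.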

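The boundary operators and the chain groups form a chain complex $C_*$:

$$\cdots \xrightarrow{\partial_{n+1}} \Delta_n(X) \xrightarrow{\partial_n} \Delta_{n-1}(X) \xrightarrow{\partial_{n-1}} \cdots \xrightarrow{\partial_2} \Delta_1(X) \xrightarrow{\partial_1} \Delta_0(X) \xrightarrow{\partial_0} 0.$$

Definition 5 For each $n \in \{0, 1, 2, 3, \ldots\}$, the $n$-th homology of a simplicial complex $X$ is given as

$$H_n(X) = \operatorname{Ker}(\partial_n) / \operatorname{Im}(\partial_{n+1}).$$

Moreover, its dimension is called the $n$-th Betti number of $X$, denoted $\beta_n(X)$, or the rank of the $n$-th homology group of $X$. Elements of $\operatorname{Im}(\partial_{n+1})$ are called $n$-boundaries, and elements of $\operatorname{Ker}(\partial_n)$ are called $n$-cycles.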

The $n$-cycles which are not boundaries represent $n$-dimensional holes. Thus, the $n$-th Betti number gives the number of $n$-dimensional holes. In particular, $\beta_0(X)$ gives the number of connected components, $\beta_1(X)$ gives the number of tunnels, $\beta_2(X)$ gives the number of voids, and so on. Furthermore, if $X$ is a simplicial complex of dimension $p$, then $H_n(X) = 0$ for each $n > p$.

Then there is the following sequence,

Also, for $k = 1, 2, 3$, the boundary operator $\partial_k$ is defined for $k$-simplices, respectively, as follows,

The homology groups are computed as follows,

The Betti numbers are $\beta_0 = 2$, $\beta_1 = 1$, $\beta_2 = 0$, which means that there are 2 connected components, 1 hole, and 0 voids in $A$.
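The relation between boundary maps and Betti numbers can be sketched in code. The example below is a minimal illustration, not the computation for the complex $A$ of the text (whose simplices are not reproduced here): it uses a hypothetical complex made of a hollow triangle and a separate filled triangle, works over $\mathbb{F}_2$, and computes $\beta_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$. The function names `rank_gf2` and `betti_numbers` are assumptions introduced for this sketch.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    rank = 0
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                rank += 1
                break
            row ^= pivots[high]  # eliminate the leading bit (addition mod 2)
    return rank


def betti_numbers(simplices):
    """Betti numbers over F_2 of a complex given as vertex tuples.

    The input is assumed to be closed under taking faces.
    """
    simplices = {tuple(sorted(s)) for s in simplices}
    max_dim = max(len(s) for s in simplices) - 1
    by_dim = {k: sorted(s for s in simplices if len(s) == k + 1)
              for k in range(max_dim + 2)}
    ranks = {0: 0, max_dim + 1: 0}  # the maps out of C_0 and into C_max are zero
    for k in range(1, max_dim + 1):
        index = {s: i for i, s in enumerate(by_dim[k - 1])}
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):  # the codimension-1 faces of s
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]


# Hypothetical complex: hollow triangle {0,1,2} and filled triangle {3,4,5}.
example_complex = [
    (0,), (1,), (2,), (3,), (4,), (5,),
    (0, 1), (0, 2), (1, 2),
    (3, 4), (3, 5), (4, 5),
    (3, 4, 5),
]
print(betti_numbers(example_complex))  # [2, 1, 0]
```

The output lists $\beta_0, \beta_1, \beta_2$: two connected components, one tunnel (the hollow triangle), and no voids.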

For the succeeding sections, simplicial homology will be defined over the field $\mathbb{F}_2$ with 2 elements, where $1 = -1$. So instead of defining the chain groups as free abelian groups, we define the chain groups as vector spaces over $\mathbb{F}_2$. However, when computing simplicial homology over $\mathbb{F}_2$, one needs to be careful when defining the boundary maps $\partial_k$ to ensure that $\partial_k \circ \partial_{k+1}$ remains the zero map (Otter et al.,

Theorem 1 (Nerve Theorem) The geometric realization of the nerve of $U$ is homotopy equivalent to the union of sets in $U$.

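Definition 7 The Čech complex with parameter $\varepsilon$ of $X$ is given as

$$\check{C}_\varepsilon(X) = \Bigl\{ \sigma \subseteq X \;\Bigm|\; \bigcap_{x \in \sigma} B(x, \varepsilon) \neq \emptyset \Bigr\},$$

where $B(x, \varepsilon)$ is a closed ball of radius $\varepsilon$ centered at $x$.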

If the cover of the sets in $X$ is sufficiently 'nice,' then the Nerve Theorem guarantees that the nerve of

Definition 8 Let $(X, d)$ be a metric space and $S$ be a subspace of $X$ with the induced metric. The Vietoris-Rips complex with parameter $\varepsilon$, denoted by $R_\varepsilon(X)$, is the set of all $\sigma \subset X$ such that the largest distance between any two of its points is at most $2\varepsilon$. That is, given $S \subset X$,

$$R_\varepsilon(S) = \bigl\{ \sigma \subseteq S \;\bigm|\; d(x, y) \le 2\varepsilon \text{ for all } x, y \in \sigma \bigr\}.$$

Both the Vietoris-Rips complex and the Čech complex are abstract simplicial complexes which may be defined at various parameters $\varepsilon$, but only the Čech complex preserves the homotopy information of the topological spaces formed by the $\varepsilon$-balls.
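Definition 8 translates almost directly into code: a set of points spans a simplex exactly when all of its pairwise distances are at most $2\varepsilon$. The sketch below is a brute-force illustration under stated assumptions: the points lie in Euclidean space, the function name `vietoris_rips` and the truncation parameter `max_dim` are hypothetical, and no attempt is made at efficiency.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def vietoris_rips(points, eps, max_dim=2):
    """Simplices (as index tuples) of the Vietoris-Rips complex up to max_dim.

    A subset of points is a simplex when every pairwise distance is <= 2 * eps.
    """
    n = len(points)
    simplices = [(i,) for i in range(n)]  # every point is a 0-simplex
    for size in range(2, max_dim + 2):    # size = number of vertices
        for sigma in combinations(range(n), size):
            if all(dist(points[i], points[j]) <= 2 * eps
                   for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices


# Hypothetical point cloud: three nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

small = vietoris_rips(points, eps=0.5)   # threshold 2 * eps = 1.0
large = vietoris_rips(points, eps=0.75)  # threshold 2 * eps = 1.5
print((0, 1, 2) in small, (0, 1, 2) in large)  # False True
```

Because membership depends only on pairwise distances, the result is determined by its edges, which is why the Vietoris-Rips complex is cheaper to build than the Čech complex.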

The construction of a VR complex can be made easier with the use of clique complexes, also known as flag complexes. Recall that a graph is complete if any two of its vertices are connected by an edge, and a set of vertices which forms a complete graph is called a clique. A $k$-clique

Definition 10 Let $K_s$ be a subcomplex in the filtration of the simplicial complex $K$, that is, the filtered complex at time $s$, and let $Z^s_k = \operatorname{Ker} \partial^s_k$ and $B^s_k = \operatorname{Im} \partial^s_{k+1}$ be the $k$-th cycle group and boundary group of $K_s$, respectively. The $k$-th homology group of $K_s$ is $H^s_k = Z^s_k / B^s_k = \operatorname{Ker}(\partial^s_k) / \operatorname{Im}(\partial^s_{k+1})$.

Definition 11 For $p \in \{0, 1, 2, \ldots\}$, the $p$-persistent $k$-th homology group of $K$ given a subcomplex $K_s$ is

$$H^{s,p}_k(K) = Z^s_k / \bigl( B^{s+p}_k \cap Z^s_k \bigr).$$

The $p$-th persistent $k$-th Betti number $\beta^{s,p}_k$ of $K_s$ is the rank of $H^{s,p}_k(K)$. Note that the zero-persistent homology groups of $K_s$ are the same as the actual homology groups of $K_s$.

The results of computing the persistent homology of a filtered simplicial complex are normally given

The persistent barcode for the filtered simplicial complex $K$ can be created using the following steps. First, $K$ must be associated with a boundary matrix whose entries represent the faces of the simplices. It is assumed that the simplices of the nested sequence of complexes follow a total ordering such that a face of a simplex precedes the simplex, and a simplex in the $i$-th complex $K_i$ precedes the simplices in $K_j$, for $j > i$, which are not in $K_i$. Let $n$ be the total number of simplices in the complex, and $\sigma_1, \sigma_2, \ldots, \sigma_n$ be the simplices. The square matrix $B$, of dimension $n \times n$, is constructed by assigning a value 1 in $B(i, j)$ if the simplex $\sigma_i$ is a face of the simplex $\sigma_j$ of codimension 1 and a value 0 otherwise. That is, the boundary matrix $B$ is defined by

$$B(i, j) = \begin{cases} 1 & \text{if } \sigma_i \text{ is a face of } \sigma_j \text{ of codimension } 1, \\ 0 & \text{otherwise.} \end{cases}$$

After constructing the boundary matrix $B$, it has to be reduced using the standard algorithm, sometimes

• If low($j$) = $i$, then the simplex $\sigma_j$ is paired with $\sigma_i$, and the appearance of $\sigma_i$ in the filtration causes the birth of a feature that dies with the entrance of $\sigma_j$.

• If low($j$) is undefined, then the appearance of the simplex $\sigma_j$ in the filtration causes the birth of a feature. If there exists $k$ such that low($k$) = $j$, then $\sigma_j$ is paired with the simplex $\sigma_k$, whose appearance in the filtration causes the death of the feature. If no such $k$ exists, then $\sigma_j$ is unpaired.

complexes in the filtration of $K$. Suppose $K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_m$ is a filtration of $K$ with respect to parameter values $\varepsilon_i$ such that $0 = \varepsilon_0 < \varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_m$. Let $n \in \mathbb{N}$ be the number of simplices $\sigma_j$'s
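The boundary matrix construction and the pairing rules in the bullets above can be sketched as follows. This is a minimal illustration that assumes the "standard algorithm" is the usual left-to-right column reduction over $\mathbb{F}_2$, in which low($j$) is the row index of the lowest 1 in column $j$ and a column is added to a later column whenever both share the same low. The function names are hypothetical, indices are 0-based (unlike the 1-based $\sigma_1, \ldots, \sigma_n$ of the text), and the filtration of a single filled triangle is an invented example.

```python
def boundary_matrix(simplices):
    """Columns of B as sets of row indices; simplices are in filtration order."""
    index = {tuple(sorted(s)): i for i, s in enumerate(simplices)}
    columns = []
    for s in simplices:
        s = tuple(sorted(s))
        # Codimension-1 faces: drop one vertex at a time (vertices have none).
        faces = [s[:i] + s[i + 1:] for i in range(len(s))] if len(s) > 1 else []
        columns.append({index[f] for f in faces})
    return columns


def reduce_matrix(columns):
    """Left-to-right column reduction over F_2 (assumed standard algorithm)."""
    columns = [set(c) for c in columns]
    low_to_col = {}  # low value -> index of the reduced column that owns it
    for j, col in enumerate(columns):
        while col and max(col) in low_to_col:
            col ^= columns[low_to_col[max(col)]]  # column addition mod 2
        if col:
            low_to_col[max(col)] = j
    return columns


def persistence_pairs(reduced):
    """Return (birth, death) index pairs and the list of unpaired simplices."""
    pairs, paired = [], set()
    for j, col in enumerate(reduced):
        if col:                # low(j) = i is defined: sigma_i is paired with sigma_j
            i = max(col)
            pairs.append((i, j))
            paired.update((i, j))
    unpaired = [j for j in range(len(reduced)) if j not in paired]
    return pairs, unpaired


# Hypothetical filtration of a filled triangle: vertices, then edges, then the face.
filtration = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
pairs, unpaired = persistence_pairs(reduce_matrix(boundary_matrix(filtration)))
print(pairs)     # [(1, 3), (2, 4), (5, 6)]
print(unpaired)  # [0]
```

In this example, vertices 1 and 2 are merged into the component of vertex 0 by the first two edges, the third edge creates a loop that the triangle fills, and vertex 0 remains unpaired because its connected component never dies.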

And, the persistent homology of $K$ based on the filtration $K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_m$ is the homology of

A filtration of $K$ using the Čech complexes, $\emptyset = K_0 \subseteq K_1 \subseteq K_2 \subseteq K_3 \subseteq K_4 \subseteq K_5$, is given in Fig. 4.

Recall that the $n$-th Betti number of a topological space $X$ is denoted by $\beta_n$, which is equal to the rank of the $n$-th homology group $H_n$. Moreover, if $K$ is a simplicial complex and $\{K_r\}_{r \in J}$, for some index set $J$, is the filtration of $K$, the $p$-th persistent $k$-th Betti number $\beta^{s,p}_k$ of $K_s$ is the rank of $H^{s,p}_k(K)$. From the persistent Betti numbers, there is a set of multiplicities $\mu^{i,j}_n$, $j > i$, such that

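Discussion of the robustness and stability of persistence diagrams requires the notion of distance. Given two persistence diagrams, say $X$ and $Y$, the definition of distance between $X$ and $Y$ is given as follows.

$$W_p[d](X, Y) = \inf_{\varphi : X \to Y} \Bigl[ \sum_{x \in X} d\bigl(x, \varphi(x)\bigr)^p \Bigr]^{1/p}$$

for $p \in [1, \infty)$, and

$$W_\infty[d](X, Y) = \inf_{\varphi : X \to Y} \, \sup_{x \in X} d\bigl(x, \varphi(x)\bigr)$$

for $p = \infty$, where $d$ is a metric on $\mathbb{R}^2$ and $\varphi$ ranges over all bijections from $X$ to $Y$.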
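For very small diagrams, a distance of this kind can be evaluated by brute force over all matchings. The sketch below rests on several assumptions that are not spelled out in the surviving text: $d$ is taken to be the $L_\infty$ metric, each diagram is padded with diagonal placeholders so that a bijection always exists (the common convention that points may be matched to the diagonal), and the cost of matching a point $(b, e)$ to the diagonal is $(e - b)/2$. The names `matching_cost` and `diagram_distance` are hypothetical, and the factorial running time makes this suitable for illustration only.

```python
from itertools import permutations

INF = float("inf")


def matching_cost(a, b):
    """L-infinity cost of matching a to b; None stands for the diagonal."""
    if a is None and b is None:
        return 0.0
    if a is None:
        return (b[1] - b[0]) / 2  # distance from b to the diagonal
    if b is None:
        return (a[1] - a[0]) / 2  # distance from a to the diagonal
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def diagram_distance(X, Y, p=INF):
    """Brute-force distance between two small persistence diagrams.

    p = INF takes the largest matched cost (bottleneck-style distance);
    finite p >= 1 takes the p-norm of the matched costs.
    """
    A = list(X) + [None] * len(Y)  # pad so both sides have equal size
    B = list(Y) + [None] * len(X)
    best = INF
    for perm in permutations(range(len(B))):
        costs = [matching_cost(A[i], B[perm[i]]) for i in range(len(A))]
        if p == INF:
            total = max(costs, default=0.0)
        else:
            total = sum(c ** p for c in costs) ** (1 / p)
        best = min(best, total)
    return best


# Hypothetical diagrams given as (birth, death) pairs.
X = [(0.0, 4.0)]
Y = [(0.0, 3.0), (1.0, 1.5)]
print(diagram_distance(X, Y))       # 1.0
print(diagram_distance(X, Y, p=1))  # 1.25
```

In the optimal matching, $(0, 4)$ is paired with $(0, 3)$ at cost 1 and the short-lived point $(1, 1.5)$ is sent to the diagonal at cost 0.25, which illustrates why low-persistence points contribute little to the distance.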

Normally, $d$ is taken to be $L_q$ where $q \in [1, \infty]$, and the most commonly used distance function is the