Euler characteristic curves and profiles: a stable shape invariant for big data problems

Abstract. Tools of topological data analysis provide stable summaries encapsulating the shape of the considered data. Persistent homology, the most standard and well-studied data summary, suffers from a number of limitations: its computations are hard to distribute, it is hard to generalize to multifiltrations, and it is computationally prohibitive for big datasets. In this article, we study the concept of Euler characteristic curves for 1-parameter filtrations and Euler characteristic profiles for multiparameter filtrations. While they are a weaker invariant in one dimension, we show that Euler characteristic-based approaches do not share these handicaps of persistent homology: we present efficient algorithms to compute them in a distributed way, their generalization to multifiltrations, and their practical applicability to big data problems. In addition, we show that Euler curves and profiles enjoy a certain type of stability, which makes them robust tools for data analysis. Lastly, to show their practical applicability, multiple use cases are considered.


Introduction
Since its beginning [14], [30], Topological Data Analysis has attracted attention in the data science community. Topological tools such as persistent homology [15] and mapper [30] have been used in multiple tasks in material science [22], [13], [19], medicine [23] and many other fields. Over time, persistent homology has been successfully integrated into machine learning pipelines, and mapper has become an exploratory data analysis tool. In this work we follow the path of persistent homology. Following its successes, attempts were made to apply it to big data analysis tasks. However, progress has been minimal. While there exists a single distributed implementation [1], it does not scale up and has not been extensively used in big data analysis. In practice, various sequential implementations are mostly used [31]. To bypass the problem of too large an input, a number of sparsification techniques [29], [28] as well as bootstrap [9] and zig-zag [7] approaches have been proposed. While they scale up to problems of a certain size, they tend to bypass the big data challenge rather than propose a solution to it.
In this paper we extend the classical Euler characteristic and Euler characteristic curves. The new contributions include:
• A proof of stability of the Euler Characteristic Curve (ECC) with respect to the 1-Wasserstein distance between persistence diagrams;
• A generalization of the Euler Characteristic Curve to the multiparameter filtration case, with an arbitrary number of parameters, which we call the Euler Characteristic Profile (ECP);
• An analysis of the stability of such ECPs;
• Distributed algorithms to compute the exact ECC for Vietoris-Rips and cubical complexes that can be naturally extended to the multiparameter case. A Python implementation of these algorithms is provided as a scikit-learn [24] compatible package;
• A discussion of methods to compare and vectorize ECCs and ECPs;
• Examples of applications of the ECC/ECP to real-world data.
While we are not aware of any distributed algorithm to compute Euler Characteristic Curves of a Vietoris-Rips complex, Heiss and Wagner [18] describe a streaming algorithm to compute the ECC of cubical complexes, which has also been adapted for GPU computations [33]. While their implementation is very fast, we see no straightforward way to generalize it to the multiparameter filtration case. To the best of our knowledge, the concept of Euler Characteristic Profiles of arbitrary dimension is novel in the literature. There are however some works that focus on the bifiltration case, known as Euler Characteristic Surfaces. These were used in an applied setting by Roy et al. [27] to analyze drying droplets, but no topological background is provided. Beltramo et al. [2] gave a description of Euler Characteristic Surfaces in the persistent homology framework and applied it to obtain a descriptor of both point cloud and image based data. Moreover, they provide a Python implementation of their algorithms, which however requires the input bifiltration to be binned. Chen et al. [10] introduced a time-aware multipersistence Euler-Poincaré surface to describe dynamical networks and proved its weak $L^1$ stability. A recent preprint by Perez [25] analyzes the stability of Euler and Betti curves of stochastic processes on compact Riemannian manifolds.
PD and DG acknowledge support by the Dioscuri program initiated by the Max Planck Society, jointly managed with the National Science Centre (Poland), and mutually funded by the Polish Ministry of Science and Higher Education and the German Federal Ministry of Education and Research.

Euler characteristic curves (and profiles)
In this section we introduce the essential mathematical concepts needed to define Euler Characteristic Curves and Profiles. For an exhaustive presentation we refer to classic textbooks such as [17] and [15].
Definition 2.1. A CW or cell complex $X$ is a topological space that can be built up starting from a discrete set $X^0$ of 0-dimensional cells and then inductively creating the $n$-skeleton $X^n$ by attaching $n$-cells to $X^{n-1}$ along their boundaries. The process can be stopped at some finite dimension or can continue indefinitely. A subset $A \subseteq X$ is a subcomplex of $X$ if, together with each cell of $A$, all the lower dimensional cells in its boundary also belong to $A$.
Remark 1. Since we are interested in applying this machinery to analyze real-world data, we will always assume that our complexes are finite.
While the theory can be built in the general CW complex setting, the algorithms we present in Section 4 are specific to two specializations that are used to represent different types of data: simplicial and cubical complexes.
Definition 2.2. An abstract simplicial complex is a finite collection of sets $K$ such that $\sigma \in K$ and $\tau \subseteq \sigma$ implies $\tau \in K$. The sets in $K$ are called simplices and the dimension of a simplex is $\dim(\sigma) = \mathrm{card}(\sigma) - 1$. We will often refer to 0-simplices as vertices, and to 1-simplices as edges. Given a simplex $s = \{v_0, \ldots, v_k\}$, its boundary is
$$\partial s = \sum_{i=0}^{k} (-1)^i \{v_0, \ldots, \hat{v}_i, \ldots, v_k\},$$
where $\hat{v}_i$ denotes that the vertex $v_i$ is removed from the simplex. The simplices $\{v_0, \ldots, \hat{v}_i, \ldots, v_k\}$, for $i = 0, \ldots, k$, are in the boundary of $s$.
There are different ways of obtaining an abstract simplicial complex from point cloud data, such as the Čech, the Vietoris-Rips and the Alpha constructions [15]. In Section 4.1 we describe the Vietoris-Rips construction.
Definition 2.3. An elementary interval is a subset of $\mathbb{R}$ of the type $I = [l, l+1]$ or $I = [l, l]$, for some integer $l$. The first type is called a non-degenerate interval, while the second is a degenerate interval. An elementary cube $C$ is a product of elementary intervals $C = I_1 \times \cdots \times I_n$ and its dimension is the number of non-degenerate intervals in the product. The boundary of an elementary interval is
$$\partial I = \begin{cases} [l+1, l+1] - [l, l] & \text{if } I = [l, l+1], \\ 0 & \text{if } I = [l, l], \end{cases}$$
and the boundary of an elementary cube is
$$\partial C = \sum_{i=1}^{n} (-1)^{\sum_{j<i} \dim I_j} \, I_1 \times \cdots \times \partial I_i \times \cdots \times I_n.$$
Similarly to the simplicial complex case, a cubical complex $K$ is a collection of elementary cubes closed under the operation of taking boundaries.
One of the most common use cases of cubical complexes is image data. In Section 4.5 we describe how to build a filtered cubical complex from an $n$-dimensional image, by identifying the image's pixels with top dimensional cells.
In what follows we will refer to simplices and cubes, the elements of a simplicial or a cubical complex, jointly as cells of a cell complex. A cell $\tau$ is said to be a face of $\sigma$ if $\tau$ is in the boundary of $\sigma$.
Definition 2.4. Let $K$ be a cell complex and $d$ a dimension. A $d$-chain is a formal sum of $d$-cells in $K$, namely $c = \sum_i a_i \sigma_i$, where the $\sigma_i$ are the $d$-cells and the $a_i$ are the coefficients.
There are many possible choices for the group of coefficients. A standard approach in computational topology is to use modulo 2 coefficients, i.e. the $a_i$ can be either 0 or 1 and satisfy $1 + 1 = 0$. Other options include integer, rational or real coefficients.
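Two $d$-chains can be added component-wise. Namely, given $c = \sum_i a_i \sigma_i$ and $c' = \sum_i b_i \sigma_i$, their sum is $c + c' = \sum_i (a_i + b_i) \sigma_i$.
Definition 2.6. Let $K$ be a cell complex. A filtration of $K$ is a sequence of nested subcomplexes
$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K.$$
It can be obtained by means of a filtration function over $K$, a monotonic non-decreasing function $f : K \to \mathbb{R}$, i.e. one satisfying $f(\tau) \le f(\sigma)$ whenever $\tau$ is a face of $\sigma$, by taking its sublevel sets $K_t = \{\sigma \in K : f(\sigma) \le t\}$. For each dimension $d$, such a filtration corresponds to a sequence of homology groups
$$0 = H_d(K_0) \to H_d(K_1) \to \cdots \to H_d(K_m) = H_d(K),$$
connected by the homomorphisms $f_d^{i,j} : H_d(K_i) \to H_d(K_j)$ induced by the inclusions $K_i \subseteq K_j$.
Definition 2.7. The $d$-th persistent homology groups are the images of these homomorphisms, $H_d^{i,j} = \mathrm{im}\, f_d^{i,j}$. The ranks of these groups are the $d$-th persistent Betti numbers $\beta_d^{i,j} = \mathrm{rank}(H_d^{i,j})$.
Intuitively, the $d$-th persistent Betti number $\beta_d^{i,j}$ counts how many homology classes of $K_i$ are still present in $K_j$. There are two scenarios in which a homology class from $K_i$ may not be present in $K_j$: it may either become trivial, or it may become identical (homologous) to a class that was created earlier.
All the points on the diagonal are always included, with countable multiplicity, in a persistence diagram, so that the following definition makes sense.
Definition 2.9. A matching of two persistence diagrams $C$ and $D$ is a bijection $\eta : C \to D$, possibly to or from points on the diagonal.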
where η is a matching of C and D.
Definition 2.11. The Euler Characteristic of a cell complex $K$ is the alternating sum of the number of its cells in each dimension,
$$\chi(K) = \sum_{d \ge 0} (-1)^d |K_d|,$$
where $K_d$ denotes the set of $d$-dimensional cells in $K$. Thanks to the Euler-Poincaré formula, the Euler characteristic can also be expressed as the alternating sum of the Betti numbers, the ranks of the cell complex's homology groups [15]:
$$\chi(K) = \sum_{d \ge 0} (-1)^d \beta_d(K).$$
Definition 2.12. Let us consider a filtered complex $K$ with filtration function $f : K \to \mathbb{R}$. We can define its Euler Characteristic Curve as the function that assigns to each filtration level $t \in \mathbb{R}$ the Euler number of the corresponding sublevel complex, $t \mapsto \chi(f^{-1}((-\infty, t]))$.
We are now interested in extending the concept of Euler Characteristic Curve to the more general Multidimensional Persistence setting [8]. In order to do so, we need to generalize Definition 2.6 to families of nested complexes indexed by posets. While Multidimensional Persistence is a vibrant and active research topic, in this paper we will only make use of the basic concepts. We refer the interested reader to [6] for a modern introduction to the topic.
Definition 2.13. Let $K$ be a cell complex and $P$ a poset. A $P$-indexed filtration on $K$ is a family of nested complexes $\{K_x\}_{x \in P}$ such that $K_x$ is a subcomplex of $K$ for each $x \in P$, and $K_x \subseteq K_y$ whenever $x \le y$. A $P$-indexed filtration with $P = T_1 \times \cdots \times T_n$, where each $T_i$ is a totally ordered set, we call a multiparameter or $n$-parameter filtration.
It is natural to ask whether the idea of sublevel sets of a filtration function can be extended too. In general, this is not the case: it is possible only when each cell of $K$ first appears in the filtration at a unique minimal index in $P$.
Definition 2.14. Let $K$ be a cell complex, $P$ a poset and $f : K \to P$ a function. The sublevel filtration of $f$ is the family of complexes of the type $K_p = \{\sigma \in K : f(\sigma) \le p\}$, for $p \in P$. A filtration isomorphic to a sublevel filtration is said to be 1-critical. A filtration that is not 1-critical is said to be multicritical.
Definition 2.15. The Euler Characteristic Profile (ECP) of a $P$-filtered complex $K$ is the function that assigns to any value $p \in P$ the Euler characteristic of the corresponding subcomplex $K_p$.
For the rest of the paper we will focus on the case $P = \mathbb{R}^n$.
Remark 2. The two dimensional ECP has already appeared in the literature, where it is known as the Euler Characteristic Surface [27], [2], [10]. It was however defined only for the Cartesian product of two one-parameter filtrations, and it is treated as a matrix in the following way. Given a bi-filtering function $F : K \to \mathbb{R}^2$ over $K$ and a set of threshold values $I$, the entries of the matrix are the Euler characteristics of the subcomplexes of $K$ at the thresholds in $I$. This matrix representation corresponds to sampling the two dimensional profile on the grid given by $I$. In general the choice of such a grid is not unique, and its spacing may not be constant. This makes it difficult to define a general notion of distance between Euler Characteristic Surface matrices. For this reason we think it is more natural to define the Euler Characteristic Profile as a function, as in Definition 2.15, and look for stability results in this setting.

Stability of Euler Characteristic Curves and Profiles
The goal of this section is to bound the distance between Euler Characteristic Curves by some known topological quantity that is robust with respect to small perturbations of the point cloud. In this way, the stability of Euler Characteristic Curves is obtained.
3.1. Euler Characteristic Curves. Since ECCs are piece-wise constant functions, we consider the $L^1$ distance between them.
Definition 3.1. Let $K_1$ and $K_2$ be two filtered cell complexes. The $L^1$ distance between their Euler Characteristic Curves is
$$\int_{\mathbb{R}} \left| \chi((K_1)_t) - \chi((K_2)_t) \right| \, dt,$$
where $(K_i)_t$ denotes the subcomplex of $K_i$ at filtration level $t$.
The proof presented in this section is inspired by the stability result for persistence functions by Chung and Lawson [12]. They analyze the stability of a wide class of persistence curves and obtain a general bound (see Theorem 1 in [12]). However, trying to specialize this result to the simple Betti curve case leads to a term that depends on the number of points in the persistence diagram. Hence the authors claim that Betti curves are unstable.
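Because an ECC is piece-wise constant, the $L^1$ distance of Definition 3.1 reduces to a finite sum over the jump points of the two curves. The following is a minimal sketch under stated assumptions, not the authors' implementation: each curve is assumed to be given by its jumps, as a list of pairs (filtration value, signed jump), and the function name `l1_distance` is hypothetical.

```python
def l1_distance(jumps_a, jumps_b):
    """L1 distance between two piece-wise constant curves.

    Each curve is a list of (filtration_value, jump) pairs and is zero
    before its first jump. If the two curves do not agree after the last
    jump, the integral diverges and infinity is returned.
    """
    # The difference curve jumps by +c at the jumps of A and by -c at those of B.
    jumps = sorted([(t, c) for t, c in jumps_a] + [(t, -c) for t, c in jumps_b])
    if sum(c for _, c in jumps) != 0:
        return float("inf")
    total, level, previous = 0.0, 0, None
    for t, c in jumps:
        if previous is not None:
            # |difference| is constant between two consecutive jump points.
            total += abs(level) * (t - previous)
        level += c
        previous = t
    return total


# Two vertices joined by an edge that appears at 1.0 in one curve and at 1.5 in the other.
curve_a = [(0.0, 1), (0.0, 1), (1.0, -1)]
curve_b = [(0.0, 1), (0.0, 1), (1.5, -1)]
distance = l1_distance(curve_a, curve_b)
```

In this toy case the curves differ by one unit on the interval between the two edge values, so the distance equals the shift of the edge.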
We will instead carry out the proof focusing exclusively on Betti curves; by doing so, a stability result can be obtained.
Definition 3.2. Let $K$ be a cell complex with filtration function $f$. Its $k$-th Betti curve is the function that assigns to each filtration level the $k$-th Betti number of the corresponding subcomplex.
Let now $D$ be the $k$-dimensional persistence diagram obtained from a filtered complex $K$. The Fundamental Lemma of Persistent Homology [15] states that the $k$-th Betti number of the subcomplex $K_t$ can be obtained by counting the points in the diagram that lie in the box $\{(x, y) \mid x \le t < y\}$,
$$\beta_k(K_t) = \#\{(x, y) \in D : x \le t < y\}.$$
Proposition 3.1. Let $C$ and $D$ be two $k$-dimensional persistence diagrams. Their Betti curves are stable with respect to the 1-Wasserstein distance: the $L^1$ distance between the two curves is bounded, up to a constant factor, by the 1-Wasserstein distance between $C$ and $D$. (1)
Proof. Let us consider two $k$-dimensional persistence diagrams $C$, $D$ and assume that the optimal matching under the 1-Wasserstein distance is known. Moreover, let us index the points in each diagram so that points with matching indices are paired under the optimal matching. The case when points from one diagram are matched to the diagonal is described in Case 2 below. We can then write the difference between the two Betti curves as a sum of terms $h_i$, one for each matched pair, where $h_i$ is the difference between the indicator functions of the two matched intervals. Let us focus on a single term of the sum. Then, one of the following cases has to hold:

The matching of one point (b
Note that, because of this, C and D are not required to have the same number of off-diagonal points.

Case 3: b
This case will never happen as a better matching can always be obtained by matching both points to the diagonal, which is a degenerate Case 2.
We have that ∥h holds for every i.We can then write the difference between two Betti curves as

□
Thanks to the Euler-Poincaré formula, the Euler Characteristic Curve of a filtered complex K can be obtained as the alternating sum of its Betti curves.
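The relation between persistence diagrams, Betti curves and the ECC can be sketched in a few lines. This is an illustrative example only: diagrams are assumed to be lists of (birth, death) pairs, with `float("inf")` marking essential classes, and the function names are hypothetical.

```python
def betti_number(diagram, t):
    """Fundamental Lemma: count the points (birth, death) with birth <= t < death."""
    return sum(1 for birth, death in diagram if birth <= t < death)


def euler_characteristic(diagrams, t):
    """Alternating sum of Betti numbers; diagrams[k] is the k-dimensional diagram."""
    return sum((-1) ** k * betti_number(dgm, t) for k, dgm in enumerate(diagrams))


# A triangle: three vertices at 0, three edges at 1, the 2-cell at 2.
infinity = float("inf")
diagrams = [
    [(0.0, infinity), (0.0, 1.0), (0.0, 1.0)],  # dimension 0
    [(1.0, 2.0)],                               # dimension 1
]
values = [euler_characteristic(diagrams, t) for t in (0.5, 1.0, 2.0)]
```

The three sampled values agree with the alternating cell counts of the same filtration: 3 vertices, then 3 vertices minus 3 edges, then 3 minus 3 plus 1.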
A stability result for the ECCs can be immediately derived from (1), assuming that the complex $K$ has nonzero persistence diagrams in a finite number of dimensions, each of them containing a finite number of off-diagonal points.
Proposition 3.2. Let $X$ and $Y$ be two filtered cell complexes. The $L^1$ difference between the Euler Characteristic Curves of $X$ and $Y$ is bounded by the sum of the 1-Wasserstein distances between the corresponding $k$-dimensional persistence diagrams $\mathrm{Dgm}_k(X)$, $\mathrm{Dgm}_k(Y)$, (2)
where the sum is over all dimensions in which the persistence diagrams are non-empty.
Proof. It is an immediate consequence of Proposition 3.1 and the triangle inequality.

□
Proposition 3.1 above is in explicit contrast with the claim that the Euler Characteristic Curve is unstable. In addition to the already mentioned work by Chung and Lawson [12], a similar statement can be found in [2] and [11].
Remark 3.With reference to Figure 1, the left-hand side in 3.1 is finite when the two ECCs agree from some filtration value onward.This is exactly what happens, for example, when considering curves obtained from full complexes, i.e. filtered complexes having a single simplex as a last element of a filtration: at some value all possible faces will have entered the filtration and so the Euler characteristic will stabilize at 1.If this does not happen the difference between the two ECCs will be unbounded.At the same time, it is straightforward to show that if two filtered complexes have different Euler characteristic at +∞ their homologies will have a different number of essential classes.This translates to a different number of points at infinity in the persistence diagrams, whose Wasserstein distance would then be unbounded.In this case, the above result will trivially be +∞ ≤ +∞.
3.2. Euler Characteristic Profiles. We can immediately extend the notion of $L^1$ distance between ECCs to the general case of $n$-dimensional ECPs.
Definition 3.4. Let $K_1$, $K_2$ be two multifiltered cell complexes. The $L^1$ distance between the corresponding $n$-dimensional Euler Characteristic Profiles is
$$\int_{\mathbb{R}^n} \left| \chi((K_1)_p) - \chi((K_2)_p) \right| \, dp,$$
where $(K_i)_p$ denotes the subcomplex of $K_i$ at filtration value $p$.
It is natural to ask whether the stability result in (2) can be extended to the multi-parameter case. In the existing literature, Chen et al. proposed the following weak $L^1$ metric in the case of bifiltered complexes (see Definition 3.2 in [10]). Let us recall the proposed construction. Consider two cell complexes $K_1$ and $K_2$ with a bifiltration function $F : K_{1,2} \to \mathbb{R}^2$. Let us denote by $f$ and $g$ the two real valued functions in the bifiltration, such that $F(\sigma) = (f(\sigma), g(\sigma))$ for every cell $\sigma$. Moreover, let us index the threshold values of $F$ by a set $I$. The idea behind the construction of Chen et al. is to fix one of the two filtrations at a specific value and consider the distances between the single parameter persistence diagrams induced by the other filtration function. By considering the set of threshold values $I$ as a matrix with $i$ rows and $j$ columns, they define the $i$-th column distance for the $k$-dimensional PDs as the distance between the $k$-dimensional persistence diagrams of $K_1$ and $K_2$ obtained by fixing one filtration at the threshold of that column.
Definition 3.5 (Definition 3.2 in [10]). The weak $L^1$ metric between $K_1$ and $K_2$ is obtained by summing these distances over the rows or the columns of $I$.
Being able to recover the single parameter case, they prove the following stability result.
Proposition 3.3 (Theorem 3.1 in [10]). Let $K_1$, $K_2$ be two bifiltered cell complexes. The distance between the corresponding Euler Characteristic Surfaces is bounded by the weak $L^1$ metric between $K_1$ and $K_2$, up to a multiplicative constant $c > 0$.
This construction appears to be the natural generalization to the multifiltration case of the stability result in (2). However, there are some fundamental problems that undermine the usefulness of such a weak $L^1$ metric.
Remark 4. In our opinion the sums over rows or columns in Definition 3.5 should be replaced with integrals over the filtration ranges. As already discussed in Remark 2, this would allow for more flexibility when dealing with filtration thresholds whose spacing is not constant.
Figure 5. Minimal counterexample for the instability of the ECP. Consider a cell complex made of only one vertex whose +1 contribution appears at some point $g = (g_1, g_2) \in \mathbb{R}^2$ and move it to $g' = (g_1 + \epsilon, g_2)$. Their difference, the region shaded in red, is unbounded.
Remark 5. Proposition 3.3 will evaluate to a trivial $\infty \le \infty$ in most cases, even the simplest ones. Consider for example the situation depicted in Figure 5: the ECP of a bifiltered complex $K_1$ made of just one 0-dimensional cell that appears at filtration value $(g_1, g_2)$. The ECP will then be 1 in the cone $\{(x, y) \in \mathbb{R}^2 : x \ge g_1, y \ge g_2\}$ and 0 otherwise. We can obtain a different complex $K_2$ by perturbing the first filtration value by an amount $\epsilon$, to $(g_1 + \epsilon, g_2)$. The difference between the two ECPs will then be unbounded. At the same time, the weak $L^1$ distance between $K_1$ and $K_2$ will also be unbounded, because in the region $[g_1, g_1 + \epsilon) \times [g_2, +\infty)$ the two complexes have a different number of essential classes, and so the $W_1$ distance between the corresponding PDs will be infinite.
Because of the discussed issues, the stability result in [10], while being formally correct, does not cover a lot of practically relevant cases.
However, in most applications we can truncate the ECP by limiting its filtration domain to $[0, f_\infty]^n$, i.e. the interval $[0, f_\infty]$ in every filtration dimension, where $f_\infty$ is a finite value. Note that this value at infinity should not be the same as the maximum filtration value of the complex's cells; it should be strictly larger. For example, in the case of images whose pixels have integer filtration values in the $[0, 255]$ range (see Section 7.1) we could choose $f_\infty = 256$ as truncation value. By doing so, the distance between every pair of ECPs will be finite, but it will of course depend on the truncation value. Using truncation, we can state the following result.
Proposition 3.4. Let $K$ be a finite cell complex with an $n$-dimensional multifiltration $F : K \to \mathbb{R}^n$. We define $K_\epsilon$ as the complex obtained by perturbing the filtration values of each cell in $K$ by at most $\epsilon$ in $l^\infty$ norm. Let us assume, for simplicity, that we truncate the domain of every filtration function to the same interval $[0, f_\infty]$. We then have the following bound,
$$\| ECP(K) - ECP(K_\epsilon) \|_1 \le |K| \, n \, \epsilon \, f_\infty^{n-1},$$
where $|K|$ is the number of cells in the complex and $n$ is the number of filtration parameters.
Proof. Let us consider a single cell $\sigma \in K$ with filtration value $g = (g_1, \ldots, g_n)$. Its contribution to the ECP will be $(-1)^{\dim(\sigma)}$ in the cone above $g$ (i.e. for all points $x \in \mathbb{R}^n$ such that $g \le x$ coordinate-wise). Let $\sigma'$ be the corresponding cell in $K_\epsilon$ whose filtration values have been maximally perturbed to $g' = (g_1 + \epsilon, \ldots, g_n + \epsilon)$. The volume of the region which is in the cone of $g$ but not in the cone of $g'$ can be bounded by the sum of the volumes of $n$ $n$-dimensional cuboids, each of thickness $\epsilon$ in the direction of one of the axes and of extent at most $f_\infty$ in the remaining $n - 1$ directions,
$$\mathrm{Vol} \le n \, \epsilon \, f_\infty^{n-1},$$
where the inequality is due to the fact that the cuboids can have non-empty intersection. One such cuboid is shaded in red in Figure 5. Multiplying by the total number of cells gives us the bound. □
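The cuboid argument can be checked numerically for a single 0-dimensional cell. The sketch below assumes that the per-cell bound takes the form $n \, \epsilon \, f_\infty^{n-1}$, which is what the argument above suggests; the function names are hypothetical, and the truncated cone volume is computed in closed form.

```python
from math import prod


def cone_volume(g, f_inf):
    """Volume of the cone {x >= g} truncated to the box [0, f_inf]^n."""
    return prod(max(f_inf - gi, 0.0) for gi in g)


def single_cell_difference(g, eps, f_inf):
    """L1 distance between the truncated ECPs of one vertex placed at g
    and at g shifted by eps in every coordinate (the maximal perturbation)."""
    shifted = [gi + eps for gi in g]
    return cone_volume(g, f_inf) - cone_volume(shifted, f_inf)


def per_cell_bound(n, eps, f_inf):
    """Assumed form of the bound for one cell: n cuboids of volume eps * f_inf^(n-1)."""
    return n * eps * f_inf ** (n - 1)


difference_2d = single_cell_difference((1.0, 2.0), 0.5, 10.0)
difference_3d = single_cell_difference((0.0, 0.0, 0.0), 1.0, 4.0)
```

In both examples the exact difference stays below the assumed bound; the gap reflects the overlap between the cuboids and the fact that the cell does not sit at the origin.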

Algorithms
Recall that the Euler characteristic of a cell complex is the alternating sum of the number of its cells in each dimension. The contribution of each cell will thus be plus or minus one, depending on the parity of the cell's dimension. Moreover, this contribution will appear at the cell's filtration level. Therefore, if we are able to obtain a list of all cells with their filtration values, we can compute the Euler characteristic at each filtration level. This is the main idea behind the following algorithms, which will always return what we will call the list of contributions, a list of pairs $(f(\sigma), (-1)^{\dim(\sigma)})$ that stores each cell's contribution to the EC at the cell's filtration level. Once these pairs have been sorted in ascending order with respect to the filtration, the Euler Characteristic Curve can be reconstructed by progressively summing up the contributions of the elements in the list.
Remark 6. Roune and Sáenz de Cabezón [26] proved that computing the Euler characteristic of a simplicial complex given by its vertices and facets is #P-complete. Even if their result does not mention filtered complexes, it follows that the problem of computing the ECC is at least as hard. Otherwise, we could construct an arbitrary filtration of the considered complex and look at the end value of the curve to obtain the Euler characteristic of the complex in polynomial time, a contradiction.
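The reconstruction step described above amounts to a sort followed by a running sum. A minimal sketch, assuming the list of contributions is a plain Python list of (filtration value, ±1) pairs; the function name is hypothetical and this is not the interface of the authors' package.

```python
from itertools import groupby


def euler_characteristic_curve(contributions):
    """Turn (filtration, +1/-1) pairs into the jump points (filtration, chi) of the ECC."""
    curve, chi = [], 0
    ordered = sorted(contributions, key=lambda pair: pair[0])
    # Cells sharing a filtration value are merged into a single jump.
    for value, group in groupby(ordered, key=lambda pair: pair[0]):
        chi += sum(sign for _, sign in group)
        curve.append((value, chi))
    return curve


# A filled triangle: vertices at 0, edges at 1, the 2-simplex at 2 (in arbitrary order).
contributions = [(1, -1), (0, 1), (0, 1), (0, 1), (2, 1), (1, -1), (1, -1)]
curve = euler_characteristic_curve(contributions)
```

Since the input order does not matter, lists of contributions produced independently (for instance one per vertex or per image slice) can simply be concatenated before this step.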
4.1. Vietoris-Rips complexes. In this section we present a distributed algorithm to compute the Euler Characteristic Curve of a Vietoris-Rips simplicial complex obtained from a collection of points in $\mathbb{R}^n$.
Definition 4.1. Let $X$ be a finite collection of points in $\mathbb{R}^n$, also called a point cloud. Given a parameter $\epsilon \ge 0$, the Vietoris-Rips complex constructed from $X$ is the collection of all subsets of diameter at most $2\epsilon$,
$$\mathrm{VR}_\epsilon(X) = \{\sigma \subseteq X : \mathrm{diam}(\sigma) \le 2\epsilon\},$$
where the diameter is the greatest distance between any pair of vertices, $\mathrm{diam}(\sigma) = \max_{x, y \in \sigma} d(x, y)$. The filtration value of each simplex is given by its diameter.
The Vietoris-Rips complex is a flag complex: a subset $S$ of vertices is in the complex if every pair of vertices in $S$ is in the complex. Equivalently, the Vietoris-Rips complex is completely determined by its 1-skeleton graph, as there is a one-to-one correspondence between simplices in the complex and cliques in its 1-skeleton graph. Therefore, listing all the simplices in a Vietoris-Rips complex is equivalent to enumerating the cliques of its 1-skeleton graph [5]. In order to compute the contributions to the ECC we need an efficient and distributed way to list all cells in the complex (i.e. all cliques in the 1-skeleton graph) and their filtration values (i.e. the length of the longest edge in each clique). This can be achieved in the following way. Given an ordered list of points $X = \{x_i : i \in [1, n]\}$ and a maximum distance $\epsilon$, for each point $x_i$ we build its local graph $G_i$ of subsequent neighbours, namely all points $x_j \in B(x_i, \epsilon) \cap X$ with $j > i$. For each $G_i$ we list all of its cliques that contain $x_i$. They correspond to simplices with $x_i$ being the smallest vertex in the chosen ordering of points. This way each simplex $\sigma$ in the V-R complex will be generated exactly once, when considering the local graph of its lowest vertex in the considered ordering.
Algorithm 1, which uses this idea, describes a way to list all the simplices by increasing dimension. At each iteration we obtain a list of $d$-dimensional simplices (given as collections of vertices) and, for each of them, a list of the common subsequent neighbours of its vertices. We can then extend each simplex to a $(d+1)$-dimensional one by adding one common neighbour to the collection of vertices. When doing so we need to update the simplex's filtration value if one of the newly added edges is longer than the current filtration value. Moreover, we need to update the list of common subsequent neighbours by intersecting it with the subsequent neighbours of the newly added vertex. Once we have obtained all possible $(d+1)$-simplices, we carry out this extension procedure one dimension higher. All of these operations are performed on the local graph of each vertex. The procedure ends when no simplex can be extended, i.e. when all maximal simplices have been listed. This construction can be understood as a breadth-first traversal of the simplex tree [5].
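The extension procedure can be sketched as follows. This is an illustrative, sequential reading of the description above and not the authors' Algorithm 1: the function names are hypothetical, distances are Euclidean, and `eps` is taken to be the maximum allowed edge length, as in the local graph construction.

```python
from math import dist


def local_contributions(points, i, eps):
    """Contributions of all simplices whose lowest vertex is points[i]."""
    n = len(points)

    def later_neighbours(a):
        # Subsequent neighbours: points after a in the ordering, within distance eps.
        return {b for b in range(a + 1, n) if dist(points[a], points[b]) <= eps}

    contributions = [(0.0, 1)]  # the vertex itself, a 0-simplex
    # Each entry: (vertices, filtration value, common subsequent neighbours).
    frontier = [((i,), 0.0, later_neighbours(i))]
    while frontier:
        extended = []
        for vertices, filtration, common in frontier:
            for v in common:
                longest = max(dist(points[v], points[u]) for u in vertices)
                new_filtration = max(filtration, longest)
                # The new simplex has len(vertices) + 1 vertices, hence this dimension.
                contributions.append((new_filtration, (-1) ** len(vertices)))
                extended.append((vertices + (v,), new_filtration,
                                 common & later_neighbours(v)))
        frontier = extended
    return contributions


def vietoris_rips_contributions(points, eps):
    """Sorted list of contributions; each loop iteration is independent of the others."""
    result = []
    for i in range(len(points)):
        result.extend(local_contributions(points, i, eps))
    return sorted(result)


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
full = vietoris_rips_contributions(triangle, 2.0)
sparse = vietoris_rips_contributions(triangle, 1.0)
```

Because every common neighbour comes later in the ordering than all vertices already in the simplex, each simplex is produced once, with its vertices in increasing order. The loop in `vietoris_rips_contributions` is the part that could be handed to separate workers.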
The proposed algorithm has two main advantages: it does not require constructing the whole complex, leading to a significant decrease in memory utilization; and it considers each point separately, allowing the computations to be carried out independently.
The inputs of our algorithm are $X$, an ordered list of points in $\mathbb{R}^n$, and a maximum filtration value $\epsilon$. The output is the list of contributions, an ordered list of pairs. For each simplex $\sigma$ we store its contribution as a tuple $(f(\sigma), (-1)^{\dim(\sigma)})$. The output list will be sorted according to the filtration values. Note that Algorithm 1 is correct. Firstly, every simplex in the Vietoris-Rips complex will be generated: this happens when its smallest vertex in the considered order is processed in the for loop. Secondly, each simplex will be generated only once: the simplex $[v_0, \ldots, v_n]$, where $v_0 < \ldots < v_{n-1} < v_n$, will be generated from the simplex $[v_0, \ldots, v_{n-1}]$ by adding $v_n$ as a common neighbour of its vertices.
4.2. Time performance. The worst case scenario occurs when the 1-skeleton graph is fully connected. Assuming the point cloud consists of $n$ points, the resulting V-R complex will contain $2^n - 1$ simplices. In this case the time complexity of Algorithm 1 is $O(2^{n-1} n)$. More details are provided in Appendix A.
4.4. Choice of the vertex ordering. Note that the total running time of the fully parallelized Algorithm 1 can be dominated by a few vertices whose simplex tree is considerably larger than the others. This explains the plateau in Figure 6. This effect can be mitigated by choosing a different ordering of the vertices. One efficient choice is to order the vertices by increasing number of $\epsilon$-neighbours. Since the local graph for each vertex is constructed by considering only its subsequent neighbours, this ordering will produce more evenly-sized simplex trees. A simple example is shown in Figure 7, while the effect of this reshuffling on a larger dataset is shown in Figure 8.
Figure 8. Effect of different orderings of vertices for the example in Figure 6. Selecting the ascending order achieves a more even distribution in the number of simplices in each tree (panel A), thus reducing the number of very large trees which dominate the running time (panel B).
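The effect of the vertex ordering can be illustrated on a star graph. The sketch below is only a proxy: it counts the subsequent neighbours of each vertex rather than the size of its simplex tree, the 1-skeleton is assumed to be given as a dictionary of neighbour sets, and the function names are hypothetical.

```python
def subsequent_neighbour_counts(adjacency, order):
    """For each vertex, in the given order, count the neighbours that come later."""
    rank = {v: position for position, v in enumerate(order)}
    return [sum(1 for u in adjacency[v] if rank[u] > rank[v]) for v in order]


def ascending_degree_order(adjacency):
    """Order the vertices by increasing number of neighbours (ties broken by label)."""
    return sorted(adjacency, key=lambda v: (len(adjacency[v]), v))


# Star graph: vertex 0 is connected to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
centre_first = subsequent_neighbour_counts(star, [0, 1, 2, 3, 4])
ascending = subsequent_neighbour_counts(star, ascending_degree_order(star))
```

With the centre first, one local graph holds all four edges; with the ascending ordering, each leaf holds a single edge and the centre holds none, so the work is spread evenly.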
4.5.Cubical complexes.Cubical complexes are the most used combinatorial structure to represent digital grayscale images and extract topological information from them.There are two ways to construct a cubical complex from an image, the V-construction and the T-construction.The former identifies pixels -also know as voxels, in case of images of arbitrary dimension -with the vertices (the 0-dimensional cells) of the cubical complex.Voxels's values are used to define the filtration on the vertices and the filtration of each other elementary cube is the maximal value of its vertices.The T-construction can be seen as the dual procedure, voxels's values are assigned to the top dimensional cubes and the filtration values are propagated to lower dimensional cells by taking the minimum over the cofaces.The relation between these two constructions is explored in a recent work by Bleile et al. [4].In this paper we choose the T-construction, although the presented techniques translate easily to the V-construction.Similar to the V-R case, we are interested, given a grayscale n-dimensional image, in obtaining a list of contributions to the Euler characteristic of its corresponding cubical complex.As before, we then need to iterate over all cells σ in the complex and store each contribution as a tuple (f (σ), (−1) dim(σ) ).This can be achieved in a streaming fashion by loading into memory a two voxel high slice of the image, iterating through the cells in the bottom row computing their contributions, and then moving the sliding window up by one voxel.To make sure we consider each cells contribution exactly once, at each iteration we consider one voxel and compute the contributions to the Euler characteristic of the cells in its upper closure.Assuming that we can identify each top dimensional cell c i with the indices (x 1 , • • • , x n ) of the corresponding voxel in the input n-dimensional image, we define the upper closure of c i as the set containing c i and all its faces that 
are shared with other top dimensional cells $c_j$ whose indices are $y_i = x_i$ or $y_i = x_i + 1$ for all $i$. An example of this procedure can be found in Figure 9.

As already mentioned in Section 1, a similar streaming algorithm to compute the ECC of grayscale images has been presented by Heiss and Wagner [18]. They also provide a fast open-source C++ implementation at https://bitbucket.org/hubwag/chunkyeuler. Recently Wang, Wagner and Chen [33] provided a GPU implementation of the same algorithm, available at https://github.com/TopoXLab/GPU_ECC_SoCG2022. However, there is a significant difference between their approach and the one we describe in Algorithm 3: they keep track of the faces introduced by each voxel by looking at the gray values of the voxel's $3^d - 1$ neighbors and store the cumulative change in the EC at the voxel's filtration value. This approach cannot be generalized to the multiparameter filtration case, as a cell could inherit different filtration values from different voxels. There are some small differences in the implementation too: CHUNKYEuler only works with integer filtration values and only accepts 'raw' binary files as input. Our implementation, while not as fast as CHUNKYEuler, offers the user more flexibility in the input and in the choice of filtration (or multifiltration) values.

Figure 9. A slice of a cubical complex obtained from a 2 dimensional image. The image's pixels are associated to the top dimensional cells, depicted in yellow. Algorithm 3 takes as input a two-voxel-thick slice of the image and iterates through the voxels in the bottom row. At each iteration a voxel is selected and the contributions of the cells in its upper closure are computed. In this example, the voxel at coordinates (1, 1) is selected and the considered contributions are depicted in red: the one coming from the corresponding 2-cell, the two from the 1-cells shared with (2, 1) and (1, 2), and the contribution from the 0-cell shared with (2, 1), (1, 2) and (2, 2).

4.6. Time and memory complexity. Considering a $d$-dimensional image with $n$ voxels as input, the resulting cubical complex will have $3^d n$ cells. The running time of Algorithm 3 is then linear in the number of cells in the complex, with a multiplicative constant which is exponential in the dimension. This is not a problem in practice, as images with dimension larger than 3 are not common in applications. The memory requirement is just the space needed to store a two-row slice of the input image; the memory overhead for computing the local contributions for each voxel is negligible.

4.7. From Euler Characteristic Curves to Profiles. Both Algorithm 1 and Algorithm 3 can be immediately extended to compute the Euler Characteristic Profile of multifiltered Vietoris-Rips or cubical complexes. In the Vietoris-Rips case we require that all filtration functions are defined on the vertices or on the edges and are then extended to higher dimensional simplices by some user defined rule. This ensures that the resulting multifiltered V-R complex is still a flag complex. In the case of cubical complexes we assume that the input image contains an $n$-tuple of numbers in each voxel (RGB images are a typical $n = 3$ example) and that values are propagated to lower dimensional cells by some user defined rule. In both cases the output of the algorithm will be a list of $(n + 1)$-tuples $(f_1(\sigma), \dots, f_n(\sigma), (-1)^{\dim(\sigma)})$ that stores the contributions to the ECP at the different points $f(\sigma) \in \mathbb{R}^n$.

Remark 7. The discussion above covers the simplest case, the so-called 1-critical multifiltration, in which each cell $\sigma$ appears at a unique value of the multifiltration. In the general case, a cell $\sigma$ may appear at multiple non-comparable values $p_1, \dots, p_k$ of the multifiltration. The simple generalization described below adapts the presented algorithm to the general case. Let us assume that each $p_i$ is an $n$-dimensional tuple, $p_i = (p_i^1, p_i^2, \dots, p_i^n)$. We assume that $p_i$ and $p_j$ are not comparable provided $i \neq j$. It means that there exists a pair of coordinates $l \neq m$ such that $p_i^l < p_j^l$ and $p_i^m > p_j^m$. Then, the cell $\sigma$ contributes the value $(-1)^{\dim(\sigma)}$ at all the points $x \in \mathbb{R}^n$ for which there exists $i$ such that $x \geq p_i$. Note that the regions consisting of points greater than or equal to $p_i$ overlap for different $i \in \{1, \dots, k\}$, hence we need to avoid double and multiple counting of the contributions. Below we describe a procedure that achieves this and enforces a contribution of exactly $(-1)^{\dim(\sigma)}$ at all $x \geq p_i$ for arbitrary $i \in \{1, \dots, k\}$. For that purpose, given $i \neq j$, we define $p_i \vee p_j = (\max(p_i^1, p_j^1), \max(p_i^2, p_j^2), \dots, \max(p_i^n, p_j^n))$. Algorithm 4 defines a set of points with appropriate contributions to enforce the required condition at all $x \geq p_i$ for all $i \in \{1, \dots, k\}$.

Algorithm 4: CONTRIBUTION OF $\sigma$ TO ECP
Input: $d$, the dimension of $\sigma$; $p_1, \dots, p_k \in \mathbb{R}^n$, the incomparable times of appearance of $\sigma$ in the multifiltration
Output: A collection of contributions of $\sigma$ to the ECP
1 List Contribution ← $(p_i, (-1)^d)$, for $i \in \{1, \dots, k\}$
2 $P = p_i \vee p_j$ for every $i, j \in \{1, \dots, k\}$
3 Queue $L$ ← $p_i \vee p_j$ for $i \neq j \in \{1, \dots, k\}$
4 while $L \neq \emptyset$ do

It is straightforward to see that, for any given cell $\sigma$, its contribution to the ECP can change only at the points $p_i \vee p_j$ for $i, j \in \{1, \dots, k\}$, where $\{p_1, \dots, p_k\}$ are the incomparable points at which $\sigma$ appears in the multifiltration. Algorithm 4 scans all those points and assigns the appropriate value (see lines 1 and 9) to the contributions to the ECP. Note that all points $p_1, \dots, p_k$ have their contributions initially set in line 1. Consequently, the presented algorithm will terminate, as in each pass through the queue at least one point $p$ will be added to the Contribution list. In addition, it explicitly enforces the correct contribution of the cell $\sigma$ at all points $x \geq p_i$ for any $i \in \{1, \dots, k\}$.
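The procedure of Algorithm 4 can be sketched in Python as follows. This is a minimal illustration and not the reference implementation: the function and helper names are hypothetical, contributions are stored in a dictionary keyed by point (so that a join produced by several pairs is recorded once), and, as in the algorithm, the pairwise joins are assumed to be the only points where the contribution changes. It has only been checked on small two-parameter examples.

```python
from collections import deque
from itertools import combinations


def join(p, q):
    """Coordinate-wise maximum p v q of two points."""
    return tuple(max(a, b) for a, b in zip(p, q))


def strictly_below(q, p):
    """True if q <= p in every coordinate and q != p."""
    return q != p and all(a <= b for a, b in zip(q, p))


def cell_contributions(dim, appearance_points):
    """Contributions of a cell of dimension `dim` that appears at the given
    pairwise incomparable points (hypothetical helper following Algorithm 4)."""
    sign = (-1) ** dim
    # Every appearance point starts with the contribution (-1)^dim.
    contribution = {tuple(p): sign for p in appearance_points}
    # Joins of distinct appearance points, deduplicated.
    joins = {join(p, q) for p, q in combinations(contribution, 2)}
    candidates = set(contribution) | joins      # the set P
    queue = deque(sorted(joins))                # the queue L
    while queue:
        p = queue.popleft()
        below = [q for q in candidates if strictly_below(q, p)]
        if all(q in contribution for q in below):
            already_counted = sum(contribution[q] for q in below)
            # Make the cumulative value at p equal to (-1)^dim.
            contribution[p] = sign - already_counted
        else:
            queue.append(p)                     # retry once its predecessors are set
    return contribution
```

For a vertex appearing at (0, 1) and (1, 0), the sketch assigns +1 to both points and −1 to their join (1, 1), so that the cumulative contribution at every point above either appearance time is exactly +1.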

Data Structures for ECPs
All the algorithms described in the previous section output a list of contributions to the Euler Characteristic Profile. For an $n$-dimensional profile, each contribution in the list is a pair whose first entry is an $n$-tuple storing the coordinates in $\mathbb{R}^n$ at which the Euler characteristic changes, and whose second entry is the integer value of that change. When dealing with one dimensional ECCs it makes sense to sort the contributions according to their filtration value, in order to perform faster operations on them.

5.1. Retrieving the EC at some filtration values. Given an ECP as a list of contributions, the first basic operation is to retrieve the value of the Euler characteristic at an arbitrary filtration value $f^*$. It can be obtained by summing up all the contributions in the ECP that appear at filtration values less than or equal to $f^*$. For a $d$-dimensional ECP this can be achieved in linear time with respect to the size of the contribution list. In the one dimensional case we can take advantage of the total ordering on the list of contributions, since the filtration values $f_i \in \mathbb{R}$. By doing so we can build an auxiliary data structure storing the value of the Euler characteristic at each $f_i$, that is, at the points where the ECC changes value. This can be done in $O(n)$ time and space, where $n$ is the length of the list of contributions. Given such a structure, computing the value of the ECC at a given filtration value $f^*$ boils down to searching for the largest jump point $f_i \leq f^*$ and retrieving the value of the ECC therein. This can be achieved by interpolation search in $O(\log(\log(n)))$ expected time when the jump points are roughly uniformly distributed, or by binary search in $O(\log n)$ time in the worst case.
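A minimal sketch of this lookup is given below. The function names are hypothetical; the one-parameter version uses prefix sums and the standard-library binary search (`bisect`) rather than interpolation search, and the multiparameter version is the plain linear scan.

```python
from bisect import bisect_right
from itertools import accumulate


def build_ecc_index(contributions):
    """Sort (filtration, change) pairs and precompute the running EC."""
    ordered = sorted(contributions)
    jumps = [f for f, _ in ordered]
    values = list(accumulate(c for _, c in ordered))
    return jumps, values


def ec_at(jumps, values, f_star):
    """EC at f_star: the sum of all contributions with filtration <= f_star."""
    k = bisect_right(jumps, f_star)
    return values[k - 1] if k > 0 else 0


def ecp_at(contributions, point):
    """Linear-time lookup for a profile given as (n-tuple, change) pairs."""
    return sum(c for p, c in contributions
               if all(pa <= xa for pa, xa in zip(p, point)))


# A filtered triangle: three vertices at 0, edges at 1, 1.5 and 2, the 2-cell at 2.
ecc = [(0.0, 1), (0.0, 1), (0.0, 1), (1.0, -1), (1.5, -1), (2.0, -1), (2.0, 1)]
jumps, values = build_ecc_index(ecc)
```

With this index, `ec_at(jumps, values, 1.2)` sums the three vertices and the first edge.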

5.2.1. Distances between Euler Characteristic Curves. In Section 3.1 we introduced the notion of difference between two ECCs, expressed in terms of the $L_1$ norm of the difference between the two curves. One should note that, in the case of finite Vietoris-Rips or cubical complexes, such a difference is always finite (but not bounded), as all ECCs will eventually stabilize to 1 for a sufficiently large filtration value. When the construction of a Vietoris-Rips complex is stopped at a certain diameter $2\epsilon$, and the final complexes have more than one infinite homology class, it makes sense to restrict the integral used in the distance computation to the interval $[0, 2\epsilon]$ in order to make the distances between the ECCs finite.
Both Algorithm 1 and Algorithm 3 return the computed ECC as a list of pairs $(f_i, c_i)$, where $c_i$ is an integer representing the change in the Euler characteristic at filtration value $f_i$. Such a list is sorted in increasing order with respect to the filtration values. Using this data structure, the difference between two ECCs can be computed in time linear in the size of the lists. Given two lists of contributions $ECC_1$ and $ECC_2$, we can merge them in linear time, preserving the order. While merging we flip the sign of all the contributions coming from $ECC_2$. Let us denote the obtained list by $ECC_{1-2}$ and its length by $N$. Now the difference can be computed by iterating over the full list as
$$\|ECC_1 - ECC_2\|_1 = \sum_{i=0}^{N-2} |EC(f_i)| \, (f_{i+1} - f_i),$$
where $EC(f_i) = \sum_{j=0}^{i} c_j$ with respect to the ordering of $ECC_{1-2}$.
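The merge-and-accumulate computation can be sketched as follows. The function name and the optional truncation parameter `f_max` are assumptions made for illustration; both input lists are expected to be sorted by filtration value, as returned by the algorithms above.

```python
import math
from heapq import merge
from operator import itemgetter


def ecc_l1_distance(ecc1, ecc2, f_max=math.inf):
    """L1 distance between two ECCs given as sorted (filtration, change) lists.

    If f_max is finite, the integral is restricted to filtration values below f_max.
    """
    flipped = ((f, -c) for f, c in ecc2)            # flip the sign of the second curve
    merged = list(merge(ecc1, flipped, key=itemgetter(0)))
    distance, running = 0.0, 0
    for k, (f, c) in enumerate(merged):
        if f >= f_max:
            break
        running += c                                 # value of the difference from f onwards
        f_next = merged[k + 1][0] if k + 1 < len(merged) else math.inf
        width = min(f_next, f_max) - f
        if running != 0:                             # also avoids 0 * inf on the last piece
            distance += abs(running) * width
    return distance
```

For two points that merge at filtration value 1 in the first complex and at 2 in the second, the curves differ by one on the interval [1, 2), so the distance is 1. If the two curves never agree again after the last jump and no `f_max` is given, the sketch returns infinity, which matches the need for truncation discussed above.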
Figure 10. Example of a two dimensional ECP with three contributions. The green points indicate a +1 while the red point is a −1. The plane can then be subdivided into a 4 × 4 irregular grid. The coloring of each block indicates the value of the EC in that region: white is 0, light gray is 1 and dark gray is 2.

Distances between Euler Characteristic Profiles.
Unfortunately, the strategy proposed in the previous section is difficult to generalize to the multifiltration setting, as there is no natural way to sort the list of contributions. We present here a basic algorithm to compute the distance between two ECPs and leave the search for potentially faster algorithms to future work.
Let $ECP_1$ and $ECP_2$ be two lists of contributions representing two $n$-dimensional profiles. We can merge them in linear time, as in the one dimensional case, flipping the sign of the contributions in the second list. Let $N$ be the total number of contributions. With reference to Figure 10, the coordinates of such contributions will create an $n$-dimensional irregular grid with at most $N + 1$ cells along each axis. The value of the EC inside each cuboid will be equal to the EC at the cuboid's bottom left corner and can be computed in $O(N)$. The $L_1$ distance between the two ECPs can then be obtained by summing up the absolute values of the EC in each cuboid, weighted by the cuboid's volume. Given that the number of cuboids is at most $(N + 1)^n$, this operation can be computed in $O(N^{n+1})$. Note that the ECPs need to be truncated in order to avoid cuboids with infinite volume.
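The grid-based computation can be sketched as follows, under the assumption that each profile is a list of `(point, change)` pairs and that the truncation is given as an upper bound per parameter. The function name and the `upper` argument are hypothetical. Cuboids below the smallest coordinates are skipped, since the difference is zero there.

```python
from itertools import product


def ecp_l1_distance(ecp1, ecp2, upper):
    """L1 distance between two n-parameter ECPs, truncated at `upper` on each axis."""
    merged = list(ecp1) + [(p, -c) for p, c in ecp2]   # flip the second profile
    n = len(upper)
    # Grid lines on each axis: the contribution coordinates below the bound, then the bound.
    axes = []
    for a in range(n):
        ticks = sorted({p[a] for p, _ in merged if p[a] < upper[a]})
        axes.append(ticks + [upper[a]])
    distance = 0.0
    for index in product(*(range(len(ax) - 1) for ax in axes)):
        corner = tuple(axes[a][i] for a, i in enumerate(index))
        volume = 1.0
        for a, i in enumerate(index):
            volume *= axes[a][i + 1] - axes[a][i]
        # Value of the difference at the lower corner of the cuboid: an O(N) scan.
        value = sum(c for p, c in merged
                    if all(pa <= ca for pa, ca in zip(p, corner)))
        distance += abs(value) * volume
    return distance
```

Moving a single vertex from (0, 0) to (0.5, 0) gives a distance of 1.0 when both parameters are truncated at 2, and 5.0 when the second parameter is truncated at 10, which illustrates why the truncation is needed.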

Vectorization
Vectorizing the ECC / ECP is a critical step if we are interested in using these invariants in a Machine Learning framework.

6.1. Curves. Assume we are given an ECC whose filtration values range from 0 to $f_{\max}$. We can convert it to a vector by evenly sampling it $N$ times between 0 and $f_{\max}$. If we choose to include the endpoints, the resulting vector will be
$$\mathrm{vec}(ECC, N) = \left(EC(0), EC(\Delta), EC(2\Delta), \dots, EC(f_{\max})\right),$$
where $\Delta$ is the vectorization's resolution, defined as $\Delta = f_{\max}/(N - 1)$.
The vectorized ECC can be recovered from such a vector as a step function made of $N - 1$ left-closed, right-open intervals of length $\Delta$, which corresponds to sampling the value of the EC at filtration value $f_i$ and extending it until $f_{i+1}$. It makes sense then to ask whether it is possible to bound the difference between an ECC and its vectorized representation. Figure 11 is an example of such a difference when a curve is sampled at 5 points.

Figure 11. An Euler Characteristic Curve (black) and its vectorized version (green) with resolution $\Delta$. In this case, the vectorized version is stored as a vector of length 5 (the green filled-in points), but can be converted back to a step function.

Proposition 6.1. Let $K$ be a filtered cell complex whose filtration values range from 0 to $f_{\max}$. The $L_1$ norm of the difference between the Euler Characteristic Curve of $K$ and its vectorized version at resolution $\Delta$ is bounded by
$$\|ECC(K) - \mathrm{vec}(ECC(K), N)\|_1 \leq \Delta \cdot F + \Delta \cdot \frac{|K|}{2}, \qquad (3)$$
where $|K|$ is the number of cells in the complex and $F = \sum_{i=0}^{N-2} |EC(i\Delta) - EC((i + 1)\Delta)|$ is the sum of the absolute values of the differences between consecutive values in the vectorized Euler Characteristic Curve $\mathrm{vec}(ECC(K), N)$.
Proof. We will prove the two terms in the bound separately, as they come from two different types of errors.
Type I errors occur when the EC at two consecutive sampling points $f_i$ and $f_{i+1}$ is different. The simplest case is depicted in Figure 12A: the EC changes value once inside the sampling interval. We can upper bound this error with the area of the rectangle having as base the vectorization's resolution $\Delta = f_{i+1} - f_i$ and as height the difference between the EC at the two sampling points, $|EC(f_i) - EC(f_{i+1})|$. Note that this bound also holds in the more general case where the EC varies monotonically at multiple values inside the sampling interval. By summing up all the contributions we obtain the value $\Delta \cdot F = \Delta \cdot \sum_{i=0}^{N-2} |EC(i\Delta) - EC((i + 1)\Delta)|$. Type II errors, see Figure 12B, occur when the EC has the same value at consecutive sampling points but varies in between. The maximum possible variation can be upper bounded by the area of the rectangle with $\Delta$ as base and half the number of cells in the complex as height. Each cell contributes to the EC by ±1; the factor one half is due to the constraint that the EC has the same value at $f_i$ and $f_{i+1}$. This amounts to the value $\Delta \cdot |K|/2$.
By summing up the two contributions we obtain the bound in (3). Note that a generic situation can always be described as a combination of type I and type II errors. □

We have shown a way to bound the distance between an ECC and its vectorized version. Another possible stability question is whether this vectorization preserves distances between ECCs. In other words, we are interested in knowing whether something can be said about $\|\mathrm{vec}(ECC_1, N) - \mathrm{vec}(ECC_2, N)\|$ given $\|ECC_1 - ECC_2\|$. Unfortunately, it is possible to construct examples in which two curves can be made arbitrarily far apart while having the same vectorization, or in which two curves can be made arbitrarily close while having drastically different vectorizations. Figure 13 shows two such examples. Moreover, in the existing literature Johnson and Jung [20] prove that the distance between two vectorized Betti curves cannot be bounded by the Wasserstein distance between the respective persistence diagrams. They propose a stable vectorization inspired by Gaussian smoothing techniques.

Figure 13. Two ECCs superimposed in the same plot. In panel (A) the two curves can be made arbitrarily far apart in $L_1$ but they have the same vectorization. In panel (B) the two curves can be made arbitrarily close but they have drastically different vectorizations.
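The sampling step and the two terms of the bound discussed in the proof can be sketched as follows. The function names are hypothetical, and the ECC is assumed to be given as a list of `(filtration, change)` pairs with non-negative filtration values.

```python
from bisect import bisect_right
from itertools import accumulate


def vectorize_ecc(contributions, f_max, n_samples):
    """Sample an ECC at n_samples evenly spaced values in [0, f_max], endpoints included."""
    ordered = sorted(contributions)
    jumps = [f for f, _ in ordered]
    values = [0] + list(accumulate(c for _, c in ordered))  # values[k]: EC after k jumps
    delta = f_max / (n_samples - 1)
    return [values[bisect_right(jumps, i * delta)] for i in range(n_samples)]


def vectorization_bound(vector, delta, n_cells):
    """Sum of the type I term (delta * F) and the type II term (delta * |K| / 2)."""
    f_term = sum(abs(a - b) for a, b in zip(vector, vector[1:]))
    return delta * f_term + delta * n_cells / 2


# A filtered triangle: three vertices at 0, edges at 1, 1.5 and 2, the 2-cell at 2.
ecc = [(0.0, 1), (0.0, 1), (0.0, 1), (1.0, -1), (1.5, -1), (2.0, -1), (2.0, 1)]
vector = vectorize_ecc(ecc, f_max=2.0, n_samples=5)
```

Here the resolution is 0.5 and the sampled vector is `[3, 3, 2, 1, 1]`, so the type I term is 0.5 · 2 and the type II term is 0.5 · 7/2.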

6.2. Profiles. An $n$-dimensional Euler Characteristic Profile whose filtration values range from 0 to $f^i_{\max}$ for $i \in \{1, \dots, n\}$ can be vectorized in a similar fashion by sampling it on a grid of size $N_1 \times \dots \times N_n$. In general the $N_i$ can be different, thus leading to different resolutions $\Delta_i$ on the various filtration parameters. The output of this sampling procedure is an $n$-dimensional tensor $\mathrm{vec}(ECP, N_i)$ that can eventually be flattened to a 1-dimensional vector. Although this is an intuitive generalization of the 1-dimensional ECC case, the procedure has an increased computational cost due to the difficulties in sampling EC values from a profile, as already discussed in Section 5.1. Moreover, the stability result of Proposition 6.1 cannot be generalized to the multiparameter setting. As depicted in Figure 14, the grid vectorization may fail to detect the contributions coming from pairs of cells. In the multiparameter case, however, it is not possible to bound these contributions using only the vectorization resolutions $\Delta_i$, as such contributions can persist on subsequent grid elements up to infinity.
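A minimal sketch of the grid sampling is shown below. The function name is hypothetical; the profile is assumed to be a list of `(point, change)` pairs, each EC value is obtained with the linear scan of Section 5.1, and the tensor is returned already flattened in row-major order.

```python
from itertools import product


def vectorize_ecp(contributions, f_max, n_samples):
    """Sample an n-parameter ECP on an N_1 x ... x N_n grid and flatten the result.

    f_max and n_samples are n-tuples: the upper bound and the number of samples per axis.
    """
    deltas = [fm / (ns - 1) for fm, ns in zip(f_max, n_samples)]
    flat = []
    for index in product(*(range(ns) for ns in n_samples)):   # row-major order
        point = [i * d for i, d in zip(index, deltas)]
        flat.append(sum(c for p, c in contributions
                        if all(pa <= xa for pa, xa in zip(p, point))))
    return flat


profile = [((0, 0), 1), ((1, 0), 1), ((1, 1), -1)]
flat = vectorize_ecp(profile, f_max=(2, 2), n_samples=(3, 3))
```

The cost is the number of grid points times the length of the contribution list, which is the increased cost mentioned above.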

Examples and Experiments
7.1. RGB images. A toy experiment using 3 dimensional Euler Characteristic Profiles can be constructed using RGB images. In an RGB image each pixel contains a tuple of 3 integers, each ranging from 0 to 255. They stand for the Red, Green and Blue color channels, and all colors in the visible spectrum can be represented by a 3-tuple. In particular, black is coded by (0, 0, 0) and white by (255, 255, 255).
In this example we consider two different textures, stripes and checks, each of which can be red, green or blue. We generate 10 samples of each combination of style and color by adding random Gaussian noise to each pixel. We then compute the 3 dimensional Euler Characteristic Profile of the cubical complex obtained from each image and compute the matrix of pairwise $L_1$ distances between them. Such a matrix is shown in Figure 15. It confirms that the distance between Euler Characteristic Profiles of different images increases following the intuitive sequence 'same style, same color' < 'same style, different color' < 'different style, same color' < 'different style, different color'.

7.2. Immune cell spatial patterns in tumors. Vipond et al. [32] applied multiparameter persistent homology (MPH) landscapes to study immune cell location in digital histology images from head and neck cancer. They extracted the locations of three immune cell types from histology slides, thus obtaining a list of pointclouds labelled CD8+, FoxP3+, or CD68. The goal is to correctly classify a pointcloud. All pointcloud data are available at github.com/MultiparameterTDAHistology/SpatialPatterningOfImmuneCells.

The [...] image into a bidimensional (H, E) one. We first computed the ECC for each of the grayscale images corresponding to the hematoxylin channel, as it is the color that highlights cell nuclei. We then also used the eosin color channel to obtain a 2-dimensional ECP. We input either the ECCs or the ECPs into a Support Vector Machine (SVM) [3] classifier and computed the mean test accuracy over 100 rounds with an 80/20 train-test split. The results are displayed in Table 4. The classifier using the 2-dimensional ECPs as input consistently performs better than the one using the 1-dimensional ECCs.

Conclusions
Euler Characteristic Curves and Profiles provide a stable summary of the shape of data. Unlike other summaries used in Topological Data Analysis, this one can be computed in a distributed fashion, hence it is applicable to big data problems. It also enjoys a certain type of stability. We confirm it when using them to discriminate various toy datasets with varying levels of noise. We also show how to compare and vectorize Euler Characteristic Curves and Profiles and apply them to a number of real data analysis problems. The presented results are accompanied by an efficient Python implementation. For example, on modern commodity hardware, our implementation for V-R complexes can handle a number of simplices on the order of $10^{10}$. This is two orders of magnitude more than what can be achieved using available software like GUDHI [31]. With this work we hope that the machinery of Euler Characteristic Curves and Profiles will be useful for practitioners in Topological Data Analysis.

Appendix A. Time performance analysis
We assess the time performance of Algorithm 1 by analyzing the worst-case scenario, a complete graph built from a pointcloud $\{x_i\}_{i \in [1, n]}$. This is the worst-case scenario as it contains the maximal number of cliques (hence simplices) for a given number of vertices, namely $2^n - 1$. As discussed in the previous section, the running time will be dominated by the first vertex $x_1$, as it has the highest number of subsequent neighbours. The most time consuming operations are the ones that happen inside Algorithm 2, namely the update filtration and update common neighbours subroutines.

A.1. Update filtration. The extension of a $d$-clique requires checking whether one or more of the newly introduced edges have a filtration value higher than that of the current $d$-clique. Comparison between floats can be done in constant time and has to be repeated $d$ times. With reference to Figure 18, we can assign to each edge in the simplex tree a cost that depends only on the edge depth. The total sum of such costs is $\sum_{i=1}^{n}$ [...] In case of perfect parallelization, the cost for the first vertex only is [...]

A.2. Update common neighbours. Updating the list of common neighbours after a clique extension requires computing the intersection between the current list of common neighbours (with length $m_1$) and the list of neighbours of the newly added vertex (with length $m_2$). Given that such lists are ordered, their intersection can be computed in $O(m_1 + m_2)$. The total cost for this operation can be obtained recursively by observing that the number of neighbours in a clique is uniquely determined, in this particular case, only by the last element of the clique. For example, in Figure 19 the subtree spanning from AB is the same as the one spanning from B, and the one from AC is equivalent to the one from C. The total cost for a clique of size $n$ can then be expressed as twice the cost for the $(n - 1)$-clique plus the cost of the depth 1 edges spanning from the first vertex:

Appendix B. Memory performance analysis
At each step, the algorithm needs to store in memory the local graph $G$ with each edge's filtration value. The local graph can be stored as an adjacency matrix whose entries represent the filtration values. Moreover, we need to store the current list of simplices, the list of their filtration values and the list of common neighbours for each simplex. Let us denote by $V$ the bits needed to store an edge label (usually a uint) and by $F$ the bits needed to store a filtration value (usually a float). Assuming the worst case scenario of a fully connected graph with $n$ nodes, the maximum number of simplices will be generated at dimension $n/2$ and will be $\binom{n}{n/2}$. The memory cost at that step will then be
$$O\left(n(n - 1)F + \binom{n}{n/2} nV + \binom{n}{n/2} F\right) = O\left(\frac{2^n}{\sqrt{n}}(nV + F)\right),$$
where the first term is the graph cost, the second one is the cost of the list of simplices and of the list of common neighbours, which we assume to have the same size due to the symmetry of the binomial coefficients, and the third one is the cost of the simplices' filtration values.
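The $2^n/\sqrt{n}$ factor comes from the standard Stirling estimate of the central binomial coefficient, $\binom{n}{n/2} \approx 2^n \sqrt{2/(\pi n)}$. The following short check illustrates it numerically; the function name is hypothetical and `math.comb` requires Python 3.8 or later.

```python
import math


def central_binomial_ratio(n):
    """Ratio between C(n, n/2) and 2^n / sqrt(n), for even n."""
    # Integer true division keeps full precision before converting to float.
    return math.comb(n, n // 2) / 2 ** n * math.sqrt(n)


for n in (10, 100, 1000):
    print(n, central_binomial_ratio(n))
```

The printed ratios approach the constant $\sqrt{2/\pi}$ as $n$ grows, so the largest level of the simplex tree is smaller than the total number of simplices only by a factor of order $\sqrt{n}$.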

Therefore, we can define the group of $d$-chains $C_d = C_d(K)$. The boundary of a $d$-chain is the sum of the boundaries of its cells, $\partial c = \sum a_i \partial\sigma_i$, which is a $(d - 1)$-chain. Since the boundary commutes with the addition operation, we can define, for each dimension $d$, the boundary homomorphism $\partial_d : C_d \to C_{d-1}$. A $d$-cycle is a $d$-chain with empty boundary, $\partial c = 0$. A $d$-boundary is a $d$-chain which is the boundary of a $(d + 1)$-chain. Since $\partial$ commutes with addition, we have the group of $d$-cycles $Z_d = Z_d(K)$ and the group of $d$-boundaries $B_d = B_d(K)$. It is a fundamental result that $\partial_d \partial_{d+1} c = 0$ for every dimension $d$ and every $(d + 1)$-chain $c$. This means that the boundary of a boundary is always zero; in other words, $B_d$ is a subgroup of $Z_d$. This leads to the following definition.
Definition 2.5. The $d$-th homology group is the $d$-th cycle group modulo the $d$-th boundary group, $H_d = Z_d / B_d$. The $d$-th Betti number is the rank of this group, $\beta_d = \mathrm{rank}(H_d)$.

Definition 2.8. The $k$-dimensional persistence diagram of a filtered complex $K$, $\mathrm{Dgm}_k(K)$, is a multiset of points in the extended real plane $(\mathbb{R} \cup \{\infty\}) \times (\mathbb{R} \cup \{\infty\})$. The multiplicity of each point $(b, d)$ indicates the number of independent $k$-dimensional classes that are born at filtration value $b$ and die at filtration value $d$.

Figure 1. Two Euler Characteristic Curves in red and green. The absolute value of their difference is highlighted in shaded gray. Axis labels: Filtration, χ.

We can reformulate this statement by assigning to each point $(b, d)$ in the diagram the indicator function of the interval $[b, d)$, that is, $I_{[b,d)}(t) = 1$ if $t \in [b, d)$ and 0 otherwise. These indicator functions are exactly the bars in the barcode representation. By doing so we can define the $k$-dimensional Betti curve as the step function obtained by summing up all these indicator functions.
Definition 3.3. The $k$-th Betti curve for a persistence diagram $D$ with finitely many off-diagonal points is
$$\beta_k(t) = \sum_{(b, d) \in D} I_{[b,d)}(t).$$
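As a small illustration, a Betti curve can be evaluated directly from a diagram stored as a list of `(birth, death)` pairs, and the Euler characteristic at a filtration value can then be obtained as the alternating sum of the Betti curves (the Euler-Poincaré formula). The function names and the list-of-diagrams layout are assumptions made for this sketch.

```python
import math


def betti_curve(diagram, t):
    """Number of intervals [b, d) of the diagram that contain t."""
    return sum(1 for b, d in diagram if b <= t < d)


def euler_characteristic(diagrams, t):
    """Alternating sum of Betti curves; diagrams[k] is the k-dimensional diagram."""
    return sum((-1) ** k * betti_curve(dgm, t) for k, dgm in enumerate(diagrams))


# Three points that merge at filtration values 1.0 and 1.5; no off-diagonal 1-dimensional class.
diagrams = [[(0.0, math.inf), (0.0, 1.0), (0.0, 1.5)], []]
```

With these diagrams, the 0-th Betti curve takes the values 3, 2 and 1 on the intervals [0, 1), [1, 1.5) and [1.5, ∞) respectively.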

Figure 5. Minimal counterexample for the instability of ECP. Consider a cell complex made of only one vertex whose +1 contribution appears at some point $g = (g_1, g_2) \in \mathbb{R}^2$ and move it to $g' = (g_1 + \epsilon, g_2)$. Their difference, the region shaded in red, is unbounded.

Algorithm 1: COMPUTE LOCAL CONTRIBUTIONS V-R
Input: Ordered point cloud X, ϵ > 0
Output: An ordered list of pairs (filtration, ±1)
Create an empty vector C
for every point x_i in X do
    create the local graph G_i of subsequent neighbours of x_i
    simplices = [x_i]
    filtrations = [0]
    common_subseq_neighs = [[subseq_neigh(G_i, x_i)]]
    while simplices NOT empty do
        for every simplex σ ∈ simplices do
            add to C the tuple (filtration(σ), (−1)^dim(σ))
        end
        INCREASE DIMENSION(G_i, simplices, common_subseq_neighs)
    end
end
sort C according to the filtration value
return C

Algorithm 2: INCREASE DIMENSION
Input: local graph G_i, simplices, common_subseq_neighs
new_simplices = []
new_filtrations = []
new_common_subseq_neighs = []
for every simplex σ ∈ simplices do
    for every n ∈ common_subseq_neighs[σ] do
        new_simplices.append(σ + [n])
        consider all the edges from vertices of σ to n and take the longest one
        new_f = MAX(filtration(σ), length_longest_edge)
        new_filtrations.append(new_f)
        compute the intersection between the current common subsequent neighbours of σ and the subsequent neighbours of n in G_i
        new_common_subseq_neighs.append(intersection)
    end
end
simplices = new_simplices
filtrations = new_filtrations
common_subseq_neighs = new_common_subseq_neighs
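The two algorithms above can be combined into the following sequential Python sketch. It is an illustration under simplifying assumptions, not the accompanying implementation: the function name is hypothetical, all pairwise distances are precomputed, an edge is taken to enter the filtration at its length, and common neighbours are intersected with sets instead of merging ordered lists. `math.dist` requires Python 3.8 or later.

```python
import math


def vr_contributions(points, eps):
    """Sorted (filtration, +/-1) contributions of the Vietoris-Rips complex with edges up to eps."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Subsequent neighbours: points with a higher index within distance eps.
    subseq = [[j for j in range(i + 1, n) if dist[i][j] <= eps] for i in range(n)]
    subseq_sets = [set(s) for s in subseq]

    contributions = []
    for i in range(n):          # each vertex is independent, hence easy to distribute
        # Each entry: (simplex, filtration value, common subsequent neighbours).
        level = [([i], 0.0, subseq[i])]
        while level:
            next_level = []
            for simplex, filt, common in level:
                contributions.append((filt, (-1) ** (len(simplex) - 1)))
                for v in common:
                    longest = max(dist[u][v] for u in simplex)
                    new_common = [w for w in common if w in subseq_sets[v]]
                    next_level.append((simplex + [v], max(filt, longest), new_common))
            level = next_level
    contributions.sort(key=lambda c: c[0])
    return contributions
```

Because a simplex is only ever extended with vertices of higher index, each simplex is generated exactly once, starting from its lowest-index vertex. On three points forming a right triangle with legs of length 1, the sketch returns 7 contributions for `eps = 2.0` and 5 contributions for `eps = 1.2`, in both cases summing to 1.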

Figure 6. Average runtime over 10 runs of Algorithm 1 as a function of the number of cores used. Contributions computed for the V-R complex obtained from 10000 points sampled from the unit 4-sphere up to a maximum radius of 0.4. Experiment run on an AMD Ryzen Threadripper PRO 5955WX CPU. Error bars are scaled up by a factor of 20 for visibility.

4.3. Memory performance. Assuming the worst case scenario, the size of the output list of contributions is $O(n^2)$, while the maximal memory required at one intermediate step is $O(2^n / \sqrt{n})$. More details are provided in Appendix B.

Figure 7. Different orderings of the vertices can produce different simplex trees. In the first row vertices are ordered by decreasing number of neighbours, in the second row by increasing number. The second choice produces more evenly-sized trees.

Algorithm 3: COMPUTE LOCAL CONTRIBUTIONS CUBICAL
Input: A two-voxel-thick slice of an image, padded with +∞
Output: An ordered list of pairs (filtration, ±1)
1 Create an empty vector C
2 for every voxel c_i in the bottom row do
3     for every cell σ in the upper closure of c_i do
4         add to C the tuple (filtration(σ), (−1)^dim(σ))
5     end
6 end
7 sort C according to the filtration value
8 return C
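For a 2 dimensional grayscale image, the upper-closure idea of Algorithm 3 can be sketched as follows. This is a simplified illustration: the function name is hypothetical, the whole image is processed at once rather than slice by slice, and each lower dimensional cell is assumed to inherit the minimum value of the pixels that share it.

```python
import math


def cubical_contributions(image):
    """Sorted (filtration, +/-1) contributions of a 2D image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    inf = math.inf
    # Pad with +inf on every side so that boundary cells are handled uniformly.
    padded = [[inf] * (cols + 2)]
    padded += [[inf] + list(row) + [inf] for row in image]
    padded += [[inf] * (cols + 2)]

    contributions = []
    for i in range(rows + 1):
        for j in range(cols + 1):
            here = padded[i][j]
            down = padded[i + 1][j]
            right = padded[i][j + 1]
            diag = padded[i + 1][j + 1]
            upper_closure = [
                (here, +1),                          # the 2-cell (the pixel itself)
                (min(here, down), -1),               # 1-cell shared with (i+1, j)
                (min(here, right), -1),              # 1-cell shared with (i, j+1)
                (min(here, down, right, diag), +1),  # 0-cell shared by all four pixels
            ]
            # Cells that only touch padding never enter the filtration.
            contributions.extend((f, s) for f, s in upper_closure if f != inf)
    contributions.sort(key=lambda c: c[0])
    return contributions


ring = [[0, 0, 0],
        [0, 5, 0],
        [0, 0, 0]]
```

For the `ring` image, the eight outer pixels form an annulus at filtration value 0, so the contributions at 0 sum to 0; the central pixel enters at 5 and brings the Euler characteristic to 1.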

Algorithm 4, lines 6 to 9 and closing statements:
6 $P'$ = all elements $p' \in P$ such that $p' < p$
7 if all elements of $P'$ are already in Contribution then
8     $c$ = sum of the values of the elements of $P'$ in Contribution
9     Contribution ← $(p, (-1)^d - c)$
else
    L = enqueue(p)
end
end
return Contribution

Figure 12. The two possible sources of error during the vectorization of an ECC.

Figure 14. A 2-dimensional analog of a type II error of Figure 12. The ECP is vectorized by sampling the EC values on the green grid. We can add pairs of cells with contributions ±1 inside the rectangle ABCD in such a way that the value of the EC on the vertices does not change. However, such contributions have a non-zero sum on an area that can be made arbitrarily large.

Figure 15. 60 × 60 distance matrix between Euler Characteristic Profiles of different RGB images.

Figure 16. Panel (A) depicts a raw RGB ROI. Panel (B) contains the hematoxylin channel while panel (C) contains the eosin one.

Figure 17. Panel (A) depicts the hematoxylin ECC for the ROI in Figure 16 while panel (B) depicts the eosin ECC. The combined ECP is shown in panel (C).

Figure 18. Simplex tree for a 4-clique with the update filtration cost.

Figure 19. Simplex tree for a 4-clique with the update common neighbours cost.

Table 1. Average classification accuracy for the LDA classifier using as input MLP, ECC or ECP. Data for each tumor are split into 80/20 train-test splits and classification accuracy is reported as the mean over 100 repetitions of splitting, training and testing.

Table 2. Average classification accuracy for the rLDA classifier using as input MLP, ECC or ECP. Data for each tumor are split into 80/20 train-test splits and classification accuracy is reported as the mean over 100 repetitions of splitting, training and testing.

Table 4. Mean test accuracy for the Gleason 3 vs Gleason 4 classification using ECCs or ECPs as input to an SVM classifier.