From Trees to Barcodes and Back Again II: Combinatorial and Probabilistic Aspects of a Topological Inverse Problem

In this paper we consider two aspects of the inverse problem of how to construct merge trees realizing a given barcode. Much of our investigation exploits a recently discovered connection between the symmetric group and barcodes in general position, based on the simple observation that death order is a permutation of birth order. The first important outcome of our study is a clear combinatorial distinction between the space of phylogenetic trees (as defined by Billera, Holmes and Vogtmann) and the space of merge trees. Generic BHV trees on $n+1$ leaf nodes fall into $(2n-1)!!$ distinct strata, but the analogous number for merge trees is equal to the number of maximal chains in the lattice of partitions, i.e., $(n+1)!n!2^{-n}$. The second aspect of our study is the derivation of precise formulas for the distribution of tree realization numbers (the number of merge trees realizing a given barcode) when we assume that barcodes are sampled using a uniform distribution on the symmetric group. We are able to characterize some of the higher moments of this distribution, thanks in part to a reformulation in terms of Dirichlet convolution. This characterization provides a type of null hypothesis, apparently different from the distributions observed in real neuron data and opens the door to doing more precise science.

Trees have a nearly universal presence as a structure for organizing relationships between objects. From hierarchical arrangements that are useful in the classification of species, to more immediate geometric applications in modeling neuron morphology [14,15,16], trees have proved to be an indispensable tool. However, as is natural for such a universal concept, subtle variations introduce important differences that are not always commented on. In this paper, we are interested in comparing the notion of merge trees, an important tool in topological data analysis (TDA), with that of metric phylogenetic trees, which has gained tremendous traction since its formalization by Billera, Holmes and Vogtmann [1], along with their combinatorial variants.
Our interest in delineating these objects comes in part from the fact that both merge trees and metric phylogenetic trees have associated barcodes, which are topological invariants obtained from the persistent homology of a filtered space. Since their introduction, barcodes or persistence diagrams have become the standard topological summary used in TDA. Like all summaries, barcodes forget information about the space they are computed from. Thus, even when restricting to a specific set of topological spaces like trees, one may find that many different shapes give rise to the same barcode. Quantifying this failure of injectivity into a summary space is the realm of topological inverse problems. Understanding such problems is crucial for comparing different representations of objects arising in both pure mathematics and in data science.

High-Level Overview and Motivation.
In this paper we consider two aspects of the inverse problem of constructing merge trees realizing a given barcode, motivated by recent work in neuroscience. In particular, the tools developed in [14,15] have proven useful for the study of neuron morphologies [16], which can be modeled by rooted trees, i.e., acyclic binary graphs with a distinguished vertex called the root (which corresponds to the neuron's soma), embedded in $\mathbb{R}^3$. In the terminology of this paper, this structure is most faithfully represented by merge trees.
In [15] the authors introduced the Topological Morphology Descriptor (TMD), an algorithm that returns a barcode from a tree, keeping track of the length of each branch with respect to a given filtration, but forgetting the adjacency relations between the branches. In this article we expand this investigation and systematically study the inverse problem from a combinatorial point of view. We hope that understanding this relation will provide insight into the complex structures of neurons; see Figure 1 for a schematic. The general approach to the merge tree-to-barcode inverse problem is as follows. Any barcode can be realized by finitely many trees, the number of which is called the tree realization number (TRN) or simply the realization number of the barcode. As observed in [4] and [17], the realization number of a barcode in general position can be computed by certain containment relations between its bars, viewed as intervals on the real line. One of the crucial observations of [17] is that these containment relations partition the set of barcodes (on $n$ bars) into equivalence classes, indexed by permutations in $S_n$, the symmetric group on $n$ letters. The representation of a barcode by a permutation not only gives a formula for the tree realization number (Lemma 3.4), but also opens the door to deeper connections between inverse problems in TDA, group theory, and combinatorics.
Besides quantifying the relative "descriptive power" of different summaries, in [17] it was shown that the realization number could be used as a statistic to distinguish distributions of trees. Figure 2 shows (log) realization numbers computed from different tree distributions, obtained by computing the realization number either from actual trees, such as neurons, or by randomly generating barcodes with specific properties. The datasets used were (i) real neurons (basal and apical dendrites, drawn in red and purple), (ii) random barcodes where the birth $b_i$ is picked, then the death $d_i$ is chosen to be larger than $b_i$, and (iii) random barcodes with separated births and deaths so that the induced distribution on the symmetric group is uniform (see Section 4.1). The results are striking: barcodes computed from neurons exhibit a very different distribution than barcodes with uniformly drawn permutation type; see Figure 2 for a graphical comparison.
In this paper, we study the realization numbers computed from barcodes with uniform permutation type (i.e., drawn from the uniform distribution on the symmetric group). We view this as essential for the realization number to be used for applications, as it establishes a fundamental null hypothesis for the invariant. Our tools are mainly combinatorial, leading us to discover unexpected connections between the inverse problem and other classical combinatorial objects. One of our main theorems (Theorem 3.19) casts the classic result of Erdős that counts the number of maximal chains in the lattice of set partitions in a new, merge-tree light. It was this result that not only permitted an easy calculation of the expected tree realization number, but also further established the fundamental differences between combinatorial classes of merge trees and phylogenetic trees. We now provide a more detailed overview of the paper.

FIGURE 2. The log of the tree-realization number for barcodes with varying numbers of bars for TMD of basal dendrites (red), apical dendrites (purple) in comparison with "random" barcodes as defined in Section 4.1 (green), barcodes with separated births and deaths such that the distribution induced on the symmetric group is uniform (blue, see Section 4.1 and Proposition 4.4), and the maximum tree-realization number ($n!$ for $n+1$ bars) (black).

Detailed Overview.
After the introduction, we start in earnest by reviewing the basic properties of trees and barcodes in Section 2. The basic graph-theoretic notion of a tree is reviewed in Definition 2.1, as are the notions of labelling and isomorphism. Labellings offer one important way of distinguishing merge trees (Definition 2.2) and metric phylogenetic trees (Definition 2.10), but Proposition 2.12 provides a more carefully stated distinction between the notions of BHV space, labelled merge tree space, and merge tree space. In this first subsection, combinatorial notions of merge trees and phylogenetic trees are also introduced. The pertinence of these combinatorial notions becomes evident after we introduce barcodes in Section 2.2, which allows us to review the inverse problem for merge trees in Section 2.3, where the combinatorial (permutation) type of the barcode is all that matters (see Section 2.4).
Section 3 marks the beginning of this paper's contribution to the literature. In Section 3.1, we formalize the observation of [17] that the tree realization number (TRN) is a function of the symmetric group, by expressing the TRN in terms of the left-inversion vector associated to a permutation. We take a minor detour in Section 3.2 to observe that the combinatorial equivalence class of each barcode is convex (Lemma 3.6), which is of use later when we choose certain standard forms for barcodes (Definition 3.9) and merge trees (Definition 3.16). We continue the algebraic analysis of the TRN in Section 3.3, where we prove that when the symmetric group is equipped with a certain partial order (Definition 3.10), the TRN is an order-preserving map. After proving that every pair of combinatorially equivalent merge trees can be connected by a line of merge trees (Lemma 3.15), we show in Lemma 3.17 that the sum of the tree realization numbers is equal to the total number of combinatorial types of merge trees. Theorem 3.19 in turn states that this number is equal to the number of maximal chains in the lattice of partitions (Definition 3.18), which is $(n+1)!\,n!\,2^{-n}$. This result provides a stark combinatorial contrast with the well-known fact that there are $(2n-1)!!$ types of labelled binary trees on $n+1$ leaf nodes [10]. Section 3.5 explores this contrast in greater depth by making quantitative the observation that whereas merge trees fiber over the symmetric group in a nice way, phylogenetic trees do not.
Section 4 finally delivers closed-form formulas for some of the trend lines in Figure 2. We cover briefly two methods to generate random barcodes in Section 4.1, before characterizing the distribution of tree realization numbers (when sampled uniformly on the symmetric group) in terms of Dirichlet convolution in Theorem 4.1. The paper concludes with Proposition 4.8, which uses the left-inversion vector representation of the TRN to give a closed formula for the expected log realization number.

Related Work.
This paper touches on many classical concepts related to trees and combinatorics, so providing a complete list of related work is impossible. However, the literature on inverse problems for TDA can be reviewed briefly here.
The concept of a geometric realization of a persistence module was considered in [19] in order to prove a universality result for the interleaving distance. In [12] the authors initiated an algorithmic study of how to find a point cloud that realizes a given persistence diagram. While these articles are concerned with finding single realizations of persistent signatures, the present article focuses on the study of the entire pre-image of the persistent homology pipeline.
In the same vein, there is [4], which focused on the setting of functions on the interval and their associated merge trees. Some of the results there were independently rediscovered and extended in [17], which inspired the present collaboration. Both [7] and [20] are more recent articles that investigate the fiber of the persistence map in settings that are different from ours.
We note that the study of the (non-) injectivity of certain topological transforms is also an aspect of topological inverse problems, see [23,13,6,21,25] for a sampling of these articles and [24] for a recent survey. Better understanding the precise failure of injectivity of certain TDA invariants led to the development of enriched topological summaries (ETS) that remediate these failures, opening a promising line of research; see [2] and [5] for some examples of these ETS. Section 3.3 of this paper explores the relationship between the Bruhat order on the symmetric group and barcode equivalence classes. A similar connection was observed in [26] in a different context.

BACKGROUND ON TREES AND BARCODES
In this section, we assume basic familiarity with persistent homology in degree 0, even though it is not necessary to understand persistence for most of these definitions. For a more algorithmic review of the topic in the case of trees, see [17]. We begin by reviewing the necessary background and combinatorial results from [4] and [17]. Most of this section reviews prior work, though Proposition 2.12 provides a novel comparison of merge trees and phylogenetic trees and foreshadows results later in the paper.

Trees, Merge Trees and Phylogenetic Trees.
There are many notions of trees in mathematics and the sciences. We review a few of these here and explain their differences. We start with the simplest definition, that of a combinatorial tree.
Definition 2.1. A combinatorial tree $T$ is a connected, acyclic, binary graph. It is finite if the number of vertices is finite. A rooted tree is a combinatorial tree with a distinguished vertex of degree 1 called the root. Non-root vertices of degree 1 are called leaves.
A labelling of a combinatorial tree $T$ is a bijective map from its set of vertices $V(T)$ to a set $S$ of labels. A labelling is ordered if $S$ is a subset of the natural numbers $\mathbb{N}$. An ordered labelling of a tree with $n$ vertices gives rise to an $n \times n$ adjacency matrix, of which the $(i, j)$-coefficient is 1 if there is an edge between the vertices labelled $i$ and $j$ and is 0 otherwise.
Two combinatorial trees $T$ and $T'$ are isomorphic if there is a bijective map $T \to T'$ that sends vertices to vertices in an adjacency-preserving way: if two vertices in $T$ are connected by an edge, then so are their images. Equivalently, $T$ and $T'$ are isomorphic if there exist ordered labellings of both with respect to which their adjacency matrices are identical.
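To illustrate the adjacency-matrix characterization of isomorphism, here is a minimal Python sketch. It assumes that vertices carry the ordered labels 0, ..., n-1 (a convention chosen here only for indexing) and it tries every relabelling by brute force, so it is practical only for small trees. The function names are illustrative and do not come from the paper.

```python
from itertools import permutations


def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix of a graph on vertices 0..n-1."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def are_isomorphic(n, edges_a, edges_b):
    """Check whether some relabelling makes the two adjacency matrices equal."""
    a = adjacency_matrix(n, edges_a)
    b = adjacency_matrix(n, edges_b)
    for p in permutations(range(n)):
        # p is a candidate bijection on vertices; compare entries pairwise.
        if all(a[i][j] == b[p[i]][p[j]] for i in range(n) for j in range(n)):
            return True
    return False


# A tree with one branching point (vertex 0) and three degree-1 vertices,
# and the same shape with the branching point relabelled as vertex 3.
star = [(0, 1), (0, 2), (0, 3)]
relabelled_star = [(3, 0), (3, 1), (3, 2)]
path = [(0, 1), (1, 2), (2, 3)]

print(are_isomorphic(4, star, relabelled_star))  # True
print(are_isomorphic(4, star, path))             # False
```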
In this paper, we assume all trees are finite. Moreover, we assume that there are no vertices of degree 2, that is, each vertex is either a bifurcation or branching point, i.e., a vertex of degree 3, or a termination, i.e., a vertex of degree 1, such as the leaf nodes or the root.
When rooted trees are considered, there is a natural way to induce an orientation on the edges of the tree: for each vertex v, there is a unique path from v to the root r. Every edge of the tree is oriented from the vertex further from r to the closer one (with respect to the graph path distance). A vertex v of T is a parent of a vertex w if there is a directed edge from w to v; the vertex w is then a child of v. Each vertex of T has a unique parent, except for the root r, which has no parent at all. Note that a finite combinatorial tree T is fully specified by its set of vertices, equipped with the partial order specified by the "is a parent of" relation. The language of "parents" and "children" obviously comes from studying ancestral relations for people (as in family trees) and species (as in phylogenetic trees). There are also situations where the parent-child relation is determined in part by a notion of "height," which is how merge trees are defined.

Definition 2.2.
A merge tree is a rooted combinatorial tree $T$, together with a function on the vertices $h : V(T) \to \mathbb{R} \cup \{\infty\}$, called a height function, that satisfies two properties.
(1) If $v$ is the parent of $w$, then $h(v) \ge h(w)$.
(2) If $r$ is the root node, then $h(r) = \infty$.
Two merge trees $(T, h)$ and $(T', h')$ are isomorphic if there is a graph isomorphism $\phi : T \to T'$ that preserves heights, i.e., $h = h' \circ \phi$. A generic merge tree is a merge tree $(T, h)$ such that the height function $h$ is injective. We always assume our merge trees are generic, unless otherwise indicated.
Remark 2.3. The height conditions above have the effect of placing the root node higher than the leaf nodes, contrary to how trees appear in nature. To honor the natural orientation and size of trees in nature, we draw our merge trees with the opposite convention, so that the root is lower than the leaves and so that $h(r) = \infty$ is represented with a finite value $N$.
... $\in \pi_0(f^{-1}(-\infty, t])$. Since the projection map from $\Gamma^+$ onto the second coordinate is constant on equivalence classes, this projection map factors to define the height function. Under reasonable tameness conditions, the quotient space is homeomorphic to the geometric realization of a combinatorial tree, where vertices correspond to connected components of "critical" points.
Example 2.5. A typical example of a merge tree is one arising from measuring height on an embedded manifold $X \subseteq \mathbb{R}^n$. Here "height" can be thought of as the scalar product with a specified unit vector. Figure 3 shows a simple example of a topological space and the corresponding merge tree.
There is a natural ordered labelling on the vertices of a generic merge tree $(T, h)$, inherited from the function $h$, by ordering the vertices according to their $h$-value: the leaf node with the lowest $h$-value is labelled 0, and the remaining nodes are labelled thereafter in the order in which they appear. We call the labels on the leaves the birth labels and the ones on the internal vertices the death labels, for reasons that will become clear later in the paper when we review persistent homology.
We are now in a position to state the first novel definition of the paper. Recall that two graphs are isomorphic if they admit ordered labellings making their adjacency matrices the same. A merge tree includes the additional data of heights of each node. By focusing separately on the order of births and the order of deaths, along with adjacency data, we have a more flexible notion of a merge tree.
Definition 2.6. Two generic merge trees $(T, h)$ and $(T', h')$ are combinatorially equivalent if they are isomorphic as graphs via a graph isomorphism preserving the orders of births and of deaths, respectively. In more detail, $(T, h)$ and $(T', h')$ are combinatorially equivalent if there exists a graph isomorphism $\phi : T \to T'$ such that the following conditions hold.
(1) For every pair of leaf (birth) nodes $v_i$ and $v_j$ in $T$, we have $h(v_i) < h(v_j)$ if and only if $h'(\phi(v_i)) < h'(\phi(v_j))$.
(2) For every pair of internal (death) nodes $v_i$ and $v_j$ in $T$, we have $h(v_i) < h(v_j)$ if and only if $h'(\phi(v_i)) < h'(\phi(v_j))$.
We note that these two conditions specify two different sets for the logical quantifier and that the total order on vertices need not be preserved; see Figure 4 for an example.
Remark 2.7. Note that combinatorial equivalence classes of merge trees are simply combinatorial trees equipped with a labelling of the leaves and a labelling of the internal nodes. We call such a tree a combinatorial merge tree.
... such that $h' = h + \Delta$ for some real number $\Delta$. We say $(T', h')$ is a translation of $(T, h)$. A generic merge tree is combinatorially equivalent to any translation of itself. However, combinatorial equivalence detects relationships more general than translation; see Figure 4.
Example 2.9 (Sensitivity to Generators). Although the two merge trees in Figure 5 are isomorphic as graphs, the only possible graph isomorphism reverses the birth order, hence these generic merge trees are not combinatorially equivalent. Notice that the homology generator of the essential class (see Section 2.2) starts with the node labelled by 0 or A on the left hand side, while on the right hand side, it starts with the 0 or B label. This is sometimes called "instability" or "sensitivity" of generators in TDA. Together with Figure 4, these specify the three possible combinatorial equivalence classes of merge trees with three leaf nodes.
As mentioned earlier, most of the language concerning trees is inspired by the study of ancestral relationships. Although trees have been used for this purpose for centuries, a formal definition of a phylogenetic tree (and, more importantly, a clear coordinatization on the set of all phylogenetic trees) was given only somewhat recently in the landmark paper of Billera, Holmes and Vogtmann [1]. We review some of these definitions, modifying the terminology slightly for our purposes.
Definition 2.10. A metric phylogenetic tree is a rooted combinatorial tree $T$ endowed with (1) a labelling of the leaf nodes, and (2) a non-negative real number associated to every parent-child pair. The values assigned to each parent-child pair can be considered as weights on the graph edges. By contrast, a combinatorial phylogenetic tree is a rooted combinatorial tree with just a labelling of the leaf nodes. If we say phylogenetic tree without any modifier, we always mean a combinatorial phylogenetic tree.

Example 2.11.
Figure 7B shows all combinatorial classes of merge trees with four leaves and Figure 7C shows all combinatorial classes of phylogenetic trees with four leaves.

One of the key differences between metric phylogenetic trees and merge trees is that phylogenetic trees always have labelled leaf nodes, with labels independent of the lengths on the edges. This makes sense because BHV space, the set of all possible metric phylogenetic trees on $n$ leaf nodes, denoted $\mathrm{MPT}_n$, documents all possible evolutionary relationships among $n$ fixed species. The labels matter because the involved species matter.

FIGURE 5. Two generic merge trees that are isomorphic as graphs. When they are regarded as phylogenetic trees we fix alphabetical ('ABC') names for the leaf nodes, as if the nodes represented species that went extinct at different times. With this labelling they are considered close in the metric defined by [1]. When these are regarded as merge trees they are naturally unlabelled and are close in the interleaving distance [22], but if we use birth order ('012') to label the leaf nodes and regard them as phylogenetic trees then they are far apart; see Proposition 2.12.

On the other hand, the set of all merge trees with $n$ leaf nodes, written $\mathrm{MT}_n$, consists of isomorphism classes of merge trees; see Definition 2.2. We consider also the set $\mathrm{LMT}_n$ of labelled merge trees with $n$ leaves, where the labelling is arbitrary (see Definition 2.1). Let $I : \mathrm{LMT}_n \to \mathrm{MT}_n$ denote the map that sends a labelled merge tree to its isomorphism class. We describe the relationship between these two types of tree spaces in the following proposition.
Proposition 2.12. For every $\Delta \in \mathbb{R}$, there is an injective map $H_\Delta : \mathrm{MPT}_n \to \mathrm{LMT}_n$ from the set of metric phylogenetic trees with $n$ leaves to the set of labelled merge trees with $n$ leaves, such that the composite $I \circ H_\Delta$ has a fiber of cardinality $n!$ over generic merge trees, corresponding to permutations of the labels on the leaf nodes. Moreover, if $\Delta \ge 0$, there is a natural map $T_\Delta : \mathrm{MT}_{n,\mathrm{generic}} \to \mathrm{MPT}_n$ that sends a generic merge tree to a metric phylogenetic tree that is labelled by birth order and where the distance from the root node to its child is $\Delta$.
Proof. Given a metric phylogenetic structure on a rooted tree $T$, we can define a height function $h$ on $T$ as follows. Every node $v$ that is not the root node $r$ is assigned the function value $h(v) := \Delta - d(r, v)$, where $d(r, v)$ is the sum of the weights of the edges along the unique path connecting $r$ to $v$. This defines the map $H_\Delta$ in the statement of the proposition. As explained earlier, every generic merge tree admits a canonical ordering of its leaf nodes by height order. If two generic labelled merge trees in the image of $H_\Delta$ are isomorphic as merge trees, then there is a unique permutation of the $n$ leaf labels taking one labelling to the other. This proves the second statement.
Finally, the map $T_\Delta$ sends a generic unlabelled merge tree $(T, h)$ to the metric phylogenetic structure on $T$ that has labels given by birth order and where the weight on an edge is given by the difference in heights of its two vertices. The distance from the root node to its child is given by $\Delta$.
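The height function $h(v) = \Delta - d(r, v)$ used to define $H_\Delta$ in the proof can be sketched in Python as follows. The dictionary-based encoding of the tree (`children`, `weight`) and the function name are assumptions made for illustration only; they are not notation from the paper.

```python
import math


def height_from_weights(children, weight, root, delta):
    """Assign h(v) = delta - d(root, v) to every non-root vertex.

    children: dict mapping a vertex to the list of its children
    weight:   dict mapping a (parent, child) pair to a non-negative number
    """
    height = {root: math.inf}          # the root is placed at infinity
    stack = [(root, 0.0)]              # (vertex, distance from the root)
    while stack:
        vertex, dist = stack.pop()
        for child in children.get(vertex, []):
            child_dist = dist + weight[(vertex, child)]
            height[child] = delta - child_dist
            stack.append((child, child_dist))
    return height


# Hypothetical metric phylogenetic tree with leaves A, B, C.
children = {"r": ["a"], "a": ["A", "b"], "b": ["B", "C"]}
weight = {("r", "a"): 1.0, ("a", "A"): 3.5, ("a", "b"): 1.0,
          ("b", "B"): 1.0, ("b", "C"): 2.0}

h = height_from_weights(children, weight, "r", delta=0.0)
print(h["A"], h["B"], h["C"])  # -4.5 -3.0 -4.0
```

Because the edge weights are non-negative, every parent receives a height at least as large as that of its children, which is property (1) of a height function.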
Remark 2.13. Each of the three sets above can be equipped with topologies. In [1], the space of phylogenetic trees is topologized as a CAT(0) space where each orthant records a distinct split topology. Both labelled merge trees and merge trees can be topologized using versions of the interleaving distance [27]. Unfortunately, the map $T_\Delta$ is discontinuous with respect to these topologies, as can be seen from Figure 5.
Proposition 2.12 shows that, despite their apparent similarity, there are significant differences between metric phylogenetic trees and merge trees. Indeed neither of the maps above is a bijection. However, if one quotients the set of labelled merge trees by translations, then the map induced by $H_\Delta$ should be a bijection; alternatively one could modify the definition of merge trees so that the root node has a fixed height $N$, as in the drawing convention of Remark 2.3.
Although the proposition and remark above identify certain differences and similarities between metric phylogenetic trees and merge trees, for this paper the most important distinction is in terms of combinatorial type. In this respect, merge trees and phylogenetic trees are distinguished by the explicit ordering of birth and death nodes. This observation will lead to different formulas for the number of top-dimensional strata in the set of phylogenetic trees $\mathrm{PT}_n$, which is $(2n-3)!!$, and in $\mathrm{MT}_n$, which is $(n-1)!\,n!\,2^{-n+1}$. For now, however, the reader is encouraged to consult Table 1 and Figure 6 for two convenient summaries of the similarities and differences between combinatorial trees, merge trees, (combinatorial) phylogenetic trees, and barcodes.
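To get a feeling for how quickly the two counts diverge, the following short Python sketch tabulates $(2n-3)!!$ and $(n-1)!\,n!\,2^{-n+1}$ for small numbers of leaves $n$. The function names are ours and purely illustrative.

```python
from math import factorial


def double_factorial(m):
    """Return m!! for an odd integer m >= -1, with (-1)!! = 1!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def phylogenetic_strata(n_leaves):
    """Number (2n - 3)!! of combinatorial phylogenetic trees on n leaves."""
    return double_factorial(2 * n_leaves - 3)


def merge_tree_strata(n_leaves):
    """Number (n - 1)! n! 2^(-n+1) of combinatorial merge trees on n leaves."""
    return factorial(n_leaves - 1) * factorial(n_leaves) // 2 ** (n_leaves - 1)


for n in range(2, 7):
    print(n, phylogenetic_strata(n), merge_tree_strata(n))
# 2 1 1
# 3 3 3
# 4 15 18
# 5 105 180
# 6 945 2700
```

The two counts agree for two and three leaves and first differ at four leaves (15 versus 18).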

Barcodes.
We now recall the notions of persistent homology and barcodes. For reasons of brevity, we choose to use the categorical definition of persistent homology, but the reader who would like a more algorithmic version for the case of merge trees can read [17] or the summary in Example 2.17.

Definition 2.14.
A persistence module is a functor $F : (\mathbb{R}, \le) \to \mathrm{Vect}_k$, where $(\mathbb{R}, \le)$ is the real line with its total ordering $\le$ and $\mathrm{Vect}_k$ is the category of vector spaces over a field $k$. An interval module is a persistence module $k_I$ that is rank 1 on an interval $I \subseteq \mathbb{R}$ with identity maps internal to $I$ and 0 elsewhere.

FIGURE 6. Summary of the different notions studied in this paper and their relations, as expressed in part by Proposition 2.12. One can turn a metric phylogenetic tree (with labels A, B, C in red) into a labelled merge tree. Generic merge trees can be turned into metric phylogenetic trees by labelling according to birth order (labels 0, 1, 2 in red), but this introduces a discontinuity.

A function $f : X \to \mathbb{R}$ is said to be tame if the homology groups of the sublevel sets $\{f_t\}_{t \in \mathbb{R}} = \{f^{-1}((-\infty, t])\}_{t \in \mathbb{R}}$ have finite rank and change at a finite number of points.
Tame functions $f$ have finitely many critical values $a_0, \dots, a_n$, and the sublevel sets $f_{t_1}, f_{t_2}$ are homeomorphic when $t_1, t_2 \in (a_i, a_{i+1})$ for $i = 0, \dots, n-1$. By [3] we have the following decomposition theorem: a persistence module whose vector spaces are all finite-dimensional decomposes as a direct sum of interval modules, and the resulting multiset of intervals, with endpoints written $b_j$ (birth) and $d_j$ (death), is called the barcode of the module. In this paper, we represent barcodes graphically by drawing the interval between $b_j$ and $d_j$ for each index $j$. Sometimes barcodes are represented by persistence diagrams, i.e., sets of points in $\mathbb{R}^2$ where the $x$-coordinate indicates birth time and the $y$-coordinate death time. Note that $x \le y$ always holds in this representation.
Regarding $T$ as a one-dimensional simplicial complex, we can linearly interpolate the height function from the vertices to the entire tree. The barcode of the merge tree $(T, h)$ is the barcode corresponding to the persistence module $F$ given by the degree-0 homology of the sublevel sets of the interpolated height function. Although the barcode of $F$ is guaranteed to exist by virtue of Crawley-Boevey's theorem, there is a more direct way of constructing the barcode in the special case of merge trees, called the Elder rule [4].
The Elder rule provides a concrete way to compute the barcode of a merge tree via decomposition into branches, i.e., each bar in the barcode corresponds either to a single edge or a list of adjacent edges in the merge tree. According to the Elder rule, each leaf node marks the beginning of a bar in the barcode at the height of the leaf node. If two leaf nodes $l_i$ and $l_j$ such that $h(l_i) > h(l_j)$ share a parent at vertex $k$, the branch that was born "earlier" at $l_j$ survives as it is "elder", and the branch born at $l_i$ dies, creating a bar $[h(l_i), h(k))$. Under this rule, every bar begins at a leaf node and ends at an internal node with the sole exception of the bar that is born at the leaf node with the lowest height, which is paired with infinity. However, in our figures, in keeping with Remark 2.3, the lowest leaf node will be paired with $N = h(r)$, which is the height of the root node when viewed as an embedded finite tree. A simple example is illustrated in Figure 3.
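The Elder rule can be sketched in a few lines of Python. The version below is a minimal illustration under our own assumptions: the tree is encoded by a hypothetical `children` dictionary and a `height` dictionary, and the rule is applied recursively, so that at each internal vertex the branch containing the lowest leaf survives and every other incoming branch dies there.

```python
import math


def elder_rule_barcode(children, height, root):
    """Return the barcode of a generic merge tree as a sorted list of bars.

    children: dict mapping a vertex to the list of its children
              (leaves are absent or map to an empty list)
    height:   dict mapping each non-root vertex to its height
    """
    bars = []

    def oldest_birth(vertex):
        kids = children.get(vertex, [])
        if not kids:
            # A leaf: a bar is born at its height.
            return height[vertex]
        births = sorted(oldest_birth(child) for child in kids)
        # The eldest branch survives; every younger branch dies here.
        for birth in births[1:]:
            bars.append((birth, height[vertex]))
        return births[0]

    # The lowest leaf is paired with infinity (the essential bar).
    bars.append((oldest_birth(root), math.inf))
    return sorted(bars)


# Hypothetical merge tree: leaves at heights 0, 1, 2; merges at heights 3, 4.
children = {"r": ["b"], "b": ["l0", "a"], "a": ["l1", "l2"]}
height = {"l0": 0.0, "l1": 1.0, "l2": 2.0, "a": 3.0, "b": 4.0}

print(elder_rule_barcode(children, height, "r"))
# [(0.0, inf), (1.0, 4.0), (2.0, 3.0)]
```

In the example, the leaf at height 2 dies at the merge of height 3, the leaf at height 1 dies at the merge of height 4, and the lowest leaf gives the essential bar.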
Although in general the barcode can be a true multiset, in this article we are concerned primarily with barcodes that are actually sets, leading us to formulate the following definition.
We refer to the half-infinite bar as essential.
Example 2.19. The barcode of a generic merge tree is always strict.
We summarise the different characteristics of combinatorial trees, merge trees, phylogenetic trees, and barcodes in Table 1.

Realizations of Barcodes.
As described in the previous section, every merge tree has an associated barcode. It is natural to ask whether the map from merge trees to barcodes determined by the Elder rule is injective, but it is not hard to see that it is not. A somewhat more surprising result, proven independently in [4] and [17], is that the failure of injectivity of the Elder rule map can be quantified for generic barcodes. More precisely, we say that a merge tree $(T, h)$ realizes a barcode $B$ if the barcode of $(T, h)$ is $B$. The tree realization number, $R(B)$, of a strict barcode $B$ is the number of combinatorial trees $T$ admitting a height function $h$ such that $(T, h)$ realizes $B$.

FIGURE 7. A. Combinatorial types of rooted trees with three leaves and the corresponding adjacency matrices. B. Cayley graph generated by the two adjacent transpositions of $S_3$ and the corresponding barcodes, together with all the combinatorial types of trees that realize a barcode. Colored letters correspond to different types of merge trees that are the same as phylogenetic trees (indistinguishable trees), illustrating the result of Section 3.5. C. Rooted phylogenetic trees with three leaves. We represent these phylogenetic trees organised by the combinatorial types of barcodes they would have if they had death labels as well. The three pairs of trees within colored squares correspond to the indistinguishable trees defined in Section 3.5: the internal nodes are incomparable, so they can have two different death values that lead to different merge trees. In phylogenetic trees, the label order does matter: for instance, in the first column, all the trees are of the same combinatorial type A but correspond to different phylogenetic trees.

To go from the space of phylogenetic trees to the space of combinatorial trees, one forgets the labels and considers the adjacencies only, see Figure 6.

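                                      Combinatorial trees   Merge trees   Phylogenetic trees   Barcodes
Height function                       -                     X             -                    -
Label on leaves (births)              -                     X*            X                    X*
Label on internal vertices (deaths)   -                     X*            -                    X*
Adjacency                             X                     X             X                    -

Proposition 2.20. Let $B$ be a strict barcode with non-essential bars $I_1, \dots, I_n$. Then the tree realization number of $B$ is $R(B) = \prod_{j=1}^{n} \mu(I_j)$, where $\mu(I_j) = |\{I_k \mid I_j \subset I_k\}|$. The value $\mu(I_j)$, called the index of bar $I_j$, is the number of bars of $B$ (including the infinite bar) that contain $I_j$.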
Although the proof of this theorem, by induction on the number of bars, can be found in [4] and [17], we provide a brief sketch for the sake of intuition. Start by setting $T_0 = I_0 = [b_0, \infty)$. Since the merge tree $T$ is connected, we can recursively attach bars by death time, first to $T_0$ and then in the $j$th step to $T_j$ to get $T_{j+1}$, according to the Elder rule. Each possible choice of attachment then gives a particular merge tree isomorphism class. See Figure 8 for a graphical representation of this process.
Example 2.21. Consider the strict barcode $B = \{[0, \infty), [1,8), [2,7), [3,6), [4,5)\}$. According to the formula in Proposition 2.20, $R(B) = 1 \cdot 2 \cdot 3 \cdot 4 = 24$. In general, if $B$ is a strict barcode with $n$ finite-length half-open intervals such that $I_j \subset I_k$ for all $k < j$, then $R(B) = n!$.
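The realization number can be computed directly from containment counts. The Python sketch below assumes the product formula $R(B) = \prod_j \mu(I_j)$ over the non-essential bars, as in [4] and [17], and a barcode in general position encoded as a list of `(birth, death)` pairs with the essential bar having death `math.inf`. The function name is illustrative only.

```python
import math


def realization_number(bars):
    """Tree realization number of a strict barcode via bar indices.

    bars: list of (birth, death) pairs, exactly one of which is essential
          (death == math.inf); births and deaths are assumed distinct.
    """
    product = 1
    for birth, death in bars:
        if math.isinf(death):
            continue  # the essential bar does not contribute a factor
        # Index of the bar: number of bars strictly containing it.
        index = sum(1 for b, d in bars if b < birth and death < d)
        product *= index
    return product


nested = [(0, math.inf), (1, 8), (2, 7), (3, 6), (4, 5)]
print(realization_number(nested))  # 24

staggered = [(0, math.inf), (1, 3), (2, 4)]
print(realization_number(staggered))  # 1
```

The nested barcode of Example 2.21 gives indices 1, 2, 3, 4 and hence $4! = 24$, whereas a barcode whose finite bars are pairwise non-nested has every index equal to 1.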

Relations to the Symmetric Group.
We begin by recalling the map from the set of strict barcodes with n nonessential bars to the symmetric group on n letters, which was introduced in [17].  also use cycle notation, which describes the permutation in terms of its orbits and uses parentheses; fixed points are omitted in this notation. For our example, σ = [132] can also be written as the elementary transposition (23). See Figure 9.
Let $B = \{[b_i, d_i)\}_{i=0}^{n}$ be a strict barcode such that $b_1 < \cdots < b_n$. The permutation type $\sigma$ of the barcode $B$ is the automorphism of $\{1, \ldots, n\}$ that maps birth order to death order. In other words, if we re-index the death times using the natural order on $\mathbb{R}$ so that $d_{i_1} < \cdots < d_{i_n}$, the permutation $\sigma$ is $[i_1 i_2 \ldots i_n]$. In terms of the Elder rule, this associated permutation comes from tracking which birth is paired with which death.
Notice that the essential bar $[b_0, \infty)$ does not play a role in the permutation type, as it always contains all the other bars in a strict barcode.

FIGURE 9. Combinatorial equivalence classes of persistence diagrams with three non-essential points. The associated permutation $\sigma$ is written next to each diagram in both forms of notation: the image notation is in square brackets, i.e., $[\sigma(1)\sigma(2)\sigma(3)]$, and the cycle notation in parentheses. The arrows point in the direction of increasing left Bruhat order and exhibit $S_3$ as a poset. Notice that the permutation acts by switching death order.
The association of a permutation to each barcode defines an equivalence relation on the set of strict barcodes.

Definition 2.24. Let $B$ and $B'$ be two strict barcodes, each with $n$ non-essential bars. We say $B$ and $B'$ are combinatorially equivalent if they have the same associated permutation.
We can now express the relation between barcodes and the symmetric group more concisely as follows. Let $\mathcal{B}_n$ denote the collection of strict barcodes with $n$ finite-length half-open (non-essential) bars. The map that associates to every strict barcode its permutation type defines a bijection between combinatorial equivalence classes of strict barcodes and elements of the symmetric group, i.e., $\mathcal{B}_n / \sim \;\cong\; S_n$.

Example 2.25. The space $\mathcal{B}_3 / \sim$ and the corresponding elements of $S_3$ of the bijection given above are displayed in Figure 7C.
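The passage from a barcode to its combinatorial class can be sketched in a few lines. This is only an illustration: the function name `permutation_type` is hypothetical, and the sketch assumes the convention that the $i$-th entry of the output records the death rank of the $i$-th bar by birth; the inverse convention works equally well provided it is used consistently.

```python
def permutation_type(finite_bars):
    """Image notation of the permutation type of a strict barcode.

    `finite_bars` lists the non-essential bars as (birth, death) pairs.
    Entry i of the result is the rank of the death time of the bar with
    the i-th smallest birth time (an assumed convention, see lead-in).
    """
    by_birth = sorted(finite_bars)                  # order bars by birth time
    deaths = sorted(d for _, d in by_birth)         # death times in increasing order
    return [deaths.index(d) + 1 for _, d in by_birth]

# Two barcodes with different coordinates but the same combinatorial type.
B1 = [(1, 8), (2, 7), (3, 6), (4, 5)]
B2 = [(0.5, 40), (3, 30), (9, 20), (10, 11)]
print(permutation_type(B1), permutation_type(B2))
```

Both barcodes above are fully nested, so they land in the same equivalence class even though their birth and death values differ.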

Remark 2.26.
As was done in Remark 2.7 for combinatorial merge trees, one can identify the combinatorial equivalence classes of barcodes with elements of the symmetric group. What will be called a combinatorial barcode in this paper is just the corresponding permutation in S n .
We conclude this section by clarifying the relationship between the two notions of combinatorial equivalence that are pertinent to the tree realization problem.

Lemma 2.27. If two merge trees $(T, h)$ and $(T', h')$ are combinatorially equivalent, then their barcodes are combinatorially equivalent.

Proof. Since tree isomorphisms as defined in Definition 4 preserve both birth and death orders, we need to check only that if the Elder rule pairs the $i$-th birth node with the $j$-th death node in $T$, then the same holds for $T'$. This is obvious, however, because the unique sequence of edges connecting a pair of nodes in $T$ must be sent to the same sequence of edges connecting these nodes in $T'$, since the isomorphism $\phi$ is a graph isomorphism and therefore preserves adjacency relations.

Figure 10 illustrates the relationship between merge trees and their combinatorial equivalence classes and barcodes and their combinatorial equivalence classes, corresponding to permutations.

COMBINATORIAL AND ALGEBRAIC PERSPECTIVES ON THE REALIZATION NUMBER
Now that we have reviewed the basic notions of trees, merge trees, their barcodes, and prior results on the inverse problem detailed in [4] and [17], we are in a position to extend those results. The first observation of this section is that the tree realization number (TRN) of a barcode is simply the product of the entries of the left inversion vector for the permutation associated to a barcode. This is somewhat surprising, as the left inversion vector is a classical object of study, but typically authors study the sum of its entries rather than the product. This observation also allows us to characterize those barcodes that have a larger tree realization number in the language of geometric group theory: permutations that have longer word length in the left Bruhat order have higher TRN. Based on a convexity result for combinatorial equivalence classes of barcodes, we also provide a closed form expression for the sum of TRNs across all elements of the symmetric group, which is equal to the number of maximal chains in the lattice of partitions. This result is of use in the next section, when we consider probability distributions on the space of barcodes and calculate the expected tree realization number for the uniform distribution on the symmetric group.

The Realization Number and the Left Inversion Vector.
Careful inspection of the formula for the tree realization number in Proposition 2.20 reveals that the index of a bar $[b_i, d_i)$ in a barcode $B$ is given by the number of bars born before $b_i$ and that die after $d_i$. Thinking in terms of the permutation associated to a barcode, this index counts the number of "upsets" of birth-mapping-to-death order. More precisely, for a permutation $\sigma$ of $\{1, \ldots, n\}$, if $i < j$ and $\sigma(i) > \sigma(j)$, then either the pair of places $(i, j)$ or the pair of elements $(\sigma(i), \sigma(j))$ is called an inversion of $\sigma$: the usual order $i < j$ has been "upset" or inverted here. We now modify the usual notion of an inversion vector so that it is defined for strict barcodes and makes our theorem statements as tidy as possible.

FIGURE 10. The relationships between merge trees, combinatorial equivalence classes of merge trees, barcodes and combinatorial barcodes. Birth labels are indicated in red, and death labels in blue. The largest bar (corresponding to the essential class) is not taken into account in the combinatorial setting since it is there for every tree/barcode. Therefore we label it by 0.
Let $B = \{[b_i, d_i)\}_{i=0}^{n}$ be a strict barcode with $b_i < b_j$ for $i < j$. The left inversion vector of $B$ is the $n$-vector $l(B)$ whose $i$-th coordinate is
$$l_i(B) = |\{\, j \mid 0 \le j < i \text{ and } d_j > d_i \,\}|, \qquad 1 \le i \le n.$$
We note that for this formula the index $j = 0$ is used for computation although it is not given a position in the $n$-vector $l(B)$, since otherwise the vector would have length $n + 1$. When we calculate the left inversion vector of a permutation $\sigma$ associated to a barcode, we use this slightly modified definition in order to make sure that $l(\sigma) = l(B)$.

FIGURE 11. Persistence diagrams associated to the six elements of $S_3$, along with their inversion vector and tree realization number.

The permutation associated to this barcode is $\sigma = [3214]$ because the first non-essential feature dies third, the second feature dies second, the third feature dies first and the fourth feature dies fourth. Clearly, $l(\sigma) = (1, 2, 3, 1)$ as well.

Example 3.3.
For the left inversion vectors associated to the six elements of S 3 , along with their tree realization numbers, see Figure 11.
To define coordinates on the space of left inversion vectors, we use the totally ordered sets $[k] := \{1 < 2 < \cdots < k\}$ for $k$ a positive natural number. It is easy to see that the left inversion vector construction establishes a bijective correspondence between $S_n$ and the Cartesian product of sets of the above form, i.e., there is a bijection
$$l : S_n \xrightarrow{\;\cong\;} [1] \times [2] \times \cdots \times [n].$$
The next lemma, which is crucial for the rest of the paper, follows immediately from this observation. It was first established in [17], though not formulated explicitly in terms of the left inversion vector.

Lemma 3.4.
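If $B$ is a strict barcode with one essential bar $[b_0, \infty)$ and $n$ non-essential bars, then $R(B) = \prod_{i=1}^{n} l_i(B) = \prod_{i=1}^{n} l_i(\sigma(B))$, where $\sigma(B)$ denotes the permutation type of $B$.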
An immediate consequence of this lemma is that if $B$ and $B'$ are combinatorially equivalent barcodes, in the sense of Definition 2.24, then their realization numbers are the same. It follows that the tree realization number induces a function on the symmetric group, i.e., $R : S_n \to \mathbb{N}$. Before analyzing this function on the symmetric group, we identify some interesting properties of the set of barcodes under the combinatorial equivalence relation, to prepare our exploration of the combinatorics of the TRN in earnest in subsequent sections.
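To make the function on the symmetric group concrete, here is a small sketch that tabulates the left inversion vector and the realization number for every element of $S_3$. The function names are hypothetical, and permutations are assumed to be given in image notation with the $i$-th entry equal to the death rank of the $i$-th bar by birth.

```python
from itertools import permutations
from math import prod

def left_inversion_vector(sigma):
    """Modified left inversion vector of a permutation in image notation.

    Entry i counts the earlier-born bars that die later, plus 1 for the
    essential bar (the index j = 0 in the barcode definition).
    """
    return [1 + sum(1 for j in range(i) if sigma[j] > sigma[i])
            for i in range(len(sigma))]

def trn(sigma):
    """Tree realization number: product of the left inversion vector entries."""
    return prod(left_inversion_vector(sigma))

for sigma in permutations(range(1, 4)):
    print(sigma, left_inversion_vector(sigma), trn(sigma))
```

Applied to the permutation $[3214]$, the same function returns the vector $(1, 2, 3, 1)$ and hence a realization number of 6.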

Convexity of Combinatorial Equivalence Classes.
In this section we prove that combinatorial equivalence classes are convex in a certain sense: if two strict barcodes $B$ and $B'$ are of the same combinatorial type, then they can be connected by a "line segment" of barcodes, all of the same permutation type.
We prove first that the set B n admits the algebraic structure necessary to formulate a convexity result.

Lemma 3.5.
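(1) For all $\lambda \in \mathbb{R}_{>0}$ and $B = \{[b_i, d_i)\}_{i=0}^{n} \in \mathcal{B}_n$, the barcode $\lambda B := \{[\lambda b_i, \lambda d_i)\}_{i=0}^{n}$ is also a strict barcode.
(2) For all $B = \{[b_i, d_i)\}_{i=0}^{n}$ and $B' = \{[b'_i, d'_i)\}_{i=0}^{n}$ in $\mathcal{B}_n$, the sum $B + B' := \{[b_i + b'_i, d_i + d'_i)\}_{i=0}^{n}$ is also a barcode with distinct birth times, which is strict if $B$ and $B'$ have the same permutation type.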
Proof. The proof of (1) is trivial, since λ is assumed to be positive, whence multiplication by λ preserves the order of real numbers.
The only subtlety in the proof of (2) concerns distinct death times. If the permutation types of $B$ and $B'$ are different, it could happen that $d_i < d_j$ and $d'_i > d'_j$, but $d_i + d'_i = d_j + d'_j$, so that $B + B'$ would not be strict. If they have the same permutation type, then this cannot happen.

Lemma 3.6. For every $n$ and every $\sigma \in S_n$, the set of strict barcodes of permutation type $\sigma$ is convex, i.e., for $B$ and $B'$ of permutation type $\sigma$, the interval $\{tB + (1-t)B' \mid t \in [0, 1]\}$ consists of strict barcodes of permutation type $\sigma$.

Proof. Given the previous lemma, it remains only to prove that the permutation type of $tB + (1-t)B'$ is $\sigma$, which follows immediately from the observation that if $d_i < d_j$ and $d'_i < d'_j$, then $t d_i + (1-t) d'_i < t d_j + (1-t) d'_j$ for all $t \in [0, 1]$.

Remark 3.7. We can also formulate the lemma above as saying that there is a "straight-line path" from $B$ to $B'$, namely $\overline{BB'} : [0, 1] \to \mathcal{B}_n$, $t \mapsto B_t := (1-t)B + tB'$. It is not hard to show that this function is indeed continuous with respect to both the bottleneck metric and the Wasserstein metric on $\mathcal{B}_n$, but we choose not to do so here, to avoid introducing further definitions outside of the focus of this paper.
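The convexity statement can be checked numerically on a pair of barcodes of the same permutation type. The sketch below is illustrative only: the helper names and the two sample barcodes are assumptions, exact rational arithmetic is used to avoid spurious ties, and only the finite bars are tracked since the essential bar plays no role in the permutation type.

```python
from fractions import Fraction

def death_ranks(bars):
    """Death rank of each finite bar, listed in order of birth."""
    by_birth = sorted(bars)
    deaths = sorted(d for _, d in by_birth)
    return [deaths.index(d) + 1 for _, d in by_birth]

def convex_combination(B, B_prime, t):
    """Bar-by-bar combination t*B + (1-t)*B', matching bars by birth order."""
    pairs = zip(sorted(B), sorted(B_prime))
    return [(t * b + (1 - t) * bp, t * d + (1 - t) * dp)
            for (b, d), (bp, dp) in pairs]

B = [(1, 9), (2, 6), (3, 7)]             # hypothetical barcode
B_prime = [(0, 20), (5, 11), (6, 15)]    # same death ranks as B
steps = [Fraction(k, 10) for k in range(11)]
same_type = all(
    death_ranks(convex_combination(B, B_prime, t)) == death_ranks(B)
    for t in steps
)
print(death_ranks(B), same_type)
```

Sampling eleven values of $t$ is of course not a proof; it simply illustrates the inequality used in the argument above.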
It is interesting also to consider the path $\overline{BB'}$ when the barcodes $B$ and $B'$ are not of the same permutation type. As mentioned in the proof of Lemma 3.5, not every point of $\overline{BB'}$ is necessarily a strict barcode in this case, which allows the path to move from one permutation type to another. One can show that the smallest number of different classes that the path goes through is the length of the shortest path between the two corresponding permutations of $B$ and $B'$ on the Cayley graph defined using the generating set of elementary (neighboring) transpositions $\tau_i = (i, i + 1)$. This value is related to the Bruhat order, which we introduce in the next section. A fuller description would involve describing the space of barcodes in terms of a family of convex sets that fiber over the permutohedron. We leave this for future work.

Example 3.8. Figure 12 shows an example of the path described in the proof above, using the representation of barcodes as persistence diagrams. The path consists of the straight lines between the matched points of the diagrams. Note that the dotted lines indicating the births and deaths never cross for the same birth and death order, respectively, because the barcodes stay in the same permutation class at each step of the path. It is possible for $b_1$ to be greater than $b_2$, for example, but the relative order of births and deaths does not change.
Lemma 3.6 allows us to fix a standardized representative of each combinatorial barcode type, making the connection to the symmetric group explicit.
It is clear that B is strict and has permutation type σ. We sometimes write B(σ) for the standard barcode associated to σ. Lemma 3.6 implies that any strict barcode B of permutation type σ can be connected via a straight-line path to the barcode B(σ).

Tree Realization Number Preserves Bruhat Order.
It is interesting to study both the tree realization number from a combinatorial point of view via the symmetric group and the symmetric group from a "barcode" point of view via the realization number. To our knowledge, the product of the components of the left inversion vector is not a very commonly used statistic on symmetric groups, so we take this opportunity to study some of its properties.
Observe first that two adjacent permutations in the Cayley graph (i.e., two permutations that differ by left multiplication by one elementary transposition τ i = (i, i + 1)) never have the same realization number. This follows easily from the definition. As a consequence, the realization number is locally injective, although it is not globally injective, since barcodes of type (12) and type (23) have the same TRN. In this section we extend this local injectivity observation, proving that the TRN defines an order-preserving map from the symmetric group to the natural numbers, when the symmetric group is equipped with the appropriate Bruhat order.
Recall that the symmetric group is generated by elementary transpositions $\tau_i := (i, i + 1)$. This implies that any element of $S_n$ can be represented using a word made using the alphabet $A = \{(i, i + 1)\}_{i=1}^{n-1}$, although that representation need not be unique. A word representing a certain permutation is reduced if it is of minimal length. The length of a permutation is the minimal length of a word representing the permutation.

Definition 3.10 (Left Bruhat Order). The left Bruhat order is a partial order on $S_n$, specified as follows. If $\sigma, \sigma' \in S_n$, then $\sigma < \sigma'$ if the length of $\sigma$ is less than that of $\sigma'$, and there exist $\tau_{i_1}, \ldots, \tau_{i_k} \in A$ such that $\sigma' = \tau_{i_1} \cdots \tau_{i_k} \sigma$.

Example 3.11. In $S_3$ we note that $(123) > (23)$ under the left Bruhat order because $(123) = (12)(23)$, where we use cycle notation and where composition is read from right to left. In the left Bruhat order $(123)$ and $(12)$ are not comparable; see Figure 9.
The next lemma shows that the realization number increases with increasing left Bruhat order. We remark that this lemma can be viewed as a consequence of a classical result, which is mentioned in [8]: if $\sigma < \sigma'$, then the number of inversions in $\sigma'$ is greater than the number of inversions in $\sigma$.

Lemma 3.12. If $\sigma, \sigma' \in S_n$ are such that $\sigma < \sigma'$ in the left Bruhat order, then $R(\sigma) < R(\sigma')$.

Proof. Since $\sigma < \sigma'$, there exist $\tau_{i_1}, \ldots, \tau_{i_k} \in A$ such that $\sigma' = \tau_{i_1} \cdots \tau_{i_k} \sigma$. Suppose first that $k = 1$, so that $\sigma$ and $\sigma'$ are adjacent on the Cayley graph, i.e., $\sigma' = \tau_i \sigma$ for some $i$. By assumption, the length of $\sigma'$ is greater than that of $\sigma$. Translating Proposition 3.5 in [17] into the language of permutations, we deduce that $R(\sigma) < R(\sigma')$. The result now follows by induction on the number of transpositions $\tau_i$.

Example 3.13.
One can see the Cayley graph of $S_4$ in Figure 13. Notice that two permutations $\sigma, \sigma'$ satisfy $\sigma < \sigma'$ in the Bruhat order if and only if a shortest path from $\sigma'$ to the identity contains a shortest path from $\sigma$ to the identity. The realization number increases along such paths.
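The monotonicity along edges of the Cayley graph can be verified exhaustively for small $n$. The following sketch makes two assumptions that should be kept in mind: permutations are in image notation with entries read as death ranks, so that left multiplication by $\tau_k$ swaps the values $k$ and $k+1$, and the length of a permutation is computed as its number of inversions (a standard fact for the generating set of adjacent transpositions). All function names are hypothetical.

```python
from itertools import permutations
from math import prod

def trn(sigma):
    """Product of the modified left inversion vector entries."""
    return prod(1 + sum(1 for j in range(i) if sigma[j] > sigma[i])
                for i in range(len(sigma)))

def inversions(sigma):
    """Number of inversions, which equals the word length of sigma."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if sigma[i] > sigma[j])

def left_multiply(k, sigma):
    """tau_k * sigma in image notation: swap the values k and k + 1."""
    swap = {k: k + 1, k + 1: k}
    return tuple(swap.get(v, v) for v in sigma)

def trn_increases_along_edges(n):
    """Check R(sigma) < R(tau_k sigma) whenever the length goes up."""
    for sigma in permutations(range(1, n + 1)):
        for k in range(1, n):
            upper = left_multiply(k, sigma)
            if inversions(upper) > inversions(sigma):
                if not trn(upper) > trn(sigma):
                    return False
    return True

print(trn_increases_along_edges(3), trn_increases_along_edges(4))
```

Since every relation in the left order is a composite of such edges, checking the edges is enough to confirm the order-preserving property on these small groups.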

Remark 3.14.
It is interesting to consider the TRN as a discrete Morse function [11] on the order complex of S n . We note that the TRN has a unique max and min on S n , which appear to be the only critical points, recovering the known result, e.g. [8], that the order complex of S n is homotopy equivalent to a sphere.

The Sum of Realization Numbers and Chains in the Lattice of Partitions.
Given that the tree realization number on the set of strict barcodes induces a function $R : S_n \to \mathbb{N}$, it is natural to study the sum $\sum_{\sigma \in S_n} R(\sigma)$.
As we show in this section, this sum is equal to the number of combinatorial classes of merge trees (Definition 2.6) and provides another quantitative characterization of the difference between merge trees and phylogenetic trees, which is explored further in the next section. The sum of TRNs also connects this work with a classical object of study in algebraic combinatorics: each combinatorial equivalence class of merge trees corresponds to a maximal chain in the lattice of partitions, ordered by refinement. For topologists this should make intuitive sense: as two connected components merge this coarsens the partition of a sublevel set into connected components. Enumerating these components leads naturally to the study of the partitions of the set of {0, 1, . . . , n}.
We start now by showing that this sum counts combinatorial equivalence classes of merge trees, but first prove a preparatory lemma.

Lemma 3.15. If $(T, h)$ and $(T', h')$ are combinatorially equivalent merge trees, then there is a continuous path between them in the space of merge trees.

Proof. Lemma 2.27 guarantees that the barcodes $B$ and $B'$ associated to $T$ and $T'$ have the same permutation type, so that the straight-line path $\overline{BB'}$ of Remark 3.7 does indeed exist, and every point on the path is a barcode of that permutation type by Lemma 3.6. We now apply the Elder Rule to construct a one-parameter family of merge trees that lifts the path $\overline{BB'}$.
Since $(T, h)$ and $(T', h')$ are combinatorially equivalent, the trees $T$ and $T'$ are isomorphic as graphs. Without loss of generality, we can suppose that $T = T'$.
To define our one-parameter family of merge trees, we set T t = T for all t ∈ [0, 1] and specify the height function h t : V(T ) → R as follows. We have no choice but to set h t (r) = ∞, where r is the root, so it remains only to define h t on the non-root nodes.
If $v_i$ is the $i$-th leaf node by birth order in $T$, and therefore corresponds to the $i$-th bar of $B_t$, then $h_t(v_i)$ is chosen to be the birth time of this bar, i.e., $h_t(v_i) = (1-t)\, b_i + t\, b'_i$, where $b_i$ and $b'_i$ are the birth times of the $i$-th bars of $B$ and $B'$. Similarly, if $w_i$ is the internal node corresponding to the $i$-th bar in $B_t$, then $h_t(w_i)$ is chosen to be the death time of this bar, i.e., $h_t(w_i) = (1-t)\, d_i + t\, d'_i$. By construction, the barcode associated to $(T, h_t)$ is clearly $B_t$.
It was shown in [22] (Theorem 2.2) that the interleaving distance between two merge trees is bounded by the maximal difference between the two height functions. Since $T_{t_1} = T_{t_2}$ for all $t_1, t_2 \in [0, 1]$ and the height functions $h_t$ change continuously with respect to the $\ell^\infty$ norm, it follows that the path defined by $t \mapsto (T_t, h_t)$ in the space of trees is continuous.
It is clear that a merge tree in standard form has a barcode in standard form (Definition 3.9).

Lemma 3.17. For all $\sigma \in S_n$, the tree realization number $R(\sigma)$ is equal to the number of combinatorial equivalence classes of merge trees whose barcode has permutation type $\sigma$.

It follows immediately from this lemma that $\sum_{\sigma \in S_n} R(\sigma) = \#\{\text{combinatorial classes of merge trees}\}$, since barcode permutation type is also an invariant of the combinatorial equivalence type of the merge tree.
Proof. By Lemma 3.15 there is a path in MT n from any merge tree whose barcode is of permutation type σ to one that is in standard form (Definition 3.16).
The tree realization number $R(\sigma)$ counts the number of merge trees in standard form with the standard form barcode $B(\sigma)$; see Definition 3.9. If two different merge trees $(T, h)$ and $(T', h')$ are both in standard form with the same barcode $B(\sigma)$, then they cannot be combinatorially equivalent. The inductive construction that created $T$ and $T'$ must have differed in a choice for some $i \in \{1, \ldots, n\}$ of where to attach a branch with leaf node at height $i$: to a branch with leaf node at height $j$ or height $j'$, with $0 \le j \ne j' < i$. An isomorphism of merge trees from $(T, h)$ to $(T', h')$ would have to exchange the order of the leaf nodes at heights $j$ and $j'$, which is prohibited by the definition of combinatorial equivalence of merge trees (Definition 2.6).
Since every merge tree is combinatorially equivalent to one in standard form, where leaf nodes are at heights {0, 1, . . . , n}, we can use this positioning to relate merge trees with maximal chains in the lattice of partitions of n. We review briefly the necessary definitions.

Definition 3.18.
A partition of the set $\mathbf{n} := \{0, 1, \ldots, n\}$ is a collection of pairwise disjoint subsets $\mathcal{U} = \{U_1, \ldots, U_k\}$ of $\mathbf{n}$ whose union is $\mathbf{n}$. A partition $\mathcal{U}$ refines a partition $\mathcal{U}'$, written $\mathcal{U} \preceq \mathcal{U}'$, if every subset of $\mathcal{U}'$ is equal to a union of elements of $\mathcal{U}$. Said differently, $\mathcal{U} \preceq \mathcal{U}'$ if for each $U_i \in \mathcal{U}$ there exists $U'_j \in \mathcal{U}'$ such that $U_i \subseteq U'_j$. We denote the set of partitions of $\mathbf{n}$ by $P_n$. The refinement relation endows the set $P_n$ with a partial order, which also happens to be a lattice. A chain in the lattice of partitions is a sequence of comparable partitions $\mathcal{U}_1 \prec \mathcal{U}_2 \prec \cdots \prec \mathcal{U}_\ell$. Such a chain is maximal if it is not a subsequence of any longer chain.
For the sake of notation, we can always write a partition of $\mathbf{n}$ as an ordered list where each subset is separated by a vertical line. The finest possible partition, and hence the bottom element of $P_n$, is denoted $\{0|1|2|\cdots|n\}$. The top element of $P_n$ is the set $\{\mathbf{n}\}$.

Theorem 3.19. Combinatorial equivalence classes of merge trees with $n + 1$ leaf nodes are in bijective correspondence with maximal chains in the lattice of partitions $P_n$. As a consequence, the sum of realization numbers is given by the following closed form formula:
$$\sum_{\sigma \in S_n} R(\sigma) = \frac{(n+1)!\, n!}{2^n}.$$

Proof. Given a merge tree $(T, h)$ in standard form with $n + 1$ leaves, we explain first how to construct an associated maximal chain in the lattice of partitions, $P_n$. We then show that every maximal chain is associated to some merge tree and that non-equivalent trees give rise to distinct maximal chains.
Since (T, h) is in standard form, all of the merge events (bifurcations) happen after (are at greater height than) all the birth events. It follows that the sublevel set of h : V(T ) → R at any value in the interval (n, n + 1) ⊂ R consists of n + 1 components, corresponding to the finest partition S(T ) 1 := {0|1|2| · · · |n}.
As we cross height n + 1, the definition of the standard form implies that a merge event of two components, born at heights i and j, occurs. This merge event has the effect of coarsening the partition S(T ) 1 , placing the two elements i and j into a single set of the partition. This defines the next, coarser partition S(T ) 2 .
In general the $i$-th partition $S(T)_i$ associated to the tree $T$ is the partition of the leaf nodes into connected components at heights in the interval $(n + i - 1, n + i)$. At height $2n$ the sublevel set of the tree is connected, which corresponds to the top element in $P_n$.
Each standard form merge tree thus gives rise to a chain of $n + 1$ elements in $P_n$, which is obviously maximal. Moreover, from any maximal chain $\mathcal{U}_1 \prec \cdots \prec \mathcal{U}_{n+1}$ in $P_n$, one can always build a merge tree that realizes the chain as follows. Start by defining a filtration of the set of subsets of $\mathbf{n}$, where a subset $V \subset \mathbf{n}$ enters the filtration at $n + i$, where $i$ is the smallest index such that $V \subset U$ for some $U \in \mathcal{U}_i$. This defines a function from the set of subsets of $\mathbf{n}$ (of which the geometric realization is the $n$-simplex) to $\mathbb{R}$. Taking the merge tree of this function as in Remark 2.4 associates a merge tree to a chain in $P_n$.
Injectivity of the map from standard form merge trees to maximal chains is also clear. If two merge trees in standard form produce the same maximal chain, then their heights and adjacency relationships must be the same, i.e., they must be combinatorially equivalent.
The number of maximal chains in $P_n$ was determined by Erdős and Moon [9] to be $(n+1)!\, n!\, 2^{-n}$. This number is easily understood in the setting of merge trees. First, one chooses two of the $n + 1$ connected components to merge at height $n + 1$. Then one chooses two of the remaining $n$ connected components to merge at height $n + 2$. This process repeats until we run out of options at height $2n$. The number of ways of constructing standard form merge trees is thus
$$\binom{n+1}{2}\binom{n}{2}\cdots\binom{2}{2} = \frac{(n+1)n}{2} \cdot \frac{n(n-1)}{2} \cdots \frac{2 \cdot 1}{2} = \frac{(n+1)!\, n!}{2^n}.$$

Example 3.20. Figure 14 shows the lattice of partitions on the set $\{0, 1, 2\}$ together with the three possible merge trees corresponding to the maximal chains in the lattice.
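The closed form can be compared against a brute-force sum over the symmetric group for small $n$. This is a sanity check rather than a proof, and the function names are hypothetical; the realization number is computed as the product of the modified left inversion vector entries, with permutations in image notation.

```python
from itertools import permutations
from math import factorial, prod

def trn(sigma):
    """Product of the modified left inversion vector entries."""
    return prod(1 + sum(1 for j in range(i) if sigma[j] > sigma[i])
                for i in range(len(sigma)))

def total_trn(n):
    """Brute-force sum of realization numbers over S_n."""
    return sum(trn(s) for s in permutations(range(1, n + 1)))

def maximal_chains(n):
    """Closed form (n+1)! n! / 2^n for the number of maximal chains in P_n."""
    return factorial(n + 1) * factorial(n) // 2 ** n

for n in range(1, 7):
    print(n, total_trn(n), maximal_chains(n))
```

The two columns agree for every $n$ tested, and the brute-force side grows as $n!$ evaluations, so it is only practical for small $n$.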

Remark 3.21 (Expected Tree Realization Number).
It is very convenient that $n!$ appears in the numerator of the sum of realization numbers. As we explain in greater depth in the section on statistics for the realization number, this allows us to compute the average realization number when $S_n$ is equipped with the uniform measure, for which the probability of a permutation $\sigma$ is $P(\sigma) = \frac{1}{n!}$. Indeed, by rearranging terms slightly, we see that the expected realization number is determined by the ratio of $(n+1)!$ and $2^n$:
$$\mathbb{E}[R] = \frac{1}{n!} \sum_{\sigma \in S_n} R(\sigma) = \frac{(n+1)!}{2^n}.$$
Before studying the probabilistic aspects of the realization number more fully, we first compare Theorem 3.19 with analogous counting results for phylogenetic trees in the next section.
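As a quick sanity check of the expected value, the case $n = 3$ can be worked by hand. The six realization numbers 1, 2, 2, 3, 4, 6 used below are the products of the left inversion vector entries of the six elements of $S_3$.

```latex
\[
\mathbb{E}[R]
\;=\; \frac{1}{3!} \sum_{\sigma \in S_3} R(\sigma)
\;=\; \frac{1 + 2 + 2 + 3 + 4 + 6}{6}
\;=\; \frac{18}{6}
\;=\; 3
\;=\; \frac{4!}{2^{3}} .
\]
```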

Counting Merge Trees versus Phylogenetic Trees.
In this section, we compare two counting results for combinatorial merge trees and for phylogenetic trees. On the one hand, Theorem 3.19 implies that there are $\frac{(n+1)!\, n!}{2^n}$ different combinatorial merge trees with $n + 1$ leaves. On the other hand, it was shown in [10] that there are $(2n - 1)!!$ distinct combinatorial phylogenetic trees with $n + 1$ leaves. In general, there are more classes of merge trees than there are phylogenetic trees. In the next example, we work through the case $n = 3$ in detail.

Example 3.22 (see Figure 15). In Figure 7C, one can see the 18 different classes of merge trees, arranged by row according to their permutation type in $S_3$. There are three pairs of merge trees highlighted with colored boxes that correspond to the same combinatorial type of phylogenetic tree.

Figure 7C, as discussed in Example 3.22.
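The two counts can be tabulated side by side to see where they start to diverge. This is a minimal sketch using only the two closed forms quoted above; the function names are hypothetical.

```python
from math import factorial

def merge_tree_classes(n):
    """(n+1)! n! / 2^n combinatorial merge trees with n + 1 leaves."""
    return factorial(n + 1) * factorial(n) // 2 ** n

def phylogenetic_classes(n):
    """(2n-1)!! combinatorial phylogenetic trees with n + 1 leaves."""
    result = 1
    for k in range(1, 2 * n, 2):   # odd factors 1, 3, ..., 2n - 1
        result *= k
    return result

for n in range(1, 7):
    print(n, merge_tree_classes(n), phylogenetic_classes(n))
```

The counts coincide for $n = 1$ and $n = 2$, and the first gap appears at $n = 3$, where there are 18 merge tree classes against 15 phylogenetic classes.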
As the example above shows, the essential difference between classes of merge trees and classes of phylogenetic trees is that merge trees are sensitive to relative heights of internal (death) nodes, whereas a phylogenetic tree is not. This also explains why two combinatorially equivalent metric phylogenetic trees $(T, m)$ and $(T', m')$ may be associated to different permutation types, if one uses Proposition 2.12 to define a height function on each and compute a barcode according to the Elder rule. However, there are certain orders of births and deaths that must be preserved. As one can see in Figure 7C, the pair of trees in the purple box under column B both have the blue bar being born before and dying after the purple bar; the relative positioning of the death time associated to the red bar is the only thing that changes.
In this section we pinpoint more precisely how many different classes of merge trees can produce the same class of phylogenetic tree. As one might imagine, this is dictated in part by certain subgroups of the symmetric group, determined essentially by the number of incomparable internal nodes in the natural partial order on the tree nodes specified by $p < q$ if $p$ is on the unique path from $q$ to the root. Our bound on the number of classes of merge trees that define the same class of phylogenetic trees is formulated as follows. Recall that we assume that the root of any rooted tree has a unique child. If $\eta(T)$ denotes the number of combinatorial equivalence classes of merge trees indistinguishable from $T$ when regarded as combinatorial phylogenetic trees, then
$$\eta(T) \ge \prod_{j=1}^{k} |A_j|!,$$
where $A_1, \ldots, A_k$ are the sets into which the internal nodes of $T$ are grouped according to their path distance from the child $c$ of the root.

Proof. We prove our result by induction on the maximum path distance in $T$ from the child $c$. If the maximum path distance to the child is 0, then $T$ has a unique internal node $c$, i.e., $T$ has three nodes: the root $r$, its child $c$, and two leaves. This tree admits unique combinatorial merge and phylogenetic structures, whence $\eta(T) = 1 = 0!$.
Suppose now the result holds whenever the maximal path distance from the child $c$ is less than $k$, for some $k \ge 1$. Decompose the internal nodes of $T$ into $k$ sets $A_1, A_2, \ldots, A_k$. All nodes in $A_k$ have only (two) leaf descendants, as otherwise there would exist an internal node further away from $c$ than some node in $A_k$, so the maximal path distance to $c$ would be greater than $k$.
Let $A_k = \{q_1, q_2, \ldots, q_s\}$. If we remove the leaf nodes attached to each $q_i \in A_k$, we obtain a phylogenetic tree $T'$ with internal nodes partitioned into sets $A_1, A_2, \ldots, A_{k-1}$. By the induction hypothesis, there are at least $\prod_{j=1}^{k-1} |A_j|!$ combinatorial equivalence classes of merge trees indistinguishable from $T'$ when considered as phylogenetic trees.
For each such equivalence class, we can obtain merge trees indistinguishable from T as phylogenetic trees by reattaching the leaves to each q i and choosing any ordering on A k , which we may do because all q i are at the same distance from c, and hence are incomparable nodes. Since there are |A k |! possible total orders on the set of q i , we can conclude.

THE PROBABILISTIC STUDY OF TREE REALIZATION NUMBERS
As already foreshadowed by Remark 3.21, the formula in Theorem 3.19 provides us with an unexpected gift in the study of statistics for realization numbers. Assuming that every combinatorial type of barcode is equally likely, so that each permutation type $\sigma$ has probability $\frac{1}{n!}$, we calculated that the expected tree realization number (TRN) is
$$\mathbb{E}[R] = \frac{(n+1)!}{2^n}.$$
We regard the assumption that each barcode permutation type is equally likely as a sort of "null hypothesis" to be tested against. Even if one considers Gaussian perturbations to functional data, characterizing the image of this measure on the space of merge trees and hence (combinatorial types) of barcodes is an open problem. Depending on the setup, it may be the case that features tend to die in the order in which they are born (a sort of "topological first in first out" queue) or it might be the case that features die in the opposite order in which they are born (a "first in last out" queue). In general, for real data, it is unlikely that the distribution of permutation types of (barcodes of) merge trees will be uniform. Regardless, characterizing the distribution of TRNs in terms of the output of the function $R : S_n \to \mathbb{N}$ when $S_n$ is equipped with the uniform measure provides an important null hypothesis against which to test real data.
In this section we start with a brief outline of computational methods for generating random barcodes and compare the corresponding distribution of permutation types with the uniform distribution. We then provide formulas for first and second moments of the pushforward distribution π n := R * µ n , where µ n is the uniform measure on S n . This allows us to calculate the variance of the TRN, which opens the door to hypothesis testing wherever the map from trees to barcodes is of interest to scientific applications.
Somewhat surprisingly, Theorem 4.1 says that the exact value for the measure $\pi_n$ can be determined from $\pi_{n-1}$ and Dirichlet convolution with the uniform distribution on $\{1, \ldots, n\}$, enabling us to study the entire distribution of TRNs as the number of features varies. To conclude, we provide a novel closed-form formula for the expected log-realization number, which allows us to characterize the empirical data in Figure 2 in a more analytical manner.

Distributions of Randomly Generated Barcodes.
In this section we briefly describe two methods to generate random barcodes and consider the pushforward distribution on S n for each of these. This pushforward is defined by the identification of barcodes with permutations as described above.
The first method was used in [17] to generate barcodes and compare their realization numbers to the ones of biological barcodes, in a way similar to Figure 2. To generate a barcode with n bars, for each bar we first pick a birth time b i uniformly at random in the interval [0, 100] and then pick a death time d i ∈ [b i , 100] uniformly at random. Because the latter distribution is conditioned on d i > b i , the induced distribution on the symmetric group is not uniform, as seen in Figure 16 with the "random" green dots.
The second method displayed in Figure 16 forces separation of births and deaths to guarantee a uniform distribution on the symmetric group. To generate n bars in a barcode, we first choose n births uniformly in the interval [0, 49], then n death times d i uniformly in [50,100]. A moment of reflection shows that this provides a uniform distribution on S n , as seen in Figure 16 with the "separated" blue points.
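The two sampling schemes can be simulated directly to compare their induced distributions on $S_n$. The sketch below is an illustration under stated assumptions, not the code used to produce Figure 16: the function names, the seed, and the number of trials are arbitrary choices, and permutation types are recorded as death ranks listed in order of birth.

```python
import random
from collections import Counter

def death_ranks(bars):
    """Death rank of each bar, listed in order of birth."""
    by_birth = sorted(bars)
    deaths = sorted(d for _, d in by_birth)
    return tuple(deaths.index(d) + 1 for _, d in by_birth)

def sample_conditioned(n, rng):
    """First method: birth uniform in [0, 100], death uniform in [birth, 100]."""
    bars = []
    for _ in range(n):
        b = rng.uniform(0, 100)
        bars.append((b, rng.uniform(b, 100)))
    return bars

def sample_separated(n, rng):
    """Second method: births uniform in [0, 49], deaths uniform in [50, 100]."""
    return [(rng.uniform(0, 49), rng.uniform(50, 100)) for _ in range(n)]

def type_frequencies(sampler, n, trials, seed=0):
    """Empirical frequency of each permutation type under a sampler."""
    rng = random.Random(seed)
    counts = Counter(death_ranks(sampler(n, rng)) for _ in range(trials))
    return {sigma: c / trials for sigma, c in counts.items()}

separated = type_frequencies(sample_separated, 3, 6000)
conditioned = type_frequencies(sample_conditioned, 3, 6000)
for sigma in sorted(separated):
    print(sigma, round(separated[sigma], 3), round(conditioned.get(sigma, 0.0), 3))
```

With the separated scheme, each of the six types in $S_3$ should appear with frequency close to $1/6$, while the conditioned scheme is expected to deviate from this, in line with the discussion above.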

The Distribution of Tree Realization Numbers via Dirichlet Convolution.
Let $\mu_n$ denote the uniform distribution on $S_n$. By our correspondence, this is also a distribution on combinatorial equivalence classes of barcodes. The tree realization number $R : S_n \to \mathbb{N}$ then defines a random variable where the probability $P(R = t)$ is determined by the number of permutations $n_t$ with realization number $t$. The following theorem states that this probability can be computed recursively via convolution with the uniform distribution on $\{1, \ldots, n\}$.

Theorem 4.1. For any $n \ge 1$, let $\mu_n$ denote the uniform distribution on $S_n$ and $\pi_n = R_*(\mu_n)$ its pushforward onto $\mathbb{N}$ via $R : S_n \to \mathbb{N}$. Let $U_k$ denote the uniform distribution on $\{1, 2, \ldots, k\}$. The probability mass function of $\pi_n$ can be recursively defined as follows: $\pi_1 = U_1$ and, for $n \ge 2$,
$$\pi_n(x) = (U_n * \pi_{n-1})(x) = \frac{1}{n} \sum_{\substack{d \mid x \\ 1 \le d \le n}} \pi_{n-1}(x/d),$$
where $*$ denotes Dirichlet convolution.

It follows immediately from this theorem that $\pi_n = U_n * U_{n-1} * \cdots * U_1$ for all $n \ge 1$.
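The convolution formula $\pi_n = U_n * U_{n-1} * \cdots * U_1$ lends itself to a short computation. The following sketch builds the probability mass function iteratively and compares it with a brute-force tabulation over $S_n$; the function names are hypothetical, exact fractions are used to avoid rounding, and permutations are taken in image notation.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def convolve_with_uniform(pmf, n):
    """Dirichlet convolution of a pmf on the positive integers with U_n."""
    out = defaultdict(Fraction)
    for value, prob in pmf.items():
        for a in range(1, n + 1):
            out[a * value] += prob * Fraction(1, n)
    return dict(out)

def trn_pmf(n):
    """pi_n computed as U_n * U_{n-1} * ... * U_1."""
    pmf = {1: Fraction(1)}                  # pi_1 = U_1 is a point mass at 1
    for k in range(2, n + 1):
        pmf = convolve_with_uniform(pmf, k)
    return pmf

def trn(sigma):
    """Product of the modified left inversion vector entries."""
    return prod(1 + sum(1 for j in range(i) if sigma[j] > sigma[i])
                for i in range(len(sigma)))

def brute_force_pmf(n):
    """Pushforward of the uniform measure on S_n, by enumeration."""
    counts = Counter(trn(s) for s in permutations(range(1, n + 1)))
    return {x: Fraction(c, factorial(n)) for x, c in counts.items()}

print(trn_pmf(3))
print(trn_pmf(4) == brute_force_pmf(4))
```

The iterative version needs only on the order of $n$ convolution steps, whereas the enumeration visits all $n!$ permutations, which is the practical payoff of the recursive description.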
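Proof. We prove this theorem by induction on $k$. It holds trivially for $k = 1$. Suppose that it holds for $k - 1$ for some $k \ge 2$. Each number that has positive probability under $\pi_{k-1}$ is the realization number of at least one permutation in $S_{k-1}$. Consider the map $\kappa^{j}_{k-1} : S_{k-1} \to S_k$ that embeds $S_{k-1}$ into $S_k$ as follows. For every $\sigma \in S_{k-1}$, the permutation $\kappa^{j}_{k-1}(\sigma)$ is specified by
$$\kappa^{j}_{k-1}(\sigma)(i) = \begin{cases} \sigma(i) & \text{if } i < k \text{ and } \sigma(i) < j, \\ \sigma(i) + 1 & \text{if } i < k \text{ and } \sigma(i) \ge j, \\ j & \text{if } i = k. \end{cases}$$
In other words, $\kappa^{j}_{k-1}$ sends $\sigma \in S_{k-1}$ to the permutation $\kappa^{j}_{k-1}(\sigma) \in S_k$ that maps the $k$-th object to $j$ and then "bumps up" by one the assigned value of elements in $\{1, 2, \ldots, k - 1\}$ that are mapped to an element greater than or equal to $j$.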
Each map in the collection $\{\kappa_j^{k-1}\}_{j=1}^{k}$ is injective, and collectively their images cover $S_k$. To determine the realization numbers for $S_k$, we therefore need only compute the realization number of $\kappa_j^{k-1}(\sigma)$ for all $j \in \{1, \dots, k\}$ and $\sigma \in S_{k-1}$.
We are now prepared to compute $\pi_k$. Let $x \in \mathbb{N}$.
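The embedding used in the proof can be checked by brute force for small $k$. The sketch below assumes permutations are written in one-line notation as tuples of values in $\{1, \dots, k\}$; the function names `kappa` and `images_partition_symmetric_group` are hypothetical. It verifies that the images of $\kappa_1^{k-1}, \dots, \kappa_k^{k-1}$ are pairwise disjoint and together exhaust $S_k$.

```python
from itertools import permutations

def kappa(j, sigma):
    """Embed sigma in S_{k-1} into S_k.

    Values of sigma that are >= j are bumped up by one, and the new
    k-th object is sent to j.
    """
    return tuple(v + 1 if v >= j else v for v in sigma) + (j,)

def images_partition_symmetric_group(k):
    """True if the images of kappa_1, ..., kappa_k are disjoint and cover S_k."""
    images = [kappa(j, s)
              for j in range(1, k + 1)
              for s in permutations(range(1, k))]
    no_repeats = len(images) == len(set(images))
    covers = set(images) == set(permutations(range(1, k + 1)))
    return no_repeats and covers

example = kappa(2, (1, 2, 3))  # the values 2 and 3 are bumped, 4th object goes to 2
```

Since there are $k$ maps and each has $(k-1)!$ images, disjointness and coverage together recover $|S_k| = k!$.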
For what follows, it is useful to consider for each $n$ the multiset $\Pi_n$, which is the range of $R : S_n \to \mathbb{N}$, taking into account multiplicities. Let $m_n : \mathbb{N} \to \mathbb{Z}_{\geq 0}$ be the multiplicity function of $\Pi_n$, i.e., $m_n(x)$ is the number of times $x \in \mathbb{N}$ appears in $\Pi_n$, which is the number of permutations in $S_n$ that have realization number $x$. In particular, $m_n(x) = 0$ if and only if $x \notin \Pi_n$.
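Since $\pi_n$ is the pushforward of the uniform distribution on $S_n$, the probability of each $x$ is obtained by dividing the multiplicity function by $n!$, i.e., $\pi_n(x) = m_n(x)/n!$. The following corollary follows directly from the construction of $\pi_n$: for all $n \geq 2$ and $x \in \mathbb{N}$, $m_n(x) = \sum_{j \mid x,\ 1 \leq j \leq n} m_{n-1}(x/j)$. In other words, $\Pi_n$ can be defined as $[n] * \Pi_{n-1}$, where $[n] = \{1, \dots, n\}$ and $*$ is the Dirichlet convolution of multisets. We now explicitly describe $\Pi_i$ for $i \in \{1, 2, 3, 4\}$. For convenience we write the multisets $\Pi_i$ as sets with repetition: $\Pi_1 = \{1\}$, $\Pi_2 = \{1, 2\}$, $\Pi_3 = \{1, 2, 2, 3, 4, 6\}$, and $\Pi_4 = \{1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 9, 12, 12, 12, 16, 18, 24\}$. Counting the number of appearances of a number $k$ determines $m_i(k)$.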
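The multiplicity function can be tabulated by iterating the multiset convolution $\Pi_n = [n] * \Pi_{n-1}$. This is a minimal sketch using `collections.Counter` as the multiset type; the function names are hypothetical, and starting the recursion from the one-element multiset $\{1\}$ before the first step is a convenience of the sketch.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def realization_multiset(n):
    """Multiplicity function m_n of Pi_n, built as [n] * Pi_{n-1}."""
    m = Counter({1: 1})  # convenience starting point, so that Pi_1 = {1}
    for k in range(1, n + 1):
        nxt = Counter()
        for x, mult in m.items():
            for j in range(1, k + 1):
                nxt[j * x] += mult  # each x in Pi_{k-1} contributes j * x
        m = nxt
    return m

def pmf(n):
    """pi_n(x) = m_n(x) / n!."""
    return {x: Fraction(c, factorial(n))
            for x, c in realization_multiset(n).items()}

pi_3_elements = sorted(realization_multiset(3).elements())
```

Working with integer multiplicities instead of probabilities keeps the recursion free of fractions until the final division by $n!$.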
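Proof. We prove the result by induction on $n$. The base case ($n = 1$) holds trivially, so assume that the formula holds for $n = k$. Consider $E(\pi_{k+1}^2) = \frac{1}{(k+1)!} \sum_{b \in B_{k+1}} R(b)^2 = \frac{1}{(k+1)!} \sum_{x \in \mathbb{N}} m_{k+1}(x)\, x^2$. To prove our result, we therefore need only show that $\sum_{x \in \mathbb{N}} m_{k+1}(x)\, x^2 = \frac{(n+1)!\,(2n+1)!}{12^n}$ for $n = k + 1$. Since $\Pi_{k+1} = [k+1] * \Pi_k$, the left-hand side factors as $\left(\sum_{a=1}^{k+1} a^2\right) \left(\sum_{x \in \mathbb{N}} m_k(x)\, x^2\right) = \frac{(k+1)(k+2)(2k+3)}{6} \cdot \frac{(k+1)!\,(2k+1)!}{12^k} = \frac{(k+2)!\,(2k+3)!}{12^{k+1}}$, as required.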
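Remark 4.7 (Higher Moments of the TRN). In general, we can compute the $k$-th moment $E(\pi_n^k)$ by rewriting $n!\,E(\pi_n^k) = E(\Pi_n^k) = \left(\sum_{a=1}^{n} a^k\right) E(\Pi_{n-1}^k)$, where $E(\Pi_n^k)$ denotes the sum of the $k$-th powers of the elements of $\Pi_n$, and using this recursive relationship to derive a formula. We note that by Faulhaber's formula, $\sum_{a=1}^{n} a^k = \frac{1}{k+1} \sum_{i=0}^{k} \binom{k+1}{i+1} B_{k-i}\, n^{i+1}$, where $B_{k-i}$ is the $(k-i)$-th Bernoulli number (with the convention $B_1 = +1/2$).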
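Because $\pi_n$ is a Dirichlet convolution of uniform distributions, its $k$-th moment factors as a product of normalized power sums, $E(\pi_n^k) = \prod_{a=1}^{n} \frac{1}{a} \sum_{j=1}^{a} j^k$. The sketch below evaluates this product exactly; it computes the power sums directly rather than through Faulhaber's formula, and the function names are hypothetical. For $k = 2$ it is compared with the closed form $\frac{(n+1)!\,(2n+1)!}{12^n\, n!}$ implied by the second-moment computation.

```python
from fractions import Fraction
from math import factorial

def power_sum(a, k):
    """1^k + 2^k + ... + a^k, computed directly."""
    return sum(j ** k for j in range(1, a + 1))

def moment(n, k):
    """k-th moment of pi_n as a product of the k-th moments of U_1, ..., U_n."""
    result = Fraction(1)
    for a in range(1, n + 1):
        result *= Fraction(power_sum(a, k), a)
    return result

def second_moment_closed_form(n):
    """(n+1)! (2n+1)! / (12^n n!)."""
    return Fraction(factorial(n + 1) * factorial(2 * n + 1),
                    12 ** n * factorial(n))

agree = all(moment(n, 2) == second_moment_closed_form(n) for n in range(1, 9))
```

The same function with $k = 1$ gives the expected TRN, $\prod_{a=1}^{n} \frac{a+1}{2} = \frac{(n+1)!}{2^n}$.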
One can view the results above either as a complete characterization of TRNs under the null hypothesis that combinatorial classes of barcodes are distributed uniformly, or as part of the growing literature on statistics on the symmetric group; see, e.g., [18]. In the following section, we investigate another such statistic.

Distributions of Log Realization Numbers.
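Since the maximum realization number for a barcode with $n$ non-essential bars is $n!$, it is convenient to work instead with the logarithm of the realization number, which we call the log realization number. The log realization number was used in [17] as a statistic on barcodes obtained from dendrites; see Figure 2 for a reminder. This was shown to distinguish between apical and cortical dendrites. Of course, taking the logarithm affects the distribution of TRNs. Jensen's inequality provides a way to bound the expected log realization number: since the logarithm is concave, $E(\log R) \leq \log E(R)$. In this section we compute the expected log realization number of uniformly drawn barcodes exactly. Recall that the left inversion vector $(l_1(B), \dots, l_n(B))$ of a combinatorial equivalence class $B$ takes values in the Cartesian product $\{1\} \times \{1, 2\} \times \cdots \times \{1, \dots, n\}$ and that $R(B) = \prod_{i=1}^{n} l_i(B)$. Since this Cartesian product has size $n!$, a uniform distribution on $S_n$ can be viewed as a uniform distribution on the set of left inversion vectors. The notation $P(B \sim \mu_n)$ denotes the probability of a combinatorial equivalence class of barcodes $B$ under the uniform distribution, that is, $1/n!$. It follows that $E(\log R) = \sum_{B} P(B \sim \mu_n) \log R(B) = \frac{1}{n!} \sum_{B} \sum_{i=1}^{n} \log(l_i(B))$.
Since $B \sim \mu_n$, and each coordinate $l_i(B)$ of the left inversion vector is uniformly distributed on $\{1, \dots, i\}$, it follows that $E(\log R) = \sum_{i=1}^{n} \frac{1}{i} \sum_{j=1}^{i} \log j = \sum_{i=1}^{n} \frac{\log(i!)}{i}$.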
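The expected log realization number $\sum_{i=1}^{n} \log(i!)/i$ can be checked numerically against a brute-force average. The sketch below assumes, as in the discussion above, that the realization number is the product of the coordinates of a vector $(l_1, \dots, l_n)$ with $l_i \in \{1, \dots, i\}$ drawn uniformly; the function names are hypothetical. It also evaluates the Jensen bound $\log E(R) = \log\big((n+1)!/2^n\big)$, which follows from the convolution formula for $\pi_n$.

```python
from itertools import product
from math import lgamma, log

def expected_log_trn(n):
    """sum_{i=1}^n log(i!)/i, using lgamma(i + 1) = log(i!)."""
    return sum(lgamma(i + 1) / i for i in range(1, n + 1))

def expected_log_trn_brute_force(n):
    """Average of log(l_1 * ... * l_n) over all vectors with l_i in {1, ..., i}."""
    total = 0.0
    count = 0
    for vec in product(*(range(1, i + 1) for i in range(1, n + 1))):
        total += sum(log(l) for l in vec)
        count += 1
    return total / count

def jensen_upper_bound(n):
    """log E(R) = log((n+1)!) - n log 2."""
    return lgamma(n + 2) - n * log(2)

gap = jensen_upper_bound(6) - expected_log_trn(6)  # slack in Jensen's inequality
```

The brute-force average enumerates all $n!$ vectors, so it is only practical for small $n$, whereas the closed form is a sum of $n$ terms.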

CONCLUSION
In this paper, we extended the results of [17] and [4] to provide a more precise characterization of the distribution of tree realization numbers (TRNs). This investigation led us to consider the uniform distribution on the symmetric group and the expected TRN, which in turn put us in a setting where classical results from combinatorics could be used. Extracting a combinatorial version of the merge tree also allowed us to understand more precisely the difference between merge trees and metric phylogenetic trees [1].
We emphasize that the TRN provides a convenient summary statistic on the space of barcodes that could lead to a better understanding of inherent biological properties of neurons. If we can identify where biological barcodes live in the space of barcodes, this opens the door to many applications, such as statistical learning of "biological" barcodes. This would make it possible to create artificial barcodes that mimic the properties of biological ones and hence to generate neurons from them that are statistically relevant, yet express higher variability. By studying the simplest possible version of a null hypothesis, in which combinatorial equivalence classes of barcodes are uniformly distributed, we are in a position to move on to study more interesting variants of the null hypothesis in TDA and to explore the geometry of barcode space in even greater detail.