Reconstructibility of unrooted level-$k$ phylogenetic networks from distances

A phylogenetic network is a graph-theoretical tool that is used by biologists to represent the evolutionary history of a collection of species. One potential way of constructing such networks is via a distance-based approach, where one is asked to find a phylogenetic network that in some way represents a given distance matrix, which gives information on the evolutionary distances between present-day taxa. Here, we consider the following question. For which~$k$ are unrooted level-$k$ networks uniquely determined by their distance matrices? We consider this question for shortest distances as well as for the case that the multisets of all distances are given. We prove that level-$1$ networks and level-$2$ networks are reconstructible from their shortest distances and multisets of distances, respectively. Furthermore, we show that, in general, networks of level higher than~$1$ are not reconstructible from shortest distances and that networks of level higher than~$2$ are not reconstructible from their multisets of distances.


Introduction
Phylogenetic trees are often used to represent the evolutionary history of species, or more generally, taxa [Fel04]. Trees can be a powerful tool for elucidating relationships between species, especially when the species in question have evolved only via speciation events. However, other events often also drive evolution, including hybridization, introgression, and lateral gene transfer. When such reticulate events occur, more general graphical structures, known as phylogenetic networks [BvIJ+13, HRS10], can be a useful addition to trees.
There are two main types of phylogenetic networks: rooted and unrooted networks. A rooted network is a directed acyclic graph that represents how extant taxa have evolved from a single common ancestor, also known as the root. Internal vertices denote either speciation or reticulate events, and edges have directions to indicate the transfer of genetic material between their two endpoints. Unrooted networks have similar properties except that their edges have no direction. A lack of direction could, for example, represent an ambiguity in knowledge of the direction in which genetic material is transferred between species. Note that every rooted network has an underlying unrooted network, which can be obtained by suppressing the root vertex and ignoring edge directions. Conversely, one can try to obtain a rooted network from an unrooted network by estimating the location of the root via an outgroup, if it is known which vertices represent reticulations [HvIJ+19]. In this paper we will only consider unrooted networks, which we shall call networks for short. We present an example of such a network in Figure 1.
As the shift from phylogenetic trees to networks has become more prevalent in the biological literature, finding good ways to construct phylogenetic networks has become a core theme in phylogenetics. Such an undertaking has experienced major developments through various reconstruction approaches (e.g., maximum likelihood [JNST06]; building blocks [vIJJ+18, MvIJ+19, vIJJ+19]; distance-based [BM04, BHMS18]; see [HRS10] for an overview). In this paper we consider the distance-based approach, in which one is given a distance matrix on the set of taxa in question and then aims to build a network representing this matrix. An entry in a distance matrix gives the evolutionary distance, a measure of genetic divergence, between distinct taxa. This raises the following question: 'Is there a network that precisely represents the given distance matrix?' The groundwork for distance-based methods is well-established for phylogenetic trees [Sok58, SN87, Fel04, PG16]. For networks, the story is more complicated. Since networks can contain cycles, there can be more than [...]
The leaves are bijectively labelled by the set X. If |X| = 1 then we define the singleton graph with one vertex labelled by the element of X as the network on X. A network with no cycles is a (phylogenetic) tree.
Deleting an edge uv from a network is the action of removing the edge uv and suppressing any degree-2 vertices in the resulting subgraph. Deleting a vertex from a network is the action of removing the vertex, deleting all its incident edges, and suppressing any degree-2 vertices in the resulting subgraph. A cut-edge of a network is an edge whose deletion disconnects the network. We call a cut-edge trivial if the edge is incident to a leaf, and non-trivial otherwise. Note that for a network N on X, deleting a cut-edge breaks the network into two components. The leaf-set X can be partitioned into the leaves that are contained in one component and the leaves that are contained in the other; therefore every cut-edge of a network induces a partition X = Y ∪ Z of X (where one of Y or Z could possibly be empty). These partitions are not unique in general (i.e., two distinct cut-edges can induce the same partition). Upon cutting a non-trivial cut-edge, if one of the components is a tree, then we say that the subgraph that corresponds to this component is a pendant subtree. Given a cut-edge uv we say that a leaf x can be reached from u via uv if, upon deleting the edge uv without suppressing degree-2 vertices, x is in the same component as v in the resulting subgraph.
A biconnected component (blob) of a network is a maximal 2-connected subgraph with at least three vertices. We say that a network is a level-k network if at most k edges must be deleted from every blob to obtain a tree. We say that a leaf is contained in a blob if the neighbour of the leaf is a vertex of the blob. A cut-edge is incident to a blob if one of the endpoints of the edge is a vertex of the blob. A blob is pendant if there is exactly one non-trivial cut-edge that is incident to the blob. We say that a leaf x can be reached from a blob B via a cut-edge uv if u is a vertex of B and x can be reached from u via uv.
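To make the level definition concrete, the following Python sketch computes, for each blob, the number of edges that must be deleted to obtain a tree (for a connected subgraph this is |E| − |V| + 1) and takes the maximum over all blobs. The adjacency-dictionary representation and the names `make_adj`, `blob_levels` and `network_level` are our own illustrative assumptions; the blob decomposition uses a standard depth-first search and assumes a simple connected graph.

```python
def make_adj(edges):
    """Build an adjacency dictionary from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def blob_levels(adj):
    """Return |E| - |V| + 1 for every blob of a simple connected graph.

    A blob is taken to be a biconnected component with at least three
    vertices, so single cut-edges are ignored.
    """
    index, low = {}, {}
    edge_stack, levels = [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in adj[u]:
            if v == parent:
                continue
            if v not in index:                 # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:         # u separates the subtree of v
                    edges = []
                    while True:
                        e = edge_stack.pop()
                        edges.append(e)
                        if e == (u, v):
                            break
                    vertices = {w for e in edges for w in e}
                    if len(vertices) >= 3:
                        levels.append(len(edges) - len(vertices) + 1)
            elif index[v] < index[u]:          # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    dfs(next(iter(adj)), None)
    return levels


def network_level(adj):
    """Smallest k such that the network is level-k (0 for a tree)."""
    return max(blob_levels(adj), default=0)


# A triangle with one leaf attached to each of its vertices.
triangle = make_adj([("p", "q"), ("q", "r"), ("r", "p"),
                     ("x", "p"), ("y", "q"), ("z", "r")])
level_of_triangle = network_level(triangle)
```

The recursion is adequate for the small networks used as examples here; very large inputs would need an iterative search.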
Let N be a network on X and let x and y be leaves in N. We recall the notation used in [BHMS18]. The multiset of distances between x and y, denoted $d(x, y)$ (and sometimes $d_N(x, y)$ where necessary), is the multiset consisting of the lengths of all possible paths between x and y in N. Since N is an unweighted network, the length of a path is simply the number of edges contained in the path. We let $D(N)$ denote the |X| × |X| matrix whose (x, y)-th entry is $d(x, y)$. We further define the shortest distance between x and y, denoted $d_m(x, y)$, by taking $d_m(x, y) = \min d(x, y)$. We analogously define $D_m(N)$ to be the |X| × |X| matrix whose (x, y)-th entry is $d_m(x, y)$. An example of a network with its multisets of distances is illustrated in Figure 1. We use the following notation for multisets. A multiset is a tuple $(A, m)$ where A is a set and m is a function that specifies the multiplicity of each element in A. For $x \notin A$, we let $m(x) = 0$. We will, for the most part, write $(A, m)$ as $A = \{a_1^{m(a_1)}, \ldots\}$ [...]
Note that a leaf can be in both a cherry and a chain. In a network without cherries, it is possible to partition the leaves into chains.
Let B be a level-2 blob of some network N. We may obtain the generator of B by deleting all cut-edges that are incident to B and taking the component that is B. The edges of the generator of B are called the sides of the generator, or simply the sides of B. Let N be a network with no pendant subtrees, let e be a side of B, and let x be a leaf in N. If the neighbour of x, say p, subdivides e in N then we say that x is on the side e or that the side e contains x. We say that a chain $a = (a_1, \ldots, a_k)$ is on the side e or that the side e contains the chain a if every leaf $a_i$ in the chain is on the side e. If an endpoint of a cut-edge uv subdivides e then we say that the side e is incident to uv.
For an overview of the definitions presented in this section, see Figure 1.
Figure 1: A level-2 network with its multisets of distances. The network contains two chains (a, b) and (c), and a cherry {d, e}. All edges incident to leaves are trivial cut-edges, and edge f is the only cut-edge that is non-trivial. The dashed path is the side of the blob that contains the leaf c. In the distance matrix, the diagonal elements are {0}, and as the matrix is symmetric, many of the elements are omitted. The shortest distance matrix can be obtained by taking the smallest element in each multiset to be the element of the matrix in the same position.
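The definitions of $d(x, y)$ and $d_m(x, y)$ can be illustrated by a short Python sketch that enumerates all paths between two leaves of a small network. We assume here that "path" means a simple path (no repeated vertices); the adjacency-dictionary representation, the function names, and the example network (a triangle with one leaf per vertex, not the network of Figure 1) are illustrative assumptions. The enumeration is exponential in general and is only meant for small examples.

```python
from collections import Counter


def make_adj(edges):
    """Build an adjacency dictionary from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_length_multiset(adj, x, y):
    """Multiset d(x, y): lengths of all simple paths from x to y."""
    lengths = Counter()

    def extend(u, visited, length):
        if u == y:
            lengths[length] += 1
            return
        for v in adj[u]:
            if v not in visited:
                extend(v, visited | {v}, length + 1)

    extend(x, {x}, 0)
    return lengths


def shortest_distance(adj, x, y):
    """Shortest distance d_m(x, y) = min d(x, y)."""
    return min(path_length_multiset(adj, x, y))


# Level-1 example: cycle p-q-r with leaves x, y, z attached to p, q, r.
network = make_adj([("p", "q"), ("q", "r"), ("r", "p"),
                    ("x", "p"), ("y", "q"), ("z", "r")])
d_xy = path_length_multiset(network, "x", "y")
dm_xy = shortest_distance(network, "x", "y")
```

In this example the two paths between x and y go around the cycle in opposite directions, so $d(x, y)$ contains one path of length 3 and one of length 4.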

Networks that cannot be reconstructed
In this section we give examples of networks that cannot be reconstructed from their shortest distances or from their multisets of distances. Figure 2 shows two distinct level-2 networks with the same shortest distance matrix. Observing that we may replace the leaves with the same label by the same pendant subtree to extend this example to a level-2 network on at least 4 leaves, we obtain the following lemma.
Lemma 3.1. There exist two distinct level-2 networks on n leaves for n ≥ 4 with the same shortest distance matrix.
Note that the networks in Figure 2 have different multisets of distances; we investigate this further in Section 5 and show there that level-2 networks are reconstructible from their multisets of distances. Figure 3 presents two level-3 networks on 2 leaves that have the same multisets of distances. Because the shortest distance matrix can be obtained by taking the smallest number for each element in the multisets of distances, the two networks also have the same shortest distance matrix. Observe that this can be generalized to level-k networks for k ≥ 3 by replacing the level-3 blob by an arbitrary level-k blob. In addition, applying the same pendant subtree argument as in the level-2 network case gives us the following lemma.
Lemma 3.2. There exist two distinct level-k networks for all k ≥ 3 with the same shortest distance matrix / multisets of distances.
Therefore, networks of level higher than 1 are not reconstructible from their shortest distances in general; networks of level higher than 2 are not reconstructible from their multisets of distances in general.

Reconstructibility from shortest distances
In this section we show that level-1 networks as well as level-2 networks on fewer than 4 leaves are reconstructible from their shortest distances. We first look at level-1 networks. Noting that pendant blobs contain exactly one chain, the following lemma shows how we can identify this chain from the shortest distances.
Proof. Suppose first that a chain $(a_1, \ldots, a_k)$ is contained in a pendant blob B. Let $p_1$ and $p_k$ denote the neighbours of $a_1$ and $a_k$ respectively, and let q denote the common neighbour of $p_1$ and $p_k$. Let $x \in X - \{a_1, \ldots, a_k\}$. Observe that any shortest path from x to a leaf contained in B must pass through the vertex q. Therefore we have that
$$d_m(x, a_1) = d_m(x, q) + 2 = d_m(x, a_k).$$
To show the other direction, we prove the contrapositive. Suppose that $(a_1, \ldots, a_k)$ is not contained in a pendant blob. Then either the chain is incident to cut-edges, or the chain is contained in a non-pendant blob. Let $p_i$ denote the neighbour of $a_i$ for $i \in [k]$, and let q denote the neighbour of $p_1$ that is neither $a_1$ nor $p_2$. Suppose first that the chain is incident to cut-edges. Let x be a leaf in the network that is not on the chain, such that x is reachable from $p_1$ via $p_1 q$. Then every path between x and $a_k$ must pass through the vertices $p_i$ for $i \in [k]$, and therefore $d_m(x, a_k) = d_m(x, a_1) + k - 1$. Since k ≥ 2, the equality in the statement of the lemma does not hold.
So now consider the case that the chain is contained in a non-pendant blob. Then q is not a neighbour of $p_k$; the path between q and $p_k$ that does not contain the vertices $\{p_1, \ldots, p_{k-1}\}$ contains at least three vertices. Now let x be a leaf not on the chain that can be reached from q via its incident non-trivial cut-edge. The shortest path from x to $a_1$ and the shortest path from x to $a_k$ both contain the shortest path from x to q. By observing that the shortest path from q to $a_1$ is shorter than the shortest path between q and $a_k$, it follows that $d_m(x, a_1) < d_m(x, a_k)$. Therefore the equality in the statement of the lemma does not hold.
Proof. First we show that we can recognise cherries, reduce them and change the shortest distances accordingly. Note that as mentioned above, a pair of leaves forms a cherry precisely if their shortest distance is 2. If there exists a cherry {x, y}, we replace it by a leaf z and set $d_m(z, a) := d_m(x, a) - 1$ for all $a \in X - \{x, y\}$. All other shortest distances between leaf-pairs remain unchanged. After reconstructing the network from the modified distance matrix, we replace the leaf z by a cherry on {x, y}. So, without loss of generality, we assume from now on that there are no cherries.
We now consider the case that there is exactly one blob. Since there are no cherries, all leaves are contained in this blob. We can recognize this by seeing that there is a chain $(a_1, \ldots, a_k)$ of length k ≥ 3 that satisfies $d_m(a_1, a_k) = 3$. This immediately shows how to reconstruct level-1 networks that contain exactly one blob. Hence, we assume from now on that there are at least two blobs.
Note that pendant blobs must contain a chain of length at least 2 since networks do not contain parallel edges. By Lemma 4.1, we can find chains on pendant blobs. We reduce a chain $(a_1, \ldots, a_k)$ contained in a pendant blob by replacing the blob by a leaf z and setting $d_m(x, z) := d_m(x, a_1) - 2$ for all $x \in X - \{a_1, \ldots, a_k\}$. All shortest distances between other leaf-pairs remain unchanged, since their paths do not travel through pendant blobs. It is again easy to reconstruct the blob after reconstructing the reduced network, since we know that $(a_1, \ldots, a_k)$ must form a chain on the blob, in that order.
This finishes the proof of the theorem since any level-1 network has a cherry, a pendant blob, or exactly one blob.
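The cherry reduction used at the start of the proof can be sketched in Python as follows. A pair of leaves is treated as a cherry when its shortest distance is 2, and the cherry {x, y} is replaced by a new leaf z with $d_m(z, a) := d_m(x, a) - 1$. Storing the shortest distances in a dictionary keyed by `frozenset` pairs, and the names `find_cherry` and `reduce_cherry`, are illustrative assumptions rather than part of the proof.

```python
from itertools import combinations


def find_cherry(leaves, dm):
    """Return a pair of leaves at shortest distance 2, or None."""
    for x, y in combinations(sorted(leaves), 2):
        if dm[frozenset((x, y))] == 2:
            return x, y
    return None


def reduce_cherry(leaves, dm, x, y, z):
    """Replace the cherry {x, y} by a new leaf z.

    Distances between the remaining leaves are kept unchanged, and
    d_m(z, a) is set to d_m(x, a) - 1 for every remaining leaf a.
    """
    rest = [a for a in leaves if a not in (x, y)]
    new_dm = {frozenset((a, b)): dm[frozenset((a, b))]
              for a, b in combinations(rest, 2)}
    for a in rest:
        new_dm[frozenset((z, a))] = dm[frozenset((x, a))] - 1
    return rest + [z], new_dm


# Shortest distances of the tree with cherries {a, b} and {c, d}
# joined by a single internal edge.
leaves = ["a", "b", "c", "d"]
dm = {frozenset(p): 3 for p in combinations(leaves, 2)}
dm[frozenset(("a", "b"))] = 2
dm[frozenset(("c", "d"))] = 2

cherry = find_cherry(leaves, dm)
reduced_leaves, reduced_dm = reduce_cherry(leaves, dm, *cherry, "z")
```

After the reduction the remaining three leaves are pairwise at distance 2, which is the distance matrix of the tree with a single internal vertex; reversing the step re-expands z into the cherry.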
We note that the restriction of Theorem 4.2 to networks without triangles also follows from Theorem 5 of [HHMM19]. We give the proof above to account for the triangle case and to give a more direct graph-theoretical proof that is independent of the results provided by Hayamizu et al. Observe that trees (level-0 networks) are also level-1 networks. Thus Theorem 4.2 gives the following corollary, which we include here for completeness. This is a classical result that was proven in [HY65]. Next, we show that level-2 networks on fewer than 4 leaves are also reconstructible from their shortest distances.
Lemma 4.4. Level-2 networks on X for |X| ≤ 3 are reconstructible from their shortest distances.
Proof. There can only be one network on a single taxon, namely the singleton graph. Such a graph is trivially reconstructible from its shortest distances. So suppose that |X| = 2, say X = {x, y}, and let N be a network on X. Below, we will prove the claim that N consists only of level-2 blobs, where each level-2 blob is incident to exactly two cut-edges. In particular, N contains at most two pendant blobs, one of which contains the neighbour of x and the other the neighbour of y. Since each additional level-2 blob increases the shortest distance between x and y by 3, it follows that $d_m(x, y) = 3k + 1$ where k denotes the number of level-2 blobs in N. From there, it follows that N is reconstructible from its shortest distances.
We now prove the claim. Note first that every blob in N must be incident to exactly two cut-edges. A blob cannot be incident to only one cut-edge. If the blob is level-1 then this would imply that it contains a loop; if the blob is level-2 then this would imply that it contains parallel edges. This also implies that every pendant blob must be incident to at least one trivial cut-edge. On the other hand if a blob is incident to more than two cut-edges, say c cut-edges, then this would imply that the network contains at least c pendant blobs. Since every pendant blob must be incident to at least one trivial cut-edge, this implies that the network contains at least c > 2 leaves, which is a contradiction. Therefore every blob in N must be incident to exactly two cut-edges. Now observe that a level-1 blob that is incident to exactly two cut-edges contains parallel edges. It follows that every blob in N must be a level-2 blob that is incident to exactly two cut-edges. This proves the claim, from which it follows by the argument presented above that N is reconstructible from its shortest distances for |X| = 2.
Suppose now that |X| = 3, and let X = {x, y, z}. Here we consider BT(N), the blob-tree of N, which is obtained from N by replacing each blob of N by a single vertex. Since |X| = 3, BT(N) contains exactly one vertex of degree 3, three vertices of degree 1 (which are the leaves x, y, and z), and all other vertices are of degree 2. By a similar argument to the one presented in the |X| = 2 case, the degree-2 vertices of BT(N) correspond to level-2 blobs. The degree-3 vertex could be an internal vertex of the network, a level-1 blob, or a level-2 blob. In the case that it is a level-2 blob, there are two possibilities. Either the three edges are incident to different sides of the blob, or two edges are incident to the same side of the blob and the third edge to another side. See Figure 4 for these four possibilities. Observe that these four possibilities all contribute different lengths to the inter-taxa shortest distances. In particular, the degree-3 vertex is
• an internal vertex if and only if $(d_m(x, y), d_m(y, z), d_m(x, z)) \equiv (2, 2, 2) \pmod 3$;
• a level-1 blob if and only if $(d_m(x, y), d_m(y, z), d_m(x, z)) \equiv (0, 0, 0) \pmod 3$;
• a level-2 blob with all edges on different sides if and only if $(d_m(x, y), d_m(y, z), d_m(x, z)) \equiv (1, 1, 1) \pmod 3$;
• a level-2 blob with the two edges that lead to leaves x and y on the same side if and only if $(d_m(x, y), d_m(y, z), d_m(x, z)) \equiv (0, 1, 1) \pmod 3$.
Therefore we may identify the blob corresponding to the degree-3 vertex of the blob-tree by taking the distances modulo 3.
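The case distinction for |X| = 3 amounts to a lookup on the residues modulo 3 of the three pairwise shortest distances, which the following Python sketch spells out. The function name, the returned labels, and the handling of the same-side case for pairs other than {x, y} (obtained by symmetry, which is our own assumption) are illustrative.

```python
def classify_centre(d_xy, d_yz, d_xz):
    """Classify the degree-3 vertex of the blob-tree of a level-2
    network on leaves x, y, z from the three shortest distances.

    Returns a pair (kind, same_side_pair); same_side_pair is None
    unless two cut-edges are incident to the same side of the blob.
    """
    residues = (d_xy % 3, d_yz % 3, d_xz % 3)
    if residues == (2, 2, 2):
        return "internal vertex", None
    if residues == (0, 0, 0):
        return "level-1 blob", None
    if residues == (1, 1, 1):
        return "level-2 blob, different sides", None
    if sorted(residues) == [0, 1, 1]:
        # The pair whose distance is 0 (mod 3) shares a side.
        pairs = (("x", "y"), ("y", "z"), ("x", "z"))
        return "level-2 blob, same side", pairs[residues.index(0)]
    raise ValueError("distances do not come from a level-2 network on 3 leaves")


# Three leaves attached to a single internal vertex: all distances are 2.
kind, pair = classify_centre(2, 2, 2)
```

For instance, distances (3, 4, 4) would be classified as a level-2 blob in which the cut-edges leading to x and y are incident to the same side.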
To finish the proof, take two networks N, N′ with the same shortest distance matrix. By the previous paragraph, we may assume that N and N′ have the same blob corresponding to the degree-3 vertex of their blob-trees. Assume for a contradiction that N ≠ N′. Then the two blob-trees BT(N) and BT(N′) are different. Note that the shortest distances are determined by the number of degree-2 vertices between leaves in the blob-tree. Since $D_m(N) = D_m(N')$, we have that the number of degree-2 vertices between two leaves, say x and y, is the same in both BT(N) and BT(N′). However, since BT(N) differs from BT(N′), the positioning of the degree-3 vertex must differ. But this would imply that upon placing z together with some degree-2 vertices, we can only satisfy one of $d_m^{N}(x, z) = d_m^{N'}(x, z)$ and $d_m^{N}(y, z) = d_m^{N'}(y, z)$. This contradicts the assumption that $D_m(N) = D_m(N')$. Therefore we must have N = N′, and so level-2 networks on X for |X| = 3 are reconstructible from their shortest distances.

Reconstructibility of level-2 networks from their multisets of distances
In the last two sections, we showed that level-1 networks are reconstructible from their shortest distances, level-k networks for k ≥ 2 are in general not reconstructible from their shortest distances, and level-k networks for k ≥ 3 are in general not reconstructible from their multisets of distances. In this section, we investigate the remaining case, and show that level-2 networks are reconstructible from their multisets of distances. The main theorem is the following.
Theorem 5.1. Level-2 networks are reconstructible from their multisets of distances.
The key ideas in proving the theorem are as follows. We first identify and reduce all cherries of the network. To identify cherries we observe that two leaves x and y form a cherry if and only if d(x, y) = {2}. To reduce a cherry we replace it by a new leaf z and adjust the distance matrix accordingly, as done for the level-1 networks in the proof of Theorem 4.2. Next, we identify all leaves that are not contained in blobs, delete those leaves, and adjust the distance matrix accordingly. We show that each leaf that is deleted in this manner can be reattached to the reduced network in a unique fashion. After applying these two reductions, two chains are adjacent if and only if they are contained in the same blob. Using this observation, we then show that it is possible to identify pendant blobs, replace them by a new leaf, and adjust the distance matrix accordingly. Continuing in this fashion, we eventually reach the situation where the reduced network contains exactly one blob. We show that networks consisting of a single blob are reconstructible from their multisets of distances, at which point it follows that simply reversing the reduction steps taken yields the original network.
We start with the two easy cases, when the network contains a cherry or a single blob.
Observation 5.2. Let N be a level-2 network on X and suppose that leaves x and y form a cherry in N. Upon replacing the cherry by a leaf z, we obtain a network N′ on X′ = X ∪ {z} − {x, y} such that the multisets of distances for N′ contain the elements
$$d_{N'}(a, b) = \begin{cases} d_N(x, b) - 1 & \text{if } a = z, \\ d_N(a, b) & \text{otherwise,} \end{cases}$$
where subtracting 1 from a multiset means subtracting 1 from each of its elements. One may obtain N from N′ by replacing the leaf z by a cherry {x, y}.
Proof. Let N be a level-2 network containing a single blob. Assume without loss of generality that N contains no cherries, as we can recognize them from the shortest distances and reduce them by Observation 5.2. If the blob of N is a level-1 blob then we may reconstruct N from shortest distances by Theorem 4.2. If the blob of N is a level-2 blob then the blob must contain at least two chains since it has no parallel edges, and at most three chains. Noting that chains can be identified from the shortest distances, the placement of the chains on the blob sides can be done by matching the end-leaves of chains that have shortest distance 4.

Leaves not contained in blobs
Proof. Suppose first that a leaf x is not contained in a blob. Let $p_x$ denote the neighbour of x, and let p, q denote the two neighbours of $p_x$ that are not x. Observe that every leaf in X − {x} can be reached from $p_x$ via one of the cut-edges $p_x p$ or $p_x q$. Let Y and Z denote the sets of all leaves that can be reached from $p_x$ via the cut-edges $p_x p$ and $p_x q$, respectively. Note that a shortest path between some y ∈ Y and some z ∈ Z passes through the edges $p_x p$ and $p_x q$. Then by observing that the shortest path from x to y and the shortest path from x to z together use the same edges as the shortest path from y to z, bar the use of the edge incident to x twice, we obtain the equation
$$d_m(x, y) + d_m(x, z) = d_m(y, z) + 2.$$
We now show that such a partition is unique. We claim that all leaves that can be reached from $p_x$ via the edge $p_x p$ must be contained in the same set in the partition. Let $y_1$ and $y_2$ be an arbitrarily chosen pair of leaves that can be reached from $p_x$ via the edge $p_x p$, and suppose for a contradiction that they are placed in different sets of the partition. Then
$$d_m(y_1, y_2) = d_m(x, y_1) + d_m(x, y_2) - 2 = d_m(p, y_1) + d_m(p, y_2) + 2 > d_m(p, y_1) + d_m(p, y_2) \geq d_m(y_1, y_2),$$
where the final inequality is the triangle inequality; this is a contradiction. Hence $y_1$ and $y_2$ must be contained in the same set of the partition; since $y_1$ and $y_2$ were chosen arbitrarily, all leaves that can be reached from $p_x$ via the edge $p_x p$ must be contained in the same set in the partition. Similarly, all leaves that can be reached from $p_x$ via the edge $p_x q$ must be contained in the same set in the partition. Observe that all leaves in X − {x} can be reached from $p_x$ via the edge $p_x p$ or via the edge $p_x q$. Since neither set of the partition can be empty, it follows that the partition must be unique, with Y and Z containing all leaves that can be reached from $p_x$ via $p_x p$ and $p_x q$, respectively.
To prove the other direction, we show that if a leaf x is contained in a blob B, then there is no such partition that satisfies the given equation. Let $p_x$ denote the neighbour of x. We first show that for leaves y, z ∈ X − {x}, if no shortest path between y and z contains the vertex $p_x$, then the equation is not satisfied by y and z. Let $p_y$ and $p_z$ denote the vertices on B that are closest to the leaves y and z respectively. Note that it is possible to have $p_y = p_z$; this is the case where all shortest paths between y and z do not pass through B. Then the following equations hold:
$$d_m(x, y) = d_m(y, p_y) + d_m(p_y, p_x) + 1, \qquad d_m(x, z) = d_m(z, p_z) + d_m(p_z, p_x) + 1.$$
We now distinguish two cases.
1. If $p_y \neq p_z$, then by the triangle inequality and as no shortest path between y and z contains the vertex $p_x$, we must have that
$$d_m(p_y, p_z) < d_m(p_y, p_x) + d_m(p_x, p_z). \qquad (1)$$
It follows that
$$d_m(y, z) = d_m(y, p_y) + d_m(p_y, p_z) + d_m(p_z, z) = d_m(x, y) + d_m(x, z) - 2 + d_m(p_y, p_z) - d_m(p_y, p_x) - d_m(p_x, p_z) < d_m(x, y) + d_m(x, z) - 2,$$
where the final inequality follows from Inequality 1.
2. If $p_y = p_z$, then let p denote the neighbour of $p_y$ that is not on the blob B. Then
$$d_m(y, z) \leq d_m(y, p) + d_m(p, z) = d_m(x, y) + d_m(x, z) - 2 - 2\,(d_m(p_x, p_y) + d_m(p_y, p)) < d_m(x, y) + d_m(x, z) - 2,$$
where the first inequality follows since the shortest path between y and z may not pass through p (e.g., if p is a vertex on a blob), and the final inequality follows as $d_m(p_x, p_y) \geq 1$ and $d_m(p_y, p) = 1$.
It remains to show that for any partition Y ∪ Z of X − {x} where Y, Z ≠ ∅, there exists a leaf pair y ∈ Y and z ∈ Z such that no shortest path between y and z uses $p_x$.
Suppose first that B is a level-1 blob. Since our network contains no parallel edges, B must be incident to at least two cut-edges in addition to the edge $p_x x$. If two leaves that can be reached from B via the same cut-edge are placed in different sets of the partition, then we are done as no shortest path between these leaves uses $p_x$; therefore we may assume that leaves that can be reached from B via the same cut-edge are placed in the same set in the partition. Since Y and Z are both non-empty, there must exist two cut-edges $e_1, e_2$ (excluding $p_x x$) whose endpoints form an edge of B, such that there exists a leaf that can be reached from B via $e_1$ and a leaf that can be reached from B via $e_2$ for which the two leaves lie in different sets of the partition. Every shortest path between these two leaves passes through the edge connecting the endpoints of $e_1$ and $e_2$ and therefore does not use $p_x$. Therefore we are done.
Now suppose that B is a level-2 blob. For the same reason as in the level-1 case (see proof of Theorem 4.2), if there are two leaves that can be reached from B via the same cut-edge that are placed in different sets of the partition, then we are done; therefore we may assume that leaves that can be reached from B via the same cut-edge are placed in the same set in the partition. Since Y and Z are both non-empty, it follows that there exist two cut-edges $e_1, e_2$ incident to B, such that leaves y, z can be reached from B via $e_1, e_2$, respectively, for which y ∈ Y and z ∈ Z. There must exist a pair of such cut-edges such that no shortest path between their endpoints on B contains $p_x$, since there exist enough cut-edges to ensure there are no parallel edges in B. Given such a pair of cut-edges, take one leaf that can be reached from B via the first cut-edge and take another leaf that can be reached from B via the other cut-edge. Then no shortest path between this pair of leaves uses $p_x$, and thus we are done.
Lemma 5.4 does not hold in general for networks of level higher than 2. An example of this for a level-3 network is shown in Figure 5. Observe that this holds in general for level-k networks where k ≥ 3 by replacing the level-3 blob by an arbitrary level-k blob.
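The partition test used in the proof of Lemma 5.4 can be sketched in Python. We assume, as the argument in the proof suggests, that the condition on a bipartition Y ∪ Z of X − {x} is $d_m(x, y) + d_m(x, z) = d_m(y, z) + 2$ for all y ∈ Y and z ∈ Z. Two leaves that violate this equation must lie in the same part, so the sketch merges violating pairs with a union-find structure and reports a partition only when exactly two groups remain. The data layout and the name `induced_partition` are illustrative assumptions.

```python
from itertools import combinations


def induced_partition(leaves, dm, x):
    """Return the two parts (Y, Z) of leaves - {x} if exactly one
    bipartition satisfies d_m(x,y) + d_m(x,z) == d_m(y,z) + 2 for all
    y in Y and z in Z; otherwise return None.

    dm maps frozenset({a, b}) to the shortest distance between a and b.
    """
    others = [a for a in leaves if a != x]
    parent = {a: a for a in others}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    # Leaves whose pair violates the equation cannot be separated.
    for y, z in combinations(others, 2):
        lhs = dm[frozenset((x, y))] + dm[frozenset((x, z))]
        if lhs != dm[frozenset((y, z))] + 2:
            parent[find(y)] = find(z)

    groups = {}
    for a in others:
        groups.setdefault(find(a), set()).add(a)
    parts = list(groups.values())
    return tuple(parts) if len(parts) == 2 else None


leaves = ["x", "y", "z"]
# Tree with one internal vertex: every pair of leaves is at distance 2.
tree_dm = {frozenset(p): 2 for p in combinations(leaves, 2)}
# Triangle with one leaf per vertex: every pair is at distance 3.
triangle_dm = {frozenset(p): 3 for p in combinations(leaves, 2)}

tree_parts = induced_partition(leaves, tree_dm, "x")
triangle_parts = induced_partition(leaves, triangle_dm, "x")
```

In the tree, x is not contained in a blob and the partition {y} ∪ {z} is found; in the triangle network, x is contained in the blob and no partition exists.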
We now show that after identifying a leaf that is not contained in a blob, we can delete it from the network and adjust the distance matrix accordingly. We also show that upon reconstructing the reduced network from the modified distance matrix, there is a unique cut-edge to which we may reattach the deleted leaf. Reattaching a leaf x to a cut-edge is the action of subdividing the cut-edge by a vertex $p_x$, and adding an edge $p_x x$. In the setting of Lemma 5.4, we say that the unique partition Y ∪ Z is induced by the leaf x.
Lemma 5.5. Let N be a level-2 network on X where |X| ≥ 3, and let x be a leaf that is not contained in a blob. Let Y ∪ Z denote the unique partition of X′ = X − {x} that is induced by x. Then upon deleting the leaf x, we obtain a network N′ on X′ such that the multisets of distances for N′ contain the elements
$$d_{N'}(a, b) = \begin{cases} d_N(a, b) - 1 & \text{if } a \in Y \text{ and } b \in Z, \text{ or } a \in Z \text{ and } b \in Y, \\ d_N(a, b) & \text{otherwise,} \end{cases}$$
for all a, b ∈ X′. In addition, there is only one edge location in N′ to which x can be reattached to obtain a network with the same multisets of distances as N. In particular, this network is isomorphic to N.
Proof. Let $p_x$ be the neighbour of x in N, and let p and q be the other neighbours of $p_x$ that are not x. As shown in the proof of Lemma 5.4, the sets Y and Z correspond to the leaves that can be reached from $p_x$ via $p_x p$ and via $p_x q$, respectively. Upon deleting x from N, we note that $p_x$ becomes a vertex of degree 2 and is therefore suppressed in the resulting subgraph. Then all paths in N that used the edge $p_x p$ and the edge $p_x q$ have their length decreased by 1 in N′; all paths in N that did not use the edges $p_x p$ and $p_x q$ are unaffected by this vertex suppression. Observe that any path between a leaf in Y and a leaf in Z uses the edges $p_x p$, $p_x q$ in N. Furthermore, no path between two leaves in Y or between two leaves in Z uses the edges $p_x p$, $p_x q$ in N. Therefore the multisets of distances of N′ can be obtained from the multisets of distances of N as shown in the statement of the lemma.
We now prove the second statement, namely that N′ contains only one edge to which x can be reattached so as to obtain a network with the same multisets of distances as N. By Lemma 5.4, we know that x is not in a blob, and that x induces a partition Y ∪ Z of X′. This implies that x must be reattached to N′ at a cut-edge that induces the partition Y ∪ Z. We now show that there is only one such cut-edge in N′ if we are to obtain a network with the same multisets of distances as N upon reattaching x. If there are two cut-edges $e_1, e_2$ in N′ that induce the same required partition Y ∪ Z, observe that any path from $e_1$ to $e_2$ must consist only of level-2 blobs that are incident to exactly two cut-edges. Note that level-1 blobs cannot be included here as otherwise we would produce parallel edges. Now take any leaf y ∈ X − {x}, and let $N_1$ and $N_2$ denote the networks obtained by attaching x to $e_1$ and $e_2$ respectively. Because of the level-2 blobs between $e_1$ and $e_2$, we have that $d_m^{N_1}(x, y) \neq d_m^{N_2}(x, y)$. But we know that there must exist one cut-edge e in N′ to which we can attach x to obtain N. We locate this edge e by finding one that induces the correct partition and satisfies the equation $d_m^{N_e}(x, y) = d_m^{N}(x, y)$, where $N_e$ denotes the network obtained by attaching x to e. This proves the claim that x can be added back to N′ via a unique edge to obtain a network with the same multisets of distances as N. Since there is a unique edge to which x can be attached in order to obtain a network with the same multisets of distances as N, the network obtained this way must be isomorphic to N.
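The distance update described in the proof of Lemma 5.5 can be sketched in Python: after deleting x, every path between a leaf in Y and a leaf in Z becomes one edge shorter, and all other paths keep their length. Representing each multiset as a `collections.Counter` keyed by path length, and the name `delete_leaf`, are illustrative assumptions.

```python
from collections import Counter


def delete_leaf(D, x, Y, Z):
    """Multisets of distances after deleting the leaf x.

    D maps frozenset({a, b}) to a Counter of path lengths between a
    and b. Y and Z form the partition of the remaining leaves that is
    induced by x.
    """
    new_D = {}
    for pair, lengths in D.items():
        if x in pair:
            continue                      # distances to x disappear
        a, b = tuple(pair)
        crosses = (a in Y and b in Z) or (a in Z and b in Y)
        shift = 1 if crosses else 0       # suppressing p_x removes one edge
        new_D[pair] = Counter({length - shift: mult
                               for length, mult in lengths.items()})
    return new_D


# Tree with one internal vertex and leaves x, y, z: each pair of leaves
# is joined by a single path of length 2.
D = {frozenset(("x", "y")): Counter({2: 1}),
     frozenset(("x", "z")): Counter({2: 1}),
     frozenset(("y", "z")): Counter({2: 1})}
reduced = delete_leaf(D, "x", {"y"}, {"z"})
```

In this small example the reduced network consists of the two leaves y and z joined by a single edge, so the only remaining multiset contains one path of length 1.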

Pendant blobs
For the remainder of this section, we will restrict to level-2 networks with at least two blobs and in which all leaves are contained in blobs. We can do this by Observation 5.2 and Lemmas 5.3, 5.4, and 5.5.
Proof. Suppose first that a chain $(a_1, \ldots, a_k)$ with k ≥ 2 is contained in a pendant level-1 blob B. As there is only one non-trivial cut-edge incident to B, this chain is the only chain that is contained in B. It is then clear that we must have $d(a_1, a_k) = \{4^1, (k + 1)^1\}$. Now suppose that there exists a chain $(a_1, \ldots, a_k)$ with k ≥ 2 such that $d(a_1, a_k) = \{4^1, (k + 1)^1\}$. Clearly the distance k + 1 corresponds to the path between $a_1$ and $a_k$ that passes through the neighbours of $a_i$ for $i \in [k]$. Therefore we examine the path between $a_1$ and $a_k$ that does not pass through the neighbours of $a_{i+1}$ for $i \in [k - 2]$. Note first that the chain cannot be contained in a non-pendant level-1 blob, as otherwise this path between $a_1$ and $a_k$ would pass through at least two vertices that are incident to non-trivial cut-edges. In this case, the length of the path between $a_1$ and $a_k$ would be at least 5, which is a contradiction. The chain also cannot be contained in a level-2 blob, as otherwise the multiset $d(a_1, a_k)$ would contain at least 3 elements. Therefore the chain must be contained in a pendant level-1 blob.
Lemma 5.7. Let $N$ be a level-2 network on $X$ in which $(a_1, \ldots, a_k)$ is a chain that is contained in a pendant level-1 blob. Let $N'$ be the network on $X' = X \cup \{z\} - \{a_1, \ldots, a_k\}$ obtained from $N$ by replacing the pendant blob by a leaf $z$. For every $x \in X' - \{z\}$, we can uniquely partition the multiset of distances $d_N(x, a_1)$ into two equal sized sets $A$ and $B$ such that $A - 2 = B - (k+1)$. Then the multisets of distances of $N'$ contain the elements
$$d_{N'}(x, y) = \begin{cases} A - 2 & \text{if } y = z, \\ d_N(x, y) & \text{otherwise,} \end{cases}$$
for $x \in X' - \{z\}$ and $y \in X' - \{x\}$.
Proof. We first prove the claim that for every $x \in X' - \{z\}$, we can uniquely partition the multiset of distances $d(x, a_1)$ into two equal sized sets $A$ and $B$ such that $A - 2 = B - (k+1)$. As usual, let $p_i$ denote the neighbour of $a_i$ for $i \in [k]$, and let $q$ denote the neighbour of $p_1$ that is neither $a_1$ nor $p_2$. Note that $k \geq 2$ since otherwise there would be parallel edges. Let $x \in X' - \{z\}$. Then any path from $x$ to $a_1$ consists of a path from $x$ to $q$ and a path from $q$ to $a_1$. There are two possible paths from $q$ to $a_1$: one is of length 2 and uses the edges $qp_1, p_1a_1$; the other is of length $k+1$ and uses the edges $qp_k, p_kp_{k-1}, \ldots, p_2p_1, p_1a_1$. Therefore every path from $x$ to $q$ yields two paths from $x$ to $a_1$, one of which is longer than the other by $k-1$. This implies that the size of the multiset $d(x, a_1)$ is even, since every path from $x$ to $a_1$ can be matched to another path from $x$ to $a_1$ that shares the same part of the path between $x$ and $q$. Now take the smallest element $d \in d(x, a_1)$. By the argument presented above, there must exist a corresponding element $d + k - 1 \in d(x, a_1)$. We place $d$ in set $A$ and we place $d + k - 1$ in set $B$, remove both elements from $d(x, a_1)$ and recurse. By continuing this for the smallest element in $d(x, a_1)$ at each step, this partitions the multiset into a bipartition $d(x, a_1) = A \cup B$ where $|A| = |B| = |d(x, a_1)|/2$, such that $A + (k-1) = B$. It follows from iteratively adding the smallest element from $d(x, a_1)$ to $A$ that this bipartition is unique. This proves the claim.
To prove the second part of the lemma, first observe that any path between a leaf $x \in X' - \{z\}$ and $z$ in the network $N'$ corresponds to a path between $x$ and $q$ in $N$. Now the multiset of distances between $x$ and $q$ in $N$ can be obtained by taking the lengths of the paths between $x$ and $a_1$ that use the edges $qp_1, p_1a_1$, and subtracting 2 from each element. This is precisely the set $A - 2$ that we have found above. For any other leaf $y \in X' - \{z\}$, we have that all paths between $x$ and $y$ are unaffected by the replacement of the blob by $z$, as the blob is pendant in $N$. Therefore $d(x, y)$ remains unchanged for $x, y \in X' - \{z\}$.
It is again easy to reconstruct the blob after reconstructing the reduced network, since we know that $(a_1, \ldots, a_k)$ must form a chain on the blob, in that order.
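The greedy bipartition in the proof of Lemma 5.7 can be sketched as follows. This is a minimal illustration, assuming the multiset $d(x, a_1)$ is given as a plain list of integers; the function names `bipartition` and `reduced_distances` are hypothetical and not part of the paper.

```python
from collections import Counter


def bipartition(distances, k):
    """Split d(x, a_1) into lists A and B with B = A + (k - 1).

    Repeatedly takes the smallest unused element d, places it in A,
    and places its partner d + (k - 1) in B, as in the proof above.
    """
    if k < 2:
        raise ValueError("a chain in a pendant level-1 blob has k >= 2")
    remaining = Counter(distances)
    A, B = [], []
    for d in sorted(distances):
        if remaining[d] == 0:
            continue  # already consumed as the partner of a smaller element
        partner = d + (k - 1)
        remaining[d] -= 1
        if remaining[partner] == 0:
            raise ValueError("multiset is not of the form A u (A + k - 1)")
        remaining[partner] -= 1
        A.append(d)
        B.append(partner)
    return A, B


def reduced_distances(distances, k):
    """Multiset A - 2, i.e. the distances between x and the new leaf z."""
    A, _ = bipartition(distances, k)
    return [d - 2 for d in A]


# Example: d(x, a_1) = {5, 6, 8, 9} with k = 4 splits into
# A = {5, 6} and B = {8, 9}, so d_{N'}(x, z) = A - 2 = {3, 4}.
print(bipartition([5, 6, 8, 9], 4))
print(reduced_distances([5, 6, 8, 9], 4))
```

The sketch raises an error when some element has no partner, which can only happen if the input is not the multiset of a leaf outside a pendant level-1 blob with a chain of length $k$.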

Pendant level-2 blobs
We adopt the following notation for pendant level-2 blobs. Let $B$ be a pendant level-2 blob, and let $a, b, c, d$ denote the four chains contained in $B$ of lengths $k, \ell, m, n \geq 0$ such that chains $c$ and $d$ are on the same side of $B$ as the non-trivial cut-edge. Then we say that $B$ is of the form $(k, \ell, m, n)$. For ease of notation, a side without leaves is seen as a length-0 chain. See Figure 6 for pendant level-2 blobs of the forms $(k, 0, 0, 0)$ and $(k, \ell, 0, 0)$.
Proof. Suppose first that $N$ contains a pendant level-2 blob $B$ of the form $(k, 0, 0, 0)$. Let $e$ denote the non-trivial cut-edge that is incident to $B$. Then the paths from $a_1$ to $a_k$ that use the side of $B$ without $e$ and without the chain, the side of $B$ with $e$, and the side of $B$ with the chain have lengths 5, 6, and $k+1$, respectively.
Suppose now that there exists a chain $(a_1, \ldots, a_k)$ where $k \geq 2$ such that $d(a_1, a_k) = \{5^1, 6^1, (k+1)^1\}$. First, since $|d(a_1, a_k)| > 2$, we note that the chain $(a_1, \ldots, a_k)$ must be contained in a level-2 blob. Consider a level-2 blob $B$ that contains the chain $(a_1, \ldots, a_k)$ on one of its sides, and suppose that there is a single non-trivial cut-edge $e$ on another one of its sides. There must be at least one such edge $e$ because otherwise there would be parallel edges. Currently we have that $d(a_1, a_k) = \{5^1, 6^1, (k+1)^1\}$: adding more cut-edges (trivial or non-trivial) to the sides of $B$ would change the set of distances. Since $B$ is incident to exactly one non-trivial cut-edge, it is a level-2 pendant blob.
Proof. Suppose first that a pendant level-2 blob $B$ contains only the leaf $a$. Let $uv$ denote the non-trivial cut-edge incident to $B$, where $u$ is the vertex that is on $B$. Now, the shortest distance from $a$ to $u$ is exactly 3. Furthermore, the shortest distance from $u$ to a leaf $x$ that is not $a$ is at least 3, since such a path must contain the edge $uv$, an edge of another blob, and an edge incident to $x$. In particular, such a path must contain an edge of another blob since all leaves are assumed to be contained in blobs. Therefore $d_m(a, x) \geq 6$ for all $x \in X - \{a\}$. To prove the second statement, let $y, z \in X - \{a\}$. Every path from $a$ to $y$ or to $z$ passes through the cut-edge $uv$, so $d_m(a, y) = 4 + d_m(v, y)$ and $d_m(a, z) = 4 + d_m(v, z)$. Then by the triangle inequality, we have
$$d_m(a, y) + d_m(a, z) - d_m(y, z) \geq 8 + d_m(v, y) + d_m(v, z) - \bigl(d_m(v, y) + d_m(v, z)\bigr) = 8.$$
Now suppose that $d_m(a, x) \geq 6$ for all $x \in X - \{a\}$ and that for any two leaves $y, z \in X - \{a\}$, we have $d_m(a, y) + d_m(a, z) - d_m(y, z) \geq 8$. The first condition implies that $(a)$ is a maximal chain. Suppose first that $a$ was contained in a level-1 blob $B$. Note that $B$ cannot be pendant as otherwise the network would have parallel edges. Let $p_a$ denote the neighbour of $a$ (a vertex of $B$), and let $p_y, p_z$ denote the two neighbours of $p_a$ on $B$ that are not $a$. The vertices $p_y$ and $p_z$ are necessarily incident to non-trivial cut-edges, as otherwise $a$ would be contained in a chain, in which case the condition $d_m(a, x) \geq 6$ would be violated for some leaf $x$ in the chain. Now let $y$ and $z$ denote any leaves in $X - \{a\}$ that can be reached from $B$ via the cut-edges incident to $p_y$ and $p_z$, respectively. Then we have that $d_m(a, y) + d_m(a, z) - d_m(y, z) = 2$ if a shortest path between $p_y$ and $p_z$ passes the vertex $p_a$, and we have $d_m(a, y) + d_m(a, z) - d_m(y, z) = 3$ otherwise. This contradicts our second condition, and therefore we may assume that the leaf $a$ is contained in a level-2 blob $B$. Suppose that $B$ is a non-pendant blob, in other words, that there are at least two non-trivial cut-edges incident to $B$. Take two non-trivial cut-edges that are closest to $a$, and take any two leaves $y$ and $z$ that can be reached from $B$ via these cut-edges. The shortest distance from $a$ to the endpoints of these cut-edges on $B$ is at most 3. Therefore we have $d_m(a, y) + d_m(a, z) - d_m(y, z) \leq 6$, which contradicts our second condition. Therefore we may assume that the leaf $a$ is contained in a pendant level-2 blob $B$. But aside from the leaf $a$ and the single non-trivial cut-edge, no other cut-edges can be incident to $B$. Indeed, having another leaf that is contained in $B$ violates the first condition, and having another non-trivial cut-edge contradicts the fact that $B$ was pendant. Therefore $B$ is a pendant level-2 blob of the form $(1, 0, 0, 0)$ that contains a single leaf $a$.
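The two conditions used in this proof are straightforward to test on a shortest distance matrix. The following minimal sketch assumes the matrix is given as a nested dictionary `dm` with `dm[u][v]` the shortest distance between leaves `u` and `v`; the function name and the three-leaf example (whose distances were worked out by hand for a hypothetical network) are assumptions for illustration only. The conclusion about the blob only holds under the standing assumptions of this section (at least two blobs, all leaves contained in blobs).

```python
from itertools import combinations


def satisfies_single_leaf_conditions(dm, leaves, a):
    """Check d_m(a, x) >= 6 for every other leaf x, and
    d_m(a, y) + d_m(a, z) - d_m(y, z) >= 8 for every pair y, z of other leaves."""
    others = [x for x in leaves if x != a]
    if any(dm[a][x] < 6 for x in others):
        return False
    return all(
        dm[a][y] + dm[a][z] - dm[y][z] >= 8
        for y, z in combinations(others, 2)
    )


# Hypothetical example: leaf a sits alone in a pendant level-2 blob, which is
# joined by a cut-edge to a triangle carrying the leaves y and z.
leaves = ["a", "y", "z"]
dm = {
    "a": {"y": 6, "z": 6},
    "y": {"a": 6, "z": 3},
    "z": {"a": 6, "y": 3},
}

print(satisfies_single_leaf_conditions(dm, leaves, "a"))  # conditions hold for a
print(satisfies_single_leaf_conditions(dm, leaves, "y"))  # fails: d_m(y, z) = 3 < 6
```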
Proof. We first show that the partition of $d(x, a_1)$ exists and that it is unique. Let $B$ denote the pendant level-2 blob containing $(a_1, \ldots, a_k)$, and let $q$ denote the vertex in $B$ that is an endpoint of a non-trivial cut-edge. Let $x \in X' - \{z\}$. Every path from $x$ to $a_1$ consists of a path from $x$ to $q$ and a path from $q$ to $a_1$. There are four possible paths from $q$ to $a_1$ of lengths 3, 4, $k+2$, and $k+3$. By an argument analogous to the one used in the proof of Lemma 5.7, there is a unique partition of $d(x, a_1)$ into four equal sized sets $A, B, C, D$ such that $A - 3 = B - 4 = C - (k+2) = D - (k+3)$. Upon replacing the pendant blob $B$ by a leaf $z$, we note that the multiset of distances between a leaf $x \in X' - \{z\}$ and $z$ in $N'$ is equivalent to the multiset of distances between $x$ and $q$ in $N$. This multiset of distances is precisely the set $A - 3$. Let $y \in X' - \{z\}$ be another leaf that is not $x$. Then all paths between $x$ and $y$ in $N$ are unaffected after replacing $B$ by a leaf $z$; therefore $d_{N'}(x, y) = d_N(x, y)$.
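The four-way partition in this proof can be computed with the same greedy idea as in Lemma 5.7, now with several shifted copies of the first class. The sketch below is an illustration under the assumption that the multiset is given as a list of integers; `split_by_offsets` and `reduce_k000_blob` are hypothetical helper names. Relative to $A$, the classes $B$, $C$, $D$ are shifted by $1$, $k-1$ and $k$, which is a restatement of $A - 3 = B - 4 = C - (k+2) = D - (k+3)$.

```python
from collections import Counter


def split_by_offsets(distances, offsets):
    """Greedily split a multiset into len(offsets) classes, where class i
    equals class 0 shifted by offsets[i] (offsets[0] must be 0)."""
    if not offsets or offsets[0] != 0 or min(offsets) < 0:
        raise ValueError("offsets must be non-negative and start with 0")
    remaining = Counter(distances)
    classes = [[] for _ in offsets]
    while sum(remaining.values()) > 0:
        smallest = min(d for d, count in remaining.items() if count > 0)
        for i, offset in enumerate(offsets):
            if remaining[smallest + offset] <= 0:
                raise ValueError("multiset does not have the expected structure")
            remaining[smallest + offset] -= 1
            classes[i].append(smallest + offset)
    return classes


def reduce_k000_blob(distances, k):
    """Return A - 3, the distances from x to the leaf z that replaces a
    pendant level-2 blob of the form (k, 0, 0, 0)."""
    A, _, _, _ = split_by_offsets(distances, [0, 1, k - 1, k])
    return [d - 3 for d in A]


# Example with k = 3: if the paths from x to q have lengths {3, 4}, then
# d(x, a_1) = {3, 4} + {3, 4, 5, 6} = {6, 7, 7, 8, 8, 9, 9, 10}.
print(reduce_k000_blob([6, 7, 7, 8, 8, 9, 9, 10], 3))
```

Taking the smallest remaining element first is what makes the split well defined: that element can only belong to the unshifted class, so its three companions are forced.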

Pendant level-2 blobs with at least two chains
Lemma 5.11. A level-2 network $N$ on $X$ contains a pendant level-2 blob of the form $(k, \ell, 0, 0)$ with chains $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$ with $k, \ell \geq 1$ if and only if $a$ and $b$ are adjacent twice, and for all $c \in a \cup b$, we have $d_m(c, x) \geq 6$ for all $x \in X - (a \cup b)$ and $d_m(c, y) + d_m(c, z) - d_m(y, z) \geq 8$ for any two leaves $y, z \in X - (a \cup b)$.
Proof. One direction follows an analogous argument used in the proof of Lemma 5.9.
To show the other direction, suppose that $a$ and $b$ are adjacent twice, and for all $c \in a \cup b$, we have $d_m(c, x) \geq 6$ for all $x \in X - (a \cup b)$ and $d_m(c, y) + d_m(c, z) - d_m(y, z) \geq 8$ for any two leaves $y, z \in X - (a \cup b)$. Since $a$ and $b$ are adjacent twice, either $a$ and $b$ are contained in the same level-1 blob such that the cycle of the blob is $u p_1 p_2 \ldots p_k v q_1 q_2 \ldots q_\ell u$, where $p_i$ and $q_j$ denote the neighbours of $a_i$ and $b_j$ for $i \in [k]$, $j \in [\ell]$, respectively, and $u$ and $v$ are incident to non-trivial cut-edges, or $a$ and $b$ are contained in the same level-2 blob $B$ in which $a$ and $b$ are on two different sides of $B$ and there are no other vertices that subdivide these two sides of $B$ (see Figure 7).
In the first case, let $B$ denote the level-1 blob. We take leaves $y$ and $z$ that can be reached from $B$ via the two non-trivial cut-edges. Without loss of generality, assume that $k \leq \ell$. Then the shortest path from $y$ to $z$ must pass through the neighbours of $a_i$ for all $i \in [k]$. But then for any $c \in a$, the neighbour of $c$ lies on a shortest path from $y$ to $z$, so we have that
$$d_m(c, y) + d_m(c, z) - d_m(y, z) = 2 < 8,$$
which contradicts our original assumption.
In the second case, let $B$ denote the level-2 blob and let $e$ denote the side of $B$ that contains neither $a$ nor $b$. Since the network contains at least two blobs, the side $e$ must be incident to at least one non-trivial cut-edge. Suppose for a contradiction that there are at least two cut-edges incident to the side $e$. Let $p$ and $q$ denote the vertices on side $e$ such that if $k \geq 2$ then they have shortest distance 3 and 4 from $a_1$, respectively, and if $k = 1$ then they have shortest distance 3 and at most 4 from $a_1$, respectively. Note first that the cut-edges incident to $p$ and $q$ must be non-trivial cut-edges, as otherwise this would contradict our assumption that for any leaf $x \in X - (a \cup b)$, we have $d_m(a_1, x) \geq 6$. Let $y$ and $z$ denote leaves that can be reached from $B$ via the cut-edges incident to $p$ and $q$, respectively. Then
$$d_m(a_1, y) + d_m(a_1, z) - d_m(y, z) = d_m(a_1, p) + d_m(a_1, q) - d_m(p, q) \leq 7 - d_m(p, q) < 7,$$
where the final inequality follows as $d_m(p, q) > 0$. This is a contradiction. Therefore there is exactly one cut-edge that is incident to the side $e$, from which it follows that $a$ and $b$ are the only chains contained in a pendant level-2 blob of the form $(k, \ell, 0, 0)$.
Proof. The proof is analogous to that of Lemma 5.10.
Definition 5.13. A chain-adjacency graph (CAG) has a vertex for each chain, and between two vertices,
• we insert a red edge if the chains are adjacent once and two red edges if the chains are adjacent twice; and
• if the two chains are adjacent once, we insert a green edge for each length-5 path between endpoints of the chains (one per chain) that does not contain any edges of the two chains.
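To make Definition 5.13 concrete, a CAG can be stored as a coloured multigraph. The sketch below is only an illustration of the data structure: it assumes that the number of times two chains are adjacent, and the number of qualifying length-5 paths between them, have already been determined from the distances, and it does not implement that determination. The function `build_cag` and its input format are hypothetical.

```python
from collections import Counter


def build_cag(chains, times_adjacent, green_paths):
    """Build a CAG as (vertex set, Counter of coloured multi-edges).

    chains         -- iterable of chain names
    times_adjacent -- dict: frozenset({a, b}) -> 1 or 2 (pairs of adjacent chains)
    green_paths    -- dict: frozenset({a, b}) -> number of length-5 paths between
                      endpoints of a and b that avoid the edges of both chains
    """
    vertices = set(chains)
    edges = Counter()
    for pair, times in times_adjacent.items():
        if times not in (1, 2):
            raise ValueError("chains are adjacent once or twice")
        edges[(pair, "red")] += times
        if times == 1:
            # Green edges are only inserted between chains that are adjacent once.
            edges[(pair, "green")] += green_paths.get(pair, 0)
    return vertices, +edges  # unary plus drops entries with count zero


ab = frozenset({"a", "b"})
ac = frozenset({"a", "c"})

# Two chains that are adjacent twice: two red edges, no green edges.
print(build_cag(["a", "b"], {ab: 2}, {}))

# Two chains that are adjacent once with one qualifying length-5 path.
print(build_cag(["a", "c"], {ac: 1}, {ac: 1}))
```

Storing edges in a `Counter` keyed by an unordered pair and a colour keeps parallel edges, which matters because a CAG may have several edges between the same two vertices.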
The condition for joining two vertices on the CAG via a green edge can indeed be verified from the multisets of distances. Let $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$ denote two chains that are adjacent once, and suppose without loss of generality that $d_m(a_1, b_1) = 4$. To count the number of green edges between $a$ and $b$, we fall into the 9 cases shown in Table 1. This number is obtained by taking the multiplicity of 5's in the multiset of distances between a pair of endpoints, minus the number of length-5 paths that pass through edges of the chains. Let (A, m_A) = d(a_1, b_1 [...]
We only insert green edges between chains that are adjacent, rather than between all chains that are distance-5 apart, to ensure that chains contained in different blobs are not connected in the CAG. Since we may assume that all leaves are contained in blobs, we note that two chains are adjacent and in the same blob if and only if they are connected by a red edge in the CAG. Note that there may be multiple edges between two vertices in a CAG (see Figure 8). We now show how we can use the CAG to distinguish the configurations of pendant blobs from non-pendant blobs, and how it can be used to distinguish the remaining level-2 pendant blob structures.
Observe that every edge in the CAG corresponds to some distinct distance-4 or distance-5 path between a pair of chain endpoints. We say that this path in the network is covered by the edge of the CAG. In particular, we also say that the edges of this path are covered by this edge of the CAG. Note that an edge of a network can be covered by more than one edge of the CAG. See Figure 8 (c) for an example of a distance-5 path that is covered by an edge in the CAG.
Figure 8: In the CAG, the dashed lines represent the red edges and the solid lines represent the green edges. In (c), the green edge $cd$ in the CAG covers the dotted path between $c$ and $d$.
Proof. All other possible pendant level-2 blobs are of the form $(k, 0, 0, 0)$ or of the form $(k, \ell, 0, 0)$. The CAG of the blob of the form $(k, 0, 0, 0)$ is the singleton graph; the CAG of the blob of the form $(k, \ell, 0, 0)$ is two vertices connected by 2 red edges. The CAG for either of these two pendant blobs is not the same as any of the CAGs for the four pendant blobs that we investigate here. Therefore we may distinguish the CAGs of the pendant level-2 blobs from one another. Now we consider non-pendant level-2 blobs. First, if the blob contains no leaves then the CAG of such a blob is empty, so we are done. Hence, suppose that some non-pendant level-2 blob $B$ contains some leaves. Observe that $B$ can be obtained by introducing non-trivial cut-edges to one of the six possible level-2 pendant blobs.
Suppose first that $B$ can be obtained by introducing non-trivial cut-edges to a pendant blob of the form $(k, 0, 0, 0)$. Then, $B$ contains one or more chains on one side of the blob, and the possible CAGs would be a path (or disjoint paths) of red edges that connect adjacent chains, or if it contains a green edge, two vertices that are connected by 1 red and 1 green edge. However, none of these CAGs corresponds to that of any of the four pendant blobs we consider here. Now suppose that $B$ can be obtained by introducing non-trivial cut-edges to a pendant blob of the form $(k, \ell, 0, 0)$. Then, $B$ contains one or more chains on two sides of the blob, and at least one non-trivial cut-edge on the third side. None of the edges in the CAG of $B$ will cover an edge of this third side, since all paths between chain endpoints that use this side will be of length at least 6. Therefore the only possible CAGs we can get on $B$ are a cycle or a path (or paths) of red edges, or two vertices connected by 1 red and 1 green edge.
Suppose now that B can be obtained by introducing non-trivial cut-edges to one of the four remaining level-2 pendant blobs. Upon introducing non-trivial cut-edges to the pendant blob, either the number of chains on the blob increases or stays the same.
Suppose first that this number increases. In each of the four pendant blobs, we note that every chain is adjacent to every other chain on the blob. It is easy to check that adding non-trivial cut-edges to a pendant blob in a way that increases the number of chains on the blob yields a blob in which not every chain is adjacent to every other chain. In particular, one side of $B$ will contain at least two chains.
• It follows from here that at most three chains are pairwise adjacent in B. Therefore, non-pendant level-2 blobs cannot have a CAG that is the same as that of a pendant blob of the form (k, ℓ, m, n).
• So suppose there are three pairwise adjacent chains in $B$. There are two cases. Either the three chains are contained in distinct sides of $B$, or two of the three chains are contained in the same side of $B$. In the former case, we note that there is at least one side of $B$ that contains two chains. Then, one of the three pairwise adjacent chains contained in this side of $B$ cannot have an edge from it to the two other chains in the CAG, except for the red edge that shows their adjacency. In the latter case, there are exactly two chains on one side of $B$ and one chain on another side of $B$ that make up the pairwise adjacent chains. An edge between the chain vertices in the CAG excluding the red edge, if it exists, must correspond to some path between chain endpoints that uses the edges of the third side of $B$. But since $B$ is a non-pendant blob, there must be at least one non-trivial cut-edge on this third side of $B$. Therefore any path between chain endpoints that uses this side must be of length at least 6. This implies that within the CAG, the three pairwise adjacent chains are connected by a single red edge between all pairs of vertices. Therefore, non-pendant level-2 blobs cannot have a CAG that is the same as that of a pendant blob of the form $(k, \ell, m, 0)$ or $(k, 0, m, n)$.
• Finally suppose that there are two chains that are adjacent in B. For the CAG of B on these two vertices to be the same as that of (k, 0, m, 0), we would need for the two distance-5 paths between chain endpoints to pass through (collectively) all three sides of B. However, there are at least two chains contained in one side of B, and thus at least one of these two distance-5 paths cannot exist. Therefore, non-pendant level-2 blobs cannot have a CAG that is the same as that of a pendant blob of the form (k, 0, m, 0).
On the other hand, suppose that the number of chains on the blob stays the same upon adding non-trivial cut-edges to one of the four level-2 pendant blobs. Note that for these four cases, all edges of the pendant level-2 blobs that do not join the neighbours of leaves of the same chain are covered by at least one of the edges in its CAG. Upon inserting non-trivial cut-edges to obtain $B$, we see a change in color of the CAG edge that used to cover the bisected edge (from red to green), or a possible deletion of the edge (if the edge was green to begin with). This will clearly result in a blob $B$ with a CAG that is different from that of the four level-2 pendant blobs we consider here.
Now we consider the CAG of a level-1 blob. Observe that a CAG of a level-1 blob contains a green edge if and only if the level-1 blob contains two chains $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_\ell)$ such that the cycle of the blob is $u p_1 p_2 \ldots p_k v w q_1 q_2 \ldots q_\ell u$, where $p_i$ and $q_j$ denote the neighbours of $a_i$ and $b_j$ for $i \in [k]$, $j \in [\ell]$, respectively, and $u$, $v$ and $w$ are incident to non-trivial cut-edges. This does not result in any of the CAGs of the four pendant level-2 blobs. Therefore the CAG of a level-1 blob cannot be the same as that of a pendant level-2 blob of the forms $(k, 0, m, 0)$; $(k, \ell, m, 0)$; $(k, 0, m, n)$. Furthermore, at most 3 chains can be pairwise adjacent on a level-1 blob. Hence the CAG of a level-1 blob cannot be the same as that of a pendant level-2 blob of the form $(k, \ell, m, n)$.
Note that pendant level-2 blobs of the form $(k, 0, m, 0)$ and $(m, 0, k, 0)$ will have the same CAG; however, it is straightforward to find the chain that is on the same side of the blob as the non-trivial cut-edge. Given the two chains $a$ and $c$ in this case, $c$ is on the same side of the blob as the non-trivial cut-edge if and only if $|d(x, c_m)| < |d(x, a_k)|$ for all $x \in X - (a \cup c)$. Note also that we may identify the leaf on the chain that is closest to the non-trivial cut-edge, by taking the same leaf $x$ and letting $c_m$ be the chain endpoint satisfying $d_m(c_m, x) < d_m(c_1, x)$. A similar argument holds for the pendant level-2 blob of the form $(k, 0, m, n)$, in identifying which chain is on the side of the blob without the non-trivial cut-edge. We now seek to replace these pendant level-2 blobs by a single leaf $z$ and alter the multisets of distances accordingly.
Lemma 5.15. Let $k, \ell, m, n \geq 1$, and let $B$ be a pendant level-2 blob that is of the form $(k, 0, m, 0)$; $(k, \ell, m, 0)$; $(k, 0, m, n)$; or $(k, \ell, m, n)$. Then we can replace the pendant blob by a leaf $z$ to obtain a network $N'$ on $X' = X \cup \{z\} - (a \cup b \cup c \cup d)$, such that the multisets of distances of $N'$ contains the elements where if $B$ is of the form
• $(k, 0, m, n)$, then we uniquely partition $d(x, c_m)$ into three equal sized sets $A, B, C$ such that $A - 2 = B - (m + n + 3) = C - (k + m + n + 3)$.
Proof. The proof is analogous to that of Lemma 5.10.
We are now ready to prove Theorem 5.1.
Proof of Theorem 5.1. Let N be a level-2 network on X. We show by induction on |E(N )|, the number of edges in N , that level-2 networks are reconstructible from their multisets of distances. If N contains a cherry or a leaf that is not contained in a blob, then we can identify these structures and reduce them accordingly by Observation 5.2 or Lemma 5.5, respectively. Then upon reconstructing the reduced network by the induction hypothesis, we can undo the reduction by either replacing the leaf by a cherry or by reattaching the deleted leaf to the rightful cut-edge by Lemma 5.5. If N is a network on a single blob, then we may reconstruct it from its shortest distances by Lemma 5.3, and therefore from its multisets of distances.
We now assume that $N$ is a level-2 network on at least two blobs such that every leaf is contained within blobs and that there are no pendant subtrees. We show that we may identify pendant blobs and replace them by a leaf. First note that a chain on a pendant level-1 blob can be identified by Lemma 5.6; Lemma 5.7 outlines how we can replace the blob by a leaf $z$ and adjust the multisets of distances accordingly. It is easy to reconstruct the blob after reconstructing the reduced network, since we know the chain that is contained in the blob. For pendant level-2 blobs, recall that they are of the form $(k, \ell, m, n)$ where $k, \ell, m, n \geq 0$. The following list shows how all possible pendant level-2 blobs can be identified with one of the lemmas that we have proven before:
• $(k, 0, 0, 0)$ by Lemmas 5.8 and 5.9;
• $(k, \ell, 0, 0)$ by Lemma 5.11; and
• $(k, 0, m, 0)$; $(k, 0, m, n)$; $(k, \ell, m, 0)$; and $(k, \ell, m, n)$ by Theorem 5.14.
Replacing the pendant level-2 blobs by a leaf z and adjusting the multisets of distances accordingly for each case has been outlined in Lemmas 5.10, 5.12, and 5.15. It is easy to reconstruct the blob after reconstructing the reduced network, since we know which chains are on the same side of the blob as the non-trivial cut-edge.
Observe that every level-2 network has a cherry, exactly one blob, a leaf that is not contained in a blob, or a pendant blob. We have now shown that it is possible to identify these structures, to reduce them, and to add these structures back to the reduced network to obtain the original network. All these reductions decrease the number of edges of the network. Then by the induction hypothesis, we may reconstruct the reduced network from the modified distance matrix. Since we can obtain the original network from the reduced network in each case, this completes the proof.

Discussion
We have considered the fundamental question of deciding which networks are uniquely reconstructible from the pairwise graph-theoretical distances between their leaves. We showed that level-1 networks are reconstructible from their shortest distances and that level-2 networks are reconstructible from their multisets of distances. We have also shown that, in general, networks of level higher than 1 are not reconstructible from their shortest distances, and networks of level higher than 2 are not reconstructible from their multisets of distances (Lemmas 3.1 and 3.2).
From a practical perspective, having the multisets of distances is not very realistic. For example, starting with sequence data, it is not clear how multisets could be produced in an accurate and efficient manner. As stated in [BT16], while it may be possible to obtain '...the set of different evolutionary path weights between a given pair of taxa, it seems hard to imagine how one might manage to measure the number of distinct evolutionary paths of a given observed weight.' Naturally, this points to the idea that perhaps we should investigate other types of distance matrices that are more restrictive when compared to the multisets of distances, that may be relatively easy to obtain from sequence data. Therefore in future research, it would be of interest to consider other distance matrices such as tree-average distances [Wil12] and sets of distances [BT16]. In particular, the two level-2 networks in Figure 2 have the same shortest distance matrices, but different sets of distances (i.e., the underlying sets of their multisets of distances are different). Therefore the question of whether a level-2 network is reconstructible from its set of distances remains open.
On a similar note, we wonder if there is some characterization of level-2 networks that are reconstructible from their shortest distances. We have already seen instances of this, for example when the level-2 network contains at most 3 leaves (Lemma 4.4) and when the network contains exactly one blob (Lemma 5.3). We conjecture that if every side of every blob has enough incident edges, then this should provide enough information for unique reconstructibility. To motivate this conjecture, note that the networks in Figure 2 contain a level-2 blob of the form $(2, 0, 0, 0)$. If every level-2 blob has at least two sides with enough cut-edges incident to them so that when they become pendant blobs upon reducing the network they are not of the form $(k, 0, 0, 0)$, then is the network reconstructible from its shortest distances? A similar question can be posed for level-$k$ networks for $k \geq 3$. Can we characterize level-$k$ networks that are reconstructible from their multisets of distances, or possibly from their shortest distances?
On the algorithmic side, the proofs of Theorems 4.2 and 5.1 outline the steps that can be taken to construct networks from distance data. Indeed, in both the level-1 and the level-2 cases, we describe how one can identify a cherry or a pendant blob, reduce it to a single leaf, and adjust the new distance matrices. Since all networks contain either a cherry or a pendant blob, we may recurse on the reduced instances until there is a tree or a single blob in the network, at which point we are done. The important question as to whether this algorithm can run in polynomial time remains open.
In practice, even if we are able to find efficient algorithms that can uniquely construct level-1/level-2 networks from their shortest/multisets of distances, it is important to bear in mind that variations in distances arising from real data sets may lead to inconsistencies which cannot be handled by such algorithms. One way to deal with such inconsistencies would be to consider a slight variant of the problem that we have solved. As in [CCYH18], we may wish to find an unrooted network in which the distance matrix elements correspond to the length of some, not necessarily the shortest, path between two taxa. Though we suspect that the output network will not necessarily be unique, it could nonetheless provide a solution that is consistent with the input data and therefore a useful starting point for making biological deductions.
Finally, a natural extension would be to see if our results generalize to edge-weighted networks. In addition to considering the network topology, weighted networks take into account edge weights which can, for example, represent the amount of genetic divergence that has occurred along each edge of the network. It has been shown that this additional information on the networks can lead to distinguishing two rooted networks on different topologies that display the same set of data (e.g., consider the three distinct rooted level-1 networks on three leaves that display the same set of trees) [PS15]. For level-1 networks (or for the more general cactus graphs), it was shown recently that while there may exist multiple level-1 networks that realize the same shortest distance matrix, there is a unique optimal edge-weighted network whose sum of edge weights is minimal [HHMM19]. It was also noted that this is not the case for edge-weighted level-2 networks by considering an example presented in [Alt88]. It could thus be of interest to ask whether there is a unique optimal level-2 network if we instead consider optimality in terms of the multisets of distances.