On minimum spanning tree-like metric spaces

We attempt to shed new light on the notion of 'tree-like' metric spaces by focusing on an approach that does not use the four-point condition. Our key question is: Given metric space $M$ on $n$ points, when does a fully labelled positive-weighted tree $T$ exist on the same $n$ vertices that precisely realises $M$ using its shortest path metric? We prove that if a spanning tree representation, $T$, of $M$ exists, then it is isomorphic to the unique minimum spanning tree in the weighted complete graph associated with $M$, and we introduce a fourth-point condition that is necessary and sufficient to ensure the existence of $T$ whenever each distance in $M$ is unique. In other words, a finite median graph, in which each geodesic distance is distinct, is simply a tree. Provided that the tie-breaking assumption holds, the fourth-point condition serves as a criterion for measuring the goodness-of-fit of the minimum spanning tree to $M$, i.e., the spanning tree-likeness of $M$. It is also possible to evaluate the spanning path-likeness of $M$. These quantities can be measured in $O(n^4)$ and $O(n^3)$ time, respectively.


Introduction
Historically, graphs as finite metric spaces have been extensively studied [5]. Even though we approach them differently, we would like to emphasise, amongst others [11,15,16,17], the classical result provided by Buneman [6]. In short, a metric on a finite set can be realised by the shortest path metric in a positive-weighted tree if and only if it satisfies the four-point condition. Not only is it frequently quoted in the context of evolutionary trees [14], but it is also known for its direct connection to the theory of Gromov hyperbolic metric spaces [8]. Approximately two decades after Buneman's theorem, Hendy [10] proved the existence of a unique tree representation for every metric satisfying the four-point condition.
Given this background, a metric space that satisfies the four-point condition is commonly considered tree-like. However, an important caveat should be addressed: the four-point condition is necessary and sufficient to ensure the existence of a partially labelled tree that realises a given metric [5,10,14]. For example, a complete graph with a uniform edge length clearly satisfies the four-point condition, but it only becomes tree-like after an extra vertex is added. In this case, the four-point condition does not ensure that a metric is realised by a fully labelled tree on the same set. It does not characterise the distance within trees, in general, but rather the shortest path metrics induced by graphs of a certain class, called block graphs (i.e., graphs in which all biconnected components are complete subgraphs) [2]. This may not create an issue in the field of conventional phylogenetics, but considering the recent surge of renewed biological interest in minimum spanning tree (MST)-based tree estimation [13], determining when a metric space is realised by a positive-weighted tree on the same set is not only a natural undertaking but also a meaningful one. Thus far, this problem has not been properly recognised, much less addressed. The only two exceptions to this are the recent work provided in [1] and in [9]. It seems to be a non-trivial question not only because it cannot be answered using Buneman's theorem, but also because it is equivalent to determining a method for recognising a special case of the metric travelling salesman problem (TSP). If an input (a metric on a set of cities) is the shortest path metric in a tree on the city set, the length of the optimal tour must equal twice the length of the MST.
In this paper, we examine the sub-type of tree metrics without relying on the four-point condition. Our work is based on three ingredients: the so-called tie-breaking assumption, which has been popular in algorithmic applications since the work provided by Kruskal in [12]; what we call the fourth-point condition, which can typically be found in the definition of median metric spaces [7]; and a simple trick for metric-preserving edge removal, which applies to any finite metric space. These concepts, which are part of our original results, are defined and discussed in Section 2.
As expected, if it exists, a fully labelled positive-weighted tree that realises a finite metric space is the unique MST in its associated weighted complete graph (Proposition 2.13). Our goal is to prove the following: A finite metric space under the tie-breaking rule is realised by the MST if and only if it satisfies the fourth-point condition (Theorem 3.1). This implies that every finite median graph, in which the shortest path lengths between all pairs of vertices are distinct, is necessarily a tree (Corollary 3.3). This result also yields a stronger condition for understanding when a finite metric space is realised, especially by a spanning path graph (Corollary 3.5). We define and discuss the notion of a spanning tree-likeness of a finite metric space in Section 4.

Preliminaries
We apply the metric-related terminology provided in [7] throughout this paper. Let $(X, d_M)$ be a finite metric space, that is, a finite set, $X$, equipped with metric $d_M$. For two distinct points $x$ and $x'$ in $X$, the closed metric interval between them is defined to be the set
$$I(x, x') := \{y \in X : d_M(x, y) + d_M(y, x') = d_M(x, x')\}.$$
All graphs considered in this paper will be simple, undirected, fully labelled (i.e., each vertex is labelled), and positive-weighted (i.e., each edge has a positive length). A graph is denoted $(V, E; w)$ for a set, $V$, of labelled vertices and a set, $E$, of edges that are associated with a positive edge-weighting function, $w : E \to \mathbb{R}_+$. Given graph $G$, the sets of vertices and edges are denoted $V(G)$ and $E(G)$, respectively. Moreover, graph $G$ is said to be a graph on $V(G)$. Vertices may be renamed as needed, assuming no confusion arises, and a vertex labelled '$x$' is referred to as vertex $x$. The distance in graph $G$ is defined to be the shortest path metric and is represented using $d_G$.
Assume $M$ is a finite metric space, $(X, d_M)$. Let $K_M$ be the weighted complete graph associated with $M$. An edge of $K_M$ that joins two distinct vertices, $x$ and $x'$, is denoted $e(x, x')$. This paper uses the terms 'points' and 'vertices' interchangeably because there is a one-to-one correspondence between $X$ and $V(K_M)$ for any finite metric space $M$.
A finite metric space, $(X, d_M)$, is said to satisfy the fourth-point condition if, for every (not necessarily distinct) three points $x, y, z \in X$, there exists a point, $p^* \in X$, such that [...]
Proof. Suppose that there are two quartets, $\{x, y, z, p^*_1\}$ and $\{x, y, z,$ [...]
Proposition 2.4. The following is equivalent to saying that finite metric space $(X, d_M)$ satisfies the fourth-point condition: For every (not necessarily distinct) three points $x, y, z \in X$, there exists only one point $p^* \in I(x, y) \cap I(y, z) \cap I(z, x)$.
Remark 2.5. Fourth point $p^*$ is also known as the median for $\{x, y, z\}$ because it minimises the sum of the distances to the three points, and a metric space satisfying the fourth-point condition (or a graph inducing this kind of metric space) is said to be median [2,7]. Although a discussion of this topic is provided in [2,3], it should be noted that median graphs include multiple types of graphs other than trees, such as grid and square graphs.
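The interval formulation in Proposition 2.4 translates directly into a brute-force test. The following is only a minimal sketch, assuming the metric is given as a symmetric matrix `d` indexed by `0, ..., n-1`; the function names and the tolerance `tol` are hypothetical choices made for illustration. It follows Proposition 2.4 as stated, so it asks for exactly one point in the triple intersection. Scanning all triples with a linear-time interval computation gives the $O(n^4)$ running time mentioned in the abstract.

```python
from itertools import combinations_with_replacement


def interval(d, x, y, tol=1e-9):
    """Closed metric interval I(x, y): all p with d(x, p) + d(p, y) = d(x, y)."""
    return {p for p in range(len(d)) if abs(d[x][p] + d[p][y] - d[x][y]) <= tol}


def fourth_points(d, x, y, z, tol=1e-9):
    """Candidates for p*: the intersection I(x, y) & I(y, z) & I(z, x)."""
    return interval(d, x, y, tol) & interval(d, y, z, tol) & interval(d, z, x, tol)


def satisfies_fourth_point_condition(d, tol=1e-9):
    """True if every (not necessarily distinct) triple has exactly one p*."""
    n = len(d)
    return all(
        len(fourth_points(d, x, y, z, tol)) == 1
        for x, y, z in combinations_with_replacement(range(n), 3)
    )
```

For instance, a weighted path on three points passes the test, a triangle with a uniform edge length fails it, and a cycle on four vertices with a uniform edge length passes it even though it is not a tree.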
Proof. Without loss of generality, we can assume [...] Clearly, $y \in I(x, y) \cap I(y, z)$. Therefore, $I(x, y) \cap I(y, z) \cap I(z, x) \neq \emptyset$ if and only if $y \in I(z, x)$. Under the assumption that the length of $C$ is fixed at $c$, this is equivalent to stating that $d_C(z, x) = c/2$. Thus, $I(x, y) \cap I(y, z) \cap I(z, x) \neq \emptyset$ if and only if $d_C(z, x) = c/2$. Applying Proposition 2.4 completes the proof.

Basic geodesic graphs.
In this subsection, we present a simple trick for metric-preserving edge removal, which can be used to represent an arbitrary finite metric space as a graph with the fewest edges. Let $M$ be a finite metric space, $(X, d_M)$, and assume $K_M$ is the weighted complete graph associated with $M$.
Definition 2.8. Given $x, x' \in X$, the edge, $e(x, x')$, of $K_M$ is said to be non-basic if there is a permutation, $(x_1, x_2, \cdots, x_k)$, on a non-empty subset of $X \setminus \{x, x'\}$ such that cyclic permutation $(x, x_1, x_2, \cdots, x_k, x')$ satisfies
$$d_M(x, x_1) + d_M(x_1, x_2) + \cdots + d_M(x_k, x') + d_M(x', x) = 2 d_M(x, x').$$
The edge is called basic otherwise.
Proposition 2.9. Let $x, y, z$ be three different vertices of $K_M$. When the three edges, $e(x, y)$, $e(y, z)$, and $e(z, x)$, of $K_M$ are basic, the fourth point, $p^*$, does not exist for $\{x, y, z\}$. If a non-basic edge exists, say $e(x, y)$, points $x$ and $y$ are the only two candidates for $p^*$.
The proof of this proposition is straightforward.
Proof. It suffices to prove that $G_M$ is connected. Assuming that $e(x, x')$ is non-basic, we show that there is a path of basic edges joining $x$ and $x'$ in $K_M$. We also note that they are obviously connected in $G_M$ if $e(x, x') \in E(K_M)$ is basic. Let $C$ be a cycle with the greatest number of vertices (or edges) of all cycles in $K_M$ that share edge $e(x, x')$ and overall length $2d_M(x, x')$. Let $V(C) = \{x, x'\} \cup Y$, where $Y := \{x_1, \cdots, x_k\}$ is a non-empty subset of $X \setminus \{x, x'\}$, as in Definition 2.8. Furthermore, suppose $d_C$ is the shortest path metric induced by $C$ and $x_i, x_j \in V(C)$. If a path existed in $K_M$ joining $x_i$ and $x_j$ that was shorter than $d_C(x_i, x_j)$, then edge $e(x, x')$ would be longer than the path connecting $x$ and $x'$ through $x_i$ and $x_j$. Therefore, any path in $K_M$ joining two vertices in $V(C)$ must have a length greater than or equal to $d_C(x_i, x_j)$. We use this fact at the end of the proof.
In order to obtain a contradiction, we suppose $e(y, y') \in E(C) \setminus e(x, x')$ is non-basic. We define $C'$ to be a cycle in $K_M$ of overall length $2d_M(y, y')$ with $e(y, y') \in E(C')$, which is similar to our previous case except that $|V(C')|$ is unimportant. Let $V(C') = \{y, y'\} \cup Z$, where $Z := \{y_1, \cdots, y_l\} \subseteq X \setminus \{y, y'\}$. By Definition 2.8, if a cycle contains a non-basic edge, then it must be strictly longer than the other edges in the cycle. This implies that the number of non-basic edges contained in each cycle is zero or one. Thus, $e(y, y')$ is shorter than $e(x, x')$, and $e(y, y')$ is the longest edge in $E(C')$. Therefore, we can conclude that $e(x, x')$ is not in $E(C')$.
The assumption on $|V(C)|$ provides $Y \cap Z \neq \emptyset$. Our hypothesis ensures a path in $K_M$ of length $d_C(y, y')$ that connects $y$ and $y'$ via $y'' \in Y \cap Z$. This implies that $K_M$ contains a path joining $y$ and $y''$ of length less than $d_C(y, y')$. If we assume that $y'$ lies in the shortest path joining $y$ and $y''$ in $C$ (note that the roles of $y$ and $y'$ can be exchanged), then we have $d_C(y, y') < d_C(y, y'')$. It follows that there is a path that joins $y$ and $y''$ in $K_M$ of length less than $d_C(y, y'')$. This is a contradiction. Hence, $e(y, y')$ is basic, which completes the proof.
(2) Suppose that $M$ is realised by fully labelled tree $T$ on $X$. This implies that each edge of $T$ has a positive weight. We can recover $K_M$ from $T$ by summing the weights along every path in $T$ that has two or more edges. This process indicates that $T$ is isomorphic to the basic geodesic graph in $K_M$. Hence, given (1), we know $T$ is unique.
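The edge-removal trick and the MST characterisation can be illustrated with a short script. This is a sketch under stated assumptions rather than the authors' procedure: the metric is a symmetric matrix `d`, all function names are hypothetical, and Definition 2.8 is read as saying that $e(x, x')$ is non-basic when some path through other points has total length $d_M(x, x')$. By the triangle inequality, the first intermediate point of such a path already lies in $I(x, x')$, so the sketch only looks for a single third point in the interval. The realisation test builds an MST with Prim's algorithm (a swapped-in standard choice) and compares its path lengths with `d`.

```python
def basic_edges(d, tol=1e-9):
    """Edges e(x, y) of K_M with no third point p such that d(x,p) + d(p,y) = d(x,y)."""
    n = len(d)
    return [
        (x, y)
        for x in range(n)
        for y in range(x + 1, n)
        if not any(
            p != x and p != y and abs(d[x][p] + d[p][y] - d[x][y]) <= tol
            for p in range(n)
        )
    ]


def minimum_spanning_tree(d):
    """Prim's algorithm on the complete graph K_M; returns edges as (parent, child)."""
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u
    return edges


def tree_metric(n, edges, d):
    """All-pairs path lengths in the tree whose edge (u, v) has weight d[u][v]."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1)]
        while stack:
            u, prev = stack.pop()
            for v in adj[u]:
                if v != prev:
                    dist[s][v] = dist[s][u] + d[u][v]
                    stack.append((v, u))
    return dist


def is_spanning_tree_metric(d, tol=1e-9):
    """True if the path metric of an MST of K_M reproduces d exactly (up to tol)."""
    n = len(d)
    t = tree_metric(n, minimum_spanning_tree(d), d)
    return all(abs(t[i][j] - d[i][j]) <= tol for i in range(n) for j in range(n))
```

When `is_spanning_tree_metric(d)` holds, the basic edges and the MST edges coincide as sets, which mirrors the statement that $T$ is isomorphic to the basic geodesic graph in $K_M$.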
Remark 2.14. Proposition 2.13 states that a metric space is uniquely realised by the only MST if it is a spanning tree metric space. Note that we do not need Buneman's four-point condition in the argument (cf. [1]). Concerning the uniqueness of the MST, the tie-breaking rule is a well-known sufficient condition established by Borůvka [4] (cited in [12]) and by Kruskal [12]. The next section explores its relation to spanning tree metric spaces.
Proof. (i) The fourth-point condition clearly holds for all spanning tree metric spaces. (ii) If $d_M$ is not a spanning tree metric on $X$, then we will show that there is a triplet in $X$ that violates the fourth-point condition. According to Lemma 2.11, our assumption implies that the basic geodesic graph, $G_M = (X, B; \lambda)$, in $K_M$ contains at least one cycle. Suppose $C := (X_k, B_k; \lambda_k)$ is the shortest cycle in $G_M$, where $X_k \subseteq X$, $B_k \subseteq B$, $|X_k| = |B_k| = k$, and $\lambda_k$ is the restriction of $\lambda$ to $B_k$. Then Proposition 2.9 yields $k \geq 4$. Let $c$ denote the sum of $\lambda_k$ over all elements in $B_k$. Also, assume that $d_C$ is the shortest path metric in $C$. For all $i, j \in X_k$, no path in $G_M$ joining $i$ and $j$ has a shorter length than $d_C(i, j)$ (otherwise, $C$ would not be the shortest cycle in $G_M$). Therefore, $d_C(i, j) = \min\{a_{ij}, c - a_{ij}\}$, in which $a_{ij}$ represents the length of the path in $C$ that travels from $i$ to $j$ in a clockwise direction. Consider a route in which we visit the points in $X_k$. Let $s \in X_k$ be the starting point from which we travel along the circle in a clockwise direction. We assign a label, 'L' or 'R', to every point $i \in X_k \setminus \{s\}$: label 'L' is assigned if $a_{si} < c/2$, and we use label 'R' if $a_{si} \geq c/2$. If every point in $X_k \setminus \{s\}$ was labelled 'L', the last edge we would traverse returning to $s$ would be non-geodesic or non-basic. Therefore, there exists one and only one basic edge between vertices labelled 'L' and 'R'.
Suppose that $t$ signifies the last point with label 'L' and $u$ indicates the first point with label 'R' as on the left in Figure 2. Note that $d_M(s, t) + d_M(t, u) + d_M(u, s) = c$.

Main results
We assume that $p^*$ exists for $\{s, t, u\}$ (otherwise, the assertion of the theorem immediately follows). Lemma 2.6 gives us $\max\{d_M(s, t), d_M(t, u), d_M(u, s)\} = c/2$. Thus, $d_M(u, s) = c/2$ (the edge joining $t$ and $u$ is basic, and $d_M(s, t) < c/2$). Let $v$ ($\neq u$) be a point in $X_k$ with label 'R' that is between $u$ and $s$ as on the right in Figure 2. We know point $v$ exists because $e(u, s)$ would be non-basic otherwise. According to the tie-breaking rule, we note that $a_{tv} \neq c - a_{tv}$. We can also set $a_{tv} < c - a_{tv}$ in order to select $\{s, t, v\}$. Although we should select $\{t, u, v\}$ when $a_{tv} > c - a_{tv}$, we limit our consideration to the former case. Therefore, we have [...]
Corollary 3.3. Let $G$ be a median graph on finite set $X$ and let $d_G$ be the shortest path metric of $G$. If each pair in $X$ has a different value for $d_G$, then $G$ is a tree.
Remark 3.4. As was mentioned in Remark 2.5, the fourth-point condition per se is not a sufficient condition, but it is a necessary condition in order to ensure that a finite metric space is induced by the shortest path metric in a tree (cf. a cycle graph on four vertices with a uniform edge length).
[...] three points $x, y, z \in X$, we have [...] The condition can be confirmed in $O(|X|^3)$ time. If $M$ is a spanning path metric space, it is realised by the unique shortest path that joins the farthest two points in $X$.
Proof. We only prove the first statement. The three-point condition obviously holds for all spanning path metric spaces. Therefore, we assume that the three-point condition holds and show that the basic geodesic graph, $G_M$, in $K_M$ is a path graph on $X$. It is clear that $y$ is the fourth point, $p^*$, for $\{x, y, z\}$ when the left-hand side equals $d_M(z, x)$. This means that the fourth-point condition automatically holds for any finite metric space that satisfies the three-point condition. Therefore, our assumption implies that $G_M$ is a tree on $X$. The three-point condition also indicates that every vertex in $G_M$ has a degree of one or two. In other words, if vertex $x$ has degree three or more, then any three distinct vertices adjacent to $x$ would violate the three-point condition. Hence, $G_M$ is a path graph on $X$, which completes the proof.
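The statement that a spanning path metric space is realised by the path joining the two farthest points suggests a simple recognition procedure. The sketch below is an illustration only and is not the three-point test itself: it assumes a symmetric distance matrix `d`, and the function name and tolerance are hypothetical. It picks a farthest pair, sorts all points by their distance from one end, and checks that every pairwise distance equals the difference of the two distances from that end.

```python
def spanning_path_order(d, tol=1e-9):
    """Return the vertex order of a path realising d, or None if no such path exists."""
    n = len(d)
    if n == 0:
        return []
    # One endpoint of a farthest pair serves as the origin of the path.
    a, _ = max(
        ((i, j) for i in range(n) for j in range(i, n)),
        key=lambda ij: d[ij[0]][ij[1]],
    )
    order = sorted(range(n), key=lambda p: d[a][p])
    for i in range(n):
        for j in range(i + 1, n):
            p, q = order[i], order[j]
            # On a path starting at a, d(p, q) must equal d(a, q) - d(a, p).
            if abs(d[p][q] - (d[a][q] - d[a][p])) > tol:
                return None
    return order
```

If the function returns an order, consecutive points in it are joined by the edges of the path, with weights read off from `d`.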

Discussion
The hyperbolicity of finite metric spaces (or graphs) is a concept provided by Gromov [8,14] and measures the deviation of a metric space from Buneman's four-point condition. If a metric space, $M$, satisfies the four-point condition, then the hyperbolicity of $M$ equals 0, and $M$ is said to be 0-hyperbolic. As was previously discussed, any complete graph with a uniform edge length is 0-hyperbolic. Because the four-point condition is a stronger version of the triangle inequality, all metric triangles are also 0-hyperbolic. Therefore, although the value of hyperbolicity is usually called the 'tree-likeness' of $M$, a more precise interpretation refers to the partially labelled tree-likeness of $M$. As a final remark, we provide the notion of a fully labelled tree-likeness of $M$.
Let us say that finite metric space $M$ is $\rho$-roundabout. Here, $\rho$ is defined to be $\max_{x,y,z \in X}$ [...] The degree of violation of the three-point condition similarly provides the spanning path-likeness of $M$, that is, the maximum discrepancy between the left- and right-hand sides of the triangle inequality. On the other hand, hyperbolicity does not provide any information because all metric triangles are 0-hyperbolic.