Least resolved trees for two-colored best match graphs

2-colored best match graphs (2-BMGs) form a subclass of sink-free bi-transitive graphs that appears in phylogenetic combinatorics. There, 2-BMGs describe evolutionarily most closely related genes between a pair of species. They are explained by a unique least resolved tree (LRT). Introducing the concept of support vertices we derive an $O(|V|+|E|\log^2|V|)$-time algorithm to recognize 2-BMGs and to construct its LRT. The approach can be extended to also recognize binary-explainable 2-BMGs with the same complexity. An empirical comparison emphasizes the efficiency of the new algorithm.


Introduction
Best match graphs have recently been introduced in phylogenetic combinatorics to formalize the notion of a gene y in species 2 being an evolutionarily closest relative of a gene x in species 1, i.e., y is a best match for x [5]. The best matches between genes of two species form a bipartite directed graph, the 2-colored best match graph or 2-BMG, which is determined by the phylogenetic tree describing the evolution of the genes. 2-BMGs are characterized by four local properties [5,8] that relate them to previously studied classes of digraphs:
Definition 1. A bipartite digraph G = (L, E) is a 2-BMG if it satisfies
(N0) Every vertex has at least one out-neighbor, i.e., G is sink-free.
(N1) If u and v are two independent vertices, then there exist no vertices w and t such that (u, t), (v, w), (t, w) ∈ E.
(N2) For any four vertices u1, u2, u3, u4 with (u1, u2), (u2, u3), (u3, u4) ∈ E, we have (u1, u4) ∈ E, i.e., G is bi-transitive.
(N3) For any two vertices u and v with a common out-neighbor, if there exists no vertex w such that either (u, w), (w, v) ∈ E, or (v, w), (w, u) ∈ E, then u and v have the same in-neighbors and either all out-neighbors of u are also out-neighbors of v or all out-neighbors of v are also out-neighbors of u.
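Since all four conditions of Definition 1 are local, they can be tested by brute force on small examples. The following Python sketch does exactly that; the function name `satisfies_n0_n3` is our own, bipartiteness of the input is assumed rather than checked, and the O(|V|⁴) worst case of the (N2) loop makes it a sanity check only, not a recognition algorithm.

```python
from itertools import product


def satisfies_n0_n3(vertices, arcs):
    """Brute-force test of (N0)-(N3) from Definition 1.

    vertices: iterable of hashable vertices; arcs: set of pairs (u, v).
    The digraph is assumed to be bipartite; this is not checked here.
    """
    V = list(vertices)
    out = {x: {b for (a, b) in arcs if a == x} for x in V}
    inn = {x: {a for (a, b) in arcs if b == x} for x in V}

    # (N0): every vertex has an out-neighbor (sink-freeness)
    if any(not out[x] for x in V):
        return False

    for u, v in product(V, V):
        if u == v:
            continue
        # (N1): independent u, v admit no arcs (u, t), (v, w), (t, w)
        if (u, v) not in arcs and (v, u) not in arcs:
            if any((t, w) in arcs for t in out[u] for w in out[v]):
                return False
        # (N3): common out-neighbor and no 2-path between u and v
        if out[u] & out[v]:
            linked = any(((u, w) in arcs and (w, v) in arcs)
                         or ((v, w) in arcs and (w, u) in arcs) for w in V)
            if not linked:
                if inn[u] != inn[v]:
                    return False
                if not (out[u] <= out[v] or out[v] <= out[u]):
                    return False

    # (N2): bi-transitivity along every directed path u1 -> u2 -> u3 -> u4
    for u1 in V:
        for u2 in out[u1]:
            for u3 in out[u2]:
                for u4 in out[u3]:
                    if (u1, u4) not in arcs:
                        return False
    return True


# best matches of the tree ((a1, b1), b2), a* colored 1, b* colored 2
print(satisfies_n0_n3({"a1", "b1", "b2"},
                      {("a1", "b1"), ("b1", "a1"), ("b2", "a1")}))  # True
print(satisfies_n0_n3({"a1", "b1"}, {("a1", "b1")}))  # False: b1 is a sink
```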
Sink-free graphs have appeared in particular in the context of graph semigroups [1] and graph orientation problems [3]. Bi-transitive graphs were introduced in [4] in the context of oriented bipartite graphs and investigated in more detail in [7,8]. The class of graphs satisfying (N1), (N2), and (N3) is characterized by a system of forbidden induced subgraphs [12], see Thm. 2 below.
In general, best match graphs (BMGs) are defined as vertex-colored digraphs (G, σ), where the vertex coloring σ assigns to each gene x the species σ(x) in which it resides. The subgraphs of a BMG (G, σ) induced by the vertices of two distinct colors form a 2-BMG. Note that in this context the vertex coloring is assigned a priori, while Def. 1 induces a coloring that is unique only up to relabeling of the colors independently on each (weakly) connected component of G. For each BMG (G, σ), there is a unique least resolved leaf-colored tree (T*, σ) with leaves corresponding to the vertices of (G, σ) such that the arcs in (G, σ) are the best matches w.r.t. (T*, σ) (cf. Def. 2 below). Fig. 1 shows an example of a 2-BMG together with its least resolved tree. Using certain sets of rooted triples that can be inferred from the 2-colored induced subgraphs of (G, σ) with three vertices, it is possible to determine in polynomial time whether (G, σ) is a BMG and, if so, to construct the least resolved tree (T*, σ) [5,9]. This work also describes $O(|V|^3)$-time algorithms for the recognition of 2-BMGs and the construction of the LRT of a given 2-BMG.
In this contribution, we derive an alternative characterization of 2-BMGs that avoids the use of rooted triples. This gives rise to an alternative, efficient algorithm for the recognition of 2-BMGs and the construction of the least resolved tree. The contribution is organized as follows: In Sec. 2, we introduce the necessary notation and review some results from the published literature that are needed later on. Sec. 3 is concerned with a more detailed analysis of the least resolved trees (LRTs) of BMGs with an arbitrary number of colors. We then turn to the peculiar properties of the LRTs of 2-BMGs in Sec. 4. To this end, we introduce the concept of "support leaves" that uniquely determine the LRT. The main result of this section is Thm. 4, which shows that the support leaves of the root can be identified directly in the 2-BMG. In Sec. 5, we turn Thm. 4 into an efficient algorithm for recognizing 2-BMGs and constructing their LRTs. Computational experiments demonstrate the performance gain in practice. In Sec. 6, we extend the algorithmic approach to binary-explainable 2-BMGs, a subclass that features an additional forbidden induced subgraph.

Preliminaries
Let T = (V, E) be a tree with root ρ and leaf set L := L(T) ⊂ V. The set of inner vertices of T is V⁰(T) := V \ L; in particular, ρ is an inner vertex. An edge e = uv ∈ E(T) is called an inner edge of T if u and v are both inner vertices. Otherwise, it is called an outer edge. We consider leaf-colored trees (T, σ) and write σ(A) := {σ(x) | x ∈ A} for subsets A ⊆ L(T). For the edges uv ∈ E(T), we use the convention that uv ∈ E means v ≺_T u, i.e., v is a child of u. We write child_T(u) for the set of children of u in T and T(u) for the subtree of T rooted in u. The least common ancestor lca_T(A) is the unique ⪯_T-smallest vertex that is an ancestor of all vertices in A. Writing lca_T(x, y) := lca_T({x, y}), we have
Definition 2. Let (T, σ) be a leaf-colored tree. A leaf y ∈ L(T) is a best match of the leaf x ∈ L(T) if σ(x) ≠ σ(y) and lca_T(x, y) ⪯_T lca_T(x, y′) holds for all leaves y′ of color σ(y′) = σ(y).
Given (T, σ), the graph G(T, σ) = (V, E) with vertex set V = L(T ), vertex coloring σ, and with arcs (x, y) ∈ E if and only if y is a best match of x w.r.t. (T, σ) is called the best match graph (BMG) of (T, σ) [5].
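Definition 2 can be evaluated directly: for a leaf x and a color c ≠ σ(x), the best matches of x of color c are exactly the c-colored leaves below the lowest ancestor of x whose subtree contains color c, because every such leaf y has lca_T(x, y) equal to that ancestor. The Python sketch below assumes our own encoding of (T, σ): a dict `children` mapping each inner vertex to its list of children, and a dict `color` whose keys are the leaves. It favors clarity over speed.

```python
def best_match_graph(children, color):
    """Arc set of G(T, sigma) following Definition 2.

    children: dict mapping each inner vertex to the list of its children.
    color:    dict mapping each leaf to its color.
    """
    parent = {c: p for p, cs in children.items() for c in cs}

    def leaves(v):
        # leaf set L(T(v)) of the subtree rooted at v
        if v not in children:
            return [v]
        return [y for c in children[v] for y in leaves(c)]

    arcs = set()
    colors = set(color.values())
    for x in color:
        for c in colors - {color[x]}:
            u = x
            # climb to the lowest ancestor of x whose subtree contains color c
            while not any(color[y] == c for y in leaves(u)):
                u = parent[u]
            # each c-colored leaf below u has lca(x, y) = u: a best match of x
            arcs |= {(x, y) for y in leaves(u) if color[y] == c}
    return arcs


example_tree = {"r": ["a1", "u"], "u": ["b1", "w"], "w": ["a2", "b2"]}
example_sigma = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
print(sorted(best_match_graph(example_tree, example_sigma)))
# [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b2'), ('b1', 'a2'), ('b2', 'a2')]
```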
We say that (G, σ) is an ℓ-BMG if σ : V(G) → S is surjective and |S| = ℓ. Given a directed graph G = (V, E), we denote the set of out-neighbors of a vertex x ∈ V by N(x) := {y ∈ V | (x, y) ∈ E(G)} and the out-degree |N(x)| of x by outdeg(x). Similarly, N⁻(x) := {y ∈ V | (y, x) ∈ E(G)} denotes the set of in-neighbors. By construction, the coloring σ of a BMG (G, σ) is proper, i.e., x ∈ N(y) implies σ(x) ≠ σ(y), and there is at least one best match of x for every color s ∈ σ(V) \ {σ(x)}. In particular, therefore, we have N(x) ≠ ∅ for every vertex x of a 2-BMG, i.e., every 2-BMG is sink-free. Note that BMGs will in general have sources, i.e., N⁻(x) may be empty. We write G[W] for the subgraph of G induced by W ⊆ V. Following [13], we say that T′ is displayed by T, in symbols T′ ≤ T, if the tree T′ can be obtained from a subtree of T by contraction of edges. For leaf-colored trees, we say that (T, σ) displays or is a refinement of (T′, σ′) whenever T′ ≤ T and σ(v) = σ′(v) for all v ∈ L(T).

Definition 4. An edge e ∈ E(T) is redundant with respect to G(T, σ) if the tree T_e obtained by contracting the edge e satisfies G(T_e, σ) = G(T, σ).
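Definition 4 suggests a naive test: contract the edge and compare the two BMGs. The Python sketch below assumes a tree encoding of our own (a dict from inner vertices to child lists, leaves as keys of the coloring) and includes a small best-match routine following Def. 2 so that it runs on its own. The helper names `contract` and `is_redundant` are hypothetical, and only inner edges uv with v a child of u are handled, since contracting an outer edge would remove a leaf.

```python
def best_match_graph(children, color):
    """Arcs of G(T, sigma) per Def. 2 (children: inner vertex -> list)."""
    parent = {c: p for p, cs in children.items() for c in cs}

    def leaves(v):
        return [v] if v not in children else [y for c in children[v] for y in leaves(c)]

    arcs = set()
    for x in color:
        for c in set(color.values()) - {color[x]}:
            u = x
            while not any(color[y] == c for y in leaves(u)):
                u = parent[u]  # lowest ancestor of x that sees color c
            arcs |= {(x, y) for y in leaves(u) if color[y] == c}
    return arcs


def contract(children, u, v):
    """Tree T_e obtained by contracting the inner edge e = uv (v a child of u)."""
    new = {p: list(cs) for p, cs in children.items() if p != v}
    new[u] = [c for c in children[u] if c != v] + list(children[v])
    return new


def is_redundant(children, color, u, v):
    """Def. 4: e = uv is redundant iff G(T_e, sigma) = G(T, sigma)."""
    return best_match_graph(contract(children, u, v), color) == best_match_graph(children, color)


sigma = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
t1 = {"r": ["a1", "u"], "u": ["b1", "w"], "w": ["a2", "b2"]}  # (a1, (b1, (a2, b2)))
t2 = {"r": ["a1", "u"], "u": ["a2", "w"], "w": ["b1", "b2"]}  # (a1, (a2, (b1, b2)))
print(is_redundant(t1, sigma, "r", "u"), is_redundant(t1, sigma, "u", "w"))  # False False
print(is_redundant(t2, sigma, "u", "w"), is_redundant(t2, sigma, "r", "u"))  # True False
```

In t2, contracting uw leaves every best match unchanged, whereas in t1 each contraction creates new arcs (e.g., (b1, a1) after contracting ru).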
We will need the following characterization of redundant edges:
In the following, we will frequently need the restriction of the coloring σ on G or L(T) to a subset of vertices or leaves. Since in situations like (G_i, σ|V(G_i)) the set to which σ is restricted is clear, we will write σ|. to keep the notation less cluttered.
BMGs can also be understood in terms of their connected components:
Proposition 1. (G, σ) is an ℓ-BMG if and only if all its connected components are ℓ-BMGs.
As a simple consequence of Prop. 1 and by definition of ℓ-BMGs, all connected components (G_i, σ|.) and (G_j, σ|.) of an ℓ-BMG satisfy σ(V(G_i)) = σ(V(G_j)) and |σ(V(G_j))| = ℓ. For our purposes, it will also be important to relate the structure of a tree (T, σ) to the connectedness of the BMG G(T, σ) that it explains.
Proposition 2 ([5], Thm. 1). Let (T, σ) be a leaf-colored tree and G(T, σ) its BMG. Then G(T, σ) is connected if and only if there is a child v of the root ρ such that σ(L(T(v))) ≠ σ(L(T)). Furthermore, if G(T, σ) is not connected, then for every connected component G_i of G(T, σ) there is a child v of the root ρ such that V(G_i) ⊆ L(T(v)).
Moreover, 2-BMGs can be characterized by three types of forbidden subgraphs [12]. To this end we will need the following classes of small bipartite graphs: Definition 5 (F1-, F2-, and F3-graphs).
Although we aim at avoiding the use of triples in the final results, we will need them during our discussion. A triple ab|c is a rooted tree t on three pairwise distinct vertices {a, b, c} such that lca_t(a, b) ≺_t lca_t(a, c) = lca_t(b, c) = ρ, where ρ denotes the root of t. A set R of triples is consistent if there is a tree T that displays all triples in R. Given a vertex-colored graph (G, σ), we define its set of informative triples [5,11] as
$R(G,\sigma) := \{\, ab|b' \;:\; \sigma(a) \neq \sigma(b) = \sigma(b'),\ (a,b) \in E(G),\ (a,b') \notin E(G) \,\}$   (1)
Lemma 2 ([11], Lemma 2.8 and 2.9). If (G, σ) is a BMG, then every tree (T, σ) that explains (G, σ) displays all triples t ∈ R(G, σ). Moreover, if the triples ab|b′ and cb′|b are informative for (G, σ), then every tree (T, σ) that explains …
Observation 3. Let (T, σ) be a tree explaining the BMG (G, σ), and v ∈ V(T) a vertex such that σ(L(T(v))) = σ(L(T)). Then (a, b) ∈ E(G) and a ∈ L(T(v)) imply b ∈ L(T(v)).
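The set R(G, σ) of Eq. (1) is easy to enumerate. In the Python sketch below, a triple ab|b′ is encoded, by our own convention, as the pair (frozenset({a, b}), b′), since ab|b′ and ba|b′ denote the same rooted tree; the function name is hypothetical. The double loop over arcs and vertices takes O(|E|·|V|) time.

```python
def informative_triples(vertices, arcs, color):
    """R(G, sigma) from Eq. (1): all ab|b' with (a, b) in E, (a, b') not in E
    and sigma(a) != sigma(b) = sigma(b').  Triples are encoded as
    (frozenset({a, b}), b')."""
    triples = set()
    for a, b in arcs:
        if color[a] == color[b]:
            continue  # cannot occur in a properly colored digraph
        for b2 in vertices:
            if b2 != b and color[b2] == color[b] and (a, b2) not in arcs:
                triples.add((frozenset({a, b}), b2))
    return triples


# best matches of the tree ((a1, b1), b2), a* colored 1, b* colored 2
V = {"a1", "b1", "b2"}
E = {("a1", "b1"), ("b1", "a1"), ("b2", "a1")}
sigma = {"a1": 1, "b1": 2, "b2": 2}
R = informative_triples(V, E, sigma)  # exactly one triple: a1b1|b2
print(len(R))  # 1
```

As expected from Lemma 2, the single triple a1b1|b2 is displayed by the tree ((a1, b1), b2) that explains this graph.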
Finally, there is a close connection between subtrees of T and subgraphs of G(T, σ). We have

Properties of Least Resolved Trees
In this short section we derive some helpful properties of LRTs which we will use repeatedly throughout this work.
The converse of Lemma 4, however, is not true, i.e., a tree (T, σ) for which G(T(v), σ|.) is connected for every v ∈ V(T) with v ≺_T ρ_T is not necessarily least resolved. To see this, consider the caterpillar tree (T, σ) given by (x″, (x′, (x, y))) with σ(x) = σ(x′) = σ(x″) ≠ σ(y) and u = lca_T(x, x′). It is easy to verify that the BMG of each subtree of T is connected. However, the edge ρ_T u is redundant.
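Both claims about the caterpillar can be checked mechanically. The Python sketch below, with hypothetical helper names and our own tree encoding (a dict from inner vertices to child lists), computes the BMG of every subtree T(v) under the restricted coloring, tests weak connectivity, and then compares G(T, σ) with the BMG of the tree in which ρ_T u is contracted. We write x1 for x′ and x2 for x″.

```python
def best_match_graph(children, color):
    """Arcs of G(T, sigma) per Def. 2 (children: inner vertex -> list)."""
    parent = {c: p for p, cs in children.items() for c in cs}

    def leaves(v):
        return [v] if v not in children else [y for c in children[v] for y in leaves(c)]

    arcs = set()
    for x in color:
        for c in set(color.values()) - {color[x]}:
            u = x
            while not any(color[y] == c for y in leaves(u)):
                u = parent[u]  # lowest ancestor of x that sees color c
            arcs |= {(x, y) for y in leaves(u) if color[y] == c}
    return arcs


def descendants(children, v):
    """All vertices of the subtree T(v), including v itself."""
    return [v] + [w for c in children.get(v, []) for w in descendants(children, c)]


def weakly_connected(vertices, arcs):
    adj = {x: set() for x in vertices}
    for a, b in arcs:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for y in adj[stack.pop()] - seen:
            seen.add(y)
            stack.append(y)
    return seen == set(adj)


# caterpillar (x2, (x1, (x, y))) with sigma(x) = sigma(x1) = sigma(x2) != sigma(y)
T = {"rho": ["x2", "u"], "u": ["x1", "w"], "w": ["x", "y"]}
sigma = {"x": 1, "x1": 1, "x2": 1, "y": 2}

all_connected = True
for v in descendants(T, "rho"):
    sub = set(descendants(T, v))
    sub_children = {p: cs for p, cs in T.items() if p in sub}
    sub_color = {x: c for x, c in sigma.items() if x in sub}
    all_connected &= weakly_connected(sub_color, best_match_graph(sub_children, sub_color))

contracted = {"rho": ["x2", "x1", "w"], "w": ["x", "y"]}  # T with edge rho-u contracted
redundant = best_match_graph(contracted, sigma) == best_match_graph(T, sigma)
print(all_connected, redundant)  # True True
```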
… is a BMG and, therefore, properly colored. But then G(T(v), σ|.) is disconnected, a contradiction to Lemma 4.
As a consequence, we find
Corollary 1. Let (T, σ) be the least resolved tree of some BMG (G, σ). Then any vertex v ∈ V(T) with v ≺_T ρ_T is an inner vertex if and only if |σ(L(T(v)))| > 1.

Support Leaves
In this section, we introduce "support leaves" as a means to recursively construct the LRT of a 2-BMG. The main result of this section shows that these leaves can be inferred directly from the BMG without any further knowledge of the corresponding LRT. We start with a technical result similar to Cor. 3 in [5]; here we use a much simpler, more convenient notation.
Lemma 6. Let (T, σ) be the least resolved tree of a 2-colored BMG (G, σ). Then, for every vertex … σ(L(T(u))). Since (T, σ) is 2-colored, the latter immediately implies |σ(L(T(v)))| = 1 and, by Cor. 1, v is a leaf. Thus every u ∈ V⁰(T) \ {ρ_T} has a leaf v among its children, i.e., child_T(u) ∩ L(T) ≠ ∅. If, in addition, (G, σ) is connected, we can apply the same argument to u = ρ_T and conclude that a leaf v is attached to ρ_T.
Lemma 6 states that, in the least resolved tree of a connected 2-colored BMG, every inner vertex u is adjacent to at least one leaf and is thus, in a way, "supported" by it.
Definition 6 (Support Leaves). For a given tree T, the set S_u := child_T(u) ∩ L(T) is the set of all support leaves of vertex u ∈ V(T).
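Computing support leaves is a one-liner once the tree is stored as a dict from inner vertices to child lists (our own encoding; the function name is hypothetical). For the tree (a1, (b1, (a2, b2))), which is the LRT of a connected 2-BMG, every inner vertex has a non-empty set of support leaves, as Lemma 6 predicts.

```python
def support_leaves(children, u):
    """S_u = child_T(u) ∩ L(T): the children of u that are leaves."""
    return {c for c in children.get(u, []) if c not in children}


tree = {"r": ["a1", "u"], "u": ["b1", "w"], "w": ["a2", "b2"]}
for v in tree:
    print(v, sorted(support_leaves(tree, v)))
# r ['a1']
# u ['b1']
# w ['a2', 'b2']
```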
Note that Lemma 6 is in general not true for ℓ-BMGs with ℓ ≥ 3, as exemplified by the (least-…
As a simple consequence of Prop. 2 and Cor. 1, we find …
Proof. Let v ∈ child_T(ρ) ∩ V⁰(T) = child_T(ρ) \ S_ρ and consider the BMG G(T(v), σ|.). By Lemma 4 and Lemma 3, … then the statement is trivially satisfied. Therefore, suppose that |child_T(ρ) \ S_ρ| > 1. Hence, it remains to show that there are no arcs between G(T(v), σ|.) and … Cor. 1 and v ≺_T ρ imply that T(v) contains both colors. Thus, by Obs. 3, there are no out-arcs to any vertex in L(T) \ L(T(v)); in particular, there are no out-arcs (x, y) with x ⪯_T v and y ⪯_T w. By symmetry, the same holds for w; thus we can conclude that there are no arcs (y, x) either. From the observation that each x ∈ L(T) \ S_ρ must be located below some v ∈ child_T(ρ) ∩ V⁰(T), it now immediately follows that (G − S_ρ, σ|.) consists exactly of these connected components, as stated.
As a consequence, we have
Corollary 3. Let (T, σ) with root ρ be the LRT of a 2-BMG (G, σ). Then each child of ρ is either one of the support leaves S_ρ of ρ or the root of the LRT for a connected component of (G − S_ρ, σ|.).
Proof. Let (T, σ) with root ρ be the least resolved tree for ( G, σ). The support leaves S ρ are children of ρ by definition. By Lemma 7, the connected components of ( G − S ρ , σ |. ) are exactly the BMGs G(T (v), σ |. ) with v ∈ child T (ρ) \ S ρ . Moreover, by Lemma 3, the subtrees T (v) with v ∈ child T (ρ) \ S ρ are exactly the unique LRTs for these BMGs.
In order to use this property as a means of constructing the LRT in a recursive manner, we need to identify the support leaves of the root S ρ directly from the 2-BMG ( G, σ) without constructing the LRT first. To this end, we consider the set of umbrella vertices U ( G, σ) comprising all vertices x for which N (x) consists of all vertices of V ( G) that have the color distinct from σ(x).
The intuition behind this definition is that every support leaf of the root of the LRT of a 2-BMG must have all differently colored vertices as out-neighbors, i.e., they are umbrella vertices. We now define "support sets" of graphs as particular subsets of umbrella vertices. As we shall see later, support sets are closely related to support vertices in S ρ .
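A direct, quadratic-time way to compute U(G, σ) compares each out-neighborhood with the set of all differently colored vertices. The Python sketch below (function name hypothetical) does this for the BMG of the tree (a1, (b1, (a2, b2))); here a1, the support leaf of the root, is the only umbrella vertex. A faster, degree-based approach is described in Sec. 5.

```python
def umbrella_vertices(vertices, arcs, color):
    """U(G, sigma): vertices whose out-neighborhood is exactly the set of
    all vertices of a different color."""
    out = {x: set() for x in vertices}
    for a, b in arcs:
        out[a].add(b)
    return {x for x in vertices
            if out[x] == {y for y in vertices if color[y] != color[x]}}


# best matches of the tree (a1, (b1, (a2, b2))), a* colored 1, b* colored 2
V = {"a1", "a2", "b1", "b2"}
E = {("a1", "b1"), ("a1", "b2"), ("b1", "a2"), ("a2", "b2"), ("b2", "a2")}
sigma = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
print(umbrella_vertices(V, E, sigma))  # {'a1'}
```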
Proof. Assume, for contradiction, that (G, σ) has (at least) two distinct support sets S, S′ ⊆ U(G, σ). Clearly, neither of them can be a subset of the other, since support sets are maximal.
Together with the fact that S, S′, and thus S ∪ S′, are all subsets of U(G, σ), this contradicts the maximality of both S and S′.
For the construction of the support set S := S(G, σ), we consider the following sequence of sets, defined recursively by
S^(0) := U(G, σ) and S^(k+1) := {x ∈ S^(k) | N⁻(x) ⊆ S^(k)} for k ≥ 0.
By construction, S^(k+1) ⊆ S^(k). Furthermore, there is a k < |V(G)| such that S^(k+1) = S^(k). Next, we show that in a 2-BMG, S is obtained in a single iteration.
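The recursion can be run until it stabilizes. The Python sketch below (function name hypothetical) returns the whole sequence S^(0), S^(1), …, whose last element is the stable set. For the BMG of the tree (a1, (a2, b1)), the umbrella vertex a2 is removed in the first step because its in-neighbor b1 is not an umbrella vertex, and the sequence stabilizes after a single iteration, in line with the claim above.

```python
def support_set_sequence(vertices, arcs, color):
    """Iterate S^(0) = U(G, sigma), S^(k+1) = {x in S^(k) | N^-(x) <= S^(k)}
    and return [S^(0), S^(1), ...] up to the first set with S^(k+1) = S^(k)."""
    out = {x: set() for x in vertices}
    inn = {x: set() for x in vertices}
    for a, b in arcs:
        out[a].add(b)
        inn[b].add(a)
    S = {x for x in vertices
         if out[x] == {y for y in vertices if color[y] != color[x]}}
    seq = [S]
    while True:
        nxt = {x for x in S if inn[x] <= S}  # keep x only if all in-neighbors lie in S
        if nxt == S:
            return seq
        S = nxt
        seq.append(S)


# best matches of the tree (a1, (a2, b1)), a* colored 1, b1 colored 2
V = {"a1", "a2", "b1"}
E = {("a1", "b1"), ("a2", "b1"), ("b1", "a2")}
sigma = {"a1": 1, "a2": 1, "b1": 2}
seq = support_set_sequence(V, E, sigma)
print([sorted(s) for s in seq])  # [['a1', 'a2'], ['a1']]
```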
First, suppose that (G, σ) is not connected. Then it immediately follows from Prop. 2 that σ(L(T(v))) = σ(L(T)), and thus |σ(L(T(v)))| > 1, for every v ∈ child_T(ρ). The latter together with Cor. 1 implies that every child of ρ must be an inner vertex in T. Hence, S_ρ = ∅. On the other hand, since (G, σ) is not connected, each of its connected components is a 2-BMG (cf. Prop. 1) and thus contains both colors. Therefore, for each vertex x in G, we can find a vertex y with σ(x) ≠ σ(y) such that (x, y), (y, x) ∉ E, and thus x ∉ S. Since this is true for every vertex in G, we can conclude S = ∅ = S_ρ.
Now, suppose that (G, σ) is connected. By Cor. 2, we have S_ρ ≠ ∅. We first show S_ρ ⊆ S. Let x ∈ S_ρ. By definition, x satisfies lca_T(x, y) = ρ and therefore (x, y) ∈ E for all y ∈ L(T) with σ(y) ≠ σ(x), i.e., x has an out-arc to every differently colored vertex in G. By definition, we thus have x ∈ U. Now assume, for contradiction, that x ∉ S^(1) = S. The latter implies that there exists a vertex y ∈ N⁻(x) such that y ∉ U. In particular, (y, x) ∈ E. Since y ∉ U, there is some vertex x′ with σ(x′) = σ(x) such that (y, x′) ∉ E. Together, this implies that xy|x′ is an informative triple. By Lemma 2, we obtain lca_T(x, y) ≺_T lca_T(x, x′) = lca_T(x′, y) ⪯_T ρ, a contradiction to the assumption that x is a support leaf of ρ. Thus x ∈ S.
Next, we show by contraposition that S ⊆ S_ρ. To this end, suppose that x is not a support leaf of ρ, i.e., x ∉ S_ρ. Hence, there is an inner vertex v ∈ child_T(ρ) ∩ V⁰(T) such that x ≺_T v. By Cor. 1, we conclude that |σ(L(T(v)))| = 2, i.e., the subtree T(v) contains both colors. We now distinguish two cases: (i) there is a leaf y′ ∈ L(T) \ L(T(v)) with σ(y′) ≠ σ(x), and (ii) there is no leaf y′ ∈ L(T) \ L(T(v)) with σ(y′) ≠ σ(x).

Case (ii):
Suppose that there is no leaf y′ ∈ L(T) \ L(T(v)) with σ(y′) ≠ σ(x). We continue by showing that there is a support leaf y of vertex v with σ(y) ≠ σ(x). Assume, for contradiction, that this is not the case. Since (T, σ) is least resolved, the inner edge ρv is not redundant. Hence, by Lemma 1, there must be an arc (a, b) ∈ E such that lca_T(a, b) = v and σ(b) ∈ σ(L(T) \ L(T(v))). Since there is no leaf y′ ∈ L(T) \ L(T(v)) with σ(y′) ≠ σ(x), we conclude that σ(b) = σ(x) and σ(a) ≠ σ(x). Clearly, a, b ∈ L(T(v)). Now consider an arbitrary a′ ∈ L(T(v)) with σ(a′) ≠ σ(x). Since, by assumption, no such a′ is a support leaf of v, there must be an inner vertex w ∈ child_T(v) with a′ ≺_T w. By Cor. 1 and since w ≺_T v ≺_T ρ, we conclude that |σ(L(T(w)))| = 2, i.e., the subtree T(w) contains both colors. Thus there is some b′ with σ(b′) = σ(x) and lca_T(a′, b′) ⪯_T w ≺_T v. Since a′ was chosen arbitrarily, we conclude that there cannot be an arc (a, b) ∈ E such that lca_T(a, b) = v, a contradiction. It follows that there is a support leaf y of vertex v with σ(y) ≠ σ(x). Hence, lca_T(x, y) = v ⪯_T lca_T(x′, y) for all x′ ∈ L(T) with σ(x′) = σ(x), and thus (y, x) ∈ E and y ∈ N⁻(x). Since S_ρ ≠ ∅ and σ(y) ∉ σ(L(T) \ L(T(v))), there must be a leaf x′ ∈ S_ρ with σ(x′) = σ(x). The fact that lca_T(x, y) = v ≺_T ρ = lca_T(x′, y) implies (y, x′) ∉ E. Therefore, and since σ(x′) ≠ σ(y), it follows that y ∉ U. Together with y ∈ N⁻(x), we conclude that x ∉ S^(1) = S.
In summary, we have shown S = S_ρ for every 2-BMG (G, σ). Finally, S = S_ρ together with Cor. 2 implies that S ≠ ∅ if and only if (G, σ) is connected, which completes the proof.
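Thm. 4 can be spot-checked on small LRTs: compute G(T, σ) from the tree, compute the support set as the limit of the sequence S^(k), and compare it with the support leaves of the root. The Python sketch below uses three trees that, by a hand check against Definition 4, are least resolved: a connected case, a connected case with U(G, σ) ≠ S, and a disconnected case. The tree encoding and all function names are our own assumptions.

```python
def best_match_graph(children, color):
    """Arcs of G(T, sigma) per Def. 2 (children: inner vertex -> list)."""
    parent = {c: p for p, cs in children.items() for c in cs}

    def leaves(v):
        return [v] if v not in children else [y for c in children[v] for y in leaves(c)]

    arcs = set()
    for x in color:
        for c in set(color.values()) - {color[x]}:
            u = x
            while not any(color[y] == c for y in leaves(u)):
                u = parent[u]  # lowest ancestor of x that sees color c
            arcs |= {(x, y) for y in leaves(u) if color[y] == c}
    return arcs


def support_set(vertices, arcs, color):
    """Limit of S^(0) = U(G, sigma), S^(k+1) = {x in S^(k) | N^-(x) <= S^(k)}."""
    out = {x: set() for x in vertices}
    inn = {x: set() for x in vertices}
    for a, b in arcs:
        out[a].add(b)
        inn[b].add(a)
    S = {x for x in vertices if out[x] == {y for y in vertices if color[y] != color[x]}}
    while True:
        nxt = {x for x in S if inn[x] <= S}
        if nxt == S:
            return S
        S = nxt


def support_leaves_of_root(children, root):
    return {c for c in children[root] if c not in children}


sigma = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
examples = [
    {"r": ["a1", "u"], "u": ["b1", "w"], "w": ["a2", "b2"]},  # connected BMG
    {"r": ["a1", "u"], "u": ["a2", "b1"]},                    # connected, U != S
    {"r": ["u", "w"], "u": ["a1", "b1"], "w": ["a2", "b2"]},  # disconnected BMG
]
results = []
for t in examples:
    leaves = {c for cs in t.values() for c in cs if c not in t}
    col = {x: sigma[x] for x in leaves}
    S = support_set(set(col), best_match_graph(t, col), col)
    results.append((S, support_leaves_of_root(t, "r")))
print(all(S == S_rho for S, S_rho in results))  # True
```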

Algorithmic Considerations
Thm. 4 provides not only a convenient necessary condition for connected 2-BMGs but also a fast way of determining the support set S = S_ρ, and thus also a fast recursive approach to construct the LRT of a 2-BMG. It is formalized in Alg. 1 and illustrated in Fig. 2.
Algorithm 1: LRT for connected 2-colored BMGs (G, σ).
Proof. Let (T, σ) be the (unique) least resolved tree of (G, σ) with root ρ. The latter is supplied to Alg. 1 to initialize the tree. By Thm. 4, Lemma 9, and since (G, σ) is connected, the set of support leaves S_ρ = S^(2) = S^(1) ≠ ∅ of the root ρ is correctly identified in the top-level recursion of Alg. 1 (Lines 2-4) and attached to the root ρ (Lines 8-9). According to Cor. 3, one can now proceed to recursively construct the LRTs for the connected components of (G − S_ρ, σ|.), which is done in Lines 10-15. By Lemma 7, these connected components (G_v, σ|.) are exactly the BMGs G(T(v), σ|.) with v ∈ child_T(ρ) \ S_ρ (Line 14). In particular, therefore, we have V(G_v) = L(T(v)). Since v ∉ S_ρ, i.e., v is an inner vertex, Cor. 1 and v ≺_T ρ imply |σ(L(T(v)))| > 1. Hence, in particular, the condition |V(G_v)| > 1 (cf. Line 11) to proceed recursively is satisfied for each connected component.
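The pseudocode of Alg. 1 is not reproduced here, but its recursion can be reconstructed from the description above: compute U, S^(1), and S^(2); reject unless S^(1) = S^(2) ≠ ∅; attach S^(2) as leaves of a new inner vertex; recurse on the connected components of G − S^(2), rejecting any single-vertex component. The Python sketch below follows that reading under our own naming and tree encoding. It recomputes all sets naively in each step, so it runs in polynomial time but does not attain the O(|V| + |E| log²|V|) bound, which relies on the data structures discussed below. The input is assumed to be a properly 2-colored digraph.

```python
from itertools import count


def lrt_connected_2bmg(vertices, arcs, color):
    """Return (root, children) of the LRT of a connected 2-BMG, or None
    if the properly 2-colored digraph (vertices, arcs) is not one."""
    children, ids = {}, count()
    succ = {x: set() for x in vertices}
    pred = {x: set() for x in vertices}
    for a, b in arcs:
        succ[a].add(b)
        pred[b].add(a)

    def components(vs, adj):
        # weakly connected components of the subgraph induced by vs
        seen, comps = set(), []
        for s in vs:
            if s in seen:
                continue
            comp, stack = {s}, [s]
            seen.add(s)
            while stack:
                for y in adj[stack.pop()]:
                    if y in vs and y not in seen:
                        seen.add(y)
                        comp.add(y)
                        stack.append(y)
            comps.append(comp)
        return comps

    def build(vs):
        out = {x: succ[x] & vs for x in vs}
        inn = {x: pred[x] & vs for x in vs}
        umbrella = {x for x in vs if out[x] == {y for y in vs if color[y] != color[x]}}
        s1 = {x for x in umbrella if inn[x] <= umbrella}
        s2 = {x for x in s1 if inn[x] <= s1}
        if not s2 or s1 != s2:
            return None  # no valid support set: reject
        node = f"v{next(ids)}"
        children[node] = list(s2)  # support leaves of the new inner vertex
        rest = vs - s2
        adj = {x: out[x] | inn[x] for x in rest}
        for comp in components(rest, adj):
            if len(comp) < 2:
                return None  # a singleton component cannot be explained
            sub = build(comp)
            if sub is None:
                return None
            children[node].append(sub)
        return node

    root = build(set(vertices))
    return None if root is None else (root, children)


def canonical(node, children):
    """Nested frozensets for order-independent comparison of trees."""
    if node not in children:
        return node
    return frozenset(canonical(c, children) for c in children[node])


sigma = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
E = {("a1", "b1"), ("a1", "b2"), ("b1", "a2"), ("a2", "b2"), ("b2", "a2")}
res = lrt_connected_2bmg(set(sigma), E, sigma)
print(canonical(*res) == frozenset({"a1", frozenset({"b1", frozenset({"a2", "b2"})})}))  # True
print(lrt_connected_2bmg({"a1", "b1"}, {("a1", "b1")}, {"a1": 1, "b1": 2}))  # None
```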
Theorem 5. Given a connected properly 2-colored digraph ( G, σ) as input, Alg. 1 returns a tree T if and only if ( G, σ) is a 2-colored BMG. In particular, T is the unique least resolved tree for ( G, σ).
Proof. By Lemma 10, Alg. 1 returns the unique least resolved tree T if (G, σ) is a connected 2-colored BMG. To prove the converse, suppose that Alg. 1 returns a tree T given the connected properly 2-colored digraph (G, σ) as input. We will show that (G, σ) = G(T, σ), and thus (G, σ) is a BMG.
It is easy to see that L(T) = V(G) must hold since, in each step of Alg. 1, every vertex is either attached to some inner vertex or passed down to a deeper-level recursion as part of some connected component. Therefore, every vertex of G eventually appears in the output. Thus σ(L(T)) = σ(V(G)) and |σ(L(T))| = |σ(V(G))| = 2. It remains to show E(G) = E(G(T, σ)).
Note first that neither (G, σ) nor G(T, σ) contains arcs between vertices of the same color. Moreover, since Alg. 1 eventually returns a tree, we have S^(1) = S^(2) ≠ ∅ in every recursion step. Throughout the remainder of the proof, we write S^(1)_i and S^(2)_i for the sets S^(1) and S^(2) of the i-th recursion step. Likewise, in every step, each connected component (G_v, σ|.) computed in Line 10 must contain at least two vertices (cf. Line 11), and thus |σ(V(G_v))| = 2 because (G, σ) is properly 2-colored.
First, let S be the support set of G(T, σ) and x ∈ S be arbitrary. Note that the support set is computed in the first iteration step of the algorithm as S = S^(2)_1. By construction of T, x is attached as a leaf to ρ, i.e., lca_T(x, y) = ρ for all y ∈ V(G) \ {x}. Consequently, (x, y) is an arc in G(T, σ) for all y ∈ V(G) with σ(y) ≠ σ(x). By construction of S in Alg. 1, we have x ∈ S ⊆ U, i.e., x is an umbrella vertex in (G, σ) and has out-arcs to every vertex y ∈ V(G) with σ(y) ≠ σ(x). Hence, all arcs of the form (x, y) with x ∈ S and σ(x) ≠ σ(y) exist both in (G, σ) and in G(T, σ). The latter property is in particular satisfied for all vertices in S and hence, all arcs between differently colored elements in S exist both in (G, σ) and in G(T, σ). Now consider an arbitrary vertex y ∈ V(G) \ S. Clearly, all in-neighbors in (G, σ) of the elements in S = S^(2)_1 are contained in S^(1)_1 = S. Hence, y ∉ S and x ∈ S imply that (y, x) is not an arc in (G, σ). Moreover, y ∉ S also implies that y is part of some connected component (G_v, σ|.) of (G − S, σ|.). Therefore, and because Alg. 1 returns T, we must have y ∈ V(G_v) = L(T(v)) for some inner vertex v ∈ child_T(ρ). As argued above, (G_v, σ|.) and thus also the subtree T(v) contain both colors. Together with Obs. 3 and x ∉ L(T(v)), this implies that G(T, σ) does not contain the arc (y, x). By the same arguments, there is no arc (y, x′) in G(T, σ) such that the vertex x′ is contained in a different connected component (G_v′, σ|.) ≠ (G_v, σ|.) of (G − S, σ|.) than y. Since x ∈ S and y ∉ S were chosen arbitrarily, we conclude that (i) any arc incident to some vertex in S exists in (G, σ) if and only if it exists in G(T, σ), and (ii) G(T, σ) contains no arcs between distinct connected components of (G − S, σ|.).
Hence, it remains to consider the arcs within a connected component (G_v, σ|.) of (G − S, σ|.). Alg. 1 recurses on each such connected component (G_v, σ|.) using a newly created vertex v ∈ child_T(ρ) to initialize the tree T(v). By Lemma 3, it clearly holds that, for any x, y ∈ L(T(v)) = V(G_v), (x, y) is an arc in G(T, σ) if and only if it is an arc in G(T(v), σ|.). Thus, it suffices to consider only the subtree T(v). Now, we can apply the same arguments as in the previous recursion step to conclude that all arcs incident to the support set S^(2)_2 constructed in the current recursion step are the same in (G, σ) and G(T, σ), and that neither (G, σ) nor G(T, σ) contains arcs between distinct connected components of (G_v − S^(2)_2, σ|.). Hence, it suffices to consider the connected components of (G_v − S^(2)_2, σ|.). Repeated application of this argument results in a chain of connected components that are contained in each other. Since Alg. 1 finally returns a tree, this chain is finite, say with a last element (G_w − S^(2)_k, σ|.), and thus S^(2)_k = V(G_w). In particular, therefore, every vertex in V(G) is contained in the support set of some recursion step.
The construction in Lines 2-4 of Alg. 1 naturally produces two cases, U = S^(1) = S^(2) and S^(2) ⊆ S^(1) ⊊ U. The following result shows that the latter case implies that the corresponding interior node in the LRT has only a single non-leaf child:
Lemma 11. Let (G, σ) be a 2-BMG and S_ρ the support leaves of the root ρ of its LRT (T, σ). If W := U(G, σ) \ S_ρ ≠ ∅, then the following statements are true:
1. S_ρ ≠ ∅, G is connected, and G − S_ρ is connected.
2. All vertices in U(G, σ) = S_ρ ∪̇ W have the same color,
3. The set of support leaves S_v of the unique inner vertex child v of ρ contains vertices of both colors, and
4. W ⊊ S_v.
Proof. First recall that, by Thm. 4 and the definition of the support set S of (G, σ), we have S_ρ = S ⊆ U(G, σ), and thus U(G, σ) = S_ρ ∪̇ W. Moreover, by Lemma 7, the connected components of (G − S_ρ, σ|.) are exactly the BMGs G(T(v), σ|.) with v ∈ child(ρ) \ S_ρ. The vertices v ∈ child(ρ) \ S_ρ are all inner vertices of T since, by definition, the support leaves S_ρ are exactly the children of ρ that are leaves. Together with the contraposition of Lemma 5, this implies that T(v) contains both colors.
Statement 1: Let x ∈ W, which exists due to the assumption W := U(G, σ) \ S_ρ ≠ ∅. Since x ∉ S_ρ, it must be part of some connected component of (G − S_ρ, σ|.), say G(T(v), σ|.) for some v ∈ child_T(ρ) \ S_ρ. Now assume, for contradiction, that G − S_ρ consists of more than one connected component. By Lemmas 7 and 5, there is a vertex v′ ∈ child_T(ρ) \ S_ρ such that v′ ≠ v and both subtrees T(v) and T(v′) contain both colors. Hence, there are distinct y ∈ L(T(v)) and y′ ∈ L(T(v′)) with σ(y) = σ(y′) ≠ σ(x). Together with x ∈ L(T(v)), we therefore have lca_T(x, y) ⪯_T v ≺_T ρ = lca_T(x, y′), which implies (x, y′) ∉ E(G). However, x ∈ W ⊆ U(G, σ) and σ(y′) ≠ σ(x) imply (x, y′) ∈ E(G), a contradiction. Hence, we conclude that G − S_ρ has exactly one connected component, and thus ρ has a single inner vertex child v. Since T is phylogenetic, the latter implies that ρ must be incident to at least one leaf, i.e., S_ρ ≠ ∅. Together with Thm. 4, this in turn implies that G is connected. In summary, Statement 1 is true.
Statement 2: Let x ∈ W as in the proof of Statement 1. By arguments analogous to those used for Statement 1, we conclude that σ(x) = σ(y) for every y ∈ S_ρ, since otherwise we would obtain (x, y) ∉ E(G), and thus a contradiction to x ∈ U(G, σ).
Since x ∈ W was chosen arbitrarily and S_ρ is non-empty, we immediately obtain that all vertices in U(G, σ) = S_ρ ∪̇ W have the same color, i.e., Statement 2 is true.
Statement 3: Now consider the single inner vertex child v of ρ and its set of support leaves S_v, which must be non-empty by Lemma 6. Note that W must be entirely contained in L(T(v)), and recall that all vertices in S_ρ ∪̇ W are of the same color (cf. Statement 2). First suppose, for contradiction, that S_v only contains vertices of the color opposite to that of the vertices in S_ρ ∪̇ W. This immediately implies S_v ∩ W = ∅; thus every vertex x ∈ W must be located in a subtree T(w) of some inner vertex child w of v. Again by contraposition of Lemma 5, every such T(w) contains both colors. However, this contradicts (x, y) ∈ E(G) for every y ∈ S_v, which must hold as a consequence of x ∈ W ⊂ U(G, σ) and σ(y) ≠ σ(x). Next suppose, for contradiction, that S_v only contains vertices of the same color as the vertices in S_ρ ∪̇ W. In this case, we obtain that the edge ρv is redundant w.r.t. (G, σ). To see this, consider an arc (x, y) ∈ E(G) such that lca_T(x, y) = v.
Clearly, x must be directly incident to v, since otherwise the subtree below v to which x belongs would contain both colors, and thus contradict (x, y) ∈ E(G). In other words, every such vertex x is a support leaf of v; thus σ(x) = σ(S_v) = σ(S_ρ) and σ(y) ≠ σ(S_ρ). In particular, there exists no arc (x, y) ∈ E(G) such that lca_T(x, y) = v and σ(y) ∈ σ(L(T) \ L(T(v))) = σ(S_ρ) and therefore, by Lemma 1, the inner edge ρv is redundant. However, this contradicts the fact that T is least resolved. In summary, only the case in which S_v ≠ ∅ contains vertices of both colors is possible, and thus Statement 3 is true.
Statement 4: First, recall from the proof of Statement 3 that W ⊆ L(T(v)) for the single inner vertex child v of ρ. In order to see that W ⊆ S_v, assume, for contradiction, that this is not the case. By similar arguments as used for showing Statement 3, this implies that some x ∈ W lies in a 2-colored subtree T(w) for some w ∈ child_T(v) \ S_v. Together with the above established fact that S_v contains both colors, this contradicts x ∈ U(G, σ). Finally, W ≠ S_v is a consequence of the fact that S_v contains both colors (Statement 3) but W ⊆ S_ρ ∪̇ W contains only one color (Statement 2).
We now use this result to consider the performance of Alg. 1. … and hence, outdeg(x) does not require updates. The in-neighborhoods N⁻(x) can be updated by removing the arcs between G − S^(2) and S^(2) as a consequence of Lemma 7 and Thm. 4. Since every arc is removed at most once, the total effort for these updates is O(|E|).
We continue by showing that every vertex needs to be considered as an umbrella vertex at most twice, and that the total effort of constructing all sets S^(1) and S^(2) is O(|E|), given that the umbrella vertices U can be obtained efficiently, which we discuss afterwards. To this end, we distinguish, for each of the single recursion steps, two cases: S^(1) = U and S^(1) ⊊ U. First, if S^(1) = U, and thus also S^(2) = S^(1) = U, we consider each in-arc of x ∈ U. Since these vertices and their corresponding arcs are removed when constructing G − S^(2), they are not considered again in a deeper recursion step. In the second case, we have S^(1) ⊊ U, which together with S^(2) = S^(1) implies W := U \ S^(2) ≠ ∅, and only the vertices in U \ W are removed. However, Lemma 11 guarantees that, for a 2-BMG as input graph, the vertices in W appear as support leaves in the next step and thus appear in the update of U, S^(1), and S^(2) no more than a second time. In order to use the properties in Lemma 11 for the general case (i.e., (G, σ) is not necessarily a BMG), we can, whenever W ≠ ∅, (i) check that G − S^(2) has only a single connected component G_v, and (ii) pass down the set W to the recursion step on G_v, in which the condition W ⊊ S^(2) is checked. If any of these checks fails, we can exit with false. This way, we ensure that every vertex appears at most twice as an umbrella vertex in the general case. To construct S^(1) from U, we have to scan the in-neighborhood N⁻(x) of each vertex x ∈ U and check whether N⁻(x) ⊆ U. We repeat this step to construct S^(2) from S^(1). Membership in U and S^(1), respectively, can be checked in constant time (e.g., by marking the vertices in the current set U). Since we have to consider each vertex, and hence each in-neighborhood, at most twice, all sets S^(1) and S^(2) can be obtained with a total effort of O(|E|).
It remains to show that the input graph can be decomposed efficiently in such a way that the connectivity information is maintained and the candidates for umbrella vertices in each component are updated. The connected components G_v can be obtained by using the dynamic data structure described in [6], often called the HDT data structure. It maintains a maximal spanning forest of the underlying undirected graph with edge set Ẽ = {xy | (x, y) ∈ E or (y, x) ∈ E} and allows the deletion of all |Ẽ| ∈ O(|E|) edges with amortized cost O(log²|V|) per edge deletion.
The explicit traversal of the connected components to compute U can be avoided as follows: Since outdeg(x) does not require updates, we can maintain a doubly-linked list of vertices x for each color i ∈ {1, 2} and each value of outdeg(x) with σ(x) = i. In order to be able to access the highest value of the out-degrees, we maintain these values together with pointers to the respective doubly-linked lists in balanced binary search trees (BSTs), one for each color and each connected component. The BSTs for the two colors are first computed for (G, σ) in O(|V| log |V|) time and afterwards updated to fit the out-degrees of the currently considered component G_v. To update these lists and BSTs for G_v, observe first that G_v can be obtained from G by stepwise deletion of single arcs, i.e., of edges in the HDT data structure representing the underlying undirected version. We update or construct, respectively, the pair of BSTs (one for each color) for each connected component as follows: Since a single arc deletion splits a connected component G into at most two connected components G_1 and G_2, we can apply the well-known technique of traversing the smaller component [14]. The size of each connected component can be queried in O(1) time in the HDT data structure. Suppose w.l.o.g. that |V(G_1)| ≤ |V(G_2)|. We construct a new pair of BSTs for G_1 and delete the vertices V(G_1) and the respective degrees from the two original BSTs for G, which then become the BSTs for G_2. More precisely, we delete each vertex x ∈ V(G_1) from the list corresponding to outdeg(x), and if the length of this list drops to zero, we also remove the corresponding out-degree from the BST. Likewise, we insert the out-degree of x together with an empty doubly-linked list into the newly created BST for G_1 if it is not yet present, and append x to this list. Note that the number of out-degree deletions and insertions does not exceed |V(G_1)|.
Due to the technique of traversing the smaller component, every vertex is deleted and inserted at most $\log|V|$ times, since each time a vertex lies in the smaller part, the size of its component at least halves. Therefore, we obtain an overall complexity of $O(|V|\log^2|V|)$ for the maintenance of the BSTs, where the additional log-factor originates from rebalancing the BSTs whenever necessary.
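The halving argument can be checked with a small simulation. The sketch below abstracts away the graph entirely (an assumption made for illustration): components are split at arbitrary positions, every vertex on the smaller side is charged one move, and the maximum charge never exceeds $\log_2 n$. The function name `simulate_smaller_half_moves` is hypothetical.

```python
import math
import random


def simulate_smaller_half_moves(n, seed=0):
    """Split components at random until all are singletons, charging one move
    to every vertex on the smaller side of each split (graph structure is
    abstracted away). Returns the largest charge of any single vertex."""
    rng = random.Random(seed)
    moves = [0] * n
    stack = [list(range(n))]
    while stack:
        comp = stack.pop()
        if len(comp) < 2:
            continue
        cut = rng.randint(1, len(comp) - 1)
        part1, part2 = comp[:cut], comp[cut:]
        smaller = part1 if len(part1) <= len(part2) else part2
        for v in smaller:
            moves[v] += 1  # v is deleted from the old index and inserted into a new one
        stack.extend([part1, part2])
    return max(moves, default=0)


if __name__ == "__main__":
    n = 1000
    worst = simulate_smaller_half_moves(n)
    assert worst <= math.log2(n)
    print(worst)
```

The bound holds for every sequence of splits, not only random ones, which is why the random cut positions do not affect the assertion.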
In each recursion step, the set $U$ can now be obtained by listing (at most) the vertices with the maximal out-degree for each of the two colors. Finding the two out-degrees and the corresponding lists in the BSTs requires $O(\log|V|)$ time in each step, and thus $O(|V|\log|V|)$ in total. In order to determine whether these candidates $x$ are actually umbrella vertices, we have to check whether $\operatorname{outdeg}(x)$ equals the number of vertices in $G_v$ whose color differs from $\sigma(x)$. The HDT data structure allows constant-time queries of the size of a given connected component, since this information is updated during the maintenance of the spanning forest. By the same means, we can keep track of the number of vertices of a specific color in each connected component. Note that we only need to do this for one color $r$, since the number of vertices of the other color equals $|V(G_v)|$ minus the number of vertices of color $r$. This does not increase the overall effort for maintaining the data structure, since it happens alongside the update of $|V(G_v)|$.
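The umbrella test can be sketched as follows, assuming that an umbrella vertex $x$ of $G_v$ is a vertex adjacent to all vertices of $G_v$ whose color differs from $\sigma(x)$. Since arcs only join vertices of different colors, $\operatorname{outdeg}(x)$ never exceeds that count, so only the maximum-degree vertices of each color can qualify. The function `umbrella_vertices` and its argument layout are hypothetical; candidates are passed as a dictionary mapping each color to a pair (maximal out-degree, vertex set), and only the count for one color $r$ is tracked.

```python
def umbrella_vertices(candidates, n_vertices, n_color_r, r):
    """Hypothetical helper for the umbrella test in one component G_v.

    candidates: dict color -> (maximal out-degree, set of vertices attaining it)
    n_vertices: |V(G_v)|
    n_color_r:  number of vertices of color r in G_v (the single tracked color)
    Assumes two colors and that an umbrella vertex x is adjacent to all
    vertices of G_v whose color differs from sigma(x).
    """
    umbrellas = set()
    for color, (deg, verts) in candidates.items():
        # Number of vertices of the other color, derived from the one tracked count.
        other = n_color_r if color != r else n_vertices - n_color_r
        if deg == other:
            umbrellas |= set(verts)
    return umbrellas


if __name__ == "__main__":
    # Component with color-1 vertices a, c and color-2 vertices b, d;
    # a has out-degree 2 (both color-2 vertices), all others out-degree 1.
    candidates = {1: (2, {"a"}), 2: (1, {"b", "d"})}
    print(umbrella_vertices(candidates, n_vertices=4, n_color_r=2, r=1))  # {'a'}
```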
In summary, the total effort is dominated by maintaining the connectivity information while deleting $O(|E|)$ arcs, i.e., $O(|E|\log^2|V|)$ time.
As a direct consequence of Thm. 4, the LRT of a disconnected graph $G$ is obtained by connecting the roots of the LRTs $T_v$ of the connected components $G_v$ to an additional root vertex, see also [5, Cor. 4]. Lemma 12 thus implies that 2-BMGs can be recognized, and their LRTs constructed, in $O(|V|+|E|\log^2|V|)$ time.

In order to illustrate the improved complexity for the construction of LRTs of 2-BMGs, we implemented both the well-known triple-based approach, i.e., the application of BUILD [2] with the informative triples defined in Eq. (1) as input, and the new approach of Alg. 1. As input, we used 2-BMGs that were randomly generated as follows. First, we simulate random trees $T$ recursively, starting from a single vertex, by attaching to a randomly chosen vertex $v$ either a single leaf if $v$ is an inner vertex of $T$, or a pair of leaves if $v$ is a leaf. The construction stops when the desired number of leaves is reached. Note that the resulting tree is phylogenetic by construction. Each leaf is then colored by selecting one of the two colors at random. Finally, we compute the 2-BMG $G(T,\sigma)$ from each of the simulated leaf-colored trees $(T,\sigma)$.
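The input generation described above can be sketched in Python. The function names are hypothetical, and the best match relation is implemented via the standard definition from the BMG literature (see [5]), assumed here: $y$ is a best match of $x$ if $\sigma(y)\neq\sigma(x)$ and $\operatorname{lca}(x,y)\preceq\operatorname{lca}(x,y')$ for all leaves $y'$ with $\sigma(y')=\sigma(y)$. Because the ancestors of $x$ form a chain, the best matches of $x$ are exactly the leaves of the other color below the lowest ancestor of $x$ that has such a leaf below it.

```python
import random


def random_phylogenetic_tree(n_leaves, rng):
    """Grow a tree from a single vertex: a random inner vertex gets one new
    leaf, a random leaf gets two. Returns (parent, children); vertex 0 is the root."""
    parent = {0: None}
    children = {0: []}
    n_current = 1
    while n_current < n_leaves:
        v = rng.choice(list(children))
        k = 1 if children[v] else 2
        for _ in range(k):
            u = len(children)
            parent[u] = v
            children[u] = []
            children[v].append(u)
        n_current += 1  # both cases increase the number of leaves by exactly one
    return parent, children


def best_match_graph(parent, children, sigma):
    """Arc set of the 2-BMG G(T, sigma) for leaf colors sigma in {1, 2}."""
    # Preorder (parents before children); reversed, it visits children first.
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    below = {}  # below[v][s] = leaves of color s in the subtree rooted at v
    for v in reversed(order):
        below[v] = {1: set(), 2: set()}
        if not children[v]:
            below[v][sigma[v]].add(v)
        for c in children[v]:
            for s in (1, 2):
                below[v][s] |= below[c][s]
    arcs = set()
    for x in sigma:
        other = 3 - sigma[x]
        a = x
        # The lowest ancestor of x with a leaf of the other color below it
        # is the lca of x and all of its best matches.
        while a is not None and not below[a][other]:
            a = parent[a]
        if a is not None:
            arcs.update((x, y) for y in below[a][other])
    return arcs


if __name__ == "__main__":
    rng = random.Random(42)
    parent, children = random_phylogenetic_tree(10, rng)
    leaves = [v for v in children if not children[v]]
    sigma = {x: rng.choice((1, 2)) for x in leaves}
    arcs = best_match_graph(parent, children, sigma)
    print(len(leaves), len(arcs))
```

The subtree leaf sets trade memory for simplicity; this is adequate for producing test inputs but makes no claim about how the benchmark inputs were actually computed.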
Both methods for the LRT computation were implemented in Python. We note that we did not implement the sophisticated dynamic data structures used in the proof of Lemma 12, but a rather naïve version of Alg. 1. Nevertheless, Fig. 4 shows a remarkable improvement of the running time compared to the general $O(|V|\,|E|\log^2|V|)$ approach for $\ell$-BMGs detailed in [5]. Empirically, we observe that the running time of Alg. 1 indeed scales nearly linearly with the number of edges.

Binary-explainable 2-BMGs
Binary phylogenetic trees are of particular interest in practical applications. However, not every 2-BMG can be explained by a binary tree. The subclass of binary-explainable ($\ell$-)BMGs is characterized among all BMGs by the absence of a single forbidden subgraph, called the hourglass [10,11], illustrated in Fig. 5. In this section, we briefly describe a modification of Alg. 1 that allows the efficient recognition of binary-explainable 2-BMGs.
A graph $(G,\sigma)$ is called hourglass-free if it does not contain an hourglass as an induced subgraph. We summarize Lemma 31 and Prop. 8 in [11] as

Proposition 3. For every BMG $(G,\sigma)$, the following three statements are equivalent: