Computing consensus networks for collections of 1-nested phylogenetic networks

An important and well-studied problem in phylogenetics is to compute a consensus tree so as to summarize the common features within a collection of rooted phylogenetic trees, all of whose leaf-sets are bijectively labeled by the same set X of species. More recently, however, it has become of interest to find a consensus for a collection of more general, rooted directed acyclic graphs all of whose sink-sets are bijectively labeled by X, so-called rooted phylogenetic networks. These networks are used to analyze the evolution of species that cross with one another, such as plants and viruses. In this paper, we introduce an algorithm for computing a consensus for a collection of so-called 1-nested phylogenetic networks. Our approach builds on a previous result by Rosselló et al. that describes an encoding for any 1-nested phylogenetic network in terms of a collection of ordered pairs of subsets of X. More specifically, we characterize those collections of ordered pairs that arise as the encoding of some 1-nested phylogenetic network, and then use this characterization to compute a consensus network for a collection of t ≥ 1 1-nested networks in O(t|X|^2 + |X|^3) time. Applying our algorithm to a collection of phylogenetic trees yields the well-known majority rule consensus tree. Our approach leads to several new directions for future work, and we expect that it should provide a useful new tool to help understand complex evolutionary scenarios.


Introduction
In recent years, phylogenetic networks have become an important tool for analyzing the evolution of species, and their study is an active area in phylogenetics [11,19]. Given a finite non-empty set X of species, a (rooted) phylogenetic network on X is a directed acyclic graph with a single source vertex ρ (called the root) whose set of sinks (also called leaves) is in bijective correspondence with the species in X (see e.g. Figure 1(a)). Note that it is usually assumed that such networks do not contain vertices whose indegree and outdegree are both 1. Phylogenetic networks generalize (rooted) phylogenetic trees, which are networks in which every vertex has indegree at most 1. They are particularly useful in studying the evolution of species which cross or hybridize with one another (such as plants or viruses [26]) since they permit the representation of evolutionary events such as hybridization and recombination. This is not possible using phylogenetic trees since, by their very nature, trees only permit the representation of speciation events (see e.g. [2] for more details).
A well-studied class of phylogenetic networks is the class of 2-hybrid, 1-nested networks [28] which are defined as follows. A phylogenetic network is 2-hybrid if every vertex has indegree at most 2 (see e.g. Figure 1(b)). A reticulation cycle in a phylogenetic network consists of two directed paths that have the same start vertex and the same end vertex but no other vertices in common. A 2-hybrid phylogenetic network is 1-nested if no pair of reticulation cycles have an arc in common (see e.g. Figure 1(c)). Important subclasses of 2-hybrid, 1-nested networks include galled trees (in which no pair of reticulation cycles have a vertex in common [14]) and level-1 networks (in which every reticulation cycle contains only one vertex with indegree 2 [8]). In the rest of this paper, we refer to 2-hybrid, 1-nested phylogenetic networks simply as 1-nested networks. Various software packages can be employed to compute 1-nested networks from biological datasets including Dendroscope [20], Lev1athan [18], PhyloNet [36] and Trilonet [27]. These programs have been used to generate 1-nested networks in applications such as the evolution of complex traits [16,Fig. S1] and corals [23,Fig. 2].
Since alternative 1-nested networks may result for a dataset depending on which software is used to compute them, it is of interest to develop new approaches to find a consensus for a collection C of 1-nested networks in the form of a single 1-nested network. The overarching aim is that this consensus network should exhibit structures that are shared by many of the networks in C (see Figure 2 for an example). Note that the more specific problem of finding a consensus for a collection C of phylogenetic trees on X has been considered in phylogenetics for many years (see [5] for a comprehensive review), and it is also well-studied in classification theory (see [22] for a review). One of the most popular consensus methods for phylogenetic trees is the majority rule approach [24], which we now recall.
First, each tree T ∈ C is broken down into the set C(T) of clusters that it induces on the set X (i.e. the collection of subsets of X, one subset C(u) for each vertex u in T, such that C(u) contains those x ∈ X that can be reached from u by a directed path in T; see Figure 3(a)). Then those clusters that are induced by more than half of the trees in C are kept. It can be shown that the resulting set of clusters uniquely defines, or encodes, a phylogenetic tree on X. The phylogenetic tree obtained in this way is called the majority rule consensus tree of C.
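The selection step of the majority rule approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: each tree is assumed to be given directly as the set of clusters it induces, and the names `majority_rule_clusters` and `cl` as well as the three example trees are hypothetical.

```python
from collections import Counter

def majority_rule_clusters(trees):
    """Return the clusters induced by more than half of the given trees.

    Assumption: each tree is represented by the set of clusters it
    induces on X, with every cluster stored as a frozenset.
    """
    counts = Counter()
    for clusters in trees:
        counts.update(set(clusters))  # count each cluster once per tree
    # Strict majority: a cluster is kept if it occurs in more than t/2 trees.
    return {c for c, k in counts.items() if 2 * k > len(trees)}

def cl(s):
    """Hypothetical helper: turn a string such as 'ab' into the cluster {a, b}."""
    return frozenset(s)

# Three hypothetical trees on X = {a, b, c, d}, given by their clusters.
t1 = {cl("a"), cl("b"), cl("c"), cl("d"), cl("ab"), cl("abc"), cl("abcd")}
t2 = {cl("a"), cl("b"), cl("c"), cl("d"), cl("ab"), cl("cd"), cl("abcd")}
t3 = {cl("a"), cl("b"), cl("c"), cl("d"), cl("bc"), cl("abc"), cl("abcd")}

consensus = majority_rule_clusters([t1, t2, t3])
```

In this example the clusters {a, b} and {a, b, c} each occur in two of the three trees and are kept, whereas {c, d} and {b, c} occur only once and are discarded.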
Note that the majority rule approach has been extended to unrooted phylogenetic networks (see e.g. [15]). Biological examples of unrooted consensus networks that result from the application of such approaches include [30,Fig. 4] and [33,Fig. 4]. These examples also illustrate that networks in biological applications may have the property that no two cycles have an edge in common, which, in the rooted setting, corresponds to 1-nested networks. The problem of directly computing a consensus for a collection of rooted phylogenetic networks, however, remains relatively unexplored (see [20] where some approaches are mentioned), even though from a biological point of view a phylogenetic tree or network should preferably be rooted to explicitly represent the evolution of the species under consideration (see e.g. [6,21]). In this paper, we shall generalize the majority rule method to 1-nested networks and, in this way, obtain a consensus network for any collection of such networks.
We now briefly outline our approach. First note that the definition of the set C(T) of clusters induced by a phylogenetic tree T can also be applied more generally to phylogenetic networks N, and we denote by C(N) the set of clusters induced by N. In general, however, the set C(N) does not encode N (see [12,13]). Therefore, we consider set pairs on X instead of clusters. Set pairs are ordered pairs (S, H) of subsets of X with S ≠ ∅ and S ∩ H = ∅. Each vertex u in a phylogenetic network N on X induces such a set pair by putting S to be the set of those elements in the cluster C(u) that can be reached from the root of N only by directed paths that contain u and putting H = C(u) \ S. Consider, for example, the 1-nested network in Figure 3(b). Since the elements in the subset {a, b, c} of X are precisely those that can be reached from the root ρ by a directed path that contains the vertex v, we have C(v) = {a, b, c}.
Moreover, since none of the elements in {a, b, c} can be reached from the root ρ by a directed path that avoids vertex v, we have S(v) = C(v) and H(v) = ∅. In contrast, for vertex u in Figure 3(b), we have C(u) = {a, b} but b can also be reached from the root ρ by a directed path that avoids vertex u. Therefore, we have S(u) = {a} and H(u) = {b}.
It follows from [29, Corollary 5] that the equivalence class of every 1-nested network N (with respect to a natural equivalence relation on phylogenetic networks described in Section 2) is encoded by the set θ(N) of set pairs induced by N (see Theorem 6). Here we shall take this result a step further and characterize those sets of set pairs, or set pair systems, that are induced by 1-nested networks (see Theorem 12). Once we have this characterization, we then leverage it to compute a consensus of a collection of 1-nested networks using a similar strategy to the majority rule approach for phylogenetic trees. In particular, we prove that for a collection of t ≥ 1 1-nested networks, all on the same set X with n elements, an analogue of the majority rule consensus tree can be computed in O(tn^2 + n^3) time (see Theorem 22). Note that in case all of the 1-nested networks in the input collection are phylogenetic trees our approach will generate the majority rule consensus tree.
The rest of the paper is organized as follows. In Section 2 we describe the above-mentioned natural equivalence relation on 1-nested networks, and show that we can encode any resulting equivalence class in terms of a set pair system. In Section 3, we first present some more notation related to set pair systems and then introduce a special class of such systems called 1-nested compatible set pair systems. In Section 4, we show that these 1-nested compatible set pair systems are precisely those set pair systems which are induced by 1-nested networks. In Section 5, we present an algorithm for computing a consensus for a collection of 1-nested networks. We conclude with a list of open problems in Section 6.

Encoding compressed 1-nested networks
In this section, we introduce compressed 1-nested networks, which represent equivalence classes of 1-nested networks. From a biological point of view, all 1-nested networks in such an equivalence class describe the same flow of genetic information from the root of the network to the species at its leaves (see Figure 4). Mathematically, it is more convenient to work with compressed 1-nested networks as they are directly encoded by their induced set pairs. To make this and the terms used informally in the introduction more precise, we begin by recalling some standard graph theory terminology.
A directed graph N = (V, A) consists of a finite non-empty set V and a subset A ⊆ V × V . The elements of V and A are referred to as vertices and arcs of N , respectively. A directed graph N is acyclic if there is no directed cycle in N . Moreover, a directed acyclic graph (DAG) N is rooted if there exists a vertex ρ ∈ V with indegree 0, called the root of N , such that for every u ∈ V there is a directed path from ρ to u. In a rooted DAG, a leaf is a vertex with outdegree 0, a tree vertex is a vertex with indegree at most 1 and a reticulation vertex is a vertex with indegree at least 2.
Note that the root of a rooted DAG is considered a tree vertex. Moreover, in a rooted DAG N , we call a vertex u a child of a vertex v and, similarly, v the parent of u if (v, u) is an arc of N . We next define two key concepts. From now on, X will denote a finite, non-empty set.
Definition 1 A reticulation cycle C = {P, P′} in a rooted DAG consists of two distinct directed paths P and P′ such that P and P′ have the same start vertex and the same end vertex but no other vertices in common.
Definition 2 A compressed 1-nested network N = ((V, A), ϕ) on X is a rooted DAG (V, A) together with a bijective map ϕ from X to the set of leaves of N such that:
(i) No vertex of N has outdegree 1.
(ii) All vertices of N have indegree at most 2.
(iii) No two distinct reticulation cycles in N have an arc in common.
Note that general 1-nested networks may contain arcs (u, v) such that u has indegree 2 and outdegree 1 and v has indegree 1. In Figure 4 arcs of this type are drawn with dotted lines. Such arcs do not have any impact in the flow of genetic information from the root of the network to its leaves and induce a natural equivalence relation on 1-nested networks (see also [32, p.251] for the concept of compression in more general phylogenetic networks). For our purposes, it will be convenient to work with that member of the equivalence class that does not contain any such arcs, that is, we restrict to precisely the compressed 1-nested networks defined above.
We next describe an encoding of compressed 1-nested networks. A vertex u in a rooted DAG N is a descendant of a vertex v if there exists a directed path (possibly of length zero) from the root of N to u that contains v. A descendant u of v is a strict descendant if every path from the root to u contains v. Otherwise u is called a non-strict descendant of v.
Definition 3 Let N = ((V, A), ϕ) be a compressed 1-nested network on X and u ∈ V . Then C(u) denotes the set of those x ∈ X with ϕ(x) a descendant of u, S(u) denotes the set of those x ∈ X with ϕ(x) a strict descendant of u and H(u) denotes the set of those x ∈ X with ϕ(x) a non-strict descendant of u.
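To make Definition 3 concrete, the following Python sketch computes θ(u) = (S(u), H(u)) for every vertex of a small rooted DAG. It is a naive illustration, not the procedure used in the paper: it assumes that leaves are identified with their labels (so ϕ is the identity), it decides strictness by deleting u and re-running a graph search from the root, and the function names and the example network are hypothetical.

```python
from collections import defaultdict

def reachable(arcs, start, removed=None):
    """Vertices reachable from `start` by directed paths avoiding `removed`."""
    if start == removed:
        return set()
    children = defaultdict(list)
    for v, w in arcs:
        children[v].append(w)
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in children[v]:
            if w != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def set_pairs(arcs, root, leaves):
    """Map every vertex u to the pair (S(u), H(u)) of Definition 3."""
    vertices = {root} | {v for arc in arcs for v in arc}
    theta = {}
    for u in vertices:
        cluster = reachable(arcs, u) & leaves  # C(u)
        # Leaves still reachable from the root once u is deleted are
        # exactly those with a root path that avoids u.
        avoiding_u = reachable(arcs, root, removed=u) & leaves
        strict = cluster - avoiding_u  # S(u)
        theta[u] = (frozenset(strict), frozenset(cluster - strict))
    return theta

# Hypothetical network on X = {a, b, c}: leaf b has the two parents u and w,
# so the paths r -> u -> b and r -> w -> b form a single reticulation cycle.
arcs = [("r", "u"), ("r", "w"), ("u", "a"), ("u", "b"), ("w", "b"), ("w", "c")]
theta = set_pairs(arcs, "r", {"a", "b", "c"})
```

For the vertex u of this example network the sketch yields S(u) = {a} and H(u) = {b}, because b can also be reached from the root via w.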
In [25] the ordered 3-tuple (S(u), H(u), X \ C(u)) was introduced as the so-called tripartition associated with vertex u. In view of the redundancy of the information stored in the tripartition we will focus on the first two components and denote them by θ(u) = (S(u), H(u)). Note that S(u) ∩ H(u) = ∅ for every vertex u of N. Also note that, for every vertex u, the set S(u) is always non-empty while H(u) may be empty (see [29, p. 416]). In addition we have the following property.
Lemma 4 Suppose N = ((V, A), ϕ) is a compressed 1-nested network on X. Then, for any two distinct vertices u, v ∈ V, we have θ(u) ≠ θ(v).
Proof: Let u and v be two distinct vertices of N. First it can be checked that if u and v are both contained in a single reticulation cycle then we must have θ(u) ≠ θ(v).
So assume that u and v are not contained in a single reticulation cycle. If there exists a directed path P starting from the root ρ of N that contains u and v (assuming without loss of generality that u comes before v on P) it can be checked that we must have either Hence, u must be a vertex with outdegree 1 and (u, v) is an arc in N, in contradiction to the fact that N is a compressed 1-nested network. Now consider the situation where there is no directed path starting from the root of N that contains both u and v. It can be checked that this implies (
Putting θ(N) = {θ(u) : u ∈ V} for any compressed 1-nested network N = ((V, A), ϕ) on X, the following is a consequence of [29, Cor. 5] and Lemma 4.
In view of Theorem 6 the set θ(N) can be viewed as an encoding of the isomorphism class of N, for any compressed 1-nested network N.

Set pair systems
In Section 2, we have associated to any compressed 1-nested network N on X an encoding in the form of the set θ(N ). The following definition captures the basic properties of this set.
Definition 7 A set pair system on X is a non-empty collection S of ordered pairs (S, H) of subsets of X with S ≠ ∅ and S ∩ H = ∅.
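The conditions in Definition 7 are straightforward to test directly. The following Python sketch does so for a collection of pairs of sets; the function name `is_set_pair_system` and the example data are hypothetical and serve only as an illustration.

```python
def is_set_pair_system(pairs, X):
    """Check the conditions of Definition 7 for a collection of pairs (S, H)."""
    pairs = list(pairs)
    if not pairs:
        return False  # a set pair system must be non-empty
    return all(
        S <= X and H <= X       # both components are subsets of X
        and len(S) > 0          # S is non-empty
        and not (S & H)         # S and H are disjoint
        for S, H in pairs
    )

X = frozenset("abc")
good = [(frozenset("abc"), frozenset()), (frozenset("a"), frozenset("b"))]
empty_S = [(frozenset(), frozenset("b"))]
overlapping = [(frozenset("ab"), frozenset("b"))]

results = [is_set_pair_system(p, X) for p in (good, empty_S, overlapping)]
```

Only the first collection passes: the second contains a pair with S = ∅ and the third a pair with S ∩ H ≠ ∅.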
In this section, we give a list of properties that a set pair system arising from a compressed 1-nested network on X must necessarily satisfy. In Section 4, we will then show that this list of properties actually characterizes set pair systems that are encodings of isomorphism classes of 1-nested networks. The following chart displays the main dependencies within the material presented in Sections 3-5.
As a first step towards giving the above-mentioned characterization we introduce a binary relation.
Definition 8 Let S be a set pair system on X. Then (S_1, H_1) < (S_2, H_2) for two distinct (S_1, H_1), (S_2, H_2) ∈ S if one of the following holds:
Note that conditions (a)-(c) in Definition 8 are mutually exclusive. In addition, we write (S_1, H_1) ≤ (S_2, H_2) if (S_1, H_1) < (S_2, H_2) or (S_1, H_1) = (S_2, H_2).

Lemma 9
The binary relation ≤ is a partial ordering for every set pair system S on X.
Proof: Let S be a set pair system on X. The relation ≤ on S is reflexive by definition. To establish that ≤ is also antisymmetric, consider ( Then, by the definition of the binary relation <, precisely one condition from each of the two following columns must hold: It can be checked that every combination of two conditions yields a contradiction, as required. It remains to show that ≤ is transitive. So, consider three pairs ( . Therefore, it remains to consider (S_1, H_1) < (S_2, H_2) and (S_2, H_2) < (S_3, H_3). Then, by the definition of <, precisely one condition from each of the columns above must hold with the index 1 replaced by 3 in the right column. By checking every combination of two conditions, it follows that (S_1, H_1) < (S_3, H_3), as required.
Next we present properties that set pair systems arising from compressed 1-nested networks must satisfy (see Proposition 11).
Definition 10 A set pair system S on X is 1-nested compatible if it has the following properties:
It can be checked with the set pair systems given below that Properties (NC1)-(NC5) in Definition 10 are independent of one another in the sense that for every i ∈ {1, 2, 3, 4, 5} there exists a set pair system S_i on some set X that satisfies all of these properties except for property (NCi):
In view of our aim to compute a consensus of a collection of compressed 1-nested networks, a key aspect of properties (NC1)-(NC5) is that they can be checked locally for any set pair system S, that is, by inspecting only subsets of S of small constant size.

Proposition 11
For any compressed 1-nested network N on X the set pair system θ(N) is 1-nested compatible.
Next consider a vertex v ∈ V such that H(v) ≠ ∅. Then there exists a unique reticulation cycle C = {P, P′} in N such that v is a vertex on the directed path P. Note that since H(v) ≠ ∅ and N is 1-nested, v cannot be the start or end vertex of P. Let u ≠ v denote the end vertex of P. Then θ(u) = (H(v), ∅) ∈ θ(N), implying (NC3).
To establish (NC4), consider two distinct vertices u, v ∈ V. In view of Lemma 4 we must have θ(u) ≠ θ(v). First we consider the case that u and v are both vertices in some reticulation cycle C = {P, P′}. This can lead to the following configurations (ignoring symmetric configurations obtained by switching the roles of P and P′):
• u is the start vertex of P and v is another vertex on P. Then we have implying (S(v), H(v)) < (S(u), H(u)), as required.
• u is a vertex of P, but neither its start nor its end vertex, and v is the end vertex of P. Then , H(u)), as required.
• u and v are both vertices on P with u coming before v and both vertices are neither the start nor the end vertex of P. Then we have S(v) ⊊ S(u) and , H(u)), as required.
• u is a vertex on P and v is a vertex on P′ but both vertices are neither the start nor the end vertex of P and P′, respectively. Then we have S(u) ∩ S(v) = ∅ and H(u) = H(v) ≠ ∅, as required.
Next we consider the case that u and v are not contained in the same reticulation cycle. This can lead to the following configurations:
• There is a directed path P in N starting at the root ρ that contains both u and v. Then, assuming without loss of generality that v comes before u on P, we have
• There is no directed path from the root ρ that contains both u and v. Then we have
Since N is 1-nested, this is only possible if u, v, w are all vertices in the same reticulation cycle C = {P, P′} but none of them can be the start or end vertex of the directed paths P and P′. Since S(u) ∩ S(v) = ∅, u and v cannot lie on the same directed path in C. Without loss of generality, we may therefore assume that u and w are vertices on P.
From this it follows that we cannot have S(v) ∪ S(u) ⊆ S(w). Moreover, assuming without loss of generality that u comes before w on P, we have ∅ ≠ S(w) ⊊ S(u), implying that we cannot have

1-nested compatible set pair systems are encodings
In this section we prove the following result.
Theorem 12 Given a set pair system S on X, there exists a compressed 1-nested network N on X with S = θ(N ) if and only if S is 1-nested compatible. Moreover, if it exists then N is unique up to isomorphism.
Note that, in view of Proposition 11, there remains only one implication to be established to prove Theorem 12. Also note that Theorem 12 is a generalization of the so-called "Cluster Equivalence Theorem" for rooted trees and hierarchies (see e.g. [32, Proposition 2.1]). Indeed, this equivalence theorem follows from Theorem 12 by considering set-pair systems S in which H = ∅ for all (S, H) ∈ S.
In our proof of Theorem 12, we will use the concept of the Hasse diagram of a partial ordering π on a finite set M , that is, the DAG with vertex set M in which (x, z) ∈ M × M forms an arc directed from x to z if and only if zπx holds and there is no y ∈ M \ {x, z} with zπy and yπx. Our proof of Theorem 12 will follow a similar strategy to that used in the proof of [32, Proposition 2.1], in which it is shown that, when considering the usual set inclusion as the partial ordering on the set C(T ) of clusters induced by a phylogenetic tree T , the resulting Hasse diagram is isomorphic to T . Note that, as can be seen in Figure 5, the Hasse diagram of the partial ordering introduced in Section 3 on the set pair system θ(N ) for a compressed 1-nested network N will, in general, not be isomorphic to N . More specifically, the Hasse diagram is always missing those arcs of N which occur in a directed path in a reticulation cycle such that the path consists only of this single arc. We will come back to this technicality in Theorem 19 below.
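The Hasse diagram construction for the tree case can be illustrated with a short Python sketch. It is a naive cubic-time illustration for the special case of clusters ordered by set inclusion, not the construction of D(S) used later; the function name `hasse_diagram` and the example tree are hypothetical.

```python
def hasse_diagram(elements, leq):
    """Arcs (x, z) of the Hasse diagram of the partial ordering `leq`.

    As in the text, an arc is directed from x to z if z <= x holds and no
    third element y satisfies z <= y <= x.
    """
    elems = list(elements)
    arcs = set()
    for x in elems:
        for z in elems:
            if x == z or not leq(z, x):
                continue
            has_intermediate = any(
                y != x and y != z and leq(z, y) and leq(y, x) for y in elems
            )
            if not has_intermediate:
                arcs.add((x, z))
    return arcs

# Clusters induced by the hypothetical tree ((a,b),c) on X = {a, b, c}.
clusters = [frozenset(s) for s in ("a", "b", "c", "ab", "abc")]

# For frozensets, a <= b tests set inclusion.
tree_arcs = hasse_diagram(clusters, lambda a, b: a <= b)
```

The four resulting arcs, from {a, b, c} to {a, b} and {c} and from {a, b} to {a} and {b}, are exactly the arcs of the tree ((a,b),c), in line with the statement about the Hasse diagram of C(T) above.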
For the rest of this section, S denotes a 1-nested compatible set pair system on X and D(S) the Hasse diagram of the partial ordering ≤ on S defined in Section 3. The bulk of the following proof is concerned with showing that Properties (NC1)-(NC5) suffice to establish that D(S) is, up to the technicality just mentioned above, isomorphic to a compressed 1-nested network N with θ(N) = S. We begin with a basic observation about D(S).
Thus, D is rooted with root (X, ∅).
Next consider an arbitrary x ∈ X. In view of (NC2), we have ({x}, ∅) ∈ S and it follows immediately from the definition of ≤ that ({x}, ∅) has outdegree 0 in D. To show that the vertices of outdegree 0 in D are in one-to-one correspondence with the elements in X, assume for contradiction that there exists some (S, H) ∈ S with outdegree 0 but (S, H) ≠ ({x}, ∅) for all x ∈ X. By the definition of a set pair system we must have S ≠ ∅ and so we may select some x ∈ S. But then, by the definition of ≤, we have ({x}, ∅) < (S, H), implying that the outdegree of (S, H) in D is greater than 0, a contradiction.
(iii): First note that in view of (X, ∅) ∈ S, (S, H) < (X, ∅) and H ≠ ∅ there must exist at least one (S_2, H_2) ∈ S that is minimal with respect to ≤ such that (S, H) < (S_2, H_2) and H_2 ≠ H. By the definition of < and in view of H_2 ≠ H, we must have either S ∪ H ⊆ S_2 or S ∪ H ⊆ H_2. Assume for contradiction that S ∪ H ⊆ H_2. This implies H_2 ≠ ∅. Consider the set pair (H_2, ∅) which must be contained in S in view of (NC3). Then we have (S, H) < (H_2, ∅) < (S_2, H_2) in contradiction to (S_2, H_2) being minimal. Thus, we must have S ∪ H ⊆ S_2, as required. Now consider an arbitrary (S_1, H_1) ∈ S with (S, H) < (S_1, H_1) < (S_2, H_2). Since (S_2, H_2) is minimal, we must have H_1 = H. Therefore, we can have neither S ∪ H ⊆ S_1 in view of S_1 ∩ H = ∅ nor S ∪ H ⊆ H in view of S ∩ H = ∅ and S ≠ ∅. Hence, by the definition of <, we must have S ⊊ S_1, as required.
To finish the proof, assume for contradiction that there are two distinct minimal elements (S_2, H_2), (S_2′, H_2′) ∈ S with (S, H) < (S_2, H_2) and H_2 ≠ H as well as (S, H) < (S_2′, H_2′) and
Proof: We first show that (S, H) has at most one parent (S_1, H_1) with H_1 = ∅ in D(S). Assume for contradiction that (S, H) has two distinct parents (S_1, ∅) and (S_2, ∅) in D(S). Note that this implies ∅ ≠ S ⊆ S_1 ∩ S_2. Moreover, it follows immediately from the definition of the Hasse diagram that we can have neither (S_1, ∅) < (S_2, ∅) nor (S_2, ∅) < (S_1, ∅). As a consequence and in view of (NC4), we have S_1 ∩ S_2 = ∅, in contradiction to ∅ ≠ S ⊆ S_1 ∩ S_2.
To finish the proof of the proposition, assume that (S, H) ∈ S has two distinct parents (S_1, H_1) and (S_2, H_2) in D(S). Then, in view of Lemma 14(i), we have H = ∅. Hence, by Lemma 15, we cannot have both H_1 = ∅ and H_2 = ∅. Moreover, by the same lemma, if H_1 ≠ ∅ and H_2 ≠ ∅, we must have H_1 = H_2 = S and S_1 ∩ S_2 = ∅, as required.
It remains to consider the case that, without loss of generality, H_1 = ∅ and H_2 ≠ ∅. By the definition of the Hasse diagram, we cannot have (S_1, ∅) < (S_2, H_2) or (S_2, H_2) < (S_1, ∅). Thus, in view of (NC4), we must have
We now prove a lemma which will be key to understanding reticulation cycles in D(S).
Next note that every vertex of N has indegree at most 2, since by Proposition 16, every vertex of D has indegree at most 2, and we only add arcs in the construction of N from D whose end vertex has indegree 1 in D.
Finally, we show that no two distinct reticulation cycles in N have an arc in common. By Proposition 18, every reticulation cycle C in N is either a reticulation cycle in D or it arises by adding an arc from the start vertex to the end vertex of the directed path P(S, H) in D for some (S, H) ∈ S with H ≠ ∅ for which P(S, H) is not already contained in a reticulation cycle in D. But then, again in view of Proposition 18, no two distinct reticulation cycles in N can have an arc in common.
We now prove the main result of this section.
Proof of Theorem 12: Consider a set pair system S on X. As noted at the beginning of this section, by Proposition 11, if S = θ(N ) for some compressed 1-nested network N on X, then S is 1-nested compatible.
Conversely, assume that S is a 1-nested compatible set pair system on X. Then, by Theorem 19, N(S) is a compressed 1-nested network on X.
Case 1: H = ∅. Assume for contradiction that H(u) ≠ ∅. Then there must exist some (S_1, H_1) ∈ S with (S_1, H_1) < (S, H) such that (S_1, H_1) is a child of some (S_2, H_2) ∈ S with (S_2, H_2) ≰ (S, H). This implies that (S_1, H_1) has indegree 2 and, thus, (S_1, H_1) is the end vertex of the two paths in a reticulation cycle C in N(S). Hence, we have H_1 = ∅ and, in view of ∅ ≠ S_1 ⊆ S ∩ (S_2 ∪ H_2), (NC4) implies (S, H) < (S_2, H_2). So, (S_2, H_2) must be the start vertex of the two directed paths in C and (S, H) is a vertex on one of these directed paths distinct from the start vertex and the end vertex. But this implies H ≠ ∅, a contradiction.

Consensus networks
In this section, we present an algorithm to compute a consensus network for a non-empty collection C of compressed 1-nested networks on X (cf. Algorithm 1). To give a high level description of this algorithm, put θ(C) = ⋃_{N ∈ C} θ(N) and denote, for every (S, H) ∈ θ(C), by #(S, H) the number of networks N ∈ C with (S, H) ∈ θ(N). In addition, for real numbers p and q with 0 ≤ p < 1 and 0 ≤ q < 1, put θ(C)_(p,q) to be the set pair system
In Lemma 20 below we establish that the set pair system θ(C)_(1/2,2/3) is 1-nested compatible. Thus, Algorithm 1 first computes θ(C) and counts the number of times each set pair arises from the networks in C. From this, first the set pair system θ(C)_(1/2,2/3) and then the 1-nested network N(θ(C)_(1/2,2/3)) is computed. Note that if all networks in C are phylogenetic trees (so that H = ∅ holds for all (S, H) ∈ θ(C)), then Algorithm 1 computes the majority rule consensus tree mentioned in the introduction.
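The counting step, that is, forming θ(C) together with the numbers #(S, H), can be sketched as follows in Python. This is only an illustration under stated assumptions: each network is assumed to be given by its set pair system θ(N) as a set of pairs of frozensets, a hash-based counter is used here in place of the trie employed by Algorithm 1, and the function name and example data are hypothetical. The thresholding that yields θ(C)_(1/2,2/3) is deliberately not included.

```python
from collections import Counter

def count_set_pairs(collection):
    """Return a Counter mapping each (S, H) in theta(C) to #(S, H).

    Assumption: every network N in the collection is represented by its
    set pair system theta(N), a set of pairs of frozensets.
    """
    counts = Counter()
    for theta_N in collection:
        counts.update(set(theta_N))  # each network contributes at most once
    return counts

def pair(s, h=""):
    """Hypothetical helper: build the set pair (S, H) from two strings."""
    return (frozenset(s), frozenset(h))

# theta(N) for a hypothetical network with one reticulation cycle and for
# the hypothetical tree ((a,b),c), both on X = {a, b, c}.
theta_network = {pair("abc"), pair("a", "b"), pair("c", "b"),
                 pair("a"), pair("b"), pair("c")}
theta_tree = {pair("abc"), pair("ab"), pair("a"), pair("b"), pair("c")}

counts = count_set_pairs([theta_network, theta_tree])
```

The keys of `counts` form θ(C); here θ(C) has seven elements, of which the four pairs shared by both networks have count 2.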
To derive an upper bound on the run time of Algorithm 1, we rely on an upper bound for the size of a 1-nested compatible set pair system. In view of Theorem 12, finding such a bound is equivalent to giving an upper bound on the number of vertices in a compressed 1-nested network on X in terms of n = |X|. In view of upper bounds on the number of vertices in the closely related level-1 networks given e.g. in [34, Lemma 4.5] and [13, Lemma 3.1], the following result is perhaps not surprising; however, we give its proof for the sake of completeness:
Algorithm 2 Generate the set pair system θ(N) from the 1-nested network N on X
Lemma 21 Let S be a 1-nested compatible set pair system on a set X with |X| = n. Then |S| ≤ 3n − 2 and this upper bound is tight.
Proof: As mentioned above, it suffices to consider an arbitrary compressed 1-nested network N on a set X with |X| = n and to establish that |θ(N)| ≤ 3n − 2. Also note that if N does not contain any reticulation cycle then N is a rooted phylogenetic tree on X and it is known that |θ(N)| ≤ 2n − 1. Otherwise, for a reticulation cycle {P, P′} in N, we assume without loss of generality that the directed path P consists of at least three vertices. Let e = (u, v) be the last arc on P. Note that u has indegree 1 in N. We remove e from N. If after the removal of e vertex u has outdegree 1 we suppress u. We perform this removal of an arc for every reticulation cycle in N and obtain a rooted phylogenetic tree T on X with |θ(N)| ≤ |θ(T)| + c(N) ≤ 2n − 1 + c(N), where c(N) is the number of reticulation cycles in N. Now, to establish |θ(N)| ≤ 3n − 2, it suffices to show that c(N) ≤ n − 1 by induction on n. The base case of the induction for n = 2 claims that any compressed 1-nested network with precisely two leaves contains at most 1 reticulation cycle, which can easily be checked to be true. For n ≥ 3, consider the root ρ of N. To apply the induction hypothesis, we split N at ρ into two networks N_1 and N_2 on disjoint non-empty subsets X_1 and X_2 of X with X_1 ∪ X_2 = X. Note that if ρ has outdegree 2 and is contained in a reticulation cycle this involves the removal of an arc from this reticulation cycle as described in the previous paragraph. By induction, we have c(N) ≤ c(N_1) + c(N_2) + 1 ≤ (|X_1| − 1) + (|X_2| − 1) + 1 = n − 1, as required.
It remains to note that, for every n ≥ 2, there exists a compressed 1-nested network N on a set X with |X| = n and |θ(N)| = 3n − 2. In Figure 7 examples for n ∈ {2, 3, 4} are depicted that can easily be generalized to any n ≥ 5.
In view of Lemmas 20 and 21, the set pair system S contains O(n) ordered pairs of subsets of X at the end of the loop in Line 18. To compute the Hasse diagram (S, A), the DAG corresponding to the partial ordering ≤ on S is formed first and then a transitive reduction [1] is performed on this DAG, taking O(n^3) time. Finally, each vertex of the Hasse diagram is checked using Algorithm 3. The total number of iterations of the loop in Line 4 of Algorithm 3 over all calls of Algorithm 3 is bounded by the number of arcs of the Hasse diagram. Therefore, the loop in Line 20 of Algorithm 1 has a run time in O(n^2).
In summary, the run time of Algorithm 1 is in O(tn^2 + n^3). The memory used by Algorithm 1 is dominated by the trie T_1 for storing ordered pairs of subsets of X, which is in O(tn^2).
Before concluding this section we note that as a consequence of Lemma 21 we can also give a bound on the time complexity of checking whether or not a set pair system is 1-nested compatible.

Discussion
We have presented a new characterization of an encoding of compressed 1-nested networks and used it to develop a novel approach to compute a consensus for a collection of such networks. These results open up various new directions and lead to several questions including the following (see [32, Chapter 10] for an overview of phylogenetic networks and the definitions for the classes that we mention):
• Can similar encodings be given and characterized for other classes of phylogenetic networks? For example, in [7] an encoding for so-called tree-child networks is presented, and it would be interesting to understand how these encodings can be characterized. Other classes of phylogenetic networks that could be interesting to consider in this context are level-k networks for small k ≥ 2, normal networks and unrooted phylogenetic networks.