Representing non-crossing cuts by phylogenetic trees

Phylogenetic trees are representations of the evolutionary descent of a set of species. In graph-theoretic terms, a phylogenetic tree is a partially labeled tree in which every unlabeled vertex has degree at least three and the labels correspond to pairwise disjoint subsets of the set of species. A cut of a graph G = (V, E) is defined as a bipartition {S, V \ S} of the vertex set V of G. A pair of cuts {S, S̄}, {T, T̄} is said to be crossing if none of S ∩ T, S ∩ T̄, S̄ ∩ T and S̄ ∩ T̄ is empty. In this paper, we show that each set of pairwise non-crossing cuts of a graph G can be represented uniquely by a phylogenetic tree such that the set of species corresponds to the vertex set of G.


Introduction
According to evolutionary theory, existing biological species are linked by common ancestors. Studying these ancestor relations leads to phylogenetic trees as graphical representations of the postulated evolutionary relationships among a specified set of organisms. In graph-theoretic terms, a phylogenetic tree is a tree together with a mapping of the label set {1, ..., n} to the vertices of the tree such that all vertices of degree less than 3 are images. Note that some vertices may receive several labels, while other vertices (of degree at least 3) may receive no label.
Extensive research has been done on how to find the evolutionarily most probable phylogenetic tree for a given set of species [1,2,3,4,5,6,10]. Among those studies, several focused on how to construct all phylogenetic trees and how to count them [2,3,4,5,6].
Given a graph G = (V, E), an edge-cut of the graph can be characterized by the sets of vertices on the two sides of the edge-cut. Two cuts are said to be crossing if each shore of one cut intersects both shores of the other. Otherwise, the cuts are said to be non-crossing or laminar [7,9]. The concept of sets of laminar cuts has been used to show certain coverings of graphs and hypergraphs.
In Section 2 of this paper, we introduce the bipartitions of the label set of a phylogenetic tree that are induced by the edges of the tree. We call those bipartitions the edge-bipartitions of the phylogenetic tree. In Section 3 we show, by a constructive proof, that each phylogenetic tree is uniquely determined by its set of edge-bipartitions. Further, we obtain some properties of the sets of edge-bipartitions of phylogenetic trees that result from each other by contracting a single edge. In Section 4 we then show that the set of edge-bipartitions of a phylogenetic tree forms a set of laminar cuts and, further, that each set of laminar cuts corresponds to the set of edge-bipartitions of exactly one phylogenetic tree. In Section 5 we summarize our results and pose some questions related to the combinatorial identity between phylogenetic trees and sets of laminar cuts.

Preliminaries
A phylogenetic tree is a tree $T = (V, E)$ together with a mapping $\varphi : X \to V$ such that all vertices of degree less than three are images of $\varphi$. We call $X$ the label set of the phylogenetic tree; vertices in $\mathrm{Im}(\varphi)$ are called labeled vertices, and vertices not in $\mathrm{Im}(\varphi)$ are called unlabeled vertices. For an edge $e = \{u, v\} \in E$, the edge-bipartition of $e$, denoted by $X_e(T)$, is the bipartition $\{X_e^1, X_e^2\}$ with $X_e^1 \cup X_e^2 = X$ such that all vertices labeled by $X_e^i$, $i = 1, 2$, are in the same connected component of $T - e$. Unless specified otherwise, we consider both possible enumerations of the blocks of $X_e(T)$. $X_e^u(T)$ denotes the block such that the vertices labeled by the set $X_e^u$ are in the same connected component as $u$ ($u$ need not be labeled itself), and $X_e^{\bar u}(T)$ denotes the block of the connected component which does not contain the vertex $u$. The set $\mathcal{E}(T)$ denotes the set of all edge-bipartitions of $T$. We omit $T$ whenever there is no ambiguity.
A cut of a graph $G = (V, E)$ is a vertex bipartition $\{S, V \setminus S\}$. Two cuts $\{S, \bar S\}$ and $\{T, \bar T\}$, where $\bar S = V \setminus S$ and $\bar T = V \setminus T$, are called crossing if none of $S \cap T$, $S \cap \bar T$, $\bar S \cap T$ and $\bar S \cap \bar T$ is empty. Otherwise they are called non-crossing or laminar.
For a graph $G = (V, E)$ and an edge $e \in E$, $G/e$ denotes the graph resulting from $G$ by identifying both end vertices of the edge $e$. For phylogenetic trees, the mapping $\varphi$ is adjusted accordingly, so that the merged vertex carries the labels of both end vertices of $e$.
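The definitions above can be made concrete with a minimal Python sketch. The dictionary-based tree representation, the function names `edge_bipartitions` and `crossing`, and the five-label example tree are assumptions made for illustration only; they do not appear in the paper.

```python
def edge_bipartitions(labels, edges):
    """Edge-bipartitions of a labeled tree.

    labels: dict mapping each vertex to its set of labels (empty if unlabeled)
    edges:  list of (u, v) pairs forming a tree on the keys of `labels`
    """
    X = frozenset().union(*labels.values())
    result = set()
    for u, v in edges:
        # Collect the vertices reachable from u in T - e, where e = {u, v}.
        side, stack = {u}, [u]
        while stack:
            a = stack.pop()
            for p, q in edges:
                if {p, q} == {u, v} or a not in (p, q):
                    continue
                b = q if a == p else p
                if b not in side:
                    side.add(b)
                    stack.append(b)
        block = frozenset().union(*(labels[w] for w in side))
        result.add(frozenset({block, X - block}))
    return result


def crossing(cut_a, cut_b):
    """True if all four pairwise block intersections are non-empty."""
    return all(a & b for a in cut_a for b in cut_b)


# Hypothetical example: vertex 2 is unlabeled and has degree three,
# vertex 3 carries two labels.
labels = {0: {1}, 1: {2}, 2: set(), 3: {3, 4}, 4: {5}}
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
cuts = edge_bipartitions(labels, edges)
```

Here `cuts` holds one bipartition of {1, ..., 5} per edge of the example tree, and `crossing` can be applied to any two of them or to arbitrary pairs of blocks.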

Phylogenetic trees and edge-bipartitions
In this section we will show that each phylogenetic tree can be uniquely characterized by its set of edge-bipartitions.
First, we will show that the set of edge-bipartitions of a phylogenetic tree is indeed a set and not a multiset, i.e. that two different edges $e$ and $f$ of $T$ always induce different edge-bipartitions.
Proof. Since $T$ is a tree, $T - e - f$ contains exactly three connected components. Let $V_1$, $V_2$ and $V_3$ denote the vertex sets of those components, where $V_2$ is the component incident to both $e$ and $f$, and let $X_i = X \cap V_i$, $i = 1, 2, 3$, be the corresponding subsets of $X$ in those components. Without loss of generality, the bipartitions $X_e$ and $X_f$ then have the following representation: $X_e = \{X_1, X_2 \cup X_3\}$ and $X_f = \{X_1 \cup X_2, X_3\}$. Thus, it remains to show that the set $X_2$ is not empty. Consider an arbitrary vertex $u$ on the path between the edges $e$ and $f$. The vertex $u$ clearly belongs to $V_2$. Now two cases are possible. If $u$ is a labeled vertex, then $u \in X_2$ and thus $X_2$ is not empty. If $u$ is an unlabeled vertex, it has degree at least three. Hence, $T - u$ has at least three components. A component containing neither $e$ nor $f$ is a tree consisting only of vertices of $V_2$. Since all leaves of $T$ are labeled (a leaf does not have degree at least three), this subtree must contain a labeled vertex, and thus $X_2$ is not empty. This completes the proof that $X_2$ is not empty and thus that the edge-partitions $X_e$ and $X_f$ are not equal.
Next, we will show an interesting property relating the edge-bipartition of an edge $e$ to the edge-bipartitions of the other edges that share an endpoint with $e$. This property will then provide the main argument for the unique construction of the phylogenetic tree from its set of edge-bipartitions.
• If u is a labeled vertex, let U denote the label set of the vertex u. Then, for each edge
For unlabeled vertices this implies that there is a set of edges whose edge-bipartitions have blocks with union $X_e^u$, and that the unique minimal set with this property is $\Gamma(u) \setminus \{e\}$. For labeled vertices, the first part of the statement implies that there is no such set, since each union of blocks disjoint from $X_e^{\bar u}$ will miss the label set of the vertex $u$. The second part then implies that there exist sets of edges for which the label set of $u$ is the only set missing from the union, and further that $\Gamma(u) \setminus \{e\}$ is the unique minimal set with this property.
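In the following, $V_1$ and $V_2$ denote the vertex sets of the two connected components of $T - e$, where $u \in V_1$, and $E_1$ and $E_2$ denote the edge sets of these components. First we will show that $e$ is the only edge separating $u$ and $X_e^{\bar u}$. For each edge $f \in E_1$, $u$ and all vertices of $V_2$ are in the same connected component of $T - f$. Thus, those edges do not separate $u$ and $X_e^{\bar u}$.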
For each edge $f \in E_2$, all vertices of $V_1$ are in the same connected component of $T - f$, and thus $X_e^u \subseteq X_f^u$. However, since by Lemma 3.1 different edges induce different edge-partitions, $X_e^u \neq X_f^u$ and thus $X_f^u \cap X_e^{\bar u} \neq \emptyset$. This completes the first part of the proof, which corresponds to the first statement for labeled vertices $u$.
Thus, considering the corresponding labeled vertex sets, it follows that if $u$ is unlabeled, $\bigcup_{i \in F} X_i^{\bar u} = X_e^u$ holds, and if $u$ is labeled, $\bigcup_{i \in F} X_i^{\bar u} = X_e^u \setminus U$ holds. This completes the proof of the statements concerning $F = \Gamma(u) \setminus \{e\}$. It remains to show that $F$ is the minimal set which allows the representation of $X_e^u$ as the union of blocks of other edge-partitions. First note that for edges in $E_2$ both blocks contain vertices of $X_e^{\bar u}$. Thus, we are only concerned with edge sets $F \subseteq E_1$.
Consider an arbitrary edge $f \in E_1 \setminus F$ and let $e'$ denote the edge in $F$ on the path from $u$ to $f$. Let $X_f^1$ and $X_{e'}^1$ denote the blocks of $X_f$ and $X_{e'}$ which do not contain vertices of $X_e^{\bar u}$. Then, by considering the tree structure, it is obvious that $X_f^1 \subseteq X_{e'}^1$. However, by Lemma 3.1 different edges of the same tree have different edge-partitions, which implies $X_f^1 \neq X_{e'}^1$. Thus, whenever we do not choose an edge of $F$, we have to choose at least two edges in the corresponding subtree instead, and thus $F$ is the only set with minimal cardinality and the desired property.
Note that by Lemma 3.1 different edges have different partitions, and thus $\mathcal{E}$ contains exactly one edge-partition for each edge in $T$. In this proof, we will use $\{u\}$ to denote the label set of a labeled vertex $u$. Please note that $\{u\}$ might have cardinality greater than 1.
Proof. First note that for each leaf $u$, there exists an edge $f \in E$ with $X_f = \{\{u\}, X \setminus \{u\}\}$. Further, if $u \in X$ is not a leaf, there exists no edge-partition such that $\{u\}$ is a block. Thus, we can derive the set of leaves of $T$ from the set $\mathcal{E}$. We will now give a procedure which uses a current root vertex $u$, a given edge-bipartition whose edge $e$ will be incident to $u$ in the resulting tree, and the remaining set of edge-bipartitions to construct the subtree attached to the root vertex $u$ by the edge $e$. We start with an arbitrary leaf $u$ and remove its edge-partition $\{\{u\}, X \setminus \{u\}\}$ from the set $\mathcal{E}$.
Construction step: Check whether $X \setminus \{u\}$ can be represented as a union of blocks in $\mathcal{E}$. If that is possible, then by Lemma 3.2 the vertex adjacent to the root vertex $u$, which we denote by $v$, cannot be labeled, and thus we add an unlabeled vertex $v$ and the edge $\{v, u\}$ to the constructed tree. Further, by Lemma 3.2 there exists exactly one minimal set of partitions $F \subseteq \mathcal{E}$ with this property, and $F$ consists of the edge-partitions induced by the edges incident to $v$.
If that union representation is not possible, then by Lemma 3.2 the vertex adjacent to our root vertex must be labeled. So for each vertex $v \in X \setminus \{u\}$ we check whether we can represent $X \setminus \{u, v\}$ as a union of blocks of partitions in $\mathcal{E}$. By Lemma 3.2 this will be possible for exactly one vertex $v$, which is the neighbor of our root vertex $u$, so we add $v$ and the edge $\{u, v\}$ to our constructed tree. Further, there exists a unique minimal description of $X \setminus \{u, v\}$ as a union of partition blocks. The corresponding set of partitions $F$ again corresponds to the edge-partitions induced by the edges incident to $v$.
In either case we have added one edge and one vertex to our constructed tree, and we know the blocks of the edge-partitions of the edges incident to our newly added vertex $v$.
So iteratively, for each of those blocks we check whether we can represent it as a union of other blocks in $\mathcal{E}$ and repeat the construction step, where now $v$ takes the role of the root vertex and the corresponding block in $F$ is considered instead of $X \setminus \{u\}$. Since in each construction step we add one edge to our graph and our graph contains exactly $|\mathcal{E}|$ edges, the procedure will terminate. Thus, we have given a procedure to construct $T$ from $\mathcal{E}$ in a finite number of steps, which completes the proof of the theorem.
We have now shown that the set of edge-bipartitions $\mathcal{E}$ of a phylogenetic tree $T$ uniquely describes the tree $T$. We will now present some results on how modifications of the set $\mathcal{E}$ influence the structure of the corresponding tree.
Proof. Let $f$ be an arbitrary edge of $T$ with $f \neq e$. Then $X_f(T) = X_f(T/e)$, since the labels of $X$ are distributed over the connected components of $T - f$ in the same way as over those of $T/e - f$. Thus, for each edge $f$ other than $e$, the edge-bipartitions in $T$ and $T/e$ coincide, and the result of the lemma follows immediately.
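The effect of contracting an edge can be checked on a small example. The following is a minimal sketch, assuming a dictionary-based tree representation; the function names `contract` and `edge_bipartitions` and the example tree are hypothetical. It contracts one edge and compares the sets of edge-bipartitions before and after.

```python
def edge_bipartitions(labels, edges):
    """Edge-bipartitions of a labeled tree given as (labels dict, edge list)."""
    X = frozenset().union(*labels.values())
    result = set()
    for u, v in edges:
        # Vertices reachable from u once the edge {u, v} is removed.
        side, stack = {u}, [u]
        while stack:
            a = stack.pop()
            for p, q in edges:
                if {p, q} == {u, v} or a not in (p, q):
                    continue
                b = q if a == p else p
                if b not in side:
                    side.add(b)
                    stack.append(b)
        block = frozenset().union(*(labels[w] for w in side))
        result.add(frozenset({block, X - block}))
    return result


def contract(labels, edges, e):
    """Return T/e for e = (u, v): v is merged into u and the labels are united."""
    u, v = e
    new_labels = {w: frozenset(l) for w, l in labels.items() if w != v}
    new_labels[u] = frozenset(labels[u]) | frozenset(labels[v])
    new_edges = []
    for p, q in edges:
        if {p, q} == {u, v}:
            continue  # the contracted edge disappears
        # Redirect edges that ended in v so that they end in u.
        new_edges.append((u if p == v else p, u if q == v else q))
    return new_labels, new_edges


# Hypothetical example tree; vertex 2 is unlabeled with degree three.
labels = {0: {1}, 1: {2}, 2: set(), 3: {3, 4}, 4: {5}}
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
before = edge_bipartitions(labels, edges)
after = edge_bipartitions(*contract(labels, edges, (2, 3)))
removed = before - after
```

In this example, `removed` contains only the bipartition induced by the contracted edge, and every bipartition in `after` already occurs in `before`, in line with the argument of the proof.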
Lemma 3.4. Let $T$ be a phylogenetic tree. Let $U \subset X$ be a label subset such that $U \subseteq X_1$ holds for each edge-bipartition $\{X_1, X_2\}$ of $T$. Let $\mathcal{E}$ denote the set of edge-bipartitions of $T$. Then there exists a phylogenetic tree $T'$ with set of edge-bipartitions $\mathcal{E}'$ such that $\mathcal{E}' = \mathcal{E} \cup \{\{U, X \setminus U\}\}$.
Proof. Note that, since we consider both enumerations of the blocks, the restriction that $U \subseteq X_1$ for each edge-bipartition $\{X_1, X_2\}$ means that for each edge $f$ of $T$ all labels in $U$ are in the same connected component of $T - f$. Thus, $U$ is a subset of the labels of one labeled vertex $w$ (it is further easy to see that each non-empty subset of the label set of a single labeled vertex has the desired property). Let $W$ denote the set of all labels of the vertex $w$. Consider the partially labeled tree $T'$ resulting from $T$ by relabeling the vertex $w$ with the set $W \setminus U$ and adding a new vertex $u$ with label set $U$ and the edge $e = \{u, w\}$. There are two cases. If $W \setminus U$ is non-empty, or $W \setminus U$ is empty and $w$ has degree at least two in $T$, then $T'$ is a phylogenetic tree. It is then easy to see that $T$ results from $T'$ by contraction of the edge $e$, which further has $X_e = \{U, X \setminus U\}$. Thus, $T'$ is the desired phylogenetic tree by Lemma 3.3. If, on the other hand, $W \setminus U$ is empty and $w$ has degree one, then let $f$ be the edge incident to $w$. Since $w$ is a leaf with label set $W = U$, it holds that $X_f = \{U, X \setminus U\}$. It follows that $\mathcal{E} \cup \{\{U, X \setminus U\}\} = \mathcal{E}$, and $T$ itself is the desired phylogenetic tree.
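The construction in the proof of Lemma 3.4 is short enough to write down directly. The following is a minimal sketch under the lemma's assumption that the non-empty set U lies in the label set of a single vertex; the function name `add_cut`, the use of integer vertex identifiers and the example tree are hypothetical choices made for illustration.

```python
def add_cut(labels, edges, U):
    """Return a tree that additionally has the edge-bipartition {U, X - U}.

    labels: dict mapping integer vertices to their label sets
    edges:  list of (u, v) pairs forming a tree
    U:      non-empty label set contained in the label set of one vertex
    """
    U = frozenset(U)
    # Find the vertex w whose label set W contains U (assumed to exist).
    w = next(v for v, W in labels.items() if U <= frozenset(W))
    W = frozenset(labels[w])
    degree = sum(w in e for e in edges)
    if W == U and degree == 1:
        # w is a leaf labeled exactly U, so the cut is already present.
        return dict(labels), list(edges)
    new_labels = dict(labels)
    new_labels[w] = W - U          # relabel w with W \ U
    u = max(labels) + 1            # fresh vertex carrying the labels U
    new_labels[u] = U
    return new_labels, list(edges) + [(u, w)]


# Hypothetical example tree; vertex 3 carries the labels 3 and 4.
labels = {0: {1}, 1: {2}, 2: set(), 3: {3, 4}, 4: {5}}
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]

split_labels, split_edges = add_cut(labels, edges, {3})      # first case
same_labels, same_edges = add_cut(labels, edges, {5})        # leaf case
```

In the first call the vertex labeled {3, 4} keeps the label 4 and receives a new leaf labeled {3}; in the second call the tree is returned unchanged because {5} is already the label set of a leaf.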

Sets of non-crossing cuts and phylogenetic trees
In this section we will show that the set of edge-bipartitions of any phylogenetic tree forms a set of non-crossing cuts. Further, we will show that for each set of pairwise non-crossing cuts on a given ground set there exists one phylogenetic tree which has this set as its set of edge-bipartitions.
Proof. This result basically follows from the proof of Lemma 3.1. There we showed that $X_e = \{X_1, X_2 \cup X_3\}$ and $X_f = \{X_1 \cup X_2, X_3\}$ for suitable disjoint choices of $X_1$, $X_2$ and $X_3$. It follows immediately that $X_e^1 \cap X_f^2 = X_1 \cap X_3 = \emptyset$. Thus, $X_e$ and $X_f$ are non-crossing.
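For illustration, all four block intersections can be written out with the notation from the proof of Lemma 3.1. The enumeration $X_e^1 = X_1$, $X_e^2 = X_2 \cup X_3$, $X_f^1 = X_1 \cup X_2$, $X_f^2 = X_3$ is fixed here only for this example.

```latex
\begin{align*}
X_e^1 \cap X_f^1 &= X_1 \cap (X_1 \cup X_2) = X_1, \\
X_e^1 \cap X_f^2 &= X_1 \cap X_3 = \emptyset, \\
X_e^2 \cap X_f^1 &= (X_2 \cup X_3) \cap (X_1 \cup X_2) = X_2, \\
X_e^2 \cap X_f^2 &= (X_2 \cup X_3) \cap X_3 = X_3.
\end{align*}
```

At least one of the four intersections is empty, which is all that the definition of non-crossing cuts requires.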
Theorem 4.2. Let $C$ be a non-empty set of pairwise non-crossing cuts of a set $X$. Then there exists a phylogenetic tree $T$ with label ground set $X$ such that $C = \mathcal{E}(T)$.
Proof. We will use a proof by induction on the size of the set $C$. Assume $C$ contains exactly one element, the bipartition $\{X_1, X_2\}$. Then the phylogenetic tree $T$ with two vertices labeled $X_1$ and $X_2$ joined by an edge has $C$ as its set of edge-bipartitions. Now let $C$ be a set of cardinality at least two and assume that the theorem holds for all sets of pairwise non-crossing cuts of $X$ with fewer than $|C|$ elements. Let $X_S = \{S, X \setminus S\}$ be a cut in $C$ such that no other cut in $C$ has a block that is a subset of $S$. Such a cut $X_S$ clearly exists. Now two cases are possible: either for each other cut $X_T = \{T, X \setminus T\}$ we have, without loss of generality, $S \subseteq T$, or there exists some cut $X_T = \{T, X \setminus T\}$ such that $S \cap T \neq \emptyset$ and $S \cap (X \setminus T) \neq \emptyset$. We will assume the latter and show that $X_S$ and $X_T$ are then crossing. We already know that $S \cap T \neq \emptyset$ and $S \cap (X \setminus T) \neq \emptyset$ hold in this case. Further, by the definition of $X_S$, $T \not\subseteq S$ and $(X \setminus T) \not\subseteq S$ hold. Thus, $(X \setminus S) \cap T \neq \emptyset$ and $(X \setminus S) \cap (X \setminus T) \neq \emptyset$, and it follows that $X_S$ and $X_T$ must be crossing, a contradiction to the definition of the set $C$.
Thus, for all other cuts $X_T = \{T, X \setminus T\} \in C$, $S \subseteq T$ holds. We will now consider the set $C'$ obtained by removing the cut $X_S$ from $C$. Since $C'$ has fewer elements than $C$, by induction there exists a phylogenetic tree $T'$ which has $C'$ as its set of edge-bipartitions. Further, for all edge-partitions $X_T = \{T, X \setminus T\}$ of $C'$, $S \subseteq T$ holds. By Lemma 3.4, there exists a tree which has $C' \cup \{\{S, X \setminus S\}\} = C$ as its set of edge-bipartitions, which completes the induction.
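The induction above translates into a short recursive procedure. The following is a minimal sketch, not an optimized algorithm: it picks a block of minimum cardinality as the set S (one simple way to satisfy the minimality condition), builds the tree for the remaining cuts and then applies the construction from the proof of Lemma 3.4. The function names, the integer vertex identifiers and the example cuts are hypothetical.

```python
def add_cut(labels, edges, U):
    """Lemma 3.4 step: split U off the vertex whose label set contains it."""
    w = next(v for v, W in labels.items() if U <= W)
    W = labels[w]
    if W == U and sum(w in e for e in edges) == 1:
        return labels, edges               # cut already present at a leaf
    new_labels = dict(labels)
    new_labels[w] = W - U
    u = max(labels) + 1                    # fresh integer vertex id
    new_labels[u] = U
    return new_labels, edges + [(u, w)]


def tree_from_cuts(cuts):
    """Build (labels, edges) from a non-empty collection of pairwise
    non-crossing cuts, each given as a pair of non-empty blocks."""
    cuts = {frozenset(frozenset(b) for b in c) for c in cuts}
    if len(cuts) == 1:
        x1, x2 = next(iter(cuts))
        return {0: x1, 1: x2}, [(0, 1)]    # base case: a single edge
    # A block of minimum size has no other block as a subset.
    S = min((b for c in cuts for b in c), key=len)
    X = frozenset().union(*next(iter(cuts)))
    labels, edges = tree_from_cuts(cuts - {frozenset({S, X - S})})
    return add_cut(labels, edges, S)


def edge_bipartitions(labels, edges):
    """Edge-bipartitions of the tree, used here only to check the result."""
    X = frozenset().union(*labels.values())
    result = set()
    for u, v in edges:
        side, stack = {u}, [u]
        while stack:
            a = stack.pop()
            for p, q in edges:
                if {p, q} == {u, v} or a not in (p, q):
                    continue
                b = q if a == p else p
                if b not in side:
                    side.add(b)
                    stack.append(b)
        block = frozenset().union(*(labels[w] for w in side))
        result.add(frozenset({block, X - block}))
    return result


# Hypothetical set of pairwise non-crossing cuts of X = {1, ..., 5}.
example_cuts = [({1}, {2, 3, 4, 5}), ({2}, {1, 3, 4, 5}),
                ({1, 2}, {3, 4, 5}), ({5}, {1, 2, 3, 4})]
tree_labels, tree_edges = tree_from_cuts(example_cuts)
```

Calling `edge_bipartitions(tree_labels, tree_edges)` on the result recovers the four input cuts, which gives a quick consistency check of the construction on this example. The sketch does not verify that its input is actually non-crossing.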

Conclusion and open problems
By introducing the concept of edge-bipartitions of phylogenetic trees, we have shown that there exists a one-to-one correspondence between sets of pairwise non-crossing cuts and phylogenetic trees. While this combinatorial identity is interesting in itself, it poses several new questions which may lead to useful applications: There are other interesting applications related to phylogenetic trees. For example, Lucet, Carlier and Manouvrier [8] obtained phylogenetic trees when considering the splitting classes for the two-edge connected reliability. The concept of edge-bipartitions can easily be used to prove that those splitting classes are minimal, i.e. that there are no two classes which describe the same connectivity case, a result missing in [8] which we will present in a forthcoming paper.