Efficient Maximum Matching Algorithms for Trapezoid Graphs

Trapezoid graphs are intersection graphs of trapezoids between two horizontal lines. Many NP-hard problems can be solved in polynomial time when restricted to trapezoid graphs. A matching in a graph is a set of pairwise disjoint edges, and a maximum matching is a matching of maximum size. In this paper, we first propose an $O(n(\log n)^3)$ algorithm for finding a maximum matching in trapezoid graphs, then improve the complexity to $O(n(\log n)^2)$. Finally, we generalize this algorithm to a larger graph class, namely $k$-trapezoid graphs. To the best of our knowledge, these are the first efficient maximum matching algorithms for trapezoid graphs.


Introduction
A graph $G$ is a trapezoid graph if there exists a set of trapezoids between a pair of horizontal lines such that each vertex $v_i$ of $G$ corresponds to a trapezoid $T_i$ and there is an edge $(v_i, v_j)$ iff $T_i \cap T_j \neq \emptyset$. We call this family of trapezoids a trapezoid representation (or trapezoid model) for $G$; see Figure 1 in Section 3 for an example. Trapezoid graphs were first introduced by Dagan, Golumbic and Pinter in 1988 [6]. Corneil and Kamula independently introduced the same class [5], referring to its members as interval-interval (II for short) graphs. Felsner, Müller, and Wernisch [9] introduced an equivalent box representation for trapezoid graphs. The box representation is easily obtained from the trapezoid model by mapping the lower and upper lines of the model to the x-axis and y-axis of the box representation, respectively.
Trapezoid graphs are a class of cocomparability graphs that contains interval graphs and permutation graphs as subclasses. Interval graphs are widely applied for modeling real-world problems. They appear in many different scientific domains such as biology, chemistry and archaeology [13]. Trapezoid graphs, on the other hand, are mainly applied in modeling channel routing problems in VLSI circuit design [6]. Moreover, they are simple in the sense that many graph problems that are NP-hard in general can be solved on them in polynomial time. For instance, in [9], some of the most classical problems in graph theory, such as finding the chromatic number, a maximum weighted independent set, a minimum clique cover and a maximum weighted clique, are solved in $O(n \log n)$ time by using the box representation and sweep line techniques. Recently, by exploiting the simplicity of this graph class, more efficient solutions on trapezoid models have been found for many well-known problems, such as $O(n^2)$ algorithms for several counting problems on vertex covers [20], efficient algorithms for the K-terminal residual reliability of d-trapezoid graphs [21,26], and an $O(n \log n)$ algorithm for calculating the vertex connectivity [16].
Given a graph $G = (V, E)$, a matching $M$ in $G$ is a set of pairwise disjoint edges; that is, no two edges share a common vertex. A maximum matching is a matching that contains the largest possible number of edges. For finding a maximum matching in general graphs, Even and Kariv [8] presented an $O(n^{2.5})$ algorithm; Micali and Vazirani [23] improved it to $O(\sqrt{n}\,m)$, which matches the running time of the best-known algorithm for this problem in bipartite graphs [15]. Although a maximum matching can be found in polynomial time in general, it is still interesting to improve the time complexity of this problem for more restricted classes. As pointed out by Moitra and Johnson in [24], there is a strong relationship between the maximum matching problem in a cocomparability graph and the scheduling problem on its complement. Coffman and Graham [4] propose an $O(n + m)$ algorithm for the two-processor scheduling problem when the dependency graph among tasks is transitively closed, which therefore yields an algorithm for the maximum matching problem on cocomparability graphs. Frank et al. [1] present an $O(n(\log n)^2)$ algorithm for finding a maximum matching in a permutation graph. Rhee and Liang [25] improved it to $O(n \log \log n)$. They also present an efficient $O(n \log n)$ maximum matching algorithm for interval graphs and circular-arc graphs [19], which could be refined to $O(n \log \log n)$ as claimed in [25]. Ghosh and Pal [12] proposed an $O(n^2)$ maximum matching algorithm for trapezoid graphs that uses $O(n + m)$ space. Unfortunately, [17] shows by a simple counterexample that their algorithm is not correct.
The rest of the paper is organized as follows. In Section 2, we present some data structures and a known algorithm which solves the problem for cocomparability graphs. In Section 3, we define the S-Range tree and propose an $O(n(\log n)^3)$ maximum matching algorithm for trapezoid graphs. This approach is inspired by the range tree method used in [1]. In Section 4, we refine the previous algorithm to $O(n(\log n)^2)$ by introducing the C-Range tree and generalize it to k-trapezoid graphs. We give some remarks and open questions in the last section. To the best of our knowledge, these are the first efficient maximum matching algorithms for trapezoid graphs.

Preliminaries
We first present some data structures and a prior result about the maximum matching algorithm for cocomparability graphs, which will be used later in our algorithms.

Segment tree
The segment tree, introduced by Bentley [2] in 1977, is a data structure for storing intervals, or segments. Let A be an array of n elements. If each element of A is treated as an elementary interval of a standard segment tree, and each internal node stores the maximum element of its interval, the segment tree can answer Range Maximum Queries (RMQ for short). This data structure uses O(n) storage and can be built in O(n) time. It supports the following basic operations:
• Given i and j, where 1 ≤ i ≤ j ≤ n, find the maximum element in the interval [i, j] of array A in O(log n) time.
• Update the value of an element in array A in O(log n) time.
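As an illustration of the two operations above, the following is a minimal Python sketch of a max segment tree; it is not the construction of [2]. The class name `MaxSegmentTree` and the 0-based, inclusive indexing are assumptions made for the example (the text above uses 1-based indices).

```python
class MaxSegmentTree:
    """Iterative segment tree over a fixed-size array.

    Supports range-maximum queries and point updates, both in O(log n).
    Indices are 0-based and query ranges are inclusive (an assumption of
    this sketch).
    """

    def __init__(self, values):
        values = list(values)
        self.n = len(values)
        self.tree = [float("-inf")] * (2 * self.n)
        self.tree[self.n:] = values              # leaves hold the array
        for i in range(self.n - 1, 0, -1):       # internal nodes, O(n) build
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, value):
        """Set A[i] = value and repair the maxima on the leaf-to-root path."""
        i += self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        """Return max(A[lo], ..., A[hi])."""
        best = float("-inf")
        lo += self.n
        hi += self.n + 1                         # switch to a half-open range
        while lo < hi:
            if lo & 1:
                best = max(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.tree[hi])
            lo //= 2
            hi //= 2
        return best


A = [3, 1, 4, 1, 5, 9, 2, 6]
tree = MaxSegmentTree(A)
```

With this array, `tree.query(3, 6)` covers the values 1, 5, 9, 2; after `tree.update(5, 0)` the same query no longer sees the 9.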

CRMQ
This data structure was introduced in [11] to answer RMQ in constant time. Its time-efficiency is achieved thanks to Cartesian tree [27] and the lowest common ancestor query [14]. For convenience, we call this data structure CRMQ (constant-time RMQ) throughout this paper. It can be constructed in O(n) time, uses O(n) storage and answers RMQ optimally in O(1). However, the only disadvantage of CRMQ compared to the segment tree is that it does not support the update operation.
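To make the constant-time query concrete, the sketch below uses a sparse table, a simpler static structure swapped in here for the Cartesian-tree construction of [11]. Like CRMQ it answers RMQ in O(1) and does not support updates, but its preprocessing takes O(n log n) time and space rather than O(n). The class name `SparseTableRMQ` is hypothetical.

```python
class SparseTableRMQ:
    """Static range-maximum structure (sparse table).

    table[j][i] stores max(A[i], ..., A[i + 2**j - 1]). A query combines two
    overlapping power-of-two windows, so it costs O(1). This is a stand-in
    for CRMQ, not the O(n)-preprocessing structure itself.
    """

    def __init__(self, values):
        values = list(values)
        n = len(values)
        self.table = [values]
        j = 1
        while (1 << j) <= n:
            prev = self.table[-1]
            half = 1 << (j - 1)
            self.table.append(
                [max(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)]
            )
            j += 1

    def query(self, lo, hi):
        """Return max(A[lo], ..., A[hi]) for 0 <= lo <= hi < n."""
        j = (hi - lo + 1).bit_length() - 1       # largest power of two that fits
        return max(self.table[j][lo], self.table[j][hi - (1 << j) + 1])


rmq = SparseTableRMQ([3, 1, 4, 1, 5, 9, 2, 6])
```

The absence of an `update` method mirrors the limitation noted above: changing one element would invalidate many precomputed windows.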

Range tree
The range tree is a data structure that stores a list of points. It supports reporting all points in a given range efficiently, and can be used in two or more dimensions. The range tree was introduced independently by several authors [3,18,22,29]. In this paper, we first use 2D range trees while working on trapezoid graphs, and later use k-dimensional range trees for k-trapezoid graphs. The range tree of dimension $k$ (with $k \ge 2$) can be built in $O(n(\log n)^{k-1})$ time and uses $O(n(\log n)^{k-1})$ storage. It reports all points in a k-dimensional region given by a range (for both closed and open regions) in $O((\log n)^{k-1} + m)$ time, where $n$ is the number of points stored in the tree and $m$ is the number of points reported by the query. In the 2D case, the query region is a rectangular region of the form $[x_1, x_2] \times [y_1, y_2]$ and the corresponding query time is $O(\log n + m)$. We refer to [7] for more details about both segment trees and range trees.

Maximum matching algorithm for cocomparability graphs
The maximum matching algorithm for cocomparability graphs is derived from the following result.
Theorem 2.1 ([10]). Given a directed acyclic and transitively closed graph $G$ with $n$ vertices, there is a two-processor schedule for $G$ of length $\ell$ iff there is a matching of size $n - \ell$ in the undirected complement graph $G'$.
Note that $G'$ is a cocomparability graph, and a matching in $G'$ can be obtained from the schedule by simply matching all pairs of vertices that are assigned to the same time unit. The two-processor scheduling problem is solved efficiently in [4]. We now briefly describe their idea for finding a maximum matching in a cocomparability graph. The algorithm uses a vertex-labeling that assigns the numbers $1, 2, \ldots, n$ to the $n$ vertices of $G$. Given two vertices $u, v$ of $G$, we say that $v$ is a successor of $u$ if there is a directed edge from $u$ to $v$ in $G$. Let $L(u)$ be the label of vertex $u$ and $N(u)$ be the label-list $(L(v_1), L(v_2), \ldots, L(v_k))$ of the successors of $u$ in $G$, sorted in decreasing order. First, we label the outdegree-zero vertices starting from 1 in an arbitrary order. Suppose that the labels from 1 to $k - 1$ have already been assigned; then a vertex $u$ is labeled by $k$ if:
1. all successors of $u$ are labeled, and
2. $N(u)$ is the smallest list in lexicographic order among all vertices that satisfy condition 1.
Once the vertex-labeling process is complete, the matching can be found in a greedy manner: start from the highest-label vertex, match it with the highest-label vertex that remains and is adjacent to it in $G'$, then delete these two vertices from the graph and repeat this process. This yields an $O(n^2)$ time algorithm for finding a maximum matching in the cocomparability graph $G'$. Since a trapezoid graph is also a cocomparability graph, we wish to apply this procedure to trapezoid graphs and exploit their special structure to obtain a more efficient maximum matching algorithm.
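The labeling rule and the greedy step described above can be sketched directly, without any attempt at efficiency. The function names, the dictionary-based input and the example DAG (two chains a → b and c → d) are assumptions made for illustration; ties between equal label-lists are broken by vertex name, which is one admissible "arbitrary order".

```python
def label_vertices(succ):
    """succ[u]: set of successors of u in a transitively closed DAG.
    Returns a dict mapping each vertex to its label in 1..n."""
    label = {}
    while len(label) < len(succ):
        # Condition 1: every successor of a candidate is already labeled.
        ready = [u for u in sorted(succ)
                 if u not in label and all(v in label for v in succ[u])]
        # Condition 2: N(u), the decreasingly sorted successor labels, is
        # lexicographically smallest (Python compares lists this way).
        u = min(ready,
                key=lambda w: sorted((label[v] for v in succ[w]), reverse=True))
        label[u] = len(label) + 1
    return label


def greedy_matching(succ, label):
    """Greedy matching in the undirected complement graph: u and v may be
    matched iff neither is a successor of the other."""
    free = set(succ)
    matching = []
    for u in sorted(succ, key=label.get, reverse=True):
        if u not in free:
            continue                      # already matched to a higher label
        free.remove(u)
        candidates = [v for v in free
                      if v not in succ[u] and u not in succ[v]]
        if candidates:
            v = max(candidates, key=label.get)
            free.remove(v)
            matching.append((u, v))
    return matching


# Hypothetical example: two independent chains, already transitively closed.
succ = {"a": {"b"}, "b": set(), "c": {"d"}, "d": set()}
label = label_vertices(succ)
matching = greedy_matching(succ, label)
```

In this example a two-processor schedule of length 2 exists (a with c, then b with d), so Theorem 2.1 predicts a matching of size 4 − 2 = 2 in the complement, which is what the sketch returns.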

An $O(n(\log n)^3)$ maximum matching algorithm on trapezoid graphs
In this section, we present our $O(n(\log n)^3)$ maximum matching algorithm for trapezoid graphs. Let $G$ be a trapezoid graph with $n$ vertices whose trapezoid model is given, as in Figure 1. Each vertex $u$ in $G$ is associated with its four endpoints in the trapezoid model, denoted by $a_u, b_u, c_u, d_u$ for the coordinates of the top-left, top-right, bottom-left and bottom-right corners of its trapezoid, respectively. In the box representation (Figure 2), each vertex corresponds to a box (a rectangular region), and we consider only the coordinates of the bottom-left endpoint $(c_u, a_u)$ and the top-right endpoint $(d_u, b_u)$ of that box. As described in Section 2.4 above, our maximum matching algorithm has two main steps: (1) labeling the vertices of the complement graph $G'$ (a comparability graph), and (2) finding a maximum matching of $G$ by the above greedy method. We use the notions $L(u)$ and $N(u)$ as defined in Section 2.4. Since $G'$ is the complement of $G$, two vertices of $G'$ are adjacent iff their corresponding trapezoids do not intersect in the trapezoid model, i.e. one lies entirely to the right of the other (e.g. vertices 1 and 5 in Figure 1). We orient the edges of $G'$ from the vertex corresponding to the left trapezoid to the vertex corresponding to the right one.
Definition 1. The level of a vertex $v$ of $G'$ is the length of the longest path from $v$ to an outdegree-zero vertex.
For the labeling purpose, we first put the vertices into levels. We denote the level of vertex $u$ by $\ell_u$. For example, in Figure 1, $\ell_1 = 2$ because one longest path from 1 to an outdegree-zero vertex in $G'$ is 1-4-7, of length 2. Let $k$ be the maximum level among all the vertices and $L_i$ be the set of vertices of level $i$ ($0 \le i \le k$). This putting-into-levels step can be done in $O(n \log n)$ time by the technique for finding a maximum independent set in [17]. The algorithm uses a dynamic process: we assign a number to each vertex while scanning the upper line of the trapezoid model. This number is the length of the longest chain ending at that vertex in the context of the maximum independent set; it is the level of that vertex if we scan in the reverse direction. Note that for every $i$ ($0 \le i \le k$), $L_i$ is a clique in $G$ and an independent set in $G'$. By definition, the level of every successor of a vertex $u$ in $G'$ is lower than $\ell_u$. We prove by induction the following lemma:
Lemma 3.1. For any two vertices $u$ and $v$ of $G'$, if $\ell_v < \ell_u$ then $L(v) < L(u)$.
Proof. The lemma is true for every vertex whose level is 0 or 1. This is because the level-0 vertices are exactly the outdegree-zero vertices (i.e. those without any successor), and they are labeled at the very beginning of the algorithm. Hence their labels are smaller than that of any level-1 vertex. Assume that the lemma is true for every vertex whose level is below $i$ ($i \ge 2$); we show that it is also true for level $i$. Let $u$ be any level-$i$ vertex and $v$ be any vertex with level $\ell_v$ less than $i$. By the definition of level, $u$ must have at least one successor of level $i - 1$. Since $\ell_v < i$, the level of any successor of $v$ is less than $i - 1$. By the induction hypothesis, the largest label in $N(u)$ therefore exceeds every label in $N(v)$, so $N(v) < N(u)$ in lexicographic order. Hence $v$ is always labeled before $u$, and thus $L(v) < L(u)$.
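The level of Definition 1 can be computed by a direct quadratic recursion, sketched below. This is not the $O(n \log n)$ scanning technique of [17]; it only illustrates the definition. The function name `levels`, the tuple layout `(a, b, c, d)` and the five example trapezoids are hypothetical and are not those of Figure 1.

```python
from functools import lru_cache


def levels(traps):
    """traps: dict vertex -> (a, b, c, d), the top-left, top-right,
    bottom-left and bottom-right endpoints of its trapezoid.
    Returns dict vertex -> level (longest path to an outdegree-zero vertex)."""

    def right_of(u, v):
        # v lies entirely to the right of u, i.e. v is a successor of u in G'.
        return traps[v][0] > traps[u][1] and traps[v][2] > traps[u][3]

    @lru_cache(maxsize=None)
    def level(u):
        # 0 for vertices without successors, else 1 + best successor level.
        return max((1 + level(v) for v in traps if right_of(u, v)), default=0)

    return {u: level(u) for u in traps}


# Hypothetical model with distinct endpoints on each line.
traps = {
    1: (1, 3, 1, 2),
    2: (2, 4, 3, 4),
    3: (5, 6, 5, 7),
    4: (7, 10, 6, 8),
    5: (8, 9, 9, 10),
}
lvl = levels(traps)
```

Here trapezoids 1 and 2 intersect, as do 3 and 4, and 4 and 5. Vertices 4 and 5 have no successor, 3 has successor 5, and both 1 and 2 reach 5 through 3, so the levels are 2, 2, 1, 0, 0. Each level set is a clique of the trapezoid graph, as noted above.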
According to Lemma 3.1, after putting into levels, the rest of the labeling process is to sort and label the vertices in each level, from the lowest to the highest level. We do this by using the box representation of the trapezoid graph.
We denote by DomReg(u) the upper right quadrant of an axis-aligned coordinate system whose origin is at the point $(d_u, b_u)$, that is, $\mathrm{DomReg}(u) = (d_u, +\infty) \times (b_u, +\infty)$. Therefore, a vertex $v$ is not adjacent to $u$ in the trapezoid graph $G$ and has a lower level than $u$ iff $(c_v, a_v) \in \mathrm{DomReg}(u)$. For two vertices $u$ and $v$, let Reg(u) be the subregion of DomReg(u) which shares no point with DomReg(v), that is, $\mathrm{Reg}(u) = \mathrm{DomReg}(u) \setminus \mathrm{DomReg}(v)$. We define Reg(v) similarly (see Figure 2). Let Max(R) be the maximum label of all points in $R$, where $R$ is any region in the plane; Max(R) is defined to be zero if $R$ contains no point. In the plane, each vertex is represented only by the bottom-left point of its box in the box representation of the trapezoid graph (i.e. we discard all points of the form $(d_u, b_u)$). Moreover, we label the bottom-left point of each vertex by the label of that vertex. The next lemma is very important in our algorithm:
Lemma 3.2. Let $u$ and $v$ be two vertices whose successors are all labeled. Then $N(u) < N(v)$ in lexicographic order iff Max(Reg(u)) < Max(Reg(v)).
Proof. Note that we consider the complement graph $G'$ of the trapezoid graph in the labeling process. A vertex $u'$ has its corresponding point in DomReg(u) iff $u'$ is a successor of $u$. Hence, each point in $\mathrm{DomReg}(u) \cap \mathrm{DomReg}(v)$ is a common successor of $u$ and $v$. Therefore, to compare $N(u)$ and $N(v)$, we only need to consider the remaining points in Reg(u) and Reg(v). The lemma follows.
The comparison in Lemma 3.2 is a critical operation that will be used in our algorithm for sorting the vertices in each level from $L_0$ to $L_k$. We therefore need a data structure that supports querying the maximum label in a given rectangular region, i.e. a 2D range maximum query, and updating the label of a point efficiently.
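The combinatorial fact behind this comparison can be checked on small inputs: for two decreasingly sorted lists of distinct labels, the lexicographic order is decided by the largest label that appears in only one of them, since all larger labels are common and occupy the same leading positions. The sketch below is an illustration under that reading; the function names are hypothetical, and sets of labels stand in for the points of DomReg(u) and DomReg(v).

```python
def lex_less(labels_u, labels_v):
    """Compare N(u) and N(v) as decreasingly sorted label lists."""
    return sorted(labels_u, reverse=True) < sorted(labels_v, reverse=True)


def max_diff_less(labels_u, labels_v):
    """Compare only the labels that are not shared, which play the role of
    Max(Reg(u)) and Max(Reg(v)); an empty difference counts as 0."""
    only_u = max(labels_u - labels_v, default=0)
    only_v = max(labels_v - labels_u, default=0)
    return only_u < only_v


# Successor label sets of two hypothetical vertices.
Nu = {5, 3, 1}
Nv = {5, 3, 2}
same_answer = lex_less(Nu, Nv) == max_diff_less(Nu, Nv)
```

In the example the shared labels 5 and 3 cancel, and the decision rests on 1 versus 2, so both functions report that the first list is smaller.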

S-Range tree
www.ejgta.org | Efficient maximum matching algorithms for trapezoid graphs | Phan-Thuan Do et al.
We use a range tree data structure to store only the bottom-left points of the vertices in their box representation (the point at coordinate $(c_u, a_u)$ for each vertex $u$; see the black points in Figure 2). Recall that a range tree can be constructed by using the fractional cascading technique [22,28] as follows. First, sort the points by increasing x-coordinate, and build a binary search tree over them. Second, at each internal node, sort the points in its subtree by y-coordinate using merge sort. Each element in the point list of an internal node has two pointers to the appropriate elements in the lists of its left and right children. Specifically, an element $\alpha$ has a pointer to the element corresponding to the same point in one child, and a pointer to the element in the other child having the smallest y-coordinate that is bigger than the y-coordinate of $\alpha$. These pointers indicate the exact segment to query while searching for points in a given rectangular region. For querying the maximum label and updating labels, we add a segment tree over the labels of the points at each internal node (see the construction of segment trees in Section 2.1). We call this modified range tree the S-Range tree.
As we can see in Figure 3, the S-Range tree has a recursive structure: an S-Range tree of $n$ points is formed by its root node and two S-Range trees of $n/2$ points. Each node contains a key $x$ used for searching by x-coordinate as in a binary search tree, a list of its points sorted by y-coordinate, and a segment tree $\tau$ built over the labels of these points. Note that the S-Range tree is very similar to the data structure introduced in Section 3 of [11], which the authors describe as a kind of scaling tree. That structure is also used to answer RMQ efficiently in any dimension. For convenience, we call their data structure the C-Range tree in dimension 2 and the multidimensional C-Range tree in general. The only difference between the S-Range tree and the C-Range tree is that, instead of a segment tree, the C-Range tree uses CRMQ (introduced in Section 2.2) at each node. Therefore, they differ slightly in time complexity and supported operations, which we discuss further in Section 4.
The construction of the S-Range tree takes $O(n \log n)$ time, as for a normal range tree, since the building time for the segment tree at each internal node is just $O(n')$, where $n'$ is the number of points in that node. Given a rectangular region, a query traverses $O(\log n)$ nodes of the S-Range tree. At each node, querying the maximum label in the segment tree takes $O(\log n)$ time. Hence the total time for querying the maximum label in any given rectangular region is $O((\log n)^2)$. For the update operation, we only need to follow the sequence of pointers corresponding to the point whose label is updated, and update the segment tree at each of the $O(\log n)$ nodes on the way. Therefore, an update also takes $O((\log n)^2)$ time.
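The following is a simplified sketch in the spirit of the S-Range tree, under stated assumptions: all x-coordinates and all y-coordinates are distinct, labels are non-negative with 0 meaning "no label", and binary search (`bisect`) replaces the fractional cascading pointers. Queries and updates still cost $O((\log n)^2)$, but this simple build re-sorts at every node and therefore takes $O(n(\log n)^2)$. The class names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class MaxTree:
    """Max segment tree over n slots, all initially 0."""

    def __init__(self, n):
        self.n = n
        self.t = [0] * (2 * n)

    def update(self, i, value):
        i += self.n
        self.t[i] = value
        i //= 2
        while i >= 1:
            self.t[i] = max(self.t[2 * i], self.t[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        """Maximum over slots lo..hi-1 (0 if the range is empty)."""
        best = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                best = max(best, self.t[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, self.t[hi])
            lo //= 2
            hi //= 2
        return best


class RangeMaxTree2D:
    """Balanced tree over x; each node keeps its y-values sorted plus a
    MaxTree over the labels in that y-order."""

    def __init__(self, points):
        self.pts = sorted(points)
        self.xs = [p[0] for p in self.pts]
        self.nodes = {}                       # node id -> (sorted ys, MaxTree)
        self._build(1, 0, len(self.pts))

    def _build(self, node, lo, hi):           # node covers self.pts[lo:hi]
        ys = sorted(p[1] for p in self.pts[lo:hi])
        self.nodes[node] = (ys, MaxTree(len(ys)))
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid)
            self._build(2 * node + 1, mid, hi)

    def update(self, point, label):
        """Set the label of a stored point on its whole root-to-leaf path."""
        i = bisect_left(self.pts, point)
        node, lo, hi = 1, 0, len(self.pts)
        while True:
            ys, tree = self.nodes[node]
            tree.update(bisect_left(ys, point[1]), label)
            if hi - lo == 1:
                break
            mid = (lo + hi) // 2
            if i < mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid

    def query(self, x1, x2, y1, y2):
        """Maximum label in the closed rectangle [x1, x2] x [y1, y2]."""
        l, r = bisect_left(self.xs, x1), bisect_right(self.xs, x2)
        return self._query(1, 0, len(self.pts), l, r, y1, y2) if l < r else 0

    def _query(self, node, lo, hi, l, r, y1, y2):
        if r <= lo or hi <= l:
            return 0
        if l <= lo and hi <= r:               # canonical node: search y only
            ys, tree = self.nodes[node]
            return tree.query(bisect_left(ys, y1), bisect_right(ys, y2))
        mid = (lo + hi) // 2
        return max(self._query(2 * node, lo, mid, l, r, y1, y2),
                   self._query(2 * node + 1, mid, hi, l, r, y1, y2))


tree2d = RangeMaxTree2D([(1, 1), (3, 3), (4, 5), (7, 6)])
for point, lab in [((1, 1), 4), ((3, 3), 3), ((4, 5), 1), ((7, 6), 2)]:
    tree2d.update(point, lab)
```

An x-range decomposes into $O(\log n)$ canonical nodes, and each contributes one binary search and one segment tree query, which gives the $O((\log n)^2)$ bound stated above.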

Algorithm
We can summarize the whole vertex-labeling process as follows:
1. Put the vertices into levels $L_0, L_1, \ldots, L_k$.
2. Initialize the label of every vertex to zero.
[Figure 4: ... Figure 1 with the reduced edge set (consider only the edges between two consecutive levels) and an appropriate vertex labeling.]
Recall that operations 1) and 4) take $O(n \log n)$ time, and operations 2) and 3) trivially take $O(n)$ time. Suppose that the set $L_i$ has $n_i$ vertices. Since the comparison takes $O((\log n)^2)$ time, the sorting operation in 5a) takes $O((\log n)^2\, n_i \log n_i)$ time (using any $O(n \log n)$ sorting algorithm). Operation 5b) simply takes $O(n_i)$ time, and operation 5c) takes $O((\log n)^2\, n_i)$ time since there are $n_i$ vertices to be updated. Since $\sum_{i=1}^{k} n_i \log n_i < n \log n$, operation 5) for all $k$ sets takes $O(n(\log n)^3)$ time. Therefore, the overall time for the labeling process is $O(n(\log n)^3)$.
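The bound for the sorting operation 5a) can be spelled out as follows, using only $n_i \le n$ and $\sum_{i=1}^{k} n_i \le n$; the constant $c$ is introduced here for the derivation:

$$\sum_{i=1}^{k} c\,(\log n)^2\, n_i \log n_i \;\le\; c\,(\log n)^2 \cdot \log n \sum_{i=1}^{k} n_i \;\le\; c\, n (\log n)^3 .$$

By the same argument, operations 5b) and 5c) sum to $O(n)$ and $O(n(\log n)^2)$ over all levels, so sorting is the dominant term.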
After labeling every vertex, all that remains is to perform a greedy matching step in the trapezoid graph $G$. The S-Range tree constructed in the above step is still useful for this purpose. Recall that DomReg(u) contains only the points corresponding to vertices that are not adjacent to $u$ and have a lower level than $u$. We denote by MaxLabel(u) the maximum label of the vertices whose points lie outside DomReg(u), that is, inside the region $(0, +\infty) \times (0, +\infty) \setminus \mathrm{DomReg}(u)$. We can compute MaxLabel(u) efficiently by dividing the query region into two rectangular regions (e.g. $(0, d_u) \times (0, +\infty)$ and $(d_u, +\infty) \times (0, b_u)$), and using the range tree structure above to find the maximum label in these two regions. This MaxLabel(u) query takes $O((\log n)^2)$ time. Our matching step can be described as follows. We go through every set from $L_k$ to $L_0$. In each set, we visit the vertices from the highest label to the lowest one, and for each vertex $u$:
• If $L(u) = 0$, do nothing and continue with the next vertex, since $u$ was already matched with a higher-label vertex.
• If $L(u) > 0$, update the label of $u$ to 0 and perform the MaxLabel(u) query:
  - If MaxLabel(u) = 0, then $u$ has no free adjacent vertex, so $u$ is not matched.
  - If MaxLabel(u) > 0, then match $u$ with the vertex $v$ having that label, and update the label of $v$ to 0.
Proof. Note that the region queried by MaxLabel(u) contains only two kinds of points: points of vertices adjacent to $u$, and points of vertices not adjacent to $u$, which must have a higher level and whose labels were therefore set to 0 before $u$ is visited. So MaxLabel(u) always gives the vertex having the highest label that is adjacent to $u$, or returns 0 if no such vertex exists. Hence, our greedy algorithm works correctly.
The algorithm goes through every vertex. For each vertex, we use at most two update operations and one MaxLabel query, each of which takes $O((\log n)^2)$ time. The total time for the greedy matching step is then $O(n(\log n)^2)$. For example, from the vertex-labeling shown in Figure 4, the greedy matching step produces a maximum matching of the trapezoid graph in Figure 1 consisting of three edges: (1, 2), (3, 4) and (6, 7).
Proof. Our maximum matching algorithm on a trapezoid graph is correct because its two main steps follow the maximum matching algorithm for cocomparability graphs described in Section 2.4. Since the time complexity of the labeling step is $O(n(\log n)^3)$, and the greedy matching step takes $O(n(\log n)^2)$, the total time for our algorithm is $O(n(\log n)^3)$.
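The two steps of this section can be traced end to end with a brute-force sketch in which linear scans stand in for the S-Range tree, so it runs in polynomial but not near-linear time. The function name, the tuple layout `(a, b, c, d)` and the five trapezoids are hypothetical; visiting vertices by decreasing label is used in place of the level-by-level order, which is equivalent by Lemma 3.1.

```python
def trapezoid_matching(traps):
    """traps: dict vertex -> (a, b, c, d) = top-left, top-right, bottom-left,
    bottom-right endpoints. Returns a list of matched pairs."""
    names = list(traps)

    def right_of(u, v):                       # v is a successor of u in G'
        return traps[v][0] > traps[u][1] and traps[v][2] > traps[u][3]

    succ = {u: [v for v in names if right_of(u, v)] for u in names}

    # Labeling step (Section 2.4), done naively.
    label = {}
    while len(label) < len(names):
        ready = [u for u in names
                 if u not in label and all(v in label for v in succ[u])]
        u = min(ready,
                key=lambda w: sorted((label[v] for v in succ[w]), reverse=True))
        label[u] = len(label) + 1

    # Greedy step: `current` holds labels that are zeroed as vertices are used.
    current = dict(label)

    def max_label_vertex(u):
        _, b, _, d = traps[u]
        # Points (c_v, a_v) outside DomReg(u) = (d_u, +inf) x (b_u, +inf).
        outside = [v for v in names
                   if not (traps[v][2] > d and traps[v][0] > b)]
        best = max(outside, key=current.get)  # `outside` always contains u
        return best if current[best] > 0 else None

    matching = []
    for u in sorted(names, key=label.get, reverse=True):
        if current[u] == 0:
            continue                          # matched earlier
        current[u] = 0
        v = max_label_vertex(u)
        if v is not None:
            current[v] = 0
            matching.append((u, v))
    return matching


traps = {
    1: (1, 3, 1, 2),
    2: (2, 4, 3, 4),
    3: (5, 6, 5, 7),
    4: (7, 10, 6, 8),
    5: (8, 9, 9, 10),
}
matching = trapezoid_matching(traps)
```

In this model the intersecting pairs are (1, 2), (3, 4) and (4, 5), so no matching can have more than two edges; the sketch finds two.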

Improved algorithm by using C-Range trees
In this section, we describe an $O(n(\log n)^2)$ maximum matching algorithm for trapezoid graphs, obtained by improving the complexity of the labeling step to $O(n(\log n)^2)$.
In the labeling process, the operation that most affects the complexity of the algorithm is sorting the vertices in each level based on the label-comparison in Lemma 3.2. The previous algorithm used an S-Range tree to perform this comparison in $O((\log n)^2)$ time. In this improved algorithm, we introduce the C-Range tree to execute this comparison in only $O(\log n)$ time, based on the idea of CRMQ. The construction of a C-Range tree is analogous to that of an S-Range tree. Given the similarity between the segment tree and CRMQ, we compare the operations supported by these two types of range tree in Table 1. To gain efficiency in the maximum label query, we must change the algorithm so that it avoids update operations. Unlike the previous algorithm, which uses only one S-Range tree, this algorithm constructs $(k + 2)$ C-Range trees:
• A lvl-C-Range tree to query the maximum level of the points in a given rectangular region. Since the levels of the points are fixed after putting into levels, this data structure can be built at the beginning of the algorithm.
• $(k + 1)$ C-Range trees to query the maximum label in each level (denoted by C-Range tree-0, C-Range tree-1, ..., C-Range tree-k). C-Range tree-i stores only the points corresponding to the level-i vertices, and is constructed once every label of the vertices in that level is known. We use C-Range tree-i to compare the labels of the higher-level vertices.
We need to use many C-Range trees since the label of every vertex is not known from the beginning; it is obtained based on the labels of other lower-label vertices. To be applicable in our algorithm, querying the maximum label in a given rectangular region must be divided into two sub-operations:
1. Find the maximum level of the points in this region; denote this level by $t$ (by Lemma 3.1, the maximum-label point also has the maximum level).
2. Query the maximum label in this region using C-Range tree-t.
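The two sub-operations can be illustrated by the sketch below, where plain list scans stand in for the lvl-C-Range tree and for C-Range tree-t, so only the decomposition is shown and not the $O(\log n)$ running time. The function names and the five labeled points `(x, y, level, label)` are hypothetical; the labels respect Lemma 3.1 (a higher level implies a higher label).

```python
def in_region(p, region):
    x1, x2, y1, y2 = region                   # closed rectangle
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2


def max_label_direct(points, region):
    """Reference answer: maximum label over all points in the region."""
    return max((p[3] for p in points if in_region(p, region)), default=0)


def max_label_two_step(points, region):
    """Step 1: maximum level t in the region (role of the lvl-C-Range tree).
    Step 2: maximum label among level-t points (role of C-Range tree-t)."""
    inside = [p for p in points if in_region(p, region)]
    if not inside:
        return 0
    t = max(p[2] for p in inside)
    return max(p[3] for p in inside if p[2] == t)


# (x, y, level, label)
points = [(1, 1, 2, 4), (3, 2, 2, 5), (5, 5, 1, 3), (6, 7, 0, 1), (9, 8, 0, 2)]
region = (4, 10, 4, 10)
answer = max_label_two_step(points, region)
```

For the region shown, the points inside have levels 1, 0 and 0, so step 1 selects level 1 and step 2 returns the label of the single level-1 point, which agrees with the direct maximum.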
Since these two sub-operations take $O(\log n)$ time each, querying the maximum label in a given rectangular region takes only $O(\log n)$ time. We can summarize the entire improved labeling process as follows:
1. Put the vertices into levels $L_0, L_1, \ldots, L_k$.
2. Construct a lvl-C-Range tree to query the maximum level in a rectangular region.
3. Label the vertices in $L_0$ starting from 1 in an arbitrary order.
4. Construct C-Range tree-0 to query the maximum label of level 0.
5. For each $i$ from 1 to $k$:
(a) Sort the vertices in $L_i$ in increasing label order based on the comparison described above.
(b) Label the vertices in $L_i$ in increasing order, starting from (maximum label in $L_{i-1}$) + 1.
(c) Construct C-Range tree-i to serve the label-comparisons of higher-level vertices.
Suppose that $L_i$ has $n_i$ vertices. Operations 1) and 2) take $O(n \log n)$ time, operation 3) takes $O(n_0)$ time, and operation 4) takes $O(n_0 \log n_0)$ time. Since the comparison described above takes $O(\log n)$ time, operation 5a) takes $O((\log n)\, n_i \log n_i)$ time, operation 5b) takes $O(n_i)$ time and operation 5c) takes $O(n_i \log n_i)$ time. Hence, operation 5) for all $k$ sets takes $O(n(\log n)^2)$ time, which is also the time complexity of the whole improved labeling process. The remaining step of our algorithm, the greedy matching step, is still implemented as in the previous algorithm, since the update operations there cannot be avoided. Therefore, the overall time complexity of our algorithm is $O(n(\log n)^2)$, and we have the following theorem:
Theorem 4.1. A maximum matching in a trapezoid graph can be found in $O(n(\log n)^2)$ time.

Generalization
A k-trapezoid graph ($k \ge 1$) is an intersection graph of k-trapezoids between $k$ parallel lines. A k-trapezoid is a polygon formed by $k$ intervals, one on each line, by joining the starting points and joining the ending points of every two consecutive intervals. This generalization of trapezoid graphs was first proposed in [9]. Note that an interval graph is a 1-trapezoid graph and a trapezoid graph is a 2-trapezoid graph. Since k-trapezoid graphs are cocomparability graphs (the complements of k-trapezoid graphs are comparability graphs), we can apply the algorithm described in Section 2.4 to find a maximum matching in this graph class. Like a trapezoid graph, a k-trapezoid graph also has a box representation. Hence we only need to consider the coordinates of the bottom and top points of each box. The bottom (top) point of a box is the point whose coordinates are the $k$ starting (ending) points of the intervals in the k-trapezoid representation. Since almost all crucial operations of the maximum matching algorithm on trapezoid graphs use a range tree of dimension 2, we can easily extend our method to k-trapezoid graphs by using a multidimensional range tree. Here are some specific details:
• Putting the vertices into levels: instead of using [17], we apply the technique for solving the maximum independent set or the minimum clique cover problem in a k-trapezoid graph from [9]. This process takes $O(n(\log n)^{k-1})$ time.
• The construction time for both the k-dimensional S-Range tree and the k-dimensional C-Range tree is $O(n(\log n)^{k-1})$.
• Querying the maximum label in a rectangular region is extended to querying the maximum label in a k-dimensional region. This operation takes $O((\log n)^k)$ and $O((\log n)^{k-1})$ time on the extended S-Range tree and C-Range tree, respectively.
• The update operation of an extended S-Range tree takes $O((\log n)^k)$ time.
Therefore, by extending the dimension of the range trees, we obtain an $O(n(\log n)^k)$ algorithm for finding a maximum matching in a k-trapezoid graph.
Theorem 4.2. A maximum matching in a k-trapezoid graph ($k \ge 2$) can be found in $O(n(\log n)^k)$ time.
Note that our algorithm adapts not only to larger graph classes, by raising the dimension of trapezoid graphs, but also to a lower dimension. If we regard the 1-dimensional S-Range tree as a segment tree and the 1-dimensional C-Range tree as a CRMQ, then we similarly obtain an $O(n \log n)$ algorithm for finding a maximum matching in 1-trapezoid graphs, i.e. interval graphs. Therefore, Theorem 4.2 is also true for $k = 1$. This is of limited significance, since an $O(n \log \log n)$ maximum matching algorithm for interval graphs is already known from [19,25]. However, it confirms the flexibility of our algorithm.

Conclusion
In this paper, we present an $O(n(\log n)^2)$ algorithm for finding a maximum matching in a trapezoid graph, and extend the result to obtain an $O(n(\log n)^k)$ algorithm for k-trapezoid graphs. To the best of our knowledge, these are the first efficient algorithms to solve the problem. Since no lower bound for this problem is known except the trivial bound $\Omega(n)$, we believe that the complexity we obtained is not optimal. One hypothesis is that if Range Maximum Queries can be answered in constant time on permutation or trapezoid models, there could be a linear-time maximum matching algorithm for both of them. We leave the following conjecture as an open question.

Conjecture 1.
There exists an O(n) algorithm to find a maximum matching in a trapezoid graph given its trapezoid representation, where n is the number of vertices.