A note on block-and-bridge preserving maximum common subgraph algorithms for outerplanar graphs

Schietgat, Ramon and Bruynooghe proposed a polynomial-time algorithm for computing a maximum common subgraph under the block-and-bridge preserving subgraph isomorphism (BBP-MCS) for outerplanar graphs. We show that the article contains the following errors: (i) The running time of the presented approach is claimed to be $\mathcal{O}(n^{2.5})$ for two graphs of order $n$. We show that the algorithm of the authors allows no better bound than $\mathcal{O}(n^4)$ when using state-of-the-art general-purpose methods to solve the matching instances arising as subproblems. This is true even in the special case where both input graphs are trees. (ii) The article suggests that the dissimilarity measure derived from BBP-MCS is a metric. We show that the triangle inequality is not always satisfied and, hence, it is not a metric. Therefore, the dissimilarity measure should not be used in combination with techniques that rely on or exploit the triangle inequality in any way. Where possible, we give hints on techniques that are suitable to improve the algorithm.


Introduction
Graph comparison is becoming increasingly important with the growth of data analysis tasks on graphs and networks. An important application occurs in molecular chemistry for the tasks of virtual screening of molecular databases, substructure search of molecules, and the discovery of structure-activity relationships within rational drug design. Here, finding the largest substructure that two molecules have in common is a fundamental task [11]. Since molecules can naturally be represented by graphs, the problem is phrased as the maximum common subgraph problem, which is as follows. Given two graphs, find a graph with the largest possible number of edges that is isomorphic to subgraphs of both input graphs. This classical graph-theoretical problem generalizes the subgraph isomorphism problem and is well known to be NP-hard in general graphs [7]. Even deciding whether a forest $G$ is isomorphic to a subgraph of a tree is an NP-complete problem [7]. However, if $G$ is a tree the subgraph isomorphism problem can be solved in polynomial time [16,17,4,21,19]. The generalisation of this approach to the maximum common subgraph problem is attributed to J. Edmonds [16]. However, the vast majority of molecular graphs of interest are not trees, but outerplanar graphs, i.e., they admit a drawing on the plane without edge crossings such that all vertices are incident to the outer face of the drawing. Even deciding whether a tree is isomorphic to a subgraph of an outerplanar graph is NP-complete [20]. On the other hand, subgraph isomorphism can be solved in polynomial time when both graphs are biconnected and outerplanar [13]. More generally, subgraph isomorphism can be solved in polynomial time for $k$-connected partial $k$-trees [15,8].
Based on these theoretical findings, Horváth, Ramon and Wrobel [10] proposed to consider the so-called block-and-bridge-preserving (BBP) subgraph isomorphism for mining frequent subgraphs in databases of outerplanar molecular graphs. Under the BBP subgraph isomorphism, the blocks, i.e., the biconnected components, and the trees formed by the bridges can be considered separately, and the problem can thereby be solved in polynomial time. Moreover, the approach yields chemically meaningful results, since it requires that the ring systems of molecules are preserved.
The maximum common subgraph problem in outerplanar graphs of bounded degree can be solved in polynomial time [1]. Although molecular graphs have bounded degree and are often outerplanar, the algorithm has a high running time and is probably not suitable for practical use. Schietgat, Ramon and Bruynooghe [18] proposed to determine a maximum common subgraph under the BBP subgraph isomorphism and developed an algorithm with a claimed running time of $\mathcal{O}(n^{2.5})$ for two outerplanar graphs of order $n$. While the authors presented promising experimental results on graphs representing molecules, we show that the theoretical analysis of their approach is flawed. Moreover, we show that the proposed approach to derive a distance from the size (or weight) of the maximum common subgraph does not yield a proper metric.

Preliminaries
We briefly summarize the necessary terminology and notation. A graph $G = (V, E)$ consists of a finite set $V(G) = V$ of vertices and a finite set $E(G) = E$ of edges, where each edge connects two distinct vertices. A path of length $n$ is a sequence of vertices $(v_0, \dots, v_n)$ such that $\{v_i, v_{i+1}\} \in E$ for $0 \le i < n$. A cycle is a path of length at least 3 with no repeated vertices except $v_0 = v_n$. A graph is connected if there is a path between any two vertices. A graph is biconnected if for any two vertices there is a cycle containing them. A tree is a connected graph containing no cycles. A graph $G$ with an explicit root vertex $r \in V(G)$ is called a rooted graph, denoted by $G^r$. A block of a graph is a maximal biconnected subgraph. An edge is a bridge if it is not contained in any block. A matching in a graph $G$ is a subset of edges $M \subseteq E(G)$ such that no two edges in $M$ share a common vertex, i.e., $e \cap e' = \emptyset$ for all distinct edges $e, e' \in M$. Given a bipartite graph $G$ with edge weights $w : E(G) \to \mathbb{R}$, the weighted maximal matching problem asks for a matching $M \subseteq E$ in $G$ such that the weight $w(M) = \sum_{e \in M} w(e)$ is maximal. An isomorphism between two graphs $G$ and $H$ is a bijection $\varphi : V(G) \to V(H)$ such that $\{u, v\} \in E(G)$ if and only if $\{\varphi(u), \varphi(v)\} \in E(H)$. [...] A (BBP) common subgraph $I$ is maximum w.r.t. a weight function $w$ if there is no (BBP) common subgraph $I'$ with $w(I') > w(I)$. The two different concepts, maximum common subgraph (MCS) and BBP-MCS, are illustrated in Figure 1. The above definitions can be naturally extended to graphs with vertex and edge labels, where an isomorphism must preserve labels and the weight function may depend on the labels.

Complexity Analysis
The BBP-MCS algorithm for outerplanar graphs proposed in [18] decomposes the two input graphs into subgraphs with distinct root vertices referred to as parts (see Section 3.2 for a formal definition). An MCS problem for all compatible pairs of parts is then solved using a dynamic programming strategy. Here, a series of weighted maximal matching instances arises as subproblems. It has been claimed [18, Theorem 2] that for two outerplanar graphs $G$ and $H$ of order $n$ the proposed BBP-MCS algorithm runs in time $\mathcal{O}(n^{2.5})$. We show that this bound cannot be obtained by the presented techniques.

Solving Weighted Maximal Matching Problems
The algorithm makes use of a subroutine for solving the weighted maximal matching problem in bipartite graphs, where weights are real values. The matching instances arising in the course of the algorithm may be complete bipartite graphs with a quadratic number of edges, see the counterexample discussed in Section 3.2. Hence, to improve readability, the running times given in the following refer to bipartite graphs with $n$ vertices and $\Theta(n^2)$ edges. The authors propose to use the algorithm by Hopcroft and Karp [9] to solve an instance of the problem in time $\mathcal{O}(n^{2.5})$. Since this algorithm computes a matching of maximum cardinality, but is not designed to take weights into account, it cannot be applied to the instances that occur.
The best known approaches for the weighted problem solve instances with $n$ vertices and $\Theta(n^2)$ edges in time $\mathcal{O}(n^3)$, e.g., the established Hungarian method [3]. If we assume weights to be integers within the range $[0..N]$, scaling algorithms such as [6] become applicable, which solve the problem in time $\mathcal{O}(n^{2.5} \log N)$. This running time is still worse than the time bound for the algorithm by Hopcroft and Karp by a factor depending logarithmically on $N$. Moreover, it is desirable to allow the weight of a common subgraph to be measured by a real number depending on the labels of the vertices and edges it contains, cf. [18, Definition 2]. This leads to real edge weights in the matching instances.
In summary, no better bound than $\mathcal{O}(n^3)$ on the worst-case running time can be assumed for the subproblem of solving weighted maximal matching instances with $n$ vertices.
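The gap between a maximum-cardinality matching, as computed by the algorithm of Hopcroft and Karp, and a matching of maximum weight can be seen on a tiny instance. The following Python sketch enumerates all matchings by brute force instead of running either algorithm; the graph, its vertex names and its weights are hypothetical and not taken from [18].

```python
from itertools import combinations

# Hypothetical toy instance: left vertices a1, a2 and right vertices b1, b2.
# The edge a2-b2 is absent on purpose.
WEIGHTS = {("a1", "b1"): 10.0, ("a1", "b2"): 1.0, ("a2", "b1"): 1.0}


def is_matching(edges):
    """Return True if no two edges share an endpoint."""
    endpoints = [v for edge in edges for v in edge]
    return len(endpoints) == len(set(endpoints))


def all_matchings(weights):
    """Enumerate every matching by brute force (only viable for tiny graphs)."""
    edges = list(weights)
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if is_matching(subset):
                yield subset


def weight(matching, weights):
    """Total weight w(M) of a matching M."""
    return sum(weights[e] for e in matching)


# A matching with the most edges versus a matching with the largest weight.
max_cardinality = max(all_matchings(WEIGHTS), key=len)
max_weight = max(all_matchings(WEIGHTS), key=lambda m: weight(m, WEIGHTS))
```

In this instance the only matching with two edges has total weight 2, whereas the single edge between a1 and b1 has weight 10, so optimizing cardinality alone misses the heaviest matching.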

The Number of Matching Instances
We consider a particularly simple counterexample to illustrate that the running time required to solve the matching problems cannot be bounded by $\mathcal{O}(n^{2.5})$. We identify the flaw in the analysis that led to this incorrect result [18, Proof of Theorem 2]. More precisely, we show that for two graphs $G$ and $H$ of order $n$ the BBP-MCS algorithm performs $\Theta(n)$ calls to the subroutine for weighted maximal matching [18, Algorithm 2, MaxMatch] with instances of size $\Theta(n)$. Since the relationship between the matching instances is not considered in [18], we assume that each instance is solved separately in cubic time, cf. Section 3.1. Therefore, no better bound than $\mathcal{O}(n^4)$ can be given on the total running time.
Let the two graphs $G$ and $H$ both be star graphs of order $n + 1$, i.e., trees with all but one vertex of degree one, as depicted in Figure 2(a). Since trees are outerplanar, $G$ and $H$ are valid input graphs for BBP-MCS. The algorithm presented in [18] relies on a decomposition of the two input graphs into their parts. The set $\mathrm{Parts}(T^r)$ of a rooted tree $T^r$ is recursively defined as follows [18, Definitions 20, 23, 26].
(i) $T^r \in \mathrm{Parts}(T^r)$,
(ii) if $P^p \in \mathrm{Parts}(T^r)$ and $p$ is incident to exactly one edge $\{p, v\}$, then the graph $(P \setminus \{p\})^v$ is in $\mathrm{Parts}(T^r)$, [...]
For the first input graph $G$ an arbitrary root vertex $r$ is selected to define its parts. Let $G$ be the star graph, $r$ its center vertex and let $L(G)$ denote its leaves, then [...] where $c$ is the unique center vertex of $H$ and $B_H$ the subgraphs rooted at $c$ obtained by deleting a single leaf with its incident edge, cf. Figure 2(c). In order to solve the problem, a variant of BBP-MCS, which requires mapping the root of one part to the root of the other, is solved for specific pairs of parts denoted by $\mathrm{Pairs}(G, H)$. If the roots of both parts have multiple children, a matching problem between them must be solved. Such parts are referred to as compound-root graphs and the parts associated with the children are elementary parts, respectively [18]. Note that this is the case for $G^r$ and all the parts in $B_H$; according to [...]
Consequently, there must be an error in the proof of [18, Theorem 2]: The authors claim that every vertex $g \in V(G)$ and every vertex $h \in V(H)$ has at most $\deg(g)$ (resp. $\deg(h)$) elementary parts involved in a maximal matching. While this statement is correct, the subsequent analysis does not take into account that there may be up to $\deg(h)$ matching instances of that size for a vertex $h \in V(H)$. More precisely, the total time spent in RMCScompound for solving matching instances is claimed to be bounded by [...] where $T_{\mathrm{WMM}}(k)$ is the running time for solving a weighted maximal matching instance with $k$ vertices [18, p. 361]. Actually, the procedure considers all pairs of compound-root graphs, where each pair leads to a matching instance containing one vertex for each of the associated elementary parts. The counterexample above shows that for a vertex $h \in V(H)$ there may be $\deg(h)$ compound-root graphs with root $h$, each with $\deg(h) - 1$ elementary parts. In addition, there is one compound-root graph with root $h$ and $\deg(h)$ elementary parts.
Therefore, a correct upper bound is [...] In the counterexample the degree of the center vertex is not bounded, which leads to the additional factor of $n$ appearing in $T^{\mathrm{corrected}}_{\mathrm{comp}}$, but not in $T_{\mathrm{comp}}$.
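The counting argument for the star graphs can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the analysis of [18]: it assumes that the root part of $G$ has $n$ elementary parts, that each of the $n$ parts in $B_H$ has $n - 1$ elementary parts, that the full star $H$ rooted at its center contributes one further instance with $n$ elementary parts, and that an instance with $k$ vertices costs $k^3$ steps. The function names are hypothetical.

```python
def matching_instance_sizes(n):
    """Vertex counts of the matching instances for two stars with n leaves.

    Assumption: one matching vertex per elementary part on either side.
    """
    sizes = [n + (n - 1)] * n  # root part of G paired with each part in B_H
    sizes.append(n + n)        # root part of G paired with H rooted at its center
    return sizes


def total_cost(n, exponent=3):
    """Sum of k**exponent over all instances (cubic time per instance)."""
    return sum(k ** exponent for k in matching_instance_sizes(n))


# Doubling n multiplies the total by roughly 2**4, i.e. quartic growth.
growth = total_cost(2000) / total_cost(1000)
```

With these assumptions the number of instances grows linearly in $n$ and each instance has a number of vertices linear in $n$, so the total grows like $n^4$, in line with the bound stated above.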

Exploiting the Structure of the Matching Instances
The matching instances emerging for the counterexample are closely related, since the symmetric difference of the elementary parts of $Q_1 \in B_H$ and $Q_2 \in B_H$ with $Q_1 \neq Q_2$ contains exactly two elements. It was recently shown that this fact can be exploited by solving groups of similar matching instances efficiently in one pass [5]. This technique was used to show that the maximum common subtree problem can be solved in time $\mathcal{O}(n^2 \Delta)$ for trees of order $n$ with maximum degree $\Delta$, thus leading to $\mathcal{O}(n^3)$ worst-case time. The same technique can be used to improve the running time of the BBP-MCS algorithm.
In [5] the proposed maximum common subtree algorithm was compared experimentally to the BBP-MCS algorithm of [18] using the implementation provided by the authors. The running times reported for the BBP-MCS algorithm actually suggest a growth of $\Omega(n^5)$ on star graphs.
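The close relationship between the matching instances of the star example can be made concrete. The following Python sketch only models the sets of elementary parts, identifying each elementary part with the leaf it contains; this identification and all names are assumptions made for illustration, and the sketch does not implement the technique of [5].

```python
def elementary_parts(leaves, deleted_leaf):
    """Leaves that remain as children of the center after deleting one leaf."""
    return frozenset(leaves) - {deleted_leaf}


leaves = range(6)  # hypothetical star H with six leaves

# One part in B_H per deleted leaf, represented by its elementary parts.
b_h = {leaf: elementary_parts(leaves, leaf) for leaf in leaves}

# Symmetric difference of the elementary parts for every pair of distinct parts.
diffs = {(i, j): b_h[i] ^ b_h[j] for i in leaves for j in leaves if i < j}
```

Every pair of distinct parts differs in exactly the two deleted leaves, so consecutive matching instances share all but one vertex on the side of $H$.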

Violation of the Triangle Inequality
Bunke and Shearer [2] have shown that
$$d(G, H) = 1 - \frac{|\mathrm{Mcs}(G, H)|}{\max\{|G|, |H|\}}, \qquad (3)$$
where $|\mathrm{Mcs}(G, H)|$ is the weight of a maximum common subgraph, is a metric and, in particular, fulfills the triangle inequality. This was originally shown for a definition of the maximum common subgraph problem which requires common subgraphs to be induced and measures the weight of a graph $G$ by $w(G) = |V(G)|$. Lins et al. [14] proved that Eq. (3) is also a metric for the general (not necessarily induced) subgraph relation, where $w(G) = |V(G)| + |E(G)|$.
The article [18] suggests that the weight of a BBP-MCS combined with Eq. (3) is a metric, too. We show that this is not the case. The triangle inequality is violated, since $d(G, F) > d(G, H) + d(H, F)$. In general, the connectivity constraints imposed by BBP-MCS make it difficult to derive a metric. For a more detailed discussion of this topic we refer the reader to [12, Section 3.6].
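Whether a given dissimilarity measure satisfies the triangle inequality on a finite collection can be tested by brute force. The Python sketch below assumes that Eq. (3) has the Bunke-Shearer form $1 - |\mathrm{Mcs}(G, H)| / \max\{|G|, |H|\}$; the function names are hypothetical, and the toy distances on integers used for demonstration are not the counterexample of this note.

```python
from itertools import permutations


def mcs_distance(mcs_weight, weight_g, weight_h):
    """Assumed form of Eq. (3): 1 - |Mcs(G, H)| / max(|G|, |H|)."""
    return 1.0 - mcs_weight / max(weight_g, weight_h)


def triangle_violations(items, dist, tol=1e-12):
    """Return all ordered triples (g, h, f) with d(g, f) > d(g, h) + d(h, f)."""
    return [
        (g, h, f)
        for g, h, f in permutations(items, 3)
        if dist(g, f) > dist(g, h) + dist(h, f) + tol
    ]


# Toy demonstration on integers: |x - y| is a metric, (x - y)**2 is not.
no_violations = triangle_violations([0, 1, 2], lambda x, y: abs(x - y))
violations = triangle_violations([0, 1, 2], lambda x, y: (x - y) ** 2)
```

To apply the check to graphs, `dist` would have to compute the BBP-MCS weight of each pair and pass it to `mcs_distance`; a single reported triple suffices to show that the measure is not a metric.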