Journal of Graph Algorithms and Applications
I/O-efficient Algorithms on Near-planar Graphs
Haverkort & Toma

Obtaining I/O-efficient algorithms for basic graph problems on sparse directed graphs has been a long-standing open problem. The best known algorithms for most basic problems on such graphs still require Ω(V) I/Os in the worst case, where V is the number of vertices in the graph. Nevertheless optimal O(sort(V)) I/O algorithms are known for special classes of sparse graphs, like planar graphs and grid graphs. It is hard to accept that a problem becomes difficult as soon as the graph contains a few deviations from planarity. In this paper we extend the class of graphs on which basic graph problems can be solved I/O-efficiently. We discuss several ways to transform graphs that are almost planar into planar graphs (given a suitable drawing), and based on those transformations we obtain the first I/O-efficient algorithms for directed graphs that are almost planar. Let G be a directed graph that is given as a planar subgraph (V, E) and a set of additional edges EC. Our main result is a single-source-shortest-paths algorithm that runs in O(EC + sort(V + EC)) I/Os. When EC is small our algorithm is a significant improvement over the best previously known algorithms, which required Ω(V) I/Os. Alternatively, when G is given with a drawing with T crossings, we can compute single-source shortest paths in O(sort(V + T)) I/Os. We obtain similar bounds for computing (strongly) connected components, breadth-first and depth-first traversals and topological ordering.


Introduction
When working with massive graphs, only a fraction of the data can be held in the main memory of a computer. Thus, the transfer of blocks of data between main memory and disk, rather than the actual computation, is often the bottleneck. Therefore the running time can be improved considerably by developing external-memory or I/O-efficient algorithms: algorithms that specifically optimize the number of block transfers between main memory and disk.
As graph problems are widely encountered in practice, I/O-efficient algorithms for graph problems have been an active area of research. Even though significant progress has been made, there is still a significant gap between the lower and the upper bounds for all basic problems. Consider a directed graph (digraph) G with non-negative real edge weights. A shortest path from vertex u to vertex v in G is a minimum-length path from u to v in G, where the length of a path is the sum of the weights of the edges on the path. The length of a shortest path is called the distance δ_G(u, v) from u to v in G. The single-source-shortest-paths (SSSP) problem is to find shortest paths from a source vertex s to all vertices in G. For planar digraphs (graphs that can be embedded in the plane such that no two edges intersect), there exist SSSP algorithms with upper bounds on the number of block transfers that match proven lower bounds up to a constant factor. However, for general digraphs, the SSSP problem is still open, as are other basic problems such as the computation of connected components (CC), strongly connected components (SCC), and depth-first and breadth-first traversals (DFS, BFS). We note that these problems are also open on undirected graphs, although the best known upper bounds on undirected graphs have seen significant progress in recent years.
Both from a theoretical and from a practical point of view, it is hard to accept that SSSP should become extremely difficult as soon as a graph contains a few deviations from planarity (like the one in Fig. 1). In practice, networks (e.g. transportation networks) may not be planar. However, when edges are expensive and junctions are cheap, such networks still have a strong tendency to planarity: there will be only relatively few links (e.g., motorways) that cross other edges without connecting to them. Other examples are networks in which each vertex is connected to a few nearby vertices. In such networks, there may be quite a number of crossings, but they are all very 'local'. In this paper we give a characterization of near-planarity covering a wide range of near-planar graphs, and develop the first I/O-efficient algorithms for such graphs. An extended abstract of this work appeared in [19].

I/O-Model and related work:
We develop I/O-efficient algorithms using the standard two-level I/O-model of Aggarwal and Vitter [1]. The model defines two parameters: M is the number of vertices/edges that fit into internal memory, and B is the number of vertices/edges that fit into a disk block, where B ≤ M/2. An Input/Output (or I/O) is the operation of transferring a block of data between main memory and disk. The I/O-complexity of an algorithm is the number of I/Os it performs. The basic bounds in the I/O-model are scan(N) = Θ(N/B) and sort(N) = Θ((N/B) log_{M/B}(N/B)), where scan(N) represents the number of I/Os required to read N contiguous items from disk and sort(N) represents the number of I/Os required to sort N contiguous items on disk [1]. For all realistic values of N, B, and M, scan(N) < sort(N) ≪ N.
I/O-efficient graph algorithms have been considered by a number of authors; for a recent review see Meyer et al. [27]. On general digraphs G = (V, E) with V > M, the best known algorithms for SSSP, as well as for the BFS and DFS traversal problems, use Ω(V) I/Os in the worst case; their complexity is analysed by Buchsbaum et al. [11], Chiang et al. [12], and Kumar and Schwabe [22]. On sparse graphs, which have E = O(V), the best known bounds are thus O(V) I/Os or worse, which is no better than just running the internal-memory algorithms with an I/O-efficient priority queue in external memory. This is far from the currently best lower bound of Ω(min{V, sort(V)} + E/B) I/Os [12,28], which on sparse graphs is practically Ω(sort(V)).
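The scan and sort bounds of the I/O-model can be evaluated numerically to see how far apart scan(N), sort(N) and N are. The following is a minimal sketch that drops constant factors; the function names and the sample values of N, M and B are assumptions chosen for illustration and do not come from the paper.

```python
import math


def scan_ios(n: int, block: int) -> int:
    """Block transfers needed to read n contiguous items: ceil(n / B)."""
    return math.ceil(n / block)


def sort_ios(n: int, memory: int, block: int) -> float:
    """Sorting bound (N/B) * log_{M/B}(N/B), ignoring constant factors."""
    blocks = n / block
    # At least one pass over the data is always needed.
    passes = max(1.0, math.log(blocks, memory / block))
    return blocks * passes


if __name__ == "__main__":
    # Hypothetical sizes (items, memory capacity, block capacity) with B <= M/2.
    n, m, b = 10**9, 10**6, 10**3
    print("scan:", scan_ios(n, b))
    print("sort:", sort_ios(n, m, b))
    print("N   :", n)
```

With these sample values an algorithm that spends one I/O per vertex is several hundred times more expensive than one that runs within the sorting bound, which is the gap the paper aims to close.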
The search for BFS, DFS and SSSP algorithms using O(sort(E)) I/Os on general (sparse) graphs has led to a number of improved results for special classes of sparse graphs by Arge and Toma [6], Arge and Zeh [9], and Arge et al. [4,5,7]. All these algorithms are based on the existence of small separators. For planar graphs, they exploit R-partitions, as introduced by Frederickson [16]. Given a parameter R, for any planar graph K = (V, E) we can find a subset V_S of O(V/√R) separator vertices, such that the removal of V_S partitions K into O(V/R) subgraphs of size O(R). Moreover, the separator vertices can be "evenly" distributed among the subgraphs, so that each subgraph is adjacent to O(√R) separator vertices, called the boundary of the subgraph. Maheshwari and Zeh [25] showed that such a partition of a planar graph can be computed I/O-efficiently with O(sort(V)) I/Os provided that M ≥ min(cB², cR log² B), for a sufficiently big constant c. All I/O-efficient planar graph algorithms first compute a partition of the graph with R = Θ(B²) and use it to reduce the original problem to a smaller problem, defined on the Θ(V/B) separator vertices. Assuming that M = Ω(B²), each subgraph has size O(B²) = O(M) and thus fits in memory. The reductions rely crucially on the fact that there are a factor of B fewer separator vertices, and they are distributed among the subgraphs, so that each subgraph has a small boundary. Using these ideas, on planar digraphs, SSSP and BFS can be solved in O(sort(V)) I/Os as described by Arge et al. [7], and DFS in O(sort(V) log(V/M)) I/Os as described by Arge and Zeh [9].
Now consider a digraph G = (V, E ∪ E_C) that consists of a planar graph K = (V, E) and a given set of additional edges E_C; with a slight abuse of terminology, we call G_C = G − K = (V_C, E_C) the non-planar part of G, and we call V_C and E_C the non-planar vertices and the non-planar edges, respectively (refer to Fig. 1). If E_C is empty, G is planar and SSSP can be solved with only O(sort(V)) I/Os. On the other hand, if G is not planar, running any of the general I/O-efficient SSSP algorithms would result in Ω(E_C + V) I/Os. Moreover, running the planar SSSP algorithm using a separator of K would also result in Ω(E_C + V) I/Os (details in Sec. 2.2). If E_C is small compared to V, we show that we can do much better by refining the separator of K and extending the ideas behind the planar SSSP algorithm to handle the non-planar part of G.
Our results: In this paper we extend the class of directed graphs that admit I/O-efficient algorithms. We introduce a class of near-planar graphs and show how to find small separators for planar subgraphs of such graphs, whose size depends gracefully on the non-planarities. Using these separators, we develop the first I/O-efficient SSSP, BFS, DFS, topological sorting and (strongly) connected components algorithms for near-planar digraphs.
The main ingredient of our results is a partitioning theorem for a non-planar digraph G = (V, E ∪ E_C) consisting of a planar graph K = (V, E) and a given set of additional edges E_C. Starting with an R-partition of the planar part K of G, we show how to refine it to restrict the number of non-planar vertices v ∈ V_C per subgraph and ensure that the number of subgraphs and separator vertices is not too large. More precisely, we show that an R-partition can be refined so that no subgraph contains more than O(√R) vertices of V_C, while adding no more than O(√(V·V_C)/R^{1/4}) vertices to the separator and increasing the number of subgraphs in the partition by no more than O(V_C/√R). Using this refined R-partition and extending the ideas behind the planar SSSP algorithm, we show how to compute SSSP on G in O(E_C + sort(V + E_C)) I/Os.
We generalize our result to digraphs G = (V, E ∪ E_C) such that K = (V, E) can be drawn in the plane with T crossings. If we know for each edge (u, v) of K which edges it crosses, and in which order these crossings occur when traversing the edge from u to v, we can compute SSSP on such a graph G I/O-efficiently, with bounds that depend on V, T and E_C. When a graph is near-planar in the sense that T = O(V) and E_C = O(V/B), these bounds reduce to O(sort(V)), whereas the best known SSSP algorithms for general graphs require Ω(V) I/Os in the worst case.
If information about a suitable drawing (that is, the location of its vertices) of a graph is given, our results allow the computation of SSSP in O(sort(E)) I/Os on graphs with crossing number O(E), on graphs that are k-embeddable in the plane for constant k, on graphs with skewness O(E/B) and on graphs with splitting number O(E/B). We obtain similar results for BFS, DFS, topological ordering and SCC.
Outline: The paper is organized as follows. Sec. 2 presents background on planar partitions and describes how to extend the results to graphs that are not planar. Sec. 3 describes how to use a refined, non-planar partition to compute SSSP efficiently. Sec. 4 extends our approach to other basic graph problems: BFS, DFS, topological sort, and (strongly) connected components. In Sec. 5 we explain how our technique could be used for problems on several types of graphs that are near-planar according to measures of planarity proposed in the literature. We conclude in Sec. 6 and give directions for further research.

Partitioning a Near-Planar Graph
In this section we give an overview of partitions of planar graphs as described by Frederickson [16] and discuss how to extend his result to obtain a partition with similar good properties on graphs that are not planar.
We assume that we work with a directed graph G = (V, E ∪ E_C) that consists of a planar subgraph K = (V, E) of constant degree and a set of edges E_C. Let G_C = (V_C, E_C), where V_C is the set of vertices v ∈ V such that v is an endpoint of an edge in E_C. We call the edges of G_C cross-link edges, the vertices of G_C cross-link vertices and G_C the cross-link graph; for an example see Fig. 1. The graph G is directed, but in this section, when computing a partition of G, we ignore the direction of the edges. For this section and the next two sections we assume that the vertices and edges in G_C are known and labeled as such. We assume that K has degree at most three; in Sec. 5 we will discuss how to transform any planar graph K = (V, E) of higher degree into a planar graph K′ with O(V) vertices, O(E) edges, and degree at most three, so that our algorithms can be run on K′ + E_C and the results can easily be mapped back to G.

Planar partition
By applying the separator theorem by Lipton and Tarjan [24] recursively, Frederickson [16] showed that any planar graph can be partitioned into subgraphs of arbitrarily small size with a small number of separator vertices. More precisely, he showed the following:

Theorem 1 (Frederickson [16]) For any planar graph K = (V, E), given a parameter R ≤ V, we can find a subset V_S ⊂ V of O(V/√R) vertices, such that the removal of V_S partitions K into subgraphs K_i such that:
1. there are O(V/R) subgraphs (clusters);
2. each subgraph has size O(R), and
3. (the vertices in) each subgraph are adjacent to O(√R) separator vertices.

We call such a partition an R-partition; refer to Fig. 2. We use the following notation: the vertices in V_S are separator vertices, and each of the subgraphs is a cluster; the set of vertices in K − K_i adjacent to K_i are the boundary vertices ∂K_i (or simply the boundary) of K_i. We use K̄_i to denote the graph consisting of K_i, ∂K_i and the subset of edges of E connecting vertices in K_i ∪ ∂K_i.
The set of separator vertices can be partitioned into maximal subsets so that the vertices in each subset are adjacent to precisely the same set of subgraphs (Fig. 2). These sets are the boundary sets of the partition. Assuming the graph has constant degree (we will discuss how to ensure this in Sec. 5), there exists an R-partition with only O(V/R) boundary sets [16].
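The definition of boundary sets can be illustrated with a small in-memory sketch: separator vertices are grouped by the exact set of clusters they are adjacent to. The function name `boundary_sets` and the dictionary-based inputs are assumptions made for this illustration; the paper itself works with an edge-list representation on disk.

```python
from collections import defaultdict


def boundary_sets(adj, cluster_of):
    """Group separator vertices by the set of clusters they are adjacent to.

    adj        : dict vertex -> iterable of neighbours (directions ignored)
    cluster_of : dict vertex -> cluster id, defined for non-separator vertices only
    Returns a dict mapping frozenset(cluster ids) -> sorted list of separator vertices.
    """
    groups = defaultdict(list)
    for v in adj:
        if v in cluster_of:
            continue  # v lies inside a cluster, so it is not a separator vertex
        touched = frozenset(cluster_of[u] for u in adj[v] if u in cluster_of)
        groups[touched].append(v)
    return {clusters: sorted(vs) for clusters, vs in groups.items()}


if __name__ == "__main__":
    # Hypothetical degree-3 example: clusters 0 = {a}, 1 = {b}; s1, s2, s3 are separators.
    adj = {
        "a": ["s1", "s2", "s3"],
        "b": ["s1", "s2"],
        "s1": ["a", "b"],
        "s2": ["a", "b"],
        "s3": ["a"],
    }
    print(boundary_sets(adj, {"a": 0, "b": 1}))
```

In this example s1 and s2 form one boundary set, because both are adjacent to exactly the clusters 0 and 1, while s3 forms a boundary set of its own.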

Refining a planar partition
Given a non-planar graph G, we start by computing an R-partition for its planar subgraph K = (V, E). Let G_i denote the subgraph induced by K_i in G. The separator V_S is a separator for K but not necessarily for G, because any subgraph in K may contain up to O(R) cross-link vertices that are connected by cross-link edges to cross-link vertices in other subgraphs, bypassing the separator.
A straightforward way to get a separator for G would be to add all cross-link vertices V_C to V_S; however, the planar SSSP algorithm of Arge et al. [7], run on the basis of such a separator, would use Ω(E_C + V) I/Os (see Remark 3.1). This is essentially the same as running any of the general SSSP algorithms, and no better than just running Dijkstra's algorithm in external memory with an external priority queue.
Therefore, we need a more sophisticated method to get a separator for G. In the remainder of this section we show how to refine the partition of K to incorporate the cross-link vertices of G while ensuring that each subgraph contains O(√R) cross-link vertices and that the total number of separator vertices and subgraphs is not too large. The main conclusion of this section is the following.
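Lemma 1 Let G = (V, E) be a planar graph with non-negative vertex weights w(v) and total weight W = Σ_{v∈V} w(v). Then there is a set of O(√(V·W)) separator vertices whose removal partitions G into O(W) subgraphs G′ with the following properties:
• each subgraph G′ = (V′, E′) has a total weight Σ_{v∈V′} w(v) of at most 1;
• each subgraph G′ has a boundary ∂G′ of O(√V) vertices.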
Proof: The proof follows the proof of Lemmas 1 and 2 from Frederickson [16], which is based on recursive application of the separator theorem by Lipton and Tarjan [24] in two phases: first with uniform weights on the vertices, and then with weights on the separator vertices only. However, we use a non-uniform weight function in the first phase. Note that we are not interested in low-weight separators: it is the weight of the subgraphs that counts.
The first phase of the recursive procedure is as follows. When G has weight w(G) at most 1, we are done. Otherwise, by applying Lipton and Tarjan's separator theorem, we find a subset S of at most 2√2·√V vertices of V such that S separates G − S into two subgraphs A and B that each have weight at most (2/3)·w(G). We partition the subgraphs A and B recursively. This procedure results in a number of subgraphs. By construction each subgraph G′ = (V′, E′) has weight at most 1, and it can be seen that the number of subgraphs is O(W). However, the boundary ∂G′ of a subgraph G′ may still have more than O(√V) vertices; in the second phase of the algorithm we will subdivide each subgraph further into subgraphs with boundary size O(√V). But first we show that so far, the total number of vertices in the subsets S that were selected throughout the recursive process is O(√(V·W)).
Let s(V, W) be the maximum number of separator vertices that may be selected while recursively partitioning a planar graph induced by a set of V vertices with weight W. Note that each of its subgraphs A and B has total weight at most (2/3)·W, and at least one of them has at most V/2 vertices. Therefore s(V, W) is bounded by the following recursive expression:

s(V, W) ≤ c√V + max over 0 ≤ α ≤ 1/2 and 1/3 ≤ β ≤ 2/3 of [ s(αV, βW) + s((1 − α)V, (1 − β)W) ],

where s(V, W) = 0 if W ≤ 1, and c = 2√2. Subgraphs without vertices have zero weight, thus s(0, 0) = 0. Since no graph with 0 vertices and weight W > 0 exists, no separator vertices can ever be selected while recursively partitioning such a graph, so s(0, W) = 0 by definition.
We claim that this recurrence solves to s(V, W) = O(√(V·W)). We prove this by induction on W, starting with the base case 1 < W ≤ 3. It is easy to see that a subgraph with weight at most 3 is subdivided further at most 4 times, so that the claim holds for 1 < W ≤ 3. It remains to prove that for W > 3 the inductive step holds for all 0 < α ≤ 1/2 and 1/3 ≤ β ≤ 2/3 (dividing out all factors c√V); this is done by distinguishing three cases. Thus, it follows that the total number of vertices selected as separator vertices is O(√(V·W)).
We note that the first phase of the above proof amounts to computing a weighted version of Frederickson's partition, and could be obtained using the results of Aleksandrov et al. [2]. However, we believe that the proof above is simpler.
Our algorithm for refining the R-partition of K proceeds by applying Lemma 1 to each subgraph G_i that has more than c√R cross-link vertices, for some fixed constant c. For each such subgraph we assign weight 1/(c√R) to every cross-link vertex in G_i and weight 0 to every other vertex. We get that each resulting subgraph has O(√R) cross-link vertices and O(√R) vertices on its boundary; see Fig. 3. We use Lemma 1 to bound the total number of separator vertices and number of subgraphs resulting from the refinement. We have the following.
Lemma 2 After refining the R-partition of K, the total number of vertices in V_S is O(V/√R + √(V·V_C)/R^{1/4}).
Proof: Let G_i be a subgraph in the original partition of G (before refinement) that had more than c√R cross-link vertices. Subgraph G_i has total weight W_i = |V_C ∩ G_i| / (c√R). Since G_i has O(R) vertices, from Lemma 1 we get that the number of separator vertices obtained by refining a subgraph G_i is O(√(R·W_i)) = O(R^{1/4} · √|V_C ∩ G_i|). Summed over all subgraphs G_i, this gives O(R^{1/4} · Σ_i √|V_C ∩ G_i|) separator vertices. Since √(a + b) ≤ √a + √b, the worst case occurs if the cross-link vertices V_C are evenly distributed over the O(V/R) subgraphs G_i, and we get O((V/R) · R^{1/4} · √(V_C·R/V)) = O(√(V·V_C)/R^{1/4}). Adding this to the O(V/√R) vertices that were already in V_S before we started refining the partition, we get a total of O(V/√R + √(V·V_C)/R^{1/4}).
Recall that a boundary set of a planar partition is a maximal set of separator vertices adjacent to the same subgraphs. In our case G is not planar and we compute a partition of K = G − E_C; thus, a boundary set in the refined partition is a maximal set of separator vertices that are adjacent (ignoring the cross-link edges) to precisely the same set of subgraphs. For simplicity, we can think of all the cross-link vertices in a subgraph G_i which are not in V_S as an additional "boundary" set of that subgraph.
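The evenly distributed case in the bound on the number of separator vertices added by the refinement can be worked out explicitly. The following derivation is a sketch that drops constant factors and assumes that each of the O(V/R) subgraphs has O(R) vertices and contains V_C·R/V cross-link vertices; it uses the O(√(V·W)) separator bound of the weighted partition applied to one subgraph with O(R) vertices and weight W_i.

```latex
\begin{align*}
W_i &= \frac{V_C R / V}{c\sqrt{R}} = \frac{V_C \sqrt{R}}{c\,V}, \\
\sqrt{R \cdot W_i} &= \sqrt{\frac{V_C\, R^{3/2}}{c\,V}}
   = O\!\left(\sqrt{\frac{V_C}{V}}\; R^{3/4}\right), \\
\frac{V}{R} \cdot \sqrt{\frac{V_C}{V}}\; R^{3/4}
   &= \frac{\sqrt{V\,V_C}}{R^{1/4}} .
\end{align*}
```

The last line is the number of separator vertices added over all subgraphs, which is the second term of the bound on the size of V_S.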
Lemma 3 After refining the R-partition of K, the total number of subgraphs and the number of boundary sets in the partition is O(V/R + V_C/√R).
Proof: Let G_i be a subgraph in the original partition (before refinement) that had more than c√R cross-link vertices. By Lemma 1, the number of subgraphs obtained by refining G_i is O(W_i) = O(|V_C ∩ G_i|/√R). The total number of subgraphs that we obtain from refinement, summed over all subgraphs in the original partition, is thus O(V_C/√R). Adding O(V/R) for the subgraphs that already had at most c√R cross-link vertices and did not need to be subdivided further, we get a total of O(V/R + V_C/√R).
To bound the number of boundary sets, we model the refined partition as a region graph: each subgraph represents a vertex, and two vertices are connected by an edge if the corresponding subgraphs share a boundary vertex. By the analysis of [16], from the fact that the vertices in the input graph have degree at most three, it follows that the region graph is planar and that the worst-case number of boundary sets is asymptotically the same as the number of subgraphs G_i.
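The weight assignment that drives the refinement can be sketched in a few lines. Given the number of cross-link vertices in each subgraph of the original R-partition, the sketch below selects the subgraphs that exceed the threshold c√R and computes their total weights; the sum of these weights governs how many new subgraphs the refinement creates. The function name, the list-based input and the default c = 1 are assumptions made for illustration.

```python
import math


def refinement_weights(cross_counts, R, c=1.0):
    """Total weight W_i of every subgraph that needs refinement.

    cross_counts : list where entry i is the number of cross-link vertices in G_i
    R            : partition parameter
    c            : the fixed constant from the text (value here is hypothetical)
    Each cross-link vertex gets weight 1/(c*sqrt(R)); other vertices get weight 0,
    so W_i = cross_counts[i] / (c*sqrt(R)). Subgraphs at or below the threshold
    are left untouched and therefore omitted from the result.
    """
    threshold = c * math.sqrt(R)
    return {i: k / threshold for i, k in enumerate(cross_counts) if k > threshold}


if __name__ == "__main__":
    counts = [8, 3, 16]          # hypothetical cross-link counts for three subgraphs
    weights = refinement_weights(counts, R=16)
    print(weights)
    # The total weight never exceeds V_C / (c*sqrt(R)).
    print(sum(weights.values()), "<=", sum(counts) / math.sqrt(16))
```

Because the number of subgraphs produced from G_i is proportional to W_i, the sum of the weights directly reflects the V_C/√R term in the bound on the number of subgraphs.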
Lemma 4 If R ≤ M/c, for a sufficiently big constant c, then the R-partition of K can be refined with O(sort(V + E_C)) I/Os.
Proof: Given that we know, before refining the partition, for every non-separator vertex the subgraph that contains it, we can sort the partition into an edge-list representation. This representation consists of a list of edges, with the out-edges of each vertex appearing consecutively in the list; thus, the out-edges of vertices in each subgraph G_i are stored consecutively. This representation can be obtained in O(sort(V)) I/Os. Furthermore we label the cross-link vertices in each subgraph as such, in O(sort(V + E_C)) I/Os.
The recursive subdivision algorithm works on one subgraph of the R-partition at a time, each of which can be processed in main memory if R ≤ M/c. Thus, the total number of I/Os needed is the number of I/Os needed to load the subgraphs G_i and their boundaries, one at a time, into memory, and output the results. With the above representation each subgraph G_i and all its adjacent edges can be loaded into memory in O(R/B) I/Os, or O(V/B) I/Os in total. The total time to refine the partition is thus O(sort(V + E_C)) I/Os.

Overall we have the following:
Theorem 2 Let G be a graph that consists of a planar subgraph K of constant degree and a set of edges E_C, and let V_C be the set of vertices of V that are endpoints of edges in E_C. Given a parameter 1 ≤ R < V, there exists a set of vertices V_S ⊂ V whose removal separates K into a set of subgraphs G_i with the following properties:
1. V_S contains O(V/√R + √(V·V_C)/R^{1/4}) vertices;
2. there are O(V/R + V_C/√R) subgraphs;
3. each subgraph contains O(R) vertices, of which O(√R) are cross-link vertices, and is adjacent to O(√R) separator vertices;
4. the number of boundary sets is asymptotically the same as the number of subgraphs.
Furthermore, if M ≥ min(cB², cR log² B) for a sufficiently big constant c, then the above set V_S can be computed with O(sort(V + E_C)) I/Os.
Proof: The size of the partition follows from Lemmas 2 and 3. Provided that M ≥ min(cB², cR log² B), the R-partition can be computed in O(sort(V)) I/Os using the algorithm of Maheshwari and Zeh [25]. With one pass through the partition and E_C one can label the cross-link vertices. Thus, the partition can be refined with O(sort(V + E_C)) I/Os by Lemma 4.
Representation of the refined partition. We refer to the partition of Theorem 2 as a refined partition of G. For the rest of the paper we assume that the refined partition of G is given in edge-list representation, as follows. Let V_σ be the list of vertices of G in the following order: all vertices in V − (V_S ∪ V_C) are at the front of V_σ grouped by the subgraphs G_i, and, within the same subgraph, by vertex ID; then follow all the separator and cross-link vertices v ∈ V_S ∪ V_C grouped by boundary set, and, within the same boundary set, in order of their vertex ID (remember that all cross-link vertices in a subgraph which are not in V_S are considered to be another boundary set of that subgraph). Given that we know for each vertex v ∈ V_S ∪ V_C the boundary set which contains it and for every other vertex the subgraph which contains it, we can produce V_σ in O(sort(V)) I/Os. Moreover, also in O(sort(V)) I/Os, we can associate to each vertex v its position σ(v) in V_σ.
From the ordering σ we produce an edge-list of G by sorting the edges (u, v) by (σ(u), σ(v)). In this list, all the edges contained in or outgoing from a subgraph G_i are consecutive and can be accessed sequentially; similarly, all out-edges of a boundary set are consecutive. Given the refined partition and the ordering V_σ, this edge-list representation of the partition can be obtained in O(sort(V + E_C)) I/Os.
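The ordering V_σ and the resulting edge-list can be sketched in memory as two sorts, mirroring the two sorting steps described above. The function names, the integer identifiers for subgraphs and boundary sets, and the dictionary inputs are assumptions made for this illustration; on disk each `sorted` call would be an external sort.

```python
def build_order(vertices, subgraph_of, boundary_set_of):
    """Return (v_sigma, sigma) for a refined partition.

    subgraph_of     : dict vertex -> subgraph id, for vertices outside V_S and V_C
    boundary_set_of : dict vertex -> boundary set id, for vertices in V_S or V_C
    Interior vertices come first, grouped by subgraph and then by vertex ID;
    separator and cross-link vertices follow, grouped by boundary set and then ID.
    """
    interior = sorted((subgraph_of[v], v) for v in vertices if v in subgraph_of)
    special = sorted((boundary_set_of[v], v) for v in vertices if v in boundary_set_of)
    v_sigma = [v for _, v in interior] + [v for _, v in special]
    sigma = {v: position for position, v in enumerate(v_sigma)}
    return v_sigma, sigma


def edge_list(edges, sigma):
    """Sort directed edges (u, v) by (sigma(u), sigma(v))."""
    return sorted(edges, key=lambda e: (sigma[e[0]], sigma[e[1]]))


if __name__ == "__main__":
    # Hypothetical example with six vertices.
    vertices = [1, 2, 3, 4, 5, 6]
    subgraph_of = {1: 0, 4: 0, 2: 1}
    boundary_set_of = {3: 0, 6: 0, 5: 1}
    v_sigma, sigma = build_order(vertices, subgraph_of, boundary_set_of)
    print(v_sigma)
    print(edge_list([(3, 1), (1, 4), (2, 3), (1, 3), (5, 2)], sigma))
```

After the second sort, the out-edges of every subgraph and of every boundary set occupy a contiguous range of the list, which is what allows them to be read with a sequential scan.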

Non-planar SSSP using a Refined Partition
In this section we show how to use a refined partition of a non-planar digraph G = K ∪ G_C to compute single-source shortest paths I/O-efficiently.
The approach follows the one used by the I/O-efficient algorithm for planar digraphs by Arge et al. [7], which is as follows. First, compute an R-partition of the planar graph K, while ignoring the directions of edges. Given the partition, compute a substitute digraph K^R defined on the separator vertices. The graph K^R is a reduced version of K (it has fewer vertices), and it is constructed such that for any pair of vertices in K^R, the length of the shortest path between them in K^R is the same as in K. The substitute graph K^R is obtained by replacing each subgraph with a complete graph on its boundary vertices; the weight of each edge (u, v) between two boundary vertices u, v of a subgraph K_i is the distance from u to v in that subgraph. In addition, K^R contains the source vertex s and edges to the boundary of its subgraph, with weights defined in a similar way. The substitute graph K^R as computed by Arge et al. [7] has O(V/√R) vertices and O(V) edges. Using K^R the SSSP computation can now be accomplished in two steps: (1) Compute SSSP in K^R; by construction, we thus get the lengths of the shortest paths to separator vertices in K; (2) Compute the shortest paths to non-separator vertices (vertices inside the subgraphs K_i).
To extend this approach to a non-planar graph G we have to incorporate the non-planar part of G. A straightforward way to do this would be to construct an R-partition for K and add all cross-link vertices V_C to V_S. However, as we will explain below in Remark 3.1, the algorithm of Arge et al. [7], run on the basis of such a separator, would use Ω(V + E_C) I/Os in the worst case. Below we show how to exploit Theorem 2 to define the substitute graph G^R of a refined R-partition and get a better result. We will prove the following.

Theorem 3 Let G = K ∪ G_C be a directed non-planar graph. Single-source shortest paths on G can be computed in O(E_C + sort(V + E_C)) I/Os.

Figure 4: We draw all edges between separator vertices and cross-link vertices that have a path in K between them that does not go through any other boundary vertices. Right: The complete substitute graph consisting of edges inside subgraphs, edges between separator vertices, and cross-links, before adding the source vertex s.

The substitute graph
We construct the substitute graph G^R on the basis of a refined R-partition that divides G into subgraphs G_i, as explained in Sec. 2. Note that a shortest path between two arbitrary vertices in G enters and exits a subgraph G_i either through a boundary vertex or through a cross-link vertex. Therefore the substitute graph G^R will be defined on both the separator and the cross-link vertices.
• First, G^R includes the edges between the separator vertices in V_S, and the edges between the cross-link vertices V_C, i.e., the cross-link graph G_C.
• Second, it includes the union of all graphs G^R_i obtained by replacing each subgraph G_i as follows: the vertices of G^R_i are the boundary vertices ∂G_i of G_i and the cross-link vertices V_C ∩ G_i of G_i, and there is an edge from u to v in G^R_i if there is a path from u to v in G_i that does not pass through any vertices of ∂G_i other than u and v. The edge (u, v) has weight equal to the length of a shortest path from u to v in G_i. Note that G^R_i contains edges between boundary vertices, between cross-link vertices and boundary vertices, and between cross-link vertices; see Fig. 4.
• Third, if the SSSP source vertex s is not a separator or a cross-link vertex, we add it to G^R and add edges from s to all the boundary vertices and all cross-link vertices of the subgraph G_i containing s (for which there exists a path from s in G_i); as above, the weight of an edge (s, v) is the length of a shortest path from s to v in G_i.
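The second item of the construction, replacing one subgraph by edges between its boundary and cross-link vertices, can be sketched in memory as follows. This is a minimal illustration and not the authors' implementation: it runs one Dijkstra search per terminal instead of an all-pairs computation, and the function names and the adjacency-dictionary input are assumptions. The search may end at a boundary vertex but never continues through one, which models the requirement that a path does not pass through vertices of ∂G_i other than its endpoints.

```python
import heapq


def restricted_dijkstra(adj, source, blocked):
    """Shortest distances from source; paths may end at, but not pass
    through, vertices in `blocked` (other than the source itself)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u != source and u in blocked:
            continue  # boundary vertex: a path may end here but not continue
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def substitute_edges(adj, boundary, cross_link):
    """Weighted edges of the substitute graph for one subgraph.

    adj        : dict u -> list of (v, weight), the subgraph plus its boundary
    boundary   : boundary vertices of the subgraph
    cross_link : cross-link vertices inside the subgraph
    """
    terminals = set(boundary) | set(cross_link)
    edges = {}
    for u in terminals:
        dist = restricted_dijkstra(adj, u, set(boundary))
        for v in terminals:
            if v != u and v in dist:
                edges[(u, v)] = dist[v]
    return edges


if __name__ == "__main__":
    # Hypothetical subgraph: boundary {b1, b2}, cross-link vertex c, interior x, y.
    adj = {
        "b1": [("x", 1.0)],
        "x": [("c", 2.0), ("b2", 5.0)],
        "c": [("b2", 1.0)],
        "b2": [("y", 1.0)],
        "y": [("b1", 1.0)],
    }
    print(substitute_edges(adj, {"b1", "b2"}, {"c"}))
```

In the example there is no edge from c to b1, because the only path from c to b1 passes through the boundary vertex b2; that connection is still represented in the substitute graph by the two edges (c, b2) and (b2, b1).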
Lemma 5 The substitute graph Proof: The number of vertices in the substitute graph is ) cross-link edges and edges between separator vertices in the partition, and the claimed bound follows.
Recall that δ_G(u, v) denotes the length of the shortest path from u to v in G.

Lemma 6 For any pair of vertices u, v ∈ V_S ∪ V_C ∪ {s}, we have δ_{G^R}(u, v) = δ_G(u, v), that is, G^R maintains shortest paths between its vertices.
Proof: We will first prove δ_{G^R}(u, v) ≤ δ_G(u, v), and then prove δ_{G^R}(u, v) ≥ δ_G(u, v), from which the lemma follows.
Let p be a shortest path in G from u to v. Let u′ ∈ V_S ∪ V_C ∪ {s} be a vertex on p and let v′ be the next vertex from V_S ∪ V_C ∪ {s} on p. Thus the part p_{u′v′} of p between u′ and v′ is either a single edge (which is also included in G^R), or it only visits vertices within a single subgraph G_i. In the latter case, p_{u′v′} must be a shortest path from u′ to v′ in G_i (otherwise we could replace p_{u′v′} with a shortest path from u′ to v′ in G_i and get a path shorter than p, which is impossible by the definition of p). By the construction of G^R, there must be an edge (u′, v′) in G^R which corresponds to p_{u′v′}. This shows that δ_{G^R}(u, v) ≤ δ_G(u, v).
To prove that δ_{G^R}(u, v) ≥ δ_G(u, v), let p be a shortest path in G^R from u to v. Consider an edge (u′, v′) of p. The edge is either an edge in G with weight at least δ_G(u′, v′); or it is an edge in some graph G^R_i, in which case its weight is equal to δ_{G_i}(u′, v′) ≥ δ_G(u′, v′). Therefore the total length of p is at least the total length of some path from u to v in G. This means that δ_{G^R}(u, v) ≥ δ_G(u, v) and this concludes the proof.
Lemma 7 Given a refined partition as in Theorem 2, an edge-list representation of the substitute graph G^R can be computed I/O-efficiently.
Proof: The subgraphs G_i and their boundaries have the sizes given by Theorem 2, and each is loaded O(1) times (because of the assumption that the degree of each vertex is at most three). If R ≤ M/c for a sufficiently large constant c, each subgraph G_i fits in memory and the all-pairs-shortest-paths (APSP) computation on a subgraph can be done without further I/O. Writing all the edges of G^R to disk and sorting them in the end by vertex index to obtain the edge-list of G^R requires O(sort(|G^R|)) I/Os.
The lengths of the shortest paths from s to all vertices v in the substitute graph G^R can be computed by adapting Dijkstra's algorithm as discussed by Arge et al. [7]. One of the problems with SSSP in external memory is figuring out, when relaxing an edge (u, v), the current tentative distance of vertex v. This distance is necessary in order to be able to delete the vertex from the priority queue: known external-memory priority queues support N insertions, extractions and deletions in O(sort(N)) I/Os (which is what we can afford) only if the deletion operations are given the element to be deleted together with its current priority. To this end, in addition to using a priority queue, we maintain a list L that stores the tentative distances from s to all the vertices in G^R, that is, in V_S ∪ V_C ∪ {s}. When extracting a vertex v with distance d(v) from the priority queue, we retrieve the tentative distances of its out-neighbors from L. For each out-neighbor w of v, we check whether its tentative distance as stored in L is greater than d(v) plus the weight of the edge (v, w); if it is, we update the distance of w in L, delete the old entry of w from the priority queue and insert a new entry for w with the updated distance into the queue.
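The interplay between the priority queue and the list L can be sketched in memory. The class `DeleteByKeyQueue` below is a hypothetical stand-in for an external priority queue: its `delete` operation must be handed the vertex together with its current priority, which is exactly the information that L supplies. The lazy-deletion heap used here is an implementation convenience of this sketch, not the structure used in the paper.

```python
import heapq
from collections import Counter


class DeleteByKeyQueue:
    """In-memory stand-in for a queue whose delete needs (element, priority)."""

    def __init__(self):
        self._heap = []
        self._deleted = Counter()

    def insert(self, vertex, priority):
        heapq.heappush(self._heap, (priority, vertex))

    def delete(self, vertex, priority):
        # The caller must know the current priority of the entry to delete.
        self._deleted[(priority, vertex)] += 1

    def extract_min(self):
        while self._heap:
            entry = heapq.heappop(self._heap)
            if self._deleted[entry] > 0:
                self._deleted[entry] -= 1
                continue  # this entry was deleted earlier
            return entry
        return None


def sssp(adj, source):
    """Dijkstra with a tentative-distance list L (non-negative edge weights)."""
    inf = float("inf")
    L = {v: inf for v in adj}
    for u in adj:
        for v, _ in adj[u]:
            L.setdefault(v, inf)
    L[source] = 0.0
    pq = DeleteByKeyQueue()
    pq.insert(source, 0.0)
    while True:
        entry = pq.extract_min()
        if entry is None:
            break
        d_v, v = entry
        for w, weight in adj.get(v, ()):
            if L[w] > d_v + weight:
                if L[w] < inf:
                    pq.delete(w, L[w])  # old priority looked up in L
                L[w] = d_v + weight
                pq.insert(w, L[w])
    return L


if __name__ == "__main__":
    adj = {"s": [("a", 4.0), ("b", 1.0)], "b": [("a", 2.0)], "a": [("t", 1.0)], "t": []}
    print(sssp(adj, "s"))
```

In external memory the expensive part is the lookup `L[w]`, since a naive layout costs one I/O per edge; this is why the order in which L is stored matters.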
To analyze the I/O-complexity of the computation, we bound the number of accesses to the priority queue and to the list L. On the priority queue we perform a number of operations proportional to the number of edges of G^R, which takes O(sort(|G^R|)) I/Os using an I/O-efficient priority queue, e.g. the queue from Arge [3].
The list L is accessed once for every edge of G^R; this is because every vertex in L is accessed once by each incoming edge in G^R. Of course, we cannot afford one I/O per edge. In order to perform the accesses to L efficiently, we store L in the following order: all vertices in V_S are at the front of L, grouped by boundary set, followed by the vertices in V_C − V_S, grouped by the subgraph G_i that contains them. With this order the vertices in the same boundary set, as well as cross-link vertices in the same subgraph, are consecutive in L.

Lemma 8
The accesses to the list L can be performed in We can balance the terms in the expression above by choosing the subgraph size R appropriately.We assume that M = Ω(B 2 ).We have two cases: This concludes the proof of Theorem 3.
In the above algorithm we only discussed how to compute the lengths of the shortest paths. If we are interested in finding the actual paths, we can easily augment the algorithm to output the edges in the shortest path tree. Given a tree, Hutchinson et al. [20] showed how to store it such that for any vertex t, the shortest path between the source (root) s and t can be returned in O(k/B) I/Os, where k is the number of vertices on the path. This data structure can be constructed from the computed distances in O(sort(V)) I/Os.
Corollary 4 Let G = (K ∪ G_C) be a directed non-planar graph. A data structure can be constructed in O(E_C + sort(V + E_C)) I/Os such that the shortest path from a fixed source vertex s to a given vertex t can be found in O(k/B) I/Os, where k is the number of vertices on the path.

Other Non-planar Graph Problems using a Refined Partition
The refined R-partition can be exploited for other basic graph problems on non-planar graphs. The computation of a breadth-first search order is simply a special case of SSSP. Below we mention results for topological order, (strongly) connected components (SCC, CC) and depth-first search (DFS). All algorithms use the refined R-partition of G, which, according to Theorem 2, can be computed in O(sort(V + E_C)) I/Os if M ≥ min(cB^2, cR log^2 B).

Topological order
Let G = K ∪ G_C be a directed acyclic non-planar graph. A refined partition of G can be used to compute a topological ordering on G by extending the algorithm for planar graphs of Arge and Toma [6]. The basic idea is that an ordering of the vertices in order of the lengths of their longest paths from a source vertex gives a valid topological order. The algorithm is similar to SSSP, except that the substitute graph is defined to encode reachability and each vertex is labeled with the length (total weight) of its longest path from a source. The size of G^R is the same as the size of the substitute graph in Sec. 3. The algorithm proceeds by sorting the vertices of G^R in topological order and then processing them in this order to compute the lengths of their longest paths in G^R. Finally each subgraph is loaded into memory to compute the lengths of the longest paths to internal vertices. Sorting all vertices in order of these lengths will put them in topological order.
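The basic idea, that sorting by longest-path length from a source yields a topological order, can be checked with a short in-memory sketch. The function name and the use of unit edge lengths are assumptions made for illustration; the sketch ignores the partition and the substitute graph entirely.

```python
from collections import deque


def topological_order_by_longest_path(adj):
    """adj maps every vertex of a DAG to a list of its out-neighbors."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    longest = {v: 0 for v in adj}  # longest path length from any source
    queue = deque(v for v in adj if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            longest[w] = max(longest[w], longest[v] + 1)
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Every edge (v, w) has longest[w] > longest[v], so sorting by label works.
    return sorted(adj, key=lambda v: longest[v]), longest
```

Vertices with equal labels cannot be joined by an edge, so ties may be broken arbitrarily.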
Analysis: If R ≤ M/c, for a sufficiently large constant c, then each subgraph fits in memory. Computing the initial separator, computing G^R, and the last step can be done as in Sec. 3 with O(sort(V Computing a topological order and longest paths on G^R will cause O(1) I/Os per vertex, cross-link edge and boundary set, making a total of O( ) I/Os (the proof is the same as for Lemma 8).
From Theorem 2 we know that the size of ), and by Lemma 5, the size of Substituting this in the above bounds and adding them up we get the following.

Strongly-Connected Components (SCC)
The same ideas can be used to compute the strongly-connected components of G based on the algorithm by Arge and Zeh [9]. The algorithm is similar to SSSP and topological order, in that it computes a partition, defines a substitute graph G^R, computes SCC on G^R and then computes SCC on the entire graph, loading each subgraph into memory, one by one. The substitute graph is defined to encode reachability between vertices in the same way as it is defined above for topological order, except that it is not weighted. To compute SCC on G^R the standard algorithm explores the edges in depth-first search manner and finds out, when exploring an edge (v, w), whether w has been explored before and whether it causes any of the current components to merge. The challenge is to find out the status of w with o(1) I/Os per edge. Arge and Zeh [9] showed how this can be done by keeping the active vertices in three stacks: one to store one vertex per active component, one to store all vertices in order of discovery, and one to store the adjacency lists of these vertices. The key for I/O-efficiency is to store the out-edges of a vertex on the stack in order of the boundary set of the target vertex. This defines the so-called "stack segments". The stack segment on top of the (adjacency list) stack is kept up-to-date with the SCC labels of the target nodes. It is shown in [9] that maintaining this invariant causes O(1) I/Os per vertex, and O(B) I/Os per boundary set. For the full proof we refer the reader to [9].
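The role of the three stacks can be seen in a plain in-memory, path-based SCC computation. The sketch below is an assumption-laden illustration, not the algorithm of [9]: it has no stack segments and no boundary-set ordering, and the function name is hypothetical. It only shows how one stack holds a representative per active component, one holds the active vertices in discovery order, and one holds the unexplored out-edges.

```python
def strongly_connected_components(adj):
    """adj maps every vertex to a list of out-neighbors; returns vertex -> label."""
    order = {}      # discovery index of each visited vertex
    label = {}      # final SCC label of each finished vertex
    vertices = []   # stack: active vertices in order of discovery
    reps = []       # stack: one representative per active component
    for root in adj:
        if root in order:
            continue
        order[root] = len(order)
        vertices.append(root)
        reps.append(root)
        edges = [(root, iter(adj[root]))]  # stack: unexplored out-edges
        while edges:
            v, it = edges[-1]
            w = next(it, None)  # assumes None is never used as a vertex
            if w is None:
                edges.pop()
                if reps[-1] == v:  # v is the first vertex of its component
                    reps.pop()
                    while True:
                        x = vertices.pop()
                        label[x] = v
                        if x == v:
                            break
            elif w not in order:
                order[w] = len(order)
                vertices.append(w)
                reps.append(w)
                edges.append((w, iter(adj[w])))
            elif w not in label:
                # w is active: merge all components discovered after w's.
                while order[reps[-1]] > order[w]:
                    reps.pop()
    return label
```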
The analysis can be extended to use the refined partition. The size and time to compute G^R are the same. The only difference with computing topological order is computing SCC on G^R. Using the same arguments as in [9], every vertex in G^R will cause O(1) accesses to a stack segment; since a stack segment is stored consecutively, each access takes O(⌈√R/B⌉) I/Os. Every stack segment will be
In all cases we assume that information about a suitable drawing of the graph is given, which will guide our selection of crossings that will be transformed into vertices and edges E_C that are labelled as cross-links, so that the remaining graph K = G − E_C is planar. Unfortunately this assumption seems difficult to overcome. Finding an optimal set of crossings would imply determining the crossing number of G, and finding an optimal set of cross-links E_C would imply determining a maximum planar subgraph of G; both of these problems are well known to be NP-hard [18,30]. However, in practice, graphs often come with good drawings. We can use the drawing of the graph to attempt to identify a large planar subgraph of G.
Below we will discuss the measures of planarity mentioned above and discuss how a graph that is near-planar, according to these measures, can be preprocessed so that it can be operated on by the algorithms described in the previous sections of this paper. After that we explain how to deal with vertices of degree more than three.

Graphs with low skewness
The skewness of a graph G = (V, E) is the minimum size of any set of edges E_C such that G − E_C is planar. When the skewness of a graph is O(E/B) and the set E_C is given, our SSSP algorithm needs only O(sort(E)) I/Os, even if the edges and vertices in E_C form a graph that is far from planar (for example a clique, which would have Θ(E^2/B^2) crossings when drawn in the plane). If E_C is not given, it may be difficult to find it. Finding a minimum-size set E_C corresponds to finding a maximum-size planar subgraph of G, which is NP-hard [17].
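The O(sort(E)) claim above follows from the bound of Theorem 3 by a short calculation. The derivation below is a sketch under assumptions that are not stated at this point in the text: sort(N) is taken to be the usual sorting bound with a logarithmic factor of at least one, and V = O(E).

```latex
\begin{align*}
\operatorname{sort}(N) &= \Theta\!\left(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\right)
  \quad\Longrightarrow\quad \tfrac{E}{B} = O(\operatorname{sort}(E)), \\
O\bigl(E_C + \operatorname{sort}(V + E_C)\bigr)
  &= O\bigl(\tfrac{E}{B} + \operatorname{sort}(E + \tfrac{E}{B})\bigr)
   = O(\operatorname{sort}(E)) \qquad\text{for } E_C = O(E/B).
\end{align*}
```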
When a drawing of the graph is given, we can obtain an approximation for a minimum-size set E_C by describing the problem as a vertex cover problem. Let G' = (V', E') be the crossing graph in which V' has a node v(e) for every edge e in G, and E' has an arc (v(e), v(f)) for every pair of crossing edges e and f in G. A minimum-size set of cross-links E_C that leaves the remaining graph G − E_C planar (that is, without crossings) now corresponds to a minimum-size set of nodes V_C in G' such that every arc in G' is incident to at least one node in V_C.
Finding a minimum-size vertex cover is again an NP-complete problem, even for planar graphs [21], but fortunately a factor-two approximation is sufficient for our purposes. We can find such an approximation as follows. We use the algorithm by Zeh [31] to find a maximal matching in the crossing graph, that is, a maximal set of arcs such that no two of them have a node in common. Since the maximal matching leaves no arc uncovered, while any minimum node cover must contain at least one node of every arc in the matching, we have that the nodes in the maximal matching form a factor-two approximation of a minimum-size node cover. The algorithm takes O(sort(E')) = O(sort(T)) I/Os, where E' is the set of arcs of the crossing graph and T is the number of crossings in the input graph. We get the following:
Theorem 9 Let G = (V, E) be a graph for which we are given a drawing with T crossings. If M = Ω(B^2), then single-source shortest paths, (strongly) connected components, and a topological order (if G is acyclic) of G can be computed with , where E_C is the minimum number of edges that needs to be removed from the drawing of G to make it a plane drawing. A depth-first search order can be computed with
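The factor-two approximation described before Theorem 9 can be sketched in memory as follows. This is a greedy maximal matching over the list of crossings, not the I/O-efficient algorithm of Zeh [31]; the function name and the input format (a list of pairs of crossing edges) are assumptions made for illustration.

```python
def cross_links_from_crossings(crossings):
    """crossings is an iterable of pairs (e, f) of edges that cross each other."""
    matched = set()
    for e, f in crossings:
        # Greedy maximal matching: take the arc if both of its nodes are free.
        if e not in matched and f not in matched:
            matched.add(e)
            matched.add(f)
    return matched  # edges of G to label as cross-links
```

Every crossing has at least one of its two edges in the returned set, so removing the returned edges from the drawing leaves it without crossings.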

Graphs with small splitting number
Another measure of planarity used in the literature is the splitting number. Splitting a vertex is the process of replacing a vertex u by two vertices u_1, u_2, whereby some of the edges incident to u will be reconnected to u_1, while the remaining edges incident to u are reconnected to u_2. The splitting number of a graph is the minimum number of splittings that is needed to make the graph planar.
Finding the splitting number of a graph is NP-hard [15]. When the splitting number of a graph is O(E/B) and the necessary splittings are given, we can solve the SSSP problem on such a graph in O(sort(E)) I/Os, using an approach similar to that for graphs with small skewness. Instead of running the shortest paths algorithm on the original graph, we run it on the planar graph resulting from the splittings, augmented with a zero-weight bidirectional cross-link (u_1, u_2) for every vertex u split into u_1 and u_2. This approach also works for connected components and depth-first search, but not for topological ordering, as it introduces bidirectional edges in the graph.

Graphs with low crossing number
The crossing number of a graph G = (V, E) is the minimum number of edge crossings in any drawing of G in the plane.
Finding the crossing number of a graph is NP-complete [17]. However, when a drawing with T crossings is given, we will show that it can be preprocessed so that our SSSP algorithm described in the previous sections uses O(sort(V + T)) I/Os. As before, we assume that all vertices of the graph have degree at most three.
The idea is to represent each crossing x by two crossing vertices c and c', which are marked as a crossing. Each crossed edge (u, u'), with crossings x_1, ..., x_n in order going from u towards u', is replaced by edges (b_0, c_1), (c_1, c'_1), (c'_1, b_1), (b_1, c_2), (c_2, c'_2), ..., (c_n, c'_n), (c'_n, b_n), where b_0 = u and b_n = u'. The edges crossing (u, u') are transformed in the same way; thus, the edge (v, v') that crosses (u, u') in x_i is replaced by a sequence of edges that also includes (c_i, c'_i). For an illustration, see Fig. 5. The vertices b_i are inserted to make it easier to restore the original connectivity of the graph later; we call these vertices breakers. The resulting graph is a planar graph with O(V) original vertices, O(T) crossing vertices, and O(T) breakers, and all vertices have degree at most three.
We now apply our partitioning scheme from Sec. 2 to the transformed graph. After that, we restore the original connectivity of the graph so that shortest paths, connected components etc. are the same as in the original graph, while we maintain a good partition. We do this as follows. Consider a pair of crossing vertices c_i and c'_i, and let c'_i, v_1, v_2 be the neighbours of c_i in clockwise order, and let c_i, v_3, v_4 be the neighbours of c'_i in clockwise order (see Fig. 5). The vertices v_1, ..., v_4 are points on two input edges: more specifically, they are original input vertices or breakers. The pair c_i, c'_i models the crossing between the segments (v_1, v_3) and (v_2, v_4) of these input edges. We remove the edges (c_i, c'_i), (c_i, v_1), (c_i, v_2), (c'_i, v_3) and (c'_i, v_4) and the vertices c_i and c'_i from the graph. If v_1 and v_3 lie in the same subgraph G_i (that is, a subgraph with its boundary), we put an edge (v_1, v_3) back in. If v_1 and v_3 do not lie in the same subgraph G_i, we put a vertex c_i and edges (v_1, c_i) and (c_i, v_3) back in, with c_i added to the boundary set between the subgraphs that contain v_1 and v_3. Analogously, we put in an edge (v_2, v_4), or a vertex c'_i and two edges (v_2, c'_i) and (c'_i, v_4).
Note that these operations can only decrease the number of vertices and edges in any subgraph. The number of vertices in the boundary sets may increase by a constant factor, as a pair of vertices c_i, c'_i of which only one vertex was in the boundary set may be replaced by a pair c_i, c'_i with both vertices in the boundary set.
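The first transformation above, which replaces crossings by crossing vertices and breakers before partitioning, can be sketched as follows. The function name, the input format and the tuple encoding of the new vertices are assumptions made for illustration. The output is treated as an undirected edge set, since it is only used for partitioning, and the sketch does not perform the later restoration step.

```python
def planarize(edges, crossings_on):
    """edges: dict edge_id -> (u, v). crossings_on: dict edge_id -> list of
    crossing ids met when going from u towards v (absent if uncrossed).
    Returns the set of undirected edges of the transformed graph."""
    out = set()
    for eid, (u, v) in edges.items():
        xs = crossings_on.get(eid, [])
        if not xs:
            out.add(frozenset((u, v)))
            continue
        prev = u  # b_0 = u
        for i, x in enumerate(xs, start=1):
            c, c_prime = ("c", x), ("c'", x)
            # The last breaker b_n is the endpoint v itself.
            b = v if i == len(xs) else ("b", eid, i)
            out.add(frozenset((prev, c)))     # (b_{i-1}, c_i)
            out.add(frozenset((c, c_prime)))  # shared by both crossing edges
            out.add(frozenset((c_prime, b)))  # (c'_i, b_i)
            prev = b
    return out
```

Because both edges through a crossing contribute the same edge between the two crossing vertices, each crossing vertex ends up with degree three and each breaker with degree two.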
Our algorithms for shortest paths (Sec. 3), (strongly) connected components, depth-first search, and topological sort will run correctly on the resulting graph. Note that the transformation before and after partitioning can easily be carried out in O(sort(V + T)) I/Os. We get the following:
Theorem 10 Let G = (V, E) be a directed graph for which we are given a drawing with T crossings. If M = Ω(B^2), then single-source shortest paths, (strongly) connected components, and a topological order (if G is acyclic) of G can be computed with O(sort(V + T)) I/Os. A depth-first search order can be computed with O((V + T)/√B + sort(V + T)) I/Os.
Thus, if a drawing is given where the number of crossings T = O(E), then SSSP, SCC and topological sorting can be solved in O(sort(E)) I/Os and DFS in O(V/√B + sort(E)) I/Os. The approach described above can be compared with the approach described in Sec. 5.1 above. Both approaches have the same asymptotic dependency on T; however, the approach described in Sec. 5.1 also depends on E_C, the number of edges that needs to be removed from the drawing of G to make it a plane drawing. The approach described in Sec. 5.1 may nevertheless be advantageous if a large number of crossings is caused by a small number of edges: in that case the approach of Sec. 5.1 will work with a substitute graph with a small number of cross-link vertices, while the approach described here in Sec. 5.3 would work with a substitute graph to which breaker and separator vertices may be added for many crossings.

Graphs that are k-embeddable in the plane
A graph is k-embeddable in the plane if it can be drawn in the plane so that each edge crosses at most k other edges [29].Since a k-embeddable graph necessarily has small crossing number, the above approach can be taken.

Combining crossings and cross-links
Above we mentioned that graphs that have low crossing number (or are k-embeddable in the plane for small k) can be handled efficiently by replacing crossings by special vertices, while graphs with small skewness or small splitting number can be handled efficiently by identifying a small number of cross-link edges. The two approaches can be combined, which gives Theorem 11. Hence we can find shortest paths in O(sort(E)) I/Os on a graph that consists of O(E/B) cross-links and a graph with crossing number O(E), provided the cross-links and the intersections in the remaining graph are given. However, how to find a constant-factor approximation of a minimum-size set of cross-links such that the rest of the graph has crossing number O(E) remains an open problem.
Graphs with vertices of degree more than three can be handled by first transforming these vertices into vertices of degree three. If the graph is undirected, a vertex of degree d > 3 can easily be transformed into d − 2 vertices of degree 3 while keeping the graph planar, see Fig. 6 (left).
If the graph is directed, the same transformation can be used when computing shortest paths and strongly-connected components. However, this transformation does not work for topological sorting, since the added zero-weight edges would have to be undirected (thus introducing cycles). For directed graphs another transformation can be used that does not introduce cycles, see Fig. 6 (right). This transformation may make the graph non-planar: it may introduce up to d − 2 crossings. However, these can easily be handled with the technique described in Sec. 5.3. This increases the number of I/Os only by a constant factor.
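For the undirected case, one possible way to turn a vertex of degree d > 3 into d − 2 vertices of degree three is to replace it by a path of copies joined by zero-weight edges. The sketch below is an assumption about the construction, and Fig. 6 (left) may differ in detail. The function name is hypothetical, the neighbors are assumed to be distinct and given in cyclic order around the vertex so that planarity is preserved, and the directed construction of Fig. 6 (right) is not covered.

```python
def split_high_degree_vertex(v, neighbors):
    """neighbors: the d > 3 distinct neighbors of v, in cyclic order around v.
    Returns (attach, zero_edges): attach maps each neighbor to the copy of v
    it is reconnected to; zero_edges are the zero-weight path edges."""
    d = len(neighbors)
    if d <= 3:
        raise ValueError("vertex already has degree at most three")
    copies = [(v, i) for i in range(d - 2)]
    # The two end copies each take one extra neighbor.
    attach = {neighbors[0]: copies[0], neighbors[-1]: copies[-1]}
    for i in range(d - 2):
        attach[neighbors[i + 1]] = copies[i]
    zero_edges = [(copies[i], copies[i + 1], 0) for i in range(d - 3)]
    return attach, zero_edges
```

The end copies have two original neighbors and one path edge; every interior copy has one original neighbor and two path edges, so all copies have degree three.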

Discussion
In this paper we extended the class of graphs for which efficient computations of single-source shortest paths are possible from planar graphs to several classes of near-planar graphs. Our approach yields efficient algorithms for graphs with low crossing number, low splitting number, or low skewness, provided suitable drawings are given. Our techniques can also be applied to compute (strongly) connected components, BFS orderings, DFS orderings, and topological orderings.
In theory, creating suitable drawings is difficult, since identifying a maximum planar subgraph or computing the crossing number, splitting number, or skewness of a graph are NP-complete problems [15,18,30].However, in many practical applications of graph algorithms, graphs are given with a drawing or suitable drawings can be produced by heuristic methods.
Even if a good drawing is given, the method to identify cross-links in a graph of low skewness as described in Sec. 5.1 needs to know all crossings in the drawing. The crossings would need to be given or would need to be computed: in the case of a rectilinear drawing we could do so with the external-memory line segment intersection algorithm by Arge et al. [8] or the randomized algorithm by Crauser et al. [13]. One could hope to find an algorithm that can find an effective set of cross-links without computing all crossings in the drawing first. It would also be interesting to find a constant-factor approximation of a minimum-size set of cross-links such that the rest of the graph has crossing number O(E), so that we may have only very few cross-links and handle the remaining crossings with auxiliary vertices as described in Sec. 5.3.
Furthermore, it would be interesting to look into more measures of planarity that may be exploited, such as thickness or geometric thickness [14].

Figure 1: A "near-planar" graph (bottom) consisting of a planar graph (top left) and a set of additional edges (top right).

Figure 2: Left: Partition of a planar graph into subgraphs (white background) and separator vertices (dark background). Each dark region constitutes a boundary set. Right: Removing the separator vertices and their incident edges would make the graph fall apart into its subgraphs.
The second phase of our algorithm subdivides each subgraph that results from the first phase recursively until each subgraph has boundary size O(√V). As with Frederickson, the number of subdivisions required is O(S/√V) and the number of separator vertices added in each step is O(√V), thus only O(S) separator vertices are added. The number of subgraphs in the second phase increases by O(S/√

Figure 3: Top left: A near-planar graph that consists of a planar subgraph and a set of cross-links. Circles represent cross-link vertices. Top right: Partition of the planar subgraph into subgraphs (white background) and boundary sets (dark background), before refinement. If c√R = 6, the central subgraph and the top right subgraph have too many cross-link vertices. Bottom left: Partition after refinement. Bottom right: Partition after refinement with cross-links.
each subgraph contains O(R) vertices, is adjacent to O(√R) separator vertices and contains O(√R) cross-link vertices;

Theorem 3
If M = Ω(B^2), the distances from a given source vertex s to all other vertices in a directed graph G = K ∪ G_C can be computed in O(E_C + sort(V + E_C)) I/Os.

Figure 4: Construction of the substitute graph. Left: Within each subgraph, we draw all edges between separator vertices and cross-link vertices that have a path in K between them that does not go through any other boundary vertices. Right: The complete substitute graph consisting of edges inside subgraphs, edges between separator vertices, and cross-links, before adding the source vertex s.

Proof:
The accesses to the list L are of three types: (1) O(E_C) accesses through the cross-link edges of G^R; (2) O(V_S) accesses through edges between separator vertices; and (3) O(V + V_C √R) accesses through the edges in the substitute graphs G^R_i. The first two types of accesses clearly take O(V_S + E_C) I/Os. We now analyze the third type of accesses to L by counting the number of

Figure 5: Top left: A non-planar graph. Top right: Introducing crossing vertices (circles with dots) and breakers (circles without dots). Bottom left: A partition into subgraphs (white background), with separator vertices drawn on a dark background. Bottom right: Final graph with the original connectivity and a good partition.

Theorem 11
Let G = K ∪ G_C be a directed graph and let a drawing for K with T crossings be given. If M = Ω(B^2), then single-source shortest paths, (strongly) connected components, and a topological order (if G is acyclic) of G can be computed with O(E_C + sort(V + T + E_C)) I/Os. A depth-first search order can be computed with O(E_C + (V + T)/√B + sort(V + T + E_C)) I/Os.

Figure 6: Transforming high-degree vertices into vertices of degree three, in the undirected case (left) and the directed case (right). The fat edges have zero weight.
To compute G^R we need to load, one at a time, each subgraph G_i into memory together with its boundary and cross-link vertices; compute all-pairs shortest paths (APSP) between its boundary and cross-link vertices; and write the edges and their weights to disk. Assume the representation of the partition at the end of Sec. 2. Loading the subgraphs G_i into memory takes O(scan(|G_i|)) for a sufficiently large constant c. Proof: