A New Multi-level Algorithm for the Balanced Partition Problem on Large-Scale Directed Graphs

Graph partition is a classical problem in combinatorial optimization and graph theory, with many applications such as scientific computing, VLSI design and clustering. In this paper, we study the partition problem on large-scale directed graphs under a new objective function, which gives a new variant of the graph partition problem. We first model this problem, then design an algorithm based on a multi-level strategy and recursive partitioning, and finally carry out extensive simulation experiments. The experimental results verify the stability of our algorithm and show that it performs as well as METIS. In addition, our algorithm outperforms METIS in unbalanced ratio.


Introduction
Graph partition is a classical problem in combinatorial optimization and graph theory. Given a graph G and a parameter k, the aim is to divide the vertex set of G into k parts while optimizing a given objective function. If all parts are required to have the same number (or total weight) of vertices, or as close to the same as possible, the problem is called the balanced graph partitioning problem (BGP). BGP is a standard special case of graph partition with many applications, such as scientific computing, VLSI and chip design, image processing and clustering. Andreev and Räcke [1] showed that BGP is NP-hard even for 2-partition and that there is no constant-factor approximation algorithm for it. In particular, BGP does not admit a constant-factor approximation algorithm unless P = NP, even for trees and grids [2]. Other variants of graph partition with practical motivation have also received extensive attention, such as hypergraph partitioning [3,4], balanced connected graph partitioning [5,6] and path partitioning [7]. Recently, Buluç et al. [8] surveyed the algorithm design and applications of graph partitioning.
Although BGP admits no constant-factor approximation algorithm, its wide applications have motivated many heuristic algorithms. Using a local search strategy, Kernighan and Lin [9] first presented an efficient heuristic for 2-BGP with time complexity $O(n^2 \log n)$. Fiduccia and Mattheyses [10] then developed a linear-time heuristic. The spectral method [11] is another important approach to BGP: it divides the given graph into two parts using the eigenvalues and eigenvectors of its adjacency matrix or Laplacian matrix. Many current graph partition algorithms are based on the spectral method [12,13]; they solve 2-BGP directly, or general k-BGP iteratively.
On the other hand, as problem sizes and computing power grow, the graphs to be partitioned become ever larger, and the number of vertices can reach 100,000,000 or more. The algorithms above are impractical at this scale, so researchers proposed multi-level methods and streaming algorithms for large-scale graph partitioning. The main idea of the multi-level method is to first shrink the original graph into a much smaller graph by repeated contraction, then divide this small graph into k parts, and finally map the partition back and refine it into a partition of the original graph. The popular graph partitioning packages METIS [14] and KaHIP [15] are based on this method. The main idea of streaming algorithms is to assign the vertices of the graph one by one to suitable parts, guided by a specific potential function. Streaming algorithms are fast and memory-efficient, which makes them well suited to large-scale graph partitioning. The graph partitioning software FENNEL [16] is based on a streaming algorithm.
Although many theoretical results and algorithms for graph partition have been obtained, some problems remain unexplored. The first is partitioning directed graphs. Most previous work concerns undirected graphs, but in some practical applications, such as multi-subject coupling problems, the natural models are directed graphs. It is therefore necessary to study partitioning on directed graphs. The second concerns the objective function. Researchers have usually treated vertex weights and edge weights separately, that is, they optimize an edge-weight objective function subject to vertex-weight constraints. Few works consider objective functions that combine the two kinds of weights. Motivated by these two points, we study the partition problem on directed graphs with a combined weight function.
The rest of this paper is organized as follows. Basic concepts of graph theory and the mathematical model of the problem are presented in Sect. 2. In Sect. 3, we introduce the main idea and procedure of our algorithm. The experimental results are presented in Sect. 4; in particular, we verify the stability of our algorithm, determine some parameters and compare our algorithm with METIS. Finally, conclusions and future work are given in Sect. 5.

Basic Concepts and Mathematical Modeling
In this section, we introduce some concepts from graph theory and formulate the mathematical program for the new balanced graph partition problem.
An (undirected) graph G is an ordered pair (V(G), E(G)) consisting of a set V(G) of vertices and a set E(G) of edges. Each edge of G is an unordered pair of vertices. If an edge e joins vertices u and v, then u and v are called the ends of e. A directed graph (digraph) D is an ordered pair (V(D), A(D)) consisting of a set V(D) of vertices and a set A(D) of arcs (directed edges). Each arc of D is an ordered pair of vertices. If an arc a joins u to v, then u is the tail of a, v is the head of a, and u and v are the ends of a. Any graph becomes a directed graph if each edge e = uv is regarded as the two arcs (u, v) and (v, u); thus, undirected graphs can be considered a special class of directed graphs. For any vertex v of D, $A^-_D(\{v\})$ denotes the set of arcs whose head is v, and $A^+_D(\{v\})$ denotes the set of arcs whose tail is v. More generally, for any vertex subset X, $A^-_D(X)$ ($A^+_D(X)$) is the set of arcs whose heads (tails) are in X but whose tails (heads) are not in X. A set M of independent arcs (arcs with no common ends) in a digraph D is called a matching. Given a matching M of D, a vertex v is called matched (by M) if v is an end of some arc of M; otherwise, v is called unmatched. A matching M of D is maximal if $M \cup \{a\}$ is not a matching of D for any arc a not in M.
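The definitions above translate directly into code. The following Python sketch is only an illustration: the digraph is assumed loopless and given as a collection of (tail, head) pairs, and the names `in_boundary`, `out_boundary`, `is_matching` and `is_maximal_matching` are hypothetical. It computes $A^-_D(X)$ and $A^+_D(X)$ and checks whether an arc set is a (maximal) matching.

```python
from typing import Hashable, Iterable, Set, Tuple

Arc = Tuple[Hashable, Hashable]  # an arc (tail, head)


def in_boundary(arcs: Iterable[Arc], X: Set[Hashable]) -> Set[Arc]:
    """A^-_D(X): arcs whose head is in X but whose tail is not."""
    return {(u, v) for (u, v) in arcs if v in X and u not in X}


def out_boundary(arcs: Iterable[Arc], X: Set[Hashable]) -> Set[Arc]:
    """A^+_D(X): arcs whose tail is in X but whose head is not."""
    return {(u, v) for (u, v) in arcs if u in X and v not in X}


def is_matching(M: Iterable[Arc]) -> bool:
    """True if no two arcs of M share an end (loopless digraph assumed)."""
    seen: Set[Hashable] = set()
    for u, v in M:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def is_maximal_matching(arcs: Iterable[Arc], M: Iterable[Arc]) -> bool:
    """True if M is a matching and no arc outside M can be added to it."""
    M = set(M)
    if not is_matching(M):
        return False
    matched = {x for arc in M for x in arc}
    # An arc could be added only if both of its ends were still unmatched.
    return all(u in matched or v in matched for (u, v) in arcs if (u, v) not in M)
```

For example, in the digraph with arcs (1, 2), (2, 3), (3, 1), (3, 4), the set {(1, 2), (3, 4)} is a maximal matching, while {(1, 2)} alone is a matching that is not maximal.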
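Given a directed graph D = (V, A) with a weight function w defined on its vertices and arcs, and a k-partition $P = \{P_1, P_2, \ldots, P_k\}$ of V, we define for each part $P_j$ its load $L(P_j)$, which combines the weights w(v) of the vertices in $P_j$ with the weights w(a) of the arcs a crossing the boundary of $P_j$. Let $L^P_M$ and $L^P_m$ be the maximum and minimum loads among all parts of P, that is,
$$L^P_M = \max_{1 \le j \le k} L(P_j), \qquad L^P_m = \min_{1 \le j \le k} L(P_j).$$
We then model the balanced graph partition problem as the following unconstrained two-objective program:
$$\min_{P \in \mathcal{P}} L^P_M, \qquad \min_{P \in \mathcal{P}} \rho_P,$$
where $\mathcal{P}$ is the set of all k-partitions of D and $\rho_P$ is the unbalanced ratio of the partition P.
As mentioned in Sect. 1, our problem differs from the one solved by METIS in two respects. First, METIS can only deal with undirected graphs, whereas our problem is defined on directed graphs. Second, the objectives differ. The mathematical model of METIS is
$$\min_{P \in \mathcal{P}} \sum_{e \in E_C} w(e) \quad \text{s.t.} \quad \max_{1 \le j \le k} \sum_{v \in P_j} w(v) \le \rho \, \frac{w(V)}{k},$$
where $E_C$ is the set of edges whose ends lie in distinct parts, w(V) is the total vertex weight, and $\rho \ge 1$ is the unbalanced ratio of the vertex weights. In other words, the METIS model treats vertices and edges separately, whereas we consider them together.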
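The sketch below evaluates the two objectives for a given partition under a hypothetical load definition, chosen only for illustration: the load of $P_j$ is taken to be the total vertex weight of $P_j$ plus the weight of the arcs in $A^-_D(P_j)$, and the unbalanced ratio is taken to be $\rho_P = L^P_M / L^P_m - 1$. Both choices, and the function names `part_loads` and `objectives`, are assumptions rather than the paper's definitions.

```python
from typing import Dict, Hashable, List, Tuple

VertexWeights = Dict[Hashable, float]
ArcWeights = Dict[Tuple[Hashable, Hashable], float]  # (tail, head) -> weight


def part_loads(vw: VertexWeights, aw: ArcWeights,
               part: Dict[Hashable, int], k: int) -> List[float]:
    """Hypothetical load: vertex weights of P_j plus the weights of the
    arcs in A^-_D(P_j), i.e. arcs entering P_j from another part."""
    loads = [0.0] * k
    for v, w in vw.items():
        loads[part[v]] += w
    for (u, v), w in aw.items():
        if part[u] != part[v]:
            loads[part[v]] += w  # the cut arc is charged to the part of its head
    return loads


def objectives(vw: VertexWeights, aw: ArcWeights,
               part: Dict[Hashable, int], k: int) -> Tuple[float, float]:
    """Return (L_M^P, rho_P); rho_P = L_M / L_m - 1 is an assumed definition."""
    loads = part_loads(vw, aw, part, k)
    L_max, L_min = max(loads), min(loads)
    return L_max, L_max / L_min - 1.0
```

Under a different load definition, only `part_loads` would need to change; the two objectives are computed from the list of loads in the same way.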

Algorithm
Since the graphs we deal with are very large (up to 100,000,000 vertices) and the number of parts is also large (up to 100,000), we design our algorithm by combining the classical multi-level method with recursive partitioning.

Multi-level Stage
Multi-level methods are currently the most popular approach to partitioning large-scale graphs. The multi-level method consists of three phases: iterative contraction, initial partition and modification, and back mapping. We describe each phase below.
PHASE 1: Iterative Contraction. In this phase, we construct a sequence of directed graphs $(D_0, D_1, \ldots, D_m)$ with decreasing numbers of vertices, where $D_0$ is the original directed graph. Following the usual strategy, for the current graph $D_i$ we compute a maximal matching $M_i$ and contract it to obtain the next graph $D_{i+1}$. This phase ends when one of the following occurs: (i) the number of vertices of the current graph is less than ck, where k is the number of parts and c = 90 is the contraction parameter chosen by the experiments in Sect. 4; (ii) the ratio of contraction … To compute the maximal matching, we use the following two randomized methods.
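Independently of which matching rule is used, the contraction loop of PHASE 1 and the direct back mapping of PHASE 3 can be sketched as follows, with the matching rule passed in as a function. This is a minimal illustration under stated assumptions: graphs are dictionaries of vertex and arc weights, a coarse vertex weighs the sum of its two ends, arcs inside a contracted pair are dropped and parallel arcs are merged by adding their weights (the usual convention in multi-level schemes, assumed here), only stopping rule (i) is modeled, and an empty matching is treated as a stop. The names `contract`, `coarsen` and `project_to_original` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

VW = Dict[Hashable, float]                    # vertex -> weight
AW = Dict[Tuple[Hashable, Hashable], float]   # (tail, head) -> weight
Matching = List[Tuple[Hashable, Hashable]]


def contract(vw: VW, aw: AW, matching: Matching):
    """Merge the two ends of every matched arc into one coarse vertex."""
    coarse_of: Dict[Hashable, int] = {}
    nxt = 0
    for v, u in matching:
        coarse_of[v] = coarse_of[u] = nxt
        nxt += 1
    for x in vw:                               # unmatched vertices survive alone
        if x not in coarse_of:
            coarse_of[x] = nxt
            nxt += 1
    cvw: Dict[int, float] = defaultdict(float)
    for x, w in vw.items():
        cvw[coarse_of[x]] += w                 # coarse weight = sum of merged weights
    caw: Dict[Tuple[int, int], float] = defaultdict(float)
    for (a, b), w in aw.items():
        ca, cb = coarse_of[a], coarse_of[b]
        if ca != cb:                           # assumption: arcs inside a pair are dropped
            caw[(ca, cb)] += w                 # parallel arcs are merged by adding weights
    return dict(cvw), dict(caw), coarse_of


def coarsen(vw: VW, aw: AW, k: int,
            find_matching: Callable[[VW, AW], Matching], c: int = 90):
    """Contract maximal matchings until fewer than c*k vertices remain (rule (i))."""
    maps = []
    while len(vw) >= c * k:
        M = find_matching(vw, aw)
        if not M:                              # safeguard (assumption): nothing to contract
            break
        vw, aw, cmap = contract(vw, aw, M)
        maps.append(cmap)
    return vw, aw, maps


def project_to_original(maps, coarse_part):
    """Map a partition of D_m directly back to D_0 by composing the vertex maps."""
    if not maps:
        return dict(coarse_part)
    composed = dict(maps[0])                   # D_0 vertex -> D_1 vertex
    for cmap in maps[1:]:
        composed = {v: cmap[cv] for v, cv in composed.items()}
    return {v: coarse_part[cv] for v, cv in composed.items()}
```

Composing the per-level vertex maps is what makes the direct mapping from $D_m$ to $D_0$ cheap: each original vertex simply inherits the part of its representative in $D_m$, with no refinement at intermediate levels.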
Random Maximum Weight Matching (RMWM). This classical method is used in METIS [14] and other multi-level algorithms [15]. RMWM proceeds as follows. The vertices of the graph are visited in random order. For a visited vertex u, if u is already matched or all of its in-neighbors are matched, we move on to the next vertex. Otherwise, u is matched with the unmatched in-neighbor v for which the weight of the arc (v, u) is maximum, that is,
$$v = \arg\max \{\, w(v', u) : (v', u) \in A(D_i),\ v' \text{ unmatched} \,\}.$$
When all vertices have been visited, we obtain a maximal matching.
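A possible sketch of this random matching step follows, using dictionaries of vertex weights and of arc weights keyed by (tail, head). The scoring function is a parameter: `rmwm_score` gives RMWM, while `rmrm_score` gives one reading of the ratio rule of RMRM in which the arc weight w(v, u) is divided by the weight of the in-neighbor v. That reading, like all names here, is an assumption. Every arc (v, u) is examined when its head u is visited, so the returned set is a maximal matching.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple


def random_matching(vw: Dict[Hashable, float],
                    aw: Dict[Tuple[Hashable, Hashable], float],
                    score: Callable[[float, float], float],
                    rng: random.Random) -> List[Tuple[Hashable, Hashable]]:
    """Visit vertices in random order; match each unmatched u with the unmatched
    in-neighbour v maximising score(w(v, u), w(v)). Returns arcs (v, u)."""
    in_nbrs = defaultdict(list)
    for (v, u), w in aw.items():
        if v != u:
            in_nbrs[u].append((v, w))
    order = list(vw)
    rng.shuffle(order)                      # random visiting order
    matched, M = set(), []
    for u in order:
        if u in matched:
            continue
        cands = [(v, w) for v, w in in_nbrs[u] if v not in matched]
        if not cands:                       # all in-neighbours already matched
            continue
        v = max(cands, key=lambda c: score(c[1], vw[c[0]]))[0]
        M.append((v, u))
        matched.update((u, v))
    return M


def rmwm_score(arc_w: float, v_w: float) -> float:
    return arc_w                            # RMWM: heaviest arc (v, u)


def rmrm_score(arc_w: float, v_w: float) -> float:
    return arc_w / v_w                      # assumed RMRM reading: w(v, u) / w(v)
```

For instance, if u has in-neighbors a with w(a, u) = 1, w(a) = 1 and b with w(b, u) = 5, w(b) = 10, the RMWM score selects b, whereas this reading of RMRM selects a.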
Random Maximum Ratio Matching (RMRM). This matching is motivated by the new objective function. RMRM differs from RMWM only in how u is matched when it has unmatched in-neighbors: since the objective function considers vertex and arc weights together, u is matched with the unmatched in-neighbor v that maximizes the ratio of arc weight to vertex weight.
PHASE 2: Initial Partition and Modification. After iterative contraction, the final graph $D_m$ has at most ck vertices, so a good initial partition can be obtained quickly by a greedy strategy. Specifically, we use a best fit decreasing (BFD) algorithm similar to the one used for the bin-packing problem. First, we set $P_j = \emptyset$ for every $j = 1, 2, \ldots, k$ and sort the vertices in decreasing order of vertex weight. At each step, putting the current vertex v into part $P_j$ increases the load of $P_j$ and may also change the loads of the other parts $P_i$. We put v into the part that minimizes the resulting maximum load. When all vertices have been visited, the initial partition P is obtained.
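The BFD step can be illustrated with the following Python sketch. It assumes a hypothetical load definition for illustration only: the load of a part is its total vertex weight plus the weight of the arcs entering it from other parts, where an arc is counted once both of its ends have been placed. Ties on the maximum load are broken by the smaller sum of loads and then by the smaller part index; this tie-breaking, and the name `bfd_initial_partition`, are also assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def bfd_initial_partition(vw: Dict[Hashable, float],
                          aw: Dict[Tuple[Hashable, Hashable], float],
                          k: int) -> Tuple[Dict[Hashable, int], List[float]]:
    """Greedy best fit decreasing: place each vertex where the max load is smallest."""
    in_arcs, out_arcs = defaultdict(list), defaultdict(list)
    for (u, v), w in aw.items():
        if u != v:
            out_arcs[u].append((v, w))
            in_arcs[v].append((u, w))
    loads = [0.0] * k
    part: Dict[Hashable, int] = {}
    for v in sorted(vw, key=lambda x: -vw[x]):       # decreasing vertex weight
        best = None
        for j in range(k):                             # try every part P_j
            new = list(loads)
            new[j] += vw[v]
            for u, w in in_arcs[v]:                    # placed u -> v: cut arc into P_j
                if u in part and part[u] != j:
                    new[j] += w
            for u, w in out_arcs[v]:                   # v -> placed u: cut arc into P[u]
                if u in part and part[u] != j:
                    new[part[u]] += w
            key = (max(new), sum(new), j)              # assumed tie-breaking order
            if best is None or key < best[0]:
                best = (key, j, new)
        _, j_best, loads = best
        part[v] = j_best
    return part, loads
```

With no arcs, the sketch reduces to the classical longest-processing-time heuristic: weights 5, 4, 3, 2 split into two parts give loads [7, 7].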
The aim of modification is to bring the initial partition to a local optimum. The main strategy is local search: iteratively move a vertex from a part with maximum load to another part so as to reduce the maximum load. Specifically, in each iteration we first choose a part $P_j$ with the maximum load. Then, for each vertex v in $P_j$, we calculate its in-arc weight $w^-_i(v)$ and out-arc weight $w^+_i(v)$ with respect to each part $P_i$, that is, the total weight of the arcs from $P_i$ to v and from v to $P_i$, respectively. If we move v from $P_j$ into $P_i$, the loads of the parts other than $P_i$ and $P_j$ do not change, and the new loads $L_j$ and $L_i$ can be updated from w(v) and these arc weights. For every pair $(v, P_i)$, we calculate the maximum load and the sum of loads of the resulting partition. If some resulting partition has a smaller maximum load than the current partition, we replace the current partition with the one whose maximum load is smallest, and repeat. Otherwise, if some resulting partitions have the same maximum load as the current partition but a smaller sum of loads, we replace the current partition with the one whose sum of loads is smallest, and repeat; else, the current partition is a local optimum and modification terminates.
PHASE 3: Back Mapping. In principle, the partition of $D_{i+1}$ should be mapped back to $D_i$, and the partition of $D_i$ refined to a local optimum, at each level $i = m-1, m-2, \ldots, 0$. However, since the original graph is huge and the number of parts is large, we save memory and running time by mapping the partition of $D_m$ directly back to $D_0$.
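The modification step of PHASE 2 can be sketched as follows under a hypothetical load definition, assumed only for illustration: the load of a part is its total vertex weight plus the weight of the arcs entering it from other parts. Under that assumption, moving v from $P_j$ to $P_i$ gives $L_j' = L_j - w(v) - \sum_{t \ne j} w^-_t(v) + w^+_j(v)$ and $L_i' = L_i + w(v) + \sum_{t \ne i} w^-_t(v) - w^+_i(v)$, which is what the code computes. Two further safeguards are assumptions: a move that would empty $P_j$ is not allowed, and the loop stops after `max_iter` iterations. The name `local_search` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def local_search(vw: Dict[Hashable, float],
                 aw: Dict[Tuple[Hashable, Hashable], float],
                 part: Dict[Hashable, int], loads: List[float],
                 max_iter: int = 10_000) -> Tuple[Dict[Hashable, int], List[float]]:
    k = len(loads)
    part, loads = dict(part), list(loads)
    in_arcs, out_arcs = defaultdict(list), defaultdict(list)
    for (u, v), w in aw.items():
        if u != v:
            in_arcs[v].append((u, w))
            out_arcs[u].append((v, w))
    for _ in range(max_iter):
        cur_max, cur_sum = max(loads), sum(loads)
        j = loads.index(cur_max)                     # a part with maximum load
        members = [x for x, p in part.items() if p == j]
        if len(members) <= 1:                        # assumption: never empty a part
            break
        best = None
        for v in members:
            w_in, w_out = [0.0] * k, [0.0] * k       # w^-_t(v), w^+_t(v) for every part t
            for u, w in in_arcs[v]:
                w_in[part[u]] += w
            for u, w in out_arcs[v]:
                w_out[part[u]] += w
            tot_in = sum(w_in)
            for i in range(k):
                if i == j:
                    continue
                # loads after moving v from P_j to P_i under the assumed load model
                Lj = loads[j] - vw[v] - (tot_in - w_in[j]) + w_out[j]
                Li = loads[i] + vw[v] + (tot_in - w_in[i]) - w_out[i]
                rest = max((loads[t] for t in range(k) if t not in (i, j)), default=0.0)
                key = (max(Lj, Li, rest), cur_sum - loads[j] - loads[i] + Lj + Li)
                if best is None or key < best[0]:
                    best = (key, v, i, Lj, Li)
        if best is None:
            break
        (new_max, new_sum), v, i, Lj, Li = best
        if new_max < cur_max or (new_max == cur_max and new_sum < cur_sum):
            part[v], loads[i], loads[j] = i, Li, Lj  # accept the best move
        else:
            break                                    # local optimum reached
    return part, loads
```

Each accepted move strictly decreases the pair (maximum load, sum of loads) in lexicographic order, which mirrors the two-level acceptance rule described in the text.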

Recursive Partition Stage
As described in the previous subsection, the iterative contraction phase ends when the contracted graph $D_m$ has fewer than 90k vertices, where k is the desired number of parts. Hence, if k is large, $D_m$ is also large, which can lead to poor quality and long running times. We use recursive partitioning to avoid this.
The main idea of recursive partitioning is as follows. First, we factor k into several small numbers, that is, $k = k_1 k_2 \cdots k_t$, where each $k_i$ is small (for example, at most 20). In the first step, we use the multi-level method to obtain a $k_1$-partition P of the original graph; since $k_1$ is small, this ensures good quality and a short running time. Then, based on P, the whole graph is decomposed into $k_1$ subgraphs, each induced by one part of P. The arc weights in the subgraphs are the same as in the original graph, but the weight of every vertex v is adjusted according to $P[v]$, the part of P containing v. The purpose of this adjustment is to ensure that the objective value is the same whether it is computed for the partition of the whole graph or for the subgraphs. In the second step, we divide every subgraph into $k_2$ parts, and decomposing all the old subgraphs yields $k_1 k_2$ new subgraphs. Continuing in this way, in the last step we have $k_1 k_2 \cdots k_{t-1}$ subgraphs and compute a $k_t$-partition of each. That is, we obtain a partition of the original graph into $k = k_1 k_2 \cdots k_t$ parts. How should the recursive partition strategy be chosen? Our experiments in the next section show little difference between strategies. Thus, if k is a power of some integer $b \le 20$, that is, $k = b^t$, we simply take $k_1 = k_2 = \cdots = k_t = b$.
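A minimal sketch of two pieces of the recursive stage follows. `recursion_strategy` returns factors $k_1, \ldots, k_t$ with product k: it prefers $k = b^t$ with $b \le 20$ and, as an assumption, picks the largest such b, falling back to a greedy factorization otherwise. `adjusted_vertex_weights` shows one hypothetical vertex-weight adjustment, valid only under an assumed load definition (vertex weights plus the weights of arcs entering a part from other parts): every arc entering $P[v]$ from another part is charged to its head v, so that loads computed inside a subgraph agree with loads computed on the whole graph. Both function names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def recursion_strategy(k: int, max_factor: int = 20) -> List[int]:
    """Factors k_1, ..., k_t with product k, each at most max_factor."""
    if k < 1:
        raise ValueError("k must be positive")
    if k == 1:
        return []
    # Preferred: k = b**t with the largest b <= max_factor (largest-b rule assumed).
    for b in range(min(k, max_factor), 1, -1):
        t, x = 0, 1
        while x < k:
            x *= b
            t += 1
        if x == k:
            return [b] * t
    # Fallback (assumption): greedy factorisation into factors <= max_factor.
    factors, rest = [], k
    for f in range(max_factor, 1, -1):
        while rest % f == 0:
            factors.append(f)
            rest //= f
    if rest != 1:
        raise ValueError(f"{k} has a prime factor larger than {max_factor}")
    return factors


def adjusted_vertex_weights(vw: Dict[Hashable, float],
                            aw: Dict[Tuple[Hashable, Hashable], float],
                            part: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Hypothetical reweighting: charge each arc entering P[v] from another part to v."""
    new_w = dict(vw)
    for (u, v), w in aw.items():
        if part[u] != part[v]:
            new_w[v] += w
    return new_w
```

For example, `recursion_strategy(1000)` returns `[10, 10, 10]`, which corresponds to three stages of 10-partition for k = 1000.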

Taking all factors equal to b when $k = b^t$ gives better results on unbalanced ratio and maximum load. In many practical applications, each small part is often also required to induce a connected subgraph.

Experimental Results
In this section, our experiments fall into two parts: algorithm design and comparison with other algorithms. In the algorithm-design part, we test the performance of the two random matching methods, verify the stability of the randomized algorithm, and determine the contraction parameter c and the recursive partition strategy. In the comparison part, we compare our algorithm with the k-way partition algorithm of METIS in unbalanced ratio, maximum load and running time to evaluate its performance.
The directed graphs used in the experiments fall into two classes: theoretical models and practical models. We use grid graphs to represent the theoretical models; they can also be regarded as the inner dual graph of a square grid in the plane. We consider grid graphs of three sizes: Grid-1 with 1,000,000 vertices and 3,996,000 arcs, Grid-2 with 10,890,000 vertices and 43,546,800 arcs, and Grid-3 with 100,000,000 vertices and 399,600,000 arcs. Each has random vertex weights between 120 and 150, and the weight of every arc is about 1/20 of the weight of its end. For the practical models, we use 8 graphs from 3D finite element meshes, two of them from METIS and the others from real examples. The characteristics of all graphs are shown in Tab. 1.
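Grid test graphs of this kind can be generated along the following lines. This is a sketch under assumptions: vertices of a rows × cols grid are numbered row by row, each grid edge becomes two opposite arcs, vertex weights are drawn uniformly from [120, 150], and "the weight of its end" is read as the weight of the arc's tail. The function names are hypothetical. As a sanity check on the arc counts, a 1000 × 1000 grid has exactly the 1,000,000 vertices and 3,996,000 arcs of Grid-1, and a 3300 × 3300 grid has the 10,890,000 vertices and 43,546,800 arcs of Grid-2.

```python
import random
from typing import Dict, Tuple


def grid_arc_count(rows: int, cols: int) -> int:
    """Number of arcs of a rows x cols grid with both orientations of every edge."""
    return 2 * ((rows - 1) * cols + rows * (cols - 1))


def make_grid(rows: int, cols: int, rng: random.Random):
    """Hypothetical generator for Grid-style test graphs."""
    def vid(r: int, c: int) -> int:
        return r * cols + c                           # row-major vertex numbering

    vw: Dict[int, float] = {vid(r, c): rng.uniform(120, 150)
                            for r in range(rows) for c in range(cols)}
    aw: Dict[Tuple[int, int], float] = {}
    for r in range(rows):
        for c in range(cols):
            u = vid(r, c)
            for rr, cc in ((r, c + 1), (r + 1, c)):   # right and lower neighbours
                if rr < rows and cc < cols:
                    v = vid(rr, cc)
                    aw[(u, v)] = vw[u] / 20.0          # assumption: 1/20 of the tail weight
                    aw[(v, u)] = vw[v] / 20.0
    return vw, aw
```

Generating the largest instances this way in pure Python would be slow and memory-hungry; the sketch is meant to document the construction, not to reproduce the experimental setup.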
All experiments were performed on a Dell T7610 graphics workstation with an Intel Xeon 2.6 GHz CPU (6 cores) and 32 GB of 1866 MHz DDR3 memory.

Matching Comparison
This subsection tests the performance of the two matching methods for contraction, RMWM and RMRM, described in Subsec. 3.1. We run the experiment on five graphs: Grid-1, Grid-2, MDual, FEM-1 and FEM-3. The small-scale graphs (Grid-1, MDual and FEM-1) are partitioned into 100 and 1000 parts, and the large-scale graphs (Grid-2 and FEM-3) into 1000 and 10000 parts, with contraction parameter c = 90 and recursive partition strategies $10^2$, $10^3$ and $10^4$. Because the algorithm is randomized, we run each partition 10 times and compare the average and maximum values of the unbalanced ratio ρ and the max-load $L_M$.
The experimental and comparative results are given in Tab. 2 and in Figs. 1 and 2. Fig. 1 shows that the unbalanced ratios of RMWM are better than those of RMRM, except for the maximum unbalanced ratio of the 100-partition on MDual. Fig. 2 shows that in terms of max-load RMWM is also better than RMRM, but the gap is very small: the maximum ratio is less than 1.012. Hence, we use RMWM in the following.
Figure 2
The ratio of the max-load of RMRM to that of RMWM. Bars above the baseline indicate that RMRM performs worse than RMWM.

Stability Verification
In this subsection, we test the stability of the algorithm, that is, whether its randomness causes large deviations in the output. The same graphs and numbers of parts as in the previous subsection are used. We compare the results in three respects, unbalanced ratio, max-load and running time; the details are given in Tab. 3. From Fig. 3, we can see that although there is a gap between the best and the worst results, the gap is small, at most 0.70%. Furthermore, all unbalanced ratios are quite small, below 2.00%, except for the worst result of the 10000-partition on FEM-3. Fig. 4 shows the max-load and the running time relative to the average values, which serve as the baseline. For each example, the worst max-load is almost equal to the best one, and the difference in running time is also very small, with a maximum ratio of about 1.10. Hence, the randomness of the algorithm does not cause much deviation, and our algorithm is very stable.

Determining Parameters
Our algorithm has one parameter and one strategy that need to be determined. First, we determine the contraction parameter c mentioned in Subsec. 3.1. Fig. 5 shows the unbalanced ratios for different values of c, and Figs. 6 and 7 show the ratios of the max-load and of the running time, respectively, for other values of c to those for c = 90. These figures show that the unbalanced ratio basically decreases as c increases, whereas the max-load and the running time often increase with c. Overall, the values 70, 90 and 110 perform well, and we choose c = 90.
For the recursive partition strategy, we tried different factorizations of k and found little difference between the results: the deviations in unbalanced ratio and in max-load ratio are at most 0.5% and 0.2%, respectively. Hence, we choose the simplest strategy, that is, writing k as a power of some integer b ≤ 20. For example, if k = 1000, our algorithm has three stages, each computing a 10-partition.

Comparison with METIS
In this subsection, we compare our algorithm (Graph Partition) with the k-way partition of METIS on the 11 graphs of Tab. 1. Since METIS can only deal with undirected graphs, we transform each directed graph in Tab. 1 into an undirected graph by setting the weight of every edge uv to $\frac{w(u,v) + w(v,u)}{2}$. The resulting undirected graphs are then partitioned by the k-way partition. Finally, we calculate the unbalanced ratio and max-load of each graph with respect to the resulting partition. The experimental results are listed in Tab. 4, and the comparison is shown in the following figures. Note that since the graph Grid-3 is huge (100,000,000 vertices and 399,600,000 arcs), METIS does not return a feasible result. Fig. 8 shows the unbalanced ratios of the partitions computed by the two algorithms. For each graph, the unbalanced ratio is better when the number of parts is small than when it is large, which is natural. Most unbalanced ratios achieved by our algorithm are below 2%, whereas most of those achieved by METIS lie between 6% and 9%. Clearly, our algorithm outperforms METIS in unbalanced ratio. All unbalanced ratios on the graph Copter are worse, because the average degree of Copter is much larger than those of the other graphs.
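The conversion used for METIS can be written as follows. The sketch assumes that a missing opposite arc contributes weight 0 and that vertex ids are mutually comparable, and it also computes the edge cut $\sum_{e \in E_C} w(e)$ that METIS minimizes. The names `to_undirected` and `edge_cut` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def to_undirected(aw: Dict[Edge, float]) -> Dict[Edge, float]:
    """Edge weight of uv = (w(u, v) + w(v, u)) / 2, a missing arc counting as 0
    (assumption). Each edge is stored once, as (smaller id, larger id)."""
    ew: Dict[Edge, float] = defaultdict(float)
    for (u, v), w in aw.items():
        if u == v:
            continue                          # loops never contribute to a cut
        ew[(u, v) if u < v else (v, u)] += w / 2.0
    return dict(ew)


def edge_cut(ew: Dict[Edge, float], part: Dict[Hashable, int]) -> float:
    """METIS-style objective: total weight of edges whose ends lie in different parts."""
    return sum(w for (u, v), w in ew.items() if part[u] != part[v])
```

METIS itself expects integer vertex and edge weights, so in practice the halved weights would have to be scaled and rounded before being passed to it.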
Fig. 9 and Fig. 10 show the ratios of the max-load and of the running time of our algorithm to those of METIS. Fig. 9 shows that most ratios of max-load lie between 0.94 and 1.06, so there is little difference between the two algorithms in maximum load. Moreover, the ratio increases with the number of parts; the main reason is that we do not use multi-level modification in the back mapping phase, which is a key direction for our future work. Fig. 10 shows that for small k our algorithm often runs longer than METIS, whereas for large k it often runs faster. This difference is related to the number of iterations and the average number of vertices in each part.

Conclusions and Future Work
In this paper, we consider the balanced partition problem on large-scale directed graphs. First, we present a new mathematical model with new objective functions for this problem. Then, we combine a multi-level strategy with recursive partitioning to design an algorithm for it. Finally, through a large number of experiments, we determine the parameters, verify the stability of the algorithm, and compare it with the k-way partition of METIS in unbalanced ratio, maximum load and running time. The experimental results show that, compared with METIS, our algorithm is better in unbalanced ratio and of the same quality in maximum load. Furthermore, our algorithm can handle graphs of huge scale for which METIS cannot return a feasible result.
There are two key points for future work. The first is adding modification to the back mapping phase, that is, mapping the partition of $D_m$ back to that of $D_0$ level by level and refining the partition at each level to a local optimum. The second is ensuring the connectivity of each part. Furthermore, finding a new, good and efficient graph contraction method is also worthwhile.

Figure 1
The unbalanced ratios of the two types of maximal matching.

Figure 3
The best, average and worst unbalanced ratios of all examples.

Figure 4
The ratios of the best and worst max-load and running time to the corresponding average results. The baseline is the average value.

Figure 5
The unbalanced ratios with different contraction parameters.

Figure 6
The ratios of the max-load for other contraction parameters to that for parameter 90.

Figure 7
The ratios of the running time for other contraction parameters to that for parameter 90.

Figure 8
The experimental results on unbalanced ratio.

Figure 9
The ratios of the max-load of our algorithm to that of METIS.

Figure 10
The ratios of the running time of our algorithm to that of METIS.

Table 1
The Characteristics of the Graphs

Table 2
The Experimental Results on Different Matchings

Table 3
The Experimental Results on Randomized Stability

Table 4
The Experimental Results of Graph Partition and k-Way Partition