Algorithms on Graph Coloring Problem

The graph coloring problem is one of the most famous NP-complete problems and an important topic in set theory and graph theory. Since it was first posed, many researchers have tried various methods to solve it or to improve existing solutions. This article introduces the concept of the graph coloring problem and two approaches to solving it: one based on the genetic algorithm and the other based on the greedy algorithm. Each approach has its own traits. Through this analysis, the paper aims to reduce the complexity of the algorithms so that they are useful in various applications.


Introduction
The Graph Coloring Problem (GCP) is one of the most famous NP-complete (non-deterministic polynomial-time complete) problems. Mathematically, GCP is defined on an undirected connected graph G = (V, E), where V is the vertex set and E is the edge set. The task is to divide V into several color groups, each of which must form an independent set, that is, a set containing no two adjacent vertices. In brief, every vertex must be colored and adjacent vertices must receive different colors. If m is the minimum number of independent sets into which the graph can be divided, or equivalently the minimum number of colors that gives adjacent vertices different colors, then m is called the chromatic number χ(G) [1].
Take the following figures as an example. If adjacent vertices must receive different colors and as few colors as possible are to be used, the basic method is enumeration. First, vertex 1 is colored green, and vertices 2 and 3 are colored yellow. Vertex 4 must differ in color from vertices 2 and 3, so it can be colored green again. The last vertex is vertex 5, whose color must be neither green nor yellow, so it is colored blue. Vertex 4 could have received any color other than yellow, but because the graph is to be colored with as few colors as possible, green, already used for vertex 1, is reused. As a result, the vertices are colored with three colors, which means the vertex set is divided into three independent sets, so the chromatic number χ(G) is 3.
However, if there are hundreds of vertices, this method is too costly, so other algorithms are needed.
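The enumeration idea can be sketched in a few lines of Python. This is only an illustration: the figure is not reproduced here, so the edge list below is a hypothetical 5-vertex graph chosen to be consistent with the coloring described above, and the names `is_proper` and `chromatic_number` are made up for this sketch.

```python
from itertools import product

def is_proper(edges, coloring):
    """Return True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def chromatic_number(vertices, edges):
    """Try k = 1, 2, ... colors and enumerate all k**n assignments."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, colors))
            if is_proper(edges, coloring):
                return k, coloring
    return 0, {}  # only reached for an empty vertex list

# Hypothetical graph (an assumption, not the graph of the paper's figure).
VERTICES = [1, 2, 3, 4, 5]
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 5), (4, 5)]
k, coloring = chromatic_number(VERTICES, EDGES)
print(k, coloring)
```

The inner loop visits up to k**n assignments, which is why enumeration stops being practical long before a graph reaches hundreds of vertices.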

Genetic algorithm
A genetic algorithm is a computational model that simulates natural selection in Darwinian evolution and the genetic mechanisms of biological evolution. Before solving a problem, it must first be modeled mathematically. A feasible solution is regarded as a chromosome. A feasible solution is usually made up of multiple elements, and each element is called a gene.
The algorithm first randomly generates a set of candidate solutions, the first generation of chromosomes, called a population [2]. The fitness function, which plays the role that nature plays in natural selection, then evaluates the fitness of each chromosome and, from it, the probability that the chromosome will be selected for the next round of evolution.
Next comes crossover. Crossover takes two parent chromosomes, cuts them at the same location, and splices the pieces together to form a new chromosome. The new chromosome contains some genes from the father chromosome and some from the mother chromosome. Crossover produces n-m new chromosomes.
After crossover, variation (mutation) is applied to the n-m new chromosomes. Variation swaps the order in which genes are combined so that the search can break out of its current limits and the algorithm can find the best solution.
Finally, m chromosomes are produced by copying: the chromosomes with the highest fitness in the previous generation are copied intact into the next generation. If n chromosomes are needed in each generation, n-m of them are generated by crossover and the remaining m are copies of the m fittest chromosomes of the previous generation [3,4]. This completes one round of evolution, which is followed by a new round.
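One round of evolution as described above can be sketched as follows. This is a minimal illustration, not the algorithm of [2-4]: the fitness-proportional parent selection, the one-point crossover, the swap mutation, and the toy fitness function are all assumptions chosen for brevity, and the fitness values are assumed to be positive.

```python
import random

def one_point_crossover(father, mother, rng):
    """Cut both parents at the same location and splice the pieces."""
    cut = rng.randrange(1, len(father))
    return father[:cut] + mother[cut:]

def swap_mutation(chromosome, rng):
    """Swap two randomly chosen genes."""
    i, j = rng.sample(range(len(chromosome)), 2)
    child = list(chromosome)
    child[i], child[j] = child[j], child[i]
    return child

def next_generation(population, fitness, m, rng):
    """Return n chromosomes: m copied elites plus n - m mutated children."""
    n = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    elites = [list(c) for c in ranked[:m]]           # copying step
    weights = [fitness(c) for c in population]       # selection probabilities
    children = []
    while len(children) < n - m:
        father, mother = rng.choices(population, weights=weights, k=2)
        children.append(swap_mutation(one_point_crossover(father, mother, rng), rng))
    return elites + children

# Toy problem (an assumption): maximize the number of ones in a bit string.
rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
fitness = lambda c: 1 + sum(c)
new_population = next_generation(population, fitness, 2, rng)
```

Because the m fittest chromosomes are copied unchanged, the best fitness in the population never decreases from one generation to the next.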

Modularity
Modularity is a concept proposed by Newman in 2003. It measures how good a division into communities is.
In the formula, Q is the modularity; the larger the modularity, the better the division. Suppose there are x vertices, each representing an input, divided into n communities, with m connections (edges) between vertices. Let i and j be any two vertices in the graph. When the two vertices are connected, Aij = 1; otherwise, Aij = 0. di is the degree of vertex i, and 2m is the total degree of the whole graph. δ(ci,cj) indicates whether vertices i and j are in the same community: δ(ci,cj) = 1 if they are, and δ(ci,cj) = 0 otherwise.
The graph coloring problem can be solved by a genetic algorithm built on this measure. The first task is to choose a suitable fitness function. To divide the vertex set V into k independent sets {V1, V2, …, Vk}, the modularity function of this partition can be derived from the modularity function defined by Newman:
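The formulas themselves are not reproduced in this text. As a hedged reconstruction, the standard form of Newman's modularity uses exactly the symbols defined above, and the special case for a partition π_k into independent sets follows from it because no edge lies inside a set, so A_ij δ(c_i, c_j) = 0 for every pair. The second and third lines are a derivation under that assumption, not necessarily the exact expression of the original paper.

```latex
\begin{align}
Q &= \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{d_i d_j}{2m} \right] \delta(c_i, c_j) \\
Q(\pi_k) &= -\frac{1}{(2m)^2} \sum_{i,j} d_i d_j \, \delta(c_i, c_j) \\
         &= -\sum_{l=1}^{k} \left( \frac{\sum_{i \in V_l} d_i}{2m} \right)^{2}
\end{align}
```

In this form Q(π_k) is always negative, and merging independent sets makes it smaller, which matches the later remark that a partition with fewer independent sets has a smaller Q value.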

Analysis of
In the formula, πk is the partition of the vertex set, di is the degree of vertex i, and ci ∈ {V1, V2, …, Vk} is the independent set to which vertex i belongs. The contribution of vertex i to Q(πk) is […]. Now expand the k independent sets to k + 1 independent sets: x non-adjacent vertices are selected from the k independent sets to form a new independent set Vk+1.
Because di is greater than 0, […]. When the partition does not consist of independent sets, it is measured by the ratio of the edges between vertex modules to the total number of edges of the graph, namely R/e, and δ(R,e)=0.
When R/e=1, an independent partition π has been found. If it is not the best partition, there must be a partition π with a smaller Q value and fewer independent sets, so the fitness function is designed as

Population initialization.
When the initial population is established, the vertex set of the graph is randomly divided into k sets; the number of vertices depends on the specific problem. The division is illustrated in the following figure. During initialization each vertex is randomly assigned a set number, but a mechanism is needed to ensure that, as far as possible, vertices are assigned sensibly with respect to their connections. After a chromosome has been generated at random, num gene sites are randomly selected for adjustment. Suppose the chromosome contains g (g ≤ k) different set numbers, which means the vertices are divided into g subsets {V1, V2, …, Vg}. Let er(πg,Vi) denote the number of vertices adjacent to vertex r that are assigned to vertex set Vi under partition πg. The adjustment proceeds as follows.
1) j = 1, emin = 1, temp = 0;
2) i = 1; select a vertex r at random and place it in set Vi. Calculate er(πg,Vi); if er(πg,Vi) < emin, then emin = er(πg,Vi) and temp = i;
3) i = i + 1; if i = g + 1, go to step 4); otherwise, go to step 2);
4) Assign the value of temp to the gene location on the chromosome corresponding to vertex r and set j = j + 1; if j = num + 1, stop; otherwise, go to step 1).

Following this method, pop size chromosome individuals are generated to form the initial population (where pop size is the population size).
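The intent of the adjustment steps can be sketched as follows. This is one reading of the procedure rather than a literal transcription: each of the num adjustments picks a random vertex r and moves it to the set Vi with the smallest er(πg,Vi). The function names and the small example graph are assumptions made for this sketch.

```python
import random

def neighbors_in_set(adj, assignment, r, set_id):
    """e_r: number of neighbors of r currently assigned to set_id."""
    return sum(1 for u in adj[r] if assignment[u] == set_id)

def adjust(adj, assignment, num, rng):
    """Move num randomly chosen vertices to their least-conflicting set."""
    result = dict(assignment)
    set_ids = sorted(set(result.values()))   # the g set numbers in use
    vertices = list(adj)
    for _ in range(num):
        r = rng.choice(vertices)
        result[r] = min(set_ids, key=lambda s: neighbors_in_set(adj, result, r, s))
    return result

def conflicts(adj, assignment):
    """Number of edges whose two endpoints share a set number."""
    return sum(1 for u in adj for v in adj[u] if u < v and assignment[u] == assignment[v])

# Hypothetical graph: a triangle 0-1-2 with a pendant vertex 3.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
START = {0: 1, 1: 1, 2: 2, 3: 3}
adjusted = adjust(ADJ, START, num=10, rng=random.Random(1))
```

Since the current set of r is always among the candidates, an adjustment can never increase the number of conflicting edges.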

Coding.
The algorithm in this paper adopts natural number coding, because a normal k-coloring of a connected graph G divides the vertex set V of G into k independent sets, which can be numbered 1, 2, …, k. The gene length of a chromosome is equal to the number of vertices of graph G.

Selection.
In each generation of the population, the elite selection strategy is applied according to the fitness of the chromosomes, so that the chromosome with the highest fitness is preserved. This ensures that good chromosomes do not disappear from one generation to the next, and the algorithm can converge to the optimal value with probability 1 in a finite number of steps.

Crossover and variation.
The chromosomes obtained from the selection operation are crossed with crossover probability Pc. Crossover lets chromosomes exchange information, producing new individuals and giving the whole population new traits.
The crossover used here is a one-way propagation crossover. Two chromosomes are selected as parents: one is called the source chromosome and the other the target chromosome. A set number c is selected at random from the source chromosome. Every vertex v that belongs to set c on the source chromosome is then located, and the set number of vertex v on the target chromosome is set to c.
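A minimal sketch of this one-way crossover, assuming a chromosome is a list whose entry at index v is the set number of vertex v (the natural number coding described earlier). The function name and the optional parameter `c`, which fixes the chosen set number for reproducibility, are assumptions of this sketch.

```python
import random

def one_way_crossover(source, target, c=None, rng=random):
    """Copy one set of the source chromosome onto a copy of the target."""
    if c is None:
        c = rng.choice(sorted(set(source)))   # random set number from the source
    child = list(target)
    for v, set_number in enumerate(source):
        if set_number == c:
            child[v] = c                       # vertex v joins set c in the child
    return child

source = [1, 1, 2, 3, 2]
target = [2, 3, 3, 1, 1]
child = one_way_crossover(source, target, c=2)
print(child)  # [2, 3, 2, 1, 2]
```

Only the genes of the vertices in set c change; the rest of the target chromosome is inherited unchanged.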
Because set numbers are only relative labels, different set numbers on different chromosomes may represent the same module, so the numbering of the two chromosomes must be adjusted. The adjustment proceeds as follows.
1) Set id to 1 and mark all genes on the chromosome as unadjusted;
2) Search the chromosome for an unadjusted gene. If one is found, assign the value of id to that gene locus and to every gene locus holding the same value, mark these loci as adjusted, and set id to id+1;
3) Repeat step 2) until all loci are marked as adjusted.
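The renumbering can be sketched as a single left-to-right pass, assuming the chromosome is scanned in order so that set numbers are reassigned by first appearance. The function name `renumber` is an assumption of this sketch.

```python
def renumber(chromosome):
    """Relabel set numbers as 1, 2, ... in order of first appearance."""
    mapping = {}                      # old set number -> new id
    adjusted = []
    for gene in chromosome:
        if gene not in mapping:       # first unadjusted gene with this value
            mapping[gene] = len(mapping) + 1
        adjusted.append(mapping[gene])
    return adjusted

print(renumber([3, 3, 1, 2, 1]))  # [1, 1, 2, 3, 2]
print(renumber([2, 2, 3, 1, 3]))  # [1, 1, 2, 3, 2]
```

The two inputs describe the same partition with different labels, and after renumbering they become identical, which is what makes chromosomes comparable.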
After each operation, the candidate individuals are ranked by fitness value, and the individuals with the largest fitness are selected as the offspring.
According to the fitness function, the fitness value of a partition into independent sets is clearly higher than that of a partition whose sets are not independent. In order to prevent the algorithm from falling into the local

Greedy algorithm
A greedy algorithm always makes the choice that looks best in the current situation. In other words, instead of considering the overall optimum, it produces a solution that is locally optimal in some sense. For a greedy algorithm to yield an optimal solution, the problem must satisfy two conditions: the greedy-choice property and optimal substructure. The greedy-choice property means that the overall optimal solution can be reached through a series of locally optimal choices, namely greedy choices. Greedy choices are made successively, top-down and iteratively, and each greedy choice reduces the problem to a smaller sub-problem. As for optimal substructure, a problem has this property when its optimal solution contains the optimal solutions of its sub-problems.
The basic idea of a greedy algorithm is to proceed step by step from an initial solution of the problem. At each step a locally optimal solution is obtained according to some optimization measure. Only one datum is considered in each step, and its selection must satisfy the conditions of local optimization. If adding the next datum to the partial solution would make it infeasible, that datum is not added. The process continues until all data have been enumerated or nothing more can be added.
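To make the step-by-step idea concrete, the sketch below applies the simplest greedy rule to graph coloring: visit the vertices in a given order and give each one the smallest color not used by its already colored neighbors. This first-fit rule is an illustration chosen here, not an algorithm taken from the references, and the path graph is a hypothetical example.

```python
def greedy_coloring(adj, order):
    """Color vertices in the given order with the smallest free color."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:   # locally optimal choice: smallest unused color
            c += 1
        color[v] = c
    return color

# Hypothetical path graph a - b - c - d, which needs only two colors.
PATH = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
good = greedy_coloring(PATH, ["a", "b", "c", "d"])
bad = greedy_coloring(PATH, ["a", "d", "b", "c"])
print(len(set(good.values())), len(set(bad.values())))  # 2 3
```

The second order uses three colors on a graph that needs two, which shows that a sequence of locally optimal choices does not always give the global optimum.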

The RLF Algorithm
The Recursive Largest First (RLF) algorithm was put forward in 1979 by F. Leighton [6,7]. It builds several stable sets, each of which corresponds to a color class. Let C be the next color class to be built, U the set of uncolored vertices, and W the set (initially empty) of uncolored vertices that have at least one neighbor in C. Each time a vertex in U is moved to C, all its neighbors in U are moved from U to W. The first vertex v∈U placed in C is the vertex with the most neighbors in U. After that, while U is not empty, the vertex in U with the most neighbors in W is moved from U to C. Ties are broken, where possible, by choosing a vertex with the smallest number of neighbors in U.
For a vertex u∈U, AU(u) denotes its number of neighbors in U and AW(u) its number of neighbors in W. In addition, if v is the first vertex of a color class, Cv denotes the color class that contains it. The following algorithm shows how the RLF algorithm constructs Cv for a given vertex v.
Construction of Cv
Input: a set U of uncolored vertices and a vertex v∈U.
Output: a stable set Cv containing v.
1) Initialize W as the set of vertices in U adjacent to v.
2) Remove v and all its neighbors from U and set Cv←{v}.
3) while U ≠ ∅ do
   Select a vertex u∈U whose value AW(u) is the largest. In case of ties, choose one with the smallest value AU(u).
   Move vertex u from U to Cv, and move all neighbor vertices w∈U of vertex u to W.
   end while

Since W = ∅ initially, AW(x) = 0 for all x∈U, and the initial values AU(x) can be computed in O(m) time. After that, whenever a vertex w is moved from U to W, AW(x) is incremented by one and AU(x) is decremented by one for every neighbor x∈U of w. Similarly, when a vertex u∈U is moved from U to Cv, AU(x) is decremented by one for every neighbor x∈U of u. There are therefore O(m) such updates. In addition, selecting the next vertex to be moved to Cv takes O(n) time. As a result, the total complexity of the construction of Cv is O(m + n|Cv|). The RLF algorithm is listed as follows.

Algorithm RLF
Input: a graph G.
Output: a coloring of the vertices of G.
1) k←0.
2) while G contains uncolored vertices do
   Let U be the set of uncolored vertices.
   Set k ← k + 1.
   Choose a vertex v∈U with the largest value AU(v).
   Build Cv and assign color k to all vertices in Cv.
   end while

Because each vertex belongs to exactly one color class, the overall complexity of the RLF algorithm is O(km + n²), where k is the total number of colors used in the graph. In the worst case, the complexity is O(mn).
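The two procedures above can be sketched compactly in Python. This is an illustrative version under stated assumptions: the graph is given as a dictionary mapping each vertex to the set of its neighbors, AU and AW are recomputed with set intersections instead of being updated incrementally (so the sketch does not reach the O(m + n|Cv|) bound), and the example graph is hypothetical.

```python
def build_color_class(adj, U, v):
    """Construct a stable set Cv starting at v; U is modified in place."""
    W = adj[v] & U                      # uncolored neighbors of v
    U -= W | {v}
    Cv = {v}
    while U:
        # Largest A_W(u); ties broken by smallest A_U(u).
        u = max(U, key=lambda x: (len(adj[x] & W), -len(adj[x] & U)))
        moved = adj[u] & U              # neighbors of u still in U
        U -= moved | {u}
        W |= moved
        Cv.add(u)
    return Cv

def rlf(adj):
    """Return {vertex: color} with colors 1..k, one color class per round."""
    uncolored = set(adj)
    color = {}
    k = 0
    while uncolored:
        k += 1
        U = set(uncolored)
        v = max(U, key=lambda x: len(adj[x] & U))   # largest A_U(v)
        Cv = build_color_class(adj, U, v)
        for x in Cv:
            color[x] = k
        uncolored -= Cv
    return color

# Hypothetical 5-vertex graph containing the triangle 2-4-5.
ADJ = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3, 5}, 5: {2, 4}}
coloring = rlf(ADJ)
print(max(coloring.values()))
```

Every vertex added to Cv comes from U, which by construction contains no neighbor of any vertex already in Cv, so each color class is a stable set and the resulting coloring is proper.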

Alternative Greedy Choices
The following parts use the same notation as the RLF algorithm. For a vertex x∈U, AW(x) denotes the number of neighbors of x in W. Once a vertex x has been moved from U to W, its value AW(x) no longer changes, so for a vertex in W, AW(x) equals the value it had just before the move. The following part introduces two improvements to the greedy choices made in RLF [7].

Alternative greedy choice for the selection of the next vertex to be placed in Cv.
This greedy choice concerns which vertex w ≠ v is placed in Cv next. The RLF algorithm selects the vertex u∈U whose value AW(u) is the largest. However, there is a better way to select the vertex. For every vertex u∈U, a value B(u) is defined by a formula in which N(u) is the set of neighbors of u and d(w) is the number of uncolored neighbors of w at the start of the construction of Cv. The next vertex to be moved to Cv is the one with the maximum value B(u).
Let G' be the part of the graph that remains uncolored after the construction of Cv. With this choice, a vertex u whose uncolored neighbors w in W have large degree is preferred, so that the maximum degree in G' is as small as possible. Besides, by making […]
(1) The first alternative is to construct a stable set Cv for every uncolored vertex v and to pick the vertex whose stable set leaves a residual graph with the minimum number of edges, which makes the total complexity of the algorithm O(mn²).
(2) To avoid increasing the complexity from O(mn) to O(mn²), a stable set Cv can instead be constructed only for a constant number M of uncolored vertices v with the largest values AU(v).
An in-between solution is to follow alternative (2) with M = ⌊pn⌋ and 0 < p < 1, which keeps the overall complexity at O(mn²) but reduces the total computing time by a factor of approximately p compared with alternative (1).

Discussion
As discussed above, both algorithms aim to solve the graph coloring problem, and they have different characteristics. The algorithm based on the genetic algorithm has the characteristic of group search: the search starts from an initial group containing multiple individuals, which can effectively avoid searching some points that do not need to be searched. Nevertheless, the encoding in a genetic algorithm can easily be irregular and inaccurate. In addition, the efficiency of a genetic algorithm is usually lower than that of traditional optimization methods. The algorithm based on the greedy algorithm is efficient and easy to understand and implement. However, it has weaknesses as well. It relies heavily on previous steps, and the final solution is a collection of the optimal solutions of the individual steps, which may not be the global optimal solution. All in all, which algorithm should be selected depends on the specific demands and the actual circumstances of the problem.

Conclusion
As a classical NP-complete problem, the graph coloring problem has broad applications in both theory and engineering, such as the frequency assignment problem for transmitting stations [8]. There are several possible solutions, but no one knows for sure which is the best. This article introduces two of them. The one based on the genetic algorithm uses the modularity function to make the number of independent sets as small as possible, and it has a good convergence rate. The one based on the greedy algorithm obtains the optimal solution at every step, but the result may not be the optimal solution of the whole problem. During the study of this problem, many new algorithms have been proposed and several new fields have been opened up. The two algorithms introduced here are perhaps not the simplest ones, but they are the most fundamental ones. A thorough understanding of these algorithms can help in studying other problems and, in the future, in the study of the graph coloring problem as well as more algorithms