Parameter tuning of a genetic algorithm for finding central vertices in graphs

This paper studies a genetic algorithm for finding central vertices in graphs. The algorithm takes a different approach to representing candidate solutions and offers a new view of the crossover process. Studies are conducted to find the optimal parameters of the genetic algorithm, such as the crossover probability, the mutation probability, and the population size. Based on the results, it can be claimed that, with the right parameters, our algorithm achieves good running times with a high proportion of correct answers.


Introduction
In recent times, with the spread of computer technology and the emergence of new methods of data collection, the analysis of big data has become increasingly widespread. Such studies reveal previously unexplored features of the interaction of large groups of objects, and the resulting insights are of not only theoretical but also considerable practical value. It is clear that this kind of research requires algorithms that can process large volumes of data in an acceptable amount of time. One variety of big data is large graphs describing social networks, i.e. the relationships between a large number of people. In graph theory, finding the graph radius and central vertices is an important problem in both theoretical and applied settings.
This work considers an unweighted undirected graph G = (V, E), where |V| = n is the number of vertices and |E| = m is the number of edges. Since the graph is unweighted, the length of a path between vertices u and v equals the number of edges in it.
To find a central vertex of the graph, we use the concept of vertex eccentricity, i.e. the distance from a vertex to the vertex most distant from it. The graph radius is then the minimum of the eccentricities over all vertices, and a vertex on which this minimum is attained is called a central vertex.
This paper considers connected graphs, since if a graph is not connected, then there is a pair of vertices with infinite distance, and in this case the task does not make sense.
This problem is well studied, and there are several exact algorithms for finding the central vertices of a graph. In addition, the problem of finding central vertices is often considered together with the all-pairs shortest path problem.
On the one hand, this task may seem trivial. The breadth-first search (BFS) algorithm finds the distances from one vertex to all other vertices in O(n + m) time. Running it from every vertex solves the problem exactly in O(n(n + m)) time. At its core, all subsequent work aims to improve this time bound, and there are algorithms that solve the problem faster. In general, two main approaches have been proposed for finding central vertices in graphs.
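The trivial exact approach just described can be sketched as follows (an illustrative Python sketch; the paper's implementations are in C++, and the function names here are ours):

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances from source to every vertex of an unweighted graph (adjacency lists)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def eccentricity(adj, v):
    """Distance from v to the vertex most distant from it."""
    return max(bfs_distances(adj, v).values())

def exact_center(adj):
    """Brute force: one BFS per vertex, O(n(n + m)); returns (central vertex, radius)."""
    return min(((v, eccentricity(adj, v)) for v in adj), key=lambda t: t[1])
```

This is the baseline the genetic algorithm is measured against: one O(n + m) BFS per vertex, which is prohibitive for graphs with tens of thousands of nodes.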
The first, proposed in [1], is based on matrix multiplication. The graph is represented by its adjacency matrix, and the most resource-consuming operation in the algorithm's transformations is matrix multiplication. Accordingly, the time bound of this algorithm is O(n^3); combined with a fast matrix multiplication algorithm it can be improved to a theoretical bound of O(n^2.376), although in practice it is usually closer to O(n^2.81).
The second approach was proposed by D. Aingworth et al. in [2]. It divides the vertex set into two subsets, one containing vertices of high degree and the other vertices of low degree. This heuristic separation allows the BFS algorithm to be run selectively and reduces the complexity to O(m√n). This bound was later improved in [7][8][9]. However, all of these algorithms either impose significant restrictions on graph size or rely on special data structures, which complicates their practical implementation. Even with these improvements, exact algorithms show good running times only on graphs of small size.
These methods work well on small graphs. However, the real-life graphs that are of most interest for research contain tens of thousands of nodes. As a result, exact algorithms cannot be applied to such tasks because of unacceptable time costs. In this situation, heuristic algorithms, which run faster but do not guarantee an exact solution, may be the most suitable.
This work presents the Ball-Shrinking Search Algorithm [10], a genetic algorithm for finding central vertices in a graph. It was compared with several exact algorithms and with the genetic algorithm from [4] and proved promising. The paper examines the influence of the parameters of the genetic algorithm, such as the population size and the probabilities of crossover and mutation, on its performance.
The problem of choosing optimal parameters for heuristic and genetic algorithms has been addressed in many papers, e.g. [11][12][13][14][15][16][17][18][19].


Ball-Shrinking Search Algorithm
To find the exact solution to the problem, it is necessary to compute the eccentricity of each vertex using breadth-first search and then select a vertex with minimal eccentricity. Obviously, this approach is not acceptable for large graphs. The main idea of the algorithm is as follows: at each iteration of the proposed genetic algorithm, we compute eccentricities only for a small number of vertices (the individuals of the population), whose convex hull we can conditionally call a ball. We assume that a central vertex lies inside this ball.
The current population can be interpreted as a set of spheres, each centered at its own vertex of the population, and a central vertex of the graph lies in the intersection of these spheres. Accordingly, one can take a pair of vertices from the population and find a shortest path between them using BFS. The crossover operation for two individuals then selects a random vertex lying on this path. The expectation is that each iteration brings us closer to a vertex with optimal eccentricity (figure 1).
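The crossover step described above can be sketched as follows (an illustrative Python sketch; helper names are ours, and the graph is assumed connected, as stated earlier):

```python
import random
from collections import deque

def shortest_path(adj, u, v):
    """One shortest u-v path, recovered from BFS parent pointers; assumes connectivity."""
    parent = {u: None}
    queue = deque([u])
    while queue and v not in parent:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def crossover(adj, u, v, rng=random):
    """Offspring of two individuals: a uniformly random vertex on one shortest u-v path."""
    return rng.choice(shortest_path(adj, u, v))
```

Because the child lies between its parents, repeated crossover tends to pull the population toward the intersection of the spheres, i.e. toward a central vertex.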
For the mutation process, an approach is chosen in which, in order to diversify the existing population, a vertex is replaced by a random vertex from the set of its neighbors with a given probability p_m.
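The mutation operator is a one-liner (illustrative Python sketch; the function name is ours):

```python
import random

def mutate(adj, v, p_m, rng=random):
    """With probability p_m, replace vertex v by a uniformly random neighbour of v."""
    if adj[v] and rng.random() < p_m:
        return rng.choice(adj[v])
    return v
```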
Using the selection and crossover operators, we shrink this ball towards a central vertex, while the mutation operator pushes the evolutionary process towards a global rather than a local solution. As a rule, the diameter of the ball containing the individuals of the current population decreases with each iteration, i.e. the balls shrink. The most natural way to assess the quality of a candidate solution in this problem is its vertex eccentricity, computed with breadth-first search (BFS); natural selection then gives priority to vertices with lower eccentricity.
The algorithm pseudocode is presented in Algorithm 1.
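The overall loop combining these operators can be sketched as follows (an illustrative, self-contained Python sketch, not the paper's Algorithm 1 verbatim: generational replacement, the mating-pool rule, and the fixed iteration count are our assumptions; selection favors low eccentricity as described above):

```python
import random
from collections import deque

def bfs(adj, s):
    """Distances from s to all vertices of a connected unweighted graph."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def path_between(adj, u, v):
    """One shortest u-v path via BFS parent pointers; assumes connectivity."""
    parent = {u: None}
    q = deque([u])
    while q and v not in parent:
        x = q.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                q.append(y)
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def ball_shrinking_search(adj, N=30, p_c=0.4, p_m=0.3, iters=50, rng=None):
    """Genetic search for a low-eccentricity vertex; returns (best vertex, its eccentricity)."""
    rng = rng or random.Random()
    vertices = list(adj)
    pop = rng.sample(vertices, min(N, len(vertices)))
    best_v, best_e = None, float("inf")
    for _ in range(iters):
        # fitness: eccentricity computed by BFS; lower is better
        ecc = {v: max(bfs(adj, v).values()) for v in set(pop)}
        for v, e in ecc.items():
            if e < best_e:
                best_v, best_e = v, e
        # selection: parents are drawn from the better half of the population
        pool = sorted(pop, key=ecc.get)[: max(2, len(pop) // 2)]
        nxt = []
        while len(nxt) < len(pop):
            if rng.random() < p_c and len(pool) > 1:
                u, v = rng.sample(pool, 2)
                # crossover: a random vertex on a shortest path between the parents
                child = rng.choice(path_between(adj, u, v))
            else:
                child = rng.choice(pool)
            # mutation: jump to a random neighbour with probability p_m
            if adj[child] and rng.random() < p_m:
                child = rng.choice(adj[child])
            nxt.append(child)
        pop = nxt
    return best_v, best_e
```

The only expensive operation per generation is one BFS per distinct individual, so the cost per iteration is O(N(n + m)) instead of the O(n(n + m)) of the exact method.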
To visualize the work of the proposed algorithm, we present one of the graphs generated with the random geometric model (figure 2).

Selection of optimal algorithm parameters
The main parameters of the genetic algorithm are the mutation probability p_m, the crossover probability p_c, and the population size N. In this section we investigate the effect of these parameters on the performance of the algorithm.
In [10] the algorithm used the values p_c = 0.7, p_m = 0.1, and N = 20 for these parameters. With these values, the algorithm showed quite promising results.
The running time of the algorithm and the accuracy of the solution can be considered its performance characteristics. By the nature of genetic algorithms, any change to these parameters can significantly affect both characteristics.
A series of experiments was conducted to find the parameters that best fit the proposed algorithm. The results are described below.
All experiments were performed on a computer with an AMD A8-7410 2.20 GHz CPU and 6 GB RAM. All algorithms were implemented in the programming language C++. The algorithm was run 100 times with each set of parameters, and its running time was calculated as the average over these runs. The two exact algorithms of D. Aingworth [2] and R. Seidel [1] were used to find the exact central vertex; comparing against their answers gives the percentage of errors for each set of parameters.
In this article, we present the results of the algorithm for random graphs G = (V, E) with |V| = 2500, |E| = 90268 and with |V| = 5000, |E| = 358553.
First of all, to find the optimal values of the parameters p_c and p_m, we measured the running time of the algorithm while enumerating p_c and p_m in increments of 0.1.
The first series of experiments was performed with a constant population size N = 20. The running time of the algorithm on the geometric random graph is presented in tables 1 and 2. As can be seen from these tables, the running time increases as p_c and p_m grow. These results are predictable: by increasing the probability of an operation, we increase the frequency of its execution.
A similar experiment measuring the percentage of errors was carried out; its results are given in tables 3 and 4.
According to tables 3 and 4, the percentage of incorrectly found answers decreases as the values of p_m and p_c increase. These results are also foreseeable, since an increase in the number of crossover and mutation operations should increase the variety of vertices the algorithm operates on, among which the desired solution may be found. For an overall assessment of the parameters, we introduce the function F, which allows us to find the optimal parameters p_m and p_c. The values of the function for α = 0.3 and β = 0.7, with normalized running time and percentage of errors, are presented in tables 5 and 6.
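The definition of F does not survive in this extract; given the stated weights α and β and the normalized quantities, it is presumably the weighted sum of the normalized running time and the normalized percentage of errors (a reconstruction, not the paper's verbatim formula):

```latex
F(p_m, p_c) = \alpha \, \widehat{T}(p_m, p_c) + \beta \, \widehat{E}(p_m, p_c),
\qquad \alpha + \beta = 1,
```

where \widehat{T} and \widehat{E} denote the normalized running time and the normalized percentage of errors; lower values of F indicate a better trade-off between speed and accuracy.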
As can be seen from table 5, F reaches its minimum at p_m = 0.4 and p_c = 0.4 on the corresponding type of graph. Table 6 shows that for the second type of graph the optimal probability values are p_m = 0.2 and p_c = 0.3.
In addition, we illustrate the influence of the parameters p_m and p_c on the efficiency of the algorithm. Figure 4 shows a similar dependence for graphs with a larger number of vertices at p_m = 0.2. The influence of the parameter N on the efficiency of the algorithm is studied in the next series of experiments. The following experimental results were obtained for random graphs G = (V, E) with |V| = 2500 and |E| = 90268.
It is well known that the effectiveness of genetic algorithms depends on the population size N. On the one hand, an increase in N leads to an increase in the running time of the algorithm, which is confirmed by figure 5.
On the other hand, an increase in this parameter leads to a greater variety of individuals under consideration, and therefore to a higher probability of quickly finding an accurate solution. This trend is shown in figure 6: as the running time grows, the percentage of errors decreases exponentially. To select a proper value of N, we introduce the function F_1, similar to the function F. Choosing α and β allows one to select the parameter N visually in an optimal way. Figure 7 shows the different behavior of the function for α = 0.3, β = 0.7 and for α = 0.5, β = 0.5. From figure 7(b) we can conclude that the optimal value of N for this type of graph is N = 30.
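The formula for F_1 is likewise missing from this extract; by analogy with F it is presumably the same weighted sum, now as a function of the population size alone (a reconstruction, not the paper's verbatim formula):

```latex
F_1(N) = \alpha \, \widehat{T}(N) + \beta \, \widehat{E}(N), \qquad \alpha + \beta = 1,
```

with \widehat{T}(N) and \widehat{E}(N) the normalized running time and normalized percentage of errors at population size N.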

Empirical results
Based on the experimental results, an example of which is given in the previous section, the parameters p_m = 0.3, p_c = 0.4, and N = 30 were selected as optimal for our algorithm. These parameters were used in all subsequent test runs.
To test the effectiveness of the algorithm with the chosen parameters, it was compared against another genetic algorithm (N4N), described in [4]. It takes a similar approach to the considered problem but differs in the mutation rules, the crossover operator, and the population selection. The algorithms were compared on graphs generated with three random graph models:
• the well-known Barabasi-Albert model [5],
• the random geometric graph model [6],
• the Erdos-Renyi model [3].
We chose the parameter m = 2 for the Barabasi-Albert graph, r = 0.1 for the geometric graph, and p = 1% for the Erdos-Renyi model. For both genetic algorithms, the running time and the accuracy were measured. Both algorithms were run 100 times on each test, which allowed us to obtain the average running time and the percentage of errors, i.e. the ratio of the number of incorrectly found solutions to the total number of runs. The results of the experiments are given in tables 7, 8 and 9.
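As an illustration of the test data, the Erdős–Rényi G(n, p) model can be generated in a few lines (an illustrative Python sketch; the function name is ours, and for the Barabasi-Albert and geometric models ready-made generators such as those in networkx can be used instead):

```python
import random

def erdos_renyi(n, p, rng=random):
    """G(n, p): each of the n(n-1)/2 possible edges is present independently
    with probability p; returns adjacency lists."""
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj
```

Note that for small p the generated graph may be disconnected; since the algorithm assumes connectivity, such instances are discarded or regenerated.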
In addition, the running time of the proposed algorithm was compared with that of the Aingworth algorithm, one of the well-known exact algorithms. The results of these experiments are shown in figure 8.
Furthermore, the algorithm was applied to real graphs of larger sizes and compared with the trivial exact algorithm.

Table 10. Running time and percentage of errors of the algorithms on real graphs. BSSA stands for Ball-Shrinking Search Algorithm, Trivial for the trivial algorithm; (1) is the gemsec-Facebook (artists) network [21], (2) is the MUSAE GitHub Social Network [22].

From the obtained results it can be seen that the proposed algorithm works faster than the Aingworth algorithm and the N4N algorithm. The algorithm gives a significant percentage of errors only on graphs of relatively small size; since the time costs of exact algorithms on such graphs are insignificant, using the heuristic approach there is not advisable anyway. However, as the graph size increases, the percentage of incorrect answers tends to zero.

Conclusions
This work investigates the main parameters of the genetic algorithm proposed in [10], such as the mutation probability p_m, the crossover probability p_c, and the population size N. A series of experiments was conducted that showed the strengths and weaknesses of our algorithm. Based on these results, it can be concluded that the running time of the algorithm increases significantly with increasing p_m and p_c, while the percentage of incorrect answers is significantly reduced. In addition, the percentage of errors is affected by the population size. Consequently, the parameters should be selected based on the time constraints and the required percentage of errors.