Approaching the bi-objective critical node detection problem with a smart initialization-based evolutionary algorithm

Determining the critical nodes in a complex network is an essential computational problem. Several variants of this problem have emerged due to its wide applicability in network analysis. In this article we study the bi-objective critical node detection problem (BOCNDP), a new variant of the well-known critical node detection problem that optimizes two objectives at the same time: maximizing the number of connected components and minimizing the variance of their cardinalities. Evolutionary multi-objective algorithms (EMOA) are a straightforward choice for solving this type of problem. We propose three smart initialization strategies which can be incorporated into any EMOA. These strategies take into account basic properties of the network: they are based on the highest degree, random walk (RW) and depth-first search. Numerical experiments were conducted on synthetic and real-world network data. All three initialization strategies significantly improve the performance of the EMOA.


INTRODUCTION
In recent years, complex networks have received a lot of attention due to their applicability in various domains. Several optimization problems were studied within complex networks like community detection (Fortunato, 2010), maximal influence node detection (Kempe, Kleinberg & Tardos, 2003) and link prediction (Liben-Nowell & Kleinberg, 2007). All of the aforementioned problems reveal major insights into the networks studied.
Identifying critical nodes (the critical node detection problem, or CNDP) in a complex network is a crucial task. The base problem consists of minimizing pairwise connectivity by removing a subset of K nodes in a given graph. In Arulselvan et al. (2009) it was proven to be an NP-hard problem.
The general formulation of the problem is as follows (Lalou, Tahraoui & Kheddouci, 2018): given a graph G = (V, E) and a connectivity metric λ, find the set of nodes S ⊆ V such that G[V \ S] satisfies the metric λ. This metric is usually defined as an objective function that needs to be optimized (for example, maximize the number of components, minimize the component size, etc.).
CNDP has a wide field of applicability, for example in social network analysis (Fan & Pardalos, 2010), epidemic control (Tao, Zhongqian & Binghong, 2006), network immunization (Kuhlman et al., 2010) and biological networks (Liu et al., 2020). Several algorithms were designed for the CNDP. The majority of the exact methods are based on the integer linear programming formulation of the problem (Summa, Grosso & Locatelli, 2012). In Addis, Summa & Grosso (2013), a dynamic programming approach is proposed for a special class of graphs. As approximation algorithms, we can mention, for example, a simulated annealing algorithm (Ventresca, 2012). A thorough survey of existing methods for the CNDP can be found in Lalou, Tahraoui & Kheddouci (2018).
The CNDP has several variants that differ in the connectivity metric λ. Other variants with constraints were introduced, such as the cardinality constrained critical node detection problem (CC-CNDP) (Arulselvan et al., 2011) and the component-cardinality-constrained critical node problem (3C-CNDP) (Lalou, Tahraoui & Kheddouci, 2016). One of the existing bi-objective variants of the CNDP is proposed in Li et al. (2019); in this variant, the cost of removing a node is also taken into account. Another bi-objective variant, proposed in Ventresca, Harrison & Ombuki-Berman (2018), is the base of our study (described in 'The Bi-objective Critical Node Detection Problem').
Evolutionary algorithms are powerful tools for optimization problems. Multi-objective optimization problems involve multiple objective functions which need to be optimized at the same time, so they can be used to model real-world optimization problems. Because the BOCNDP is an NP-hard problem (Ventresca, Harrison & Ombuki-Berman, 2018), the use of evolutionary algorithms is a natural choice. Several techniques were designed to increase the performance of evolutionary algorithms, for example hybridization, a special case of memetic algorithms which incorporates a local search in the initialization phase. Kazimipour, Li & Qin (2014) emphasize the importance of population initialization techniques, introduce a new categorization, and mention some concrete examples.
Due to the wide applicability of the critical node detection problem, this article introduces new smart initialization strategies that can be incorporated into any multi-objective optimization algorithm that treats the BOCNDP, in order to increase its performance. Because they take into account structural information about the network, these strategies can also be used in other variants of the CNDP, or even for other computational graph-theoretical problems.
To summarise, the main contributions of this article are as follows:
• a smart initialization strategy based on a depth-first search: nodes lying on a path are chosen to be in the initial population;
• a random walk-based smart initialization strategy: a random walk is simulated on the graph, and nodes that appear more times in the walk are considered more important;
• a degree-based smart initialization strategy: nodes with a higher degree are more likely to be chosen in the initial population;
• a statistical analysis of the three smart initialization strategies introduced here and their comparison with random initialization.
The rest of the article is organized as follows: In the second section, we describe the bi-objective critical node detection problem and the existing solving algorithms. In the third section, we present the proposed initialization algorithms. The next section describes the numerical experiments. The article ends with conclusions and further work.

THE BI-OBJECTIVE CRITICAL NODE DETECTION PROBLEM
Let G = (V, E) be an undirected graph, where V is the set of nodes and E is the set of edges. The bi-objective critical node detection problem was proposed in Ventresca, Harrison & Ombuki-Berman (2018) and consists of finding a set S of k nodes which, if deleted from graph G, optimizes the following two objectives:
1. maximize the number of connected components;
2. minimize the variance of the cardinality of the connected components.
Formally, the objectives are the following:
$$\max |H|, \qquad \min \operatorname{var}(H)$$
such that
$$\sum_{i \in S} w_i \le W,$$
where w_i are the weights associated to the vertices of the graph and W > 0 is a constraint, H denotes the set of connected components of G[V \ S], and var(H) denotes the variance of the cardinality of the connected components, which can be calculated in the following way:
$$\operatorname{var}(H) = \frac{1}{|H|} \sum_{h \in H} \left( |h| - \frac{n_v}{|H|} \right)^2, \qquad n_v = \sum_{h \in H} |h|.$$
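As an illustration, the two objectives can be evaluated for a candidate set of removed nodes with a short Python sketch. The adjacency-dictionary representation, the function names, and the use of the population variance (division by the number of components) are assumptions made here for the example, not details confirmed by the article.

```python
from collections import deque


def component_sizes(adj, removed):
    """Sizes of the connected components of G[V \\ S], found by BFS.

    adj: dict mapping each node to the set of its neighbours (assumed format).
    removed: iterable with the nodes in S.
    """
    seen = set(removed)  # removed nodes are never visited
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sizes


def objectives(adj, removed):
    """Return (|H|, var(H)); var is the population variance (assumption)."""
    sizes = component_sizes(adj, removed)
    if not sizes:
        return 0, 0.0
    mean = sum(sizes) / len(sizes)
    var = sum((s - mean) ** 2 for s in sizes) / len(sizes)
    return len(sizes), var


# Path graph 0-1-2-3-4: removing the middle node gives two equal components.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(objectives(path, {2}))
print(objectives(path, {1}))
```

On this toy path, removing node 2 yields two components of equal size, while removing node 1 yields the same number of components but with unequal sizes, which is exactly the trade-off the second objective penalizes.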

EVOLUTIONARY COMPUTATION METHOD
Evolutionary algorithms are powerful optimization tools, especially in multi-objective optimization problems. To increase the performance of these algorithms, hybrid versions are designed and analysed. Smart initialization of the population of an evolutionary algorithm can increase the performance of the algorithm significantly (Maaranen, Miettinen & Penttinen, 2007).
We present three strategies that can be used in the initialization phase of any multi-objective algorithm. The first one is based on a depth-first search (DFS), outlined in Algorithm 1. The DFS is started from a random initial node, and every xth visited node is added to the chromosome, where x = |V|/k, |V| is the number of nodes and k is the number of critical nodes.
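A minimal sketch of the DFS-based initialization described above is given next. It is not the article's Algorithm 1 verbatim: the adjacency-dictionary format, the function names, the rounding of x to an integer, and the continuation of the search on disconnected graphs are all assumptions made for this example.

```python
import random


def dfs_order(adj, start):
    """Iterative depth-first preorder that begins at `start`."""
    order, seen = [], set()
    # Assumption: on a disconnected graph the search continues from the
    # remaining nodes, so that every node gets a position in the order.
    for root in [start] + [v for v in adj if v != start]:
        if root in seen:
            continue
        stack = [root]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            order.append(u)
            stack.extend(v for v in adj[u] if v not in seen)
    return order


def dfs_init(adj, k, rng=random):
    """Build one chromosome of k nodes by taking every x-th DFS node."""
    start = rng.choice(list(adj))
    order = dfs_order(adj, start)
    x = max(1, len(adj) // k)  # x = |V| / k, rounded down (assumption)
    # Positions x, 2x, 3x, ... of the visiting order; keep the first k.
    return order[x - 1::x][:k]


# Example on a path with 10 nodes and k = 3.
path = {i: set() for i in range(10)}
for i in range(9):
    path[i].add(i + 1)
    path[i + 1].add(i)
print(dfs_init(path, 3, random.Random(1)))
```

Because the start node is random, repeated calls produce different chromosomes, which keeps the initial population diverse while still spreading the selected nodes along the search order.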
The second initialization method is based on the degree distribution of the nodes. The first x nodes with the highest degree are set in the chromosome, and the rest of the k − x nodes are selected randomly, to preserve the stochastic nature of the initialization (Algorithm 2).
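The degree-based strategy can be sketched in a few lines of Python. The adjacency-dictionary format and the function name `degree_init` are assumptions for this illustration; it is a sketch of the idea rather than the article's Algorithm 2.

```python
import random


def degree_init(adj, k, x, rng=random):
    """Chromosome with the x highest-degree nodes plus k - x random nodes.

    adj: dict mapping each node to the set of its neighbours (assumed format).
    """
    by_degree = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    chromosome = by_degree[:x]          # deterministic part: top-x degrees
    rest = by_degree[x:]                # candidates for the random part
    chromosome += rng.sample(rest, k - x)
    return chromosome


# Star graph: node 0 is the hub and is always selected first.
star = {0: {1, 2, 3, 4, 5, 6}}
for leaf in range(1, 7):
    star[leaf] = {0}
print(degree_init(star, k=3, x=1, rng=random.Random(0)))
```

Since the top-x part is the same for every individual, all diversity in the initial population comes from the k − x randomly drawn nodes, so the choice of x trades structural guidance against diversity.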
The third method is based on a random walk. We start the walk from a random node; t is the length of the walk and p_r is the probability of restarting the walk. In each step, the decision is either to continue the walk or to restart. If the walk fails to visit k different nodes, the algorithm restarts from another initial point. During the walk, we count how many times each node appears. The more times a node appears, the higher the probability that it becomes a gene in the chromosome. The main steps are presented in Algorithm 3.
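The random walk-based strategy can be illustrated with the following Python sketch. Several details are assumptions made for the example and are not stated in the text: a restart returns the walker to the starting node, genes are drawn without replacement with probability proportional to the visit counts, and the names `random_walk_init` and `max_tries` are hypothetical.

```python
import random
from collections import Counter


def random_walk_init(adj, k, t=10000, p_r=0.2, rng=random, max_tries=100):
    """Chromosome of k distinct nodes sampled by random-walk visit counts."""
    nodes = list(adj)
    for _ in range(max_tries):
        start = rng.choice(nodes)
        current = start
        visits = Counter([start])
        for _ in range(t):
            neighbours = list(adj[current])
            if not neighbours or rng.random() < p_r:
                current = start  # assumption: restart goes back to the start
            else:
                current = rng.choice(neighbours)
            visits[current] += 1
        if len(visits) >= k:
            break  # enough distinct nodes were visited
    else:
        raise ValueError("no walk visited k distinct nodes")

    # Weighted sampling without replacement: frequent nodes are more likely.
    candidates = dict(visits)
    chromosome = []
    while len(chromosome) < k:
        pool = list(candidates)
        pick = rng.choices(pool, weights=[candidates[v] for v in pool])[0]
        chromosome.append(pick)
        del candidates[pick]
    return chromosome


# Example on a cycle with 8 nodes.
cycle = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
print(random_walk_init(cycle, k=3, t=200, rng=random.Random(0)))
```

The default values t = 10000 and p_r = 0.2 mirror the parameter setting reported for the experiments; the retry bound only guards against graphs in which no walk can reach k distinct nodes.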
These initialization strategies can be used in any kind of multi-objective evolutionary algorithm. The outline of the smart initialization-based algorithm is given in Algorithm 4.

Synthetic data
We use the synthetic graph set proposed in Ventresca (2012). The benchmark set contains four different types of graphs: Barabási-Albert (BA), Erdős-Rényi (ER), Forest-fire (FF) and Watts-Strogatz (WS). Barabási-Albert graphs are scale-free networks generated by a preferential attachment mechanism, which produces some high-degree nodes (hubs). Erdős-Rényi graphs are random networks in which each link between nodes is generated randomly based on a probability. Forest-fire graphs are random graphs with a preferential attachment mechanism. Watts-Strogatz graphs are random graphs with short average path lengths, so they have a dense structure. Table 1 presents some basic properties of the benchmarks used: number of nodes (|V|), number of edges (|E|), the number of critical nodes (k), average degree (d), density of the graph (ρ), and average path length (l_G).

Real dataset
Nine real datasets are used for the numerical experiments. The real datasets come from different areas: transportation networks (USAir97, TrainsRome, EUFlights), biological networks (Bovine, EColi, HumanDis), social networks (Oclinks, Facebook), and an electric network (Circuit). The size of the graphs varies from 121 to 4039 nodes. The density of the networks varies from 0.008 to 0.044 and the average path length ranges from 2.622 to 43.496. The basic properties of the networks are outlined in Table 2.

Statistical analysis of the smart initialization strategies
To analyse the behaviour of the initialization strategies introduced, we generated 100 independent solutions and calculated the values of |H| and var(H). A statistical test was conducted to mark the differences between the methods. Table 3 presents the results. In almost all cases, the degree-based initialization outperformed the other strategies, but all of them outperformed the random initialization.

Algorithm 4 Evolutionary algorithm with smart initialization
Require: G, k
1: initialize (1) population P_s;
2: run a multi-objective Pareto-based optimization algorithm (2), where P_initial = P_s;
3: return Pareto front
(1) for the initialization we use: random initialization, depth-first search, degree, random walk
(2) e.g., NSGA-II, SPEA

Algorithm
For the numerical experiments, we used the NSGA-II (Deb et al., 2002) algorithm within the Platypus (https://github.com/quaquel/Platypus, last accessed 3/12/2019) framework. NSGA-II is a multi-objective evolutionary algorithm in which every member of the population is sorted according to the level of non-domination. To maintain diversity, a crowding distance is applied.
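To make the two NSGA-II ingredients mentioned above concrete, the following sketch ranks a set of objective vectors by non-domination level and computes crowding distances within one front. It is a simple quadratic-time illustration written for this text, not the Platypus implementation and not Deb's fast non-dominated sort; both objectives are assumed to be in minimization form, so the number of components would be passed as −|H|.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_levels(points):
    """Split point indices into successive non-dominated fronts."""
    remaining = set(range(len(points)))
    levels = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        levels.append(sorted(front))
        remaining -= set(front)
    return levels


def crowding_distance(points, front):
    """Crowding distance of each index in `front`; boundaries get inf."""
    dist = {i: 0.0 for i in front}
    for m in range(len(points[0])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][m] - points[prev][m]) / (hi - lo)
    return dist


# Objective vectors in the form (-|H|, var(H)).
pts = [(-3, 1.0), (-2, 0.0), (-2, 2.0), (-1, 0.5)]
print(non_dominated_levels(pts))
print(crowding_distance(pts, [0, 1]))
```

In NSGA-II, solutions are compared first by level (lower is better) and then by crowding distance (larger is better), which favours solutions in sparsely populated regions of a front.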

Parameter setting
For the numerical experiments, the parameters of the NSGA-II algorithm are the default values of the Platypus framework, with a total of 10000 function evaluations. All the weights of the nodes are set to 1, and W equals the number of nodes. The parameters of the smart initialization strategies are as follows: for the DFS, x is the number of nodes divided by the value of k; for the random walk, the number of steps is 10000 and the probability of restart is 0.2; and for the Deg algorithm, x = k/3.

Performance evaluation
For the performance evaluation, we use the hypervolume indicator (Zitzler & Thiele, 1998; Zitzler & Thiele, 1999), a popular measure for multi-objective optimization algorithms. The hypervolume indicator measures the volume of the region of the objective space that is dominated by the obtained points and bounded by a reference point.
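For two objectives the hypervolume reduces to an area that can be computed with a single sweep. The sketch below is an illustration under stated assumptions: both objectives are in minimization form (the number of components is negated), the reference point is the componentwise worst value over the given fronts, and the function names `nadir` and `hypervolume_2d` are chosen here for the example.

```python
def nadir(fronts):
    """Componentwise worst (largest) objective values over several fronts."""
    points = [p for front in fronts for p in front]
    return (max(p[0] for p in points), max(p[1] for p in points))


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (minimization)."""
    # Only points that are strictly better than the reference contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # the point adds a new horizontal slab
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


front = [(1, 3), (2, 2), (3, 1)]
print(hypervolume_2d(front, (4, 4)))           # 6.0
print(hypervolume_2d(front + [(3, 3)], (4, 4)))  # dominated point adds nothing
print(nadir([front, [(0, 5)]]))
```

A larger hypervolume indicates a front that is closer to the ideal region and better spread; note that points lying exactly on the reference boundary contribute zero area.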

RESULTS AND DISCUSSION
In the case of synthetic benchmarks, we conducted ten independent runs for each initialization strategy (depth-first search, degree-based, random walk) and made comparisons with random initialization. Table 4 presents the mean values and the standard deviation of the hypervolume indicators. As a reference point, we used the nadir point of the union of all Pareto fronts. We conducted a nonparametric Wilcoxon signed-rank test on the hypervolume indicator values reported by each method. The Wilcoxon signed-rank test assesses whether there is a significant difference between two paired samples. An asterisk (*) is used to indicate the statistical significance of differences: all initialization strategies which are not statistically different from the best one are marked. Figure 2 presents the Pareto front obtained within a single run. Table 5 reports the mean value and standard deviation obtained for the real datasets. Best results are marked with an asterisk (*).
Based on the results, we can draw a general conclusion for the synthetic benchmarks about the best initialization strategy. The structure of the graph determines which initialization is worth using, but based on the numerical experiments, all of them give better results than the random initialization. In the case of Barabási-Albert graphs, which contain hubs, the degree-based initialization gives the best result. Erdős-Rényi graphs are random graphs, in which case the depth-first search algorithm seems to be best. For Forest-fire graphs, which are also random graphs, the three proposed initialization types gave almost the same result. The Watts-Strogatz graphs have a dense structure, and the best results were provided by the random walk-based algorithm.
In the case of the real networks, all three proposed initialization strategies outperformed the random initialization. In most cases, the degree-based initialization seemed to give the best results.

CONCLUSIONS AND FURTHER WORK
In this article, we propose three smart population initialization methods for the BOCNDP. Numerical experiments show the effectiveness of the proposed approaches: all three methods outperformed the traditional random initialization.
As further work, other initialization strategies will be investigated and an adaptive algorithm can be developed to find the best initialization, taking into account the basic properties of the graph.