Multi-level learning based memetic algorithm for community detection
Introduction
With the advancement of the Internet and Web 2.0 techniques, many real-world complex systems, including online communities, power grids, collaboration systems, disease control systems, resource distribution systems and information recommendation systems, have become closely tied to our daily activities through information sharing [1], [2], [3], [4]. In a world of massive and disordered information, how to mine and analyze potentially useful information with an effective computing model has become an issue of common concern [4]. In recent years, research on complex networks has attracted increasing attention in biology, physics, sociology and mathematics [1], [2], [3], [5]. The topological structure of a complex system can readily be modeled as a complex network of linked nodes. More specifically, the entities of the system are represented as the nodes (or vertices) of the network, and the relations between entities are modeled as its links (or edges). Community structure is a common and important property of both complex networks and complex systems [6]. In complex networks, a community is a set of nodes that have more connections with each other than with the remaining nodes in the network [7], [8], [9], [10]. Community structure is indispensable for reflecting the potential structural behavior of networks [6]. In complex systems, a community is a group of entities that have similar properties. Generally, the potential functionality of a complex system is related to its community structure [6].
Various methods have been proposed to detect communities in complex networks. Among them, one of the most popular techniques is the optimization of modularity, the most widely used criterion for evaluating the quality of a network's community structure [6], [11]. Modularity optimization methods detect communities by searching for the network partition with maximal modularity [6], [11]. Recent studies [12], [13] demonstrate that the number of local maxima of modularity grows exponentially with the size of the network.
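To make the objective concrete, the following minimal sketch computes modularity for a given partition. It assumes the standard Newman-Girvan form Q = sum over communities c of [ l_c/m - (d_c/(2m))^2 ], where l_c is the number of edges inside c, d_c is the total degree of its nodes and m is the number of edges. The function name `modularity` and the edge-list input format are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition (unweighted, undirected graph).

    edges: iterable of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, two_triangles))  # about 0.357 (= 5/14)
```

Placing all six nodes in a single community gives Q = 0 under this definition, so the two-triangle split is the better partition of this toy graph.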
Many heuristic algorithms that optimize modularity for community detection have been proposed in recent years [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. Some of them move from one solution to a better one and thus are easily trapped in local optima [27]. Moreover, some of them are sensitive to the order in which nodes are processed, so the detected network partitions are not stable [6], [27]. For instance, the fast greedy method (FM) [14], which iteratively joins the pair of communities with the largest gain in modularity, tends to produce very large communities and neglect small ones [6], and is therefore easily trapped in a locally optimal network partition. The modularity-specific label propagation algorithm (LPAm), proposed by Barber and Clark [19], is sensitive to the order in which nodes are updated, so the revealed network partitions differ between independent trials. It is also easily trapped in a locally optimal network partition in which the detected communities are similar in total degree [20]. The multistep techniques proposed by Blondel et al. [17] and by Schuetz and Caflisch [18], which iteratively merge sets of nodes and communities, are also prone to local optima, because merged nodes and communities can hardly be separated again [28]. Several techniques have been proposed to enhance both the accuracy and the stability of modularity-based optimization methods. Lai et al. [29] and Yan and Gregory [30] use network preprocessing techniques to improve the accuracy. Both are based on the assumption that vertices in the same community behave more similarly than those in different communities. Lancichinetti and Fortunato [31] adopt the consensus clustering technique to enhance both the accuracy and the stability of the resulting network partitions. Pizzuti, Shi and we adopt multiobjective optimization algorithms to discover a proper and stable network partition with a high value of modularity [25], [32], [33]. In this study, we use an intelligent hybrid technique, the memetic algorithm (MA), to improve the performance of modularity-based community detection algorithms by simultaneously evolving a population of solutions toward better ones.
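The greedy joining step attributed to FM above can be illustrated with a short sketch. It assumes the standard expression for the modularity gain of merging communities a and b, namely l_ab/m - d_a*d_b/(2*m^2), where l_ab is the number of edges between them and d_a, d_b are their total degrees. The helper name `best_merge` is hypothetical, and the sketch recomputes everything from scratch rather than using the efficient data structures of the published method.

```python
from collections import defaultdict

def best_merge(edges, community):
    """Return ((a, b), gain) for the pair of connected communities whose
    merge yields the largest modularity gain, or (None, -inf) if no two
    communities share an edge.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict node -> community label (labels must be sortable).
    """
    edges = list(edges)
    m = len(edges)
    degree_sum = defaultdict(int)  # d_c: total degree per community
    between = defaultdict(int)     # l_ab: edges joining two different communities
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu != cv:
            between[tuple(sorted((cu, cv)))] += 1
    best_pair, best_gain = None, float("-inf")
    for (a, b), l_ab in between.items():
        gain = l_ab / m - degree_sum[a] * degree_sum[b] / (2 * m * m)
        if gain > best_gain:
            best_pair, best_gain = (a, b), gain
    return best_pair, best_gain
```

Only pairs joined by at least one edge are examined, since merging two unconnected communities can never increase modularity under this formula. Repeating this step until no positive gain remains gives a simple agglomerative procedure; because a merge is never undone, the procedure can stop at a local optimum, which is the weakness noted above.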
With the rapid development of computer science, mathematics and biology, research on intelligent algorithms for solving practical engineering problems has attracted increasing attention in recent years. Many intelligent algorithms, including the genetic algorithm (GA) [34], memetic algorithms [35], [36], [37], artificial neural networks [38], simulated annealing [15], swarm intelligence algorithms [39], fuzzy systems [40], [41], [42], [43], [44] and artificial immune systems [39], have been designed as problem-specific techniques for real-world applications [32], [45], [46], [47], [48], [49], [50], [51]. Compared with traditional algorithms, problem-specific intelligent algorithms can effectively find a high-quality solution in a reasonable period of time [39].
Memetic algorithms (MAs) are hybrid global-local heuristic search methodologies [35]. The global heuristic search is usually a population-based method, while the local search is generally regarded as an individual learning procedure that accelerates convergence to an optimum [35]. In general, the population-based global search has the advantage of exploring the promising search space and providing a reliable estimate of the global optimum [36]. However, it has difficulty finding an optimal solution within the explored region in a short time. The local search is usually designed to accelerate the search and find the best solutions within the explored region. Therefore, this hybridization, which combines the complementary advantages of population-based global search and individual-based local search, can effectively produce better solutions [37]. Recent studies on MAs have demonstrated that they are effective and efficient for optimization problems in many real-world applications [35], [36], [37], [48], [49], [50], [51]. MAs have also been used for uncovering communities in networks in recent years [45], [52], [53], [54]. For instance, in [45], we proposed a memetic algorithm, named Meme-Net, to uncover communities at different hierarchical levels. Meme-Net is effective, but its high computational complexity makes it impractical even for moderately large networks. Shang et al. [52] and we [53] adopt simulated annealing as the individual learning procedure to decrease the computational complexity of Meme-Net. However, the computational complexity of these algorithms is still very high relative to classical modularity-based community detection algorithms. The algorithm in [54] adopts the technique in [17] as the local search. It also spends a large amount of time generating the initial population, as it directly uses the algorithm in [17] to initialize a population of solutions. Therefore, the algorithms in [45], [52], [53], [54] are difficult to apply to real-world problems.
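The global-local hybridization described above can be summarized in a generic skeleton. This is a minimal sketch on a toy bit-string problem (maximizing the number of ones), not the algorithm of any cited work: all function names, the elitist survivor selection and the parameter defaults are assumptions chosen for illustration.

```python
import random

def memetic_search(fitness, random_solution, crossover, mutate, local_search,
                   pop_size=10, generations=20, seed=0):
    """Generic memetic loop: genetic operators (global search) followed by
    an individual learning step (local search) on every offspring."""
    rng = random.Random(seed)
    population = [random_solution(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(population, 2)           # pick two distinct parents
            child = mutate(crossover(p1, p2, rng), rng)  # global search operators
            offspring.append(local_search(child, fitness))  # individual learning
        # Elitist survivor selection: keep the best pop_size of parents + offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return population[0]

# --- Toy problem (hypothetical): maximize the number of ones in a bit string ---
def one_max(bits):
    return sum(bits)

def random_bits(rng, n=12):
    return [rng.randint(0, 1) for _ in range(n)]

def uniform_crossover(a, b, rng):
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def flip_one(bits, rng):
    out = list(bits)
    out[rng.randrange(len(out))] ^= 1  # flip one randomly chosen bit
    return out

def hill_climb(bits, fitness):
    """Accept single-bit flips as long as they strictly improve fitness."""
    best = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = list(best)
            cand[i] ^= 1
            if fitness(cand) > fitness(best):
                best, improved = cand, True
    return best

best = memetic_search(one_max, random_bits, uniform_crossover, flip_one, hill_climb)
print(best, one_max(best))
```

On this toy problem the local search alone reaches the optimum, so the example only shows how the pieces fit together. For community detection, the solution would encode a network partition, the fitness would be modularity, and the cost of the local search step largely determines the overall running time, which is the concern raised above for Meme-Net and its successors.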
Motivated by the above observations, in this study we present a fast memetic algorithm with multi-level learning strategies to detect communities by optimizing modularity. We term the proposed algorithm MLCD for short. MLCD adopts a genetic algorithm as the global search and uses the proposed multi-level learning algorithms to accelerate convergence. The proposed multi-level learning strategies work on the network at the node, cluster and network partition levels, respectively. By iteratively executing the GA and the multi-level learning strategies, a network partition with high modularity can be obtained accurately and stably. We also employ a modularity-specific label propagation rule to update the cluster identifier of each node at each operation. This simple update rule keeps the proposed algorithm fast. Experiments on the GN and LFR benchmarks and on 12 real-world networks demonstrate that, compared with classical community detection algorithms, MLCD shows superior performance in stably finding a proper community structure. They also show that, compared with Meme-Net, MLCD takes much less time to find a more proper and stable community division.
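As an illustration of what a modularity-specific label propagation rule can look like, the sketch below implements one common form, in which a node v adopts the label l that maximizes the sum over all other nodes u with label l of B_uv = A_uv - k_u*k_v/(2m). This form is an assumption based on the LPAm family of methods; the text above does not spell out the exact rule used in MLCD, and the function name `lpam_update` and the adjacency-dict input are hypothetical.

```python
from collections import defaultdict

def lpam_update(node, adjacency, labels):
    """Return the label that maximizes the modularity-based score for `node`
    while every other node keeps its current label.

    adjacency: dict node -> list of neighbours (undirected, no self-loops).
    labels: dict node -> current integer community label.
    """
    two_m = sum(len(nbrs) for nbrs in adjacency.values())  # 2m = sum of degrees
    k_v = len(adjacency[node])
    label_degree = defaultdict(int)  # total degree per label, excluding `node`
    for u, nbrs in adjacency.items():
        if u != node:
            label_degree[labels[u]] += len(nbrs)
    links = defaultdict(int)         # number of neighbours of `node` per label
    for u in adjacency[node]:
        links[labels[u]] += 1
    candidates = set(links) | {labels[node]}

    def score(label):
        return links[label] - k_v * label_degree[label] / two_m

    # Ties are broken in favour of the current label, then the larger label value.
    return max(sorted(candidates), key=lambda l: (score(l), l == labels[node]))

# Toy graph (hypothetical): two triangles joined by the edge 2-3,
# with node 2 initially placed in the wrong community.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
labels[2] = lpam_update(2, adjacency, labels)
print(labels)
```

Each update needs only the neighbours of the node and per-label degree totals, which can be maintained incrementally instead of being recomputed as in this sketch. That locality is what makes rules of this kind cheap to apply repeatedly.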
The remainder of this paper is organized as follows. In the next section, the problem definition is given. Section 3 gives a detailed description of the proposed algorithm. In Section 4, experiments on the GN and LFR benchmarks and on 12 real-world networks are reported to demonstrate the effectiveness of the proposed algorithm. Finally, the conclusion is given.
Problem definition
Consider an unweighted and undirected graph G = (V, E) with |V| = n vertices (or nodes) and |E| = m edges (or links). The connections of G can be represented by an adjacency matrix A, whose element Aij is 1 when a link between nodes i and j exists and 0 otherwise [6]. In order to detect the underlying structure of complex networks, we need a definition of community. However, there is no agreed definition of community in networks. Its definition is closely
The proposed memetic algorithm with multi-level learning strategies for community detection
The proposed algorithm MLCD is easy to implement, and its process can be described as follows. Firstly, initialize a population of solutions through a problem-specific strategy, where Np is predefined as the size of the population. Then, the genetic operators, including crossover and mutation procedures, are performed according to the predefined ratio on the randomly chosen solutions XC. Next, use the proposed multi-level learning strategies to accelerate the above evolutionary
Experimental results
In this section, we test MLCD on the GN [55] and LFR [56] benchmark networks and on 12 real-world networks. MLCD is compared with three of its variants, M-two-phase, M-LPAm and GA, to illustrate the effectiveness of each level of learning. GA and M-two-phase are the variants of MLCD obtained by removing the multi-level learning strategies and the partition-level learning strategy, respectively. M-LPAm is the variant of MLCD obtained by removing both the
Concluding remarks
Nowadays, many real-world complex systems have accumulated a large amount of disordered information. How to discover the potential functionality of these systems from such massive data has attracted great attention in recent years. In this study, a network technique is adopted to model the potential topological structure of complex systems, and a fast multi-level learning based memetic algorithm is proposed to optimize modularity for revealing the potential community structures of complex
Acknowledgements
The authors wish to thank the editors and anonymous reviewers for their valuable comments and helpful suggestions which greatly improved the paper's quality. This work was supported by the National Natural Science Foundation of China (Grant No. 61273317), the National Top Youth Talents Program of China, the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130203110011), and the Fundamental Research Fund for the Central Universities (Grant Nos. K50510020001 and
References (68)
Community detection in graphs. Physics Reports (2010).
et al. Maximizing modularity intensity for community partition and evolution. Information Sciences (2013).
et al. Multi-objective community detection in complex networks. Applied Soft Computing (2012).
et al. A fast parallel modularity optimization algorithm (FPMQA) for community detection in online social network. Knowledge-Based Systems (2013).
et al. Hybrid metaheuristics in combinatorial optimization: a survey. Applied Soft Computing (2011).
et al. Supervisory adaptive dynamic RBF-based neural-fuzzy control system design for unknown nonlinear systems. Applied Soft Computing (2013).
et al. Community detection in networks by using multiobjective evolutionary algorithm with decomposition. Physica A (2012).
et al. A hybrid differential evolution algorithm for job shop scheduling problems with expected total tardiness criterion. Applied Soft Computing (2013).
et al. Inventory based two-objective job shop scheduling model and its hybrid genetic algorithm. Applied Soft Computing (2013).
A twenty-first century science. Nature (2007).
Life in the network: the coming age of computational social science. Science.
Emergence of structural and dynamical properties of ecological mutualistic networks. Nature.
Social computing: from social informatics to social intelligence. IEEE Intelligent Systems.
Spontaneous synchrony in power-grid networks. Nature Physics.
Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America.
Finding and evaluating community structure in networks. Physical Review E.
Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences of the United States of America.
Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America.
Performance of modularity maximization in practical contexts. Physical Review E.
Limits of modularity maximization in community detection. Physical Review E.
Finding community structure in very large networks. Physical Review E.
The worldwide air transportation network: anomalous centrality, community structure, and cities’ global roles. Proceedings of the National Academy of Sciences of the United States of America.
Community detection in complex networks using extremal optimization. Physical Review E.
Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment.
Efficient modularity optimization by multistep greedy algorithm and vertex refinement. Physical Review E.
Detecting network communities by propagating labels under constraints. Physical Review E.
Advanced modularity-specialized label propagation algorithm for detecting communities in networks. Physica A.
Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences of the United States of America.
Iterated tabu search for identifying community structure in complex networks. Physical Review E.
Revealing network communities through modularity maximization by a contraction-dilation method. New Journal of Physics.
Modularity-maximizing graph communities via mathematical programming. European Physical Journal B.
Evolutionary method for finding communities in bipartite networks. Physical Review E.
The map equation. European Physical Journal Special Topics.
Enhanced modularity-based community detection by random walk network preprocessing. Physical Review E.