Elsevier

Applied Soft Computing

Volume 19, June 2014, Pages 121-133

Multi-level learning based memetic algorithm for community detection

https://doi.org/10.1016/j.asoc.2014.02.003

Highlights

  • We propose a fast memetic algorithm to uncover community structure in networks.

  • The proposed algorithm is based on novel multi-level learning strategies at the node, community and network-partition levels.

  • Our algorithm does not need to know the number of clusters in advance.

  • Our algorithm shows superior performance in speed, accuracy and stability.

Abstract

Complex networks have become an important tool for analyzing the massive, disordered information produced by complex systems, and their community structure is indispensable for discovering the potential functionality of these systems. Research on uncovering the community structure of networks has attracted great attention from various fields in recent years. Many community detection approaches based on modularity optimization have been proposed. Among them, algorithms that improve a single initial solution are easily trapped in local optima. Moreover, algorithms that are sensitive to the order in which elements are optimized tend to produce unstable solutions. In addition, algorithms that simultaneously optimize a population of solutions have high computational complexity and are therefore difficult to apply to practical problems. To address these problems, we propose a fast memetic algorithm with multi-level learning strategies for community detection by optimizing modularity. The proposed algorithm adopts a genetic algorithm to optimize a population of solutions and uses the proposed multi-level learning strategies to accelerate the optimization process. The multi-level learning strategies are devised based on the potential knowledge of the node, community and partition structures of networks, and they work on the network at the node, community and network-partition levels, respectively. Extensive experiments on both benchmark and real-world networks demonstrate that, compared with state-of-the-art community detection algorithms, the proposed algorithm is effective at discovering the community structure of networks.

Introduction

With the advancement of Internet and Web 2.0 techniques, many real-world complex systems, including online communities, power grids, collaboration systems, disease control systems, resource distribution systems and information recommendation systems, have become closely tied to our daily activities through the sharing of information [1], [2], [3], [4]. In a world of huge and disordered information, how to use an effective computing model to mine and analyze potentially useful information has become an issue of common concern [4]. In recent years, research on complex networks has attracted increasing attention in the fields of biology, physics, sociology and mathematics [1], [2], [3], [5]. The topological structure of a complex system can easily be modeled as a complex network of linked nodes. More specifically, the entities of a complex system can be represented as the nodes (or vertices) of a network, and the relations between entities can be modeled as its links (or edges). Community structure is a common and important property of both complex networks and complex systems [6]. In complex networks, a community is a set of nodes that have more connections with each other than with the remaining nodes in the network [7], [8], [9], [10]. The community structure is indispensable for reflecting the potential structural behavior of networks [6]. In complex systems, a community is composed of a set of entities that have similar properties. Generally, the potential functionality of a complex system is related to its community structure [6].

Various methods have been proposed to detect communities in complex networks. Among them, one of the most popular techniques is based on optimizing an objective called modularity, which is the most widely used criterion for evaluating the quality of the community structure of networks [6], [11]. Modularity optimization methods detect communities by searching for the network partition with maximal modularity [6], [11]. Recent studies [12], [13] demonstrate that the number of local maxima of modularity grows exponentially with the size of the network.
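To make the objective concrete, the following is a minimal sketch of how modularity can be evaluated for a given partition. It assumes the standard Newman-Girvan form, Q = sum over communities c of [l_c/m - (d_c/2m)^2], where l_c is the number of edges inside c, d_c is the total degree of c and m is the number of edges; the formula used in the paper itself is not visible in this excerpt. The function name, the edge-list representation and the toy graph are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition (illustrative helper).

    edges  : list of (u, v) pairs; each undirected edge once, no self-loops
    labels : dict mapping node -> community identifier
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (assumed for illustration): two triangles joined by one bridge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {v: "a" for v in range(6)}

q_split = modularity(edges, two_triangles)   # 5/14, about 0.357
q_single = modularity(edges, one_block)      # 0.0
```

On this toy graph the two-triangle partition scores higher than putting every node in one community, which is the behavior a modularity maximizer relies on.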

Many heuristic algorithms based on optimizing modularity for community detection have been proposed in recent years [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. Some of them improve a single solution step by step and are thus easily trapped in locally optimal solutions [27]. Moreover, some of them are sensitive to the order of optimization, so the detected network partitions are not stable [6], [27]. For instance, the fast greedy method (FM) [14], which iteratively joins the pair of communities with the largest gain in modularity, tends to produce quite large communities and neglect small ones [6], and thus it is easily trapped in a locally optimal network partition. The modularity-specific label propagation algorithm (LPAm), proposed by Barber and Clark [19], is sensitive to the order in which nodes are optimized, so the revealed network partitions differ between independent trials. Moreover, it is also easily trapped in a locally optimal network partition in which the detected communities are similar in total degree [20]. The multistep techniques proposed by Blondel et al. [17] and Schuetz and Caflisch [18], which iteratively merge sets of nodes and communities, are also easily trapped in a locally optimal solution, because merged nodes and communities can hardly be separated again [28]. Several techniques have been proposed to enhance both the accuracy and the stability of modularity-based optimization methods. Lai et al. [29] and Yan and Gregory [30] use network preprocessing techniques to improve the accuracy. Both are based on the assumption that vertices in the same community behave more similarly than those in different communities. Lancichinetti and Fortunato [31] adopt the consensus clustering technique to enhance both the accuracy and the stability of the resulting network partitions. Pizzuti, Shi, and we have adopted multiobjective optimization algorithms to discover a proper and stable network partition with a high modularity value [25], [32], [33]. In this study, we use an intelligent hybrid technique, the memetic algorithm (MA), to improve the performance of modularity-based community detection algorithms by simultaneously evolving a population of solutions toward better ones.
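The "gain in modularity" used by greedy merging methods such as FM has a simple closed form under the standard modularity definition. The sketch below is a short derivation under that assumption; the symbols l_c, d_c and e_ij are notation introduced here for illustration and are not taken from the paper.

```latex
% l_c : number of edges inside community c
% d_c : total degree of the nodes in community c
% e_ij: number of edges between communities i and j
% m   : number of edges in the network
Q = \sum_{c} \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^{2} \right]

% Merging i and j gives l = l_i + l_j + e_{ij} and d = d_i + d_j, so
\Delta Q_{ij}
  = \frac{e_{ij}}{m} - \frac{(d_i + d_j)^{2} - d_i^{2} - d_j^{2}}{4m^{2}}
  = \frac{e_{ij}}{m} - \frac{d_i d_j}{2m^{2}}
```

The second term grows with the product of the two total degrees, while unconnected communities (e_ij = 0) never yield a positive gain. Because a merge is never undone, an early choice constrains every later step.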

With the rapid development of computer science, mathematics and biology, research on intelligent algorithms for solving practical engineering problems has attracted increasing attention in recent years. Many intelligent algorithms, including genetic algorithms (GAs) [34], memetic algorithms [35], [36], [37], artificial neural networks [38], simulated annealing [15], swarm intelligence algorithms [39], fuzzy systems [40], [41], [42], [43], [44], and artificial immune systems [39], have been designed as problem-specific techniques for tackling real-world applications [32], [45], [46], [47], [48], [49], [50], [51]. Compared with traditional algorithms, problem-specific intelligent algorithms can effectively find a high-quality solution in a reasonable amount of time [39].

Memetic algorithms (MAs) are hybrid global-local heuristic search methodologies [35]. The global heuristic search is usually a population-based method, while the local search is generally regarded as an individual learning procedure that accelerates convergence to an optimum [35]. In general, population-based global search has the advantage of exploring the promising regions of the search space and providing a reliable estimate of the global optimum [36]. However, population-based global search has difficulty finding an optimal solution within the explored region in a short time. The local search is usually designed to accelerate the search and find the best solutions in the explored region. Therefore, this hybridization, which combines the complementary advantages of population-based global search and individual-based local search, can effectively produce better solutions [37]. Recent studies on MAs have demonstrated that they are effective and efficient for tackling optimization problems in many real-world applications [35], [36], [37], [48], [49], [50], [51]. MAs have also been used for uncovering communities in networks in recent years [45], [52], [53], [54]. For instance, in [45], we proposed a memetic algorithm, named Meme-Net, to uncover communities at different hierarchical levels. Meme-Net is effective, but its high computational complexity makes it impractical for detecting communities in even moderately large networks. Shang et al. [52] and we [53] adopted simulated annealing as the individual learning procedure to reduce the computational complexity of Meme-Net. However, the computational complexity of these algorithms is still very high relative to classical modularity-based community detection algorithms. The algorithm in [54] adopts the technique in [17] as the local search. However, it spends a large amount of time generating the initial population because it directly uses the algorithm in [17] to initialize a population of solutions. Therefore, the algorithms in [45], [52], [53], [54] are difficult to apply to real-world problems.
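The global-local hybridization described above can be illustrated with a generic skeleton. This is a minimal sketch of a memetic loop, not the algorithm of this paper or of Meme-Net: all function names are hypothetical, and the objective is a placeholder bit-counting problem (OneMax), deliberately easy so that the run is deterministic. In a community detection setting the fitness would be modularity and the solutions would be node-label vectors.

```python
import random


def memetic_search(fitness, random_solution, crossover, mutate, local_search,
                   pop_size=10, generations=20, seed=0):
    """Generic memetic loop: GA operators plus per-individual local search."""
    rng = random.Random(seed)
    population = [random_solution(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            child = mutate(crossover(a, b, rng), rng)        # global search
            offspring.append(local_search(child, fitness))   # individual learning
        # Elitist survivor selection: keep the best pop_size individuals.
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)


# --- Placeholder problem (assumed for illustration): maximize the number of 1s.
def one_max(bits):
    return sum(bits)


def random_bits(rng, n=20):
    return [rng.randint(0, 1) for _ in range(n)]


def uniform_crossover(a, b, rng):
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def flip_mutation(bits, rng, rate=0.05):
    return [1 - x if rng.random() < rate else x for x in bits]


def hill_climb(bits, fitness):
    """First-improvement bit-flip local search, repeated until no flip helps."""
    bits = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            before = fitness(bits)
            bits[i] ^= 1
            if fitness(bits) > before:
                improved = True
            else:
                bits[i] ^= 1  # undo a flip that did not help
    return bits


best = memetic_search(one_max, random_bits, uniform_crossover,
                      flip_mutation, hill_climb)
```

The cost of such a loop is dominated by the local search, which runs once per offspring per generation. This is why the choice of individual learning procedure largely determines whether a memetic algorithm scales.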

Motivated by the above observations, in this study we present a fast memetic algorithm with multi-level learning strategies to detect communities by optimizing modularity. We term the proposed algorithm MLCD for short. MLCD adopts a genetic algorithm as the global search and uses the proposed multi-level learning strategies to accelerate convergence. The proposed multi-level learning strategies work on the network at the node, cluster and network-partition levels, respectively. By iteratively executing the GA and the multi-level learning strategies, a network partition with high modularity can be accurately and stably obtained. We also employ a modularity-specific label propagation rule to update the cluster identifier of each node at each operation. The simplicity of this update rule keeps the proposed algorithm fast. Experiments on GN and LFR benchmarks and 12 real-world networks demonstrate that, compared with classical community detection algorithms, MLCD performs better at stably finding a proper community structure of networks. It is also shown that, compared with Meme-Net, MLCD takes much less time to find a more proper and stable community division of networks.
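A modularity-specific label propagation rule of the kind mentioned above can be sketched as follows. This is a simplified illustration in the spirit of LPAm [19], not the exact update used in MLCD, whose details are not shown in this excerpt. Each node takes the label l that maximizes (links from the node to l) - k_v * d_l / (2m), where d_l is the total degree of label l with the node itself excluded. The function name, the fixed sweep order, the tie-breaking rule and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def label_propagation_modularity(adj, labels=None, max_sweeps=100):
    """Sweep over nodes, moving each to the label with the best modularity score.

    adj    : dict node -> set of neighbours (undirected, no self-loops)
    labels : optional initial dict node -> label (default: one label per node)
    """
    nodes = sorted(adj)  # fixed order keeps the sketch deterministic
    degree = {v: len(adj[v]) for v in nodes}
    two_m = sum(degree.values())
    labels = dict(labels) if labels else {v: v for v in nodes}
    if two_m == 0:
        return labels
    comm_degree = defaultdict(int)  # total degree carried by each label
    for v in nodes:
        comm_degree[labels[v]] += degree[v]

    for _ in range(max_sweeps):
        changed = False
        for v in nodes:
            old = labels[v]
            comm_degree[old] -= degree[v]  # score labels as if v were removed
            links = defaultdict(int)       # edges from v to each neighbour label
            for u in adj[v]:
                links[labels[u]] += 1

            def score(label):
                return links.get(label, 0) - degree[v] * comm_degree[label] / two_m

            best, best_score = old, score(old)
            for label in sorted(links):
                s = score(label)
                if s > best_score + 1e-12:  # move only on strict improvement
                    best, best_score = label, s
            labels[v] = best
            comm_degree[best] += degree[v]
            changed = changed or best != old
        if not changed:
            break
    return labels


# Toy graph (assumed for illustration): two triangles joined by one bridge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
result = label_propagation_modularity(adj)
```

Each update needs only the neighbours of a node and one running degree total per label, so a full sweep costs time proportional to the number of edges. The fixed sweep order used here hides the order sensitivity that the text attributes to LPAm; shuffling the nodes before each sweep would expose it.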

The remainder of this paper is organized as follows. In the next section, the problem definition is given. Section 3 gives a detailed description of the proposed algorithm. In Section 4, experiments on GN and LFR benchmarks and 12 real-world networks are presented to demonstrate the effectiveness of the proposed algorithm. Finally, the conclusion is given.

Section snippets

Problem definition

Let us consider an unweighted and undirected graph G = (V, E) with |V| = n vertices (or nodes) and |E| = m edges (or links). The connections of G can be represented by an adjacency matrix A, whose element A_ij is 1 when a link between nodes v_i and v_j exists and 0 otherwise [6]. In order to detect the underlying structure of complex networks, we need a definition of community. However, there is no agreed definition of community in networks. Its definition is closely

The proposed memetic algorithm with multi-level learning strategies for community detection

The proposed algorithm MLCD is easy to implement, and its process can be described as follows. Firstly, initialize a population of solutions X_B = {x_1, x_2, ..., x_Np} through a problem-specific strategy, where Np is the predefined population size. Then, the genetic operators, including the crossover and mutation procedures, are performed according to the predefined ratio on the randomly chosen solutions X_C. Next, use the proposed multi-level learning strategies to accelerate the above evolutionary

Experimental results

In this section, we test MLCD on the GN [55] and LFR [56] benchmark networks and 12 real-world networks. MLCD is compared with its three variants, M-two-phase, M-LPAm and GA, to illustrate the effectiveness of the learning strategy at each level. The algorithms GA and M-two-phase are variants of MLCD obtained by removing the multi-level learning strategies and the partition-level learning strategy, respectively. The algorithm M-LPAm is the variant of MLCD obtained by removing both the

Concluding remarks

Nowadays, many real-world complex systems have accumulated a large amount of disordered information. How to discover the potential functionality of these systems from the massive data has attracted great attention in recent years. In this study, a network technique is adopted to model the potential topological structure of complex systems, and a fast multi-level learning based memetic algorithm is proposed to optimize modularity for revealing the potential community structures of complex

Acknowledgements

The authors wish to thank the editors and anonymous reviewers for their valuable comments and helpful suggestions which greatly improved the paper's quality. This work was supported by the National Natural Science Foundation of China (Grant No. 61273317), the National Top Youth Talents Program of China, the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130203110011), and the Fundamental Research Fund for the Central Universities (Grant Nos. K50510020001 and

References (68)

  • D. Lazer et al.

    Life in the network: the coming age of computational social science

    Science

    (2009)
  • S. Suweis et al.

    Emergence of structural and dynamical properties of ecological mutualistic networks

    Nature

    (2013)
  • F.Y. Wang et al.

    Social computing: from social informatics to social intelligence

    IEEE Intelligent Systems

    (2007)
  • A.E. Motter et al.

    Spontaneous synchrony in power-grid networks

    Nature Physics

    (2013)
  • F. Radicchi et al.

    Defining and identifying communities in networks

    Proceedings of the National Academy of Sciences of the United States of America

    (2004)
  • M.E.J. Newman et al.

    Finding and evaluating community structure in networks

    Physical Review E

    (2004)
  • P.K. Gopalan et al.

    Efficient discovery of overlapping communities in massive networks

    Proceedings of the National Academy of Sciences of the United States of America

    (2013)
  • M.E.J. Newman

    Modularity and community structure in networks

    Proceedings of the National Academy of Sciences of the United States of America

    (2006)
  • B.H. Good et al.

    Performance of modularity maximization in practical contexts

    Physical Review E

    (2010)
  • A. Lancichinetti et al.

    Limits of modularity maximization in community detection

    Physical Review E

    (2011)
  • A. Clauset et al.

    Finding community structure in very large networks

    Physical Review E

    (2004)
  • R. Guimerà et al.

    The worldwide air transportation network: anomalous centrality, community structure, and cities’ global roles

    Proceedings of the National Academy of Sciences of the United States of America

    (2005)
  • J. Duch et al.

    Community detection in complex networks using extremal optimization

    Physical Review E

    (2005)
  • V.D. Blondel et al.

    Fast unfolding of communities in large networks

    Journal of Statistical Mechanics: Theory and Experiment

    (2008)
  • P. Schuetz et al.

    Efficient modularity optimization by multistep greedy algorithm and vertex refinement

    Physical Review E

    (2008)
  • M.J. Barber et al.

    Detecting network communities by propagating labels under constraints

    Physical Review E

    (2009)
  • X. Liu et al.

    Advanced modularity-specialized label propagation algorithm for detecting communities in networks

    Physica A

    (2009)
  • M.E.J. Newman et al.

    Mixture models and exploratory analysis in networks

    Proceedings of the National Academy of Sciences of the United States of America

    (2007)
  • Z. Lü et al.

    Iterated tabu search for identifying community structure in complex networks

    Physical Review E

    (2009)
  • J. Mei et al.

    Revealing network communities through modularity maximization by a contraction-dilation method

    New Journal of Physics

    (2009)
  • G. Agarwal et al.

    Modularity-maximizing graph communities via mathematical programming

    European Physical Journal B

    (2008)
  • W. Zhan et al.

    Evolutionary method for finding communities in bipartite networks

    Physical Review E

    (2011)
  • M. Rosvall et al.

    The map equation

    European Physical Journal Special Topics

    (2009)
  • D. Lai et al.

    Enhanced modularity-based community detection by random walk network preprocessing

    Physical Review E

    (2010)