An efficient agent-based algorithm for overlapping community detection using nodes’ closeness

https://doi.org/10.1016/j.physa.2013.06.056

Highlights

  • Considering nodes’ closeness improves the methods that are based on label propagation.

  • L2I quantifies how internal a link is with respect to a specific community.

  • The proposed algorithm improves NMI on synthetic networks.

  • The proposed algorithm improves the F-score for identifying overlapping nodes.
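The last highlight refers to the F-score of overlapping-node identification. A minimal sketch, assuming the usual set-based definition (precision and recall of the detected overlapping nodes against the ground-truth overlapping nodes, combined by their harmonic mean); the function name `overlap_f_score` is illustrative, not taken from the paper:

```python
def overlap_f_score(detected, truth):
    """F-score of a detected set of overlapping nodes against the ground truth."""
    detected, truth = set(detected), set(truth)
    hits = len(detected & truth)        # correctly identified overlapping nodes
    if hits == 0:
        return 0.0
    precision = hits / len(detected)    # fraction of detections that are correct
    recall = hits / len(truth)          # fraction of true overlaps that were found
    return 2 * precision * recall / (precision + recall)

# 3 true overlapping nodes; 2 of them found, plus 1 false positive.
print(overlap_f_score({1, 2, 9}, {1, 2, 3}))
```

Here precision and recall are both 2/3, so the F-score is also 2/3.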

Abstract

Communities are groups of nodes forming tightly connected units in networks. Some nodes can be shared between different communities of a network. The presence of overlapping nodes and their associated membership diversity is a common characteristic of social networks. Analyzing these overlapping structures can reveal valuable information about the intrinsic features of realistic complex networks, especially social networks.

In this paper, we propose a novel algorithm that is able to detect overlapping and non-overlapping community structures in complex networks. This algorithm uses a number of agents to investigate the input network. These agents take the closeness of different nodes into account in their investigations. Experiments on both synthetic and real-world networks show that the proposed algorithm outperforms most state-of-the-art algorithms in this field in terms of both accuracy and execution time.

Introduction

Community or modular structure is one of the most significant properties of graphs representing real systems and complex networks [1]. Communities are groups of nodes such that members of the same group have many more edges among themselves than with members of other groups [2]. Modular structure in real-world complex networks originates from their intrinsic features, in particular the fact that they are not random graphs [3]. In such networks, link density is inhomogeneous both locally and globally, i.e. more links reside within some particular parts of the network [1]. Community structure identification has many applications, especially in fields like sociology, biology and computer science. Complex networks commonly represented by graphs include social groups such as families, colleagues, friendship networks and online communities on the Internet. Such graphs also arise in biology, for example protein–protein interaction (PPI) networks [4] and metabolic networks [5], as well as in the World Wide Web, i.e. web pages and the hyperlinks among them [6], collaboration networks [2] and bipartite or multipartite networks like co-authorship networks.
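The informal definition of a community (many edges within a group, few edges to the rest of the network) can be checked directly by counting. A small illustrative sketch; `edge_split` is a hypothetical helper, not part of the paper:

```python
def edge_split(edges, group):
    """Count edges inside `group` and edges leaving it."""
    group = set(group)
    internal = external = 0
    for u, v in edges:
        inside = (u in group) + (v in group)  # 0, 1 or 2 endpoints in the group
        if inside == 2:
            internal += 1
        elif inside == 1:
            external += 1
    return internal, external

# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(edge_split(edges, {0, 1, 2}))  # (3, 1)
```

The triangle {0, 1, 2} has three internal edges and only one edge leaving it, which is what the definition above describes.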

To further clarify the importance of community identification, we mention some typical applications. In protein–protein interaction networks, which are the subject of intense investigation in biology and bioinformatics, communities correspond to functional groups: proteins with similar functionality tend to be observed together in similar chemical reactions, so such proteins often fall in the same cluster. Community detection in PPI networks can therefore be used to predict the functionality of newly discovered proteins, i.e. proteins for which no prior knowledge of their attributes and functionalities is available. For example, Ref. [7] used this technique for yeast.

Community detection is also a popular topic in computer science. In parallel computing, it is vital to know the best way to allocate tasks to processors in order to minimize communication cost. This can be accomplished by partitioning computer clusters into groups with approximately the same number of processors. The partitioning must be carried out so that the number of actual connections required between processors of different clusters is minimized [8].
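The objective described here is the classic balanced graph partitioning problem: keep group sizes roughly equal while minimizing the number of edges that cross groups. The sketch below only evaluates a given assignment (it does not search for a good one); `partition_cost` and the node-to-group dictionary are illustrative assumptions:

```python
from collections import Counter

def partition_cost(edges, part):
    """Return (number of cut edges, group sizes) for a node -> group assignment."""
    cut = sum(1 for u, v in edges if part[u] != part[v])  # edges crossing groups
    sizes = Counter(part.values())                        # processors per group
    return cut, dict(sizes)

# A 4-node cycle 0-1-2-3-0 split into two groups of two processors.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(partition_cost(edges, {0: "A", 1: "A", 2: "B", 3: "B"}))  # (2, {'A': 2, 'B': 2})
print(partition_cost(edges, {0: "A", 2: "A", 1: "B", 3: "B"}))  # (4, {'A': 2, 'B': 2})
```

Both assignments are equally balanced, but the first needs only two inter-group connections, so it would be preferred.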

In this paper, we analyze social networks as our case study. In these networks, as in many other complex networks, analyzing community structure can yield valuable information about the prominent features of the network. In social networks, nodes (individuals) usually belong to multiple groups or communities. As a result, an overlapping community structure must be considered, in which each node of the social network can belong to one or more communities. The same issue arises in other kinds of complex networks such as biological networks, where a node might have multiple functions. In Ref. [9], the authors show that overlap is in fact a prominent property of many real social networks. Consequently, in these networks the intersection of clusters can be non-empty, a condition that is assumed and exploited in some previously proposed algorithms.

Nodes that belong to more than one community are called overlapping nodes. Such nodes naturally play the important role of intermediaries between different parts of the graph [1].
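If a set of communities is represented as a list of node sets (an assumed representation), the overlapping nodes are simply those counted in more than one set. An illustrative sketch:

```python
from collections import Counter

def overlapping_nodes(cover):
    """Nodes that appear in more than one community of a cover."""
    counts = Counter(node for community in cover for node in set(community))
    return {node for node, c in counts.items() if c > 1}

cover = [{0, 1, 2}, {2, 3, 4}, {4, 5}]
print(sorted(overlapping_nodes(cover)))  # [2, 4]
```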

Most previously proposed algorithms for finding community structure aim to find disjoint communities. However, in social networks, as shown in Ref. [9], the detection of overlapping communities must be considered. The goal of this paper is to present an efficient algorithm that calculates accurate covers for various kinds of social networks. A cover of a graph is a set of clusters such that each node is assigned to at least one cluster [10]. Allowing overlapping nodes greatly increases the number of possible covers compared with standard partitions. Thus, finding overlapping communities often requires much more computation than identifying traditional partitions, and an efficient algorithm for this harder problem is needed.
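The growth in the number of covers relative to partitions is easy to quantify for tiny node sets. The sketch below compares the Bell numbers (number of partitions) with the number of covers, counted by inclusion-exclusion. It counts every family of distinct nonempty subsets whose union is the whole node set, including families with nested clusters, so it overstates the number of covers a community detection algorithm would treat as meaningful; it is only an illustration of scale:

```python
from math import comb

def bell(n):
    """Number of partitions of an n-element set (Bell triangle)."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[0]

def covers(n):
    """Number of covers of an n-element set by distinct nonempty subsets.

    Inclusion-exclusion over the k elements forced to stay uncovered: the
    remaining n - k elements have 2**(n - k) - 1 nonempty subsets, and any
    family of those subsets is allowed.
    """
    return sum((-1) ** k * comb(n, k) * 2 ** (2 ** (n - k) - 1) for k in range(n + 1))

for n in range(1, 5):
    print(n, bell(n), covers(n))
# 1 1 1
# 2 2 5
# 3 5 109
# 4 15 32297
```

Even with four nodes there are 32297 covers against 15 partitions.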

This problem is not yet perfectly solved, despite the considerable efforts of different researchers in the past few years  [1].

Related work

In this section, the existing algorithms for overlapping community detection are briefly introduced. These algorithms can be categorized into five classes [11], which are discussed in turn:

  • Clique percolation

  • Line graph and link partitioning

  • Local expansion and optimization

  • Fuzzy detection

  • Agent-based and dynamical algorithms.
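To make the first class concrete, the sketch below implements clique percolation (the method of Palla et al., 2005, in the reference list) for k = 3: every triangle is a 3-clique, two triangles are adjacent when they share k - 1 = 2 nodes, and each connected group of adjacent triangles forms one community. Because a node can lie in triangles of different groups, communities may overlap. It is a brute-force illustration for small graphs, not CFinder, and `clique_percolation_k3` is a hypothetical name:

```python
from itertools import combinations

def clique_percolation_k3(adj):
    """Overlapping communities from adjacent triangles (k = 3 clique percolation).

    adj maps each node to the set of its neighbors.
    """
    triangles = [
        t for t in combinations(sorted(adj), 3)
        if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]
    ]
    parent = list(range(len(triangles)))  # union-find over triangles

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(set(triangles[i]) & set(triangles[j])) == 2:  # share k - 1 nodes
            parent[find(i)] = find(j)

    communities = {}
    for i, t in enumerate(triangles):
        communities.setdefault(find(i), set()).update(t)
    return list(communities.values())

# "Bow tie": triangles {0, 1, 2} and {2, 3, 4} share only node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(sorted(sorted(c) for c in clique_percolation_k3(adj)))  # [[0, 1, 2], [2, 3, 4]]
```

Node 2 ends up in both communities, i.e. it is an overlapping node.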

Our proposed algorithm, described in Section 3, is based on label propagation [12] and belongs to the class of agent-based and dynamical algorithms.
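For reference, the sketch below follows the asynchronous label propagation scheme of Raghavan et al. [12]: every node starts with a unique label, nodes are visited in random order, and each adopts the label most frequent among its neighbors until no node would change. One deviation is an assumption made for brevity: on a tie, the current label is kept if it is among the most frequent ones, instead of always choosing uniformly at random. This is plain LPA, which yields disjoint communities; it is not SATOCI:

```python
import random
from collections import Counter

def label_propagation(adj, max_sweeps=100, seed=0):
    """Asynchronous label propagation (after Raghavan et al.), disjoint communities.

    adj maps each node to an iterable of its neighbors.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}        # every node starts with its own label
    order = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(order)              # visit nodes in a fresh random order
        changed = False
        for v in order:
            if not adj[v]:
                continue                # isolated node keeps its label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue                # tie rule (assumption): keep current label
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:                 # every label is already a local majority
            break
    return labels

# Two disconnected triangles end up with one label each.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))
```

The loop stops exactly when every node's label is among the most frequent labels of its neighbors, which is the stopping criterion of the original scheme.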

As …

Our approach

In this section we propose a new community identification algorithm named swarm of agents’ teamwork for overlapping community identification (SATOCI). This algorithm is able to detect both overlapping and non-overlapping community structures. The following table lists the steps of the SATOCI algorithm.

SATOCI(T, α)
Step 1: Load the input network G
Step 2: Initialization
 (2.1) Compute L2I for all links of G
 (2.2) Assign initial labels to nodes of G
 (2.3) Let H be …
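This excerpt does not give the L2I formula; the highlights only say that it quantifies how internal a link is with respect to a specific community. Purely as a hypothetical stand-in, to show the kind of per-link, per-community score Step 2.1 refers to, the sketch scores a link (u, v) against a candidate community as the fraction of the nodes around the link that lie in that community. The name `link_internality` and its definition are assumptions, not the paper's L2I:

```python
def link_internality(adj, u, v, community):
    """HYPOTHETICAL stand-in for L2I; not the definition used in the paper.

    Scores link (u, v) against `community` as the fraction of the nodes
    adjacent to u or v (other than u and v) that belong to the community.
    """
    around = (set(adj[u]) | set(adj[v])) - {u, v}
    if not around:
        return 0.0  # nothing around the link to judge from
    return len(around & set(community)) / len(around)

# Two triangles joined by the bridge link (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(link_internality(adj, 0, 1, {0, 1, 2}))  # 1.0
print(link_internality(adj, 2, 3, {0, 1, 2}))  # 0.5
```

Under this stand-in, a link inside the triangle is fully internal to {0, 1, 2}, while the bridge link is only half internal to it.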

Experiments and results

To analyze the performance of SATOCI in identifying overlapping community structure, we conduct experiments on both synthetic and real-world networks. To create synthetic networks, we use the widely accepted LFR benchmark [40]. The parameter values used to create the synthetic networks are stated in the following subsections.
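The highlights report NMI on synthetic networks. Comparing overlapping covers requires an extended NMI, such as the variant introduced by Lancichinetti et al. (2009, in the reference list); which variant the paper uses is not stated in this excerpt. To illustrate the idea only, the sketch below computes the standard NMI between two non-overlapping partitions, NMI = 2 I(X;Y) / (H(X) + H(Y)):

```python
import math
from collections import Counter

def nmi(a, b):
    """Standard NMI between two partitions given as per-node label lists."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single community
    mi = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    return 2 * mi / (h_a + h_b)

truth = [0, 0, 0, 1, 1, 1]
print(nmi(truth, [7, 7, 7, 9, 9, 9]))  # same partition, different label names
print(nmi(truth, [0, 1, 0, 1, 0, 1]))  # partially mixed partition
```

NMI is invariant to label names, equals 1 for identical partitions and 0 for statistically independent ones.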

Conclusion and discussion

In this paper, an efficient algorithm for overlapping community detection, named SATOCI, was introduced. Through a number of experiments, we showed that the performance of our algorithm on synthetic networks is significantly better than that of other similar state-of-the-art algorithms. The main reason is that in our algorithm, agents consider the closeness of different nodes in their decisions. In terms of precision, the performance of SATOCI is competitive with the best …

References (52)

  • A. Pothen

    Graph partitioning algorithms with applications to scientific computing

  • S. Kelley et al.

    Handbook of Optimization in Complex Networks

    (2011)
  • A. Lancichinetti et al.

    Detecting the overlapping and hierarchical community structure in complex networks

    New Journal of Physics

    (2009)
  • J. Xie, S. Kelley, B.K. Szymanski

    Overlapping community detection in networks: the state of the art and comparative...
  • U.N. Raghavan et al.

    Near linear time algorithm to detect community structures in large-scale networks

    Physical Review E

    (2007)
  • G. Palla et al.

    Uncovering the overlapping community structure of complex networks in nature and society

    Nature

    (2005)
  • B. Adamcsek et al.

    CFinder: locating cliques and overlapping modules in biological networks

    Bioinformatics

    (2006)
  • J.P. Onnela et al.

    Intensity and coherence of motifs in weighted complex networks

    Physical Review E

    (2005)
  • I. Farkas et al.

    Weighted network modules

    New Journal of Physics

    (2007)
  • C. Lee, F. Reid, A. McDaid, N. Hurley

    Detecting highly overlapping community structure by greedy clique expansion...
  • Y.Y. Ahn et al.

    Link communities reveal multiscale complexity in networks

    Nature

    (2010)
  • T. Evans et al.

    Line graphs of weighted networks for overlapping communities

    The European Physical Journal B. Condensed Matter and Complex Systems

    (2010)
  • J. Baumes, M. Goldberg, M. Krishnamoorthy, M. Magdon-Ismail, N. Preston

    Finding communities by clustering a graph into...
  • A. Lancichinetti et al.

    Finding statistically significant communities in networks

    PLoS One

    (2011)
  • S. Gregory

    Finding overlapping communities in networks by label propagation

    New Journal of Physics

    (2010)
  • J. Xie et al.

    Towards linear time overlapping community detection in social networks


Cited by (20)

    • Overlapping community detection with adaptive density peaks clustering and iterative partition strategy

      2023, Expert Systems with Applications
      Citation Excerpt :

      These agents consider different network features in their investigations and make community-detection decisions independently. Badie et al. proposed an efficient agent-based algorithm known as SATOCI that uses the closeness of nodes (Badie et al., 2013) and operates based on label propagation (Raghavan et al., 2007). Game theory-based methods also belong to this category.

    • Community detection with the Label Propagation Algorithm: A survey

      2019, Physica A: Statistical Mechanics and its Applications
      Citation Excerpt :

      LPA has been combined, as well, with ant algorithms and simulated annealing to overcome the resolution limit by propagating labels probabilistically using modularity as the node preference function [146]. Conversely, LPA has also benefited from evolutionary algorithms — for example, by utilizing a swarm of agents that traverse the network to propagate labels [147]. Even though there exists a synergy when LPA is combined with evolutionary algorithms, the extra cost is the aggregation of a number of parameters, since this kind of algorithms have a non-negligible number of tuning parameters.

    • A cooperative game framework for detecting overlapping communities in social networks

      2018, Physica A: Statistical Mechanics and its Applications
    • Sampling algorithms for stochastic graphs: A learning automata approach

      2017, Knowledge-Based Systems
      Citation Excerpt :

      Network measures and computing them play a significant role in social network analysis [30]. Popular network measures, such as degree, betweenness, and closeness, are used not only in the characterization of a network, but also as part of computing some algorithms, such as the Girvan-Newman community detection algorithm [31] and overlapping community detection using nodes’ closeness [32]. In this section, some of the well-known network measures for deterministic networks are discussed.

    • Stochastic graph as a model for social networks

      2016, Computers in Human Behavior
      Citation Excerpt :

      Network measures and calculating them play a significant role in social network analysis (Borgatti, 2005). Popular network measures such as degree, betweenness, closeness and clustering coefficient not only used for evaluating the node importance in actual complex network studies but also used as a part of some algorithms such as Girvan-Newman community detection algorithm using betweenness (Girvan & Newman, 2002), overlapping community detection using node closeness (Badie et al., 2013). In this section some of well-known network measures for deterministic networks are introduced.
