Physica A: Statistical Mechanics and its Applications
An efficient agent-based algorithm for overlapping community detection using nodes’ closeness
Introduction
Community or modular structure is one of the most significant properties of graphs representing real systems and complex networks [1]. Communities are groups of nodes whose members share many more edges with each other than with members of other groups [2]. The existence of modular structure in real-world complex networks originates from their intrinsic features, especially the fact that they are not random graphs [3]. In such networks, the distribution of link density is inhomogeneous both locally and globally, i.e. more links reside within some particular parts of the network [1]. Community structure identification has many applications, especially in fields like sociology, biology and computer science. Complex networks that are usually represented by graphs include social groups such as families, colleagues, friendship networks and online communities on the Internet. Such graphs are also found in biology, for example in protein–protein interaction (PPI) networks [4] and metabolic networks [5], as well as in the World Wide Web, i.e. web pages and the hyperlinks among them [6], collaboration networks [2] and bipartite or multipartite networks like co-authorship networks.
To further clarify the importance of the community identification task, we mention some typical applications. In protein–protein interaction networks, which are the subject of intense investigation in biology and bioinformatics, communities correspond to functional groups, i.e. proteins with similar functionality tend to take part in similar chemical reactions. Such proteins are therefore often observed close to each other in the same cluster. Detection of community structure in PPI networks can thus be used to predict the functionality of newly discovered proteins, i.e. proteins for which no prior knowledge of attributes and functionalities is available. For example, Ref. [7] used this technique for yeast.
Community detection is also a popular topic in computer science. In parallel computing, it is vital to know the best way to allocate tasks to processors in order to minimize communication cost. This can be accomplished by partitioning computer clusters into groups of approximately the same number of processors. Partitioning must be carried out in a way that the required number of actual connections between processors of different clusters is kept at the minimum possible level [8].
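The allocation objective described above can be made concrete with a short Python sketch. It counts the connections that cross group boundaries (the cut size) and the number of processors per group for a given assignment. The function names and the four-node ring used as input are assumptions made for this illustration; they are not taken from Ref. [8].

```python
from collections import Counter


def cut_size(edges, assignment):
    """Count the edges whose two endpoints are assigned to different groups."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def group_sizes(assignment):
    """Return the number of nodes placed in each group."""
    return Counter(assignment.values())


# Hypothetical example: four processors connected in a ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Two balanced assignments (two processors per group).
contiguous = {0: "A", 1: "A", 2: "B", 3: "B"}
alternating = {0: "A", 1: "B", 2: "A", 3: "B"}

print(cut_size(ring, contiguous), dict(group_sizes(contiguous)))
print(cut_size(ring, alternating), dict(group_sizes(alternating)))
```

Both assignments are balanced, but the contiguous one cuts fewer connections, which is the kind of assignment a partitioning method would prefer.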
In this paper, we analyze social networks as our case study. In these networks, as in many other complex networks, analyzing community structures yields valuable information about the prominent features of the network. In social networks, nodes or individuals usually hold multiple memberships in different groups or communities. As a result, the overlapping community structure must be considered, in which each node of the social network can belong to one or more communities. This issue is also observed in other kinds of complex networks such as biological networks, where a node might have multiple functions. In Ref. [9], the authors show that overlap is actually a prominent property of many real social networks. As a consequence, in these networks, the intersection of clusters can be non-empty, a condition that is assumed and exploited in some previously proposed algorithms.
Nodes that belong to more than one community are called overlapping nodes. Such nodes naturally have the important duty of intermediation between different parts of the graph [1].
Most of the previously proposed algorithms for finding the community structure aim to find the disjoint community structure. But in social networks, as shown in Ref. [9], we must consider the detection of overlapping communities. The goal of this paper is to present an efficient algorithm to calculate accurate covers for various kinds of social networks. A cover of a graph is a set of clusters such that each node is assigned to at least one cluster [10]. Moreover, considering overlapping nodes causes a tremendous increase in the number of possible covers with respect to standard partitions. Thus, finding overlapping communities often needs much more computation than is required for identifying traditional partitions, and an efficient algorithm for this harder challenge should be devised.
This problem is not yet perfectly solved, despite the considerable efforts of different researchers in the past few years [1].
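As a small illustration of the definitions above, the following Python sketch checks whether a collection of clusters forms a cover, lists the overlapping nodes, and tests whether the cover is also a standard (disjoint) partition. The function names and the toy node set are assumptions made for this example and do not come from the paper.

```python
def is_cover(nodes, clusters):
    """True if every node is assigned to at least one cluster."""
    covered = set().union(*clusters)
    return set(nodes) <= covered


def overlapping_nodes(clusters):
    """Return the nodes that belong to more than one cluster."""
    seen, overlap = set(), set()
    for cluster in clusters:
        members = set(cluster)
        overlap |= seen & members  # nodes already seen in an earlier cluster
        seen |= members
    return overlap


def is_partition(nodes, clusters):
    """A partition is a cover in which no node is shared between clusters."""
    return is_cover(nodes, clusters) and not overlapping_nodes(clusters)


# Hypothetical six-node example with three clusters.
nodes = [1, 2, 3, 4, 5, 6]
cover = [{1, 2, 3}, {3, 4, 5}, {5, 6}]

print(is_cover(nodes, cover))
print(sorted(overlapping_nodes(cover)))
print(is_partition(nodes, cover))
```

Here nodes 3 and 5 sit in two clusters each, so the clusters form a cover but not a partition.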
Related work
In this section, the existing algorithms for overlapping community detection are briefly introduced. These algorithms can be categorized into five different classes [11] which will be discussed respectively:
- Clique percolation
- Line graph and link partitioning
- Local expansion and optimization
- Fuzzy detection
- Agent-based and dynamical algorithms.
Our proposed algorithm, described in Section 3, is based on label propagation [12] and belongs to the class of agent-based and dynamical algorithms.
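For readers unfamiliar with label propagation, the following Python sketch shows the basic idea in its simplest, non-overlapping form: every node starts with a unique label and repeatedly adopts the label that is most frequent among its neighbours. This is not SATOCI and it does not detect overlaps. It also makes one simplifying assumption for reproducibility: ties are broken deterministically (keep the current label if it is among the most frequent, otherwise take the largest one), whereas label propagation is usually described with random tie-breaking. All function names and the example graph are hypothetical.

```python
from collections import Counter


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def label_propagation(adjacency, max_sweeps=100):
    """Asynchronous label propagation with a deterministic tie-break."""
    labels = {node: node for node in adjacency}  # unique initial labels
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adjacency):
            neighbours = adjacency[node]
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # current label is already a most frequent one
            labels[node] = max(candidates)  # assumed deterministic tie-break
            changed = True
        if not changed:
            break  # no node changed its label during a full sweep
    return labels


def communities_from_labels(labels):
    """Group nodes that share a label into communities."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical example: two 4-cliques joined by the single edge (3, 4).
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
graph = build_adjacency(clique_a + clique_b + [(3, 4)])

print(communities_from_labels(label_propagation(graph)))
```

On this toy graph the labels settle after the first sweep, with each clique sharing one label.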
As
Our approach
In this section we propose a new algorithm for community identification that is named swarm of agents’ teamwork for overlapping community identification (SATOCI). This algorithm is able to detect both overlapping and non-overlapping community structures. The following table explains the different steps of the SATOCI algorithm.
SATOCI
Step 1: Load the Input Network
Step 2: Initialization
(2.1) Compute L2I for all links of
(2.2) Assign initial labels to nodes of
(2.3) Let be
Experiments and results
In order to analyze and understand the performance of SATOCI in the identification of overlapping community structure, different experiments on both synthetic and real-world networks are executed. For creation of synthetic networks, we use the widely accepted LFR benchmark [40]. Also, parameter values used for the creation of synthetic networks are stated in the following subsections.
Conclusion and discussion
In this paper, an efficient algorithm for overlapping community detection, named SATOCI, was introduced. Through a number of different experiments, we showed that the performance of our algorithm in synthetic networks is significantly better than that of other similar state-of-the-art algorithms. The main reason is that in our algorithm, agents consider the closeness of different nodes in their decisions. Of course, in terms of precision, the performance of SATOCI is competitive compared with the best
References (52)
Community detection in graphs. Physics Reports (2010).
Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Physica A. Statistical Mechanics and its Applications (2007).
Identification of overlapping and non-overlapping community structure by fuzzy clustering in complex networks. Information Sciences (2011).
Detect overlapping and hierarchical community structure in networks. Physica A. Statistical Mechanics and its Applications (2009).
Community structure in social and biological networks. Proceedings of the National Academy of Sciences (2002).
Statistical physics of social dynamics.
Publicationes Mathematicae Debrecen (1959).
Protein Interaction Networks: Computational Analysis (2009).
Functional cartography of complex metabolic networks. Nature (2005).
The diameter of the world wide web. Nature (1999).
Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences (2003).
Graph partitioning algorithms with applications to scientific computing.
Handbook of Optimization in Complex Networks.
Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics.
Near linear time algorithm to detect community structures in large-scale networks. Physical Review E.
Uncovering the overlapping community structure of complex networks in nature and society. Nature.
CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics.
Intensity and coherence of motifs in weighted complex networks. Physical Review E.
Weighted network modules. New Journal of Physics.
Link communities reveal multiscale complexity in networks. Nature.
Line graphs of weighted networks for overlapping communities. The European Physical Journal B. Condensed Matter and Complex Systems.
Finding statistically significant communities in networks. PLoS One.
Finding overlapping communities in networks by label propagation. New Journal of Physics.
Towards linear time overlapping community detection in social networks.