Improving fuzzy C-mean-based community detection in social networks using dynamic parallelism

https://doi.org/10.1016/j.compeleceng.2018.01.003

Abstract

In Social Network Analysis (SNA), a common algorithm for community detection iteratively applies three phases: spectral mapping, clustering (using either the Fuzzy C-Means or the K-Means algorithm) and modularity computation. Despite its effectiveness, this method is not very efficient. A feasible solution to this problem is to use Graphics Processing Units (GPUs). Moreover, due to the iterative nature of this algorithm, the emerging dynamic parallelism technology is a very appealing option. In this work, we present different novel GPU implementations of both versions of the algorithm: Hybrid CPU-GPU, Dynamic Parallel and Hybrid Nested Parallel. These implementations differ in how much they rely on the CPU and in whether they utilize dynamic parallelism. We perform an extensive set of experiments to compare these implementations under different settings. The results show that the Hybrid Nested Parallel implementation provides about two orders of magnitude of speedup.

Introduction

Due to their importance and popularity, interest in Social Networks (SNs) has increased significantly in recent years. An SN is defined as a set of vertices or nodes (representing individuals, organizations, countries, etc.) connected by links or edges representing relationships between them, such as friendship, ownership, alliances, etc. With their billions of users, Online SNs (OSNs), such as Facebook and Twitter, are of great importance to many parties in academia as well as industry, which has given a great boost to the field of SN Analysis (SNA).

Community detection is one of the fundamental tasks in SNA. It is concerned with finding sub-groups such that the nodes in each group share certain common characteristics. This is very useful in many practical applications, such as recommendation and advertisement [1]. Many algorithms have been proposed for community detection. One of the popular ones was originally proposed by Newman and Girvan in 2004 (henceforth referred to as NG) and was later extended by Zhang et al. in 2007 (henceforth referred to as ZWZ). Despite their effectiveness, such algorithms do not scale well.

The SNs being studied these days have millions or even billions of nodes and edges [2]. This means that, without efficient implementations, SNA algorithms (such as community detection algorithms) will be impractical [3]. Graphics Processing Units (GPUs) represent a good option to address this problem. It is now affordable to obtain a laptop equipped with a very powerful GPU consisting of hundreds or thousands of cores.

In this work, we aim to speed up the process of extracting communities in SNs by using the parallelization capabilities of GPUs. More specifically, we focus on the ZWZ algorithm of [1], which comprises three stages: spectral mapping, clustering and modularity computation. We exploit the GPU capabilities to improve the performance of the algorithm by speeding up both the modularity and the clustering operations. We employ different parallelization techniques (some of which have already been presented in our earlier work, while the others are novel) and conduct various experiments.
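To make the modularity stage more concrete, the following is a minimal sequential sketch of the standard Newman-Girvan modularity for a hard (non-overlapping) partition of an undirected, unweighted graph. It is an illustration only: the function name `modularity`, the edge-list input and the `community_of` mapping are assumptions made for this example, and the fuzzy variant of modularity used by ZWZ, as well as the GPU versions discussed in this paper, are not reproduced here.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a hard partition (illustrative sketch).

    edges: list of (u, v) pairs, each undirected edge listed exactly once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # number of edges with both ends in community c
    degree = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (fraction of intra-community edges)
    #     minus (fraction of edge ends attached to the community) squared.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

A partition that places all nodes in a single community yields Q = 0, while partitions whose communities are densely connected internally and sparsely connected to each other yield larger values. This is why the quantity can serve as a criterion for comparing candidate partitions.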

The remainder of this paper is organized as follows. The following section summarizes the related works on community detection algorithms and GPU-based implementations. Section 3 presents our sequential implementations of the different versions of the algorithm under consideration. Section 4 discusses our parallel implementations. Section 5 presents the experimental setup and the results. Finally, the paper is concluded in Section 6.

Section snippets

Communities in SNs

A community is a group of social entities that is formed by the interactions between the individuals, where the interactions inside the community are more frequent than the interactions across different communities. The grouping of the individuals depends on some characteristics that are shared between nodes. Overlap in communities means that there are some nodes that belong to more than one community. Community detection is the process of detecting and characterizing groups of such network

The sequential implementation

In this section, we present the sequential implementation of the ZWZ algorithm [1]. We start with the spectral mapping algorithm, which implements the spectral mapping method used for extracting the features of the graph [19]. We then present the FCM part and the KM part.
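As a rough illustration of the FCM part mentioned above, the sketch below shows one iteration of the textbook Fuzzy C-Means update: memberships are recomputed from distances to the current centers, and centers are then recomputed as membership-weighted means. It is not the implementation evaluated in this paper. The function names, the Euclidean distance, and the fuzzifier `m = 2.0` are assumptions chosen for the example.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree in [0, 1] to which points[i] belongs to centers[j]."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with at least one center:
            # split its membership evenly among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # Textbook update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships.append(
            [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of all points weighted by u_ij ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)  # assumed non-zero in this sketch
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers
```

Each row of the membership matrix sums to 1, so a point can belong partially to several clusters at once. Alternating the two functions until the memberships stop changing gives the usual sequential FCM loop.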

FCM-based algorithm: FCM is a data clustering technique, where each data point has a membership degree in the range [0,1] indicating how much it belongs to a certain cluster. With its reliance on fuzzy sets, this

The parallel implementations

This section presents the parallel implementation of the ZWZ community detection algorithm as proposed in [1]. ZWZ utilizes FCM clustering, which can be replaced by KM as indicated by the authors of [1]. Therefore, we also present a parallel implementation of the KM-based version of the algorithm under consideration. The proposed parallel implementations exploit the capabilities of CPUs and GPUs using different approaches, including Hybrid CPU-GPU (HCG), Dynamic Parallel (DP) and Hybrid Nested Parallel

Experiments and evaluation

We verify the accuracy of our implementations by carefully inspecting the outputs generated on the three datasets used in the paper of Zhang et al. [1] (i.e., the simple “toy” graph, Zachary's karate club dataset and the American College football dataset) and manually comparing them with the outputs shown in [1].

Like [21], we use two machines to evaluate our implementations in order to increase confidence in our observations and in how general and realistic they are. The hardware

Conclusion

One of the interesting methods for community detection in social networks is the Newman and Girvan algorithm, which was later extended into the Zhang et al. algorithm. Despite being very effective, these methods are too inefficient to deal with large-scale networks. This work presented our effort to improve the performance of both the Fuzzy C-Means-based and the K-Means-based algorithms through the use of Graphics Processing Units. We presented different parallel implementations of the algorithms under

Mahmoud Al-Ayyoub received his Ph.D. in computer science from Stony Brook University in 2010. He is currently an associate professor of computer science at Jordan University of Science and Technology (JUST). His research interests include cloud computing, high performance computing, machine learning and AI. He is the co-director of the High Performance and Cloud Computing research lab at JUST.

References (30)

  • S Zhang et al.

    Identification of overlapping community structure in complex networks using fuzzy c-means clustering

    Phys A

    (2007)
  • H Lu et al.

    Parallel heuristics for scalable community detection

    Parallel Comput

    (2015)
  • MM Sathik et al.

    Comparative analysis of community discovery methods in social networks

    Int J Comput Appl

    (2011)
  • CA Navarro et al.

    A survey on parallel computing and its applications in data-parallel problems using GPU architectures

    Commun Comput Phys

    (2014)
  • J Tang et al.

    Social recommendation: a review

    Social Network Anal Mining

    (2013)
  • F Bonchi et al.

    Social network analysis and mining for business applications

    ACM Trans Intell Syst Technol (TIST)

    (2011)
  • CAR Pinheiro

    Community detection to identify fraud events in telecommunications networks

  • A Kalyanaraman et al.

    Fast uncovering of graph communities on a chip: toward scalable community detection on multicore and manycore platforms

    Found Trends Electron Des Autom

    (2016)
  • ME Newman et al.

    Finding and evaluating community structure in networks

    Phys Rev E

    (2004)
  • Forster R. Louvain community detection with parallel heuristics on GPUs. In Intelligent engineering systems (INES),...
  • S Mittal et al.

    A survey of CPU-GPU heterogeneous computing techniques

    ACM Comput Surv

    (2015)
  • M Girvan et al.

    Community structure in social and biological networks

    Proc Natl Acad Sci

    (2002)
  • LC Freeman

    A set of measures of centrality based on betweenness

    Sociometry.

    (1977)
  • S Gregory

    An algorithm to find overlapping community structure in networks

    Knowledge discovery in databases: PKDD 2007

    (2007)
  • S White et al.

    A spectral clustering approach to finding communities in graph

    SDM

    (2005)
  • Cited by (19)

    • Parallel and distributed paradigms for community detection in social networks: A methodological review

      2022, Expert Systems with Applications
      Citation Excerpt :

      Due to the individual evolution of chromosomes, it was easy to parallelize the evolutionary algorithm, and the speed of the execution increased significantly as compared to the existing parallel QIEA method. In the paper P_1, Al-Ayyoub et al. (Al-Ayyoub et al., 2019) proposed a novel algorithm to discover the communities present inside the networks. The algorithm consists of three main phases; spectral mapping, clustering, and modularity computation.

    • LapEFCM: overlapping community detection using laplacian eigenmaps and fuzzy C-means clustering

      2022, International Journal of Information Technology (Singapore)

    Mohammed Al-andoli is a Research Associate in the Computer Science Department at Jordan University of Science and Technology. He received his M.Sc. degree in Computer Science from Jordan University of Science and Technology. His main research interests include Social Networks Computing, Data Mining, Machine Learning, and GPUs.

    Yaser Jararweh received his Ph.D. in Computer Engineering from the University of Arizona in 2010. He is currently an associate professor of Computer Science at Jordan University of Science and Technology, Jordan. His research interests include Cloud Computing, GPUs and Big Data. He is an associate editor of the Cluster Computing Journal (Springer), Information Processing & Management (Elsevier) and others.

    Mohammad AL-Smadi received his Ph.D. in Computer Science from Graz University of Technology in 2012. He is currently an assistant professor in the Computer Science department at Jordan University of Science and Technology. He has co-authored several technical papers in established journals and conferences in fields related to Social and Semantic Computing, Knowledge Engineering, Natural Language Processing, and Technology Enhanced Learning.

    B. B. Gupta received his Ph.D. degree from Indian Institute of Technology Roorkee, India, in the area of Information and Cyber Security. In 2009, he was selected for the Canadian Commonwealth Scholarship and was awarded the Government of Canada Award. Dr. Gupta has had an excellent academic record throughout his career. He has published more than 70 research papers (including 1 book and 8 chapters).

    Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. A. Sangaiah.
