Elsevier

Information Sciences

Volume 493, August 2019, Pages 75-90

High-performance community detection in social networks using a deep transitive autoencoder

https://doi.org/10.1016/j.ins.2019.04.018

Abstract

Community structure is an important characteristic of complex networks. It determines where the important functions of a network are located. Discovering community structure in complex networks has recently become an active research topic. However, as networks continue to grow, their structure becomes more complex, and community detection becomes extremely difficult in real applications. In particular, classical clustering methods usually give inaccurate results when applied to high-dimensional data matrices. In this paper, inspired by the relationships between vertices, we design a novel and effective transformation of the network adjacency matrix that describes the similarity of vertices in the network topology. On this basis, we propose a framework for extracting nonlinear features: community detection with deep transitive autoencoder (CDDTA). This framework obtains powerful nonlinear features of a real network, which allows community detection algorithms to perform well in practice. We further incorporate unsupervised transfer learning into CDDTA (Transfer-CDDTA) by minimizing the Kullback–Leibler divergence of embedded instances, to discover powerful low-dimensional representations. Finally, we propose a new training strategy and optimization method for our algorithm. Extensive experimental results indicate that the new framework performs well on both real-world networks and artificial benchmark networks, and that it outperforms most state-of-the-art methods for community detection in social networks.

Introduction

Complex networks are composed of vertices and the edges (representing relationships) between those vertices [6], [14]; examples include social, biological, and web networks. The community detection problem has attracted considerable attention from researchers in the domain of complex networks [30]. Research on community detection methods and network science is important in text analysis, bibliometrics, and data analysis [2]. Results from community detection have been applied successfully in various fields, such as friend recommendation, personalized product promotion, protein function prediction, and public opinion analysis and processing. Community detection divides a network into groups of tightly connected vertices, such that vertices belonging to different groups have only sparse connections. An adjacency matrix can be obtained easily from any network. However, the adjacency matrix captures only the direct connections between vertices, so using it as the similarity representation of the network limits the accuracy of community detection. Clustering methods are important and have a wide range of applications in exploring data relationships. However, when clustering is used to partition a network into communities, the results are usually not accurate enough if the data are high-dimensional.
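As a small illustration of the adjacency matrix mentioned above, the following sketch builds one from an edge list for an undirected, unweighted graph. The function name `adjacency_matrix` and the toy network (two triangles joined by one edge) are hypothetical and are not taken from the paper.

```python
def adjacency_matrix(num_vertices, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected, unweighted graph."""
    a = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # undirected: mirror every edge
    return a


# Hypothetical toy network: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = adjacency_matrix(6, EDGES)

for row in A:
    print(row)
```

Each row records only the direct neighbors of a vertex. Vertices 0 and 3, for example, have a zero entry even though they are two hops apart, which is the limitation the paragraph above describes.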

Community detection aims to detect the community structure of complex networks effectively, and a variety of approaches have been developed. For example, the multilayer ant-based algorithm (MABA) [8] performs community detection by locally optimizing modularity with individual ants. The spectral algorithm (SP) [27] detects community structure by spectral analysis of a Laplacian or modularity matrix. In addition, many algorithms have been put forward to solve the detection problem for complex network topologies, including: (1) Algorithms based on modularity optimization. For example, the unified link and content (ULC) model [23] is a heuristic search algorithm that attempts to solve the modularity optimization problem. It optimizes the global parameters by adjusting local extreme values, to improve computational efficiency. Fast and accurate mining (FAM) [12] is a fast modularity-based mining algorithm that belongs to the class of greedy agglomeration algorithms. The fast unfolding algorithm (FUA) [10] is a local optimization and hierarchical clustering algorithm that takes less computing time than many other network clustering algorithms. (2) Algorithms based on label propagation. For example, the label propagation algorithm (LPA) [24] is the earliest label-based algorithm; its basic aim is to predict the labels of unlabeled vertices from the labels of labeled vertices. (3) Dynamic algorithms. For example, finding and extracting communities (FEC) [32] is a community detection algorithm that considers both the density and the signs of connections in complex networks. It can deal effectively with the relationships between vertices and find more reasonable network community structures. The Infomap algorithm [25] uses random walks as a model of information propagation on the network to generate the corresponding data flow. These methods approach community detection from different angles, according to the network topology. However, real-world networks usually have many nonlinear properties, so the detection ability of these models may be limited on real-world networks.
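Several of the methods above optimize modularity. As a hedged illustration, the sketch below computes the standard Newman modularity Q = (1/2m) Σ_ij (a_ij − k_i k_j / 2m) δ(c_i, c_j) for a given partition. It is a plain evaluation of that formula, not an implementation of ULC, FAM, or FUA, and the function name `modularity` and the toy graph are assumptions made for illustration.

```python
def modularity(adj, labels):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # sum of degrees equals 2m for an undirected graph
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed edge minus the expected number under the null model
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Hypothetical toy network: two triangles joined by the edge (2, 3).
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

q_split = modularity(ADJ, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(ADJ, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives Q = 5/14, whereas placing all vertices in a single community gives Q = 0, so a modularity-based method would prefer the split.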

Deep learning research has made remarkable achievements in the fields of machine learning and artificial intelligence. Theoretical studies, algorithm designs, and application systems for neural networks have been proposed in many fields, such as speech recognition, image classification, and natural language processing [5]. Multilayer neural networks can effectively reduce data dimensionality [33] and achieve good community detection results. A multilayer neural network with a bottleneck layer can encode high-dimensional original data into a new feature representation that is trained to best approximate the original input. This approach can reduce data dimensionality better than classical dimensionality reduction methods. However, existing neural network methods do not extract features well enough to serve as representations for community detection. Therefore, more effective deep neural network approaches are needed to obtain low-dimensional features.
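The bottleneck idea described above can be sketched with a generic single-hidden-layer autoencoder. This is not the deep transitive autoencoder of the paper: the class name `TinyAutoencoder`, the hidden size of 2, the learning rate, the number of epochs, and the toy adjacency rows are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


class TinyAutoencoder:
    """Single-hidden-layer sigmoid autoencoder with squared reconstruction error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, z):
        return [sigmoid(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.w2, self.b2)]

    def loss(self, data):
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total

    def sgd_step(self, x, lr):
        z = self.encode(x)
        y = self.decode(z)
        # Output-layer error signal for squared loss with sigmoid units.
        d2 = [(yi - xi) * yi * (1.0 - yi) for yi, xi in zip(y, x)]
        # Back-propagate to the hidden layer before the decoder weights change.
        d1 = [sum(self.w2[i][j] * d2[i] for i in range(len(d2))) * z[j] * (1.0 - z[j])
              for j in range(len(z))]
        for i, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d2[i] * z[j]
            self.b2[i] -= lr * d2[i]
        for j, row in enumerate(self.w1):
            for k in range(len(row)):
                row[k] -= lr * d1[j] * x[k]
            self.b1[j] -= lr * d1[j]


# Hypothetical input: adjacency rows of two triangles joined by the edge (2, 3).
DATA = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]

model = TinyAutoencoder(n_in=6, n_hidden=2)
loss_before = model.loss(DATA)
for _ in range(300):
    for x in DATA:
        model.sgd_step(x, lr=0.5)
loss_after = model.loss(DATA)
embeddings = [model.encode(x) for x in DATA]  # 2-dimensional bottleneck features
print(loss_before, loss_after)
```

The two-dimensional `embeddings` play the role of the low-dimensional feature representation that a clustering method would then consume. A deep variant would stack several such layers.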

In pattern recognition, transfer learning aims to build a target prediction model that generalizes well. It does so by transferring effective information, which may include a limited amount of valid label information, from a source domain to a related target domain. Obtaining more powerful feature representations for instances of the source and target domains has therefore become an important research problem in transfer learning. In complex networks, a good feature representation can help detect the topological structure. However, the low-dimensional feature representations used by mainstream deep-learning-based community detection methods are not strong enough. Traditional transfer methods inspired by deep learning try to learn a more powerful feature representation, but most of them do not explicitly minimize the differences between domains. In addition, most transfer methods use supervised learning, whereas valid labels are limited in existing community detection tasks. To achieve better performance on the community detection problem, unsupervised transfer learning is needed to assist a deep autoencoder.

In summary, there are three main contributions of our paper:

  • (1) We propose a novel method to transform a network to an adjacency matrix, to describe the similarity of vertices in a network. A new method, which applies an effective algorithm based on a deep transitive autoencoder, is proposed to transmit similarity information and detect communities in complex networks. The method is named community detection with deep transitive autoencoder (CDDTA).

  • (2) We incorporate unsupervised transfer learning into the proposed algorithm by measuring Kullback–Leibler (KL) divergence of embedded instances, to ensure that the differences between different domains can be approximately equal when learning low-dimensional representations.

  • (3) We propose a training strategy for our novel framework, in which the target domain shares common parameters with the source domain, for encoding and decoding in the process of deep transitive autoencoder training. We also optimize the training algorithm by a back-propagation method with stochastic gradient descent.
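The KL divergence term in contribution (2) can be illustrated with a short sketch. The text above states only that the divergence is measured on embedded instances. Forming a distribution from each domain by normalizing its mean embedding, and using the symmetric sum of the two directed divergences, are assumptions made here for illustration, as are all function names and the toy values.

```python
import math


def mean_embedding(embeddings):
    """Average the embedded instances of one domain, dimension by dimension."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]


def normalize(vec, eps=1e-12):
    """Turn a non-negative vector into a probability distribution (eps avoids log(0))."""
    total = sum(vec) + eps * len(vec)
    return [(v + eps) / total for v in vec]


def kl(p, q):
    """Directed KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(source, target):
    """Assumed domain-discrepancy term: KL(p || q) + KL(q || p) on normalized mean embeddings."""
    p = normalize(mean_embedding(source))
    q = normalize(mean_embedding(target))
    return kl(p, q) + kl(q, p)


# Hypothetical sigmoid-style embeddings (all entries in (0, 1)) for two domains.
SOURCE = [[0.9, 0.1], [0.8, 0.2]]
TARGET = [[0.2, 0.8], [0.1, 0.9]]

print(symmetric_kl(SOURCE, SOURCE))  # identical domains
print(symmetric_kl(SOURCE, TARGET))  # mismatched domains
```

In a training loop of the kind described in contribution (3), a term like this would be added to the reconstruction loss so that stochastic gradient descent pulls the two domains' embeddings toward each other. The sketch requires non-negative embeddings, which holds for sigmoid hidden units.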

The rest of this paper is organized as follows. Section 2 introduces the related work. The proposed CDDTA algorithm is introduced in Section 3. We further incorporate unsupervised transfer learning into the CDDTA algorithm (named Transfer-CDDTA) by KL divergence of embedded instances, in Section 4. Section 5 analyzes the detection results on various social networks. Finally, Section 6 presents the conclusions.

Section snippets

Related work

In real life, many complex relationships can be represented by networks with complex structural characteristics. The community structure of complex networks has the characteristic that tightly interconnected vertices form groups and there are sparse connections between vertices in different groups. We introduce some relevant community detection and transfer learning methods in this section.

Community detection with deep transitive autoencoder model

In this section, we introduce the proposed community detection method based on a deep transitive autoencoder. We also describe an adjacency matrix transformation method to represent the similarity of vertices in a network topology.

A social network can be modeled as an undirected and unweighted graph $G=\{V,E\}$, in which $V=\{v_1,v_2,\ldots,v_N\}$ represents the set of $N$ vertices and $E=\{e_{ij}\}$ represents the set of edges. Here, we use a symmetric matrix $A=[a_{ij}]\in\mathbb{R}^{N\times N}$ as the adjacency matrix to

Problem formalization

Transfer learning builds a target prediction model with good generalization performance by transferring knowledge from the source to the target domain. Because community detection may lack sufficient information in the real world, transfer learning is an effective way to solve this problem. In addition, to assist the deep transitive autoencoder to obtain a more powerful feature representation, we incorporate unsupervised transfer learning into our method and further introduce a novel framework,

Experimental analysis

To evaluate the proposed CDDTA method and Transfer-CDDTA framework, we analyze their performance on some widely used real networks and two types of synthetic benchmark networks. We compare our framework with some efficient methods to prove the validity of our method. Here, we employ 20-iteration k-means as our clustering method. We conducted our experiments on an NVIDIA GeForce GTX 1070 Ti with MATLAB 2015a.
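The final clustering step can be sketched as a k-means run with a fixed budget of 20 iterations. The experiments above were run in MATLAB, so this Python version is only an illustration and not the authors' code. Initializing the centers from the first k points is an assumption made to keep the example deterministic, and the toy points stand in for learned low-dimensional embeddings.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's k-means with a fixed number of iterations and deterministic initialization."""
    centers = [list(p) for p in points[:k]]  # assumption: first k points as initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical 2-dimensional embeddings forming two well-separated groups.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kmeans(POINTS, k=2)
print(labels)
```

In practice, random or k-means++ initialization with several restarts is more common than taking the first k points. The fixed initialization here only makes the result reproducible.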

Conclusion and future work

In this paper, to compensate for the defects of existing community detection methods, we proposed a novel CDDTA method for the feature representation of large complex networks. On this basis, we further incorporated unsupervised transfer learning into the proposed method, named Transfer-CDDTA, by measuring the KL divergence of embedded instances. Extensive experimental results show that our new methods are superior to most existing advanced methods available, on real-world networks and

Conflict of interest

There are no conflicts of interest.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (No. 61602005), the Natural Science Foundation of Anhui Province (1608085MF130, 1808085MF199), and the Natural Science Foundation of the Anhui Higher Education Institutions of China (KJ2018A0016).

References (33)

  • M. Girvan et al., Community structure in social and biological networks, Proc. Natl. Acad. Sci. U.S.A. (2002)
  • D. He et al., An ant-based algorithm with local optimization for community detection in large-scale networks, Adv. Complex Syst. (2012)
  • M. Jebabli et al., Community detection algorithm evaluation with ground-truth data, Physica A (2017)
  • R. Kanawati, Ensemble selection for community detection in complex networks, Social Computing and Social Media - 7th International Conference, SCSM 2015, Held as Part of HCI International 2015, Los Angeles, CA, USA, August 2–7, 2015, Proceedings (2015)
  • A. Lancichinetti et al., Benchmark graphs for testing community detection algorithms, Phys. Rev. E Stat. Nonlin. Soft Matter Phys. (2008)
  • H.J. Li et al., Fast and accurate mining the community structure: integrating center locating and membership optimization, IEEE Trans. Knowl. Data Eng. (2016)