High-performance community detection in social networks using a deep transitive autoencoder
Introduction
Complex networks are composed of vertices and edges that represent relationships between those vertices [6], [14]; examples include social, biological, and web networks. The community detection problem has attracted considerable attention from researchers in the domain of complex networks [30]. Research on community detection methods and network science is important in text analysis, bibliometrics, and data analysis [2]. Community detection results have been successfully applied in fields such as friend recommendation, personalized product promotion, protein function prediction, and public opinion analysis and processing. Community detection requires dividing a network into groups of tightly connected vertices, such that vertices belonging to different groups have only sparse connections. An adjacency matrix can be easily obtained from any network. However, an adjacency matrix used as a similarity representation of the network captures only the direct connections between vertices, which greatly limits the accuracy of community detection. Clustering methods are widely used to explore relationships in data. However, when clustering is used for community partition, the detection results are usually not accurate enough on high-dimensional data.
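The limitation of the raw adjacency matrix as a similarity measure can be illustrated with a small sketch. The toy graph, the function names, and the common-neighbor count below are illustrative assumptions and are not taken from the paper; the point is only that two vertices with identical neighborhoods still receive a similarity of 0 when they are not directly linked.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected, unweighted graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def common_neighbors(a, u, v):
    """Count the vertices adjacent to both u and v."""
    return sum(1 for x, y in zip(a[u], a[v]) if x and y)


# Hypothetical 5-vertex toy graph (not from the paper).
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)]
A = adjacency_matrix(5, edges)

# Vertices 0 and 1 are not linked, so A[0][1] is 0,
# although they share both of their neighbors (2 and 3).
direct = A[0][1]
shared = common_neighbors(A, 0, 1)
```

Here `direct` is 0 while `shared` is 2, which is the kind of indirect structure that a plain adjacency matrix does not express.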
Community detection aims to effectively detect the structure of communities in complex networks, and a variety of approaches have been developed. For example, the multilayer ant-based algorithm (MABA) [8] detects communities by locally optimizing modularity with individual ants. The spectral algorithm (SP) [27] detects community structure by spectral analysis of a Laplacian or modularity matrix. In addition, many algorithms have been put forward to solve the detection problem of complex network topology, including: (1) Algorithms based on modularity optimization. For example, the unified link and content (ULC) model [23] is a heuristic search algorithm that attempts to solve the modularity optimization problem. It optimizes the global parameters by adjusting local extreme values to improve computational efficiency. Fast and accurate mining (FAM) [12] is a fast mining algorithm based on modularity, which belongs to the class of greedy agglomeration algorithms. The fast unfolding algorithm (FUA) [10] is a local optimization and hierarchical clustering algorithm, which takes less computing time than many other network clustering algorithms. (2) Algorithms based on label propagation. For example, the label propagation algorithm (LPA) [24] is the earliest label-based algorithm; its basic aim is to predict the labels of unlabeled vertices from the labels of labeled vertices. (3) Dynamic algorithms. For example, finding and extracting communities (FEC) [32] is a community detection algorithm that considers both the connection density and the connection signs of complex networks. It can effectively deal with the relationship between vertices and find more reasonable network community structures. The Infomap algorithm [25] models information propagation on the network with random walks to generate the corresponding data flow.
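Several of the methods above optimize modularity. The paper does not restate the definition, so the sketch below assumes the standard Newman-Girvan form, Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j), where k_i is the degree of vertex i, m is the number of edges, and c_i is the community of vertex i. The function name and the toy graph are hypothetical.

```python
def modularity(a, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    a      -- symmetric 0/1 adjacency matrix (list of lists)
    labels -- labels[i] is the community of vertex i
    """
    n = len(a)
    degree = [sum(row) for row in a]
    two_m = sum(degree)  # each edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += a[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle: 5/14
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community: 0
```

Splitting the graph along its two triangles scores higher than placing all vertices in a single community, which is the signal that modularity-based algorithms search for.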
These methods address the problem of community detection from different angles based on network topology. However, real-world networks usually exhibit many nonlinear properties, which may limit the detection ability of these models.
Deep learning research has made remarkable achievements in the fields of machine learning and artificial intelligence. Theoretical studies, algorithm designs, and application systems for neural networks have been widely proposed in many fields, such as speech recognition, image classification, and natural language processing [5]. Multilayer neural networks can effectively reduce the data dimensionality [33] and achieve good community detection results. High-dimensional original data can be encoded into a new feature representation through a multilayer neural network with a bottleneck layer, trained to best approximate the original input data. This method can reduce data dimensionality better than classical dimensionality reduction methods. However, existing neural network methods do not extract features well enough to produce a strong feature representation for community detection. Therefore, we need to discover more effective deep neural network approaches to obtain low-dimensional features.
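The bottleneck idea can be sketched with a very small autoencoder. This is a minimal illustration and not the CDDTA architecture: it assumes a single hidden layer (the paper uses a deep, multilayer network), sigmoid units, a squared reconstruction error, and per-sample stochastic gradient descent. All names, the toy graph, and the hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b):
    """One fully connected sigmoid layer: sigmoid(w @ x + b)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def train_autoencoder(rows, hidden, epochs=300, lr=0.5, seed=0):
    """Train a single-bottleneck autoencoder with per-sample SGD.

    Returns (encode, losses): encode maps an input row to its hidden code,
    losses holds the summed squared reconstruction error of each epoch.
    """
    rng = random.Random(seed)
    n_in = len(rows[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    losses = []
    for _ in range(epochs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        total = 0.0
        for idx in order:
            x = rows[idx]
            h = forward(x, w1, b1)  # encoder: input -> bottleneck code
            y = forward(h, w2, b2)  # decoder: code -> reconstruction
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
            # Gradients of 0.5 * ||y - x||^2 through the sigmoid units.
            d_out = [(yi - xi) * yi * (1.0 - yi) for yi, xi in zip(y, x)]
            d_hid = [sum(d_out[i] * w2[i][j] for i in range(n_in))
                     * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for i in range(n_in):
                for j in range(hidden):
                    w2[i][j] -= lr * d_out[i] * h[j]
                b2[i] -= lr * d_out[i]
            for j in range(hidden):
                for k in range(n_in):
                    w1[j][k] -= lr * d_hid[j] * x[k]
                b1[j] -= lr * d_hid[j]
        losses.append(total)

    def encode(x):
        return forward(x, w1, b1)

    return encode, losses


# Rows of the adjacency matrix of a hypothetical toy graph:
# two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
encode, losses = train_autoencoder(A, hidden=2)
codes = [encode(row) for row in A]  # one 2-dimensional code per vertex
```

Each 6-dimensional adjacency row is compressed into a 2-dimensional code, and the codes can then be passed to a clustering step in place of the original rows.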
In pattern recognition, transfer learning aims to establish a target prediction model with good generalization ability by transferring effective information, which may contain only a limited amount of valid label information, from a source domain to a related target domain. Obtaining more powerful feature representations for instances of the source and target domains has therefore become an important research problem in transfer learning. In complex networks, a good feature representation can help detect the topological structure. However, the low-dimensional feature representations learned by mainstream community detection methods based on deep learning are not strong enough. Traditional transfer methods inspired by deep learning aim to learn a more powerful feature representation, but most of them do not explicitly minimize the differences between domains. In addition, most transfer methods use supervised learning, whereas valid labels are limited in existing community detection tasks. To achieve better performance on the community detection problem, unsupervised transfer learning is needed to assist a deep autoencoder.
In summary, there are three main contributions of our paper:
- (1) We propose a novel method to transform a network to an adjacency matrix, to describe the similarity of vertices in a network. A new method, which applies an effective algorithm based on a deep transitive autoencoder, is proposed to transmit similarity information and detect communities in complex networks. The method is named community detection with deep transitive autoencoder (CDDTA).
- (2) We incorporate unsupervised transfer learning into the proposed algorithm by measuring Kullback–Leibler (KL) divergence of embedded instances, to ensure that the differences between different domains can be approximately equal when learning low-dimensional representations.
- (3) We propose a training strategy for our novel framework, in which the target domain shares common parameters with the source domain, for encoding and decoding in the process of deep transitive autoencoder training. We also optimize the training algorithm by a back-propagation method with stochastic gradient descent.
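The KL divergence of embedded instances mentioned in contribution (2) can be sketched as follows. The exact formulation used by Transfer-CDDTA is not given in this excerpt, so the sketch assumes one common choice: average the non-negative embeddings of each domain (for example, sigmoid codes), normalize the averages into probability distributions, and sum the two directed KL divergences. The function names, the smoothing constant `eps`, and the sample embeddings are hypothetical.

```python
import math


def mean_distribution(embeddings, eps=1e-12):
    """Average non-negative embeddings and normalize them to sum to 1.

    eps is a small smoothing constant (an assumption) that avoids log(0).
    """
    dim = len(embeddings[0])
    mean = [sum(e[k] for e in embeddings) / len(embeddings) for k in range(dim)]
    total = sum(mean)
    return [(m + eps) / (total + dim * eps) for m in mean]


def kl(p, q):
    """Directed KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(source, target):
    """KL(P_s || P_t) + KL(P_t || P_s) on the mean embedded distributions."""
    p = mean_distribution(source)
    q = mean_distribution(target)
    return kl(p, q) + kl(q, p)


# Hypothetical 2-dimensional embeddings of source and target instances.
source = [[0.9, 0.1], [0.8, 0.2]]
target_close = [[0.85, 0.15]]
target_far = [[0.1, 0.9]]

gap_close = symmetric_kl(source, target_close)  # close to 0
gap_far = symmetric_kl(source, target_far)      # clearly positive
```

Under this assumption, a term such as `symmetric_kl` would be added to the reconstruction loss so that training pulls the embedded source and target distributions toward each other.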
The rest of this paper is organized as follows. Section 2 introduces the related work. The proposed CDDTA algorithm is introduced in Section 3. We further incorporate unsupervised transfer learning into the CDDTA algorithm (named Transfer-CDDTA) by KL divergence of embedded instances, in Section 4. Section 5 analyzes the detection results on various social networks. Finally, Section 6 presents the conclusions.
Related work
In real life, many complex relationships can be represented by networks with complex structural characteristics. The community structure of complex networks has the characteristic that tightly interconnected vertices form groups and there are sparse connections between vertices in different groups. We introduce some relevant community detection and transfer learning methods in this section.
Community detection with deep transitive autoencoder model
In this section, we introduce the proposed community detection method based on a deep transitive autoencoder. We also describe an adjacency matrix transformation method to represent the similarity of vertices in a network topology.
A social network can be modeled as an undirected and unweighted graph G = (V, E), in which V represents the set of N vertices and E represents the set of edges. Here, we use a symmetric matrix as the adjacency matrix to
Problem formalization
Transfer learning builds a target prediction model with good generalization performance by transferring knowledge from the source to the target domain. Because community detection may lack sufficient information in the real world, transfer learning is an effective way to solve this problem. In addition, to assist the deep transitive autoencoder to obtain a more powerful feature representation, we incorporate unsupervised transfer learning into our method and further introduce a novel framework,
Experimental analysis
To evaluate the proposed CDDTA method and Transfer-CDDTA framework, we analyze their performance on several widely used real networks and two types of synthetic benchmark networks. We compare our framework with several efficient methods to demonstrate its validity. Here, we employ k-means with 20 iterations as our clustering method. We conducted our experiments on an NVIDIA GeForce GTX 1070 Ti with MATLAB 2015a.
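The clustering step can be sketched as plain k-means capped at 20 iterations. The original experiments were run in MATLAB; the Python version below is an illustrative assumption, as are the random initialization from the data points, the function name, and the sample embeddings.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means with a fixed number of iterations.

    points -- list of equal-length tuples (low-dimensional embeddings)
    Returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 2-dimensional embeddings of six vertices from two communities.
embeddings = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
    (0.9, 0.9), (0.8, 0.9), (0.9, 0.8),
]
labels, centers = kmeans(embeddings, k=2)
```

The resulting `labels` assign one community index to each vertex; which of the two groups receives index 0 depends on the random initialization.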
Conclusion and future work
In this paper, to compensate for the defects of existing community detection methods, we proposed a novel CDDTA method for the feature representation of large complex networks. On this basis, we further incorporated unsupervised transfer learning into the proposed method, named Transfer-CDDTA, by measuring the KL divergence of embedded instances. Extensive experimental results show that our new methods are superior to most existing advanced methods available, on real-world networks and
Conflict of interest
There are no conflicts of interest.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 61602005), the Natural Science Foundation of Anhui Province (1608085MF130, 1808085MF199), and the Natural Science Foundation of the Anhui Higher Education Institutions of China (KJ2018A0016).
References (33)
- et al. Incorporating network structure with node contents for community detection on large networks using deep learning. Neurocomputing (2018)
- et al. Evaluating deep learning architectures for speech emotion recognition. Neural Netw. (2017)
- Overlapping community detection with least replicas in complex networks. Inf. Sci. (Ny) (2018)
- et al. Semi-supervised clustering algorithm for community structure detection in complex networks. Phys. A Stat. Mech. Appl. (2010)
- et al. Modularity in complex multilayer networks with multiple aspects: a static perspective. Appl. Inf. (2017)
- Multilayer bootstrap networks. Neural Netw. (2018)
- et al. The political blogosphere and the 2004 U.S. election: divided they blog. Proceedings of the 3rd international workshop on Link discovery, LinkKDD 2005, Chicago, Illinois, USA, August 21–25, 2005 (2005)
- et al. Text authorship identified using the dynamics of word co-occurrence networks. PLoS One (2017)
- et al. Robustness of community structure to node removal. J. Stat. Mech: Theory Exp. (2015)
- Community structure in complex networks. Extraction et Gestion des Connaissances, EGC 2018, Paris, France, January 23–26, 2018 (2018)