Discriminative Streaming Network Embedding☆
Introduction
Many real-world networks (e.g., friendships among Facebook or Tencent users, citations among research papers, and communications among telecom users) generate data (e.g., friend requests, new citations, and calling records) in a streaming fashion, where each element (e.g., a friend request) records a link between two nodes. Such data can be naturally modeled as a streaming network represented by a sequence of edge insertions and deletions. These real-world streaming networks are usually very large and evolve frequently. For example, Facebook has about 2 billion monthly active users and receives over 1500 friend requests per second. Analyzing these networks therefore poses many new challenges in areas such as community detection [1], [2], [3], anomaly detection [4], [5], and recommendation [6], [7], and has received significant attention in the last few years.
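To make the modeling step concrete, the following minimal Python sketch replays a stream of edge insertions and deletions into the current snapshot. The event encoding (`"+"` and `"-"`), the function name `apply_stream`, and the choice to store edges as directed (ordered) pairs are illustrative assumptions, not notation taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event encoding: (op, u, v), where op is "+" for an edge
# insertion and "-" for an edge deletion of the ordered pair (u, v).
EdgeEvent = Tuple[str, int, int]


def apply_stream(events: Iterable[EdgeEvent]) -> Dict[int, Set[int]]:
    """Replay edge events in order and return the successor sets of the
    resulting snapshot."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for op, u, v in events:
        if op == "+":
            adj[u].add(v)
        elif op == "-":
            # discard() ignores deletions of edges that are not present
            adj[u].discard(v)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return adj


stream = [("+", 1, 2), ("+", 2, 3), ("+", 1, 3), ("-", 1, 2)]
snapshot = apply_stream(stream)
```

Replaying the whole stream like this is what a batch method effectively has to do after every change; an incremental method would instead touch only the part of the state affected by the newest event.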
Network embedding methods [8], [9], [10], [11], [12], [13] are effective tools for analyzing such huge networks. These methods aim to transform a network into a low-dimensional space where each node is represented by a low-dimensional feature vector and the proximity of nodes is preserved. Using the learned embeddings, existing data mining and machine learning algorithms can work effectively in many tasks, such as node classification [8], [11], link prediction [9], and anomaly detection [10]. However, because a streaming network evolves rapidly, it is still challenging for these static embedding methods to handle such networks: they must be rerun on the entire network (i.e., batch learning) whenever the network changes, which results in a large number of node embedding updates. Recently, several streaming network embedding methods [14], [15], [16], [17], [18] have been proposed to perform incremental updates when the topology changes. However, these methods perform incremental updates heuristically and thus cannot quantitatively bound the difference between incremental learning and batch learning. Moreover, these embedding methods are unsupervised, i.e., they ignore label information when learning node embeddings, which leads to sub-optimal results for applications such as node classification.
Many real-world networks have abundant node labels (e.g., interests). For instance, social networks such as Tencent and Flickr allow users to create groups with specific labels (e.g., group descriptions) that other users can join. The coauthor network is another example, where researchers' labels are their research interests. Motivated by the fact that labels are potentially helpful in learning a better joint embedding representation, we propose to leverage label information when learning streaming network embeddings.
Taking advantage of label information in streaming networks for embedding learning is a nontrivial task. There are three main challenges. First, as the network topology evolves, the label information may also change over time. Take the coauthor network as an example: recently, many researchers have shifted their interests to the popular field of deep learning and, at the same time, have begun to coauthor with other researchers in that field. How to handle the evolution of both topology and label information is still an unresolved problem. Second, although several discriminative embedding methods [19], [20], [21], [22] have been proposed to learn network embeddings from label and topology information, they are all designed for static networks. Therefore, one needs to apply them to the entire network whenever the network changes, which is computationally intensive and prohibitive for many online applications. Moreover, these methods usually balance the importance of label and topology information in a coarse-grained way and thus ignore node-specific differences. Third, the label information can be sparse and incomplete. For instance, in social networks, the proportion of active users with labels may be quite small. Therefore, it is challenging to leverage label information to learn embeddings on streaming networks efficiently.
To address the above challenges, in this paper we propose a novel network embedding framework, Discriminative Streaming Network Embedding (DimSim), which efficiently learns effective embeddings for streaming networks. For effectiveness, we design a novel objective function for constructing discriminative embeddings that automatically trades off the importance of topology and label information. Notably, we use a learnable parameter matrix to better model trade-offs that differ among individual nodes. For efficiency, we design an incremental learning algorithm, which theoretically guarantees that, at any time, the objective function of incremental embedding learning closely approximates that of batch learning on the current snapshot. When an edge insertion/deletion occurs, DimSim learns node embeddings incrementally, which is desirable for many online applications such as anomaly detection. More importantly, the average number of update operations of DimSim for processing each newly arriving edge is about , where is constant between and and is the number of nodes in the current network. As a result, DimSim becomes increasingly fast as the network grows, which agrees with the intuition that an edge insertion/deletion has a smaller impact on larger networks. Our main contributions can be summarized as follows:
- We propose a novel framework, DimSim, to efficiently learn effective network embeddings from both the topology and the label information of streaming networks. Unlike previous works, we design a novel loss function with fine-grained trade-off parameters, which automatically trades off the importance of topology and label information for each node.
- We propose an efficient incremental embedding learning method whose average time complexity for processing each newly arriving edge is inversely proportional to the network size. In contrast, previous works usually take for each snapshot. As a result, DimSim becomes increasingly fast as the network grows and can thus cope with the fast changes in streaming networks.
- Unlike the heuristic updates of most previous works, DimSim theoretically guarantees that, at any time, the objective function of incremental learning closely approximates that of batch learning on the current snapshot.
- Experimental results on benchmark datasets show that our method is up to times faster and more accurate than state-of-the-art methods.
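The exact objective function is not reproduced in this text. Purely as an illustration of what a node-specific trade-off between topology and label information could look like, the sketch below combines two per-node losses with one weight per node. The convex-combination form, the sigmoid parameterization, and all names (`combined_loss`, `tradeoff_logits`) are assumptions made for this example and should not be read as DimSim's actual loss.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map an unconstrained parameter to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combined_loss(topology_loss: Sequence[float],
                  label_loss: Sequence[float],
                  tradeoff_logits: Sequence[float]) -> float:
    """Sum over nodes of w_i * topology_i + (1 - w_i) * label_i, where
    w_i = sigmoid(tradeoff_logits[i]) is a hypothetical per-node weight."""
    total = 0.0
    for topo, lab, logit in zip(topology_loss, label_loss, tradeoff_logits):
        w = sigmoid(logit)
        total += w * topo + (1.0 - w) * lab
    return total


# With logits of 0, every node weights both terms equally (w_i = 0.5).
example = combined_loss([2.0, 4.0], [0.0, 2.0], [0.0, 0.0])
```

A coarse-grained method would correspond to sharing a single weight across all nodes; giving each node its own parameter is what makes the trade-off fine-grained.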
The rest of this paper is organized as follows. We describe the problem of discriminative streaming network embedding in Section 2. Section 3 summarizes related work. Section 4 presents our framework DimSim. The performance evaluation and testing results are presented in Section 5. Conclusions then follow.
Section snippets
Problem definition
To formally define our problem, we first define information networks and streaming information networks as follows:
Definition 1 (Information Network). An information network consists of a network and a label map , where and are the sets of nodes and edges, consists of a part of nodes and their labels. Each node represents a data object. Each edge represents a relationship between two nodes which is an ordered pair . consists of a small proportion of nodes and their associated labels. Given a node
Related work
Networks can be categorized into static networks and dynamic networks according to whether the nodes or edges vary with time. In this section, we first review previous works on both static and dynamic network embeddings. Then, we also briefly review some works on incremental learning.
DimSim framework
In this section, we propose our framework DimSim of discriminative streaming network embedding. The whole framework is shown in Fig. 1. Our framework consists of two phases, a startup phase (Box A in Fig. 1) to construct the initial network embedding and a maintaining phase (Box B in Fig. 1) to efficiently update the embedding for the continuing stream. At the startup phase, we construct the initial network embedding over . Specifically, DimSim generates a batch of random walks from each
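The startup phase described above begins by sampling random walks from the nodes of the initial network. The snippet is truncated, so the details of the walk strategy are not available here; the following sketch assumes plain uniform random walks over an adjacency-list graph, and the parameter names `walks_per_node` and `walk_length` are hypothetical.

```python
import random
from typing import Dict, List


def random_walks(adj: Dict[int, List[int]], walks_per_node: int,
                 walk_length: int, seed: int = 0) -> List[List[int]]:
    """Sample uniform random walks starting from every node in adj.

    A walk stops early if it reaches a node without outgoing edges.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    walks: List[List[int]] = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj.get(walk[-1], [])
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
walks = random_walks(adj, walks_per_node=2, walk_length=4)
```

In walk-based embedding methods, sequences like these are typically fed to a skip-gram style model; whether and how DimSim does so is not stated in the text available here.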
Experiment results and analysis
In this section, we demonstrate the effectiveness and efficiency of our method on streaming networks. All experiments are run on the same machine, with an Intel Xeon E5-2620 v3 CPU at 2.4 GHz and 128 GB of RAM.
Conclusions and future work
In this paper, we propose a novel framework, DimSim, which learns node embeddings for streaming networks with both effectiveness and efficiency in mind. For effectiveness, we design a novel objective function for constructing discriminative embeddings that automatically learns the trade-off between the importance of topology and label information. For efficiency, we design an incremental learning algorithm, which theoretically guarantees that the objective function of incremental
Acknowledgments
The research presented in this paper is supported in part by the Shenzhen Basic Research Grant (JCYJ20170816100819428), the National Key R&D Program of China (2018YFC0830500), the National Natural Science Foundation of China (61922067, U1736205, 61603290), the Natural Science Basic Research Plan in Shaanxi Province of China (2019JM-159), and the Natural Science Basic Research Plan in Zhejiang Province of China (LGG18F020016).
References (63)
- et al., Low-rank local tangent space embedding for subspace clustering, Inform. Sci. (2020)
- et al., An incremental attribute reduction method for dynamic data mining, Inform. Sci. (2018)
- et al., Incremental rough set approach for hierarchical multicriteria classification, Inform. Sci. (2018)
- et al., Incremental on-line learning: A review and comparison of state of the art algorithms, Neurocomputing (2018)
- et al., What's in a crowd? Analysis of face-to-face behavioral networks, J. Theoret. Biol. (2011)
- et al., Fast and accurate mining the community structure: Integrating center locating and membership optimization, IEEE Trans. Knowl. Data Eng. (2016)
- et al., Dynamic cluster formation game for attributed graph clustering, IEEE Trans. Cybern. (2019)
- Z. Bu, H.-J. Li, C. Zhang, J. Cao, A. Li, Y. Shi, Graph K-means based on leader identification, dynamic game and...
- et al., Fast memory-efficient anomaly detection in streaming heterogeneous graphs
- et al., Spotlight: Detecting anomalies in streaming graphs
- Recommendations for streaming data
- Network anomaly detection: Methods, systems and tools, IEEE Commun. Surv. Tutor.
- Scalable temporal latent space inference for link prediction in dynamic social networks, IEEE Trans. Knowl. Data Eng.
- TIMERS: Error-bounded SVD restart on dynamic networks
- Improving network embedding with partially available vertex and edge content, Inform. Sci.
- Semi-supervised classification with graph convolutional networks
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105138.