
Knowledge-Based Systems

Volume 190, 29 February 2020, 105138

Discriminative Streaming Network Embedding

https://doi.org/10.1016/j.knosys.2019.105138

Abstract

Many real-world networks (e.g., the friendship network among Facebook users) generate data (e.g., friend requests) in a stream fashion. Recently, several network embedding methods have been proposed to learn embeddings on such networks incrementally. However, these methods perform incremental updates in a heuristic manner and thus fail to quantitatively bound the differences between incremental learning and direct learning on the entire network (i.e., batch learning). Moreover, they ignore node labels (e.g., interests) when learning node embeddings, which undermines the performance of network embeddings in applications such as node classification. To solve these problems, in this paper we propose a novel network embedding framework, Discriminative Streaming Network Embedding (DimSim). When an edge insertion/deletion occurs, DimSim quickly learns node embeddings incrementally, which is desirable for many online applications such as anomaly detection. With this incremental learning method, at any time, the objective function closely approximates that of batch learning on the current snapshot. More importantly, the average number of update operations DimSim performs for each newly arriving edge is about O(n^{1-λ}), where λ is a constant between 1 and 2 and n is the number of nodes. As a result, DimSim becomes increasingly fast as the network grows, which matches the intuition that an edge insertion/deletion has a smaller impact on a larger network. In addition, we utilize node label information to learn discriminative node embeddings, and we deliberately design a new model that automatically trades off the node-specific importance of topology and label information. Extensive experiments on real-world streaming networks show that our method DimSim is up to 50 times faster and 40% more accurate than state-of-the-art methods.

Introduction

Many real-world networks (e.g., friendships among Facebook or Tencent users, citations among research papers, and communications among telecom users) generate data (e.g., friend requests, new citations, and calling records) in a stream fashion, where each element (e.g., a friend request) records a link between two users. Such data can be naturally modeled as a streaming network represented by a sequence of edge insertions/deletions. These real-world streaming networks are usually massive and evolve frequently. For example, the number of monthly active users on Facebook is about 2 billion, and there are over 1500 friend requests per second. Analyzing these networks therefore raises new challenges in areas such as community detection [1], [2], [3], anomaly detection [4], [5], and recommendation [6], [7], and has received significant attention in the last few years.

Network embedding methods [8], [9], [10], [11], [12], [13] are effective tools for analyzing such huge networks. These methods aim to transform a network into a low-dimensional space where each node is represented by a low-dimensional feature vector and the proximity of nodes is preserved. Using the learned embeddings, existing data mining and machine learning algorithms can work effectively in many tasks, such as node classification [8], [11], link prediction [9], and anomaly detection [10]. However, since a streaming network evolves rapidly, it remains challenging for the above static embedding methods to handle such networks: one must rerun them on the entire network (i.e., batch learning) whenever the network changes, which results in a large number of node embedding updates. Recently, several streaming network embedding methods [14], [15], [16], [17], [18] have been proposed to perform incremental updates when the topology changes. However, these methods perform incremental updates in a heuristic manner and thus fail to quantitatively bound the differences between incremental learning and batch learning. Moreover, these embedding methods operate in an unsupervised manner, i.e., they ignore label information when learning node embeddings, which leads to sub-optimal results for applications such as node classification.

Many real-world networks have abundant node labels (e.g., interests). For instance, social networks such as Tencent and Flickr allow users to create groups with specific labels (e.g., group descriptions) that other users can join. The coauthor network is another example, where researchers' labels are their research interests. Motivated by the fact that labels are potentially helpful in learning a better joint embedding representation, we propose to leverage label information when learning streaming network embeddings.

It is a nontrivial task to take advantage of label information in streaming networks for embedding learning. There are three main challenges. First, as the network topology evolves, the label information may also change over time. Take the coauthor network as an example: recently, many researchers have shifted their interests to the popular field of deep learning and simultaneously coauthor with other researchers in that field. How to handle the evolution of both topology and label information remains an open problem. Second, although several discriminative embedding methods [19], [20], [21], [22] have been proposed to learn network embeddings from label and topology information, they are all designed for static networks. One must therefore apply them to the entire network whenever the network changes, which is computationally intensive and prohibitive for many online applications. Moreover, these methods usually balance the importance of label and topology information in a coarse-grained way and thus ignore node-specific differences. Third, the label information can be sparse and incomplete. For instance, in social networks, the proportion of active users with labels may be quite small. It is therefore challenging to leverage label information to learn embeddings on streaming networks efficiently.

To solve the above challenges, in this paper we propose a novel network embedding framework, Discriminative Streaming Network Embedding (DimSim), which learns effective embeddings for streaming networks efficiently. For effectiveness, we design a novel objective function for constructing discriminative embeddings that automatically trades off the importance of topology and label information. Notably, we propose a method using a learnable parameter matrix to better model the trade-offs that differ among individual nodes. For efficiency, we design an incremental learning algorithm which theoretically guarantees that, at any time, the objective function of incremental embedding learning closely approximates that of batch learning on the current snapshot. When an edge insertion/deletion occurs, DimSim learns node embeddings incrementally, which is desirable for many online applications such as anomaly detection. More importantly, the average number of update operations DimSim performs for each newly arriving edge is about O(n^{1-λ}), where λ is a constant between 1 and 2 and n is the number of nodes in the current network. As a result, DimSim becomes increasingly fast as the network grows, which matches the intuition that an edge insertion/deletion has a smaller impact on a larger network. Our main contributions can be summarized as follows:

  • We propose a novel framework, DimSim, to efficiently learn effective network embeddings from both the topology and label information of streaming networks. Different from previous works, we design a novel loss function with fine-grained trade-off parameters, which automatically trades off the importance of topology and label information for each node.

  • We propose an efficient incremental embedding learning method whose average time complexity for processing each newly arriving edge is inversely proportional to the network size. In contrast, previous works usually take O(n) time for each snapshot. As a result, DimSim becomes increasingly fast as the network grows and can thus cope with the rapid changes in streaming networks.

  • Different from the heuristic updating manner of most previous works, our method DimSim theoretically guarantees that, at any time, the objective function of incremental learning closely approximates that of batch learning on the current snapshot.

  • Experimental results on benchmark datasets show that our method is up to 50 times faster and 40% more accurate than state-of-the-art methods.
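The fine-grained trade-off in the first contribution can be sketched as follows. The function name, the sigmoid gate, and the per-node parameter alpha_v are our illustrative assumptions, not the paper's exact formulation:

```python
import math

def node_loss(topo_loss_v, label_loss_v, alpha_v):
    """Illustrative sketch of a fine-grained (per-node) trade-off between
    a topology objective and a label objective. alpha_v is a learnable
    scalar for node v; passing it through a sigmoid yields a weight in
    (0, 1) that balances the two loss terms for that node."""
    w = 1.0 / (1.0 + math.exp(-alpha_v))  # sigmoid gate in (0, 1)
    return w * topo_loss_v + (1.0 - w) * label_loss_v
```

With alpha_v = 0 the two losses are weighted equally; during training, gradient descent can push alpha_v per node so that, for example, sparsely labeled nodes rely more on topology.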

The rest of this paper is organized as follows. We describe the problem of discriminative streaming network embedding in Section 2. Section 3 summarizes related work. Section 4 presents our framework, DimSim. The performance evaluation and testing results are presented in Section 5. Conclusions then follow.

Section snippets

Problem definition

To formally define our problem, we first define information networks and streaming information networks as follows:

Definition 1 Information Network

An information network consists of a network G=(V,E) and a label map L, where V and E are the sets of nodes and edges, and L consists of a small proportion of nodes and their associated labels. Each node v ∈ V represents a data object. Each edge e ∈ E represents a relationship between two nodes and is an ordered pair e=(u,v). Given a node v
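Definition 1 can be sketched as a simple data structure; the class and method names below are our illustrative choices, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class InformationNetwork:
    """Sketch of Definition 1: a directed graph G = (V, E) together with a
    partial label map L that covers only a small proportion of nodes."""
    nodes: set = field(default_factory=set)     # V
    edges: set = field(default_factory=set)     # E: ordered pairs (u, v)
    labels: dict = field(default_factory=dict)  # L: node -> label (partial)

    def add_edge(self, u, v):
        """Insert the directed edge (u, v), adding endpoints to V."""
        self.nodes.update((u, v))
        self.edges.add((u, v))

    def labeled_fraction(self):
        """Fraction of nodes carrying a label; typically small in practice."""
        return len(self.labels) / len(self.nodes) if self.nodes else 0.0
```

In a streaming setting, a sequence of `add_edge` calls (and their deletions) together with label updates would drive the incremental embedding updates described later.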

Related work

Networks can be categorized into static networks and dynamic networks according to whether the nodes or edges vary with time. In this section, we first review previous works on both static and dynamic network embeddings. Then, we also briefly review some works on incremental learning.

DimSim framework

In this section, we propose our framework DimSim of discriminative streaming network embedding. The whole framework is shown in Fig. 1. Our framework consists of two phases, a startup phase (Box A in Fig. 1) to construct the initial network embedding and a maintaining phase (Box B in Fig. 1) to efficiently update the embedding for the continuing stream. At the startup phase, we construct the initial network embedding over G(0). Specifically, DimSim generates a batch of random walks from each
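The random-walk generation in the startup phase can be sketched as below; the function name, hyperparameter names, and default values are our illustrative assumptions, not the paper's settings:

```python
import random

def generate_walks(adj, walks_per_node=10, walk_length=40, seed=0):
    """Sketch of the startup phase over the initial snapshot G(0):
    generate a batch of truncated random walks starting from every node.
    adj maps each node to a list of its out-neighbors."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj.get(walk[-1], [])
                if not nbrs:          # dead end: truncate the walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

The resulting walks serve as training sequences for the initial embedding, in the spirit of skip-gram-based methods such as DeepWalk.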

Experiment results and analysis

In this section, we demonstrate the effectiveness and efficiency of our method on streaming networks. All experiments are run on the same machine with an Intel Xeon E5-2620 v3 CPU at 2.4 GHz and 128 GB RAM.

Conclusions and future work

In this paper, we propose a novel framework, DimSim, which learns node embeddings for streaming networks considering both effectiveness and efficiency. For effectiveness, we design a novel objective function for constructing discriminative embeddings that automatically learns the trade-off between the importance of topology and label information. For efficiency, we design an incremental learning algorithm, which theoretically guarantees that the objective function of incremental

Acknowledgments

The research presented in this paper is supported in part by Shenzhen Basic Research Grant (JCYJ20170816100819428), National Key R&D Program of China (2018YFC0830500), National Natural Science Foundation of China (61922067, U1736205, 61603290), Natural Science Basic Research Plan in Shaanxi Province of China (2019JM-159), and Natural Science Basic Research Plan in Zhejiang Province of China (LGG18F020016).

References (63)

  • K. Subbian et al., Recommendations for streaming data
  • P. Wang, Y. Qi, Y. Zhang, Q. Zhai, C. Wang, J. Lui, X. Guan, A memory-efficient sketch method for estimating high...
  • S. Cao, W. Lu, Q. Xu, GraRep: Learning graph representations with global structural information, in: CIKM, 2015, pp....
  • A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: SIGKDD, 2016, pp....
  • M.H. Bhuyan et al., Network anomaly detection: Methods, systems and tools, IEEE Commun. Surv. Tutor. (2014)
  • B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: Online learning of social representations, in: SIGKDD, 2014, pp....
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, LINE: Large-scale information network embedding, in: WWW, 2015, pp....
  • L. Zhu et al., Scalable temporal latent space inference for link prediction in dynamic social networks, IEEE Trans. Knowl. Data Eng. (2016)
  • J. Li, H. Dani, X. Hu, J. Tang, Y. Chang, H. Liu, Attributed network embedding for learning in a dynamic environment,...
  • Z. Zhang et al., TIMERS: Error-bounded SVD restart on dynamic networks (2017)
  • A. Zhiyuli, X. Liang, Z. Xu, Learning distributed representations for large-scale dynamic social networks, in: INFOCOM,...
  • L. Du, Y. Wang, G. Song, Z. Lu, J. Wang, Dynamic network embedding: An extended approach for skip-gram based network...
  • Z. Yang, W.W. Cohen, R. Salakhutdinov, Revisiting semi-supervised learning with graph embeddings, in: ICML, 2016, pp....
  • C. Tu, W. Zhang, Z. Liu, M. Sun, Max-margin DeepWalk: Discriminative learning of network representation, in: IJCAI,...
  • J. Li, J. Zhu, B. Zhang, Discriminative deep random walk for network classification, in: ACL, 2016, pp....
  • X. Huang, J. Li, X. Hu, Label informed attributed network embedding, in: WSDM, 2017, pp....
  • T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their...
  • J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, J. Tang, Network embedding as matrix factorization: Unifying DeepWalk, LINE,...
  • D. Wang, P. Cui, W. Zhu, Structural deep network embedding, in: SIGKDD, 2016, pp....
  • L. Lan et al., Improving network embedding with partially available vertex and edge content, Inf. Sci. (2019)
  • T.N. Kipf et al., Semi-supervised classification with graph convolutional networks (2016)
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105138.
