Elsevier

Neurocomputing

Volume 412, 28 October 2020, Pages 31-41

Learning heterogeneous information network embeddings via relational triplet network

https://doi.org/10.1016/j.neucom.2020.06.043

Abstract

Network embedding algorithms learn low-dimensional features from the relationships and attributes of networks. The basic principle of these algorithms is to preserve the similarities in the original networks as much as possible. However, in heterogeneous information networks, existing algorithms are not sufficiently expressive to capture detailed semantic patterns between nodes. In this paper, we propose a novel heterogeneous information network embedding algorithm called the relational triplet network (RTN). In the data sampling phase, meta-schema-based random walks are performed to extract semi-hard quadruplets based on the node type and degree. In the representation learning phase, a relational triplet loss is designed to optimize the distance of triplet embeddings on diverse heterogeneous relationships. The empirical results demonstrate that our algorithm can obtain multiple types of representations and outperform other state-of-the-art methods in node classification and link prediction.

Introduction

Data mining on networks has attracted widespread attention in recent years. When performing typical network mining tasks, such as node classification [1], [2], [3], link prediction [4], [5] and clustering [6], [7], classifiers require accurate features about nodes, edges, communities and other elements. Therefore, a large number of network embedding algorithms have been proposed to represent a network as low-dimensional vectors while preserving the network structure [8].

However, most real-world networks are heterogeneous and involve rich semantic information. For example, in the bibliographic network illustrated in Fig. 1(a-c), the nodes can be authors, organizations, papers, terms or venues, and the edges can be writing, belonging to, citing, including or publishing relations. Heterogeneous information networks (HINs) are a generalization of such networks that consist of multiple types of nodes or edges. Learning the embeddings in such networks is clearly difficult because of the complex semantic relationships between nodes.

Many researchers have studied HIN embeddings and downstream classification tasks. Meta-paths [9], [10] and meta-graphs [11], [12], [13] are two major template tools for sampling specific relationships from HINs, as illustrated in Fig. 1(d,e). Most methods leverage neural networks [10], [14], [15] or skip-gram models [9], [13] to realize the embeddings of HINs. Alternatively, some address special HINs, such as coupled HINs [16], signed HINs [17] and HINs with node attributes [15], [18], [19]. However, these sampling methods either become inefficient as the number of meta-paths grows exponentially or neglect the node degrees. For instance, in an $r$-relational ($r>1$) network, up to $\frac{r(1-r^{l})}{1-r}$ types of meta-paths could exist for which the lengths are less than $l$, and an author who publishes a substantial number of articles is difficult to sample fully. Additionally, current representation learning algorithms are not sufficiently expressive to capture detailed semantic patterns between nodes.
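The growth of the meta-path count can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and it assumes that a meta-path is identified by its sequence of relation types and that length means the number of relations. Under those assumptions, the number of sequences built from 1 to l relations is the geometric sum r + r^2 + ... + r^l, which equals the closed form r(1 - r^l)/(1 - r).

```python
from itertools import product


def count_by_enumeration(r: int, l: int) -> int:
    """Count relation-type sequences with 1..l relations by brute force."""
    relations = range(r)
    return sum(1 for k in range(1, l + 1) for _ in product(relations, repeat=k))


def count_closed_form(r: int, l: int) -> int:
    """Geometric-series closed form r(1 - r^l)/(1 - r), valid for r > 1."""
    if r <= 1:
        raise ValueError("closed form requires r > 1")
    # (r^l - 1) is always divisible by (r - 1), so integer division is exact.
    return r * (1 - r**l) // (1 - r)


if __name__ == "__main__":
    for r, l in [(2, 3), (3, 4), (5, 6)]:
        print(r, l, count_by_enumeration(r, l), count_closed_form(r, l))
```

Even for a modest network with five relation types, sequences of up to six relations already number in the tens of thousands, which is why exhaustive meta-path enumeration quickly becomes impractical.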

To solve these challenges, we propose the relational triplet network (RTN), a heterogeneous neural network model for representation learning in HINs. In the data sampling phase, we design a meta-schema-based random walk algorithm to extract biased semi-hard quadruplets that can accurately model the details of HINs. In the representation learning phase, we employ a recurrent neural network (RNN) to encode heterogeneous relationships and construct a triplet residual network to aggregate node and relation embeddings. Then, a relational triplet loss is designed to optimize the distance of triplet embeddings on diverse relationships. For HINs, it is difficult to express complex semantic relationships between nodes with node embeddings alone. Instead, RTN obtains node embeddings in specific relationship spaces by fusing node and relationship embeddings through a complex neural network, enabling the network to capture more detailed semantic patterns. Empirical results demonstrate that our algorithm achieves relational HIN embeddings and significantly better performance at node classification and link prediction than state-of-the-art methods.
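To make the idea of relation-specific distances concrete, the following sketch shows a generic triplet margin loss evaluated in a relation-dependent space. It is not the loss defined in this paper. The fusion step here is a plain element-wise product, used as a hypothetical stand-in for the RNN encoder and triplet residual network, and the names `fuse` and `triplet_margin_loss` as well as the margin value are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def fuse(node: Vector, relation: Vector) -> List[float]:
    """Stand-in fusion (assumption): scale each node dimension by the relation.
    The paper fuses embeddings with a neural network, not a plain product."""
    return [n * r for n, r in zip(node, relation)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        relation: Vector, margin: float = 1.0) -> float:
    """Hinge loss max(0, d(a, p) - d(a, n) + margin) in the relation's space."""
    a, p, n = (fuse(v, relation) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


if __name__ == "__main__":
    anchor, positive, negative = [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]
    # The same three nodes are scored under two different relations.
    print(triplet_margin_loss(anchor, positive, negative, relation=[1.0, 0.0]))
    print(triplet_margin_loss(anchor, positive, negative, relation=[0.0, 1.0]))
```

The same three nodes incur no loss under the first relation vector and a positive loss under the second, which illustrates why a single relation-agnostic node embedding can be too coarse for an HIN.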

The remainder of this paper is outlined as follows: A brief review of related work is provided in Section 2. We describe the preliminary concepts and define the HIN embedding problem in Section 3. We propose a meta-schema-based random walk algorithm in Section 4 and the whole RTN framework in Section 5. In Section 6, we evaluate our algorithm using real-world networks. In Section 7, we present our conclusions.

Related work

Large numbers of network embedding algorithms already exist for homogeneous networks. Many methods learn node embeddings by fusing neighbour embeddings but often have a higher time complexity [20], [21], [22]. Roweis et al. proposed the locally linear embedding (LLE) approach [20], which assumes that every node is a linear combination of its neighbours in the embedding space. Kipf et al. proposed graph convolutional networks (GCNs) [21], which use a k-layer convolutional neural network to embed

Preliminaries

In this section, we introduce some basic mathematical concepts and the problem we address in this paper.

Definition 1 Heterogeneous Information Network (HIN)

An HIN [32] can be represented as $G=(V,E,\phi_v,\phi_e)$, where $V=\{v_1,v_2,\ldots,v_m\}$ is a set of $m$ nodes, $E=\{e_1,e_2,\ldots,e_n\}$ is a set of $n$ edges, $\phi_v: V \rightarrow T_v$ is a node type mapping function, and $\phi_e: E \rightarrow T_e$ is an edge type mapping function. $T_v$ and $T_e$ are sets of node and edge types, respectively. Heterogeneity is defined as satisfying $|T_v|+|T_e|>2$. In addition, the multi-relational network satisfies $|T_v|=1$ and $|T_e|>1$, which is a

Meta-schema-based random walk

In this section, we provide a detailed introduction of meta-schema-based random walks, which can sample semi-hard quadruplets from HINs. Because of the diverse node types and degrees, we design a flexible neighbourhood sampling strategy that enables us to preferentially sample certain types of nodes. First, we define the notion of a meta-schema, which describes complex constraints on the walk direction.
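The idea of preferentially sampling certain node types can be sketched with a simple type-weighted random walk. This is not the meta-schema-based algorithm of this paper, whose definition follows below. The toy graph, the per-type weights and the function name `type_biased_walk` are all hypothetical, and the sketch ignores edge types and node degrees.

```python
import random
from typing import Dict, List

# Toy bibliographic HIN (hypothetical data): adjacency lists plus node types.
NEIGHBOURS: Dict[str, List[str]] = {
    "a1": ["p1", "p2"],
    "a2": ["p2"],
    "p1": ["a1", "v1"],
    "p2": ["a1", "a2", "v1"],
    "v1": ["p1", "p2"],
}
NODE_TYPE: Dict[str, str] = {
    "a1": "author", "a2": "author",
    "p1": "paper", "p2": "paper",
    "v1": "venue",
}


def type_biased_walk(start: str, length: int, type_weight: Dict[str, float],
                     rng: random.Random) -> List[str]:
    """Walk up to `length` steps; each step picks a neighbour with probability
    proportional to the weight assigned to that neighbour's node type."""
    walk = [start]
    for _ in range(length):
        candidates = NEIGHBOURS[walk[-1]]
        weights = [type_weight.get(NODE_TYPE[c], 0.0) for c in candidates]
        if not candidates or sum(weights) == 0:
            break  # dead end: no neighbour with a positive type weight
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    rng = random.Random(0)
    # Favour papers, allow authors, and never step onto venues.
    print(type_biased_walk("a1", 6, {"paper": 2.0, "author": 1.0}, rng))
```

Setting a type's weight to zero excludes it from the walk entirely, while intermediate weights shift how often each node type appears in the sampled sequences.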

Definition 5 Meta-Schema

Given an HIN $G=(V,E,\phi_v,\phi_e)$ and its network schema $\tau_G=(T_v,T_e)$, a meta-schema $S$ is a weighted

Relational triplet network

The difficulty in learning HIN embeddings is that the same node may exhibit different properties in different semantic relationships. To learn these semantic patterns, we design the RTN framework to obtain node embeddings in specific relationship spaces by fusing node and relation embeddings through a complex neural network, as illustrated in Fig. 8. In general, the whole framework consists of three major components: a representation learning module, a node classification module and a link

Experiments

In this section, we first introduce the datasets, baseline methods and experimental settings and then present the experimental results.

Conclusion

In this paper, we proposed RTN, a novel HIN embedding algorithm. In the data sampling phase, we design a meta-schema-based random walk algorithm to extract biased semi-hard quadruplets that can accurately model the details of HINs. In the representation learning phase, we employ an RNN to encode heterogeneous relationships and construct a triplet residual network to aggregate node and relation embeddings. Then, a relational triplet loss is designed to optimize the distance of triplet embeddings

CRediT authorship contribution statement

Xiyue Gao: Conceptualization, Data curation, Methodology, Software, Writing - original draft. Jun Chen: Conceptualization, Project administration, Writing - review & editing. Zexing Zhan: Visualization, Investigation. Shuai Yang: Software, Validation.

Acknowledgements

This work has been supported by the National Key R&D Program of China (No. 2017YFC0803700) and the National Natural Science Foundation of China (No. U1736206).

Xiyue Gao received his bachelor's degree from Shandong University, Jinan, China, in 2014. Since September 2014, he has been working toward the Ph.D. degree under the Master-Doctor combined program in communication and information system at the School of Computer Science, Wuhan University, Wuhan, China. His research focuses on data mining, social network analysis and machine learning.

References (43)

  • S. Emmons et al., Analysis of network clustering algorithms and cluster quality metrics at scale, PLOS ONE (2016)
  • Y. Zhou et al., Social influence based clustering and optimization over heterogeneous information networks, ACM Trans. Knowl. Discovery Data (2015)
  • H. Cai et al., A comprehensive survey of graph embedding: problems, techniques, and applications, IEEE Trans. Knowl. Data Eng. (2018)
  • Y. Dong, N. V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in:...
  • T. yang Fu, W.-C. Lee, Z. Lei, HIN2vec: Explore meta-paths in heterogeneous information networks for representation...
  • H. Jiang et al., Semi-supervised learning over heterogeneous information networks by ensemble of meta-graph guided random walks
  • C. Yang et al.
  • Y. Shi et al., AspEm: Embedding learning by aspects in heterogeneous information networks
  • X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, P. S. Yu, Heterogeneous graph attention network, in: The World Wide Web...
  • C. Zhang et al., Heterogeneous graph neural network
  • L. Xu, X. Wei, J. Cao, P. S. Yu, Embedding of embedding (EOE): Joint embedding for coupled heterogeneous networks, in:...

Jun Chen received the M.S. degree in Instrumentation from Huazhong University of Science and Technology, Wuhan, China, in 1997, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2008. Dr. Chen is the deputy director of the National Engineering Research Center for Multimedia Software and a professor in the School of Computer Science, Wuhan University. His research interests include multimedia analysis, computer vision and security emergency information processing, where he has published over 50 papers.

Zexing Zhan received his bachelor's and master's degrees from the School of Computer Science at Wuhan University, Wuhan, China, in 2016 and 2019, respectively. His research focuses on data mining and social network analysis.

Shuai Yang received his bachelor's and master's degrees from the School of Computer Science at Wuhan University, Wuhan, China, in 2016 and 2019, respectively. His research focuses on data mining and machine learning.
