MINE: A Method of Multi-Interaction Heterogeneous Information Network Embedding

Abstract: Interactivity is the most significant feature of network data, especially in social networks. Existing network embedding methods have achieved remarkable results in learning network structure and node attributes, but they pay little attention to the multiple interactions between nodes, which limits the mining of potential deep interactions between nodes. To tackle this problem, we propose a method called Multi-Interaction heterogeneous information Network Embedding (MINE). First, we introduce the multi-interaction heterogeneous information network and extract complex heterogeneous relation sequences with a multi-interaction extraction algorithm. Second, we use a multi-relationship network fusion model based on the attention mechanism to fuse multiple interaction relationships. Finally, we apply a multi-task model so that the learned vectors contain richer semantic relationships. Extensive experiments show that our proposed method outperforms existing methods on multiple data sets.


Introduction
The development of the mobile Internet has promoted the rise and prosperity of online social media. By 2018, Facebook's monthly active users had exceeded 2 billion [Hu, Li, Yang et al. (2019)], and Sina Weibo's monthly active users had reached 462 million [Cui, Wang, Pei et al. (2018)]. A large amount of interactive data is continuously generated on social networks. This data embodies the dynamic relationships between people and is a true portrayal of their daily social life. Mining and using it effectively is of great significance for public opinion analysis, recommendation systems, and community detection [Hu, Li, Yang et al. (2019); Xu, Wang, Wu et al. (2020)]. Traditional graph representation methods suffer from high algorithmic complexity, are difficult to parallelize, and are hard to combine with current machine learning and deep learning models, so they are no longer applicable to large-scale information networks. Finding an efficient network representation method that transforms the traditional network structure representation into low-dimensional vectors carrying potential information is therefore an urgent problem [Wang, Cui and Zhu (2016)].
To solve the problems of traditional network representation, many researchers have turned to the field of network embedding, which learns such low-dimensional node vectors. Initially, classic methods such as Laplacian eigenmaps [Belkin and Niyogi (2002)] and IsoMap [Tenenbaum, De Silva and Langford (2000)] were used to reduce the dimensionality of graph-structured data. Methods of this kind first construct a relation graph from the extracted features using k-nearest neighbors or a similar method, and then transform the relation graph into a low-dimensional vector space [Tang, Qu, Wang et al. (2015)].
Because constructing the relation graph requires a large number of iterations and combination operations, such methods are time-consuming and cannot be applied to large-scale network embedding. To deal with large-scale networks, Ahmed et al. [Ahmed, Shervashidze, Narayanamurthy et al. (2013)] proposed a decomposition technique that allows dimensionality reduction to run on a distributed system. However, this method was not originally designed for network embedding, so the resulting low-dimensional vectors do not preserve the semantic relationship between the network structure and the nodes [Bourigault, Lagnier, Lamprier et al. (2014)].
Inspired by the natural language processing model Word2Vec [Mikolov, Chen, Corrado et al. (2013)], DeepWalk [Perozzi, Al-Rfou and Skiena (2014)] treats each node as a word, uses a random walk strategy to generate node sequences starting from each node, and then feeds these sequences to Word2Vec to obtain a low-dimensional vector representation of each node. Several improved methods based on DeepWalk followed. Node2Vec [Grover and Leskovec (2016)] replaces the completely random sequence generation strategy with one that blends the ideas of breadth-first search (BFS) and depth-first search (DFS), so that the resulting sequences better preserve the local and global information of the network structure. LINE [Tang, Qu, Wang et al. (2015)] introduces first-order and second-order proximity, which lets the model capture structural information more comprehensively, and uses negative sampling to accelerate learning. With the development of BP neural networks [Zhao, Zhang, Shi et al. (2019)], network embedding methods based on deep learning have been proposed. Chang et al. [Chang, Han, Tang et al. (2015)] introduce heterogeneous information, including text and images, into the network and make full use of the feature extraction ability of deep learning to achieve state-of-the-art performance.
Existing network embedding methods have achieved great success in node classification, link prediction, recommendation systems, and community discovery, but they still have the following shortcomings:
1) Most focus on homogeneous interactions and do not treat different relationships differently. In reality, however, the relationships between network nodes are intricate. A simple static network relationship cannot fully reflect the rich relationships between nodes, and the complex dynamic interactions between nodes are the key to mining their potential relationships. Take the Twitter network as an example: users are related through following, tweet forwarding, tweet commenting, and liking. Different relationships carry different semantics, and ignoring the differences between them loses a lot of potentially important information, as shown in Fig. 1.
2) The importance of different interactions and nodes is not distinguished. Not all relationships should be valued, and some relationships may have a negative effect on node representation learning. At the same time, positive interactions differ in how much importance they reflect [Chen, Wang, Wang et al. (2019)]. Considering all node relationships indiscriminately introduces noise and degrades the network embedding.
3) Network representations are learned and optimized with a single task. Most existing network embedding methods only learn and optimize for one task or target, such as node attribute similarity or node label classification. These methods may perform better after being optimized for their respective tasks [Zhu, Wang, You et al. (2019)]. However, the rich relationships between nodes cannot be fully explored this way, so the learned node representations are not well suited to other downstream applications.
To tackle these problems, we propose a method of multi-interaction heterogeneous information network embedding (MINE), which captures the different interactions between nodes and combines them to improve network embedding representation learning. The intuition behind MINE is that different interactions represent different semantic relationships in the low-dimensional vector space, and cross-pattern interactions can reveal deep potential semantic relationships (see Fig. 1). More importantly, the different interactions between nodes play different roles in the final network embedding. Based on this intuition, we first separate the multi-interaction network into networks of single interaction patterns according to the interaction relationship model and learn node representations for each separately. Next, we fuse the node representations from the various interaction networks using different weights. Inspired by the attention mechanism, we propose an attention strategy for calculating the weights of different nodes in different interaction patterns. Finally, we use multi-task learning to optimize representation learning.
To the best of our knowledge, this is the first network embedding method to fuse multi-modal interactive networks. Our contributions can be summarized as follows:
1) We introduce the multi-interaction heterogeneous information network (MIHIN) representation and extract complex heterogeneous relation sequences with a multi-interaction extraction algorithm.
2) We design a multi-relational network fusion model based on the attention mechanism to fuse multiple relationships.
3) We propose a multi-task network embedding model, so that the learned vectors cover richer semantic relationships.
4) Through extensive experiments, we show that our proposed method is superior in interaction prediction.
The overall framework of MINE is shown in Fig. 2. From bottom to top, it consists of the multi-interaction network representation and interaction pattern extraction algorithm, the single-mode network embedding layer, the attention-enhanced interaction fusion model, and the multi-task network embedding optimization layer. The input of the entire model is the sub-network edge lists of the various interaction patterns. The multi-interaction network representation combines the sub-networks of the various modes into a hybrid interactive network, and the node sequences for a specific pattern are extracted by the multi-interaction sequence extraction algorithm (Section 2.1). The single-mode network embedding layer maps each node's one-hot encoding to a low-dimensional vector. To better explore the interactions between users in different patterns and the influence of different patterns in social networks, the attention-based multi-network embedding layer fuses the multi-mode network vector representations (Section 2.2). Finally, the multi-task layer improves the feature extraction capability of the entire model and the representational power of the embedding vectors (Section 2.3).
Next, we will detail each part of the model.

Figure 3: Schematic diagram of hybrid interactive sequence extraction
To better explore the complex interactions in real social networks, we first fuse the single-interaction sub-networks shown in Fig. 3(a) to obtain the multi-interaction heterogeneous information network (MIHIN) shown in Fig. 3(b). In the MIHIN, two nodes may be connected by multiple edges of different colors, that is, several different interaction relationships may exist between the same two nodes. Inspired by metapath2vec [Dong, Chawla and Swami (2017)], we use a multi-pattern path matching algorithm to generate hybrid interactive node sequences. The difference is that metapath2vec distinguishes different types of nodes, whereas our setting distinguishes different types of interactions (as shown in Fig. 3(c)). Specifically, given a multi-interaction heterogeneous information network and an interaction pattern, the transition probability is defined by Eq. (1). The walk strictly follows the relationship pattern and traverses the pattern cyclically, as expressed by Eq. (2), with one of its terms derived from Eq. (3). Eq. (3) defines the walk strategy once the relationship required at the current step of the pattern is satisfied; the corresponding schematic diagram is shown in Fig. 4. Direction and distance are added to better capture the bi-directional distance relationship between nodes [Zhu, Sun, Cao et al. (2019)]. Eq. (3) depends on the bi-directional distance between two nodes and on two parameters, one representing the probability of returning to the previous node and the other the probability of moving away from the previous node.
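The pattern-guided walk described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge format `(source, relation, target)` and the names `build_mihin` and `pattern_walk` are hypothetical, neighbors are chosen uniformly, and the distance-based bias of Eq. (3) is not modeled.

```python
import random
from collections import defaultdict


def build_mihin(edges):
    """Index a multi-interaction network as adjacency[node][relation] -> neighbors.

    `edges` is an iterable of (source, relation, target) triples
    (a hypothetical input format chosen for this sketch).
    """
    adjacency = defaultdict(lambda: defaultdict(list))
    for source, relation, target in edges:
        adjacency[source][relation].append(target)
    return adjacency


def pattern_walk(adjacency, start, pattern, length, rng=random):
    """Walk from `start`, following the relation that `pattern` requires at each step.

    The pattern is applied cyclically. Neighbors are picked uniformly at random;
    the return / move-away bias of Eq. (3) is deliberately left out.
    """
    walk = [start]
    current = start
    for step in range(length - 1):
        relation = pattern[step % len(pattern)]
        candidates = adjacency[current][relation]
        if not candidates:  # no edge of the required type: stop early
            break
        current = rng.choice(candidates)
        walk.append(current)
    return walk


# Toy network in which every step has exactly one admissible neighbor.
edges = [
    ("a", "follow", "b"),
    ("b", "retweet", "c"),
    ("c", "follow", "d"),
    ("d", "retweet", "a"),
]
graph = build_mihin(edges)
walk = pattern_walk(graph, "a", ["follow", "retweet"], length=5)
print(walk)  # ['a', 'b', 'c', 'd', 'a']
```

Because the toy network offers a single candidate at each step, the walk is deterministic here; on a real network the sequences would vary with the random choices.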

Attention-enhanced interaction fusion model
In the previous section, we obtained a node vector representation for each interaction pattern. To distinguish the contributions of different interaction patterns to the final node vector representation, one basic approach is to set a different weight for each pattern, giving a larger ratio to patterns with greater impact. This approach has two problems. First, it is difficult to determine which interaction patterns contribute more to the final node vector representation. Second, setting the specific ratios is a challenging task, and it is difficult to find the best ratios for the fused representation. To solve these problems, we introduce the attention mechanism to learn the weights autonomously according to the target tasks. The ratio of each pattern is defined by Eq. (4) in terms of the node's representation in each interaction pattern and the corresponding ratio learned by the attention network. In particular, we keep the original vector of the follow relation pattern. The ratios are learned by taking this follow-pattern vector and the node's vector representations in the various interaction patterns as inputs. Finally, the fused vector is given by Eq. (5), which yields the final fusion vector of a node from its follow-pattern representation and its weighted representations in the interaction patterns. The ratios are continuously updated during training of the entire model until they reach their optimal values.
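The fusion step can be illustrated with a small Python sketch. Since Eqs. (4) and (5) are not reproduced here, the concrete form is an assumption: each pattern vector is scored against the follow vector through a hypothetical trainable vector `scale`, the scores are normalized with softmax, and the weighted pattern vectors are added to the unchanged follow vector. The function names are likewise illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(follow_vec, pattern_vecs, scale):
    """Score each pattern vector against the follow vector, then normalize.

    `scale` stands in for the trainable parameters of the attention network
    (an assumption of this sketch; one entry per embedding dimension).
    """
    scores = [
        sum(w * f * p for w, f, p in zip(scale, follow_vec, vec))
        for vec in pattern_vecs
    ]
    return softmax(scores)


def fuse(follow_vec, pattern_vecs, scale):
    """Keep the follow vector and add the attention-weighted pattern vectors."""
    weights = attention_weights(follow_vec, pattern_vecs, scale)
    fused = list(follow_vec)
    for weight, vec in zip(weights, pattern_vecs):
        for i, value in enumerate(vec):
            fused[i] += weight * value
    return fused, weights


follow = [1.0, 0.0]
patterns = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = fuse(follow, patterns, scale=[1.0, 1.0])
print(weights)  # the pattern aligned with the follow vector gets the larger weight
print(fused)
```

In a trained model the entries of `scale` would be updated by gradient descent together with the embeddings, which is how the ratios adapt to the target tasks.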

Multi-task network embedding optimization method
In the previous section, we obtained the final multi-interaction fusion vector representation of each node, which contains rich semantic information from each interaction pattern. In this part, we design a multi-task learning model that takes the fusion vector representations of nodes as a shared input and trains the tasks jointly to better mine the potential semantic information. The main task evaluates and optimizes the co-occurrence frequency of nodes within a specified window size of the node sequence. It therefore minimizes the objective function defined by Eq. (6), which is expressed in terms of the context node set of each node in the follow network and the co-occurrence frequency of node pairs. The auxiliary task is to predict the sequence of nodes in each interaction pattern through an LSTM network, where the sequence is the one obtained by the pattern-specific interaction sequence extraction algorithm. The input is the previous node, and the target is the next node in the sequence, as shown in Fig. 5. For each interaction pattern network, the goal of this task is to maximize the probability of the next node, as given by Eq. (7). We apply the SoftMax function to the output of the model to produce the final output, see Eq. (8). For each sequence, the cross-entropy loss function is used to measure the loss of the model, see Eq. (9).
Figure 5: The diagram of interaction node sequence prediction
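The auxiliary objective (SoftMax output followed by a cross-entropy loss on the next node) and its combination with the main task can be sketched as follows. The LSTM itself is omitted: `step_logits` stands for whatever per-step scores the sequence model produces. How the two losses are combined is not stated in the text, so the weighted sum and the `aux_weight` parameter are assumptions of this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the true next node."""
    return -math.log(softmax(logits)[target_index])


def sequence_loss(step_logits, next_nodes):
    """Average cross-entropy over the prediction steps of one node sequence."""
    losses = [cross_entropy(l, t) for l, t in zip(step_logits, next_nodes)]
    return sum(losses) / len(losses)


def multitask_loss(main_loss, auxiliary_losses, aux_weight=1.0):
    """Main co-occurrence loss plus weighted per-pattern auxiliary losses.

    The weighted sum and `aux_weight` are assumptions, not taken from the paper.
    """
    return main_loss + aux_weight * sum(auxiliary_losses)


# Two prediction steps over a vocabulary of three nodes.
aux = sequence_loss([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]], next_nodes=[0, 1])
total = multitask_loss(main_loss=0.5, auxiliary_losses=[aux], aux_weight=0.1)
print(aux, total)
```

With uniform logits over four nodes, `cross_entropy` returns log 4, the loss of a model that has learned nothing about the sequence.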

Experiment
In this section, we validate the performance of the model through interaction prediction tasks on several real-world network datasets, using accuracy as the evaluation metric.

Datasets
We used the following open source real social network datasets from different platforms in our experiments.
• YouTube Dataset. A multi-dimensional network consisting of various types of interactions, which can be used to study shared communities among heterogeneous interactions. The data set was crawled in December 2008 from YouTube, a video sharing site where various interactions occur between users. It contains five interaction pattern networks, from which we selected the following: the contact network between the 15,088 users; the number of shared friends between two users in the 848,003 contacts; the number of shared subscriptions between two users; and the number of shared favorite videos.
• Higgs Twitter Dataset. The Higgs dataset was built by monitoring the spreading processes on Twitter during the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4 July 2012. The messages posted on Twitter about this discovery between 1 and 7 July 2012 are considered. It contains four interaction pattern networks: friend/follower social relationships among users; retweeting (retweet network); commenting on existing tweets (comment network); and mentioning other users (mention network).
Detailed statistics for each data set are shown in Tab. 1. As can be seen from the table, the networks differ in scale and in average node degree, which reflects the unbalanced distribution of nodes in real social networks and allows the model's performance to be verified under realistic conditions.

Experiment setup
For the three benchmark methods DeepWalk, LINE, and Node2Vec, which cannot handle heterogeneous interactions, we use the constructed multi-interaction heterogeneous information network as input and treat all edges as the same type. For all network embedding methods, we set the parameters according to the size and characteristics of each data set, but within each data set the parameters are the same across methods. For the relatively small Higgs Twitter dataset, we set the output dimension d to 128 and, since the shortest path length of its network structure is only 9, the sequence length to 9; for the larger datasets Tencent Weibo and YouTube, we set the output dimension to 256 and the sequence length to 20. The learning rate is uniformly set to 0.001, the window size is set to 7, and the optimal values of the remaining two parameters are found by grid search.
For our proposed model, we also need to set the interaction patterns. For the YouTube dataset, we choose the friend relationship pattern and three interaction patterns: 1) a pattern used to find people who share a subscription with users who interact closely with a given user; 2) a pattern used to find people who share a hobby with users who interact closely with a given user; 3) a pattern used to find people who share a hobby with users who have a common subscription with a given user. For the Twitter dataset, we choose the friend relationship and three interaction patterns: 1) a pattern used to find people mentioned by users who frequently retweet a given user's tweets; 2) a pattern used to find people who are commented on by users who often retweet a given user's tweets; 3) a pattern used to find people who are mentioned by users who often comment on a given user's tweets.
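The hyperparameter settings stated in this section can be collected in a small configuration object, sketched below. Only the numeric values come from the text; the class name `EmbeddingConfig`, its field names, and the dictionary keys are hypothetical. The grid-searched parameters and the interaction patterns are omitted because their values are not given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    """Per-dataset settings; defaults are the values shared by all datasets."""
    embedding_dim: int         # output dimension d
    sequence_length: int       # length of each generated node sequence
    learning_rate: float = 0.001
    window_size: int = 7


# Keys are illustrative labels for the datasets named in the text.
CONFIGS = {
    "higgs_twitter": EmbeddingConfig(embedding_dim=128, sequence_length=9),
    "tencent_weibo": EmbeddingConfig(embedding_dim=256, sequence_length=20),
    "youtube": EmbeddingConfig(embedding_dim=256, sequence_length=20),
}

print(CONFIGS["higgs_twitter"])
```

Freezing the dataclass keeps the settings identical for every method run on the same dataset, which mirrors the requirement that parameters be the same within each data set.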

Experiment result
After setting the parameters, we use the node embeddings as input to a node interaction prediction task to evaluate our proposed model and compare it with the benchmark models. The node interaction prediction task refers to predicting whether two nodes have one of the interaction relationships set in our experiment. Note that an interaction relationship between two nodes does not require a direct connection between them. For example, two nodes in Fig. 1 can have no directly connecting edge and still be related through the Follow-Retweet-Review and Follow-Retweet-Like interaction relationships.
On each dataset, we select different proportions of nodes as training data and use the rest as the validation set. Each reported result is the average over 10 different node samplings. Tab. 2 shows the performance of each model on the two datasets. As can be seen from the table, our proposed model outperforms the benchmark models. Among them, LINE obtains the worst result, mainly because it only considers first-order and second-order proximity: for two nodes that are far apart, it cannot learn the semantic relationship and therefore cannot properly predict potential long-distance interactions.
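The evaluation protocol (a given training proportion, accuracy on the held-out part, averaged over 10 random samplings) can be sketched as follows. The text does not specify the classifier, so `fit` is a hypothetical callable that receives the training examples and returns a prediction function. The sketch also splits labeled examples directly rather than sampling nodes, which is a simplification.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def averaged_accuracy(samples, train_ratio, fit, runs=10, seed=0):
    """Average held-out accuracy over `runs` random train/validation splits.

    `samples` is a list of (features, label) pairs. `fit(train)` must return
    a function mapping features to a predicted label (hypothetical interface).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        train, held_out = shuffled[:cut], shuffled[cut:]
        predict = fit(train)
        predictions = [predict(x) for x, _ in held_out]
        scores.append(accuracy(predictions, [y for _, y in held_out]))
    return sum(scores) / len(scores)


def fit_sign_rule(train):
    """Toy stand-in for a trained classifier: predicts 1 for positive inputs."""
    return lambda x: int(x > 0)


# Toy data whose label is fully determined by the sign of the feature.
samples = [(x, int(x > 0)) for x in range(-5, 5)]
score = averaged_accuracy(samples, train_ratio=0.5, fit=fit_sign_rule)
print(score)  # 1.0
```

Fixing `seed` makes the 10 samplings reproducible, so different embedding methods can be compared on identical splits.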

Parameter sensitivity analysis
In this section, we analyze the sensitivity of our proposed model to different parameters, including the dimension of the network embedding (d), the number of training iterations (N), and the length of the sequence (L). We set d to 16, 32, 64, 128, 256, and 512; increase the sequence length from 10 to 50; increase the number of training iterations from 50 to 1000; and observe the change in accuracy. Once d exceeds 128, the accuracy becomes stable, which shows that our model can capture the potential mixed interaction information between nodes with low-dimensional embedding vectors. Similar results are shown in Figs. 5(b) and 5(c). When the sequence length is greater than 20, our model also obtains stable and good results. As for training, the model converges after only 300 iterations, which reflects its efficiency.

Conclusion and discussion
In this paper, to address multi-interaction embedding between users in social networks, we propose the MINE model. First, we introduce the multi-interaction heterogeneous information network and extract complex heterogeneous relation sequences with a multi-interaction extraction algorithm. Second, we use a multi-relationship network fusion model based on the attention mechanism to fuse multiple interaction relationships. Finally, applying a multi-task model makes the learned vectors cover richer semantic relationships. Extensive experiments show that our proposed method outperforms existing methods on multiple data sets. However, our model also has some shortcomings. For example, when analyzing interactions, we do not consider their chronological order, and we do not use textual sentiment information to better explore the relationships between users. We leave these for future work.