Exploiting Meta-Path with Attention Mechanism for Fine-Grained User Location Prediction in LBSNs

Based on the huge volumes of user check-in data in LBSNs, users’ intrinsic mobility patterns can be well explored, which is fundamental for predicting where a user will visit next given his/her historical check-in records. As there are various types of nodes and interactions in LBSNs, they can be treated as a Heterogeneous Information Network (HIN) from which multiple semantic meta-paths can be extracted. Inspired by the recent success of meta-path context based embedding techniques in HINs, in this paper we design a deep neural network framework that leverages various meta-path contexts for fine-grained user location prediction. Experimental results on two real-world LBSN datasets demonstrate that the proposed approach is more effective than the compared methods under various evaluation metrics.


Introduction
With the rapid advancement of smart mobile terminals and communication technologies, recent years have witnessed the prosperity of Location-based Social Networks (LBSNs). Typical LBSN platforms such as Foursquare and Yelp attract thousands of new users every day. Through the location-aware services provided by these platforms, a large amount of user data associated with geo-tagged identifiers has been generated (e.g. comments on restaurants and check-in records at scenic spots). Such data is rich in spatio-temporal and semantic information, which can be exploited to mine user mobility patterns and thus predict where a user will go on his/her next visit. User location prediction has proved beneficial for applications such as intelligent travel route recommendation [1,2] and personalized product advertising [3,4].
Although predicting users' next visit location has been a hot topic in recent years, and many embedding frameworks [5][6][7][8] have been proposed to learn distributed representations of users and locations in a latent space, these works mainly treat LBSNs as homogeneous networks, which neglects the hidden semantics in interactions between different types of nodes.
Figure 1. (a) An LBSN with three types of nodes, i.e. user nodes, location nodes and attribute nodes, and four types of relations, namely User-User, User-Location, Location-Location and Location-Attribute. (b) The four types of meta-paths extracted for a $\langle u_i, l_j \rangle$ pair, with an example for the $\langle u_1, l_2 \rangle$ pair.
Indeed, as noted in [9,10], LBSNs are heterogeneous networks where different types of nodes and relationships can be observed. Therefore, the heterogeneous information network (HIN) embedding techniques, which are newly emerging learning frameworks with flexibility in characterizing diverse heterogeneous data [11], should be applicable for modeling interactions between user and location nodes in LBSNs.
To illustrate the motivation, we present a visual example in Figure 1. As we can see from Figure 1(a), there are three types of nodes in LBSNs, i.e. user nodes, location nodes and attribute nodes. Specifically, we use the venue category information associated with a venue as its attribute. LBSN platforms such as Foursquare divide all Points of Interest (POIs) into 10 categories. Following this convention, we give each category a one-capital-letter abbreviation, where 'A' stands for 'Arts & Entertainment', 'C' stands for 'College & University', 'F' stands for 'Food', etc. Besides, there are four types of relations in LBSNs: the user-user relation, which represents social friendship; the user-location relation, which indicates check-in behavior; the location-location relation, which reflects successive transition patterns or geographical influence; and the location-attribute relation, which indicates the category a location belongs to. With regard to meta-path contexts, as we focus on user location prediction, we only explore the interactions between users and locations, i.e. meta-paths for $\langle u_i, l_j \rangle$ pairs. Considering that long meta-paths are likely to introduce noise [12], we only select short meta-paths with a sequence length of at most four, and for a specific user-location pair $\langle u_i, l_j \rangle$ four types of meta-paths are extracted, i.e. 'UUL', 'ULL', 'ULUL' and 'ULAL'. A detailed example of the meta-path contexts for the $\langle u_1, l_2 \rangle$ pair is shown in Figure 1(b). As can be seen, we merge the four meta-path contexts into the interaction between a user and a location; in other words, we use the triple $\langle u_i, \text{meta-paths}, l_j \rangle$ to characterize the context in which user $u_i$ visits venue $l_j$. To address the user location prediction problem, we design a deep neural network framework incorporating meta-path contexts as well as an attention mechanism.
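The four meta-path types can be enumerated on a toy graph. The following is a minimal sketch, assuming a small hypothetical LBSN stored as adjacency dictionaries; the node names, the `friends`, `checkins`, `transitions` and `category` dictionaries and the `meta_path_instances` helper are illustrative and are not taken from the paper's implementation, which samples instances by random walks.

```python
# Toy heterogeneous graph (all data hypothetical, for illustration only).
friends = {"u1": {"u2"}, "u2": {"u1"}}           # User-User
checkins = {"u1": {"l1"}, "u2": {"l1", "l2"}}    # User-Location
transitions = {"l1": {"l2"}, "l2": set()}        # Location-Location (l -> l')
category = {"l1": "F", "l2": "F"}                # Location-Attribute


def meta_path_instances(u, l):
    """Enumerate UUL, ULL, ULUL and ULAL instances linking user u to location l."""
    paths = {"UUL": [], "ULL": [], "ULUL": [], "ULAL": []}
    # UUL: u -> friend -> l
    for f in sorted(friends.get(u, ())):
        if l in checkins.get(f, ()):
            paths["UUL"].append((u, f, l))
    for l1 in sorted(checkins.get(u, ())):
        # ULL: u -> visited location -> l via a location-location relation
        if l in transitions.get(l1, ()):
            paths["ULL"].append((u, l1, l))
        # ULUL: u -> visited location -> another visitor of it -> l
        for v in sorted(checkins):
            if v != u and l1 in checkins[v] and l in checkins[v]:
                paths["ULUL"].append((u, l1, v, l))
        # ULAL: u -> visited location -> shared category -> l
        if l1 != l and category.get(l1) == category.get(l):
            paths["ULAL"].append((u, l1, category[l1], l))
    return paths


instances = meta_path_instances("u1", "l2")
```

Exhaustive enumeration is only practical on small graphs; on real check-in data a sampling strategy would be needed to bound the number of instances per pair.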
The remainder of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 introduces the deep neural network framework. In Section 4 we evaluate the prediction performance, and Section 5 concludes our work.

Related Works
User location prediction has been a challenging and exciting problem in recent years. Many previous works address it with collaborative filtering (CF) strategies [13][14][15], typically through matrix factorization. The basic objective of matrix factorization methods is to factorize a user-location matrix into two low-rank matrices, which hold the latent representations of users and locations, respectively. A user's visit probability at a candidate location is then fitted by the inner product of the two corresponding vectors. However, CF-based methods usually suffer from the cold-start problem, which means they are generally infeasible for new users or new locations that are absent from the training data.
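The scoring step of matrix factorization can be illustrated as follows. This is a sketch of scoring only, with hand-picked hypothetical latent vectors (`user_vecs`, `loc_vecs`) rather than learned ones, and the `rank_locations` helper is an illustrative name.

```python
def dot(p, q):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(p, q))


# Hypothetical latent vectors; in practice these come from factorizing X.
user_vecs = {"u1": [0.9, 0.1]}
loc_vecs = {"l1": [1.0, 0.0], "l2": [0.0, 1.0], "l3": [0.5, 0.5]}


def rank_locations(u):
    """Rank all locations for user u by inner-product score (highest first)."""
    if u not in user_vecs:
        return None  # cold start: no latent vector was learned for this user
    scores = {l: dot(user_vecs[u], q) for l, q in loc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_locations("u1")
```

The `None` branch makes the cold-start limitation concrete: a user with no training records has no row in the factorized matrix and therefore no score.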
In recent years, embedding techniques, especially Heterogeneous Information Network (HIN) based embedding techniques [11,12,16], have drawn much attention from researchers. As it can depict semantic relations between different kinds of nodes using meta-paths [17], HIN embedding is becoming increasingly popular in venue recommendation, and can thus be effective for location prediction with minor modification. Hu et al. [12] propose to characterize a three-way interaction $\langle u_i, \text{meta-paths}, l_j \rangle$ for semantic item recommendation. In particular, they use a co-attention mechanism to learn the weight of each meta-path context. Inspired by [12], we design a deep neural network framework incorporating meta-path contexts as well as an attention mechanism for user location prediction. Unlike [12], we adopt a different random walk method to sample meta-path instances based on predefined rules, and simplify the attention mechanism for learning meta-path contexts. Besides, as we only have positive samples due to the implicit feedback in user check-in data, we use a pairwise learning algorithm to maximize the margin between visited and unvisited locations, through which the involved parameters are optimized.
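The pairwise idea can be sketched with a BPR-style objective. The exact loss used in this work is not stated here, so the logistic form below is an assumption, and `pairwise_loss` and `sample_negative` are hypothetical helper names.

```python
import math
import random


def pairwise_loss(score_pos, score_neg):
    """-ln(sigmoid(score_pos - score_neg)), written as ln(1 + exp(-diff)).

    The loss shrinks as the visited location outscores the unvisited one,
    which pushes the margin between the two scores to grow.
    """
    diff = score_pos - score_neg
    return math.log1p(math.exp(-diff))


def sample_negative(visited, all_locations, rng):
    """Draw one location the user has not visited (a sampled negative)."""
    candidates = sorted(set(all_locations) - set(visited))
    return rng.choice(candidates)


rng = random.Random(0)
negative = sample_negative({"l1"}, ["l1", "l2", "l3"], rng)
```

Each training step would pair a visited location with one sampled negative for the same user and minimize the loss over the model parameters.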

Preliminaries
In this paper, we aim to predict the next visit location of each user based on his/her historical check-in records in LBSNs. For a given LBSN dataset, suppose we have M users, i.e. $U = \{u_1, u_2, \ldots, u_i, \ldots, u_M\}$, and N locations, i.e. $L = \{l_1, l_2, \ldots, l_i, \ldots, l_N\}$. As a social network dataset, it also contains social relations (e.g. friendship relations) among some users. A venue in an LBSN is defined as a uniquely identified site such as a cinema or a coffee shop, and it is associated with geographical coordinates as well as category information indicating which type of location it belongs to. Based on the data, we can construct a user-venue check-in matrix $X \in \mathbb{R}^{M \times N}$, where each entry $x_{u,i} \in \{0, 1\}$ represents whether user u has visited location i. Given the above preliminaries, we can formulate the location prediction problem to be solved in this work.
Definition 1. User Next Location Prediction. Given the implicit feedback matrix $X \in \mathbb{R}^{M \times N}$ and the check-in records of each user u, $C_u = \{\langle u, l_1, t_1 \rangle, \langle u, l_2, t_2 \rangle, \langle u, l_3, t_3 \rangle, \ldots, \langle u, l_n, t_n \rangle\}$, where each $\langle u, l_i, t_i \rangle$ denotes a check-in of user u at location $l_i$ at time $t_i$, we aim to predict the next venue that user u will visit after time $t_n$. To do so, we rank all N candidate locations so that the location u actually visits next is placed at the top of the ranking list.
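The objects in this formulation can be built directly from raw check-in triples. The sketch below assumes hypothetical records and illustrative helper names (`records`, `rank_candidates`); it constructs the binary matrix X and the time-ordered sequence of each user, and shows the ranking step on made-up scores.

```python
# Hypothetical check-in records <u, l, t>; identifiers and times are made up.
records = [("u1", "l2", 20), ("u1", "l1", 10), ("u2", "l2", 30)]

users = sorted({u for u, _, _ in records})
locations = sorted({l for _, l, _ in records})
u_idx = {u: i for i, u in enumerate(users)}
l_idx = {l: j for j, l in enumerate(locations)}

# Binary user-venue matrix X of shape M x N (implicit feedback).
X = [[0] * len(locations) for _ in users]
for u, l, _ in records:
    X[u_idx[u]][l_idx[l]] = 1

# C[u]: check-ins of user u ordered by time, as (location, time) pairs.
C = {u: [] for u in users}
for u, l, t in sorted(records, key=lambda r: r[2]):
    C[u].append((l, t))


def rank_candidates(scores):
    """Sort candidate locations by predicted score, best first."""
    return sorted(scores, key=scores.get, reverse=True)
```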

Our Model
Following the main idea of [12], we also leverage meta-path contexts to model the semantics of a user-location pair $\langle u_i, l_j \rangle$. Specifically, we extract four meta-paths (see Figure 1(b)) and fuse the meta-path contexts with the user and location embeddings through an attention mechanism. Owing to the incorporation of such semantic information, the model is expected to be more interpretable than previous works. The overall framework of the deep neural network is shown in Figure 2.
Figure 2. Framework of the deep neural network for user location prediction, where '⊕' denotes the vector concatenation operation and $\hat{x}_{i,j}$ represents the approximated real-valued preference of user $u_i$ for location $l_j$. Besides, $m_{i \to j}$ denotes the aggregated meta-path context embedding for the input $\langle u_i, l_j \rangle$ pair.
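The fusion step can be sketched in a few lines. The exact attention form is not specified in this section, so the scoring function below (a dot product between each meta-path context embedding and the sum of the user and location embeddings) is purely an assumption, as are the embedding values and the `attend` helper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(user_vec, loc_vec, contexts):
    """Weight each meta-path context and aggregate them into one vector.

    contexts maps a meta-path name to its context embedding.
    Assumed score: dot(context, user_vec + loc_vec).
    """
    query = [a + b for a, b in zip(user_vec, loc_vec)]
    names = sorted(contexts)
    scores = [sum(q * c for q, c in zip(query, contexts[n])) for n in names]
    weights = softmax(scores)
    dim = len(user_vec)
    aggregated = [
        sum(w * contexts[n][k] for w, n in zip(weights, names))
        for k in range(dim)
    ]
    return dict(zip(names, weights)), aggregated


# Hypothetical 2-dimensional embeddings.
e_u, e_l = [1.0, 0.0], [0.0, 1.0]
contexts = {"UUL": [1.0, 1.0], "ULL": [0.0, 0.0],
            "ULUL": [1.0, 0.0], "ULAL": [0.0, 1.0]}
weights, m_ij = attend(e_u, e_l, contexts)

# List concatenation plays the role of the vector concatenation in Figure 2.
fused = e_u + m_ij + e_l
```

The concatenated vector would then be passed through fully connected layers to produce the preference score; the learned attention weights indicate which meta-path contributes most to a given pair.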

Datasets
We utilize two real-world LBSN datasets for experimental evaluation, both of which are crawled from Foursquare. The first Foursquare dataset is provided by Xu et al. [18] and includes 10,901 users who live in New York (NY) with more than 764,328 check-ins in total. The second Foursquare dataset comes from Yin et al. [19] and contains the check-in history of 4,163 users who live in California (CA), USA. Both datasets contain explicit undirected friendship relations. To filter out noisy data in both datasets, we eliminate users who have fewer than 10 check-ins, as well as venues with fewer than 10 visitors. The basic statistics of both datasets are shown in Table 1. Note that for each user in the datasets, we use his/her first 80% of check-ins as the training data, the following 10% for validation, and the last 10% for testing. The overall framework is implemented using Keras with TensorFlow as the backend, where all hyper-parameters are set according to [12].
Table 1. Basic statistics of two Foursquare datasets.
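The preprocessing described above can be sketched as follows. The order of the two filters and the single (non-iterative) pass are assumptions, and `filter_sparse` and `chronological_split` are hypothetical helper names.

```python
from collections import defaultdict


def filter_sparse(records, min_checkins=10, min_visitors=10):
    """Drop sparse users, then sparse venues (one pass each, assumed order).

    records is a list of (user, location, time) triples.
    """
    per_user = defaultdict(int)
    for u, _, _ in records:
        per_user[u] += 1
    records = [r for r in records if per_user[r[0]] >= min_checkins]

    visitors = defaultdict(set)
    for u, l, _ in records:
        visitors[l].add(u)
    return [r for r in records if len(visitors[r[1]]) >= min_visitors]


def chronological_split(user_records):
    """Split one user's check-ins by time into 80% / 10% / 10%."""
    ordered = sorted(user_records, key=lambda r: r[2])
    n = len(ordered)
    a, b = n * 8 // 10, n * 9 // 10  # integer cut points avoid float rounding
    return ordered[:a], ordered[a:b], ordered[b:]


# Hypothetical user with ten check-ins at times 0..9, given out of order.
demo = [("u", f"l{i}", i) for i in range(10)]
train, val, test = chronological_split(list(reversed(demo)))
```

Splitting per user and by time keeps every test check-in later than that user's training check-ins, which matches the next-location setting.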

Evaluation Metrics
In this work, we adopt two evaluation metrics commonly used in user location prediction [9]: Acc@N and APR. Acc@N is often employed to evaluate recommendation or prediction accuracy. A test case counts as a hit if the ground-truth location is ranked within the top-N list, i.e. the next visit location of the user is predicted successfully. Acc@N is calculated by
$$\mathrm{Acc@}N = \frac{\#\mathrm{hit@}N}{|D_{test}|}$$
where #hit@N is the number of hits over the whole test set and $|D_{test}|$ is the number of test cases, i.e. we take the mean value over all test cases as the final result.
The second metric is based on the percentile rank (PR) of the ground-truth location, calculated by
$$\mathrm{PR} = \frac{|L| - \mathrm{rank}(k) + 1}{|L|}$$
where |L| is the total number of candidate locations and rank(k) is the rank of the ground-truth location in the list. We again take the mean value over all prediction instances as the final result, i.e. Average PR (APR).
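Both metrics are straightforward to compute once each test case has a full ranking of the candidate locations. The sketch below assumes the common percentile-rank definition PR = (|L| - rank + 1) / |L| and that every ranked list contains all |L| candidates; the function names and the example rankings are illustrative.

```python
def acc_at_n(ranked_lists, truths, n):
    """Fraction of test cases whose ground truth is in the top-n list."""
    hits = sum(1 for ranked, t in zip(ranked_lists, truths) if t in ranked[:n])
    return hits / len(truths)


def percentile_rank(ranked, truth):
    """(|L| - rank + 1) / |L|, where rank is the 1-based position of truth."""
    rank = ranked.index(truth) + 1
    return (len(ranked) - rank + 1) / len(ranked)


def apr(ranked_lists, truths):
    """Mean percentile rank over all test cases."""
    prs = [percentile_rank(r, t) for r, t in zip(ranked_lists, truths)]
    return sum(prs) / len(prs)


# Two hypothetical test cases over four candidate locations.
ranked_lists = [["l1", "l2", "l3", "l4"], ["l3", "l1", "l2", "l4"]]
truths = ["l1", "l2"]
```

With these inputs the first ground truth is ranked first (PR = 1.0) and the second is ranked third (PR = 0.5), so a higher APR means the true location tends to sit closer to the top of the list.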

Comparison Methods
We select two models for comparison:
(1) PRME [20], which embeds users and locations into the same latent space to capture user transition patterns. The geographical influence is incorporated through a simple coefficient.
(2) GE [6], which jointly learns the embeddings of locations, regions as well as venue categories.

Result Analysis
First of all, we compare the location prediction performance of all methods. To distinguish our model from the comparison models, we denote it as MPLocPre, which is short for Meta-Path based Location Prediction. Table 2 summarizes the numerical results in terms of Acc@10, Acc@50 and APR. As we can see, the location prediction model that considers meta-path contexts achieves the highest prediction performance on all metrics.

Conclusion
In this paper, we investigate the problem of user next location prediction based on user check-in data in LBSNs. Inspired by the recent success of HIN embedding techniques, we design a deep neural network framework incorporating meta-path contexts and a simple attention mechanism. To deal with the implicit feedback, we optimize the model with negative sampling and use a pairwise training method to maximize the margin between visited and unvisited locations. Experimental results verify the effectiveness of the proposed approach, as it achieves performance comparable to or even better than that of PRME and GE.