Exploiting high-order behaviour patterns for cross-domain sequential recommendation

Cross-domain sequential recommendation aims to predict the next item from a sequence of recorded user behaviours in multiple domains. We propose a novel Cross-domain Sequential Recommendation approach with Graph-Collaborative Filtering (CsrGCF) to alleviate the sparsity of user-interaction data. Specifically, we design time-aware and relation-aware graph attention mechanisms with collaborative filtering to exploit users' high-order behaviour patterns and improve results in both domains. The Time-aware Graph Attention mechanism (TGAT) learns the inter-domain sequence-level representation of items. The Relationship-aware Graph Attention mechanism (RGAT) learns collaborative feature representations of items and users. Moreover, to improve recommendation performance in the two domains simultaneously, we propose a Cross-domain Feature Bidirectional Transfer module (CFBT), which transfers the features a user shares across both domains while retaining the user's domain-specific features in each domain. Finally, cross-domain and sequential information are combined to recommend the next items that users are likely to prefer. Extensive experiments on two real-world datasets show that CsrGCF outperforms several state-of-the-art baselines in terms of Recall and MRR. These results demonstrate the necessity of exploiting users' high-order behaviour patterns for cross-domain sequential recommendation, and show that retaining domain-specific features is an important step in bidirectional cross-domain feature transfer.


Introduction
Sequential recommendation (SR) predicts the next items by leveraging the user's historical interactions. Many existing methods follow the same propagation rules in GNN to model users' behaviour sequences and achieve promising results. However, such methods have limited capability to capture accurately the intention of users who lack rich historical behaviours. Data sparsity has become a significant challenge for most sequential recommendation methods. Cross-domain sequential recommendation (CSR) considers users' cross-domain historical behaviours to alleviate data sparsity and improve the performance of personalised recommendations. Data from multiple domains can reveal users' current preferences more effectively, and this idea has been applied extensively in advertising, e-commerce and web search (P. Li et al., 2021; Z. Wang et al., 2019). For example, after watching a movie in the movie domain, a person is more inclined to listen to the movie's theme in the music domain or read the novel of the same name in the book domain, and then to watch another similar movie in the movie domain.
Transfer learning (TL) (Pan & Yang, 2009) is an effective solution to the data sparsity problem: it recognises knowledge learned in related domains and applies it to a new domain. In real life, a client unavoidably interacts with multiple domains to fulfil everyday needs (Z. Cao et al., 2021). When the interaction history in domain S is limited, it is natural to draw common knowledge from a related domain T that contains more data. Transfer learning was first introduced to recommender systems to address data sparsity by leveraging explicit feedback from multiple domains to improve collaborative filtering (B. Cao et al., 2010; Hu et al., 2018, 2019; B. Li et al., 2009). Many studies have shown that transfer learning from multiple domains can improve recommendation performance. Later methods adopt deep learning as the base model to enhance the effect of transfer learning (Guo et al., 2021; Hong et al., 2020; Z. Liu et al., 2019; Zhao et al., 2019). This approach leverages not only the information shared between domains but also the differences between them. More effective methods have since been built on this basis. BITGCF (M. Liu et al., 2020) uses bi-directional transfer learning to address data sparsity in cross-domain recommendation. DSLN (H. Liu et al., 2022) consists of two deep learning networks that extract user and item features from users' reviews and select useful knowledge from the auxiliary domains.
Despite the effectiveness of these methods, alleviating data sparsity remains a significant challenge because of two main issues. First, existing methods derive user representations from the features common to different domains without considering domain-specific features. Such simple knowledge transfer models users' preferences and interests as exactly the same in both domains, which can reduce the benefit of transfer, especially when the two domains have low similarity. A model should therefore capture a user's features in a single domain while also learning the user's features from multiple domains. Second, current approaches model sequential dependencies between items while ignoring interaction intervals, considering only the relative order of item transitions. In fact, the interaction time interval is closely related to the sequential dependencies between items.
Given the above limitations and inspired by the idea of graph-collaborative filtering, we propose Cross-domain Sequential recommendation with Graph-Collaborative Filtering (CsrGCF) to address the data sparsity issue by mining users' high-order behaviour patterns and transferring knowledge among different domains. Specifically, we first propose TGAT to capture complex sequential dependencies between items from the user's historical behaviour sequence. We then propose a novel module, RGAT, to learn collaborative feature representations of items and users from the user-item interaction graph. Finally, motivated by the observation that users have both common shared features and domain-specific features, we propose CFBT, which better fits real-life data and achieves better performance. Furthermore, we empirically demonstrate that the proposed model outperforms all state-of-the-art models on two real datasets. Compared to previous methods, the proposed model makes the following contributions:
• We propose a new cross-domain sequential recommendation algorithm by applying Neural Graph-Collaborative Filtering to recommendation. In CsrGCF, we design time-aware and relation-aware graph attention mechanisms to exploit users' high-order behaviour patterns for promising results in both domains.
• A novel Cross-domain Feature Bi-directional Transfer module (CFBT) is proposed to capture representations of common shared features and domain-specific features. Moreover, we conduct simulation experiments to validate the importance of users' domain-specific features.
• We conduct extensive experiments on real-world datasets, validating that CsrGCF can improve recommendation performance. Furthermore, the proposed model effectively alleviates the data sparsity issue.
This paper is organised as follows. Section 2 briefly reviews recent literature related to cross-domain sequential recommendation. Section 3 explains the CsrGCF method in detail. Section 4 presents the experiments and analysis. Finally, Section 5 summarises the paper and outlines future work.

GNN-based sequential recommendation
Rich data in a recommender system can be represented as a graph. In recent years, as the capability to learn from graph data has improved, graph neural networks (GNNs) (G. Li et al., 2019; Z. Wu et al., 2020) have become one of the most successful methods for recommender systems and have been widely applied in various recommendation tasks. Recent advances in GNNs further boost sequential behaviour prediction by modelling behaviour sequences as graphs (K. Xu et al., 2018). Wu et al. were the first to construct a directed graph from session data, proposing SR-GNN (S. Wu et al., 2019), which applies a gated graph neural network (GGNN) (Weinzierl, 2021) to learn feature representations of users and items. However, its design is overly complex. Subsequently, stacking multiple embedding propagation layers over the user-item interaction graph helped to infer users' preferences. Specifically, several studies (Cui et al., 2020; Qiu et al., 2019; Y. Wu et al., 2018; C. Xu et al., 2019) model the sequential order of items and generate predictions from historical sequential behaviours. GCE-GNN employs a session-aware attention mechanism to recursively incorporate the neighbours' embeddings of each node on the graph. C. Ma et al. (2020) propose a memory-augmented graph neural network to capture items' short-term contextual information and long-range dependencies. In summary, the above works consider only simple relationships between sequential item pairs. However, complex relationships between non-sequential pairs of items can also improve recommendation effectiveness.
In this paper, we propose time-aware and relation-aware graph neural networks to exploit the user's high-order behaviour patterns for overcoming data sparsity issues and improving recommendation performance.

Cross-domain recommendation
Cross-domain recommendation, which leverages data from multiple domains, has proven effective in dealing with data sparsity and cold-start issues. Existing methods traditionally follow one of two approaches: aggregating knowledge across multiple domains, or transferring knowledge from the source domain to the target domain. Many knowledge-transfer methods exist. For example, Collective Matrix Factorisation (CMF) (Singh & Gordon, 2008) and Codebook Transfer (B. Li et al., 2009) are based on Matrix Factorisation (MF) applied in all domains (M. Liu et al., 2020). Ma et al. proposed π-Net (M. Ma et al., 2019) to generate recommendations for two domains simultaneously: the shared account filter unit (SFU) addresses the challenge raised by shared accounts, and the cross-domain transfer unit (CTU) addresses the challenge raised by the cross-domain setting. Guo et al. then further improved the SCRM (Lei et al., 2021) model by replacing the RNN in π-Net with a self-attentive network, since RNNs cannot model long-term dependencies among items. These works show that transfer learning is effective for improving cross-domain recommendation performance and alleviating data sparsity. In our work, the knowledge transfer process is designed to combine common shared features with domain-specific features.

Neural graph-collaborative filtering
A recommender system typically mines users' preferences by exploring their interaction history to provide personalised services. Collaborative filtering (CF) is the earliest and most widely used recommendation technique; it models users' preferences based on their interaction history (He et al., 2017). For example, He et al. devised NCF (He et al., 2017), a general framework that models user-item interactions differently. Neural Graph-Collaborative Filtering (NGCF) learns user and item preferences to predict items by exploiting the user-item interaction graph: it propagates embeddings over the graph structure. This leads to expressive modelling of high-order connectivity in the user-item graph and explicitly injects the collaborative signal into the embedding process (Lian & Tang, 2022). LR-GCCF, LightGCN (He et al., 2020) and Multi-GCCF (Sun et al., 2019) each improve on NGCF. However, these works require sufficient user-item interaction data to learn user features, so their performance is significantly affected by data sparsity and cold-start problems. In this paper, we introduce transfer learning to transfer interaction data between two domains and improve recommendation performance in both.

Problem definition
A cross-domain sequential recommender recommends the next item by leveraging a user's historical interactions in different domains. This paper focuses on the data sparsity problem in cross-domain sequential recommendation. We consider two different domains $T$ and $S$. Let $U = \{u_1, u_2, u_3, \ldots, u_l\}$ be the set of overlapping users that appear in both domains, where $l$ is the number of overlapping users. Let the sets of all items in domains $T$ and $S$ be $I^T = \{I^T_1, I^T_2, \ldots, I^T_t\}$ and $I^S = \{I^S_1, I^S_2, \ldots, I^S_s\}$, respectively. To obtain the sequential dependencies between items, we use the interaction sequences of a specific user in the two domains. Taking user $u$ as an example, let $S_S = \{I^S_1, I^S_2, \ldots, I^S_m\}$ and $S_T = \{I^T_1, I^T_2, \ldots, I^T_n\}$ be the interaction sequences from domains $S$ and $T$, respectively, where $I^d_k$ denotes the $k$th session for user $u$ in domain $d$ ($d \in \{S, T\}$). Given $S_S$ and $S_T$, the task of CsrGCF is to recommend the next item based on the past behaviour sequences in the two domains, by estimating recommendation probabilities for all candidate items in domains $T$ and $S$.

An overview of CsrGCF
The proposed CsrGCF attempts to alleviate the data sparsity issue and to improve the accuracy of recommendation results in cross-domain sequential recommendation scenarios. Figure 1 presents an overview of CsrGCF. We describe each component of the method in detail below.

User and item initialisation embedding layer
To model the features of users and items effectively, this module maps user and item information into embedding vectors, where $e^{(0)}_{u_d}$ and $e^{(0)}_{i_d}$ are the initial embeddings of user $u$ ($u \in U$) and item $i$ ($i \in I$) in domain $d$ ($d \in \{T, S\}$). $P$ and $Q$ are the learnable parameter matrices for users and items, and $h_{u_d}$ and $h_{i_d}$ are the one-hot encodings of user $u$ and item $i$.
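As an illustration of this layer, the sketch below assumes the common form in which the initial embedding is the product of the (transposed) parameter matrix and the one-hot encoding, which amounts to selecting one row of the matrix. The exact equation is not shown in this text, so this form, the sizes and the random initialisation are all assumptions.

```python
import random

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def embed(one_hot_vec, matrix):
    """Compute matrix^T * one_hot_vec for a matrix of shape (n_entities, dim)."""
    dim = len(matrix[0])
    return [sum(h * row[k] for h, row in zip(one_hot_vec, matrix))
            for k in range(dim)]

random.seed(0)
n_users, dim = 4, 3  # hypothetical sizes, chosen only for the example

# P plays the role of the learnable user matrix (randomly initialised here).
P = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]

h_u = one_hot(2, n_users)   # one-hot encoding of the user with index 2
e_u = embed(h_u, P)         # initial embedding, equal to row 2 of P
```

In practice the multiplication is replaced by a direct row lookup, which gives the same vector without materialising the one-hot encoding.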

Inter-domain item sequence-level representation learning module
In this module, we mine high-order complex transformation relationships and temporal dependencies between items from a user's historical behaviour sequence in a specific domain. Given such a sequence, the time between item interactions reflects how strongly the items are associated. The closeness of the relationship between two items is inversely proportional to the length of the interaction time interval: the shorter the interval between the interactions with two items, the more closely related the two items are. Accordingly, a time-aware graph attention mechanism (TGAT) is designed to learn the inter-domain sequence-level representation of items. We construct directed session graphs following the method of SR-GNN. $V$ and $E$ denote the nodes and edges of a graph: $V$ is the set of items that have been interacted with, and $E$ represents the relations between users and items. The time-aware sequential schematic is shown in Figure 2. When there is a bidirectional edge between two nodes in a directed graph, the degree of association between the nodes is greater than when there is only one edge. To make this difference explicit, we design four types of edges, $r_{ij} \in \{r_{out}, r_{in}, r_{in-out}, r_{self}\}$: $r_{out}$ denotes that there is only one directed edge, from item $i$ to item $j$; $r_{in}$ denotes that there is only one directed edge, from item $j$ to item $i$; $r_{in-out}$ denotes that a bidirectional edge connects nodes $i$ and $j$; and $r_{self}$ denotes a self-connecting edge of node $i$. The weight matrix $W_d$ holds time-aware weights computed from the interaction times between items in the sequence. Each entry $\omega_{i,j}$ is computed from $t_{ij}$, the interaction time interval between items $i$ and $j$, together with $t_{\max}$ and $t_{\min}$, the maximum and minimum interaction time intervals between two items in the given sequence.
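The edge typing and time-aware weighting described above can be sketched as follows. The min-max form $\omega_{i,j} = (t_{\max} - t_{ij}) / (t_{\max} - t_{\min})$ is an assumption: it is one simple choice that lies in $[0, 1]$ and gives shorter intervals larger weights, and it is not necessarily the formula used by CsrGCF. The function names and the toy session are hypothetical.

```python
def edge_types(session):
    """Label ordered item pairs of a session graph as out / in / in-out / self."""
    # Consecutive interactions form the directed edges of the session graph.
    directed = set(zip(session, session[1:]))
    types = {(i, i): "self" for i in set(session)}
    for i, j in directed:
        if i == j:
            continue  # already covered by the self-connecting edge
        if (j, i) in directed:
            types[(i, j)] = "in-out"
            types[(j, i)] = "in-out"
        else:
            types[(i, j)] = "out"  # only i -> j exists
            types[(j, i)] = "in"   # seen from j, the edge is incoming
    return types

def time_weights(intervals):
    """Min-max scale intervals so that shorter intervals get larger weights.

    Assumed form: w = (t_max - t) / (t_max - t_min), which lies in [0, 1].
    """
    t_min, t_max = min(intervals.values()), max(intervals.values())
    if t_max == t_min:
        return {pair: 1.0 for pair in intervals}
    return {pair: (t_max - t) / (t_max - t_min) for pair, t in intervals.items()}

# Toy session: the user interacts with a, b, c and then b again.
session = ["a", "b", "c", "b"]
types = edge_types(session)

# Hypothetical time intervals (in minutes) for each observed transition.
intervals = {("a", "b"): 30.0, ("b", "c"): 10.0, ("c", "b"): 50.0}
weights = time_weights(intervals)
```

In this toy session the pair (b, c) is connected in both directions, so it is typed as in-out, and its 10-minute interval receives the largest weight under the assumed scaling.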
Items with greater $\omega_{i,j}$ ($W[i, j] = \omega_{i,j} \in [0, 1]$) are more likely to be recommended to users. Item representations are learned from the directed session graphs described above: the information of each relationship is aggregated, and the node representations are updated. An improved edge-type-aware attention models the association between items by taking the weight $\omega_{i,j}$ into account. Here $\alpha_{ij}$ denotes the relevance between items $i$ and $j$, and $\gamma_{r_{ij}}$ and $\omega_{ij}$ are the edge-type-aware attention weight and the time-aware weight, respectively. To make the nodes comparable, these attention weights are then normalised, giving $\rho_{ij}$, the contribution of each neighbour node to the target node. Because the previous node sequence representation influences the current node, the previous sequence representation is aggregated as well. A single TGAT layer aggregates information from the one-hop neighbours of the target node. Stacking multiple TGAT layers yields a $k$-order representation of items, which aggregates more auxiliary information for modelling higher-order complex relationships between items by means of an aggregation function $f(\cdot)$.
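The normalisation and aggregation steps can be illustrated with a short sketch. It assumes that the raw relevance scores $\alpha_{ij}$ are already available and that they are normalised with a softmax over the neighbours of the target node; both the softmax and the weighted-sum aggregation are assumptions for illustration, not the exact equations of TGAT.

```python
import math

def normalise(alpha):
    """Softmax over the raw relevance scores of one node's neighbours."""
    m = max(alpha.values())  # subtract the maximum for numerical stability
    exp = {j: math.exp(a - m) for j, a in alpha.items()}
    total = sum(exp.values())
    return {j: v / total for j, v in exp.items()}

def aggregate(neigh_embs, rho):
    """Weighted sum of neighbour embeddings with normalised weights rho."""
    dim = len(next(iter(neigh_embs.values())))
    return [sum(rho[j] * neigh_embs[j][k] for j in neigh_embs)
            for k in range(dim)]

# Hypothetical raw scores of two neighbours, b and c, of a target item.
alpha = {"b": 1.0, "c": 1.0}
rho = normalise(alpha)                      # equal scores give equal weights

neigh_embs = {"b": [1.0, 0.0], "c": [0.0, 1.0]}
updated = aggregate(neigh_embs, rho)        # one-hop aggregation
```

Applying `aggregate` repeatedly, each time on the output of the previous layer, is one way to read the stacking of layers that produces a $k$-order representation.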

Item and user-collaborative representation learning module
In this section, we mine more information from users' interaction behaviours in a domain for sequential recommendation. We model the user-item bipartite graph $G_B = \{U, V, E\}$, where $U$ denotes the set of users, $V$ denotes the set of items, and $E$ denotes the interaction behaviours between users and items. Intuitively, the items a user has interacted with provide direct evidence of the user's preferences. A user's preferences are influenced both by the items they have interacted with and by users with similar interests and tastes. A relationship-aware graph attention mechanism (RGAT) is proposed to learn the collaborative representations of users and items. Inspired by R-GCN (Schlichtkrull et al., 2018), we treat edge "u → v" and edge "v → u" as two different types of edges, which denote "user u likes item v" and "item v is liked by user u", respectively. Different types of edges have different effects on the final recommendation results. We take representation learning in domain $T$ as an example; domain $S$ is handled in the same way. In the update, $E_i \in \{e_u, e_v\}$, $R$ denotes the set of edge types, $N(i)$ is the set of neighbour nodes of node $i$, and $\sigma^{a}_{ij}$ is the relation attention between items $i$ and $j$, which is computed with a transfer matrix $q$ and the embedding concatenation operation $\oplus$. In this way, the module models the complex transformation relationships between nodes in the interaction graph and enriches the collaborative representations of items and users.

Cross-domain features bi-directional transfer modules
The item feature aggregates the item collaborative representation and the item sequence embeddings. In addition, we refine the user feature representations through feature aggregation and feature transfer. We use the same approach to aggregate user and item features. Since the characteristics of items are stable, we only consider transferring the user's preference representation across the two domains. We propose a novel Cross-domain Feature Bi-directional Transfer module (CFBT) to learn the bi-directional transfer of features across the two domains. Take the red path in Figure 3 as an example. Item node $i^T_1$ updates its representation by aggregating information from its one-hop neighbour, user node $u_1$, while user node $u_2$ aggregates information from its one-hop neighbour, item node $i^S_2$. Items $i^T_1$ and $i^S_2$ thus establish the following cross-domain association path: $u_1 \to i^T_1 \to u_2 \to i^S_2$. From this path, the connectivity between $i^T_1$ and $i^S_2$ can be learned. Most previous works only consider the features common to the two domains, whereas our method also considers domain-specific features. In the bi-directional feature transfer process, $e^{(k)}_{u_T}$ and $e^{(k)}_{u_S}$ are calculated from Equation (8), and $\mu(u_S)$ and $\mu(u_T)$ denote user-related weight factors across the two domains. $\alpha_S$ and $\alpha_T$ are hyperparameters that control the proportion of user features retained in the corresponding domain. $N(u_S)$ and $N(u_T)$ denote the number of one-hop neighbours of user node $u$, that is, the number of items that user $u$ has interacted with in domains $S$ and $T$. The more interaction records a user has in a domain, the more distinctive the user's features are in that domain. As a result, a greater proportion of features comes from that domain and a smaller proportion is transferred from the other domain. The user feature representation in domain $S$ based on the transferred features is denoted $e^{*(k)}_{u_S}$; domain $T$ is processed analogously to obtain $e^{*(k)}_{u_T}$.
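A minimal sketch of the bi-directional transfer step is given below. It assumes that the weight factors are the interaction-count ratios $\mu(u_S) = N(u_S) / (N(u_S) + N(u_T))$ and $\mu(u_T) = N(u_T) / (N(u_S) + N(u_T))$, that the shared feature is the $\mu$-weighted sum of the two domain embeddings, and that $\alpha$ mixes the domain-specific embedding with this shared feature. These forms are assumptions that match the description above; the function names are hypothetical.

```python
def transfer_weights(n_s, n_t):
    """Interaction-count ratios used as user-related weight factors (assumed)."""
    total = n_s + n_t
    if total == 0:
        return 0.5, 0.5  # no interactions at all: weight both domains equally
    return n_s / total, n_t / total

def cfbt(e_s, e_t, n_s, n_t, alpha_s=0.7, alpha_t=0.7):
    """Return refined user embeddings for domains S and T.

    alpha_s / alpha_t: proportion of the user's own-domain features retained.
    """
    mu_s, mu_t = transfer_weights(n_s, n_t)
    # Shared feature: more interactions in a domain give it a larger share.
    shared = [mu_s * a + mu_t * b for a, b in zip(e_s, e_t)]
    new_s = [alpha_s * a + (1.0 - alpha_s) * c for a, c in zip(e_s, shared)]
    new_t = [alpha_t * b + (1.0 - alpha_t) * c for b, c in zip(e_t, shared)]
    return new_s, new_t

# Hypothetical user with 3 interactions in S and 1 in T.
e_s, e_t = [1.0, 0.0], [0.0, 1.0]
new_s, new_t = cfbt(e_s, e_t, n_s=3, n_t=1, alpha_s=0.5, alpha_t=0.5)
```

With $\alpha = 1$ the sketch keeps only domain-specific features and nothing is transferred, so lower values of $\alpha$ correspond to more cross-domain transfer.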

Prediction and optimisation layer
After the above steps, we obtain the feature representations of users and items learned from behaviour sequences. Once the sequence-level item embedding and the item-collaborative representation are available, we calculate the probability $y^d_{ui}$ of recommending item $i$ to user $u$ in domain $d$, where $\oplus$ is the embedding concatenation operation and $(\cdot)^{T}$ denotes matrix transposition. We use the binary cross-entropy function as the objective function for model training and optimisation, where $S^{+}$ denotes the set of observed interaction records and $S^{-}$ denotes the set of negative records constructed by random sampling.
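The objective can be illustrated with a small sketch. It assumes that the score is the sigmoid of the inner product between a user vector and an item vector, and that the loss is averaged over positive and sampled negative records; the exact scoring function and any weighting or regularisation used by CsrGCF are not specified here, so these choices are assumptions.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def score(user_vec, item_vec):
    """Assumed scoring function: sigmoid of the inner product."""
    return sigmoid(sum(u * v for u, v in zip(user_vec, item_vec)))

def bce_loss(pos_scores, neg_scores, eps=1e-12):
    """Binary cross-entropy over observed (S+) and sampled negative (S-) records."""
    loss = -sum(math.log(p + eps) for p in pos_scores)
    loss -= sum(math.log(1.0 - n + eps) for n in neg_scores)
    return loss / (len(pos_scores) + len(neg_scores))

# Hypothetical scores: a confident model versus an uninformed one.
confident = bce_loss(pos_scores=[0.9], neg_scores=[0.1])
uninformed = bce_loss(pos_scores=[0.5], neg_scores=[0.5])
```

The loss decreases as observed records are scored closer to 1 and sampled negatives closer to 0, which is the behaviour the training objective relies on.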

Experimental setup
We intend to answer the following research questions:
RQ1 How does the proposed CsrGCF method perform compared with other state-of-the-art methods in cross-domain recommendation scenarios?
RQ2 How do the different modules of the method affect recommendation performance?

Datasets
We evaluate the effectiveness of CsrGCF on real-world datasets released by Amazon (see Note 1). The Amazon datasets contain user interactions in multiple domains and therefore suit cross-domain sequential recommendation scenarios. We choose the Movie-Book and Food-Kitchen cross-domain datasets and process the data following the method of M. Ma et al. (2019). Our aim is to improve cross-domain recommendation performance and alleviate the data sparsity issue in both domains. We retain overlapping users who have interacted with more than 10 items, and items with more than 10 recorded interactions. The behaviour sequences also need to be preprocessed for sequential recommendation. We first sort each user's interaction records chronologically and split them into subsequences. These subsequences are treated as sessions that contain successive interactions and at least three items. The statistics of the two processed datasets are shown in Table 1.
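The preprocessing steps above can be sketched as follows. The criterion for splitting a user's records into subsequences is not stated, so the sketch assumes a hypothetical `max_gap` threshold on the time between consecutive interactions; the single-pass filter and the record layout `(user, item, timestamp)` are likewise assumptions.

```python
from collections import Counter, defaultdict

def filter_interactions(records, min_count=10):
    """Keep users with more than min_count items and items with more than
    min_count interactions (single pass; records are (user, item, time))."""
    user_items = defaultdict(set)
    item_count = Counter()
    for user, item, _ in records:
        user_items[user].add(item)
        item_count[item] += 1
    return [(u, i, t) for u, i, t in records
            if len(user_items[u]) > min_count and item_count[i] > min_count]

def split_sessions(records, max_gap, min_len=3):
    """Sort each user's records by time and cut a new session whenever the
    gap between consecutive interactions exceeds max_gap (assumed rule)."""
    by_user = defaultdict(list)
    for user, item, ts in records:
        by_user[user].append((ts, item))
    sessions = {}
    for user, events in by_user.items():
        events.sort()  # chronological order
        current, found = [events[0]], []
        for prev, cur in zip(events, events[1:]):
            if cur[0] - prev[0] > max_gap:
                found.append(current)
                current = []
            current.append(cur)
        found.append(current)
        # Keep only sessions with at least min_len items.
        sessions[user] = [[item for _, item in s] for s in found
                          if len(s) >= min_len]
    return sessions

# Toy records: one user, five interactions with a long pause after the third.
toy = [("u", "c", 3), ("u", "a", 1), ("u", "b", 2), ("u", "d", 100), ("u", "e", 101)]
toy_sessions = split_sessions(toy, max_gap=10)
```

In the toy data the two interactions after the pause form a session of only two items, so it is discarded and a single three-item session remains.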

Evaluation protocols
We adopt the widely used metrics Recall and MRR to evaluate the performance of all algorithms. Recall measures the proportion of test cases in which the correct item is among the Top-K results, with K ∈ {10, 20}. MRR is the mean reciprocal rank of the correct item across test cases.
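Both metrics are straightforward to compute from ranked candidate lists. The sketch below assumes one ground-truth item per test case and, following common practice for MRR@K, a reciprocal rank of zero when the correct item falls outside the Top-K; the function names and toy rankings are hypothetical.

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of test cases whose target appears in the top-k ranking."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank of the target, counting 0 if it is outside top-k."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top = ranked[:k]
        if target in top:
            total += 1.0 / (top.index(target) + 1)  # ranks start at 1
    return total / len(targets)

# Three toy test cases with their ranked candidates and ground-truth items.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "x"]

recall = recall_at_k(ranked, targets, k=2)  # 2 of 3 targets are in the top 2
mrr = mrr_at_k(ranked, targets, k=2)        # (1 + 1/2 + 0) / 3 = 0.5
```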

Baselines
To demonstrate the effectiveness of CsrGCF, we compare it with single-domain recommendation models, single-domain sequential recommendation models, cross-domain recommendation models and cross-domain sequential recommendation models. We set the number of GNN layers to 3. The competitive recommendation methods compared with CsrGCF are:
Single-domain Recommendation Model:
• POP: recommends items based on the popularity of the items in the training data.

Results and discussion
Figures 4 and 5 compare CsrGCF with the baselines on the Movie-Book and Food-Kitchen datasets, respectively. According to the evaluation criteria MRR and Recall, CsrGCF achieves better performance than the other baselines for Top-K recommendation. In both single-domain and cross-domain scenarios, the sequential recommendation models outperform all non-sequential recommendation models, illustrating the effectiveness of modelling user behaviour sequences. In sequential recommendation scenarios, cross-domain methods outperform single-domain methods, validating the effectiveness of the feature transfer module for improving recommendation performance and alleviating data sparsity in cross-domain recommendation. In single-domain scenarios, sequential recommendation performs better than traditional methods on both datasets, indicating that sequential dependencies between items help achieve promising results. A comparison of SR-GNN and GRU4Rec supports the effectiveness of mining high-order complex relations with GNNs for better recommendation performance. In cross-domain recommendation scenarios, we observe that π-Net performs better than CoNet, which demonstrates that a bi-directional feature transfer module can improve recommendation performance. When modelling user behaviour sequences across domains to mine user preferences, using the intrinsic association information of items can strengthen inter-domain connections and improve performance. In cross-domain sequential recommendation scenarios, CsrGCF outperforms all recommendation baselines on both datasets. This demonstrates the benefit of mining high-order complex relationships with GNNs and fusing common shared features and domain-specific features through a bi-directional feature transfer module.
We also observe that performance rises rapidly on the Movie-Book dataset, possibly because the closer relationship between films and books has a greater effect on recommendation.

Ablation study of CsrGCF
We conduct an ablation study on CsrGCF to validate the function of its main components on the Food-Kitchen dataset, using Recall@20 and MRR@20 to measure the results. The contributions of the components of CsrGCF are compared in Table 2, with the best results highlighted in boldface.
(1) CsrGCF_SBM: We remove the Inter-domain Item Sequence-level Representation Learning Module from CsrGCF. The recommendation prediction module then performs the dot product directly between the user's collaborative preference representation and the item's collaborative representation.
(2) CsrGCF_CBM: We remove the relationship-aware attention mechanism from the Item and User-Collaborative Representation Learning Module.
From Table 2, we observe that CsrGCF_SBM performs worst in both domains. This indicates that mining sequential dependencies between items in the historical behaviours significantly improves cross-domain recommendation results. Meanwhile, the variation in recommendation performance is larger in the Food domain than in the Kitchen domain, possibly because items in the Food domain are more closely related. The close relation helps to fully exploit the complex relationships among items to improve recommendation. In contrast, the relations between different items in the Kitchen domain have a weaker impact on the recommendation results. A possible reason is the highly complex relationship between item and user features in the Food domain, whereas in the Kitchen domain users buy items on demand and are less influenced by the behaviour of other users. The change in performance of CsrGCF_CBM is modest but noteworthy. Again, the variation is larger in the Food domain than in the Kitchen domain. It is therefore necessary to distinguish the effects of different relationships on the recommendation results in scenarios with complex relationships.

Parameter sensitivity analysis
The hyperparameters $\alpha_S$ and $\alpha_T$ control the proportion of user features retained in the corresponding domains and play a vital role in expressiveness. In this section, we conduct experiments to investigate the sensitivity of model performance to $\alpha_S$ and $\alpha_T$. For convenience, we use a single parameter $\alpha = \alpha_T = \alpha_S$ ranging over [0.5, 1] with a step size of 0.1. Figure 6(a) shows the results on the Movie-Book dataset, where the optimal $\alpha$ is 0.7. Figure 6(b) shows the results on the Food-Kitchen dataset, where the optimal $\alpha$ is 0.8. The parameter $\alpha$ clearly has a significant impact on recommendation. In particular, the Movie-Book dataset, whose two domains are less similar, is more sensitive to $\alpha$. This demonstrates that it is necessary to retain domain-specific features, especially when the two domains have little similarity.

Model training efficiency
To explore the training efficiency and scalability of our model, we measure the time cost of model training with dataset proportions in [0.1, 1.0]. The comparison with other cross-domain sequential baselines is shown in Figure 7. The results show that our proposal requires less training time than the other baselines. Moreover, the training time increases slowly as the proportion of the dataset grows, which indicates that our model is suitable for large-scale datasets.

Conclusions
Cross-domain sequential recommendation predicts the next items by modelling sequences of historical interaction data. However, existing methods face the challenge of data sparsity. This paper presents a novel method for cross-domain sequential recommendation with graph-collaborative filtering that mines users' high-order behaviour to alleviate the data sparsity problem. The method introduces graph-collaborative filtering to bridge users' cross-domain preferences in sequential recommendation. First, to mine users' high-order behaviour and predict their current intentions, we propose time-aware and relation-aware graph attention mechanisms to model the complex correlations between items and users. The time-aware graph attention mechanism learns the inter-domain sequence-level representation of items, and the relation-aware graph attention mechanism learns collaborative representations of items and users. We then devise a Cross-domain Feature Bidirectional Transfer module that transfers the features shared across both domains and retains domain-specific features in each domain, achieving excellent recommendation performance in both domains. The method thus considers both common shared features and domain-specific features to refine user features. The experimental results on real-world datasets demonstrate the validity of the proposed method. However, the method assumes that the users of the two domains overlap entirely; we will consider extending the model to scenarios with partially overlapping users. Moreover, we will extend the method to more than two domains.
Note
1. https://jmcauley.ucsd.edu/data/amazon/