Memory-augmented meta-learning on meta-path for fast adaptation cold-start recommendation

Personalised recommendation is a difficult problem that has received much attention from both academia and industry. Because user–item interactions are sparse, cold-start recommendation is particularly challenging. Prior work tackles the cold-start problem with model-agnostic meta-learning at the model level and heterogeneous information networks at the data level. In addition, memory-augmented meta-optimisation effectively prevents a meta-learning model from falling into local optima. Building on these ideas, this paper proposes memory-augmented meta-learning on meta-paths, a new meta-learning method that further addresses cold-start recommendation on meta-paths. Meta-paths are built at the data level to enrich the semantic information of the data. To achieve fast adaptation, a semantic-specific memory guides the model's semantic parameter initialisation, and the method is trained with a meta-optimisation procedure. We evaluate the method on two widely used recommendation data sets under three cold-start scenarios. The experimental results demonstrate the effectiveness of our proposed method.


Introduction
Recommendation systems (Dacrema et al., 2019; Tang et al., 2021) have become increasingly important to industry with the rapid development of mobile applications. Their core goal is to alleviate user information overload, which brings a slew of challenges. Although traditional recommendation methods based on matrix factorisation (Jiang et al., 2021; Xu et al., 2021) or deep learning (D'Angelo et al., 2021) have been effective, cold-start (Kumar et al., 2020; Zhang et al., 2019) remains an unavoidable problem in most recommendation systems. Because of the absence of user–item interactions, a recommendation system frequently fails to make appropriate recommendations for new users. The cold-start issue breaks into two parts, user cold-start and item cold-start, which refer to the situations in which the system cannot handle new users or new items for lack of interaction data. An effective remedy is to enrich new users and new items with auxiliary data, for example through recommendation systems based on user or item content (Li & She, 2017; Wei et al., 2016). Furthermore, a heterogeneous information network (HIN) (Pham & Do, 2020; Shi et al., 2016) can supplement user–item interactions with complementary heterogeneous information.
Because the cold-start problem is at heart a data-sparsity problem, enriching the data can relieve the cold-start strain significantly. Building on these works (Dong et al., 2020; Lu et al., 2020), we combine the data-level and model-level approaches and propose the memory-augmented meta-learning on meta-paths (MAMP) method for cold-start recommendation. The proposed approach first develops appropriate initial embeddings for a user's semantic contexts and for the items, guided by a meta-optimisation strategy. In particular, a specific user's semantics and items are introduced into that user's task; two fully connected neural network layers learn new semantic embeddings and new item embeddings from previously studied semantics and items, and a meta-optimisation step updates the parameters. Then, to avoid poor local optima, MAMP builds a semantic-specific memory that generates a customised bias term when the model parameters are initialised. The specific procedure is as follows: semantic adaptation learns the semantic prior of each aspect; task adaptation updates each user's preferences from the diverse semantic priors; and, lastly, the semantic-specific memory guides the initialisation of the semantic priors with personalised parameters.
To conclude, the main contributions of MAMP are as follows:
• We introduce HINs into meta-embedding, learning new semantic embeddings and new item embeddings to solve the cold-start problem more effectively.
• We present a semantic memory enhancement that aids the co-adaptation meta-learner (Lu et al., 2020) and significantly improves its performance.
• We conduct experiments on the DBook and MovieLens data sets to demonstrate the effectiveness of our meta-learning technique.

Cold-start recommendation
A recommendation system may contain varying proportions of new users and new items, and interactions involving them are sparse; the resulting cold-start problem makes personalised recommendation for new users challenging. Deep learning (Liang, Xie, et al., 2020) has achieved great results in a variety of artificial-intelligence domains, but strong generalisation requires training on a large number of examples, so deep learning becomes ineffective in cold-start scenarios with sparse user–item interactions. The most typical solutions to cold-start recommendation are data augmentation at the data level or the provision of auxiliary data (Zhu et al., 2019). Other methods build higher-level representations of the data, for example capturing the rich heterogeneous information (Chang et al., 2021) about items and users with a heterogeneous information network representation, beyond the basic characteristics of the data. Alternatively, a semantic network can be built from a knowledge graph, in which nodes represent entities and edges reflect the semantic relationships between items. There are also cross-domain recommendations based on mapping neighbouring users' attributes, and recommendations based on mining friend lists on social networks. These techniques rely largely on data availability and thus have a number of drawbacks.

Meta-learning
The purpose of meta-learning is to acquire meta-knowledge, the general knowledge needed to solve a family of similar learning tasks, so that a model trained across several learning tasks can swiftly adapt to a new target task. Because of this cross-task learning, meta-learning is also regarded as one of the key technologies on the path to general artificial intelligence. Meta-learning methods can be divided into three types (Hospedales et al., 2020). Metric-based approaches (Snell et al., 2017; Vinyals et al., 2016) compare and classify by computing a learned similarity measure, while model-based approaches (Munkhdalai & Yu, 2017; Santoro et al., 2016) wrap the internal learning steps in the feed-forward pass of a single model so as to generalise to new tasks quickly. Finally, optimisation-based methods focus on acquiring meta-knowledge that improves the optimisation of the model, including the model's initialisation parameters (Finn et al., 2017). Wang et al.'s work (2019) demonstrated the effectiveness of meta-learning on the few-shot problem, and cold-start can be thought of as a subset of the few-shot problem, so meta-learning can naturally be incorporated into a cold-start recommendation system. Several papers (Chen, Luo, et al., 2018; Zhao et al., 2019) make progress by introducing the meta-learning paradigm at the model level of recommendation systems. Lu et al. (2020) began to address cold-start at both the data and model levels, applying meta-learning for the first time to cold-start recommendation on HINs. While that method makes effective recommendations, it also has flaws. On the one hand, although meta-paths (Sun et al., 2011), sequences of node types linking two objects, effectively capture semantic context at the data level, the method fuses a user's history over the various semantics (i.e. meta-paths) into a single implicit embedding; it therefore relies strongly on collaborative-filtering signals, its fine-grained structure is difficult to adjust to individual new users, and some features or items may lose their relevance, causing the multidimensional semantic fusion to perform poorly. On the other hand, the authors use an optimisation-based meta-learning technique, model-agnostic meta-learning (MAML) (Finn et al., 2017), at the model level. Each user is treated as a learning task, since MAML performs well in learning a configuration initialisation for new tasks. The central idea is to learn a global parameter that initialises the personalised recommendation model: personalised parameters are updated locally to capture a user's preferences, global parameters are updated by reducing the training-task loss across users, and the learnt global parameters then guide the model settings for subsequent users. Although methods based on MAML and its derivatives cope well with data sparsity, they have a number of flaws, including instability, slow convergence and poor generalisation. When dealing with users whose gradient-descent directions differ from those of the bulk of users in the training set, they are more likely to suffer gradient degradation, which can lead to poor local optima. Starting at the model level, Dong et al. (2020) created a feature-specific memory and a task-specific memory to provide a customised bias term when initialising the model parameters and to capture the commonality of potential user preferences shared across items. This effectively solves the tendency of MAML and its variants to settle in local optima, but without considering auxiliary data, the model becomes inefficient when interaction data are sparse.

Meta-learning for cold-start recommendation
At present, most meta-learning-based cold-start recommendation systems initialise a base model f_θ by learning a global meta-parameter θ with a meta-learner; θ is therefore also called the global prior. The global prior θ is optimised over several tasks and, given a limited number of instances, quickly adapts to a target task after one or several gradient steps. Specifically, meta-learning divides tasks into meta-training tasks and meta-testing tasks; each task includes a support set and a query set, which are mutually exclusive. First, during meta-training, the meta-learner adjusts the global prior θ into task-specific parameters using the loss w.r.t. the support set. The loss w.r.t. the task-specific parameters is then computed on the query set and back-propagated to update the global prior θ. Second, during meta-testing, the meta-learner adapts the θ obtained in meta-training to the task by one or several gradient steps on the support set, and then applies the adjusted parameters to predict the results on the query set.
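The support/query procedure above can be sketched as a first-order meta-update for a toy linear regression model. This is a minimal illustration, not the paper's implementation: the model, learning rates and the first-order simplification (MAML proper back-propagates through the inner step) are all assumptions made for brevity.

```python
import numpy as np

def adapt(theta, support_x, support_y, lr=0.1):
    """One inner gradient step on the MSE loss of a linear model y = x @ theta,
    computed on the task's support set."""
    grad = 2 * support_x.T @ (support_x @ theta - support_y) / len(support_y)
    return theta - lr * grad

def meta_step(theta, tasks, meta_lr=0.01):
    """First-order meta-update: evaluate the query-set gradient at the adapted
    parameters of each task, average, and update the global prior theta."""
    meta_grad = np.zeros_like(theta)
    for sx, sy, qx, qy in tasks:
        theta_task = adapt(theta, sx, sy)                         # support-set adaptation
        meta_grad += 2 * qx.T @ (qx @ theta_task - qy) / len(qy)  # query-set loss gradient
    return theta - meta_lr * meta_grad / len(tasks)

rng = np.random.default_rng(0)
theta = rng.normal(size=3)
tasks = [(rng.normal(size=(8, 3)), rng.normal(size=8),
          rng.normal(size=(4, 3)), rng.normal(size=4)) for _ in range(5)]
theta_new = meta_step(theta, tasks)
```

During meta-testing, only `adapt` would be applied to a new task's support set; the global prior is never updated with test-task data.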

Memory neural networks
In general, once a neural network's parameters have been trained, samples are put directly into the trained model for calculation and results are retrieved without interacting with any memory. For few-shot learning with sparse data, or for models involving human–computer interaction, it is difficult to achieve the goal only through the connection calculations among the parameters within the model. Weston et al. (2014) first implemented a memory-augmented network; the related neural Turing machine model consists of a controller and a memory module, where the controller writes data to the memory module with a write head and reads data from it with a read head. In a memory-augmented network, feature information is closely associated with the corresponding label during the writing process, and the feature vector is accurately classified during the reading process.

Memory-augmented neural networks for recommendation
The neural Turing machine (NTM) (Graves et al., 2014), which extracts and updates information from memory, is the foundation of an important branch of memory neural networks, and these features make NTM well suited to few-shot learning and meta-learning. Based on this idea, Santoro et al. (2016) proposed the memory-augmented neural network (MANN), which applies NTM to few-shot learning: it retains sample feature information in an explicit external memory module, optimises NTM's reading and writing processes with a meta-learning algorithm and achieves effective few-shot classification and regression. Chen, Xu, et al.'s work (2018) was among the first to integrate a MANN into the recommendation task, combining a MANN with collaborative filtering (Lian & Tang, 2021); with the help of an external memory matrix, the model can store and update a user's history, which effectively improves its representation ability. Dong et al. (2020) further proposed two kinds of globally shared memory to deal with cold-start recommendation.

Proposed approach
In this section, MAMP, a meta-learning method based on memory augmentation and meta-paths, is proposed to solve cold-start recommendation.

Overview
As shown in Figure 1, the proposed model consists of three parts: the first is the semantic-enhanced task constructor, which extracts meta-paths from profiles; the second comprises the meta-learning recommendation and semantic co-adaptation models for prediction; the third is the memory-augmented optimiser, which helps initialise the recommendation model's parameters.

Semantic-enhanced task constructor
In a cold-start recommendation task, a node's neighbours along different meta-paths have different importance, so the focus is on substituting the various levels of semantic context of meta-paths into tasks. Following Lu et al.'s work (2020), the task T_u for one user u is defined as T_u = (S_u, Q_u), where S_u denotes the semantic-enhanced support set and Q_u the semantic-enhanced query set. The support and query sets of each task T_u are mutually exclusive and comprise items randomly selected from the items that user u has rated. The semantic-enhanced support and query sets are defined as S_u = (S_u^R, S_u^P) and Q_u = (Q_u^R, Q_u^P), where S_u^R and Q_u^R are sets of items the user u has rated, and S_u^P and Q_u^P represent the semantic contexts induced by a collection of meta-paths P.
Firstly, S_u^P is used to encode the multidimensional semantics behind the predicted rating y_{u,i}^Q. Specifically, suppose user u has not rated item i, but the items u has rated bear some relationship to the unrated item i, or users related to u have interacted with item i.
These relationships are defined here as multifaceted semantics. For example, in Figure 2, the multifaceted semantics can be defined as a collection of meta-paths such as P = {UBAB, UBUB, ...}, where each capital letter denotes a node type, such as 'U' for user, 'B' for book and 'A' for author. Meta-paths such as 'UBAB' and 'UBUB' induce different semantic contexts: books written by the same author, or books also purchased by the same user. Since there are multiple interactions for user u in each task, for task T_u a semantic context is built for each specific meta-path p ∈ P as C_{u,i}^p, the set of items reachable along the meta-path p starting from the user u and a rated item i.
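The meta-path-reachable set C_{u,i}^p can be computed by walking the typed graph one edge type at a time. The sketch below is illustrative: the `edges` adjacency encoding and the function name are assumptions, not the paper's data structures.

```python
def metapath_reachable(edges, start, path):
    """Nodes reachable from `start` by following node types along a meta-path,
    e.g. 'UBAB': user -> book -> author -> book. `edges` maps a
    (source_type, node, target_type) triple to that node's neighbours."""
    frontier = {start}
    for src_t, dst_t in zip(path, path[1:]):   # walk one typed edge at a time
        nxt = set()
        for node in frontier:
            nxt.update(edges.get((src_t, node, dst_t), []))
        frontier = nxt
    return frontier

# toy graph: user u1 rated book b1; b1 was written by author a1; a1 wrote b1, b2
edges = {('U', 'u1', 'B'): ['b1'],
         ('B', 'b1', 'A'): ['a1'],
         ('A', 'a1', 'B'): ['b1', 'b2']}
reach = metapath_reachable(edges, 'u1', 'UBAB')   # books by the same author
```

For the 'UBAB' context this returns {'b1', 'b2'}: the rated book plus another book by the same author, matching the "books written by the same author" semantics above.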
Secondly, we formulate personalised recommendation as a meta-learning task. Given the user set U^S with profiles F_U^S, the item set I^S with profiles F_I^S, and the corresponding rating set Y_{U,I}^S, the recommendation task is to predict the rating y_{u,i}^Q of an item i ∈ I^Q for a user u ∈ U^Q, where S and Q denote the support set and query set, respectively. Under the cold-start scenario, the support set of rated items is constructed only from users with between 13 and 100 rated items, that is, the rated condition 13 ≤ |Y_{u,I}^S| ≤ 100, where |Y_{u,I}^S| denotes the number of ratings of user u. The same procedure constructs Q_u = (Q_u^R, Q_u^P), with S_u^R ∩ Q_u^R = ∅.
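The rated condition and the disjoint support/query split can be sketched as follows; the query size of 10 follows the data-preprocessing protocol described later, while the function name and seed handling are illustrative assumptions.

```python
import random

def build_task(rated_items, n_query=10, seed=0):
    """Construct (S_u^R, Q_u^R) for one user. Only users with 13-100 rated
    items form tasks; n_query items are sampled as the query set and the
    remainder become the support set, so the two sets are disjoint."""
    if not 13 <= len(rated_items) <= 100:
        return None                       # user fails the rated condition
    rng = random.Random(seed)
    items = list(rated_items)
    rng.shuffle(items)
    query, support = items[:n_query], items[n_query:]
    return support, query

task = build_task(list(range(30)))        # a user with 30 rated items
```

A user with 30 rated items yields a 10-item query set and a 20-item support set; a user with only 5 ratings is excluded entirely.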

Recommendation model
Based on the initial embedding e_u of a user u ∈ U^Q and the embedding e_i of an item i ∈ I_u^Q to be rated, we predict u's rating for i. The semantic-enhanced user encoder g_φ creates a user's embedding from the embeddings of the p-related items C_u^p reachable through meta-path p rooted at u, that is, e_u = g_φ(C_u), where C_u denotes the collection of items related to user u either through direct interaction (i.e. rated items) or through meta-paths (i.e. induced indirect items). g_φ is the context-aggregation function with parameters φ = {W ∈ R^{d×d_I}, b ∈ R^d}; R^d is a d-dimensional dense vector space, mainly used to embed the users' features, and R^{d×d_I} is the feature-embedding matrix. σ(·) denotes the activation function, whose parameter a_i is fixed in the interval (1, +∞).
Using the semantic embedding e_s of user u and the embedding e_i of item i in preference prediction, and to learn better initial embeddings that adapt to new users, e_s and e_i are re-learned by two multi-layer fully connected networks, e_s = z_{χ_s}(e_s) and e_i = z_{χ_i}(e_i) (Equation (8)), where z(·) denotes the fully connected layers that act as the generators of the semantic and item embeddings, and χ_s and χ_i are the fully connected layer parameters for learning the semantic and item embeddings, respectively, which are optimised by meta-learning. The preference-prediction function predicts user u's rating of item i as ŷ_{u,i} = h_ω(e_u ⊕ e_i), where h_ω is a multi-layer perceptron parametrised by ω and ⊕ denotes the concatenation of the two real-valued vectors. Finally, we denote the recommendation model by f_θ, where θ = (φ, ω, χ).
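The aggregation g_φ and the prediction head h_ω can be sketched with plain arrays. All sizes, the mean-pooling choice inside g_φ, and the use of tanh/ReLU as stand-ins for the paper's activation σ(·) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_I, d = 6, 4                          # item-feature and embedding dimensions

# g_phi: aggregate the features of meta-path-related items C_u into e_u.
W, b = rng.normal(size=(d, d_I)), np.zeros(d)
def g_phi(C_u):                        # C_u: (n_items, d_I) feature matrix
    return np.tanh(W @ C_u.mean(axis=0) + b)   # tanh stands in for sigma(.)

# h_omega: a tiny MLP over the concatenation e_u (+) e_i.
W1, W2 = rng.normal(size=(8, 2 * d)), rng.normal(size=8)
def h_omega(e_u, e_i):
    hidden = np.maximum(W1 @ np.concatenate([e_u, e_i]), 0.0)   # ReLU layer
    return float(W2 @ hidden)          # scalar predicted rating

e_u = g_phi(rng.normal(size=(5, d_I)))     # user embedding from 5 related items
e_i = rng.normal(size=d)                   # item embedding
rating = h_omega(e_u, e_i)
```

Here θ = (φ, ω, χ) corresponds to {W, b}, {W1, W2} and the (omitted) embedding-generator weights, respectively.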

Co-adaptation meta-learner
The objective of the co-adaptation meta-learner is to learn a global prior θ that can swiftly adjust to a new task with only sparse interactions. The co-adaptation meta-learner includes semantic-based and task-based adaptation procedures, described below.

Semantic-based adaptation.
Using S_u^P of task T_u, the semantic adaptor evaluates the loss in the semantic context induced by a particular meta-path p and takes a gradient-descent step on the global context prior φ w.r.t. that specific p. This not only encodes how to use contextual semantics in a heterogeneous information network, but also adapts to the semantic space induced by meta-path p.
In general, given the task T_u of a particular user u, the support set S_u = (S_u^R, S_u^P) is enhanced by the semantic context S_u^P. Given p ∈ P, the semantic embedding e_u^p in the semantic space of p is computed as e_u^p = g_φ(C_u^p). The loss over the rated items S_u^R of task T_u is then the mean squared error between y_{u,i} and h_ω(e_u^p, e_i), where h_ω(e_u^p, e_i) is the rating of item i predicted for u in the semantic space induced by meta-path p. Finally, to obtain the semantic priors φ_u^p of the various aspects, a gradient-descent step is taken on the loss of task T_u in each semantic space, where φ_u^p is the adapted model parameter for user u and semantic context p, and γ is the semantic learning rate. The adaptation is achieved by a single gradient-descent step on the supervised loss computed on the support set (defined in Equation (12)) w.r.t. the semantic-enhanced user representation e_u^p while freezing the gradient to φ.
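The single adaptation step on e_u^p, with φ frozen, can be sketched model-agnostically. The finite-difference gradient and all names here are assumptions made so the sketch does not depend on any autodiff framework; the real method differentiates analytically.

```python
import numpy as np

def semantic_adapt(e_u_p, e_items, y_true, h, gamma=0.05, eps=1e-4):
    """One gradient-descent step on the semantic user embedding e_u^p,
    minimising the squared rating error over the support set while the
    encoder parameters phi stay frozen (only e_u^p moves)."""
    def loss(e):
        preds = np.array([h(e, e_i) for e_i in e_items])
        return float(np.mean((preds - y_true) ** 2))
    grad = np.zeros_like(e_u_p)
    for k in range(len(e_u_p)):            # central finite-difference gradient
        bump = np.zeros_like(e_u_p)
        bump[k] = eps
        grad[k] = (loss(e_u_p + bump) - loss(e_u_p - bump)) / (2 * eps)
    return e_u_p - gamma * grad            # semantic learning rate gamma

h = lambda e_u, e_i: float(e_u @ e_i)      # dot-product stand-in for h_omega
e_u_p = np.ones(3)
e_items = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
e_adapted = semantic_adapt(e_u_p, e_items, np.array([0.5, 0.5]), h)
```

One such step is performed per meta-path p, yielding one adapted semantic prior per aspect.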

Task-based adaptation.
Based on the φ_u^p, p ∈ P, obtained above, task-based adaptation further adapts the global prior ω to the task T_u through multiple gradient-descent steps; the prior parameter ω of the rating-prediction function h_ω is likewise adapted given a user and semantic context.
First, e_u^P on the support set is updated through Equation (10). Next, the global prior ω is transformed into the same semantic space as ω^p = ω ⊙ κ(e_u^p), where ⊙ is the element-wise product and κ(·) is a transformation function realised by several fully connected layers. Then ω^p is adapted to task T_u by gradient descent on the support-set loss. Finally, the main optimisation is achieved by gradient descent on the sum of the task-specific losses over the users' query sets.
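A minimal sketch of this two-stage task adaptation, with the transformation κ(·), the loss gradient, and all hyper-parameters standing in as assumptions:

```python
import numpy as np

def task_adapt(omega, e_u_p, kappa, grad_fn, lr=0.05, steps=3):
    """Task-based adaptation: modulate the global prior omega into the
    semantic space via an element-wise product with kappa(e_u^p), then run
    several gradient-descent steps on the support-set loss."""
    omega_p = omega * kappa(e_u_p)         # element-wise modulation
    for _ in range(steps):                 # multiple inner gradient steps
        omega_p = omega_p - lr * grad_fn(omega_p)
    return omega_p

kappa = lambda e: 1.0 + 0.1 * np.tanh(e)   # stand-in transformation function
grad_fn = lambda w: 2 * (w - 0.5)          # gradient of ||w - 0.5||^2
omega_u = task_adapt(np.ones(4), np.zeros(4), kappa, grad_fn)
```

Each inner step moves ω^p towards the task-specific optimum; the outer meta-update then aggregates query-set losses across users.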

Memory-augmented meta-learning
Referring to the single-initialisation problem in Dong et al.'s work (2020), this paper introduces a semantic-specific memory. That is, through a semantic-embedding memory M_S and a profile memory M_F, the personalised parameter χ_s is effectively initialised. The profile memory M_F, associated with the profile F, provides a retrieval attention value a_s, which is used to extract key information from M_S, each row of which holds the corresponding fast gradients. These memory matrices help construct a customised bias term b_s when initialising χ*, that is, χ* ← χ* − τ b_s, where τ is a hyper-parameter controlling the degree of personalisation when initialising χ*, the set of χ_s and χ_i.
Specifically, the semantic profile f_s is the set of e_u^p. Semantics are two-dimensional vector representations whose first dimension is not necessarily consistent across users; the profile memory is therefore represented by a three-dimensional tensor, M_F ∈ R^{K×ℓ×d_s}, where ℓ is the common size of the first dimension of the different semantic vectors.
Then the semantic attention value a_s ∈ R^K is calculated. Firstly, the semantic profile vector is extended to the same dimensional space as M_F (Equation (17)), and cosine similarity is used to measure the correlation between f_s and each slot of M_F. Finally, the semantic attention value a_s is obtained by normalising the similarities with a softmax function.
The semantic-embedding memory M_S ∈ R^{K×d_{θ_s}} saves fast gradients with the same shape as the parameters of the semantic-embedding model, where d_{θ_s} is the dimension of the parameters stored in M_S. A personalised bias term is then obtained as b_s = a_s M_S. During the initialisation phase, the two memories are randomly initialised, and they are updated during training: M_F is updated with the new profile information, where a hyper-parameter α controls how much new profile information is added; similarly, M_S is updated with gradients weighted by the training-task loss L(ŷ_{u,i}, y_{u,i}), where α and β are hyper-parameters regulating how much new information is kept.
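The attention-based retrieval of the personalised bias term can be sketched as below. For simplicity the profile memory slots are flattened to vectors (the paper uses a three-dimensional M_F), and all sizes and names are illustrative assumptions.

```python
import numpy as np

def personalised_bias(f_s, M_F, M_S):
    """Retrieve a personalised bias term b_s: cosine similarity between the
    semantic profile f_s and each of the K profile-memory slots, normalised
    by softmax into attention a_s, then used to weight the rows of M_S."""
    sims = np.array([float(f_s @ m) / (np.linalg.norm(f_s) * np.linalg.norm(m))
                     for m in M_F])                 # cosine similarity per slot
    a_s = np.exp(sims) / np.exp(sims).sum()         # softmax attention a_s
    return a_s, a_s @ M_S                           # b_s = sum_k a_s[k] * M_S[k]

rng = np.random.default_rng(2)
K, d_prof, d_theta = 3, 5, 4
M_F = rng.normal(size=(K, d_prof))    # profile memory (slots flattened)
M_S = rng.normal(size=(K, d_theta))   # semantic-embedding memory (fast gradients)
a_s, b_s = personalised_bias(rng.normal(size=d_prof), M_F, M_S)

tau = 0.1                             # personalisation strength
chi = np.zeros(d_theta)
chi_init = chi - tau * b_s            # personalised initialisation of chi*
```

Because a_s sums to one, b_s is a convex combination of memory rows, so the bias stays on the scale of the stored gradients regardless of K.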

Optimisation
The purpose of the co-adaptation meta-learner is to optimise the global prior θ = (φ, ω, χ) across different semantic tasks, which is achieved by back-propagating the query-set loss of the meta-training tasks T_u^{train}. The global prior θ is not updated directly with the data of T_u ∈ T^{test}; the memories M_F and M_S are updated by Equations (20) and (21), and the semantic prior φ and task prior ω are updated by Equations (13) and (16). In addition, the semantic-embedding prior and item prior are also updated by gradient descent on the query-set loss of the recommendation model f, scaled by a hyper-parameter λ.

Experiments
In this section, the performance of MAMP is verified by detailed experiments: using three evaluation metrics commonly adopted in recommendation systems, the proposed model is compared with previous models.

Data sets
This paper mainly conducts experiments and evaluations on two widely used standard data sets, DBook and MovieLens-1M, open-source data sets that contain both user and item information. MovieLens has about 1 million ratings, with over 3000 movies rated by about 6000 users; users include attributes such as gender, age, occupation and zip code, movies include attributes such as genre, and ratings range from 1 to 5. DBook contains about 650,000 ratings, with 20,000 books rated by about 10,000 users; users include an address attribute, books contain information such as the year, author and publisher, and ratings range from 1 to 5. Unlike previous efforts (Lee et al., 2019; Lu et al., 2020) that add extra information to MovieLens, such as the director and actors of each movie, here we use the native data sets. Table 1 outlines some essential statistics of the two data sets.

Data preprocessing
For each data set, we divide users and items into two groups, existing and new, roughly based on user joining time (or first operation time) and item release time. In particular, for DBook, which has no user time information, we randomly select 80% of users as existing users and the other 20% as new users, following Lu et al.'s work (2020). In addition, each data set is divided into meta-training and meta-testing portions.
(1) Meta-training contains only existing users' ratings of existing items, of which 10% are chosen at random as the validation set; the corresponding task is to recommend existing items to existing users, that is, the warm scenario.
(2) The remainder is used for meta-testing, which is broken into three parts corresponding to three cold-start scenarios: (CW) recommending existing items to new users; (WC) recommending new items to existing users; and (CC) recommending new items to new users. To construct the rated sets S_u^R and Q_u^R, we follow previous work (Lee et al., 2019). Specifically, a user in a task must have between 13 and 100 rated items, that is, Equation (5). From the items a user has rated, 10 items are randomly selected as Q_u^R and the rest form S_u^R. In addition, to construct S_u^P and Q_u^P, we consider meta-paths p ∈ P that begin with user–item and end with items, up to length 2.

Evaluation metrics
This paper verifies model performance under three evaluation metrics: mean absolute error (MAE), root mean squared error (RMSE) and normalised discounted cumulative gain (NDCG@N), whose discounted gain for the item at rank n is (2^{y_{u,n}} − 1)/log(n + 1). MAE and RMSE measure the error between predicted and actual values, with lower values indicating better model performance. N is the number of predictions for each user in the query set. NDCG@N measures the ranking quality of the observed predictions on the query set, with higher values indicating better model performance.
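The three metrics can be computed as follows; the base-2 logarithm in the discount is a common convention assumed here (the paper leaves the base unspecified), and the function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over paired predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error over paired predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ndcg_at_n(y_true, y_pred, n):
    """NDCG@N: rank items by predicted score, accumulate discounted gains
    (2^y - 1)/log2(rank + 1) for the top N, and normalise by the DCG of
    the ideal ordering of the true ratings."""
    order = sorted(range(len(y_pred)), key=lambda i: -y_pred[i])[:n]
    dcg = sum((2 ** y_true[i] - 1) / math.log2(r + 2) for r, i in enumerate(order))
    ideal = sorted(y_true, reverse=True)[:n]
    idcg = sum((2 ** y - 1) / math.log2(r + 2) for r, y in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

score = ndcg_at_n([5, 3, 1], [0.9, 0.8, 0.1], n=3)
```

When the predicted ordering matches the ideal ordering, as in this example, NDCG@N equals 1.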

Model comparison
To compare performance with MAMP, we select several representative advanced methods: the traditional feature-based method FM; the HIN-based methods HEREC and MetaHIN; and the meta-learning-based cold-start methods MeLU, MAMO and MetaHIN.
FM (Rendle et al., 2011): a feature-based model that makes use of a variety of auxiliary information. FM can solve the classification problem (predicting the probability of each rating) as well as the regression problem (predicting the value of each rating).
HEREC (Shi et al., 2018): a heterogeneous information network embedding method that combines the learned embeddings with a matrix factorisation (MF) model to optimise the recommendation effect.
MeLU (Lee et al., 2019): a typical approach to applying meta-learning to cold-start recommendations. Rating predictions are obtained by feeding the concatenated user and item embeddings into a fully connected network, and the MAML method updates the parameters locally and globally.
MAMO (Dong et al., 2020): the authors design two memory matrices that offer a customised bias term when initialising model parameters and help the model quickly predict user preferences.
MetaHIN (Lu et al., 2020): a method that thoroughly captures HIN-based semantics so that the learner can readily adapt to multidimensional semantic knowledge within meta-learning, combining a meta-learning model at the model level with a heterogeneous information network at the data level.

Comparison results
Cold-start scenario.
The performance of the various models under the three cold-start scenarios is listed in detail in Table 2, and Figure 3 provides a more intuitive comparison. Overall, our meta-learning model achieves relatively ideal performance on all metrics of the two data sets. For example, on the MovieLens-1M data set, our model improves nDCG@5 by 1.65%, 2.48% and 1.12% in the three cold-start scenarios, respectively, compared with the best baseline. Generally speaking, traditional methods such as FM perform poorly, mainly because it is difficult for them to handle higher-order graph structures such as meta-paths or to integrate richer semantic features. In contrast, HIN-based methods such as HEREC perform better because they include meta-paths and incorporate richer semantic features; however, the problem of sparse interactions remains across the cold-start scenarios, and with insufficient training data these models cannot improve further. Furthermore, the meta-learning methods MeLU, MAMO and MetaHIN deal effectively with sparse interactions. MeLU only integrates heterogeneous information into content features, while MetaHIN captures multifaceted semantics from higher-order structures and performs co-adaptation of semantics and tasks, so MetaHIN outperforms MeLU on most metrics. In addition, MAMO relies on personalised bias terms to enhance the generalisation ability of meta-learning, which alleviates, to a certain extent, the inability to capture deeper semantics. However, these MAML-variant meta-learning methods still fall below our model, mainly because the training set is prone to adversarial samples: gradient degradation often occurs when dealing with users with different gradient-descent directions, and the resulting poor robustness leads the model into local optima.
MAMP not only performs semantic and task co-adaptation, but also effectively improves the robustness of the model by designing a semantic-specific memory that provides a personalised bias term when initialising model parameters.

Warm scenario.
In the W-W section of Table 2, we also examine the traditional recommendation scenario. Statistically, the MAMP model remains ahead of the other models. This is mainly because sample sparsity and sparse interaction data still exist in the traditional recommendation scenario. MetaHIN and other meta-learning models must combine the gradients of the losses generated on the inner tasks of each batch, so each iteration targets a batch of data. MAMP, on the other hand, provides a personalised bias term through its semantic memory, updating only a set of input parameters and thereby achieving fast adaptation effectively.

Model analysis
To show the severity of the single-initialisation problem and the advantage of the proposed memory-based technique, we conduct ablation studies. Intuitively, we treat MAMP without memory-based initialisation as a trivial baseline, called MAMP-SI. Because the various cold-start scenarios show similar results, we present only the performance in CC, the most difficult scenario. The performances of MetaHIN, MAMP-SI and MAMP on all data sets are shown in Figure 4. First, MAMP-SI outperforms MetaHIN, mainly due to the better adaptability of meta-embedding to new tasks; second, owing to the single-initialisation problem, MAMP-SI performs worse than MAMP, which confirms the advantage of the proposed memory-based technique.

Parameter analysis
MAMP has two memories, the profile memory M_F and the semantic-embedding memory M_S, whose function is to generate customised bias terms when initialising local parameters. During the construction of the semantic memory, we predefine K user types and build a three-dimensional common semantic embedding. For the predefined user types, we investigate the effect of the value of K on model performance, and for the common semantic embedding we discuss the impact of the size of its second dimension. For a clearer display, we only show results in the C-C scenario, as shown in Figures 5 and 6.

Limitations and future work
Data sparsity is a natural problem in cold-start recommendation, so auxiliary data are critical to solving the cold-start problem. In practice, however, auxiliary information cannot always be imported successfully, in which case data enhancement becomes an alternative. In this work, we fuse multiple meta-paths in a heterogeneous information network to enhance the data, and some results have been obtained. However, we preliminarily find that meta-paths may not be the best way to describe rich semantics, because constructing meta-paths is cumbersome. In future work, we will use meta-graphs (Zhao et al., 2017) to replace meta-paths. Compared with a meta-path, which requires a continuous chain structure, a meta-graph requires only one starting node and one ending node, with no restriction on the intermediate structure. How to calculate the similarity of meta-graphs and how to integrate them into meta-learning remain interesting challenges. In addition, data fusion is also a direction worth considering.

Conclusion
In this article, we propose MAMP, a new meta-learning recommendation method for fast-adaptation cold-start recommendation. Specifically, we use the idea of meta-optimisation to learn embeddings that better fit the recommendation model and new tasks. In addition, a semantic-specific memory is proposed to assist the co-adaptation meta-learner: it generates a personalised bias term from the history record, greatly improving the co-adaptation meta-learner's performance. Experiments on the DBook and MovieLens data sets show that our meta-learning method has significant advantages in effectiveness and performance across various scenarios.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data sets of this paper are available at https://book.douban.com and https://grouplens.org/datasets/movielens/.