Deep Interest-Shifting Network with Meta-Embeddings for Fresh Item Recommendation

Nowadays, people have a growing interest in fresh products such as new shoes and cosmetics. To meet this demand, the E-commerce platform Taobao launched a fresh-item hub page in its recommender system, namely, the New Tendency page, with which customers can freely and exclusively explore and purchase fresh items. In this work, we make a first attempt to tackle the fresh-item recommendation task with its two major challenges. First, a fresh-item recommendation scenario usually suffers from highly deficient training data due to low page views. We therefore propose a deep interest-shifting network (DisNet), which transfers knowledge from a huge number of auxiliary data and then shifts user interests with contextual information; three interpretable interest-shifting operators are also introduced. Second, since the items are fresh, many of them have never been exposed to users, leading to a severe cold-start problem. Though this problem can be alleviated by knowledge transfer, we further babysit these fully cold-start items with a relational meta-Id-embedding generator (RM-IdEG). Specifically, it trains the item id embeddings in a learning-to-learn manner and integrates relational information for better embedding initialization. We conducted comprehensive experiments on two synthetic datasets as well as a real-world dataset. Both DisNet and RM-IdEG significantly outperform state-of-the-art approaches. The empirical results clearly verify the effectiveness of the proposed techniques, which are promising and scalable in real-world applications.


Introduction
E-commerce has become prevalent in our daily lives. In traditional online shopping scenarios, all items are mixed together, and a recommender system predicts users' preferences on items based on their past interactions, e.g., clicks, purchases, and ratings [1][2][3]. However, this strategy overlooks the influence of the items' life periods and causes two problems. First, as many people have a growing interest in novel, newly released commodities, their requirements will not be fully satisfied. Second, popular items have more opportunities to be exposed, whereas new products are overwhelmed even when they are of high quality [4][5][6].
To tackle these problems, the E-commerce platform Taobao launched a new application, namely, the New Tendency page, aiming to recommend fresh items to users who prefer new products. As illustrated in Figure 1, a card containing a fresh item and its textual description is pushed to users. Once a user clicks this card, the New Tendency page appears, where more items from a predefined fresh-item pool are recommended to this user. As a result, users who prefer newly released products can freely explore this page.

Q2: How to Deal with Totally Cold-Start Items?
As reported by Taobao, more than 60% of fresh items are newborn and have never been interacted with by users, which causes a severe cold-start problem. Note that these newborn items are not the cause of the data deficiency because they are not part of the training data.

Potential Solutions to Q2.
The cold-start problem is usually solved by integrating external information, e.g., item attributes [22,23], user attributes [24,25], relational data [26], and knowledge from other domains [16]. We note that this problem can be alleviated by applying the cross-domain technique because the embeddings of item attributes can be reused. Nevertheless, since the id of a cold-start item never appears in training, its embedding cannot obtain a good initialization. Pan et al. [27] proposed the meta-Id-embedding generator (meta-IdEG), which considers the id embedding initialization problem and solves it through a learning-to-learn training procedure. However, meta-IdEG only utilizes item features to generate the id embedding. As a result, it cannot exploit the community structural information when initializing id embeddings, which leads to a suboptimal solution.

Our Solutions.
In this study, we propose two novel techniques to construct a deep learning-based recommender system that simultaneously tackles the above issues. The proposed model fully exploits various types of external information to improve prediction performance. To answer Q1, we present a deep interest-shifting network (DisNet). Specifically, it first learns the users' general interest vectors from a huge number of auxiliary data and then shifts them to scenario-specific representations using the contexts. In this way, the trainable parameters are reduced to a few neural network layers, which significantly alleviates the data-deficiency problem. To answer Q2, the transferred embedding layer of item attributes can be reused, and the only remaining issue is the item id embedding initialization. Hence, this paper proposes a relational meta-Id-embedding generator (RM-IdEG), which is trained in a learning-to-learn manner so that the model achieves good generalization ability after few-shot training. Furthermore, RM-IdEG absorbs the information of relevant items. Therefore, the community structural information can be inherently embedded and exploited, which has been proven beneficial for addressing the cold-start problem [26].

[Figure 1: An illustration of fresh-item recommendation. Once a user clicks the card on the left side, a fresh-item recommendation page (middle) appears, achieving interest shifting from the user's historical interests to the fresh items; the recommended fresh items are ranked using the embeddings of fresh items and relational items (right side). All fresh items on the recommended page are chosen from a specific fresh-item pool and are usually less exposed than items on the main entrance page.]
The main contributions of this work are summarized as follows:
- A novel application, fresh-item recommendation, is studied, which gives new items more opportunities to be exposed and fully personalizes recommendations for users who prefer novel, innovative products. We also make a first attempt to address the fresh-item recommendation task with two novel techniques.
- We present a deep interest-shifting network (DisNet) to deal with the severe data-deficiency problem in a fresh-item recommendation scenario.
- To address the cold-start problem, we propose a relational meta-Id-embedding generator (RM-IdEG) that involves relational data in the meta-id embedding initialization, which enables community structural information to be inherently captured.
- Extensive experimental results demonstrate that our model can effectively handle fresh-item recommendation tasks in both the cold-start and warm-start stages.
The rest of this work is organized as follows. In the next section, notations and preliminary knowledge are introduced. In Section 3, we provide a detailed description of our network architecture. After that, the results of our empirical studies are reported. Then, we review the works related to our method. Further discussion and concluding remarks are provided in the last section.

Notations and Preliminaries
In this section, we first discuss a popular architecture of context-aware recommender systems. Then, we introduce the training procedure of meta-IdEG and summarize the notations in Table 1.

Context-Aware Recommendation.
A popular strategy in existing context-aware recommendation systems is to learn latent representations for users and items and then make decisions using these latent vectors.
Formally, given an example that contains an item t, a user u, and potentially some contexts, we first feed it into an embedding layer. There, the features are transformed into vector representations via one-hot or multihot encoding. The transformed item features consist of an item id embedding e_t and other content features v_t. For the user, we combine the id embedding and other features into one vectorized representation v_u. Finally, we denote the transformed context features by c. The final prediction is made by

ŷ = g(q_u, p_t, c), where q_u = f_u(v_u) and p_t = f_t([e_t ‖ v_t]). (1)

For example, in matrix factorization-based models [28], q_u and p_t are exactly the id embeddings, and g is the context-biased prediction function. State-of-the-art models [29,30] also use neural networks to learn user/item representations as well as to make decisions. This paper likewise adopts neural networks for f_t, f_u, and g, which leads to a double-tower model architecture.
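The double-tower prediction described above can be sketched in NumPy. All layer sizes, weights, and the single-layer towers below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights, biases):
    """Apply a small ReLU MLP; the last layer is left linear."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

d_feat, d_latent = 16, 8
# Hypothetical one-layer towers standing in for the generic networks f_u, f_t, g.
f_u = ([rng.normal(size=(d_feat, d_latent))], [np.zeros(d_latent)])
f_t = ([rng.normal(size=(d_feat, d_latent))], [np.zeros(d_latent)])
g   = ([rng.normal(size=(2 * d_latent + 4, 1))], [np.zeros(1)])

v_u = rng.normal(size=d_feat)   # user features (id embedding included)
tv  = rng.normal(size=d_feat)   # item id embedding concatenated with content features
c   = rng.normal(size=4)        # common context features

q_u = mlp(v_u, *f_u)            # user tower
p_t = mlp(tv, *f_t)             # item tower
logit = mlp(np.concatenate([q_u, p_t, c]), *g)   # decision network on [q_u, p_t, c]
y_hat = 1.0 / (1.0 + np.exp(-logit))             # predicted click probability
```

The decision network sees the two tower outputs and the context together, which is what couples the contexts into the architecture, as discussed next.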
It is noteworthy that such a learning paradigm deeply couples the contextual information into the model architecture. In our cross-domain setting, the contexts are heterogeneous, i.e., scenario-specific. Therefore, the trainable parameters of deep neural network models cannot be reused, which makes it hard to share knowledge across different domains [31][32][33].

Meta-Id Embedding Generator.
To babysit newborn items, the key is learning the embeddings of new items' ids. A common learning paradigm first uses an id embedding generator (IdEG) to initialize a vector for each new id in the embedding table and then updates it using incoming user interactions. The most intuitive strategy is to output a random embedding initialization. However, the resulting generalization ability may be restricted by the cold-start problem. To this end, Pan et al. [27] proposed to initialize id embeddings using a meta-learning technique, a.k.a. the meta-Id embedding generator (meta-IdEG). By regarding the recommendation for each item as a task, meta-IdEG learns an embedding initialization such that the model achieves good generalization ability after few-shot training.
Next, we illustrate the workflow of meta-IdEG. For each task, which relates to a specific item, we divide its data examples (interactions) into two sets: a support set D_s and a query set D_q. We first feed the item features into a neural network to generate an id embedding, e*_t = IdEG_meta(v_t). Then, we optimize IdEG_meta in a learning-to-learn manner. Denoting by y* the labels predicted on the support set using e*_t, we first obtain the cold-start loss

l_c = (1/|D_s|) Σ_{(u,c)∈D_s} ℓ(y, y*). (2)

Then, we update the embedding by one step of gradient descent:

e′_t = e*_t − α ∂l_c/∂e*_t, (3)

where α is the learning rate. With the new embedding e′_t, we can predict labels y′ on the query set and define the warmed loss

l_w = (1/|D_q|) Σ_{(u,c)∈D_q} ℓ(y, y′). (4)

Complexity
Note that e*_t and e′_t do not have to be explicitly computed; we are only interested in their gradients with respect to IdEG_meta. Finally, we sum the two losses to obtain the meta-loss function:

l_meta = η l_c + (1 − η) l_w, (5)

where η is the tradeoff parameter. In other words, minimizing l_meta simultaneously achieves two goals: (1) the error in predictions for new items should be small; (2) after a small amount of labeled data is collected, a few gradient-descent updates should lead to good generalization ability.
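The meta-training step (cold-start loss, one gradient step, warmed loss, weighted sum) can be sketched on a toy logistic scorer. The scorer, the random data, and the parameter values are simplifying assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

def loss_and_grad(e_t, X, y):
    """Log loss of a toy logistic scorer logit = X @ e_t, plus its gradient w.r.t. e_t."""
    p = 1.0 / (1.0 + np.exp(-(X @ e_t)))
    eps = 1e-9
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

X_s, y_s = rng.normal(size=(5, d)), rng.integers(0, 2, 5)   # support set D_s
X_q, y_q = rng.normal(size=(5, d)), rng.integers(0, 2, 5)   # query set D_q

e_star = rng.normal(size=d) * 0.1   # embedding produced by the generator (assumed)
alpha, eta = 0.5, 0.1               # learning rate and tradeoff parameter

l_c, g_c = loss_and_grad(e_star, X_s, y_s)   # cold-start loss on the support set
e_prime = e_star - alpha * g_c               # one gradient-descent step (eq.-(3) style)
l_w, _ = loss_and_grad(e_prime, X_q, y_q)    # warmed loss on the query set
l_meta = eta * l_c + (1 - eta) * l_w         # weighted combination of the two losses
```

In the real model, l_meta is backpropagated through e*_t into the generator's parameters; here the single embedding vector stands in for that pipeline.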

Deep Interest-Shifting Network.
In this section, we present DisNet, a learning framework for recommending items on a fresh-item recommendation page, which usually carries rich scenario-specific contexts. The overall network architecture is shown in Figure 2.
We note that the latent vector of a user actually reflects his or her interest in a latent space, while the scenario-specific contexts reflect a shifting of the user's general interests [34,35]. For example, consider a boy who is interested in sports, games, and electronic products. Once he clicks a fresh item, say an iPhone 11, he may pay more attention to electronic products with advanced technology, and we can recommend him newly released smartphones, laptops, and so on. We assume that such interest shifting does not change the latent semantics. In other words, the shifted representations can be fed directly into the decision-making network g. Under this assumption, we can decouple the general interest of users from the scenario-specific contexts. Denoting the scenario-specific context by c_s, we propose an interest-shifting operator (ISO) to obtain a shifted user representation:

q^s_u = ISO(q_u, z_s), where z_s = h(c_s), (6)

where q^s_u and q_u have the same dimension m, and h maps the contexts to a latent space to extract their critical information.
It is noteworthy that a huge amount of auxiliary data is available, from which we can model the general interest of the users. Thus, we can pretrain the item/user representation networks as well as the decision-making network on these data; we denote the pretrained (frozen) networks by f̄_t, f̄_u, and ḡ. Then, the context information can be incorporated to shift the latent user vector to a scenario-specific one within the same interest space. Formally, DisNet makes the decision by

ŷ_DN = ḡ(q^s_u, p_t, c), where q_u = f̄_u(v_u) and p_t = f̄_t([e_t ‖ v_t]). (7)

Such a model not only transfers knowledge from a general-interest domain with rich data samples but also reduces the trainable parameters to the ISO(·) and h functions only. Hence, the context-awareness and data-deficiency problems can be addressed simultaneously.
Note that c denotes contexts shared by the two domains. The auxiliary data may have their own contexts as well; we ignore such contextual information and preserve only the common parts because we are modeling the general interest of the users. In practice, we also allow the decision-making network g and the embedding layer to be fine-tuned.

Table 1: Summary of notations.

(u, t) — a pair of user u and item t
e_t — the id embedding of item t
v_t — the embedded features (except id) of item t
v_u — the embedded features (including id) of user u
c, c_s — common contexts and scenario-specific contexts
e*_t, e′_t — cold-start and warmed item id embeddings in the meta-training procedure
D_s, D_q — support and query sets of a cold-start item in D_c
y*, y′ — predicted labels on D_s and D_q
l_c, l_w, l_meta — cold-start, warmed, and meta-losses in the meta-training procedure
D_a, D_w, D_c — auxiliary, warm-start, and cold-start datasets
y, ŷ, ŷ_DN — ground-truth label and predicted labels of context-aware models and DisNet
ISO(·) — interest-shifting operator
W_i, b_i — weight matrices and bias vectors of the NN operator and of RM-IdEG
W, h, b — parameters of the attentional embedding aggregator
T, T_c — item sets of D_w ∪ D_c and D_c

Interest-Shifting Operators.
The above discussion gives the overall network architecture. Within it, any reasonable shifting operation can be performed to learn the context-specific representation of the user. In this work, we introduce three interest-shifting operators, each of which admits an interesting interpretation.
Add Operator. Motivated by the huge success of representation learning on knowledge graphs, we adopt a strategy similar to TransR [36], which embeds each entity and relation by optimizing the translation principle e_a + e_r ≈ e_b whenever a triplet (a, r, b) exists in the graph. Recall the interest-shifting example: when a boy clicks the item iPhone 11, his interest representation moves toward that of a boy who prefers electronic products with advanced technology. If we regard the contextual information as a relation, we obtain our first operator, which adds the latent user vector and the contextual vector:

q^s_u = q_u + z_s, where z_s = h(c_s). (8)

This implies that q_u and z_s have the same dimension. That is, the projection function h directly learns the discrepancy between the original interest and the shifted interest, analogous to a relation embedding in a knowledge graph.
COT Operator. Before introducing the second operator, we review a popular technique in context-aware recommendation, namely, the contextual operation tensor (COT) [37]. By estimating a contextual operation matrix, COT maps the original user/item latent vectors to context-specific ones. We notice that COT has three main limitations: (1) it assumes a fixed context space in which the contextual operation matrix relates to different context values; (2) it jointly learns the original latent vectors and the contextual operation matrix; and (3) it uses a linear mapping, i.e., a 3D tensor, to obtain the contextual operation matrix, which degrades performance. Obviously, COT cannot be applied to our problem directly: the data-deficiency problem prevents the joint learning procedure, and the cross-domain data have different contexts. Fortunately, in DisNet we have decoupled the user's general interest from the scenario-specific interest. Therefore, we can estimate the scenario-specific contextual operation matrix using the h function:

q^s_u = M_s q_u, where M_s = h(c_s). (9)

Here, h outputs an m × m matrix instead of a single vector. In other words, while COT learns operation matrices for all possible context values jointly with the latent vectors, our operator estimates a single scenario-specific matrix on top of pretrained representations.

NN Operator. The third operator feeds the concatenation of the user vector and the contextual vector into a neural network:

q^s_u = σ(W_2 σ(W_1 [q_u ‖ z_s] + b_1) + b_2), (10)

where W_i and b_i (i ∈ {1, 2}) refer to the weight matrices and bias vectors, σ is the activation function, and [·‖·] denotes the concatenation of two vectors. It is worth pointing out that any network architecture can be used; this paper considers a simple multilayer perceptron.

While the add operator regards the contexts as a bias and the COT operator considers the cross-influences between the user interest and the contexts, the NN-based operator achieves these two goals simultaneously.
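The three operators can be sketched as follows. For brevity, the h networks are reduced to hypothetical single linear layers, and all shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d_ctx = 8, 4
q_u = rng.normal(size=m)        # pretrained general-interest vector of the user
c_s = rng.normal(size=d_ctx)    # scenario-specific context features

# Hypothetical single-linear-layer stand-ins for the h / shifting networks.
W_add = rng.normal(size=(d_ctx, m)) * 0.1
W_cot = rng.normal(size=(d_ctx, m * m)) * 0.1
W_nn, b_nn = rng.normal(size=(m + d_ctx, m)) * 0.1, np.zeros(m)

def iso_add(q_u, c_s):
    """Add operator: shift the interest by a translation vector, TransR-style."""
    z_s = c_s @ W_add
    return q_u + z_s

def iso_cot(q_u, c_s):
    """COT operator: h outputs an m x m contextual operation matrix."""
    M_s = (c_s @ W_cot).reshape(m, m)
    return M_s @ q_u

def iso_nn(q_u, c_s):
    """NN operator: nonlinear shift from the concatenated user/context vectors."""
    return np.tanh(np.concatenate([q_u, c_s]) @ W_nn + b_nn)

for op in (iso_add, iso_cot, iso_nn):
    assert op(q_u, c_s).shape == (m,)   # every operator stays in the interest space
```

Note that all three outputs keep the dimension m, so each shifted vector can be fed to the frozen decision-making network unchanged.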

Relational Meta-Id-Embedding Generator.
This section concentrates on babysitting fresh items in the cold-start phase, where they suffer from a severe cold-start problem. It is worth noting that DisNet can reuse the embedding layer after pretraining. Then, all the attributes except the item id obtain good embeddings, so the only remaining issue is the item id embedding initialization. Following [27], this work learns an IdEG in a learning-to-learn manner. Nevertheless, we notice that the vanilla meta-IdEG feeds item features into a simple neural network to generate embeddings. Obviously, meta-IdEG neglects the fact that an id embedding reflects the community structural information between items, exploiting which has been proven beneficial for alleviating the cold-start problem [26].
To remedy this problem, a novel relational meta-Id embedding generator (RM-IdEG) is proposed: it trains the item id embedding in a learning-to-learn manner and integrates relational information for better embedding initialization, which further improves the performance of DisNet on new items. Specifically, we collect a set of warm-start items that are significantly relevant to the cold-start item t. Many influential relations can be considered, such as items from the same seller or the same brand. For instance, a newly released Nike T-shirt may exhibit similar selling behavior to the other items in Nike shops. Then, we construct an id embedding set I_t = {e^1_t, ..., e^k_t}, where e^i_t (i = 1, 2, ..., k) denotes the id embedding of the i-th most relevant item. We output the aggregated embedding via an attentional embedding aggregator:

ē_t = (1/C) Σ_{i=1}^{k} a^i_t e^i_t, (11)

where C = Σ_i a^i_t is used for normalization. The attention score a^i_t is given by a global attention network:

a^i_t = h^T σ(W e^i_t + b), (12)

where h, W, and b are shared attention parameters. Then, we feed the learned attentional id embedding and the item features into a neural network to obtain the final embedding:

e*_t = tanh(W_2 σ(W_1 [ē_t ‖ v_t] + b_1)), (13)

where W_i (i ∈ {1, 2}) are weight matrices and b_1 is the bias vector. To obtain numerically stable outputs, we follow two tricks from [27]: (1) the bias of the last layer is removed; (2) the tanh activation is applied in the final layer.
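The aggregator and the two-layer generator can be sketched in NumPy. The exponential attention score, the parameter shapes, and the ReLU activations are assumptions of this sketch; only the "no bias, tanh on the last layer" tricks come from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
k, d, d_feat = 6, 8, 12
neighbors = rng.normal(size=(k, d))    # id embeddings of the top-k relevant items
v_t = rng.normal(size=d_feat)          # content features of the cold-start item

# Shared attention parameters (shapes assumed for illustration).
W_att, b_att = rng.normal(size=(d, d)) * 0.1, np.zeros(d)
h_att = rng.normal(size=d) * 0.1
# Two-layer IdEG: bias only on the hidden layer, tanh and no bias on the last.
W1, b1 = rng.normal(size=(d + d_feat, d)) * 0.1, np.zeros(d)
W2 = rng.normal(size=(d, d)) * 0.1

scores = np.exp(np.maximum(neighbors @ W_att + b_att, 0.0) @ h_att)
a = scores / scores.sum()                         # normalize by C = sum of scores
e_bar = a @ neighbors                             # attentional aggregation
hidden = np.maximum(np.concatenate([e_bar, v_t]) @ W1 + b1, 0.0)
e_star = np.tanh(hidden @ W2)                     # final layer: no bias, tanh output
```

Because the aggregated embedding is a convex combination of warm-start neighbors, the generated e*_t starts near the region of the embedding space occupied by related items.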

Remark 1.
The proposed model addresses the cold-start problem from two aspects: (1) through the learning-to-learn training procedure, our model achieves good generalization ability with few training data; (2) by considering influentially relevant items, RM-IdEG automatically encodes community structural information into the embedding initialization, which further improves predictive accuracy.

Training.
Now, we describe the training procedure of our model. Note that the training fresh-item set T does not contain the newborn items. Consequently, we choose an item subset T_c from T to simulate the cold-start setting. For each item in T_c, which corresponds to a task, we preserve m examples for the support set and m examples for the query set (2m examples in total). The remaining examples of these items are dropped, since they should not appear before we train the RM-IdEG. To avoid degrading the performance of the base model, we require each item in T_c to have at most M examples (M > 2m) and, obviously, at least 2m examples. We denote the constructed cold-start dataset by D_c. The data examples of the remaining items T − T_c constitute the warm-start dataset D_w. Remark that the items in T are all warm-start items, since each has at least one data example. We call D_c cold-start because it is used to train RM-IdEG, which is designed for totally cold-start items; D_w is called warm-start because it is used to train DisNet, which does not consider the cold-start problem.
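The cold-start/warm-start split described above can be sketched as follows. The data format, the function name, and the toy log are hypothetical:

```python
import random
from collections import defaultdict

def split_cold_start(examples, m=5, M=20, seed=0):
    """Simulate the cold-start setting: items with between 2m and M examples go
    into T_c; for each, keep m support and m query examples and drop the rest.
    Every other item's examples form the warm-start dataset D_w."""
    by_item = defaultdict(list)
    for ex in examples:
        by_item[ex["item_id"]].append(ex)
    shuffler = random.Random(seed)
    d_c, d_w = {}, []
    for item, exs in by_item.items():
        if 2 * m <= len(exs) <= M:
            shuffler.shuffle(exs)
            d_c[item] = {"support": exs[:m], "query": exs[m:2 * m]}
            # remaining examples of cold-start items are dropped
        else:
            d_w.extend(exs)
    return d_c, d_w

# Hypothetical toy log: item "a" has 12 examples, item "b" has only 3.
log = [{"item_id": "a", "y": i % 2} for i in range(12)]
log += [{"item_id": "b", "y": 1} for _ in range(3)]
d_c, d_w = split_cold_start(log, m=5, M=20)
# item "a" lands in D_c with 5 support + 5 query examples; item "b" stays in D_w
```

With m = 5 and M = 20, item "a" (12 examples, within [10, 20]) enters D_c and loses 2 surplus examples, while item "b" remains warm-start.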
In summary, we have three datasets: (1) an auxiliary dataset D a , having no scenario-specific contexts, collected from other domains; (2) a warm-start dataset D w that has rich contextual information; and (3) a cold-start dataset D c that contains few-shot examples. Accordingly, the whole model is trained in three stages, and we put the details in Algorithm 1.

Experiments
To justify the effectiveness of DisNet and RM-IdEG, we conduct comprehensive experiments to answer the following research questions:
RQ1: Can DisNet exploit auxiliary data and scenario-specific contexts to alleviate the data-deficiency problem, and how do the three interest-shifting operators compare?
RQ2: Does RM-IdEG outperform existing id embedding generators in both the cold-start and warmed-up phases?
RQ3: How sensitive is the proposed model to its main hyperparameters?

Data Splitting.
For MovieLens and Book-Crossing, we first group the items by their ids. Items whose number of examples is less than M + 1 and larger than 2m − 1 are put into T_c. Then, we construct the cold-start dataset D_c by preserving m support and m query examples for each item in T_c, as described in the Training section. For Taobao-Fresh, the auxiliary data D_a have already been collected. We then split the fresh-item recommendation data into two parts. The first is a cold-start dataset D_c whose items have between 10 and 20 interactions (inclusive); each item in D_c has a support set and a query set of 5 examples each. The examples of the remaining items form the warm-start dataset D_w. The statistics of these datasets can be found in Table 2.

Algorithm 1: Three-stage training of DisNet with RM-IdEG.
Input: D_a: auxiliary dataset; D_w: warm-start dataset; D_c: cold-start dataset; (t, u, c, c_s): a testing example
Output: ŷ_DN: the predicted label of (t, u, c, c_s)
1: repeat /* the first stage: pretrain the model using auxiliary data */
2:   Randomly sample a batch of data from D_a
3:   Calculate the predicted label ŷ by equation (1)
4:   Update g, f_u, f_t by gradient descent
5: until converged
6: Freeze g, f_u, f_t as the pretrained networks ḡ, f̄_u, f̄_t
7: repeat /* the second stage: train DisNet using warm-start data */
8:   Randomly sample a batch of data from D_w
9:   Calculate q_u, p_t using f̄_u, f̄_t
10:  Compute the shifted interest vector q^s_u by equation (6)
11:  Calculate the predicted label ŷ_DN using ḡ by equation (7)
12:  Update h and ISO(·) by gradient descent
13: until converged
14: Fix all the trainable parameters except the item id embeddings
15: repeat /* the third stage: train RM-IdEG using cold-start data */
16:  Randomly sample an item t_i and get its support/query sets (D^i_s, D^i_q) from D_c
17:  Aggregate the embeddings of the relational items of t_i by equation (11)
18:  Generate an id embedding e*_t for t_i using RM-IdEG
19:  Compute the cold-start loss on D^i_s by equation (2)
20:  Update the id embedding of t_i to e′_t by equation (3)
21:  Compute the warmed loss on D^i_q by equation (4)
22:  Compute the meta-loss by equation (5) and update RM-IdEG by gradient descent
23: until converged

Data Generation.
To answer RQ1, for each dataset, we run DisNet on three types of data:
- Auxiliary-only data: the auxiliary data plus context-free warm-start data, i.e., the warm-start data with their context features removed.
- Context-only data: exactly the warm-start data; in other words, DisNet is run without pretraining.
- Full data: the auxiliary data together with the warm-start data; this is the main setting of this paper.
Note that the three types of data are used to test the effectiveness of DisNet, while cold-start data are used to evaluate the superiority of RM-IdEG.
For performance evaluation, we randomly divide the warm-start and cold-start data into 80% for training and 20% for testing. We run each experiment five times, and the mean AUC on the testing set is reported.
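AUC on the held-out split can be computed with the standard rank-sum (Mann-Whitney) formula. This standalone implementation is only a sketch of the metric used above, equivalent to library routines such as scikit-learn's `roc_auc_score`:

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the rank-sum formula; tied scores receive average ranks."""
    y_true = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(s, kind="mergesort")
    ranks = np.empty(len(s))
    ranks[order] = np.arange(1, len(s) + 1)
    for v in np.unique(s):                  # average ranks for tied scores
        mask = s == v
        ranks[mask] = ranks[mask].mean()
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    # fraction of (positive, negative) pairs ranked in the correct order
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75: 3 of 4 pairs correct
```

AUC is a ranking metric, which suits the recommendation setting: it measures how often a clicked item is scored above an unclicked one, independently of any decision threshold.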

Baselines.
We evaluate the proposed model in two stages. In the first stage, we compare DisNet with three context-aware recommendation models:
- DeepFM [11]: it feeds the embeddings into both a factorization machine and a multilayer perceptron and aggregates their outputs to obtain the final prediction.
- PNN [13]: the dense embeddings are fed into a dense layer and a product layer; their outputs are concatenated and passed through a two-layer neural network to obtain the prediction.
- CFM [15]: a recent state-of-the-art CARS method that explicitly learns second-order feature interactions. It computes the pairwise outer products of the dense embeddings, stacks them into an interaction cube, and applies convolution and pooling to obtain the final prediction.

The embedding dimension of each input field is fixed to 128, and ReLU is used as the activation function for all models. As suggested in [11], we use three dense hidden layers as the deep component for both DeepFM and PNN. For DisNet, the size of the user/item latent representation is set to 64. We use two fully connected layers with a hidden dimension of 64 for the user/item representation networks as well as for the decision-making network; the outputs of the user/item representation networks are not activated. The context network of the NN/add ISO and the shifting network of the NN ISO also comprise two fully connected layers with hidden size 64 and no activation in the final layer. For the COT ISO, we linearly learn a 64 × 64 contextual operation matrix from the contexts. Finally, the learning rate and the l2-regularization parameters are fine-tuned by five-fold cross-validation.

In the second stage, we evaluate RM-IdEG against two baselines:
- Rand-IdEG: random initialization of id embeddings, one of the most commonly used strategies in recommender systems.
- Meta-IdEG [27]: the state-of-the-art solution to the cold-start problem.
It first feeds the item features into a simple neural network to generate embeddings and then trains them in a learning-to-learn manner.
For Rand-IdEG, we initialize the id embeddings with random values drawn from a zero-mean Gaussian distribution with standard deviation 0.01. For meta-IdEG, we use the neural network architecture suggested in [27]. For RM-IdEG, we use a two-layer neural network with a hidden size of 128 as the IdEG network. According to Pan et al. [27], the tradeoff parameter η is robust; hence, we follow their experimental setting and set η to 0.1 for both meta-IdEG and RM-IdEG. We also follow their two suggestions: use tanh as the final activation and remove the bias of the output layer. For a target item in a synthetic dataset, we choose its k nearest neighbors from the previous training data, i.e., D_w ∪ D_a, under the Hamming distance as the relevant items, where k is chosen by five-fold cross-validation. For Taobao-Fresh, we randomly select 10 items with the same seller and 10 items with the same brand as the relevant items. We choose DisNet-NN, pretrained on D_a and D_w, as the base model. Tables 3 and 4 report the testing AUC of the context-aware models on the two synthetic datasets and the Taobao-Fresh dataset. We have the following findings:

Performance Comparison of Context-Aware Models (RQ1).
All the methods obtain their best performance on the full data. For example, on Taobao-Fresh, DisNet-NN improves the AUC scores over the auxiliary-only and context-only data by 1.00% and 1.69%, respectively. This finding verifies the importance of utilizing auxiliary data and contexts to alleviate the data-deficiency problem. On the Taobao-Fresh dataset, all the methods achieve significantly greater improvement from the context-only data than from the auxiliary-only data, demonstrating that, in the fresh-item recommendation task, the context information strongly reflects the user interest.

On the auxiliary-only data, all the models are competitive with one another. However, on the full data, the baselines show no significant improvement after the context features are involved. The reason is that these baselines deeply couple the contexts into the model, and thus the knowledge of the auxiliary domain cannot be fully utilized. Take DeepFM as an example: since D_a and D_w have different input formats, the deep component cannot be reused.
Though we can reuse the embedding layer, its predictive performance is limited. DisNet models trained on the full data significantly outperform all the baselines as well as their auxiliary-only and context-only counterparts. The interest-shifting operator enables us to fully exploit both the context and the cross-domain information. The different interest-shifting operators are competitive with one another; the NN-based operator obtains the best performance because it shifts the user interest nonlinearly. Interestingly, DisNet-COT always underperforms DisNet-Add on the context-only data but outperforms it on the full data. We suppose the reason is that the COT operator, having more parameters, tends to overfit the context-only data; with the help of the auxiliary data, this problem is alleviated.

Performance Comparison of Different IdEGs (RQ2).
Tables 5 and 6 list the cold-start and warmed-up performance of DisNet with different id embedding generators. Once an IdEG produces the id embeddings, the cold-start performance is evaluated directly on a meta-testing query set in which all items are cold-start ones. Then, we perform one step of gradient descent to update the id embeddings using a meta-testing support set containing the same items as the query set. Finally, the warmed-up performance is evaluated again on the query set.
From the results, we conclude that meta-IdEG and RM-IdEG outperform Rand-IdEG in both the cold-start and warmed-up phases because the learning-to-learn training procedure enables them to quickly achieve good generalization ability on unseen data. RM-IdEG achieves the best performance on all the datasets; in particular, even with one-shot training, RM-IdEG still performs best on the Book-Crossing dataset. By integrating the information of significantly relevant items, RM-IdEG inherently models the community structural information when initializing the id embeddings.

Parameter Sensitivities (RQ3).
The main hyperparameters are the tradeoff parameter η of the meta-loss and the number of relevant items k. The robustness of η has been studied in [27]; thus, we investigate the sensitivity of k, with the results on the Book-Crossing and MovieLens datasets shown in Figure 3. When k is small, the performance is close to that of meta-IdEG because little relational information is learned. The best result is obtained at k = 6, after which the performance drops. The reason is that, as k becomes larger, the relations become weaker while the model complexity increases.

Context-Aware Recommendation.
Context-aware recommender systems (CARSs) have attracted considerable attention in past years [7]. Early work in CARS can be divided into two categories: (1) prefiltering methods [40], where the context guides the selection of training data, and (2) postfiltering methods [41], where the context drives the selection of the recommendation results. The main limitation of these methods is that they require supervision and fine-tuning in all steps of the recommendation [42]. To address this problem, contextual modeling approaches capture the contextual information directly in the model construction. Some works are based on matrix factorization [8], such as CAMF [28] and CSLIM [9]. Another group of studies exploits tensor factorization techniques to model user-item-context relations [43,44]. Recently, factorization machine-based [42,45,46] and deep learning-based [47,48] CARSs have become increasingly popular; they directly model nonlinear interactions between features. Some studies also use representation learning techniques, e.g., CARS2 [49] and COT [37], which provide not only latent vectors but also context-aware representations. In summary, all of the above methods assume the data are sufficient for training, whereas a severe data-deficiency problem occurs on many fresh-item recommendation pages.

Cross-Domain Recommendation.
As we have discussed, data deficiency is one of the most challenging problems for recommender systems, and it is even more severe in many fresh-item recommendation scenarios. One promising solution to this problem is cross-domain recommender systems (CDRSs) [50]. Existing CDRSs can be categorized into symmetric and asymmetric ones. Symmetric models [16,18,51,52] collect sparse data from multiple domains and anticipate that these domains can complement each other. In our task, a symmetric strategy is unsuitable because the two domains have heterogeneous data formats and imbalanced data sizes. Thus, we consider asymmetric models [19,20,21], which aim to leverage data in an auxiliary domain to alleviate the data deficiency of the target domain. In this way, knowledge learned from the auxiliary domain is directly transferred to the target domain, acting as priors or regularization. Nevertheless, many asymmetric CDRSs adopt shallow methods and have difficulty learning complex user-item interactions [18,26]. Moreover, scenario-specific contextual information of the target domain has seldom been considered.

Cold-Start Recommendation.
When recommending fresh items, many of which have never been exposed to users, a severe cold-start problem occurs. To handle this problem, it is common to collect side information for the cold item or user, e.g., item attributes [22,23] and user attributes [24,25]. A recent work, HERS [26], also utilizes relational data, such as users' social information, to boost performance. In [16], the authors explored a symmetric cross-domain recommender system, where shared knowledge can help alleviate the cold-start problem.
Recently, a series of works [27,53,54] has adopted the meta-learning technique [55], which enables a recommender system to achieve good generalization after few-shot training. From the cold-start user perspective, MeLU [53] learns a meta-id embedding for cold-start users and then predicts user preferences on items by the norm of gradients. From the cold-start item perspective, Pan et al. [27] proposed the meta-Id embedding generator (meta-IdEG), which also takes id embedding initialization into account. However, since meta-IdEG only uses item features to generate the id embedding, it ignores the community structural information concealed in id embeddings, which leads to a suboptimal solution.
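To make the learning-to-learn procedure concrete, the sketch below illustrates a Meta-IdEG-style two-phase meta-loss: a cold-start loss on one batch, one gradient step on the generated embedding, and a warm-up loss on a second batch. The linear generator, squared-error dot-product scores, and all dimensions are simplifying assumptions for illustration, not the actual architecture of [27].

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_emb = 8, 4

# Hypothetical linear generator: item features -> id embedding.
W = rng.normal(scale=0.1, size=(d_emb, d_feat))

def meta_loss(W, x_item, users_a, y_a, users_b, y_b, lr=0.1, eta=0.5):
    """Two-phase meta-loss in the spirit of Meta-IdEG [27] (sketch):
    cold-start loss on batch A, one gradient step on the generated
    embedding, then warm-up loss on batch B with the adapted embedding."""
    e = W @ x_item                               # generated id embedding
    # Phase 1: cold-start loss (squared error of dot-product scores).
    err_a = users_a @ e - y_a
    loss_a = np.mean(err_a ** 2)
    # One gradient step on the embedding itself (learning-to-learn).
    grad_e = 2.0 * users_a.T @ err_a / len(y_a)
    e_adapted = e - lr * grad_e
    # Phase 2: warmed-up loss with the adapted embedding.
    err_b = users_b @ e_adapted - y_b
    loss_b = np.mean(err_b ** 2)
    # eta trades off cold-start against warmed-up performance.
    return eta * loss_a + (1.0 - eta) * loss_b

x_item = rng.normal(size=d_feat)
users_a = rng.normal(size=(16, d_emb))
users_b = rng.normal(size=(16, d_emb))
y_a = rng.integers(0, 2, 16).astype(float)
y_b = rng.integers(0, 2, 16).astype(float)
print(meta_loss(W, x_item, users_a, y_a, users_b, y_b))
```

Minimizing this loss with respect to the generator weights W (rather than the embedding) is what pushes the generated embeddings toward fast adaptation.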

Discussion and Conclusion
Further Discussion. In this section, we discuss the significance of this work.

Importance of the Application.
The fresh-item recommendation task reveals a new perspective on personalized recommendation, i.e., the impact of items' life periods. Some people may prefer products that stand the test of time, while others may be interested in newly released products.
The New Tendency page enables fully personalized recommendation for the latter group. From another point of view, fresh items also obtain more opportunities to be exposed; hence, high-quality novel products can quickly become popular. We also address the main difficulties of this learning task, i.e., data deficiency and cold start.
Importance of the Techniques. Although the two techniques, DisNet and RM-IdEG, were proposed to handle the fresh-item recommendation task, we find that both have a wide range of applications.
As aforementioned, DisNet is designed for fresh-item recommendation pages. In fact, such pages are quite common on existing E-commerce platforms. For example, after a bill is paid, the platform will recommend other related items to the customer, which is a classical fresh-item recommendation scenario. Such a page usually contains rich contextual information; the contexts reflect that the user interest shifts from a general one to a scenario-specific one. However, with fewer page views, such pages usually face a severe data-deficiency problem. This work addresses the issue with a novel learning framework that simultaneously transfers knowledge from an auxiliary domain and fully utilizes the contextual information.
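As a concrete illustration of context-driven interest shifting, the sketch below shows one plausible gating-style operator: the context produces an element-wise gate that re-weights a user's general interest vector. The three operators actually used by DisNet are defined in the paper's method section; the function and parameter names here (`shift_interest`, `W_ctx`) are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def shift_interest(u_general, context, W_ctx):
    """Gating-style interest-shifting sketch: project the context
    features to a per-dimension gate in (0, 1), then scale the
    general interest representation toward a scenario-specific one."""
    gate = sigmoid(W_ctx @ context)   # shape: same as u_general
    return gate * u_general
```

Because the gate stays in (0, 1), the shifted interest is an interpretable, context-dependent attenuation of the general one; other operator families (e.g., additive or bilinear shifts) would follow the same interface.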
RM-IdEG can also be applied to many real-world applications. In [27], the authors proposed learning meta-id-embeddings for cold-start advertisements; for a new advertisement, relevant advertisements can be collected by its company, topic, and so on, so that the model generates better id embeddings. Furthermore, other relational data can also be considered. For instance, for the user cold-start problem [53], we may explore the social network of a new user so that RM-IdEG can initialize a fast-adapting and relation-aware id embedding.
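The relational component of such an initializer can be sketched as follows: fuse a feature-based meta-embedding with a similarity-weighted average of the id embeddings of the k most relevant warm items. The linear generator, the softmax-over-feature-similarity relevance weights, and the equal-weight fusion are all assumptions made for this sketch, not the exact RM-IdEG architecture.

```python
import numpy as np

def rm_id_embedding(x_item, W_gen, rel_embs, rel_feats):
    """RM-IdEG-style initializer (sketch): combine a feature-based
    meta-embedding with a relational term aggregated from the id
    embeddings of k relevant, already-warm items."""
    e_meta = W_gen @ x_item                      # feature-based component
    # Softmax over feature similarity as an assumed relevance signal.
    sims = rel_feats @ x_item                    # shape (k,)
    w = np.exp(sims - sims.max())
    w /= w.sum()
    e_rel = w @ rel_embs                         # relational component
    return 0.5 * (e_meta + e_rel)                # simple fusion (assumption)

# Usage with random data: one cold item, k = 6 relevant warm items.
rng = np.random.default_rng(1)
d_feat, d_emb, k = 8, 4, 6
W_gen = rng.normal(scale=0.1, size=(d_emb, d_feat))
e0 = rm_id_embedding(rng.normal(size=d_feat), W_gen,
                     rng.normal(size=(k, d_emb)),
                     rng.normal(size=(k, d_feat)))
```

Swapping `rel_embs`/`rel_feats` for a new user's social neighbors would give the user-side variant discussed above.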

Conclusion.
In this work, we address two difficulties of the fresh-item recommendation task. First, we propose a deep interest-shifting network (DisNet) to deal with the data-deficiency problem of fresh-item recommendation. Specifically, users' general interests are learned from a huge auxiliary dataset; then, our model shifts the user interest to a scenario-specific one using context features. Second, we propose a relational meta-Id-embedding generator (RM-IdEG) to alleviate the cold-start problem. RM-IdEG is trained in a learning-to-learn manner with relational information integrated; hence, community structural information can be inherently embedded in the id embeddings of newborn items. Extensive experiments on two synthetic datasets and a real-world dataset clearly verify the effectiveness of our approaches, which have already been deployed in a large-scale online fresh-item recommendation application.

Data Availability
Previously reported data were used to support this study; these prior studies (and datasets) are cited at the relevant places within the text as references [38,39].

Conflicts of Interest
The authors declare that they have no conflicts of interest.