News Recommendation System Based on Topic Embedding and Knowledge Embedding

Abstract: News recommendation systems are designed to deal with massive amounts of news and provide personalized recommendations for users. Accurately capturing user preferences and modeling news and users are the keys to news recommendation. In this paper, we propose a new framework, a news recommendation system based on topic embedding and knowledge embedding (NRTK). NRTK handles the news titles that users have clicked on from two perspectives to obtain news and user representation embeddings: 1) extracting explicit and latent topic features from news and mining users' preferences for them in historical behaviors; 2) extracting entities and propagating users' potential preferences in the knowledge graph. Experiments on a real-world dataset validate the effectiveness and efficiency of our approach.


Introduction
Online news websites collect news content from a variety of sources and provide it to users, attracting large numbers of readers. However, because of the large amount of news generated every day, it is almost impossible for users to read all the articles. Therefore, it is critical to help users target their reading interests and make personalized recommendations [1-5].
To improve the accuracy of recommendation systems, recent research focuses on learning news representations more comprehensively. The Deep Knowledge-aware Network (DKN) [4] embeds each news item from three perspectives, word, entity and entity context, and then designs a CNN model to aggregate these features. RippleNet [5] obtains the potential interests of users by automatically and iteratively propagating their preferences in the knowledge graph. However, DKN and RippleNet not only ignore the rich semantic topics in news titles, but also fail to consider the relevance between topics and users' preferences for those topics when learning precise news representations.
As shown in Fig. 1, news titles may contain not only a variety of entities, such as politicians, celebrities, companies or institutions, but also multiple topics, such as politics, entertainment and sports, all of which often play important roles in the title. Long- and short-term user representations (LSTUR) [1] uses explicitly given topic information to learn the representations of news titles. Although explicit topic labels can accurately represent the information of the news, when a news title contains two or more different topics, a single topic label may not be detailed enough to give a comprehensive representation of the news topic. Therefore, we need latent topic information to model news titles in more detail.
For example, the news title "Donald Trump vs. Madonna: Everything We Know" appears to be about a music topic. However, the content of the news is more relevant to politics. Such misinterpretation in news modeling can lead to serious errors in learning users' topic preferences. Therefore, considering only the explicit topic information and ignoring the latent topic information of news will reduce the accuracy of news recommendation systems.
To address the limitations of existing methods, and inspired by the wide success of leveraging knowledge graphs, we propose a news recommendation approach based on topic and entity preferences in historical behavior. The core of our approach is a news encoder and a user encoder. In the news encoder, we jointly train news-title and word vectors to obtain the topic information of the news, and extract entities to construct the knowledge graph. In the user encoder, we use a combination of a long short-term memory network and a self-attention mechanism to mine users' topic preferences, and a graph attention algorithm to mine users' potential preferences for the entities in the knowledge graph based on users' historical behavior. Extensive experiments on a real-world dataset prove the validity of our news recommendation method.

Our Approach
In this section, we first introduce the overall framework of the news recommendation system based on topic embedding and knowledge embedding (NRTK), as illustrated in Fig. 2, and then discuss each module. NRTK contains three parts: a news encoder, a user encoder and a click predictor. For each news item, we extract a news representation vector through the news encoder, which uses two modules to extract features of the news, giving us a set of embedding vectors for a user's clicked news. In the user encoder, we use a long short-term memory (LSTM) network combined with self-attention to learn the user's topic preferences, and then use a graph attention algorithm to aggregate the user's entity preferences to obtain the final representation of the user. In the click predictor, we use a scoring function to calculate the probability of a user clicking a candidate news item.

News Encoder
The news encoder module is used to learn news representations. It contains two modules. The first one is word embedding and knowledge-graph embedding. Each news title is composed of a sequence of words, t = [w_1, w_2, ...]. To construct the semantic space, we use the word2vec model to pretrain a matrix W ∈ R^{n×m} of word vectors and a matrix W' ∈ R^{n×m} of context word vectors. In addition, each word w may be associated with an entity e in the knowledge graph; we then use TransE [6] to obtain entity embeddings e ∈ R^d, where d is the size of the vector learned for each entity in a news title, and take their average value k as the knowledge-graph embedding of the title.
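The averaging step described above can be sketched as follows; this is a minimal illustration assuming the pretrained TransE vectors are available as a plain dictionary, and the function name is our own:

```python
import numpy as np

def title_knowledge_embedding(title_entities, entity_emb):
    """Average the TransE embeddings of the entities linked to a news title.

    entity_emb: dict mapping entity id -> d-dim vector (assumed pretrained).
    Returns the title-level knowledge-graph embedding k
    (a zero vector if no word in the title links to any entity).
    """
    vecs = [entity_emb[e] for e in title_entities if e in entity_emb]
    if not vecs:
        d = len(next(iter(entity_emb.values())))
        return np.zeros(d)
    return np.mean(vecs, axis=0)
```

For example, a title linked to two entities receives the mean of their two vectors as its knowledge-graph embedding k.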
The second module is topic-level embedding. We use the doc2vec Distributed Bag of Words (DBOW) model [7] to learn jointly embedded news-title and word vectors. The doc2vec DBOW model consists of a matrix D ∈ R^{c×m}, where c is the number of news titles and m is the size of the vector learned for each news title. For each news title t in the corpus, the context vector w ∈ W' of each word in the title is used to predict the news-title vector. In the learning process, each news-title vector is required to be close to the word vectors of the words in it, and far from the word vectors of the words not in it. This results in a semantic space where news titles are closest to the words that best describe them and far from words that are dissimilar to them. In this space, an area where news titles are highly concentrated means that the news titles in this area are highly similar. Such a dense area indicates that these news titles share one or more common latent topics. We assume that the number of dense areas equals the number of topics.
We use uniform manifold approximation and projection (UMAP) [8] to reduce the dimension of the news-title vectors. Then, we use hierarchical density-based spatial clustering of applications with noise (HDBSCAN) [9,10] to identify dense clusters of news titles and noise titles in the UMAP-reduced space, and mark each news title in the semantic embedding space with either a noise label or the label of a dense cluster.
The topic vectors can then be calculated from the labeled dense news-title clusters in the semantic embedding space. Our method is to compute the centroid, i.e. the arithmetic mean, of all news-title vectors in the same dense cluster.
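The centroid step can be sketched as follows, assuming the cluster labels have already been produced by the UMAP + HDBSCAN pipeline described above (label -1 marking noise, which is HDBSCAN's convention); the function name is our own:

```python
import numpy as np

def topic_vectors(doc_vecs, labels):
    """Compute one topic vector per dense cluster as the centroid
    (arithmetic mean) of the news-title vectors assigned to it.

    doc_vecs: (c, m) array of news-title vectors in the semantic space.
    labels:   per-title cluster labels; -1 marks noise and is excluded.
    Returns (topics, topic_ids): an (x, m) matrix C and the cluster ids.
    """
    labels = np.asarray(labels)
    topic_ids = sorted(l for l in set(labels.tolist()) if l != -1)
    topics = np.stack([doc_vecs[labels == l].mean(axis=0) for l in topic_ids])
    return topics, topic_ids
```

The resulting matrix of centroids plays the role of the topic matrix C described next.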
Finally, we get a matrix C xm  where x is the number of topics, m is the dimension of the topic vector.For each news title t, we get its topic embedding as follows: (2) where W t is a weight matrix of topics, and t is the news titles topic embedding.
The final representation of a news title is the concatenation of the averaged entity embedding and the topic embedding, formulated as r = k ⊕ t. (3)

User Encoder
The user encoder module is used to learn the representations of users from their browsed news.It contains two modules.
The first one is the topic preference learning module. Its purpose is to learn long-term and short-term user topic preferences. Since users have different degrees of interest in each historically clicked news title, and an attention mechanism can capture the topics a user is interested in, a long short-term memory network combined with the self-attention mechanism can be used to mine users' topic preferences from their historical click behavior.
From the news encoder, we have obtained the news topic embeddings T ∈ R^{c×m}. Given the user's click-history matrix Y ∈ R^{c×m}, whose rows are the topic embeddings of the clicked news titles, we obtain the query Q and key K in the self-attention mechanism by nonlinear transformations of Y, where W_Q and W_K are the weight matrices of the query and key. The weight matrix P is then obtained from Q and K, where P is a similarity matrix over the click history Y. Finally, the output of self-attention is obtained by multiplying the similarity matrix P and the history matrix Y: a = PY, (7) where a ∈ R^{c×m} represents the user's preferences. We average the rows of the self-attention output to obtain a single attention vector.
where p is the user's topic preference embedding.
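The self-attention step can be sketched as below. This is a simplified illustration: the LSTM is omitted, and both the tanh nonlinearity and the scaled dot-product similarity are our assumptions, since the paper does not spell out these choices:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_preference(Y, W_q, W_k):
    """Self-attention over the click-history matrix Y (c x m), whose rows
    are the topic embeddings of clicked titles. Q and K are nonlinear
    transforms of Y; the similarity matrix P reweights Y, and averaging
    the attended rows yields the single topic preference vector p."""
    Q = np.tanh(Y @ W_q)                        # query, c x m
    K = np.tanh(Y @ W_k)                        # key,   c x m
    P = softmax(Q @ K.T / np.sqrt(Y.shape[1]))  # similarity matrix, c x c
    a = P @ Y                                   # attended history,  c x m
    return a.mean(axis=0)                       # topic preference p
```

Each row of P sums to one, so a is a convex reweighting of the clicked-title embeddings before averaging.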
The second module is the knowledge-graph-level preference propagation module. In the knowledge graph, a head entity is related to many entities through direct or indirect relationships, but the existence of a relationship does not mean that users will be equally interested in all of these entities. This module therefore uses graph attention to learn users' preferences over the knowledge graph.
To describe users' hierarchically extended preferences based on the knowledge graph, we recursively define the set of n-hop relevant entities E_u^n for user u, where E_u^0 represents the entities contained in the news titles that the user has clicked on in the past.
We then define the n-hop triple set S_u^n of user u (Eq. (10)), where S_u^n consists of the triples associated with the entities in E_u^{n-1}. Given the average value k ∈ R^d of the entity embeddings in the user's clicked news titles and the 1-hop triple set S_u^1 of user u, we use an attention mechanism to learn which entities the user prefers.
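The recursive construction of E_u^n and S_u^n can be sketched as follows (a naive scan over the triple list; the helper name is our own):

```python
def n_hop_triple_sets(kg_triples, seed_entities, n_hops):
    """Build the n-hop relevant entity sets E_u^n and triple sets S_u^n.

    kg_triples:    list of (head, relation, tail) triples.
    seed_entities: E_u^0, the entities in the user's clicked titles.
    Hop n collects every triple whose head lies in the previous hop's
    entity set; its tails form the next entity set.
    """
    E = [set(seed_entities)]
    S = []
    for _ in range(n_hops):
        triples = [(h, r, t) for (h, r, t) in kg_triples if h in E[-1]]
        S.append(triples)
        E.append({t for (_, _, t) in triples})
    return E, S
```

A production system would index triples by head entity instead of scanning the full list at every hop.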
where R_i ∈ R^{d×d} and h_i ∈ R^d are the embeddings of relation r_i and head h_i, respectively. The score x_i can be regarded as a weight indicating the user's interest in entity h_i under relation r_i. Users may have different degrees of interest in the same entity under different relations, so taking the relations into account when calculating the weights better captures the user's interest in entities.
After obtaining the weights, we multiply the tails in S_u^1 by them and obtain the vector hop_1 by linear addition, where t_i ∈ R^d represents the tails in S_u^1. Through this process, a user's preferences are transferred from his click history to the 1-hop relevant entities E_u^1 along the links in S_u^1. By replacing k with hop_1 in Eq. (11), the module iterates this procedure over user u's triple sets S_u^i for i = 1, ..., N. Therefore, a user's preference is propagated N times along the triple sets from his click history, and N preference vectors are generated: hop_1, hop_2, ..., hop_N. To obtain the user's final entity preference embedding, we merge all of these embeddings.
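One propagation hop can be sketched as below, assuming relation embeddings are d×d matrices, entity embeddings are d-vectors, and the attention scores are softmax-normalized (the normalization is our assumption):

```python
import numpy as np

def propagate(k, triple_set, rel_emb, ent_emb):
    """One preference-propagation hop over a triple set.

    Attention score x_i = k^T R_i h_i for each triple (h_i, r_i, t_i),
    softmax-normalized, then hop = sum_i x_i * t_i over the tails.
    """
    scores = np.array([k @ rel_emb[r] @ ent_emb[h] for (h, r, _) in triple_set])
    x = np.exp(scores - scores.max())
    x /= x.sum()
    return sum(w * ent_emb[t] for w, (_, _, t) in zip(x, triple_set))
```

Iterating with k replaced by the previous hop's output produces hop_1, ..., hop_N; merging them (e.g. by summation) gives the entity preference embedding f.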
The embedding f is the output of this module. The final user representation is the concatenation of the entity preference embedding and the topic preference embedding, formulated as u = p ⊕ f. (14)

Click Predictor
The click predictor is used to predict the probability of a user clicking a candidate news item. Denoting the representation of a candidate news title t by r, the click probability score is computed as ŷ = σ(u⊤r), where σ(·) is the sigmoid function.
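A minimal sketch of such a scoring function, assuming the common inner-product form (the paper's exact expression is not given, so this form is an assumption):

```python
import numpy as np

def click_probability(u, r):
    """Click score: sigmoid of the inner product between the user
    representation u and the candidate-news representation r."""
    return 1.0 / (1.0 + np.exp(-(u @ r)))
```

When u and r are orthogonal the score is 0.5; strongly aligned representations push it toward 1.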

Datasets and Experimental Settings
We use the Bing News server logs from May 16, 2017 to January 11, 2018 as our dataset. Each impression in the dataset contains a timestamp, a news ID, a title and a category label. The basic statistics and distribution of the news dataset are shown in Table 1. In our experiments, we divided the dataset into training, validation and test sets in a 6:2:2 ratio. The word embeddings are 300-dimensional and initialized by the word2vec model; the entity embeddings are 50-dimensional and initialized by TransE. We set the hop number H = 2. These hyperparameters were tuned on the validation set. In addition, each experiment was independently repeated 10 times, and the average results in terms of area under the curve (AUC) and accuracy (ACC) were taken for performance analysis.
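The two reported metrics can be computed as in the following NumPy sketch, with AUC obtained via pairwise ranking and ACC at a 0.5 threshold (the cutoff is our assumption):

```python
import numpy as np

def auc_acc(y_true, y_score, threshold=0.5):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count half), and accuracy at a fixed threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]                 # all pos-neg pairs
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))
    acc = ((y_score >= threshold) == y_true).mean()
    return auc, acc
```

This pairwise formulation is equivalent to the area under the ROC curve and avoids constructing the curve explicitly.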

Results
The results of all methods on click-through-rate (CTR) prediction are presented in Table 2. The experimental results show that our recommendation system performs best among the compared recommendation models. Specifically, NRTK outperforms the baselines by 1.9% to 8.0% on AUC and 2.1% to 8.3% on ACC.
We also evaluate the influence of the maximal hop number H on NRTK's performance. The results, shown in Table 3, indicate that the best performance is achieved when H is 2 or 3. This is because if H is too small, it is difficult to explore the connections and long-distance dependencies between entities, while if H is too large, it brings in much more noise than useful signal.

Ablation Study
To verify that attention mechanisms can improve recommendation performance, we designed an ablation study to evaluate our model. In this study, instead of using attention mechanisms to capture user preferences for topics and entities, the ablated model simply aggregates them together. The experimental results are shown in Fig. 3. From these results, we find that both the self-attention and the graph attention are very useful. This is because users have different interests in different topics and entities, and capturing users' preferences is important for recommendation.

Parameter Sensitivity
In this section, we study the effect of the embedding dimension d and the training weight λ_2 of the knowledge-graph embedding term on model performance. We vary d from 2 to 128 and λ_2 from 0 to 1.0, keeping the other parameters constant. The AUC results are shown in Fig. 4. We observe from Fig. 4(a) that the performance of the model improves at first as d increases, since larger embeddings can encode more useful information, but degrades after d = 64, likely due to overfitting. From Fig. 4(b), the performance of NRTK is best when λ_2 = 0.01.

Conclusion
In this paper, we propose NRTK, an end-to-end framework that naturally incorporates a topic model and a knowledge graph into recommendation systems. NRTK overcomes the limitations of existing recommendation methods by addressing two major challenges in news recommendation: 1) explicit and latent topic features are extracted from news titles by topic-level embedding, and users' long-term and short-term preferences for them are mined; 2) through the knowledge-graph-level preference propagation module, it automatically propagates users' potential preferences and explores their hierarchical interests in the knowledge graph. We conduct extensive experiments in a recommendation scenario. The results show that NRTK has a significant advantage over strong baselines.
For future work, we plan to improve the efficiency and precision of finding topics and further investigate the methods of characterizing entity-relation interactions.

Fig. 1
Fig. 1 Illustration of news title with a variety of entities and topics