MRE: A translational knowledge graph completion model based on multiple relation embedding

Abstract: Knowledge graph completion (KGC) has attracted significant research interest in the application of knowledge graphs (KGs). Many models have been proposed to solve the KGC problem, including a series of translational and semantic matching models. However, most previous methods suffer from two limitations. First, current models consider only a single form of relation, thus failing to simultaneously capture the semantics of multiple relations (direct, multi-hop and rule-based). Second, the data-sparsity problem of knowledge graphs makes some relations challenging to embed. This paper proposes a novel translational knowledge graph completion model named multiple relation embedding (MRE) to address the above limitations. We embed multiple relations to provide richer semantic information for representing KGs. More specifically, we first leverage PTransE and AMIE+ to extract multi-hop and rule-based relations. Then, we propose two specific encoders to encode the extracted relations and capture the semantic information of multiple relations. Notably, our proposed encoders achieve interactions between relations and connected entities during relation encoding, which is rarely considered in existing methods. Next, we define three energy functions to model KGs based on the translational assumption. Finally, a joint training method is adopted to perform KGC. Experimental results show that MRE outperforms other baselines on KGC, demonstrating the effectiveness of embedding multiple relations for advancing knowledge graph completion.


Introduction
A knowledge graph (KG) [1,2] is a semantic network designed to describe entities of the real world and the relations between them, sometimes referred to as a knowledge base (KB) [3,4]. Each fact in a KG can be denoted as a triple [5] indicating that a head entity and a tail entity are connected by a relation, i.e., (head entity, relation, tail entity). However, even large-volume KGs such as Freebase [6] and NELL [7] remain incomplete, i.e., they are missing many correct triples. Thus, many researchers have paid massive attention to knowledge graph completion (KGC), the task of validating whether a triple is correct or not. KGC is one of the research directions of knowledge representation learning (KRL) [8][9][10][11], which aims at validating triples while preserving the inherent structure of the KGs. Examples of knowledge graphs are illustrated in Figure 1. The colored ellipses represent entities, and arrows connect the relations from the head entities to the tail entities. Most available methods perform KGC by embedding the components of KGs into continuous vector spaces through two steps: embedding the KG components (entities and relations), and then defining a scoring function to measure the plausibility of each triple. Current KGC models can be categorized as translational or semantic matching models. Translational models take relations as translations from head entities to tail entities, such as TransE [12], TransH [13] and TransR [14]. Translational models can take advantage of the translational character of KG components, but they only use addition operations, which limits their expressive power. Semantic matching models match the latent semantics of KG components in the embedding space, such as RESCAL [15] and DistMult [16]. Semantic matching models can capture semantic similarities among different triples, but models with fully connected layers often suffer from overfitting.
Consequently, some convolution neural network (CNN)-based methods [17,18] have been proposed to perform KGC tasks. They can capture deep expressive features and alleviate overfitting problems.

Recently, some researchers have attempted to use path information to improve the performance of knowledge graph completion models. A path starts at the head entity, passes through intermediate entities, and reaches the tail entity. Taking a 2-hop path as an example, the path is composed of two multi-hop relations and an intermediate entity, i.e., r_1 → e → r_2, where r_1 and r_2 are 2-hop relations and e denotes an intermediate entity. PTransE [19] is a typical translational model that performs simple addition operations on multi-hop paths to complete knowledge graphs. Other methods, such as RUGE [20], inject logic rules [21] into the model and use the rules to guide the representation learning of entities and relations. The rules are interpretable and contain rich semantic information [22], showing the power of knowledge reasoning. The reasoning process can be defined as r ⇐ (r_3, r_4), where r_3 and r_4 denote rule-based relations from which r can be inferred.
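As a concrete illustration, PTransE's addition-based composition of path relations can be sketched in a few lines. This is a minimal NumPy sketch; the dimensionality and random embeddings are placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100  # embedding size, matching the setting used later in the paper

# Illustrative embeddings for a 2-hop path  h --r1--> e --r2--> t
h, e, t = (rng.normal(size=dim) for _ in range(3))
r1, r2 = (rng.normal(size=dim) for _ in range(2))

# PTransE composes the relations along a path by simple addition,
# so the 2-hop path r_1 -> e -> r_2 is represented as r1 + r2.
r_path = r1 + r2

# Under the translational assumption, a plausible path satisfies
# h + (r1 + r2) ≈ t, scored here by an L2 distance (lower is better).
score = np.linalg.norm(h + r_path - t)
```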
Even though the above methods achieve good performance, conducting KGC tasks remains challenging for two reasons. First, most knowledge graph completion models consider only a single form of relation, since they embed either direct relations or multi-hop paths to capture the semantic information of relations. From the perspective of natural language processing, different forms of the same relation are semantically related and complementary. Therefore, multiple (direct, multi-hop and rule-based) relations can potentially be exploited to capture richer semantic information and advance the performance of KGC. Second, the data-sparsity problem of knowledge graphs makes some relations challenging to embed, since very little data involves them. To address these issues, we propose a novel knowledge graph completion model called multiple relation embedding (MRE), which can simultaneously embed multiple relations and jointly learn embeddings of entities and relations.
In this paper, we reconfirm that translational KGC methods are effective and apply the main idea of TransE [12] in our proposed work. MRE aims to take advantage of multiple relations to perform knowledge graph completion. Specifically, MRE mainly consists of four steps. MRE first extracts multi-hop and rule-based relations through corresponding tools. Then, MRE proposes different encoders to encode multi-hop and rule-based relations. Note that our proposed encoders of multiple relations do not learn each relation in isolation but continuously interact with entities during the learning process and use pre-trained embeddings as supervision to maximize the restoration of the semantic information of multiple relations. Next, MRE defines new energy functions to model knowledge graphs. Finally, a joint training method is adopted to train our proposed MRE.
In summary, the proposed KGC method can better use multiple relations and fully capture the semantic information of knowledge graphs. Our contributions can be summarized as follows:
• We propose a novel method for knowledge graph completion that simultaneously embeds multiple relations (direct, multi-hop and rule-based) in a unified framework and defines new energy functions to learn representations of entities and relations.
• Our proposed multiple relation encoders can continuously interact with connected entities in the process of relation encoding, which is rarely considered in most knowledge graph completion methods.
• We evaluate MRE on two benchmark datasets, FB15K-237 and NELL-995, with knowledge graph completion. Experimental results show that MRE achieves the best performance on all evaluation metrics compared to several baselines.
The rest of this paper is organized as follows: Section 2 summarizes the related work of knowledge graph completion. Section 3 introduces some notations and definitions used in this paper. Section 4 introduces the details of our proposed MRE. Section 5 shows and analyzes our experimental results, including comparisons with current methods, further evaluations and visualization analysis. Section 6 concludes our work and points out future research directions.

Related work
This section introduces KGC models that embed KGs into continuous low-dimensional spaces to capture latent semantic representations, including translational and semantic matching methods.
Translational methods: Translational models leverage distance-based scoring functions and measure the plausibility of a triple as the distance between the head entity, after translation by the relation, and the tail entity. TransE [12] is inspired by word2vec: relations are regarded as translations between head entities and tail entities in an implicit semantic embedding space. However, TransE has drawbacks in handling complex relations, such as one head entity and one relation corresponding to multiple tail entities. To tackle this problem, TransH [13] assigns each relation a specific hyperplane and projects the head entity and the tail entity onto that hyperplane. Since each relation may admit infinitely many hyperplanes, TransH uses approximate orthogonality to select one, which may prevent the model from dealing with entities and relations properly. TransR [14] extends TransH [13] by using relation-specific spaces to complete the KGC task. PTransE [19] is an extension of TransE which proposes a path-constraint resource allocation method to extract multi-hop paths and compose all the relations in each path.
Semantic matching methods: Semantic matching models use similarity-based scoring functions that measure the plausibility of triples through matching implicit semantics of KGs embedded in continuous low-dimensional vector space. RESCAL [15] can capture KGs' implicit semantic information by associating each entity with a vector and each relation with a matrix to realize bilinear interactions. DistMult [16] is based on RESCAL [15], and each relation is restricted by a diagonal rather than a full matrix. DistMult can capture bilinear interactions between head and tail entities through the same embedding space and reduce training parameters. HolE [23] represents both entities and relations as vectors by utilizing the circular correlation operation based on RESCAL and DistMult. This allows the model to deal with irreflexive or similar relations in KGs.
In addition to the above methods, there are still many ways to conduct knowledge graph completion tasks, such as RUGE [20], which iteratively learns entity and relation embeddings from labeled triples, unlabeled triples and soft rules, or MADLINK [24], which considers contextual information as well as the textual descriptions of the entities to perform KGC.
Although the experimental results of the above models are all impressive, they only use direct relations or multi-hop paths, limiting the models' representational power. Translational methods are among the most valuable methods for KGC. Thus, our work, while building upon translational methods, is distinguished by the following properties:
• Unlike most KGC methods that only embed single-form relations, MRE simultaneously embeds multiple relations and obtains multiple semantics of relations.
• Different from current KGC methods, MRE considers an interactive process in the relation encoding phase to obtain accurate relation representations.

Background
Notations: Table 1 shows the important symbols of this article. This paper uses lower-case boldface letters to denote vectors (e.g., h) and boldface upper-case letters to represent matrices (e.g., W). We use ∥x∥ to denote the ℓ 2 norm.
Knowledge graph: For a given knowledge graph G = {E, R}, we represent entities by E and relations by R. A KG G includes many factual triples, each in the form (h, r, t), with h, t ∈ E and r ∈ R.
Knowledge graph completion: Knowledge graph completion tasks can be carried out simply by predicting if a triple (h, r, t) is valid or not. In our work, valid triples obtain lower energy (scores) than invalid triples.
Multiple relations: Multiple relations consist of direct relations, multi-hop relations and rule-based relations. The direct relations come from the given knowledge graphs, while the multi-hop and rule-based relations are extracted from them.

Proposed work
Knowledge graph completion aims to predict whether a triple is correct or not. Although much research has been devoted to KGC, most models cannot take advantage of the multiple relations and fail to capture multiple semantic information. This paper proposes a novel knowledge graph completion method called MRE to compensate for existing methods' deficiencies. We attempt to simultaneously embed multiple relations (direct, multi-hop and rule-based) in our model. The overall architecture of MRE is shown in Figure 2. First, we extract different kinds of relations from given triples (4.1). After vector initialization, we propose different encoders to capture multi-hop and rule-based semantics of relations (4.2). Furthermore, we embed multiple relations into the same semantic space based on the translational assumption (4.3). Finally, a joint training method is implemented for optimizing the objective of MRE (4.4). The following section shows the details of our work.

Compositions of multiple relations
In this paper, multiple relations include direct, multi-hop and rule-based relations. The direct relations come from given triples. The multi-hop and rule-based relations are derived from the following procedures.
Multi-hop relations: We follow the path extraction procedure provided by PTransE [19] to extract multi-hop paths. In PTransE, each multi-hop path is extracted together with its reliability via the path-constraint resource allocation mechanism, which extracts different paths and computes a reliability metric for each path flowing from the head entity to the tail entity. A multi-hop path links a head entity, multi-hop relations, intermediate entities and a tail entity. For a given entity pair, we select the path with the largest reliability metric as the optimal path and limit the path length to 2. In this paper, we regard path-determined relations as multi-hop relations. Given a triple (h, r, t), a 2-hop path r_1 → e → r_2 can be generated from (h, r_1, e) and (e, r_2, t), where r_1 and r_2 are 2-hop relations and e denotes an intermediate entity.
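The reliability-based path selection described above can be sketched as follows. The path data and helper name are hypothetical; in practice the reliability scores come from PTransE's path-constraint resource allocation:

```python
# Hypothetical reliability-annotated paths for one entity pair (h, t),
# each path alternating relations and intermediate entities.
paths = [
    (("r1", "e1", "r2"), 0.62),
    (("r3", "e2", "r4"), 0.35),
    (("r5", "e3", "r6", "e4", "r7"), 0.81),  # 3-hop: exceeds the length limit
]

def select_optimal_path(paths, max_hops=2):
    """Keep paths of at most `max_hops` relations and return the one
    with the largest reliability, as the paper describes."""
    def hops(path):
        return (len(path) + 1) // 2  # relations sit at even indices
    candidates = [(p, rel) for p, rel in paths if hops(p) <= max_hops]
    return max(candidates, key=lambda pr: pr[1]) if candidates else None

best = select_optimal_path(paths)
# best -> (("r1", "e1", "r2"), 0.62): the 3-hop path is filtered out
# despite its higher reliability.
```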
Rule-based relations: Rule-based relations are generated using the rule extraction tool AMIE+ [25], which extracts relations based on Horn rules. A Horn rule can be defined as r ⇐ (r_3, r_4), where r_3 and r_4 denote rule-based relations. Each rule has a confidence level measuring how well the rule matches the data: the higher the confidence level, the higher the matching degree of the rule. In this paper, we limit the length of rules to 2 and select the rule with the highest confidence level to extract rule-based relations.
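The confidence-based rule selection can be sketched similarly. The rule names and confidence values below are made up for illustration; AMIE+ produces the real ones:

```python
# Hypothetical AMIE+ output: Horn rules r <= (r_3, r_4), each with a confidence.
rules = {
    "/film/film/country": [
        (("body_rel_a", "body_rel_b"), 0.71),
        (("body_rel_c", "body_rel_d"), 0.28),
    ],
}

def select_rule_body(rules, relation, max_len=2):
    """Return (body, confidence) of the length-limited rule with the
    highest confidence for `relation`, or None if no rule qualifies."""
    candidates = [(body, c) for body, c in rules.get(relation, [])
                  if len(body) <= max_len]
    return max(candidates, key=lambda bc: bc[1], default=None)

body, conf = select_rule_body(rules, "/film/film/country")
r3, r4 = body  # the rule-based relations r_3 and r_4 for this relation
```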
For the relation '/film/film/country', two Horn rules are extracted. Since the confidence level of the former is greater than that of the latter, we select the former, i.e., '/film/film/executive_produced_by' and '/film/film/film_format' are taken as the rule-based relations.

Encoders of multiple relations
Through corresponding relation extraction procedures, we can obtain multi-hop and rule-based relations. To make better use of multiple relations, we propose two encoders that can encode different kinds of relations and achieve interactions between entities and relations.

Multi-hop relation encoding
A multilayer perceptron (MLP) [26] is a simple feedforward neural network that maps a set of input vectors to output vectors and learns the optimal parameters of the network. To encode multi-hop relations, we use two MLP structures that take pre-trained embeddings as inputs and achieve multiplicative interactions between entities and relations. The encoder of multi-hop relations is shown in Figure 3, where r_1 represents the first multi-hop relation embedding and h represents a head entity embedding.
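A plausible NumPy sketch of this idea is given below: two small MLPs whose inputs couple the relation embeddings multiplicatively with the connected entities, combined additively into the encoded multi-hop relation. The ReLU hidden layer, the element-wise product, and the additive combination are our assumptions for illustration, not the paper's exact formulation (which is given in Figure 3):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 100

def mlp(x, W1, b1, W2, b2):
    """A one-hidden-layer perceptron with ReLU, used as a relation encoder."""
    hidden = np.maximum(0.0, W1 @ x + b1)
    return W2 @ hidden + b2

# Two MLPs, one per hop, with randomly initialized parameters.
params = [(rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim),
           rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim))
          for _ in range(2)]

h, e = rng.normal(size=dim), rng.normal(size=dim)    # head / intermediate entity
r1, r2 = rng.normal(size=dim), rng.normal(size=dim)  # 2-hop relations

# Element-wise products let the connected entities interact with the
# relations during encoding (the interaction the paper emphasises).
r1_enc = mlp(r1 * h, *params[0])
r2_enc = mlp(r2 * e, *params[1])
r12_enc = r1_enc + r2_enc  # encoded multi-hop relation r'''_12
```

The rule-based encoder of the next subsection follows the same pattern with r_3, r_4 in place of r_1, r_2.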

Rule-based relation encoding
Similarly, we use two MLP structures to obtain rule-based relation embeddings. The encoder of rule-based relations is shown in Figure 4, where r_3 represents the first rule-based relation embedding.

Energy functions
TransE [12] is the most representative model in knowledge graph completion. Previous work shows that TransE and its extensions [13,14] can obtain very competitive results. We reconfirm that TransE is a robust model and apply its structure in our method. In TransE, relations are regarded as translations between head entities and tail entities in an implicit semantic embedding space. The energy function of TransE is defined as

E_1 = ∥h + r − t∥,

where h represents a head entity embedding, r represents a relation embedding, t represents a tail entity embedding, and E_1 is the energy function of given triples.
Next, we transfer the energy function of given triples to the multi-hop information, incorporating the multi-hop relations under the translational assumption. The energy function can be defined as

E_2 = ∥h + r′′′_12 − t∥,

where r′′′_12 represents an encoded multi-hop relation embedding. The energy function E_2 can be regarded as an additional constraint on the given triples.
Following the above energy function E_2, we transfer the energy function of given triples to the rule-based information, incorporating the rule-based relations under the translational assumption. The energy function can be defined as

E_3 = ∥h + r′′′_34 − t∥,

where r′′′_34 represents an encoded rule-based relation embedding. The energy function E_3 can be regarded as another additional constraint on the given triples.
Therefore, the overall energy function of a triple can be defined as

E = E_1 + E_2 + E_3.
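Putting the three translational energy terms together, the scoring computation can be sketched as follows (with random placeholder embeddings; the unweighted sum corresponds to the Ω = 1 case of the weighted variant analysed in the experiments):

```python
import numpy as np

def energy(h, r, t):
    """Translational energy ||h + r - t|| (lower = more plausible triple)."""
    return np.linalg.norm(h + r - t)

rng = np.random.default_rng(2)
dim = 100
h, t = rng.normal(size=dim), rng.normal(size=dim)
r, r12_enc, r34_enc = (rng.normal(size=dim) for _ in range(3))

E1 = energy(h, r, t)         # direct relation
E2 = energy(h, r12_enc, t)   # encoded multi-hop relation
E3 = energy(h, r34_enc, t)   # encoded rule-based relation
E = E1 + E2 + E3             # overall energy of the triple
```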

Objective formalization
Based on the above energy functions, we propose a joint training method to take advantage of multiple relations and train our model. The overall loss function is defined as

L = L_1 + L_2 + L_3,

where L_1 denotes the pairwise ranking loss, L_2 denotes the error loss of the multi-hop encoder, and L_3 denotes the error loss of the rule-based encoder. Given the positive triples G and the negative triples G′ constructed accordingly, we define the pairwise ranking loss as

L_1 = Σ_{(h,r,t)∈G} Σ_{(h′,r,t′)∈G′} max(0, γ + E(h, r, t) − E(h′, r, t′)),

where γ is a margin parameter that separates positive and negative triples. Following TransE [12], negative triples are generated by replacing the head entity or the tail entity at random, i.e., G′ = {(h′, r, t)} ∪ {(h, r, t′)}. Our method uses TransE to obtain pre-trained embeddings of entities and relations. In order to reduce the semantic errors of the encoders, we use the pre-trained relation embeddings to supervise the encoded embeddings. The error loss functions are defined as

L_2 = ∥r′′′_12 − r_pre∥,  L_3 = ∥r′′′_34 − r_pre∥,

where r_pre denotes the pre-trained relation embedding. The central insight in developing MRE is as follows. Given three input embeddings, we view the KGC task as a prediction problem. To predict whether a triple is valid, MRE first selects multi-hop and rule-based relations through different extraction tools. Then, MRE encodes the extracted relations by leveraging MLP structures. Next, MRE models the whole knowledge graph based on TransE. At last, the optimization objective is used to train MRE jointly.
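The joint objective can be sketched for a single positive/negative pair as follows. This is a toy illustration with random embeddings; the real model sums over all triples and trains the MLP encoders jointly:

```python
import numpy as np

rng = np.random.default_rng(3)
dim, gamma = 100, 8.0  # gamma matches the margin used on both datasets

def energy(h, r, t):
    return np.linalg.norm(h + r - t)

h, r, t = (rng.normal(size=dim) for _ in range(3))
h_neg = rng.normal(size=dim)  # randomly corrupted head entity

# L1: pairwise margin ranking loss between a positive and a negative triple.
L1 = max(0.0, gamma + energy(h, r, t) - energy(h_neg, r, t))

# L2/L3: encoder error losses, pulling the encoded relations toward the
# pre-trained (TransE) relation embedding used as supervision.
r12_enc, r34_enc, r_pre = (rng.normal(size=dim) for _ in range(3))
L2 = np.linalg.norm(r12_enc - r_pre)
L3 = np.linalg.norm(r34_enc - r_pre)

L = L1 + L2 + L3  # joint objective, minimized with Adam
```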

Datasets
The statistics of the two KGC datasets evaluated in this paper are given in Table 2. FB15K-237 and NELL-995 are created from FB15K [12] and NELL [7]. FB15K comes from the large real-world knowledge base Freebase [6]. FB15K-237 contains 237 relations, 14,541 entities and 310,116 triples; and the approximate ratio of the train set, valid set and test set is 14:1:1. NELL-995 is a subset of NELL created from the 995th iteration of the construction. NELL-995 includes 75,492 entities, 200 relations and 154,208 triples.

Evaluation protocol
Following the convention, we employ mean reciprocal rank (MRR) and Hits@k as evaluation metrics. MRR is computed as the average reciprocal rank of the correct entities; higher MRR indicates better performance:

MRR = (1/|G|) Σ_{i=1}^{|G|} 1/Rank(i),

where |G| represents the total number of test triples, and Rank(i) represents the rank of the correct label of the i-th test triple. Hits@k computes the proportion of correct entities that appear within the top-k predictions; higher Hits@k indicates better performance:

Hits@k = |Rank-k| / |G|,

where |Rank-k| represents the number of correct labels ranked in the top k. MRR and Hits@k scores always range from 0 to 1.
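Both metrics are straightforward to compute from the rank of each gold entity; a small self-contained sketch:

```python
def mrr(ranks):
    """Mean reciprocal rank of the correct entities (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of correct entities ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Illustrative ranks of the gold entities for five test triples.
ranks = [1, 3, 12, 2, 50]
print(round(mrr(ranks), 3))  # 0.387
print(hits_at_k(ranks, 10))  # 0.6
```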

Experimental settings
We use pre-trained embeddings to initialize our model. To obtain entity and relation embeddings, we follow the traditional settings originally provided in TransE [12]. After training, we obtain entity and relation embeddings with embedding size k = 100. For our proposed MRE, we employ the path extraction method provided in PTransE to extract multi-hop relations and AMIE+ to extract rule-based relations. The lengths of multi-hop paths and rules are limited to 2. We use Adam [29] and apply the ℓ2 norm in all equations. The highest Hits@10 scores are obtained with lr = 1e-4, batch size b = 128 and γ = 8 on FB15K-237, and with lr = 2e-3, batch size b = 128 and γ = 8 on NELL-995.
We compare MRE with the following baselines:
• TransE [12] is the most representative translational model, which embeds the components of KGs in vector space and makes the learned entity and relation embeddings follow the translational principle h + r = t.
• TransH [13] assigns each relation a specific hyperplane and projects the head and tail entities onto the hyperplane.
• TransR [14] is an extension of TransH which introduces relation-specific spaces to complete the KGC task.
• PTransE [19] leverages multi-hop paths to complete knowledge graphs.
• RUGE [20] learns entity and relation representations through iterative guidance of soft rules.
• DMACM [18] captures directional information and the triple's inherent deep expressive characteristic using a CNN-based method. • ConE [27] is a hierarchical reasoning method that embeds entities into hyperbolic cones and then models relations as conversions between the cones. • MADLINK [24] incorporates path and contextual information in given knowledge graphs to learn embeddings. • HittER [28] can jointly learn entity and relation embeddings based on Transformer structure.
• MRGAT [30] proposes a multi-relational graph attention model to complete knowledge graphs.
The experimental results of different baselines on FB15K-237 and NELL-995 are listed in Table 3, from which we make several observations. First, MRE consistently outperforms the other baselines in MRR and Hits@10 on FB15K-237 and NELL-995. On FB15K-237, MRE achieves an improvement of 0.399 − 0.373 = 0.026 (+7%) in MRR and 0.612 − 0.558 = 0.054 (+9.7%) in Hits@10 over the second-best results. On NELL-995, MRE achieves an improvement of 0.327 − 0.318 = 0.009 (+2.8%) in MRR and 0.572 − 0.437 = 0.135 (+31%) in Hits@10 over the second-best results. This demonstrates the effectiveness of our proposed method and supports the claim that embedding multiple relations in a unified semantic space is beneficial for knowledge graph completion. Second, we find that translational methods [12][13][14]19] achieve competitive results on both datasets, which indicates the usefulness of translational properties in KGC methods. Our proposed MRE differs from these translational methods in that MRE can capture the semantic information of multiple relations while maintaining the translation characteristics between entities and relations. Third, we observe that HittER [28] obtains the second-best MRR and Hits@10 on FB15K-237. However, HittER leverages a complex structure to embed knowledge graphs, whereas MRE simply uses MLP structures, showing the superiority of our method. In addition, MRE outperforms the path-based models [19,24] and the rule-guided method [20], mainly because our model takes advantage of multiple relations and provides more semantic information to complete knowledge graphs.
Table 3. Experimental results on FB15K-237 and NELL-995. MRR represents the mean reciprocal rank, and Hits@10 represents the proportion of correct predictions appearing within the top 10. The best result is in bold, and the second-best result is underlined.

Further evaluations
This paper proposes a multiple relation embedding method to make better use of different kinds of relations. In this section, we explore the impacts of margin, different numbers of relations and extra semantic information on our proposed MRE.
Effect of margin: Evaluation results are shown in Table 4, from which we make two observations. First, like most translational models [12][13][14]19], our method is affected by the margin, i.e., as the margin value changes, the experimental results of MRE fluctuate. Second, MRE performs well when the margin γ = 6, 8 or 10. This indicates that setting a reasonable margin value helps the model achieve good performance.
Effect of different numbers of relations: To explore the effect of different numbers of relations, we randomly select 9 relations in FB15K-237 and NELL-995 for further evaluation. These relations are listed in descending order of quantity in Table 5. Here, our model is trained with all the triples and tested on each relation respectively. The performance for different numbers of relations on FB15K-237 and NELL-995 is shown in Tables 6 and 7. As can be seen from Table 6, in terms of MRR, the relation '/people/person/profession' with the largest number of training triples did not achieve the best result, and the relation '/business/business_operation/industry' with the smallest number did not obtain the worst result. As can be seen from Table 7, the relation 'concept:coachesinleague', which has the second-smallest number of training triples, achieves most of the best results on NELL-995. From these observations, we conclude that the performance of MRE is not obviously influenced by the amount of training data. This suggests that MRE can achieve good experimental results even with a small number of relations, benefiting from its ability to simultaneously embed multiple relations. The multiple relation embedding method proposed in this paper can alleviate the sparsity of relations and provide extra semantic information for relations with fewer training triples.
Effect of extra semantic information: To analyze the parameter sensitivity to extra semantic information, we introduce a weight Ω to change the overall energy. The changed energy function can be defined as E = E_1 + Ω(E_2 + E_3), where a larger Ω means richer extra semantic information. The results are shown in Table 8. We can see that the MRR and Hits@k of MRE increase with extra semantic information. This once again confirms that sufficient semantic information helps to improve the performance of knowledge graph completion.

Visualization analysis
To understand and analyze the semantic similarities among different relations, we visualize the knowledge graph completion results of TransE [12] and MRE. We compute the similarity matrix over each pair of learned relation embeddings. Figures 5 and 6 show the semantic similarities of different relations on FB15K-237 and NELL-995, respectively. From the figures, we make three observations. First, the heat maps in Figures 5 and 6 show an evident regularity for various relations in FB15K-237 and NELL-995, i.e., semantic similarities exist among various relations. For each pair of learned relation embeddings, the darker the heat map color, the higher the similarity and the tighter the semantic association. Second, not all relations have a high degree of semantic similarity, which indicates that TransE and MRE can capture semantic differences among relations. Third, compared with TransE, the discrimination degree of the heat maps of MRE is more obvious, i.e., the similarity of semantically related relations in the heat map is higher. For example, the similarity for the relation '/film/film/genre' in Figure 5 increases from 0.37 to 0.47. This shows that our proposed model learns more accurate semantic information than TransE.
To showcase the process of completing knowledge graphs, we regard tail entity prediction as a simple question-answering system. Different tail entity prediction results are shown in Table 9. Given a query consisting of a head entity and a relation, the objective is to predict the golden answer (tail entity). The top-5 predictions are listed in Table 9, from which we can compute the rank of the golden answer among the candidate answers. It can be seen that our proposed MRE generates a list of candidate answers and predicts the golden answer.

Conclusions
In this paper, we proposed a new knowledge graph completion approach called MRE to predict whether a triple in a knowledge graph is valid or not. Unlike most methods that consider only a single form of relation in the embedding phase, we embedded multiple relations in a unified semantic space. Specifically, we introduced two relation encoders to capture the semantic information of multi-hop and rule-based relations. These two encoders realize interactions between connected entities and relations in the encoding phase. In addition, we defined corresponding energy functions for multi-hop and rule-based relations to obtain new representations. Compared with current KGC methods, MRE can better exploit the properties of multiple relations and provide additional semantic information for single-form relations.
In order to verify the effectiveness and superiority of our work, we conducted extensive experiments on two widely used benchmarks. The experimental results showed that our method can effectively capture multiple semantics. Further evaluations demonstrated that our work remains stable when the margin is within a reasonable range and alleviates the sparsity existing in knowledge graphs. Visualization analysis showed the semantic similarities of different relations and the working principle of MRE.
As for future work, we plan to study follow-up open problems. (i) MRE performs the knowledge graph completion based on MLP. How can we advance it using sophisticated neural architectures such as capsule or graph neural networks? (ii) Existing studies have shown that there is still much additional information in the knowledge graphs that is not used, such as entity types and spatial information. How can we integrate useful information with MRE to advance knowledge graph completion?