1 Introduction

Knowledge graphs (KGs), such as WordNet [1], Freebase [2], YAGO [3], and DBpedia [4], are knowledge bases that store large amounts of structured data in the form of triplets (head_entity, relation, tail_entity), abbreviated as (h, r, t). KGs concisely represent entities and their relationships, providing an effective way to organize, manage, and use the vast amount of information on the internet. Therefore, KGs have attracted much attention and play crucial roles in many deep learning applications, including recommender systems [5], question answering [6], and natural language processing [7]. Although knowledge graphs contain a large number of entities and relations, many links between entities are still missing, leaving KGs incomplete. To address this problem, predicting missing relations or entities, known as knowledge graph completion or link prediction, has become an active research topic. Knowledge graph embedding (KGE), which encodes entities and relations as multidimensional vectors, is considered an efficient approach to link prediction.

Distance-based KGE models determine the plausibility of a triplet by measuring the distances among the head_entity, relation, and tail_entity. TransE [8] heuristically embeds the positive triplets with the translation rule \(h + r \approx t\) and can model multiple relation patterns, including antisymmetry, inversion, and composition. TransH [9] defines relations on hyperplanes and enforces constraints by projecting the entities onto the corresponding hyperplane. In TransR [10], entities that share similar semantics are expected to be close to each other in the entity space, while each relation has its own relation-specific space into which entities are projected. TransH [9] and TransR [10] can model 1-to-N and N-to-1 mapping properties well but cannot model inversion and composition. Inspired by the Euler formula \(e^{i\theta } = \cos \theta + i\sin \theta \), RotatE [11] assumes that each relation r is a rotation from the head_entity h to the tail_entity t. RotatE [11] is a promising model for encoding the symmetry, antisymmetry, inverse, and composition relation patterns but has difficulty dealing with N-to-1 relations. These models are intuitive and mathematically interpretable, but their ability to simultaneously model the four relation patterns, i.e., symmetry, antisymmetry, inverse, and composition, and the four mapping properties, i.e., one-to-one, one-to-many, many-to-one, and many-to-many, is limited. Distance-based approaches focus on the conditions that should be met among the head_entity, relation, and tail_entity of a positive triplet while ignoring the correlation between entities that are not in the same triplet. We refer to the potential information stored in the knowledge graph structure that reflects the relation patterns and mapping properties as context information.

Fig. 1 An example of a knowledge base and knowledge graph. The red dotted line indicates the potential relationship Winner’s alma mater

Figure 1 depicts a straightforward example of a knowledge base and a knowledge graph. As shown in Fig. 1b, the triplets (Albert Einstein, WinnerOf, Nobel Prize in Physics) and (Albert Einstein, GraduateFrom, University of Zurich) hold at the same time. Although the entities Nobel Prize in Physics and University of Zurich are not directly connected, they are interlinked and may have a potential relationship, i.e., Winner’s alma mater. This hypothesis is based on the possible relationship between multihop-connected entities in two different triplets. Thus, context information among the head_entity, relation, and tail_entity is indispensable for link prediction. However, most distance-based KGE methods are committed only to properly representing the connections between entities and relations in the positive triplets and do not account for context information when embedding entities and relations.

Therefore, to add contextual information to knowledge graph embeddings, we represent the head_entity and tail_entity as vectors consisting of two specialized parts, \( h =[h_r,h_c]\) and \(t =[t_r,t_c]\), respectively. The part associated with subscript c is used to capture the contextual information, and the part associated with subscript r is used to build the connection between the entity and relation in the positive triplet. From Fig. 1c, we observe that the relation connects the head and tail entities directionally. Therefore, the relation is represented as \(r =[r_h,r_t]\), where \(r_h\) is specifically used to connect to the head_entity and \(r_t\) is used to connect to the tail_entity. The entity Alfred Kleiner shown in Fig. 1b is the head_entity in the triplet (Alfred Kleiner, ProfessorOf, University of Zurich) and the tail_entity in the triplet (Albert Einstein, SupervisedBy, Alfred Kleiner). This shows that relations can assign different roles to the same entity and have a critical effect on the transmission of context information. Motivated by the latent associations among the triplet components, we argue that the embeddings of entities and relations should embody context information about related entities and relations. To achieve this goal, we devise an interaction for \(h_c, r_h\) and \(t_c, r_t\) to specifically learn context information. In addition, we design another interaction for \(h_r\), \(r_h\), \(r_t\), and \(t_r\), building associations between entities and relations in the positive triplets.

Table 1 Details of some knowledge graph embedding models

Herein, we propose a new KGE model named TCIE, which conducts interactions to build the connection between entity and relation in the positive triplet and capture the context information stored in the graph structure. The contributions of our paper are summarized as follows:

1. We propose a novel model for knowledge graph embedding, TCIE, which learns contextual information among the triplet components and establishes appropriate associations between entities and relations in positive triplets.

2. TCIE explicitly incorporates context information into the graph embeddings and shows a solid ability to model four relation patterns and mapping properties.

3. Extensive experiments are conducted to evaluate the performance of TCIE in terms of link prediction on six benchmark datasets: ogbl-wikikg2 [18], ogbl-biokg [18], FB15k [8], FB15k-237 [19], WN18 [8], and YAGO3-10 [15]. The experimental results show that with the help of triplet component interactions, TCIE obtains highly competitive results compared with those of state-of-the-art models.

2 Related Work

A knowledge graph is a directed heterogeneous multigraph \(G=(E, R, T)\), where E, R, and T are the sets of entities, relations, and triplets, respectively. A triplet is denoted as (h, r, t) \(\in \) \( E \times R \times E\), where h represents the head_entity, r represents the relation, and t represents the tail_entity. Given a knowledge graph, described as a list of fact triplets, a knowledge graph embedding method defines a score function \(f_r(h,t) \) to measure the plausibility of a triplet. Knowledge graph completion mainly predicts the tail entity given a head and a relation, (h, r, ?), or the head entity given a relation and a tail, (?, r, t).
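To make these definitions concrete, the following minimal Python sketch (our own illustration, not part of the original formulation; the toy triplets follow Fig. 1) stores a knowledge graph as a set of (h, r, t) triplets and treats a KGE model abstractly as a score function \(f_r(h, t)\) used to rank candidate entities for a completion query.

```python
from typing import Callable, Set, Tuple

Triplet = Tuple[str, str, str]  # (head_entity, relation, tail_entity)

# A toy knowledge graph G = (E, R, T) stored as a set of fact triplets.
triplets: Set[Triplet] = {
    ("Albert Einstein", "WinnerOf", "Nobel Prize in Physics"),
    ("Albert Einstein", "GraduateFrom", "University of Zurich"),
    ("Alfred Kleiner", "ProfessorOf", "University of Zurich"),
    ("Albert Einstein", "SupervisedBy", "Alfred Kleiner"),
}
entities = {e for h, _, t in triplets for e in (h, t)}
relations = {r for _, r, _ in triplets}

# A KGE model supplies a score function f_r(h, t); higher scores mean more plausible triplets.
ScoreFn = Callable[[str, str, str], float]

def rank_tails(f_r: ScoreFn, h: str, r: str):
    """Answer a (h, r, ?) completion query by ranking every candidate tail entity."""
    return sorted(entities, key=lambda t: f_r(h, r, t), reverse=True)
```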

Typical knowledge graph embedding methods properly encode entities and relations in vector space via the score function [20, 21]. Table 1 summarizes the score functions and parameters of some recent knowledge graph embedding models. We can roughly divide these models into three categories [22]: distance-based models, semantic matching models, and neural network models.

2.1 Distance-Based Models

Distance-based models develop distance-based score functions and usually measure the distance between two entities after a relation-specific transformation [23]. TransE [8] was the first approach to use translation distance constraints; it assumes that entities and relations satisfy h + r \(\approx \) t. When a relation is symmetric, its vector is driven towards zero; thus, TransE cannot properly represent symmetric relations. TransH [9], which was proposed to compensate for this demerit of TransE [8], interprets a relation as a translation operation on a hyperplane and has advantages in modelling N-to-1, 1-to-N, and N-to-N mapping properties. TransR [10] assumes that entities have multiple attributes and that various relations can focus on different attributes. It models entities and relations in an entity space and relation-specific spaces and transforms entities into the corresponding relation space. TransR has more parameters than TransE and TransH; because of this complexity, TransR is challenging to apply to large-scale knowledge graphs. TransD [24] simplifies TransR by setting up two projection matrices, which project the head and tail entities into the relation space. TransD accounts for the diversity of entities while keeping the model complexity low, which makes it suitable for large-scale knowledge graphs.

RotatE [11] defines each relation as a rotation from the source entity to the target entity in a complex vector space and can model the symmetry, antisymmetry, inversion, and composition relation patterns. However, it does not perform well on N-to-1 relations. HAKE [25] considers that elements in a knowledge graph belong to different hierarchical levels and captures semantic hierarchies by mapping entities into a polar coordinate system. PairRE [16] represents each relation with paired vectors and can deal with complex relations and more relation patterns. These translation-based models do not make significant use of context information, which is critical for modelling complex relationships. StructurE [17] uses a dual-interaction model to capture both relational structure-context information and edge structure-context information. To model complex relationships, StructurE defines different score functions for different relations, but it lacks a unified score function and requires a large number of parameters.
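To make the distance-based scoring concrete, the sketch below (our own simplification with randomly initialized, untrained vectors; the actual models also normalize and learn the embeddings) contrasts the TransE translation score with the paired-vector score of PairRE.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
ent = {e: rng.normal(size=dim) for e in ("Einstein", "Zurich")}              # entity embeddings
rel_transe = {"GraduateFrom": rng.normal(size=dim)}                           # one vector per relation
rel_pairre = {"GraduateFrom": (rng.normal(size=dim), rng.normal(size=dim))}   # paired (r_H, r_T)

def transe_score(h, r, t):
    # TransE: a plausible triplet satisfies h + r ≈ t; score is the negative L1 distance.
    return -np.abs(ent[h] + rel_transe[r] - ent[t]).sum()

def pairre_score(h, r, t):
    # PairRE: the paired relation vectors rescale the head and tail entities elementwise
    # before the distance between the two projections is measured.
    r_h, r_t = rel_pairre[r]
    return -np.abs(ent[h] * r_h - ent[t] * r_t).sum()

print(transe_score("Einstein", "GraduateFrom", "Zurich"))
print(pairre_score("Einstein", "GraduateFrom", "Zurich"))
```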

2.2 Semantic Matching Models

Semantic matching models utilize similarity-based score functions. These models measure the plausibility of facts by matching the underlying semantics of entities and relations contained in a vector space representation. RESCAL [26] associates each entity with a vector to capture its potential semantics; each relation is represented as a matrix that models pairwise interactions between potential factors. DistMult [12] simplifies RESCAL by restricting the relation matrix \(\textrm{M}_{r}\) to a diagonal matrix: for each relation r, DistMult introduces an embedding vector r \(\in \) \({\mathbb {R}}^k\) and requires that \(\textrm{M}_{r} = \textrm{diag}(r)\). Nevertheless, DistMult captures only the interactions between the components of h and t in the same dimension. HolE [14] combines the expressive power of RESCAL with the efficiency and simplicity of DistMult. ComplEx [13] extends DistMult by introducing complex-valued embeddings of entities and relations to better model antisymmetric relations; however, despite its significantly increased space and time complexity, ComplEx cannot model the composition pattern. QuatE [27] extends ComplEx to hypercomplex space and provides a better spatial interpretation. Semantic matching models can reflect the credibility of the semantic information of triplets, but they have defects in encoding relation patterns [28].
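For comparison with the distance-based scores above, the following sketch (again our own simplification with random, untrained embeddings) shows the two most common semantic matching scores: DistMult's diagonal bilinear product and ComplEx's Hermitian product, whose asymmetry in h and t allows antisymmetric relations to be modelled.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
h, r, t = (rng.normal(size=dim) for _ in range(3))                  # real embeddings for DistMult
h_c, r_c, t_c = (rng.normal(size=dim) + 1j * rng.normal(size=dim)   # complex embeddings for ComplEx
                 for _ in range(3))

def distmult_score(h, r, t):
    # DistMult: bilinear score <h, diag(r), t>; symmetric in h and t by construction.
    return float(np.sum(h * r * t))

def complex_score(h, r, t):
    # ComplEx: Re(<h, r, conj(t)>); swapping h and t changes the score, so antisymmetric
    # relations can be modelled.
    return float(np.real(np.sum(h * r * np.conj(t))))

print(distmult_score(h, r, t), complex_score(h_c, r_c, t_c))
```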

2.3 Neural Network Models

Driven by advances in deep learning and machine learning, neural network models for knowledge graph embedding are developing rapidly. ConvE [15] applies 2D convolution filters over a reshaped input feature map and computes the triplet score through the inner product of the output vector and the embedded tail entities. HypER [29] fully convolves head entity embeddings using relation-specific convolutional filters generated by a hypernetwork. Since the hypernetwork is a fully connected network, the interactions and relationships between entities are increased at the expense of additional parameters. Neural network models start from the distributed representations of entities and relations. Some utilize complex neural structures such as tensor networks (NTN [30]), graph convolution networks (SACN [31] and R-GCNs [32]), recurrent networks (RSNs [33]), and transformers (CoKE [34] and KG-BERT [35]) to learn richer representations. These neural network models achieve outstanding results but are opaque and lack interpretability.

3 Proposed Method

3.1 Motivation and Overview

Examples of a knowledge base and a knowledge graph are illustrated in Fig. 1. Figure 1b contains four triplets, e.g., (Albert Einstein, WinnerOf, Nobel Prize in Physics). To properly represent entities and their relationships during graph embedding, we first need to reasonably construct the appropriate association among the head_entity, relation, and tail_entity in the positive triplet. Second, because of the potential relationships between indirectly related entities, for instance, the entities Nobel Prize in Physics and University of Zurich may have the potential relationship Winner’s alma mater, such context information between triplet components can provide favourable evidence for link prediction. Naturally, we believe that the representations of entities and relations should not only simulate the association among h, r, and t in the positive triplet (h, r, t) but also contain the context information stored in the graph structure.

Consequently, to address this challenge, we propose a targeted KGE model, TCIE, which represents entities and relations as vectors consisting of two parts, i.e., \( h =[h_r, h_c]\), \(r =[r_h, r_t]\), and \( t =[t_r, t_c]\), and conducts three interactions among them to establish appropriate associations between entities and relations in positive triplets and to capture contextual information in the graph: for the head_entity, \(h_c \approx r_h \circ t_c\); for the relation, \(h_r \circ r_h \approx t_r \circ r_t\); for the tail_entity, \(t_c \approx r_t \circ h_c\). An illustration of TCIE is shown in Fig. 2. The relation interaction is devised to build a connection among h, r, and t in the positive triplet, and the head_entity and tail_entity interactions are designed to capture the context information from the related entities and relations.

3.2 TCIE

To facilitate building connections in positive triplets and capturing contextual information, all entity and relation vectors are composed of two parts according to their roles in the knowledge graph. A relation is simultaneously and directly connected to both the head_entity and the tail_entity, so its embedding vector is expected to contain semantic information about both. Therefore, in TCIE, the embedding vector of a relation is composed of two parts related to the head_entity and tail_entity, which is denoted as:

$$\begin{aligned} r =[r_h,r_t] \end{aligned}$$
(1)

As shown in Fig. 1c, \(tail\_entity_2\) is the tail of the triplet (\(head\_entity_1\), \(relation_2\), \(tail\_entity_2\)) and the head of the triplet (\(head\_entity_2, relation_4, tail\_entity_1\)). Therefore, the embedding vector of \(tail\_entity_2\) also contains the context information of these two linked triplets. In addition, \(tail\_entity_2\) directly connects \(relation_2\) and \(relation_4\), which indicates that the embedding vector of \(tail\_entity_2\) also contains information about relations. Hence, the embedding vector of tail_entity is designed to include the related relation information and context information, which is denoted as:

$$\begin{aligned} t =[t_r,t_c] \end{aligned}$$
(2)
Fig. 2 Illustration of TCIE. For head_entity, the embedding vector of \(h_c\) will be close to the Hadamard product of \(r_h\) and \(t_c\) to learn contextual information of related entities and relations. For relation, the Hadamard product of \(h_r\) and \(r_h\) will be close to that of \(t_r\) and \(r_t\) to build the semantic connection between entities and relations. For tail_entity, the embedding vector of \(t_c\) will approach the Hadamard product of \(r_t\) and \(h_c\) to learn the context information. TCIE employs three interactions to build the association among h, r, and t in the positive triplet and capture the context information stored in the graph structure

The head_entity is also directly connected with relations and contains the information of related entities. Therefore, its embedding vector should also consist of two parts, the relevant relation information and the context information, and is denoted as:

$$\begin{aligned} h =[h_r,h_c] \end{aligned}$$
(3)

During knowledge graph embedding, to let the vectors of the entities and relations explicitly learn context information from the remaining triplet components, we design two interactions. \(\circ \) denotes the Hadamard product. For head_entities, \(h_c\) is devoted to learning context information from other tail_entities through \(r_h\). The interaction for the head_entity is represented as:

$$\begin{aligned} \hbox {h}_{\textrm{c}} \approx \hbox {r}_{\textrm{h}} \circ \hbox {t}_{\textrm{c}} \end{aligned}$$
(4)

For tail_entities, \(t_c\) is devised to learn context information from other head_entities through \(r_t\). The interaction for tail_entity is represented as:

$$\begin{aligned} \hbox {t}_{\textrm{c}} \approx \hbox {r}_{\textrm{t}} \circ \hbox {h}_{\textrm{c}} \end{aligned}$$
(5)

To build the proper connection among the head_entity, relation, and tail_entity in the positive triplet, we design another interaction for relations. \(r_h\) is designed to connect with head entities, and \(r_t\) is used to connect with tail entities. The relation is a bridge to connect the head entity and tail entity. Therefore, the interaction for the relation is represented as:

$$\begin{aligned} \hbox {h}_{\textrm{r}} \circ \hbox {r}_{\textrm{h}} \approx \hbox {t}_{\textrm{r}}\circ \hbox {r}_{\textrm{t}} \end{aligned}$$
(6)

TCIE makes the associated embeddings of the triplet components close in geometric space to build associations in the positive triplets and capture context information. An illustration of TCIE is shown in Fig. 2. From Fig. 2a, for the head_entity, the embedding vector of \(h_c\) will be close to the Hadamard product of \(r_h\) and \(t_c\) to learn contextual information of related entities and relations. From Fig. 2b, for the relation, the Hadamard product of \(h_r \) and \( r_h\) will be close to that of \(t_r\) and \(r_t\) to build the semantic connections between entities and relations. From Fig. 2c, for the tail_entity, the embedding vector of \(t_c \) will approach the Hadamard product of \( r_t\) and \(h_c\) to learn the context information. These interactions help TCIE capture context information and strengthen the associations within triplets to better model relation patterns and mapping properties.

3.3 Score Function

We combine the interactions for head_entities, relations, and tail_entities into the score function of our KGE model, which measures the plausibility of facts by calculating the distances between vector components. The projection operation is the Hadamard product between two of the component vectors, and the distances between the projected vectors are summed as the plausibility of the triplet. In this paper, the \(L_1\)-norm is chosen to measure this distance. The score function is defined as follows:

$$\begin{aligned} \begin{aligned} f_r({h,t})=-\parallel h_r \circ r_h - t_r\circ r_t \parallel _{1}-\parallel h_c- r_h \circ t_c \parallel _{1}-\parallel t_c- r_t\circ h_c\parallel _{1}, \end{aligned} \end{aligned}$$
(7)
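A minimal PyTorch-style sketch of Eq. (7) may clarify the computation (the code and variable names are ours, not the authors' released implementation). Each embedding of size 2k is split into its two specialized parts, and the Hadamard product is elementwise multiplication:

```python
import torch

def tcie_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Eq. (7): sum of negative L1 distances of the three triplet-component interactions.

    h, r, t have shape (batch, 2k) and are split into h = [h_r, h_c],
    r = [r_h, r_t], and t = [t_r, t_c], each part of dimension k.
    """
    h_r, h_c = torch.chunk(h, 2, dim=-1)
    r_h, r_t = torch.chunk(r, 2, dim=-1)
    t_r, t_c = torch.chunk(t, 2, dim=-1)
    relation_term = torch.norm(h_r * r_h - t_r * r_t, p=1, dim=-1)  # interaction of Eq. (6)
    head_term = torch.norm(h_c - r_h * t_c, p=1, dim=-1)            # interaction of Eq. (4)
    tail_term = torch.norm(t_c - r_t * h_c, p=1, dim=-1)            # interaction of Eq. (5)
    return -(relation_term + head_term + tail_term)
```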

3.4 Loss Function

Knowledge graphs contain only positive triplets. Therefore, we need to randomly replace the head_entity or tail_entity of an existing triplet to construct negative triplets, a procedure known as negative sampling. The scores of positive triplets should correspond to short distances, and those of negative triplets should correspond to long distances. Many negative sampling methods have been proposed [36,37,38], among which the self-adversarial negative sampling method [11] dynamically adjusts the weights of negative samples according to their scores during training. We use this self-adversarial negative sampling loss as the training objective:

$$\begin{aligned} \begin{aligned} L=&-\log \sigma (\gamma +f_r(h,t)) \\&-\sum _{i=1}^n p(h^{\prime }_i,r,t^\prime _i)\log \sigma ( -f_r(h^{\prime }_i,t^\prime _i)-\gamma ), \end{aligned} \end{aligned}$$
(8)

where \(\sigma \) is the sigmoid function, \(\gamma \) is a fixed margin, and \(a\) is the sampling temperature. \((h^{\prime }_i,r,t^\prime _i)\) is the \(i^{th}\) negative triplet, and \(p(h^{\prime }_i,r,t^\prime _i)\) represents the weight of this negative sample, which is defined as follows:

$$\begin{aligned} {p((h^{\prime }_i,r,t^\prime _i) \mid (h,r,t))= \frac{\exp \left( a f_r(h^{\prime }_i,t^\prime _i)\right) }{\sum _j \exp \left( a f_r(h^{\prime }_j,t^\prime _j)\right) }}. \end{aligned}$$
(9)
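A sketch of this training objective (our own code; it assumes the scores come from a function such as the hypothetical tcie_score above, with pos_score of shape (batch,) and neg_scores of shape (batch, n)):

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_score: torch.Tensor,
                          neg_scores: torch.Tensor,
                          gamma: float,
                          a: float = 1.0) -> torch.Tensor:
    """Eqs. (8)-(9): margin gamma, sampling temperature a; scores are negative distances."""
    # Eq. (9): each negative sample is weighted by a softmax over its score; the weights are
    # detached so they act as constants and harder negatives contribute more to the gradient.
    weights = F.softmax(a * neg_scores, dim=-1).detach()
    pos_term = -F.logsigmoid(gamma + pos_score)
    neg_term = -(weights * F.logsigmoid(-neg_scores - gamma)).sum(dim=-1)
    return (pos_term + neg_term).mean()
```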

3.5 Proofs of Modelling Relation Patterns

We deduce the conditions for our TCIE to be able to model the four relation patterns. The main results are as follows:

Proposition 1

Our model can encode symmetry/antisymmetry relation patterns.

Proof

If \((h_1,r_1,t_1)\) \(\in \) T and \((t_1,r_1,h_1)\) \(\in \) T, we have

$$\begin{aligned}&{\left\{ \begin{array}{ll} h_{1r} \circ r_{1h} = t_{1r}\circ r_{1t}\\ h_{1c}\circ r_{1t} = t_{1c}\\ t_{1c} \circ r_{1h}=h_{1c}\\ \end{array}\right. } \end{aligned}$$
(10)
$$\begin{aligned}&{\left\{ \begin{array}{ll} t_{1r} \circ r_{1h} = h_{1r}\circ r_{1t}\\ t_{1c}\circ r_{1t} = h_{1c}\\ h_{1c} \circ r_{1h}=t_{1c}\\ \end{array}\right. } \end{aligned}$$
(11)
$$\begin{aligned}&\Rightarrow {\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t}\wedge t_{1r}\circ r_{1h} = h_{1r}\circ r_{1t} \\ h_{1c}\circ r_{1t} = t_{1c}\wedge t_{1c}\circ r_{1t} = h_{1c}\\ t_{1c}\circ r_{1h}=h_{1c}\wedge h_{1c}\circ r_{1h}=t_{1c}\\ \end{array}\right. }\nonumber \\&\Rightarrow {r_{1h}}^2={r_{1t}}^2=1 \end{aligned}$$
(12)

If \(r_1\) satisfies the symmetry relation pattern, we need \( {r_{1\,h}}^2={r_{1t}}^2=1.\)

If \((h_1,r_1,t_1)\) \(\in \) T and \((t_1,r_1,h_1)\) \(\notin \) T, we have

$$\begin{aligned}{} & {} {\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t}\\ h_{1c}\circ r_{1t} = t_{1c}\\ t_{1c}\circ r_{1h}=h_{1c} \\ \end{array}\right. } \end{aligned}$$
(13)
$$\begin{aligned}{} & {} {\left\{ \begin{array}{ll} t_{1r}\circ r_{1h} \ne h_{1r}\circ r_{1t}\\ t_{1c}\circ r_{1t} \ne h_{1c}\\ h_{1c}\circ r_{1h} \ne t_{1c}\\ \end{array}\right. } \end{aligned}$$
(14)
$$\begin{aligned}{} & {} \Rightarrow {\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t}\wedge t_{1r} \circ r_{1h}\ne h_{1r}\circ r_{1t}\\ h_{1c}\circ r_{1t} = t_{1c}\wedge t_{1c}\circ r_{1t} \ne h_{1c}\\ t_{1c} \circ r_{1h}=h_{1c}\wedge h_{1c}\circ r_{1h} \ne t_{1c}\\ \end{array}\right. }\nonumber \\{} & {} \Rightarrow {r_{1h}}^2\ne {r_{1t}}^2\ne 1 \end{aligned}$$
(15)

If \(r_1\) satisfies the antisymmetry relation pattern, we need \( {r_{1\,h}}^2 \ne {r_{1t}}^2\ne 1.\) \(\square \)

Proposition 2

Our model can encode the inverse relation pattern.

Proof

If \((h_1,r_1,t_1)\) \(\in \) T and \((t_1,r_2,h_1)\) \(\in \) T, we have

$$\begin{aligned}&{\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t}\\ h_{1c}\circ r_{1t} = t_{1c}\\ t_{1c} \circ r_{1h}=h_{1c}\\ \end{array}\right. } \end{aligned}$$
(16)
$$\begin{aligned}&{\left\{ \begin{array}{ll} t_{1r}\circ r_{2h} = h_{1r}\circ r_{2t}\\ t_{1c}\circ r_{2t} = h_{1c}\\ h_{1c}\circ r_{2h}=t_{1c}\\ \end{array}\right. } \end{aligned}$$
(17)
$$\begin{aligned}&\Rightarrow {\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t}\wedge t_{1r} \circ r_{2h} = h_{1r}\circ r_{2t}\\ h_{1c}\circ r_{1t} = t_{1c} \wedge t_{1c}\circ r_{2t} = h_{1c}\\ t_{1c} \circ r_{1h}=h_{1c} \wedge h_{1c} \circ r_{2h}=t_{1c}\\ \end{array}\right. }\nonumber \\&\Rightarrow {r_{1h}}\circ {r_{2h}= r_{1t}}\circ {r_{2t} =1} \end{aligned}$$
(18)

If \(r_1, r_2\) satisfy the inverse relation, we need \( r_{1\,h} \circ r_{2\,h}=r_{1t} \circ r_{2t} =1.\) \(\square \)

Proposition 3

Our model can encode the composition relation pattern.

Proof

If \((h_1,r_1,t_1)\) \(\in \) T, \((t_1,r_2,t_2)\) \(\in \) T, and \((h_1,r_3,t_2)\) \(\in \) T, we have

$$\begin{aligned}&{\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t}\\ h_{1c}\circ r_{1t} = t_{1c}\\ t_{1c}\circ r_{1h} = h_{1c}\\ \end{array}\right. } \end{aligned}$$
(19)
$$\begin{aligned}&{\left\{ \begin{array}{ll} t_{1r}\circ r_{2h} = t_{2r}\circ r_{2t}\\ t_{1c}\circ r_{2t} = t_{2c}\\ t_{2c}\circ r_{2h}=t_{1c}\\ \end{array}\right. } \end{aligned}$$
(20)
$$\begin{aligned}&{\left\{ \begin{array}{ll} h_{1r}\circ r_{3h} = t_{2r}\circ r_{3t}\\ h_{1c}\circ r_{3t} = t_{2c} \\ t_{2c}\circ r_{3h}=h_{1c}\\ \end{array}\right. } \end{aligned}$$
(21)
$$\begin{aligned}&\Rightarrow {\left\{ \begin{array}{ll} h_{1r}\circ r_{1h} = t_{1r}\circ r_{1t} \wedge t_{1r} \circ r_{2h} = t_{2r}\circ r_{2t}\\ \wedge h_{1r} \circ r_{3h} = t_{2r}\circ r_{3t}\\ h_{1c}\circ r_{1t} = t_{1c}\wedge t_{1c}\circ r_{2t} = t_{2c}\wedge h_{1c}\circ r_{3t} = t_{2c}\\ t_{1c}\circ r_{1h}=h_{1c}\wedge t_{2c} \circ r_{2h}=t_{1c} \wedge t_{2c}\circ r_{3h}=h_{1c}\\ \end{array}\right. }\nonumber \\&\Rightarrow {r_{1h}}\circ {r_{2h}= r_{3h}}, r_{1t}\circ r_{2t}= r_{3t} \end{aligned}$$
(22)

If \(r_1, r_2, \) and \( r_3\) satisfy the composition relation pattern, we need \( {r_{1\,h}}\circ {r_{2\,h}= r_{3\,h}}, r_{1t}\circ r_{2t} = r_{3t}.\) \(\square \)
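These conditions are easy to check numerically. The toy script below (our own illustration with small random vectors) constructs relation components that satisfy each condition and verifies that the corresponding interaction constraints then hold:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
h_c = rng.normal(size=k)

# Symmetry: choose r_t with entries in {+1, -1}, so that r_t ** 2 = 1 elementwise.
r_t = rng.choice([-1.0, 1.0], size=k)
t_c = h_c * r_t                        # (h, r, t): h_c o r_t = t_c
assert np.allclose(t_c * r_t, h_c)     # (t, r, h): t_c o r_t = h_c also holds

# Inverse: r_2h is the elementwise reciprocal of r_1h, so that r_1h o r_2h = 1.
r_1h = rng.uniform(0.5, 2.0, size=k)
r_2h = 1.0 / r_1h
assert np.allclose(r_1h * r_2h, np.ones(k))

# Composition: r_3t = r_1t o r_2t, so chaining two projections equals the third one.
r_1t, r_2t = rng.normal(size=k), rng.normal(size=k)
r_3t = r_1t * r_2t
t_1c = h_c * r_1t                      # (h1, r1, t1): h_1c o r_1t = t_1c
t_2c = t_1c * r_2t                     # (t1, r2, t2): t_1c o r_2t = t_2c
assert np.allclose(h_c * r_3t, t_2c)   # (h1, r3, t2): h_1c o r_3t = t_2c is then satisfied

print("symmetry, inverse and composition conditions verified")
```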

Table 2 Statistical information of the datasets used in experiments

4 Experiments

4.1 Datasets

Six commonly used standard datasets are used in our link prediction experiments: ogbl-wikikg2 [18], ogbl-biokg [18], FB15k [8], FB15k-237 [19], WN18 [8], and YAGO3-10 [15]. The statistical information of these datasets is summarized in Table 2.

ogbl-wikikg2 is extracted from the Wikidata knowledge base [39]. The main challenge for this dataset is complex relations.

ogbl-biokg contains data from a large number of biomedical data repositories. The main challenge for this dataset is symmetry relations.

FB15k is a subset of Freebase [2], and the main relation patterns are inverse, symmetric and antisymmetric.

FB15k-237 is a subset of FB15k and has no inverse relations. The challenge of link prediction on FB15k-237 is how to handle composition patterns.

WN18 is extracted from WordNet [1]; the main challenges of link prediction on WN18 are modelling the inversion and symmetry relations.

YAGO3-10 is a YAGO3 sample proposed by [15]. This dataset was built by identifying all entities with at least ten different relations in the KG and extracting all the corresponding facts.

4.2 Evaluation Protocol

We use the mean rank (MR), mean reciprocal rank (MRR), and Hits@N (the proportion of correct entities ranked in the top N, where N = 1, 3, and 10) as the evaluation metrics. A lower MR is better, whereas higher MRR and Hits@N values are better. For ogbl-wikikg2 and ogbl-biokg, we follow the settings in the work by [18] and report the average of multiple runs to reflect the variability of the results. For the other datasets, we follow the mainstream practice of selecting the best results for comparison. The embedding dimension is the same for h, r, and t in our model. The embedding size of the vector components \( h_r, h_c, r_h, r_t, t_r\), and \(t_c\) is half that of h, r, and t.
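For reference, a minimal sketch of how these metrics are computed from the ranks of the correct entities (our own code; rank 1 is best):

```python
import numpy as np

def ranking_metrics(ranks, ns=(1, 3, 10)):
    """Compute MR, MRR, and Hits@N from the ranks of the correct entities."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MR": ranks.mean(), "MRR": (1.0 / ranks).mean()}
    for n in ns:
        metrics[f"Hits@{n}"] = (ranks <= n).mean()
    return metrics

# Example: ranks of the true entity for five (h, r, ?) test queries.
print(ranking_metrics([1, 3, 2, 15, 1]))
# MR = 4.4, MRR ≈ 0.58, Hits@1 = 0.4, Hits@3 = 0.8, Hits@10 = 0.8
```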

The hyperparameters are selected via grid search according to the performance of the model on the validation dataset. We set the ranges of the hyperparameters as follows: temperature of sampling a \(\in \) { 0.5, 1.0 }, embedding size k \(\in \) { 100, 200, 500, 1500, 2000, 2500 }, batch size b \(\in \) { 512, 1024 }, fixed margin \(\gamma \in \) { 3, 4, 5, 6, 7, 8, 12 }, \(\delta \in \) { 0.2, 0.4, 0.6, 0.8, 0.9 }, and number of negative samples for each observed triplet n \(\in \) { 128, 256, 512, 1024 }. We add two additional coefficients to the score function during training, i.e. \(f_r({h,t})= -\lambda _1\parallel h_r \circ r_h - t_r\circ r_t \parallel _{1}-\lambda _2\parallel h_c - r_h \circ t_c \parallel _{1}-\lambda _2\parallel t_c - r_t\circ h_c\parallel _{1} \), where \(\lambda _{1},\lambda _{2} \) \(\in \) [0, 1] and \(\lambda _{1}+2\lambda _{2}=1 \); a code sketch of this weighted score is given after the configuration list below. The optimal configurations of our model are:

  • a=1.0, k=100, b=1024, n=128, \(\lambda _{1}\)=0.6, \(\lambda _{2}\)=0.2, \(\gamma \)=4 on ogbl-wikikg2;

  • a=1.0, k=200, b=1024, n=128, \(\lambda _{1}\)=0.6, \(\lambda _{2}\)=0.2, \(\gamma \)=6 on ogbl-wikikg2;

  • a=1.0, k=400, b=1024, n=128, \(\lambda _{1}\)=0.4, \(\lambda _{2}\)=0.3, \(\gamma \)=12 on ogbl-biokg;

  • a=1.0, k=2000, b=1024, n=128, \(\lambda _{1}\)=0.4, \(\lambda _{2}\)=0.3, \(\gamma \)=12 on ogbl-biokg;

  • a=1.0, k=2500, b=1024, n=256, \(\lambda _{1}\)=0.9, \(\lambda _{2}\)=0.05, \(\gamma \)=17 on FB15k;

  • a=1.0, k=1500, b=1024, n=256, \(\lambda _{1}\)=0.8, \(\lambda _{2}\)=0.1, \(\gamma \)=4 on FB15k-237;

  • a=0.5, k=500, b=512, n=1024, \(\lambda _{1}\)=0.9, \(\lambda _{2}\)=0.05, \(\gamma \)=8 on WN18;

  • a=1.0, k=500, b=1024, n=512, \(\lambda _{1}\)=0.8, \(\lambda _{2}\)=0.1, \(\gamma \)=17 on YAGO3-10.
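A short sketch of the coefficient-weighted training score mentioned above (our own code, reusing the hypothetical embedding split; the three terms correspond to \(score_1\), \(score_2\), and \(score_3\) in the ablation study of Sect. 4.4.4):

```python
import torch

def tcie_training_score(h, r, t, lambda1=0.8, lambda2=0.1):
    """Weighted training score with lambda1 + 2 * lambda2 = 1 (values shown are the FB15k-237 setting)."""
    h_r, h_c = torch.chunk(h, 2, dim=-1)
    r_h, r_t = torch.chunk(r, 2, dim=-1)
    t_r, t_c = torch.chunk(t, 2, dim=-1)
    score1 = -torch.norm(h_r * r_h - t_r * r_t, p=1, dim=-1)  # relation-initiated interaction
    score2 = -torch.norm(h_c - r_h * t_c, p=1, dim=-1)        # head-initiated interaction
    score3 = -torch.norm(t_c - r_t * h_c, p=1, dim=-1)        # tail-initiated interaction
    return lambda1 * score1 + lambda2 * score2 + lambda2 * score3
```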

Table 3 Link prediction results on ogbl-wikikg2 and ogbl-biokg
Table 4 Link prediction results on FB15k and FB15k-237
Table 5 Link prediction results on WN18 and YAGO3-10

4.3 Main Results

The experimental results of knowledge graph completion on the Open Graph Benchmark [18] datasets are shown in Table 3. To also explore the effect of dimensionality on model performance, we used two embedding sizes for all models. TCIE significantly outperforms all baselines on the two large datasets, ogbl-wikikg2 and ogbl-biokg. Compared with the test MRRs obtained using PairRE, our model improves the test MRRs by 23% on ogbl-wikikg2 (dimension 100), 18% on ogbl-wikikg2 (dimension 200), and 0.67% on ogbl-biokg (dimension 2000). On the ogbl-wikikg2 and ogbl-biokg datasets, TCIE performs best with both the limited and the increased embedding sizes. As the dimension increases, the test MRR improvement obtained by TCIE weakens, indicating that our model does not need a very high dimension to achieve competitive results.

Fig. 3 Histograms of relation embeddings for different relation patterns. \(r_{1}\) is relation _similar_to. \(r_{2}\) is relation _part_of. \(r_{3}\) is relation _hyponym. \(r_{4}\) is relation _hypernym. \(r_{5}\) is relation /location/location/adjoin_s./location/adjoining_relationship/adjoins. \(r_{6}\) is relation /location/statistical_region/gdp_real./measurement_unit/adjusted_money_value/adjustment_currency. \(r_{7}\) is relation /location/statistical_region/gdp_nominal./measurement_unit/dated_money_value/currency

Table 4 shows the experimental results of link prediction on the FB15k and FB15k-237 datasets. Compared with the results of recent competitive models, TCIE shows clear improvements on FB15k-237 for all evaluation metrics. On FB15k, our model outperforms all baselines in terms of almost all metrics. One exception is that TransH [9] performs better than TCIE in terms of the MR. The relations in FB15k and FB15k-237 are mainly inversion and composition patterns. Thus, TCIE is more suitable for capturing context information in such relationships.

Table 5 displays the experimental results of link prediction on WN18 and YAGO3-10. The major relations in WN18 are inversion and symmetry patterns. TCIE achieves results comparable to those of the state-of-the-art models and scores the highest in terms of Hits@1 and Hits@3. These comparisons demonstrate the strong ability of our model to encode inverse and symmetry relations. YAGO3-10 has 37 kinds of relations, among which the triplets involving isAffiliatedTo and playsFor account for 35% and 30%, respectively, of the training set. Their (subject, object) pairs largely overlap, which may explain why our model does not achieve optimal results on this dataset.

4.4 Model Analysis

4.4.1 Analysis of the Relation Patterns

The propositions in Sect. 3.5 prove that TCIE can model multiple relation patterns. We investigate the relation embeddings of different patterns (500 dimensions on WN18 and 1500 dimensions on FB15k-237); the resulting histograms are shown in Fig. 3.

Symmetry Pattern. Figure 3a and b show a symmetry relation \(r_1\) _similar_to from WN18. We can see that most elements in Fig. 3a are close to or equal to 0, and the absolute values of the \(r_{1t}\) elements are very close to 1 in Fig. 3b. The embeddings of \(r_1\) basically satisfy \( {r_{1\,h}}^2={r_{1t}}^2=1\).

Antisymmetry Pattern. Figure 3c and d show an antisymmetry relation \(r_2\) _part_of from WN18. We observe that the elements in Fig. 3c are not concentrated at approximately 0, and the values of the \(r_{2t}\) elements are not close to 1 in Fig. 3d. These results demonstrate that the embeddings of \(r_2\) satisfy \( {r_{2\,h}}^2 \ne {r_{2t}}^2 \ne 1\).

Inverse Pattern. Figure 3e and f show the inverse relations \(r_3\) _hyponym and \(r_4\) _hypernym from WN18. Most elements in Fig. 3e are approximately 0, and the elements of the Hadamard product of \(r_{3t}\) and \(r_{4t}\) are very close to 1 in Fig. 3f. These results indicate that the embeddings of \(r_3\) and \(r_4\) nearly satisfy \( r_{3\,h} \circ r_{4\,h}=r_{3t} \circ r_{4t}=1\).

Composition Pattern. Figure 3g and h show the composition relation \(r_5\) /location/location/adjoin_s./location/adjoining_relationship/adjoins, \(r_6\) /location/statistical_region/gdp_real./measurement_unit/adjusted_money_value/adjustment_currency, and \(r_7\) /location/statistical_region/gdp_nominal./measurement_unit/dated_money_value/currency from FB15k-237. Most elements in Fig. 3g and h are close to or equal to 0, which shows that these three relations are close to satisfying \( {r_{5\,h}}\circ {r_{6\,h}= r_{7\,h}}, r_{5t}\circ r_{6t}=r_{7t}\).

Table 6 The detailed link prediction results by relation category on FB15k-237

4.4.2 Analysis of the Complex Relations

We study the performance of TCIE on different relation categories. Table 6 summarizes the detailed results by relation category on FB15k-237, which shows that our model achieves highly competitive performance on different mapping properties. TCIE is capable of modelling complex mapping properties and performs well on 1-to-N, N-to-1, and N-to-N relations.

4.4.3 Analysis of Embedding Dimension

The embedding dimension affects the performance of KGE models on the knowledge graph. We further conduct experiments to explore the influence of the embedding dimension on TCIE. On the ogbl-wikikg2 dataset, for all embedding sizes of TCIE, \(\lambda _{1}\)=0.6, \(\lambda _{2}\)=0.2 and \(\gamma \)=4. On FB15k-237, for all embedding sizes of TCIE, \(\lambda _{1}\)=0.8, \(\lambda _{2}\)=0.1 and \(\gamma \)=4. As shown in Fig. 4a, as the dimension increases on the ogbl-wikikg2 dataset, the MRR first increases and then remains unchanged. The same trend is found in Fig. 4b on FB15k-237, where the MRR even decreases when the dimension reaches 2000. Thus, the improvement brought by increasing the dimension is limited in link prediction. The embedding dimension in Fig. 4 is the entity dimension. The relation dimension of PairRE is twice that of its entities, whereas in TCIE the dimensions of entities and relations are the same. Our model therefore needs a lower relation dimension than that needed by PairRE but achieves better performance.

Fig. 4 Results of different embedding dimensions on ogbl-wikikg2 and FB15k-237

4.4.4 Analysis of the Score Function

We perform an ablation study on combinations of the score functions. We set \(score_1\) = \(-\parallel h_r \circ r_h - t_r\circ r_t \parallel _{1}\), \(score_2\) = \(-\parallel h_c - r_h \circ t_c \parallel _{1}\), and \(score_3\) = \(-\parallel t_c - r_t\circ h_c \parallel _{1}\). The combination \(score_1\) + \(score_2\) + \(score_3\) corresponds to \(-\lambda _1\parallel h_r \circ r_h - t_r\circ r_t \parallel _{1}-\lambda _2\parallel h_c - r_h \circ t_c \parallel _{1}-\lambda _2\parallel t_c - r_t\circ h_c\parallel _{1}\). For ogbl-wikikg2, \(\lambda _1\)=0.6 and \(\lambda _2\)=0.2; for FB15k-237, \(\lambda _1\)=0.8 and \(\lambda _2\)=0.1. As shown in Table 7, better results are achieved using \(score_1\) alone than using \(score_2\) or \(score_3\), and the lowest MRR is achieved using \(score_3\). These results indicate that the interaction initiated by the relation is more useful than those initiated by the head or tail, and that more information may be learned from the interaction initiated by the head than from that initiated by the tail. The best results are achieved with our full score function (\(score_1\) + \(score_2\) + \(score_3\)). This proves that aggregating the interactions among the head, relation, and tail helps capture more triplet semantic information and obtain better knowledge representations.

Table 7 Ablation experiment results on FB15k-237 and ogbl-wikikg2

4.4.5 Analysis of the Parameters

The numbers of parameters of the baseline models are shown in Table 8. TCIE requires the same number of parameters as TransE, yet its performance is significantly better. Our model requires fewer parameters than the recent state-of-the-art model StructurE [17] and performs better on FB15k and FB15k-237.

Table 8 Parameter statistics of the baselines and TCIE, where \(n_e\) denotes the number of entities, \(n_r\) denotes the number of relations, and d is the embedding dimension

4.4.6 Analysis of the Entity Embeddings

To explore the distribution of entity embeddings in vector space, we project the learned entity embeddings from TransE [8], PairRE [16], and TCIE into a 3D space by t-SNE [43]. As shown in Fig. 5, points with the same colour correspond to the same relation. We observe that points of the same colour tend to be grouped together for TransE, PairRE, and TCIE; however, the points of TransE and PairRE are more dispersed than those of TCIE. In Fig. 5c, all the points are clustered together very densely, demonstrating that the interactions draw related entities closer together in the embedding space. Such closeness indicates that the context information has been learned and that the entity embeddings share similarities, which helps strengthen the connections between entities and infer potential relationships in link prediction.
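The projection step can be reproduced with standard tools; below is a sketch (our own code using scikit-learn and matplotlib, with hypothetical files holding the learned entity embeddings and a relation label per entity):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical inputs: learned entity embeddings (num_entities, d) and, for colouring,
# the id of the relation each plotted entity is associated with.
entity_emb = np.load("entity_embeddings.npy")
entity_rel = np.load("entity_relation_ids.npy")

# Project the high-dimensional embeddings into 3D with t-SNE.
points = TSNE(n_components=3, init="pca", random_state=0).fit_transform(entity_emb)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2], c=entity_rel, s=4, cmap="tab20")
plt.show()
```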

Fig. 5 3D scatter plot of entity embeddings on FB15k-237. Each colour represents a relation, and each point represents an entity

5 Conclusion

The contextual information between the triplet components reflects the correlation between entities that are not directly connected and is essential for inferring potential relations between entities. To explicitly account for this context information in knowledge graph embedding, we propose a new model, TCIE, which designs three interactions for entity and relation representations consisting of two specialized parts. These interactions are used separately for building semantic connections between entities and relations in the positive triplets and capturing contextual information. In addition, we design a unified score function to combine the interactions for semantic connections and contextual information. With these two key factors, TCIE shows a strong capacity for modelling four relation patterns and four mapping properties. Compared with distance-based KGE models, TCIE achieves state-of-the-art results on multiple standard datasets in link prediction.

In future work, we plan to study the following problems. (1) We will integrate the general semantic information in language models into TCIE. (2) We intend to extend TCIE to downstream tasks of KGE, such as graph-to-text generation and complex question answering over knowledge bases.