Abstract

A knowledge graph is a collection of fact triples that forms a semantic network of nodes and edges. Link prediction on knowledge graphs infers the missing parts of triples. Common link prediction models include translation models, semantic matching models, and neural network models. However, translation models and semantic matching models have relatively simple structures and limited expressiveness, while neural network models tend to ignore the overall structural characteristics of triples and cannot capture the links between entities and relations in low-dimensional space. To address these problems, we propose a knowledge graph embedding model based on a relational memory network and a convolutional neural network (RMCNN). We encode triple embedding vectors with the relational memory network and decode them with the convolutional neural network. First, we obtain entity and relation vectors by encoding the latent dependencies between entities and relations, together with other critical information, while preserving the translation properties of triples. Then, we compose a matrix from the encoded embedding vectors of the head entity, the relation, and the tail entity as the input of the convolutional neural network. Finally, we use the convolutional neural network as the decoder, together with a dimension conversion strategy, to improve the information interaction of entities and relations in more dimensions. Experiments show that our model outperforms existing models and methods on several metrics.

1. Introduction

The knowledge graph [1] is a structured semantic knowledge base stored in the form of triples (h, r, t), where h is a head entity, t is a tail entity, and r is the relation between them. Many large knowledge graphs, such as YAGO [2], Freebase [3], and DBpedia [4], use triples to store the entities and relations of the knowledge base. With the advent of the era of artificial intelligence, knowledge graphs have become critical resources for intelligent applications such as intelligent question answering [5], web search [6], recommender systems [7], and sentiment analysis [8, 9]. Figure 1 is an example of a simple knowledge graph.

Although knowledge graphs are widely used, they are still incomplete; that is, they lack a large number of valid triples. To make knowledge graphs more complete, knowledge graph link prediction has attracted the attention of many researchers. An effective approach to link prediction is knowledge graph embedding (KGE) [10], which aims to learn embedded representations of entities and relations and to use them for inference and prediction. Typical knowledge graph embedding models include translation models [11–14] and semantic matching models [15–17], which are simple, efficient, and easy to train. However, due to their simple structure, these two kinds of models capture fewer features than deep models, which significantly limits their expressive power. Convolutional neural networks (CNNs) excel in image processing and NLP [18] thanks to their feature extraction capabilities. Recently, researchers have applied CNNs to KGE, and some CNN-based models [19–22] have achieved good results on most datasets. These models generate embedded representations by computing latent connections between entities and relations through the nonlinear feature extraction capabilities of convolutional neural networks.

Translation models and semantic matching models have relatively simple structures. They focus only on the structural information of triples, cannot effectively infer complex semantic connections between entities and relations, and perform poorly on datasets with complex relations. Mainstream neural network models cannot capture the connection between entities and relations in low-dimensional space and ignore the translation characteristics of triples. To solve these problems, improve the efficiency of knowledge graph link prediction, increase the fitting ability of the model, and better handle complex relations, we combine a relational memory network and a convolutional neural network to enhance the generalization ability of the model. The core of the relational memory network [23] is shown in Figure 2. Specifically, we add positional encoding to the input sequence of head entities, relations, and tail entities. We then use the Transformer self-attention mechanism [24] to interact with the memory matrix and produce encoded vectors. In the convolutional decoder, we propose a dimension conversion strategy, which greatly increases the feature interaction of entities and relations in more dimensions. Experiments show that our model outperforms the baseline models on most metrics. In summary, the main contributions of this paper are as follows:

(i) We propose a new knowledge graph embedding model (RMCNN), which uses a relational memory network to encode the relations between relations and entities. It can effectively reason about the complex semantic relationships between entities and relations and capture the deep relation between entity and relation embedding vectors.

(ii) We apply a dimension conversion strategy to the encoded embedding matrix to increase the number of sliding steps of the convolution kernel and improve the information interaction of the entities and relations in a triple in more dimensions.

(iii) We evaluate the model on four datasets with the link prediction task. The experiments show that our model has better prediction accuracy than other models.

2. Related Work

We introduce some translation models in Section 2.1, semantic matching models in Section 2.2, and convolutional neural network models in Section 2.3. Table 1 compares in detail the entity embeddings, relation embeddings, and scoring functions of some of these models.

2.1. Translation Models

The TransE [11] model maps the head entity vector, the relation vector, and the tail entity vector to a low-dimensional dense vector space and regards the relation vector as a translation from the head entity vector to the tail entity vector. TransE has few parameters, is convenient to compute, and performs well on large-scale sparse knowledge graphs. The TransH [12] model defines a hyperplane for each relation and projects the two entities from the entity space onto this hyperplane through a relation-specific mapping. The TransR [13] model defines a projection matrix for each relation r and projects entities from the entity space into the subspace of the relation r. In essence, TransR replaces the projection vector with a projection matrix: each entity is represented by a vector, and each relation is associated with a matrix. The TransD [14] model adopts a dual-vector design: each entity and relation is represented by two vectors, a meaning vector that represents its embedding and a projection vector that is used to construct the projection matrix. The projection matrix therefore differs for each entity-relation pair, and head and tail entities are projected separately. However, the structure of translation models is too simple to capture the underlying connections between entities and relations.
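To make the translation idea concrete, the following is a minimal sketch of a TransE-style score, assuming the common convention that the score is the negative L2 distance between h + r and t (so a higher score means a more plausible triple). The function name and the toy vectors are illustrative and do not come from the paper.

```python
import math


def transe_score(h, r, t):
    """Negative L2 distance between (h + r) and t; higher means more plausible.

    Assumption: L2 norm and sign convention chosen for illustration only.
    """
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Toy 2-dimensional embeddings (hypothetical values).
h = [0.1, 0.2]
r = [0.3, -0.1]
t_good = [0.4, 0.1]   # close to h + r
t_bad = [-0.5, 0.9]   # far from h + r

print(transe_score(h, r, t_good) > transe_score(h, r, t_bad))
```

The sketch also shows why such a model is cheap: scoring a triple needs only one vector addition and one distance computation.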

2.2. Semantic Matching Models

RESCAL [15] is the first knowledge graph embedding model based on semantic matching and is built on tensor decomposition. It represents entities as vectors and relations as matrices and proposes the first scoring function based on bilinear products. DistMult [16] simplifies RESCAL by restricting its relation matrices to diagonal matrices. ComplEx (Complex Embedding) [17] extends DistMult with complex-valued embeddings, so that entities and relations are embedded in complex space rather than real-valued space. ANALOGY [25] extends RESCAL to better model the analogical reasoning properties of entities and relations, and it uses the same bilinear function as RESCAL as the triple scoring function. RotatE (Rotation Embedding) [26] represents each entity as a complex vector and regards each relation as a rotation from the head entity to the tail entity. However, although semantic matching models are easy to train, they are prone to overfitting due to their redundancy, which is a serious disadvantage when embedding large knowledge graphs.
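The relationship between the RESCAL and DistMult scoring functions can be illustrated with a short sketch. It assumes the standard bilinear form (head vector, relation matrix, tail vector) for RESCAL and its diagonal restriction for DistMult; the function names and the toy values are ours.

```python
def rescal_score(h, M, t):
    """Bilinear score h^T M t with a full relation matrix M."""
    return sum(h[i] * M[i][j] * t[j] for i in range(len(h)) for j in range(len(t)))


def distmult_score(h, r, t):
    """Bilinear score with a diagonal relation matrix, stored as the vector r."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy embeddings (hypothetical values).
h = [1, 2]
t = [5, 6]
r_diag = [3, 4]
M_diag = [[3, 0], [0, 4]]  # the same relation written as a full diagonal matrix

print(rescal_score(h, M_diag, t) == distmult_score(h, r_diag, t))
```

With d-dimensional embeddings, the full matrix needs d × d parameters per relation and the diagonal form only d. Note that the diagonal score is unchanged when h and t are swapped.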

2.3. Convolutional Neural Network Models

The ConvE [19] model is the first model to use a CNN for knowledge graph completion. It reshapes the head entity vector and the relation vector and combines them into a matrix as the input of the convolutional layer. ConvE applies different convolution kernels and outputs feature maps, maps these feature maps to a vector, and takes the dot product of that vector with the tail entity vector to obtain the triple score. 1D convolution can only capture interactions where the vectors are spliced together. ConvE uses 2D convolution, as in the image domain, to obtain more interactions than 1D convolution. However, 2D convolution still captures only part of the interactions, so the interaction between entities and relations remains insufficient. To maximize this interaction, the ConvR [21] model uses the relation embedding as the convolution filter and convolves it over the head entity embedding, which lets the entity and the relation interact fully. InteractE [27] also focuses on increasing the interaction between entities and relations, mainly through feature permutation, reshape operations, and circular convolution. JointE [28] combines 1D and 2D convolutions to embed the knowledge graph, where 1D convolution is used to obtain explicit knowledge and 2D convolution is used to obtain deep knowledge. However, these convolutional neural network models ignore the translation properties of triples and do not pay attention to the global features of triples.

3. Methods

This section introduces the symbols we use and their definitions in Section 3.1, our model framework in Section 3.2, and the loss function we use in Section 3.3.

3.1. Definition

The knowledge graph is a set of valid triples of the form (head entity, relation, tail entity), expressed as (h, r, t), where h, t ∈ E and r ∈ R; E is the set of entities and R is the set of relations. The head entity, the relation, and the tail entity each have an embedding representation, and we define a scoring function over triples. If a triple is valid, its score will be higher.

3.2. The Framework of the Proposed Model

The model structure of this paper, shown in Figure 3, mainly consists of two parts: the relational memory module and the convolutional neural network module. The relational memory module, which is composed of multilayer perceptrons and memory gates, encodes the potential dependencies and the important information between entities and relations and forms encoded embedding vectors. The convolutional neural network module goes through five steps: dimension conversion, convolution, feature map vectorization, linear mapping, and dot product.

We believe that the relative positions of the head entity, relation, and tail entity are of great significance for reasoning about fact triples. Therefore, we add the corresponding position embeddings to the head entity vector, relation vector, and tail entity vector. Given a triple (h, r, t), the input vector of each element is obtained from its embedding vector, its position encoding embedding vector, and a projection weight matrix. Position encoding is used to determine the potential semantic connections of entities and relations in the low-dimensional representation space. D denotes the embedding dimension of entities and relations, and N denotes the size of the memory.

In this paper, the memory matrix consists of U rows and N columns, where each row represents a memory slot. The memory matrix and its i-th memory slot are indexed by the time step e. Following the relational memory network [23], we use the multihead attention mechanism of the Transformer to let the input vector interact with the memory matrix, which effectively captures the potential dependencies within triples. The i-th memory slot at time e + 1 is obtained by splicing (⊕) the results of the C heads of the multihead attention mechanism, where c indexes the heads. Each head uses a value projection matrix, whose size depends on the head size, and combines the projected vectors with the attention weights. Each attention weight is calculated by the softmax function from a scalar value obtained as the dot product of the query and the key, which are produced by the query projection matrix and the key projection matrix, respectively. A residual connection is used to ensure good performance, and its result is fed to the multilayer perceptron and memory gating. This step generates an N-dimensional encoded embedding vector for time e and the memory slots for time e + 1.
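The attention update described above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation. It assumes that each memory slot attends over all memory slots plus the current input vector, and that the scores are scaled by the square root of the head size, as in the standard Transformer. It omits the residual connection, the multilayer perceptron, and the memory gating. All function names and toy weights are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(memory, x, Wq, Wk, Wv):
    """One attention head: every memory slot attends over all slots plus input x."""
    candidates = memory + [x]
    keys = [matvec(Wk, c) for c in candidates]
    values = [matvec(Wv, c) for c in candidates]
    scale = math.sqrt(len(keys[0]))  # assumed scaling by sqrt(head size)
    updated = []
    for slot in memory:
        q = matvec(Wq, slot)
        beta = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        alpha = softmax(beta)  # attention weights
        updated.append([sum(a * v[d] for a, v in zip(alpha, values))
                        for d in range(len(values[0]))])
    return updated


def multihead(memory, x, heads):
    """Splice (concatenate) the per-head results for each memory slot."""
    per_head = [attend(memory, x, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(memory))]


identity = [[1.0, 0.0], [0.0, 1.0]]
memory = [[1.0, 0.0]]          # a single memory slot (U = 1)
x = [0.0, 1.0]                 # current input vector
one_head = attend(memory, x, identity, identity, identity)
two_heads = multihead(memory, x, [(identity, identity, identity)] * 2)
print(len(one_head[0]), len(two_heads[0]))
```

With identity projections, the updated slot is a convex combination of the old slot and the input, which makes the role of the attention weights easy to inspect.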

As a result, we obtain a sequence of 3 encoded vectors for the triple (h, r, t). We stack the encoded embedding vectors output by the relational memory network into a matrix, which is the input of the convolutional neural network. RMCNN applies a dimension conversion strategy to this matrix. Specifically, assuming that the vector dimension of each element in the triple is 100, a convolution kernel of shape 3 × 3 slides over 98 positions on the triple matrix of shape 100 × 3. The dimension conversion strategy converts the 100 × 3 triple matrix into a 10 × 30 matrix. With the same 3 × 3 convolution kernel, each kernel slides over 8 × 28 = 224 positions on the converted matrix, so the number of sliding positions increases significantly. Owing to this strategy, our model improves the information interaction of entities and relations in the triple matrix in more dimensions. The specific dimension conversion strategy is shown in Figure 4.
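The sliding counts quoted above follow from the usual formula for a stride-1 convolution without padding: a kernel of shape a × b visits (rows − a + 1) × (cols − b + 1) positions. The sketch below checks both numbers. The row-major reshape is only an assumed stand-in for the conversion order, which is defined by Figure 4, and the helper names are ours.

```python
def sliding_positions(rows, cols, k_rows=3, k_cols=3):
    """Number of positions a kernel visits with stride 1 and no padding."""
    return (rows - k_rows + 1) * (cols - k_cols + 1)


def reshape(matrix, new_rows, new_cols):
    """Row-major reshape of a nested-list matrix (assumed conversion order)."""
    flat = [value for row in matrix for value in row]
    if len(flat) != new_rows * new_cols:
        raise ValueError("element count must be preserved")
    return [flat[i * new_cols:(i + 1) * new_cols] for i in range(new_rows)]


# A 100 x 3 triple matrix filled with placeholder values.
triple_matrix = [[float(3 * i + j) for j in range(3)] for i in range(100)]
converted = reshape(triple_matrix, 10, 30)

print(sliding_positions(100, 3), sliding_positions(10, 30))  # prints: 98 224
```

On the 100 × 3 matrix, the 3 × 3 kernel can move only along the rows. After conversion it moves in both directions, which is where the additional positions come from.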

The RMCNN model applies the dimension conversion strategy to the matrix of encoded embedding vectors to obtain the converted matrix. We use different 2D convolution kernels to convolve the converted matrix and extract features, and each kernel produces one feature map. Our model combines these feature maps and reshapes them into a single vector. This vector is first multiplied by a projection weight matrix, which maps it into a u-dimensional vector space, and a dot product with a weight vector then gives the score of the triple. The scoring function therefore combines the convolution operation over the set of convolution kernels, the vectorization of the combined feature maps, an activation function, the matrix multiplication (×) with the projection weight matrix, and the dot product (·) with the weight vector.
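The decoding steps described above (convolution, vectorization, linear mapping, dot product) can be sketched as follows. This is an illustration under stated assumptions, not the paper's scoring function: ReLU is assumed as the activation and is applied directly after the convolution, and the kernels, the projection matrix `W`, and the weight vector `w` are toy values.

```python
def conv2d_valid(matrix, kernel):
    """2D convolution (cross-correlation) with stride 1 and no padding."""
    kr, kc = len(kernel), len(kernel[0])
    out = []
    for i in range(len(matrix) - kr + 1):
        row = []
        for j in range(len(matrix[0]) - kc + 1):
            row.append(sum(matrix[i + a][j + b] * kernel[a][b]
                           for a in range(kr) for b in range(kc)))
        out.append(row)
    return out


def relu(value):
    """Assumed activation function."""
    return max(0.0, value)


def score(matrix, kernels, W, w):
    """Convolve, vectorize the feature maps, project with W, then dot with w."""
    features = [relu(v) for k in kernels
                for row in conv2d_valid(matrix, k) for v in row]
    hidden = [sum(wi * fi for wi, fi in zip(row, features)) for row in W]
    return sum(wi * hi for wi, hi in zip(w, hidden))


# Toy input: a 3 x 4 converted matrix of ones and a single 3 x 3 kernel of ones.
matrix = [[1.0] * 4 for _ in range(3)]
kernels = [[[1.0] * 3 for _ in range(3)]]
W = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical projection weight matrix
w = [1.0, 1.0]                 # hypothetical weight vector

print(score(matrix, kernels, W, w))
```

In this toy case the single kernel yields a 1 × 2 feature map, so the vectorized feature has two entries. With more kernels, the vector grows linearly with the number of feature maps.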

3.3. Loss Function

After we obtain the scoring function, the RMCNN model can calculate the score of each triple. Usually, valid triples receive higher scores than invalid triples. Nonconvex relaxations usually achieve better performance than the convex case, since the former can achieve a nearly unbiased solver [29–31]. Therefore, we choose the log logistic regression function as our loss function. Furthermore, we employ the Adam optimizer to train our model by minimizing this loss over the sets of valid and invalid triples, where the invalid triples are generated by corrupting valid triples.
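As an illustration of a log logistic loss over valid and invalid triples, the sketch below uses the softplus form log(1 + exp(x)). The sign convention is an assumption chosen to match the statement that valid triples should receive higher scores. Any regularization term the original loss may contain is not modeled, and the function names are ours.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_logistic_loss(valid_scores, invalid_scores):
    """Penalize low scores on valid triples and high scores on invalid ones.

    Assumption: label +1 for valid and -1 for invalid triples, with the
    loss term log(1 + exp(-label * score)).
    """
    return (sum(softplus(-s) for s in valid_scores)
            + sum(softplus(s) for s in invalid_scores))


good_separation = log_logistic_loss([5.0], [-5.0])
bad_separation = log_logistic_loss([-5.0], [5.0])
print(good_separation < bad_separation)
```

Minimizing this quantity pushes the scores of valid triples up and those of corrupted triples down, which is the behavior the scoring function is meant to learn.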

4. Experiment

In this section, we evaluate the performance of RMCNN with classic link prediction experiments. The experimental results show that our model improves on previous models. In Section 4.1, we introduce the datasets used; in Section 4.2, we describe the hyperparameters; in Section 4.3, we define our evaluation metrics; in Section 4.4, we analyze the results; in Section 4.5, we conduct ablation experiments.

4.1. Datasets

We conduct link prediction experiments on the following benchmark datasets: YAGO3-10 [2], Kinship [32], FB15k-237 [27], and WN18RR [19]. The details of these datasets are shown in Table 2. Since FB15k and WN18 contain many reversible relations, most of their triples are easy to predict, so we adopt FB15k-237 and WN18RR, from which the reversible relations have been removed. Kinship is a small dataset of kinship relations. YAGO3-10, a subset of YAGO3, is the largest of the four datasets.

4.2. Hyperparameters

In our experiments, we obtained the best accuracy on the validation set when using a single memory slot (i.e., U = 1). We also set the number of heads in the multihead attention mechanism, the head size, the number of layers of the multilayer perceptron, the number of convolution filters, the memory matrix size, and the initial learning rate of Adam. The specific hyperparameters we use are shown in Table 3.

4.3. Evaluation Metrics

Link prediction infers the entities or relations that are missing from triples in the knowledge graph. For example, given an incomplete triple (h, r, ?), where the head entity h and the relation r are known and the tail entity is missing, link prediction completes the triple by inferring the tail entity t.

In this study, we use standard metrics to evaluate our model, as in previous work: mean reciprocal rank (MRR) and the proportion of correct samples ranked in the top k (Hit@k). MRR is the average of the reciprocal ranks of the correct samples over all test samples. Hit@k is the proportion of test samples whose correct sample is ranked k-th or higher. Given a triple (h, r, t) in the test set, we use the scoring function to score it together with the generated negative triples and sort the scores in descending order. The metrics are computed as

$$\mathrm{MRR} = \frac{1}{|T|} \sum_{i=1}^{|T|} \frac{1}{\mathrm{rank}_i}, \qquad \mathrm{Hit@}k = \frac{1}{|T|} \sum_{i=1}^{|T|} \mathbb{I}\left(\mathrm{rank}_i \le k\right),$$

where $|T|$ denotes the number of test triples and $\mathrm{rank}_i$ denotes the link prediction rank of the $i$-th triple. $\mathbb{I}(\cdot)$ is an indicator function (its value is 1 if the condition is true and 0 otherwise), and $k$ generally takes the value 1, 3, or 10.
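The two metrics can be computed from a list of ranks in a few lines. The sketch below assumes that the rank of a test triple is one plus the number of candidate triples with a strictly higher score (ties resolved in favor of the correct triple). The helper names and the example ranks are illustrative.

```python
def rank_of(true_score, other_scores):
    """Rank of the correct triple when scores are sorted in descending order."""
    return 1 + sum(1 for s in other_scores if s > true_score)


def mrr(ranks):
    """Mean reciprocal rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Proportion of test triples whose correct answer is ranked k-th or higher."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 2, 4, 10]  # hypothetical ranks of four test triples
print(mrr(ranks), hits_at_k(ranks, 1), hits_at_k(ranks, 3), hits_at_k(ranks, 10))
```

Because MRR averages reciprocal ranks, it rewards moving a correct answer from rank 2 to rank 1 far more than moving it from rank 20 to rank 10, whereas Hit@k only registers whether the answer crosses the cutoff k.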

4.4. Analysis of Results

We report the performance of different models on four benchmark datasets and give further analysis. The link prediction results are shown in Tables 4 and 5, where the highest score is shown in bold and the second highest score is underlined. The semantic matching models are prone to overfitting, which causes their performance to lag behind that of the convolutional neural network models. MRR reflects the ability of our model to correctly represent triple relations, and the improvement in this metric indicates that our model learns triple vectors well. On the WN18RR dataset, our model improves on ConvE across metrics, with MRR increasing by 10% and Hit@10 by 3.8%. Compared with the best baseline model, InteractE, MRR improves by 1.2% and Hit@10 by 2.1%. On the FB15k-237 dataset, compared with InteractE, which also uses a convolutional neural network, RMCNN improves MRR by 1.4% and Hit@3 by 1.1%. InteractE also performs well on FB15k-237, a dataset with many relations and few entities. Even compared with the more recent JointE, our model has advantages on both datasets.

In addition, we use a large dataset, YAGO3-10, and a smaller dataset, Kinship, to evaluate our model. We use two classic semantic matching models, DistMult and ComplEx, and three typical convolutional neural network models, ConvE, HypER, and InteractE, as our baseline models. The results are shown in Table 4. On the YAGO3-10 dataset, our model outperforms the other models on all metrics; compared with InteractE, RMCNN achieves improvements of 1.5%, 1.9%, 2%, and 2.3% on MRR, Hit@10, Hit@3, and Hit@1, respectively. We found that models based on convolutional neural networks outperform semantic matching models, which we attribute to the nonlinear nature of convolutional neural networks. To further verify the performance of our model, we also conduct experiments on the small dataset Kinship, where our model far outperforms the other baseline models. This shows that our model can model knowledge graphs well, whether the dataset is large or small.

Across the four datasets above, our model surpasses the KGE models ConvE, InteractE, and JointE, which are also based on convolutional neural networks, on many metrics. Its consistent performance across datasets reflects the good robustness of our model.

4.5. Ablation Experiments

We conduct ablation experiments to verify the effectiveness of the relational memory network and the dimension conversion strategy. Tables 6 and 7 show the results. RMCNN (RM) uses only the relational memory network; RMCNN (DC) uses only the dimension conversion strategy. RMCNN (RM) achieves excellent performance, showing that the relational memory network can encode and remember latent dependencies between entities and relations well. However, it does not reach the performance of the full RMCNN: with only the relational memory network, MRR drops from 0.358 to 0.349 on FB15k-237, from 0.473 to 0.463 on WN18RR, from 0.557 to 0.521 on YAGO3-10, and from 0.872 to 0.854 on Kinship; Hit@10 drops from 0.535 to 0.529 on FB15k-237 and from 0.540 to 0.531 on WN18RR. These drops indicate that our dimension conversion strategy improves the interaction between entities and relations in more dimensions.

In conclusion, our ablation experiments demonstrate that high performance can be achieved using only the relational memory network, although its link prediction performance is still inferior to that of the full RMCNN model. The relational memory network, which encodes entity and relation embeddings, contributes significantly to the link prediction task, whereas the dimension conversion strategy, which captures the interactions of entities and relations in more dimensions, plays an auxiliary role. Therefore, only by combining the two can we fully grasp the potential links between entities and relations, improve their interaction, and obtain better link prediction capability.

5. Conclusion

This paper proposes a model based on relational memory networks and convolutional neural networks. The model uses the relational memory network to encode triples and the convolutional neural network to decode them, which improves the efficiency of knowledge graph link prediction. First, the relational memory network encodes the entity and relation vectors so as to fully retain the important information of entities and relations. Then, in the convolutional neural network decoder, we use a dimension conversion strategy to add interactions between entities and relations in more dimensions. A limitation of the current work is that the proposed neural network structure needs to be designed manually. In future work, we will consider using neural architecture search methods to find optimal convolutional neural network structures for a specific dataset, which is a worthwhile direction to explore.

Data Availability

The labeled data set used to support the findings of this study is available from the author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (Grant no. 2019YFB1404702).