Knowledge reasoning with multiple relational paths

Knowledge reasoning technology can infer new knowledge based on the existing entity relationship information in the knowledge graph to complement the knowledge graph. In the existing knowledge reasoning methods based on multi-step relationship paths, the contributions of entities and relationships in the relationship paths are not distinguished, nor are multiple relationship paths comprehensively considered. To solve these problems, we propose a knowledge reasoning model based on Bi-directional Short-Term Memory and Attention Mechanism. Firstly, the neighborhood, entity semantic category and relationship path information are integrated in the knowledge graph, and the Neighborhood Semantic Path Network model (NSPN) is constructed to obtain a hybrid representation of the multi-step relationship path. Secondly, the loss function and training process of NSPN are introduced. Finally, comparative experiments are carried out on the public dataset, and the results show that our model is superior to other models in relation prediction tasks and link prediction tasks, improves the computational efficiency and data sparsity, and provides a new idea for knowledge reasoning methods.


Introduction
Knowledge graph is a structured semantic knowledge base that uses visualisation technology to describe knowledge resources and their carriers, which can mine, analyze, construct, draw and display knowledge and their interrelationships.Large-scale knowledge graphs must be given certain reasoning capabilities to serve many downstream applications.At the same time, knowledge reasoning is also a major method of knowledge graph completion.Therefore, knowledge reasoning is an essential technology in the process of building an excellent large-scale knowledge graph.It has practical importance in various real-world applications, ranging from friendship recommendation, knowledge graph completion, target advertising, and protein-protein interaction prediction.
Knowledge reasoning refers to discovering new associations between entities from the existing entities and relationships in the knowledge graph through computer reasoning, thereby complementing and enriching the knowledge graph.The traditional knowledge reasoning method needs to be instantiated whether it is a rule or an ontology constraint at the abstract level, and the computability is poor.For the knowledge graph with a large number of instances, the cost is too high, and there are also some difficulties in migration and processing sparse samples.Therefore, knowledge reasoning based on knowledge graph has developed a unique knowledge reasoning technology (Sai-Ping et al., 2018).
In the knowledge reasoning method of knowledge graph, the low-dimensional vector representation is obtained based on representation learning and neural network model.It not only contains the semantic association information of entity relations, but also models the path information of multi-step relationship in the knowledge graph.These methods can significantly improve computational efficiency and better handle data sparsity.However, the current knowledge reasoning model uniformly handles the entity relationships included in the multi-step relation path, which will miss the importance of some entities or relationships.Besides, for the same pair of entities, there may be multiple multi-step relation paths between them.The information contained in the path may also vary.In order to solve these problems, this paper proposes an improved knowledge reasoning model based on knowledge representation learning, using Bi-directional Long Short-Term Memory (Bi-LSTM) and attention mechanism to develop a model of multi-step relation path in the knowledge graphs.Different weights are assigned to different entities and relationships in each path considering the situation of multiple paths, so that the final low-dimensional vector representation can effectively complete the knowledge graph.

Related work
Knowledge reasoning method is to infer new knowledge or judge the right or wrong of knowledge from existing knowledge.Traditional knowledge reasoning methods can be divided into deductive reasoning, inductive reasoning and default reasoning.As early as 1935, Gentzen (1935) ) proposed the natural deductive reasoning method, which formalised the reasoning process as the proof process of mathematical logic rules.Inductive reasoning can be traced back to the general inductive reasoning theory founded by Solomonoff (1964), which can induct general conclusions from enough facts, and is an observationbased prediction theory and reasoning process.Default reasoning was formally proposed by Reiter (1980).This reasoning method is mainly used for reasoning in the case of incomplete knowledge by assuming that some conditions are correct.Because the knowledge graph has the characteristics of extensive content, a large number of triples, and data sparsity, the traditional knowledge reasoning method is too computationally complex and expensive to perform knowledge reasoning in the knowledge graph.
With the development of machine learning and data mining technology, knowledge reasoning methods for knowledge graphs are gradually enriching.They can be roughly divided into ontology-based reasoning, rule-based reasoning and multi-step relational path-based reasoning (Ji et al., 2022).Ontology describes concepts and the relation between concepts in the domain.Ontology-based knowledge reasoning uses the semantics and logic contained in ontology to reason about entity types and relationships in different entities.Common methods include methods based on Tableaux operations (Cohen, 2016), methods based on logic programming rewriting (Wang et al., 2013(Wang et al., , 2015)), methods based on first-order query rewriting (Lifschitz et al., 2008), and methods based on production rules (Alonzo, 1943).The representative one is the methods based on Tableaux operation, which is mainly used to check the satisfiability of an ontology and detect instances.However, for a knowledge graph with a large number of instances, it is necessary to instantiate rules and ontology constraints at the abstract level, which is poor computationally and expensive.Also, when the data is noisy, the extracted features are misleading (Ji et al., 2022).
Rule-based reasoning conducts mining and reasoning by defining or learning the rules existing in knowledge.Typical methods are the association rule mining algorithms AMIE (Galárraga et al., 2013) and AMIE+ (Galarraga et al., 2015) based on incomplete knowledge bases.Effectively learn high-confidence rules from large-scale knowledge graphs, and use them for knowledge reasoning tasks.The AMIE algorithm learns the rules for predicting each relationship in turn.For a relationship, starting from the rule with an empty rule body, the rule body is expanded by adding dangling edges, instance edges, and closed edges, and retaining candidates whose support is greater than the threshold rule.Wei et al. (2021) proposed a DegreEmbed model that combined embedding-based learning and logic rule mining for inferring on knowledge graphs to predict missing links in heterogeneous knowledge graphs that involve entities and relations of various types from the perspective of the degrees of nodes.Chen et al. (2021) proposed a Topology-Aware CorrelaTions (TACT) approach for inductive reasoning, believed that the semantic correlation between two relations is highly correlated to their topological structure in knowledge graphs, categorised all relation pairs into several topological patterns, and built a Relational Correlation Network to learn the importance of the different patterns for inductive link prediction.These algorithms have strong interpretability and can automatically discover inference rules.However, due to the large search space, it has shortcomings such as low coverage of generated rules and poor prediction effect of the final model.
Knowledge reasoning methods based on multi-step relational paths are mainly divided into two categories: One is based on Path Ranking Algorithm [PRA], and the other is based on low-dimensional vector spaces.PRA is a knowledge reasoning algorithm that converts multi-step relational paths into features.This algorithm corresponds the obtained relational paths in the knowledge graph to Horn clauses, converts the calculated path features into logical rules, aims to discover and understand hidden knowledge.Lao and Cohen (2010) proposed the PRA algorithm for the first time, using paths as features to predict the relationship between entities in the knowledge graph.The generation of paths in PRA is obtained by traversal, and the calculation cost is too high in large-scale knowledge graphs.Zhao et al. (2014) extended PRA and proposed a knowledge reasoning method CSRI based on content and structure information to predict the target entity that has a similar relationship with a known entity.Gardner et al. (2014a) proposed a random walk algorithm based on vector space similarity, which combines the entity relationship in the knowledge graph with a large corpus for knowledge reasoning, and integrates vector space similarity into random walks on knowledge graphs to reduce feature sparsity caused by using text corpora.Afterwards, Neelakantan et al. (2015) proposed a multi-step relational path reasoning method combining RNN + PRA, using the RNN to take the vector of the binary relationship contained in the path in the knowledge graph as input to capture the implicit semantics in the path.The method is not only capable of inferring relational paths never seen during training, but also predicts new relations not seen during training with a single high-capacity RNN.Zhao et al. (2021) proposed a hierarchical relational attention-oriented reasoning mode to aggregate the information of multi-hop neighbors and obtain a nodeembedding representation.The model relieved over-smoothing to a certain extent.The rule features used in PRA-based multi-step relational path reasoning are mainly transitive rules.Although filtering mechanisms can be used to screen these rules when digging them, rules with low information content and little effect will inevitably be introduced.Algorithms that adaptively mine data sets are difficult to guarantee the reliability and interpretability of rules.
The multi-step relation path reasoning based on low-dimensional vector space is represented by vectorised knowledge graph.Different from knowledge representation learning, the knowledge reasoning method introduces the information of multi-step relation path in the process of building the model.Based on the TransE model by Guu et al. (2015), regarding the low-dimensional vector space model as the traversal operation of the relationship, we directly model the intermediate steps of the path, and regularise the distribution of the entity vector space.On the basis of TransE, Lin et al. (2015) added the constraints of relational paths, proposed the PTransE model, and modeled path information through the combined operation of relations.Later, Lin et al. (2016) proposed a combined learning model RPE based on relational path representation.Unlike the PTransE model, RPE projects each entity to the corresponding relational space and path space through the relational mapping matrix and path mapping matrix at the same time.Relationship or path vectors are viewed as translations between projected entity vector representations.For the first time, Feng et al. (2016) utilises more information in knowledge graphs to learn vector representations of entities and relationships based on GAKE, a graph-aware knowledge reasoning model.GAKE introduces three types of information, neighbor context, path context and edge context, to reflect knowledge characteristics from different perspectives, and at the same time, an attention mechanism is designed to learn entities or relations with important contributions.Singh and Lakshmanan (2021) presented a popularity, interests, location used hidden Naive Bayesian-based model for link prediction in dynamic social networks by considering behavioral controlling elements like relationship network structure, nodes' attributes, location-based information of nodes, nodes' popularity, users' interests, and learning the evolution pattern of these factors in the networks.
In the knowledge reasoning methods of KG, low-dimension representations from representation learning-based NN models can not only contain semantic relation information entity relations but also can model the multi-step path information that can gain significant improvement in computational efficiency and better sparsity in the data processing.However, existing knowledge reasoning models may lose some entity or relational importance during the unified processing of entity relationships in the multi-step relation path.Meanwhile, for the same pair of entities, there can be multiple multi-step paths and these paths may contain arbitrary information.Therefore, we propose a knowledge reasoning model using Neighborhood Semantic Path Network (NSPN).This method models the multi-step paths and assigns different weights to different entities and relations in every path.The low dimensional representation obtained using our method can eventually complete the KG effectively.

Neighborhood semantic path network
Neighborhood Semantic Path Network (NSPN) is a knowledge reasoning model based on Bi-directional Long Short-Term Memory (Bi-LSTM) and Attention Mechanism.It is mainly divided into three layers.The first layer is the embedding layer, which trains the entity and relationship vectors containing the semantic information of the neighborhood.The second layer is the Bi-LSTM layer, which models the semantic connections before and after the multi-step relationship path.The third layer is the attention mechanism layer which distinguishes the different contributions of different entities and relationships that appear in a path, and it is used to weight the relative importance of different paths to the reasoning task when multiple paths appear.

Multi-hop relational paths
The direct or indirect relationship attributes between entities and entities are clearly described in the triple data of the knowledge graph.These relationship attributes constitute one or more paths between two entities, and using these relation paths information can help knowledge reasoning achieve better results.In the knowledge graph, this paper defines a multi-step path as a sequence of entity relationships, i.e. p = [e 1 , r 1 , e 2 , r 2 , ... , r l−1 , e l ], where (e n , r n , e n + 1 ) is the nth triple in the multi-step relation path p, and l represents the number of triples in the relation path p.
In the actual knowledge graph, due to the rich entity relationship data, in the process of knowledge reasoning, there may be multiple multi-step relation paths between two entities.As shown in Figure 1, when inferring the relation r in a possible triple (Xiao Ming, r, Landscape), the following multi-step relation paths may exist: These paths start from the same character category entity "Xiao Ming" and arrive at the same song category entity "Landscape", representing a variety of different path information, and implying the different interpretations relationship r in a variety of different semantic combination pairs (Xiao Ming, r, Landscape).For example, path p 1 and p 3 can infer that Xiao Ming may prefer the songs of singer Zhou Shen and the songs of the album Shen of Shen; path p 2 reflects a collaborative filtering effect, people with similar behaviors tend to have similar hobbies.Therefore, in the process of knowledge reasoning, we should not only pay attention to the roles of different entities and relationships in the path, but also use all multistep relation paths between two entities to learn path representations.In this way, for each existing relation path, a path vector p can be finally obtained.Thus, when all multi-step relation paths are given different weights, they are combined to express a mixed path vector v p to reason possible relationships between two entities.
For a given entity h and a target entity t, there is a multi-step relation path set P(h, t) = {p 1 , p 2 , ... , p K } between them, including multi-step relation paths.Referring to the form of triplets, this paper defines them as path triplets (h, P, t).The task of comprehensively utilising these multi-step relational paths to obtain the mixed path vector v p can be summarised as Eq. ( 1): where, f represents a latent model with parameter , and v p represents a possible mixed path vector representation between (h, t) predicted by the path set P (h, t) and the model f .

Embedding layer and Bi-LSTM layer
To model multi-step relational paths in knowledge graphs, a Recurrent Neural Network RNN (Medsker & Jain, 2001), can be used.RNN is the expansion of a sequence in time, because its network structure can handle the correlation between time series data.The model structure is shown in Figure 2, where x represents the input layer, H represents the hidden layer, O represents the output layer, and there is a loop operation on the hidden layer H, U, V, and W are linear relationship parameters, RNN passes the weight matrix W to realise the dependencies between hidden layers.With the research and application of RNN, some issues have been found in many practical problems that RNN has shortcomings such as gradient disappearance, gradient explosion and poor ability to model long-range dependency information.Therefore, the long short-term memory network LSTM is introduced.LSTMs are able to memorise long-range dependencies in data sequences, thereby modeling long-term dependencies.In knowledge reasoning techniques, such long-term sequential patterns are crucial for predictions between entities and relations in knowledge graphs using multi-step relation paths.
LSTM is an improvement of RNN.It is similar to RNN in the main structure.The main difference is that three gate-control structures are added to the hidden layer H, namely the input gate, output gate and forget gate, and a cell state is also added.state.The structure principle of LSTM hidden layer is shown in Figure 3. c n−1 and H n−1 represent the cell state and hidden layer state at time n-1, respectively.x n represents the input at time n, and H n represents the hidden layer state, c n represents the cell state, a n represents the temporary cell state of the feature extraction process, f n represents the forget gate, i n represents the memory gate, and o n represents the output gate.The calculation process of LSTM can be summarised as: by forgetting the information in the cell state and memorising new information, the useful information for the next moment can be passed on, while the useless information is discarded, and the hidden layers give outputs at each time point.
The calculation steps are as follows:  In order to model a multi-step relational path p = [e 1 , r 1 , e 2 , r 2 , ... , r l−1 , e l ], and obtain its vector representation.The neighborhood semantic path network NSPN proposed in this paper concatenate current vectors of entity e n and relation r n as input vectors, as shown in Eq. ( 8).
where, ⊕ represents the connection operation, and e n and r n are obtained by training the Transe-NSE model proposed by Chapter 3 of this paper.For the last entity e l in path p, fill the end of the path with an empty relation.For the nth triplet (e n , r n , e n + 1 ) in the path p, use the hidden layer state H n of the LSTM network at time n to correspond to it, the combined vector representation of e n , r n in relation path p = [e 1 , r 1 , e 2 , r 2 , ... , r l−1 , e l ] is the state vector H n output by the hidden layer of the LSTM network at time n.Therefore, the input vector not only contains order information, but also contains neighborhood semantic information and relationship information between two entities.Using the feature that LSTM can better model long-term semantic relationship dependencies, LSTM can obtain the vector representation of entities and relationships in path p.However, for the path p 2 in Figure 1, the same entity "big fish" contains the relationship in two different directions, and an LSTM network can only model information in one direction of a path at the same time, so for this case, NSPN uses a bidirectional long short-term memory network Bi-LSTM to construct multi-step relational path vectors.The structure of the Bi-LSTM network (Plank et al., 2016) is divided into two independent LSTMs.The input sequence is input to the two LSTM networks in positive and reverse order respectively for feature extraction, and the extracted feature vectors, namely the output vectors, are spliced to form a final state vector.The design concept of the Bi-LSTM network is to make the final feature vector obtained at a certain moment have both the information of the past and the future.It just corresponds to the two directions in the multi-step relation path in the knowledge graph, which are the relationship and the inverse relationship.The overall structure of the Bi-LSTM network is shown in Figure 4.

Attention layer
After the Bi-LSTM layer, a combined vector of entity relationships in a multi-step relation path can be obtained.This combined vector is the hidden layer state vector output by the Bi-LSTM network at the corresponding moment.This vector contains bidirectional long-term dependency information in the path.However, when using path information for knowledge reasoning, different entities and relations may make different contributions in a multi-step relation path.Therefore, the knowledge reasoning model NSPN proposed in this paper introduces an attention mechanism on top of the Bi-LSTM layer.
The earliest application of attention mechanism (Vaswani et al., 2017a) in natural language processing was the machine translation tasks (Bahdanau et al., 2014).When using the attention mechanism to calculate the important content of a set of inputs Y = [y 1 , y 2 , ... y m ], a query vector q related to the task to be solved is first required, and then the correlation between the query vector q and each input y i is calculated through a scoring function s (y, q) to obtain a corresponding score.Then the softmax function is used to normalise these values.The normalised result is the attention of the query vector q on each input y i Force distribution a = [a 1 , a 2 , ... , a m ]; Finally, according to the obtained attention distribution, useful information can be selectively extracted from the input to obtain the vector context.The common extraction method is to weight and sum the input vectors according to the attention distribution.In application, the scoring function s (y, q) has various forms: • Bilinear Model: s(y, q) = y T Wq The parameters W, U, and v are all learnable parameter matrices or vectors, and D is the dimension of the input vector.The additive model introduces learnable parameters, and maps the query vector q and the original vector y to different vector spaces before scoring.
Compared with the additive model, the dot product model has better computational efficiency, but when the dimension of the input vector is relatively high, the dot product model will have a large variance, resulting in a small gradient of the softmax function.In order to improve this problem, the scaling dot model smoothes the score value by dividing square root of one input vector dimension, equivalent to smoothing the final attention distribution.The bilinear model can be reconstructed as s(y, q) = y T Wq = y T (U T V)q = (Uy) T Vq.
It calculates the dot product after the linear transformation for the query vector q and the original input vector y.Compared with the dot product model, the bilinear model introduces asymmetry in calculating the attention weights.
For a multi-step relation path p = [e 1 , r 1 , e 2 , r 2 , ... , r l−1 , e l ], the entity relationship combination vector is obtained in the Bi-LSTM layer, that is, each state vector H = [H 1 , H 2 , . . ., H l ] output by the hidden layer is obtained.The third layer in the NSPN model is the attention layer, which takes H = [H 1 , H 2 , . . ., H l ] as the input of the attention layer, and e l as the query vector q.Output a = [a 1 , a 2 , . . ., a l ] is calculated by the scoring function s (H, q) and the softmax function.Finally, it gets the path vector p of the fusion attention mechanism, that is, the context.The specific process is shown in the model frame diagram, and the calculation process is shown in Eq. ( 9) and Eq. ( 10).
The scoring function s (H, q) selects the bilinear model.The bidirectional semantic association information in a path is constructed through the embedding layer and the Bi-LSTM layer, and then the weighted relation path vector p is obtained by the attention layer.In the knowledge graph, there may be multiple relation paths between the entities that want to perform knowledge reasoning and the target entity.Figure 1 indicates that the content reflected by multiple relation paths is different.Therefore, it is necessary to comprehensively consider the effect between multiple relation paths in knowledge reasoning.For a set P(h, t) = {p 1 , p 2 , . . ., p K } containing K multi-step relation paths, define the corresponding set of relation path vectors as P(h, t) = {p 1 , p 2 , . . ., p K }, where the vector representations are obtained from the attention layer.Then, the final representation of the mixed path vector v p can be the average of all relational path vectors, the formula is as follows: Although the mixed path vector can be obtained through the above operations, previous studies (Sun et al., 2018;Yu et al., 2014) show that when using relational paths comprehensively for knowledge reasoning, the contribution of different multi-step relation paths to the final prediction results is different.However, Eq. ( 11) fails to show the different importance of each path.Referring to the work of Das et al. (2016) and Chen et al. (2017), the NSPN model performs one more attention weighting operation on multiple multi-step relational path vectors after the attention layer.Take P = {p 1 , p 2 , . . ., p K } as input, e l as query vector q, calculate b = [b 1 , b 2 , . . ., b K ] through scoring function s (p, q) and softmax function, the calculation process is shown in Eq. ( 12).
The scoring function s (p, q) selects the bilinear model.Finally, the weighted summation of each path vector is performed to obtain the mixed path vector v p with the fusion of attention mechanism.The formula is as follows: In practice, when it is necessary to make full use of all relational paths in the knowledge graph for knowledge reasoning, the workload is too large and not feasible.In particular, as the number of relational paths increases, the total length of relational paths grows exponentially, even generating millions of connections.Therefore, the NSPN model introduces a path number threshold θ.If there are multiple multi-step relation paths between two entities, the NSPN model only models θ paths at most, i.e.K ≤ θ.

Model training
Referring to the score function of the TransE model based on the displacement distance, the idea of the knowledge reasoning model NSPN proposed in this paper is: in the knowledge graph, there should be similarity between the head-tail entities in the multi-step relation path and the mixed path composed of multiple relation paths, reflected in the embedding space, that is, the sum of the head entity vector and the mixed path vector in the multistep relational path should be equal to the tail entity vector.Therefore, a function f p (h, t) to measure the similarity of inference can be constructed, and the formula is as follows: where, h and t are the vector representations of the first entity and the last entity in the multi-step relation path, which are both initialised by the vector training obtained by the knowledge representation learning model TransE-NSE proposed in Chapter 3; v p is the mixed path vector.In order to be able to obtain the optimal parameters of the model and the vector representations of entities and relations, we train the model by minimising the similarity function f p (h, t).
Referring to the TransE model, this paper uses the following margin-based loss function L to train the NSPN model: where, G p represents the set of all path triples that exist in the knowledge graph; G p represents the set of all path triples that do not exist in the knowledge graph, and randomly replaces the real path triples (h, P, t) in the entity or relationship generates (h, P , t); γ > 0 is a margin hyperparameter; [x] + = max{0, x}.During the training process, the NSPN model uses the SGD algorithm and the RMSProp adaptive learning rate to minimise the loss function L , thereby learning the parameters and the entity relationship vector in the model.

Model description
More and more research work is devoted to the construction of large knowledge graphs, such as Freebase, DBPedia, NELL, YAGO, and China's OpenKG.CN.These knowledge graphs contain rich data information, but they are still incomplete and need to be supplemented by knowledge graphs.Knowledge reasoning is an important knowledge graph completion technology.In previous knowledge reasoning research, many studies have paid attention to the importance of multi-step relation paths.Multi-step relationship is an entity transitive constraint.Using multi-step relation paths for knowledge reasoning can model indirect relationships between entities, integrate more information into the model.Through training and learning, it is ultimately included in the entity relationship vector.However, there are still some problems that need to be improved: While modeling a multi-step relation  bidirectional long-term dependency information in the path through the vector input by the embedding layer, and uses the state vector of each hidden layer as the output.The third layer is the attention layer, which uses the query vector and the scoring function to obtain the attention distribution through the output vector of the Bi-LSTM network, and then performs weighted summation to obtain the path vector of the fusion attention mechanism; finally, weights the multiple relational path vectors again to obtain hybrid path vector representation.

Experiments and analysis
In order to verify the performance of the NSPN model, this section conducts comparative experiments on the link prediction task and relation prediction task in knowledge reasoning through the FB15 K and FB40 K datasets and analyzes the results.

Dataset description
The FB15 K and FB40 K datasets are extracted from Freebase, a typical large-scale knowledge graph, and were first used by the PTransE model (Sun et al., 2011) to verify the effect of knowledge reasoning.There are many relationship types in the two datasets, which are suitable for extracting multi-step relation paths between two entities for modeling.To verify the generalisation and practicability of the model, the practical data set Konect/Online social network/Google + (NIPS) (http://konect.cc/networks/ego-gplus/),small-scale dataset CN3lFamily (Chen et al., 2016) and Kinships (Kemp et al., 2006) are selected.Google + (NIPS) is a directed network with Google + user-user links, in which a node represents a user, and a directed edge denotes that one user has the other user in his circles.Since the dataset has a huge amount of data, we selected some of them for the experiments.The specific information of the datasets is shown in Table 1.

Experiment tasks and evaluation indicators
The experiments are mainly divided into two tasks: relationship prediction task and link prediction task.Link prediction task (Bordes et al., 2013) refers to predicting the tail entity or head entity for a triplet given the relationship and head entity or the relationship and tail entity.The relationship prediction task was first introduced into the knowledge reasoning evaluation task by Lin et al. (2015) to predict the relationship between the given head entity and tail entity.There are raw and filtered cases.The main evaluation indicators are MAP (Mean Average Precision) and MRR (Mean Reciprocal Range).The calculation details of each evaluation indicator are as follows: (1) MAP refers to the mean value of the Average Precision of the triplet in the test set.The formula is shown in formula ( 16).
where |S| represents the number of triples in the test set, and Average Precision is defined as: where N q represents the number of candidate results related to prediction q, and P(k) represents the precision up to k in the sorted sequence of candidate results, r(k) represents the change of recall between two adjacent candidate results k and k + 1.
(2) MRR refers to the mean value of the reciprocal of the predicted results in the test set triplet.That is, if the first result is the correct answer in the ranking of candidate results sorted by the score function value, the score is 1; if the second result is the answer, the score is 0.5, and so on.The definition is shown in formula (18).
where, |S| represents the number of triples in the test set, and rank q represents the rank of correct answers in a triple prediction result sequence.

Path selection and parameter setting
During the experiment, if there are multiple multi-step relation paths between two entities, the model in this paper selects the path number threshold θ = 5 during training for the experiment.According to previous research (Wang et al., 2018;Gardner et al., 2014b), truncating the selected path with a certain length and ignoring the relationship that is too far away enough to model the relation path information.According to the paper (Wang et al., 2018), the path will introduce noise entities when the length is greater than 6.Thus, this paper selects all suitable relation paths in the knowledge graph, and the length of each path is up to 6.For relation prediction task and link prediction task, on both datasets, the score function of l 1 or l 2 norm is used on both datasets.The initial RMSProp learning rateη ∈ {0.001, 0.005, 0.01, 0.1} is set, the vector space dimension k ∈ {50, 100} is set, and the margin hyperparameter γ ∈ {1, 2, 5, 10} is set.To find the optimal parameters of the model, a grid search algorithm is used to select the optimal parameters on the NSPN model by monitoring the MRR and MAP after each training round on the validation set.In the link prediction task, the optimal parameters of the NSPN model are configured as a score function with l 2 norm, η = 0.001, γ = 1, k = 100.In the relation prediction task, the optimal parameter configuration of the NSPN model is a score function with l 2 norm, η = 0.005, γ = 2, k = 100.To verify the effectiveness of the NSPN model, this paper selects the most classic path selection algorithm PRA (Path Ranking Algorithm), the RNN + PRA algorithm improved by adding the RNN neural network, and the knowledge reasoning model PTransE fused with path information modeling as the comparison algorithms.We also adopted non-neural network models and advanced knowledge reasoning models TACT (Chen et al., 2021), R-GCN (Schlichtkrull et al., 2018), andTransformer (Vaswani et al., 2017b).The implementation of R-GCN and Transformer utilises the contributions in PyKEEN (Ali et al., 2021).Comparative experiments were performed on the two datasets.

Relationship prediction task
The experimental results of the four methods on the relationship prediction task on the FB15 K and FB40 K datasets are shown in Table 2.The bold font indicates that the index result is the best, and the underlined result is the second best.It can be seen that the experimental data of the NSPN model proposed in this paper is the best in all evaluation indicators.In particular, in the "Filtered" case, for the two experimental datasets, the NSPN model outperforms the PTransE model by 0.054, 6.26% and 0.055, 3.75% in MRR and MAP, respectively.It can be seen that modeling on entities and relationships in multi-step relation paths through Bi-LSTM neural network and attention mechanism, and comprehensively considering the mixed representation of multiple paths can improve the accuracy of relationship prediction in knowledge reasoning.

Link prediction task
The experimental results of the four methods for link prediction tasks on FB15 K and FB40 K data sets are shown in Table 3.It can be seen that, similar to the results of the relationship prediction task, the experimental data of the NSPN model proposed in the paper is the best for all evaluation indicators.In the Filtered case, for the two experimental data sets, NSPN model is 0.031, 8.7% higher than PTransE model in MRR and MAP, and 0.048, 7.32% higher than PTransE model.Especially on the FB15 K dataset, the MAP value of NSPN is 8.7% higher than that of PTransE.Analyzing the details of the dataset and the information in the training process, it is found that coexistence of multiple relation paths is frequent in the FB15 K. Therefore, considering the mixed representation of multiple paths in the knowledge graph can improve the effect of link prediction significantly.
To compare the application performance of the NSPN model and other advanced models, we selected four datasets, FB15 K, CN3lFamily, Kinships and Google + (NIPS) for the experiment of link prediction in the Filtered case.In Table 4, the experimental results are shown that the NSPN model has a more advanced performance on the dataset FB15 K, which is 0.027 and 1.17% higher than the second model in MRR and MAP respectively.In comparison with small-scale datasets, it is found that MAP of NSPN model on CN3lFamily is the best result of all models, while most models on Kinships do not perform well and the best one is the non-neural network model TACT.We analyze the possible reasons: the number of entity relationships in Kinships datasets is small, and the path information that neural networks can learn is limited, which reduces the performance of the model.In the experimental results of Google + (NIPS), the practical dataset of online social networks, it is found that the MRR and MAP of the NSPN model are both optimal, and the MAP is 2.62% higher than that of the Transformer model.This shows that the NSPN model can make full use of multiple path information and attention mechanisms for knowledge reasoning, can predict the relationship between most users, and it has a good application in the actual scenario of online social networks.

Ablation experiment
To analyze whether the attention layer and the attention mechanism of the mixed representation of multiple paths in the NSPN model proposed in this paper are effective, this section selects the FB15 K data set to conduct ablation experiments on the relationship prediction and link prediction tasks of the NSPN model, and analyze the experimental results.analyze.During this experiment, the attention layer in the NSPN model was removed, and the outputs of the Bi-LSTM layer were added and averaged to represent a multi-step relation path, and the mixed representation of multiple paths was not considered.This model configuration Denoted as NoAttentionNoMixture.For the NSPN model, the mixed representation of multiple relational paths is not considered, and the configuration of other models remains unchanged, denoted as NoMixture.
Tables 5 and 6 describe the experimental results of the three model configurations on the relation prediction task and the link prediction task, respectively, on the FB15 K dataset.It can be seen that compared with NoAttentionNoMixture, the results of using NoMixture configuration are 0.132 and 0.178 higher in MRR, 12.51% and 15.61% higher in MAP in relation prediction task.Besides, 0.118 higher in MRR in link prediction task and 0.127, 12.18% and 10.71% higher on MAP.This result shows that in knowledge reasoning technology, it is necessary to distinguish different contributions of different entities and relationships in a multi-step relation path, and the attention mechanism can better establish the different contributions of entities and relationships in the path.
From Tables 5 and 6, it can be found that the fully configured NSPN model is 0.108 and 0.076 higher on MRR and 6.71% and 6.51% higher on MAP in relation prediction task compared to the results with NoMixture configuration.In the link prediction task, the MRR is 0.035 and 0.032 higher, and the MAP is 4.01% and 6.08% higher.This result indicates that considering multiple relation paths in the knowledge graph comprehensively, the mixed representation of relation paths is helpful for knowledge reasoning.

Time performance analysis
To compare the time complexity of the model intuitively, we adopted the experimental running time of the model as an indicator.As shown in Figure 6, the experimental running time of models PTransE, Transformer, R-GCN, and NSPN are compared in the filtered case of link prediction task on the FB15 K dataset.It can be seen that when the vector dimensions of entities and relationships increase, the running time of all models increases in a positive direction.When k = 50, PTransE model takes the least time, and NSPN model takes the second place, 84 minutes more than PTransE model.When k = 100, the running time of NSPN model is the least, and the MRR and MAP of the model are better than all other models.

Conclusion
In the knowledge reasoning method based on multi-step relation paths, it is impossible to distinguish the different contributions of all entities and relationships contained in a relation path, and multiple relation paths are not comprehensively considered.This paper proposes a neural network based on Bi-LSTM and Attention Mechanism of Knowledge Reasoning Model NSPN.The model is divided into three layers: the first layer is the embedding layer, which is initialised with the entity and relationship vectors trained by the TransE-NSE model, and the vector splicing of the entity-relationship pairs contained in a path is used as the input of the second layer.It contains the neighborhood semantic information of the entities in the knowledge graph; the second layer is the Bi-LSTM layer, which takes the bidirectional long short-term memory network as the main body and the embedding layer as the input to capture the relationship between entities in a multi-step relation path.Semantic association and their long-term dependency information, the output of each hidden layer state is used as the final output; the third layer is the attention layer, which weights the output of the second layer through the attention mechanism to obtain the attention distribution of a relation path, and then perform weighted summation to obtain a representation of a multi-step relation path; finally, go through an attention mechanism to obtain a mixed representation of multiple relation paths.Comparative experiments were carried out on two public datasets, FB15 K and FB40 K, and the results showed that compared with PTransE, PRA, and RNN + PRA, the performance of the NSPN model was better than those of the three comparison models in relation prediction tasks and link prediction tasks.Distinguishing the different contributions of all entities and relations contained in a relation path and comprehensively considering multiple relation paths are helpful for knowledge reasoning.The knowledge representation learning and knowledge reasoning methods in this paper provide a new idea for the fusion of various information in knowledge reasoning.Besides, through comparative experiments on public data sets, it is verified that the proposed model in this paper have good performance, effectively alleviate the problem in knowledge reasoning based on knowledge graph.
In view of the current research status of knowledge reasoning technology, further work can be done in the following aspects in the future.The knowledge reasoning model proposed in this paper has a certain ability of reasoning and prediction, and can be applied to recommendation tasks.In the future, the model in this paper can be used to achieve recommendation tasks in different specific application scenarios.

Figure 6 .
Figure 6.Time Performance Comparison on FB15 K.
where, W f , W i , W a and W o represent the forget gate, input gate, temporary cell state, and weight coefficient matrix of output gate in H n−1 , respectively.U f , U i , U a and U o represent forget gate, input gate, temporary cell state, and weight coefficient matrix of output gate in x n , respectively.b f , b i , b a , and b o represent the forget gate, input gate, temporary cell state and bias values of output gate, respectively.Tanh represents tangent hyperbolic function, σ represents sigmoid activation function, denotes the Hadamard product, the elementwise product of two vectors.

Table 2 .
Experiment comparison of relation prediction tasks on FB15 K and FB40 K.

Table 3 .
Experimental results of linking prediction tasks on FB15 K and FB40 K.

Table 4 .
Experiment results on 4 datasets in the filtered case.

Table 5 .
Experimental results on relation prediction tasks with different model configurations on FB15 K.

Table 6 .
Experimental results on link prediction tasks with different model configurations on FB15 K.