MSEN-GRP: A Geographic Relations Prediction Model Based on Multi-Layer Similarity Enhanced Networks for Geographic Relations Completion

Huang, Zongcai; Qiu, Peiyuan; Yu, Li; Lu, Feng

doi:10.3390/ijgi11090493

by Zongcai Huang ¹,², Peiyuan Qiu ³, Li Yu ⁴ and Feng Lu ¹,²,⁵,*

¹ State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

² University of Chinese Academy of Sciences, Beijing 100049, China

³ School of Surveying and Geo-Informatics, Shandong Jianzhu University, Jinan 250101, China

⁴ National Academy of Safety and Development, Beijing Institute of Technology, Beijing 100081, China

⁵ Fujian Collaborative Innovation Center for Big Data Applications in Governments, Fuzhou 350003, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2022, 11(9), 493; https://doi.org/10.3390/ijgi11090493

Submission received: 27 July 2022 / Revised: 3 September 2022 / Accepted: 15 September 2022 / Published: 17 September 2022

Abstract

Geographic relation completion contributes greatly to improving the quality of large-scale geographic knowledge graphs (GeoKGs). However, the internal features of a GeoKG used in large-scale GeoKG embedding are often limited by the weak connectivity between geographic entities (geo-entities). Moreover, if the method of external semantic enhancement is not chosen properly, it often interferes with the representation learning of the KG. Therefore, we propose a geographic relation (geo-relation) prediction model based on multi-layer similarity enhanced networks for geo-relation completion (MSEN-GRP). MSEN-GRP comprises three parts: an enhancer, an encoder, and a decoder. The enhancer constructs semantic, spatial, structural, and attribute-similarity networks for geo-entities, which explicitly and effectively enhance the implicit semantic associations between existing geo-entities. The encoder obtains the long-path relation dependency characteristics of geo-entities using a mixed-path sampling strategy and supports different optimization schemes for external semantic enhancement. Geo-relation prediction experiments show that the mean reciprocal rank of this method is significantly higher than those of the traditional TransE and DisMult methods, and Hits@10 is improved by up to 57.57%. Furthermore, the spatial-similarity network has the most significant enhancement effect on geo-relation prediction. The proposed method provides a new way to perform relation completion in sparse GeoKGs.

Keywords:

geographic knowledge graph; relation completion; representation learning; similarity network; relation prediction

1. Introduction

A knowledge graph (KG) contains rich descriptions of concepts, objects/places, events, and their relations in the physical world in the form of symbols; an example is the triplet <Ionian Sea, outflow, Mediterranean Sea> shown in Figure 1. A geographic knowledge graph (GeoKG) provides a convenient tool in the field of geography to describe geographic knowledge, depict the relations between objects, and express geographic information. Among the components of a GeoKG, a geographic entity (geo-entity from hereon) represents an object with location connotation in the real world. A geographic relation (geo-relation from hereon) refers to the relation between geo-entities and related entities, mainly including their spatial relations (e.g., spatial adjacency, separation, orientation, and inclusion) and semantic relations (e.g., type, component) [1]. The vast amount of ubiquitous geographic semantic information present on the Internet promotes GeoKG-related research, including geographic information extraction [2], geographic information fusion [3], GeoKG construction [4,5], and GeoKG reasoning [6]. High-quality entities and relations are important prerequisites for intelligent computation and reasoning based on a KG, such as recalibration convolutional networks for learning interaction KG embedding, learning KG embedding with heterogeneous relation attention networks, and multi-scale dynamic convolutional networks for KG embedding. However, owing to the ambiguous semantics, varied styles, and incomplete structures of natural language expressions, the existing geography-related knowledge bases (e.g., DBpedia [7], Ownthink, OSM [8], and Geonames [9]) not only contain massive numbers of geo-entities but also have sparse entity relations and generally suffer from missing data. This results in an incomplete relation structure, inaccurate information, or poor timeliness, which seriously affects the intelligent reasoning and computation of the resultant GeoKG [10]. Hence, geo-relation completion has become the primary challenge of GeoKG research.

The completion of geo-relations is realized by predicting the relations in a GeoKG, for example, obtaining the value of “r” in the triplet <Ionian Sea, r, Mediterranean Sea> through relation prediction modeling, or predicting the value of “h” in <h, outflow, Mediterranean Sea> or the value of “t” in <Ionian Sea, outflow, t>. Existing relation prediction methods include inference methods based on deductive reasoning, inductive reasoning, and representation learning. Methods based on deductive reasoning [11] require clearly defined prior information, such as defining axiomatic rules of (province, contains, city) and (country, contains, province) then (province, contains, city), to complete the “contains” relation between the province and city category entities in a KG. This type of method has high accuracy, but also a high labor cost. Methods based on inductive reasoning [12] can effectively mine axiomatic rules with high confidence from large-scale KGs, which reduces the cost of manually defining rules to a certain extent and also eliminates potential subjective errors made by the rule makers. However, these two types of methods remain difficult to apply to open GeoKGs that have a large scale of data and many types of relations. The reasoning method based on representation learning maps the entities and relations in a KG to a vector space and transforms the geo-relation prediction problem into direct calculation between vectors, which makes it possible to complete more relations in a large-scale KG.

The reasoning method based on representation learning has become a research hotspot in intelligent computing and relation completion for large-scale KGs. It mainly comprises the distance-based method, the semantic matching-based method, and the neural network method. The distance-based method (e.g., TransE [13], TransR [14], TransD [15]) maps geo-entities and geo-relations into a low-dimensional vector space and treats relations as translation operations from head entities to tail entities in that space, so that h + r ≈ t. The semantic matching-based method (e.g., RESCAL [16], DisMult [17]) uses vectors to represent entities and matrices to represent their relations. The internal interaction of triplets is captured by a self-defined scoring function, as shown in Equation (12). The neural network method (e.g., ProjE [18], GNN [19], R-GCN [20]) learns the vectorial expression of new entities by using related entities and relations with the help of neural network models. All of the above methods aim to mine the features of geo-entities and relations from an existing GeoKG. However, the relations are often so sparse that the sample data cannot provide sufficient and comprehensive characteristics of geo-entities and relations in large-scale GeoKGs.
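As a purely illustrative sketch of the translation idea h + r ≈ t used by distance-based methods, the snippet below scores a triplet by the negative Euclidean distance between h + r and t. The function name `transe_score` and the 2-dimensional vectors are invented for illustration; they are not taken from the paper or from any trained model.

```python
import math

def transe_score(h, r, t):
    """Negative Euclidean distance between h + r and t (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 2-d embeddings, invented for illustration only.
ionian_sea = [1.0, 0.0]
outflow = [0.0, 1.0]
mediterranean_sea = [1.0, 1.0]
unrelated_entity = [3.0, 3.0]

plausible = transe_score(ionian_sea, outflow, mediterranean_sea)
implausible = transe_score(ionian_sea, outflow, unrelated_entity)
```

With these toy vectors, the triplet whose tail lies exactly at h + r receives the higher score.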

Therefore, an increasing number of researchers are finding that the aforementioned methods of mining internal structural features cannot cope with incomplete structural information in a sparse KG. Using external information to enhance the semantics of entities or relations, or to intervene in learning, may improve the vectorial expression of entities and relations. TKRL [21] states that different types of entities should have different vectorial expressions, so the type information of entities is added as a constraint, the hierarchical type is taken as the projection matrix of entities, and the distance-based method is used to model different types of entities. TransEA [22] states that entity attributes help to optimize the vectorial expression of entities. On top of a distance model for the triplet structure in the body of data, a linear regression model is used to model the quantitative attributes of entities. DKRL [23] states that similar entities have similar descriptions, and uses a convolutional network to encode the description information and integrate it into a distance-based model. All of the above methods, without changing the structure of the sample data, optimize the quantitative expression of entities and their relations by adding external semantic information. To a certain extent, this overcomes the sparsity of entities and relations in the sample dataset.

However, those methods have the following problems:

(1) Although TransE, DisMult, ProjE, and other methods can learn the features of entities, relation transformations, and graph structure, the geo-entities and geo-relations in a GeoKG dataset often have an evidently unbalanced distribution. As a result, the model cannot obtain sufficient relevant features for entities that have no explicit relational connection during the learning process, even though many entities actually have implicit relations with each other.
(2) Methods that add external information (entity types, entity attributes, textual descriptions of entities) can only improve the embedding of geo-entities in theory. Without selecting valid information, it is unclear which external information makes the embedding better or worse, and such methods still increase the complexity and reduce the efficiency of learning.

To address the above problems, a geo-relation prediction method based on multi-layer similarity enhanced networks (MSEN-GRP) is proposed, which incorporates an enhancer, encoder, and decoder to alleviate the difficulties caused by sparse relation connections between entities in the GeoKG corpus. This method has the following advantages: (1) By calculating the multi-level similarity of geo-entities, the enhancer can explicitly preserve the multi-level similarity relations between geo-entities to a certain extent, which enhances the connection of entities in the training dataset and also takes into account the enhancement of effective semantics; (2) In the encoder, different sampling ratios can be used in the vectorization learning of geo-entities to eliminate the interference introduced by the added semantics, strengthen the useful enhanced semantics, and realize the embedding of the optimally enhanced semantics. In addition, the path feature collection method can better obtain the long-path dependency features between entities in a KG; (3) The model adopts a limited, explicit semantic enhancement method, which not only makes explicit and increases the multi-level semantic relations among geo-entities but also improves the effectiveness of the model in relation prediction.

The remainder of this paper is organized as follows: Section 2 presents details of the proposed methodology. In Section 2.1, we describe the construction of multi-layer similarity networks of geo-entities to achieve the semantic enhancement of a KG. We describe the process of realizing the vectorization expression of geo-entities in Section 2.2, and we introduce the prediction of geo-relations based on the vectorization of geo-entities in Section 2.3. In Section 3, we report comparative experiments that prove the effectiveness of the method. We discuss the usefulness of the method and its limitations in Section 4 and summarize our work in Section 5.

2. Research Methodology

The long tail distribution and sparsity of entities and relations in large-scale GeoKG have seriously affected model training. We suppose that adding the potential relations explicitly between geo-entities into the KG with explanatory external reference information can not only enhance the connectivity of sparse geographical entities but also correct the vector bias caused by a lack of data in the model, to a certain extent. Geo-entities with very similar attributes, word meanings, space, and structure are more likely to have similar vector expressions.

Hence, the MSEN-GRP method (shown in Figure 2) is divided into three main parts: (1) Enhancer: geo-entity similarity networks. According to the characteristics of geo-entities, a lexical-similarity network, spatial-similarity network, structural-similarity network, and attribute-similarity network are constructed to enhance the potential relations among geo-entities; (2) Encoder: geo-entity path hybrid embedding. Paths are generated by random walks over the different network layers, and hybrid path sampling and the Word2vec [24] method are used to pretrain entities on the hybrid paths; (3) Decoder: geo-relation prediction. By combining the pre-trained head and tail entity vectors, geo-relation prediction in the GeoKG is achieved with the help of the DisMult model.

2.1. Enhancer: Geo-Entity Similarity Network Construction

To alleviate the influence of sparse entities and relations in a large-scale GeoKG on a geo-relation prediction model, we convert the implicit relations between geo-entities into explicit connected relation expressions by constructing a lexical-similarity network, spatial-similarity network, structural-similarity network, and attribute-similarity network to attach external information. This enriches the connectivity between geo-entities in sparse GeoKGs and enables the model to obtain a more effective vectorized representation of geo-entities and their relations. The construction of each network is described in Section 2.1.1, Section 2.1.2, Section 2.1.3, and Section 2.1.4 below.

2.1.1. Lexical-Similarity Network

The lexical similarity of geo-entities reflects the existence of some semantic association between them; for example, the geo-entities “Godavari River” and “Allegheny River”. The diversity of natural language expressions often makes it difficult to calculate the similarity of entity words. To address this problem, the “word_sim” function in Figure 2, which calculates the similarity of word meanings of geo-entities (Figure 3), works in two ways: (1) When the names of geo-entities contain some of the same words, the Jaccard method is used to measure the lexical similarity of entities by calculating the co-occurrence of the constituent words, which allows for a quick coarse calculation of similarity through the combination of entity words; (2) For entity names that are composed of different words but have similar meanings, such as the geo-entities “Hochfeiler” and “Hochvogel”, pre-trained Word2vec word vectors are used to cover a larger range of entity lexical-similarity calculations.

Firstly, the words of geo-entity E_{r} are divided to form the word set W_{r} = \{W_{r,1}, W_{r,2}, \dots, W_{r,i}, \dots\} for E = \{E_{1}, E_{2}, E_{3}, \dots, E_{r}, \dots, E_{n}\} \in G. Then, the Jaccard coefficient JS_{E_{r}, E_{t}} in Equation (1) (JS in Figure 3) is used to quickly calculate the similarity of entities whose words have good co-occurrence. Moreover, the similarity of entity word meanings TS_{E_{r}, E_{t}} (TS in Figure 3) is calculated based on the pre-trained word vector WV_{E_{r}} of Wikidata to handle the case of different words with the same meaning (Equation (2)). Finally, the threshold judgment WS_{E_{r}, E_{t}} shown in Equation (3) (WS in Figure 3) is used to select the optimal strategy for the entity word similarity calculation, to obtain more reasonable word meaning similarity results. The lexical similarity of each pair in the entity set, (E_{r}, E_{t}, WS_{E_{r}, E_{t}}), can be calculated using Equations (1)–(3) below, and a similarity threshold θ^{WS} is set to filter out the potential relations of entities with low similarity. This yields the lexical-similarity network G_{WS} of E and adds an explicit relation “lexical_sim” between E_{r} and E_{t} as the triplet <E_{r}, lexical_sim, E_{t}>.

JS_{E_{r}, E_{t}} = \frac{|W_{r} \cap W_{t}|}{|W_{r} \cup W_{t}|}

(1)

TS_{E_{r}, E_{t}} = \frac{1}{1 + \sqrt{(WV_{E_{r}} - WV_{E_{t}})^{2}}}

(2)

WS_{E_{r}, E_{t}} = \begin{cases} JS_{E_{r}, E_{t}}, & JS_{E_{r}, E_{t}} \geq 0.5 \\ TS_{E_{r}, E_{t}}, & JS_{E_{r}, E_{t}} < 0.5 \end{cases}

(3)
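Equations (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: entity names are split on whitespace, the square root in Equation (2) is read as the Euclidean distance between the two vectors, and `vectors` is a hypothetical dictionary that maps an entity name to its pre-trained vector.

```python
import math

def jaccard(words_r, words_t):
    """Equation (1): number of shared words divided by number of distinct words."""
    a, b = set(words_r), set(words_t)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def vector_similarity(vec_r, vec_t):
    """Equation (2): 1 / (1 + Euclidean distance between the two vectors)."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(vec_r, vec_t)))
    return 1.0 / (1.0 + distance)

def word_sim(name_r, name_t, vectors):
    """Equation (3): use the Jaccard score when it is >= 0.5, else the vector score."""
    js = jaccard(name_r.lower().split(), name_t.lower().split())
    if js >= 0.5:
        return js
    # `vectors` is a hypothetical lookup of pre-trained entity vectors.
    return vector_similarity(vectors[name_r], vectors[name_t])
```

Note that for “Godavari River” and “Allegheny River” the Jaccard score is 1/3 (one shared word out of three distinct words), so under this reading the pair would fall through to the vector-based branch.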

2.1.2. Spatial-Similarity Network

Geo-entities in a GeoKG have evident spatial distribution characteristics, and the spatial relations between geo-entities can reflect their potential similarity to some extent. Euclidean distances and topological relations are commonly used to quantify the spatial relations among geographic objects. However, because of their spatial scale characteristics, geo-entities may be expressed geometrically at different scales. For example, when trying to judge the topological relation and Euclidean distance of the geo-entities “Xingguo County” and “Beijing-Kowloon Railway Line”, the former may be a point-type or region-type entity, while the latter is a line-type entity in GIS. Geo-entities of different geographic object types commonly exist in large-scale GeoKGs, which makes it difficult to construct the spatial similarity between said geo-entities in Figure 2.

Thus, the first function of “spatial_sim” in Figure 2 is to recognize each geo-entity as an appropriate geo-object type (e.g., Point, Region, Line). After the geometric object types of the geo-entities involved in the spatial similarity calculation are identified, the computing framework in Table 1 is used to obtain the similarity computing mode of the given entities. In Table 1, D represents the spatial distance model of two geo-entities. The spatial distance calculations between different geo-object types with different topological relations have different patterns. The Point–Point type does not have the topological relations “Intersect”, “Contains”, or “Within”. The Point–Line type does not have the topological relations “Intersect” or “Equal”. The spatial distance is often zero when the two geo-entities are of the Point–Line or Point–Region type. This is also the case for Line–Region or Line–Line types with the topological relation “Contains” or “Within”, as well as for the Point–Point, Point–Region, or Line–Line types with the topological relation “Equal”.

When the spatial distance is not zero in the various spatial distance models, D_{P-P}, D_{P-L}, D_{P-R}, D_{R-R}, and D_{L-R} should be calculated directly using the coordinates of representative points. A point-type geo-entity serves as its own representative point, and its coordinates can be found directly; for example, “The Beijing Railway Station” can mostly be viewed as a point-type entity with a specific location and set of coordinates. For geo-entities of other geo-object types, however, it is necessary to select a representative point to calculate the spatial distance; for example, the geo-entity “Beijing” is probably best viewed as a region-type entity, so a common method is to select the coordinates of the government of Beijing as the coordinates of its representative point. Of course, taking the center point of a line-type or region-type geo-object as the representative point is the most convenient way (Figure 4). For example, for the line-type geo-entity E_{t} shown in Figure 4b, the nearest point E_{t,p} on the line is selected; in Figure 4c, the central point E_{t,p} of the region is selected as the representative point if E_{t} is of the region type. In addition, selecting the nearest point as the representative point for E_{t} or E_{r} of different types is often applied to the models in Figure 4f,g, while selecting the center point as the representative point for E_{t} or E_{r} of different types is usually seen in the models in Figure 4d–i.

After obtaining the representative points E_{r,p} and E_{t,p} for E_{r} and E_{t}, respectively, their coordinates Lon_{E_{r,p}} (Lon_{E_{t,p}}) and Lat_{E_{r,p}} (Lat_{E_{t,p}}) are used as the input for Equation (4) to calculate the spatial distance D_{E_{r}, E_{t}}; the spatial similarity of the geo-entities SS_{E_{r}, E_{t}} can then be calculated via Equation (5). Furthermore, the spatial-similarity network connectivity between the entities in the GeoKG is set by the threshold parameter θ^{SS} to construct the spatial-similarity network G_{SS} of E. The relation “spatial_sim” between E_{r} and E_{t} is then added to the GeoKG corpus.

D_{E_{r}, E_{t}} = \sqrt{(Lon_{E_{r,p}} - Lon_{E_{t,p}})^{2} + (Lat_{E_{r,p}} - Lat_{E_{t,p}})^{2}}

(4)

SS_{E_{r}, E_{t}} = \frac{1}{1 + D_{E_{r}, E_{t}}}

(5)
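A minimal sketch of Equations (4) and (5), together with the threshold step that turns similarities into “spatial_sim” triplets, might look as follows. It assumes that a representative point has already been chosen for every entity, that coordinates are given as (longitude, latitude) pairs, and that a pair is kept when its similarity is at least the threshold. The helper names and the `points` dictionary are hypothetical.

```python
import math

def spatial_distance(point_r, point_t):
    """Equation (4): Euclidean distance between two (lon, lat) representative points."""
    lon_r, lat_r = point_r
    lon_t, lat_t = point_t
    return math.sqrt((lon_r - lon_t) ** 2 + (lat_r - lat_t) ** 2)

def spatial_similarity(point_r, point_t):
    """Equation (5): 1 / (1 + spatial distance)."""
    return 1.0 / (1.0 + spatial_distance(point_r, point_t))

def spatial_edges(points, threshold):
    """Build (E_r, 'spatial_sim', E_t) triplets for pairs at or above the threshold.

    `points` maps an entity name to its representative point (an assumption of
    this sketch); keeping pairs with similarity >= threshold is also assumed.
    """
    names = sorted(points)
    edges = []
    for i, name_r in enumerate(names):
        for name_t in names[i + 1:]:
            if spatial_similarity(points[name_r], points[name_t]) >= threshold:
                edges.append((name_r, "spatial_sim", name_t))
    return edges
```

Because the similarity is a decreasing function of distance, a similarity threshold is equivalent to a maximum-distance cutoff of 1/θ^{SS} − 1.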

2.1.3. Structural-Similarity Network

The types of connected entities in a GeoKG can partly reflect the potential similarities between geo-entities. As shown in Figure 2 and Figure 5, the entity types adjacent to the geo-entities “Ionian Sea” (E_{r}) and “Adriatic Sea” (E_{t}) include {“body of stream”, “body of water”, and “populated place”} (T_{E_{r}} and T_{E_{t}}), indicating a high structural similarity between them.

Therefore, the structural-similarity network is constructed based on the current connectivity and type labels of entities in the existing GeoKG, according to the process in Figure 5. Firstly, the lists of connected object type labels T_{E_{r}} and T_{E_{t}} are obtained for the geo-entities E_{r} and E_{t}. Then, the one-hot bag-of-words model is introduced to vectorize the labels of the geo-entities. In the bag-of-words vectors BW_{E_{r}} and BW_{E_{t}} of E_{r} and E_{t}, each vector dimension corresponds to a label name, and its value is the number of times that label appears. The structural similarity of the two geo-entities E_{r} and E_{t} is then calculated using Equation (6). Furthermore, the structural-similarity network G_{AS} is set by the threshold parameter θ^{AS}. Finally, the relation “structural_sim” between E_{r} and E_{t} is added to the GeoKG corpus.

AS_{E_{r}, E_{t}} = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (BW_{E_{r}, i} - BW_{E_{t}, i})^{2}}}

(6)
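Equation (6) can be illustrated with a label-count vector built from the type labels of each entity's neighbours. The sketch below assumes that each entity comes with a plain list of neighbour type labels and that a label missing from one list counts as zero; the function name is hypothetical.

```python
import math
from collections import Counter

def structural_similarity(labels_r, labels_t):
    """Equation (6): 1 / (1 + Euclidean distance between label-count vectors)."""
    bw_r, bw_t = Counter(labels_r), Counter(labels_t)
    # The shared vocabulary defines the vector dimensions; Counter returns 0
    # for labels that an entity does not have.
    vocabulary = set(bw_r) | set(bw_t)
    distance = math.sqrt(sum((bw_r[label] - bw_t[label]) ** 2 for label in vocabulary))
    return 1.0 / (1.0 + distance)
```

Two entities with identical neighbour label counts obtain a similarity of 1, and the score decreases as the label counts diverge.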

2.1.4. Attribute-Similarity Network

The attributes of geo-entities comprise the detailed characterization of the properties of said entities. The higher the similarity of the geo-entities in terms of their sets of attributes, the stronger the potential connection between them. Attribute similarity measures must consider the various attribute name expressions and attribute values of the geo-entities, such as those for “Chicago” and “China” given in Table 2. The entity attribute names should first be aligned, and then the similarity of the matching attribute values should be calculated.

First of all, for the sets of attribute names PL_{E_{r}} and PL_{E_{t}} obtained for E_{r} and E_{t}, Equations (1)–(3) are used to calculate the lexical similarity of the entity attribute names and obtain the set of aligned attribute names PLA_{E_{r}, E_{t}}; then the proportion of attribute type alignment PLA_{rate} is calculated (Equation (7)).

PLA_{rate} = \frac{len(PLA_{E_{r}, E_{t}})}{len(PL_{E_{r}})}

(7)

Next, the aligned attribute names are divided into string-type attributes (SAT) and numeric-type attributes (VAT) according to their corresponding attribute value types; for example, in Table 2 the attribute name “name” is a SAT and “Total area” is a VAT. The attribute sets PLA_{E_{r}, E_{t}, sat} and PLA_{E_{r}, E_{t}, vat} are thus formed. The SAT similarity adopts a method similar to that shown in Equations (1)–(3) above. Because an entity may have several SATs, a comprehensive similarity Sim_{sat_{E_{r}, E_{t}}} is calculated using the lexical similarity WS_{E_{r, pi}, E_{t, pi}} (Equation (8)).

Sim_{sat_{E_{r}, E_{t}}} = \frac{1}{1 + \frac{\sum_{i=0}^{n} WS_{E_{r, pi}, E_{t, pi}}}{n}}, \quad E_{r, pi}, E_{t, pi} \in PLA_{E_{r}, E_{t}, sat}

(8)

The VAT similarity calculation of geo-entities must consider the magnitude and distribution of the attribute values. An equal-width discretization algorithm is used to discretize the continuous values and unify the attribute values into the same magnitude, which reduces the influence of differing value scales across attributes on the similarity. The equal-width algorithm partitions the value space into a set number K of intervals by finding the minimum and maximum values of the attribute, as shown in Table 3.
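The discretization step can be sketched as follows. This is an illustration under assumptions, since Table 3 is not reproduced here: intervals are numbered from 0 to K − 1, each has width (max − min) / K, and the maximum value is clamped into the last interval. The function name is hypothetical.

```python
def equal_width_bin(value, minimum, maximum, k):
    """Map a numeric attribute value to one of k equal-width interval indices (0..k-1)."""
    if maximum == minimum:
        # Degenerate attribute with a single value: everything falls in one interval.
        return 0
    width = (maximum - minimum) / k
    index = int((value - minimum) / width)
    # Clamp so that the maximum value (and any out-of-range input) stays in range.
    return min(max(index, 0), k - 1)
```

After this step, attributes with very different magnitudes (for instance area and population) are all expressed as small integers in the same range.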

After obtaining the partition intervals of the attribute values, the numeric attribute similarity is calculated from the discretized attribute values using Equation (9). Finally, the string-type and numeric-type attribute similarities are integrated to calculate the attribute similarity PS_{E_{r}, E_{t}} of the entities E_{r} and E_{t} using Equation (10). After obtaining the similarity of the entity attributes, the potential connection between the entities is determined based on the attribute similarity threshold θ^{PS}, thus constructing the attribute-similarity network G_{PS} of the geo-entity set E. Finally, the relation “attribute_sim” between E_{r} and E_{t} is added to the GeoKG corpus.

Sim_{vat_{E_{r}, E_{t}}} = \frac{1}{1 + \frac{\sum_{i=0}^{n} Type(E_{r, pi}) - Type(E_{t, pi})}{n}}, \quad E_{t, pi} \in PLA_{E_{r}, E_{t}, vat}

(9)

PS_{E_{r}, E_{t}} = PLA_{rate} * \frac{1}{1 + Sim_{vat_{E_{r}, E_{t}}} + Sim_{sat_{E_{r}, E_{t}}}}

(10)
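The following sketch strings together Equations (7), (9), and (10). It is written under several assumptions that should be checked against the authors' code: Type(·) is taken to be the discretized interval index of a numeric attribute, the difference in Equation (9) is taken as an absolute difference (the printed formula shows no absolute value, but a signed sum could cancel out), and the aligned numeric attribute list is non-empty. All function names are hypothetical.

```python
def alignment_rate(aligned_names, names_r):
    """Equation (7): share of E_r's attribute names that could be aligned."""
    return len(aligned_names) / len(names_r) if names_r else 0.0

def vat_similarity(bins_r, bins_t):
    """Equation (9), assuming absolute differences of interval indices.

    bins_r and bins_t hold the discretized values of the aligned numeric
    attributes of E_r and E_t, in the same order, and must be non-empty.
    """
    n = len(bins_r)
    mean_difference = sum(abs(a - b) for a, b in zip(bins_r, bins_t)) / n
    return 1.0 / (1.0 + mean_difference)

def attribute_similarity(rate, sim_vat, sim_sat):
    """Equation (10): alignment rate times 1 / (1 + Sim_vat + Sim_sat)."""
    return rate * (1.0 / (1.0 + sim_vat + sim_sat))
```

For example, two aligned numeric attributes with interval indices (1, 3) and (2, 1) differ by 1 and 2, giving a mean difference of 1.5 and a numeric similarity of 0.4.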

2.2. Encoder: Geo-Entity Path Hybrid Embedding

The encoder is designed to achieve the vectorized expression of the geo-entities in GeoKG. The encoder refers to DeepWalk [25], a path feature-based representation learning method, which learns the relational features among geo-entities by generating a collection of path sequences of different layer similarity networks through random walk strategies; the Word2vec method is introduced to pretrain these path sequences and ultimately obtain the vectors of geo-entities.

The basis of the encoder method is the enhanced GeoKG, including the basic network G_{B}, lexical-similarity network G_{WS}, spatial-similarity network G_{SS}, structural-similarity network G_{AS}, and attribute-similarity network G_{PS}, which were constructed in Section 2.1. As shown in Figure 6, the network G_{B} is abstracted from the GeoKG in Figure 1, and the enhanced part of the network (G_{WS}, G_{SS}, G_{AS}, and G_{PS}) is represented by the red, green, purple, and blue sections in the central part of Figure 6. A random walk is then performed on each layer of the network: starting from a particular node, each step selects one of the edges connected to the current node and moves along it to the next node, and this is repeated; examples are {route_1, route_2, route_3, …} shown in Figure 6. During the process, the number of random walk paths n and the length of each path l must be specified. Hence, the random walk path sets R^{B}, R^{W}, R^{S}, R^{A}, and R^{P} are formed for G_{B}, G_{WS}, G_{SS}, G_{AS}, and G_{PS}, respectively.
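The random walk on a single layer can be sketched as below. This is a generic uniform random walk in the spirit of DeepWalk, given as an assumption-laden illustration: the layer is represented as an adjacency dictionary, n is interpreted as the number of walks started from each node, and l as the maximum number of nodes per walk. The function name and the `seed` parameter are hypothetical.

```python
import random

def random_walks(adjacency, num_walks, walk_length, seed=0):
    """Generate `num_walks` uniform random walks of up to `walk_length` nodes per start node."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    walks = []
    for start in sorted(adjacency):
        for _ in range(num_walks):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

Running the same function on the adjacency structure of each layer would yield one path set per layer, which the mixing step then combines.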

Then, representation learning of the nodes (geo-entities) in the path set is implemented by applying the word pre-training method (Word2vec) to the set of path sequences. The proposed method makes full use of the long-distance dependencies between entities, is highly interpretable and easy to understand, and is suitable for capturing the relational features of entities in sparse KGs. The concept of multi-layer sampling parameters is introduced to explore the involvement of the different similarity network layers in representation learning. The sampling proportion affects the construction of the Word2vec pre-training model. To control the proportion of sample data from the different layers in Word2vec pre-training, we take λ_{B}, λ_{W}, λ_{S}, λ_{A}, and λ_{P} as the proportional parameters of path sampling for the above layers, and ultimately form the path set R of the multi-layer similarity networks using Equation (11). When λ_{B} : λ_{W} : λ_{S} : λ_{A} : λ_{P} = 1:0:0:0:0, this corresponds to the currently popular DeepWalk method, which only considers the direct connections between the entities in a GeoKG.

R = λ_{B} R^{B} + λ_{W} R^{W} + λ_{S} R^{S} + λ_{A} R^{A} + λ_{P} R^{P}

(11)
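One possible reading of Equation (11) is that each λ selects a fraction of the paths of its layer, and the selected paths are pooled into a single training corpus. The sketch below implements that reading only; the interpretation of λ as a per-layer sampling fraction in [0, 1], the layer keys, and the function name are assumptions of this example.

```python
import random

def mix_paths(path_sets, ratios, seed=0):
    """Pool a fraction ratios[layer] of each layer's paths into one path set R."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    mixed = []
    for layer, paths in path_sets.items():
        # Layers without a ratio contribute nothing.
        count = round(ratios.get(layer, 0.0) * len(paths))
        mixed.extend(rng.sample(paths, min(count, len(paths))))
    rng.shuffle(mixed)
    return mixed
```

With ratios of 1 for the basic layer and 0 for all similarity layers, the pooled set contains only the basic-network paths, which matches the DeepWalk special case described above.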

As shown in Figure 6, the entity pre-training model of MSEN-GRP takes the path sequences as the input route dataset. Inspired by the DeepWalk method, it applies the Word2vec algorithm and realizes the vector representation learning of the node entities through unsupervised training. Word2vec can be realized through either Skip-gram or continuous bag of words (CBOW): Skip-gram predicts the context from the current node, whereas CBOW predicts the current node from its context. Both have the same activation function, and both ultimately produce a vectorized expression of each node. The key parameters for Word2vec training include the sampling window w and the output vector dimension d.
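To make the role of the sampling window w concrete, the sketch below lists the (current node, context node) training pairs that a Skip-gram model would see for one path. It only illustrates how pairs are formed; it does not train any embedding, and the function name is hypothetical.

```python
def skipgram_pairs(path, window):
    """Return (center, context) pairs for every node and its neighbours within `window` steps."""
    pairs = []
    for i, center in enumerate(path):
        start = max(0, i - window)
        stop = min(len(path), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, path[j]))
    return pairs
```

A larger window lets entities that are several hops apart on a path appear in the same training pair, which is how long-path dependencies enter the learned vectors.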

2.3. Decoder: Geo-Relation Prediction

The decoder in MSEN-GRP introduces the DisMult model, as shown in Figure 2, which uses the pre-trained geo-entity vectors y_{e_{1}} and y_{e_{2}} from the encoder above as the input, in which the head and tail entity vectors are combined as the whole input feature vector. The relation representation is mainly reflected by A_{r}^{T} and B_{r} in the scoring function. DisMult mainly adopts the basic linear transformation function g_{r}^{a} and the bilinear transformation function g_{r}^{b} in Equation (12) as the scoring function. The scoring function is used to calculate the score of the relation in the triplet belonging to a certain category.

g_{r}^{a}(y_{e_{1}}, y_{e_{2}}) = A_{r}^{T} \begin{pmatrix} y_{e_{1}} \\ y_{e_{2}} \end{pmatrix} \quad \text{and} \quad g_{r}^{b}(y_{e_{1}}, y_{e_{2}}) = y_{e_{1}}^{T} B_{r} y_{e_{2}}

(12)
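The two scoring functions in Equation (12) can be written out directly. In this sketch, A_r is treated as a single weight vector applied to the concatenation of the two entity vectors (so the linear score is a scalar), and B_r is a full square matrix given as nested lists; both choices, and the function names, are assumptions made for illustration.

```python
def linear_score(a_r, y_e1, y_e2):
    """g_r^a: dot product of A_r with the concatenation [y_e1; y_e2]."""
    concatenated = list(y_e1) + list(y_e2)
    return sum(a * x for a, x in zip(a_r, concatenated))

def bilinear_score(y_e1, b_r, y_e2):
    """g_r^b: y_e1^T B_r y_e2 for a relation matrix B_r given as nested lists."""
    return sum(
        y_e1[i] * b_r[i][j] * y_e2[j]
        for i in range(len(y_e1))
        for j in range(len(y_e2))
    )
```

When B_r is restricted to a diagonal matrix, the bilinear score reduces to a weighted element-wise product of the two entity vectors, which keeps the number of relation parameters linear in the embedding dimension.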

The vectors y_{e_{1}}, y_{r_{1}}, and y_{e_{2}} can be learned and updated by using the loss function in Equation (13). The loss function is used to evaluate the difference between the relation predicted by the model and the actual relation; a lower loss value generally indicates a better fit of the model. Equation (13) encourages positive sampled relations or triplets to obtain higher confidence scores than negative sampled relations or triplets. Usually, the training samples given are assumed to be positive sampled data. The model builds a negative sample ({e_{1}}', r, {e_{2}}') by corrupting a positive sample (e_{1}, r, e_{2}), to generate the negative sample dataset T′ corresponding to the positive sample dataset T; the score function of T is f_{(e_{1}, r, e_{2})}, and that of T′ is f_{({e_{1}}', r, {e_{2}}')} in Equation (13).

L(Ω) = \sum_{(e_{1}, r, e_{2}) \in T} \sum_{({e_{1}}', r, {e_{2}}') \in T'} \max \{ f_{({e_{1}}', r, {e_{2}}')} - f_{(e_{1}, r, e_{2})} + 1, 0 \}

(13)
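Equation (13) is a margin-based ranking loss with a margin of 1. The sketch below evaluates it for pre-computed scores. It assumes that each positive triplet is paired with the scores of its own corrupted triplets; the input layout and the function name are hypothetical.

```python
def margin_ranking_loss(scored_pairs, margin=1.0):
    """Equation (13): sum of max(f_negative - f_positive + margin, 0).

    `scored_pairs` is a list of (positive_score, [negative_scores]) tuples, one
    per positive triplet (an assumption of this sketch).
    """
    total = 0.0
    for positive_score, negative_scores in scored_pairs:
        for negative_score in negative_scores:
            total += max(negative_score - positive_score + margin, 0.0)
    return total
```

A negative triplet contributes nothing once its score is at least one margin below the positive score, so training effort concentrates on negatives that are still ranked too high.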

3. Experimental Design and Results

The GeoKG dataset used for the experiment was formed using the geospatial knowledge extracted from a large-scale general KG. Next, we built the multi-layer similarity networks of geo-entities, and used paths randomly generated on the multi-layer network and sampled in mixed proportions as the input for vector model training to obtain the geo-entity vectors. Lastly, the vectors of geo-relations were decoded by DisMult. We present experimental comparisons of different mixed-path sampling patterns to explore the effects of the different similarity network layers of the enhancer on relation completion. Furthermore, we compare the MSEN-GRP method with common distance-based models (TransE, TransD), semantic matching-based methods (RESCAL, DisMult), the neural network method ProjE, and TKRL to illustrate the effectiveness of this method.

3.1. Experimental Dataset Analysis

The experimental dataset GeoDBpedia21 was built by parsing geo-entities and their relations from the DBpedia dataset. Table 4 shows the numbers of geo-entities and relations: there were 21 geo-relations and 39,770 geo-entities. Table 5 describes the meaning of each geo-relation.

Details of the geo-relations in the GeoDBpedia21 dataset are shown in Figure 7, in which there is an unbalanced distribution of relation types; the relations “department” (14,974), “located in area” (14,865), and “source country” (4012) appear the most, and several geo-relations appear fewer than 1000 times (e.g., “inflow”, “outflow”, “broadcast area”, “river mouth”, “river”, “location city”, “mouth country”, “mouth region”, “crosses”, “major island”, “right tributary”, “left tributary”, and “island”).

Details of the geo-entities in the GeoDBpedia21 dataset are shown in Figure 8. The degree of a geo-entity is the number of other geo-entities connected to it from a KG aspect, or equivalently the number of edges connected with its node from a graph aspect; 25,872 geo-entities have a degree of just one. Most geo-entities have a degree < 8, and the number of geo-entities at any given degree ranges from 10 down to 1 when the degree is >20. Thus, an unbalanced distribution of geo-relations and weak connectivity between geo-entities objectively exist in the GeoDBpedia21 dataset. A similar pattern is expected for large-scale GeoKGs whose data come from complex Internet resources. This unbalanced distribution of relations is visualized in Figure 9, where it can be seen that many geo-entities around the boundary of the graph have few connections to other geo-entities.
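The degree statistics described above can be reproduced from a list of triplets with a short Python sketch. It assumes that a geo-entity's degree counts its distinct neighbouring entities in either direction, and the toy triplets are invented for illustration; the helper name `degree_distribution` is hypothetical.

```python
from collections import Counter


def degree_distribution(triples):
    """Return {degree: number of entities}.

    An entity's degree is taken here as the number of distinct other
    entities it is linked to, regardless of edge direction (assumption).
    """
    neighbours = {}
    for head, _relation, tail in triples:
        if head == tail:
            continue  # ignore self-loops
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)
    return Counter(len(linked) for linked in neighbours.values())


# Toy triplets (assumed values, not taken from GeoDBpedia21).
toy = [
    ("A", "locatedInArea", "B"),
    ("C", "locatedInArea", "B"),
    ("C", "nearestCity", "D"),
]
dist = degree_distribution(toy)  # two entities of degree 1, two of degree 2
```

Applied to a full dataset, the count stored under key 1 corresponds to the number of degree-one geo-entities reported for Figure 8.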

3.2. Experimental Design and Parameters

The experiment was conducted by applying the enhancer, encoder, and decoder to the GeoKG dataset GeoDBpedia21. In this process, the similarity thresholds $θ^{WS}$, $θ^{SS}$, $θ^{AS}$, and $θ^{TS}$ must be set to reasonable values. Because the similarity value ranges differ between layers, the top 10% of the sample data by similarity in each layer were ultimately selected. In mixed-path sampling, the sample proportions $λ_{B}$ : $λ_{W}$ : $λ_{S}$ : $λ_{A}$ : $λ_{T}$ determine the proportion of sample paths from each similarity network that participate in training. Six path sampling modes were adopted to compare the effects of the pre-trained model under different sampling modes. The Word2vec parameters were set during model pre-training as n = 100, l = 5, w = 5, and d = 100. Finally, to evaluate the geo-entity representations learned from the multi-layer similarity networks, we assessed knowledge completion using the geo-relation prediction results. The DistMult model was used for geo-entity category prediction; the sizes of the training, validation, and test datasets are shown in Table 4.
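The two settings described above, keeping the top 10% most similar pairs in a layer and mixing walks from several layers according to the λ proportions, can be sketched in Python as follows. This is only an illustrative sketch: uniform random walks, integer ratios, and the names `top_fraction_edges`, `to_adjacency`, `random_walk`, `mixed_path_sample`, `walks_per_unit`, and `length` are assumptions and do not correspond to the authors' code or to the Word2vec parameters listed in the text.

```python
import random


def top_fraction_edges(scored_pairs, fraction=0.10):
    """Keep the highest-similarity pairs of one layer (top 10% by default)."""
    ranked = sorted(scored_pairs, key=lambda item: item[2], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return [(u, v) for u, v, _similarity in ranked[:keep]]


def to_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def random_walk(adjacency, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        candidates = adjacency.get(walk[-1])
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


def mixed_path_sample(layers, ratios, walks_per_unit, length, seed=0):
    """Draw ratios[name] * walks_per_unit walks from each layer and shuffle them."""
    rng = random.Random(seed)
    corpus = []
    for name, adjacency in layers.items():
        nodes = sorted(adjacency)
        if not nodes:
            continue
        for _ in range(ratios.get(name, 0) * walks_per_unit):
            corpus.append(random_walk(adjacency, rng.choice(nodes), length, rng))
    rng.shuffle(corpus)
    return corpus


# Toy data (assumed values): a basic network and scored spatial pairs.
basic = to_adjacency([("A", "B"), ("B", "C")])
spatial_pairs = [("A", "C", 0.9), ("A", "D", 0.2), ("B", "D", 0.4)]
spatial = to_adjacency(top_fraction_edges(spatial_pairs))

layers = {"B": basic, "S": spatial}
ratios = {"B": 1, "S": 1}  # a 1:1 mix of basic and spatial walks
corpus = mixed_path_sample(layers, ratios, walks_per_unit=3, length=5)
```

Each walk in `corpus` is a sequence of entity identifiers, so the mixed corpus can be passed to a skip-gram model as a set of sentences; setting a layer's ratio to 0 removes that layer from training.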

3.3. Analysis of the Experimental Results

The mean reciprocal ranking (MRR) index was used to evaluate the effect of geo-relations prediction. It is calculated as in Equation (14), where S is the triplet set, $|S|$ is the number of triplets in S, and $\mathrm{rank}_{i}$ is the link prediction rank of the i-th triplet. A larger MRR indicates a better-performing MSEN-GRP model.

\mathrm{MRR} = \frac{1}{|S|} \sum_{i = 1}^{|S|} \frac{1}{\mathrm{rank}_{i}}

(14)

In addition, Hits@n was used to represent the proportion of triplets whose rank in the prediction is no greater than n. Commonly used indicators include Hits@1, Hits@3, Hits@5, and Hits@10. The calculation is shown in Equation (15):

\mathrm{Hits}@n = \frac{1}{|S|} \sum_{i = 1}^{|S|} I(\mathrm{rank}_{i} \leq n)

(15)

where the symbols are the same as in Equation (14), and I is the indicator function (its value is 1 if the condition holds and 0 otherwise).
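Equations (14) and (15) translate directly into a few lines of Python. The sketch below assumes 1-based ranks; the function names and the toy rank list are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Equation (14): average of 1 / rank_i over all test triplets."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Equation (15): share of test triplets whose rank is at most n."""
    return sum(1 for rank in ranks if rank <= n) / len(ranks)


# Toy ranks (assumed values) for four test triplets.
ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(ranks)  # (1 + 0.5 + 0.25 + 0.05) / 4 = 0.45
hits10 = hits_at_n(ranks, 10)      # 3 of 4 ranks are <= 10, i.e. 0.75
```

Because MRR weights each triplet by the reciprocal of its rank, it is dominated by predictions near the top of the list, whereas Hits@n only records whether the correct relation appears within the first n candidates.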

To compare the enhancement effects of the different network layers $G_{B}$, $G_{WS}$, $G_{SS}$, $G_{AS}$, and $G_{PS}$, MSEN-GRP used the mixed sampling method; the proportions $λ_{B}$ : $λ_{W}$ : $λ_{S}$ : $λ_{A}$ : $λ_{P}$ are shown in Table 6 along with the corresponding geo-relations prediction results. The lexical-similarity and spatial-similarity networks constructed within GeoDBpedia21 provided the most evident improvement: the lexical-similarity network reaches a filtered Hits@10 of 54.52%, about 54 percentage points above the basic network, and the spatial-similarity network reaches 55.82%, the highest among the single-layer enhancers. The structural-similarity network learns much from the structure of the basic network, so its Hits@10 also increases, but it remains lower than that of the lexical-similarity network because of the weak connectivity in the GeoKG. The enhancement effect of the attribute-similarity network is the least obvious, mainly because it is difficult to obtain comprehensively high similarity between geo-entities under multi-dimensional attributes in a sparse large-scale KG.

Furthermore, the experiments in this paper compare the proposed MSEN-GRP method with common methods that learn internal features (the distance-based TransE, the semantic matching-based RESCAL and DistMult, and the neural network method ProjE) and with TKRL, a method that fuses external information. As shown in Table 7, the MRR of MSEN-GRP is mostly higher than those of the TransE, RESCAL, DistMult, ProjE, and TKRL methods, and the MRR of the raw part is mostly lower than that of the filter part (in which the part that already exists in the training dataset is removed from the test dataset). The best Hits@10 index of MSEN-GRP is improved to 0.5759; the highest increase rate is 57.57% and the lowest increase rate is 24.61%.
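The difference between the raw and filter columns can be illustrated with a small Python sketch. It follows the filtered-ranking convention commonly used in knowledge graph completion, in which candidates already known to be true are skipped when ranking, and may differ in detail from the procedure used in this paper; the function name `relation_rank` and the scores are invented for illustration.

```python
def relation_rank(scores, true_relation, known_relations=frozenset(), filtered=False):
    """1-based rank of `true_relation` among candidates sorted by descending score.

    In the filtered setting, other relations that are already known to hold
    for the entity pair are skipped instead of being counted as competitors.
    """
    target = scores[true_relation]
    rank = 1
    for relation, score in scores.items():
        if relation == true_relation:
            continue
        if filtered and relation in known_relations:
            continue
        if score > target:
            rank += 1
    return rank


# Toy scores (assumed values) for one entity pair.
scores = {"nearestCity": 0.7, "locatedInArea": 0.9, "mountainRange": 0.4}
known = {"locatedInArea"}  # assumed to be already present in the training data

raw_rank = relation_rank(scores, "nearestCity")                             # 2
filtered_rank = relation_rank(scores, "nearestCity", known, filtered=True)  # 1
```

Filtering can only remove competitors, so a filtered rank is never worse than the raw rank, which is consistent with the filter values in Table 7 being at least as high as the raw values.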

We illustrate geo-relation prediction with the entities “Wuyi Mountains” and “Shangrao” from the experimental results. These two entities have no explicit geo-relation in DBpedia, but part of the “Wuyi Mountains” lies in “Shangrao”, so they actually have a spatial neighborhood relation. The geo-relation prediction result for this case is shown in Table 8: the accurate geo-relation “http://dbpedia.org/ontology/nearestCity” (accessed on 14 September 2022) ranks first. The geo-relation “http://dbpedia.org/ontology/locatedInArea” (accessed on 14 September 2022) also receives a high score in this case because this type of geo-relation is very frequent in the dataset, as shown in Figure 7.

4. Discussion

At present, the main knowledge completion methods (e.g., TransE, RESCAL, ProjE, TKRL) all rely on the connection characteristics between existing entities to obtain a reliable and effective vectorized representation of entities, which is then applied to knowledge completion. Therefore, the connectivity of entities is closely related to the representation learning of a KG. Thus, when methods such as TransE and DistMult were tested on GeoDBpedia21, the MRR was lower than 0.2, Hits@10 was lower than 0.33, and the overall effect was not ideal [26]. The proposed MSEN-GRP method converts the implicit geo-relations between geo-entities into explicit geo-relations with specific meaning by constructing multi-layer similarity networks, which compensates for the weak connectivity between geo-entities and improves the learning effect of geo-entity representation (as well as the accuracy of knowledge completion based on representation learning). The experimental results shown in Table 7 verify the effectiveness of the proposed MSEN-GRP method in improving geo-relations prediction.

The MSEN-GRP method builds these similarity networks, which theoretically enhances the effect of representation learning, but the different layers of similarity networks contribute differently to the representation learning of geo-entities. Therefore, the MSEN-GRP method also provides a mixed-path sampling method to test its effect under different sampling modes and to evaluate the impact of each layer of similarity network on the learning of geo-entity representations. The experimental results (Table 6) show that the lexical-similarity network and the spatial-similarity network provide a greater contribution to the prediction of the relations between geo-entities; this is mainly because the spatial characteristics and structural similarity characteristics of geo-entities are relatively evident. However, owing to the complexity of geo-entity attribute types, the effect of the attribute-similarity network remains unclear. The good effect of the structural-similarity network verifies that the simplified and captured structural features of the GeoKG contribute to geo-entity embedding. The MSEN-GRP method can adjust the sampling ratio according to the contribution of each layer of the similarity network to relation completion, so as to balance efficiency and effect. Therefore, the mixed sampling strategy makes the MSEN-GRP method more interpretable and also provides greater flexibility for method optimization and applicability to different scenarios.

However, the MSEN-GRP method still has some limitations: (1) The time complexity of the enhancer increases as the scale of the KG becomes larger. For large-scale GeoKG applications, the construction of a geo-entity similarity network requires the calculation of the similarity between each pair of geo-entities. (2) The vectorized representation of geo-relations still needs to be improved. Although this method uses DistMult in the decoder to realize the vectorized expression of the geo-relations between geo-entities, the distances between the resulting vectors cannot be used to evaluate the similarity of the geo-relations because of their uneven distribution in GeoKG datasets.
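The scale of the first limitation can be made concrete with a short calculation. The sketch below assumes exhaustive comparison of unordered entity pairs within one similarity layer, without any blocking or spatial indexing; the function name is hypothetical.

```python
from math import comb


def pairwise_comparisons(num_entities):
    """Number of unordered entity pairs, n * (n - 1) / 2."""
    return comb(num_entities, 2)


# GeoDBpedia21 contains 39,770 geo-entities (Table 4).
pairs_per_layer = pairwise_comparisons(39_770)  # 790,806,565 pairs
```

The pair count grows quadratically with the number of geo-entities, so doubling the size of the GeoKG roughly quadruples the number of similarity calculations in each layer.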

5. Conclusions

To address the poor knowledge representation learning caused by sparse relations in the geo-relations completion of GeoKGs, we propose a geo-relations prediction model based on multi-layer similarity enhanced networks (MSEN-GRP). This method compensates for the weak connectivity of geo-entities in the GeoKG by constructing multi-layer similarity networks that explicitly capture word meaning, space, structure, and attributes for each geo-entity; this facilitates better learning with more explicit and balanced features across different models. In addition, the DeepWalk algorithm, introduced into the encoder part of the model, uses the hybrid path sampling method to learn long-distance relation dependencies. The geo-relations prediction experiment based on the GeoDBpedia21 dataset shows that the MSEN-GRP model performs better than most current methods in geo-relation completion. For example, the Hits@10 of the MSEN-GRP model is 57.57% higher than that of DistMult and 24.61% higher than that of TransE, indicating that adding explicit information is effective. Experiments using different sampling modes show that the spatial-similarity network improves the learning of geo-entity representations by the greatest degree, with a Hits@10 increase of 30%. In contrast, the enhancement effect of the attribute-similarity network is not apparent, highlighting the fact that the effects of different similarity networks vary widely. We also found that the geo-entities in GeoKGs have a strong implicit spatial similarity. In the future, we will consider introducing a geographic weighting mechanism to improve the biased vectorized representation of geo-entities and their relationships in large-scale GeoKGs.

Author Contributions

Conceptualization, Zongcai Huang, Peiyuan Qiu, Li Yu, and Feng Lu; Data curation, Zongcai Huang and Peiyuan Qiu; Formal analysis, Peiyuan Qiu; Funding acquisition, Feng Lu; Investigation, Zongcai Huang, Peiyuan Qiu, and Li Yu; Methodology, Zongcai Huang; Project administration, Feng Lu; Resources, Zongcai Huang; Supervision, Feng Lu; Validation, Zongcai Huang and Li Yu; Writing—original draft, Zongcai Huang; Writing—review & editing, Zongcai Huang, Peiyuan Qiu, Li Yu, and Feng Lu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No.41631177, Grant No.41801320, Grant No.42001341). This support is gratefully acknowledged. We also thank the anonymous referees for their helpful comments and suggestions.

Acknowledgments

The authors would like to thank the four anonymous reviewers for their valuable suggestions, which significantly improved the quality of this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, S.; Zhang, X.; Ye, P.; Du, M.; Lu, Y.; Xue, H. Geographic Knowledge Graph (GeoKG): A Formalized Geographic Knowledge Representation. ISPRS Int. J. Geo-Inf. 2019, 8, 184.
2. Ren, Z.; Yu, H.; Wan, F. Research on geographic information extraction based on knowledge graph. In Proceedings of the 2018 3rd International Conference on Advances in Materials, Mechatronics and Civil Engineering (ICAMMCE 2018), Hangzhou, China, 3–15 April 2018; Atlantis Press: Amsterdam, The Netherlands; pp. 297–302.
3. Yu, L.; Qiu, P.; Liu, X.; Lu, F.; Wan, B. A holistic approach to aligning geospatial data with multidimensional similarity measuring. Int. J. Digit. Earth 2017, 11, 845–862.
4. Guo, C.; Xu, T.; Liu, L. Construction of Knowledge Graph Based on Geographic Ontology. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2019; Volume 252, p. 052161.
5. Bingchuan, J.; Gang, W.A.N.; Jian, X.U.; Feng, L.I.; Huiqi, W.E.N. Geographic knowledge graph building extracted from multi-sourced heterogeneous data. Acta Geod. Cartogr. Sin. 2018, 47, 1051.
6. Qiu, P.; Gao, J.; Yu, L.; Lu, F. Knowledge Embedding with Geospatial Distance Restriction for Geographic Knowledge Graph Completion. ISPRS Int. J. Geo-Inf. 2019, 8, 254.
7. Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. Dbpedia: A nucleus for a web of open data. In The Semantic Web; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735.
8. Haklay, M.; Weber, P. OpenStreetMap: User-Generated Street Maps. IEEE Pervasive Comput. 2008, 7, 12–18.
9. Rebele, T.; Suchanek, F.; Hoffart, J.; Biega, J.; Kuzey, E.; Weikum, G. YAGO: A multilingual knowledge base from wikipedia, wordnet, and geonames. In International Semantic Web Conference; Springer: Cham, Switzerland, 2016; pp. 177–185.
10. Feng, L.; Li, Y.; Peiyuan, Q. On Geographic Knowledge Graph. J. Geo-Inf. Sci. 2017, 19, 723–734.
11. Galárraga, L.A.; Teflioudi, C.; Hose, K.; Suchanek, F. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 413–422.
12. Lao, N.; Mitchell, T.; Cohen, W. Random walk inference and learning in a large scale knowledge base. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Scotland, UK, 27–31 July 2011; pp. 529–539.
13. Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; Volume 2, pp. 2787–2795.
14. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; AAAI Press: Palo Alto, CA, USA; pp. 2181–2187.
15. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; Association for Computational Linguistics. pp. 687–696.
16. Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 809–816.
17. Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. arXiv 2014, arXiv:1412.6575.
18. Shi, B.; Weninger, T. Proje: Embedding projection for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–7 February 2017; Volume 31, pp. 1236–1242.
19. Socher, R.; Chen, D.; Manning, C.D.; Ng, A. Reasoning with neural tensor networks for knowledge base completion. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; Volume 1, pp. 926–934.
20. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Berg, R.V.D.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In European Semantic Web Conference; Springer: Cham, Switzerland, 2018; pp. 593–607.
21. Xie, R.; Liu, Z.; Sun, M. Representation Learning of Knowledge Graphs with Hierarchical Types. IJCAI 2016, 2016, 2965–2971.
22. Wu, Y.; Wang, Z. Knowledge graph embedding with numeric attributes of entities. In Proceedings of the Third Workshop on Representation Learning for NLP, Melbourne, Australia, 20 July 2018; Association for Computational Linguistics. pp. 132–136.
23. Xie, R.; Liu, Z.; Jia, J.; Luan, H.; Sun, M. Representation learning of knowledge graphs with entity descriptions. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30, pp. 2659–2665.
24. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
25. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; Association for Computing Machinery. pp. 701–710.
26. Sun, Z.; Vashishth, S.; Sanyal, S.; Talukdar, P.; Yang, Y. A re-evaluation of knowledge graph completion methods. arXiv 2019, arXiv:1911.03903.

Figure 1. An example visualization of a GeoKG.

Figure 2. Framework of MSEN-GRP.

Figure 3. Lexical similarity calculation schemes.

Figure 4. Visual space distance model.

Figure 5. Method of constructing the structural similarity network.

Figure 6. Representation of the encoder.

Figure 7. Number of all relationship types appearing in the GeoDBpedia21 dataset.

Figure 8. Distribution of entities with different degrees.

Figure 9. Graphical visualization of the GeoDBpedia21 dataset.

Table 1. Computational framework for calculating the spatial similarity of geo-entities.

Space Distance Model	Disjoint	Intersect	Contains/Within	Equal
Point–Point	$D_{P - P}$	/	/	D = 0
Point–Line	$D_{P - L}$	/	D = 0	/
Point–Region	$D_{P - R}$	/	D = 0	/
Region–Region	$D_{R - R}$	$D_{R - R}$	D = 0	D = 0
Line–Region	$D_{L - R}$	$D_{L - R}$	D = 0	/
Line–Line	$D_{L - L}$	$D_{L - L}$	D = 0	D = 0

Table 2. Example of attribute comparison for two geo-entities.

Chicago		China
name	Chicago, Illinois	name	The People’s Republic of China
Total area	606,057,217.818624	Total area	9,596,961,000,000
country	USA	currency	Renminbi
Postal code	606xx, 607xx	capital	Beijing
……	……	……	……

Table 3. VAT splitting rules.

Type	Interval
1	$[\mathrm{Min}_{E_{r, p}}, \mathrm{Min}_{E_{r, p}} + \frac{\mathrm{Min}_{E_{r, p}} + \mathrm{Max}_{E_{r, p}}}{K}]$
2	$[\mathrm{Min}_{E_{r, p}} + \frac{\mathrm{Min}_{E_{r, p}} + \mathrm{Max}_{E_{r, p}}}{K}, \mathrm{Min}_{E_{r, p}} + 2 \frac{\mathrm{Min}_{E_{r, p}} + \mathrm{Max}_{E_{r, p}}}{K}]$
……	…………
$n$	$[\mathrm{Min}_{E_{r, p}} + (n - 1) \frac{\mathrm{Min}_{E_{r, p}} + \mathrm{Max}_{E_{r, p}}}{K}, \mathrm{Max}_{E_{r, p}}]$

Table 4. The statistics of GeoDBpedia21.

Dataset	Relations	Entities	Training Sets	Validation Sets	Test Sets
GeoDBpedia21	21	39,770	46,657	2560	2544

Table 5. Explanation of relationship types in the GeoDBpedia21 dataset.

Type	Explanation
department	which department the place belongs to (the department is one of the three levels of government in France)
located in area	where the entity is located in a place
source country	where the river originated from in a country
nearest city	the entities’ nearest city in geospatial terms
mountain range	which mountain range the mountain belongs to (a mountain range is a series of mountains arranged in a line and connected by high ground)
mouth mountain	where the body of water flows into a mountain
mouth place	where the body of water flows into a place
parent mountain peak	a peak’s parent as a particular peak in the higher terrain connected to the peak
outflow	a sink of the body of water
inflow	a source of the body of water
broadcast area	a place served by a radio station
river mouth	where the river flows into a lake, reservoir, sea, ocean, or another river
river	a river located in or meets at the place
location city	where the organization is located in a city
mouth region	where the body of water flows into a region
crosses	where the bridge crosses a river
major island	which small major islands the island has
mouth country	where the body of water flows into a country
island	an island belongs to or contains the place
right tributary	a stream or river that flows into its right larger stream or main stem (or parent) river or a lake
left tributary	a stream or river that flows into its left larger stream or main stem (or parent) river or a lake

Table 6. Results of different ratios of λ in MSEN-GRP.

Network Name	$λ_{B}$ : $λ_{W}$ : $λ_{S}$ : $λ_{A}$ : $λ_{P}$	MRR (Raw)	MRR (Filter)	Hits@10 (Raw)	Hits@10 (Filter)
The basis network	1:0:0:0:0	0.0011	0.0011	0.0002	0.0002
Lexical-similarity network	1:1:0:0:0	0.2891	0.3877	0.4520	0.5452
Spatial-similarity network	1:0:1:0:0	0.2925	0.3934	0.4654	0.5582
Structural-similarity network	1:0:0:1:0	0.2167	0.2857	0.3762	0.4367
Attribute-similarity network	1:0:0:0:1	0.1115	0.1313	0.2938	0.3278
All enhanced networks	1:1:1:1:1	0.2761	0.3750	0.4784	0.5759

Table 7. Comparison of experimental results.

Method	MRR (Raw)	MRR (Filter)	Hits@10 (Raw)	Hits@10 (Filter)
TransE	0.0815	0.0959	0.2240	0.2471
RESCAL	0.0609	0.0619	0.1344	0.1352
DistMult	0.0013	0.0015	0.0022	0.0022
ProjE	0.1183	0.1487	0.2254	0.2759
TKRL	0.1089	0.1304	0.2844	0.3298
MSEN-GRP	0.2761	0.3750	0.4784	0.5759

Table 8. The top 5 results in the geo-relation prediction case (all websites were accessed on 14 September 2022).

Geo-entity	Geo-relation	Geo-entity
https://en.wikipedia.org/wiki/Wuyi_Mountains	http://dbpedia.org/ontology/nearestCity	https://en.wikipedia.org/wiki/Shangrao
https://en.wikipedia.org/wiki/Wuyi_Mountains	http://dbpedia.org/ontology/mouthMountain	https://en.wikipedia.org/wiki/Shangrao
https://en.wikipedia.org/wiki/Wuyi_Mountains	http://dbpedia.org/ontology/locatedInArea	https://en.wikipedia.org/wiki/Shangrao
https://en.wikipedia.org/wiki/Wuyi_Mountains	http://dbpedia.org/ontology/mountainRange	https://en.wikipedia.org/wiki/Shangrao
https://en.wikipedia.org/wiki/Wuyi_Mountains	http://dbpedia.org/ontology/locationCity	https://en.wikipedia.org/wiki/Shangrao

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, Z.; Qiu, P.; Yu, L.; Lu, F. MSEN-GRP: A Geographic Relations Prediction Model Based on Multi-Layer Similarity Enhanced Networks for Geographic Relations Completion. ISPRS Int. J. Geo-Inf. 2022, 11, 493. https://doi.org/10.3390/ijgi11090493

