ISLKG: The Construction of Island Knowledge Graph and Knowledge Reasoning

He, Qi; Yu, Chenyang; Song, Wei; Jiang, Xiaoyi; Song, Lili; Wang, Jian

doi:10.3390/su151713189



¹ College of Information Technology, Shanghai Ocean University, Shanghai 201306, China

² National Marine Data and Information Service, Tianjin 300171, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(17), 13189; https://doi.org/10.3390/su151713189

Submission received: 17 July 2023 / Revised: 24 August 2023 / Accepted: 28 August 2023 / Published: 1 September 2023

(This article belongs to the Section Sustainable Oceans)


Abstract

Islands, which combine land and sea characteristics, provide a foundation for protecting the marine environment, preserving the ecological balance of the ocean, and fostering sustainable economic and social growth. Advanced monitoring technologies have increased the collection of multi-source island data, but the isolation and insufficiency of these data hinder island development and management. To form a coherent and complete understanding of islands, the multi-source data must be converted into knowledge. This paper proposes an island knowledge graph construction method that combines an entity dictionary with rule patterns, and builds the island knowledge graph (ISLKG) top-down. An ontology layer is first created to standardize island knowledge; entities and relationships are then collected, transformed, and extracted from multi-source data to construct the island knowledge graph. Next, a knowledge reasoning model based on knowledge graph embedding is used for knowledge completion, improving ISLKG. Finally, the knowledge inference model is verified on the constructed island knowledge graph. The results indicate that the model can effectively predict missing entities and complete the island knowledge graph.

Keywords:

island; domain knowledge graph; ontology; knowledge reasoning

1. Introduction

Islands are an essential component of marine territory, serving as a foundation and pivot for extending land into the ocean. They also play a critical role in securing national maritime rights and facilitating marine economic growth [1]. Owing to scarce resources, unique geographical locations, and fragile ecosystems, islands face significant challenges to their sustainable development [2]. Achieving sustainable development requires protecting island resources and ecological environments and coordinating ecological and economic development on islands [3]. This demands a consistent, precise, and organized system of island information, yet the available data on islands are fragmented and insufficient. A unified, systematic, and intelligent framework of island knowledge is therefore urgently needed to support island conservation, management, development, and exploitation.

The knowledge graph, a structured representation of knowledge, is essentially a large-scale semantic network of entities, concepts, and the relationships between them. It converts human cognitive information into forms that machines can read, process, and present, enhancing the capacity to organize, manage, and comprehend vast quantities of data. It has proven effective in a variety of intelligent applications, including knowledge retrieval, integration, and analysis. As a vertical knowledge graph for a specific field, an island knowledge graph can integrate, interconnect, and reason about island knowledge, thereby supporting the sustainable development of islands.

Even though a knowledge graph for the island domain contains a great deal of information about islands, it is inherently incomplete [4]. The lack of data in the island domain further exacerbates this incompleteness. Missing key entities, such as island locations, ports, and shipping routes, and missing links between islands and these entities substantially hinder the development, protection, and management of islands and the application of island knowledge graphs. Knowledge reasoning can infer missing entities and relationships in a knowledge graph and deduce implicit relationships between entities, so it is often used to complete knowledge graphs. Among knowledge reasoning methods, knowledge graph embedding methods, which represent entities and relationships as vectors in a multi-dimensional space, perform strongly on knowledge completion tasks and have become an active research topic.

Currently, there are few knowledge graphs in oceanography, and none has been constructed for islands. The heterogeneous and hard-to-obtain nature of island data further complicates the construction of an island knowledge graph. The first challenge is extracting knowledge from multi-source heterogeneous data: island entities are scattered across different data sources, and the types of relationships between them are complex. The second challenge is that the protection, management, and development of islands require detailed and accurate data, which places high demands on the accuracy and completeness of the data.

This paper presents a method for constructing an island knowledge graph from multi-source island data, and completes and verifies it using an embedding-based knowledge representation model. The specific contributions are as follows:

The composition and construction framework of the island knowledge graph (ISLKG) is proposed, and the island knowledge graph’s ontology is designed. The framework can also serve as a reference for knowledge graphs in other related fields.
A knowledge extraction model that combines an entity dictionary with rule patterns is applied to unstructured island data to achieve structured storage of island knowledge.
An embedding-based knowledge representation model is used for knowledge reasoning on the island knowledge graph, completing the graph's missing knowledge. In addition, two new evaluation indicators are proposed to assess their impact on link prediction results.

This paper’s structure is as follows. Section 2 examines the evolution of knowledge graphs, knowledge graphs in the ocean domain, and knowledge-reasoning-related research. Section 3 presents the overall architecture and construction process of ISLKG, an embedded knowledge reasoning model, experimental results, and ISLKG-based application scenarios. In Section 4 and Section 5, the results of the research are examined and summarized, and future work directions are offered.

2. Related Works

The term “knowledge graph” was coined by Google in 2012. A knowledge graph is a structured semantic knowledge base that describes concepts in the physical world and their interrelationships in symbolic form. Its basic unit is the “<entity, relationship, entity>” triplet, and entities are interconnected through relationships to form a networked knowledge structure. Since then, many enterprises and research institutions have explored knowledge graphs. Most early efforts produced general-purpose knowledge graphs, of which YAGO [5], DBpedia [6], Freebase [7], and Wikidata [8] are the most representative. Although these large-scale general knowledge graphs capture vast quantities of data, many areas remain uncovered. As technologies such as artificial intelligence, big data, and cloud computing mature, researchers are increasingly building knowledge graphs for specific fields to meet industry demands. Within their own topics, domain knowledge graphs have advantages over general knowledge graphs: they can extract, organize, and manage knowledge from massive datasets more precisely, enhancing the quality of domain information services [9]. However, domain knowledge graphs are often built manually, which requires substantial human and financial resources [10]. Domain knowledge graphs are now expanding quickly across many fields. In geography, a field closely associated with oceanography, Chen et al. [11] constructed a geographic knowledge graph through crowdsourcing; Li et al. [12] constructed a GIR-oriented Chinese geographic knowledge base in the same year; Wang et al. [13] proposed a formalized geographic knowledge representation and constructed the geographic knowledge graph GeoKG; and Guo et al. [14] proposed a method for constructing a geographic knowledge graph from multi-source data.
In addition to the geosciences, domain knowledge graphs are expanding rapidly in other fields as well. Xiao et al. [15] created a meteorological simulation knowledge graph using deep learning techniques; Tan et al. [16] constructed an urban traffic knowledge graph and augmented it using knowledge reasoning.

Although knowledge graph technology and theory are progressing rapidly, knowledge graphs have not yet been integrated with the marine domain comprehensively enough for the technology to yield further insight there. Indeed, large-scale research on marine knowledge graphs covering various facets of oceanography has emerged only in recent years [17]. In 2019, Zhang et al. [18] constructed a marine dangerous goods transportation knowledge graph (KGMDG), introducing knowledge graphs to this field. Based on the Chinese Medical Dictionary and the CNKI database, Liu et al. [19] constructed a knowledge graph for marine traditional Chinese medicine in 2021. In 2022, Wu et al. [20] constructed a marine expert management knowledge graph based on TrellisNet-CRF. In the island domain, studies focus mostly on island development, exploitation, and environmental conservation. No island knowledge graph has yet been built: almost no work addresses constructing a knowledge graph for islands or mining implicit relationships between island entities, which hinders the use of island data for development, utilization, and environmental protection, and for promoting sustainable island development.

Although current domain knowledge graphs contain vast quantities of information, they are incomplete and lack a significant amount of data, which inevitably degrades the quality of the knowledge graph and its downstream applications [21]. In addition, domain knowledge graphs contain implicit links between entities that have yet to be discovered. Consequently, many researchers have studied knowledge graph completion extensively. Current research focuses on using knowledge graph embeddings to represent knowledge, in order to carry out knowledge reasoning, complete and improve knowledge graphs, and uncover implicit relationships between entities. Knowledge graph embedding maps the entities and relations of a knowledge graph into a dense, low-dimensional feature space in which semantic relationships between entities can be computed efficiently, which addresses the problems of computational complexity and data sparsity [22]. Numerous knowledge graph embedding models have been proposed; they are commonly grouped into distance models, semantic matching models, and neural network models.

TransE [23] is the foundational translation-based knowledge graph embedding model. Inspired by the translation invariance observed in word2vec [24] word vectors, Bordes et al. [23] brought translation into knowledge graph embedding. TransE embeds all entities and relationships in a unified, continuous, low-dimensional feature space, where each relationship is treated as a translation vector from the head entity to the tail entity (h + r ≈ t), thereby modeling the knowledge graph geometrically. Although TransE made tremendous progress in embedding large-scale knowledge graphs, it struggles with complex relationships such as one-to-many, many-to-one, and many-to-many relations [25]. TransH [26] introduces relation-specific hyperplanes onto which entities are projected to handle such relationships, but at a higher computational cost. Beyond the distance models of the Trans series, RESCAL [27] is the first semantic matching model; it is based on tensor decomposition and models triplets as a three-dimensional tensor. DistMult [28] restricts the relation matrix in RESCAL to a diagonal matrix, greatly reducing the number of parameters. However, because a diagonal matrix is symmetric, DistMult cannot handle asymmetric relationships. To solve this issue, ComplEx [29] introduces complex-valued embeddings into the multiplication. Among neural network models, ConvE [30] was the first to incorporate convolutional neural networks into knowledge graph embedding. ConvE reshapes the embeddings of the head entity and the relation into two-dimensional matrices and concatenates them; a convolutional layer encodes this input, a fully connected layer projects the resulting features, and the score of a triplet is obtained from the similarity (inner product) between this projection and the tail entity embedding.
ConvE has strong learning ability and performs well and consistently across many benchmark tests, which makes it well suited to knowledge reasoning on domain knowledge graphs.
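The scoring functions of three of the models above can be sketched on plain Python lists to make the symmetry argument concrete. This is a minimal illustration with hand-picked toy vectors; the function names are ours, and real implementations batch these computations over learned embeddings. It shows that TransE's score depends on direction, that DistMult gives (h, r, t) and (t, r, h) identical scores, and that ComplEx, by conjugating the tail embedding, can score them differently.

```python
import math

def transe_score(h, r, t, norm=1):
    """TransE: -||h + r - t|| under the L1 or L2 norm; higher means more plausible."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diff)
    return -math.sqrt(sum(d * d for d in diff))

def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i (diagonal relation matrix)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def complex_score(h, r, t):
    """ComplEx: Re(sum_i h_i * r_i * conj(t_i)) with complex-valued embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real

# Toy embeddings (hand-picked, not learned).
h, r, t = [1.0, 2.0], [0.5, -1.0], [1.5, 1.5]
print(transe_score(h, r, t), transe_score(t, r, h))      # -0.5 -2.5 (direction matters)
print(distmult_score(h, r, t), distmult_score(t, r, h))  # -2.25 -2.25 (always symmetric)
hc, rc, tc = [1 + 1j], [1j], [1 + 0j]
print(complex_score(hc, rc, tc), complex_score(tc, rc, hc))  # -1.0 1.0
```

Because multiplication commutes, swapping h and t never changes a DistMult score, which is exactly why it cannot model asymmetric relations.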

Knowledge reasoning based on knowledge graph embedding has mostly been applied to general knowledge graphs, and few papers use embedding models for knowledge reasoning on constructed domain knowledge graphs. This article therefore applies the knowledge graph embedding model ConvE to the existing information in the island knowledge graph, with the goals of completing the constructed ISLKG and discovering implicit links between island entities.
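For orientation, the ConvE forward pass described above can be sketched in pure Python for a toy embedding size. This is a simplified illustration under our own assumptions (a single hand-set filter, no dropout, batch normalization, or bias terms, 4-dimensional embeddings reshaped to 2 × 2); it is not the configuration used for ISLKG, where all weights are learned.

```python
import math

def reshape(vec, rows, cols):
    """Reshape a flat embedding into a rows x cols matrix (row-major order)."""
    if len(vec) != rows * cols:
        raise ValueError("embedding size must equal rows * cols")
    return [vec[i * cols:(i + 1) * cols] for i in range(rows)]

def conv2d(image, kernel):
    """Single-filter 2D convolution (cross-correlation, as in CNN libraries), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def conve_score(head, rel, tail, kernel, fc_weights, rows, cols):
    """Probability that (head, rel, tail) holds, ConvE-style:
    sigmoid(relu(vec(relu([h_2d; r_2d] * kernel)) W) . t)."""
    stacked = reshape(head, rows, cols) + reshape(rel, rows, cols)  # stack h over r
    features = relu([x for row in conv2d(stacked, kernel) for x in row])  # flatten
    projected = relu([sum(w * x for w, x in zip(w_row, features)) for w_row in fc_weights])
    logit = sum(p * t for p, t in zip(projected, tail))  # inner product with the tail
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 4-dimensional embeddings; every weight below is hand-set for illustration.
head, rel = [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]
kernel = [[1.0, 0.0], [0.0, 1.0]]
fc_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
print(conve_score(head, rel, [1.0, 0.0, 0.0, 0.0], kernel, fc_weights, 2, 2))  # ≈ 0.881
```

In practice the score is computed against every candidate tail at once (1-N scoring), and the entity with the highest probability is proposed to fill the missing slot.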

3. Materials and Methods

This section presents the construction method of ISLKG, including the composition and construction framework of ISLKG, ontology design, data collection, knowledge extraction, knowledge fusion, and storage; it then presents the knowledge reasoning model based on embedding and applies it to the island dataset. The model is then validated, followed by the introduction of an ISLKG-based application case.

3.1. ISLKG Construction Techniques

3.1.1. Overall Framework

Unlike general knowledge graphs, which are usually built “bottom-up”, the island knowledge graph is constructed “top-down” in two stages: constructing the ontology layer and then populating the instance layer. Figure 1 depicts the overall structure of ISLKG. The ontology layer represents entity types, attributes, and the types of relationships between entities; the instance layer instantiates the designed ontology according to the concept hierarchy, mapping rules, and constraints established in the ontology layer.
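One way to picture how the ontology layer constrains the instance layer is a relation schema that fixes the entity types each relation may connect, against which candidate instance triples are checked. The sketch below assumes this domain/range style of constraint; the type names and the three relations are a hypothetical fragment built from the example triples in this section, not the actual ISLKG ontology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationSchema:
    """Ontology-layer definition of a relation and the concept types it connects."""
    name: str
    head_type: str
    tail_type: str

# Hypothetical ontology fragment (type and relation names are illustrative only).
ONTOLOGY = {
    "former name": RelationSchema("former name", "Island", "IslandName"),
    "belongs to sea area": RelationSchema("belongs to sea area", "Island", "SeaArea"),
    "bridge": RelationSchema("bridge", "Island", "Bridge"),
}

def validate(triple, entity_types, ontology=ONTOLOGY):
    """Return True if an instance triple conforms to the ontology layer."""
    head, relation, tail = triple
    schema = ontology.get(relation)
    if schema is None:
        return False  # relation not defined in the ontology
    return (entity_types.get(head) == schema.head_type
            and entity_types.get(tail) == schema.tail_type)

entity_types = {
    "Changxing Island": "Island",
    "Changsheng Island": "IslandName",
    "East China Sea": "SeaArea",
}
print(validate(("Changxing Island", "former name", "Changsheng Island"), entity_types))  # True
print(validate(("East China Sea", "former name", "Changxing Island"), entity_types))     # False
```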

Figure 2 depicts the construction framework of ISLKG. First, the ontology layer defines a set of concepts, relationship types, attributes, mapping rules, entity constraints, and concept hierarchies, which guide instance-layer tasks such as data collection and knowledge extraction. Second, the instance layer collects multi-source island data according to the ontology specifications. Structured island databases are collected through mapping; crawlers are used to collect semi-structured data, such as web pages and JSON files, and unstructured data, such as web page texts and books. Data from the three kinds of sources are acquired, processed, and quality-controlled independently, according to the requirements of each source. If the quality evaluation fails, data processing is repeated; otherwise, knowledge fusion is performed. Finally, the fused triplets are stored in the Neo4j graph database, completing the island knowledge graph.
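Storing fused triplets in Neo4j amounts to one write per triple. The sketch below only builds parameterized Cypher `MERGE` statements; the `Entity` label and the choice to use each relation name as the relationship type are our assumptions, not the paper's stated schema. Entity names go in query parameters, but Cypher does not accept a relationship type as a parameter, so the relation name is backtick-quoted instead.

```python
def quote_identifier(name):
    """Backtick-quote a Cypher identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def merge_statement(head, relation, tail):
    """Build one parameterized Cypher MERGE for an island triple.

    The label Entity is a hypothetical choice. MERGE (rather than CREATE)
    makes re-loading the same triple idempotent."""
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{quote_identifier(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}

triples = [
    ("Changxing Island", "former name", "Changsheng Island"),
    ("Nan’ao Island", "bridge", "Nan’ao Bridge"),
]
for query, params in (merge_statement(*t) for t in triples):
    print(query, params)
```

With the official `neo4j` Python driver, each `(query, params)` pair could be passed to `session.run(query, params)`; for large loads, batching rows through `UNWIND` is usually faster than one round trip per triple.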

Correspondingly, Figure 3 illustrates the construction and knowledge inference process of ISLKG. The first step is designing the island ontology: starting from the existing structured knowledge framework of the island domain, the ontology is constructed semi-automatically with Protégé, a tool developed at Stanford University. The second step is collecting multi-source data, mainly structured, semi-structured, and unstructured data, under the constraints of the island ontology; these data types and their collection methods are explained later in this paper. The third step is an initial cleansing of the collected data, which mainly removes empty, erroneous, and corrupted text from the unstructured data. The data for each island are then merged and de-duplicated to compile the Island Text Corpus. Knowledge extraction is subsequently performed on the cleansed Island Text Corpus: following the entity categories specified by the ontology, entities of all categories are extracted for each island to form triples, the basic units of the island knowledge graph. A second round of cleansing then removes triples containing erroneous, corrupted, or illegitimate entities.
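The second cleansing round can be pictured as a normalize, filter, and de-duplicate pass over the extracted triples. The paper does not give its exact rules, so the checks below (NFKC normalization, rejecting empty strings, U+FFFD replacement characters left by bad decoding, overlong names, and strings with no letters or digits) are assumed heuristics for illustration.

```python
import unicodedata

def normalize(text):
    """NFKC-normalize and collapse whitespace. NFKC folds full-width characters
    common in Chinese text, but note it also rewrites symbols such as ² to 2."""
    return " ".join(unicodedata.normalize("NFKC", text).split())

def is_legitimate(entity, max_len=50):
    """Assumed heuristic: non-empty, no U+FFFD, not overlong, has a letter or digit."""
    return (bool(entity)
            and "\ufffd" not in entity
            and len(entity) <= max_len
            and any(ch.isalnum() for ch in entity))

def clean_triples(triples):
    """Normalize, filter, and de-duplicate (head, relation, tail) triples, keeping order."""
    seen, cleaned = set(), []
    for head, rel, tail in triples:
        triple = (normalize(head), normalize(rel), normalize(tail))
        if not all(is_legitimate(x) for x in triple) or triple in seen:
            continue
        seen.add(triple)
        cleaned.append(triple)
    return cleaned

raw = [
    ("Changxing Island", "former name", "Changsheng Island"),
    ("Changxing  Island", "former name", "Changsheng Island"),  # extra space
    ("Ｃhangxing Island", "former name", "Changsheng Island"),  # full-width C
    ("Changxing Island", "bridge", "\ufffd\ufffd"),             # decoding damage
    ("", "former name", "Changsheng Island"),                   # empty head
    ("Changxing Island", "bridge", "---"),                      # no letters or digits
]
print(clean_triples(raw))  # one triple survives
```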

The cleaned triples then undergo knowledge fusion, primarily using the Dedupe tool for entity alignment and for merging duplicate and semantically similar triples, which improves the data quality and usability of ISLKG. The aligned ISLKG is then subjected to quality evaluation, which currently has two criteria: the accuracy of triples sampled from ISLKG queries must exceed 80%, and the knowledge inference model must pass validation. If both criteria are met, the knowledge is stored in the graph database; otherwise, more data are collected to augment the knowledge graph. At this point, ISLKG is constructed but still incomplete, so knowledge inference is applied to enhance it. The quality of the augmented knowledge graph is then evaluated, and the validation results determine whether further data collection is required. Once the validation results meet the criteria, the ISLKG inference process is complete.
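The first quality criterion, sampled accuracy above 80%, can be sketched as a reproducible random audit. In this sketch `is_correct` is a hypothetical stand-in for the manual verification of each sampled triple, and the default sample size of 100 is an arbitrary choice, not a value from the paper.

```python
import random

def sampled_accuracy(triples, is_correct, sample_size, seed=0):
    """Estimate triple accuracy from a uniform random sample drawn without replacement.

    triples must be a sequence; is_correct stands in for manual verification."""
    if not triples:
        raise ValueError("no triples to sample")
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    sample = rng.sample(triples, min(sample_size, len(triples)))
    return sum(1 for t in sample if is_correct(t)) / len(sample)

def passes_quality_gate(triples, is_correct, sample_size=100, threshold=0.8, seed=0):
    """True if sampled accuracy strictly exceeds the threshold (80% in the text)."""
    return sampled_accuracy(triples, is_correct, sample_size, seed) > threshold

# Toy audit: 9 of 10 triples are marked correct.
triples = [(f"island{i}", "rel", "x") for i in range(10)]
gold = set(triples[:9])
print(passes_quality_gate(triples, lambda t: t in gold))
```

A small sample gives a noisy estimate, so in practice the sample size should be chosen with the desired confidence in the 80% threshold in mind.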

3.1.2. Ontology Construction of Island Knowledge Graph

An ontology is a precise, formal, and systematic definition of the concepts in a particular domain and their relationships [31]. It describes the data schema of a knowledge graph and is essentially its uppermost layer. The ontology of the island knowledge graph contains the concepts, relationships, and properties of the island domain, as well as the concept hierarchy, concept constraints, and mapping rules. The island ontology library has five first-level concepts: basic information, social information, scientific research activities, infrastructure, and natural properties. Figure 4 depicts the hierarchical structure of the island ontology library, in which numerous second- and third-level concepts fall under each first-level concept.
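The concept hierarchy can be represented as a nested mapping and searched to find where a concept sits. In the sketch below, the five first-level concepts follow the text and the lower-level entries are an excerpt drawn from the descriptions in this section; the intermediate "Transportation" grouping is our own illustrative assumption, and the full hierarchy is the one shown in Figure 4.

```python
# First-level concepts follow the text; lower levels are an illustrative excerpt.
ISLAND_ONTOLOGY = {
    "Basic information": {"Former name": {}, "Classification": {}, "Sea area": {}},
    "Social information": {"Hospital": {}, "School": {}, "Tourist attraction": {}},
    "Research activity": {"Researcher": {}, "Paper": {}, "Research field": {}},
    "Infrastructure": {"Transportation": {"Port": {}, "Anchorage": {}, "Wharf": {}}},
    "Natural property": {
        "Natural resources": {
            "Biological resources": {"Marine biota": {}, "Terrestrial biota": {}},
        },
        "Natural environment": {"Climate": {}, "Hydrology": {}},
    },
}

def concept_path(tree, concept, path=()):
    """Depth-first search; returns the path from a first-level concept, or None."""
    for name, children in tree.items():
        current = path + (name,)
        if name == concept:
            return current
        found = concept_path(children, concept, current)
        if found:
            return found
    return None

print(concept_path(ISLAND_ONTOLOGY, "Marine biota"))
```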

1. Basic Information

Basic information is the most fundamental category of island knowledge, covering concepts such as former names, current names, classification, and the sea area to which an island belongs. Table 1 details the concept names, entity examples, and concept constraints for basic island information, and Table 2 outlines the basic properties of an island, which also belong to its basic information.

2. Social Information

Island social information refers to the information that indicates the environment and mode of human social activities on the island. Table 3 presents the main concepts of social information, such as businesses, hospitals, schools, tourist attractions, administrative agencies, and social activities.

3. Research Activity

As shown in Table 4, research activities refer to scientific research conducted by scholars on the island, including the names of researchers, papers related to the island, and the island's research fields.

4. Infrastructure

Infrastructure refers to the basic engineering facilities on an island that provide public services for social production and residents' daily lives, such as ports, anchorages, wharves, waterways, and other transportation facilities that are crucial to the island's economy. As Table 5 shows, the island ontology library also includes other concepts important to an island, such as water conservancy and electric power.

5. Natural Property

Natural properties describe the natural resources and natural environment of the island and its surrounding waters. As indicated in Table 6, natural resources include land resources, water resources, biological resources, marine resources, vegetation, etc. The natural environment consists of the island's terrestrial ecosystem, the surrounding sea environment, and natural disasters, and also covers topics such as climate, meteorology, and hydrology. Table 7 lists the entity examples and concept constraints of the natural environment.

Within natural resources, the biological resources of an island encompass the plants, animals, and microorganisms of both its terrestrial environment and its surrounding waters, so they are further divided into terrestrial and marine biota. Terrestrial biota includes the animal and plant species that inhabit the island, such as capercaillie, brown bears, and bats among animals, and palms, coconut trees, and dandelions among plants. Marine biota consists of the plants, animals, and microorganisms in the waters surrounding the island, such as salmon, seaweed, and cyanobacteria.

Beyond biological resources, the vegetation information of islands is categorized separately. This is because the plant resources of islands primarily serve to highlight the diversity of plant species in the region, whereas vegetation information focuses more on the overall characteristics and distribution of plant communities. For instance, island vegetation information may encompass categories like moss, shrub, and mangrove forest.

In marine meteorology and climate, ocean hydrology, and marine chemistry, the natural properties of an island include many entities associated with numerical data, such as air temperature, surrounding sea temperature, pH, sunshine, and wind speed. Because these observation data are mostly stored in structured form in databases, mapping rules must be established to extract semantic concepts from them. The mapping rules have two parts: the form (structure) of the conceptual mapping and the mapping method.

The island ontology library mainly defines the structure of the mapped concepts. For example, to map the concept “average monthly temperature”, the hourly temperatures of an island entity are first averaged per day, the daily values are then averaged over the month, and the resulting triplets are recorded. The mapping method is detailed under structured data collection in Section 3.1.3.
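The hour-to-day-to-month aggregation behind “average monthly temperature” can be sketched as follows. The input tuple format and the relation string used in the output triples are hypothetical choices for illustration; the averaging order follows the mapping described above.

```python
from collections import defaultdict
from statistics import mean

def monthly_mean_temperature(readings):
    """readings: iterable of (date 'YYYY-MM-DD', hour, temperature).

    Hourly values are averaged per day first, then daily means per month.
    Returns {'YYYY-MM': monthly mean} in month order."""
    by_day = defaultdict(list)
    for date, _hour, temp in readings:
        by_day[date].append(temp)
    by_month = defaultdict(list)
    for date, temps in by_day.items():
        by_month[date[:7]].append(mean(temps))
    return {month: mean(daily) for month, daily in sorted(by_month.items())}

def monthly_triples(island, readings):
    """Record each monthly mean as a triple (the relation naming is hypothetical)."""
    return [(island, f"average monthly temperature ({month})", value)
            for month, value in monthly_mean_temperature(readings).items()]

readings = [  # toy values
    ("2022-01-01", 0, 2.0), ("2022-01-01", 12, 4.0),
    ("2022-01-02", 0, 6.0),
    ("2022-02-01", 0, 1.0),
]
print(monthly_mean_temperature(readings))  # {'2022-01': 4.5, '2022-02': 1.0}
```

Averaging daily means is not the same as averaging all hourly readings when days have different numbers of observations: here January gives 4.5 rather than (2.0 + 4.0 + 6.0) / 3 = 4.0, which is one reason the aggregation order belongs in the mapping rule.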

6. Relationship Definition

In addition to concepts, knowledge graphs include relationships between entities. Since island knowledge is a subset of geographic knowledge, relationships between island entities inherit the general relationships between geographic entities and are divided into two types: spatial and semantic. Spatial relationships are subdivided into distance, orientation, and topological relationships, while semantic relationships are subdivided into entity attribute relationships and objective attribute relationships.

Topological relationships remain invariant under topological transformations such as rotation, scaling, and translation [32]. Topological relationships between entities include intersection, disjoint, containment, inside, equality, overlap, and tangency. The island knowledge graph also represents orientation relationships, i.e., the orientation of one island relative to another: east, west, south, north, northeast, northwest, southeast, or southwest. Distance relationships in the island knowledge graph are either quantitative or qualitative. The quantitative distance is the straight-line geographical distance between two islands. The qualitative distance describes four levels of perceived distance: quite near, near, far, and quite far, corresponding to quantitative distances greater than 25 km, 250 km, 500 km, 1200 km, and 1200 km, respectively.
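The quantitative distance and the eight-way orientation between two islands can be derived from coordinates. This sketch makes assumptions the text does not state: each island is reduced to a single representative point, the Earth is treated as a sphere (mean radius 6371 km, so “straight-line” becomes great-circle distance), and orientation is taken from the initial bearing split into 45° sectors. Qualitative distance levels are left out because their thresholds are defined separately above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360

DIRECTIONS = ["north", "northeast", "east", "southeast",
              "south", "southwest", "west", "northwest"]

def orientation(lat1, lon1, lat2, lon2):
    """Eight-way orientation of point 2 relative to point 1 (45-degree sectors)."""
    bearing = initial_bearing(lat1, lon1, lat2, lon2)
    return DIRECTIONS[int((bearing + 22.5) // 45) % 8]

print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))  # about 111.19 km per degree of latitude
print(orientation(0.0, 0.0, 0.0, 1.0))              # east
```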

The relationships between an island and its basic attributes express membership, subordination, or inclusion. Examples include the triples “<Changxing Island, former name, Changsheng Island>”, “<Changxing Island, the sea area to which it belongs, the East China Sea>”, “<Nan’ao Island, Bridge, Nan’ao Bridge>”, “<marine industry, including, coastal tourism>”, and “<climate, including, precipitation>”.

3.1.3. Data Collection and Preprocessing

The first stage in generating the instance layer of the island knowledge graph is collecting and processing multi-source island data. Because instance-layer tasks must be guided by the ontology layer, data collection and processing must follow the data formats, concept constraints, and mapping rules. By data format, island data sources are classified as structured, semi-structured, or unstructured, and a different collection technique is used for each. The following subsections describe the features and collection techniques of the three data sources in detail.

1. Structured Data

Structured data, also referred to as row data, are logically represented by a two-dimensional table structure, strictly follow data format and length specifications, and are primarily stored and managed in relational databases. Structured data account for a fairly large share of multi-source island data and are a very important source for the island knowledge graph. Observational data on marine hydrology, marine water quality, marine climate, and meteorology, such as temperature, humidity, salinity, pH, dissolved oxygen, and nitrate, are rarely semi-structured or unstructured; most are kept in relational databases or other structured sources. Converting structured data into island knowledge graph triples is therefore crucial. Collecting structured data mainly consists of designing a conceptual mapping method suited to the structure of each dataset and converting the data into the conceptual structure required by the mapping rules of the island ontology library. Different mapping methods are used for different kinds of structured island data. For instance, structured island data in a MySQL relational database are converted into knowledge graph triples with a MySQL data conversion tool, whereas structured data in ASCII format are mapped according to the character positions of the concepts to be extracted. Figure 5 shows an example of the mapping method for data in this format.
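The row-to-triple mapping for relational data can be sketched with Python's built-in `sqlite3` standing in for MySQL. The table, its columns, the island names, and the column-to-relation mapping are all hypothetical toy values; the point is that each non-null cell becomes one `(island, relation, value)` triple.

```python
import sqlite3

def rows_to_triples(conn, table, subject_col, column_to_relation):
    """Map every row of a relational table to (island, relation, value) triples.

    Table and column names come from our own mapping (not user input) and are
    double-quoted as SQL identifiers, since identifiers cannot be bound as
    query parameters."""
    columns = [subject_col, *column_to_relation]
    select = ", ".join(f'"{c}"' for c in columns)
    triples = []
    for subject, *values in conn.execute(f'SELECT {select} FROM "{table}"'):
        for column, value in zip(column_to_relation, values):
            if value is not None:  # skip missing observations
                triples.append((subject, column_to_relation[column], value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (island TEXT, salinity REAL, ph REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)",
                 [("Island A", 31.5, 8.1), ("Island B", None, 7.9)])  # toy values
print(rows_to_triples(conn, "obs", "island", {"salinity": "salinity", "ph": "pH value"}))
```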

The file shown in Figure 5 is derived from the Chinese observation station data of the National Oceanographic Data Center of China (http://mds.nmis.org.cn, accessed on 1 December 2022). The data are produced by marine meteorological stations through observation, collection, decoding, format checking, and coding; after conversion, standardization, automatic quality control, visual inspection, and calibration, the real-time marine meteorology, wave, temperature, and salinity data are standardized. The file is in ASCII format [33], and columns are delimited by character positions (e.g., characters 1–15). Characters 16–19 represent the date, followed from left to right by the hour, latitude, longitude, visibility, temperature, wind speed, and air pressure, as well as other marine meteorological data. Taking the temperature of Xiaochangshan Island as an example, the conceptual mapping proceeds as follows. First, the character positions holding the temperature are determined and the values are collected. Next, the average temperature of Xiaochangshan Island for a given month is calculated by averaging the hourly and then the daily temperatures of that month. Finally, the values are organized into the structure specified by the mapping rules of the island ontology library, and the triples describing the island's temperature are obtained.
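Reading such a fixed-width ASCII file comes down to slicing each line by character positions. In the sketch below, only the date field at characters 16–19 follows the text; the other positions, the station code, and the sample records are hypothetical, since the full layout of the data center's files is not given here.

```python
# Hypothetical fixed-width layout: (first, last) character positions, 1-based, inclusive.
# Only the date field at characters 16-19 comes from the text; the rest are illustrative.
LAYOUT = {
    "station": (1, 15),
    "date": (16, 19),
    "hour": (20, 21),
    "temperature": (22, 26),
}

def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width ASCII record into stripped string fields."""
    return {name: line[start - 1:end].strip() for name, (start, end) in layout.items()}

def temperature_readings(lines, station):
    """Yield (date, hour, temperature) for one station, skipping records with no temperature."""
    for line in lines:
        record = parse_line(line)
        if record["station"] == station and record["temperature"]:
            yield record["date"], int(record["hour"]), float(record["temperature"])

lines = [  # toy records built to match the hypothetical layout
    f"{'XIAOCHANGSHAN':<15}0115{8:02d}{3.5:>5}",
    f"{'XIAOCHANGSHAN':<15}0115{9:02d}{'':>5}",  # missing temperature
    f"{'OTHER':<15}0115{8:02d}{1.0:>5}",
]
print(list(temperature_readings(lines, "XIAOCHANGSHAN")))  # [('0115', 8, 3.5)]
```

The extracted hourly readings would then be averaged by day and by month before being mapped to triples.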

2. Semi-structured Data

Semi-structured data are data captured or formatted in non-standard ways. Lacking a predefined schema, they do not conform to the tabular model of a relational database, but they are not entirely raw either: they carry structural features, such as tags and organizational metadata, that facilitate analysis. Compared with structured data, semi-structured data are more flexible and easier to extend. Popular semi-structured formats include XML, HTML, and JSON. In the island knowledge graph, semi-structured data are well suited to supplementing and correcting basic concepts such as island area, location, and coastline length. Their primary source is the semi-structured section of encyclopedia pages. Taking the encyclopedia page of Nan’ao Island as an example, the page's semi-structured data consist of island information integrated and processed by the encyclopedia, and they are collected mainly with web crawlers. First, the island's encyclopedia page is obtained by searching for the island's standard name. Second, the HTML of the page is parsed to extract its semi-structured data. Finally, island triplets are constructed from the extracted data, such as <Nan’ao Island, area, 117.73 km²> and <Nan’ao Island, coastline length, 94.3 km>.
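The parsing step can be sketched with the standard library's `html.parser`. Real encyclopedia pages differ in markup, so this assumes a hypothetical infobox written as `<dt>name</dt><dd>value</dd>` pairs; fetching the page is not shown, and `attribute_map` (infobox label to ontology relation) is likewise illustrative.

```python
from html.parser import HTMLParser

class InfoboxParser(HTMLParser):
    """Collect <dt>name</dt><dd>value</dd> pairs (an assumed infobox markup)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current_tag = None
        self._key = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag:  # also captures text inside nested tags such as <b>
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return
        text = " ".join("".join(self._buffer).split())
        if tag == "dt":
            self._key = text
        elif self._key is not None:
            self.pairs.append((self._key, text))
            self._key = None
        self._current_tag = None

def infobox_triples(island, html, attribute_map):
    """Turn infobox pairs into triples, keeping only attributes the ontology maps."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return [(island, attribute_map[k], v) for k, v in parser.pairs if k in attribute_map]

html = ("<dl><dt>Area</dt><dd>117.73 km²</dd>"
        "<dt>Coastline length</dt><dd>94.3 <b>km</b></dd>"
        "<dt>Note</dt><dd>unmapped</dd></dl>")
print(infobox_triples("Nan’ao Island", html,
                      {"Area": "area", "Coastline length": "coastline length"}))
```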

3. Unstructured Data

This section focuses on acquiring unstructured island data; the method for extracting knowledge from them is discussed in Section 3.1.4. Unstructured data have an irregular or incomplete structure and no predefined data model, and they are inconvenient to express with the two-dimensional tables of a database. This category includes office documents of all types, text, images, HTML, reports, and audio and video data. Unstructured data come in numerous formats and standards and are technically harder to standardize and interpret than structured information. Compared with structured and semi-structured data, unstructured data are vast and rich in information. Most concepts in the island knowledge graph can be found in unstructured island data, which are also easy to obtain from a wide range of sources. However, unstructured data contain a substantial amount of content irrelevant to the target information, and accurately and thoroughly extracting the desired information from them has long been a challenging research problem that has attracted extensive academic study.

Most of the unstructured data for the island knowledge graph come from island-related books, journals, surveys, news articles, and websites. First, books, monographs, survey bulletins, and other paper-based materials are semi-automatically scanned into electronic text, while island-related web data, such as newspapers, news articles, and encyclopedia pages, are first identified manually. Crawler technology is then used to fetch and parse the HTML pages and collect the island’s unstructured text. Next, a series of data-processing operations, such as data cleaning and formatting, are applied to both types of collected text. Finally, the processed data are merged to build an island text library of unstructured data.
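The cleaning and formatting step can be sketched as follows. This is an illustrative assumption, not the authors’ pipeline: it strips HTML tags, decodes entities, collapses whitespace, and drops empty or exactly duplicated documents before they enter the text library. `clean_text` and `build_text_library` are hypothetical names.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # any HTML tag
SPACE_RE = re.compile(r"\s+")     # runs of whitespace, including non-breaking spaces


def clean_text(raw):
    """Strip HTML tags, decode entities such as &amp;, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return SPACE_RE.sub(" ", text).strip()


def build_text_library(documents):
    """Clean each document and keep the non-empty, first-seen ones in order."""
    seen = set()
    library = []
    for doc in documents:
        text = clean_text(doc)
        if text and text not in seen:
            seen.add(text)
            library.append(text)
    return library
```

Exact-match deduplication is the simplest choice; near-duplicate news reprints would need a fuzzier comparison.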

3.1.4. Knowledge Extraction

The second stage in building the instance layer of the island knowledge graph is to extract knowledge from the collected data using the island ontology library. After collection and processing, structured and semi-structured data can be converted directly into the triples required by the island knowledge graph because of their predefined structures, so no further knowledge extraction is needed for them. Processing the unstructured island data, by contrast, yields a text library containing rich information about islands, which cannot be loaded directly into a knowledge graph. The main objective of knowledge extraction is therefore to extract the abundant island information contained in the island text library and construct triples for the island knowledge graph.

Based on the characteristics of the island domain knowledge in the island text database and the grammatical patterns in which island-related concepts appear, we propose a knowledge extraction method that combines an entity dictionary with rule patterns to recognize entities in the island text database efficiently and exhaustively. The method is an unsupervised entity recognition technique. As Figure 6 shows, its recognition and extraction procedure consists of two parts: knowledge extraction based on rule patterns and knowledge extraction based on an entity dictionary. The former is primary and the latter supplementary: the rule patterns designed for the former supplement the entity dictionary of the latter, and the entity dictionary of the latter guides and assists the construction of the rule patterns. Together, they extract island knowledge from the text database to construct triples for the island knowledge graph. The following subsections describe the implementation of the two approaches and how they are combined.

1. Knowledge Extraction Based on Rule Patterns

Knowledge extraction based on rule patterns builds entity extraction rules by analyzing the grammatical composition and patterns of domain-specific texts and entities, and then applies those rules to extract entities. Its foundation is the construction of rule templates, which requires a thorough examination of how entity words or attribute values are formed in domain texts, including character-level word formation rules and part-of-speech combination rules. First, the texts in the island text database are examined extensively to identify and describe the regularities in how entities are presented and in their grammatical structure. Then, rule patterns are designed from these regularities and implemented as regular expressions for extraction. For instance, the pattern “\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$” matches email addresses, which typically take the form “name@domain.suffix”. Similarly, the template “\d{4}[年-]\d{1,2}[月-]\d{1,2}日” (年, 月, and 日 being the Chinese characters for year, month, and day) recognizes dates. Dates can also be extracted with a part-of-speech combination rule, which relies on word segmentation: the text containing time information is first segmented and tagged with Jieba, words tagged “m” (numeral) and “t” (time word) are then extracted, and after the validity of the date is verified, consecutive time words are taken as a date entity.
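The two patterns above can be tried directly with Python’s `re` module. The sketch below uses the restored email pattern and a variant of the date template in which the trailing 日 is made optional so that hyphenated dates also match (a deviation from the text); the validity check is approximated by letting `datetime.date` reject impossible dates. The Jieba part-of-speech route (tags “m” and “t”) is not reproduced here, and the function names are hypothetical.

```python
import re
from datetime import date

# Email pattern from the text, with its backslashes and repetition markers restored.
EMAIL_RE = re.compile(r"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$")

# Date template: 年/月/日 are the Chinese "year"/"month"/"day".
# Assumption: 日 is optional here so that 2023-09-01 also matches.
DATE_RE = re.compile(r"(\d{4})[年-](\d{1,2})[月-](\d{1,2})日?")


def is_email(token):
    """True if the whole token looks like an email address."""
    return EMAIL_RE.match(token) is not None


def extract_dates(text):
    """Return every date in the text that is also a real calendar date."""
    found = []
    for m in DATE_RE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # e.g. February 30 matches the pattern but is not a real date
    return found
```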

When building a rule pattern requires semantic support from the entity dictionary, the dictionary constructed by the second technique provides it. Suppose, for instance, that we need to determine from the island text database which islands have the marine industry of “seawater salt production and salt chemical industry”. After observing and summarizing how the relevant entities occur in the island text database and querying the related terms in the entity dictionary, we found that if an island has a seawater salt-making and salt chemical industry, the relevant professional terms are highly likely to appear in its text. Therefore, the professional terms for seawater salt making and the salt chemical industry in the entity dictionary are introduced into the rule pattern, yielding the entity extraction pattern “salt making | salt drying | salt mining | salt processing | raw salt | sea salt | sea salt production | electrodialysis | freezing | salt industry”. Once the rule pattern for the target entity has been built, it is used to extract the entities of interest from the island text database. The correctness of the extracted entities depends on the precision of the rule pattern’s design, its generalizability, and the quality of the island text library, so the extracted entities must undergo quality evaluation and data cleaning. At the end of rule-pattern-based extraction, a series of data-processing operations, such as quality evaluation, data cleaning, and data organization, are performed on the extracted entities to obtain the target triples of the concept being extracted.
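A glossary-driven alternation like the salt-industry pattern can be compiled programmatically rather than typed by hand. In this sketch, which assumes the glossary is a plain list of terms and the island texts are keyed by island name (both hypothetical), terms are escaped and ordered longest first so that “sea salt production” wins over its prefix “sea salt” at the same position.

```python
import re


def build_term_pattern(terms):
    """Compile 'term1|term2|...' from glossary terms, longest terms first."""
    unique = sorted(set(terms), key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in unique))


def islands_with_industry(texts, pattern):
    """Map each island to the sorted glossary terms found in its text."""
    hits = {}
    for island, text in texts.items():
        found = sorted({m.group(0) for m in pattern.finditer(text)})
        if found:
            hits[island] = found
    return hits
```

Islands whose text mentions none of the terms are simply absent from the result, which then goes through the quality evaluation and cleaning described above.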

2. Knowledge Extraction Based on Entity Dictionary

Matching against an entity dictionary is a common approach to unsupervised named entity recognition in natural language processing. Although an entity dictionary can cover only part of the entities in the target text, dictionary matching is still quite effective. With a domain-specific entity dictionary, dictionary-based matching can supplement and verify the rule-pattern approach, and when the dictionary covers the target entities well, its accuracy is very high.

The entity-dictionary-based approach in this article first builds an island entity dictionary by adapting the existing marine thesaurus with the help of marine-domain experts, and then expands it with the entity extraction rule patterns. The texts in the island text library are then segmented using this dictionary: each dictionary word is tagged with its name and part of speech to form a user dictionary that intervenes in Jieba word segmentation so that entity words are kept whole. This relies on the user dictionary function of the Jieba segmentation tool. Next, the segmentation results are evaluated, for example by checking them against the maximum matching method, to determine whether the island dictionary needs to be expanded. If it does, the rule patterns are modified or crawler technology is used to collect new terms, and the entity dictionary is expanded accordingly; otherwise, entities are extracted from the segmentation results according to their tags. Finally, the extracted entities undergo the same data processing as in the first technique to obtain the target triples of the concept being extracted.
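The maximum matching method mentioned above can be illustrated with forward maximum matching, which greedily takes the longest dictionary word at each position. The sketch is a stand-in for checking segmentation output, not Jieba itself (Jieba’s own user dictionary is loaded with `jieba.load_userdict`, one “word [frequency] [POS tag]” entry per line); the dictionary contents and tag names here are hypothetical.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation using the longest dictionary word.

    dictionary maps word -> tag; characters not covered by any dictionary
    word are emitted as single-character tokens.
    """
    max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def tagged_entities(tokens, dictionary):
    """Keep the tokens that are dictionary entities, paired with their tags."""
    return [(tok, dictionary[tok]) for tok in tokens if tok in dictionary]
```

Comparing these tokens with Jieba’s output is one way to spot entity words that the current user dictionary still splits apart.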

3. Knowledge Extraction Model Based on the Combination of Entity Dictionary and Rule Pattern

As shown in Figure 6, taking knowledge extraction for Sanmen Island as an example, the entity category and constraints of each word to be extracted are first determined from the ontology. For the extraction of longitude and latitude, for instance, the unstructured text is first segmented using the Jieba tool and the existing marine dictionary. The quality of segmentation depends on how well the marine dictionary covers the entities to be extracted from the text. Because the marine dictionary cannot fully cover all unstructured text related to island entities, the segmentation result is evaluated. If it contains a longitude and latitude that satisfy the entity constraints, the dictionary does not need to be expanded. Extraction is then performed, the triple <Sanmen Island, longitude and latitude, (East longitude 114°37′58″, North latitude 22°27′47″)> is constructed and stored in the knowledge graph, and the process moves on to the next word to be extracted.

If, however, the segmentation result contains no longitude and latitude, or the extracted entities do not satisfy the constraints, the segmentation is deemed ineffective and the dictionary must be expanded. In that case, the rule pattern method is applied. First, the syntax of longitude and latitude values is summarized and verified, namely the format X°Y′Z″, where X, Y, and Z are integers. Extraction patterns are then written as regular expressions, and the text preceding X determines whether the extracted value is a longitude or a latitude. The longitude and latitude of Sanmen Island are extracted, the resulting triple is stored in ISLKG, and the extracted words are simultaneously added to the marine dictionary. If rule-pattern extraction also fails to find a longitude and latitude, the text related to Sanmen Island is concluded not to contain this information and is marked for future data supplementation. This extraction process is automated and applied to every word to be extracted in every entity category of the island text, and it is repeated for each island in the island text database until all pending entity extractions are complete.
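The combined flow for longitude and latitude (dictionary hits first, the X°Y′Z″ rule pattern as a fallback, then flagging the text) might look like the following sketch. It assumes English hemisphere keywords such as “East longitude” precede the value, as in the Sanmen Island triple above; Chinese source text would need the corresponding Chinese keywords. `dictionary_hits` stands in for whatever the segmentation step recognized and is a hypothetical interface.

```python
import re

# X°Y′Z″ preceded by a hemisphere keyword; the keyword decides longitude vs latitude.
COORD_RE = re.compile(
    r"(East longitude|West longitude|North latitude|South latitude)"
    r"\s*(\d{1,3})°(\d{1,2})′(\d{1,2})″"
)


def extract_coordinates(text):
    """Rule-pattern step: return {'longitude': ..., 'latitude': ...} where found."""
    result = {}
    for m in COORD_RE.finditer(text):
        keyword, x, y, z = m.groups()
        kind = "longitude" if keyword.endswith("longitude") else "latitude"
        result.setdefault(kind, f"{keyword} {x}°{y}′{z}″")
    return result


def extract_lon_lat(island, text, dictionary_hits, marine_dictionary):
    """Dictionary first, rule pattern second, otherwise mark the text."""
    coords = {k: v for k, v in dictionary_hits.items() if k in ("longitude", "latitude")}
    source = "dictionary"
    if len(coords) < 2:
        coords = extract_coordinates(text)
        source = "rule"
        if len(coords) == 2:
            marine_dictionary.update(coords.values())  # expand the dictionary
    if len(coords) < 2:
        return None, "marked for supplementation"
    triple = (island, "longitude and latitude", (coords["longitude"], coords["latitude"]))
    return triple, source
```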

3.1.5. Knowledge Fusion and Storage

Knowledge fusion is the final stage in building the instance layer of the island knowledge graph. After data collection and knowledge extraction, the same island entity often has several expressions because the data come from diverse sources and Chinese descriptions vary. Entity alignment is therefore needed to avoid data redundancy in the knowledge graph’s storage. Entity alignment identifies coreferent entities across different knowledge graphs and is a foundational task in knowledge fusion. By integrating knowledge from diverse sources, it yields a more complete representation of information for subsequent analysis.

In contexts such as data governance, entity alignment is the primary strategy for resolving redundancy; essentially, it is a deduplication procedure that consolidates duplicated entities. For this purpose, we use Dedupe, a Python library that applies machine learning to perform entity alignment, fuzzy matching, and deduplication on structured data efficiently. Dedupe compares records and scores their similarity across a range of data formats, turning deduplication into a feature-based scoring problem, and finally groups related records into clusters.

For example, consider the triple <ChongMing Island, area, 1269.1 km²> in ISLKG. Because the data come from multiple sources, the triple may appear several times. Variants such as <ChongMing Island, area, 1269.1 square kilometers>, <ChongMing Island, area, 1,269,100,000 m²>, and <ChongMing Island, area, 1,269,100,000 square meters> may also arise; they are semantically equivalent but differ in units or wording. There could even be triples such as <ChongMing Island, area, 313,601 acres of land>. Despite the differences in description, units, and format, all these triples convey the same fact: the area of ChongMing Island is 1269.1 km². These entities therefore need to be aligned and deduplicated, for which the Dedupe tool is used. The process begins with data preprocessing: the triples are organized into a dataset, with consistent subject entities and relationship fields across triples. The dataset is then used to train the model, which automatically learns similarity features that help identify duplicate entities. Next, the trained model aligns the entities in the dataset, applying the learned similarity rules to decide which entities should be treated as the same. Finally, based on the model’s output, aligned entities are merged into a clean and consistent entity collection. This process combines similar but distinct expressions of an entity into a unified representation while retaining the most accurate description and units, improving the data quality and usability of ISLKG.
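Dedupe learns its similarity features from labelled examples, so its training loop is not reproduced here. As a simpler, swapped-in illustration of why unit normalization matters before alignment, the sketch below converts each area string to km² and collapses triples whose normalized keys coincide. The unit table, the rounding to 0.1 km² as the equivalence criterion, and the function names are all assumptions.

```python
import re
from collections import defaultdict

# Conversion factors to km² (an illustrative subset; an assumption).
TO_KM2 = {
    "km²": 1.0,
    "square kilometers": 1.0,
    "m²": 1e-6,
    "square meters": 1e-6,
    "acres": 0.0040468564224,  # 1 acre = 4046.8564224 m²
}

VALUE_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s*(.+?)\s*$")


def normalize_area(value):
    """Parse strings such as '1,269,100,000 m²' and return km² rounded to 0.1."""
    m = VALUE_RE.match(value)
    if m is None:
        raise ValueError(f"unparseable value: {value!r}")
    number = float(m.group(1).replace(",", ""))
    for unit, factor in TO_KM2.items():
        if m.group(2).startswith(unit):  # 'acres of land' starts with 'acres'
            return round(number * factor, 1)
    raise ValueError(f"unknown unit in: {value!r}")


def deduplicate(triples):
    """Group triples by normalized (head, relation, area) and keep one per group."""
    groups = defaultdict(list)
    for head, rel, value in triples:
        key = (head.strip().lower(), rel.strip().lower(), normalize_area(value))
        groups[key].append((head, rel, value))
    # Express each surviving triple in km², the unit used in ISLKG.
    return [(members[0][0], members[0][1], f"{key[2]} km²") for key, members in groups.items()]
```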

Knowledge storage is the final phase of island knowledge graph construction. After knowledge fusion, island information with a clear structure and rich attributes and relationships benefits greatly from being stored in a graph database [34], where it can be viewed from several dimensions, including entities, attributes, and concepts. High-performance NoSQL graph databases such as Neo4j store structured data as a network rather than in tables. Neo4j is a high-performance graph engine that also offers all the features of a mature database. Owing to these benefits, Neo4j is currently the most widely used graph database, with an active community and a mature ecosystem. It also provides its own visualization tools for browsing graphs and its own declarative query language, Cypher [35]. This study therefore selects Neo4j as the storage platform for the island knowledge graph. Island concepts and entities are stored as nodes, spatial and semantic relationships as edges, and the basic attributes of islands as property values. Figure 7 and Table 8 illustrate the storage format of the island knowledge graph in Neo4j. Taking Changxing Island as an example, the floating box in the figure lists its basic attributes, such as category, latitude and longitude, population, coastline length, and distance to the nearest mainland. Entities in ISLKG are represented by nodes of different colors, including Changxing Island itself and its soil types, economy, mineral resources, energy resources, elemental composition, marine industries, scenic spots, and so on. Arrows between entities indicate their relationships and the direction of each relationship, such as the relationship stating that Changxing Island is subordinate to Shanghai.
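Storing a triple in Neo4j usually comes down to a parameterized Cypher `MERGE`. The helper functions below are hypothetical: they only build query strings and parameter maps, and the labels (`Island`, `Region`) and relation names are illustrative. Because Cypher cannot take labels or relationship types as parameters, those are sanitized and backtick-quoted, while names and property values travel as parameters.

```python
import re


def safe_identifier(name):
    """Sanitize a label or relationship type (these cannot be Cypher parameters)."""
    ident = re.sub(r"\W+", "_", name.strip()).strip("_")
    if not ident:
        raise ValueError(f"cannot build an identifier from {name!r}")
    return f"`{ident}`"


def relation_statement(head, head_label, relation, tail, tail_label):
    """MERGE both nodes and the directed edge head -[relation]-> tail."""
    query = (
        f"MERGE (h:{safe_identifier(head_label)} {{name: $head}}) "
        f"MERGE (t:{safe_identifier(tail_label)} {{name: $tail}}) "
        f"MERGE (h)-[:{safe_identifier(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


def attribute_statement(entity, label, attribute, value):
    """Store a basic attribute as a node property through a parameter map."""
    query = f"MERGE (n:{safe_identifier(label)} {{name: $name}}) SET n += $props"
    return query, {"name": entity, "props": {attribute: value}}
```

With the official Neo4j Python driver, each `(query, params)` pair could then be executed through `session.run(query, params)`.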

3.2. Knowledge Reasoning Model Based on Knowledge Graph Embedding

Knowledge reasoning based on knowledge graph embedding, also called knowledge reasoning based on knowledge representation learning, embeds the components of a knowledge graph (entities and relationships) into a continuous vector space, preserving the graph’s inherent structure while simplifying computation for tasks such as link prediction. Knowledge graph embedding provides a denser representation of the entities and relationships in a knowledge graph, reduces computational complexity in applications, and mitigates the effect of data sparsity on inference results. Moreover, the similarity between entities or relationships can be measured directly by the similarity of their low-dimensional embeddings.
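As a small illustration of the last point, embedding similarity is often measured with cosine similarity. The vectors below are made up; in practice they would come from a trained embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def most_similar(query, embeddings, top_k=2):
    """Return the top_k (name, similarity) pairs closest to embeddings[query]."""
    scored = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```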

3.2.1. Model Definition

This paper uses the ConvE model. ConvE is a CNN-based method whose basic idea is to treat the head entity and relationship of a knowledge graph triple as feature maps and to model the interaction between entities and relationships with convolutional and fully connected layers. The key characteristic of ConvE is that the score of a fact triple (h, r, t), where h, r, and t denote the head entity, relationship, and tail entity, respectively, is defined by a convolution over 2D-reshaped embeddings, as illustrated in Figure 8.

In ConvE, the embeddings of the head entity and the relationship are first reshaped and concatenated; the resulting matrix is the input to the convolutional layer; the output feature-map tensor is vectorized and projected into the k-dimensional space; and the result is matched against all candidate tail-entity embeddings. The scoring function is defined formally as follows:

$$f_r(h, t) = \sigma\left(\operatorname{vec}\left(\sigma\left([M_h; M_r] * \omega\right)\right) W\right) t \tag{1}$$

where $r \in \mathbb{R}^{k}$ is a relation parameter that depends on r; $M_h$ and $M_r$ denote the 2D reshapings of $h$ and $r$, respectively: if $h, r \in \mathbb{R}^{k}$, then $M_h, M_r \in \mathbb{R}^{k_w \times k_h}$, where $k = k_w k_h$.

During the forward pass, the model first looks up the corresponding rows of two embedding matrices: one for entities, $E \in \mathbb{R}^{|\mathcal{E}| \times k}$, and one for relationships, $R \in \mathbb{R}^{|\mathcal{R}| \times k'}$, where $k$ and $k'$ are the embedding dimensions of entities and relationships, and $|\mathcal{E}|$ and $|\mathcal{R}|$ are the numbers of entities and relationships. The model then concatenates $M_h$ and $M_r$ and uses the result as the input of a 2D convolutional layer with filters $\omega$. This layer returns a feature-map tensor $T \in \mathbb{R}^{c \times m \times n}$, where $c$ is the number of 2D feature maps, each of size $m \times n$. The tensor $T$ is then reshaped into a vector $\operatorname{vec}(T) \in \mathbb{R}^{cmn}$, projected into the k-dimensional space by a linear transformation parameterized by the matrix $W \in \mathbb{R}^{cmn \times k}$, and matched with the target embedding $t$ by an inner product. The parameters of the convolutional filters and of $W$ are independent of the entities $h$ and $t$ and the relationship $r$.
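To make the shapes concrete, here is a dependency-free sketch of the ConvE forward pass for a single (h, r) query. It follows Equation (1) with ReLU as the inner nonlinearity σ and a logistic sigmoid on the final score, but it omits the batch normalization, dropout, and bias terms a real implementation would include, and all values are toy numbers. Function names are illustrative.

```python
import math


def reshape(vec, rows, cols):
    """Row-major reshape of a flat list into a rows x cols matrix."""
    if len(vec) != rows * cols:
        raise ValueError("length must equal rows * cols")
    return [vec[i * cols:(i + 1) * cols] for i in range(rows)]


def conv2d_valid(x, kernel):
    """2D cross-correlation without padding, as computed by CNN layers."""
    kr, kc = len(kernel), len(kernel[0])
    out_r, out_c = len(x) - kr + 1, len(x[0]) - kc + 1
    return [
        [sum(x[i + a][j + b] * kernel[a][b] for a in range(kr) for b in range(kc))
         for j in range(out_c)]
        for i in range(out_r)
    ]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    """Numerically stable logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def conve_probabilities(h, r, candidates, kernels, W, k_w, k_h):
    """Return sigmoid(f_r(h, t)) for each candidate tail embedding t.

    h, r: flat embeddings of length k = k_w * k_h
    kernels: the c convolution filters (omega in Equation (1))
    W: projection matrix with c*m*n rows and k columns
    """
    stacked = reshape(h, k_w, k_h) + reshape(r, k_w, k_h)      # [M_h; M_r]
    maps = [conv2d_valid(stacked, ker) for ker in kernels]      # tensor T
    flat = [relu(v) for fm in maps for row in fm for v in row]  # vec(sigma(T))
    if len(flat) != len(W):
        raise ValueError("W must have c*m*n rows")
    k = len(h)
    hidden = [relu(sum(flat[i] * W[i][j] for i in range(len(flat)))) for j in range(k)]
    return [sigmoid(sum(a * b for a, b in zip(hidden, t))) for t in candidates]
```

Scoring every entity at once against the same projected vector is what makes 1-N scoring cheap: the convolution runs once per (h, r) query, not once per candidate.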

3.2.2. Model Training Process

To train the model parameters, the logistic sigmoid function, written $\operatorname{sigmoid}(\cdot)$ here to distinguish it from the nonlinearity $\sigma$ in Equation (1), is applied to the scores, i.e., $p = \operatorname{sigmoid}(f_r(h, t))$, and the following binary cross-entropy loss is minimized:

$$L(p, t) = -\frac{1}{N} \sum_{i} \left( t_i \cdot \log(p_i) + (1 - t_i) \cdot \log(1 - p_i) \right) \tag{2}$$

where $t$ is the label vector, of dimension $\mathbb{R}^{1 \times 1}$ for 1-1 scoring and $\mathbb{R}^{1 \times N}$ for 1-N scoring. An element of $t$ is 1 if the corresponding triple exists and 0 otherwise.

During training, rectified linear units (ReLUs) are used as the nonlinearity σ to speed up training, and batch normalization is applied after each layer to stabilize and normalize the activations and accelerate convergence. The model is regularized with dropout at several stages: on the embeddings, on the feature maps after the convolution, and on the hidden layer after the fully connected projection. Adam is used as the optimizer, and label smoothing is applied to reduce overfitting caused by saturation of the output nonlinearity at the labels.
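Equation (2) and label smoothing can be written out in a few lines. The smoothing form used here, y ← (1 − ε)·y + ε/N, is one common formulation and an assumption, since the text does not specify which variant was used; probabilities are clipped so that the logarithms stay finite.

```python
import math


def smooth_labels(labels, epsilon):
    """One common label-smoothing form: y <- (1 - eps) * y + eps / N."""
    n = len(labels)
    return [(1.0 - epsilon) * y + epsilon / n for y in labels]


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Equation (2): mean binary cross-entropy over the N candidates of one query."""
    total = 0.0
    for p, t in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return -total / len(probs)
```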

For ISLKG, the graph data are first exported and converted into the input format of the ConvE model for training and validation. During training, ConvE reshapes and concatenates the head-entity and relationship embeddings of each ISLKG triple as described above, applies the convolutional layer, and flattens the result into a one-dimensional vector. This vector is projected into the k-dimensional space by a linear transformation and matched with the target embedding, after which the prediction probabilities are computed.

3.2.3. Experiment and Results Analysis

One objective of knowledge reasoning is to complete the knowledge graph. The performance of a knowledge reasoning model is therefore evaluated by how well it completes the knowledge graph, which is usually measured with the link prediction task. Given two elements of a triple, such as the head entity and the relationship, link prediction predicts the third element, such as the correct tail entity. Formally, given a query (h, r, ?) or (?, r, t), the task is to predict the set of correct tail or head entities.

The experiments use the data of the island knowledge graph constructed in this paper and stored in Neo4j. The data were divided into training, validation, and test sets at a ratio of 84%:7%:7%. Table 9 describes the entities, relationships, and triples of the dataset.
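Exporting and splitting could be sketched as follows. The Cypher query in the comment assumes every node carries a `name` property, and the tab-separated “head, relation, tail” layout is the format commonly used by ConvE-style benchmark files; both are assumptions. The split fractions are parameters, and whatever remains after the training and validation shares goes to the test set.

```python
import random

# Rows could come from a query such as (assumes a `name` property on nodes):
#   MATCH (h)-[r]->(t) RETURN h.name, type(r), t.name


def split_triples(triples, train_frac, valid_frac, seed=42):
    """Shuffle reproducibly and cut into train / valid / test lists."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def to_tsv(triples):
    """One 'head<TAB>relation<TAB>tail' line per triple."""
    return "".join(f"{h}\t{r}\t{t}\n" for h, r, t in triples)
```

A random split can leave an entity that occurs only in the test set; such entities have no trained embedding, so it is worth checking for them after splitting.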

The link prediction task on the island knowledge graph is evaluated with three ranking metrics, MRR (Mean Reciprocal Rank), MR (Mean Rank), and Hits@k, because link prediction treats learning entities and relationships as a ranking task. Specifically, for each triple in the test and validation sets, the head or tail entity is replaced with other entities in the island knowledge graph. These corrupted triples are negative samples, and the correct triples are positive samples. All triples are scored by the scoring function and sorted in descending order of score, and the position of the correct triple gives its rank. Taking tail prediction for an evaluation sample (h, r, t) as an example, let its correct label set be $S = \{e \mid (h, r, e) \text{ is a true triple}\}$. To compute the metrics, the rank of the correct label t among the candidate entities must be determined for each evaluation sample. There are two ways to count this rank: the raw rank and the filtered rank. The raw rank is simply the rank of t among all candidate entities. However, S may contain other correct entities besides t; any of them scored above t push t down, so the raw rank overestimates the rank of t. This paper therefore uses the filtered rank, which removes all other correct labels ranked ahead of t and then takes the position of t as its final rank. On this basis, MR is calculated as follows:
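The difference between the raw and the filtered rank can be shown with a toy score table. In this sketch, higher scores are better and ties are resolved in favor of the target (only strictly higher scores count), which is one of several possible tie-breaking conventions; the function names are hypothetical.

```python
def raw_rank(scores, target):
    """1-based rank of target among all candidates (higher score = better)."""
    target_score = scores[target]
    return 1 + sum(1 for e, s in scores.items() if e != target and s > target_score)


def filtered_rank(scores, target, known_true):
    """Rank of target after discarding the other correct answers in S.

    known_true is the label set S = {e | (h, r, e) is a true triple}.
    """
    target_score = scores[target]
    better = sum(
        1
        for e, s in scores.items()
        if e != target and e not in known_true and s > target_score
    )
    return 1 + better
```

For scores {a: 0.9, b: 0.8, t: 0.7, c: 0.1} with S = {a, t}, the raw rank of t is 3, while the filtered rank is 2 because the other correct answer a no longer counts against it.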

$$\mathrm{MR} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{rank}_i \tag{3}$$

where $N$ is the number of evaluation samples and $\mathrm{rank}_i$ is the rank of the correct label in the i-th sample. As the formula shows, for each test triple (h, r, t) the model scores all candidate entities and ranks them by score; MR is the average rank over all test triples, and a smaller MR generally indicates better performance. MRR (Mean Reciprocal Rank), which is less sensitive than MR to a few very poorly ranked predictions, averages the reciprocal of the rank of the correct entity over all test triples. Because each reciprocal rank lies in (0, 1], MRR ranges from 0 to 1, and a value closer to 1 indicates better performance. MRR is calculated as follows:

$$\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}_i} \tag{4}$$

Hits@k is the proportion of evaluation samples whose correct label is ranked within the top k, calculated as follows:

$$\mathrm{Hits@}k = \frac{1}{N} \sum_{i=1}^{N} I(\mathrm{rank}_i \leq k) \tag{5}$$

where $I(\cdot)$ is the indicator function, which equals 1 when its argument is true and 0 otherwise. For each test triple, Hits@k checks whether the correct entity appears among the model’s top k predictions; if it does, the triple counts as a hit, and otherwise as a miss. Hits@k is the total number of hits divided by the number of test triples. Small values of k, such as 1, 3, and 10, are typically used. In short, MR and MRR summarize the overall ranking, whereas Hits@k measures performance within the top k candidates. This paper uses Hits@10, Hits@3, and Hits@1. A smaller MR and larger MRR and Hits@k values indicate better reasoning performance.
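Given the filtered ranks, Equations (3) to (5) reduce to a few averages. The sketch below assumes the ranks are 1-based integers, one per evaluation sample.

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Return (MR, MRR, {k: Hits@k}) from 1-based ranks, per Equations (3)-(5)."""
    if not ranks:
        raise ValueError("need at least one evaluation sample")
    n = len(ranks)
    mr = sum(ranks) / n                           # Equation (3)
    mrr = sum(1.0 / r for r in ranks) / n         # Equation (4)
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}  # Equation (5)
    return mr, mrr, hits
```

For ranks [1, 2, 4, 20], MR is 6.75, MRR is 0.45, and Hits@1, Hits@3, and Hits@10 are 0.25, 0.5, and 0.75: the single poor rank of 20 inflates MR far more than it lowers MRR.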

We compare ConvE with the classic link prediction models DistMult and ComplEx using the above metrics, with the learning rate adjusted during training by an adaptive optimizer. Table 10 lists the hyperparameter settings of the three models.

Table 11 compares the link prediction results of the three models on the island knowledge graph dataset, averaged over head and tail entity prediction; bold denotes the best value for each metric. ConvE achieves the highest MRR, indicating that it ranks the correct head and tail entities higher on average. It also has the lowest MR, indicating a better average rank. Most notably, its hit rates (Hits@10, Hits@3, and Hits@1) exceed those of the other two models at every k, meaning that ConvE more often places the correct entity among its top k predictions. These results suggest that ConvE has stronger representation learning and prediction capabilities on this knowledge graph.

Overall, ConvE outperforms the other models on all evaluation metrics on the island knowledge graph dataset, exceeding them by more than 0.02 in MRR and Hits@10.

In addition, this paper uses link prediction results as one indicator of the quality of knowledge graph construction. It compares the link prediction results of ConvE on the island knowledge graph dataset with its results on the public datasets WN18RR, FB15K-237, and YAGO3-10, together with their numbers of entities, relationships, and triples. Two comparative indices, T/E and E/R, are also introduced. T/E is the ratio of the number of triples to the number of entities, i.e., the average number of relationships each entity has with other entities, which measures how densely the entities are connected. The larger T/E is, the more relationships each entity has on average and the more densely the entities are connected. E/R is the ratio of the number of entities to the number of relationships, i.e., the average number of entities per relationship, which reflects relationship complexity. For a knowledge graph with many entities, a smaller E/R means fewer entities per relationship and hence more complex relationships. We consider that, once the model is fixed, the numbers of entities, relationships, and triples in the knowledge graph are the basis for improving link prediction, and that once the data volume reaches a certain threshold, T/E and E/R have a substantial effect on the final prediction results. The experimental results in Table 12 support this view.
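Both indices are simple ratios. The helper below computes them and orders datasets by T/E; the counts in the usage check are hypothetical, not the values in Table 12.

```python
def density_indices(num_triples, num_entities, num_relations):
    """Return (T/E, E/R): triples per entity and entities per relationship."""
    if num_entities <= 0 or num_relations <= 0:
        raise ValueError("entity and relationship counts must be positive")
    return num_triples / num_entities, num_entities / num_relations


def rank_by_density(datasets):
    """datasets maps name -> (triples, entities, relations); sort by T/E, descending."""
    rows = [(name, *density_indices(*counts)) for name, counts in datasets.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Under the interpretation above, a dataset near the top of this ordering with a small E/R would count as both dense and relationally complex.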

Table 12 lists the data volume of each KG dataset and the link prediction results of ConvE on it; bold marks the best value of each metric across the four datasets. FB15K-237 has the highest T/E and the lowest E/R of the datasets, indicating that it is the most complex and that its entities are the most closely connected, and its MR is markedly better than that on the other datasets. YAGO3-10 has the largest data volume of the four datasets, and its complexity is second only to FB15K-237; ConvE achieves its best Hits@10 on YAGO3-10. The island knowledge graph dataset created in this paper attains the highest MRR, Hits@3, and Hits@1, and its Hits@10 is second only to YAGO3-10. Specifically, ISLKG achieves an MRR of 0.454, indicating that correct tail entities tend to be ranked near the top. It achieves 0.490 on Hits@3 and 0.387 on Hits@1, indicating high accuracy in the top three and top predicted positions. Its Hits@10 is only slightly lower than that of the best-performing YAGO3-10 dataset, indicating that the model’s hit rate within the top 10 predictions also approaches the best level.

4. Discussion

4.1. Challenges in ISLKG

The usefulness of knowledge graphs in applications has greatly promoted their development in many fields. However, owing to the multi-source nature and complexity of marine data, there are relatively few knowledge graphs in the marine domain, and research on constructing island knowledge graphs from multi-source data remains largely unexplored. Because of the heterogeneity of different fields, it is also difficult for a general knowledge graph construction framework to apply universally. In addition, the scarcity of island data aggravates the incompleteness of the island knowledge graph. This research proposes a framework for constructing an island knowledge graph from multi-source data, covering ontology design, multi-source data collection and extraction, knowledge fusion and storage, and construction of the graph itself. It also performs knowledge reasoning on the constructed island knowledge graph, validates the link prediction performance of an embedding-based knowledge representation model, and proposes two new evaluation indicators for the quality evaluation and knowledge completion of the island knowledge graph. Nevertheless, the research still has some limitations:

The knowledge extraction method based on the combination of an entity dictionary and rule patterns proposed in this research can accurately identify different types of island-domain entities in the island text database, but it requires users to write a large number of regular expressions. Because the extraction results depend to some extent on the quality of these regular expressions, the labor cost is high.
The same method can extract most island entities of the corresponding categories, but its ability to understand text semantics is weak, so it may perform poorly in cases where entities must be extracted based on contextual semantics.
This research proposes two new evaluation indicators, T/E and E/R, and verifies their effects on link prediction results. Their stability and degree of influence, however, require additional investigation.
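The dependence on hand-written patterns noted above can be illustrated with a small Python sketch of dictionary-plus-rule extraction. It is a hypothetical simplification, not the authors' implementation: the dictionary entries, attribute names, regular expressions, and the island name "Example Island" are all invented for illustration, and the sketch works on English text. Dictionary matches are resolved longest-first so that a shorter name nested in a longer one (here, "China" inside "East China Sea") is not extracted twice.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical entity dictionary: surface form -> entity category.
ENTITY_DICT: Dict[str, str] = {
    "Example Island": "Island",
    "East China Sea": "Sea area",
    "China": "Country",
}

# Hypothetical rule patterns: each regex captures one attribute value.
RULES = {
    "area_km2": re.compile(r"area of (\d+(?:\.\d+)?) square kilometers"),
    "population": re.compile(r"population of (\d[\d,]*)"),
}


def extract(text: str) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Return dictionary-matched entities (in text order) and rule-matched attributes."""
    covered = set()  # character offsets already claimed by a longer match
    found = []
    # Longest surface forms first, so a name nested in a longer one is skipped.
    for name in sorted(ENTITY_DICT, key=len, reverse=True):
        for match in re.finditer(re.escape(name), text):
            span = range(match.start(), match.end())
            if covered.isdisjoint(span):
                covered.update(span)
                found.append((match.start(), name, ENTITY_DICT[name]))
    entities = [(name, category) for _, name, category in sorted(found)]

    attributes = {}
    for attribute, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            # Strip thousands separators so the value can be cast to a number later.
            attributes[attribute] = match.group(1).replace(",", "")
    return entities, attributes


sentence = ("Example Island lies in the East China Sea, has an area of "
            "12.5 square kilometers and a population of 3,200.")
print(extract(sentence))
```

Every new attribute or phrasing needs another pattern in `RULES`, and a sentence that states the same fact differently (for example, "covers 12.5 square kilometers") would be missed, which reflects the labor cost and limited semantic understanding described in the list above.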

4.2. Applications of ISLKG

The application of domain knowledge graphs in their respective fields is also a current research focus, which can be categorized into systemic applications and domain-specific applications. Systemic applications are constructed based on the inherent characteristics and capabilities of knowledge graphs. Currently, well-developed systemic applications of knowledge graphs include visualization, intelligent search, question-answering, and decision support. Domain-specific applications, on the other hand, originate from real-world requirements within a specific domain. They integrate the domain-specific knowledge graphs with practical demands to create advanced applications.

Similarly, the applications of the island knowledge graph (ISLKG) can be categorized into systemic and domain-specific applications. First, mature systemic applications of knowledge graphs can be integrated into the island knowledge graph to support the development, utilization, and preservation of islands. Furthermore, by aligning with the specific needs of island regions, novel ways of applying ISLKG to island environments can be investigated.

4.2.1. ISLKG System Applications

The systemic applications of ISLKG are discussed from the perspective of visualization technology, intelligent search, intelligent question-answering, and decision support systems.

Firstly, because knowledge graphs are composed of triples and ISLKG contains a vast amount of well-organized semantic information and observational data, ISLKG can visualize complex relationships intuitively. This makes it possible to comprehend island domain knowledge more thoroughly from both global and local perspectives, and it helps to reveal patterns, trends, and regularities within ISLKG. Visualization tools built on the Neo4j graph database depict the knowledge graph as nodes and relationships, which is important both for disseminating island-related knowledge and for promoting domain understanding.
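Before ISLKG can be displayed as nodes and relationships in a Neo4j-based tool, its triples have to be written into the database. The Python sketch below turns (head, relation, tail) triples into parameterized Cypher `MERGE` statements. It is an illustration under assumptions, not the storage procedure of this study: the node label `Entity`, the `name` property, and relation names such as `LOCATED_IN` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Relationship types cannot be passed as Cypher parameters, so they are
# restricted to upper-case identifiers before being placed in the query text.
_REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")


def triples_to_cypher(triples: Iterable[Triple]) -> List[Tuple[str, Dict[str, str]]]:
    """Build one parameterized MERGE statement per (head, relation, tail) triple."""
    statements = []
    for head, relation, tail in triples:
        if not _REL_TYPE.fullmatch(relation):
            raise ValueError(f"unsafe relationship type: {relation!r}")
        query = (
            "MERGE (h:Entity {name: $head}) "  # create the head node if absent
            "MERGE (t:Entity {name: $tail}) "  # create the tail node if absent
            f"MERGE (h)-[:{relation}]->(t)"    # create the edge if absent
        )
        statements.append((query, {"head": head, "tail": tail}))
    return statements


for query, params in triples_to_cypher([("Example Island", "LOCATED_IN", "East China Sea")]):
    print(query, params)
```

Entity names travel as query parameters, while relation names are validated against a conservative pattern because Cypher does not accept a relationship type as a parameter. Each (query, parameters) pair could then be executed with the official Neo4j Python driver, for example via `session.run(query, params)`.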

Secondly, representing data as knowledge graph nodes and edges makes it easier for search engines to interpret and process the data, leading to more accurate search results. Furthermore, the explicit semantic associations between entities and relationships help search engines better capture user intent, understand context, and perform reasoning. ISLKG's Cypher-based query capabilities can therefore improve search results. Because ISLKG integrates diverse data sources, it also supports cross-source queries, yielding more precise and comprehensive query results.
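A cross-source query of the kind described above can be sketched in Python with triples from two hypothetical sources, a gazetteer and an environmental monitoring feed, merged into one index. The island names, relation names, and the Cypher pattern in the comment are illustrative assumptions rather than ISLKG's actual schema; in ISLKG itself such a query would be written in Cypher and run in Neo4j.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical triples from two separate sources.
GAZETTEER: List[Triple] = [
    ("Island A", "LOCATED_IN", "East China Sea"),
    ("Island B", "LOCATED_IN", "East China Sea"),
    ("Island C", "LOCATED_IN", "South China Sea"),
]
MONITORING: List[Triple] = [
    ("Island A", "HAS_DISASTER", "Red tide"),
    ("Island C", "HAS_DISASTER", "Red tide"),
]


def build_index(*sources: Iterable[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Merge triple lists and index them as (relation, tail) -> set of heads."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for source in sources:
        for head, relation, tail in source:
            index[(relation, tail)].add(head)
    return index


def islands_matching(index: Dict[Tuple[str, str], Set[str]], sea: str, disaster: str) -> List[str]:
    # Roughly what this Cypher pattern would return (labels are hypothetical):
    # MATCH (s {name: $sea})<-[:LOCATED_IN]-(i)-[:HAS_DISASTER]->(d {name: $disaster})
    # RETURN i.name
    return sorted(index[("LOCATED_IN", sea)] & index[("HAS_DISASTER", disaster)])


idx = build_index(GAZETTEER, MONITORING)
print(islands_matching(idx, "East China Sea", "Red tide"))  # ['Island A']
```

The answer combines the location fact from the gazetteer with the disaster record from the monitoring data, which neither source could provide on its own.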

Furthermore, the query efficiency of the ISLKG-based search engine, coupled with its semantic understanding accuracy, paves the way for intelligent question-answering technology. By swiftly accessing data and related information on the graph-structured knowledge chain of ISLKG, the system organizes and presents results to users. Additionally, embedding-based knowledge reasoning methods can be utilized for relationship inference, providing more associated information, and elevating the intelligence of ISLKG applications.
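The idea of embedding-based relationship inference can be illustrated with DistMult, one of the models compared in Table 11, because its scoring function is short: score(h, r, t) = Σᵢ hᵢ rᵢ tᵢ. This Python sketch ranks candidate tail entities for a query (head, relation, ?). It is not the ConvE model used for the main results, and the three-dimensional embeddings are hand-set toy values rather than trained ISLKG embeddings; all entity and relation names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """DistMult score: the trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rank_tails(head: str, relation: str, entities: Dict[str, Vector],
               relations: Dict[str, Vector], k: int = 3) -> List[str]:
    """Return the k candidate tail entities with the highest DistMult score."""
    h, r = entities[head], relations[relation]
    candidates = [e for e in entities if e != head]
    candidates.sort(key=lambda e: distmult_score(h, r, entities[e]), reverse=True)
    return candidates[:k]


# Hand-set toy embeddings (hypothetical, not learned from ISLKG).
TOY_ENTITIES: Dict[str, Vector] = {
    "Island A": [1.0, 0.5, 0.0],
    "East China Sea": [0.9, 0.1, 0.2],
    "South China Sea": [0.1, 0.9, 0.3],
    "Red tide": [0.0, 0.2, 1.0],
}
TOY_RELATIONS: Dict[str, Vector] = {"LOCATED_IN": [1.0, 0.2, 0.0]}

print(rank_tails("Island A", "LOCATED_IN", TOY_ENTITIES, TOY_RELATIONS, k=2))
# ['East China Sea', 'South China Sea']
```

A known limitation of DistMult is that its score is symmetric in h and t, so it cannot distinguish a relation from its inverse; ComplEx addresses this by using complex-valued embeddings.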

Finally, integrating intelligent search, question-answering technology, and visualization techniques yields feasible strategies for users and forms the basis for ISLKG-powered decision support applications. Specifically, by leveraging ISLKG's integration of multiple data sources and its structured semantic associations, the system rapidly integrates and retrieves query information, promptly returns the query results and related information, and organizes feasible strategies to aid users in their decision-making.

4.2.2. Development and Utilization of Islands

In addition to the systemic applications of ISLKG, its practical application within the island domain holds significant importance. The development and utilization of islands constitute a pivotal aspect of real-world ISLKG applications. By integrating the systemic applications of ISLKG with the actual requirements of island development and utilization, the capability for island development and utilization is enhanced.

Taking the tourism industry as an example, integrating cultural, historical, and scenic information about an island allows intelligent recommendations of island attractions to be provided to tourists. This enriches the tourist experience and promotes the sustainable development of tourism. Visitors can use the knowledge graph to explore unique cultural activities, traditional cuisine, and natural landscapes on the island, increasing their satisfaction and prolonging their stay. Similarly, intelligent search and question-answering technology can enhance the travel experience: after viewing visualized information about attractions, tourists can obtain travel routes through queries or questions.

To facilitate energy development on islands, ISLKG's decision support can be used to promote the efficient use of viable energy resources, such as wind and solar energy. ISLKG can integrate data for assessing energy potential and environmental impact, assisting decision makers in formulating sustainable energy development plans. Moreover, ISLKG's decision support can play a crucial role in island infrastructure planning. Island infrastructure construction must take factors such as topography and climate into account; knowledge graphs can integrate geological, meteorological, and geographical information to help plan rational infrastructure layouts and thereby improve construction efficiency.

Lastly, ISLKG’s applications in visualization and intelligent search can aid marine research and education. The island-specific knowledge graph provides a rich data source for marine research. Researchers can conduct cross-analysis based on the graph to discover new research directions. Additionally, these information sources can be utilized in education to help students better understand topics like oceans and ecosystems.

4.2.3. Conservation of Islands

The conservation of islands is crucial for their sustainable development. As with island development and utilization, integrating ISLKG's systemic applications with island conservation efforts enhances the management and protection capabilities of islands. Island ecosystems are fragile and susceptible to environmental changes and human activities, which makes ecological preservation and environmental management key areas in which visualization technology, intelligent question-answering, and intelligent search can be applied to ecosystem protection. ISLKG can integrate data on island geography, biodiversity, pollution status, and more, enabling ecosystem monitoring and analysis. This helps in formulating scientific conservation plans and reducing environmental degradation.

The waters around islands often harbor abundant fishery resources. ISLKG can apply decision support to fishery resource management. By integrating data on fishery production, regulations, and policies, ISLKG assists administrators in devising sustainable development plans, promoting responsible fishing practices, and preventing overfishing and resource depletion. Last but not least, islands are vulnerable to natural calamities like typhoons and tsunamis and need to be prepared for them. ISLKG can consolidate historical disaster data, geological information, and topographic information, contributing to the establishment of disaster warning systems and improving emergency response capabilities.

In summary, ISLKG not only offers comprehensive information support through data but also combines systemic applications with island conservation efforts. This balance facilitates sustainable island management and protection, allowing for the harmonization of economic development and ecological preservation.

5. Conclusions

Islands represent an essential platform for safeguarding the marine environment, preserving marine ecological balance, and promoting sustainable economic and social growth. However, owing to the scarcity of island resources, their unique geographical location, and their fragile ecosystems, the capacity of islands for sustainable development faces significant obstacles. The protection, development, and administration of islands require comprehensive, precise, and structured data support, so it is necessary to establish a standard island knowledge system. In this paper, the knowledge graph is introduced into the island field. Because marine domain knowledge graphs are scarce and island data are multi-source, disordered, and difficult to obtain, an island knowledge graph construction framework based on multi-source data is proposed, and the ontology of the island knowledge graph is designed. Based on the designed ontology, semi-structured and unstructured island data are collected using a web crawler. Considering the large number of island data categories, this paper also proposes a method for constructing an island knowledge graph that combines an entity dictionary with rule patterns to extract knowledge from island multi-source data. The resulting island knowledge graph enhances the effectiveness of island protection, development, management, and related work, and provides a structured knowledge base for the sustainable development of islands.

In addition, the sparsity of island data, combined with the inherent incompleteness of knowledge graphs, leaves gaps in the island knowledge graph, so the constructed island knowledge graph must be completed. This paper performs knowledge reasoning on the constructed island knowledge graph, verifies the link prediction performance of knowledge graph embedding models, and compares the prediction performance of different models and knowledge graphs. It also proposes two new evaluation indicators, T/E and E/R, that characterize the datasets, and verifies their impact on the link prediction results, thereby carrying out quality evaluation and knowledge completion on the island knowledge graph.
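The T/E and E/R columns of Table 12 are consistent with T/E being the number of triples per entity and E/R the number of entities per relation type. Under that reading, the Python sketch below recomputes both indicators from the reported counts and checks that the ISLKG training, test, and validation splits in Table 9 add up to the triple count in Table 12. The names `DATASETS` and `density_indicators` are introduced here for illustration only.

```python
from typing import Dict, Tuple

# (entities, relations, triples) as reported in Table 12.
DATASETS: Dict[str, Tuple[int, int, int]] = {
    "WN18RR": (40_943, 11, 93_003),
    "FB15K-237": (14_541, 237, 309_116),
    "YAGO3-10": (123_182, 37, 1_089_041),
    "ISLKG": (82_086, 111, 346_162),
}

# Table 9: the ISLKG training, test and validation splits sum to the triple count.
assert 297_700 + 24_231 + 24_231 == DATASETS["ISLKG"][2]


def density_indicators(entities: int, relations: int, triples: int) -> Tuple[float, float]:
    """Return (T/E, E/R): triples per entity and entities per relation type."""
    return triples / entities, entities / relations


for name, counts in DATASETS.items():
    t_e, e_r = density_indicators(*counts)
    # Rounded as in Table 12: T/E to two decimals, E/R to an integer.
    print(f"{name:10s} T/E = {t_e:5.2f}  E/R = {e_r:6.0f}")
```

A high T/E means each entity takes part in many triples, while a low E/R means many relation types relative to the number of entities; FB15K-237 combines both, which is why it is described as the most complex dataset.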

In light of the development of the island knowledge graph, the active research on domain knowledge graphs, and the lack of marine domain knowledge graphs, future research can focus on the following directions:

On the basis of the island knowledge graph, intelligent Q&A, decision-making assistance, and implicit relationship discovery systems should be developed to enhance the multi-level service capabilities of the island knowledge graph.
Enlarge the island text database, construct an island corpus, and investigate deep-learning-based knowledge extraction methods to improve the semantic understanding of knowledge extraction from island multi-source data.
Expand the island knowledge graph construction framework to include other marine fields associated with the island, and further extract the corresponding domain knowledge based on the framework, so as to expand the island knowledge graph and even construct the marine domain knowledge graph.

Author Contributions

Conceptualization, Q.H. and C.Y.; methodology, Q.H. and C.Y.; software, C.Y.; validation, C.Y. and Q.H.; resources, X.J., C.Y. and L.S.; data curation, C.Y.; writing—original draft preparation, C.Y.; writing—review and editing, Q.H., W.S. and J.W.; funding acquisition, Q.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2021YFC3101602, and the National Natural Science Foundation of China, grant number 42376194.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

We would like to thank the anonymous reviewers for their insightful comments and substantial help in improving this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhao, Y.; Ma, R.; Zhu, B.; Sun, X. Mapping study and outlook for the Zhoushan archipelago in China by geography and oceanology using bibliometrics analysis of the CNKI and WOS database. J. Ningbo Univ. 2021, 33, 88–95.
Peng, B.; Dong, Y. Ecological damage compensation for uninhabited island development: A case study of Dayangyu Island. Acta Ecol. Sin. 2022, 42, 7587–7596.
Jiang, C.; Zhang, C.; Huo, D.; Yang, H. The protection and development trend of islands at home and abroad based on bibliometrics. Mar. Sci. 2022, 46, 113–126.
Rossi, A.; Barbosa, D.; Firmani, D.; Matinata, A. Knowledge graph embedding for link prediction: A comparative analysis. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 15, 1–49.
Hoffart, J.; Suchanek, F.M.; Berberich, K.; Weikum, G. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell. 2013, 194, 28–61.
Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.; et al. DBpedia—A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semant. Web 2015, 6, 167–195.
Bollacker, K.D.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; ACM: New York, NY, USA, 2008.
Vrandečić, D.; Krötzsch, M. Wikidata: A free collaborative knowledgebase. Commun. ACM 2014, 57, 78–85.
Chen, X.; Jia, S.; Xiang, Y. A review: Knowledge reasoning over knowledge graph. Expert Syst. Appl. 2020, 141, 112948.1–112948.21.
Liu, S.; Yang, H.; Li, J.; Kolmanič, S. Preliminary study on the knowledge graph construction of Chinese ancient history and culture. Information 2020, 11, 186.
Chen, J.; Deng, S.; Chen, H. Crowdgeokg: Crowdsourced geo-knowledge graph. In Proceedings of the Knowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence: Second China Conference, CCKS 2017, Chengdu, China, 26–29 August 2017; Revised Selected Papers 2. Springer: Singapore, 2017; pp. 165–172.
Li, J.; Liu, R.; Xiong, R. A Chinese geographic knowledge base for GIR. In Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 21–24 July 2017; IEEE: Piscataway, NJ, USA, 2017; Volume 1, pp. 361–368.
Wang, S.; Zhang, X.; Ye, P.; Du, M.; Lu, Y.; Xue, H. Geographic knowledge graph (GeoKG): A formalized geographic knowledge representation. ISPRS Int. J. Geo-Inf. 2019, 8, 184.
Guo, X.; Qian, H.; Wu, F.; Liu, J. A method for constructing geographical knowledge graph from multisource data. Sustainability 2021, 13, 10602.
Xiao, Z.; Zhang, C. Construction of meteorological simulation knowledge graph based on deep learning method. Sustainability 2021, 13, 1311.
Tan, J.; Qiu, Q.; Guo, W.; Li, T. Research on the construction of a knowledge graph and knowledge reasoning model in the field of urban traffic. Sustainability 2021, 13, 3191.
Xiong, Z.; Ma, H.; Li, S.; Zhang, N. Summary of Application and Prospect Analysis of Knowledge Graphs in Marine Field. Comput. Eng. Appl. 2022, 58, 15–33.
Zhang, Q.; Wen, Y.; Zhou, C.; Long, H.; Han, D.; Zhang, F.; Xiao, C. Construction of knowledge graphs for maritime dangerous goods. Sustainability 2019, 11, 2849.
Liu, L.; Li, X. Research and Construction of Marine Chinese Medicine Formulas Knowledge Graph. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 3853–3855.
Wu, J.; Wei, Z.; Jia, D.; Dou, X.; Tang, H.; Li, N. Constructing marine expert management knowledge graph based on Trellisnet-CRF. PeerJ Comput. Sci. 2022, 8, e1083.
Ali, M.; Berrendorf, M.; Hoyt, C.T.; Vermue, L.; Galkin, M.; Sharifzadeh, S.; Fischer, A.; Tresp, V.; Lehmann, J. Bringing light into the dark: A large-scale evaluation of knowledge graph embedding models under a unified framework. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8825–8845.
Yang, D.; He, T.; Wang, H.; Wang, J. Survey on Knowledge Graph Embedding Learning. J. Softw. 2021, 33, 3370–3390.
Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013.
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Santa Cruz, CA, USA, 14–18 November 2015; Volume 29.
Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28.
Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; Volume 11, pp. 809–816.
Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. arXiv 2014, arXiv:1412.6575.
Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PMLR. pp. 2071–2080.
Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.
Palacio, M.P.; Sol, D.; Gonzalez, J. Graph-based knowledge representation for GIS data. In Proceedings of the Fourth Mexican International Conference on Computer Science, 2003, ENC 2003, Tlaxcala, Mexico, 8–12 September 2003; pp. 117–124.
ANSI X3.4-1986; American National Standard for Information Systems—Coded Character Sets—7-Bit American National Standard Code for Information Interchange (7-Bit ASCII). (Technical Report); American National Standards Institute (ANSI): Washington, DC, USA, 1986.
Cui, B.; Gao, J.; Tong, Y.X.; Xu, J.; Zhang, D.; Zou, L. Progress and Trend in Novel Data Management System. J. Softw. 2019, 30, 164–193.
Holzschuher, F.; Peinl, R. Performance of graph query languages: Comparison of cypher, gremlin and native access in neo4j. In Proceedings of the Joint EDBT/ICDT 2013 Workshops, Genoa, Italy, 18–22 March 2013; pp. 195–204.

Figure 1. Diagram of the composition of island knowledge graph (ISLKG).

Figure 2. The framework of ISLKG construction.

Figure 3. The overall flowchart of ISLKG construction and knowledge reasoning.

Figure 4. The concept hierarchy of the ontology in the island domain.

Figure 5. An example of mapping structured data to ISLKG triples.

Figure 6. The flowchart of knowledge extraction of ISLKG.

Figure 7. An example of ISLKG storage format in Neo4j. The definitions of different colored circles (distinguished by RGB) are shown in Table 8. The arrows connecting the entities signify relationships between them. In the figure, the relationships of Changxing Island with other entities are indicated by categories. Furthermore, the floating box beneath the Changxing Island entity represents its attributes, including category, latitude and longitude, population count, length of coastline, and distance to the nearest mainland.

Figure 8. The architecture of the ConvE model.

Table 1. The basic information of island concepts in the ontology.

Concept	Entities	Constraint
Former name	Nanjian Island, …	VARCHAR
Current name	Zhouzai Island, …	VARCHAR(UNIQUE)
Classification	Inhabited, Uninhabited	CHAR(10)
Sea area	The East China Sea, The South China Sea, The Bohai Sea, The Yellow Sea	CHAR(4,NOT NULL)
Administrative division	Province, City, County	VARCHAR(NOT NULL)
Type of substance	Bedrock, Sand and mud, Coral	CHAR(4)

Table 2. The basic properties of an island in the ontology.

Property	Constraint	Description
Location	Int°Float′	Including longitude and latitude
Area	Float	The size of the island
Length of shoreline	Float	The circumference of the coastline
Distance to the nearest mainland	Double	The distance between the island and the nearest mainland
Elevation	Float	The height difference between an island and sea level
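The `Int°Float′` constraint on Location in Table 2 suggests coordinates stored as integer degrees and decimal minutes. Assuming that reading, plus an optional N/S/E/W suffix that the table does not specify, the following Python sketch converts such a value into signed decimal degrees, a form convenient for distance calculations or map display. The function name `dm_to_decimal` and the accepted input format are hypothetical.

```python
import re

# Assumed reading of "Int°Float′": integer degrees, decimal minutes, and an
# optional hemisphere letter. A plain apostrophe is accepted as the minute mark.
_DM = re.compile(r"\s*(\d+)°\s*(\d+(?:\.\d+)?)[′']\s*([NSEW]?)\s*")


def dm_to_decimal(value: str) -> float:
    """Convert a degrees-and-decimal-minutes string to signed decimal degrees."""
    match = _DM.fullmatch(value)
    if match is None:
        raise ValueError(f"not in Int°Float′ form: {value!r}")
    degrees, minutes, hemisphere = int(match.group(1)), float(match.group(2)), match.group(3)
    if minutes >= 60:
        raise ValueError("minutes must be below 60")
    decimal = degrees + minutes / 60
    # Southern latitudes and western longitudes are negative by convention.
    return -decimal if hemisphere in ("S", "W") else decimal


print(dm_to_decimal("121°45′E"))  # 121.75
```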

Table 3. The concepts of social information in the ontology.

Concept	Entities	Constraint
School	Dongsha Middle School, …	VARCHAR
Hospital	Changdao County People’s Hospital, …	VARCHAR
Enterprise	Longhai No. 2 Shipping Company, …	VARCHAR
Scenic spot	Maritime Museum, …	VARCHAR
Population	(In 2010, 24,414), …	INT
Administrative agency	Zhoushan Municipal People’s Government, …	VARCHAR
Social Economy	GDP	INT
Social Economy	Fishery, Coastal tourism industry, …	VARCHAR
Hot Events	Time, Place, Introduction	VARCHAR

Table 4. The concepts of research activity in the ontology.

Concept	Entities	Constraint
Expert	Wang Yang, Dai Zhiguo, …	VARCHAR
Research field	Biology, Geology, …	VARCHAR
Paper	An empirical study on the influence of island tourism development on social change, …	VARCHAR

Table 5. The concepts of infrastructure in the ontology.

Concept	Entities	Constraint
Transportation facility	Port, Bridge, Wharf, Channel, Anchorage, Navigation	VARCHAR
Municipal facility	Landscaping, Urban road, Sanitation	CHAR(8)
Water conservancy	River, Reservoir, Seawall	CHAR(4)
Electric power facility	Grid, Microgrid, Power plant	CHAR(6)
Communication facility	Optical fiber, Base station	CHAR(4)

Table 6. The concepts of natural resources in the ontology.

Concept	Entities	Constraint
Land resource	Grass, Farmland, …	CHAR(8)
Water resource	Surface water, Groundwater	FLOAT
Mineral resource	Coal mine, Clay, Granite, …	CHAR(6)
Marine energy resource	Tidal energy, Wind energy	CHAR(6)
Marine biological resource	Marine animal, Marine plant, Marine microorganism	VARCHAR
Vegetation	Moss, Shrub, Coniferous forest, Grassland, …	VARCHAR
Terrestrial biotic resource	Plants (Camellia, Hawthorn Tree, Mangrove, …), Animals (Mouse, Snake, Ant, …)	VARCHAR

Table 7. The concepts of natural environment in the ontology.

Concept	Entities	Constraint
Climate	Sunshine, Precipitation, Temperature, Humidity	FLOAT
Soil type	Sand, Loam, …	CHAR(6)
Topography	Mountain, Hill, Plain, …	CHAR(4)
Hydrology	Temperature, Salinity, Wave height, …	FLOAT
Meteorology	Air pressure, High wind days, Ice age, …	FLOAT
Meteorology	Wind condition	VARCHAR
Sea water quality	pH, Dissolved oxygen, Phosphate, …	DOUBLE
Environmental disaster	Typhoon, Rainstorm, …	CHAR(8)
Ecological disaster	Red tide, Invasion of alien organisms, …	CHAR(12)
Geological disaster	Earthquake, Tsunami, …	CHAR(10)

Table 8. Explanation of the entity meanings represented by different colored circles in Figure 7.

Color	RGB	Entity Category	Definition
Cyan	0,204,255	Island	Entity of island category
Teal	56,105,102	Ecological disaster	Ecological disasters present on islands
Light Gray	240,244,243	Dissolved oxygen	Dissolved oxygen content in the seawater around the island
Slate Blue	121,127,168	Sea area	The sea area to which the island belongs
Slate Gray	96,125,139	Petroleum	Petroleum content around the island
Purple	108,91,123	Zinc	Zinc content in the seawater around the island
Indigo	63,81,181	Arsenic	Arsenic content in the seawater around the island
Turquoise	130,195,199	Suspended solids	Suspended solids content in the sea water around the island
Royal Blue	46,86,166	Marine mineral	Mineral resources on islands and their surrounding areas
Fuchsia	165,20,158	Energy resource	Energy resources of the island and its surrounding areas
Baby Blue	167,208,242	GDP	The annual economic total of the island
Pale Pink	210,197,204	COD	Chemical Oxygen Demand of the sea water around the island
Seafoam Green	109,206,158	City	The city to which the island belongs
Salmon	246,114,128	Scenic spot	The scenic spots included in the island
Olive Green	139,194,74	Marine industry	The marine industry included in the island
Pale Gold	242,201,117	pH	The pH value of the seawater around the island
Light Pink	251,149,175	Country	The country to which the island belongs
Orange	248,160,10	Soil type	Soil types of islands
Steel Blue	15,86,114	Former name	The former name of the island
Dark Slate Gray	48,71,83	Silicate	Silicate content in the seawater around the island
Beige	196,182,154	Material type	Material types of islands
Crimson	242,61,68	Geographic disaster	Geographical disasters present on islands
Yellow	242,227,15	Copper	Copper content in the seawater around the island
Sky Blue	104,189,246	County town	The county seat to which the island belongs
Light Blue	146,212,244	Island classification	Categories of islands classified by presence or absence of residents
Cerulean	0,188,213	Chromium	Chromium content in the seawater around the island
Dusty Rose	207,126,131	Lead	Lead content in the seawater around the island

Table 9. The specific number of datasets.

Entities	Relationships	Training Set	Test Set	Validation Set
82,086	111	297,700	24,231	24,231

Table 10. Hyperparameter values of the model.

Model	Batch_Size	Learning Rate	Dimension	Epochs
DistMult	256	0.01	200	1000
ComplEx	256	0.001	100	1000
ConvE	128	0.003	200	1000

Table 11. The model link prediction results of ISLKG.

Model	MRR	MR	Hits@10	Hits@3	Hits@1
DistMult	0.431	4026	0.552	0.469	0.367
ComplEx	0.434	3877	0.554	0.471	0.366
ConvE	0.454	2014	0.574	0.490	0.387

Bold indicates the optimal values of link prediction results on ISLKG among the three models.

Table 12. Specific details of the datasets and their link prediction results of the ConvE model.

Dataset	Entities	Relationships	Triples	T/E	E/R	MRR	MR	Hits@10	Hits@3	Hits@1
WN18RR	40,943	11	93,003	2.27	3722	0.430	4187	0.519	0.438	0.400
FB15K-237	14,541	237	309,116	21.26	61	0.325	244	0.501	0.356	0.237
YAGO3-10	123,182	37	1,089,041	8.84	3329	0.440	1676	0.621	0.489	0.351
ISLKG	82,086	111	346,162	4.22	740	0.454	2014	0.574	0.490	0.387

Bold indicates the optimal values of link prediction results of the ConvE model among the four datasets.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

He, Q.; Yu, C.; Song, W.; Jiang, X.; Song, L.; Wang, J. ISLKG: The Construction of Island Knowledge Graph and Knowledge Reasoning. Sustainability 2023, 15, 13189. https://doi.org/10.3390/su151713189

AMA Style

He Q, Yu C, Song W, Jiang X, Song L, Wang J. ISLKG: The Construction of Island Knowledge Graph and Knowledge Reasoning. Sustainability. 2023; 15(17):13189. https://doi.org/10.3390/su151713189

Chicago/Turabian Style

He, Qi, Chenyang Yu, Wei Song, Xiaoyi Jiang, Lili Song, and Jian Wang. 2023. "ISLKG: The Construction of Island Knowledge Graph and Knowledge Reasoning" Sustainability 15, no. 17: 13189. https://doi.org/10.3390/su151713189

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
