Bibliometric Analysis on the Research of Geoscience Knowledge Graph (GeoKG) from 2012 to 2023

Abstract: The geoscience knowledge graph (GeoKG) has gained worldwide attention due to its ability to formally represent the spatiotemporal features and relationships of geoscience knowledge. Currently, quantitative reviews of the state and trends in GeoKG are still scarce. Thus, a bibliometric analysis was performed in this study to fill the gap. Specifically, based on 294 research articles published from 2012 to 2023, we analyzed (1) trends in publications and citations; (2) the major papers, sources, researchers, institutions, and countries; (3) scientific collaboration; and (4) major research topics and tendencies. The results revealed that interest in GeoKG research has increased rapidly since 2019 and is continually expanding. China is the most productive country in this field. Co-authorship analysis shows that international and inter-institutional collaboration should be reinforced. Keyword analysis indicated that geoscience knowledge representation, information extraction, GeoKG construction, and GeoKG-based multi-source data integration were current hotspots. In addition, several important but currently neglected issues, such as the integration of Large Language Models, are highlighted. The findings of this review provide a systematic overview of the development of GeoKG and a valuable reference for future research.


Introduction
Geoscience knowledge graph (GeoKG), also known as geographic KG or spatial-temporal KG, has been receiving increasing attention from both academia and industry in recent years. Just like general KGs, GeoKG is a graph-structured representation of human knowledge, where nodes represent entities and the edges of the graph represent relationships between those entities [1]. It is effective in organizing knowledge into machine-understandable and computable semantic networks so that the knowledge can be processed efficiently and unambiguously by machines. Differing from general KGs, GeoKG is a geoscience domain-specific KG and has excellent capability in representing the unique spatiotemporal features and relationships of geoscience knowledge [2][3][4][5]. GeoKG is playing an increasingly important role in the discovery, mining, sharing, and service of geoscience knowledge and spatial data on the Web [5][6][7][8]. Moreover, GeoKGs are at the core of geospatial artificial intelligence (GeoAI), which is an interdisciplinary field combining geography, spatial data science, and AI, and seeks to solve major geospatial problems by developing intelligent geographic methods and applications [9][10][11]. GeoKG can be used not only to foster scientific research, such as the formal representation and sharing of geoscience knowledge and data-driven discoveries in deep-time Earth [6], but also to address many practical problems such as points-of-interest (POI) recommendation [12], geographical question answering [13], geospatial big data integration, intelligent environmental geo-services, and interactive analysis of epidemic situations [14]. Given its diverse application scenarios and immense potential, the number of GeoKG-related publications has been surging in the last decade, signifying the thriving growth and progress of this field. Therefore, a comprehensive review of GeoKG research is greatly needed so that researchers can understand the state of the art and identify the gaps in the field. To date, several review articles concerning GeoKG have been published, focusing on either the historic development of the field [5,15] or specific aspects of GeoKG research, e.g., knowledge acquisition [16,17] and GeoKG construction [8,18]. While existing reviews are insightful and helpful in understanding GeoKG research, they do not provide a quantitative perspective of the whole field. They were typically based on qualitative analyses, limited not only by the small number of analyzed publications, but also by a heavy reliance on the personal knowledge and judgment of the reviewers.
Therefore, in this paper, a bibliometric analysis was performed to explore current research performance and future development trends in GeoKG from a quantitative perspective, over the period 2012-2023, based on the Web of Science Core Collection (WoSCC) database. Bibliometric analysis is a powerful method for exploring and analyzing large volumes of scientific literature in a certain field by using quantitative and statistical techniques [19,20]. It provides a quantitative and objective understanding of the current status and trends in the whole field [19]. Furthermore, it presents a comprehensive overview of the knowledge structures of the field, including the intellectual structure, conceptual structure, and social structure, in terms of impactful authors, publications, sources, institutions, and countries [21,22]. Bibliometric analysis methods have now been widely used in a variety of scientific fields, such as geographical information systems [23] and KG [24,25]. In the field of GeoKG, although some bibliometric reviews have already been conducted on specific sub-topics, e.g., geo-ontology [26], studies concerning the research status and trends in the whole field are still scarce.
The purpose of this study is to provide valuable and practical references for researchers and practitioners in the GeoKG field. The following questions are used to guide the research: (1) What is the publication growth trend in GeoKG research? What are the possible causes? (2) Which publications have had the most significant impact on GeoKG research? What topics have they discussed? (3) What were the most prominent research areas and sources where articles were published? (4) Who were the leading authors, and what were the most prolific institutions and countries? (5) What were the scientific collaborations between major authors, countries, and institutions? What should we do next to enhance the collaborations? (6) What were the primary research topics in the field, and which topics remain underexplored?

Research Framework and Data Source
The overall framework of this review, which describes all of the analysis processes and contents, including the data sources and search terms as well as the bibliometric analysis methods, is shown in Figure 1. Detailed descriptions of each part are elaborated in the following sub-sections.
The scientific literature used for analysis was collected from the WoSCC database. WoSCC was chosen as the data source for the following two reasons. First, it is one of the world's leading citation databases and is widely used in bibliometric studies [27]. Second, it includes detailed and high-quality bibliographic records about publications from thousands of high-impact journals worldwide, making it possible to trace the progress and identify the trend in the research on GeoKG.

Search Criteria and Justifications of Search Terms
In this review, we searched the academic literature information in WoSCC using the following terms on 8 March 2024: TS = (("geographic*" OR "geoscience*" OR "geospatial*" OR "spatial-temporal*" OR "spatio-temporal*" OR "spatiotemporal*") AND ("knowledge graph*")). The asterisk (*) represents any group of characters, including no character. The term "knowledge graph" was first introduced by Google in 2012. Thus, the time span was set from 2012 to 2023 in this study. Moreover, the document type was selected as "all document types", including articles, proceeding papers, reviews, etc. A total of 294 publications were collected and processed for analysis based on the selection criteria. The analysis results of these collected publications generated using Web of Science (WoS) were downloaded for this review as well.
This study defines the keywords for search based on the investigation of search results and data analysis results, the synonyms, the comparison of results using different search terms, as well as by referring to already published articles and reviews, e.g., [5,24]. The keywords "geoscience*" and "knowledge graph*" were originally included, since they are the most relevant terms to the topic of this study, i.e., GeoKG. We then extended the keyword list by considering more terms such as "geographic*" and "geospatial*", which have been widely used in peer-reviewed articles, e.g., [28,29]. The keywords "spatial-temporal*", "spatio-temporal*", and "spatiotemporal*" were added to the list for similar reasons.
It is worth noting that this study has excluded the term "ontolog*" (which matches ontology and ontologies) from the data search. The reasons are twofold. First, the investigation conducted by Chen et al. [24] showed that adopting only the term "knowledge graph*" is reasonable. Second, we conducted data searches using an extended list containing the term "ontolog*" and obtained a total of 1875 publications. Preliminary examinations of the publications indicated that the results were undesirable, as they involved too much noise. For example, many highly cited articles within the results, such as [30] (442 citations) and [31] (263 citations), have employed the philosophical definition of ontology instead of that in the field of computer science.
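As a rough illustration of how the Boolean topic query behaves, the sketch below re-expresses it with Python regular expressions and applies it to a few made-up titles. The helper name `matches_query` and the sample strings are hypothetical. The actual WoS topic search (TS) also covers abstracts and keywords and applies its own tokenization, so this is only an approximation of the search logic, not a reproduction of it.

```python
import re

# A trailing * in the query is approximated by \w* (zero or more word characters).
GEO_TERMS = re.compile(
    r"\b(geographic|geoscience|geospatial|spatial-temporal|"
    r"spatio-temporal|spatiotemporal)\w*",
    re.IGNORECASE,
)
KG_TERM = re.compile(r"\bknowledge graph\w*", re.IGNORECASE)


def matches_query(text: str) -> bool:
    """Return True if text contains at least one geo term AND 'knowledge graph*'."""
    return bool(GEO_TERMS.search(text)) and bool(KG_TERM.search(text))


# Hypothetical titles, used only to exercise the two halves of the query.
samples = [
    "Geographic knowledge graphs for question answering",  # both parts present
    "A geospatial ontology for land cover",                # no 'knowledge graph'
    "Knowledge graph embedding methods",                   # no geo term
]
results = [matches_query(s) for s in samples]
```

Only the first sample satisfies both parts of the AND condition, which mirrors why a record about ontologies alone would not be retrieved by the query used here.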

Methods of Analysis
Collected citation data were further analyzed using Python and two bibliometric mapping tools, i.e., the bibliometrix R-package 4.1.3 [32] and VOSviewer 1.6.20, as depicted in Figure 1. Several Python 3.9 libraries, including Matplotlib 3.8.0 and SciPy 1.11.4, were used to fit and visualize the annual publications, citations, and the Logistic Growth Model. The bibliometrix R-package was used to perform (1) a descriptive analysis of the publications, authors, sources, institutions, and countries/regions; and (2) network analysis of keywords to generate the thematic map. Particularly, the local citation score (LCS) and global citation score (GCS) are used as primary indicators to assess the impact of publications. LCS represents the number of citations a document received from other documents included in the dataset collected for this study, while GCS refers to the total citations a document received in the whole bibliographic database [33], i.e., the WoSCC in this study. Thus, LCS and GCS could be used to reveal the important documents in the specific research field and the documents that attracted multidisciplinary attention.
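The difference between LCS and GCS can be made concrete with a toy collection. The following is a minimal sketch, assuming each record carries a list of the IDs it cites; the paper IDs, the `local_citation_scores` helper, and the GCS numbers are all hypothetical and do not reflect how bibliometrix computes these scores internally.

```python
def local_citation_scores(dataset_refs):
    """Compute LCS for every paper in a collection.

    dataset_refs maps each paper ID in the collection to the IDs it cites.
    A citation counts toward LCS only if the cited paper is also in the collection.
    """
    lcs = {paper: 0 for paper in dataset_refs}
    for citing, refs in dataset_refs.items():
        for cited in set(refs):  # ignore duplicate references within one paper
            if cited in lcs and cited != citing:
                lcs[cited] += 1
    return lcs


# Toy collection: "X" is a paper outside the collection.
dataset_refs = {
    "A": [],
    "B": ["A", "X"],
    "C": ["A", "B"],
}
# Hypothetical database-wide citation counts (GCS) for the same papers.
gcs = {"A": 10, "B": 4, "C": 0}

lcs = local_citation_scores(dataset_refs)
```

In this toy case, paper A has an LCS of 2 but a GCS of 10, which is the pattern expected for a document that also attracts attention outside the collected field.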
VOSviewer is a frequently used science mapping tool for analyzing bibliometric networks. It was used in this study for constructing and visualizing the co-authorships between various contributors (authors, institutions, and countries), the co-occurrences of keywords, and the co-citation analysis of publications. Co-authorship analysis is a frequently used way to identify the scientific collaborations among scholars (including their affiliations and countries) in a specific research field at the intellectual or social level [19]. The information provided by the co-authorship network is helpful for individual researchers, policy-makers, and funding agencies, because scientific collaboration plays a pivotal role in promoting the dissemination of knowledge and enhancing academic communication. It significantly contributes to the advancement of scientific discovery and the strengthening of the global academic community. The co-occurrences of keywords appearing in the literature can effectively reflect the popularity of the topics corresponding to the keywords in a field. Therefore, scholars use co-occurrence analysis of keywords in the literature to analyze the change trajectory of research hotspots and reveal the emerging trends and frontiers in a specific field [19,34]. The keywords frequently used in bibliometric analysis include author keywords (i.e., keywords given by authors), Keywords Plus, which are generated from cited article titles by algorithms in the WoS platform, and terms extracted from the title and abstract of articles. The combination of multiple types of keywords can offer a more comprehensive understanding of the research hotspots and trends in a given field [35].
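The idea behind keyword co-occurrence can be sketched in a few lines. The example below simply counts how often two keywords appear in the same record; the function name `keyword_cooccurrence` and the sample keyword lists are hypothetical, and tools such as VOSviewer additionally normalize and cluster these counts, which is not shown here.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count unordered keyword pairs that appear together in the same record."""
    counts = Counter()
    for keywords in records:
        # Normalize case and whitespace, and drop duplicates within a record.
        unique = sorted({k.strip().lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical author-keyword lists for three records.
records = [
    ["Knowledge Graph", "Ontology", "GeoAI"],
    ["knowledge graph", "information extraction"],
    ["GeoAI", "knowledge graph"],
]
cooc = keyword_cooccurrence(records)
```

Pairs with high counts correspond to thick edges in a co-occurrence map, and keywords that take part in many such pairs appear as large nodes.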
In addition, the geographic visualization of countries' collaboration was conducted using SCImago Graphica 1.0.39, a free and easy-to-use visualization tool, based on the data exported from the bibliometrix R-package and VOSviewer.

Trends in Publications and Citations
The annual trend in publications serves as a straightforward yet profound means of reflecting global activity and scientific interest towards GeoKG. As shown in Figure 2, the annual number of publications and citations on GeoKG increased significantly between 2012 and 2023. Moreover, a total of 261 papers (88.78%) were published in the last 5 years, indicating that the research interest in this topic has been growing continually, particularly since 2019, reaching 86 publications and 485 citations in 2022. Interestingly, the number of publications in 2023 decreased compared to 2022. This may be because some of the papers published in 2023 had not yet been indexed in WoSCC at the time we collected the data (8 March 2024). Normally, it can take from a couple of weeks to several months for an article to be indexed in the database after publication. Furthermore, it is common for the data to experience regular variations before stabilizing over several years.
ISPRS Int. J. Geo-Inf. 2024, 13, 255

According to the theory of technology maturity, the Logistic Growth Model could be employed to fit and forecast the cumulative number of publications [36]. The red dashed curve in Figure 3 illustrates the logistic growth function (or the S-curve function) for the global publication accumulation. It is described by Equation (1) as follows:

$$y = \frac{738.03}{1 + 790.9\, e^{[\ldots]}} \qquad (1)$$

where x and y represent the year and the corresponding cumulative publications, respectively. The least squares method for curve fitting in the Python library SciPy is adopted to obtain the parameters. Consequently, the cumulative publications on GeoKG over time follow a logistic growth pattern in the shape of an S-curve.
Similar to [36,37], the development of GeoKG could be divided into the following three stages based on Equation (1) and Figure 3: (a) infant stage (before 2020, up to 10% of publication output), (b) growth stage (2020-2028, 10-90% of publications), and (c) mature stage (after 2029). At the infant stage, GeoKG gained less attention and the annual publication numbers increased slowly, with no more than 30 articles per year. The importance of GeoKG was gradually recognized in the growth stage, and the number of publications and citations increased exponentially. One reason could be the success of general KGs in the computer science field and prominent industries. Another reason was likely because [...] have aroused worldwide attention regarding the effectiveness of KG in sharing global geoscience knowledge and facilitating data-driven scientific discovery [6]. It is anticipated that this trend in GeoKG research will be sustained until the year 2028. After reaching maturity, the growth in the number of publications will gradually slow down. Note that the predicted maturity year for GeoKG might change due to new theoretical or technological advances in geoscience or generic KGs.
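The fitting step described above can be sketched with `scipy.optimize.curve_fit`. This is a minimal illustration under stated assumptions: the three-parameter form k / (1 + a·exp(-b·x)) is the standard logistic curve, the data are synthetic values generated from made-up parameters (not the publication counts of this study), and the names `logistic` and `x_at_fraction` are hypothetical. For any curve of this form, the interval between 10% and 90% of the ceiling k has length ln(81)/b, which is how stage boundaries of the kind used above follow from the fitted growth rate.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, k, a, b):
    """Three-parameter logistic curve: k / (1 + a * exp(-b * x))."""
    return k / (1.0 + a * np.exp(-b * x))


# Synthetic cumulative counts from known, made-up parameters (NOT the study's data).
x = np.arange(1, 21, dtype=float)  # x = 1 for the first year, 2 for the second, ...
true_k, true_a, true_b = 700.0, 800.0, 0.5
y = logistic(x, true_k, true_a, true_b)

# Least-squares fit; p0 is a rough initial guess for (k, a, b).
params, _ = curve_fit(logistic, x, y, p0=[600.0, 600.0, 0.45], maxfev=10000)
k, a, b = params


def x_at_fraction(p, a, b):
    """x at which the curve reaches fraction p of its ceiling k."""
    return (np.log(a) - np.log(1.0 / p - 1.0)) / b


# Boundaries of the growth stage (10% and 90% of the ceiling).
x10 = x_at_fraction(0.1, a, b)
x90 = x_at_fraction(0.9, a, b)
```

With real data, the fitted ceiling k is sensitive to how much of the S-curve has been observed, so forecasts made before the inflection point should be read with caution.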

Top Publications, Research Areas, and Sources
The number of citations, including LCS and GCS, normally represents the academic influence of a paper to a certain extent. On the whole, the retrieved 294 documents were cited 323 times in the local database and 1843 times in the whole WoSCC database, with an average of 1.10 and 6.27 citations per item, respectively. Notably, 34 (11.56%) publications received only one citation, and a total of 206 (70%) publications have not received local citations yet. Table 1 lists the top 11 articles in GeoKG with a minimum LCS value of 8. Topics of these highly cited articles could be divided into the following five categories: (a) geoscience knowledge representation [29,38,39], (b) geoscience information extraction [40,41], (c) GeoKG construction and completion [3,8,42,43], (d) GeoKG application [44], and (e) review articles that introduced the current status and future developments of GeoKGs from a qualitative perspective [6,8].
Particularly, in the first category, the article entitled "Geographic Knowledge Graph (GeoKG): A Formalized Geographic Knowledge Representation" published by Wang et al. [29] was the most cited (23 citations) document in the local database as of March 2024. It designed a formalized knowledge representation model and supplemented the constructors of the ALC (attributive language with complements) description language to represent geographic states, evolutions, and mechanisms. Zheng et al. [38] presented a hierarchical cubical model structure to represent geographic evolutionary knowledge, including the evolution mechanism of geographic elements and the reasons. Evolution not only happens to geographic elements, but also to domain concepts. Thus, a mechanism of version control and organization of concepts is needed to reduce the semantic ambiguity caused by the evolution. To this end, Ma et al. [39] proposed a new structure based on the identifiers of vocabulary schemes for version control and tracking of concepts and attributes in a GeoKG. What the three highly cited papers have in common is a special focus on the evolution of geoscience knowledge.
The second category focuses on geoscience information extraction from textual geoscience data based on natural language processing (NLP) techniques. It is a very important prerequisite task for the construction and application of GeoKGs. Specifically, the article entitled "Information extraction and knowledge graph construction from geoscience literature" authored by Wang et al. [40] received the largest GCS value (108) and the second largest LCS value (15) as of March 2024. It developed a workflow to extract information and construct KG from the unstructured Chinese geoscience literature.
The third category of highly cited papers centered on GeoKG construction and completion [3,8,42,43], which are important prerequisites to the success of GeoKGs. They consist of several iterative steps, including data curation and integration, text classification and information extraction, knowledge representation and encoding, as well as entity disambiguation and linking. Currently, GeoKG construction and completion are still complex, time-consuming, and limited in scale, considering the heterogeneity of multivariate geoscience big data and the dynamic nature of geoscience knowledge. There are still many important issues that need to be studied in the future [3,8,42,43].
GeoKGs can be used for many types of applications, although there was only one highly cited paper [44]. Typical applications of GeoKGs include, but are not limited to, geographical question answering, geospatial knowledge summarization, knowledge-driven integration and analysis of spatiotemporal big data, intelligent map editing and mapping, intelligent environmental geo-services, knowledge-driven remote sensing image analysis, smart city, digital humanities, and virtual disaster environments [3,8,9,42,44]. Such applications show that GeoKGs could not only boost the performance of existing applications, but also open up the path toward new smart applications in the big data era.
The 294 articles covered 41 WoS research areas, and the top 20 areas with the most publications are shown in Figure 4. Note that each paper may belong to more than one research area in the WoS database. The top 10 most productive research areas were computer science (205 documents, 69.73% of the 294 outputs), engineering (51, 17.35%), remote sensing (40, 13.61%), geology (36, 12.25%), physical geography (36, 12.25%), geography (23, 7.82%), environmental sciences ecology (16, 5.44%), information science library science (12, 4.08%), telecommunications (12, 4.08%), and science technology other topics (10, 3.40%). The distribution of research areas suggests the high priority of technical issues in GeoKG research. It also reveals the close relationships between GeoKG, computer science, and earth science (including remote sensing, geology, physical geography, geography, and ecology). AI techniques in computer science provide essential approaches and standards for the implementation of GeoKG, including knowledge representation, extraction, embedding, completion, fusion, and reasoning. Earth science data can yield unique spatial-temporal information, which is important not only for scientific discovery, but also for practical applications such as policy-making. GeoKG, in turn, can facilitate the representation, retrieval, integration, and sharing of earth science data from highly heterogeneous sources, promoting knowledge-assisted data intelligence and computational intelligence [5,42]. Thus, GeoKG researchers should keep a close eye on the development of computer science and earth science.
Furthermore, the 294 publications concerning GeoKG appeared in 180 sources. According to Bradford's law as implemented in the bibliometrix R-package, there are 15 core publication sources. Among them, the top nine sources with a minimum publication count of five are shown in Table 2. In terms of the publication number, ISPRS International Journal of Geo-Information was the most productive source in the field (published 22 articles), followed by Transactions in GIS (16) and Geoscience Frontiers (8). Specifically, the top nine (5%) sources published 79 (26.87%) of the 294 outputs. In contrast, 131 sources (72.78%) published only one paper on GeoKG. Moreover, regarding the citation count, the top three sources were Computers & Geosciences (194 citations), ACM Transactions on Information Systems (140), and ISPRS International Journal of Geo-Information (133). In addition, according to the average citations, the top three were Computers & Geosciences (38.8), International Journal of Geographic Information Science (11.83), and IEEE Access (11.8), suggesting their considerable impact in this field. The h-index is frequently used to measure both the productivity and citation impact of the publications of a source or a scientist. An h-index of h means that h publications have each been cited at least h times. The top four sources with a minimum h-index value of 5 were ISPRS International Journal of Geo-Information (7), Computers & Geosciences (5), International Journal of Geographic Information Science (5), and Transactions in GIS (5). These findings indicate that the GeoKG outputs among journals or conference proceedings were very dispersed, but the primary concentration was on a limited number of sources. Researchers could follow these sources to keep updated with the latest research or select suitable journals to publish their works.
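The h-index definition used above translates directly into code. The following is a minimal sketch with a hypothetical `h_index` helper and made-up citation counts; it follows the standard definition rather than any particular tool's implementation.

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for the papers of one source.
example = [10, 8, 5, 4, 3]
example_h = h_index(example)
```

Because the index is capped by the number of papers, a source with few but very highly cited GeoKG papers can have a high average citation count and still a modest h-index.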

Leading Authors, Institutions, and Countries
Author analysis in a certain research field can help scholars identify the leading experts and thereby track the latest research trends and achievements in the field in a timely manner. As a whole, the collected 294 publications concerning GeoKG were contributed by 1084 authors. Table 3 presents the top ten most prolific authors who have published at least seven papers. The top three authors were Janowicz Krzysztof (twelve publications), Mai Gengchen (ten), and Qiu Qinjun (ten). In contrast, 894 (82.47%) authors had published only one paper. Furthermore, according to the Price formula, i.e., $N = 0.749\,(N_{\max})^{1/2}$ with $N_{\max} = 12$ (the number of the most prolific author's publications), 72 authors who have published more than 2.59 papers were recognized as core authors in the field. In terms of the number of citations, the top three authors were Janowicz Krzysztof (246), Ma Xiaogang (221), and Mai Gengchen (206). They were also the only three scholars who had more than 200 citations. Notably, Janowicz Krzysztof is both the most prolific and the most influential researcher in the field of GeoKG. He also has the most continuous trajectory in GeoKG research. His research areas include spatial and temporal principles of knowledge organization, geospatial semantics, the semantic web, KGs, GeoAI, and spatial studies.
The collected 294 publications concerning GeoKG were contributed by 449 institutions. Among them, as shown in Table 4, the Chinese Academy of Sciences (CAS) was the most productive institution with 31 publications, accounting for 10.54% of the 294 outputs, followed by the China University of Geosciences, Wuhan (23), and the University of Chinese Academy of Sciences (20). In contrast, 345 (76.84%) institutions have published only one paper. This shows that the output of these institutions was uneven. It is worth noting that ten of the top fifteen most prolific institutions were located in China, with 129 papers, accounting for 43.88% of the total publications, indicating the great interest of Chinese scientists in GeoKG research. However, regarding the total citations, the top three institutions were the China University of Geosciences, Wuhan (342 citations), the University of California Santa Barbara (253), and the University of Idaho (221), revealing their high impact on GeoKG research.
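As a quick numerical check of the Price formula used above to identify core authors, the sketch below evaluates the threshold for a most prolific author with 12 publications. The helper name `price_threshold` is hypothetical.

```python
import math


def price_threshold(n_max: int) -> float:
    """Price formula: minimum output of a core author, N = 0.749 * sqrt(N_max)."""
    return 0.749 * math.sqrt(n_max)


threshold = price_threshold(12)          # N_max = 12 in this study
min_papers = math.floor(threshold) + 1   # smallest integer count above the threshold
```

The threshold evaluates to about 2.59, so in practice a core author is one with at least three papers, which is consistent with setting the minimum number of documents per author to three in the co-authorship analysis.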
Furthermore, the 294 publications concerning GeoKG were contributed by 45 countries, among which the top nine with a minimum publication count of 4 are shown in Table 5. According to the results, China was the most productive country in GeoKG research with 158 publications, accounting for 53.74% of the total publications, followed by the USA (48) and Germany (18). Additionally, China, the USA, and Australia were ranked as the top three most cited countries, with 905, 464, and 130 citations, respectively. This reflects that these three countries paid relatively high attention to the research of GeoKG. In addition, total link strength (TLS) is widely used to indicate the influence of a node in a network. The greater the TLS value of a node, the greater its impact. Thus, China was the most influential country in the field of GeoKG, with a TLS value of 55, followed by the USA (49), India (20), Australia (20), and Germany (19). It is noteworthy that, although Australia, New Zealand, and Finland have fewer publications (4, 1, and 1, respectively), their articles have exhibited a significantly higher quality and influence, evident in their average citations per publication (ACP) values (32.5, 24, and 20). However, the ACP value of China stands relatively low at 5.73, ranking fifth, which suggests that Chinese scientists should strengthen their research and publish high-impact papers in the future.
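The two indicators used above can be sketched as follows. TLS is computed here as the sum of the strengths of all links attached to a node, and ACP as total citations divided by total publications. The function names and the three-node toy network are hypothetical, so the resulting TLS values are illustrative only; the ACP call uses the citation and publication counts reported above for China.

```python
from collections import defaultdict


def total_link_strength(links):
    """Sum the strengths of all links attached to each node (undirected network)."""
    tls = defaultdict(int)
    for node_a, node_b, strength in links:
        tls[node_a] += strength
        tls[node_b] += strength
    return dict(tls)


def average_citations_per_publication(citations: int, publications: int) -> float:
    """ACP = total citations / total publications."""
    return citations / publications


# Toy co-authorship network with hypothetical link strengths.
links = [("P", "Q", 5), ("P", "R", 2), ("Q", "R", 1)]
tls = total_link_strength(links)

# ACP for China from the counts reported in the text (905 citations, 158 papers).
acp_china = average_citations_per_publication(905, 158)
```

Because ACP divides by the publication count, a country with a handful of well-cited papers can outrank a far more productive one, which is the effect noted above for Australia, New Zealand, and Finland.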

Scientific Collaboration Analysis
Scientific collaborations among scholars can generate new ideas and richer insights, thus improving the research. In this section, we use VOSviewer to analyze the scientific collaborations, i.e., co-authorship relationships of major contributors, including authors, institutions, and countries. The parameter "minimum number of documents of an author" was set to three according to the Price formula mentioned above. Figures 5 and 6 show the resulting network map and the average year of each publication, respectively, with a set of 57 authors. The sizes of the nodes and edges denote the TLS of a node and the link strength between two nodes, respectively. Edges connecting nodes represent co-authorships. The color of a node signifies the cluster it belongs to, where clusters are tightly connected research communities of authors interlinked via co-authorship relations. The colors depicted in Figure 6 illustrate the average publication year (APY) of each author.
As illustrated in Figure 5, the network map primarily consists of 13 clusters. Detailed information on the top seven clusters, each of which groups at least four authors, is listed in Table 6. Regarding the number of influential authors involved (see Table 3), cluster 6 and cluster 2 were the most impactful research communities in GeoKG research. Furthermore, the TLS values show that authors in these clusters, except cluster 7, have a very close cooperative relationship. However, cooperation between clusters is limited to clusters 1, 4, and 6, with no collaboration discernible between the remaining clusters. This indicates that the network is fragmented and that cooperation among different research communities is very weak. Therefore, scholars in the field of GeoKG should explore opportunities to strengthen scientific collaborations across disciplines, institutions, and/or countries in the future. Additionally, institutions and funding agencies should provide more support for initiatives aimed at fostering scientific collaboration among different research communities.
Figure 8 shows the international collaboration among the 45 countries. The colors on the map represent the cluster to which each country belongs. The size of each country represents its total publications on GeoKG research. Lines connecting the countries indicate collaboration among them, and the width of the lines signifies the intensity of the relationships (thin lines indicate weak relationships). As a result, China and the USA were the most collaborative countries, both collaborating with 20 other countries. The top four collaborative partners of China were the USA (fifteen links), Australia (seven), the United Kingdom (six), and France (three). The top six collaborative partners of the USA were Poland (four links), India (four), the United Kingdom (three), Germany (three), Austria (three), and Australia (three). It is worth noting that countries/regions from four continents, namely Asia, North America, Europe, and Australia, have contributed most of the publications and collaborations concerning GeoKG, as shown in Figure 8. Thus, international scientific collaborations between the abovementioned continents and Africa and South America should be strengthened in the future. Such collaboration can facilitate the sharing of data, knowledge, resources, and funding, thereby accelerating the development of global GeoKG research.

Keyword Analysis
In this section, the co-occurrence analysis of keywords was performed based on both author keywords and Keywords Plus using VOSviewer. To achieve better accuracy, the collected bibliometric dataset was pre-processed. Firstly, keywords in plural forms (e.g., "knowledge graphs") were converted to singular forms. Secondly, the term "geospatial knowledge graph" and its synonyms (e.g., "geographic knowledge graph") were abbreviated as "GeoKG". Consequently, a total of 191 keywords that appeared at least two times were clustered and visualized, as shown in Figure 9. The size of each node signifies the occurrences of the keyword. Lines connecting two nodes indicate co-occurrence among them, and the line width represents the frequency of the co-occurrence.
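The pre-processing and counting steps described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in VOSviewer: the lookup tables `SINGULAR` and `SYNONYMS`, the function `normalise`, and the three sample records are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical lookup tables; the real study would need far larger ones.
SINGULAR = {"knowledge graphs": "knowledge graph", "ontologies": "ontology"}
SYNONYMS = {
    "geospatial knowledge graph": "GeoKG",
    "geographic knowledge graph": "GeoKG",
}
MIN_OCCURRENCES = 2  # threshold stated in the text

def normalise(keyword: str) -> str:
    """Lower-case, singularise, then map synonyms onto one label."""
    kw = keyword.strip().lower()
    kw = SINGULAR.get(kw, kw)
    return SYNONYMS.get(kw, kw)

# Hypothetical keyword lists, one per publication.
records = [
    ["Knowledge graphs", "Geographic knowledge graph", "ontology"],
    ["knowledge graph", "geospatial knowledge graph"],
    ["ontologies", "knowledge graph", "deep learning"],
]

cleaned = [sorted({normalise(k) for k in rec}) for rec in records]
occurrences = Counter(k for rec in cleaned for k in rec)
kept = {k for k, n in occurrences.items() if n >= MIN_OCCURRENCES}

# Co-occurrence: number of publications in which two kept keywords appear together.
cooccurrence = Counter()
for rec in cleaned:
    for pair in combinations([k for k in rec if k in kept], 2):
        cooccurrence[pair] += 1

print(occurrences)
print(cooccurrence)
```

Explicit lookup tables are used instead of a suffix rule because naive stemming would wrongly shorten keywords such as "semantics".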
These keywords could be grouped into four clusters based on their association strength. Keywords that occurred at least five times in each cluster are listed in Table 7. The average link strength (ALS) of a cluster represents the closeness of the keywords contained in it. The greater the ALS of a cluster, the greater the co-occurrence strength between the keywords and the more concentrated the research topics. Conversely, a lower ALS means that the co-occurrence intensity is relatively low and the research is more dispersed. In addition, the TLS of a keyword represents the importance of the keyword in the network. The higher the TLS, the more important the keyword is for the construction of the network. Additionally, the average citation (AC) of the keywords indicates the level of interest in the cluster's topic.
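The two cluster measures can be illustrated with a small sketch. It assumes the association strength normalisation commonly attributed to VOSviewer (co-occurrences divided by the product of the two keywords' occurrence counts, up to a constant factor) and assumes that ALS is the mean strength of the links inside a cluster; the text does not give either formula explicitly, and all counts below are hypothetical.

```python
from statistics import mean

# Hypothetical occurrence and co-occurrence counts (not taken from Table 7).
occurrences = {"geokg": 12, "ontology": 6, "linked data": 3, "deep learning": 4}
cooccurrence = {
    frozenset({"geokg", "ontology"}): 6,
    frozenset({"geokg", "linked data"}): 3,
    frozenset({"ontology", "linked data"}): 3,
    frozenset({"geokg", "deep learning"}): 2,
}

def association_strength(a: str, b: str) -> float:
    """Co-occurrences of a and b, normalised by their occurrence counts."""
    c_ab = cooccurrence.get(frozenset({a, b}), 0)
    return c_ab / (occurrences[a] * occurrences[b])

def average_link_strength(cluster: set) -> float:
    """Mean strength of the links whose two endpoints both lie in the cluster."""
    links = [w for pair, w in cooccurrence.items() if pair <= cluster]
    return mean(links) if links else 0.0

semantic_cluster = {"geokg", "ontology", "linked data"}
print(association_strength("geokg", "ontology"))
print(average_link_strength(semantic_cluster))
```

The normalisation keeps very frequent keywords from dominating the clustering simply because they co-occur with everything.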
[Table 7, partial: visualization (14, 84), system (13, 61), model (11, 39), knowledge representation (8, 20), semantics (7, 34), management (6, 38)]
As illustrated in Figure 9, cluster 1 (red) includes terms commonly found in the topic of AI-based information extraction and GeoKG construction. It covers a variety of AI technologies such as deep learning, NLP, and graph neural networks, aiming to extract knowledge and construct GeoKGs from big data for intelligent applications in different fields [3,8,17]. Cluster 1 has the largest value of AC among the four clusters, indicating that information extraction and GeoKG construction are currently the hottest research topics in the field.
Cluster 2 (green) includes terms frequently used in studies of knowledge representation, management, and visualization that are based on semantic web techniques. It covers several research aspects such as knowledge representation models [29,38], knowledge management and visualization [45], knowledge-enhanced systems [46], and linked open data [47], as well as their applications in various areas such as COVID-19 [14], digital humanities [48], and VGEs [44].
Cluster 3 (dark cyan) includes terms commonly found in the topic of GeoKG completion and application. It emphasizes the use of AI technologies such as machine learning and knowledge embedding for KG completion tasks, particularly link prediction [49]. This, in turn, could improve the effectiveness of knowledge-driven applications such as POI recommendation [50] and decision-making [51]. The cluster's ALS value is the lowest among the four, and the TLS values of its keywords, except "knowledge" and "machine learning", are less than 11, demonstrating that the research topics of this cluster are less concentrated.
Cluster 4 (purple) includes terms frequently found in the research of multi-source spatial data integration based on GeoKG.It covers multiple types of spatial data on the Web such as OpenStreetMap, Wikidata, and geologic time scales, as well as semantic web technologies such as ontology, KG, linked data, and SPARQL, aiming to improve the integration and semantic interoperability of spatio-temporal data and information in earth science [43,52].This cluster has the largest ALS value, indicating that its research contents are highly concentrated.
Furthermore, the APY of each keyword in the GeoKG research domain was calculated and added to each node in the network, as shown in Figure 10. The warmer (redder) a node is, the more recently the keyword has emerged. The top 10 keywords with the largest APY values were "bert" (Bidirectional Encoder Representations from Transformers; cluster 1), "knowledge reasoning" (cluster 1), "ontology model" (cluster 1), "smart city" (cluster 1), "machine" (cluster 2), "public transport" (cluster 2), "smart card data" (cluster 2), "models" (cluster 2), "city" (cluster 3), and "interoperability" (cluster 4). Most of these keywords appear in clusters 1 and 2, indicating that knowledge representation and GeoKG construction were hot topics in recent years. Moreover, these terms could be divided into the following three categories: (a) AI techniques (e.g., BERT, machine, knowledge reasoning), (b) knowledge representation and interoperability (e.g., models, ontology model, interoperability), and (c) smart city (e.g., smart city, public transport, smart card data, city). This means that, in recent years, more attention has been paid to the utilization of AI technologies in geoscience knowledge extraction and reasoning, as well as the extended applications of GeoKG in new fields such as smart cities. Interestingly, the APY for the keyword "ontology" (including "ontology model" and "domain ontology") is greater than 2021, even though the earliest research on geo-ontology can be traced back to the 1990s [26]. Such a long history of activity in the research of geo-ontology implies the fundamental role of semantic modeling and representation of geoscience knowledge in GeoKG research. This may be because one of the most important goals of GeoKG is to transform unstructured knowledge fragments into a formal representation, to facilitate the integration of multi-source geoscience data, and to enable intelligence.
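As an illustration of how a keyword's APY can be obtained, the sketch below averages the publication years of the documents in which each keyword appears. The records are hypothetical, and taking APY as a plain arithmetic mean over documents is an assumption about the calculation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (publication year, keywords of the publication).
records = [
    (2019, ["ontology", "linked data"]),
    (2022, ["ontology", "bert"]),
    (2023, ["bert", "smart city"]),
    (2023, ["ontology"]),
]

# Collect the publication years of every document that uses each keyword.
years_by_keyword = defaultdict(list)
for year, keywords in records:
    for kw in set(keywords):
        years_by_keyword[kw].append(year)

# APY: mean publication year per keyword; larger means more recent.
apy = {kw: mean(years) for kw, years in years_by_keyword.items()}
ranked = sorted(apy, key=apy.get, reverse=True)

for kw in ranked:
    print(f"{kw}: {apy[kw]:.2f}")
```

A keyword with a long history can still have a recent APY when most of its occurrences fall in the latest years, which is the situation described for "ontology".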

Future Directions for GeoKG Research
In addition to the above-mentioned research topics, there are other important issues that deserve attention but are currently neglected, as seen in recent geoscience research and in the latest KG and AI studies. These important issues are as follows.

Generally, there are two basic types of knowledge: declarative knowledge and procedural knowledge [5]. Declarative knowledge is also known as descriptive or conceptual knowledge. It comprises all of the explicit knowledge about facts, concepts, and principles that can be used to explain and distinguish things, helping people answer the questions of what, why, and how. Procedural knowledge is also known as operational knowledge or application-context knowledge, and is normally implicit or tacit. It refers to the cognitive processes and operational procedures that define how things are conducted, helping people answer the questions of how to do something to solve a given problem. An example is the experience and steps required to build a geographic model or geoprocessing workflow for a specific application context [53,54]. Existing GeoKGs mainly focus on declarative knowledge, since it is easier to extract and represent than procedural knowledge. Thus, new methods are required in the future to represent procedural knowledge in GeoKG. One possible solution could be case-based methods, which transform the acquisition of implicit procedural knowledge from the elicitation of explicit knowledge (e.g., rules) into a task of gathering historical cases, which is easier and more efficient [55][56][57]. Prof. John P. Wilson, a famous geographer and Editor-in-Chief of the journal Transactions in GIS, has featured the case-based method as one of the future needs and opportunities to capture and use the relevant digital terrain modeling application-context knowledge [58].
Geoscientific models have been recognized as powerful and effective tools to solve complex geoscientific problems. Prof. Krzysztof Janowicz outlined spatially explicit models as one of the significant research directions of GeoAI in one of his hot papers (with an LCS value of 140) [10]. To date, the number of geoscientific models available across various subdomains of geoscience, including earth and environmental science, geography and remote sensing, and related fields, has increased significantly [59]. Consequently, it is increasingly difficult for users, especially non-experts, to discover and build fit-for-application models. Therefore, intelligent methods and tools that can minimize the dependence on users' modeling knowledge and skills, e.g., question answering and recommendation of models and input data for specific application contexts, are urgently needed [53,54,60]. This idea is similar to the semantically aware environmental modeling approach proposed by Villa et al. [61], which has received 91 citations in the local database. However, existing GeoKGs have mostly ignored the knowledge of geoscientific models, while research on model knowledge or intelligent modeling has neglected the construction of GeoKGs [61][62][63][64]. Thus, further studies of GeoKGs regarding the knowledge representation of geoscientific models would be worthwhile.
The multi-modal knowledge graph (MMKG) is a key step towards the realization of human-level machine intelligence [65]. A search for the term "multi-modal knowledge graph*" in WoSCC from 2020 to 2023 returned 225 publications, reflecting the research enthusiasm for this topic in the field of KG. Geoscience knowledge is inherently multimodal. For example, both text and maps are essential for understanding, representing, and propagating geospatial information [66]. The systematization, completeness, and richness of geoscience knowledge vary significantly between different modalities [2]. In addition, learning from multi-modal sources, including the correspondences between modalities, makes it possible for AI to gain an in-depth understanding of natural phenomena, i.e., it improves the robustness and performance of deep learning models [67]. Thus, geoscience data in different modalities such as text, images, maps, schematic diagrams, data tables, and videos are important sources for constructing and updating GeoKG [2,3,68]. However, most of the existing GeoKGs focus on representing textual geoscience knowledge, while paying little attention to the proliferation of multi-modal geoscience data. This weakens the capability of machines to describe and understand the real world [65]. Thus, more efforts are required in the future to construct multi-modal GeoKGs, i.e., to associate symbolic knowledge in a traditional GeoKG, including entities, concepts, relations, etc., with the corresponding entities in other modalities [65].
Large language models (LLMs) have achieved huge success in recent years for their great performance in the field of AI, especially in NLP tasks such as question answering and text generation. A total of 1117 related articles published in 2023 were retrieved from WoSCC using the search term "Large Language Model* or LLM*", indicating the huge impact of LLMs on the research of AI. LLMs and KGs can mutually enhance each other. LLMs can be applied to augment various KG-related tasks, e.g., KG construction, KG embedding, KG completion, and KG-based question answering, to improve their performance and facilitate their applications, while KGs can be used to augment LLMs, e.g., in training and prompt learning or by providing explicit domain knowledge, so as to mitigate hallucination and improve interpretability [69,70]. However, while integrating LLMs into geoscience is currently a hot topic [71,72], no research has been found that investigated the integration of LLMs and GeoKGs. Therefore, it is strongly recommended to unify LLMs and GeoKGs in the future. This may not only change the trend in GeoKG research, but also delay the predicted maturity year (i.e., 2028).
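One of the directions mentioned above, supplying explicit domain knowledge to an LLM, can be sketched as a retrieval step that places GeoKG facts into the prompt. The example below is only an illustration: the triples, the string-matching retrieval, and the function names `retrieve` and `build_prompt` are hypothetical, and no particular LLM API is assumed or called.

```python
# Hypothetical GeoKG triples (subject, predicate, object), for illustration only.
TRIPLES = [
    ("Yangtze River", "flows_through", "Wuhan"),
    ("Wuhan", "located_in", "Hubei Province"),
    ("Jurassic", "part_of", "Mesozoic"),
]

def retrieve(question: str, triples=TRIPLES) -> list:
    """Return triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]

def build_prompt(question: str) -> str:
    """Prepend the retrieved facts so that a model can ground its answer in them."""
    facts = [f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in retrieve(question)]
    return (
        "Answer using only the facts below.\n"
        + "\n".join(facts)
        + f"\nQuestion: {question}"
    )

print(build_prompt("Which province is Wuhan in?"))
```

A real system would replace the string matching with entity linking and graph queries; the point of the sketch is only that the facts reaching the model come from the graph rather than from the model's own parameters.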

Conclusions and Limitations
The purpose of this study is to analyze the current state and future trends in GeoKG research from a quantitative perspective using bibliometric techniques. A total of 294 papers concerning GeoKG research published from 2012 to 2023 were collected from the WoSCC database and analyzed using the bibliometrix R package and VOSviewer software. Results of the bibliometric analysis show that there has been an ongoing increase in GeoKG research over the past 12 years, particularly since 2019. This trend will be sustained until 2028, as predicted by the Logistic Growth Model. ISPRS International Journal of Geo-Information and Computers & Geosciences were the most productive and most cited journals in this field, respectively. The research areas of most publications were concentrated in computer science and the sub-disciplines of earth science, including remote sensing, geology, and geography. Moreover, researchers including Janowicz Krzysztof, Ma Xiaogang, Mai Gengchen, and Qiu Qinjun have been highly active in GeoKG research. China has contributed most of the publications in this field, and the Chinese Academy of Sciences has been the most productive institution. Scientific collaboration on GeoKG research is frequent, but still needs to be enhanced, especially for international and inter-institutional collaboration. This analysis also detected that geoscience knowledge representation, information extraction, GeoKG construction, and GeoKG-based multi-source data integration are currently the hot spots in the field. More studies are required on the application of GeoKG. Four research directions, namely the representation of procedural knowledge and of geoscientific model knowledge in GeoKGs, the construction of multi-modal GeoKGs, and the integration of LLMs and GeoKGs, are worthy of attention, and they are expected to become the major research directions in the future.
The major contributions of this review include the following aspects. First, it provides researchers, policymakers, and practitioners with systematic information on the study of GeoKG, helping them to better understand the current state and trends in research in this field or to evaluate the effects of funding and policies on GeoKG. Second, the findings on influential publications and prolific sources suggest sources through which scholars, especially newcomers to the field, can track the research frontiers and publish their work. Additionally, the results provide valuable information for scientists and institutions to find potential collaborators. More importantly, findings from the review remind researchers of the key research methods and topics in the field as well as the future directions.
However, several limitations of this study need to be acknowledged. First, only English publications from the WoSCC database were collected. Papers written in other languages or distributed in other databases were not included in this study, which may result in deviation in the results. Adding more data sources such as Scopus and arXiv, as well as papers written in other languages (e.g., Chinese), could make the review more comprehensive. Second, the time span is limited to 2012-2023. Thus, papers published before 2012 and after 2023 were excluded from the study. Expanding the time span may provide a more historical view of the field. Third, the growth trend in the publications may last longer than the model predicts, since it could be affected by many factors such as emerging AI techniques and new big science programs. Finally, while bibliometric analysis has its advantages, it is difficult for this quantitative approach alone to reach deep and thorough conclusions about this interdisciplinary field. Therefore, qualitative review methods that incorporate expert opinions could be employed in the future to enrich our understanding of this evolving and complex research area.

Figure 1. The research framework of this review.


Figure 2. Trends in the annual number of publications and citations on GeoKG during 2012-2023.

Figure 3. Growth trend curve of publication number.

Figure 3 illustrates the logistic growth function (or the S-curve function) for the global publication accumulation. It is described by Equation (1) as follows:
$$y = \frac{738.03}{1 + 790.9\,e^{-0.52(x - 2011)}} \qquad (1)$$


Three stages can be distinguished in Figure 3: (a) infant stage (before 2020, up to 10% of publication output), (b) growth stage (2020-2028, 10-90% of publications), and (c) mature stage (after 2029). At the infant stage, GeoKG gained less attention and the annual publication numbers increased slowly, with no more than 30 articles per year. The importance of GeoKG was gradually recognized in the growth stage, and the number of publications and citations increased exponentially. One reason could be the success of general KGs in the computer science field and prominent industries. Another likely reason was the official launch of the Deep-time Digital Earth (DDE) project in February 2019. DDE is the first IUGS (International Union of Geological Sciences)-recognized big science program. Its research plan on building deep-time Earth knowledge systems may
ISPRS Int. J. Geo-Inf. 2024, 13, 255
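The stage boundaries quoted above can be checked by inverting the logistic curve. The derivation below is a sketch that assumes the stages are defined by 10% and 90% of the saturation level K = 738.03 and uses the parameters a = 790.9, r = 0.52, and x_0 = 2011 of Equation (1); the symbol p for the fraction of K is introduced here for illustration.

```latex
\[
pK = \frac{K}{1 + a\,e^{-r(x - x_0)}}
\;\Longrightarrow\;
a\,e^{-r(x - x_0)} = \frac{1 - p}{p}
\;\Longrightarrow\;
x(p) = x_0 + \frac{1}{r}\,\ln\frac{a\,p}{1 - p}
\]
\[
x(0.1) = 2011 + \frac{\ln(790.9/9)}{0.52} \approx 2019.6,
\qquad
x(0.9) = 2011 + \frac{\ln(790.9 \times 9)}{0.52} \approx 2028.1
\]
```

Both values agree with an infant stage ending just before 2020 and a growth stage lasting until about 2028.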

Figure 5. Collaboration networks of core authors on GeoKG research.

Figure 6. Collaboration networks with the average publication year of each author.


Figure 7 shows the collaboration network map of the institutions with at least three publications. It consists of 45 institutions grouped into 17 clusters. Six clusters comprised between three and eight institutions, and the remaining clusters included only one or two institutions. Similar to Figure 5, the size of the nodes represents the TLS of institutions, and lines connecting the nodes indicate the inter-institutional collaborations. Nodes sharing the same color signify institutions that collaborate more closely with each other than with others. Details of the institutions with a minimal number of nine publications are listed in Table 4. As a result, the most collaborative institutions were the Chinese Academy of Sciences, University of Chinese Academy of Sciences, China University of Geosciences Wuhan, Tsinghua University, Chengdu University of Technology, and Nanjing University, which had collaborated with 20, 14, 13, 13, 12, and 12 other institutions, respectively. Notably, all six institutions are based in China, demonstrating the close cooperation within the country. However, the fragmentation of the network suggests that collaboration among institutions based in different countries should be strengthened.

Figure 7. Institutions' collaboration network map and clusters based on VOSviewer.


Figure 8. Countries' collaboration world map. Numbers in brackets represent the links between the country and others.


Figure 9. Distribution of keywords and topics on GeoKG research based on VOSviewer.


Figure 10. Keywords' average year distribution in the GeoKG research domain.


Table 1. Top 11 most influential documents ranked using LCS with a minimal value of 8.
DOI: Digital Object Identifier; LCS: local citation score; GCS: global citation score.

Table 2. Top nine core sources on GeoKG that have published at least five papers.

Table 3. Top ten most prolific authors ranked by the number of publications.
NP: number of productions; TC: WoSCC times cited count; PY_start: first year published; CAS: Chinese Academy of Sciences.

Table 4. The top 15 contributing institutions with a minimal publication number of six.
NP: number of productions; TC: times cited count; AC: average citations; TLS: total link strength; APY: average publication year; PRC: People's Republic of China.

Table 5. The top nine contributing countries on GeoKG research.
NP: number of productions; SCP: singular country publication; MCP: multi-country publication; ACP: average citations per article; TLS: total link strength.

Table 6. Detailed information on the top seven clusters. Centered authors are those who have the largest total link strength (TLS) in the cluster; NA: number of core authors included in the cluster; NMIA: number of the most influential authors (see Table 3) included in the cluster; APY: average publication year.

Table 7. Clusters of keywords in GeoKG publications.
