Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards

Abstract: Constructing a knowledge graph of geological hazards literature can facilitate the reuse of geological hazards literature and provide a reference for geological hazard governance. Named entity recognition (NER), as a core technology for constructing a geological hazard knowledge graph, faces the challenges that named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. This can introduce difficulties in designing practical features during NER classification. To address this problem, this paper proposes a deep learning-based NER model; namely, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer and a conditional random field (CRF) model. In an end-to-end and supervised process, the proposed model automatically learns and transforms features with a multi-branch BiGRU layer and enhances the output with a CRF layer. Besides the deep, multi-branch BiGRU-CRF model, we also propose a pattern-based corpus construction method to construct the corpus needed by the model. Experimental results indicated that the proposed deep, multi-branch BiGRU-CRF model outperformed state-of-the-art models. Using the proposed model, we constructed a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.


Introduction
Knowledge graphs of geological hazards literature can facilitate the reuse of that literature and provide a reference for geological hazard mitigation. There is a large body of literature related to geological hazard research on the Wanfang academic platform (Wanfang database), and it is difficult for researchers to read all of these articles to find the information they need. Using machine learning methods to recognize the named entities in the geological hazard related literature and constructing a knowledge graph can greatly enhance the reuse of the literature, and increase efficiency and convenience in the research and governance of geological hazards.
Named entity recognition (NER) is a technology to classify mentions of entities in unstructured text into pre-defined categories. Named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. Named entities in geological hazard literature have diverse forms. For example, Los Angeles, the City of Los Angeles, and L.A., are different expressions of the same location name. Named entities in geological hazard literature have ambiguous semantics.
For example, Jordan is an Arab country in Western Asia named the Hashemite Kingdom of Jordan, but can also refer to the famous basketball player Michael Jordan, depending on the context. Moreover, named entities in geological hazard literature have uncertain contexts: the same entity can appear in different contexts. For example, the phrase preceding "Los Angeles" can be "located at" or "near." Therefore, it is challenging to design features with complete accuracy, which makes the recognition of named entities difficult and potentially ineffective.
Focusing on the above problems, in this paper, we propose a deep learning-based method; namely, the deep, multi-branch BiGRU-CRF model, for NER of geological hazard literature named entities. The proposed deep, multi-branch BiGRU-CRF model combines a multi-branch BiGRU layer and a CRF model. Considering that named entities in geological hazard literature are diverse in form, we use the context information of the named entities in the whole sentence to help predict the named entities. Considering that named entities in geological hazard literature are ambiguous in semantics, we propose a multi-branch structure to extract different levels of semantic information, and use the attention mechanism [1] and residual structure [2] to enhance the features from branches of different depths. Considering that named entities in geological hazard literature are uncertain in context, we use BiGRU layers to extract the contextual features of the named entities in both the forward and reverse directions. However, because the tag sequences themselves are also constrained, the multi-branch BiGRU layer does not learn these dependencies very well. Therefore, we add a CRF layer on top of the multi-branch BiGRU layer. The CRF model further constrains the tags with context information in different time steps and ultimately outputs the optimized tags of the currently observed Chinese characters.
Besides the deep, multi-branch BiGRU-CRF model, we propose a pattern-based corpus construction method to construct the corpus needed by the model. In the pattern-based corpus construction method, we first obtain a large number of seeds automatically using manually designed patterns, and then match the seeds against a large amount of geological hazard research literature to construct a large-scale geological hazard NER corpus using a maximum forward matching (MFM) method. The proposed NER model achieved an average precision of 0.9413, an average recall rate of 0.9425, and an average F1 score of 94.19. With the proposed deep, multi-branch BiGRU-CRF model, we constructed a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.
The main contributions of the proposed method are as follows:
• To the best of our knowledge, this is the first work to apply the NER technique to extract named entities and build a knowledge graph for geological hazards literature.
• This paper proposed a deep learning-based NER model that combines a multi-branch BiGRU layer and a CRF model for geological hazard NER. The model uses a multi-branch structure; each branch contains a BiGRU layer of a different depth to extract different levels of features, which are then further enhanced using the attention mechanism and the residual structure.
• This paper proposed a pattern-based method to build a large-scale geological hazard literature NER corpus at little manual cost.
The rest of this paper is structured as follows. Section 2 shows related work. Section 3 shows preliminaries. Section 4 introduces our approach, and Section 5 presents the implementation. Section 6 summarizes experimental results. Section 7 discusses the paper and Section 8 concludes the paper.

Related Work
With the development of statistical machine learning methods and natural language processing technology, in recent years, many scholars and institutions have begun to study how to use natural language processing (NLP) [3] technology to extract knowledge and construct knowledge graphs from geoscience-related literature.
Zhu et al. [4] conducted knowledge extraction on a large number of geological hazard literature documents and linked open data (LOD) [5], and constructed a knowledge graph. Specifically, the TextRank [6] algorithm was first used to extract the literature keywords, and the geological domain entities were obtained by combining the entries of the linked open data (such as Baidu Encyclopedia, Interactive Encyclopedia, and Wikipedia) with the extracted keywords. On this basis, the key rule algorithm was used to obtain the relationships and build the geological knowledge graph. This method of using LOD (Baidu Encyclopedia, Interactive Encyclopedia, and Wikipedia) entry catalogs to acquire relevant geological domain entities was groundbreaking. However, this method can only obtain entities that are already included in the encyclopedic knowledge base and LOD. The coverage of geological knowledge in current general encyclopedias (Baidu Encyclopedia, Interactive Encyclopedia, and Wikipedia) is small. Therefore, the scale and depth of the knowledge graph constructed using this method are relatively limited.
In order to better extract knowledge from the unstructured geoscience literature, Wang et al. [7] designed a workflow for knowledge extraction and construction of knowledge graph for geoscience literature. First, a corpus containing domain corpus and general domain corpus were constructed for word segmentation. Secondly, based on this corpus, a word segmentation model was trained using the conditional random field (CRF) [8]. Then, they used this model to segment the literature. Finally, the TF-IDF [9,10] method was used to extract the keywords of the literature, and the keywords with relatively large co-occurrence relations were connected to form a knowledge graph. Shi et al. [11] also used TF-IDF to extract keywords to construct a knowledge graph. However, unlike Wang et al. [7], Shi et al. [11] trained a CNN-based classifier that automatically divides the geoscience literature into four categories (geophysics, geology, remote sensing, and geochemistry) and then constructs the corresponding knowledge graph.
These methods have brought great inspiration to knowledge extraction and knowledge graph construction for the geoscience literature, but there are also some shortcomings worth improving. These methods use statistical analysis to extract keywords, high-frequency words, etc., rather than entities, as the nodes of their knowledge graphs. However, to better analyze and understand the geological disaster literature, we often need to extract the entities in the literature that represent specific categories and meanings, such as methods and data.
NER is the task of identifying a named entity in text and classifying it into a specified category [12]. NER was first proposed in the MUC [12] evaluation tasks of the 1980s and has been a hot topic in natural language processing research ever since.
Some studies start with text mining methods and build specific rules for NER. These methods adopt the strategy of bootstrapping to extract entities of the specified categories from the Web. Representative work includes the TextRunner system [13], the Snowball system [14], and the CasSys system [15]. The disadvantage of these methods is that the bootstrapping iteration introduces noise instances and noise templates, resulting in poor results.
Since the 1990s, statistical models have been the mainstream method for NER. There are a number of statistical methods [16,17] used to extract entities from text, such as the maximum entropy model (ME) [18][19][20], support vector machines (SVM) [21][22][23][24], the hidden Markov model (HMM) [25][26][27], the CRF model [28][29][30], and so on. Statistical model-based methods typically formalize entity recognition as predicting specific target structures from the input text, use statistical models to model the association between input and output, and use machine learning methods to learn the parameters of the model.
With the excellent performance of deep learning in different fields, more and more deep learning models have been proposed to solve the problem of NER. Currently, there are two typical deep learning architectures for NER. The first is the NN-CRF architecture [31][32][33][34], in which CNNs/RNNs are used to learn the vector representation at each word position. Based on the vector representation, the CRF layer decodes the best label at that location. The second adopts the idea of sliding window classification, uses neural networks to learn the representation of each n-gram in the sentence, and then predicts whether the n-gram is a target entity [35][36][37]. Compared with the traditional statistical model, the main advantage of the deep learning method is that its training is an end-to-end process, without the need to manually design related features. Besides, deep learning facilitates learning a specific representation of the task. By learning the correlation of information between different modalities, different types, and language environments, better entity recognition performance can be achieved.
These NER methods provide a useful reference for NER tasks in geoscience. Sobhana et al. [38] first used the CRF model combined with some manually designed features (such as prefixes and suffixes for words) to extract 17 types of geoscience-related entities from geoscience texts. Considering named entities in geological hazard literature are diverse in form and complicated in context, it is challenging to design practical features, resulting in a poor performance by CRF models that rely on manually designed features.
Inspired by the above NN-CRF architecture [31][32][33][34], in this paper, we propose a deep learning-based method; namely, the deep, multi-branch BiGRU-CRF model, for NER of geological hazard literature named entities. The proposed deep, multi-branch BiGRU-CRF model combines a multi-branch BiGRU layer and a CRF model. The multi-branch structure combines the attention mechanism and the residual structure, which can learn different depths and levels of features. The BiGRU network can obtain the context information of named entities from both forward and reverse directions. The CRF model can further optimize the prediction results based on the dependencies between the tags.

Preliminaries
In the deep, multi-branch BiGRU-CRF model for geological hazard NER, we use two widely used models, GRU and CRF. They are introduced in the preliminary section.

GRU
Since a recurrent neural network (RNN) [39,40] does not handle long-range dependencies well, the long short-term memory network (LSTM) [41][42][43] was proposed. GRU [44], which can be seen in Figure 1, is a variant of LSTM. GRU maintains the effects of LSTM while making the structure simpler, and it has a wide range of applications in many tasks of natural language processing, sequence analysis, image processing, etc. [45][46][47]. The GRU model has only two gates, the update gate and the reset gate; namely, z t and r t in Figure 1. The update gate is used to control the degree to which the state information of the previous moment is brought into the current state. The larger the value of the update gate, the more the state information of the previous moment is brought in. The reset gate is used to control the degree to which state information from the previous moment is neglected or forgotten. The smaller the value of the reset gate, the more the information from the previous moment is neglected. The reset gate helps capture short-term dependencies in the time series data, while the update gate helps capture long-term dependencies in the time series data [42,45,[48][49][50].
The reset gate r_t and update gate z_t are defined as follows:

r_t = σ(W_r x_t + U_r h_{t−1} + b_r)  (1)

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)  (2)

where σ is the sigmoid activation function [51], x_t is the input at time step t, and W, U, and b are trainable weights and biases. h_t represents the implied state and is defined as follows:

h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h̃_t  (3)

where ⊙ is the element-wise product operator of two vectors, and h̃_t represents the candidate implied state, defined as follows:

h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}) + b_h)  (4)

The candidate implied state h̃_t uses the reset gate r_t to control the inflow of the last implied state h_{t−1}, which contains past time information. If the value of the reset gate r_t converges to a value close to 0, the last implied state h_{t−1} will be discarded. Therefore, the reset gate r_t provides a mechanism to discard past implied states that are unrelated to the future; that is, the reset gate r_t determines how much past information is kept. The implied state h_t uses the update gate z_t to combine the last implied state h_{t−1} and the candidate implied state h̃_t. The update gate controls the importance of the past implied state at the current moment. If the value of the update gate always converges to a value close to 1, the past implied state will be saved over time and passed to the current time step. This design can cope with the vanishing gradient problem [52,53] in recurrent neural networks and better capture long-range dependencies in time series data.
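As a concrete illustration, one GRU time step built from the reset gate, update gate, candidate state, and implied state described above can be sketched in NumPy. The weight names and toy dimensions below are illustrative only, not taken from the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step: reset gate r_t, update gate z_t,
    candidate state h~_t, and new implied state h_t."""
    Wr, Ur, br, Wz, Uz, bz, Wh, Uh, bh = params
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)  # candidate state
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand              # interpolation by z_t
    return h_t

# Toy dimensions: 4-d input, 3-d hidden state, random weights.
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.standard_normal(s) for s in
          [(d_h, d_in), (d_h, d_h), (d_h,)] * 3]
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):  # run 5 time steps
    h = gru_step(x, h, params)
print(h.shape)  # (3,)
```

Note that a larger z_t keeps more of h_{t−1}, matching the description of the update gate above.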

CRF
The CRF model is a discriminative, undirected probabilistic graphical model proposed by Lafferty [8] on the basis of the maximum entropy model [54] and the hidden Markov model [55]. CRF was first proposed for sequence data analysis and has been successfully applied in the fields of natural language processing (NLP), bioinformatics, machine vision, and network intelligence [56][57][58][59].
Let G = (V, E) be an undirected graph, where V is the set of nodes and E is the set of edges, and let Y = {Y_v | v ∈ V} be a set of random variables Y_v indexed by the nodes v in V. Conditioned on X, if each random variable Y_v obeys the Markov property

P(Y_v | X, Y_w, w ≠ v) = P(Y_v | X, Y_w, w ∼ v)  (5)

then (X, Y) constitutes a CRF, where X represents the observed sequence and w ∼ v denotes that w and v are neighboring nodes connected by an edge in graph G.

Linear Chain-CRFs
Linear chain-CRFs [60], as shown in Figure 2, are a common form of the CRF model. Let x = {x_1, x_2, ..., x_n} denote the observation sequence and y = {y_1, y_2, ..., y_n} the sequence of finite states. According to the basic theory of random fields:

P(y|x) = (1/Z(x)) exp(Σ_{i,k} λ_k t_k(y_{i+1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i))  (6)

where the terms are defined as follows:
t_k(y_{i+1}, y_i, x, i): transfer characteristic function between the marked positions i and i + 1 of the observed sequence. It is used to characterize the correlation between adjacent finite states and the influence of the observation sequence on them.
λ_k: weight of the transfer characteristic function t_k(y_{i+1}, y_i, x, i).
s_l(y_i, x, i): state feature function of the observed sequence at position i. It is used to characterize the effect of the observation sequence on the finite states.
μ_l: weight of the state feature function s_l(y_i, x, i).
Z(x): a normalization factor used to ensure that formula (6) is a correctly defined probability.
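To make formula (6) concrete, the following sketch computes P(y|x) for a tiny linear chain by brute-force enumeration. Here a transition matrix stands in for the weighted sums of transfer features and an emission matrix for the weighted sums of state features; all scores are random toy values, not learned weights:

```python
import itertools
import numpy as np

def linear_chain_prob(y, trans, emit):
    """P(y|x) for a tiny linear chain-CRF, computing Z(x) by brute-force
    enumeration. trans[a, b] stands in for the weighted transfer-feature
    sums, and emit[i, a] for the weighted state-feature sums at position i."""
    n, n_states = emit.shape
    def score(seq):
        s = sum(emit[i, seq[i]] for i in range(n))
        s += sum(trans[seq[i], seq[i + 1]] for i in range(n - 1))
        return s
    Z = sum(np.exp(score(seq))
            for seq in itertools.product(range(n_states), repeat=n))
    return np.exp(score(y)) / Z

rng = np.random.default_rng(1)
trans = rng.standard_normal((3, 3))  # 3 states (toy scores)
emit = rng.standard_normal((4, 3))   # observation sequence of length 4
probs = [linear_chain_prob(y, trans, emit)
         for y in itertools.product(range(3), repeat=4)]
print(round(sum(probs), 6))  # 1.0
```

The probabilities over all 3^4 tag sequences sum to 1, which is exactly the role of the normalization factor Z(x). In practice, Z(x) is computed with dynamic programming rather than enumeration.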

The Proposed Methods
In this section, the proposed method is introduced in detail. The proposed method aims to extract geological hazard named entities from the considerable body of geological hazard literature and build a geological hazard knowledge graph.
In this paper, we propose a geological hazard NER model based on the deep learning method; namely, the deep, multi-branch BiGRU-CRF model, to extract geological hazard named entities and construct a knowledge graph. Since the proposed model is a supervised model that requires an annotated corpus, we propose a pattern-based corpus construction method to provide a corpus for the deep, multi-branch BiGRU-CRF model. The proposed method is presented in two parts: pattern-based corpus construction and the deep, multi-branch BiGRU-CRF model for NER.

1. Pattern-based corpus construction. Given literature documents F = {f_1, f_2, ..., f_N}, where f_n is the n-th document, and patterns P = {p_m, p_l, p_d}, where p_m, p_l, and p_d are patterns for methods, location, and data, respectively, the pattern-based corpus construction method aims to construct a named entity corpus C.
2. The deep, multi-branch BiGRU-CRF model for NER. Given the literature documents F and the named entity corpus C, the proposed deep, multi-branch BiGRU-CRF model aims to extract methods, location, and data entities from F and construct a knowledge graph G.

Pattern-Based Corpus Construction
Pattern-based corpus construction can be divided into three steps. Firstly, we define the three named entity types we want to extract. Then, the patterns P = {p_m, p_l, p_d} are used to obtain named entity seeds from the geological hazard literature. Finally, the seeds are matched back into the literature with the MFM method to build the corpus.
Named entities include person names, place names, organization names, times, dates, amounts, and percentages [12,17]. Among them, the most commonly used named entities are person names, place names, and organization names [17,61]. For geological hazards literature research, the three named entity types defined in this paper are the methods, the data used, and the descriptions of regions and locations. When reading geological hazards literature, researchers usually care about the targeted study area, the methods proposed, and the data used. Most articles generally have three sections (methodology, data, and study area) that correspond to the three named entity types mentioned above. These entities play the most important role in the understanding, research, and reuse of geological hazard literature. This article focuses on the extraction of the above three types of named entities: methods, data, and location. Table 1 shows the details of these entities.

Pattern-Based Seed Acquisition
Given the three defined named entity types above (methods, location, and data), in this section we extract these entities and build the entity seed collections MethodsSeeds M = {m_1, m_2, ..., m_I}, LocationSeeds L = {l_1, l_2, ..., l_J}, and DataSeeds D = {d_1, d_2, ..., d_K}. Considering that there are often certain rules among named entities of geological hazards, discovering these rules and designing related patterns can help us extract these named entities. Therefore, we designed a pattern-based seed acquisition method to obtain the named entity seeds. The manually defined patterns P = {p_m, p_l, p_d}, where p_m, p_l, and p_d are patterns (regular expressions) for methods, location, and data, respectively, are shown in Table 2. The patterns in Table 2 are in Chinese; please refer to Appendix A for a detailed translation.
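The real patterns in Table 2 are Chinese regular expressions and are not reproduced here; purely as an illustration of the mechanism, the following sketch uses a hypothetical English analogue that captures a method name appearing before a trigger word:

```python
import re

# Hypothetical illustration of pattern-based seed acquisition. The real
# patterns in Table 2 are Chinese regular expressions; this English
# analogue captures a method name appearing as "the ... method".
pattern_m = re.compile(r"the ((?:\w+ )+?)method")

sentence = "we apply the finite element method to slope stability"
seeds = [m.group(1).strip() for m in pattern_m.finditer(sentence)]
print(seeds)  # ['finite element']
```

Each captured group is a candidate seed; as described below, candidate seeds are then manually checked before entering the seed collections.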

We use these patterns (regular expressions) to match the sentences S = {s_1, s_2, ..., s_H} in the literature F from papers in the Wanfang database (http://www.wanfangdata.com.cn). The words that match the patterns (regular expressions) P are the entity seeds we want to extract. After that, we randomly select 2000 entity seeds of each type and manually check them to evaluate the accuracy, which can be calculated by the following equation:

accuracy = n_c / n

where n_c denotes the number of correct entity seeds and n denotes the total number of entity seeds. The results are shown in Table 3. After manual checking, all correct entities form the entity seed collections M = {m_1, m_2, ..., m_I}, L = {l_1, l_2, ..., l_J}, and D = {d_1, d_2, ..., d_K}.
Given the three types of entity seed collections above and the sentences S = {s_1, s_2, ..., s_H}, the MFM method shown in Algorithm 1 is used to automatically construct a geological hazards named entity corpus C in a character-based format named the IOB format [31], where "B" indicates the starting character of an entity, "I" indicates the intermediate and ending characters of an entity, and "O" indicates that the character is not part of an entity [62]. Table 4 shows an illustration of the IOB format. We defined seven types of tags ("O," "B-MED," "I-MED," "B-DAT," "I-DAT," "B-LDS," and "I-LDS"); see Table 1.

Algorithm 1: The MFM method for corpus construction.
1: Input: seed collections M = {m_1, ..., m_I}, L = {l_1, ..., l_J}, D = {d_1, ..., d_K}; sentences S = {s_1, ..., s_H}
2: Output: corpus C
3: C ← ∅
4: for h = 1 to H do
5:   for i = 1 to I do
6:     if M_i in S_h, and the characters of M_i are unlabeled then
7:       label "B-MED" in the first character and "I-MED" in the remaining characters of M_i in S_h.
8:     end if
9:   end for
10:  for j = 1 to J do
11:    if L_j in S_h, and the characters of L_j are unlabeled then
12:      label "B-LDS" in the first character and "I-LDS" in the remaining characters of L_j in S_h.
13:    end if
14:  end for
15:  for k = 1 to K do
16:    if D_k in S_h, and the characters of D_k are unlabeled then
17:      label "B-DAT" in the first character and "I-DAT" in the remaining characters of D_k in S_h.
18:    end if
19:  end for
20:  label unlabeled characters in S_h as "O."
21:  C ← S_h + C
22: end for
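The seed-matching and IOB-labeling loop of Algorithm 1 can be sketched as follows. The example sentence and seed are in English for readability, whereas the actual corpus is Chinese and character-based; matching longer seeds first approximates the maximum-forward-matching preference:

```python
def iob_label(sentence, seeds):
    """Character-level IOB labeling in the spirit of Algorithm 1: longer
    seeds are matched first (approximating maximum forward matching),
    and characters that are already labeled are never relabeled.
    `seeds` maps entity seed strings to tag suffixes (MED/LDS/DAT)."""
    tags = ["O"] * len(sentence)
    for seed, suffix in sorted(seeds.items(), key=lambda kv: -len(kv[0])):
        start = sentence.find(seed)
        while start != -1:
            span = range(start, start + len(seed))
            if all(tags[i] == "O" for i in span):  # the "unlabeled" check
                tags[start] = "B-" + suffix
                for i in span[1:]:
                    tags[i] = "I-" + suffix
            start = sentence.find(seed, start + 1)
    return tags

# Toy example: "numerical simulation" as a methods seed.
sent = "we use numerical simulation"
tags = iob_label(sent, {"numerical simulation": "MED"})
print(tags[7], tags[8])  # B-MED I-MED
```

Every unmatched character keeps the default "O" tag, mirroring line 20 of Algorithm 1.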

The Deep Multi-Branch BiGRU-CRF Model
Given the corpus C constructed above, we propose a deep learning-based model named the deep, multi-branch BiGRU-CRF model, which combines neural networks and CRF for geological hazard NER. The model, shown in Figure 3, consists of three components: the embedding layer, the multi-branch BiGRU layer, and the CRF layer. The embedding layer is the first layer of the model; it converts Chinese characters into dense vectors and passes them to the multi-branch BiGRU layer. The multi-branch BiGRU layer learns different levels of features through its branches and passes these features to the CRF layer. The CRF layer further models the mapping of characters to tags and the probability of transitions between tags, and outputs the optimized tags as the final output of the proposed model. We introduce these three layers in detail below.

Embedding Layer
Given Chinese characters w_1, w_2, ..., w_n in sentence S_i as input, where S_i ∈ S = {s_1, s_2, ..., s_H}, the first step of a deep neural network is often to represent the discrete Chinese characters in sentences as continuous vectors or a matrix. This step is called embedding. We use random 100-dimensional vectors v_1, v_2, ..., v_n as the initial representations of the characters w_1, w_2, ..., w_n; these vectors are trained along with the model to obtain better representations.
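A minimal sketch of this embedding lookup follows; the characters, vocabulary, and random initialization are illustrative, and in the model the embedding matrix is a trainable parameter:

```python
import numpy as np

rng = np.random.default_rng(42)
chars = list("滑坡监测")  # hypothetical 4-character input ("landslide monitoring")
vocab = {c: i for i, c in enumerate(dict.fromkeys(chars))}

# Random 100-dimensional vectors as the trainable initial representation
# of each character; these rows are updated during training.
embedding = rng.standard_normal((len(vocab), 100))
vectors = embedding[[vocab[c] for c in chars]]
print(vectors.shape)  # (4, 100)
```

Each character w_i thus becomes a 100-dimensional row v_i of the resulting matrix, which is what the multi-branch BiGRU layer consumes.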

Multi-Branch BiGRU Layer
The output v 1 , v 2 , ..., v n of the embedding layer then passes through a multi-branch BiGRU layer.
For every branch t ∈ {1, 2, ..., n}, where n is the number of branches, the output h_t = [→h_t; ←h_t] is the concatenation of →h_t and ←h_t, where →h_t and ←h_t represent the forward and reverse representations of v_1, v_2, ..., v_n and can be calculated by Equation (3) applied in the two directions of the GRU. Through this combination of the forward and reverse representations of v_1, v_2, ..., v_n, we can fully consider the context of the characters, making the feature extraction more abundant. In our experiments, the multi-branch BiGRU layer consisted of three branches with depths of 1, 2, and 3, respectively. Too many branches bring a large computational burden; too few branches cannot fully extract multiple levels of features. We use three branches with depths of 1, 2, and 3 to extract the low-level, middle-level, and high-level features h_1, h_2, and h_3, respectively. Then, we use the attention mechanism to weight the corresponding elements of h_1, h_2, and h_3 to obtain the weighted feature matrix h_123 = h_1 ⊗ h_2 ⊗ h_3, where ⊗ represents element-wise multiplication of the feature matrices. Then, the residual structure is used to add the weighted feature matrix h_123 and the low-level features h_1; the result, h = h_1 ⊕ h_123, is the output of the multi-branch BiGRU layer. This residual connection alleviates the vanishing gradient problem and the training difficulty caused by increasing the number of layers.
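The branch-combination step can be sketched numerically as follows; the branch outputs here are random placeholders standing in for the BiGRU branch outputs of depths 1, 2, and 3:

```python
import numpy as np

rng = np.random.default_rng(7)
seq_len, dim = 5, 8
# Hypothetical outputs of the three BiGRU branches (depths 1, 2, and 3);
# in the model each row is the concatenated forward/backward state.
h1, h2, h3 = (rng.standard_normal((seq_len, dim)) for _ in range(3))

h123 = h1 * h2 * h3   # element-wise weighting across branches (the ⊗ step)
h = h1 + h123         # residual connection with the low-level branch (the ⊕ step)
print(h.shape)  # (5, 8)
```

The output h keeps the same shape as each branch, so the CRF layer can consume it position by position.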

CRF Layer
The elements h_t in h, where t represents the t-th element in h, are not completely independent. For example, when the tag at position t is "B-MED," the probability of the tag at position t + 1 being "I-MED" is obviously much higher than the probability of it being "B-DAT." Therefore, instead of treating the elements of h independently, we use a CRF layer to model the relationships within h and obtain enhanced results. The CRF layer calculates the conditional probability p(y|h) by Equation (9), where y = {y_1, y_2, ..., y_T} represents the label sequence:

p(y|h) = exp(Σ_{i=1}^{T} (t(y_{i−1}, y_i) + s(h_i, y_i))) / Σ_{ỹ∈γ} exp(Σ_{i=1}^{T} (t(ỹ_{i−1}, ỹ_i) + s(h_i, ỹ_i)))  (9)

where γ represents the set of all possible tag sequences, t represents the transition score from y_{i−1} to y_i for a given input sequence h, and s is the emission score from the output of the BiGRU layer to y_i at time step i. Finally, the model is trained by maximum conditional likelihood estimation [63], maximizing Equation (10) over the training data:

L(t, s) = Σ log p(y|h; t, s)  (10)

At prediction time, the sequence that maximizes the conditional probability p(y|h; t, s) is the output of the model.
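The maximizing tag sequence is typically found with the Viterbi dynamic programming algorithm rather than enumeration. A minimal NumPy sketch with toy scores (not the trained model's parameters) follows:

```python
import numpy as np

def viterbi(emit, trans):
    """Return the tag sequence maximizing the CRF-style score
    sum_i emit[i, y_i] + sum_i trans[y_i, y_{i+1}] by dynamic programming."""
    n, k = emit.shape
    score = emit[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        cand = score[:, None] + trans + emit[i][None, :]  # (prev, next) scores
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):      # follow the back-pointers
        path.append(int(back[i, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(3)
emit = rng.standard_normal((6, 7))   # 6 time steps, 7 tags (toy scores)
trans = rng.standard_normal((7, 7))  # tag-to-tag transition scores
best = viterbi(emit, trans)
print(len(best))  # 6
```

With seven tags, this matches the tag set defined for the corpus; the transition scores are what let the model penalize invalid sequences such as "B-MED" followed by "I-DAT".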

Implementation
In this paper, the proposed multi-branch BiGRU-CRF model was implemented in the Python (version 3.6.3) programming language. The deep learning library used was TensorFlow-GPU (version 1.13.1), and an NVIDIA Titan RTX GPU was used. We did not use any open APIs when obtaining the geological hazard research literature from the Wanfang database. We used web crawler technology to crawl the titles and abstract sections of the theses related to geological disasters. The crawler used the Scrapy library and returned a text file in which each line contained only the title and abstract of one paper. The knowledge graph was stored and visualized in the Neo4j database.

Experimental Results
This section shows the statistics of the corpus constructed by pattern-based methods, the parameter settings of training, the results of the proposed deep, multi-branch BiGRU-CRF model, and the knowledge graph constructed in the following four parts.

Corpus Constructed
The corpus was built automatically by the method mentioned in Section 4.1, containing 536,426 characters, 4548 sentences, and seven types of tags, for which the detailed statistics are shown in Table 5. We randomly split the data into a training set, a validation set, and a test set with a ratio of 8:1:1.

Training
For all the models mentioned, we updated the parameters using the back-propagation algorithm and used stochastic gradient descent (SGD) to optimize the model. Our model uses three branches of stacked BiGRU layers; each BiGRU layer contains one forward GRU and one reverse GRU, and the number of neurons in each GRU is set to 100. We added a Dropout [64] layer between the BiGRU layer and the CRF layer to improve the model's effectiveness and prevent overfitting. The Dropout rate was set to 0.5, as higher rates negatively impacted our results, and lower rates led to longer training times.

Results
We used P (precision), R (recall rate), and F (F1 score), which are widely used evaluation criteria [31][32][33][34][65] in NER, to evaluate the three mentioned models. The larger these three evaluation criteria, the better the model's effect. P, R, and F can be calculated by the following three formulas:

P = n_p / n_t

R = n_p / n_c

F = 2 × P × R / (P + R)

where n_p denotes the number of correctly predicted entities (true positives), n_t denotes the total number of predicted entities, and n_c denotes the total number of entities in the ground truth. The results of our NER models are shown in Table 6.
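The three criteria can be computed directly from the entity counts; the counts below are illustrative only, not the paper's actual confusion counts:

```python
def prf(n_p, n_t, n_c):
    """Precision, recall, and F1 from entity counts: n_p = correctly
    predicted entities, n_t = all predicted entities, n_c = all
    ground-truth entities."""
    p = n_p / n_t
    r = n_p / n_c
    f = 2 * p * r / (p + r)
    return p, r, f

# Illustrative counts only.
p, r, f = prf(n_p=90, n_t=100, n_c=95)
print(round(p, 4), round(r, 4), round(f, 4))
```

F is the harmonic mean of P and R, so it rewards models that balance the two rather than maximizing one at the expense of the other.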
1. The CRF model is the model proposed by Sobhana et al. [38], using CRF for NER in geosciences. We used the CRF method as our benchmark. As can be seen in Table 6, the CRF model could initially identify these geological hazard named entities, achieving an average precision of 0.8210, an average recall rate of 0.7765, and an average F1 score of 79.81.
2. The BiLSTM-CRF model is the state-of-the-art model in current NER tasks [31]. It has one bidirectional LSTM layer and one CRF layer on top. As can be seen in Table 6, the BiLSTM-CRF model has a significant lead on all indicators compared to the CRF model, with an average precision of 0.9205, an average recall rate of 0.9419, and an average F1 score of 93.10. This fully demonstrates that the BiLSTM-CRF model has more efficient feature extraction and more accurate discriminating ability after adding one bidirectional LSTM layer before the CRF layer.
3. The deep, multi-branch BiGRU-CRF model is the proposed model with a three-branch BiGRU layer, which consists of three branches of stacked BiGRU layers with depths of 1, 2, and 3, respectively, and one CRF layer on top. As can be seen in Table 6, the deep, multi-branch BiGRU-CRF model had a significant lead on almost all indicators (except the recall rate of methods) compared to the CRF model and the BiLSTM-CRF model above, with an average precision of 0.9413, an average recall rate of 0.9425, and an average F1 score of 94.19. This fully demonstrates that the proposed model has more efficient feature extraction and more accurate discriminating ability after adding three branches of BiGRUs with depths of 1, 2, and 3, respectively.
Table 6. Results of the proposed models. P, R, and F indicate the evaluation criteria precision, recall, and F1 score. DAT, LDS, and MED indicate the corresponding entity categories data, location, and methods. "Avg." represents the overall weighted average score. The best performances are shown in bold.

Knowledge Graph Construction
We used the trained deep, multi-branch BiGRU-CRF model to perform NER on the geological hazard related papers in the Wanfang knowledge base, obtained the three types of named entities (location, methods, and data) mentioned in this paper, and constructed a knowledge graph. Table 7 shows the named entities extracted from randomly selected papers. It can be seen that the proposed method correctly extracted the relevant location and area descriptions, the data used, and the models and methods used in these geological hazard research papers. This is very helpful for the research, reuse, and referencing of the geological hazard literature. We used the proposed model to extract the three types of named entities from the 14,630 geological hazard-related research papers crawled from the Wanfang knowledge base, and constructed a knowledge graph containing 34,457 entity nodes and 84,561 relations. For the relationships ("in location," "use methods," and "use data") that appear in the knowledge graph, we did not use any complicated relation extraction models. When constructing the knowledge graph, we simply assume that if a paper contains an entity, it has the corresponding relationship. For example, if article A contains data B, we generate a triple (A → "use data" → B) and add it to the knowledge graph. Table 8 shows the detailed statistics of the entities of the geological hazards literature knowledge graph, and Table 9 shows the detailed statistics of the relations. Figure 4 shows an overview of the geological hazard literature knowledge graph. For convenience, we only show 100 nodes of the knowledge graph and zoom in on one of its parts in Figure 5. The knowledge graph clearly reflects the relationships between the literature and the entities (methods, locations, and data). Table 8. Statistics of the entities in the geological hazards literature knowledge graph.
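The simple co-occurrence rule for relation construction can be sketched as follows; the paper identifier and entity strings are hypothetical examples, not records from the actual graph:

```python
def paper_triples(paper_id, entities):
    """Generate knowledge-graph triples using the simple co-occurrence
    rule from the text: if a paper contains an entity, the corresponding
    relation holds. `entities` maps entity text to its category tag."""
    relation = {"MED": "use methods", "DAT": "use data", "LDS": "in location"}
    return [(paper_id, relation[tag], ent) for ent, tag in entities.items()]

# Hypothetical extraction result for one paper.
triples = paper_triples("paper_001",
                        {"numerical simulation": "MED",
                         "rainfall data": "DAT",
                         "mountainous areas": "LDS"})
print(len(triples))  # 3
```

Triples of this form map directly onto nodes and typed relationships in a graph database such as Neo4j, which is how the knowledge graph was stored.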

Entities Type          Methods   Location   Data    Paper
Number of entities     8530      9123       2173    14,630

At the same time, we counted the top-15 most frequently occurring methods, data, and location entities in the knowledge graph, which are shown in Figures 6, 7, and 8 with the corresponding English versions. It can be seen that among the methods entities, numerical simulation is the most widely used research method, with a frequency of 4542, and the counts of the other methods show a smooth downward trend. A similar pattern is present among the data entities: rainfall data and vegetation data are the most widely used research data, with frequencies of 14,539 and 13,114, respectively, and the other types of data do not differ greatly, showing a smooth decline. Among the location entities, mountainous areas, mining areas, and mountains are the most studied areas, with frequencies of 5172, 4354, and 4023, respectively, indicating that these three types of areas are the most significant areas of geological hazards.

Discussions of Generalizability
In this subsection, the generalizability of the outcomes to other contexts (e.g., papers written in English) is discussed in the following two aspects.
1. Paper structure. In our practice, we crawled the abstract sections of the articles for named entity recognition. Therefore, our method places no special requirements on the structure of an article, as long as the article contains a complete abstract section.
2. Paper language. In terms of language (English, for example), the model needs to be adjusted as follows. Firstly, Chinese is based on characters, while English is based on words; therefore, to extend our method to English papers, we need to rebuild the seed acquisition patterns (Section 4.1.2) to build a training corpus for the model. Secondly, in Chinese NER tasks one character corresponds to one tag, but in English one word corresponds to one tag; therefore, to extend our method to English papers, we need to change the Chinese character vectors to English word vectors in the embedding layer (Section 4.2.1) of the deep, multi-branch BiGRU-CRF model.
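The character-versus-word difference in point 2 can be made concrete with a small tagging example. This is an illustration of the BIO alignment only (the sample sentences are invented, not from the corpus): in Chinese, each character carries one tag, while in English, each whitespace-separated word does.

```python
# Chinese: one BIO tag per character.
# "使用降雨数据" = "use rainfall data"; "降雨数据" (rainfall data) is a DAT entity.
zh_tokens = list("使用降雨数据")   # ['使', '用', '降', '雨', '数', '据']
zh_tags   = ["O", "O", "B-DAT", "I-DAT", "I-DAT", "I-DAT"]

# English: one BIO tag per word after whitespace tokenization.
en_tokens = "use rainfall data".split()   # ['use', 'rainfall', 'data']
en_tags   = ["O", "B-DAT", "I-DAT"]

# In both cases the token and tag sequences must align one-to-one;
# only the tokenization unit (character vs. word) changes.
assert len(zh_tokens) == len(zh_tags) == 6
assert len(en_tokens) == len(en_tags) == 3
```

This is why the embedding layer must switch from character vectors to word vectors when moving to English: the unit that receives an embedding and a tag changes.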

Discussions of Extensibility
In this subsection, the extensibility of the proposed methods is discussed in the following two aspects: the flexibility to accommodate new instances, and the extensibility of the types of entities extracted from the papers.
1. The flexibility to accommodate new instances. When a new paper is added to the Wanfang database, it can be processed into nodes and edges in the following three steps. First, crawl the abstract section of the new paper from the Wanfang database using web crawler technology; second, use the deep, multi-branch BiGRU-CRF model to identify the method, data, and location entities; third, add each entity as a node, and the connections between the entities and the paper as edges, to the knowledge graph.
2. The extensibility of the types of entity. We also discuss what adjustments our method would need if new entity types (e.g., theory) were added. If a new entity type is added, the deep, multi-branch BiGRU-CRF model needs to be adjusted as follows. First, we need to manually design the seed acquisition patterns (such as A) and build the training corpus using the methods mentioned in Section 4.1.2. Second, due to the addition of the new entity type, the number of probability values output by the softmax of the last layer of our model needs to be changed from 7 ("O," "I-LDS," "I-MED," "B-LDS," "I-DAT," "B-MED," and "B-DAT") to 9 ("O," "I-LDS," "I-MED," "B-LDS," "I-DAT," "B-MED," "B-DAT," "B-THE," and "I-THE"), in which "THE" represents the theory entity.
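The tag-set change in point 2 is mechanical: every new entity type contributes one B- tag and one I- tag on top of the shared "O" tag. A minimal sketch (the helper function is illustrative, not from the paper; the 7-tag base set is as listed above):

```python
# The 7-tag BIO scheme used by the model's softmax output layer.
BASE_TAGS = ["O", "I-LDS", "I-MED", "B-LDS", "I-DAT", "B-MED", "B-DAT"]

def extend_tags(tags, new_type):
    """Add B-/I- tags for a new entity type to an existing BIO tag set."""
    return tags + [f"B-{new_type}", f"I-{new_type}"]

# Adding the hypothetical "THE" (theory) entity type grows the
# softmax output from 7 to 9 classes.
extended = extend_tags(BASE_TAGS, "THE")
print(len(BASE_TAGS), len(extended))  # 7 9
```

The final softmax (and the CRF transition matrix above it) would then be re-dimensioned to 9 classes and the model retrained on the extended corpus.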

Discussions of Limitations and Future Work
This research, however, is subject to several limitations. In this subsection, these limitations are discussed in the following two aspects. The first limitation is that the proposed method involves some manual work: our approach needs some manually defined patterns to obtain the initial entity seeds, and we also need to manually check the initial entity seeds to obtain the correct entity seed collections (Section 4.1.2). The second limitation is that we only used the most straightforward method to obtain the relationships in our knowledge graph; that is, if article A contains data B, we generate a triple (A -> "use data" -> B) and add it to the knowledge graph.
Therefore, in future work, we believe that reducing manual costs remains an important research topic for geological hazard knowledge graph construction; reducing them through weak supervision and distant supervision strategies would be a feasible approach. At the same time, extracting more accurate and diverse relationships, and even the joint extraction of entities and relationships, are also important research topics.

Conclusions
Our work aims to extract geological hazard named entities from the considerable body of geological hazard literature and build a geological hazard knowledge graph. In this paper, a deep learning-based NER model, the deep, multi-branch BiGRU-CRF model, was proposed to extract the three types of entities (location, methods, and data) in the geological literature; it achieved the highest average precision of 0.9413, an average recall of 0.9425, and an average F1 score of 94.19. In addition, since the proposed model is a supervised model that requires a corpus, we proposed a pattern-based method to construct a large-scale geological hazard NER corpus.

Acknowledgments: The authors are grateful to the editors and reviewers for their valuable comments on the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Since the data used in this paper all come from a Chinese literature database, the phrases in Table 2 in Section 4.1 and Table 7 in Section 6.4 are all in Chinese. For convenience of presentation, we translated these phrases into English. In this appendix, we present the original Chinese phrases associated with Table 2 in Section 4.1 and Table 7 in Section 6.4. Table A1 shows the original patterns (regular expressions) in Chinese used in Section 4.1; since they were designed in Chinese, we translate them into English in Table A2 for better readability. Table A3 shows the original Chinese versions of the entities extracted from the geological hazard research papers, with the English translations in Table A4.