Intelligent generation method of emergency plan for hydraulic engineering based on knowledge graph – take the South-to-North Water Diversion Project as an example

ABSTRACT Traditional emergency plans for hydraulic engineering suffer from low digitisation, poor knowledge correlation and insufficient support for intelligent decision-making. This paper proposes an intelligent method, based on a knowledge graph and machine learning, for generating hydraulic engineering emergency plans from patrol texts. First, a knowledge graph of emergency plans is constructed from the electronic documents of various plans through knowledge modelling, knowledge extraction, knowledge fusion and knowledge storage, so that scattered knowledge becomes highly organised. Then, an entity recognition model based on bidirectional encoder representations from transformers (BERT) and a bidirectional long short-term memory network with a conditional random field (BiLSTM+CRF) is constructed to recognise dangers, projects, parts and other entities in the patrol text. A Jaccard entity similarity algorithm based on the word2vec model matches the recognised danger entity with the danger entities in the graph, and the emergency plan is generated through knowledge retrieval and reasoning. The model evaluation and the "Channel Leakage" case study show that the method identifies entities with high accuracy (F1 value of 96.21%) and generates emergency plans reliably, so it can be applied to emergency rescue in hydraulic engineering.


Background
Since the South-to-North Water Diversion Project officially began operation in December 2014, water resources have been allocated ever farther along its routes, playing a significant role in ensuring water security, restoring water ecology and improving the water environment (Li et al., 2022). Because the project spans many provinces and cities, its routes are long and the environment along them is complex and varied (the three routes of the project are shown in Figure 1). The operation of the project inevitably involves many risks, so an agile and effective emergency response to hazards is important.
Most emergency plan materials for water conservancy projects are stored as paper texts and electronic documents. In use, they suffer from inefficient query and retrieval, weak correlation between content and insufficient intelligent decision support. It is therefore necessary to explore scientific and efficient methods for generating emergency plans intelligently and to design better decision support systems for dealing with all types of hazards (Ramsbottom et al., 2017; Sorensen et al., 2017).
At present, intelligent generation of emergency plans mainly relies on case-based reasoning (CBR). CBR handles an emergency by measuring the similarity between the target case and historical cases and then reusing or modifying the emergency plan of the most similar historical case (Sekar et al., 2019). Many scholars have applied CBR to emergency decision-making. For example, Fan et al. used CBR with a case retrieval method and hybrid similarity calculation (Fan et al., 2014); Zhang et al. proposed a new case adjustment method to modify and generate emergency plans for grid stroke disasters (Zhang et al., 2015); Jiang et al. combined ontology with improved CBR to create a decision method for safety risk management; and Hadj-Mabrouk developed a new approach, based on machine learning and CBR, to analysing and evaluating the validity of decision support (Hadj-Mabrouk, 2020). These studies depend on the size of the historical case base and on the weights assigned to case attributes, and the emergency plans they generate are presented as documents with a low degree of digitisation and weak knowledge correlation.
The knowledge graph (Chen et al., 2021), introduced in 2012, was initially applied to semantic search (Dong et al., 2014), question answering (Hao et al., 2017), intelligent recommendation (Gong et al., 2021) and similar tasks, supporting the rapid acquisition, rational organisation and scientific use of massive amounts of knowledge. In recent years, knowledge graphs have also been applied to emergency management. Li et al. used knowledge representation to support decision-making for natural disasters. Liu et al. constructed a knowledge graph of geological disaster emergency plans for rapid emergency response. Ni et al. constructed an emergency plan knowledge system to provide reliable information for emergency responders (Ni et al., 2019). Yang et al. constructed a knowledge co-occurrence network to analyse the importance of emergency management in public health emergencies (Yang et al., 2020).
As a kind of semantic network, the knowledge graph is strong at organising and expressing knowledge, and it has seen preliminary applications in water conservancy (Diaz & Vilches-Blazquez, 2022; Yan et al., 2018), but it has not yet been applied to emergency management of water conservancy projects. Therefore, taking the South-to-North Water Diversion Project as an example, this paper constructs a knowledge graph of emergency plans for water conservancy projects and combines it with machine learning to generate emergency plans intelligently.

Knowledge graph construction of the emergency plan
The knowledge graph construction process generally includes four stages: knowledge modelling (Ayachi et al., 2022), knowledge extraction (Al-Moslmi et al., 2020; Smirnova & Cudre-Mauroux, 2019), knowledge fusion (Zhao et al., 2020) and knowledge storage (Wylot et al., 2019; Zou & Özsu, 2017). Knowledge modelling organises entities and related information into a knowledge graph schema using a knowledge representation language. Guided by the schema, knowledge extraction extracts entities, relationships and attributes from the data sources. Knowledge fusion resolves the entity ambiguity introduced during extraction. Knowledge storage stores the merged entities, relationships and attributes in a graph database for use by downstream applications. Figure 2 shows the construction process of the emergency plan knowledge graph for the middle route of the South-to-North Water Diversion Project.

Knowledge modelling
Knowledge modelling can be performed top-down or bottom-up. Taking as data sources the Risk Prevention and Control Manuals of the 47 management offices of the middle route of the South-to-North Water Diversion Project and the Overall Plan of the emergency rescue system for engineering risks, we combine the top-down and bottom-up methods to define the ontologies and relationships of the knowledge graph. We then construct the knowledge graph schema of the emergency plan for the middle route, as shown in Figure 3. The schema includes 15 kinds of entities, such as engineering, risk events, risk factors, equipment and materials, and 18 kinds of relationships, such as risk event, risk factor, rescue equipment and rescue materials.
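A schema of this kind can be thought of as a set of allowed (head type, relationship, tail type) signatures against which extracted triples are checked. The sketch below assumes a small hypothetical subset of the schema; the type and relation names paraphrase those mentioned in the text and are not the paper's actual identifiers.

```python
from typing import Dict, Set, Tuple

# Hypothetical subset of the schema: relation name -> (head entity type, tail entity type).
# The paper's schema has 15 entity types and 18 relationship types; only a few are shown.
SCHEMA: Dict[str, Tuple[str, str]] = {
    "risk event": ("Engineering", "RiskEvent"),
    "risk value": ("Engineering", "RiskValue"),
    "risk level": ("Engineering", "RiskLevel"),
    "location": ("Engineering", "Location"),
    "includes risk factor": ("RiskEvent", "RiskFactor"),
    "control measure": ("RiskEvent", "ControlMeasure"),
    "rescue equipment": ("ControlMeasure", "Equipment"),
    "rescue material": ("ControlMeasure", "Material"),
}

# Every entity type that appears at either end of some relation.
ENTITY_TYPES: Set[str] = {t for pair in SCHEMA.values() for t in pair}


def conforms(head_type: str, relation: str, tail_type: str) -> bool:
    """Return True if a typed triple matches the head/tail types the schema allows."""
    return SCHEMA.get(relation) == (head_type, tail_type)


print(conforms("Engineering", "risk event", "RiskEvent"))   # True
print(conforms("RiskFactor", "risk event", "Engineering"))  # False
```

Keeping the signatures in one table makes it easy to reject extraction errors (for example, a triple whose head and tail types are swapped) before anything is written to the graph database.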

Knowledge extraction
Under the guidance of the knowledge graph schema, this paper adopts different ways to complete knowledge extraction based on the structural characteristics of data sources.

Knowledge extraction from the Risk Prevention and Control Management Manual
Since the Risk Prevention and Control Management Manual consists mainly of semi-structured data, this paper combines traditional manual extraction with regular-expression templates to extract knowledge. Some examples are shown in Table 1.
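As an illustration of template-based extraction, the sketch below applies one regular-expression template to a semi-structured line and turns the captured fields into triples. The line format ("Project: ...; Risk event: ...; Risk factor: ...") and the field names are hypothetical; the manuals' real layouts would need their own templates.

```python
import re
from typing import List, Tuple

# Hypothetical template: "<field>: <value>" pairs separated by semicolons.
FIELD = re.compile(r"(Project|Risk event|Risk factor)\s*:\s*([^;]+)")


def extract_triples(line: str) -> List[Tuple[str, str, str]]:
    """Capture the known fields in one line and map them onto schema relations."""
    fields = {key: value.strip() for key, value in FIELD.findall(line)}
    triples = []
    if "Project" in fields and "Risk event" in fields:
        triples.append((fields["Project"], "risk event", fields["Risk event"]))
    if "Risk event" in fields and "Risk factor" in fields:
        triples.append((fields["Risk event"], "includes risk factor", fields["Risk factor"]))
    return triples


line = "Project: high-fill channel; Risk event: seepage damage; Risk factor: ant-rat-hole hazard"
print(extract_triples(line))
# [('high-fill channel', 'risk event', 'seepage damage'),
#  ('seepage damage', 'includes risk factor', 'ant-rat-hole hazard')]
```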

Knowledge extraction from the Overall Plan of the emergency rescue system for engineering risks
The Overall Plan of the emergency rescue system for engineering risks consists mainly of unstructured data, but its text follows a clear "keyword + short text" pattern. This pattern is therefore used to extract entities and relationships, as shown in Table 2.
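The "keyword + short text" pattern can be exploited by grouping each line of text under the most recent keyword line and then splitting the short text into items. This is a minimal sketch assuming a fixed keyword list (taken from field names such as "Materials" and "Engineering risk part" that appear in the extracted tables), item separators of commas and semicolons, and a hypothetical head entity `RISK_EVENT`.

```python
from typing import Dict, List, Optional, Tuple

# Keywords assumed to head each short-text block.
KEYWORDS = ("Materials", "Storage location", "Risk consequence", "Engineering risk part")


def split_by_keywords(text: str) -> Dict[str, str]:
    """Group each non-empty line under the most recent keyword line."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in KEYWORDS:
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {key: " ".join(lines) for key, lines in sections.items()}


def to_items(short_text: str) -> List[str]:
    """Split a short text into items at semicolons and commas (a simplifying assumption)."""
    parts = short_text.rstrip(".").replace(";", ",").split(",")
    return [p.strip() for p in parts if p.strip()]


SAMPLE = """Materials
Plastic waterproof film, reinforcement, cement
Engineering risk part
Top of channel slope; middle channel slope"""

RISK_EVENT = "landslide"  # hypothetical head entity for the extracted triples
sections = split_by_keywords(SAMPLE)
triples: List[Tuple[str, str, str]] = [
    (RISK_EVENT, keyword.lower(), item)
    for keyword, text in sections.items()
    for item in to_items(text)
]
print(triples)
```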

Knowledge fusion
Entity ambiguity is unavoidable during graph construction, so knowledge fusion must be performed before the data are stored to obtain a standardised and unified description. This paper realises entity fusion by clustering and linking. Because the knowledge graph of the emergency plan for the middle route of the South-to-North Water Diversion Project is a vertical-domain knowledge graph with scarce data sources, only three kinds of entities are fused: engineering, risk event and risk factor.
The entity fusion process has three parts: entity name expansion, candidate entity generation and candidate entity ranking. In name expansion, each entity mention is expanded to its full name according to its context to reduce ambiguity; for example, "overloading" is expanded to "vehicle overloading". In candidate entity generation, each entity mention $M$ returns a candidate entity set $E = \{E_1, E_2, E_3, \ldots\}$ according to its label in the knowledge graph. For example, the mention "water conveyance channel damage" returns candidates with the same label, such as $E_1$ "structural damage" and $E_2$ "component damage". In candidate entity ranking, the Word2Vec model extracts semantic feature vectors for $M$ and each candidate $E_i$, a support vector machine model scores each pair $\langle M, E_i \rangle$ according to the spatial distance between their vectors, and the candidates are sorted to select the optimal entity $E_{top}$; the mention $M$ is then saved as an alias attribute of $E_{top}$. For example, "structural damage" in $E$ scores highest for "water conveyance channel damage", so the optimal entity pair <water conveyance channel damage, structural damage> is returned and saved. In this way, knowledge fusion is realised and entity ambiguity is eliminated.
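The three fusion steps can be sketched as follows. To stay self-contained, the sketch swaps the paper's support vector machine scorer for plain cosine similarity between averaged word vectors, and it uses hypothetical toy tables (`EXPANSIONS`, `LABELS`, `EMB`) in place of the real abbreviation list, graph labels and trained Word2Vec model.

```python
import math
from typing import Dict, List, Optional, Tuple

# All tables below are hypothetical toy data.
EXPANSIONS = {"overloading": "vehicle overloading"}   # abbreviation -> full name
LABELS = {                                           # graph entity -> label
    "structural damage": "RiskEvent",
    "component damage": "RiskEvent",
    "vehicle overloading": "RiskFactor",
}
EMB = {  # stand-in for trained Word2Vec word vectors
    "water": [0.9, 0.1], "conveyance": [0.9, 0.1], "channel": [0.9, 0.1],
    "damage": [0.5, 0.5], "structural": [0.9, 0.2], "component": [0.1, 0.9],
}
ALIASES: Dict[str, List[str]] = {}


def vec(text: str) -> List[float]:
    """Average the vectors of in-vocabulary words; zero vector if none are known."""
    known = [EMB[w] for w in text.split() if w in EMB]
    if not known:
        return [0.0, 0.0]
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def link(mention: str, label: str) -> Optional[Tuple[str, str]]:
    mention = EXPANSIONS.get(mention, mention)                     # 1. name expansion
    candidates = [e for e, lab in LABELS.items() if lab == label]  # 2. candidate generation
    if not candidates:
        return None
    m = vec(mention)
    best = max(candidates, key=lambda e: cosine(m, vec(e)))        # 3. ranking (cosine, not SVM)
    if best != mention:
        ALIASES.setdefault(best, []).append(mention)               # keep mention as an alias
    return mention, best


print(link("water conveyance channel damage", "RiskEvent"))
print(link("overloading", "RiskFactor"))
```

A learned scorer such as the paper's SVM can weigh several features at once; cosine similarity is used here only because it needs no training data.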

Knowledge storage
Compared with relational databases, graph databases offer high traversal efficiency and strong expressiveness for relationships. Moreover, the knowledge graph of emergency plans for the middle route of the South-to-North Water Diversion Project is small in scale, and the downstream tasks place high demands on relationship expression. Therefore, this study adopts the Neo4j graph database, which queries entities and relations efficiently, for knowledge storage. We organise the entities and relationships obtained by knowledge extraction and knowledge fusion into triples (entity1, relationship, entity2), such as (plumbing, risk factor, ant-rat-hole hazard) and (Zhangzhuang Bridge, risk value, 4.0).

Backhoe, manual auxiliary tools; auger, grouting machine, manual auxiliary operation tools.

Materials
Plastic waterproof film, reinforcement, cement, natural sand and stone.

Storage location
The plastic waterproof film is stored in the emergency materials warehouse of the management office; cement, natural sand and stone are purchased locally.

Risk consequence
Serious deformation forms a hidden danger of landslides; small-scale landslides form a hidden danger of outbursts; massive landslides block the channel.

Engineering risk part
Top of channel slope; middle of channel slope; near the Grade 1 bridleway; outer slope of the building.
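One way to load the (entity1, relationship, entity2) triples into Neo4j is to turn each into an idempotent Cypher `MERGE` statement. The sketch assumes a single generic node label `Entity` with a `name` property (a real schema would likely use one label per entity type) and stores numeric objects such as risk values as entity names, since risk values are an entity type in this graph.

```python
import re
from typing import Dict, Tuple, Union

Triple = Tuple[str, str, Union[str, float]]


def to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Turn (entity1, relationship, entity2) into an idempotent Cypher MERGE statement.

    Relationship types cannot be query parameters in Cypher, so the relation name is
    sanitised into an identifier; entity names are passed as parameters instead.
    """
    head, relation, tail = triple
    rel_type = re.sub(r"\W+", "_", relation.strip()).upper()
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    # Numeric objects such as risk values are converted to text and stored as entities.
    return query, {"head": head, "tail": str(tail)}


for t in [("plumbing", "risk factor", "ant-rat-hole hazard"), ("Zhangzhuang Bridge", "risk value", 4.0)]:
    print(to_cypher(t))
```

Each `(query, params)` pair could then be passed to a session of the official Neo4j Python driver (for example `session.run(query, params)`); `MERGE` keeps repeated loads from creating duplicate nodes or edges.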

Method for intelligent generation of emergency plan
This paper proposes a knowledge-graph-based method for intelligently generating emergency plans for the middle route of the South-to-North Water Diversion Project; the flow chart is shown in Figure 4. First, a model based on bidirectional encoder representations from transformers (BERT) and a bidirectional long short-term memory network with a conditional random field (BiLSTM+CRF) identifies the engineering, location, risk, engineering risk part and other entities in the patrol text. Then, the risk entities in the graph are ranked coarsely with the Jaccard algorithm to generate a set of risk candidate entities. Next, using the Word2Vec model combined with the Jaccard algorithm, we fuse three identified entity features (risk, engineering and engineering risk part) and align the identified risk entity with the set of risk candidate entities by computing feature similarity. Finally, combining the emergency plan template, we use graph reasoning to generate the emergency plan intelligently (Chen et al., 2016, 2020).

Entity recognition model based on BERT +BiLSTM+CRF
Entity recognition methods include dictionary- and rule-based methods (Rau, 1991), traditional machine learning methods (Liu et al., 2011) and deep learning methods (Zhiheng et al., 2015). Entity recognition models based on dictionary rules or a single traditional machine learning method lack universality and have low accuracy. Therefore, building on the BERT deep learning model (Devlin et al., 2018), this paper constructs a BERT+BiLSTM+CRF model to identify risk entities. The model is fine-tuned from Google's Chinese BERT model pre-trained on a large-scale corpus, which reduces the need for large amounts of labelled data while achieving high recognition accuracy. The structure of the model is shown in Figure 5. The encoding layer first converts the patrol text into a sequence of token IDs, and the pre-trained BERT model then generates a matrix T carrying semantic features. The matrix T is input to the BiLSTM layer, which captures contextual semantic information and outputs a matrix O. The concatenation layer then concatenates T with O to obtain the matrix E. A fully connected (FC) layer reduces the dimension of E, and the CRF layer corrects the output by modelling dependencies between adjacent labels. Finally, the optimal label sequence is generated.
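The CRF layer's final step, choosing the optimal label sequence, is typically done with Viterbi decoding over per-token emission scores (here, what the FC layer would output) and a learned tag-transition matrix. The sketch below uses hypothetical hand-set scores for three tags (O, B-RISK, I-RISK) and omits start/end transitions for brevity; it shows how a large negative O to I-RISK transition overrides the greedy per-token choice.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from the FC layer).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF layer).
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):  # follow back-pointers from the last position
        best = ptr[best]
        path.append(best)
    return path[::-1]


# Tags: 0 = O, 1 = B-RISK, 2 = I-RISK (hypothetical scores).
TRANSITIONS = [[0.0, 0.0, -1e4],  # O -> I-RISK is effectively forbidden
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
EMISSIONS = [[2.0, 0.5, 0.0],
             [0.0, 1.0, 1.5],    # greedy choice here would be I-RISK
             [0.0, 0.0, 2.0]]
print(viterbi(EMISSIONS, TRANSITIONS))  # [0, 1, 2]
```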

Jaccard entity similarity algorithm based on Word2Vec model
The traditional entity similarity calculation method is primarily based on the Jaccard algorithm, which has high precision but low recall because it ignores semantic information. There are three common approaches to calculating entity similarity. The first is based on a dictionary or classification system; common dictionaries include HowNet, WordNet and the synonym word forest (Tongyici Cilin). Although they are constructed differently, all are limited to the general domain and are difficult to apply to the specialised domain of hydraulic engineering. The second is a statistical method based on context space vectors, represented by Google's Word2Vec (Mikolov et al., 2013), which maps words into vectors and measures similarity by the distance between the vectors. The third is based on deep learning, which requires a huge corpus and has a high computational cost. In this paper we select the Jaccard entity similarity algorithm based on the Word2Vec model to calculate the similarity of risk entities. The algorithm proceeds as follows:
Step 1: Use the Overall Plan of the emergency rescue system for the middle route of the South-to-North Water Diversion Project and the unstructured text in the Risk Prevention and Control Manual as the corpus to train the Word2Vec model and obtain word vectors.
Step 2: Take the risk, engineering and engineering risk part entities obtained by the entity recognition model in Section 3.1 as the target entity set $A = [a_1, a_2, a_3]$.
Step 3: Take the target risk entity $a_1$ and the set of risk candidate entities in the graph, $T = [t_1, t_2, t_3, \ldots]$, as the input of the Jaccard entity similarity algorithm, output the list of risk candidate entities $W = [w_1, w_2, w_3, \ldots]$ by coarse-grained sorting, and use the Word2Vec model to convert $W$ into the feature vectors $H = [h_1, h_2, h_3, \ldots]$. The Jaccard similarity of two entities is

$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$

where $X$ and $Y$ are the sets of elements (e.g. characters or words) of the two entities.
Step 4: Input the target entity set $A$ into the Word2Vec model to obtain the vectors $V = [v_1, v_2, v_3]$, use the weighted average method to fuse the entity feature vectors $v_i$ into a target feature vector $\mathbf{aim}$, calculate the cosine similarity between $\mathbf{aim}$ and each candidate feature vector $h_i$, and return the score $S = [s_1, s_2, s_3, \ldots]$ of each risk candidate entity $t_i$. The cosine similarity is

$$s_i = \cos(\mathbf{aim}, h_i) = \frac{\mathbf{aim} \cdot h_i}{\lVert \mathbf{aim} \rVert \, \lVert h_i \rVert}$$

Step 5: Take the target entity set $A$ and the risk candidate entities as the input of the Jaccard algorithm, and output the score $Q = [q_1, q_2, q_3, \ldots]$ of each risk candidate entity $t_i$.
Step 6: Add the scores $S$ and $Q$, sort the candidates, and return the five risk candidate entities with the highest scores, $R = [r_1, r_2, r_3, r_4, r_5]$.
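Steps 2 to 6 can be sketched end to end as below. Several details are assumptions not fixed by the text: toy 3-dimensional vectors stand in for the trained Word2Vec model, phrase vectors are the mean of their in-vocabulary word vectors, Jaccard works on word sets (Chinese entities would more naturally use character sets), the fusion weights (0.6, 0.2, 0.2) are hypothetical, and Step 5 is computed over the coarse list $W$.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]

# Toy vectors standing in for a Word2Vec model trained on the domain corpus (Step 1).
EMB: Dict[str, Vector] = {
    "seepage": [1.0, 0.0, 0.0], "leakage": [0.9, 0.1, 0.0], "piping": [0.8, 0.0, 0.2],
    "canal": [0.0, 1.0, 0.0], "channel": [0.0, 1.0, 0.0], "embankment": [0.0, 0.0, 1.0],
    "slope": [0.0, 0.2, 1.0], "collapse": [0.0, 0.0, 1.0], "attack": [-1.0, 0.0, 0.0],
}
DIM = 3


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(x: Set[str], y: Set[str]) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def phrase_vector(text: str) -> Vector:
    known = [EMB[w] for w in text.lower().split() if w in EMB]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Vector, v: Vector) -> float:
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def match_risk(targets: Sequence[str], candidates: Sequence[str],
               weights=(0.6, 0.2, 0.2), coarse_k: int = 20, top_n: int = 5) -> List[str]:
    """targets = [risk a1, engineering a2, engineering risk part a3] (Step 2)."""
    # Step 3: coarse-grained sort of the graph candidates T by Jaccard with a1 -> W.
    a1 = tokens(targets[0])
    w = sorted(candidates, key=lambda t: jaccard(a1, tokens(t)), reverse=True)[:coarse_k]
    # Step 4: weighted average of the target vectors -> aim; cosine with each h_i -> S.
    vectors = [phrase_vector(t) for t in targets]
    aim = [sum(wt * v[i] for wt, v in zip(weights, vectors)) / sum(weights) for i in range(DIM)]
    # Step 5: Jaccard between all target-entity tokens and each candidate -> Q.
    target_tokens = set().union(*(tokens(t) for t in targets))
    scores = {t: cosine(aim, phrase_vector(t)) + jaccard(target_tokens, tokens(t)) for t in w}
    # Step 6: rank by S + Q and keep the top five.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ranked = match_risk(
    ["seepage phenomenon", "high-fill channel", "embankment"],
    ["canal seepage", "slope collapse", "embankment piping", "pipe leakage", "terrorist attack"],
)
print(ranked)
```

In this toy run "pipe leakage" has the highest cosine score alone, but the Jaccard term lifts "canal seepage" to first place, which is the kind of complementarity the combined score is meant to provide.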

Emergency plan template
Based on an analysis of the Risk Prevention and Control Manual of the middle route of the South-to-North Water Diversion Project and the Overall Plan of the emergency rescue system for the project, the emergency plan is divided into four parts: project overview, risk analysis, rescue plan and material preparation points. The project overview includes the project, location, risk value and risk level. The risk analysis includes the inducing factors, related risk events, consequences and engineering risk part. The rescue plan includes equipment, materials, storage locations and rescue measures. The material preparation point part includes the material preparation point number, main channel number, length, area, etc. The emergency plan template is shown in Figure 6.
Combining the graph risk entity matched in Section 3.2 with the engineering, location, engineering risk part and other entities identified in Section 3.1, we fill in the elements of the emergency plan template through knowledge graph reasoning and knowledge graph retrieval, thereby generating the emergency plan intelligently.
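The template-filling step can be pictured as one-hop retrieval for directly attached facts plus a two-hop traversal (risk event to control measure to equipment and materials). The sketch below assumes an in-memory index built from hypothetical placeholder triples rather than a live Neo4j query, and it covers only three of the four template parts (material preparation points are omitted).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical triples with placeholder names; in practice they would come from the graph database.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Channel A", "location", "Location L1"),
    ("Channel A", "risk value", "Value V1"),
    ("Channel A", "risk level", "Level G1"),
    ("Canal seepage", "includes risk factor", "Factor F1"),
    ("Canal seepage", "risk consequence", "Consequence C1"),
    ("Canal seepage", "control measure", "Measure M1"),
    ("Measure M1", "rescue equipment", "Equipment E1"),
    ("Measure M1", "rescue material", "Material X1"),
]


def build_index(triples: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index


def generate_plan(index, project: str, risk: str, risk_part: str) -> Dict[str, Dict[str, object]]:
    def get(head: str, relation: str) -> List[str]:
        return list(index.get((head, relation), []))  # one-hop retrieval

    measures = get(risk, "control measure")
    return {
        "project overview": {
            "project": project,
            "location": get(project, "location"),
            "risk value": get(project, "risk value"),
            "risk level": get(project, "risk level"),
        },
        "risk analysis": {
            "inducing factors": get(risk, "includes risk factor"),
            "consequences": get(risk, "risk consequence"),
            "engineering risk part": risk_part,
        },
        # Two-hop reasoning: risk event -> control measure -> equipment / materials.
        "rescue plan": {
            "measures": measures,
            "equipment": [e for m in measures for e in get(m, "rescue equipment")],
            "materials": [x for m in measures for x in get(m, "rescue material")],
        },
    }


plan = generate_plan(build_index(TRIPLES), "Channel A", "Canal seepage", "embankment body")
print(plan["rescue plan"])
```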

Experimental environment
The experiments were run on an Intel(R) Core(TM) i7-8700 CPU, an NVIDIA GeForce GT 710 GPU and 16 GB of RAM, using Python 3.6, PyCharm 2020.3 and the Keras neural network framework. Some parameters of the BERT+BiLSTM+CRF model are shown in Table 3.

Entity and relation extraction results
Taking the 47 risk prevention and control manuals for the middle route of the South-to-North Water Diversion Project and the Overall Plan of the emergency rescue system as data sources, we constructed a knowledge graph of emergency plans for the middle route following the knowledge graph construction process described above.
We extracted 15 kinds of entities, such as projects, locations, management institutions, risk values and risk events (entity statistics are given in Table 4), and 17 kinds of relationships, such as risk event (engineering to risk event), engineering risk value (engineering to risk value), engineering risk level (project to risk level), location (project to location) and includes risk factor (risk event to risk factor) (relationship statistics are given in Table 5).

Knowledge graph results
Because the middle route of the South-to-North Water Diversion Project involves a large number of entities and relationships, Figure 7 shows only part of the knowledge graph.
For example, the inverted siphon of the branch canal at the end of the trunk canal in Taocha county is a canal-crossing bridge inverted siphon; it is located in Dengzhou and affiliated with the Dengzhou management office. This project has the risk event seepage damage, which includes two sub-risk events: soil flow and piping. Each sub-risk event has different risk factors, and risk events and risk factors correspond to control measures and preventive measures, respectively. The control measures involve two kinds of entities, emergency equipment and emergency materials, and a risk event has different control measures for different engineering risk parts. In this way, entities such as engineering, risk, control measure, rescue equipment and rescue material are associated with one another.

Analysis of entity recognition model results
The BERT+BiLSTM+CRF entity recognition model constructed in this paper identifies eight types of entities: project, stake number, location, management office, risk value, risk event, risk factor and engineering risk part. We divided more than 28,000 manually labelled samples into mutually independent training, validation and test sets.
We then expanded the training and validation sets with template-based data augmentation and fed them into the BERT+BiLSTM+CRF model for training. The training results are shown in Table 6: the model achieves an F1 value of 96.21% and accurately identifies risk entities and related entities. The model can also be reused for knowledge extraction when the graph is updated in the future.
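An F1 value like the one reported is usually computed at entity level: predicted BIO tag sequences are converted to typed spans, and a span counts as correct only if its type and boundaries both match the reference. The text does not state which convention was used, so this strict span-matching sketch is an assumption, with hypothetical tag names.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        new_entity = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or new_entity:
            if etype is not None:
                spans.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span matching."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-RISK", "I-RISK", "O", "B-PART"]]
pred = [["B-RISK", "I-RISK", "O", "O"]]
print(prf1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```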

Analysis of entity similarity algorithm results
We designed a Jaccard entity similarity algorithm based on the Word2Vec model, manually constructed a test set for the risk entity similarity algorithm, and compared it with a single Word2Vec model and a single Jaccard algorithm. The test set contains 300 groups of entity lists of the form [(engineering, risk1, engineering risk part), (risk2)], where risk2 is the reference risk entity in the graph, and each group is used as the input to the entity similarity algorithm. A match is counted as correct if the five risk entities output by the algorithm contain risk2. The evaluation results of the three algorithms are shown in Table 7.
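The evaluation criterion (a group counts as correct if risk2 appears among the five returned entities) is a hit rate at k = 5. A minimal sketch follows; `toy_matcher` is a hypothetical stand-in for any of the three compared algorithms and always returns the same ranking.

```python
from typing import Callable, List, Sequence, Tuple

Query = Tuple[str, str, str]   # (engineering, risk1, engineering risk part)
Group = Tuple[Query, str]      # (query, reference risk2)


def hit_at_k(groups: Sequence[Group], matcher: Callable[[Query], List[str]], k: int = 5) -> float:
    """Share of test groups whose reference risk2 appears in the matcher's top-k output."""
    if not groups:
        return 0.0
    hits = sum(1 for query, gold in groups if gold in matcher(query)[:k])
    return hits / len(groups)


def toy_matcher(query: Query) -> List[str]:
    # Hypothetical stand-in for a similarity algorithm: returns a fixed ranking.
    return ["canal seepage", "piping", "soil flow", "slope collapse", "pipe leakage", "terrorist attack"]


groups = [
    (("high-fill channel", "seepage phenomenon", "embankment"), "canal seepage"),
    (("inverted siphon", "drone intrusion", "inlet"), "terrorist attack"),  # ranked 6th: a miss
]
print(hit_at_k(groups, toy_matcher))  # 0.5
```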
The experimental results show that the Jaccard entity similarity algorithm based on the Word2Vec model outperforms both the single Word2Vec model and the single Jaccard algorithm. The matching results for some risk entities are shown in Table 8. When the input risk entity is a terrorist attack, the algorithm returns the correctly matched risk entity "terrorist attack" together with other related risk entities. For example, drone attacks and terrorist hijackings are manifestations of terrorist attacks, and control system failures can result from the cyber-attack component of terrorist attacks. Users can select different matched risk entities to generate different emergency plans according to their needs.

Intelligent generation of emergency plans
Based on the risk entity and related entities identified by the BERT+BiLSTM+CRF model, we use the Word2Vec model trained in the vertical domain and the Jaccard algorithm to match risk entities in the graph and intelligently generate emergency plans with graph retrieval and reasoning.
Take the patrol text "the seepage phenomenon occurred in the embankment of the high-fill channel at pile number K15+125~K16+140 in Dengzhou Management Office" as an example. The entity recognition model yields the risk entity and related entities: management office (Dengzhou management office), engineering (high-fill channel), stake number (K15+125~K16+140), engineering risk part (embankment body) and risk (seepage phenomenon).
Coarse-grained sorting with the Jaccard algorithm generates 20 risk candidate entities. Using the Jaccard algorithm and the Word2Vec model, we integrate the three entity features of engineering, risk and engineering risk part to obtain the top five risk entities in the graph ranked by similarity score, and we use graph retrieval to return the corresponding disposal measures, as shown in Table 9. Table 9 shows that the identified risk entity "Seepage phenomenon" matches the risk entity "Canal seepage" in the graph. Based on the identified entity list and the emergency plan template, we use graph reasoning and graph retrieval to generate the emergency plan intelligently, as shown in Figure 8.
Figure 8 shows that the emergency plan for "Canal seepage" includes the entities identified in the patrol text, related risk events, the list of matched risks in the graph, risk control methods, a graph overview, a project overview, a rescue plan, a risk analysis and material preparation points. The graph overview presents the knowledge related to the risk entity "Canal seepage" as a knowledge graph. The project overview contains information about the project entity "Dengzhou high-fill channel". The rescue plan includes the materials and equipment needed for emergency rescue, and the risk analysis includes the causes, consequences, related risks and engineering risk part of the risk entity "Canal seepage".

Conclusion
Because risks occur frequently on the middle route of the South-to-North Water Diversion Project, this paper proposes a knowledge-graph-based method for intelligently generating emergency plans and draws the following conclusions: (1) Based on the Risk Prevention and Control Manual of the middle route and the Overall Plan of the emergency rescue system, this paper analyses and organises knowledge about risks, engineering, etc., and uses knowledge extraction, knowledge fusion and knowledge storage to construct a knowledge graph of the emergency plan for the middle route, which provides query and retrieval services for risk-related knowledge.
(2) We design an entity recognition model based on BERT+BiLSTM+CRF that achieves good recognition performance with a relatively small dataset and automatically identifies and extracts entities such as projects, locations and stake numbers. We also propose a Jaccard entity similarity algorithm based on the Word2Vec model, verify its reliability in calculating the similarity of risk entities, and thereby realise intelligent alignment of risk entities.
Based on the hazard inspection texts submitted by inspectors, this paper combines the two machine learning models above to automatically extract hazard, location and engineering information from the text, match the hazard with those in the knowledge graph, and finally use knowledge retrieval and reasoning to generate an emergency plan containing an overview of engineering knowledge, rescue measures and hazard analysis. This effectively assists professionals in making emergency decisions and improves the safe operation of water conservancy projects. However, the approach has limitations: it depends on how hazards are described in the inspection text and on the number of hazards in the graph. In future work, we need to improve knowledge matching and knowledge updating techniques and expand the breadth and depth of the water conservancy knowledge graph so that it can serve other downstream tasks and enhance the intelligent management of water conservancy information.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Projects of Open Cooperation of Henan Academy of Sciences under grant number 220901008 and the National Natural Science Foundation of China under grant number 72271091.

Data availability statement
The data that support the findings of this study are available from the corresponding author, H. K. Lu, upon reasonable request (https://github.com/luhankang/emergency_plan).