Article

Hidden Danger Association Mining for Water Conservancy Projects Based on Task Scenario-Driven

1
College of Computer and Information, Hohai University, Nanjing 210098, China
2
Business School, Hohai University, Nanjing 210098, China
*
Author to whom correspondence should be addressed.
Water 2023, 15(15), 2814; https://doi.org/10.3390/w15152814
Submission received: 4 June 2023 / Revised: 27 July 2023 / Accepted: 1 August 2023 / Published: 3 August 2023

Abstract

With the rapid development of water conservancy engineering and infrastructure construction, many safety hazards arise during the construction of water conservancy projects, so studying the potential hazards in the construction process is of great significance. In this context, this paper proposes a task scenario-based association mining method for hidden danger records in hydraulic engineering. By analyzing transaction characteristics, the traditional Apriori algorithm is improved to optimize the pruning results and generate hidden danger association rules. Based on task scenario-driven association rule mapping, the hidden danger association rules for specific task scenarios are mined to assist construction safety managers in hidden danger investigation. The improvement reduces the complexity and running time of the algorithm, and the research results have been successfully applied to the investigation and management of hidden dangers in the Xinmenghe dredging project.

1. Introduction

With the vigorous development of China's water conservancy industry, water conservancy construction is characterized by large scale and long construction periods. It faces construction safety hazards, and construction safety management is becoming increasingly difficult [1,2]. At present, water conservancy data suffer from problems such as insufficient data governance infrastructure and a low degree of data standardization, which have become a bottleneck restricting the development of water conservancy informatization [3,4]. With the advancement of science and technology, the water conservancy industry has also begun to apply emerging technologies such as computer networks and data mining on a large scale to manage water conservancy projects efficiently [5,6,7]. The application of data mining technology in the safety management of water conservancy projects is of great significance for promoting the modernization of water conservancy project management and the construction of an efficient and smooth safety management system [8].
Hidden dangers are the source of safety accidents, and the investigation and treatment of hidden dangers play an important role in the entire construction and operation cycle of hydropower projects [9,10]. With the advancement of safety informatization, a large amount of complex, multi-source, heterogeneous hidden danger text data has been accumulated during the construction of water conservancy projects. Hidden danger mining is of great significance to the safety management of water conservancy projects. However, current security risk analysis relies mainly on the personal experience of security managers; it is inefficient, and it is difficult to identify the key management points of security risks and the internal connections in hidden danger data. At present, intelligent management methods have entered a new stage in the safety and quality control of water conservancy projects [11,12]. In order to extract useful safety management information from the huge amount of safety hazard data, overcome the low efficiency and subjectivity of manual data analysis, and provide managers with targeted decision support, data mining technology has been applied to the early warning of engineering risk hazards. Association rule mining is an important data mining tool, and common association rule algorithms include the Apriori algorithm [13,14]. Through association rule mining, the associations between hidden dangers in the transaction set can be mined and provided to project managers to support the investigation and management of hidden dangers.
For large data sets, the Apriori algorithm needs to scan the database many times. When obtaining the frequent items of the target association rules, it generates many redundant rules [15] and generates candidate items inefficiently [16]. It is also difficult to apply accurately to unstructured text such as hidden danger descriptions. To solve these problems, this paper analyzes the basic idea of the Apriori algorithm and combines it with the characteristics of the transaction data to optimize the practical application of the Apriori algorithm in the investigation of hidden risks in water conservancy project construction. The main contributions of this paper are as follows:
  • Construct task scenarios of water conservancy project, obtain the corresponding risk association rules in the specific construction task scenarios and analyze project data, structurally process and code the hidden danger records;
  • Optimize the Apriori algorithm based on transaction data characteristics: use a single risk hidden record (that is a single transaction) as a connection object for transaction matching, filter redundant association rules, optimize pruning results, narrow the range of frequent item candidate sets, and improve algorithm efficiency;
  • Association rule mapping based on task scenario driven: According to the characteristics of the task scenario, the hidden danger association rules mined under the corresponding construction task scenario are matched, so as to obtain the potential associated with the hidden dangers of a construction task and realize the function of hidden danger investigation and management.

2. Literature Review

Data mining refers to the extraction of useful hidden information from massive data and, through careful analysis, the clarification of important relationships within the data. It has been widely used in education, the military, medicine and other fields [17,18,19]. The concept of data mining was proposed and popularized at the end of the 20th century. Current research on data mining falls into the following categories: classification [20], clustering [21], association analysis [22], rule analysis, etc., and mainly involves methods such as neural networks [23], logistic regression [24], the Apriori algorithm and the FP-Growth algorithm [25]. Reference [26] established an automatic control model of input information based on data mining technology, which was used to control the automatic input of information into a pharmaceutical education resource library and improve work efficiency. Aiming at the problem that traditional military intelligence analysis cannot deeply mine and analyze massive information, Reference [27] proposed an analysis method based on data mining technology, which effectively improved the efficiency of military intelligence analysis. Reference [28] used the Apriori algorithm to mine association rules from massive data in the medical industry, focusing on the association between cesarean section, signs and drugs.
At present, the application of data mining technology in the field of water conservancy projects mainly focuses on reservoir regulation, water-saving irrigation control [29,30,31] and dam safety monitoring [32,33]. Data mining research on the investigation and treatment of hidden dangers in water conservancy projects, at home and abroad, is still in its infancy. Reference [34], aiming at the needs of hidden danger investigation in hydropower projects, adopted an improved Apriori algorithm to mine association rules from the massive data accumulated in the investigation of hidden dangers in the construction of hydropower projects. Based on mutual information and left and right information entropy methods, key phrases were extracted from unstructured hidden danger data to provide decision support for the investigation and treatment of hidden dangers in water conservancy construction. Based on the theory of the double prevention mechanism of coal mines, Huang [35] used the Apriori algorithm to analyze a large amount of coal mine accident hidden danger data, so as to reveal the correlation between coal mine hidden dangers and formulate risk prevention measures. Liu [15] used the concept of strong association rules to improve the Apriori algorithm for the problem of subway fault types and their various influencing factors, and mined association rules from subway fault data to provide decision support for subway fault warning. Sing H proposed a new approach called Sandwich-Apriori, which combines Apriori and Reverse-Apriori and reduces the number of scans and the number of candidates generated compared with either of them [36]. Verma proposed an Ameliorated-Apriori algorithm (A-Apriori), derived from the original Apriori algorithm, which calculates frequent items on several groups of transactions with minimum support (for both Apriori and A-Apriori) [37].
The traditional Apriori algorithm needs to scan the database multiple times, produces redundancy when generating frequent items, and is difficult to apply accurately to unstructured text such as security risk descriptions. To solve these problems, researchers have improved the Apriori algorithm mainly in two ways: reducing the number of database scans and changing the data storage method. Chen [38] proposed the AS-Apriori algorithm based on the cumulative array method to reduce the number of scans of the database. Xie [39] proposed the Apriori-MMR algorithm, which converts the hidden danger database into a matrix to improve the data storage mode. This paper proposes a task scenario-driven hidden danger record mining technique for the investigation of risks and hidden dangers in water conservancy projects. Based on transaction characteristics, the traditional Apriori data mining algorithm is improved, the pruning results are optimized to generate hidden danger association rules, and the hidden danger association rules under specific task scenarios are mined, so as to assist construction safety management personnel in hidden danger investigation and management.

3. Materials and Methods

3.1. Study Area and Data Source

The New North District project in Changzhou City involves 36 villages in 5 towns from north to south: Menghe Town, Xixiashu Town, Luoxi Town, Penniu Town and Chunjiang Town (abandoned area). Its total length is about 25.29 km: 21.81 km (3.3 km newly opened) in the section north of the canal, 1.4 km in the Penniu Junction section, and 2.06 km in the newly opened section of the southern extension. The Jiepai water conservancy pivot project is an important part of the Xinmeng River Extension and Dredging Project. It is located in the town of Jiepai, Danyang City, Zhenjiang, at the mouth of the river in the northward extension section of the Xinmeng River, and consists of a ship lock, a restraint lock and a pumping station, with the buildings arranged in sequence from west to east.
The Xinmeng River Extension and Dredging Project is a backbone project with comprehensive benefits of flood prevention, drainage, water resources allocation, water ecology improvement and navigation, which is also listed in the 172 national major water conservation and water supply projects and the key project of the construction of Yangtze River Economic Belt, according to the “Overall Plan for the Comprehensive Management of Water Environment in Taihu Basin”, “Taihu Basin Flood Control Plan” and “Taihu Basin Water Resources Comprehensive Plan”.

3.2. Building of Construction Task Scenarios for Water Conservancy Project

A water conservancy project comprises many construction projects, and the construction tasks involved in each individual project are complicated and closely related. Different construction tasks carry different hidden dangers, and a safety accident can trigger a series of chain reactions. Therefore, to investigate hidden dangers in water conservancy project construction at the source, the construction work should be carefully divided into construction task scenarios, and the work focus under each scenario should be clarified, so that hidden danger investigation becomes more targeted and standardized. In addition, not all association rules mined from hidden danger records apply to the hidden danger investigation of every construction task. If the construction task scenarios are not considered, the advantages of association rule mining cannot be brought into play, and the needs of risk investigation cannot be met. By constructing the construction task scenarios, the corresponding risk association rules can be mined for specific task scenarios and used by construction site safety managers for hidden danger investigation. To sum up, building the construction task scenarios is an important part of hidden danger mining for water conservancy project safety and the key element of targeted hidden danger investigation.
The water conservancy project is a systematic project, which can be divided into projects according to the construction tasks, and the whole project can be subdivided into multiple individual items. The principle of the water conservancy project division is to divide a construction project into several unit works. Each unit work is divided into several divisional works according to its structure, location or function, and then the divisional work is specifically divided into element work.
The Xinmeng River extension and dredging project is a backbone project with comprehensive benefits of flood prevention, drainage, water resources allocation, water ecology improvement and navigation, which is also listed in the 172 national major water conservation and water supply projects. The project in the Xinbei District of the Xinmeng River projects involves 36 villages in 5 towns from north to south, including Menghe Town, Xixiashu Town, Luoxi Town, Penniu Town and Chunjiang Town. Jiepai of Xinmeng River extension dredging project is an important part of the projects located in the town of Jiepai, Danyang City, Zhenjiang, at the mouth of the river in the northward extension section of the Xinmeng River, consisting of a ship lock, a restraint lock and a pumping station, with buildings arranged in sequence from west to east. Combined with the literature analysis and the practice of the Xinmeng River extension dredging project, the construction task scenarios of this project engineering are shown in Table 1.
According to the division standard of the water conservancy project, the divisional project is divided according to the operation structure and position, and the element work is divided according to the operation function. Therefore, it can correspond to the two construction task characteristics of the working place and the operation category of the construction task. According to the characteristics of the two construction tasks, different construction tasks can be divided into different construction task scenarios, so as to realize the detailed division of the construction task scenarios of water conservancy projects and clarify the work focus under each construction task scenario.

3.3. Data Analysis of Hidden Danger Records

In traditional hidden danger investigation for water conservancy project construction, a series of safety inspection plans is formulated, and the hidden danger records generated during safety inspection and hidden danger investigation are collected and registered by date. The rectification and acceptance of each hidden danger are then carried out according to the safety inspection table library and the hidden danger investigation standard library. The process is shown in Figure 1. Taking the steps of the traditional hidden danger investigation method as theoretical support, this paper analyzes the hidden danger records to support hidden danger mining and hidden danger investigation research for water conservancy project construction.
This paper takes the hidden danger records from the construction of the Xinmeng River extension dredging project as the research object. The records include four fields: inspection date, operation category, hidden danger description and working place. Some hidden danger records are shown in Table 2.
From each hidden danger record in Table 2, key information such as operation category, hidden danger description and workplace is extracted, so that the unstructured data are preprocessed into structured data. Because the hidden danger description field is long and inconsistently worded, word segmentation is carried out after a stop-word list is established. The information in the hidden danger description field is divided into two parts: the hidden danger subject and the hidden danger description. The hidden danger subject describes the energy carrier or dangerous substance where the hidden danger occurs. The data are shown in Table 3.
In order to facilitate data mining, the key information characteristics of the hidden danger records after structured processing, such as operation category, operation place, hidden danger subject and hidden danger description, should be coded as follows: the coding at the beginning of number 1 indicates the operation category, the coding at the beginning of number 2 indicates the operation place, the coding at the beginning of number 3 indicates the hidden danger subject, and the coding at the beginning of number 4 indicates the hidden danger description. The classification of key information characteristics and corresponding coding parts of some hidden danger records are shown in Table 4.
A hidden danger record is regarded as a transaction, and the hidden danger characteristics in the record are represented by the corresponding numeric codes. For example, a hidden danger is recorded as “when the excavation of the embankment foundation is deep, no safety technical measures to prevent slope collapse and landslide”. The work class is embankment foundation construction (1001), the work field is the second section of the first construction area (2001), and the hidden danger subject is the embankment foundation (3001). The hazard description is no safety measures (4001, 4006), slope collapse (4008) and landslide (4009), so the record is expressed as (1001, 2001, 3001, 4001, 4006, 4008, 4009). The hidden danger records are shown in Table 5.
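The coding step can be illustrated with a minimal sketch. The lookup table `CODE_BOOK` and the function names are hypothetical and exist only for illustration; the codes themselves are taken from the worked example above, and codes are stored as strings so that the leading digit can be read directly.

```python
# The first digit of each code identifies the feature class (see the coding rules above).
FEATURE_BY_PREFIX = {
    "1": "operation category",
    "2": "operation place",
    "3": "hidden danger subject",
    "4": "hidden danger description",
}

# Hypothetical code book containing only the entries used in the worked example.
CODE_BOOK = {
    "embankment foundation construction": ["1001"],
    "second section of the first construction area": ["2001"],
    "embankment foundation": ["3001"],
    "no safety measures": ["4001", "4006"],
    "slope collapse": ["4008"],
    "landslide": ["4009"],
}


def encode_record(features, code_book=CODE_BOOK):
    """Turn the extracted features of one hidden danger record into a transaction."""
    transaction = []
    for feature in features:
        transaction.extend(code_book[feature])
    return tuple(transaction)


def feature_class(code):
    """Return the feature class of a code from its leading digit."""
    return FEATURE_BY_PREFIX[code[0]]


record = [
    "embankment foundation construction",
    "second section of the first construction area",
    "embankment foundation",
    "no safety measures",
    "slope collapse",
    "landslide",
]
transaction = encode_record(record)
print(transaction)
```

Each transaction is then a flat tuple of codes, which is the form the association mining step works on.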

3.4. Improvement of Apriori Algorithm Based on Transaction Data Characteristics

In 1994, R. Agrawal proposed the famous Apriori algorithm based on the theory of item set lattice space, which has become a classical algorithm in the field of association rule mining. Its main idea is to generate strong association rules by finding the frequent items in a given data set, and to generate each new set of candidate items from the frequent items found in the previous scan. It can be simplified into three steps: connection, pruning and calculation. The flowchart is shown in Figure 2.
Frequent items are item sets that satisfy the minimum support, and strong association rules are rules that satisfy both the minimum support and the minimum confidence. Support represents the probability of simultaneous occurrence of A and B, and confidence represents the ratio of the probability of simultaneous occurrence of A and B to the probability of occurrence of A, as shown in Formulas (1) and (2).
$$\mathrm{support}(A \Rightarrow B) = P(A \cup B), \tag{1}$$
$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}. \tag{2}$$
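The two measures can be checked on a toy database with a short sketch. The transactions below are invented purely for illustration and are not taken from the project data; the function names are likewise assumptions. Here the item set for "A and B occurring together" is the union of the items in A and B, and its support is the fraction of transactions that contain all of those items.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B together) / support(A); assumes A occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Illustrative transactions only (not project data).
toy_transactions = [
    {"1001", "3001", "4008"},
    {"1001", "3001", "4009"},
    {"1001", "3002", "4008"},
    {"1006", "3007", "4010"},
]

sup = support({"1001", "3001"}, toy_transactions)        # 2 of 4 transactions
conf = confidence({"1001"}, {"3001"}, toy_transactions)  # 0.5 / 0.75
print(sup, conf)
```

A rule such as 1001 ⇒ 3001 would count as strong only if both values reach the chosen minimum support and minimum confidence.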
It can be seen from Figure 2 that the traditional Apriori algorithm needs to iterate repeatedly in the process of connection and pruning, and it also needs to scan the database repeatedly when mining frequent items. In the process of generating frequent candidate items, all candidate items satisfying the minimum support are retained, which makes the number of candidate items huge. The time overhead of the algorithm is therefore concentrated in the generation of candidate items, which greatly reduces its time efficiency. Secondly, for some of the retained candidate sets, the corresponding association rules, even if valid, cannot make an effective contribution to the association analysis of risk hazards. To solve these problems, based on the analysis of the data characteristics of the hidden danger database in Section 3.3, this paper improves the traditional Apriori algorithm and proposes a new frequent item mining strategy. The improvement process of the Apriori algorithm based on transaction data characteristics is shown in Figure 3.
The traditional Apriori algorithm takes the items in the hidden danger database as the connection object. Because the number of transactions in the hidden danger database is far less than the number of items in each transaction record, this paper takes a single hidden danger record, that is, a single transaction, as the connection object. A dynamic function can be used to obtain the longest common item of each pair of transactions in the transaction pairing database, and this result is used as the input to the next round of pairing. In addition, the longest common item found in each round is a result of association rule mining. This step is repeated, finding the longest common item over the new transaction pairings, until the number of records in the transaction pairing database is 1.
According to the data characteristics of the hidden danger database to be mined, each hidden danger is composed of four parts: operation category, operation place, hidden danger subject and hidden danger description. Association rules such as “embankment foundation construction → embankment foundation” and “embankment foundation construction → sluice earthwork excavation” are valid, but they cannot contribute to the correlation analysis of potential risks, because association rules that do not contain the hidden danger subject and the hidden danger description features are not interesting association rules. In the pruning process of the mining algorithm, this paper therefore directly deletes any item in the frequent k-item set that does not contain the hidden danger subject and the hidden danger description feature. This optimizes the pruning results, narrows the candidate set range of the frequent item set, reduces the time complexity of generating the frequent item set, and improves the efficiency of the algorithm.
Establish a hidden danger database D, set the number of transaction pairs K to 2, and set the minimum support threshold min_sup. Then the longest common item in all transaction pairs is obtained as the K-candidate set. If the longest common item is empty, it is not retained. A frequent K-item is obtained by deleting all items that do not contribute to the association analysis. If K is greater than min_sup, all subsets of the frequent K-item are obtained. These subsets are the items that constitute the transaction, and the results obtained by association rule mining are used for association analysis. If the frequent K-item is empty or contains only one element, the algorithm terminates; otherwise K = K + 1 and the above steps are repeated.
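The pruning criterion can be sketched as a simple filter. This is one reading of the rule, inferred from the examples in the text: an item set is kept only if it contains at least one hidden danger subject code (leading digit 3) and at least one hidden danger description code (leading digit 4). The function name and the candidate item sets are hypothetical.

```python
def is_interesting(itemset):
    """Keep an item set only if it has both a subject (3xxx) and a description (4xxx) code."""
    has_subject = any(code.startswith("3") for code in itemset)
    has_description = any(code.startswith("4") for code in itemset)
    return has_subject and has_description


# Hypothetical candidate item sets, written with the coding scheme of this paper.
candidates = [
    {"1001", "3001"},                   # operation category -> subject only
    {"1001", "1002"},                   # operation category -> operation category
    {"4001", "4006"},                   # description only, no subject
    {"1006", "2001", "3007", "4010"},   # category, place, subject and description
]

kept = [c for c in candidates if is_interesting(c)]
print(kept)
```

Only the last candidate survives, which matches the intent of discarding rules that are valid but say nothing about which hazard may occur.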
The core code of the improved Apriori algorithm based on the characteristics of transaction data is shown in Algorithm 1:
Algorithm 1 Improved Apriori algorithm based on transaction data characteristics
Input: D = {d_1, d_2, d_3, …, d_n}, min_sup;
Output: FP_k;
1. Initialize the transaction database T and set the minimum support threshold δ;
2. Initialize K = 2;
3. While (L_{k−1} is not NULL)
4.     Generate a new transaction pair;
5.     Find the longest common item as C_k;
6.     Prune invalid transaction pairs to obtain L_k;
7.     If (K > δ)
8.         Find all subsets of L_k as FP_k;
9.     End if;
10.    K = K + 1;
11. End while;
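The loop of Algorithm 1 can be sketched in Python under several stated assumptions, and should not be read as the authors' implementation. The "longest common item" of two records is taken to be their set intersection, pruning keeps item sets that contain both a subject (3xxx) and a description (4xxx) code, and the support test is interpreted as the fraction of original transactions containing the item set. All function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def is_informative(itemset):
    """Assumed pruning rule: require a subject (3xxx) and a description (4xxx) code."""
    return (any(code.startswith("3") for code in itemset)
            and any(code.startswith("4") for code in itemset))


def support(itemset, database):
    """Fraction of original transactions that contain the whole item set."""
    return sum(itemset <= t for t in database) / len(database)


def mine_common_itemsets(transactions, min_sup):
    """Repeatedly pair records and keep their pruned, sufficiently frequent common items."""
    database = [frozenset(t) for t in transactions]
    current = set(database)   # records to be paired in this round
    results = set()
    while len(current) > 1:
        # Connection: common items of every pair of records.
        candidates = {a & b for a, b in combinations(current, 2)}
        # Pruning: drop empty, uninformative or infrequent item sets.
        frequent = {c for c in candidates
                    if c and is_informative(c) and support(c, database) >= min_sup}
        results |= frequent
        current = frequent    # the survivors are paired again in the next round
    return results


# Toy transactions for illustration only (not project data).
toy_db = [
    {"1001", "2001", "3001", "4001", "4008"},
    {"1001", "2001", "3001", "4001", "4009"},
    {"1006", "2001", "3007", "4010"},
    {"1006", "2001", "3007", "4010", "4003"},
]
rules = mine_common_itemsets(toy_db, min_sup=0.5)
print(sorted(sorted(r) for r in rules))
```

The loop terminates because the intersection of two distinct item sets is strictly smaller than at least one of them, so the largest item set shrinks in every round until at most one record remains.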

3.5. Association Rule Mapping Based on Task Scenario-Driven

A task scenario is a story about people and their activities. It refers to a series of observable behavior sequences when people perform a task; under a task scenario-driven approach, the focus shifts from the behavior of completing a task to the sequence of task execution. The concept of the task scenario has been maturely applied in military applications [40], human-computer interaction [41] and other fields. Association rule mapping based on the construction task scenario can quickly support work such as hidden danger investigation and management. It helps construction safety management personnel to find the hidden dangers in a specific construction task scenario and to obtain the hidden danger rules closely related to the target construction task more effectively. As a result, the complexity of safety management is greatly reduced, and the corresponding hidden danger investigation measures become more effective.
According to the water conservancy project construction task scenarios constructed in Section 3.2, the work class and work field of a specific construction task can be used as the task characteristics of that task. With these two task characteristics, the association rules in the target construction task scenario can be obtained more effectively through task scenario-driven association rule mapping. The computer matches the target risk association rules mined in the corresponding construction task scenario and obtains the potential hidden dangers associated with a given construction task, so as to support further hidden danger investigation and management. Specifically, the job category and workplace information of a construction task are extracted and encoded according to Section 3.3 of this article. These two features are then matched against the association rules obtained by the previous association mining. If an association rule contains both features, it is considered to match, and the corresponding hidden danger investigation guidance is given according to that rule. The task scenario-driven association rule mapping is shown in Figure 4.
Matching against the mined association rules yields two association rules that meet the conditions: ['2001', '1006', '3007', '4010'] and ['2001', '1006', '3011', '4003'], which correspond to ['one standard construction area 2 section', 'mud pipeline erection', 'mud pipeline', 'not strong'] and ['one standard construction area 2 section', 'mud pipeline erection', 'rope', 'not set']. The results of this example are explained as follows: when the construction task of mud pipeline erection is carried out in the second section of the first construction area, there may be two hidden risks, namely that the mud pipeline is not strong and that the rope is not set, which gives the direction of risk investigation. This result enables the managers of the construction site to strengthen the risk investigation and safety management of these two risks before the erection of the mud pipeline in the second section of the first construction area, so as to prevent safety incidents.
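The matching step in this example can be sketched as a subset test. The function name `match_rules` is hypothetical, the first two rules are the ones quoted above, and the last two rules are invented here only to show that non-matching rules are filtered out.

```python
def match_rules(rules, work_class, work_place):
    """Return the rules that contain both task features of the target scenario."""
    wanted = {work_class, work_place}
    return [rule for rule in rules if wanted <= set(rule)]


mined_rules = [
    ["2001", "1006", "3007", "4010"],  # from the example in the text
    ["2001", "1006", "3011", "4003"],  # from the example in the text
    ["2001", "1001", "3001", "4008"],  # hypothetical: same place, other work class
    ["2002", "1006", "3007", "4010"],  # hypothetical: same work class, other place
]

# Task scenario: mud pipeline erection (1006) in the second section
# of the first construction area (2001).
matched = match_rules(mined_rules, work_class="1006", work_place="2001")
print(matched)
```

The subject and description codes of each matched rule (here 3007/4010 and 3011/4003) are what would be reported to the safety manager as hazards to check.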

4. Experiment and Results

4.1. Experiment Preparation

In this section, the hidden danger records collected during the construction of the Xinmeng River extension dredging project are selected as the experimental object, and the hidden danger records belonging to 26 different operation categories are selected for the experiment. The number of transactions in the record set is 4312, containing a total of 632 items, of which the length of the transaction does not need to be unified.
The Python-based word segmentation tool Jieba is used to segment the hidden danger records, a stop-word list is constructed, and words that are meaningless for association rule analysis are removed. Following the procedure described in Section 3.3 of this paper, a hidden danger database is established. Some hidden danger records in the hidden danger database are shown in Table 6.
From the above table, it can be analyzed that the job category, hidden danger description and workplace field in each hidden danger record are structured after word segmentation, and then the key information contained in them is encoded and converted into a specified code. Each code is regarded as an item in the hidden danger database, and each hidden danger record corresponds to a record in the hidden danger database.
The overall experimental idea is to mine the association rules between hidden dangers under different job categories, and compare the differences in operating efficiency between the improved algorithm proposed in this paper and the traditional algorithm and the improved algorithm proposed by other scholars. In addition, the results of association rules mined by the traditional algorithm and the improved algorithm proposed in this paper are compared under the same support and confidence conditions.

4.2. Experiment on the Efficiency of Improved Algorithms

The experimental data are the hidden danger records collected by date during the construction of the Xinmeng River extension dredging project, and the same hidden danger database is used for every algorithm. The traditional Apriori algorithm, the AS-Apriori algorithm and the Apriori-MMR algorithm introduced in the literature review serve as baselines, and each is compared in turn with the improved algorithm. For each comparison, the running time is recorded as the minimum support varies.
Figure 5 shows that the running time of the improved Apriori algorithm is much shorter than that of the traditional Apriori algorithm when the minimum support is small. As the support increases, the gap narrows, but the improved algorithm remains faster. Averaged over the tested support values, the running time of the improved Apriori algorithm is 32.9% lower than that of the traditional algorithm.
Figure 6 shows that the running time difference between the improved Apriori algorithm and the AS-Apriori algorithm is only weakly affected by the support. The improved algorithm runs faster than AS-Apriori at every tested support value, and its average running time is 23.5% lower.
Figure 7 shows that the running times of the improved Apriori algorithm and the Apriori-MMR algorithm differ considerably at a support of 0.1 but are similar once the support reaches 0.7. Overall, the improved Apriori algorithm outperforms Apriori-MMR, with an average running time that is 19.4% lower across the tested support values.
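The average reductions reported in this section could be obtained with a small timing harness such as the sketch below. The function names `time_run` and `average_reduction` are hypothetical, and the sketch assumes that the average is the mean of the per-support relative reductions; the averaging convention actually used in the experiments is not stated here.

```python
import time


def time_run(mine, records, min_support):
    """Return the wall-clock time (seconds) of one mining run.

    `mine` is any callable taking (records, min_support); it stands in
    for an Apriori variant and is an assumption of this sketch.
    """
    start = time.perf_counter()
    mine(records, min_support)
    return time.perf_counter() - start


def average_reduction(baseline_times, improved_times):
    """Mean relative reduction (t_base - t_improved) / t_base.

    Both lists hold one running time per tested minimum support,
    in the same order.
    """
    reductions = [
        (base - improved) / base
        for base, improved in zip(baseline_times, improved_times)
    ]
    return sum(reductions) / len(reductions)
```

For example, baseline times of 10 s and 8 s against improved times of 5 s and 6 s give per-support reductions of 50% and 25%, and therefore an average reduction of 37.5%.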

4.3. Experiment on the Effectiveness of Hidden Danger Association Mining

The following experiment verifies the advantages of the improved Apriori algorithm based on transaction data features in mining hidden danger association rules for water conservancy projects. The task scenario-driven association rules support the association analysis of hidden dangers without generating meaningless rules. The minimum support is set to 0.1 and the minimum confidence to 0.8 for both the traditional and the improved Apriori algorithms. Some of the association rules generated by the two algorithms are shown in Table 7.
A comparison of the results shows that the traditional algorithm generates rules such as ‘not formulated → safety measures’. Although such a rule is valid in the sense that it meets the support and confidence thresholds, it does not help hidden danger investigation in practice and is therefore regarded as an invalid rule. The improved Apriori algorithm based on transaction data features avoids this meaningless growth in the number of candidate items by deleting meaningless items during frequent itemset mining, which improves the efficiency of the algorithm and reduces memory consumption.
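One way to recognize rules of the ‘not formulated → safety measures’ kind is to check which feature categories a rule touches. The sketch below is an illustration under stated assumptions, not the deletion procedure of the improved algorithm: it assumes that the leading digit of each code identifies its category as in Table 4 (1 for work class, 2 for work field, 3 for hidden danger subject, 4 for hazard description), it treats a rule as uninformative when all of its items fall in a single category, and it filters finished rules rather than deleting items during mining. The names `category`, `is_informative` and `filter_rules`, and the confidence values, are hypothetical.

```python
# Assumed mapping from the leading digit of a code to its feature category.
CATEGORY_BY_PREFIX = {
    1: "work class",
    2: "work field",
    3: "hidden danger subject",
    4: "hazard description",
}


def category(code):
    """Return the feature category of a four-digit feature code."""
    return CATEGORY_BY_PREFIX[code // 1000]


def is_informative(antecedent, consequent):
    """Assumed criterion: the rule must span at least two categories."""
    categories = {category(c) for c in antecedent}
    categories |= {category(c) for c in consequent}
    return len(categories) >= 2


def filter_rules(rules):
    """Keep only rules (antecedent, consequent, confidence) that pass."""
    return [
        (x, y, conf) for (x, y, conf) in rules if is_informative(x, y)
    ]


# Illustrative rules only; the confidences are made up for the example.
example_rules = [
    ({4001}, {4006}, 0.90),        # description -> description
    ({1001}, {4001, 4006}, 0.85),  # work class -> descriptions
]
kept = filter_rules(example_rules)
```

In this example only the second rule is kept, because the first rule relates two hazard description codes to each other and says nothing about where or on what the hidden danger occurs.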

5. Discussion

The hidden danger data of water conservancy projects are large in volume and redundant. Association rule mining over the hidden danger records therefore requires several scans of the hidden danger database, and redundant frequent itemsets are generated, which lowers the efficiency of the algorithm. To address these problems, this paper improves the traditional Apriori algorithm by analyzing the characteristics of transaction data: the hidden danger records are structured and coded for storage, and a single hidden danger record is used as the connection object to improve the efficiency of frequent itemset matching.
Construction task scenarios for water conservancy projects are defined, and a task scenario-driven association rule mapping is proposed to filter invalid association rule candidate sets and improve the mining of frequent itemsets. The experiments demonstrate the effectiveness of the improved algorithm based on transaction features when combined with the task scenario-driven mapping. The improved algorithm is applied to hidden danger association rule mining for the Xinmeng River dredging project. The resulting rules serve as knowledge for hidden danger investigation and management in water conservancy projects, giving managers a scientific basis for decision making and helping to reduce the probability of safety accidents.
Construction safety management of water conservancy projects involves many factors. This paper analyzes only the construction hidden danger records, but many hidden danger factors also exist in other engineering stages and outside these records. Future research will examine how to enhance the algorithm’s ability to mine rare hidden danger patterns that are severe but infrequent. Natural language processing and text mining could also be used to analyze construction safety solutions and other related materials, so as to realize whole-process, multi-source intelligent analysis and management of hidden dangers.
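For reference, the traditional level-wise Apriori procedure that serves as the baseline can be sketched as follows. This is the textbook algorithm, not the improved method of this paper; it rescans all records for every support count, which is the cost discussed above. The records are the coded examples of Table 5 with the four feature columns merged into one item set per record, and the function names are hypothetical.

```python
from itertools import combinations

# Coded hidden danger records from Table 5, one item set per record.
RECORDS = [
    {1001, 2001, 3001, 4001, 4006, 4008, 4009},
    {1001, 2002, 3002, 4012},
    {1001, 2001, 3003, 4001, 4006},
    {1004, 2005, 3004, 4001, 4003, 4006},
    {1005, 2003, 3005, 3006, 3008, 3009, 4002, 4007, 4004, 4005},
    {1006, 2003, 3007, 4010, 4011},
]


def support(itemset, records):
    """Fraction of records containing every item (one full scan)."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise Apriori: join, prune, then count support."""
    items = sorted({i for r in records for i in r})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s, records) >= min_support]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, records)
        k += 1
        # Join: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c, records) >= min_support]
    return frequent


def rules(frequent, min_confidence):
    """Rules X -> Y with confidence = supp(X ∪ Y) / supp(X)."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                conf = supp / frequent[x]
                if conf >= min_confidence:
                    out.append((x, itemset - x, conf))
    return out


freq = frequent_itemsets(RECORDS, min_support=0.5)
strong = rules(freq, min_confidence=0.8)
```

With a minimum support of 0.5 on these six records, the only frequent pair is {4001, 4006}, and the only strong rules relate those two hazard description codes to each other, which illustrates how the unmodified algorithm can return rules that carry no information about the task scenario.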
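The idea of looking up rules for a given task scenario can be illustrated with the sketch below. It is a simplified assumption rather than the mapping defined in this paper: a scenario is taken to be a set of work class and work field codes, a rule is selected when its antecedent is contained in the scenario and its consequent consists only of hidden danger subject (3xxx) or hazard description (4xxx) codes, and the function name `rules_for_scenario` and the example confidences are hypothetical.

```python
def rules_for_scenario(rules, scenario):
    """Select rules relevant to a task scenario, highest confidence first.

    rules:    iterable of (antecedent, consequent, confidence) with
              antecedent and consequent given as sets of feature codes.
    scenario: set of work class / work field codes describing the task.
    """
    selected = []
    for antecedent, consequent, conf in rules:
        antecedent_in_scenario = antecedent <= scenario
        # Assumed convention: 3xxx = subject, 4xxx = hazard description.
        consequent_is_hazard = all(c // 1000 in (3, 4) for c in consequent)
        if antecedent_in_scenario and consequent_is_hazard:
            selected.append((antecedent, consequent, conf))
    return sorted(selected, key=lambda rule: rule[2], reverse=True)


# Illustrative rules only; the confidences are made up for the example.
example_rules = [
    ({1001, 2001}, {3001, 4006}, 0.82),
    ({1001}, {4001}, 0.89),
    ({1004}, {4003}, 0.90),   # different work class, not selected
    ({1001}, {2001}, 0.81),   # consequent is a work field, not selected
]
scenario = {1001, 2001}
checklist = rules_for_scenario(example_rules, scenario)
```

A safety manager inspecting work class 1001 in work field 2001 would then receive only the first two rules, ordered by confidence, as a checklist of hidden dangers to look for.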

6. Conclusions

This paper proposes a task scenario-driven method for mining hidden danger records to support risk detection in hydraulic engineering. First, the traditional Apriori algorithm is improved based on transaction characteristics, and the pruning results are optimized to generate hidden danger association rules, which reduces the complexity of the algorithm and improves its efficiency. Then, association rules are mapped to task scenarios so that the hidden danger association rules for a specific task scenario can be mined, which brings hidden danger mining closer to actual construction tasks and assists construction personnel in project management. Experiments on data from the Xinmeng River dredging project confirm both the efficiency of the improved algorithm and the effectiveness of the task scenario-driven mapping. The method has been applied successfully to actual project data.

Author Contributions

Conceptualization, Y.P.; investigation, C.Y. and F.T.; resources, M.Z. and F.T.; data curation, M.D. and M.Z.; methodology, F.T. and Y.P.; writing—original draft, F.T.; writing—review and editing, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number: 42001250; Jiangsu Water Conservancy Science and Technology Foundation, grant number: 2020014.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yang, G.C. Research on Water Conservancy Project Construction Management-Comment on ‘Introduction to Water Conservancy Project (2nd Edition)’. Yellow River 2022, 44, 176.
2. Jiang, X.; Luo, D.L.; Li, W.; Chen, Y.; Wu, J.H.; Jin, L.H. Research on cusp catastrophe model of evolution mechanism of high-risk operation emergency in water conservancy project. J. Yangtze River Sci. Res. Inst. 2020, 37, 75–81.
3. Yong, Z.; Hua, D.; Chen, D.Q. Research and practice of reservoir basic data management work. Water Resour. Informatiz. 2022, 171, 49–53+69.
4. Yuan, L.W. Application of information and automation control technology in water conservancy system. Electron. Technol. 2022, 51, 145–147.
5. Liu, W.; Fang, G.C. Overview of current situation and development of water conservancy industry informatization. Water Conserv. Constr. Manag. 2021, 41, 81–84.
6. Wang, D.P. Water conservancy project management based on data mining technology. Jilin Water Resour. 2014, 385, 38–40, 49.
7. Zhang, L.J. Analysis of water conservancy project management based on data mining technology. Heilongjiang Hydraul. Sci. Technol. 2017, 45, 172–174.
8. Fang, G.H.; Gao, Y.Q.; Tan, W.X.; Zheng, Z.Z.; Guo, N. Construction of evaluation index system of water conservancy project management modernization. Adv. Sci. Technol. Water Resour. 2013, 33, 39–44.
9. Fan, Q.X.; Yang, Z.L.; Wang, Z.L. Digital dynamic control over whole construction process of large hydropower projects. J. Hydroelectr. Eng. 2019, 38, 1–11.
10. Fan, Q.X.; Lin, P.; Wei, P.C. Hydropower engineering safety accident occurrence mechanism and management strategies. China Saf. Sci. J. 2019, 29, 144–149.
11. Zhang, Q.L.; An, Z.Z.; Liu, T.Y. Intelligent control theory of earth-rock dam compaction. J. Hydroelectr. Eng. 2020, 39, 34–40.
12. Zhong, D.H.; Shi, M.N.; Cui, B. Research progress of the intelligent construction of dams. J. Hydraul. Eng. 2019, 50, 38–52+61.
13. Zhu, X.F.; Zong, Y. Construction and application analysis of university management system based on association rule mining algorithm Apriori. Sci. Program. 2022, 2022, 4367267.
14. Liu, L.J. Research and application of improved Apriori algorithm. Comput. Eng. Des. 2017, 38, 3324–3328.
15. Liu, W.Y.; Xu, Y.N. Subway fault association rule mining based on improved Apriori algorithm. J. Ordnance Equip. Eng. 2021, 42, 210–215.
16. Wang, W.; Chu, Z.N.; Han, Y. An improved Apriori algorithm based on MapReduce for Apriori association rules with antecedent and consequent constraints. J. Xinyang Norm. Univ. Nat. Sci. Ed. 2020, 33, 448–453.
17. Gu, W.H.; Zhang, L.B.; Zhou, J. Technical limitation boundary and optimization strategy of fair use of digital resources-text analysis based on digital resource license agreement. Inf. Stud. Theory Appl. 2023, 46, 51–59.
18. Harding, J.A. Data Mining in Manufacturing: A Review. J. Manuf. Sci. Eng. 2006, 128, 969–976.
19. Choudhary, A.K.; Harding, J.A.; Tiwari, M.K. Data mining in manufacturing: A review based on the kind of knowledge. J. Intell. Manuf. 2009, 20, 501–521.
20. Liu, J.; Yu, Y.Z.; Han, Z.X. Research on a diagnosis method based on fault mechanism and mathematical fusion. Pump Technol. 2023, 269, 24–28.
21. Xu, E.B.; Zhu, Q.W.; Zhang, J.F. Research on urban open space accessibility at home and abroad: Theme context and frontier trend. World Reg. Stud. 2023, 34, 1–16.
22. Abdullah, Z.; Herawan, T.; Deris, M. Detecting Critical Least Association Rules in Medical Databases. Int. J. Mod. Phys. Conf. Ser. 2012, 9, 464–479.
23. Zhou, Y.; Xu, B.; Xu, J. Compositional recurrent neural networks for chinese short text classification. In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, USA, 13–16 October 2016.
24. Rakesh, K. Forest cover dynamics analysis and prediction modeling using logistic regression model. Ecol. Indic. 2014, 45, 444–455.
25. Zhang, L.; Wei, X.Y.; Xu, S. Improved FP-Growth algorithm for database partition for big data. J. Nanchang Univ. Nat. Sci. 2022, 46, 570–576.
26. Chen, G.Y.; Cheng, H.Z.; Su, H. The control of information input in pharmaceutical education resource database based on data mining. Microcomput. Appl. 2021, 37, 5–8.
27. Zhang, Y.T.; Han, Q.X. Research on open network military intelligence analysis based on data mining. J. Intell. 2016, 35, 12–15.
28. Jia, K.B.; Li, H.J.; Yuan, Y. Application of data mining based on Apriori algorithm in mobile medical system. J. Beijing Univ. Technol. 2017, 43, 394–401+322.
29. Babovic, V. Data mining in hydrology. Hydrol. Process. 2005, 19, 1511–1515.
30. Xu, L.; Yao, L. Applications of data mining in hydrology. In Proceedings of the IEEE International Conference on Data Mining, San Jose, CA, USA, 29 November–2 December 2001.
31. Trafalis, T.B.; Richman, M.B.; White, A. Data mining techniques for improved WSR-88D rainfall estimation. Comput. Ind. Eng. 2002, 43, 775–786.
32. Zhu, Z.H.; Zheng, D.J.; Zhang, X.H. Application of chaos-data mining model in dam safety prediction. Appl. Chaos-Data Min. Model Dam Saf. Predict. 2007, 114, 34–37.
33. Xian, Y.; Wu, Z.R.; Fu, Z.M. Application of data mining technology in dam safety decision support system. Water Power 2003, 29, 20–23.
34. Chen, S.; Xi, J.B.; Wang, J.P. Association rule mining of construction safety hazard in hydropower project. China Saf. Sci. J. 2021, 31, 75–82.
35. Wang, Y.X.; Yan, Z.G.; Fan, J.D. Coal mine dual prevention information system based on Apriori algorithm. J. Mine Autom. 2020, 46, 92–98.
36. Singh, T.; Sethi, M. Sandwich-Apriori: A combine approach of Apriori and Reverse-Apriori. In Proceedings of the 12 IEEE Int C Elect Energy Env Communications Computer Control, New Delhi, India, 17–20 December 2015.
37. Verma, N.K.; Kumar, S.; Kumar, M. An Alternate Approach to Improve Access Time for Defining Frequent Item Set Through ’A-Apriori’ in Textual Data Set. In Proceedings of the 5th IEEE International Conference on Recent Advances and Innovations in Engineering (IEEE-ICRAIE), Poornima Coll Engn, ELECTR NETWORK. Jaipur, India, 1–3 December 2020.
38. Chen, W.X.; Qu, R.; Sun, Y.G. Intermittent fault prediction of ground air conditioning based on improved Apriori algorithm. J. Comput. Appl. 2016, 36, 3505–3510.
39. Xie, Z.M.; Wang, P. Parallel matrix Apriori algorithm based on MapReduce architecture. Appl. Res. Comput. 2017, 34, 401–404.
40. Xin, J.; Sun, Q.; Wang, Z. Research on contribution rate analysis and technical index conversion method of military helicopter system based on mission scenario. Sci. Technol. Rev. 2018, 36, 79–88.
41. Zhao, D.; Zhang, Z.W. The impact of task scenarios on human-computer interaction requirements. J. Beijing Univ. Civ. Eng. Archit. 2021, 37, 86–92.
Figure 1. Water conservancy project construction hidden trouble investigation flow chart.
Figure 2. Apriori algorithm flowchart of connection, pruning and calculation.
Figure 3. Improved flow chart of the Apriori algorithm based on transaction data characteristics.
Figure 4. Example diagram of association rule mapping based on task-driven.
Figure 5. The traditional Apriori algorithm running time comparison.
Figure 6. The running time comparison of AS-Apriori algorithm.
Figure 7. The running time comparison of Apriori-MMR algorithm.
Table 1. Project engineering division table.

| Unit Engineering | Divisional Work | Element Work |
| --- | --- | --- |
| sluice engineering | Gate chamber section | Foundation pit excavation and treatment |
| | superstructure | Gate floor reinforced concrete pouring |
| | Upstream and downstream connection section | Concrete pouring of pier and bank wall |
| | Traffic room | Left and right bent concrete pouring |
| | Gate house | Wing wall roofing, masonry |

Table 2. Part of the project hidden danger records table.

| Check Date | Work Class | Hidden Danger Description | Work Field |
| --- | --- | --- | --- |
| 27 June 2019 | Beware of foundation construction | When the excavation of embankment foundation is deep, there is no safety technical measure to prevent slope collapse and landslide. | Section 2 in the construction zone of bid section 1 |
| 27 June 2019 | Channel slope excavation | The measures of interception and drainage are not done well to prevent the influence of surface water and groundwater on the slope. | Gangtou River Improvement bid section 1 |
| 30 June 2019 | Earthwork excavation of sluice | The excavation of foundation pit of buildings should not follow the construction principle of dewatering first and then excavation. | Section 1 in the construction zone of bid section 2 |
| 30 June 2019 | Erection of mud pipeline | The pipe foundation of land sludge discharge pipeline is unstable and uneven. The connection of pipe fittings should not be fastened and sealed. Mud leakage occurs during construction. | Section 1 in the construction zone of bid section 2 |

Table 3. Structured data example of hidden danger records.

| Work Class | Hidden Danger Subject | Hidden Danger Description | Work Field |
| --- | --- | --- | --- |
| Beware of foundation construction | Beware of foundation | No safety measures were taken and the slope collapsed. | Section 2 in the construction zone of bid section 1 |
| Beware of foundation construction | transport facilities | Overloading, does not meet the load requirements. | Section 3 in the construction zone of bid section 1 |
| Beware of foundation construction | Heavy tamping | No safety measures are in place. | Section 2 in the construction zone of bid section 1 |
| Channel slope excavation | channel slope | No interception and drainage measures have been formulated. | Gangtou River Improvement bid section 1 |
| Earthwork excavation of sluice | Earthwork of building foundation pit | Failure to follow construction principles. | Section 1 in the construction zone of bid section 2 |
| Earthwork excavation of sluice | Pile foundation equipment operators | No operational training conducted. | null |
| Erection of mud pipeline | Land mud pipeline | The erection is not stable and the connection is not closed. | Section 1 in the construction zone of bid section 2 |

Table 4. Feature classification and corresponding coding of some hidden danger records.

| Work Class | Code | Work Field | Code | Hidden Danger Subject | Code | Hazard Description | Code |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Beware of foundation construction | 1001 | Section 2 in the construction zone of bid section 1 | 2001 | Beware of foundation | 3001 | Not formulated | 4001 |
| Beware of foundation construction | 1002 | Section 3 in the construction zone of bid section 1 | 2002 | transport facilities | 3002 | Not followed | 4002 |
| Flood control emergency construction | 1003 | Section 1 in the construction zone of bid section 2 | 2003 | Heavy tamping | 3003 | Not set | 4003 |
| Channel slope excavation | 1004 | Section 3 in the construction zone of bid section 2 | 2004 | channel slope | 3004 | Not carried out | 4004 |
| Earthwork excavation of sluice | 1005 | Gangtou River Improvement bid section 1 | 2005 | Earthwork of foundation pit | 3005 | operation training | 4005 |

Table 5. Example of hidden danger records.

| Num | Work Class | Work Field | Hidden Danger Subject | Hazard Description |
| --- | --- | --- | --- | --- |
| 1 | 1001 | 2001 | 3001 | 4001, 4006, 4008, 4009 |
| 2 | 1001 | 2002 | 3002 | 4012 |
| 3 | 1001 | 2001 | 3003 | 4001, 4006 |
| 4 | 1004 | 2005 | 3004 | 4001, 4003, 4006 |
| 5 | 1005 | 2003 | 3005, 3006, 3008, 3009 | 4002, 4007, 4004, 4005 |
| 6 | 1006 | 2003 | 3007 | 4010, 4011 |

Table 6. Some hidden danger records in the hidden danger database.

| Num | Work Class | Work Field | Hidden Danger Subject | Hazard Description |
| --- | --- | --- | --- | --- |
| 1 | 1001 | 2001 | 3001 | 4001, 4006, 4008, 4009 |
| 2 | 1001 | 2002 | 3002 | 4012 |
| 3 | 1001 | 2001 | 3003 | 4001, 4006 |
| 4 | 1004 | 2005 | 3004 | 4001, 4003, 4006 |
| 5 | 1005 | 2003 | 3005, 3006, 3008, 3009 | 4002, 4007, 4004, 4005 |
| 6 | 1006 | 2003 | 3007 | 4010, 4011 |
| 7 | 1011 | 2005 | 3008, 3009 | 4021, 4032, 4044, 4100, 4110 |
| 8 | 1011 | 2018 | 3012, 3023 | 4001, 4018, 4056, 4102 |
| 9 | 1016 | | 3021, 3023 | 4028, 4032, 4054, 4056 |

Table 7. Comparison of generated association rules.

| Rules Generated by Traditional Algorithm | Confidence | Rules Generated by Improved Algorithm | Confidence |
| --- | --- | --- | --- |
| Embankment foundation construction → 2nd bid construction area 2nd section | 0.812 | Two standard construction area 2 section → mud pipeline, not firm | 0.816 |
| Embankment foundation construction → no safety measures are formulated | 0.886 | Channel construction → Operators, protective facilities, damage, not developed | 0.822 |
| Not formulated → Safety measures | 0.923 | One standard construction area 2 section, earth and stone filling → ventilation pipe, fulcrum, not solid | 0.827 |
| Foundation grouting construction, pile machine equipment → safety measures | 0.813 | Operator → not developed, safety measures, violations | 0.928 |
| Earthwork filling → pressure imbalance, the first standard construction area 2 section | 0.866 | One standard construction area 2 section, earth and stone filling → ventilation pipe, fulcrum, not solid | 0.827 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tao, F.; Pi, Y.; Zhang, M.; Yuan, C.; Deng, M. Hidden Danger Association Mining for Water Conservancy Projects Based on Task Scenario-Driven. Water 2023, 15, 2814. https://doi.org/10.3390/w15152814

AMA Style

Tao F, Pi Y, Zhang M, Yuan C, Deng M. Hidden Danger Association Mining for Water Conservancy Projects Based on Task Scenario-Driven. Water. 2023; 15(15):2814. https://doi.org/10.3390/w15152814

Chicago/Turabian Style

Tao, Feifei, Yanling Pi, Meng Zhang, Chi Yuan, and Menghua Deng. 2023. "Hidden Danger Association Mining for Water Conservancy Projects Based on Task Scenario-Driven" Water 15, no. 15: 2814. https://doi.org/10.3390/w15152814
