Hazard analysis: A deep learning and text mining framework for accident prevention

doi:10.1016/j.aei.2020.101152

Advanced Engineering Informatics

Volume 46, October 2020, 101152

https://doi.org/10.1016/j.aei.2020.101152

Abstract

Learning from past accidents is pivotal for improving safety in construction. However, hazard records are typically documented and stored as unstructured or semi-structured free text, which makes such data difficult to analyse. This study presents a novel and robust framework that combines deep learning and text mining technologies to analyse hazard records automatically. The framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN) algorithm; (3) the production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords using Word Cloud (WC) technology to provide a visual overview of hazard records. The proposed framework is validated by analysing hazard records collected from a large-scale transport infrastructure project. It is envisaged that the framework can provide managers with new insights and knowledge to better ensure positive safety outcomes in projects. The contributions of this research are threefold: (1) it demonstrates that the analysis of hazard records can be automated by combining deep learning and text mining; (2) hazards can be visualized using a systematic and data-driven process; and (3) hazard topics are generated and classified automatically over specific time periods, enabling managers to understand their patterns of manifestation and therefore put in place strategies to prevent them from recurring.

Introduction

Accidents on construction sites in China, particularly large-scale ones, have been increasing over the last five years [1], [2]. There is therefore a need to improve safety performance and put in place mechanisms to control and mitigate hazards [3]. The identification and effective monitoring of hazards enable managers to put in place strategies to ensure people's safety. Traditionally, however, safety monitoring has relied on regular inspections and, in some instances, video surveillance [4].

During a site inspection, the hazards that are identified are manually recorded and stored in an unstructured or semi-structured text format. In the case of large-scale infrastructure projects, which can take years to construct, masses of hazard data that accord with regulations and expert judgement will be collected [5], [6]. Yet sifting through and analysing the data to determine patterns and trends in order to improve safety is difficult due to its format.

Drawing on text mining and deep learning technologies, this study presents a novel framework for analysing hazard records automatically. The developed framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN) algorithm; (3) the production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords using Word Cloud (WC) technology to provide a visual overview of hazard records. The proposed framework is validated by analysing hazard records collected from a large-scale transport infrastructure project that is being constructed in Wuhan, China. The research commences with a brief review of the text mining and deep learning literature to provide a contextual backdrop for the framework that is presented.
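To make steps (3) and (4) more concrete, the following is a minimal sketch of how word frequencies (the weights behind a word cloud) and word co-occurrence counts (the edge weights of a co-occurrence network) might be computed. It is an illustration only and not the authors' implementation: the function name `cooccurrence_counts`, the sample records, the stopword list, and the whitespace tokenizer are all assumptions. Real hazard records would need proper tokenization, including word segmentation if they are written in Chinese.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records, stopwords=frozenset()):
    """Return (word_counts, pair_counts) for a list of free-text records.

    word_counts: how often each word occurs overall (word cloud weights).
    pair_counts: in how many records each unordered word pair appears
                 together (co-occurrence network edge weights).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for text in records:
        # Naive whitespace tokenizer (an assumption made for illustration).
        tokens = [t for t in text.lower().split() if t not in stopwords]
        word_counts.update(tokens)
        # Distinct, sorted tokens so each pair is counted once per record
        # and always stored in the same order.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts


# Hypothetical hazard records, invented for this example.
records = [
    "scaffold guardrail missing",
    "guardrail missing near excavation edge",
    "exposed cable near scaffold",
]
STOPWORDS = frozenset({"near"})

word_counts, pair_counts = cooccurrence_counts(records, STOPWORDS)

# Strongest edges of the co-occurrence network.
for (a, b), n in pair_counts.most_common(3):
    print(f"{a} -- {b}: {n}")
```

In this sketch, pairs with high counts would become the heavily weighted edges of the network, while `word_counts` could be passed to any word cloud renderer as keyword weights.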

Section snippets

Related research

Safety information in construction is collected and stored in various formats (e.g., data, images, and text). Digital and mobile technologies are often used by site supervisors and engineers to capture hazards when they perform safety inspections. Yet the data tends to be unstructured; while useful, it is difficult to analyse and to draw insights from about emerging patterns and nuances of hazards [7]. Recognising this problem, it has been averred that text mining techniques can be used to examine

Research approach

In making headway to prevent accidents in construction, we aim to develop a deep learning and text mining framework to analyse hazards automatically. The framework enables managers to quickly analyse a large number of hazard records and put in place mechanisms to prevent adverse safety incidents, and therefore supports an error management culture [35], [36].

In support of our research aim, and akin to previous studies that have developed CNN frameworks to ensure

Design and development of a deep learning and text mining framework

Our proposed framework, presented in Fig. 2, comprises a four-step modelling process:

1. An LDA algorithm is employed to determine hazard topics. The LDA is used to acquire and cluster scenario-specific hazards over a period of time.
2. Based on the hazard sub-categories extracted from the LDA topics, a CNN algorithm is trained to extract text features and automatically classify hazard records without manual feature processing. The CNN model is used to automatically classify hazards
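The first step in the list above, topic discovery with LDA, can be sketched as follows. The snippet does not state which inference method or software the authors used, so this example assumes collapsed Gibbs sampling written in plain Python. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of topics, and the tokenized sample records are all hypothetical and chosen only to keep the illustration small.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. docs is a list of token lists.

    Returns (theta, top_words): per-document topic proportions and the
    three most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]   # word v in topic k
    topic_total = [0] * n_topics                            # tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | everything else) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(n_vocab), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(n_topics)
    ]
    return theta, top_words


# Hypothetical, already tokenized hazard records.
docs = [
    ["scaffold", "guardrail", "missing", "scaffold"],
    ["guardrail", "missing", "edge", "scaffold"],
    ["cable", "exposed", "socket", "cable"],
    ["socket", "exposed", "cable", "wet"],
]
theta, top_words = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

In a pipeline of the kind described above, the dominant topic of each record (the largest entry in its row of `theta`) could then serve as a candidate label when preparing training data for the classifier in the second step. The number of topics would in practice be chosen with a measure such as perplexity or by expert review.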

Demonstration

The construction engineering arm of the Wuhan Metro Group Co., Ltd (China) has in place a digital hazard reporting system. A mobile phone application has been developed to enable site engineers and the workforce to report hazards in a text format in real time while working on its various sites. The reporting system allows a hazard that is encountered to be described in free text, identifying the event, its proximal causation and nuances. The text descriptions that are

LDA model generated topic assignment

To demonstrate the effectiveness of the developed LDA model, a sound evaluation approach is needed to compare the LDA-generated topic assignments with those of experts. Shukui [44], for example, identified nine main categories of hazards that occurred during the construction of metro lines. Experts were invited to compare the topics derived from the LDA with the categories identified by Shukui [44]. The experts found that all the 34 hazard topics obtained from performing the LDA

Discussion

Traditionally, deciphering hazard records has been a manual and tedious process, making it difficult to identify recurring patterns that may be jeopardizing the safety of a project’s workforce. However, digital technologies can play a significant role in helping to improve the management of safety in construction. A pertinent example of how technologies can be used to analyse the morass of unstructured and semi-structured safety data that is collected during construction is our

Conclusion

A novel and robust framework for identifying and classifying hazards automatically is presented, which safety managers can use to drive evidence-based decision-making in their projects. The framework combines deep learning and text mining and incorporates: (1) topic identification and hazard classification; (2) a word co-occurrence network; and (3) the dynamic evolution of hazards over time. The framework is tested and evaluated for the topic identification and automatic classification, and visual

Declaration of Competing Interest

The authors declare that they have no conflicts of interest related to this work.

Acknowledgments

The authors would like to acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 51878311, No. 71732001).

References (44)

W. Li et al. An accident causation analysis and taxonomy (ACAT) model of complex industrial system from both system safety and control theory perspectives. Saf. Sci. (2017)
V. Venkatasubramanian. Systemic failures: challenges and opportunities in risk management in complex systems. AIChE J. (2010)
P. Marshall et al. Heinrich's pyramid and occupational safety: A statistical validation methodology. Saf. Sci. (2018)
C.L. Harden. Therapeutic safety monitoring: what to look for and when to look for it. Epilepsia (2010)
T.H. Beach et al. A rule-based semantic approach for automated regulatory compliance in the construction sector. Expert Syst. Appl. (2015)
Y.K. Cho et al. Projection-recognition-projection method for automatic object recognition and registration for dynamic heavy equipment operations. J. Comput. Civil. Eng. (2014)
Q.T. Le et al. A social network system for sharing construction safety and health knowledge. Autom. Constr. (2014)
Y. Zhou et al. Application of 4D visualization technology for safety management in metro construction. Autom. Constr. (2013)
A. Abbaszadegan et al. Assessing the influence of automated data analytics on cost and schedule performance. Proc. Eng. (2015)
F. Dusse et al. Information visualization for emergency management: a systematic mapping study. Expert Syst. Appl. (2016)
S. Sarshar et al. Visualizing risk related information for work orders through the planning process of maintenance activities. Saf. Sci. (2018)
M.A. Qady et al. Automatic clustering of construction project documents based on textual similarity. Autom. Constr. (2014)
T.P. Williams et al. Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Autom. Constr. (2014)
J.P. Tixier et al. Automated content analysis for construction safety: a natural language processing system to extract precursors and outcomes from unstructured injury reports. Autom. Constr. (2016)
J. Xu et al. Incorporating context-relevant concepts into convolutional neural networks for short text classification. Neurocomputing (2019)
J.F.D. Silva et al. Document clustering and cluster topic extraction in multilingual corpora. ICDM (2001)
M. Pavlinek et al. Text classification method based on self-training and LDA topic models. Expert Syst. Appl. (2017)
W. Yu et al. TM-LDA: efficient online modeling of the latent topic transitions in social media. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2012)
C. Lucas et al. Computer-assisted text analysis for comparative politics. Polit. Analy. (2017)
E.P.S. Baumer et al. Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence? J. Assoc. Inform. Sci. Technol. (2017)
K.M. Quinn et al. How to analyze political attention with minimal assumptions and costs. Am. J. Polit. Sci. (2010)
H. Ling et al. Topic detection from microblogs using T-LDA and perplexity. Asia-Pacific Software Engineering Conference Workshops (2017)

