Hazard analysis: A deep learning and text mining framework for accident prevention
Introduction
Construction-site accidents in China, particularly large-scale accidents, have been increasing over the last five years [1],[2]. There is therefore a need to improve safety performance and put in place mechanisms to control and mitigate hazards [3]. Identifying and effectively monitoring hazards enables managers to put in place strategies to ensure people's safety. Traditionally, however, safety monitoring has relied on regular inspections and, in some instances, video surveillance [4].
During site inspections, the hazards that are identified are manually recorded and stored in an unstructured or semi-structured text format. In the case of large-scale infrastructure projects, which can take years to construct, masses of hazard data that accord with regulations and expert judgement will be collected [5], [6]. Yet sieving through and analysing the data to determine patterns and trends in order to improve safety is difficult due to its format.
Drawing on text mining and deep learning technologies, this study presents a novel framework for analysing hazard records automatically. The developed framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN) algorithm; (3) the production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords by Word Cloud (WC) technology to provide a visual overview of hazard records. The proposed framework is validated by analysing hazard records collected from a large-scale transport infrastructure project that is being constructed in Wuhan, China. The research commences with a brief review of the text mining and deep learning literature to provide a contextual backdrop for the framework that is presented.
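Steps (3) and (4) both rest on simple counting: a word co-occurrence network weights each pair of words by the number of records in which both appear, and a word cloud scales each keyword by its frequency. The following is a minimal sketch of that counting, assuming the records have already been tokenised. The sample `records` and the function names `keyword_frequencies` and `cooccurrence_edges` are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenised hazard records (invented for illustration).
records = [
    ["scaffold", "guardrail", "missing"],
    ["guardrail", "missing", "edge"],
    ["scaffold", "plank", "loose"],
]


def keyword_frequencies(docs):
    """Count how often each word occurs across all records (word-cloud input)."""
    return Counter(word for doc in docs for word in doc)


def cooccurrence_edges(docs):
    """Count, for each unordered word pair, the records containing both words."""
    edges = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair is counted once per record
        # and always stored in the same (alphabetical) order.
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges


freq = keyword_frequencies(records)
edges = cooccurrence_edges(records)

# Print the heaviest edges and the most frequent keywords.
print(edges.most_common(3))
print(freq.most_common(3))
```

The edge counts can be passed to any graph library as edge weights, and the frequency counts to any word-cloud renderer; neither choice is specified in the text above.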
Section snippets
Related research
Safety information in construction is collected and stored in various formats (e.g., data, images, and text). Digital and mobile technologies are often used by site supervisors and engineers to capture hazards when they perform safety inspections. Yet the data tend to be unstructured and, while useful, are difficult to analyse for insights into emerging patterns and nuances of hazards [7]. Recognising this problem, it has been averred that text mining techniques can be used to examine
Research approach
In making headway to prevent accidents in construction, we aim to develop a deep learning and text mining framework to analyse hazards automatically. The framework provides managers with the capability to quickly analyse a large number of hazard records and put in place mechanisms to prevent an adverse safety incident and therefore support an error management culture [35],[36].
In support of our research aim, and akin to previous studies that have been used to develop CNN frameworks to ensure
Design and development of a deep learning and text mining framework
Our proposed framework, presented in Fig. 2, comprises a four-step modelling process:
1. An LDA algorithm is employed to determine hazard topics. The LDA is used to acquire and cluster scenario-specific hazards over a period of time.
2. Based on the hazard sub-categories extracted from the LDA topics, a CNN algorithm is trained to extract text features and automatically classify hazard records without manual feature processing. The CNN model is used to automatically classify hazards
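To make step 1 concrete, the following is a minimal sketch of LDA topic inference by collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation: the tokenised sample records, the hyperparameters `alpha` and `beta`, the number of topics, and the function name `lda_gibbs` are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and
    up to three highest-count words per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # tokens assigned to topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
    return theta, top_words


# Hypothetical tokenised hazard records (invented for illustration).
docs = [
    ["scaffold", "guardrail", "missing", "scaffold"],
    ["guardrail", "edge", "missing"],
    ["cable", "exposed", "socket"],
    ["socket", "cable", "water"],
]
theta, top_words = lda_gibbs(docs, n_topics=2)
print(top_words)
```

The per-record proportions in `theta` could then supply the sub-category labels on which a classifier such as the CNN in step 2 is trained; in practice an established topic-modelling library would normally replace this hand-written sampler.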
Demonstration
The construction engineering arm of the Wuhan Metro Group Co., Ltd (China) has in place a digital hazard reporting system. A cellular phone application has been developed to enable site engineers and the workforce to report hazards in a text format in real-time while working on its various sites. The reporting system enables a description of a hazard that is encountered to be formulated in free text, which identifies the event, its proximal causation and nuances. The text descriptions that are
LDA model generated topic assignment
To demonstrate the effectiveness of the developed LDA model, an evaluation approach is needed to compare the LDA-generated topic assignments with those of the experts. Shukui [44], for example, identified nine main categories of hazards that occurred during the construction of metro lines. The experts were invited to compare the topics derived from the LDA with the categories identified by Shukui [44]. The experts found that all the 34 hazard topics obtained from performing the LDA
Discussion
Traditionally, deciphering hazard records has been a manual and tedious process, rendering it difficult to identify recurring patterns that may be jeopardizing the safety of a project’s workforce. However, digital technologies can play a significant role in helping to improve the management of safety in construction. A pertinent example of how technologies can be used to analyse the morass of unstructured and semi-structured safety data that is collected during construction is our
Conclusion
A novel and robust framework for identifying and classifying hazards automatically is presented, which safety managers can use to drive evidence-based decision-making in their projects. The framework combines deep learning and text mining and incorporates: (1) topic identification and hazard classification; (2) a word co-occurrence network; and (3) the dynamic evolution of hazards over time. The framework is tested and evaluated for the topic identification and automatic classification, and visual
Declaration of Competing Interest
The authors declare that they have no conflicts of interest in relation to this work.
Acknowledgments
The authors would like to acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 51878311, No. 71732001).
References (44)
- et al. An accident causation analysis and taxonomy (ACAT) model of complex industrial system from both system safety and control theory perspectives. Saf. Sci. (2017)
- Systemic failures: challenges and opportunities in risk management in complex systems. AIChE J. (2010)
- et al. Heinrich's pyramid and occupational safety: A statistical validation methodology. Saf. Sci. (2018)
- Therapeutic safety monitoring: what to look for and when to look for it. Epilepsia (2010)
- et al. A rule-based semantic approach for automated regulatory compliance in the construction sector. Expert Syst. Appl. (2015)
- et al. Projection-recognition-projection method for automatic object recognition and registration for dynamic heavy equipment operations. J. Comput. Civil. Eng. (2014)
- et al. A social network system for sharing construction safety and health knowledge. Autom. Constr. (2014)
- et al. Application of 4D visualization technology for safety management in metro construction. Autom. Constr. (2013)
- et al. Assessing the influence of automated data analytics on cost and schedule performance. Proc. Eng. (2015)
- et al. Information visualization for emergency management: a systematic mapping study. Expert Syst. Appl. (2016)
- Visualizing risk related information for work orders through the planning process of maintenance activities. Saf. Sci.
- Automatic clustering of construction project documents based on textual similarity. Autom. Constr.
- Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Autom. Constr.
- Automated content analysis for construction safety: a natural language processing system to extract precursors and outcomes from unstructured injury reports. Autom. Constr.
- Incorporating context-relevant concepts into convolutional neural networks for short text classification. Neurocomputing
- Document clustering and cluster topic extraction in multilingual corpora. ICDM
- Text classification method based on self-training and LDA topic models. Expert Syst. Appl.
- TM-LDA: efficient online modeling of the latent topic transitions in social media. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
- Computer-assisted text analysis for comparative politics. Polit. Anal.
- Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence? J. Assoc. Inform. Sci. Technol.
- How to analyze political attention with minimal assumptions and costs. Am. J. Polit. Sci.
- Topic detection from microblogs using T-LDA and perplexity. Asia-Pacific Software Engineering Conference Workshops