Hazard analysis: A deep learning and text mining framework for accident prevention

https://doi.org/10.1016/j.aei.2020.101152

Abstract

Learning from past accidents is pivotal for improving safety in construction. However, hazard records are typically documented and stored as unstructured or semi-structured free text, which makes such data difficult to analyse. This study presents a novel and robust framework that combines deep learning and text mining technologies to analyse hazard records automatically. The framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN) algorithm; (3) the production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords by Word Cloud (WC) technology to provide a visual overview of hazard records. The proposed framework is validated by analysing hazard records collected from a large-scale transport infrastructure project. It is envisaged that the use of the framework can provide managers with new insights and knowledge to better ensure positive safety outcomes in projects. The contributions of this research are threefold: (1) it is demonstrated that the process of analysing hazard records can be automated by combining deep learning and text mining; (2) hazards can be visualized using a systematic and data-driven process; and (3) hazard topics can be generated and classified automatically over specific time periods, enabling managers to understand their patterns of manifestation and therefore put in place strategies to prevent them from reoccurring.

Introduction

Accidents on construction sites in China, particularly large-scale sites, have been increasing over the last five years [1], [2]. There is therefore a need to improve safety performance and put in place mechanisms to control and mitigate hazards [3]. The identification and effective monitoring of hazards provide managers with the ability to put in place strategies to ensure people's safety. Traditionally, however, safety monitoring has relied on regular inspections and, in some instances, the use of video surveillance [4].

During site inspections, the hazards that are identified are manually recorded and stored in an unstructured or semi-structured text format. In the case of large-scale infrastructure projects, which can take years to construct, masses of hazard data that accord with regulations and expert judgement will be collected [5], [6]. Yet sifting through and analysing the data to determine patterns and trends in order to improve safety is a difficult process due to its format.

Drawing on text mining and deep learning technologies, this study presents a novel framework that provides the ability to analyse hazard records automatically. The developed framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN) algorithm; (3) the production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords by Word Cloud (WC) technology to provide a visual overview of hazard records. The proposed framework is validated by analysing hazard records collected from a large-scale transport infrastructure project that is being constructed in Wuhan, China. The research commences with a brief review of the text mining and deep learning literature to provide a contextual backdrop for the framework that is presented.
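Steps (3) and (4) of the approach described above rest on simple counting, which can be illustrated with a minimal sketch. The code below assumes that hazard records have already been tokenized into keywords and that two keywords co-occur when they appear in the same record; the source does not state the co-occurrence window, so this is an assumption. The function names and the toy English records are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from itertools import combinations


def keyword_counts(records):
    """Count how often each keyword appears across all records.

    These frequencies are the kind of input a word cloud needs,
    where a word's display size typically scales with its count.
    """
    return Counter(token for record in records for token in record)


def cooccurrence_edges(records):
    """Build weighted edges for a word co-occurrence network.

    Assumption: two keywords co-occur if they appear in the same
    record. Each unordered pair is counted once per record, and the
    pair is stored in sorted order so (a, b) and (b, a) are merged.
    """
    edges = Counter()
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical, already-tokenized hazard records (illustration only).
records = [
    ["scaffold", "guardrail", "missing"],
    ["guardrail", "missing", "edge"],
    ["cable", "exposed", "water"],
    ["scaffold", "guardrail", "loose"],
]

counts = keyword_counts(records)
edges = cooccurrence_edges(records)

print(counts.most_common(3))
print(edges.most_common(3))
```

In this sketch the edge weights can be read as the strength of the link between two hazard keywords, and the keyword counts can be passed to any word cloud renderer.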

Section snippets

Related research

Safety information in construction is collected and stored in various formats (e.g., data, images, and text). Digital and mobile technologies are often used by site supervisors and engineers to capture hazards when they perform safety inspections. Yet the data tends to be unstructured and, while useful, is difficult to analyse for insights into emerging patterns and nuances of hazards [7]. Recognising this problem, it has been averred that text mining techniques can be used to examine

Research approach

To make headway in preventing accidents in construction, we aim to develop a deep learning and text mining framework to analyse hazards automatically. The framework provides managers with the capability to quickly analyse a large number of hazard records and put in place mechanisms to prevent adverse safety incidents, thereby supporting an error management culture [35], [36].

In support of our research aim, and akin to previous studies that have developed CNN frameworks to ensure

Design and development of a deep learning and text mining framework

Our proposed framework, presented in Fig. 2, comprises a four-step modelling process:

  1. An LDA algorithm is employed to determine hazard topics. The LDA is used to acquire and cluster scenario-specific hazards over a period of time.

  2. Based on the hazard sub-categories extracted from the topics of LDA, a CNN algorithm is trained to extract text features and automatically classify hazard records without manual feature processing. The CNN model is used to automatically classify hazards
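To illustrate how a CNN can extract text features without manual feature processing, the following is a minimal sketch of the forward pass of a one-layer text CNN: word embeddings, a one-dimensional convolution with ReLU, max-over-time pooling, and a softmax over classes. It is not the architecture used in the study, which is not specified in this excerpt. The weights are random and untrained, and the vocabulary, dimensions, and function names are hypothetical; in practice such a model would be trained on labelled hazard records with a deep learning library.

```python
import math
import random


def text_cnn_forward(tokens, embed, filters, biases, out_w, out_b):
    """Forward pass: embedding lookup -> 1D convolution + ReLU ->
    max-over-time pooling -> softmax over hazard classes."""
    dim = len(next(iter(embed.values())))
    # Unknown words map to a zero vector (an assumption of this sketch).
    x = [embed.get(t, [0.0] * dim) for t in tokens]
    width = len(filters[0])  # each filter spans `width` consecutive tokens
    while len(x) < width:    # zero-pad very short records
        x.append([0.0] * dim)

    pooled = []
    for filt, bias in zip(filters, biases):
        best = 0.0  # ReLU floors activations at 0, so the pooled max is >= 0
        for start in range(len(x) - width + 1):
            s = bias + sum(
                filt[j][d] * x[start + j][d]
                for j in range(width)
                for d in range(dim)
            )
            best = max(best, s)  # ReLU and max pooling combined
        pooled.append(best)

    # Fully connected layer followed by a numerically stable softmax.
    logits = [
        b + sum(w * p for w, p in zip(row, pooled))
        for row, b in zip(out_w, out_b)
    ]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy setup with random, untrained weights.
rng = random.Random(0)
vocab = ["scaffold", "guardrail", "missing", "cable", "exposed", "water"]
DIM, WIDTH, N_FILTERS, N_CLASSES = 4, 2, 3, 2


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


embed = {w: rand_vec(DIM) for w in vocab}
filters = [[rand_vec(DIM) for _ in range(WIDTH)] for _ in range(N_FILTERS)]
biases = rand_vec(N_FILTERS)
out_w = [rand_vec(N_FILTERS) for _ in range(N_CLASSES)]
out_b = rand_vec(N_CLASSES)

probs = text_cnn_forward(
    ["scaffold", "guardrail", "missing"], embed, filters, biases, out_w, out_b
)
print(probs)  # class probabilities; meaningless until the weights are trained
```

Each filter acts as a learned detector for a short word sequence, and max-over-time pooling keeps its strongest response regardless of where in the record it occurs.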

Demonstration

The construction engineering arm of the Wuhan Metro Group Co., Ltd (China) has in place a digital hazard reporting system. A cellular phone application has been developed to enable site engineers and the workforce to report hazards in a text format in real-time while working on its various sites. The reporting system enables a description of a hazard that is encountered to be formulated in free text, which identifies the event, its proximal causation and nuances. The text descriptions that are

LDA model generated topic assignment

To demonstrate the effectiveness of the developed LDA model, an evaluation approach is needed that compares the LDA-generated topic assignments with those of the experts. Shukui [44], for example, identified nine main categories of hazards that occurred during the construction of metro lines. The experts were invited to compare the topics derived from the LDA with the categories identified by Shukui [44]. The experts found that all the 34 hazard topics obtained from performing the LDA
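For readers unfamiliar with how LDA yields topic assignments of the kind discussed here, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the inference method, the hyperparameters alpha and beta, the number of topics, and the toy English records are all hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling. `docs` is a list of token lists.

    Returns (phi, theta): per-topic word distributions and
    per-document topic distributions, both smoothed by the priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + v * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    phi = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + v * beta) for w in vocab}
        for t in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


# Hypothetical, already-tokenized hazard records (illustration only).
records = [
    ["scaffold", "guardrail", "missing", "scaffold"],
    ["guardrail", "loose", "scaffold", "edge"],
    ["cable", "exposed", "water", "cable"],
    ["socket", "water", "exposed", "cable"],
]
phi, theta = lda_gibbs(records, num_topics=2)

# Dominant topic per record: the assignment experts could review.
dominant = [max(range(len(row)), key=row.__getitem__) for row in theta]
for t, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {t}: {top}")
print(dominant)
```

The top words per topic are what a reviewer would inspect when matching machine-generated topics to predefined hazard categories. Topic indices are arbitrary labels, so any comparison should be made on topic content rather than on the index.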

Discussion

Traditionally, deciphering hazard records has been a manual and tedious process, making it difficult to identify recurring patterns that may be jeopardizing the safety of a project’s workforce. However, digital technologies can play a significant role in helping to improve the management of safety in construction. A pertinent example of how technologies can be used to analyse the morass of unstructured and semi-structured safety data that is collected during construction is our

Conclusion

A novel and robust framework for identifying and classifying hazards automatically is presented, which safety managers can use to drive evidence-based decision-making in their projects. The framework combines deep learning and text mining and incorporates: (1) topic identification and hazard classification; (2) a word co-occurrence network; and (3) hazard dynamic evolution over time. The framework is tested and evaluated for the topic identification and automatic classification, and visual

Declaration of Competing Interest

The authors declare that they have no conflicts of interest relating to this work.

Acknowledgments

The authors would like to acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 51878311, No. 71732001).

References (44)

  • W. Li et al. An accident causation analysis and taxonomy (ACAT) model of complex industrial system from both system safety and control theory perspectives. Saf. Sci. (2017)
  • V. Venkatasubramanian. Systemic failures: challenges and opportunities in risk management in complex systems. AIChE J. (2010)
  • P. Marshall et al. Heinrich's pyramid and occupational safety: A statistical validation methodology. Saf. Sci. (2018)
  • C.L. Harden. Therapeutic safety monitoring: what to look for and when to look for it. Epilepsia (2010)
  • T.H. Beach et al. A rule-based semantic approach for automated regulatory compliance in the construction sector. Expert Syst. Appl. (2015)
  • Y.K. Cho et al. Projection-recognition-projection method for automatic object recognition and registration for dynamic heavy equipment operations. J. Comput. Civil. Eng. (2014)
  • Q.T. Le et al. A social network system for sharing construction safety and health knowledge. Autom. Constr. (2014)
  • Y. Zhou et al. Application of 4D visualization technology for safety management in metro construction. Autom. Constr. (2013)
  • A. Abbaszadegan et al. Assessing the influence of automated data analytics on cost and schedule performance. Proc. Eng. (2015)
  • F. Dusse et al. Information visualization for emergency management: a systematic mapping study. Expert Syst. Appl. (2016)
  • S. Sarshar et al. Visualizing risk related information for work orders through the planning process of maintenance activities. Saf. Sci. (2018)
  • M.A. Qady et al. Automatic clustering of construction project documents based on textual similarity. Autom. Constr. (2014)
  • T.P. Williams et al. Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Autom. Constr. (2014)
  • J.P. Tixier et al. Automated content analysis for construction safety: a natural language processing system to extract precursors and outcomes from unstructured injury reports. Autom. Constr. (2016)
  • J. Xu et al. Incorporating context-relevant concepts into convolutional neural networks for short text classification. Neurocomputing (2019)
  • J.F.D. Silva et al. Document clustering and cluster topic extraction in multilingual corpora. ICDM (2001)
  • M. Pavlinek et al. Text classification method based on self-training and LDA topic models. Expert Syst. Appl. (2017)
  • W. Yu et al. TM-LDA: efficient online modeling of the latent topic transitions in social media. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2012)
  • C. Lucas et al. Computer-assisted text analysis for comparative politics. Polit. Anal. (2017)
  • E.P.S. Baumer et al. Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence? J. Assoc. Inform. Sci. Technol. (2017)
  • K.M. Quinn et al. How to analyze political attention with minimal assumptions and costs. Am. J. Polit. Sci. (2010)
  • H. Ling et al. Topic detection from microblogs using T-LDA and perplexity. Asia-Pacific Software Engineering Conference Workshops (2017)