Deep learning and network analysis: Classifying and visualizing accident narratives in construction

https://doi.org/10.1016/j.autcon.2020.103089

Highlights

  • We propose a deep learning-based approach for analyzing the text of accident reports.

  • A CNN model is developed to classify accident narratives without using manual features.

  • An LDA-based network analysis method is used to provide a visualization of factors contributing to accidents.

  • The proposed approach can provide managers with much-needed information and knowledge to improve safety on-site.

Abstract

If headway is to be made to improve safety performance in construction, then there is a need to learn from past accidents. Accident reports provide a useful source of information to make sense of why and how events occurred. Analyzing such reports, however, can be a lengthy and challenging process as there is a tendency for data to be presented in an unstructured or semi-structured free-text format. Thus, being able to classify and analyze the narrative that surrounds accidents and to better understand their causal nature is a challenge. Text classification using shallow machine learning with sophisticated manual lexical, syntactic, and semantic feature engineering has typically been used to mine accident data. However, this approach requires highly skilled experts with domain knowledge to undertake this task. A limited number of studies have employed deep learning models to examine the text of safety reports in construction. In consideration of this limitation, word embedding is used to model the semantic narratives of accidents. Then, a Convolutional Neural Network (CNN) model is trained to automatically extract text features and classify accident narratives without manual feature processing. The Latent Dirichlet Allocation (LDA) model is used to examine the interdependency that exists between causal variables to visualize the accident narratives. The proposed automated classification model and LDA-based network analysis method provide a useful approach to enable machine-assisted interpretation of text-based accident narratives. Moreover, the proposed approach can provide managers with much-needed information and knowledge to improve safety on-site.

Introduction

Safety analysis in construction can broadly be classified into two different categories, namely predictive and retrospective methods [1]. Retrospective methods rely on past experiences and accident records (e.g., lessons learned and safety checklists) to avoid similar accidents in the future. Retrospective management requires a mechanism to prevent the occurrence of similar accidents and promote workplace safety [2]. The crucial function of such a mechanism is the ability to analyze accident narratives collected over a period of time and derive knowledge about what previously went wrong [3]. More often than not, managers are not provided with timely and fact-based information about accident causation as it is typically in an unstructured or semi-structured format [4]. Having to manually analyze accident text is a time-consuming and inefficient task [5].

Text mining has been identified as a potential technique that can be used to analyze and classify data contained within safety reports [6]. Existing text classification approaches that have been used to examine safety reports have tended to combine lexical, syntactic, and semantic features manually [[7], [8], [9]]. Such approaches are referred to as shallow machine learning and include Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN) with Natural Language Processing (NLP). This manual feature extraction process is limited by a person's domain knowledge and can only learn from human-specified shallow features. In contrast, deep learning algorithms (e.g., Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)) can automatically identify features and use multiple single functions to learn complex tasks with a nonlinear combination of parameters from training data [10].

While there have been a number of studies that have used deep learning methods in construction [11], there is a paucity of research that has focused on their use in text classification to examine accident reports. Managers often skim over reports as they can be rich in content and, in some cases, lengthy. As a consequence, valuable information that describes circumstances and conditions may be overlooked. Being able to analyze reports so that we can better understand the conditions and circumstances of an accident, as well as its relationships with others that have occurred, can help managers make more informed decisions about how to manage safety. Thus, there is a need to automate the process of analyzing accident reports so that managers can learn and put in place processes to mitigate their future occurrence in a timely manner.

Against this contextual backdrop, we develop an Artificial Intelligence (AI) solution to automate the process of analyzing accident reports. We therefore integrate NLP with deep learning to automatically extract features and effectively classify accident narratives. The developed deep learning model incorporates topic mining and a visual network to analyze and interpret text-based accident narratives. The effectiveness of the deep learning model is verified using an experiment and compared with the SVM, NB, and KNN shallow machine learning methods. Each narrative category based on the CNN's classification is examined using Latent Dirichlet Allocation (LDA) to garner further insights into the causes of the accident. The keywords derived from the LDA are analyzed using network analysis to identify and visualize the causal nature of an accident.

The aim of this paper is not to provide new insights into the causes of accidents per se, but to demonstrate that deep learning can be used to extract unstructured safety data from accident text narratives automatically. As a result, managers will be better positioned to make timely and better-informed decisions about how to ensure the safety of their workforce on-site. The paper's contributions are twofold: (1) a novel deep learning approach is developed to analyze accident reports automatically; and (2) the narrative in the form of text can be extracted and the causal variables of accidents visualized.

Section snippets

Related work

Text classification is the process of assigning tags or categories to text according to its content. Several studies have utilized text mining techniques to analyze injury texts. Its limited modeling and representational ability, however, makes it impossible to learn complex functions, such as those involved in text semantics [12]. Acknowledging this limitation, Ugray [13] used NLP and Bow-tie diagrams to form a semi-automated technique for classifying text-based ‘close call’ texts. Similarly,

Research approach

The workflow for the research presented in this paper is shown in Fig. 1. A CNN model is used to classify the data before preprocessing accident narratives. The accident narratives are then collated to form one document. The LDA technique with unsupervised learning approaches is used to mine the main topics and determine their respective (corresponding) keywords in the Co-occurrence Network. The LDA-based network analysis is employed to depict and visually represent the causal

CNN-based classification of accident narratives

As a deep learning model, a CNN can learn complex functions and related features from given texts without complicated feature engineering work [38]. By reducing manual interventions in pre-treatment and post-processing, a CNN model can automatically adjust parameters when a text classification task is performed. The CNN can automatically determine discriminative phrases in text using a max-pooling layer, instead of through manual feature engineering with domain knowledge [39]. A CNN classifier
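To make the role of the max-pooling layer concrete, the following is a minimal sketch of a single convolutional filter sliding over word embeddings, followed by max-over-time pooling. It is not the authors' model: the two-dimensional embeddings, the bigram filter weights, and the function names `conv1d_over_words` and `max_over_time` are all hypothetical values chosen for illustration, and a trained CNN would learn many such filters from data.

```python
# Minimal sketch of one convolutional filter with max-over-time pooling.
# All embeddings and filter weights below are made-up illustrative values.


def conv1d_over_words(embeddings, kernel, bias=0.0):
    """Slide a filter spanning len(kernel) words over the sentence.

    embeddings: list of word vectors (one list of floats per word)
    kernel: list of weight vectors, one per word position in the window
    Returns one ReLU activation per window position.
    """
    width = len(kernel)
    activations = []
    for start in range(len(embeddings) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = embeddings[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, word_vec))
        activations.append(max(0.0, total))  # ReLU non-linearity
    return activations


def max_over_time(activations):
    """Keep the strongest response and the window position that produced it."""
    best = max(range(len(activations)), key=lambda i: activations[i])
    return activations[best], best


# Toy 2-dimensional embeddings (hypothetical, not learned from data).
toy_embeddings = {
    "worker": [0.1, 0.0],
    "fell": [1.0, 0.2],
    "from": [0.0, 0.1],
    "ladder": [0.9, 0.8],
    "the": [0.0, 0.0],
}
sentence = ["the", "worker", "fell", "from", "the", "ladder"]
vectors = [toy_embeddings[w] for w in sentence]

# A bigram filter (window of two words) with hand-picked weights.
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

acts = conv1d_over_words(vectors, bigram_filter)
feature, position = max_over_time(acts)
print(feature, sentence[position:position + 2])
```

The pooled value is the single feature this filter contributes to the classifier, and the winning window position indicates which phrase triggered it. This is one way to read the claim that the network picks out discriminative phrases without hand-built features.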

Topic mining and LDA-based network analysis

The LDA model is a useful method for mining topics and their respective (corresponding) keywords [45]. LDA models are particularly useful for minimizing the time required to examine data that does not possess a label [34]. The accident keywords and the categories obtained from the CNN model are combined to form a single document, as shown in Fig. 5. Then, the LDA model is used to mine the main topics and identify their respective (corresponding) keywords. The keywords under a topic are treated as the
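As a rough illustration of the network step, the sketch below builds a keyword co-occurrence network from a handful of narratives and ranks keywords by weighted degree. Several things here are assumptions rather than details from the paper: the keyword list stands in for LDA output, the narratives are invented, co-occurrence is counted per narrative, and weighted degree is only one of several possible centrality measures.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(narratives, keywords):
    """Count how often each pair of keywords appears in the same narrative."""
    keyword_set = set(keywords)
    edges = Counter()
    for text in narratives:
        # Keywords present in this narrative, sorted so each pair has one key.
        present = sorted(keyword_set.intersection(text.lower().split()))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each keyword node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Invented narratives and a hypothetical keyword list standing in for LDA output.
narratives = [
    "worker fell from ladder while carrying roof panel",
    "employee fell from roof edge without harness",
    "ladder slipped and worker fell",
    "harness not attached on roof",
]
keywords = ["fell", "ladder", "roof", "harness"]

edges = cooccurrence_edges(narratives, keywords)
degree = weighted_degree(edges)

for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
print(degree.most_common())
```

The resulting edge list can be handed to any graph drawing tool; heavier edges mark keyword pairs that tend to appear together in the same accident narrative.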

Discussion

With the increasing emergence of AI and digital technologies, the construction industry is beginning to embrace their use to improve productivity and performance of operations on-site. The AI techniques of machine and deep learning, for example, have had a significant influence, but use cases in construction are still relatively nascent. To demonstrate how AI can be used to improve safety, this paper develops a deep learning CNN approach to analyze unstructured accident text automatically. In

Limitations

The research presented in this paper, however, is not without its limitations. As the CNN model was constructed on top of generic and basic features (i.e. words), it was expected the model would perform well for classifying similar accident texts. However, in this study, only the text of accidents occurring on construction sites from the OSHA website was selected to train and test the effectiveness of the proposed method. Thus, future work is needed to test the algorithms on a much larger

Conclusion

Unstructured and semi-structured free-texts are widely produced and used in construction. Such text provides practitioners with essential sources of information that can be used to retrospectively inform decision-making and improve the management of safety in projects. Typically, however, the process used to decipher and garner an understanding of accident texts is manual and time-consuming. Consequently, managers may overlook some important and recurring issues that are embedded in

Declaration of competing interest

The authors declared that they have no conflicts of interest to this work.

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted. We declare that the work presented is original research that has not been published previously, and is not under consideration for publication elsewhere, in whole or in part.

Acknowledgments

This research is partly supported by the National Natural Science Foundation of China (Grant No. 51878311, No. 71732001, No. 51978302).

References (49)

  • J. Suto et al. Efficiency investigation from shallow to deep neural network techniques in human activity recognition. Cogn. Syst. Res. (2019)

  • M.G. Yang et al. Construction accident narrative classification: an evaluation of text mining techniques. Accid. Anal. Prev. (2017)

  • S.D. Robinson. Visual representation of safety narratives. Saf. Sci. (2016)

  • L.M. Steacy et al. Examining the role of imageability and regularity in word reading accuracy and learning efficiency among first and second graders at risk for reading disabilities. J. Exp. Child Psychol. (2019)

  • P. Wang et al. Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification. Neurocomputing (2016)

  • B. Ruhnau. Eigenvector-centrality — a node-centrality? Soc. Networks (2000)

  • D. Jatnika et al. Word2Vec model analysis for semantic similarities in English words. Procedia Computer Science (2019)

  • M. Fiedler et al. Predictive value of FHIT, p27, and pERK1/ERK2 in salivary gland carcinomas: a retrospective study. Clin. Oral Investig. (2019)

  • P. Thepaksorn et al. Job safety analysis and hazard identification for work accident prevention in para rubber wood sawmills in southern Thailand. J. Occup. Health (2017)

  • K. Mckenzie et al. Identifying work related injuries: comparison of methods for interrogating text fields. BMC Medical Informatics and Decision Making (2010)

  • A.M. Aitken. Managing unstructured and semi-structured information in organisations. IEEE/ACIS International Conference on Computer and Information Science (2007)

  • M. Behm et al. Application of the Loughborough construction accident causation model: a framework for organizational learning. Constr. Manag. Econ. (2013)

  • M. Goudjil et al. A novel active learning method using SVM for text classification. Int. J. Autom. Comput. (2018)

  • S. Ahmad et al. Information extraction from text messages using data mining techniques. Malaya Journal of Matematik (2018)