Deep learning and network analysis: Classifying and visualizing accident narratives in construction

doi:10.1016/j.autcon.2020.103089

Automation in Construction

Volume 113, May 2020, 103089

https://doi.org/10.1016/j.autcon.2020.103089

Highlights

• We propose a deep learning-based approach for analyzing the text of accident reports.
• A CNN model is developed to classify accident narratives without using manual features.
• An LDA-based network analysis method is used to provide a visualization of factors contributing to accidents.
• The proposed approach can provide managers with much-needed information and knowledge to improve safety on-site.

Abstract

If headway is to be made to improve safety performance in construction, then there is a need to learn from past accidents. Accident reports provide a useful source of information for making sense of why and how events occurred. Analyzing such reports, however, can be a lengthy and challenging process as there is a tendency for data to be presented in an unstructured or semi-structured free-text format. Thus, being able to classify and analyze the narrative that surrounds accidents and to better understand their causal nature is a challenge. Text classification using shallow machine learning with sophisticated manual lexical, syntactic, and semantic feature engineering has typically been used to mine accident data. However, this approach requires highly skilled experts with domain knowledge to undertake this task. A limited number of studies have employed deep learning models to examine the text of safety reports in construction. In consideration of this limitation, word embedding is used to model the semantic narratives of accidents. Then, a Convolutional Neural Network (CNN) model is trained to automatically extract text features and classify accident narratives without manual feature processing. The Latent Dirichlet Allocation (LDA) model is used to examine the interdependency that exists between causal variables to visualize the accident narratives. The proposed automated classification model and LDA-based network analysis method provide a useful approach to enable machine-assisted interpretation of text-based accident narratives. Moreover, the proposed approach can provide managers with much-needed information and knowledge to improve safety on-site.

Introduction

Safety analysis in construction can broadly be classified into two different categories, namely predictive and retrospective methods [1]. Retrospective methods rely on past experiences and accident records (e.g., lessons learned and safety checklists) to avoid similar accidents in the future. Retrospective management requires a mechanism to prevent the occurrence of similar accidents and promote workplace safety [2]. The crucial function of such a mechanism is the ability to analyze accident narratives collected over a period of time and derive knowledge about what previously went wrong [3]. More often than not, managers are not provided with timely and fact-based information about accident causation as it is typically in an unstructured or semi-structured format [4]. Having to manually analyze accident text is a time-consuming and inefficient task [5].

Text mining has been identified as a potential technique that can be used to analyze and classify data contained within safety reports [6]. Existing text classification approaches that have been used to examine safety reports have tended to combine lexical, syntactic, and semantic features manually [7], [8], [9]. Such approaches are referred to as shallow machine learning and include Support Vector Machine (SVM), Naive Bayes (NB) and K-Nearest Neighbor (KNN) with Natural Language Processing (NLP). This manual feature extraction process is limited by a person's domain knowledge and can only learn from the human-specified shallow features. In contrast, deep learning algorithms (e.g., Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)) can automatically identify features and use multiple single functions to learn complex tasks with a nonlinear combination of parameters from training data [10].
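To make the idea of a shallow classifier over hand-specified features concrete, the following is a minimal sketch of a Naive Bayes text classifier that uses plain word counts as its only features. It is an illustration under stated assumptions, not the baseline used in the paper: the toy narratives, the labels `fall` and `struck_by`, and the helper names `tokenize`, `train_naive_bayes`, and `predict` are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for hand-built lexical features)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect class counts, per-class word counts, and the vocabulary from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy narratives; real accident reports are far longer and noisier.
TRAIN = [
    ("worker fell from scaffold", "fall"),
    ("employee fell off ladder", "fall"),
    ("worker struck by falling beam", "struck_by"),
    ("employee struck by moving truck", "struck_by"),
]
model = train_naive_bayes(TRAIN)
```

Everything the classifier can learn here is fixed by the choice of `tokenize`; richer lexical, syntactic, or semantic features would have to be designed by hand, which is the limitation the paragraph above describes.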

While there have been a number of studies that have used deep learning methods in construction [11], there is a paucity of research that has focused on its use with text classification to examine accident reports. Managers often skim over reports as they can be rich in content and in some cases lengthy. As a consequence, valuable information that describes circumstances and conditions may be overlooked. Being able to analyze reports so that we can better understand the conditions and circumstances of an accident, as well as its relationships with others that have occurred, can help managers make more informed decisions about how to manage safety. Thus, there is a need to automate the process of analyzing accident reports so that managers can learn and put in place processes to mitigate their future occurrence in a timely manner.

Against this contextual backdrop, we develop an Artificial Intelligence (AI) solution to automate the process of analyzing accident reports. We therefore integrate NLP with deep learning to automatically extract features and effectively classify accident narratives. The developed deep learning model incorporates topic mining and a visual network to analyze and interpret text-based accident narratives. The effectiveness of the deep learning model is verified using an experiment and compared with the SVM, NB and KNN shallow machine learning methods. Each narrative category based on the CNN's classification is examined using Latent Dirichlet Allocation (LDA) to garner further insights into the causes of the accident. The keywords derived from the LDA are analyzed using network analysis to identify and visualize the causal nature of an accident.

The aim of this paper is not to provide new insights into the causes of accidents per se, but to demonstrate that deep learning can be used to extract unstructured safety data from accident text narratives automatically. As a result, managers will be better positioned to make timely and better-informed decisions about how to ensure the safety of their workforce on-site. The paper's contributions are twofold: (1) a novel deep learning approach is developed to analyze accident reports automatically; and (2) the narrative in the form of text can be extracted and the causal variables of accidents visualized.

Section snippets

Related work

Text classification is the process of assigning tags or categories to text according to its content. Several studies have utilized text mining techniques to analyze injury texts. Its limited modeling and representational ability, however, makes it impossible to learn complex functions, such as those involved in text semantics [12]. Acknowledging this limitation, Ugray [13] used NLP and Bow-tie diagrams to form a semi-automated technique for classifying text-based ‘close call’ texts. Similarly,

Research approach

The workflow for the research presented in this paper is shown in Fig. 1. A CNN model is used to classify the data before preprocessing accident narratives. The accident narratives are then collated to form one document. The LDA technique with unsupervised learning approaches is used to mine the main topics and determine their respective (corresponding) keywords in the Co-occurrence Network. The LDA-based network analysis is employed to depict and visually represent the causal

CNN-based classification of accident narratives

As a deep learning model, a CNN can learn complex functions and related features from given texts without complicated feature engineering work [38]. By reducing manual interventions in pre-treatment and post-processing, a CNN model can automatically adjust parameters when a text classification task is performed. The CNN can automatically determine discriminative phrases in text using a max-pooling layer, instead of through manual feature engineering with domain knowledge [39]. A CNN classifier
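How a max-pooling layer singles out a discriminative phrase can be sketched with a single convolution filter sliding over word vectors. This is a hedged illustration rather than the architecture used in the paper: the two-dimensional embeddings in `EMBED`, the hand-set `kernel`, and the function names are assumptions made for the example. In a trained CNN the filter weights and embeddings would be learned from labeled narratives, and many filters of several widths would run in parallel.

```python
def conv1d_over_tokens(embeddings, kernel, bias=0.0):
    """Slide a filter spanning len(kernel) consecutive tokens and return one ReLU activation per window."""
    width = len(kernel)
    activations = []
    for start in range(len(embeddings) - width + 1):
        total = bias
        for offset in range(width):
            token_vec = embeddings[start + offset]
            total += sum(e * w for e, w in zip(token_vec, kernel[offset]))
        activations.append(max(0.0, total))  # ReLU
    return activations


def max_over_time(activations):
    """Return the largest activation and the index of the window that produced it."""
    best = max(range(len(activations)), key=activations.__getitem__)
    return activations[best], best


# Hypothetical 2-dimensional embeddings, chosen by hand for illustration only.
EMBED = {
    "worker": [0.1, 0.0],
    "fell": [1.0, 0.2],
    "from": [0.0, 0.1],
    "ladder": [0.3, 1.0],
    "onto": [0.0, 0.1],
    "ground": [0.1, 0.2],
}
tokens = "worker fell from ladder onto ground".split()
vectors = [EMBED[t] for t in tokens]

# One width-3 filter, hand-set to respond when the first word is high on
# dimension 0 and the third word is high on dimension 1.
kernel = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

acts = conv1d_over_tokens(vectors, kernel)
value, index = max_over_time(acts)
phrase = tokens[index:index + len(kernel)]  # the window the filter responded to most
```

With these toy values the pooled feature comes from the window "fell from ladder", so the classifier downstream sees only the strongest response of the filter regardless of where in the narrative the phrase occurs.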

Topic mining and LDA-based network analysis

The LDA model is a useful method for mining topics and their respective (corresponding) keywords [45]. LDA models are particularly useful for minimizing the time required to examine data that does not possess a label [34]. The accident keywords and the categories obtained from the CNN model are combined to form a single document, as shown in Fig. 5. Then, the LDA model is used to mine the main topics and identify their respective (corresponding) keywords. The keywords under a topic are treated as the
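The network step that follows topic mining can be sketched as a keyword co-occurrence graph. The example below is a minimal illustration under several assumptions: the LDA step itself is not shown, the set `KEYWORDS` is hand-picked to stand in for LDA output, an edge is counted whenever two keywords appear in the same narrative, and weighted degree is used as a simple importance score. The source text is truncated at this point, so none of these choices should be read as the authors' exact procedure.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(documents, keywords):
    """Count how many documents contain each unordered pair of keywords."""
    edges = Counter()
    for doc in documents:
        present = sorted(set(doc.lower().split()) & keywords)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # pairs are stored in alphabetical order
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each keyword node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Invented narratives and keywords, for illustration only.
NARRATIVES = [
    "worker fell from ladder without harness",
    "worker fell from scaffold harness not attached",
    "ladder slipped and worker fell",
]
KEYWORDS = {"worker", "fell", "ladder", "scaffold", "harness"}

edges = build_cooccurrence(NARRATIVES, KEYWORDS)
degree = weighted_degree(edges)
ranked = [word for word, _ in degree.most_common()]
```

The resulting edge weights can be passed to any graph drawing tool to visualize which factors tend to appear together, and other centrality measures can replace weighted degree without changing how the network is built.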

Discussion

With the increasing emergence of AI and digital technologies, the construction industry is beginning to embrace their use to improve productivity and performance of operations on-site. The AI techniques of machine and deep learning, for example, have had a significant influence, but use cases in construction are still relatively nascent. To demonstrate how AI can be used to improve safety, this paper develops a deep learning CNN approach to analyze unstructured accident text automatically. In

Limitations

The research presented in this paper, however, is not without its limitations. As the CNN model was constructed on top of generic and basic features (i.e. words), it was expected the model would perform well for classifying similar accident texts. However, in this study, only the text of accidents occurring on construction sites from the OSHA website was selected to train and test the effectiveness of the proposed method. Thus, future work is needed to test the algorithms on a much larger

Conclusion

Unstructured and semi-structured free-texts are widely produced and used in construction. Such text provides practitioners with essential sources of information that can be used to retrospectively inform decision-making and improve the management of safety in projects. Typically, however, the process used to decipher and garner an understanding of accident texts is a manual and time-consuming process. Consequently, managers may overlook some important and recurring issues that are embedded in

Declaration of competing interest

The authors declare that they have no conflicts of interest regarding this work.

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted. We declare that the work presented is original research that has not been published previously, and is not under consideration for publication elsewhere, in whole or in part.

Acknowledgments

This research is partly supported by the National Natural Science Foundation of China (Grant No. 51878311, No. 71732001, No. 51978302).

References (49)

L. Tanguy et al., Natural language processing for aviation safety reports: from classification to interactive analysis, Comput. Ind. (2016)
W. Fang et al., A deep learning-based approach for mitigating falls from height with computer vision: convolutional neural network, Adv. Eng. Inform. (2019)
V. Cherkassky et al., Practical selection of SVM parameters and noise estimation for SVM regression, Neural Netw. (2004)
S.J. Bertke et al., Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims, J. Saf. Res. (2012)
T.P. Williams et al., Predicting construction cost overruns using text mining, numerical data and ensemble classifiers, Autom. Constr. (2014)
H. Fan et al., Retrieving similar cases for alternative dispute resolution in construction accidents using text mining techniques, Autom. Constr. (2013)
W.D. Yu et al., Content-based text mining technique for retrieval of CAD documents, Autom. Constr. (2013)
N.W. Chi et al., Evaluating the strength of text classification categories for supporting construction field inspection, Autom. Constr. (2016)
F.S. Al-Anzi et al., Toward an enhanced Arabic text classification using cosine similarity and latent semantic indexing, Journal of King Saud University-Computer and Information Sciences (2017)
A. Chokor et al., Analyzing Arizona OSHA injury reports using unsupervised machine learning, Procedia Engineering (2016)
J. Suto et al., Efficiency investigation from shallow to deep neural network techniques in human activity recognition, Cogn. Syst. Res. (2019)
M.G. Yang et al., Construction accident narrative classification: an evaluation of text mining techniques, Accid. Anal. Prev. (2017)
S.D. Robinson, Visual representation of safety narratives, Saf. Sci. (2016)
L.M. Steacy et al., Examining the role of imageability and regularity in word reading accuracy and learning efficiency among first and second graders at risk for reading disabilities, J. Exp. Child Psychol. (2019)
P. Wang et al., Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification, Neurocomputing (2016)
B. Ruhnau, Eigenvector-centrality — a node-centrality?, Soc. Networks (2000)
D. Jatnika et al., Word2Vec model analysis for semantic similarities in English words, Procedia Computer Science (2019)
M. Fiedler et al., Predictive value of FHIT, p27, and pERK1/ERK2 in salivary gland carcinomas: a retrospective study, Clin. Oral Investig. (2019)
P. Thepaksorn et al., Job safety analysis and hazard identification for work accident prevention in para rubber wood sawmills in southern Thailand, J. Occup. Health (2017)
K. Mckenzie et al., Identifying work related injuries: comparison of methods for interrogating text fields, BMC Medical Informatics and Decision Making (2010)
A.M. Aitken, Managing unstructured and semi-structured information in organisations, IEEE/ACIS International Conference on Computer and Information Science (2007)
M. Behm et al., Application of the Loughborough construction accident causation model: a framework for organizational learning, Constr. Manag. Econ. (2013)
M. Goudjil et al., A novel active learning method using SVM for text classification, Int. J. Autom. Comput. (2018)
S. Ahmad et al., Information extraction from text messages using data mining techniques, Malaya Journal of Matematik (2018)

