Human reliability analysis: Exploring the intellectual structure of a research field

Humans play a crucial role in modern socio-technical systems. Rooted in reliability engineering, the discipline of Human Reliability Analysis (HRA) has been broadly applied in a variety of domains in order to understand, manage and prevent the potential for human errors. This paper investigates the existing literature pertaining to HRA and aims to provide clarity in the research field by synthesizing the literature through systematic bibliometric analyses. The multi-method approach followed in this research combines factor analysis, multi-dimensional scaling, and bibliometric mapping to identify the main HRA research areas. This document reviews over 1200 contributions, with the ultimate goal of identifying current research streams and outlining the potential for future research via a large-scale analysis of contributions indexed in the Scopus database.


Introduction
Human Reliability Analysis (HRA) is the discipline that provides methods and tools for qualitatively and quantitatively predicting human errors in systems in which people have monitoring and control functions. The roots of HRA are in equipment reliability engineering, from which it derives its central concepts and methods [1,2]. The first systematic assessments of human reliability were initiated in the military domain and were conducted in particular for predicting and quantifying the probability of human errors in nuclear weapon assembly (the work of Swain and Guttmann at Sandia National Laboratories); these assessments resulted in the development of the early versions of the Technique for Human Error Rate Prediction (THERP) [3]. The second main driver came from the development in the nuclear power industry of Probabilistic Risk Assessment (PRA), a technique for quantifying the risks posed to the public by a serious core-melt accident at a nuclear power plant. The WASH-1400 report [4], considered a pioneering work, used the THERP to identify potential operator errors and to systematically estimate their probability.
Applications in the military domain were focused on well-defined assembly tasks in which the physical environment paced the operator, allowing only a known sequence of subtasks for correct performance. In the context of such repetitive, predictable tasks requiring only lower-level processing, operators could be readily modelled as components that either acted as required by the system or deviated from the requirements. Early applications in the nuclear industry maintained the assumption of the operator as a component performing a set of assigned functions. This allowed a single reliability engineering framework to be applied to the entire human-machine system for which failure probabilities were required. However, it was later recognized that modelling the operator in the same way as a technical component was insufficient and that more detailed human modelling was needed. Unlike equipment such as valves and pumps that have very specific functions in response to limited inputs and outputs, operators in nuclear power plants interpret the inputs according to the goals they are pursuing and autonomously decide among a vast array of strategies or subtasks to achieve the same results. In addition, human performance is strongly influenced by variations in task and workplace conditions as well as individual and cognitive aspects.
The need for a proper treatment of the human element in the total system led to research and development efforts that continue to this day. Through an examination of the publications included in HRA literature during a period covering over 50 years, the present review explores the intellectual structure of the field.
Tightly linked to bibliometrics, the scientometrics perspective ("quantitative study of science […]" [5]) has been adopted. A scientometrics study of a scientific field can be performed through the analysis of the field's immediate and tangible outputs (e.g., papers, proceedings, books) [6]. Consequently, to delineate research areas and thematic relationships for the definition of the field's intellectual structure, bibliometric data, such as the number of citations or the number of co-citations (i.e., the number of times two documents are cited together by another document), can be analysed as proxy measures [7].
However, we offer a word of caution regarding the coverage of the present review. HRA is foremost an applied, industrial engineering discipline whose results are not necessarily published or even publishable (confidentiality issues). Not all HRA research and development contributions are directly reflected in scientific indexed databases. Some important reference sources comprise proprietary research (for instance, the highly influential proprietary reports by the Electric Power Research Institute, EPRI). In other cases, the sources are publicly available but not recorded in citation databases, particularly as one moves back in time. Some examples are reports by industry bodies (e.g., the HRAG/Human Factors in Reliability Group in the UK and the Energy Institute in the US), by international organizations (e.g., IAEA, NEA/CSNI, the EU Joint Research Centers, the OECD Halden Reactor Project, and NATO) and by national regulatory and safety bodies (e.g., the U.S. NRC's NUREG reports and the HSE in the UK).
Bearing this aspect in mind, we still perform a bibliometric meta-analysis assuming that (i) the sources retrieved in citation databases are able to directly keep track of the non-recorded sources (e.g., summary papers of proprietary reports) or indirectly keep track of them (e.g., papers treating themes first raised in the non-cited reports) and that therefore (ii) the absence of the uncited sources is not expected to dramatically modify the field's overall intellectual structure.
The core concept of the review performed in this paper is the usage of bibliometric data as a main support tool for a meta-analysis aimed at exploring, clustering and categorizing the available literature. The analysis extracts information from the Scopus database and adopts a multi-method approach based on different bibliometric information extracted from the articles' metadata (source, article type, date, reference list, etc.). Scopus has been identified as the reference database for two main reasons: (i) with over 5000 publishers and over 71 million records, it represents the largest database of peer-reviewed literature and is fairly balanced among the technical and social aspects of science; (ii) it allows a well-structured metadata export either through its APIs or through manageable export files (e.g., .ris, .csv) [8].
The analysis pays attention to citation and co-citation data. In particular, co-citations have been recognized as valuable data sources for examining the relationships among documents and their contribution to a research field [9][10][11]. The co-citation analysis in this paper relies on the assumption that if two documents are often co-cited, those documents have some type of semantic or conceptual link. Starting from co-citation data, Factor Analysis (FA) is used here as a multi-variate technique for data reduction to extract research factors from the literature. Research factors are intended as sets of documents that focus on a similar research topic and concern a specific sub-field. As such, they support the exploration and definition of the intellectual structure of the research field itself. Based on the acknowledgement of research factors as multi-faceted abstract artefacts, the results of the FA have been further extended in a multi-dimensional perspective through a Multi-Dimensional Scaling (MDS) algorithm. MDS is used to depict the proximity between documents, again based on co-citation values, with the ultimate purpose of understanding intra- and inter-factor relationships.
In addition to these techniques for document analyses, other approaches have been used to further explore the research field. In particular, bibliometric maps have been developed to identify main terms and respective relationships.
Note that this research adopts a strong interpretive dimension: we set a level of philosophical assumption that is intrinsic to the complexity of uncovering a research field's intellectual structure. Nevertheless, following a hermeneutic perspective, we use data analytics to provide an interpretation for reducing our subjective bias in the definition of the publications structure [12].
In practice, regarding the methodology developed for the analysis, the research also follows a complementary normative dimension. It is noteworthy that the complementary nature of the multi-method approach proposed in this work is described in detail in order to support other researchers in performing other scientometrics research.
Fig. 1. The yellow boxes refer to the methodology steps (divided into Scopus API manipulation, Data pre-processing, Data analysis). Each method has been labelled and associated with the software used for its implementation (within brackets, in bold characters).

Methodology
The research methodology can be summarized in 9 steps that were based on the Scopus APIs and managed by means of Python scripts and other software for data analysis and visualization (Microsoft Power BI, VOSviewer). Fig. 1 summarizes the research process, which is described in detail in the following 9 steps.
Step 1. The search key was finalized in Scopus by using the Scopus search query system.
Step 2. The Scopus API "Scopus_Search()" was implemented in order to extract the list of papers associated with the Scopus key defined in Step 1. The outcome of this extraction generated a set of papers that constitutes the so-called Dataset 0.
Step 3. Starting from the list of papers defined in Step 2, the Scopus API "Abstract_Retrieval()" was implemented in order to obtain the respective papers' metadata. The metadata were structured in multi-dimensional tensors and required further manipulation to be fully exploited. In addition, the same Scopus API allowed the extraction of the list of papers cited by the papers included in Dataset 0 (whose size is n, the number of papers in Dataset 0). These cited papers constitute Dataset 1, and they were used for subsequent analyses (Step 6).
Step 4. Based on the Dataset 0 metadata, an ad hoc Python script was developed in order to create citation pairs, i.e., a vector of citing-cited paper pairs that covers all the citations made by the papers included in Dataset 0.
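The construction of citation pairs in Step 4 can be illustrated with a minimal sketch. The authors' script is not reproduced in the paper, so the metadata layout (`dataset0_references`), the paper identifiers and the helper name `build_citation_pairs` are all hypothetical and serve only to show the idea.

```python
# Hypothetical metadata layout: each citing paper id maps to the ids found in
# its reference list. All identifiers are invented for illustration.
dataset0_references = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C", "C"],  # duplicated reference, as may occur in raw exports
}


def build_citation_pairs(references_by_paper):
    """Return a sorted list of unique (citing, cited) pairs."""
    pairs = set()
    for citing, cited_list in references_by_paper.items():
        for cited in cited_list:
            pairs.add((citing, cited))  # the set removes duplicated references
    return sorted(pairs)


citation_pairs = build_citation_pairs(dataset0_references)
print(len(citation_pairs), "citation pairs")
```

Removing duplicated references before counting is a design choice of this sketch, not a step reported by the authors.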
Step 5. Based on the citation pairs in Step 4, a co-citation matrix was developed. The matrix has an n × n dimension (where n is the number of papers in Dataset 0). Note that for pragmatic reasons, a co-citation threshold was iteratively defined to isolate the papers to be included in the matrix itself. These papers constituted the Core Dataset (whose size is m < n) for the application of data reduction techniques.
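As a rough illustration of Step 5, the sketch below counts co-citations from a list of citing-cited pairs and then keeps only the papers whose total co-citation count reaches a threshold. The pairs, the threshold value and the function names are hypothetical; the authors' actual implementation is not given in the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (citing, cited) pairs; identifiers are invented for illustration.
citation_pairs = [
    ("P1", "A"), ("P1", "B"), ("P1", "C"),
    ("P2", "A"), ("P2", "B"),
    ("P3", "B"), ("P3", "C"), ("P3", "D"),
]


def cocitation_counts(pairs):
    """For each unordered pair of cited papers, count the papers citing both."""
    cited_by = defaultdict(set)
    for citing, cited in pairs:
        cited_by[citing].add(cited)
    counts = defaultdict(int)
    for refs in cited_by.values():
        for a, b in combinations(sorted(refs), 2):
            counts[(a, b)] += 1
    return dict(counts)


def core_papers(counts, threshold):
    """Keep papers whose total co-citation count is at least the threshold."""
    totals = defaultdict(int)
    for (a, b), c in counts.items():
        totals[a] += c
        totals[b] += c
    return sorted(p for p, total in totals.items() if total >= threshold)


counts = cocitation_counts(citation_pairs)
core = core_papers(counts, threshold=3)  # illustrative threshold only
print(counts)
print(core)
```

In this toy example paper D is co-cited only twice in total and is therefore dropped, which mirrors the reduction from n to m papers described above.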
Step 6. Starting from the list of papers obtained in Step 3, the Scopus API "Abstract_Retrieval()" was applied to gather all the papers' metadata. In this case, the metadata were obtained from the papers in Dataset 1. This step was necessary to combine the results of Step 5 so that all the metadata for papers in the Core Dataset (which is a subset of Dataset 1) were available for subsequent analyses. The connection between the co-citation matrix and the metadata tensors was performed through an ad hoc Python code.
Step 7. Combining all the information available from the co-citation matrix and the metadata tensors, an ad hoc Python code was developed as a basis for factor analysis. First, the co-citation matrix was translated into a Pearson correlation matrix (m × m) in order to make the co-citations comparable and standardized, providing a more robust basis for the following statistical analyses [13].
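The conversion of a co-citation matrix into a Pearson correlation matrix can be sketched as follows. The matrix values are invented, and the diagonal is simply set to zero, which is an assumption of this sketch: the paper does not state how diagonal entries were treated.

```python
from math import sqrt

# Hypothetical symmetric co-citation matrix for four papers (invented values).
# Assumption: diagonal entries are set to zero.
cocitation = [
    [0, 8, 7, 1],
    [8, 0, 6, 0],
    [7, 6, 0, 2],
    [1, 0, 2, 0],
]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Correlate every pair of rows, i.e., every pair of co-citation profiles."""
    m = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(m)] for i in range(m)]


corr = correlation_matrix(cocitation)
for row in corr:
    print([round(value, 2) for value in row])
```

Each row of the co-citation matrix is treated as the co-citation profile of one paper, so two papers correlate highly when they are co-cited with the same other papers.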
Second, starting from the m x m Pearson correlation matrix, a Principal Components Analysis (PCA) with varimax rotation was applied in order to extract the key factors of the Core Dataset. In this context, a factor is a linear combination of optimally weighted observed variables that accounts for a maximal amount of the variance in the observed variables (relying on the correlation values obtained from the co-citation matrix) that is not accounted for by the preceding components and is uncorrelated with all of the preceding components [14]. Varimax represents a valuable rotation criterion for this analysis since it allows rotating elements to create an economic set of factors with high individual loadings.
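A compact sketch of PCA followed by varimax rotation is given below, assuming NumPy is available. It uses a toy correlation matrix with invented values and a commonly used SVD-based iteration for varimax; it is not the authors' code, and the function names `pca_loadings` and `varimax` are introduced here for illustration only.

```python
import numpy as np


def pca_loadings(corr, n_factors):
    """Unrotated loadings: leading eigenvectors scaled by sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]  # indices of the largest ones
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation via the usual SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag(np.sum(rotated**2, axis=0))
        gradient = loadings.T @ (rotated**3 - rotated @ column_ss / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_objective = np.sum(s)
        if new_objective < objective * (1 + tol):  # no meaningful improvement
            break
        objective = new_objective
    return loadings @ rotation


# Toy correlation matrix (invented): papers 0-1 and papers 2-3 form two groups.
corr = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

unrotated = pca_loadings(corr, n_factors=2)
rotated = varimax(unrotated)
dominant = np.argmax(np.abs(rotated), axis=1)  # dominant factor for each paper
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the sum of squared loadings of each paper is unchanged; only the distribution of loadings across factors becomes easier to interpret.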
Third, the Pearson matrix was used as the basis for an MDS algorithm that was developed as a support to interpret the research factors individually and jointly and to explore the relationships among them.
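The paper does not specify which MDS variant or which dissimilarity measure was used. As one possible illustration, the sketch below applies classical (Torgerson) MDS with NumPy and assumes the dissimilarity 1 - r between two papers with Pearson correlation r; both choices, as well as the toy matrix, are assumptions of this example.

```python
import numpy as np


def classical_mds(distances, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate the input."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d**2) @ centering   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


# Toy correlation matrix (invented): papers 0-1 and papers 2-3 form two groups.
corr = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

distances = 1.0 - corr  # assumed dissimilarity; not stated in the paper
coords = classical_mds(distances, n_dims=2)
print(np.round(coords, 2))
```

In the resulting map, strongly correlated papers are placed close together, which is the property used to read intra-factor and inter-factor relationships.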
Step 8. In addition to the specific information on the Core Dataset, the analysis was extended to the original search dataset, i.e., Dataset 0. A meta-analytic overview of such papers was developed by means of multiple statistical analyses performed in Microsoft Power BI.
Step 9. An additional analysis was performed through the exploration of keywords and their co-occurrences. Co-occurrences refer to all combinations of keyword pairs in each document being reviewed. This analysis relied on the assumption by Law and Whittaker, i.e., that authors of scientific papers choose technical terms carefully, recognizing some type of association between them [15]. Therefore, if multiple authors use the same terms and associate them, the relation can be assumed to be significant. A threshold of significance was assigned, i.e., a minimum number of documents that had to include the keyword in order to consider this relation relevant for the analysis.
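The keyword co-occurrence count of Step 9 can be sketched in a few lines. The keyword lists and the threshold below are invented, and the function name `keyword_cooccurrences` is hypothetical; the analysis in the paper was carried out with dedicated software.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keywords per document (invented for illustration).
documents = [
    ["human reliability", "THERP", "nuclear power"],
    ["human reliability", "CREAM", "fuzzy logic"],
    ["human reliability", "THERP", "human error"],
    ["human error", "CREAM"],
]


def keyword_cooccurrences(docs, min_docs):
    """Count keyword pairs, keeping keywords found in at least min_docs documents."""
    doc_freq = Counter(kw for doc in docs for kw in set(doc))
    kept = {kw for kw, n in doc_freq.items() if n >= min_docs}
    pair_counts = Counter()
    for doc in docs:
        present = sorted(set(doc) & kept)
        pair_counts.update(combinations(present, 2))
    return pair_counts


pairs = keyword_cooccurrences(documents, min_docs=2)
for pair, count in pairs.most_common():
    print(pair, count)
```

Keywords that appear in fewer than `min_docs` documents (here "nuclear power" and "fuzzy logic") are discarded before pairs are counted, which corresponds to the threshold of significance described in Step 9.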
Extending the concepts presented in Fig. 1, Fig. 2 sketches the relationships among different datasets for the scientometrics analysis.

Findings
From an operations point of view, for documents published until April 1, 2019, the 9-step methodology described in Section 2 started from the adoption of the following Scopus search key: TITLE-ABS-KEY ("human reliability" OR "human unreliability"). This broad key aims to include all documents where the phrases "human reliability" or "human unreliability" have been mentioned in the title, abstract or keywords. For the purpose of this meta-analysis, the search key was purposely not narrowed in order to include all the contributions that play a role in the intellectual structure of the field. In formal terms, this choice implies that no explicit exclusion criteria were assigned; i.e., the articles were included in the analysis regardless of their subject area, year of publication, source type, etc. Exclusion and inclusion criteria were instead data-driven, following the analysis of citations and co-citations, unlike PRISMA-based reviews (e.g., [16]).
The following statistical information about the approach can be added. Dataset 0 (the outcome of the search query) includes 2140 documents. Therefore, with respect to Step 3 and Step 5, n=2140. The length of the citation pair vector (cf. Step 4) is 42910, implying that Dataset 0 presents 42910 citations, which refer to all the papers in Dataset 1. The intersection between Dataset 0 and Dataset 1 accounts for 5272 citations, which constitute the starting point for defining the Core Dataset. Regarding Step 5, the dimension n of the co-citation matrix was reduced to the number m of papers that have at least 20 total co-citations in (Dataset 0 ∩ Dataset 1) in order to retain a significant yet manageable number of papers. This choice (m=440, cf. Step 5, Step 7) remains significant since the 440 retained papers account for 39060 of the 44930 co-citations (from the 5272 citations) among the documents in Dataset 0 ∩ Dataset 1. This choice qualitatively confirms Pareto theory: approximately 20% of the documents explain more than 85% of the co-citations.
Fig. 2. Relationships among Dataset 0 (outcome of the search query), Dataset 1 (dataset of cited papers) and the Core Dataset (papers with co-citations over a certain threshold). The size of the bubbles is a function of the co-citation count of a paper.
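The Pareto-like concentration can be checked directly from the figures reported in this section. The short calculation below only restates those numbers; the variable names are introduced here for illustration.

```python
# Figures reported in the text.
n_dataset0 = 2140          # documents in Dataset 0
m_core = 440               # documents in the Core Dataset
cocitations_core = 39060   # co-citations involving the Core Dataset
cocitations_total = 44930  # co-citations in Dataset 0 ∩ Dataset 1

share_of_documents = m_core / n_dataset0
share_of_cocitations = cocitations_core / cocitations_total

print(f"Share of documents:    {share_of_documents:.1%}")
print(f"Share of co-citations: {share_of_cocitations:.1%}")
```

The two shares come out at roughly 20.6% and 86.9%, consistent with the statement that about 20% of the documents explain more than 85% of the co-citations.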
Based on these preliminary analyses, it was possible to proceed with the paper's findings, which were divided into 3 classes:
- Statistical overview (Section 3.1)
- Research factors (Section 3.2)
- Key Term analysis (Section 3.3)

Statistical overview
Dataset 0 (the outcome of the search key in Scopus) was first analysed in terms of source type. In particular, most contributions (approximately 93%) in the dataset are listed as belonging to either conference proceedings (approximately 51%) or journals (approximately 42%). A more detailed overview of source types is presented in Fig. 3, where paper types are further explored. For a detailed description of paper types and source types, please refer to [17].
The data summarized in Fig. 3 were analysed through a cumulative trend graph over the years (Fig. 4). A general increasing trend is visible from the early 2000s, with an even steeper increase for journal articles (yellow area). Because the graph is a cumulative representation, the border of the yellow area represents the total number of documents over the years.
Another bibliometric perspective can be gathered from the analysis of open access contributions. In particular, only 106 contributions were published as open access over the years (less than 5%). Even considering only the last 15 years of literature, a period regarded as the beginning of the open access movement [18], the statistics are similar (approximately 5%); thus, the field underperforms with respect to the average share of open access articles currently in the literature (approximately 27% according to [18]).
Table 1, where for multiple conferences, it is possible to find all relevant years listed.
An additional statistical analysis can be performed with respect to the geographical distribution of documents. This distribution represents the number of documents produced per affiliation country (note that one document may involve multiple affiliation countries, and there may even be more than one affiliation country for each author). As a first step for the analysis, only the affiliation(s) of the first author are considered here, since the analysis aims to give an overall geographical representation rather than a detailed author-based analysis. Fig. 6 shows the leading role of institutions in the United States, followed by those in China, the United Kingdom and South Korea. The top-ten affiliation countries also include 3 EU countries (France, Germany, Italy), Norway, Brazil and Japan (see Fig. 6 for details).

Overall definition
Research factors (RFs) were determined by the adoption of a PCA with varimax rotation (cf. Step 7). The outcome of the approach consists of defining factor loadings for each document [9]. A factor loading represents the degree to which a specific document belongs to a factor. A significance threshold was defined as ±0.30, implying that a document was assigned to a factor if the absolute value of its factor loading was greater than the threshold value [11,19]. In the case of multiple loadings, the document-factor association refers to the factor with the highest score.
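The assignment rule can be sketched as follows. The loadings are invented, and the reading of "±0.30" as a threshold on the absolute value of the loading, with the largest absolute loading deciding among multiple candidates, is an interpretation made for this example.

```python
# Hypothetical rotated loadings: document id -> loading on each factor.
loadings = {
    "doc_a": [0.82, 0.10, -0.05],
    "doc_b": [0.35, 0.61, 0.12],   # loads on two factors; the higher one wins
    "doc_c": [0.12, -0.45, 0.20],  # negative loading beyond the threshold
    "doc_d": [0.15, 0.22, 0.08],   # no loading beyond the threshold
}

THRESHOLD = 0.30  # significance threshold reported in the text


def assign_factor(row, threshold=THRESHOLD):
    """Return the index of the factor with the largest absolute loading,
    or None when that loading does not exceed the threshold."""
    best = max(range(len(row)), key=lambda i: abs(row[i]))
    return best if abs(row[best]) > threshold else None


assignments = {doc: assign_factor(row) for doc, row in loadings.items()}
print(assignments)
```

Documents such as `doc_d`, whose loadings all stay below the threshold, remain unassigned and would need the kind of manual, interpretive review described in the following paragraphs.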
Starting from the Pearson co-citation matrix (Cronbach's alpha=0.976), the PCA led to the definition of 10 factors, which in turn are able to explain approximately 78.5% of the variability. These factors include all the 440 documents previously identified (cf. Section 3 regarding the number of co-citations filtered). Regarding source type, the 440 documents constituting this Core Dataset can be compared with the number of documents in Dataset 0 (see Fig. 7). Although Dataset 0 includes a larger number of publications from conference proceedings rather than journals, the Core Dataset is mainly composed of journal articles. This is an expected outcome for bibliometric-based analyses: journal articles are usually the most cited document types.
For the FA, the interpretive analysis of the documents was performed by 3 researchers with an average of 10 years of academic and industrial experience in HRA. In particular, the researchers investigated the contributions listed for a factor in order to make inferences from the title and abstract for their classification. Such inferences were the basis for providing an interpretation of the PCA factor as a representative research factor. To check the validity of the inferences and provide coherent intra-factor and inter-factor associations, the individual classification was then confirmed and validated through two focus groups involving another researcher with experience in data analytics.
An inherent bias in the co-citation metric may lead to the assignment of a document to a PCA factor with which it is not completely aligned. This has been managed by reading all the abstracts and, if needed, the full text of the ambiguous documents. The publications identified as not pertaining to an RF (usually documents with scores distributed across multiple factors) were re-assigned to other factors with relatively lower scores but with greater topic alignment. Following this interpretative perspective, one of the smallest (in terms of number of documents) PCA factors was excluded since its isolation was mainly due to self-citations (and consequently self-co-citations) by some authors who monopolized the factor. Documents in this PCA factor were re-assigned to other factors where relevant: a total of 23 documents out of 440 were left unassigned. In general, self-citations were not excluded since, given the cumulative nature of the production of new knowledge, they were recognized as a natural part of the communication process. For the co-citation threshold assumed in this case, it was determined that self-citations did not play an important role in the citation rates attained by the highest-cited documents [20].
Nine RFs (out of the 10 PCA factors) were identified through the approach described above. An RF comprises a set of documents grouped by common research topics: for instance, the common topic may be the main contribution of the publication, the method used, or the application field. For example, publications that use Bayesian belief networks (BBNs) for analysis are mainly grouped in one RF (RF1). However, not all these publications aim to study the application of BBNs in HRA; some of them just make use of BBNs as a method for analysis, while their main contribution lies elsewhere.
Nevertheless, publications that make use of BBNs for analysis will naturally cite (and be cited together with) publications that discuss and validate the use of BBNs, resulting in those being grouped in the same RF. For pragmatic reasons, a full list of all papers included in each of the 9 RFs is provided in the Appendix, while the following sections are intended to summarize the main contributions and reference either the most cited papers per RF or those with higher FA rotated loadings, i.e., the ones that were usually easier to summarize semantically. A summary of the final association between RFs and documents is presented in Table 2, and this summary is complemented by the information in the Appendix.

RF1 - Advances in quantification in HRA: Data collection and analysis methods
This publication group focuses on advances in quantification in HRA, including data collection and methods for analysis.
A large group of publications related to quantification in HRA focus on the use of BBNs. Indeed, the use of BBNs in the field of HRA is steadily increasing, as noted by [21] and [22]. In their reviews, they identify five main groups of BBN applications. The first group comprises publications on the modelling of organizational factors. This application is illustrated by [23], which proposes a fuzzy Bayesian network (BN) approach to improve the quantification of organizational influences in HRA. Another possibility is pointed out by [24]: BBNs can be combined with system dynamics, event sequence diagrams (ESDs) and fault trees (FTs) in a hybrid approach for incorporating organizational factors into PRA.
Other groups of applications identified in [21] are BBN-based extensions of existing HRA methods (e.g., [25]) and the assessment of situation awareness, as in [26], which provides a computational model for situational assessment.
The other two groups identified by [21] are the analysis of the relationships among failure influencing factors and the dependency assessment among human failure events. Indeed, BBN is a useful tool for dealing with dependencies. For instance, [27] presents a BBN model and uses the time slice concept of dynamic BN for explicit treatment of dependencies among HFEs.
The analysis of the relationship between performance shaping factors (PSFs) is also closely related to dependency. In most HRA methods, dependency between PSFs is not considered. [28] propose a solution for this problem by using BBNs and artificial data sets. They model factors and estimate failure probabilities when dependency between PSFs is considered. Moreover, they use artificial data for the development and testing of the BBN. Artificial data refers to the generation of data with known properties in order to test a modelling approach and evaluate its performance. In a further work, they investigate an approach to incorporate information about uncertainty in the BBN parameter estimates and the effect of unreliable data [29,30].
An additional potential domain for the application of BBNs is in dealing with limited data. BBNs allow for the use of expert judgement in combination with empirical data, and solutions have been provided in this direction. [31] propose a Bayesian approach to aggregate expert estimates on human error probabilities to determine the relationships in an HRA model. Another document [32] remarks that a challenge during the elicitation of expert judgement is the potentially high number of questions that are necessary. These authors propose a quality indicator that would allow for adequate quantification of qualitative knowledge with a reduced number of questions. BBNs can be further used to consider the uncertainty related to expert judgement. Approaches for treating uncertainty with fuzzy systems have also been proposed, and [33] compare BBNs and fuzzy expert systems for the treatment of uncertainty. They conclude that BBNs are preferred in cases characterized by quantifiable uncertainty in the input, while fuzzy expert systems are preferred in cases where there is very limited knowledge and the analyst feels constrained by a probabilistic framework.
The incorporation of expert judgement is not the only solution for scarcity of data that can be modelled through BBN. Simulation can be a valuable tool to generate data, as in [34], which presents a data collection methodology using a virtual environment for a simplified BN model of offshore emergency evacuation.
The application of BBNs in the field of HRA is also explored in connection with other techniques. A hybrid approach has been used in model-based HRA methodologies, which aim to overcome issues in general HRA methodologies. These issues, among others, have contributed to the variability in results seen in the application of different HRA methods and in cases where the same method is applied by different analysts. In an attempt to address these issues, a framework for a "model-based HRA" methodology has been proposed. This framework uses a hybrid model with event sequence diagrams, fault trees and BBNs. The BBN models the influence of performance shaping factors on the failure modes [35][36][37][38].
Other advances in quantification approaches for HRA include the use of simulators [39][40][41][42] for data collection and modelling. Regarding data collection, [43] remarks that data for HRA has been persistently viewed as lacking. Indeed, many sources of HRA-relevant data exist, and many efforts to collect the data have been and are being pursued. For instance, to inform human reliability analysis, the Human Event Repository Analysis (HERA) database was developed for the U.S. Nuclear Regulatory Commission (NRC) as a repository of retrospective qualitative analyses of actual incidents [44]. In addition, the U.S. NRC has an active HRA data program that, through the collection and analysis of human performance information, aims to improve HRA quality in the NRC's risk-informed programs [45,46]. The program aims to collect and analyse licensed operator simulator training data for the primary objective of generating human error probabilities (HEPs) in HRA. The use of simulator data with the HURAM (Human-related event Root cause Analysis Method plus) methodology has also been adopted in Korean nuclear power plants [47].

RF2 - Human cognitive process across application domains
RF2 focuses on the human cognitive process across various application domains, ranging from maritime transport [48,49] to power systems [50]. Vanderhaegen et al. [51] address diagnosis and cognitive ergonomics, while Kontogiannis and Malakis [52] propose a framework of cognitive strategies in error detection to make human performance resilient to changes in work demands within aviation and air traffic control. The need for addressing human reliability from this perspective is also shared by Kim and Bishu [53], who state that human errors have been generally modelled on the basis of probabilistic concepts, leaving the consideration of cognitive aspects of human behaviours as merely optional.
On the other hand, He et al. [54] affirm that the Cognitive Reliability and Error Analysis Method (CREAM) relies on a sound cognitive model and framework and emphasizes the overall characteristics of the context. CREAM is a representative method of the so-called second-generation HRA methods. For this reason, for application in the construction industry, Liao et al. [55] use CREAM as a basis to develop a model of the relationship between performance shaping factors and human error.
Bedford et al. analyse CREAM sensitivity with respect to the choices made for common performance conditions (CPCs, i.e., the contextual conditions under which a given action is performed) and the intrinsic uncertainty when interpreting the method categories [56]. Such limitations are increased in the case of scarcity of empirical data, as shown by Wang et al. [57]. New CREAM performance conditions specifically related to space missions, i.e., an International Space Station ingress procedure, were also defined [58,59].
Expert judgement is essential for the study of cognitive processes, and several authors make use of systematic methods to obtain it. El-Ladan and Turan [60] and Maniram Kumar et al. [61] apply structured and guided expert elicitation methods to interview experts and increase the fidelity of second-generation HRA techniques.
To overcome CREAM limitations, novel quantitative techniques are used as a complement to the method in order to enhance its inherent prospective human error probability (HEP) analysis. Yang et al. [62], Kim et al. [63] and Ashrafi et al. [64] introduce concepts from Bayesian theory to improve HEP evaluation given updated information about a dynamic context. To account for CPC ambiguity and unevenness, fuzzy versions of the CREAM paradigm are suggested by Marseguerra et al. [65], Geng et al. [66] and Konstantinidou et al. [67].

RF3 - Human performance and human factors dynamically modelled
This RF focuses on human performance and human factors described dynamically and through a comparison of diverse HRA methods.
Joe et al. [68] affirm that there is a general lack of focus on simulations of human operators and on how the reliability of human performance can affect risk-margins and the performance of nuclear plants.
To explore this, human performance data were collected during simulator trials and compared with the HRA lessons from Massaiu et al. [69]. Another aspect considered was the transition of technology in nuclear power plants, which has raised many important human performance issues. For this reason, a survey was conducted by Liao and Chang [70] to examine the causal factors of human-system interface-related human errors in control rooms. Human performance is assessed not only for safety-critical industries (aerospace engineering and nuclear engineering) but also for the automotive industry [71]. Operators' performance may be reflected in overall team performance. The relevant literature shows how appropriate methods, such as the Performance Evaluation of Teamwork (PET) [72] and Phoenix (a model-based human reliability analysis methodology) [73], can account for this performance interconnectedness.
The THERP (technique for human error rate prediction) is one of the most established and detailed HRA methods, and it considers specific performance shaping factors (PSFs) to assess human error probability. Bubb [74] applies the method to a case study within manufacturing. Other HRA methods, such as the standardized plant analysis riskhuman reliability analysis (SPAR-H) technique, were inspired by the use of the THERP in the treatment of PSFs. The SPAR-H method was developed to aid in characterizing and quantifying human performance at nuclear power plants [75] and has subsequently been used for other domains [76].
Van de Merwe et al. [77] apply SPAR-H to managed-pressure drilling operations and find it a useful support for project managers. Boring [78] aimed to bridge the SPAR-H HRA method with NASA's man-machine integration design and analysis system (MIDAS) for use in simulating and modelling the human contribution to risk in nuclear power plant control room operations. Defining the PSF role across the HRA stages, Boring [79] also wonders how many PSFs are necessary for techniques such as SPAR-H.
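To make the role of PSFs in quantification more concrete, the sketch below shows a SPAR-H-style multiplicative scheme in which a nominal HEP is scaled by one multiplier per PSF. The adjustment applied when three or more PSFs are negative follows the form described in the SPAR-H documentation as we understand it and should be read as an assumption; the function name `composite_hep`, the nominal HEP and the multiplier values are illustrative placeholders, not worksheet values.

```python
from math import prod


def composite_hep(nominal_hep, psf_multipliers):
    """Scale a nominal HEP by the product of PSF multipliers.

    A multiplier > 1 marks a negative (error-promoting) PSF, 1 a nominal PSF,
    and < 1 a positive PSF. With three or more negative PSFs, an adjustment
    (assumed here) keeps the result below 1; otherwise the product is capped.
    """
    composite = prod(psf_multipliers)
    negative = sum(1 for m in psf_multipliers if m > 1.0)
    if negative >= 3:
        return nominal_hep * composite / (nominal_hep * (composite - 1.0) + 1.0)
    return min(1.0, nominal_hep * composite)


# Illustrative values only: two negative PSFs and one nominal PSF.
print(f"{composite_hep(0.001, [10.0, 2.0, 1.0]):.4f}")  # prints "0.0200"
# Three negative PSFs trigger the assumed adjustment.
print(f"{composite_hep(0.001, [10.0, 5.0, 2.0]):.4f}")
```

The question of how many PSFs are needed can be explored with such a function by checking how sensitive the result is to dropping or merging multipliers.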
Human performance has an intrinsic dynamic nature, and HRA experts are focusing on including this aspect in novel analysis methods [80]. For instance, the Simulator for Human Error Probability Analysis (SHERPA) [81] aims to merge the advantages of simulation tools and the principles of traditional HRA methods. The "dynamic risk modelling project" [82] developed a simulation approach for the quantitative analysis of critical air traffic control activities by operators. Droguett et al. [83] adopt a Bayesian approach to provide dynamism to HRA.
Human performance analysis and the related inclusion of its dynamic features are also the objects of benchmarking studies. Boring et al. [84] discuss a study comparing and evaluating HRA methods in assessing operator performance in simulator experiments. Moreover, Boring et al. [85] address the drivers of crew performance in a method-to-method comparison.

RF4 -Quantitative definition of human actions and their dependency
This RF focuses on the quantitative definition and assessment of human actions, tasks, and commissions and their interdependency, interaction, hierarchy, or dependency on external factors.
The study of potential errors within human actions and how these contribute to accidents is paramount in HRA, but it is not free from challenges. An important output of human action assessment is the isolation of actions with the greatest potential to reduce accident risk [86]. To provide solid foundations to the analysis, the quantification may be based on operational experience, as Preischl and Hellmich [87] show by covering a wide variety of tasks and human error probabilities in the operations of German nuclear power plants. Prosek and Cepin [88] instead illustrate how parametric safety analysis studies provide relevant parameters for the HRA of human actions, whose complexity cannot be disregarded while assessing error probability [89]. In this regard, Park and Jung [90] identify an objective tool to evaluate the level of complexity of a task in HRA terms.
Human actions may depend on several factors. For this reason, dependency from contextual factors such as cultural variability is investigated by Park [91]. Intra-dependency among human actions also plays an important role in human reliability analysis, as dependent tasks may have an important influence on each other's probability. The modelling of dependencies may be based on the lessons learned from available HRA methods [92]. Julius and Grobbelaar [93] developed a tool and guidelines to obtain comparable HRA results when evaluating the human interactions of similar tasks. Other authors [94,95] opt for advanced computational models to assess the dependency between tasks. Fuzzy logic-based approaches are also considered for a number of case-studies [96][97][98], ranging from dependencies between operators in digital control systems [99] to the use of medical devices [100]. A solution to handle dependency in HRA is also demonstrated by using the analytic hierarchy process (AHP) method [101][102][103]: first, dependency influencing factors among human tasks are identified, and following the AHP weighting process, the weights of the factors are then determined by experts [104].
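The AHP weighting step mentioned above can be sketched as follows. This is a generic illustration of AHP, not the procedure of [101][102][103] or [104]: weights are approximated by the row geometric mean of an expert pairwise comparison matrix, and consistency is checked with Saaty's consistency ratio. The three dependency factors and all judgement values are hypothetical.

```python
from math import prod


def ahp_weights(matrix):
    """Approximate AHP priority weights by the row geometric-mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Saaty's consistency ratio; values below 0.1 are usually deemed acceptable."""
    n = len(matrix)
    # Estimate the principal eigenvalue from the relation A w = lambda w.
    lam = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    ci = (lam - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's random indices
    return ci / random_index[n]


# Hypothetical dependency influencing factors and expert judgements
# (entry [i][j] states how much more important factor i is than factor j).
factors = ["closeness in time", "task similarity", "same crew"]
judgements = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights = ahp_weights(judgements)
for name, w in zip(factors, weights):
    print(f"{name}: {w:.3f}")
print(f"consistency ratio: {consistency_ratio(judgements, weights):.3f}")
```

The geometric-mean method is a common approximation of the principal eigenvector; a full eigenvalue computation could be substituted without changing the overall workflow.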

RF5 -Recent methodological developments and digital human-system interface
Factor 5 focuses on methodological developments that aim to fill the gaps and advance the field of HRA.
The majority of the papers are recent and concern HRA and digital HSIs. As HRA was originally developed in the analogue control room age, many authors assert that the available guidance on assessing the interaction between humans and digital human-system interfaces is insufficient and identify areas that need attention [105][106][107][108]. Referring to tasks performed in analogue control rooms, the HEPs contained in the methods might no longer apply. For instance, after evaluating various sources of data, [109] conclude that "existing human reliability assessment methods are likely to be optimistic in their estimates of HEPs where diagnosis is involved". New data on human performance and human error are thus collected to not only assess the reliability of human-interface interaction with digital artefacts [110][111][112][113][114][115] but also help in the development of the methods [116][117][118]. An equally large set of contributions addresses issues related to performance shaping factors (PSFs), not only due to the digitalization of the HSI, as in [119]. PSFs are discussed and defined for optimal selection in HRA [112], for improving the way they are treated (in SPAR-H) [120], or are studied individually, e.g., in relation to fatigue [121] or complexity [122]. Even more papers focus on estimating the effects of PSFs on human performance. This is accomplished through a literature review [121,123], computer simulation [124], or Bayesian belief network applications [125,126] or by analysing operational data [127], microworld data [118] and data from human-in-the-loop simulators [128,129]. The issue of objectively measuring PSFs is approached from several angles by a research group in South Korea [130][131][132].

RF6 -Advancements of HRA in healthcare
This factor emphasizes contributions related to the advancements of HRA in the field of healthcare.
This research stream can be considered a relatively recent area of study, as pointed out by [133]. They remark that HRA is still not broadly applied in healthcare, and the reason may be the lack of awareness of the usefulness of the techniques and their applicability to the problem of human error in the clinical context. The authors review popular HRA techniques and discuss their feasibility for use in healthcare. While some areas of healthcare have used certain HRA techniques, there is considerable scope to use other techniques and to apply techniques to other aspects of healthcare that have not yet been explored. Lyson [134] provides a framework to select techniques for error prediction in the healthcare sector.
A large group of papers under this factor relates to doctors' performance during surgeries. Concerning developments in HRA, Onofrio, Trucco and Torchio [135] propose a taxonomy for PSFs in surgery applications. They remark that in spite of the growing interest in HRA application in healthcare, only a limited number of studies use PSFs to describe the working context. Cox Dolan and MacEwen [136] focus on HRA development in a specific type of surgery: cataract surgery. They remark that HRA is a prospective method of assessment of surgical performance and can be further used in the training and assessment of cataract surgery.
In particular, laparoscopic surgeries are a field of interest for HRA application. For instance, Ghazanfar et al. [137] analyse how divided attention affects novices and experts during this type of surgery. The observational clinical-HRA (OCHRA) [138] was developed for use in laparoscopic surgery. It is used by Talebpour et al. to analyse competency level for laparoscopic surgery, by [139] to analyse a proficiency-gain curve, and by Miskovic et al. [140] to measure competence level during laparoscopic colorectal surgery. Other areas of application of OCHRA include laparoscopic rectal cancer surgery [141], laparoscopic cholecystectomy [142], and laparoscopic pyloromyotomy [143]. In the context of operative and cognitive skills, Tang et al. [144] further propose a new approach that combines OCHRA with Objective Structured Clinical Examination (OSCE) for competence assessment during laparoscopic surgery.
HRA in healthcare also leverages the HEART methodology. Castiglia, Giardina and Tomarchio [145] use HEART to evaluate the potential exposure of medical operators working in a high dose rate brachytherapy irradiation plant. Ward et al. [146] apply HEART as part of the investigations into a surgical incident involving the accidental retention inside a patient's venous system of a guide wire for central venous catheterization (CVC). Chadwick and Fallon [147] apply a modified version of HEART to the radiotherapy treatment process.
Other approaches are also proposed, including one by Pandya et al. [148], who provide a generic task-type-performance-influencing factors structure.

RF7 -HRA and human factors in design
RF 7 focuses on the application of human reliability concepts and tools to system design, bridging the gap between HRA and human factors.
The papers included in this factor are not concerned with complete HRA examinations for system design purposes but rather provide examples of how to use HRA-related techniques for the identification, measurement and reduction of human-caused risks at the design stage. HRA techniques allow identifying bottlenecks in operating processes and improving the system design in socio-technical activities, such as the command and control room operations of a military vessel [149]. Further results refer to the identification of safety functional requirements (SFRs) in the nuclear industry, combining human perspectives with technical information [150]. Similarly, to combine traditional hardware and software requirements with the ones coming from the system users, corrective design actions based on the application of HRA techniques have been taken for a missile system design [151]. Following the increasing interest in car driving automation, to propose a way forward for regulation, training, car design, and intersection layout, an HRA perspective has been adopted for modelling driver-car interaction [152]. More focused on regulatory aspects, in a comparison with the ISO Guide (ISO/IEC Guide 73, ISO Guide 51, etc.), human-oriented, risk-preventing strategies have been developed in the design stage, emphasizing the need for collaborative participation [153].
The interest in the early design phase is further extended with research focusing on the system lifecycle. The early results focused on human-computer interaction to ensure usability during the entire lifecycle [154] and were later extended to the joint-cognitive dimension [155]. Human-computer interactions remain particularly relevant for both individual and team performance, as confirmed by an experimental research study in the nuclear domain [156], especially for the socio-technical design of 4th-generation nuclear reactors [157]. An experimental project showed how a 12-month program supported the integration of human-oriented analysis with traditional engineering approaches for both early concept design and later product qualification and certification [158]. In this context, the System Development Safety Triptych represents a checklist of considerations developed for the interplay of human factors and human reliability in the design, testing, and modelling stages of product development and planned for use during the conception, design and implementation of a system [159].

RF8 -Benchmarking exercises in HRA
RF 8 reflects an overall empirical connotation but is focused on benchmarking among different techniques and assessment methodologies. In HRA literature, this perspective considers the significant differences in the scope, approach and underlying models of the available literature and the subsequent need for comparing respective results with available empirical data [160]. Benchmarking can be intended for use between a method and empirical data, as well as between different methods and data. For the former, see the Qinshan nuclear power plant exercise involving different human interactions that are skill-based, rule-based and knowledge-based [161]. Regarding the latter comparison, see the 1992 benchmarking exercise conducted to compare the THERP, SLIM, and a rank-ordering procedure. The results suggested the need for the use of a more structured perspective when applying the methods [162], a problem partly solved in more recent applications [163]. Benchmarking has been referred to also in methods' results and proceduralized risks, such as the risks in the fuzzy fault tree analysis compared with the modern gamma rays irradiators' risks suggested by the International Commission on Radiological Protection [164]. Benchmarking also extends to very technical aspects, such as the probability distribution for the definition of hazard rate parameters (i.e., log-normal, gamma, inverse Gauss) [165]. When assessing a method, critiques have been recognized regarding the reliability of available data as well as the advantages afforded by an investigator's and a reporter's background in a marine transportation case study [166]. The need for a structured approach has also been examined through the introduction of a combined methodology based on HRA and a failure modes, effects, and criticality analysis [167].
A recent study identifies some specific analysis criteria designed to compare and map HRA methods (e.g., required data evidence, theoretical basis, and PSF coverage) and finally suggests the benefits arising from the use of a cross-fertilization approach for socio-technical systems [168]. This trend is also confirmed by another research study comparing results obtained from traditional analysis; some documents in this RF argue for the potential benefits arising from a resilience engineering point of view [169]. Similarly, an exploratory benchmarking exercise between traditional techniques and one of the most used resilience engineering methods, i.e. the functional resonance analysis method (FRAM), promotes the complementary perspective these methods can offer [170].

RF9 -The use of fuzzy logic in HRA
This RF is strongly related to other factors, in particular RF1, and concerns HRA advances obtained by using fuzzy logic. The importance of applying fuzzy concepts to reliability analysis was explored by Onisawa [171]. Szwarcman et al. [172] further present a methodology for the characterization of human reliability based on fuzzy sets concepts. They propose a human reliability index for the identification of problems that may lead to human errors, as well as possible strategies for the control of potentially adverse impacts of interactions that add uncertainty and complexity to processes. One particular area of HRA that can benefit from the use of fuzzy logic is the treatment of uncertainty. For instance, demonstrating an application for HRA, [173] presents two techniques for sensitivity and uncertainty analysis of fuzzy expert systems. Baziuk, Rivera and Nuñez Mc Leod [174] propose an approach to facilitate the identification of uncertainties and future treatment with fuzzy sets. They attempt to unify human behavioural science and engineering in a unified human reliability model. Fuzzy logic can also be applied by using an existing HRA method as a basis. For example, Kirytopoulos [175] proposes a fuzzy logic system based on CREAM to provide more sophisticated estimations of the tunnel operators' performance in safety-critical situations.

Multi-dimensional scaling
The significance of the RFs has also been tested through an MDS algorithm. Based on the Pearson co-citation matrix as a similarity measure, MDS is intended to depict the conceptual proximity among contributions and RFs in the Core Dataset. Two-dimensional and three-dimensional MDS maps have been developed to find an interpretable configuration (two, or three dimensions at maximum) that is still statistically representative. Among the tested results, a three-dimensional, non-metric random starting configuration has been selected since it allowed an acceptable value of its goodness-of-fit (stress < 0.2) [176]. In this MDS map, each document's position reflects its relative correlation with other documents: the higher the correlation is, the closer the documents.
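The goodness-of-fit criterion can be illustrated with a short sketch that derives dissimilarities from Pearson correlations between co-citation profiles and evaluates Kruskal's stress-1 for a candidate configuration. This is a simplification: the non-metric MDS used in the study compares map distances with monotonically transformed disparities, whereas the sketch uses the raw dissimilarities (a metric variant). The conversion `1 - r`, the function names, the profiles and the coordinates are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dissimilarities(profiles):
    """Turn pairwise correlations into dissimilarities (assumed as 1 - r)."""
    pairs = combinations(range(len(profiles)), 2)
    return {(i, j): 1.0 - pearson(profiles[i], profiles[j]) for i, j in pairs}


def stress1(config, target):
    """Kruskal's stress-1 between map distances and target dissimilarities."""
    num = sum((dist(config[i], config[j]) - d) ** 2 for (i, j), d in target.items())
    den = sum(dist(config[i], config[j]) ** 2 for (i, j) in target)
    if den == 0.0:
        raise ValueError("all points coincide; stress is undefined")
    return sqrt(num / den)


# Hypothetical co-citation profiles of three documents and a candidate 2-D map.
profiles = [[4, 0, 2, 1], [3, 1, 2, 0], [0, 5, 1, 3]]
target = dissimilarities(profiles)
candidate = [(0.0, 0.0), (0.2, 0.1), (1.5, 0.4)]
print(f"stress-1 = {stress1(candidate, target):.3f}")
```

An MDS routine would search for the configuration that minimizes this quantity; the threshold of 0.2 quoted above is then applied to the minimized value.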
Relying on the graphical representation, it has been possible to define a meta-dimension for the map that gives a holistic interpretation of the nature of multiple RFs, as shown in Fig. 8. An overall dimension, which goes from "theoretical", extends through "simulation-based", and finally reaches "applied", indicates the nature of the considered works.
As mentioned in Section 1, HRA theoretical foundations may not be directly reflected in scientific indexed databases, as these theoretical foundations may be the results of proprietary research or may be publicly available but not recorded in citation databases such as Scopus. For this reason, the dimension identified in Fig. 8 originates from an area that is not covered by the analysis. This area lies on a lower level where no documents are graphically represented, as they are not found in the considered databases. While this lower level represents the very HRA theoretical origins, both foundational components and simulations are observed in RF2 and RF4 and address processing and response. Human cognitive processes, such as diagnosis, are the focus of publications grouped under RF2 and are treated across various domains. Actions, their interdependency and their quantification are the subjects of RF4. The distinction outlined by RF2 and RF4 is characteristic of traditional HRA methods, such as the technique for human error rate prediction (THERP) [177], the accident sequence precursor (ASP) HRA methodology [178], the SPAR-H HRA method [179,180], and the Petro-HRA method [76,181]. Quantitative aspects are found also within the works of RF1 and discuss the advances in the pivotal step of HRA quantification (e.g., in terms of simulations), which represents a pillar of HRA theory and reflects an overlapping area with RF4.
RF1 dedicates more attention to the use of data and simulations. Data collection for HRA is an important sub-topic of RF1. RF9, which focuses on HRA and fuzzy logic, shows some overlap with RF2, demonstrating that the complexity and uncertainty encountered during the assessment of cognitive processes may be dealt with by classes of simulated alternatives whose boundaries are not sharply defined. RF3 lays the foundations for simulations (both in virtual and real environments), as the works labelled with this RF study human performance and human factors from a dynamic perspective in an effort to continuously refine HRA models and reproduce a realistic evolution of events.
RF5 spans along the whole theoretical/simulation-based/applicative dimension. For this reason, it well represents the tension involved in the improvement of HRA theories through new data, which may come from either simulations or verifiable observations from the applications in specific sectors. RF6 and RF7 are relatively isolated on the map (Fig. 8) with respect to the other RFs and present a strong applicative connotation. The two RFs show how empirical studies support the advancement of HRA in healthcare, while the application of human reliability concepts and tools allows considering human factors in the design of systems.

Key term analysis
This analysis has been performed to further explore the content of documents and their evolution over time. Note that the key terms are the key words as originally proposed by the authors of the articles. Ideally, the key words should reflect the main content and contributions of the paper. For the analysis, we assume this to be accurate. Moreover, we present the key words as written by the authors, including acronyms. As a result, for instance, some maps may have "probabilistic safety assessment" and "PSA", although the meaning of both key terms is the same.
Following a time interval of approximately 5 years for each cluster, the article database (Dataset 0) has been divided into clusters according to the articles' publication year. Therefore, 4 clusters have been identified: 1999-2003, 2004-2008, 2009-2013, and 2014-2019. The final period is four months longer than the previous ones. The size of the sample cluster before 1999 was too small for any representative analyses. It is interesting to observe how the increase in the number of papers generated, as expected, an increase in the variety of topic areas. The analysis allows us to explore the relative frequency of key terms (size of the bubble) and the interconnectedness (links between bubbles) of methods, models, and research aspects. The thickness of the lines depicts the strength of the relationship between the key terms: a thicker line connecting two key terms indicates that those have often been used together. The analyses were performed in VOSviewer [182]. To have a manageable and significant number of terms, the threshold for the number of documents that should include the keywords was set to 5. There are only 4 key terms significantly used in the references between the years 1999 and 2003. These are "human reliability", "human error", "human factors" and "risk analysis". "Human factors" is a central connection to the key words. Note that as a discipline, human factors was established earlier than HRA. Indeed, the oldest professional body for human factors' specialists and ergonomists is The Chartered Institute of Ergonomics and Human Factors, formed in 1946 in the UK. The 5-year period of Dataset 1 is characterized by few sources dealing with generic issues in the field rather than more specific topical contributions (see Fig. 9).
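The occurrence threshold and the link strength described above can be sketched in a few lines. This is only an approximation of what a bibliometric mapping tool computes, since it does not reproduce VOSviewer's normalization or layout; the function name `keyword_network` and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_network(documents, min_docs=5):
    """Build a thresholded keyword co-occurrence network.

    `documents` is a list of author-keyword lists, one per article. A key term
    is kept if it appears in at least `min_docs` documents (bubble size); the
    link strength between two kept terms is the number of documents that use
    both (line thickness).
    """
    occurrences = Counter()
    for keywords in documents:
        occurrences.update(set(keywords))
    kept = {k for k, count in occurrences.items() if count >= min_docs}
    links = Counter()
    for keywords in documents:
        present = sorted(set(keywords) & kept)
        links.update(combinations(present, 2))
    return {k: occurrences[k] for k in kept}, dict(links)


# Toy data: six hypothetical articles, with a lowered threshold of 3.
articles = (
    [["human error", "HRA"]] * 3
    + [["HRA", "PSA"]] * 2
    + [["probabilistic safety assessment"]]
)
nodes, edges = keyword_network(articles, min_docs=3)
print(nodes)
print(edges)
```

Because key words are taken exactly as written by the authors, "PSA" and "probabilistic safety assessment" are counted as separate terms here, mirroring the caveat stated earlier in this section.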

Cluster: 2004-2008
In the period 2004-2008, the significant key terms increased to 14 (see Fig. 10). Note that compared to the previous years, in this period, the term "HRA" is substantially used, which indicates a popularization of the discipline so that its acronym is well known in this period. During this period, the publications initially concerned human factors, human error, PRA, and human reliability assessment and progressed to significantly include human error probability and performance shaping factors. The latter is connected to human factors, as the factors analysed in the human factors discipline affect operators' performance and, as such, can serve as a foundation for performance shaping factors in HRA. CREAM, developed in 1998, is used as a key word in this period and is associated with PSA. Note that this does not necessarily mean that CREAM was not used in HRA in the previous years. However, it may be assumed that during this period, it became a more popular method since the key words were chosen by the articles' authors to make their paper identifiable and easily found.

Cluster: 2009-2013
The degree of specialization of the sources explodes to 45 items in the 2009-2013 period (see Fig. 11). In addition to CREAM, the key terms include the methods SPAR-H, published in 2005, and THERP. Moreover, in addition to risk analysis, PRA and PSA, HRA appears connected also to LOPA, process safety, risk management, and resilience engineering, indicating a broader use of HRA in risk-related disciplines. Concerning fields of application, this period reveals the use of the key word "patient safety" in addition to the expected "nuclear power plants", indicating a significant number of papers concerning the use of HRA in healthcare. Compared with previous years, in this period, the key words, namely, performance shaping factors and performance influencing factors, were increasingly used. They are connected to Bayesian networks, indicating the increasing use of BBNs for modelling PSFs and organizational factors. This increased usage suggests a popularization of the recognition of the impact of organizational factors in human performance and the need to model them as PSFs.

Cluster: 2014-2019
The key terms increase to 53 between 2014 and 2019, exhibiting a rather complex network of interrelated clusters (e.g., key terms such as "Bayesian networks", "PSF", "expert opinion/judgment" appear in several clusters) (see Fig. 12). In addition to "patient safety", which was used during the previous cluster, this period of time also includes "surgery", which focuses on the use of HRA in healthcare, and "maritime safety", indicating the use of HRA in fields other than nuclear. An additional key term that gained importance in this period is "cognitive". Given the increasing awareness that cognitive errors should be assessed in human reliability, this was expected. "Digital main control room" is also an expected added key term for this period. Unlike the more popular terms such as "human factors", this term was not used as a key word by a large number of papers during the 5 years analysed and therefore cannot be clearly viewed in Fig. 12. Digital main control rooms are an important and recent modification in NPPs' operation, and the HRA community has been discussing and proposing how to analyse this new form of interaction with HRA. A similar phenomenon occurs with the key term "HRA data": compared to other terms, this term is not very popular; therefore, it cannot be seen in Fig. 12. However, the topic is of increasing interest in the HRA community, in particular due to the SACADA and HuREX projects.

Discussion and conclusions
Regarding the methodological contribution proposed in this research, the multi-method approach allows the use of complementary perspectives to explore the intellectual structure of research on HRA. Through analytic expressions grounded on relevance theory, the approach could be further extended through Pennant diagrams to capture main documents (or authors) in terms of text (or citations) entropy [183]. Other analyses based on naturalistic text analyses may automatically support content extraction. In the long run, the process described may be linked to (near) real-time data extraction and analysis so that scholars may access such outcomes autonomously. Through modern technologies and database informative structure, the notion itself of literature reviews including systematic data analytics may evolve through support vector machines or artificial neural networks [184].
The statistical overview (results from the methodology step 8) highlights an increasing trend in terms of the number of publications (especially journal articles) from the early 2000s. Currently, HRA research is not concentrated within a few world regions but is mainly spread across the American, European and Asian continents. The journals and conferences reporting a larger number of publications are not surprising, as they clearly show the following aspects of HRA:
- It addresses the topic of reliability and safety, as the main journals for HRA publications are Reliability Engineering and System Safety and Safety Science, and the main conferences are PSAM and ESREL;
- Its origins are within the nuclear sector, and it has been adopted by other safety-critical sectors such as the process industry, as several publications are from the Annals of Nuclear Energy and the PSA conference by the American Nuclear Society, together with the Global Congress of Process Safety.
Despite the increasing trend in publications, there is still a need to improve access to HRA publications by promoting open access. However, this trend may be slowly reversing in Europe. Two main factors motivate this: several initiatives seeking nationwide licenses combine reading paywalled articles and publishing in an open access format into one fee [185], and the projects that received or are receiving Horizon 2020 funding are required to make sure that any peer-reviewed journal article they publish is openly accessible and free of charge [186]. Such trend inversion is confirmed by the increase of HRA open access publications within the 2016-2019 interval.
The research factors (results from the methodology step 7) reveal that cognition processes are recognized and studied independently from actions. Methods such as CREAM rely on a sound cognitive model and framework that emphasizes the whole characteristics of the context. Expert judgement is essential for the study of cognitive processes, but the discussion on how to use it in a structured and guided fashion to increase HRA fidelity is still open. At the same time, human action assessment allows for isolation of actions, which has intrinsic potential to reduce accident risk. However, it should not be forgotten that human actions may depend on several factors, such as contextual factors, or be intra-dependent on each other.
In the field of HRA, the use of BBNs both as a stand-alone approach and combined with other techniques to create hybrid approaches is steadily increasing within the relevant literature. BBNs are effectively used to model organizational factors and deal with the mentioned dependencies but continuously require data for development and testing. A solution may reside in the fact that new data on human performance and human error are collected to assess the reliability of the human-interface interaction with digital systems. Indeed, data collection for use in HRA is the focus of two substantially large projects: SACADA [187] and HuREX [188]. SACADA is a database developed by the U.S. NRC and collects operator performance data in cooperation with nuclear companies. The data are collected during training programmes with the aim of supporting NPP operator training programmes and improving HRA quality. SACADA is an ongoing project, and the NRC made a portion of the database available to the public. Updates on SACADA, NPP partners, and the database structure can be found at the NRC website. Similarly, HuREX provides a framework for HRA data collection. HuREX is an ongoing project by the Korea Atomic Energy Research Institute that aims to generate HEP data and correlations between PSFs and HEPs. Computer simulation, data from human-in-the-loop simulators, and operational data from surveys are other approaches to accumulating human reliability data. Despite the existence of several strategies, uncertainties related to the collected data (e.g., unreliable or sparse data) may be present. For this reason, the importance of applying fuzzy concepts to new generation HRA is being recognized by the experts.
Notably, a number of works identify and underline that human performance has a dynamic nature that is not fully captured by HRA. Experts are focusing on including this aspect in novel analysis methods through benchmarking studies or new sessions of simulations. These areas of study and application focus on developments in human performance in the context of highly critical tasks for humans. These areas include healthcare (surgeries, radiotherapy treatment processes, etc.), nuclear, chemical, manufacturing, and railway domains, which in addition to experiencing relatively well-known issues, cyclically remain subject to transitions towards new technologies and emerging risks.
The results from the multi-dimensional scaling provide a spatial positioning of the individual factors on a map. There are two main takeaways from these results. First, an overall dimension running from "theoretical", through "simulation-based", to "applied" indicates the nature of the considered works and resembles the evolution of a generic methodology: it starts from the definition of its basic theory and the study of its feasibility, proceeds through the demonstration of its maturity in simulations, and extends to its testing in real cases to show its readiness. The very origin of this HRA dimension is located among a number of foundational documents that are not analysed in this work and that represent most of the theoretical elements at the basis of the topic. However, as HRA is addressed by a number of methods (even grouped within generations), this evolution has been repeated several times within the scientific literature, showing a clear pattern in Fig. 8. Second, the factors are not only organized along this dimension but also highly interlaced due to transversal topics that outline general trends, as can be appreciated both graphically (Fig. 8) and thematically. The dependency of human actions is one of these transversal topics (addressed by RF1, RF3 and RF4, which are graphically adjacent in Fig. 8). The related uncertainty (RF2 and RF9, graphically adjacent in Fig. 8) and complexity (RF4 and RF9, graphically adjacent in Fig. 8), in turn, require integration with novel approaches based on fuzzy concepts (RF2, RF4 and RF9, graphically adjacent in Fig. 8). On the other hand, limited data (RF1 and RF2, graphically adjacent in Fig. 8) for HRA may require the use of appropriate expert judgement (RF1 and RF2) and ad hoc simulations (RF1, RF2, RF3 and RF5, graphically adjacent in Fig. 8). Another novel approach that is proving suitable for HRA is the adoption of BBNs (RF1, RF2, RF3 and RF5, graphically adjacent in Fig. 8), which represent one of the most recent developments together with an extended digitalization incentive (RF4 and RF5, graphically adjacent in Fig. 8). Moreover, the study of HRA applications in domains that fall outside the traditional safety-critical sectors, such as the nuclear and process industries, is common across the factors (RF3, RF4 and RF8, graphically adjacent in Fig. 8) and represents the main feature of the most delineated factor in the map (RF6).
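To illustrate how a map of this kind can be obtained from pairwise dissimilarities, a minimal sketch of classical (Torgerson) multi-dimensional scaling is given here. It is an assumption that this variant is representative; the study may have relied on a different MDS algorithm or software package. The function name classical_mds, the labels A to D and the dissimilarity values are hypothetical and are not taken from the analysed data.

```python
import math

def classical_mds(dist, dims=2, iters=500):
    """Classical (Torgerson) MDS for a symmetric dissimilarity matrix.

    Eigenvectors are found by shifted power iteration with deflation,
    so only the standard library is needed. Illustrative sketch only.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J (symmetry of dist is assumed)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Shifting by a bound on the spectral radius makes power iteration
    # converge to the largest algebraic eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in b)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Deterministic start vector; in rare cases it could be orthogonal
        # to the sought eigenvector, which this sketch does not handle.
        v = [math.sin(i + 1 + d) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigval <= 1e-12:
            break  # no further positive eigenvalue: remaining axes stay 0
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflation: remove the component that was just extracted
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords

# Hypothetical dissimilarities between four items (not the paper's data)
LABELS = ["A", "B", "C", "D"]
DIST = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]

if __name__ == "__main__":
    for label, (x, y) in zip(LABELS, classical_mds(DIST)):
        print(f"{label}: ({x:.2f}, {y:.2f})")
```

Items that are similar (small dissimilarity) end up close to each other on the resulting plane, which is the sense in which factors are described as "graphically adjacent" in the discussion above. The orientation and sign of the axes are arbitrary, so only relative positions should be interpreted.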
The key term analysis (results from methodology step 9) outlines clear research streams within the HRA literature. While the publications from the period 1999-2003 show rather predictable key words ("human reliability", "human error", "human factors" and "risk analysis"), the second period (2004-2008) shifts its focus to the fundamental HRA elements and addresses human factors, human error and their probabilistic modelling through performance shaping factors. The emerging HRA methodology known as CREAM also becomes one of the most frequent key words, demonstrating the rise of a method that is rather popular today. "CREAM" is also a key word of the period 2009-2013, which sees a focus on both a consolidated first-generation technique (THERP) and its emerging derivation (SPAR-H). The key words "process safety", "resilience engineering" and, especially, "patient safety" demonstrate that HRA is increasingly employed beyond its traditional application fields, such as "nuclear power plants", and is gradually becoming a pillar of overall industrial risk analysis. Moreover, the analysis of this period registers the appearance of "BBNs" as a key word, later confirmed in the period 2014-2019, when the adoption of this quantitative technique for HRA strengthens further. The trend concerning the application of HRA within relatively new fields is also confirmed in this last analysed period, as the key words "surgery" and "maritime safety" are registered. Finally, the digitalization wave is registered within the HRA community, as "digital main control room" and "HRA data" become key words. It is expected that data-based and BBN-related topics may eventually lead HRA towards future research involving the adoption of relatively more sophisticated machine learning techniques, mirroring recent risk analysis trends [189].
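A period-based key term analysis of this kind can be sketched as a simple frequency count per time window. The following is a minimal illustration, assuming that each publication is available as a (year, key words) pair; the function name top_keywords and the toy records are hypothetical, and the actual procedure of methodology step 9 may differ (for instance in how key words are normalised or weighted). Only the four period boundaries are taken from the text.

```python
from collections import Counter

# Period boundaries as used in the key term analysis above
PERIODS = [(1999, 2003), (2004, 2008), (2009, 2013), (2014, 2019)]

def top_keywords(records, periods=PERIODS, k=3):
    """Return the k most frequent key words for each period.

    records: iterable of (year, list_of_keywords) pairs.
    Key words are lower-cased and counted once per publication.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = {p: Counter() for p in periods}
    for year, keywords in records:
        for start, end in periods:
            if start <= year <= end:
                unique = {kw.strip().lower() for kw in keywords}
                counts[(start, end)].update(unique)
                break  # periods do not overlap
    result = {}
    for period, counter in counts.items():
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[period] = [kw for kw, _ in ranked[:k]]
    return result

# Toy records invented for illustration only (not the analysed dataset)
RECORDS = [
    (2001, ["Human error", "Risk analysis"]),
    (2002, ["human error", "Human reliability"]),
    (2006, ["CREAM", "Performance shaping factors"]),
    (2007, ["CREAM", "Human error"]),
    (2011, ["SPAR-H", "Patient safety"]),
    (2016, ["BBN", "Maritime safety"]),
    (2018, ["BBN", "HRA data"]),
]

if __name__ == "__main__":
    for (start, end), words in top_keywords(RECORDS, k=2).items():
        print(f"{start}-{end}: {', '.join(words)}")
```

Comparing the ranked lists across consecutive periods is what makes emerging terms visible: a key word that is absent from one window and ranked in the next signals a new research stream.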
In conclusion, this study promotes awareness and understanding of publications in the field of HRA. The scope of the analysis was mainly to explore and discuss publications within HRA rather than the challenges of the field itself. Nonetheless, further research can start from the results of the present study to provide additional observations and critical reflections on the discipline, also considering the social structure of the field.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
This research was supported by the project Lo-Risk ("Learning about Risk"), funded by the Norwegian University of Science and Technology - NTNU (Onsager fellowship).

Factor | Title | Year | Source

RF1
A Bayesian approach to treat expert-elicited probabilities in human reliability analysis model construction