Reliability Engineering and System Safety Enhancing human performance in ship operations by modifying global design factors at the design stage

[...] of effective modification of [...] we introduce probabilistic models linking the effect of GDFs with human performance, suitable for the ship design process. As a theoretical basis for modelling human performance, the concept of Attention Management is utilized, which combines the theories described by the Dynamic Adaptability Model, the Cognitive Control Model and the Malleable Attentional Resources Theory. Since the analysed field is characterised by a high degree of uncertainty, we adopt a specific modelling technique along with a validation framework that allows uncertainty treatment and helps potential end-users to gain confidence in the models and the results that they yield. The proposed models are developed with the use of Bayesian Belief Networks, which allow systematic translation of the available background knowledge into a coherent network, as well as uncertainty assessment and treatment. The obtained results are promising, as the models respond to changes in the GDF nodes as expected. The models may be used as intended by naval architects and vessel designers, to facilitate risk-based ship design.


Introduction
Reduced human performance is reported as one of the major factors contributing to maritime accidents, [1][2][3][4]. In recent years, studies on the quantification of human performance for various shipborne operations have been gaining increasing attention, resulting in a number of models and approaches, see for example [5][6][7][8]. At the same time, significant efforts have been made to study and implement local design modifications of ships that improve the ergonomics, and thus human performance on board a ship and ultimately ship safety, [9][10][11][12][13].
A major recent advance in the field of maritime safety is the development of the risk-based ship design methodology (RBSD), resulting in the development of larger and potentially safer ships, [14,15]. Within RBSD the assessment of the risk level with respect to predefined types of accidents is conducted in the early design stage, where a design modification is easy and cost-effective and risk is treated as a design objective, [12]. In risk analysis two aspects of the analysed accident are covered: its likelihood (accident prevention) and the anticipated consequences (accident mitigation). The latter is addressed by improving the technical and structural reliability of a ship and it has been extensively studied over the past years, [16][17][18][19][20]. The former is usually addressed by improving the performance of a human; however, the research on the effect of the overall ship design on human performance is in its infancy, [21][22][23][24], despite its relevance to the field of ship design, [25]. Such a method could be incorporated into the RBSD, improving human performance through modification of appropriate global design factors (GDFs), thus reducing the risk of accidents already at the early design stage.
However, the possible risk reduction remains unknown until now, since the common human error quantification frameworks do not readily account for the specific effect of the GDFs.
Therefore, this paper presents advances, focusing on modelling the effect of GDFs on human performance which is measured with the probability of ship-ship collision and ship grounding, which in turn is a recognized proxy for risk in the RBSD process and can easily be incorporated therein.
As a result of an extensive literature survey on the effects of human exposure to the following three GDFs: ship motion, noise and vibration, see [24][25][26], a workable approach has emerged for modelling human performance focussing on attention management, which is found suitable for the given purpose. It is based on three theories: the Dynamic Adaptability Model [29], the Cognitive Control Model [30] and the Malleable Attentional Resources Theory [31].
These foundations are used as a guide for constructing the two models presented here, which are developed using Bayesian Belief Networks (BBNs). BBNs are capable of representing background knowledge about the analysed accidents, evaluating the associated uncertainties, and supporting efficient reasoning and updating in light of new evidence, see for example [6][32][33][34][35][36][37].
Finally, the models are validated adopting a framework proposed by [38], which is found suitable for the given purpose, [39]. The framework allows for rigorous checks of the models along with the evaluation of uncertainty. As a result, the models are found to behave in response to GDF inputs as intended, and the obtained results are found valid for the given purpose. The models can offer a valid comparative assessment of ship designs with respect to human performance, which is their primary intention.
The remainder of the paper is organized as follows: Section 2 introduces the modelling framework, upon which the human performance models are developed. The models development process is presented in Section 3. Section 4 discusses the validation of the models and Section 5 concludes the paper.

Structure of the models
The aim of the presented models is two-fold. First, they quantify human performance in the presence of motion, noise and whole body vibration that are specific to each ship design. Second, they allow differentiation of various ship designs with respect to human performance.
In this section we introduce the modelling framework adopted here, which fits the requirements of RBSD and the recommendations by IMO regarding human performance modelling in the maritime domain [40]. Subsequently the causal pathway linking the GDFs with human performance is described.

Modelling framework
To develop the models we adopt a generic modelling framework by [41], with modification, as depicted in Fig. 1. The framework aims to:
• integrate in a systematic way the available background knowledge on the subject of the analysis,
• provide a sound basis for the procedure of assessing human performance subjected to GDFs,
• account for the existing uncertainty by determining its bounds for the models,
• validate the developed probabilistic models,
• facilitate the decision-making process.
Fig. 1. A modelling framework adopted for human performance evaluation subjected to GDFs, suitable for risk-informed ship design.
The relevant data, theories, models and expert judgment are obtained and gathered into a workable model, mimicking the described phenomenon. Since the level of available background knowledge varies significantly across the models, the modelling choice has to reflect that, being able to:
• account for the uncertainty,
• propagate it through the model,
• infer in the presence of uncertainty.
Considering the above, we use probabilistic causal models, also known as Bayesian Belief Networks, as a modelling technique. BBNs are a recognized tool for reasoning under uncertainty, linking various types of data into one whole for wide uncertainty assessment and treatment. Within BBNs the uncertainty is measured through probabilities and propagated through the model with the use of Bayes' theorem. The inherent feature of BBNs, two-way reasoning, allows not only forward propagation of evidence resulting in an outcome, but also backward propagation of evidence to estimate the most probable input variables, given a selected state of the output. The high-level modelling framework adopted here is depicted in Fig. 1 and elaborated in the following sections.
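The two-way reasoning described above can be illustrated with a minimal sketch of a two-node network. This is not one of the models of this paper: the node names, the equal prior and the conditional probabilities are hypothetical values chosen only to show forward propagation (total probability) and backward propagation (Bayes' theorem).

```python
# Toy two-node network: Stressor (S) -> Error (E).
# All names and numbers are illustrative assumptions, not the paper's CPTs.
P_S = {"active": 0.5, "inactive": 0.5}                 # hypothetical prior on the stressor
P_E_GIVEN_S = {"active": 0.0006, "inactive": 0.0005}   # hypothetical P(error | S)


def forward(prior, cpt):
    """Forward propagation: marginal P(error) by the law of total probability."""
    return sum(prior[s] * cpt[s] for s in prior)


def backward(prior, cpt):
    """Backward propagation: posterior P(S | error observed) via Bayes' theorem."""
    evidence = forward(prior, cpt)
    return {s: prior[s] * cpt[s] / evidence for s in prior}


p_error = forward(P_S, P_E_GIVEN_S)
posterior = backward(P_S, P_E_GIVEN_S)
print(f"P(error) = {p_error:.6f}")
print({s: round(p, 4) for s, p in posterior.items()})
```

Observing an error shifts probability mass towards the state of the input that makes the error more likely, which is the backward inference exploited in BBNs.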

Linking the effects of GDFs with human performance
To describe the process through which exposure to GDFs causally affects the performance of a crew member in relation to specific operations (collision avoidance, grounding avoidance, tasks related to maintenance of ship technical systems) a causal pathway was developed through the mediating agent of the crewmember. The process serves to do three things: 1. Represent the mechanism by which GDFs exposure impacts human performance in operations. 2. Describe the overall topography of the final model. 3. Facilitate the identification of nodes.
GDFs can be considered a type of performance shaping factor (PSF), where PSFs are aspects of the human's individual characteristics, environment, organisation, or task that specifically decrement or improve human performance, thus increasing or decreasing the likelihood of human error respectively, [42]. While there are many other PSFs that can affect human behaviour (for instance training, experience, competence, time available, workload, job design, manning, ergonomics of the equipment and procedures), these are excluded from the models as they are not affected by exposure to GDFs. All the excluded PSFs are implicitly assumed to remain constant within the model.
Other potentially relevant factors, which are not considered, are the long-term effects of GDFs on crew performance. For example, we do not consider hearing loss due to long-term noise exposure, either individually or in combination with other GDF effects. In practical terms, only the effect of GDF-affected human performance on the possible occurrence of collision and grounding, in combination with the safety critical task (SCT) being performed, is considered.
In the models the inputs and outputs are predetermined. The GDFs form the three inputs: ship motion, noise and whole body vibration (WBV). The unwanted outcomes form the output: the probability of an accident. The latter is chosen as a measure of human performance in SCT related to accident avoidance; since it is a commonly accepted and widely used metric in the maritime domain, it could be easily implemented in the RBSD process.
In reality, crew exposure to GDFs is likely to result in a plethora of effects on human performance and subsequent outcomes. However, to remain within the scope of our study the causal representation is limited to describing only those mechanisms that can describe the relationships between the predetermined inputs and outputs, as depicted in Fig. 2.
Two main paths linking GDFs exposure to human behaviour have been identified: • Path 1: Stressor effects. Exposure to a GDF acts as a stressor and can affect the perceptual, cognitive and physical capabilities of an individual (e.g. attention management), which can subsequently impair the performance of the individual (i.e. the actual behaviour produced).
• Path 2: Physical effects. Exposure to a GDF can have specific and direct effects on the behaviour produced. For example, ship motion can result in Motion-Induced Interruptions (MII). MII does not affect the underlying human capabilities of balance or fine motor control, but it exceeds the ability of the human to compensate and produce the intended behaviour. Similarly, WBV can directly impact the actual behaviour produced.
These two paths show how GDFs exposure affects human behaviour, which in turn influences the performance of SCT. It is the outcomes of an individual's actions and behaviour that determine the success or failure of a SCT. Insufficient performance of the SCT creates an antecedent for the unwanted outcome. The SCT are associated with: • Maintaining safe vessel navigation thus avoiding collision or grounding.
• Proper maintenance of technical equipment of a ship, required for performing accident evasive action.
However, insufficient task performance alone does not determine whether or not a collision or grounding occurs; the vessel must also be exposed to the collision or grounding hazard, as follows:
• For a collision to occur, another vessel must be on a collision course.
• For a grounding to occur, the ship must be in shallow water.
Fig. 2. A causal chain describing the relationship between crew, GDF exposure and unwanted outcomes, [43].
This causal mechanism makes the following assumptions: • While we recognise that individuals have differing cognitive and physical abilities, it is assumed that all individuals have the same basic set of capabilities (i.e. all individuals can manage their attention, irrespective of the extent of this capability).
• Human behaviour is influenced by diffuse and acute effects of GDF exposure, as represented by the paths in Fig. 2.
• The crew perform SCT related to collision and grounding; the tasks are appropriate, processes and procedures are optimised, and the tasks are undertaken by a competent operator.
• SCT must be performed correctly to maintain safe vessel operation.
• In the case of the bridge team, the SCT manage the exposure of the vessel to the collision and grounding hazard.
• In the case of the engine room team, the SCT manage the ability of the technical systems of a vessel to respond as requested by the bridge team.
• While it is recognized that interaction effects between GDFs within each pathway are likely to exist, these are excluded from the models, as the literature does not provide any information describing this interaction. For the review of the relevant literature the reader is referred to our earlier work [26,27].

Background knowledge related to human performance affected by the GDFs
It was found that the data on the specific effects of the GDFs (ship motion, with the exception of motion induced interruption, MII; noise; and WBV) on human performance are sparse and in many, but not all, cases generated under very specific, often non-marine, conditions. The data show that there is certainly evidence for GDFs having some effect on human performance. However, the direct effects of GDF exposure on human performance tend to be weak, whereas secondary effects acting through another mechanism (e.g. fatigue, Motion Induced Sickness, MIS) tend to be stronger and more pervasive, [44,45]. Specifically, there are some data that describe the:
• Impact of GDFs on specific human capabilities, [46].
• Impact of GDFs on specific human behaviours, [47].
• Impact of errors on task performance, [48].
However, there is very little data about the link between the following components: • Degraded human capabilities and collision or grounding related performance.
• Degraded task performance and exposure to the collision/grounding hazard.
Fig. 3 demonstrates the links in the causal chain for which some quantitative data are available (in green) and the links for which there is no data (in red). For a summary of available literature the reader is referred to our earlier work, [27]. In addition to this gap, a given level of exposure to GDFs of certain intensity or duration may not affect all individuals equally. For example, while a given frequency and amplitude of ship motion may be generally MIS-inducing, individual experiences may range from significant nausea to no negative effects whatsoever, depending on the individual's underlying susceptibility to MIS and the degree to which they have acclimatized. Moreover, with the possible exception of secondary effects on human performance caused by fatigue, attributable to sleep disruption, a holistic view could not readily be derived directly from the individual findings. As such, the relevant theoretical models available in the scientific literature guided our approach.
The approach taken here to describe a mechanism that accounts for the impact of stressors on human performance has been based on the principles of attention management. It combines the principles from three theoretical models:
• Dynamic Adaptability Model (DAM), [29].
• Cognitive Control Model (CCM), [30].
• Malleable Attentional Resources Theory (MART), [31].
Under the DAM paradigm, GDFs are seen as types of physical stressor that affect human capabilities associated with maintaining a desired level of task performance either directly or indirectly (e.g. via fatigue). When exposed to GDFs, CCM describes humans compensating through the effortful direction of more cognitive resources at the task, typically at the cost of performance in other areas. Despite the sophisticated, and potentially subconscious, strategies humans have at their disposal, there is a limit to how much an individual can compensate without experiencing degradation in primary or secondary task performance. In addition, the extent to which a human can compensate for task demands is not fixed. MART describes this compensatory capability changing as a function of task demands and the associated arousal an individual experiences: the attentional resources available vary as a function of load. When humans are in a state of under-load (i.e. bored) their pool of attentional resources is relatively small and will increase proportionately with the demands placed on them. However, there is a limit to how much the pool of attentional resources can grow. When task demands exceed the pool of attentional resources available (either transiently or when the upper attentional resource limit is exceeded), performance can break down and errors may be made.
Generally, task performance is only expected to degrade and become insufficient when compensatory mechanisms have failed. However, the literature does not allow prediction of how and when (chronologically) an operator would fail, under what conditions of GDF exposure, and what the specific effect on behaviour (i.e. type of error) would be.
In the models presented here, the main element is the performance of the SCT by engineers and a bridge team. The SCT of the engine team is to maintain the ship systems, ensuring that they function properly when needed. This is a simplified description of the effect of GDFs on technical failure, through the mediating agent of the crewmember. The SCT of the bridge team is to perform tasks associated with an accident evasive action. These tasks are complex and distributed in time, and thus can be decomposed into three major phases, [5]: detection (D), assessment (A), action (Act). These three phases (DAAct) reflect the basic cognitive functions of observation, interpretation and planning, and execution, [49,50].
Fig. 3. Supporting data for links in the causal chain, [43].
The performance of various tasks is governed by attention management, which is the supervisory human capability that directs, allocates and regulates the attentional resources required for the tasks. This high-level supervisory capability manages lower-level tasks such as perception, cognition, decision-making, memory, fine motor control and locomotion.
To evaluate the effect of GDF exposure on human performance the approach based on attention management theory is taken. Therein the effect of GDFs is represented as a stressor that sits either above or below the threshold of attentional capacity for any given SCT. If the stressor exceeds the attentional capacity then the attention management is degraded, whereas no negative effect is expected if the stressor can be managed within the available attentional capacity.
Representing ship motion, noise and WBV GDFs as stressors interacting with an individual's attention management capabilities provides an evidence-based mechanism for human performance that has been used to develop the models presented here.

Aggregation of the background knowledge into a model
The GDFs may have either an acute or diffuse effect. In case of an acute effect, the threshold refers to the GDFs level at which an individual may be unable to physically compensate for GDF exposure and perform actions as intended. In case of diffuse effect, the threshold refers to the amount of motion, vibration or noise an individual can endure before it acts as a stressor (with a corresponding stress response). The exact value of the threshold will vary between individuals and it is dependent upon previous experience, exposure duration and sensitivity. If the thresholds are not exceeded, then the attention of a crew-member is not affected, otherwise the attention management capability is degraded. It is not yet known what underlying psychological factors set an individual's baseline tolerance to GDF exposure. In reality, the overall impact of stressor exposure is likely to be determined by an individual's personal threshold for feeling the effects of stressor exposure, and the effectiveness of the strategies they adopt for managing mental resources to preserve task performance (e.g. task prioritisation and shedding).
Representing GDF exposure effects on safety behaviour via the attention management path provides a structure compatible with the introduction of Error Producing Conditions (EPCs) using the method called NARA, which belongs to the latest (third) generation of human reliability assessment (HRA) methods and pertains to the nuclear field, [51].
The models begin with the discretization of the GDFs into the classes, which reflect the available background knowledge in the field. For the review of the effects of GDFs on humans, the reader is referred to earlier work, [27,28]. Based on that discretization, a type of an effect that the GDF has on a crewmember is determined, namely diffuse or acute along with the thresholds.
The diffuse effect of motion, vibration and noise, as well as the acute effect of the latter, directly affects the human capabilities of the bridge crew and the engine team, described by the concept of attention management. If attention management is degraded, the probability of human error while performing a given type of SCT increases. If attention management remains unaffected, the probability of error stays at its baseline level, as obtained from NARA. Acute effects of all GDFs except noise do not affect attention management capability (AMC), but they directly degrade the physical ability of a crewmember. However, in the case of the engine team, this effect is not anticipated, since we assume that preventive maintenance on ship systems is not carried out if the levels of GDFs are above thresholds. This is in line with most of the operational guidelines of ships, where crews abstain from certain tasks if the weather conditions do not allow for safe performance of these tasks.
The probability of an error while performing SCT by the bridge team (detecting, assessing and acting along with communicating within the bridge team) together with the probability of technical failure, due to erroneous maintenance of critical ship systems by the engine team, yields the probability of not making proper and effective evasive action while on a course leading to an accident.
If the other ship on the collision course does not take appropriate action either, the collision is inevitable. In case of grounding accident, the lack of action from the ship on a grounding course results in the accident.
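The combination described above can be sketched numerically. The sketch assumes that the bridge-team error and the technical failure are independent and that either one prevents an effective evasive action; the actual node logic is defined by the conditional probability tables of the models, which this sketch does not reproduce. Function names and input values are hypothetical.

```python
def p_no_evasive_action(p_bridge_error, p_technical_failure):
    """P(own ship takes no effective evasive action), assuming the two
    failure causes are independent and either one is sufficient."""
    return 1.0 - (1.0 - p_bridge_error) * (1.0 - p_technical_failure)


def p_collision_given_course(p_own_fails, p_other_fails):
    """P(collision | ships on a collision course): both ships fail to act.
    Independence between the two ships is an assumption of this sketch."""
    return p_own_fails * p_other_fails


# Illustrative inputs only (not results of the paper).
p_own = p_no_evasive_action(p_bridge_error=0.0006, p_technical_failure=0.0006)
p_coll = p_collision_given_course(p_own, p_other_fails=0.0005)
print(f"P(no evasive action, own ship) = {p_own:.8f}")
print(f"P(collision | collision course) = {p_coll:.3e}")
```

Both quantities are conditional on the vessel already being exposed to the hazard, in line with the causal chain described earlier.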
The models are presented in Figs. 4 and 5, and elaborated in detail in the following sections.

Quantification of the GDFs and their effect on human capabilities
In the presented models the variables associated with the GDFs are as follows:
• noise (dB),
• vibrations (Hz),
• motion, expressed by the accelerations in the horizontal and vertical planes (in RMS g), as well as the frequency of vertical motion (in Hz).
The variables are discretised according to the best available background knowledge, summarised in earlier work, see [27,28], to reflect the various effects that the GDFs, alone or in combination, may have on a person.
In the models presented here we assign equal a priori probabilities to the states of the GDFs. However, if the models are used to quantify the human performance associated with a specific ship design, these probabilities are updated accordingly to reflect the anticipated behaviour of the designs under evaluation. The discretization and a priori probabilities for the GDFs are gathered in the tables presented in Appendix A (Tables A8-A12).
In the models the GDFs affect a person in a binary manner: either they act as a stressor, which degrades the human capability and thus human performance, or they do not have any effect whatsoever. For the GDFs to become an active stressor their values need to be above a threshold. The probabilities of GDFs sitting below or above a threshold, given a set of GDFs, are evaluated by the conditional probability tables presented in Appendix A (Tables A13-A20). For the summary of relevant background knowledge adopted to develop these tables see [27,28].
The quantification of the effect of GDF stressors on the capabilities of a human, through the concept of the AMC, is presented in Table 1. If the stressors are active (the GDF sits above the threshold), the AMC is degraded with a probability of 1. The analogous assumption is made for the situation where the stressors are inactive.
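As a minimal sketch of the chain from a discretised GDF to the AMC, the example below marginalises a threshold table over the GDF states and then applies the deterministic stressor-to-AMC link. The three noise states, the threshold probabilities and the design-specific prior are hypothetical; they are not the values of Tables A8-A20.

```python
# Hypothetical discretisation of one GDF (noise) with equal a priori probabilities.
noise_states = ["low", "medium", "high"]
equal_prior = {s: 1.0 / len(noise_states) for s in noise_states}

# Hypothetical CPT: P(GDF above threshold | noise state).
p_above_given_state = {"low": 0.0, "medium": 0.3, "high": 1.0}


def p_stressor_active(prior, cpt):
    """Marginal probability that the GDF sits above its threshold."""
    return sum(prior[s] * cpt[s] for s in prior)


def p_amc_degraded(p_active):
    """Deterministic link: the AMC is degraded with probability 1 when the
    stressor is active and with probability 0 when it is inactive."""
    return 1.0 * p_active + 0.0 * (1.0 - p_active)


# Equal priors, as used when no specific design is evaluated.
p_generic = p_amc_degraded(p_stressor_active(equal_prior, p_above_given_state))

# Hypothetical design-specific prior replacing the equal one.
design_prior = {"low": 0.7, "medium": 0.2, "high": 0.1}
p_design = p_amc_degraded(p_stressor_active(design_prior, p_above_given_state))

print(f"P(AMC degraded), equal priors:  {p_generic:.4f}")
print(f"P(AMC degraded), design priors: {p_design:.4f}")
```

Replacing the equal prior by a design-specific one is what allows two ship designs to be compared through the same network.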

Detection, assessment and action
Based on the earlier research, [5,49,50], and rational thought a sequence of SCT related to accident avoidance can be broken down into three major phases: detection, assessment, action (DAAct). These are linked with the basic cognitive functions of human such as observation, interpretation and planning, and execution. In the models presented here the performance of a human with respect to these functions is modelled with the use of NARA guidelines and the generic task types (GTT) that it offers. To limit the complexity of the models, a single GTT was sought to represent all relevant navigational tasks performed by the OOW that are important in executing collision avoidance and

[Figs. 4 and 5. Model diagrams; recoverable panel labels: GDFs and their effects on bridge personnel; GDFs and their effects on engine personnel; Human capabilities; Hazard exposure; Human behaviour while performing safety critical tasks; Safety critical task performance.]

This task refers to the triplet DAAct, and it determines whether or not the SCT of detection, assessment and action is performed, which implies that it is also sufficient. This is an all-encompassing definition including whatever tasks are required to maintain situational awareness and to respond appropriately to avoid collision or grounding. The probability of not performing DAAct is calculated based on the integration of three nodes: Helmsman present, GDF Physical Effect and Attention Management Capability (AMC), as shown in Table B21 and depicted in Fig. 4.
In the case of a non-degraded AMC ($\mathrm{AMC}=z_1$) and the absence of a helmsman ($H=h_0$), we assume a very weak physical effect of the GDFs ($\mathrm{PE}=x_1$) on not performing the DAAct ($\mathrm{DAAct}=\theta_0$). However, this effect is represented as being more significant in combination with a degraded AMC ($\mathrm{AMC}=z_0$), thus representing an additional drain on cognitive resources compensating for physical task disruption. These values were estimated using judgement. To set the probability of not performing DAAct when a person is not affected by GDF exposure ($\mathrm{PE}=x_0$), reflecting a baseline error rate, the NARA GTT value for Task C1 of 0.0005 is used.
NARA categorises the factors that negatively influence human performance as one of eighteen Error Producing Conditions (EPCs), see Table B28. The EPC that best represented the causal mechanism from GDF exposure to human performance was EPC No. 15 'Poor Environment'. This EPC represents the stressor effect of GDF exposure on attention management capability. The potential strength of effect of this EPC was set using the Assessed Proportion of Affect (APOA) variable. The APOA level was set based on the application of the NARA methodology to subjectively determine an appropriate value, nominally between 0 (no effect) and 1 (maximum effect). However, based on the guidance available for NARA, it was decided to cap the maximum APOA associated with the EPC at 0.1.
Then, using NARA guidelines and given exposure to EPC No. 15, the human error probability (HEP) for the above-mentioned situation is calculated as 0.0006, see Table B21. This HEP corresponds to the probability of not performing DAAct. The NARA calculation allows inclusion of multiple EPCs and an Extended Time Factor (ETF). In the models presented here GDFs are represented using only one EPC and there is little justification to include the ETF. Thus, the HEP is calculated based on the following formula:
$$\mathrm{HEP} = \mathrm{GTT} \times \left[(\mathrm{EPC} - 1) \times \mathrm{APOA} + 1\right]$$
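As a minimal numerical check, the single-EPC calculation can be sketched assuming the multiplicative form HEP = GTT × [(EPC - 1) × APOA + 1] without ETF. The EPC multiplier of 3 used below is a hypothetical value, chosen only because together with the capped APOA of 0.1 it gives the factor of 1.2 implied by the reported values (0.0005 to 0.0006 for Task C1, 0.006 to 0.0072 for Task D1); it is not taken from the NARA tables.

```python
import math


def nara_hep(gtt, epc_multiplier, apoa):
    """Human error probability for one EPC and no Extended Time Factor,
    assuming the form HEP = GTT * ((EPC - 1) * APOA + 1)."""
    return gtt * ((epc_multiplier - 1.0) * apoa + 1.0)


EPC_MULTIPLIER = 3.0  # hypothetical, reproduces the implied factor of 1.2
APOA = 0.1            # maximum APOA stated in the text

hep_c1 = nara_hep(0.0005, EPC_MULTIPLIER, APOA)  # baseline GTT for Task C1
hep_d1 = nara_hep(0.006, EPC_MULTIPLIER, APOA)   # baseline GTT for Task D1

print(f"HEP C1 = {hep_c1:.4f}, HEP D1 = {hep_d1:.4f}")
assert math.isclose(hep_c1, 0.0006)
assert math.isclose(hep_d1, 0.0072)
```

With APOA set to 0 the function returns the baseline GTT value, which corresponds to the case of an unaffected AMC.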

Verbal communication on safety critical data
If evasive manoeuvres are performed in the presence of a helmsman, which requires appropriate communication of information between the OOW and the helmsman, the node D1 - Verbal communication of safety critical data is evaluated, as presented in Table B22. This node determines whether or not the verbal communication of safety critical data to the helmsman, required to avoid an accident, is sufficient. It relates to the NARA GTT No. D1 of the same name. This node is passed to its descendant called Evasive Action, where it is used to quantify the probability of this action, differentiating between the situations where the helmsman is present or not. The NARA GTT value for Task D1 of 0.006 is used to set the probability of insufficient performance ($D_1=d_0$) unaffected by GDF exposure ($\mathrm{AMC}=z_1$), to reflect a baseline error rate.
The HEP of 0.0072 was calculated using NARA given exposure to EPC No. 15, representing the effect of GDF exposure via a degraded AMC ($\mathrm{AMC}=z_0$). It is assumed that Verbal communication of safety critical data is unaffected by the GDF Physical Effect.

Evasive action
Evasive action is assessed in the node Evasive Action, as shown in Table B23. It is assumed that if Detection, assessment and execution of simple actions and Verbal communication of safety critical data, where applicable, are performed then Evasive Action will be executed. If the helmsman is present then the probability of not executing the evasive action is capped at 0.0001, in line with the NARA Human Performance Limiting Value for 'Actions taken by a team of operators'.

Technical failure
The Technical failure node quantifies the probability of the relevant systems not functioning as a result of lack of maintenance or poor maintenance caused by the GDFs affecting the AMC of a crewmember responsible for the maintenance. This node was included in recognition of the importance of maintenance in sustaining the functionality of vessel equipment such that it performs as it is designed to. Errors during maintenance on systems that provide the manoeuvring capability of the vessel can limit the vessel's response to control inputs associated with evasive action, hence affecting the probability of an unwanted outcome. The Technical failure node determines whether or not maintenance actions performed on equipment that provides the vessel's manoeuvring capability have been completed successfully, see Table B24.
The probability of insufficient Maintenance Task Performance ($\mathrm{MTP}=m_0$) is calculated based on the Attention Management Capability-2 node, which follows the principle presented in Table A19. To set the probability of insufficient Maintenance Task Performance unaffected by GDF exposure, reflecting a baseline error rate, the NARA GTT value for Task C1 of 0.0005 is used. The HEP of 0.0006 was calculated using NARA given exposure to EPC No. 15, representing the effect of GDF exposure via the Attention Management Capability-2 node, as presented in Table B25.
The GDF Physical Effect on the probability of insufficient safety behaviour is not anticipated. This comes from an assumption about lack of preventive maintenance carried out if the levels of GDFs are above thresholds. This is in line with most of the operations guidelines of ships, where crews abstain from certain tasks if the weather conditions do not allow for safe performance of these tasks.

Evasive action of another ship
Evasive action of another ship is a node, which accounts for the behaviour of OOW on a ship that is encountered by the own vessel. The probability for this node is assigned based on NARA calculations, and subject to alternative hypothesis testing, as elaborated in Section 3.5.

Failure of another officer of the watch
This node exists in the model quantifying the performance of a navigator conducting SCT related to avoiding grounding accident. The node reflects the team-work nature of bridge navigation, which is a process, where dangerous situations are anticipated well before they arise. In case of collisions the time horizon for such anticipation is usually expressed in minutes, 2 however in case of groundings it can be in hours or even days. A navigator knowing the passage plan is aware of any areas that are potentially dangerous, especially if the ship course is not adjusted and a ship leaves the pre-planned and safe route, [52]. Moreover, before a new bridge watch commences the relieved officer reports to the one who takes over all the anticipated course alterations and dangers to navigation to be expected during his watch, thus increasing his situational awareness, [53].
Such practice is recognized in the grounding model by the node Failure of another officer of the watch. This node quantifies the probability that an officer leaving the bridge will not pass the relevant information on anticipated hazardous waters to the officer taking over.
The probability for this node is assigned based on NARA calculations, and subject to alternative hypothesis testing, as elaborated in Section 3.5.

An accident
The last nodes of the models quantify the probability of an accident, namely collision and grounding, for a given set of GDFs. However, the models do not look to estimate probability of accident per se. Instead they take it as a well-understood measure of human performance, to propose a way to reflect human performance positively, which can be maximized in ship design phase. The logic behind the nodes is presented in Tables B26 and B27.

Uncertainty assessment and treatment
The parameters of the models presented here are developed based on a certain amount of background knowledge, which is by no means complete. This incompleteness is the primary source of uncertainty associated with the models and their parameters, [54,55]. Some elements of the model and their relations are characterised by a larger amount of information, knowledge and understanding, and thus have lower uncertainty; others are known and/or understood less, resulting in larger uncertainty.
The uncertainty assessment, together with the sensitivity analysis of the models, provides a valuable tool for screening the models for important variables, i.e. those that are both uncertain and to which the models are sensitive. By ranking the variables in the models by their importance, we define the set of variables that most affect the credibility of the models. Finally, uncertainty treatment is applied to the important variables, as described in the following sections.

Sensitivity analysis
A sensitivity-value approach presented by Coupé and van der Gaag [56] is applied here. The purpose of a sensitivity analysis is to investigate the effect of changes in the assigned probabilities of the network variables on the probabilities of a specific outcome variable. In a one-way sensitivity analysis, every conditional and prior probability in the network is varied in turn, keeping the others unchanged. Based on the findings from the sensitivity analysis of the models, the following can be concluded:
• The models are highly sensitive to the following parameters: Maintenance Task Performance, C1 - Detection, assessment and execution of simple actions, and D1 - Verbal communication of safety critical data.
• The models are moderately sensitive to Evasive action of another ship and Helmsman present. However, the effect that these parameters have on the output is significantly lower than the effects of C1 and D1, as specified above.
• The remaining nodes have very low sensitivity values, meaning that their effects on the models outputs are rather minor.
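
The one-way procedure described above can be illustrated with a minimal sketch. The three-node chain, the parameter names and all numerical values below are hypothetical and serve only to show the mechanics; they are not the networks or the conditional probability tables of the models presented here. The sensitivity value is approximated by a central finite difference of the output probability with respect to each parameter, varied in turn while the others are kept unchanged.

```python
# Toy one-way sensitivity analysis. The model is an illustrative stand-in,
# NOT the Bayesian network described in this paper.

def p_accident(p):
    """Output probability of a hypothetical three-node chain."""
    # Marginal error probability over the two attention states.
    p_error = (p["p_degraded"] * p["hep_degraded"]
               + (1.0 - p["p_degraded"]) * p["hep_normal"])
    # An accident additionally requires that the other ship does not evade.
    return p_error * p["p_no_evasive"]


def sensitivity_values(params, model, h=1e-6):
    """Vary each parameter in turn by +/-h, keeping the others unchanged."""
    values = {}
    for name in params:
        up = dict(params, **{name: params[name] + h})
        down = dict(params, **{name: params[name] - h})
        # Absolute slope of the output with respect to this parameter.
        values[name] = abs(model(up) - model(down)) / (2.0 * h)
    return values


# Hypothetical parameter values chosen for illustration only.
params = {
    "p_degraded": 0.2,       # P(attention management degraded)
    "hep_normal": 0.0005,    # P(error | normal attention)
    "hep_degraded": 0.0006,  # P(error | degraded attention)
    "p_no_evasive": 0.1,     # P(other ship takes no evasive action)
}

sens = sensitivity_values(params, p_accident)
# Parameters ordered from most to least influential on the output.
ranking = sorted(sens, key=sens.get, reverse=True)
```

In this toy chain the output is linear in each single parameter, so the finite difference recovers the slope up to rounding; with evidence entered in a real network the sensitivity function is in general a ratio of two linear functions and the slope depends on the point of evaluation.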

Evidential uncertainty assessment
For the most sensitive model parameters the evidential uncertainty assessment is carried out, and the results are presented in Table 2. To rank the uncertainty we apply the following qualitative scoring system, as introduced in [57]:

Significant uncertainty.
All of the following conditions are met: • The phenomena involved are not well understood; models are nonexistent or known/believed to give poor predictions.
• The assumptions made represent strong simplifications.
• Data are not available, or are unreliable.
• There is lack of agreement/consensus among experts.

Minor uncertainty.
All of the following conditions are met: • The phenomena involved are well understood; the models used are known to give predictions with the required accuracy.
• The assumptions made are seen as very reasonable.
• Much reliable data are available.
• There is broad agreement among experts.

Table 2 The qualitative assessment of evidential uncertainty for the models presented here, [58].

Model parameter
Justification for the evidential uncertainty score Evidential uncertainty score

Moderate
Evasive action of another ship The node represents the performance of navigation tasks critical in accident avoidance on board other ship. It is quantified based on NARA.

Moderate
Helmsman present At the moment this node is quantified fully based on judgement. However, more detailed assessment is possible if needed, for instance by performing survey among shipping companies.
Moderate-Minor

2 In situations where two ships are approaching each other on parallel courses at very low relative speed, the time to the closest point of approach (TCPA) can be expressed in hours. This measure, along with the closest distance between two encountering ships (CPA), is an example of the operational measures of safety during sea passages. However, in most cases the time needed to resolve potential collision encounters is not more than tens of minutes.

Moderate uncertainty.
Conditions between those characterising significant and minor uncertainty, e.g.: • The phenomena involved are well understood, but the models used are considered simple/crude.
• Some reliable data are available.
Subsequently, by combining the results of the sensitivity and uncertainty assessments, the parameter importance ranking is carried out, see Table 3. Three parameters have a high importance score, namely Maintenance Task Performance, C1 - Detection, assessment and execution of simple actions, and D1 - Verbal communication of safety critical data. For those three elements the uncertainty treatment is carried out as presented in the following section.

Uncertainty treatment
The uncertainty associated with the model parameters, which are believed to be to some extent known, is expressed through the probabilities. For example, the state of a specific GDF associated with a given ship design, say noise, can be described by a probability density function that reflects the anticipated variability of this parameter for the given design across a set of operational conditions.
In a similar way we can judge the probability of a helmsman being present during a critical situation. This probability can be based on the analysis of shipborne operations and the way the bridge watch is organized and conducted, knowing that for most of the daytime the bridge is manned by one person only, and that a watchman is present during night shifts and when navigating in coastal waters.
However, the majority of the uncertainty in the models comes from the limited knowledge related to human performance given the presence of stressors (GDFs). Despite the extensive amount of data that NARA is built upon, it is very difficult to point to a given probability as the probability of erroneous behaviour of a human in certain situation.
The probability of human error (HEP) is estimated with the NARA methodology for a set of factors and conditions, as presented in Table 4. However, our background knowledge on the link between human capability and human behaviour is heavily limited, as depicted in Fig. 3, and we are unable to gain new information at this point. Acknowledging that fact, we have to select an appropriate way of dealing with this type of uncertainty, one that improves the model performance and the communication of results, [59]. We do not expect uncertainty reduction, since our limited knowledge will not improve, but we would expect some bounds for the results to fall within. Since NARA itself offers intervals for the probabilities to account for the anticipated uncertainty, adopting these intervals seems a rational way forward in our case.
Alternative hypothesis testing (AHT) does not, in principle, reduce the uncertainty, but it helps to determine its bounds, which is relevant for the decision-making process. Even though the results obtained are not more accurate than the results without AHT, the calculated bounds warn a decision-maker about the quality of a given parameter, a warning that is missing in the case of point estimates. In our case, the AHT is performed for the variables that score high in the importance ranking. These are associated with the performance of a human affected by external stressors. The human error probability (HEP) is calculated for a given Generic Task Type (GTT) and a given set of stressors described by so-called Error Producing Conditions (EPC).
The nominal HEPs for a given GTT are defined as intervals, whereas the EPCs are chosen by the analysts. This provides several combinations leading to several coexisting alternatives (AH_1, AH_2, …, AH_n). Due to lack of knowledge we cannot point to any specific hypothesis as true for the given case; however, we believe that we can judge their probabilities of occurrence using subjective probabilities (p(AH_1)+p(AH_2)+…+p(AH_n)=1), [70,71]. By doing that we obtain a set of HEPs rather than a single-value HEP.
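
A minimal sketch of how such a set of HEPs could be produced is given below. It assumes the HEART-style multiplicative rule on which NARA is built, HEP = GTT × Π[(EPC_i − 1) × APOA_i + 1], where APOA is the assessed proportion of affect. Only the nominal value of 0.0005 for Task C1 is taken from the text; the lower and upper GTT values, the EPC maximum effect, the APOA and the subjective weights are hypothetical placeholders, and the function name `nara_hep` is introduced here for illustration only.

```python
def nara_hep(gtt_hep, epcs):
    """HEART/NARA-style HEP: the GTT value scaled by each EPC in turn.

    epcs is a list of (max_effect, apoa) pairs, with apoa in [0, 1].
    """
    hep = gtt_hep
    for max_effect, apoa in epcs:
        hep *= (max_effect - 1.0) * apoa + 1.0
    return min(hep, 1.0)  # a probability cannot exceed 1


# Alternative hypotheses on the nominal GTT value.
# 0.0005 is the C1 value quoted in the text; the bounds are HYPOTHETICAL.
gtt_alternatives = {"AH1": 0.0001, "AH2": 0.0005, "AH3": 0.001}

# HYPOTHETICAL subjective probabilities of the hypotheses; they must sum to 1.
weights = {"AH1": 0.25, "AH2": 0.5, "AH3": 0.25}
assert abs(sum(weights.values()) - 1.0) < 1e-12

# HYPOTHETICAL EPC (max_effect=2.0, apoa=0.2), i.e. a factor of 1.2, chosen
# only so that 0.0005 maps to roughly 0.0006; these are not NARA table values.
epcs = [(2.0, 0.2)]

# One HEP per hypothesis: a set of values instead of a point estimate.
hep_set = {ah: nara_hep(gtt, epcs) for ah, gtt in gtt_alternatives.items()}
lower, upper = min(hep_set.values()), max(hep_set.values())

# Weighted value, should a single summary number be needed.
expected_hep = sum(weights[ah] * hep_set[ah] for ah in hep_set)
```

The interval (`lower`, `upper`) is the quantity of interest for communicating the quality of the parameter; the weighted value is only a convenience for propagating a single number through the network.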
The background knowledge and the associated uncertainty are propagated through the model, and the output is produced. The latter is a two-state variable Z=(z1, z2), where z1=accident and z2=no_accident, and the quantity of interest is the probability of the accident given the set of input variables (X), that is, p(Z=z1|X).

Table 3 The qualitative assessment of model parameters importance for the models presented here [58].

Model parameter | Evidential uncertainty score | Sensitivity score | Importance score

Table 4 The lower and upper probabilities for the parameters obtained from NARA, [51].
GTT

The following parameters of the models need to be described with the use of imprecise probabilities, characterised by lower and upper bounds:
1. C1 - Detection, assessment and execution of simple actions.
2. D1 - Verbal communication of safety critical data.
3. Maintenance task performance.
4. Technical failure.
5. Evasive action of another ship.
6. Failure of another officer of the watch.
All the items but 4 are based on the same paradigm and utilize the same upper and lower bounds for factors specified by NARA. The item 4 provides the spectrum of failure probabilities for a technical system, for normal and degraded maintenance regime, based on the historical data and experts' judgement. The values of the imprecise probabilities adopted for the items 1-3 and 5-6 are gathered in Table 4. The probability of evasive action of another ship and the failure of another officer of the watch correspond to the probabilities of C1.
However, there is no evidence for the probability of technical failure during erroneous maintenance, which results from reduced human performance. In such a situation the following three interpretations, referred to as alternative hypotheses (AH), may hold.
First it is intuitive to expect the probability of technical failure to increase, compared to normal maintenance. The most straightforward way to model that is to assume, that in case of improper maintenance (MTP=m0) the probability of technical failure equals 1.

$$AH_1:\; p(\text{techn. failure} \mid MTP = m_0) = 1$$

Second, in the case of erroneous action, the redundancy of the system may prevent a technical failure from affecting the operation of the safety critical systems; thus the probability of technical failure remains the same as in normal operations:

$$AH_2:\; p(\text{techn. failure} \mid MTP = m_0) = p(\text{techn. failure} \mid \text{normal operations})$$

Third, in light of our limited knowledge of what will happen in the case of erroneous maintenance in terms of the availability of the safety critical technical systems of a ship, we can assume that failure and lack of failure are equally likely:

$$AH_3:\; p(\text{techn. failure} \mid MTP = m_0) = 0.5$$

To account for the three interpretations, we consider each of them as an alternative hypothesis and assign a weight to each, corresponding to our belief in the given hypothesis being true. Since at the moment there is no support for giving preference to any of them, we assign equal weights to the hypotheses on the probability of technical failure given insufficient maintenance, as follows:

$$p(AH_1) = p(AH_2) = p(AH_3) = \tfrac{1}{3}$$

However, in light of new evidence these weights and probabilities may be easily updated in the models.
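
The treatment of the three hypotheses on technical failure can be sketched as follows. The probability of technical failure in normal operations (`p_normal`) is a hypothetical placeholder, as are the function name and the alternative weights in the usage lines; only the three hypotheses and the equal weighting follow the description above.

```python
def technical_failure_given_bad_maintenance(p_normal, weights=None):
    """Return the per-hypothesis values and their weighted combination.

    AH1: failure is certain; AH2: same as normal operations;
    AH3: failure and no failure are equally likely.
    """
    hypotheses = {"AH1": 1.0, "AH2": p_normal, "AH3": 0.5}
    if weights is None:
        # No preference for any hypothesis: equal weights.
        weights = {ah: 1.0 / len(hypotheses) for ah in hypotheses}
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("hypothesis weights must sum to 1")
    combined = sum(weights[ah] * p for ah, p in hypotheses.items())
    return hypotheses, combined


# HYPOTHETICAL probability of technical failure in normal operations.
hyps, p_equal = technical_failure_given_bad_maintenance(p_normal=0.01)
bounds = (min(hyps.values()), max(hyps.values()))

# Updating the weights when new evidence favours AH2 (values HYPOTHETICAL).
_, p_updated = technical_failure_given_bad_maintenance(
    p_normal=0.01, weights={"AH1": 0.2, "AH2": 0.6, "AH3": 0.2}
)
```

Keeping the hypotheses and the weights as separate inputs means that either can be revised without touching the rest of the calculation.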

Validation of the models
This section shows the results of validation framework applied to the models described in the earlier sections. The validation framework assesses the plausibility of the model as a tool for serving its envisaged functions: (i) to convey an argumentation based on available evidence, (ii) to discriminate different ship designs.

Face validity
The face validity of the models is drawn from the evidence base that drove the development of the model. The major components that make up the model of the attention management mechanism were derived from the literature, as described earlier in this paper.
The input node thresholds for effect on human performance were derived from the literature; however, the probabilities were set by a combination of expert judgement and human reliability assessment (HRA) methods. The NARA guidelines selected are evidence-based to the extent that they utilise the CORE-DATA set to set basic human error probabilities derived from real-world human reliability data. However, significant expert judgement is required to select which factors are appropriate for the context and to determine the magnitude of their effect on human performance in that context. Despite the adoption of HRA methods in a number of domains (e.g. nuclear, oil & gas), the subjective judgements required by the method remain subject to potential inter-rater variability.
There are very few models inhabiting the modelling space presented here, hence it is difficult to elaborate on their consistency with the parameters of these models. Some elements existing in the models correspond to the expert driven risk models for collision and grounding presented in the literature. Structurally, the concept of degraded and normal attention and the resulting human behaviour are aspects the models have in common with a model proposed by DNV, [13,74]. However, the relationship of the attention management construct to other elements in the model is unique to the models presented here. Overall, the face validity of the models presented here is high.

Table 5 The probabilities of collision and grounding obtained in the course of extreme conditions tests.
The probability of collision | The probability of grounding

Table 7 The summary of validity tests.
Validity criteria | Score
Face | High
Content | Moderate
Predictive | Moderate-High
Concurrent | Moderate

Content validity
The development of the models was based on scientific literature as far as possible. The pre-selected GDFs of ship motion, noise and WBV are a subset of the potential performance-shaping factors that could influence the outcomes of collision or grounding. Other factors can be imagined that are influenced by vessel design and have a greater impact on human performance; however, the three types of GDFs were chosen arbitrarily.
The states for the GDF input variables of ship motion, noise and vibration were derived from the literature around thresholds of effect that, despite incomplete data, can be justified with some caveats.
Other states for nodes within the model represent a simplification due to incomplete knowledge (attention management being represented by only two states: normal or degraded), or to perform a function within the model (switching GDF stressor effects and GDF physical effects on or off). These binary nodes are understood to have low content validity.
Nodes outside the GDF input variables fail the test of dimensional consistency as they do reflect states either above or below a level, but cannot be set at the level itself. Hence these reduce the content validity of the model.
The presented models are developed to the limits of the relevant literature and expert knowledge. The integration of HRA is relevant here. Despite HRA's general limitations as a method in terms of reliability, and its application here being outside its normal use, it was necessary as a means to generate probabilistic bounds for the nodes within the risk model.
Overall, the content validity of the models is moderate, being compromised by the limitations imposed by the limited scientific knowledge in the domain.

Concurrent validity
Concurrent validity (CV) is tested in two ways. First, by comparing the elements of the models with another model developed for a similar purpose. Second, by comparing the models with the trends in the results of experiments designed for this specific purpose and conducted with a full mission bridge simulator, as reported in [75].
The former is complicated, since there is no other model that would link the GDFs with the human task performance. The external models that were used for this validation exercises, were developed for specific purposes, which substantially differ from the purpose of the models presented here. Therefore, the inter-models comparison is feasible only with respect to selected elements of the models.
For instance, the elements describing human behaviour in a given situation can be compared with respect to the paradigms governing them, the discretization level adopted or the input-output relation. In some cases a qualitative comparison is feasible; in others a quantitative evaluation is the only option.
There are several models that can be deemed relevant for this step of validation, for example [5,6,76-78]. However, only one, by DNV, is found suitable for the concurrent validity test, [13,74].

Comparison with other relevant models.
A model by DNV estimates the probability of collision, accounting for the navigational parameters, safety culture, personnel factors, management factors, technical reliability and other vigilance. The model assumes the failure of a navigator along the path of detection-assessment-action for an accident to happen. As a modelling technique BBN is adopted and experts' judgment is used as a primary source of information to parameterize the model. The model was developed to measure the effects of bridge layout on the collision and grounding probability through the estimation of performance of a navigator. The performance affects the causal path of: detection, assessment and action, moreover, the probability of technical failure of ship machinery is accounted for, likewise in the models described here.
Looking into the core of the DNV model, where the effect of stressors on the performance of a navigator is estimated, we can say that our models and the DNV model can be compared at a generic level. The basic causal chain, stressor → attention of a person → detection → assessment → action, is retained in both models, so the logic behind their structure can be seen as similar.
The structure of our models is based on the concept of Attention Management, and the parameters are found through the HRA methods and experts' judgments. The DNV model is to a large extent based on experts' judgment and historical data; therefore the structure and parameters of these two models are hardly comparable.
Despite that difference, the validity of our models shall not be seen compromised. Since the purposes of the two compared models are substantially different, their input nodes and CPTs are not expected to match fully.

Comparison with simulator studies.
By comparing the effect of GDFs on the performance of a navigator with the results obtained from experiments with a full mission ship bridge simulator, one can evaluate the level of concurrent validity of the relevant parts of the models. The results of simulator studies conducted for the purpose of validating the models shown here are reported in [75]. The performance of a navigator is measured in two ways for each simulation run. First, two navigation measures are evaluated, i.e. normalised distance to the relevant target ship and normalised track deviation. Second, the instructor ratings of the rapidity/quality of mariners' responses are recorded. The studies account for the effect of noise on human performance. Due to technical limitations of the simulator, the effect of motion and vibration could not be realistically modelled.
In the report authors claim that there are no significant effects of noise or task difficulty on either of the navigation measures. However, when the instructors' ratings of the rapidity/quality of mariners' responses are considered, a number of significant or marginally significant effects of noise and task difficulty emerge. The overall, global ratings of task performance by instructors showed both a significant effect of task difficulty (which validated the task difficulty manipulation within the experiment) and a significant effect of noise such that higher levels of noise were associated with poorer performance. The effects of noise were also either significant, or close to statistical significance, in the case of several specific actions within high difficulty scenarios.
This finding suggests that there might be an interaction between noise and task difficulty such that noise has little or no effect in situations where the task is easier, but might lead to impaired performance of a high difficulty task. These findings support, to some extent, the modelling choices adopted here.
The overall concurrent validity of the risk model can be judged as moderate.

Predictive validity
The extreme conditions test is a case of predictive validity testing in which the hypothetical model is set to extreme conditions, where the behaviour of the model is more predictable, [79]. The results of these tests are presented in Table 5, where the measure of human performance is tabulated, expressed as the probabilities of collision and grounding. The results are shown for three different states of the input variables. The baseline refers to the input variables being in the states that reflect the average conditions on board ships. The all-active state refers to a situation where all the stressors are active and negatively affect the performance of a navigator. The all-inactive state refers to the situation where no stressor acts on a navigator.
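
The logic of such a test can be sketched on a toy model. Everything below is an assumption made for illustration: the rule that any single active stressor degrades attention, the baseline probability of 0.1 for each stressor being active, and the reuse of the error probabilities 0.0005 and 0.0006 as stand-ins for normal and degraded attention. The sketch does not reproduce the networks or the values of Table 5; it only shows the ordering that an extreme conditions test is expected to confirm.

```python
def p_degraded(p_active):
    """P(attention degraded), ASSUMING any single active stressor degrades it."""
    none_active = 1.0
    for p in p_active.values():
        none_active *= 1.0 - p
    return 1.0 - none_active


def p_error(p_active, hep_normal=0.0005, hep_degraded=0.0006):
    """Marginal error probability over the two attention states."""
    d = p_degraded(p_active)
    return d * hep_degraded + (1.0 - d) * hep_normal


gdfs = ("ship_motion", "noise", "vibration")

# Three input settings; the baseline value of 0.1 is HYPOTHETICAL.
settings = {
    "all_inactive": {g: 0.0 for g in gdfs},
    "baseline": {g: 0.1 for g in gdfs},
    "all_active": {g: 1.0 for g in gdfs},
}
results = {name: p_error(p_active) for name, p_active in settings.items()}

# The test passes if the output is ordered as expected across the settings.
passes = results["all_inactive"] <= results["baseline"] <= results["all_active"]
```

A model that failed this ordering would indicate an error in the structure or in the conditional probability tables, irrespective of whether the absolute values are accurate.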

Comparison of the results obtained from the models being in nomological proximity.
There is only one model that is nomologically close to the model presented here, namely the DNV risk model, [13,74]. The proximity of these two models comes from the fact that both measure the effect of performance shaping factors (PSFs) on the ability of a navigator to perform his main tasks in relation to safe navigation and collision avoidance action. However, the PSFs that each model takes into account are different, as are the mechanisms governing that ability. In the model presented here the navigator's ability is governed by the attention management capability, which is modelled by a variable of the same name. In the DNV model this ability is modelled through a set of variables, of which the one called performance seems to be the most comparable with the attention management in our model. Therefore, the effect of manipulation of these two variables on the explanatory variable is measured. The results of this test are presented in Table 6, where the probabilities of collision associated with the various states of the analysed variables are shown.
It is evident that the behaviour of both models is comparable when the variables of interest are switched between their states. In our model the variable Attention Management Capability has two states (Normal, Degraded), whereas in the DNV model the variable Performance has four states (Excellent, Standard, Poor, Not able to perform). For the comparison we took Performance=Excellent and Performance=Poor, which correspond to Normal and Degraded attention management capability (AMC), respectively.
On the basis of the scores of the validity criteria presented in Table 7, the overall predictive validity of the risk model can be judged as between moderate and high, with the following observations:
• Moderate content validity indicates that there is room for improvement when it comes to the parametrization of the models. This stems mainly from the lack of data about the analysed phenomena. In order to increase the score in this validation test, extensive research is needed, dedicated solely to the assessment of the effect of GDFs on the performance of a crew member.
• Moderate concurrent validity reflects the actual state of the art in the analysed field. Due to the lack of compatibility between the existing models and those presented here, a direct comparison is not feasible. In some cases only elements of the models can be compared, but even then, due to the substantially different modelling paradigms behind the models, the results of the comparison are not satisfactory.
• Moderate-High predictive validity shows that the models developed behave as expected when tested. This means that the model ranks the designs appropriately for different GDF levels. In practice it means that the exact numbers that the models provide may not be "correct", but the proper behaviour of the models means they can be used to distinguish different designs in a multi-objective optimization, which is the primary aim of the models.
• The sensitivity-uncertainty assessment also allowed the most critical elements of the models to be defined, and remedial actions were taken to improve the overall performance of the model. These comprise the alternative hypotheses formulated with respect to the most critical element. This high score supports the adopted modelling techniques, which make it possible to carry out extensive tests pertaining to the predictive validity.

Conclusion
This paper offers two novel models linking the effect of GDFs with human performance, which can be incorporated into a process of multi-objective ship design. Within the models the GDFs can be appropriately modified at the early stage of ship design process resulting in a design characterised by the highest performance of a human. All other factors affecting the human performance, but not belonging to this particular aspect of ship design, are omitted in the model (they are considered constant in all analysed design alternatives, thus having the same effect across designs).
The primary aim of the framework is to allow differentiation among various designs based on the selected criterion, which is human performance in safety critical tasks pertaining to accident avoidance. The results of the validation process adopted here show that the presented models are valid for the given purpose, despite certain limitations and the paucity of data. These gaps in background knowledge lower the level of accuracy of the models; thus the models shall not be used to seek accurate estimates of the measure of human performance (the probability of an accident). Instead the models can be used for the relative comparison of designs, which is affected to a much lesser extent by the gaps in background knowledge.
The models may be used by naval architects, vessel designers, and vessel system designers as intended, provided access to Human Factors (HF) expertise is available to assist with application and interpretation. It is important to recognise the relevance of human factors input during its eventual application. HF provides the understanding of the complexities of human behaviour in operational settings, its interdependencies and interactions.

(Tables A8-A20). (Tables B21-B28).