Probability elicitation for Bayesian networks to distinguish between intentional attacks and accidental technical failures



Introduction
Modern societies rely on the proper functioning of Critical Infrastructures (CIs) in sectors such as energy, transportation, and water management, which is vital for economic growth and societal wellbeing. Over the years, CIs have become heavily dependent on Industrial Control Systems (ICSs), which monitor and steer industrial processes such as electric power generation, automotive production, and flood control, to ensure efficient operations. ICSs were originally designed for isolated environments [1]. Such systems were mainly susceptible to technical failures. The blackout in the Canadian province of Ontario and the North-eastern and Mid-western United States is a typical example of a technical failure, in which a software bug in the alarm system suppressed alarms and left operators unaware of the need to redistribute power [2]. However, modern ICSs no longer operate in isolation but use other networks to facilitate and improve business processes [3]. This increased connectivity makes ICSs vulnerable to cyber-attacks in addition to technical failures. A cyber-attack on a German steel mill is a typical example, in which adversaries used the corporate network to enter the ICS network [4]. As an initial step, the adversaries used targeted email and social engineering techniques to acquire credentials for the corporate network. With these credentials, they worked their way into the plant's control system network and caused damage to the blast furnace.
It is essential to distinguish between attacks and technical failures that lead to abnormal behaviour in the components of ICSs, and to take suitable measures. In most cases, a response strategy aimed at technical failures would be ineffective in the event of a targeted attack and may lead to further complications. For instance, replacing a sensor that is sending incorrect measurement data with a new sensor would be a suitable response to a technical failure of the sensor. However, it may not be an appropriate response to an attack on the sensor, as it would not block the corresponding attack vector. Furthermore, initiating an inappropriate response strategy would delay the recovery of the system from adversaries and might lead to harmful consequences. Notably, there is a lack of decision support for distinguishing between attacks and technical failures.
Bayesian Networks (BNs) have the capacity to tackle this challenge, as shown by their real-world applications in medical diagnosis [5] and fault diagnosis [6][7][8]. In addition, there are BN-based applications in domains such as resilience engineering [9] and structural systems [10]. BNs belong to the family of probabilistic graphical models [11] and consist of a qualitative and a quantitative part [12]. The qualitative part is a Directed Acyclic Graph (DAG) of nodes and edges. Each node represents a random variable, while the edges between the nodes represent the conditional dependencies among the random variables. BN structure modelling includes determining the nodes and the relationships between them [13]. The quantitative part takes the form of a priori marginal and conditional probabilities that quantify the dependencies between connected nodes. BN parameter modelling includes specifying these prior marginal and conditional probabilities [13].
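To make the two parts of a BN concrete, the following minimal Python sketch encodes a two-node network and computes a posterior by Bayes' rule. The node names, the dictionary layout, and all probability values are illustrative assumptions and do not come from the case study.

```python
# Qualitative part: a DAG given as a mapping from each node to its parents.
# Hypothetical two-node network: Cause -> Observation.
parents = {"Cause": [], "Observation": ["Cause"]}

# Quantitative part: a prior for the root node and a conditional
# probability for the child. All numbers are made up for illustration.
prior = {"attack": 0.1, "failure": 0.9}        # P(Cause)
likelihood = {"attack": 0.7, "failure": 0.2}   # P(Observation = True | Cause)


def posterior(prior, likelihood):
    """Return P(Cause | Observation = True) using Bayes' rule."""
    joint = {cause: prior[cause] * likelihood[cause] for cause in prior}
    evidence = sum(joint.values())
    return {cause: joint[cause] / evidence for cause in joint}


post = posterior(prior, likelihood)
for cause, probability in post.items():
    print(f"P({cause} | observation) = {probability:.2f}")
```

With these assumed numbers, observing the evidence raises the probability of an attack above its prior, which is the kind of update a diagnostic BN performs.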
To address the above-mentioned research gap, we developed a framework in our previous work to help construct BN models for distinguishing attacks and technical failures [14]. Within that framework, we extended fishbone diagrams and used them for knowledge elicitation to construct the qualitative part of such BN models. However, our previous work lacks a systematic method for knowledge elicitation to construct the quantitative part of such BN models. The present study aims to provide a holistic framework to help construct BN models for distinguishing attacks and technical failures by addressing the research question (RQ): "How could we elicit expert knowledge to effectively construct Conditional Probability Tables of Bayesian Network models for distinguishing attacks and technical failures?". The research objectives are:
• RO1. To propose an approach that would help to effectively construct Conditional Probability Tables (CPTs) for our application.
• RO2. To demonstrate the proposed approach using an example in the water management domain.
Empirical data is one of the data sources used to populate the CPTs of BN models in cyber security [15]. Such data can be extracted from sources like cyber security incident databases, technical failure reports, and red team vs. blue team exercises. However, in the water management domain in the Netherlands, there have been no or only limited cyber-attacks on the infrastructures [16]. In addition, information on these limited cyber-attacks and on technical failure reports cannot be shared due to the sensitivity of the data [16]. Furthermore, red team vs. blue team exercises were not possible for practical reasons, in particular the lack of testbeds in the Netherlands that could facilitate such exercises [16]. Expert knowledge is one of the predominant data sources used to populate CPTs, especially in domains with limited availability of data such as cyber security [15]. Probability elicitation is the most challenging part of constructing BN models, especially when it relies on expert knowledge, because experts need to provide a probability for every possible combination of parent states to complete the CPT of a child variable. The CPT size of a child variable grows exponentially with the number of parents. For instance, the CPT of a binary child with 5 binary parents has $2^{5+1} = 64$ entries. The burden of probability elicitation can be reduced by: (i) reducing the number of conditional probabilities to elicit by imposing structural assumptions, and (ii) facilitating individual probability entry by providing visual aids that help experts answer elicitation questions in terms of probabilities [17]. We evaluate several techniques for reducing the number of probabilities to elicit and conclude that the DeMorgan model is the most suitable for our purpose [18]. Furthermore, we review several methods for facilitating individual probability entry and conclude that probability scales with numerical and verbal anchors are the most appropriate for our application [19,20].
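The exponential growth of the CPT can be illustrated with a short Python sketch. The function name `cpt_entries` is a hypothetical helper introduced here only for illustration.

```python
def cpt_entries(n_parents, child_states=2, parent_states=2):
    """Number of entries in the full CPT of a child variable.

    Each of the parent_states ** n_parents parent configurations needs
    one probability per child state.
    """
    return child_states * parent_states ** n_parents


# Full CPT size for a binary child with 1 to 6 binary parents.
for n in range(1, 7):
    print(f"{n} parents -> {cpt_entries(n)} entries")
```

For a binary child with 5 binary parents this gives the 64 entries mentioned above, and every additional binary parent doubles the table.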
The main contributions of this paper are as follows: (i) we propose an approach that combines the DeMorgan model and probability scales with numerical and verbal anchors to help effectively construct the quantitative part (CPTs) of BN models for distinguishing attacks and technical failures; (ii) we demonstrate the proposed approach using an example in the water management domain, showing which parameters need to be elicited from experts, which questions need to be asked, and how the remaining probabilities in the CPTs are computed. This paper is not about "anomaly detection" (i.e., detecting whether an anomaly has occurred), but about "diagnosis" (i.e., pinpointing whether a detected anomaly is due to a cyber-attack or a technical failure). Diagnosis is prevalent in the medical and safety domains [21,22]. To tackle our RQ, we used the Design Science Research (DSR) method, which is widely used to create artefacts [23]. An artefact is defined as an object made by humans for the purpose of solving practical problems, such as distinguishing attacks and technical failures [24]. An artefact could be a construct (or concept), a model, a method, or an instantiation [25]. In numerous cases, practical problems can be solved using artefacts. There are five main phases in the DSR process: (i) problem identification, (ii) design and development, (iii) demonstration, (iv) evaluation, and (v) communication. In the problem identification phase, we gathered constraints and high-level requirements using semi-structured interviews and focus group sessions with experts in safety and/or security of ICSs in the water management domain in the Netherlands. The questions we asked the experts, together with the resulting constraints and requirements, are provided in the Appendix. These constraints and high-level requirements serve mainly to develop and evaluate our holistic framework, which in turn helps to construct BN models for distinguishing attacks and technical failures. They play an important role in structuring the problem space and deriving design decisions systematically, and are used as input for the "design and development" and "evaluation" phases of the DSR process. This paper corresponds to the "design and development" and "demonstration" phases. Evaluating the performance of the proposed approach, which corresponds to the "evaluation" phase, is outside the scope of this study; it is covered in our related work that has already been published [16].
The remainder of this paper is structured as follows. In Section 2, we illustrate the different layers and components of an ICS and describe a case study in the water management domain that is used to demonstrate our proposed approach. In Section 3, we describe our existing framework together with a systematic method for knowledge elicitation to construct the qualitative part of BN models for distinguishing attacks and technical failures. Section 4 provides an essential foundation of techniques to reduce the burden of probability elicitation when constructing the quantitative part of such BN models. In Section 5, we demonstrate the proposed approach using an example in the water management domain. Section 6 presents the discussion, followed by conclusions and future work directions in Section 7.

Industrial control systems
In this section, we illustrate the three layers of an ICS and the major components in each layer. Furthermore, we provide an overview of a case study in the water management domain.

ICS architecture
Domain knowledge on ICSs is the starting point for the development and application of our proposed approach. A typical ICS consists of three layers: (i) Field instrumentation, (ii) Process control, and (iii) Supervisory control [26], bound together by network infrastructure, as shown in Fig. 1.
The field instrumentation layer consists of sensors ($S_i$) and actuators ($A_i$), while the process control layer consists of Programmable Logic Controllers (PLCs)/Remote Terminal Units (RTUs). Typically, PLCs have wired communication capabilities, whereas RTUs have wired or wireless communication capabilities. The PLC/RTU receives measurement data from sensors and controls the physical systems through actuators [27]. The supervisory control layer consists of historian databases, software application servers, the Human-Machine Interface (HMI), and the workstation. The historian databases and software application servers enable the efficient operation of the ICS. The low-level components are configured with the help of the workstation and monitored with the help of the HMI [27].

Case study overview
This case study overview is based on a site visit to a floodgate in the Netherlands. Some critical information has purposely been anonymised for security reasons. This case study is also used in our previous work [14]. Fig. 2 schematises a floodgate operated primarily by a Supervisory Control and Data Acquisition (SCADA) system along with an operations centre.
Fig. 3 illustrates the SCADA architecture of the floodgate. The sensor ($S_1$), which is located near the floodgate, measures the water level. There is also a water level scale that is visible to the operator from the operations centre. The sensor measurements are sent to the PLC. If the water level reaches the upper limit, the PLC sends an alarm notification to the operator through the HMI, and the operator then needs to close the floodgate. The HMI also provides information such as the water level and the current state of the floodgate (open/closed). The actuator opens and closes the floodgate. This case study is used to demonstrate our proposed approach, which helps to effectively construct CPTs with the involvement of domain experts.

Framework for distinguishing attacks and technical failures
This section describes, with an example, the framework proposed in our previous work, including the extended fishbone diagrams that help to construct the qualitative part (DAG) of BN models for distinguishing attacks and technical failures [14]. This corresponds to the structure modelling of BNs.
The framework consists of three layers, as shown in Fig. 4, which presents the different types of variables (i.e., contributory factors, problem, and observations (or test results)) and their relationships. The middle layer consists of a problem variable, which represents the major cause of an abnormal behaviour in a component of the ICS (the observable problem). In the example shown in Fig. 4, we considered "Sensor ($S_1$) sends incorrect water level measurements" as the problem, which is observable. For instance, this problem could be observed by comparing the water level measurements sent by the sensor ($S_1$) against the measurements on the water level scale. We considered the major causes of the problem (intentional attack and accidental technical failure) as the states of the problem variable. In our work, we assume that the problem (for example, "Sensor ($S_1$) sends incorrect water level measurements") has already been identified by the operator. The scope of our proposed approach is to distinguish between the major causes (i.e., intentional attack vs. accidental technical failure).
The upper layer consists of factors contributing to the major causes of the problem. For instance, the factor "Weak physical access-control" contributes to "Sensor ($S_1$) sends incorrect water level measurements" due to intentional attack, whereas "Lack of physical maintenance" contributes to "Sensor ($S_1$) sends incorrect water level measurements" due to accidental technical failure. The lower layer consists of observations (or test results), defined as any information, based on the outcome of tests, that is useful for determining the major cause of the problem. For instance, the outcome of the test "Sensor ($S_1$) sends correct water level measurements after cleaning the sensor" would provide additional information to determine the major cause of the problem (accidental technical failure or intentional attack) more accurately.
The framework that we proposed in our previous work includes a systematic method based on fishbone diagrams for knowledge elicitation to construct the qualitative part of BN models [14]. We adopted this approach because relying solely on BNs for this knowledge elicitation poses challenges. BNs are not easy to use for knowledge elicitation involving domain experts, as it can be time-consuming for elicitors to explain the notion of BNs [14]. Furthermore, because a BN is not well-structured for this purpose, it does not encourage and guide data collection by showing where knowledge is lacking. Fishbone diagrams, on the other hand, help to systematically identify and organise the possible contributing factors (or sub-causes) of a particular problem [28][29][30][31][32]. In our previous work, we extended fishbone diagrams to incorporate observations (or test results), which need to be elicited for our application in addition to contributory factors.
Fig. 5 shows an example extended fishbone diagram, which consists of a problem ("Major cause for sensor ($S_1$) sends incorrect water level measurements") and contributing factors (or sub-causes) sorted and related under different categories on the left side of the problem. Each category on the left side of the problem represents one of the major causes of the problem (intentional attack and accidental technical failure). Our example shows that "Lack of physical maintenance" is a contributing factor to the problem ("Sensor ($S_1$) sends incorrect water level measurements") due to accidental technical failure. Furthermore, the observations (or test results) on the right side of the problem provide additional information to determine the major cause of the problem more accurately. Each category on the right side of the problem is used as a reference to elicit observations (or test results) that would be useful for determining a particular major cause of the problem [14]. Our example shows that the observation "abnormalities in other locations" would increase the probability that the problem ("Sensor ($S_1$) sends incorrect water level measurements") is due to intentional attack.
Once the extended fishbone diagram is developed, it is translated into a corresponding qualitative BN model based on the mapping scheme in our previous work [14]. However, that framework lacked a systematic method for knowledge elicitation to construct the quantitative part of BN models (the CPTs), which we address in the current work.
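As a rough illustration of such a translation, the Python sketch below turns the example factors and observations into a list of directed edges. The edge directions (contributory factor → problem → observation) follow the three-layer arrangement described for Fig. 4 and are an assumption here, since the actual mapping scheme is defined in [14]. The function name `fishbone_to_edges` and the data layout are hypothetical.

```python
# Left side of the extended fishbone diagram: contributory factors per category.
factors_by_category = {
    "intentional attack": ["Weak physical access-control"],
    "accidental technical failure": ["Lack of physical maintenance"],
}
# Head of the fishbone: the problem variable.
problem = "Major cause for sensor (S1) sends incorrect water level measurements"
# Right side: observations (or test results).
observations = [
    "Abnormalities in other locations",
    "Sensor (S1) sends correct water level measurements after cleaning the sensor",
]


def fishbone_to_edges(factors_by_category, problem, observations):
    """Return DAG edges, assuming factor -> problem -> observation."""
    edges = []
    for factors in factors_by_category.values():
        for factor in factors:
            edges.append((factor, problem))
    for observation in observations:
        edges.append((problem, observation))
    return edges


edges = fishbone_to_edges(factors_by_category, problem, observations)
for parent, child in edges:
    print(f"{parent} -> {child}")
```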

Techniques for reducing the burden of probability elicitation
Probability elicitation is a challenging task in building BNs, especially when it relies heavily on expert knowledge [17]. An extensive elicitation workload could affect the reliability of the elicited probabilities. This workload can be reduced by reducing the number of conditional probabilities to elicit and by facilitating individual probability entry.

Technique for reducing the number of conditional probabilities to elicit
This section analyses well-known techniques for reducing the number of conditional probabilities to elicit and describes the most suitable technique for our application.
To reduce the number of conditional probabilities to elicit, we could exploit causal independence models [17]. Causal independence refers to the situation where the contributory factors (causes) contribute independently to the problem (effect) [33]. With these models, the number of parameters that must be elicited to define the full CPT of the problem variable is only linear in the number of contributory factors, because the total influence on the problem is a combination of the individual contributions [34]. As an example, consider the BN model depicted in Fig. 4, where the problem variable (Y) is a binary discrete variable with the states "Intentional Attack" and "Accidental Technical Failure". In the CPT shown in Fig. 6, Y = "Intentional Attack" denotes Y = "True", and Y = "Accidental Technical Failure" denotes Y = "False". We translated the states of Y into "True" and "False" to comply with the inherent assumptions of the noisy-OR model regarding the states of variables. The typical state of each variable in the noisy-OR model is "False". For instance, the typical state of a child variable such as Fever is "False", as it is normal. Therefore, in our application, we assigned Y = "False" to Y = "Accidental Technical Failure", as this is the a priori expected major cause, based on the higher frequency of technical failures compared to intentional attacks [14].
In our application, we are dealing with a combination of promoting and inhibiting influences. In the case of a promoting influence, the presence (or absence) of the parent will result in the child event with a certain probability. When the parent is absent (or present), it is certain not to cause the child event. In other words, the presence (or absence) of the contributory factor will result in the problem ("Sensor ($S_1$) sends incorrect water level measurements") due to "intentional attack" with a certain probability, as this denotes the "True" state. For instance, the presence of "Weak physical access-control" will result in the problem due to "intentional attack" with a certain probability, whereas the absence of "Weak physical access-control" is certain not to result in the problem due to "intentional attack". This type of promoting influence is defined as a cause [18]. On the other hand, the absence of "Sensor data integrity verification" will result in the problem due to "intentional attack" with a certain probability, whereas the presence of "Sensor data integrity verification" is certain not to result in the problem due to "intentional attack". This type of promoting influence is defined as a barrier [18].
In the case of an inhibiting influence, the presence (or absence) of the parent will inhibit the child event with a certain probability. When the parent is absent (or present), it is certain not to inhibit the child event. In other words, the presence (or absence) of the contributory factor will result in the problem ("Sensor ($S_1$) sends incorrect water level measurements") due to "accidental technical failure" with a certain probability, as this denotes the "False" state. For instance, the presence of "Lack of physical maintenance" will result in the problem due to "accidental technical failure" with a certain probability, whereas the absence of "Lack of physical maintenance" is certain not to result in the problem due to "accidental technical failure". This type of inhibiting influence is defined as an inhibitor [18]. On the other hand, the absence of "Well-written maintenance procedure" will result in the problem due to "accidental technical failure" with a certain probability, whereas the presence of "Well-written maintenance procedure" is certain not to result in the problem due to "accidental technical failure". This type of inhibiting influence is defined as a requirement [18].
Our example BN model contains a mixture of promoting and inhibiting influences (for example, causes and inhibitors) in the interaction between the contributory factors and the problem. Therefore, we need a technique that can model opposing influences while still reducing the number of conditional probabilities to elicit.
We analysed several techniques and chose the most suitable one for our application, which is described in Section 4.1.1. A description of the techniques that are unsuitable for our application, namely the noisy-OR model and Causal Strength (CAST) logic, can be found in the Appendix. The noisy-OR model is one of the most commonly used causal independence models for reducing the number of conditional probabilities to elicit [5,35]. The noisy-OR model inherently assumes binary variables [36]. The noisy-MAX model is an extension of the noisy-OR model that is suitable for multi-valued variables [37]. We analysed the noisy-OR model because we deal only with binary variables in our application.
The noisy-OR model assumes that the properties of exception independence and accountability hold true [38]. If all the modelled contributory factors of the problem ("Sensor ($S_1$) sends incorrect water level measurements") are false, the property of accountability requires that the problem be presumed false (i.e., the problem is due to "accidental technical failure"). However, this does not work for inhibiting influences such as "Lack of physical maintenance" in the noisy-OR model, as shown in Fig. 6. If "Lack of physical maintenance" is absent, it is certain not to inhibit the problem, which is incompatible with the property of accountability. Therefore, we found that the noisy-OR model is unsuitable for our application, because the property of accountability does not hold true.
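The behaviour of the noisy-OR model can be sketched in a few lines of Python. The sketch uses the standard noisy-OR combination without a leak, $P(Y = \text{True} \mid \mathbf{x}) = 1 - \prod_{i: x_i = \text{True}} (1 - p_i)$. The link probabilities and the function name `noisy_or` are illustrative assumptions.

```python
from itertools import product


def noisy_or(link_probs, states):
    """P(Y = True | parent states) for a noisy-OR model without a leak.

    link_probs[i] is the probability that parent i alone causes Y;
    states[i] is True when parent i is present.
    """
    prob_no_cause_succeeds = 1.0
    for link_prob, present in zip(link_probs, states):
        if present:
            prob_no_cause_succeeds *= 1.0 - link_prob
    return 1.0 - prob_no_cause_succeeds


link_probs = [0.6, 0.3]  # hypothetical values for two causes
for states in product([False, True], repeat=len(link_probs)):
    print(states, round(noisy_or(link_probs, states), 3))
```

When every parent is absent the result is exactly 0, which is the accountability property. Every present parent can only raise the probability of Y = "True", so a parent that is meant to lower it cannot be expressed in this model.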
Alternatively, CAST logic is a technique developed mainly for modelling opposing influences [39]. CAST logic assumes that all the variables in the model are binary. The parameters that need to be elicited to completely define CPTs using CAST logic are: (i) a causal strength for each edge, and (ii) a baseline probability for each variable. The baseline probability of a parent variable can be interpreted as the prior probability of that parent variable. However, it would not be appropriate to interpret the baseline probability of the child variable as a prior probability or a leak probability, as the parent variables have no state in which they are guaranteed to have no influence on the child variable [40]. Because the definition of the baseline probability of the child variable is not clear, we cannot formulate an appropriate question to elicit it. This is the major disadvantage of CAST logic, and it has resulted in a lack of practical applications [18,40]. We conclude that neither the noisy-OR model nor CAST logic is suitable for our application.

DeMorgan model
As an alternative to the previously discussed models, the DeMorgan model can be used to tackle the challenge of modelling opposing influences while reducing the number of conditional probabilities to elicit. This section explains the DeMorgan model. It corresponds to the parameter modelling of BNs and shows which parameters need to be elicited from experts, which questions need to be asked during the elicitation process, and how the remaining parameters in the CPTs are computed.
The DeMorgan model is a technique developed mainly for modelling opposing influences, which helps to reduce the number of conditional probabilities to elicit [18,40]. The DeMorgan model is applicable when there are several parents and a common child, and it inherently assumes binary variables. It also assumes that one of the two states of each variable is always the distinguished state, as shown in Fig. 7. The distinguished state of the child variable usually depends on the modelled domain [41]; it is the typical state of that child variable [42]. If the child variable has the two states "disease" and "no disease" in the medical domain, its distinguished state is chosen as "no disease", as it is normal [41]. In our application, the distinguished state of the problem variable ("Major cause for sensor ($S_1$) sends incorrect water level measurements") is chosen as "accidental technical failure", as this is the a priori expected major cause, based on the higher frequency of technical failures compared to intentional attacks [14]. The distinguished state of a parent variable is relative to the type of causal interaction with the child variable [18]. The same parent variable can have different distinguished states in the different interactions in which it participates with different child variables.
There are 4 types of causal interactions between an individual parent (X) and a child (Y) in the DeMorgan model: (i) cause, (ii) barrier, (iii) inhibitor, and (iv) requirement.
(i) Cause: X is a causal factor and has a positive influence on Y. In this type of causal interaction between an individual parent (X) and a child (Y), the distinguished state of the parent variable is "False" [18]. Consequently, when the parent variable is "False", it is certain not to trigger a change from the typical state of the child variable, as shown in Table 1. When the parent variable is "True", it will trigger a change from the typical state of the child variable with a certain probability ($v_X$), as shown in Table 1.
(ii) Barrier: This is the negated counterpart of a cause, i.e., $X'$ (the negation of X) is a causal factor and has a positive influence on Y. In this type of causal interaction between an individual parent (X) and a child (Y), the distinguished state of the parent variable is "True" [18]. Accordingly, when the parent variable is "True", it is certain not to trigger a change from the typical state of the child variable, as shown in Table 2. When the parent variable is "False", it will trigger a change from the typical state of the child variable with a certain probability ($v_X$), as shown in Table 2.
(iii) Inhibitor: X inhibits Y. In this type of causal interaction between an individual parent (X) and a child (Y), the distinguished state of the parent variable is "False" [18]. As a result, when the parent variable is "False", it is certain not to prevent a change from the typical state of the child variable, as shown in Table 3. When the parent variable is "True", it will prevent a change from the typical state of the child variable with a certain probability ($d_X$), as shown in Table 3.
(iv) Requirement: The relationship between an inhibitor and a requirement is similar to the relationship between a cause and a barrier. In this type of causal interaction between an individual parent (X) and a child (Y), the distinguished state of the parent variable is "True" [18]. Hence, when the parent variable is "True", it is certain not to prevent a change from the typical state of the child variable, as shown in Table 4. When the parent variable is "False", it will prevent a change from the typical state of the child variable with a certain probability ($d_X$), as shown in Table 4.

Table 1
Type of Causal Interaction: Cause.

Table 2
Type of Causal Interaction: Barrier.

Table 3
Type of Causal Interaction: Inhibitor.
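The four interaction types differ only in which parent state is distinguished and in whether the influence promotes or inhibits the child event. A minimal Python sketch of this lookup, using the example factors from the water management case, could look as follows. The names `DISTINGUISHED_STATE`, `PROMOTING`, `is_active`, and `role` are hypothetical helpers.

```python
# Distinguished state of the parent for each type of causal interaction.
DISTINGUISHED_STATE = {
    "cause": False,
    "barrier": True,
    "inhibitor": False,
    "requirement": True,
}
# Causes and barriers promote the child event; the other two inhibit it.
PROMOTING = {"cause", "barrier"}


def is_active(interaction, state):
    """A parent exerts its influence only outside its distinguished state."""
    return state != DISTINGUISHED_STATE[interaction]


def role(interaction):
    """Return whether the interaction promotes or inhibits the child event."""
    return "promoting" if interaction in PROMOTING else "inhibiting"


example_parents = {
    "Weak physical access-control": "cause",
    "Sensor data integrity verification": "barrier",
    "Lack of physical maintenance": "inhibitor",
    "Well-written maintenance procedure": "requirement",
}
for name, interaction in example_parents.items():
    active_state = not DISTINGUISHED_STATE[interaction]
    print(f"{name}: {role(interaction)} influence, active when {active_state}")
```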
The DeMorgan model is an extension and a combination of the noisy-OR and noisy-AND models, which supports modelling the above-mentioned types of causal interactions [18]. Maaskant et al. modelled promoting influences, which include causes and barriers, by mimicking the noisy-OR model [18]. Furthermore, they modelled inhibiting influences, which include inhibitors and requirements, by mimicking the noisy-AND model [18]. Finally, they modelled the combination of promoting and inhibiting influences by combining the noisy-OR and noisy-AND models.
The property of accountability in the noisy-OR model is applicable to the DeMorgan model with a slight modification, as the DeMorgan model also exploits causal independence: if all the modelled parents of the child are in their distinguished states, the property of accountability requires that the child be presumed to be in its distinguished state. However, in many cases this is not a realistic assumption, as it is difficult to capture all the possible parents of the child [34]. Specifically, it is not realistic in our example, as it is difficult to capture all the possible contributory factors of the problem ("Sensor ($S_1$) sends incorrect water level measurements") due to "intentional attack". In the DeMorgan model, the leak parameter ($v_{X_L}$) accounts for the possible parents of the child that are not known in advance and not explicitly modelled.
In general, the size of the CPT of a binary variable with n binary parents is $2^{n+1}$. However, only n + 1 parameters are sufficient to completely define the CPT using the DeMorgan model, as it exploits causal independence. In the example shown in Fig. 7, only 5 parameters (one per parent plus the leak parameter) are sufficient to completely define the CPT of the child variable (Y) using the DeMorgan model, instead of the $2^{4+1} = 32$ entries of the full CPT. There are 2 parameterisations of the noisy-OR model with a leak parameter (the leaky noisy-OR model), proposed by Henrion [43] and Diez [37], which are mathematically equivalent. These 2 parameterisations differ only in the type of question that needs to be asked to the experts for knowledge elicitation. Henrion's parameterisation is supported by a question like: "What is the probability that the effect is true given that a cause ($X_i$) is true and all other modelled causes are false?". Diez's parameterisation, on the other hand, is supported by a question like: "What is the probability that the effect is true given that a cause ($X_i$) is true and all other modelled and unmodelled causes are false?". The DeMorgan model uses Diez's parameterisation with a slight modification.
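The following Python sketch shows how a full CPT could be derived from the n + 1 elicited parameters. It assumes the combination that the description above suggests, namely a leaky noisy-OR over the active promoting influences multiplied by a noisy-AND over the active inhibiting influences: $P(y \mid \mathbf{x}) = \bigl(1 - (1 - v_{X_L}) \prod_{i \in \text{active promoting}} (1 - v_{X_i})\bigr) \prod_{j \in \text{active inhibiting}} (1 - d_{X_j})$. This form is an assumption of the sketch and should be checked against [18]. All parameter values and the function name `demorgan_prob` are hypothetical.

```python
from itertools import product

# Distinguished parent state for each type of causal interaction.
DISTINGUISHED = {"cause": False, "barrier": True,
                 "inhibitor": False, "requirement": True}


def demorgan_prob(parents, leak, states):
    """P(child in its non-distinguished state | parent states).

    parents: list of (interaction type, elicited parameter) pairs.
    leak:    leak parameter for unmodelled promoting influences.
    states:  tuple of booleans, one state per parent.
    """
    prob_no_promotion = 1.0 - leak   # noisy-OR part, leak included
    prob_no_inhibition = 1.0         # noisy-AND part
    for (interaction, parameter), state in zip(parents, states):
        if state == DISTINGUISHED[interaction]:
            continue                 # parent exerts no influence
        if interaction in ("cause", "barrier"):
            prob_no_promotion *= 1.0 - parameter
        else:
            prob_no_inhibition *= 1.0 - parameter
    return (1.0 - prob_no_promotion) * prob_no_inhibition


# Four parents plus a leak: five illustrative, made-up parameters.
parents = [("cause", 0.6), ("barrier", 0.4),
           ("inhibitor", 0.5), ("requirement", 0.3)]
leak = 0.05

# One derived probability per configuration of the parent states.
cpt = {states: demorgan_prob(parents, leak, states)
       for states in product([False, True], repeat=len(parents))}
for states, probability in cpt.items():
    print(states, round(probability, 3))
```

With all parents in their distinguished states, the derived probability equals the leak parameter, which matches the modified accountability property described above. Every other configuration is computed rather than elicited.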
We could find the values for the required parameters from the experts to completely define the CPT using the DeMorgan model, based on an appropriate question for each type of causal interaction, as detailed below:

(i) The leak parameter: To find the value for the leak parameter, the elicitor could ask experts: "What is the probability that the child is in its non-distinguished state given that all the parents are in their distinguished states?" In our example shown in Fig. 7, the elicitor could ask experts to find the value for parameter ($v_{XL}$): "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is intentional attack given that the physical access-control for sensor ($S_1$) is strong, data integrity verification is performed for sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is well-written?"

(ii) Cause: To find the value for the parameter corresponding to a cause, the elicitor could ask experts: "What is the probability that the child is in its non-distinguished state given that all the parents are in their distinguished states, except $X_i$, and no other unmodelled causal factors are present?" In our example shown in Fig. 7, the elicitor could ask experts to find the value for parameter ($v_{X1}$): "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is intentional attack given that the physical access-control for sensor ($S_1$) is weak, data integrity verification is performed for sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is well-written, and no other unmodelled causal factors are present?"

(iii) Barrier: To find the value for the parameter corresponding to a barrier, the elicitor could ask experts: "What is the probability that the child is in its non-distinguished state given that all the parents are in their distinguished states, except $X_i$, and no other unmodelled causal factors are present?" In our example shown in Fig. 7, the elicitor could ask experts to find the value for parameter ($v_{X2}$): "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is intentional attack given that the physical access-control for sensor ($S_1$) is strong, data integrity verification is not performed for sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is well-written, and no other unmodelled causal factors are present?"

(iv) Inhibitor: Maaskant et al. did not directly determine the value for parameters corresponding to inhibitors in the same way as for causes and barriers, as it is not practical for the example which they considered [40]. Specifically, it makes less sense to ask for the effect of the presence of a parent ("Rain") on the child ("Bonfire") when the child ("Bonfire") is "False". Therefore, they determined the value for the parameter corresponding to each inhibitor by determining the negative influence relative to an arbitrary (non-empty) set of causes/barriers/leak parameter. However, in our application, it is possible to determine the value for the parameter corresponding to inhibitors directly, as we ask for the effect of the presence of a parent ("Lack of physical maintenance") on the child ("Major cause for sensor ($S_1$) sends incorrect water level measurements") when the latter is "Accidental technical failure". In order to find the value for the parameter corresponding to an inhibitor directly, the elicitor could ask the experts: "What is the probability that the child is in its distinguished state given that all the parents are in their distinguished states, except $U_i$, and no other unmodelled causal factors are present?" In our example shown in Fig. 7, the elicitor could ask experts to find the value for parameter $d_{U1}$: "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is accidental technical failure given that the physical access-control for sensor ($S_1$) is strong, data integrity verification is performed for sensor ($S_1$) data, sensor ($S_1$) is not always physically maintained, maintenance procedure for sensor ($S_1$) is well-written, and no other unmodelled causal factors are present?"

(v) Requirement: Maaskant et al. did not directly determine the value for parameters corresponding to requirements in the same way as for causes and barriers, as it is not practical for the example which they considered [40]. Specifically, it makes less sense to ask for the effect of the absence of a parent on the child when the child is "False". Therefore, they determined the value for the parameter corresponding to each requirement by determining the negative influence relative to an arbitrary (non-empty) set of causes/barriers/leak parameter. However, in our application, it is possible to determine the value for the parameter corresponding to requirements directly, as we ask for the effect of the absence of a parent ("Well-written maintenance procedure") on the child ("Major cause for sensor ($S_1$) sends incorrect water level measurements") when the latter is "Accidental technical failure". In order to find the value for the parameter corresponding to a requirement directly, the elicitor could ask the experts: "What is the probability that the child is in its distinguished state given that all the parents are in their distinguished states, except $U_i$, and no other unmodelled causal factors are present?" In our example shown in Fig. 7, the elicitor could ask experts to find the value for parameter $d_{U2}$: "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is accidental technical failure given that the physical access-control for sensor ($S_1$) is strong, data integrity verification is performed for sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is not well-written, and no other unmodelled causal factors are present?"
Once we determine the required parameters based on appropriate elicitation questions, we can completely define the CPT of the child variable using Eq. (1). In Eq. (1), $Y$ represents the effect variable, which has values $y$ for the effect being in the non-distinguished state ("Intentional attack") and $y'$ for the effect being in the distinguished state ("Accidental technical failure"). $X$ denotes the set of parents which interact with the effect variable as promoting influences, $U$ denotes the set of parents which interact with the effect variable as inhibiting influences, $+X$ denotes the subset of $X$ that contains all parents that are in their non-distinguished states, and $+U$ denotes the subset of $U$ that contains all parents that are in their non-distinguished states. $v_{XL}$ denotes the leak parameter, which expresses the probability of $y$ ("Intentional attack") given that all parents are in their distinguished states; $v_{X_i}$ denotes the probability of $y$ ("Intentional attack") given that the parent $X_i$ is not in its distinguished state and all other parents are in their distinguished states; $d_{U_i}$ denotes the probability of $y'$ ("Accidental technical failure") given that the parent $U_i$ is not in its distinguished state and all other parents are in their distinguished states.
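For reference, the DeMorgan combination rule can be written with the symbols defined above as follows. This is a reconstruction under the assumption that Eq. (1) follows the form of the DeMorgan model described by Maaskant et al. [40]; it should be checked against that source before use.

```latex
P(y \mid X, U) =
  \Bigl( 1 - (1 - v_{XL}) \prod_{X_i \in +X} (1 - v_{X_i}) \Bigr)
  \prod_{U_i \in +U} (1 - d_{U_i}),
\qquad
P(y' \mid X, U) = 1 - P(y \mid X, U)
```

In this form, the promoting influences and the leak combine as in a leaky noisy-OR, and each inhibiting influence in its non-distinguished state scales the result down by $(1 - d_{U_i})$.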
We choose the DeMorgan model for our application to reduce the number of conditional probabilities to elicit, as it supports modelling opposing influences with clear parameterisations.

Technique for facilitating individual probability entry
This section explains our chosen technique for facilitating individual probability entry for our application.
Our systematic method for knowledge elicitation to construct CPTs of BN models would be incomplete without a technique that facilitates individual probability entry. The DeMorgan model would help to reduce the number of conditional probabilities to elicit and allow elicitors to ask appropriate questions during probability elicitation. In addition, there should be a suitable technique which would make it easy for experts to answer elicitation questions in terms of probabilities.
There are well-known methods, such as the probability scale [19,44] and the probability wheel [45], which would help to facilitate individual probability entry [17,46]. The basis for choosing a particular method includes accuracy, short probability elicitation time, and usability [46]. Wang et al. compared three different methods: (i) direct estimation, (ii) probability wheel, and (iii) probability scale, in terms of their accuracy and the time taken to elicit probabilities from experts [47]. They pointed out that the probability scale is better in terms of both accuracy and probability elicitation time compared to the other two methods.
A probability scale can be a horizontal or vertical line with several anchors [46]. Fig. 8 shows a probability scale with 7 numerical and verbal anchors [48]. However, there are several variants of probability scales which would help to facilitate individual probability entry. Witteman et al. compared 3 probability scales: (i) probability scale with numerical and verbal anchors, (ii) probability scale with only numerical anchors, and (iii) probability scale with only verbal anchors [49]. They compared these 3 probability scales based on a study with general practitioners in the domain of medical decision making. They concluded that all 3 probability scales are equally suitable to facilitate individual probability entry. However, they recommended the probability scale with numerical and verbal anchors, as it provides numerical anchors for experts who prefer numbers and verbal anchors for experts who prefer words. Furthermore, Witteman et al. compared 2 different probability scales: (i) probability scale with numerical and verbal anchors, and (ii) probability scale with only numerical anchors [50]. They compared these 2 probability scales based on a study with arts and mathematics students. They concluded that the probability scale with numerical and verbal anchors is more comfortable to use compared to the probability scale with only numerical anchors.
There are real-world applications of the probability scale with numerical and verbal anchors in the elicitation of probabilities to construct the quantitative part of BN models [19,44]. Van der Gaag et al. used the probability scale with numerical and verbal anchors for a case study in oesophageal cancer [19]. This study was conducted with two experts in gastrointestinal oncology. The experts found that this method is easier to use than any other method they had used before. Van der Gaag et al. also highlighted that a large number of probabilities were elicited in a reasonable time using this method. Furthermore, Hanninen et al. used the probability scale with numerical and verbal anchors for the construction of the quantitative part of a collision and grounding BN model [44]. This study was conducted with 8 experts who possessed between 3 and 30 years of maritime working experience. These studies show that the probability scale with numerical and verbal anchors can be used for facilitating individual probability entry involving experts with different backgrounds.
We choose probability scales for our application as they are better in terms of accuracy and probability elicitation time compared to other methods. In particular, we would employ the probability scale with numerical and verbal anchors to facilitate individual probability entry in our application, as it is effective and practicable based on previous studies. We would utilise the probability scale with 7 numerical and verbal anchors with a variation. In our application, the experts could either mark the suitable probability among the 7 anchors in the scale directly, or express fine-grained probabilities using the probability scale with numerical and verbal anchors as a supporting aid to visualise the probability range. This is convenient when the experts would like to express fine-grained probabilities based on historical data, which is realistic for accidental technical failures in our application.
As a part of the probability elicitation process, in addition to the case outline, we also need to provide information related to the type of floodgate (for example, criticality rating: very high) and the context (for example, threat level: substantial). This guideline would help to avoid very diverse responses across participants, as they have substantive information based on the system knowledge. This is evident from our application of the proposed approach [16]. Furthermore, it is also important to select an appropriate group of experts to elicit probabilities, considering the type of floodgate and the needed expertise. For instance, in our application of the proposed approach, we relied on experts who have experience working with safety and/or security of ICS in the water management sector in the Netherlands, as we dealt with a type of floodgate in the Netherlands [16].
Finally, a focus group workshop is one of the approaches that can be used to facilitate the probability elicitation process in addition to a questionnaire [16]. Once we gather the responses from each of the participants, a focus group workshop can also help to facilitate discussion among them on the reasoning behind the varied probabilities which they provided for some variables (if any) [16]. These mechanisms would supplement the probability scales with numerical and verbal anchors and allow us to elicit reliable probabilities.

Application of the methodology
In this section, we use an illustrative case of a floodgate in the Netherlands to explain how we effectively construct CPTs of BN models for distinguishing attacks and technical failures.
We considered the upper and middle layers of our framework for the application of our methodology. It is important to reduce the number of conditional probabilities to elicit for the problem variable, as a considerable number of contributory factors (upper layer), corresponding to intentional attack and accidental technical failure, typically interact with the problem variable (middle layer), which in turn increases the CPT size of the problem variable exponentially. On the other hand, the conditional probabilities for observations (or test results) (lower layer) would be easy to elicit directly, as there is only one problem variable (middle layer) in our framework, which makes the CPT size of an observation (or test result) variable 4 ($2^{1+1}$). We shall consider the BN model with the upper and middle layers of our framework depicted in Fig. 7 for the application of our methodology. We considered the problem "Sensor ($S_1$) sends incorrect water level measurements" as it could lead to more complex situations in the case of a floodgate. In case the floodgate closes when it should not, based on the incorrect water level measurements sent by the sensor ($S_1$), it would lead to severe economic damage, for instance, by delaying cargo ships. On the other hand, in case the floodgate opens when it should not, due to incorrect water level measurements sent by the sensor ($S_1$), it would lead to flooding.
The normal text (i.e., text without bold formatting) in Table 5 denotes the explicitly mentioned causal factors that are absent (example: data integrity verification is performed for the sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is well-written). This makes the probability elicitation process simple, as they do not affect the corresponding probability based on our structural assumptions. The experts could directly read the remaining text (i.e., text with bold formatting) (example: "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is intentional attack given that the physical access-control for sensor ($S_1$) is weak and no other unmodelled causal factors are present?") and mark the answer, which could also reduce probability elicitation time.
We considered 4 contributory factors to the major causes (intentional attack or accidental technical failure) of the observed problem: (i) Weak physical access-control ($X_1$), (ii) Sensor data integrity verification ($X_2$), (iii) Lack of physical maintenance ($U_1$), and (iv) Well-written maintenance procedure ($U_2$), as shown in Fig. 7, to depict each type of causal interaction. The type of causal interaction between the individual parent $X_1$ and the child $Y$ is cause. The type of causal interaction between the individual parent $X_2$ and the child $Y$ is barrier. The type of causal interaction between the individual parent $U_1$ and the child $Y$ is inhibitor. The type of causal interaction between the individual parent $U_2$ and the child $Y$ is requirement. In this example, we need to elicit only 5 (4 + 1) parameters instead of 32 ($2^{4+1}$) to completely define the CPT for the problem variable. The 5 parameters which we need to elicit are: $v_{XL}$, $v_{X1}$, $v_{X2}$, $d_{U1}$, and $d_{U2}$. The values for these 5 parameters could be elicited from experts by providing the appropriate elicitation questions based on the DeMorgan model together with the probability scale with numerical and verbal anchors, which could help experts answer the elicitation questions in terms of probabilities, as shown in Table 5. The normal text in Table 5 makes the probability elicitation process simple, as it does not affect the corresponding probability based on our structural assumptions. The experts could directly read the remaining text and mark the answer for each question in Table 5, which could also reduce probability elicitation time. Suppose the expert marks the answer for $v_{XL}$ as 0.15, $v_{X1}$ as 0.50, $v_{X2}$ as 0.25, $d_{U1}$ as 0.85, and $d_{U2}$ as 0.50. These probabilities are examples to demonstrate the application of the methodology.
Once we elicit all the required parameters, we could use Eq. (1) to completely define the CPT for our example BN model. For instance, we could use Eq. (1) to calculate: The number with bold formatting in Table 6 denotes this probability. The completed CPT for the problem variable (Y) is shown in Table 6.
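A minimal sketch of how the full CPT could be derived from the 5 example parameters is given below. It assumes the DeMorgan form $P(y) = (1 - (1 - v_{XL}) \prod (1 - v_{X_i})) \prod (1 - d_{U_i})$, with the products taken over parents in their non-distinguished states; this form and the helper names `p_attack` and `build_cpt` are assumptions for illustration, so the output should be checked against Table 6 rather than taken as a reproduction of it.

```python
from itertools import product

# Example parameters quoted in the text (illustrative values only).
V_LEAK = 0.15                    # v_XL, leak parameter
V = {"X1": 0.50, "X2": 0.25}     # promoting influences: cause, barrier
D = {"U1": 0.85, "U2": 0.50}     # inhibiting influences: inhibitor, requirement


def p_attack(active):
    """P(Y = intentional attack) given the set of parents that are in
    their non-distinguished ("active") states, under the assumed form."""
    q = 1.0 - V_LEAK
    for name, v in V.items():
        if name in active:
            q *= 1.0 - v
    p = 1.0 - q
    for name, d in D.items():
        if name in active:
            p *= 1.0 - d
    return p


def build_cpt():
    """Enumerate all 2^4 parent configurations; each maps to
    (P(intentional attack), P(accidental technical failure))."""
    names = list(V) + list(D)
    cpt = {}
    for flags in product([False, True], repeat=len(names)):
        active = frozenset(n for n, f in zip(names, flags) if f)
        p = p_attack(active)
        cpt[active] = (p, 1.0 - p)
    return cpt


if __name__ == "__main__":
    for active, (p_a, p_f) in build_cpt().items():
        label = ", ".join(sorted(active)) or "none"
        print(f"active: {label:<16} attack={p_a:.5f} failure={p_f:.5f}")
```

Here "active" means that a parent is in its non-distinguished state; for instance, $X_2$ is active when data integrity verification is not performed, and $U_2$ is active when the maintenance procedure is not well-written.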
Once we complete the CPT for the problem variable, we could define the a priori probabilities for each contributory factor and the CPT for each observation (or test result) by eliciting the corresponding probabilities directly from the experts, as they are not complicated. An example BN model with corresponding CPTs for each variable is shown in Fig. 9.
Once the problem ("Sensor ($S_1$) sends incorrect water level measurements") is observed in the floodgate, the evidence (True/False) for contributory factors and observations (or test results) could be set by the operator (or end-user) to determine the major cause for the observed problem. Once the evidence for contributory factors and observations (or test results) is set, the posterior probability of the problem variable would be computed accordingly. Based on the computed posterior probability, the appropriate response strategy could be put in place for the most likely major cause (intentional attack/accidental technical failure) of the observed problem ("Sensor ($S_1$) sends incorrect water level measurements"), thereby minimising negative consequences.
In the example shown in Fig. 10, we provided the evidence for the contributory factors "Weak physical access-control ($X_1$) = True", "Sensor data integrity verification ($X_2$) = False", "Lack of physical maintenance ($U_1$) = False", "Well-written maintenance procedure ($U_2$) = True", and for the observations (or test results) "Abnormalities in other locations ($Z_1$) = True" and "Sensor ($S_1$) sends correct water level measurements after recalibrating the sensor ($Z_3$) = False". On the other hand, we did not provide evidence for the problem "Major cause for sensor ($S_1$) sends incorrect water level measurements (Y)" and the observation (or test result) "Sensor ($S_1$) sends correct water level measurements after cleaning the sensor ($Z_2$)". The BN computes the posterior (updated) probabilities of the other nodes (Y and $Z_2$) based on the provided evidence. The BN in Fig. 10 determines that the major cause for the observed problem "Sensor ($S_1$) sends incorrect water level measurements" is most likely intentional attack, as the corresponding posterior probability (0.97306) is higher than the posterior probability of accidental technical failure (0.02694).
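The update performed by the BN can be sketched for the special case in which all contributory factors are observed, so that the CPT entry for Y acts as a prior and each observation (or test result) is conditionally independent given Y. The likelihood values below are hypothetical placeholders, since the observation CPTs of Fig. 9 are not restated here; the sketch is therefore not expected to reproduce the posterior of 0.97306.

```python
def posterior_attack(prior_attack, likelihoods):
    """Posterior P(Y = intentional attack | observations).

    prior_attack: P(attack | observed contributory factors), taken from
        the CPT of Y.
    likelihoods: list of pairs (P(obs | attack), P(obs | failure)), one
        per observed test result (hypothetical values in the example).
    """
    num = prior_attack          # unnormalised weight for "attack"
    den = 1.0 - prior_attack    # unnormalised weight for "failure"
    for p_obs_attack, p_obs_failure in likelihoods:
        num *= p_obs_attack
        den *= p_obs_failure
    if num + den == 0.0:
        raise ValueError("evidence has zero probability under both causes")
    return num / (num + den)


if __name__ == "__main__":
    # Hypothetical prior and likelihoods for two observed test results.
    post = posterior_attack(0.68, [(0.80, 0.30), (0.90, 0.20)])
    print(f"P(attack | evidence) = {post:.5f}")
```

In a full BN tool, the same computation is carried out by general inference, which also handles unobserved contributory factors and returns posteriors for unobserved observations such as $Z_2$.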

Discussion
An example parameter elicitation for the problem variable (Y) without a reduced number of conditional probabilities is provided in Table 7. This example helps to highlight key challenges especially in parameters. Furthermore, Zhang et al. also highlighted that reducing the number of conditional probabilities to elicit reduces the uncertainty and bias and improves elicitation accuracy [57]. Finally, Zagorecki et al. conducted an empirical study to elicit probabilities under Noisy-OR assumptions, in addition to eliciting complete probabilities directly from human experts [57]. Like the DeMorgan structural assumptions, the elicitation of probabilities under Noisy-OR assumptions reduces the number of parameters that need to be elicited from exponential to linear in the number of parents to define a full CPT for the child variable. Based on the empirical study, Zagorecki et al. concluded that the elicitation of probabilities under Noisy-OR assumptions yields better accuracy than the elicitation of complete probabilities directly from human experts [58].
To determine the most critical variables, sensitivity analysis is performed with Y (Major cause for sensor ($S_1$) sends incorrect water level measurements) selected as the target node. The sensitivity levels are shown in Fig. 11. According to the results of the tornado diagram, which shows the 10 most critical events leading to Y due to intentional attack, "Lack of physical maintenance", "Well-written maintenance procedure", and "Weak physical access-control" were identified as the three most influential variables. Based on the tornado diagram, "Lack of physical maintenance" is identified as the most influential variable in the occurrence of the studied scenario. This in turn would help to focus on the most critical variables during elicitation.
Performance-based weighting is one of the systematic approaches that can help to ensure the accuracy of elicited parameters [51]. In this approach, each expert is weighted on their performance in answering calibration (or seed) questions. These are a set of questions from the experts' field that have observed true values and are also closely related to the variables of interest [52]. The overall weight for each expert can be obtained by multiplying two separate scores: the statistical accuracy (or calibration) score and the information score [53]. The accuracy score assesses how close an expert's estimate is to the true value. Furthermore, the information score assesses the amount of entropy in what the expert says or in the expert's performance. This overall weight for each expert can then be used to combine multiple expert judgements. Eggstaff et al. highlighted that performance-based weighting significantly outperforms equal weighting of expert judgements [54]. There are various applications of performance-based weighting [51,55,56]. This can supplement the proposed framework to ensure the accuracy of elicited parameters.
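The combination step can be sketched as a weighted linear pool, assuming the calibration and information scores have already been computed from the seed questions. The expert names, scores, and estimates below are hypothetical, and the scoring procedure itself (which is more involved) is not shown.

```python
def performance_weights(scores):
    """scores: {expert: (calibration_score, information_score)}.
    Returns normalised weights proportional to the product of the scores."""
    raw = {expert: cal * info for expert, (cal, info) in scores.items()}
    total = sum(raw.values())
    if total <= 0.0:
        raise ValueError("at least one expert needs a positive weight")
    return {expert: w / total for expert, w in raw.items()}


def pooled_probability(estimates, weights):
    """Weighted linear pool of the experts' probability estimates."""
    return sum(weights[expert] * p for expert, p in estimates.items())


if __name__ == "__main__":
    # Hypothetical scores and estimates for one elicited parameter.
    scores = {"expert_a": (0.6, 1.0), "expert_b": (0.2, 1.0)}
    estimates = {"expert_a": 0.10, "expert_b": 0.50}
    weights = performance_weights(scores)
    print(weights)
    print(f"pooled estimate: {pooled_probability(estimates, weights):.3f}")
```

With equal weighting, the pooled estimate in this example would be 0.30; the performance-based weights move it towards the estimate of the better-performing expert.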

Conclusions and future work directions
Limited availability of data is one of the key challenges in constructing BN models in domains like cyber security, which results in modellers depending on expert knowledge. However, BNs are not suitable for knowledge elicitation involving domain experts. In our previous work, we developed a systematic method using fishbone diagrams for knowledge elicitation involving domain experts to construct the DAGs of BN models for distinguishing attacks and technical failures. Noticeably, the systematic method for knowledge elicitation involving domain experts

Major cause for sensor ($S_1$) sends incorrect water level measurements (Y)
not performed for sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is well-written?"
16 "What is the probability that the major cause for the observed problem (sensor ($S_1$) sends incorrect water level measurements) is intentional attack given that the physical access-control for sensor ($S_1$) is strong, data integrity verification is not performed for sensor ($S_1$) data, sensor ($S_1$) is always physically maintained, maintenance procedure for sensor ($S_1$) is not well-written?"

to produce the effect and that the hidden processes that may inhibit the occurrence of the effect are mutually independent [35]. In case all the modelled causes of the effect are false, the property of accountability requires that the effect be presumed false, i.e., $P(y' \mid x_1', x_2', \ldots, x_n') = 1$. In the noisy-OR model, the effect can be caused by any cause, similar to a logical OR. However, the relationship is not deterministic: each of the causes $X_i$ alone can cause the effect with probability $p_i$, which is known as the link probability [36].
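The link probability is $p_i = P(y \mid x_1', \ldots, x_i, \ldots, x_n')$, where $x_j'$ ($j \neq i$) represents the absence of the other causes except $X_i$. The probability of any combination of active causes can be calculated as $P(y \mid X) = 1 - \prod_{i: X_i \in X} (1 - p_i)$, where $X$ represents all active causes.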

Causal strength (CAST) logic
CAST logic is applicable when there are several parents and a common child, as shown in Fig. 2A [39]. CAST logic assumes all the variables in the model are binary. CAST logic has only been applied in the international policy and crisis analysis domain [41]. The interaction between a parent and the common child can be either promoting or inhibiting. The promoting influence is depicted by an arrowhead, whereas the inhibiting influence is illustrated by a filled circle, as shown in Fig. 2A.
The parameters which need to be elicited to completely define CPTs using CAST logic are: (i) causal strengths ($g_{X_i}$, $h_{X_i}$) for each arc, and (ii) a baseline probability ($b$) for each variable. The values of the causal strengths ($g_{X_i}$, $h_{X_i}$) are not probabilities and can take any value from the range $[-1, 1]$. The value of the causal strength $h_{X_i}$ indicates the change in belief of $Y$ relative to the baseline probability of $Y$ ($b_Y$) under the assumption that $X_i$ is in the "True" state. For instance, $h_{X_1}$ indicates how much the presence of $X_1$ would change our belief of $Y$. On the other hand, the value of the causal strength $g_{X_i}$ indicates the change in belief of $Y$ relative to the baseline probability of $Y$ ($b_Y$) under the assumption that $X_i$ is in the "False" state. For instance, $g_{X_1}$ indicates how much the absence of $X_1$ would change our belief of $Y$.
Once we elicit the above-mentioned parameters, we could apply the CAST algorithm for every combination of parent states to completely define the CPT of the child variable. The CAST algorithm consists of four steps: (i) aggregate positive causal strengths, (ii) aggregate negative causal strengths, (iii) combine the positive and negative causal strengths, and (iv) derive conditional probabilities.
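The four steps can be sketched for a single combination of parent states as follows. The aggregation and combination formulas used here are assumptions based on the commonly cited form of CAST logic (products of complements for aggregation, a ratio of complements for the overall influence, and a linear move from the baseline towards 1 or 0); they should be checked against Eqs. (1A) to (6A) and [39] before use. The function name and the handling of the saturated case are hypothetical.

```python
def cast_probability(strengths, baseline):
    """P(Y = True) for one combination of parent states.

    strengths: causal strengths in [-1, 1], one per parent, already
        selected as h (parent "True") or g (parent "False").
    baseline: baseline probability b_Y of the child.
    """
    pos = 1.0   # running product of (1 - s) over non-negative strengths
    neg = 1.0   # running product of (1 - |s|) over negative strengths
    for s in strengths:
        if s >= 0:
            pos *= 1.0 - s
        else:
            neg *= 1.0 - abs(s)
    s_plus = 1.0 - pos    # step (i): aggregated positive strength
    s_minus = 1.0 - neg   # step (ii): aggregated negative strength

    # Step (iii): overall influence O in [-1, 1].
    if s_plus >= s_minus:
        # Assumption: if both sides saturate at 1, treat the influences
        # as cancelling out (this case is not specified in the text).
        overall = 0.0 if neg == 0.0 else 1.0 - pos / neg
    else:
        overall = -(1.0 - neg / pos)

    # Step (iv): move the baseline towards 1 (O >= 0) or towards 0 (O < 0).
    if overall >= 0:
        return baseline + (1.0 - baseline) * overall
    return baseline + baseline * overall


if __name__ == "__main__":
    # Hypothetical strengths: one promoting (0.5) and one inhibiting (-0.2).
    print(cast_probability([0.5, -0.2], baseline=0.2))
```

Repeating this computation for every combination of parent states, with the strength for each parent chosen according to its state, would fill in the complete CPT of the child.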
In the first step, the positive causal strengths are aggregated using Eq. (1A), where $s_{X_i}$ can be $g_{X_i}$ or $h_{X_i}$ depending on the state of the parent.
In the second step, the negative causal strengths are aggregated using Eq. (2A), where $s_{X_i}$ can be $g_{X_i}$ or $h_{X_i}$ depending on the state of the parent.
In the third step, the positive and negative causal strengths are combined. The overall influence ($O$) of all parents is determined using Eq. (3A) if $S^{+} \geq S^{-}$ and using Eq. (4A) if $S^{+} < S^{-}$. In the final step, the conditional probabilities are derived using Eq. (5A) if $O_j \geq 0$ and using Eq. (6A) if $O_j < 0$, where $O_j$ denotes the overall influence of the $j$th combination of parent states $X_j$.

Q13. Is it possible to evaluate the developed method in the real water management infrastructure? If so, are there any challenges?
Q14. Do we have access to the system architectures of any real water management infrastructure?

Constraints (Cs) and requirements (Rs)
Based on the responses which we received from the experts to those questions, the following set of constraints and high-level requirements is extracted by manually analysing the interview notes and summarising the essence of the responses:

C1. When the operators notice an abnormal behaviour in a component of the ICS, they presume that this is due to a technical failure and initiate corresponding response procedures. The response strategy initiated towards a technical failure is not effective in case of an attack.
C2. There is a lack of real data regarding cyber-attacks, as they claim that there are no/limited cyber-attacks on their infrastructures. Furthermore, this is not shareable due to the sensitivity of data.
C3. Technical failures occur in their infrastructures, which are documented as technical failure reports. However, they are also not shareable due to the sensitivity of data.
C4. The automation department deals with the technical failures, whereas the security department deals with cyber-attacks in the water management infrastructure. There are experts who have expertise in dealing with both technical failures and cyber-attacks.
C5. Experts are limited in this domain, with limited time availability.
C6. A real water management infrastructure like a floodgate is not available for the evaluation of the developed method due to availability and criticality issues.
C7. There are system architectures with specific components which are not shareable due to sensitivity issues. However, there is a possibility to arrange a visit to a water management infrastructure, which could help to understand the system architecture on a high level. Furthermore, the system architecture needs to be anonymised when publishing it.
C8. There is a need for decision support that would help operators to distinguish between intentional attacks and accidental technical failures, as it provides input to the decision-makers to choose an appropriate response strategy. However, the selection of these response strategies also depends on cost-benefit and feasibility.

R1. An effective and practical alternative to data-driven approaches for developing decision support to distinguish between attacks and technical failures is required.
R2. Decision support should help operators to distinguish between attacks and technical failures by taking into account real-time system information.
R3. The method for developing decision support should facilitate the involvement of experts from the department that deals with technical failures and the department that deals with cyber-attacks, including experts who have expertise in dealing with both technical failures and cyber-attacks.
R4. The workload of experts during the knowledge elicitation process for developing decision support to distinguish between attacks and technical failures should be limited.
R5. The reliability of knowledge elicited for developing decision support to distinguish between attacks and technical failures should be ensured.
R6. The developed decision support should be scalable to different problems in the real environment.

S. Chockalingam et al.

Fig. 11. Tornado diagram obtained from sensitivity analysis for Major cause for sensor ($S_1$) sends incorrect water level measurements.

Table 4
Type of Causal Interaction: Requirement.

Table 5
Parameter Elicitation for the Problem Variable (Y): Example.
Major cause for sensor ($S_1$) sends incorrect water level measurements (Y)

Table 7 (continued)

discussion guide
Q1. When the operator notices an abnormal behaviour in a component of the ICS, how do they respond to it?
Q2. Do you have a mechanism for the operator to determine whether an abnormal behaviour in a component of the ICS is due to attacks or technical failures?
Q3. Does the same department deal with the attacks and technical failures? If not, how?
Q4. Which functionalities do you think are important in a system which helps to distinguish between attacks and technical failures?
Q5. Are there any cyber-attacks reported in your infrastructure?
Q6. Are there any technical failures reported in your infrastructure?
Q7. Do you have a repository of technical failure reports?
Q8. If so, is this repository of technical failure reports available for research?
Q9. What do you think are the alternate data sources available for research?
Q10. What are the challenges you foresee in the alternate data sources you proposed?
Q11. In addition to risk factors and symptoms based on tests, what are other elements that you would take into account when you diagnose an (intentional) attack on a component?
Q12. In addition to risk factors and symptoms based on tests, what are other elements that you would take into account when you diagnose (accidental) technical failure?