A Network Security Risk Assessment Method Based on a B_NAG Model

Computer networks face a variety of cyberattacks. Most network attacks are contagious and destructive, and they can harm both society and computer network security. Security evaluation is an effective way to address network security problems. To assess the vulnerabilities of computer networks accurately, this paper proposes a network security risk assessment method based on a Bayesian network attack graph (B_NAG) model. First, a new resource attack graph (RAG) and the algorithm E-Loop, which is applied to eliminate loops in the B_NAG, are proposed. Second, to distinguish the confusing relationships between nodes of the attack graph in the conversion process, a related algorithm is proposed to generate the B_NAG model. Finally, to analyze the reachability of paths in the B_NAG, measurement indexes such as node attack complexity and node state transition are defined, and an iterative algorithm for obtaining the probability of reaching the target node is presented. On this basis, the posterior probability of related nodes can be calculated. A simulation environment is set up to evaluate the effectiveness of the B_NAG model. The experimental results indicate that the B_NAG model is realistic and effective in evaluating the vulnerabilities of computer networks and can accurately highlight the degree of vulnerability of nodes in a chaotic relationship.

cyberattacks continuing a trend of high growth over the previous six years. A survey of these attacks announced that the number of applications had quickly increased and was nearly three times higher than the percentage in 2016.
In recent years, researchers have introduced methods based on Bayesian probabilities in evaluating vulnerabilities of attack graphs [5][6][7]. Bayesian networks are capable of representing nondeterministic relationships and can be used to quantify the correspondences within attack graphs. Therefore, methods of effectively combining a Bayesian network with an attack graph for network vulnerability assessment have become an important focus of research.

Related Research
Recently, many scholars have analyzed network vulnerabilities by using attack graphs. Because of the information asymmetry between attackers and defenders, the detection of Zero Day attacks is still challenging. Revealing Zero Day attacks based on attack paths is a better strategy than targeting them individually.
Sun et al. [8] implemented the system ZePro to identify Zero Day attack paths by adopting a probabilistic approach. With evidence of intrusion as input, the Bayesian network used in this system can calculate the infection probabilities of object instances.
A dynamic defense framework was presented to select the best countermeasures against diverse attack damage costs [9]. To calculate these costs, a new defense-centric model was designed on the basis of service dependency graphs. The current approaches suffer from some limitations. For example, only static countermeasure effectiveness and static countermeasure deployment costs are considered, while the negative impacts of the possible countermeasures on service quality are neglected [10]. These restrictions may lead an industrial control system (ICS) to choose improper countermeasures and deployment locations, which can in turn degrade network performance and frustrate legitimate users.
Garg et al. [11] presented the construction and analysis of inference rules for attack graphs. They developed a methodology for prioritizing individual vulnerabilities and attack paths using a PageRank model. The results were verified by using a Markov model and showed that the methodology outperformed many existing risk analysis techniques [12]. However, the experiment lacked specific indicators, so the results were not convincing.
As Zhang et al. [13] noted, dynamic risk analysis is an important component of protecting network security. However, risk assessment methods used in network systems are not very appropriate for ICSs because of the unique characteristics of ICSs. Their paper proposed a multilevel network model, based on a Bayesian approach, that includes attack functions and incidents. On this basis, it proposed a new risk incident prediction method and designed a dynamic security risk assessment method that can assess the risk caused by unknown attacks [14]. Moreover, a quantification method was presented to further calibrate the accuracy of the assessment. Finally, to test and verify the method, a simplified control system was simulated in MATLAB.
Building on previous research, this paper presents a Bayesian network attack graph (B_NAG) model and an algorithm to assess network vulnerabilities. Probability theory is introduced into the resource attack graph (RAG) model, which is then converted into the corresponding B_NAG model. The reachability probability of each node is calculated first, followed by the final reachability probability of each attack path. Finally, the related posterior probabilities are calculated, enabling network security administrators to assess network security more accurately and effectively.

The RAG Model
An attack graph is a method for analyzing all sequences of vulnerabilities exploited by attackers. Attacks can occur against any available node state and vulnerability, and all attack sequences can be assembled into a directed graph. The purpose of the RAG model is to characterize, by means of Bayesian probability calculations, an attack sequence launched according to the attacker's intentions, and thereby to help network administrators properly understand the security status of their networks. The RAG model is constructed as described below.

Definition 1
The graph $RAG = (S, S_0, A, E, \Gamma, L, O)$ is a directed graph, where the relevant notations are defined as follows:
$S = \{s_i \mid i = 1, \ldots, N\}$ denotes the set of resource state nodes. $S_0 \in S$ denotes the initial resource state nodes, which are occupied by the attacker.
$A = \{a_i \mid i = 1, \ldots, N\}$ represents the set of attack behavior nodes.
$E = \{E_1 \cup E_2\}$ denotes the set of directed edges connecting all related nodes. $E_1 \subseteq S \times A$ means that an attack can occur only if the attacker occupies some resources; $E_2 \subseteq A \times S$ means that the attack enables the attacker to occupy some resources. For a node $m$, the set of its parent nodes is denoted $Pre(m)$, and the set of its child nodes is denoted $Nex(m)$.
$\Gamma$ is the node state discriminant function. $\Gamma(x)$ denotes the current status of the node $x$, with $\Gamma(x) \in \{1, 0\}$, so that $\Gamma(s_i)$ is the current status of $s_i$. $\Gamma(s_i) = 1$ indicates that the attacker has occupied the resource $s_i$; conversely, $\Gamma(s_i) = 0$ indicates that the attacker has not occupied the resource.
$L$ is the set of logical relationships between nodes, $L = \{and, or, ble\}$. There is an and relationship between the nodes of $Pre(a_i)$ only if all preconditions for the corresponding attack node $a_i$ must be met; a successful attack yields $\Gamma(s_i) = 1$ only if the attacker has occupied the resource $s_i$. There is an or relationship between attack nodes whose child nodes are resource state nodes. Finally, ble denotes a kind of chaotic logical relationship that exists between parent nodes.
$O = \{o_i \mid i = 1, 2, 3, \ldots, N\}$ represents the set of resource state nodes associated with successful attacks that have been detected. For every $o_i \in S$, $o_i$ represents a resource state node whose associated successful attack has been detected by the intrusion detection system (IDS).
Definition 2 Attack path: In the RAG, suppose there exists a state sequence $s_0, a_0, s_1, a_1, \ldots, a_{n-1}, s_n$, where $s_0$ is the initial resource state node and $s_n$ is the target node. Then the path $Path_k = \langle s_0 \to a_0 \to s_1 \to a_1 \to \cdots \to a_{n-1} \to s_n \rangle$ can be defined, where $\forall s_i \in S$, $\forall a_j \in A$ $(0 \le i \le n,\ 0 \le j \le n-1)$. $Path_k$ denotes the $k$-th attack path.
Definition 3 Attack behavior: An attack behavior is denoted by a four-tuple of the form $(Src\_id, Dst\_id, Att\_code, Res)$, where $Src\_id$ is the id of the host launching the attack, $Dst\_id$ is the id of the host being attacked, $Att\_code$ is the number that identifies the attack behavior, and $Res$ is the result of the attack.
Definition 4 State transition: A state transition is denoted by a three-tuple of the form $(sid, vid, r)$, where $sid$ is the number that identifies the state transition, $vid$ is the number that identifies the vulnerability used by the attacker, and $r$ is the resulting state transition caused by an attack that uses the vulnerability.
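As an illustration only, the tuples in Definitions 1, 3 and 4 can be held in small Python records. The class names, field names and the toy graph below are assumptions chosen to mirror the notation; they are not part of the original model, and the logical relationships $L$ and the detected set $O$ are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class AttackBehavior:
    """Definition 3: (Src_id, Dst_id, Att_code, Res)."""
    src_id: str
    dst_id: str
    att_code: int
    res: str


@dataclass(frozen=True)
class StateTransition:
    """Definition 4: (sid, vid, r)."""
    sid: int
    vid: str
    r: str


@dataclass
class RAG:
    """Definition 1, reduced to the parts needed for traversal (hypothetical layout)."""
    states: Set[str]                 # S
    initial: Set[str]                # S_0
    attacks: Set[str]                # A
    edges: Set[Tuple[str, str]]      # E: edges in S x A together with edges in A x S
    occupied: Dict[str, int] = field(default_factory=dict)  # Gamma(x) in {0, 1}

    def pre(self, m: str) -> List[str]:
        """Parent nodes Pre(m)."""
        return sorted(u for (u, v) in self.edges if v == m)

    def nex(self, m: str) -> List[str]:
        """Child nodes Nex(m)."""
        return sorted(v for (u, v) in self.edges if u == m)


# Toy graph: s1 -> a1 -> s2, with s1 already occupied by the attacker.
rag = RAG(
    states={"s1", "s2"},
    initial={"s1"},
    attacks={"a1"},
    edges={("s1", "a1"), ("a1", "s2")},
    occupied={"s1": 1, "s2": 0},
)
```

Storing the edges as plain pairs keeps $Pre(m)$ and $Nex(m)$ as simple lookups, which is enough for the loop elimination and conversion steps discussed later.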

The Method of Metrics
To remove loops in an attack graph, an attack difficulty metric is introduced. In the Common Vulnerability Scoring System (CVSS), three basic indexes are used to characterize vulnerabilities: the access vector index, the access complexity index, and the authentication index, which are denoted by Acc_vec, Acc_com and Auth, respectively. The values of these indexes associated with different levels of severity of a vulnerability are shown in Tab. 1.
Based on these indexes, the availability (exploitability) score $Exp$ of a vulnerability used in the CVSS is defined in Eq. (1). An attack becomes more difficult to perform successfully as the value of $Exp$ gets smaller. Thus, the attack difficulty is inversely proportional to the availability of a vulnerability. Accordingly, an attack difficulty metric $Aga\_Dif$ may be defined based on the above three indexes, as shown in Eq. (2). The larger the value of $Aga\_Dif$ is for a particular node, the more difficult the node is to attack.
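The relationship between the three indexes and the attack difficulty can be sketched as follows. The weights and the exploitability formula are those of the CVSS v2 specification; the values in Tab. 1 may differ. The reciprocal used for `attack_difficulty` is only an assumption that stands in for Eq. (2), chosen because the text states that difficulty is inversely proportional to availability.

```python
# CVSS v2 metric weights (taken from the CVSS v2 specification; Tab. 1 may differ).
ACC_VEC = {"local": 0.395, "adjacent": 0.646, "network": 1.0}
ACC_COM = {"high": 0.35, "medium": 0.61, "low": 0.71}
AUTH = {"multiple": 0.45, "single": 0.56, "none": 0.704}


def exploitability(acc_vec: str, acc_com: str, auth: str) -> float:
    """CVSS v2 exploitability subscore: 20 * AccessVector * AccessComplexity * Authentication."""
    return 20.0 * ACC_VEC[acc_vec] * ACC_COM[acc_com] * AUTH[auth]


def attack_difficulty(acc_vec: str, acc_com: str, auth: str) -> float:
    """ASSUMPTION: Aga_Dif is taken as 1 / Exp, a placeholder for Eq. (2)."""
    return 1.0 / exploitability(acc_vec, acc_com, auth)


easy = attack_difficulty("network", "low", "none")      # remotely exploitable, no login
hard = attack_difficulty("local", "high", "multiple")   # local access, several logins
```

Under this assumption, a remotely exploitable vulnerability that needs no authentication receives a smaller difficulty than one that requires local access and multiple authentications, which matches the ordering described in the text.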

The Algorithm E-Loop
In the generation of the RAG, loops may arise that lead to repeated traversals of a given node. Such loops strongly affect the Bayesian probability calculation in network security assessment. To overcome this problem, the algorithm E-Loop is proposed to eliminate loops in the RAG. The specific steps are as follows:
Step 1 Add the start nodes to the queue of the root node.
Step 2 Initialize the stack in which the loops found are stored: Init().
Step 3 Carry out a depth-first traversal from the beginning node and then traverse every node: root = GetRoot().
Step 4 Push every visited node onto the stack: PushStack(root). Continue until all nodes have been traversed or the node currently being visited has already been traversed; in the latter case, a loop is now stored in the stack.
Step 5 For every node in the loop, calculate $Aga\_Dif_i$ and find the node $S_m$ that is most difficult to attack.
Step 6 Delete $S_m$ to eliminate the loop: Delete($S_m$).
Step 7 Repeat Steps 3 to 6 until there is no loop in the RAG.
Step 8 Output Ac_RAG.
Fig. 1 shows an RAG built as described above. There are two loops, $Path_1 = \langle s_2 \to a_3 \to s_5 \to a_5 \to s_2 \rangle$ and $Path_2 = \langle a_9 \to s_{11} \to a_{12} \to s_{12} \to a_9 \rangle$. For $Path_1$, the node $a_5$ is eliminated by the algorithm E-Loop to remove the loop. For $Path_2$, the node $a_9$ can never be reached because $Aga\_Dif(a_9) \to \infty$, so this loop can be removed by eliminating this node and all subsequent nodes. Fig. 2 shows the acyclic RAG (Ac_RAG) obtained after the loops are eliminated by the E-Loop algorithm.
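The loop elimination steps of E-Loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is an adjacency dictionary, the function names `find_loop` and `e_loop` are hypothetical, the difficulty values are made up, and the special handling of unreachable nodes (removing a node and all subsequent nodes) is not modeled.

```python
from typing import Dict, List, Optional, Set


def find_loop(graph: Dict[str, Set[str]], roots: List[str]) -> Optional[List[str]]:
    """Depth-first search from the roots; return the nodes of one loop, or None."""
    done: Set[str] = set()   # nodes fully explored without finding a loop

    def dfs(node: str, stack: List[str]) -> Optional[List[str]]:
        if node in stack:                      # node is already on the current path
            return stack[stack.index(node):]   # the loop is the tail of the stack
        if node in done:
            return None
        stack.append(node)
        for child in sorted(graph.get(node, ())):
            loop = dfs(child, stack)
            if loop is not None:
                return loop
        stack.pop()
        done.add(node)
        return None

    for root in roots:
        loop = dfs(root, [])
        if loop is not None:
            return loop
    return None


def e_loop(graph: Dict[str, Set[str]], roots: List[str],
           difficulty: Dict[str, float]) -> Dict[str, Set[str]]:
    """Repeatedly delete the hardest-to-attack node of each loop (Steps 3 to 7)."""
    g = {node: set(children) for node, children in graph.items()}
    while True:
        loop = find_loop(g, roots)
        if loop is None:
            return g                           # acyclic graph (Ac_RAG)
        hardest = max(loop, key=lambda n: difficulty.get(n, 0.0))
        g.pop(hardest, None)
        for children in g.values():
            children.discard(hardest)


# Toy graph containing the loop s2 -> a3 -> s5 -> a5 -> s2 (difficulty values are invented).
graph = {"s1": {"a1"}, "a1": {"s2"}, "s2": {"a3"},
         "a3": {"s5"}, "s5": {"a5"}, "a5": {"s2"}}
difficulty = {"s2": 0.1, "a3": 0.2, "s5": 0.1, "a5": 0.9}
acyclic = e_loop(graph, ["s1"], difficulty)
```

With these illustrative difficulty values, `a5` is the hardest node in the loop and is the one removed, mirroring the treatment of $Path_1$ described for Fig. 1.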

Probability Calculation in the B_NAG Model
In the B_NAG, the probability of each node is constrained only by its parent nodes, and the node is conditionally independent of all other nodes. In the RAG, the state transition of a node depends only on whether the relevant resource has been occupied. A child node can undergo a state transition only if its parent nodes are occupied. Thus, state transitions need to be associated with conditional independence in the B_NAG.
Tab. 2 presents the correspondence between an Ac_RAG and a B_NAG. Although these graphs have corresponding structures, some of their nodes differ. The detailed implementation described below is based on a B_NAG. The resulting resource state node and the conditional resource state node: the resource state node at which an attack has succeeded is called the resulting resource state node; the resource state node that is required for the attack condition to be satisfied is called the conditional resource state node.
$\ldots, N\}$, the set of weights between the resource state nodes: each weight in $W$ is a two-tuple $(depcoef, cost)$, where $depcoef$ denotes the correlation coefficient between resource state nodes and $cost$ denotes the cost required to attack another resource state node from the current node. $w_{ij}$ is the weight between the node $s_i$ and the node $s_j$.
As illustrated in the example shown in Fig. 2, an RAG consists of four structures: a series structure, a parallel or structure, a parallel and structure, and a mixed structure. When such a graph is converted into a B_NAG, each of these structures is transformed as follows:
(1) Series structure: By deleting the attack behavior node $a_1$, the attack behavior can be represented by the directed edge from $s_1$ to $s_2$: $(s_1 \to a_1 \to s_2) \Rightarrow (s_1 \to s_2)$
(2) Parallel or structure: The nodes $a_{10}$ and $a_{11}$ have an or relationship, meaning that the attack can occur when the resource state condition corresponding to either of the parent nodes $s_9$ or $s_{10}$ is satisfied. The related attack behavior nodes are removed, and the resulting resource state node and the conditional resource state node are linked by one directed edge. In the B_NAG, the resource state nodes have an or relationship: $(s_9 \to a_{10};\ s_{10} \to a_{11};\ a_{10} \lor a_{11} \to s_{12}) \Rightarrow (s_9 \lor s_{10} \to s_{12})$
(3) Parallel and structure: The parent nodes $s_2$ and $s_3$ of $a_3$ have an and relationship, meaning that the attack behavior can occur only if all resource state conditions are satisfied. After the attack node is removed, the resulting resource state node and the conditional resource state nodes are linked by one directed edge, which represents the attack behavior. In the transformed B_NAG, the resource state nodes still have an and relationship: $(s_2 \land s_3 \to a_3 \to s_5) \Rightarrow (s_2 \land s_3 \to s_5)$
(4) Mixed structure: The parent nodes $s_6$ and $s_4$ of the node $a_6$ have an and relationship, and the two nodes $a_6$ and $a_7$ that lead to the resulting resource state node have an or relationship. If the node $a_6$ were removed directly, the structure of the RAG would become confusing, complicating the conditional probability calculation. To solve this problem, this paper defines a temporary mixed node blend; namely, the node $a_6$ is denoted as the node blend: $(s_6 \land s_4 \to a_6;\ a_6 \lor a_7 \to s_9) \Rightarrow (s_6 \land s_4 \to blend;\ blend \lor s_7 \to s_9)$
After this conversion process, each edge in the converted B_NAG represents an attack behavior and has a weight that describes the correlation between the two resource state nodes connected by that edge. It can be observed from the converted B_NAG shown in Fig. 3 that blend is a mixed resource state node representing the combination of $s_4$ and $s_6$, so there must be directed edges from $s_4$ and $s_6$ to blend, namely, $P(blend \mid s_4, s_6) = 1$. The relationships between the nodes do not change upon conversion into a B_NAG, and only the resource state nodes are included. All attack behaviors are represented by the directed edges of the B_NAG, and the only possible relationships are and and or.
To clarify the process, the conversion algorithm is proposed as follows.
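The four structural transformations described above can be sketched in Python as follows. This is an illustrative reading of the rules, not the paper's conversion algorithm itself: the function name `rag_to_bnag`, the input layout, the "series" label and the naming scheme of blend nodes are all assumptions, and edge weights are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each attack node maps to (precondition states joined by "and", resulting state).
Attack = Tuple[List[str], str]


def rag_to_bnag(attacks: Dict[str, Attack]) -> Dict[str, Tuple[str, List[str]]]:
    """Drop attack nodes and return, per state node, (relationship, parent nodes)."""
    by_target: Dict[str, List[List[str]]] = defaultdict(list)
    for name in sorted(attacks):
        pre, target = attacks[name]
        by_target[target].append(sorted(pre))

    parents: Dict[str, Tuple[str, List[str]]] = {}
    for target, groups in by_target.items():
        if len(groups) == 1:
            # Series structure (one parent) or parallel "and" structure (several).
            pre = groups[0]
            parents[target] = ("and" if len(pre) > 1 else "series", pre)
            continue
        # Several attacks reach the same state: parallel "or" or mixed structure.
        alternatives: List[str] = []
        for pre in groups:
            if len(pre) == 1:
                alternatives.append(pre[0])
            else:
                # Mixed structure: replace the "and" attack node by a blend node.
                blend = "blend(" + ",".join(pre) + ")"
                parents[blend] = ("and", pre)
                alternatives.append(blend)
        parents[target] = ("or", alternatives)
    return parents


# The four example structures from the text.
bnag = rag_to_bnag({
    "a1": (["s1"], "s2"),
    "a3": (["s2", "s3"], "s5"),
    "a6": (["s6", "s4"], "s9"),
    "a7": (["s7"], "s9"),
    "a10": (["s9"], "s12"),
    "a11": (["s10"], "s12"),
})
```

Keeping the blend node as an explicit entry with an "and" relationship over $s_4$ and $s_6$ leaves only and and or relationships in the result, which is the property the conversion is meant to preserve.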

Calculation of the Probability of Reaching a Node Based on the B_NAG
The direct parent nodes of node $S$ are denoted here by $DPre(S)$, and the attack probability of the target node can be calculated as $P_a(S) = P(S \mid DPre(S))\,P(DPre(S))$. The state transition index $P_m(cost_i)$ denotes the probability of the transition from $S_{i-1}$ to $S_i$. Because each resource is correlated with its parent nodes, the weights $W$ must be considered when the state transition indexes of the parent nodes are calculated. If a sufficiently high cost is paid, the attack is guaranteed to succeed; namely, if $cost \to \infty$, then $P_m(cost) = 1$. If no cost is paid, no target can be attacked successfully; that is, when $cost = 0$, $P_m(cost) = 0$. If an attack on a node fails, the state of this node remains unchanged. The value of the state transition index $P_m(cost_i)$ follows a certain distribution, from which it is calculated. Here, $cost$ refers to the cost required to perform an attack, that is, the knowledge, experience, and resources needed to complete the attack. $Cost$ is the average cost required to complete the final attacks; it is a default value that depends on the resources, knowledge, attack tools and time. $depcoef$ is the correlation degree. Accordingly, the state transition index $P_m(cost_i)$ is calculated as in Eq. (6). It can be concluded from Eq. (6) that resource state nodes in the B_NAG interact with each other, so the probability of reaching a given node cannot be analyzed by traditional inference alone in the vulnerability analysis; instead, the state transitions must also be taken into account. To solve this problem, the state transition index is used to account for the probability of node state transitions when the vulnerabilities of the network are assessed.
$P_{end}$ denotes the probability of reaching a target node:
$$P_{end}(s_i) = P_m(cost_i) \times P_a(s_i) = \left(1 - e^{-\frac{cost_i}{Aga\_Dif}}\right) \times P(s_i \mid DPre(s_i))\,P(DPre(s_i)) \qquad (7)$$
Here, $P_m(cost_i)$ is the state transition index of the target node $s_i$; $cost$ and $depcoef$ are the attack cost and the correlation degree between node $s_i$ and its parent nodes, respectively; and $P_a(s_i)$ denotes the Bayesian probability that $s_i$ is attacked. Eq. (7) gives the probability of reaching a single target node; the probability of reaching a whole path is obtained by iterating Eq. (7) accordingly. The related iterative algorithm is provided below.
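Eq. (7) can be evaluated for a single node with a few lines of Python. The sketch assumes that the exponent in Eq. (7) is read as $-cost_i / Aga\_Dif$, which satisfies the two boundary conditions stated in the text ($P_m = 0$ at zero cost and $P_m \to 1$ as the cost grows). The function names and all numeric inputs are hypothetical and are not taken from Fig. 3.

```python
import math


def p_m(cost: float, aga_dif: float) -> float:
    """State transition index, ASSUMING the exponent in Eq. (7) is -cost / Aga_Dif."""
    return 1.0 - math.exp(-cost / aga_dif)


def p_a(p_cond: float, p_parents: float) -> float:
    """Attack probability P(s_i | DPre(s_i)) * P(DPre(s_i))."""
    return p_cond * p_parents


def p_end(cost: float, aga_dif: float, p_cond: float, p_parents: float) -> float:
    """Probability of reaching a single target node: P_m * P_a."""
    return p_m(cost, aga_dif) * p_a(p_cond, p_parents)


# Made-up inputs: cost = ln 2 and Aga_Dif = 1 give P_m = 0.5.
example = p_end(cost=math.log(2), aga_dif=1.0, p_cond=0.5, p_parents=0.4)
```

For a fixed cost, a larger difficulty lowers $P_m$ under this reading, so harder nodes contribute smaller reachability probabilities.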
In Algorithm 3, all nodes are first traversed. The parent nodes $Pre(s_i)$ are pushed onto their respective stacks $q$ according to the number of direct parent nodes $DPre(s_i)$ of $s_i$, ensuring that the start node of each path through $DPre(s_i)$ is the last one pushed onto $q$. The nodes are then popped in order following the "last in, first out" principle, and their reachability probabilities are calculated to determine the probability of reaching the whole path.
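The stack-based iteration can be sketched for a single chain of nodes as follows. This is a simplified illustration, not Algorithm 3 itself: it handles one series path only (no and, or, or blend parents), it assumes the exponent in Eq. (7) is $-cost_i / Aga\_Dif$, and the function name, edge layout and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# edges[(parent, child)] = (cost, aga_dif, P(child | parent)); hypothetical layout.
EdgeData = Dict[Tuple[str, str], Tuple[float, float, float]]


def path_probability(path: List[str], prior: float, edges: EdgeData) -> float:
    """Iterate Eq. (7) along one path, popping nodes in last-in, first-out order."""
    stack: List[str] = []
    for node in reversed(path):      # target pushed first, start node pushed last
        stack.append(node)

    prev = stack.pop()               # the start node comes off the stack first
    prob = prior                     # prior probability of the start node
    while stack:
        node = stack.pop()
        cost, aga_dif, p_cond = edges[(prev, node)]
        p_m = 1.0 - math.exp(-cost / aga_dif)   # state transition index (assumed form)
        prob = p_m * p_cond * prob              # Eq. (7) applied to this node
        prev = node
    return prob


edges: EdgeData = {
    ("s1", "s2"): (math.log(2), 1.0, 0.5),   # P_m = 0.5
    ("s2", "s3"): (math.log(4), 1.0, 1.0),   # P_m = 0.75
}
p_prior = path_probability(["s1", "s2", "s3"], prior=0.2, edges=edges)
p_known = path_probability(["s1", "s2", "s3"], prior=1.0, edges=edges)
```

Setting the prior of the start node to 1 raises the path probability, which is the same qualitative effect the text reports when the resource state condition of $s_1$ is known to be met.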
In the example shown in Fig. 3, if $s_{12}$ is the attack target, it can be reached via three paths. Suppose that the weights $W$, the values of $depcoef$ and the attack costs are as shown in Fig. 3, and that the prior probabilities of $s_4$ and $s_1$ are 0.3 and 0.2, respectively. Taking Path 1 as an example and applying Eq. (7) step by step, the reachability probabilities of $s_9$ and $s_{12}$ along Path 1 are $P(s_9) = 0.0272$ and $P_{end}(s_{12}) = 0.0201$, respectively. In a similar way, the reachability probability of $s_{12}$ along Path 2 is $P_{end}(s_{12}) = 0.0648$, and the reachability probability of $s_{12}$ along Path 3 is $P_{end}(s_{12}) = 0.0573$. If the administrator knows that $a_1$ has been targeted, namely, $P(s_1) = 1$, then the reachability probability of $s_{12}$ along Path 1 can be recalculated as $P_{end}(s_{12}) = 0.0631$. This indicates that when the resource state condition corresponding to $s_1$ is met, $s_{12}$ is more likely to be attacked, as expected.

Posterior Probability Calculation Based on the B_NAG
In a B_NAG, changes in the network security conditions cannot be monitored in real time if only the probabilities of resource state nodes being attacked are calculated. Based on the detected preconditions and the available information on security incidents, the posterior probabilities should be calculated, and the related node probabilities can then be updated to achieve real-time monitoring. The equation for calculating a posterior probability is given in Eq. (8). Suppose that an attack on $s_{12}$ in Fig. 3 is detected, so that the probability of $s_{12}$ is 1. Then, the posterior probability of $s_9$ is calculated by Eq. (8).
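The posterior update can be illustrated with the standard form of Bayes' theorem, $P(s_i \mid o) = P(o \mid s_i)\,P(s_i) / P(o)$. This is a generic sketch under the assumption that the paper's posterior equation follows this standard form; the function name and the numbers are hypothetical and are not derived from Fig. 3.

```python
def posterior(p_obs_given_node: float, p_node: float, p_obs: float) -> float:
    """Bayes' theorem: P(node | observation) = P(observation | node) * P(node) / P(observation)."""
    if p_obs <= 0.0:
        raise ValueError("the observed event must have nonzero probability")
    return p_obs_given_node * p_node / p_obs


# Made-up numbers: a node with prior 0.1, an observation four times as likely
# when the node is occupied (0.8) as it is overall (0.2).
updated = posterior(p_obs_given_node=0.8, p_node=0.1, p_obs=0.2)
```

In this invented example, detecting the incident raises the probability of the upstream node from 0.1 to about 0.4, which is the kind of update that allows node probabilities to track detected security incidents.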

Experimental Analysis

Experimental Network Environment
To verify that the given method is feasible and effective, the experimental environment shown in Fig. 4 was created. The experimental network includes five hosts: the attacking machine, a web server, a file server, an e-mail server, and a database server. For ease of description, these hosts are represented by the letters A, W, F, E and D, respectively. W opens the telnet service, F opens the File Transfer Protocol (FTP) service, E opens the FTP and Hypertext Transfer Protocol (HTTP) services, and D opens the Oracle service. The final aim of attacker A is to obtain root permissions for host D, but the firewall allows the foreign host A access to only the telnet service of host W and denies other external access. Similarly, host E is allowed access to only the Oracle service of host D, while the other three hosts can openly gain access to each other's services. Host W can directly access host E; when it obtains access to the two services provided by host E, it can, in turn, gain direct access to the Oracle service of host D. Information about the internal hosts is shown in Tab. 3.

Experimental Results and Analysis
The RAG is generated in accordance with the attack graph model and the topological graph of the experimental network, and its loops are removed as previously described. The resulting attack behavior nodes are described in Tab. 4. These attacks are related to the services provided by the hosts and their vulnerabilities.
The conversion algorithm is then applied, based on the topological graph of the experimental network, to replace the attack behavior nodes listed in Tab. 4 with the corresponding edges. The converted B_NAG is shown in Fig. 5.
As shown in Fig. 5, each host node must win the trust of another host through a service provided by that other host, corresponding to a parallel "and" structure in the graph. When one host opens two services, the trust of that host can be obtained by gaining access to either one of its services, so the relationship between the possible attacks against that host is "or". A node with a mixed relationship can directly access the service provided by another host by crossing over the host it is attacking once it gains access to both services of the target host. A blend node is introduced to address the corresponding mixed relationship in this graph.
There are 5 paths in Fig. 5 through which the target host D can be reached. The attack path information and the probabilities of reaching each whole path are shown in Tab. 5. P 1 denotes the probability of reaching the whole path as calculated by considering the state transition index as proposed in this paper, while P 2 is the probability of reaching the whole path calculated without considering the state transition index.
Based on Tab. 5, the probability of reaching each host node is plotted in Fig. 6.
As shown in Fig. 7, the hosts attacked on Path1 and Path2 (and on Path3 and Path4) are the same; the only difference lies in the service of host E that is accessed. Path1 accesses the FTP service of host E, while Path2 accesses the HTTP service of host E. The final probabilities of reaching the whole path for Path1 and Path2 are 0.03247 and 0.05784, respectively, as calculated using the proposed algorithm based on the state transition index. Obvious differences can be seen between the two paths in terms of the probability of the attack successfully proceeding from host F to host E, as shown in Fig. 7a. By contrast, when the state transition index is not considered, the final probabilities for Path1 and Path2 are 0.4768 and 0.4789, respectively, and there is no meaningful difference in the probability of proceeding from host F to host E, as shown in Fig. 7b. With the proposed algorithm, although the reference value of the probability for each node decreases, the differences in probability associated with attacking different nodes are fully apparent. Therefore, this approach is effective in enabling network security administrators to perform useful analyses.
Figure 5: Example of a Bayesian network attack graph
As shown in Fig. 8, the traditional computational method for Path5, which includes a mixed relationship, is to calculate all "and" nodes and "or" nodes individually. This not only requires a large number of calculations but also ignores the correlations between nodes. The mixed node approach introduced herein provides better calculation results than the traditional method, and it does so with fewer calculations. For the mixed relationship identified when host W attempts to gain access to host E, the probability calculated by considering the state transition index effectively reflects the degree of hazard of the associated vulnerability, making this type of vulnerability more likely to be noticed by the network security administrator.

Conclusion
Improving the accuracy of network vulnerability assessments is an important topic in the field of network security. This paper presents a B_NAG model and an associated vulnerability algorithm as well as the algorithm E-Loop to eliminate loops in an attack graph. To effectively capture mixed relationships between nodes during the process of converting an RAG into a B_NAG, the Alg-AGTrans algorithm is also proposed. In addition, the indexes of node attack complexity and node state transition are introduced into the calculation of the probability of reaching each node, and the posterior probabilities are also calculated on this basis. The results of an experimental evaluation show that the proposed model can provide an accurate and effective assessment of network vulnerability. However, the proposed algorithm also has some shortcomings that should be addressed. For example, the effects of some factors, such as risk costs, are not considered when calculating the probability of reaching a node.
Funding Statement: This work was partially supported by the National Natural Science Foundation of China (61300216, Wang, H, www.nsfc.gov.cn).

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.