Applying the Bayesian Stackelberg Active Deception Game for Securing Infrastructure Networks

With new security threats cropping up every day, finding a real-time and smart protection strategy for critical infrastructure has become a big challenge. Game theory is suitable for solving this problem, for it provides a theoretical framework for analyzing the intelligent decisions from both attackers and defenders. However, existing methods are only based on complete information and only consider a single type of attacker, which is not always available in realistic situations. Furthermore, although infrastructure interconnection has been greatly improved, there is a lack of methods considering network characteristics. To overcome these limitations, we focus on the problem of infrastructure network protection under asymmetry information. We present a novel method to measure the performance of infrastructure from the network perspective. Moreover, we propose a false network construction method to simulate how the defender applies asymmetric information to defend against the attacker actively. Meanwhile, we consider multiple types of attackers and introduce the Bayesian Stackelberg game to build the model. Experiments in real infrastructure networks reveal that our approach can improve infrastructure protection performance. Our method gives a brand new way to approach the problem of infrastructure security defense.


Introduction
Modern society is highly dependent on infrastructure, and any failure of infrastructure will seriously affect people's daily life. With the increase of terrorism, how to effectively prevent attacks on infrastructure has become a worthwhile subject of research. For the protection of infrastructure, researchers also provide many research methods, such as probabilistic risk assessment and historical data analysis. However, because these methods need static input in the research process, it is not suitable for the study of intelligent countermeasure behavior [1][2][3]. Game theory is a natural modeling paradigm for a multi-agent intelligent interaction scenario, which can provide an accurate individual interaction model for the research on intelligent individual confrontation. As Hall [4] mentioned, "if the conditions creating the problems you had to deal with were natural or random, the answer was decision analysis (which looked a lot like what we now call risk analysis). If the conditions creating the problems you had to deal with were the result of deliberate choice, the answer was game theory." Feng et al. [5] proposed a game theory method to optimize the allocation of defense resources, which combines game theory with a risk assessment to optimize the allocation of limited defense resources in a city. Zhang et al. [6] analyzed the general intrusion detection system of infrastructure and proposed a game theory model for infrastructure security management. Nochenson et al. [7] obtained Nash equilibrium strategies under various cost conditions through simulation, to better provide infrastructure administrators with possible attack behavior and possible mitigation measures. Guan et al. [8] proposed a game theory model to study how the balanced allocation of the defender exist, the strategy selection problem of the attack and defense decision makers should be solved by game theory.
Researchers have made a preliminary exploration of the application of the game model to infrastructure protection from the perspective of the network. Li et al. made a preliminary study of the attack-defense game on complex networks by using simultaneous game theory [42] and sequential game theory [43] respectively, but this problem is still worth further exploring. Our previous work has studied the attack and defense game of infrastructure networks under asymmetric information conditions [44]. In the protection of infrastructure networks, there is still a lack of research on various types of attackers.
In summary, some researchers used game theory to study infrastructure network, and some researchers began to conduct network modeling of infrastructure system. However, there is still a lack of research that combines network science and game theory to model the infrastructure system from the perspective of a network and to conduct offensive and defensive games for uncertain attacker types. When facing multiple types of attackers, how to make use of the information advantage of asymmetric information between offensive and defensive sides to improve defense payoff is the focus of this paper. We first study the infrastructure attack-defense game based on the Bayesian Stackelberg game under the condition of multiple types of attackers and asymmetric information. To our best knowledge, this idea is new.
In this paper, we evaluate the network performance from the perspective of network science. A method of constructing asymmetric information is introduced. Then, we propose an active defense approach for the defender facing multiple types of attackers based on the Bayesian Stackelberg sequential game to improve the defender's payoffs by providing false information to the attacker.

Bayesian Stackelberg Active Deception Game Considering Multiple Types of Attackers
Consider a target network formalized in terms of an undirected graph G (V, E), where V is the set of nodes, E ⊆ V × V is the set of edges. The number of nodes is N = |V|. k i is the number of adjacent edges of node v i . Denoted by k = 1 N N ∑ i=1 k i the average degree of the network.
We classify players in the game as the defender and attacker. The type of defender in the model is single and the types of the attacker in the model are multiple. In the security game, as a convention the defender is usually assumed to be a female character and the attacker is assumed to be a male character.
In this section, we first introduce the motivation to research active deception defense with a Bayesian Stackelberg game in Section 2.1. Subsequently, the Stackelberg active deception game is defined in Section 2.2. Finally, we introduce the Bayesian Stackelberg active deception game considering multiple types of attackers in Section 2.3.

Motivation of Bayesian Stackelberg Active Deception Game
The active deception game is a game model based on asymmetric information of attack and defense, which is consistent with the fact that the target network information of attack and defense is asymmetric. Previous studies have shown that an attacker will obtain 80 percent (some say 100 percent) of the necessary information through public information or other intelligence sources before launching an attack [45]. Therefore, it is appropriate to assume that the attacker will collect complete network information before carrying out the attack. The model is studied as a sequential game model, that is to say, the defender first promises to use his mixed defense strategy, and the attacker chooses the attack strategy to maximize his payoff according to the defender's promise. However, on the other hand, the defender does not know whether the attacker has mastered the hybrid strategy promised by the defender, which means the defender does not know whether the active deception game is a simultaneous game or a sequential game. Zhuang and Bier have proved that the leader gains 'first-mover advantage' when the follower responds most to a single choice [19]. Therefore, the defender can choose to publish her defense strategy to force the attacker to play a sequential game. Thus in the field of security protection, a sequential game is more practical and useful than a simultaneous game.
Although we believe that the attacker has obtained enough disinformation (from the defender's active deceptive defense) before the attack, the defender still faces the challenge brought by the uncertainty of the type of potential attacker. Therefore, to make the active deception game more realistic and credible, it is necessary to extend the Stackelberg active deception game to the Bayesian Stackelberg active deception game.
Our study is based on the Stackelberg active deception game. The basic settings of the Stackelberg active deception game are described in Section 2.2.

Stackelberg Active Deception Game
The Stackelberg active deception game mainly applies the asymmetric information of both sides of attack and defense. After the defender reveals the false network to the attacker, the defender first promises to use a mixed defense strategy, and then the attacker chooses the optimal strategy to obtain more attacker's payoff. In this section, we first introduce the method for constructing false network information. Then we describe the cost model and strategies in the Stackelberg active deception game.

Method for Constructing the False Network Information
Because the defender masters the target network, she grasps the real target network completely. However, the attacker's mastery of network information is easily disturbed by the defender. We use α to represent the noise level of the disclosed network. Our model constructs the false network by adding α × |E| false edges and reducing α × |E| real edges on the true network. This construction method can keep the total number of edges in the network unchanged. Meanwhile, to control the credibility of the false network, the noise level range in this study is α ∈ [0, 0.5]. We use d i to represent the displayed degree of node v i in the false network.

Cost Model
Since attacking nodes can lead to more serious consequences, we assume that both attack and defense methods are targeted at nodes. An attack on a node will cause the connected edges to be removed altogether. The consumption of defending and attacking a single node is positively correlated with the degree of nodes because the degree of nodes in some infrastructure networks represents the importance of nodes to some extent. Correspondingly, the consumption of defending and attacking will increase with increasing importance. We use c D i and c A i to represent defense cost and attack cost of node i, respectively. Defense and attack costs are defined as follows: Among them, q D and q A are the defense cost coefficient and attack cost coefficient, respectively. Research shows that protecting a single infrastructure consumes far more resources than attacking it [45]. The defense resources allocated by the defender for node v i are r D i = q D k i = c D i , and the attack resources allocated by the attacker for node v i are r A i = q A d i = c A i . We define the defense resources needed to cover the whole network as (2) and the attack resources needed to cover the whole network as The resources available to the defender and the attacker are represented as R D and R A , which are defined as follows: In our research model, the focus of our research is on the strategies adopted by both sides when the attack and defense resources are not enough to cover the whole network. Therefore, we limit the attack and defense resources by controlling the parameters θ D and θ A at [0, 1], which is not enough to cover the whole network. When the resources allocated to the attacking and defending sides exceed the resources needed to cover the whole network, the so-called strategic game of attacking and defending loses its meaning and the maximum payoff can be obtained by directly covering all network nodes.
The aggregates of defended and attacked nodes are denoted as V D and V A , respectively. The defense strategy is expressed by D = [de f 1 , de f 2 , · · · de f N ], and the attack strategy is expressed by A = [att 1 , att 2 , · · · att N ]. If the node v i ∈ V D , then de f i = 1, otherwise de f i = 0. Similarly, the representation of the attack strategy is similar to that of the defense strategy.
C D and C A represent the total cost of defense resources and attack resources, respectively. We define C D and C D , and set the constraint as follows: and If the node is allocated the corresponding defense resource (de f i = 1), then the node v i is the protected node, we assume that the protected node v i will never be removed. Conversely, if an unprotected node is attacked, that is, att i = 1 and de f i = 0, there is a probability that the node will be removed. We define the probability of a successful attack on an unprotected node v i as the attack success rate:

Strategies
The strategy chosen by the defender and the attacker must satisfy Equations (5) and (6), respectively. The number of both players' strategies will increase dramatically as the number of node increases. Intuitively, for any complex network, the strategy space of confronting both sides is huge. In reality, the choice of strategies for both sides can not be completely random. For the convenience of research, we shrink the space of attack strategies to a high-degree attack strategy (HAS) and a low-degree attack strategy (LAS). In HAS, according to the degree of nodes, the transition from high-degree nodes to low-degree nodes is selected in turn until defensive resources are exhausted. The purpose of HAS is to attack some high-degree nodes to achieve a better attack effect. In LAS, on the contrary, nodes are chosen from low-degree nodes to high-degree nodes. Its purpose is to attack a large number of low-cost nodes. Similarly, defense strategies are composed of two typical defense strategies, namely a high-degree defense strategy (HDS) and a low-degree defense strategy (LDS).
Denoted by vector X = [x h , x l ] (x i ∈ [0, 1], x h + x l = 1) a mixed defense strategy, where x i represents the probability of adopting strategy i, and i ∈ {h, l} represents one of the two typical defense strategies-high-degree strategy or low-degree strategy. Similarly, Y = [y h , y l ] (y i = 0 or 1, y h + y l = 1) represents a pure attack strategy. Denoted by S D and S A are the defender's mixed strategy space and the attacker's pure strategy space, respectively.

Bayesian Stackelberg Active Deception Game
A Bayesian Stackelberg active deception game is a Stackelberg active deception game in which the defender faces multiple types of attackers. The defender only knows the prior probabilities of different types of attackers. In this paper, we only focus on the uncertainty of attackers' different types, not the uncertainty of their attack performance.
In this paper, we consider only one type of defender and multiple types of attackers. Without loss of generality, we assume that the defender faces only two types of attackers. One type of attacker focuses on the overall performance of the target network and is named the 'global-type attacker', while the other type focuses solely on the attack success efficiency and is called the 'local-type attacker'.
Define T as the set of possible types of attackers and define P as the prior probability of different types of attackers. In our study, For a given attacker type t ∈ T, suppose D t (X, Y) be the defender's payoff function when the defender chooses strategy X, and the t-type attacker adopts pure strategy Y.
The measure function of network performance is denoted by Γ. In this paper, we adopt the size of the largest connected component as the measure function.
Thus, the defender's payoff function is The defender pays attention to the defense effect on the target network-the smaller Γ Ĝ is, the smaller the payoff. Among them,Ĝ denotes the size of the largest connected component of the network remained after the confrontation.
The global-type attacker focuses on network performance, so we define the payoff function of the attacker as The local-type attacker focuses on the attack success efficiency, so we define the payoff function of the attacker as Among them, V remove is a set of nodes removed after a successful attack from the attacker's perspective. According to the above method, we can obtain the payoff matrix of the defender and the two types of attackers under all strategy interactions |S A × S D | which is shown in Figure 1. In Figure 1a, D g i,j is the payoff of the defender when facing the global-type attacker-the defender chooses strategy i and the attacker adopts strategy j. At this time, the payoff of the attacker is A g i,j . The row and column players are the defender and attacker, respectively. Similarly, when the defender faces the local-type attacker, the payoff matrix of the defender and the attacker are D l i,j and A l i,j , respectively, as shown in Figure 1b. A Bayesian Stackelberg Equilibrium (X * , Y 1 , Y 2 , · · · , Y |T| ) for the Bayesian Stackelberg active deception game is defined by Equations (11) and (12).

Solving the Active Deception Game Considering Multiple Types of Attackers
In game theory, the most common solution concept is Nash equilibrium. Under this equilibrium strategy combination, any participant cannot increase his own payoff by unilaterally changing the strategy [46]. Stackelberg equilibrium is a refinement of the Nash equilibrium concept in a Stackelberg game. In this equilibrium, each player will choose the best response in each sub-game of the original game. But when multiple strategies are not different for followers, the concept cannot guarantee a unique solution. In order to obtain the unique solution, Leitmann [47] proposed two concepts of Stackelberg equilibrium, which were named "Strong Stackelberg equilibrium" (SSE) and "Weak Stackelberg equilibrium" (WSE). Strong Stackerlberg equilibrium exists in all Stackelberg games, while Weak Stackerlberg equilibrium does not necessarily exist [48].
In the case that multiple types of attackers are considered, the attacker chooses the optimal attack strategy after knowing the defense plan of the defender, and then the Bayesian active deception game can be solved by Bayesian Stackelberg equilibrium (BSE).
After obtaining the payoff matrices of the defender and two types of attackers, we introduce an efficient exact method known as DOBSS (Decomposed Optimal Bayesian Stackelberg Solver) to calculate the Bayesian Stackelberg game equilibrium (BSE) [20].
The defender's strategy we denote by X, which consists of a vector of probability distributions over the defender's pure strategies. Hence, the value x i is the proportion of times in which pure strategy i is used. Denote by Y t the vector of strategies of attacker type t ∈ T. We also denote the payoff of the defender and each of the attacker types t by D t ij and A t ij . Let M be a large positive number. Assume that the prior probability of the t-type attacker the defender is facing is p t , with t ∈ {g(global), l(local)}, the defender solves the following: Among them, the prior probability of the occurrence of t-type attackers is represented by p t . x i represents the probability that the defender adopts strategy i in the mixed defense strategy. y t j represents the probability that the t-type attacker adopts strategy j in a pure attack strategy. Constraints 1 and 4 jointly restrict the value range of the probability of adopting various defense strategies in the mixed strategy of the defender to be [0, 1], and the sum of the probabilities of various defense strategies to be 1. Constraints 2 and 5 indicate that the attacker chooses the attack strategy after mastering the defense strategy of the defender, so the pure attack strategy is adopted. The t-type attacker chooses only one strategy as the optimal response strategy in the strategy set S A and y t j can only be 0 or 1. In constraint 3, since y t j is 0 or 1, when y t j is 0, since M is a large positive number, the right part of the constraint is always satisfied, and the left part of the constraint requires m t ≥ ∑ i∈S D A t ij x i . When y t j is 1, constraint 3 requires m t = ∑ i∈S D A t ij x i . In other words, given the mixed defensive strategy x of the defender, m t is the upper bound of the t-type attacker's payoff.
We can linearize the quadratic programming problem Equation (13) through the change of variables z t ij = x i y t j , thus obtaining the following mixed-integer linear programming problem: From the above BSE solving process, it can be seen that the BSE calculated by the DOBSS algorithm is also a Strong Stackelberg Equilibrium. It's just that the number of followers goes from one to multiple with a prior probability.

Game Equilibrium of Active Deception Defense Game
Because power-law networks exist widely in natural networks, a large number of infrastructure networks show power-law characteristics. Therefore, our research takes the scale-free network as the research object. The experimental object in this paper is the scale-free network (N = 500, k = 6) constructed by the BA model [49]. Our experimental results are obtained from the average of 1000 independent repeated experiments. Figure 2 shows the equilibrium payoffs when the noise level α equals 0, 0.1, 0.5, and the defender faces the global-type attacker and the local-type attacker separately. It can be seen that when the level of noise increases continuously, the equilibrium payoffs of the defender facing two different types of attackers also increase. In Figure 3, we show the sequential game equilibrium strategy of the defender and the attacker when the noise level α equals 0, 0.1, 0.3 and 0.5. Figure 3a,b are equilibrium strategies for the defender and the attacker when the defender faces the global-type and local-type attacker separately. The first line in each subgraph is a graphical representation of the defender's equilibrium defense strategy. The color of the square represents the probability of adopting HDS in the mixed defense strategy. The lighter the color of the square, the higher the probability of HDS in the equilibrium defense strategy. The darker the color, the lower the probability of representing LDS in an equilibrium defense strategy. The second line in each subgraph is an illustration of the attacker's equilibrium attack strategy. The red squares represent HAS, the orange squares represent LAS, and the yellow squares represent the same equilibrium payoff of the defender regardless of whether the attacker plays HAS or LAS.  We can find that, with the increase of noise level α, the attack strategies of the two types of attackers gradually tend to choose HAS. We obtain that the displayed degree expectation of the node v i is Among them, E(k cut i ) is the expected degree deduction of node v i and E(k add i ) is the expected increased degree of node v i . We randomly select the false network generated at different noise levels. As shown in Figure 4, we can see that the change trend of nodes' displayed degree matches the conclusions we deduced.
From Figure 4, we observed that nodes with a high degree have different degrees of decline in different noise levels, which is matched with Equation (4). Combined with Equation (7), we find that the false network used in our model does not change the ranking of node degree in the network, but only changes the displayed node degree d i [44].

Game Equilibrium of Bayesian Active Deception Defense Game
In the experiments, the defender uses the prior probabilities of the two attacker types to calculate the payoff matrices of the defender and the two types of attackers through the function in Section 2.3. By adopting the DOBSS, we get the defender's mixed defense strategy, the global-type attacker's pure attack strategy and the local-type attacker's pure strategy, as shown in Figures 5-7, respectively.
We observe that when p g = p l = 0.5, the defender's defense equilibrium strategies are basically the same as the defender's defense equilibrium strategies when facing only the global type of attacker. From the side, it can be seen that the global-type attacker poses a more significant threat to the defender, which is consistent with the global-type attacker's efforts to reduce network performance.
On the other hand, when the noise level α is increasing, the attacker tends to adopt the HAS strategy. With the increasing noise level, the defender can use the 'first-mover advantage' to induce the attacker to adopt HAS with reduced attack success rate, to achieve the goal of improving the equilibrium payoff of the defender.
Taking p g = 0.1 and 0.9 as examples, we show the experimental results at θ D = 0.3 and θ A = 0.8. When p g = 0.1 and p l = 0.9, the probability of occurrence of a local-type attacker is large, and the probability of a global-type attacker is small. When the noise level α equals 0, 0.1, 0.3 and 0.5, respectively, the defender equilibrium mixed strategy (x h , x l ) is

Sensitiveness Analysis
From the previous experiments and analysis, we know that before calculating the Bayesian Stackelberg game equilibrium, the defender must obtain the prior probability of multiple types of attackers. However, in real life, it is difficult for the defender to estimate the exact prior probability P = [p exact g , 1 − p exact g ]. The defender must infer the occurrence probability of multiple types of attackers through data. We define the defender's estimate of P as P = [p estimate g , 1 − p estimate g ]. Then, the defender employs the Bayesian Stackelberg active defense game and uses P as input to calculate her optimal strategy. We calculate the equilibrium defense strategy that the defender considers to be optimal at this time and then calculate the real expected defender payoff by the accurate prior probability P. We compare the calculated real expected defender payoff with the defender's equilibrium payoff, calculated with the accurate prior probability P as input. To test the sensitivity of the equilibrium payoff to prior probability P, we present two groups of experimental data. In the first group of experiments, p estimate g = 0.5, p exact g = 0.1. The experimental result is shown in Figure 8. In the other group, p estimate g = 0.5, p exact g = 0.9. The experimental result is shown in Figure 9. Comparing the defender payoff reductions in both cases, we find that the defender payoff reductions in the first group of experimental data are significantly greater than the second group of experimental data. We know that in the first case, the defender's defender strategy is the same as in Figure 5b. The actual optimal mixed defense strategy is consistent with the Figure 5a. In the second case, the defender's defender strategy is still the same as the Figure 5b, and the optimal mixed strategy in the actual situation is consistent with the Figure 5c. The optimal pure attack strategy for both types of attackers is the same in both cases, consistent with the Figure 6b and the Figure 7b.
From the defender's strategy, the defense strategy adopted by the defender after misjudging the prior probability of the attacker type is closer to the overall trend of the optimal defense strategy in the second case.
In the most extreme case, the defender does not take into account two types of attackers. The defender's misjudgment of the attacker's type inevitably leads to the decline of defender's payoff. We conducted experiments on the situation of complete misjudgment. In Figure 10, the defender misjudges that the attacker is the local type, but is actually the global type, that is, P = [0, 1] and P = [1, 0]. In Figure 11, the defender misjudges that the attacker is the global type, but is actually the local type, that is, P = [1, 0] and P = [0, 1]. From Figures 10 and 11, we can see that misjudgment of the attacker type will bring about a decrease in the defender's payoff. Therefore, it is necessary to study the situation of the defender facing multiple types of attackers.
In  Figures 10 and 11 also show that it is better to overestimate a dangerous enemy (i.e., the global-type attacker) than to underestimate it. This is because the global-type attacker who pays equal attention to the overall performance of the network as the defender will cause greater losses to the infrastructure network.
From Figure 2, we can find that with the increase of noise level α, the equilibrium payoff of the defender increases, whether in the face of global-type attacker or local-type attacker. The improvement of noise level α can effectively improve the final payoff of the defender. Meanwhile, by observing , we find that the final payoff of the defender is that the higher the noise level α is, the smaller the gap is between the false judgment of the prior probability of the two types of attackers and the correct judgment of the prior probability of the two types of attackers.

Conclusions and Discussions
With the emergence of terrorism, the protection of infrastructure has attracted the attention of more and more researchers. Moreover, with the continuous development of science and technology, infrastructure is becoming more interconnected, which makes infrastructure exhibit network characteristics. This allows us to consider infrastructure protection from a network perspective. The existing infrastructure network game model basically stays at the initial stage, assuming that both sides of attack and defense move at the same time, and both sides have complete information. For this reason, the Bayesian Stackelberg active deception defense game proposed in this paper mainly studies the defense actively transferring false network information to attackers under the condition of asymmetric information, and facing multiple types of attackers.
Firstly, we introduced a false network construction method to simulate the defender apply of asymmetric information to defend against the attacker actively.
Secondly, we apply the Bayesian Stackelberg game to simulate the reality that the defense of the infrastructure faces multiple types of attackers.
Finally, we conducted experiments on the scale-free network. Through experiments, we find that using false networks for active defense does improve the defender's equilibrium return. After analyzing the attacker's equilibrium strategy, it is found that the defender applies the 'first-mover advantage' to induce the attacker to adopt HAS whose attack success rate decreases. By analyzing the sensitivity of the prior probability of attacker type to the defender's equilibrium payoff, we verify that the global-type attacker poses a greater threat to the infrastructure network. The defender should overestimate the probability of the dangerous attacker (i.e., a global-type attacker) rather than underestimate the probability of the dangerous attacker.
Although the model we built is closer to reality than that in previous research, we still have much room for improvement compared with the complexities of reality. Next, we need to consider irrational attackers based on our model. The existing research on using game theory to improve infrastructure network performance is based on rational attackers. However, in real life, an attacker is not necessarily rational. Modeling adversary bounded rational behavior should be considered. The SHARP (Stochastic Human behavior model with AttRactiveness and Probability weighting) model based on success or failure of the adversary's past actions on exposed portions of the attack surface to model adversary adaptiveness [50] provides a good choice for our model of irrational attackers. Next, we will combine the bounded rational game model with network science to study the protection of infrastructure networks. Our research in this article is aimed at a single type of defender and multiple types of attackers, but the number of defenders and attackers is unique. In reality, multiple attackers and multiple defenders will also appear at the same time. Evolutionary game theory provides a train of thought for solving evolutionary stable strategy (ESS) by modeling a group dynamic game under the conditions of bounded rationality and incomplete information [51].
We note that the method of constructing a fake network is not unique, and its essence is to provide an incorrect or indeterminate degree of network nodes. The literature [52] gives us inspiration. Next, we will explore the influence of the information entropy of incomplete and imperfect information on offensive and defensive games from the perspective of information theory.

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.