1. Introduction
Cyber–physical systems (CPS) are considered to be among the revolutionary technologies due to the continuous technological breakthroughs and innovations in information technology and in the manufacturing industry [
1]. A CPS is a multidimensional, complex system that deeply integrates computing, communication and control (the 3C technologies). Through the cognition, communication and control of physical objects, it realizes large-scale information acquisition and intelligent control of the physical world, so that the network can monitor the specific actions of a physical entity in a real-time, reliable, remote and safe way [
2,
3]. CPS is widely used in aerospace, industrial production, advanced automobile systems, energy reserve, environmental monitoring, national defense and military, infrastructure construction, intelligent building, smart grids, transportation systems and telemedicine [
4]. With the rapid development of network, computing, sensing and control systems, CPS technology is used more and more widely. At the same time, emerging network attacks make wireless CPS very fragile, so the security of CPS has become the primary consideration [
5,
6,
7].
For the security of a system’s remote state estimation, malicious network attacks take many forms, but they fall into three main categories: denial of service (DoS) attacks, integrity attacks (including replay and false data injection attacks) and eavesdropping attacks [
8]. DoS attacks are designed to interfere with wireless communication channels, which can significantly degrade the estimation accuracy in CPS [
9]. Peng [
10] and Zhang [
11] formulated the optimal attack power allocation for remote state estimation in a multi-system as a Markov decision process (MDP). Integrity attacks can disrupt the transmitted data packets under a stealth constraint [
12,
13]. In Ref. [
14], a scenario is designed from the attacker’s point of view in which a false data injection attack can completely and covertly destroy the CPS. In addition, the channel may be subject to eavesdropping attacks, in which private data are intercepted; this can lead to serious economic losses and even pose a threat to human life [
15,
16]. For example, in an intelligent transportation system, eavesdroppers can infer the planned routes of vehicles by monitoring their location information, and on this basis eavesdropping attacks can easily succeed [
17,
18]. In terms of existing research, data encryption is the main method to protect system privacy from eavesdropping attacks [
19,
20,
21].
Recently, the issue of remote state estimation in the presence of eavesdroppers has attracted widespread attention from researchers. Eavesdropping attacks are divided into passive and active attacks. Some estimation and control problems have been studied in the presence of active attacks. Han [
22] studied the problem of active eavesdropping on fading channels, and proposed an interference-assisted eavesdropping method to improve the probability of successful monitoring. Yuan [
23] constructed a two-person non-zero-sum game between the sensor and the active eavesdropper, in which each player aims to minimize its own estimation error covariance and maximize that of its opponent. Ding [
24] formulated the trade-off between stealth and eavesdropping performance as a constrained MDP, and proposed an optimal strategy for active eavesdropping.
The above literature indicates certain breakthroughs in the design of active eavesdropping solutions. This paper mainly studies passive eavesdropping attacks. Tsiamis [
25] proposed a confidentiality mechanism that randomly withholds sensor information, and explored the trade-off between user utility and control-theoretic confidentiality through optimization methods. Huang [
26] proposed a new encryption strategy and considered the cost of the encryption process. Then, the optimal determinism of the encryption strategy and the existence of the Markov strategy in the finite time horizon are proven. Wang [
27] theoretically proved that there are some structural properties in the optimal transmission scheduling for known and unknown eavesdropper estimation errors. In reference [
28], the transmission scheduling strategy of remote state estimation systems with eavesdroppers on packet-dropping links was studied. Yuan [
29] transformed the system model into an MDP in order to obtain the optimal transmission schedule that minimizes the age of information (AoI) of the CPS while keeping the AoI of the eavesdropper above a certain level, and proved that the optimal transmission scheduling strategy has a threshold structure with respect to the AoI of the CPS and of the eavesdropper, respectively. In [
30], the proposed problem is formulated as a Stackelberg game, and the strategy of maximizing the secure transmission rate between sensor and controller in the presence of malicious eavesdroppers and disruptors is studied. On the basis of analyzing the influence of different strategies on eavesdropping performance, Zhou [
31] studied the multi-output system and proposed a decryption scheduling scheme to minimize the expected estimation error under the condition of energy constraint.
Most of the existing literature studies the optimal transmission strategies of sensors from the perspective of the remote estimator. Compared to [
27,
28], this paper studies the optimal attack energy allocation strategies from the perspective of eavesdroppers. Moreover, the previous literature mainly focuses on CPS with eavesdroppers in a single system over a finite time horizon, and pays little attention to eavesdroppers in a multi-system over an infinite time horizon. In this paper, the optimal attack allocation problem of remote state estimation in CPS under eavesdropping attacks in a multi-system over an infinite time horizon is studied. Our goal is to maximize the state estimation error of the eavesdropper, so as to determine the optimal attack allocation of the eavesdropper. The contributions of this paper are as follows:
We propose a multi-system eavesdropping attack model based on channel SINR, which reveals the relationship between attack power and packet arrival rate.
In the infinite time horizon, under the condition of energy constraint, the optimal attack scheduling strategy is obtained by constructing MDP and using the Bellman equation.
Finally, according to the given algorithm, the optimal attack energy allocation strategy is obtained, and then it is verified by simulation experiments.
Notations: The following symbols are used throughout the paper. $\mathbb{N}$ is the set of natural numbers. The n-dimensional Euclidean space is denoted by $\mathbb{R}^{n}$. $\mathbb{S}_{+}^{n}$ ($\mathbb{S}_{++}^{n}$) is the set of n by n positive semidefinite (positive definite) matrices. $\mathrm{Tr}(X)$ is the trace of a matrix X, $X^{T}$ is the transpose of X, and $X^{-1}$ denotes the inverse of X. $X > 0$ and $X \geq 0$ represent that X is a positive definite matrix and positive semidefinite matrix, respectively. For functions f and h, $f \circ h$ stands for the function composition $f(h(\cdot))$, and $f^{n}(X) = f(f^{n-1}(X))$ with $f^{0}(X) = X$. $\mathbb{E}[\cdot]$ denotes the expected value of its argument, and $\Pr(\cdot)$ denotes the probability of its argument.
3. Optimal Attack Schedule
In this section, we formulate Problem 1 as a discrete-time MDP and solve it. We also give an algorithm for searching for the optimal eavesdropping attack strategy.
3.1. MDP Formulation
For the convenience of notation, denote
(or
) as the holding time of the estimator (or eavesdropper) at time
k, that is, the duration from the last successful transmission time to time
k, which can be expressed by the following formula:
Obviously,
(or
), then we can get:
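To illustrate the holding-time definition above, the following Python sketch computes the holding time at each step from a sequence of packet-arrival indicators. It is a minimal sketch under stated assumptions: the function name `holding_times` and the parameter `initial` are hypothetical, and the holding time is assumed to reset to zero at the instant a packet is received and to grow by one at every step without a reception.

```python
from typing import List, Sequence


def holding_times(arrivals: Sequence[int], initial: int = 0) -> List[int]:
    """Return the holding time after each time step.

    arrivals[k] is 1 if the packet sent at step k was received, else 0.
    `initial` is the (assumed) holding time just before the first step.
    """
    tau = initial
    history = []
    for received in arrivals:
        # A successful reception resets the holding time; a loss extends it.
        tau = 0 if received else tau + 1
        history.append(tau)
    return history


if __name__ == "__main__":
    # Reception at steps 0 and 3, losses elsewhere.
    print(holding_times([1, 0, 0, 1, 0]))
```

The same function can be applied separately to the arrival sequence seen by the remote estimator and to the one seen by the eavesdropper.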
We use an MDP to describe the dynamic process of the CPS under eavesdropping attacks. The MDP is expressed mathematically as , and its elements are as follows.
State space: let , where and can be considered as the state of process i at time at the remote estimator side and eavesdropper side, respectively. The state at time k is defined as , and its value range is a countable state space . Let .
Action space: we can know the action space is defined as , where , , is the maximum attack power to channel i. Thus, the action is .
Transition probability: let the state transition probability matrix at time
k be
, which represents the probability of the state changing from
to
under action
, where
,
. For simplicity, let the state at time
k be
. Then, the state transition probability matrix is as follows:
Payoff functions: let
be the immediate cost function and define it as:
Obviously, the single-stage reward at time k is independent of the action and depends only on the current state.
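As an illustration of the transition structure described above, the following Python sketch computes the one-step transition probabilities for a single process. It assumes that the state is a pair of holding times (estimator side, eavesdropper side), that each side receives the packet independently with a given arrival probability, and that holding times are capped at a hypothetical bound `cap` to keep the state space finite. The names `transition_probs`, `lam`, `mu` and `cap` are illustrative and not taken from the paper; in the paper's model the arrival probabilities would depend on the chosen attack power.

```python
from itertools import product
from typing import Dict, Tuple

State = Tuple[int, int]  # (estimator holding time, eavesdropper holding time)


def transition_probs(state: State, lam: float, mu: float, cap: int) -> Dict[State, float]:
    """One-step transition probabilities from `state`.

    lam: assumed packet arrival probability at the remote estimator.
    mu:  assumed packet arrival probability at the eavesdropper
         (a function of the attack power in a full model).
    cap: assumed upper bound used to truncate the holding times.
    """
    tau, eta = state
    # Each holding time either resets to 0 (arrival) or grows by one (loss).
    estimator_moves = [(0, lam), (min(tau + 1, cap), 1.0 - lam)]
    eavesdropper_moves = [(0, mu), (min(eta + 1, cap), 1.0 - mu)]

    probs: Dict[State, float] = {}
    for (t_next, p_t), (e_next, p_e) in product(estimator_moves, eavesdropper_moves):
        key = (t_next, e_next)
        # Accumulate, because truncation can map two outcomes to one state.
        probs[key] = probs.get(key, 0.0) + p_t * p_e
    return probs


if __name__ == "__main__":
    for nxt, p in sorted(transition_probs((1, 2), lam=0.8, mu=0.5, cap=5).items()):
        print(nxt, round(p, 3))
```

For several processes, a joint transition probability could be formed as the product of such per-process probabilities, provided the channels are assumed independent.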
Note that the randomized decision rule of the eavesdropper is a mixed strategy sequence
, where
is the stochastic kernel from
to
and define
as the set of all such feasible strategies. Based on the process state
, the attacker chooses action
,
. Then, for the initial state
, we can get the sum of expected reward
following the action strategy
:
and its optimal value
is
Define the average value function under policy
as the function
V:
. Therefore, we can get the following theorem.
Theorem 1. According to MDP theory, the optimal value can be obtained by solving the following optimality (Bellman) equation:
where s = is the initial state. The optimal attack strategy of the eavesdropper is:
Proof of Theorem 1. According to Chapter 8 of reference [
37], Theorem 1 can be obtained by substituting our state transition probability matrix (
27) and immediate cost function (
28).
From Equation (8.4.2) in [
37], we can get the following equation:
where
is the strategy at time
k.
r and
are abbreviations. Many decision rules are contained in historical strategies. So,
and
can be decomposed into the following formula:
Therefore, we can get the following:
Then, we rewrite the finite-horizon optimality Equation (4.5.1) in [
37] as
Thus, we can get the optimality (Bellman) Equation (
31) and the optimal attack strategies of the eavesdropper (
32). So, Theorem 1 is proved. □
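For orientation, the finite-horizon optimality equations of a generic MDP take the textbook form below. This is a sketch in placeholder notation, not a reproduction of Equations (31) and (32): the symbols $u_k$, $r$, $r_T$, $p$, $\mathcal{S}$, $\mathcal{A}$ and $T$ are assumptions introduced here, and a maximization is assumed because the eavesdropper's objective is stated as a maximization.

```latex
\begin{align}
u_T(s) &= r_T(s), \\
u_k(s) &= \max_{a \in \mathcal{A}} \Big\{ r(s,a) + \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, u_{k+1}(s') \Big\},
\quad k = T-1, \dots, 0, \\
a_k^{*}(s) &\in \arg\max_{a \in \mathcal{A}} \Big\{ r(s,a) + \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, u_{k+1}(s') \Big\}.
\end{align}
```

When the single-stage reward depends only on the state, $r(s,a)$ reduces to $r(s)$ and the action influences the value only through the transition probabilities.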
Remark 1. It should be noted that for finite MDP, the action taken at time k is non-stationary and depends on the current state at time k.
Remark 2. The optimal attack energy allocation strategy of (29) can be obtained from the optimality (Bellman) Equation (31). In addition, the optimal strategy is stationary and deterministic, which helps us to identify the structural characteristics of the optimal allocation strategy.
3.2. Policy Iteration Algorithm
The MDP proposed in this paper has an infinite state space. However, from the state transition characteristics of the system model, when the eavesdropper’s attack energy is limited, the transition rule effectively confines the system state within a finite time range. Therefore, although the MDP has an infinite state space, we can treat it as an MDP with a finite time domain, which makes it convenient to design an algorithm for the optimal attack strategy.
In a finite time domain, the solution of the optimality equation is the optimal value function from decision time k to the terminal decision time T. Based on the MDP constructed above, we provide a backward induction algorithm, Algorithm 1, that solves it and returns the optimal attack strategy.
In Algorithm 1, we first calculate
, the packet rate
and the hold time
,
in step 1 and calculate the state transition matrix
in step 2. Then, in step 3, set all
and for all
, compute
by (
31). Next, in step 4, we set
, initialize
. In step 5, let
, compute
by (
31) and
by (
32). We assess that the best action 1 is found for state
, so in step 6, if
, then for all
, let (34). After this, let
, and go to Step 5. Otherwise, let
, go to Step 5. Finally, in step 7, if
, output
and
. Otherwise, go to Step 4.
Algorithm 1 Backward induction algorithm for optimal allocation strategy
Require: .
Ensure: The optimal value ; optimal deterministic Markov policy
Step 1: Calculate , the packet rate and the holding time ,
Step 2: Calculate state transition matrix
Step 3: Set all and for all , compute by (31).
Step 4: Set , initialize .
Step 5: Let , compute
Step 6: If , then, for all , let
let , and go to Step 5. Otherwise, let , go to Step 5.
Step 7: If , then output and . Otherwise, go to Step 4.
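The backward induction idea behind Algorithm 1 can be sketched in Python for a generic finite MDP. This is a minimal sketch under stated assumptions, not the paper's exact procedure: the function `backward_induction`, the toy reward `toy_reward` and the toy transition `toy_trans` are hypothetical names, the terminal value is assumed to be zero, and ties are broken in favour of the first action listed.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable


def backward_induction(
    states: Sequence[State],
    actions: Sequence[Action],
    reward: Callable[[State, Action], float],
    trans: Callable[[State, Action], Dict[State, float]],
    horizon: int,
) -> Tuple[Dict[State, float], List[Dict[State, Action]]]:
    """Solve a finite-horizon MDP by backward induction.

    Returns the optimal value at the first decision time and a list of
    decision rules, one per decision time, in forward time order.
    """
    values = {s: 0.0 for s in states}  # assumed zero terminal value
    policy: List[Dict[State, Action]] = []
    for _ in range(horizon):  # sweep from the last decision time backwards
        new_values: Dict[State, float] = {}
        rule: Dict[State, Action] = {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # Immediate reward plus expected value of the next state.
                q = reward(s, a) + sum(p * values[s2] for s2, p in trans(s, a).items())
                if q > best_q:
                    best_a, best_q = a, q
            new_values[s], rule[s] = best_q, best_a
        values = new_values
        policy.append(rule)
    policy.reverse()  # rules were collected from the last stage to the first
    return values, policy


def toy_reward(s: int, a: int) -> float:
    """Toy reward that depends only on the state."""
    return float(s)


def toy_trans(s: int, a: int) -> Dict[int, float]:
    """Toy dynamics: action a moves the system to state a with certainty."""
    return {a: 1.0}


if __name__ == "__main__":
    v, pi = backward_induction([0, 1], [0, 1], toy_reward, toy_trans, horizon=2)
    print(v, pi)
```

In the toy problem the reward depends only on the state, so the action matters only through the next state it leads to. A truncated holding-time state space and power-dependent arrival probabilities would take the place of `toy_trans` in a model closer to the one studied here.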
Remark 3. In the above Algorithm 1, it is assumed that in order to reduce the calculation cost and complexity compared with the traditional algorithm.
Remark 4. We can derive the state estimation of the eavesdropper each time to ensure the feasibility of Algorithm 1.