A novel secure diffusion Kalman filter algorithm against false data injection attacks

This paper proposes a novel secure diffusion Kalman filter (dKF) algorithm to improve the estimation performance crippled by false data injection attacks on sensors in wireless sensor networks (WSNs). Unlike the conventional dKF, each node tests the trustworthiness of every adjacent node before local estimate fusion, which forms a new secure network topology. The combination step then fuses only the information collected from this secure topology. The proposed secure dKF algorithm achieves better estimation performance and is robust to false data injection attacks on multiple sensors and on partial elements of measurements. Its mean and mean-square performance are derived, and its convergence is analysed on that basis. Additionally, the problem of estimating and tracking a projectile position is investigated to confirm the effectiveness of the proposed secure dKF algorithm. Simulations show that the proposed secure dKF algorithm achieves a significant estimation performance gain.


INTRODUCTION
Distributed estimation, as a paradigm alongside centralized processing and decentralized processing with a fusion center, is essential for the Internet of Things (IoT) and has flourished in recent years. In distributed estimation, the fusion center is eliminated, and a group of interconnected sensor nodes collaboratively estimates the state from noisy measurements by sharing information with one another [1][2][3]. This structure removes the risk of a failing central node and achieves higher flexibility, robustness, and bandwidth and energy efficiency. As such, distributed estimation has been widely used in military and civilian fields, such as target tracking, auxiliary navigation, disaster forecasting and incident investigation [4,5].
However, the assumption that sensing and communication are free of attacks may not be realistic, primarily because wireless sensor nodes may suffer from replay [13], false data injection [14], deception [15] and other attacks [16] during sensing and communication. Such attacks may have severe consequences, such as leakage of customer information, destruction of infrastructure, damage to the state economy and danger to human lives [17,18]. In response to these adversarial attacks, several secure distributed algorithms have been investigated to achieve reliable distributed estimation [19]. In [20], for distributed consensus estimation under false data injection, the relative entropy was leveraged as a stealthiness metric to detect whether the data exchanged in wireless sensor networks (WSNs) had been attacked. In distributed consensus estimation, however, the convex combination coefficients are identical and cannot be adjusted. In [21], a stochastic coding detection scheme was proposed to detect malicious replay attacks without any performance loss in the normal system. In [22], reachability analysis was employed to construct a security layer that enables secure sharing of estimate information for the dKF algorithm. In [23,24], the secure distributed estimation problem over WSNs with malicious attacks on sensor nodes and communications was considered. However, the reference estimate computed in [23] may be unreliable if a node has multiple compromised neighbours. Moreover, no measurement information is transmitted in the diffusion least-mean-squares (dLMS) subsystem in [24], which may degrade the distributed estimation performance to some extent. Secure state estimation in cyber-physical systems under attacks was reported in [25,26]. However, those systems involve no information exchange between sensors, which makes the problem relatively simple compared with the dKF.
Different from the previous works, this paper develops a secure dKF algorithm for WSNs under false data injection attacks, which is more robust against attacks and achieves improved estimation performance. In the proposed secure dKF algorithm, each adjacent node is tested for trustworthiness before local estimate fusion, which forms a new secure network topology. Then, the combination step is performed to fuse the information collected from the secure topology. The main contributions of this paper lie in two aspects: (i) a framework for the secure dKF algorithm under false data injection attacks is presented; (ii) the mean and mean-square performance of the proposed secure dKF algorithm is analysed. Simulation results verify the effectiveness of the proposed algorithm.

SYSTEM MODEL
Figure 1 depicts an attacked WSN comprising $N$ sensor nodes spatially distributed over a region, some of which are injected with false data. Two sensor nodes are assumed to be connected if directional communication occurs between them, and every sensor node is always connected to itself. All sensor nodes connected to the $k$th sensor node form its neighbourhood set, denoted by $\mathcal{N}'_k$ (notice that $k \in \mathcal{N}'_k$). The size of the neighbourhood set $\mathcal{N}'_k$ is denoted by $n_k$, which is termed the degree of $\mathcal{N}'_k$. Whether the $k$th sensor node is connected to the other sensor nodes in this WSN can be described mathematically by the adjacency matrix $A$ [8], whose $(l, k)$th entry is given by
$$[A]_{lk} = \begin{cases} 1, & l \in \mathcal{N}'_k \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
Clearly, if the $l$th and $k$th sensor nodes are connected, then the $(l, k)$th entry of the matrix $A$ is one; otherwise, it is zero.
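The adjacency matrix and neighbourhood sets described above can be illustrated with a minimal Python sketch. The function names, the 4-node line network and the choice of symmetric links are illustrative assumptions, not part of the original model.

```python
import numpy as np

def adjacency_matrix(n_nodes, links):
    """Build the N x N matrix A with A[l, k] = 1 when nodes l and k are connected."""
    A = np.eye(n_nodes, dtype=int)   # every node is connected to itself
    for l, k in links:
        A[l, k] = 1
        A[k, l] = 1                  # assumption: links are treated as symmetric
    return A

def neighbourhood(A, k):
    """Return the neighbourhood set of node k (all l with A[l, k] = 1), node k included."""
    return [int(l) for l in np.flatnonzero(A[:, k])]

# Hypothetical 4-node line network: 0 - 1 - 2 - 3 (0-based node indices)
links = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, links)

print(neighbourhood(A, 1))        # neighbourhood set of node 1
print(len(neighbourhood(A, 1)))   # its size n_k
```

Because each node belongs to its own neighbourhood set, the size $n_k$ returned here counts the node itself.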
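When no false data injection attacks exist, the discrete-time linear system is given by
$$x_{i+1} = F_i x_i + G_i n_i \tag{2}$$
$$y_i = H_i x_i + v_i \tag{3}$$
where $x_i \in \mathbb{C}^M$ represents the system state variable and $y_i \in \mathbb{C}^{pN}$ denotes the system measurement variable, with $M$, $N$ and $p$ being positive integers, and $F_i$, $G_i$ and $H_i$ are known matrices. The process noise vector $n_i$ and the measurement noise vector $v_i$ both follow independent and identical Gaussian distributions with zero mean and covariance matrices $Q_i$ and $R_i$, respectively. The pairs $\{F_i, H_{k,i}\}$ and $\{F_i, G_i Q_i^{1/2}\}$ are assumed to be detectable and stabilizable for every $k$, respectively. Furthermore, the relationship of $n_i$, $v_i$, $Q_i$ and $R_i$ is given by
$$E\left\{ \begin{bmatrix} n_i \\ v_i \end{bmatrix} \begin{bmatrix} n_j \\ v_j \end{bmatrix}^T \right\} = \begin{bmatrix} Q_i & 0 \\ 0 & R_i \end{bmatrix} \delta_{ij} \tag{4}$$
where the symbols $T$ and $\delta_{ij}$ represent the transposition operator and the Kronecker delta, respectively. The initial state $x_0$ of the linear system satisfies $x_0 \sim \mathcal{N}(0, P_0)$ with covariance matrix $P_0 > 0$, and $x_0$ is independent of $n_i$ and $v_i$ for all $i$. According to (3), at time $i$, the measurement $y_{k,i} \in \mathbb{C}^p$ collected by the $k$th sensor node is given by
$$y_{k,i} = H_{k,i} x_i + v_{k,i} \tag{5}$$
Considering all sensor nodes together, the global measurement for this WSN at time $i$ can be expressed as
$$y_i = \mathrm{col}\{y_{1,i}, \ldots, y_{N,i}\}, \quad H_i = \mathrm{col}\{H_{1,i}, \ldots, H_{N,i}\}, \quad v_i = \mathrm{col}\{v_{1,i}, \ldots, v_{N,i}\} \tag{6}$$
The measurement noise $v_{k,i}$ is assumed to be spatially uncorrelated, thus we have
$$R_i = \mathrm{diag}\{R_{1,i}, \ldots, R_{N,i}\} \tag{7}$$
where $R_{k,i} > 0$ for all $k$ and $i$.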
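As a rough illustration of the attack-free model, the following Python sketch draws one state transition and the stacked measurements of all nodes. It assumes a real-valued system for simplicity; the sizes, matrices and function names are hypothetical placeholders rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, p = 2, 3, 1                            # hypothetical state size, node count, per-node measurement size
F = np.array([[1.0, 0.1], [0.0, 1.0]])       # hypothetical state matrix
G = np.eye(M)                                # process noise matrix
Q = 0.01 * np.eye(M)                         # process noise covariance
H_nodes = [np.array([[1.0, 0.0]]) for _ in range(N)]   # per-node measurement matrices (p x M)
R_nodes = [0.01 * np.eye(p) for _ in range(N)]         # per-node measurement noise covariances

def step(x):
    """One state transition: next state = F x + G * (process noise)."""
    noise = rng.multivariate_normal(np.zeros(M), Q)
    return F @ x + G @ noise

def measure(x):
    """Stack the noisy measurements of all N nodes into one vector of length p*N."""
    ys = []
    for H_k, R_k in zip(H_nodes, R_nodes):
        v_k = rng.multivariate_normal(np.zeros(p), R_k)   # spatially uncorrelated noise
        ys.append(H_k @ x + v_k)
    return np.concatenate(ys)

H_global = np.vstack(H_nodes)                # stacked measurement matrix, shape (p*N, M)
x = rng.multivariate_normal(np.zeros(M), 0.5 * np.eye(M))  # initial state drawn from N(0, P_0)
x = step(x)
y = measure(x)
```

Drawing each node's noise independently reflects the spatially uncorrelated measurement noise, which is why the global noise covariance is block diagonal.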
Here, distributed secure estimation is considered in a WSN where an attacker sends false data to some sensor nodes in order to attack sensing and exert malicious control. The unattacked sensor nodes follow the discrete-time linear system given by (2) and (3). For the compromised sensor nodes, fictitious data is injected by the attacker; once this data is collected, the attacked sensor nodes may lose the true measurements. It is worth stressing that the $L_2$-norm measurement residual obtained at a sensor node is independent of whether this sensor node is compromised [24]. That is, the injection attack cannot be revealed by detection methods based on the measurement residual.
To simplify notation, the measurement model for both unattacked and compromised sensor nodes can be expressed as
$$y_{k,i} = H_{k,i}(x_i + q_{k,i}) + v_{k,i} \tag{8}$$
where $v_{k,i}$ is the measurement noise at the $k$th sensor node. The $M$-dimensional error vector $q_{k,i}$ is a null vector for an unattacked sensor node but non-zero for any compromised sensor node.

DISTRIBUTED SECURE ESTIMATION AGAINST ATTACKS
For reliable distributed estimation in the attacked WSN, this section proposes a secure dKF algorithm to reduce or prevent the negative effects of false data injection on some sensor nodes. The proposed secure dKF algorithm is implemented in four steps: (i) forecast update; (ii) transmission of measurement information followed by calculation of local estimates; (iii) transmission of local estimates followed by detection of trusted neighbours; and (iv) combination.
First, all sensor nodes, whether compromised by an attacker or not, separately update their forecasts.
Here, $P^{\mathrm{loc}}_{k,i|i}$ denotes the state error covariance matrix. Second, each sensor node exchanges measurement information with its neighbours and then performs incremental updates, combining its own information with that of its neighbours to obtain a local estimate.
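The first two steps can be sketched in Python as follows. This is a minimal sketch that uses the textbook covariance-form Kalman recursions, applied sequentially over the neighbours' measurements; it assumes a real-valued system, and whether it matches the paper's exact equations is an assumption. All function and variable names are illustrative.

```python
import numpy as np

def forecast(x_prev, P_prev, F, G, Q):
    """Forecast update: propagate the previous estimate and its error covariance."""
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + G @ Q @ G.T
    return x_pred, P_pred

def incremental_update(x_pred, P_pred, neighbour_data):
    """Fold in the measurements (y_l, H_l, R_l) of the node and its neighbours one by one."""
    x_loc, P_loc = x_pred.copy(), P_pred.copy()
    for y_l, H_l, R_l in neighbour_data:
        S = R_l + H_l @ P_loc @ H_l.T            # innovation covariance
        K = P_loc @ H_l.T @ np.linalg.inv(S)     # Kalman gain
        x_loc = x_loc + K @ (y_l - H_l @ x_loc)  # local estimate update
        P_loc = P_loc - K @ H_l @ P_loc          # covariance update
    return x_loc, P_loc

# Hypothetical example: two neighbours both observe the first state component as 1.0
F, G, Q = np.eye(2), np.eye(2), 0.01 * np.eye(2)
x_pred, P_pred = forecast(np.zeros(2), np.eye(2), F, G, Q)
data = [(np.array([1.0]), np.array([[1.0, 0.0]]), np.array([[0.01]]))] * 2
x_loc, P_loc = incremental_update(x_pred, P_pred, data)
```

Processing the measurements one at a time is equivalent to a single batch update when the measurement noises of different nodes are uncorrelated, which is the situation assumed in the system model.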
Third, local estimates are exchanged between each sensor node and its neighbours. In the conventional dKF algorithm [8], each sensor node directly combines its own local estimate with those of its neighbours. However, this is inappropriate in the considered attacked WSN, primarily because the compromised sensor nodes can ruin the estimate of the whole WSN through distributed cooperation. In view of this, the trusted neighbours of each sensor node are detected before local estimate fusion. Here, we adopt the method proposed in [24] to detect the compromised sensor nodes and form a secure network topology.
To obtain a secure network topology, a reliable reference estimate is needed for the $k$th sensor node at time $i$. Once sensor node $k$ receives the neighbours' estimates $\hat{x}^{\mathrm{loc}}_{l,i|i}$, it sorts the $m$th elements of all these estimates, including its own, by size to form the ordered set $W^{(m)}_{k,i}$, where $l_1, l_s, l_t, n_k \in \mathcal{N}'_{k,i}$ and $\hat{x}^{\mathrm{loc},m}_{l_1,i|i} < \hat{x}^{\mathrm{loc},m}_{l_s,i|i} < \hat{x}^{\mathrm{loc},m}_{l_t,i|i} < \hat{x}^{\mathrm{loc},m}_{n_k,i|i}$. From the attacker's perspective, an attack is effective if the error between the local estimate $\hat{x}^{\mathrm{loc}}_{l,i|i}$ and the true value $x_i$ is large. In other words, the attacked estimates appear at the left or right end of the set $W^{(m)}_{k,i}$ with very high probability. It is assumed that the number of attacked neighbours of the $k$th sensor node is less than $n_k/2$, so the number of trusted neighbours is greater than $n_k/2$. Consequently, the $(n_k/2)$th estimate in $W^{(m)}_{k,i}$ is trusted with very high probability; a detailed discussion can be found in [24]. The $(n_k/2)$th estimate in $W^{(m)}_{k,i}$ is therefore taken as the $m$th element $\bar{W}^{(m)}_{k,i}$ of the reference estimate. Going through all $M$ elements, the obtained reference estimate is given by
$$\bar{W}_{k,i} = \left[ \bar{W}^{(1)}_{k,i}, \bar{W}^{(2)}_{k,i}, \ldots, \bar{W}^{(M)}_{k,i} \right]^T$$
Since a common vector $x_i$ is estimated through the distributed algorithm, all local estimates $\hat{x}^{\mathrm{loc}}_{l,i|i}$ collected from the neighbours of the $k$th sensor node should be close to $\bar{W}_{k,i}$. In view of this, the "consistency" principle is employed to decide whether a sensor node is attacked, based on the distance between $\hat{x}^{\mathrm{loc}}_{l,i|i}$ and $\bar{W}_{k,i}$. A random variable $s_{lk,i}$ is defined to characterize the error between $\hat{x}^{\mathrm{loc}}_{l,i|i}$ and $\bar{W}_{k,i}$, in which $m_1$ is the index of the entry with the largest element-wise $L_1$-norm distance between $\hat{x}^{\mathrm{loc}}_{l,i|i}$ and $\bar{W}_{k,i}$. Clearly, if an attack is effective, the error between the local estimate $\hat{x}^{\mathrm{loc}}_{l,i|i}$ and the true value $x_i$ is large, resulting in a large $s_{lk,i}$. A compromised sensor node can then be detected using the threshold test
$$s_{lk,i} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma_{k,i}$$
where $H_0$ and $H_1$ represent the hypotheses that sensor node $l$ is reliable and unreliable, respectively.
Here, $\gamma_{k,i}$ is a pre-defined threshold and $\hat{\sigma}^2_{k,i}$ is the averaged element-wise sample variance of $\bar{W}_{k,i}$, both of which are detailed in [24]. By derivation, it can be found that $\hat{\sigma}^2_{k,i}$ is a function of $\bar{W}_{k,i-1}, \ldots, \bar{W}_{k,1}$, $\hat{\sigma}^2_{k,1}$ and the reference estimate mean $\hat{\mu}_{k,1}$, where $\bar{W}_{k,1}$, $\hat{\sigma}^2_{k,1}$ and $\hat{\mu}_{k,1}$ can be initialized.
Moreover, the $m$th element $\bar{W}^{(m)}_{k,i}$ of $\bar{W}_{k,i}$ is obtained by sorting the $m$th elements of all neighbours' estimates and taking the middle value. Hence, $\bar{W}^{(m)}_{k,i}$ is unrelated to the noise covariances, and so is $\bar{W}_{k,i}$. Consequently, the variance $\hat{\sigma}^2_{k,i}$ is also independent of the noise covariances.
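The sorting-based reference estimate and the threshold test can be illustrated with a short Python sketch. It makes three assumptions: the middle value is taken at position (n_k - 1) // 2 of each sorted column, the test statistic is the largest squared element-wise deviation from the reference, and the threshold is passed in as a fixed number instead of being computed from the sample variance as in [24]. Function names and the numerical data are hypothetical.

```python
import numpy as np

def reference_estimate(local_estimates):
    """Element-wise middle value of the neighbours' local estimates (rows = nodes)."""
    X = np.sort(np.asarray(local_estimates, dtype=float), axis=0)  # sort each element separately
    n_k = X.shape[0]
    return X[(n_k - 1) // 2]          # assumed convention for the "middle" position

def trusted_neighbours(local_estimates, threshold):
    """Return indices of neighbours whose estimates are consistent with the reference."""
    X = np.asarray(local_estimates, dtype=float)
    ref = reference_estimate(X)
    s = np.max((X - ref) ** 2, axis=1)    # assumed statistic: worst squared element-wise deviation
    trusted = [l for l in range(X.shape[0]) if s[l] < threshold]
    return trusted, ref

# Hypothetical local estimates from five neighbours; row 3 is shifted by an attack
estimates = [[1.00, 2.00],
             [1.10, 1.90],
             [0.90, 2.10],
             [6.00, 7.00],
             [1.05, 2.05]]
trusted, ref = trusted_neighbours(estimates, threshold=0.25)
print(trusted, ref)
```

As long as fewer than half of the rows are shifted, the middle value of each sorted column comes from an unattacked node, which is the reasoning behind the assumption on the number of attacked neighbours.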
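Through the threshold test, the set of trusted neighbours of the $k$th sensor node at time $i$, denoted by $\mathcal{N}_{k,i}$, is obtained. Finally, based on the coefficients $c_{kl,i}$ and the local estimates $\hat{x}^{\mathrm{loc}}_{l,i|i}$ in (13) received from the trusted neighbours of the $k$th sensor node, the combination step of the secure dKF algorithm is
$$\hat{x}_{k,i|i} = \sum_{l \in \mathcal{N}_{k,i}} c_{kl,i}\, \hat{x}^{\mathrm{loc}}_{l,i|i} \tag{22}$$
where the convex combination coefficient $c_{kl,i}$ satisfies
$$c_{kl,i} \ge 0, \quad \sum_{l \in \mathcal{N}_{k,i}} c_{kl,i} = 1, \quad c_{kl,i} = 0 \ \text{if} \ l \notin \mathcal{N}_{k,i} \tag{23}$$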
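A minimal Python sketch of the combination step follows. It assumes uniform weights over the trusted set when no weights are supplied; the paper does not prescribe a particular weighting rule here, so that default, the function name and the data are illustrative assumptions.

```python
import numpy as np

def combine(local_estimates, trusted, weights=None):
    """Convex combination of the local estimates received from trusted neighbours."""
    X = np.asarray(local_estimates, dtype=float)[trusted]
    if weights is None:
        # assumption: equal weights over the trusted set
        weights = np.full(len(trusted), 1.0 / len(trusted))
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return weights @ X

# Hypothetical data: node 3 was rejected by the threshold test
estimates = [[1.00, 2.00],
             [1.10, 1.90],
             [0.90, 2.10],
             [6.00, 7.00],
             [1.05, 2.05]]
fused = combine(estimates, trusted=[0, 1, 2, 4])
print(fused)
```

Untrusted neighbours simply receive zero weight, so the compromised estimate in row 3 has no influence on the fused result.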

PERFORMANCE ANALYSIS
This section analyses the performance of the proposed secure dKF algorithm. In the attacked WSN, distributed estimation suffers from both false alarms and missed detections. When a false alarm occurs, the performance of distributed estimation may degrade because a sensor node has fewer trusted neighbours; fortunately, malicious data does not spread. By contrast, when a missed detection occurs, the compromised nodes may ruin the estimate of the whole WSN through improper data fusion. In view of this, more attention should be paid to missed detections. Without prior information on the false data injection, it is hard to derive the exact probabilities of false alarm and missed detection in the distributed estimation. Since the purpose of detection in this work is to provide secure distributed estimation against false data injection attacks, the performance is analysed for the distributed estimation.
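In the proposed secure dKF algorithm, a sensor node, indexed by $l$ without loss of generality, is regarded as a reliable and trustworthy neighbour of the $k$th sensor node if the following relationship holds
$$\left| \hat{x}^{\mathrm{loc},m}_{l,i|i} - \bar{W}^{(m)}_{k,i} \right|^2 < \gamma_{k,i}, \quad m = 1, \ldots, M \tag{24}$$
where $m$ denotes the element index of the vector $x$ and $\gamma_{k,i}$ is the detection threshold. Further, the inequality (24) is rewritten as
$$-\sqrt{\gamma_{k,i}}\, \mathbf{1} < \hat{x}^{\mathrm{loc}}_{l,i|i} - \bar{W}_{k,i} < \sqrt{\gamma_{k,i}}\, \mathbf{1} \tag{25}$$
where $\mathbf{1}$ denotes an $M \times 1$ vector with unit entries. Therefore, the state $\hat{x}^{\mathrm{loc}}_{l,i|i}$ is given by
$$\hat{x}^{\mathrm{loc}}_{l,i|i} = \bar{W}_{k,i} + \epsilon_{l,i} \tag{26}$$
where $\epsilon_{l,i} = [\epsilon^1_{l,i}, \epsilon^2_{l,i}, \ldots, \epsilon^M_{l,i}]^T$ is a random vector, each entry of which obeys the bound $|\epsilon^m_{l,i}| < \sqrt{\gamma_{k,i}}$. It is easy to find that $\epsilon_{l,i}$ is independent of $\bar{W}_{k,i}$ and the noise covariances.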
Depending on whether a missed detection occurs, the trusted set $\mathcal{N}_{k,i}$ of the $k$th sensor node is divided into two disjoint subsets, denoted by $\mathcal{N}^+_{k,i}$ and $\mathcal{N}^-_{k,i}$. Specifically, the subset $\mathcal{N}^+_{k,i}$ consists of the unattacked neighbours that are detected as reliable, and the subset $\mathcal{N}^-_{k,i}$ consists of the compromised neighbours that are nevertheless detected as reliable. The state $\hat{x}_{k,i|i}$ in (22) is then split into the contributions of these two subsets, which gives (27). Combining (13) and (27) yields (28). Subtracting both sides of (28) from the true value $x_i$ gives the error recursion, in which $\tilde{x}^{\mathrm{loc}}_{l,i|i-1} = x_i - \hat{x}^{\mathrm{loc}}_{l,i|i-1}$ and $\tilde{x}_{k,i|i} = x_i - \hat{x}_{k,i|i}$ denote the local estimate error and the estimate error, respectively, and $\tilde{W}_{k,i} = x_i - \bar{W}_{k,i}$ is the reference estimate error.
To analyse the performance of the whole network, some global quantities are defined: the augmented state estimate error $\tilde{X}_{i|i}$, the reference estimate error vector $\tilde{W}_i$, the random vector $\epsilon_i$, and the block diagonal matrices $H_i$, $P^{\mathrm{loc}}_{i|i}$ and $R_i$. Two combination matrices $C^+_i$ and $C^-_i$ are also defined, together with their extended matrices $\mathcal{C}^+_i$ and $\mathcal{C}^-_i$, which are formed by means of the Kronecker product $\otimes$. These two extended matrices are related to the errors of false alarm and missed detection, respectively.
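The two combination matrices and their extended versions can be illustrated with a small Python sketch. It assumes that entry [k, l] of the weight matrix is the weight node k gives to trusted neighbour l, that the "+" and "-" parts are obtained by masking columns according to whether neighbour l is actually attacked, and that the extension is the Kronecker product with an M x M identity. These conventions, the function name and the numbers are assumptions for illustration.

```python
import numpy as np

def split_combination(C, attacked):
    """Split the weights into parts for unattacked (+) and compromised (-) trusted neighbours."""
    attacked = np.asarray(attacked, dtype=bool)
    C_plus = C * (~attacked)[None, :]    # keep columns of unattacked neighbours
    C_minus = C * attacked[None, :]      # keep columns of compromised neighbours (missed detections)
    return C_plus, C_minus

# Hypothetical 3-node network in which node 2 is compromised but passes the test
C = np.array([[0.5, 0.5, 0.0],
              [1 / 3, 1 / 3, 1 / 3],
              [0.0, 0.5, 0.5]])
attacked = [False, False, True]
C_plus, C_minus = split_combination(C, attacked)

M = 2                                            # hypothetical state dimension
C_plus_ext = np.kron(C_plus, np.eye(M))          # extended matrices act on stacked NM-vectors
C_minus_ext = np.kron(C_minus, np.eye(M))
```

If every compromised node is rejected by the threshold test, the "-" part is a null matrix and only the "+" part remains.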
Combining (6) and the above global quantities, (30) can be extended to the global form (41), which, to simplify notation, can be further expressed as (42). It is not difficult to see that (42a) is similar to the corresponding expression in [8]. The only difference is that the matrix $\mathcal{C}^+_i$ in (42) depends on the subset $\mathcal{N}^+_{k,i}$ consisting of the unattacked neighbours detected as reliable.

Mean performance
Taking the expectation of (42) gives (47), where (47d) and (47e) hold because the process noise and the measurement noise are both zero mean. Equation (47d) is similar to its counterpart in the conventional dKF algorithm. The only difference is that the matrix $\mathcal{C}^+_i$ depends on the subset $\mathcal{N}^+_{k,i}$ consisting of the unattacked neighbours detected as reliable, and is therefore determined by the probability of false alarm. Moreover, $\|\mathcal{C}^-_i\|_2 \le 1$ holds, as it is bounded. By adopting the convergence proof for the reference estimate in [24], it can be shown that the reference estimate $\bar{W}_{k,i}$ always converges to an unbiased estimate; therefore, $E[\tilde{W}_{k,i}] = 0$ as $i \to \infty$ can be concluded.

Mean-square performance
The mean-square deviation (MSD) of the proposed secure dKF algorithm is analysed in this subsection. Because the exact detection performance is hard to obtain, the boundedness of the MSD is analysed instead. The MSD for the $k$th sensor node at time $i$ is given by
$$\mathrm{MSD}_{k,i} = E\|\tilde{x}_{k,i|i}\|^2 \tag{48}$$
where $\tilde{x}_{k,i|i} = x_i - \hat{x}_{k,i|i}$ is the estimate error. According to (42) and (48), and using the Kronecker product identity $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, the network MSD is given by (49), where (49a) and (49b) depend on the probability of false alarm, since their combination matrix $\mathcal{C}^+_i$ is determined by the subset $\mathcal{N}^+_{k,i}$. Moreover, they always converge as $i \to \infty$ because $\|\mathcal{C}^+_i\|_2 \le 1$; the convergence proof is similar to that of equation (32) in [8]. As in the analysis of mean stability in Section 4.1, (49c)-(49f) are functions of the cooperation matrix $\mathcal{C}^-_i$, the reference error $\tilde{W}_{k,i}$ and the random vector $\epsilon_i$, and $\|\mathcal{C}^-_i\|_2 \le 1$ holds, as it is bounded. According to [24], the reference estimate $\bar{W}_{k,i}$ always converges to an unbiased estimate of the true value $x_i$. In the extreme case where all attacked sensor nodes are detected as unreliable, $\mathcal{C}^-_i$ converges to a null matrix, and so does the sum of (49c)-(49f). In this case, the MSD in (49) reduces to a form similar to equation (32) in [8]. Moreover, if no false alarm occurs, $\mathcal{C}^+_i$ converges to a stable combination weight matrix, and (49) reduces to (50). Let $P_{\tilde{X},i} = E[\tilde{X}_{i|i}\tilde{X}^T_{i|i}]$; then (50) can be written as the recursion (51). It is assumed that $F_i$, $G_i$, $H_{k,i}$, $Q_i$ and $R_{k,i}$ in model (2) and (3) are independent of $i$. Then the convergence of the secure dKF algorithm can be guaranteed, and (51) converges to the unique solution of a Lyapunov equation. The steady-state MSD at sensor node $k$ can then be obtained as in [8] with the help of $\mathcal{I}_k$, an $NM \times NM$ block matrix with blocks of size $M \times M$, whose $(k, k)$th block is an identity matrix and whose other blocks are null matrices. Finally, the average steady-state MSD for the network is obtained by averaging the steady-state MSD over all $N$ sensor nodes.
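The steady-state computation can be sketched numerically in Python. The sketch solves a generic discrete-time Lyapunov equation X = A X A^T + B by vectorization and then reads the per-node MSD from the diagonal blocks of X. The matrices A and B are stand-ins for the quantities that appear in (51), which are not reproduced here; function names and the toy values are hypothetical.

```python
import numpy as np

def solve_discrete_lyapunov(A, B):
    """Solve X = A X A^T + B via vec(X) = (I - A kron A)^{-1} vec(B).

    A unique solution requires the spectral radius of A to be below one.
    """
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(lhs, B.reshape(-1)).reshape(n, n)

def node_msd(X, k, M):
    """Trace of the (k, k)th M x M block of X, i.e. the steady-state MSD at node k."""
    return np.trace(X[k * M:(k + 1) * M, k * M:(k + 1) * M])

def network_msd(X, N):
    """Average of the per-node steady-state MSDs."""
    return np.trace(X) / N

# Toy example with N = 2 nodes and M = 1 state element per node
A = 0.5 * np.eye(2)
B = np.eye(2)
X = solve_discrete_lyapunov(A, B)
print(node_msd(X, 0, M=1), network_msd(X, N=2))
```

Selecting the (k, k)th diagonal block plays the role of multiplying by the block selection matrix described in the text and taking the trace.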

SIMULATIONS
This section demonstrates by numerical simulations how effectively the proposed secure dKF algorithm counters false data injection attacks. The simulations consider the problem of estimating and tracking a projectile position, in which each sensor node obtains a noisy measurement of the projectile position. The goal of the proposed secure dKF algorithm is to estimate the projectile location accurately at each time instant.
As shown in Figure 2, a WSN consisting of $N = 20$ sensor nodes is attacked by false data injection. Here, we assume that sensor nodes $k = 6$ and $k = 16$ are attacked, and all elements of the corresponding error vectors are set to 5. For simplicity, it is also assumed that each sensor node measures only the vertical position and one horizontal position of the projectile, so that the motion is confined to a two-dimensional plane. Additionally, the estimated projectile state is represented by a vector with $M = 6$ elements, which is formed by stacking its position, velocity and acceleration. Here, $a_{x,i} = 0$ and $a_{y,i} = g$ are the accelerations of the projectile in the two orthogonal directions, the constant $g = 10$ is the gravitational acceleration, and $T = 0.2$ s is the time interval. The covariances of the process and measurement noises are $Q_i = 0.01I$ and $R_i = 0.01I$, respectively. The process noise matrix is $G_i = I$. The covariance of the initial state is $P_0 = 0.5I$. All sensor nodes share the same state and measurement matrices. Based on the same network, the results are averaged over 500 independent random trials.
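One way to set up such a simulation is sketched below in Python. The state ordering [x, y, v_x, v_y, a_x, a_y], the constant-acceleration transition matrix and the way the error vector enters the measurement are assumptions made for illustration, because the state and measurement matrices themselves are not reproduced in the text; only the numerical parameters are taken from the description above.

```python
import numpy as np

T = 0.2                                   # time interval in seconds
g = 10.0                                  # gravitational acceleration
I2, Z2 = np.eye(2), np.zeros((2, 2))

# Assumed constant-acceleration kinematics for the state [x, y, vx, vy, ax, ay]
F = np.block([[I2, T * I2, 0.5 * T**2 * I2],
              [Z2, I2, T * I2],
              [Z2, Z2, I2]])
H = np.hstack([I2, Z2, Z2])               # each node measures the two position components

G = np.eye(6)                             # process noise matrix
Q = 0.01 * np.eye(6)                      # process noise covariance
R = 0.01 * np.eye(2)                      # measurement noise covariance
P0 = 0.5 * np.eye(6)                      # initial state covariance

attacked_nodes = [6, 16]                  # attacked sensor nodes (numbered as in the text)
q_attack = 5.0 * np.ones(6)               # all elements of the error vector set to 5

# Hypothetical initial state: unit velocities, accelerations a_x = 0 and a_y = g
x0 = np.array([0.0, 0.0, 1.0, 1.0, 0.0, g])
x1 = F @ x0                               # noise-free one-step prediction

# Assumed attack model: the error vector shifts the state seen by an attacked node
y_clean = H @ x1
y_attacked = H @ (x1 + q_attack)
```

With this ordering, an error vector with all elements equal to 5 shifts both measured position components of an attacked node by 5.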
Figure 3 shows the projectile trajectory in five cases: the true trajectory, the trajectories obtained by the conventional dKF algorithm [8] with and without attack, the trajectory obtained by the proposed secure dKF algorithm, and the noisy measurements of the fifth sensor node. These cases correspond to the green square, blue rhombus, black triangle, red pentagon and purple inverted triangle curves, respectively. The estimate produced by the proposed secure dKF algorithm is close to the true trajectory. Figure 4 depicts the transient MSD as a function of time for the different cases; the results of the conventional dKF with and without attacks are given for comparison. The conventional dKF with attack (blue, rhombus) shows the worst estimation performance because the measurements compromised by attackers propagate through the network, whereas the conventional dKF algorithm without attack (black, triangle) achieves the best estimation performance. The transient MSD of the proposed secure dKF algorithm (red, pentagon) is close to that of the conventional dKF without attack, showing good performance. The small difference between them arises because the information produced by the compromised sensors is removed. Figure 5 shows the steady-state MSD of the different algorithms. Because the measurements compromised by attackers propagate over the network, the conventional dKF algorithm with attack (blue, rhombus) shows the worst estimation performance at each node. Compared with the conventional dKF algorithm with attack, the proposed secure dKF algorithm (red, pentagon) performs well at all nodes and is more robust to attacks.

CONCLUSIONS
This paper considers the problem of distributed secure estimation over WSNs crippled by false data injection attacks. To address this problem, a secure dKF algorithm is proposed, in which the trusted neighbours of each sensor node are detected before local estimate fusion to form a secure network topology, and the combination step then fuses the node's own local estimate with the estimates received from its trusted neighbours. The mean and mean-square performance of the proposed secure dKF algorithm is analysed, and its convergence is analysed on that basis. Finally, the good performance of the proposed secure dKF algorithm is demonstrated by numerical simulations. However, detecting compromised sensors in the proposed method consumes additional energy, which will be investigated in future work.