An Information Theoretic Metric for Measurement Vulnerability to Data Integrity Attacks on Smart Grids

Abstract: A novel metric that describes the vulnerability of the measurements in power systems to data integrity attacks is proposed. The new metric, coined vulnerability index (VuIx), leverages information theoretic measures to assess the attack effect in terms of the fundamental limits of the disruption and detection tradeoff. Computing the VuIx of the measurements in the system yields an ordering of their vulnerability based on the degree of exposure to data integrity attacks. This new framework is used to assess the measurement vulnerability of the IEEE 9-bus and 30-bus test systems, and it is observed that power injection measurements are significantly more vulnerable to data integrity attacks than power flow measurements. A detailed numerical evaluation of the VuIx values for IEEE test systems is provided.


Introduction
Supervisory Control and Data Acquisition (SCADA) systems and, more recently, advanced communication systems facilitate efficient, economic and reliable operation of power systems [1]. For instance, the communication system transmits the measurements to a state estimator that evaluates the operational status of the system accurately [2]. However, the integration between the physical layer and the cyber layer exposes the system to cybersecurity threats. Cyber incidents highlight the vulnerability of power systems to sophisticated attacks. To ensure the security and reliability of power system operation, it is essential to quantitatively characterize the vulnerabilities of the system in order to set up appropriate security mechanisms [3]. To that end, security metrics provide operationally meaningful vulnerability descriptors and identify the impact that security threats pose to the system. Moreover, security metrics enable operators to assess the defence mechanism requirements to be embedded into cybersecurity policies, processes, and technology [4]. For example, the Common Vulnerability Scoring System (CVSS) analyzes Information Technology (IT) systems [5].
Typical security metrics for power systems focus on integrity, availability, and confidentiality as envisioned by the cybersecurity working group in the NIST Smart Grid interoperability panel [6]. System security objectives are categorized into system vulnerability, defence power, attack severity, and situations to develop security metrics in a systematic manner [7]. A cyberphysical security assessment metric (CP-SAM) based on quantitative factors is proposed in [8] to assess the specific security challenges of microgrid systems. This fragmented landscape showcases a wide variety of available metrics that depend on the security services, threat characteristics, and system parameters. Remarkably, there is a lack of general data integrity vulnerability metrics for power systems. For instance, the impact of data injection attacks (DIAs) [9] can be assessed with a wide variety of criteria that depend on the objectives of the attackers [10][11][12][13].
A large body of literature addresses DIAs that compromise both the confidentiality and integrity of the information contained by the system measurements [14]. With the unprecedented data acquisition capabilities available in cyberphysical systems, attackers can learn the statistical structure of the system and incorporate the underlying stochastic process to launch the attacks [15,16]. DIAs that operate within a Bayesian framework by leveraging stochastic models of the system are studied in [17,18]. From the perspective of the operator, the introduction of stochastic descriptors opens the door to information theoretic quantification of the measurement vulnerability.
In this paper, we propose a novel information theoretic metric to assess the vulnerability of measurements in power systems to data integrity attacks. Specifically, we characterize the fundamental information loss induced by data integrity attacks via mutual information and the stealthiness of the attack via Kullback-Leibler divergence. Our aim is to provide a metric that is grounded on fundamental principles, and therefore, informs the vulnerabilities of the measurements in the system to a wide range of threats. This is enabled by the use of information theoretic measures which characterize the amount of information acquired by the measurements in the system in fundamental terms.
The rest of the paper is organized as follows: In Section 2, we introduce a Bayesian framework with linearized dynamics for DIAs. Information theoretic attacks are presented in Section 3. The vulnerability metric on information theoretic attacks is proposed in Section 4. In Section 5, we characterize the vulnerability of measurements in uncompromised systems and propose an algorithm to evaluate the vulnerability of measurements. The vulnerability of measurements of the IEEE test systems is presented in Section 6. The paper concludes in Section 7.
The main contributions of this paper follow: (1) A notion of vulnerability for the measurements in the system is proposed. The proposed notion is characterized by the information theoretic cost induced by random attacks. Specifically, mutual information and KL divergence are used to construct a quantitative measure of vulnerability. (2) The vulnerability assessment of the measurements is posed as a minimization problem and closed-form expressions are obtained for the case in which the initial state of the system is uncompromised. (3) An algorithm that computes the proposed vulnerability indices for general state estimators in power systems is proposed. (4) The proposed framework is numerically evaluated in IEEE 9-bus and 30-bus test systems to obtain qualitative characterizations of the vulnerability of the measurements in the systems.
Notation: We denote the number of state variables on a given system by $n$ and the number of measurements by $m$. The set of positive semidefinite matrices of size $n \times n$ is denoted by $\mathcal{S}_+^n$. The $n$-dimensional identity matrix is denoted by $I_n$. For a matrix $A \in \mathbb{R}^{m \times n}$, we denote by $(A)_{ij}$ the entry in row $i$ and column $j$, and $\mathrm{diag}(A)$ denotes the vector formed by the diagonal entries of $A$. The elementary vector $e_i \in \mathbb{R}^n$ is a vector of zeros with a one in the $i$-th entry. Random variables are denoted by capital letters and their realizations by the corresponding lower case, e.g., $x$ is a realization of the random variable $X$. Vectors of $n$ random variables are denoted by a superscript, e.g., $X^n = (X_1, \ldots, X_n)^{\mathsf{T}}$, with corresponding realizations denoted by $x$. Given an $n$-dimensional vector $\mu \in \mathbb{R}^n$ and a matrix $\Sigma \in \mathcal{S}_+^n$, we denote by $\mathcal{N}(\mu, \Sigma)$ the multivariate Gaussian distribution of dimension $n$ with mean $\mu$ and covariance matrix $\Sigma$. The mutual information between random variables $X$ and $Y$ is denoted by $I(X; Y)$ and the Kullback-Leibler (KL) divergence between the distributions $P$ and $Q$ is denoted by $D(P \| Q)$.

System model

Observation Model
In a power system, the state vector $x \in \mathbb{R}^n$ that contains the voltages and phase angles at all the buses describes the operational state of the system. The state vector $x$ is observed by the acquisition function $F: \mathbb{R}^n \to \mathbb{R}^m$. A linearized observation model is considered for state estimation, which yields the observation model
$$Y^m = Hx + Z^m, \tag{1}$$
where $H \in \mathbb{R}^{m \times n}$ is the Jacobian of the function $F$ at a given operating point and is determined by the parameters and topology of the system. The vector of measurements $Y^m$ is corrupted by additive white Gaussian noise introduced by the sensors [1], [2]. The noise vector $Z^m$ follows a multivariate Gaussian distribution, that is,
$$Z^m \sim \mathcal{N}(0, \sigma^2 I_m), \tag{2}$$
where $\sigma^2$ is the noise variance.
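In a Bayesian estimation framework, the state variables are described by a vector of random variables $X^n$ with a given distribution. In this study, we assume that $X^n$ follows a multivariate Gaussian distribution [19] with zero mean and covariance matrix $\Sigma_{XX} \in \mathcal{S}_+^n$, that is,
$$X^n \sim \mathcal{N}(0, \Sigma_{XX}). \tag{3}$$
From (1), it follows that the vector of measurements is Gaussian with zero mean and covariance matrix $\Sigma_{YY} \in \mathcal{S}_+^m$, that is,
$$Y^m \sim \mathcal{N}(0, \Sigma_{YY}), \tag{4}$$
where
$$\Sigma_{YY} = H \Sigma_{XX} H^{\mathsf{T}} + \sigma^2 I_m. \tag{5}$$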
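As a minimal sketch of the Gaussian observation model above, the measurement covariance can be assembled as $H \Sigma_{XX} H^{\mathsf{T}} + \sigma^2 I_m$, which follows from the linear model with independent noise. The toy matrices `H` and `sigma_xx` and the helper name `measurement_covariance` are illustrative assumptions and do not correspond to any IEEE test system.

```python
import numpy as np

def measurement_covariance(H, sigma_xx, noise_var):
    """Return H Sigma_XX H^T + noise_var * I_m for the linear Gaussian model."""
    H = np.asarray(H, dtype=float)
    m = H.shape[0]
    return H @ sigma_xx @ H.T + noise_var * np.eye(m)

# Hypothetical toy system: n = 2 state variables, m = 3 measurements.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, -1.0]])
sigma_xx = np.array([[1.0, 0.1],
                     [0.1, 1.0]])
sigma_yy = measurement_covariance(H, sigma_xx, noise_var=0.5)
print(sigma_yy)
```

The resulting matrix is symmetric and positive definite whenever the noise variance is strictly positive, which is what the later log-determinant expressions rely on.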

Attack Setting
Consider a malicious attack described by the random vector $A^m$ taking values in $\mathbb{R}^m$. The measurements corrupted by the attack are given by
$$Y_A^m = H X^n + Z^m + A^m, \tag{6}$$
where $Y_A^m$ is the random vector of compromised measurements taking values in $\mathbb{R}^m$. For a fixed positive definite covariance matrix $\Sigma_{AA} \in \mathcal{S}_{++}^m$, the mutual information between the state variables $X^n$ and the compromised measurements $Y_A^m$, denoted by $I(X^n; Y_A^m)$, is minimized when the additive disturbance to the system, that is, $Z^m + A^m$, follows a multivariate Gaussian distribution [20]. Given that $Z^m$ satisfies (2) and that the sum $Z^m + A^m$ is Gaussian, it follows from the Lévy-Cramér decomposition theorem [21,22] that $A^m$ is Gaussian. In view of this, in the following, we assume that
$$A^m \sim \mathcal{N}(0, \Sigma_{AA}), \tag{7}$$
where $0 = (0, 0, \ldots, 0)$ and $\Sigma_{AA} \in \mathcal{S}_+^m$ are the mean vector and the covariance matrix of the random attack vector $A^m$. The assumption in (7) is further discussed in Section 3. Consequently, the vector of compromised measurements $Y_A^m$ follows a multivariate Gaussian distribution with zero mean and covariance matrix $\Sigma_{Y_A Y_A}$, that is,
$$Y_A^m \sim \mathcal{N}(0, \Sigma_{Y_A Y_A}), \tag{8}$$
with
$$\Sigma_{Y_A Y_A} = H \Sigma_{XX} H^{\mathsf{T}} + \sigma^2 I_m + \Sigma_{AA}. \tag{9}$$

Information Theoretic Attacks
The aim of the attack is twofold. Firstly, the attack aims to disrupt the state estimation procedure. Secondly, it aims to stay undetected.
For the first objective, we minimize the mutual information between the vector of state variables $X^n$ in (3) and the vector of compromised measurements $Y_A^m$ in (6), that is, $I(X^n; Y_A^m)$. In other words, the attack reduces the amount of information about the state variables contained in the compromised measurements. The stealth constraint in the second objective is captured by the Kullback-Leibler (KL) divergence between the distribution $P_{Y_A^m}$ in (6) and the distribution $P_{Y^m}$ in (4). For the observation model and attack setting described in Section 2, and assuming optimal detection, the Chernoff-Stein Lemma [23] states that the minimization of KL divergence leads to the minimization of the asymptotic detection probability.
The following propositions characterize mutual information and KL divergence with Gaussian state variables and attacks, respectively [24, Prop. 1, 2].
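Proposition 1. The mutual information between the random vectors $X^n$ and $Y_A^m$ is
$$I(X^n; Y_A^m) = \frac{1}{2} \log \frac{|\Sigma_{XX}| \, |\Sigma_{Y_A Y_A}|}{|\Sigma|}, \tag{10}$$
where the matrices $\Sigma_{XX}$ and $\Sigma_{Y_A Y_A}$ are in (3) and (9), respectively; and the matrix $\Sigma$ is the covariance matrix of the joint distribution of $X^n$ and $Y_A^m$, that is,
$$\Sigma = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XX} H^{\mathsf{T}} \\ H \Sigma_{XX} & H \Sigma_{XX} H^{\mathsf{T}} + \sigma^2 I_m + \Sigma_{AA} \end{pmatrix}, \tag{11}$$
where $\sigma \in \mathbb{R}_+$ is in (2); and matrices $H$ and $\Sigma_{AA}$ are in (1) and (7), respectively.
Proposition 2. The KL divergence between the distribution of random vector $Y_A^m$ in (8) and the distribution of random vector $Y^m$ in (4) is
$$D(P_{Y_A^m} \| P_{Y^m}) = \frac{1}{2} \left( \log \frac{|\Sigma_{YY}|}{|\Sigma_{YY} + \Sigma_{AA}|} - m + \mathrm{tr}\left( \Sigma_{YY}^{-1} (\Sigma_{YY} + \Sigma_{AA}) \right) \right), \tag{12}$$
where the matrices $\Sigma_{YY}$ and $\Sigma_{AA}$ are in (5) and (7), respectively.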
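The two propositions can be checked numerically. The sketch below evaluates the mutual information from the joint covariance of the states and the compromised measurements, and the KL divergence between two zero-mean Gaussians, using the standard Gaussian expressions (in nats). The function names and the toy system are assumptions made for illustration only.

```python
import numpy as np

def logdet(M):
    """Log-determinant of a positive definite matrix."""
    sign, value = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("matrix is not positive definite")
    return value

def mutual_information(H, sigma_xx, noise_var, sigma_aa):
    """I(X; Y_A) in nats, computed from the joint covariance of (X, Y_A)."""
    m = H.shape[0]
    sigma_yaya = H @ sigma_xx @ H.T + noise_var * np.eye(m) + sigma_aa
    joint = np.block([[sigma_xx, sigma_xx @ H.T],
                      [H @ sigma_xx, sigma_yaya]])
    return 0.5 * (logdet(sigma_xx) + logdet(sigma_yaya) - logdet(joint))

def kl_divergence(sigma_yy, sigma_aa):
    """D(P_{Y_A} || P_Y) in nats for zero-mean Gaussian vectors."""
    m = sigma_yy.shape[0]
    sigma_yaya = sigma_yy + sigma_aa
    trace_term = np.trace(np.linalg.solve(sigma_yy, sigma_yaya))
    return 0.5 * (logdet(sigma_yy) - logdet(sigma_yaya) - m + trace_term)

# Hypothetical toy system: n = 2 state variables, m = 3 measurements.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
sigma_xx = np.array([[1.0, 0.1], [0.1, 1.0]])
noise_var = 0.5
sigma_yy = H @ sigma_xx @ H.T + noise_var * np.eye(3)

no_attack = np.zeros((3, 3))
attack = np.diag([0.0, 0.0, 1.0])  # attack on the third measurement only

mi_clean = mutual_information(H, sigma_xx, noise_var, no_attack)
mi_attacked = mutual_information(H, sigma_xx, noise_var, attack)
kl_clean = kl_divergence(sigma_yy, no_attack)
kl_attacked = kl_divergence(sigma_yy, attack)
print(mi_clean, mi_attacked, kl_clean, kl_attacked)
```

In this sketch the attack lowers the mutual information while raising the KL divergence from zero, which is the disruption and detection tradeoff that the attack construction balances.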
The information theoretic attack construction is proposed in the following optimization problem [17,24]:
$$\min_{P_{A^m}} \; I(X^n; Y_A^m) + \lambda D(P_{Y_A^m} \| P_{Y^m}), \tag{13}$$
where $\lambda \in \mathbb{R}_+$ is the weighting parameter that determines the tradeoff between mutual information and KL divergence. Note that the optimization domain in (13) is the set of $m$-dimensional Gaussian multivariate distributions. The optimal Gaussian attack for $\lambda \geq 1$ as a solution to (13) is given by [24]
$$A^m \sim \mathcal{N}(0, \lambda^{-1/2} H \Sigma_{XX} H^{\mathsf{T}}). \tag{14}$$
Note that the attack realizations from (14) are nonzero with probability one, that is,
$$\mathbb{P}\left[ |\mathrm{supp}(A^m)| = m \right] = 1. \tag{15}$$
The attack implementation requires access to the sensing infrastructure of the industrial control system (ICS) operating the power systems. For that reason, the attack construction incorporates a sparsity constraint that limits the optimization domain over the attack vector $A^m$ in (6) to the distributions with cardinality of the support satisfying $|\mathrm{supp}(A^m)| = k \leq m$. The resulting sparse attack construction [18] minimizes the cost function in (13) over this restricted set of distributions. The following theorem provides the optimal single sensor attack construction.

Attack Structure with Sequential Measurement Selection
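To assess the impact of the attacks on different measurements, we model the entries of the random attack vector $A^m$ as independent, that is,
$$P_{A^m} = \prod_{i=1}^{m} P_{A_i}, \tag{20}$$
where $A_i$ is the $i$-th entry of $A^m$ and for all $i \in \{1, 2, \ldots, m\}$, the distribution $P_{A_i}$ is Gaussian with zero mean and variance $v \in \mathbb{R}_+$, that is,
$$A_i \sim \mathcal{N}(0, v). \tag{21}$$
Consider that $k$ sensors have been attacked with $k \in \{0, 1, 2, \ldots, m-1\}$ and let the covariance matrix of the corresponding attack vector $A^m$ in (6) be
$$\Sigma \in \mathcal{S}_k, \tag{22}$$
where $\mathcal{S}_k$ is the set of $m$-dimensional positive semidefinite matrices with $k$ positive entries in the diagonal, that is,
$$\mathcal{S}_k = \left\{ S \in \mathcal{S}_+^m : \|\mathrm{diag}(S)\|_0 = k \right\}. \tag{23}$$
Let the set of measurements that have not been compromised be
$$\mathcal{K}_o = \left\{ i \in \{1, 2, \ldots, m\} : (\Sigma)_{ii} = 0 \right\}, \tag{24}$$
where $(\Sigma)_{ii}$ is the entry of $\Sigma$ in row $i$ and column $i$. The sequential measurement selection imposes the following structure in the covariance matrix of the attack vector in (7):
$$\Sigma_{AA} = \Sigma + v e_i e_i^{\mathsf{T}}, \tag{25}$$
where $i \in \mathcal{K}_o$ and $v \in \mathbb{R}_+$. From (25), the cost function
$$f : \mathcal{S}_k \times \mathbb{R}_+ \times \mathbb{R}_+ \times \mathcal{K}_o \to \mathbb{R}_+ \tag{26}$$
obtained from (10) and (12) is as follows:
$$f(\Sigma, \lambda, v, i) = I(X^n; Y_A^m) + \lambda D(P_{Y_A^m} \| P_{Y^m}) \tag{27}$$
$$= \frac{1}{2} \log \frac{|\Sigma_{XX}| \, |\Sigma_{Y_A Y_A}|}{|\Sigma|} + \frac{\lambda}{2} \left( \log \frac{|\Sigma_{YY}|}{|\Sigma_{YY} + \Sigma_{AA}|} - m + \mathrm{tr}\left( \Sigma_{YY}^{-1} (\Sigma_{YY} + \Sigma_{AA}) \right) \right) \tag{28}$$
$$= \frac{1}{2} (1 - \lambda) \log |\Sigma_{YY} + \Sigma_{AA}| - \frac{1}{2} \log |\sigma^2 I_m + \Sigma_{AA}| + \frac{\lambda}{2} \left( \log |\Sigma_{YY}| + \mathrm{tr}\left( \Sigma_{YY}^{-1} \Sigma_{AA} \right) \right) \tag{29}$$
$$= \frac{1}{2} (1 - \lambda) \log |\Sigma_{YY} + \Sigma + v e_i e_i^{\mathsf{T}}| - \frac{1}{2} \log |\sigma^2 I_m + \Sigma + v e_i e_i^{\mathsf{T}}| + \frac{\lambda}{2} \left( \log |\Sigma_{YY}| + \mathrm{tr}\left( \Sigma_{YY}^{-1} (\Sigma + v e_i e_i^{\mathsf{T}}) \right) \right), \tag{30}$$
where the equality in (28) holds from plugging (10) and (12) into (27); the equality in (29) follows from cancelling $|\Sigma_{XX}|$ in the first term [25, Sec. 14.17], that is, from $|\Sigma| = |\Sigma_{XX}| \, |\sigma^2 I_m + \Sigma_{AA}|$, and noting that $\Sigma_{Y_A Y_A} = \Sigma_{YY} + \Sigma_{AA}$ in (9); and the equality in (30) holds from plugging (25) into (29).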
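The cost function of the sequential selection can be sketched numerically as the sum of the mutual information and $\lambda$ times the KL divergence, after cancelling the determinant of the state covariance. The expression coded below is derived from the Gaussian formulas for mutual information and KL divergence; the function name `cost`, the 0-based sensor index, and the toy system are assumptions for illustration.

```python
import numpy as np

def cost(sigma_yy, noise_var, sigma_prev, lam, v, i):
    """Cost f(Sigma, lambda, v, i) in nats.

    sigma_prev is the covariance of the attacks already in the system and
    i is the 0-based index of the additional sensor attacked with variance v.
    """
    m = sigma_yy.shape[0]
    e_i = np.zeros(m)
    e_i[i] = 1.0
    sigma_aa = sigma_prev + v * np.outer(e_i, e_i)
    ld_yy = np.linalg.slogdet(sigma_yy)[1]
    ld_yaya = np.linalg.slogdet(sigma_yy + sigma_aa)[1]
    ld_noise = np.linalg.slogdet(noise_var * np.eye(m) + sigma_aa)[1]
    trace_term = np.trace(np.linalg.solve(sigma_yy, sigma_aa))
    return 0.5 * ((1.0 - lam) * ld_yaya - ld_noise + lam * (ld_yy + trace_term))

# Hypothetical toy system: n = 2 state variables, m = 3 measurements.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
sigma_xx = np.array([[1.0, 0.1], [0.1, 1.0]])
noise_var = 0.5
sigma_yy = H @ sigma_xx @ H.T + noise_var * np.eye(3)

sigma_prev = np.diag([0.0, 0.7, 0.0])  # second sensor already under attack
value = cost(sigma_yy, noise_var, sigma_prev, lam=2.0, v=0.4, i=2)
print(value)
```

With no attack at all (zero `sigma_prev` and `v = 0`), the cost reduces to the mutual information of the uncompromised system, which gives a quick sanity check of the implementation.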

Information theoretic vulnerability of a measurement
We propose a notion of vulnerability that is linked to the information theoretic cost function proposed in [24] to characterize the disruption and detection tradeoff incurred by the attacks. Taking the state of the system with $k$ compromised measurements as the baseline, we quantify the vulnerability of measurement $i \in \mathcal{K}_o$ in terms of the cost decrease that $i$ induces. In the following, we define the vulnerability of a measurement according to this idea.
Definition 1. The function $\Delta : \mathcal{S}_k \times \mathbb{R}_+ \times \mathbb{R}_+ \times \mathcal{K}_o \to \mathbb{R}$, where $\mathcal{K}_o$ is in (24), defines the vulnerability of measurement $i$ in the following form:
$$\Delta(\Sigma, \lambda, v, i) = f(\Sigma, \lambda, v, i) - f(\Sigma, \lambda, 0, i), \tag{31}$$
where the function $f$ is defined in (26).
Note that the attacker aims to minimize (26) by choosing an index $i$ and a variance $v$. Therefore, the definition above implies that, given that $k$ measurements in $\{1, 2, \ldots, m\} \setminus \mathcal{K}_o$ are already under attack in the system, the most vulnerable measurement is obtained by solving the following minimization problem:
$$\min_{i \in \mathcal{K}_o} \Delta(\Sigma, \lambda, v, i), \tag{32}$$
where $\mathcal{K}_o$ is defined in (24).
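A minimal sketch of this selection step is given below. It assumes that the vulnerability of measurement $i$ is the cost with an additional attack of variance $v$ on sensor $i$ minus the cost of the baseline system, and it searches over the uncompromised sensors for the smallest value. The helper names, the 0-based indices, and the toy system are illustrative assumptions.

```python
import numpy as np

def cost(sigma_yy, noise_var, sigma_prev, lam, v, i):
    """Mutual information plus lam times KL divergence, in nats."""
    m = sigma_yy.shape[0]
    sigma_aa = sigma_prev.copy()
    sigma_aa[i, i] += v
    ld = lambda M: np.linalg.slogdet(M)[1]
    trace_term = np.trace(np.linalg.solve(sigma_yy, sigma_aa))
    return 0.5 * ((1.0 - lam) * ld(sigma_yy + sigma_aa)
                  - ld(noise_var * np.eye(m) + sigma_aa)
                  + lam * (ld(sigma_yy) + trace_term))

def vulnerability(sigma_yy, noise_var, sigma_prev, lam, v, i):
    """Cost change induced by attacking sensor i, relative to the baseline."""
    return (cost(sigma_yy, noise_var, sigma_prev, lam, v, i)
            - cost(sigma_yy, noise_var, sigma_prev, lam, 0.0, i))

def most_vulnerable(sigma_yy, noise_var, sigma_prev, lam, v):
    """Return the minimizing uncompromised sensor and all vulnerability values."""
    m = sigma_yy.shape[0]
    uncompromised = [i for i in range(m) if sigma_prev[i, i] == 0.0]
    deltas = {i: vulnerability(sigma_yy, noise_var, sigma_prev, lam, v, i)
              for i in uncompromised}
    return min(deltas, key=deltas.get), deltas

# Hypothetical toy system with the first sensor already under attack.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
sigma_xx = np.array([[1.0, 0.1], [0.1, 1.0]])
noise_var = 0.5
sigma_yy = H @ sigma_xx @ H.T + noise_var * np.eye(3)
sigma_prev = np.diag([0.3, 0.0, 0.0])

best, deltas = most_vulnerable(sigma_yy, noise_var, sigma_prev, lam=2.0, v=0.3)
print(best, deltas)
```

Only sensors with a zero diagonal entry in `sigma_prev` are candidates, which mirrors the restriction of the search to the set of uncompromised measurements.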

Vulnerability analysis of uncompromised systems
We first consider the case in which no measurements are under attack, that is, $k = 0$, for which the following holds:
$$\Sigma = 0, \tag{33}$$
$$\mathcal{K}_o = \{1, 2, \ldots, m\}. \tag{34}$$
The attacker selects a single measurement with a given variance budget $v \leq v_0$. We quantify the vulnerability of measurement $i$ in terms of $\Delta(\Sigma, \lambda, v, i)$ defined in (31). For the uncompromised system case, the optimization problem in (32) can be solved in closed form. The following theorem provides the solution.
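Proof: We start by noting that (33) establishes that the vulnerability of measurement $i$ in (31) is $\Delta(0, \lambda, v, i)$. From the equality in (30), the function $f(0, \lambda, 0, i)$ is constant with respect to $i$. Hence, for $\Sigma = 0$, the optimization problem in (32) is equivalent to
$$\min_{i \in \mathcal{K}_o} f(0, \lambda, v, i), \tag{36}$$
where $\mathcal{K}_o$ is defined in (34). Recall that $\lambda \in \mathbb{R}_+$ and $v \in \mathbb{R}_+$. From (30), the resulting problem in (36) is equivalent to the following optimization problem:
$$\min_{i \in \mathcal{K}_o} \; \frac{1}{2} \left[ (1 - \lambda) \log |\Sigma_{YY} + v e_i e_i^{\mathsf{T}}| - \log |\sigma^2 I_m + v e_i e_i^{\mathsf{T}}| + \lambda \left( \log |\Sigma_{YY}| + v \, \mathrm{tr}\left( \Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}} \right) \right) \right] \tag{37}$$
$$\equiv \min_{i \in \mathcal{K}_o} \; (1 - \lambda) \log |I_m + v \Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}}| + \lambda v \, \mathrm{tr}\left( \Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}} \right) \tag{38}$$
$$\equiv \min_{i \in \mathcal{K}_o} \; (1 - \lambda) \log \left( 1 + v \, \mathrm{tr}\left( \Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}} \right) \right) + \lambda v \, \mathrm{tr}\left( \Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}} \right), \tag{39}$$
where the equivalence in (37) holds from plugging $\Sigma = 0$ into the equality in (30); the equivalence in (38) follows from removing the constant $(1 - \lambda) \log |\Sigma_{YY}|$ from the first term, together with the common factor $\frac{1}{2}$ and the remaining terms that do not depend on $i$, namely $\log |\sigma^2 I_m + v e_i e_i^{\mathsf{T}}|$ and $\lambda \log |\Sigma_{YY}|$; and the equivalence in (39) follows from the fact that $\Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}}$ is a matrix with nonzero entries in the $i$-th column and all the other entries are zero, so that $|I_m + v \Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}}| = 1 + v \, \mathrm{tr}(\Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}})$.
We now proceed by defining $t \triangleq v \, \mathrm{tr}(\Sigma_{YY}^{-1} e_i e_i^{\mathsf{T}})$, with $t \in \mathbb{R}_+$, and rewriting the cost in (39) as
$$(1 - \lambda) \log(1 + t) + \lambda t. \tag{40}$$
Note that (40) increases monotonically with $t$, given that its derivative with respect to $t$ is $(1 + \lambda t)/(1 + t)$, which is positive for all $t \in \mathbb{R}_+$. Therefore, the cost function in (39) is monotonically increasing with $t$, and the minimum is attained by the measurement with the smallest value of $(\Sigma_{YY}^{-1})_{ii}$. This completes the proof.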
From Theorem 2, it follows that the identification of the most vulnerable measurement is independent of $\lambda$, introduced in (26), and the value of the variance $v$. That is, it only depends on the system topology and parameters denoted by $\Sigma_{YY}$ defined in (5). This result coincides with Theorem 1 in the sense that in the attack construction for $k = 1$, the most vulnerable measurement is characterized in (19), which is independent of the value of $\lambda$. The following corollary formalizes this observation.
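The independence from $\lambda$ and $v$ can be illustrated numerically. The sketch below assumes, following the monotonicity argument in the proof, that for $k = 0$ the most vulnerable measurement is the one with the smallest diagonal entry of $\Sigma_{YY}^{-1}$, and compares that rule with a brute-force search over the full cost for several values of $\lambda$ and $v$. Function names, 0-based indices, and the toy system are assumptions.

```python
import numpy as np

def logdet(M):
    return np.linalg.slogdet(M)[1]

def cost_single_attack(sigma_yy, noise_var, lam, v, i):
    """Mutual information plus lam times KL divergence for one attacked sensor."""
    m = sigma_yy.shape[0]
    sigma_aa = np.zeros((m, m))
    sigma_aa[i, i] = v
    sigma_yaya = sigma_yy + sigma_aa
    mi = 0.5 * (logdet(sigma_yaya) - logdet(noise_var * np.eye(m) + sigma_aa))
    kl = 0.5 * (logdet(sigma_yy) - logdet(sigma_yaya) - m
                + np.trace(np.linalg.solve(sigma_yy, sigma_yaya)))
    return mi + lam * kl

def most_vulnerable_closed_form(sigma_yy):
    """Index of the smallest diagonal entry of the inverse measurement covariance."""
    return int(np.argmin(np.diag(np.linalg.inv(sigma_yy))))

def most_vulnerable_brute_force(sigma_yy, noise_var, lam, v):
    """Index minimizing the full cost over all single-sensor attacks."""
    m = sigma_yy.shape[0]
    costs = [cost_single_attack(sigma_yy, noise_var, lam, v, i) for i in range(m)]
    return int(np.argmin(costs))

# Hypothetical toy system: n = 2 state variables, m = 3 measurements.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
sigma_xx = np.array([[1.0, 0.1], [0.1, 1.0]])
noise_var = 0.5
sigma_yy = H @ sigma_xx @ H.T + noise_var * np.eye(3)

closed = most_vulnerable_closed_form(sigma_yy)
agree = all(most_vulnerable_brute_force(sigma_yy, noise_var, lam, v) == closed
            for lam in (0.5, 2.0, 10.0) for v in (0.1, 1.0, 5.0))
print(closed, agree)
```

In this toy example the brute-force minimizer coincides with the closed-form rule for every tested pair of $\lambda$ and $v$, in line with the statement that the ranking for $k = 0$ depends only on $\Sigma_{YY}$.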

Vulnerability index (VuIx)
The vulnerability analysis of uncompromised systems in Section 5.1 is constrained to $k = 0$. To generalize the vulnerability analysis to systems compromised with $k > 0$, in the following we propose a novel metric, coined vulnerability index.
Definition 2. For $k \in \{1, 2, \ldots, m-1\}$ and $\mathcal{S}_k$ in (23), let the parameters be $\Sigma \in \mathcal{S}_k$, $v \in \mathbb{R}_+$, $\lambda \in \mathbb{R}_+$. Consider the set $\{(i, \Delta_i) : i \in \mathcal{K}_o\}$, with $\mathcal{K}_o$ in (24) and $\Delta_i = \Delta(\Sigma, \lambda, v, i)$ in (31). Let the vulnerability ranking be $r = (r_1, r_2, \ldots, r_{|\mathcal{K}_o|})$ such that for all $i \in \{1, 2, \ldots, |\mathcal{K}_o|\}$, $r_i \in \mathcal{K}_o$ and moreover, $\Delta_{r_1} \leq \Delta_{r_2} \leq \ldots \leq \Delta_{r_{|\mathcal{K}_o|}}$. The vulnerability index (VuIx) of measurement $r_j \in \mathcal{K}_o$ is $j$, that is, $\mathrm{VuIx}(r_j) = j$.
Note that the measurement with the smallest VuIx is the most vulnerable measurement and corresponds to the solution of the optimization problem in (32). The proposed VuIx for $i \in \mathcal{K}_o$ is obtained by Algorithm 1.
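The ranking step of Definition 2 can be sketched as a sort of the uncompromised measurements by their vulnerability values, with rank 1 assigned to the smallest value. This is only an illustration of the definition and not necessarily the procedure in Algorithm 1; the function name, the tie-breaking by measurement label, and the numerical values are hypothetical.

```python
def vulnerability_index(deltas):
    """Map {measurement: Delta value} to {measurement: VuIx}.

    Rank 1 goes to the smallest Delta value; ties are broken by the
    measurement label, which is an assumption of this sketch.
    """
    ranking = sorted(deltas, key=lambda i: (deltas[i], i))
    return {measurement: rank for rank, measurement in enumerate(ranking, start=1)}

# Hypothetical Delta values for three uncompromised measurements.
deltas = {2: -0.31, 5: 0.12, 7: -0.05}
vuix = vulnerability_index(deltas)
print(vuix)
```

The measurement with VuIx equal to one is then the minimizer of the vulnerability over the uncompromised set.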

Numerical results
In this section, we numerically evaluate the VuIx of the measurements in a direct current (DC) setting for the IEEE test systems [26]. The voltage magnitudes are set to 1.0 per unit, that is, the measurements of the systems are the active power flows between the buses that are physically connected and the active power injections at all the buses. The Jacobian matrix $H$ in (1), determined by the topology of the system and the physical parameters of the branches, is generated by MATPOWER [27]. We adopt a Toeplitz model for the covariance matrix $\Sigma_{XX}$ that arises in a wide range of practical settings, such as autoregressive stationary processes. Specifically, we model the correlation between state variables $X_i$ and $X_j$ with an exponential decay parameter $\rho \in \mathbb{R}_+$, which results in the entries of the matrix $(\Sigma_{XX})_{ij} = \rho^{|i-j|}$ with $(i, j) \in \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\}$. In this setting, the VuIx of the measurements is also a function of the correlation parameter $\rho$, the noise variance $\sigma^2$, and the Jacobian matrix $H$. The noise regime in the observation model is characterized by the signal to noise ratio (SNR).
For all $\lambda \in \mathbb{R}_+$ and $v \in \mathbb{R}_+$, we generate a realization of $k$ attacked indices $\mathcal{K}_a \subseteq \{1, 2, \ldots, m\}$ that is uniformly sampled from the set of sets given by
$$\mathcal{K} = \left\{ \mathcal{A} \subseteq \{1, 2, \ldots, m\} : |\mathcal{A}| = k \right\}.$$
We then construct a random covariance matrix describing the existing attacks on the system as
$$\Sigma = v \sum_{j \in \mathcal{K}_a} e_j e_j^{\mathsf{T}},$$
with $\mathcal{K}_a \in \mathcal{K}$. In the numerical simulation, we obtain the vulnerability of measurement $i$ by computing $\Delta(\Sigma, \lambda, v, i)$, where $i \in \mathcal{K}_o$ is in (24) and $\Delta$ is defined in (31).
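The simulation setup can be sketched as follows. The Toeplitz covariance follows the exponential decay model described above. The SNR convention used here, $10 \log_{10}\big(\mathrm{tr}(H \Sigma_{XX} H^{\mathsf{T}}) / (m \sigma^2)\big)$, is an assumption of this sketch, as is the choice of assigning the same variance $v$ to every existing attack; the matrix `H` is a toy stand-in for a Jacobian generated by MATPOWER, and all function names are hypothetical.

```python
import numpy as np

def toeplitz_covariance(n, rho):
    """State covariance with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def noise_variance_from_snr(H, sigma_xx, snr_db):
    """Noise variance for a target SNR in dB.

    ASSUMPTION: SNR = 10 log10( tr(H Sigma_XX H^T) / (m sigma^2) ).
    """
    m = H.shape[0]
    return np.trace(H @ sigma_xx @ H.T) / (m * 10.0 ** (snr_db / 10.0))

def sample_existing_attack(m, k, v, rng):
    """Draw k attacked sensors uniformly at random and build the diagonal covariance."""
    attacked = rng.choice(m, size=k, replace=False)
    sigma = np.zeros((m, m))
    sigma[attacked, attacked] = v  # ASSUMPTION: same variance v on each attacked sensor
    return sigma, set(attacked.tolist())

# Toy stand-in for the Jacobian (not an IEEE test system).
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
sigma_xx = toeplitz_covariance(n=3, rho=0.1)
noise_var = noise_variance_from_snr(H, sigma_xx, snr_db=10.0)

rng = np.random.default_rng(0)
sigma_prev, attacked = sample_existing_attack(m=4, k=2, v=0.5, rng=rng)
print(noise_var, sorted(attacked))
```

Repeating the sampling step over many realizations of the attacked set yields the mean and variance of the VuIx reported in the figures.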

Assessment of vulnerability index (VuIx)
Fig. 1 and Fig. 2 depict the mean and variance of the VuIx obtained by Algorithm 1 for all the measurements with SNR = 10 dB, $\lambda = 2$ and $\rho = 0.1$ on the IEEE 9-bus system when $k = 1$ and $k = 2$, respectively. Therein, it is observed that in general power injection measurements take higher vulnerability indices. Note that the vulnerability index captures the threat posed by an attack on sensor $i$ expressed in terms of the vulnerability of the measurement as described by $\Delta(\Sigma, \lambda, v, i)$ in Algorithm 1. A larger value of $\Delta(\Sigma, \lambda, v, i)$ indicates a larger potential for a stealthy data integrity disruption induced by an attacker. Fig. 1-6 depict a prevalence of higher vulnerability indices assigned to power injection measurements for different system settings. This implies that corrupting the sensor data of power injection measurements is linked to larger information losses about the state of the grid, regardless of the attack construction used by the malicious attacker. Most power injection measurements correspond to higher ranked vulnerability indices, but there are instances of power flow measurements with a higher ranked VuIx than that of power injection measurements. Interestingly, the power injection measurements with lower vulnerability indices correspond to the buses that are more isolated in the system, that is, the buses with a lower number of connections. On the other hand, the power flow measurements with higher ranked vulnerability indices correspond to the branches with higher admittance. The VuIx for $k = 0$ obtained in Corollary 1 is depicted for the purpose of serving as a reference to assess the deviation when $k > 0$. In this setting, the VuIx of most measurements does not change substantially for different values of $k$, which suggests that the VuIx is insensitive to the state of the system.
Fig. 3 and Fig. 4 depict the mean and variance of the VuIx from Algorithm 1 for all the measurements with SNR = 30 dB, $\lambda = 2$ and $\rho = 0.1$ on the IEEE 9-bus system when $k = 1$ and $k = 2$, respectively. Similarly to what is observed above, the mean of the VuIx for most of the measurements does not deviate significantly from the case when $k = 0$. However, most of the variance values deviate significantly in comparison with the cases in Fig. 1 and Fig. 2 with SNR = 10 dB.
Fig. 5 and Fig. 6 depict the results on the IEEE 30-bus system with the same setting as in Fig. 1 and Fig. 2, respectively. Fig. 7 and Fig. 8 depict the results on the IEEE 30-bus system with the same setting as in Fig. 3 and Fig. 4, respectively. Surprisingly, the mean of the VuIx in larger systems coincides with that obtained for the case $k = 0$, which suggests that the VuIx is a robust security metric for large systems. In line with the previous observation, the power injection measurements corresponding to the least connected buses decrease in the VuIx when SNR = 10 dB.

Comparative vulnerability assessment of power flow and power injection measurements
In Section 6.1 we have established that power injection measurements and power flow measurements are qualitatively different in terms of the VuIx. To provide a quantitative description of this difference, Fig. 9 depicts the probability of a given VuIx $i \in \{1, 2, \ldots, m - |\mathcal{K}_a|\}$ being taken by a power injection measurement or a power flow measurement for the IEEE 9-bus and 30-bus systems when $\lambda = 2$, $k = 2$, SNR = 30 dB and $\rho = 0.1$. Specifically, Fig. 9 depicts the probability of the following events:
$\mathrm{Flow}_i$: VuIx $i$ corresponds to a power flow measurement,
$\mathrm{Inj}_i$: VuIx $i$ corresponds to a power injection measurement.
It is observed that in both systems, small VuIx are more likely to correspond to power injection measurements than to power flow measurements, that is, $\mathbb{P}[\mathrm{Inj}_i] > \mathbb{P}[\mathrm{Flow}_i]$ for small values of $i$. Conversely, it holds that $\mathbb{P}[\mathrm{Inj}_i] < \mathbb{P}[\mathrm{Flow}_i]$ for large values of $i$. In fact, small VuIx correspond to power injection measurements with probability one, which suggests that the most vulnerable measurements in the system tend to be power injection measurements. Conversely, the larger VuIx values correspond to power flow measurements with probability one, which indicates that the least vulnerable measurements tend to be power flow measurements. Interestingly, there is a clear demarcation for each system for which $\mathbb{P}[\mathrm{Inj}_i]$ and $\mathbb{P}[\mathrm{Flow}_i]$ change rapidly with the VuIx value, which points to a phase transition type phenomenon for measurement vulnerability.
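The probabilities of the two events can be estimated empirically from repeated simulation runs. The sketch below assumes that each run produces a ranking of the uncompromised measurements ordered by VuIx, and counts how often each VuIx value is taken by a power injection measurement; the probability of the flow event is the complement. The measurement labels and rankings are hypothetical.

```python
from collections import Counter

def injection_probability_by_rank(rankings, is_injection):
    """Empirical probability that VuIx i is taken by a power injection measurement.

    rankings: list of rankings, each a list of measurement labels ordered by VuIx.
    is_injection: dict mapping each measurement label to True or False.
    """
    counts = Counter()
    totals = Counter()
    for ranking in rankings:
        for rank, measurement in enumerate(ranking, start=1):
            totals[rank] += 1
            counts[rank] += int(is_injection[measurement])
    return {rank: counts[rank] / totals[rank] for rank in sorted(totals)}

# Hypothetical rankings from two simulation runs.
is_injection = {"inj_1": True, "inj_2": True, "flow_12": False}
rankings = [["inj_1", "inj_2", "flow_12"],
            ["inj_1", "flow_12", "inj_2"]]

p_inj = injection_probability_by_rank(rankings, is_injection)
p_flow = {rank: 1.0 - p for rank, p in p_inj.items()}
print(p_inj, p_flow)
```

Because rankings from runs with different attacked sets may have different lengths, the estimate for each VuIx value is normalized by the number of runs in which that value occurs.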
The probability of VuIx taken by power injection measurements concentrates higher probability mass for higher priority vulnerability indices. On the other hand, power flow measurements with higher probability mass coincide with low ranked VuIx values. Precisely, the probability of the vulnerability indices with higher priority taken by power injection measurements is one in both IEEE 9-bus and 30-bus systems. Meanwhile, the probability of the lower ranked vulnerability indices taken by power flow measurements is one. Note that the probability of mid-ranked vulnerability indices taken by power injection measurements drops significantly, which indicates that there are some power flow measurements that are as vulnerable as power injection measurements. We observe that these power flow measurements correspond to the branches with higher admittance. The power injection measurements with lower vulnerability indices correspond to the buses that are isolated in the systems.
Fig. 10 depicts the distribution of VuIx for power injection measurements and power flow measurements on the IEEE 9-bus and 30-bus systems when $\lambda = 2$, $k = 2$, SNR = 30 dB and $\rho = 0.1$. Specifically, Fig. 10 depicts the probability mass function of the following events:
$\mathrm{VuIx}(\mathrm{Flow}) = i$: VuIx for power flow measurements is $i$,
$\mathrm{VuIx}(\mathrm{Inj}) = i$: VuIx for power injection measurements is $i$.
Power injection measurements have a higher probability with high ranked VuIx, whereas power flow measurements have much higher probability with low ranked VuIx. It is worth noting that the probability mass functions are close to uniform for high and low vulnerability index ranges. This suggests that the most vulnerable measurements in the system are contained with high probability in a subset of the power injection measurements. Conversely, the least vulnerable measurements comprise the majority of the power flow measurements with no apparent preference over the majority. Surprisingly, in the 30-bus system, the probability of lowest ranked VuIx for power flow measurements experiences a sharp increase.

Conclusion
In this paper, we have proposed, from a fundamental perspective, a novel security metric referred to as vulnerability index (VuIx) that characterizes the vulnerability of power system measurements to data integrity attacks. We have achieved this by embedding information theoretic measures into the metric definition. The resulting VuIx framework evaluates the vulnerability of the measurements in the systems and enables the operator to identify those that are more exposed to data integrity threats. We have tested the framework for IEEE test systems and concluded that power injection measurements are more vulnerable to data integrity attacks than power flow measurements.
