EMS communication routings' optimisation to enhance power system security considering cyber-physical interdependence

The energy management system (EMS) is one of the most essential advanced applications in current cyber-physical power systems. However, because of the tight coupling between a power network and the EMS's communication network, a physical break-line fault may be accompanied by a communication line outage, which may result in regional unobservability and uncontrollability. To avoid and mitigate such operation risks as far as possible, and referring to the 'active + standby' communication configuration scheme currently used for EMSs in China, the authors propose a dynamic routing optimisation mechanism for the standby communication routings that assigns the most reliable communication lines to the most important information. The optimisation considers two factors: the significance of the information and the reliability of the communication lines. By introducing a cyber-physical sensitivity index and a path-branch incidence matrix, both factors can be expressed numerically. In the case study, the authors optimise the standby communication routings for a power flow corrective control (PFCC) application under different scenarios. The results verify the effectiveness and superiority of their approach.

In optimisation model (1), $\aleph = \{\chi_k\}$ and $R = \{r_k\}$ represent the communication outage set and the corresponding probabilities, respectively ($k$ indexes the outages), and $\Psi = \{\Psi_l\}$ and $Q = \{q_l\}$ represent the physical operation scenario set and the corresponding probabilities ($q_l$ is the probability of the $l$th operation scenario $\Psi_l$). $s(x, z)$ and $s(x, y)$ are the sensitivity indices between the virtual quantities $z$ (measurements) and $y$ (control commands), respectively, and the selected physical state variables $x$. $c$ is the maximum fibre occupancy rate, which is often used to evaluate how evenly the load is distributed across a communication network. In power communication systems, different substations' routings usually occupy different channels, so $c$ can be represented by the maximum number of occupied channels in a single fibre. Finally, $\Lambda$ is a small weighting constant.
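The role of each quantity in (1) can be illustrated with a minimal Python sketch. The code assumes a hypothetical form of (1) consistent with the description: the probability-weighted sum, over outages $\chi_k$ and scenarios $\Psi_l$, of the sensitivity-weighted blocked information $s(x,z)^T \Delta z(\chi_k) + s(x,y)^T \Delta y(\chi_k)$, plus $\Lambda \cdot c$, where $\Delta z(\chi_k)$ and $\Delta y(\chi_k)$ are 0-1 vectors marking which substations' measurements and control commands an outage blocks. Weighting each pair by $r_k q_l$ (i.e. treating outages and scenarios as independent), the function name `expected_impact` and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def expected_impact(r, q, s_z, s_y, dz, dy, c, lam):
    """Hypothetical form of objective (1), assumed for illustration:
    sum_k sum_l r_k * q_l * (s_z[l] . dz[k] + s_y[l] . dy[k]) + lam * c

    r   : (K,)   probabilities of communication outages chi_k
    q   : (L,)   probabilities of operation scenarios Psi_l
    s_z : (L, n) per-substation measurement sensitivities under Psi_l
    s_y : (L, n) per-substation control sensitivities under Psi_l
    dz  : (K, n) 0-1 blocked-measurement vectors Delta z(chi_k)
    dy  : (K, n) 0-1 blocked-control vectors Delta y(chi_k)
    c   : maximum fibre occupancy; lam: small weighting constant Lambda
    """
    impact = dz @ s_z.T + dy @ s_y.T   # impact[k, l]
    risk = r @ impact @ q              # probability-weighted expectation
    return float(risk + lam * c)

# Toy example: 2 outages, 2 operation scenarios, 3 substations.
r = np.array([0.1, 0.05])
q = np.array([0.7, 0.3])
s_z = np.array([[1.0, 0.0, 0.5], [0.2, 0.8, 0.0]])
s_y = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]])
dz = np.array([[1, 0, 0], [0, 1, 1]])
dy = np.array([[0, 0, 0], [0, 1, 0]])
value = expected_impact(r, q, s_z, s_y, dz, dy, c=4, lam=1e-3)
print(round(value, 6))  # 0.1445
```

Because $\Lambda$ is small, the balancing term only breaks ties between routing schemes whose expected impacts are nearly equal.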
$\Delta z(\chi_k)$ and $\Delta y(\chi_k)$ represent the impact of communication outage $\chi_k$ on the transmission of measurement and control signals, respectively, between the control centre and the substations. Assuming that there are $n$ substations, they can be written as

$$\Delta z(\chi_k) = [\Delta z_1(\chi_k), \ldots, \Delta z_n(\chi_k)]^T, \qquad \Delta y(\chi_k) = [\Delta y_1(\chi_k), \ldots, \Delta y_n(\chi_k)]^T,$$

where the element $\Delta z_j(\chi_k)$ ($\Delta y_j(\chi_k)$) indicates whether the measurement up-link (control down-link) of substation $j$ is blocked by outage $\chi_k$.

The first part of (1) represents the expected impact of the anticipated risks on power system operation, and the final term $\Lambda \cdot c$ is introduced to balance the information distribution. In this optimisation model, $\aleph = \{\chi_k\}$, $R = \{r_k\}$, $\Psi = \{\Psi_l\}$ and $Q = \{q_l\}$ can be pre-assessed from prediction results, e.g. weather and load forecasting reports, and the sensitivity indices $s(x, z)$ and $s(x, y)$ can be calculated with the method described in [6], based on the system's information-energy flow model. The key to building the model is therefore to express the vectors $\Delta z(\chi_k)$ and $\Delta y(\chi_k)$ quantitatively as mappings from communication outage $\chi_k$.

Notably, both the active and the standby routings can be described as sets of directed paths, each with one end at the control centre: $G^{\uparrow}$ and $G^{\downarrow}$ denote the active up-link and down-link sets, and $\tilde{G}^{\uparrow}$ and $\tilde{G}^{\downarrow}$ the standby ones. Taking the standby up-links as an example, $\mathrm{path}^{\uparrow}(j)$ represents the standby up-link from substation $j$ to the control centre. For any $1 \le j \le n$, $\mathrm{path}^{\uparrow}(j)$ is the ordered sequence of nodes and branches it passes, in which $N^{\uparrow}(j, k)$ and $b^{\uparrow}(j, l)$ denote the $k$th node and the $l$th branch of the path, and $\mathrm{Len}^{\uparrow}(j)$ is the length of the path, i.e. the number of branches passed, or routing hops. According to this definition, the topologies of $G^{\uparrow}$, $G^{\downarrow}$, $\tilde{G}^{\uparrow}$ and $\tilde{G}^{\downarrow}$ can be defined as incidence relations between nodes and branches. Because every path terminates at the control centre, the structures of these four graphs are similar to a radial distribution power network.
Following the analysis approach used for distribution power networks, we introduce path-branch incidence matrices to describe these topologies, denoted $T^{\uparrow}$ and $T^{\downarrow}$ for the active up-links and down-links and $\tilde{T}^{\uparrow}$ and $\tilde{T}^{\downarrow}$ for the standby up-links and down-links. A path-branch incidence matrix describes the incidence relations between the branches and all the directed paths terminating at the root node (corresponding to the control centre); each column vector corresponds to a path and each row vector to a branch. Assuming that graph $G(N, b)$ has $m$ branches and that there are $n$ substations, each originating one path, the four matrices are all of order $m \times n$.
Take the standby up-link matrix $\tilde{T}^{\uparrow}$ (corresponding to $\tilde{G}^{\uparrow}$) as an example to show how the elements are defined:

$$\tilde{t}^{\uparrow}(i,j) = \begin{cases} 1, & b_i \in \mathrm{path}^{\uparrow}(j) \\ 0, & \text{otherwise} \end{cases}$$

in which $\tilde{t}^{\uparrow}(i,j)$ is the element in the $i$th row and $j$th column of $\tilde{T}^{\uparrow}$. The other three matrices, $T^{\uparrow}$, $T^{\downarrow}$ and $\tilde{T}^{\downarrow}$, are defined in the same way. Denoting the $l$th row vectors of the four matrices by $T^{\uparrow}_l$, $T^{\downarrow}_l$, $\tilde{T}^{\uparrow}_l$ and $\tilde{T}^{\downarrow}_l$, the occupancy of the $l$th communication line, i.e. the number of channels occupied by the routings passing through it, can be obtained by summing the elements of these row vectors. By the definition of the path-branch incidence matrix, the non-zero elements of each row vector identify exactly the information transmitted over that communication line, which is also the information blocked if that line is interrupted.

To illustrate, the IEEE 14-bus test system is used to model the physical power system, with the communication overlay model depicted in the left subfigure of Fig. 2. The active communication paths of the substations are abstracted in the right subfigure. From this network structure, we can obtain the path-branch incidence matrices $T^{\uparrow}$ and $T^{\downarrow}$ of the active routings.

IET Cyber-Phys. Syst.

Assuming that communication line 8 is interrupted by an accident (see the red star in Fig. 3), the communication of substations 6, 9 and 10 is blocked (see the red circles and line in Fig. 3), which corresponds exactly to the non-zero elements of $b_8$'s row vector in matrix (10).
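The path-branch incidence matrix and the line-occupancy count can be sketched in a few lines of Python. In this hypothetical sketch, each path is given as the list of branch indices it traverses, rows correspond to branches and columns to paths, a failed line's blocked substations are read off the non-zero entries of its row, and each line's occupancy is taken to be the number of routings passing through it, summed over the supplied matrices, with $c$ as the maximum. The toy network and the function names are illustrative assumptions, not the IEEE 14-bus data.

```python
import numpy as np

def path_branch_incidence(paths, m):
    """m x n 0-1 matrix: T[i, j] = 1 if branch i lies on path j.

    paths : list of n paths, each given as a list of 0-based branch indices
    m     : number of branches in the communication graph
    """
    T = np.zeros((m, len(paths)), dtype=int)
    for j, branches in enumerate(paths):
        T[branches, j] = 1
    return T

def blocked_paths(T, line):
    """Columns (paths) whose information traverses the given line (row)."""
    return np.flatnonzero(T[line])

def line_occupancy(*matrices):
    """Per-line count of routings summed over the given matrices, and its max c."""
    occupancy = sum(M.sum(axis=1) for M in matrices)
    return occupancy, int(occupancy.max())

# Toy network: 4 branches, 3 substations (paths listed from substation to centre).
T_active = path_branch_incidence([[0], [1, 0], [2, 1, 0]], m=4)
T_standby = path_branch_incidence([[3], [2, 3], [3]], m=4)
print(blocked_paths(T_active, 1))              # [1 2]
occ, c = line_occupancy(T_active, T_standby)
print(occ, c)                                  # [3 2 2 3] 3
```

Reading a failed line's blocked substations is a single row lookup, which is what makes the matrix form convenient for the outage analysis that follows.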

Expression of communication outage impacts
In this section, we use the path-branch incidence matrices proposed above to express $\Delta z(\chi_k)$ and $\Delta y(\chi_k)$ in (1). For a given communication interruption scenario $\chi$, denote the set of faulty communication lines by $b_\chi$. In this scenario, all the links passing through $b_\chi$ are terminated, so the blocked information transmission can be described as the union (logical sum) of the row vectors corresponding to $b_\chi$. Because the path-branch incidence matrices contain only 0 and 1, this set operation can be transformed equivalently into multiplications among vectors. According to the definitions in (8), the impact of $\chi$ on active/standby communication can be expressed as (13), where $\alpha(n)$ is an $n$-dimensional column vector whose elements are all +1, and the operator $.*$ denotes element-wise multiplication of two column vectors of the same dimension.

When an EMS is configured with 'active + standby' communication routings, a substation's communication is terminated only when its active and standby links to the control centre are interrupted simultaneously. Therefore, the final impact of $\chi$ on up-link/down-link data can be described as the intersection of the blocked sets of the active and standby routings (15). Substituting (13) into (15), the impact of any communication outage can be expressed as a mapping from the four path-branch incidence matrices $T^{\uparrow}$, $T^{\downarrow}$, $\tilde{T}^{\uparrow}$ and $\tilde{T}^{\downarrow}$. When optimising the standby communication routings, the active matrices $T^{\uparrow}$ and $T^{\downarrow}$ can be regarded as known, so $\Delta z(\chi_k)$ and $\Delta y(\chi_k)$ can be reformulated as functions of the standby matrices $\tilde{T}^{\uparrow}$ and $\tilde{T}^{\downarrow}$. These two matrices are therefore the optimisation variables of the model.
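The conversion of a set union into vector multiplications can be sketched as follows. This is one possible realisation, not necessarily the exact form of (13) and (15): a path survives only if none of its branches has failed, so its blocked indicator is $\alpha(n)$ minus the element-wise product of $(\alpha(n) - \text{row } l)$ over the failed lines $l$; under 'active + standby' routing, the final impact is the element-wise product (intersection) of the active and standby indicators. Function names and the toy matrices are hypothetical.

```python
import numpy as np

def blocked_indicator(T, failed_lines):
    """0-1 vector over paths: 1 if a path crosses any failed line.

    Computes alpha(n) - prod_{l in failed} (alpha(n) - T[l]) with element-wise
    products, i.e. the logical OR of the failed lines' row vectors.
    """
    alpha = np.ones(T.shape[1], dtype=int)
    survive = alpha.copy()
    for l in failed_lines:
        survive = survive * (alpha - T[l])  # element-wise (.*) product
    return alpha - survive

def final_impact(T_active, T_standby, failed_lines):
    """Communication is lost only if both the active and the standby path
    are blocked: the element-wise product (intersection) of both indicators."""
    return (blocked_indicator(T_active, failed_lines)
            * blocked_indicator(T_standby, failed_lines))

# Toy network: rows = 4 branches, columns = 3 substations' paths.
T_active = np.array([[1, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 0]])
T_standby = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 1]])
print(final_impact(T_active, T_standby, [1, 2]))   # [0 1 0]
```

Here lines 1 and 2 block the active paths of substations 1 and 2, but only substation 1 also loses its standby path, so only it loses communication.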

Optimisation model for the standby communication routings
After substituting (13) and (15) into (1), the optimisation model for the standby communication routings can be reformulated as (16), in which most of the variables are as defined in (1), (13) and (14). In (16), A is the node-branch incidence matrix of the communication network $G(N, b)$, whose element $a(j, k)$ in the $j$th row and $k$th column is determined by whether node $N_j$ is the start node $b_k.\mathrm{start}$ or the end node $b_k.\mathrm{end}$ of directed branch $b_k$ ($b_k \in b$). According to this definition, an element can be non-zero only when vertices $N_i$ and $N_j$ are directly connected by communication lines. In (16), the objective is to minimise the expected impact of the possible communication risks; the optimisation variables are the matrices $\tilde{T}^{\uparrow}$ and $\tilde{T}^{\downarrow}$, corresponding to the standby up-link and down-link sets; and the constraints ensure the connectivity of all links. Communication system designers and engineers currently often use the genetic algorithm for discrete path planning problems because it can be designed to guarantee the connectivity of the generated paths [22–24]. Therefore, in this paper, we also use the genetic algorithm to solve this optimisation.
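The connectivity constraint can be illustrated with a standard network-flow check, offered as a hedged sketch rather than the exact constraints of (16). Assuming the sign convention +1 at $b_k.\mathrm{start}$ and −1 at $b_k.\mathrm{end}$, a 0-1 branch vector $t$ that forms a directed path from substation $s$ to the control centre satisfies $A t = e_s - e_{cc}$, where $e$ denotes a unit vector. The condition is necessary but not sufficient, because adding a disjoint directed cycle to $t$ leaves $A t$ unchanged, so in practice it would be combined with a path-based encoding such as one used in a genetic algorithm. The function names are hypothetical.

```python
import numpy as np

def node_branch_incidence(num_nodes, branches):
    """A[j, k] = +1 if node j is the start of branch k, -1 if it is the end
    (sign convention assumed for this sketch)."""
    A = np.zeros((num_nodes, len(branches)), dtype=int)
    for k, (start, end) in enumerate(branches):
        A[start, k] = 1
        A[end, k] = -1
    return A

def satisfies_flow_conservation(A, t, source, sink):
    """Check A @ t == e_source - e_sink for a 0-1 branch-usage vector t."""
    rhs = np.zeros(A.shape[0], dtype=int)
    rhs[source] += 1
    rhs[sink] -= 1
    return bool(np.array_equal(A @ t, rhs))

# Node 0 is the control centre; directed up-link branches (start, end).
A = node_branch_incidence(4, [(1, 0), (2, 1), (3, 2), (3, 0)])
print(satisfies_flow_conservation(A, np.array([1, 1, 1, 0]), source=3, sink=0))  # True
print(satisfies_flow_conservation(A, np.array([1, 0, 1, 0]), source=3, sink=0))  # False
```

The second vector skips branch 2→1, so the flow balance fails at nodes 1 and 2, which is how a disconnected candidate routing would be rejected.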

Case study: optimisation supporting PFCC application
In this section, we take a typical advanced EMS application, the PFCC application, as an example to illustrate the application and effectiveness of our optimisation approach.

Brief introduction of PFCC
Power flow corrective control (PFCC) is an energy management technique for power system dispatching and is widely applied in power systems [25,26]. During power system operation, fluctuations in loads and generation, as well as random device failures, may result in overloads on different power transmission lines. Such events correspond to the different operation scenarios in our case study. The primary goal of PFCC is to eliminate such overloads by regulating the load distribution and generator outputs. Except in extreme emergency scenarios, a PFCC application does not perform load shedding.
To illustrate the cyber-optimisation issue, a simplified PFCC model (19) is adopted, where the superscript 'line' denotes power transmission lines and the superscript 'g' denotes generators. Vector $P^{line,\Psi}$ represents the transmission power of the overloaded lines under operation scenario $\Psi$, and vector $P^{line,\Psi,ref}$ represents the transmission limits of those lines. In power system analysis, matrix B is a highly sparse susceptance matrix, i.e. the imaginary part of the node admittance matrix. Vector $d_{br}$ contains the branch susceptances, $\mathrm{diag}(d_{br})$ is the diagonal matrix with $d_{br}$ on its diagonal, and matrix A is the power network's node-branch incidence matrix. In this equation, $A_\Psi$ and $B_\Psi$ represent the power system's node-branch incidence and susceptance matrices, respectively, under operation scenario $\Psi$.
Although additional information, such as power ranges and uncertainty indices, may also affect the PFCC's decisions by adding extra constraints to optimisation (19), $P^{line,\Psi}$ and $P^{line,\Psi,ref}$ have the greatest effect on this closed-loop control. For this reason, we adopt the simplified model to better illustrate the contribution of our cyber-optimisation approach, i.e. allocating the most essential information to the most reliable channels. If we consider only a simple PFCC model without inequality constraints, optimisation (19) can be simplified into a linear mapping from $P^{line,\Psi}$ and $P^{line,\Psi,ref}$ to $\Delta P^g$, given by decision function (21) with coefficient matrix $C_\Psi$.
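The linear mapping can be illustrated under the standard DC power flow approximation, assumed here as one common linearisation: branch-flow sensitivities to nodal injections are $\mathrm{diag}(d_{br}) A^T B^{-1}$ with the slack bus removed, and without inequality constraints a minimum-norm generator adjustment that moves the overloaded flows onto their limits is given by a pseudo-inverse. The 3-bus network, the function names and the resulting matrix (which plays a role analogous to $C_\Psi$) are hypothetical and not the authors' definitions.

```python
import numpy as np

def dc_sensitivity(A, d, slack=0):
    """Branch-flow sensitivity to nodal injections under the DC approximation.

    A : (nodes x branches) node-branch incidence, +1 at start, -1 at end
    d : (branches,) branch susceptances d_br
    Returns diag(d) A_r^T B_r^{-1} with B = A diag(d) A^T, slack row/col removed.
    """
    B = A @ np.diag(d) @ A.T
    keep = [i for i in range(A.shape[0]) if i != slack]
    A_r = A[keep, :]
    B_r = B[np.ix_(keep, keep)]
    return np.diag(d) @ A_r.T @ np.linalg.inv(B_r)

def corrective_mapping(H, overloaded, gen_cols):
    """Hypothetical linear decision rule C: the minimum-norm adjustment
    Delta P^g = C @ (P_ref - P_line) that puts the overloaded flows on their limits."""
    S = H[np.ix_(overloaded, gen_cols)]
    return np.linalg.pinv(S)

# 3-bus toy system: bus 0 is the slack, the generator sits at bus 2.
A = np.array([[1, 0, 1], [-1, 1, 0], [0, -1, -1]])   # branches 0->1, 1->2, 0->2
d = np.array([10.0, 10.0, 10.0])
H = dc_sensitivity(A, d)                  # columns: buses 1 and 2
C = corrective_mapping(H, overloaded=[2], gen_cols=[1])
dP = C @ (np.array([1.0]) - np.array([1.2]))   # branch 0->2 carries 1.2, limit 1.0
print(np.round(dP, 6))   # [0.3]
```

Raising the bus-2 generator by 0.3 (with the slack bus compensating) cuts the flow on branch 0→2 by 0.2, since two-thirds of each unit injected at bus 2 is carried on that branch. If the command carrying $\Delta P^g$ cannot reach the generator, the overload persists, which is why such control signals receive high sensitivity in the routing optimisation.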

PFCC's vulnerabilities: cyber-physical break-line faults
As mentioned in Section 1, the structure of an EMS's communication network is often tightly coupled with that of its physical power grid. Therefore, a break-line fault in the power grid may be accompanied by a failure of the coupled communication line, which may block information transmission within the EMS. In this case study, such accidents are defined as cyber-physical break-line faults. A cyber-physical break-line fault can cause both power flow redistribution and communication outages: the former may cause power overloads, while the latter may seriously hinder the EMS's measurement and control actions, further aggravating operation risks, e.g. by making power overloads unobservable or uncontrollable. As an example, for the IEEE 14-bus system presented in Fig. 2, we consider the active communication routings and introduce a cyber-physical break-line fault on the power/communication line connecting substation 1 and substation 5 (i.e. bus 1 and bus 5). The graph-based assessment of this failure is briefly presented in Fig. 4.
In the physical power grid, this fault breaks the power transmission line between bus 1 and bus 5 (see the right subfigure in Fig. 4). In addition, because of the topological coupling between the power grid and the communication network, the communication between substation 1 and substation 5 is also blocked, which means that the control centre cannot issue control commands to bus 1's generator (see the left subfigure in Fig. 4). In this condition, if the output of bus 1's generator remains unchanged, most of the power flow on transmission line (S1-S5) will be transferred to transmission line (S1-S2), resulting in serious overloads that could trigger a cascading failure.
The above analysis illustrates the impact of a cyber-physical break-line fault on a PFCC's communication, as well as the need for a standby communication configuration. To ensure the reliable and safe operation of a power system, operators should dynamically optimise the standby communication routings according to the risk scenario, so that such negative cyber-physical impacts can be reduced by switching between the two routing schemes.

Cyber-physical sensitivity analysis for PFCC
Before building optimisation model (16), the cyber-physical sensitivity indices for $z$ and $y$ must first be obtained. For the PFCC application, $z$ refers to $P^{line,\Psi}$ and $y$ refers to the control command $\Delta P^g$. Using the method discussed in [6], we can generate the information-energy flow model for this cyber-physical power system based on decision function (21). The model for the IEEE 14-bus system is presented in Fig. 5.
For a power transmission line, we assume that a communication loss at either of its terminal substations could affect its observability [27]. On this basis, the cyber-physical sensitivity indices between the information values $P^{line,\Psi}$ and $\Delta P^g$ and line overload can be roughly estimated according to the method discussed in [6]. In these estimates, $b_{e,overload}$ denotes the set of overloaded power transmission lines, $b_e^i$ denotes the $i$th power transmission line, and $C_\Psi^j$ is the row vector of $C_\Psi$ corresponding to the $j$th generator (see (21) for the definition of $C_\Psi$).
Substituting these sensitivity indices into optimisation model (16) and solving the problem with the genetic algorithm, we can optimise the standby communication routings for power systems of different scales under different load conditions and risk scenarios. In the following subsections, we present three scenarios and analyse their optimised routings to illustrate the effectiveness and superiority of our approach. In this case, the physical system is the IEEE 14-bus system with a PFCC application, whose structure and coupled communication network are shown in Fig. 2. The probability of a power line outage is set to 0.02%, and the conditional probability of a cyber-physical break-line fault (given a power line outage) is set to 70%. The maximum number of simultaneously broken power lines is set to 3.
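These settings can be turned into an outage set $\aleph$ with probabilities $R$. A minimal sketch, assuming independent branch failures and a per-line cyber-physical break-line probability of 0.0002 × 0.7 (power line outage probability times the conditional probability), enumerates all combinations of up to three lines; the IEEE 14-bus system has 20 branches. The independence assumption and the function name `enumerate_outages` are illustrative, not taken from the case study.

```python
from itertools import combinations
from math import prod

def enumerate_outages(p_line, max_broken=3):
    """Enumerate cyber-physical outage sets chi_k with probabilities r_k.

    p_line : list of per-line cyber-physical break-line probabilities
    Assumes independent failures; sets larger than max_broken are dropped.
    """
    m = len(p_line)
    outages = []
    for size in range(1, max_broken + 1):
        for chi in combinations(range(m), size):
            failed = set(chi)
            r = prod(p_line[i] if i in failed else 1 - p_line[i] for i in range(m))
            outages.append((chi, r))
    return outages

# IEEE 14-bus system: 20 branches, each with a 0.02% outage probability and a
# 70% conditional probability of a coupled communication failure.
p = [0.0002 * 0.7] * 20
scenarios = enumerate_outages(p)
print(len(scenarios))   # 20 + 190 + 1140 = 1350
```

For the tornado scenarios considered later, the entries of `p` for the affected lines would simply be raised to 0.3 before enumeration.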

Standby communication routings' optimisation for
In the IEEE 14-bus system, generation and load are concentrated in the upper half of the network. Our case study considers a load fluctuation scenario in the bottom half, in which the load of each bus is tripled within a control period (see the red dotted box in Fig. 6). In this scenario, power transmission lines (S1-S2) and (S1-S5) are the most vulnerable to overloading (see the left sub-graph in Fig. 6), so the control centre needs to preferentially regulate the generation outputs of substations S1, S4 and S5 based on the measurements from substations S1, S2 and S5. Accordingly, these substations should be the pivotal substations for this operation scenario. After optimisation of the standby communication routings, the active and standby links of the three pivotal substations to the control centre are compared in the right sub-graph in Fig. 6.
In the right sub-graph, the yellow circles represent the pivotal substations, the black lines represent active routings, and the green/orange dotted lines represent standby up-links/down-links. Symbols * and # denote uploaded measurements and issued control signals, respectively. Analysis of the optimised routings shows that the standby links of all three pivotal substations do not overlap with their active links and also avoid the communication lines coupled with the overloaded power lines (i.e. branches (S1-S2) and (S1-S5)). This configuration maximises the observation and control capability of the control centre, thus helping to avoid unobservable and uncontrollable overload accidents. In addition, our optimised communication scheme reduces the routing hops and separates the up-links and down-links, thus reducing the outage probability and balancing the communication load.

Consider extreme weather conditions:
Power systems may face various operation risks, such as extreme weather. Therefore, system operators need to forecast possible risks and apply the corresponding communication configuration schemes in a targeted manner. In this case study, we compare two standby routing schemes optimised under two different risk scenarios.
The power system is again the IEEE 14-bus system, with the same parameters and outage probabilities as in the previous case. Consider the following two extreme weather scenarios: i. a tornado hits the location of substation 2, and the break-line probabilities of the related power lines increase to 30%; ii. a tornado hits the bottom half of the network, and the break-line probabilities of power lines (S9-S5), (S6-S10) and (S7-S8) increase to 30%.
3.4.3 Consider regional failure of the power system:
Power systems may also face regional failures caused by extreme weather. Such failures not only affect the system's energy flow distribution but also change the topology and connectivity of both the power and communication networks, which may even block other substations' communication with the control centre. In this case, we use the IEEE 39-bus power system to illustrate the effectiveness of our optimisation mechanism against such failures. The power network and its corresponding EMS communication network are presented in Fig. 8. As the graph shows, substations S1, S11 and S17 are directly connected to the control centre (see the stars in three colours). The active communication routings generated by the shortest path algorithm are presented in Fig. 8b, in which the green, blue and yellow circles indicate substations linked to the control centre via S1, S11 and S17, respectively, and the coloured arrows/lines illustrate the general communication routings from the substations to the control centre.
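The active routings here are generated by a shortest path algorithm. When every communication line counts as one hop, a breadth-first search from the control centre gives a shortest-path tree; the sketch below assumes a hypothetical adjacency-list input with the control centre as node 0 and returns each substation's hop count and its next hop towards the centre. With weighted lines (e.g. by length or delay), Dijkstra's algorithm would be used instead.

```python
from collections import deque

def shortest_path_tree(adj, root=0):
    """BFS from the control centre over an undirected communication graph.

    adj : dict node -> list of neighbouring nodes
    Returns (hops, parent): hop count to root and next hop towards root.
    """
    hops = {root: 0}
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:          # first visit = fewest hops
                hops[v] = hops[u] + 1
                parent[v] = u
                queue.append(v)
    return hops, parent

def route_to_root(parent, node):
    """Follow parent pointers to list the up-link path node -> ... -> root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
hops, parent = shortest_path_tree(adj)
print(hops[4], route_to_root(parent, 4))   # 3 [4, 3, 1, 0]
```

Such hop-minimal routes ignore where the vulnerable substations and lines are, which is why the standby routes discussed next may deliberately be longer.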
Consider a scenario in which substations S17 and S18 are highly likely to go out of service (the vulnerable substations and power lines are labelled by red stars). The optimised standby communication scheme is shown in Figs. 8c and d. The two sub-graphs show two separate groups of substations whose communication routings pass through S1 and S11, respectively; these groups are marked by green and blue circles/dotted lines, and the communication lines occupied by each group are labelled in the corresponding colour. Comparing the active (Fig. 8b) and standby (Figs. 8c and d) routings shows that each substation is labelled with different colours in the two schemes. In other words, every substation's active and standby routings reach the control centre via different substations among S1, S11 and S17, reflecting the mutual exclusivity of the two schemes.
In addition, in this scenario, power transmission lines (S14-S15), (S16-S24), (S23-S24) and (S21-S22) are the most vulnerable to overloading (see the bold red lines in Fig. 8a). Based on the information-energy flow analysis theory discussed in [1], we estimate that substations S16, S19-S24 and S33-S35 are the most important for PFCC decisions against the assumed risk; these are marked with purple round frames in Figs. 8b-d. In the shortest-path-based primary communication scheme, these pivotal substations communicate with the control centre via the vulnerable substation S17 and communication lines (S16-S17) and (S16-S21), which may easily be affected by extreme weather. To avoid such negative effects, the optimised standby routings for these substations avoid the vulnerable substations and communication lines and link to the control centre via substation S11 instead. Indeed, Figs. 8c and d show that no substation communicates with the control centre via the high-risk substation S17. Admittedly, from the communication perspective, some substations' standby routings via S1 or S11, e.g. those of S3 and S14, are less efficient than routings via S17, but this sacrifice in communication time and routing hops is made deliberately to lower the operation risk of the EMS. Therefore, even though the arrows in Figs. 8c and d are markedly longer than those in Fig. 8b, our 'detoured' standby routing scheme avoids the vulnerable substations and communication lines, thus improving communication reliability.

Brief summary for simulation results
After analysing the optimised communication plans for the three proposed scenarios, we note that our routing configuration mechanism has the following four characteristics: i. it avoids overlapping with the active routings; ii. it avoids channels with high break-line risks; iii. it avoids channels coupled with heavily loaded power transmission lines; iv. it reduces routing hops and balances the communication load.
Notably, routing hops and communication balance are only secondary considerations in our optimisation, whereas they are often the primary considerations in communication configuration [15]. To compare the two optimisation mechanisms, we select the most vulnerable scenarios, i.e. scenario (1) in Subsection 3.4.2 and the scenario in Subsection 3.4.3, and perform the optimisation using both methods. The results are presented in Table 1. According to the results, shortest-routing optimisation achieves fewer routing hops. However, the configuration plan computed with our mechanism ensures the minimum expected number of overloaded lines and the minimum expected overload amount, and it can be dynamically adjusted according to cyber-physical risk scenarios. This comparison indicates that, even though our method may not be optimal from the perspective of communication networks, it can better support the safe and reliable operation of EMSs and power systems than traditional communication optimisation mechanisms.

Conclusion
To better support the safe operation of power systems, we have discussed an optimisation mechanism for the communication routings of current EMSs. Unlike traditional network planning and configuration methods, our approach aims to minimise the impact of communication risks on power system operation. Following the current 'active + standby' configuration convention, we have proposed a dynamic optimisation model for EMSs' standby communication routings that adapts to changing operation environments and risk scenarios and ensures that the most important information is transmitted over the most reliable channels.
The optimisation model is determined by two main factors: information significance, described by the cyber-physical sensitivity index, and communication reliability, represented by the severity of a communication break-line risk's impact on the EMS's communication. For the latter, we introduce a path-branch incidence matrix to describe the routing scheme, in which each row vector corresponds to the information blocked by the interruption of one communication line. With this definition, the mechanism by which a communication outage affects information transmission can be expressed quantitatively using vector multiplications.
To illustrate the concepts and methods, we have applied the proposed mechanism to a typical closed-loop control application, PFCC, under three different scenarios, and the results verify the effectiveness of our approach. All three case studies indicate that the optimised standby routings do not overlap with the active routings and avoid channels with high break-line risks or channels coupled with heavily loaded power transmission lines. Even though the optimised scheme may not be optimal from the perspective of communication networks, it can better support the safe and reliable operation of power systems. Because future software-defined networking (SDN)-based EMSs will allow network administrators to control communication network behaviour dynamically and programmatically via open interfaces, such optimisation mechanisms have great application potential.

Acknowledgment
This work was supported by the National Key R&D Program of China under grant no. 2017YFB0903000.