DETERMINING FAILURE PROBABILITIES WITH QUALITATIVE CHARACTERISTICS FOR RISK ESTIMATION IN THE SGAM

In this work, we address the subject of Security-by-Design in Smart Grids. Our approach is based on designing technologies with the Smart Grid Architecture Model and on existing risk estimation formulas. Within this approach, we examine characteristics that help to estimate failure probabilities. Afterwards, we discuss which way of combining these characteristics and failure probabilities is best suited for risk estimation and how it can be improved.


INTRODUCTION
Renewable energies are an advancing technology and help to reduce CO2 emissions. Smart grid technologies were developed to modernize and improve traditional electrical grids, turning them into smart grids. This makes it a strongly evolving field. Furthermore, the energy supply is a critical infrastructure in everyday life. Taking evolution and criticality together, it is desirable that new Smart Grid technologies are secure, and therefore security-by-design is needed.
In this paper, we analyze an approach for risk estimation at design time. To this end, we take the well-established Smart Grid Architecture Model (SGAM) to design and model Smart Grid technologies. We then analyze how to integrate the security viewpoint into this model; our approach builds on the analysis of security levels for the SGAM cells.

Methodical Design Approach
With regard to security-by-design, we first introduce the corresponding design methodology. The idea is a complete toolchain from the description of a use case up to a first rating of the described technology. This toolchain is depicted in Figure 1. After the idea and conception of a new technology, the toolchain starts with a description of the corresponding Use Case. This description can be made with the Use Case template provided by the Publicly Available Specification IEC PAS 62559 [1]. The Use Case description can be imported into a Use Case Management Repository (UCMR). From the information within the Use Case description, the architectural design of the technology can be described within the Smart Grid Architecture Model, which is explained in the next section. Given the architectural description, the values of Key Performance Indicators (KPI) can be estimated. This allows a first rating of the technology and a comparison with other technologies. The overall rating of technologies described with the Smart Grid Architecture Model is discussed further in [2].

Figure 1: Toolchain from Use Case description to rating key performance indicators
In this work, we focus on the aspect of security within the rating that is integrated in this toolchain. The goal is to find KPIs and a corresponding metric to derive a risk estimate for technologies.

Smart Grid Architecture Model
The Smart Grid Architecture Model (SGAM) was developed in the context of the European Commission's standardization mandate M/490 to CEN, CENELEC and ETSI and is introduced in [3]. It provides a holistic viewpoint of an overall Smart Grid infrastructure and can be used to design and model Smart Grid technologies. As depicted in Figure 2, the SGAM plane consists of five domains and six zones. The domains refer to the supply chain in the energy sector, and the zones reflect the hierarchical zones of energy management. This SGAM plane forms the basic structure for five interoperability layers, which were adopted from the GridWise Architecture Council (GWAC) stack. Because security affects more than one interoperability layer, it is addressed as a cross-cutting issue within the SGAM.
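The SGAM plane described above can be sketched as a simple data structure, where each domain/zone pair forms one cell. The following Python snippet is only an illustration. The domain, zone and layer names are taken from the SGAM definition in [3] and are not listed in this text. The `Cell` type and `PLANE` list are hypothetical names introduced here.

```python
from enum import Enum
from typing import List, NamedTuple


class Domain(Enum):
    """The five SGAM domains (supply chain in the energy sector)."""
    GENERATION = "Generation"
    TRANSMISSION = "Transmission"
    DISTRIBUTION = "Distribution"
    DER = "DER"
    CUSTOMER_PREMISES = "Customer Premises"


class Zone(Enum):
    """The six SGAM zones (hierarchical zones of energy management)."""
    PROCESS = "Process"
    FIELD = "Field"
    STATION = "Station"
    OPERATION = "Operation"
    ENTERPRISE = "Enterprise"
    MARKET = "Market"


class Layer(Enum):
    """The five interoperability layers stacked on top of the plane."""
    BUSINESS = "Business"
    FUNCTION = "Function"
    INFORMATION = "Information"
    COMMUNICATION = "Communication"
    COMPONENT = "Component"


class Cell(NamedTuple):
    """One domain/zone cell of the SGAM plane (hypothetical helper type)."""
    zone: Zone
    domain: Domain

    def __str__(self) -> str:
        # Zone-Domain naming, e.g. "Station-Distribution"
        return f"{self.zone.value}-{self.domain.value}"


# All 6 x 5 = 30 cells of the SGAM plane
PLANE: List[Cell] = [Cell(zone, domain) for zone in Zone for domain in Domain]
```

The string form follows the Zone-Domain naming used for cells such as Station-Distribution.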

Smart Grid Information Security
With the Smart Grid Information Security (SGIS) report [4], the Smart Grid Coordination Group (SG-CG) provides high-level guidance on how standards can be used to develop Smart Grid information security.
In this light, it presents concepts and tools to help stakeholders integrate information security into their daily business. The report incorporates the SGAM by recommending a Security Level (SL) for each SGAM domain/zone cell. Each recommendation is based on the kind of equipment used in that cell to manage power and on its maximum potential power loss in a global Pan-European Electrical Grid stability scenario for a given location. These SL recommendations are depicted in Figure 3 and range from 1 to 5, where 1 is the lowest and 5 the highest SL.

Figure 3: SGIS Security Level recommendations
The SGIS therefore already provides a guide to the importance of security in relation to where a system under design is located in the SGAM. This leads to the idea of integrating these Security Levels into risk estimation within the SGAM at design time.

RELATED WORK
The idea of including the SGIS SL in the estimation of a risk is already addressed in [5]. Following the principle risk = probability × impact, [5] proposes calculating the risk with a formula [3, Eq. 8] that already uses the SGIS Security Level, referred to as SL, to include the impact of an attack. The variable DOE is an indicator for direct operational effects and can take the value 1 or 0. The probability of an event is included via Attack Probability Indicators (API). [5] illustrates these with three example categories: hacker's motivation, asset reachability and propagation of secret.
Building upon this, [6] analyzes the mathematical interpretation of the given formula and adapts it so that the resulting risk meets the following two requirements:
• The resulting risk has an absolute reference value (independent of the number of API included).
• The formula can be applied to qualitative values for the probability and consequence of an event.
Besides these requirements, the new formula includes the direct operational effects slightly differently in order to reduce the spread of the resulting risk. Furthermore, the new formula makes it possible to include a weighting factor w_i for each API when calculating the risk. [6] demonstrates this formula by comparing two Smart Grid technologies that pursue the same goal. For the comparison, the example API categories from [5] were used. However, [5] already notes, and [6] addresses again, that these API do not cover the whole attack probability and may influence each other (a high asset reachability may increase the hacker's motivation). We therefore now try to give a more detailed grouping and categorization of Attack Probability Indicators.

APPROACH
In the next step, we check whether the total probability of a risk can be identified by considering different Attack Probability Indicators and their qualitative characteristics. The total risk would then be built as a composition of the single probabilities. In doing so, we note that the risk of a system failing does not only stem from attacks on the system, but also from failures of the system due to other reasons. Therefore, from now on we speak of failure probabilities and failure vectors instead of attack probabilities and attack vectors. The goal is to determine different failure vectors that together cover all risks and for each of which a probability of occurrence can be qualitatively determined. In [5], the example of hacker's motivation, asset reachability and propagation of secret was given. Regarding the vectors asset reachability and hacker's motivation, it can be argued that the asset reachability directly influences the hacker's motivation. The given vectors therefore cannot be regarded separately and, consequently, do not fit risk estimation in the proposed way.
Paper 0334 CIRED 2017 3/4
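A small numerical sketch illustrates why dependent vectors cannot simply be composed as if they were separate. If asset reachability raises the hacker's motivation, the joint probability of both differs from the product of the single probabilities. The function name and all probability values below are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def joint_probabilities(p_a: float, p_b_given_a: float,
                        p_b_given_not_a: float) -> Tuple[float, float]:
    """Return (joint probability assuming independence, actual joint probability).

    A = asset is reachable, B = hacker is motivated (hypothetical events).
    """
    for p in (p_a, p_b_given_a, p_b_given_not_a):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Marginal probability of B via the law of total probability
    p_b = p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a
    # Treating A and B as independent simply multiplies the marginals ...
    independent = p_a * p_b
    # ... whereas the true joint probability uses the conditional probability
    actual = p_a * p_b_given_a
    return independent, actual


# Hypothetical values: reachability makes motivation four times as likely
independent, actual = joint_probabilities(0.5, 0.8, 0.2)
print(f"assumed independent: {independent:.2f}, actual: {actual:.2f}")
```

With these numbers, the independence assumption yields 0.25 instead of 0.40. If the conditional probabilities were equal, both values would coincide, which is the only case in which the vectors could be treated separately.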
Another approach for the failure vectors can be derived from [7]. For failure vectors where a quantitative probability of failure cannot be determined, the probability can be expressed by qualitative values. NIST [7] works with the following classification, which can also be used for our approach:
• Very high probability (Level 10): probability between 96 and 100 %
• High probability (Level 8): probability between 80 and 95 %
• Moderate probability (Level 5): probability between 21 and 79 %
• Low probability (Level 2): probability between 5 and 20 %
• Very low probability (Level 0): probability between 0 and 4 %
In general, [7] identifies four different types of threat sources: adversarial, accidental, structural and environmental. These types are further specified. For example, the threat source adversarial is divided into the following subgroups:
• …
  o Partner
  o Customer
• Nation-State
When determining qualitative probabilities of failure, it is advisable to regard all given subgroups of adversarial threat sources separately, because the probabilities might differ strongly between these subgroups. On the other hand, [7] names the following subgroups for the threat source environmental:
• Natural or man-made disaster
  o Fire
  o Flood/Tsunami
  o Windstorm/Tornado
  o Hurricane
  o Earthquake
  o Bombing
  o Overrun
• Unusual Natural Event (e.g., sunspots)
• Infrastructure Failure/Outage
  o Telecommunications
  o Electrical Power
Within the subgroup Natural or man-made disaster, the failure probabilities may differ due to the physical location of a system. In total, however, the probability of any natural or man-made disaster occurring is still very low, so this group can be summarized and treated as one failure vector. We can therefore conclude that the classification of threat sources from [7] delivers a list of failure vectors that covers different aspects and provides lists of sub-points for further specification. The granularity of the sub-points is quite high, and some sub-points may be regarded together because their qualitative probability classification is the same.
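The semi-quantitative scale quoted above can be turned into a lookup that maps an estimated probability (in percent) to its level. The following is a minimal Python sketch. The names `ProbabilityLevel`, `SCALE` and `classify` are hypothetical. The listed ranges are integer ranges, so assigning values that fall between them (e.g. 95.5 %) to the lower level is our own assumption.

```python
from typing import NamedTuple


class ProbabilityLevel(NamedTuple):
    label: str
    level: int             # semi-quantitative level from the scale
    lower_percent: float   # inclusive lower bound of the range


# Scale as listed in the text (NIST [7]), ordered from highest to lowest
SCALE = (
    ProbabilityLevel("Very high", 10, 96.0),
    ProbabilityLevel("High", 8, 80.0),
    ProbabilityLevel("Moderate", 5, 21.0),
    ProbabilityLevel("Low", 2, 5.0),
    ProbabilityLevel("Very low", 0, 0.0),
)


def classify(probability_percent: float) -> ProbabilityLevel:
    """Return the first (highest) range whose lower bound is reached."""
    if not 0.0 <= probability_percent <= 100.0:
        raise ValueError("probability must be between 0 and 100 %")
    for entry in SCALE:
        if probability_percent >= entry.lower_percent:
            return entry
    raise AssertionError("unreachable: the scale starts at 0 %")


print(classify(50).label, classify(50).level)
```

A natural disaster with a probability of, say, 2 % would be mapped to level 0, as in the example discussed later.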

DISCUSSION
The given formula from [6] takes the Security Level (SL) of a system, given by the SGAM cell in which it is located, and multiplies it by a composition of Attack Probability Indicators (API). This composition is not the geometric or arithmetic mean. Instead, all characteristics are squared, summed up, and the square root is taken again. This kind of composition puts weight on strong deviations of single API, so the total risk grows strongly when one indicator has a high probability. This is desired: with respect to risk, it is better when all probabilities of failure are uniformly low than when all probabilities are slightly lower except for one high probability of failure with the same impact as the other failure vectors (expressed by the SL). Nevertheless, there is one problem with the proposed way of risk estimation. The formula first determines all failure probabilities for one SGAM cell (with the corresponding security level) and composes them into a total failure probability for that cell. Only afterwards does it multiply this total by the security level. As a consequence, included failure vectors may be counted with a low security level although their occurrence may have a higher impact, or they may be counted twice when multiple security levels are regarded.
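The composition described above can be sketched as follows. This is not the exact formula from [6]. Dividing by the sum of the weights (a weighted root mean square) is our assumption for keeping the result independent of the number of API, and the direct operational effects indicator is left out. The API values are assumed to lie on the NIST levels 0 to 10 listed earlier. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def compose_api(levels: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted root mean square of API levels (assumed normalization)."""
    if not levels:
        raise ValueError("at least one API level is required")
    if weights is None:
        weights = [1.0] * len(levels)
    if len(weights) != len(levels):
        raise ValueError("levels and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Square each level, weight it, sum up, normalize, and take the root again
    weighted_squares = sum(w * a * a for w, a in zip(weights, levels))
    return math.sqrt(weighted_squares / total_weight)


def cell_risk(security_level: int, levels: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Risk of one SGAM cell: SL times the composed API (DOE omitted)."""
    if not 1 <= security_level <= 5:
        raise ValueError("SGIS security levels range from 1 to 5")
    return security_level * compose_api(levels, weights)


# Two hypothetical API profiles on the 0-10 scale
uniform = [5, 5, 5]
one_high = [2, 2, 10]
for name, profile in (("uniform", uniform), ("one high", one_high)):
    arithmetic = sum(profile) / len(profile)
    print(f"{name}: arithmetic mean {arithmetic:.2f}, "
          f"composed {compose_api(profile):.2f}")
```

For these two profiles, the arithmetic mean rates the profile with one high indicator lower (about 4.67 versus 5). The root-mean-square composition rates it higher (6 versus 5), which is the behavior argued for above.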

Example
We explain this effect with an example. Assume we have a system located in the SGAM cell Station-Distribution, for example an intelligent voltage transformer. The corresponding security level from the SGIS is 2, and the indicator for direct operational effects is 1. We then regard the failure vector natural disaster (Flood/Tsunami, Windstorm/Tornado, Hurricane and Earthquake). The probability of such a natural disaster occurring is very low, which corresponds to the semi-quantitative probability level 0. However, a natural disaster does not only cause a failure of the voltage transformer in the distribution grid; it affects the whole electrical power infrastructure. A natural disaster therefore also has an impact on the SGAM cell Station-Transmission, where the security level is 4. Suppose we calculate the risk for the voltage transformer, including natural disaster, and then calculate the risk for the transmission grid as well. The probability of a natural disaster is then included twice, which is mathematically not correct. We could avoid this problem by including the probability of natural disaster only in the cell with the higher risk. If we then calculate the risk for a system that only applies to one SGAM cell, such as the transformer in Station-Distribution, we omit the failure vector natural disaster. Therefore, this option is incomplete.
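The bookkeeping problem in this example can be made explicit with a short sketch. Only the two security levels and the shared failure vector natural disaster come from the example. The cell-specific vector names and the helper functions `count_inclusions` and `keep_in_highest_sl` are hypothetical.

```python
from typing import Dict, List

# Security levels of the two cells in the example (SGIS, Figure 3)
SECURITY_LEVEL: Dict[str, int] = {
    "Station-Distribution": 2,
    "Station-Transmission": 4,
}

# Failure vectors per cell; "natural disaster" affects both cells,
# the other two names are hypothetical placeholders
VECTORS: Dict[str, List[str]] = {
    "Station-Distribution": ["natural disaster", "transformer fault"],
    "Station-Transmission": ["natural disaster", "substation intrusion"],
}


def count_inclusions(vectors: Dict[str, List[str]],
                     cells: List[str]) -> Dict[str, int]:
    """Count how often each failure vector enters the estimate for the given cells."""
    counts: Dict[str, int] = {}
    for cell in cells:
        for vector in vectors[cell]:
            counts[vector] = counts.get(vector, 0) + 1
    return counts


def keep_in_highest_sl(vectors: Dict[str, List[str]],
                       sl: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign every vector only to the affected cell with the highest security level."""
    affected: Dict[str, List[str]] = {}
    for cell, cell_vectors in vectors.items():
        for vector in cell_vectors:
            affected.setdefault(vector, []).append(cell)
    result: Dict[str, List[str]] = {cell: [] for cell in vectors}
    for vector, cells in affected.items():
        result[max(cells, key=lambda c: sl[c])].append(vector)
    return result


# Every cell keeps all its vectors: natural disaster is counted twice
print(count_inclusions(VECTORS, list(VECTORS)))
# Natural disaster now only belongs to Station-Transmission, so an
# estimate for the distribution cell alone no longer contains it
print(keep_in_highest_sl(VECTORS, SECURITY_LEVEL)["Station-Distribution"])
```

The first call shows the double counting. The second call shows how avoiding it removes the shared vector from the estimate for the distribution cell.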

Conclusion
Taken together, the given formula allows two ways of handling failure vectors:
1. Include each failure vector in every SGAM cell.
2. Identify the directly affecting failure vectors for every SGAM cell or security level.
The first option leads to an overassessment of failure vectors that affect multiple systems simultaneously, as in the given example with natural disasters. The second option leads to an underassessment of failure vectors whose larger impact is expressed in another SGAM cell or security level than the one in question, because such vectors are left out.

RESULTS
Following the discussion, we conclude that the given formula is not satisfactory. This holds for both formulas introduced in [5] and [6], because they follow the same principle and may lead to an over- or underassessment of failure vectors. The reason for this is also addressed in [8]. One of the main findings there is the unbalanced risk distribution in smart grids, ranging from higher-level smart grid components of utility providers to bottom-level components such as concentrators or smart meters. To avoid over- or underassessment, a holistic approach is needed. In this work, we could not present an approach that works. We could, however, show why the approach above does not work, and we can derive the following requirements for a holistic approach:
• Some threat sources do not only affect one system, but a whole infrastructure, and cannot be avoided by the design of a system.
• Even if two different threat sources affect the same system and only that system, the impact of the threats may differ.
An overall risk estimation for a system should therefore work on risks, rather than treating multiple impacts or probabilities in isolation before bringing them together. Still, we have some basic elements that can be used for risk estimation:
• The total risk is a composition of risk vectors, for which [7] delivers a grouping and overview of threat sources.
• To identify the impact of a failure vector, the SGIS [4] delivers a qualitative value that can be taken into account.
• An approach based on single SGAM cells will not work. Nevertheless, the SGAM itself provides a holistic viewpoint of an overall Smart Grid infrastructure and should be suitable for holistic risk estimation.

Figure 2: Smart Grid Architecture Model (SGAM)