On the Advanced Methodology of Risk-Based System Resilience Analysis

The modern evolution of technological systems is considered: from hierarchical branching structures designed for the centralized transfer and distribution of limited resources to multi-agent, interconnected, self-organized networks that produce, transport, and consume resources. A model of multi-agent interconnected self-organized adaptive networking systems is proposed, the network topology is considered, and a system functioning model including transient processes is analyzed. A substantial limitation of the traditional reliability paradigm for this novel type of system is demonstrated. It is assumed that optimization approaches in the context of “big data” utilization create a quasi-infinite space of non-structured decisions, which can be characterized as “big decisions”. A modified approach based on the “equally defended networked system” paradigm and a corresponding quantitative risk measure is proposed.
Keywords: System reliability, System structure, Topology, Big data, Big decisions, Risk, Equally-defended systems.


Introduction
The analysis of system resilience, that is, of the risks generated by system failures, is an important part of applied mathematics and engineering. For many decades this issue has been investigated from many sides: optimal decision making, risk and uncertainty analysis, system design and architecture optimization, advancement of management and control methods, etc., using a wide range of research methods (Singh and Billinton, 1977; Ram and Singh, 2009; Kadry and El Hami, 2015; Kumar et al., 2017).
Traditionally, reliability control is considered within a well-known paradigm based on the difference between the expected and the real system performance. Usually, the real and targeted infrastructure functions at time $t$ are denoted $F(t)$ and $F_0(t)$, and the resilience $R$ is expressed through the relation between these two functions (Natvig, 2010; Zio, 2013). Quantitative metrics may also include many linear and non-linear parameters, such as resilience capabilities, time to recover, and others, for example (Høyland and Rausand, 2009; Kadry and El Hami, 2015):

$$R = S_p \, \frac{F_r}{F_o} \, \frac{F_d}{F_o}, \qquad S_p = \begin{cases} \dfrac{t_\delta}{t_r^*} \exp\left[-\alpha \left(t_r - t_r^*\right)\right], & t_r \ge t_r^*, \\[2ex] \dfrac{t_\delta}{t_r^*}, & \text{otherwise}, \end{cases}$$

where $S_p$ is the speed recovery factor, $F_o$ is the original system performance level, $F_d$ is the performance level immediately after the disruption, $F_r$ is the performance at a new stable level after recovery efforts have been exhausted, $t_\delta$ is the slack time, i.e. the maximum amount of time post-disaster that is acceptable before recovery ensues, $t_r$ is the time to final recovery, i.e. the time to reach a new equilibrium state, $t_r^*$ is the time to complete the initial recovery actions, and $\alpha$ is the parameter controlling the "decay" in resilience until the new equilibrium is met (according to Hosseini et al., 2016).
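The role of the speed recovery factor can be illustrated numerically. The following is a minimal sketch, assuming the metric has the multiplicative form reported by Hosseini et al. (2016): resilience equals the speed recovery factor times the ratios of recovered and disrupted performance to the original performance, and the speed recovery factor equals slack time over initial-recovery time, damped exponentially when final recovery takes longer than the initial recovery actions. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def speed_recovery_factor(t_slack, t_initial, t_final, alpha):
    """Assumed form of S_p: slack time over initial-recovery time,
    with exponential decay when final recovery is later than initial recovery."""
    base = t_slack / t_initial
    if t_final >= t_initial:
        return base * math.exp(-alpha * (t_final - t_initial))
    return base


def resilience_factor(f_original, f_disrupted, f_recovered,
                      t_slack, t_initial, t_final, alpha):
    """Assumed form of R: S_p * (F_r / F_o) * (F_d / F_o)."""
    sp = speed_recovery_factor(t_slack, t_initial, t_final, alpha)
    return sp * (f_recovered / f_original) * (f_disrupted / f_original)


# Hypothetical scenario: performance drops from 100 to 40 and recovers to 90.
r = resilience_factor(100.0, 40.0, 90.0,
                      t_slack=2.0, t_initial=4.0, t_final=6.0, alpha=0.1)
print(f"resilience factor: {r:.4f}")
```

In this form a longer final recovery time or a deeper performance drop both reduce the resulting value, which is the behaviour the traditional paradigm is meant to capture.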
Usually, this paradigm applies to a wide class of engineering systems designed for the transmission and distribution of centralized resources across a hierarchical network of consumers. In this case resilience analysis can usually be reduced to an optimization task: minimizing the distances between nodes and maximizing the gain with respect to the link failure probabilities (for example, as a sum over the nodes and links of the network).
In these terms, the concept of resilience can be realized in different formalisms according to the researcher's priorities.
However, this paradigm was developed primarily for engineering systems, i.e. systems with a specific structure and topology: hierarchical or branching structures with a one-directional distribution of a limited resource.
Hierarchical branching structures can be described by the well-known replicator equation system, which governs the dynamics of the nodes N and links L (Meisel and Gross, 2009; Peixoto, 2014). In equation (31) the parameter α controls the strength of the feedback from the environment to the nodes. Correspondingly, in such a representation the errors depend on the entropy of the block structure (Lee et al., 2011), in which $N_k$ is the total number of nodes with degree $k$ and $e_r = \sum_s e_{rs}$ is the number of half-edges incident on block $r$.
In such systems, the flows of data and resources and the corresponding errors are described as linear combinations of parameters (in particular, as presented by equations (2) and (3)).
The recent revolutionary development of technologies initiates new tendencies in the world, such as digitalization, globalization, and decentralization. The growth of these trends transforms not only human systems but also natural and engineering systems: all systems become multi-component, multi-physics, multi-scale, and managed in a distributed way. Every component (node) of such a distributed system may act as a consumer, producer, or transmitter of resources and data, depending on the current system topology; all agents are objects and subjects at the same time. In particular, an electrical network based on "green energy" generation, as opposed to traditional centralized power supply systems, is an example of such a changed system. Therefore, the structure and the distribution of functions in such open and expanded systems change, and so does the distribution of risks.
Developing an improved resilience paradigm, based on the new state of systems and the changed distribution of threats, may require a modified approach to maintaining equal security of networks. This paradigm may be based on the known principle of equal-strength design, which is actively used in the design of space technology (Tamaskar et al., 2014) and makes it possible to manage a system's complexity and reliability (Asikoglu and Simpson, 2012; Dolan and Lewis, 2008).
To identify a suitable reliability paradigm and measures to control system risks, it is necessary to analyze the changed structure, topology, and dynamics of the system.

Methodological Notes
To describe the system structure and dynamics, the analyzed systems are treated as adaptive networks in terms of the distributed estimation problem. The proposed approach is based on data-normalized diffusion and diffusion affine projection, which may be applied to systems with arbitrary distributions provided that the statistical moments of the data are known or can be forecast (Lopes and Sayed, 2008).

General System Model: Multi-Agent Inter-Connected Self-Organized Adaptive Networking Systems
In terms of distributed estimation over an adaptive network, the concept of diffusion adaptive networks (Lopes and Sayed, 2007; Lopes and Sayed, 2008) with a dynamic topology should be considered, in which the network nodes and the corresponding links may be subject to failure.
To describe the data-normalized diffusion adaptive network, the first task is to estimate an $M \times 1$ vector $w^0$ from measurements collected at $N$ network nodes, where each node $k = 1, \dots, N$ has access to time realizations $\{d_k(i), u_{k,i}\}$ of zero-mean random data $\{\boldsymbol{d}_k, \boldsymbol{u}_k\}$, with $d_k(i)$ a scalar measurement and $u_{k,i}$ a $1 \times M$ regression row vector.
Following Sayed (2003), the measurements are assumed to follow the linear model

$$d_k(i) = u_{k,i} \, w^0 + v_k(i),$$

where $v_k(i)$ is noise with variance $\sigma_{v,k}^2$, assumed spatially and temporally independent.
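To make the measurement model concrete, the sketch below generates synthetic data in which each node observes a scalar equal to the inner product of its regression row vector with the unknown vector, plus independent noise. It is an illustration only: the Gaussian regressors, the function name `simulate_measurements`, and the parameter values are assumptions, not part of the cited model.

```python
import random


def simulate_measurements(w_true, n_nodes, n_steps, noise_std, seed=0):
    """Return data[i][k] = (d, u) for time step i and node k,
    where d = u . w_true + v and v is zero-mean Gaussian noise.
    Regressors are drawn as unit-variance Gaussians (an assumption)."""
    rng = random.Random(seed)
    m = len(w_true)
    data = []
    for _ in range(n_steps):
        step = []
        for k in range(n_nodes):
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]   # regression row vector
            v = rng.gauss(0.0, noise_std[k])               # node-specific noise
            d = sum(uj * wj for uj, wj in zip(u, w_true)) + v
            step.append((d, u))
        data.append(step)
    return data


# Hypothetical setup: 3 nodes, 5 time steps, a 2-dimensional unknown vector.
data = simulate_measurements([1.0, -2.0], n_nodes=3, n_steps=5,
                             noise_std=[0.1, 0.2, 0.3])
print(len(data), len(data[0]))
```

Each node has its own noise level, which mirrors the node-dependent noise variance in the model.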
Here all vectors are assumed to be column vectors, except the regressors $u_{k,i}$, which are row vectors. Random quantities are denoted $\{\boldsymbol{d}_k, \boldsymbol{u}_k\}$ and their realizations $\{d_k(i), u_{k,i}\}$.
The adaptive network approach (Sayed, 2003; Lopes and Sayed, 2008) may be applied to estimate $w^0$. An adaptive network results from applying local adaptive/learning rules, or filters, at the nodes of the network. To exploit spatial and temporal information efficiently, the communication topology is used to realize a cooperation protocol between the nodes. Different cooperation protocols, based on different adaptive/learning rules or filters, generate different adaptive networks.
For an adaptive network operating under a diffusion protocol, each node $k$ is analyzed together with its neighborhood $\mathcal{N}_{k,i}$ at time $i$, defined as the set of nodes linked to $k$.
In the diffusion adaptive scheme every node $l$ in the neighborhood of $k$ holds an estimate $\psi_l^{(i-1)}$ of $w^0$. At each node $k$ an aggregate estimate $\phi_k^{(i-1)}$ is generated by linearly combining the neighbors' estimates, and the local estimate is then updated from $\phi_k^{(i-1)}$ to $\psi_k^{(i)}$ following Sayed (2003) and Lopes and Sayed (2008). The coefficients $\{c_{k,l}\}$ are the local combiners (with $\sum_l c_{k,l}(i) = 1$), and the update uses a matrix of local neighborhood regressions (Lopes and Sayed, 2007).
In (8) the aggregation producing $\phi_k^{(i-1)}$ may be any general (usually nonlinear) function of the neighboring estimates; the combiners can follow, for example, the Metropolis-Hastings, nearest-neighbor, Laplacian, or Bayesian rules.
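The combine-then-adapt structure of a diffusion scheme can be sketched as follows. This is a simplified illustration under stated assumptions: real-valued data, a linear combination with given combiner weights, and a plain LMS update with step size `mu` in place of the data-normalized update discussed in the text. All function names are hypothetical.

```python
def combine(estimates, neighbors, weights):
    """Aggregate step: phi_k = sum over l in neighborhood of k of c[k][l] * psi_l."""
    m = len(estimates[0])
    return [
        [sum(weights[k][l] * estimates[l][j] for l in nbrs) for j in range(m)]
        for k, nbrs in enumerate(neighbors)
    ]


def adapt(phi_k, u, d, mu):
    """Local LMS update (assumed form): psi_k = phi_k + mu * u^T * (d - u . phi_k)."""
    err = d - sum(uj * pj for uj, pj in zip(u, phi_k))
    return [pj + mu * uj * err for pj, uj in zip(phi_k, u)]


def diffusion_step(estimates, neighbors, weights, measurements, mu):
    """One network iteration: every node combines, then adapts with its own data."""
    phi = combine(estimates, neighbors, weights)
    return [adapt(phi[k], u, d, mu) for k, (d, u) in enumerate(measurements)]


# Two fully connected nodes with equal combiners and scalar estimates.
psi = diffusion_step(
    estimates=[[0.0], [2.0]],
    neighbors=[[0, 1], [0, 1]],          # each neighborhood includes the node itself
    weights=[[0.5, 0.5], [0.5, 0.5]],    # each row sums to one
    measurements=[(3.0, [1.0]), (1.0, [1.0])],
    mu=0.5,
)
print(psi)
```

Keeping the combination and the adaptation as separate functions makes it straightforward to swap in a different combiner rule without touching the local filter.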
The calculation (8)-(9) combines the effect of several adaptive filters with a dynamically changing network topology. To evaluate the system performance, a state-space formulation with global random quantities can be used, and a global state-space network model for (8)-(9) can be written following Sayed (2003) and Lopes and Sayed (2008).
Consider now a random network topology model on undirected graphs. The topology dynamics are modeled by treating links and nodes as random: at time $i$ the random link connecting nodes $k$ and $l$ takes its nominal value with probability $p_{k,l} = p_{l,k}$ and is zero otherwise (Sayed, 2003; Lopes and Sayed, 2008).
Nodes can be modeled similarly, each with its own probability of occurrence. For a nominal topology with a fixed number of nodes $N$ and $L$ links subject to failure, the existing and faulty links produce $2^L$ distinct sub-networks, each occurring with its own probability, so the sub-network probabilities are determined by the link probabilities $\{p_{k,l}\}$.
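A random topology of this kind can be sampled with independent link failures. The sketch below is a minimal illustration, assuming a single survival probability `p` shared by all links and uniform combiners over the surviving neighborhood; both choices, and the function names, are assumptions made for the example rather than the model of the cited works.

```python
import random


def sample_topology(nominal_links, n_nodes, p, rng):
    """Return neighborhoods after random link failures.
    Each nominal undirected link survives independently with probability p;
    every neighborhood includes the node itself."""
    neighbors = [{k} for k in range(n_nodes)]
    for k, l in nominal_links:
        if rng.random() < p:
            neighbors[k].add(l)
            neighbors[l].add(k)   # undirected graph: the link is shared
    return [sorted(s) for s in neighbors]


def uniform_combiners(neighbors):
    """Combiner matrix with equal weights over each surviving neighborhood,
    so every row sums to one."""
    n = len(neighbors)
    weights = [[0.0] * n for _ in range(n)]
    for k, nbrs in enumerate(neighbors):
        for l in nbrs:
            weights[k][l] = 1.0 / len(nbrs)
    return weights


# Hypothetical nominal topology: a ring of four nodes.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = random.Random(0)
nbrs = sample_topology(ring, 4, p=0.5, rng=rng)
print(nbrs)
print(uniform_combiners(nbrs))
```

With four failure-prone links, repeated sampling visits up to sixteen distinct sub-networks, which is the counting argument made in the text.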

System Functioning Model
To understand network resilience, transient processes should be analyzed.
Because the combiners sum to one, the global combination matrix $G$ satisfies $G w^{(0)} = w^{(0)}$, where $w^{(0)}$ stacks $N$ copies of $w^0$. Subtracting $w^{(0)}$ from the left side and $G w^{(0)}$ from the right side of (9) gives a recursion for the global weight-error vector. Under the assumptions that (i) the regressors are temporally and spatially independent and (ii) the collected data are uncorrelated with the network perturbations, the expectation of this recursion can be taken (Lopes and Sayed, 2008). The resulting form shows that the mean evolution of the global weight-error vector depends mainly on the data moment $R_u = E(U_i^* U_i)$, where $U_i$ collects the regressors of all nodes at time $i$, and on the mean topology matrix.
In this case, the global mean-square deviation (MSD) and excess mean-square error (EMSE) are obtained as in Lopes and Sayed (2008), with $R_u = E(U_i^* U_i)$.
At node $k$ the local output estimation error is defined first; the global error vector across the network, $e_i = \mathrm{col}\{e_1(i), \dots, e_N(i)\}$, is then formed from the local errors and rewritten in terms of the weight error from (15). From (28), an energy-balance procedure yields an expectation that relates successive weighted norms of the weight error. The recursive variance equation (28) describes the evolution of the weighted norms of $\tilde{\psi}$ in terms of the statistical moments of the data.
In equation (30) the weighting matrix $\Sigma$ is implicit. To find expressions for the mean-square deviation and the mean-square error, vectorization techniques are needed (Sayed, 2003; Lopes and Sayed, 2008). A particular solution can be obtained for evolving topologies with Gaussian regressors by transforming the quantities with $T$, where $R_u = T \Lambda T^*$, $T$ is unitary, and $\Lambda = \mathrm{diag}\{\Lambda_1, \dots, \Lambda_N\}$ with each $\Lambda_k > 0$ and diagonal.
This leads to a recursion that describes the mean-square network performance in terms of the parameter $\bar{\sigma} = \mathrm{vec}\{\bar{\Sigma}\}$. The global mean-square deviation is obtained by choosing $\bar{\sigma} = \mathrm{vec}\{I\}$ in equation (35), and the excess mean-square error by choosing $\bar{\sigma} = \mathrm{vec}\{\Lambda\}$.

Discussion: on the Evolution of Reliability Control in Changing Systems
Because the systems (4)-(5), together with their transient processes and link failure probabilities (19)-(36) and (6), are substantially different, the "traditional resilience paradigm" (1)-(3) has an essential limitation when applied to multi-agent adaptive networking systems (Bie et al., 2017).
Formally, we consider here a change of the "resilience paradigm" induced by a change of the system described: hierarchical or branching structures with a one-directional distribution of a limited resource versus multi-agent interconnected self-organized adaptive networking systems with reproduction and redistribution of resources between nodes.
The presented approach to the model allows several important conclusions to be drawn.
On the one hand, as the theory shows, in particular results (16)-(18) and (24)-(36), link failures, and thus limited exchange of resources and data between the nodes, do not significantly degrade the network performance (and hence the infrastructure functions) for a wide range of link probabilities $p$ (Lopes and Sayed, 2008; Losada et al., 2012). In other words, compared with traditional systems, diffusion protocols in multi-agent interconnected self-organized adaptive networking systems may need far fewer communication resources to achieve a pre-defined performance level (targeted function).
On the other hand, this result shows that the classical system stability theory (1)-(3) is not well suited to describing distributed adaptive multi-agent systems (Gross and Sayama, 2009). When the number of nodes is large, the number of solutions that can be considered "optimal" also grows without bound, and choosing a single one becomes impossible. Therefore, the presence of big data (i.e., when $N \to \infty$ and $L \to \infty$) inevitably leads to the presence of "big decisions" (Kostyuchenko et al., 2020).
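The effect of link failures on performance can be explored with a small simulation. The sketch below is an illustrative experiment, not a reproduction of the cited results: it assumes a ring of nodes, uniform combiners over surviving neighbors, Gaussian regressors, a plain LMS update, and hypothetical parameter values, and it reports the network mean-square deviation averaged over the last 100 iterations for several link probabilities.

```python
import random


def run_diffusion_lms(p, n_nodes=6, n_steps=600, mu=0.05, noise_std=0.1, seed=1):
    """Simulate diffusion LMS on a ring whose links survive with probability p.
    Returns the MSD per node, averaged over the final 100 iterations."""
    rng = random.Random(seed)
    w_true = [1.0, -0.5]                      # hypothetical unknown vector
    m = len(w_true)
    ring = [(k, (k + 1) % n_nodes) for k in range(n_nodes)]
    psi = [[0.0] * m for _ in range(n_nodes)]
    tail = []
    for i in range(n_steps):
        # Sample the surviving links for this iteration.
        nbrs = [[k] for k in range(n_nodes)]
        for k, l in ring:
            if rng.random() < p:
                nbrs[k].append(l)
                nbrs[l].append(k)
        # Combine: uniform average over the current neighborhood.
        phi = [[sum(psi[l][j] for l in nbrs[k]) / len(nbrs[k]) for j in range(m)]
               for k in range(n_nodes)]
        # Adapt: local LMS update with fresh data at every node.
        for k in range(n_nodes):
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]
            d = sum(a * b for a, b in zip(u, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(a * b for a, b in zip(u, phi[k]))
            psi[k] = [phi[k][j] + mu * u[j] * err for j in range(m)]
        if i >= n_steps - 100:
            sq = sum((psi[k][j] - w_true[j]) ** 2
                     for k in range(n_nodes) for j in range(m))
            tail.append(sq / n_nodes)
    return sum(tail) / len(tail)


msd_by_p = {p: run_diffusion_lms(p) for p in (1.0, 0.5, 0.2)}
for p, msd in msd_by_p.items():
    print(f"p = {p:.1f}  steady-state MSD ~ {msd:.2e}")
```

Comparing the printed values for different `p` gives a rough feel for how much cooperation can be lost before the steady-state error changes noticeably in this toy setting.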
In such circumstances, the concept of an "equally defended networked system" may be proposed. It is expressed through a managing parameter, which in some cases and for particular classes of tasks reduces to the traditional parameter controlling the "decay" in resilience and to the parameter α that controls the strength of the system feedback.
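One way to monitor how evenly a network is defended is to look at the spread of a managing parameter inside each cluster. The sketch below is purely illustrative: it assumes that a scalar value of the parameter is available for every node, and it uses the coefficient of variation as a hypothetical variability measure; neither the measure nor the function name comes from the source.

```python
from statistics import mean, pstdev


def cluster_variability(values_by_cluster):
    """Coefficient of variation of the managing parameter inside each cluster.
    A value near zero indicates an evenly defended cluster (assumed reading)."""
    result = {}
    for name, values in values_by_cluster.items():
        avg = mean(values)
        result[name] = pstdev(values) / abs(avg) if avg != 0 else float("inf")
    return result


# Hypothetical per-node parameter values grouped by cluster.
spread = cluster_variability({
    "cluster_a": [1.0, 1.0, 1.0],   # uniform defense
    "cluster_b": [1.0, 3.0],        # uneven defense
})
print(spread)
```

A cluster with a large spread would be the first candidate for rebalancing under an equal-defense principle.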
The variability of this value inside the network clusters can serve as the risk parameter, and its quantitative measure of the distributed defense of any separate network cluster can serve as a risk-based reliability indicator.
Modern tendencies include globalization, decentralization, and technological transformation. The development of these tendencies not only generates new kinds of nexus, nonlinear interdependencies, and dangers such as systemic or chain risks, but also limits the applicability of traditional approaches to risk analysis (Kostyuchenko, 2018) and to resilience and vulnerability assessment with respect not only to natural but also to environmental, social, and technological dangers (Ermoliev et al., 2012). The problems observed in the area of crisis control are linked with the transformation of the studied system (Oves et al., 2018).
The sustainable management of this new kind of transforming system should be based primarily on the utilization of big data, processed mainly by machine learning methods. It should be noted, however, that big data processed with formal algorithms creates a quasi-infinite space of non-structured decisions, which can be defined as "big decisions" (Kostyuchenko et al., 2020).
Therefore, a qualitatively new task can be formulated: choosing the optimal solution in this "big decisions" space, which can be addressed in the context of the new risk analysis and system resilience (Stock and Seliger, 2016). Given the modified nature of the systems and the changed threats, the assessment of system resilience should also be adapted to new types of data, risks, uncertainties, and models, as well as to the new structure of the systems.
In this study, a change of the "resilience paradigm" induced by a change of the analyzed system was described: hierarchical or branching structures with a one-directional distribution of a limited resource versus multi-agent interconnected self-organized adaptive networking systems with reproduction and redistribution of resources between nodes. The modern evolution of technological systems was considered. A model of multi-agent interconnected self-organized adaptive networking systems was proposed, the network topology was considered, and a system functioning model including transient processes was analyzed. A substantial limitation of the traditional reliability paradigm for the novel type of system was demonstrated. It was assumed that optimization approaches in the context of "big data" utilization create a quasi-infinite space of non-structured decisions, which can be characterized as "big decisions". A modified approach based on the "equally defended networked system" paradigm and a corresponding quantitative risk measure was proposed.
Thus, the problem of creating a decision-oriented methodology for assessing system resilience based on an assessment of socio-environmental risk should be investigated in further research. A formulation of, and approaches to solving, the problem of "big decisions" can be proposed based on the use of dynamic models. In particular, robust solutions may be proposed instead of optimal ones, and the construction of an equally defended, distributed, clustered system can be proposed as a way to improve the sustainability management of multi-agent interconnected self-organized adaptive networking systems within the framework of integrated socio-environmental security management.