Reliability of interdependent networks with cascading failures

The reliability of network systems of various structures has been studied by many researchers. However, most of these works consider only the reliability of a single network system. In practice, different networks may be interdependent, such that a failure in one network may result in a failure in another network. Such cascading failures have been shown to be catastrophic by some researchers. However, a quantitative evaluation of the reliability of interdependent networks has not been proposed. In this paper, a multi-valued decision diagram based approach is presented to evaluate the reliability of interdependent networks. An illustrative example is presented to demonstrate the application of the framework.


Introduction
Researchers have studied the reliability of networks for a long time [8,9,21]. Typically, they have modeled the reliability of networks with different structures and have considered different factors, such as common cause failure [1,7,25]. [4] studied the influence of cascading failures on the reliability of networks. [6] studied the reliability of networks with multiple terminals using a binary decision diagram (BDD) technique. [22] also used BDD to study the reliability of networks. [12] studied opportunistic routing for wireless ad hoc and sensor networks. [25] studied the reliability of complex networks with a particle swarm optimization approach. [26] studied optimal link state routing in mobile ad hoc networks. [11] studied lifetime optimization for a heterogeneous wireless sensor network. [5] studied the reliability of a smart grid network system considering direct cyber-power interdependency. [19] studied the reliability improvement of a radio electrical distribution network by optimal planning of energy storage systems. [3] studied the reliability enhancement of a wi-fi network. [13] presented the concept of a multi-phase network system to consider the dynamic characteristics of networks, and analyzed its reliability. [23] studied the reliability of a cubic network system. [18] presented a method for determining the reliability of a network whose elements (links and nodes) are imperfect (can fail) and repairable. However, most of these works are restricted to the study of a single system.
In practice, the failures of different networks may be interdependent [20,14,10]. For example, a failure in a subway system may increase the load on the bus transportation system and raise the risk of traffic congestion. Another example is the interdependence between power systems and their control systems. As pointed out in [2], a cascading failure between the power system and the internet network caused a blackout that affected much of Italy in September 2003. In [2], the effect of removing a proportion of the nodes in one network is studied. However, a quantitative evaluation of the reliability of interdependent networks is not provided. In this work, a multi-valued decision diagram based approach is adopted to evaluate the reliability of interdependent networks with cascading failures.
Section 2 describes the failure mechanism of the interdependent systems. Section 3 provides the multi-valued decision diagram based approach. Section 4 provides the numerical example. Section 5 concludes.

System description
Consider a system consisting of multiple networks, where the failure of a node in one network may cause one or more nodes in another network to fail. Each node has an internal failure rate, and the connections among the nodes of each network are known. Once a node fails, whether from internal failure or cascading failure, the node and its connections with other nodes are removed from the network it belongs to. After the removal, any cluster of connected nodes in a network that is smaller than a prespecified number fails. A special case is where a node fails if it is not connected to any other node. This kind of cascading failure may have catastrophic effects, because the failure of a node in one network may cause several nodes in other networks to fail, which may in turn cause more nodes in the original network to fail.
In [2], an illustrative system is proposed, as shown in Fig. 1. There are two networks, A and B. Each contains six nodes, and the connections between the nodes are shown as arcs in Fig. 1 (a). Any node fails if it is not connected to any other node in its network. If A_i or B_i fails (i=1,…,6), its counterpart B_i or A_i also fails. Therefore, if A_5 fails, the system will be as shown in Fig. 1 (b), since B_5 also fails and the connections of A_5 and B_5 with other nodes are removed. Furthermore, as A_4 and A_6 are now isolated, they fail and cause B_4 and B_6 to fail. B_3 then becomes isolated, so B_3 and A_3 fail. The system will then be as shown in Fig. 1 (c). That is, the failure of A_5 has caused cascading failures of A_3, A_4, A_6, B_3, B_4, B_5, and B_6. In this paper, the reliability of the interdependent networks is defined as the probability that each network still has some working nodes after a fixed period of time.
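The cascade rule described above can be sketched in a few lines of Python. Since Fig. 1 is not reproduced in this text, the edge lists below are a hypothetical topology chosen only to agree with the cascade described for the failure of A_5; the names EDGES, NEIGHBORS and cascade are likewise illustrative and not taken from [2].

```python
# ASSUMPTION: Fig. 1 is not reproduced in this text, so these edge lists are a
# hypothetical topology chosen to agree with the cascade described above.
EDGES = {
    "A": [(1, 2), (2, 3), (3, 5), (4, 5), (5, 6)],
    "B": [(1, 2), (3, 4), (3, 6), (4, 5), (5, 6)],
}
NODES = {(net, i) for net in "AB" for i in range(1, 7)}

# Adjacency sets inside each network, built from the edge lists.
NEIGHBORS = {node: set() for node in NODES}
for net, edges in EDGES.items():
    for u, v in edges:
        NEIGHBORS[(net, u)].add((net, v))
        NEIGHBORS[(net, v)].add((net, u))


def cascade(initial_failures):
    """Return every failed node once the cascade has settled."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for net, i in NODES - failed:
            counterpart = ("B" if net == "A" else "A", i)
            isolated = NEIGHBORS[(net, i)] <= failed  # no working neighbor left
            if counterpart in failed or isolated:
                failed.add((net, i))
                changed = True
    return failed


print(sorted(cascade({("A", 5)})))
```

With this assumed topology, the call lists nodes 3 to 6 of both networks as failed, which matches the state described for Fig. 1 (c).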

The model
The multi-valued decision diagram (MDD) has been frequently adopted to evaluate the reliability of systems with dependent failures [16,17]. However, to suit our situation, the MDD used here differs somewhat from that used in most papers. In most papers using MDD, each node in the MDD corresponds to a system element and each branch corresponds to a state of the element, so that each path leading to system success represents the set of elements that have failed and the set of elements that have not failed [15,28]. In our case, if the traditional MDD were used, one would still need to enumerate all possible sequences of failures for each path representing system success. To avoid enumerating the sequences of failures separately, as in [27], the nodes of our MDD directly represent the failure sequence, and each path leading to system success represents the sequence of failures that have happened. The procedure for evaluating the system reliability with the MDD is as follows:
1) Construct the MDD representing the first event, which can be the failure of any node in any network, or no failure happening at all. The terminal for each branch is the set of nodes that have failed in all the networks, considering both internal failures and cascading failures.
2) For the branch representing "no failure" or "no more failure", the terminal is set to "0", representing system success. Any other branch whose terminal indicates that the system is still reliable needs further branching. The further branches represent all the possible scenarios for the following event, which can be the failure of any remaining node, or no more failure. For any branch indicating system failure, the terminal is set to "1".
3) Repeat step 2 until every terminal is either "0" or "1".
4) Sum up the probabilities of the paths leading to "0"; this sum is the system reliability.
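Steps 1 to 3 amount to a depth-first enumeration of failure sequences, which may be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the topology is hypothetical because Fig. 1 is not reproduced in this text, the helper names (cascade, system_ok, success_paths) are invented for the example, and path probabilities (step 4) are left out. The sketch branches on individual nodes, so its path count need not match the grouped scenarios listed in the illustrative example.

```python
# ASSUMPTION: hypothetical topology (Fig. 1 is not reproduced in this text).
EDGES = {
    "A": [(1, 2), (2, 3), (3, 5), (4, 5), (5, 6)],
    "B": [(1, 2), (3, 4), (3, 6), (4, 5), (5, 6)],
}
NODES = {(net, i) for net in "AB" for i in range(1, 7)}
NEIGHBORS = {node: set() for node in NODES}
for net, edges in EDGES.items():
    for u, v in edges:
        NEIGHBORS[(net, u)].add((net, v))
        NEIGHBORS[(net, v)].add((net, u))


def cascade(initial_failures):
    """Return every failed node once the cascade has settled."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for net, i in NODES - failed:
            counterpart = ("B" if net == "A" else "A", i)
            if counterpart in failed or NEIGHBORS[(net, i)] <= failed:
                failed.add((net, i))
                changed = True
    return failed


def system_ok(failed):
    """True while every network still has a working node.

    After the cascade, each working node has a working neighbor, so this is
    the same as requiring two connected working nodes in each network.
    """
    return all(any((net, i) not in failed for i in range(1, 7)) for net in "AB")


def success_paths(failed=frozenset(), path=()):
    """Depth-first enumeration of failure sequences that end in terminal "0"."""
    if not system_ok(failed):
        return []                      # terminal "1": system failure
    paths = [path]                     # branch "no (more) failure": terminal "0"
    for node in sorted(NODES - failed):
        after = frozenset(cascade(failed | {node}))
        paths += success_paths(after, path + (node,))
    return paths


paths = success_paths()
print(len(paths), "success paths under the assumed topology")
```

Each returned tuple is one root-to-"0" path: the ordered internal failures, followed implicitly by "no more failure".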

Illustrative example
Consider the illustrative system in Fig. 1 (a), and assume that the system is reliable as long as at least two connected nodes are working in each network.
According to the procedures, the MDD for the illustrative system shown in Fig. 1 (a) can be constructed. In order to make the MDD more concise, we do not show the branches directly leading to "0" and "1". The MDD for the system is as shown in Fig. 2.
From the MDD, the scenarios that lead to system success can be summarized below:
Scenario 36: A_4 or B_4 fails, then A_5, A_6, B_5 or B_6 fails, leading to the failure of A_3 to A_6 and B_3 to B_6, then no more failure.
Scenario 37: A_5 or B_5 fails, leading to the failure of A_3 to A_6 and B_3 to B_6, then no more failure.
Scenario 38: A_6 or B_6 fails, leading to the failure of A_6 and B_6, then no more failure.
Scenario 39: A_6 or B_6 fails, then A_1, A_2, B_1 or B_2 fails, leading to the failure of A_1, B_1, A_2, B_2, A_6 and B_6, then no more failure.
Scenario 40: A_6 or B_6 fails, then A_1, A_2, B_1 or B_2 fails, then A_3 or B_3 fails, leading to the failure of A_1 to A_3, B_1 to B_3, A_6 and B_6, then no more failure.
Scenario 41: A_6 or B_6 fails, then A_3 or B_3 fails, leading to the failure of A_3, A_6, B_3 and B_6, then no more failure.
Scenario 42: A_6 or B_6 fails, then A_3 or B_3 fails, then A_1, A_2, B_1 or B_2 fails, leading to the failure of A_1 to A_3, B_1 to B_3, A_6 and B_6, then no more failure.
Scenario 43: A_6 or B_6 fails, then A_3 or B_3 fails, then A_4, A_5, B_4 or B_5 fails, leading to the failure of A_3 to A_6 and B_3 to B_6, then no more failure.
Scenario 44: A_6 or B_6 fails, then A_4, A_5, B_4 or B_5 fails, leading to the failure of A_3 to A_6 and B_3 to B_6, then no more failure.
Note that although enumerating all the scenarios seems tedious, it simply follows a depth-first traversal. For small examples, the scenarios can be enumerated manually; for larger systems, the MDD must be constructed by a computer program, and the paths leading to system success are then collected through either depth-first or breadth-first traversal. Admittedly, the system MDD can grow quickly as the networks gain more nodes, but one should not expect to evaluate the reliability of a complicated system in a few simple steps. With advances in computing technology, such as parallel computing and quantum computing, it is promising for a computer to analyze an MDD with thousands of nodes in seconds.
Assume that the system operation time is T. The failure time of each node follows an exponential distribution, with failure rate λ_i for A_i and β_i for B_i. The system reliability can be obtained by summing up the probabilities of all the scenarios leading to system success. Setting λ_i = β_i = 0.01 for i=1,…,6 and T = 20, the system reliability is calculated to be R = 0.7783. The influence of different nodes on system reliability is studied by recalculating the system reliability with λ_i and β_i changed to 0.02 for a given i while the other parameters are kept unchanged. Table 1 shows the results. It can be seen that increasing the failure rate of nodes A_3 and B_3 does not have much influence on the system reliability. Indeed, when A_3 or B_3 fails, A_1, A_2, A_4, A_5, A_6, B_1, B_2, B_4, B_5 and B_6 can still function. Similarly, changing the failure rate of A_6 or B_6 has only minor effects, because when A_6 or B_6 fails, A_1 to A_5 and B_1 to B_5 can still function. Increasing the failure rate of A_5 or B_5 has the biggest effect, because when A_5 or B_5 fails, A_3 to A_6 and B_3 to B_6 all fail due to cascading effects.
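As a rough cross-check of the numerical example, the reliability can also be computed by brute force instead of by summing MDD paths. The sketch below assumes that the final state at time T depends only on which nodes suffer an internal failure before T (each with probability 1 - exp(-rate * T)), not on the order of those failures, and it uses a hypothetical topology because Fig. 1 is not reproduced in this text. The function and variable names are illustrative. Under these assumptions the base case comes out at about 0.778, in line with the reported R = 0.7783.

```python
import math
from itertools import product

# ASSUMPTION: hypothetical topology (Fig. 1 is not reproduced in this text).
EDGES = {
    "A": [(1, 2), (2, 3), (3, 5), (4, 5), (5, 6)],
    "B": [(1, 2), (3, 4), (3, 6), (4, 5), (5, 6)],
}
NODES = {(net, i) for net in "AB" for i in range(1, 7)}
NEIGHBORS = {node: set() for node in NODES}
for net, edges in EDGES.items():
    for u, v in edges:
        NEIGHBORS[(net, u)].add((net, v))
        NEIGHBORS[(net, v)].add((net, u))


def cascade(initial_failures):
    """Return every failed node once the cascade has settled."""
    failed = set(initial_failures)
    changed = True
    while changed:
        changed = False
        for net, i in NODES - failed:
            counterpart = ("B" if net == "A" else "A", i)
            if counterpart in failed or NEIGHBORS[(net, i)] <= failed:
                failed.add((net, i))
                changed = True
    return failed


def system_ok(failed):
    """True while every network still has a working (hence connected) node."""
    return all(any((net, i) not in failed for i in range(1, 7)) for net in "AB")


def reliability(rates, T):
    """Sum the probabilities of all internal-failure patterns that leave the system working."""
    nodes = sorted(NODES)
    p_fail = [1.0 - math.exp(-rates[n] * T) for n in nodes]
    total = 0.0
    for pattern in product((False, True), repeat=len(nodes)):
        prob = math.prod(p if hit else 1.0 - p for p, hit in zip(p_fail, pattern))
        initial = {n for n, hit in zip(nodes, pattern) if hit}
        if system_ok(cascade(initial)):
            total += prob
    return total


base = {node: 0.01 for node in NODES}   # lambda_i = beta_i = 0.01
print(round(reliability(base, 20), 4))
```

A sensitivity study in the spirit of Table 1 can be run by copying base, setting the rates of one pair such as ("A", 5) and ("B", 5) to 0.02, and calling reliability again. The enumeration visits 2^12 patterns, so it is only practical for small systems such as this one.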

Conclusions
This paper proposed a multi-valued decision diagram based approach to evaluate the reliability of interdependent networks. Each node in each network has an intrinsic failure rate, and its failure may cause some nodes in other networks to fail. Moreover, a cluster of connected nodes fails if its size is smaller than a prespecified number. A special case is where a node fails if it is not connected to any other node. The system is considered reliable as long as each network still has some working nodes after a fixed period of time.
In this work, the failure of a node causes a fixed set of nodes to fail. It would be interesting to consider the case where a node failure may cause a random set of nodes to fail. Another direction is to consider the case where each node is multi-state instead of binary-state. Future work can calculate the importance measures of different nodes and investigate the optimal structure of the networks. Besides, for very large networks, directly adopting the procedures may be computationally complicated and unnecessary. Future work can divide complicated interdependent networks into interdependent clusters, and calculate the reliability of the interdependent networks based on the reliability of each cluster and the relationships between the clusters.

Acknowledgement
The research reported here was partially supported by the NSFC under grant numbers 71671016, 71231001, and 71420107023, and the Fundamental Research Fund of Central Universities under the grant number FRF-GF-17-B14.

Scenario 1: No failure.
Scenario 2: A_1 or B_1 fails, leading to the failure of A_1, A_2, B_1 and B_2, then no more failure.
Scenario 3: A_1 or B_1 fails, then A_3 or B_3 fails, leading to the failure of A_1 to A_3 and B_1 to B_3, then no more failure.

Table 1. System reliability when changing the failure rate of different nodes