Topological Approach to Measure the Recoverability of Optical Networks

Optical networks are vulnerable to failures due to targeted attacks or large-scale disasters. The recoverability of optical networks refers to the ability of an optical network to return to a desired performance level after suffering topological perturbations such as link failures. This paper proposes a general topological approach and recoverability indicators to measure the network recoverability for optical networks for two recovery scenarios: 1) only the links which are damaged in the failure process can be recovered and 2) links can be established between any pair of nodes that have no link between them after the failure process. We use the robustness envelopes of realizations and the histograms of two recoverability indicators to illustrate the impact of the random failure and recovery processes on the network performance. By applying the average two-terminal reliability and the network efficiency as robustness metrics, we employ the proposed approach to assess 20 real-world optical networks. Numerical results validate that the network recoverability is coupled to the network topology, the robustness metric and the recovery strategy. We further show that a greedy recovery strategy could provide a near-optimal recovery performance for the robustness metrics. We investigate the sensitivity of network recoverability and find that the sensitivity of the recoverability indicators varies according to different robustness metrics and scenarios. We also find that assortativity has the strongest correlation with both recoverability indicators.


Introduction
High reliability and robustness of optical network backbones play an important role in successfully provisioning high service availability for the Internet and communication systems [1]. In optical networks, disaster-based failures and damage to optical fiber cables can partially disrupt data delivery, resulting in the unavailability of communication services [2]. The causes of such massive failures include human errors, malicious attacks, large-scale disasters and environmental challenges [3]. Quantifying the performance of networks under such challenges can provide significant insight into the potential damage they can incur, as well as provide a foundation for creating more robust infrastructure networks.
Network robustness is interpreted as a measure of the response of the network to perturbations, or challenges, imposed on the network [4], and has been studied extensively in recent years. Van Mieghem et al. [4] propose a framework for computing topological network robustness by considering both a network topology and a service for which the network is designed. In communication networks, Cholda et al. [5] survey various robustness frameworks and present a general framework classification, while Pasic et al. [6] present the FRADIR framework, which incorporates reliable network design, disaster failure modeling and protection routing. A wide range of metrics based on the underlying topology have been proposed to measure network robustness [7], and a structural robustness comparison of several telecommunication networks under random nodal removal is presented in Ref. [8]. Long et al. [9] propose using the maximum variation of the Weighted Spectrum (WS) to measure the survivability of networks under geographically correlated failures. For optical network applications, Zhu et al. [10] investigate the control plane robustness in software-defined optical networks under different link cut attack scenarios and find that control plane enhancements in terms of controller addition do not necessarily yield linear improvements in control plane robustness but require tailored control plane design strategies. Ferdousi et al. [11] propose a rapid data-evacuation strategy to move maximum amounts of data from disaster regions using surviving resources under strict time constraints in optical cloud networks. Xie et al. [12] propose a robust and time-efficient algorithm to address emergency backup in inter-datacenter networks with progressive disasters.
The works mentioned above focus on measuring and improving the ability of networks to withstand failures and attacks. However, the recovery process after failures is not considered, and an investigation of the ability of a network to recover from failures is lacking. In a broad sense, network robustness is also related to the ability of a network to return to a desired performance level after suffering malicious attacks and random failures [13]. We define such a network capability as network recoverability 1 in this paper. As shown in Fig. 1, recovery measures are taken in order to restore the function or performance of the optical network after the failure process, either by repairing the damaged links or by building new links. The network performance during this period is related to many factors, such as the topology, the recovery strategy, the link adding sequence, etc. Thus, we need an approach to measure the recoverability of optical networks. Several recovery mechanisms have been investigated under different circumstances [14], particularly in complex networks applications. For example, Majdandzic et al. [15] model cascading failures and spontaneous recovery as a stochastic contiguous spreading process and show the occurrence of a phase switching phenomenon. Chaoqi et al. [16] construct a dynamic repair model and systematically analyze the energy-transfer relationships between nodes in the repair process of the failed network. Recovery strategies based on centrality metrics of network elements (e.g., nodes or links) are investigated in Refs. [13,17], which show that no single centrality metric-based strategy can improve all aspects of network performance simultaneously.
In optical network applications, Alenazi et al. [18] propose a heuristic algorithm that optimizes a network by adding links to achieve higher network resilience, maximizing the algebraic connectivity while decreasing the total cost via selecting cost-efficient links. Natalino et al. [19] introduce two heuristics to upgrade Content Delivery Networks (CDNs) and increase content accessibility under targeted link cuts. Hong et al. [20] propose a recovery strategy to recover the boundary of the failed nodes in interdependent networks during cascading failures. A progressive recovery approach [21], which consists of choosing the right sequence of links to restore after a disaster in communication networks, has been proposed to maximize the weighted sum of the total flow over the entire recovery process [22], as well as to minimize the total cost of repair under link capacity constraints [23].
Although the above papers [14][15][16][17][18][19][20][21][22][23] have contributed to a deep understanding of recovery processes in networks, a general framework or methodology for quantifying the recovery capability of a real-world optical network is still lacking. In this paper, we propose a topological approach and two recoverability indicators to quantify the network recoverability for two different recovery scenarios, which we denote as Scenario A and Scenario B. The link-based Scenario A assumes that, after the failure process, recovery links can be established between any pair of nodes that have no link between them. The energy-based Scenario B assumes that only the links which are damaged in the failure process can be recovered. The proposed approach involves three concepts: the network topology, the robustness metric and the recovery strategy. For an optical network G, we apply the average two-terminal reliability ATTR and the network efficiency E G as the robustness metrics for case studies. The average two-terminal reliability ATTR is defined as the probability that the service between a randomly chosen node pair in the network is available, which also expresses the level of difficulty to disconnect parts of the network. The network efficiency E G gives an indication of the efficiency of information exchange on networks under shortest path routing [24]. Besides a random recovery strategy and some strategies based on topological properties, we also consider a greedy recovery strategy. In the greedy strategy, the damaged element (a node or a link) which improves the network performance most has the highest priority to be recovered. Our approach is tested on 20 real-world optical networks, and we verify that the proposed recoverability indicators allow us to compare the performance of different recovery strategies and assess the recoverability of different networks.
The rest of this paper is organized as follows: Section 2 introduces the topological approach for measuring the network recoverability for the two considered recovery scenarios. Section 3 presents the main concepts in the evaluation of network recoverability. The experimental results are exhibited in Section 4. Section 5 discusses the sensitivity of the network recoverability on different robustness metric thresholds. Section 6 analyzes the correlation of topological metrics with recoverability indicators. Section 7 concludes the paper.

Topological approach for measuring network recoverability
In this section, we introduce an approach for measuring the network recoverability for real-world optical networks for two recovery scenarios.

R-value and challenges
We inherit the framework and some definitions proposed for network robustness [4,25] and extend the methodology for the network recoverability. A given network determined by a service and an underlying topology is translated into a mathematical object, defined as the R-value, on which computations can be performed [4]. The R-value takes the service into account and is normalized to the interval [0,1]. Here, R = 1 reflects complete functionality in a network without failures, and R = 0 corresponds to the complete lack of functionality for a sufficiently degraded network.
An elementary challenge is an event that changes the network and thus changes the R-value. We assume that elementary challenges take place one by one, and thus do not coincide in time. Considering link-based failures and targeted link cuts as common threats to optical infrastructure networks, we confine an elementary challenge to a link removal in a failure process or a link addition in a recovery process. Since every perturbation has an associated R-value, any realization of such a failure process, followed by a recovery process, consists of a number M of elementary challenges and hence can be described by a sequence of R-values denoted {R[k]} 1≤k≤M , where k is the sequence number of elementary challenges.

1 Sometimes also called network restoration.

Link-based Scenario A: recovery of any alternative link
Let M G0(N,L) denote the robustness metric value of the original network G 0 (N,L), with N nodes and L links. Assume that during the failure-recovery process, the resulting graph has L* links and is denoted by G(N,L*). We define the R-value R G as the normalized value of the robustness metric M G(N,L*) , which satisfies R G = M G(N,L*) / M G0(N,L) . Thus, the R-value R G0 of the original network G 0 (N,L) equals 1. We assume that failures in the network only consist of link removals, according to a fixed strategy, such as random failure or targeted link cuts, which usually degrade the robustness of the network. We assume that links are damaged (removed) one by one, until we obtain a graph G f whose R-value R G f first reaches or drops below a constant ρ, where ρ ∈ [0, 1] is a prescribed R-threshold for the robustness metric. Usually this threshold is chosen in such a way that, while the R-value is still above it, the service quality remains acceptable [4]. The above process is called the failure process. The number of failure challenges, i.e., the number of damaged links in the failure process, is denoted by K f . For the same network G 0 , the smaller the value of K f , the more effective the failure process is in degrading the R-value [4].
Then we launch the recovery process from the remaining network G f (N, L−K f ). Scenario A assumes that recovery links can be established between any two nodes in the complement of the graph after failures. The process of one realization is illustrated in Fig. 2a. Specifically, we recover the network by adding links, one by one, to the damaged network G f according to a recovery strategy, until the normalized robustness metric R G first reaches or exceeds 1. The network after the recovery process is denoted by G r (N, L−K f +K r ), where K r is the number of recovery challenges (i.e., the number of links that are added during the recovery process). For a given damaged network G f , the smaller the value of K r , the more effective the recovery process is. Ideally, the recovery process increases the R-value of the current network exactly to 1. However, the R-value R Gr of the resulting network G r (N, L−K f +K r ) is mostly slightly larger than 1, since the robustness metric value of the resulting network is slightly larger than that of the original network G 0 (N, L) in most cases.
We define the Link Ratio η L as the ratio of the number of failure challenges K f to the number of recovery challenges K r , i.e., η L (G, ρ) = K f / K r , which indicates the efficiency of the recovery process in one realization.
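To make the Link Ratio concrete, the following sketch simulates one realization of Scenario A under random failure and random recovery, using the average two-terminal reliability ATTR as the robustness metric. This is a pure-Python toy under our own naming, not the paper's implementation: links are removed uniformly at random until R ≤ ρ, and then any absent link may be added back, as Scenario A allows.

```python
import random
from itertools import combinations

def attr(n, edges):
    """Average two-terminal reliability of an undirected n-node graph:
    the fraction of node pairs connected by at least one path."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, reachable = set(), 0
    for s in range(n):
        if s in seen:
            continue
        comp, stack = set(), [s]          # collect the component containing s
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        reachable += len(comp) * (len(comp) - 1) // 2
    return reachable / (n * (n - 1) // 2)

def link_ratio(n, edges, rho, rng):
    """One Scenario A realization: random link failures until R <= rho,
    then random additions of absent links until R >= 1; returns eta_L = K_f / K_r."""
    cur = {tuple(sorted(e)) for e in edges}
    r0 = attr(n, cur)                      # metric value of the intact network
    k_f = 0
    while attr(n, cur) / r0 > rho:         # failure process
        cur.discard(rng.choice(sorted(cur)))
        k_f += 1
    k_r = 0
    while attr(n, cur) / r0 < 1:           # recovery process: any absent link
        absent = [e for e in combinations(range(n), 2) if e not in cur]
        cur.add(rng.choice(absent))
        k_r += 1
    return k_f / k_r

rng = random.Random(7)
ring = [(i, (i + 1) % 6) for i in range(6)]   # a 6-node ring topology
eta_L = link_ratio(6, ring, rho=0.8, rng=rng)
print(eta_L > 0)                              # True
```

For the 6-node ring, the failure process always stops after two removals (one removal leaves a connected path), so η L = 2/K r for this toy example.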
A Link Ratio η L (G, ρ) > 1 implies that the network can be recovered with fewer challenges than the number K f of failure challenges. Otherwise, the network is more difficult to recover than to destroy. Scenario A can characterize the recovery process in a connection-oriented network with logical connections [26], e.g., a virtual circuit for transporting data or a wireless backhaul network, where the links in a logical network represent the duplex channels between two devices. For example, after channels are interrupted because of signal fading or blocking in a mobile network, one should establish several connections or reconfigure several new channels to maintain the network's overall performance. Besides, Scenario A can also apply to the situation where the network operator has the capability to build connections between any node pair in the network. In this case, the overhead cost of the recovery measures mainly depends on the total number of dispatched connections, which corresponds to the number K r of recovery challenges in Scenario A.

Energy-based Scenario B: recovery of failed links
The failure process in Scenario B is the same as in Scenario A. In the recovery process in Scenario B, we restore one by one, all the links which were removed during the failure process, until the network is restored to its original topology. Scenario B can be used to describe recovery processes in the physical communication networks, e.g., optical backbone networks. In such networks, the recovery measure for each connection, e.g., repairing fiber optic cables, usually requires a relatively long period. During the recovery process, the network still provides services, albeit with a degraded performance. Thus, for this scenario, the network recoverability is related to the network performance (or the robustness metric) throughout the recovery process.
One realization of the failure and recovery process is illustrated in Fig. 2b. The energy of the failure challenges S f (G, ρ) = R[1] + R[2] + … + R[K f ] represents the impact of the failure process on the network performance; similarly, the energy of the recovery challenges S r (G, ρ) = R[K f +1] + … + R[K f +K r ] represents the impact of the recovery process on the network performance. For Scenario B we define the Energy Ratio, denoted by η E , as the ratio between the energy of the recovery challenges S r and the energy of the failure challenges S f in each realization, for a given R-threshold ρ: η E (G, ρ) = S r (G, ρ) / S f (G, ρ). An Energy Ratio η E (G, ρ) > 1 implies that the benefit of the recovery measures can compensate the loss of network performance caused by the failures, which indicates a high network recovery capability. Conversely, an Energy Ratio η E (G, ρ) < 1 implies a low recoverability.
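Taking the energy of a process as the sum of the R-values over its challenges, the Energy Ratio of a realization follows directly from the R-value sequence {R[k]}. A minimal pure-Python sketch (the trajectory below is hypothetical, chosen only to illustrate the computation):

```python
def energy_ratio(r_values, k_f):
    """eta_E = S_r / S_f, where r_values = [R[1], ..., R[K_f + K_r]] is the
    R-value sequence of one realization and the first k_f entries are failures."""
    s_f = sum(r_values[:k_f])      # energy of the failure challenges
    s_r = sum(r_values[k_f:])      # energy of the recovery challenges
    return s_r / s_f

# Hypothetical Scenario B trajectory with rho = 0.8: three failure challenges
# (R drops to 0.78), then the same three links are restored one by one.
trajectory = [0.95, 0.88, 0.78, 0.85, 0.93, 1.00]
print(round(energy_ratio(trajectory, k_f=3), 3))   # 1.065
```

Here the recovery curve lies slightly above the reversed failure curve, so η E is slightly larger than 1, i.e., the recovery compensates the performance loss.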

Comparison via envelopes and the recoverability indicators
As we discussed in Section 2.1, the impact of any realization of a failure and subsequent recovery process on the network's functionality can be expressed as a sequence of R-values {R[k]}, where k is the sequence number of elementary challenges. To investigate the recoverability of networks, we need to know the number of challenges needed to make the original R-value (which is normalized to 1) decrease to a predefined R-threshold ρ in the failure process, as well as the number of challenges needed to increase the R-value from the threshold ρ back to the original R-value in the recovery process.
This confines us to investigate the number of challenges K as a function of a specific R-value r, i.e., {K[r]}. Thus, each value in {K[r]} is the number of challenges needed to change the R-value to a specific R-value r in each realization. Since it is impossible to list all values of r between the R-threshold ρ and the original R-value, we evenly sample values r j in the interval [ρ, 1]. The boundaries of the envelope are given by the extreme numbers of challenges K, which yield the best- and worst-case values of the robustness metric for a network after a given number of recovery challenges. The expected number of challenges leading to the R-value r j is E[K[r j ]]. Since K[r] defines a probability density function (pdf), we are also interested in the percentiles of K[r], where K m% [r] are the points at which the cumulative distribution function of K[r] reaches m%, i.e., Pr[K[r] ≤ K m% [r]] = m%. We apply the envelopes to present the behavior of the failure and recovery processes on a network [4,25]. The envelope profiles the pdf of the random variable K, i.e., the probability that the number of challenges falls within a particular region. The area of the envelope can be regarded as the variation of the robustness impact of a certain series of challenges, which quantifies the uncertainty, or the amount of risk, due to perturbations.
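The envelope boundaries, expectation and percentiles of K[r] can be estimated from Monte Carlo realizations. The sketch below uses toy data and our own helper names; `k_samples` maps each sampled R-value r to the list of observed challenge counts K[r] across realizations.

```python
from statistics import mean

def envelope(k_samples, m=90):
    """For each sampled R-value r, summarize the distribution of the number of
    challenges K[r]: extremes (envelope boundaries), expectation E[K[r]]
    and a simple m-th percentile K_m%[r]."""
    summary = {}
    for r, ks in k_samples.items():
        ks = sorted(ks)
        idx = min(len(ks) - 1, int(len(ks) * m / 100))   # naive percentile index
        summary[r] = {"min": ks[0], "max": ks[-1],
                      "mean": mean(ks), "pct": ks[idx]}
    return summary

# Toy data: K[r] collected from five hypothetical realizations at two R-values.
env = envelope({0.9: [2, 3, 3, 4, 6], 0.8: [5, 6, 7, 7, 9]})
print(env[0.9]["min"], env[0.9]["max"], env[0.9]["mean"])   # 2 6 3.6
```

Plotting `min`/`max` against r traces the envelope boundaries, while `mean` traces the expected challenge curve E[K[r]].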
We propose two recoverability indicators, the Link Ratio η L (G, ρ) and the Energy Ratio η E (G, ρ), for the two scenarios, respectively. Since a failure process and a recovery process could be random under the random strategy, the recoverability indicators are random variables. We compare the recoverability of different networks by the average recoverability indicators for simplicity. For example, an average Link Ratio E[η L (G 1 , ρ)] > E[η L (G 2 , ρ)] for two different networks G 1 and G 2 implies that the network G 1 usually has a better recoverability than G 2 in Scenario A for a given R-threshold ρ.
Besides the average recoverability indicators, we are also concerned with the variance of the recoverability indicators Var[η(G, ρ)]. A smaller variance Var[η(G, ρ)] implies less uncertainty in the recoverability indicators and thus a better recoverability.

Robustness metrics and recovery strategies
In this section, we introduce the factors which determine a specific recovery process, namely the robustness metrics, the recovery strategies and the network topologies.

Robustness metrics
We use two metrics: the average two-terminal reliability ATTR and the network efficiency E G , as the robustness metrics. These two metrics are closely related to service availability and data delivery on optical networks.
1) Average two-terminal reliability ATTR. In optical networks, the average two-terminal reliability (ATTR) can assess the resilience and vulnerability of a fiber infrastructure [27,28]. The metric is defined as the fraction of node pairs with a path between them: ATTR = (Σ i N i (N i −1)/2) / (N(N−1)/2), where N i is the number of nodes in the ith connected component. The ATTR measures the reachability of node pairs, but ignores the performance of the information exchange in a network. ATTR equals 1 when the network is connected; otherwise ATTR is the sum of the number of node pairs within every connected component, divided by the total number of node pairs in the network. In failure scenarios, the higher the average two-terminal reliability, the higher the robustness [8].
2) Network efficiency E G . We assume that the hopcount h(i,j), i.e., the number of links in the shortest path from node i to node j, indicates the overhead of data delivery from end to end. Thus, the reciprocal of the hopcount 1/h(i,j) reflects the amount of traffic delivered per unit of overhead, which can be interpreted as the efficiency of data delivery between two nodes in optical networks. If there is no path from i to j, then h(i,j) = ∞ and 1/h(i,j) = 0. The efficiency of a given network is defined as the mean of the reciprocals of all hopcounts in the network, i.e., E G = (1/(N(N−1))) Σ i≠j 1/h(i,j), see [24]. Network efficiency E G quantifies the efficiency of information exchange across the whole network under shortest path routing [29], such as the data transmission between controllers and switches in software-defined optical networks. Network efficiency monotonically decreases with successive link removals.
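As a concrete illustration of the two robustness metrics, the sketch below computes ATTR and E G for small graphs in pure Python. The helper names and the adjacency-list representation are our own choices, not the paper's code.

```python
from collections import deque

def adjacency(n, edges):
    """Adjacency lists of an undirected n-node graph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def attr(n, edges):
    """Average two-terminal reliability: the fraction of node pairs
    that are connected by at least one path."""
    adj, seen, reachable = adjacency(n, edges), set(), 0
    for s in range(n):
        if s in seen:
            continue
        comp, stack = set(), [s]          # collect the component containing s
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        reachable += len(comp) * (len(comp) - 1) // 2
    return reachable / (n * (n - 1) // 2)

def efficiency(n, edges):
    """Network efficiency E_G: mean of 1/h(i,j) over all node pairs,
    with 1/h(i,j) = 0 for disconnected pairs."""
    adj, total = adjacency(n, edges), 0.0
    for s in range(n):                    # BFS hopcounts from every node
        dist, q = {s: 0}, deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total += sum(1.0 / d for v, d in dist.items() if v != s)
    return total / (n * (n - 1))

# A 4-node path 0-1-2-3 is connected, so ATTR = 1;
# cutting link (1,2) leaves two components of 2 nodes: ATTR = 2/6.
print(attr(4, [(0, 1), (1, 2), (2, 3)]))        # 1.0
print(efficiency(3, [(0, 1), (1, 2), (0, 2)]))  # triangle: 1.0
```

The example confirms the monotonicity noted above: removing a link can only lengthen or destroy shortest paths, so E G never increases under link removal.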

Failure and recovery strategies
For simplicity and generality, we consider a random failure strategy. The random failure strategy implies that failures occur on links independently, randomly and uniformly, which is consistent with the random failure stage in a product life cycle. The R-value R[k] for a given number of failure challenges k is then a random variable. We consider the following recovery strategies, i.e., random recovery, metric-based recovery, greedy recovery and worst case recovery:

1) Random recovery: links are added randomly and uniformly, one by one, during the recovery process, which can describe a self-repairing process after failures or recovery measures without scheduling.

2) Metric-based recovery: the sequence of link additions is determined by topological or spectral metrics of links. While there are many relevant metrics, such as closeness and the effective resistance [30,31], we use three metric-based recovery strategies. The selection criteria for the link between nodes i and j are as follows: (a) The minimum product of degrees d i d j . For each challenge in a recovery process, we select and restore the link l * ij with the minimum d i d j . If there are multiple node pairs with the same minimum product of degrees, one of these pairs is chosen at random. (b) The minimum product (x 1 ) i (x 1 ) j of the ith and jth components of the eigenvector x 1 belonging to the largest adjacency eigenvalue [32]. For each challenge in a recovery process, we restore the link l * ij with the minimum (x 1 ) i (x 1 ) j . (c) The maximum absolute difference Δy = |y i − y j | between the ith and jth components of the Fiedler vector y [33]. For each challenge in a recovery process, we restore the link l * ij with the maximum Δy.
3) Greedy recovery: in each challenge, we add the link l * max in the complement G c of the current network G that makes the R-value increase the most. The greedy strategy is a practical and intuitive recovery strategy, where the current optimal link for improving the performance of the network has the highest priority to be recovered.

4) Worst case recovery: in each challenge, we add the link l * min in the complement G c of the current network G that makes the R-value increase the least. This strategy is intended as an inefficient benchmark, where each time the link that contributes the least to the restoration of the network is recovered.
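One step of the greedy recovery can be sketched as follows, using ATTR as the R-value. This is a pure-Python toy under our own naming; in Scenario B the candidate set would be restricted to the failed links rather than the whole complement.

```python
from itertools import combinations

def attr(n, edges):
    """Average two-terminal reliability of an undirected n-node graph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, reachable = set(), 0
    for s in range(n):
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        reachable += len(comp) * (len(comp) - 1) // 2
    return reachable / (n * (n - 1) // 2)

def greedy_step(n, edges):
    """Return the absent link whose addition increases ATTR the most."""
    cur = {tuple(sorted(e)) for e in edges}
    candidates = [e for e in combinations(range(n), 2) if e not in cur]
    return max(candidates, key=lambda e: attr(n, cur | {e}))

# Two components {0,1,2} and {3,4}: the greedy choice bridges them.
best = greedy_step(5, [(0, 1), (1, 2), (3, 4)])
print(best)   # (0, 3)
```

Each greedy step evaluates every candidate link, which illustrates why the greedy strategy is the most expensive of the listed strategies: its cost per challenge grows with the size of the complement G c .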

Optical networks
As a case study, we consider 20 real-world optical communication networks. This set of networks was selected from the Internet Topology Zoo [34] and covers optical backbone networks located in different regions of the world, see Table 1.
The topological properties of the 20 real-world optical networks are described in Table 1: the number of nodes N and links L, the average degree E[D], the spectral radius λ 1 , the algebraic connectivity μ N−1 , the diameter ϕ and the assortativity ρ D . As shown in Table 1, most of the networks have a negative assortativity ρ D , which signifies a preference of high-degree nodes to connect to low-degree nodes [35].

Results and discussion
In this section, detailed results and analysis on the real-world optical networks via the proposed approach for assessing network recoverability are presented. For some evaluation items, we only present results for a specific network, i.e., US_Signal. We set the R-threshold as ρ = 0.8 in the following simulations. The approach translates easily to other networks or other robustness metrics.

Envelope examples and comparison
Each realization consists of a failure process and a subsequent recovery process. Fig. 3 exemplifies the envelopes [25] of the challenges in the US_Signal network for the two scenarios and the two robustness metrics, ATTR and E G , under the random recovery strategy. The envelopes for the failure processes are similar in both scenarios, while link-based Scenario A usually needs more challenges to recover the robustness metrics than energy-based Scenario B, if the random recovery strategy is employed. The total number of challenges K f + K r can cover a wide range of values, since it is influenced by two random processes (i.e., failure and recovery). Fig. 3a and c also illustrate that the R-value of the average number of challenges R[K avg ] for the robustness metric ATTR does not change smoothly with the number of challenges, in both the failure process and the recovery process, because the ATTR value only changes when a new component appears during the failure process or a component disappears during the recovery process. Furthermore, R[K avg ] for ATTR decreases slowly during the initial stage of the failure process but increases fast during the initial stage of the recovery process, in contrast with R[K avg ] for the network efficiency E G shown in Fig. 3b and d. We will show that the concavity of the function R[K avg ] helps to explain the behavior of the recoverability indicators.

Comparison of recovery strategies
The envelope computation can be applied to compare the performance of different recovery strategies for a specific realization of failures. Fig. 4 shows different recovery strategies (i.e., random, minimum d i d j , minimum (x 1 ) i (x 1 ) j , maximum Δy, worst case and greedy) for one realization of the failure process under the random failure strategy in the US_Signal network. The envelope of the recovery processes under random recovery for the average two-terminal reliability ATTR covers a larger surface than that for the network efficiency E G . This implies that the average two-terminal reliability ATTR could deviate more across realizations under random recovery and that the performance of random recovery is more difficult to guarantee. The average challenge sequence {K avg } under random recovery can serve as a baseline to evaluate the performance of other recovery strategies. As shown in Fig. 4a and c, the Fiedler vector-based strategy is comparable to the degree-based recovery in Scenario A and to the eigenvector-based strategy in Scenario B, and outperforms the average random recovery. Fig. 4 also shows that none of the metric-based strategies, with minimum degree product, minimum eigenvector centrality product or maximum absolute difference between Fiedler vector components, can always outperform the others for both robustness metrics in both scenarios. Fig. 4a and c exemplify that though the degree-based recovery performs well in link-based Scenario A for ATTR, it does not effectively recover the network in energy-based Scenario B. The eigenvector-based strategy outperforms the average behavior of the random strategy in the initial stage of recovery processes but degrades for more recovery challenges in Scenario A. As shown in Fig. 4b and d, these three metric-based recovery strategies are close to, and sometimes even worse than, the average random recovery.
Meanwhile, we notice that the greedy recovery usually upper bounds the random recovery envelopes. The R-value as a function of the number of challenges k under the greedy strategy is concave in the recovery process, which demonstrates the diminishing returns property of the recovery measures. The greedy recovery provides the most effective way to recover the performance for both robustness metrics, ATTR and E G , when compared with the other listed recovery strategies. The worst case recovery strategy usually lies beneath the random recovery envelopes. Among all recovery strategies, the greedy/worst case strategy performs the best/worst. In link-based Scenario A, both for ATTR and E G , the greedy recovery and the worst case recovery only loosely bound the random recovery envelope, because there are many possible realizations and envelopes generated by simulation cannot cover all of them. In energy-based Scenario B, the greedy recovery and the worst case recovery tightly bound the random recovery envelope, because the number of possible realizations is limited.

Fig. 3. Envelopes of the challenges for two scenarios and two robustness metrics (i.e., the average two-terminal reliability ATTR and the network efficiency E G ) in the US_Signal network, under the random recovery strategy. Each envelope is based on 10^4 realizations.

Overview of the Link Ratio and the Energy Ratio
We employ the proposed approach and the recoverability indicators η (including the Link Ratio η L and the Energy Ratio η E ) to evaluate the 20 real-world optical networks. Fig. 5 shows the recoverability indicators under the two scenarios, the two robustness metrics and two recovery strategies for the 20 considered networks as violin plots. Violin plots are similar to box plots, except that they show the probability density of the ratios η at different values, which offers more insight into the ratios η under random circumstances. Moreover, violin plots can be used to compare the performance of any two strategies, in this case the random and the greedy strategy. Fig. 5 shows that almost all histograms of the ratio η, regardless of the scenario, the strategy and the metric, exhibit heavy-tailed distributions, and that the greedy strategy presents a heavier tail than the random recovery strategy. Also, the ratio η has a wider range of values under the greedy strategy, which implies that the greedy strategy has a higher probability of leading to a large ratio η, i.e., a better recovery performance.
For both robustness metrics in Scenario A, Real7 (PionierL1) and Real8 (RoEduNet) have an average Link Ratio E[η L ] < 1 for the random strategy, which implies a relatively low recovery capability. By contrast, Real10 (US_Signal), Real16 (Palmetto) and Real17 (Sunet) have a large average Link Ratio E[η L ] > 1, which clearly outperform other networks, both for the random strategy and the greedy strategy.
Fig. 4. Comparisons of different recovery strategies for one realization of failures in the US_Signal network. Two scenarios and two robustness metrics (i.e., the average two-terminal reliability ATTR and the network efficiency E G ) are applied. Each envelope is based on 10^4 realizations.

The Energy Ratio η E exhibits different behavior from the Link Ratio η L in Scenario A. The average Energy Ratios E[η E ] for the robustness metric ATTR are much larger than 1 under the random strategy, which can be explained by the fact that the function R[K avg ] decreases slowly during the initial stage of the failure process but increases fast during the initial stage of the recovery process (illustrated in Section 4.1). Thus, the energy S r is much larger than S f , i.e., the average Energy Ratio E[η E ] is much larger than 1 for ATTR. Since the function R[K avg ] is concave for the robustness metric E G , and thus the energy S f < S r , the average Energy Ratios E[η E ] for different networks are only slightly larger than 1. The average Energy Ratio E[η E ] in Scenario B under the greedy strategy is usually located in the tail of the distribution of the Energy Ratio η E under the random strategy, which demonstrates that the greedy strategy can increase the recoverability of networks significantly.

Relation between Scenario A and Scenario B
To compare the recoverability of different networks, we employ so-called Scenario A-Scenario B plots, which show the Energy Ratio versus the Link Ratio under a given recovery strategy. Scenario A-Scenario B plots are divided into 4 quadrants by the reference lines η_L = 1 and η_E = 1, so that the recoverability can be assessed directly from the location of the average ratios E[η_L] and E[η_E]. Fig. 6 shows the average ratios E[η] and the standard deviations √(Var[η]) for the real-world networks in Scenario A-Scenario B plots. Fig. 6a and b show that when the R-value is the average two-terminal reliability ATTR, the two recoverability ratios corresponding to the two scenarios have a positive correlation, i.e., a higher Link Ratio η_L in Scenario A typically leads to a higher Energy Ratio η_E in Scenario B, both for random recovery and greedy recovery.
Compared with Fig. 6a and b, Fig. 6c and d show that when the network efficiency is adopted as the R-value, the two recoverability ratios have only a weak correlation, i.e., a higher Link Ratio η_L in Scenario A does not typically lead to a higher Energy Ratio η_E in Scenario B, for either random recovery or greedy recovery. This implies that the R-value influences the correlation between Scenario A and Scenario B. Fig. 6 also shows that all the average Energy Ratios E[η_E] are located in the first and the second quadrant, which demonstrates a good recoverability of the tested networks in Scenario B. However, for random recovery, the average Link Ratios E[η_L] of some networks are in the second quadrant, which suggests that these networks have low recoverability in Scenario A.
Both the average Link Ratio E[η_L] and the average Energy Ratio E[η_E] can be increased by applying the greedy strategy, but the gain differs per network. For example, the average Link Ratio E[η_L] of network Real14 (NetworkUSA) is smaller than that of network Real11 (Darkstrand) under the random strategy but larger than that of network Real11 under the greedy strategy, which implies that the performance of a recovery strategy strongly depends on the network topology.
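The quadrant reading of a Scenario A-Scenario B plot can be sketched as simple bookkeeping on the pair (E[η_L], E[η_E]). The network names and ratio values below are invented, and the quadrant numbering (I: both ratios at least 1; II: only η_E at least 1; III: neither; IV: only η_L at least 1) is our assumption for axes with η_L horizontal and η_E vertical, split by the reference lines η_L = 1 and η_E = 1.

```python
def quadrant(eta_L, eta_E):
    """Place a network in a Scenario A-Scenario B plot by its
    average ratios relative to the reference lines at 1."""
    if eta_L >= 1 and eta_E >= 1:
        return 1   # recoverable in both scenarios
    if eta_L < 1 and eta_E >= 1:
        return 2   # recoverable in Scenario B only
    if eta_L < 1 and eta_E < 1:
        return 3   # low recoverability in both scenarios
    return 4       # recoverable in Scenario A only

# invented (E[eta_L], E[eta_E]) pairs for three hypothetical networks
networks = {"NetA": (1.3, 1.6), "NetB": (0.8, 1.2), "NetC": (0.9, 0.7)}
placement = {name: quadrant(*ratios) for name, ratios in networks.items()}
print(placement)  # → {'NetA': 1, 'NetB': 2, 'NetC': 3}
```

Under this convention, the paper's observation that all E[η_E] lie in quadrants I and II, while some E[η_L] fall into quadrant II under random recovery, corresponds to networks like the hypothetical NetB.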

Sensitivity analysis of network recoverability
In the previous sections, the R-threshold was fixed at ρ = 0.8. In this section, we investigate the influence of different R-thresholds on the Link Ratio η_L and the Energy Ratio η_E. Figs. 7 and 8 show the impact of different R-thresholds on the recoverability indicators η for 4 optical networks, for the average two-terminal reliability ATTR and the network efficiency E_G, respectively.
We conclude from Fig. 7 that the recoverability indicators for ATTR increase with the R-threshold. As shown in Fig. 3a and c, the function R[K_avg] is approximately concave when the average number of challenges is small (corresponding to a high R-threshold). As the number of challenges increases in order to degrade the R-value to a lower R-threshold, the function R[K_avg] gradually becomes more convex, which is in line with the results obtained in Ref. [8].
Thus, the Energy Ratio η_E, which equals the energy of the recovery challenges S_r divided by the energy of the failure challenges S_f, tends to become larger as the R-threshold increases.
[Fig. 7: The impact of thresholds on recoverability indicators for the average two-terminal reliability ATTR in 4 optical networks.]
Fig. 8 shows that when the R-value is the network efficiency E_G, we can conclude the following: 1) The average Energy Ratio E[η_E] and the average Link Ratio E[η_L] do not always change monotonically as the R-threshold increases. Specifically, for the networks Darkstrand and Funet, the average Link Ratio E[η_L] for greedy recovery decreases slightly with a higher R-threshold, while for the networks Shentel and US_Signal, the average Link Ratio E[η_L] increases when the R-threshold increases from 0.5 to 0.8. In contrast, the average Energy Ratio E[η_E] first increases and then decreases as the R-threshold grows, both for random recovery and greedy recovery, which may imply that an optimal R-threshold for Scenario B exists. 2) Compared with Fig. 7, the average Energy Ratio E[η_E] in Scenario B for the network efficiency is less sensitive than that for ATTR. This reveals that the sensitivity of the recoverability indicators largely depends on the choice of the R-value. 3) For both the average two-terminal reliability ATTR and the network efficiency E_G, greedy recovery achieves a better performance than random recovery for the different R-thresholds. We therefore propose to use the greedy recovery strategy.

Correlation of metrics with recoverability indicators
In this section, we explore the correlation between the recoverability indicators in the random recovery scenario and 10 widely studied network metrics: the average degree E[D], the spectral radius λ_1, the diameter ϕ, the algebraic connectivity μ_{N−1}, the assortativity ρ_D, the average hopcount E[H], the clustering coefficient c_G, the ratio μ_1/μ_{N−1}, the effective graph resistance r_G and the global efficiency E[1/H]. Results are shown in Tables 2 and 3, which are based on 200 optical backbone communication networks from the specialized database [34].
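To make two of these metrics concrete, the sketch below computes the average degree E[D] and the degree assortativity ρ_D for a toy graph in pure Python. The graph is invented, and ρ_D is computed as the Pearson correlation of the degrees at the two ends of each link, with every link counted in both directions, which is the standard definition for undirected graphs.

```python
from math import sqrt

def degrees(nodes, edges):
    d = {v: 0 for v in nodes}
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d

def assortativity(nodes, edges):
    """Degree assortativity rho_D: Pearson correlation of end-point
    degrees over all links, each link counted in both directions."""
    d = degrees(nodes, edges)
    xs, ys = [], []
    for u, v in edges:
        xs += [d[u], d[v]]
        ys += [d[v], d[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
avg_degree = 2 * len(edges) / len(nodes)   # E[D] = 2L / N
print(avg_degree, round(assortativity(nodes, edges), 3))  # → 2.0 -0.667
```

The negative ρ_D of this toy graph means high-degree nodes tend to attach to low-degree nodes, the disassortative pattern common in communication backbones.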
We use the Spearman rank correlation coefficient ρ_s [35] to evaluate the correlation between the recoverability indicators and the 10 network metrics. The Spearman rank correlation coefficient ρ_s is less restrictive than the Pearson correlation coefficient ρ_p, since the latter only estimates the linear correlation between two variables. The Spearman rank correlation coefficient ρ_s measures the strength and direction of the monotonic relationship between two variables and is defined as ρ_s = ρ_p(F_X(X), F_Y(Y)), where F_X(X) and F_Y(Y) are the probability distributions of the variables X and Y, respectively, and ρ_p(F_X(X), F_Y(Y)) is the Pearson correlation coefficient between F_X(X) and F_Y(Y).

Conclusion
This paper proposes a topological approach for evaluating the network recoverability in two scenarios, the link-based Scenario A and the energy-based Scenario B. We assess the recoverability of 20 real-world optical networks for two robustness metrics: the average two-terminal reliability and the network efficiency. All the optical networks have a healthy recovery capability in Scenario B under the random recovery strategy, i.e., the average Energy Ratio E[η_E] > 1, while two of the networks (PionierL1 and RoEduNet) call for topological improvements to their recoverability in Scenario A, i.e., the average Link Ratio E[η_L] < 1. The performance of the recoverability in Scenario B can be explained by the concavity of the R-value as a function of the number of challenges. There is also a strong correlation between the network recoverability and the recovery strategy. The greedy recovery strategy exhibits a good performance for the investigated robustness metrics and thus improves the network recoverability. The network efficiency is less sensitive to different R-value thresholds, while the Energy Ratio E[η_E] for the average two-terminal reliability increases significantly with increasing thresholds in Scenario B. The assortativity has the strongest correlation with both the average Link Ratio and the average Energy Ratio, whether the robustness metric is the average two-terminal reliability or the network efficiency.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
[Table 3: The Spearman rank correlation coefficient ρ_s between 10 network metrics and the two recoverability indicators. The R-value considered here is the network efficiency E_G. Results are based on 200 real-world optical networks.]