Optimizing network robustness via Krylov subspaces

We consider the problem of attaining either the maximal increase or reduction of the robustness of a complex network by means of a bounded modification of a subset of the edge weights. We propose two novel strategies combining Krylov subspace approximations with a greedy scheme and with an interior point method employing either the Hessian or its approximation computed via the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS). The paper discusses the computational and modeling aspects of our methodology and illustrates the various optimization problems on networks that can be addressed within the proposed framework. Finally, in the numerical experiments we compare the performance of our algorithms with state-of-the-art techniques on synthetic and real-world networks.


Introduction
When studying and analyzing a complex network, one of the main questions is how to identify important nodes and robust connections among them, given the network topology and no other external data. There is a broad literature on the subject, with many different models and associated algorithms. As a network can be naturally represented by a matrix, many successful approaches strongly rely on tools from linear algebra and matrix analysis [8,19].
Spectral models, such as eigenvector centrality [42], PageRank [26], or resistance distance [32], are based on the eigenvalues and eigenvectors of graph matrices and rely on a mutually reinforcing argument, while path-based models, such as Katz centrality [43], subgraph centrality and total communicability [17], use the entries of suitable graph matrix functions and are based on weighted walk counts.
For example, if x ≥ 0 is the Perron eigenvector of the adjacency matrix A of an undirected graph G = (V, E), then Ax = λx with λ > 0 and hence x_i ∝ Σ_j A_ij x_j > 0 for all nodes i ∈ V. Thus, we can interpret the entries of x as importance scores for the nodes of G, known as Bonacich centrality or eigenvector centrality [42], where x_i, the importance of node i, is mutually reinforced by the importance of its neighbors. Similarly, if we are given a function of the adjacency matrix

f(A) = a_0 I + a_1 A + a_2 A^2 + a_3 A^3 + · · ·    (1)

where the coefficients a_k are nonnegative, we can interpret the diagonal entries of f(A) as node importances. In fact, in a weighted graph G, the weight of a walk from i to j of length k can be defined as A_{u_0 u_1} A_{u_1 u_2} · · · A_{u_{k−1} u_k} > 0, where all pairs u_i u_{i+1} are edges and u_0 = i, u_k = j. Thus, the sum of the weights of all the walks of length k from i to j corresponds to (A^k)_{ij}, and the diagonal entry f(A)_{ii} defines the so-called f-centrality or subgraph centrality score of the node i [18], which corresponds to the weighted sum of all the walks of any length starting from i and returning to i, i.e., the subgraphs containing i. Related to the individual centrality of a node are important notions of network robustness and network connectivity, which can be quantified by the summations Σ_i x_i, Σ_i f(A)_{ii} and Σ_{ij} f(A)_{ij}, respectively (see e.g. [7,16]). These quantities measure the degree of resiliency of a network in the face of accidental failures or deliberate attacks, modeled as edge modification, removal, or insertion. Both spectral and path-based scores can thus serve as objective functions when seeking to optimize a network's structure.
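As an illustration of the two families of scores, the following sketch (a hypothetical toy example, not from the paper; NumPy/SciPy in place of the authors' MATLAB code) computes the eigenvector centrality and the exponential subgraph centrality f(A)_ii with f(z) = e^z on a small graph.

```python
import numpy as np
from scipy.linalg import expm

# Toy undirected graph: a 4-cycle with one chord (edge 0-2).
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = A[j, i] = 1.0

# Eigenvector (Bonacich) centrality: Perron eigenvector of A.
w, V = np.linalg.eigh(A)          # ascending eigenvalues
x = np.abs(V[:, -1])              # eigenvector of the largest eigenvalue
x /= x.sum()

# Subgraph centrality with f(z) = exp(z): diagonal of expm(A).
sc = np.diag(expm(A))

# Nodes 0 and 2 have degree 3 and rank highest under both scores.
print(x, sc)
```

Both scores agree on this graph: the two degree-3 endpoints of the chord are the most central nodes.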

Related work
Optimizing network robustness or network connectivity is in general very challenging, due to the combinatorial nature of the problem. A large body of work has focused on spectral-based scores. The problem of minimizing the largest eigenvalue (spectral radius) of A by a small number of edge and node removals is considered in [41,37,45]. This is shown to be an NP-hard problem, which is addressed by a number of heuristics in [41,37] or via a semidefinite program with polynomial time complexity in [45]. A similar problem is considered in [39,31], with the aim of optimizing the network diffusion rate. The works [24,44] studied the problem of maximizing the algebraic connectivity, i.e., the second smallest eigenvalue of the graph Laplacian, and proposed both a convex relaxation-based method and a greedy perturbation heuristic based on the entries of the Fiedler eigenvector of the initial network. The work [6] studies the problem of modifying network edges to reduce external influence. This is done by controlling the asymptotic consensus value x^T a, where x is the eigenvector centrality, i.e., the Perron eigenvector of A, and a is a vector of external user consensus coefficients. The eigenvector centrality x is also the subject of [35], where it is observed that, often, modifying a very small subset of edges of a real-world network is enough to drastically change, and thus control, the eigenvector centrality value of any node in the network. The Perron eigenvector of the PageRank matrix, the so-called PageRank or random walk centrality, is instead the subject of [23]. Alongside spectral-based coefficients, other network scores have been considered by several authors. For example, [34] deals with the problem of improving both coverage and betweenness centralities by adding a small set of edges to the network. Greedy algorithms for improving coverage and closeness centralities are proposed in [13] and [12], respectively.
Centrality optimization problems for indices defined by means of matrix functions are considered for instance in [25,2,7]. These works target the optimization of a number of robustness and connectivity coefficients of the network by modifying, adding, or removing a small subset of edges. In [25], a semidefinite program-based approach is proposed for the optimization of the total effective resistance, defined as Tr(L^+) = Σ_i (L^+)_{ii}, where L^+ is the pseudoinverse of L. In [2,7], instead, given a suitable function f, a number of heuristics are proposed to efficiently enhance the network natural connectivity, defined as f^{-1}(Tr(f(A))/n), and the network total communicability 1^T f(A) 1 = Σ_{ij} f(A)_{ij}, respectively. Both studies show that very good results can be achieved by modifying edges between nodes with high or low centrality values.
Building on top of this body of work, we focus here on the optimal modification of the network's natural f -connectivity. In the sequel, we formalize the problem and the algorithms we propose.

Optimizing the natural connectivity
A key property of a network is its robustness, i.e., its ability to maintain a high degree of connectivity when a portion of the network's structure is damaged or simply altered. An intuitive notion of graph robustness can be expressed in terms of the redundancy of routes between vertices. If we consider a source vertex and a termination vertex, there may be several paths between them. When one path fails, the two vertices can still communicate via alternative routes. Intuitively, the larger the number of alternative routes, the more robust the connection between the two vertices. Thus, an ideal measure of robustness for a network would be the degree of redundancy of alternative paths, i.e., the number of alternative routes of different lengths for all pairs of vertices. However, this number is very difficult to compute.
An alternative definition of robustness, usually called "natural connectivity", counts instead the number of closed walks of any length. Let G = (V, E) be an undirected, possibly weighted graph with V = {1, . . . , n} and entrywise nonnegative symmetric adjacency matrix A ≥ 0, such that A_ij > 0 if and only if ij ∈ E. As the number of closed walks of length k from i to itself coincides with the i-th diagonal entry of the k-th power of the adjacency matrix, we can quantify the natural connectivity by looking at

ϑ(A) = log( (1/n) Σ_{k≥0} Tr(A^k)/k! ) = log( (1/n) Σ_{i=1}^n e^{λ_i} ),

where λ_1 ≤ · · · ≤ λ_n are the eigenvalues of the adjacency matrix A. The scaling factor 1/k! is required here in order to have a convergent series, while the logarithm and the scaling factor 1/n are used to avoid very large numbers, as they yield an "average" of the eigenvalues of the adjacency matrix. More generally, we can consider the natural f-connectivity (f-connectivity, in short) as the generalized f-mean of the eigenvalues of A,

ϑ(A) = f^{-1}( (1/n) Σ_{i=1}^n f(λ_i) ) = f^{-1}( Tr(f(A))/n ),

where f is a real-valued, increasing, and analytic function on a set containing the spectrum of A.
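For concreteness, a minimal sketch (illustrative, with f(z) = e^z) of the natural connectivity ϑ(A) = log(Tr(e^A)/n), together with its monotonicity under edge addition:

```python
import numpy as np

def natural_connectivity(A):
    # theta(A) = log( (1/n) * sum_i exp(lambda_i) ) = log(Tr(exp(A)) / n)
    lam = np.linalg.eigvalsh(A)
    n = A.shape[0]
    return np.log(np.exp(lam).sum() / n)

# Monotonicity: adding an edge increases theta.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:       # path graph on 4 nodes
    A[i, j] = A[j, i] = 1.0
t0 = natural_connectivity(A)
A[0, 3] = A[3, 0] = 1.0                     # close the path into a cycle
t1 = natural_connectivity(A)
print(t0, t1)                               # t1 > t0
```

Note that ϑ of an empty graph is 0, since all eigenvalues vanish and the generalized mean of e^0 is 1.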
As f is increasing, it is not difficult to see that ϑ(A) itself changes monotonically with the edges of the graph: ϑ(A) grows if edges are added, and decreases if they are removed. In the following, we assume we are given a budget k representing the number of edges, or the cumulative edge weight, that can be either removed from or added to the graph. We then consider the optimization problem of using the given budget to either reduce or increase ϑ(A) the most.
In matrix terms, we can formulate the corresponding optimization problem as follows. Assume we are given the initial graph with adjacency matrix A. We want to find a modification X of the network edges that either maximizes or minimizes the function ϑ(A + X), subject to suitable constraints on X which account for the budget and for whether we are removing, adding, or modifying the weight of the edges, as detailed next. The constraints on X also depend on whether we are considering weighted or unweighted (binary) networks. To summarize, we consider the following three classes of optimization problems.

Edge downgrading
Let us assume that we are given a budget k and we want to remove, or diminish the weight of, the edges that yield the greatest decrease in f-connectivity. Given the graph G = (V, E), we then consider the corresponding set of admissible modifications. The downgrading problem for unweighted graphs, more often referred to as the edge breaking problem [7], is denoted by (DG); for weighted graphs, the second constraint is replaced accordingly, yielding problem (DG').

Edge addition
In this setting, we consider the situation where new edges may be introduced in order to increase the f-connectivity of the network. In this case, given a budget k, the set of admissible modifications is supported on the non-existing edges. For unweighted graphs, we obtain the optimal edge addition problem (AD). To avoid trivial solutions, where all the budget is spent on a single most important edge, when dealing with weighted networks we further assume we are given a set of maximum weight values U_ij that we are allowed to spend on each edge; this yields problem (AD').

Edge tuning
Finally, in the third problem, we are given the budget k and a weighted graph G, and we look for a modification of the edge weights of a limited set F ⊆ E of the existing edges in order to obtain the largest increase in f-connectivity. We will also consider the case where F includes both existing and non-existing edges, to address the scenario where the creation of new links is also allowed; we call this slightly modified problem edge rewiring. As for (AD'), we assume a set of maximum weight values U_ij is given, to avoid trivial solutions; this yields problem (TU).

Algorithmic set up
Before moving on to the proposed algorithmic techniques, we make several preliminary remarks. First, we note that, since f in the definition of ϑ is an increasing function, so is f^{-1}, and minimizing (resp. maximizing) the natural f-connectivity is equivalent to minimizing (resp. maximizing) the trace variation ϕ_A(X) := Tr(f(A + X)) − Tr(f(A)) with respect to X.
Second, we observe that the dimensions of the constraint sets that involve all the existing (or non-existing) edges in the graph are usually very large, already for graphs of moderate size. For this reason, in the rest of the paper we further restrict the optimization problems above to a subset of the edges (or non-existing edges) F whose elements are cleverly selected and whose size is kept under control.
The selection of a suitable F may depend on the problem at hand; we call this procedure "the search space selection", and it will be discussed case by case in Sections 3.2 and 6. Note that, in real applications, further constraints on the set of modifiable edges (or non-existing edges) may be imposed by the application set-up; for example, one may have access only to a certain part of the network (think of a street network where most of the roads may not be modifiable). Imposing such additional problem-based constraints requires only straightforward modifications of the above optimization problems.

Edge downgrading and addition for unweighted graphs
In this section we propose some heuristic greedy procedures for addressing the optimization problems (DG) and (AD). We begin by describing the general greedy template that is behind our method and other algorithms proposed in the literature. Throughout the discussion we assume to have a budget of k edges.

The greedy paradigm
The most intuitive greedy strategy for problem (DG) (resp. (AD)) consists of sequentially removing (resp. adding) the edge that attains the largest reduction (resp. increase) of ϕ_A until k deletions (resp. additions) are performed. Usually, the identification of the jth edge to be either added or removed is made by evaluating or approximating the variation of ϕ_A on a large number of candidate edges. Even in the case of an exhaustive search of candidates over the whole edge set (or the whole set of missing edges, in the case of (AD)), this greedy procedure is guaranteed to return the optimal solution only for k = 1; on the other hand, when k > 1, we expect that the selected set of k edges provides a significant modification of ϕ_A.
When dealing with medium to large networks, the implementation of this greedy procedure poses two major computational issues: (i) The large number of edges in the search space to be processed in each step, (ii) The cost of evaluating (or approximating) the cost function ϕ A .
Concerning (i), we remark that, when the graph is sparse, an exhaustive search would require considering O(n) edges for problem (DG) and O(n^2) edges for problem (AD). When such sets are large, this step can be prohibitively expensive. This is circumvented by restricting the search space for the jth edge to an appropriate subset F_j of moderate size; in the case of (DG), F_j is drawn from the existing edges. Similarly, task (ii) involves f(A) and f(A + X) but cannot be addressed by directly forming these matrix functions, as this is either inconvenient or infeasible for large matrices. Even for small to medium size matrices, as X changes at each greedy step, computing f(A + X) from scratch each time would be prohibitively expensive. Efficient greedy methods make use of techniques that approximate the variation ϕ_A at a reduced computational cost.
In Algorithm 1 we present a general scheme for the above greedy strategy in the case of (DG). The analogous algorithm for (AD) is obtained with straightforward modifications at lines 7 and 8, by changing the sign of the rank-2 update and reversing the inequality for δ_opt, which has to be initialized to −∞. In the next two subsections we present our proposed strategies for addressing the two points (i) and (ii) above. In particular, we propose the use of a Krylov subspace-based approach for approximating the variation ϕ_A, which guarantees an accurate approximation at a computational cost of O(n).
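To make the template concrete, here is a deliberately naive instance of the greedy scheme for (DG) that recomputes Tr(exp(A + X)) exactly for every candidate edge. It is only feasible for tiny graphs, which is precisely the cost issue (i)-(ii) described above (an illustrative sketch; names are ours, not from the paper's code).

```python
import numpy as np
from scipy.linalg import expm

def greedy_break_exact(A, k):
    """Greedy edge downgrading with exhaustive search and exact trace
    evaluation: at each of the k steps, delete the edge whose removal
    decreases Tr(exp(A)) the most. O(|E|) expm calls per step."""
    A = A.copy()
    removed = []
    n = A.shape[0]
    for _ in range(k):
        base = np.trace(expm(A))
        best, best_delta = None, np.inf
        for i in range(n):
            for j in range(i + 1, n):
                if A[i, j] == 0:
                    continue
                B = A.copy()
                B[i, j] = B[j, i] = 0.0        # rank-2 downdate of A
                delta = np.trace(expm(B)) - base   # < 0: connectivity drops
                if delta < best_delta:
                    best, best_delta = (i, j), delta
        i, j = best
        A[i, j] = A[j, i] = 0.0
        removed.append((i, j))
    return A, removed

# Triangle {0,1,2} plus the pendant edge (2,3): the first deletion picks an
# edge incident to the hub node 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
A_new, removed = greedy_break_exact(A, 1)
print(removed)
```

Each step costs one expm per candidate edge, which motivates both the restricted search spaces F_j and the cheap trace-update approximations discussed next.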

Selection of the search spaces
The strategy for selecting the sets F_j has to ensure a feasible size of the search space and that the most meaningful edges are considered. Intuitively, the second requirement is the trickiest as, due to the combinatorial nature of (DG) and (AD), only an exhaustive search space can guarantee it. The latter choice might be computationally viable for problem (DG), where each F_j has at most O(n) edges, assuming the initial graph is sparse. If n is moderate and the cost of evaluating ϕ_A(X) is at most linear in n, then we consider the following search spaces, with Chosen(j) := {edges selected in the first j steps of Algorithm 1}. When strategy (S^full_DG) is too expensive, an alternative is to define a ranking on the set of edges to heuristically identify the most important ones. Here we propose to rank the edges on the basis of the eigenvector centrality scores of the nodes they connect, as these scores are cheap to evaluate for sparse graphs. More specifically, given two edges (v_1, v_2) and (v_3, v_4), we consider the following two rankings ≤_1 and ≤_2 on V × V:

(v_1, v_2) ≤_1 (v_3, v_4)  ⟺  eigc(v_1) eigc(v_2) ≤ eigc(v_3) eigc(v_4),
(v_1, v_2) ≤_2 (v_3, v_4)  ⟺  min{eigc(v_1), eigc(v_2)} ≤ min{eigc(v_3), eigc(v_4)},

where eigc(v) denotes the eigenvector centrality of node v ∈ V, i.e., the v-th entry x_v of the Perron eigenvector x of the adjacency matrix. The ordering ≤_1 is a standard way of inferring centralities for edges from the node scores [2,40]. However, ≤_1 may still assign large importance to edges that connect nodes with small centrality; this is prevented by ≤_2, which thresholds the edge score by the smallest node centrality involved. We observe that ≤_2 works better in practice, as shown in the numerical experiments in Section 5.
Finally, given a subset of edges F ⊆ V × V and a positive integer q, we denote by [F]^{≤_i}_q the subset of F made of its q largest elements according to ≤_i, i = 1, 2. The following selection strategies maintain a search space of size q at each step of Algorithm 1, where we use the subscripts DG and AD to emphasize that the corresponding strategy is meant for problem (DG) and (AD), respectively.
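A possible realization of the min-based ranking can be sketched in a few lines (illustrative code, assuming the edge score under ≤_2 is the smaller of the two endpoint centralities, as described above; names are ours):

```python
import numpy as np

def top_q_edges(A, edges, q):
    """Rank candidate edges by min{eigc(u), eigc(v)} (a min-based ordering
    in the spirit of <=_2) and return the q top-ranked ones."""
    w, V = np.linalg.eigh(A)          # ascending eigenvalues
    x = np.abs(V[:, -1])              # Perron eigenvector of A
    return sorted(edges, key=lambda e: min(x[e[0]], x[e[1]]), reverse=True)[:q]

# Path graph 0-1-2-3-4: the central edges (1,2) and (2,3) rank highest,
# since both their endpoints have large eigenvector centrality.
A = np.zeros((5, 5))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
print(top_q_edges(A, edges, 2))
```

The cost is dominated by one (sparse) eigenvector computation plus a sort of the candidate edges, which is what makes the ranking cheap for sparse graphs.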
Finally, we describe an additional selection strategy for (AD), proposed in [7], which we will use as a benchmark for comparison in our experiments. Let d be the maximum node degree of the graph and denote by V_d ⊆ V the set of d nodes of largest degree. Then, the selection strategy (S^3_AD) uses the missing edges contained in V_d × V_d. Note that strategy (S^3_AD) only ensures that the search space has cardinality bounded from above by d^2; this might be a very weak property for certain graph topologies, as |F_j| can be very small.

Updating the trace of f (A)
The main computational effort of Algorithm 1 comes from evaluating ϕ_A(X) at line 8. Note that the matrix X at that step of the algorithm is symmetric and has rank 2. Leveraging this key rank property, we can devise a method of cost O(n) for computing the variation ϕ_A(X), based on the Krylov subspace method in [4]. We start by describing the proposed Krylov method in Section 3.3.1; then, in Section 3.3.2 we recall another approximation of ϕ_A that has been previously used in the literature and that will serve as a baseline for comparison later.

A Krylov projection method
Let A be a symmetric adjacency matrix, X a symmetric low-rank modification, and f(z) a scalar function. In [4] it has been proved that, under mild assumptions, the matrix ∆f := f(A + X) − f(A) has low numerical rank and can be approximated by means of Krylov subspaces. We will see that, with some minor modifications, this also allows us to cheaply approximate Tr(∆f) = Tr(f(A + X)) − Tr(f(A)).
Let us assume X = U_X B_X U_X^* with U_X ∈ R^{n×s}, B_X = B_X^* ∈ R^{s×s}, and denote by K_m(A, U_X) the m-th order Krylov subspace generated by A and the block vector U_X:

K_m(A, U_X) := span{U_X, A U_X, . . . , A^{m−1} U_X}.

If m steps of the Arnoldi process on A and U_X can be carried out without breakdowns, then it returns an orthonormal basis U_m = [U_1 | . . . | U_m] ∈ R^{n×ms} of K_m(A, U_X) which verifies the following block Arnoldi relation [28]:

A U_m = U_m H_m + U_{m+1} H_{m+1,m} E_m^T,    (2)

with a ms × ms block tridiagonal matrix H_m, an s × s matrix H_{m+1,m}, and E_m^T = [0 | · · · | 0 | I_s] ∈ R^{s×ms}, where I_s denotes the s × s identity matrix. An approximation of ∆f is given by

∆_m f := U_m [ f(H_m + W_m B_X W_m^*) − f(H_m) ] U_m^*,

where W_m := U_m^* U_X ∈ R^{ms×s}. The algorithm proposed in [4] builds, incrementally in m, the Arnoldi relations (2) and the corresponding quantities ∆_m f. The method stops when the heuristic stopping criterion ‖∆_m f − ∆_{m−ℓ} f‖ ≤ ε is satisfied, for a prescribed tolerance ε and a positive integer ℓ. In our implementation we set ℓ = 2.
Concerning the approximation error, the method is exact when f(z) is a low degree polynomial; more precisely, ∆f = ∆_m f when f ∈ P_{m−1}, where P_{m−1} denotes the set of polynomials of degree at most m − 1. For a more general f, the error norm is linked to the best polynomial approximation of f on a set Π containing the convex hull of the spectrum of A and A + X [4]. We remark that, if the goal is to approximate Tr(∆f), then we can avoid the evaluation of matrix functions altogether. Indeed, for computing Tr(∆_m f) = Tr(f(H_m + W_m B_X W_m^*)) − Tr(f(H_m)) it is sufficient to retrieve the eigenvalues of the small symmetric matrices H_m and H_m + W_m B_X W_m^*, and then apply the function f to them. Moreover, a tighter approximation bound is obtained for this particular case [11, Theorem 3]. We report the pseudocode of the procedure for approximating the variation Tr(f(A + X)) − Tr(f(A)) in Algorithm 2.
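The projection scheme can be sketched in a few lines (illustrative NumPy code with full reorthogonalization instead of the short-recurrence block Lanczos of Algorithm 2; only the eigenvalues of the small projected matrices are used, as described above). The rank-2 update below encodes the deletion of a single edge.

```python
import numpy as np
from scipy.linalg import expm

def trace_update_krylov(A, U_X, B_X, f, m):
    """Approximate Tr(f(A + U_X B_X U_X^T)) - Tr(f(A)) by projecting A onto
    the block Krylov space K_m(A, U_X). Full reorthogonalization is used for
    numerical robustness (a sketch, not the paper's implementation)."""
    Q, _ = np.linalg.qr(U_X)
    basis = [Q]
    for _ in range(m - 1):
        Wk = A @ basis[-1]
        for _ in range(2):                    # two Gram-Schmidt passes
            for Qi in basis:
                Wk -= Qi @ (Qi.T @ Wk)
        Qk, _ = np.linalg.qr(Wk)
        basis.append(Qk)
    U = np.hstack(basis)                      # orthonormal basis of K_m(A, U_X)
    H = U.T @ A @ U                           # projected A
    W = U.T @ U_X
    lam1 = np.linalg.eigvalsh(H + W @ B_X @ W.T)
    lam0 = np.linalg.eigvalsh(H)
    return f(lam1).sum() - f(lam0).sum()

# Random sparse symmetric adjacency matrix (a ring plus random chords,
# so the graph is connected); delete the edge (0, 1).
rng = np.random.default_rng(0)
n = 60
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
extra = np.triu(rng.random((n, n)) < 0.03, 1)
A = np.minimum(A + extra + extra.T, 1.0)
U_X = np.zeros((n, 2))
U_X[0, 0] = U_X[1, 1] = 1.0
B_X = np.array([[0.0, -1.0], [-1.0, 0.0]])    # X = U_X B_X U_X^T removes (0, 1)
est = trace_update_krylov(A, U_X, B_X, np.exp, m=20)
exact = np.trace(expm(A + U_X @ B_X @ U_X.T)) - np.trace(expm(A))
print(est, exact)
```

For this example the projected estimate matches the dense reference to many digits with only a 40-dimensional subspace, while never forming f of an n × n matrix.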

Approximation via eigendecomposition update
The algorithm make it or break it (MIOBI) proposed in [7] approximates the difference of traces by means of a first order approximation of the largest eigenpairs of A + X. More specifically, given a positive integer h, the procedure starts by computing the eigenpairs (λ_1, u_1), . . . , (λ_h, u_h) of A corresponding to the h eigenvalues of largest magnitude. For each X, the authors of [7] observe that the dominant h eigenpairs (λ̃_j, ũ_j) of A + X can be written as

λ̃_j = λ_j + u_j^T X u_j + O(‖X‖^2),    ũ_j = u_j + Σ_{i≠j} ( (u_i^T X u_j)/(λ_j − λ_i) ) u_i + O(‖X‖^2).

Thus, it is proposed to take the first-order terms as approximations of (λ̃_j, ũ_j), i.e., to neglect the high order terms O(‖X‖^2). The resulting procedure is of the same form as Algorithm 1, with two main modifications: at line 8 the formula Σ_{j=1}^h [f(λ̃_j) − f(λ_j)] is used to approximate the trace update Tr(f(A + X)) − Tr(f(A)); then at line 15 both formulas for λ̃_j, ũ_j are used to approximate the h dominant eigenpairs of A + X_opt. Overall, this yields an algorithm with an iteration cost of O(|F_j| h + n h^2).
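A sketch of the first-order eigenvalue update underlying this heuristic (illustrative; for simplicity it uses all n eigenpairs instead of the h dominant ones, and f = exp), validated against the exact variation for a small perturbation:

```python
import numpy as np
from scipy.linalg import expm

def miobi_trace_update(lams, U, X, f=np.exp):
    """First-order MIOBI-style estimate of Tr(f(A+X)) - Tr(f(A)):
    each lam_j is shifted by u_j^T X u_j and O(||X||^2) terms are dropped."""
    corr = np.einsum('ij,ik,kj->j', U, X, U)   # u_j^T X u_j for every j
    return f(lams + corr).sum() - f(lams).sum()

rng = np.random.default_rng(1)
n = 12
A = np.triu((rng.random((n, n)) < 0.3).astype(float), 1)
A = A + A.T
A[0, 1] = A[1, 0] = 1.0            # make sure edge (0, 1) exists
lams, U = np.linalg.eigh(A)

eps = 1e-3                         # small downgrade of edge (0, 1)
X = np.zeros((n, n))
X[0, 1] = X[1, 0] = -eps
est = miobi_trace_update(lams, U, X)
exact = np.trace(expm(A + X)) - np.trace(expm(A))
print(est, exact)
```

Since the neglected terms are O(‖X‖²), the estimate is accurate for small perturbations but degrades for larger updates, which is the trade-off against the Krylov approach above.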

Algorithms for edge downgrading and edge addition
We are now ready to formally introduce the methods that we propose for solving (DG) and (AD): greedy Krylov break Algorithm 1 combined with trace fun update (Algorithm 2) for evaluating the difference of traces at line 8, using the strategy (S^2_DG) for selecting the sets F_j.
greedy Krylov make Algorithm 1 combined with trace fun update (Algorithm 2) for evaluating the difference of traces at line 8, using the strategy (S^2_AD) for selecting the sets F_j. Moreover, to compare with the performance of state-of-the-art greedy schemes, we consider the following methods: miobi Greedy method proposed in [7] that uses the eigendecomposition update of Section 3.3.2 for evaluating the difference of traces and the selection strategies (S^full_DG) and (S^3_AD) for (DG) and (AD), respectively.
eigenv Method proposed in [2] that consists in deleting or adding the k edges with the largest eigenvector centrality scores -with respect to ≤ 1 -in E and V × V \ E, respectively.

Edge downgrading, addition, and tuning for weighted graphs
When considering the solution of (DG'), (AD'), and (TU), one needs to deal with a constrained continuous optimization problem involving the objective function ϕ_A(X). Similarly to what has been done for unweighted graphs in Section 3, we keep the size of the problem under control by imposing that we are allowed to modify only a subset F of the edges (or of the missing edges), with cardinality n_F. With this constraint, ϕ_A can be seen as a function of n_F variables, ϕ_A : R^{n_F} → R, which correspond to the variations of the weights of the edges in F. In particular, the matrix X has rank bounded by 2n_F, so that we can efficiently evaluate ϕ_A(X) via Algorithm 2, as long as 2n_F ≪ n. We perform the efficient optimization of ϕ_A via a tailored implementation of the limited-memory BFGS algorithm (L-BFGS), a quasi-Newton method that iteratively computes an approximation of the Hessian via rank-2 correction updates and thus only requires the evaluation of the objective function and its gradient. Function updates are computed by means of Algorithm 2 as before. The gradient computation, instead, requires additional analysis, as it can be prohibitively expensive if done in a naive way. We devote the remainder of this section to designing a strategy, presented in Algorithm 4, to efficiently evaluate the gradient of ϕ_A during the iterations of L-BFGS. The resulting procedure, obtained by combining L-BFGS, Algorithm 2, and Algorithm 4, is denoted Krylov lbfgs. Our implementation of Krylov lbfgs relies on the Matlab function fmincon, which allows us to select L-BFGS as solver and to specify function handles for the evaluation of the objective function and of the gradient.

Gradient approximation via Krylov methods
We now look at the gradient of ϕ_A. Let us denote by ind : F → {1, . . . , n_F} an ordering map on the set F, and observe that the derivative of ϕ_A with respect to the variable x_{ind(i,j)} associated with the edge (i, j) ∈ F is

∂ϕ_A/∂x_{ind(i,j)} = Tr( L_f(A + X, 1_i 1_j^T + 1_j 1_i^T) ) = 2 Tr( L_f(A + X, 1_i 1_j^T) ),    (4)

where L_f(M, E) denotes the Fréchet derivative of f at M in the direction E, and 1_i is the i-th vector of the canonical basis. Note that all the entries of the gradient in (4) involve evaluating the action of the Fréchet derivative on a rank 1 matrix. The latter property implies the low-rank approximability of (4), which in turn enables us to leverage an efficient Krylov subspace technique [30], as discussed next. The main computational effort in evaluating (4) comes from the quantities of the form L_f(M, 1_i 1_j^T), for a given symmetric matrix M. The evaluation of the Fréchet derivative in a certain direction can be recast as evaluating the function of a specific augmented matrix. More precisely, applying the well-known formula in [33, Theorem 2.1] to our framework yields

f( [ M  1_i 1_j^T ; 0  M ] ) = [ f(M)  L_f(M, 1_i 1_j^T) ; 0  f(M) ],    (5)

so that we can look at extracting the (1, 2) sub-block of (5). Projecting onto a Krylov subspace directly provides the expression of the projected augmented matrix. Thus, the method computes, incrementally in m, the quantities Tr(L^{(i,j)}_{f,m}) approximating Tr(L_f(M, 1_i 1_j^T)), and stops as soon as the criterion |Tr(L^{(i,j)}_{f,m}) − Tr(L^{(i,j)}_{f,m−ℓ})| ≤ ε is verified, for a prescribed tolerance ε and a positive integer ℓ. In our implementation we set ℓ = 2. The full procedure is reported in Algorithm 3.
We point out that the approximation error associated with the sequence Tr(L^{(i,j)}_{f,m}), m = 1, 2, . . ., decays at least as fast as the best polynomial approximation error of f on the convex hull of the spectrum of M, which we denote by Π; this is a direct consequence of [30, Corollary 1].
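The augmented-matrix formula is easy to check numerically. For f = exp, the identity Tr(L_f(M, 1_i 1_j^T)) = Tr(f'(M) 1_i 1_j^T) = (e^M)_{ji} gives an exact reference value (a small dense sketch, with names of our choosing; Algorithm 3 replaces the expm of the 2n × 2n matrix by projected Krylov quantities):

```python
import numpy as np
from scipy.linalg import expm

def frechet_trace_exp(M, i, j):
    """Tr(L_exp(M, e_i e_j^T)) via the (1,2) block of exp of the augmented
    matrix [[M, e_i e_j^T], [0, M]]  (dense reference computation)."""
    n = M.shape[0]
    E = np.zeros((n, n))
    E[i, j] = 1.0
    Aug = np.block([[M, E], [np.zeros((n, n)), M]])
    L = expm(Aug)[:n, n:]          # (1,2) block = L_exp(M, E)
    return np.trace(L)

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
M = (M + M.T) / 2                  # symmetric test matrix
val = frechet_trace_exp(M, 1, 3)
ref = expm(M)[3, 1]                # Tr(exp'(M) e_i e_j^T) = (e^M)_{ji}
print(val, ref)
```

The check uses that the trace of the Fréchet derivative satisfies Tr(L_f(M, E)) = Tr(f'(M) E), and that exp is its own derivative.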

Multiple evaluations of Tr(L_f(A, 1_i 1_j^T))
Evaluating (4) requires approximating the quantities L_f(A, 1_i 1_j^T) for all edges (i, j) ∈ F, with |F| = n_F ≪ n. In principle, running Algorithm 3 n_F times (once for each pair (i, j) ∈ F) performs the sought evaluation. On the other hand, it is possible to enhance the efficiency by avoiding redundant computations due to the repetition of the same nodes in the edges of F, and thus of the same Krylov subspaces. Denote by V(F) the set of nodes that are linked by the edges in F, i.e., V(F) := {i ∈ V : ∃ j ∈ V such that (i, j) ∈ F}, and for any such node i ∈ V(F) let V_i(F) be the set of nodes that are connected to i via an edge in F, i.e., V_i(F) := {j ∈ V : (i, j) ∈ F}. We proceed as follows: (i) for each i ∈ V(F) we compute and store, incrementally in m, the Arnoldi relation for K_m(A, 1_i); (ii) while doing (i), for each pair (i, j) ∈ F we store the traces Tr(L^{(i,j)}_{f,m}) and mark the pair as converged as soon as |Tr(L^{(i,j)}_{f,m}) − Tr(L^{(i,j)}_{f,m−ℓ})| ≤ ε. Note that the final degree m^{(i,j)} for the pair (i, j) satisfies m^{(i,j)} ≤ m_i, which may yield a cheaper trace evaluation for that particular (i, j) ∈ F. The procedure which implements these enhancements is reported in Algorithm 4.

Numerical experiments with unweighted graphs
We test the performance of greedy Krylov break and greedy Krylov make with respect to their effectiveness in manipulating the graph natural connectivity, i.e. f(z) = e^z, and their running time on 22 real-world unweighted networks. Details about the networks' size are reported in Table 1. Those listed on the left-hand side of Table 1 include social networks of geolocated reciprocated Twitter mentions within UK cities (Cardiff, Edinburgh), coauthorship networks (ca-AstroPh, ca-CondMat, ca-HephTh, netscience), a protein-protein interaction network (yeast) and a public transport network (London). All these networks are available via public repositories, as reported in [9,27,3,38]. All the networks listed on the right-hand side of Table 1 are road networks of different cities in the world [22]. Our implementation is written in MATLAB and is available at the public repository https://github.com/COMPiLELab/krylov_robustness, together with all the datasets above.
In the proposed experiments we compare with the state-of-the-art methods miobi and eigenv, recalled in Section 3.4. In particular, miobi uses 25 eigenpairs to compute the approximate trace variation as described in Section 3.3.2, and the search spaces (S^full_DG) and (S^3_AD) for problems (DG) and (AD), respectively. In all our tests, we consider the matrix exponential function f = exp.
To assess the impact of the various methods on the natural connectivity of a network, we consider the magnitude of the relative trace variation that, given the returned modification X of the adjacency matrix, we define as

∆T := |Tr(f(A + X)) − Tr(f(A))| / Tr(f(A)).

Finally, to evaluate the scalability of the approaches we report their computational time in seconds.
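For reference, the score can be computed directly on small instances (an illustrative dense implementation with f = exp; the experiments use the Krylov approximations instead):

```python
import numpy as np
from scipy.linalg import expm

def relative_trace_variation(A, X):
    """Magnitude of the relative trace variation
    |Tr(exp(A + X)) - Tr(exp(A))| / Tr(exp(A))."""
    t0 = np.trace(expm(A))
    return abs(np.trace(expm(A + X)) - t0) / t0

# Removing all three edges of a triangle (eigenvalues 2, -1, -1):
A = np.ones((3, 3)) - np.eye(3)
dT = relative_trace_variation(A, -A)
print(dT)
```

For the triangle, Tr(exp(A)) = e^2 + 2 e^{-1}, so the value above equals 1 − 3/(e^2 + 2 e^{-1}).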
The experiments have been run on a single AMD Opteron(tm) Processor (6376) with 8 cores and a single thread, running at 2.20 GHz, using MATLAB R2021a.

Downgrading for unweighted graphs
As a first experiment, we measure the quantity ∆T when solving problem (DG) with a fixed budget of k = 50 edges. The parameter q, used by the method greedy Krylov break to determine its search space, is set to the value min{1000, |E|}.
The performance of greedy Krylov break, miobi, and eigenv is compared over both road and general networks. The results reported in Table 2 show that greedy Krylov break and miobi always outperform eigenv. In addition, our greedy Krylov break achieves the best score on 9 out of 11 road networks. Also for general graphs, greedy Krylov break and miobi provide the best scores, although the results reported in Table 3 picture a balanced situation: in 7 out of 11 case studies, the difference between the scores of the methods is less than 2%. The most evident gain of the top method is measured on the medium-size graph ca-HephTh and the small graph netscience. In view of the significantly lower costs of miobi and eigenv (see the next section), these results suggest that greedy Krylov break is a valid competitor mainly on the road networks dataset.

Trace reduction and scalability with respect to the budget size
Then, we consider a second numerical test where we let the budget size k range in the set {10j : j = 1, . . . , 10}, and we measure both the relative trace variation and the time consumption of the methods. Further, we investigate how the parameter q affects the performance of greedy Krylov break by considering three implementations of this method, for q = 50, 250, min{1000, |E|}. As case studies, we select 6 road networks: Anaheim, Birmingham, ChicagoRegional, Hawaii, RhodeIsland, and Rome. Figure 1 reports the magnitude of the relative trace variations attained by the five methods as the budget increases. The method greedy Krylov break with the largest search space attains the highest scores on all the examples apart from Birmingham, where the returned trace variation is comparable with that of miobi. There is no clear winner between miobi and greedy Krylov break with q = 250, while greedy Krylov break with q = 50 and eigenv always provide the 4th and 5th scores.
The computational times shown in Figure 2 confirm that the cost of all algorithms scales linearly with respect to the parameter k. Their dependence on n is also linear, but the hidden constants determine significantly different running times. In particular, in all case studies the three implementations of greedy Krylov break are the most expensive, followed by miobi and, finally, eigenv, which is the cheapest method. As expected, reducing the parameter q improves the timings of greedy Krylov break; however, in view of the scores in Figure 1, the convenience of a smaller search space is questionable. Overall, these results suggest that greedy Krylov break is preferable in scenarios where the robustness reduction matters more than the computing time.

Addition for unweighted graphs
Here we consider tests analogous to those performed in the previous section, for the optimization problem (AD). This time, we compare miobi and eigenv with our greedy Krylov make with q = min{1000, |E|}. Tables 4 and 5 report the magnitude of the relative trace variation, obtained with k = 50, for road and general networks, respectively. For all road networks, greedy Krylov make is the clear-cut winner and outperforms the second-highest score by a factor between 1.5 and 5. For general networks, greedy Krylov make obtains either the best or a near-best score on 10 out of 11 examples, although the gain with respect to the competitors is often more limited than for road networks.
We then investigate the impact of varying the budget size k in the range 10, 20, ..., 100 on the trace variation and the computational time for the road networks considered in Section 5.1.1. Also in this case, we consider three different sizes for the search space of greedy Krylov make, corresponding to the values 50, 250, min{1000, |E|} of the parameter q. Figure 3 reports the magnitude of the relative trace variation and highlights a crucial difference with respect to the downgrading problem: for any size of the search space, greedy Krylov make significantly outperforms its competitors on all case studies. We also note that, in contrast to the downgrading case, eigenv performs comparably to or better than miobi on all case studies. Moreover, the computational times reported in Figure 4 demonstrate that, by choosing the smallest search space (q = 50), the cost of greedy Krylov make becomes comparable to that of miobi. This is also due to the fact that, for the addition problem, the search space of miobi might be significantly larger than in the downgrading case. We therefore conclude that greedy Krylov make should be the method of choice for problem (AD), unless a very strict limitation on the time consumption has to be enforced.
6 Numerical experiments with weighted graphs: tuning, rewiring, addition

Finally, we present results on a set of weighted networks in order to test the performance of the proposed method for the edge tuning problem (TU), as well as for weighted edge addition and edge rewiring, where we simultaneously tune the weights of existing edges and add new ones. While in certain applications the set F of edges (or missing edges) that we are allowed to modify is given a priori, in our setup we only assume that the cardinality of F is fixed, i.e., we are free to select a set of n_F ≥ 1 modifiable edges (or edges to be added), and we need to form F by choosing those best suited to maximize the natural connectivity. This is a more challenging scenario and, clearly, the case in which the set F is specified by external constraints is retrieved as a special case.
In order to find an optimal set F, we propose to measure how sensitive the natural connectivity is to changes in the weight of a given edge (i, j). To this end, one should look at the magnitude of the corresponding gradient entry 2 Tr(L_f(A, 1_i 1_j^T)) and select the edges with the largest gradient. However, inspecting these quantities for all edges (or missing edges) can be too expensive for large networks. Thus, in our experiments, we proceed as follows: first, we select a set of n_P candidate edges, with n_P > n_F, chosen as the most important, with respect to a suitable edge ordering, among existing and/or non-existing edges; then, we identify F on the basis of the evaluations of the gradient over the n_P candidate edges.
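As an illustration, the screening step above can be sketched as follows for f = exp, i.e. for the natural connectivity: for a symmetric A, the derivative of Tr(exp(A)) with respect to a symmetric perturbation of the edge (i, j) is proportional to (exp(A))_ij, so ranking candidate pairs by this entry is a proxy for the full gradient. The function name is ours, and the dense `expm` call is a simplification for small examples only; the method in the paper relies on Krylov subspace approximations instead.

```python
import numpy as np
from scipy.linalg import expm


def select_edges_by_gradient(A, candidates, n_F):
    """Rank candidate pairs (i, j) by the sensitivity of Tr(exp(A)).

    For symmetric A, d Tr(exp(A)) / d A_ij = 2 (exp(A))_ij (i != j),
    so the largest entries of exp(A) over the candidates identify the
    edges whose modification affects the connectivity the most.
    (Dense expm for illustration; a Krylov method scales far better.)
    """
    E = expm(A)
    ranked = sorted(candidates, key=lambda e: -abs(E[e[0], e[1]]))
    return ranked[:n_F]


# Tiny example: a path graph 0-1-2-3 and its three missing edges.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
missing = [(0, 2), (0, 3), (1, 3)]
best = select_edges_by_gradient(A, missing, n_F=1)
# A triangle-closing edge, (0, 2) or (1, 3), ranks above the long
# chord (0, 3), since walks of length 2 outweigh walks of length 3.
```

Here the candidate list plays the role of the n_P pre-selected pairs, and n_F entries are kept, mirroring the two-stage selection described above.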
In our experiments, we test Krylov lbfgs on a set of electric power grid networks from different countries, listed in Table 6. All the considered network datasets were collected from an Open Street Map project by the Complex Network Group at Telecom Sud-Paris [10]. Each node represents a power station and each edge represents a wired connection, weighted by its voltage capacity. A small fraction (in most cases less than 1%) of the edge voltage capacities was missing in the original datasets; for those edges, we artificially set the voltage capacity to the average of the neighbors.
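The imputation of the missing capacities can be sketched as follows; the function name and data layout are ours, and we read "the average of the neighbors" as the mean known weight of the edges sharing an endpoint with the missing edge, which is one plausible interpretation.

```python
def impute_missing_weights(edges):
    """Fill in missing edge weights (None) with the average weight of
    the known-weight edges that share an endpoint with the missing edge.

    `edges` maps node pairs (u, v) to a weight, or None if missing.
    """
    filled = dict(edges)
    known = [w for w in edges.values() if w is not None]
    for (u, v), w in edges.items():
        if w is not None:
            continue
        neighbours = [w2 for (a, b), w2 in edges.items()
                      if w2 is not None and {a, b} & {u, v}]
        # Fall back to the global mean if no incident edge has a weight.
        filled[(u, v)] = (sum(neighbours) / len(neighbours)
                          if neighbours else sum(known) / len(known))
    return filled


# Tiny grid: the middle edge has a missing voltage capacity.
grid = {(0, 1): 220.0, (1, 2): None, (2, 3): 380.0}
fixed = impute_missing_weights(grid)
# (1, 2) touches both known edges, so it gets (220 + 380) / 2 = 300.0
```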
Concerning the selection of the edges in F , we propose three different approaches that deal with different scenarios, as listed below.
Tuning. This approach applies to the case where we are only allowed to modify edges with an initial nonzero weight. We select the candidate edges as the first n_P = 100 existing edges with respect to ≤_1; then, we set F as the n_F = 30 edges, among the candidates, with the largest value of the gradient.
Rewiring. This approach applies to the case where we are allowed to both modify existing edges and add new ones. We select two sets C_1 and C_2 of 50 candidate edges each, as the first 50 existing edges with respect to ≤_2 and the first 50 non-existing edges with respect to ≤_2, respectively. The resulting set of n_P = 100 candidate pairs is then used to form F by choosing the n_F = 30 elements, among the existing edges in C_1 and the non-existing edges in C_2, with the largest value of the gradient.
Addition. This approach applies to the case where we are only allowed to add new edges. We select the candidate edges as the first n_P = 100 non-existing edges with respect to ≤_2; then, we set F as the n_F = 30 edges, among the candidates, with the largest value of the gradient.

Tables 7 and 8 show the relative trace variation and the execution time (in seconds) for all the datasets and the three problem cases above. In particular, Figure 5 shows the geographical location of the modified and added edges on the power network of Denmark, where red edges denote edges whose weight has been decreased by the algorithm, green edges denote edges whose weight was increased, and yellow lines denote edges that were added. As expected, rewiring is always the most effective procedure, resulting in the largest increase in natural connectivity, as it combines edge tuning and edge addition in a simultaneous optimization mechanism. In particular, we see from Figure 5 that the set of edges modified and added by Rewiring is a subset of those modified and added by the other two approaches.

Conclusions
We have proposed two strategies, based on Krylov subspace approximations, for optimizing the natural connectivity of a graph. The first is a greedy heuristic method well suited to contexts where we either add or remove unweighted edges; in particular, the numerical results indicate that, for the addition problem, our approach significantly outperforms the increase of the objective function obtained with state-of-the-art alternatives. The second strategy combines Krylov subspace approximation with the L-BFGS method to address continuous optimization problems that include edge tuning and rewiring. To the best of our knowledge, this is the first attempt to tackle the optimization of the natural connectivity with a gradient method, and the reported experiments demonstrate the feasibility of the approach at least for graphs up to medium size. Finally, we highlight that the proposed computational strategies are quite flexible, as they can be adapted with minor changes to the optimization of other matrix-function-based measures on graphs, and it is conceptually easy to incorporate further constraints on the set of modifiable edges.