Distributed Fixed Point Method for Solving Systems of Linear Algebraic Equations

We present a class of iterative fully distributed fixed point methods to solve a system of linear equations, such that each agent in the network holds one of the equations of the system. Under a generic directed, strongly connected network, we prove a convergence result analogous to the one for fixed point methods in the classical, centralized, framework: the proposed method converges to the solution of the system of linear equations at a linear rate. We further explicitly quantify the rate in terms of the linear system and the network parameters. Next, we show that the algorithm provably works under time-varying directed networks provided that the underlying graph is connected over bounded iteration intervals, and we establish a linear convergence rate for this setting as well. A set of numerical results is presented, demonstrating practical benefits of the method over existing alternatives.


Introduction
The problem we consider is

$$Ay = b, \tag{1}$$

where $A = [a_{ij}] \in \mathbb{R}^{n\times n}$ and $b = [b_i] \in \mathbb{R}^n$ are given, and $y \in \mathbb{R}^n$ is the vector of unknowns. The matrix $A$ is assumed to be nonsingular, so that the problem has a unique solution $y^*$. We also assume that the problem needs to be solved in a distributed computational framework determined by a set of connected computational nodes which can communicate through a generic sequence of graphs. Let $A_i \in \mathbb{R}^{1\times n}$ and $b_i \in \mathbb{R}$ be the $i$-th row of $A$ and the $i$-th component of $b$, respectively. It is assumed that each computational node $i$ knows the corresponding $A_i$ and $b_i$ and that each node needs to obtain the solution $y^*$ through an iterative, distributed algorithm.

The considered problem is important as linear systems appear naturally in a number of applications. One important example is Ordinary Kriging [3,10,14], an optimal linear prediction technique for the expected value of a spatial random field $Z(s)$, $s \in \mathbb{R}^n$. Ordinary Kriging can be applied when the random field under study is isotropic and intrinsically stationary, that is, the expected value $E(Z(s)) = m$ is constant and the variance $\mathrm{Var}(Z(s) - Z(s+h)) = 2\gamma(h)$ depends only on $h$. In this case the model parameter estimation relies on the solution of a linear system like (1) (see equations (3.2.13)-(3.2.15) in [3] and the example in Section 5). When the semivariogram $\gamma(h)$ of the random field has a sill, it can be assumed that there is a range $\bar h$ beyond which the covariance vanishes, that is, $\mathrm{Cov}(Z(s), Z(s+h)) = 0$ when $|h| > \bar h$. In this case the matrix $A$ of the Ordinary Kriging linear system becomes sparse, since its elements are the estimates of $\gamma(h)$, and each sampled node of the random field $Z$ needs to store only the information brought by its neighbours at a distance lower than $\bar h$ to estimate the model parameters. When the mean $m$ of the random field is known, the problem simplifies into what is called Simple Kriging.
In Section 5 we will use Simple Kriging as an example of application of our method.
There is a vast literature devoted to solving systems of linear equations in the conventional centralized computational environment [8,17], as well as a number of results that cover parallelization of classical iterative methods, which are applicable to the case of a fully connected distributed computational environment [6]. Our interest in this paper is the class of fixed point methods [8,17] and their extensions to the distributed framework described above. In other words, we develop a class of novel, fully distributed, iterative fixed point methods to solve (1), wherein each node can exchange messages only with the nodes in its neighborhood in the communication graph, and each node obtains an estimate of the solution $y^*$ of problem (1). It is well known that (1) can be transformed into an equivalent fixed point problem, so that one can apply the Banach contraction principle and define a fixed point iterative method of the form

$$y^{k+1} = My^k + d \tag{2}$$

for suitable choices of $M$ and $d$. [...] and the convergence of each local vector to the full solution is ensured. We are interested in the second scenario. The method presented in [6] is applicable to a general problem of the type (1) with loose restrictions on the matrix $A$ and can be used to solve the linear least squares problem as well.
In this paper, we propose a novel distributed method to solve (1), which we refer to as DFIX (Distributed Fixed Point). DFIX assumes the same computational framework as [12,13,21] but differs significantly from the above mentioned methods. Namely, DFIX is derived starting from an associated (centralized) fixed point method, rather than basing the derivation directly on the initial linear system. We extend the convergence theory of centralized fixed point methods to the distributed case in the sense of sufficient conditions. That is, we demonstrate that the sufficient condition $\|M\|_\infty < 1$ continues to work in the distributed environment. The main convergence result is completely analogous to the centralized case: given an iterative matrix with infinity norm smaller than 1, the iterative sequence is convergent for an arbitrary starting point. The theory presented here thus covers a large class of linear systems. We prove linear convergence of DFIX under directed strongly connected networks and explicitly quantify the corresponding convergence factor in terms of network and linear system parameters. As detailed below, numerical simulations demonstrate advantages of DFIX over some state of the art methods.
With respect to the underlying graph, representing the connection among the computational agents, we consider both the case when the graph is fixed (i.e., the connectivity among the nodes is the same at any time during the execution of the algorithm) and the case when the network changes at every iteration. In the fixed graph case we prove that convergence holds if the network is strongly connected, while in the time-varying graph case we give suitable assumptions over the sequence of networks. We prove that the time-independent case is a particular case of the time-varying case, but for the sake of clarity we first present and analyse the algorithm assuming the network is fixed, and then we generalize the analysis to the time-varying case.
Any system of linear equations (1) with a symmetric matrix $A$ can be regarded as the first order optimality condition of an unconstrained optimization problem with cost function $\frac{1}{2}x^TAx - b^Tx$. It is therefore of interest to compare DFIX with the approach of solving (1) by applying a distributed optimization method [11,16,18] to the minimization of this quadratic function. We thus compare the computational and communication costs of DFIX with those of the state of the art optimization method from [11] and show that the computational costs of DFIX are significantly lower, while the communication costs are comparable or in favor of DFIX, depending on the connectivity of the underlying graph. Thus the numerical efficiency of DFIX is also shown. A comparison with the method from [12] is also presented in Section 5, demonstrating the clear advantage of DFIX.
This paper is organized as follows. Section 2 contains the description of the computational framework together with a brief overview of fixed point iterative methods that will be used further on. The method DFIX is defined and analysed in Section 3 for the fixed graph case. In Section 4 we present the time-varying case. Numerical results that illustrate theoretical analysis as well as an application of DFIX to a kriging problem are presented in Section 5. Some conclusions are drawn in Section 6.

Preliminaries
Let us first briefly recall the theory of fixed point iterative methods for systems of linear equations. Given a generic method of type (2), we know that the method is convergent if $\rho(M) < 1$, where $\rho(M)$ is the spectral radius of $M$, i.e., the largest eigenvalue of $M$ in modulus. This condition is both necessary and sufficient for convergence. Given any matrix norm $\|\cdot\|$, one can also state the sufficient convergence condition as $\|M\| < 1$. There are many ways of transforming (1) to the fixed point form (2), depending on the properties of $A$, with the Jacobi and Gauss-Seidel methods, as well as their relaxation versions, being the most studied.
To fix ideas before defining the distributed method, we recall here the Jacobi and Jacobi Overrelaxation (JOR) methods, keeping in mind that we will consider a generic $M$ in the next section. Assume that $A$ is a nonsingular matrix with nonzero diagonal entries. Using the splitting $A = D - P$, with $D$ being the diagonal matrix $D = \mathrm{diag}(a_{11}, \dots, a_{nn})$, the Jacobi iterative method is defined by (2) with

$$M_J = D^{-1}P. \tag{3}$$

In other words, given $d = D^{-1}b$ and denoting by $x^k = (x_1^k, \dots, x_n^k)$ the estimate of the solution to (1) at iteration $k$, the new iterate is defined by

$$x_i^{k+1} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j \neq i} a_{ij}x_j^k\Big), \qquad i = 1, \dots, n.$$

The method is linearly convergent for many classes of matrices, for example strictly diagonally dominant matrices and symmetric positive definite matrices [8,17], and the rate of convergence is determined by $\rho(M_J)$. To speed up convergence and extend the class of matrices for which the method is convergent, one can introduce the relaxation parameter $\omega \in \mathbb{R}$ and define

$$M_\omega = \omega D^{-1}P + (1-\omega)I, \qquad d_\omega = \omega D^{-1}b.$$

In other words, the JOR iteration is given by

$$x_i^{k+1} = \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j \neq i} a_{ij}x_j^k\Big) + (1-\omega)x_i^k, \qquad i = 1, \dots, n.$$

If $A$ is a symmetric positive definite matrix, the JOR method converges for $\omega \in (0, 2/\rho(D^{-1}A))$, see [8,17].
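The centralized JOR iteration recalled above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `jor_solve`, the stopping rule on the step size, and the small test matrix are all assumptions chosen for the example.

```python
import numpy as np

def jor_solve(A, b, omega=1.0, tol=1e-10, max_iter=10_000):
    """Jacobi Overrelaxation; omega = 1 gives the plain Jacobi method."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    diag = np.diag(A)
    if np.any(diag == 0):
        raise ValueError("JOR needs nonzero diagonal entries")
    off_diag = A - np.diag(diag)          # A with its diagonal removed
    x = np.zeros_like(b)
    for k in range(max_iter):
        # x_i <- omega/a_ii * (b_i - sum_{j != i} a_ij x_j) + (1 - omega) x_i
        x_new = omega * (b - off_diag @ x) / diag + (1.0 - omega) * x
        if np.max(np.abs(x_new - x)) < tol:   # illustrative stopping rule
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Small strictly diagonally dominant example (hypothetical data).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = jor_solve(A, b, omega=0.9)
```

For this matrix the Jacobi iteration matrix has infinity norm 0.6, so the sufficient condition from the previous paragraph holds and the iteration converges from the zero starting point.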
Assuming that each node can communicate directly with every other node, the method can be applied in parallel and asynchronous manner and the convergence follows from the results of [2,7].
Let us now define precisely the computational environment we consider. Assume that the network of nodes is a directed network $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of all edges, i.e., all pairs $(i, j)$ of nodes where node $i$ can send information to node $j$ through a communication link.

Definition 1. The graph $G = (V, E)$ is strongly connected if for every pair of nodes $i, j$ there exists an oriented path from $i$ to $j$ in $G$. That is, if there exist $s_1, \dots, s_l$ such that $(i, s_1), (s_1, s_2), \dots, (s_l, j) \in E$.
Assumption A1. The network G = (V, E) is directed, strongly connected, with self-loops at every node.
Remark 2.1. The case of undirected network G can be seen as the particular case of directed graph where G is symmetric. That is, (i, j) ∈ E if and only if (j, i) ∈ E. In this case, the hypothesis that G is strongly connected is equivalent to G connected.
Let us denote by $O_i$ the in-neighborhood of node $i$, that is, the set of nodes that can send information to node $i$ directly. Since the graph has self-loops at each node, $i \in O_i$ for every $i$. We associate with $G$ an $n \times n$ matrix $W$, such that the elements of $W$ are all nonnegative and each row sums up to one. More precisely, we assume the following.

Assumption A2. The matrix $W \in \mathbb{R}^{n\times n}$ is row stochastic with elements $w_{ij}$ such that

$$w_{ij} > 0 \ \text{if } j \in O_i, \qquad w_{ij} = 0 \ \text{if } j \notin O_i.$$

Let us denote by $w_{\min}$ a constant such that all nonzero elements of $W$ satisfy $w_{ij} \ge w_{\min} > 0$. Under the previously stated assumptions we know that such a constant exists. Moreover, we have $w_{\min} \in (0, 1)$. Therefore, for all elements of $W$ we have

$$w_{ij} \in \{0\} \cup [w_{\min}, 1].$$

The diameter of a network is defined as the largest distance between two nodes in the graph. Let us denote with $\delta$ the diameter of $G$.
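Strong connectivity (Definition 1) and the diameter $\delta$ can both be checked with one breadth-first search per node. The sketch below is only an illustration; the adjacency-list representation `out_neighbors` and the function names are assumptions made for this example.

```python
from collections import deque

def bfs_distances(out_neighbors, source):
    """Directed distances (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in out_neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_if_strongly_connected(out_neighbors):
    """Return the diameter, or None if some node cannot reach all others."""
    n = len(out_neighbors)
    diameter = 0
    for source in range(n):
        dist = bfs_distances(out_neighbors, source)
        if len(dist) < n:
            return None
        diameter = max(diameter, max(dist.values()))
    return diameter

# Directed ring on 4 nodes with self-loops: node i sends to itself and to i+1.
ring = [[0, 1], [1, 2], [2, 3], [3, 0]]
ring_diameter = diameter_if_strongly_connected(ring)

# Node 1 cannot reach node 0, so this graph is not strongly connected.
broken = [[0, 1], [1]]
broken_diameter = diameter_if_strongly_connected(broken)
```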

DFIX method
We now consider a generic fixed point iterative method of the form (2) for solving (1), and we assume that the fixed point $y^*$ of (2) is a solution of (1). The algorithm is designed in such a way that each node has its own estimate of the solution $y^*$. Thus at iteration $k$ each node $i$ has its own estimate $x_i^k \in \mathbb{R}^n$ with components $x_{ij}^k$, $j = 1, \dots, n$. The DFIX method is presented in the algorithm below.

Algorithm DFIX
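Step 0 Initialization: Set $k = 0$. Each node chooses $x_i^0 \in \mathbb{R}^n$.
Step 1 Each node $i$ computes

$$\hat x_{ii}^{k+1} = \sum_{j=1}^n m_{ij}x_{ij}^k + d_i, \qquad \hat x_{ij}^{k+1} = x_{ij}^k \ \text{ for } j \neq i, \tag{6}$$

where $m_{ij}$ and $d_i$ denote the entries of $M$ and $d$.
Step 2 Each node $i$ updates its solution estimate

$$x_i^{k+1} = \sum_{j \in O_i} w_{ij}\hat x_j^{k+1} \tag{7}$$

and sets $k = k + 1$.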
Notice that at Step 1 each node $i$ updates only the $i$-th component of its solution estimate and leaves all other components unchanged, while in Step 2 all nodes perform a consensus step [4,9,19] using the set of vector estimates $\hat x_j^{k+1}$. Defining the global variable at iteration $k$ as

$$X^k = (x_1^k; \dots; x_n^k) \in \mathbb{R}^{n^2},$$

Algorithm DFIX can be stated in a condensed form using $X^k$ and the following notation:

$$M_i = I - e_ie_i^T + e_ie_i^TM \in \mathbb{R}^{n\times n}, \qquad \hat d_i = d_ie_i \in \mathbb{R}^n,$$

where $e_i$ denotes the $i$-th vector of the canonical basis of $\mathbb{R}^n$. More precisely, matrix $M_i$ has the $i$-th row equal to the $i$-th row of $M$, the remaining diagonal elements equal to 1, and all other elements equal to 0. Vector $\hat d_i$ has only one nonzero element, in the $i$-th row, which is equal to $d_i$. Now, Step 1 can be rewritten as $\hat x_i^{k+1} = M_ix_i^k + \hat d_i$, and we can rewrite Steps 1-2 in matrix form as

$$X^{k+1} = (W \otimes I)(\mathbf{M}X^k + \mathbf{d}), \tag{8}$$

where $\mathbf{M} = \mathrm{diag}(M_1, \dots, M_n) \in \mathbb{R}^{n^2\times n^2}$, $\mathbf{d} = (\hat d_1; \dots; \hat d_n) \in \mathbb{R}^{n^2}$ and $\otimes$ denotes the Kronecker product of matrices. We remark here that equation (8) is only theoretical: since each agent has access only to partial information, the global vector $X^k$, the matrix $\mathbf{M}$ and the vector $\mathbf{d}$ are not computed at any node. We derived equation (8) to get a compact representation of Algorithm DFIX and to use it in the convergence analysis.
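The two steps of the algorithm can be simulated on a single machine by storing all local estimates as the rows of one array. The sketch below is an assumption-laden illustration rather than the authors' implementation: the function name `dfix`, the zero initial guess, the Jacobi choice of $M$ and $d$, and the directed ring with weights 1/2 are all chosen for the example.

```python
import numpy as np

def dfix(M, d, W, num_iters):
    """Simulate DFIX; row i of the returned array is node i's estimate x_i."""
    n = M.shape[0]
    X = np.zeros((n, n))                  # illustrative initial guess x_i^0 = 0
    nodes = np.arange(n)
    for _ in range(num_iters):
        X_hat = X.copy()
        # Step 1: node i updates only its i-th component with row i of M.
        X_hat[nodes, nodes] = np.einsum("ij,ij->i", M, X) + d
        # Step 2: consensus, x_i <- sum_j w_ij * x_hat_j.
        X = W @ X_hat
    return X

# Hypothetical strictly diagonally dominant system, Jacobi splitting.
A = np.array([[4.0, 1.0, 0.0, 1.0],
              [1.0, 5.0, 2.0, 0.0],
              [0.0, 2.0, 6.0, 1.0],
              [1.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
D = np.diag(A)
M = -(A - np.diag(D)) / D[:, None]        # M = D^{-1} P with P = D - A
d = b / D
n = len(b)

# Directed ring with self-loops: node i listens to itself and to node i+1.
W = 0.5 * np.eye(n) + 0.5 * np.roll(np.eye(n), 1, axis=1)

X = dfix(M, d, W, num_iters=2000)
y_star = np.linalg.solve(A, b)
```

Here $\|M\|_\infty = 0.6$ and the ring is strongly connected, so every row of `X` approaches `y_star`.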
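The following theorem shows that for every $i \in \{1, \dots, n\}$ the local sequence $\{x_i^k\}$ converges to the fixed point $y^*$ of (2). Denote

$$X^* = (y^*; \dots; y^*) \in \mathbb{R}^{n^2}.$$

Theorem 1. Let Assumptions A1 and A2 hold, $\|M\|_\infty = \mu < 1$ and let $\{X^k\}$ be a sequence generated by (8). There exists a constant $\tau < 1$ such that for every $k \ge \delta - 1$ the global error $E^k = X^k - X^*$ satisfies

$$\|E^{k+1}\|_\infty \le \tau\|E^{k-\delta+1}\|_\infty, \tag{9}$$

where $\delta$ denotes the diameter of the underlying computational graph $G$.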
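Proof. Since $W$ is assumed to be row stochastic, there holds $(W \otimes I)X^* = X^*$. Moreover, using the fact that $\mathbf{d} = (I \otimes I - \mathbf{M})X^*$, we obtain the following recursion:

$$E^{k+1} = (W \otimes I)\mathbf{M}E^k. \tag{10}$$

Notice that $\|W \otimes I\|_\infty = 1$ and $\|\mathbf{M}\|_\infty \le 1$, so that

$$\|E^{k+1}\|_\infty \le \|E^k\|_\infty. \tag{11}$$

Now, denoting by $e_i^k$ the $i$-th block of $E^k$ (the local error corresponding to node $i$) and by $e_{ij}^k$ its $j$-th component, from (10) we obtain the following:

$$e_{ij}^{k+1} = \sum_{s \neq j} w_{is}e_{sj}^k + w_{ij}\sum_{t=1}^n m_{jt}e_{jt}^k. \tag{12}$$

We prove the thesis by proving that if the distance between $j$ and $i$ in the graph is equal to $l$, then there exists $\tau < 1$ such that for every $k \ge l - 1$

$$|e_{ij}^{k+1}| \le \tau\|E^{k-l+1}\|_\infty. \tag{13}$$

We proceed by induction over the distance $l$. If $l = 1$, that is, if $w_{ij} > 0$, then by (12) and $\|M\|_\infty = \mu$ we have

$$|e_{ij}^{k+1}| \le \sum_{s \neq j} w_{is}|e_{sj}^k| + w_{ij}\mu\|E^k\|_\infty \le (1 - w_{ij}(1-\mu))\|E^k\|_\infty \le (1 - w_{\min}(1-\mu))\|E^k\|_\infty, \tag{14}$$

so that (13) holds with $\tau = 1 - w_{\min}(1-\mu) < 1$.

Assume now that (13) holds for distance equal to $l - 1$ with a constant $\tau' < 1$, and let us prove it for $l \ge 2$. Let $(j, s_{l-1}, s_{l-2}, \dots, s_1, i)$ be a path of length $l$ from $j$ to $i$. In particular we have that $w_{is_1} > 0$ and $w_{ij} = 0$, and thus, by (12),

$$|e_{ij}^{k+1}| \le \sum_{s \neq s_1} w_{is}|e_{sj}^k| + w_{is_1}|e_{s_1j}^k|. \tag{15}$$

For each of the terms $|e_{sj}^k|$ in the sum, by (11), we have

$$|e_{sj}^k| \le \|E^k\|_\infty \le \|E^{k-l+1}\|_\infty. \tag{16}$$

Let us now consider the term $|e_{s_1j}^k|$. Since $(j, s_{l-1}, s_{l-2}, \dots, s_1, i)$ is a path of length $l$ from $j$ to $i$ and the distance between $j$ and $i$ is equal to $l$, the distance between $j$ and $s_1$ is equal to $l - 1$ and therefore, by the inductive hypothesis,

$$|e_{s_1j}^k| \le \tau'\|E^{k-l+1}\|_\infty. \tag{17}$$

Replacing (16) and (17) in (15), we get

$$|e_{ij}^{k+1}| \le (1 - w_{is_1}(1-\tau'))\|E^{k-l+1}\|_\infty \le (1 - w_{\min}(1-\tau'))\|E^{k-l+1}\|_\infty,$$

and defining $\tau := 1 - w_{\min}(1-\tau') < 1$ we get (13). Now the thesis follows directly from (11) and the fact that the distance between any two nodes is less than or equal to the diameter $\delta$ of the graph, taking as $\tau$ the largest of the constants obtained for $l = 1, \dots, \delta$.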
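The recursion $\tau := 1 - w_{\min}(1-\tau')$ used in the inductive step can be unrolled into an explicit rate. The following is a sketch under one stated assumption that is not spelled out in the text above: that the constant for distance one is $\tau_1 = 1 - w_{\min}(1-\mu)$. The subscripted names $\tau_l$ are introduced here only for illustration.

```latex
\tau_1 = 1 - w_{\min}(1-\mu), \qquad
\tau_l = 1 - w_{\min}(1-\tau_{l-1})
\;\Longrightarrow\;
1 - \tau_l = w_{\min}\,(1-\tau_{l-1}) = w_{\min}^{\,l}\,(1-\mu),
\qquad\text{hence}\qquad
\tau_\delta = 1 - w_{\min}^{\,\delta}\,(1-\mu).
```

For instance, with $w_{\min} = 0.5$, $\delta = 3$ and $\mu = 0.6$ this gives $\tau_\delta = 1 - 0.125 \cdot 0.4 = 0.95$, which illustrates how a larger diameter or smaller weights weaken the guaranteed contraction.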

Time-varying Network
The method discussed in the previous section is valid only if the graph representing the communication among the agents is the same at each iteration. If a communication link between two agents fails during the execution of the algorithm, the underlying network changes and Theorem 1 no longer applies. To deal with such changes we consider the case where a possibly different network is given at each iteration. We extend DFIX to this framework and we give assumptions on the sequence of graphs that yield a convergence result analogous to Theorem 1. In particular we show that, in order to achieve convergence, the network does not need to be strongly connected at every iteration.
Assume that a sequence of directed graphs $\{G_k\}_k$ is given, such that $G_k$ represents the network of nodes at iteration $k$. That is, at iteration $k$, each node can communicate with its neighbours in $G_k$. The DFIX algorithm described by equations (6) and (7) can be applied in this case if we replace (7) with

$$x_i^{k+1} = \sum_{j=1}^n w_{ij}^k\hat x_j^{k+1}, \tag{19}$$

where $W_k = [w_{ij}^k]$ is the consensus matrix associated with the graph $G_k$, that is, $W_k$ satisfies Assumption A2 with $G = G_k$. With this modification, the equation describing the global iteration becomes

$$X^{k+1} = (W_k \otimes I)(\mathbf{M}X^k + \mathbf{d}). \tag{20}$$

We will prove a convergence result for a class of sequences of graphs. We first present and analyze the assumptions on such sequences.
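Definition 2. Given two graphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ with the same set of nodes, the composition $G_2 \circ G_1 = (V, E)$ is the graph with edges $E = \{(j, i) : \text{there exists } s \in V \text{ such that } (j, s) \in E_1 \text{ and } (s, i) \in E_2\}$. That is, there is an edge from $j$ to $i$ in $G_2 \circ G_1$ if we can find a path from $j$ to $i$ such that the first edge of the path is in $G_1$ and the second edge is in $G_2$. This definition can be extended to finite sequences of graphs of arbitrary length.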
Remark 4.1. Let us consider a generic set of graphs $G_1, \dots, G_m$. It is easy to see that if for every index $j$ the graph $G_j$ has self-loops at every node, then the set of edges of the composition $G_1 \circ \dots \circ G_m$ contains the set of edges of $G_j$ for every $j$. In particular, if there exists an index $\bar\jmath \in \{1, \dots, m\}$ such that $G_{\bar\jmath}$ is fully connected, then $G_1 \circ \dots \circ G_m$ is also fully connected.
Definition 3. Given an infinite sequence of networks $\{G_k\}_k$ and a positive integer $\bar m$, we say that the sequence is jointly fully (respectively, strongly) connected for sequences of length $\bar m$ if for every index $k$, the composition $G_k \circ G_{k+1} \circ \dots \circ G_{k+\bar m-1}$ is fully (respectively, strongly) connected.
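Graph composition and joint full connectivity can be checked numerically with boolean adjacency matrices. The sketch below uses assumed conventions: `adj[j, i]` is true when there is an edge from $j$ to $i$, the function names are hypothetical, and only the windows that fit in a finite list of graphs are tested.

```python
import numpy as np

def compose(adj_outer, adj_inner):
    """Adjacency of G_outer ∘ G_inner: first edge in G_inner, second in G_outer."""
    return (adj_inner.astype(int) @ adj_outer.astype(int)) > 0

def window_composition(adjs, k, m_bar):
    """Adjacency of G_k ∘ G_{k+1} ∘ ... ∘ G_{k+m_bar-1}."""
    comp = adjs[k + m_bar - 1]
    for t in range(k + m_bar - 2, k - 1, -1):
        comp = compose(adjs[t], comp)
    return comp

def jointly_fully_connected(adjs, m_bar):
    """True if every window of m_bar consecutive graphs composes to a full graph."""
    return all(window_composition(adjs, k, m_bar).all()
               for k in range(len(adjs) - m_bar + 1))

# Directed ring on 3 nodes with self-loops, repeated five times.
ring = np.eye(3, dtype=bool) | np.roll(np.eye(3, dtype=bool), 1, axis=1)
graphs = [ring] * 5
full_with_two = jointly_fully_connected(graphs, 2)
full_with_one = jointly_fully_connected(graphs, 1)
```

The ring has diameter 2, so composing two copies already yields a fully connected graph, while a single copy does not.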
Definition 5. Given two vertices i, j we say that there is a joint path of length l from i to j in G k , . . . , G k+m−1 if there exist s 1 , . . . , s l−1 such that (i, s 1 ) ∈ E k+m−1 , (s 1 , s 2 ) ∈ E k+m−2 , . . . , (s l−1 , j) ∈ E k+m−l , and we say that i, j have joint distance l in G k , . . . , G k+m−1 if the shortest joint path from i to j is of length l.
Our analysis is based on the following assumption.
Assumption A3. $\{G_k\}$ is a sequence of directed graphs, with self-loops at every node, jointly fully connected for sequences of length $\bar m$, for some positive integer $\bar m$.
The algorithm presented in [13] works for time-varying network in a similar framework. Formally, the hypothesis on {G k } in [13] is the following.
Assumption A3'. $\{G_k\}$ is a sequence of directed graphs, with self-loops at every node, jointly strongly connected for sequences of length $\bar p$, for some positive integer $\bar p$.
We now show that Assumptions A3 and A3' are equivalent, in the sense specified by Proposition 1. In the following, given an integer $m$, we denote with $G^m$ the composition of $m$ copies of $G$.

Lemma 1. If $G$ is a directed strongly connected graph with self-loops at every node and diameter $\delta$, then $G^\delta$ is fully connected.
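Proof. By definition of composition, $(i, j)$ is an edge in $G^\delta$ if and only if there exist $s_1, \dots, s_{\delta-1} \in V$ such that

$$(i, s_1), (s_1, s_2), \dots, (s_{\delta-1}, j) \in E. \tag{22}$$

We want to prove that for every $i, j \in V$ a sequence of nodes $s_h$ as in (22) exists.

Since $G$ is strongly connected with diameter $\delta$, there exists a path in $G$ from $i$ to $j$ of length $l \le \delta$. That is, there exists a set of nodes $v_1, \dots, v_{l-1}$ such that $(i, v_1), (v_1, v_2), \dots, (v_{l-1}, j)$ are edges in $G$ and therefore, since $G$ has self-loops at every node, a sequence satisfying (22) is given by

$$s_h = v_h \ \text{ for } h = 1, \dots, l-1, \qquad s_h = j \ \text{ for } h = l, \dots, \delta-1.$$

Proposition 1. Let $\{G_k\}$ be a sequence of graphs where, for each $k$, $G_k = (V, E_k)$ is a directed graph with self-loops at every node. The following are equivalent:
(1) there exist $\tau_0, l \in \mathbb{N}$ such that $\{G_k\}$ is repeatedly jointly strongly connected with constants $\tau_0, l$;
(2) there exists $\bar p \in \mathbb{N}$ such that $\{G_k\}$ is jointly strongly connected for sequences of length $\bar p$;
(3) there exists $\bar m \in \mathbb{N}$ such that $\{G_k\}$ is jointly fully connected for sequences of length $\bar m$.

Proof. It is easy to see that (2) ⇒ (1) with $\tau_0 = 0$ and $l = \bar p$ and, since full connectivity clearly implies strong connectivity, that (3) ⇒ (2) with $\bar p = \bar m$. We now prove that (1) ⇒ (2) with $\bar p = 2l$. That is, we prove that if (1) holds, then for every index $s$ the composition $G_s \circ \dots \circ G_{s+2l-1}$ is strongly connected. Given an index $s$, we denote with $\bar r$ the remainder of the division of $(s - \tau_0)$ by $l$, and we define $\bar h := l^{-1}(s - \tau_0 + l - \bar r)$. By definition of $\bar r$ and $\bar h$ and applying (1) with $k = \bar h$ we have that the graph

$$G_{s+l-\bar r} \circ \dots \circ G_{s+2l-\bar r-1}$$

is strongly connected and thus, by Remark 4.1,

$$G_s \circ \dots \circ G_{s+2l-\bar r-1}$$

is strongly connected. Since $2l - \bar r \in \{l+1, \dots, 2l\}$ we have the thesis.

Finally, we prove that (2) ⇒ (3). Since the size of $V$ is finite, there exists a finite number of graphs with vertices $V$. In particular, there exists a finite integer $L$ equal to the number of strongly connected graphs with vertices $V$. We denote with $H_1, \dots, H_L$ such graphs, with $\delta_j$ the diameter of $H_j$ and with $\bar\delta := \max_j \delta_j$. Given any index $k$, we consider $(\bar\delta - 1)L + 1$ sequences of length $\bar p$ as follows:

$$\{G_{k+t\bar p}, \dots, G_{k+(t+1)\bar p-1}\}, \qquad t = 0, \dots, (\bar\delta-1)L.$$

Statement (2) implies that the composition of each of these sequences is strongly connected, and hence coincides with one of the graphs $H_1, \dots, H_L$. Since there are $(\bar\delta-1)L + 1$ sequences, at least $\bar\delta$ of these compositions coincide with the same graph $H_j$. By Remark 4.1 and Lemma 1, the composition $G_k \circ \dots \circ G_{k+(\bar\delta-1)L\bar p+\bar p-1}$ contains all the edges of $H_j^{\bar\delta}$ and is therefore fully connected, and thus (3) holds with $\bar m = (\bar\delta-1)L\bar p + \bar p$.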
To conclude the considerations on the sequence of networks we remark that, since we are assuming that the linear system (1) has unique solution and that each node contains exactly one row of the coefficient matrix, the D-connectivity hypothesis introduced in [12] is equivalent to Assumption A3' and thus, by Proposition 1, to Assumption A3.
Theorem 2. Assume that a sequence of networks $\{G_k\}_k$ is given, satisfying Assumption A3, and that for every index $k$ the corresponding consensus matrix $W_k$ satisfies Assumption A2. Let $\{X^k\}$ be a sequence generated by (20) with $\|M\|_\infty = \mu < 1$. There exists a constant $\sigma < 1$ such that for every $k \ge \bar m - 1$ the global error $E^k = X^k - X^*$ satisfies

$$\|E^{k+1}\|_\infty \le \sigma\|E^{k-\bar m+1}\|_\infty, \tag{23}$$

where $\bar m$ is the constant given by Assumption A3.
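Proof. We follow the proof of Theorem 1. For every index $k$, the matrix $W_k$ is row stochastic and $\|(W_k \otimes I)\mathbf{M}\|_\infty \le 1$, so we get

$$E^{k+1} = (W_k \otimes I)\mathbf{M}E^k \tag{24}$$

and

$$\|E^{k+1}\|_\infty \le \|E^k\|_\infty. \tag{25}$$

For every pair of nodes $i, j$ and for every iteration index $k$, we have

$$e_{ij}^{k+1} = \sum_{s \neq j} w_{is}^ke_{sj}^k + w_{ij}^k\sum_{t=1}^n m_{jt}e_{jt}^k. \tag{26}$$

We now prove that if the joint distance between $j$ and $i$ in $G_{k-\bar m+1}, G_{k-\bar m+2}, \dots, G_k$ is equal to $l$, then there exists $\sigma < 1$ such that for every $k$

$$|e_{ij}^{k+1}| \le \sigma\|E^{k-l+1}\|_\infty. \tag{27}$$

We proceed by induction over the joint distance $l$. If $l = 1$, that is, if $w_{ij}^k > 0$, proceeding as in the proof of Theorem 1 we get

$$|e_{ij}^{k+1}| \le (1 - w_{\min}(1-\mu))\|E^k\|_\infty.$$

We assume now that (27) holds for distance equal to $l - 1$ and we prove it for $l$. Let $(j, s_{l-1}, s_{l-2}, \dots, s_1, i)$ be a joint path of length $l$ from $j$ to $i$ in $G_{k-\bar m+1}, G_{k-\bar m+2}, \dots, G_k$. In particular we have that $w_{is_1}^k > 0$ and $w_{ij}^k = 0$, and thus

$$|e_{ij}^{k+1}| \le \sum_{s \neq s_1} w_{is}^k|e_{sj}^k| + w_{is_1}^k|e_{s_1j}^k|.$$

Using the fact that $(j, s_{l-1}, s_{l-2}, \dots, s_1)$ is a joint path of length $l - 1$ from $j$ to $s_1$ in $G_{k-\bar m+1}, G_{k-\bar m+2}, \dots, G_{k-1}$, applying the inductive hypothesis and proceeding as in the proof of the previous theorem, we get

$$|e_{ij}^{k+1}| \le (1 - w_{\min}(1-\sigma'))\|E^{k-l+1}\|_\infty,$$

with $\sigma'$ given by (27) for distance $l - 1$, and defining $\sigma := 1 - w_{\min}(1-\sigma') < 1$ we get (27) for distance equal to $l$.

Since the sequence $\{G_k\}$ is jointly fully connected for sequences of length $\bar m$, for every pair of nodes $i, j$ the joint distance between $j$ and $i$ in $G_{k-\bar m+1}, G_{k-\bar m+2}, \dots, G_k$ is less than or equal to $\bar m$, and we get the thesis.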
Lemma 1 shows that if we consider the time-independent case as the particular instance of the time-varying case where each of the graphs $G_k$ is equal to $G$ with diameter $\delta$, then Assumption A3 holds with $\bar m = \delta$ and the two theorems give the same inequality for the error vectors.

Numerical results
In this section we present initial testing results for the DFIX method. The DFIX is compared with the state-of-the-art distributed optimization algorithm from [11] and the method for solving systems of linear equations presented in [12]. The test set consists of two types of problems: Simple Kriging problems and linear systems with strictly diagonally dominant coefficient matrix. In Section 5.1 we study how the computational and communication cost of DFIX is influenced by the connectivity of the underlying network and we compare DFIX with the methods from [11] and [12] on a simple kriging problem. In Section 5.2 we repeat the comparison considering a randomly generated linear system. In Section 5.3 we consider the case of time-varying network.
The results demonstrate that DFIX, analogously to the classical results, outperforms the corresponding optimization method for solving the unconstrained quadratic problem both in terms of computational and communication costs. With respect to the method from [12] the comparison is again favorable for DFIX, in the case of the iterative matrix with suitable properties. Clearly, the method from [12] is designed for a wider class of problems, but its efficiency is significantly lower than DFIX efficiency in the case of unique solution and a suitable iterative matrix.
For the sake of completeness we describe here both methods we compare with. We already remarked in the introduction that finding a solution of (1) is equivalent to solving the unconstrained optimization problem with quadratic objective function $\frac{1}{2}x^TAx - b^Tx$. When applied to this optimization problem, the method from [11], abbreviated as "Harnessing", can be stated as follows. Within one Harnessing iteration, each node computes its own solution estimate $x_i^{k+1}$ and an additional vector $s_i^{k+1}$, which is an estimate of the average gradient, according to the following rule [...] with $\eta$ in (30) being the hand-tuned step size parameter and $A_i$ denoting the $i$-th row of the matrix.

The second method [12] we consider, abbreviated as "Projection", deals with the linear system (1) directly and is specified as follows. Before the iterative procedure starts, each agent $i$ defines the local initial vector $x_i^0$ as any solution of the equation $A_ix = b_i$. Then, at every iteration, each node performs the following update:

$$x_i^{k+1} = x_i^k - \frac{1}{|O_i|}P_i\Big(|O_i|x_i^k - \sum_{j \in O_i} x_j^k\Big), \tag{31}$$

where $O_i$ denotes the neighborhood of node $i$ in the network and $P_i$ is the projection matrix on the subspace $\ker(A_i) = \{x \in \mathbb{R}^n \mid A_ix = 0\}$.

The DFIX method we consider here is defined using Jacobi Overrelaxation, as specified in Section 2, as the underlying fixed point method. Iteration $k$ of the resulting method at each node is given by

$$\hat x_{ii}^{k+1} = (1-\alpha)x_{ii}^k + \frac{\alpha}{a_{ii}}\Big(b_i - \sum_{j \neq i} a_{ij}x_{ij}^k\Big), \qquad \hat x_{ij}^{k+1} = x_{ij}^k \ \text{ for } j \neq i, \tag{32}$$

and

$$x_i^{k+1} = \sum_{j \in O_i} w_{ij}\hat x_j^{k+1}. \tag{33}$$

In the rest of the section we refer to the method defined by equations (32), (33) as DFIX-JOR.
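The two local ingredients of the Projection method, the projector $P_i$ onto $\ker(A_i)$ and an initial vector solving $A_ix = b_i$, have simple closed forms for a single nonzero row. The sketch below illustrates them; the function names are hypothetical, and the minimum-norm choice of the initial vector is one admissible option among many, not necessarily the one used in the experiments.

```python
import numpy as np

def kernel_projector(row):
    """Orthogonal projector onto {x : row @ x = 0} for a nonzero row vector."""
    a = np.asarray(row, dtype=float).ravel()
    return np.eye(a.size) - np.outer(a, a) / (a @ a)

def local_initial_point(row, b_i):
    """Minimum-norm solution of row @ x = b_i (one possible choice of x_i^0)."""
    a = np.asarray(row, dtype=float).ravel()
    return b_i * a / (a @ a)

# Hypothetical local data held by one node.
A_i = np.array([4.0, 1.0, 0.0, 1.0])
b_i = 2.0
P_i = kernel_projector(A_i)
x0_i = local_initial_point(A_i, b_i)
```

Because every update of the Projection method moves $x_i^k$ along $\ker(A_i)$, the local equation $A_ix_i^k = b_i$ stays satisfied at all iterations once it holds for $x_i^0$.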

Simple Kriging problem
The first problem we consider is Simple Kriging [3]. Let us consider a physical process modeled as a spatial random field and assume that a network of sensors is given in the region of interest, taking measurements of the field. The goal is to estimate the field at any given point of the region. Assuming that the field is Gaussian and stationary, and that the expected value and covariance function are known at any point, this kind of problem can be solved by the Simple Kriging method. Denote with $Z(s)$ the value of the random field at the point $s$, and with $\mu(s)$ its expected value, which is assumed to be known. Moreover, by the stationarity assumption, the covariance between the values of $Z$ at two points is given by

$$\mathrm{Cov}(Z(s_i), Z(s_j)) = K(s_i - s_j).$$

The predicted value of $Z$ at a point $\bar s$ is then given by

$$\hat Z(\bar s) = \mu(\bar s) + \sum_{i=1}^n x_i\big(Z(s_i) - \mu(s_i)\big),$$

where $(x_1, \dots, x_n)$ is the approximate solution of the linear system

$$Ax = b, \qquad a_{ij} = K(s_i - s_j), \qquad b_i = K(\bar s - s_i).$$

Clearly, the matrix $W$ plays an important role in the DFIX-JOR method. So let us first illustrate the influence of connectivity within the network in terms of communication traffic and computational cost for the above described kriging problem, with covariance function given by [...]. We assume a set $\{s_1, \dots, s_{100}\} \subset [-30, 30]^2$ of agents is given and for any $m \in \{2, 4, \dots, 48, 50\}$ we take the $m$-regular graph with vertices $\{s_1, \dots, s_{100}\}$.
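That is, given the value of $m$, we define the network so that each node has degree $m$. The matrix $W$ is defined using the Metropolis weights [23], which in the $m$-regular case are given by

$$w_{ij} = \begin{cases} \dfrac{1}{m+1} & \text{if } (i,j) \in E \text{ or } i = j, \\ 0 & \text{otherwise.} \end{cases} \tag{36}$$

For every value of the degree $m$ we apply the DFIX-JOR method to solve $Ax = b$.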
In Figures 1 and 2 we plot the number of iterations performed by the method and the total communication cost, respectively, until the stopping criterion

$$\max_{i=1,\dots,n} \|Ax_i^k - b\| \le 10^{-4} \tag{37}$$

is satisfied, for graphs of increasing degree. In other words, we are asking that each node solves the system with a residual tolerance of $10^{-4}$. The communication cost is computed as follows. At each iteration, Step 1 does not require any communication between the agents, while in Step 2 node $i$ shares $x_i^k$ with all the agents in its neighbourhood. The per-iteration traffic is thus given by $nm = 2|E|$, where $E$ is the set of edges of the underlying network and $m$ is the degree. From Figures 1 and 2 we can see that, as the degree of the network increases, the number of iterations required to satisfy (37) decreases, while the total communication traffic first decreases and then increases again. This behaviour can be explained as follows. As the connectivity of the graph improves, the local information is distributed through the network more efficiently, and a smaller number of iterations is necessary. On the other hand, if the degree is larger, the consensus step (7) of the algorithm requires each node to share its local vector with a larger number of neighbours, yielding a higher communication traffic at each iteration. The fact that the overall communication traffic (Figure 2) is nonmonotone suggests that, for large values of the degree, the decrease in the number of iterations is not enough to balance the higher per-iteration traffic.
Let us now compare DFIX-JOR with the Harnessing [11] and Projection [12] methods. We consider a 10 × 10 grid of nodes located at $\{s_1, \dots, s_{100}\} \subset$ [...]. The linear system that we consider is derived from the kriging problem described at the beginning of this section. That is, we consider again [...] where $K$ is given by (35) and $\bar s$ is a fixed random point in $[-3, 3]^2$. Proceeding as in the previous test, we compute the communication traffic and computational cost required by the three methods to achieve the tolerance specified in (37), for different values of the communication radius $R$. For each method, the overall computational cost is given by the number of iterations performed times the per-iteration cost, calculated as the number of scalar operations in one iteration. Similarly, the communication traffic is given by the number of iterations times the total number of vectors shared by the nodes during one iteration, times the length $n$ of the vector. The matrix $W$ is defined as in [23]. That is, we define the off-diagonal elements as

$$w_{ij} = \begin{cases} \dfrac{1}{1 + \max\{m_i, m_j\}} & \text{if } (i,j) \in E, \\ 0 & \text{otherwise,} \end{cases}$$

where $m_i$ denotes the degree of node $i$, and the diagonal elements as

$$w_{ii} = 1 - \sum_{j \neq i} w_{ij},$$

so that the resulting matrix $W$ is stochastic. The stopping criterion is the same as in the previous test, i.e., each node solves the problem with a tolerance of $10^{-4}$. The initial point at each node is the same for the three methods and is defined as follows: [...] Moreover, the relaxation parameter $\alpha$ in (32) is chosen as [...] $(a_{11}, \dots, a_{nn})$, while for the Harnessing method we take in (30) [...]. In Figures 3 and 4 we plot the obtained results. As we can see, in this framework, the DFIX method is more efficient than the two methods we compare with, both in terms of computational cost and in terms of communication traffic.
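The Metropolis weights of [23] are straightforward to build from the adjacency structure. The sketch below assumes an undirected graph given as a symmetric boolean adjacency matrix; the function name and the small path graph are illustrative choices, not part of the experimental setup.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis consensus matrix for an undirected graph (symmetric adjacency)."""
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    adj = adj & ~np.eye(n, dtype=bool)     # degrees are counted without self-loops
    degree = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(degree[i], degree[j]))
        W[i, i] = 1.0 - W[i].sum()         # diagonal makes each row sum to one
    return W

# Path graph 0 - 1 - 2 (degrees 1, 2, 1).
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=bool)
W = metropolis_weights(path)
```

For an $m$-regular graph every off-diagonal weight on an edge and every diagonal weight reduce to $1/(m+1)$, which is the special case used in the first experiment.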

Strictly diagonally dominant systems
Let us now consider a linear system $Ax = b$ of order $n = 100$, where $A$ and $b$ are generated as follows. For every index $i$ we take $b_i$ randomly generated with uniform distribution in $(0, 1)$, and $A$ is a symmetric diagonally dominant random matrix obtained as follows: take $\hat a_{ij} \in (0, 1)$ with uniform distribution, then set $\tilde A = \frac{1}{2}(\hat A + \hat A^T)$ and finally $A = \tilde A + (n-1)I$, where we denote with $I$ the identity matrix of order $n$. As the underlying network we consider an $m$-regular graph with $n$ nodes. For every fixed value of the degree $m$ we generate, as just described, 10 random linear systems, solve all of them using the three methods and compute the average number of iterations necessary to arrive at termination. For each method, the total amounts of computation and communication are then obtained by multiplying the average number of iterations by the per-iteration computational cost and communication traffic, respectively. The matrix $W$ is defined as in (36); the step sizes $\alpha$ and $\eta$, the initial guess at each node and the termination condition are as in the previous test. In Figures 5 and 6 we plot the results for $m$ in $\{2, 4, \dots, 48, 50\}$. Similarly to the previous test, DFIX outperforms both the Harnessing and Projection methods in terms of computation and communication. From Figure 6 we can notice that the communication required by the two methods for distributed linear systems, DFIX and Projection, is similar, and that the difference with the communication required by the Harnessing method increases as the degree of the graph increases. Regarding the computational cost (Figure 5), while DFIX is cheaper than the other two methods, the Projection method seems to be more influenced by the connectivity of the network and is more efficient than Harnessing only for large values of the degree.
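The test systems described above can be generated as follows. This is a sketch under the assumption that the symmetrized matrix $\tilde A$ is the one shifted by $(n-1)I$, which is what makes $A$ symmetric; the function name and the seed are arbitrary choices for illustration.

```python
import numpy as np

def random_sdd_system(n, rng):
    """Symmetric, strictly diagonally dominant A and uniform right-hand side b."""
    A_hat = rng.uniform(0.0, 1.0, size=(n, n))
    A_tilde = 0.5 * (A_hat + A_hat.T)        # symmetrize
    A = A_tilde + (n - 1) * np.eye(n)        # shift the diagonal by n - 1
    b = rng.uniform(0.0, 1.0, size=n)
    return A, b

rng = np.random.default_rng(0)               # illustrative seed
A, b = random_sdd_system(100, rng)

diag = np.diag(A)
off_diag_row_sums = np.abs(A).sum(axis=1) - np.abs(diag)
jacobi_norm = np.max(off_diag_row_sums / diag)   # infinity norm of the Jacobi matrix
```

Each row has $n - 1$ off-diagonal entries smaller than 1 and a diagonal entry of at least $n - 1$, so strict diagonal dominance holds by construction and the Jacobi iteration matrix has infinity norm below 1.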

Time-varying Network
We now compare the performance of the three methods in the time-varying case described in Section 4. The sequence $\{G_k\}$ is generated as follows. We first fix a strongly connected graph $G = (V, E)$ and a scalar $\gamma \in (0, 1]$. Then, at every iteration $k$ we randomly generate $E_k$ by uniformly sampling $\gamma|E|$ edges from $E$ and we set $G_k = (V, E_k)$. This construction can be interpreted as having a fixed underlying graph $G$ that represents the available communication links among the nodes, and employing at each iteration only a fraction $\gamma$ of the links. In particular, $\gamma = 1$ corresponds to the case when $G_k = G$ for every $k$. As remarked in Section 4, this is equivalent to the time-independent case.
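The edge-sampling construction can be sketched as below. The function name, the rounding of $\gamma|E|$ to the nearest integer, and the example edge list are assumptions made for illustration; self-loops, required by Assumption A3, are assumed to be handled separately and always kept.

```python
import numpy as np

def sample_edges(edges, gamma, rng):
    """Uniformly sample round(gamma * |E|) distinct edges from the list `edges`."""
    num_kept = int(round(gamma * len(edges)))
    kept = rng.choice(len(edges), size=num_kept, replace=False)
    return [edges[t] for t in sorted(kept)]

# Undirected ring on 10 nodes, stored as 20 directed edges (no self-loops here).
n = 10
edges = [(i, (i + 1) % n) for i in range(n)] + [((i + 1) % n, i) for i in range(n)]

rng = np.random.default_rng(1)               # illustrative seed
E_k = sample_edges(edges, gamma=0.3, rng=rng)
E_full = sample_edges(edges, gamma=1.0, rng=rng)
```

With $\gamma = 1$ the sample is the whole edge set, which reproduces the time-independent case.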
The test we present here is carried out by comparing the communication and computational costs required by the three methods to solve a given linear system using the same sequence of networks $\{G_k\}$. In practice we generated the linear system as in Section 5.2 and we chose $G$ as the undirected $m$-regular graph with $n = 100$ vertices and degree $m = 8$. We repeated the same test for $\gamma$ in $\{0.1, 0.2, \dots, 1\}$. For every $k$ the consensus matrix $W_k$ associated with $G_k$ is defined as in (36), and the termination condition and all the parameters of the methods are chosen as in the previous sections. In Figures 7, 8 and 9 we plot the results (note that Figure 8 repeats the results of Figure 7, focusing only on the comparison of Harnessing versus DFIX-JOR). The computational cost and the communication traffic are calculated as described in Section 5.2.
We can see that, in the considered framework, DFIX outperforms the Harnessing method both in terms of computation and communication. Comparing with Projection, we have that, for every value of the parameter $\gamma$, the computational cost of DFIX is significantly lower, but it requires a smaller amount of communication only for large values of $\gamma$ (that is, when each graph $G_k$ is equal or close to $G$). Moreover, we can see that for all the methods there is an optimal value of $\gamma < 1$ that minimizes the communication traffic, suggesting that using the whole graph $G$ at every iteration (that is, setting $\gamma = 1$) is inefficient. A similar phenomenon occurs for Harnessing and DFIX also for the computational cost (Figure 8), while we can see in Figure 7 that the Projection method is most efficient when all the available communication links are used at each iteration. For $\gamma < 1$ the networks $G_k$ are in general not connected, but the joint connectivity of the overall sequence is enough to ensure the convergence of the methods.

Conclusions
We proposed a class of novel, iterative, distributed methods for the solution of linear systems of equations, derived upon classical fixed point methods. We proved global convergence in the case when the communication network is strongly connected and we showed that the convergence rate depends on the diameter of the network and on the norm of the underlying iterative matrix. In particular we have that if the graph is strongly connected, the obtained result is analogous to the classical, centralized, case. We extended the presented method to the time-varying case and we proved an analogous convergence result, assuming the networks satisfy suitable joint connectivity assumptions, comparable with those required by different methods in literature.
Our algorithm was compared with the relevant methods presented in [11] and [12]. The numerical results showed good performance of DFIX compared with the mentioned methods. In particular, in the vast majority of the considered tests, DFIX outperformed the two methods in terms of both computational cost and communication traffic.
under the Marie Skłodowska-Curie Grant Agreement no. 812912. The work of Jakovetić, Krejić and Krklec Jerinkić is partially supported by Serbian Ministry of Education, Science and Technological Development, grant no. 174030.