Asynchronous Gossip-Based Gradient-Free Method for Multiagent Optimization

Introduction
In recent years, the problem of solving convex optimization problems over a network has attracted considerable research attention; see [1–18]. The objective function of the problem is a sum of convex functions, each of which is known by a specific agent only. Such problems arise in many real applications, including distributed finite-time optimal rendezvous [2] and distributed regression over sensor networks [5]. Methods designed for solving these optimization problems need to be fully distributed; that is, they must not rely on a central coordinator.
In this paper, we propose an asynchronous gossip-based gradient-free method for solving the convex optimization problem over a multiagent network. The method is based on the gossip algorithm [19] and the gradient-free oracles of [20]. The method is asynchronous in the sense that only one agent communicates at a given time, in contrast to synchronous methods, where all agents communicate simultaneously. Moreover, the method does not rely on the assumption that subgradient information for the objective functions is available. It is well known that, for a variety of reasons, there are many instances where derivatives of the objective functions are unavailable or computationally expensive to calculate [20, 21].
Literature Review. In [3], the authors study the problem of minimizing a sum of multiple convex functions, each of which is known to one specific agent only. The authors use the average consensus algorithm from the literature on multiagent systems (see, e.g., [19, 22–26]) as a mechanism to develop a distributed subgradient method for solving the optimization problem; the convergence of the method is established for a constant step size. The authors in [7] further take global equality and inequality constraints into consideration. The work in [2] proposes a variant of the distributed subgradient method in [3], in which several consensus steps are executed at each iteration; this simplifies the convergence proof. Inspired by the work in [2], the authors in [6] further incorporate global inequality constraints. The aforementioned methods are synchronous, because they require that all agents in the network update at the same time. To overcome this limitation, the work in [14] develops an asynchronous distributed algorithm based on the gossip algorithm. The algorithm is asynchronous in the sense that only one agent communicates at a given time. Moreover, the agents use different step size values that require no coordination among them. In [5], the author further removes the need for bidirectional communications in the asynchronous algorithm of [14]; the convergence of the algorithm is also established. The aforementioned methods, however, rely on the assumption that the subgradients of the objective functions are available to the respective agents.
Compared to previous work, the main contributions of this paper are twofold: (i) unlike the methods considered in existing papers, which rely on computing the subgradients of each agent's objective function, we propose a derivative-free method based on random gradient-free oracles; (ii) the proposed method is asynchronous, in the sense that all agents use different step size values that require no coordination among the agents. We prove that, with probability 1, the iterates of all agents converge to the same optimal point of the problem, for a diminishing step size.
Notation and Terminology. Let $\mathbb{R}^n$ be the $n$-dimensional vector space. We denote the standard inner product on $\mathbb{R}^n$ by $\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$, for $x, y \in \mathbb{R}^n$. We write $\|x\|$ to denote the Euclidean norm of a vector $x$ and $\Pi_X[x]$ to denote the Euclidean projection of a vector $x$ on $X$. We use $x^{\mathrm{T}}$ to denote the transpose of $x$. For a matrix $P$, $[P]_{ij}$ represents the element in the $i$th row and $j$th column of $P$, and $P^{\mathrm{T}}$ represents its transpose. We use $E[X]$ to denote the expected value of a random variable $X$. For a function $f$, its gradient at a point $x$ is represented by $\nabla f(x)$.

Problem Formulation
In this section, we start by describing the constrained multiagent optimization problem. Then, we provide some preliminary results on the gossip algorithm that we use in developing the method.
2.1. Constrained Multiagent Optimization. We consider the following constrained multiagent optimization problem:
$$\min_{x \in X} \; f(x) = \sum_{i=1}^{N} f_i(x), \tag{1}$$
where $x \in \mathbb{R}^n$ is a decision vector; $f_i \colon \mathbb{R}^n \to \mathbb{R}$ is the convex objective function of agent $i$, known only by agent $i$, and we assume that $f_i$ is Lipschitz continuous over $X$ with Lipschitz constant $L(f_i)$; $X \subseteq \mathbb{R}^n$ is a nonempty closed convex set. We denote the optimal set of problem (1) by $X^*$, and we assume that it is nonempty. Note that in problem (1) each function $f_i$ need not be differentiable.
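To make the setting concrete, the following minimal Python sketch builds a toy instance of problem (1). The particular choice of $f_i$, the box constraint, and all names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy instance of problem (1): N agents, agent i privately holds
# f_i(x) = ||x - b_i||_1, which is convex, nondifferentiable, and Lipschitz
# over X with L(f_i) = sqrt(n) in the Euclidean norm. X is a box, so the
# Euclidean projection Pi_X is a componentwise clip.
N, n = 5, 3
rng = np.random.default_rng(0)
b = rng.normal(size=(N, n))  # data b[i] is known only to agent i

def f_local(i, x):
    """Convex, nondifferentiable local objective of agent i."""
    return np.abs(x - b[i]).sum()

def project_X(x, lo=-10.0, hi=10.0):
    """Euclidean projection onto the box X = [lo, hi]^n."""
    return np.clip(x, lo, hi)

def f_total(x):
    """Global objective f(x) = sum_i f_i(x); over this X it is minimized
    by the componentwise median of the b[i]."""
    return sum(f_local(i, x) for i in range(N))
```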

2.2. Gossip Algorithm.
The underlying network topology of problem (1) is denoted by $G = (V, E)$, where $V = \{1, \ldots, N\}$ is the node set and $E$ is the set of links $\{i, j\}$, with $i \neq j$ and $\{i, j\} \in E$ only if there is a link between agents $i$ and $j$. We assume that the network $G$ is fixed, undirected, and connected.
In this paper, we utilize the gossip algorithm as a mechanism to design the method. To be specific, at each time instant, an agent $i$ is chosen with probability $1/N$, and then, with some positive probability, agent $i$ communicates with one of its neighbors, agent $j$. The iterates evolve as follows: for $k \ge 0$,
$$x_i^{k+1} = x_j^{k+1} = \frac{x_i^k + x_j^k}{2},$$
and, for agents $l$ that do not belong to $\{i, j\}$, $x_l^{k+1} = x_l^k$.
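As an illustration, a single tick of this randomized averaging scheme might be simulated as follows (a minimal sketch; the function and variable names are ours, not the paper's):

```python
import numpy as np

def gossip_step(x, neighbors, rng):
    """One gossip tick: a uniformly chosen agent i averages with a random
    neighbor j; all other agents keep their current iterates.

    x: (N, n) array of agent iterates; neighbors: dict mapping i -> list of j.
    """
    N = x.shape[0]
    i = int(rng.integers(N))            # agent i wakes up with probability 1/N
    j = int(rng.choice(neighbors[i]))   # i contacts one of its neighbors
    avg = 0.5 * (x[i] + x[j])
    x = x.copy()
    x[i] = avg                          # x_i^{k+1} = (x_i^k + x_j^k) / 2
    x[j] = avg                          # x_j^{k+1} = (x_i^k + x_j^k) / 2
    return x                            # every other x_l is unchanged
```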

Gossip-Based Gradient-Free Method
In this section, motivated by the random gradient-free method in [20] and the gossip algorithm in [19], we present an asynchronous gossip-based gradient-free method for solving problem (1). We use $I_{k+1}$ to denote the index of the agent that is chosen to update at time $k+1$ and $J_{k+1}$ the index of the agent communicating with agent $I_{k+1}$. The method is given as follows.

Gossip-Based Gradient-Free Method with a Diminishing Step Size. (i) For $i \in \{I_{k+1}, J_{k+1}\}$: agent $i$ forms the averaged point
$$v_i^{k+1} = \frac{x_{I_{k+1}}^k + x_{J_{k+1}}^k}{2}$$
and updates
$$x_i^{k+1} = \Pi_X\!\left[v_i^{k+1} - \alpha_i^{k+1}\, G_{\mu_i^{k+1}}\!\left(v_i^{k+1}\right)\right],$$
where $\alpha_i^k = (\Gamma_i^k)^{-1}$, and $\Gamma_i^k$ denotes the number of updates that agent $i$ has performed until time $k$, inclusively, and $G_{\mu_i^k}(\cdot)$ is the random gradient-free oracle, given by
$$G_{\mu_i^k}(x) = \frac{f_i\!\left(x + \mu_i^k \xi_i^k\right) - f_i(x)}{\mu_i^k}\,\xi_i^k,$$
where $\mu_i^k = \mu \alpha_i^k$, and $\mu$ is a positive constant; $\xi_i^k$ is a random variable generated locally according to the Gaussian distribution. (ii) For $i \notin \{I_{k+1}, J_{k+1}\}$: $x_i^{k+1} = x_i^k$.

We use $\mathcal{F}_k$ to denote the $\sigma$-field generated by the entire history of the random variables up to iteration $k$, where $\mathcal{F}_0 = \{x_i^0, i \in V\}$. The method can be presented in a more compact form, by defining the following weight matrix:
$$W_{k+1} = I - \frac{1}{2}\left(e_{I_{k+1}} - e_{J_{k+1}}\right)\left(e_{I_{k+1}} - e_{J_{k+1}}\right)^{\mathrm{T}},$$
where $I$ is the identity matrix and $e_i \in \mathbb{R}^N$ denotes the $i$th standard basis vector. It is easy to see that $W_{k+1} \in \mathbb{R}^{N \times N}$ is doubly stochastic. Now we can write the method as follows:
$$x_i^{k+1} = \Pi_X\!\left[\sum_{j=1}^{N}\left[W_{k+1}\right]_{ij} x_j^k - \alpha_i^{k+1}\, G_{\mu_i^{k+1}}\!\left(v_i^{k+1}\right)\, 1_{\{i \in \{I_{k+1}, J_{k+1}\}\}}\right] \tag{7}$$
for all $k \ge 0$ and any $i \in V$, where $1_{\{i \in \{I_{k+1}, J_{k+1}\}\}}$ is the indicator function of the event $\{i \in \{I_{k+1}, J_{k+1}\}\}$. For the gradient-free oracle $G_{\mu_i^k}(\cdot)$, we have the following lemma, which is adopted from [20].
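The following Python sketch simulates one pass of the method as reconstructed above, under the stated assumptions (the averaged point $v_i^{k+1}$ is the point queried by the oracle, and each agent keeps a private update counter $\Gamma_i$). All function and variable names are illustrative.

```python
import numpy as np

def oracle(f_i, v, mu, rng):
    """Random gradient-free oracle of [20]: two function evaluations of f_i
    along a locally generated Gaussian direction."""
    xi = rng.normal(size=v.shape)
    return (f_i(v + mu * xi) - f_i(v)) / mu * xi

def method_step(x, counts, fs, neighbors, project, mu0, rng):
    """One asynchronous iteration: agents I and J average their iterates,
    query their own oracles at the averaged point, and take projected steps
    with their own diminishing step sizes; everyone else is idle."""
    N = x.shape[0]
    I = int(rng.integers(N))            # I_{k+1}, chosen with probability 1/N
    J = int(rng.choice(neighbors[I]))   # J_{k+1}, a random neighbor of I
    v = 0.5 * (x[I] + x[J])             # averaged point v^{k+1}
    for i in (I, J):
        counts[i] += 1                  # Gamma_i: local update counter
        alpha = 1.0 / counts[i]         # step size alpha_i = Gamma_i^{-1}
        mu = mu0 * alpha                # smoothing parameter mu_i = mu * alpha_i
        x[i] = project(v - alpha * oracle(fs[i], v, mu, rng))
    return x                            # x_l is unchanged for l not in {I, J}

# Illustrative driver: ring network, local objectives f_i(x) = ||x - b_i||_1.
N, n = 5, 3
rng = np.random.default_rng(1)
b = rng.normal(size=(N, n))
fs = [lambda x, bi=b[i]: np.abs(x - bi).sum() for i in range(N)]
neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
project = lambda z: np.clip(z, -10.0, 10.0)
x = rng.normal(size=(N, n))
counts = np.zeros(N, dtype=int)
for _ in range(50_000):
    x = method_step(x, counts, fs, neighbors, project, mu0=1.0, rng=rng)
# All rows of x should now be close to the componentwise median of the b[i].
```

Note that only function values of the $f_i$ are ever evaluated: no gradient or subgradient code appears anywhere in the update.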
Lemma 1. For each $i \in \{I_{k+1}, J_{k+1}\}$ and all $k \ge 0$, the Gaussian smoothing $f_i^{\mu_i^k}(x) = E_{\xi}\bigl[f_i(x + \mu_i^k \xi)\bigr]$ of $f_i$ is convex, and it satisfies the following: (a) $f_i(x) \le f_i^{\mu_i^k}(x) \le f_i(x) + \mu_i^k \sqrt{n}\, L(f_i)$ for all $x \in X$; (b) $E_{\xi}\bigl[G_{\mu_i^k}(x)\bigr] = \nabla f_i^{\mu_i^k}(x)$, with $\bigl\|\nabla f_i^{\mu_i^k}(x)\bigr\| \le (n+4)\, L(f_i)$.

Remark 2. Note that method (7) is asynchronous, in the sense that, to implement the method, each agent need not coordinate its step size with the step sizes of its neighbors; the time-varying parameters $\mu_i^k$ ($k \ge 0$, $i \in V$) share the same feature. In addition, to implement method (7), information about the subgradients of the objective functions is not needed; instead, each agent only needs to make two function evaluations per iteration to obtain the gradient-free oracle.
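As a quick sanity check of the sandwich bound in Lemma 1(a), one can estimate the Gaussian smoothing by Monte Carlo. The sketch below assumes $f(x) = \|x\|_1$, whose Euclidean Lipschitz constant is $\sqrt{n}$; all numbers are illustrative.

```python
import numpy as np

# Monte Carlo check of Lemma 1(a): f(x) <= f_mu(x) <= f(x) + mu*sqrt(n)*L(f),
# where f_mu(x) = E[f(x + mu*xi)] with xi ~ N(0, I_n).
rng = np.random.default_rng(0)
n, mu = 8, 0.3
L = np.sqrt(n)                                      # Lipschitz constant of ||.||_1
f = lambda x: np.abs(x).sum()
x = 0.2 * rng.normal(size=n)
xi = rng.normal(size=(100_000, n))
f_mu_hat = np.abs(x + mu * xi).sum(axis=1).mean()   # estimate of f_mu(x)
print(f(x), f_mu_hat, f(x) + mu * np.sqrt(n) * L)   # should print in increasing order
```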
Let $E_i^k = \{i \in \{I_k, J_k\}\}$ be the event that agent $i$ updates at time $k$ and $p_i$ the probability of the event $E_i^k$. It is easy to see that
$$p_i = \frac{1}{N}\left(1 + \sum_{j \in \Omega_i} P_{ji}\right),$$
where $\Omega_i$ denotes the set that contains all agents that are neighbors of agent $i$ and $P_{ji} > 0$ denotes the probability that agent $i$ is chosen by its neighbor $j$ to communicate. In this paper, we denote $\underline{\pi} = \min_{i \in V} p_i$ and $\overline{\pi} = \max_{i \in V} p_i$, respectively. There is an interesting link between the step size $\alpha_i^k = (\Gamma_i^k)^{-1}$ and the probability $p_i$ that agent $i$ updates.
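In code, with a neighbor-selection matrix $P$ (an assumed representation: entry $P[j, i]$ holds the probability that a waking agent $j$ contacts $i$), the update probabilities read:

```python
import numpy as np

def update_probabilities(P):
    """p_i = (1 + sum_j P[j, i]) / N: agent i updates either because it wakes
    up itself (probability 1/N) or because a waking neighbor selects it."""
    N = P.shape[0]
    return (1.0 + P.sum(axis=0)) / N

# Example: 3-cycle, each agent contacts its two neighbors uniformly.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
print(update_probabilities(P))  # [2/3, 2/3, 2/3]; the p_i sum to 2, since
                                # exactly two agents update at every tick.
```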
Lemma 3 (see [17]). Let $P_{\min} = \min_{\{i,j\} \in E} P_{ij}$. Let $\alpha_i^k = (\Gamma_i^k)^{-1}$ for all $k \ge 1$ and $i \in V$, and also let $q$ be a scalar such that $0 < q < 1/2$. Then, there exists a large enough $\tilde{k} = \tilde{k}(q, N)$ such that, with probability 1, for all $k \ge \tilde{k}$ and $i \in V$, the local step size behaves like $1/(k p_i)$; in particular, (a) $\alpha_i^k \le 2/(k p_i)$, and (c) $\bigl|\alpha_i^k - 1/(k p_i)\bigr|$ is of order $1/k^{3/2 - q}$.

To establish the convergence of method (7), we also make use of the following lemma, a standard supermartingale convergence result.

Lemma 4. Let $\{v_k\}$, $\{u_k\}$, $\{a_k\}$, and $\{b_k\}$ be sequences of nonnegative random variables adapted to a filtration $\{\mathcal{F}_k\}$ such that, with probability 1, $E[v_{k+1} \mid \mathcal{F}_k] \le (1 + a_k)\, v_k - u_k + b_k$ for all $k$, with $\sum_k a_k < \infty$ and $\sum_k b_k < \infty$. Then, with probability 1, $\{v_k\}$ converges and $\sum_k u_k < \infty$.
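The link in Lemma 3 is a law-of-large-numbers effect: agent $i$'s update counter $\Gamma_i^k$ concentrates around $k p_i$, so $\alpha_i^k = 1/\Gamma_i^k \approx 1/(k p_i)$. A small simulation (illustrative parameters, reusing the 3-cycle above) makes this visible:

```python
import numpy as np

# Empirical update rates Gamma_i^K / K versus the probabilities p_i.
rng = np.random.default_rng(0)
N, K = 3, 100_000
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])          # neighbor-selection probabilities
p = (1.0 + P.sum(axis=0)) / N            # update probabilities, as above
counts = np.zeros(N)
for k in range(K):
    i = int(rng.integers(N))             # agent i wakes up w.p. 1/N
    j = int(rng.choice(N, p=P[i]))       # and contacts neighbor j w.p. P[i, j]
    counts[[i, j]] += 1
print(counts / K, p)                     # the two vectors should nearly match
```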
We now present the main result of the paper, which is given in the following theorem.

Theorem 5. Let $\{x_i^k\}$, $i \in V$, be the iterates generated by method (7). Then, with probability 1, the iterates of all agents converge to the same optimal point in $X^*$.
Proof. For $k \ge 0$ and $i \in \{I_{k+1}, J_{k+1}\}$, the nonexpansive property of the projection operation gives, for any $z \in X$,
$$\left\|x_i^{k+1} - z\right\|^2 \le \left\|v_i^{k+1} - z\right\|^2 - 2\alpha_i^{k+1}\left\langle G_{\mu_i^{k+1}}\!\left(v_i^{k+1}\right),\, v_i^{k+1} - z\right\rangle + \left(\alpha_i^{k+1}\right)^2\left\|G_{\mu_i^{k+1}}\!\left(v_i^{k+1}\right)\right\|^2. \tag{10}$$
For $k \ge \tilde{k}$, by recalling Lemma 3(c), with probability 1 the last term on the right-hand side of (10) can be bounded using $\alpha_i^{k+1} \le 2/(k p_i)$; we substitute this bound into (10). Taking the conditional expectation on $\mathcal{F}_k$, $I_{k+1}$, and $J_{k+1}$ jointly, the cross term becomes $-2\alpha_i^{k+1}\langle \nabla f_i^{\mu_i^{k+1}}(v_i^{k+1}),\, v_i^{k+1} - z\rangle$, by Lemma 1(b), and the squared-norm term is bounded via $\|\nabla f_i^{\mu_i^{k+1}}(v_i^{k+1})\| \le (n+4)\, L(f_i)$, again according to Lemma 1. By the convexity of the smoothed function, the cross term is at most $-2\alpha_i^{k+1}\bigl(f_i^{\mu_i^{k+1}}(v_i^{k+1}) - f_i^{\mu_i^{k+1}}(z)\bigr)$, and we can return to the original objectives through the inequalities $f_i^{\mu_i^{k+1}}(v_i^{k+1}) \ge f_i(v_i^{k+1})$ and $f_i^{\mu_i^{k+1}}(z) \le f_i(z) + \mu_i^{k+1}\sqrt{n}\, L(f_i)$, based on Lemma 1(a). Using the fact that $\mu_i^k = \mu\alpha_i^k$ together with Lemma 3(a), the error terms introduced by the smoothing are of the order of the squared step sizes. Taking the expectation with respect to $\mathcal{F}_k$, and using the fact that the preceding inequality holds with probability $p_i$ while $x_i^{k+1} = x_i^k$ with probability $1 - p_i$, we obtain a recursion of the form required by Lemma 4, with $z = x^* \in X^*$. Now we are ready to establish the convergence of the method. First, note that the perturbation terms in the recursion are summable with probability 1, which can be easily seen from the explicit expressions for $\alpha_i^k$ and $\mu_i^k$. For the disagreement term $\max_{i \in V}\|x_i^k - \bar{x}^k\|$, where $\bar{x}^k$ denotes the average of the agents' iterates, we can follow an argument similar to the proof of Lemma 4 in [5] and derive that, for each $i \in V$, its weighted sum over $k$ is finite with probability 1. Applying Lemma 4 then shows that $\|x_i^k - x^*\|$ converges for every agent and that the claim of Theorem 5 follows.

Remark 6. Note that other choices of the parameters $\mu_i^k$ ($k \ge 0$, $i \in V$) are possible. For example, we can set $\mu_i^k = \sqrt{\alpha_i^k}$, for all $k \ge 0$ and any $i \in V$, in which case the convergence of method (7) can also be established.

Remark 7.
In contrast to the subgradient-based methods in [1–3], the implementation of the proposed method does not need subgradient information but only function values. This makes our method suitable for cases where explicit gradient calculations are computationally infeasible or expensive. In contrast to the gradient-free method in [13], the proposed method is asynchronous, and the step sizes do not require any coordination among the agents.

Conclusion
In this paper, we have considered the constrained multiagent optimization problem. We have presented an asynchronous method, based on the gossip algorithm and gradient-free oracles, for solving the problem. The proposed method removes the need for synchronous communication as well as for subgradient information. Finally, we have proved that, with probability 1, the iterates of all agents converge to the same optimal point of the problem, for a diminishing step size. Several interesting questions remain to be explored. For instance, it would be interesting to study the case of a constant step size; it would also be interesting to study the effects of message quantization on the proposed method.