Convergence of a distributed method for minimizing the sum of convex functions with fixed-point constraints

In this paper, we consider a distributed optimization problem of minimizing a sum of convex functions over the intersection of fixed-point constraints. We propose a distributed method for solving this problem and prove that the generated sequence converges to a solution under suitable assumptions. We further discuss the convergence rate for an appropriate choice of positive stepsizes. A numerical experiment illustrates the effectiveness of the theoretical results.


Introduction
Let $\mathbb{R}^k$ be a Euclidean space with an inner product $\langle\cdot,\cdot\rangle$ and the associated norm $\|\cdot\|$. Let $f:\mathbb{R}^k\to\mathbb{R}$ be a convex objective function. The convex optimization problem

minimize $f(x)$ over $x\in\mathbb{R}^k$ (1)

is to find a point $x^*\in\mathbb{R}^k$ such that $f(x^*)\le f(x)$ for all $x\in\mathbb{R}^k$; such a point $x^*$ is called an optimal solution of problem (1). The basic idea for finding an optimal solution of problem (1) is to generate a sequence that is expected to converge to a solution under suitable assumptions. The simplest iterative method in the literature for solving problem (1) is the well-known gradient method [5], which essentially has the form: for a given $x_1\in\mathbb{R}^k$, calculate

$x_{n+1} = x_n - \gamma_n\nabla f(x_n)$,

where $\nabla f(x_n)$ is the gradient of $f$ at $x_n$ and $\gamma_n$ is a positive stepsize. Notice that, if the function $f$ is nonsmooth, the gradient method is not practically applicable. To overcome this limitation, Martinet [17] proposed the so-called proximal method, which is defined as follows: for a given $x_1\in\mathbb{R}^k$, calculate

$x_{n+1} = \operatorname{argmin}_{y\in\mathbb{R}^k}\Big(f(y) + \frac{1}{2\gamma_n}\|y - x_n\|^2\Big)$,

where $\gamma_n$ is a positive stepsize.
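For concreteness, here is a minimal Python sketch of one gradient step and one proximal step. The quadratic objective $f(x)=\frac{1}{2}\|x-c\|^2$ and the vector `c` are illustrative choices (not from the paper); for this particular $f$ the proximal step happens to have a closed form.

```python
import numpy as np

# Illustrative objective: f(x) = 0.5 * ||x - c||^2, whose gradient is x - c and
# whose proximal step has the closed form (x + gamma * c) / (1 + gamma).
c = np.array([1.0, -2.0])

def grad_step(x, gamma):
    # gradient method: x_{n+1} = x_n - gamma_n * grad f(x_n)
    return x - gamma * (x - c)

def prox_step(x, gamma):
    # proximal method: x_{n+1} = argmin_y f(y) + ||y - x||^2 / (2 * gamma)
    return (x + gamma * c) / (1.0 + gamma)

x = np.zeros(2)
for n in range(1, 51):
    x = prox_step(x, gamma=1.0 / n)   # diminishing positive stepsize
print(x)  # approaches the minimizer c
```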
It is well known that many practical problems involve constraints; in this case, problem (1) becomes the constrained minimization problem:

minimize $f(x)$ subject to $x\in C$, (2)

where $C\subset\mathbb{R}^k$ is a nonempty closed convex set. To solve this problem, in 1961, Rosen [22] proposed a gradient projection method, which essentially has the form: for a given $x_1\in C$, calculate

$x_{n+1} = P_C\big(x_n - \gamma_n\nabla f(x_n)\big)$,

where $P_C:\mathbb{R}^k\to C$ is the metric projection onto $C$. For some specific separable objective functions and linearly constrained sets, one may consult [3, 4]. However, in many practical situations the structure of $C$ can be complicated, e.g., $C=\bigcap_{i=1}^m C_i$, where each $C_i\subset\mathbb{R}^k$ is a closed convex set, which makes $P_C$ difficult, or perhaps impossible, to evaluate explicitly. To overcome this limitation, Yamada [27] proposed a method that essentially replaces the use of $P_C$ with an appropriate nonexpansive operator $T$. Indeed, by interpreting $C$ as the fixed-point set of $T$, one considers the following problem:

minimize $f(x)$ subject to $x\in\operatorname{Fix}T$, (3)

where $\operatorname{Fix}T$ stands for the fixed-point set of the operator $T$. The method essentially has the form: for a given $x_1\in\mathbb{R}^k$, calculate

$x_{n+1} = Tx_n - \gamma_n\nabla f(Tx_n)$.

Under some assumptions on the function $f$, the convergence of the iterates is guaranteed. Many developments and applications of Yamada's method are presented in the literature, for instance, [7, 8, 14, 15, 18–20, 24, 26, 28].
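As a toy illustration of a Yamada-style iteration (a sketch under our own assumptions, not the authors' scheme), take $T$ to be the projection onto a box, so that $\operatorname{Fix}T$ is the box itself; the data `d`, the box $[0,1]^2$, and the stepsize $\gamma_n = 1/n$ are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative instance: minimize f(x) = 0.5 * ||x - d||^2 over Fix T, where
# T is the projection onto the box [0,1]^2, so that Fix T = [0,1]^2.
d = np.array([2.0, -0.5])

def T(x):
    # metric projection onto the box is componentwise clipping
    return np.clip(x, 0.0, 1.0)

x = np.array([5.0, 5.0])
for n in range(1, 501):
    y = T(x)
    x = y - (1.0 / n) * (y - d)   # x_{n+1} = T x_n - gamma_n * grad f(T x_n)
print(T(x))  # approaches [1.0, 0.0], the minimizer of f over the box
```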
Denote $I=\{1,2,\dots,m\}$. Let us focus on a networked system with $m$ users, where each user $i\in I$ is assumed to have its own private convex objective function $f_i$ and nonlinear operator $T_i$. Moreover, we assume that each user can communicate with the other users. The main objective of this system is to deal with a distributed optimization problem of minimizing the additive objective function $\sum_{i\in I}f_i$ over the common intersection constraint $\bigcap_{i\in I}\operatorname{Fix}T_i$, in such a way that not only the system but also each user $i\in I$ can reach an optimal solution without using the private information of the other users. It is worth noting that, in this situation, the explicit forms of the function $\sum_{i\in I}f_i$ and of the common constraint $\bigcap_{i\in I}\operatorname{Fix}T_i$ are not known. This means that Yamada's method is not applicable to the problem. Many authors have investigated this distributed optimization problem and tackled this limitation, for instance, [10, 23, 25]. Practical applications of the distributed optimization problem arise, for instance, in network resource allocation [9, 11, 13] and in machine learning [12].
In this work, we also deal with this situation by considering the distributed optimization problem with a common fixed-point constraint as follows. For every $i\in I$, assume that:

(A1) $f_i:\mathbb{R}^k\to\mathbb{R}$ is a convex function;
(A2) $T_i:\mathbb{R}^k\to\mathbb{R}^k$ is a $\rho_i$-strongly quasi-nonexpansive operator with $\rho_i>0$ that satisfies the DC principle;
(A3) $X_0\subset\mathbb{R}^k$ is a nonempty closed convex and bounded set.

We will solve the problem:

minimize $f(x):=\sum_{i\in I}\omega_i f_i(x)$ subject to $x\in X:=X_0\cap\bigcap_{i\in I}\operatorname{Fix}T_i$, (4)

where $\omega_i\in(0,1]$, $i\in I$, are the users' weights with $\sum_{i\in I}\omega_i=1$. We denote the solution set of (4) by $S$ and assume that it is a nonempty set. We will propose a distributed method for solving problem (4) and show that, for a suitable stepsize, the sequence generated by this method has a subsequence that converges to a solution of problem (4). By assuming one of the objective functions to be strictly convex, we prove the convergence of the whole generated sequences to the unique solution of the problem. Further, we also discuss the convergence rate of weighted averages of the generated sequences. Finally, we present a numerical example to demonstrate the convergence of the proposed method.

Preliminaries
Throughout this paper, we denote by $\mathbb{R}^k$ a Euclidean space with the inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$, and we denote by $\mathrm{Id}$ the identity operator on $\mathbb{R}^k$. For an operator $T:\mathbb{R}^k\to\mathbb{R}^k$, $\operatorname{Fix}T:=\{x\in\mathbb{R}^k : Tx=x\}$ denotes the set of fixed points of $T$.
An operator $T:\mathbb{R}^k\to\mathbb{R}^k$ with a fixed point is said to be $\rho$-strongly quasi-nonexpansive, where $\rho\ge 0$, if, for all $x\in\mathbb{R}^k$ and $z\in\operatorname{Fix}T$,

$\|Tx - z\|^2 \le \|x - z\|^2 - \rho\|Tx - x\|^2$.

If $\rho=0$, then $T$ is said to be quasi-nonexpansive. Note that if $T:\mathbb{R}^k\to\mathbb{R}^k$ is a quasi-nonexpansive operator, then $\operatorname{Fix}T$ is closed and convex.
The operator $T:\mathbb{R}^k\to\mathbb{R}^k$ is said to satisfy the demi-closedness (DC) principle if $T-\mathrm{Id}$ is demiclosed at $0$, that is, for any sequence $\{x_n\}_{n\in\mathbb{N}}\subset\mathbb{R}^k$, if $x_n\to x\in\mathbb{R}^k$ and $(T-\mathrm{Id})x_n\to 0$, then $x\in\operatorname{Fix}T$. It is well known that a nonexpansive operator satisfies the DC principle according to [1, Corollary 4.28].
Let $C$ be a nonempty closed convex set. For every $x\in\mathbb{R}^k$, there is a unique $x^*\in C$ such that $\|x - x^*\|\le\|x - y\|$ for every $y\in C$ [6, Theorem 1.2.3]. We call such $x^*$ the projection of $x$ onto $C$ and denote it by $P_C(x)$. Note that $P_C$ is strongly quasi-nonexpansive with $\operatorname{Fix}P_C=C$; see [6, Theorem 2.2.21].
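For intuition, the following small Python check (an illustration of ours, not part of the paper's development) verifies the strong quasi-nonexpansivity inequality with $\rho=1$ for the projection onto a box, where the projection is just componentwise clipping.

```python
import numpy as np

rng = np.random.default_rng(0)

def P_box(x, u=0.0, v=1.0):
    # metric projection onto the box [u, v]^k is componentwise clipping
    return np.clip(x, u, v)

for _ in range(1000):
    x = rng.uniform(-5, 5, size=3)   # arbitrary point
    z = rng.uniform(0, 1, size=3)    # a point of the box, i.e., z in Fix P_box
    Px = P_box(x)
    # check ||Px - z||^2 <= ||x - z||^2 - rho * ||Px - x||^2 with rho = 1
    lhs = np.sum((Px - z) ** 2)
    rhs = np.sum((x - z) ** 2) - np.sum((Px - x) ** 2)
    assert lhs <= rhs + 1e-12
print("projection onto the box is 1-strongly quasi-nonexpansive on these samples")
```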
Let $C$ be a nonempty closed convex set. The normal cone to $C$ at $x\in C$ is defined by

$N_C(x):=\{v\in\mathbb{R}^k : \langle v, y-x\rangle\le 0 \text{ for all } y\in C\}$.

Proposition 2.1 ([6, Lemma 1.2.9]) For every $x\in\mathbb{R}^k$ and $z\in C$, the following statements are equivalent: (i) $z=P_C(x)$; (ii) $\langle x-z, y-z\rangle\le 0$ for all $y\in C$; (iii) $x-z\in N_C(z)$.

Let $f:\mathbb{R}^k\to(-\infty,\infty]$ be a function. We call the function $f$ a proper function if there is $x\in\mathbb{R}^k$ such that $f(x)<\infty$; the set of all such $x$ is called the domain of $f$ and is denoted by $\operatorname{dom}f$. Let $f:\mathbb{R}^k\to(-\infty,\infty]$ be a proper function. We call $f$ a convex function if, for every $x,y\in\operatorname{dom}f$ and $\lambda\in(0,1)$, we have

$f(\lambda x+(1-\lambda)y) \le \lambda f(x)+(1-\lambda)f(y)$.

We call $f$ strictly convex if the above inequality is strict for all $x,y\in\operatorname{dom}f$ with $x\neq y$ and $\lambda\in(0,1)$. If $f$ is a convex function, then $\operatorname{dom}f$ is a convex set. We call $f$ a strongly convex function if there is a constant $\beta>0$ such that, for every $x,y\in\operatorname{dom}f$ and $\lambda\in(0,1)$, we have

$f(\lambda x+(1-\lambda)y) \le \lambda f(x)+(1-\lambda)f(y)-\frac{\beta}{2}\lambda(1-\lambda)\|x-y\|^2$.

We call the constant $\beta$ a strong convexity parameter.
Let $f:\mathbb{R}^k\to(-\infty,\infty]$ be a proper convex function. A vector $v\in\mathbb{R}^k$ is called a subgradient of $f$ at $x\in\operatorname{dom}f$ if $f(y)\ge f(x)+\langle v, y-x\rangle$ for all $y\in\mathbb{R}^k$. We denote the set of all subgradients of $f$ at $x$ by $\partial f(x)$.
Let $C\subset\mathbb{R}^k$ be a nonempty closed convex set. The indicator function of $C$ is defined by $\imath_C(x):=0$ if $x\in C$, and $\imath_C(x):=+\infty$ otherwise. Note that $\imath_C$ is a proper convex function. Let $f:\mathbb{R}^k\to\mathbb{R}$ be a proper convex function and $C\subset\mathbb{R}^k$ be a nonempty closed convex set; we denote the set of all minimizers of $f$ over $C$ by $\operatorname{argmin}_{x\in C}f(x)$.

Proposition 2.6 ([6, Theorem 1.3.1]) Let $f:\mathbb{R}^k\to\mathbb{R}$ be a real-valued function and $C\subset\mathbb{R}^k$ be a nonempty closed convex set. If $f$ is strictly convex, then the minimizer of $f$ over $C$, if it exists, is uniquely determined. Furthermore, if $f$ is strongly convex, then $\operatorname{argmin}_{x\in C}f(x)$ is a nonempty set.
The following proposition is a key tool in our convergence analysis. The proof can be found in [16, Lemma 3.1].

Proposition 2.8
Let $\{a_n\}_{n\in\mathbb{N}}$ be a sequence of nonnegative real numbers for which there exists a subsequence $\{a_{n_j}\}_{j\in\mathbb{N}}$ of $\{a_n\}_{n\in\mathbb{N}}$ with $a_{n_j}<a_{n_j+1}$ for all $j\in\mathbb{N}$. Define, for all $n\ge n_0$,

$\tau(n):=\max\{l\in\mathbb{N} : l\le n,\ a_l<a_{l+1}\}$,

where $n_0\in\mathbb{N}$ is large enough that the set above is nonempty. Then $\{\tau(n)\}_{n\ge n_0}$ is nondecreasing, $\lim_{n\to\infty}\tau(n)=\infty$, and, for all $n\ge n_0$, $a_{\tau(n)}\le a_{\tau(n)+1}$ and $a_n\le a_{\tau(n)+1}$.
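As a concrete illustration (ours, not part of the original text), the following Python snippet computes $\tau(n)$ for a sample sequence and checks the stated properties; the sample sequence `a` is an arbitrary choice.

```python
# Illustration of Proposition 2.8: tau(n) picks the latest index l <= n at which
# the sequence increases, i.e., a_l < a_{l+1}.
a = [5.0, 4.0, 4.5, 3.0, 3.5, 3.2, 3.3, 3.1, 3.15]  # sample nonnegative sequence

def tau(n):
    # tau(n) = max{ l <= n : a_l < a_{l+1} }
    return max(l for l in range(n + 1) if a[l] < a[l + 1])

n0 = 1  # smallest n for which the set defining tau(n) is nonempty (here l = 1)
taus = [tau(n) for n in range(n0, len(a) - 1)]
assert taus == sorted(taus)            # tau is nondecreasing
for n in range(n0, len(a) - 1):
    assert a[tau(n)] <= a[tau(n) + 1]  # a_{tau(n)} <= a_{tau(n)+1}
    assert a[n] <= a[tau(n) + 1]       # a_n <= a_{tau(n)+1}
print(taus)  # [1, 1, 3, 3, 5, 5, 7]
```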

Algorithm and convergence result
In this section, we start by introducing the fixed-point distributed optimization method. We consider a networked system with $m$ users, which may have different weights, dealing with the problem of minimizing the weighted sum of all users' convex objective functions over the intersection of all users' fixed-point sets of strongly quasi-nonexpansive mappings, together with a closed convex and bounded set as a common constraint, in a Euclidean space. This setting covers the case in which the projection onto the constraint set cannot be computed efficiently.
Roughly speaking, the method works as follows: given the current iterate $x_n\in X_0$, each user $i\in I$, having its own private objective function $f_i$ and operator $T_i$, computes an estimate $x_{n,i}\in X_0$. Since the users can communicate with each other, every user can receive all the estimates $x_{n,i}\in X_0$, $i\in I$, and hence can compute the next iterate $x_{n+1}\in X_0$ as a point in the convex hull of all users' estimates.
Some further important remarks relating to Algorithm 1 are in order.

Algorithm 1: Fixed-point distributed optimization method
Initialization: Given the weights $\{\omega_i\}_{i\in I}\subset(0,1]$ with $\sum_{i\in I}\omega_i=1$ and a sequence of positive stepsizes $\{\gamma_n\}_{n\in\mathbb{N}}$, choose an initial point $x_1\in X_0$ arbitrarily.
Iterative step: For the current iterate $x_n\in X_0$ ($n\in\mathbb{N}$), compute the estimates

$x_{n,i} := \operatorname{argmin}_{y\in X_0}\Big(f_i(y)+\frac{1}{2\gamma_n}\|y-T_i x_n\|^2\Big)$, $i\in I$,

and the next iterate

$x_{n+1} := \sum_{i\in I}\omega_i x_{n,i}$.

Update $n:=n+1$.
(i) To guarantee the well-definedness of Algorithm 1, we need to ensure that the subproblem in the iterative step has a unique minimizer. Indeed, since each objective function $f_i$ is a real-valued convex function and the function $\frac{1}{2\gamma_n}\|\cdot-T_i x_n\|^2$ is strongly convex, Proposition 2.3 ensures that the objective function $f_i+\frac{1}{2\gamma_n}\|\cdot-T_i x_n\|^2$ of the subproblem is real-valued strongly convex; subsequently, the existence of a unique minimizer of the subproblem over the nonempty closed convex constraint $X_0$ is guaranteed by Proposition 2.6.

(ii) As the estimate $x_{n,i}$ is the unique minimizer of the constrained subproblem, we can ensure that $x_{n,i}\in X_0$ for all $n\in\mathbb{N}$ and $i\in I$. Since $X_0$ is bounded, the sequences $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, are bounded. Furthermore, since the iterate $x_{n+1}$ belongs to the convex hull of the estimates $x_{n,i}$, $i\in I$, the boundedness of the sequence $\{x_n\}_{n\in\mathbb{N}}\subset X_0$ is also guaranteed.
(iii) Let us compare Algorithm 1 with an existing distributed optimization method. The method in [23] is based on a fixed-point approximation method and the proximal method, like the proposed method. The difference is that, in that paper, each user $i$ first computes a proximal estimate $y_{n,i}$ with respect to $f_i$ and subsequently combines it with a fixed-point step of $T_i$ governed by a positive sequence $\{\alpha_n\}_{n\in\mathbb{N}}$. Moreover, the weights are fixed to $\omega_i=\frac{1}{m}$ and the constraint set $X_0$ is omitted in that paper. To prove the convergence result there, the assumption that the sequence $\{y_{n,i}\}_{n\in\mathbb{N}}$ is bounded for all $i\in I$ is needed, whereas in this paper the boundedness of the generated sequences is automatic, so no such assumption is required.
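Before turning to the analysis, the following Python skeleton sketches Algorithm 1 under illustrative assumptions: `operators[i]` plays the role of $T_i$, and `subproblem_solvers[i]` is a placeholder for whatever routine user $i$ employs to solve its regularized subproblem (both names are ours, not the paper's). A concrete instantiation for problem (21) is sketched in the numerical section below.

```python
import numpy as np

def algorithm1_step(x_n, gamma_n, operators, subproblem_solvers, weights):
    """One iteration of Algorithm 1 (sketch).

    operators[i](x)                 -- evaluates T_i at x
    subproblem_solvers[i](t, gamma) -- returns
        argmin_{y in X_0} f_i(y) + ||y - t||^2 / (2 * gamma)
    weights[i]                      -- omega_i, positive and summing to one
    """
    estimates = [solve(T(x_n), gamma_n)                    # x_{n,i}
                 for T, solve in zip(operators, subproblem_solvers)]
    return sum(w * e for w, e in zip(weights, estimates))  # x_{n+1}

def run_algorithm1(x1, operators, subproblem_solvers, weights,
                   n_iters=100, a=1.0, b=0.9):
    x = np.asarray(x1, dtype=float)
    for n in range(1, n_iters + 1):
        x = algorithm1_step(x, a / n ** b, operators, subproblem_solvers, weights)
    return x
```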
To get started with the convergence result, we present an important property of the iterates generated by Algorithm 1.

Lemma 3.1 Let the sequences $\{x_n\}_{n\in\mathbb{N}}\subset X_0$ and $\{x_{n,i}\}_{n\in\mathbb{N}}\subset X_0$, $i\in I$, and the stepsizes $\{\gamma_n\}_{n\in\mathbb{N}}\subset(0,+\infty)$ be given by Algorithm 1. For every $y\in X$ and $n\in\mathbb{N}$, we have

$\|x_{n+1}-y\|^2 \le \|x_n-y\|^2-\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_n-x_n\|^2+\|T_i x_n-x_{n,i}\|^2\big)+2\gamma_n\sum_{i\in I}\omega_i\big(f_i(y)-f_i(x_{n,i})\big)$.

Proof Let $y\in X$ and $n\in\mathbb{N}$ be given. For every $i\in I$, we note that

$\|x_{n,i}-y\|^2 = \|T_i x_n-y\|^2-\|T_i x_n-x_{n,i}\|^2+2\langle T_i x_n-x_{n,i}, y-x_{n,i}\rangle$. (5)

By the definition of $x_{n,i}$ and Proposition 2.7, we have

$0\in\partial\Big(f_i+\frac{1}{2\gamma_n}\|\cdot-T_i x_n\|^2+\imath_{X_0}\Big)(x_{n,i})$.

Applying Proposition 2.5, we obtain

$0\in\partial f_i(x_{n,i})+\frac{1}{\gamma_n}(x_{n,i}-T_i x_n)+N_{X_0}(x_{n,i})$,

and then $\frac{1}{\gamma_n}(T_i x_n-x_{n,i})\in\partial f_i(x_{n,i})+N_{X_0}(x_{n,i})$. (6)
By virtue of the above relation (6) and Proposition 2.5, for every $i\in I$ there are $v\in\partial f_i(x_{n,i})$ and $w\in N_{X_0}(x_{n,i})$ with $\frac{1}{\gamma_n}(T_i x_n-x_{n,i})=v+w$. The definitions of the subgradient and of the indicator function of $X_0$ yield, for every $i\in I$, that

$\langle T_i x_n-x_{n,i}, y-x_{n,i}\rangle \le \gamma_n\big(f_i(y)-f_i(x_{n,i})\big)$. (7)

Now, by using equation (5) and inequality (7), we obtain for every $i\in I$ that

$\|x_{n,i}-y\|^2 \le \|T_i x_n-y\|^2-\|T_i x_n-x_{n,i}\|^2+2\gamma_n\big(f_i(y)-f_i(x_{n,i})\big)$.

The $\rho_i$-strong quasi-nonexpansivity of $T_i$ implies for every $i\in I$ that

$\|x_{n,i}-y\|^2 \le \|x_n-y\|^2-\rho_i\|T_i x_n-x_n\|^2-\|T_i x_n-x_{n,i}\|^2+2\gamma_n\big(f_i(y)-f_i(x_{n,i})\big)$.
By multiplying the above inequality by $\omega_i$, summing over all $i\in I$, and using the convexity of $\|\cdot\|^2$ together with $x_{n+1}=\sum_{i\in I}\omega_i x_{n,i}$, we obtain that

$\|x_{n+1}-y\|^2 \le \sum_{i\in I}\omega_i\|x_{n,i}-y\|^2 \le \|x_n-y\|^2-\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_n-x_n\|^2+\|T_i x_n-x_{n,i}\|^2\big)+2\gamma_n\sum_{i\in I}\omega_i\big(f_i(y)-f_i(x_{n,i})\big)$,

as desired.
The following theorem establishes the existence of a subsequence of the generated sequence that converges to the solution set. Note from the above lemma that the sequence $\{\|x_n-y\|^2\}_{n\in\mathbb{N}}$ is not necessarily decreasing, so we need to divide the proof of the following theorem into two cases.

Theorem 3.2 Let the sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, be generated by Algorithm 1 with stepsizes $\{\gamma_n\}_{n\in\mathbb{N}}\subset(0,+\infty)$ satisfying $\lim_{n\to+\infty}\gamma_n=0$ and $\sum_{n=1}^{+\infty}\gamma_n=+\infty$. Then there exist a point $x^*\in S$ and a subsequence $\{x_{n_p}\}_{p\in\mathbb{N}}$ of $\{x_n\}_{n\in\mathbb{N}}$ such that (i) $\{x_{n_p}\}_{p\in\mathbb{N}}$ converges to $x^*$ and (ii) $\{x_{n_p,i}\}_{p\in\mathbb{N}}$ converges to $x^*$ for all $i\in I$.

Proof Since $X\subset X_0$ and $\{x_{n,i}\}_{n\in\mathbb{N}}\subset X_0$ are bounded and each $f_i$ is continuous, there exists $M>0$ such that

$|f_i(y)-f_i(x_{n,i})| \le M$ (8)

for every $y\in X$, all $n\in\mathbb{N}$, and all $i\in I$. Moreover, since $f_i$ is Lipschitz continuous relative to every bounded subset of $\mathbb{R}^k$ for all $i\in I$, there exists $L_i>0$ such that

$|f_i(x_n)-f_i(x_{n,i})| \le L_i\|x_n-x_{n,i}\|$ for all $n\in\mathbb{N}$,

and then

$\sum_{i\in I}\omega_i\big(f_i(x_n)-f_i(x_{n,i})\big) \le L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|$, (9)

where $L:=\max_{i\in I}L_i$. By using these two results, the relation in Lemma 3.1 becomes

$\|x_{n+1}-y\|^2 \le \|x_n-y\|^2-\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_n-x_n\|^2+\|T_i x_n-x_{n,i}\|^2\big)+2\gamma_n\big(f(y)-f(x_n)\big)+2\gamma_n L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|$ (10)

for every $y\in X$ and $n\in\mathbb{N}$. In order to prove the convergence result, we divide the proof into two cases according to the behavior of the sequence $\{\|x_n-y\|^2\}_{n\in\mathbb{N}}$.
Case 1. Assume that there exists $n_0\in\mathbb{N}$ such that $\|x_{n+1}-y\|^2\le\|x_n-y\|^2$ for all $y\in X$ and all $n\ge n_0$. In this case, the sequence $\{\|x_n-y\|^2\}_{n\ge n_0}$ is decreasing and bounded from below, hence $\lim_{n\to+\infty}\|x_n-y\|^2$ exists. Now, we note from Lemma 3.1 and (8) that

$\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_n-x_n\|^2+\|T_i x_n-x_{n,i}\|^2\big) \le \|x_n-y\|^2-\|x_{n+1}-y\|^2+2\gamma_n M$,

and, by the convergence of $\{\|x_n-y\|^2\}_{n\in\mathbb{N}}$ and the assumption that $\lim_{n\to+\infty}\gamma_n=0$, we obtain that

$\limsup_{n\to+\infty}\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_n-x_n\|^2+\|T_i x_n-x_{n,i}\|^2\big) \le 0$.

This implies that

$\lim_{n\to+\infty}\|T_i x_n-x_n\| = \lim_{n\to+\infty}\|T_i x_n-x_{n,i}\| = 0$ for all $i\in I$. (11)

Observing that $\|x_n-x_{n,i}\|\le\|x_n-T_i x_n\|+\|T_i x_n-x_{n,i}\|$, it follows that

$\lim_{n\to+\infty}\|x_n-x_{n,i}\| = 0$ for all $i\in I$. (12)
On the other hand, since the sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, are bounded, we also have, by (9),

$\sum_{i\in I}\omega_i\big(f_i(x_n)-f_i(x_{n,i})\big) \le L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|$ for all $n\in\mathbb{N}$.

Observe that

$\sum_{i\in I}\omega_i\big(f_i(y)-f_i(x_{n,i})\big) = f(y)-f(x_n)+\sum_{i\in I}\omega_i\big(f_i(x_n)-f_i(x_{n,i})\big) \le f(y)-f(x_n)+L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|$.

By applying Lemma 3.1 together with the above relation, we have

$\|x_{n+1}-y\|^2 \le \|x_n-y\|^2+2\gamma_n\Big(f(y)-f(x_n)+L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|\Big)$. (13)

Putting $\beta_n := f(x_n)-f(y)-L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|$ for all $n\ge n_0$ and summing up inequality (13) from $n=n_0$ to infinity yield that

$\sum_{n=n_0}^{+\infty}\gamma_n\beta_n \le \frac{\|x_{n_0}-y\|^2}{2} < +\infty$.
We next show that $\liminf_{n\to+\infty}\beta_n\le 0$. Suppose to the contrary that there exist $y\in X$, $n'\in\mathbb{N}$, and $\alpha>0$ such that $\beta_n\ge\alpha$ for all $n\ge n'$. Note that

$+\infty = \alpha\sum_{n=n'}^{+\infty}\gamma_n \le \sum_{n=n'}^{+\infty}\gamma_n\beta_n < +\infty$,

which leads to a contradiction. Thus, we have

$\liminf_{n\to+\infty}\Big(f(x_n)-f(y)-L\sum_{i\in I}\omega_i\|x_n-x_{n,i}\|\Big) \le 0$ for all $y\in X$.

Since $\lim_{n\to+\infty}\|x_n-x_{n,i}\|=0$ for all $i\in I$, we obtain that $\liminf_{n\to+\infty}f(x_n)\le f(y)$. This means that there is a subsequence $\{x_{n_p}\}_{p\in\mathbb{N}}$ of $\{x_n\}_{n\in\mathbb{N}}$ such that, for every $y\in X$,

$\lim_{p\to+\infty}f(x_{n_p}) \le f(y)$. (14)

Since $\{x_{n_p}\}_{p\in\mathbb{N}}$ is a bounded sequence, there exists a subsequence $\{x_{n_{p_l}}\}_{l\in\mathbb{N}}$ of $\{x_{n_p}\}_{p\in\mathbb{N}}$ such that $\lim_{l\to+\infty}x_{n_{p_l}}=x^*\in\mathbb{R}^k$. Since $\lim_{l\to+\infty}\|T_i x_{n_{p_l}}-x_{n_{p_l}}\|=0$ for all $i\in I$ by (11), the DC principle of $T_i$ yields $x^*\in\operatorname{Fix}T_i$ for all $i\in I$, and hence $x^*\in\bigcap_{i\in I}\operatorname{Fix}T_i$. Moreover, since $\{x_{n_{p_l}}\}_{l\in\mathbb{N}}\subset X_0$, which is a closed set, we also have $x^*\in X_0$. It follows that $x^*\in X$. The continuity of $f$ together with inequality (14) implies that

$f(x^*) = \lim_{l\to+\infty}f(x_{n_{p_l}}) \le f(y)$ for every $y\in X$,

so that $x^*\in S$. Finally, it remains to show that $x_{n_p}\to x^*\in S$. By the boundedness of $\{x_{n_p}\}_{p\in\mathbb{N}}$, it suffices to show that there is no subsequence $\{x_{n_{p_r}}\}_{r\in\mathbb{N}}$ of $\{x_{n_p}\}_{p\in\mathbb{N}}$ such that $\lim_{r\to+\infty}x_{n_{p_r}}=\bar{x}\in S$ with $x^*\neq\bar{x}$. Indeed, if this were not true, the well-known Opial argument would yield

$\lim_{n\to+\infty}\|x_n-x^*\| = \lim_{l\to+\infty}\|x_{n_{p_l}}-x^*\| < \lim_{l\to+\infty}\|x_{n_{p_l}}-\bar{x}\| = \lim_{n\to+\infty}\|x_n-\bar{x}\| = \lim_{r\to+\infty}\|x_{n_{p_r}}-\bar{x}\| < \lim_{r\to+\infty}\|x_{n_{p_r}}-x^*\| = \lim_{n\to+\infty}\|x_n-x^*\|$,

which leads to a contradiction. Therefore, the sequence $\{x_{n_p}\}_{p\in\mathbb{N}}$ converges to a point $x^*\in S$, which proves (i). Moreover, by using (12), we also obtain that $\lim_{p\to+\infty}x_{n_p,i}=x^*\in S$ for all $i\in I$, which means that (ii) holds.
Case 2. Assume that there exist a point $y\in X$ and a subsequence $\{x_{n_j}\}_{j\in\mathbb{N}}$ of $\{x_n\}_{n\in\mathbb{N}}$ such that $\|x_{n_j}-y\|^2 < \|x_{n_j+1}-y\|^2$ for all $j\in\mathbb{N}$.
Let the sequence $\{\tau(n)\}_{n\ge n_0}$ be defined as in Proposition 2.8 with $a_n:=\|x_n-y\|^2$. Then we have, for all $n\ge n_0$,

$\|x_{\tau(n)}-y\|^2 \le \|x_{\tau(n)+1}-y\|^2$ (15)

and

$\|x_n-y\|^2 \le \|x_{\tau(n)+1}-y\|^2$. (16)
By applying Lemma 3.1 together with (8) and (15), we note that

$\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_{\tau(n)}-x_{\tau(n)}\|^2+\|T_i x_{\tau(n)}-x_{\tau(n),i}\|^2\big) \le \|x_{\tau(n)}-y\|^2-\|x_{\tau(n)+1}-y\|^2+2\gamma_{\tau(n)}M \le 2\gamma_{\tau(n)}M$,

and, by using the assumption that $\lim_{n\to+\infty}\gamma_n=0$, we obtain

$\lim_{n\to+\infty}\sum_{i\in I}\omega_i\big(\rho_i\|T_i x_{\tau(n)}-x_{\tau(n)}\|^2+\|T_i x_{\tau(n)}-x_{\tau(n),i}\|^2\big) = 0$.

Thus, for all $i\in I$,

$\lim_{n\to+\infty}\|T_i x_{\tau(n)}-x_{\tau(n)}\| = \lim_{n\to+\infty}\|T_i x_{\tau(n)}-x_{\tau(n),i}\| = 0$.

Note that $\|x_{\tau(n)}-x_{\tau(n),i}\|\le\|x_{\tau(n)}-T_i x_{\tau(n)}\|+\|T_i x_{\tau(n)}-x_{\tau(n),i}\|$, and hence

$\lim_{n\to+\infty}\|x_{\tau(n)}-x_{\tau(n),i}\| = 0$ for all $i\in I$. (17)

Again, by using (13), we have for all $n\ge n_0$

$\|x_{\tau(n)+1}-y\|^2 \le \|x_{\tau(n)}-y\|^2-2\gamma_{\tau(n)}\beta_{\tau(n)}$,

which together with (15) implies $\beta_{\tau(n)}\le 0$, that is,

$f(x_{\tau(n)})-f(y) \le L\sum_{i\in I}\omega_i\|x_{\tau(n)}-x_{\tau(n),i}\|$ for all $n\ge n_0$.

Subsequently, by using (17) together with the above relation, we obtain that

$\limsup_{n\to+\infty}f(x_{\tau(n)}) \le f(y)$. (18)

Thus, there exists a subsequence $\{x_{\tau(n_q)}\}_{q\in\mathbb{N}}$ of $\{x_{\tau(n)}\}_{n\ge n_0}$ such that

$\lim_{q\to+\infty}f(x_{\tau(n_q)}) = \limsup_{n\to+\infty}f(x_{\tau(n)})$. (19)

Since the sequence $\{x_{\tau(n_q)}\}_{q\in\mathbb{N}}$ is bounded, there exists a subsequence $\{x_{\tau(n_{q_l})}\}_{l\in\mathbb{N}}$ of $\{x_{\tau(n_q)}\}_{q\in\mathbb{N}}$ such that $\lim_{l\to+\infty}x_{\tau(n_{q_l})}=x^*\in\mathbb{R}^k$. Moreover, we also have $\lim_{l\to+\infty}\|T_i x_{\tau(n_{q_l})}-x_{\tau(n_{q_l})}\|=0$. By the DC principle of $T_i$, $i\in I$, we have $x^*\in\operatorname{Fix}T_i$ for all $i\in I$; consequently, $x^*\in\bigcap_{i\in I}\operatorname{Fix}T_i$. Moreover, $\{x_{\tau(n_{q_l})}\}_{l\in\mathbb{N}}\subset X_0$, which is a closed set, so $x^*\in X_0$, and hence $x^*\in X$. Invoking (18) and (19), we obtain that

$f(x^*) = \lim_{l\to+\infty}f(x_{\tau(n_{q_l})}) \le f(y)$ for every $y\in X$,

which implies that $x^*\in S$. By using (17), we note that $\lim_{l\to+\infty}x_{\tau(n_{q_l}),i}=x^*$ for all $i\in I$.
Since $\lim_{l\to+\infty}\|x_{\tau(n_{q_l})}-x^*\|=0$ and $\|x_{\tau(n)+1}-x_{\tau(n)}\|=\|\sum_{i\in I}\omega_i x_{\tau(n),i}-x_{\tau(n)}\|\to 0$ by (17), in view of (16) we note that

$0 \le \liminf_{l\to+\infty}\|x_{n_{q_l}}-x^*\| \le \limsup_{l\to+\infty}\|x_{n_{q_l}}-x^*\| \le \limsup_{l\to+\infty}\|x_{\tau(n_{q_l})+1}-x^*\| = 0$,

which yields that $\lim_{l\to+\infty}x_{n_{q_l}}=x^*\in S$. Similarly, we have $\lim_{l\to+\infty}x_{n_{q_l},i}=x^*\in S$ for all $i\in I$. From Cases 1 and 2, there exist subsequences of $\{x_n\}_{n\in\mathbb{N}}$ and of $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, that converge to a point in $S$. This completes the proof.
By assuming at least one of the objective functions $f_i$ to be strictly convex, we obtain the convergence of the whole sequences, as the following theorem shows.

Theorem 3.3 Let the sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, be generated by Algorithm 1 with stepsizes $\{\gamma_n\}_{n\in\mathbb{N}}$ as in Theorem 3.2. If at least one of the functions $f_i$, $i\in I$, is strictly convex, then the sequences $\{x_n\}_{n\in\mathbb{N}}$ and $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, converge to the unique solution of problem (4).

Proof Note that, since the objective function $f:=\sum_{i\in I}\omega_i f_i$ is strictly convex, the solution set of problem (4) consists of exactly one point, denoted by $x^*$. We consider two cases along the lines of the proof of Theorem 3.2.
In Case 1, we obtain that there is a subsequence $\{x_{n_p}\}_{p\in\mathbb{N}}$ of the sequence $\{x_n\}_{n\in\mathbb{N}}$ that converges to a point $x^*\in S$. In the context of strict convexity of $f$, we have $S=\{x^*\}$. Since $\lim_{n\to+\infty}\|x_n-x^*\|^2$ exists in Case 1, this implies that the whole sequence $\{x_n\}_{n\in\mathbb{N}}$ converges to $x^*$. Moreover, by using (12), the sequences $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, also converge to $x^*$.
In Case 2, we obtain that there is a subsequence $\{x_{\tau(n_{q_l})}\}_{l\in\mathbb{N}}$ of $\{x_{\tau(n)}\}_{n\ge n_0}$ that converges to $x^*$, which yields that the sequence $\{x_{\tau(n)}\}_{n\ge n_0}$ converges to $x^*$, that is, $\lim_{n\to\infty}\|x_{\tau(n)}-x^*\|=0$. Since $\|x_n-x^*\|\le\|x_{\tau(n)+1}-x^*\|\le\|x_{\tau(n)}-x^*\|+\|x_{\tau(n)+1}-x_{\tau(n)}\|$ for all $n\ge n_0$ and $\|x_{\tau(n)+1}-x_{\tau(n)}\|\to 0$ by (17), this implies that

$\lim_{n\to+\infty}\|x_n-x^*\| = 0$,

which is nothing else than the convergence of the whole sequence $\{x_n\}_{n\in\mathbb{N}}$ to $x^*$. Similarly as above, the sequences $\{x_{n,i}\}_{n\in\mathbb{N}}$, $i\in I$, also converge to $x^*$. This completes the proof.
In the next theorem, we provide a bound on the feasibility error of the iterates per iteration. More precisely, we derive an error bound for the weighted averages of the fixed-point residuals $\|T_i x_n - x_n\|^2$ of the iterates $x_n$ over the common fixed-point sets.

Theorem 3.4
Let the sequence $\{x_n\}_{n\in\mathbb{N}}\subset X_0$ and the stepsizes $\{\gamma_n\}_{n\in\mathbb{N}}\subset(0,+\infty)$ be given by Algorithm 1. Suppose that $\gamma_n=\frac{a}{n^b}$, where $a>0$ and $0<b<1$. Then, for every $n\in\mathbb{N}$, we have

$\sum_{i\in I}\omega_i\Big(\frac{1}{n}\sum_{k=1}^{n}\|T_i x_k-x_k\|^2\Big) \le \Big(\rho^{-1}d_S^2(x_1)+\frac{4\rho^{-1}aLD_{X_0}}{1-b}\Big)\frac{1}{n^b}$, (20)

where $d_S(x):=\inf_{y\in S}\|x-y\|$, $D_{X_0}:=\max_{x,y\in X_0}\|x-y\|<+\infty$, $\rho:=\min_{i\in I}\rho_i$, and $L:=\max_{i\in I}L_i$, in which $L_i$ is a Lipschitz constant of $f_i$ relative to every bounded subset of $\mathbb{R}^k$.
Proof Let $y\in S$ be such that $\|x_1-y\|=d_S(x_1)$. Since $\{x_{n,i}\}_{n\in\mathbb{N}}\subset X_0$ and $y\in X_0$, we have

$\|y-x_{n,i}\| \le D_{X_0}$ for all $n\in\mathbb{N}$ and all $i\in I$.

Moreover, since $f_i$ is Lipschitz continuous relative to every bounded subset of $\mathbb{R}^k$, for all $i\in I$, there exists $L_i>0$ such that

$f_i(y)-f_i(x_{n,i}) \le L_i\|y-x_{n,i}\|$ for all $n\in\mathbb{N}$,

and then

$\sum_{i\in I}\omega_i\big(f_i(y)-f_i(x_{n,i})\big) \le LD_{X_0}$,

where $L:=\max_{i\in I}L_i$. By invoking the relation in Lemma 3.1, we have, for each $n\in\mathbb{N}$,

$\rho\sum_{i\in I}\omega_i\|T_i x_n-x_n\|^2 \le \|x_n-y\|^2-\|x_{n+1}-y\|^2+2\gamma_n LD_{X_0}$,

which in turn implies, by summing over $k=1,\dots,n$, that

$\rho\sum_{i\in I}\omega_i\sum_{k=1}^{n}\|T_i x_k-x_k\|^2 \le \|x_1-y\|^2+2LD_{X_0}\sum_{k=1}^{n}\gamma_k$.

Now, let us note that

$\sum_{k=1}^{n}\gamma_k = \sum_{k=1}^{n}\frac{a}{k^b} \le a\Big(1+\int_1^n\frac{dt}{t^b}\Big) \le a\Big(1+\frac{n^{1-b}}{1-b}\Big) \le \frac{2an^{1-b}}{1-b}$.

Dividing through by $\rho n$ and using $\frac{1}{n}\le\frac{1}{n^b}$ then gives

$\sum_{i\in I}\omega_i\Big(\frac{1}{n}\sum_{k=1}^{n}\|T_i x_k-x_k\|^2\Big) \le \frac{d_S^2(x_1)}{\rho n}+\frac{4aLD_{X_0}}{\rho(1-b)n^b} \le \Big(\rho^{-1}d_S^2(x_1)+\frac{4\rho^{-1}aLD_{X_0}}{1-b}\Big)\frac{1}{n^b}$,

as desired.
The above theorem provides an upper bound implying that the weighted average $\sum_{i\in I}\omega_i\big(\frac{1}{n}\sum_{k=1}^{n}\|T_i x_k-x_k\|^2\big)$ of the fixed-point residuals of the sequence $\{x_n\}_{n\in\mathbb{N}}$ converges to $0$. It can be seen that this weighted average is bounded above by a constant multiple of $\frac{1}{n^b}$, where $n$ is the iteration counter and $0<b<1$; in other words, it converges to $0$ with a rate of $O(\frac{1}{n^b})$. Moreover, if the weights are identical, that is, $\omega_i=\frac{1}{m}$ for all $i\in I$, we obtain the error bound

$\sum_{i\in I}\frac{1}{n}\sum_{k=1}^{n}\|T_i x_k-x_k\|^2 \le \Big(\rho^{-1}d_S^2(x_1)+\frac{4\rho^{-1}aLD_{X_0}}{1-b}\Big)\frac{m}{n^b}$.

Numerical example
In this section, we present a numerical example: minimizing the sum of distances to given points over finitely many half-space constraints together with a box constraint. Let $a_i\in\mathbb{R}^k$, $c_i\in\mathbb{R}^k$, and $b_i\ge 0$ be given for all $i=1,2,\dots,m$. We consider the following minimization problem:

minimize $\sum_{i=1}^{m}\frac{1}{2}\|x-c_i\|^2$ subject to $\langle a_i,x\rangle\le b_i$, $i=1,2,\dots,m$, and $x\in[u,v]^k$, (21)

where $u,v\in\mathbb{R}$ with $u\le v$. Note that each function $f_i:=\frac{1}{2}\|\cdot-c_i\|^2$ is strictly convex, and the constraint sets $C_i:=\{x\in\mathbb{R}^k : \langle a_i,x\rangle\le b_i\}$, $i=1,2,\dots,m$, and the box $X_0:=[u,v]^k$ are nonempty closed convex sets. By putting $T_i=P_{C_i}$ for all $i=1,2,\dots,m$, each $T_i$ is strongly quasi-nonexpansive and satisfies the DC principle with $\operatorname{Fix}T_i=C_i$. Thus, the considered problem (21) is a particular instance of problem (4), and Algorithm 1 can be applied for solving it.
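The operators $T_i=P_{C_i}$ are cheap to evaluate because the projection onto a half-space has a simple closed form. A small Python helper implementing this standard formula (our illustration) is:

```python
import numpy as np

def proj_halfspace(x, a, b):
    """Metric projection of x onto the half-space C = {z : <a, z> <= b}.

    If x already satisfies the constraint, it is returned unchanged; otherwise
    x is moved along a onto the boundary hyperplane <a, z> = b.
    """
    violation = a @ x - b
    if violation <= 0.0:
        return x
    return x - (violation / (a @ a)) * a
```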
Observe that Algorithm 1 requires the computation of the estimates $x_{n,i}$ for all $i=1,2,\dots,m$, where $x_{n,i}$ is the solution of the strongly convex minimization problem

minimize $\frac{1}{2}\|u-c_i\|^2+\frac{1}{2\gamma_n}\|u-P_{C_i}x_n\|^2$ subject to $u\in X_0$.

In general, such a constrained subproblem need not admit a convenient closed-form expression, so we solve it numerically.
Note that the objective function $\frac{1}{2}\|u-c_i\|^2+\frac{1}{2\gamma_n}\|u-P_{C_i}x_n\|^2$ is strongly convex with modulus $1+\frac{1}{\gamma_n}$ and has a Lipschitz continuous gradient with Lipschitz constant $1+\frac{1}{\gamma_n}$. In our experiment, we make use of the classical gradient projection method in an inner loop: pick an arbitrary initial point $y_1\in X_0$ and compute

$y_{l+1} = P_{X_0}\Big(y_l-\alpha_l\Big((y_l-c_i)+\frac{1}{\gamma_n}\big(y_l-P_{C_i}x_n\big)\Big)\Big)$,

where $\alpha_l$ is a positive stepsize.
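Putting the pieces together, the following Python sketch is our own illustrative reimplementation of this experiment for $k=2$, $m=10$, and all $b_i=0$ as in Fig. 1; the stepsizes $\gamma_n=1/n^{0.9}$ (i.e., $a=1$, $b=0.9$), $\alpha_l=1.6/l$, the iteration budgets, and the random data are assumptions for illustration, not the exact MATLAB setup.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 2, 10
A = rng.uniform(-1, 1, size=(m, k))   # half-space normals a_i
B = np.zeros(m)                       # offsets b_i (all zero, as in Fig. 1)
C = rng.uniform(-1, 1, size=(m, k))   # reference points c_i
u, v = 0.0, 1.0                       # box X_0 = [u, v]^k
omega = np.full(m, 1.0 / m)           # identical weights

P_box = lambda x: np.clip(x, u, v)

def proj_halfspace(x, a, b):
    return x - max(0.0, a @ x - b) / (a @ a) * a

def solve_subproblem(t, c, gamma, inner_iters=1000):
    # gradient projection inner loop for
    # argmin_{y in X_0} 0.5*||y - c||^2 + ||y - t||^2 / (2*gamma)
    y = rng.uniform(u, v, size=k)
    for l in range(1, inner_iters + 1):
        grad = (y - c) + (y - t) / gamma
        y = P_box(y - (1.6 / l) * grad)   # alpha_l = 1.6 / l (illustrative)
    return y

x = np.array([1.0, 0.7])                  # initial point as in Fig. 1
for n in range(1, 101):
    gamma = 1.0 / n ** 0.9                # gamma_n = a / n^b with a = 1, b = 0.9
    estimates = [solve_subproblem(proj_halfspace(x, A[i], B[i]), C[i], gamma)
                 for i in range(m)]
    x = sum(w * e for w, e in zip(omega, estimates))
print(x)
```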
All the experiments were performed under MATLAB 9.9 (R2020b) running on a personal laptop with an AMD Ryzen 7 4800H 2.90 GHz processor with Radeon Graphics and 8 GB of memory. All CPU times are given in seconds. We generate the vectors $a_i$ and $c_i$ in $\mathbb{R}^k$ with entries uniformly distributed at random in $(-1,1)$. We choose the box constraint with boundaries $u=0$ and $v=1$. The starting point of every inner loop is a vector whose coordinates are uniformly distributed at random in $(0,1)$. An example of a sequence $\{x_n\}_{n\in\mathbb{N}}$ generated by Algorithm 1 and its behavior in the simple case of $k=2$ and $m=10$, with all $b_i=0$, $i=1,2,\dots,m$, a stopping criterion for the inner loop of 1000 iterations, and the initial point $x_1=(1,0.7)$, is illustrated in Fig. 1.
In Fig. 1, we observe from both upper plots that the iterates $x_n$ and all the estimates $x_{n,i}$, $i=1,\dots,10$, converge to the same point, which is coherent with the assertions of Theorem 3.3. Moreover, we can see from the lower-left plot that the feasibility error $\sum_{i\in I}\big(\frac{1}{n}\sum_{k=1}^{n}\|T_i x_k-x_k\|^2\big)$ is bounded by the error bound $\big(\rho^{-1}d_S^2(x_1)+\frac{4\rho^{-1}aLD_{X_0}}{1-b}\big)\frac{m}{n^b}$ with $a=1$ and $b=0.9$, which conforms to the result in Theorem 3.4. The lower-right plot presents the convergence behavior of the sequence $\{x_n\}_{n\in\mathbb{N}}$, which converges to the solution point $(0,0)$ of the problem of minimizing the distances to the reference points $c_i$ (blue dots).
In the next experiment, we consider the behavior of the sequence $\{x_n\}_{n\in\mathbb{N}}$ generated by Algorithm 1 for various problem dimensions and two stopping criteria for the inner loop. We generate the vectors $a_i$ and $c_i$ as above, and each $b_i$ is randomly generated in $(1,2)$. We choose the initial point to be a vector whose coordinates are uniformly distributed at random in $(0,1)$. We manually choose the best choices of the stepsizes involved, namely $\gamma_n=1.8/n$ and $\alpha_l=1.6/l$. We terminate Algorithm 1 by the stopping criterion

$\frac{\|x_{n+1}-x_n\|}{\|x_n\|+1} \le 10^{-6}$.

We performed 10 independent tests for all combinations of the dimensions $k=10,20,50,100$ and the numbers of users $m=3,5,10,20,50,100$. The results are presented in Table 1, where the average number of iterations and the average computational runtime for each combination of $k$ and $m$ are reported.
Table 1 reports the average number of iterations (#(Iters)) and the average computational time (Time) in seconds, for inner-loop budgets (#(Inner)) of 1000 and 10,000 iterations, recorded when the stopping criterion of Algorithm 1 was met. It can be observed that larger $k$ and $m$ require a larger number of iterations and a longer computational runtime. Moreover, for the cases $m=3,5,10,20$, we observe that an inner-loop budget of 1000 iterations is sufficient for the convergence of Algorithm 1, with roughly ten times less computational runtime than the budget of 10,000 inner iterations. Nevertheless, for the very large instances with $k=100$ and $m=50,100$, an inner-loop budget of 1000 iterations may not be sufficient for the convergence of Algorithm 1.