Passivity-Based Generalization of Primal-Dual Dynamics for Non-Strictly Convex Cost Functions

In this paper, we revisit primal-dual dynamics for convex optimization and present a generalization of the dynamics based on the concept of passivity. It is then proved that supplying a stable zero to one of the integrators in the dynamics allows one to eliminate the assumption of strict convexity on the cost function based on the passivity paradigm together with the invariance principle for Caratheodory systems. We then show that the present algorithm is also a generalization of existing augmented Lagrangian-based primal-dual dynamics, and discuss the benefit of the present generalization in terms of noise reduction and convergence speed.


Introduction
Stimulated by the strong need to solve large-scale optimization problems over spatially distributed networks, primal-dual dynamics [1], a continuous-time algorithm for convex optimization, has attracted renewed attention in recent years owing to its decomposable nature under separability of the cost and constraint functions [2]. The continuous-time algorithm mitigates the computational effort and, unlike other distributed optimization algorithms [3], does not require network components to run an optimization solver. Moreover, it is pointed out in [4,5] that the impact of disturbances and noise entering the optimization process can be analyzed from a control-engineering point of view, which is important for applications to online and/or distributed optimization.
The primal-dual dynamics is known to be closely related to so-called passivity [6,7], and it has been revealed that the algorithm can be interpreted as a passivity-preserving interconnection of passive systems [8,9,10,11,12,13]. The passivity-based perspective brings several advantages. For example, the design flexibility inherent in passivity-based design allows one to stably interconnect other passive components such as physical dynamics [8,9,10] and communication delays, with appropriate passivation techniques [10,11]. Robustness against the aforementioned disturbances can also be analyzed based on the celebrated passivity theorem [10]. In addition, the authors of [12,13] point out that the design flexibility brought by passivity contributes to accelerating convergence and/or enhancing robustness. On the other hand, all of the above papers require strict convexity of the cost function, which may limit the applicability of their solutions.
Email addresses: yamashita.s.ag@hfg.sc.e.titech.ac.jp (Shunya Yamashita), hatanaka@eei.eng.osaka-u.ac.jp (Takeshi Hatanaka), yamauchi@sc.e.titech.ac.jp (Junya Yamauchi), fujita@ctrl.titech.ac.jp (Masayuki Fujita).
Relatively few publications have addressed relaxation of the strict convexity assumption based on the so-called augmented Lagrangian. Richert and Cortés [14] present a generalization of the primal-dual dynamics and prove asymptotic optimality for linear programming problems. Cherukuri et al. [15] also present an augmented Lagrangian-based solution to general convex optimization under strict convexification of the constraint function. Zhang et al. [16] present a solution relying on a projection operator onto convex constraint sets, although it requires subprocesses that solve optimization problems in order to compute the projection.
In this paper, we revisit the paradigm of [12,13]. Through a toy linear programming problem, we start by hypothesizing that supplying stable zeros to transfer functions in the primal-dual dynamics, namely leading the phase, is the key to removing the strict convexity assumption. We then present a passivity-based generalization of the primal-dual dynamics so that zeros are added to the intended transfer functions. The above hypothesis is then shown to be valid, namely asymptotic optimality is proved for general convex cost functions under existence of the zeros, based on the passivity paradigm. It is further demonstrated that the present algorithm is also a generalization of the existing augmented Lagrangian-based algorithms [14,15], and the benefit of the generalization is exemplified through simulation.
The major contributions of this paper are summarized as follows: (i) the passivity-based approaches [8,9,10,11,12,13] are extended to general convex optimization with a non-strictly convex cost function, and (ii) a generalization of the augmented Lagrangian-based algorithms [14,15] is presented, and the design flexibility inherent in passivity-based design is shown to contribute to noise/disturbance reduction and convergence acceleration.
Additional contributions are as follows: (iii) the passivity-based generalized primal-dual dynamics with general inequality constraints are presented in this paper for the first time, and (iv) strict convexification of the constraint function required in [15] may spoil separability, whereas the present approach does not require such reformulation and accordingly broadens the class of problems solvable in a distributed fashion.

Preliminaries
This section is intended to present terminologies, associated results and notations used in this paper.
Let us first introduce the notion of passivity [6,7].
Definition 1 Consider a system Σ described by a state-space model with state x ∈ R^n, input u ∈ R^N and output y ∈ R^N. The system Σ is said to be passive if there exists a positive semi-definite function S : R^n → R_≥0 := [0, ∞), called a storage function, such that

L_Σ S(x) ≤ u^⊤ y

holds for all states x ∈ R^n and all inputs u ∈ R^N, where the symbol L_Σ represents the Lie derivative along Σ.
We next introduce convex functions defined below.
Definition 2 A function f : R n → R is said to be convex if the following inequality holds for all x, y ∈ R n .
From (1), we immediately have the so-called monotone condition

(∇f(x) − ∇f(y))^⊤ (x − y) ≥ 0  ∀x, y ∈ R^n.   (2)

If f is strictly convex, the inequality (2) holds strictly as long as x ≠ y [11].
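As a numerical illustration of the monotone condition (2), the following sketch (with illustrative data chosen by us, not taken from the paper) checks it for a convex but non-strictly convex function, for which (2) fails to be strict:

```python
import numpy as np

# Spot-check of the monotone condition (2) for the convex but not
# strictly convex function f(x) = (a'x)^2, which is flat along the
# directions orthogonal to a (all data here is illustrative).
a = np.array([1.0, -2.0])
grad = lambda x: 2.0 * a * (a @ x)   # gradient of f(x) = (a'x)^2

rng = np.random.default_rng(0)
for _ in range(1000):
    xs, ys = rng.normal(size=2), rng.normal(size=2)
    # (2): the gradient of a convex function is monotone
    assert (grad(xs) - grad(ys)) @ (xs - ys) >= -1e-12

# ...but (2) is not strict here: two distinct points separated along a
# flat direction give exactly zero, reflecting the lack of strict
# convexity that this paper is concerned with.
x, y = np.array([0.0, 0.0]), np.array([2.0, 1.0])   # a'(x - y) = 0
print((grad(x) - grad(y)) @ (x - y))
```

This is precisely the situation the last bullet of Subsection 3.1 excludes: the monotonicity gap can vanish at non-optimal points.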
Let us next introduce the so-called KKT conditions for the optimization problem

minimize_x f(x)  subject to  g(x) ≤ 0,  Ax = b,   (3)

where x is the decision variable, f : R^n → R is the cost function, g : R^n → R^m is the inequality constraint function, and A ∈ R^{r×n} and b ∈ R^r are the constant matrix and vector of the equality constraint, respectively. Denote the l-th element of the function g by g_l : R^n → R (l = 1, . . . , m). The KKT conditions are given as follows [2]:

∇f(x*) + ∇g(x*)λ* + A^⊤ μ* = 0,
Ax* = b,  g(x*) ≤ 0,
λ* ≥ 0,  λ* • g(x*) = 0,   (4)
where the symbol • describes the Hadamard product. The set of the KKT solutions is now defined as

χ* := {(x*, μ*, λ*) ∈ R^n × R^r × R^m_≥0 | (4) holds}.

Generalized primal-dual dynamics and passivity

Throughout this paper, we consider the optimization problem (3) satisfying the following assumption.
Assumption 3 The functions f, g l (l = 1, . . . , m) are convex, continuously differentiable, and their gradients ∇f, ∇g l (l = 1, . . . , m) are locally Lipschitz. The feasible set of (3) is nonempty, and the function f has a minimum value in the feasible set.
It is well known that, under Assumption 3, x* is an optimal solution to (3) if and only if there exist (μ*, λ*) ∈ R^r × R^m_≥0 such that (x*, μ*, λ*) ∈ χ* [2]. Remark that x* is not always unique, due to the possible lack of strict convexity of the cost function f.
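The KKT characterization can be checked numerically on a small instance. The following sketch uses a hypothetical equality-constrained quadratic program of our own choosing (no inequality constraints, so λ plays no role) and solves the stationarity and feasibility conditions in (4) as one linear system:

```python
import numpy as np

# Hypothetical equality-constrained QP: min 0.5*x'Qx + q'x  s.t.  Ax = b.
# Q is only positive semi-definite (one zero eigenvalue), so the cost is
# convex but not strictly convex, matching the setting of this paper.
Q = np.array([[2.0, 0.0], [0.0, 0.0]])
q = np.array([-2.0, 0.0])
A = np.array([[1.0, 1.0]])
b = np.array([3.0])

# Stationarity + equality feasibility from (4) as one linear KKT system:
# [Q  A'] [x ]   [-q]
# [A  0 ] [mu] = [ b]
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, mu_star = sol[:2], sol[2:]

# The KKT residuals should vanish.
assert np.allclose(Q @ x_star + q + A.T @ mu_star, 0.0)
assert np.allclose(A @ x_star, b)
print(x_star, mu_star)
```

For this particular data the KKT matrix happens to be nonsingular, so (x*, μ*) is unique even though Q is singular; in general the solution set χ* may be a continuum.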

Primal-dual gradient dynamics
In this subsection, we first deal with the following primal-dual gradient dynamics [1,17] as a solution to (3):

ẋ = −∇f(x) + u,  u := −ψ − η,   (5a)
μ̇ = Ax − b,  ψ := A^⊤ μ,   (5b)
λ̇ = [g(x)]⁺_λ,  η := ∇g(x) λ,   (5c)

where the operator [·]⁺ is defined elementwise by

[w_l]⁺_{λ_l} := 0 if λ_l = 0 and w_l < 0, and w_l otherwise,   (7)

which keeps λ in R^m_≥0. The dynamics (5) enjoys the following properties:
• the system (5a) is passive from u to x,
• the system (5b) is passive from x to ψ,
• the system (5c) is passive from x to η,
• (x*, μ*, λ*) is a stable equilibrium of the system (5) in the sense of Lyapunov, and
• if the cost function f is strictly convex, the trajectory of (x, μ, λ) generated by (5) approaches one of the constants included in χ* as time goes to infinity.
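As a hedged numerical sketch of the dynamics (5), consider the following forward-Euler simulation on a hypothetical strictly convex problem (all data is illustrative and chosen by us):

```python
import numpy as np

# Forward-Euler sketch of the primal-dual gradient dynamics (5) on
#   min (x1 - 1)^2 + (x2 - 2)^2   s.t.  x1 + x2 = 2,  x1 - 1.5 <= 0.
grad_f = lambda x: 2.0 * (x - np.array([1.0, 2.0]))
A, b = np.array([[1.0, 1.0]]), np.array([2.0])
g = lambda x: np.array([x[0] - 1.5])     # inequality g(x) <= 0
grad_g = np.array([[1.0], [0.0]])        # column: gradient of g_1

x, mu, lam = np.array([1.0, 1.0]), np.zeros(1), np.zeros(1)
dt = 1e-3
for _ in range(100_000):                 # simulate until t = 100
    dx = -grad_f(x) - A.T @ mu - grad_g @ lam
    dmu = A @ x - b
    # positive projection: freeze lambda_l at 0 when lambda_l = 0
    # and g_l(x) < 0, so that lambda stays nonnegative
    dlam = np.where((lam <= 0.0) & (g(x) < 0.0), 0.0, g(x))
    x, mu, lam = x + dt * dx, mu + dt * dmu, lam + dt * dlam

print(x)   # approaches the optimizer (0.5, 1.5); here the inequality
           # stays inactive along the trajectory, so lambda remains 0
```

Because f is strictly convex here, the last bullet applies and the trajectory converges; the toy example below shows what breaks when strict convexity is dropped.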
It is to be noted that only the last item requires strict convexity of the cost function. Indeed, without this additional assumption, the dynamics does not ensure asymptotic optimality, as the following trivial example shows.
Let us consider the problem (3) with x ∈ R, f(x) = 0 for all x, A = 1 and b = 0, and without inequality constraints; this problem satisfies Assumption 3, and it is trivially confirmed that the (unique) optimal solution is x* = 0. The primal-dual dynamics (5) for this problem is then given by

ẋ = −μ,  μ̇ = x.   (8)

The dynamics (8) is a feedback interconnection of two single integrators whose open-loop transfer function has a phase of −180 deg over the whole frequency domain, and hence the phase margin is 0 deg. Accordingly, the dynamics does not drive x to x* = 0.
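The lack of convergence can be confirmed numerically. The following forward-Euler sketch of (8) shows that the "energy" x² + μ² does not decay (the dynamics is a harmonic oscillator, whose flow conserves it exactly; the explicit Euler scheme even inflates it slightly):

```python
import numpy as np

# Forward-Euler sketch of the toy dynamics (8): xdot = -mu, mudot = x.
# The flow conserves x^2 + mu^2, so x oscillates forever and never
# converges to the optimizer x* = 0.
z = np.array([1.0, 0.0])                   # (x, mu)
R = np.array([[0.0, -1.0], [1.0, 0.0]])    # zdot = R z
dt = 1e-3
for _ in range(int(50.0 / dt)):
    z = z + dt * (R @ z)                   # explicit Euler step

print(np.hypot(z[0], z[1]))   # stays near 1: no convergence to 0
```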
The above toy problem also provides informative insight into overcoming this drawback of the primal-dual dynamics. We know that asymptotic stability of x = 0 for (8) can be ensured simply by adding a compensator that leads the phase. Inspired by this fact, we present a generalization of the primal-dual dynamics in the next subsection so that such phase-lead compensation can be added, in particular in this specific example.
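The phase-lead remedy can be sketched directly on the toy problem. Below, the primal integrator 1/s is replaced by M(s) = (s + 1)/s, a minimal illustrative choice (not the paper's general parameterization) that supplies one stable zero at s = −1:

```python
import numpy as np

# Toy problem (8) with the primal integrator replaced by the phase-lead
# block M(s) = (s + 1)/s.  A realization is
#   xi_dot = v,  x = xi + v,  with v = -mu and mu_dot = x,
# which eliminates to the linear system below in z = (xi, mu).
Acl = np.array([[0.0, -1.0],   # xi_dot = -mu
                [1.0, -1.0]])  # mu_dot = x = xi - mu
z = np.array([1.0, 0.0])       # (xi, mu)
dt = 1e-3
for _ in range(int(50.0 / dt)):
    z = z + dt * (Acl @ z)     # explicit Euler step

x = z[0] - z[1]                # output x = xi + v = xi - mu
print(abs(x))                  # now decays toward the optimizer x* = 0
```

The closed-loop characteristic polynomial is s² + s + 1, whose roots have real part −1/2, so x converges even though f = 0 is not strictly convex; this is the phenomenon formalized in Theorem 12.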

Remark 5
The primal-dual dynamics (5) is known to provide a distributed algorithm if (3) is separable [2]. In addition, the distributed optimization problem with private costs f_i (i = 1, . . . , N) and private constraints over N agents connected by an undirected graph with graph Laplacian L can be equivalently transformed into a problem with the consensus constraint (L ⊗ I_n)x = 0, where the symbol ⊗ describes the Kronecker product. The primal-dual dynamics (5) for the new problem then provides a distributed algorithm based on the PI consensus algorithm [11]. This distributed implementation of the primal-dual gradient dynamics (5) is enabled by the diagonal structure of the integrator matrices in Fig. 1.

Generalized primal-dual dynamics
In this subsection, we generalize the primal-dual dynamics, mainly to ensure asymptotic optimality even in the absence of the strict convexity assumption. For notational simplicity, we define the signals v, h and w as

v := −∇f(x) − ∇g(x)λ − A^⊤ μ,   (9a)
h := Ax − b,   (9b)
w := g(x).   (9c)

Let us now present the generalized primal-dual dynamics.
The basic design policy is to allow one to add phase-lead compensators to the open-loop transfer functions while preserving passivity of the subsystems colored gray in Fig. 1. The generalized dynamics is formulated as

x = M(s)v,   (10a)
μ = H(s)h,   (10b)
λ = G⁺(s)w,   (10c)

where M(s) and H(s) are transfer function matrices, and G⁺(s) is a transfer function matrix equipped with the operator in (7). Although we might be able to take a more general unstructured form, we restrict the matrices M(s), H(s) and G⁺(s) to the diagonal structure

M(s) = diag(M_1(s), . . . , M_n(s)),  H(s) = diag(H_1(s), . . . , H_r(s)),  G⁺(s) = diag(G⁺_1(s), . . . , G⁺_m(s)),   (11a)–(11c)

with

M_i(s) = c_{i1}/s + Σ_{k=2}^{n_i} c_{ik}/(s + a_{ik}) + d_i,   (12a)
H_j(s) = c̄_{j1}/s + Σ_{q=2}^{r_j} c̄_{jq}/(s + ā_{jq}) + d̄_j,   (12b)

where all the coefficients c_{ik}, c̄_{jq}, a_{ik}, ā_{jq} are positive and d_i, d̄_j ≥ 0, and G⁺_l(s) is the projected counterpart of a transfer function of the same form with coefficients ĉ_{lp}, â_{lp}, d̂_l (12c). It is easy to confirm that (12) allows one to add phase-lead compensation to the integrators in the primal-dual dynamics (5). We also immediately see that M(s) and H(s) are passive, since each M_i(s) and H_j(s) is defined by a parallel connection of passive systems, which is known to preserve passivity. More precise descriptions of this issue, together with passivity of G⁺(s), will be presented in the next subsection.
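The passivity (positive realness) of the parallel structure can be spot-checked numerically. The following sketch uses hypothetical positive coefficients of the assumed form c₁/s + Σ c_k/(s + a_k) + d:

```python
import numpy as np

# Sketch of the assumed parallel form with hypothetical coefficients:
#   M(s) = c1/s + sum_k ck/(s + ak) + d,  c1, ck, ak > 0, d >= 0.
# Each summand is positive real, so the parallel connection is passive
# as well; below we spot-check Re M(jw) >= 0 on a frequency grid.
c1, d = 1.0, 0.5
ck = np.array([2.0, 0.3])
ak = np.array([1.0, 5.0])

def M(s):
    return c1 / s + np.sum(ck / (s + ak)) + d

w = np.logspace(-3, 3, 200)
re_part = np.array([M(1j * wi).real for wi in w])
print(re_part.min())   # bounded below by d = 0.5 on this grid
```

Note that with d > 0 (or with n_i ≥ 2 first-order terms) the numerator and denominator degrees of M(s) coincide, so M(s) acquires finite stable zeros, which is the structural property exploited later in Theorem 12.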
The block diagram of the algorithm (10)-(12) is then illustrated in Fig. 2, where ψ, η and u are defined in the same way as Subsection 3.1.

Remark 6
At this moment, we focus on the diagonal structure of M(s), H(s) and G⁺(s) in (11) for convenience of the subsequent technical discussions; a more general form will be presented at the end of the next section. However, the diagonal structure is itself of particular importance, since it trivially preserves the distributed nature of the primal-dual dynamics pointed out in Remark 5.

Passivity analysis
In this subsection, we confirm that the subsystems colored by light gray in Fig. 2 are passive.

Fig. 2. Block diagram of the generalized primal-dual dynamics (10) with (11) and (12). The system enclosed by the solid line is passive from u to x; the system enclosed by the dashed line is passive from x to ψ; the system enclosed by the dashed-dotted line is passive from x to η.
Let us first consider the primal dynamics (10a). A state-space representation of M_i(s) in (12a) is given by

ξ̇_{i1} = c_{i1} v_i,   (13a)
ξ̇_{ik} = −a_{ik} ξ_{ik} + c_{ik} v_i,  k = 2, . . . , n_i,   (13b)
x_i = 1ᵀ_{n_i} ξ_i + d_i v_i,   (13c)

where ξ_{ik} is the k-th element of the state ξ_i ∈ R^{n_i} and 1_{n_i} is the n_i-dimensional all-ones vector. Then, we obtain the following lemma.
Lemma 7 Suppose that Assumption 3 holds. Then, the system given by (10a), (11a) and (12a) is passive from u to x for a suitable storage function S = Σ_i S_i constructed in Appendix A.1.

PROOF. See Appendix A.1.
We next treat the dynamics (10b). A state-space representation of H_j(s) in (12b) is given by

ζ̇_{j1} = c̄_{j1} h_j,   (14a)
ζ̇_{jq} = −ā_{jq} ζ_{jq} + c̄_{jq} h_j,  q = 2, . . . , r_j,   (14b)

together with the corresponding output equation, where ζ_{jq} is the q-th element of the state ζ_j ∈ R^{r_j}.
Lemma 8 Suppose that Assumption 3 holds. Then, the system given by (10b), (11b) and (12b) is passive from x to ψ for a suitable storage function depending on μ*_j, the j-th element of μ*.

PROOF. See Appendix A.2.
Let us finally consider the dynamics (10c). The system G⁺_l(s) is formulated as a projected state-space system (15) with state ρ_l ∈ R^{m_l}_≥0 and initial state ρ_l(0) ≥ 0, where ρ_{lp} denotes the p-th element of ρ_l.
Lemma 9 Suppose that Assumption 3 holds. Then, the system given by (10c), (11c) and (12c) is passive from x to η for a suitable storage function depending on λ*_l, the l-th element of λ*.

PROOF. See Appendix A.3.
Lemmas 7-9 mean that the dynamics (10)-(12) can be regarded as a passivity-preserving interconnection of passive systems. Accordingly, we immediately have the following result [6].

Convergence analysis
In this section, we analyze convergence of the trajectories of (x, μ, λ) generated by (10)-(12) to the set χ*. In the sequel, we use the notations n̄ := Σ_{i=1}^n n_i, r̄ := Σ_{j=1}^r r_j and m̄ := Σ_{l=1}^m m_l, and stack the state variables as ξ := [ξ_1^⊤ · · · ξ_n^⊤]^⊤ ∈ R^n̄, ζ := [ζ_1^⊤ · · · ζ_r^⊤]^⊤ ∈ R^r̄ and ρ := [ρ_1^⊤ · · · ρ_m^⊤]^⊤ ∈ R^m̄. The proof relies on the invariance principle for Carathéodory systems [17,18]. It is thus first proved that the system (10)-(12) satisfies the assumptions required by the principle.
PROOF. The item (iv) was already proved in Lemma 10. The item (i) also immediately holds, since the function V is radially unbounded. Since (10)-(12) is a projected dynamics, the item (ii) can be proved by following the same procedure as Lemma 4.3 in [17]. Additionally, the assumption (iii) can also be proved in the same way as Lemma 4.4 in [17] and Lemma 4.1 in [7].
We are now ready to show the main result of this paper.
Theorem 12 Consider the system (10)-(12). Assume that the transfer function M_i(s) has one or more stable zeros for all i. If Assumption 3 holds, then (x, μ, λ) approaches one of the constants included in χ* as time goes to infinity.
PROOF. From Lemma 11, the invariance principle for Carathéodory systems [17,18] applies to the system (10)-(12), and hence any solution of (10)-(12) starting in S converges to the largest invariant set in cl({(ξ, ζ, ρ) ∈ S | L_A V(ξ, ζ, ρ) = 0}). From (16), L_A V ≡ 0 implies (17)-(19). In the sequel, we consider the system trajectories identically satisfying (17)-(19). First, we focus on (17) and the dynamics (13). Since M_i(s) has one or more zeros, d_i > 0 or n_i ≥ 2 must be satisfied. If d_i > 0, we have v_i ≡ 0; otherwise, we have ξ_{ik} ≡ 0 and ξ̇_{ik} ≡ 0 for all k = 2, . . . , n_i. In the latter case, substituting ξ_{ik} ≡ 0 and ξ̇_{ik} ≡ 0 into (13b) yields v_i ≡ 0. We thus conclude that v_i ≡ 0 holds for all i, and hence the state trajectories must identically satisfy

v ≡ 0.   (20)

We also see from (13a) and (13c) that ξ_{i1} and x_i must be constant for all i.
Next, we focus on (18) and the dynamics (14). Since x is constant as shown above, h = Ax − b must also be constant. Now, if h_j ≠ 0 for some j, ζ_{j1} diverges by (14a), which contradicts the boundedness of ζ_{j1}. We thus conclude that the trajectories satisfy h_j ≡ 0. In other words, the following equation holds identically:

Ax ≡ b.   (21)
Let us next consider (19) and the dynamics (15). Since x is constant as shown above, w = g(x) is also constant. Now, let us focus on (15a). If mode 1 is active, then ρ̇_{l1} = 0 holds. Otherwise, ρ̇_{l1} = w_l holds, i.e., ρ̇_{l1} is constant. Then ρ̇_{l1} = 0 must hold, since ρ̇_{l1} ≠ 0 contradicts the boundedness of ρ_{l1}. Thus, we have ρ̇_{l1} ≡ 0 for all l = 1, . . . , m, which means that (22) is identically satisfied. From (19) and (15c), we also see that λ_l ≡ ĉ_{l1} ρ_{l1} and that it is identically constant. Moreover, (22) means that the trajectories identically satisfy (23). In summary, the state trajectories identically satisfying (17)-(19) are constants belonging to χ*, which completes the proof. Remark that the assumption on the zeros of M_i(s) validates the hypothesis, extracted from the toy problem at the end of the previous section, that leading the phase is the key to relaxing the strict convexity assumption.
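Theorem 12 can be sketched numerically on a problem whose optimal set is a continuum. The data below is illustrative, and each primal channel uses the minimal choice M_i(s) = (s + 1)/s (one stable zero), not the paper's general parameterization:

```python
import numpy as np

# Sketch of Theorem 12 on:  min 0  s.t.  x1 + x2 = 2,
# where every feasible point is optimal (chi* is a continuum).
# Each primal channel uses M_i(s) = (s + 1)/s:
#   xi_dot = v,  x = xi + v,  with v = -A' mu and mu_dot = A x - b.
A, b = np.array([[1.0, 1.0]]), np.array([2.0])
xi, mu = np.array([5.0, -1.0]), np.array([1.0])
dt = 1e-3
for _ in range(int(60.0 / dt)):
    v = -(A.T @ mu)
    xk = xi + v                    # output of M(s), feedthrough included
    xi = xi + dt * v
    mu = mu + dt * (A @ xk - b)

x = xi - A.T @ mu                  # final output
print(x, mu)   # feasible (x1 + x2 = 2) with mu near 0; the limit point
               # depends on the initial state, as Theorem 12 allows
```

Note that the difference x₁ − x₂ is conserved by this flow, so different initializations converge to different points of χ*: the trajectory approaches "one of the constants included in χ*", not a distinguished one.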
Similar generalizations of the primal-dual dynamics are presented in [12,13]. In these publications, M_i(s) is assumed to be proper, positive real and to have a pole at the origin, which is largely compatible with our setting, and asymptotic optimality is proved under strict convexity of the cost function. The primary contribution of this paper relative to [12,13] is to show that the strict convexity assumption can be relaxed to mere convexity under the additional condition on the zeros of M_i(s). Besides, we can add two further contributions, as follows.
First, our algorithm can handle a general convex constraint function g, while [12] and [13] deal with problems without inequality constraints and with linear inequality constraints, respectively. Namely, we immediately obtain a fully generalized version of the results of [12,13], as follows.
Corollary 13 Consider the system (10)- (12). Suppose that Assumption 3 holds. If the cost function f is strictly convex, (x, µ, λ) approaches one of the constants included in χ * as the time goes to infinity.
PROOF. Noticing (A.2) and (16), L_A V ≡ 0 implies

(∇f(x) − ∇f(x*))^⊤ (x − x*) ≡ 0.   (24)

Due to strict convexity of f, (24) is equivalent to x ≡ x*. Also, x is constant, because x* is the unique solution to (3). From these results, we can prove that μ and λ are constant in the same way as in Theorem 12. In addition, we obtain that x satisfies (21) and (23). Since x, μ and λ are constant, v = −∇f(x) − ∇g(x)λ − A^⊤μ is also constant. Then, we have v ≡ 0, since a nonzero constant v_i would make ξ_{i1} diverge by (13a), contradicting the boundedness of ξ. Thus, (20) holds. As a result, (x, μ, λ) is a constant satisfying the KKT conditions whenever L_A V ≡ 0. This completes the proof.
In addition, [12,13] take diagonal transfer function matrices M(s) and H(s) as in (11); we can also generalize this structure based on the passivity paradigm.
To this end, we first replace M(s) and H(s) in (11) by

M(s) = diag(M_1(s), . . . , M_n(s)) + M′(s),  H(s) = diag(H_1(s), . . . , H_r(s)) + H′(s),   (25)

where M′(s) and H′(s) are possibly non-diagonal transfer function matrices assumed to be strictly positive real. The additional design flexibility associated with M′(s) and H′(s) may contribute to performance improvements. It is not difficult to confirm that adding M′(s) and H′(s) does not affect any of the signals at the stationary state. By passivity preservation with respect to parallel interconnections, both M(s) and H(s) remain passive. Accordingly, we can immediately prove the following corollary.
Corollary 14 Consider the system (10), (25), (11c) and (12). Assume that the transfer function M i (s) has one or more stable zeros, for all i. If Assumption 3 holds, then (x, µ, λ) approaches one of the constants included in χ * as the time goes to infinity.
PROOF. Redefine the energy function V by adding the storage functions of M′(s) and H′(s). It is then immediate to see from passivity preservation of M(s) and H(s) that (21) still holds after adding M′(s) and H′(s). The subsequent discussion is the same as in Theorem 12.

Relation to augmented Lagrangian method
In this section, we explore relations between the present algorithm (10) and augmented Lagrangian-based primal-dual dynamics in [14,15].
Let us first focus on [15], where the authors present the following dynamics to solve (3).
It is not difficult to confirm that (26) is illustrated by the block diagram in Fig. 3. Comparing Fig. 3 with Fig. 2, we immediately see that (26) is a special case of (10). Specifically, if we take M_i(s) = 1/s, H_j(s) = (s + 1)/s and G⁺_l(s) = [1/s]⁺ + [1]⁺, i.e., the projected counterpart of (s + 1)/s, then (10) coincides with (26). We thus conclude that the present dynamics is a generalization of (26).
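The reduction can be made concrete on an illustrative problem of our own choosing (equality constraints only, so G⁺ plays no role). With H_j(s) = (s + 1)/s, the dual block outputs μ + (Ax − b), so the primal update feeds back A^⊤(μ + Ax − b), which is exactly the augmented-Lagrangian structure:

```python
import numpy as np

# Augmented-Lagrangian special case of (10): M_i(s) = 1/s and
# H_j(s) = (s + 1)/s, sketched on the illustrative problem
#   min 0   s.t.  x1 + x2 = 2   (cost not strictly convex).
A, b = np.array([[1.0, 1.0]]), np.array([2.0])
x, mu = np.array([4.0, 0.0]), np.array([0.0])
dt = 1e-3
for _ in range(int(60.0 / dt)):
    h = A @ x - b                       # equality-constraint residual
    x = x + dt * (-(A.T @ (mu + h)))    # augmented term A'(mu + h)
    mu = mu + dt * h

print(x, mu)   # converges to a feasible point even though f = 0
               # is not strictly convex
```

Without the proportional residual term (i.e., with H_j(s) = 1/s as in the plain dynamics (5)), this example reduces to the oscillating toy dynamics (8) and does not converge.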
We next investigate the relation to [14], where the authors address the following linear programming problem:

minimize_x θ^⊤ x  subject to  Φx ≤ φ,   (27)

where Φ ∈ R^{m×n}, φ ∈ R^m and θ ∈ R^n. The following dynamics (28) to solve (27) is then presented in [14],
which is illustrated in Fig. 4. We immediately see from the figure that (28) is equivalent to (10) with M_i(s) = (s + 1)/s and G⁺_l(s) = [1/s]⁺. It is thus concluded that (28) is also a special case of (10).
In the remainder of this section, we clarify the benefits of the generalized algorithm (10) over [14,15]. While the advantage over [14] is obvious, since our approach is not restricted to linear programming, the result of [15] looks comparable with ours. However, [15] requires an additional assumption, namely strict convexity of the constraint function g(x), in order to prove asymptotic convergence to the optimal solution. Even if the original constraint function is separable, the strictly convexified function may lose the separable structure, which can be an obstacle for distributed optimization. Meanwhile, the present approach does not require such an operation.
As stated in [14,19], optimization algorithms may suffer from a variety of noises. For example, in online optimization, the cost and constraint functions may be defined by real-time data contaminated by noise. In addition, distributed implementations of the algorithm may suffer from noise in the communication channels. Regarding noise reduction, it is to be noted that only one of the transfer functions M_i(s), H_j(s) and G⁺_l(s) in [14,15] is strictly proper, which means that the gain of the open-loop systems decays at only 20 dB/dec over the high-frequency domain. On the other hand, our approach allows one to choose strictly proper M_i(s), H_j(s) and G⁺_l(s), which achieves a steeper roll-off and hence better noise reduction. Besides, the generalization presented in this paper allows one to shape the open-loop systems more flexibly, which would contribute to better disturbance rejection and/or acceleration of the convergence speed.
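The roll-off argument can be illustrated numerically with hypothetical transfer functions of our own choosing (not the paper's simulation cases):

```python
import numpy as np

# Open-loop high-frequency gain comparison (illustrative choices only).
# Case A mimics the structure of [14]: M(s) = 1/s with H(s) = (s+1)/s,
# so only one factor is strictly proper and |L(jw)| decays at 20 dB/dec.
# Case B makes both factors strictly proper, e.g. (s+1)/(s(s+20)),
# so |L(jw)| decays at 40 dB/dec, attenuating high-frequency noise more.
L_A = lambda s: (1.0 / s) * ((s + 1.0) / s)
L_B = lambda s: ((s + 1.0) / (s * (s + 20.0))) ** 2

gain_db = lambda L, w: 20.0 * np.log10(abs(L(1j * w)))

# Gain drop over one decade (100 -> 1000 rad/s):
drop_A = gain_db(L_A, 100.0) - gain_db(L_A, 1000.0)
drop_B = gain_db(L_B, 100.0) - gain_db(L_B, 1000.0)
print(round(drop_A, 1), round(drop_B, 1))   # about 20 dB vs about 40 dB
```

Both choices keep the pole at the origin and the stable zero at s = −1, so the steady-state and convergence properties discussed above are unaffected; only the high-frequency behavior changes.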
We exemplify the above hypothesis through simulation. Let us consider the linear programming problem (27) whose optimal solution is x* = [1 2]^⊤. We prepare three dynamics (cases 1-3), with different choices of the transfer functions, to solve the problem. Note that all of them satisfy the assumptions in Theorem 12, and case 1 coincides with the algorithm in [14].
We run the algorithms with the above transfer functions while adding zero-mean Gaussian noise with frequency components above 10 rad/s to θ. The initial states are set to ξ(0) = 0 and ρ(0) = 0 in all cases. We see from Fig. 5(a) that the algorithm of case 1, namely [14], is heavily affected by the noise. On the other hand, we also see that the effects are drastically reduced in Fig. 5(b) and (c), where the corresponding dynamics have strictly proper M_i(s) and G⁺_l(s). Comparing (b) and (c), it is also confirmed that the convergence speed can be improved by appropriately supplying zeros to these transfer functions.

Conclusion
In this paper, we have presented a generalized primal-dual dynamics based on the concept of passivity. We have then proved asymptotic optimality for general convex optimization with a not necessarily strictly convex cost function by supplying at least one stable zero to one of the integrators in the dynamics, based on the passivity paradigm together with the invariance principle for Carathéodory systems. The present algorithm has also been shown to generalize existing augmented Lagrangian-based primal-dual dynamics. We have then demonstrated the benefit of the present generalization through simulation.

A.1 Proof of Lemma 7
The Lie derivative of S_i along (10a), denoted by L_P S_i, is evaluated using δ_{ik} := a_{ik}/c_{ik} > 0. Accordingly, summing over i, it follows that

L_P S ≤ v^⊤ (x − x*),  S := Σ_i S_i.   (A.1)

Due to (9a) and the definition of u, we have v = −(∇f(x) − ∇f(x*)) + u, and hence

v^⊤ (x − x*) ≤ u^⊤ (x − x*)   (A.2)

holds because of (2). From (A.1) and (A.2), we have L_P S ≤ u^⊤ (x − x*). This completes the proof of Lemma 7.
A.2 Proof of Lemma 8

We thus obtain the claimed passivity inequality. This completes the proof of Lemma 8.