Improved convergence rates and trajectory convergence for primal-dual dynamical systems with vanishing damping

In this work, we approach the minimization of a continuously differentiable convex function under linear equality constraints by means of a second-order dynamical system with an asymptotically vanishing damping term. The system is formulated in terms of the augmented Lagrangian associated with the minimization problem. We show fast convergence of the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories. In case the objective function has a Lipschitz continuous gradient, we show that the primal-dual trajectory converges weakly to a primal-dual optimal solution of the underlying minimization problem. To the best of our knowledge, this is the first result which guarantees the convergence of the trajectory generated by a primal-dual dynamical system with asymptotically vanishing damping. Moreover, in the case of the unconstrained minimization of a convex differentiable function with Lipschitz continuous gradient, we recover all convergence statements obtained in the literature for the continuous-time version of Nesterov's accelerated gradient method.


Problem statement and motivation
In this paper we will deal with the optimization problem
$$\min f(x) \quad \text{subject to} \quad Ax = b. \tag{1.1}$$
Problems of type (1.1) underlie many important applications in various areas, such as image recovery [36], machine learning [32,39], the energy dispatch of power grids [54,55], distributed optimization [40,57], and network optimization [51,56].
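As a concrete toy instance of (1.1), one may minimize $f(x) = \frac{1}{2}\|x - c\|^2$ subject to $Ax = b$. The sketch below (the data $A$, $b$, $c$ and the quadratic objective are illustrative assumptions, not taken from the paper) computes a primal-dual optimal pair directly from the primal-dual optimality conditions:

```python
import numpy as np

# Toy instance of problem (1.1): min f(x) subject to Ax = b, with the
# illustrative choice f(x) = 0.5*||x - c||^2 (convex, with 1-Lipschitz gradient).
# The data below are made up for demonstration; they are not from the paper.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([3.0, 0.0, 1.0])

# Primal-dual optimality conditions: grad f(x*) + A^T lam* = 0 and A x* = b.
# Here grad f(x) = x - c, so x* = c - A^T lam* and (A A^T) lam* = A c - b.
lam_star = np.linalg.solve(A @ A.T, A @ c - b)
x_star = c - A.T @ lam_star

assert np.allclose(A @ x_star, b)                              # feasibility
assert np.allclose((x_star - c) + A.T @ lam_star, np.zeros(3)) # stationarity
```

For this quadratic objective the saddle point is available in closed form; the dynamical systems studied in this paper approach such pairs asymptotically instead.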
The object of our investigations will be a second-order dynamical system with an asymptotically vanishing damping term, associated with the optimization problem (1.1) and formulated in terms of its augmented Lagrangian. Our main aim is to study the convergence of the generated trajectories to a primal-dual optimal solution, as well as to derive fast rates of convergence for the primal-dual gap, the feasibility measure, and the objective function value along these trajectories.
The interplay between continuous-time dissipative dynamical systems and numerical algorithms for solving optimization problems has been the subject of intense research activity. It is well known for unconstrained optimization problems that damped inertial dynamics are a natural way to accelerate these systems. In line with the seminal work of Polyak on the heavy ball method with friction [47,46], the first studies by Alvarez and Attouch focused on inertial dynamics with a fixed viscous damping coefficient [2,3,17]. A decisive step was taken by Su, Boyd and Candès in [53], where, for the minimization of a continuously differentiable convex function $f : \mathcal{X} \to \mathbb{R}$, the following inertial dynamics with an asymptotically vanishing damping coefficient were considered:
$$\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \nabla f(x(t)) = 0.$$
The terminology asymptotically vanishing damping (AVD) refers to the specific feature of the damping coefficient $\frac{\alpha}{t}$, which vanishes in a controlled manner, neither too fast nor too slowly, as $t$ goes to infinity. In particular, in the case $\alpha = 3$, this dynamical system can be seen as the continuous limit of Nesterov's accelerated gradient algorithm [42,43,24]. In recent years, the community has paid a lot of attention to inertial dynamics [7,9,12,13,20,26,29,30,38,41], as well as to their discrete counterparts [4,8,10,18,22,34], to name only a few contributions.
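For intuition, the AVD dynamics above can be simulated numerically. The following sketch (a semi-implicit Euler discretization; the step size, horizon and quadratic test function are illustrative choices, not from the paper) integrates $\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla f(x(t)) = 0$ with $\alpha = 3$ for $f(x) = \frac{1}{2}\|x\|^2$:

```python
import numpy as np

# Integrate the AVD dynamics x''(t) + (alpha/t) x'(t) + grad f(x(t)) = 0
# via the first-order reformulation x' = v, v' = -(alpha/t) v - grad f(x).
def integrate_avd(grad_f, x0, alpha=3.0, t0=1.0, T=50.0, h=1e-3):
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    t = t0
    while t < T:
        v -= h * ((alpha / t) * v + grad_f(x))  # velocity update first (semi-implicit)
        x += h * v
        t += h
    return x

# Toy test function f(x) = 0.5*||x||^2, so grad f(x) = x and the minimizer is 0.
x_end = integrate_avd(lambda x: x, x0=[1.0, -2.0])
f0 = 0.5 * (1.0**2 + 2.0**2)
f_end = 0.5 * float(np.dot(x_end, x_end))
```

Along the trajectory, $f(x(t)) - \min f$ decays at the rate $O(1/t^2)$, so after integrating up to $T = 50$ the function value is far below its initial value.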
The Augmented Lagrangian Method (ALM) [49] (for linearly constrained problems), the Alternating Direction Method of Multipliers (ADMM) [35,32] (for problems with separable objectives and block variables linearly coupled in the constraints), and some of their variants have proved to be very suitable for solving large-scale structured convex optimization problems. Since the primal-dual systems of optimality conditions to be solved can be equivalently formulated as monotone inclusion problems, see [48,49,50], the above-mentioned methods are intimately linked with numerical algorithms designed to find a zero of a maximally monotone operator. This close connection has been used in recent works addressing the acceleration of ADMM/ALM methods via inertial dynamics. In [27], for instance, an inertial ADMM numerical algorithm has been proposed, originating in the inertial version of the Douglas-Rachford splitting method for monotone inclusion problems introduced in [28]. Recently, Attouch proposed in [5] an inertial proximal ADMM algorithm, relying on the general scheme from [19] designed to solve general monotone inclusions and in line with [21], and investigated its fast convergence properties for certain combinations of the viscosity and proximal parameters. However, the inertial proximal ADMM algorithm fails to be a full splitting method.
Continuous-time approaches for structured convex minimization problems formulated in the spirit of the full splitting paradigm have been recently addressed in [31] and, closely connected to our approach, in [56,37,11], at which we will take a closer look in Subsection 2.3.

Our contributions
For a primal-dual dynamical system with an asymptotically vanishing damping term, associated with the augmented Lagrangian formulation of (1.1), we will show fast convergence of the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories, and consequently improve existing results in the literature. We will prove the existence and uniqueness of the trajectories as global twice continuously differentiable solutions of the dynamical system, provided the gradient of the objective function is Lipschitz continuous. In the same setting, we will also prove that the primal-dual trajectory converges weakly to a primal-dual optimal solution of (1.1), which is the first result of this type in the literature addressing such dynamical systems.
Last but not least, we will show how the asymptotic analysis and the obtained results can be straightforwardly transferred to continuous-time methods with vanishing damping terms for optimization problems with separable objectives and block variables linearly coupled in the constraints. Moreover, in the case of the unconstrained minimization of a convex differentiable function with Lipschitz continuous gradient, we will recover all convergence statements obtained in the literature for the continuous-time version of Nesterov's accelerated gradient method studied in [53, 12].

Notations and a preliminary result
For both Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$, the inner product and the associated norm will be denoted by $\langle \cdot, \cdot \rangle$ and $\| \cdot \|$, respectively. The Cartesian product $\mathcal{X} \times \mathcal{Y}$ will be endowed with the inner product and the associated norm defined for $(x, \lambda), (z, \mu) \in \mathcal{X} \times \mathcal{Y}$ as
$$\langle (x, \lambda), (z, \mu) \rangle = \langle x, z \rangle + \langle \lambda, \mu \rangle \quad \text{and} \quad \|(x, \lambda)\| = \sqrt{\|x\|^2 + \|\lambda\|^2},$$
respectively. The closed ball centered at $x \in \mathcal{X}$ with radius $\varepsilon > 0$ will be denoted by $B(x; \varepsilon) := \{ y \in \mathcal{X} : \|x - y\| \le \varepsilon \}$. Let $f : \mathcal{X} \to \mathbb{R}$ be a continuously differentiable convex function such that $\nabla f$ is $\ell$-Lipschitz continuous. For every $x, y \in \mathcal{X}$ it holds (see [44])
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2\ell} \|\nabla f(x) - \nabla f(y)\|^2. \tag{1.3}$$
The Lagrangian associated with problem (1.1) is $\mathcal{L} : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, $\mathcal{L}(x, \lambda) := f(x) + \langle \lambda, Ax - b \rangle$. Under the assumptions (1.2), $\mathcal{L}$ is convex with respect to $x \in \mathcal{X}$ and affine with respect to $\lambda \in \mathcal{Y}$. A pair $(x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y}$ is said to be a saddle point of the Lagrangian function $\mathcal{L}$ if for every $(x, \lambda) \in \mathcal{X} \times \mathcal{Y}$
$$\mathcal{L}(x^*, \lambda) \le \mathcal{L}(x^*, \lambda^*) \le \mathcal{L}(x, \lambda^*).$$
If $(x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y}$ is a saddle point of $\mathcal{L}$, then $x^* \in \mathcal{X}$ is an optimal solution of (1.1) and $\lambda^* \in \mathcal{Y}$ is an optimal solution of its Lagrange dual problem. If $x^* \in \mathcal{X}$ is an optimal solution of (1.1) and a suitable constraint qualification is fulfilled, then there exists an optimal solution $\lambda^* \in \mathcal{Y}$ of the Lagrange dual problem such that $(x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y}$ is a saddle point of $\mathcal{L}$. For details and insights into the topic of constraint qualifications for convex duality we refer to [23,25]. The set of saddle points of $\mathcal{L}$, also called primal-dual optimal solutions of (1.1), will be denoted by $\mathcal{S}$ and, as stated in the assumptions, it will be assumed to be nonempty. The set of feasible points of (1.1) will be denoted by $\mathcal{F} := \{ x \in \mathcal{X} : Ax = b \}$ and the optimal objective value of (1.1) by $f^*$.
The system of primal-dual optimality conditions for (1.1) reads
$$\nabla f(x^*) + A^* \lambda^* = 0 \quad \text{and} \quad Ax^* - b = 0, \tag{2.3}$$
where $A^* : \mathcal{Y} \to \mathcal{X}$ denotes the adjoint operator of $A$.
In addition,

Associated monotone inclusion problem
The optimality system (2.3) can be equivalently written as
$$\text{find } (x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y} \text{ such that } T_{\mathcal{L}}(x^*, \lambda^*) = 0, \tag{2.7}$$
where
$$T_{\mathcal{L}}(x, \lambda) := \big( \nabla_x \mathcal{L}(x, \lambda), -\nabla_\lambda \mathcal{L}(x, \lambda) \big) = \big( \nabla f(x) + A^* \lambda, b - Ax \big) \tag{2.8}$$
is the maximally monotone operator associated with the convex-concave function $\mathcal{L}$. Indeed, it is immediate to verify that $T_{\mathcal{L}}$ is monotone. Since it is also continuous, it is maximally monotone (see, for instance, [23, Corollary 20.28]). Therefore $\mathcal{S}$ can be interpreted as the set of zeros of the maximally monotone operator $T_{\mathcal{L}}$, which means that it is a closed convex subset of $\mathcal{X} \times \mathcal{Y}$ (see, for instance, [23, Proposition 23.39]). Applying the fast continuous-time approaches recently proposed in [19,5] to the solving of (2.7) would require the use of the Moreau-Yosida approximation of the operator $T_{\mathcal{L}}$, for which in general no closed formula is available. The resulting dynamical system would therefore not be formulated in the spirit of the full splitting paradigm, which is undesirable from the point of view of numerical computations.

The primal-dual dynamical system with vanishing damping
The dynamical system which we associate with (1.1) and investigate in this paper reads
$$\text{(PD-AVD)} \qquad \begin{cases} \ddot{x}(t) + \dfrac{\alpha}{t}\,\dot{x}(t) + \nabla_x \mathcal{L}_\beta\big( x(t), \lambda(t) + \theta t \dot{\lambda}(t) \big) = 0 \\[4pt] \ddot{\lambda}(t) + \dfrac{\alpha}{t}\,\dot{\lambda}(t) - \nabla_\lambda \mathcal{L}_\beta\big( x(t) + \theta t \dot{x}(t), \lambda(t) \big) = 0 \\[4pt] \big( x(t_0), \lambda(t_0) \big) = \big( x_0, \lambda_0 \big) \ \text{and} \ \big( \dot{x}(t_0), \dot{\lambda}(t_0) \big) = \big( \dot{x}_0, \dot{\lambda}_0 \big), \end{cases}$$
where $t_0 > 0$, $\alpha \ge 3$, $\beta \ge 0$, $\theta > 0$ and $(x_0, \lambda_0), (\dot{x}_0, \dot{\lambda}_0) \in \mathcal{X} \times \mathcal{Y}$. Our system is a particular case of the Temporally Rescaled Inertial Augmented Lagrangian System (TRIALS) proposed by Attouch, Chbani, Fadili and Riahi in [11],
$$\begin{cases} \ddot{x}(t) + \gamma(t)\,\dot{x}(t) + b(t)\,\nabla_x \mathcal{L}_\beta\big( x(t), \lambda(t) + \theta(t) \dot{\lambda}(t) \big) = 0 \\[4pt] \ddot{\lambda}(t) + \gamma(t)\,\dot{\lambda}(t) - b(t)\,\nabla_\lambda \mathcal{L}_\beta\big( x(t) + \theta(t) \dot{x}(t), \lambda(t) \big) = 0, \end{cases}$$
where $\gamma, \theta, b : [t_0, +\infty) \to (0, +\infty)$ are continuously differentiable functions. The case when $b$ is identically $1$ was also studied by He, Hu and Fang in [37]. In [11,37] the authors have actually investigated the minimization of the sum of two separable functions with the block variables linked by linear constraints; however, we will see in the next subsection that our analysis can easily be extended to this setting. The viscous damping function $\gamma(\cdot)$ is vital for achieving fast convergence, and its role is already well understood in unconstrained minimization [7,9,38] (see also [12,13,41] for the case $\gamma(t) := \frac{\alpha}{t}$). The role of the extrapolation function $\theta(\cdot)$ is to induce more flexibility in the dynamical system and in the associated discrete schemes, as has recently been noticed in [11,15,37,56]. The time scaling function $b(\cdot)$ serves to further improve the convergence rates of the objective function value along the trajectory, as was noticed in the context of unconstrained minimization problems in [10,14,16] and of linearly constrained minimization problems in [6].
The dynamical system (PD-AVD) is (TRIALS) for $\gamma(t) := \frac{\alpha}{t}$, $\theta(t) := \theta t$ and $b(t) := 1$ for all $t \ge t_0$, where $\alpha \ge 3$ and $\theta > 0$. A setting which is closely related to ours can be found in the work [56] of Zeng, Lei and Chen. However, when compared to [11,37,56], we provide improved convergence rates and also prove weak convergence of the trajectories to a primal-dual optimal solution. We also expect that our analysis can be adapted to the more general system (TRIALS); however, we prefer the particular setting of (PD-AVD) in order to keep the presentation simpler and easier to follow. Since our system is a particular instance of (TRIALS), we could have relied on the results showing the existence and uniqueness of a strong global solution from [11]. Instead, we will prove the existence and uniqueness of the trajectories as global twice continuously differentiable solutions of (PD-AVD), provided $\nabla f$ is Lipschitz continuous.

Extension to multi-block optimization problems
For $m \ge 2$ a positive integer, we consider the minimization of a separable objective function with respect to linearly coupled block variables
$$\min \sum_{i=1}^m f_i(x_i) \quad \text{subject to} \quad \sum_{i=1}^m A_i x_i = b, \tag{2.10}$$
where $\mathcal{X}_i$, $i = 1, \ldots, m$, and $\mathcal{Y}$ are real Hilbert spaces; $A_i : \mathcal{X}_i \to \mathcal{Y}$, $i = 1, \ldots, m$, are continuous linear operators and $b \in \mathcal{Y}$; and the set of primal-dual optimal solutions of (2.10) is nonempty.
Let $\mathcal{X} := \mathcal{X}_1 \times \cdots \times \mathcal{X}_m$ be the Cartesian product of the real Hilbert spaces $\mathcal{X}_i$, $i = 1, \ldots, m$, endowed with the inner product and associated norm defined for $x := (x_1, \ldots, x_m), z := (z_1, \ldots, z_m) \in \mathcal{X}$ as
$$\langle x, z \rangle = \sum_{i=1}^m \langle x_i, z_i \rangle \quad \text{and} \quad \|x\| = \sqrt{\sum_{i=1}^m \|x_i\|^2}. \tag{2.11}$$
The multi-block optimization problem (2.10) can be equivalently written as (1.1) for the separable objective function $f : \mathcal{X} \to \mathbb{R}$, $f(x) := \sum_{i=1}^m f_i(x_i)$, and the continuous linear operator $A : \mathcal{X} \to \mathcal{Y}$, $Ax := \sum_{i=1}^m A_i x_i$.
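The reformulation can be made concrete in a few lines: stacking the block operators and concatenating the block variables realizes the multi-block constraint as a single linear constraint on the product space (the matrices below are illustrative examples, not data from the paper):

```python
import numpy as np

# Two-block example of (2.10): A1 x1 + A2 x2 = b becomes A x = b with
# A = [A1 | A2] acting on the concatenated variable x = (x1, x2).
A1 = np.array([[1.0, 0.0], [0.0, 1.0]])
A2 = np.array([[2.0], [1.0]])
b = np.array([1.0, 3.0])

A = np.hstack([A1, A2])  # the operator A on the product space X1 x X2

x1, x2 = np.array([0.5, 1.0]), np.array([0.25])
x = np.concatenate([x1, x2])

# The stacked operator reproduces the block-wise action.
assert np.allclose(A @ x, A1 @ x1 + A2 @ x2)
```

With this identification (and $f(x) = f_1(x_1) + f_2(x_2)$), the analysis developed for (1.1) applies verbatim to the multi-block setting.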

Fast convergence rates
In this section we will derive fast convergence rates for the primal-dual gap, the feasibility measure, and the objective function value along the trajectories generated by the dynamical system (PD-AVD). Throughout this section we will make the following assumption on the parameters $\alpha$, $\beta$ and $\theta$.

Assumption 1. Suppose that $\alpha$, $\beta$ and $\theta$ in (PD-AVD) satisfy
$$\alpha \ge 3, \quad \beta \ge 0 \quad \text{and} \quad \frac{1}{2} \ge \theta \ge \frac{1}{\alpha - 1}.$$

The energy function
Let $(x, \lambda) : [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (PD-AVD). For $(z, \mu) \in \mathcal{X} \times \mathcal{Y}$ fixed, we define the quantity $G_\beta\big( (x(t), \lambda(t)) \mid (z, \mu) \big)$, which, according to (2.4) and (2.5), can be expressed for every $(z, \mu) \in \mathcal{F} \times \mathcal{Y}$ and every $t \ge t_0$ in terms of the augmented Lagrangian; when $(z, \mu) := (x^*, \lambda^*) \in \mathcal{S}$, this expression involves for every $t \ge t_0$ the quantity $f^*$, where $f^*$ denotes the optimal objective value of (1.1). For $(z, \mu) \in \mathcal{X} \times \mathcal{Y}$ fixed, we further introduce the energy function $\mathcal{E}_{z,\mu} : [t_0, +\infty) \to \mathbb{R}$; notice that, due to (3.2), $\mathcal{E}_{x^*,\lambda^*}$ takes a simplified form for $(x^*, \lambda^*) \in \mathcal{S}$.

Lemma 3.1. Let $(x, \lambda) : [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (PD-AVD) and $(z, \mu) \in \mathcal{F} \times \mathcal{Y}$.
From the distributive property of the inner product we obtain an expansion of the derivative of the energy function which contains, among others, the term $\theta t \langle A(x(t) + \theta t \dot{x}(t)) - b, \lambda(t) - \mu \rangle$. Since $z \in \mathcal{F}$, the last four terms in this identity vanish, and (3.7) simplifies accordingly. Furthermore, the convexity of $f$ and the fact that $z \in \mathcal{F}$ guarantee
$$-\Big\langle \nabla G_\beta\big( (x(t), \lambda(t)) \mid (z, \mu) \big), (x(t), \lambda(t)) - (z, \mu) \Big\rangle = \langle \nabla f(x(t)), z - x(t) \rangle + \langle A^* \mu, z - x(t) \rangle + \beta \langle A^*(Ax(t) - b), z - x(t) \rangle \le -\big( f(x(t)) - f(z) \big) - \langle \mu, Ax(t) - b \rangle - \beta \|Ax(t) - b\|^2. \tag{3.9}$$
Combining this inequality with (3.8) yields the desired statement.
An important consequence of Lemma 3.1 is the following theorem.
Theorem 3.2. The following statements are true: (ii) if, in addition, $\alpha > 3$ and $\frac{1}{2} \ge \theta > \frac{1}{\alpha - 1}$, then the trajectory $(x(t), \lambda(t))_{t \ge t_0}$ is bounded and the convergence rate of its velocity is $\big\| \big( \dot{x}(t), \dot{\lambda}(t) \big) \big\| = O\big( \tfrac{1}{t} \big)$ as $t \to +\infty$.

Proof. (i) Assumption 1 implies that $2\theta - 1 \le 0$ and $\xi \ge 0$ (see (3.5)). Moreover, $(x^*, \lambda^*) \in \mathcal{S}$ yields $x^* \in \mathcal{F}$. Therefore, we can apply Lemma 3.1 to obtain for every $t \ge t_0$
$$\frac{d}{dt} \mathcal{E}_{x^*,\lambda^*}(t) \le (2\theta - 1)\, \theta t \big( \mathcal{L}_\beta(x(t), \lambda^*) - \mathcal{L}_\beta(x^*, \lambda(t)) \big) \le 0.$$
This means that $\mathcal{E}_{x^*,\lambda^*}$ is nonincreasing on $[t_0, +\infty)$; thus, for every $t \ge t_0$ it holds $\mathcal{E}_{x^*,\lambda^*}(t) \le \mathcal{E}_{x^*,\lambda^*}(t_0)$. For every $t \ge t_0$, by integrating (3.13) from $t_0$ to $t$, we obtain the estimate with the factor $(1 - 2\theta)\theta$, where the last inequality follows from (3.6). Since all quantities inside the integrals are nonnegative, we obtain (3.10)-(3.12) by letting $t \to +\infty$.
(ii) Assuming that $\alpha > 3$ and $\frac{1}{2} \ge \theta > \frac{1}{\alpha - 1}$, one can immediately see that $\xi > 0$. From (3.14) we obtain for all $t \ge t_0$ an estimate which implies the boundedness of the trajectory. On the other hand, the same inequality provides a bound on the velocity for all $t \ge t_0$. Using the triangle inequality and (3.15), we obtain for all $t \ge t_0$ the desired convergence rate.
We can now formulate and prove the main convergence rate results of the paper.

Theorem 3.4. Let $(x, \lambda) : [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (PD-AVD) and $(x^*, \lambda^*) \in \mathcal{S}$.
The following statements are true: (i) for every $t \ge t_0$ it holds
Since $(x^*, \lambda^*) \in \mathcal{S}$, we have $(x^*, \mu(s)) \in \mathcal{F} \times B(\lambda^*; 1)$. Lemma 3.1 combined with relation (3.26) ensures that for every $t \ge t_0$ it holds
$$\frac{d}{dt} \mathcal{E}_{x^*,\mu(s)}(t) \le -\sigma \theta^2 t\, G_\beta\big( (x(t), \lambda(t)) \mid (x^*, \mu(s)) \big).$$
We will prove that both inequalities in (3.32) hold for every $t \ge t_0$. The first inequality follows from the definition of $\mathcal{E}_{x^*,\mu(s)}$. To show the latter one, we multiply both sides of (3.31) by $t^\sigma > 0$ and use integration by parts, for $\sigma > 0$, or just integrate (3.32), for $\sigma = 0$, to deduce that for every $t \ge t_0$
$$t^\sigma \mathcal{E}_{x^*,\mu(s)}(t) \le t_0^\sigma \mathcal{E}_{x^*,\mu(s)}(t_0) - \sigma \int_{t_0}^{t} \tau^{\sigma+1} \big\langle \mu(s) - \lambda^*, Ax(\tau) - b \big\rangle \, d\tau. \tag{3.33}$$
By using (3.18) and (3.19) we further obtain for every $t \ge t_0$ an estimate which is equivalent to (3.32). Now, since (3.32) is true for every $t \ge t_0$, it is fulfilled also for $t := s \ge t_0$, which means that
$$\theta^2 s^2 \big( f(x(s)) - f(x^*) + \langle \mu(s), Ax(s) - b \rangle \big) \le C_4.$$
In both scenarios the estimate (3.32) becomes the claimed bound. Since $s \ge t_0$ has been chosen arbitrarily, this proves (3.27).
(ii) Since $\mathcal{L}(x(t), \lambda^*) - \mathcal{L}(x^*, \lambda(t)) \ge 0$, a direct consequence of (3.27) is that for every $t \ge t_0$
$$\|Ax(t) - b\| \le \frac{C_4}{\theta^2 t^2}. \tag{3.34}$$
From (3.27) and the Cauchy-Schwarz inequality we can also deduce for every $t \ge t_0$ that
$$f(x(t)) - f(x^*) \le \frac{C_4}{\theta^2 t^2} - \langle \lambda^*, Ax(t) - b \rangle \le \frac{C_4}{\theta^2 t^2} + \|\lambda^*\| \, \|Ax(t) - b\| \le \frac{(1 + \|\lambda^*\|)\, C_4}{\theta^2 t^2}.$$
On the other hand, the convexity of $f$ together with the fact that $(x^*, \lambda^*) \in \mathcal{S}$ guarantee for every $t \ge t_0$
$$f(x(t)) - f(x^*) \ge \langle \nabla f(x^*), x(t) - x^* \rangle = -\langle A^* \lambda^*, x(t) - x^* \rangle = -\langle \lambda^*, Ax(t) - b \rangle \ge -\|\lambda^*\| \, \|Ax(t) - b\| \ge -\frac{\|\lambda^*\|\, C_4}{\theta^2 t^2}.$$

Remark 3.5. A few remarks comparing our convergence rate results with the ones reported in [11,37,56] are in order.
‚ Primal-dual gap: Relation (3.27) guarantees a convergence rate of $O(1/t^2)$ for the primal-dual gap, which can be equivalently written as
$$\mathcal{L}(x(t), \lambda^*) - \mathcal{L}(x^*, \lambda^*) = O\left( \frac{1}{t^2} \right) \text{ as } t \to +\infty.$$
The primal-dual gap convergence rate stated in this form has been reported in [11,37,56].
‚ Feasibility measure: Relation (3.34) guarantees a convergence rate for the feasibility measure of
$$\|Ax(t) - b\| = O\left( \frac{1}{t^2} \right) \text{ as } t \to +\infty.$$
In [11,37,56], the feasibility measure $\|Ax(t) - b\|$ is reported to have a convergence rate of only $O(1/t)$ as $t \to +\infty$.
‚ Objective function value: The upper bound we report for the objective function value in (3.29) matches the one from [11], while our lower bound, which is of order $\frac{1}{t^2}$, outperforms the one reported in [11], which is of order $\frac{1}{t}$. In [37,56] no convergence rates for the objective function value are provided.

Weak convergence of the trajectory to a primal-dual optimal solution
The study of the convergence of the trajectory will be carried out in the following setting, which will be assumed to be fulfilled throughout the whole section.
To begin with, we will prove that in the setting of Assumption 2 the dynamical system (PD-AVD) has a unique global twice continuously differentiable solution.
Next we will show that $F$ is Lipschitz continuous on bounded sets; to this end, choose arbitrary $t_0 \le t_1 < t_2 < +\infty$ and $\delta > 0$. For $(t, z, \zeta, u, \rho), (\tilde{t}, \tilde{z}, \tilde{\zeta}, \tilde{u}, \tilde{\rho}) \in [t_1, t_2] \times B(0; \delta) \times B(0; \delta) \times B(0; \delta) \times B(0; \delta)$ we have
$$\big\| F(t, z, \zeta, u, \rho) - F(\tilde{t}, \tilde{z}, \tilde{\zeta}, \tilde{u}, \tilde{\rho}) \big\| \le \|u - \tilde{u}\| + \|\rho - \tilde{\rho}\| + \Big\| \frac{\alpha}{t} u - \frac{\alpha}{\tilde{t}} \tilde{u} + \nabla f(z) - \nabla f(\tilde{z}) + A^* \big( \zeta - \tilde{\zeta} + \theta (t\rho - \tilde{t}\tilde{\rho}) \big) + \beta A^* A (z - \tilde{z}) \Big\| + \Big\| \frac{\alpha}{t} \rho - \frac{\alpha}{\tilde{t}} \tilde{\rho} - A \big( z - \tilde{z} + \theta (tu - \tilde{t}\tilde{u}) \big) \Big\|.$$
Consequently, $F$ is Lipschitz continuous on bounded sets. Since $F$ is, in addition, continuously differentiable, the local existence and uniqueness theorem (see, for instance, [52, Theorems 46.2 and 46.3]) allows us to conclude that there exists a unique solution $(x, \lambda, y, \nu) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{X} \times \mathcal{Y}$ of (4.1) defined on a maximal interval $[t_0, T_{\max})$, where $t_0 < T_{\max} \le +\infty$. Furthermore, either $T_{\max} = +\infty$, or $\lim_{t \to T_{\max}} \big\| \big( x(t), \lambda(t), y(t), \nu(t) \big) \big\| = +\infty$.
We will prove that $T_{\max} = +\infty$.
We start the convergence analysis of the trajectory with the proof of two important integrability results, whereby we notice that statement (3.10) only implies (4.3) if $\beta > 0$.

Proposition 4.2. Let $(x, \lambda) : [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (PD-AVD) and $(x^*, \lambda^*) \in \mathcal{S}$. Then the integrability relations (4.2) and (4.3) hold.

Proof. The decisive role in the proof is played by the fact that, as $\nabla f$ is $\ell$-Lipschitz continuous, relation (3.9) in the proof of Lemma 3.1 can be sharpened, thanks to (1.3), for every $t \ge t_0$ to
$$-\Big\langle \nabla G_\beta\big( (x(t), \lambda(t)) \mid (x^*, \lambda^*) \big), (x(t), \lambda(t)) - (x^*, \lambda^*) \Big\rangle = \langle \nabla f(x(t)), x^* - x(t) \rangle + \langle A^* \lambda^*, x^* - x(t) \rangle + \beta \langle A^*(Ax(t) - b), x^* - x(t) \rangle \le -\big( f(x(t)) - f(x^*) \big) - \frac{1}{2\ell} \|\nabla f(x(t)) - \nabla f(x^*)\|^2 - \langle \lambda^*, Ax(t) - b \rangle - \beta \|Ax(t) - b\|^2.$$
Consequently, combining this inequality with (3.8) yields for every $t \ge t_0$ an estimate which leads by integration to (4.2). On the other hand, it follows from (3.34) that the integral in (4.3) is finite, and the proof is complete. Now we define, for a given primal-dual optimal solution $(x^*, \lambda^*) \in \mathcal{S}$, the following two mappings on $[t_0, +\infty)$:
$$W(t) := \mathcal{L}_\beta(x(t), \lambda^*) - \mathcal{L}_\beta(x^*, \lambda(t)) + \frac{1}{2} \big\| \big( \dot{x}(t), \dot{\lambda}(t) \big) \big\|^2 \quad \text{and} \quad \varphi(t) := \frac{1}{2} \big\| \big( x(t), \lambda(t) \big) - (x^*, \lambda^*) \big\|^2.$$
Proof. Multiplying inequality (4.4) by $t$ and adding $\theta(\alpha + 1) t W(t)$ to both of its sides, we obtain for every $t \ge t_0$
$$t \ddot{\varphi}(t) + \alpha \dot{\varphi}(t) + \theta \big( t^2 \dot{W}(t) + (\alpha + 1)\, t W(t) \big) \le \theta (\alpha + 1)\, t W(t). \tag{4.8}$$
Multiplying (4.8) further by $t^{\alpha - 1}$ yields for every $t \ge t_0$ the corresponding inequality. As $1 - 2\theta > 0$ and $\xi = \theta\alpha - \theta - 1 > 0$, it follows from (3.11) and (3.12) in Theorem 3.2 that $t \mapsto t W(t)$ belongs to $L^1([t_0, +\infty))$. After integration we obtain from (4.9) an estimate valid for every $t \ge t_0$. Next we will prove a number of results which will finally guarantee that the second assumption of the Opial Lemma is fulfilled, namely that every weak sequential cluster point of the trajectory $(x, \lambda)$ is an element of $\mathcal{S}$.

Lemma 4.5. Let $(x, \lambda) : [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (PD-AVD) and $(x^*, \lambda^*) \in \mathcal{S}$. Then an inequality involving the terms $\big\| \big( \dot{x}(t), \dot{\lambda}(t) \big) \big\|^2$, $\theta \frac{d}{dt} \big( t \, \| A^*(\lambda(t) - \lambda^*) \|^2 \big)$ and $(1 - \theta) \| A^*(\lambda(t) - \lambda^*) \|^2$ holds for every $t \ge t_0$.
The following proposition provides a further important integrability result.
We will compute these five integrals separately. Let $t \ge t_0$ be fixed.
Dividing (4.23) by $t^\alpha$, we obtain from here an estimate involving $\langle \dot{x}(t), A^*(\lambda(t) - \lambda^*) \rangle$ plus a constant. According to (3.11) and (3.12) in Theorem 3.2, as well as (4.2) and (4.3) in Proposition 4.2, we conclude that both $t \mapsto t V(t)$ and $t \mapsto t \big( \|Ax(t) - b\|^2 + \|\dot{\lambda}(t)\|^2 \big)$ belong to $L^1([t_0, +\infty))$; therefore the right-hand side of (4.30) is finite. Hence, by letting $r \to +\infty$ in (4.30) and taking into account the choice of the parameters $\theta$ and $\alpha$, we obtain the desired statement.
The following result will be used to show the weak convergence of the trajectory, but it is also of independent interest, since it provides convergence rates for the KKT system associated with problem (1.1).

thus, from Lemma A.2 we get
A˚pλ ptq´λ˚q " oˆ1 ? t˙a s t Ñ`8. The functions F ptq :" t ∇f px ptqq´∇f px˚q 2 ě 0 G ptq :" p1`tq ∇f px ptqq´∇f px˚q 2`t ℓ 2 9 x ptq 2 defined on rt 0 ,`8q are locally absolutely continuous and belong, according to Proposition 4.2 and Theorem 3.2, to L 1 prt 0 ,`8qq. For almost every t ě t 0 we have d dt´t ∇f px ptqq´∇f px˚q 2" ∇f px ptqq´∇f px˚q 2`2 t B ∇f px ptqq´∇f px˚q , d dt ∇f px ptqq F ď p1`tq ∇f px ptqq´∇f px˚q 2`t d dt ∇f px ptqq 2 ď p1`tq ∇f px ptqq´∇f px˚q 2`t ℓ 2 9 x ptq 2 , where the last inequality follows from the fact that ∇f is ℓ´Lipschitz continuous. From Lemma A.2 we get ∇f px ptqq´∇f px˚q " oˆ1 ? t˙a s t Ñ`8.
We are now in the position to prove the main result of this section.
Then $(x(t), \lambda(t))$ converges weakly to a primal-dual optimal solution of (1.1) as $t \to +\infty$.
Proof. We have seen in Lemma 4.4 that the limit $\lim_{t \to +\infty} \big\| \big( x(t), \lambda(t) \big) - (x^*, \lambda^*) \big\|$ exists for every $(x^*, \lambda^*) \in \mathcal{S}$, which proves condition (i) of Opial's Lemma (see Lemma A.3).
In order to prove condition (ii), we consider $(\tilde{x}, \tilde{\lambda})$, an arbitrary weak sequential cluster point of $(x(t), \lambda(t))$ as $t \to +\infty$, which means that there exists a sequence $\{(x(t_n), \lambda(t_n))\}_{n \ge 0}$ such that $(x(t_n), \lambda(t_n)) \rightharpoonup (\tilde{x}, \tilde{\lambda})$ as $n \to +\infty$.
Theorem 4.7 and Theorem 3.4 allow us to deduce that
$$\nabla f(x(t_n)) + A^* \lambda(t_n) \to \nabla f(x^*) + A^* \lambda^* = 0 \quad \text{as } n \to +\infty$$
and
$$Ax(t_n) - b \to 0 \quad \text{as } n \to +\infty,$$
respectively. Since the graph of the operator $T_{\mathcal{L}}$ introduced in (2.8) is sequentially closed in $(\mathcal{X} \times \mathcal{Y})_{\text{weak}} \times (\mathcal{X} \times \mathcal{Y})_{\text{strong}}$ (cf. [23, Proposition 20.38]), we have that
$$\nabla f(\tilde{x}) + A^* \tilde{\lambda} = \nabla f(x^*) + A^* \lambda^* = 0 \quad \text{and} \quad A\tilde{x} - b = Ax^* - b = 0.$$
In other words, $(\tilde{x}, \tilde{\lambda})$ belongs to $\mathcal{S}$ and the proof is complete.
Thus, by applying Fubini's theorem, the claimed relation follows. If $r := +\infty$, then the above inequality becomes an equality.
The following result can be found in [1,Lemma 5.2].
Lemma A.2. Let $\delta > 0$, $1 \le p < \infty$ and $1 \le q \le \infty$. Suppose that $F \in L^p([\delta, +\infty))$ is a locally absolutely continuous nonnegative function, $G \in L^q([\delta, +\infty))$, and
$$\frac{d}{dt} F(t) \le G(t) \quad \text{for almost every } t \ge \delta.$$
Then $\lim_{t \to +\infty} F(t) = 0$.
Opial's Lemma [45] in continuous form is used in the proof of the weak convergence of the trajectory of (PD-AVD) to a primal-dual solution of (1.1). This argument was first used in [33] to establish the convergence of nonlinear contraction semigroups.

Lemma A.3 (Opial). Let $\mathcal{S}$ be a nonempty subset of a real Hilbert space $\mathcal{H}$ and $z : [t_0, +\infty) \to \mathcal{H}$. Assume that (i) for every $z^* \in \mathcal{S}$, the limit $\lim_{t \to +\infty} \|z(t) - z^*\|$ exists; (ii) every weak sequential cluster point of the trajectory $z(t)$ as $t \to +\infty$ belongs to $\mathcal{S}$.
Then $z(t)$ converges weakly to a point in $\mathcal{S}$ as $t \to +\infty$.

Statement (4.31) in Theorem 4.7 suggests that the mapping $(x, \lambda) \mapsto (\nabla f(x), A^* \lambda)$ is constant along the set $\mathcal{S}$ of primal-dual optimal solutions of (1.1). This is confirmed by the following result.
Proposition A.4. Consider the optimization problem (1.1). If $\nabla f$ is $\ell$-Lipschitz continuous, then for every $(x^*, \lambda^*), (x^{**}, \lambda^{**}) \in \mathcal{S}$ it holds $\nabla f(x^*) = \nabla f(x^{**})$ and $A^* \lambda^* = A^* \lambda^{**}$.