Convergence of Linear Bregman ADMM for Nonconvex and Nonsmooth Problems with Nonseparable Structure

The alternating direction method of multipliers (ADMM) is an effective method for solving two-block separable convex problems, and its convergence is well understood. However, when the number of blocks exceeds two, when one of the involved functions is nonconvex, or when the objective has a nonseparable structure, ADMM or its direct extension may fail to converge. In this paper, we propose an ADMM-based algorithm for nonconvex multiblock optimization problems with a nonseparable structure. We show that, under mild conditions, any cluster point of the iterative sequence generated by the proposed algorithm is a critical point. Furthermore, we establish the strong convergence of the whole sequence under the condition that the potential function satisfies the Kurdyka–Łojasiewicz property. This provides a theoretical basis for the application of the proposed ADMM in practice. Finally, we give some preliminary numerical results to show the effectiveness of the proposed algorithm.


Introduction
In this paper, we consider the following possibly nonconvex and nonsmooth optimization problem:

\[
\min_{x_1,\ldots,x_{N-1},\,y}\ \sum_{i=1}^{N-1} f_i(x_i) + g(x_1,\ldots,x_{N-1},y)
\quad \text{s.t.}\quad \sum_{i=1}^{N-1} A_i x_i + By = b, \tag{1}
\]

where x_i ∈ R^{n_i} (i = 1, 2, ..., N−1) and y ∈ R^n are variable vectors, g: R^l → R (l = n_1 + n_2 + ··· + n_{N−1} + n) is differentiable, each f_i: R^{n_i} → R ∪ {+∞} is proper and lower semicontinuous, A_i ∈ R^{m×n_i} (i = 1, 2, ..., N−1) and B ∈ R^{m×n} are given matrices, and b ∈ R^m. The alternating direction method of multipliers (ADMM) is a very effective method for solving convex two-block optimization problems [1, 2]. A natural idea is to extend ADMM to solve problem (1). However, ADMM or its direct extension may fail to converge when the number of blocks exceeds two, when one of the functions is nonconvex, or when the objective has a nonseparable structure.
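To fix ideas, the following small script builds a toy instance of (1) with N = 3. The specific choices of f_i (ℓ1 norms), the smooth coupling g, and all dimensions are illustrative assumptions, not the test problems used later in the paper.

```python
import numpy as np

# A hypothetical instance of problem (1) with N = 3: two nonsmooth blocks x1, x2
# coupled to y through a smooth, nonseparable function g and a linear constraint.
rng = np.random.default_rng(0)
m, n1, n2, n = 20, 30, 25, 20
A1 = rng.standard_normal((m, n1))
A2 = rng.standard_normal((m, n2))
B = np.eye(m, n)                                  # here m = n, so B is the identity
b = rng.standard_normal(m)
C1 = rng.standard_normal((n, n1))
C2 = rng.standard_normal((n, n2))

f1 = lambda x1: np.abs(x1).sum()                  # proper, lsc, nonsmooth (convex here; nonconvex f_i are also allowed)
f2 = lambda x2: np.abs(x2).sum()
g = lambda x1, x2, y: 0.5 * np.sum((C1 @ x1 + C2 @ x2 + y) ** 2)   # differentiable, nonseparable coupling

constraint_residual = lambda x1, x2, y: A1 @ x1 + A2 @ x2 + B @ y - b
```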
(2) The proposed algorithm combines a linearization technique with a regularization technique. Together, these techniques can effectively reduce the difficulty of solving the subproblems.
The rest of this paper is organized as follows. In Section 2, some basic concepts and necessary preliminaries for further analysis are summarized. In Section 3, we propose the algorithm and analyze its convergence for 3-block nonconvex and nonsmooth coupled problems. Finally, some conclusions are drawn in Section 4.

Preliminaries
R^n denotes the n-dimensional Euclidean space, R ∪ {+∞} denotes the extended real number set, and N denotes the set of natural numbers. The image space of a matrix Q ∈ R^{m×n} is defined as Im Q := {Qx : x ∈ R^n}. P_Q(·) denotes the Euclidean projection onto Im Q. If Q ≠ 0, let μ_Q denote the smallest positive singular value of the matrix QQ^T. ‖·‖ represents the Euclidean norm. For a proper function f: R^n → R ∪ {+∞}, dom(f) := {x ∈ R^n : f(x) < +∞} denotes its effective domain. For a point-to-set mapping F: R^n ⇉ R^m, its graph is defined by Graph F := {(x, y) ∈ R^n × R^m : y ∈ F(x)}.

Definition 1 (see [14]). Let f: R^n → R ∪ {+∞} be a proper function. If there exists δ > 0 such that

\[
f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda) f(y) - \frac{\delta}{2}\,\lambda(1-\lambda)\,\|x-y\|^2
\]

for all λ ∈ (0, 1) and x, y ∈ dom f, then f is called strongly convex with modulus δ.
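As a concrete reading of this notation, the sketch below realizes P_Q(·) and μ_Q numerically via the singular value decomposition; the function name and the tolerance are assumptions made only for illustration.

```python
import numpy as np

def image_projection_and_mu(Q, tol=1e-12):
    """Return P_Q (projection onto Im Q) and mu_Q (smallest positive eigenvalue of Q Q^T)."""
    U, s, _ = np.linalg.svd(Q, full_matrices=False)
    pos = s > tol * s.max()             # indices of positive singular values
    Ur = U[:, pos]                      # orthonormal basis of Im Q
    P_Q = lambda u: Ur @ (Ur.T @ u)     # Euclidean projection onto Im Q
    mu_Q = s[pos].min() ** 2            # smallest positive eigenvalue of Q Q^T
    return P_Q, mu_Q

# Example on a rank-deficient matrix: the projection is idempotent, P_Q(P_Q(u)) = P_Q(u).
Q = np.array([[1.0, 2.0, 0.0], [2.0, 4.0, 0.0]])
P_Q, mu_Q = image_projection_and_mu(Q)
u = np.array([1.0, -3.0])
assert np.allclose(P_Q(P_Q(u)), P_Q(u)) and mu_Q > 0
```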
Definition 2 (see [15]). For a convex differentiable function ϕ: R^n → R, the associated Bregman distance is defined as

\[
D_\phi(x, y) := \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle
\]

for all x, y ∈ R^n.
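A minimal numerical sketch of Definition 2 follows, using two standard choices of ϕ (the squared norm and the negative entropy, both illustrative assumptions): the first recovers the squared Euclidean distance, while the second is asymmetric.

```python
import numpy as np

def bregman_distance(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>  (Definition 2)."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# phi(x) = ||x||^2 recovers the squared Euclidean distance.
phi_sq, grad_sq = lambda x: x @ x, lambda x: 2.0 * x
x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(bregman_distance(phi_sq, grad_sq, x, y), np.sum((x - y) ** 2))        # equal

# Negative entropy (on the positive orthant) gives an asymmetric distance,
# illustrating that D_phi is not a metric in general.
phi_ent = lambda x: np.sum(x * np.log(x))
grad_ent = lambda x: np.log(x) + 1.0
u, v = np.array([0.2, 0.8]), np.array([0.6, 0.4])
print(bregman_distance(phi_ent, grad_ent, u, v),
      bregman_distance(phi_ent, grad_ent, v, u))                            # different
```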
The Bregman distance plays an important role in iterative algorithms. It shares many of the nice properties of the Euclidean distance. However, the Bregman distance is not a metric, since it satisfies neither the triangle inequality nor symmetry. For example, taking ϕ(x) = ‖x‖² yields D_ϕ(x, y) = ‖x − y‖²; further examples can be found in [16]. Let us now collect some useful properties of the Bregman distance.
Proposition 1 (see [15]). Let ϕ be a differentiable and strongly convex function with modulus δ. Then

\[
D_\phi(x, y) \ge \frac{\delta}{2}\,\|x - y\|^2
\]

for all x and y.

The following notations and definitions are quite standard and can be found in [14, 17].

(i) The Fréchet subdifferential, or regular subdifferential, of f at x ∈ dom f, written \(\widehat{\partial} f(x)\), is the set of vectors v ∈ R^n satisfying

\[
\liminf_{y \to x,\; y \ne x}\ \frac{f(y) - f(x) - \langle v,\, y - x \rangle}{\|y - x\|} \ge 0 .
\]

(ii) The limiting subdifferential, or simply the subdifferential, of f at x ∈ dom f, written ∂f(x), is defined as

\[
\partial f(x) := \big\{ v \in \mathbb{R}^n :\ \exists\, x^k \to x,\ f(x^k) \to f(x),\ v^k \in \widehat{\partial} f(x^k),\ v^k \to v \big\}.
\]

(iii) A point x satisfying 0 ∈ ∂f(x) is called a critical point or a stationary point of the function f. The set of critical points of f is denoted by crit f.

The following proposition collects some properties of the subdifferential.
Proposition 2 (see [17]). Let f: R^n → R ∪ {+∞} and g: R^n → R ∪ {+∞} be proper lower semicontinuous functions. Then, the following holds: for each x ∈ dom f, \(\widehat{\partial} f(x) \subseteq \partial f(x)\), where the first set is closed and convex, while the second is closed and not necessarily convex.
The Lagrangian function of (1), with multiplier λ ∈ R^m, is defined as

\[
L(x_1, \ldots, x_{N-1}, y, \lambda) := \sum_{i=1}^{N-1} f_i(x_i) + g(x_1, \ldots, x_{N-1}, y) + \Big\langle \lambda,\ \sum_{i=1}^{N-1} A_i x_i + By - b \Big\rangle .
\]

If w^* = (x_1^*, ..., x_{N−1}^*, y^*, λ^*) satisfies 0 ∈ ∂L(w^*), then w^* is called a critical point or stationary point of the Lagrangian function L(x_1, ..., x_{N−1}, y, λ). A very important technique for proving the strong convergence of ADMM for nonconvex optimization problems relies on the assumption that the potential function satisfies the Kurdyka–Łojasiewicz (KL) property [18–21].
There are many functions that satisfy this inequality. In particular, when the function belongs to certain functional classes, e.g., semialgebraic, real subanalytic, or log-exp functions (see [22–24]), it is often elementary to check that such an inequality holds.
For notational simplicity, we use Φ_η (η > 0) to denote the set of concave functions φ: [0, η) → [0, +∞) that are continuous at 0, continuously differentiable on (0, η), and satisfy φ(0) = 0 and φ′(s) > 0 for all s ∈ (0, η). The KL property can be described as follows.
Definition 5 (see [18–21]) (KL property). Let f: R^n → R ∪ {+∞} be a proper lower semicontinuous function. If there exist η ∈ (0, +∞], a neighborhood U of x^*, and a function φ ∈ Φ_η such that

\[
\varphi'\big(f(x) - f(x^*)\big)\, d\big(0, \partial f(x)\big) \ge 1
\]

for all x ∈ U ∩ {x : f(x^*) < f(x) < f(x^*) + η}, then f is said to have the KL property at x^*.

Lemma 1 (see [22]) (uniformized KL property). Suppose that f: R^n → R ∪ {+∞} is a proper lower semicontinuous function and Ω is a compact set. If f(x) ≡ f^* for all x ∈ Ω and f satisfies the KL property at each point of Ω, then there exist ε > 0, η > 0, and φ ∈ Φ_η such that

\[
\varphi'\big(f(x) - f^*\big)\, d\big(0, \partial f(x)\big) \ge 1
\]

for all x ∈ {x ∈ R^n : d(x, Ω) < ε} ∩ {x : f^* < f(x) < f^* + η}.

Lemma 2 (see [25]) (descent lemma). Let h: R^n → R be a continuously differentiable function whose gradient ∇h is Lipschitz continuous with modulus l_h > 0. Then, for any x, y ∈ R^n, we have

\[
h(y) \le h(x) + \langle \nabla h(x),\, y - x \rangle + \frac{l_h}{2}\,\|y - x\|^2 .
\]

Lemma 3 (see [26]). Let Q ∈ R^{r×p} be a nonzero matrix and let μ_Q denote the smallest positive eigenvalue of QQ^T. Then, for every u ∈ R^r, there holds

\[
\|Q^T u\|^2 \ge \mu_Q\, \|P_Q(u)\|^2 .
\]
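The descent lemma (Lemma 2) can be checked numerically on a simple quadratic, where the Lipschitz modulus of the gradient is known exactly; the test function and dimensions below are illustrative assumptions.

```python
import numpy as np

# Numerical illustration of the descent lemma (Lemma 2) for h(x) = 0.5 x^T M x,
# whose gradient M x is Lipschitz with modulus l_h = largest eigenvalue of M.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
M = A.T @ A                                    # symmetric positive semidefinite
l_h = np.linalg.eigvalsh(M).max()              # Lipschitz modulus of the gradient

h = lambda x: 0.5 * x @ M @ x
grad_h = lambda x: M @ x

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = h(y)
    rhs = h(x) + grad_h(x) @ (y - x) + 0.5 * l_h * np.sum((y - x) ** 2)
    assert lhs <= rhs + 1e-10                  # descent-lemma upper bound holds
```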

Algorithm and Convergence
For the convenience of analysis, we only consider the 3-block case N = 3. The obtained results naturally generalize to the case N > 3. Thus, in the rest of this paper, we consider the following nonconvex and nonsmooth 3-block optimization problem:

\[
\min_{x_1, x_2, y}\ f_1(x_1) + f_2(x_2) + g(x_1, x_2, y) \quad \text{s.t.}\quad A_1 x_1 + A_2 x_2 + By = b, \tag{12}
\]

where each f_i: R^{n_i} → R ∪ {+∞} (i = 1, 2) is proper and lower semicontinuous but possibly nonconvex, and g: R^{n_1 + n_2 + n} → R is differentiable. In this paper, we present the following algorithm, LBADMM, for (12).

Remark 1.
Due to the different structure of the problem, the algorithm in this paper differs from existing algorithms. In order to exploit the properties of the differentiable block and to simplify the computation in each iteration, we linearize the differentiable part in the x_1- and x_2-subproblems. If the function g(x_1, x_2, y) depends only on the variable y, that is, g(x_1, x_2, y) = h(y), then the algorithm LBADMM reduces to the Bregman ADMM in [9, 10]. In contrast to [9, 10], we do not assume that B has full row rank.
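The following is only a minimal sketch of one iteration of a linearized Bregman-ADMM template in the spirit of Remark 1, not the exact update (14): g (and, in this sketch, also the augmented quadratic term) is linearized in the x_1- and x_2-subproblems, the Bregman terms are taken as simple proximal terms φ_i = ‖·‖²/(2t_i), and `prox_f1`, `prox_f2`, `min_y_step`, `beta`, `t1`, `t2` are assumed user-supplied oracles and parameters rather than quantities defined in the paper.

```python
import numpy as np

def lbadmm_step(x1, x2, y, lam, A1, A2, B, b, grad_g_x1, grad_g_x2, min_y_step,
                prox_f1, prox_f2, beta, t1, t2):
    """One iteration of a linearized Bregman-ADMM template for a problem like (12)."""
    # x1-subproblem: prox of f1 applied to a gradient step on the smooth part.
    r1 = A1 @ x1 + A2 @ x2 + B @ y - b
    v1 = x1 - t1 * (grad_g_x1(x1, x2, y) + A1.T @ (lam + beta * r1))
    x1 = prox_f1(v1, t1)

    # x2-subproblem: same template, using the freshly updated x1.
    r2 = A1 @ x1 + A2 @ x2 + B @ y - b
    v2 = x2 - t2 * (grad_g_x2(x1, x2, y) + A2.T @ (lam + beta * r2))
    x2 = prox_f2(v2, t2)

    # y-subproblem: minimize g(x1, x2, .) + <lam, B y> + (beta/2)||A1 x1 + A2 x2 + B y - b||^2.
    y = min_y_step(x1, x2, lam, beta)

    # Multiplier update.
    lam = lam + beta * (A1 @ x1 + A2 @ x2 + B @ y - b)
    return x1, x2, y, lam
```

For instance, if f_1 = ‖·‖_1, the oracle `prox_f1(v, t)` may be taken as the soft-thresholding operator `np.sign(v) * np.maximum(np.abs(v) - t, 0.0)`.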
In this section, we always assume that the sequence {w^k} generated by the algorithm LBADMM is bounded. The following lemma establishes the relationship between the dual variable and the original variables.
Proof. By Assumption 1 (ii) and Lemma 3, and from the optimality condition of the y-subproblem in (14), we obtain the following.

It follows from the abovementioned formula and (17) that the desired relation between the dual variable and the original variables holds. The proof is completed. □

The augmented Lagrangian function of (12), with multiplier λ ∈ R^m and penalty parameter β > 0, is defined as

\[
L_\beta(x_1, x_2, y, \lambda) := L(x_1, x_2, y, \lambda) + \frac{\beta}{2}\,\|A_1 x_1 + A_2 x_2 + By - b\|^2,
\]

where L(x_1, x_2, y, λ) is the Lagrangian function of (12). Let w := (x_1, x_2, y, λ) and, for brevity, write L(w) := L_β(x_1, x_2, y, λ). The following lemma implies the monotonicity of the sequence {L(w^k)}_{k∈N}.

Lemma 5. For each k ∈ N, the sufficient-decrease inequality (23) holds.

Proof. From (17), we have three basic inequalities, one for each subproblem. Adding up the abovementioned three formulas and rearranging, we obtain an intermediate estimate. From Lemma 2, Assumption 1 (iv), and Proposition 1, we obtain (29). Adding up the preceding inequalities, we have (31). Together with (14), this implies that (23) holds. □
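As a practical aid when implementing the method, one can monitor the quantity that Lemma 5 controls. The helper below evaluates an augmented Lagrangian of the form L + (β/2)‖r‖² (the sign convention on the multiplier term is an assumption; adapt it if the definition of L_β above is taken differently) and checks that the recorded values are nonincreasing.

```python
import numpy as np

def augmented_lagrangian(f1, f2, g, A1, A2, B, b, beta, x1, x2, y, lam):
    """L_beta(w) = f1 + f2 + g + <lam, r> + (beta/2)||r||^2, with r the constraint residual."""
    r = A1 @ x1 + A2 @ x2 + B @ y - b
    return f1(x1) + f2(x2) + g(x1, x2, y) + lam @ r + 0.5 * beta * np.sum(r ** 2)

def check_descent(values, tol=1e-8):
    """Flag any increase of the recorded L_beta(w^k) values beyond a small tolerance."""
    return all(values[k + 1] <= values[k] + tol for k in range(len(values) - 1))
```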
Proof. Since {w^k} is bounded, there exists a subsequence {w^{k_j}} such that lim_{j→+∞} w^{k_j} = w^*. Since f_1 and f_2 are lower semicontinuous and g is Lipschitz differentiable, the function L(·) is lower semicontinuous; thus, {L(w^{k_j})} is bounded from below. From Lemma 5, {L(w^k)} is nonincreasing; thus, {L(w^{k_j})} is convergent. Furthermore, {L(w^k)} is also convergent and L(w^k) ≥ L(w^*) for each k. By Lemma 5 and the abovementioned formula, noting that σ > 0 and that t is arbitrary, we obtain, in view of (14), the asserted conclusion.

Proof. From the definition of L(w), from (14), and from the optimality conditions, one has 0 ∈ ∂f_1(x_1^{k+1}) + ⋯; from (43) and (45), it then follows from Assumption 1 and Lemma 4 that there exists δ > 0 such that

\[
d\big(0, \partial L(w^{k+1})\big) \le \delta a_k \qquad \text{for all } k \in \mathbb{N}. \tag{49}
\]
The following theorem shows that the algorithm LBADMM has global convergence.
(iii) L(·) is finite and constant on S({w^k}) and equal to inf_{k∈N} L_β(w^k) = lim_{k→+∞} L_β(w^k).

Proof. (i) By the definition of S({w^k}), this is trivial.
It follows from (14) that h_{k_j}(x_1^{k_j+1}) ≤ h_{k_j}(x_1^*) and θ_{k_j}(x_2^{k_j+1}) ≤ θ_{k_j}(x_2^*).

In the numerical experiments, we report the trend of the objective value ("objective-value") and the trend of the residual defined by ‖r^k‖ = ‖A_1 x_1^k + A_2 x_2^k + y^k − b‖ ("‖r‖_2").
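A small helper (names assumed) for reproducing the reported residual; the stopping threshold mentioned in the comment is an assumption, not a value taken from the paper.

```python
import numpy as np

def residual_norm(A1, A2, x1, x2, y, b):
    """||r^k|| = ||A1 x1^k + A2 x2^k + y^k - b||, the quantity plotted as "||r||_2".

    Written exactly as the residual appears in the text, where B does not show up
    explicitly (i.e., B is the identity or absorbed into y).
    """
    return np.linalg.norm(A1 @ x1 + A2 @ x2 + y - b)

# A typical stopping rule (threshold is an assumption): stop when
# residual_norm(...) <= 1e-6 or a maximum iteration count is reached.
```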

Conclusions
We propose a new algorithm, called linear Bregman ADMM (LBADMM), for the three-block optimization problem with a nonseparable structure. The proposed algorithm integrates the basic ideas of linearization and regularization techniques. We show that any cluster point of the sequence generated by the proposed algorithm is a critical point. Under the conditions that the potential function satisfies the Kurdyka–Łojasiewicz property and that the penalty parameter is larger than a certain constant, the strong convergence of the algorithm is proved. Preliminary numerical results show that the algorithm LBADMM is stable and effective.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.