Convergence of Slice-Based Block Coordinate Descent Algorithm for Convolutional Sparse Coding

Convolutional sparse coding (CSC) models have become increasingly popular in the signal and image processing communities in recent years. Several studies have addressed the basis pursuit (BP) problem of the CSC model, including the recently proposed local block coordinate descent (LoBCoD) algorithm. This algorithm adopts a slice-based local processing idea and splits the global sparse vector into local vector needles that are computed locally in the original domain to obtain the encoding. However, a convergence theorem for the LoBCoD algorithm has not previously been given. This paper presents a convergence theorem for the LoBCoD algorithm, proving that it converges to its global optimum at a rate of O(1/k). A slice-based multilayer local block coordinate descent (ML-LoBCoD) algorithm is then proposed, motivated by the multilayer basis pursuit (ML-BP) problem and the LoBCoD algorithm. We prove that the ML-LoBCoD algorithm is likewise guaranteed to converge to the optimal solution at a rate of O(1/k). Preliminary numerical experiments demonstrate the better performance of the proposed ML-LoBCoD algorithm compared to the LoBCoD algorithm for the BP problem, and the loss function value is also lower for ML-LoBCoD than for LoBCoD.


Introduction
Sparse representation models have been widely used in various image processing [1,2] and computer vision [3,4] applications. A sparse representation model assumes that a signal X ∈ R^N can be expressed as a linear combination of a few dictionary columns, i.e., X = DΓ, where D ∈ R^{N×Nm} is the dictionary and Γ ∈ R^{Nm} is a sparse vector. If D is assumed to be fixed, finding the sparse vector Γ can be considered a basis pursuit (BP) problem. However, patch-based BP algorithms encode each patch independently and ignore the relationships between neighboring patches, resulting in a high degree of redundancy in the encoding. The convolutional sparse coding (CSC) model [5] has been proposed and extended over the last ten years; it imposes structure on the dictionary by using a banded circulant matrix. This model assumes that the signal can be represented as the superposition of a few local filters convolved with sparse vectors. Several works have presented algorithms for solving the CSC problem [6,7]. Contemporary BP algorithms for CSC often rely on the alternating direction method of multipliers (ADMM) applied in the Fourier domain. Algorithms that encode in the Fourier domain can be computationally demanding, and ADMM-based formulations must introduce auxiliary variables, which increases the difficulty of the optimization. A recent work by Papyan et al. [8] adopted a slice-based local processing idea and split the global sparse vector Γ into local vector needles that are computed locally in the original domain, rather than the Fourier domain, to obtain the encoding. While this approach still relies on the ADMM algorithm, its convergence largely depends on the introduced auxiliary variables. The LoBCoD algorithm [9] is another algorithm proposed for the BP problem. Its advantages are that it does not operate in the Fourier domain and does not use the ADMM formulation.
More precisely, the LoBCoD algorithm optimizes the needles of the CSC model in the original domain and operates without any auxiliary variables. Compared with global or local ADMM-based methods, the LoBCoD algorithm achieves better performance in solving the BP problem. However, the literature [9] does not provide a convergence theorem for the LoBCoD algorithm.
Thus, this paper presents a convergence theorem and proof for the LoBCoD algorithm.
A multilayer convolutional sparse coding (ML-CSC) model was proposed in the last three years by Sulam et al. [10] as a deep extension of the CSC model. The core assumption of the ML-CSC model is that a signal can be expressed by sparse representations at different layers in terms of nested convolutional filters. The traditional BP problem was recently extended to a multilayer setting motivated by the ML-CSC model [11]. Several methods have been proposed to solve the ML-BP problem. The first is the layered basis pursuit algorithm [12], which establishes a connection between convolutional neural networks and sparse modeling. However, the layered basis pursuit algorithm does not produce a signal that satisfies the assumptions of the multilayer model, and its signal reconstruction error increases as the network deepens. Subsequently, the multilayer iterative soft-thresholding algorithm (ML-ISTA) and its fast version (ML-FISTA) [11] were proposed; they require only matrix multiplications and entry-wise operations and converge to the global optimum. Unfortunately, both methods operate on patches only and do not exploit the slice-based local processing idea. Therefore, the slice-based ML-LoBCoD algorithm is proposed for the ML-BP problem. This algorithm employs the slice-based local processing idea and the block coordinate descent (BCD) method. Based on the convergence proof of the block coordinate descent algorithm [13], this paper provides a convergence theorem for the ML-LoBCoD algorithm and proves that it converges to the global optimum at a rate of O(1/k). The rest of this paper is organized as follows. We begin by reviewing the slice-based CSC model and the slice-based LoBCoD algorithm in Section 2. The convergence theorem and proof for the LoBCoD algorithm are given in Section 3. In Section 4, we propose a slice-based ML-CSC model and a slice-based ML-LoBCoD algorithm.
The convergence theorem and proof for the slice-based ML-LoBCoD algorithm are given in Section 5. In Section 6, experimental results on signal reconstruction and the classification accuracy of the two networks inspired by the two algorithms are presented. Finally, we conclude this work in Section 7.

Slice-Based Convolutional Sparse Coding.
The CSC model assumes that a global signal X ∈ R^N can be decomposed as X = DΓ = Σ_{i=1}^m d_i * Γ_i, where D ∈ R^{N×Nm} is a banded convolutional dictionary that consists of all shifted versions of a local dictionary D_L ∈ R^{n×m}, d_i ∈ R^n are the local filters extracted from D_L, the global sparse vector Γ ∈ R^{Nm} contains the interlaced cascade of all the sparse representations Γ_i ∈ R^N, and Γ_i is the sparse representation corresponding to the local filter d_i. Using the above formula, the BP problem can be expressed as

min_Γ (1/2)‖X − DΓ‖₂² + λ‖Γ‖₁. (1)

The global sparse vector Γ can be decomposed into N nonoverlapping m-dimensional local sparse vectors {α_i}_{i=1}^N, which are called needles [8], i.e., Γ = [α_1, …, α_i, …, α_N].
Thus, the global signal X can be expressed as X = Σ_{i=1}^N P_i^T D_L α_i, where P_i^T ∈ R^{N×n} is the operator that places D_L α_i at the ith position and pads the remaining entries with zeros. Therefore, the BP problem (1) can be expressed as a local problem:

min_{{α_i}} (1/2)‖X − Σ_{i=1}^N P_i^T D_L α_i‖₂² + λ Σ_{i=1}^N ‖α_i‖₁. (2)

Papyan et al. proposed the slice-based local processing idea and defined s_i = D_L α_i as the ith slice. The global signal X can then be rewritten as X = Σ_{i=1}^N P_i^T s_i, and the slice-based BP problem (2) can be expressed as

min_{{s_i},{α_i}} (1/2)‖X − Σ_{i=1}^N P_i^T s_i‖₂² + λ Σ_{i=1}^N ‖α_i‖₁ s.t. s_i = D_L α_i. (3)

Papyan et al. tackled the BP problem (3) using the ADMM algorithm [8], which minimizes the following augmented Lagrangian problem:

min_{{s_i},{α_i}} (1/2)‖X − Σ_{i=1}^N P_i^T s_i‖₂² + Σ_{i=1}^N [λ‖α_i‖₁ + (ρ/2)‖s_i − D_L α_i + u_i‖₂²]. (4)

Here, {u_i}_{i=1}^N denotes the dual variables of the ADMM formulation.
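The decomposition X = Σ_i P_i^T D_L α_i can be illustrated with a small 1-D toy example. The sketch below is illustrative only: the helper names `place_patch` and `reconstruct` are not from the paper, and a dense random local dictionary stands in for a learned D_L.

```python
import numpy as np

def place_patch(patch, i, N):
    """P_i^T: place an n-dimensional patch at position i of a length-N
    signal, padding the remaining entries with zeros (truncated at the
    boundary for simplicity)."""
    x = np.zeros(N)
    end = min(i + len(patch), N)
    x[i:end] = patch[:end - i]
    return x

def reconstruct(D_L, alphas):
    """X = sum_i P_i^T D_L alpha_i: superimpose all slices s_i = D_L alpha_i."""
    N = len(alphas)
    X = np.zeros(N)
    for i, alpha in enumerate(alphas):
        X += place_patch(D_L @ alpha, i, N)
    return X

rng = np.random.default_rng(0)
n, m, N = 5, 8, 32
D_L = rng.standard_normal((n, m))
# sparse needles: each alpha_i has roughly 20% nonzeros
alphas = [rng.standard_normal(m) * (rng.random(m) < 0.2) for _ in range(N)]
X = reconstruct(D_L, alphas)
```

Each needle α_i contributes only one local slice, yet the slices overlap, which is exactly the shift-invariant structure the banded convolutional dictionary encodes.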

Slice-Based Local Block Coordinate Descent Algorithm.
The CSC model parameters are the local sparse vectors α_i and the local dictionary D_L. Assuming D_L is fixed, the slice-based local processing idea and the block coordinate descent method are adopted to update the needles, where f(α_1^k, …, α_{s−1}^k, α_s, α_{s+1}^k, …, α_N^k) denotes the objective function of equation (2). The BCD algorithm [14] is briefly described below.
Initialization: choose any Γ^0 = (α_1^0, …, α_N^0).
For k = 0, 1, …, sweep over the blocks s = 1, …, N:
α_s^{k+1} ∈ argmin_{α_s} f(α_1^{k+1}, …, α_{s−1}^{k+1}, α_s, α_{s+1}^k, …, α_N^k),
until the convergence condition is met.
Output: Γ^{k+1} = (α_1^{k+1}, …, α_N^{k+1}).
In this paper, each needle α_i is treated as a block of coordinates taken from the global vector, and the objective is optimized with respect to each block separately in sequence.
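The block-sweep scheme above can be sketched on a generic block-separable lasso-type objective. This is a minimal illustration, not the paper's implementation: the function names and the dense random blocks `A_blocks` are assumptions, and each block update is realized as a proximal gradient (soft-thresholding) step.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def bcd_lasso(X, A_blocks, lam, t, iters):
    """Block coordinate descent sketch for
    min_{b_1..b_S} 0.5*||X - sum_s A_s b_s||^2 + lam * sum_s ||b_s||_1,
    updating one block at a time while the others are held fixed."""
    S = len(A_blocks)
    b = [np.zeros(A.shape[1]) for A in A_blocks]
    for _ in range(iters):
        for s in range(S):
            # residual without block s's contribution
            R = X - sum(A_blocks[j] @ b[j] for j in range(S) if j != s)
            grad = -A_blocks[s].T @ (R - A_blocks[s] @ b[s])
            b[s] = soft_threshold(b[s] - t * grad, t * lam)
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal(20)
A_blocks = [rng.standard_normal((20, 5)) for _ in range(3)]
# step size below 1/L for every block guarantees per-step descent
t = 1.0 / max(np.linalg.norm(A, 2) ** 2 for A in A_blocks)
b = bcd_lasso(X, A_blocks, lam=0.1, t=t, iters=50)
```

With the step size bounded by the inverse of each block's Lipschitz constant, every block update is non-increasing in the objective, which is the property the convergence analysis below builds on.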
Consequently, the update rule for each needle can be written as

α_i^{k+1} = argmin_{α_i} (1/2)‖X − Σ_{j≠i} P_j^T D_L α_j^k − P_i^T D_L α_i‖₂² + λ‖α_i‖₁. (6)

Equation (6) can be decomposed into a local problem:

α_i^{k+1} = argmin_{α_i} (1/2)‖P_i R_i − D_L α_i‖₂² + λ‖α_i‖₁, (7)

where R_i = X − Σ_{j≠i} P_j^T D_L α_j^k is the residual image without the contribution of needle α_i, and P_i ∈ R^{n×N} is the transpose of P_i^T, representing the operator that extracts the ith n-dimensional patch from X.
The LoBCoD algorithm was proposed to minimize equation (7) [9]. Define f(α_i) = (1/2)‖P_i R_i − D_L α_i‖₂², which is a convex smooth function with gradient ∇f(α_i) = −D_L^T(P_i R_i − D_L α_i). The LoBCoD algorithm can be considered a generalized (proximal) gradient algorithm that applies an update of the form α_i^{k+1} ← prox_{tλ‖·‖₁}(α_i^k − t∇f(α_i^k)). The update rule for each needle α_i can therefore be expressed as

α_i^{k+1} = S_{tλ}(α_i^k + tD_L^T(P_i R_i − D_L α_i^k)),

where S_θ denotes the soft-thresholding operator and t is the step size.
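A single needle update for the local problem (7) can be sketched as below. This is an illustrative fragment under the usual assumptions (the ℓ1 proximal operator is soft-thresholding; `needle_update` and the toy dimensions are not from the paper).

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (the S_theta operator)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def needle_update(alpha_i, patch_Ri, D_L, lam, t):
    """One proximal gradient step on the local needle problem
    0.5*||P_i R_i - D_L alpha_i||^2 + lam*||alpha_i||_1,
    where patch_Ri = P_i R_i is the residual patch computed without
    needle i's contribution."""
    grad = -D_L.T @ (patch_Ri - D_L @ alpha_i)   # gradient of the smooth part
    return soft_threshold(alpha_i - t * grad, t * lam)

rng = np.random.default_rng(0)
D_L = rng.standard_normal((5, 8))
alpha = rng.standard_normal(8)
patch = rng.standard_normal(5)
t = 1.0 / np.linalg.norm(D_L, 2) ** 2            # step size 1/L
alpha_new = needle_update(alpha, patch, D_L, lam=0.1, t=t)
```

One such step never increases the local objective when t ≤ 1/‖D_L‖₂², which is the descent property used in the convergence proof of Section 3.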

Convergence of Slice-Based Local Block Coordinate Descent Algorithm
The convergence theorem for the LoBCoD algorithm will now be stated and its proof given.

Lemma 1 (fundamental proximal gradient inequality).
Assume that f is a convex smooth function whose gradient is Lipschitz continuous with constant L, and that g is a convex function. For the prox-grad operator T_L(β) = prox_{(1/L)g}(β − (1/L)∇f(β)) and F = f + g, it holds for any α and β that

F(α) − F(T_L(β)) ≥ (L/2)‖α − T_L(β)‖₂² − (L/2)‖α − β‖₂².

Theorem 1 (convergence of LoBCoD). Given a signal X and a local dictionary D_L, the slice-based LoBCoD algorithm is guaranteed to converge to the optimal solution at a rate of O(1/k).

Proof.
The optimization problem of the CSC model can be represented as a general minimization model as follows:

min_{α_i} F(α_i) = f(α_i) + g(α_i), (12)

where f(α_i) = (1/2)‖P_i R_i − D_L α_i‖₂² and g(α_i) = λ‖α_i‖₁. Let X* denote the nonempty optimal solution set of problem (12), with optimal objective value F(α_i*). According to the proximal gradient method, the general update step of α_i can be written as α_i^{n+1} = prox_{(1/L)g}(α_i^n − (1/L)∇f(α_i^n)), where L is the Lipschitz constant of ∇f; this is exactly the needle update of the LoBCoD algorithm with step size t = 1/L. Exploiting the fundamental proximal gradient inequality with α = α_i* and β = α_i^n gives

F(α_i*) − F(α_i^{n+1}) ≥ (L/2)‖α_i* − α_i^{n+1}‖₂² − (L/2)‖α_i* − α_i^n‖₂².

When all of these inequalities are added together for n = 0, …, k − 1, the right-hand side telescopes and the following result is obtained:

Σ_{n=0}^{k−1} (F(α_i^{n+1}) − F(α_i*)) ≤ (L/2)‖α_i^0 − α_i*‖₂² − (L/2)‖α_i^k − α_i*‖₂² ≤ (L/2)‖α_i^0 − α_i*‖₂².

Since the sequence {F(α_i^n)} is nonincreasing, k(F(α_i^k) − F(α_i*)) ≤ Σ_{n=0}^{k−1} (F(α_i^{n+1}) − F(α_i*)). Thus, we obtain

F(α_i^k) − F(α_i*) ≤ L‖α_i^0 − α_i*‖₂²/(2k).

Finally, since the sparse vector Γ can be decomposed into the nonoverlapping m-dimensional needles {α_i}_{i=1}^N and each α_i converges, Γ converges as well, with the same rate constant as the α_i. Thus, the LoBCoD algorithm converges to the global optimum at a rate of O(1/k). □
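The O(1/k) bound can be checked numerically on a small instance. The sketch below is illustrative (a plain lasso objective and dense random dictionary stand in for the needle subproblem): it runs the proximal gradient update and records the optimality gap against the bound L‖α⁰ − α*‖²/(2k), with the optimum itself estimated by running many more iterations.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
D = rng.standard_normal((30, 10))
x = rng.standard_normal(30)
lam = 0.1
L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient

def F(a):
    return 0.5 * np.linalg.norm(x - D @ a) ** 2 + lam * np.abs(a).sum()

# run the proximal gradient update to near-optimality to estimate F(alpha*)
a_star = np.zeros(10)
for _ in range(5000):
    a_star = soft_threshold(a_star + (1 / L) * D.T @ (x - D @ a_star), lam / L)
F_star = F(a_star)

# record the optimality gap and the theoretical O(1/k) bound for each k
gaps, bounds = [], []
a = np.zeros(10)
a0_dist = np.linalg.norm(a - a_star) ** 2
for k in range(1, 200):
    a = soft_threshold(a + (1 / L) * D.T @ (x - D @ a), lam / L)
    gaps.append(F(a) - F_star)
    bounds.append(L * a0_dist / (2 * k))
holds = all(g <= b + 1e-6 for g, b in zip(gaps, bounds))
```

On this instance the recorded gap stays below the theoretical bound for every iteration, consistent with Theorem 1.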

Slice-Based Multilayer Local Block Coordinate Descent Algorithm

Slice-Based Multilayer Convolutional Sparse Coding.
The ML-CSC model is a deep extension of the CSC model: it assumes that the signal can be represented by sparse representations of nested convolutional filters at different layers. The ML-CSC model assumes that for D_1 ∈ R^{N×Nm_1}, Γ_1 is the corresponding sparse representation and a global signal X can be expressed as X = D_1Γ_1. This model can be cascaded by imposing a similar assumption on Γ_1, i.e., Γ_1 = D_2Γ_2, for a convolutional dictionary D_2 ∈ R^{Nm_1×Nm_2} and corresponding sparse representation Γ_2 ∈ R^{Nm_2}, and so on for J layers, with ‖Γ_j‖_{0,∞} ≤ λ_j, where the ℓ_{0,∞} norm is defined as the maximal number of nonzeros in a patch of the vector [10]. Here, Γ_j is the sparse representation of the jth layer, P_{j,i}^T is the operator that places the ith patch into the jth-layer representation, D_{L,j} is the jth-layer local dictionary, α_{j,i} is the ith needle of the jth layer, and λ_j is a hyperparameter. The proposed slice-based multilayer basis pursuit (ML-BP) problem can be expressed as

min_{{α_{j,i}}} (1/2)‖X − D_1Γ_1‖₂² + Σ_{j=1}^J λ_j Σ_{i=1}^N ‖α_{j,i}‖₁, where Γ_{j−1} = Σ_{i=1}^N P_{j,i}^T D_{L,j} α_{j,i} and Γ_0 = X. (21)

The jth layer of the BP problem can be expressed as

min_{{α_{j,i}}} (1/2)‖Γ_{j−1} − Σ_{i=1}^N P_{j,i}^T D_{L,j} α_{j,i}‖₂² + λ_j Σ_{i=1}^N ‖α_{j,i}‖₁. (22)
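The nesting X = D₁Γ₁ with Γ₁ = D₂Γ₂ can be illustrated with a toy two-layer construction. Dense random matrices stand in for the convolutional dictionaries, and, for simplicity, this sketch enforces sparsity only on the deepest code Γ₂ (in the full ML-CSC model each Γ_j must additionally satisfy its own ℓ_{0,∞} constraint).

```python
import numpy as np

rng = np.random.default_rng(0)
N, m1, m2 = 16, 32, 24
# dense stand-ins for the convolutional dictionaries D_1, D_2
D1 = rng.standard_normal((N, m1))
D2 = rng.standard_normal((m1, m2))

# deepest sparse code Gamma_2 (~15% nonzeros); the intermediate code is
# Gamma_1 = D2 @ Gamma_2, and the signal is X = D1 @ Gamma_1
Gamma2 = rng.standard_normal(m2) * (rng.random(m2) < 0.15)
Gamma1 = D2 @ Gamma2
X = D1 @ Gamma1          # equivalently X = (D1 @ D2) @ Gamma2
```

The last comment makes the key point explicit: the multilayer model is equivalent to a single effective dictionary D₁D₂ acting on the deepest code, which is what the multilayer pursuit exploits.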

Slice-Based Multilayer Local Block Coordinate Descent
Algorithm. Based on the LoBCoD algorithm, this paper extends LoBCoD to a multilayer algorithm to solve the ML-BP problem of equation (21). ML-LoBCoD uses the slice-based local processing idea to update the needles. However, rather than optimizing with respect to all needles at the same time, we treat each needle α_{j,i} as a coordinate block and optimize with respect to each block separately in sequence. Therefore, the jth layer of the BP problem can be expressed as

α_{j,i}^{k+1} = argmin_{α_{j,i}} (1/2)‖Γ_{j−1} − Σ_{l≠i} P_{j,l}^T D_{L,j} α_{j,l}^k − P_{j,i}^T D_{L,j} α_{j,i}‖₂² + λ_j‖α_{j,i}‖₁. (24)

The jth layer of the ML-BP problem of equation (24) becomes equivalent to solving the following local minimization problem:

α_{j,i}^{k+1} = argmin_{α_{j,i}} (1/2)‖P_{j,i} R_{j,i} − D_{L,j} α_{j,i}‖₂² + λ_j‖α_{j,i}‖₁,

where R_{j,i} = Γ_{j−1} − Σ_{l≠i} P_{j,l}^T D_{L,j} α_{j,l}^k. We define f(α_{j,i}) = (1/2)‖P_{j,i}R_{j,i} − D_{L,j}α_{j,i}‖₂², which is a convex smooth function with gradient ∇f(α_{j,i}) = −D_{L,j}^T(P_{j,i}R_{j,i} − D_{L,j}α_{j,i}). The ML-LoBCoD algorithm can also be considered a generalized gradient algorithm that applies an update of the form α_{j,i}^{k+1} ← prox_{tλ_j‖·‖₁}(α_{j,i}^k − t∇f(α_{j,i}^k)). The update rule for each needle α_{j,i} can thus be expressed as

α_{j,i}^{k+1} = S_{tλ_j}(α_{j,i}^k + tD_{L,j}^T(P_{j,i}R_{j,i} − D_{L,j}α_{j,i}^k)).

This process is repeated until convergence. The natural question that arises is whether the slice-based ML-LoBCoD algorithm is guaranteed to converge to the global minimum of the ML-BP problem; the answer is positive, as will be discussed in Section 5.
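The layer-wise sweep over needles described above can be sketched in one dimension. This is an illustrative toy under simplifying assumptions: needles are restricted to fully interior positions ("valid" placement), a dense random matrix stands in for D_{L,j}, and the function names are not from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def layer_pursuit(signal, D_L, lam, t, sweeps):
    """One layer of the multilayer pursuit: block coordinate descent over
    needles alpha_{j,i}, each updated by a proximal gradient step on its
    local residual patch while all other needles are held fixed."""
    n, m = D_L.shape
    N = len(signal)
    P = N - n + 1                          # interior needle positions only
    alphas = [np.zeros(m) for _ in range(P)]
    for _ in range(sweeps):
        for i in range(P):
            # residual without needle i's contribution
            R = signal.copy()
            for j in range(P):
                if j != i:
                    R[j:j + n] -= D_L @ alphas[j]
            patch = R[i:i + n]             # P_{j,i} R_{j,i}
            grad = -D_L.T @ (patch - D_L @ alphas[i])
            alphas[i] = soft_threshold(alphas[i] - t * grad, t * lam)
    return alphas

rng = np.random.default_rng(2)
D_L = rng.standard_normal((4, 6))
signal = rng.standard_normal(12)
t = 1.0 / np.linalg.norm(D_L, 2) ** 2      # step size 1/L
alphas = layer_pursuit(signal, D_L, lam=0.1, t=t, sweeps=10)
```

Running this routine layer by layer, feeding each layer's code (as the next layer's "signal") into the next pursuit, gives the multilayer scheme; here only a single layer is shown.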

Convergence of Slice-Based ML-LoBCoD
In this section, we state the convergence theorem and provide a proof of convergence for the slice-based ML-LoBCoD algorithm.

Theorem 2 (convergence of ML-LoBCoD).
Given a signal X and local dictionaries {D_{L,j}}_{j=1}^J, the slice-based ML-LoBCoD algorithm will converge with convergence rate O(1/k).

Proof.
The definition of nonexpansive operators [15] is employed; iterations of such operators are guaranteed to converge to their fixed points. First, an internal operator T_in (the layer-wise proximal gradient update) and an external operator T_out (one full pass over all layers) are defined. An operator T is nonexpansive if it is Lipschitz continuous with constant 1, i.e., if ‖Tα − Tβ‖₂ ≤ ‖α − β‖₂. In addition, if an operator is firmly nonexpansive, then it is nonexpansive. A firmly nonexpansive operator must satisfy ‖Tα − Tβ‖₂² ≤ ⟨Tα − Tβ, α − β⟩. The proximal operator is firmly nonexpansive. The internal operator is a composition of the proximal operator with a gradient step whose step size s is chosen such that s ≤ 1/λ_max(D₁ᵀD₁); under this choice the gradient step is itself firmly nonexpansive. Thus, T_in is nonexpansive.
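The claim that the proximal operator is firmly nonexpansive can be checked numerically for the ℓ1 proximal operator (soft-thresholding); the sketch below uses illustrative names and random test points.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
firm_ok = True
for _ in range(1000):
    a, b = rng.standard_normal(8), rng.standard_normal(8)
    Ta, Tb = soft_threshold(a, 0.5), soft_threshold(b, 0.5)
    d = Ta - Tb
    # firm nonexpansiveness: ||Ta - Tb||^2 <= <Ta - Tb, a - b>,
    # which implies nonexpansiveness ||Ta - Tb|| <= ||a - b||
    firm_ok &= bool(np.dot(d, d) <= np.dot(d, a - b) + 1e-12)
```

Every random pair satisfies the firm-nonexpansiveness inequality, consistent with the general theory of proximal operators invoked in the proof.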
Similarly, it can be proved that T_out is also a nonexpansive operator, provided the constant t/μ is positive and satisfies the corresponding step-size bound. Therefore, each ML-LoBCoD iteration is a composition of nonexpansive operators, and the iterates converge to a fixed point.
Next, we analyze the fixed point of the ML-LoBCoD algorithm, defined by α*_{j+1} = T_out α*_{j+1}. Using the second proximal theorem [16] and assuming that there is w_{j+1} ∈ ∂g(α*_{j+1}), the fixed-point equation can be rewritten in terms of subgradients; applying the second proximal theorem once more yields w_j ∈ ∂g_j(T_in D_{L,j+1} α*_{j+1}). Combining these relations shows that the fixed point satisfies the optimality condition of the convex objective, so the algorithm converges to a minimizer of the ML-BP problem.
Next, the nonexpansive operator T_out is used to analyze the convergence rate of the algorithm. Applying the fixed-point inequality at each iteration and summing the resulting inequalities for i = 1, …, k − 1 yields, after simplification, F(α_j^k) − F(α_j*) ≤ O(1/k). The jth-layer needles α_{j,i} therefore converge, i.e., Γ_j converges. Since the convolutional sparse code Γ is an interleaved cascade of all the Γ_j, Γ also converges, with the same rate constant as the α_{j,i}. Thus, the ML-LoBCoD algorithm converges to the global optimum at a rate of O(1/k). □

Experiment and Discussion

Methods.
In this section, the LoBCoD and ML-LoBCoD algorithms for image reconstruction are described. They are inspired by ML-ISTA [11,17]. We constructed a one-layer CSC model, and the LoBCoD algorithm was iteratively unfolded into a one-layer convolutional recurrent neural network. We also constructed a three-layer CSC model, and the ML-LoBCoD algorithm was iteratively unfolded into a multilayer convolutional recurrent neural network. ML-LoBCoD-NET iterates 3 times to form a convolutional recurrent structure, because the algorithm requires multiple iterations to reach its best performance. In the forward pass, images are input to the ML-LoBCoD network to obtain the encoding, which is then classified by the classification layer. In the backward pass, the total loss is minimized to update the convolutional dictionaries via backpropagation. The loss function is the negative log-likelihood, which measures the classification error between the ground-truth labels and the labels predicted by the network.
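The idea of unfolding an iterative pursuit into a K-step recurrent forward pass can be sketched as below. This is a simplified ISTA-style unrolling on the effective multilayer dictionary, not the exact ML-LoBCoD-NET: dense matrices stand in for the convolutional dictionaries, there is no classification layer, and all names are illustrative.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unfolded_forward(x, dicts, lam, t, K=3):
    """Unroll K iterations of a proximal gradient pursuit into a fixed-depth
    recurrent forward pass; in a trained network the dictionaries would be
    updated by backpropagation through these K steps."""
    D_eff = dicts[0]
    for D in dicts[1:]:
        D_eff = D_eff @ D                  # effective multilayer dictionary
    code = np.zeros(D_eff.shape[1])
    for _ in range(K):                     # K unrolled iterations
        code = soft_threshold(code + t * D_eff.T @ (x - D_eff @ code), t * lam)
    return code

rng = np.random.default_rng(0)
dicts = [rng.standard_normal((28, 20)), rng.standard_normal((20, 15))]
x = rng.standard_normal(28)
L = np.linalg.norm(dicts[0] @ dicts[1], 2) ** 2
code = unfolded_forward(x, dicts, lam=0.1, t=1.0 / L, K=3)
```

The choice K = 3 mirrors the three recurrent iterations of ML-LoBCoD-NET described above: a small fixed depth trades pursuit accuracy for a network that can be trained end to end.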

Experimental Results.
In this section, we perform experiments on the MNIST dataset using both the LoBCoD and ML-LoBCoD models for image reconstruction. We constructed a one-layer CSC model and a three-layer CSC model. The one-layer CSC model contains 64 local filters of size 6 × 6. In the three-layer CSC model, the first convolutional layer includes 64 local filters of size 6 × 6, the second convolutional layer contains 128 filters of size 6 × 6, and the third convolutional layer contains 512 filters of size 4 × 4. These parameters match those of a traditional CNN, so the number of network parameters remains unchanged. MNIST was selected because it is among the most popular and most frequently used image datasets and serves as an entry-level benchmark in the deep learning community.
Experiments on MNIST using the proposed networks were run on the PyTorch platform under Linux on a single computer with an Nvidia GeForce 1080Ti GPU. The code supporting the findings of this paper is inspired by the ML-ISTA code [11] on GitHub and is available from the corresponding author upon request. The loss function values of LoBCoD and ML-LoBCoD versus the number of iterations on the MNIST dataset are shown in Figure 1. It is evident that the loss function values of ML-LoBCoD are lower than those of LoBCoD. The original test images and the images reconstructed by the LoBCoD and ML-LoBCoD methods after 100 iterations are shown in Figures 2 and 3, respectively. It can be clearly seen that the reconstruction quality of ML-LoBCoD is better than that of LoBCoD. The loss function value, training time, and peak signal-to-noise ratio (PSNR) of the two networks after 100 iterations are shown in Table 1. The loss function value of the ML-LoBCoD network is smaller than that of the LoBCoD network, while its PSNR value is higher: the loss function value of the ML-LoBCoD network is 3.03 × 10⁻⁶ and its PSNR is 20.15 dB. However, the ML-LoBCoD network requires a longer training time and has more model parameters than the LoBCoD network. The model parameters of the two networks are shown in Table 2; those of ML-ISTA and the corresponding CNN are given in [11]. It can be observed that ML-ISTA, ML-LoBCoD, and the corresponding CNN have similar numbers of parameters, more than the LoBCoD network. The classification accuracy results of the four models are shown in Table 3. ML-LoBCoD achieves the highest accuracy, far exceeding LoBCoD. These experimental results demonstrate the superiority of the proposed method.

Conclusion
In this paper, a convergence theorem for the LoBCoD algorithm was presented, proving that the method converges to the global optimum at a rate of O(1/k). Inspired by ML-CSC and the slice-based local processing idea, a slice-based ML-CSC model was proposed. Motivated by ML-ISTA and the LoBCoD algorithm, an ML-LoBCoD algorithm was proposed, and it was established that this method is guaranteed to converge to the global optimum of the ML-BP problem at a rate of O(1/k). The experiments compare the loss function values of the two networks and show that the reconstruction quality of the ML-LoBCoD network is better than that of the LoBCoD network. However, this paper only studies the pursuit problem of the ML-CSC model; the dictionary learning problem of CSC requires further study.
This paper is a preliminary study of the ML-LoBCoD network, and further work is needed to obtain a more complete analysis, which will likely contribute to a deeper understanding of deep learning based on optimization-driven algorithms.

Data Availability
The data used to support the findings of this study are available from the website http://yann.lecun.com/exdb/mnist/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.