ALTERNATING ALGORITHMS FOR TOTAL VARIATION IMAGE RECONSTRUCTION FROM RANDOM PROJECTIONS

. Total variation (TV) regularization is popular in image reconstruction due to its edge-preserving property. In this paper, we extend the alternating minimization algorithm recently proposed in [37] to the case of recovering images from random projections. Speciﬁcally, we propose to solve the TV regularized least squares problem by alternating minimization algorithms based on the classical quadratic penalty technique and alternating minimization of the augmented Lagrangian function. The per-iteration cost of the proposed algorithms is dominated by two matrix-vector multiplications and two fast Fourier transforms. Convergence results, including ﬁnite convergence of certain variables and q -linear convergence rate, are established for the quadratic penalty method. Furthermore, we compare numerically the new algorithms with some state-of-the-art algorithms. Our experimental results indicate that the new algorithms are stable, eﬃcient and competitive with the compared ones.

1. Introduction.Variational image reconstruction plays an important role in various applications, e.g., medical imaging, astronomical imaging, and transform coding.Let ū ∈ R n 2 be an original n × n image, A ∈ R m×n 2 be a linear operator, and f ∈ R m be an observation which satisfies the relationship where N(•) represents a noise corruption procedure.Given A, image reconstruction extracts ū from f , which is generally ill-posed because the system could be either under-determined (m < n 2 ) as in compressive sensing or severely ill-conditioned as in deconvolution.Even when the problem is overdetermined (m > n 2 ), f could suffer from outliers.Therefore, the classical least squares estimation alone does not suffice for a faithful reconstruction.To stabilize recovery, regularization technique is frequently used, resulting a general reconstruction model of the form (2) min u Φ reg (u) + µΦ fid (Au − f ), where Φ reg (•) promotes solution regularity such as smoothness or sparseness, Φ fid (•) fits the observed data by penalizing the difference between Au and f , and µ > 0 balances the two terms for minimization.The choice of Φ fid (•) depends on the type of noise, e.g., the squared 2 -norm is usually used for additive Gaussian noise, while the 1 -norm is preferred for certain non-Gaussian noise, e.g., salt-and-pepper noise.
Throughout this paper, we assume that N(•) represents an additive Gaussian noise contamination and set Φ fid (•) = • 2 2 .Among others, the total variation (TV) regularization has been popular ever since its introduction by Rudin, Osher and Fatemi [31].The remarkable property of TV is that it preserves edges due to the linear penalty on differences between adjacent pixels.In this paper, we propose two efficient algorithms for solving the following variational model (3) min where D i ∈ R 2×n 2 is a local finite difference operator (with certain boundary conditions) at pixel i, D i u ∈ R 2 denotes the discrete gradient of u at pixel i, i.e., the finite difference of u at the ith pixel in horizontal and vertical directions, the sum i=1 D i u 2 is a discretization of the TV of u, and A ∈ R m×n 2 .Our concentration is the case that A is generic linear operator, which has potential applications in compressive sensing [9,17], see, e.g., the single pixel camera architecture described in [18].
Since the pioneering work [31], the study of numerical algorithms for solving TV models such as (3) has been an important subject in variational image processing, see e.g.[10,12] and references therein.The advantage of TV regularization is that it exploits image blocky structures and preserves edges.The superiority of TV over Tikhonov regularization [34] is analyzed in [1,15] for recovering images containing piecewise smooth objects.Despite these advantages, it is generally challenging to solve TV models efficiently in practice because imaging problems are usually large scale, ill-conditioned, and moreover the TV is non-smooth.Here we review some existing approaches for solving TV models.
Early research on numerical algorithms for TV models was concentrated on smoothed TV models, where the TV is approximated by 2 + with > 0 small.As such, ordinary optimization methods for smooth function minimization can be applied, e.g., the time-marching scheme used in the pioneering work [31], the linearized gradient method proposed in [35,36].Another class of algorithms for TV problems are those based upon the iterative shrinkage/thresholding (IST) operator, see e.g.[29,21,33].Recently, a two-step IST algorithm was proposed in [7] by taking advantage of the iterative information.At each iteration of IST-based algorithms, a TV denoising problem needs to be solved, either exactly or approximately.More recent approaches for solving TV models are based on appropriate splitting of the TV norm.A novel way of splitting the TV norm was proposed in [37], where the authors utilized the quadratic penalty technique to derive an efficient alternating minimization algorithm.This splitting of TV allows the use of a multidimensional shrinkage operator and fast Fourier transform for deconvolution problem.By applying the classical augmented Lagrangian method [27,30], Goldstein and Osher [24] derived the algorithm based on the Bregman distance [8].Lately, the classical alternating direction method has been applied to a set of imaging problems, see e.g.[19,32,2,3,13,39].
We note that the time-marching scheme [31] suffers from slow convergence, while the IST related algorithms need to solve a TV denoising problem at each iteration which requires its own iterations.Although the recent advances, the efficiency of the approaches in [37,32] depends heavily on the problem structure.At each iteration of these algorithms, a large system of linear equations needs to be solved exactly and efficiently, which is possible only when the matrix A preserves certain structure, e.g., A is a block-circulant matrix for deconvolution problems.However, when A is a generic linear operator, the solution of a large linear system at each iteration is costly.In this paper, we propose efficient alternating minimization algorithms for solving (3) where A is a generic operator and does not have structure to be explored (see also [11,20]).Note that the case with generic A has important applications in the emerging methodology of compressive sensing.For example, it was shown in [9] that TV regularization is able to recover piecewise constant images exactly from very few Fourier measurements.Exact recoverability of sparse and piecewise constant (sparse gradient) signals is also attainable for a large class of random matrices with less rows than columns, see e.g., [41].Another popular application is the TV-regularized CT-image reconstruction, where A is a partial Radon transform matrix which depends on the location and direction of each beamlet [16].
The main contribution of this paper is to present two efficient alternating minimization algorithms for solving (3), one is based on the classical quadratic penalty method, and the other is based on the alternating minimization of the augmented Lagrangian function (which is known as alternating direction method or ADM, [22]).For the first algorithm, we establish finite convergence of certain variables and qlinear convergence rate under reasonable assumptions.For the second algorithm, we show that its convergence can be guaranteed by results in the literature.We also compare these new algorithms with some state-of-the-art algorithms numerically.
In the rest of this paper, the inner product of two vectors will be denoted by u, v .Without misleading, we let i=1 .Additional notation will be introduced when it occurs.
2. An alternating minimization algorithm.The task of this section is to develop an alternating minimization algorithm for solving (3) based on the classical quadratic penalty method.
By introducing auxiliary variables w i ∈ R 2 for each i and letting w = (w 1 , . . ., w n 2 ) ∈ R 2×n 2 , we can easily reformulate (3) into (4) min u,w i To deal with constraints, we apply the quadratic penalty method and approximate (4) by (5) min u,w i where β > 0 is a sufficiently large penalty parameter.This splitting was first introduced for deconvolution problem in [37], and then was extended to deal with cross-channel image deconvolution in [38] and impulsive noise elimination based on a TVL1 model in [40].
2.1.Alternating minimization.The advantage of ( 5) is that it permits alternating minimization.It is easy to see that, for fixed u = u k , the minimization of (5) with respect to w reduces to the following two-dimensional problems (6) min for which the unique minimizers are given by the two-dimensional shrinkage formulas (7) where the convention 0 • (0/0) = 0 is followed.On the other hand, for fixed w = w k+1 , the minimization of ( 5) with respect to u is a least squares problem of the form (8) min u i The normal equations of (8) are given by (9) i It is well-known that, under the periodic boundary condition for u, i D i D i is a block-circulant matrix and can be diagonalized by the two-dimensional Fourier matrix.However, the matrix A A does not have circulant structure for general random matrix A. Therefore, the exact solution of ( 9) is costly and should be avoided, which will be addressed in the following.
To avoid solving the linear system (9), similar with the approach in [25,14], at each iteration we linearize Au − f 2 at the current point u k and add a proximal term, resulting the following approximation problem to (8): where and τ > 0 is a parameter.Simple manipulations show that ( 10) is equivalent to (11) min The normal equations of ( 11) are given by (12) i Under the periodic boundary conditions for u, the coefficient matrix in ( 12) can be diagonalized by the two dimensional Fourier matrix.Consequently, the solution of ( 12) can be accomplished by two fast Fourier transforms or FFTs (including one inverse FFT).To sum up, our alternating minimization algorithm for solving the approximation problem ( 5) is as follows.

Inverse Problems and Imaging
Volume 6, No. 3 (2012), 547-563 The implementation details of Algorithm 2.1, such as choices of parameters and stopping criterion, will be addressed in Section 4. In the following we analyze its convergence properties.
2.2.Convergence analysis.The purpose of this subsection is to establish convergence properties of Algorithm 2.1 for a fixed β > 0. For convenience, we first define some notation.
We let the corresponding global finite difference operators in horizontal and vertical directions be D (1) and D (2) , respectively, which are n 2 ×n 2 matrices.For simplification, we let D = ((D (1) ) , (D (2) ) ) ∈ R 2n 2 ×n 2 , and thus i D i D i = D D. For j = 1, 2, we let w j be the the j-th row of w = [w 1 , . . ., Since w and w are the same set of variables with different ordering, in the following we use either w or w subject to convenience.For fixed β > 0, the 2-dimensional shrinkage operator s : R 2 → R 2 is defined as where P B (•) : R 2 → R 2 is the projection onto the closed ball B {α ∈ R 2 : α ≤ 1/β}, and the convention 0 i.e., S applies the 2-dimensional shrinkage to each pair (u i , v i ) , for i = 1, 2, . . ., N .
From the definition of s(•), it is easy to see that ( 7) can be rewritten as w k+1 i = s(D i u k ).The following result shows that the operator s is non-expansive.
The proof of Lemma 2.1 can be found, e.g., in [37].To establish the convergence of Algorithm 2.1, we need the following technical assumption.Since the objective function in ( 5) is convex, bounded below, and coercive (i.e., it goes to infinity as (w, u) → ∞), it has at least one minimizer (w * , u * ) which must satisfy (14) w * = S(D (1) . By using the shrinkage operator, we can rewrite the iteration of Algorithm 2.1 as (15) w k+1 = S(D (1) In the following, we show that (15) converges to (14).For the convenience of analysis, we define Assumption 2.1 ensures the non-singularity of M , while H −1 is always well defined.Simple manipulation shows that H − M = η 2 T with η µ βτ .With these definitions, ( 14) and ( 15) can be, respectively, simplified as ( 16) To further simplify the above equations, we define Hence, the solution and iteration systems in ( 16) and ( 17) can be, respectively, rewritten as ( 18) where "•" denotes operator composition.Furthermore, we let Then ( 18) and ( 19) become Hu k+1 = D w k+1 + ηv k+1 .Now we present an important lemma that will be used for convergence analysis of Algorithm 2.1.Its proof is given in Appendix A. Lemma 2.2.q(w; v) is non-expansive, i.e., for any (w 1 ; v 1 ), (w and equality holds in (20) if and only if Corollary 1. Suppose (w * ; v * ) is a fixed point of q, i.e., (w * ; v * ) = q(w * ; v * ).Then for any (w; v) it holds Using exactly the same arguments as [37, Theorem 3.4], we can prove the following theorem.Due to this similarity, we omit the proof.
For each i, we let Furthermore, we let It is theoretically interesting that the iterates produced by Algorithm 2.1 will converge in a finite number of steps for certain variables, though this behavior may not be easily observable in practical experiments.In fact, we have the following finite convergence result for the variables {w i : i ∈ L}.
Theorem 2.4.Suppose {(w k , u k )} generated by Algorithm 2.1 converges to (w * , u * ).Then, we have w k i ≡ w * i = 0, ∀ i ∈ L, after a finite number of iterations.Proof.From Lemma 2.1, for each i, it holds Suppose that at iteration k there exist at least one index i ∈ L such that Combining ( 22) with (24), we obtain where the second "≤" comes from the non-expansiveness of which can be easily derived.Therefore, the number of iterations k with which completes the proof.
Theorem 2.5.Under the conditions of Theorem 2.3, the sequence {u k } generated by Algorithm 2.1 converges to u * q-linearly.
Proof.From the iteration formulae for u and (w; v), there holds (25) where R = (D; ηT )H −1 (D; ηI) .Considering the finite convergence of w i , i ∈ L, we have where (R R) EE is a sub-matrix of R R ∈ R 3n 2 ×3n 2 formed by deleting certain rows and corresponding columns with indexes in ∪ i∈L {i, i + n 2 }.Multiplying (D; ηT ) on both sides of (25), we get , where the first "≤" is due to the finite convergence of w i , i ∈ L, and the second one is because (26).This implies that {u k } converges q-linearly.
3. An inexact alternating direction method.It is well known that problem (5) well approximates (3) only when β is sufficiently large.In practice, it is generally difficult to determine theoretically how large the value of β should be in order to attain a given accuracy.In the following, we present an inexact ADM, which converges to a solution of (3) with any fixed β > 0. For this purpose, we consider the augmented Lagrangian function of (4): where, for each i, λ i ∈ R 2 is the Lagrangian multiplier attached to w i = D i u.The classical ADM approach [22,23] is a variant of the augmented Lagrangian method [27,30] for convex optimization problems with separable structures such as (4).Given (u k , λ k ), the ADM iterates as follows The advantages of the ADM framework (28) are: i) it exploits the separable structure of the objective function in (4); and ii) the value of β can be fixed to ensure the convergence of ADM.It is easy to see that the w-subproblem in the ADM framework ( 28) is equivalent to the solutions of which are given by ( 29) However, the u-subproblem in the ADM framework ( 28) is expensive to solve due to the same reason as for (9).We adopt the same linearization technique and approximate the u-subproblem by min where g k = A (Au k − f ).It is easy to show that the normal equations of (30) are of the form Under the periodic boundary conditions, the exact solution u k+1 of (31) can be attained by two FFTs.Finally, λ is updated as in (28), or equivalently To sum up, we have the following inexact ADM algorithm for solving (4).
3) Update λ k via (32) and set k = k + 1. End Do Now, we discuss the convergence of the proposed inexact ADM.From the above discussions, we see that the only difference between Algorithm 3.1 and the ADM (28) lies in the solution of the u-subproblem.To avoid the prohibitive cost on solving a large linear system at each iteration, we adopted the linearization and proximal techniques.It is easy to show that the approximation problem (30) is equivalent to (33) min where S = 1 τ I − A A and v 2 S = v Sv.Therefore, Algorithm 3.1 is actually a proximal inexact ADM, in which one subproblem is solved exactly, while the other is solved approximately using the proximal formulation (33).Note that a more general proximal inexact ADM has been studied in [26] in the context of monotone variational inequality, where both subproblems are allowed to be solved approximately by using proximal method like (33).Therefore, the convergence of Algorithm 3.1 can be guaranteed by the results in [26] provided that S 0 is satisfied.
From [26,Theorem 4], we have the following convergence result for Algorithm 3.1.
4. Numerical results.In this section, we report numerical results to illustrate the performance of the proposed algorithms.For convenience, the proposed algorithms are named as Fast Total Variation image reconstruction from Compressive Sensing measurements or FTVCS.Specifically, we refer to Algorithms 2.1 and 3.1 as FTVCS-v1 and FTVCS-v2, respectively.We performed two classes of numerical experiments.In the first class, we compare the performance of FTVCS-v1 and FTVCS-v2, while in the second class we compare FTVCS-v2 with several state-ofthe-art algorithms to illustrate its efficiency and stability.All experiments were performed under Windows XP Professional Service Pack 3 and MATLAB v7.8 (R2009a) running on a PC with an Intel Core 2 Quad CPU at 2.83 GHz and 3 GB of memory.

4.1.
Test on FTVCS-v1 and FTVCS-v2.In this experiment, we used a random matrix A with independent identically distributed Gaussian entries and tested the Shepp-Logan phantom image, which has been widely used in simulations for TV models.Due to storage limitation, we tested the image size 64×64 and selected 30% samples uniformly at random.Besides, we added Gaussian noise of mean zero and standard deviation σ = 0.001.Similar as in FTVd [37], we implemented FTVCS-v1 with a continuation scheme on β to speed up convergence.Specifically, we tested the β-sequence {2 4 , 2 5 , 2 6 , 2 7 } and used the warm-start technique.In FTVCS-v2, the value of β was fixed to be 8, which was determined based on experimental results.The algorithmic parameter τ was set to be 1/λ max (A A) for both algorithms, and the model parameter µ was set to be 200.Both algorithms were terminated when the relative change between successive iterates fall below 10 −3 , i.e., (34) We measured the quality of reconstruction by relative error (RE) to the original image ū, i.e., RE = u − ū / ū × 100%, where ū and u are, respectively, the original and the recovered images.The reconstruction results from both algorithms are presented in Figure 1.
It can be seen from Figure 1 that both algorithms produced faithful recovery results.We note that, for the Phantom image, the length of edges grows linearly while the number of pixels grows quadratically as n increases.Therefore, the sparsity ratio (defined by spr = #{i : D i u = 0}/n 2 ) of the Phantom image becomes bigger for smaller n.Specifically, for the tested 64 × 64 Phantom image, we have spr = 12.26%, while the sample ratio is only 30%, which together with the presence of Gaussian noise explain why the recovered image by FTVCS-v2 has relative error as large as 5.68%.To closely examine the convergence behavior of both algorithms, we present in Figure 2 the decreasing of objective function values and relative errors as the iterations proceed (since both algorithms have the same per-iteration cost).These results indicate that Algorithm 3.1 performs better than the quadratic penalty Algorithm 2.1 in the sense that it obtains better recovery results (smaller function value and relative error) within less iterations.We also compared the two algorithms by computing on larger images with partial discrete cosine transform and observed consistent results.Thus, in the following we only compare the FTVCS-v2 with other state-of-the-art algorithms.

4.2.
Comparisons with TwIST, MFISTA, and TVAL3.In this subsection, we present numerical results to compare FTVCS-v2 with TwIST [7], MFISTA [5], and TVAL3 [28], all of which are developed in the recent years for solving linear inverse problems with TV regularization.In the comparison, we used partial discrete cosine transform (DCT) matrix as encoding matrix, i.e., the m rows of A were chosen uniformly at random from the n 2 × n 2 DCT matrix.Since the DCT matrix is implicitly stored as fast transforms, this enables tests on larger images.
TwIST1 is a two-step IST algorithm for solving a class of linear inverse problems.Specifically, TwIST is designed to solve (35) min where A is a linear operator, and J (•) is a general regularizer, which can be either the 1 -norm or the TV.The iteration framework of TwIST is where α, δ > 0 are parameters, We used the default parameters in TwIST and terminated the iteration process when the relative variation of function value fell below 10 −4 .The monotone version of the fast IST algorithm -MFISTA2 is designed to solve TV-based deblurring problem (37) min where C is a closed convex set, and A is a linear map.MFISTA is a monotone version gradient-based algorithm for TV image denoising and deblurring problems, which shares the simplicity of iterations with FISTA [6] and attains fast convergence speed.When running MFISTA, we set the Lipschitz constant to be L = 1 since λ max (A A) = 1 when A is a partial DCT matrix, and set other parameters to their default values.We compared FTVCS-v2 with TVAL33 , which is developed to solve TV-regularization problems (3) and its variants.TVAL3 is based on the augmented Lagrange function and an alternating minimization technique.Moreover, TVAL3 incorporates gradient descent method with BB steplength [4] and nonmonotone line search.We also compared FTVCS-v2 with a modified version of TVAL3 by using conjugate gradient method to solve the large linear system at each iteration.The inner iteration is terminated when the residual is less than 10 −3 or the number of iterations exceeds 5.The resulting code is named TVAL3 CG.When running TVAL3 CG, we used the default parameter values as in TVAL3 except that opts.tol=10−3 .
As stated above, λ max (A A) = 1 for partial DCT matrix.From Theorem 3.1, convergence of Algorithm 3.1 can be guaranteed for 0 < τ < 1.In our experiments, we set τ = 1, although in practice bigger τ value usually leads to faster convergence.We set β = 2 6 and terminated FTVCS-v2 when the relative change fell below 10 −4 .The regularization parameter was chosen to be µ = 500 for an additive Gaussian noise with mean 0 and standard deviation 0.001.The initial point u 0 was A f .
In the comparison, we tested two typical images (cameraman: 256×256 and lena: 512 × 512), two MRI brain images (br-1: 256 × 256 and br-2: 512 × 512).For each image, we tested four levels of sample ratios (defined by sr = m/n 2 ): 10%, 30%, 50% and 70%.The original images are displayed in Figure 3, while the comparison results are presented in Table 1, where the final objective function values (Obj), the relative errors (RE) of the recovered images to the true images, and the consumed CPU time in seconds (T) are given.It can be seen from Table 1 that, for all the tested cases, FTVCS-v2 obtained solutions with the smallest objective function values and comparable relative errors in the least CPU seconds.From the results, we also see that TVAL3 is most competitive with FTVCS-v2, while TwIST and MFIST are slower.Based on these experimental results, TVAL3 CG is slightly slower than TVAL3.We also observed that the total number of iterations of TVAL3 CG is relatively less than that for TVAL3.This is because, at each iteration, TVAL3 requires only one step with line search to obtain u k+1 , while TAVL3 CG needs more inner iterations to produce a solution of higher accuracy.We also tested a set of other images with different sizes and observed consistent results.Specifically, TVAL3 is the most competitive and runs slightly slower than FTVCS-v2, while TwIST and MFIST are slower.These results and observations clearly demonstrated the efficiency and stability of the inexact ADM Algorithm 3.1.
5. Concluding Remarks.In this paper, based on the classical quadratic penalty method and the alternating minimization of the augmented Lagrangian functions, we proposed two fast alternating minimization algorithms for solving the total variation image reconstruction model (3) with a generic linear operator.To overcome the difficulty in solving a large linear system at each iteration, we adopted linearization and proximal techniques.The per-iteration cost of both algorithms is dominated by two matrix-vector multiplications and two FFTs.We also established finite convergence of certain variables and q-linear convergence rate for the quadratic penalty method.Our numerical comparisons with some state-of-the-art algorithms also demonstrated the efficiency and stability of the inexact ADM approach.With the encouraging numerical performance of the proposed algorithms, it is worthwhile to investigate other techniques such as the line-search strategies to further accelerate some splitting algorithms, which is an interesting topic of future research.
Acknowledgments.We thank two anonymous referees for their constructive suggestions which improved the paper greatly.
Appendix A. Proof of Lemma 2.2.