USING GENERALIZED CROSS VALIDATION TO SELECT REGULARIZATION PARAMETER FOR TOTAL VARIATION REGULARIZATION PROBLEMS

Abstract. The regularization approach is widely used in image restoration problems, and the visual quality of the restored image depends highly on the regularization parameter. In this paper, we develop an automatic way to choose a good regularization parameter for total variation (TV) image restoration problems. It is based on the generalized cross validation (GCV) approach, and hence no knowledge of the noise variance is required. Because the TV regularization problem lacks a closed-form solution, the minimizer of the GCV function cannot be found directly. We therefore reformulate the TV regularization problem as a minimax problem and apply a first-order primal-dual method to solve it. The primal subproblem is rearranged so that it becomes a special Tikhonov regularization problem for which the minimizer of the GCV function is readily computable. Hence we can determine the best regularization parameter in each iteration of the primal-dual method. The regularization parameter for the original TV regularization problem is then obtained by an averaging scheme. In essence, our method needs to solve the TV regularization problem only twice: once to determine the regularization parameter and once to restore the image with that parameter. Numerical results show that our method gives a near-optimal parameter and excellent performance compared with other state-of-the-art adaptive image restoration algorithms.

Y.W. WEN AND R. CHAN

1. Introduction. Image restoration is an important task in image processing. Its aim is to recover the original image f from the degraded observed image g. Mathematically, the problem can be formulated as solving the perturbed linear system

g = Hf + n.
Here H is a linear operator describing the degradation process of the system, and n is Gaussian white noise. It is well known that the image restoration problem is ill-conditioned, and a regularization approach is generally used to stabilize the solution [7,12,14,15,30,31]. One classical approach is to find the original image by minimizing an energy function consisting of a data-fitting term and a regularization term:

min_f { (α/2)‖Hf − g‖₂² + φ(f) }.   (1)

Here α is a positive regularization parameter and φ(f) is a regularization term representing the prior information about the original image f. We remark that, strictly speaking, the regularization parameter should be put in front of the regularization term φ(f). But for ease of formulation in the following derivations, we put α in front of the data-fitting term and still refer to it as the regularization parameter.
Numerous expressions for φ(f) have been studied in the literature, such as Tikhonov regularization [15,31] and total variation (TV) regularization [30]. Compared to Tikhonov regularization, TV regularization has the ability to preserve edges in the image due to the piecewise-smooth regularization property of the TV norm. Thus it is in general preferable to Tikhonov regularization for image restoration purposes.
We remark that the visual quality of the restored image depends highly on the regularization parameter α. When α is too small, the restored image is oversmoothed; when α is too large, the restored image will contain too much noise. Finding a suitable value for α is therefore an integral part of solving the image restoration problem. In the literature, α is usually chosen manually by trial and error. However, this approach needs to solve the TV regularization problem many times and hence is time consuming. It also makes the model impractical for many real applications. Several approaches have been developed to choose α automatically, including the L-curve method [20], the discrepancy principle [23], the variational Bayes approach [4,5,27], and the generalized cross validation (GCV) method [15,17].
In the L-curve method, α is chosen so as to maximize the curvature of the L-curve, which is the log-log plot of the norm of the regularization term versus the norm of the data-fitting term [19]. The main difficulty with the L-curve method is that we again need to solve the minimization problem many times for different α's, so it too is computationally expensive. Sometimes it is also difficult to locate the point of maximum curvature of the L-curve.
In the discrepancy principle (DP) method, α is chosen so as to match the data-fitting term to an upper bound derived from the noise variance. The regularization problem can be reformulated as a constrained one in which the data-fitting term should be less than or equal to the upper bound [3,8,15,25,32]. Then α is regarded as the Lagrange multiplier associated with this constraint. The main drawback of the method is that we need to know the variance of the noise. In [33], we developed a primal-dual framework based on DP to find α and recover the original image simultaneously. That method only needs to solve the TV regularization problem once.
In the GCV method, α is chosen so as to minimize the GCV function, which is derived from the leave-one-out rule [17,26]. The most appealing advantage of the method is that it does not require knowledge of the noise variance. The basic idea of GCV is that if α is a good choice, then it should minimize the GCV function. For the Tikhonov regularization term, Golub et al. [17] developed the well-known GCV method for choosing α. In [26], the GCV method was applied to parametric image restoration and resolution enhancement.
We note that the GCV method works for quadratic regularization terms, since the regularization problem then has a closed-form (linear) solution. When the regularization term is non-quadratic (such as the TV term considered here), there is no closed-form solution of the regularization problem, and hence the minimizer of the GCV function cannot be derived explicitly. Accordingly, the problem of parameter selection becomes a bilevel optimization model which is very difficult to solve. Nonlinear GCV methods were developed to handle this difficulty [16,18,28]. The nonlinear GCV function is derived by evaluating the Jacobian matrix consisting of the partial derivatives of the minimizer of (1) with respect to the observed data g. In [29], a nonlinear GCV method was applied to image restoration and MRI reconstruction. Liao et al. [21] incorporated the GCV technique into a splitting-and-regularization framework to handle parameter estimation for TV-based image reconstruction.
In this paper, we propose a novel and efficient GCV approach to estimate α for TV regularization problems. First, we rewrite the TV regularization problem as a minimax problem and use a first-order primal-dual method to solve it. Then we reformulate the subproblem for the primal variable as a quadratic regularization problem for which the minimizer of the GCV function is computable. Hence we can determine the regularization parameter for the primal subproblem in each iteration. The regularization parameter α for the original TV regularization problem is then obtained by averaging the regularization parameters obtained in the primal iterations.
We now highlight the major differences between our method and previous methods that use the GCV approach to solve TV regularization problems. The method in [21] needs to solve the TV regularization problem many times since it introduces an additional regularization parameter; ours only needs to solve the problem twice: once to determine α and once to solve for f with the computed α. The method in [29] is more complex than ours since, in each iteration, it requires computing the Jacobian matrix consisting of the partial derivatives of the minimizer with respect to the observed data. In contrast, our method only needs to solve a classical Tikhonov regularization problem, without involving any Jacobians. Numerical results show that our method gives a near-optimal regularization parameter and excellent performance compared with other state-of-the-art adaptive image restoration algorithms.
The rest of this paper is organized as follows. In Section 2, we reformulate the TV regularization problem as a minimax problem and apply a first-order primal-dual method to solve it. In Section 3, we first recall the GCV approach for finding α for Tikhonov regularization problems, and then extend it to TV regularization problems. In Section 4, experimental results are given to demonstrate the efficiency of our algorithm. Finally, we draw conclusions in Section 5.
2. TV-Norm and Primal-Dual Approach. In this section, we briefly review the TV-norm and reformulate the TV regularization problem into a minimax problem. Then we apply a first-order primal-dual method to find the saddle point of the minimax problem.
2.1. Total Variation Norm. Let the size of the original image be m × n. To represent the discrete total variation, we define matrices R_ℓ ∈ R^(mn×2), whose two columns encode the first-order finite differences at the ℓ-th pixel, where ℓ = (i − 1)m + j indexes the pixels (i, j). The total variation of the image f can then be written as

TV(f) = Σ_ℓ ‖R_ℓ^T f‖₂,

where each term on the right-hand side has an equivalent dual formulation:

‖R_ℓ^T f‖₂ = max_{‖ξ_ℓ‖₂ ≤ 1} ξ_ℓ^T R_ℓ^T f.

Here ξ_ℓ ∈ R². Collecting the ξ_ℓ into ξ = [ξ_1^T, …, ξ_{mn}^T]^T, define the set S as

S = { ξ : ‖ξ_ℓ‖₂ ≤ 1 for all ℓ }.   (2)

Then, with R = [R_1, …, R_{mn}], we can reformulate the total variation of the image f as

TV(f) = max_{ξ ∈ S} ξ^T R^T f.

2.2. First-Order Primal-Dual Approach. Using the above representation of the TV norm, the TV regularization problem can be rewritten as

min_f max_{ξ ∈ S} J(f, ξ; α) := (α/2)‖Hf − g‖₂² + ξ^T R^T f.   (3)

We notice that the functional in (3) is convex (in fact quadratic) in f and concave (in fact linear) in ξ. By Proposition 2.6.1 in [6], an optimal solution f* of (3) can be obtained through a saddle point (f*, ξ*) of J(f, ξ; α). When the regularization parameter α is given, many primal-dual methods [9,10,33,34,35,36] can be applied to find a saddle point of (3) by updating the primal variable f and the dual variable ξ in an alternating way. The method in [10] has become a standard approach for computing saddle points of minimax functions, so we apply it to (3). It starts from a point (f^(0), ξ^(0)) and, for k = 0, 1, …, computes

f^(k+1) = argmin_f { J(f, ξ^(k); α) + (1/(2t))‖f − f^(k)‖₂² },   (4)
f̄^(k+1) = f^(k+1) + θ(f^(k+1) − f^(k)),   (5)
ξ^(k+1) = P_S( ξ^(k) + sR^T f̄^(k+1) ),   (6)

where P_S denotes the projection onto S. The parameters t, s > 0 are the step sizes for the primal and dual variables respectively, and θ is the combination parameter. This primal-dual algorithm converges with rate O(1/k) when θ = 1 and the step sizes satisfy st < 1/‖R‖₂²; see [10]. It is easy to verify that ‖R‖₂² ≤ 8. Hence in all the numerical tests below, we choose θ = 1 and st < 1/8. The iterative scheme can also be interpreted as a first-order primal-dual relaxed Arrow–Hurwitz algorithm; see [10].
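To make the scheme concrete, the following Python sketch implements the iteration (4)–(6) for the denoising case H = I, where the primal step (4) has a simple closed form. It is an illustrative sketch under our own forward-difference discretization of R (so that R^T f is the discrete gradient and Rξ = −div ξ), not the authors' code.

```python
import numpy as np

def grad(f):
    """Forward differences (the role of R^T): per-pixel gradient (gx, gy)."""
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]
    gy[:, :-1] = f[:, 1:] - f[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad, so R*xi = -div(xi)."""
    fx = np.zeros_like(px)
    fy = np.zeros_like(py)
    fx[0, :] = px[0, :]
    fx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    fx[-1, :] = -px[-2, :]
    fy[:, 0] = py[:, 0]
    fy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    fy[:, -1] = -py[:, -2]
    return fx + fy

def tv_denoise_pd(g, alpha, s=1.0 / 16, t=1.0, theta=1.0, iters=200):
    """Primal-dual iteration (4)-(6) for min_f (alpha/2)||f-g||^2 + TV(f), H = I.
    Note st = 1/16 < 1/8, matching the convergence condition."""
    f = g.copy()
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(iters):
        f_old = f
        # primal step (4): with H = I the proximal problem has a closed form;
        # v = f^(k) - t*R*xi^(k) = f^(k) + t*div(xi^(k))
        v = f + t * div(px, py)
        f = (t * alpha * g + v) / (1.0 + t * alpha)
        # extrapolation step (5)
        fbar = f + theta * (f - f_old)
        # dual step (6): gradient ascent, then projection P_S onto per-pixel unit discs
        gx, gy = grad(fbar)
        px = px + s * gx
        py = py + s * gy
        norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
        px /= norm
        py /= norm
    return f
```

For a general blur H, the primal step (4) instead requires solving a linear system, which is where the Tikhonov reformulation of Section 3 enters.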
3. Generalized Cross Validation. We now recall the basic idea of generalized cross validation (GCV) for selecting an optimal regularization parameter α for the model (1). Notice that the minimizer of (1) is also the solution of the optimization problem

min_f { (1/2)‖Hf − g‖₂² + λφ(f) },   (7)

where λ = 1/α. If we find a suitable parameter λ, we can obtain α accordingly. The basic idea of GCV is to remove g_ℓ, the ℓ-th element of the observed data g, and then use the remaining elements to estimate this ℓ-th element. If the regularization parameter is good, then the estimate should be a good predictor of g_ℓ. Let M_ℓ be the diagonal matrix with all entries equal to one except the ℓ-th entry, which is zero, and let f_ℓ(λ) be the minimizer of the minimization problem

min_f { (1/2)‖M_ℓ(Hf − g)‖₂² + λφ(f) }.

Here M_ℓ g can be regarded as removing the ℓ-th entry of the given data g. The ℓ-th entry of Hf_ℓ(λ) is the predicted value of the missing ℓ-th entry. The optimal regularization parameter λ should be chosen to minimize the GCV function, which is defined as the weighted prediction error over all the observed data:

GCV(λ) = Σ_ℓ w_ℓ ( g_ℓ − [Hf_ℓ(λ)]_ℓ )².   (8)

Here the w_ℓ are weights chosen so that the estimate is invariant under the coordinate system [11,17]. The analysis of the weights can be found in [17].
3.1. GCV for Tikhonov Regularization Problems. In this subsection, we present the explicit expression of the GCV function for Tikhonov regularization problems, i.e.,

min_f { (1/2)‖Hf − g‖₂² + (λ/2)‖Lf‖₂² },   (9)

where L is a regularization matrix. If the intersection of the null spaces of H and L contains only the zero vector, then the optimization problem has the closed-form solution

f(λ) = (H^T H + λL^T L)^(−1) H^T g =: A(λ)g.   (10)

Using the closed-form formula of f(λ) in (10), the GCV function in (8) can be expressed as (see [17])

GCV(λ) = ‖(I − HA(λ))g‖₂² / [trace(I − HA(λ))]².   (11)

Here trace(·) denotes the trace of a matrix. An optimal parameter λ should be chosen to minimize the GCV function (11). Note that (11) is not easily computable since it involves (H^T H + λL^T L)^(−1). However, for our method to be derived below, we only need L = I, the identity matrix, and H to be diagonalizable, i.e., H = U^T DU with a unitary matrix U and a diagonal matrix D. Let d_ℓ be the ℓ-th diagonal entry of D and ĝ_ℓ be the ℓ-th entry of the vector Ug. Using the identity I − HA(λ) = λ(HH^T + λI)^(−1), the GCV function (11) becomes

GCV(λ) = [ Σ_ℓ ( λ|ĝ_ℓ| / (|d_ℓ|² + λ) )² ] / [ Σ_ℓ λ / (|d_ℓ|² + λ) ]².   (12)

We remark that in image restoration, H usually has block-circulant-with-circulant-blocks (BCCB) structure when periodic boundary conditions are applied, or block-Toeplitz-plus-Hankel-with-Toeplitz-plus-Hankel-blocks (BTHTHB) structure when Neumann boundary conditions are used [24]. If H is a BCCB matrix, it can be diagonalized by the fast discrete Fourier transform; and if H is a symmetric BTHTHB matrix, it can be diagonalized by the fast discrete cosine transform [24]. In both cases, we can compute the GCV function easily.
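The diagonalized form (12) is cheap to evaluate. The following Python sketch computes it for a 1-D periodic (circulant) blur, where the DFT plays the role of U; a coarse grid search over (0, 100] stands in for Matlab's fminbnd. This is our own illustrative sketch, not the paper's code.

```python
import numpy as np

def gcv_lambda(g, kernel, lam_grid=None):
    """Minimize the GCV function (12) for the L = I Tikhonov problem with a
    circulant (periodic-convolution) H whose eigenvalues are fft(kernel)."""
    d = np.fft.fft(kernel)        # eigenvalues of H in the Fourier basis
    ghat = np.fft.fft(g)          # \hat g = Ug
    d2 = np.abs(d) ** 2

    def gcv(lam):
        num = np.sum((lam * np.abs(ghat) / (d2 + lam)) ** 2)
        den = np.sum(lam / (d2 + lam)) ** 2
        return num / den

    if lam_grid is None:
        # crude stand-in for fminbnd over the interval (0, 100]
        lam_grid = np.logspace(-6, 2, 400)
    return lam_grid[int(np.argmin([gcv(l) for l in lam_grid]))]

def tikhonov_restore(g, kernel, lam):
    """Closed-form solution (10) with L = I, computed via the FFT diagonalization."""
    d = np.fft.fft(kernel)
    return np.real(np.fft.ifft(np.conj(d) * np.fft.fft(g) / (np.abs(d) ** 2 + lam)))
```

In 2-D, `fft` would be replaced by `fft2` for BCCB blurs, or by the discrete cosine transform for symmetric BTHTHB blurs, without changing the structure of the computation.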

3.2. GCV for TV Regularization Problems. Now we consider how to find the regularization parameter α for the TV regularization problem (3). Notice that, unlike the Tikhonov regularization problem (9), which has the closed-form solution (10), problem (3) does not have a closed-form solution. Using the idea of GCV, the optimal parameter λ = 1/α should be the minimizer of GCV(λ) in (8) subject to the constraint that f(λ) is a solution of (7). This is a bilevel minimization problem:

min_λ GCV(λ)  subject to  f(λ) = argmin_f { (1/2)‖Hf − g‖₂² + λ TV(f) },

which is in general difficult to solve. We therefore resort to finding a good λ adaptively.
Notice that if we solve problem (3) using the primal-dual method introduced in Section 2.2 (i.e., the steps (4)–(6)), only the primal subproblem (4) depends on α. Moreover, (4) can be rewritten in the form of a Tikhonov regularization problem:

f^(k+1) = argmin_f { (α/2)‖Hf − g‖₂² + (1/(2t))‖f − v^(k)‖₂² },   (13)

where v^(k) = f^(k) − tRξ^(k), which can be regarded as an approximately restored image. Notice that (13) can be further rewritten as f^(k+1) = u^(k+1) + f^(k) − tRξ^(k), where

u^(k+1) = argmin_u { (1/2)‖Hu − (g − Hv^(k))‖₂² + (β/2)‖u‖₂² },   (14)

with β = 1/(tα). Problem (14) is precisely the special Tikhonov regularization problem we discussed in Section 3.1 with L = I, and its GCV function is given by (12). By minimizing the GCV function, e.g. by the Matlab function fminbnd, we can obtain the best β_k for (14). Once β_k is determined, we set α_k = 1/(tβ_k) and solve for f^(k+1) in (13) with α = α_k. Finally, the regularization parameter α of the original TV regularization problem (3) is obtained by an averaging scheme. The idea is to run the iteration until a stopping criterion is fulfilled, and then set the regularization parameter α to the average of the α_k over the last 20 iterations. More precisely, if the iteration stops at, say, the K-th iteration, then we set

α = (1/20) Σ_{k=K−19}^{K} α_k.   (15)

The resulting algorithm is summarized in Algorithm 1, where P_S is the projection operator onto the set S in (2). Once α is found, we use the primal-dual method (4)–(6) with this fixed α to find the restored f. In summary, our method applies the primal-dual method twice: first to obtain the α_k in each iteration in order to determine α, and second to find the restored image f with the obtained α.
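The change of variables behind (13)–(14) can be checked numerically. The following short sketch (ours, with arbitrary illustrative values for t, α, and a random H standing in for a blur) verifies that the minimizer of (13) equals v^(k) plus the minimizer of (14) when β = 1/(tα).

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, alpha = 8, 1.0, 2.5                 # illustrative sizes and values
H = rng.standard_normal((n, n))           # stand-in for the blur operator
g = rng.standard_normal(n)
v = rng.standard_normal(n)                # stands in for v^(k) = f^(k) - t*R*xi^(k)
beta = 1.0 / (t * alpha)

# minimizer of (13): (alpha/2)||Hf - g||^2 + (1/(2t))||f - v||^2
f_star = np.linalg.solve(alpha * H.T @ H + np.eye(n) / t,
                         alpha * H.T @ g + v / t)

# minimizer of (14): (1/2)||Hu - (g - Hv)||^2 + (beta/2)||u||^2
u_star = np.linalg.solve(H.T @ H + beta * np.eye(n),
                         H.T @ (g - H @ v))

# the substitution f = u + v^(k) maps one problem onto the other
agree = np.allclose(f_star, u_star + v)
```

Setting the gradients of (13) and (14) to zero and substituting f = u + v^(k) gives the same normal equations, which is what the check confirms.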
Algorithm 1 α = GCV(g, H)
Require: g, H.
1: Initialize f^(0), ξ^(0), and set k = 0;
2: while stopping criterion is not satisfied do
3:   Estimate the parameter β_k in (14) by GCV and set α_k = 1/(tβ_k);
4:   Update f^(k+1) by solving (13) with α = α_k;
5:   Set f̄^(k+1) = f^(k+1) + θ(f^(k+1) − f^(k));
6:   Update ξ^(k+1) = P_S(ξ^(k) + sR^T f̄^(k+1));
7:   k ← k + 1;
8: end while
Ensure: α given by the average (15).

4. Numerical Results. The Improvement in Signal-to-Noise Ratio (ISNR) is used to measure the quality of the restoration results. The ISNR is defined as

ISNR = 10 log₁₀( ‖f − g‖₂² / ‖f − f̂‖₂² ),

where f is the original image, g the observed image, and f̂ the restored image. Eight typical blur kernels are used in the tests; see Table 1. Periodic boundary conditions are used for these blurring operators. Zero-mean additive white Gaussian noise with various noise levels σ is added to the blurred image to generate the noisy observations. For finding the minimizer of the GCV function (11) using Matlab's fminbnd, instead of searching over all of (0, ∞), we find a local minimizer of (11) in the finite interval (0, 100]. The stopping criterion is that the relative difference between successive iterates satisfies ‖f^(k+1) − f^(k)‖₂² / ‖f^(k)‖₂² ≤ 10^(−4), or that the maximum number of iterations, set to 500, has been reached.

4.1. Robustness of s and t. By (15), α depends on the α_k, which in turn depend on the step size t. We first test whether α is robust against t. According to the convergence criteria in [10], we should have st < 1/8. In general, the step sizes s and t cannot be too small or too large; otherwise the algorithm can stop too early, since the difference between consecutive iterates will be small (see the proximal terms in (4) and (6)). In the tests, we set t = 0.1, 0.5, 1 and s = 1/(16t). The test images are the goldhill and man images shown in Fig. 1. The computed regularization parameters α are shown in Table 2. We observe that when t changes, the α obtained by our method does not change. Thus in the following experiments, we fix t = 1 and s = 1/16.
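The ISNR used throughout this section is straightforward to compute; a small helper (ours, not the authors' code) makes the definition concrete:

```python
import numpy as np

def isnr(f_true, g_observed, f_restored):
    """Improvement in SNR (dB): 10*log10(||f - g||^2 / ||f - f_hat||^2).
    Positive values mean the restoration is closer to the original than
    the observation is."""
    return 10.0 * np.log10(np.sum((f_true - g_observed) ** 2) /
                           np.sum((f_true - f_restored) ** 2))
```

For example, halving the pixel-wise error of the observation everywhere improves the ISNR by 10 log₁₀ 4 ≈ 6.02 dB.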

4.2. Is α Good?
We next present experiments to test whether the regularization parameter α obtained by our method is good. We again use the man and goldhill images. We plot the curve of ISNR versus α in Fig. 2 (resp. Fig. 3) for noise levels σ = 2, 4, 6, 8 (resp. σ = 10, 20, 30, 40) for the man image. Since the results for the goldhill image are similar, we do not depict them here. The ISNRs for different α are obtained by applying the first-order primal-dual method (4)–(6) for each fixed α. The ISNR obtained by our method is marked by "•". We see in the figures that the ISNR obtained by our method is close to the highest ISNR of the curve in all cases, indicating that our chosen α is close to optimal. We also observe that the optimal regularization parameter varies with the image, the noise level, and the blur type. This means one should choose a different α for each case in order to obtain optimal performance, and our method indeed provides a robust and adaptive way to find such an α.

4.3. Comparison with Other Methods.
Here we compare our method on the nine real-life images given in Fig. 4 with current state-of-the-art methods: the alternating direction method of multipliers (ADMM) in [25], SALSA [1,13], and the GCV approach (GCV-L) [21]. The discrepancy principle is used in ADMM [25] and SALSA [1,13] to adaptively restore the image. The codes are provided by the authors or downloaded from the authors' websites. In ADMM and SALSA, the upper bound of the data-fitting term is needed; it is set to mnσ², where σ² is the variance of the noise estimated by the median rule [22]. We also test these two methods using the true σ, which is an oracle approach; we denote the corresponding methods by ADMM-O and SALSA-O. For the other parameters in the codes, we use the authors' default values.
The ISNR values for the restored images are listed in Tables 3–11, where the best results are shown in boldface and the second best results are marked in underlined italics. In the last column of each table, we give the gain in ISNR of our method when compared with the best method in the same row other than ours. Out of the 288 test cases, our method achieves the highest ISNR in 281 cases, the second highest in 6 cases, and the third highest in the remaining case. In conclusion, the experimental results demonstrate that our method is robust and provides excellent performance when compared to existing adaptive image restoration algorithms.

5. Conclusions. In this paper, we applied the GCV idea to choose the regularization parameter α for TV regularization problems in image restoration. Our method only needs to solve the TV problem twice: first to determine α using the GCV idea, and second to solve the TV problem with the obtained α. Numerical results validated the excellent performance and robustness of the proposed method; it outperforms state-of-the-art adaptive image restoration algorithms. Our future work is to extend the method to image restoration and reconstruction problems where the degradation matrix H cannot be diagonalized.

Figure 4. The test images: macaws, motors, sailboat at pier, tropical island, lighthouse in Maine, P51 Mustang, Portland Head Light, barn and pond, mountain chalet. The sizes of the images are all 512 × 768.

Table 1. The point spread functions of the blurs used in the tests.

Table 2. Regularization parameter α obtained by our approach for different step sizes t and s = 1/(16t).

Table 4. ISNR results for the motorcross bikes image.

Table 5. ISNR results for the sailboat at pier image.

Table 6. ISNR results for the tropical island image.

Table 7. ISNR results for the lighthouse in Maine image.

Table 9. ISNR results for the Portland Head Light image.

Table 10. ISNR results for the barn and pond image.

Table 11. ISNR results for the mountain chalet image.