BFGS method based variable projection approach for image restoration

In this paper, a variable projection approach based on the BFGS (Broyden–Fletcher– Goldfarb–Shanno) method for image reconstruction problems is proposed, which is an alternative to the common alternating minimisation scheme. The image restoration is expressed as a nonlinear least-squares problem with reduced parameter space. To improve the efﬁciency of the algorithm, the BFGS method is proposed to be used to optimise the reduced objective function. The large-scale problem considered in this paper is projected on to a small Krylov subspace using Lanczos bidiagonalisation. The regularisation parameter is selected by a weighted generalised cross validation criterion. Numerical examples demonstrate the efﬁciency and effectiveness of the proposed algorithm.


Background
Image degradation usually occurred in practice due to noise and blurring. Reconstructing original images from degraded images, which is a typical inverse problem, plays an important role in many applications, e.g. medical, biological and astronomical imaging [1][2][3][4]. In this paper, we consider recovering a clear image from a blurred and noisy image by a nonlinear least squares framework. We assume that the observed image follows the linear formulation model equation where g(a, b) is the observed, noisy and blurred image, a, b represent the position of the pixel in space coordinates, f (s, t ) is the original image, (a, b) is the additive noise, and the kernel function k(⋅, ⋅) is the point spread function (PSF). Since we are interested in digital images, we discretise Equation (1) and obtain the following matrix-vector form where y ∈ R n is the observed image, x ∈ R n represents the original image, ∈ R n is unknown additive noise and ∈ R p represents nonlinear parameters. B( ) ∈ R n×n is a blur operation.
How to recover the original image x from the degraded image y is referred to as the image deconvolution. If the blurring operator is known, the process is called non-blind deconvolution; while the blurring operator is unknown, the process is called blind deconvolution. In this paper, we focus on the blind restoration task.

Related work
There has been an extensive literature on the image blind restoration. We only mention some of them according to our interests. Sondhi [5] first noticed that image restoration is an illposed problem by showing the equivalance of image restoration to the solution of the first kind Fredholm equation. Thus, image restoration is usually performed by minimising a cost function consisting a data fidelity term and certain regularisation terms, which results in the following minimisation problem [6] min x, where Φ(y − B( )x) is the fidelity term, is the regularisation parameter and Θ(x, ) is the regularisation term, respectively. In most situations, the fidelity term in (3) is 2 -norm. Designing regularisation term Θ(x, ) is an active research field. Besides the commonly used Tikhonov regularisation [7,8,10,11], there are many other regularisation methods, such as, the variational approach [12], total variation [13], and methods based on transform domains [14,15]. Recently, Cai et al. [6] introduced new sparsity-based regularisation terms on both images and motion-blur kernels; Bharadwaj et al. [16] added two additional constraints (maximally white and maximally frontloaded) to the blind deconvolution framework. The advantage of using the Thikhonov regularisation is that it may lead to very efficient iterative solvers [7].
The optimisation of (3) is nontrivial, because it is a nonconvex problem. The common approach used in the field of image processing is the alternating minimisation (AM) scheme [6,11,13,17], i.e. minimising (3) with respect to x with B fixed and minimising (3) with respect to B with x fixed. The AM scheme generates a sequence of image and PSF pair. The drawback of this approach is that a good initial guess is required and it may have uncertain convergence properties. If the regulariser in (3) is Tikhonov, an alternative approach to the AM scheme is the variable projection (VP) method [7][8][9]. For the case of Tikhonov regularisation, the minimisation of (3) is a linear least squares problem if we fix . Utilising this property, the parameter x can be eliminated from the optimisation problem, resulting in a reduced objective function that depends only on . This is the basic idea of the VP method. Apparently, the VP approach takes advantage of the fact that the function is strongly convex in x when is fixed. There are many works demonstrating its efficiency [18][19][20][21].
There have been several attempts to handle the image deconvolution problem by deep neural networks. In [23], Xu et al. proposed a deep convolutional neural network to capture the characteristics of degradation. In [33], Jim et al. proposed a deep convolutional network for inverse problems in biomedical imaging. Despite their high quality of deconvolution, those methods generally are time-consuming.

Motivation and contributions
In this paper, we consider to use a least-squares fit-to-data term with Tikhonov regularisation on x: where ∥ ⋅ ∥ denotes vector 2-norm, I is a regularisation operation and is usually chosen as the identity matrix, and is unknown which should be estimated from the given data. For (4), after eliminating x from the problem, Chung and Nagy [7] proposed a Gauss-Newton method to solve the nonlinear least-squares problem. More specifically, rather than solve (4), they instead work with the following reduced objective functioñf The above VP approach is justified by the following theorem, adopted from [22]. Note that the VP method does not alternate between and x. Instead, x is projected out from the problem.
The Gauss-Newton method requires the Jacobian of the reduced function (5). In [7], the authors used the finitedifference approach to approximate the Jacobian numerically. However, the numerical computation of Jacobian is timeconsuming.
Since the image blind deconvolution we considered is a largescale problem and x is approximated by an iterative method, the analytical expression of the Jacobian is difficult to obtain. On the contrary, the gradient off ( ) can be easily obtained. From Theorem 1, we can minimise (5) using only the information (6) based on the quasi-Newton method, such as BFGS (Broyden-Fletcher-Goldfarb-Shanno) method.
Thus, in this paper we propose a VP approach based on BFGS method for the image restoration. The BFGS method has been proven the advantages in solving large-scale problems [24,25]. Specially, Thompson et al. [26] studied several optimisation methods for blind decovolution by comparing their speed and accuracy, and the results indicated that the BFGS algorithm has the best performance.
Regularisation parameter selection is another important problem. The generalised cross validation (GCV) is a popular approach to regularisation parameter estimation in blind deconvolution [17,27]. However, Kim and Gu [28] empirically found that the standard GCV may produce too large or too small regularisation parameters. To overcome the defect, Chung et al. [29] proposed a weighted-GCV (WGCV) method for choosing the regularisation parameter for large scale ill-posed problems. Thus, in this paper we adopt the WGCV method for the regularisation parameter selection.
Specifically, the main contribution of this paper is the introduction of BFGS method to the VP algorithm in the large scale problem which make the algorithm easy to implement in image blind deconvolution. The proposed approach only uses the gradient of the reduced objective function, which makes the algorithm more efficient.
The remainder of this article is organised as follows. The iterative regularisation based on Lanczos bidiagonalisaiton and WGCV is introduced in Section 2. The proposed BFGS method based VP algorithm is presented in Section 3. Numerical examples are carried out in Section 4. The conclusions are given in Section 5.

ITERATIVE REGULARISATON BASED ON LANCZOS BIDIAGONALISATION AND WGCV
In this paper, we deal with the reduced objective function (5). This is achieved by eliminating the linear parameters x in (4). For fixed , the problem (5) is convex, and x can be determined by solving the following linear least squares problem where we omit in B( ) for convenience. If is known in (8), the optimal value of x can be expressed asx

WGCV
Unfortunately, is unknown and has to be estimated from the observation data. An efficient way to estimating is to minimise the GCV function where trace(.) represents sum of diagonal elements. However, the standard GCV may not perform well for some large-scale problems. Chung et al. [27] showed that the GCV tends to over-estimate the regularisation parameters and thus proposed a WGCV method to overcome the semi-convergence behaviour. The WGCV takes the form where controls the value of . Note that =1 results in the standard GCV (10), and > 1 results in smoother solutions.
The method of choice of shall be given in Section 2.3.

Lanczos bidiagonalisation
The minimisation of (11) requires the singular value decomposition (SVD) of the matrix B. When B is too large, evaluation of (11) can be computational impractical, because the SVD is not feasible. This difficulty can be relieved by Lanczos bidiagonalisation which projects the large-scale problem on to a small Krylov subspace. Given the vector y and matrix B, the Lanczos bidiagonalisation is described as follows [30]. Let and for i = 1, 2 … compute where k is the kth iteration of Golub-Kahan bidiagonalisation. The recurrence relations (13) and (14) can be written as where e k+1 denotes the (k + 1) standard unit vector.
At kth iteration, we get an A k . Then, an approximate solution can be obtained from the projected least squares problem where v =∥ y ∥ 2 , and the approximate solution is x k = N kx .

Iterative regularisation
Since the original problem is ill posed, A k may become illconditioned as the iteration proceeds. Thus, we compute a regularised projected least squares problem Note that the dimension of A k is very small compared to B, therefore the SVD can be applied.
In this paper, as described in Section 2.1, we use the WGCV (11) to find an approximate . The WGCV function for problem (18) is given as At early iterations, the is obtained by solving where k , opt is the smallest singular value of A k . In later iterations, is taken as the average of the previous values. According to Equation (19), we choose based on A k . A k is obtained by Lanczos bidiagonalisation of B (see Section 2.2). That is, the selection of depends on B( ) in (4).

BFGS METHOD BASED VP APPROACH FOR LARGE-SCALE IMAGE RESTORATION
In this section, we return to the nonlinear least-squares problem (4). We can treat the problem (4) as a general nonlinear leastsquares problem, and solve it using some standard optimisation algorithms regardless its special structure. The disadvantage of this approach is that the algorithm generally converges very slowly. Or, we can use the block coordinate descent approach, i.e. the AM scheme mentioned in Section 1, which alternates between fixing one set of the variables. The problem with this approach is that it may have uncertain convergence properties and good initial guesses are required.
It will be very beneficial to utilise the fact that the problem is convex in x. Thus we consider a reduced parameter space method. Denote the reduced cost function aŝ We have described in detail how to obtain x( ) in Section 2. According to (6), where D denotes the Fréchet derivative of the matrix. Then, the update of at (k+1)th iteration based on BFGS method is given by where k is the step length, For clarity, we give the pseudocode of BFGS method based VP approach for large-scale image restoration as follows

NUMERICAL EXAMPLES
In order to verify the efficiency and effectiveness of proposed BFGS method based VP approach (denoted as BFGS-VP), we give examples from image deblurring, also called blind deconvolution in the field of image processing.
In the experiments, we use the blurred signal-to-noise ratio (BSNR) to measure the noise contained in the observed image, and the improved signal-to-noise ratio (ISNR) to measure the quality of the recovered image. They are defined as follows: ALGORITHM 1 Pseudocode of BFGS method based VP approach for large-scale image restoration

Step1:
Choose an initial vector 0 an initial inverse Hessian approximation H 0 , and a precision threshold . Set k = 0.

Step6:
Determine the search direction d k = −(H k ) −1 ∇f ( k ), and obtain k by a line search procedure.
where x l is the recovered image.

Blind deconvolution
The image formation process is modelled as (2). The matrix B( ) models the blurring operation and can be written as where P( ) is the PSF. PSF in this article mainly considers the spatial invariance and parametric type of blurring function. Two common types of blur are used in this article: Gaussian blur (PSF form see Equation (29) ) and motion blur (PSF form see Equation (33) ).
In any blind deconvolution problem, some assumptions about the blur are needed. Here we assume a normal Gaussian blur, where the PSF, centered at (s 1 , s 2 ), has the form and let B( ) be the matrix defined by the Gaussian PSF with parameters . Case 1. To make a fair comparison with the results of [7], we use a 512 × 512 "Grain" image convolved with a Gaussian PSF defined by the parameter true = [3, 4, 0.5] T to create a blurred image, and then crop the center of the blurred image to create a size of 256 × 256 blurred image. We also added 1% Gaussian white noise to the image, and define the relative error of the blur parameter as and define the relative error of the image as Using the same initial guess 0 = [5, 6, 1] T as in [7], the relative objective function value (rel obj), the relative gradient value (rel grad), the relative error of the blur parameter (Δ ), the relative error of the image (image error), the calculated regularisation parameters ( ) and the running time are shown in Table 1. Table 1 also lists the results in [7] (denoted as GN-VP) for comparison. From Table 1, it can be seen that proposed method achieves slightly smaller Δ and image error than those in [7]. However, the running time of the proposed method is much shorter than that of GN-VP, which indicates the proposed method is more efficient. To verify the effectiveness of the WGCV, we list the results of both methods using GCV for a comparison in Table 1. It can be observed that the methods using the WGCV obtain better results. Figure 1 shows the original and blurred images, and the reconstructed images obtained by GN-VP and BFGS-VP methods. Figure 2 shows the relative errors of the blur parameters Δ and the errors of the image Δx during the iteration using the GN-VP and BFGS-VP. We note that Δ and Δx begin increasing after eight iterations for the GN-VP method. This is because there is no good stopping criterion. However, this phenomenon does not occur in the BFGS-VP method.
Case 2. In this case, we use another image (cameraman) to test the efficiency and effectiveness of proposed BFGS-VP. Similar to case 1, the 512 × 512 cameraman image is convolved with a Gaussian PSF using the parameter true = [3, 4, 0.5] T , and then a size of 256 × 256 blurred image is obtained by cropping the 512 × 512 image. 1% Gaussian white noise is added to the image. Table 2 lists the rel obj, rel grad, , Δ , image error and running time of the GN-VP method and BFGS-VP method. The initial value is 0 = [5, 6, 1] T . Again, from Table 2, we can see that the BFGS-VP obtains similar image error and smaller Δ compared to the results of GN-VP. Moreover, the BFGS-VP spends much less time than the GN-VP. Figure 3 shows the original and blurred images, and the reconstructed images obtained by the GN-VP and BFGS-VP methods. Figure 4 shows the relative errors of the blur parameters Δ and the errors of the image Δx during the iteration using the GN-VP and BFGS-VP. In this case, the Δ obtained by the GN-VP   has a jump at the 14th iteration and the Δ obtained by the BFGS-VP increases a little at the last iteration. However, this does not severely affect the reconstructed image. Case 3. In this case, we compare the proposed method with the method in [17] (denoted as GCV-BD) and [32] (denoted as VB-TV1 and VB-TV2) using the cameraman image and Lena image of size 256×256. The blurred, noisy images are obtained by blurring the original images with a Gaussian PSF with variance five and adding additive Gaussian noise (BSNR = 20 db and BSNR = 40 db ). Table 3 lists the comparison results of these methods. It can be observed that the proposed BFGS-VP method achieves very competitive results, which demonstrates its effectiveness. Figure 5 shows the original and blurred image, and the reconstructed image of cameraman obtained by the BFGS-VP method. Figure 6 shows the original and blurred image, and the reconstructed image of Lena obtained by the BFGS-VP method.
Case 4. As our method can naturally be extended to deal with non-uniform blur, we report results on an image (shown in Figure 7) degraded by spatially-variant motion blur. This case restores the image degradation caused by uniform linear motion. The specific form of PSF that causes image degradation by uniform linear motion is:    where is the blur length, is the direction of motion. Let and let B( ) be the matrix defined by the uniform linear motion PSF with parameters . we use a 512 × 512 "license plate" image convolved with a Gaussian PSF defined by the parameter true = [12,7] T to create a blurred image. We also added 1% Gaussian white noise to the image. Using the initial guess 0 = [11,6] T . Figure 7. shows the original image, blurred image and the restored image by the proposed method. We can see that our method generated image with clear textures.

CONCLUSION
We have proposed a BFGS-VP algorithm for the large-scale illposed inverse problems with applications to image deblurring. The VP approach, which projects out the x from the original objective function, is different from the commonly used AM scheme. The WGCV, which is an improved version of GCV, is adopted to select the Tikhonov regularisation parameter. The Lanczos bidiagonalisation method is used to project the largescale problem on to a small Krylov subspace. The main selling point of the proposed algorithm is that it avoids computing the complicated "inexact" Jacobian matrix when using the Gauss-Newton algorithm, thus improves the computational efficiency of the algorithm. Numerical experiments verify the efficiency  and effectiveness of the proposed algorithm. Our future work will be combining some deep neural networks [33] to deal with the ill-posed inverse problems in imaging.