A TWO-LEVEL DOMAIN DECOMPOSITION METHOD FOR IMAGE RESTORATION

. Image restoration has drawn much attention in recent years and a surge of research has been done on variational models and their numerical studies. However, there remains an urgent need to develop fast and robust methods for solving the minimization problems and the underlying nonlinear PDEs to process images of moderate to large size. This paper aims to propose a two-level domain decomposition method, which consists of an overlapping domain decomposition technique and a coarse mesh correction, for directly solving the total variational minimization problems. The iterative algorithm leads to a system of small size and better conditioning in each subspace, and is accelerated with a piecewise linear coarse mesh correction. Various numerical experiments and comparisons demonstrate that the proposed method is fast and robust particularly for images of large size.

1. Introduction.Image restoration is one of the fundamental and challenging tasks in image processing [16,2], and phenomenal advances have been achieved in variational and PDE-based approaches since the seminal work [43].The ROF model minimizes the total variation (TV) over the space of bounded variation (BV), so it is capable of preserving sharp edges and boundaries with a high quality recovery.More precisely, given a bounded image domain Ω ⊆ R d (d = 1, 2, 3), we are interested in the general minimization problem: from [36] and was introduced by Osher et.al. in [39].It has been extended to wavelet-based denoising [59], nonlinear inverse scale space in [8,9], and compressed sensing [30,65].The basic idea is to transform the equality constrained optimization problem into a series of unconstrained problems using Bregman distance.By combining the variable-splitting and Bregman iteration, Goldstein et.al. obtained split Bregman method in [28] which is particularly efficient for L 1 regularized problems, e.g., TV restoration.vii.Augmented Lagrangian method [26]: It was proposed in [51] for total variation image restoration.It has many advantages over other methods such as penalty method [4].As only linear problems need to be solved during the iterations, FFT can be applied to get extremely efficient implementations.In addition, the augmented Lagrangian approach provides close connections to dual methods and split Bregman iteration [65,51].viii.Multigrid method [52,53,64]: It is one of the most powerful numerical methods for solving some linear and nonlinear partial differential equations.In [18,31], the linear algebraic multigrid method [46] was adopted for solving the above PDE in each (outer) step of a fixed iteration, while [44] attempted to use the standard multigrid methods with a non-standard and somewhat global smoother.Recently, nonlinear multigrid methods based on the subspace correction approach of [52] have been introduced to image processing in [19,13].
Numerical experiments indicate their overwhelming numerical potentials.These methods have been widely used for image processing, and their strength and weakness have also been observed from real applications.Dual methods and Bregman iterations are fast, but they are under intensive investigation for the applications to more general image processing problems.Graph-cut approaches are usually fast, but they can be only applied to a special class of problems and could also have matriculation errors.The AOS and multigrid methods also have limitations in the models that they can be applied.
The purpose of this paper is to propose a fast solver based on overlapping domain decomposition and a coarse mesh correction for image processing tasks.Our aim is to demonstrate several essential advantages of the proposed method.More precisely, i.This method can be used for very general variational-based image processing problems.Indeed, based on this notion, one can easily apply the existing solvers to the minimization problem of a given task by solving a sequence of subdomain problems of smaller scale.ii.In practice, the original problem, e.g., large size 3D data processing, could be too large, which induces difficulties in applying a given solver.By splitting a large problem into many smaller sub-problems, we could apply the given solver.iii.The proposed method can save CPU time cost.The gain is significant with an implementation of efficient and relatively accurate subdomain solvers, iv.The proposed method is well-suited for distributed-memory parallel computers, hence it can be sped up by parallel implementation.It is known that domain decomposition (DD) methods are powerful iterative methods for solving partial differential equations [7,21,29,45,64].Some recent progress has shown that DD methods are also efficient for some nonlinear elliptic problems and some convex minimization problems [49,48,50,52] with mesh independent convergence.So far, it is still unknown that one can use domain decomposition methods for the ROF model.Some recent efforts have been devoted to study these problems [47,32,24,23].For simplicity of presentation, we propose and test this method on the ROF model, see (1), and present the details of the implementation.We provide ample numerical results to show its capability in processing images of large size with saving in CPU time and memory.Once again, the essence of the method is to regard domain decomposition method as a space decomposition technique.The original minimization problem related to ROF is reduced to some sub-minimization problems with smaller size over the sub-domains.If the sub-minimization problems are solved exactly, the convergence of the generated sequence is trivial to prove.Due to the degeneracy of the nonlinear equation of ROF, it is not convincing that we will be able to prove the convergence rate for the numerical solutions.
The rest of the paper is organized as follows.In section 2, we present the domain decomposition algorithm under a general framework of the subspace correction method.We describe the two-level method in section 3, and we give the detailed implementation for the ROF model in section 4. We provide various numerical results to demonstrate the merits of the proposed methods in section 5. We conclude the paper some discussions.

Domain decomposition based subspace correction method.
We put the method in a more general setting and start with the description of the subspace correction algorithm of [52].
Given a reflexive Banach space V and a convex, Gateaux differentiable functional F : V → R, we consider the minimization problem: min u∈V F (u). ( Under the notion of space correction, we first decompose the space V into a sum of smaller subspaces: which means that for any v ∈ V , there exists v j ∈ V j such that v = m i=1 v j .Following the framework of [64] for linear problems, we solve a finite sequence of sub-minimization problems over the subspaces: where u n denotes a previous approximation, to resolve (3).Two types of subspace correction methods based on (4)-( 5), known as the parallel subspace correction (PSC) and successive subspace correction (SSC) method, were proposed in [64,52].
Here, we adopt the latter, which can be described as follows: Algorithm SSC.Choose an initial value u 0 ∈ V .
In the following, we apply the algorithm to the (regularized) ROF denoising model with the cost functional: where z is a given noisy image defined on Ω = (0, 1) × (0, 1).Here, the TV-term is regularized so that F is differentiable and it also avoids the division by zero in the corresponding Euler-Lagrange equation: with a homogenous Neumann boundary condition ∂u/∂n = 0 along the boundary.
Recall that the lagged diffusivity fixed-point iteration (cf.[56]) for ( 7) is to solve the linearized equation with the initial value u 0 .We see that each iteration involves all the pixel values in the image domain, so it will be costly and usually the system is not in good conditioning when the size of images is large.The domain decomposition based SSC algorithm will overcome the difficulties by breaking down the whole problem into sub-problems of much smaller size.
In the first place, we use an overlapping domain decomposition to decompose the solution space V = BV (Ω).More precisely, we partition Ω into m overlapping subdomains For clarity, the subdomain Ω j is colored with a color j, and Ω j consists of m j subdomains (assumed to be "blocks" for simplicity), which are not intersected.Hence, the total number of blocks that cover Ω is Figure 1 illustrates schematically the decomposition of Ω into four colored subdomains with 25 blocks.Based on the above domain decomposition, we decompose the space V = BV (Ω) as where BV 0 (Ω j ) denotes the subspace of BV (Ω j ) with zero traces on the "interior" boundaries ∂Ω j \∂Ω.Applying the SSC algorithm to the TV-denoising model leads to an iterative method.
Given an initial value u 0 ∈ V, Algorithm SSC needs us to solve Figure 1.Schematic illustration of the coloring of the subdomains, and fine/coarse meshes on Ω = (0, 1) 2 .This corresponds to the decomposition: Here, we notice that e n j is the solution of the subproblem over Ω j .It is also easy to see that u n+ j m satisfies the associated Euler-Lagrange equations for 1 Outside Ω j , we have u n+ j m = u n+ j−1 m .Thus, there is no need to solve u n+ j−1 m outside Ω j .As the subdomain Ω j may contain many disjoint "block", the values of u n+ j m can be obtained in parallel in these "blocks" by solving (13).More details on the disretization of (13) will be described in the forthcoming section.
3. The two-level domain decomposition method.In this section, we build another ingredient, i.e., a coarse mesh correction, into the previous domain decomposition method.For clarity of presentation, we introduce the coarse mesh solver in the finite element setting.Similar explanations are also valid for the finite difference approximations.
We first partition the domain Ω into a coarse mesh {T H } with a mesh size H, and then refine it into a fine mesh partition {T h } with a mesh size h < H. Assume that both the coarse mesh and the fine mesh are shape-regular, and let {D i } m i=1 be a non-overlapping domain decomposition for Ω and each D i is the union of some coarse mesh elements (see Figure 1).Let V := V (Ω) be the space to be specified later.Let S H ⊂ V (Ω) and S h ⊂ V (Ω) be the continuous, piecewise linear finite element spaces, over the H-level and h-level subdivisions of Ω respectively.More specifically, For each D i , we consider an enlarged sub-domain The union of Ω i covers Ω with overlaps of size δ.Let us denote the piecewise linear finite element space with zero traces on the boundaries ∂Ω i \∂Ω as S h (Ω i ).Then one can show that For the overlapping subdomains, assume that there exist m colors such that each subdomain Ω i can be marked with one color, and the subdomains with the same color will not intersect with each other.For suitable overlaps, one can always choose Let Ω i be the union of the subdomains with the i th color, and define By denoting subspaces Note that the summation index is from 0 to m instead of from 1 to m when the coarse mesh is added.For a better understanding of the above slightly abstract setting, we present below two illustrative examples.
Example II: A coarse mesh correction to space decomposition.We consider the simple unit square domain Ω = (0, 1) × (0, 1) and a uniform triangulation T H (Ω) = {τ } and piecewise linear finite element spaces.If we take H = 4h, then the values of the 2D basis function may be denoted by matrix I h H (called the correction If we apply Algorithm SSC to decomposition (16) with the coarse mesh, we will get the following domain decomposition algorithm: It is seen that the above iteration algorithm requires to solve a sequence of the minimization problems over the subspaces/subdomains.For the TV-denoising problem ( 6), the prototypical variational formulation of the sub-minimization problem is where (•, •) is the L 2 (Ω) inner product, and V is the finite dimensional space V h j or V H 0 .
h,j , we solve the problem: However, a little care has to be taken for the transition between the coarse mesh and fine mesh, and the details will be presented in the forthcoming section.
4. Numerical discrete algorithm for TV denoising.We next present the full two-level algorithm formulated in the previous section for the TV-denoising model.We partition the image domain Ω = (0, 1) 2 into N × N uniform cells with mesh size h = 1/N.The cell centers are Hereafter, let z i,j be the pixel value of the original image z at (x i , y j ).It is known that using proper quadrature rules, finite element methods reduce to finite difference approximations on uniform grids.Hereafter, we shall adopt finite difference to discretize (19) and (20).Let u i,j be the finite difference solution at (x i , y j ).Denote The finite difference approximation of ( 19) is where α h = α/h and β h = h 2 β.The one-sided second-order finite differences are used to treat the Neumann boundary conditions, say at x = 0: Boundary conditions are also needed when evaluating δ c x and δ c y at the boundary nodes.
We now turn to the coarse mesh problem (20).Firstly we note that (20) can be written in the form: with As before, we need to define a restriction operator for the explanation of the algorithm.For any given r ∈ S h , we define Given a r ∈ S h , I H h r can be obtained numerically by a proper summation of r multiplied with the matrix I h H defined in (17) over the support of the finite element basis functions.Analysis in [49] showed that we could solve the subproblems approximately.So we shall use the following finite difference scheme to solve (20) approximately: where α H = α/H and β H = H 2 β.Formly, the above system can be written as where { ÃH , A h } are the coefficient matrices of the systems ( 22) and ( 25), respectively, and {e H , z, u h } are vectors of the pixel values.The matrix ÃH is an approximation of the system matrix for finite element equation (24), that is, we have replaced a n by the known data a n−1 , and also used the values at the coarse grid points.We summarize the above algorithm as follows.
Algorithm DDC-TV.Choose an initial value 5. Numerical results.We present in this section various numerical results to demonstrate the efficiency of the proposed domain decomposition algorithms without or with a coarse domain correction, denoted by DD and DDC in short, respectively.Their performance is assessed by comparing with the naive lagged diffusivity fixed-point iteration (i.e., (8), denoted by TV) in terms of convergence, recovery of peak signal-to-noise ratio (PSNR) and computational time.We remark that to the best of our knowledge, these algorithms have not been applied to the image restoration problems before.Hence, it is interesting to see some good results and particularly how the methods can be applied to restore images of large size.
Hereafter, assume that the pixel values of all images lie in the interval [0, 255], and the Gaussian white noise is added by the normal imnoise function imnoise(I,'gaussian', M , σ) (i.e., the mean M and variance σ) in Matlab.In our tests, we use PSNR [6] as a criteria for the quality of restoration.This quantity is usually expressed in terms of the logarithmic decibel scale: PSNR = 10 log 10 255 2 where {u i,j − z i,j } are the differences of the pixel values between the restored and original images of size n 1 × n 2 .Typical values for the PSNR in lossy image and video compression are between 30dB and 50dB (the higher implies the better).Acceptable values for wireless transmission quality loss are considered to be about 20dB to 25dB.We shall also use the relative dynamic error between two consecutive iterations: for a prescribed tolerance , as the stopping rule.
We test the methods on three typical images: lena-512 × 512, boat-1024 × 1024, and cow-2048 × 2048.All the computations are done in Matlab on an IBM server with 2.93 GHz, 8 Intel(R) Xeon(R) Quad-Core CPU and 128GB RAM.In the first two sets of experiments, we fix β = ε = 10 −4 , and choose the mean value M = 0 and the variance σ = 0.04 for the noise level, whose signal-to-noise ratio (SNR) is roughly between 8.8 to 9.1.We compare the methods with different α, subdomain size d (pixels) and overlapping size δ (pixels), and show their performance with respect to the iteration number k and computational time T. Finally, we test the methods with smaller regularization constant β, and other noise levels.
We shall see that the proposed methods lead to significant time and memory saving.Moreover, they are not sensitive to the image size, and the choice of the intrinsic parameters d and δ can be fairly relaxed.Hence, they provide fast and robust means to process images in particular of large size.

Convergence.
To set up a more quantified and reasonable rule for comparison, we fix β = ε = 10 −4 , and first identify a suitable α in the TV model with larger PSNR values (i.e, a better restoration).We plot in Figure 3 the PSNR against α obtained by TV (i.e., ( 8)) and the DD (i.e., (13)) with different subdomain and overlapping sizes for lena-512 × 512 and boat-1024 × 1024.For both cases, a good choice is α between 30 and 50.Hereafter, we fix α = 40.We also observe from Figure 4 that the PSNR reaches the "maximum" values after about ten iterations for TV, DD and DDC.
In Figure 4, we plot the dynamic error history of the three methods and PSNR values against the iteration steps.We see that the DD and DDC exhibit a convergence behavior similar to that of the fixed-point iteration.Hence, the domain decomposition method produces as good quality as the classical TV restoration via local operations and/or some global corrections.In Figure 5, we plot the computed solution for the first iteration for the boat-1024 × 1024 image.We have used d = 32 and δ = 4.These plots visualize the local step-by-step denoising effect and the recovery through overlapping subdomains.5.2.Sensitivity to the subdomain size and overlapping size.After understanding some general behaviors of the algorithms, we next demonstrate the timesaving by DD and DDC, and also provide some guidelines on the choice of the subdomain and overlapping sizes.
To illustrate the impact of overlapping sizes, we tabulate in Table 1 the PSNR and CPU time of the classic TV by the lagged diffusivity fixed-point iteration and DD with subdomain size 32, but with different overlapping size δ.Here, the percentage of the CPU time is against TV, and likewise for other tables.We see that that the PSNR obtained by DD is not so sensitive to the overlapping size δ, while the computational time increases as δ increases, as expected.To have a good trade-off between convergence rate and quality of restoration, it is advisable to choose δ to be 2, 3 or 4. It is essential to point out that the use of DD leads to a remarkable reduction of computational time in particular for images of large size.One also refers to Figure 6. the restored lena-512 × 512 image.
We further examine the impact of subdomain size to the overall performance of DD.For this purpose, we fix the overlapping size δ = 4, but vary α and the subdomain size d.Once again, Table 2 indicates a significant gain in computation time.It also shows that a good choice of d is roughly 1/16 of the given image size.We now particularly examine the effect of coarse mesh correction.We tabulate in Table 3 the computational time and PSNR for three methods with various choice of subdomain sizes.We also test them on large size image cow-2048×2048.Once more, it indicates the notable time saving by using domain decomposition techniques, and also justifies the 1/16−rule for the choice of subdomain size.However, we realize that the coarse mesh correction does not help too much to DD in the sense that almost the same number of iterations are needed for DD and DDC, but more time is consumed, since additional coarse mesh equations have to be resolved at each iteration (cf.Algorithm DDC).We believe that the main reason is the regularization constants α, β are small, which results in stiff elliptic equations and the correcting effect is expected to be minor.The influence of the coarse mesh correction increases when β is bigger as shown in Table 4.
We illustrate below some samples of the restored image obtained by DD or DDC with an "optimal" choice of the intrinsic parameters.Figure 6 is devoted to the  lena-512 × 512, where we depict the difference images and find that three methods with the same stopping rule give indistinguishable restored images.Indeed, the PSNR are very close and the dynamic error u − u K 2 / u 2 (where u is the true image, and u K is the restored image by TV, DD or DDC with K steps) are TV: 0.8029, DD: 0.8076 and DDC: 0.8060.Figure 7 illustrates the restored image of larger size by DD.Now, we examine the methods for very small regularization parameter β, and for images with considerable higher noise level.We record in Table 5 the iteration number k, computational time, and PSNR for TV and DD with the regularization parameter β = 10 −12 .In this case, the system is very stiff, so TV is extremely costly, but the stiffness can be significantly relaxed by breaking down the size of system, so DD is very fast.In a nutshell, DD is still very efficient for small β.
Finally, we illustrate in Figures 8-9, the restoration of boat-1024 × 1024 and cow-2048 × 2048 with higher noise level by DD.As before, the quality is essentially the same as that by TV, but recovered by much less CPU time.

A comparison with dual approaches.
To further show the performance of the domain decomposition method, we compare it with the dual algorithms in [14,10,16], which have been proven to be very efficient for the ROF model.We first briefly review the algorithm (cf.[16, p.199-200]).To solve (7), we introduce a new variable where |∇u| β = u 2 x + u 2 y + β for some β > 0. For the ROF model, we obtain a system in (u, p) :  Following [10], we update the dual variable p by the scheme: where the gradient and divergence operators can be discretized by the forward and backward finite difference, respectively, together with the boundary condition p| ∂Ω = 0, as in [10].
We compare the domain decomposition method with the dual algorithm in terms of decay of the residual Setting u n = z − αdivp n , we find from ( 30)-( 32) that the residual of the dual algorithm can be computed by In Figure 10, we plot the ratio R n 2 / R 1 2 (where • is the l 2 -norm as before) versus the iteration number for both methods.In this comparison, we choose the noise level σ = 0.02, the parameters α = 40, β = 10 −6 , the time step τ = 1/16 in (34), and the subdomain size d = 32 and overlapping size δ = 4 in DD.We see from Figure 10 that the decay rate of the residual (33) of DD is more than 20 times faster than that of the dual algorithm (and the gain is even more for images of larger size).In other words, for a given accuracy, the DD with the current inner solver on each subdomain requires much smaller number of iterations.Hence, with this trade-off, the total computational cost of DD is comparable to that of the dual algorithm, in particular, for images of large size, although it is fairly sensitive to give a fair comparison.It is essential that the use of domain decomposition technique allows for breaking down the size of the problem, and one might choose a more efficient subdomain solver with a good balance between the accuracy and efficiency.We want to emphasis that our method can be used for general image processing problems.Here, we have used the well-known ROF to demonstrate that the domain decomposition approach can give superior performance in term of computing and memory efficiency.

Table 3 .
Comparison of TV, DD and DDC for boat-1024 × 1024 and cow-2048 × 2048 with σ = 0.04, β = = 10 −4 , α = 40 and various subdomain sizes.Here, PSNR1 and PSNR2 refer to the PSNR of DD and DDC, respectively, and likewise for the iteration number k and computational time T.