A NONLINEAR MULTIGRID SOLVER WITH LINE GAUSS-SEIDEL-SEMISMOOTH-NEWTON SMOOTHER FOR THE FENCHEL PRE-DUAL IN TOTAL VARIATION IMAGE RESTORATION

Abstract. Based on the Fenchel pre-dual of the total variation model, a nonlinear multigrid algorithm for image denoising is proposed. Due to the structure of the differential operator involved in the Euler-Lagrange equations of the dual models, a line Gauss-Seidel-semismooth-Newton step is utilized as the smoother, which provides rather good smoothing rates. The paper ends with a report on numerical results and a comparison with a very recent nonlinear multigrid solver based on Chambolle's iteration [6].


1. Introduction. Image restoration is a fundamental problem in image processing, and over the years various techniques for solving associated variational problems have been developed. Due to the potential non-uniqueness of the solution and the numerical instability from ill-conditioning, image restoration is an ill-posed problem and requires appropriate regularization for stabilization. Among the many techniques, total variation (TV) regularization as proposed by Rudin, Osher and Fatemi [22] is one of the most well-known ones. In this approach, the image u is restored by solving the minimization problem

(1)  min_{u ∈ BV(Ω)} (1/2)‖Ku − z‖²_{L²(Ω)} + α ∫_Ω |Du|,

where Ω ⊂ R² is an open and bounded image domain with a Lipschitz-continuous boundary ∂Ω, K is a linear and continuous blurring operator from L²(Ω) to L²(Ω), i.e., K ∈ L(L²(Ω)), which we assume to be known, and α > 0 is a regularization parameter controlling the trade-off between fitting the degraded image z ∈ L²(Ω) and a smoothness requirement due to the total variation regularization. By BV(Ω) we denote the space of functions of bounded variation, i.e., u ∈ BV(Ω) if u ∈ L¹(Ω) and the BV-seminorm ∫_Ω |Du| is finite. The space BV(Ω) endowed with the norm ‖u‖_{BV(Ω)} = ‖u‖_{L¹(Ω)} + ∫_Ω |Du| is a Banach space; see, e.g., [13].
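To make the interplay of the two terms in (1) concrete, here is a small illustrative sketch (our own, not from the paper: forward differences, unit mesh width, and one-sided handling of the boundary are assumptions of this example) that evaluates a discretized ROF-type energy in the denoising case K = id:

```python
import math

def rof_energy(u, z, alpha, h=1.0):
    """0.5*||u - z||^2 + alpha*TV(u); forward differences, one-sided at the boundary."""
    m, n = len(u), len(u[0])
    fidelity = 0.5 * sum((u[i][j] - z[i][j]) ** 2
                         for i in range(m) for j in range(n)) * h * h
    tv = 0.0
    for i in range(m):
        for j in range(n):
            ux = (u[i + 1][j] - u[i][j]) / h if i + 1 < m else 0.0
            uy = (u[i][j + 1] - u[i][j]) / h if j + 1 < n else 0.0
            tv += math.hypot(ux, uy) * h * h   # pointwise l2-norm of the gradient
    return fidelity + alpha * tv

z = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(rof_energy(z, z, alpha=0.1))  # only the TV term contributes when u = z
```

The fidelity term vanishes for u = z, so the remaining energy is α times the (discrete) total variation, which here measures the single vertical edge in the image.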
The non-differentiability of the TV-term ∫_Ω |Du| is responsible for preserving edges in images, but at the same time, it poses significant numerical challenges. In the recent past, various research efforts have been undertaken to overcome this difficulty; see, e.g., [10,5,8,24,4,21,20,23,16,17] as well as the monograph [31] and the many references therein.
The first order condition or Euler-Lagrange equation of (1) leads to a differential inclusion which, in a regularized form, becomes a system of nonlinear partial differential equations (PDEs). For their numerical solution, multigrid (MG) methods appear to be ideal candidates as they are known to be efficient numerical techniques for solving partial differential equations; see, e.g., [14,27,32]. The convergence behavior of MG techniques usually depends on the type and structure of the differential operator. For linear elliptic second order operators the method has been analyzed and applied successfully, e.g., in [14] and the references therein. Nonlinear variants can be found in [3,14], for instance. In the multigrid literature, the differential operator is usually assumed to enjoy differentiability properties, whereas the system resulting from the Euler-Lagrange equation of (1) is non-differentiable. This necessitates appropriate reformulations of the PDE system and extensions of smoothing steps in order to account for the non-differentiability.
In [29] a multigrid (MG) solver acting on the primal problem (1) was proposed. In that paper it is noted that a straightforward application of a (nonlinear) MG method to a smoothed version of the Euler-Lagrange system does not work well due to the discontinuity in the diffusion coefficient. Rather, the authors propose to resort to a flux formulation similar to flow through a porous medium with fixed porosity (which amounts to a reformulation of the usual "lagged diffusivity" formulation [30]). It appears that this format improves on the straightforward application of MG. Several discretization and smoothing operations for a primal version of multigrid for total variation regularization were investigated in [12]. In [28], multigrid and multilevel techniques are used as preconditioners of the conjugate gradient method applied to the subproblem of a "lagged diffusivity" iteration for solving the total variation regularized problem.
Recently, dual formulations of the total variation problem and associated solution algorithms became popular; see, e.g., [4,7,16,17]. Furthermore, multigrid algorithms have been introduced to solve the dual problem, which is a quadratic minimization problem under a convex constraint; see [26,25]. In [6], a nonlinear multigrid algorithm was proposed to solve the dual formulation of (1), formally given by

(2)  min_{ω} (1/2) ‖α div ω − z‖²_{L²(Ω)}  s.t. |ω|_{l²} ≤ 1 a.e. in Ω,

where ω = (ω₁, ω₂) is the dual variable. The restored image is obtained as u = z − α div ω. In this algorithm, the semi-implicit time-marching scheme of [4] (which we call Chambolle's iteration or method in what follows) is utilized in the smoothing step of the multigrid method. It is known that the convergence rate of the smoothing step is at best linear (in the primal variable) and does not exhibit a resolution independent convergence behavior. Moreover, the dual variable is not necessarily unique at the solution due to the non-trivial kernel of the generalized divergence operator, the adjoint of the derivative in the TV-term. As a consequence of these facts, the efficiency of the multigrid iteration appears to be compromised. Moreover, for practical purposes it turns out that |∇(α div ω − z)|_{l²} close to zero is problematic due to the non-differentiability of the total variation term. This affects the efficiency of the smoother. As a remedy, in [6] (like in the Newton-type scheme [7]) |∇(α div ω − z)|_{l²} is replaced by (|∇(α div ω − z)|²_{l²} + β)^{1/2} with sufficiently large β > 0 to maintain an acceptable smoothing rate, thus turning a non-differentiable term into a C^∞-term. The choice of β implies a decision between a good smoothing rate versus the quality of the reconstruction. In particular, large β leads to artifacts in homogeneous image regions.
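The effect of this β-smoothing can be quantified in a one-line experiment (an illustration under our own parameter choices, not the paper's data): replacing |g| by (g² + β)^{1/2} introduces an error that is largest at g = 0 and never exceeds √β, which is precisely where the trade-off between a good smoothing rate (large β) and reconstruction quality (small β) originates:

```python
import math

def smoothed_norm(g, beta):
    # C-infinity replacement of |g| as used in smoothing approaches of this kind
    return math.sqrt(g * g + beta)

beta = 1e-2
# sample the gap between the smoothed and the exact absolute value on [-3, 3]
err = max(smoothed_norm(g / 100.0, beta) - abs(g / 100.0) for g in range(-300, 301))
print(err <= math.sqrt(beta))  # True: the gap never exceeds sqrt(beta)
```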
In this paper we aim at the introduction of a nonlinear multigrid algorithm in the framework of the dual problem, which overcomes the above mentioned limitations. In fact, the Fenchel pre-dual of (1) is known to be a convex quadratic problem subject to bound-constraints [16]. Since the objective function of the pre-dual problem involves the divergence operator, which has a nontrivial kernel, we propose two models with different regularization terms, respectively, to avoid the nonuniqueness of the dual solution. Then, based on these two models we utilize the nonlinear multigrid algorithm to solve the pertinent Euler-Lagrange equations. Compared to the smoothed model in [6], our two models are much closer to the original nonsmooth total variation formulation. Moreover, one of the models allows for the application of so-called semismooth Newton solvers [15] in the associated function space setting resulting in a resolution independent solution algorithm. Combined with a nonlinear multigrid algorithm, we propose a line Gauss-Seidel-semismooth-Newton step as the smoother, which provides better smoothing rates than the smoother suggested in [6]. Line smoothing is necessary due to the structure of the differential operator involved in the Euler-Lagrange equations.
The outline of the rest of the paper is as follows. In Section 2 we recall some basic facts from convex analysis and state the pre-dual of (1) in the sense of Fenchel. Section 3 is devoted to the description of our nonlinear multigrid algorithm. Section 4 gives numerical results to demonstrate the performance of the new method. Moreover, we compare our method with the one in [6]. Finally, conclusions are drawn in Section 5.

2. Pre-dual problem of the total variation model for image restoration. We start by recalling the Fenchel duality concept in infinite dimensional spaces in a form that is convenient for our work; see, e.g., [11] for details. For this purpose let V and Y be Banach spaces with topological duals V* and Y*, respectively. Let F : V → R ∪ {∞} and G : Y → R ∪ {∞} be convex, lower semi-continuous functionals such that there exists v₀ ∈ V with F(v₀) < ∞, G(Λv₀) < ∞, G is continuous at Λv₀, and Λ ∈ L(V, Y) is a continuous linear operator from V to Y. Then we have

(3)  inf_{v ∈ V} { F(v) + G(Λv) } = sup_{q ∈ Y*} { −F*(Λ*q) − G*(−q) },

where the conjugate of F is defined by

(4)  F*(v*) = sup_{v ∈ V} { ⟨v, v*⟩ − F(v) },

and analogously for G*. In order to compute the Fenchel dual of (1) formally, we set Λ = ∇, F(v) = (1/2)‖Kv − z‖²_{L²(Ω)} and G(q) = α ∫_Ω |q|_{l²}. Based on the definition of the Fenchel conjugate (4), we compute F* and G*: the former involves (KK*)^{−1}, which is assumed to be invertible, and the latter is the indicator function I_S of the set S = { q : |q|_{l²} ≤ α a.e. in Ω }. Here, ⟨·, ·⟩ denotes a duality pairing. Hence, in view of (3), the formal dual of (1) is given (here stated for the denoising case K = id; the general case additionally involves (KK*)^{−1} in the fidelity term) by

(6)  min_{p} (1/2)‖div p + z‖²_{L²(Ω)}  subject to (s.t.) |p|_{l²} ≤ α almost everywhere (a.e.) in Ω.
Since the kernel of the divergence operator is nontrivial, we add a dual regularization term with associated regularization parameter µ > 0 to enforce uniqueness of the solution of the pre-dual. Here, we consider the following two variants:

(7)  min_{p} (1/2)‖div p + z‖²_{L²(Ω)} + (µ/2)‖∇p‖²_{L²(Ω)}  s.t. |p|_{l²} ≤ α a.e. in Ω,

(8)  min_{p} (1/2)‖div p + z‖²_{L²(Ω)} + (µ/2)‖p‖²_{L²(Ω)}  s.t. |p|_{l²} ≤ α a.e. in Ω.

Note that the regularization in (7) lifts p into H¹₀(Ω), whereas the second regularization maintains the function space regularity of the original pre-dual problem (6). Like in [17], we are able to identify the effect of the dual regularization in (8) in the primal problem. In fact, the total variation term ∫_Ω |Du| in (1) is approximated by ∫_Ω Φ_{µα}(∇u), where Φ_{µα}(w) is a Huber-type function (see [19]) which smoothes the total variation regularization only locally. For the first regularization the effect in the primal problem is non-local in the sense that the regularity of the dual variable p increases from H₀(div) to H¹₀(Ω). In [16], see also [17], it was shown that the penalized version of (8) given by

(9)  min_{p} (1/2)‖div p + z‖²_{L²(Ω)} + (µ/2)‖p‖²_{L²(Ω)} + (γ/2)‖max(0, |p|_{l²} − α)‖²_{L²(Ω)}

with a penalty parameter γ > 0 and the max-term penalizing violations of the pointwise a.e. constraints in (8) can be solved by a semismooth Newton iteration acting on the Euler-Lagrange equation. It is well known that γ → ∞ enforces the constraint in the limit, with the constraint violation bounded in terms of a constant C > 0 (independent of γ), where p_γ denotes the solution of (9). The semismooth Newton iteration converges locally at a superlinear rate in function space and, as a consequence, exhibits an (image) resolution independent convergence behavior in the discrete setting. This suggests to utilize such a semismooth Newton solver as the smoothing iteration for the nonlinear multigrid method given below. Motivated by (9), we approximate (7) by

(10)  min_{p} (1/2)‖div p + z‖²_{L²(Ω)} + (µ/2)‖∇p‖²_{L²(Ω)} + (γ/2)‖max(0, |p|_{l²} − α)‖²_{L²(Ω)}.

The Euler-Lagrange equations for (10) and (9), respectively, read (stated for the denoising case K = id, which is the case treated in the remainder of the paper):

(11)  −µ∆p̄ − ∇div p̄ + γ max(0, 1 − α/|p̄|_{l²}) p̄ = ∇z,

(12)  µ p̄ − ∇div p̄ + γ max(0, 1 − α/|p̄|_{l²}) p̄ = ∇z.

Here, p̄ denotes the unique solution of (10) or (9), respectively. In the next section, we introduce a nonlinear multigrid algorithm to solve (11) and (12) in the denoising case, i.e., for K equal to the identity operator.
The restored image is obtained by ū = div p̄ + z. Variants of nonlinear multigrid methods in the deblurring case, i.e., for K not the identity, might be conceivable but go beyond the scope of the present work.
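For intuition, a Huber-type local smoothing of the kind referred to above can be sketched as follows (this uses the standard Huber form with threshold t = µα; the exact constants used in the paper may differ, so the values below are an assumption for illustration):

```python
import math

def huber(w, t):
    """Standard Huber function: quadratic for |w| <= t, affine with slope 1 beyond."""
    a = abs(w)
    if a <= t:
        return a * a / (2.0 * t)   # local quadratic smoothing near zero
    return a - t / 2.0             # equals |w| up to the constant t/2

t = 0.05  # illustrative threshold, playing the role of mu*alpha
print(huber(0.0, t), huber(t, t), huber(1.0, t))
```

Note that value and slope match at |w| = t, so only a neighborhood of w = 0 is modified; away from zero the TV-term is reproduced exactly up to an additive constant, which is the "local" smoothing referred to in the text.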
3. Nonlinear multigrid method for the dual problem. For (11) and (12) in the denoising case, i.e., K = id, yielding

(13)  −µ∆p̄ − ∇div p̄ + γ max(0, 1 − α/|p̄|_{l²}) p̄ = ∇z,

(14)  µ p̄ − ∇div p̄ + γ max(0, 1 − α/|p̄|_{l²}) p̄ = ∇z,

we introduce a nonlinear multigrid scheme for restoring high resolution images efficiently. From now on we proceed in discrete terms, but, for the sake of simplicity and readability, we keep the notation from the continuous context. Assume that the discrete image domain Ω contains (m − 1) × (n − 1) pixels, and define the discrete gradient operator ∇ = (∂ₓ, ∂ᵧ) corresponding to the discrete derivatives in the x- and y-direction, respectively. Furthermore, we have the divergence operator div = −∇*, the negative adjoint of the discrete gradient. For convenience, we denote the discretized versions of (13) and (14) by

(15)  A₁(p) := −µ∆p − ∇div p + γ max(0, 1 − α/|p|_{l²}) ⊙ p = ∇z,

(16)  A₂(p) := µ p − ∇div p + γ max(0, 1 − α/|p|_{l²}) ⊙ p = ∇z,

where the max-term and the quotient are evaluated pixelwise and ⊙ denotes the Hadamard product. Then, the nonlinear full approximation scheme (FAS) multigrid algorithm [32,27,9] for solving either (15) or (16) reads as follows.
1: Set i ∈ {1, 2} and r¹ = ∇z. Start from the finest level l = 1 and do j steps of a nonlinear multigrid V-cycle for solving A_i(p) = r¹, which we call NMG(p¹, r¹, 1).
2: One step of a V-cycle of our nonlinear multigrid procedure NMG(p^l, r^l, l) proceeds as follows:
• if l = L, the coarsest level, perform a high accuracy solve of A^L_i(p) = r^L and let p^L denote the corresponding solution.
• else, on level l, do
- Pre-smoothing: p^l = S^{ν₁}_l(p^l, r^l).
- Restrict to the coarser grid: p^{l+1} = R^{l+1}_l p^l and r^{l+1} = R^{l+1}_l (r^l − A^l_i(p^l)) + A^{l+1}_i(p^{l+1}).
- Set the initial solution on level l + 1 as p̃^{l+1} = p^{l+1} and apply NMG(p̃^{l+1}, r^{l+1}, l + 1) recursively.
- Interpolate the correction and compute the corrected approximation: p^l ← p^l + P^l_{l+1}(p̃^{l+1} − p^{l+1}).
- Post-smoothing: p^l = S^{ν₂}_l(p^l, r^l).
Here P^l_{l+1} denotes an interpolation operator from level l + 1 to level l, R^{l+1}_l a restriction operator from level l to level l + 1, and ν₁ and ν₂ are the numbers of the pre-smoothing steps and the post-smoothing steps on each level l, respectively.
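The V-cycle above can be sketched in code. The following minimal 1D FAS example is purely illustrative: the model problem −v'' + v³ = f with zero Dirichlet data, the pointwise nonlinear Gauss-Seidel smoother, full-weighting restriction, and linear interpolation are stand-ins for the operators A_i and the line Gauss-Seidel-semismooth-Newton smoother used in the paper.

```python
import math

def A(v, h):  # discrete nonlinear operator: -v'' + v^3, zero Dirichlet BCs
    n = len(v)
    return [(2 * v[i] - (v[i - 1] if i > 0 else 0.0)
             - (v[i + 1] if i + 1 < n else 0.0)) / h ** 2 + v[i] ** 3
            for i in range(n)]

def smooth(v, f, h, sweeps):  # nonlinear Gauss-Seidel: one scalar Newton step per point
    n = len(v)
    for _ in range(sweeps):
        for i in range(n):
            left = v[i - 1] if i > 0 else 0.0
            right = v[i + 1] if i + 1 < n else 0.0
            r = (2 * v[i] - left - right) / h ** 2 + v[i] ** 3 - f[i]
            v[i] -= r / (2 / h ** 2 + 3 * v[i] ** 2)
    return v

def restrict(v):  # full weighting onto the coarse interior nodes
    return [0.25 * v[2 * i] + 0.5 * v[2 * i + 1] + 0.25 * v[2 * i + 2]
            for i in range((len(v) - 1) // 2)]

def interpolate(v, n_fine):  # linear interpolation back to the fine grid
    out = [0.0] * n_fine
    for i, vi in enumerate(v):
        out[2 * i + 1] = vi
    for i in range(0, n_fine, 2):
        left = out[i - 1] if i > 0 else 0.0
        right = out[i + 1] if i + 1 < n_fine else 0.0
        out[i] = 0.5 * (left + right)
    return out

def fas_vcycle(v, f, h):
    n = len(v)
    if n <= 3:
        return smooth(v, f, h, 50)                 # coarsest level: "exact" solve
    v = smooth(v, f, h, 2)                         # pre-smoothing
    res = [fi - ai for fi, ai in zip(f, A(v, h))]
    vc = restrict(v)
    fc = [a + r for a, r in zip(A(vc, 2 * h), restrict(res))]  # FAS right-hand side
    vc_new = fas_vcycle(list(vc), fc, 2 * h)       # recursive coarse solve
    corr = interpolate([a - b for a, b in zip(vc_new, vc)], n)
    v = [a + b for a, b in zip(v, corr)]           # coarse-grid correction
    return smooth(v, f, h, 2)                      # post-smoothing

# manufactured solution: f = A(v_true), so the discrete solution is v_true itself
n, h = 15, 1.0 / 16
v_true = [math.sin(math.pi * (i + 1) * h) for i in range(n)]
f = A(v_true, h)
v = [0.0] * n
for _ in range(25):
    v = fas_vcycle(v, f, h)
err = max(abs(a - b) for a, b in zip(v, v_true))
print(err < 1e-6)
```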
Next, we describe each element of the FAS multigrid algorithm for solving (13) and (14) in detail.
3.1. Line Gauss-Seidel-Newton smoother. The convergence of the overall multigrid scheme hinges on the performance and reduction rates of the smoothing iteration. For this reason, we first focus on the choice of the smoother in S^{νᵢ}_l(p^l, r^l) with i = 1, 2. Note that according to [27] the h-ellipticity property of the discrete differential operator is important when constructing a pointwise smoother. Since ∇div in (13) and (14) does not obey such an h-ellipticity property, we have to proceed differently, i.e., with an alternative to a pointwise smoother. In fact, as a remedy we propose to use a line Gauss-Seidel-Newton iteration as a smoother in our nonlinear multigrid algorithm.
It is well-known that line relaxation is particularly efficient when solving anisotropic problems due to its effectiveness in the direction of strong coupling. As the relevant term in our case is −∇div p, we clearly find that the directions of strong coupling of p₁ and p₂ are different. In fact, for p₁ the x-direction couples strongly, whereas for p₂ the y-direction couples strongly. Consequently, we utilize an alternating line relaxation in our smoothing step. Below we exemplarily discuss the x-line Gauss-Seidel-Newton iteration in detail. The y-direction relaxation can be obtained similarly.
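Each x-line relaxation updates one row of unknowns at a time by solving a small (block-)tridiagonal system; for scalar unknowns this is the classical Thomas algorithm, sketched below with made-up coefficients for illustration (the paper's systems are the 2 x 2-block analogues arising from (18)):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal, d = rhs."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# a -u'' = f type row system with 4 unknowns: diagonal 2, off-diagonals -1
x = thomas([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [1, 0, 0, 1])
print(x)  # solution is approximately [1.0, 1.0, 1.0, 1.0]
```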
We discretize the Laplace operator by the standard five-point stencil with homogeneous Dirichlet boundary conditions and use forward differences for the gradient and the corresponding discrete adjoint scheme for the divergence operator. The operators appearing in (13) and (14) are then built from these difference stencils, where h_x and h_y denote the widths of the subintervals in the x-direction and y-direction, respectively. Here (and in our numerics) we consider the image domain with boundary given by Ω̄ = [0, 1]². Corresponding to the assumption at the beginning of Section 3, on the discrete level it contains (m + 1) × (n + 1) pixels in total. Hence, we have h_x = 1/m and h_y = 1/n, and the indices i = 0 or i = m, respectively j = 0 or j = n, belong to pixels on the boundary ∂Ω, where the values of v are assumed to be known.
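The duality div = −∇* between these discrete operators can be checked numerically. The sketch below (with zero Dirichlet padding and unit mesh widths as simplifying assumptions of this example) verifies the inner-product identity ⟨∇u, p⟩ = ⟨u, −div p⟩ on a 2-by-2 grid:

```python
def grad(u):
    """Forward differences with zero padding outside the grid."""
    m, n = len(u), len(u[0])
    gx = [[(u[i + 1][j] if i + 1 < m else 0.0) - u[i][j] for j in range(n)]
          for i in range(m)]
    gy = [[(u[i][j + 1] if j + 1 < n else 0.0) - u[i][j] for j in range(n)]
          for i in range(m)]
    return gx, gy

def div(px, py):
    """Backward differences: the discrete adjoint scheme matching grad above."""
    m, n = len(px), len(px[0])
    return [[(px[i][j] - (px[i - 1][j] if i > 0 else 0.0))
             + (py[i][j] - (py[i][j - 1] if j > 0 else 0.0)) for j in range(n)]
            for i in range(m)]

u = [[1.0, 2.0], [3.0, 5.0]]
px = [[0.5, -1.0], [2.0, 0.25]]
py = [[1.5, 0.0], [-0.5, 1.0]]
gx, gy = grad(u)
lhs = sum(gx[i][j] * px[i][j] + gy[i][j] * py[i][j] for i in range(2) for j in range(2))
d = div(px, py)
rhs = -sum(u[i][j] * d[i][j] for i in range(2) for j in range(2))
print(abs(lhs - rhs) < 1e-12)  # True: div is the negative adjoint of grad
```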
The x-line Gauss-Seidel iterations for the discretized versions of (13) and (14) are denoted as

(17)  A_GS(p^{k+1}_{·,1}, …, p^{k+1}_{·,j−1}, p^{k+1}_{·,j}, p^k_{·,j+1}, …, p^k_{·,n}) = r^k,

where p_{·,j} is the jth row of p, and the indices k and k + 1 denote the current and the new approximation, respectively. We note that these nonlinear equations for the unknown p^{k+1}_{·,j} = ((p₁^{k+1})_{·,j}, (p₂^{k+1})_{·,j}) are non-smooth, i.e., not necessarily Fréchet-differentiable. This is due to the presence of the max-function. But they can be solved efficiently by a semismooth Newton method, which converges locally at a superlinear rate [17]. Applying a generalized Newton step to (17) with initial value p^k_{·,j} yields a linear system (18) with a 2 × 2 block coefficient matrix (H_{st})_{s,t=1,2}, where H₂₁ = H₁₂.
Remark 1. For the nonlinear equation (13), the blocks H_{st} collect the contributions of −µ∆ − ∇div and of the generalized derivative of the max-term, which involves the characteristic function

χ^t_{i,j} = 1, if |p^t_{i,j}|_{l²} > α,  and  χ^t_{i,j} = 0, else.

The generalized Newton step for equation (14) is obtained analogously, with the term µ id replacing −µ∆. In order to study the efficiency of the nonlinear multigrid algorithm with these line Gauss-Seidel-Newton smoothers, in the following we use local Fourier analysis to measure the largest amplification factor of the relaxation scheme (18).
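The mechanics of such a generalized Newton step can be seen on a scalar analogue (our own toy model, not the paper's system): solve µp + γ max(0, p − α) = r for p ≥ 0, where the characteristic function χ enters the generalized derivative exactly as in Remark 1. Because the residual is piecewise linear, the iteration terminates after at most a few steps, illustrating the fast local convergence of the semismooth Newton method:

```python
def ssn(r, mu, gamma, alpha, p=0.0, tol=1e-12):
    """Semismooth Newton for F(p) = mu*p + gamma*max(0, p - alpha) - r = 0."""
    for it in range(1, 50):
        chi = 1.0 if p > alpha else 0.0          # generalized derivative of the max-term
        F = mu * p + gamma * max(0.0, p - alpha) - r
        if abs(F) < tol:
            return p, it
        p -= F / (mu + gamma * chi)              # generalized Newton step
    return p, it

p, iters = ssn(r=5.0, mu=1e-3, gamma=1e3, alpha=0.1)
print(p, iters)  # converges in a handful of steps despite the kink at p = alpha
```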

3.1.1. The smoothing property analysis. The convergence behavior of a multigrid algorithm depends strongly on the smoother. A convenient tool for the study of the expected reduction or smoothing rate is local Fourier analysis.
Since the smoothing rate of the nonlinear multigrid algorithm proposed in [6] is mainly limited by the case when |(∇u^{k+1})_{i,j}|_{l²} is close to zero, in this section we mostly focus on this case, i.e., |p^{k+1}_{i,j}|_{l²} ≤ α (see the arguments right before Theorem 1). Then, we have max(0, 1 − α/|p^{k+1}_{i,j}|_{l²}) = 0. In the following, for simplifying the formulas, we assume that the numbers of the intervals in each row and each column are equal to n, i.e., m = n and h := 1/n = h_x = h_y. The extension to m ≠ n is straightforward.
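As a self-contained illustration of how local Fourier analysis produces a smoothing rate (using the classical 1D Laplacian with lexicographic Gauss-Seidel as a simple stand-in for the operators analyzed here, not the paper's amplification matrices), one maximizes the modulus of the error-propagation symbol over the high frequencies:

```python
import cmath
import math

def gs_symbol(theta):
    # Fourier symbol of the Gauss-Seidel error propagation for the 1D Laplacian
    return cmath.exp(1j * theta) / (2.0 - cmath.exp(-1j * theta))

# smoothing factor: maximum of |symbol| over high frequencies theta in [pi/2, pi]
thetas = [math.pi / 2 + k * (math.pi / 2) / 1000 for k in range(1001)]
mu = max(abs(gs_symbol(t)) for t in thetas)
print(mu)  # about 0.447 = 1/sqrt(5), attained at theta = pi/2
```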
When |p^{k+1}_{i,j}|_{l²} > α, the γ-terms in (13) and (14) are positive instead of zero, which increases the diagonal elements of the first matrix in M^i_{θ_α,θ_β}, i = 1, 2, and, as a result, the smoothing rate improves, i.e., it becomes smaller.
The amplification matrices of the relaxation in the x-direction and in the y-direction are related by interchanging the roles of the two frequencies (and of the components p₁ and p₂), showing that the smoothing rate of the y-line Gauss-Seidel iteration is equal to that of the x-line Gauss-Seidel iteration.
In Table 1 we list the smoothing rate max_{θ_α,θ_β} ρ(M_{θ_α,θ_β}) of the line Gauss-Seidel smoother for the nonlinear equations (13) and (14), respectively. Comparing the results for the two models, we see that when using the line Gauss-Seidel iteration for (13) we obtain a better smoothing rate than for (14). As we mentioned in Section 2, the dual regularization parameter µ is related to a (local) smoothing of the total variation. It therefore has the same effect as C := |∇(div ω^k − z/α)|_{l²} = |∇u^k|_{l²} in the Fourier analysis of the multigrid method proposed in [6]. Hence, we are able to compare the smoothing rate of our method with the one in Table 4.1 of [6] and find that our method provides a better smoothing rate. Even when compared with the improved method in Table 4.2 of [6], our method still exhibits a smaller rate.

3.2. Restriction and interpolation. For the sake of completeness, in this section we provide our choices for the restriction and interpolation operations in our multigrid algorithm. These operators are standard in the multigrid literature.

3.3. Initialization. In our smoothing steps, for solving the nonlinear equations (13) and (14) we utilize a semismooth Newton method, which converges superlinearly provided that the initial value p¹ is sufficiently close to the solution. In our numerics we found the following initialization, which is related to the unconstrained version of the dual problem, to work well: Consider problem (1) with K = id and the associated Euler-Lagrange equation, which formally leads to ū = div p̄ + z with |p̄|_{l²} ≤ α in Ω. Disregarding the pointwise constraint on p̄, we initialize our algorithm by p¹ = ∇q̄, where q̄ = q̄(ū) solves

−∆q̄ = z − ū in Ω,  ∂q̄/∂n = 0 on ∂Ω.
Note that p¹ satisfies div p¹ = ū − z ∈ L²(Ω) and p¹ · n = 0 on ∂Ω. Thus, p¹ ∈ H₀(div); see Theorem 1 for the notation. In practice, we replace ū by ũ, a smoothed version of z obtained from a convolution of a Gaussian kernel with z. In our numerical experiments, we use the 7-by-7 Gaussian kernel with standard deviation 1. Then, we set p¹ := ∇q̄(ũ), yielding div p¹ = ũ − z in Ω and p¹ · n = 0 on ∂Ω.
In Table 2 we compare the above initialization strategy with an initialization utilizing the full multigrid approach (i.e., coarse-to-fine mesh refinement only), which is a popular choice in the literature. For a detailed discussion concerning the performance of both initializations we refer to the next section.
4. Numerical results. In this section, we provide numerical results for studying the behavior of our nonlinear multigrid method with respect to its image denoising capabilities and the speed of convergence. Furthermore, we compare our method with another nonlinear multigrid method based on Chambolle's iteration proposed in [6], the CCC-method for short. In all of our experiments reported on below the image intensity range is [0, 255].

Figure 1. Results of the method proposed in [6] and our method for restoring the image corrupted by Gaussian noise with σ = 0.1: (a) original image, (b) noisy image, (c) by the CCC-method in [6] with β = 10⁻², (d) by the CCC-method in [6] with β = 10⁻⁶, (e) by our method for the model (10), (f) by our method for the model (9).

Unless otherwise specified, in our nonlinear multigrid method we utilize five complete line Gauss-Seidel iterations (each consisting of one x-line and one y-line Gauss-Seidel iteration) as pre-smoothing and post-smoothing on each level, i.e., ν₁ = 10 and ν₂ = 10. The algorithm is stopped as soon as the initial residual is reduced by a factor of 10⁻⁷. For a fair comparison, we run the CCC-method with the same parameter selections for (ν₁, ν₂, α) and the same stopping conditions.

Example 1. In the first experiment, we use a simple 127-by-127 gray-level image, which is corrupted by Gaussian white noise with noise level σ = 0.1; see Figure 1. In Table 2 we list the number of cycles of the nonlinear multigrid method needed to reach the stopping criterion (denoted by k), the CPU time (in seconds), and the PSNR-values [2]. Although both methods denoise the image by solving the dual problem of the total variation model (1), they utilize different ways to overcome the difficulty arising from the constraint in (6). In the CCC-method, a fixed point type projection scheme is utilized within a semi-implicit time-marching scheme [4] used as the smoother.
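For reference, the PSNR-values reported below can be computed as in the following sketch (assuming the standard definition with peak value 255, matching the intensity range [0, 255] of our experiments; the 2-by-2 images are illustrative only):

```python
import math

def psnr(u, v, peak=255.0):
    """Peak signal-to-noise ratio: 10*log10(peak^2 / MSE)."""
    m, n = len(u), len(u[0])
    mse = sum((u[i][j] - v[i][j]) ** 2 for i in range(m) for j in range(n)) / (m * n)
    return 10.0 * math.log10(peak * peak / mse)

clean = [[100.0, 120.0], [140.0, 160.0]]
noisy = [[c + 5.0 for c in row] for row in clean]
print(round(psnr(clean, noisy), 2))  # uniform error of 5 gives MSE = 25
```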
In order to avoid a smoothing rate close to 1 in homogeneous regions (where the gradient is (close to) zero), the CCC-method uses the smooth approximation ∫_Ω (|∇u|²_{l²} + β)^{1/2} with β > 0 to avoid the non-differentiable TV-term. As β ↓ 0, the solution of the CCC-method approaches the one of (1). Moreover, the larger β becomes, the better the CCC-method converges. It is, however, known (compare plots (c) and (d) of Figure 1) that the quality of the reconstruction degrades for large values of β.

Table 2. Comparison of different methods for denoising the image in Figure 1. (For the last rows, the full multigrid algorithm is utilized as initialization.)
In the following experiments, we set β = 10⁻⁶ in the CCC-method, which corresponds to the case C = 10⁻³ in the local Fourier analysis in Tables 4.1 and 4.2 of [6]. Since the CCC-method relies on the formulation (2) and uses an iterative fixed point algorithm as a smoother, it leads to a very slow convergence for high precision requirements. From Table 2 we find that with β = 10⁻⁶ and 10 pre- and post-smoother iterations the CCC-method still needs almost 150 nonlinear multigrid cycles. However, our method, based on a penalization/regularization technique to obtain the unconstrained minimization problems (10) and (9), respectively, uses a line Gauss-Seidel-semismooth-Newton method as a smoother, where the pertinent generalized Newton iteration converges locally at a superlinear rate. It is readily seen that the solutions of (10) or (9) approximate the solution of the total variation model (1) as µ ↓ 0 and γ ↑ ∞. Hence, in our experiments we set µ = 10⁻³ (µh² = 10⁻³ for (9)) and γ = 10¹² for our method. In this case, with the initialization introduced in Section 3.3, our method reaches the stopping condition after one cycle of the nonlinear multigrid scheme. Although in each cycle our method needs more computations, with significantly fewer iterations it is more efficient. As mentioned earlier, we also provide the results obtained when our method is equipped with the full multigrid initialization. Due to the good approximation quality of the initial point resulting from the new strategy introduced in Section 3.3 and the fast (i.e., superlinear) local convergence of the semismooth Newton iteration, our method requires fewer nonlinear multigrid cycles and, hence, less CPU-time (compare the figures in the last four rows of Table 2).
Because both methods (ours with µ = 10⁻³ and γ = 10¹² and the CCC-method with β = 10⁻⁶) solve the total variation model (1), from Figure 1 (d), (e) and (f) and the PSNR-values listed in Table 2 we see that the obtained results are very similar.

Example 2. In this example, we compare our method with the CCC-method when restoring the degraded image at different image resolutions. This allows us to study "mesh-(in)dependence" (i.e., resolution (in)dependence) effects. It is interesting that our semismooth Newton solver in the smoothing step is known to be mesh independent as it admits a function space analysis [17,18], whereas the fixed point iteration of [4] does not exhibit such a behavior.

Table 3. Comparison of different methods (our method for (10), our method for (9), and the CCC-method in [6]) for denoising the image "Boat" at different sizes.
The test image is the 511-by-511 gray-level image "Boat", which can be downloaded from [1] (see Figure 2 for reference) and which is corrupted by Gaussian white noise with a noise level of σ = 0.1. This image is then resized to 255-by-255 and 127-by-127 by the nearest neighbor scheme. Notice that here we are not aiming at quality aspects of downsampling, which would require an appropriate combination of low-pass filtering and sub-sampling; rather, we study the algebraic aspect of mesh independence of an iterative solver. As the total variation model (1) is independent of the image resolution for a fixed image domain Ω, we adjust α based on the restoration runs for the 127-by-127 image by selecting the α-value giving the best PSNR-value. This leads us to choose α = 0.04. This α-value is then kept for restoring the other images at different resolutions. The approximation chosen in the CCC-method exhibits a resolution dependence, i.e., it always takes pixel distance 1, whereas our method chooses 1/n, where (n − 1)-by-(n − 1) is the actual image resolution. Thus, instead of keeping the same α when restoring the images at different resolutions, we need to set α = 0.04(n + 1). In Table 3 we list the number of cycles of the nonlinear multigrid methods needed to reach the stopping condition (denoted by k), the CPU time (in seconds), and the PSNR-values for restoring the images at different image resolutions. We can see that our method reaches the stopping condition within one cycle of the nonlinear multigrid scheme, which is much less than for the CCC-method. As expected, the PSNR-values of the restored images are very similar, which is due to both methods solving the same total variation model.

Example 3. Using the 127-by-127 test image of Example 2, we study the convergence rates of our methods and the CCC-method.
In order to study the influence of the smoother on the convergence rate, we set the numbers of both the pre- and post-smoothing steps equal to 30, and, for ease of accessibility, calculate the ratios residual_{k+2}/residual_k of the residual after k + 2 cycles and the one after k cycles for k = 1, 3, …, 13; see Table 4.

Table 4. Residual reduction ratios residual_{k+2}/residual_k for our method based on (10), our method based on (9), and the CCC-method in [6].

We find that the ratio obtained by our method based on (10) stays around 0.5. But the ratio of the CCC-method increases quickly as the multigrid cycles continue. Therefore, the CCC-method quickly slows down, which is a clear drawback when rather high precision is needed. This behavior can be attributed to the linear convergence rate of the iteration of [4] in the smoothing steps. Note, however, that the ratios of our method based on (9) are not always less than 1. This is due to the fact that the identity operator coming from the L²-norm regularization in (9) does not yield a smoothing operation (in contrast to the Laplacian obtained from the regularization in (10)). This adversely affects the multigrid scheme.

5. Conclusion. Nonlinear multigrid methods for total variation based image denoising are challenged by the non-smooth character of the Euler-Lagrange equations, or better the differential inclusion, associated with the pertinent minimization problem. An appropriate numerical treatment depends on a reformulation of the PDE system as a non-smooth equation and on the choice of an appropriate smoother. For this purpose, and due to the type of the resulting PDE system, a dual formulation of the TV-problem appears better suited than a purely primal formulation, since the latter suffers from oscillatory and even discontinuous coefficients in the partial differential operator. The dual problem involves a vector field (rather than a scalar-valued function as in the primal problem) with a PDE-operator exhibiting a component-dependent strong coupling. The latter requires appropriate smoothing in the multigrid scheme. In our case, a line Gauss-Seidel smoother turns out to be efficient, yielding satisfactory smoothing rates. In addition, the line smoother has to operate on non-differentiable, but still semismooth, equations. There are several options for solving these non-smooth problems, such as Chambolle's iteration or semismooth Newton methods. The mesh independence properties of semismooth Newton solvers make the latter solver class appealing. In our numerics we found that while the number of multigrid cycles deteriorates for Chambolle's method as the image resolution is refined, our semismooth Newton iteration based multigrid cycle remains stable.