A FAST MODIFIED NEWTON’S METHOD FOR CURVATURE BASED DENOISING OF 1D SIGNALS

Abstract. We propose a novel fast numerical method for the denoising of 1D signals based on curvature minimization. Motivated by the primal-dual formulation for total variation minimization introduced by Chan, Golub, and Mulet, the proposed method uses auxiliary variables to reformulate the stiff terms present in the Euler-Lagrange equation, which is a fourth-order differential equation. A direct application of Newton's method to the resulting system of equations often fails to converge. We propose a modified Newton's iteration which exhibits local superlinear convergence and global convergence in practical settings. The method is much faster than other existing methods for the model. Unlike all other existing methods, it also does not require tuning any additional parameter besides the model parameter. Numerical experiments are presented to demonstrate the effectiveness of the proposed method.

1. Introduction. Images acquired by imaging systems are unavoidably contaminated with noise. Image denoising is used to recover the clean image from the observed image. In additive noise models, the acquired image f defined on an open domain Ω ⊂ R² is modeled as the sum of the original clean image u and the noise n. We consider Gaussian white noise, where the noise n is independent of the image u and {n(x)}_{x∈Ω} are independent and follow a zero-mean Gaussian distribution with a common variance σ².
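For concreteness, noise at a prescribed SNR (such as the 30dB level used in the experiments later in the paper) can be synthesized as in the following sketch; the test signal and the function name are illustrative, not from the paper.

```python
import numpy as np

def add_gaussian_noise(u, snr_db, seed=None):
    """Add zero-mean white Gaussian noise to a clean signal u so that the
    resulting signal-to-noise ratio is approximately snr_db (in dB)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(u ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return u + rng.normal(0.0, np.sqrt(noise_power), size=u.shape)

x = np.linspace(0.0, 1.0, 4096)
u_clean = np.sin(2.0 * np.pi * x) + (x > 0.5)   # smooth part plus a jump
f = add_gaussian_noise(u_clean, 30.0, seed=0)   # observed signal at 30 dB
```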
A variety of approaches for image denoising have been proposed. They include classical linear and non-linear filtering, wavelet shrinkage, statistical models, partial differential equation (PDE) models, and variational methods [3,12]. We are particularly interested in variational methods. In such methods, the denoised image is exhibited as the minimizer of an objective function. The main advantage of this approach is that geometric properties of the denoised image are directly controlled. A classical method of this type is the well-known total variation (TV) model proposed by Rudin, Osher, and Fatemi [29]. Given an observed image f ∈ L²(Ω), the TV model seeks the minimizer of the energy functional

(1) E_TV(u) := ∫_Ω (f − u)² dx + λ ∫_Ω |∇u| dx

over the set of all functions on Ω with bounded variation. Here, ∫_Ω (f − u)² dx is the data fidelity term, ∫_Ω |∇u| dx is the TV regularization term, and λ > 0 is a tuning parameter. The fidelity term ensures that the solution u stays close to f, so that the major signals are inherited. The regularization term penalizes the presence of oscillations (noise) in the solution to achieve denoising. The main advantage of TV regularization is that edges, as discontinuities of u, are preserved. In contrast, the classical H¹ regularization ∫_Ω |∇u|² dx, the ancestor of TV regularization, smears out all edges. While the TV model is a simple and elegant model that can produce denoised images with good quality and sharp edges, there are some limitations on the kind of features that can be recovered. For example, intensity contrast, corners, and textures are not preserved [14]. Moreover, the model privileges piecewise constant features, so that smooth regions can become staircases. To fix these issues, various new models have been proposed. Textures can be separated through the use of the G-norm [25].
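A discrete 1D analogue of (1) is easy to state; the following sketch (our own illustration, using a simple forward-difference total variation) evaluates it:

```python
import numpy as np

def tv_energy_1d(u, f, lam, h=1.0):
    """Discrete 1D analogue of the TV energy (1):
    h * sum((f - u)^2) + lam * sum(|u_{i+1} - u_i|)."""
    fidelity = h * np.sum((f - u) ** 2)
    total_variation = np.sum(np.abs(np.diff(u)))
    return fidelity + lam * total_variation
```

Note that for a piecewise constant signal the TV term counts only the jump heights, which is why TV regularization preserves edges while penalizing oscillations.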
The inf-convolution model [8], the spatially adaptive TV model [5], and numerous models based on higher-order derivatives [8,13,11,18,22,31,34] can be used to reduce the staircase effect. In particular, we focus on the mean curvature model proposed by Zhu and Chan [37]. Various kinds of curvature of curves and surfaces have also been used in other image and geometric processing tasks, for example, image inpainting [1,2,17,23,24], image segmentation [10,16,21,26,27,28,36,35] and surface fairing [15].
In the mean curvature model, the image is viewed as a parameterized surface {(x, u(x)) : x ∈ Ω} in R³. The mean curvature κ of the surface at (x, u(x)) is given by

κ(x) = div( ∇u(x) / √(|∇u(x)|² + 1) ),

where ∇u(x)/√(|∇u(x)|² + 1) is the surface normal projected on the image plane. Here, ∇ is the gradient operator and div is the divergence operator. Then, the mean curvature is used to regularize the reconstructed image. The objective function reads

(2) E_MC(u) := ∫_Ω (f − u)² dx + λ ∫_Ω Ψ(κ) dx,

where Ψ : R → R₊ is a potential function. The potential function Ψ is defined so that the regularization is isotropic in smooth low-gradient regions and the high-curvature points are well preserved. It has been proved in [37] that this mean curvature model can preserve edges, intensity contrast, and corners when applied to some model data. Moreover, it does not have staircase artifacts, so that smooth regions can be reconstructed faithfully. Due to its ability to preserve contrast, high-contrast textures are also preserved. As a result, the mean curvature model often yields results of better visual quality compared to the TV model. However, as with the TV model and most denoising algorithms, small-scale low-contrast textures may be lost because they can be difficult to distinguish from noise. Some other interesting properties of the model are studied in [37,38]. The objective function E_MC is non-convex, so multiple local minima exist. But this does not pose any problem in practice. By using iterative methods with the observed image as initial data, good results have been obtained. The Euler-Lagrange equation, a fourth-order PDE which we denote by (3), holds in Ω together with the corresponding natural boundary conditions on the boundary ∂Ω. Here, I : R² → R² is the identity operator and P_x : R² → R² is a rank-one operator. In [37], an effective explicit artificial time marching scheme is proposed to solve the Euler-Lagrange equation numerically. The capabilities of the mean curvature model stem from its dependence on higher-order derivatives.
This also adds some complexity to the optimization of the objective. The most prominent complication is that the Euler-Lagrange equation becomes a fourth-order nonlinear PDE, whereas that of the TV model is only second-order. While the standard gradient descent method with explicit time marching is very robust and easy to implement, it is restrained by very stringent stability conditions. Typically, the temporal step size ∆t has to be as small as O(∆x⁴), where ∆x is the spatial step size. This makes explicit schemes time consuming. Moreover, the objective function is non-convex, so the solution to the Euler-Lagrange equation is not unique.
In this paper, we focus on the denoising of one-dimensional signals and propose a fast numerical method for solving the one-dimensional version of the Euler-Lagrange equation (3). By introducing some auxiliary variables, the terms in (3) that are stiff are effectively transformed to equations that are more amenable. The resulting system of nonlinear equations is solved by a modified Newton's method. Based on Newton's iteration, the method has the desirable local superlinear convergence. Unfortunately, due to the indefiniteness of the Jacobian matrix, the straightforward Newton's method often fails to converge globally. To obtain global convergence, we propose a modification of the Jacobian matrix which results in very fast convergence and does not require the use of globalization techniques such as line search and trust-region method in practical settings.
2. Related work. In this section, we first review some existing methods for solving (2). Then, we review the primal-dual method for the TV model [9] by which our method is inspired.
2.1. Existing numerical methods. In the literature, a few approaches have been proposed for computing a solution to (2). They include a gradient descent method with explicit time marching [37], an augmented Lagrangian method [38], a fixed point iteration scheme based on convex splitting [7], and a multigrid scheme with the aforementioned fixed point iteration as the smoother [7]. We describe these methods in more detail next.
Gradient descent [37]: An artificial time is first introduced, so that u = u(x, y, t). Then, the denoised image is given by the steady state of the gradient flow of the Euler-Lagrange equation (3). This fourth-order nonlinear PDE is then discretized by an explicit finite difference scheme. Denote the temporal and spatial step sizes by ∆t and ∆x, respectively. The stability condition of the explicit scheme requires ∆t to be as small as O(∆x⁴), so that a large number of steps is needed.

Augmented Lagrangian method [38]: This optimization technique has been applied to a number of image processing problems such as total variation denoising [19,33] and inpainting [32]. In this method, some auxiliary variables are introduced to make the objective function simple to handle. For the mean curvature model, these variables include p = (u_x, u_y, 1), m = p/|p|, n = m, and q = div n. Then, a saddle point of the augmented Lagrangian function L is computed via an alternating minimization procedure in which the variables are optimized alternately. Here, r_i > 0 are fixed constants and λ₁, λ₃ : Ω → R, λ₂, λ₄ : Ω → R³ are Lagrange multipliers. The term I(|m| ≤ 1) is the indicator function enforcing the constraint |m| ≤ 1. A nice feature of this method is that the optimization of L with respect to each of its variables is a simple problem that can be solved very quickly. Moreover, the outer alternating minimization procedure also converges fast. It also does not require differentiability of the function Ψ.

Fixed point iteration with convex splitting [7]: The idea of convex splitting is to split the objective function into E_MC = E₁ − E₂ in such a way that E₁ is a convex function. Then the terms originating from E₁ are treated implicitly whereas those from E₂ are treated explicitly. In [7], the splitting is done to the Euler-Lagrange equation F = 0 such that F = F₁ − F₂ and F₁(u) = A(u)u for some positive definite operator A.
The splitting of (3) introduces a parameter γ > 0, which has to be large enough to bring in the necessary stability when F₁ is inverted. The fixed point iteration, which we refer to as (4), then advances u by inverting F₁ with F₂ evaluated at the previous iterate. However, in [7], convergence results for (4) are not presented. A caveat of this method is that at each iteration, a complicated system of nonlinear equations has to be solved to advance one time step. In [7], the authors also presented a nonlinear multigrid method with the convex splitting scheme (4) as the smoother. In this case, (4) is only loosely solved by a few Gauss-Seidel or SOR iterations. We remark that many numerical methods have been proposed for some related fourth-order problems. Of particular mention here is the minimization of Euler's elastica ∫_Ω Ψ(κ)|∇u| dx, or ∫_Ω Ψ(κ)|∇u|δ(u) dx (where κ = div(∇u/|∇u|) and δ is the Dirac delta function), which has been applied to image segmentation, image inpainting, and image denoising. Some numerical methods found in the literature include an explicit time marching scheme (accelerated by scaling the Euler-Lagrange equation) [10], a semi-implicit scheme [30], approximation with the Modica-Mortola functional [16], multigrid and convex splitting [6], augmented Lagrangian [32], graph-cut [4], and lagged curvature fixed point iteration [20]. While these methods have great potential for the model (2), we construct our method based on a primal-dual method, which is reviewed next.

2.2. Primal-dual method by Chan, Golub, and Mulet. In the TV model (1), the Euler-Lagrange equation to solve is given by

(5) −λ div(∇u/|∇u|) + 2(u − f) = 0.

Owing to the non-differentiability of the objective function at locations where |∇u| = 0, the Euler-Lagrange equation has singularities there. They make numerical solution difficult. This is especially the case for higher-order numerical methods such as Newton's method, because function approximation based on higher-order Taylor expansion becomes unreliable. Even if the standard trick of replacing |∇u| with |∇u|_ε := √(|∇u|² + ε) for some small ε > 0 is used, the equation is still very stiff to solve. For instance, the domain of convergence of Newton's method is extremely small, so that the method often fails to converge. In [9], Chan, Golub, and Mulet proposed to introduce an auxiliary variable p = ∇u/|∇u|_ε to remove the singularities. With this new variable, the Euler-Lagrange equation is reformulated to

(6) p |∇u|_ε − ∇u = 0, −λ div p + 2(u − f) = 0.

The troublesome term |∇u| that appears in a denominator in (5) now appears as a multiplier in (6). This system of equations is then solved by Newton's method. The Newton system of linear equations to solve in each iteration is symmetrized. Local quadratic convergence and global convergence are observed. In the experiments, the method converges to very high precision in just a few iterations. The linear system to solve in each Newton iteration can be solved efficiently using the preconditioned conjugate gradient method. The variable p has an elegant interpretation. The Fenchel dual of (1) is an energy E*_TV to be maximized over {p ∈ (C¹₀(Ω))² : |p(x)| ≤ 1}. Due to the convexity of (1), duality theory states that the maximizer p* of E*_TV and the minimizer u* of E_TV are related through (6) (with ε = 0). Therefore, the variable p = ∇u/|∇u| is the Fenchel dual variable and the system (6) is a primal-dual system.
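To make the reformulation concrete in the 1D setting used later in the paper, the following sketch evaluates the residuals of a discrete analogue of the primal-dual system (6); the boundary handling and the smoothing |u'|_ε = √(u'² + ε) are our own illustrative assumptions, not the authors' discretization.

```python
import numpy as np

def cgm_residuals_1d(u, p, f, lam, eps=1e-3, h=1.0):
    """Residuals of a 1D discretization of the primal-dual system (6):
    r1 = p * |Du|_eps - Du     (dual equation; Du = forward difference)
    r2 = -lam * D'p + 2(u - f) (primal equation; D' = backward difference)."""
    du = np.diff(u, append=u[-1]) / h        # forward difference, last entry 0
    norm_eps = np.sqrt(du ** 2 + eps)        # smoothed |u'|
    r1 = p * norm_eps - du
    divp = np.diff(p, prepend=0.0) / h       # backward difference, p_0 = 0
    r2 = -lam * divp + 2.0 * (u - f)
    return r1, r2
```

At an exact solution both residuals vanish; note how |u'|_ε multiplies p in r1 rather than dividing ∇u, which is precisely what removes the singularity.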
Thus, κ is the signed curvature of the graph of u, treated as a planar curve. We shall discretize the energy functional and present the Newton iteration in a finite dimensional setting. Let f_i for 1 ≤ i ≤ n be the given noisy signal and u_i for 1 ≤ i ≤ n the restored signal. Using finite differences with a step size h, the discrete curvature is

κ_i = (1/h) ∆⁻( ∆⁺u_i / √((∆⁺u_i)² + h²) ),

where ∆⁺ is the forward difference operator, ∆⁻ is the backward difference operator, and we have imposed the boundary conditions ∆⁺u_0 = ∆⁺u_n = 0. The discrete energy reads

(7) E_MC(u) = h Σ_{i=1}^n [ (f_i − u_i)² + λ Ψ(κ_i) ].

To make E_MC smooth, we adopt a smooth potential function Ψ:

Ψ(x) := (2/π) ( x arctan(x/ε) − (ε/2) ln(1 + x²/ε²) ),

where ε > 0. This function is constructed such that Ψ'(x) = (2/π) arctan(x/ε) is a smooth approximation of the sign function.
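The discrete curvature and the smoothed potential can be implemented directly; in the sketch below, the closed form for Ψ is the antiderivative of (2/π) arctan(x/ε) normalized so Ψ(0) = 0, and the boundary handling follows ∆⁺u_0 = ∆⁺u_n = 0 (an illustration, not the authors' code):

```python
import numpy as np

def discrete_curvature(u, h):
    """kappa = (1/h) * Delta_minus( Delta_plus(u) / sqrt(Delta_plus(u)^2 + h^2) ),
    with Delta_plus u_n = 0 (last forward difference zeroed) and the
    w_0 = 0 convention on the backward difference."""
    dup = np.diff(u, append=u[-1])           # Delta_plus u; last entry is 0
    w = dup / np.sqrt(dup ** 2 + h ** 2)
    return np.diff(w, prepend=0.0) / h       # (1/h) Delta_minus w

def psi(x, eps):
    """Antiderivative of psi'(x) = (2/pi) arctan(x/eps) with psi(0) = 0;
    behaves like |x| for |x| >> eps."""
    return (2.0 / np.pi) * (x * np.arctan(x / eps)
                            - 0.5 * eps * np.log1p((x / eps) ** 2))

def psi_prime(x, eps):
    return (2.0 / np.pi) * np.arctan(x / eps)
```

A quick consistency check: a straight-line signal has zero discrete curvature away from the boundary, and a central difference of psi matches psi_prime.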

3.2. First-order optimality conditions. By differentiating (7), we obtain ∂E_MC/∂u_i. Hence, we obtain the first-order optimality conditions:

(9) ∂E_MC/∂u_i = 0, 1 ≤ i ≤ n.

Note that the ∂E_MC/∂u_i computed here can also be obtained by discretizing the Euler-Lagrange equation (3) with standard finite differences under the boundary conditions above. Put another way, the above derivation gives us the proper numerical boundary conditions that make the solution minimize a discrete energy.
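When implementing the optimality conditions (9), a standard sanity check is to compare the hand-derived gradient against a central-difference approximation of ∂E_MC/∂u_i. The generic utility below is our own illustration and works for any discrete energy:

```python
import numpy as np

def numerical_gradient(energy, u, delta=1e-6):
    """Central-difference approximation of the gradient of a scalar-valued
    `energy` at u, entry by entry; useful for validating analytic
    first-order conditions such as (9)."""
    grad = np.zeros_like(u, dtype=float)
    for i in range(u.size):
        e = np.zeros_like(u, dtype=float)
        e[i] = delta
        grad[i] = (energy(u + e) - energy(u - e)) / (2.0 * delta)
    return grad
```

This O(n) evaluations-per-gradient check is far too slow for production use but catches sign and boundary-condition errors in the analytic derivation.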

3.3. Proposed formulation.
Parallel to the idea of the CGM method for the TV model, we introduce a variable

(10) w = ∆⁺u / √((∆⁺u)² + h²).

Then, by (7), we have E_MC(u) = h Σᵢ [ (f_i − u_i)² + λ Ψ( (1/h)(∆⁻w)_i ) ]. We also propose to introduce two more variables, v and g, in (11) and (12). Abusing notation, the optimality conditions (9) can be written as F_j = 0 for j = 1, 2, 3, 4, which we collectively refer to as (13). Here, u, v, w, f, g ∈ Rⁿ, ∆⁺ is the forward difference matrix under the reflective boundary condition, and ∆⁻ is the backward difference matrix under the zero boundary condition. The term √((∆⁺u)² + h²) is interpreted as a diagonal matrix with diagonal entries √((∆⁺u_i)² + h²). Note that the system (13) implies that v_n = w_n = 0.
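The two difference matrices can be written down explicitly; the sketch below is an illustrative encoding of the stated boundary conditions and also exhibits the adjoint relation ∆⁻ = −(∆⁺)ᵀ:

```python
import numpy as np

def forward_diff_matrix(n):
    """Delta_plus with the reflective boundary condition:
    (Dp u)_i = u_{i+1} - u_i for i < n, and (Dp u)_n = 0."""
    Dp = -np.eye(n) + np.eye(n, k=1)
    Dp[-1, :] = 0.0
    return Dp

def backward_diff_matrix(n):
    """Delta_minus with the zero boundary condition, realized as -(Dp)^T."""
    return -forward_diff_matrix(n).T
```

On vectors w with w_n = 0 (which the system (13) enforces), ∆⁻ acts as the ordinary backward difference with the convention w_0 = 0.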

3.4. Newton's iteration. The Newton update for (13) is given by the solution of a linear system obtained by differentiating (13); the resulting Jacobian matrix involves Ψ'' = Ψ''((1/h)∆⁻w). Next, we shall simplify this linear system. Let D = √((∆⁺u)² + h²), understood as a diagonal matrix. Note that D has positive diagonal entries, so it is invertible. Denote the i-th block row of the augmented matrix by R_i for i = 1, 2, 3, 4. By eliminating the auxiliary unknowns from R₁, we obtain an equation for δu alone. To simplify the presentation, we introduce matrices A₁, A₃ and a vector b built from the blocks of the Jacobian; the resulting equation for δu is (15). Once this linear system is solved for δu, the other unknowns δv, δw, δg can be computed using (14).

3.5. Modified Newton system. Due to the non-convexity of the objective function E_MC, the linear system (15) can be indefinite. In practice, the Newton iteration often diverges (see Figure 5). To remedy this defect, we propose to modify the system. Note that A₁ is a diagonal matrix. Let A₁⁺ be its positive part. Then, we replace (15) by a modified system, (16), whose matrix we denote by Ã. Note that ∆⁻ = −(∆⁺)ᵀ. By the strict convexity of Ψ, A₃ is a positive diagonal matrix. Then, it is straightforward to verify that the matrix Ã is symmetric positive definite.
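The effect of the modification can be illustrated in a few lines: zeroing the negative entries of the diagonal block and keeping a symmetric positive definite second block yields a positive definite system, so the computed step is a descent direction. The block names below are illustrative stand-ins, not the paper's exact Ã:

```python
import numpy as np

def modified_newton_direction(a1_diag, B, grad):
    """Solve A_tilde @ du = -grad with A_tilde = diag(max(a1, 0)) + B,
    where B is symmetric positive definite (standing in for the
    lam-weighted Delta_minus^T A3 Delta_minus part of the system)."""
    A_tilde = np.diag(np.maximum(a1_diag, 0.0)) + B
    return np.linalg.solve(A_tilde, -grad)
```

Because A_tilde is symmetric positive definite, grad·du = −grad·A_tilde⁻¹·grad < 0 for any nonzero gradient, which is the descent property exploited in the next paragraph.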
To shed some light on how this method works, consider the case where the auxiliary variables v, w, g are near convergence, so that ‖F₂‖, ‖F₃‖, ‖F₄‖ ≪ ‖F₁‖ and hence the term (λ/h)∆⁻b is negligible compared with F₁. Moreover, by (9) and (13), F₁ then agrees with the gradient of the discrete energy. Since Ã is symmetric positive definite, δu is a descent direction. Hence, as long as ‖F₂‖, ‖F₃‖, ‖F₄‖ ≪ ‖F₁‖, global convergence is favored.
4. Numerical results. In this section, we evaluate the effectiveness of the proposed method by comparing it with the gradient descent method in [37] and the augmented Lagrangian method in [38] numerically and by testing it in various parameter settings. The focus of the tests is the speed and robustness of the algorithm. For assessment of the quality of the denoised signals, we refer the readers to [37]. All computations are done using Matlab R2011b on an Intel Pentium D machine.

[Figure 1: original signal, noisy signal, and denoised signal for each of Signals A, B, and C.]

These data are chosen to contain various characteristics. Signal A contains piecewise constant, piecewise linear, piecewise quadratic, and jump features. Signal B is a smooth signal. Signal C contains the largest number of jumps. These data are depicted in Figure 1, where the signal size is n = 2¹². In the figure, we also show some examples of noisy signals with signal-to-noise ratio (SNR) 30dB and some denoised results using an optimal choice of λ (see below for the definition of optimal λ). Notice that under the curvature-based model, jumps are preserved and staircase artifacts are absent.

Table 2. Optimal λ (two significant digits) for various SNRs. In each entry, parameters not shown are set to their defaults indicated in Table 1.

Table 3. Optimal λ (two significant digits) for various signal sizes n. In each entry, parameters not shown are set to their defaults indicated in Table 1.
The default values of the parameters of the data and model are listed in Table 1. But in some experiments, we test the method by varying some of the parameters.
The optimal λ is chosen (with two significant digits) to minimize the l¹ error between the denoised signal and the clean signal. Such an optimal λ is recomputed for each observed signal, i.e., for each combination of clean signal, size n, SNR, and noise realization. These values are reported in Tables 2 and 3. It is well known that such a choice does not necessarily optimize visual quality. For example, in Figure 1, the denoised Signal C still contains some noise. Using a larger λ can further reduce these oscillations, at the expense of an increased l¹ error. But our aim is to test the proposed method under a large number of parameter settings, so we resort to a way of determining λ that can be automated. We have also tried using the l² and l^∞ norms to measure errors, but the visual quality of the denoised signals worsens.
Parameters of Newton's method. The stopping criterion of the Newton iteration that we adopt is a very stringent one: ‖δu‖_∞ ≤ 10⁻¹⁵.

Table 4. Optimal time steps for the gradient descent method (one significant digit).
Except in some extreme parameter settings, the proposed method usually converges in a few iterations. The maximum number of iterations is set to 30. The initial guess of u is the noisy signal f. The initial guesses of the other variables v, w, g are chosen according to (12), (10), and (11), respectively.
The linear system (20) to solve in each Newton's iteration is pentadiagonal and can be solved efficiently by the Matlab backslash function.
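In place of the Matlab backslash call, the banded structure can be exploited directly. The following sketch performs banded Gaussian elimination without pivoting, which is adequate for symmetric positive definite systems such as those produced by the modified Newton iteration, in O(n) work instead of O(n³):

```python
import numpy as np

def solve_pentadiagonal(A, b):
    """Gaussian elimination restricted to the pentadiagonal band of A
    (bandwidth 2, no pivoting; fine for SPD systems)."""
    n = len(b)
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    for i in range(n - 1):
        hi = min(i + 3, n)
        for j in range(i + 1, hi):          # at most 2 subdiagonal rows
            if A[j, i] != 0.0:
                m = A[j, i] / A[i, i]
                A[j, i:hi] -= m * A[i, i:hi]
                b[j] -= m * b[i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        hi = min(i + 3, n)
        x[i] = (b[i] - A[i, i + 1:hi] @ x[i + 1:hi]) / A[i, i]
    return x
```

In practice one would store only the five diagonals rather than the dense matrix; the dense indexing here is kept for readability.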
To assess the performance of the proposed method, we report the energy (8) vs. CPU time (in seconds) and the residual norm ‖F‖_∞ vs. CPU time (in seconds). Here, F = 0 represents the system that the method is solving, i.e., F := (F₁, F₂, F₃, F₄), where the F_j are defined by (13).
Parameters of the gradient descent method. The maximum number of iterations is set to 10000. The time step is tuned on an individual basis to obtain the best timings and ensure convergence. The time steps are shown in Table 4. The initial guess is u = f. The gradient descent method directly solves the optimality condition (9). Thus, the residual F_GD : Rⁿ → Rⁿ is defined by the left-hand side of (9). We report the normalized l¹ residual norm (1/n) Σᵢ |F_GD,i|, since the convergence w.r.t. the l^∞ norm is too slow. But we insist on using the l^∞ norm for the proposed method to highlight its fast and uniform convergence. We remark that, theoretically, if F(u*, v*, w*, g*) = 0, then we also have F_GD(u*) = 0. Thus one may think of using ‖F_GD(u)‖_∞ as well to measure the merit of the proposed method, so that a common measure is used across different methods. But in practice we found that although our method can achieve ‖F(u, v, w, g)‖_∞ ≤ 10⁻¹⁵ after some iterations, we only have F_GD(u) = O(10⁻¹⁵(λ + 1)). Thus the residual norm ‖F_GD(u)‖_∞ cannot reach the level of 10⁻¹⁵ when a large λ is used. Hence, we define the residual to be the system that each method directly deals with.

Table 5. Manually selected parameters (r₁, r₂, r₃, r₄) for the augmented Lagrangian method: (1, 1, 1, 10²), (10², 10², 10², 10²), and (10², 10², 10², 10²).
Parameters of the augmented Lagrangian method. The maximum number of iterations is set to 1000. The parameters r₁, r₂, r₃, r₄ are selected manually as follows. For each (r₁, r₂, r₃, r₄) ∈ {1, 10², 10⁴, 10⁶}⁴, we run the method for 500 iterations and record the final energy and residual norm. Then, the parameters are selected so that both the energy and the residual norm are as small as possible. The selected parameters are shown in Table 5.
The residual F_AL in this case is defined analogously from the optimality system of the augmented Lagrangian method. We report the normalized l¹ residual norm (1/n) Σᵢ |F_AL,i|.

4.2. Comparison with existing methods and effects of the signal-to-noise ratio. In this test, we compare the proposed modified Newton's method (Newton) with the gradient descent (GD) and augmented Lagrangian (AL) methods under different noise levels (40dB, 30dB, and 20dB). In Figure 2, the residual norm and the energy are plotted against CPU time (in seconds). The noise level here is SNR = 40dB. We recall that the residual norms for the Newton, GD, and AL methods are ‖F‖_∞, ‖F_GD‖₁, and ‖F_AL‖₁, respectively. Thus Newton uses a stricter measure (the l¹ norms are normalized). The corresponding results for 30dB and 20dB are shown in Figures 3 and 4, respectively. For Newton, the number of circles on each curve equals the number of Newton iterations plus one. The first circle reveals the time needed for initialization. The iteration stops either when ‖δu‖_∞ ≤ 10⁻¹⁵ or when the number of iterations reaches 30.
For GD and AL, the iteration stops when either ‖δu‖₁ ≤ 10⁻¹⁵ or the number of iterations reaches 10000 and 1000, respectively.
Several observations are in order. First, the residual norm of Newton drops below 10⁻¹⁵ in just a few iterations for Signals A and B, and in fewer than 20 iterations for Signal C. The residual norm of GD remains relatively large after 10000 iterations for Signals A and B but drops to a very low level for Signal C. AL often gives a residual norm around 10⁻¹⁰.
Second, the time for Newton to converge is less than 0.3s in all cases and is insensitive to the noise level.
Third, for Signals A and C, Newton, GD, and AL all reach a similar energy level, but Newton uses much less time in most cases (recall that the times are plotted in log scale). For Signal B, GD stops at a higher energy level (about 10 times larger when SNR = 30dB and about 100 times larger when SNR = 20dB).
Fourth, besides the obvious case of GD, Newton also features a monotonically decreasing energy.
Fifth, while the profiles of the residual norm of Newton and AL are very stable across different noise levels, this is less so for GD. For example, for Signal A, GD's residual norm at the 10000th iteration increases from about 10⁻⁴ to 10⁻² and 10⁰ as the SNR changes from 40dB to 30dB and 20dB, respectively. This can be attributed to the time steps shown in Table 4. As the noise level increases (SNR decreases), a smaller time step is needed to facilitate convergence. This in turn makes convergence slower.

4.3. Effects of the signal size n. In this test, we investigate how the method depends on the signal size n. For each n ∈ {2⁸, 2¹⁰, 2¹², 2¹⁴, 2¹⁶}, we run the proposed method, and the residual norm ‖F‖_∞ during the iteration is plotted against the CPU time (in seconds); see Figure 5. To demonstrate the need for modifying the Newton system, we report the results using the original Newton iteration as well.
For the modified Newton iteration (20) and (16), convergence is observed in all cases, whereas the original iteration often diverges. We also observe that even when n is as large as 2¹⁶, the residual norm resulting from the proposed method drops below 10⁻¹⁵ in less than 4 seconds. The number of Newton iterations is quite stable for all n.
For illustration purposes, we show the condition number of the modified Newton system in Figure 6. The condition number does not vary at all during the iteration. As expected, it increases with the signal size.
4.4. Effects of the parameter λ. In Figure 7, fast convergence is observed for all but one case. The time to convergence also does not vary much. The only case where divergence is observed is Signal C with λ = 10⁶. The optimal λ in this case is 4.0, so the divergence occurs at a value much larger than the practical value. Indeed, if λ is very large, then the energy is dominated by the regularization term, and the problem becomes more ill-posed due to the non-uniqueness of minimizers of the regularization term. This is also reflected in the Newton system (14), whose limiting system is singular. Thus we expect that the performance can deteriorate as λ increases.

4.5. Line search. In Figure 7, divergence is observed for Signal C with λ = 10⁶. However, thanks to the modification of the Newton system, the computed direction δu is a descent direction under very mild conditions. Thus, global convergence is encouraged with the help of a line search. The aim of this experiment is to test the performance of the line search.
The line search procedure we adopt is as follows.

1. Compute δu as usual.
2. Set α_u ← 1.
3. While E_MC(u + α_u δu) > E_MC(u), set α_u ← 0.5 α_u.
4. Set δu ← α_u δu.
5. Compute δv, δw, δg as usual.

In Figure 8, we show the result of Newton with line search (the residual norm and the energy are plotted against CPU time). The data is Signal C with λ = 10⁶, the only case where divergence was observed in the previous experiments. Convergence is reached after eleven iterations. The timing is also comparable to those in Figure 7, where line search is not utilized. The step sizes α_u obtained are shown in Figure 8. The step size is reduced to 0.5 in only three of the iterations.
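The halving procedure translates directly into code; the sketch below adds a cap on the number of halvings as a safeguard (our addition; the procedure in the text has no explicit cap):

```python
import numpy as np

def halving_line_search(energy, u, du, max_halvings=60):
    """Backtracking by halving: start at alpha = 1 and halve until the
    energy does not increase; returns the scaled step alpha * du."""
    alpha = 1.0
    e0 = energy(u)
    for _ in range(max_halvings):
        if energy(u + alpha * du) <= e0:
            break
        alpha *= 0.5
    return alpha * du
```

Since the modified direction δu is a descent direction, the loop terminates with a step that does not increase the energy, which is what makes the globalized iteration robust for extreme λ.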

We remark that this line search procedure does not reduce the step size in any of the other experiments conducted above, because there the energy is monotonically decreasing.

5. Conclusion. Based on the Chan-Golub-Mulet idea, we reformulated the curvature based denoising model into a more amenable nonlinear system of equations. We also introduced a modified Newton's method to solve the system. We demonstrated that the modification is essential and that the method exhibits very fast convergence over a wide range of parameters. Only when some extreme values of λ are used is a line search needed to obtain global convergence. The proposed method does not require any parameter tuning, whereas the gradient descent and augmented Lagrangian methods do. Another advantage of the proposed method is that the accuracy is very high. This makes convergence very easy to detect. When using a slow numerical method such as gradient descent, sometimes the stopping criterion is met merely because the progress is so slow that δu becomes tiny; in this case, a prolonged run of the iteration can eventually result in a significant change of the restored signal.
The curvature model has some nice features, such as contrast and corner preservation, which are not shared by the seemingly alike Euler's elastica model. Moreover, the presence of the constant 1 in the curvature term κ = div(∇u/√(|∇u|² + 1)) keeps the denominator bounded away from zero, so that no singularity arises where ∇u = 0. In a future work, we shall investigate the generalization to the two-dimensional case, where more sophisticated ways to modify the Newton's system and its preconditioning are called for.