1 Introduction

Observed images are often contaminated by noise and may be additionally distorted by some measurement device. Then the obtained data g can be described as

$$\begin{aligned} g=\mathcal {N}(T\hat{u}), \end{aligned}$$

where \(\hat{u}\) is the unknown original image, T is a linear bounded operator modeling the image formation device, and \(\mathcal {N}\) represents noise. In this paper, we consider images which are contaminated by either white Gaussian noise or impulse noise. While for white Gaussian noise the degraded image g is obtained as

$$\begin{aligned} g = T \hat{u} + \eta , \end{aligned}$$

where the noise \(\eta \) is oscillatory with zero mean and standard deviation \(\sigma \), there are two main models for impulse noise that are widely used in a variety of applications, namely salt-and-pepper noise and random-valued impulse noise. We assume that \(T\hat{u}\) is in the dynamic range [0, 1], i.e., \(0 \le T\hat{u} \le 1\). Then, in the presence of salt-and-pepper noise, the observation g is given by

$$\begin{aligned} g(x)= {\left\{ \begin{array}{ll} 0 &{} \text { with probability } r_1\in [0,1),\\ 1 &{} \text { with probability } r_2 \in [0,1), \\ T \hat{u} (x) &{} \text { with probability } 1-r_1-r_2,\\ \end{array}\right. } \end{aligned}$$
(1.1)

with \(1-r_1-r_2>0\). If the image is contaminated by random-valued impulse noise, then g is described as

$$\begin{aligned} g(x)= {\left\{ \begin{array}{ll} \rho &{} \text { with probability } r \in [0,1),\\ T \hat{u} (x) &{} \text { with probability } 1-r,\\ \end{array}\right. } \end{aligned}$$
(1.2)

where \(\rho \) is a uniformly distributed random variable in the image intensity range [0, 1].
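
For illustration, the following sketch (Python/NumPy; the noise levels \(\sigma \), \(r_1\), \(r_2\), r, the image size, and the function names are arbitrary choices of ours) generates degraded observations g from a given array representing \(T\hat{u}\in [0,1]\) according to the Gaussian model and to the impulse noise models (1.1) and (1.2).

    import numpy as np

    rng = np.random.default_rng(0)

    def gaussian_obs(Tu, sigma=0.05):
        # g = T u + eta with eta ~ N(0, sigma^2)
        return Tu + sigma * rng.standard_normal(Tu.shape)

    def salt_pepper_obs(Tu, r1=0.05, r2=0.05):
        # model (1.1): 0 with probability r1, 1 with probability r2
        g = Tu.copy()
        xi = rng.random(Tu.shape)
        g[xi < r1] = 0.0
        g[(xi >= r1) & (xi < r1 + r2)] = 1.0
        return g

    def random_valued_obs(Tu, r=0.1):
        # model (1.2): with probability r replace the value by rho ~ U[0, 1]
        g = Tu.copy()
        mask = rng.random(Tu.shape) < r
        g[mask] = rng.random(np.count_nonzero(mask))
        return g

    Tu = np.clip(rng.random((64, 64)), 0.0, 1.0)  # stand-in for T u_hat in [0, 1]
    g = salt_pepper_obs(Tu)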

The recovery of \(\hat{u}\) from the given degraded image g is an ill-posed inverse problem and thus regularization techniques are required to restore the unknown image [41]. A good approximation of \(\hat{u}\) may be obtained by solving a minimization problem of the type

$$\begin{aligned} \min _u \mathcal {H}(u;g) + \alpha \mathcal {R}(u), \end{aligned}$$
(1.3)

where \(\mathcal {H}(\cdot ;g) \) represents a data fidelity term, which enforces the consistency between the recovered and measured images, \(\mathcal {R}\) is an appropriate filter or regularization term, which prevents overfitting, and \(\alpha >0\) is a regularization parameter weighting the importance of the two terms. We aim at reconstructions in which edges and discontinuities are preserved. For this purpose, we use the total variation as a regularization term, first proposed in [81] for image denoising. Hence, here and in the remainder of the paper we choose \(\mathcal {R}(u)=\int _\varOmega |Du|\), where \(\int _\varOmega |Du|\) denotes the total variation of u in \(\varOmega \); see [3, 46] for more details. However, we note that other regularization terms, such as the total generalized variation [13], the nonlocal total variation [60], the Mumford–Shah regularizer [71], or higher order regularizers (see e.g. [77] and references therein) might be used as well.

1.1 Choice of the Fidelity Term

The choice of \(\mathcal {H}\) typically depends on the type of noise contamination. For images corrupted by Gaussian noise, a quadratic \(L^2\)-data fidelity term is typically chosen and has been successfully used; see for example [18–20, 24, 28–30, 32, 35, 47, 72, 76, 91, 93]. In this approach, which we refer to as the \(L^2\)-TV model, the image \(\hat{u}\) is recovered from the observed data g by solving

$$\begin{aligned} \min _{u\in BV(\varOmega )} \frac{1}{2} \Vert Tu - g \Vert _{L^2(\varOmega )}^2 + \alpha \int _\varOmega |Du|, \end{aligned}$$
(1.4)

where \(BV(\varOmega )\) denotes the space of functions with bounded variation, i.e., \(u\in BV(\varOmega )\) if and only if \(u\in L^1(\varOmega )\) and \(\int _\varOmega |Du| <\infty \). In the presence of impulse noise, e.g., salt-and-pepper noise or random-valued impulse noise, the above model usually does not yield a satisfactory restoration. In this context, a more successful approach, suggested in [1, 74, 75], uses a nonsmooth \(L^1\)-data fidelity term instead of the \(L^2\)-data fidelity term in (1.4), i.e., one considers

$$\begin{aligned} \min _{u\in BV(\varOmega )} \Vert Tu - g \Vert _{L^1(\varOmega )} + \alpha \int _\varOmega |Du|, \end{aligned}$$
(1.5)

which we call the \(L^1\)-TV model. In this paper, we are interested in both models, i.e., the \(L^2\)-TV and the \(L^1\)-TV model, and condense them into

$$\begin{aligned} \min _{u\in BV(\varOmega )} \left\{ \mathcal {J}_\tau (u;g) := \mathcal {H}_\tau (u;g) + \alpha \int _\varOmega |Du| \right\} \end{aligned}$$
(1.6)

to obtain a combined model for removing Gaussian or impulsive noise, where \( \mathcal {H}_\tau (u;g):=\frac{1}{\tau }\Vert Tu-g\Vert _{L^\tau (\varOmega )}^\tau \) for \(\tau =1,2\). Note that instead of (1.6) one can consider the equivalent problem

$$\begin{aligned} \min _{u\in BV(\varOmega )} \lambda \mathcal {H}_\tau (u;g) + \int _\varOmega |Du|, \end{aligned}$$
(1.7)

where \(\lambda =\frac{1}{\alpha }>0\). Different fidelity terms have been considered in connection with other noise models, such as Poisson noise [64], multiplicative noise [4], and Rician noise [43]. For images which are simultaneously contaminated by Gaussian and impulse noise [15], a combined \(L^1\)–\(L^2\)-data fidelity term has recently been suggested and demonstrated to work satisfactorily [53]. However, in this paper, we concentrate on images degraded by only one type of noise, i.e., either Gaussian noise or one type of impulse noise, and perhaps additionally corrupted by some measurement device.

1.2 Choice of the Scalar Regularization Parameter

For the reconstruction of such images, the proper choice of \(\alpha \) in (1.6) and \(\lambda \) in (1.7) is delicate; cf. Fig. 1. In particular, large \(\alpha \) and small \(\lambda \), which lead to an oversmoothed reconstruction, not only remove noise but also eliminate details in images. On the other hand, small \(\alpha \) and large \(\lambda \) lead to solutions which fit the given data properly but therefore retain noise in homogeneous regions. Hence, a good reconstruction can be obtained by choosing \(\alpha \) and, respectively, \(\lambda \) such that a good compromise between the aforementioned effects is made. There are several ways to select \(\alpha \) in (1.6) and equivalently \(\lambda \) in (1.7), such as manually by the trial-and-error method, the unbiased predictive risk estimator method (UPRE) [67, 69], the Stein unbiased risk estimator method (SURE) [10, 38, 82] and its generalizations [34, 40, 45], the generalized cross-validation method (GCV) [48, 66, 67, 78], the L-curve method [49, 50], the discrepancy principle [70], and the variational Bayes’ approach [6]. Further parameter selection methods for general inverse problems can be found for example in [41, 42, 88, 89].

Fig. 1 Reconstruction of an image corrupted by Gaussian white noise with a “relatively” large parameter \(\alpha \) in (b) and a “relatively” small parameter \(\alpha \) in (c). a Noisy image, b oversmoothed reconstruction, and c overfitted reconstruction

Based on a training set of pairs \((g_k, \hat{u}_k)\), \(k=1, 2, \ldots , N \in \mathbb N\), where \(g_k\) is the noisy observation and \(\hat{u}_k\) represents the original image, bilevel optimization approaches have been presented, for example in [16, 33, 61], to compute suitable scalar regularization parameters of the corresponding image model. Since in our setting no training set is given, these approaches are not applicable here.

Applying the discrepancy principle to estimate \(\alpha \) in (1.6) or \(\lambda \) in (1.7), the image restoration problem can be formulated as a constrained optimization problem of the form

$$\begin{aligned} \min _{u\in BV(\varOmega )} \int _\varOmega |Du| \quad \text {subject to (s.t.)} \quad \mathcal {H}_\tau (u;g)=\mathcal {B}_\tau , \end{aligned}$$
(1.8)

where \(\mathcal {B}_\tau :=\frac{\nu _\tau }{\tau } |\varOmega |\) with \(\nu _\tau >0\) being here a constant depending on the underlying noise, \(\tau =1,2\), and \(|\varOmega |\) denoting the volume of \(\varOmega \); see Sect. 3 for more details. Note that here we assume the noise level to be known a priori. In real applications, this means that possibly in a first step a noise estimation has to be performed before the discrepancy principle may be used. However, in general it is easier to estimate the noise level than the regularization parameter [18].

The constrained minimization problem (1.8) is naturally linked to the unconstrained minimization problem (1.7) and accordingly to (1.6). In particular, there exists a constant \(\lambda \ge 0\) such that the unconstrained problem (1.7) is equivalent to the constrained problem (1.8) if T does not annihilate constant functions, i.e., \(T\in \mathcal {L}(L^2(\varOmega ))\) is such that \(T\cdot 1 =1\); see Sect. 2 for more details. Several methods based on the discrepancy principle and problem (1.8) with \(\tau =2\) have been proposed in the literature, see for example [9, 18, 51, 92] and references therein, while not so much attention has been given to the case \(\tau =1\), see for example [73, 91].

1.3 Spatially Adaptive Parameter

Note that a scalar regularization parameter might not be the best choice for every image restoration problem, since images usually have large homogeneous regions as well as parts with a lot of details. Actually, it seems obvious that \(\alpha \) should be small, or \(\lambda \) should be large, in parts with small features in order to preserve the details. Conversely, \(\alpha \) should be large, or \(\lambda \) should be small, in homogeneous parts to remove noise considerably. With such a choice of a spatially varying weight, we expect better reconstructions than with a globally constant parameter, as demonstrated for example in [37, 58]. This motivates the consideration of multiscale total variation models with spatially varying parameters, initially suggested in [80]. The multiscale version of (1.6) reads as

$$\begin{aligned} \min _{u\in BV(\varOmega )} \mathcal {H}_\tau (u;g) + \int _\varOmega \alpha (x) |Du|, \end{aligned}$$
(1.9)

while for (1.7) one writes

$$\begin{aligned} \min _{u\in BV(\varOmega )} \frac{1}{\tau }\int _\varOmega \lambda (x) |Tu-g|^\tau \mathrm{d}x + \int _\varOmega |Du|, \end{aligned}$$
(1.10)

and in the sequel we refer to (1.9) and (1.10) as the multiscale \(L^\tau \)-TV model.

In [84], the influence of the scale of an image feature on the choice of \(\alpha \) is studied, and the obtained observations were later used in [83] to construct an updating scheme for \(\alpha \). Based on (1.10), in [8] a piecewise constant function \(\lambda \) is determined, where the pieces are defined by a partitioning of the image obtained from a presegmentation. In particular, for each segment a scalar \(\lambda _i\), \(i=1,\ldots ,\#\text{pieces}\), is computed by Uzawa’s method [27].

Later, it was noticed that stable choices of \(\lambda \), respectively, \(\alpha \) should incorporate statistical properties of the noise. In this vein, in [2, 37, 44] for the problem (1.10) automated update rules for \(\lambda \) based on statistics of local constraints were proposed. In [44], a two-level approach for variational denoising is considered, where in the first level noise and relevant texture are isolated in order to compute local constraints based on local variance estimation. In the second level, a gradient descent method and an update formula for \(\lambda (x)\) derived from the Euler–Lagrange equation are utilized. An adaptation of this approach to multiplicative noise can be found in [65]. For convolution-type problems, an automatic updating scheme for \(\lambda \) based on a pixelwise estimate of the noise variance is constructed in [2] using Uzawa’s method. This approach is improved in [37] by determining the fidelity weights by means of the Gumbel statistic for the maximum of a finite number of random variables associated with localized image residuals and by incorporating hierarchical image decompositions, proposed in [86, 87], to speed up the iterative parameter adjustment process. An adaptation of this approach to a total variation model with \(L^1\) local constraints is studied in [58]. A different approach has been proposed in [85] for image denoising only, where nonlocal means [14] are used to create a nonlocal data fidelity term. While in all these approaches the adjustment of \(\lambda \) relies on the output of T being a deteriorated image again, in [54] the method of [37] is adjusted to the situation where T is an orthogonal wavelet transform or Fourier transform. Very recently, bilevel optimization approaches have also been considered for computing spatially adaptive weights [26, 56, 57].

1.4 Contribution

The first contribution of this paper is a method which automatically computes the regularization parameter \(\alpha \) in (1.6) based on (1.8) for \(\tau =1\) as well as for \(\tau =2\). Our approach is motivated by the parameter selection algorithm presented in [18], which was originally introduced for \(L^2\)-TV image denoising only, i.e., when \(T=I\), where I denotes the identity operator. In this setting, the algorithm in [18] is shown to converge to a parameter \(\alpha ^*\) such that the corresponding minimizer \(u_{\alpha ^*}\) of (1.6) is also a solution of (1.8). The proof relies on the nonincrease of the function \(\alpha \mapsto \frac{\mathcal {H}_2(u_\alpha ;g)}{\alpha }\). However, this important property does not hold for operators \(T\not =I\) in general. Nevertheless, we generalize the algorithm from [18] to problems of the type (1.6) for \(\tau =1,2\) and for general linear bounded operators T, e.g., T might be a convolution-type operator. Utilizing an appropriate update of \(\alpha \), which differs from the one used in [18], we are able to show analytically and numerically that our approach indeed converges to the desired regularization parameter. Further, besides its general applicability, the proposed method even possesses advantages over the algorithm from [18] with respect to convergence in the case \(\tau =2\) and \(T=I\). More precisely, in our numerics it turned out that our proposed method never needs more iterations than the algorithm from [18] until termination.

Motivated by multiscale total variation minimization, the second contribution of this paper is concerned with the automated selection of a suitable spatially varying \(\alpha \) for the optimization problem in (1.9) for \(\tau =1,2\). Based on our considerations for an automatic scalar regularization parameter selection, we present algorithms where the adjustment of a locally varying \(\alpha \) is fully automatic. In contrast to the scalar case, the adjustment of \(\alpha \) is now based on local constraints, similar to what has been considered, for example, in [2, 37, 58]. However, our approach differs significantly from these previous works, where problem (1.10) is considered and Uzawa’s method or an Uzawa-like method is utilized for the update of the spatially varying parameter. Note that in Uzawa’s method an additional parameter has to be introduced and chosen accordingly. We propose an update scheme of \(\alpha \) which does not need any additional parameter and hence is not similar to Uzawa’s method. Moreover, in contrast to the approaches in [37, 58], where the initial regularization parameter \(\lambda >0\) has to be set sufficiently small, our approach allows any initial \(\alpha >0\). In this sense, our algorithm is even more general than the ones presented in [37, 58].

1.5 Outline of the Paper

The remainder of the paper is organized as follows: In Sect. 2, we revisit and discuss the connection between the constrained minimization problem (1.8) and the unconstrained optimization problem (1.6). Section 3 is devoted to the automated scalar parameter selection. In particular, we present our proposed method and analyze its convergence behavior. Based on local constraints, we describe in Sect. 4 our new locally adapted total variation algorithm in detail. Algorithms for performing total variation minimization for spatially varying \(\alpha \) are presented in Sect. 5, where also their convergence properties are studied. To demonstrate the performance of the new algorithms, we present in Sect. 6 numerical experiments for image denoising and image deblurring. Finally, in Sect. 7 conclusions are drawn.

2 Constrained Versus Unconstrained Minimization Problem

In this section, we discuss the connection between the unconstrained minimization problem (1.6) and the constrained optimization problem (1.8). For this purpose, we introduce the following basic terminology. Let \({\mathcal V}\) be a locally convex space, \({\mathcal V}'\) its topological dual, and \(\langle \cdot , \cdot \rangle \) the bilinear canonical pairing over \({\mathcal V}\times {\mathcal V}'\). The domain of a functional \({\mathcal J}: {\mathcal V}\rightarrow \mathbb R\cup \{+\infty \}\) is defined as the set

$$\begin{aligned} {\text {Dom}}({\mathcal J}) := \{v\in {\mathcal V}\ : \ {\mathcal J}(v) < \infty \}. \end{aligned}$$

A functional \({\mathcal J}\) is called lower semicontinuous (l.s.c.) if for every weakly convergent sequence \(v^{(n)}\rightharpoonup \hat{v}\) we have

$$\begin{aligned} \liminf _{v^{(n)}\rightharpoonup \hat{v}} {\mathcal J}(v^{(n)}) \ge {\mathcal J}(\hat{v}). \end{aligned}$$

For a convex functional \({\mathcal J}: {\mathcal V}\rightarrow \mathbb R\cup \{+\infty \}\), we define the subdifferential of \({\mathcal J}\) at \(v\in {\mathcal V}\), as the set-valued function \(\partial {\mathcal J}(v) = \emptyset \) if \({\mathcal J}(v)=\infty \), and otherwise as

$$\begin{aligned} \partial {\mathcal J}(v) = \{v^* \in {\mathcal V}' \ : \ \langle v^*, u-v\rangle + {\mathcal J}(v) \le {\mathcal J}(u) \ \ \forall u\in {\mathcal V}\}. \end{aligned}$$

For any operator T, we denote by \(T^*\) its adjoint and by \(\mathcal {L}(L^2(\varOmega ))\) we denote the space of linear and continuous operators from \(L^2(\varOmega )\) to \(L^2(\varOmega )\). Moreover, \(g_\varOmega \) describes the average value of the function \(g\in L^1(\varOmega )\) in \(\varOmega \) defined by \(g_\varOmega :=\frac{1}{|\varOmega |} \int _\varOmega g(x) \ d x\).

Theorem 2.1

Assume that \(T\in \mathcal {L}(L^2(\varOmega ))\) does not annihilate constant functions, i.e., \(T 1_\varOmega \not =0\), where \(1_\varOmega (x) = 1\) for \(x\in \varOmega \). Then the problem

$$\begin{aligned} \min _{u\in BV(\varOmega )} \int _{\varOmega } |D u| \quad \text { s.t. } \quad \mathcal {H}_\tau (u;g) \le \mathcal {B}_\tau \end{aligned}$$
(2.1)

has a solution for \(\tau =1,2\).

Proof

For a proof, we refer the reader to [20] and [58]. \(\square \)

Moreover, we have the following statement.

Proposition 2.1

Assume that \(T\in \mathcal {L}(L^2(\varOmega ))\) is such that \(T\cdot 1=1\) and \(\nu _\tau |\varOmega | \le \Vert g-g_\varOmega \Vert ^\tau _{L^\tau (\varOmega )}\). Then problem (2.1) is equivalent to the constrained minimization problem (1.8) for \(\tau =1,2\).

Proof

For \(\tau =2\), the statement is shown in [20]. We state the proof for \(\tau =1\) by noting that it follows similar arguments as for \(\tau =2\). Let \(\tilde{u}\) be a solution of (2.1). Note that there exists \(u\in BV(\varOmega )\) such that \(\tilde{u}= u+g_\varOmega \). We consider now the continuous function \(f(s)=\Vert T(s u + g_\varOmega )- g\Vert _{L^1(\varOmega )}\) for \(s\in [0,1]\). Note that \(f(1)=\Vert T\tilde{u} - g\Vert _{L^1(\varOmega )}\le \nu _1|\varOmega |\) and \(f(0)=\Vert T g_\varOmega - g\Vert _{L^1(\varOmega )} = \Vert g - g_\varOmega \Vert _{L^1(\varOmega )}\ge \nu _1 |\varOmega |\), since \(T\cdot 1=1\), and hence there exists some \(s\in [0,1]\) such that \(f(s)=\nu _1 |\varOmega |\). Set \(u'=su\) which satisfies \(\Vert Tu' - g\Vert _{L^1(\varOmega )}=\nu _1|\varOmega |\) and

$$\begin{aligned} \int _{\varOmega } |D u'| = s \int _{\varOmega } |D u| \le \liminf _{n\rightarrow \infty } \int _{\varOmega } |D u_{n}|, \end{aligned}$$

where \((u_n)_n\) is a minimizing sequence of (1.8). Hence \(u'\) is a solution of (1.8). \(\square \)

Now, we are able to argue the equivalence of the problems (1.6) and (1.8).

Theorem 2.2

Let \(T\in \mathcal {L}(L^2(\varOmega ))\) be such that \(T\cdot 1 =1\) and \(\nu _\tau |\varOmega | \le \Vert g-g_\varOmega \Vert ^\tau _{L^\tau (\varOmega )}\). Then there exists \(\alpha \ge 0\) such that the constrained minimization problem (1.8) is equivalent to the unconstrained problem (1.6), i.e., u is a solution of (1.8) if and only if u solves (1.6).

Proof

For \(\tau =2\), the proof can be found in [20, Prop. 2.1]. By similar arguments, one can show the statement for \(\tau =1\), which we state here. Set \( \mathcal {R}(u):= \int _{\varOmega } |D u|\) and

$$\begin{aligned} {\mathcal G}(u)=\ {\left\{ \begin{array}{ll} +\infty &{} \text {if } \Vert u-g\Vert _{L^1(\varOmega )} >\nu _1 |\varOmega |,\\ 0 &{} \text {if } \Vert u-g\Vert _{L^1(\varOmega )} \le \nu _1 |\varOmega |. \end{array}\right. } \end{aligned}$$

Notice that \(\mathcal {R}\) and \({\mathcal G}\) are convex l.s.c functions and problem (2.1) is equivalent to \(\min _u \mathcal {R}(u)+{\mathcal G}(Tu)\). We have \({\text {Dom}}(\mathcal {R}) = BV(\varOmega ) \cap L^2(\varOmega )\) and \({\text {Dom}}({\mathcal G})=\{u\in L^2(\varOmega ) : {\mathcal G}(u)<+\infty \}\). Since \(g\in \overline{T{\text {Dom}}(\mathcal {R})}\), there exists \(\tilde{u}\in {\text {Dom}}(\mathcal {R})\) with \(\Vert T\tilde{u} - g\Vert _{L^1(\varOmega )} \le \nu _1|\varOmega |/2\). As \(T\in \mathcal {L}(L^2(\varOmega ))\) is continuous, \({\mathcal G}\circ T\) is continuous at \(\tilde{u}\). Hence, by [39, Prop. 5.6, p. 26] we obtain

$$\begin{aligned} \partial (\mathcal {R}+{\mathcal G}\circ T)(u) = \partial \mathcal {R}(u) + \partial ({\mathcal G}\circ T)(u) \end{aligned}$$

for all u. Further, \({\mathcal G}\) is continuous at \(T\tilde{u}\), and hence by [39, Prop. 5.7, p. 27] we have for all u

$$\begin{aligned} \partial ({\mathcal G}\circ T)(u) = T^*\partial {\mathcal G}(Tu), \end{aligned}$$

where \(\partial {\mathcal G}(u)=\{0\}\) if \(\Vert u-g\Vert _{L^1(\varOmega )}<\nu _1|\varOmega |\) and \(\partial {\mathcal G}(u)=\{\alpha \partial (\Vert u-g\Vert _{L^1(\varOmega )}), \alpha \ge 0\}\) if \(\Vert u-g\Vert _{L^1(\varOmega )}=\nu _1|\varOmega |\).

If u is a solution of (2.1) and hence of (1.8), then

$$\begin{aligned} 0\in \partial (\mathcal {R}+{\mathcal G}\circ T)(u) = \partial \mathcal {R}(u) + T^*\partial {\mathcal G}(Tu). \end{aligned}$$

Since any solution of (1.8) satisfies \(\Vert Tu-g\Vert _{L^1(\varOmega )}=\nu _1 |\varOmega |\), this shows that there exists an \(\alpha \ge 0\) such that

$$\begin{aligned} 0\in \partial \mathcal {R}(u) + \alpha T^*\partial (\Vert Tu-g\Vert _{L^1(\varOmega )}). \end{aligned}$$

Hence, for this \(\alpha \ge 0\), u is a minimizer of the problem in (1.6).

Conversely, a minimizer u of (1.6) with the above \(\alpha \) is obviously a solution of (1.8) with \(\Vert Tu-g\Vert _{L^1(\varOmega )}=\nu _1 |\varOmega |\). This concludes the proof.\(\square \)

Note that \(\Vert T u_\alpha -g\Vert _{L^1(\varOmega )}\) is (only) convex with respect to \(Tu_\alpha \), and hence a minimizer of (1.5) is in general not unique even in the simple case when \(T=I\), i.e., for two minimizers \(u_\alpha ^1\) and \(u_\alpha ^2\) in general we have \(T u_\alpha ^1 \not =T u_\alpha ^2\). On the contrary, \(\Vert T u_\alpha -g\Vert _{L^2(\varOmega )}^2\) is strictly convex with respect to \(Tu_\alpha \), i.e., for two minimizers \(u_\alpha ^1\) and \(u_\alpha ^2\) of (1.4) we have \(T u_\alpha ^1 =T u_\alpha ^2\). Moreover, the function \(\alpha \mapsto \mathcal {H}_1(u_\alpha ;g)\) is in general not continuous [23], while \(\alpha \mapsto \mathcal {H}_2(u_\alpha ;g)\) indeed is continuous [20], where \(u_\alpha \) is a respective minimizer of (1.6). Hence we have the following further properties:

Lemma 2.1

Let \(u_\alpha \) be a minimizer of (1.6). Then \(\alpha \mapsto \mathcal {H}_\tau (u_\alpha ;g)\) is nondecreasing for \(\tau =1,2\). Moreover, \(\alpha \mapsto \mathcal {H}_2(u_\alpha ;g)\) maps \(\mathbb R^+\) onto \([0,\Vert g- g_\varOmega \Vert _{L^2(\varOmega )}^2]\).

Proof

For a proof see [20, 23].\(\square \)

Proposition 2.2

If \(u_{\alpha _i}\) is a minimizer of

$$\begin{aligned} \mathcal {E}(u,\alpha _i) := \Vert u-g\Vert _{L^2(\varOmega )}^2 +\alpha _i \int _{\varOmega } |D u| \end{aligned}$$

for \(i=1,2\), then we have

$$\begin{aligned} \Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )} \le C \left\| g- g_\varOmega \right\| _{L^2(\varOmega )} \end{aligned}$$

with \(C:=\min \left\{ 2\left| \frac{\alpha _2 - \alpha _1}{\alpha _2 + \alpha _1}\right| , \left| \frac{\alpha _2 - \alpha _1}{\alpha _2 + \alpha _1}\right| ^{\frac{1}{2}}\right\} \).

Proof

By [7, Lemma 10.2], we have

$$\begin{aligned} \begin{aligned} \frac{1}{\alpha _1} \Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )}^2&\le \frac{1}{\alpha _1} \left( \mathcal {E}(u_{\alpha _2},\alpha _1) - \mathcal {E}(u_{\alpha _1},\alpha _1)\right) \\ \frac{1}{\alpha _2} \Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )}^2&\le \frac{1}{\alpha _2} \left( \mathcal {E}(u_{\alpha _1},\alpha _2) - \mathcal {E}(u_{\alpha _2},\alpha _2)\right) . \end{aligned} \end{aligned}$$

Summing up these two inequalities yields

$$\begin{aligned} \begin{aligned}&\left( \frac{1}{\alpha _1} + \frac{1}{\alpha _2} \right) \Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )}^2 \\&\quad \le \left( \frac{1}{\alpha _1} - \frac{1}{\alpha _2}\right) \left( \Vert u_{\alpha _2}-g\Vert _{L^2(\varOmega )}^2 - \Vert u_{\alpha _1}-g\Vert _{L^2(\varOmega )}^2\right) \ \end{aligned} \end{aligned}$$

which implies

$$\begin{aligned}&\Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )}^2 \nonumber \\&\quad \le \frac{\alpha _2 - \alpha _1}{\alpha _1+\alpha _2} \left( \Vert u_{\alpha _2}-g\Vert _{L^2(\varOmega )}^2 - \Vert u_{\alpha _1}-g\Vert _{L^2(\varOmega )}^2\right) .\nonumber \\ \end{aligned}$$
(2.2)

By the nondecrease and boundedness of the function \(\alpha \mapsto \mathcal {H}_2(u_\alpha ;g)\), see Lemma 2.1, it follows

$$\begin{aligned} \begin{aligned} \Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )}^2 \le \left| \frac{\alpha _2 - \alpha _1}{\alpha _1+\alpha _2} \right| \left\| g - g_\varOmega \right\| _{L^2(\varOmega )}^2 . \end{aligned} \end{aligned}$$
(2.3)

On the other hand, inequality (2.2) implies

$$\begin{aligned} \begin{aligned} \Vert u_{\alpha _1}&- u_{\alpha _2} \Vert _{L^2(\varOmega )} \\&\le \left| \frac{\alpha _2 - \alpha _1}{\alpha _1+\alpha _2}\right| \left( \Vert u_{\alpha _2}-g\Vert _{L^2(\varOmega )} + \Vert u_{\alpha _1}-g\Vert _{L^2(\varOmega )}\right) , \end{aligned} \end{aligned}$$

where we used the binomial formula \(a^2 -b^2 = (a+b)(a-b)\) for \(a,b\in \mathbb R\) and the triangle inequality. Using Lemma 2.1 yields

$$\begin{aligned} \begin{aligned} \Vert u_{\alpha _1} - u_{\alpha _2} \Vert _{L^2(\varOmega )} \le 2 \left| \frac{\alpha _2 - \alpha _1}{\alpha _1+\alpha _2}\right| \left\| g - g_\varOmega \right\| _{L^2(\varOmega )}. \end{aligned} \end{aligned}$$
(2.4)

The assertion follows then from (2.3) and (2.4). \(\square \)

Remark 2.1

Without loss of generality, let \(\alpha _2 \ge \alpha _1\) in Proposition 2.2. Then we easily check that

$$\begin{aligned} C={\left\{ \begin{array}{ll} \left( \frac{\alpha _2-\alpha _1}{\alpha _2+\alpha _1}\right) ^{\frac{1}{2}} &{} \text {if } \alpha _2 > \frac{5}{3}\alpha _1, \\ \frac{1}{2} &{} \text {if } \alpha _2 = \frac{5}{3}\alpha _1, \\ 2 \frac{\alpha _2-\alpha _1}{\alpha _2+\alpha _1} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
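
As a quick sanity check (purely illustrative, with arbitrarily chosen values of \(\alpha _1\) and \(\alpha _2\)), the following snippet evaluates both candidates in the definition of C from Proposition 2.2 and confirms that the minimum switches at \(\alpha _2=\frac{5}{3}\alpha _1\), where both candidates equal \(\frac{1}{2}\).

    import numpy as np

    def C(alpha1, alpha2):
        # constant from Proposition 2.2
        t = abs(alpha2 - alpha1) / (alpha2 + alpha1)
        return min(2.0 * t, np.sqrt(t))

    # the minimum switches at alpha2 = 5/3 * alpha1, where both candidates equal 1/2
    for a2 in (1.2, 5.0 / 3.0, 4.0):
        t = (a2 - 1.0) / (a2 + 1.0)
        print(a2, 2.0 * t, np.sqrt(t), C(1.0, a2))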

3 Automated Scalar Parameter Selection

In order to find a suitable regularization parameter \(\alpha >0\) of the minimization problem (1.6), we consider the corresponding constrained optimization problem (1.8). Throughout the paper, we assume that T does not annihilate constant function, which guarantees the existence of a minimizer of the considered optimization problems; see Sect. 2. We recall that in the constraint of (1.8) the value \(\mathcal {B}_\tau \) is defined as \(\mathcal {B}_\tau =\frac{\nu _\tau }{\tau }|\varOmega |\), where \(\nu _\tau \in \mathbb R\) is a statistical value depending on the underlying noise and possibly on the original image.

3.1 Statistical Characterization of the Noise

Let us characterize the noise corrupting the image in more detail by making similar considerations as in [58, Sect. 2]. Note that at any point \(x\in \varOmega \) the contaminated image \(g(x) = \mathcal {N}(T\hat{u})(x)\) is a stochastic observation, which depends on the underlying noise. Two important measures to characterize noise are the expected absolute value and the variance, which we denote by \(\nu _1\) and \(\nu _2\), respectively. For images contaminated by Gaussian white noise with standard deviation \(\sigma \), we typically set \(\tau =2\) and \(\nu _2=\sigma ^2\). If the image is instead corrupted by impulse noise, then we set \(\tau =1\) and we have to choose \(\nu _1\) properly. In particular, for salt-and-pepper noise \(\nu _1\in [\min \{r_1,r_2\},\max \{r_1,r_2\}]\), while for random-valued impulse noise \(\nu _1\) should be a value in the interval \([\frac{r}{4},\frac{r}{2}]\), where we used that for any point \(x\in \varOmega \) we have \(T\hat{u}(x)\in [0,1]\); cf. [58]. Here \(\nu _1\) seems to be fixed, while actually \(\nu _1\) depends on the true (unknown) image \(\hat{u}\). In particular, for salt-and-pepper noise the expected absolute value is given by

$$\begin{aligned} \nu _1(\hat{u}) := r_2-(r_2-r_1) \frac{1}{|\varOmega |}\int _{ \varOmega }(T\hat{u})(x) \ d x \end{aligned}$$
(3.1)

and for random-valued impulse noise we have

$$\begin{aligned} \nu _1(\hat{u}) := \frac{1}{|\varOmega |}\int _{\varOmega }r\left( (T\hat{u})(x)^2 - (T \hat{u})(x) +\frac{1}{2}\right) \ d x. \end{aligned}$$
(3.2)

However, instead of considering the constraint \({\mathcal H}_\tau (u;g) = \mathcal {B}_\tau (u)\) in (1.8), which results in a quite nonlinear problem, in our numerics we choose a reference image and compute an approximate value \(\mathcal {B}_\tau \). Since our proposed algorithms are of iterative nature (see APS- and pAPS-algorithm below), it makes sense to choose the current approximation as the reference image, i.e., the reference image changes during the iterations. Note that for salt-and-pepper noise with \(r_1=r_2\) the expected absolute value becomes independent of \(\hat{u}\) and hence \(\nu _1=r_1\). In case of Gaussian noise, \(\nu _\tau \) and \(\mathcal {B}_\tau \) are independent of \(\hat{u}\) too. Nevertheless, in order to keep the paper concise, in the sequel instead of \(\nu _\tau \) and \(\mathcal {B}_\tau \) we often write \(\nu _\tau (\tilde{u})\) and \(\mathcal {B}_\tau (\tilde{u})\), where \(\tilde{u}\) represents a reference image approximating \(\hat{u}\), even if the values may actually be independent from the image.
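
A minimal sketch (Python/NumPy) of how \(\nu _\tau \) and \(\mathcal {B}_\tau \) may be computed from a reference image is given below; the function name and its arguments are our own illustration, and the reference image as well as the noise parameters are assumed to be supplied by the user.

    import numpy as np

    def nu_and_B(Tu_ref, noise, tau, sigma=None, r1=None, r2=None, r=None):
        # nu_tau for a reference image Tu_ref with values in [0, 1] and
        # B_tau = (nu_tau / tau) * |Omega|, cf. (1.8), (3.1), (3.2)
        n_pix = Tu_ref.size
        if noise == 'gaussian':            # tau = 2, nu_2 = sigma^2
            nu = sigma ** 2
        elif noise == 'salt-and-pepper':   # tau = 1, formula (3.1)
            nu = r2 - (r2 - r1) * Tu_ref.mean()
        elif noise == 'random-valued':     # tau = 1, formula (3.2)
            nu = np.mean(r * (Tu_ref ** 2 - Tu_ref + 0.5))
        else:
            raise ValueError(noise)
        return nu, nu / tau * n_pix

    # for r1 = r2 the salt-and-pepper value nu_1 = r1, independent of the image
    nu1, B1 = nu_and_B(np.full((64, 64), 0.3), 'salt-and-pepper', tau=1, r1=0.05, r2=0.05)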

3.2 Automated Parameter Selection Strategy

In order to determine a suitable regularization parameter \(\alpha \), in [18] an algorithm for solving the constrained minimization problem (1.8) for \(T=I\) and \(\tau =2\) is proposed, i.e., in the presence of Gaussian noise with zero mean and standard deviation \(\sigma \). This algorithm relies on the fact that \(\alpha \mapsto {\mathcal H}_2(u_\alpha ;g)\) is nondecreasing, which leads to the following iterative procedure.

[CPS-algorithm]

For the minimization of the optimization problem in step 1), in [18] a method based on the dual formulation of the total variation is used. However, we note that any other algorithm for total variation minimization might be used for solving this minimization problem. The CPS-algorithm generates a sequence \((u_{\alpha _n})_n\) such that for \(n\rightarrow \infty \), \(\Vert u_{\alpha _n} - g\Vert _{L^2(\varOmega )} \rightarrow \sigma \sqrt{|\varOmega |}\) and \(u_{\alpha _n}\) converges to the unique solution of (1.8) with \(T=I\) and \(\tau =2\) [18]. The proof relies on the fact that the function \(\alpha \mapsto \frac{\Vert u_\alpha - g\Vert _{L^2(\varOmega )}}{\alpha }\) is nonincreasing. Note that this property does not hold in general for operators \(T\not =I\).

3.2.1 The p-Adaptive Algorithm

We generalize now the CPS-algorithm to optimization problems of the type (1.8) for \(\tau =1,2\) and for general operators T. In order to keep or obtain appropriate convergence properties, we need the following two conditions to be satisfied. Firstly, the function \(\alpha \mapsto {\mathcal H}_\tau (u_\alpha ;g)\) has to be monotonic, which is the case due to Lemma 2.1. Secondly, in each iteration n the parameter \(\alpha _n\) has to be updated such that \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) is monotonic and bounded by \((\mathcal {B}_\tau (u_{\alpha _n}))_n\). More precisely, if \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau (u_{\alpha _0})\), then there has to exist an \(\alpha _{n+1}\ge \alpha _n\) such that \( {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau (u_{\alpha _{n+1}})\), while if \({\mathcal H}_\tau (u_{\alpha _0};g)> \mathcal {B}_\tau (u_{\alpha _0})\), then there has to exist an \(\alpha _{n+1}\le \alpha _n\) such that \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\ge \mathcal {B}_\tau (u_{\alpha _{n+1}})\). This holds true by setting in every iteration

$$\begin{aligned} \alpha _{n+1}:=\left( \frac{\mathcal {B}_\tau (u_{\alpha _n})}{{\mathcal H}_\tau (u_{\alpha _n};g)}\right) ^p \alpha _n \end{aligned}$$
(3.3)

together with an appropriate choice of \(p\ge 0\). In particular, there always exists a \(p\ge 0\) such that this condition is satisfied.

Proposition 3.1

Assume \(\Vert g-g_\varOmega \Vert _{L^\tau (\varOmega )}^\tau \ge \nu _\tau |\varOmega |\) and \(\alpha _{n+1}\) is defined as in (3.3).

  1. (i)

    If \(\alpha _n>0\) such that \({\mathcal H}_\tau (u_{\alpha _n};g)= \mathcal {B}_\tau (u_{\alpha _{n}})\), then for all \(p\in \mathbb R\) we have that \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau (u_{\alpha _{n+1}})\).

  2. (ii)

    If \(\alpha _n>0\) such that \(0<{\mathcal H}_\tau (u_{\alpha _n};g)<\mathcal {B}_\tau (u_{\alpha _{n}})\), then there exists a \(p\ge 0\) with \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau (u_{\alpha _{n+1}})\).

  3. (iii)

    If \(\alpha _n>0\) such that \({\mathcal H}_\tau (u_{\alpha _n};g)>\mathcal {B}_\tau (u_{\alpha _{n}})\), then there exists a \(p\ge 0\) with \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\ge \mathcal {B}_\tau (u_{\alpha _{n+1}})\).

Proof

The assertion immediately follows by noting that for \(p=0\) we have \(\alpha _{n+1}=\alpha _n\).\(\square \)

Taking these considerations into account, a generalization of the CPS-algorithm can be formulated as the following p-adaptive automated parameter selection algorithm:

[pAPS-algorithm]

Note that, due to the dependence of \(\alpha _{n+1}\) on p, a proper p cannot be computed explicitly but only iteratively, as in the pAPS-algorithm.

The initial \(p_0> 0\) can be chosen arbitrarily. However, we suggest choosing it sufficiently large in order to keep the number of iterations small. In particular, in our numerical experiments in Sect. 6 we set \(p_0=32\), which seems large enough to us.
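
Since the algorithm box above is not reproduced in the text, the following sketch (Python/NumPy) shows one plausible realization of such a p-adaptive iteration based on the update (3.3): the routine solve_tv is a placeholder for any total variation solver returning a minimizer \(u_\alpha \) of (1.6), \(\mathcal {B}_\tau \) is taken as a constant B for simplicity, and p is simply halved until the monotonicity requirement of Proposition 3.1 is met.

    import numpy as np

    def paps(g, tau, B, solve_tv, T=lambda u: u,
             alpha0=1.0, p0=32.0, tol=1e-3, max_iter=100):
        # sketch of a p-adaptive parameter selection loop based on (3.3):
        #   alpha_{n+1} = (B / H_tau(u_{alpha_n}; g))^p * alpha_n,
        # where p is halved until the monotonicity requirement of Prop. 3.1
        # is met (p -> 0 leaves alpha unchanged, so the inner loop terminates);
        # solve_tv(g, alpha, tau) is assumed to return a minimizer of
        # (1/tau)||T u - g||_tau^tau + alpha * TV(u), T being the forward operator
        H = lambda u: np.sum(np.abs(T(u) - g) ** tau) / tau
        alpha, p = float(alpha0), float(p0)
        u = solve_tv(g, alpha, tau)
        below = H(u) <= B                     # side of the constraint we start on
        for _ in range(max_iter):
            Hu = H(u)
            if abs(Hu - B) <= tol * B:        # discrepancy (approximately) met
                break
            while True:
                alpha_new = (B / Hu) ** p * alpha
                u_new = solve_tv(g, alpha_new, tau)
                Hn = H(u_new)
                if (Hn <= B if below else Hn >= B) or p < 1e-12:
                    break
                p *= 0.5                      # adapt p, cf. Prop. 3.1
            alpha, u = alpha_new, u_new
        return u, alpha

Fixing \(p\equiv 1\) and omitting the inner adaptation of p yields the APS variant discussed in Sect. 3.2.2 below.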

Proposition 3.2

The pAPS-algorithm generates monotone sequences \((\alpha _n)_n\) and \(\left( {\mathcal H}_\tau (u_{\alpha _n};g)\right) _n\) such that \(\left( {\mathcal H}_\tau (u_{\alpha _n};g)\right) _n\) is bounded. Moreover, if \( {\mathcal H}_\tau (u_{\alpha _0};g) > \mathcal {B}_\tau (u_{\alpha _0})\) or \(\mathcal {B}_\tau (u_\alpha )\le \frac{1}{\tau }\Vert g- g_\varOmega \Vert _\tau ^\tau \) for all \(\alpha >0\), then \((\alpha _n)_n\) is also bounded.

Proof

If \({\mathcal H}_\tau (u_{\alpha _0};g) > \mathcal {B}_\tau (u_{\alpha _0})\), then by induction and Lemma 2.1 one shows that \(0< \alpha _{n+1} \le \alpha _n\) and \(0 \le {\mathcal H}_\tau (u_{\alpha _{n+1}};g) \le {\mathcal H}_\tau (u_{\alpha _n};g)\) for all \(n\in \mathbb N\). Consequently, \((\alpha _n)_n\) and \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) are monotonically decreasing and bounded.

If \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau (u_{\alpha _0})\), due to Lemma 2.1 we have that \(0<\alpha _n\le \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g)\le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\) for all \(n\in \mathbb N\) and hence \((\alpha _n)_n\) and \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) are monotonically increasing. Since there exists \(\mathcal {B}_\tau ^*>0\) such that \({\mathcal H}_\tau (u_{\alpha _n};g) \le \mathcal {B}_\tau (u_{\alpha _n})\le \mathcal {B}_\tau ^*\) for all \(n\in \mathbb N\), see Sect. 3.1, \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) is also bounded. If we additionally assume that \(\mathcal {B}_\tau (u_\alpha )\le \frac{1}{\tau }\Vert g-g_\varOmega \Vert _\tau ^\tau \) for all \(\alpha >0\) and we set \(\mathcal {B}_\tau ^*:=\max _\alpha \mathcal {B}_\tau (u_\alpha )\), then Theorem 2.2 ensures the existence of an \(\alpha ^*\ge 0\) such that \({\mathcal H}_\tau (u_{\alpha ^*};g) = \mathcal {B}_\tau ^*\). By Lemma 2.1, it follows that \(\alpha _{n}\le \alpha ^*\) for all \(n\in \mathbb N\), since \({\mathcal H}_\tau (u_{\alpha _{n}};g) \le \mathcal {B}_\tau (u_{\alpha _n})\le \mathcal {B}_\tau ^*\). Hence, \((\alpha _n)_n\) is bounded, which finishes the proof.\(\square \)

Since any monotone and bounded sequence converges to a finite limit, also \((\alpha _n)_n\) converges to a finite value if one of the assumptions in Proposition 3.2 holds. For constant \(\mathcal {B}_\tau \), we are even able to argue the convergence of the pAPS-algorithm to a solution of the constrained minimization problem (1.8).

Theorem 3.1

Assume that \(\mathcal {B}_\tau (u)\equiv \mathcal {B}_\tau \) is a constant independent of u and \(\Vert g- g_\varOmega \Vert _\tau ^\tau \ge \nu _\tau |\varOmega |\). Then the pAPS-algorithm generates a sequence \((\alpha _n)_n\) such that \(\lim _{n\rightarrow \infty } \alpha _n=\bar{\alpha } >0\) with \({\mathcal H}_\tau (u_{\bar{\alpha }};g)=\lim _{n\rightarrow \infty }{\mathcal H}_\tau (u_{\alpha _n};g)\) \(=\) \(\mathcal {B}_\tau \) and \(u_{\alpha _n} \rightarrow u_{\bar{\alpha }}\in \arg \min _{u\in BV(\varOmega )} \mathcal {J}_\tau (u,\bar{\alpha })\) for \(n\rightarrow \infty \).

Proof

Let us start by assuming that \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau \). By induction, we show that \(\alpha _n \le \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g)\) \(\le \) \( {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau \). In particular, if \({\mathcal H}_\tau (u_{\alpha _n};g) \le \mathcal {B}_\tau \), then \(\alpha _{n+1}=\left( \frac{\mathcal {B}_\tau }{{\mathcal H}_\tau (u_{\alpha _n};g)} \right) ^p\alpha _n > \alpha _n\), where \(p> 0\) is such that \({\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau \); cf. pAPS-algorithm. Then by Lemma 2.1 it follows that

$$\begin{aligned} {\mathcal H}_\tau (u_{\alpha _n};g)\le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau . \end{aligned}$$

Note that there exists an \(\alpha ^*>0\) with \({\mathcal H}_\tau (u_{\alpha ^*};g)=\mathcal {B}_\tau \), see Theorem 2.2, such that for any \(\alpha \ge \alpha ^*\), \({\mathcal H}_\tau (u_{\alpha };g) \ge \mathcal {B}_\tau \); cf. Lemma 2.1. If \(\alpha _n\ge \alpha ^*\), then \({\mathcal H}_\tau (u_{\alpha _n};g)\ge \mathcal {B}_\tau \). Hence \({\mathcal H}_\tau (u_{\alpha _n};g) = \mathcal {B}_\tau \) and \(\alpha _{n+1}=\alpha _n\). Thus, we deduce that the sequences \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) and \((\alpha _n)_n\) are nondecreasing and bounded. Consequently, there exists an \(\bar{\alpha }\) such that \(\lim _{n\rightarrow \infty }\alpha _n = \bar{\alpha }\) with \({\mathcal H}_\tau (u_{\bar{\alpha }};g) = \mathcal {B}_\tau \). Let \(\bar{{\mathcal H}} = \lim _{n\rightarrow \infty } {\mathcal H}_\tau (u_{\alpha _n};g)\). Then \(\bar{{\mathcal H}}={\mathcal H}_\tau (u_{\bar{\alpha }};g)=\mathcal {B}_\tau \). By the optimality of \(u_{\alpha _n}\), we have that \(0\in \partial \mathcal {J}_\tau (u_{\alpha _n},\alpha _n) = \partial {\mathcal H}_\tau (u_{\alpha _n};g) + \alpha _n \partial \int _\varOmega |D u_{\alpha _n}|\); see [39, Prop. 5.6 and Eq. (5.21), p. 26]. Consequently, there exist \(v_{\alpha _n}\in \partial \int _\varOmega |D u_{\alpha _n}|\) such that \(-\alpha _n v_{\alpha _n}\in \partial {\mathcal H}_\tau (u_{\alpha _n};g) \) with \(\lim _{n\rightarrow \infty } v_{\alpha _n}=v_{\bar{\alpha }}\). By [79, Theorem 24.4, p. 233], we obtain that \(-\bar{\alpha }v_{\bar{\alpha }}\in \partial {\mathcal H}_\tau (u_{\bar{\alpha }};g) \) with \(v_{\bar{\alpha }}\in \partial \int _\varOmega |D u_{\bar{\alpha }}|\) and hence \(0\in \partial \mathcal {J}_\tau (u_{\bar{\alpha }},\bar{\alpha })\) for \(n\rightarrow \infty \).

If \({\mathcal H}_\tau (u_{\alpha _0};g)>\mathcal {B}_\tau \), then as above we can show by induction that \(\alpha _n \ge \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g)\ge {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\ge \mathcal {B}_\tau \). Thus, we deduce that \(({\mathcal H}_\tau (u_{\alpha _n};g))_n\) and \((\alpha _n)_n\) are nonincreasing and bounded. Note that there exists an \(\alpha ^*>0\) with \({\mathcal H}_\tau (u_{\alpha ^*};g)=\mathcal {B}_\tau \) such that for any \(\alpha \le \alpha ^*\), \({\mathcal H}_\tau (u_{\alpha };g)\le \mathcal {B}_\tau \). Hence if \(\alpha _n\le \alpha ^*\), then \({\mathcal H}_\tau (u_{\alpha _n};g)\le \mathcal {B}_\tau \). This implies that \({\mathcal H}_\tau (u_{\alpha _n};g) = \mathcal {B}_\tau \) and \(\alpha _{n+1}=\alpha _n\). The rest of the proof is identical to the above.\(\square \)

Remark 3.1

The adaptive choice of the value p in the pAPS-algorithm is fundamental for proving convergence in Theorem 3.1. In particular, the value p is chosen in dependence on \(\alpha \), i.e., actually \(p=p(\alpha )\), such that in the case of a constant \(\mathcal {B}_\tau \) the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)^{p(\alpha )}}{\alpha }\) is nonincreasing; cf. Fig. 5a.

3.2.2 The Non p-Adaptive Case

A special case of the pAPS-algorithm arises when the value p is not adapted in each iteration but kept fixed. For the case \(p=1\) (fixed), we obtain the following automated parameter selection algorithm.

[APS-algorithm]

Even in this case, albeit under certain assumptions, we can immediately argue the convergence of this algorithm.

Theorem 3.2

For \(\alpha >0\), let \(u_\alpha \) be a minimizer of \(\mathcal {J}_\tau (u,\alpha )\). Assume that \(\mathcal {B}_\tau (u)\equiv \mathcal {B}_\tau \) is a constant independent of u, the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is nonincreasing, and \(\Vert g- g_\varOmega \Vert _\tau ^\tau \ge \nu _\tau |\varOmega |\). Then the APS-algorithm generates a sequence \((\alpha _n)_n \subset \mathbb R^+\) such that \(\lim _{n\rightarrow \infty } \alpha _n = \bar{\alpha }>0\), \(\lim _{n\rightarrow \infty }{\mathcal H}_\tau (u_{\alpha _n};g)=\mathcal {B}_\tau \), and \(u_{\alpha _n}\) converges to \(u_{\bar{\alpha }}\in \arg \min _{u\in BV(\varOmega )} \mathcal {J}_\tau (u,\bar{\alpha })\) for \(n\rightarrow \infty \).

Proof

We only consider the case \({\mathcal H}_\tau (u_{\alpha _0};g)\le \mathcal {B}_\tau \) by noting that the case \({\mathcal H}_\tau (u_{\alpha _0};g)> \mathcal {B}_\tau \) can be treated analogously. By induction, we can show that \(\alpha _n\le \alpha _{n+1}\) and \({\mathcal H}_\tau (u_{\alpha _n};g) \le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\le \mathcal {B}_\tau \). More precisely, if \({\mathcal H}_\tau (u_{\alpha _n};g)\le \mathcal {B}_\tau \), then \(\alpha _{n+1}=\frac{\mathcal {B}_\tau }{{\mathcal H}_\tau (u_{\alpha _n};g)}\alpha _n \ge \alpha _n\) and by Lemma 2.1 it follows that \({\mathcal H}_\tau (u_{\alpha _n};g)\le {\mathcal H}_\tau (u_{\alpha _{n+1}};g)\). Moreover, by the assumption that \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_{\alpha };g)}{\alpha }\) is nonincreasing, we obtain \({\mathcal H}_\tau (u_{\alpha _{n+1}};g) \le \frac{\alpha _{n+1}}{\alpha _n} {\mathcal H}_\tau (u_{\alpha _n};g)=\mathcal {B}_\tau .\) That is,

$$\begin{aligned} {\mathcal H}_\tau (u_{\alpha _n};g)\le {\mathcal H}_\tau (u_{\alpha _{n+1}};g) \le \mathcal {B}_\tau . \end{aligned}$$

The rest of the proof is analogous to the one of Theorem 3.1.\(\square \)

Nothing is known about the convergence of the APS-algorithm if \(\mathcal {B}_\tau (\cdot )\) indeed depends on u and \(\mathcal {B}_\tau (u_{\alpha _n})\) is used instead of a fixed constant. In our numerics we even observe for some examples, in particular for the application of removing random-valued impulse noise with \(r=0.05\), that starting from a certain iteration the sequence \((\alpha _n)_n\) oscillates between two states; see Fig. 10c. This behavior can be attributed to the fact that, for example, if \({\mathcal H}_\tau (u_{\alpha _n};g) \le \mathcal {B}_\tau (u_{\alpha _n})\), then it is not guaranteed that also \({\mathcal H}_\tau (u_{\alpha _{n+1}};g) \le \mathcal {B}_\tau (u_{\alpha _{n+1}})\), which is essential for the convergence. The second assumption in the previous theorem, i.e., the nonincrease of the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\), can be slightly loosened, since for the convergence of the APS-algorithm it is enough to demand the nonincrease starting from a certain iteration \(\tilde{n}\ge 0\). That is, if there exists a region \(U\subset \mathbb R^+\) where \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is nonincreasing and \((\alpha _n)_{n\ge \tilde{n}} \subset U\), then the algorithm converges; see Fig. 5. Analytically, this can be easily shown via Theorem 3.2 by just considering \(\alpha _{\tilde{n}}\) as the initial value of the algorithm. If \(\tau =2\), similar to the CPS-algorithm, we are able to show the following monotonicity property.

Proposition 3.3

If there exists a constant \(c > 0\) such that \(\Vert T^*(T u - g)\Vert _{L^2(\varOmega )} = c \Vert T u - g\Vert _{L^2(\varOmega )}\) for all \(u\in L^2(\varOmega )\), then the function \(\alpha \mapsto \frac{\sqrt{{\mathcal H}_2(u_\alpha ;g)}}{\alpha }\) is nonincreasing, where \(u_\alpha \) is a minimizer of \( \mathcal {J}_2(u,\alpha )\).

Proof

We start by replacing the functional \(\mathcal {J}_2\) by a family of surrogate functionals denoted by \(\bar{\mathcal {S}}\) and defined for \(u,a\in X\) as

$$\begin{aligned} \begin{aligned} \bar{\mathcal {S}}(u,a)&:= \mathcal {J}_{2}(u,\alpha ) + \frac{\delta }{2}\Vert u-a\Vert _{L^2(\varOmega )}^2 - \frac{1}{2}\Vert T(u-a)\Vert _{L^2(\varOmega )}^2 \\&=\frac{\delta }{2} \Vert u-z(a)\Vert _{L^2(\varOmega )}^2 + \alpha \int _\varOmega |Du| + \psi (a,g,T), \end{aligned} \end{aligned}$$

where \(\delta >\Vert T\Vert ^2\), \(z(a):=a - \frac{1}{\delta }T^*(T a - g)\), and \(\psi \) is a function independent of u. It can be shown that the iteration

$$\begin{aligned} u_{\alpha ,0}\in X, \quad u_{\alpha ,k+1}=\arg \min _u \bar{\mathcal {S}}(u,u_{\alpha ,k}), \ k\ge 0 \end{aligned}$$
(3.4)

generates a sequence \((u_{\alpha ,k})_k\) which converges weakly for \(k\rightarrow \infty \) to a minimizer \(u_{\alpha }\) of \(\mathcal {J}_2(u,\alpha )\); see for example [32]. The unique minimizer \(u_{\alpha ,k+1}\) is given by \(u_{\alpha ,k+1} = (I-P_{\frac{\alpha }{\delta }K})(z(u_{\alpha ,k}))\), where K is the closure of the set

$$\begin{aligned} \{{\text {div}}\xi \ : \ \xi \in C_c^1(\varOmega ,\mathbb R^2), |\xi (x)| \le 1 \ \forall x\in \varOmega \} \end{aligned}$$

and \(P_K(u):=\arg \min _{v\in K} \Vert u-v\Vert _{L^2(\varOmega )}\); see [18]. Then for \(k\rightarrow \infty \), let us define

$$\begin{aligned} \tilde{f}(\alpha ):= \left\| P_{\frac{\alpha }{\delta }K}( z(u_{\alpha }))\right\| _{L^2(\varOmega )} = \left\| \frac{1}{\delta }T^*(T u_{\alpha } - g)\right\| _{L^2(\varOmega )}. \end{aligned}$$

Since \(\Vert T^*(T u - g)\Vert _{L^2(\varOmega )} = c \Vert T u - g\Vert _{L^2(\varOmega )}\), it follows that \(\tilde{f}(\alpha )=\frac{c}{\delta } \sqrt{2 {\mathcal H}_2(u_\alpha ;g)}\). The assertion follows by applying [18, Lemma 4.1], which is extendable to infinite dimensions, to \(\tilde{f}\) and by noting that the nonincrease of \(\alpha \mapsto \frac{\tilde{f}(\alpha )}{\alpha }\) implies the nonincrease of \(\alpha \mapsto \frac{\sqrt{{\mathcal H}_2(u_\alpha ;g)}}{\alpha }\).\(\square \)
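
The fixed-point iteration (3.4) admits a compact implementation once a routine for the map \(I-P_{\frac{\alpha }{\delta }K}\), i.e., total variation denoising with weight \(\frac{\alpha }{\delta }\), is available. The sketch below assumes such a routine prox_tv (e.g., the projection algorithm of [18]) as well as callables T and Tadj for the operator and its adjoint; all names are placeholders of ours.

    import numpy as np

    def surrogate_iteration(g, T, Tadj, alpha, delta, prox_tv, u0, n_iter=200):
        # iteration (3.4): u_{k+1} = (I - P_{(alpha/delta) K})(z(u_k)) with
        # z(a) = a - (1/delta) T*(T a - g) and delta > ||T||^2;
        # prox_tv(z, weight) is assumed to perform TV denoising of z, i.e.,
        # to realize the map I - P_{weight K}
        u = u0
        for _ in range(n_iter):
            z = u - Tadj(T(u) - g) / delta
            u = prox_tv(z, alpha / delta)
        return u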

We remark that for convolution-type operators the assumption of Proposition 3.3 does not hold in general. However, there exist several operators T, relevant in image processing, with the property \(\Vert T^*(Tu-g)\Vert _{L^2(\varOmega )} = \Vert Tu-g\Vert _{L^2(\varOmega )}\). Such operators include \(T=I\) for image denoising, \(T=1_D\) for image inpainting, where \(1_D\) denotes the characteristic function of the domain \(D\subset \varOmega \), and \(T=S\circ A\), where S is a subsampling operator and A is an analysis operator of a Fourier or orthogonal wavelet transform. The latter type of operator is used for reconstructing signals from partial Fourier data [17] or in wavelet inpainting [25], respectively. For all such operators, the function \(\alpha \mapsto \frac{\sqrt{{\mathcal H}_2(u_\alpha ;g)}}{\alpha }\) is nonincreasing, and hence by fixing \(p=\frac{1}{2}\) in the pAPS-algorithm or changing the update of \(\alpha \) in the APS-algorithm to

$$\begin{aligned} \alpha _{n+1} := \sqrt{\frac{\mathcal {B}_2}{{\mathcal H}_2(u_{\alpha _n};g)}} \alpha _n, \end{aligned}$$

where \(\mathcal {B}_2\) is a fixed constant, chosen according to (1.6), we obtain in these situations a convergent algorithm.

We emphasize once more that in general the nonincrease of the function \(\alpha \mapsto \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is not guaranteed. Nevertheless, there always exists a constant \(p\ge 0\) such that \(\alpha \mapsto \frac{({\mathcal H}_\tau (u_\alpha ;g))^p}{\alpha }\) is indeed nonincreasing. For example, \(p=\frac{1}{2}\) for operators T with the property \(\Vert T^*(Tu-g)\Vert _{L^2(\varOmega )} = \Vert Tu-g\Vert _{L^2(\varOmega )}\); cf. Proposition 3.3. In particular, one easily checks the following result.

Proposition 3.4

Let \(0<\alpha \le \beta \), and \(u_\alpha \) and \(u_\beta \) minimizers of \(\mathcal {J}_\tau (\cdot ,\alpha )\) and \(\mathcal {J}_\tau (\cdot ,\beta )\), respectively, for \(\tau =1,2\). Then \(\frac{({\mathcal H}_\tau (u_{\beta };g))^p}{\beta } \le \frac{({\mathcal H}_\tau (u_{\alpha };g))^p}{\alpha }\) if and only if \(p \le \frac{\ln {\beta } - \ln {\alpha }}{\ln {{\mathcal H}_\tau (u_{\beta };g)} - \ln {{\mathcal H}_\tau (u_{\alpha };g)}}\).
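
As a small numerical illustration (with arbitrarily chosen values of \(\alpha \), \(\beta \), and of the corresponding residuals), the bound on p in Proposition 3.4 can be checked directly:

    import numpy as np

    # Proposition 3.4: for 0 < alpha <= beta, (H_beta)^p / beta <= (H_alpha)^p / alpha
    # holds if and only if p <= (ln(beta) - ln(alpha)) / (ln(H_beta) - ln(H_alpha))
    alpha, beta = 0.5, 2.0           # example parameters (arbitrary)
    H_alpha, H_beta = 0.10, 0.25     # residuals; nondecreasing in alpha (Lemma 2.1)
    p_max = (np.log(beta) - np.log(alpha)) / (np.log(H_beta) - np.log(H_alpha))
    for p in (0.5 * p_max, p_max, 2.0 * p_max):
        print(p, H_beta ** p / beta <= H_alpha ** p / alpha + 1e-12)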

4 Locally Constrained TV Problem

In order to enhance image details, while preserving homogeneous regions, we formulate, as in [37, 58], a locally constrained optimization problem. That is, instead of considering (1.6) we formulate

$$\begin{aligned} \min _{u\in BV(\varOmega )} \int _\varOmega |Du| \ \text { s.t. } \ \int _\varOmega w(x,y) |Tu - g|^\tau (y) \mathrm{d}y \le \nu _\tau \nonumber \\ \end{aligned}$$
(4.1)

for almost every \(x\in \varOmega \), where w is a normalized filter, i.e., \(w\in L^\infty (\varOmega \times \varOmega )\), and \(w\ge 0\) on \(\varOmega \times \varOmega \) with

$$\begin{aligned} \int _{\varOmega }\int _\varOmega w(x,y)\,\mathrm{d}y\, \mathrm{d}x =1 \quad \text { and } \quad \int _{\varOmega }\int _\varOmega w(x,y)|\phi (y)|^\tau \,\mathrm{d}y\, \mathrm{d}x \ge \epsilon \Vert \phi \Vert ^\tau _{L^\tau (\varOmega )} \end{aligned}$$
(4.2)

for all \(\phi \in L^{\tau }(\varOmega )\) and for some \(\epsilon >0\) independent of \(\phi \); cf. [37, 58].

4.1 Local Filtering

In practice, for w we may use the mean filter together with a windowing technique; see for example [37, 58]. In order to explain the main idea, we continue in a discrete setting. Let \(\varOmega ^h\) be a discrete image domain containing \(N_1\times N_2\) pixels, \(N_1,N_2\in \mathbb N\), and by \(|\varOmega ^h|=N_1N_2\) we denote the size of the discrete image (number of pixels). We approximate functions u by discrete functions, denoted by \(u^h\). The considered function spaces are \(X=\mathbb R^{N_1\times N_2}\) and \(Y=X\times X\). In what follows for all \(u^h\in X\), we use the following norms:

$$\begin{aligned} \Vert u^h\Vert _{\tau } := \Vert u^h\Vert _{\ell ^\tau (\varOmega ^h)} = \left( \sum _{x\in \varOmega ^h} |u^h(x)|^\tau \right) ^{1/\tau } \end{aligned}$$

for \(1\le \tau <+\infty \). Moreover, we denote by \(u^h_\varOmega \) the average value of \(u^h\in X\), i.e., \( u^h_\varOmega := \frac{1}{|\varOmega ^h|} \sum _{x\in \varOmega ^h} u^h(x)\). The discrete gradient \(\nabla ^h: X \rightarrow Y\) and the discrete divergence \({\text {div}}^h : Y \rightarrow X\) are defined in a standard way by forward and backward differences such that \({\text {div}}^h:= - (\nabla ^h)^* \); see for example [18, 22, 55, 63]. With the above notations and definitions, the discretization of the functional in (1.9) is given by

$$\begin{aligned} J_\tau (u^h,\alpha ):=H_\tau (u^h) + R_ \alpha (u^h), \end{aligned}$$
(4.3)

where \(H_\tau (u^h)=\frac{1}{\tau } \Vert T^hu^h-g^h\Vert _{\tau }^{\tau }\), \(\tau \in \{1,2\}\), \(T^h : X \rightarrow X\) is a bounded linear operator, \(\alpha \in (\mathbb R^+)^{N_1 \times N_2}\), and

$$\begin{aligned} R_\alpha (u^h) := \sum _{x\in \varOmega ^h} \alpha (x) |\nabla ^h u^h(x) |_{l^2} \end{aligned}$$
(4.4)

with \(|y|_{l^2}=\sqrt{y_1^2 + y_2^2}\) for every \(y=(y_1,y_2)\in \mathbb R^2\). In the sequel, if \(\alpha \) is a scalar or \(\alpha \equiv 1\) in (4.4), we write instead of \(R_\alpha \) or \(R_1\) just \(\alpha R\) or R, respectively, i.e.,

$$\begin{aligned} R(u^h) = \sum _{x\in \varOmega ^h}|\nabla ^h u^h(x)|_{l^2} \end{aligned}$$

is the discrete total variation of u in \(\varOmega ^h\), and we write \(\bar{E}_\tau \) instead of \(E_\tau \) to indicate that \(\alpha \) is constant. Introducing a step-size h, for \(h\rightarrow 0\) (i.e., the number of pixels \(N_1N_2\) goes to infinity) one can show, similarly as for the case \(\alpha \equiv 1\), that \(R_{\alpha }\) \(\Gamma \)-converges to \(\int _\varOmega \alpha |Du|\); see [12, 62]. We turn now to the locally constrained minimization problem, which is given in the discrete setting as

$$\begin{aligned} \min _{u^h\in X} R(u^h) \quad \text { s.t. } \quad S_{i,j}^{\tau }(u^h)\le \frac{\nu _\tau }{\tau } \ \text { for all } x_{i,j} \in \varOmega ^h. \end{aligned}$$
(4.5)

Here \(\nu _\tau \) is a fixed constant and

$$\begin{aligned} S_{i,j}^{\tau }(u^h) := \frac{1}{M_{i,j}}\sum _{x_{s,t}\in \mathcal {I}_{i,j}}\frac{1}{\tau }|(T^h u^h)(x_{s,t}) - g^h(x_{s,t}) |^\tau \end{aligned}$$

denotes the local residual at \(x_{i,j}\in \varOmega ^h\) with \(\mathcal {I}_{i,j}\) being some suitable set of pixels around \(x_{i,j}\) of size \(M_{i,j}\), i.e., \(M_{i,j}= |\mathcal {I}_{i,j}|\). For example, in [37, 54, 58] for \(\mathcal {I}_{i,j}\) the set

$$\begin{aligned} \varOmega _{i,j}^{\omega } = \left\{ x_{s+i,t+j}\in \varOmega ^h : - \frac{\omega -1}{2}\le s,t \le \frac{\omega -1}{2}\right\} \end{aligned}$$

with a symmetric extension at the boundary and with \(\omega \) being odd is used. That is, \(\varOmega _{i,j}^\omega \) is a set of pixels in a \(\omega \)-by-\(\omega \) window centered at \(x_{i,j}\), i.e., \(M_{i,j}=\omega ^2\) for all i, j, such that \(\varOmega _{i,j}^\omega \not \subset \varOmega ^h\) for \(x_{i,j}\) sufficiently close to \(\partial \varOmega \). Additionally, we denote by \(\tilde{\varOmega }_{i,j}^{\omega }\) a set of pixels in a window centered at \(x_{i,j}\) without any extension at the boundary, i.e.,

$$\begin{aligned} \begin{aligned} \tilde{\varOmega }_{i,j}^{\omega }&= \Bigg \{x_{s+i,t+j} : \max \left\{ 1-(i,j),- \frac{\omega -1}{2}\right\} \le (s,t) \\&\le \min \left\{ \frac{\omega -1}{2},(N_1-i,N_2-j)\right\} \Bigg \}. \end{aligned} \end{aligned}$$

Hence \(\tilde{\varOmega }_{i,j}^\omega \subset \varOmega ^h\) for all \(x_{i,j}\in \varOmega ^h\). Before we analyze the difference between \(\varOmega _{i,j}^{\omega }\) and \(\tilde{\varOmega }_{i,j}^{\omega }\) with respect to the constrained minimization problem (4.5), we note that since \(T^h\) does not annihilate constant functions, the existence of a solution of (4.5) is guaranteed; see [37, Theorem 2] and [58, Theorem 2].
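
A sliding-window computation of the local residuals \(S_{i,j}^{\tau }\) for both window choices may be sketched as follows (Python/NumPy; the window size \(\omega \), the boundary handling flag, and the function name are our own illustration):

    import numpy as np

    def local_residuals(Tu, g, tau=1, omega=5, extend=True):
        # S_{i,j}^tau: mean of (1/tau)|T^h u^h - g^h|^tau over an omega-by-omega
        # window centered at (i,j); extend=True corresponds to the symmetrically
        # extended windows Omega_{i,j}^omega (M_{i,j} = omega^2 for all i, j),
        # extend=False to the clipped windows tilde-Omega_{i,j}^omega
        r = np.abs(Tu - g) ** tau / tau
        half = (omega - 1) // 2
        N1, N2 = r.shape
        mode = 'symmetric' if extend else 'constant'
        rp = np.pad(r, half, mode=mode)                # padded residuals
        cp = np.pad(np.ones_like(r), half, mode=mode)  # padded pixel counter
        S = np.zeros((N1, N2))
        M = np.zeros((N1, N2))
        for di in range(omega):
            for dj in range(omega):
                S += rp[di:di + N1, dj:dj + N2]
                M += cp[di:di + N1, dj:dj + N2]
        return S / M                                   # M_{i,j} as in the text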

In the following, we set \(B_\tau :=\frac{\nu _\tau }{\tau }|\varOmega ^h|\).

Proposition 4.1

  1. (i)

    If \(u^h\) is a solution of (4.5) with \(\mathcal {I}_{i,j}= \varOmega _{i,j}^{\omega }\), then \(H_\tau (u^h) < B_\tau .\)

  2. (ii)

    If \(u^h\) is a solution of (4.5) with \(\mathcal {I}_{i,j}= \tilde{\varOmega }_{i,j}^{\omega }\), then \(H_\tau (u^h) \le B_\tau .\)

Proof

  1. (i)

    Since \(u^h\) is a solution of (4.5) and \(\varOmega _{i,j}^\omega \) is a set of pixels in an \(\omega \)-by-\(\omega \) window, we have

    $$\begin{aligned} \begin{aligned} B_\tau&\ge \sum _{i,j}S_{i,j}^{\tau }(u^h) \\&= \sum _{i,j} \frac{1}{\tau \omega ^2}\sum _{x_{s,t}\in \varOmega _{i,j}^\omega }|g^h(x_{s,t})- T^h u^h(x_{s,t}) |^\tau \\&> \frac{1}{\tau }\sum _{i,j} |g^h(x_{i,j})- T^h u^h(x_{i,j})|^\tau = H_\tau (u^h). \end{aligned} \end{aligned}$$

    Here we used that, due to the sum over i, j, each element (pixel) in \(\varOmega _{i,j}^\omega \) appears at most \(\omega ^2\) times. More precisely, any pixel coordinate in the set \(\Lambda ^\omega := \{(i,j) : \min \{i-1,j-1,N_1-i,N_2-j\}\ge \frac{\omega -1}{2}\}\) occurs exactly \(\omega ^2\) times, while any other pixel coordinate appears strictly less than \(\omega ^2\) times. This shows the first statement.

  1. (ii)

    For a minimizer \(u^h\) of (4.5), we obtain

    $$\begin{aligned} \begin{aligned} B_\tau&\ge \sum _{i,j}S_{i,j}^{\tau }(u^h) \\&= \sum _{i,j} \frac{1}{\tau M_{i,j}}\sum _{x_{s,t}\in \tilde{\varOmega }_{i,j}^\omega }|g^h(x_{s,t}) - T^h u^h(x_{s,t})|^\tau \\&= \frac{1}{\tau }\sum _{i,j} |g^h(x_{i,j})- T^h u^h(x_{i,j})|^\tau = H_\tau (u^h), \end{aligned} \end{aligned}$$

    which concludes the proof.\(\square \)

Note that if \(\mathcal {I}_{i,j} = \tilde{\varOmega }_{i,j}^{\omega }\), then by Proposition 4.1 a minimizer of (4.5) also satisfies the constraint of the problem

$$\begin{aligned} \min _{u^h\in X} R(u^h) \quad \text {s.t.} \quad H_\tau (u^h)\le B_\tau \end{aligned}$$
(4.6)

(discrete version of (2.1)), but it is, of course, in general not a solution of (4.6).

Proposition 4.2

Let \(\mathcal {I}_{i,j} = \tilde{\varOmega }_{i,j}^{\omega }\), \(u_s^h\) be a minimizer of (4.6), and \(u_l^h\) be a minimizer of (4.5). Then \(R(u_s^h)\le R(u_l^h)\).

Proof

Assume that \(R(u_s^h)> R(u_l^h)\). Since \(u_s^h\) is a solution of (4.6), it satisfies the constraint \(H_\tau (u_s^h)\le B_\tau \). By Proposition 4.1, we also have \(H_\tau (u_l^h)\le B_\tau \), i.e., \(u_l^h\) is feasible for (4.6). Since \(R(u_s^h) > R(u_l^h)\), \(u_s^h\) is then not a minimizer of (4.6), which is a contradiction. Hence, \(R(u_s^h)\le R(u_l^h)\).\(\square \)

Remark 4.1

Proposition 4.1 and its consequence are not special properties of the discrete setting. Let the filter w in (4.1) be such that the inequality in (4.2) becomes an equality with \(\epsilon =1/|\varOmega |\), as is the case in Proposition 4.1(ii). Then a solution \({u}_l\) of the locally constrained minimization problem (4.1) satisfies

$$\begin{aligned} \mathcal {H}_\tau ({u}_l;g)\le \frac{\nu _\tau }{\tau }|\varOmega | \ \text { and }\ \int _\varOmega |D{u}_l| \ge \int _\varOmega |D{u}_s|, \end{aligned}$$

where \({u}_s\) is a solution of (2.1).

From Proposition 4.2 and Remark 4.1, since \(R(u_s^h) \le R(u_l^h)\) and \(\int _\varOmega |D{u}_s| \le \int _\varOmega |D{u}_l|\), we conclude that \(u_s^h\) and \({u}_s\) are smoother than \(u_l^h\) and \({u}_l\), respectively. Hence, the solution of the locally constrained minimization problem is expected to preserve details better than the minimizer of the globally constrained problem. Since noise can be interpreted as fine detail, which we actually want to eliminate, this also means that some noise may be left in the image.

4.2 Locally Adaptive Total Variation Algorithm

Whenever \(\nu _\tau \) depends on \(\hat{u}\), problem (4.5) becomes a rather nonlinear problem. Instead of considering nonlinear constraints, we choose, as in Sect. 3, a reference image \(\tilde{u}\) and compute an approximation \(\nu _\tau =\nu _\tau (\tilde{u})\). Note that in our discrete setting we now have for salt-and-pepper noise

$$\begin{aligned} \nu _1({u}^h) := r_2-(r_2-r_1) \frac{1}{|\varOmega |}\sum _{x\in \varOmega ^h}(T^h{u}^h)(x) \end{aligned}$$

and for random-valued impulse noise we have

$$\begin{aligned} \nu _1({u}^h) := \frac{1}{|\varOmega ^h|}\sum _{x\in \varOmega ^h}r\left( \big (T^h{u}^h\big )(x)^2 - \big (T^h {u}^h\big )(x) +\frac{1}{2}\right) . \end{aligned}$$
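As an illustration, the following short sketch (our own hypothetical helpers, not part of the paper) evaluates these two estimates of \(\nu _1\) from a reference image, replacing the normalizations \(1/|\varOmega |\) and \(1/|\varOmega ^h|\) by the pixel mean (our assumption).

```python
import numpy as np

def nu1_salt_pepper(Tu, r1, r2):
    """nu_1 for salt-and-pepper noise (1.1): r2 - (r2 - r1) * mean(T^h u^h)."""
    return r2 - (r2 - r1) * np.mean(Tu)

def nu1_random_valued(Tu, r):
    """nu_1 for random-valued impulse noise (1.2): mean of r*((T^h u^h)^2 - T^h u^h + 1/2)."""
    return np.mean(r * (Tu**2 - Tu + 0.5))
```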

In our locally adaptive algorithms proposed below, we choose the current approximation as reference image (see the LATV- and pLATV-algorithm below), as is also done in the pAPS- and APS-algorithm above. Then we seek a solution \(u^h\) such that \(S_{i,j}^{\tau }(u^h)\) is close to \(\frac{\nu _\tau }{\tau }\). We note that for large \(\alpha >0\) the minimization of (4.3) yields an oversmoothed restoration \(u_\alpha ^h\) whose residual contains details, i.e., we expect \(H_\tau (u_\alpha ^h)> B_\tau \). Hence, if \( S_{i,j}^{\tau }(u_\alpha ^h) > \frac{\nu _\tau }{\tau }\), we attribute this to image details contained in the local residual image. In this situation, we intend to decrease \(\alpha \) in the local regions \(\mathcal {I}_{i,j}\). In particular, similarly to [37, 58], we define the local quantity \(f^\omega _{i,j}\) by

$$\begin{aligned} f^\omega _{i,j}:= {\left\{ \begin{array}{ll} S_{i,j}^{\tau }(u_\alpha ^h) &{} \text { if } S_{i,j}^{\tau }(u_\alpha ^h)> \frac{\nu _\tau }{\tau },\\ \frac{\nu _\tau }{\tau } &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Note that \(\frac{\nu _\tau }{\tau f^\omega _{i,j}}\le 1\) for all i, j and hence we set

$$\begin{aligned} \alpha (x_{i,j}):= \frac{1}{ M_{i,j}}\sum _{x_{s,t}\in \mathcal {I}_{i,j}} \left( \frac{\nu _\tau }{\tau f^\omega _{s,t}} \right) ^p \alpha (x_{s,t}). \end{aligned}$$
(4.7)

On the other hand, for small \(\alpha >0\) we obtain an undersmoothed image \(u_\alpha ^h\), which still contains noise, i.e., we expect \(H_\tau (u_\alpha ^h)< B_\tau \). Analogously, if \(S_{i,j}^{\tau }(u_\alpha ^h) \le \frac{\nu _\tau }{\tau }\), we suppose that in \(\mathcal {I}_{i,j}\) noise is still left in the reconstruction rather than in the residual. Hence, we intend to increase \(\alpha \) in the local regions \(\mathcal {I}_{i,j}\) by defining

$$\begin{aligned} f^\omega _{i,j}:= {\left\{ \begin{array}{ll} S_{i,j}^{\tau }(u_\alpha ^h) &{} \text { if } S_{i,j}^{\tau }(u_\alpha ^h)< \frac{\nu _\tau }{\tau },\\ \frac{\nu _\tau }{\tau } &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

and setting \(\alpha \) as in (4.7). Notice that now \(\frac{\nu _\tau }{\tau f^\omega _{i,j}}\ge 1\). These considerations lead to the following locally adaptive total variation (LATV) algorithm.
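A minimal sketch (ours) of the resulting local update (4.7), assuming the window average is realized by a box filter with the same window conventions as above; the helper update_alpha and its arguments are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def update_alpha(alpha, S, nu_over_tau, p=0.5, omega=11, eps=1e-14, increase=True):
    """One local update of alpha according to (4.7); a sketch.
    alpha        : current weight (array over the pixel grid)
    S            : local residuals S_{i,j}^tau (e.g. from local_residuals above)
    nu_over_tau  : the scalar nu_tau / tau
    increase=True  -> undersmoothed case (H_tau <= B_tau): f = max(min(S, nu/tau), eps), ratio >= 1
    increase=False -> oversmoothed case  (H_tau >  B_tau): f = max(S, nu/tau), ratio <= 1"""
    if increase:
        f = np.maximum(np.minimum(S, nu_over_tau), eps)
    else:
        f = np.maximum(S, nu_over_tau)
    ratio = (nu_over_tau / f)**p
    # windowed average of ratio * alpha over the neighborhoods I_{i,j}
    return uniform_filter(ratio * alpha, size=omega, mode='reflect')
```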

figure e

Here and below \(\varepsilon >0\) is a small constant (e.g., in our experiments we choose \(\varepsilon =10^{-14}\)) to ensure that \(f_{i,j}^\omega >0\), since it may happen that \(S_{i,j}^{\tau }(u_{\alpha _n}^h)=0\).

If \(H_\tau (u_{\alpha _0}^h) >B_\tau (u_{\alpha _0}^h)\), we stop the algorithm as soon as \(H_\tau (u_{\alpha _n}^h)<B_\tau (u_{\alpha _n}^h)\) holds for the first time and set the desired locally varying parameter to \(\alpha ^*=\alpha _{n}\). If \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), we stop the algorithm as soon as \(H_\tau (u_{\alpha _n}^h)>B_\tau (u_{\alpha _n}^h)\) holds for the first time and set \(\alpha ^*=\alpha _{n-1}\), since \(H_\tau (u_{\alpha _{n-1}}^h)\le B_\tau (u_{\alpha _{n-1}}^h)\).

The LATV-algorithm has the following monotonicity properties with respect to \((\alpha _n)_n\).

Proposition 4.3

Assume \(\mathcal {I}_{i,j} = \varOmega _{i,j}^\omega \) and let \(\varepsilon >0\) be sufficiently small. If \(\alpha _0 >0\) such that \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), then the LATV-algorithm generates a sequence \((\alpha _n)_n\) such that

$$\begin{aligned} \sum _{i,j} \alpha _{n+1}(x_{i,j}) > \sum _{i,j} \alpha _{n}(x_{i,j}). \end{aligned}$$

Proof

By the same argument as in the proof of Proposition 4.1, we obtain

$$\begin{aligned} \begin{aligned} \sum _{i,j} \alpha _{n+1}(x_{i,j})&= \sum _{i,j} \left( \frac{(\nu _\tau (u_{\alpha _n}^h))^p}{\tau ^p\omega ^2}\sum _{(s,t)\in \varOmega _{i,j}^\omega } \frac{\alpha _n(x_{s,t})}{(f_{s,t}^\omega )^p}\right) \\&> \sum _{i,j} \left( \frac{(\nu _\tau (u_{\alpha _n}^h))^p \omega ^2}{\tau ^p \omega ^2} \frac{\alpha _n(x_{i,j})}{(f_{i,j}^\omega )^p}\right) . \end{aligned} \end{aligned}$$

Note that \(\nu _\tau (\cdot )\) is bounded from below, see Sect. 3.1. Consequently, there exists an \(\varepsilon >0\) such that \(\frac{\nu _\tau (u^h)}{\tau } \ge \varepsilon \) for any \(u^h\). Then, since \(H_\tau (u_{\alpha _0}^h)\le B_\tau (u_{\alpha _0}^h)\) we have by the LATV-algorithm that

$$\begin{aligned} f_{i,j}^\omega := \max \left\{ \min \left\{ S_{i,j}^{\tau }(u_{\alpha _n}^h),\frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right\} ,\varepsilon \right\} \le \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau } \end{aligned}$$

and hence \(\sum _{i,j} (\alpha _{n+1})(x_{i,j}) > \sum _{i,j} (\alpha _{n})(x_{i,j}).\) \(\square \)

Proposition 4.4

Let \(\mathcal {I}_{i,j} = \tilde{\varOmega }_{i,j}^\omega \) and \(\varepsilon >0\) be sufficiently small.

  1. (i)

    If \(\alpha _0 >0\) such that \(H_\tau (u_{\alpha _0}^h) > B_\tau (u_{\alpha _0}^h)\), then the LATV-algorithm generates a sequence \((\alpha _n)_n\) such that

    $$\begin{aligned} \sum _{i,j} (\alpha _{n+1})(x_{i,j}) \le \sum _{i,j} (\alpha _{n})(x_{i,j}). \end{aligned}$$
  2. (ii)

    If \(\alpha _0 >0\) such that \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), then the LATV-algorithm generates a sequence \((\alpha _n)_n\) such that

    $$\begin{aligned} \sum _{i,j} (\alpha _{n+1})(x_{i,j}) \ge \sum _{i,j} (\alpha _{n})(x_{i,j}). \end{aligned}$$

Proof

  1. (i)

    By the same argument as in the proof of Proposition 4.1 and since \(f_{i,j}^{\omega }:= \max \left\{ S_{i,j}^{\tau }(u_{\alpha _n}^h),\frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right\} \ge \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\), we obtain

    $$\begin{aligned} \begin{aligned} \sum _{i,j} \alpha _{n+1}(x_{i,j})&= \sum _{i,j} \left( \frac{\nu _\tau (u_{\alpha _n}^h)^p}{\tau ^p M_{i,j}}\sum _{x_{s,t}\in \tilde{\varOmega }_{i,j}^\omega } \frac{\alpha _n(x_{s,t})}{(f_{s,t}^\omega )^p}\right) \\&=\sum _{i,j} \left( \left( \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right) ^p \frac{\alpha _n(x_{i,j})}{(f_{i,j}^\omega )^p}\right) \\&\le \sum _{i,j} \alpha _{n}(x_{i,j}). \end{aligned} \end{aligned}$$
  2. (ii)

    Since \(\nu _\tau (\cdot )\) is bounded from below, see Sect. 3.1, there exists an \(\varepsilon >0\) such that \(\frac{\nu _\tau (u^h)}{\tau } \ge \varepsilon \) for any \(u^h\). Hence

    $$\begin{aligned} \begin{aligned} f_{i,j}^\omega&:= \max \left\{ \min \left\{ S_{i,j}^{\tau }(u_{\alpha _n}^h),\frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right\} ,\varepsilon \right\} \\&\le \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau } \end{aligned} \end{aligned}$$

    and by the same arguments as above we get

    $$\begin{aligned} \begin{aligned} \sum _{i,j} \alpha _{n+1}(x_{i,j})&=\sum _{i,j} \left( \left( \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\right) ^p \frac{\alpha _n(x_{i,j})}{(f_{i,j}^\omega )^p}\right) \\&\ge \sum _{i,j} \alpha _{n}(x_{i,j}). \end{aligned} \end{aligned}$$

    \(\square \)

In contrast to the pAPS-algorithm, in the LATV-algorithm the power \(p>0\) is not changed during the iterations and should be chosen sufficiently small; e.g., we set \(p=\frac{1}{2}\) in our experiments. Note that a small p only allows small changes of \(\alpha \) in each iteration. In this way, the algorithm is able to generate a function \(\alpha ^*\) such that \(H_\tau (u_{\alpha ^*}^h)\) is very close to \(B_\tau (u_{\alpha ^*}^h)\). On the other hand, a small p has the drawback that the number of iterations until termination remains large. Since the parameter p has to be chosen manually, the LATV-algorithm, at least in spirit, resembles Uzawa's method, where a parameter also has to be chosen. The proper choice of such a parameter may be difficult, and hence we desire an algorithm in which no parameters have to be tuned manually. Because of this, and motivated by the pAPS-algorithm, we propose the following p-adaptive algorithm:

figure f

In our numerical experiments, this algorithm is terminated as soon as \(|H_\tau (u_{\alpha _n}^h) - B_\tau (u_{\alpha _n}^h)| \le 10^{-6}\) and \(H_\tau (u_{\alpha _{n}}^h) \le B_\tau (u_{\alpha _{n}}^h)\). Additionally, we stop iterating when p drops below machine precision, since then no further progress is to be expected. Due to the adaptive choice of p, we obtain a monotone behavior of the sequence \((\alpha _n)_n\).

Fig. 2
figure 2

Progress of \(\nu _\tau (u_{\alpha _n}^h)\) of the LATV-algorithm with \(p=\frac{1}{8}\) and \(\alpha _0=10^{-2}\) for removing random-valued impulse noise with \(r=0.3\) (left), \(r=0.1\) (middle), and \(r=0.05\) (right) from the cameraman-image (cf. Fig. 4b)

Fig. 3
figure 3

Progress of \(\nu _\tau (u_{\alpha _n}^h)\) of the pLATV-algorithm with \(p_0=\frac{1}{2}\) and \(\alpha _0=10^{-2}\) for removing random-valued impulse noise with \(r=0.3\) (left), \(r=0.1\) (middle), and \(r=0.05\) (right) from the cameraman-image (cf. Fig. 4b)

Proposition 4.5

The sequence \((\alpha _n)_n\) generated by the pLATV-algorithm is monotone at every point \(x\in \varOmega ^h\). In particular, it is monotonically decreasing for \(\alpha _0\) such that \(H_\tau (u_{\alpha _0}^h)>B_\tau (u_{\alpha _0}^h)\) and monotonically increasing for \(\alpha _0\) such that \(H_\tau (u_{\alpha _0}^h)\le B_\tau (u_{\alpha _0}^h)\).

Proof

For \(H_\tau (u_{\alpha _0}^h) > B_\tau (u_{\alpha _0}^h)\), we can show by induction that the pLATV-algorithm yields \(f_{i,j}^\omega \ge \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau }\) and hence \(1 \ge \frac{\nu _\tau (u_{\alpha _n}^h)}{\tau f_{i,j}^\omega }\) for all n. Then by the definition of \(\alpha _{n+1}\) it follows that

$$\begin{aligned} \begin{aligned} \alpha _{n+1}(x_{i,j})&:=\frac{\alpha _n(x_{i,j})}{ M_{i,j}}\sum _{x_{s,t}\in \mathcal {I}_{i,j}} \left( \frac{\nu _\tau (u_{\alpha _n}^h) }{\tau f^\omega _{s,t}} \right) ^p \\&\le \alpha _n(x_{i,j}). \end{aligned} \end{aligned}$$

By similar arguments, we obtain for \(\alpha _0\) with \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\) that \( \alpha _{n+1}(x_{i,j}) \ge \alpha _n(x_{i,j}) \) for all \(x_{i,j}\in \varOmega \). \(\square \)

We are aware of the fact that using \(u_{\alpha _n}^h\) as a reference image in the LATV- and pLATV-algorithm to compute \(\nu _\tau \) may introduce errors. However, we recall that for Gaussian noise we set \(\nu _2=\sigma ^2\) and for salt-and-pepper noise with \(r_1=r_2\) we have \(\nu _1= r_1\). In these cases, \(\nu _\tau \) does not depend on the original image and hence no error is introduced when computing \(\nu _\tau \). For images corrupted by random-valued impulse noise, the situation is different and \(\nu _1\) indeed depends on the true image. In this situation, errors may be introduced when \(u_{\alpha _n}^h\) is used as a reference image for calculating \(\nu _\tau \); see Figs. 2 and 3. Hence, in order to improve the proposed algorithm in such cases, it might be of interest for future research to find an optimal reference image that yields a good approximation of the true value \(\nu _\tau \).

In contrast to the SA-TV algorithm presented in [37, 58], where the initial regularization parameter has to be chosen sufficiently small, in the LATV- as well as in the pLATV-algorithm the initial value \(\alpha _0\) can be chosen arbitrarily positive. However, in the case \(H_\tau (u_{\alpha _0}^h) > B_\tau (u_{\alpha _0}^h)\) we cannot guarantee in general that the solution \(u_{\alpha }^h\) obtained by the pLATV-algorithm fulfills \(H_\tau (u_{\alpha }^h) \le B_\tau (u_{\alpha }^h)\), not even if \(B_\tau (\cdot )\) is constant, due to the stopping criterion with respect to the power p. On the contrary, if \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), then the pLATV-algorithm generates a sequence \((u_{\alpha _n}^h)_n\) such that \(H_\tau (u_{\alpha _n}^h) \le B_\tau (u_{\alpha _n}^h)\) for all n and hence also for the output of the algorithm. As a consequence, we wish to choose \(\alpha _0>0\) such that \(H_\tau (u_{\alpha _0}^h) \le B_\tau (u_{\alpha _0}^h)\), which may be realized by the following simple automated procedure:

figure g
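Since the procedure itself is only displayed as an algorithm figure, the following is a rough sketch of one plausible realization under our assumptions: starting from a given \(\alpha _0\), the weight is repeatedly rescaled by a factor \(c_{\alpha _0}\) (set to \(\frac{1}{5}\) in the experiments of Sect. 6.3.1) until \(H_\tau (u_{\alpha _0}^h)\le B_\tau (u_{\alpha _0}^h)\) holds; the helpers solve, H_tau, and B_tau are placeholders for the corresponding steps of the paper.

```python
def initial_alpha(alpha0, g, solve, H_tau, B_tau, c_alpha0=0.2, max_iter=50):
    """Rescale alpha0 geometrically until H_tau(u_alpha0) <= B_tau(u_alpha0); a plausible
    sketch only. solve, H_tau, B_tau are user-supplied callables: the minimizer of (4.3),
    the data-fidelity H_tau, and the bound B_tau, respectively."""
    for _ in range(max_iter):
        u = solve(g, alpha0)
        if H_tau(u, g) <= B_tau(u):
            break
        alpha0 *= c_alpha0      # assumed scaling factor c_alpha0 = 1/5 as in Sect. 6.3.1
    return alpha0
```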

5 Total Variation Minimization

In this section, we are concerned with developing numerical methods for computing a minimizer of the discrete multiscale \(L^{\tau }\)-TV model, i.e.,

$$\begin{aligned} \min _{u^h\in X} \{J_\tau (u^h,\alpha ):=H_\tau (u^h) + R_ \alpha (u^h)\}.\end{aligned}$$
(5.1)

5.1 \(L^2\)-TV Minimization

Here we consider the case \(\tau =2\), i.e., the minimization problem

$$\begin{aligned} \min _{u^h\in X} \frac{1}{2}\Vert T^h u^h - g^h\Vert _{2}^2 + R_\alpha (u^h), \end{aligned}$$
(5.2)

and present solution methods, first for the case \(T^h=I\) and then for general linear bounded operators \(T^h\).

5.1.1 An Algorithm for Image Denoising

If \(T^h=I\), then (5.2) becomes an image denoising problem, i.e., the minimization problem

$$\begin{aligned} \min _{u^h\in X} \Vert u^h - g^h \Vert _{2}^2 + 2 R_\alpha (u^h). \end{aligned}$$
(5.3)

For solving this problem, we use the algorithm of Chambolle and Pock [22], which leads to the following iterative scheme:

figure h

In our numerical experiments, we choose \(\theta =1\). In particular, it is shown in [22] that the algorithm converges for \(\theta =1\) and step sizes satisfying \(\tau \sigma \Vert \nabla ^h\Vert ^2< 1\).
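Since the scheme itself is only displayed as an algorithm figure, the following sketch (ours) shows the standard primal–dual iteration of [22] for problem (5.3), adapted under our assumptions to a possibly locally varying \(\alpha \); the step sizes are renamed sigma and tau_pd to avoid a clash with the exponent \(\tau \), and satisfy \(\tau \sigma \Vert \nabla ^h\Vert ^2< 1\) using the standard bound \(\Vert \nabla ^h\Vert ^2\le 8\).

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero (Neumann) boundary conditions."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tv_l2_denoise(g, alpha, n_iter=300, theta=1.0):
    """Primal-dual iteration (cf. [22]) for min_u 0.5*||u - g||_2^2 + sum_x alpha(x)|grad u(x)|,
    i.e., problem (5.3) up to the constant factor 2; alpha may be a scalar or an array."""
    sigma = tau_pd = 0.99 / np.sqrt(8.0)       # sigma * tau_pd * ||grad||^2 < 1 (||grad||^2 <= 8)
    u = g.copy(); u_bar = g.copy()
    px = np.zeros_like(g); py = np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = grad(u_bar)                   # dual step: project onto {|p(x)| <= alpha(x)}
        px = px + sigma * gx; py = py + sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / alpha)
        px /= scale; py /= scale
        u_old = u                              # primal step: prox of 0.5*||u - g||^2
        u = (u + tau_pd * (div(px, py) + g)) / (1.0 + tau_pd)
        u_bar = u + theta * (u - u_old)        # over-relaxation with theta = 1
    return u
```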

5.1.2 An Algorithm for Linear Bounded Operators

Assume that \(T^h\) is a linear bounded operator from X to X different from the identity I. Then, instead of minimizing (5.2) directly, we introduce the surrogate functional

$$\begin{aligned} \begin{aligned} \mathcal {S}(u^h,a^h)&:= \frac{1}{2}\Vert T^hu^h-g^h\Vert _{2}^2 + R_{\alpha }(u^h) + \frac{\delta }{2} \Vert u^h-a^h\Vert _{2}^2 \\&\qquad - \frac{1}{2} \Vert T^h(u^h-a^h)\Vert _{2}^2 \\&= \frac{\delta }{2} \Vert u^h - z(a^h)\Vert _{2}^2 + R_{\alpha }(u^h) + \psi (a^h,g^h,T^h) \end{aligned} \end{aligned}$$
(5.4)

with \(a^h,u^h\in X, z(a^h)= a^h - \frac{1}{\delta }(T^h)^*(T^h a^h -g^h), \psi \) a function independent of \(u^h\), and where we assume \(\delta > \Vert T^h\Vert ^2\); see [31, 32]. Note that

$$\begin{aligned} \min _{u^h\in X} \mathcal {S}(u^h,a^h) \Leftrightarrow \min _{u^h\in X} \Vert u^h - z(a^h)\Vert _{2}^2 + 2 R_{\frac{\alpha }{\delta }}(u^h) \end{aligned}$$

and hence obtaining a minimizer amounts to solving a minimization problem of the type (5.3), which can be done as described in Sect. 5.1.1. Then an approximate solution of (5.2) can be computed by the following iterative algorithm: Choose \(u_{0}^h \in X\) and iterate for \(n\ge 0\)

$$\begin{aligned} u_{n+1}^h=\arg \min _{u^h\in X} \mathcal {S}(u^h,u_n^h). \end{aligned}$$
(5.5)

For scalar \(\alpha \), it is shown in [28, 31, 32] that this iterative procedure generates a sequence \((u_n^h)_n\) which converges to a minimizer of (5.2). This convergence property can easily be extended to our nonscalar case, yielding the following result.

Theorem 5.1

For \(\alpha : \varOmega \rightarrow \mathbb R^+\) the scheme in (5.5) generates a sequence \((u_n^h)_n\), which converges to a solution of (5.2) for any initial choice of \(u_0^h\in X\).

Proof

A proof can be carried out analogously to [32]. \(\square \)
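A compact sketch (ours) of the iteration (5.5) under the stated assumptions: each step forms \(z(u_n^h)\) as in (5.4) and then solves a problem of type (5.3) with weight \(\alpha /\delta \), here delegated to the tv_l2_denoise sketch above; \(T^h\) and its adjoint are passed as callables and \(\delta \) must exceed \(\Vert T^h\Vert ^2\).

```python
import numpy as np

def surrogate_iteration(g, alpha, T, T_adj, delta, n_outer=50, n_inner=100):
    """Iteration (5.5) via the surrogate functional (5.4); a sketch under stated assumptions.
    T, T_adj : callables applying T^h and its adjoint (T^h)^*
    delta    : constant with delta > ||T^h||^2"""
    u = np.zeros_like(g)
    for _ in range(n_outer):
        z = u - T_adj(T(u) - g) / delta                        # z(u_n) from (5.4)
        u = tv_l2_denoise(z, alpha / delta, n_iter=n_inner)    # problem of type (5.3)
    return u
```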

5.2 An Algorithm for \(L^1\)-TV Minimization

The computation of a minimizer of

$$\begin{aligned} \min _{u^h\in X} \Vert T^h u^h - g^h \Vert _{1} + R_\alpha (u^h), \end{aligned}$$
(5.6)

is, due to the nonsmooth \(\ell ^1\)-term, in general more complicated than obtaining a solution of the \(L^2\)-TV model. Here we suggest employing a trick proposed in [5] for \(L^1\)-TV minimization problems with a scalar regularization parameter to solve (5.6) in two steps. In particular, we replace the argument of the \(\ell ^1\)-norm by a new variable \(v^h\), penalize the functional by an \(L^2\)-term, which keeps the difference between \(v^h\) and \(T^h u^h-g^h\) small, and minimize with respect to \(v^h\) and \(u^h\). That is, we replace the original minimization problem (5.6) by

$$\begin{aligned} \min _{v^h,u^h\in X} \Vert v^h\Vert _{1} + \frac{1}{2 \gamma } \Vert T^h u^h - g^h - v^h \Vert _{2}^2 + R_{\alpha }(u^h), \end{aligned}$$
(5.7)

where \(\gamma >0\) is small, so that \(g^h\approx T^h u^h - v^h\). In fact, it can be shown that (5.7) approaches (5.6) as \(\gamma \rightarrow 0\). In our experiments, we choose \(\gamma =10^{-2}\). This leads to the following alternating algorithm.

figure i

The minimizer \(v_{n+1}^h\) in step 1) of the \(L^1\)-TV\(_\alpha \) algorithm can easily be computed via soft-thresholding, i.e., \(v_{n+1}^h={\text {ST}}(T^h u_n^h - g^h,\gamma )\), where

$$\begin{aligned} {\text {ST}}(g^h,\gamma )(x) = {\left\{ \begin{array}{ll} g^h(x) - \gamma &{} \text { if } g^h(x) > \gamma , \\ 0 &{} \text { if } |g^h(x)| \le \gamma , \\ g^h(x) + \gamma &{} \text { if } g^h(x) < -\gamma \\ \end{array}\right. } \end{aligned}$$

for all \(x\in \varOmega ^h\). The minimization problem in step 2) is equivalent to

$$\begin{aligned} \arg \min _{u^h\in X} \frac{1}{2}\Vert T^h u^h - g^h - v_{n+1}^h \Vert _{2}^2 + R_{\gamma \alpha }(u^h) \end{aligned}$$
(5.8)

and hence is of the type (5.2). Thus, an approximate solution of (5.8) can be computed as described above; see Sect. 5.1.
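The two steps can be summarized in the following sketch (ours; helper names hypothetical): step 1) is the componentwise soft-thresholding ST above, and step 2) reuses the surrogate iteration for the quadratic problem (5.8) with data \(g^h+v_{n+1}^h\) and weight \(\gamma \alpha \).

```python
import numpy as np

def soft_threshold(v, gamma):
    """Componentwise soft-thresholding ST(v, gamma)."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def l1_tv_alpha(g, alpha, T, T_adj, delta, gamma=1e-2, n_outer=30):
    """Alternating minimization of (5.7); a sketch reusing surrogate_iteration above."""
    u = np.zeros_like(g)
    for _ in range(n_outer):
        v = soft_threshold(T(u) - g, gamma)                             # step 1)
        u = surrogate_iteration(g + v, gamma * alpha, T, T_adj, delta)  # step 2), problem (5.8)
    return u
```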

Theorem 5.2

The sequence \((u_n^h,v_n^h)_n\) generated by the \(L^1\)-TV\(_\alpha \) algorithm converges to a minimizer of (5.7).

Proof

The statement can be shown analogously to [5]. \(\square \)

5.3 A Primal–Dual Method for \(L^1\)-TV Minimization

As an alternative to the above method for solving (1.6) with \(\tau =1\), we suggest using the primal–dual method of [58] adapted to our setting, where a Huber regularization of the gradient of u is considered; see [58] for more details. Denoting by \(\bar{u}\) a solution of the primal problem and by \(\bar{\mathbf {p}}\) a solution of the associated dual problem, the optimality conditions due to the Fenchel theorem [39] read

$$\begin{aligned}&-{\text {div}}\bar{\mathbf {p}}(x) = -\kappa \Delta \bar{u}(x) + \frac{1}{\beta +\mu } T^*(T\bar{u}(x) - g(x) ) \\&\phantom {-{\text {div}}\bar{\mathbf {p}}(x) = }+ \frac{\mu }{\beta +\mu } T^* \frac{T\bar{u}(x) - g(x)}{\max \{\beta , |T \bar{u}(x) - g(x) |\}} \\&- \bar{\mathbf {p}}(x) =\frac{1}{\gamma } \nabla \bar{u}(x) \quad \text {if } |\bar{\mathbf {p}}(x)|_{l^2} <\alpha (x)\\&- \bar{\mathbf {p}}(x) =\alpha (x) \frac{\nabla \bar{u}(x)}{|\nabla \bar{u}(x)|_{l^2}} \quad \text {if } |\bar{\mathbf {p}}(x)|_{l^2} =\alpha (x), \end{aligned}$$

for all \(x\in \varOmega \), where \(\kappa , \beta , \mu \), and \(\gamma \) are fixed positive constants. The latter two conditions can be summarized as \(-\bar{\mathbf {p}}(x) = \frac{\alpha (x) \nabla \bar{u}(x)}{\max \{\gamma \alpha (x), |\nabla \bar{u}(x)|_{l^2} \}}\). Then setting \(\bar{\mathbf {q}} = - \bar{\mathbf {p}}\) and \(\bar{v} = \frac{T\bar{u} - g}{\max \{\beta , |T \bar{u} - g |\}}\) leads to the following system of equations:

$$\begin{aligned} \begin{aligned}&0 = -\max \{\beta , |T \bar{u}(x) - g(x) |\} \bar{v} + T\bar{u}(x) - g(x) \\&0={\text {div}}\bar{\mathbf {q}}(x) + \kappa \Delta \bar{u}(x) - \frac{1}{\beta +\mu } T^*(T\bar{u}(x) - g(x) ) \\&\phantom {0=} - \frac{\mu }{\beta +\mu } T^* \bar{v}(x) \\&0=\max \{\gamma \alpha (x), |\nabla \bar{u}(x)|_{l^2} \} \bar{\mathbf {q}}(x) - \alpha (x) \nabla \bar{u}(x) \end{aligned} \end{aligned}$$
(5.9)

for all \(x\in \varOmega \). This system can be solved efficiently by a semismooth Newton algorithm; see the Appendix for a description of the method and for the choice of the parameters \(\kappa , \beta , \mu \), and \(\gamma \).

Note that other algorithms from the literature, such as [18, 21, 68], can also be adjusted to the case of a locally varying regularization parameter. However, it is not within the scope of this paper to compare different algorithms in order to identify the most efficient one, although this is an interesting research topic in its own right.

6 Numerical Experiments

In the following, we present numerical experiments to study the behavior of the proposed algorithms (i.e., the APS-, pAPS-, LATV-, and pLATV-algorithm) with respect to their image restoration capabilities and their stability concerning the choice of the initial value \(\alpha _0\). The performance of these methods is compared quantitatively by means of the peak signal-to-noise ratio (PSNR) [11], which is widely used as an image quality assessment measure, and the structural similarity measure (MSSIM) [90], which relates to perceived visual quality better than PSNR. When an approximate solution of the \(L^1\)-TV model is calculated, we also compare the restorations by the mean absolute error (MAE), which is an \(L^1\)-based measure defined as

$$\begin{aligned} {\text {MAE}} = {\Vert u - \hat{u} \Vert _{L^1(\varOmega )}}, \end{aligned}$$

where \(\hat{u}\) denotes the true image and u represents the obtained restoration. In general, larger PSNR and MSSIM values indicate a better reconstruction, while for the MAE smaller values indicate a better reconstruction.
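For reference, a small sketch (ours) of these error measures; we assume image intensities in [0, 1], use the standard formula PSNR \(=10\log _{10}(1/\mathrm{MSE})\), and average the MAE over the pixels, which may differ slightly from the exact normalizations used in [11, 90].

```python
import numpy as np

def mae(u, u_true):
    """Mean absolute error; here averaged over the pixels (our convention)."""
    return np.mean(np.abs(u - u_true))

def psnr(u, u_true, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with dynamic range [0, peak]."""
    mse = np.mean((u - u_true)**2)
    return 10.0 * np.log10(peak**2 / mse)
```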

Whenever an image is corrupted by Gaussian noise, we compute a solution by means of the (multiscale) \(L^2\)-TV model, while for images containing impulsive noise the (multiscale) \(L^1\)-TV model is always considered.

In our numerical experiments, the CPS-, APS-, and pAPS-algorithm are terminated as soon as

$$\begin{aligned} \frac{| {\mathcal H}_\tau (u_{\alpha _n};g) - \mathcal {B}_\tau (u_{\alpha _n}) |}{\mathcal {B}_\tau (u_{\alpha _n})} \le \epsilon _B = 10^{-5} \end{aligned}$$

or the norm of the difference of two successive iterates \(\alpha _{n}\) and \(\alpha _{n+1}\) drops below the threshold \(\epsilon _\alpha =10^{-10}\), i.e., \(\Vert \alpha _{n} - \alpha _{n+1}\Vert < \epsilon _\alpha \). The latter stopping criterion is used to terminate the algorithms if \((\alpha _n)_n\) stagnates and only very little progress is to be expected. In fact, if our algorithm converges at least linearly, i.e., if there exists an \(\varepsilon _\alpha \in (0,1)\) and an \(m>0\) such that for all \(n\ge m\) we have \(\Vert \alpha _{n+1} - \alpha _{\infty }\Vert < \varepsilon _\alpha \Vert \alpha _{n} - \alpha _{\infty }\Vert \), the second stopping criterion at least ensures that the distance between our obtained result \(\alpha \) and \(\alpha _{\infty }\) is \(\Vert \alpha - \alpha _{\infty }\Vert \le \frac{\epsilon _\alpha \varepsilon _\alpha }{1-\varepsilon _\alpha }\).

6.1 Automatic Scalar Parameter Selection

For automatically selecting the scalar parameter \(\alpha \) in (1.6), we presented in Sect. 3 the APS- and pAPS-algorithm. Here we compare their performance for image denoising and image deblurring (Fig. 4).

6.1.1 Gaussian Noise Removal

For recovering images corrupted by Gaussian noise with mean zero and standard deviation \(\sigma \), we minimize the functional in (1.6) by setting \(\tau =2\) and \(T=I\). Then \(\mathcal {B}_2=\frac{\sigma ^2}{2}|\varOmega |\) is a constant independent of u. The automatic selection of a suitable regularization parameter \(\alpha \) is here performed by the CPS-, APS-, and pAPS-algorithm, where the contained minimization problem is solved by the method presented in Sect. 5.1.1. We recall that by [18, Theorem 4] and Theorem 3.1 it is ensured that the CPS- and the pAPS-algorithm generate sequences \((\alpha _n)_n\) which converge to \(\bar{\alpha }\) such that \(u_{\bar{\alpha }}\) solves (1.8). In particular, in the pAPS-algorithm the value p is chosen in dependence on \(\alpha \), i.e., \(p=p(\alpha )\), such that \(\alpha \rightarrow \frac{({\mathcal H}_\tau (u_\alpha ;g))^{p(\alpha )}}{\alpha }\) is nonincreasing; see Fig. 5a. This property is fundamental for obtaining convergence of this algorithm; see Theorem 3.1. For the APS-algorithm, such a monotone behavior is not guaranteed and hence we cannot ensure its convergence. Nevertheless, if the APS-algorithm generates \(\alpha \)'s such that the function \(\alpha \rightarrow \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) is nonincreasing, then it indeed converges to the desired solution; see Theorem 3.2. Unfortunately, the nonincrease of the function \(\alpha \rightarrow \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) does not always hold; see Fig. 5b.

Fig. 4
figure 4

Original images. a Phantom, b cameraman, c barbara, and d lena

Fig. 5
figure 5

Denoising of the phantom-image corrupted with Gaussian white noise with \(\sigma =0.03\). a Plot of the function \(\alpha \rightarrow \frac{{\mathcal H}_\tau (u_\alpha ;g)^p}{\alpha }\) of the pAPS-algorithm with \(\alpha _0=10^{-2}\). b Plot of the function \(\alpha \rightarrow \frac{{\mathcal H}_\tau (u_\alpha ;g)}{\alpha }\) of the APS-algorithm with \(\alpha _0=10^{-2}\). The desired monotone behavior is observed after the second iteration (red part of the curve)

Fig. 6
figure 6

Denoising of the phantom-image corrupted with Gaussian white noise with \(\sigma =0.03\). a CPS-algorithm, b APS-algorithm, and c pAPS-algorithm

For performance comparison, we here consider the phantom-image of size \(256 \times 256\) pixels, see Fig. 4a, corrupted only by Gaussian white noise with \(\sigma =0.3\). In the pAPS-algorithm we set \(p_0=32\). The behavior of the sequence \((\alpha _n)_n\) for different initial values \(\alpha _0\), i.e., \(\alpha _0 \in \{1, 10^{-1}, 10^{-2}\}\), is depicted in Fig. 6. We observe that all three methods converge for arbitrary \(\alpha _0\) to the same regularization parameter \(\alpha \) and hence generate results with the same PSNR and MSSIM (i.e., PSNR\(=19.84\) and MSSIM\(=0.7989\)). Hence, in these experiments, despite the lack of a theoretical convergence guarantee, the APS-algorithm also seems to converge to the desired solution. We observe the same behavior for different \(\sigma \) as well.

Table 1 Number of iterations needed for the reconstruction of the phantom-image corrupted by Gaussian white noise with different standard deviations \(\sigma \)

Looking at the number of iterations needed until termination, we observe from Table 1 that the APS-algorithm always needs significantly fewer iterations than the CPS-algorithm. This is attributed to the different updates of \(\alpha \). Recall that for a fixed \(\alpha _n\) in the CPS-algorithm we set \(\alpha ^{CPS}_{n+1}:=\sqrt{\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}}\alpha _n\), while in the APS-algorithm the update is performed as \(\alpha ^{APS}_{n+1}:=\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\alpha _n\). Note that \( \sqrt{\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}} \le \frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)} \) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\ge 1\) and \( \sqrt{\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}} \ge \frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)} \) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\le 1\). Hence, we obtain \(\alpha _n \le \alpha ^{CPS}_{n+1} \le \alpha ^{APS}_{n+1}\) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\ge 1\) and \(\alpha _n\ge \alpha ^{CPS}_{n+1} \ge \alpha ^{APS}_{n+1}\) if \(\frac{\nu _2 |\varOmega |}{\tau {\mathcal H}_2(u_{\alpha _n};g)}\le 1\). That is, in the APS-algorithm \(\alpha \) changes more significantly in each iteration than in the CPS-algorithm, which leads to a faster convergence with respect to the number of iterations. Nevertheless, exactly this behavior allows the function \(\alpha \rightarrow \frac{{\mathcal H}_2(u_\alpha ;g)}{\alpha }\) to increase, which is the reason why the convergence of the APS-algorithm cannot be guaranteed in general. However, in our experiments we observed that the function \(\alpha \rightarrow \frac{{\mathcal H}_2(u_\alpha ;g)}{\alpha }\) only increases in the first iterations and does not increase (actually even decreases) afterwards; see Fig. 5b. This is enough to guarantee convergence, as discussed in Sect. 3, since we may regard the iterate of the last step in which the desired monotone behavior is violated as a “new” initial value. Since from this point on the nonincrease holds, the algorithm converges.
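The two update rules compared above can be stated compactly; the following lines (ours, with hypothetical names) reproduce them for \(\tau =2\).

```python
import numpy as np

def next_alpha(alpha_n, H2, nu2, n_pixels, rule="APS"):
    """One scalar update with ratio = nu_2*|Omega| / (tau*H_2(u_alpha; g)), tau = 2:
    the CPS-algorithm multiplies alpha_n by sqrt(ratio), the APS-algorithm by ratio itself."""
    ratio = nu2 * n_pixels / (2.0 * H2)
    return alpha_n * (np.sqrt(ratio) if rule == "CPS" else ratio)
```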

The pAPS-algorithm is designed to ensure the nonincrease of the function \(\alpha \rightarrow \frac{({\mathcal H}_\tau (u_\alpha ;g))^{p(\alpha )}}{\alpha }\) by choosing \(p(\alpha )\) in each iteration accordingly, which is done by the algorithm automatically. If \(p(\alpha )=p=1/2\) in each iteration, then the pAPS-algorithm becomes the CPS-algorithm, as sometimes happens in practice (indicated by the same number of iterations in Table 1). Since the CPS-algorithm converges [18], the pAPS-algorithm always yields \(p\ge 1/2\). In particular, we observe that if the starting value \(\alpha _0\) is larger than the requested regularization parameter \(\alpha \), fewer iterations until termination are needed than with the CPS-algorithm. On the contrary, if \(\alpha _0\) is smaller than the desired \(\alpha \), \(p=1/2\) is chosen by the algorithm to ensure the monotonicity. The result obtained by the pAPS-algorithm is independent of the choice of \(p_0\), as is visible from Fig. 7. In this plot, we also specify the number of iterations needed until termination. Concerning the optimal choice of \(p_0\) with respect to the number of iterations, we conclude from Fig. 7 that \(p_0=32\) seems to be a good choice, although the optimal value may depend on the noise level.

Fig. 7
figure 7

Regularization parameter \(\alpha \) obtained by the pAPS-algorithm with different \(p_0\) for denoising the phantom-image corrupted with Gaussian white noise for different \(\sigma \)

Similar behavior as described above is also observed when denoising other test images as well as real images.

6.1.2 Image Deblurring

Now, we consider the situation in which an image is corrupted by additive Gaussian noise and additionally blurred. Then the operator T is chosen according to the blurring kernel, which we assume here to be known. For testing the APS- and pAPS-algorithm in this case, we take the cameraman-image of Fig. 4b, which is of size \(256\times 256\) pixels, blur it by a Gaussian blurring kernel of size \(5\times 5\) pixels and standard deviation 10, and additionally add Gaussian white noise with variance \(\sigma ^2\). The minimization problem in the APS- and pAPS-algorithm is solved approximately by the algorithm in (5.5). In Fig. 8, the progress of \(\alpha _n\) for different \(\sigma \)'s, i.e., \(\sigma \in \{0.3, 0.1, 0.05\}\), and different \(\alpha _0\)'s, i.e., \(\alpha _0 \in \{1, 10^{-1}, 10^{-2}\}\), is presented. In these tests, both algorithms converge to the same regularization parameter and minimizer. From the figure, we observe that the pAPS-algorithm needs far fewer iterations than the APS-algorithm until termination. This behavior might be attributed to the choice of the power p in the pAPS-algorithm, since in all our experiments we observe that \(p>1\) until termination.

Fig. 8
figure 8

Reconstruction of the cameraman-image corrupted by Gaussian blurring kernel of size \(5\times 5\) and standard deviation 10. In the pAPS-algorithm, we set \(p_0=32\). a \(\sigma =0.3\); pAPS-algorithm, b \(\sigma =0.1\); pAPS-algorithm, c \(\sigma =0.05\); pAPS-algorithm, d \(\sigma =0.3\); APS-algorithm, e \(\sigma =0.1\); APS-algorithm, and f \(\sigma =0.05\); APS-algorithm

6.1.3 Impulsive Noise Removal

It has been demonstrated that for removing impulsive noise from images one should minimize the \(L^1\)-TV model rather than the \(L^2\)-TV model. For calculating a suitable regularization parameter \(\alpha \) in the \(L^1\)-TV model, we use the APS- and pAPS-algorithm, in which the minimization problems are solved approximately by the \(L^1\)-TV\(_\alpha \)-algorithm. Here, we consider the cameraman-image corrupted by salt-and-pepper noise or random-valued impulse noise with different noise levels, i.e., \(r_1=r_2 \in \{0.3, 0.1, 0.05\}\) and \(r \in \{0.3, 0.1, 0.05\}\), respectively. The obtained results for different \(\alpha _0\)'s are depicted in Figs. 9 and 10. For the removal of salt-and-pepper noise, we observe from Fig. 9 similar behaviors of the APS- and pAPS-algorithm as above for removing Gaussian noise. In particular, both algorithms converge to the same regularization parameter. However, in many cases the APS-algorithm needs significantly fewer iterations than the pAPS-algorithm. These behaviors are also observed in Fig. 10 for removing random-valued impulse noise as long as the APS-algorithm finds a solution. In fact, for \(r=0.05\) the APS-algorithm does not converge but oscillates, as depicted in Fig. 10c.

Fig. 9
figure 9

Denoising of the cameraman-image corrupted by salt-and-pepper noise. In the pAPS-algorithm, we set \(p_0=32\). a \(r_1=r_2=0.3\); pAPS-algorithm, b \(r_1=r_2=0.1\); pAPS-algorithm, c \(r_1=r_2=0.05\); pAPS-algorithm, d \(r_1=r_2=0.3\); APS-algorithm, e \(r_1=r_2=0.1\); APS-algorithm, and f \(r_1=r_2=0.05\); APS-algorithm

Fig. 10
figure 10

Denoising of the cameraman-image corrupted by random-valued impulse noise. In the pAPS-algorithm, we set \(p_0=32\). a \(r=0.3\); pAPS-algorithm, b \(r=0.1\); pAPS-algorithm, c \(r=0.05\); pAPS-algorithm, d \(r=0.3\); APS-algorithm, and e \(r=0.1\); APS-algorithm

6.2 Locally Adaptive Total Variation Minimization

In this section, various experiments are presented to evaluate the performance of the LATV- and pLATV-algorithm presented in Sect. 4. Their performance is compared with the proposed pAPS-algorithm as well as with the SA-TV-algorithm introduced in [37] for \(L^2\)-TV minimization and in [58] for \(L^1\)-TV minimization. We recall that the SA-TV methods compute an approximate solution of the respective optimization problem in (1.10) and automatically determine a spatially varying \(\lambda \) based on a local variance estimation. However, as pointed out in [37, 58], they only perform efficiently when the initial \(\lambda \) is chosen sufficiently small, which we do in our numerical experiments. On the contrary, for the LATV- and pLATV-algorithm any positive initial \(\alpha _0\) is sufficient. For the comparison, we consider four different images, shown in Fig. 4, which are all of size \(256 \times 256\) pixels. In all our experiments, for the SA-TV-algorithm we use \(\mathcal {I}_{i,j}= \varOmega _{i,j}^\omega \), see [37], and we set the window-size to \(11 \times 11\) pixels in the case of Gaussian noise and to \(21 \times 21\) pixels in the case of impulse noise. For the LATV- and pLATV-algorithm, we use the window-size \(\omega =11\), if not otherwise specified, and choose \(p_0=\frac{1}{2}\).

6.3 Gaussian Noise Removal

6.3.1 Dependency on the Initial Regularization Parameter

We start this section by investigating the stability of the SA-TV-, LATV-, and pLATV-algorithm with respect to the initial regularization parameter, i.e., \(\lambda _0\) for the SA-TV-algorithm and \(\alpha _0\) for the other algorithms, by denoising the cameraman-image corrupted by Gaussian white noise with standard deviation \(\sigma =0.1\). In this context, we also compare the pLATV-algorithm with and without using Algorithm 1 for automatically computing an initial parameter, where we set \(c_{\alpha _0}=\frac{1}{5}\). The minimization problems contained in the LATV- and pLATV-algorithm are solved as described in Sect. 5.1.1. For comparison, we define the values PSNR\(_{\text {diff}}:= \max _{\alpha _0}\) PSNR\((\alpha _0) - \min _{\alpha _0}\) PSNR\((\alpha _0)\) and MSSIM\(_{\text {diff}}:= \max _{\alpha _0}\) MSSIM\((\alpha _0) - \min _{\alpha _0}\) MSSIM\((\alpha _0)\) to measure the variation of the considered quality measures. Here PSNR\((\alpha _0)\) and MSSIM\((\alpha _0)\) are the PSNR- and MSSIM-values of the reconstructions obtained from the considered algorithms when the initial regularization parameter is set to \(\alpha _0\). From Table 2, we observe that the pLATV-algorithm with and without Algorithm 1 is more stable with respect to the initial regularization parameter than the LATV-algorithm and the SA-TV-algorithm. This stable performance of the pLATV-algorithm is explained by the adaptivity of the value p, which allows the algorithm to reach the desired residual (at least very closely) for any \(\alpha _0\). As expected, the pLATV-algorithm with Algorithm 1 is even more stable with respect to \(\alpha _0\) than the pLATV-algorithm alone, since, due to Algorithm 1, the actually used initial parameters in the pLATV-algorithm differ only slightly, leading to very similar results. Note that if \(\alpha _0\) is sufficiently small, then the pLATV-algorithm with and without Algorithm 1 coincide; see Table 2 for \(\alpha _0\in \{10^{-2},10^{-3},10^{-4}\}\). In the rest of our experiments, we always choose \(\alpha _0\) so small that Algorithm 1 returns the input \(\alpha _0\).

Table 2 PSNR and MSSIM of the reconstruction of the cameraman-image corrupted by Gaussian white noise with standard deviation \(\sigma =0.1\) via the LATV- and pLATV-algorithm with different \(\alpha _0\) and via the SA-TV-algorithm with different \(\lambda _0\). In the LATV- and pLATV-algorithm, we use \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}\) with window-size \(11 \times 11\) pixels in the interior and we set \(p_0=\frac{1}{2}\)

6.3.2 Dependency on the Local Window

In Table 3, we report on performance tests of the pLATV-algorithm with respect to the chosen type of window, i.e., \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}^\omega \) and \(\mathcal {I}_{i,j}={\varOmega }_{i,j}^\omega \). We observe that, independently of which type of window is used, the algorithm finds nearly the same reconstruction. This may be attributed to the fact that the windows in the interior are the same for both types. Nevertheless, the boundaries are treated differently, which leads to different theoretical results, but this seems not to have a significant influence on the practical behavior. A similar behavior is observed for the LATV-algorithm, as the LATV- and pLATV-algorithm return nearly the same reconstructions, as observed below in Table 4. Since for both types of windows nearly the same results are obtained, in the rest of our experiments we restrict ourselves to \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}^\omega \) in the LATV- and pLATV-algorithm.

Table 3 PSNR and MSSIM of the reconstruction of different images corrupted by Gaussian white noise with standard deviation \(\sigma \) via the pLATV-algorithm with \(\alpha _0=10^{-4}\)
Table 4 PSNR- and MSSIM-values of the reconstruction of different images corrupted by Gaussian white noise with standard deviation \(\sigma \) via pAPS-, LATV-, and pLATV-algorithm with \(\alpha _0=10^{-4}\) and SA-TV-algorithm with \(\lambda _0=10^{-4}\). In the LATV- and pLATV-algorithm, we use \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}\) with window-size \(11 \times 11\) pixels in the interior and we set \(p_0=\frac{1}{2}\)

Next, we test the pLATV-algorithm for different values of the window-size, varying from 3 to 15. Figure 11 shows the PSNR and MSSIM of the restoration of the cameraman-image degraded by different types of noise (i.e., Gaussian noise with \(\sigma =0.3\) or \(\sigma =0.1\), salt-and-pepper noise with \(r_1=r_2=0.3\) or \(r_1=r_2=0.1\), or random-valued impulse noise with \(r=0.3\) or \(r=0.1\)), where the pLATV-algorithm with \(\alpha _0=10^{-2}\) and \(p_0=1/2\) is used. We observe that the PSNR and MSSIM vary only slightly with the window-size. However, in the case of Gaussian noise removal the PSNR and MSSIM increase very slightly with increasing window-size, while in the case of impulse noise contamination such a behavior cannot be observed. In Fig. 11, we also specify the number of iterations needed until termination of the algorithm. From this, we observe that in most experiments a larger window-size results in more iterations.

Fig. 11
figure 11

Restoration of the cameraman-image corrupted by different types of noise via the pLATV-method with different window-sizes

6.3.3 Homogeneous Noise

Now, we test the algorithms on different images corrupted by Gaussian noise with zero mean and different standard deviations \(\sigma \), i.e., \(\sigma \in \{0.3, 0.1, 0.05, 0.01\}\). The initial regularization parameter \(\alpha _0\) is set to \(10^{-4}\) in the pAPS-, LATV-, and pLATV-algorithm. In the SA-TV-algorithm, we choose \(\lambda _0=10^{-4}\), which seems sufficiently small. From Table 4, we observe that all considered algorithms behave very similarly. However, for \(\sigma \in \{ 0.1, 0.05, 0.01\}\) the SA-TV-algorithm performs best with respect to PSNR and MSSIM most of the time, while sometimes the LATV- and pLATV-algorithm have larger PSNR and MSSIM. That is, with respect to these quality measures a locally varying regularization weight is preferable to a scalar one, as long as \(\sigma \) is sufficiently small. In Fig. 12, we present the reconstructions obtained via the considered algorithms and observe that the LATV- and pLATV-algorithm generate visually the best results, while the result of the SA-TV-algorithm seems oversmoothed in some parts. For example, the leftmost tower has completely vanished in the SA-TV reconstruction, while this object is still visible in the other restorations. For large standard deviations, i.e., \(\sigma =0.3\), we observe from Table 4 that the SA-TV method performs clearly worse than the other methods, while the pAPS-algorithm usually has larger PSNR and the LATV- and pLATV-algorithm have larger MSSIM. Hence, whenever the noise level is too large and details are considerably lost due to noise, the locally adaptive methods are not able to improve the restoration quality.

Fig. 12
figure 12

Reconstruction of the cameraman-image corrupted by Gaussian white noise with \(\sigma =0.1\). a pAPS (PSNR: 27.31; MSSIM: 0.8109), b SA-TV (PSNR: 27.56; MSSIM: 0.7792), c LATV (PSNR: 27.40; MSSIM: 0.8170), and d pLATV (PSNR: 27.38; MSSIM: 0.8171)

Fig. 13
figure 13

Reconstruction of the cameraman-image corrupted by Gaussian white noise with \(\sigma ^2=0.0025\) except in (a) the highlighted area where \(\sigma ^2=0.015\)

6.3.4 Nonhomogeneous Noise

For this experiment, we consider the cameraman-image degraded by Gaussian white noise with variance \(\sigma ^2 = 0.0025\) in the whole domain \(\varOmega \) except for a rather small area (highlighted in red in Fig. 13a), denoted by \(\tilde{\varOmega }\), where the variance is 6 times larger, i.e., \(\sigma ^2=0.015\) in this part. Since the noise level is not homogeneous in this application, the pLATV-algorithm presented in Sect. 4 has to be adjusted accordingly. This can be done by making \(\nu _\tau \) (here \(\tau =2\)) locally dependent, and we write \(\nu _\tau =\nu _\tau (\hat{u})(x)\) to stress the dependency on the true image \(\hat{u}\) and on the location \(x\in \varOmega \) in the image. In particular, for our experiment we set \(\nu _2=0.015\) in \(\tilde{\varOmega }\), while \(\nu _2=0.0025\) in \(\varOmega \setminus \tilde{\varOmega }\). Since \(\nu _\tau \) is now spatially varying, we also have to adjust the definitions of \(\mathcal {B}_\tau \) and \(B_\tau \) to

$$\begin{aligned} \mathcal {B}_\tau (u):= \int _\varOmega \nu _\tau (u)(x) \ d x \ \text {and} \ B_\tau (u^h):= \sum _{x\in {\varOmega ^h}} \nu _\tau (u^h)(x), \end{aligned}$$
Table 5 PSNR- and MSSIM-values of the reconstruction of different images corrupted by Gaussian blur (blurring kernel of size \(5 \times 5\) pixels with standard deviation 10) and additive Gaussian noise with standard deviation \(\sigma \) via pAPS-, LATV-, and pLATV-algorithm with \(\alpha _0=10^{-2}\) and SA-TV-algorithm with \(\lambda _0=10^{-4}\). In the LATV- and pLATV-algorithm, we use \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}\) with window-size \(11 \times 11\) pixels in the interior and set \(p_0=\frac{1}{2}\)

respectively, in the continuous and discrete setting. These adaptations allow us to apply the pLATV-algorithm as well as the pAPS-algorithm to the removal of nonuniform noise.
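As a concrete instance of this adaptation (a sketch with our own names and the variances stated above), the spatially varying \(\nu _2\) and the corresponding bound \(B_2\) can be set up as follows.

```python
import numpy as np

def local_nu2_and_B2(shape, high_noise_mask, sigma2_low=0.0025, sigma2_high=0.015):
    """Spatially varying nu_2 (pixelwise noise variance) and B_2 = sum_x nu_2(x),
    following the adapted definition above; names and defaults are ours."""
    nu2 = np.full(shape, sigma2_low)
    nu2[high_noise_mask] = sigma2_high
    return nu2, nu2.sum()
```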

The reconstructions obtained by the pAPS-algorithm (with \(p_0=32\) and \(\alpha _0=10^{-2}\)) and by the pLATV-algorithm (with \(p_0=1/2\) and \(\alpha _0=10^{-2}\)) are shown in Fig. 13b, c, respectively. Due to the adaptive choice of \(\alpha \), see Fig. 13d, where light colors indicate a large value, the pLATV-algorithm is able to remove the noise considerably, while the pAPS-algorithm returns a restoration that still retains noise in \(\tilde{\varOmega }\).

6.4 Deblurring and Gaussian Noise Removal

The performance of the algorithms for restoring images corrupted by Gaussian blur with a blurring kernel of size \(5\times 5\) pixels and standard deviation 10 and by additive Gaussian noise with standard deviation \(\sigma \) is reported in Table 5. Here we observe that the LATV- as well as the pLATV-algorithm outperforms the SA-TV-algorithm in nearly every example. This observation is also clearly visible in Fig. 14, where the SA-TV-algorithm produces a still blurred output. The pAPS-algorithm generates very similar reconstructions to the LATV- and pLATV-algorithm, which is also reflected by similar PSNR and MSSIM. Similarly as before, the pAPS-algorithm performs best when \(\sigma =0.3\), while for smaller \(\sigma \) the LATV-algorithm always has the best PSNR.

Fig. 14
figure 14

Reconstruction of the cameraman-image corrupted by Gaussian white noise with \(\sigma =0.1\) and Gaussian blur. a pAPS (PSNR: 23.11; MSSIM: 0.7175), b SA-TV (PSNR: 22.64; MSSIM: 0.6957), c LATV (PSNR: 23.17; MSSIM: 0.7160), and d pLATV (PSNR: 23.15; MSSIM: 0.7160)

Table 6 PSNR- and MSSIM-values of the reconstruction of different images corrupted by salt-and-pepper noise with \(r_1=r_2\) via pAPS- and pLATV-algorithm with \(\alpha _0=10^{-2}\) and SA-TV-algorithm with \(\lambda _0=0.2\) and window-size \(21 \times 21\). In the pLATV-algorithm, we use \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}\) with window-size \(11 \times 11\) pixels in the interior and we set \(p_0=\frac{1}{2}\)
Table 7 PSNR- and MSSIM-values of the reconstruction of different images corrupted by random-valued impulse noise via pAPS- and pLATV-algorithm with \(\alpha _0=10^{-2}\) and SA-TV-algorithm with \(\lambda _0=0.2\) and window-size \(21 \times 21\). In the pLATV-algorithm, we use \(\mathcal {I}_{i,j}=\tilde{\varOmega }_{i,j}\) with window-size \(11 \times 11\) pixels in the interior and we set \(p_0=\frac{1}{2}\) in the pLATV-algorithm
Fig. 15
figure 15

Reconstruction of the barbara-image corrupted by salt-and-pepper noise with \(r_1=r_2=0.3\). a Noisy image, b pAPS (PSNR: 21.56; MSSIM: 0.6242), c SA-TV (PSNR: 20.54; MSSIM: 0.5889), and d pLATV (PSNR: 21.51; MSSIM: 0.6253)

Fig. 16
figure 16

Reconstruction of the barbara-image corrupted by random-valued impulse noise with \(r=0.3\). a Noisy image, b pAPS (PSNR: 24.24; MSSIM: 0.8040), c SA-TV (PSNR: 23.96; MSSIM: 0.7977), and d pLATV (PSNR: 24.24; MSSIM: 0.7992)

6.5 Impulse Noise Removal

Since it turns out that the LATV- and pLATV-algorithm produce nearly the same output, here we compare only our pAPS- and pLATV-algorithm for \(L^1\)-TV minimization with the SA-TV method introduced in [58], where a semismooth Newton method is used to generate an estimate of the minimizer of (1.10). For the sake of a fair comparison, an approximate solution of the minimization problem in the pLATV-algorithm is computed by the semismooth Newton method described in the Appendix. For the SA-TV method, we use the parameters suggested in [58] and hence \(\mathcal {I}_{i,j}=\varOmega _{i,j}^\omega \). Moreover, we set \(\lambda _0=0.2\) in our experiments, which seems sufficiently small. In Tables 6 and 7, we report on the results obtained by the pAPS-, SA-TV-, and pLATV-algorithm for restoring images corrupted by salt-and-pepper noise or random-valued impulse noise, respectively. While the pAPS- and pLATV-algorithm produce quite similar restorations for both types of noise, the SA-TV algorithm seems to be outperformed in most examples. For example, in Fig. 15 we observe that the pAPS- and pLATV-algorithm remove the noise considerably, while the solution of the SA-TV method still contains noise. On the contrary, for the removal of random-valued impulse noise in Fig. 16 we see that all three methods produce similar restorations.

7 Conclusion and Extensions

For \(L^1\)-TV and \(L^2\)-TV minimization, including convolution-type problems, automatic parameter selection algorithms for scalar and locally dependent weights \(\alpha \) are presented. In particular, we introduce the APS- and pAPS-algorithm for automatically determining a suitable scalar regularization parameter. While the convergence of the APS-algorithm is shown only under certain assumptions, the pAPS-algorithm is guaranteed to converge in general. Besides their general applicability, these two algorithms also exhibit fast numerical convergence in practice.

In order to treat homogeneous regions differently from fine features in images, which promises a better reconstruction, cf. Proposition 4.2 and Remark 4.1, algorithms for automatically computing locally adapted weights \(\alpha \) are proposed. These methods are much more stable with respect to the initial \(\alpha _0\) than the state-of-the-art SA-TV method. Moreover, while in the SA-TV-algorithm the initial \(\lambda _0>0\) has to be chosen sufficiently small, in our proposed methods any \(\alpha _0>0\) is allowed. Hence, the LATV- and pLATV-algorithm are much more flexible with respect to the initialization. Numerical experiments show that, with respect to the considered image quality measures, the reconstructions obtained by the newly introduced algorithms are similar to the restorations obtained by the SA-TV algorithm. In the case of Gaussian noise removal (including deblurring) with sufficiently small noise levels, reconstructions obtained with locally varying weights seem to be qualitatively better than the results with scalar parameters. On the contrary, for removing impulse noise a spatially varying \(\alpha \) or \(\lambda \) does not always improve the restoration quality.

For computing a minimizer of the respective multiscale total variation model, we present first- and second-order methods and show their convergence to a respective minimizer.

Although the proposed parameter selection algorithms are constructed to estimate the parameter \(\alpha \) in (1.6) and (1.9), they can easily be adjusted to find a good candidate \(\lambda \) in (1.7) and (1.10), respectively.

Note that the proposed parameter selection algorithms are not restricted to total variation minimization, but may be extended to other types of regularizers as well, under suitable assumptions that guarantee the existence of a minimizer of the considered optimization problem. In order to obtain similar (convergence) results as presented in Sects. 3 and 4, the considered regularizer should be convex, lower semicontinuous, and one-homogeneous. In particular, for proving convergence results as in Sect. 3 (cf. Theorems 3.1 and 3.2), an equivalence of the penalized and the corresponding constrained minimization problem, as in Theorem 2.2, is needed. An example of such a regularizer to which the presented algorithms can be extended is the total generalized variation [13].