1 Introduction

Every measurement contains noise, which is usually undesirable, as it disguises the signal and thus the information to be extracted from it [1]. Often, iterative denoising [2,3,4,5,6] is needed to recover part of the information hidden by the noise. To recover as much information as possible, the algorithm has to address the right noise statistics, employ a suitable prior, and, of course, converge to a certain solution. One category of denoising algorithms relies on solving the “augmented Lagrangian,” which is adaptable to a large variety of image optimization problems. One of these algorithms is the “alternating direction method of multipliers” (ADMM), first described by Glowinski and Marroco [7] and Gabay and Mercier [8]. It has proven to be a good tool for solving these kinds of ill-posed inverse problems and has thus gained huge interest within the last decade, during which great efforts were made to optimize specialized priors or regularizers for all types of structure occurring in images [9,10,11,12,13,14,15]. Many of these algorithms interpret the principle of “total variation” (TV) by Rudin et al. [16] in slightly different ways. The newest approach, called “total generalized variation” (TGV) by Bredies et al. [17], utilizes an adaptive mixture of different higher orders of TV and applies them to suitable local regions of the image. This solves the staircase problem of first-order TV while preserving sharp edges that would otherwise be lost in higher-order TV approaches [18].

As will be shown in this work, successful denoising and deconvolution in the context of total variation come down to the choice of two sets of parameters: the penalty parameters \(\rho \) and \(\eta \), which have a great influence on the convergence speed [19,20,21,22], and the regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\), which depend on the amount of noise and determine the quality of the outcome [23, 24]. To use these ADMM-based algorithms for evaluating experimental data in a scientific context, the choice of these parameters must be objective, so that the results are not altered by the user. Today, this hurdle keeps natural scientists from switching to better-suited algorithms. Instead, it pushes them to use algorithms such as the “Richardson-Lucy algorithm” (RLA), Fourier-ratio deconvolution, or Wiener filtering [25,26,27,28,29,30,31], which cannot compete with modern algorithms such as the ADMM in terms of adaptability and versatility.

By balancing the primal and dual residuals arising from the optimality conditions of the ADMM, the penalty parameters can be found adaptively in a stable scheme for image denoising or deconvolution, speeding up convergence and making the algorithm independent of the initial choice of these parameters. Further, we will show how the regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\) need to be chosen, based on the standard deviation of the noise, in order to improve the image quality significantly. With our work, the ADMM-based TGV algorithm can be designed such that it operates purely on a mathematical foundation and on noise parameters that can be determined experimentally by analyzing the detector itself. Thus, our algorithm enables experimental scientists to use ADMM-based denoising and deconvolution techniques, adapt them to their respective experiments, and improve their data evaluation significantly.

By testing sets of these four parameters \(\rho \), \(\eta \), \(\lambda _{0}\), and \(\lambda _{1}\) on noisy images, we can further show that regions of optimal results can be found for different noise types and different priors, allowing the results of the standard ADMM-TGV algorithm to be compared with those of the residual-balanced one. In this way, we can show that the automated choice not only objectifies the parameters but also increases the convergence speed for most initial values. In the following subsections, the noise models and the fundamentals of the respective algorithms are explained in detail.

1.1 Gaussian noise fundamentals

Whenever images are captured, the acquisition process adds noise. Gaussian noise arises, e.g., from camera read-out noise and can be described as an additive process. The image formation process then reads as follows [16]:

$$\begin{aligned} \varvec{\xi } =\varvec{\Omega } {\textbf {x}} + \mathcal {N} \left( \mu ,\sigma ^2\right) \text { ,} \end{aligned}$$
(1)

with the measured image \(\varvec{\xi }\), a point-spread-function (PSF) \(\varvec{\Omega }\) convolved with the noise-free image \(\textbf{x}\), and Gaussian noise \(\mathcal {N}\) with mean value \(\mu \) and variance \(\sigma ^{2}\). To clarify notation, in the following we will use bold letters to indicate vectors or matrices. Scalars and indices remain in normal font.

The mean value can be leveled by subtracting a background image. Removing the noise with variance \(\sigma ^{2}\) from the data requires denoising. In terms of Bayes’ theorem [32], the likelihood \(L_{\mathcal {N}}\) of having a good estimate of the “true” convolved image \(\varvec{\Omega }\textbf{x}\) can be expressed via the conditional probability P of observing \(\varvec{\xi }\) given \(\varvec{\Omega }\textbf{x}\) [33]:

$$\begin{aligned}&L_{\mathcal {N}}(\varvec{\Omega } {\textbf {x}})=P(\varvec{\xi }\,|\,\varvec{\Omega } {\textbf {x}}) \qquad \qquad \qquad \qquad \qquad \qquad \qquad \; \nonumber \\&\quad \qquad \;\;\, = {\displaystyle \prod _{m,n}^{M,N}} {\frac{1}{\sqrt{2\pi \sigma ^{2}}} \exp \left\{ -\frac{\left( \left[ \varvec{\Omega } {\textbf {x}}\right] _{m,n}-\varvec{\xi }_{m,n}\right) ^{2}}{2\sigma ^{2}}\right\} } \text {,} \end{aligned}$$
(2)

with the individual pixels \((m,n)\) of an \(M\!\times \! N\) image. This equation can be reformulated into the negative log-likelihood functional [33]:

$$\begin{aligned} F_{\mathcal {N}}(\varvec{\Omega } {\textbf {x}})&\propto - \text {ln} \left\{ L_{\mathcal {N}}(\varvec{\Omega } {\textbf {x}})\right\} \propto {\displaystyle \sum _{m,n}^{M,N}} \frac{\left( \left[ \varvec{\Omega } {\textbf {x}}\right] _{m,n}-\varvec{\xi }_{m,n}\right) ^{2}}{2\sigma ^{2}}\quad \nonumber \\&\!\!\!\!\!\!\!\!\! ={\frac{1}{2\sigma ^{2}}\, \Vert \, \varvec{\Omega } {\textbf {x}}-\varvec{\xi }\,\Vert _{F}^{2}}\;\qquad \qquad \qquad \qquad \quad \,\text {,} \end{aligned}$$
(3)

with the Frobenius norm \(\Vert \cdot \Vert _{F}\) as an extension of the Euclidean norm for matrices.
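As a brief illustration (not part of the derivation above), the Gaussian data-fidelity term (3) can be evaluated in a few lines of Python/NumPy; the function name and the FFT-based circular application of the PSF are assumptions made only for this sketch:

```python
import numpy as np

def gaussian_neg_log_likelihood(x, xi, psf_otf, sigma):
    """Evaluate F_N(Omega x) = ||Omega x - xi||_F^2 / (2 sigma^2), cf. (3).

    x       : current estimate of the noise-free image (2D array)
    xi      : measured image (2D array)
    psf_otf : FFT of the PSF, padded to the image size (circular convolution assumed)
    sigma   : standard deviation of the Gaussian noise
    """
    omega_x = np.real(np.fft.ifft2(np.fft.fft2(x) * psf_otf))   # Omega x
    return np.sum((omega_x - xi) ** 2) / (2.0 * sigma ** 2)
```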

1.2 Poisson noise fundamentals

Due to the quantized nature of the photons or electrons used for imaging, Poisson noise occurs as count fluctuations. It thus differs from Gaussian noise in the overall image statistics, as it is deeply embedded in the signal [34]:

$$\begin{aligned} \varvec{\xi }= \mathcal {P} \left( \varvec{\Omega } {\textbf {x}} \right) \text { ,} \end{aligned}$$
(4)

with the Poisson noise \(\mathcal {P}\). Similar to flipping a coin, a probability of counting a certain number of incidents can be assigned to every pixel. Utilizing Bayes’ theorem again, the likelihood of having the true convolved image \(\varvec{\Omega } {\textbf {x}}\) can be expressed as follows [34]:

$$\begin{aligned} L_{\mathcal {P}}(\varvec{\Omega } {\textbf {x}}) \!\!&=P\left( \varvec{\xi }\,|\,\varvec{\Omega } {\textbf {x}}\right) \qquad \qquad \qquad \quad \;\, \nonumber \\&\!\quad ={\displaystyle \prod _{n,m}^{N,M}} \frac{\left[ \varvec{\Omega } {\textbf {x}}\right] _{n,m}^{\varvec{\xi }_{n,m}} \exp \Big \lbrace - \left[ \varvec{\Omega } {\textbf {x}}\right] _{n,m}\Big \rbrace }{\varvec{\xi }_{n,m}!} \text {,} \end{aligned}$$
(5)

with the negative log-likelihood functional [34]:

$$\begin{aligned} G_{\mathcal {P}}\left( \varvec{\Omega } {\textbf {x}}\right)&\propto -{{\,\textrm{ln}\,}}\left\{ L_{\mathcal {P}}(\varvec{\Omega } {\textbf {x}})\right\} \end{aligned}$$
(6)
$$\begin{aligned}&= {\displaystyle \sum _{n,m}^{N,M}} -{{\,\textrm{ln}\,}}\left\{ \left[ \varvec{\Omega } {\textbf {x}}\right] _{n,m} \right\} \varvec{\xi }_{n,m} + \left[ \varvec{\Omega } {\textbf {x}}\right] _{n,m} \text {.} \end{aligned}$$
(7)

In the formulation of \(G_{\mathcal {P}}\), the factorial in the denominator can be left out: it does not depend on \(\varvec{\Omega } {\textbf {x}}\), vanishes in the derivative, and thus plays no further role in the minimization.
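Analogously, a minimal Python/NumPy sketch of the Poisson functional (7), again with an assumed FFT-based circular PSF and a small guard added to the logarithm, could look as follows:

```python
import numpy as np

def poisson_neg_log_likelihood(x, xi, psf_otf, eps=1e-12):
    """Evaluate G_P(Omega x), cf. (7), with the constant factorial term dropped."""
    omega_x = np.real(np.fft.ifft2(np.fft.fft2(x) * psf_otf))   # Omega x
    omega_x = np.maximum(omega_x, eps)                           # guard the logarithm
    return np.sum(-np.log(omega_x) * xi + omega_x)
```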

1.3 General scaled ADMM

The general scaled ADMM algorithm solves problems in the form of [22]:

$$\begin{aligned}&\underset{\mathbf {x, {z}}}{{{\,\mathrm{arg\,min}\,}}} \quad&f(\textbf{x})+g(\textbf{z}) \nonumber \\&\text {subject to} \quad&\textbf{Ax}+\textbf{Bz}=\textbf{c} \text {,} \end{aligned}$$
(8)

with two convex functions f and g applied to the variables \(\textbf{x}\) and \(\textbf{z}\). \(\textbf{A}\) and \(\textbf{B}\) then are operators applied to the variables, and \(\textbf{c}\) is a constant.

As an example in imaging, \(\textit{f}\) and \(\textit{g}\) can be functions that correspond to a given noise statistic or allow one to distinguish between a “true” image without blur and noise and the noise itself. The variable \(\textbf{x}\) in this case corresponds to the “true” image and \(\textbf{z}\) to the noise to be separated. Both variables are connected via one or several conditions, including the operators \(\textbf{A}\) and \(\textbf{B}\), which, for example, could act as a convolution with a blur kernel or as difference operators. The application of the ADMM to image treatment is shown in Sect. 2.

Staying with the more general expressions, the overall problem can be reformulated as the scaled augmented Lagrangian to be minimized, which is given by the following [22]:

$$\begin{aligned} \mathcal {L}_{\rho }(\textbf{x},\textbf{z},\textbf{u})= f(\textbf{x}) + g(\textbf{z}) + \frac{\rho }{2} \Vert \textbf{Ax} +\textbf{Bz}-\textbf{c} + \textbf{u} \Vert _{2}^{2} -\frac{\rho }{2} \,\Vert \textbf{u} \Vert _{2}^{2} \text { ,} \end{aligned}$$
(9)

with the Euclidean norm \(\Vert \cdot \Vert _{2}\), the scaled dual variable (or Lagrange multiplier) \(\textbf{u}=\frac{1}{\rho } \textbf{y}\) associated with the constraint \(\textbf{Ax}+\textbf{Bz}=\textbf{c}\), and \(\textbf{y}\) being the unscaled dual variable. The iterative minimization process then can be decomposed and carried out in a three-step pattern [22]:

$$\begin{aligned}&\textbf{x}^{k+1}&= \underset{\textbf{x}}{{{\,\mathrm{arg\,min}\,}}} \left\{ f(\textbf{x}) + \frac{\rho }{2} \,\Vert \,\textbf{Ax} + \textbf{Bz}^{k} -\textbf{c} + \textbf{u}^{k}\,\Vert _{2}^{2}\,\right\} \;\; \text {,} \end{aligned}$$
(10)
$$\begin{aligned}&\textbf{z}^{k+1}&= \underset{\textbf{z}}{{{\,\mathrm{arg\,min}\,}}} \left\{ g(\textbf{z}) + \frac{\rho }{2} \,\Vert \,\textbf{Ax}^{k+1} + \textbf{Bz} -\textbf{c} + \textbf{u}^{k}\,\Vert _{2}^{2}\,\right\} \!\!\; \text {,} \end{aligned}$$
(11)
$$\begin{aligned}&\textbf{u}^{k+1}&= \textbf{u}^k + \textbf{Ax}^{k+1} + \textbf{Bz}^{k+1}-\textbf{c} \qquad \qquad \qquad \qquad \qquad \quad \; \text {,} \end{aligned}$$
(12)

minimizing with respect to \(\textbf{x}\) first and \(\textbf{z}\) second, and then updating the dual variable \(\textbf{u}\). In this context, \(\rho \!>\!\text {0}\) appears as a penalty parameter. Its careful choice is crucial for the convergence speed: a good choice might yield an accurate solution within a few tens of iterations, while a bad choice might delay convergence to a couple of thousand iterations or even lead to divergence. Since it is desirable to minimize the time cost, an optimal choice of \(\rho \) has to be found.
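To make the three-step pattern (10)–(12) concrete, the following Python/NumPy sketch applies it to a deliberately simple 1D total-variation denoising instance (\(f(\textbf{x})=\frac{1}{2}\Vert \textbf{x}-\varvec{\xi }\Vert _{2}^{2}\), \(g(\textbf{z})=\lambda \Vert \textbf{z}\Vert _{1}\), \(\textbf{A}=\textbf{D}\), \(\textbf{B}=-\textbf{I}\), \(\textbf{c}=0\)); it is an illustration of the generic scheme under these assumptions, not the TGV algorithm discussed later:

```python
import numpy as np

def soft_threshold(v, gamma):
    """Element-wise soft thresholding, the proximal operator of gamma * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def admm_tv_1d(xi, lam=1.0, rho=1.0, iters=200):
    """Scaled ADMM for 1D total-variation denoising:
        arg min_x  1/2 ||x - xi||_2^2 + lam * ||D x||_1,
    split as f(x) + g(z) subject to D x - z = 0 (A = D, B = -I, c = 0)."""
    n = xi.size
    D = np.diff(np.eye(n), axis=0)            # forward-difference operator
    x = xi.copy()
    z = D @ x
    u = np.zeros_like(z)                      # scaled dual variable
    lhs = np.eye(n) + rho * (D.T @ D)         # system matrix of the x-update
    for _ in range(iters):
        x = np.linalg.solve(lhs, xi + rho * D.T @ (z - u))   # x-update, cf. (10)
        z = soft_threshold(D @ x + u, lam / rho)             # z-update, cf. (11)
        u = u + D @ x - z                                    # dual update, cf. (12)
    return x
```

Here the \(\textbf{x}\)-update reduces to a linear solve, the \(\textbf{z}\)-update to element-wise soft thresholding, and the dual update follows (12) directly.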

1.4 Optimality conditions and residual balancing

Three feasibility conditions have to be fulfilled for the ADMM to converge optimally. These arise from the Karush-Kuhn-Tucker conditions [35], which utilize first-order derivatives to find a saddle point of the Lagrangian. Differentiating \(\mathcal {L}_{\rho } (\textbf{x},\textbf{z},\textbf{u})\) with respect to the unscaled Lagrange multiplier \(\textbf{y}\) yields the primal feasibility condition [9, 22]:

$$\begin{aligned} 0 = \textbf{Ax}^{\star } +\textbf{Bz}^{\star }-\textbf{c} \text { ,} \end{aligned}$$
(13)

with the star marking the optimal choice of the respective variable. As the Lagrangian is minimized by the updates \(\textbf{x}^{k+1}\) and \(\textbf{z}^{k+1}\), the deviation of the above equation from zero gives the primal residual [9, 22]:

$$\begin{aligned} \textbf{R}^{k+1} = \textbf{Ax}^{k+1} +\textbf{Bz}^{k+1}-\textbf{c} \text { .} \end{aligned}$$
(14)

Second, both the \(\textbf{x}\)- and \(\textbf{z}\)-updates must satisfy the dual feasibility conditions with respect to the unscaled dual variable \(\textbf{y}=\rho \textbf{u}\), yielding the following [9, 22]:

$$\begin{aligned} 0\in & {} \partial _{\textbf{x}} f(\textbf{x}^{\star }) + \rho \textbf{A}^{T}\textbf{u}^{\star } \text {,}\end{aligned}$$
(15)
$$\begin{aligned} 0\in & {} \partial _{\textbf{z}}\, g(\textbf{z}^{\,\star }) + \rho \textbf{B}^{T}\textbf{u}^{\star } \text {,} \end{aligned}$$
(16)

with \(\textbf{A}^{T}\) and \(\textbf{B}^{T}\) denoting the adjoint operators of \(\textbf{A}\) and \(\textbf{B}\). The subdifferential operator \(\partial \) can be replaced by a gradient and the \(\in \) by an equal sign in the case of g and f being differentiable [22]. Further, \(\rho ^{k}\) is assumed to be variable between iterations, where the factor \(\tau ^{k}\) updates \(\rho ^{k+1}\):

$$\begin{aligned} \rho ^{k+1}=&\tau ^{k}\cdot \rho ^{k}&\qquad \qquad \;\;\;\;\;\text {,}\end{aligned}$$
(17)
$$\begin{aligned} \textbf{u}^{k+1}=&\frac{\textbf{y}^{k}}{\rho ^{k+1}}=&\frac{\textbf{u}^{k} + \textbf{R}^{k+1}}{\tau ^{k}} \text {.} \end{aligned}$$
(18)

Then, the scaled dual variable has to be rescaled inversely [22] to adjust for the right \(\textbf{y}^{k+1}\). To make the general idea easier to follow, we switch the order of the two dual feasibility conditions for a moment: the approach for residual balancing is ultimately hidden in (15), whereas (16) is needed for completeness:

As the \(\textbf{z}\)-update happens after the \(\textbf{x}\)-update, \(\textbf{x}^{k+1}\leftarrow \textbf{x}^{k}\) is already updated. By definition, the \(\textbf{z}\)-update \(\mathcal {L}_{\rho } (\textbf{x}^{k+1},\textbf{z},\textbf{u}^{k})\) is minimized by \(\textbf{z}^{k+1}\); differentiating with respect to \(\textbf{z}\) and substituting into equation (16) yields the following [22]:

$$\begin{aligned} 0\in & {} \partial _{\textbf{z}} \mathcal {L}_{\rho }(\textbf{x}^{k+1},\textbf{z}^{k+1},\textbf{u}^{k}) \nonumber \\= & {} \partial _{\textbf{z}}g(\textbf{z}^{k+1}) +\rho ^{k} \textbf{B}^{T}\textbf{u}^{k} + \rho ^{k} \textbf{B}^{T}\left( \textbf{Ax}^{k+1} + \textbf{Bz}^{k+1} -\textbf{c}\right) \nonumber \\= & {} \partial _{\textbf{z}}g(\textbf{z}^{k+1}) + \rho ^{k+1} \textbf{B}^{T}\textbf{u}^{k+1} \text {,} \end{aligned}$$
(19)

showing that optimality is always assured by \(\textbf{z}^{k+1}\) and \(\textbf{u}^{k+1}\), even when \(\rho \) is changing.

Thus, it is left to show how the \(\textbf{x}\)-update satisfies (15). Since the \(\textbf{x}\)-update comes before the \(\textbf{z}\)-update, \(\textbf{z}^{k}\) has not been updated yet. By definition, the \(\textbf{x}\)-update \(\mathcal {L}_{\rho } (\textbf{x},\textbf{z}^{k},\textbf{u}^{k})\) is minimized by \(\textbf{x}^{k+1}\); differentiating with respect to \(\textbf{x}\) and substituting into equation (15) yields the following [22]:

$$\begin{aligned} 0\in & {} \partial _{\textbf{x}} \mathcal {L}_{\rho }(\textbf{x}^{k+1},\textbf{z},\textbf{u}^{k}) \qquad \nonumber \\= & {} \partial _{\textbf{x}} f(\textbf{x}^{k+1}) + \rho ^{k} \textbf{A}^{T}\textbf{u}^{k} + \rho ^{k} \textbf{A}^{T}\,\left( \textbf{Ax}^{k+1} + \textbf{Bz}^{k} -\textbf{c}\right) \nonumber \\= & {} \partial _{\textbf{x}} f(\textbf{x}^{k+1}) + \rho ^{k+1} \textbf{A}^{T}\textbf{u}^{k+1} +\rho ^{k} \textbf{A}^{T} \textbf{B}\left( \textbf{z}^{k}-\textbf{z}^{k+1}\right) \nonumber \\&\quad \Leftrightarrow ~&\!\!\! \rho ^{k} \textbf{A}^{T} \textbf{B}\left( \textbf{z}^{k+1}-\textbf{z}^{k}\right) \in \partial _{\textbf{x}} f(\textbf{x}^{k+1}) + \rho ^{k+1} \textbf{A}^{T}\textbf{u}^{k+1} \text {.} \end{aligned}$$
(20)

In order to assure optimality, the term on the left-hand side of (20), which can be seen as the dual residual [9, 22]

$$\begin{aligned} \textbf{S}^{k+1}=\rho ^{k} \textbf{A}^{T}\textbf{B} \left( \textbf{z}^{k+1}-\textbf{z}^{k}\right) \text { ,} \end{aligned}$$
(21)

has to vanish. Further, Wohlberg suggested the use of relative residuals [36], which make the scheme invariant to problem scaling. In contrast to his approach, we propose to alter the normalization of the relative primal residual \(\textbf{R}_{rel}^{k+1}\) slightly, such that \(\textbf{c}\) is grouped with the \(\textbf{Bz}^{k+1}\) norm, making the normalization more resilient against large constants \(\textbf{c}\):

$$\begin{aligned}&\textbf{S}_{rel}^{k+1} =&\frac{\rho ^{k} \textbf{A}^{T}\textbf{B} \left( \textbf{z}^{k+1}-\textbf{z}^{k}\right) }{{{\,\textrm{max}\,}}\left\{ \Vert \partial _{\textbf{x}} f(\textbf{x}^{k+1})\Vert _{2} ,\; \Vert \rho ^{k} \textbf{A}^{T}\left( \textbf{u}^{k}+\textbf{R}^{k+1} \right) \Vert _{2} \right\} }\text {,}\end{aligned}$$
(22)
$$\begin{aligned}&\textbf{R}_{rel}^{k+1} =&\frac{\textbf{Ax}^{k+1} +\textbf{Bz}^{k+1}-\textbf{c}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{Ax}^{k+1} \Vert _{2},\;\Vert \textbf{Bz}^{k+1} - \textbf{c} \Vert _{2}\right\} } \qquad \qquad \qquad \quad \text {.} \end{aligned}$$
(23)

He et al. [37] define the distance from convergence for the ADMM based on these residuals as \(D^{k+1} =\Vert \,\textbf{R}^{k+1}\,\Vert _{2}^{2} +\Vert \,\textbf{S}^{k+1}\,\Vert _{2}^{2}\), which allows the performance of the algorithm to be increased. Equations (10)–(12) suggest a decreasing norm of the primal residual for an increasing \(\rho \), but (21) suggests a simultaneous increase of the dual residual norm; for decreasing \(\rho \), it is the other way around. So balancing \(\rho \) such that both residuals approach zero is a reasonable heuristic for minimizing the overall distance [36]. As both norms of the relative residuals converge, a similar relative distance measure \(D_{rel}^{k+1} =\Vert \,\textbf{R}_{rel}^{k+1}\,\Vert _{2}^{2} +\Vert \,\textbf{S}_{rel}^{k+1}\,\Vert _{2}^{2}\) must also vanish, showing that normalizing the residuals is a valid strategy as well.

For the purpose of utilizing both residuals, Boyd proposes an update scheme for \(\rho \) that keeps both residuals within the same order of magnitude, called “residual balancing” [22]:

$$\begin{aligned} \rho ^{k+1} = {\left\{ \begin{array}{ll} \rho ^{k}\cdot \,\tau &{} \text {if } \Vert \,\textbf{R}^{k+1}\Vert _{2}> \mu \,\Vert \,\textbf{S}^{k+1}\,\Vert _{2}\\ \rho ^{k}\,/\,\tau &{} \text {if } \Vert \,\textbf{S}^{k+1}\,\Vert _{2} > \mu \,\Vert \,\textbf{R}^{k+1}\Vert _{2}\\ \rho ^{k} &{} \text {otherwise} \end{array}\right. } \quad \text {,} \end{aligned}$$
(24)

with a balancing parameter \(\mu \!=\!\text {10}\) and \(\tau \!=\!\text {2}\). This approach was found successful by various authors [38,39,40,41]. In the proof by He et al. [37], convergence for this adaptive scheme comes down to bounds on the sequence \(\rho ^{k}\) [37]:

$$\begin{aligned}&{{\,\textrm{inf}\,}}_{k} \left\{ \rho ^{k}\right\} >\text {0} \quad \text {and}\quad {{\,\textrm{sup}\,}}_{k} \left\{ \rho ^{k}\right\} <+\infty \text {,}\end{aligned}$$
(25)
$$\begin{aligned}&\;\,\text {with} \;\; \sum _{k=0}^{\infty } \left( \tau ^{k} - 1 \right) < + \infty \qquad \qquad \qquad \text {.} \end{aligned}$$
(26)

The latter is a slight variation of the original proof, as He et al. define their factor (here \(\hat{\tau }\)) slightly differently than we do, \(\tau =\hat{\tau }+1 \). The main consequence of this proof is that \(\tau \) must eventually converge to 1. Since \(\tau > 1\) is required for an update to be useful, every update increases the sum (26) indefinitely unless \(\tau \) converges to 1. In Boyd’s approach, the infinite sum is mainly prevented by a high value of \(\mu \), hoping that, at some point, \(\textbf{R}\) and \(\textbf{S}\) diverge so slowly from one another that this threshold basically stops any further updates from happening. A low value of \(\mu \), in contrast, would be desirable, as it would adjust \(\rho \) much faster. However, a low \(\mu \) leads to balancing \(\rho ^{k}\) regularly, making it oscillate around an optimal \(\rho \) without ever attaining it. This necessarily violates (26) for any value \(\tau \ne 1\), while \(\tau = 1\), despite converging, would make the updates pointless. This scheme is thus compelled to update \(\rho \) quite slowly, as it takes time for \(\textbf{R}\) and \(\textbf{S}\) to diverge from one another.
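For reference, Boyd’s rule (24) amounts to only a few lines; a sketch in Python (with \(\mu =10\) and \(\tau =2\) as above, and illustrative function and argument names) could read:

```python
def balance_rho_boyd(rho, r_norm, s_norm, mu=10.0, tau=2.0):
    """Residual balancing after (24)."""
    if r_norm > mu * s_norm:
        return rho * tau      # primal residual dominates: increase rho
    if s_norm > mu * r_norm:
        return rho / tau      # dual residual dominates: decrease rho
    return rho
```

Whenever \(\rho \) is changed in this way, the scaled dual variable has to be rescaled accordingly, cf. (18).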

Both residuals can be utilized in a more direct way, as Wohlberg showed by using relative residuals, extending the scheme to choosing \(\tau \!=\!\tau ^{k}\) at each iteration and utilizing a ratio of the relative residuals \(\upsilon ^{k+1}\) to make the \(\rho \) update much faster [36]:

$$\begin{aligned} \tau ^{k+1} =\!{\left\{ \begin{array}{ll} \;\,\upsilon ^{k+1} &{} \!\!\text {if } \tau _{max} \ge \upsilon ^{k+1} \ge 1\\ \left( \upsilon ^{k+1}\right) ^{-1} &{} \!\!\text {if } \tau _{max}^{-1} \le \upsilon ^{k+1} < 1\\ \;\; \tau _{max} &{} \!\!\text {otherwise} \end{array}\right. } \text { with}\; \\ \upsilon ^{k+1} = \sqrt{ {\Vert \textbf{R}_{rel}^{k+1} \Vert _{2}}/{\Vert \textbf{S}_{rel}^{k+1} \Vert _{2}}} \text {.}\qquad \qquad \qquad \qquad \qquad \;\;\; \nonumber \end{aligned}$$
(27)

Since \(\tau _{max}\) provides a bound on \(\tau \), convergence still holds for the extension, as the sequence \(\rho ^{k}\) would be bounded away from zero and infinity [36]. But again, \(\tau \) must eventually settle to 1 or updates must unnecessarily be delayed by a high value of \(\mu \).

However, the general idea of delaying the updates is not wrong; rather, the method of thresholding the residuals by a factor \(\mu \) proves to be unpredictable. Since the initial choice of \(\rho \) determines the development of both the primal and the dual residual, it depends on that very choice whether the first balancing of \(\rho \) happens sooner or later during the iterations. But since the progress in the early iterations is much larger than in the later ones, it is important to place \(\rho \) on the right track as early as possible.

Here, we propose a similar method with neither the strict requirement \(\tau \!\!\rightarrow \!\! 1\) nor the unpredictability. By choosing \(\mu \!=\!1\), the above extended scheme (24) with (27) can be simplified to the following:

$$\begin{aligned} \tau ^{k+1}&\!\!\!\!= {\left\{ \begin{array}{ll} \upsilon ^{k+1} &{} \text {if }\quad \quad \;0< \upsilon ^{k+1}< \infty \\ {{\,\textrm{min}\,}}\;\left\{ \rho ^{-1} ,\, \tau _{ext} \right\} &{} \text {elseif } \rho < 1 \\ {{\,\textrm{max}\,}}\left\{ \rho ^{-1},\,\tau _{ext}^{-1}\right\} &{} \text {elseif } \rho > 1 \\ 1 &{}\text {otherwise} \end{array}\right. }\text {.} \end{aligned}$$
(28)
$$\begin{aligned} \rho ^{k+1}&= {\left\{ \begin{array}{ll} \rho ^{k}\cdot \tau ^{k+1} &{} \text {if}\quad \quad \;\; \rho _{max}^{-1} \le \rho ^{k}\cdot \tau ^{k+1} \le \rho _{max}\\ {\rho _{max}^{-1}}/{\rho ^{k}} &{}\text {if} \quad \quad \;\; \rho ^{k}\cdot \tau ^{k+1} < \rho _{max}^{-1}\\ {\rho _{max}}/{\rho ^{k}} &{}\text {if} \quad \quad \;\; \, \qquad \qquad \rho ^{k}\cdot \tau ^{k+1}> \rho _{max}\\ \end{array}\right. } \end{aligned}$$
(29)

We propose to balance \(\rho \) by the above equations, but to increase the length of the update interval exponentially. This can be done by, e.g., updating ten times per decade of iterations, such that the updates are exponentially spaced:

$$\begin{aligned} \tau ^{k+1}&= {\left\{ \begin{array}{ll} \tau ^{k+1} \text { from (28)} &{} \text {if }\; k \in \text {Val} \\ 1 &{} \text {otherwise } \end{array}\right. }\text {,} \end{aligned}$$
(30)

where Val is a series of exponentially growing values. Thus, the update occurs regularly in the early iterations, where it is necessary to find a good \(\rho \), and fades out at a later stage, when convergence is more important. With the exponential increase of the update interval, the interval length eventually approaches infinity, which is equal to the standard ADMM, assuring convergence. Since \(\tau ^{k}=1\) for every step without balancing, the mean value of all \(\tau ^{k}\) between two balancing events approaches one, which satisfies (26).

In the above equations, \(\rho \) is bounded away from zero and infinity by forbidding a direct scaling with these values and by limiting the number of update steps. However, \(\upsilon ^{k+1}\) being either zero or infinity indicates an over-extension of the penalty parameter \(\rho \). A useful strategy is to relieve an over-extended \(\rho \) by pushing the parameter back towards 1, introducing a factor \(\tau _{ext} > 1\) with which \(\rho \) is either increased or decreased monotonically until \(\upsilon ^{k+1}\! \in \! \left( 0,\infty \right) \) is again within its boundaries and well defined. For both monotonically increasing and monotonically decreasing \(\rho \), He et al. proved convergence [37]. Further, the authors suggest setting a limit \(\rho _{max}\) to assure the numerical stability of the algorithm.
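A possible implementation of the proposed balancing, combining (27)–(30) with an exponentially spaced update schedule, is sketched below in Python/NumPy; the helper names, the value of \(\tau _{ext}\), and the simple clipping used to enforce \(\rho _{max}\) are our own illustrative choices:

```python
import numpy as np

def exponential_schedule(n_iters, per_decade=10):
    """Iterations at which the balancing is carried out: roughly `per_decade`
    updates per decade of iterations, so the update interval grows exponentially."""
    decades = np.log10(max(n_iters, 1))
    ks = np.logspace(0, decades, int(per_decade * decades) + 1)
    return set(np.unique(ks.astype(int)))

def balance_rho(rho, r_rel, s_rel, k, schedule, tau_ext=2.0, rho_max=1e5):
    """One balancing step in the spirit of (27)-(30); returns (rho_new, tau_eff).
    The scaled dual variable has to be divided by tau_eff afterwards, cf. (18)."""
    if k not in schedule:                       # (30): only balance on scheduled iterations
        return rho, 1.0
    ups = np.sqrt(r_rel / s_rel) if s_rel > 0 else np.inf
    if 0.0 < ups < np.inf:                      # regular case of (28)
        tau = ups
    elif rho < 1.0:                             # over-extended rho: push back towards 1
        tau = min(1.0 / rho, tau_ext)
    elif rho > 1.0:
        tau = max(1.0 / rho, 1.0 / tau_ext)
    else:
        tau = 1.0
    rho_new = float(np.clip(rho * tau, 1.0 / rho_max, rho_max))  # bound, cf. (29)
    return rho_new, rho_new / rho               # effective tau after the bound
```

The returned effective \(\tau \) is then used to rescale the scaled dual variable as in (18).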

2 ADMM with total generalized variation

In order to distinguish the “true” image \(\textbf{x}\) from the noise, some additional information about the image needs to be taken into consideration. The total variation approach relies on the assumption that images are smooth to a certain degree, while noise is highly volatile. Consequently, derivatives can be utilized to find the noise, as the derivative of the noise is expected to be much larger than the derivative of the smooth and slowly changing image. However, if the image contains a gradient itself, this assumption is inconvenient, as part of the image slope is forced to be piece-wise constant in the reconstruction, leading to staircase artifacts [18]. Here, a higher-order derivative can help to smoothen the gradient of the image instead of the image itself, leading to a better approximation of the image, but at the cost of losing sharp edges.

Total generalized variation of second order as a method for image smoothing was first described by Bredies et al. [17], aiming to combine both orders of derivatives in a beneficial way, such that the regions of the image are denoised optimally with the help of self-adapting filter masks.

In terms of an isotropic TGV penalizer, an ADMM algorithm can be formulated following Shirai and Okuda [42]:

$$\begin{aligned} \Vert \textbf{x}\Vert _{TGV}^{\lambda _0,\lambda _1} := \underset{\textbf{Dx}=\textbf{s}+\textbf{t}}{{{\,\textrm{min}\,}}} \left( \lambda _{0}\Vert \textbf{s}\Vert _{F} + \lambda _{1}\Vert \textbf{Gt}\Vert _{F}\right) \text { .} \end{aligned}$$
(31)

The auxiliary matrices \(\textbf{s}\) and \(\textbf{t}\) are linked to the flat and sloped regions of the image, where \(\textbf{t}\) contains the latter. The matrices \(\textbf{D}\) and \(\textbf{G}\) contain derivatives in all possible directions. For images, they are given as follows [42]:

$$\begin{aligned} \textbf{D} = \begin{pmatrix} \textbf{D}_{h}\\ \textbf{D}_{v} \end{pmatrix} \qquad \text {,}\qquad \textbf{G}^{T} = \begin{pmatrix} \textbf{D}_{h} &{} \textbf{D}_{v} &{} {\mathbb {0}} \\ \mathbf {\mathbb {0}} &{} \textbf{D}_{h} &{} \textbf{D}_{v} \end{pmatrix}\text { ,} \end{aligned}$$
(32)

where the individual entries \(\textbf{D}_{h}\) and \(\textbf{D}_{v}\) are forward difference operators for the horizontal and vertical directions. Both, as well as the matrix of zeros \(\mathbf {\mathbb {0}}\), follow a “block circulant with circulant blocks” (BCCB) structure [42]. So for an image with \(N \times M\) pixels, all entries \(\textbf{D}_{h,v}\) and the matrix of zeros \(\mathbf {\mathbb {0}}\) have the same size and dimension as said image, \(\left\{ \textbf{D}_{h},\textbf{D}_{v},\mathbf {\mathbb {0}}\right\} \in \mathbb {R}^{N\times M}\). According to (31), the minimization separates the image into flat and sloped regions, which later serve as filter masks for first- and second-order smoothing. Thus, \(\textbf{t}\) contains the first-order derivatives of the sloped regions of the image, to which a further derivative \(\textbf{G}\) is applied. These sloped regions are subtracted from \(\textbf{Dx}\), leaving the first-order filter mask \(\textbf{s}\), which then only contains the flat regions of the image.
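Owing to the BCCB structure, \(\textbf{D}_{h}\) and \(\textbf{D}_{v}\) never have to be stored as explicit matrices; a small Python/NumPy sketch (periodic boundaries assumed, function names chosen only for illustration) applies them directly to an image:

```python
import numpy as np

def D_h(x):
    """Forward difference in the horizontal direction with periodic boundary,
    consistent with the circular (BCCB) structure of D_h."""
    return np.roll(x, -1, axis=1) - x

def D_v(x):
    """Forward difference in the vertical direction with periodic boundary."""
    return np.roll(x, -1, axis=0) - x

def D(x):
    """Stacked derivative operator D applied to an image, cf. (32)."""
    return np.stack([D_h(x), D_v(x)])
```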

2.1 Gaussian noise ADMM with TGV

Depending on the noise type, the structure of the ADMM must be adapted. For Gaussian-type noise, following (3), it can be formulated as follows [42]:

$$\begin{aligned} \underset{\textbf{x},\textbf{z}_{1},\textbf{z}_{2}}{{{\,\mathrm{arg\,min}\,}}} \quad {1}/{\left( 2\sigma ^{2}\right) }\,\Vert \varvec{\Omega } {\textbf {x}}- \varvec{\xi } \Vert _{F}^{2} + \lambda _{0}\Vert \textbf{z}_{1}\Vert _{F} + \lambda _{1}\Vert \textbf{z}_{2}\Vert _{F}\quad \nonumber \\ \text {subject to} \,\qquad \quad \quad \textbf{z}_{1} =\textbf{Dx}-\textbf{t} \text {, } \textbf{z}_{2} = \textbf{Gt} \qquad \;\;\,\text {,} \end{aligned}$$
(33)

with the acquired image \(\varvec{\xi }\), a point-spread-function \(\varvec{\Omega }\), and the regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\). The augmented Lagrangian to be solved is given as follows [42]:

$$\begin{aligned}&\mathcal {L}_{\rho ,\eta }\left( \textbf{x},\textbf{t},\textbf{z}_{1},\textbf{z}_{2},\textbf{u}_{1},\textbf{u}_{2} \right) \!=\!& {1}/{\left( 2\sigma ^{2}\right) }\Vert \varvec{\Omega } {\textbf {x}}-\varvec{\xi }\Vert _{F}^{2}\\{} & {} + \lambda _{0}\,\Vert \textbf{z}_{1}\Vert _{F} + {\rho }/{2}\, \Vert \textbf{Dx}-\textbf{t}-\textbf{z}_{1}\!+\!\textbf{u}_{1}\Vert _{F}^{2}\!-\! {\rho }/{2} \,\Vert \,\textbf{u}_{1}\, \Vert _{F}^{2}\quad \quad \nonumber \\{} & {} + \lambda _{1}\,\Vert \textbf{z}_{2}\Vert _{F}+ {\eta }/{2}\, \Vert \textbf{Gt}-\textbf{z}_{2}\!+\!\textbf{u}_{2}\Vert _{F}^{2}\;\;\;\;\;\;\!-\! {\eta }/{2} \,\Vert \,\textbf{u}_{2}\, \Vert _{F}^{2} \text {.}\nonumber \end{aligned}$$
(34)

with the penalty parameters \(\rho \) and \(\eta \). Note that for pure denoising problems, the “deconvolution kernel” \(\varvec{\Omega }\) simplifies to the identity \(\varvec{\Omega } = \mathbb {1}\). Further, it is important to mention that in the literature, \(\sigma ^{2}\) is often absorbed into \(\hat{\rho }\!=\!\rho \sigma ^{2}\) and \(\hat{\eta }\!=\!\eta \sigma ^{2}\) and thus excluded from the equation. But to show the dependency of the optimal \(\lambda \! \propto \! {1}/{\sigma }\), it is more convenient to stay with the definition chosen here.

Starting with \(\textbf{x}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{t}_{i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{z}_{1,i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{z}_{2,j}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{u}_{1,i}^{0}= \mathbf {\mathbb {0}}\) and \(\textbf{u}_{2,j}^{0}= \mathbf {\mathbb {0}}\), with i and j denoting the respective directions of the derivatives and \(\mathbf {\mathbb {0}}\) being matrices of zeros, the iteration steps are given as follows [42]:

$$\begin{aligned} \!\left\{ \textbf{x}^{k+1},\textbf{t}^{k+1} \right\} =&\underset{\textbf{x},\textbf{t}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho ,\eta }\left( \textbf{x},\textbf{t},\textbf{z}_{1}^{k},\textbf{z}_{2}^{k},\textbf{u}_{1}^{k},\textbf{u}_{2}^{k} \right) \;\;\,\text {,}\,\nonumber \\ \textbf{z}_{1,i}^{k+1} \;\; =&\underset{\textbf{z}_{1,i}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho } \left( \textbf{x}^{k+1},\textbf{t}_{i}^{k+1},\textbf{z}_{1,i},\textbf{u}_{1,i}^{k} \right) \, \text {,} \nonumber \\ \textbf{z}_{2,j}^{k+1} \;\; =&\underset{\textbf{z}_{2,j}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\eta } \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{2,j},\textbf{u}_{2,j}^{k} \right) \,\text {,} \\ \textbf{u}_{1,i}^{k+1} \;\; =&\textbf{u}_{1,i}^{k}+ \left( \left[ \textbf{Dx}^{k+1}\right] _{i}-\textbf{t}_{i}^{k+1}-\textbf{z}_{1,i}^{k+1} \right) \,\,\,\, \text {,}\, \nonumber \\ \textbf{u}_{2,j}^{k+1} \;\; =&\textbf{u}_{2,j}^{k}+ \left( \left[ \textbf{Gt}^{k+1}\right] _{j}-\textbf{z}_{2,j}^{k+1} \right) \qquad \quad \; \; \;\, \text {.}\nonumber \end{aligned}$$
(35)

For images, \(i \in \left\{ h,v\right\} \) and \(j \in \left\{ h,d,v \right\} \) hold, with d denoting the diagonal direction connected to both horizontal and vertical derivatives. Further, \(\textbf{t}\), \(\textbf{z}_{1,2}\), and \(\textbf{u}_{1,2}\) are given as follows [42]:

$$\begin{aligned} \textbf{t}&= \;\begin{pmatrix} \textbf{t}_{h}\\ \textbf{t}_{v} \end{pmatrix} \;\;\;\;&\text {,} \nonumber \\ \mathbf {z_1}&= \;\begin{pmatrix} \textbf{z}_{1,h}\\ \textbf{z}_{1,v} \end{pmatrix} \;&\text {,} \quad \mathbf {u_1}= \,\begin{pmatrix} \textbf{u}_{1,h}\\ \textbf{u}_{1,v} \end{pmatrix} \;\;\text {,}\;\,\\ \mathbf {z_2}&= \begin{pmatrix} \textbf{z}_{2,h}\\ \textbf{z}_{2,d}\\ \textbf{z}_{2,v} \end{pmatrix}&\text {,} \quad \mathbf {u_2}= \begin{pmatrix} \textbf{u}_{2,h}\\ \textbf{u}_{2,d}\\ \textbf{u}_{2,v} \end{pmatrix}\,\; \text {,} \nonumber \end{aligned}$$
(36)

with, according to the BCCB structure, all entries having the size of the image. The solution for the \(\textbf{x}\)- and \(\textbf{t}\)-update is shown in detail by Shirai and Okuda [42]. Note, however, that both the \(\textbf{x}\)- and \(\textbf{t}\)-update minimize the same Lagrangian, but with respect to different variables. The minimization of \(\textbf{z}_{1,i}\) and \(\textbf{z}_{2,j}\) can be done via the “block soft threshold operator” \(\mathcal {S}_{\gamma }\), which is given as follows [42]:

$$\begin{aligned} \mathcal {S}_{\gamma }\left( \textbf{v}\right) ={{\,\textrm{max}\,}}\left\{ 1 - \frac{\gamma }{\Vert \, \textbf{v} \,\Vert _{2}} \, ,\,0\,\right\} \textbf{v} \text { ,} \end{aligned}$$
(37)

such that:

$$\begin{aligned} \textbf{z}_{1,i}^{k+1} =&\mathcal {S}_{ {\lambda _{0}}/{\rho }}\left( \textbf{Dx}^{k+1}-\textbf{t}^{k+1}+\textbf{u}_{1}^{k}\right) \text {,} \end{aligned}$$
(38)
$$\begin{aligned} \textbf{z}_{2,j}^{k+1} =&\mathcal {S}_{ {\lambda _{1}}/{\eta }}\left( \textbf{Gt}^{k+1}+\textbf{u}_{2}^{k}\right) \qquad \quad \;\;\;\text {,} \end{aligned}$$
(39)

with i and j denoting the respective direction of the derivative, which in 2D is \(i \in \left\{ h,v\right\} \) for (38) and \(j \in \left\{ h,d,v \right\} \) for (39). This operator is applied pixel-wise to the image.
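A pixel-wise implementation of the block soft threshold (37) in Python/NumPy could look as follows, where the directional components of \(\textbf{z}_{1}\) or \(\textbf{z}_{2}\) are assumed to be stacked along the first array axis and the small constant only guards against division by zero:

```python
import numpy as np

def block_soft_threshold(v, gamma, eps=1e-12):
    """Block soft threshold S_gamma of (37), applied pixel-wise.

    v     : array of shape (n_dir, M, N) holding the directional components
            (e.g. (h, v) for z_1 or (h, d, v) for z_2) at every pixel
    gamma : threshold, e.g. lambda_0 / rho for (38) or lambda_1 / eta for (39)
    """
    norm = np.sqrt(np.sum(v ** 2, axis=0, keepdims=True))   # per-pixel block norm
    scale = np.maximum(1.0 - gamma / np.maximum(norm, eps), 0.0)
    return scale * v
```

The \(\textbf{z}_{1}\)-update (38), for instance, then reads z1 = block_soft_threshold(D(x) - t + u1, lam0 / rho) in this notation.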

2.2 Poisson noise ADMM with TGV

Incorporating the log-likelihood functional (7) directly into the ADMM leads to problems in finding a straightforward solution for the \(\textbf{x}\)- and \(\textbf{t}\)-update. Thus, it is usually used as a penalizer functional g(\(\textbf{z}\)), for which these problems do not occur. For data corrupted with Poisson noise, the minimization reads as follows:

$$\begin{aligned} \underset{\textbf{x},\textbf{z}_{0},\textbf{z}_{1},\textbf{z}_{2}}{{{\,\mathrm{arg\,min}\,}}}\;\;\;&G_{\mathcal {P}}(\textbf{z}_{0}) + \lambda _{0}\Vert \textbf{z}_{1}\Vert _{F} + \lambda _{1}\Vert \textbf{z}_{2}\Vert _{F}&\nonumber \\ \text {subject to} \;&\textbf{z}_{0}=\varvec{\Omega } {\textbf {x}} \text {, } \textbf{z}_{1} =\textbf{Dx}-\textbf{t} \text {, } \textbf{z}_{2} = \textbf{Gt}&\!\!\!\! \text {,} \end{aligned}$$
(40)

with the corresponding augmented Lagrangian:

$$\begin{aligned}&\mathcal {L}_{\varphi ,\rho ,\eta }\left( \textbf{x},\textbf{t},\textbf{z}_{0},\textbf{z}_{1},\textbf{z}_{2},\textbf{u}_{0},\textbf{u}_{1},\textbf{u}_{2} \right) =&\quad G_{\mathcal {P}}(\textbf{z}_{0}) + {\varphi }/{2} \Vert \varvec{\Omega } {\textbf {x}}-\textbf{z}_{0} + \textbf{u}_{0}\Vert _{F}^{2} \quad \;\;\;\,\, - {\varphi }/{2} \,\Vert \,\textbf{u}_{0}\, \Vert _{F}^{2}\nonumber \\{} & {} + \lambda _{0}\Vert \textbf{z}_{1}\Vert _{F} + {\rho }/{2} \Vert \textbf{Dx}-\textbf{t}-\textbf{z}_{1}+\textbf{u}_{1}\Vert _{F}^{2} - {\rho }/{2} \,\Vert \,\textbf{u}_{1}\, \Vert _{F}^{2}\nonumber \\{} & {} + \lambda _{1}\Vert \textbf{z}_{2}\Vert _{F}+ {\eta }/{2} \Vert \textbf{Gt}-\textbf{z}_{2}+\textbf{u}_{2}\Vert _{F}^{2} \quad \;\;\;\,- {\eta }/{2} \,\Vert \,\textbf{u}_{2}\, \Vert _{F}^{2} \text {.} \end{aligned}$$
(41)

As initial values for the algorithm, the following are chosen: \(\textbf{x}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{t}_{i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{z}_{0}^{0}= \varvec{\xi }\), \(\textbf{z}_{1,i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{z}_{2,j}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{u}_{0}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{u}_{1,i}^{0}= \mathbf {\mathbb {0}}\) and \(\textbf{u}_{2,j}^{0}= \mathbf {\mathbb {0}}\). With this, the iteration steps then are given as:

$$\begin{aligned} \!\!\!\!\left\{ \textbf{x}^{k+1}\!\!,\textbf{t}^{k+1}\! \right\} =&\, \underset{\textbf{x},\textbf{t}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\varphi ,\rho ,\eta }\left( \textbf{x},\textbf{t},\textbf{z}_{0}^{k},\textbf{z}_{1}^{k},\textbf{z}_{2}^{k},\textbf{u}_{0}^{k},\textbf{u}_{1}^{k},\textbf{u}_{2}^{k} \right) \! \text {,}\nonumber \\ \!\!\textbf{z}_{0}^{k+1} \;\;\! \!\!=&\underset{\textbf{z}_{0}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\varphi } \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{0},\textbf{u}_{0}^{k} \right) \quad \quad \quad \;\;\;\;\;\; \text {,}\nonumber \\ \textbf{z}_{1,i}^{k+1} \;\;\! =&\underset{\textbf{z}_{1,i}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho } \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{1},\textbf{u}_{1}^{k} \right) \qquad \quad \;\;\;\;\;\;\text {,} \nonumber \\ \textbf{z}_{2,j}^{k+1} \;\;\! =&\ \underset{\textbf{z}_{2,j}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\eta } \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{2},\textbf{u}_{2}^{k} \right) \qquad \quad \;\;\;\;\;\;\, \text {,}\\ \textbf{u}_{0}^{k+1} \;\;\! =&\ \textbf{u}_{0}^{k}\;\;\, + \left( \varvec{\Omega } {\textbf {x}}^{k+1}-\textbf{z}_{0}^{k+1} \right) \!\!\qquad \qquad \quad \quad \;\;\;\;\;\;\;\;\; \text {,}\nonumber \\ \textbf{u}_{1,i}^{k+1} \;\;\! =&\;\,\textbf{u}_{1,i}^{k} \,+ \left( \left[ \textbf{Dx}^{k+1}\right] _{i}-\textbf{t}_{i}^{k+1}-\textbf{z}_{1,i}^{k+1} \right) \qquad \, \text {,}\;\nonumber \\ \textbf{u}_{2,j}^{k+1} \;\; \! =&\,\textbf{u}_{2,j}^{k}\,+ \left( \left[ \textbf{Gt}^{k+1}\right] _{j}-\textbf{z}_{2,j}^{k+1} \right) \qquad \quad \quad \quad \text {,}\nonumber \end{aligned}$$
(42)

The resulting Lagrangian can be solved in the same manner as for the Gaussian noise ADMM, as shown by Shirai and Okuda [42]. However, despite the slight changes in the \(\textbf{x}\)- and \(\textbf{t}\)-update, the only difference in the iteration pattern is the additional \(\textbf{z}_{0}\)- and \(\textbf{u}_{0}\)-update step, together with the introduction of a new penalty parameter \(\varphi \). The minimization with respect to \(\textbf{z}_{0}\) is given by the Poisson noise proximal operator [34]:

$$\begin{aligned} \textbf{z}_{0}^{k+1} = \underset{\mathcal {P}_{\varphi }}{{{\,\textrm{prox}\,}}} \left( \varvec{\Omega } {\textbf {x}}^{k+1}+\textbf{u}_{0}^{k} \right) \text { ,} \end{aligned}$$
(43)

with the operator being defined as follows [34]:

$$\begin{aligned} \underset{\mathcal {P}_{\varphi }}{{{\,\textrm{prox}\,}}}\left( \textbf{v}\right) =-\frac{1-\varphi \textbf{v}}{2\varphi } + \sqrt{\left( \frac{1-\varphi \textbf{v}}{2\varphi }\right) ^{2}+\frac{\varvec{\xi }}{\varphi }} \text {.} \end{aligned}$$
(44)

Again, this operator is applied pixel-wise. All other \(\textbf{z}\)-updates are carried out as in the Gaussian ADMM.
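The pixel-wise operator (44) translates directly into Python/NumPy; only the function and argument names are assumptions of this sketch:

```python
import numpy as np

def prox_poisson(v, xi, phi):
    """Pixel-wise proximal operator (44) of the Poisson log-likelihood,
    used for the z_0-update (43).

    v   : argument of the operator, here Omega x^{k+1} + u_0^k
    xi  : measured count image
    phi : penalty parameter of the z_0 constraint
    """
    a = (1.0 - phi * v) / (2.0 * phi)
    return -a + np.sqrt(a ** 2 + xi / phi)
```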

3 ADMM-TGV with residual balancing

When utilizing residual balancing for the ADMM-TGV algorithms, as described in Sect. 1.4, the parameters \(\rho \), \(\eta \), and \(\varphi \) are to be balanced. The overall structure can thus be divided into two parts for the Gaussian algorithm, one for the \(\rho \)-balancing and one for the \(\eta \)-balancing. For the Poisson algorithm, a third part is needed: the \(\varphi \)-balancing.

3.1 Primal feasibility conditions of ADMM-TGV

Since \(\rho \), \(\eta \), and \(\varphi \) are scalars acting on all pixels equally, we need to find global residuals that depend on all pixels. Following the pattern of Sect. 1.4, differentiating the augmented Lagrangian for Gaussian (see (34)) or Poisson noise (41) with respect to the unscaled dual variables \(\textbf{y}_{1,i}=\rho \textbf{u}_{1,i}\) and \(\textbf{y}_{2,j}=\eta \textbf{u}_{2,j}\) (and additionally \(\textbf{y}_{0}=\varphi \textbf{u}_{0}\) for Poisson) for all respective directions leads to the primal feasibility conditions with their respective relative residuals:

$$\begin{aligned} R^{k+1}_{rel,\rho }&= \frac{\Vert \textbf{R}^{k+1}_{\rho } \Vert _{F}}{N\left( \textbf{R}^{k+1}_{\rho }\right) }&= \frac{\Vert \textbf{Dx}^{k+1} - \textbf{t}^{k+1} - \textbf{z}_{1}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{Dx}^{k+1}-\textbf{t}^{k+1}\Vert _{F},\; \Vert \textbf{z}_{1}^{k+1}\Vert _{F}\right\} }\;\text {,} \end{aligned}$$
(45)
$$\begin{aligned} R^{k+1}_{rel,\eta }&= \frac{\Vert \textbf{R}^{k+1}_{\eta }\Vert _{F}}{N\left( \textbf{R}^{k+1}_{\eta }\right) }&= \frac{\Vert \textbf{G}\textbf{t}^{k+1}\, - \textbf{z}_{2}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{G}\textbf{t}^{k+1}\, \Vert _{F},\;\Vert \textbf{z}_{2}^{k+1}\Vert _{F}\right\} }\qquad \quad \quad \text {,} \end{aligned}$$
(46)

with the standard primal residual \(\Vert \textbf{R}_{\rho }^{k+1}\Vert _{F}\) in the numerator and the normalization for the scaled residual \(N\left( \textbf{R}^{k+1}_{\rho }\right) \) in the denominator. Here, \(\textbf{Dx}^{k+1}-\textbf{t}^{k+1} = \textbf{s}^{k+1}\) forms a single norm in the normalization of (45) instead of appearing as separate norms, as (23) would suggest, because both terms are coupled via (31).

For the Poisson TGV, a primal residual for the new penalty parameter \(\varphi \) can be found as follows:

$$\begin{aligned} R^{k+1}_{rel,\varphi } = \frac{\Vert \textbf{R}^{k+1}_{\varphi } \Vert _{F}}{N\left( \textbf{R}^{k+1}_{\varphi }\right) } = \frac{\Vert \varvec{\Omega } {\textbf {x}}^{k+1} - \textbf{z}_{0}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \varvec{\Omega } {\textbf {x}}^{k+1}\Vert _{F}\;,\; \Vert \textbf{z}_{0}^{k+1}\Vert _{F}\right\} } \text { .} \end{aligned}$$
(47)
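For illustration, the relative primal residuals (45)–(47) can be computed from the stacked iterates as follows (Python/NumPy sketch; the function names and the small constant guarding against division by zero are not part of the derivation):

```python
import numpy as np

def rel_primal_residuals(Dx, t, z1, Gt, z2, eps=1e-12):
    """Relative primal residuals (45) and (46) for the rho- and eta-balancing."""
    R_rho = np.linalg.norm(Dx - t - z1)
    N_rho = max(np.linalg.norm(Dx - t), np.linalg.norm(z1), eps)
    R_eta = np.linalg.norm(Gt - z2)
    N_eta = max(np.linalg.norm(Gt), np.linalg.norm(z2), eps)
    return R_rho / N_rho, R_eta / N_eta

def rel_primal_residual_phi(Omega_x, z0, eps=1e-12):
    """Relative primal residual (47) for the varphi-balancing of the Poisson case."""
    return np.linalg.norm(Omega_x - z0) / max(np.linalg.norm(Omega_x),
                                              np.linalg.norm(z0), eps)
```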

3.2 Dual feasibility conditions of ADMM-TGV

Since the updates happen jointly in (35) and (42), \(\textbf{x}^{k}\) and all \(\textbf{t}_{i}^{k}\) are not yet updated, regardless of which variable we consider. To find the whole set of dual residuals, the augmented Lagrangian has to be differentiated with respect to \(\textbf{x}\) and to all \(\textbf{t}_{i}\). This leads to the equations:

$$\begin{aligned} \textbf{x}^{k+1} =&\underset{\textbf{x}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho ,\eta }\left( \textbf{x},\textbf{t}^{k},\textbf{z}_{1}^{k},\textbf{z}_{2}^{k},\textbf{u}_{1}^{k},\textbf{u}_{2}^{k} \right) \qquad \text{, }\end{aligned}$$
(48)
$$\begin{aligned} \textbf{t}_{i}^{k+1} =&\underset{\textbf{t}_{i}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho ,\eta }\left( \textbf{x}^{k},\textbf{t}_{i},\textbf{t}_{l}^{k},\textbf{z}_{1}^{k},\textbf{z}_{2}^{k},\textbf{u}_{1}^{k},\textbf{u}_{2}^{k} \right) \text{, } \end{aligned}$$
(49)

with \(\textbf{t}_{l}\) denoting all other vector elements of \(\textbf{t}\) excluding the i-th element (for images \(l \in \left\{ h,v \, | \, l \ne i\right\} \)) as both \(\textbf{x}\) and \(\textbf{t}_{i}\) rely on the former iterates of the respective other variables.

Differentiating the augmented Lagrangian with respect to \(\textbf{x}\) leads to the first dual feasibility condition:

$$\begin{aligned} 0 &=&\partial _{x} f(\textbf{x})+\rho ^{k} \textbf{D}^{T}\left( \textbf{Dx}^{k+1}-\textbf{t}^{k}-\textbf{z}_{1}^{k}+\textbf{u}_{1}^{k}\right) \nonumber\\&=&\partial _{\textbf{x}} f(\textbf{x})+\rho ^{k} \textbf{D}^{T}\left( \textbf{u}_{1}^{k+1} + \textbf{t}^{k+1} + \textbf{z}_{1}^{k+1} -\textbf{t}^{k}-\textbf{z}_{1}^{k} \right) \nonumber\\ &\Leftrightarrow& - \rho ^{k} \textbf{D}^{T} \left( \textbf{t}^{k+1} + \textbf{z}_{1}^{k+1} - \textbf{t}^{k} - \textbf{z}_{1}^{k}\right) = {1}/{\sigma ^{2}}\,\varvec{\Omega}^{T}\left( \varvec{\Omega} \textbf{x}-\varvec{\xi}\right) + \rho^{k+1} \textbf{D}^{T} \textbf{u}_{1}^{k+1}, \end{aligned}$$
(50)

which can be seen as the dual residual, with \(f(\textbf{x})= {1}/{2\sigma ^{2}}\Vert \varvec{\Omega } {\textbf {x}}-\varvec{\xi }\Vert _{F}^{2}\) (see (34)) for the Gaussian algorithm. The corresponding relative dual residual is given as follows:

$$\begin{aligned} S^{k+1}_{rel,\rho ,x} = \frac{\Vert \textbf{S}^{k+1}_{\rho ,x}\Vert _{F}}{N\left( \textbf{S}^{k+1}_{\rho ,x}\right) } = \frac{\Vert -\rho ^{k} \left[ \textbf{D}^{T} \left( \textbf{t}^{k+1}-\textbf{t}^{k} + \textbf{z}_{1}^{k+1}-\textbf{z}_{1}^{k}\right) \right] \Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert {1}/{\sigma ^{2}}\,\varvec{\Omega }^{T}\left( \varvec{\Omega } {\textbf {x}}-\varvec{\xi }\right) \Vert _{F}\;,\;\Vert \rho ^{k} \textbf{D}^{T}\left( \textbf{u}_{1}^{k}+\textbf{R}_{1}^{k+1}\right) \Vert _{F}\right\} } \text { ,} \end{aligned}$$
(51)

with the standard dual residual \(\Vert \textbf{S}_{\rho ,x}^{k+1}\Vert \) in the numerator and the normalization for the scaled residual \(N\left( \textbf{S}^{k+1}_{\rho ,x}\right) \) in the denominator.

For the Poisson algorithm, however, the dual feasibility condition equates to the following:

$$\begin{aligned} -\varphi ^{k} \varvec{\Omega }^{T}\!\! \left( \textbf{z}_{0}^{k+1}-\textbf{z}_{0}^{k} \right) - \rho ^{k} \textbf{D}^{T}\!\!\left( \textbf{t}^{k+1} \!\!+\! \textbf{z}_{1}^{k+1} \!\! - \! \textbf{t}^{k} \!\! -\! \textbf{z}_{1}^{k}\right) \! = \varphi ^{k+1} \varvec{\Omega }^{T}\textbf{u}_{0}^{k+1} + \rho ^{k+1} \textbf{D}^{T}\textbf{u}_{1}^{k+1} \text {,} \end{aligned}$$
(52)

with \(f(\textbf{x})=0\) (see (41)). Since all residuals must vanish eventually, we can rearrange the dual residual into one residual for the balancing of \(\rho \) and one for \(\varphi \):

$$\begin{aligned}&S_{rel,\rho ,x}^{k+1}&= \frac{\Vert \textbf{S}_{\rho ,x}^{k+1}\Vert _{F}}{N\left( \textbf{S}_{\rho ,x}^{k+1}\right) } = \frac{\Vert \textbf{S}_{\rho ,x}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{N}^{*} \left( \textbf{S}_{\rho ,x}^{k+1}\right) \Vert _{F} ,\; \Vert \textbf{N}^{*}\left( \textbf{S}_{\varphi ,x}^{k+1}\right) -\textbf{S}_{\varphi ,x}^{k+1} \Vert _{F}\right\} } \text {,} \end{aligned}$$
(53)
$$\begin{aligned}&S_{rel,\varphi ,x}^{k+1}&=\frac{\Vert \textbf{S}^{k+1}_{\varphi ,x}\Vert _{F}}{N\left( \textbf{S}_{\varphi ,x}^{k+1}\right) } = \frac{\Vert \textbf{S}_{\varphi ,x}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{N}^{*}\left( \textbf{S}_{\varphi ,x}^{k+1}\right) \Vert _{F} ,\; \Vert \textbf{N}^{*} \left( \textbf{S}_{\rho ,x}^{k+1}\right) - \textbf{S}_{\rho ,x}^{k+1} \Vert _{F}\right\} } \text {,} \end{aligned}$$
(54)

with:

$$\begin{aligned}&\textbf{S}_{\rho ,x}^{k+1}&= -\;\rho ^{k} \left[ \textbf{D}^{T}\! \left( \textbf{t}^{k+1}-\textbf{t}^{k} + \textbf{z}_{1}^{k+1}-\textbf{z}_{1}^{k}\right) \right] \qquad \qquad \qquad \qquad \qquad \qquad \;\text {,} \nonumber \\&\!\!\!\!\!\!\!\! \textbf{N}^{*} \left( \textbf{S}_{\rho ,x}^{k+1}\right)&= \;\;\;\, \rho ^{k} \textbf{D}^{T}\!\left( \textbf{u}_{1}^{k}+\textbf{R}_{1}^{k+1}\right) \quad \;\;\; \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \;\;\;\;\;\text {,} \nonumber \\&\textbf{S}_{\varphi ,x}^{k+1}&= -\varphi ^{k} \varvec{\Omega }^{T}\!\! \left( \textbf{z}_{0}^{k+1}-\textbf{z}_{0}^{k} \right) \quad \;\;\; \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \;\;\text {,} \nonumber \\&\!\!\!\!\!\!\!\!\! \textbf{N}^{*}\left( \textbf{S}_{\varphi ,x}^{k+1}\right)&=\;\;\,\varphi ^{k}\varvec{\Omega }^{T}\!\left( \textbf{u}_{0}^{k}+\textbf{R}_{0}^{k+1}\right) \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \qquad \quad \;\;\;\;\;\, \text {.}\nonumber \end{aligned}$$

For both the Poisson and the Gaussian case, (51) and (53)–(54) show that a change in \(\textbf{z}_{1,i}\) between iterations can be induced by a change in \(\textbf{t}\), but then does not change the residual for the \(\textbf{x}\)-update.

Differentiating the augmented Lagrangian ((34) or (41)) with respect to the n-dimensional variables \(\textbf{t}_{i}\) leads to the following:

$$\begin{aligned}&0 =&- \rho ^{k} \left( \textbf{D}_{i}\textbf{x}^{k}-\textbf{t}_{i}^{k+1} -\textbf{z}_{1,i}^{k}+\textbf{u}_{1,i}^{k}\right) \nonumber \\{} & {} + \eta ^{k} \left( \begin{array}{ll} \left[ \textbf{G}^{T}\right] _{i}\nonumber \\ \left[ \textbf{G}^{T}\right] _{l} \end{array}\right) \left[ \left( \begin{array}{ll} \left[ \textbf{G}^{T}\right] _{i}^{T}&\left[ \textbf{G}^{T}\right] _{l}^{T}\end{array}\right) \left( \begin{array}{ll} \textbf{t}_{i}^{k+1}\\ \textbf{t}_{l}^{k} \end{array}\right) \right] + \eta ^{k} \textbf{D}_{i}\left( \textbf{u}_{2,i}^{k}- \textbf{z}_{2,i}^{k} \right) \nonumber \\{} & {} + \eta ^{k} \sum _{l} \textbf{D}_{l}\left( \textbf{u}_{2,il}^{k}- \textbf{z}_{2,il}^{k} \right) \text {,} \end{aligned}$$
(55)

with i being the direction of the derivative (or vector row) and (il) denoting all the diagonal vector elements for i, here for images \((il) \in \left\{ hv, vh \right\} =\left\{ d \right\} \). Solving for the relative dual residuals connected to both \(\rho \) and \(\eta \) leads to the following:

$$\begin{aligned}&S_{rel,\rho ,t_{i}}^{k+1}&= \frac{\Vert \textbf{S}^{k+1}_{\rho ,t_{i}}\Vert _{F}}{N\left( \textbf{S}^{k+1}_{\rho ,t_{i}}\right) } = \frac{\Vert \textbf{S}_{\rho ,t_{i}}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{N}^{*} \left( \textbf{S}_{\rho ,t_{i}}^{k+1}\right) \Vert _{F} ,\; \Vert \textbf{N}^{*}\left( \textbf{S}_{\eta ,t_{i}}^{k+1}\right) - \textbf{S}_{\eta ,t_{i}}^{k+1} \Vert _{F}\right\} } \text {,}\end{aligned}$$
(56)
$$\begin{aligned}&S_{rel,\eta ,t_{i}}^{k+1}&=\frac{\Vert \textbf{S}^{k+1}_{\eta ,t_{i}}\Vert _{F}}{N\left( \textbf{S}^{k+1}_{\eta ,t_{i}}\right) } = \frac{\Vert \textbf{S}_{\eta ,t_{i}}^{k+1}\Vert _{F}}{{{\,\textrm{max}\,}}\left\{ \Vert \textbf{N}^{*}\left( \textbf{S}_{\eta ,t_{i}}^{k+1}\right) \Vert _{F} ,\; \Vert \textbf{N}^{*} \left( \textbf{S}_{\rho ,t_{i}}^{k+1}\right) - \textbf{S}_{\rho ,t_{i}}^{k+1}\Vert _{F}\right\} } \text {,} \end{aligned}$$
(57)

with:

$$\begin{aligned}&\textbf{S}_{\rho ,t_{i}}^{k+1}&=\quad \rho ^{k} \left[ \textbf{D}_{i}\left( \textbf{x}^{k}-\textbf{x}^{k+1}\right) + \textbf{z}_{1,i}^{k+1}-\textbf{z}_{1,i}^{k}\right] \qquad \qquad \qquad \qquad \qquad \qquad \quad \qquad \;\; \text {,} \nonumber \\&\!\!\!\!\!\!\!\! \textbf{N}^{*} \left( \textbf{S}_{\rho ,t_{i}}^{k+1}\right)&=\quad \rho ^{k} \left[ \textbf{u}_{1,i}^{k}+\textbf{R}_{1,i}^{k+1}\right] \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \;\;\;\text {,} \nonumber \\&\textbf{S}_{\eta ,t_{i}}^{k+1}&= - \eta ^{k}\left[ \textbf{D}_{i}\left( \textbf{z}_{2,i}^{k+1}-\textbf{z}_{2,i}^{k}\right) + \sum _{l} \textbf{D}_{l}\left( \textbf{D}_{i}^{T}\left( \textbf{t}_{l}^{k} - \textbf{t}_{l}^{k+1}\right) + \textbf{z}_{2,il}^{k+1}-\textbf{z}_{2,il}^{k}\right) \right] \text {,} \nonumber \\&\!\!\!\!\!\!\!\!\!\textbf{N}^{*}\left( \textbf{S}_{\eta ,t_{i}}^{k+1}\right)&=\quad \eta ^{k} \left[ \textbf{D}_{i} \left( \textbf{u}_{2,i}^{k} + \textbf{R}_{2,i}^{k+1}\right) + \sum _{l} \textbf{D}_{l} \left( \textbf{u}_{2,il}^{k} + \textbf{R}_{2,il}^{k+1}\right) \right] \qquad \qquad \qquad \qquad \;\;\, \text {.}\nonumber \end{aligned}$$

These equations show that a change in \(\textbf{z}_{1,i}\) between iterations might be induced by a change in \(\textbf{x}\), which then also does not appear in the residual for the \(\textbf{t}_{i}\) filter mask update. The same holds for a change in the diagonal \(\textbf{z}_{2,il}\) elements, which can occur due to a change in the other \(\textbf{t}_{l}\) filter masks.

Since \(\textbf{x}\) and \(\textbf{t}_{i}\) can be represented as a vector \(\begin{pmatrix} \textbf{x}\\ \textbf{t} \end{pmatrix}\), the same holds for the dual residuals. Thus, applying the Frobenius norm to form a global relative residual for the \(\rho \), \(\eta \), and \(\varphi \) balancing seems logical:

$$\begin{aligned}&S_{rel,\rho }^{k+1}&= \sqrt{\frac{\Vert \textbf{S}_{\rho ,x}^{k+1}\Vert _{F}^{2} + \sum _{i} \Vert \textbf{S}_{\rho ,t_{i}}^{k+1}\Vert _{F}^{2}}{N\left( \textbf{S}_{\rho ,x}^{k+1}\right) ^{2} + \sum _{i} N\left( \textbf{S}_{\rho ,t_{i}}^{k+1}\right) ^{2}}}\text {,}\end{aligned}$$
(58)
$$\begin{aligned}&S_{rel,\eta }^{k+1}&= \sqrt{\frac{\sum _{i}\Vert \textbf{S}_{\eta ,t_{i}}^{k+1}\Vert _{F}^{2}}{\sum _{i} N \left( \textbf{S}_{\eta ,t_{i}}^{k+1}\right) ^{2}}}\qquad \qquad \qquad \;\;\;\; \text {,}\end{aligned}$$
(59)
$$\begin{aligned}&S_{rel,\varphi }^{k+1}&= \frac{\Vert \textbf{S}_{\varphi ,x}^{k+1}\Vert _{F}}{N\left( \textbf{S}_{\varphi ,x}^{k+1}\right) }\qquad \qquad \qquad \qquad \qquad \;\;\; \text {.} \end{aligned}$$
(60)

For the Poisson ADMM-TGV, the dual residuals arising from the differentiation with respect to \(\textbf{t}\) remain identical to those of the Gaussian case.

These primal and dual residuals are used for balancing \(\rho \) and \(\eta \) in the following according to (28). The residual balanced algorithms are hereafter called ADMM-RBTGV for Gauss or Poisson noise.
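As a sketch of how the per-variable contributions are merged, the global relative residuals (58)–(60) only require the individual Frobenius norms and their normalizations; the helper below is an illustration of this combination, not part of the algorithm definition:

```python
import numpy as np

def global_rel_residual(S_norms, N_norms, eps=1e-12):
    """Combine per-variable residual norms into a global relative residual,
    cf. (58): sqrt( sum_i ||S_i||_F^2 / sum_i N_i^2 )."""
    num = np.sqrt(sum(s ** 2 for s in S_norms))
    den = np.sqrt(sum(n ** 2 for n in N_norms))
    return num / max(den, eps)
```

For the \(\rho \)-balancing of the Gaussian algorithm, for example, S_norms would contain \(\Vert \textbf{S}_{\rho ,x}\Vert _{F}\) and \(\Vert \textbf{S}_{\rho ,t_{i}}\Vert _{F}\) for all directions i, and N_norms the corresponding normalizations.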

3.3 ADMM for Gaussian and Poisson noise with RBTGV

Implementing the residual balancing for Gaussian noise, with the starting conditions \(\textbf{x}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{t}_{i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{z}_{1,i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{z}_{2,j}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{u}_{1,i}^{0}= \mathbf {\mathbb {0}}\), \(\textbf{u}_{2,j}^{0}= \mathbf {\mathbb {0}}\) and finally \(\rho ^{0}=1\), \(\eta ^{0}=1\), leads to the following algorithm:

$$\begin{aligned} \!\left\{ \textbf{x}^{k+1},\textbf{t}^{k+1} \right\} =&\underset{\textbf{x, t}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho ^{k},\eta ^{k}}\left( \textbf{x},\textbf{t},\textbf{z}_{1}^{k},\textbf{z}_{2}^{k},\textbf{u}_{1}^{k},\textbf{u}_{2}^{k} \right) \qquad \,\text {,}\quad (\text {after~} [42])\quad \;\,\nonumber \\ \textbf{z}_{1,i}^{k+1} \;\; =&\,\underset{\textbf{z}_{1,i}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho ^{k}} \left( \textbf{x}^{k+1},\textbf{t}_{i}^{k+1},\textbf{z}_{1,i},\textbf{u}_{1,i}^{k} \right) \, \text {,}\quad {(\text {after}\, (38))}\nonumber \\ \textbf{z}_{2,j}^{k+1} \;\; =&\underset{\textbf{z}_{2,j}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\eta ^{k}} \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{2,j},\textbf{u}_{2,j}^{k} \right) \,\text {,}\quad {(\text {after}\, (39))} \nonumber \\ \text {if}\quad k \in \text {Val} \nonumber \\&\tau _{\rho }^{k+1}\leftarrow \tau _{\rho }^{k} \text {,}\quad \tau _{\eta }^{k+1} \leftarrow \tau _{\eta }^{k}\qquad \qquad \;\;\;\;\text {,}\quad \text {(after}\, (28))\\&\rho ^{k+1} \leftarrow \rho ^{k} \text {,}\quad \eta ^{k+1} \leftarrow \eta ^{k}\qquad \qquad \;\;\;\; \text {,}\quad {(\text {after}\, (29))}\nonumber \\ \text {else}\qquad \quad \;\;\nonumber \\&\tau _{\rho }^{k+1} = 1 \text {,}\quad \tau _{\eta }^{k+1} =1 \qquad \qquad \qquad \;\;\;\;\;\text {,}\quad \qquad \qquad \qquad \nonumber \\ \text {end}\qquad \quad \;\;\nonumber \\ \textbf{u}_{1,i}^{k+1} \;\; =&\frac{\textbf{u}_{1,i}^{k}+ \left( \left[ \textbf{Dx}^{k+1}\right] _{i}-\textbf{t}_{i}^{k+1}-\textbf{z}_{1,i}^{k+1} \right) }{\tau _{\rho }^{k+1}}\qquad \qquad \qquad \; \text {,}\qquad \qquad \qquad \quad \nonumber \\ \textbf{u}_{2,j}^{k+1} \;\; =&\frac{\textbf{u}_{2,j}^{k}+ \left( \left[ \textbf{Gt}^{k+1}\right] _{j}-\textbf{z}_{2,j}^{k+1} \right) }{\tau _{\eta }^{k+1}} \qquad \qquad \qquad \quad \;\;\,\quad \text {,}\qquad \qquad \qquad \quad \nonumber \end{aligned}$$
(61)

with Val being a series of exponentially growing values, such that the residual balancing occurs less and less often. The initial values are the same as in the algorithm for the standard Gaussian ADMM with TGV (35).

The residual-balanced algorithm for Poisson noise, with the starting conditions of the Gaussian algorithm plus \(\textbf{z}_{0}^{0}= \varvec{\xi }\) and \(\textbf{u}_{0}^{0}= \mathbf {\mathbb {0}}\), is given as follows:

$$\begin{aligned} \!\!\!\!\left\{ \textbf{x}^{k+1}\!\!,\textbf{t}^{k+1}\! \right\} =&\underset{\textbf{x},\textbf{t}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\varphi ^{k},\rho ^{k},\eta ^{k}}\left( \textbf{x},\textbf{t},\textbf{z}_{0}^{k},\textbf{z}_{1}^{k},\textbf{z}_{2}^{k},\textbf{u}_{0}^{k},\textbf{u}_{1}^{k},\textbf{u}_{2}^{k} \right) \! \text {,}\quad {(\text {after~} [42])}\quad \quad \nonumber \\ \!\!\textbf{z}_{0}^{k+1} \;\;\! =&\;\, \underset{\textbf{z}_{0}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\varphi ^{k}} \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{0},\textbf{u}_{0}^{k} \right) \qquad \quad \quad \; \text {,}\quad {(\text {after}\, (47))}\quad \nonumber \\ \textbf{z}_{1,i}^{k+1} \;\;\! =&\underset{\textbf{z}_{1,i}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\rho ^{k}} \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{1},\textbf{u}_{1}^{k} \right) \qquad \quad \quad \text {,}\quad \text {(after}\, (38)) \nonumber \\ \textbf{z}_{2,j}^{k+1} \;\;\! =&\, \underset{\textbf{z}_{2,j}}{{{\,\mathrm{arg\,min}\,}}}\; \mathcal {L}_{\eta ^{k}} \left( \textbf{x}^{k+1},\textbf{t}^{k+1},\textbf{z}_{2},\textbf{u}_{2}^{k} \right) \qquad \quad \;\;\; \text {,}\quad \text {(after}\, (39))\nonumber \\ \text {if}\quad k \in \text {Val} \nonumber \\&\tau _{\rho }^{k+1}\leftarrow \tau _{\rho }^{k} \text {,}\quad \tau _{\eta }^{k+1} \leftarrow \tau _{\eta }^{k} \text {,}\quad \tau _{\varphi }^{k+1} \leftarrow \tau _{\varphi }^{k}\quad \text {,}\quad \text {(after}\, (28))\\&\rho ^{k+1} \leftarrow \rho ^{k} \text {,}\quad \eta ^{k+1} \leftarrow \eta ^{k}\text {,}\quad \varphi ^{k+1} \leftarrow \varphi ^{k} \quad \text {,}\quad \text {(after}\, (29))\nonumber \\ \text {else}\qquad \quad \;\;\nonumber \\&\tau _{\rho }^{k+1} = 1 \text {,}\quad \tau _{\eta }^{k+1} =1 \text {,}\quad \tau _{\varphi }^{k+1} =1 \;\;\text {,}\quad \qquad \nonumber \\ \text {end}\qquad \quad \;\;\nonumber \\ \textbf{u}_{0}^{k+1} \;\;\! =&\,\frac{\textbf{u}_{0}^{k}\;\;\, + \left( \varvec{\Omega } {\textbf {x}}^{k+1}-\textbf{z}_{0}^{k+1} \right) }{\tau _{\varphi }^{k+1}} \!\!\qquad \qquad \qquad \qquad \;\;\;\;\;\, \text {,}\qquad \quad \quad \nonumber \\ \textbf{u}_{1,i}^{k+1} \;\;\! =&\;\,\frac{\textbf{u}_{1,i}^{k} \,+ \left( \left[ \textbf{Dx}^{k+1}\right] _{i}-\textbf{t}_{i}^{k+1}-\textbf{z}_{1,i}^{k+1} \right) }{\tau _{\rho }^{k+1}} \qquad \quad \;\; \text {,}\qquad \quad \quad \;\;\nonumber \\ \textbf{u}_{2,j}^{k+1} \;\; \! =&\,\frac{\textbf{u}_{2,j}^{k}\,+ \left( \left[ \textbf{Gt}^{k+1}\right] _{j}-\textbf{z}_{2,j}^{k+1} \right) }{\tau _{\eta }^{k+1}}\qquad \quad \quad \quad \;\;\; \text {.}\qquad \quad \quad \;\nonumber \end{aligned}$$
(62)

4 Evaluation of the algorithms

In this section, the algorithms are tested on images for denoising and deconvolution problems to demonstrate the success of residual balancing and to show its impact on the convergence of the respective algorithm.

An image is superimposed with noise so that the restoration can be compared directly with the original image and the best-performing parameter sets can be identified. For TGV, four parameters have to be chosen well to achieve optimality: \(\rho \) and \(\eta \), \(\lambda _{0}\) and \(\lambda _{1}\); for Poisson noise, this list is extended by an additional penalty parameter \(\varphi \).

To evaluate the algorithms, a quality measure is defined. For the quality of the restoration, the mean squared error (MSE) after iteration k,

$$\begin{aligned} {{\,\textrm{MSE}\,}}^{k} = \frac{1}{M \cdot N} {\displaystyle \sum _{m,n}^{M,N}} \left( \textbf{x}_{m,n}^{k} - \varvec{\xi }_{m,n}^{Orig.}\right) ^{2} \text { ,} \end{aligned}$$
(63)

is a commonly used criterion, with the original image \(\varvec{\xi }^{Orig.}\) known and the current iterate image \(\textbf{x}^{k}\). In order to make optimal regions for the parameter choices visible, initially, \(\rho \!=\! \eta (=\varphi \) for Poisson) and \(\lambda _{0}\!=\!\lambda _{1}\) are chosen over orders of magnitude from \(\text {10}^{-\text {5}} \le \rho \le \text {10}^\text {5}\) and \( {\text {10}^{-\text {5}}}/{\omega \sigma } \le \lambda _{0} \le {\text {10}^\text {5}}/{\omega \sigma }\), where \(\sigma \) is the standard deviation of the inherent noise in the image and \(\omega \) is an attenuation coefficient depending on the deconvolution kernel.
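In code, the MSE of (63) and the logarithmic parameter grids of the sweep described above might look as follows (a minimal numpy sketch; the number of grid points and the example values for \(\sigma \) and \(\omega \) are illustrative):

```python
import numpy as np

def mse(x_k, x_orig):
    """Mean squared error between the current iterate and the original image, Eq. (63)."""
    return np.mean((x_k - x_orig) ** 2)

# Logarithmic grids for the sweep: rho = eta (= phi for Poisson) and lambda0 = lambda1.
sigma, omega = 50.0, 1.0                          # e.g. the Gaussian denoising test case below
rhos = np.logspace(-5, 5, 11)                     # 10^-5 ... 10^5
lams = np.logspace(-5, 5, 11) / (omega * sigma)   # 10^-5/(omega*sigma) ... 10^5/(omega*sigma)
```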

Looking at the total generalized variation ADMM approach for Gaussian noise (33) and setting \(a\cdot \lambda =\lambda _{0}\), \(b\cdot \lambda =\lambda _{1}\), with \(\varvec{\Omega }=\mathbb {1}\),

$$\begin{aligned} \underset{\textbf{x},\textbf{z}_{1},\textbf{z}_{2}}{{{\,\mathrm{arg\,min}\,}}} \quad {1}/{2\sigma ^{2}}\,\Vert \textbf{x}- \varvec{\xi } \Vert _{F}^{2} + \lambda \left( a\cdot \Vert \textbf{z}_{1}\Vert _{F} + b \cdot \Vert \textbf{z}_{2}\Vert _{F}\right) \quad \nonumber \\ \text {subject to} \,\qquad \quad \textbf{z}_{1} =\textbf{Dx}-\textbf{t} \text {, } \textbf{z}_{2} = \textbf{Gt} \qquad \quad \qquad \;\;\,\text {,} \end{aligned}$$
(64)

the following can be seen: If the true, noise-free image \(\textbf{x}^{*}\) could be found, the first term of the minimization would equal:

$$\begin{aligned} \frac{1}{2\sigma ^2}\,\Vert \textbf{x}^{*}- \varvec{\xi } \Vert _{F}^{2}= \sum _{n,m}^{N,M} \frac{1}{2} \quad \Longleftrightarrow \quad \Vert \textbf{x}^{*}- \varvec{\xi } \Vert _{F}^{2}= \sum _{n,m}^{N,M}\sigma ^{2}\text { ,} \end{aligned}$$
(65)

by definition of the standard deviation \(\sigma \), as \(\sigma \) characterizes the mean noise level of all contributing pixels. In a perfect setting with perfect noise detection, the TGV prior would be that exact difference:

$$\begin{aligned} \lambda \cdot \left( a\cdot \Vert \textbf{z}_{1}\Vert _{F} + b \cdot \Vert \textbf{z}_{2}\Vert _{F}\right) = \lambda \cdot \Vert \textbf{x}^{*}- \varvec{\xi } \Vert _{F} = \left( \sum _{n,m}^{N,M} \lambda ^{2}\cdot \sigma ^{2}\right) ^{{1}/{2}} . \end{aligned}$$
(66)

Since the noise estimation is assumed to be perfect in this example, the absolute noise level is irrelevant both for the noise detection and for the image. Substituting (65) and (66) into the minimization (64) shows that the minimum depends only on the total number of pixels if \(\lambda ={1}/{\sigma}\) is chosen as the optimal regularization parameter:

$$\begin{aligned} \underset{\textbf{x},\textbf{z}_{1},\textbf{z}_{2}}{{{\,\mathrm{arg\,min}\,}}} \quad \sum _{n,m}^{N,M} \frac{1}{2} + \left( \sum _{n,m}^{N,M} \lambda ^{2}\cdot \sigma ^{2}\right) ^{ {1}/{2}} . \end{aligned}$$
(67)

However, since the image can only be separated into piece-wise flat or sloped regions, connected to \(\Vert \textbf{z}_{1}\Vert _{F}\) and \(\Vert \textbf{z}_{2}\Vert _{F}\), respectively, a and b must sum to 1 in order to cover the whole image.
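A quick numerical sanity check of (65), assuming pixel-wise independent Gaussian noise:

```python
import numpy as np

# For Gaussian noise with standard deviation sigma, the squared Frobenius norm of
# the noise (the difference xi - x* for a perfect restoration) approaches N*M*sigma^2.
rng = np.random.default_rng(0)
N, M, sigma = 256, 256, 50.0
noise = rng.normal(0.0, sigma, size=(N, M))
print(np.sum(noise**2) / (N * M * sigma**2))   # close to 1
```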

However, as noise cannot be predicted perfectly, the optimal \(\lambda \) deviates according to the quality and applicability of the prior. In the case of a bad prior, distortions are induced that may at some point outweigh the benefits to the restoration process; consequently, the optimal regularization parameter is reduced compared to what it would be with a good prior. Since most images comprise piece-wise flat or sloped regions, it is nevertheless reasonable to assume that TGV comes close to the optimal regularization parameter of \(\lambda = {1}/{\sigma }\). If the standard deviation of the noise \(\sigma \) is sufficiently well known, optimality mainly depends on the choice of a and b weighting \(\Vert \textbf{z}_{1}\Vert _{F}\) and \(\Vert \textbf{z}_{2}\Vert _{F}\), respectively.

The same argumentation also holds for Poisson noise, as it approaches Gaussian noise with an increasing number of counts. However, for deconvolution, \(\textbf{z}_{1}\) and \(\textbf{z}_{2}\) are applied to the deconvolved estimate of the image \(\textbf{x}\) and not to the convolved estimate \(\varvec{\Omega }\textbf{x}\) (see (35) and (42)). The noise is thus attenuated by a factor \(\omega \), which depends on the kernel the image is convolved with: the wider the kernel \(\varvec{\Omega }\) for deconvolution, the more noise gets drawn into a single pixel. By normalizing the kernel to its maximum peak, the contributions of the surrounding pixels' noise to the attenuated noise can be found. Since adding independent noise contributions corresponds to adding their variances, this leads to the following:

$$\begin{aligned} \omega ^2 = \sum _{i,j}^{N,M}\frac{\varvec{\Omega }_{i,j}}{\max \left\{ \ \varvec{\Omega }_{i,j}\right\} } \text { .} \end{aligned}$$
(68)

Hence, the overall scaling needs to be \( {1}/{\omega \sigma }\). For pure denoising, the attenuation is \(\omega ^{2}=1\), since \(\varvec{\Omega }\) corresponds to a delta peak.
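A direct implementation of (68) is straightforward; in the sketch below, the function name is illustrative, and a delta-peak kernel reproduces the denoising case \(\omega = 1\):

```python
import numpy as np

def attenuation(kernel):
    """Attenuation factor omega from Eq. (68): normalize the kernel to its maximum
    and sum the contributions of all pixels (independent variances add)."""
    k = np.asarray(kernel, dtype=float)
    return np.sqrt(np.sum(k / k.max()))

delta = np.zeros((5, 5))
delta[2, 2] = 1.0
print(attenuation(delta))   # 1.0, i.e. pure denoising
```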

To make regions visible where the overall quality improves, the MSE of a given iteration k can be divided by the initial MSE\(^{0}\):

$$\begin{aligned} {{\,\textrm{NormMSE}\,}}^{k}=\frac{{{\,\textrm{MSE}\,}}^{k}}{{{\,\textrm{MSE}\,}}^{0}} \text { ,} \end{aligned}$$
(69)

showing the relative improvement of the image.
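In code, (69) amounts to a simple ratio of two MSE values; a minimal sketch:

```python
import numpy as np

def norm_mse(x_k, x_0, x_orig):
    """NormMSE, Eq. (69): MSE of the current iterate divided by the MSE of the
    initial noisy image x_0; values below 1 indicate an improvement."""
    return np.mean((x_k - x_orig) ** 2) / np.mean((x_0 - x_orig) ** 2)
```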

4.1 Denoising of images with Gaussian noise

Fig. 1

a Test image “onion” from MATLAB. The pixel intensities were multiplied by a factor of 10. b Image with additional Gaussian white noise with a standard deviation of \(\sigma \! = 50\) and the attenuation \(\omega =1\). c The result of the original ADMM-TGV (see (35)) and d the result of our ADMM-RBTGV algorithm (see (61)), after 1000 iterations and with unfavorable starting values for the penalty parameters \(\rho =\eta =1\) and optimal regularization parameters \(\lambda _{0}=\lambda _{1}\approx {0.63}/{\omega \sigma }\), respectively

The first original image can be found in Fig. 1a. In Fig. 1b, Gaussian noise of \(\sigma = 50\) was added to the image to achieve a signal-to-noise ratio of \(\text {SNR} \approx 20\). To visualize the effect of residual balancing, a suboptimal choice of the penalty parameters \(\rho \) and \(\eta \) was made, while the regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\) were kept close to the optimum. In Fig. 1 c and d, the denoised images for the ADMM-TGV and ADMM-RBTGV algorithms after 1000 iterations are compared.

The difference between the two denoised images in Fig. 1 c and d is easy to spot: c looks blurry compared to d. The suboptimal choice of \(\rho \) and \(\eta \) diminishes the progress of the denoising. Residual balancing, however, equalizes the denoising result over all penalty parameters and makes it independent of the initial choice; the progress of the algorithm is thus further advanced, and the quality of the reconstruction is significantly improved.

To show that the penalty parameters \(\rho \) and \(\eta \) can in fact be chosen arbitrarily when residual balancing is used, Fig. 2 displays the course of one thousand iterations for a wide range of regularization parameters. The NormMSE (see (69)) is displayed such that parameter sets improving the image quality appear in blue and the rest in yellow. This depiction gives a quick insight into the relevant range of parameters from which an operator would preferably choose.

It can be seen that the original ADMM-TGV algorithm in the upper row of Fig. 2 strongly depends on the initial choice of \(\rho \) and \(\eta \), as the blue-colored optimal region slowly shifts to higher \(\rho \) and \(\eta \) in the course of the iterations. With a higher number of iterations, the NormMSE map of the original ADMM-TGV forms an inverted L-shaped optimal region that grows in the \(\rho \)-direction as the number of iterations increases. The optimality map can thus be divided into three regions: a first region at low values of \(\lambda \), bordered by the first red line, in which the algorithm does not improve the image quality significantly because the noise is estimated too low, leading to under-smoothing; a second region, inside the red boundaries, where the algorithm converges to a better-than-before solution around the optimum; and a third region in which the algorithm eventually decreases the image quality due to over-smoothing. In the outer left regions, an optimum is reached for some \(\rho \) at a given iterate but lost again as the algorithm converges to another result. It is, however, not predictable for a given image when this occurs, and it is thus recommended to stay in the second, optimal region. For every \(\rho \) in this optimal region, the TGV algorithm eventually converges to the same result for a given \(\lambda \), but it may need many iterations if \(\rho \) is chosen poorly.

In order to investigate convergence for all initial choices of \(\rho \!= \!\eta \), the residual balancing is applied in the second row of Fig. 2. Here, the results at a given \(\lambda _{0}=\lambda _{1}\) column are constant for all starting values \(\rho = \eta \) as early as after ten iterations, indicating that the residual balanced approach does not depend heavily on the initial choice of \(\rho \) and \(\eta \).

To elaborate further whether the algorithm converges faster than the original, the drop rates of the R and S residuals connected to \(\rho \) and \(\eta \) are evaluated in Fig. 3 a and b, respectively.

Fig. 2

NormMSE maps generated from applying both ADMM-TGV and ADMM-RBTGV algorithms on Fig. 1, with the individual pixels showing the denoising results under Gaussian noise of the respective \(\left( \rho =\eta \,|\, \lambda _{0}=\lambda _{1}\right) \) pair. In the upper row, the results of the original ADMM-TGV are shown after a 10, b 100, and c 1000 iterations. In the lower row (d–f), the results of the residual balanced ADMM-RBTGV approach are displayed at the same iteration stages. The two red lines in every map indicate the region in which the algorithm eventually improves the image quality compared to the initial noisy image, and the dashed line indicates the \(\lambda _{0}\!=\!\lambda _{1}\) value in units of \( {1}/{\omega \sigma }\) at which results are optimal (\(\lambda _{opt}\!\approx {\text {0.63}}/{\omega \sigma }\))

Fig. 3

a Trend of the R and S residuals connected to both \(\rho \) and \(\eta \) belonging to the standard ADMM-TGV algorithm for Gaussian noise (Fig. 2d) and the relative residuals \(\text {R}_\text {rel}\) and \(\text {S}_\text {rel}\) described in Sect. 3.1 and Sect. 3.2. Unfavorable starting values of \(\rho \!=\!\eta \!=\!1\) and optimal values for \(\lambda _{0} \!=\lambda _{1} \!=\!\lambda _{opt}\! \approx 0.63 \,{1}/{\omega \sigma }\) were chosen to compare with b the residuals of the residual balanced ADMM-RBTGV denoted as RB. The optimal values for \(\lambda _{0}=\lambda _{1}\) were found as the column with minimal NormMSE values in Fig. 2d. It can be seen that for the residual balancing, the respective R and S residuals for \(\rho \) and \(\eta \) are bound to each other and drop faster than for the original algorithm. c Combining all four residuals to a distance measure D confirms increased drop rates for the residual balancing. d The NormMSE shows that the RBTGV algorithm (red) converges faster than the original algorithm (black), as it approaches the optimum zone (green) of the converged result (green dotted line) from early on. For an optimal choice (denoted as opt) of \(\rho =\eta \approx 6.3\cdot 10^{-5}\), however, the NormMSE drops faster for both algorithms, with the residual balancing (orange dotted line) being equally fast as the original algorithm (gray). e Development of \(\rho \) and \(\eta \) in the course of the iterations

The developments of the R and S residuals are shown for the optimal choice of \(\lambda _{0}=\lambda _{1} \approx 0.63\,{1}/{\omega \sigma }\) and an unfavorable value of \(\rho \!=\!\eta \! = \!1\), as found in Fig. 2. Comparing the residuals for the ADMM-TGV algorithm in Fig. 3a with those of the ADMM-RBTGV algorithm in b shows that the regular rebalancing of the R and S pairs leads to an increased drop rate of all residuals. Combining all residuals into a distance measure \(D^{k+1}\!\! =\!\! \Vert \,\textbf{R}_{\rho }^{k+1}\,\Vert _{F}^{2} +\Vert \,\textbf{S}_{\rho }^{k+1}\,\Vert _{F}^{2} + \Vert \,\textbf{R}_{\eta }^{k+1}\,\Vert _{F}^{2} +\Vert \,\textbf{S}_{\eta }^{k+1}\,\Vert _{F}^{2}\) following the scheme of He et al. [37] leads to a clearer picture. In Fig. 3c, the distance D of the residual balanced ADMM-RBTGV (RB) can be seen to drop faster than the distance for the original ADMM-TGV. These trends stand out even more when analyzing the relative residuals (rel.) in Fig. 3 a and b, as defined in (45) and (46) as well as in (58) and (59). Here, the relative \(S_{rel,\eta }\) residual increases for the original algorithm, whereas it decreases for the residual balanced one.
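The distance measure D can be computed directly from the residuals; a short sketch, where the residual arrays are assumed to be available from the iteration loop:

```python
import numpy as np

def distance_from_convergence(*residuals):
    """Distance D: sum of the squared Frobenius norms of all primal (R) and dual (S)
    residuals, following the scheme of He et al. [37]."""
    return sum(np.linalg.norm(r) ** 2 for r in residuals)

# Gaussian TGV case: D = distance_from_convergence(R_rho, S_rho, R_eta, S_eta)
# Poisson case:      additionally pass R_phi and S_phi (see Sect. 4.2)
```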

The residual balancing thus results in a faster convergence, as can be seen in Fig. 3d, where the NormMSE value of the residual balanced ADMM-RBTGV settles close to a value of 0.3 after around ten iterations, whereas the non-balanced ADMM-TGV requires several thousand. To put this into perspective, the NormMSE value for the optimal choice of \(\rho =\eta = \text {6.3}\cdot \text {10}^{-\text {5}}\), found as the column with minimum values in Fig. 2, settles after four iterations, with no difference between residual balancing and the original algorithm. Thus, ADMM-RBTGV is beneficial whenever the optimal \(\rho \) and \(\eta \) values are unknown, which is generally the case unless both values are tested extensively.

In Fig. 3e, the trends of \(\rho \) and \(\eta \) for the residual balanced algorithm are shown, which indicate that the initial (unfavorable) choice was indeed set too high, as the residual balancing corrects these values significantly to lower values.

Fig. 4

a Gray-scale test image “onion” from MATLAB. The pixel intensities were multiplied by a factor of 10. b Image with pure Poisson noise (\(\sigma _{mean}\!\approx \! 31.7 \)) added after convolution with a 2D Gaussian kernel (FWHM \(=\) 1 Pixel). c The result of the ADMM-TGV and d the ADMM-RBTGV algorithm after 1000 iterations, with unfavorable penalty parameters \(\rho =\eta =\varphi =1\) and optimal regularization parameters \(\lambda _{0}=\lambda _{1}\approx {\text {0.4}}/{\omega \sigma }\) as starting values, respectively. According to (68), the attenuation \(\omega \) amounts to \(\omega \approx 2.5\)

4.2 Deconvolution of images with Poisson noise

To further demonstrate the versatile applicability to other types of data reconstruction, residual balancing is applied in the next example to an image affected by blur and pure Poisson noise. The original image can be found in Fig. 4a. After convolution with a Gaussian kernel (FWHM \(=\) 1 Pixel), Poisson noise (\(\sigma _{mean}\!\approx \! 31.7 \)) is added in Fig. 4b. Further, in Fig. 4 c and d, the deconvolved images for the ADMM-TGV and ADMM-RBTGV algorithms after 1000 iterations, respectively, are displayed.

At a glance, the image reconstructed by the ADMM-RBTGV looks sharper compared to the ADMM-TGV-treated image, and further analysis can confirm this: Similar to the Gaussian noise results, regions of optimality for \(\rho =\eta =\varphi \) and \(\lambda _{0}=\lambda _{1}\) can be found via the NormMSE maps as shown in Fig. 5. Again, the convergence of the original algorithm (Fig. 5 a–c) lags behind the ADMM-RBTGV (Fig. 5 d–f) convergence for most \(\rho =\eta =\varphi \) values. Our algorithm shows nearly the same result for all \(\rho \) of a given \(\lambda \) after approximately ten iterations (Fig. 5e), indicating that the residual balancing is applicable here, too. But the convergence is slightly slower than in the Gaussian example, as an additional penalizer for the Poisson noise must be balanced in.

Figure 6a indicates that the residuals for \(\rho \), \(\eta \), and \(\varphi \) are bound to each other and altogether drop faster in the residual balanced case (Fig. 6b) for an unfavorable choice of \(\rho \!=\!\eta \!= \varphi =\!\text {1}\). Despite all three \(R_{rel}\)-residuals being significantly lower from the start for the ADMM-TGV, the \(S_{rel}\)-residuals do not decrease at all. As all residuals must vanish, convergence is slowed down significantly for the original algorithm. In Fig. 6 a and b, we only display the relative residuals, as they give a better impression of the development of the algorithm. For completeness, both the original and the relative distances from convergence are shown in Fig. 6c. Here, the distance \(D^{k+1}\!\! =\!\! \Vert \,\textbf{R}_{\rho }^{k+1}\,\Vert _{F}^{2} +\Vert \,\textbf{S}_{\rho }^{k+1}\,\Vert _{F}^{2} + \Vert \,\textbf{R}_{\eta }^{k+1}\,\Vert _{F}^{2} +\Vert \,\textbf{S}_{\eta }^{k+1}\,\Vert _{F}^{2} + \Vert \,\textbf{R}_{\varphi }^{k+1}\,\Vert _{F}^{2} +\Vert \,\textbf{S}_{\varphi }^{k+1}\,\Vert _{F}^{2}\) is extended by the residuals of \(\varphi \). It can be seen that the distance from convergence vanishes faster with residual balancing overall. The same behavior is shown in Fig. 6d for the NormMSE. Interestingly, both the distance from convergence D and the NormMSE increase significantly within the first five iterations of the ADMM-RBTGV, as the \(R_{rel}\)-residuals rise. Since the log-likelihood functional for Poisson noise (3), which is tied to the original image, is influenced by the residual balancing, the difference between the new iterate \(\textbf{x}^{k}\) and the original image is significantly increased. This increase is only transient, however, as with increasing iterations the average \(\tau \) converges to 1 due to the increased intervals between balancing steps. After approximately 16 iterations, convergence is close for the ADMM-RBTGV (indicated as the green dotted line within the green optimum zone in Fig. 6d), while the original ADMM-TGV has not made significant progress. Even for a close to optimal choice (denoted as opt) of \(\rho =\eta =\varphi = \text {6.3}\cdot \text {10}^{-\text {3}}\), the original ADMM-TGV (gray) is slower than the residual balanced ADMM-RBTGV (orange), since all three penalty parameters are chosen identically and are thus suboptimal. Finding the optimal penalty parameters is often time-consuming, whereas residual balancing is fully automated and quickly adapts all penalty parameters to optimal values. Figure 6e shows the development of \(\rho \), \(\eta \), and \(\varphi \) due to the residual balancing, indicating that the initial values were chosen too high.

Fig. 5

NormMSE maps, with the individual pixels showing the deconvolution results under Poisson noise with a Gaussian kernel (FWHM \(=\) 1 Pixel) of the respective \(\left( \rho =\eta =\varphi \,|\, \lambda _{0}=\lambda _{1}\right) \) pairs. In the upper row, the results of the original ADMM-TGV are shown after a 10, b 100, and c 1000 iterations. In the lower row (d–f), the results of the residual balanced ADMM-RBTGV are displayed at the same iteration stages. The two red lines in every map indicate the region in which the algorithm eventually improves the image quality compared to the initial noisy image, and the dashed line indicates the \(\lambda _{0}\!=\!\lambda _{1}\) value in units of \( {1}/{\omega \sigma }\), with the standard deviation \(\sigma \!=\!31.7\) of the noise and the attenuation \(\omega \approx 2.5\), at which results are optimal (\(\lambda _{opt}\approx {\text {0.4}}/{\omega \sigma }\))

Fig. 6

a Trend of the relative \(\text {R}_\text {rel}\) and \(\text {S}_{rel}\) residuals (described in Sect. 3.1 and Sect. 3.2) connected to \(\rho \), \(\eta \), and \(\varphi \) belonging to the standard ADMM-TGV algorithm for deconvolution under Poisson noise (Fig. 5d). Unfavorable starting values of \(\rho =\!\eta \!= \varphi = 1\) and optimal values for \(\lambda _{0} \!=\lambda _{1} \!=\!\lambda _{opt}\! \approx {\text {0.4}}/{\omega \sigma }\) were chosen to compare with b the residuals of the residual balanced ADMM-RBTGV denoted as RB. The optimal values for \(\lambda _{0}=\lambda _{1}\) were found as the column with minimal NormMSE values in Fig. 5d. It can be seen that for the residual balancing, the respective relative \(\text {R}_\text {rel}\) and \(\text {S}_\text {rel}\) residuals for \(\rho \), \(\eta \), and \(\varphi \) stay bound together and drop faster, whereas the relative residuals of the original algorithm stagnate. Initially, this leads to a rise in all three \(\text {R}_\text {rel}\)-residuals. c Combining all six residuals to a distance measure D confirms the impression of a faster drop rate for the residual balancing. However, \(\text {D}_\text {RB}\) and \(\text {D}_\text {RB,rel}\) show a transient increase in iteration three, where the increase in \(\text {D}_\text {RB}\) is more pronounced. d The NormMSE shows that the residual balancing (red) leads to a faster convergence than the original algorithm (black), as it approaches the optimal zone (green) of the converged result (green dotted line) from iteration 16 on. The NormMSE for the “optimal” choice of \(\rho =\eta =\varphi = 1 \cdot 10^{-3}\) converges much more slowly in the original algorithm (gray) than with residual balancing (orange). As the initial rise of the distance measure D suggests, the NormMSE increases at first before converging quickly. e The development of \(\rho \), \(\eta \), and \(\varphi \) in the course of the iterations

4.3 Optimality for \(\lambda _{0}\) and \(\lambda _{1}\)

Having established the residual balancing for Gaussian and Poisson problems, \(\rho \), \(\eta \), and \(\varphi \) no longer need to be chosen by the user but are instead set automatically by the algorithm. The parameter choice thus reduces to \(\lambda _{0}\) and \(\lambda _{1}\) to achieve optimality. Therefore, the NormMSE maps for Gaussian denoising and Poisson deconvolution are displayed in Fig. 7 a and c for pairs of \(\lambda _{1}\) and \(\lambda _{0}\), with the starting values fixed to \(\rho =\text {1}\), \(\eta =\text {1}\), and \(\varphi =\text {1}\) for the Poisson case.

Fig. 7

NormMSE as a function of \(\lambda _{0}\) and \(\lambda _{1}\) after applying 1000 iterations of RBTGV for Gaussian denoising a of Fig. 1b, with the zoomed extract marked with a blue box in b. The same for Poisson deconvolution c of Fig. 4b, with the zoomed extract marked with a blue box in d. The red lines in a and c indicate the region, in which the algorithm eventually improves the image quality compared to the initial noisy image. For both, a Gaussian denoising with the noise standard deviation \(\sigma \!=\!50\) and the attenuation of \(\omega =1\) as well as c Poisson deconvolution with \(\sigma \!\approx 31.7\) and the attenuation of \(\omega \approx 2.5\), two branches occur forming an L-shape at which results are better than before. The first branch is connected to the \(\lambda _{0}\) balancing parameter for the first order total variation, and the second branch is connected to \(\lambda _{1}\); the dotted red line indicates the position of the optimal \(\lambda _{0,opt}\) and \(\lambda _{1,opt}\) value connected to either branch. For a Gaussian denoising, these values are \(\left( \lambda _{0,opt}\,|\, \lambda _{1,opt}\right) \approx \left( 0.25 \,|\, 0.63\right) {1}/{\omega \sigma }\) and for c Poisson deconvolution \(\left( \lambda _{0,opt}\,|\, \lambda _{1,opt}\right) \approx \left( 0.1 \,|\, 0.63\right) {1}/{\omega \sigma }\). b and d show the overlap region of the branches for Gaussian denoising and Poisson deconvolution, respectively. Here, the red cross indicates the optimal point, at which the overall minimum is found. For b Gaussian denoising, the optimal value is found at \(\left( \lambda _{0}\,|\, \lambda _{1}\right) _{opt}\!\! \approx \!\!\left( 0.36 \,|\, 0.59\right) {1}/{\omega \sigma }\). For d Poisson deconvolution, the optimal value is found at \(\left( \lambda _{0}\,|\, \lambda _{1}\right) _{opt} \!\!\approx \!\!\left( 0.57 \,|\, 0.46\right) {1}/{\omega \sigma }\)

Both maps exhibit two branches of optimality forming an L-shape. The first is connected to the value for \(\lambda _{0}\), which controls the impact of the derivatives \(\textbf{D}\) acting on the flat parts of the image, and the second is connected to the \(\lambda _{1}\) value controlling the derivatives \(\textbf{G}\) acting like a second-order derivative applied to the sloped regions of the image. Depending on the features of the image, both parameters have to be chosen carefully and therefore typically require experience.

Since the chosen image comprises flat and sloped features, optimality is influenced by the choice of both the \(\lambda _{0}\)- and the \(\lambda _{1}\)-parameter. As long as \(\lambda _{0}\) is above a certain threshold of \({2}/{\omega \sigma }\), the result dominantly depends on the \(\lambda _{1}\)-parameter; the same is true, vice versa, as long as \(\lambda _{1}\) is above \({2}/{\omega \sigma }\). Raising either \(\lambda _{0}\) or \(\lambda _{1}\) to unreasonably high values leads to large distances to the image. To reduce the distance, the filter masks adapt and minimize the parts of the image they are applied to. Thus, only the other derivative takes part in the minimization.

The optima of both branches differ significantly from 1 in Fig. 7 a and c, which indicates that neither first- nor second-order TV alone is optimal. However, \(\lambda _{1}\) being slightly closer to 1 than \(\lambda _{0}\) in a, and considerably closer in c, shows that it is better suited and comes at a lower cost. While both branches are equivalent in a, the difference between them in c is large, which shows in the much darker color of the \(\lambda _{1}\) branch.

As the biggest advantage of TGV is the mixture between both orders of total variation, it appears that when both types of structure, piece-wise flat and sloped, are combined in one image, the optimal \(\lambda _{0}\), \(\lambda _{1}\) pair is found in the overlap region of both branches. Since this overlap is only possible with the TGV approach, it shows the advantage over using only simple first- or second-order TV approaches.

Figures 7 b and d display the respective overlap regions, where the optimum values can be found, as indicated by the red cross.

It appears that for both denoising and deconvolution, the optimum values for \(\lambda _{0}\) and \(\lambda _{1}\) add up to approximately one in units of \({1}/{\omega \sigma }\). This indicates that both depend on the factor \({1}/{\omega \sigma }\) introduced in (67) and (68). The clear difference between the two sets, \(\left( \lambda _{0}\,|\, \lambda _{1}\right) _{opt} \approx \left( 0.36 \,|\, 0.59\right) {1}/{\omega \sigma }\) for Gaussian denoising in b and \(\left( \lambda _{0}\,|\, \lambda _{1}\right) _{opt} \approx \left( 0.57 \,|\, 0.46\right) {1}/{\omega \sigma }\) for Poisson deconvolution in d, indicates that the balancing between \(\Vert \textbf{z}_{1}\Vert _{F}\) and \(\Vert \textbf{z}_{2}\Vert _{F}\) does not depend solely on the original image, which is identical for both restorations, but rather on the significant features that can be distinguished from the noise. This produces a dilemma in scientific applications that cannot be resolved, as the features are not only unknown but are also the main subject of the investigation to which the restoration should contribute. For an unbiased choice, we propose to weight both \(\lambda _{0}\) and \(\lambda _{1}\) equally with \( {1}/{2\omega \sigma }\), since this choice will always increase the outcome quality significantly, while preserving an unbiased and thus scientific approach to the data.
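Combining the attenuation factor from (68) with the proposed unbiased weighting, the default regularization parameters follow from the noise level alone; a minimal sketch, with the function name being an illustrative assumption:

```python
import numpy as np

def default_regularization(sigma, kernel=None):
    """Unbiased default choice: lambda0 = lambda1 = 1/(2*omega*sigma), with omega
    from Eq. (68); omega = 1 for pure denoising (delta-peak kernel)."""
    if kernel is None:
        omega = 1.0
    else:
        k = np.asarray(kernel, dtype=float)
        omega = np.sqrt(np.sum(k / k.max()))
    lam = 1.0 / (2.0 * omega * sigma)
    return lam, lam   # (lambda0, lambda1)
```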

4.4 Comparison of residual balancing strategies

Fig. 8

NormMSE maps with the individual pixels showing the denoising results under Gaussian noise (a–f) with standard deviation \(\sigma =50\) (same problem as Fig. 2). The approaches of Boyd [22] a and b and Wohlberg [36] d–f are tested with different \(\mu \) and \(\tau \) parameters (see Sect. 1.4) against c ours after 1000 iterations, respectively. It can be seen that only \(\mu =1\) produces the same result for all given \(\rho =\eta \) for every \(\lambda _{0}=\lambda _{1}\). The black regions in the Wohlberg approach show the algorithm diverging to infinity and thus failing for the respective pairs. The colored boxes mark all results that are further analyzed in Fig. 9a, since they at least converge for optimal choices of \(\lambda \). Images g–l show the deconvolution results under Poisson noise with standard deviation \(\sigma \approx 31.7\) and a 2D Gaussian kernel with FWHM \(=\)1 pixel (same problem as Fig. 5) under the same conditions as for the Gaussian noise above

To further show the benefit of our approach, it is tested against the earlier approaches of Boyd [22] and Wohlberg [36] (see (24) and (27)) in Fig. 8, for both denoising with Gaussian noise (a–f) and deconvolution with Poisson noise (g–l), under different parameters \(\mu \) and \(\tau \). Boyd suggested values of \(\mu =\text {10}\) and \(\tau =\text {2}\) for his algorithm (see (24)), as used in Fig. 8a for Gaussian and g for Poisson noise. Further, we tested the same values for Wohlberg's approach (d and j), again for both types of noise. Both approaches show unstable and inconsistent results. As residual balancing is only applied when the R and S residuals differ by one order of magnitude, the respective algorithms balance the penalty parameters at different iterations depending on their initial value, which leads to different results at a given iteration. However, for this problem, we found \(\mu =\text {1}\), as proposed by Wohlberg [36], more convincing (Fig. 8 b and h as well as e and k). Since the residual balancing is then applied in every iteration, differences between the initial penalty parameters are reduced more effectively. In a test over several \(\mu \), stability increased for both Boyd's and Wohlberg's algorithms as \(\mu \rightarrow 1\).

Increasing \(\tau \) leads to a faster adaptation of all penalty parameters, as the residuals can be balanced in larger steps. However, as Boyd’s algorithm does not further adapt the step size in the course of the iterations, the optimal \(\rho \) and \(\eta \) can only be reached within a factor of \(\tau \), which is why we kept the value of \(\tau =\text {2}\) proposed by Boyd. Wohlberg’s algorithm, in contrast, allows the optimal \(\rho \), \(\eta \), and \(\varphi \) to be approached, since \(\tau \) is variable between 0 and its maximum value \(\tau _{max}\).

However, choosing \(\tau \) too high leads to divergence of Wohlberg’s algorithm, marked as black areas in d–f and j–l. For both Gaussian denoising and Poisson deconvolution, this happens if the \(\lambda \)-values are estimated too high, leading to over-smoothing.

Compared to Wohlberg’s algorithm, ours showed stability in both cases (Fig. 8 c and i). The main differences between Wohlberg's and our proposed algorithm are the upper limit on \(\rho \), \(\eta \), and \(\varphi \) instead of the limit \(\tau _{max}\), which prevents our algorithm from diverging, and the increasing number of iterations between two consecutive balancing steps, which enforces convergence.
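To make the differences between the strategies concrete, the following schematic sketch contrasts a fixed-step update of the Boyd type with a capped update in the spirit of our approach; the exact rules are given by (24), (27), and Sect. 3, so the forms and the value of `rho_max` below are illustrative only.

```python
import numpy as np

def boyd_update(rho, r_norm, s_norm, mu=10.0, tau=2.0):
    """Fixed-step balancing (Boyd type): rescale rho by a constant factor tau
    whenever the primal and dual residual norms differ by more than a factor mu."""
    if r_norm > mu * s_norm:
        return rho * tau
    if s_norm > mu * r_norm:
        return rho / tau
    return rho

def capped_update(rho, r_norm, s_norm, rho_max=1e5):
    """Balancing in the spirit of this work: the step size itself is not limited,
    but the penalty parameter is capped to prevent divergence."""
    return min(rho * np.sqrt(r_norm / s_norm), rho_max)
```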

As Fig. 8 does not show the speed or progress of the respective algorithms, all converging strategies marked with colored boxes are tested against each other in Fig. 9 at an unfavorable \(\rho =\eta =\varphi =\text {1}\) and \(\lambda _{opt}\) pair (similar to Fig. 3d and Fig. 6d). The graphs indicate that the approach of Boyd is the slowest for both Gaussian denoising (Fig. 9a) and Poisson deconvolution (Fig. 9b), since the optimal penalty parameters can never be reached due to the limitation of \(\tau \); this is overcome by Wohlberg’s approach with \(\tau _{max}=\text {2}\). For the first six iterations, Boyd's and Wohlberg’s approaches show the same NormMSE values, indicating that the variable \(\tau \) becomes beneficial afterwards. Further increasing \(\tau _{max}\) leads to a faster convergence, as the maximum balancing step is increased. Our approach is superior to the others in terms of convergence speed, since we do not limit the step size at all, but only set an upper limit on \(\rho \), \(\eta \), and \(\varphi \).

Fig. 9

The graphs display the progress of the NormMSE within 1000 iterations and show the differences in convergence speed between the respective stable ADMM-RBTGV variants for a Gaussian and b Poisson noise (see Fig. 8), marked with the same line colors as the boxes before. It can be seen that Boyd’s approach (black) [22] is the slowest of all for both problems. An increase of \(\tau _{max}\) translates into a faster convergence for Wohlberg’s approach [36] (blue and green). The fastest convergence is achieved with our algorithm (red) by limiting only \(\rho \) and \(\eta \) instead of \(\tau \)

5 Application examples

Fig. 10

a Noisy image of a metallic nanoparticle acquired in a transmission electron microscope. b The deconvolved image with the ADMM-RBTGV after 1500 iterations. c Comparison of zoomed extracts of both the original and the deconvolved image at the positions marked with the respective colors. d Electron energy loss signal from the oxygen K-edge of a hematite (Fe\(_{2}\)O\(_\text {3}\)) nanoparticle with only the background subtracted (black), deconvolved with the zero-loss peak by the commonly used RLA after 15 iterations (blue) and 150 iterations (cyan), and by the ADMM-RBTGV after 1500 iterations (red), with a mixed Poisson-Gauss noise model [43]. It can be seen that our approach produces a much smoother outcome, due to the better-adjusted noise model. Additionally, our result leaves much less room for interpretation of additional peaks that occur around 540 eV in both RLA results

In Fig. 10, we give two examples of applications in physics. Figure 10a shows gold nanoparticles on a silicon oxide substrate as acquired with a scanning transmission electron microscope (STEM) in annular dark field (ADF) mode, with inherent blur by the point-spread function of the electron beam and some minor defocus. Deconvolving with the more specialized and adaptable mixed Poisson-Gaussian noise model (MPG) [43] in our ADMM-RBTGV (b) improves the overall quality of the reconstruction compared to the original, as can be seen especially in the zoomed extracts in c, where the crystalline structures appear much clearer.

The second example in Fig. 10d demonstrates the application of the ADMM-RBTGV algorithm to 1D data (spectra), here an electron energy-loss spectrum (EELS) recorded in a STEM. EEL spectra suffer from both the finite energy width of the impinging electron beam, leading to a convolution of characteristic spectral features, and noise, owing to the limited electron dose used in a measurement. The black spectrum was recorded from a catalytic metal oxide nanoparticle near the oxygen K-edge, where a background was subtracted using a common power-law model. The blue and cyan curves show the spectrum after applying the commonly used Richardson-Lucy algorithm (RLA) 15 and 150 times, respectively. Deconvolving the spectrum with the RLA sharpens the edges, but due to a misfit in the noise model, which completely ignores Gaussian read-out noise and disregards the change in statistics caused by the subtraction of the background, the RLA diverges with increasing iterations. This is why the RLA is often limited to 15 iterations in EELS applications [44]. Oscillations occurring after 150 iterations of RLA near the spectral maximum at 542 eV might easily be misinterpreted as characteristic features. At energy losses above 550 eV, the RLA-treated spectra show additional oscillations, with the same amplitude as the peaks around 542 eV, which are produced by the RLA. These artifacts are commonly known as ringing artifacts [45,46,47]. In comparison, the spectrum obtained by applying the ADMM-RBTGV (1500 iterations) corresponds nicely to the spectra expected for metal oxide particles [48,49,50,51]. Thus, by utilizing MPG with our ADMM-RBTGV, the overall quality of the reconstruction improves significantly compared to the RLA results, as the signal appears much smoother and less noisy. All in all, the examples show that ADMM-RBTGV can reliably improve image and signal quality, unbiased by the choice of input parameters by the user.

6 Conclusions

In this paper, a novel residual balancing approach for ADMM image denoising and deconvolution with total generalized variation is described. It is applied to standard images as well as to scientific 2D and 1D data superimposed with both Gaussian- and Poisson-type noise. The usefulness, objectivity, and fast convergence of the algorithm are demonstrated. By automatically choosing the penalty parameters \(\rho \), \(\eta \), and \(\varphi \) (for the Poisson case) and equalizing their initial choice after a few iterations, residual balancing makes the algorithm user-friendly and unbiased. Moreover, it increases the convergence speed significantly for almost all initial choices of the penalty parameters. If the optimum is already known, residual balancing is indeed slower than the original algorithm, but does not delay convergence by much. However, as the optimum is hard to guess, residual balancing provides an easy and successful way of choosing \(\rho \) and \(\eta \) in the case of the Gaussian noise ADMM and an additional \(\varphi \) for the Poisson case. It is shown that the original algorithm and the residual balanced one converge to the same result for initial choices of the penalty parameters spanning several orders of magnitude, given the same values of the regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\). Further, we could show that our approach to residual balancing is superior to others, as it provides the fastest convergence and is stable for all tested parameter pairs.

As the applicability has been clearly shown, this work allows the principles of residual balancing to be adapted to other penalizers for image restoration in general and to data of different dimensionality.

By reducing the set of four parameters (or five in the Poisson case) to be chosen by the user to two, optimal regions can be found by only testing \(\left( \lambda _{0}\,|\,\lambda _{1}\right) \)-pairs. This makes visualization easier and reduces computational time for finding the optimal set.

It is shown that the regularization parameters \(\lambda _{0}\), connected to the first derivative, and \(\lambda _{1}\), connected to the second derivative, depend on the inverse standard deviation of the noise \( {1}/{\omega \sigma }\), where the noise is attenuated by a factor of \(\omega \) in the case of a deconvolution. This factor depends on the deconvolution kernel, as a wider kernel draws more noise into a single pixel.

Selecting either \(\lambda _{0}\) or \(\lambda _{1}\) too high while keeping the other close to the optimum leads to the filter masks adapting and minimizing the influence of the respective first or second matrix of derivatives \(\textbf{D}\) or \(\textbf{G}\). The conclusion drawn from this is that TGV can either converge to results of the standard first-order TV or the second-order TV. However, as most images consist of both flat and sloped parts, optimality is found in the overlap of both \(\lambda _{0}\) and \(\lambda _{1}\) regions. Here, we could show that both regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\) must add up to a total value of \( {1}/{\omega \sigma }\) to achieve optimality. The ratio between both depends on the significant features in the image, which can be distinguished from the noise. Thus, for the same image under different noise conditions, the optimality of these parameters is not identical, but changes significantly.

Finally, with our work, the algorithm can be set up such that it operates on objective criteria and not on the arbitrary choice of user inputs: The penalty parameters are chosen automatically on a mathematical foundation, and the regularization parameters \(\lambda _{0}\) and \(\lambda _{1}\) can be chosen equal to \( {1}/{2\omega \sigma }\), such that they only depend on the noise standard deviation \(\sigma \). Since the features in an image are not known in a scientific context, the true optimality of the ratio of both parameters cannot be achieved without introducing a certain bias. But fixing the ratio of both regularization parameters still improves images significantly, since both parameters remain within the optimality region. This allows for an unbiased approach to the restoration of experimental data, as there is no room for altering the results. With sufficient knowledge of the noise in an experiment, which can be obtained by carefully investigating the detector used, our work enables the use of ADMM-based image restoration algorithms for scientific tasks.