A function space framework for structural total variation regularization with applications in inverse problems

In this work, we introduce a function space setting for a wide class of structural/weighted total variation (TV) regularization methods motivated by their applications in inverse problems. In particular, we consider a regularizer that is the appropriate lower semi-continuous envelope (relaxation) of a suitable total variation type functional initially defined for sufficiently smooth functions. We study examples where this relaxation can be expressed explicitly, and we also provide refinements for weighted total variation for a wide range of weights. Since an integral characterization of the relaxation in function space is, in general, not always available, we show that, for a rather general linear inverse problems setting, instead of the classical Tikhonov regularization problem, one can equivalently solve a saddle-point problem where no a priori knowledge of an explicit formulation of the structural TV functional is needed. In particular, motivated by concrete applications, we deduce corresponding results for linear inverse problems with norm and Poisson log-likelihood data discrepancy terms. Finally, we provide proof-of-concept numerical examples where we solve the saddle-point problem for weighted TV denoising as well as for MR guided PET image reconstruction.


Introduction
In many classical applications of inverse problems, rather than measuring data from a single channel, recently the simultaneous acquisition from multiple channels has gained importance. Besides having different sources of information available, a main advantage of such multiple measurements is due to the possibility of exploiting correlations between different data channels in the inversion process, often leading to a significant improvement for each individual channel. In particular, when the underlying quantities of interest can be visualized as image data, these correlations typically correspond to joint structures in images. In view of this background, multimodality and multicontrast imaging explore such joint structures for improved reconstruction. Successful applications of these techniques can be found for instance in biomedical imaging [9,20,21,22,40,43,45,49,52], geosciences [51], electron microscopy [27] and several more.
In this context, one distinguishes two different approaches for exploiting correlations: (i) Joint reconstruction techniques that treat all available channels equally, such as [40], and (ii) structuralprior-based regularization techniques that assume some ground truth structural information to be available. Here we focus on a particular class of type (ii), namely structural total-variation-type regularization functionals, i.e., functionals which integrate a spatially-dependent pointwise function of the image gradient for regularization. In this vein, given some prior information v, we consider a regularization functional of the type J(u) = which is used in a Tikhonov-type regularization problem where u is reconstructed by solving (1.1) min u Ω j v (x, ∇u(x)) dx + (λD • K)(u).
Here, D denotes a data discrepancy term and K a bounded linear operator which reflects the forward model. A very basic, formal example of such a regularization is given when v denotes the underlying ground truth image and we incorporate information on its gradient by defining j v (x, z) := 1 |∇v(x)| |z|, where | · | is some norm. This yields which corresponds to a weighted total variation (TV) functional.
Even though problems of the form (1.1) limit the exchange of information to the gradient level, they cover a large set of existing works in particular in the context of medical image processing. This can be explained on the one hand by the popularity of TV-type regularization for imaging in general, which is due to its capability of recovering jump discontinuities. On the other hand, the gradient level seems very well suited for exploiting correlations between different contrasts or modalities as it primarily encodes structural information, i.e., to some extent it is independent of the absolute magnitude of the signal. With the goal of enforcing parallel level sets, in [19], for instance, the regularizer J was chosen for a given image v as where φ, ψ are appropriate increasing functions, | · | β , | · | β 2 denote smoothed norms, and a · b is the scalar ("dot") product of vectors a, b ∈ R d . This definition is motivated by the fact that for two vectors w, y the quantity |w||y| − |(w, y)| is zero if and only if they are parallel to each other. Particular instances of (1.2) were then also used in [22] for joint MR-PET reconstruction. A similar functional, but in the spirit of [39] is Here w denotes some a priori vector field that contains gradient information. In the context of multicontrast MR, the authors in [20] choose where again a smoothed version of the absolute value was used in the denominator. Here, θ denotes the angle between ∇u and ∇v. Observe that the latter functional can also be written as Ω |∇u| 2 − ∇u · ∇v |∇v| 2 1/2 , bearing similarity to the functional (1.3). We note that the regularization approaches above have been considered only in discrete settings, despite the original continuous formulations. Concerning the latter and as indicated above in connection with (1.1), it is natural to consider u ∈ BV(Ω), i.e., |u| is Lebesgue integrable and u is of bounded total variation, still allowing for jump discontinuities, i.e., sharp edges. As a consequence, the (generalized) gradient Du of u is in general a finite Radon measure, only. This fact, however, challenges the proper defininition of regularization functionals of the above types in an associated function space setting. The first part of our work aims at addressing precisely this issue. In fact, resorting to the concept of functions of a measure [7] we propose to use the convex biconjugate as a regularizer. Indeed, starting from a general function j : Ω × R d → [0, ∞), Ω ⊂ R d with minimal assumptions, e.g., convexity, linear growth, 1-homogeneity in the second variable, we define the functional J : L p (Ω) → R as where we omit the dependence of j on the prior v for the sake of unburdening notation. Here, L p (Ω) and W 1,1 (Ω) denote the usual Lebesgue and Sobolev spaces; see [1]. We note that J in (1.5) is not suitable for variational regularization as it is finite only for a class of rather regular functions (i.e., in W 1,1 (Ω)) and it is not lower semi-continuous in an appropriate topology. As a remedy, we propose to resort to the convex biconjugate J * * of J, which we call structural TV functional. We recall that J * * coincides with the lower semi-continuous envelope (relaxation) of J with respect to L p convergence. In general, J * * may not have an explicit representation, but we show that it can always be expressed in a dual form as where (1.7) Q := g ∈ W q 0 (div; Ω) ∩ L ∞ (Ω, R d ) : j • (x, g(x)) ≤ 1 for almost every (a.e.) x ∈ Ω , see (3.2) below for the precise definition of the support function j • , and we refer to [25] for details on W q 0 (div; Ω). Nevertheless, based on a recent result by Amar, De Cicco and Fusco [2], by linking J * * to the relaxed functional of J with respect to L 1 convergence we are able to provide an integral representation, under additional assumptions. For instance, in the case where j(x, z) = α(x)|z|, α ∈ BV(Ω), with α ≥ 0, J * * is equal to the weighted TV-functional J * * (u) = Ω α − d|Du|, u ∈ BV(Ω), with α − the approximate lower limit of α; see (2.3) below for its definition. The set Q then reads Q = g ∈ W q 0 (div; Ω) ∩ L ∞ (Ω, R d ) : |g(x)| ≤ α(x) for a.e. x ∈ Ω .
Interestingly, as a consequence of this formulation, we get certain density results of convex intersections in the spirit of [31, 32, 34] as a byproduct; compare Section 4. Taking advantage of duality theory, in the second part of the paper we use the structural TV functional J * * for the regularization of linear inverse problems. In particular we study the general minimization problem As emphasized above, an explicit representation of the functional J * * is available only under some additional, perhaps restrictive assumptions. In order to solve (1.8) without invoking such assumptions, we employ Fenchel-Rockafellar duality and show, in the continuous setting, equivalence of (1.8) to a saddle-point problem of the form where (·, ·) denotes an appropriate pairing. This is achieved under assumptions which are tight in the sense that we can provide a counterexample where, without these assumptions, even existence of a solution for (1.8) fails. The major advantage of the above saddle-point reformulation is that it allows to obtain a solution of the original problem without requiring an explicit form of neither J * * nor (D • K) * . Note that the latter would be required for solving the predual problem. Furthermore, it has a format directly amenable to duality-based numerical optimization algorithms. The equivalence to a saddle-point reformulation is obtained under rather general assumptions on the data discrepancy term D, which, as corollary, allows us to cover the case of any norm discrepancy term as well as the case of a log-likelihood term for Poisson noise, which is relevant for instance for PET image reconstruction and electron tomography. The latter leads to the minimization problem (1.10) min where K denotes a Radon-transform-type operator, f, c 0 some given data, and I is the indicator function for a set M , i.e., I M (u) = 0 if u ∈ M and I M (u) = +∞ otherwise. Finally, we show the versatility of our approach with proof-of-concept numerical examples in weighted TV denoising with vanishing weight function and MR guided PET reconstruction.
We note here that there is previous work on the analysis of weighted and/or structural TV regularization in an infinite dimension setting. In [26], another instance of structural-TV type functionals is employed, but the work only considers the case of image denoising. Further, the authors simultaneously optimize over the image data and an anisotropy in the TV-term, which leads to a non-convex problem. Regarding properties of solutions of weighted TV denoising we refer to the work by Jalalzai [38] as well as to [30]. Finally, we mention that in [5] the authors analyze a weighted TV regularization model for vortex density models.
Structure of the paper. The paper is organized as follows: In Section 2, we fix our notation and we remind the reader of basic facts concerning functions of bounded variation and W (div) spaces.
In Section 3 we describe the relaxation framework for the structural TV functional. Under some conditions, we provide an integral representation of the relaxation based on a result of [2] and we also provide a characterization of its subdifferential.
Some refinements for weighted TV functional are given in Section 4 in the case of continuous and lower semi-continuous weight functions. We show that the functional can be defined in a dual fashion, using smooth test functions and as a byproduct we obtain certain density results of convex intersections in the spirit of [32,34].
Section 5 contains the main duality result of the paper. In particular, we show that under certain mild assumptions, the variational regularization problem with the relaxed structural TV functional as regularizer can be equivalently formulated as a saddle-point problem which requires knowledge neither of the explicit form of the relaxation-as it is the case for the primal problem -nor of the convex conjugate of the discrepancy term-as it is the case for the predual problem. As particular application, we elaborate on the case of Poisson log-likelihood discrepancy terms and show how the result can be transferred to this situation.
Finally, in Section 6 we provide proof-of-concept numerical examples, where we solve our saddle point problem for weighted TV denoising with vanishing weight function and for MR-guided PET image reconstruction.

Notation and Preliminaries
2.1. Functions of bounded variation. Throughout the paper, Ω ⊂ R d , with d ≥ 2, will be a bounded, open set with Lipschitz boundary, and we denote d * = d d−1 . By p, q we always denote two real numbers such that p, q ∈ [1, ∞] and q = p p−1 if p ∈ (1, ∞), q = ∞ if p = 1, and q = 1 if p = ∞. The space of functions of bounded variation on Ω is denoted by BV(Ω). We have that u ∈ BV(Ω) if and only if it is in L 1 (Ω) and its distributional derivative is a bounded Radon measure, denoted by Du. The total variation TV(u) of u is defined to be the total variation of that measure, i.e., TV(u) = |Du|(Ω) and it is equal to The measure Du can be decomposed into where D a u is the absolutely continuous with respect to Lebesgue measure L d , with density function denoted by ∇u, D j u denotes the jump part which is the restriction to the jump set J u of u, and D c u is the Cantor part of Du. We recall that D j u is defined over the set of points in Ω for which are the approximate upper and lower limits of u, respectively. With these definitions, the total variation of the measure D j u can be written as where H d−1 denotes the (d − 1)-dimensional Hausdorff measure. The density functions of the measures D j u and D c u with respect to |D j u| and |D c u| are denoted by σ D j u and σ D c u , respectively. For further details about the space BV(Ω), we refer the reader to [3,24].
2.2. The space W q (div; Ω). As the Banach space W q (div; Ω), with q ∈ [1, ∞), plays a major role in our work, we recall here some basic facts.
Furthermore we define with the norm g q W q (div;Ω) := g q L q (Ω) + divg q L q (Ω) . Remark 2.2. By density of C ∞ c (Ω) in L p (Ω) we have divg = w as w ∈ L q (Ω) is unique. By completeness of L q (Ω) and L q (Ω, R d ) it follows that W q (div; Ω) is a Banach space when equipped with · W q (div;Ω) .
We now state some general properties of W q (div; Ω). As W q (div; Ω) is just a straightforward generalization of the well-known space H(div; Ω) := W 2 (div; Ω), these results can be proven readily by generalizing from H(div; Ω); see [25,Chapter 1] for details on the latter.
Proposition 2.4 (Normal trace). Let q ∈ [1, ∞) and denote by n Ω (x) ∈ R d the outer normal vector to ∂Ω at x ∈ ∂Ω. Then the mapping can be extended to a linear, continuous mapping, also denoted by τ : . Further we have a Gauss-Green formula for W q (div; Ω) functions: The next proposition also uses ideas from [25] but since its proof is slightly more involved, we include it in Appendix A.

Structural TV as a lower semi-continuous envelope
The goal of this section is to obtain a predual representation of a general TV-based functional that includes the case of weighted TV for a general choice of weights. To this aim, we will define the corresponding functional as the L p -lower semi-continuous envelope of a restriction to W 1,1 functions, with p ∈ [1, ∞). The approach is motivated by the paper of Bouchitte and Dal Maso [12]. We start with a few definitions.
In what follows, convex conjugation of j will always be understood with respect to the second argument, and convex conjugation of J is performed in L p (Ω). Due to positive 1-homogeneity, we get the following well known representation of j * . For its formulation, we define the support function and denote by j * the convex conjugate of j; see [23] for more information on the latter concept.
Proof. In case j(x, ·) = 0 the assertion holds true trivially. Hence we assume that there exists z ∈ R d with j(x, z) = 0. For such a point z, we can write with λ = j(x, z) ≥ 0 and forz such that j(x,z) = 1. Hence we get where the non-negativity follows from the fact that j(x, ·) is even. We further have which completes the proof.
We note that by definition and density we have for u * ∈ L q (Ω) with q = p p−1 that Here we write (v, w) := Ω vw dx for v ∈ L q (Ω) and w ∈ L p (Ω). The next proposition follows the lines of [12] and provides a characterization J * , which also holds without the positive 1-homogeneity assumption on j.
Proof. We define the map F : Further, we define the unbounded operator Λ : and Λu := ∇u. Then Λ is densely defined and closed. We note that J can be re-written as Due to |F (p)| ≤ γ meas(Ω) + γ p L 1 (Ω) by (J2), where meas(·) denotes the Lebesgue measure of a set, J is bounded in a neighborhood of any Λu with u ∈ dom(Λ). Hence, it follows from [46,Theorem 19] that To complete the proof, it is left to show that we have for any u * ∈ L q (Ω) that We first show the inclusion "⊃": In case the set on the right-hand side above is empty, then the inclusion holds trivially; otherwise take g ∈ W q 0 (div; Ω) ∩ L ∞ (Ω, R d ) with −divg = u * and v ∈ dom(Λ) ⊂ W 1,1 (Ω). By density [24,Theorem 4.3] there exists a sequence (v n ) n∈N in W 1,1 (Ω) ∩ C ∞ (Ω) converging to v in W 1,1 (Ω) for which we can also assume that v n → v in L p (Ω) since v ∈ L p (Ω). Then, by the Gauss-Green theorem for W q 0 (div; Ω), see (2.4), we get Hence |(Λv, g)| ≤ C g v L p (Ω) with some C g > 0, and thus g ∈ dom(Λ * ) and −divg = Λ * g. To show the reverse inclusion "⊂", again assuming the set on the left-hand side to be non-empty, we take . Hence, y * ∈ W q (div; Ω). Further, we also get that y ∈ ker(τ ), with τ being the normal trace operator of Proposition 2.4, and hence y * ∈ W q 0 (div; Ω), which completes the proof.
Using positive 1-homogeneity, Proposition 3.2 immediately implies the following refinement.
3.1. Explicit representation of J * * . Now we study J * * and aim at providing an explicit representation. Note that since J is convex, J * * is equal to the lower semi-continuous envelope of J with respect to both the strong and weak L p -convergence. In other words, for every u ∈ L p (Ω) we have where "→" refers to convergence with respect to the strong topology and " " with respect to the weak topology.
In order to obtain an explicit representation of J * * , we will employ the representation of the L 1lower semi-continuous envelope of J, denoted here by J, as derived in [2]. Note that for u ∈ L p (Ω) with the last equality again being true due to convexity of J.
We show that, under suitable coercivity assumptions on j, the functionals J * * and J coincide.
and that for some c > 0 we have c|z| ≤ j(x, z) for every z ∈ R d and for almost every x ∈ Ω. Then, J * * (u) = J(u) for all u ∈ L p (Ω).
Proof. Since we assume Ω to be bounded, L p convergence implies L 1 convergence. Also, J(u n ) = +∞ for u ∈ L p (Ω) \ W 1,1 (Ω) and, consequently, J ≤ J * * . Hence we are left to show J * * (u) ≤ J(u) for all u ∈ L p (Ω). To this aim, we first note that, due to the coercivity assumption on j, c|Du|(Ω) ≤ J(u) for all u ∈ BV(Ω). Now take a = lim n→∞ J(u n ) with u n ∈ W 1,1 (Ω) ⊂ BV(Ω), u n → u in L 1 (Ω). Without loss of generality, we can assume a < ∞. From the continuous embedding of BV(Ω) into L d * (Ω), we get for a generic constant C > 0 and for some K > 0 Hence, (u n ) n∈N also converges weakly (up to subsequences) in L p (Ω), and thus also weakly in L 1 (Ω). By uniqueness of the weak limit we get a = lim inf i→∞ J(u n i ) with u n i u in L p (Ω) and the proof is complete.
Remark 3.7. Note that if the coercivity assumption of Lemma 3.6 holds, then due to the lower semi-continuity of total variation with respect to weak L 1 -convergence we get u / ∈ BV(Ω) if and only if J(u) = J * * (u) = +∞.
We then get the following result, which is a direct consequence of [2, Theorem 1.1].
Proposition 3.8 (Integral representation of J * * ). Assume that p ∈ [1, d * ] and one of the following two assertions holds true: with α ∈ BV(Ω) and b being a convex function such that (J1)-(J3) hold for j.
(ii) There exists a constant c > 0 such that for every z ∈ R d and for almost every x ∈ Ω, c|z| ≤ j(x, z) and also j(·, z) ∈ BV(Ω). Then, for any u ∈ BV(Ω) we have (3.6) Proof. Denote by J the right-hand side of equation (3.6). If one of the two assumptions is satisfied, then it follows from [2, Theorem 1.1] that J(u) = J (u) for every u ∈ BV(Ω). It hence remains to show that J * * (u) = J(u) for any u ∈ BV(Ω). In case (ii) is satisfied, this is the assertion of Lemma 3.6. Assume now that (i) holds true. In that case we will show directly that J * * (u) = J (u) for every u ∈ BV(Ω). It follows from [2, Theorem 3.1] that J is lower semi-continuous with respect to L 1 convergence. Consequently it is also lower semi-continuous with respect to weak L p convergence and hence, for all u and (u n ) n∈N in W 1,1 (Ω) with u n u in L p we get This means that J (u) is a lower bound for the set and since, by definition of the lower semi-continuous relaxation, J * * (u) is the infimum of this set, we get J (u) ≤ J * * (u). Furthermore, in the proof of [2, Theorem 4.1] it is shown, with u h = u * φ h being a standard mollification of u ∈ BV(Ω) and any A ⊂⊂ Ω with |Du|(∂A) = 0, that Then Ω ⊂⊂ Ω and ∂Ω ∩ ∂Ω = ∅. By sigma additivity and since J Remark 3.9. In the case j(x, z) = α(x)|z| with 0 ≤ α(x) ≤ C and α ∈ BV(Ω), the assumptions of the above proposition are fulfilled. Using that, in this case (j − ) = α − | · |, we get that We provide further remarks concerning the case j(x, z) = α(x)|z| in Section 4 below.
3.2. Subdifferential characterization. From Proposition 3.4 and Corollary 3.5 we directly obtain an integral characterization of the subdifferential for a wide class of structural total variation type functionals. The corresponding result is established in Proposition 3.10 below. Also, we refer to [4,13,16] for more elaborate results in that direction.

Some refinements for weighted TV
For the special case j(x, z) = α(x)|z| we next investigate alternative dual definitions of the structural total variation functional. This will provide some density results for pointwise bounded W d 0 (div; Ω)-functions, which are of interest in a variety of applications [31, 32,33,34]. Given u ∈ L p (Ω) with p ∈ (1, d * ], consider the following two extended-valued weighted TVfunctionals: Recall that TV W α = J * * by Corollary 3.5.
We commence by stating a preparatory result whose proof can be found in [31].
Using this density result, the following result can be shown for a continuous weight function α.
The proof will be completed by showing that (4.4) is equal to and thus (4.5) is equal to the right-hand side of (4.3). Lemma 4.1 now allows to approximate any term in (4.4) by a term in (4.5) satisfying the pointwise constraint, which completes the proof.
As we show next, (4.1) and (4.2) are equivalent for u ∈ BV(Ω) and a continuous weight function.
Proof. We have that for every u ∈ BV(Ω) there exists a sequence ( ]. This, together with the fact that Ω α d|D ·| is lower semi-continuous with respect to L d * -convergence, and also that J * * is the L d * -lower semi-continuous envelope of J implies that where the first equality is due to Proposition 4.2 and the last equality is due to Corollary 3.5. Note that in the proof above, we are not able to directly use the result of [2] or its corollaries proven in the previous section, as α ∈ C(Ω) does not imply α ∈ BV(Ω). For smooth (but not necessarily with integrable gradient) u the following analogous result holds true; see Appendix A for its proof. Proposition 4.4. Let α ∈ C(Ω) with α ≥ 0 and u ∈ C 1 (Ω). Then TV C α (u) = Ω α|∇u| dx. We now reduce the regularity of α by assuming the following property for the function α − ≥ 0: Observe that the above requirement slightly generalizes lower semi-continuity.
Moreover, we find . But, the above inequality is, in fact, an equality as for every element From this, (4.8), Corollary 3.5 and Remark 3.9, we get that which ends the proof.
Under a uniform positivity assumption on the weight, we obtain a density result, which is of use, e.g., when predualizing the renowned Rudin-Osher-Fatemi model; see [47], and [29,33,34] for its dualization.
Proposition 4.6. Let the assumptions of Proposition 4.5 hold true, and assume in addition that α > c > 0 a.e. in Ω. Then we have for q ∈ [d, +∞) that where the first equality stems from Proposition 4.5, the second one is due to [2], the third equality comes from Proposition (3.6), and Corollary 3.5 yields the final relation.
As α > c > 0 a.e. in Ω, we have TV C α − (u) = TV W α (u) = +∞ for every u ∈ L p (Ω) \ BV(Ω). Hence, we have TV C α − = TV W α in all of L p (Ω) which is equivalent to , |g|≤α − , a.e.} (u), for all u ∈ L p (Ω). After dualization and using the fact that the second set in the equation above is closed in L q (Ω) (compare [13] for a proof for scalar α which readily carries over to the present setting), we obtain for all u * ∈ L q (Ω) that , which proves the claimed density.
Density results of the above type have recently gained attention in the literature [31, 32,33,34] and enjoy a variety of applications. In the context of variational regularization in image reconstruction, an analogous density result for continuous weights α was used in [33] in order to show equivalence of a weighted TV-regularization problem and a corresponding predual problem; see [29] for a scalar weight. We emphasize that the result of Proposition 4.6 allows for the dualization for a larger class of weights rather than continuous functions. In the following section, we discuss an even more general duality result.

A general duality result
In this section we consider the variational regularization of linear inverse problems using structural total variation regularization. Our goal is to show existence of a solution as well as equivalence to a saddle-point formulation in the continuous setting under mild conditions that are naturally satisfied by the applications of our interest. The saddle-point problem will be formulated in a way that it only requires an explicit form of J * , but not of J * * , and such that its numerical solution by duality-based optimization algorithms is direct.
Since we aim to capture diverse applications, such as structural-TV-regularized MR and PET reconstruction, our main duality result will be rather general with technical assumptions. In order to better demonstrate the essence of our result, we first consider the particular case where data fidelity will be guaranteed by a norm discrepancy. For this purpose, consider where p ∈ (1, d * ], (S, · S ) is a Banach space with f ∈ S, K : L p (Ω) → S is a bounded linear operator (i.e., K ∈ L(L p (Ω), S)), J * * corresponds to the structural TV functional as defined above, and λ > 0 is a regularization parameter. Note that, without further assumptions (see Propositions 3.4 and 3.8), we only have J * = J * * * , but J * * is not available explicitly. Hence we are interested in showing equivalence of (5.1) to an appropriate predual problem which only requires J * . We will see that, for general J, this is possible if either c|z| ≤ j(x, z) for every z ∈ R d and for almost every x ∈ Ω, or the inversion of K is essentially well-posed, i.e., K has a closed range and a finite dimensional kernel. Regarding the latter, we note that this is in particular true if we assume K * K to be invertible, with K * being the adjoint of K. A first result for the particular setting of (5.1) is stated next.
Proposition 5.1. Let p ∈ (1, d * ], (S, · S ) a Banach space with f ∈ S, K ∈ L(L p (Ω), S) and λ > 0. Assume that at least one of the following two conditions holds: (i) There exists c > 0 such that c|z| ≤ j(x, z) for every z ∈ R d and for almost every x ∈ Ω.
(ii) Rg(K * ) is closed and ker(K) is finite dimensional.
Then there exists a solution to the primal problem to a corresponding predual problem, and to the saddle-point problem g(x)) ≤ 1 for a.e. x ∈ Ω}. They all coincide and are equivalent in the sense that the pair (p, u) is a solution to the saddle-point problem if and only if u is a solution to (5.2) and p is a solution to the predual problem.
Proof. This is a special case of Theorem 5.4 below. We, hence, refer to the corresponding proof.
We mention that the above result readily carries over to data fidelities of the type Ku−f r S , with r ∈ [1, ∞). Here, often r = 2 is of interest when S is a Hilbert space, or r = p when S = W k,p (Ω), with k ∈ N 0 and p ∈ [1, ∞).
Note that the coercivity assumption (i) of Proposition 5.1 excludes the case where J * * (u) = Ω α d|Du| with vanishing weight α. The following example shows that if the weight is not bounded uniformly away from zero, existence for (5.2) is not guaranteed, in general (not in BV(Ω) but also not even in L 2 (Ω)). This observation justifies assumption (i) very well. , where · := · ∞ + ∇ · ∞ , and K : L 2 (Ω) → S with Note that K is injective, linear and bounded. In fact, concerning the latter observe for an appropriate constant C > 0. Finally choose f = δ 0 , i.e., the Dirac measure at zero. We claim that with this set-up the infimum in (5.4) is zero. Indeed, define u n := 1 meas(Nn) X Nn where N n := x ∈ R 2 : |x| ≤ 1 n . Given that meas(N n ) = π n 2 , we have Now for the fidelity term we have Hence, if there was a minimizerũ ∈ L 2 (Ω), then Kũ should be equal to δ 0 in the sense that . But this is impossible and hence there is no minimizer for the problem (5.4).
Observe that the operator K in the proof of Proposition 5.2 has no closed range. This can be seen easily as for the sequence (u n ) n∈N , in the proof above, we have Ku n − δ 0 S → 0 and δ 0 / ∈ Rg(K). From the closed range theorem we also get that the range of K * is not closed. As K is injective, and, in particular, it then has finite dimensional kernel, all other assumptions of Proposition 5.1 are satisfied. Hence the closed range assumption is tight in the sense that, for non-closed range operators we cannot expect a similar result without further assumptions on the integrand j.
Motivated by the particular case of a norm discrepancy as data fidelity, we now consider a more general setting. In fact, let p ∈ (1, d * ], λ > 0, K ∈ L(L p (Ω), S) with S being a Banach space, and assume D : S → R to be convex and lower semi-continuous. We aim to solve where J * * again corresponds to the structural TV functional as defined above. We recall first the following result which follows from [46,Theorem 19].
Lemma 5.3. Let S be a Banach space and D : S → R convex, lower semi-continuous, and continuous and finite at zero. Further, let K : L p (Ω) → S be a bounded linear operator. Then for all x * ∈ L q (Ω) we have where the minimum is attained. Consequently, it holds that dom((D • K) * ) = K * dom(D * ).
Having these prerequisites at hand, we now arrive at the main duality result of the paper.
Theorem 5.4. Let p ∈ (1, d * ], λ > 0, S a Banach space, and K ∈ L(L p (Ω), S). Further let D : S → R be convex, lower semi-continuous, and continuous and finite at zero. Also, assume that at least one of the following two conditions hold: There exists c > 0 such that c|z| ≤ j(x, z) for every z ∈ R d and for almost every x ∈ Ω, is a vector space and 0 ∈ dom((λD) * ).
(ii) Rg(K * ) is closed, ker(K) is finite dimensional and D is coercive.
Then there exists a solution to the primal problem as well as to the predual problem ) ≤ 1 for a.e. x ∈ Ω}, and to the saddle-point problem Further, these problems all coincide at their optimal values and are equivalent in the sense that the pair (p, u) is a solution to the saddle-point problem (sp) if and only if u is a solution to (P) and p is a solution to (pD).
Before we provide the proof, let us motivate the rather technical assumptions by showing that they exactly reduce to the setting of Proposition 5.1 if the discrepancy term D is a norm: In this setting, we have D(v) = v − f S , which is obviously finite and continuous at zero, and it is coercive in S. Furthermore, from [11,Theorem 4.4.10], coercivity (i.e., the first ingredient of assumption (i)) implies that B (0) ⊂ dom((λD) * ) for some > 0. Hence the union over all positive factors times dom((λD) * ) is all of S * and since P ker(∇) is linear, the second part of (i) also holds. Thus, the remaining assumptions correspond exactly to the assumptions of Proposition 5.1. We remark that the rather technical assumptions above allow to also cover settings like the one for structural TV-regularized PET reconstruction with positivity constraints.
Proof of Theorem 5.4. We first show strict duality. Note that convex conjugation is always carried out in the space where the functional is defined. Define X = W q 0 (div; Ω), Y = L q (Ω), the bounded linear operator T : X → Y , T p = divp, and the functionals F : X → [0, ∞], F (p) = I Q (p), and G : Y → [0, ∞], G(u * ) = (λD • K) * (u * ). Our goal is to show the following duality relation, which also asserts existence of a minimum for the right-hand side: For the right-hand side we get To show the duality relation, according to [6], it suffices to show that is a closed vector space. Note that the second equality holds true due to Lemma 5.3. Now first consider the case that assumption (i) holds. We claim that in this case U = P ker(∇) K * µ≥0 µ dom((λD) * ) + ker(∇) ⊥ and hence, by assumption (i), U is a closed vector space for being the sum of a finite dimensional and a closed vector space. It is clear that U is a subset of the right-hand side since For the reverse inclusion, we employ a result of [37] which states that for any δ > 0 there is an > 0 such that B (0) ∩ ker(∇) ⊥ ⊂ {divg : g ∈ W q 0 (div; Ω), |g(x)| ≤ δ for a.e. x ∈ Ω}. In fact, this was shown in [37] for p = d * , but the extension to 1 < p < d * is direct.
Taking hence sufficiently small such that w = divg with g L ∞ (Ω) ≤ c, we get that w ∈ div(Q). Now define η = min{1/µ, }. Then ηµv ∈ dom((λD) * ) by convexity of (λD) * and assumption (i). Consequently we get as claimed. Before considering condition (ii), we show that V := µ≥0 −µ div(Q) is a vector space. To see this, take λ ∈ R and y 1 := −µ 1 T (z 1 ) and y 2 := −µ 2 T (z 2 ) with z 1 , z 2 ∈ Q and µ 1 , µ 2 ≥ 0. We show that λy 1 + y 2 ∈ V . We note that, by definition, Q is convex and also λz ∈ Q for z ∈ Q and λ ∈ [0, 1]. Further, from assumption (J3) on j, one can check that −z ∈ Q for z ∈ Q. Define η = max{µ 1 , µ 2 } max{|λ|, 1}. Then λµ 1 z 1 η ∈ Q and µ 2 z 2 η ∈ Q. Hence by convexity of Q, we have Suppose now condition (ii) holds. We will show that in this case U can be written as a sum of ker(K) ⊥ and a finite dimensional space. Indeed, first note that, following [14, Section 2.4, Example 2], ker(K) ⊥ is of finite codimension and, consequently, there exists a finite dimensional spaceŨ ⊂ L q (Ω) such that L q (Ω) is the direct sum of ker(K) ⊥ andŨ . Thus we can define the canonical continuous linear projection ontoŨ , PŨ : L q (Ω) →Ũ . We claim now that which is again a closed vector space for being the sum of a closed and a finite dimensional vector space. First we note that by the close range theorem we have ker(K) ⊥ = Rg(K * ), and hence U is included in the right-hand side since To show the other subset inclusion, take u = K * v + PŨ (−µw) with w ∈ div(Q). Again we re-write u = K * v − P ker(K) ⊥ (−µw) − µw = K * ṽ − µw withṽ ∈ S * . Again by [11,Theorem 4.4.10], coercivity of λD implies continuity of (λD) * at 0. Hence there exists > 0 such that B (0) ⊂ dom((λD) * ). Setting λ = min{ / ṽ , 1/µ} we can write 1]. This shows strict duality and existence for (P) under assumption (i) or (ii). Now we show existence for the predual problem. From the continuity of D at 0 and since K is bounded, λD • K is continuous at 0 and again by [11,Theorem 4.4.10] (λD • K) * is coercive in L q (Ω). We show that the set Q is bounded with respect to the L q -norm. For this purpose, we define C = 2γ > 0, where γ is the constant from assumption (J2), and observe that, for any g ∈ Q and almost every x ∈ Ω, Indeed, since by (J2), j(x, z) ≤ γ(1 + |z|), ( * ) holds true for C > γ since then, for |z * | ≤ (C/γ) − 1, j(x, z * ) ≤ γ(1 + |z * |) ≤ C. This implies that g L ∞ (Ω) ≤ C and hence also Q is bounded in L q (Ω). Now taking (p n ) n∈N an infimizing sequence for the predual problem, we get by boundedness of Q and coercivity of (λD • K) * that there existp ∈ L q (Ω, R d ) and w ∈ L q (Ω) such that, up to subsequences, p n p and div(p n ) w. This implies that p ∈ W q (div; Ω) and divp = w. Further we note that {(g, divg) : g ∈ W q 0 (div; Ω) ∩ Q} is a convex and closed subset of L q (Ω, R d+1 ), hence it is also weakly closed. By weak convergence of (p n , divp n ) to (p, divp) and lower semi-continuity of (λD • K) with respect to weak convergence in L q (Ω, R d ) it follows thatp is a solution to (pD).
Finally, [23, Proposition III.3.1] guarantees equivalence to the saddle-point problem (sp) as claimed and the proof is complete.
This concludes the main existence and duality result of the paper, from which the application to linear inverse problems with norm discrepancy follows as a special case. A second situation we want to consider in more detail is an inverse problem where the measured data describes the physical density of some quantity and is corrupted by Poisson noise. The main application we have in mind for this setting is PET image reconstruction, where the Poisson log-likelihood and positivity constraints are used for data fidelity.
where K is a linear operator (a slightly modified Radon transform), f ≥ 0 is the given data, c 0 > 0 is an estimate for measurements due to scattering and random events and Σ is a subset of R n d with some n d ≥ 1. The function I [0,∞) constrains the unknown to the non-negative reals. Note that in PET imaging with real data, the estimate c 0 of scatter and random events is an integral part of the (so-called) reconstruction pipeline, as it describes a non-negligible part of the data. Such an estimate is typically delivered by the scanner software as preprocessing step. More specifically, in what follows we invoke the following data assumptions: (iv) the constant functions are contained in Rg(K * ).
When reading the above integrability assumptions on the data and the scatter and random events, one has to keep in mind that typically f describes the Radon transform of some density which is supported in the interior of Ω and c 0 some defects that increase with the density, but are present on every measurement line. We believe that in such a context, the posed assumptions are realistic and not very restrictive. Clearly, they are satisfied if we assume f and c 0 to be bounded and c 0 to be uniformly bounded away from zero, which is often assumed in the literature for the measured data in order to obtain stability results [48]. In this respect one has to keep in mind, however, that the Radon transform of any function which is bounded above will converge to zero for measurement lines whose length converges to zero. Hence we believe that there is a benefit in using the assumption (AP) rather than uniform boundedness away from zero.
Regarding the assumptions on the forward operator K, we recall in the following some basic properties of the Radon transform, which in particular show that for such a K assumption (AP) is fulfilled. In fact, the classical Radon transform R is a bounded linear operator from L 1 (R d ) to L 1 (S d−1 ×R); see for instance [42]. Here S d−1 ×R is equipped with the measure µ := H d−1 S d−1 ×L. In this context, dσ denotes integration with respect the measure µ. Define now the bounded linear operator E : L p (Ω) → L 1 (R d ) as the extension-by-zero outside Ω. Then, if K := R • E we have that K is linear, it maps L p (Ω) to L 1 (Σ), with Σ a bounded subset of S d−1 × R, and it is bounded, since for every u ∈ L p (Ω) According to [28], the adjoint R * : L ∞ (Σ) → L ∞ (R d ) of the Radon transform is given by One can easily see now that K * : L ∞ (Σ) → L q (Ω) is simply the restriction of R * in Ω, since for every u ∈ L p (Ω) and every v ∈ L ∞ (Σ) we have In this case, the constant functions belong to Rg(K * ). Moreover, from the definition of the Radon transform it follows immediately that Ku ≥ 0 for u ≥ 0 with both inequalities to be understood in the almost everywhere sense. Hence, assumption (AP) holds true when K is the Radon transform. Note also that the assumption on Σ is valid since due to Ω being bounded, for every u ∈ L 1 (Ω), we have that Ku is supported in a fixed compact subset of Σ. Now considering the formal minimization problem for Poisson-corrupted data (5.8) in view of our general duality result, some issues arise. First of all we need to rigorously define the data discrepancy term as a function from L p (Ω) to the extended reals, and secondly neither the data discrepancy nor the positivity constraint will be continuous in L p (Ω) around 0.
The following modification of the data term resolves some of these issues without changing the original problem. For f ∈ [0, ∞) and c 0 ∈ (0, ∞) we define the integrand and for f : Σ → [0, ∞) and c 0 : Σ → (0, ∞) the functional where we set L := 1 − f c 0 ∞ + 1 and v − := min{v, 0} in a pointwise almost everywhere sense. The point in the definition of D KL is that we change the original data fidelity only in points where Ku is negative, which, however, can never occur due to the positivity constraint in u and Ku ≥ 0 for u ≥ 0. Hence, when considering the minimization problem it is immediate that the set of optimal solutions is exactly the same as for (5.8). The modified fidelity D KL enjoys the following properties.
Lemma 5.5. Assume that (AP) holds. Then D KL is convex and continuous in L 1 (Σ). Further there exist constants M, N > 0 such that Proof. Regarding convexity, we note that v → v − L 1 (Σ) is convex and hence it suffices to show convexity of the integrand z → l f (σ),c 0 (σ) (z) for σ ∈ Σ fixed. To this aim, we note that convexity is equivalent to the derivative of l f (σ),c 0 (σ) being monotonously increasing. The latter is indeed true since for z > 0 and z < 0 the integrand is convex and since, as can be readily checked, the left and right derivative of l f (σ),c 0 (σ) (·) at z = 0 coincide. Regarding continuity, for v ∈ L 1 (Σ) we denote Σ 1 = {σ ∈ Σ : v(σ) ≥ 0}, Σ 2 = Σ \ Σ 1 and estimate Hence D KL is bounded above in a neighborhood of any point and, for being a convex function [11,Prop. 4.1.4] it is also continuous at any point.
To show the coercivity estimate, first pick any Further, for v ∈ L 1 (Σ) with v ≥ 0 the Poisson log-likelihood can be estimated below as follows [10,44]: It is easy to check by differentiating that the function is convex and attains its minimum value of 0 at t = 1.
Now taking the square root on both sides, integrating and using the Cauchy-Schwarz inequality we get that Further, using that v + c 0 In the other case we get Hence in any case there exists constants M > 0, N ∈ R such that Now splitting an arbitrary v ∈ L 1 (Σ) into its positive and negative parts, and using the estimates (5.10), (5.11) accordingly, we get that there exists which completes the proof.
Using these properties of the data discrepancy, we obtain the following existence result by standard arguments.
Proposition 5.6. Assume that (AP) is fulfilled and there exists c > 0 such that c|z| ≤ j(x, z) for every z ∈ R d and for almost every x ∈ Ω. Then there exists a solution to Proof. We only provide a sketch: Let (u n ) n∈N be an infimizing sequence for (5.12). Since ker(K) = Rg(K * ) ⊥ [14, Corollary 2.18], K does not vanish on constant functions. Hence, we get from the equivalence of J * * to TV and the coercivity of D KL (Lemma 5.5) as well as norm equivalence in finite dimensional spaces that (u n ) n∈N is bounded in L p (Ω). Hence there exists a weakly convergent subsequence and from weak lower semi-continuity of all terms appearing in the objective functional, existence follows readily.
For applying the general duality result of Theorem 5.4 to the situation of Poisson noise, the only issue that remains is the fact that functional realizing the positivity constraint is not continuous at zero. To overcome this, we will introduce a slight modification of this functional as follows. We define the penalty term The following properties of H are immediate. Replacing I [0,∞) by H obviously is modification of the original problem for which, in general, we cannot expect that the set of solution coincides with the one of the original problem. However, H is chosen in the spirit of exact penalty functions. For the latter it can be shown that for sufficiently large penalty parameter, the solution of the original problem also solves the penalized one. We also note that our goal is to show equivalence of (5.9) to a saddle-point problem, which is then numerically solved with a primal-dual algorithm. For the original problem, the positivity constraint would then result in a projection of the unknown to the positive reals in every iteration. The modified objective results in a soft-shrinkage operation for the negative values at each iteration, i.e., u is unchanged at positive values and replaced by u(x) = min{u(x) + M, 0} for negative values.
We finally get the following result for the modified PET problem: The optimal values of all of these problems coincide, and the problems are equivalent in the sense that a pair (p, u) is a solution to the saddle-point problem if and only if u is a solution to (P-PET) and p is a solution to ( pD-PET).
Remark 5.9. Summarizing, both for norm and Poison-log likelihood discrepancies, in the situation where the underlying inverse problem is indeed ill-posed, coercivity of the integrand as it appears in the definition of the structural TV prior is necessary for showing existence and for establishing our main duality result. As in this situation the structural TV prior is topologically equivalent to isotropic TV, we note that also classical stability and convergence results for vanishing noise levels can be shown by standard techniques. We refer, for instance, to [36] for the norm-discrepancy case and to [48] for the case of a Poison-log likelihood discrepancy.

Numerical examples
The goal of this section is to exemplary show the numerical realization of two applications of the above general setting. The first one considers a pure denoising problem. This example is included to provide a setting where the inversion of the forward operator, in this case the identity, is well-posed and hence, using the second assumption of Proposition 5.1, the duality result also holds for integrands which vanish on non-trivial sets. The second problem considers the practically more relevant application of MR-guided PET reconstruction and shows how recently proposed regularization approaches [49,21] can be realized within our framework.
In an abstract setting, the variational problem setting of the previous section can be written as follows: min where G and H are smooth and "simple" functions, respectively, that are to be specified for the concrete problem setting, e.g., one chooses G(u) = u − f 2 2 , H ≡ 0 for denoising. We have shown that, under suitable assumptions, this problem is equivalent to the saddle-point problem The main point in this reformulation is that it allows to solve the variational problem without requiring explicit knowledge of J * * . We recall that our main results contains a saddle-point formulation whose numerical realization is directly amenable to first-order primal-dual algorithms; see, e.g., [17,18]. In fact, since for p = 2 the saddle-point problem is defined in a Hilbert space, a corresponding formulation of algorithm can even be shown to be convergent in the infinite dimensional setting [18]. In practice, however, a proper infinite-dimensional version of the algorithm would require the computation of some proximal mappings in H(div; Ω) space, which in turn requires the solution of a non-trivial problem (essentially TV denoising) on its own. For the derivation of an implementable algorithm, we first discretize and change to the Euclidean norm for both underlying spaces. The iteration steps of the algorithm presented in [18] for solving the saddle-point problem above can then be specified in a general form as follows: Given some triple (u, p,p), the updates u + , p + ,p + are computed as Here, prox σ,H denotes the proximal mapping defined as prox σ,H (u) = arg min y u−y 2 2 2σ + H(y), which is well-defined by convexity of H. The term proj Q is a projection onto the set Q and results from the proximal mapping of I Q , and σ, τ > 0 need to be suitably chosen. This iteration is then repeated with (u, p,p) := (u + , p + ,p + ) until some stopping rule is satisfied.
6.1. Weighted TV with vanishing weight. For the case of weighted TV denoising one can simply choose H ≡ 0 and G(u) := λ 2 u − f 2 L 2 (Ω) , with f ∈ L 2 (Ω) some given data. This results in the saddle-point problem (6.2) inf For a proof of concept, we compute a weighted TV-based denoising example, with vanishing continuous weight function α, i.e., j(x, z) = α(x)|z|, with α ∈ C(Ω), α ≥ 0. This fits into the regime of condition (ii) of Theorem 5.4, i.e., K = id, hence Rg(K * ) is closed and ker(K) is finite dimensional and the fact that α vanishes implies that the coercivity assumption (i) of the same theorem does not hold. Notice that because of this lack of coercivity we cannot expect the solution u to be in BV(Ω). Thus, the characterization J * * (u) = Ω α d|Du| is not valid here. This deficiency, however, can be overcome by solving the corresponding saddle-point problem.
After a standard discretization procedure the saddle-point problem (6.2) can be solved straightforwardly with the algorithm in (6.1), where the involved proximal mappings prox σ,H and proj Q can be computed explicitly and pointwise. We use standard forward differences for the discrete gradient with pixel replication at the boundary. The discrete divergence operator is defined by the adjoint relation ∇ = −div T . We use an accelerated version of the primal-dual algorithm of [17], as described in [18], where for the step-sizes τ and σ, the update rule τ n = 1/(n + 1), σ n = 8/τ is employed, with τ 0 = 10 −4 and σ 0 = 8 × 10 4 . Here, n is the iteration number and 8 is an upper bound for the squared norm of the discrete gradient operator, i.e., ∇ 2 ≤ 8. The primal and dual variables were initialized as u 0 = f and p 0 = 0 respectively. As a stopping rule we used a maximum number of iterations (n = 2000), after which no considerable change in the iterates was observed.
Regarding the weight function α, we note that our purpose here is not to determine a procedure for automatically selecting it -for that we refer the reader to [33,35]. Nevertheless, in order to produce the weight function α needed for the examples in Figure 1 we enable the following procedure: Initially we detect the edges of the noisy image in Figure 1b  Then we construct a positive continuous weight function α which takes the value zero exactly on the edges of the image. This can be done by computing the distance function to the edge set, which can be achieved for instance by solving the Eikonal equation [50]; see Figure 1d. The fact that α vanishes exactly at the edge set leads to a piecewise constant reconstruction with less loss of contrast and better edge preservation when compared to the standard scalar TV-regularization; compare Figures 1e and 1f. See also [30] for a theoretical justification of this experimental observation.  6.2. MR-PET reconstruction. Magnetic resonance imaging (MRI) and positron emission tomography (PET) are two complementary imaging techniques that are both intensively used in clinical applications. While MRI allows to resolve soft tissue contrast with comparably high spatial resolution, standard MRI techniques deliver qualitative information only. In contrast, PET imaging suffers from a poor spatial resolution, but it is able to deliver quantitative information (the distribution of a radioactively marked tracer). In the past years, and recently particularly motivated by the availability of joint MR-PET scanners, there has been a substantial research effort towards combining MR and PET data in order to obtain benefits for the ill-posed image reconstruction process; see for instance the references provided in [21]. While some works aim at a simultaneous reconstruction of MR and PET [22,40,43], aiming to obtain mutual benefits for both modalities, most approaches employ an MR-based structural prior to improve upon the poor spatial resolution of PET [21,49,52]. Our work here provides an analytical basis for the latter. We note that our theory covers different priors that were proposed for structure-guided PET reconstruction in [21,49].
In particular, it allows to conclude that well-posedness of the PET reconstruction problem can only be expected if the prior satisfies the coercivity condition stated in Proposition 5.8. We now exemplary work out the application of our framework to one particular prior, that is motivated by [20,49]. Assume that we are given an already reconstructed MR image v defined on a joint image domain Ω. We supppose that ∇v ∈ L 1 loc (Ω, R 2 ). We then define the structural TV-based prior for PET reconstruction as follows. For parameters η, ν > 0 with η ∈ (0, 1) and ν small we define where √ · refers to the matrix square root, and the integrand It is easy to see that j fulfills the assumptions (J1)-(J3). Hence, we can define the structural prior and consider its lower semi-continuous relaxation for regularization. The effect of this prior can be understood by re-writing This shows that, at any point x ∈ Ω, the cost of j(x, z) depends on the extent to which z is aligned with w(x). In particular, the smallest cost will appear if z and w(x) are parallel. Regarding the parameters η and ν, we note that ν is used to carry out a regularized division in the normalization of w, as suggested for instance in [21]. The parameter η ∈ (0, 1) does not appear in previous works and we use it to ensure coercivity of j, and hence that J * * indeed has a regularizing effect. In fact, noting that j(x, z) = (|z| 2 − (z, w(x)) 2 )η 2 + (1 − η 2 )|z| 2 ≥ 1 − η 2 |z|, we obtain coercivity of j and hence, by Proposition 5.8, well-posedness of the structural-prior based PET reconstruction problem given as Further, equivalence and well-posedness of a corresponding saddle-point problem can be concluded in function space and, due to topological equivalence of J * * with TV, standard stability results can be expected; see Remark 5.9. Algorithmic realization. In order to derive a numerical algorithm for solving the structural-priorbased PET reconstruction problem in two dimensions, we now replace the image domain Ω and the data domain Σ by a discretized pixel grid, that is, we set Ω = R N Ω ×M Ω and Σ = R N Σ ×M Σ with N Ω , M Ω , N Σ , M Σ ∈ N. For simplicity, we keep the notation of the continuous setting, where now all involved operators, integrals and norms are replaced by their discrete counterparts, using standard discretizations. In particular, for the forward operator for PET, we use the implementation of [41], which is designed to work with real scanner data and includes resolution modeling by a convolution with a Gaussian kernel.
In order to carry out the iteration steps as described in (6.1) and obtain a convergent algorithm in the discretized setting, we need that the derivative of the data term, on which we perform an explicit descent step, is Lipschitz continuous. To achieve this, we carry out a C 2 instead of a C 1 interpolation of the Poisson-log-likelihood in regions with negative data. In fact, for f ∈ [0, ∞) and c 0 ∈ (0, ∞), we define the modified integrand and a modified data term where, for a small parameter > 0, g is a smoothed 1-norm for the negative values given by In this context, it is important to remember that any change of D KL that only affects negative values of v does not change the optimal solutions as long as Au is constrained to be positive at every point. In order to simplify the computation of proj Q in the iteration (6.1), we further carry out a change of variables in the saddle-point reformulation of (6.3) as p = A T q, where A T q is defined as pointwise matrix-vector multiplication A T q(x) = A T (x)q(x). The resulting saddle-point problem then reads where (prox σ,H (u))(x) = max{u(x), min{u(x) + σM, 0}} is a soft-thresholding operator and (proj {|·| 2 ≤1} (q))(x) = q(x)/ max{1, |q(x)| 2 }. Following [18], convergence of the algorithm can be guaranteed under the step-size constraints 1 τ ( 1 σ − L) ≥ div K T . Hence, we analytically estimate both div A T and L, with the latter being the Lipschitz constant of the derivative of λ D KL . However, in order to accelerate convergence in practice, we multiply the analytical estimate of L by 10 −3 , which increases the admissible step-size. Even tough convergence naturally cannot be guaranteed for this increase step-size choice, we did not experience any convergence issue in practice, which might indicate that our analytical estimate of L are too conservative.
To evaluate the effect of structural coupling compared to standard TV-regularization, we simulated a 2D PET and MR scan using a slightly modified version of the brain phantom of [8]; see Figure 2. In fact, we used the software provided in [8] to simulate an MR image with MPRAGE contrast and a FDG-PET image. Both images share similar structures, but they also contain separate features. In addition, a separate lesion was added to each image, in the top right area for the PET image and in the top left area for the MR image, see corresponding red squares, (see [40] for more details on quantitative values for the PET phantom). Further, to avoid a bias resulting from piecewise constancy, we added two different linear grayscale gradients in the non-background region of both images.
The MR image was then used as structural prior to define the regularization term J as above. PET data, denoted above by f , was generated by forward-projecting the PET phantom, adding simulated random and scatter events (denoted above by c 0 ) and adding multiplicative Poisson-noise. Two different noise levels (strong and medium) were simulated. For comparison, we also carried out a standard TV-regularized reconstruction. The reconstructions for both methods were obtained by iterating the steps provided in (6.4), where for standard TV-regularization A was replaced by the identity. In order to ensure optimality, we carried out 10 4 iterations for each method, while in practice about 1000 iterations seem sufficient to obtain close-to-optimal results. For both TVand structural TV-regularization we tested a range of different regularization parameters (λ) and provide here the results with the lowest mean-squared error.
The resulting images can be seen in Figure 3. As one would expect, for both noise levels, the structural-prior based reconstruction is able to recover sharper edges whenever they are aligned with the prior image. This results also in improved MSE values. In the PET only lesion, it can be observed that both TV and structural TV obtain similar results. In the region of the PET image where the MR prior has an additional feature, no particular feature transfer can be observed. This indicates a certain robustness of the approach with respect to non-aligned features between the prior and the reconstructed image. We emphasize again that the results presented here should be understood as a proof-of-concept only, and we refer to [21,49] for a detailed evaluation of this kind of prior for PET reconstruction, also on 3D PET patient data.

Conclusion
In this work, we have analyzed a particular class of structural-prior-based regularization approaches for linear inverse problems in function space. While these approaches have recently been successfully applied to several practically relevant inverse problem settings, an analysis in function space was still missing. Our results show that, in function space, a certain regularity of the prior information in space is necessary in order to obtain an explicit representation of the regularizer. To account for this fact, we have shown how the original minimization problem can alternatively be solved using a saddle-point reformulation that does not require explicit knowledge of the prior. In that context, our main message is that coercivity of the prior is necessary in order to obtain wellposedness (and, consequently, stability by standard results) in non-trivial inverse problem settings. Ultimately, we have shown how to numerically solve the proposed saddle-point reformulation for two proof-of-concept applications including the practically relevant application of structure-guided PET reconstruction.
This follows directly from Proposition 4.2 and the fact that u ∈ BV(Ω δ ).