Variational regularisation for inverse problems with imperfect forward operators and general noise models

Abstract We study variational regularisation methods for inverse problems with imperfect forward operators whose errors can be modelled by order intervals in a partial order of a Banach lattice. We carry out analysis with respect to existence and convex duality for general data fidelity terms and regularisation functionals. Both for a priori and a posteriori parameter choice rules, we obtain convergence rates of the regularised solutions in terms of Bregman distances. Our results apply to fidelity terms such as Wasserstein distances, φ-divergences, norms, as well as sums and infimal convolutions of those.


Introduction
We consider linear inverse problems
$$ Au = \bar f, \tag{1.1} $$
where $A : X \to Y$ is a linear bounded operator (referred to as the forward operator or the forward model) acting between two Banach spaces $X$ and $Y$. The exact measurement $\bar f$ is typically
not available and only a noisy version of it, $f^\delta$, is known along with an estimate of the noise level $\delta$. Since the inversion of (1.1) is often unstable with respect to noise and hence ill-posed, it requires regularisation. Variational regularisation replaces solving (1.1) by the following optimisation problem
$$ \min_{u \in X} \; H(Au \,|\, f^\delta) + \alpha J(u), \tag{1.2} $$
where $H(\cdot\,|\,f)$ is a so-called data fidelity function that models statistical properties of the noise in $f$ and $J(\cdot)$ is a regularisation functional that stabilises the inversion. The regularisation parameter $\alpha > 0$ balances the influence of the data fidelity and the regularisation. The amount of noise $\delta$ in the measurement $f^\delta$ is assumed to be such that
$$ H(\bar f \,|\, f^\delta) \leq \delta. $$
The fidelity function often depends only on the difference of the arguments, i.e. $H(v\,|\,f) = h(v - f)$ for some function $h$. The most common example is $H(v\,|\,f) = \frac12 \|v - f\|^2$. There are, however, cases when the fidelity function depends on its arguments in a more complicated manner; an example is the Kullback–Leibler divergence that is used to model Poisson noise [1], where
$$ H(v\,|\,f) = \int_\Omega \Big( v \log \frac{v}{f} - (v - f) \Big) \, \mathrm{d}x $$
(see also the review paper [2]). Problems with general fidelity functions were analysed in [3, 4].
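To make (1.2) concrete in the simplest classical case, the following minimal Python sketch (with a hypothetical operator and data; $J(u) = \frac12\|u\|^2$ and the squared-norm fidelity, neither of which is prescribed above) computes the regularised solution in closed form via the normal equations $(A^\top A + \alpha I)u = A^\top f^\delta$.

```python
import numpy as np

# Minimal sketch of (1.2) with H(v|f) = 0.5*||v - f||^2 and J(u) = 0.5*||u||^2
# (classical Tikhonov regularisation). Operator and data are hypothetical.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) / np.sqrt(n)
u_true = np.sign(np.sin(np.linspace(0, 4 * np.pi, n)))
f_noisy = A @ u_true + 1e-2 * rng.standard_normal(n)

def tikhonov(A, f, alpha):
    # First-order optimality of (1.2): (A^T A + alpha*I) u = A^T f.
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ f)

for alpha in [1e-4, 1e-2, 1.0]:
    print(alpha, np.linalg.norm(tikhonov(A, f_noisy, alpha) - u_true))
```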
To guarantee convergence of the minimisers of (1.2) to a solution of (1.1) as the noise level $\delta$ decreases, the regularisation parameter $\alpha$ needs to be chosen as a function of the measurement noise, $\alpha = \alpha(\delta)$ (a priori parameter choices), or of the measurement itself and of the measurement noise, $\alpha = \alpha(f^\delta, \delta)$ (a posteriori parameter choices). For a priori parameter choice rules, convergence rates for solutions of (1.2) in different scenarios have been obtained, e.g., in [5-9]. A classical a posteriori parameter choice rule is the so-called discrepancy principle, originally introduced in [10] and later studied in, e.g., [11-13]. Roughly speaking, it consists in choosing $\alpha = \alpha(f^\delta, \delta)$ such that the following equation is satisfied:
$$ \|A u_\alpha - f^\delta\| = \tau\delta, \qquad \tau \geq 1, \tag{1.3} $$
where $u_\alpha$ is the solution of (1.2) corresponding to the regularisation parameter $\alpha$.
In many applications, not only the measurement $f^\delta$ is noisy, but also the forward operator $A$ that generated the data is not precisely known. Errors in the operator may come from uncertainty in model-related parameters such as the point-spread function of a microscope, simplified model geometry and/or discretisation. A classical approach to modelling errors in the forward operator assumes an error estimate in the operator norm, i.e.
$$ \|A - A_h\| \leq h, \tag{1.4} $$
where $A_h : X \to Y$ is a linear bounded operator that we have numerical access to and $h \geq 0$ describes the approximation error (e.g., [14-17]). To guarantee convergence in this setting, the parameter $\alpha$ needs to be chosen as a function of $\delta$ and $h$ (a priori choice rules) or of $\delta$, $h$, $f^\delta$ and $A_h$ (a posteriori choice rules). Generalisations of the discrepancy principle to this setting are available [18-20], but they usually rely on a triangle inequality that $H(\cdot\,|\,f)$ needs to satisfy. An alternative approach to modelling operator errors using order intervals in Banach lattices was proposed in [21-23]. It assumes that the spaces $X$ and $Y$ have a lattice structure [24] and that, instead of (1.4), lower and upper bounds for the operator are available,
$$ A_l \leq A \leq A_u, \tag{1.5} $$
where the inequalities are understood in the sense of a partial order for linear operators, i.e.
$$ A_l u \leq Au \leq A_u u \quad \text{for all } u \geq 0. \tag{1.6} $$
The inequalities in (1.6) are understood in the abstract sense of a Banach lattice, which for $L^p$ spaces means inequality almost everywhere. In order for the partial order bounds (1.5) to be well defined, we assume that $A : X \to Y$ is a regular operator [24], i.e. that it can be written as a difference of two positive operators, $A = A_1 - A_2$, where for any $u \geq 0$ it holds that $A_{1,2}\, u \geq 0$. Some examples of regular operators will be given later.

The approach (1.6) to describing errors in the forward operator was studied in the context of the residual method in the case $Y = L^\infty$ when the data fidelity is the characteristic function of a norm ball,
$$ H(v\,|\,f^\delta) = \begin{cases} 0, & \|v - f^\delta\|_\infty \leq \delta, \\ +\infty, & \text{else}. \end{cases} \tag{1.7} $$
In this case, one solves the following problem
$$ \min_{u \in X} \; J(u) \quad \text{s.t.} \quad A_l u \leq f_u, \quad A_u u \geq f_l, \tag{1.8} $$
where $f_l := f^\delta - \delta\mathbb{1}$ and $f_u := f^\delta + \delta\mathbb{1}$ are pointwise (a.e.) lower and upper bounds for the exact data $\bar f$ in (1.1) such that $f_l \leq \bar f \leq f_u$, and $\mathbb{1}$ is the constant one-function. For comparison, with the data term (1.7) and without an operator error, (1.2) translates into
$$ \min_{u \in X} \; J(u) \quad \text{s.t.} \quad f_l \leq Au \leq f_u, $$
where the constraint is equivalent to $\|Au - f^\delta\|_\infty \leq \delta$. (In [25], a connection is made between the lower and upper bounds $f_l, f_u$ and confidence intervals.)

One can show that the partial order based condition (1.5) implies the norm based condition (1.4). Indeed, given $A_l, A_u$ as in (1.5), one defines
$$ A_h := \frac{A_l + A_u}{2}, \qquad h := \frac{\|A_u - A_l\|}{2}. $$
It can be readily verified that the so defined $A_h$ satisfies (1.4); a numerical illustration is given at the end of this introduction. The opposite implication is, in general, wrong. Hence, if an estimate (1.5) is available, it allows one to describe the operator error more precisely and one may expect better reconstructions. Indeed, it was found in [23] that solving (1.2) with $H(Au\,|\,f^\delta) = \|Au - f^\delta\|_\infty$ and $\alpha$ chosen according to a generalised discrepancy principle [18] based on (1.4) produces overregularised solutions compared to (1.8), i.e. the generalised discrepancy principle tends to overestimate the regularisation parameter. One of the reasons for this is the use of the triangle inequality to account for (1.4), which makes the estimates not sharp, in general.

The motivation for this paper is two-fold. First, we want to extend the approach (1.5) and (1.8) to a broader class of fidelity terms than the characteristic function of a ball and to more general data spaces than $L^\infty$. We also aim at a unified analysis of problems with fidelities that do not satisfy a triangle-type inequality, which is interesting in its own right. Our proofs mostly rely on convex analysis and duality.

Setup. We consider the inverse problem (1.1), where $X = U^*$ and $Y = V^*$ are duals of Banach lattices $U$ and $V$, respectively. We assume that the partial order on $Y$ is induced by the partial order in $V$ as follows:
$$ v \geq 0 \quad :\iff \quad \langle v, \varphi \rangle \geq 0 \quad \text{for all } \varphi \in V, \ \varphi \geq 0. $$
We denote by $u^\dagger_J$ a $J$-minimising solution of (1.1),
$$ u^\dagger_J \in \operatorname{arg\,min} \{ J(u) \colon Au = \bar f \}, \tag{1.10} $$
and study the following regularised problem:
$$ \min_{u \in X, \, v \in Y} \; J(u) + \frac{1}{\alpha} H(v\,|\,f^\delta) \quad \text{s.t.} \quad A_l u \leq v \leq A_u u, \tag{1.11} $$
where $J : X \to \overline{\mathbb{R}}_+$ and $H(\cdot\,|\,f) : Y \to \overline{\mathbb{R}}_+$ (as a function of its first argument) are assumed proper, convex and weakly-* lower semicontinuous (cf assumption 1).

Main contribution. In this work we study convergence of solutions of (1.11) to a $J$-minimising solution of (1.1) as the noise in the data and the operators decreases, and obtain convergence rates in one-sided Bregman distances with respect to $J$. We also give conditions under which (1.11) admits strong duality, in which case the convergence rates translate to symmetric Bregman distances. Furthermore, we analyse an a posteriori parameter choice rule based on a discrepancy principle for (1.11).
Our results apply inter alia to general $\varphi$-divergences, such as the Kullback–Leibler divergence, and to coercive fidelities such as powers of norms or Wasserstein distances from optimal transport. In addition, we also obtain rates for sums and infimal convolutions of different fidelities, as used for instance in mixed-noise removal. Even for exact operators, our analysis goes beyond the state of the art in problems with fidelity terms that lack a triangle-type inequality.

Structure of the paper. In section 2 we study existence of solutions of the problem (1.11) and its dual and establish sufficient conditions for strong duality. In section 3 we derive convergence rates for a priori parameter choice rules. In section 4 we formulate a discrepancy principle for the problem (1.11) and also obtain convergence rates. For the readers' convenience, we present some background material on Banach lattices in the appendix.
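As a numerical illustration of the implication (1.5) $\Rightarrow$ (1.4) discussed above, the following sketch (hypothetical matrices standing in for the operators, entrywise order standing in for the lattice order, and the $L^\infty \to L^\infty$ operator norm) builds $A_h = (A_l + A_u)/2$ and $h = \|A_u - A_l\|/2$ and verifies $\|A - A_h\| \leq h$.

```python
import numpy as np

# Hypothetical discrete setting: matrices with entrywise (lattice) order,
# operator norm = L^inf -> L^inf norm (maximal absolute row sum).
rng = np.random.default_rng(1)
A = rng.random((20, 20))
E1 = 0.05 * rng.random((20, 20))     # entrywise non-negative perturbations
E2 = 0.05 * rng.random((20, 20))
A_l, A_u = A - E1, A + E2            # A_l <= A <= A_u entrywise, i.e. (1.6) for u >= 0

A_h = (A_l + A_u) / 2
h = np.linalg.norm(A_u - A_l, np.inf) / 2

# (1.4) holds for the midpoint operator, since |A - A_h| <= (A_u - A_l)/2 entrywise:
print(np.linalg.norm(A - A_h, np.inf) <= h)   # True
```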

Examples of regular operators
Below, we give some examples of regular operators and discuss how lower and upper bounds in the sense of (1.5) and (1.6) can be obtained.

Example 1.1. If $Y$ is an abstract maximum space (a generalisation of $L^\infty$) or if $X$ is an abstract Lebesgue space (a generalisation of $L^1$), then all linear bounded operators are regular, i.e. they can be written as a difference of two positive operators. More details can be found in the appendix.

Example 1.2 (Integral operators: perturbations of the kernel). Let
$$ (Au)(x) = \int_\Omega k(x, \xi)\, u(\xi)\, \mathrm{d}\xi \tag{1.12} $$
be an integral operator with a $(p, q)$-bounded kernel $k$ [26]. The operator $A$ can be written as $A = A_+ - A_-$, where
$$ (A_\pm u)(x) = \int_\Omega k_\pm(x, \xi)\, u(\xi)\, \mathrm{d}\xi $$
and $k_+$ and $k_-$ are the positive and the negative parts of $k$ (in the a.e. sense in $\Omega \times \Omega$). Clearly, $A_\pm$ are positive and $A$ is regular.
Suppose that the kernel is corrupted by an unknown $(p, q)$-bounded perturbation such that we only know pointwise lower and upper bounds for $k$,
$$ k_l(x, \xi) \leq k(x, \xi) \leq k_u(x, \xi) \quad \text{a.e. in } \Omega \times \Omega. \tag{1.13} $$
Then lower and upper operators in the sense of (1.5) are given by
$$ (A_l u)(x) = \int_\Omega k_l(x, \xi)\, u(\xi)\, \mathrm{d}\xi, \qquad (A_u u)(x) = \int_\Omega k_u(x, \xi)\, u(\xi)\, \mathrm{d}\xi. $$
It should be noted that the bounds (1.13) are of a deterministic nature. They could arise, for example, if the kernel depends on additional parameters $\theta \in \Theta$, i.e. $k(x, \xi) = k_\theta(x, \xi)$. If reconstructing the unknown parameter $\theta$ is not of independent interest, the dependence on it can be eliminated by defining
$$ k_l(x, \xi) := \inf_{\theta \in \Theta} k_\theta(x, \xi), \qquad k_u(x, \xi) := \sup_{\theta \in \Theta} k_\theta(x, \xi), $$
provided the suprema and infima are finite for a.e. $x, \xi$ and $k_{l,u}$ are $(p, q)$-bounded.
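A minimal discrete sketch of example 1.2, assuming a hypothetical one-parameter Gaussian kernel family $k_\theta$: pointwise infima/suprema over a grid of $\theta$ yield the kernel bounds (1.13), and the resulting matrices satisfy $A_l u \leq Au \leq A_u u$ entrywise for non-negative $u$.

```python
import numpy as np

# Discretised example 1.2 with a hypothetical kernel family
# k_theta(x, xi) = exp(-(x - xi)^2 / (2*theta^2)), theta in [0.1, 0.2].
x = np.linspace(0, 1, 100)
w = 1.0 / len(x)                               # quadrature weight
thetas = np.linspace(0.1, 0.2, 21)

d2 = (x[:, None] - x[None, :]) ** 2
kernels = np.stack([np.exp(-d2 / (2 * t ** 2)) for t in thetas])

K_l, K_u = kernels.min(axis=0), kernels.max(axis=0)   # pointwise bounds (1.13)
K = kernels[7]                                        # the (unknown) true kernel

u = np.abs(np.sin(3 * np.pi * x))                     # a non-negative test function
print(np.all(w * K_l @ u <= w * K @ u) and np.all(w * K @ u <= w * K_u @ u))  # True
```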

Example 1.3 (Integral operators: discretisation).
Let the operator $A$ be as defined in example 1.2 on an interval $\Omega \subset \mathbb{R}$ and consider its approximation by Riemann sums. In particular, for $u \geq 0$ let $S^l_n(x)$ and $S^u_n(x)$ denote the lower and upper Riemann sums of the integral in (1.12) obtained using an $n$-point discretisation. Then these sums define lower and upper operators in the sense of (1.5),
$$ (A^l_n u)(x) := S^l_n(x) \leq (Au)(x) \leq S^u_n(x) =: (A^u_n u)(x). $$
As we refine the discretisation (i.e. $n \to \infty$), these bounds converge pointwise to $(Au)(x)$ (a numerical sketch is given after example 1.4).

Example 1.4 (Integration with respect to a vector-valued measure). Example 1.2 can be generalised as follows. Let $\mu \in \mathcal{M}(\Omega; Y)$ be a vector-valued Radon measure [27], where $\Omega$ is a compact metric space and $Y$ is a Banach lattice with the Radon–Nikodým property. Define a partial order on $\mathcal{M}(\Omega; Y)$ as follows:
$$ \mu_1 \leq \mu_2 \quad :\iff \quad \mu_1(E) \leq \mu_2(E) \quad \text{for all Borel sets } E \subseteq \Omega. \tag{1.14} $$
Let $A : C(\Omega) \to Y$ be defined as follows:
$$ Au := \int_\Omega u \, \mathrm{d}\mu. $$
Since $Y$ is a lattice, it is clear that $A$ is regular. Lower and upper bounds $\mu_l \leq \mu \leq \mu_u$ in the sense of (1.14) define lower and upper operators $A_{l,u}$ in the sense of (1.5).
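Returning to example 1.3, a minimal sketch (hypothetical kernel and non-negative test function) of the lower/upper Riemann-sum bracket for $(Au)(x)$ at a fixed point $x$, tightening as $n$ grows:

```python
import numpy as np

# Example 1.3: lower/upper Riemann sums of (Au)(x) = \int_0^1 k(x, xi) u(xi) dxi
# for fixed x, with a hypothetical kernel k and non-negative u.
k = lambda x, xi: np.exp(-(x - xi) ** 2 / 0.02)
u = lambda xi: 1.0 + np.sin(2 * np.pi * xi) ** 2     # u >= 0

def riemann_bracket(x, n, oversample=200):
    # Approximate the inf/sup of the integrand on each cell by a fine sub-grid
    # (a rigorous bound would use, e.g., monotonicity or Lipschitz constants).
    edges = np.linspace(0, 1, n + 1)
    lo, up = 0.0, 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        xi = np.linspace(a, b, oversample)
        g = k(x, xi) * u(xi)
        lo += g.min() * (b - a)
        up += g.max() * (b - a)
    return lo, up

x0 = 0.3
for n in [10, 40, 160]:
    print(n, riemann_bracket(x0, n))   # the bracket shrinks around (Au)(x0)
```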

Example 1.5 (1D source identification). We consider the operator
$$ (Au)(x) = \int_0^x \frac{U(y)}{a(y)} \, \mathrm{d}y, \qquad U(y) := \int_0^y u(z)\, \mathrm{d}z, $$
where $a$ is a positive diffusivity. Clearly, $A \geq 0$ and hence regular. Hence, if $\underline{a}, \overline{a} : [0, 1] \to \mathbb{R}$ are continuous functions such that $\underline{a} \leq a \leq \overline{a}$ on $[0, 1]$ and $\underline{a} \geq a_0 > 0$ on $[0, 1]$, we can define operators
$$ (A_l u)(x) := \int_0^x \frac{U(y)}{\overline{a}(y)} \, \mathrm{d}y, \qquad (A_u u)(x) := \int_0^x \frac{U(y)}{\underline{a}(y)} \, \mathrm{d}y. $$
For $u \geq 0$, the antiderivative $U$ is continuous and non-negative, and one can approximate the integrals in $A_l$ and $A_u$ with lower and upper Riemann sums, respectively. This gives rise to operators $A^l_n$ and $A^u_n$ such that $A^l_n \leq A_l \leq A \leq A_u \leq A^u_n$. If additionally $n \to \infty$, the operators $A^{l,u}_n$ converge to $A$. Note that a similar approach can be used for estimating the diffusivity $a$ for a given source term. In this case, however, the forward operator becomes non-linear. This would require an extension of our theory.

Example 1.6 (Conditional expectations). Let $\Omega$ be a separable metric space, let $\{E_i\}_{i \in \mathbb{N}}$ be a countable partition of $\Omega$ into Borel sets and let $\mu$ be a finite positive measure on $\Omega$. Consider the conditional expectation
$$ (Au)(x) := \frac{1}{\mu(E_i)} \int_{E_i} u \, \mathrm{d}\mu, \qquad x \in E_i, $$
under the convention $0/0 = 0$. Clearly, $A \geq 0$ and hence regular. If we allow $\mu$ to be a finite signed measure, then we can generalise the definition as follows:
$$ (Au)(x) := \frac{1}{|\mu|(E_i)} \int_{E_i} u \, \mathrm{d}\mu, \qquad x \in E_i, $$
where $|\mu|$ is the total variation of $\mu$. Clearly, $A = A_+ - A_-$, where $A_\pm$ are defined analogously using the positive and negative parts $\mu_\pm$ of $\mu$, and $A_\pm \geq 0$, hence $A$ is regular. In contrast to example 1.4, partial order bounds on $\mu$ in the sense of (1.14) do not translate into lower and upper bounds (1.6) for $A$, since $A$ is not an integral operator (in particular, it is not linear in $\mu$).
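A small discrete sketch of example 1.6 (assuming the partition-based form above, with hypothetical weights): the conditional expectation is a block-averaging matrix, which is manifestly positive and hence regular.

```python
import numpy as np

# Discrete conditional expectation over a partition of {0,...,8} into 3 blocks,
# with hypothetical positive weights mu.
blocks = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
rng = np.random.default_rng(2)
mu = rng.random(9) + 0.1

A = np.zeros((9, 9))
for B in blocks:
    A[np.ix_(B, B)] = mu[B] / mu[B].sum()   # block-wise weighted average

u = rng.standard_normal(9)
print((A @ np.maximum(u, 0) >= -1e-12).all())  # positivity: u >= 0 -> Au >= 0
print(np.allclose(A @ np.ones(9), 1.0))        # A preserves constants
```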

Primal and dual problems
In this section we establish existence of solutions to (1.11) using the direct method, where standard assumptions on the forward operators, the regularisation functional and the fidelity function guarantee coercivity and lower semicontinuity. Subsequently, we derive the dual maximisation problem and prove existence of a dual solution and strong duality under the additional assumption that the data space $Y$ is an abstract maximum space.

Existence of a primal solution
We make the following standard assumptions on the regularisation functional $J$, the fidelity function $H$ and the operators $A_{l,u}$.

Assumption 1.
• The regularisation functional $J : X \to \overline{\mathbb{R}}_+$ is proper, convex and weakly-* lower semicontinuous, with weakly-* sequentially compact non-empty sublevel sets;
• the fidelity function $H$ is proper, convex in its first argument and weakly-* lower semicontinuous jointly in both arguments.

Assumption 2. The operators $A_{l,u} : X \to Y$ are weakly-* to weakly-* continuous.

A sufficient condition for assumption 2 to hold is given in lemma A.5 in the appendix.

Theorem 2.1. Let assumptions 1 and 2 hold and suppose that (1.11) is feasible with a finite objective value. Then (1.11) admits a solution.

Proof. Consider a minimising sequence $(u_k, v_k)$. Due to assumption 1 there exists a weakly-* convergent subsequence $u_k$ (that we do not relabel) such that $u_k \rightharpoonup^* u_\infty$. Then assumption 2 yields
$$ A_{l,u}\, u_k \rightharpoonup^* A_{l,u}\, u_\infty. $$
From the constraints in (1.11) we get that $A_l u_k \leq v_k \leq A_u u_k$, and hence
$$ \|v_k\| \leq \|A_l u_k\| + \|A_u u_k\| \quad \text{for all } k, $$
which is bounded uniformly in $k$ since weakly-* convergent sequences are bounded.
Since $Y$ is the dual of a separable Banach space $V$, by the sequential Banach–Alaoglu theorem the sequence $v_k$ contains a weakly-* convergent subsequence $v_k$ (that we do not relabel) such that $v_k \rightharpoonup^* v_\infty$.
Since both $A_{l,u}\, u_k$ and $v_k$ converge weakly-* and order intervals in $Y$ are weakly-* closed due to lemma A.4, we obtain that
$$ A_l u_\infty \leq v_\infty \leq A_u u_\infty. $$
Hence $(u_\infty, v_\infty)$ is feasible for (1.11). Furthermore, since $J(\cdot)$ and $H(\cdot\,|\,f)$ are weakly-* lower semicontinuous, we get that $(u_\infty, v_\infty)$ is a solution of (1.11).

Dual problem
To simplify our notation, we introduce operators $B : X \to Y \times Y$ and $E : Y \to Y \times Y$,
$$ Bu := (A_l u, \, -A_u u), \qquad Ev := (v, \, -v), $$
so that the constraints in (1.11) read $Bu - Ev \leq 0$. With this notation we can rewrite (1.11) as follows:
$$ \min_{u \in X, \, v \in Y} \; J(u) + \frac{1}{\alpha} H(v \,|\, f^\delta) \quad \text{s.t.} \quad Bu - Ev \leq 0. \tag{2.3} $$

Proposition 2.2. The dual problem of (2.3) is given by
$$ \sup_{\mu \geq 0} \; -J^*(-B^*\mu) - \frac{1}{\alpha} H^*(\alpha E^*\mu \,|\, f^\delta). \tag{2.4} $$

Proof. The Lagrangian of (2.3) is given by
$$ \mathcal{L}(u, v; \mu) = J(u) + \frac{1}{\alpha} H(v \,|\, f^\delta) + \langle \mu, Bu - Ev \rangle, \qquad \mu \geq 0. $$
Taking an infimum over $(u, v)$ and then a supremum over $\mu \geq 0$ gives (2.4).
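For completeness, the standard Fenchel-conjugate computation behind proposition 2.2, written in the notation introduced above:
$$ \inf_{u, v} \mathcal{L}(u, v; \mu) = \inf_u \big( J(u) + \langle B^*\mu, u \rangle \big) + \frac{1}{\alpha} \inf_v \big( H(v \,|\, f^\delta) - \langle \alpha E^*\mu, v \rangle \big) = -J^*(-B^*\mu) - \frac{1}{\alpha} H^*(\alpha E^*\mu \,|\, f^\delta). $$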
It is well known (e.g., [28]) that $\sup\,(2.4) \leq \min\,(2.3)$, which is referred to as weak duality.

Remark 2.3.
If the fidelity function depends only on the difference of its arguments, i.e. $H(v\,|\,f) = h(v - f)$ for some convex function $h$, then its convex conjugate with respect to the first argument satisfies
$$ H^*(q\,|\,f) = h^*(q) + \langle q, f \rangle. $$

Existence of a dual solution and strong duality
The goal of this section is to study the relationship between the primal problem (2.3) and its dual (2.4): we establish strong duality and existence of a dual solution, and obtain complementarity conditions for the Lagrange multipliers associated with the constraints in (2.3). We will need the following result from [28, theorem 2.165].

Theorem 2.4 ([28]). Consider the following optimisation problem
$$ \min_{x \in X} \; g(x) + \chi_K(Lx) \tag{P} $$
and its dual
$$ \max_{y^* \in Y^*} \; -g^*(-L^* y^*) - \chi^*_K(y^*), \tag{D} $$
where $X$ and $Y$ are Banach spaces, $L : X \to Y$ is a linear bounded operator, $L^*$ its adjoint, $K \subset Y$ a closed convex set, and $g : X \to \overline{\mathbb{R}}$ a proper convex lower semicontinuous function with convex conjugate $g^* : X^* \to \overline{\mathbb{R}}$. The characteristic function of $K$ is denoted by $\chi_K(\cdot)$ and its convex conjugate (i.e. the support function of $K$) by $\chi^*_K(\cdot)$. Suppose that the following regularity condition is satisfied:
$$ 0 \in \operatorname{int}\big( L\, \mathrm{dom}(g) - K \big). \tag{2.6} $$
Then there is no duality gap between problems (P) and (D). If the optimal value of (P) is finite, then the dual problem (D) has at least one solution $\bar y^* \in Y^*$.
The regularity condition (2.6) is due to Robinson [30] and plays an important role in the stability of optimisation problems under perturbations of the feasible set [28].
To ensure that (2.6) is satisfied in the primal problem (2.3), we will need to assume that the positive cone in $Y$ has a non-empty interior. This naturally leads to the concept of abstract maximum spaces [24], which are a generalisation of $L^\infty(\Omega)$.
Definition 2.5. A Banach lattice $Y$ is called an AM-space (abstract maximum space) if $\|x \vee y\| = \max(\|x\|, \|y\|)$ for all $x, y \geq 0$. An element $\mathbb{1} \geq 0$ with $\|\mathbb{1}\| = 1$ is called a unit if $|x| \leq \|x\|\, \mathbb{1}$ for all $x \in Y$. Here $x \vee y$ and $|x|$ denote the usual supremum and absolute value of elements in a Banach lattice (cf appendix).
Theorem 2.6. Let $Y$ be an AM-space with unit $\mathbb{1}$ and suppose that there exist $u_0 \in \mathrm{dom}(J)$ and $v_0 \in \mathrm{dom}(H(\cdot\,|\,f))$ such that
$$ A_l u_0 + \varepsilon\mathbb{1} \leq v_0 \leq A_u u_0 - \varepsilon\mathbb{1}, $$
where $\varepsilon > 0$ is a constant. Then Robinson's condition (2.6) is satisfied in the primal problem (2.3).
Proof. In the notation of theorem 2.4, we have $x = (u, v)$, $g(u, v) = J(u) + \frac{1}{\alpha} H(v\,|\,f)$, $L(u, v) = Bu - Ev$ and $K = \{ (y_1, y_2) \in Y \times Y \colon y_{1,2} \leq 0 \}$. Take an arbitrary $y = (y_1, y_2) \in Y \times Y$ with $\|y\| \leq \varepsilon$. Without loss of generality we can choose the norm on $Y \times Y$ to be $\|y\| = \max(\|y_1\|, \|y_2\|)$. Hence, the definition of the unit implies $-\varepsilon\mathbb{1} \leq y_{1,2} \leq \varepsilon\mathbb{1}$.

To show Robinson regularity, we need to write $y$ as
$$ y = L(u, v) + z = (A_l u - v + z_1, \; v - A_u u + z_2) \tag{2.7} $$
for some $u \in \mathrm{dom}(J)$, $v \in \mathrm{dom}(H(\cdot\,|\,f))$ and $z = (z_1, z_2) \in Y \times Y$, $z_{1,2} \geq 0$. Take $u = u_0$ and $v = v_0$. Then
$$ z_1 := y_1 - (A_l u_0 - v_0) \geq -\varepsilon\mathbb{1} + \varepsilon\mathbb{1} = 0, \qquad z_2 := y_2 - (v_0 - A_u u_0) \geq 0, $$
and we can take $z_{1,2}$ as above to represent $y$ as in (2.7).

Theorem 2.8 (optimality conditions). Suppose that strong duality holds between (2.3) and (2.4), let $(u, v)$ solve (2.3) and let $\mu$ solve (2.4), i.e.
$$ J(u) + \frac{1}{\alpha} H(v\,|\,f) = -J^*(-B^*\mu) - \frac{1}{\alpha} H^*(\alpha E^*\mu\,|\,f). \tag{2.8} $$
Then $-B^*\mu \in \partial J(u)$, $\alpha E^*\mu \in \partial H(v\,|\,f)$ and $\langle \mu, Bu - Ev \rangle = 0$.

Proof. Using the Fenchel–Young inequality, strong duality (2.8) and the feasibility of $(u, v)$, we obtain
$$ 0 = \big[ J(u) + J^*(-B^*\mu) + \langle B^*\mu, u \rangle \big] + \frac{1}{\alpha} \big[ H(v\,|\,f) + H^*(\alpha E^*\mu\,|\,f) - \langle \alpha E^*\mu, v \rangle \big] - \langle \mu, Bu - Ev \rangle, $$
where each of the three terms on the right-hand side is non-negative. Hence, equality holds everywhere and we get that $-B^*\mu \in \partial J(u)$ and $\alpha E^*\mu \in \partial H(v\,|\,f)$.
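A small numerical check of theorem 2.8 (a sketch with hypothetical data and operators; $J(u) = \|u\|_1$ and a squared-norm fidelity, chosen here only for convenience): we solve a discretised (2.3) with cvxpy and inspect the constraint multipliers.

```python
import cvxpy as cp
import numpy as np

# Discretised (2.3): min J(u) + (1/alpha) H(v|f)  s.t.  Al u <= v <= Au u,
# with J = l1 norm, H = 0.5*||.-f||^2, and hypothetical operators/data.
rng = np.random.default_rng(3)
n, alpha = 30, 0.1
A = rng.random((n, n)) / n
Al, Au = A - 0.01, A + 0.01
f = A @ np.maximum(rng.standard_normal(n), 0) + 0.01 * rng.standard_normal(n)

u, v = cp.Variable(n), cp.Variable(n)
constr = [Al @ u <= v, v <= Au @ u]
prob = cp.Problem(cp.Minimize(cp.norm1(u) + 0.5 / alpha * cp.sum_squares(v - f)), constr)
prob.solve()

mu1, mu2 = constr[0].dual_value, constr[1].dual_value   # multipliers, mu >= 0
print(min(mu1.min(), mu2.min()) >= -1e-8)               # dual feasibility
print(abs(mu1 @ (Al @ u.value - v.value)))              # ~0: complementarity
print(abs(mu2 @ (v.value - Au @ u.value)))              # ~0: complementarity
```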

Convergence analysis
Having investigated well-posedness of the primal and dual problems, we can now prove convergence rates of solutions as the noise in the data and the operator tends to zero. To this end we consider sequences $f_n$, $A_l^n$, $A_u^n$ and noise levels $\delta_n \to 0$, $\eta_n \to 0$ such that
$$ A_l^n \leq A \leq A_u^n, \tag{3.1a} $$
$$ \|A_u^n - A_l^n\| \leq \eta_n, \tag{3.1b} $$
$$ H(\bar f \,|\, f_n) \leq \delta_n, \tag{3.1c} $$
and corresponding sequences $(u_n, v_n)$ and $\mu_n$ which solve problems (2.3) and (2.4), respectively. We are interested in studying the behaviour of $(u_n, v_n)$ as $n \to \infty$ and would like to prove that $u_n$ converges to a $J$-minimising solution $u^\dagger_J$ (cf (1.10)) whereas $v_n$ approaches the exact data $\bar f$. For symmetric fidelities, the order of the arguments of $H$ does not matter in (3.1c). For asymmetric fidelities such as the Kullback–Leibler divergence it does. If we think of the Kullback–Leibler divergence $D_{\mathrm{KL}}(p\,|\,q)$ as the amount of information lost by using $q$ instead of $p$ (see [31]), then it actually makes sense to choose $H(\bar f \,|\, f_n)$ in (3.1c), i.e. to measure the amount of information lost by using the noisy measurement $f_n$ instead of the exact one $\bar f$.
We start with results that do not require the existence of a dual solution and are valid under general assumptions (cf theorem 2.1).

Convergence of primal solutions
We consider a sequence of primal problems (2.3) where $B_n : X \to Y \times Y$ is defined as follows:
$$ B_n u := (A_l^n u, \, -A_u^n u). $$
Under assumptions 1 and 2, we obtain the following standard result.

Theorem 3.2. Let assumptions 1 and 2 hold, let (3.1) be satisfied and let $\alpha_n$ be chosen such that $\alpha_n \to 0$ and $\delta_n / \alpha_n \to 0$. Then the solutions $u_n$ of (2.3) are bounded and contain a subsequence converging weakly-* to a $J$-minimising solution of (1.1).

Proof. Comparing the value of the objective function at the optimum $(u_n, v_n)$ and at $(u_J^\dagger, \bar f)$ (which is a feasible point for all $n$), we get
$$ J(u_n) + \frac{1}{\alpha_n} H(v_n \,|\, f_n) \leq J(u_J^\dagger) + \frac{1}{\alpha_n} H(\bar f \,|\, f_n) \leq J(u_J^\dagger) + \frac{\delta_n}{\alpha_n} $$
and
$$ J(u_n) \leq J(u_J^\dagger) + \frac{\delta_n}{\alpha_n}. $$
Since $\delta_n / \alpha_n \to 0$, the value on the right-hand side is bounded uniformly in $n$. Hence, since sublevel sets of $J$ are weakly-* sequentially compact, $u_n$ contains a weakly-* convergent subsequence (that we do not relabel) that converges to some $u_\infty \in X$, $u_n \rightharpoonup^* u_\infty$.

To obtain convergence rates, one typically requires a source condition (e.g., [6, 33, 34]); we will use the variant from [6], which in our notation can be written as follows.

Assumption 3 (source condition). There exists $\mu^\dagger \geq 0$ such that
$$ p^\dagger := -B^* \mu^\dagger \in \partial J(u_J^\dagger), \qquad \text{where } Bu := (Au, \, -Au). \tag{3.5} $$

Convergence rates in a one-sided Bregman distance.
We start with a convergence rate in a one-sided Bregman distance $D^{p^\dagger}_J$, where $p^\dagger := -B^*\mu^\dagger$ is the subgradient from the source condition (3.5).

Theorem 3.5. Let the assumptions of theorem 2.1 and assumption 3 be satisfied and let (3.1) hold. Then the following estimate holds:
$$ D^{p^\dagger}_J(u_n, u_J^\dagger) \leq \frac{\delta_n}{\alpha_n} + \frac{1}{\alpha_n} \Big( H^*(\alpha_n E^*\mu^\dagger \,|\, f_n) - \alpha_n \langle E^*\mu^\dagger, \bar f \rangle \Big) + C \eta_n, $$
where $C$ depends only on $\|\mu^\dagger\|$ and $\sup_n \|u_n\|$.

Proof. Using $p^\dagger = -B^*\mu^\dagger$, $B u_J^\dagger = E \bar f$, (3.1b) and the feasibility of $(u_n, v_n)$, i.e. $B_n u_n \leq E v_n$, we obtain
$$ D^{p^\dagger}_J(u_n, u_J^\dagger) = J(u_n) - J(u_J^\dagger) + \langle \mu^\dagger, B u_n \rangle - \langle E^*\mu^\dagger, \bar f \rangle \leq J(u_n) - J(u_J^\dagger) + \langle E^*\mu^\dagger, v_n \rangle - \langle E^*\mu^\dagger, \bar f \rangle + C \eta_n, $$
and therefore, since $J(u_n) - J(u_J^\dagger) \leq \frac{1}{\alpha_n} \big( \delta_n - H(v_n \,|\, f_n) \big)$ by the minimality of $(u_n, v_n)$,
$$ D^{p^\dagger}_J(u_n, u_J^\dagger) \leq \frac{\delta_n}{\alpha_n} + \frac{1}{\alpha_n} \big( \langle \alpha_n E^*\mu^\dagger, v_n \rangle - H(v_n \,|\, f_n) \big) - \langle E^*\mu^\dagger, \bar f \rangle + C \eta_n. $$
By the Fenchel–Young inequality, the term in the brackets is bounded by $H^*(\alpha_n E^*\mu^\dagger \,|\, f_n)$, hence the assertion follows.

Convergence rates in a symmetric Bregman distance.
Under a stronger assumption that Y is an AM-space (cf theorem 2.6), we can obtain an estimate in a symmetric Bregman distance.
Theorem 3.6. Let the assumptions of theorem 2.6 and assumption 3 be satisfied and let (3.1) hold. Then the estimate of theorem 3.5 holds for the symmetric Bregman distance, i.e.
$$ D^{\mathrm{symm}}_J(u_n, u_J^\dagger) \leq \frac{\delta_n}{\alpha_n} + \frac{1}{\alpha_n} \Big( H^*(\alpha_n E^*\mu^\dagger \,|\, f_n) - \alpha_n \langle E^*\mu^\dagger, \bar f \rangle \Big) + C \eta_n, $$
where $C$ depends on $\|\mu^\dagger\|$, $\|\mu_n\|$ and $\|u_n\|$, the latter being bounded due to theorem 3.2.

Proof. By theorem 2.6 strong duality holds and the dual problem (2.4) admits a solution $\mu_n$, so that $p_n := -B_n^*\mu_n \in \partial J(u_n)$ by theorem 2.8. From the Fenchel–Young inequality and theorem 2.8 we get that the additional term $\langle p^\dagger - p_n, u_J^\dagger - u_n \rangle$ can be estimated in the same way as in theorem 3.5, which yields the desired estimate upon dividing by $\alpha_n$.

Applications to different fidelity terms
To apply theorems 3.5 or 3.6, we need to study the term $H^*(\alpha_n E^*\mu^\dagger \,|\, f_n) - \alpha_n \langle E^*\mu^\dagger, \bar f \rangle$ separately for each fidelity term.

$\varphi$-divergences. A $\varphi$-divergence between probability measures $\rho, \nu$ is defined as
$$ D_\varphi(\rho \,|\, \nu) := \int_\Omega \varphi\Big( \frac{\mathrm{d}\rho}{\mathrm{d}\nu} \Big) \, \mathrm{d}\nu \quad \text{if } \rho \ll \nu, \qquad D_\varphi(\rho \,|\, \nu) := +\infty \text{ else}, $$
where $\varphi : (0, \infty) \to \mathbb{R}$ is convex and $\varphi(1) = 0$. We refer to [35] for many examples and fundamental properties of $\varphi$-divergences. Since $\rho$ and $\nu$ have unit mass, the function $\varphi$ is only determined up to the additive term $c(x - 1)$ for $c \in \mathbb{R}$. In particular, since $\varphi$ is convex and meets $\varphi(1) = 0$, it is straightforward to see that one can always find a suitable $c \in \mathbb{R}$ such that $\varphi(x) + c(x - 1) \geq 0$ for all $x > 0$. Hence, we will without loss of generality assume that $\varphi \geq 0$. We take $Y = \mathcal{M}(\Omega)$ to be the space of Radon measures on $\Omega$ equipped with the total variation norm and consider
$$ H(\rho \,|\, f) = \begin{cases} D_\varphi(\rho \,|\, f), & \rho \in \mathcal{P}(\Omega), \\ +\infty, & \text{else}, \end{cases} \tag{3.12} $$
where $\mathcal{P}(\Omega) \subset \mathcal{M}(\Omega)$ is the set of probability measures and $f \in \mathcal{P}(\Omega)$. The convex conjugate of $H(\rho \,|\, f)$ can then be estimated for any $h \in C(\Omega)$.
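As an illustration of (3.12) in the discrete case (a sketch with hypothetical data; $f$ and $v$ are probability vectors and $H$ is the Kullback–Leibler divergence, one of the $\varphi$-divergences above), the constrained problem (1.11) can be solved directly with an exponential-cone solver:

```python
import cvxpy as cp
import numpy as np

# Discrete (1.11) with a KL fidelity on the probability simplex and J = l1 norm.
# Operators and data are hypothetical stand-ins.
rng = np.random.default_rng(4)
m, alpha = 20, 0.05
A = rng.random((m, m)); A /= A.sum(axis=0)     # column-stochastic, positive
Al, Au = 0.98 * A, 1.02 * A                    # Al <= A <= Au for u >= 0
u_true = rng.random(m); u_true /= u_true.sum()
f = A @ u_true
f = np.maximum(f + 0.01 * rng.standard_normal(m), 1e-6); f /= f.sum()

u, v = cp.Variable(m, nonneg=True), cp.Variable(m, nonneg=True)
# cp.kl_div(v, f) = v*log(v/f) - v + f elementwise; summed over v, f in the
# simplex it equals the Kullback-Leibler divergence KL(v|f).
objective = cp.Minimize(cp.norm1(u) + (1 / alpha) * cp.sum(cp.kl_div(v, f)))
constraints = [Al @ u <= v, v <= Au @ u, cp.sum(v) == 1]
cp.Problem(objective, constraints).solve(solver=cp.SCS)
print("KL(v|f) =", cp.sum(cp.kl_div(v, f)).value)
```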

Remark 3.8 (Poisson noise).
The main motivation for the use of the Kullback–Leibler divergence as a fidelity term is the modelling of Poisson noise [1]. If $t$ denotes the exposure time, the measured data can be assumed to be generated by a Poisson process with intensity $t\bar f$. In this case, an upper bound on the error in the Kullback–Leibler divergence is given in [36]. While this estimate is sufficient to obtain convergence rates in the deterministic setting, the statistical setting requires further assumptions, in particular some concentration inequalities [2, 36, 37].

Strongly coercive fidelity terms.
Theorem 3.9. Suppose that the fidelity function $H$ is coercive in the following sense:
$$ H(v \,|\, f) \geq C \|v - f\|_Y^\lambda \quad \text{for all } v, f \in Y, \tag{3.21} $$
where $\lambda \geq 1$ and $C > 0$ are constants (we will assume without loss of generality that $C = 1$). Then, under the assumptions of theorem 3.5, the following convergence rate holds:
$$ D^{p^\dagger}_J(u_n, u_J^\dagger) \leq \frac{\delta_n}{\alpha_n} + C \big( \delta_n^{1/\lambda} + \alpha_n^{\lambda^* - 1} + \eta_n \big), \qquad \lambda^* := \frac{\lambda}{\lambda - 1}, $$
where $p^\dagger = -B^*\mu^\dagger$ is the subgradient from assumption 3. If $\alpha_n$ is chosen such that $\alpha_n \sim (\delta_n)^{1/\lambda^*}$, then
$$ D^{p^\dagger}_J(u_n, u_J^\dagger) = \mathcal{O}\big( \delta_n^{1/\lambda} + \eta_n \big). $$
If $Y$ is an AM-space (cf theorem 3.6), the same rate holds for the symmetric Bregman distance $D^{\mathrm{symm}}_J(u_n, u_J^\dagger)$.

Proof. Since convex conjugation is order-reversing, from (3.21) we obtain that for any $q \in Y^*$ (we will drop the subscripts $Y$ and $Y^*$ on the norms to simplify notation)
$$ H^*(q \,|\, f) \leq \langle q, f \rangle + c \|q\|^{\lambda^*}, $$
where $\lambda^* = \frac{\lambda}{\lambda - 1}$. We will consider the cases $\lambda > 1$ and $\lambda = 1$ separately. Let $\lambda > 1$. Then from theorem 3.5 we obtain
$$ D^{p^\dagger}_J(u_n, u_J^\dagger) \leq \frac{\delta_n}{\alpha_n} + \langle E^*\mu^\dagger, f_n - \bar f \rangle + c\, \alpha_n^{\lambda^* - 1} \|E^*\mu^\dagger\|^{\lambda^*} + C \eta_n. $$
Condition (3.21) implies that $\|f_n - \bar f\| \leq C \delta_n^{1/\lambda}$. Hence, using the Cauchy–Schwarz inequality, we obtain
$$ \langle E^*\mu^\dagger, f_n - \bar f \rangle \leq \|E^*\mu^\dagger\| \, \|f_n - \bar f\| \leq C \delta_n^{1/\lambda}. $$
Let now $\lambda = 1$. Then for sufficiently small $\alpha_n \leq \frac{1}{\|E^*\mu^\dagger\|}$ we obtain from theorem 3.5 that $H^*(\alpha_n E^*\mu^\dagger \,|\, f_n) = \langle \alpha_n E^*\mu^\dagger, f_n \rangle$, and the estimate holds without the $\alpha_n^{\lambda^* - 1}$ term (exact penalisation).

Sums and infimal convolutions of fidelities. Consider now fidelities of the form $H = H_1 \,\square\, H_2$, i.e. infimal convolutions (in the first argument) of two fidelity functions $H_1$ and $H_2$. Such fidelities were studied, e.g., in [43] and make it possible to handle data from different modalities simultaneously. Furthermore, in [44-46] fidelities of $L^1 + L^2$-type were analysed and used for image restoration in the presence of mixed Gaussian and impulse noise. If $H_1$ and $H_2$ are proper, it holds that
$$ (H_1 \,\square\, H_2)^* = H_1^* + H_2^*. $$
Furthermore, under the hypothesis that $H_1$ is coercive, $H_2$ is bounded from below, and both are weakly-* lower semicontinuous convex functions, it holds that $H$ is weakly-* lower semicontinuous, proper, and exact (see [49] for the statement and [50] for a proof on Hilbert spaces which generalises to Banach spaces). The latter means that the infimum in the definition of $H$ is attained. In particular, there are $\bar g, \bar h \in Y$ such that $\bar f = \bar g + \bar h$ and the value of $H$ at $\bar f$ splits into the corresponding values of $H_1$ and $H_2$. Furthermore, from (3.30) we get the corresponding estimate for the conjugate term. A discrete sketch of such a mixed fidelity is given below.
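A discrete sketch of an infimal-convolution fidelity of $L^1 + L^2$-type (hypothetical data; the splitting variables $g$, $h$ realise the infimum in the definition of $H$):

```python
import cvxpy as cp
import numpy as np

# Discrete (1.11) with H(v|f) = inf_{g+h = v-f} ||g||_1 + (beta/2)*||h||^2,
# a mixed impulse+Gaussian noise model; operators and data are hypothetical.
rng = np.random.default_rng(5)
m, alpha, beta = 25, 0.1, 10.0
A = rng.random((m, m)) / m
Al, Au = A - 0.005, A + 0.005
f = A @ np.maximum(rng.standard_normal(m), 0)
f[rng.integers(0, m, 3)] += 1.0               # sparse outliers (impulse noise)
f += 0.01 * rng.standard_normal(m)            # small Gaussian noise

u, v = cp.Variable(m), cp.Variable(m)
g, h = cp.Variable(m), cp.Variable(m)         # inf-convolution splitting
objective = cp.Minimize(
    cp.norm1(u) + (1 / alpha) * (cp.norm1(g) + 0.5 * beta * cp.sum_squares(h))
)
constraints = [g + h == v - f, Al @ u <= v, v <= Au @ u]
cp.Problem(objective, constraints).solve()
print("impulse part:", np.round(g.value, 3))  # should pick up the outliers
```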

Discrepancy principle
When the operator is known exactly, Morozov's discrepancy principle [10, 33] can be used to select the regularisation parameter $\alpha_n$. In the case of a squared norm fidelity $H(v\,|\,f) = \|v - f\|^2$, this amounts to selecting $\alpha_n$ such that
$$ \|A u^{\alpha_n}_n - f_n\|^2 = \tau \delta_n, \tag{4.1} $$
where $u^{\alpha_n}_n$ is the regularised solution corresponding to the regularisation parameter $\alpha_n$ and $\tau > 1$ is a parameter. Here we assume that $\|\bar f - f_n\|^2 \leq \delta_n$ (and not $\|\bar f - f_n\|^2 \leq \delta_n^2$) to be consistent with our earlier notation. Convergence rates for this choice of $\alpha_n$ in the case of an exact operator and an arbitrary convex regularisation functional were obtained in [11]. For the data fidelity given by the Kullback–Leibler divergence, the discrepancy principle is studied in [13].

In the case of an imperfect operator, the discrepancy principle needs to be modified. When the operator error is measured using the operator norm, i.e. one assumes that an approximate operator $A_h$ with $\|A - A_h\| \leq h$ is available, one can choose $\alpha_n$ as follows [15] (in the case of a squared norm fidelity in the Hilbert space setting):
$$ \|A_h u^{\alpha_n}_n - f_n\| = \tau \big( \delta_n^{1/2} + h \|u^{\alpha_n}_n\| \big). \tag{4.2} $$
If the fidelity term is not based on a norm and does not satisfy the triangle inequality, such a generalisation is not available. Since in our case the operator error is explicitly accounted for through the constraints in (2.3), we can use the discrepancy principle in its original form (4.1) with an arbitrary fidelity term. We will choose $\alpha_n$ such that
$$ H(v^{\alpha_n}_n \,|\, f_n) = \tau \delta_n, \tag{4.3} $$
where $v^{\alpha_n}_n$ solves (2.3) with the regularisation parameter $\alpha_n$ and $\tau > 1$ is a parameter.

Remark 4.1.
If the solution $v^{\alpha_n}_n$ is unique, then we have
$$ H(v^{\alpha_n}_n \,|\, f_n) = \min \big\{ H(v \,|\, f_n) \colon (u, v) \text{ solves (2.3) with } \alpha = \alpha_n \big\}. \tag{4.4} $$
In case of non-uniqueness, we can always choose a solution $v^{\alpha_n}_n$ such that (4.4) is satisfied, following the argument in [12, proposition 3.5–remark 3.8] and using the convexity of the objective function in (2.3).

Existence
In this section we study well-posedness of the discrepancy principle, meaning that there is a regularisation parameter $\alpha_n$ which meets (4.3). Let $(u_\alpha, v_\alpha)$ be a solution of (2.3) corresponding to the parameter $\alpha > 0$. Define the following functions:
$$ j(\alpha) := J(u_\alpha), \qquad h(\alpha) := H(v_\alpha \,|\, f_n). $$

Lemma 4.2. The function $j(\alpha)$ is monotone non-increasing and $h(\alpha)$ is monotone non-decreasing in $\alpha$.
Proof. The proof is similar to [51].

Remark 4.3.
If either $H(\cdot\,|\,f_n)$ or $J(\cdot)$ is strictly convex, then $h(\alpha)$ and $j(\alpha)$ are indeed uniquely defined (the argument is similar to [38]). Otherwise the lemma applies to $H(v_\alpha\,|\,f_n)$ and $J(u_\alpha)$ for any solution $(u_\alpha, v_\alpha)$ of (2.3).

Lemma 4.5. The functions $h(\alpha)$ and $j(\alpha)$ are lower semicontinuous.

Proof. We just sketch the proof. Letting $\alpha_k \to \alpha$, one can easily see that the corresponding solutions $(u_k, v_k)$ converge (up to a subsequence) weakly-* to $(u, v)$, which solve the problem for $\alpha$. Hence, by the lower semicontinuity of $H$ and $J$ the assertion follows.

Theorem 4.6. Suppose that
$$ \lim_{\alpha \to \infty} h(\alpha) \geq C \delta_n \tag{4.7} $$
for some constant $C > 1$. Then there exists $\alpha_n > 0$ such that the discrepancy principle (4.3) is satisfied for any $\tau \in (1, C)$.

Proof. For every $\alpha > 0$, because of the feasibility of $(u^\dagger_J, \bar f)$, we get
$$ j(\alpha) + \frac{1}{\alpha} h(\alpha) \leq J(u^\dagger_J) + \frac{\delta_n}{\alpha}, $$
and in particular $h(\alpha) \leq \delta_n + \alpha J(u^\dagger_J)$ for almost all $\alpha > 0$. Letting $\alpha \downarrow 0$ we obtain, using the monotonicity of $h$, that
$$ \lim_{\alpha \downarrow 0} h(\alpha) \leq \delta_n. \tag{4.6} $$
On the other hand, by assumption (4.7) it holds that $\lim_{\alpha \to \infty} h(\alpha) \geq C\delta_n$. Hence, in light of (4.6) and (4.7), and the monotonicity of $h$, there exists $\alpha_n > 0$ such that
$$ \sup_{\alpha < \alpha_n} h(\alpha) \leq \tau \delta_n \leq h(\alpha_n), $$
and $\tau$ can be chosen in $(1, C)$. Since $h$ is lower semicontinuous according to lemma 4.5, we get that $\sup_{\alpha < \alpha_n} h(\alpha) = \tau\delta_n$, which proves the assertion.

Remark 4.7.
The assumption of theorem 4.6 is rather weak. For instance, if $H(0\,|\,f_n) < \infty$, one can show that $v_\alpha \rightharpoonup^* 0$ as $\alpha \to \infty$. Hence, one can relax the assumption to $C\delta_n \leq H(0\,|\,f_n)$, which, for $\delta_n$ sufficiently small, is fulfilled in many applications.
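A sketch of the discrepancy principle (4.3) in the discrete setting (hypothetical data; squared-norm fidelity): since $h(\alpha) = H(v_\alpha \,|\, f_n)$ is monotone non-decreasing by lemma 4.2, $\alpha_n$ can be found by bisection.

```python
import cvxpy as cp
import numpy as np

# Bisection for the discrepancy principle (4.3): find alpha such that
# H(v_alpha | f_n) = tau * delta_n. Hypothetical data; H = squared norm.
rng = np.random.default_rng(6)
m, tau = 20, 1.5
A = rng.random((m, m)) / m
Al, Au = A - 0.005, A + 0.005
delta_n = 1e-3
f_n = A @ np.maximum(rng.standard_normal(m), 0) \
      + np.sqrt(delta_n / m) * rng.standard_normal(m)

def h_of_alpha(alpha):
    u, v = cp.Variable(m), cp.Variable(m)
    obj = cp.Minimize(cp.norm1(u) + (1 / alpha) * cp.sum_squares(v - f_n))
    cp.Problem(obj, [Al @ u <= v, v <= Au @ u]).solve()
    return cp.sum_squares(v - f_n).value   # h(alpha), non-decreasing in alpha

lo, hi = 1e-6, 1e2                         # assume h(lo) < tau*delta_n < h(hi)
for _ in range(40):
    mid = np.sqrt(lo * hi)                 # bisection on a logarithmic scale
    lo, hi = (mid, hi) if h_of_alpha(mid) < tau * delta_n else (lo, mid)
alpha_n = np.sqrt(lo * hi)
print("alpha_n ~", alpha_n, " h(alpha_n) ~", h_of_alpha(alpha_n))
```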

Convergence rates
Our goal in this section is to obtain convergence rates similar to those in theorem 3.5 (respectively theorem 3.6) for the parameter choice rule (4.3).

Table 1. Summary of convergence rates for different fidelities in terms of the data error $\delta$, the operator error $\eta$ and the regularisation parameter $\alpha$. Whenever $\alpha$ is absent in the a priori rate, exact penalisation occurs and the rate is independent of $\alpha$ as long as it is smaller than a fixed constant. Optimal rates correspond to an optimal choice of $\alpha$ in the a priori rate.

Fidelity | A priori rate | Optimal rate | Discr. principle
KL- and $\chi^2$-divergences, squared Hellinger distance ($\lambda = 2$) | $\delta/\alpha + \delta^{1/2} + \alpha + \eta$ | $\delta^{1/2} + \eta$ | $\delta^{1/2} + \eta$
Strongly coercive fidelities (3.21), $\lambda > 1$ | $\delta/\alpha + \delta^{1/\lambda} + \alpha^{\lambda^*-1} + \eta$ | $\delta^{1/\lambda} + \eta$ | $\delta^{1/\lambda} + \eta$
Norm-type fidelities ($\lambda = 1$) | $\delta + \eta$ | $\delta + \eta$ | $\delta + \eta$

Strongly coercive fidelities. For strongly coercive fidelity terms such that (3.21) holds, we immediately get, using the Cauchy–Schwarz inequality, that
$$ \langle E^*\mu^\dagger, \bar f - v^{\alpha_n}_n \rangle \leq C \big( \delta_n^{1/\lambda} + \eta_n \big), $$
and therefore we get the following rate:
$$ D^{p^\dagger}_J(u^{\alpha_n}_n, u^\dagger_J) = \mathcal{O}\big( \delta_n^{1/\lambda} + \eta_n \big), $$
which coincides with the optimal rate in theorem 3.9.

$\varphi$-divergences. For any $\varphi$-divergence that satisfies Pinsker's inequality [52] with exponent $\lambda$, i.e.
$$ D_\varphi(v \,|\, f) \geq C \|v - f\|^\lambda, $$
where $v, f \in \mathcal{P}(\Omega)$, we have the same situation as above. In particular, for the Kullback–Leibler divergence, the $\chi^2$-divergence and the squared Hellinger distance, Pinsker's inequality holds with $\lambda = 2$ and we obtain the rate $\mathcal{O}(\delta_n^{1/2} + \eta_n)$, which coincides with the optimal rate (3.17).
We summarise all convergence rates obtained in this paper in table 1.

Conclusions
In this work we have proven convergence rates in Bregman distances for variational regularisation in Banach lattices for problems with imperfect forward operators and general fidelity functions. Our results apply to many classes of fidelity functions and recover known convergence rates for norm-type fidelities and the Kullback-Leibler divergence in the case of exact operators. In addition, we have derived convergence rates for sums and infimal convolutions of fidelity functions, as used for mixed-noise removal. Furthermore, we have analysed an extension of Morozov's discrepancy principle to problems with operator errors in the Banach lattice setting, which does not rely on the triangle inequality and hence applies to a broader class of fidelity functions.

Appendix. Banach lattices and duality
The following definitions and results can be found, e.g., in [24]. Let $U$ be a vector space and '$\leq$' a partial order relation on $U$ (i.e. a reflexive, antisymmetric and transitive binary relation). For $x, y \in U$ we write $x \geq y$ if $y \leq x$. The pair $(U, \leq)$ is called an ordered vector space if the following conditions hold:
$$ x \leq y \implies x + z \leq y + z \quad \forall z \in U, $$
$$ x \leq y \implies ax \leq ay \quad \forall a \in \mathbb{R}_+. $$
An ordered vector space $(U, \leq)$ is called a vector lattice (or a Riesz space) if any two elements $x, y \in U$ have a unique supremum $x \vee y$ and infimum $x \wedge y$. For any $x \in U$ we define
$$ x_+ := x \vee 0, \qquad x_- := (-x)_+, \qquad |x| := x_+ + x_-. $$
For any $x \in U$ it holds that $x = x_+ - x_-$. Let $\|\cdot\|$ be a norm on $U$. The triple $(U, \leq, \|\cdot\|)$ is called a Banach lattice if $(U, \leq)$ is a vector lattice, $(U, \|\cdot\|)$ is a Banach space (i.e. it is norm complete) and for all $x, y \in U$
$$ |x| \leq |y| \implies \|x\| \leq \|y\|, $$
or, equivalently, $\|x\| \leq \|y\|$ whenever $0 \leq x \leq y$.
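A quick illustration of these lattice operations in the concrete case $U = \mathbb{R}^n$ with the entrywise order and the $\|\cdot\|_1$ norm (a sketch; any $L^p$ space with the a.e. order behaves analogously):

```python
import numpy as np

# Lattice operations in R^n with the entrywise (a.e.-type) order.
x = np.array([1.5, -2.0, 0.0, 3.0])
x_plus, x_minus = np.maximum(x, 0), np.maximum(-x, 0)
abs_x = x_plus + x_minus

assert np.allclose(x, x_plus - x_minus)        # x = x_+ - x_-
assert np.allclose(abs_x, np.abs(x))           # |x| = x_+ + x_-

# Lattice norm property: |x| <= |y| entrywise implies ||x||_1 <= ||y||_1.
y = np.array([2.0, -2.5, 1.0, 3.0])
if np.all(abs_x <= np.abs(y)):
    assert np.linalg.norm(x, 1) <= np.linalg.norm(y, 1)
```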
A linear operator $T$ acting between two vector lattices $U_1, U_2$ is called positive, and we write $T \geq 0$, if $u \geq 0$ implies $Tu \geq 0$ (the inequalities are understood in the sense of the partial orders in $U_1$ and $U_2$, respectively). A linear operator $T$ is called regular if it can be written as a difference of two positive operators, $T = T_1 - T_2$ with $T_{1,2} \geq 0$. The space of all regular operators $U_1 \to U_2$ is itself an ordered vector space with the partial order $S \leq T :\iff T - S \geq 0$.

Proposition A.1 ([24, proposition 1.3.5]). Let $U_1, U_2$ be Banach lattices. Then every regular operator $U_1 \to U_2$ is (norm) continuous.
The converse is in general false, i.e. not every continuous operator is regular. However, in some settings the converse does hold. We repeat definition 2.5 for the readers' convenience.
Definition A.2. A Banach lattice $Y$ with norm $\|\cdot\|$ is called an AM-space (abstract maximum space) if
$$ \|x \vee y\| = \max(\|x\|, \|y\|) \quad \forall x, y \geq 0. $$
An element $\mathbb{1} \in Y$ which meets $\mathbb{1} \geq 0$, $\|\mathbb{1}\| = 1$ and $|x| \leq \|x\|\,\mathbb{1}$ for all $x \in Y$ is called a unit.

Definition A.3. A Banach lattice $Y$ with norm $\|\cdot\|$ is called an AL-space (abstract Lebesgue space) if
$$ \|x + y\| = \|x\| + \|y\| \quad \forall x, y \geq 0. $$
If either Y is an AM-space with an order unit or X is an AL-space, then every linear bounded operator is regular (under some additional conditions, see [24, theorem 1.5.11] for a precise statement).
We need the following result.

Lemma A.4. Let $U$ be a Banach lattice. Then its dual $U^*$ is a Banach lattice with respect to the partial order
$$ \varphi \geq 0 \quad :\iff \quad \varphi(x) \geq 0 \quad \forall x \in U, \ x \geq 0. \tag{A.1} $$
Furthermore, order intervals in $U^*$ are weakly-* closed.
Proof. We need to check that $\varphi \geq \psi \geq 0$ implies $\|\varphi\|_{U^*} \geq \|\psi\|_{U^*}$. Splitting $x \in U$ into positive and negative parts as $x = x_+ - x_-$ with $x_\pm \geq 0$, we obtain by linearity and non-negativity that
$$ \pm \psi(x) \leq \psi(x_+) + \psi(x_-) \leq \varphi(x_+) + \varphi(x_-) = \varphi(|x|). $$
This implies
$$ |\psi(x)| \leq \varphi(|x|) \leq \|\varphi\|_{U^*} \, \| |x| \| = \|\varphi\|_{U^*} \, \|x\|. $$
Hence, we obtain $\|\psi\|_{U^*} \leq \|\varphi\|_{U^*}$, which proves that $U^*$ is a Banach lattice. Now we prove weak-* closedness of order intervals. Here it is sufficient to show that whenever $(\varphi_k) \subset U^*$ converges weakly-* to some $\varphi \in U^*$ and meets $\varphi_k \geq 0$ for all $k \in \mathbb{N}$, it holds that $\varphi \geq 0$. Using the assumptions we get
$$ 0 \leq \lim_{k \to \infty} \varphi_k(x) = \varphi(x) \quad \forall x \in U, \ x \geq 0, $$
which according to (A.1) means $\varphi \geq 0$.
We also need the following result unrelated to Banach lattices.
Lemma A.5. Let $A : U^* \to V^*$ be a bounded linear operator mapping between the duals of two Banach spaces $U$ and $V$, and let $J_U$ and $J_V$ be the canonical embeddings of $U$ and $V$ into $U^{**}$ and $V^{**}$. If $A^* J_V(V) \subset J_U(U)$, then $A$ is weakly-* to weakly-* continuous.
Proof. Let $(\eta_k) \subset U^*$ converge weakly-* to $\eta \in U^*$. Using that for any $y \in V$ it holds that $A^* J_V(y) = J_U(x)$ for some $x \in U$, we obtain
$$ \langle A\eta_k, y \rangle = \langle \eta_k, A^* J_V(y) \rangle = \eta_k(x) \to \eta(x) = \langle A\eta, y \rangle, $$
which means that $(A\eta_k)$ converges weakly-* to $A\eta$.