Discretization-invariant Bayesian inversion and Besov space priors

Bayesian solution of an inverse problem for indirect measurement $M = AU + {\mathcal{E}}$ is considered, where $U$ is a function on a domain of $R^d$. Here $A$ is a smoothing linear operator and $ {\mathcal{E}}$ is Gaussian white noise. The data is a realization $m_k$ of the random variable $M_k = P_kA U+P_k {\mathcal{E}}$, where $P_k$ is a linear, finite dimensional operator related to measurement device. To allow computerized inversion, the unknown is discretized as $U_n=T_nU$, where $T_n$ is a finite dimensional projection, leading to the computational measurement model $M_{kn}=P_k A U_n + P_k {\mathcal{E}}$. Bayes formula gives then the posterior distribution $\pi_{kn}(u_n | m_{kn})\sim\pi_n(u_n) \exp(-{1/2}\|m_{kn} - P_kA u_n\|_2^2)$ in $R^d$, and the mean $U^{CM}_{kn}:=\int u_n \pi_{kn}(u_n | m_k) du_n$ is considered as the reconstruction of $U$. We discuss a systematic way of choosing prior distributions $\prior_n$ for all $n\geq n_0>0$ by achieving them as projections of a distribution in a infinite-dimensional limit case. Such choice of prior distributions is {\em discretization-invariant} in the sense that $\prior_n$ represent the same {\em a priori} information for all $n$ and that the mean $U^{CM}_{kn}$ converges to a limit estimate as $k,n\to\infty$. Gaussian smoothness priors and wavelet-based Besov space priors are shown to be discretization invariant. In particular, Bayesian inversion in dimension two with $B^1_{11}$ prior is related to penalizing the $\ell^1$ norm of the wavelet coefficients of $U$.


Introduction
Consider a quantity U that can be observed indirectly through a relation where A is a smoothing linear operator and E is white noise. Here U and M are considered as continuum objects, or functions defined on subsets of R d , so that our discussion applies to classical models of mathematical physics such as Laplace, Maxwell, Helmholtz or Schrödinger equation. We are interested in the use of Bayesian inversion to find information about U from measurement data concerning M. Let U(x, ω), M(y, ω) and E(y, ω) be random functions where ω ∈ Ω is an element of a complete probability space (Ω, Σ, P) and x and y denote the variables in domains of Euclidean spaces. We analyze Bayesian estimates of U when a continuum model of the form (1) is approximated by finite-dimensional models to allow computerized inversion.
Assume that a measurement device provides us with a realization of the random variable where A k = P k A and E k = P k E. Here P k is a linear operator related to the device; for simplicity we assume that P k is an orthogonal projection with k-dimensional range. We call (2) the practical measurement model and (1) the continuum model. Realizations of measurements are denoted by m k = M k (ω 0 ) and m = M(ω 0 ), respectively, where ω 0 ∈ Ω is a specific element in the probability space. This study concentrates on the inverse problem given a realization M k (ω 0 ), estimate U, where the estimates in question are means and confidence intervals related to a Bayesian posterior probability distribution.
For example, consider the brain imaging method called magnetoencephalography (meg), see e.g. [23]. Maxwell's equations describe how synchronized neuronal currents U in the cerebral cortex produce a magnetic field AU that can be measured at the surface of the head. Let E denote the magnetic fields produced by all external sources; then the continuum model (1) describes the total magnetic field M = AU + E. In practice one measures the inner products M, φ j , j = 1, 2, . . . , k, where φ j are linearly independent device functions corresponding to measuring the flux of the magnetic field M through a small surface determined by the jth measurement unit (squid). As an idealization, let us assume that φ j are orthonormal so that P k v = k j=1 v, φ j φ j is an orthogonal projection. Then meg data is modelled by P k M.
Computational solution of (3) using Bayesian inversion involves discretization of the unknown quantity U. We assume that U is a priori known to take values in a function space Y . Choose a linear projection T n : Y → Y with n-dimensional range Y n , and define a random variable U n := T n U taking values in Y n . This leads to the computational model (4) M kn = A k U n + E k involving two independent discretizations: P k is related to the measurement device and T n to finite representation of the unknown.
In the above application related to meg, the projection T n corresponds to an approximate representation of the electromagnetic sources in the brain using a finite set of basis functions (defined for instance according to a finite element method). Note that the model (4) is virtual in the sense that U n appears neither in the continuum model (1) nor in the practical measurement model (2). In particular, measurement M kn (ω 0 ) is related to the computational model but not to the practical measurement model. This is why we use m k = M k (ω 0 ) as the given data.
Denote the probability density function of the random variable M kn by Υ kn (m kn ). The posterior density for U n is given by the Bayes formula: where the exponential function corresponds to (4) with white noise statistics with identity variance, and a priori information about U is expressed in the form of a prior density Π n for the random variable U n . The density Π n assigns high probability to functions that are typical in light of a priori information, and low probability to atypical functions.
We can now state the inverse problem (3) more specifically: given a realization m k = M k (ω 0 ), estimate U by u kn , where the conditional mean (cm) estimate (or posterior mean estimate) u kn is (7) u kn := Yn u n π kn (u n | m k ) du n .
We remark that applying the mcmc solution strategy requires also practical implementation of the operator A k . In the case of meg imaging A k corresponds to computing the electromagnetic field A k U n using the discrete current U n and an approximate numerical solution to Maxwell's equations. We neglect the effects of numerical error in the implementation of A k in this paper. (One possibility to take this error into account is to use the approximation error model of Kaipio and Somersalo [33,Section 5.8], but this leads to non-Gaussian noise statistics in general and thus falls outside the scope of this discussion.) Summarizing, our starting point is the infinite-dimensional continuum model (1). A measurement instrument provides us with finite-dimensional noisy data m k = M k (ω 0 ) described by the practical measurement model (2). Our aim is to use m k to find information about the unknown U. To allow computerized inversion we construct the fully discrete computational model (4) involving a priori information about U, and we write down the posterior distribution (5). Finally, we use formula (7) and numerical methods to estimate U with u kn . However, in definition (7) the data m k comes from the practical measurement model (2), while π kn is related to the computational model (4). Taking this incompatibility into account is one of the central novelties in this work.
Constructing T n and Π n is the core difficulty in Bayesian inversion. Often there is no natural discretization for the continuum quantity U, so n can be freely chosen. Consequently, T n and Π n should in principle be described for all n > 0, or at least for an infinite sequence of increasing values of n. Also, updating our measurement device may increase k independently of n. This work is motivated by the need to avoid the following unwanted phenomena: (a) The estimates u kn diverge as n → ∞. In this case investing more computational resources to modeling our unknown does not necessarily result in improved reconstructions. (b) The estimates u kn diverge as k → ∞. In this case performing more measurements may lead to worse reconstructions. (c) Representation of a priori knowledge is incompatible with discretization. It is reported in [39] that discrete (non-Gaussian) total variation priors converge to a Gaussian smoothness prior as n → ∞. In this case one makes the mistake of specifying different a priori information for different values of n. See Appendix B below for more details.
A choice of T n and Π n is called discretization-invariant if it avoids (a)-(c). We construct prior distributions for U in the infinite-dimensional space Y . Then the random variable U n = T n U takes values in the finite-dimensional subspace Y n ⊂ Y and represents approximately the same a priori knowledge as U. Further, we analyze convergence of u kn using a deterministic function called reconstructor that almost surely maps a given measurement to the conditional mean estimate. For example, the reconstructor R M kn (U n | · ) corresponding to the computational model (4) takes the measurement data m k = M k (ω 0 ) to the mean u kn defined in (7). The infinite-dimensional model M = AU + E has a reconstructor R M (U| · ) as well.
Theorem 7 below states under suitable assumptions on U, T n and P k that where the realization m k = P k m comes from a realization m = M(ω 0 ) of the random variable M.
Our proving strategy involves another measurement model analogous to (4): where the noise is not finite-dimensional. The noise models (1) and (9) now contain the same (continuum) white noise process. This allows us to prove convergence results in a fixed function space. Theorem 5 below states under suitable assumptions on U, T n and P k that if lim k→∞ m k = m then lim n,k→∞ Formula (8) follows by showing that the reconstructors coincide: R Θ kn (U n |m k ) = R M kn (U n |m k ) for m k ∈ Ran (P k ). We will consider also more general reconstructors R M (g(U)|m) that can be used to analyze convergence of confidence intervals. Our way of using infinite-dimensional limit processes is one possibility to achieve discretization-invariance since we avoid problems (a)-(c).
We are especially interested in discretization-invariant and edge-preserving Bayesian inversion. Total variation regularization of Rudin, Osher and Fatemi [57] penalizes the L 1 norm of derivatives and yields edge-preserving reconstructions in practice [20,64,31,60,36]. These results are equivalent to computing maximum a posteriori estimates using a total variation prior and a Gaussian likelihood. However, Bayesian inversion with discretized total variation prior distribution, where Y n ⊂ W 1,1 0 (0, 1) are finite dimensional subspaces, is not discretization-invariant [39]; in particular, conditional mean estimates (i.e. posterior mean estimates) lose their edge-preserving quality as n → ∞.
Wavelet-based Bayesian inversion using Besov space priors is used in applications with results similar to total variation regularization, see [8,55,51,34]. The Besov space B 1 11 that bounds L 1 norms of (band-filtered) first derivatives similarly to (10) is useful for image processing, see Meyer [47]. One of the main results of this paper is that Besov priors are discretization-invariant; we emphasize that this gives us non-Gaussian discretization-invariant priors. The proof is based on the analysis of the infinite-dimensional limit case and includes a quantitative estimate on the speed of convergence of reconstructors. Further, we show in Section 2.2 for the special case of two-dimensional deblurring that B 1 11 inversion using u kn reduces to applying ℓ 1 -type prior on the wavelet coefficients of U n -a combination of two well-known computational methods. See Section 2.2 for more details. There is an interesting parallel to algorithms computing map estimates with penalty on the ℓ 1 norm of Fourier or wavelet coefficients [21,61,17,42].
Let us review previous literature on the topic. The study of Bayesian inversion in infinite-dimensional function spaces was initiated by Franklin [29] and continued by Mandelbaum [45], Lehtinen, Päivärinta and Somersalo [40], Fitzpatrick [28], and Luschgy [44]. The concept of discretization invariance was formulated by Markku Lehtinen in the 1990's and has been studied by Lasanen [37], Piiroinen [53], D'Ambrogi, Mäenpää and Markkanen [3], and Lasanen and Roininen [38]. A definition of discretization invariance similar to the above was given in [39]. For other kinds of discretization of continuum objects in the Bayesian framework, see [6,50]. The use of wavelets and Besov spaces in statistical algorithms is discussed in [1,2,8,41,14,49]. For regularization based approaches for statistical inverse problems, see [25,26,9,54]. The relationship between continuous and discrete (non-statistical) inversion is studied in Hilbert spaces in [66]. See [11] for specialized discretizations for inverse problems.
We remark that working entirely within the computational model (4) would require using a realization M kn (ω 0 ) of the random variable M kn instead of m k = M k (ω 0 ) in formula (7). The starting point of inversion in earlier studies of discretization invariance is indeed the random variable M kn or its realization. However, the appropriate model of data given by the measurement device is a realization m k related to model (2). Rigorous analysis of this incompatibility is a central novelty in this paper. To emphasize this aspect we show that inversion from m k using Gaussian smoothness priors or Besov space priors is discretization-invariant. We do not discuss non-Gaussian noise models in this paper. This paper is organized as follows. In Section 2 we discuss discretization-invariant Bayesian inversion using Gaussian and Besov priors. In Section 3 we present the general theory of discretization invariance. In Section 4 we discuss random variables with values in Besov spaces. After proving some technical results in Section 5 we apply them in Section 6 to prove convergence of reconstructors arising from Besov priors. In Appendix B we consider examples.
Below ·, · refers to pairing of either generalized functions with test functions, or a Banach space with its dual. We denote by ·, · X the inner product in a Hilbert space X. The notation L(X, Y ) stands for the space of bounded linear operators between Banach spaces X and Y , and L(X, X) is abbreviated as L(X). We occasionally denote the norm · L(X,Y ) by · X→Y . The specific element ω 0 ∈ Ω of the probability space denotes realizations of measurements throughout the paper.
Acknowledgements: the authors thank Markku Lehtinen, Heikki Haario and Marko Laine for useful discussions and the anonymous referees whose thoughtful comments helped us to improve the paper. ML and SS were supported by National Technology Agency of Finland (TEKES) under Contract 206/03, Finnish Centre of Excellence in Inverse Problems Research and PaloDEx Group. ES was supported by the Academy of Finland, projects no. 113826 and 118765, and by the Finnish Center of Excellence in Analysis and Dynamics.

Example: discretization-invariant deblurring
In this section we give examples of discretization-invariant prior distributions and consider a simple inverse problem to give a flavour of our results to the reader. The precise definitions in a more general setting are postponed to the later sections.
We discuss Bayesian deconvolution using a Gaussian smoothness prior and a Besov space prior. In both cases we define the prior distribution in the continuous context and then marginalize it to discrete cases to allow practical computation. The Gaussian case is shown to be related to deterministic Tikhonov regularization with derivative penalty.
For simplicity, we ignore boundary effects by considering periodic functions. The loss of generality is not too bad from the practical point of view since the periodic analysis covers compactly supported non-periodic cases.
Let T 2 be the two-dimensional torus constructed by identifying parallel sides of the square D = (0, 1) 2 ⊂ R 2 ; we model periodic images as elements of function spaces over T 2 . The continuum model is M = AU + E with convolution operator A defined by (11) Au where Φ ∈ C ∞ (T 2 ) is a point spread function.
2.1. Gaussian smoothness prior. For any s ∈ R, let H s (T 2 ) be the L 2 -based Sobolev space equipped with Hilbert space inner product Note that H 0 (T 2 ) = L 2 (T 2 ).
Recall that a generalized Gaussian random variable V takes values in the space of generalized functions, and the pairing V, φ with any test function φ ∈ C ∞ (T 2 ) is a Gaussian random variable taking values in R, see [56]. The Gaussian random variables we will consider below are assumed to take values in some Hilbert space, typically in a Sobolev space H s (T 2 ), where the smoothness index s ∈ R may also be negative. Now, if V takes values in a Hilbert space X we say that V has the covariance operator with any φ, ψ ∈ X. Here ·, · X stands for the inner product in X.
Next we analyze a simple measurement model as an example. Consider the continuum model M = AU + E where the convolution operator (11) is now viewed as a smoothing map A : H s (T 2 ) → C ∞ (T 2 ).
We construct the smoothness prior by choosing U to be a generalized Gaussian random variable taking values in H −1 (T 2 ) and having expectation E U = 0 and covariance operator C U = α −1 (I − ∆) −2 . Here α > 0 is a regularization parameter. The operator C U corresponds formally to the prior and generates the discrete smoothness priors widely used in practice. However, it is curious that in spite of the term smoothness prior the realizations of U are almost surely not even L 2 (T 2 ) functions, let alone differentiable. This is why we need to consider U as taking values in some space H s (T 2 ) with negative smoothness index s; the value s = −1 is chosen just for convenience.
White noise E is a generalized Gaussian random variable with expectation E E = 0. Let us discuss the choice of an appropriate covariance operator: the standard definition of white noise as a generalized random variable is that where ·, · denotes the pairing between generalized functions and test functions. To consider the white noise E as a Hilbert-space-valued random variable, we can choose the Hilbert space to be any Sobolev space H s (T 2 ), s < −1. One possible choice is to consider E as taking values in H −2 (T 2 ) and choose the covariance operator (13). Note that realizations of E belong to L 2 (T 2 ) only with probability zero, and this is why we need to use Sobolev spaces with negative smoothness indices s. Now the continuous framework for inversion is in place. Let m = M(ω 0 ) be a realization of the measurement M = AU + E. Since both the prior and noise statistics are Gaussian, the conditional mean estimate coincides with the location of the maximum of the posterior density. Thus we can evaluate the cm estimate u as (15) u = argmax We omitted in formula (15) the constant term m 2 L 2 (T 2 ) in the formal expansion (16) m − Au 2 , which is well-defined only when m ∈ L 2 (T 2 ), and this happens with probability zero. Also, the smoothing properties of the operator A make it possible to replace the formal quantity m, Au L 2 (T 2 ) in (16) by the rigorously defined pairing m, Au in (15). Note that we can write (15) in the form u = arg min The practical measurement model is M k = P k AU + P k E, where P k : L 2 (T 2 ) → L 2 (T 2 ) is an orthogonal projection with k-dimensional range. We require that the sequence P k converges strongly to the identity operator on L 2 (T 2 ) as k → ∞. For example, P k may measure averages of AU over k pixels on T 2 and construct a piecewise function or compute a truncated Fourier series expansion to get an element of L 2 (T 2 ). Here k can be arbitrarily large, enabling models of imaging devices with any resolution.
Let us now turn to practical inversion where all appearing quantities are finite dimensional. We need to discretize the unknown. Take T n : H −1 (T 2 ) → H −1 (T 2 ) to be truncations of Fourier series expansions to n lowest frequency terms. Then T n are linear orthogonal projections with n-dimensional range Y n converging strongly to the identity operator on H −1 (T 2 ) as n → ∞. Let U be as in the continuum case above and define a random variable U n := T n U taking values in Y n . The conditional mean estimate for U n (determined using the posterior distribution corresponding to model M kn = P k AU n + E k ) is u kn := arg min is the realization of measurement M k in the practical measurement model. Theorem 5 below implies in particular that u kn → u as k, n → ∞, showing that Bayesian deblurring with Gaussian smoothness prior is discretization-invariant.
Much of the material in this Subsection 2.1 is due to Lasanen [37] and Piiroinen [53]. We emphasize that we assume given a realization m k = M k (ω 0 ) from practical measurement model (2) and that we do not have available any realization of M kn . Because of this, m k is used in formula (17). This is the main novelty of our Gaussian example compared to [37].
We close this section by discussing a connection to generalized Tikhonov regularization, here defined as finding the element in u ∈ Y n ∩ H 1 (T 2 ) that minimizes the functional . After similar modification as above we can define the Tikhonov solution by The quadratic forms defined in (17) and (18) are the same, so u kn = u T kn . Thus our results apply to the convergence analysis of Tikhonov regularization as well.
2.2. Besov prior. Gaussian smoothness priors are designed for representing the prior information that the unknown physical quantity U does not vary sharply. However, sometimes we know a priori that U is piecewise regular, and other kind of priors are needed. One could use Gaussian hyperpriors as in [32,12] or geometric priors as in [50]. We discuss a different approach that allows analysis of an infinitedimensional limit case and consequently leads to discretization-invariant inversion.
We replace the discretization-dependent [39] total variation prior ). Since the norm of B 1 11 (T 2 ) bounds the L 1 norms of (band-limited) first derivatives of u, we expect that the B 1 11 prior can be used for edge-preserving inversion. It is computationally convenient that B 1 11 functions can be written using a compactly supported wavelet bases and the B 1 11 norm can be computed as a weighted sum of wavelet coefficients, see Appendix A.
The continuum measurement model is M = AU + E where the linear convolution operator (11) is considered as a smoothing map A : . We construct the B 1 11 (T 2 ) prior by expanding U in the wavelet basis as the sum This distribution arises naturally due to the wavelet characterization of Besov spaces. The scale of the wavelet basis functions becomes finer when ℓ increases; for the exact bookkeeping of scales and locations of ψ ℓ as function of ℓ we refer to Appendix A. We take T n : that simply truncate the wavelet expansion to n first terms. These operators T n converge strongly to the identity as n → ∞. For each n, define a random variable U n := T n U taking values in Y n := span(ψ 1 , . . . , ψ n ) and consider the model With the above choices the posterior distribution of U n takes the computationally efficient form in terms of the wavelet coefficients X n := (X 1 , X 2 , . . . , X n ) of U n , namely, the probability distribution of X n conditioned with M kn is The conditional mean estimate can be computed approximately e.g. using a Markov chain Monte Carlo algorithm such as the Gibbs sampler or the Metropolis-Hastings method. Sampling from L 1 distributions is explained e.g. in [43,31] and [33, Section 3.3.2], and modifying those methods for computing the mean of (20) involves only adding the fast wavelet transform appropriately.
Our new results below show that Besov priors are discretization-invariant. First, by Theorems 7 and 19 and Corollary 20 the estimates u kn converge when n, k → ∞ either separately or simultaneously. Thus Besov priors do not involve the possible difficulties (a) and (b) mentioned in the introduction. Second, the prior distributions defined in Y n converge to a limit distribution in Y as n → ∞ (Proposition 11). Thus we do not have the unwanted phenomenon (c) of the introduction.

General theory of discretization invariance
Consider two independent random variables U and E and the measurement model M = AU + E. We construct a rigorous stochastic framework for discretizationinvariant Bayesian inversion using the following diagram of spaces and maps: Our immediate task is to define all the objects in (21).
Recall the notion of a random variable taking values in a Banach space. For any given separable Banach space X we denote the dual of X by X ′ . Let B X be the Borel σ-algebra of (X, τ w ) with τ w the weak topology of X. Note that the separability of X verifies that B X coincides with the Borel σ-algebra of (X, τ n ), where τ n is the norm topology of X. An X-valued random variable V is simply any measurable map V : (Ω, Σ) → (X, B X ). In this paper we consider only random variables with values in separable Banach spaces.
We now assume that the space Y in the diagram (21) is a separable Banach space The measurement noise E = E(ω 2 ) with ω 2 ∈ Ω 2 is a Gaussian random variable taking values in a separable Hilbert space S; the expectation satisfies E E = 0, and the covariance operator C E : S → S, is defined by requiring We assume that the essential range of E is dense in S. Then C E is one-to-one, selfadjoint and in the trace class, and we may define the unique positive and self-adjoint power C t E for any t ∈ R. For t ≥ 0 we denote S t := C t E S; henceforth the spaces S 1/2 and S 1 in (21) are well defined.
The space Z = S 1/2 = C 1/2 E S is called the Cameron-Martin space of E, and E is called the white noise process in Z. We remark that the realizations of E belong to Z only with probability zero. Note that C t E : S → S t is an isomorphism when the norm of the dense subspace S t ⊂ S is defined as In the Gaussian example in Section 2.1 we have Finally, we assume that A : Y → S 1 is a bounded linear operator. The definition of the objects in diagram (21) is now complete, and we turn to discussing reconstructors.
We analyze finite-and infinite-dimensional Bayesian inversion simultaneously, so let us introduce a rigorous setting for discrete approximations to the random variable U above. We say that Y -valued random variables U n tend weakly in distribution for every y ′ ∈ Y ′ . Note that here "weakly" refers to the weak topology used in space Y , and "distribution" to the convergence of scalar-valued random variables.
Definition 1. (Linear discretization of random functions) Let Y be a separable Banach space and Y n ⊂ Y be finite-dimensional subspaces. The spaces Y n need not be nested. Let U n = U n (ω) be Y n -valued random variables with n = 1, 2, 3, . . . . Assume that (2) There are bounded linear operators T n ∈ L(Y ) such that Then we say that the U n , n ≥ 1, are proper linear discretizations of U in Y .
Examples of random variables that form or do not form proper linear discretizations are given in Appendix B.
In following, we mainly consider cases where T n are projection operators.
Note that all vector-valued integrals in this work are standard Bochner integrals. We refer the reader to [18] for definition and basic facts on Bochner integral and vector-valued conditional expectations. The operator P : Now we are ready to give the definition of a (non-unique) reconstructor.
Definition 3. Denote by M ⊂ Σ be the σ-algebra generated by the random variable M(ω). We say that any deterministic function Note that reconstructor is a deterministic function. Also, if a realization of the measurement in the computational model, M kn (ω 0 ), is substituted in the reconstructor R M (U| · ), then the obtained result R M (U| M kn (ω 0 )) coincides with conditional expectation. Thus, reconstructor, considered as a deterministic function, is just the functional representation of conditional expectation (see e.g. [59], section II.7, formula (43)). However, we assume that we are not given a realization of the measurement in computational model, M kn (ω 0 ), as data, but instead M k (ω 0 ) = P k M(ω 0 ), a realization of the measurement in the practical measurement model (2). An essential feature of the reconstructor is that it is defined for all elements in S. Thus, even though the reconstructor is related to the computational model (4), it is possible to substitute into the reconstructor the realization of the practical measurement model (2). This is the reason why the reconstructor, which has the same functional representation as conditional expectation, is defined as a new concept.
One may generalize the well-known scalar-valued result on the existence of reconstructor, see [59, II.3 Theorem 3 and II.7.5], to Bochner-valued conditional expectations. However, we will not need this since we next establish a specific formula for a reconstructor in our situation. Note that the following result is close to the usual functional representation of the conditional expectation, c.f. [59]. As we need to define reconstructors as a deterministic function of the space S and also introduce notations for later use, we present the proof of the result for completeness.
where λ stands for the distribution of U in Y . Then the function is well-defined and satisfies R M (g(U)|M(ω)) = E (g(U)|M)(ω) almost surely, that is, formula (23) defines a reconstructor.

Proof.
Using the equality M = AU + E, and Fubini theorem, we have for any Above, m ′ is an integration variable running over the space S where M is taking values. Thus we have for any integrable function f : One checks that the same holds for any Bochner integrable function f : S × Y → Y by simply using the fact that such an f is an almost sure limit of simple functions f k that satisfy the pointwise inequality Since Z is the Cameron-Martin space of E, we have for any a ∈ S 1 by [10, Cor.

2.4.3] the Radon-Nikodym
The latter formula has the advantage of being well-defined for every m ∈ S. In particular, we have 1 To motivate formula (25), we note that if Z would be finite-dimensional, we could write The formula (25) is a generalization of this for the infinite-dimensional case.
Using formula (26) we see that formula (22) can be written as By Fubini theorem we may continue from (24) to obtain Especially for E ⊂ B S it holds that and thus ν << µ and dν dµ (m) = H 1 U,M (m) for almost every m with respect to µ, where Observe that H is well defined. Now by (24)

Hence by Fubini theorem for
Thus we have the almost sure equality This verifies the formula for R M (g(U)|m) given in the assertion. 2 For convenience, let us look at Theorem 4 in the finite-dimensional case. Then Y = R n and Z = S = R m , E is the Gaussian white noise with the identity covariance matrix, and U and M have smooth everywhere positive probability density functions π U (u) and π M (m), correspondingly. It follows that satisfies the assumptions of Definition 3. Compare formulas (23) and (27) and see Appendix C for further discussion. Note that (27) is widely used in practical Bayesian inversion [63,33,13].
Stability of the reconstructor with respect to data m is important from the point of view of practical inversion. The following theorem yields a non-quantitative convergence result for reconstructors in general. We provide sharper results for the special case of Besov priors later in sections 5 and 6.
Theorem 5. Assume that the exponential moments of U satisfy Take g : Y → R to be such a continuous function that with some constant a. Assume that T n : Y → Y , n > 0, and P k : with some C 0 > 0. Finally, let m k , m ∈ S satisfy lim k→∞ m k = m in S.
Then we have the convergence lim n,k→∞ where the reconstructors are defined using formula (23) for models (9) and (1), respectively. Moreover, the limits exist for a fixed value of k (resp. n).

Proof. We have
and since (28) and (29) imply E |g(T n U)| < ∞ we may also write by our assumptions. The claims follows now from the Lebesgue dominated convergence theorem by applying the majorant a exp((c + aC 0 ) U(ω 1 ) Y ). 2 Now the general theory of discretization-invariant Bayesian inversion is in place for the case of measurement models M = AU + E and Θ kn = P k AT n U + E concerning infinite-dimensional noise E. Using the continuum noise E is convenient above because we can work in the same function space regardless of k.
Assume given data M k (ω 0 ) corresponding to the practical measurement model (2) and consider the computational solution of the inversion problem using the computational model (4), where random error is finite-dimensional white noise E k = P k E. It remains to discuss the implications of our general theory for these practical models. To do this, assume that P k are projections P k : S → S having the following properties: First we show that reconstructors corresponding to the measurement models Θ kn = P k AU n + E and M kn = P k AU n + P k E actually coincide. Lemma 6. Assume P k : S → S is a projection satisfying (34) and (35). Then the reconstructors defined in Theorem 4 for the measurement models Θ kn = P k AU n + E and M kn = P k AU n + P k E satisfy for m k ∈ Ran (P k ).

Proof.
Consider first E k = P k E as a Gaussian random variable taking values in the space Ran (P k ) that has the inner product inherited from S. The random variable E k has zero expectation.
As the finite-dimensional projection P k : S → S is bounded, the density of Z in S implies that P k η, φ S×S 1 = η, P k φ S×S 1 for all η ∈ S and φ ∈ S 1 . Using this, we see for φ, ψ ∈ Ran (P k ) This implies that the covariance operator of E k , considered now as a Gaussian random variable taking values in Ran (P k ) endowed with the inner product inherited from Z, is the identity operator. Using this we see that the reconstructor defined in Theorem 4 for the measurement M kn has the form ) for m k ∈ Ran (P k ). The reconstructor R Θ kn (g(U n )|m k ) defined in Theorem 4 for the measurement Θ kn has the same form, and the assertion follows. 2 Finally, we prove the convergence of reconstructors for models with discrete noise.
Theorem 7. Assume that in addition to conditions (28)- (32), the projections P k : S → S satisfy (34), (35), together with Let u = U(ω 0 ), ε = E(ω 0 ), ω 0 ∈ Ω be realizations of the random variables U and E, and let be the realizations of the measurements (1) and (2), respectively. Then the reconstructors defined in Theorem 4 for the measurement models M kn = P k AT n U + P k E and M = AU + E satisfy lim n,k→∞ Moreover, the limits lim n→∞ R M kn (g(U n )|m k ) and lim k→∞ R M kn (g(U n )|m k ) exist for a fixed value of k (resp. n).
Proof. By (36), lim k→∞ m k = m in S, and hence the assertion follows by Theorem 5 and Lemma 6. 2 Theorem 7 concerns the convergence of practical inversion methods: m k is data provided by an actual measurement device, and the computational model M kn = P k AT n U + E k allows computer implementation. For instance, most Markov chain Monte Carlo inversion algorithms are programmed to evaluate R M kn (U n |m k ).
Let E ⊂ Y be a Borel set and χ E be the indicator function of E. Using reconstructors, we define For a given m ∈ S, the map E → P(E|m) is a probability measure on Y by equation (23). Next, let E be a fixed Borel set of Y . If we substitute M(ω) in the function m → P(E|m), we obtain by Definition 3 where P({U ∈ E}|M)(ω) is the conditional probability for the event {U ∈ E} with respect to the σ-algebra M (see [35,Sec. 6]). Thus, roughly speaking, m → P(E|m) can be considered as the posterior probability for the event U ∈ E when the measurement M gets value m, and P kn (E|m k ) the posterior probability for the event U n ∈ E when the measurement M kn gets value m k .
Recall that measures µ j on Y convergence weakly to measure µ as j → ∞ if lim j→∞ Y g dµ j = Y g dµ for all bounded continuous functions g : Y → R. Thus Theorem 7 yields the following corollary.
Corollary 8. Let the assumptions of Theorem 7 hold. Then the Borel measures E → P kn (E|m k ) converge weakly to the measure E → P(E|m) as n, k → ∞.

Random variables in Besov spaces
We wish to use priors of the form for any integrability parameter 1 ≤ p < ∞ and some chosen smoothness s ∈ R. Note that we consider now functions on a general d-dimensional torus T d .
Following Appendix A we use a compactly supported waveletψ(x), x ∈ R d and scaling functionφ(x) suitable for multi-resolution analysis of smoothness C r in L 2 (R d ) with large enough r. Then we can expand functions as is finite. For the exact bookkeeping of scales and locations of ψ ℓ as function of ℓ we refer to Appendix A.
Definition 9. Let 1 ≤ p < ∞ and s ∈ R. Let (X ℓ ) ∞ ℓ=1 be independent identically distributed real-valued random variables with probability density function Let U be the random function Then we say that U is distributed according to a B s pp prior.
Next we show that random variable U in Definition 9 is a well defined object.
Lemma 10. Let U be as in Definition 9 and take t ∈ R. The following three conditions are equivalent: Denote by (X ℓ ) ∞ ℓ=1 a sequence of independent random variables with density (38). Assume (iii). In order to deduce (ii) we need to show the finiteness of the quantity Above we used independence and the observation that for k ∈ (0, 1) one may compute E exp k|X ℓ | p = (1 − k) −1/p . Clearly the product in (39) converges if t < s − d/p. The notation a ≃ b stands for the existence of a positive constant c < ∞ such that a/c ≤ b ≤ ca.

Observe next that obviously (ii) implies (i). Finally, assume that (i) is true. Then
almost surely. Since the the random variables |X ℓ | p are non-negative and identically distributed, an easy application of truncation and [35,Thm. 4.17] shows that almost sure finiteness of the sum implies finiteness of the expectation. Hence (i) implies (iii). 2 Now we easily see that B s pp (T d ) distributions generate discretization-invariant priors. Proof. The random variables U n = T n U converge almost surely in norm topology of B t pp (T d ). Since almost sure convergence implies convergence weakly in distribution, the assertion follows. Note that if above t > d p , then the continuous embedding B t pp (T d ) → C(T d ) implies that realizations of U and U n are almost surely continuous.

Quantitative estimates for reconstructors
This section studies the case that U is distributed according to B s pp -prior. We provide quantitative stability estimates for reconstructors. For p > 1 qualitative results are described by Theorem 5, thanks to Lemma 10(ii). However, the case p = 1 is more difficult since condition (28) fails.
We collect a set of assumptions together for later reference: Assumption A. Let s ∈ R and 1 ≤ p < ∞ be arbitrary. Fix t, t, r ∈ R such that t < t < s − d/p and r > d/2.

Let A be a linear operator satisfying
Assume that g : B t pp → B t pp is a map such that for some q < ∞ we have Finally, let G : B e t pp → B t pp be a bounded operator. How does the diagram (21) look for the B s pp -prior? Take t < s − d p and set Observe that by Lemma 10 the random variable U takes values in the Besov space where · , · is the distribution duality. To make our results more precise, we will consider the white noise as a random variable taking values in a Besov space instead of Sobolev spaces, and consider E here as a random variable taking values in the Besov space B −r ∞∞ (T d ). In assumption A we are particularly interested in g being a characteristic function g(u) = χ E (u) of some set E (corresponding to confidence intervals), the identity map g(u) = u (corresponding to the mean), and g(u) = u q B t pp . We denote by H g (m, A, G) the quantity In the case g ≡ 1 the operator G plays no role, and the corresponding real-valued quantity is denoted simply by H 1 (m, A). In general, the above integral is defined as a B t pp -valued Bochner integral. The weak measurability of the integrand is obvious, whence the strong measurability follows by the separability of the spaces involved. The following auxiliary result will be used to verify integrability and to estimate the sensitivity of the quantity H g (m, A, G) with respect to changes in the variables. Below, for 1 ≤ p < ∞ we denote p ′ = p/(p − 1).

Proposition 12.
Let U be distributed according to B s pp prior as in Definition 9. Let q ≥ 0 and m ∈ B −r ∞∞ . Denote by S q (m, A) the random variable with the understanding that in the case p = 1 one sets 2r/p ′ = 0.
Proof. In order to prove (41) we first observe that by Lemma 10(ii) and Cauchy-Schwarz inequality it is enough to estimate the expectation where For ℓ ≥ 0 denote by R ℓ the standard projection to the first ℓ coordinates in the wavelet basis, ℓ ≥ 0, with R 0 = 0 and Q ℓ = I − R ℓ . Fix an integer ℓ 0 ≥ 0 and divide further above m = R ℓ 0 m + Q ℓ 0 m.
We now assume that p > 1. By completing the square, and applying Young's inequality in the form −x p /2 + 2xy ≤ c p y p ′ we obtain If m = 0 there is nothing to prove, so assume that m = 0. Recall that w ≥ r and consider first the situation where where 1/p + 1/p ′ = 1. In this case we choose .

We may estimate
Moreover, one easily verifies that By invoking the choice of ℓ 0 and applying the auxiliary estimates (46) and (47) we compute from (44) that which finishes the proof in the situation (45). Above, we also used the observation that A * On the other hand, if (45) is not valid, that is, we apply the choice ℓ 0 = 0 in (44) to estimate . In case w > r the last inequality above followed from inequality (48).
Consider next the case p = 1. Assuming first that we utilize the fact that w > r and choose and observe that then (46) verifies that A * Q ℓ 0 m B −t ∞∞ ≤ 1/4, which in turn yields , where we also applied the estimate (47). Finally, in case (50) does not hold we may choose ℓ 0 = 0 above and obtain that Summarizing, we have shown that in all the above cases with some c 1 , c 2 > 0, which finishes the proof of the Proposition. Our first application of the above result will be a local Lipschitz continuity estimate for the quantity H g (m, A, G).

Then it holds that
Here w and other indices are as in Proposition 12.

Proof.
Choose It will be convenient to introduce for each x ∈ [1, 2] the notation G , and m x = m + (x − 1)(m ′ − m). As g was assumed to be Lipschitz it follows that for each fixed U the function x → f (x, U) is also Lipschitz. Hence we may compute for almost every x ∈ [1, 2] In order to estimate the right hand side we observe first that our assumption on g yields for almost every x ∈ [1, 2] Moreover, by using the observation that B r By combining the previous bounds and the obvious bound for g(G x U) we obtain The Fubini theorem allows us to compute According to Proposition 12 there is the uniform bound and the (54) follows immediately by combining this with (57). 2 The local Lipschitz constant of a given map r : E → F between the Banach spaces E and F is defined as In terms of this quantity the previous proposition states that , where the linear factors containing norms of A and m were absorbed in the exponential term. Above we consider H g as the map Before we are able to give good estimates for the Lipschitz constant of the ratio H g /H 1 we need a couple of auxiliary results.
Lemma 14. Let a > 0 and assume that f and F are non-negative random variables with E F < ∞. Then there is a constant c depending only on a so that

Proof.
Let us consider the change of probability measure P F that is simply obtained by using F as a weight and normalizing. Then the left hand side of (59) equals E F f. By invoking the convex function φ a (x) := exp((1+|x|) a /a) and applying the Jensen inequality we obtain where the last inequality followed from the Cauchy-Schwartz inequality. By using the inequality 2 a (1 + |x|) a ≤ c + c|x| a and applying the inverse φ −1 a (x) = (a log(x)) 1/a − 1 a direct computation yields the inequality Then (59) follows by applying (60) Proof. Since the prior measure is invariant under the change of variables U → −U we obtain that We may clearly take C(t) := P({ U B t pp ≤ 1}). 2 Lemma 16. Let λ > 0. Under assumption A and with w as in Proposition 12 there is the estimate Proof. We apply Lemma 14 with the choice a = p/λ, f = ( 1 2 (1 + U B e t pp )) λ and F = exp − 1 2 AU 2 L 2 + AU, m . One inserts the estimate for the quantity E F 2 = I ′ , see (42), obtained in (43) and (53) in the proof of Proposition 12. Moreover, a lower bound for E F is given by Lemma 15 and we recall that E exp(f a /2)) < ∞ thanks to Lemma 10. 2 We are ready for the main result of this section.
Theorem 17. Consider the ratio H g (m, A, G)/H 1 (m, A, G) as a map Under assumption A, and with w as in Proposition 12, the local Lipschitz constant of this function satisfies Proposition 13 verifies that both H g and H 1 are locally Lipschitz. As a special case we obtain that E S q+2 (m, A) is continuous with respect to variables A and m. Hence inequality (57) yields the estimate The simple inequality (here x 1 , x 2 are vectors and y 1 , y 2 > 0 are scalars) verifies that point-wise Observe that by assumption A we have H g B t pp ≤ cE S q . By combining the previous estimates and remembering that H 1 = E S 0 we thus obtain The statement then follows by three applications of Lemma 16. 2 Remark 18. The main content of Theorem 17 is a stability estimate that grows polynomially in the norm of the measurement m. We have not striven for the optimal exponents in the above computations. Moreover, one should observe that in the case p > 1 it is possible to choose w = r in Theorem 17, which corresponds to the weakest condition on smoothness of A. If p ∈ (1, 2) this yields the exponents α = γ = 1 + p ′ (q + 4)/p. In the important special case of p = 1 one is forced to choose w > r. On the other hand, the choices w > r yield considerably smaller exponents for all small values p ≥ 1: in the limit w → ∞ one has α → 1 + ( 2q+8 p ), and γ → 1 + ( 2q+8 p ).

Convergence results for Besov priors
In the present section we assume that indices t, t, r and the quantities m, A, and g satisfy Assumption A. We take T n : B e t pp (T d ) → B e t pp (T d ), n ≥ 1, defined by the familiar truncation (62) T n ∞ ℓ=1 c ℓ ψ ℓ = n ℓ=1 c ℓ ψ ℓ discussed above and in Appendix A. Then T n → I strongly in L(B e t pp ) as n → ∞. We consider proper linear discretizations U n = T n U of the random variable U.
By standard compact imbedding results it follows that (I − T n ) → 0 as n → ∞ in the operator norm topology of L(B e t pp , B t pp ), and (I − P k ) → 0 as k → ∞ in the operator norm topology of L(B e r 11 , B r 11 ) with r > r, see Appendix A. Next we formulate the assumptions in quantitative terms. Finally, it is assumed that Let Θ kn = P k AU n + E. As before we define the reconstructor R Θ kn (g(U n )|m k ) corresponding to model Θ kn at m k as The following result yields a quantitative convergence result for Besov priors: Theorem 19. Let Assumptions A and B hold. Denote γ = 1 + ( 2q+8 p )( w w−r+2r/p ′ ). There is a constant C ′ = C ′ (A, g, t, r, w, p) such that

Moreover, the limits
exist for a fixed value of k (resp. n).
Proof. The statement is an immediate consequence of the estimates obtained in the previous section: observe that in our notation Moreover, apart from a possible change of the uninteresting constants, the Lipschitz bound given by Theorem 17 on the quantity H g /H 1 is uniform on any bounded neighbourhood of m, A and G, where G is the identical embedding Id : B e t pp → B t pp . Note also that by our assumptions (A − P k AT n ) → 0 in the operator norm topology of L(B t pp , B r 11 ). The first claim now follows from definitions and Theorem 17. The last statements follow immediately by the same reasoning. 2 We emphasize that Theorem 19 covers the highly interesting value p = 1 despite the failure of assumption (28) of the general theory in that case.
Let us revisit the deblurring example of Section 2.2, where p = 1, s = 1, d = 2 and A has a smooth kernel. Often it is possible to take the projection P k related to the measurement device to be truncation of the wavelet expansion to the first k terms analogously to (62). (For instance, P k might measure local averages at a grid of points and then compute discrete wavelet transform by convolutions with finite filters.) Then Theorem 19 yields the convergence of reconstructors and a result analogous to Theorem 7 with an explicit convergence speed.
Corollary 20. Let Assumptions A and B hold. Let p = s = 1 and assume that A : D ′ (T 2 ) → C ∞ (T 2 ) is a bounded linear operator. Let P k and T n be truncations to k and n first terms in wavelet expansion, respectively. Moreover, let t <t < −1, r > r 1 > 1, λ > 11, and τ > 0. Then (i) There is a constant c 0 = c 0 (A, r, t, λ) > 0 such that the reconstructors satisfy , ω 0 ∈ Ω be realizations of the random variables U and E, and be the realizations of the measurements (1) and (2), respectively. Moreover, assume that m ∈ B −r 1 11 (T 2 ). Then there is C > 0 independent of n and k so that the reconstructors defined in Theorem 4 for measurements (2) and (1) satisfy Note that in (ii) we have m ∈ B −r 1 11 (T 2 ) for P-a.e. ω.

Proof.
Claim (i) follows directly from Theorem 19 when we use q = 1 and so large w that λ ≥ γ. As A is an infinitely smoothing operator, we can choose above any −∞ < t < t < −1 andr > r + 2τ > r 1 + 4τ . Moreover, since m ∈ B −r 1 pp (T 2 ), we can take above η 1 (k) = c 1 k −(e r−r)/2 , η 2 (n) = c 2 n −( e t−t)/2 , and η 3 (k) = c 3 k −(r−r 1 )/2 where c 1 , c 2 and c 3 depend on ω 0 and the parameters r, r, r 1 , t, t, but not on k or n. As the projections P k satisfy the conditions (34) and (35), the assertion (ii) follows from (i) and Lemma 6. 2 Appendix A. Besov spaces and wavelets Letψ andφ be compactly supported wavelet and scaling function suitable for multi-resolution analysis of smoothness C r in L 2 (R).
Following Daubechies [16, section 9.3] we construct a wavelet representation for periodic functions in R with period 1; in other words, for functions on the onedimensional torus T 1 . Set We use in the following the subspaces of L 2 (T 1 ), It turns out that V j are spaces of constant functions for j ≤ 0. Thus we have a ladder V 0 ⊂ V 1 ⊂ V 2 ⊂ · · · of multiresolution spaces satisfying j≥0 V j = L 2 (T 1 ).
Further, we denote the successive orthogonal complements of V j in V j+1 by W j for j ≥ 0. Then we have orthonormal bases Following Meyer [47, section 3.9] we define a wavelet basis for periodic functions in R d ; in other words, for functions on the torus T d . Let E denote the set of 2 d − 1 sequences ν = (ν 1 , ν 2 , . . . , ν d ) of zeroes and ones, excluding the sequence (0, 0, . . . , 0). Define for ν ∈ E and j ≥ 0 the wavelets with the convention that ψ 0 = φ and ψ 1 = ψ, and integer-valued components of vector k ranging over 0 ≤ k i ≤ 2 j − 1 for all i = 1, 2, . . . , d.
The functions ψ ν j,k (x) constitute an orthonormal basis for L 2 (T d ). Let us renumber the above basis functions using just one integer ℓ = 1, 2, . . . . First, ℓ = 1 corresponds to the scaling function φ(x 1 ) . . . φ(x n ). The remaining numbering is done scale by scale; that is, we first number wavelets with j = 0, then wavelets with j = 1, and so on. The 2 d − 1 indices ν ∈ E are naturally numbered by thinking them as binary representation of integers. The exact ordering of all 2 jd translations corresponding to a fixed j can be chosen arbitrarily. This leads to a numbering of the following type: According to Meyer [47, Section 6.10], we can characterize periodic Besov space functions using these wavelets. Namely, the series We always assume that r is large enough for providing bases for Besov spaces with smoothness s. The case q = p is especially relevant to us, and we obtain the equivalent norm We use the above quantity as the definition of the Besov norm · B s pp (T d ) for generalized functions on T d , s ∈ R and p ∈ [1, ∞]. In case p = ∞ the definition must be understood as follows: It follows that B s pp (T d ) is isometrically isomorphic to the sequence space ℓ p , and by the simplicity of the norm it is easy to control the basic properties of the spaces. Especially, there is an embedding (an easy corollary of the Hölder inequality), and it is easy to verify that this embedding is compact if ∞), and p ′ stands for the dual index: 1/p + 1/p ′ = 1. The duality is with respect to the standard duality bracket: if f = ∞ ℓ=1 f ℓ ψ ℓ and g = ∞ ℓ=1 g ℓ ψ ℓ are finite sums, then We finally observe that natural bounded linear projection operators on B s pp (T d ) are obtained by setting These projections work at the same time for all the spaces and are contractions.

Appendix B. Examples of limits of finite-dimensional random variables
Here we illustrate difficulties related to finite-dimensional models and their possible convergence to an infinite-dimensional continuum model. Unless otherwise stated, we work on a circle T 1 , or equivalently, the interval [0, 1] with end points identified. Let u ∈ L 2 (T 1 ). We consider two measurements and the corresponding measurement models where U n is a finite-dimensional random variable, U is a random variable in an infinite-dimensional function space, and E is a normalized Gaussian random variable independent of U and U n . We consider various examples of U n and study whether some random variable U could be considered as a limit of U n as n → ∞, and whether models (71) or (72) make sense in the limit.
In all examples below, X n j , j, n ∈ N are independent real-valued normalized Gaussian random variables. Example 1. ("Non-proper discretization of white noise") Let I(n, j) = ( j−1 n , j n ], j = 1, . . . , n and χ n j (t) = χ I(n,j) (t) be the indicator functions of intervals I(n, j). Let U n (t) = n j=1 a n j X n j χ n j (t), t ∈ T 1 (73) where a n j > 0 are parameters. For a fixed n, with an ad hoc choice a n j = 1, the functions U n could be considered as an interesting random signal. However, for any function φ ∈ L 2 (T 1 ) and thus U n , considered as L 2 (T 1 ) valued random variables, converge to zero weakly in distribution as n → ∞. Taking U = 0 we see that the measurements M (1) n converge in distribution to M (1) . Concerning measurement (72) we notice the U n ( 1 2 ) ∼ N(0, 1), but U( 1 2 ) ∼ N(0, 0). Thus M (2) n ∼ N(0, 2) for all n, but for U = 0 we have M (2) ∼ N(0, 1). This shows that measurement models (72) do not behave nicely with the choice a n j = 1. In this example U n are not proper linear discretizations of U in any Banach space, since the second condition in Definition 1 is violated.
Example 2. ("Proper discretization of white noise") Consider random variables U n defined in (73) with parameters a n j = n 1/2 . This choice is motivated by the fact that the functions n 1/2 χ I(n,j) are orthonormal in L 2 (T 1 ). Then, for φ ∈ C ∞ (T 1 ) where U is Gaussian white noise in L 2 (T 1 ). This actually holds also when φ ∈ H 1 (T 1 ). Let Q n be the L 2 (T 1 )-orthogonal projection on the subspace of functions that have constant value on the intervals I(n, j). Then U n , φ appearing on the left hand side of (74) is a real-valued Gaussian random variable with covariance n j=1 | φ, n 1/2 χ I(n,j) | 2 = Q n φ 2 2 .
As n → ∞ this tends to the value φ 2 2 , as is easily seen by first approximating φ by smooth functions. Hence U n → U weakly in distribution in the space H −1 (T 1 ).
Moreover, we note that U n has the same distribution as the random variable T n U, where T n : H −1 (T 1 ) → H −1 (T 1 ) is the linear operator T n v = n j=1 v, φ n j n 1/2 χ I(n,j) , where (φ n j ) n j=1 is any orthonormal sequence in L 2 (T 1 ) consisting of elements of H 1 (T 1 ). We have thus verified that the variables U n are proper linear discretizations of U in the space H −1 (T 1 ), according to Definition 1. Now the measurement M (1) is well defined and M (1) n converge in distribution to M (1) as n → ∞. However, we have M (2) n ∼ N(0, n) and thus measurements M (2) do not converge in distribution as n → ∞. This is related to the fact that the white noise U is a well defined H −1 (T 1 ) valued Gaussian random variable, and the constant function 1 is in the dual of the space H −1 (T 1 ) but the point evaluation u → u( 1 2 ) does not define a bounded linear operator in H −1 (T 1 ).

Remark 21.
In Example 2 above one may verify that it is possible to choose the L 2 (T 1 )-orthonormal sequence (φ n j ) n j=1 so that the norm T n H −1 (T 1 )→H −1 (T 1 ) remains uniformly bounded for all n. However, this choice is somewhat complicated. A way to construct discretizations with this property (and such that they fall in the scope of the basic theory developed in Section 3) is to apply suitable finite dimensional approximations of identity that are uniformly bounded simultaneously on both H −1 (T 1 ) and L 2 (T 1 ). E.g., one may truncate Fourier series or apply basis projections corresponding to a wavelet basis (set p = 2 in Section 6). Details of these comments will be considered elsewhere. Similar remarks apply to Example 3 below.
Example 3. ("Discretization of the Gaussian smoothness prior") Choose continuous functions η n j : T 1 → R so that they are affine on intervals I(n, j), j = 1, . . . , n and that (η n j ) n j=1 are orthonormal in H 1 (T 1 ). Let where b n j > 0 are parameters. We choose b n j = 1. Then, for φ ∈ L 2 (T 1 ) where U is a Gaussian random variable in L 2 (T 1 ) having zero expectation and covariance operator (I−∆) −1 , that is, U is the one-dimensional Gaussian smoothness prior. Let Q n be the H 1 (T 1 )-orthogonal projection onto the subspace Y n spanned by η n j , j = 1, 2, . . . , n. Analogously to Example 2, one can see that U n have the same distribution as random variables T n U where where T n : L 2 (T 1 ) → L 2 (T 1 ) is the linear operator where (φ n j ) n j=1 is any orthonormal sequence in H 1 (T 1 ) consisting of elements of H 2 (T 1 ). Thus U n are proper linear discretizations of U in L 2 (T 1 ).
Let us next consider φ ∈ H −1 (T 1 ). Due to the formula (75), the (H 1 (T 1 ) × H −1 (T 1 ))-duality U n (ω), φ defines a real-valued Gaussian random variable with covariance E ( U n , φ 2 ) = E ( n j,k=1 b n j X j b n k X k η n j , φ η n k , φ ) (77) = n j=1 η n j , φ 2 = n j=1 η n j , (I − ∆) −1 φ 2 The kernel of the covariance of operator of U in L 2 (T 1 ), that is, the function G(t, t ′ ) = E (U(t)U(t ′ )), t, t ′ ∈ T 1 is Green's function of the differential operator − d 2 dt 2 + 1, This implies that E (U(t)U(t ′ )) is Lipschitz smooth on T 1 × T 1 . As U is Gaussian, one can see using e.g. [35,Theorem 3.23] that the values of the Gaussian smoothness prior U are almost surely in any Hölder space C α (T 1 ) with α < 1/2. Thus, after fixing α ∈ (0, 1 2 ), we can consider U also as a Gaussian C α (T 1 )-valued random variable satisfying E ( U, φ 2 ) = (I − ∆) −1 φ 2 for every φ in the dual space of C α (T 1 ). Note that as H 1 (T 1 ) ⊂ C α (T 1 ), we have (C α (T 1 )) ′ ⊂ H −1 (T 1 ). Thus, since the projectors Q n converge strongly to identity in H 1 (T 1 ) as n → ∞, we can use (77), (78), and the fact that the ranges of the operator T n used above are in H 1 (T 1 ) ⊂ C α (T 1 ), and infer that that U n are proper linear discretizations of U in C α (T 1 ), too. Because of this the measurements M (1) and M (2) are well defined and one can see that the measurements M where Z n = (Z n 1 , Z n 2 , . . . , Z n n−1 ) is a R n valued random variable having the probability density function π Z n (z 0 , . . . , z n−1 ) = c n exp −a n d dt ( n−1 j=1 z j η n j (t)) L 1 (I) where a n > 0 is a parameter and c n is a normalization constant of the probability density function. The distribution of the random variables U n are sometimes called the discrete total variation prior. By [39], the random variables U n converge in distribution if a n = n 1/2 but then the limit U is a Gaussian random variable. As a Gaussian distribution stays Gaussian in a linear transformation, we see that in this example U n are not proper linear discretizations of U in any Banach space.
Let now u ∈ L 2 (T 1 ). We consider two measurements and define models M  (71) where E is Gaussian white noise in L 2 (T 2 ). Then, the measurement M (1) is well defined and the measurements M (1) n converge in distribution to M (1) . However, one can show that M (2) n ∼ N(0, σ 2 n ), where σ n → ∞ as n → ∞ and we see that the measurements M (2) n do not converge in distribution as n → ∞. In this example one can verify that the U n are proper linear discretizations of U in H −1 (T 2 ).
The fact the point value measurements M n do not converge is related to the fact that the two-dimensional Gaussian smoothness prior has, formally speaking, the covariance function E (U(t)U(s)) = G(t, s), t, s ∈ T 2 is the Green's function of the operator −∆ + I and has thus on the diagonal t = s a logarithmic singularity. This fact is extensively used in quantum field theory in the study of the free Gaussian field, a random field that is very similar to the Gaussian smoothness prior [58].

Appendix C. On the domain of reconstructors
Considering formula (27) in the infinite-dimensional case, we meet the difficulty that realizations of M belong to Z only with probability zero. Therefore the function m → R M (U|m) should be defined in some larger set than Z.
A generalized definition of a reconstructor is as follows: The quantity E(g(U)| · ) : S 0 → Y is defined analogously.
For the domain of R M (U|m) we could consider any of the non-trivial Borelmeasurable subspaces L ⊂ S, such that the realization M(ω) belongs to L almost surely. Given any two such subspaces L 1 and L 2 , the value of the function R M (U|m) can be changed in L 1 \ L 2 , without contradicting the property (81). It is tempting to choose the domain to be the intersection of all such subspaces L ⊂ S, but unfortunately, this intersection is the set Z where the realizations of M lie only with probability zero. It appears to be hard to pick a candidate for a smallest space S 0 where the reconstructor R M (U|m) should be defined.
Because of the above difficulties, we restricted ourselves to the case where the operator A maps A : S → S 1 implying that m → R M (U|m) can be defined in the whole space S.