Fast Gibbs sampling for high-dimensional Bayesian inversion

Solving ill-posed inverse problems by Bayesian inference has recently attracted considerable attention. Compared to deterministic approaches, the probabilistic representation of the solution by the posterior distribution can be exploited to explore and quantify its uncertainties. In applications where the inverse solution is subject to further analysis procedures, this can be a significant advantage. Alongside theoretical progress, various new computational techniques now allow sampling from very high dimensional posterior distributions: In [Lucka2012], a Markov chain Monte Carlo (MCMC) posterior sampler was developed for linear inverse problems with $\ell_1$-type priors. In this article, we extend this single component Gibbs-type sampler to a wide range of priors used in Bayesian inversion, such as general $\ell_p^q$ priors with additional hard constraints. Besides a fast computation of the conditional, single component densities in an explicit, parameterized form, a fast, robust and exact sampling from these one-dimensional densities is key to obtaining an efficient algorithm. We demonstrate that a generalization of slice sampling can utilize their specific structure for this task and illustrate the performance of the resulting slice-within-Gibbs samplers in different computed examples. These new samplers allow us to perform sample-based Bayesian inference in high-dimensional scenarios with certain priors for the first time, including the inversion of computed tomography (CT) data with the popular isotropic total variation (TV) prior.


Bayesian Inversion
We consider the task of inferring information about an unknown quantity from indirect, noisy measurements and assume that a reasonable mathematical model is given by a linear, ill-posed operator equation including additive noise terms. The following discrete forward model is used to carry out the computational inference:

$$f = A u + \varepsilon. \qquad (1)$$

Here, $f \in \mathbb{R}^m$ represents the measured data, $u \in \mathbb{R}^n$ a suitable discretization of the unknown quantity we wish to reconstruct and $A \in \mathbb{R}^{m \times n}$ a corresponding discretization of the continuous forward operator. We assume that the statistics of the additive noise can be well-approximated by a Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ and that $f$ and $A$ are already centered and decorrelated with respect to $\mu$ and $\Sigma$, i.e., $f = \Sigma^{-1/2}(\tilde{f} - \mu)$ and $A = \Sigma^{-1/2}\tilde{A}$, where $\tilde{f}$ and $\tilde{A}$ denote the original data and forward operator, respectively. This leads to $\varepsilon \sim \mathcal{N}(0, I_m)$ and the following likelihood distribution,

$$p_{li}(f|u) \propto \exp\left(-\tfrac{1}{2}\|f - Au\|_2^2\right), \qquad (2)$$

which is a probabilistic model of the measured data $f$ given the unknown solution $u$.

In typical inverse problems, solving (1) for $u$ is ill-posed. As a consequence, the information that (2) contains about $u$ is insufficient to carry out robust inference and we need to amend it by a-priori information, encoded in a prior distribution $p_{prior}(u)$. Then, the total information on $u$ we have after performing the measurement is encoded by the conditional distribution of $u$ given $f$, the so-called a-posteriori distribution. It can be computed by Bayes' rule:

$$p_{post}(u|f) = \frac{p_{li}(f|u)\, p_{prior}(u)}{p(f)} \propto p_{li}(f|u)\, p_{prior}(u). \qquad (3)$$

Figure 1 illustrates the inference process. Originating from statistical physics, Gibbs distributions are commonly used prior models:

$$p_{prior}(u) \propto \exp\left(-\lambda\, \mathcal{J}(u)\right). \qquad (4)$$

The functional $\mathcal{J}(u)$ measures an energy of $u$. The use of Gibbs priors leads to

$$p_{post}(u|f) \propto \exp\left(-\tfrac{1}{2}\|f - Au\|_2^2 - \lambda\, \mathcal{J}(u)\right). \qquad (5)$$

For a general introduction to Bayesian inversion we refer to [20,39,28] and references therein. The recent attention on this particular inversion approach is best reflected by the recent special issue of Inverse Problems [7], which also provides a good overview of current developments and trends.
The two most common Bayesian estimators are the maximum a-posteriori (MAP) and the conditional mean (CM) estimate,

$$\hat{u}_{MAP} := \underset{u}{\operatorname{argmax}}\; p_{post}(u|f), \qquad \hat{u}_{CM} := \int u\; p_{post}(u|f)\, \mathrm{d}u, \qquad (6)$$

which both yield a single point estimate of $u$. Details on their properties and relationship can be found in [6]. More sophisticated estimators such as conditional covariance (CCov), conditional variance (CVar) or standard deviation (CStd) estimates try to extract higher order statistics of $u$, or try to quantify the uncertainties of $u$, for instance through credible region/interval and extreme value probability estimators.

Sample-based Inference
Bayesian computation refers to the practical task of computing the above estimators. For most inversion scenarios and prior models, this involves solving high-dimensional optimization or integration tasks (cf. (6)), or even a mix of both. In this article, we examine techniques that integrate $p_{post}(u|f)$ by Monte Carlo integration:

$$\int g(u)\, p_{post}(u|f)\, \mathrm{d}u \approx \frac{1}{K} \sum_{i=1}^{K} g(u^i), \qquad (7)$$

where the $u^i$ are samples of $p_{post}(u|f)$ generated by a sampling algorithm/sampler. Due to the lack of efficient direct samplers that generate i.i.d. samples, Markov chain Monte Carlo (MCMC) samplers need to be employed in most situations. MCMC for high dimensional Bayesian inversion is a very active field of research, see, e.g., [16,11,2,36,22,31,38,37,4,1,33,32,12] for some examples of recent developments. In [27], an efficient MCMC sampler for Gibbs priors with $\ell_1$-norm-type energies ($\ell_1$ priors, cf. Figure 1b) was presented:

$$\mathcal{J}(u) = \|D^T u\|_1 = \sum_{i=1}^{h} |(D^T u)_i|, \qquad D \in \mathbb{R}^{n \times h}. \qquad (8)$$

Such energies are commonly used to impose sparsity constraints on the solution of high dimensional image reconstruction problems, a direction of research closely related to the notion of compressed sensing [8,14,15]. A detailed discussion of sparsity as a-priori information in Bayesian inversion can be found in [28]. The sampler developed in [27] belongs to the class of single component (SC) Gibbs samplers, which sample $p_{post}(u|f)$ by subsequently sampling along the conditional, single component densities $p_{post}(u_j | u_{-j}, f)$:

Algorithm 1 (SC-Gibbs Sampling). Define an initial state $u^0$, a burn-in size $K_0$ and sample size $K$. For $i = 1, \ldots, K_0 + K$ do:
A1.1 Choose a component $j$ (deterministic or random).
A1.2 Draw $u_j^i \sim p_{post}(u_j | u_{-j}^{i-1}, f)$ and set $u_{-j}^i = u_{-j}^{i-1}$.
Then discard $\{u^i\}_{i=0}^{K_0}$ and use $\{u^i\}_{i=K_0+1}^{K_0+K}$ as a sample of $p_{post}(u|f)$. We have used $x_{-j} := (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)^T$ to denote a vector with all components of $x$ except for $x_j$. An illustration of Gibbs sampling is given in Figure 1c. In [27], this SC-Gibbs sampler was compared to the popular Metropolis-Hastings (MH) sampler: For the computational scenarios considered and the evaluation performed, it was demonstrated that in contrast to the MH, SC-Gibbs sampling gets more efficient when the level of sparsity or the dimension of the unknowns is increased. Thereby, it became possible to carry out sample-based inference with $\ell_1$ priors in challenging inverse problems scenarios with $n > 10^6$:
• The theoretical predictions about the infinite dimensional limits of TV priors posed in [24,10] could be verified numerically (see [28]).
• The numerical results stimulated the development of new theoretical ideas about the relationship of MAP and CM estimates (see [6,19]).
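The structure of Algorithm 1 can be illustrated in code. The following toy example is our own sketch, not taken from the paper: it applies SC-Gibbs sampling to a bivariate Gaussian with correlation $\rho$, where the SC conditional densities are known in closed form, so step A1.2 is a direct draw. All function names are hypothetical.

```python
import random

def sc_gibbs_bivariate_gaussian(rho, K0, K, seed=0):
    """Sketch of Algorithm 1 for a zero-mean bivariate Gaussian with unit
    variances and correlation rho, where the SC conditional density is
    known in closed form: p(u_j | u_-j) = N(rho * u_-j, 1 - rho^2)."""
    rng = random.Random(seed)
    u = [0.0, 0.0]                      # initial state u^0
    samples = []
    cond_std = (1.0 - rho * rho) ** 0.5
    for i in range(K0 + K):
        j = rng.randrange(2)            # A1.1: choose a component at random
        u[j] = rng.gauss(rho * u[1 - j], cond_std)  # A1.2: draw from SC density
        if i >= K0:                     # keep only post-burn-in states
            samples.append((u[0], u[1]))
    return samples

samples = sc_gibbs_bivariate_gaussian(rho=0.8, K0=1000, K=20000)
```

For this toy target, the empirical correlation of the stored samples converges to $\rho$, which makes the sketch easy to verify.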

Previous Limitations
As the sampler developed in [27] relies on a direct sampling of the SC densities, namely the inverse cumulative distribution function method (iCDF), we will call it the direct $\ell_1$ sampler in the following. To be efficient, any SC Gibbs sampler has to
(SC1) compute the conditional, SC densities in an explicit, parameterized form in a fast way.
(SC2) employ a fast, robust and exact sampling scheme for the parameterized form of the SC densities.
In order to best fulfill (SC1) and (SC2), the direct $\ell_1$ sampler was designed for a very particular setting: Firstly, in addition to relying on a linear forward map (1) and a Gaussian noise model (2), it assumes that the operator $D \in \mathbb{R}^{n \times h}$ in (8) can be diagonalized (synthesis priors): There is a basis matrix $V$ such that $D^T V$ is a diagonal matrix $W \in \mathbb{R}^{h \times n}$. The direct $\ell_1$ sampler then samples over the coefficients of $u = V\xi$ in this basis. This excludes the use of frames or dictionaries for $D$. Secondly, it only works for the $\ell_1$ norm as a prior energy: A straight-forward extension of iCDF to more general $\ell_p^q$ priors of the form $p_{prior}(u) \propto \exp(-\lambda \|D^T u\|_p^q)$ is not possible. This excludes the interesting cases of $q = p$, $p < 1$, which lead to a non-convex energy, but also $p = 1$, $q > 1$, which was examined in [10]. Finally, a lot of interesting priors such as the popular isotropic TV prior in 2D/3D or related block/structured sparsity priors have a more involved structure than (8) and cannot be treated with iCDF in an efficient and robust way either. In all the above cases, including additional hard constraints $u \in C$ (10), where $C$ is the feasible set of solutions, is often advantageous: While such constraints have proven to be very useful as a-priori knowledge [40,3], their inclusion into the direct $\ell_1$ sampler in a numerically stable way is cumbersome.

Contributions and Structure
For most of the limitations discussed above, the main problem is not to fulfill (SC1), but to fulfill (SC2) by using a direct sampler such as iCDF for the parameterized SC densities in step A1.2. In this article, we instead sample from them by using a generalization of slice sampling that utilizes their specific structure, and we demonstrate the effectiveness of this replacement in different computed examples. This allows us to perform sample-based Bayesian inference in high-dimensional scenarios with the priors described above for the first time. The paper is structured as follows: In Section 2, we first derive the SC densities for the priors discussed above. Then, we introduce the basic and generalized slice sampler and discuss how to integrate it into the SC Gibbs sampler for Bayesian inversion. Section 3 contains computed examples and Section 4 closes with a discussion. Several technical details are covered in Appendix A.

Sampling Methods
For general and comprehensive introductions to MCMC sampling methods, we refer to [34,26].

SC Posterior Densities
In this section, we briefly derive the SC posterior densities for the examined prior models in a simple, parameterized way, cf. (SC1). We first discuss the case where a basis $\{v_1, \ldots, v_n\}$ helps to represent $u = \sum_l \xi_l v_l =: V\xi$ such that $p_{post}(\xi_j | \xi_{-j}, f)$ can be described using as few parameters as possible. Once such a basis is found, the part of $p_{post}(\xi_j | \xi_{-j}, f)$ coming from the likelihood is easy to derive: We define $\Psi := AV$ and $\varphi(j) := f - \Psi_{-j}\xi_{-j}$. Then, we find that the likelihood part is proportional to $\exp(-a\,\xi_j^2 + b\,\xi_j)$ with $a := \tfrac{1}{2}\|\Psi_j\|_2^2$ and $b := \Psi_j^T \varphi(j) = \Psi_j^T f - (\Psi_j^T \Psi_{-j})\xi_{-j}$; we write $x := \xi_j$ and suppress the dependence on $\xi_{-j}$ to ease the notation for the following sections. Note that while $a$ and $\Psi_j^T f$ can be precomputed, $(\Psi_j^T \Psi_{-j})\xi_{-j}$ relies on the current state of the $\xi$-chain and has to be computed in every step of the sampler. Especially for complicated forward operators in high-dimensional scenarios, this operation is the computational bottleneck of SC Gibbs samplers. Therefore, a careful, scenario-dependent implementation is important to obtain a fast sampler.

Now we proceed to determine $V$ and the part of $p_{post}(\xi_j | \xi_{-j}, f)$ coming from the prior. The energies of the $\ell_p^q$ priors can be written as

$$\mathcal{J}(u) = \|D^T V \xi\|_p^q = \Big(\sum_l |(D^T V \xi)_l|^p\Big)^{q/p}. \qquad (12)$$

To obtain simple conditional densities for all $\xi_j$, we thus have to choose $V$ such that the number of energy terms in which each single $\xi_j$ appears, i.e., the number of non-zero entries per column of $D^T V$, is as small as possible. We first consider the special but important case of $D^T \in \mathbb{R}^{h \times n}$ having full rank and $h \leq n$. This includes the case where the columns of $D$ are elements of a basis, and thereby, the class of Besov priors, see [23,21,18,13,6], and the TV prior in 1D with Neumann boundary conditions, which we will use in the computational studies. Due to the full rank, we can choose $v_1, \ldots, v_h$ such that $D^T v_l = e_l$ for $l = 1, \ldots, h$, and $v_{h+1}, \ldots, v_n$ such that $D^T v_l = 0$ for $l = h+1, \ldots, n$ (for $D$ being a basis, we have $V = D$). With this transformation, (12) simplifies to

$$\mathcal{J}(V\xi) = \Big(\sum_{l=1}^{h} |\xi_l|^p\Big)^{q/p}.$$

Defining $x := \xi_j$ as above, we can write the conditional SC posterior density as

$$p_{post}(x | \xi_{-j}, f) \propto \exp\Big(-a x^2 + b x - \lambda \big(|x|^p + \textstyle\sum_{l \neq j,\, l \leq h} |\xi_l|^p\big)^{q/p}\Big), \qquad (15)$$

which simplifies to

$$p_{post}(x | \xi_{-j}, f) \propto \exp\left(-a x^2 + b x - c\,|x|^p\right), \qquad (16)$$

with $c \geq 0$ for $\ell_p$ priors.
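As a small illustration of the likelihood side of (SC1), the following sketch computes the parameters $(a, b)$ of the Gaussian factor $\exp(-ax^2 + bx)$, assuming the parameterization $a = \tfrac{1}{2}\|\Psi_j\|_2^2$ and $b = \Psi_j^T(f - \Psi_{-j}\xi_{-j})$. The function name and the dense-matrix representation are our own choices for illustration, not the paper's implementation.

```python
def sc_likelihood_params(Psi, f, xi, j):
    """Parameters (a, b) of the SC likelihood factor exp(-a*x^2 + b*x)
    for component j, under the assumed parameterization
    a = 0.5 * ||Psi_j||_2^2,  b = Psi_j^T (f - Psi_{-j} xi_{-j}).
    Psi is given as a list of rows (an m x n matrix), f as an m-vector
    and xi as the current state of the coefficient chain."""
    m, n = len(Psi), len(xi)
    col_j = [Psi[k][j] for k in range(m)]
    a = 0.5 * sum(c * c for c in col_j)   # precomputable in a real sampler
    # data residual against all components except j:
    resid = [f[k] - sum(Psi[k][l] * xi[l] for l in range(n) if l != j)
             for k in range(m)]
    b = sum(col_j[k] * resid[k] for k in range(m))
    return a, b

# toy check: b/(2a) is the least-squares value for the single component
a, b = sc_likelihood_params([[1.0, 0.0], [0.0, 2.0]], [1.0, 2.0], [0.5, 0.5], 0)
```

Note that the maximizer of the likelihood factor alone is $b/(2a)$, which in this toy case is the least-squares solution for component $j$ given the other coefficients.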
In the case where $D$ cannot be diagonalized, an explicit, parameterized form of the SC densities can still be derived directly, at the price of more prior energy terms depending on $x$. Various generalizations of the standard $\ell_p^q$ priors with $\|D^T u\|_p^q$-type energies first compute the $\ell_2$-norm of a local feature of $u$, e.g., of its gradient, and then measure the global $\ell_p^q$ energy of these local $\ell_2$ norms. In this article, we will only discuss one prominent example thereof, which is given by the isotropic TV prior in 2D: If we assume that $u$ represents an $N \times N$ discrete image, we can index the components of $u$ as $u_{(k,l)}$ with $k = 1, \ldots, N$, $l = 1, \ldots, N$, $n = N^2$. We can then use forward differences in both spatial directions to define

$$\mathcal{J}_{iTV}(u) = \sum_{k,l} \sqrt{\left(u_{(k+1,l)} - u_{(k,l)}\right)^2 + \left(u_{(k,l+1)} - u_{(k,l)}\right)^2}, \qquad (19)$$

with appropriate additional boundary conditions. The local nature of $\mathcal{J}_{iTV}(u)$ allows us to derive a simple parameterization of the SC densities in the pixel basis $V = I_n$.
Every $\xi_j = u_{(k,l)}$ only appears in three terms of the energy. Therefore, we can write the conditional SC posterior as

$$p_{post}(x | \xi_{-j}, f) \propto \exp\Big(-a x^2 + b x - \lambda \sum_{k=1}^{3} \sqrt{d_k x^2 + e_k x + g_k}\Big), \qquad (21)$$

with appropriately computed parameters $d_k \in \{0, 1, 2\}$, $e_k$, $g_k \geq 0$.
The difficulty of incorporating additional hard constraints (10) depends on the shape of the feasible set $C$ and the transformation $V$ applied. In the following, we assume that they lead to a feasible (semi-)finite interval $[lb, ub]$ to which the continuous densities computed above can be restricted. In the case of $C$ being convex, such an interval always exists and there are computationally efficient ways to compute it.

MCMC-within-Gibbs Sampling
The direct $\ell_1$ sampler samples (16) with $p = 1$ by the iCDF method using an explicit form of the inverse CDF. For $p, q \neq 1$ or (21), this is not possible and one would need to integrate the CDF numerically to use the iCDF method as a SC density sampler. However, already for $p = 1$, a major technical difficulty was to develop a numerical implementation that worked for all possible combinations $(a, b, c)$: The first implementations broke down when the dimension $n$ of the problem was increased and the ill-posedness became more severe. The reason was that combinations of $(a, b, c)$ corresponding to extremely degenerate SC densities appeared more frequently for $n \to \infty$ and, in general, the variability of the SC densities grows. This trend will be an even more severe problem when one cannot find an explicit form of the inverse CDF and needs to resort to numerical integration. But also replacing the iCDF method by a univariate MCMC sampler (MCMC-within-Gibbs sampling) becomes challenging: The most commonly used Metropolis-within-Gibbs sampler, which utilizes an easy-to-implement MH sampler with a univariate Gaussian proposal $\mathcal{N}(x, \kappa)$ (where $x$ is the current state) for the SC sampling step A1.2, will not work properly in such a situation: For an MH sampler to be efficient, finding a value of $\kappa$ leading to an optimal acceptance rate is essential. However, the large variation between SC densities renders an automatic tuning of a single $\kappa$ impossible. The alternative would be to tune and use a different $\kappa_j$ for every component $j$, but the tuning procedure would require $n$ times more samples than tuning one $\kappa$ for all components. Thereby, the resulting algorithm would be more like an adaptive SC-MH sampler than a Gibbs sampler [17,25].
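The sensitivity to the proposal scale $\kappa$ described above can be illustrated with a minimal Metropolis-Hastings step (our sketch, not the paper's implementation): on the same target density, a tiny $\kappa$ yields an acceptance rate near 1 and a huge $\kappa$ an acceptance rate near 0, so no single $\kappa$ can serve SC densities whose scales vary strongly.

```python
import math, random

def mh_sc_step(logp, x, kappa, rng):
    """One Metropolis-Hastings update with Gaussian random-walk proposal
    N(x, kappa^2); logp is the unnormalized log of the SC density."""
    prop = rng.gauss(x, kappa)
    if rng.random() < math.exp(min(0.0, logp(prop) - logp(x))):
        return prop, True
    return x, False

def acc_rate(kappa, n=5000, seed=1):
    """Empirical acceptance rate on a standard Gaussian SC density."""
    rng = random.Random(seed)
    logp = lambda z: -0.5 * z * z
    x, acc = 0.0, 0
    for _ in range(n):
        x, accepted = mh_sc_step(logp, x, kappa, rng)
        acc += accepted
    return acc / n

# nearly everything is accepted for a tiny step size, almost nothing
# for a huge one -- a single tuned kappa cannot serve both regimes
rates = (acc_rate(0.1), acc_rate(50.0))
```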

Slice Sampling
Slice sampling transfers the automatic adaptation of Gibbs sampling to univariate densities. While the basic version to sample arbitrary densities in a "black-box" fashion was proposed in [30], we follow the presentation given in [34], which leads to a general version in which we can utilize several properties of our specific posterior densities to derive an efficient SC sampler. The starting point for slice sampling is the Fundamental Theorem of Simulation, which states that sampling from a distribution $p(x)$ is equivalent to sampling uniformly from the area under the graph of $p(x)$:

Algorithm 2 (Basic slice sampling). Define an initial state $x^0$, a burn-in size $K_0$ and sample size $K$. For $i = 0, \ldots, K_0 + K - 1$ do:
A2.1 Draw $y$ uniformly from $[0, p(x^i)]$ (vertical move).
A2.2 Draw $x^{i+1}$ uniformly from $S_y := \{z \mid p(z) \geq y\}$ (horizontal move).
Then discard the burn-in and use $\{x^i\}_{i=K_0+1}^{K_0+K}$ as a sample of $p(x)$.

The difficulty of this basic slice sampling scheme as developed in [30] is determining $S_y$ in step A2.2. For the SC densities we want to sample from, determining $S_y$ explicitly is not always feasible, and robust numerical approaches to compute it are difficult to design. For instance, using non-convex prior energies such as in $\ell_p^q$ priors with $q = p$, $p < 1$ leads to multi-modal SC densities and $S_y$ may not be a single interval. Therefore, we will use a generalization of Algorithm 2: Slice sampling is a variant of auxiliary variables algorithms that introduce an additional variable $y$ with a suitable density $p(y|x)$. Then, samples $(x^i, y^i)$ from $p(x, y) = p(x)\,p(y|x)$ are obtained by a Gibbs sampler, which relies on $p(y|x)$ and $p(x|y)$, and only the $x^i$ are kept. For the basic slice sampler, $p(y|x)$ is chosen as

$$p(y|x) = \frac{1}{p(x)}\, 1_{[0,\, p(x)]}(y),$$

i.e., as a uniform distribution on $[0, p(x^i)]$. We then have $p(x, y) \propto 1_{\{0 \leq y \leq p(x)\}}$, which leads to $p(x|y) \propto 1_{S_y}(x)$, i.e., a uniform distribution on $S_y$. For a factorized target density $p(x) \propto p_1(x)\,p_2(x)$, one can instead choose $p(y|x)$ as a uniform distribution on $[0, p_2(x)]$, which leads to $p(x|y) \propto p_1(x)\,1_{S_y^2}(x)$ with $S_y^2 := \{z \mid p_2(z) \geq y\}$. The corresponding sampler takes the form:

Algorithm 3 (Slice sampling). For a univariate density $p(x) \propto p_1(x)\,p_2(x)$, define an initial state $x^0$, a burn-in size $K_0$ and sample size $K$. For $i = 0, \ldots, K_0 + K - 1$ do:
A3.1 Draw $y$ uniformly from $[0, p_2(x^i)]$.
A3.2 Draw $x^{i+1}$ from $p_1$ restricted to $S_y^2 = \{z \mid p_2(z) \geq y\}$.

For all the methods presented in this section, $p(x)$ does not need to be normalized. Also note that for simplicity, we refer to Algorithm 3 as the "slice sampler", hopefully without causing confusion with the one presented in [30], which was included as the "basic slice sampler" (Algorithm 2) here for completeness of the presentation.
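For a density whose slice $S_y$ is available in closed form, Algorithm 2 is straightforward to implement. A minimal sketch (our own example, not from the paper) for the Laplace-type density $p(x) \propto \exp(-\lambda|x|)$, where $S_y = [-t, t]$ with $t = -\ln(y)/\lambda$:

```python
import math, random

def slice_sample_laplace(lam, K0, K, seed=0):
    """Basic slice sampler (Algorithm 2) for p(x) = exp(-lam*|x|), where
    the slice S_y = {z : p(z) >= y} is the explicit interval [-t, t]
    with t = -ln(y)/lam, so the horizontal move is a direct uniform draw."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for i in range(K0 + K):
        y = max(rng.uniform(0.0, math.exp(-lam * abs(x))), 1e-300)  # A2.1
        t = -math.log(y) / lam                                      # S_y = [-t, t]
        x = rng.uniform(-t, t)                                      # A2.2
        if i >= K0:
            samples.append(x)
    return samples

samples = slice_sample_laplace(lam=1.0, K0=100, K=20000)
```

For $\lambda = 1$, the target is a Laplace distribution with mean $0$ and variance $2$, so the sketch is easy to check against the sample moments.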

Slice-Within-Gibbs Sampling for Bayesian Inversion
The implicit variable split introduced in Algorithm 3 is appealing if $S_y^2 = \{z \mid p_2(z) \geq y\}$ is a single interval that is easy to determine, and $p_1(x)$ constrained to an interval is easy to sample from. For the SC posterior densities we consider here, this holds if we split into a likelihood-plus-hard-constraints part, i.e., $p_1(x) = \exp(-ax^2 + bx)\,1_{[lb,ub]}(x)$, and a prior part $p_2(x)$. As the prior terms are unimodal, and some even symmetric about zero, $S_y^2$ is a single interval and can be determined easily: For (15), it is a symmetric interval around zero whose endpoints can be computed explicitly by inverting the monotone prior energy. For the TV prior, (21), we need to compute $S_y^2$ numerically. However, as the energy of $p_2(x)$ is convex, $S_y^2$ is a single interval given by the solutions to $p_2(x) = y$. As the energy of $p_2(x)$ is also piecewise smooth and can be bounded from below, we can easily find starting points for fast, derivative-based root-finding algorithms. The details are given in Appendix A.2. A generalization to other convex, piecewise-smooth energies, such as (17) with suitable $p, q$, is straight-forward ($p = q = 1$ is a special case as $p_2(x) = y$ can be solved explicitly by a simple scheme). However, if $D^T V$ is dense, the number of terms in the prior energy is large and this step will become the computational bottleneck of the whole solver. Fortunately, many relevant operators $D^T$, such as finite difference operators, lead to a sparse $D^T V$. In step A3.2, $p_1(x)$ restricted to the interval $I := [lb, ub] \cap S_y^2$ is a truncated Gaussian. For sampling truncated Gaussians, various direct samplers were developed. Our implementation relies on a modified, more robust version of [9]. Note that if the sampler is initialized in a feasible point $u^0 \in C$, the probability of $I$ being empty or a single point is zero in theory. In practice, finite precision can lead to $I = \{x\}$, in which case one has to set $x^{i+1} = x$. Using the slice sampler presented above to sample from $p_{post}(\,\cdot\,|\xi_{-j}, f)$ in step A1.2 will be called slice-within-Gibbs sampling. In principle, it will generate a full Markov chain, in which we subscript all variables belonging to the inner slice sampler with $s$.
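Step A3.2 thus reduces to sampling a Gaussian restricted to an interval. The following inverse-CDF sketch (our own illustration) shows the idea; it is not the robust method of [9] that the implementation relies on, and it loses accuracy when the interval sits far in a tail.

```python
import random
from statistics import NormalDist

def truncated_gauss(mean, std, lb, ub, rng):
    """Draw from N(mean, std^2) restricted to [lb, ub] via the inverse CDF:
    map a uniform draw on [F(lb), F(ub)] back through F^{-1}. This simple
    scheme is numerically fragile far in the tails, which is why more
    robust methods (such as the modified version of [9]) are preferable."""
    d = NormalDist(mean, std)
    u = rng.uniform(d.cdf(lb), d.cdf(ub))
    return d.inv_cdf(u)

rng = random.Random(0)
draws = [truncated_gauss(0.0, 1.0, 0.5, 2.0, rng) for _ in range(5000)]
```

All draws land inside the interval by construction, and the sample mean matches the known mean of the truncated standard Gaussian on $[0.5, 2]$ (approximately $1.04$).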
Practically, we only need one sample from $p_{post}(\,\cdot\,|\xi_{-j}, f)$. We will always initialize the slice sampler with the current value $\xi_j^i$ of the component we want to update. Then, we only have to determine the length of the burn-in phase $K_{s,0}$ and take the first sample of the real run as a sample of $p_{post}(\,\cdot\,|\xi_{-j}, f)$, i.e., $K_s = 1$.
The correctness and convergence of the slice-within-Gibbs sampler can be established by combining the properties of the slice sampler (Algorithm 3) and the general Gibbs sampler (Algorithm 1), which are discussed in [34].

Computational Scenarios
"Boxcar" - Image Deblurring in 1D. For the initial evaluation studies, we use a simple image deblurring scenario in 1D that was adopted from [24] and also used in [27]. It is a simplification of the task of reconstructing a spatially distributed intensity image that is known to consist of piecewise homogeneous parts with sharp edges: The indicator function of $[\frac{1}{3}, \frac{2}{3}]$ is to be recovered from its integrals over $m = 30$ equidistant subintervals of $[0, 1]$, corrupted by noise with $\mu = 0$, $\Sigma = 10^{-3} I_m$ (see Figure 3). The reconstruction is carried out on the grid $x_i^n = i/256$, $i = 1, \ldots, n$, with $n = 255$, and the forward operator is discretized by the trapezoidal quadrature rule applied to that grid. Further details can be found in Section 3.1.1 of [27]. The prior operator $D^T$ will be given by the forward difference operator with Neumann boundary conditions: $D^T$ has full rank $h = n - 1$, and a basis matrix $V \in \mathbb{R}^{n \times n}$ such that $D^T V$ is a diagonal matrix is given, e.g., by taking as columns $v_l$ the discrete step functions with $(v_l)_k = 1$ for $k > l$ and $0$ otherwise, $l = 1, \ldots, n-1$, complemented by the constant vector $v_n = (1, \ldots, 1)^T$. We will refer to priors based on this operator as increment priors. For the $\ell_1$ increment prior, i.e., the conventional TV prior, we can also use the direct $\ell_1$ sampler to sample from the posterior. This allows us to validate the approximation of the direct SC sampling via iCDF by the slice sampler proposed here. We will refer to this setting as the "Boxcar" scenario in the following.
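The increment-prior construction can be checked numerically. The following sketch (our own code; the step-function basis is one possible choice of $V$, not necessarily the paper's) builds the forward-difference operator $D^T$ and a basis $V$ and verifies that $D^T V$ has the required diagonal structure.

```python
def forward_difference_DT(n):
    """Forward-difference prior operator D^T in R^{(n-1) x n}:
    (D^T u)_k = u_{k+1} - u_k (Neumann boundary: no wrap-around row)."""
    return [[-1.0 if l == k else 1.0 if l == k + 1 else 0.0
             for l in range(n)] for k in range(n - 1)]

def increment_basis(n):
    """A basis matrix V with D^T V 'diagonal': column l (l < n-1) is the
    step vector with l+1 leading zeros, so D^T v_l = e_l, and the last
    column is the constant vector with D^T v_{n-1} = 0 (0-based indices)."""
    V = [[0.0] * n for _ in range(n)]
    for l in range(n - 1):
        for k in range(l + 1, n):
            V[k][l] = 1.0
    for k in range(n):
        V[k][n - 1] = 1.0
    return V

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n = 6
W = matmul(forward_difference_DT(n), increment_basis(n))
# W should equal [I_{n-1} | 0]: identity on the first n-1 coefficients,
# a zero column for the constant (null-space) direction
```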
"Phantom-CT" - CT Inversion in 2D. We consider an example of 2D sparse angle CT to demonstrate the potential of the proposed sampler for real-world applications. An approximate model of CT is given by the Radon transform $R$: For a 2D function $u_\infty \in L^2([-1,1]^2)$, it computes integrals along straight lines which are parametrized by the angle $\theta$ of their normal vector and their (signed) distance $s$ to the origin:

$$R[u_\infty](\theta, s) = \int_{\mathbb{R}} u_\infty\big(s\,\omega(\theta) + t\,\omega^\perp(\theta)\big)\, \mathrm{d}t, \qquad \omega(\theta) = (\cos\theta, \sin\theta)^T.$$

Figure 5: (a) Histogram (blue bars) of the slice sampler compared to the targeted SC density (red line) given by (16) with $p = 0.8$. The parameters $a$ and $b$ were picked from a run of the direct $\ell_1$ sampler applied to a 2D image deblurring scenario (described in [27]) and $c$ was matched to the regularization parameter used therein.
In sparse angle tomography, only a small number of such angular projections can be measured. In our study, we chose only $m_\theta = 45$ angles, evenly distributed in $[0, \pi)$. In addition, for a given angle $\theta_i$, we practically only measure the integrals of $R[u_\infty](\theta_i, s)$ over small $s$-intervals representing an array of $m_s = 500$ equally sized sensor pixels. In total, this leads to $m = m_s \cdot m_\theta = 22{,}500$ measurements. The forward operator $A$ corresponds to the exact discretization of this measurement with respect to the pixel basis: All the operations involved in the measurement can be computed explicitly for indicator functions of rectangular sets. Further details of this step can be found in Section 2.3 in [28]. The unknown function $u_\infty$ to recover is a slightly scaled version of the Shepp-Logan phantom [35], a toy model of the human head defined by 10 ellipses. Figure 4a shows $u_\infty$ and Figure 4b the measurement data generated by discretizing $u_\infty$ on a 768×768 pixel grid. We will refer to this scenario as "Phantom-CT".

Accuracy Assessment
To validate that the developed slice sampler accurately reproduces the distributions it is supposed to sample from, the convergence of the sample histograms to the underlying SC densities was checked visually. Various (random) combinations of coefficients for the different SC densities were tested; see Figure 5 for an example of such a comparison.

Efficiency Assessment
Once the accuracy of the slice sampler is established, the next crucial question is whether its use within a Gibbs sampler is efficient: In Algorithm 1, we ideally want to replace the current value of the component $j$, $u_j^i$, by a value that is both distributed following $p_{post}(\,\cdot\,|u_{-j}, f)$ and statistically independent of the current value. While direct SC samplers, such as the iCDF, naturally fulfill these requirements, SC samplers relying on MCMC chains initialized with $u_j^i$ fulfill them only asymptotically, in the limit $K_{s,0} \to \infty$. Using a fixed chain size $K_{s,0}$ will inevitably introduce additional correlation between subsequent samples and lower the statistical efficiency of slice-within-Gibbs samplers compared to Gibbs sampling relying on a direct sampler for the SC densities. In the following, we will assess this loss of statistical efficiency by autocorrelation analysis.

Table 1: Comparison of $\tau_{int}$ for direct and slice-within-Gibbs samplers using different burn-in lengths for the slice sampler. The "Boxcar" scenario and a TV prior ($p = q = 1$) with $\lambda = 400$ is used and $K = 5 \cdot 10^6$, SSR = $n$.

Table 2: Comparison of $\tau_{int}$ for slice-within-RSG samplers using different burn-in lengths for the slice sampler. The "Boxcar" scenario is used in (a) with an $\ell_p$ increment prior with $p = 1.2$, $\lambda = 400$ and in (b) with an $\ell_p^q$ increment prior with $p = 1$, $q = 10$, $\lambda = 0.02$. In both cases $K = 2 \cdot 10^6$ samples were drawn. In (c), the "Phantom-CT" scenario with an isotropic TV prior with $\lambda = 500$ is used and $K = 2 \cdot 10^5$ samples were drawn.
Autocorrelation Analysis. Evaluating samplers in general rather than for a specific aim is a difficult task [26]. For the sake of a concise presentation, a detailed introduction and discussion is omitted here but can be found in Section 4.1.6 of [28]. In this study, we will only examine the autocorrelation functions of the MCMC chains projected onto a test function $v \in \mathbb{R}^n$, i.e., of the chain $\{v^T u^i\}_i$. In the "Boxcar" scenario, $v$ is given as the largest eigenvector of the (pre-computed) posterior covariance matrix, while in the "Phantom-CT" scenario, it is the indicator function of the area defined by $[-0.32, -0.12] \times [0.12, 0.32]$ (this area corresponds to the green box shown in Figures 7e-7f). To extract a quantitative measure from the autocorrelation functions, we will estimate their integrated autocorrelation time $\tau_{int}$ by the approach presented in [41]. In all computed examples, the component $j$ to update in step A1.1 of Algorithm 1 is drawn uniformly at random (random scan Gibbs sampler) and a sub-sampling rate (SSR) of $n$ is used, i.e., only every $n$-th sample of the chains generated by Algorithm 1 is actually stored, and $\tau_{int}$ refers to the samples of this thinned chain. This means that, on average, we update all $n$ components of $u \in \mathbb{R}^n$ between two stored steps of the chain (full sweep). In each scenario, the samplers were given a large burn-in time $K_0$ and $K$ was chosen large enough to obtain sufficiently tight error bounds on $\tau_{int}$ [41].
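For readers who want to reproduce such an analysis, the following sketch estimates $\tau_{int}$ with a simple truncation window. This is our own stand-in for the adaptive windowing approach of [41], whose exact rule we do not reproduce here.

```python
import random

def autocorr(chain, max_lag):
    """Normalized empirical autocorrelation function rho(0..max_lag)."""
    n = len(chain)
    mean = sum(chain) / n
    c0 = sum((x - mean) ** 2 for x in chain) / n
    return [sum((chain[i] - mean) * (chain[i + k] - mean)
                for i in range(n - k)) / (n * c0)
            for k in range(max_lag + 1)]

def tau_int(chain, max_lag=200):
    """Integrated autocorrelation time tau = 1/2 + sum_k rho(k), with the
    sum truncated at the first non-positive rho(k) -- a simple windowing
    rule standing in for the adaptive window of [41]."""
    rho = autocorr(chain, max_lag)
    tau = 0.5
    for k in range(1, max_lag + 1):
        if rho[k] <= 0.0:
            break
        tau += rho[k]
    return tau

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # tau_int ~ 0.5
ar = [0.0]                                         # AR(1): tau_int ~ 9.5
for _ in range(20000):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))
```

For i.i.d. samples the estimate stays near the ideal value $1/2$, while for a strongly autocorrelated AR(1) chain with coefficient $0.9$ it approaches $\tfrac{1}{2}\,\frac{1 + 0.9}{1 - 0.9} = 9.5$.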
Results. When using a conventional TV prior ($p = q = 1$) in the "Boxcar" scenario, the direct $\ell_1$ sampler using the iCDF method can be used as a reference to which the slice samplers can be compared: The $\tau_{int}$ obtained by the direct sampler is a lower bound for all slice samplers. Table 1 lists the results. One can observe that already for small MCMC chain lengths $K_{s,0}$, the differences between direct and slice sampler in terms of statistical efficiency are negligible in practice. Similar examinations using $\ell_2$ priors (where, again, a direct sampler can be used as a reference) showed that in this case, significant differences vanish for even smaller values of $K_{s,0}$ (results omitted here). Tables 2 (a), (b) and (c) show the results of similar examinations for an $\ell_p$ prior with $p = 1.2$, an $\ell_p^q$ prior with $p = 1$, $q = 10$ and the isotropic TV prior in the 2D "Phantom-CT" scenario (using $n = 129 \times 129$), respectively. While we do not have a direct sampler as a reference here, one can clearly see that $\tau_{int}$ converges to a limit for increasing $K_{s,0}$. In some cases, even using $K_{s,0} = 0$, i.e., only performing one step of the slice sampler, might be sufficient.
Computational Complexity. In typical large scale inverse problems such as the one examined in Section 3.5, the computational bottleneck is to compute the coefficients of the SC densities, not the process of sampling from them. Therefore, the computational complexity of the slice sampler is not a critical aspect of the whole algorithm. However, to give an indication of how increasing $K_{s,0}$ affects the total run time, Table 3 compares the run time of the slice-within-Gibbs sampler to the direct $\ell_1$ sampler, and Table 4 lists the run times of the slice-within-Gibbs sampler for the TV prior in 2D. While the implementation of the slice sampling part is more complicated in this situation (cf. Appendix A.2), we see that even for a moderately sized scenario ($n = 256^2$) it does not significantly affect the run time. Therefore, one does not have to compromise statistical efficiency by choosing a small $K_{s,0}$ to obtain a better computational efficiency.

Table 4: Total run time of the slice-within-Gibbs sampler using different burn-in lengths $K_{s,0}$ divided by the run time for $K_{s,0} = 0$. The "Phantom-CT" scenario ($n = 256 \times 256$) and a TV prior ($p = 1$) with $\lambda = 500$ is used.
Figure 6: MAP estimates by SA (a) and CM estimates by RSG (b) for the 1D "Boxcar" scenario using the $\ell_p$ increment prior and $n = 63$. The true function to recover (gray line plot) is denoted by $u^{\dagger,\infty}$.

Application to the $\ell_p$ Increment Prior
Problems with the conventional TV prior (see [24] and the overview in Section 5.2.2 in [28]) stimulated research into alternative, edge-preserving prior models. Here, we exemplify how the new slice-within-Gibbs sampler can be used to investigate such general questions in Bayesian inversion: We use it to compute both MAP and CM estimates for the $\ell_p$ increment prior with $p$ decreasing from $p = 2$ (Gaussian prior) to $p = 1$ (TV prior) and even below, $p < 1$ (non-log-concave prior). While the computation of the CM estimates is straightforward, computing MAP estimates is done by using the sampler within a simulated annealing (SA) scheme, a stochastic meta-heuristic for global optimization. The details and an evaluation of using SA together with the proposed SC Gibbs samplers can be found in Sections 4.2.4 and 5.1.5 in [28]. In both cases, $K = 10^5$ samples were drawn with SSR = $n$.
The results of computing MAP and CM estimates for different values of $p$ are shown in Figure 6. Here, $\lambda$ was chosen such that all likelihood energies are equal and such that $\lambda = 200$ for $p = 1$. The results suggest that using $p < 1$ leads to superior results for both MAP and CM estimates compared to $p = 1$. The MAP estimate is closer to the real solution as it is both sparser in the increment basis and the contrast loss is reduced. The CM estimate for $p = 0.8$ is considerably more convincing than those for $p \geq 1$: It has clearly pronounced edges that separate smooth, denoised parts. However, using the slice-within-Gibbs samplers for $p < 1$ needs to be examined more carefully: While the results are visually convincing, we cannot be sure that the sampler explored the whole, possibly multimodal posterior and did not get stuck in a single mode. Figure 7 shows MAP and CM estimates for the "Phantom-CT" scenario using an isotropic TV prior with Neumann boundary conditions (19). Here, the MAP estimates were computed with the alternating direction method of multipliers (ADMM) [5]. In the Gibbs sampler, oriented over-relaxation (OOR) (see [29] and Section 4.3.1 in [28]) was used to accelerate convergence, and $K = 2.5 \cdot 10^4, 10^4, 5 \cdot 10^3$ samples were drawn for $n = 64^2, 128^2, 256^2$, respectively (SSR = $n$).

Non-Negativity Constraints
As the slice-within-Gibbs sampler can easily incorporate additional hard constraints (10), it can be used to quantify their effect on the posterior $p_{post}(u|f)$. Figure 8 shows CM and CStd estimates computed with and without non-negativity constraints, $u \geq 0$. While the CM estimates look very similar, the CStd estimates reveal that the non-negativity constraints lead to a significant reduction of the posterior variance in some regions.

Gradient Estimates
We further present one example of how the samples of $u$ generated by the sampler can be used to compute statistics and uncertainties of a feature of $u$ (cf. Section 1.2): In Figure 9a, we computed the CM estimate of the gradient of the image $u$, and Figure 9b shows the corresponding CStd estimate.
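Such feature statistics are plain Monte Carlo estimates over the stored samples. A minimal sketch (our own illustration, with a forward-difference gradient as the feature) computes the CM and CStd estimates componentwise:

```python
import math, random

def cm_cstd(samples, feature):
    """Monte Carlo CM and CStd estimates of a feature g(u): the
    componentwise sample mean and standard deviation of g(u^i)."""
    vals = [feature(u) for u in samples]
    K, dim = len(vals), len(vals[0])
    cm = [sum(v[j] for v in vals) / K for j in range(dim)]
    cstd = [math.sqrt(sum((v[j] - cm[j]) ** 2 for v in vals) / (K - 1))
            for j in range(dim)]
    return cm, cstd

# toy check: forward-difference "gradient" of noisy samples of a step
rng = random.Random(0)
truth = [0.0, 0.0, 1.0, 1.0]
samples = [[t + 0.1 * rng.gauss(0.0, 1.0) for t in truth] for _ in range(4000)]
grad = lambda u: [u[j + 1] - u[j] for j in range(len(u) - 1)]
cm, cstd = cm_cstd(samples, grad)
# cm ~ [0, 1, 0]; cstd ~ 0.1*sqrt(2) in every component
```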

Discussion and Conclusions
In this article, we presented and evaluated a new MCMC sampler that makes it possible to carry out sample-based Bayesian inversion for a wide range of scenarios and prior models. It is based on extending the single component Gibbs-type sampler developed in [27] by a problem-specific adaptation and implementation of generalized slice sampling, and it enables efficient posterior sampling in high-dimensional scenarios with certain priors for the first time.
The results in Sections 3.2 and 3.3 show that using generalized slice sampling to sample from the one-dimensional conditional, single component densities can lead to a fast, robust and accurate posterior sampler for the inverse problems scenario (1) and is therefore an attractive option whenever a fast direct sampler such as iCDF is not available. The computed results in Section 3.4 exemplified the use of the new slice-within-Gibbs sampler to examine recent topics in Bayesian inversion, and Section 3.5 demonstrated how it can lead to interesting results for Bayesian estimation in challenging, high-dimensional inverse problems scenarios. In particular, we examined the TV prior model in 2D: The theoretical analysis of the TV prior carried out, e.g., in [24,23] is restricted to 1D only, and, to the best of our knowledge, no theoretical results are available for higher dimensions yet. The development of the slice-within-Gibbs sampler now enabled us to examine the use of the TV prior for the important inverse problem scenario of CT inversion in 2D for the first time. The results show that, contrary to the 1D case, the CM estimates seem to get smoother for a constant value of $\lambda$ as the resolution increases. This observation could be the starting point for a new theoretical analysis and has to be examined in higher spatial dimensions by computational studies. More generally, while our results and those of others (cf. Section 1.2) have demonstrated that sampling high-dimensional posterior distributions is nowadays feasible for many important inverse problems scenarios, an important future challenge lies in extracting the information of interest from the samples generated: While we demonstrated, e.g., how to compute CStd estimates to examine how the posterior variance is influenced by non-negativity constraints (cf. Figure 8) or estimates of a feature $g(u)$ of $u$ (cf. Figure 9), we did not discuss how to interpret the corresponding results.
This requires a concrete application and objective and will be the topic of future investigations based on the methods presented here. Related to this last point, we only used simulated data scenarios in this study to focus on the sampling algorithm. The application to experimental CT data will be the subject of a forthcoming publication covering more general aspects of Bayesian inversion in practical applications (see Section 5.3 in [28]). Furthermore, only prior models based on $\ell_p^q$-norms were considered here, while the sampler can, in principle, be implemented for more general prior models. A more fundamental limitation and future challenge is the current restriction to linear forward maps (1) and Gaussian noise models (2). Both non-linear forward maps and non-Gaussian noise models typically conflict with condition (SC1), i.e., they make it very difficult to find an explicit parameterization of the SC densities. In addition, problems related to using SC-Gibbs sampling for multimodal posteriors (cf. Section 3.4) may occur as well. Code to reproduce all the computed examples will be provided as part of the release of a Matlab-based toolbox for Bayesian inversion.

For small $n$, we can pre-compute $\Psi = A$ and $\Phi := \Psi^T \Psi$ explicitly and use the same implementation as in the "Boxcar" scenario ($\Psi = A$ as we stay in the pixel basis, i.e., $V = I_n$). For larger $n$ or 3D applications, we might not be able to store $\Phi$ or $\Psi$. An alternative implementation that does not require storing any matrices computes $b$ in the following way:
• We again pre-compute $\Psi_i^T f$ and $\|\Psi_i\|_2^2$ for all $i$. Then, we store the measurement that the current state $\xi$ would cause as $f_\xi$ and initialize it by $A u^0$. In principle, $f_\xi$ is given as $\Psi\xi$ and can be directly computed at any time, but this computation is too expensive to be performed at every SC update.
• For a given pixel $i$ that is to be updated, we construct $\Psi_i$ and compute the scalar product $\Psi_i^T f_\xi$ to update $b$ by the above formula (note that $\Psi_i^T (f - \Psi\xi)$ is just a projection of $\Psi_i$ onto the current residual $f - f_\xi$ of $f_\xi = \Psi\xi$). With the constructed $\Psi_i$ and the change $\delta_i$ in $\xi_i$ caused by the sampling step, we can then update $f_\xi \leftarrow f_\xi + \delta_i \Psi_i$.
• While this iterative updating of $f_\xi$ is fast, inaccuracies can accumulate over time, leading to a misfit between $f_\xi$ and $\Psi\xi$. Therefore, we compute $\Psi\xi$ explicitly every $n$ steps and reset $f_\xi$ to this exact value.
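To illustrate, the incremental updating scheme described above can be sketched as follows. This is a minimal sketch, not the toolbox implementation: the names (`gibbs_sweep`, `sample_sc`) are our own illustrative choices, the 1D conditional sampler is left abstract, and $\Psi$ is materialized as a dense matrix for simplicity, whereas in practice each column $\Psi_i$ would be computed on the fly.

```python
import numpy as np

def gibbs_sweep(Psi, f, xi, f_xi, sample_sc, reset_every=None):
    """One single-component Gibbs sweep with incremental residual updates.

    Psi       : (m, n) forward matrix (dense here for illustration only)
    f         : measured data
    xi        : current state (modified in place)
    f_xi      : cached forward projection Psi @ xi (modified in place)
    sample_sc : hypothetical sampler for the 1D conditional density,
                called as sample_sc(i, b_i, xi[i])
    """
    n = xi.shape[0]
    reset_every = reset_every or n
    for step, i in enumerate(np.random.permutation(n)):
        psi_i = Psi[:, i]
        # projection of the current residual onto psi_i (enters b)
        b_i = psi_i @ (f - f_xi)
        new_xi_i = sample_sc(i, b_i, xi[i])
        delta_i = new_xi_i - xi[i]
        xi[i] = new_xi_i
        # incremental update: f_xi tracks Psi @ xi without a full product
        f_xi += delta_i * psi_i
        # periodic exact reset so round-off errors cannot accumulate
        if (step + 1) % reset_every == 0:
            f_xi[:] = Psi @ xi
    return xi, f_xi
```

The invariant worth noting is that after every update (and exactly after every reset) `f_xi` equals `Psi @ xi`, so each SC update costs only one column construction and two length-$m$ operations.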
The computational bottleneck of this procedure is to compute $\Psi_i$, i.e., the Radon transform of a pixel (or voxel in 3D). For the parallel beam geometry used here, explicit formulas relying on basic operations can be derived and parallelized over the angles $\theta$. For more complicated beam geometries, e.g., the cone beam geometry for 3D reconstruction, approximations relying on basic operations from computer graphics can be derived, implemented very efficiently and parallelized on GPUs.

For finding $x^+$, a similar reasoning can be applied. At locations of non-differentiability, the minimal subgradient has to be used. $h < J_e^*$: In this case, $J(x)$ is not piecewise linear (cf. the yellow line in Figure A1a) and the unique minimizer $x^*$ is not in $\{e_1, e_2, e_3\}$. The convexity ensures that $x^- < x^* < x^+$ are all either in $I_2$ or $I_3$. If $\max(\partial J(e_1)) < 0$ and $\min(\partial J(e_2)) > 0$, we have that $x^*$ (and thereby $x^-$ and $x^+$) are in $I_2$.
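The interval computations above serve the generic slice-sampling update for a 1D density $p(x) \propto \exp(-J(x))$: draw a level $h$ above the current energy, determine the slice endpoints $x^-$ and $x^+$, and draw uniformly from $[x^-, x^+]$. The following is a minimal sketch under the assumption of a convex energy $J$, with numerical bisection standing in for the explicit subgradient-based endpoint formulas described above.

```python
import numpy as np

def slice_sample_step(J, x, lo, hi, rng, tol=1e-10):
    """One slice-sampling update for p(x) ∝ exp(-J(x)) on [lo, hi].

    Assumes J is convex, so every slice {x : J(x) <= h} is an interval.
    In the actual sampler, the endpoints are found from explicit
    (sub)gradient formulas instead of bisection.
    """
    # draw the slice level: h = J(x) + Exp(1) targets p(x) ∝ exp(-J(x))
    h = J(x) + rng.exponential(1.0)

    def endpoint(a, b):
        # bisection for the boundary of {J <= h}; a is inside the slice
        while abs(b - a) > tol:
            m = 0.5 * (a + b)
            if J(m) <= h:
                a = m
            else:
                b = m
        return a

    x_minus = endpoint(x, lo) if J(lo) > h else lo
    x_plus = endpoint(x, hi) if J(hi) > h else hi
    # exact draw from the slice: uniform on [x_minus, x_plus]
    return rng.uniform(x_minus, x_plus)
```

Because the endpoints are computed (essentially) exactly, each step draws from the full slice, which is what makes the resulting chain mix well compared to stepping-out variants.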