Quantifying the spatial resolution of the maximum a posteriori estimate in linear, rank-deficient, Bayesian hard field tomography

Image based diagnostics are interpreted in the context of spatial resolution. The same is true for tomographic image reconstruction. Current empirically driven approaches to quantify spatial resolution rely on a deterministic formulation based on point-spread functions which neglect the statistical prior information, that is integral to rank-deficient tomography. We propose a statistical spatial resolution measure based on the covariance of the reconstruction (point estimate) and show that the prior information acts as a lower limit for the spatial resolution. Furthermore, the spatial resolution measure can be employed for designing tomographic systems under consideration of spatial inhomogeneity of spatial resolution.


Introduction
Over the last decades absorption spectroscopic linear hard field tomography of gas phase media has been applied to a multitude of engineering problems, including turbines [2,3,4,5], piston engines [6,7,8], exhaust gas aftertreatment [9,10], and coal combustion [11] to reconstruct the spatial distribution of temperatures or concentrations. The tomographic reconstruction of these distributions from several line integrals is generally an inverse problem. Due to the limited number of measurement beams these problems are often rank-deficient, requiring additional information introduced, for example, by regularization methods like Tikhonov regularization [12,1,9,13]. A more rigorous approach is given by Bayesian inversion methods, which can reinterpret many classical regularization methods like Tikhonov regularization in a statistical framework.
A frequently arising question concerns the quantification of spatial resolution of the reconstruction, be it for comparison to direct imaging methods or to help interpret the reconstruction in terms of resolvable scales in the process. Unlike conventional imaging, the spatial resolution of tomographic reconstructions is not approximately constant but highly inhomogeneous in the imaging domain. For many non-Bayesian inversion methods the resolution matrix [14,15], as the name implies, is used as a measure of resolution, since each row of the resolution matrix describes an effective point-spread function (PSF) for one specific location in the imaging plane. As noted by Tsekenis et al. [1] the PSF, or its Fourier transform the Modulation Transfer Function (MTF), is a convinient and rigorous way to describe spatial resolution in tomography. In rank-deficient (limited data) tomography, however, not every discrete spatial element is neccessarily captured by a measurement beam as opposed to full-rank tomography like in medical imaging applications for which resolution has been approached with PSFs [16,17,18]. Hence, the resolution matrix in limited data tomography often contains null PSFs corresponding to "blind spots" in the tomographic beam arrangement. These regions are difficult to interpret in terms of a finite resolution.
Tsekenis et al. [1] circumvent this problem by deducing a PSF from the Edge-Spread Functions (ESF) of reconstructed real world phantoms. The underlying assumption hereby is homogeneity and isotropy of the spatial resolution, both being questionable for low beam count measurements. This way, prior information introduced by the regularization is indirectly regarded in the resolution measure and the extensive edge phantoms circumvent the problem of blind spots, yielding heuristic empirical spatial resolution estimates.
In this work we show that, in general, the concept of spatial resolution does not apply to Bayesian tomography which usually gives a finite posterior probability distribution containing arbitrarily large or small spatial frequencies with a certain probability. Instead the posterior probability can be used for uncertainty quantification, yielding credible intervals for the value of each "pixel". Nonetheless, in most real world applications it is common practice to only regard a point estimate, e.g. the maximum a posteriori estimate (MAP), sampled from the posterior distribution. This sampling process accounts to a loss in resolution, which we address in this work.
We begin this paper by reviewing the Bayesian tomography formulation and the deficiencies of the resolution matrix approach in the Bayesian framework. We limit the discussions to a linear hard field tomography problem with normally distributed measurement error and priors formulated as multivariate normal distributions (MVN). In this context we give an expression for the covariance of the MAP estimate and demonstrate its suitability as a resolution measure. Finally, a scalar resolution measure based on a thresholding method proposed by Tsekenis et al. [1] is presented.

Tomography
Linear hard field tomography problems arise from linear integral equations (e.g. Beer-Lambert-Law), which connect the unknown quantity ( , ), which is distributed over two (or more) dimensions and , to the measurement (projection) along the th beam, where is the beam length and ( ) is the beam path through the measurement volume with parameter ∈ [0, ]. Discretizing the function on a linear basis, e.g. a finite element grid with nodes, yields the representation where are the finite basis functions and the respective weight vector to be solved. The beam integral (1) can therefore be approximated by a vector product, For beams this discretization can be summarized in a matrix equation = , where is the × sensitivity matrix defined by the beam arrangement and the grid. For a fixed measurement system (number of beams and beam arrangement), the rank deficiency of hence depends on the chosen grid density, so that any tomography problem may become rank-deficient if the grid resolution is sufficiently fine. In fact the definition of a finite grid density constitutes a prior onto itself [19]. Hence, usually the grid density is chosen sufficiently high to ensure the grid element size is well below any structure size of interest. Therefore, the classification of the problem as rank deficient (limited data, sparse) or full rank is a function of the structure sizes to be measured in the context of the beam arrangement.
Here we assume a general limited data tomography problem with < . Modeling the unknown variables as random variables and accounting for measurement error leads to the measurement model where is the beam sensitivity matrix,̃ is the random vector describing the sought after distribution, and̃ is the random measurement vector incorporating measurement noisẽ . Throughout this work random variables are marked by a superscript tilde,⬚, while fixed values and realizations of random variables are written without a tilde. The measurement errors̃ , introduced by, for example, electrical noise or shot noise, are assumed to follow a MVN distribution with mean value and covariance . For many common tomographic problems the error can be assumed independent and identically distributed with = 0, and However, the following discussions are not limited to independent identically distributed errors. Model error that mainly arise due to insufficient grid resolution are assumed to be small compared to measurement noise (sufficiently dense grid), and hence are already accounted for by the error term. For the other random variables different statistical models will be used as explained in the following sections.

Standard solution approach
In the Bayesian methodology, the data in and the unknowns in are treated as random variables that obey probability density functions (PDFs). These PDFs are related by [20] ( where the posterior distribution, ( | ), is defined by the model likelihood ( | ), which defines the likelihood of observing the data for a hypothetical , and pr ( ) defines what is known about before the measurement. The evidence, ( ), is a constant, normalizing the posterior distribution to a proper PDF. For a linear measurement model and MVN measurement noise the likelihood is given by The prior PDF can model any expectations on the behavior of . As we consider a limited data and therefore rank deficient problem the prior is mandatory in order to avoid a degenerate posterior distribution. The prior constitutes therefore an essential part of the measurement system on the same level as the experimental data. The priors should minimize information content beyond the general attributes of the field, e.g. one expects a spatially-smooth distribution due to diffusive transport. For convenience a MVN PDF is often chosen as a model The MVN prior is defined by the expected mean value, pr , and, most importantly, the covariance matrix, pr . The spatial structure expected by the prior is described by the off-diagonal elements in pr . For example for turbulent flows it might make sense to define a squared exponential covariance [19,21,22] pr , = 2 pr exp With both the model likelihood and the prior PDF modeled as an MVN distribution it can be shown that the posterior distribution is also a MVN distribution [20] with mean and covariance This MVN posterior describes the manifold of possible solutions to the tomographic inversion with their corresponding probabilities, and constitutes the outcome of the Bayesian inference. Though it is not trivial to interpret this result directly, it can, for example, be used to derive credible intervals for the quantity of interest at each grid node. Alternatively it is possible to randomly draw a set of solutions from the posterior distribution to visualize credible solutions to the tomographic problem. Unfortunately, for high dimensional tomographic problems, obtaining a representative set of solutions would require a very large number of draws, making this approach impractical. Many practitioners hence default to giving only the solution with the highest probability density, the maximum a posteriori (MAP) estimate.

Maximum a posteriori and Tikhonov Regularization
For MVN posterior distributions the MAP is equal to the mean value of the distribution, post = MAP , given in Equation (11). The MAP can also be defined as the solution to a least squares minimization problem [20] with = chol −1 , and pr = chol −1 pr , where the Cholesky decomposition can also be replaced by any other valid matrix square root. This form highlights the resemblance between the MAP estimation and classical Tikhonov regularization: by setting pr = Tik and pr = , with the regularization factor and the Tikhonov matrix Tik (for example an identity matrix for zeroth order Tikhonov or an Laplace matrix for second order Tikhonov), the MAP estimation becomes the result of a Tikhonov regularization. From the Bayesian perspective Tikhonov regularization resembles a Bayesian inference with a prior defined by the inverse prior covariance −1 pr = 2 T Tik Tik and the prior mean pr = . Note that for the smoothing operators, like second order Tikhonov regularization, pr does not exist as −1 pr is rank deficient. However, this does not limit the applicability of Equation (11) as it depends on the inverse of the prior covariance.
In summary, Tikhonov regularization amounts to determining a MAP after Bayesian inference, but neglects the actual derivation of a posterior PDF describing the full resolution manifold. Nonetheless, this single sample from the posterior PDF is often given as the convenient solution to the inference problem. With regard to spatial resolution this leads to the definition of the resolution matrix.

Resolution and Point-Spread-Function
The problem of defining resolution in tomography has been addressed for non-Bayesian methods. Definitions based on the Point-Spread-Function (PSF) seem reasonable and have already been adopted to tomography in a framework based on the resolution matrix [15,23] as well as in practical approaches [1,24,25]. The interpretation of the PSF as a pulse response of the imaging system can be transferred to tomographic imaging when viewing the tomographic inversion as a black box. The PSF is visualized by applying a initial pulse input to the system in the form of a vector pulse, consisting of all zeros except for the th element being unity. The noise free virtual measurements resulting from this pulse are Applying the augmented pseudo inverse [26], under the assumption of Tikhonov regularization yields the MAP estimate which is also the PSF of the th point in the grid, where = # is the resolution matrix of the inversion. As indicated by Equation (17) each column of the resolution matrix defines the PSF of the th grid point. Problems arise when defining PSFs for sparse/limited data tomography: many grid points are not traversed by a single ray, resulting in an null PSF for the corresponding grid point. This scenario is illustrated in Figure 1. In order to ensure that the grid as a prior does not influence the solution, a sufficiently smooth peak at the off-center location was chosen as a ground truth for this test case. The beam arrangement consists of only two perpendicular projections, each with five infinitely narrow parallel measurement beams. Many of the grid elements, e.g. point , do not influence the measurement data in our measurement model = , and are therefore not probed in the measurement. The measurement uncertainty, , is chosen to be 1 % of the noise-free peak measurement value, the regularization parameter is set to = 1, and Tik is a Laplace matrix that penalizes curvature. The resulting MAP estimate shows the expected blurring as well as the distortions due to the coarse beam arrangements. For point , which is intersected by two measurement beams, the resolution matrix gives a reasonable result that explains both blurring and distortions. For point the resolution matrix gives an null PSF ( Figure  1(g)).
The interpretation of these zero-PSFs or blind spots is not trivial, as there is no direct analogue in conventional imaging, e.g. amplifier and bus real estate on camera chips are usually smaller than the optical blurring kernel of the camera. Also, the interpretation that null PSF means that only infinitely large structures can be resolved, is faulty as structures that span the space between neighboring beams can be resolved.
Tsekenis et al. [1] avoid these issues by employing the Edge-Spread-Function (ESF) to first deduce the line-spread function (LSF) and then the PSF. The large phantom structure (two constant value levels divided by a straight line) used to measure the ESF ensures that some information is preserved in the tomographic measurements and the blurring introduced by the regularization ensures that the edge can be resolved in the blind spots of the beam array. Nonetheless, the commonly used direct relation between ESF and LSF implies that the PSF is the same for each grid point (homogeneity). In tomography, however, this is not necessarily true because of varying beam array density, varying quality of the prior, etc. The PSF is then derived from the LSF with the assump-tion that the LSF represents a cut through the PSF. While this is a crude approximation, exact analytical relations between PSF and LSF either require rotational symmetry and homogeneity of the PSF [27] or for all possible LSF orientations to be measured [28]. The validity of the recovered PSFs is hence disputable. Therefore, the standard definition of resolution does not apply to classical regularized, limited data tomography. These issues can partially resolved within the statistical Bayesian framework.

Resolution in Bayesian Tomography
Bayesian inference determines a posterior PDF describing the whole manifold of solutions as opposed to a single point estimate. Hence, this solution manifold also contains solutions with a very high spatial frequency content corresponding to small structures. The only lower limit in the possible structure sizes is given by the prior distribution, although this is not a hard limit for Gaussian priors as the probability for very high spatial frequencies is small but finite. Hence, as long as the prior reflects real knowledge about the measurement volume and is consistent with the maximum entropy principle, we can assume that this influence by the prior does not distort the reconstructed image. This gives rise to two possible interpretations of the posterior distribution: either the tomography system resolves all possible fluctuations and therefore has perfect spatial resolution, or it resolves only the possible fluctuations and therefore has a resolution solely limited by the prior. The more conservative conclusion is that the concept of spatial resolution is not transferable to a manifold or PDF of solutions as it simply describes the current state of knowledge of the measurement volume.
This equally unsatisfactory conclusion only concerns the resolution of the posterior distribution. As previously explained it is common practice to depict only a single point estimate from the posterior PDF, for which the definition of spatial resolution again makes sense. However, it should be emphasized that the actual posterior distribution holds the most information and should be used instead of a point estimate with a resolution and uncertainty measure, wherever possible.

Resolution of the MAP estimate
Within the Bayesian framework it is convenient to define the resolution of the MAP estimate based on statistical quantities like covariance, matrices instead of PSFs. However, there is a strong connection between PSFs and covariance, as has been already suggested in a non-Bayesian context for tomograms in geostatistics [23]. This connection is illustrated in the following thought experiment.
Suppose the tomographic imaging system were replaced with a camera to image the physical distributioñ , resulting in the camera imagẽ cam . The camera optics and imaging sensor have a limited resolution described by the PSFs in the resolution matrix , making the measurement model̃ We do not have prior knowledge about the structure sizes or correlation lengths of̃ , but can estimate only its mean and fluctuation width, making our prior choice an MVN prior with mean pr and covariance pr = 2 x . With this prior knowledge and Equation (18) the covariance of the resulting camera image is given as This shows that the finite PSF of the camera introduces a correlation between the pixels of the camera image. Note that we do not assume a fixed ground truth here, but regard the uncertain random variablẽ directly, making this covariance matrix of the camera image a property of the measurement system and not of a single measurement. Reversing this operation, i.e. inferring the PSFs from the camera covariance matrix, is an ill-posed problem in itself as it does not feature a unique solution, so it is not directly possible to derive effective PSFs from these covariance matrices. However, the covariance of the camera image itself can be used to define a resolution. For a tomography system the question arises as to how to define such a covariance for a point estimate without knowledge of the PSFs. While a common first thought is that the posterior covariance matrix is suitable for this purpose, its meaning differs from the covariance of the camera image given previously: it describes possible deviations of the true physical distribution from the given MAP estimate instead of the possible fluctuations of the MAP itself. This is seen in Figure 1, where the plotted posterior covariance matrix columns of nodes and do not show a direct relation to the distortions seen in the reconstructed MAP. The posterior covariance instead shows that there exists a negative correlation between the "arm" artifacts introduced by the low projection count and the peak center for the possible deviations from the MAP. Therefore, while the posterior covariance defines the inferred posterior distribution, it cannot be used to determine the resolution of the MAP estimate.
Instead, in a way that is analogous to the camera covariance a covariance of the MAP point estimate itself can be defined. We follow a similar scheme to the derivation of the resolution matrix given above, but propagate statistical moments instead of fixed values. Instead of ideal uncorrelated point sources, we employ the statistical properties of̃ given by pr and pr .
Given this prior distribution for̃ and the distribution of the errors,̃ , solving the measurement model in Equation (4) for̃ = ̃ −̃ , allows the prior distribution of measurements to be determined, and covariance b = pr T + .
The calculation of ( ) is analogous to the first step in determining the resolution matrix given in Equation (15), but instead of a fixed set of point sources we propagate the statistics of the prior and the measurement error. Accordingly the second step is the propagation of the measurement PDF to the PDF for MAP based on Equation (11), which gives By ( MAP ) the statistical properties of the random vector̃ MAP are defined unconditionally on any concrete realization of̃ or̃ . The MAP covariance, MAP , hence has a similar meaning to the covariance cam given previously and is therefore expected to have a close relation to the theoretical PSFs of the imaging system. This is demonstrated in Figure 2. In this example the same grid, beam arrangement and ground truth were used as for the example in Figure 1, but a squared exponential prior according to Equation (10) with corr equal to 10% of the width of the measurement area is applied in place of the Tikhonov prior. This ensures that pr exists, enabling the direct application of Equation (25). Priors that do not posses a valid covariance matrix like first and second order Tikhonov priors, will be addressed in the following sections. Figure 2 demonstrates that while the MAP covariance, MAP , gives a structurally similar result to the resolution matrix for node , it also gives a correlation structure for node in between the beams, where the resolution matrix fails (see Figures 2(g) and 2(k)). Furthermore, the structure at node closely resembles the inferred MAP from the ground truth with a peak at . The visible structure of the beam array is an effect of the short correlation lengths not fully spanning the space between measurement beams. This reslt exemplifies how beam arrangement influences resolution.
The MAP covariance does not exhibit the same "blind spots" as the resolution matrix due to the incorporation of the prior information, as long as the prior spans the unprobed grid nodes with a certain correlation. If the prior does not fulfill this requirement the MAP covariance will also feature blind spots similar to those in the resolution matrix. However, if neither prior nor measurement beam span these grid nodes, they are indeed blind (as opposed to the examples given here), either defaulting to the mean value given by the prior or leading to a degenerate posterior and therefor no unique MAP. The prior in limited data tomography is therefore an integral part of the measurement system; this is reflected by the MAP covariance which incorporates the prior information. Both the MAP and the MAP covariance resolution can be highly anisotropic in low beam count tomographic imaging. The definition of a scalar resolution quantity is therefore always connected to a further loss in information on the behavior of the imaging system. Nonetheless, in most practical applications a scalar resolution quantity at each location in the tomography domain is preferred. Figure 2: Demonstration of posterior covariance, resolution matrix and MAP covariance on a test case with beam array and grid, (a), square exponential prior, (b), phantom, (c), and resulting MAP, (d). Posterior covariance, resolution matrix and MAP covariance are shown for two locations marked with magenta circles: point is not traversed by any measurement beam, and point is traversed by measurement beams.

A resolution measure
While discussions of the resolution measures based on PSF, optical transfer function (OTF), or modulation transfer function (MTF) can be found throughout optics textbooks [29,30], Tsekenis et al. [1] provide a thorough discussion of the intricacies of their application in tomography, concluding that a resolution quantity defined on the MTF or OTF amplitude is more robust than resolution measures defined using the PSF.
Following their definition, the OTF for a node is given as the Discrete Fourier Transform (DFT) of the PSF, where the DFT operates on the two dimensional grid instead of the linear PSF vector itself. The resolution measure in spatial frequency, , is then defined by an OTF amplitude threshold relative to the peak OTF of, e.g. th = 20%. The is then transferred to a spatial resolution measure As the PSFs are unknown, the OTF cannot be calculated directly, but instead the Fourier transform of the MAP covariance columns can be used: Note that ( MAP ) ∶, ( , ) refers to the representation of the spatial distribution on a linear basis, introduced in Equation (2). The complex Fourier coefficients  ( , ) then describe the complex magnitude of the spatial frequency component ( , ) for the th column of the MAP covariance matrix. Instead of the amplitude of the OTF, we apply the threshold (relative to the maximum) to the amplitude of these Fourier coefficients, | ( , )|. Therefore the question of choosing the threshold for comparable results to the PSF/OTF based method arises. Note that the units of the amplitudes of  are always the square of the units of the OTF amplitudes. Therefore  resembles a power spectral density of the (non-existent) PSF at node , although this is not a valid identity for a inhomgeneous PSF function. Hence we propose to choose the threshold in  as the square of the threshold in the OTF, Alternatively the square root of the amplitude of | | can be employed. As Tsekenis et al. assumed an isotropic PSF their thresholding method gives an unambiguous value. Here we account for the anisotropy of the resolution by calculating, MAP and PSD, which are not rotationally symmetric. Thresholding therefore gives a contour line around the zero spatial frequency point in the PSD. In order to attain a conservative scalar estimate of resolution we choose the lowest frequency of the contour Before this procedure is demonstrated on a test case, we revisit the treatment of classic Tikhonov smoothness priors with this method.

Application to improper prior distributions
While Equations (11) and (12) can be applied to Tikhonov regularization without modifications, the determination of MAP according to Equation (25) requires a valid prior covariance matrix, pr , which does not exist in the case of Tikhonov regularization. To get a resolution measure nonetheless, the rank deficiency of the regularization operator (or whitening operator), Tik , needs to be treated. This can be done by taking the singular value decomposition of the Tik × operator matrix Tik = T , with diagonal elements of [20] 1 ≥ 2 ≥ ⋯ ≥ > +1 = ⋯ = = 0, = min( Tik , ).
Thus the ( + 1) th to th column vectors of describe the directions in which the prior does not supply any information. These null space basis vectors are summarized as a subspace which is used to define the approximate prior covariance [20] pr,Tik = 1 2 † Tik † Tik where † Tik is the pseudo inverse of Tik . As becomes larger this covariance approaches the prior distribution. For common smoothness priors described by Laplace operators or other difference operators, consists of only a single vector with all elements the same value (i.e. the mean value of the distribution is unknown), enabling the discussion of the asymptotic behavior of pr,Tik . In this case T is proportional to a × matrix of ones, 1 T = × . Further, note that the absolute magnitude of  and therefor the magnitude of is unimportant as the threshold is chosen relative to the maximum value. It is hence valid to instead regard an effective MAP covariance The effective MAP,ef f can then be used to calculate the PSD and determine the cutoff frequency .

Results
To demonstrate the determination of a scalar resolution estimate we again employ the example given in Figure 2. The MAP covariance for point and the corresponding PSD column are depicted in Figure 3. As can be seen in Figure 3b the cut off frequency is determined to be = 1.02 m −1 , resulting in a scalar resolution measure of = 0.4911 m. Determining this resolution measure for every grid point yields the resolution map in Figure 4.  Artifacts originating from the coarse beam array are visible in the resolution pattern, but the detailed influence of the beam count and arrangement, which is introduced by the beam sensitivity matrix, , in Equation (25), is not apparent.
Instead we again focus on the resolution in a single point of the spatial domain to discuss the influence of the measurement information quantity on resolution. In order to investigate the influence of beam array density, randomly oriented beams are added to the orthogonal array. Figure 5 shows that as expected resolution improves as the beams are added. By increasing the beam count further the spatial resolution at each single Figure 5: Spatial resolution over beam count for two squared exponential priors with different spatial correlation lengths. point approaches a fixed value given by the prior distribution, which acts as a lower limit of the spatial resolution. This is illustrated in Figure 5, which depicts resolutions found using two squared exponential priors, one with a correlation length parameter = 0.2 m (shorter spatial correlation length) and one with = 0.3 m (longer spatial correlation length). The prior with the shorter correlation length is the less restrictive prior, providing less support of the solution but allowing for higher spatial resolution. The asymptotic resolution values of 0.173 m for = 0.2 m and 0.256 m for = 0.3 m match these expectations. Furthermore, the dashed lines in Figure 5 represent the prior resolution which is calculated by applying the scalar resolution measure to the prior covariance instead of the MAP covariance. In accordance with the previous explanation the prior resolution represents the asymptote to the tomographic resolution in Figure  5. Hence, the prior gives a lower bound for the spatial resolution, and when this lower bound is reached additional measurement beams do not significantly improve resolution. This agrees with studies of the influence of the beam arrangement on resolution using Tsekenis approach [31]: the resolution approaches a limiting value, although this could not directly be explained using Tsekenis empirical approach.

Conclusion
This work discusses spatial resolution measures for linear Bayesian hard field tomography. For the posterior PDF a spatial resolution measure cannot be directly defined due to the arbitrarily large frequency content of candidate solutions. Rather, it is only possible to define the resolution of a specific point estimate, in this case the maximum a posteriori estimate. Especially in the case of rank deficient tomographic problems the classical approach of measuring or calculating the point-spread function suffers from indefiniteness in blind spots that are not traversed by measurement beams. These problems stem from the contradiction between the assumed prior and the assumption of point spread functions, or, put differently, from disregarding the prior knowledge used for inversion.
To remedy these problems we propose a statistical measure, the covariance matrix of the maximum a posteriori estimate, instead of point-spread functions. The statistical formulation allows for the incorporation of the prior PDF into the resolution measure. The use of spatial correlations of the MAP allows for a similar treatment as point-spread functions, namely transfer to spatial frequency domain and thresholding, giving a scalar resolution measure for each point in the measurement domain.
This resolution measure is influenced by measurement noise, beam arrangement, and the prior. It constitutes an important mathematically-valid quality criterion that can be employed in the concept phase of the design of a tomographic measurement system.