Unsupervised solution for in-line holography phase retrieval using Bayesian inference.

In propagation based phase contrast imaging, intensity patterns called in-line holograms are recorded on an x-ray detector at one or multiple propagation distances. They form the input of an inversion algorithm that aims at retrieving the phase shift induced by the object. The problem of phase retrieval in in-line holography is an ill-posed inverse problem. Consequently, an adequate solution requires some form of regularization, the most commonly applied being classical Tikhonov regularization. While generally satisfactory, this method suffers from a few issues, such as the choice of the regularization parameter. Here, we offer an alternative to the established method by applying the principles of Bayesian inference. We construct an iterative optimization algorithm capable of both retrieving the unknown phase and determining a multi-dimensional regularization parameter. In the end, we highlight the advantages of the introduced algorithm, chief among them being the unsupervised determination of the regularization parameter(s). The proposed approach is tested on both simulated and experimental data and is found to provide robust solutions, with improved response to typical issues like low frequency noise and the twin-image problem.


Introduction
In the hard x-ray regime the investigation of weakly absorbing materials such as biological samples is difficult using x-ray absorption imaging. This limitation can be overcome through phase imaging techniques with highly intense and coherent synchrotron beams [1]. In in-line holography [2][3][4], phase contrast is the result of the wavefield propagation through free space after interaction with the object. Combined with a nanofocus [5], the experimental setup allows for the acquisition of several diffraction patterns on a fixed detector as the sample is moved to multiple distances from the focus. A quantitative relation exists between the recorded images and the complex index of refraction characterizing the object. This dependency is described in our paper by detailing the image formation process and the "Contrast Transfer Function" model that approximates it. Based on this model, inversion algorithms aiming at retrieving the phase shift can be deduced. However, due to the poor transfer of information from the object plane to the detector plane for some spatial frequencies, phase retrieval is an ill-posed inverse problem that requires adequate regularization. We describe in the following the standard Tikhonov regularization techniques most commonly in use for phase retrieval, as well as their limitations and drawbacks. We then propose an alternative inversion method based on Bayesian inference that addresses some of the imperfections of the classical approach. The main advantage of the proposed algorithm is the unsupervised determination of the regularization parameter. Finally, we proceed to test and compare the standard and Bayesian approaches on both simulated and experimental datasets.

Image formation in in-line holography
Considering the propagation of x-ray electromagnetic waves through an object, a refraction phenomenon occurs that can be described by the three-dimensional complex refractive index of the material: n(r) = 1 − δ(r) + iβ(r), where r denotes the spatial coordinate r = (x, y, z). The real part of the complex refractive index, δ(r), records a relative phase shift in the wave propagating through the object as compared to the wave passing through free space (the 'background' in imaging terms). The imaginary part, β(r), is a measure of the wave attenuation and is proportional to the linear attenuation coefficient, µ = 4πβ/λ. The difference from unity of n is very small for x-rays, with the real part larger or much larger than the imaginary part (δ ≈ 10⁻⁶, β ≈ 10⁻⁹ for water or intracellular biological material in the hard x-ray regime at 17 keV, as studied in this work). This represents the prime argument for the use of phase contrast imaging instead of classical absorption imaging. Given an object plane orthogonal to the optical axis and a unitary incident wave u_inc = 1, the wave in the object plane (as it exits the object), u_0(r_T), is equal to the so-called transmission function T(r_T) of the object: u_0(r_T) = T(r_T) = exp{−B(r_T) + iφ(r_T)}. The transmission function [6] is determined by the projection of the complex refractive index on the optical axis z: φ(r_T) = −(2π/λ) ∫ δ(r) dz and B(r_T) = (2π/λ) ∫ β(r) dz. The wave intensity in this plane corresponds to the absorption image. It is given by the square modulus of the wave and thus retains only the attenuation information: I_0(r_T) = |u_0(r_T)|² = |T(r_T)|² = exp{−2B(r_T)}. The Kirchhoff-Fresnel integral [7] in its paraxial approximation gives the relation between the wave in the object plane and the wave propagated to a finite distance D, where λ is the wavelength. It can be used to derive a powerful mathematical operator in in-line holography, called the Fresnel propagator.
Wave propagation is found equivalent to a convolution with this operator in real space, thus a multiplication in frequency space: ũ_D(f) = P̃_D(f) ũ_0(f), with P̃_D(f) = exp{−iπλD|f|²}, where the tilde denotes the Fourier transform (F being the Fourier transform operator) and f stands for the frequency coordinate. Finally, any recorded image is simply the wave intensity at a certain propagation distance D: I_D(r_T) = |u_D(r_T)|². The last result makes it obvious that the recorded diffraction patterns represent an entangled, convoluted mixture of the phase and attenuation coefficients of the wave in the object plane. To avoid the computational difficulties related to the convolution operator, the inversion algorithms make use of the Fourier transforms of the recorded images, Ĩ_D(f), to retrieve the phase and/or absorption components. Reconstructing the entire refractive index requires reducing the problem to a simpler (though farther from the truth) mathematical expression of the diffraction patterns that permits the numerical retrieval of both the phase and attenuation images. This begins with the important result of Guigay [8], which expresses Ĩ_D(f) directly in terms of the transmission function. Additionally, it employs several hypotheses on the physical properties of the object in order to arrive at a linear relation between the Fourier transform of the diffraction patterns and the Fourier transforms of the unknown phase and attenuation. Linear approximations based on the "Transport of intensity equation" (TIE) exist and are in general use [9,10]. However, they are valid approximations only for small propagation distances. Contrast due to phase shift becomes more significant as the propagation distance increases, which means that phase imaging is more sensitive in a regime that TIE fails to correctly approximate [11].
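The propagation step above can be sketched numerically in a few lines. The function below is a minimal sketch (function name and arguments are our own) that applies the Fresnel propagator P̃_D(f) = exp{−iπλD|f|²} as a multiplication in frequency space:

```python
import numpy as np

def fresnel_propagate(u0, wavelength, distance, pixel_size):
    """Propagate a complex wavefield u0 over `distance` by multiplying
    its Fourier transform with the Fresnel propagator exp(-i*pi*lambda*D*|f|^2)."""
    n = u0.shape[0]
    f = np.fft.fftfreq(n, d=pixel_size)
    fx, fy = np.meshgrid(f, f)
    propagator = np.exp(-1j * np.pi * wavelength * distance * (fx**2 + fy**2))
    return np.fft.ifft2(np.fft.fft2(u0) * propagator)
```

Because the propagator has unit modulus, propagation conserves the total intensity, and a plane wave (no object) is left unchanged.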
Instead we make use of the "Contrast transfer function" (CTF) method [4,8,12,13], which builds on the reduction of relation (3) to a linear expression. This is valid under the assumptions of low absorption, B(r_T) ≪ 1 (often true for soft biological tissue), and slowly varying phase, |φ(r_T) − φ(r_T + λDf)| ≪ 1 (again true in cell imaging, where recorded images usually have a sparse gradient). Combining the last two equations one can derive the formulation of a basic direct model in phase retrieval, called the CTF model: Ĩ_D(f) = δ_Dirac(f) − 2 cos(πλD|f|²) B̃(f) + 2 sin(πλD|f|²) φ̃(f). Further simplification of this model can be achieved by assuming either a non-absorbing, pure-phase object [12], B(x) = 0: Ĩ_D(f) = δ_Dirac(f) + 2 sin(πλD|f|²) φ̃(f), or a homogeneous object [14,15] with a known ratio between its phase and attenuation components, δ/β: Ĩ_D(f) = δ_Dirac(f) + 2 [sin(πλD|f|²) + (β/δ) cos(πλD|f|²)] φ̃(f). An extension of the CTF model, called the Mixed Approach [13], was developed for the treatment of more strongly absorbing samples, starting with the hypothesis of a slowly varying attenuation, where Ψ(r_T) = I_0(r_T) φ(r_T). A drawback of this approach is the requirement to know (record) the attenuation image I_0(r_T).
For a correct numerical evaluation, the direct model must also account for some physical realities such as the effect of the spherical wave illumination, the partial coherence of the beam, and the point spread function of the detector. The spherical wave illumination induces a magnification and a change in effective propagation distance; this effect can be accounted for prior to the phase retrieval process [5]. The partial coherence and the detector response can be described as simple multiplications of the Fourier transform of the intensity, Ĩ_D(f), with respectively the degree of coherence and the detector transfer function. From now on, for a simplified formalism, every time we write Ĩ_D(f) we assume these multiplications are included in the notation.
All the enumerated models feature a linear relation between the Fourier representation of the recorded image, Ĩ_D(f), and the Fourier transforms of the phase, φ̃(f), and the attenuation, B̃(f). The inverse problem in in-line holography consists in retrieving the latter two quantities from the input Ĩ_D(f). In the case of single-distance acquisition, the problem is obviously underdetermined. This can be overcome by recording several images at multiple propagation distances. Still, the problem remains ill-posed for the following reason. The coefficients of equations (11) to (14) are sine and/or cosine functions of a multiple of the squared modulus of the spatial frequency. For those frequencies that render these coefficients null (the so-called zero-crossings of the transfer function, see Fig. 1), the phase and/or attenuation retrieval becomes impossible. Although this is true for only a few frequencies in reciprocal space, it means there is no unique solution to the phase retrieval problem, known as the twin-image problem. Another use of multi-distance acquisition in phase imaging is to provide redundancy in the input information and thus mitigate the zero-crossings issue. Previous research and practice have set the optimum number of distances at four [12]. However, multi-distance acquisition does not change the fact that the phase transfer function is close to zero at low spatial frequencies. In order to fully address the ill-posedness of phase retrieval one still has to take additional measures on the computational side, such as regularization. We describe how this is applied in phase retrieval in the next section.
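For the pure-phase model, the zero-crossings are the frequencies where sin(πλD|f|²) = 0, i.e. |f| = √(m/(λD)) for integer m. A quick numerical check (using the distances and wavelength of the experimental setup described later) shows that different distances place these zeros at different frequencies, which is why multi-distance acquisition provides redundancy:

```python
import numpy as np

wavelength = 0.073e-9                          # 17.05 keV, in meters
distances = [4e-3, 4.16e-3, 4.82e-3, 6.17e-3]  # effective propagation distances

# First nonzero root of sin(pi*lambda*D*|f|^2): |f| = sqrt(1/(lambda*D))
first_zeros = [np.sqrt(1.0 / (wavelength * D)) for D in distances]

# Each distance has its own set of zero-crossings, so a frequency lost at
# one distance is generally still transferred at the others.
```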

Phase retrieval through classical regularization
In the previous section we formulated the direct (CTF) model in phase imaging as a linear, algebraic relation between the observations (recorded diffraction patterns) and the unknowns (complex index of refraction) in Fourier space. We also underlined the difficulties raised by the ill-posed inverse problem and the necessity of applying regularization to obtain a satisfying solution to phase retrieval. Here we introduce the results of classical regularization and their application to the inverse problem of phase retrieval. The shortcomings of this method are highlighted and addressed later in this paper.
We consider inverse problems that, like phase retrieval, can be formulated as a linear, algebraic relation y = Hx + ε between the known output data y and the unknown input x. H represents the system matrix describing the model, i.e. the physical relation between the input and output datasets; ε stands for the noise affecting the measurements. In phase retrieval, y is equivalent to the collection of recorded images in Fourier space, Ĩ_{D_k}(f), while x stands for the unknown phase φ̃(f) and/or attenuation B̃(f), again in reciprocal space. The elements of the matrix H are defined by the coefficients of the CTF model chosen to best represent our a priori information on the imaged object.
There are several basic methods to solve the inverse problem, i.e. to infer the unknown quantity x. Most of them generally offer poor estimates x̂ because of the unrealistic hypotheses they rely on. Direct inversion is based on the assumption that the matrix H is invertible, leading to the simple solution x̂ = H⁻¹y. Even if the matrix H is invertible, it might still suffer from ill-conditioning. A large condition number of the matrix of a linear system means that small variations in the input translate into large variations of the output. Consequently, small errors in the measurements (recorded images) translate into large errors in the retrieval of the unknown quantities (phase, attenuation).
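The effect of ill-conditioning is easy to demonstrate on a toy system (the numbers below are illustrative and not taken from the imaging model): a perturbation of order 10⁻⁴ in the data produces an error of order 1 in the direct-inversion solution.

```python
import numpy as np

# A nearly singular 2x2 system: the rows are almost parallel
H = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])
y = H @ x_true

# A tiny perturbation of the measurements...
y_noisy = y + np.array([0.0, 1e-4])

# ...is amplified by the large condition number of H
x_hat = np.linalg.solve(H, y_noisy)
```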
The basic solution under the realistic hypothesis that accounts for noise errors and a non-square matrix H is the least squares (LS) estimate. It is based on the minimization of the Euclidean distance between the observations y and the model simulations Hx: x̂ = arg min_x J_LS(x), with J_LS(x) = ‖y − Hx‖², where J_LS designates the least squares criterion. Given that the problem is convex, the solution can be obtained by solving ∂J_LS/∂x = −2H*(y − Hx) = 0, which yields x̂ = (H*H)⁻¹H*y, where H* denotes the conjugate transpose of H. This solution is still very sensitive to noise, depending on the condition number of the matrix H*H that needs to be inverted. In such situations least squares estimation requires regularization. Regularization theory aims to overcome the ill-posedness of inverse problems. As introduced by Tikhonov [16], the common approach is to construct a compound criterion that adds a penalization term to the standard least squares one: J(x) = ‖y − Hx‖² + λ_R Δ(x, x₀), where Δ is a distance in the unknown space, λ_R is called the regularization parameter and x₀ represents an a priori estimate of the solution. While the least squares term ‖y − Hx‖² attempts to match the observations y to the model simulations Hx, the regularization term λ_R Δ(x, x₀) acts as a constraint that makes the solution less sensitive to the inherent noise in the measurements. It is also meant to bring the solution closer to the available a priori information, as described by x₀ and the choice of the distance Δ. The regularization parameter λ_R acts as a weight on the penalization term and balances the relevance attributed to the a priori information against the observations.
If we choose Δ to be the Euclidean norm, Δ(x, x₀) = ‖x − x₀‖², the compound criterion for Tikhonov regularization becomes J(x) = ‖y − Hx‖² + λ_R‖x − x₀‖², and by minimizing it, ∂J/∂x = −2H*(y − Hx) + 2λ_R(x − x₀) = 0, we obtain the estimator: x̂ = (H*H + λ_R I)⁻¹(H*y + λ_R x₀). Given the lack of a priori information, in many cases we assume that x₀ = 0, which leads to the minimum norm least squares solution: x̂ = (H*H + λ_R I)⁻¹H*y. Here, by appropriately choosing the value of the regularization parameter λ_R, we can control the condition number of the matrix H*H + λ_R I that needs to be inverted. Applying this result to the CTF models expressed by equations (11) to (13) leads to the solutions for phase retrieval, where k stands for the index of the distance at which a certain image was acquired, A = Σ_k sin(πλD_k|f|²) cos(πλD_k|f|²), B = Σ_k sin²(πλD_k|f|²), C = Σ_k cos²(πλD_k|f|²) and Δ = BC − A². These solutions are the result of applying classical Tikhonov regularization to the inverse problem of phase retrieval. They are tested in this work against a regularization method based on Bayesian inference.
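The Tikhonov estimator above can be sketched directly in matrix form (a generic implementation, not specific to the CTF coefficients; names are our own):

```python
import numpy as np

def tikhonov(H, y, lam, x0=None):
    """Tikhonov estimate x = (H*H + lam*I)^-1 (H*y + lam*x0);
    with x0 = 0 this is the minimum norm least squares solution."""
    N = H.shape[1]
    if x0 is None:
        x0 = np.zeros(N)
    A = H.conj().T @ H + lam * np.eye(N)
    return np.linalg.solve(A, H.conj().T @ y + lam * x0)
```

With lam = 0 this reduces to the plain least squares solution, while a large lam shrinks the estimate toward x0.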
There are various other options available when choosing the appropriate distance for the model term Δ(y, Hx) and the regularization term λ_R Δ(x, x₀): • Quadratic or least squares (LS): Δ(y, Hx) = Σ_{m=1..M}(y_m − [Hx]_m)², Δ(x, x₀) = Σ_{n=1..N}(x_n − x₀ₙ)², where M is the length of the output y and N is the length of the input x. It leads to the Tikhonov solution in Eq. (19).
• Weighted least squares: Δ(y, Hx) = (y − Hx)* Q₁ (y − Hx), Δ(x, x₀) = (x − x₀)* Q₂ (x − x₀), where Q₁ and Q₂ represent the weights applied to the two norms in matrix form. Combined, they play a role identical to that of λ_R, the scalar regularization parameter in Eq. (18), but express it in a more complex matrix form. Given that the choice of λ_R in classical regularization is done empirically, the determination of Q₁ and Q₂ is even more complicated. We show later how a non-scalar and unsupervised regularization can be achieved through the Bayesian method we introduce.
• Kullback-Leibler (KL) divergence: KL(q : p) = ∫ q(x) ln[q(x)/p(x)] dx, widely applied in statistics as a measure of similarity between two probability density functions [20]. This 'distance' is used in the next section, where we develop the Bayesian algorithm.
Computing the estimator x̂ in Eq. (19) requires the inversion of the matrix H*H + λ_R I. In those instances where this matrix proves too large or too difficult to invert, several iterative optimization methods can provide or improve a solution [21,22]. At iteration k + 1, such a method generally follows the expression x^(k+1) = x^(k) + α δ(x^(k)). Depending on the expression of the functional δ(x^(k)), various gradient-based iterative algorithms can be applied. The fixed-step gradient descent iterative solution is commonly employed. It uses δ(x^(k)) = −∇J(x^(k)) and a fixed coefficient α. For the basic minimum norm least squares criterion J(x) = ‖y − Hx‖² + λ_R‖x‖², the iterative solution becomes: x^(k+1) = x^(k) + α[H*(y − Hx^(k)) − λ_R x^(k)]. We can observe that, initialized with a null value x^(0) = 0, we obtain a solution proportional to the back-projected data, x^(1) = αH*y, after the first iteration.
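The fixed-step iteration above can be sketched as follows (a generic implementation with illustrative names; the step α must be small enough for convergence):

```python
import numpy as np

def gradient_descent(H, y, lam, alpha, n_iter=500):
    """Fixed-step gradient descent on J(x) = ||y - Hx||^2 + lam*||x||^2.
    Each update moves along H*(y - Hx) - lam*x with a fixed step alpha."""
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        x = x + alpha * (H.conj().T @ (y - H @ x) - lam * x)
    return x
```

At the fixed point H*(y − Hx) = λ_R x, i.e. the iteration converges to the same minimum norm least squares solution as the direct matrix inversion.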
Although the standard regularization methods we presented above offer satisfying and robust results, there are a number of problems that remain open, like the determination of the regularization parameter or the arguments behind choosing the appropriate type of penalization term. Various theoretical methods of automatically computing the ideal value of the regularization parameter have been developed, among which the L-curve method is the most commonly applied [23]. In practice we found this method to be unsatisfactory and consistently resorted to manually determining the value of the regularization parameter by trial and error. Under a Bayesian framework we show that these issues can be addressed directly, in a less empirical fashion.

Bayesian regularization with Gaussian priors
We assume that our problem has been linearized and discretized so that the forward model can be written in the algebraic form y = Hx + ε, where the output (experimental data) y and the additive noise ε are vectors of length M, and the input (the unknown quantity we try to estimate) x is a vector of length N, so that H is an M by N matrix describing the linear model. Probability density functions are assigned to these quantities: p(x) as the prior distribution, p(y|x) as the likelihood and p(x|y) as the posterior distribution. Applying the Bayes rule for probabilities [24], the posterior distribution is found to be proportional to the product of the previous two: p(x|y) = p(y|x) p(x) / p(y), where the denominator p(y) = ∫ p(y|x) p(x) dx is a normalizing constant called the evidence, which will be ignored in the following formalism.
Let us consider a white noise distribution for ε, p(ε) = N(ε|0, v_ε I), i.e. a Gaussian distribution with zero mean, where v_ε denotes the variance of the noise. In turn, the likelihood follows a Gaussian distribution with mean equal to the model simulation Hx and variance equal to that of the noise: p(y|x) = N(y|Hx, v_ε I). A result that we make use of in the next section states that the estimator of the variance of a Gaussian distribution follows an Inverse Gamma distribution, so that p(v_ε) = IG(v_ε|α_ε, β_ε). The choice of the prior law is one of the most sensitive, since it is mathematically equivalent to the choice of the regularization term in classical regularization theory. In our case we employ a multivariate Gaussian distribution to describe the random vector x of unknowns, with mean x₀ and covariance matrix V_x. The components x_j of this vector are defined as independent random variables following one-dimensional Gaussian distributions of mean x₀ⱼ and variance v_j. This ultimately leads to a solution comparable to applying an L2 norm penalization term in Tikhonov regularization. The variances of the noise and of the prior distribution make up the so-called hyper-parameters of the model, θ.
In the case of maximum a posteriori (MAP) estimation of the posterior distribution p(x, θ|y), the Bayesian approach [25] is able to infer both the unknown quantity x and the hyper-parameters of the model, θ: (x̂, θ̂) = arg max p(x, θ|y) = arg min J(x, θ), with J(x, θ) = −ln p(x, θ|y). The log-a-posteriori criterion J is logarithmic in order to simplify the minimization of the exponential functions that define the posterior distribution p(x, θ|y).
Here we propose a more sophisticated method, the Bayesian Variational Approximation (BVA) [26], which begins with approximating the posterior distribution p(x, θ |y) by a separable one q(x, θ |y) = q 1 (x) q 2 (θ). The quality of this approximation is assessed with the Kullback-Leibler divergence measure.
Minimizing it, ∂KL(q : p)/∂q_i = 0, we obtain solutions for the distributions q₁(x) and q₂(θ). The estimators x̂ and θ̂ are computed as the expected values of these distributions.

Gaussian prior case
Here we offer a more detailed explanation of how the expressions of the aforementioned estimators are determined. The Gaussian prior model assumes that the prior follows a normal probability distribution. It is called a hierarchical prior model since it introduces a hidden variable v of Inverse Gamma distribution. This hidden variable represents the variance of the normal prior distribution.
Above we assumed the normal prior has zero mean since, more often than not, no a priori estimate of the inferred quantity is available. Under these conditions the expression of the posterior distribution becomes p(x, v, v_ε|y) ∝ p(y|x, v_ε) p(x|v) p(v) p(v_ε), whose logarithm can then be fully developed. Applying BVA as presented above, we search for a separable distribution q(x, v, v_ε) that approximates the posterior distribution in Eq. (39). We may treat x as a random vector with independent components x_j, or with correlated components; henceforth we develop both the correlated and the independent case. We then solve the corresponding expectations, where L denotes the logarithm of the posterior distribution in Eq. (39), and arrive at the update expressions. The expressions of the estimators are found by a simple process of identification.
The numerical computation of these estimators requires the initialization of a few parameters linked to the Inverse Gamma distributions. Standard unit values for the Inverse Gamma hyper-parameters (all α and β set to 1) can be chosen as input so that, at the first iteration, the regularization term v̂_ε/v̂_j equals one, leading to a solution x̂ identical to the minimum norm least squares estimator (as in Eq. (20) with λ_R = 1). An iterative optimization process can now begin, where the estimators {x̂, v̂_ε, v̂_j} at iteration i + 1 are computed using the values obtained at the previous iteration i, until convergence is reached. Practice has shown that very few iterations (often fewer than 10) are necessary to reach convergence. The difference in computational speed compared to the non-iterative Tikhonov solution is negligible, since all calculations included in the optimization process are vector-based arithmetic and do not include any Fourier transformations.
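The alternating scheme can be sketched as follows. The update rules below are simplified illustrations of the posterior-mean expressions (the exact hyper-parameter bookkeeping follows the Inverse Gamma expectations derived above), and all names and constants are our own:

```python
import numpy as np

def bva_estimate(H, y, alpha=1.0, beta=1.0, n_iter=10):
    """Schematic alternating optimization: estimate x together with the
    noise variance v_eps and the per-component prior variances v_j.
    Simplified update rules for illustration only."""
    M, N = H.shape
    v_eps, v_j = 1.0, np.ones(N)  # ratio v_eps/v_j = 1 at the first iteration
    for _ in range(n_iter):
        # x update: Tikhonov-like solution with a per-component regularizer
        A = H.conj().T @ H + np.diag(v_eps / v_j)
        x = np.linalg.solve(A, H.conj().T @ y)
        # variance updates (posterior means of Inverse Gamma distributions)
        v_eps = (beta + 0.5 * np.sum(np.abs(y - H @ x) ** 2)) / (alpha + M / 2)
        v_j = (beta + 0.5 * np.abs(x) ** 2) / (alpha + 0.5)
    return x, v_eps, v_j
```

Note that the scalar λ_R of Eq. (20) is replaced by the vector v_eps / v_j, recomputed at every iteration.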
The solution obtained here for x̂ is similar to the one given by minimum norm least squares regularization, Eq. (20). The significant change in the expression of the estimator x̂ is that the regularization parameter λ_R, a scalar, is here replaced by a vector defined as the ratio of the estimated variances of the noise and prior distributions, v̂_ε/v̂_j. Interestingly, this particular result is consistent with the formulation of the Wiener deconvolution filter [27]. Widely used in various image processing applications, this filter makes use of the same ratio between the variance of the noise and the variance of the signal (a signal-to-noise measure) in order to improve on the least squares filter. Coming back to our proposed solution, we can underline several advantages. The values of the regularization vector v̂_ε/v̂_j are obtained alongside the final solution x̂ in an alternate optimization scheme. This leads to an unsupervised determination of the regularization parameter that replaces the empirical, trial-and-error tuning necessary in classical regularization. Moreover, the availability of a specific regularization parameter for each element of x̂ obviously allows for more flexibility than the scalar λ_R, since different components of x̂ may require different regularization 'weights'. In the case of phase retrieval, this non-uniform regularization pattern proves very useful in dealing with low frequency noise.

Phase retrieval through Bayesian regularization
In order to implement the algorithm constructed in the previous section we need to follow a few preparatory steps. First, the outputs of the in-line holography model, the acquired images I_k of size n × n pixels, need to be transformed into Fourier space, padded accordingly (to a size of 2n × 2n pixels), then assembled and reshaped into a single vector y of length 4Kn², where K denotes the number of recorded diffraction patterns. We consider here the acquisition of four images (K = 4), as was the case with the experimental data treated later. The phase and attenuation images we intend to retrieve are of course of the same size, 2n × 2n pixels after padding. Consequently, the vector x comprising the inferred quantities must be of length 4n² in the case of a pure-phase object (when we only retrieve the phase) or 8n² in the more general case (when we reconstruct the entire refractive index). At the end of the algorithm the solution has to be deconstructed into one or two matrices which, after applying the inverse Fourier transform, represent the retrieved phase φ and attenuation B. The scheme below illustrates the algebraic result of the vectorization process we described.
y [4Kn², 1] = H [4Kn², 8n²] · x [8n², 1]. Lastly, we need to define the model matrix H. Its elements are the coefficients of the CTF model used and have the general expressions s_k(f) = 2 sin(πλD_k|f|²) and c_k(f) = −2 cos(πλD_k|f|²), where k stands for the index of the distance at which a certain image was acquired. In propagation-based phase retrieval the acquired images are often large, typically n = 2048 pixels. The matrix H is thus huge and impractical to store or manipulate in its entirety. Fortunately, it is also a highly sparse, block diagonal matrix, which means it is sufficient to store the diagonals representing the coefficients of the contrast transfer function. The matrix operations needed in Eq. (45) can then be comfortably handled by operating with the vectors s_k and c_k.
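Because H is block diagonal over spatial frequencies, the minimum norm least squares solution of Eq. (20) reduces to element-wise arithmetic on the coefficient maps. For the pure-phase model this can be sketched as follows (a toy implementation with illustrative names, omitting padding and the Dirac term):

```python
import numpy as np

def ctf_phase_retrieval(holo_ffts, wavelength, distances, pixel_size, lam):
    """Pure-phase CTF inversion handled per frequency: H*y and
    (H*H + lam*I)^-1 reduce to element-wise operations on the
    coefficient maps s_k = 2*sin(pi*lambda*D_k*|f|^2)."""
    n = holo_ffts[0].shape[0]
    f = np.fft.fftfreq(n, d=pixel_size)
    fx, fy = np.meshgrid(f, f)
    f2 = fx**2 + fy**2
    num = np.zeros((n, n), dtype=complex)
    den = np.full((n, n), lam)
    for I_k, D in zip(holo_ffts, distances):
        s_k = 2.0 * np.sin(np.pi * wavelength * D * f2)
        num += s_k * I_k          # accumulates H*y
        den += s_k**2             # accumulates the diagonal of H*H + lam*I
    return num / den              # Fourier transform of the retrieved phase
```

No large matrix is ever formed: storage and computation scale with the number of pixels rather than with the square of the problem size.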

Simulations
We first validated our method on simulated data. We used modified Shepp-Logan phantoms to model the ground truth, i.e. the real and imaginary components of the complex refractive index. The first picture represents the phase component φ of the constructed object (see Fig. 2). The attenuation component was considered proportional to the phase, lower by a factor of 1000, corresponding to a δ/β ratio typical for water and soft tissue samples. We simulated the propagation of the constructed wavefield in the object plane, u₀ = exp{−B + iφ}, to four distances D by multiplication with the Fresnel propagator P̃_D (see relation (6)). The four simulated holograms finally obtained are shown in Fig. 3. They were used as input for a phase retrieval algorithm that assumed the pure-phase CTF model in Eq. (11), reflecting the large δ/β value chosen.
The featured results were obtained by L2 regularization (Fig. 4(a)) and by Bayesian inversion with a Gaussian prior assuming independent components (Fig. 4(b)). A qualitative comparison between the two retrieved phase maps shows the first one to suffer from a low frequency noise that erroneously lowers the values of the pixels in the center of the image and raises the values of those at the four corners. This undesired effect is less pronounced in the second result, obtained with the proposed algorithm. In order to provide a quantitative assessment of the two retrieved phase maps we calculated the normalized mean square error of each result relative to the ground truth: NMSE = Σ_ij (φ̂_ij − φ_ij)² / Σ_ij φ_ij², where φ̂ is the retrieved phase and φ is the ground truth. The values of the normalized mean square error found in the two cases were 0.44 for the standard regularization result and 0.27 for the Bayesian inversion. Multiple simulation scenarios were explored in which we varied model parameters such as the δ/β value (in order to simulate more weakly or more strongly absorbing objects), the CTF formalism applied for inversion, or the standard deviation of the added noise. All the results we obtained suggest that the proposed Bayesian-based algorithm is capable of providing robust solutions that are qualitatively and quantitatively superior to the standard method.
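The error metric above is a direct transcription of the formula:

```python
import numpy as np

def nmse(phi_hat, phi):
    """Normalized mean square error between retrieved phase and ground truth."""
    return np.sum((phi_hat - phi) ** 2) / np.sum(phi ** 2)
```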

Experimental data
The proposed method was applied to different experimental datasets acquired at the ID16A NanoImaging beamline at the European Synchrotron Radiation Facility (ESRF). The presented data are chemically fixed red blood cells, imaged at an energy of 17.05 keV (equivalent to an x-ray wavelength of 0.073 nm). The beam was focused by Kirkpatrick-Baez optics [28,29] down to approximately 30 nm in both directions [30]. The sample was placed behind the focus, which acts as a quasi-point source, creating a divergent beam geometry. Images were recorded using a FReLoN CCD-based detector with a pixel size of 0.845 µm. Images were registered at four effective propagation distances (D₁ = 4 mm, D₂ = 4.16 mm, D₃ = 4.82 mm, D₄ = 6.17 mm) with a fixed focus-to-detector distance of 337 mm, yielding an effective pixel size of 10 nm. We show the acquired holograms in Fig. 5, after the necessary pre-processing corrections (flat-field correction, compensation for magnification, mutual alignment etc.) were applied. The phase maps of the red blood cell were retrieved using standard L2 regularization (see Fig. 6(a)) and the Bayesian algorithm assuming independent components (see Fig. 6(b)). As with the simulated data, we can see some improvement in low frequency noise reduction. One obvious fault of the phase map retrieved by standard regularization is the unjustified 'glow', or high-value pixels, found in close vicinity of the cell edge. This is interpreted as an effect of the twin-image problem [31]. Especially encouraging is that the Bayesian inversion algorithm seems to limit the effects of the twin-image problem, as revealed by the line profiles through the cell shown in Fig. 7. Applying the Bayesian method of phase retrieval to other real data, we could observe its validity as an alternative to the standard inversion algorithm in place. The advantages theoretically anticipated were confirmed, notably the unsupervised determination of the regularization parameter(s).
A basic quantitative comparison of the two results is shown in Table 1. Here, by 'std' we denote the standard deviation of a particular region of the image. SNR stands for the signal-to-noise ratio; the values in Table 1 were determined as the ratio between the squared standard deviation of the region of interest (the cell) and the squared standard deviation of the background (the region outside of the cell). The algorithm we introduced produced a phase map with a higher SNR, mainly due to the more pronounced signal gradient characterizing the cell region.
As shown in Fig. 8, the proposed algorithm quickly converges towards the final solution, the retrieved phase map shown in Fig. 6(b). Fast convergence (within 8 to 12 iterations) has been observed on each occasion the algorithm was applied, independent of the nature of the reconstructed sample (here, a weakly absorbing biological sample, i.e. a red blood cell; in other instances, more strongly absorbing samples or variously defined phantom objects). In the particular case of the red blood cell, in order to enable the fastest convergence we chose to initialize the variables v̂_ε and v̂_j in such a manner that their ratio (the regularization parameter at the first iteration) equals 0.1. This value was found after running the algorithm multiple times. The initialization value chosen for the variance of the noise, v̂_ε = 25, was found as the average variance of the background in the four recorded images multiplied by the number of pixels in one image (4n²). At the first iteration, the obtained phase map resembles a too strongly regularized minimum norm least squares solution, as predicted in the discussion of Eq. (45). The second iteration provides a solution similar to the classical regularization result shown in Fig. 6(a), which suffers from pronounced twin-image artifacts in the form of the unrealistic 'glow' surrounding the cell. The following iterations quickly optimize this result, as indicated by the rapid stabilization of the norm of the solution around the value of 3.52 × 10⁻² (see the table in Fig. 8). Correcting some of the abundant low-frequency noise present at the second iteration and showcasing a 'flatter' background, the final solution represents an improved phase map reconstruction.
In an attempt to analyze the amount of regularization performed by the proposed algorithm, we looked at the modulus of the ratio v/v_j (equivalent to the regularization parameter of our method) and its variation with spatial frequency, as shown in Fig. 9. First, we observe how sharply the function varies in the low to intermediate frequency range. This showcases the advantage of the proposed method in adapting the amount of regularization to each spatial frequency. It is easy to see how a classical approach using a scalar regularization parameter, constant across all frequencies, might fail to apply the appropriate amount of regularization over at least some frequency range. As expected, the regularization parameter is larger for very low frequencies, close to zero, corresponding to the low sensitivity of propagation-based imaging to slow variations of the phase; the phase transfer function is close to zero at low spatial frequencies, as discussed at the end of Section 1.1. For high frequencies, where the signal-to-noise ratio is again low, the regularization parameter is also large, in order to compensate for the noisy data.
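The role of the ratio v/v_j as a frequency-dependent regularization parameter can be illustrated with a minimal sketch of regularized least-squares inversion in Fourier space; the transfer function H below is a toy stand-in for the paper's CTF model, and the function name is illustrative:

```python
import numpy as np

def regularized_inversion(data_ft, H, reg):
    """Regularized least-squares inversion in Fourier space. `reg` may be
    a scalar (classical Tikhonov regularization, constant across all
    frequencies) or an array with one value per frequency (playing the
    role of the ratio v / v_j). H is a hypothetical transfer function of
    the linearized forward model."""
    return np.conj(H) * data_ft / (np.abs(H) ** 2 + reg)
```

With a per-frequency `reg`, frequencies where H is close to zero (low frequencies and the zero crossings of the CTF) can be damped strongly, while well-transferred frequencies are left nearly untouched; a scalar parameter must compromise between the two.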

Conclusions
Phase retrieval in in-line holography is an ill-posed inverse problem. We considered the case of multi-distance propagation-based phase contrast imaging and introduced linearized forward models expressed as Contrast Transfer Functions. The ill-posedness of the problem was readily recognized at low spatial frequencies and at the zero crossings of the transfer functions, making the phase retrieval procedure particularly sensitive to noise and measurement errors. The present work first discussed standard Tikhonov regularization and highlighted its drawbacks, such as the difficult determination of the regularization parameter and the reasoning behind the choice of the appropriate type of regularization term. As an alternative to the standard approach, we proposed a statistical estimation method based on Bayesian inference. An iterative optimization algorithm was constructed along the lines of the Bayesian Variational Approximation method and applied to the linearized inverse problem of phase retrieval. An immediate advantage over the standard framework is the unsupervised determination of the regularization parameter. Moreover, this parameter is no longer a scalar but a vector of the same length as the solution. This allows for a better tuned result by applying a specific scalar regularization value to each frequency in the Fourier representation of the solution. Another improvement over classical regularization is that this algorithm accounts for and models the noise through one of the prior distributions. Bayesian inversion, as a statistical method, allows for more informed decisions, intentionally shaping the solution through a priori knowledge instead of the repetitive, empirical choices of deterministic methods based on Tikhonov regularization. The introduced algorithm was tested on simulated and experimental data.
The results obtained on the constructed phantom proved to be numerically superior to those provided by the standard method in terms of normalized mean square error. Applied to experimental data, the Bayesian algorithm provided qualitatively improved results by reducing the negative effects of low-frequency noise and the twin-image problem. Its fast convergence and robustness make it usable on different kinds of data without significant computational overhead. In order to make better use of the a priori information available, one can consider priors other than Gaussian [32]. The flexibility of the algorithm allows for the quick implementation of alternative solutions based on these different priors. The ability of the proposed algorithm to infer the unknowns as well as other multi-dimensional quantities, such as the regularization parameter or the noise itself, shows that it makes better use of the redundancy in the input information available in multi-distance propagation-based phase imaging. This quality makes the proposed approach well suited to other redundancy-rich methods in coherent diffraction imaging, such as near-field [33] or far-field [34,35] ptychography.