Generalized Proximal Smoothing for Phase Retrieval

In this paper, we report the development of the generalized proximal smoothing (GPS) algorithm for phase retrieval of noisy data. GPS is a optimization-based algorithm, in which we relax both the Fourier magnitudes and object constraints. We relax the object constraint by introducing the generalized Moreau-Yosida regularization and heat kernel smoothing. We are able to readily handle the associated proximal mapping in the dual variable by using an infimal convolution. We also relax the magnitude constraint into a least squares fidelity term, whose proximal mapping is available. GPS alternatively iterates between the two proximal mappings in primal and dual spaces, respectively. Using both numerical simulation and experimental data, we show that GPS algorithm consistently outperforms the classical phase retrieval algorithms such as hybrid input-output (HIO) and oversampling smoothness (OSS), in terms of the convergence speed, consistency of the phase retrieval, and robustness to noise.


I. INTRODUCTION
Phase retrieval has been fundamental to several disciplines, ranging from imaging, microscopy, crystallography and optics to astronomy [1], [2], [3], [4].It aims to recover an object only from its Fourier magnitudes.Without the Fourier phases, the recovery can be achieved via iterative algorithms when the Fourier magnitudes are sampled at a frequency sufficiently finer than the Nyquist interval [5].In 1972, Gerchberg and Saxton developed an iterative algorithm for phase retrieval, utilizing the magnitude of an image and the Fourier magnitudes as constraints [6].In 1982, Fienup generalized the Gerchberg-Saxton algorithm by developing two iterative algorithms: error reduction (ER) and hybrid input-output (HIO), which use a support and positivity as constraints in real space and the Fourier magnitudes in reciprocal space [7].In 1998, Miao, Sayre and Chapman proposed, when the number of independently measured Fourier magnitudes is larger than the number of unknown variables associated with a sample, the phases are in principle encoded in the Fourier magnitudes and can be retrieved by iterative algorithms [5].These developments finally led to the first experimental demonstration of coherent diffractive imaging (CDI) by Miao and collaborators in 1999 [8], which has stimulated wide spread research activities in phase retrieval, CDI, and their applications in the physical and biological sciences ever since [2], [9], [10].
For a finite object, when its Fourier transform is sampled at a frequency finer than the Nyquist interval (i.e.oversampled), mathematically it is equivalent to padding zeros to the object in real space.In another words, when the magnitude of the Fourier transform is oversampled, the correct phases correspond to the zero-density region surrounding the object, which is known as the oversampling theorem [5], [11].The phase retrieval algorithms iterate between real and reciprocal space using zero-density region and the Fourier magnitudes as dual-space constraints.A support is typically defined to separate the zero-density region from the object.The positivity constraint is applied to the density inside the support.In the ER algorithm, the no-density region outside the support and the negative density inside the support are set to zero in each iteration [7].The HIO algorithm relaxes the ER in the sense that it gradually reduces the densities that violate the object constraint instead of directly forcing them to zero [7].This relaxation often leads to good reconstructions from noise-free patterns.However, in real experiments, the diffraction intensities, which are proportional to the square of Fourier magnitudes, are corrupted by a combination of Gaussian and Poisson noise and missing data.In the presence of experimental noise and missing data, phase retrieval becomes much more challenging, and the ER and HIO algorithms may only converge to sub-optimal solutions.Simply combining ER and HIO still suffers from stagnation and the iterations can get trapped at local minima [12].To alleviate these problems, more advanced phase retrieval algorithms have been developed such as the shrinkwrap algorithm and guided HIO (gHIO) [13], [4].In 2010, a smoothness constraint in real space was first introduced to improve the phase retrieval of noisy data [14].Later, a noise robust framework was implemented for enhancing the performance of existing algorithms [15].Recently, Rodriguez et al. proposed to impose the smoothness constraint on the no-density region outside the support by applying Gaussian filters [16].The resulting oversampling smoothness (OSS) algorithm successfully reduces oscillations in the reconstructed image, and is more robust to noisy data than the existing algorithms.
Since phase retrieval can be cast as a non-convex minimization problem, many efforts have been made to study phase retrieval algorithms from the viewpoint of optimization.For example, Bauschke et al. [17] related HIO to a particular relaxation of the Douglas-Rachford algorithm [18] and introduced the hybrid projection reflection algorithm [19], [17].Using similar ideas, researchers further proposed several projection algorithms such as iterated difference map [20] and relaxed averaged alternation reflection [21].
In [22], Chen and Fannjiang analyzed a Fourier-domain Douglas-Rachford algorithm for phase retrieval.By taking noise into account, the Wirtinger Flow [23] relaxes the magnitude constraint into a fidelity term that measures the misfit of Fourier data, to which Wirtinger gradient descent is applied.Other methods in this line include alternating direction methods [24], [25], [26] that have been widely used in image processing, as well as lifting approaches [27] such as PhaseLift [28], [29] by Candès et al. and its variants [30], [31].
In this paper, we propose an optimization-based phase retrieval method, termed generalized proximal smoothing (GPS), which effectively addresses the noise in both real and Fourier spaces.Motivated by the success of OSS [16], GPS incorporates the idea of Moreau-Yosida [32], [33] regularization with heat kernel smoothing, to relax the object constraint into an implicit regularizer.We extend the notion of infimal convolution from real domain to complex domain in the context of convex analysis [34], [35], which enables us to handle the convex conjugate of the implicit relaxation in the dual variable.We further relax the magnitude constraint into a least squares fidelity term, for de-noising in Fourier space.To minimize the primal-dual formulation, GPS iterates back and forth between efficient proximal mappings of the two relaxed functions, respectively.Our experimental results using noisy experimental data of biological and inorganic specimens demonstrate that GPS consistently outperforms the state-of-the-art algorithms HIO and OSS in terms of both speed and robustness.We also refer readers to the recent paper [36] about training quantized neural networks, which shows another success of using Moreau-Yosida regularization to relax the hard constraint.
Notations.Let us fix some notations.For any complexvalued vectors u, v ∈ C n , u is the complex conjugate of u, whereas u * := u is the Hermitian transpose.Re(u) and Im(u) are the real and imaginary parts of u, respectively.
is the Hermitian inner product of u and v. u•v is the elementwise product of u and v given by (u • v) i = u i v i .u := u, u denotes the 2 norm of u.Given any Hermitian positive semi-definite matrix K ∈ C n×n , we define u K := u, K u .arg(u) denotes the argument (or phase) of u = Re(u) + i • Im(u), which is given by arg I X is the characteristic function of a closed set X ⊂ C n given by proj X (u) := arg min v∈X v − u is the projection of u onto X , and prox f is the proximal mapping of the function f (u) defined by

II. PROPOSED MODEL
We consider the reconstruction of a 2D image u defined on a discrete lattice For simplicity, we represent u in terms of a vector in R n by the lexicographical order with n = n 1 × n 2 .Then u i represents the density of image at the i-th pixel.Due to oversampling, the object densities reside in a sub-domain S ⊂ Ω known as the support, and u is supposed to be zero outside S. Throughout the paper, we assume that the support S is centered around the domain Ω.The object constraint is The Fourier magnitude data is obtained as b = |Fu|, where F : R n1×n2 → C n1×n2 is the discrete Fourier transform (DFT).We denote the magnitude constraint by In the absence of noise, phase retrieval (PR) problem is simply to This amounts to the following composite minimization problem where f (u) := I S (u) and g(z) := I |z|=b (z) are two characteristic functions that enforce the object and Fourier magnitudes constraints.Note that f is a closed and convex function while g is closed but non-convex, which give the non-convex optimization problem of (1).

A. A new noise-removal model
In real experiments, the Fourier data are contaminated by experimental noise.Moreover, the densities outside the support are not exactly equal to zero either.In the noisy case, the image to be reconstructed no longer fulfills either the Fourier magnitudes or the object constraint.The ER algorithm, which alternatively projects between these two constraints, apparently does not take care of the noise.The HIO "relaxes" the object constraint on densities wherever it is violated.This relaxation only helps in the noiseless case.In the presence of noise, the feasible set S ∩ T can be empty, and alternating projection methods like ER and HIO may fail to converge and keep oscillating.The OSS [16] improves the HIO by applying extra Gaussian filters to smooth the densities outside the support at different stages of the iterative process.None of them, however, seems to properly address the corruption of the Fourier magnitudes.
Introducing the splitting variable z = Fu ∈ C n , we reformulate (1) as min u,z∈C n f (u) + g(z) subject to z = Fu. (2) Note that we need to extend u to complex domain in this setting.In the presence of noise, we seek to relax the characteristic functions f and g that enforce hard constraints into soft constraints.To this end, we extend the definition of the Moreau-Yosida regularization [32], [33] to complex domain.
Let K be a Hermitian positive definite n × n matrix.The Moreau-Yosida regularization of a lower semi-continuous extended-real-valued function h : We see that h K converges pointwise to h as K → 0 + .In the special case where K = tI is a multiple of identity matrix with t > 0, h K reduces to For any characteristic function is 1 2t of the squared 2 distance from u to the set X .Similar idea of relaxing a characteristic function into a distance function has been successfully applied to the quantization problem of deep neural networks in [36].Taking X = {z ∈ R n : |z| = b} to be the magnitude constraint set and σ > 0, we first relax g = I |z|=b in (2) into is a least squares fidelity, which measures the difference between the observed magnitudes and fitted ones.This fidelity term has been considered in the literature by assuming the measurements being corrupted by i.i.d.Gaussian noise; see [24] for example.In practice, we observe that it works well even with a combination of Gaussian and Poisson noises.
Following this line, we further relax for some Hermitian positive definite matrix G.The choice of G here is tricky, and will be discussed later in section III.The relaxation of both constraints thus leads to the proposed noise-removal model For a non-diagonal matrix G, the associated Moreau-Yosida regularization f G in (4) does not enjoy an explicit expression in general.This poses a challenge to the direct minimization of ( 5) using solvers such as alternating direction method of multipliers (ADMM) [37], [38], [39].

B. Generalized Legendre-Fenchel transformation
We can express any function h : C n → R as a function h defined on R 2n in the following way Re u, y = Re(u), Re(y) + Im(u), Im(y) = ũ, ỹ .
We propose to generalize the Legendre-Fenchel transformation (a.k.a.convex conjugate) [34] of an extended-real-valued convex function h defined on C n as h * (y) := sup In fact, f G is the infimal convolution [40] between the convex functions f and 1  2 Similar to the real case, the infimal convolution holds the property that where While f G takes an implicit form, its generalized convex conjugate is readily explicit.This suggests us look at the primal-dual formulation of model (5).

C. A primal-dual formulation
With slight abuse of notation, we say y ∈ ∂h(u) is a subgradient of h at u, if ỹ ∈ ∂ h(ũ).Then the Lagrangian of (5) reads (8) where F * = F −1 is the adjoint of F or the inverse DFT.The corresponding Karush-Kuhn-Tucker (KKT) condition is We apply the convex conjugate and rewrite (9) as which is exactly the KKT condition of the following minmax saddle point problem with g σ (z) and f * G (y) explicitly available from equations ( 3) and (6).

III. GENERALIZED PROXIMAL SMOOTHING
We carry out the minimization of the saddle point problem (10) by a generalized primal dual hybrid gradient (PDHG) algorithm [41], [42], [43], [44], which iterates for some step sizes s, t > 0. The update of z k+1 calls for computing the proximal mapping of tg σ [24], whose analytic solution is given by which is essentially a linear interpolation between z and its projection onto the magnitude constraint {z ∈ C n : |z| = b}.
Moreover, we need to find the proximal mapping of sf * G for updating y k+1 , which reduces to In the third equation, S * := {y ∈ C n : Re(y) ≤ 0 on S} according to (7), whose projection is Here x − := min(0, x) for x ∈ R. Problem (11) seems to have closed-form solution only when G is a diagonal matrix.
We devise two versions of GPS algorithm based on different choices of G.Here we remark that G only needs to be positive semi-definite in (4), as f G can take the extended value ∞.In this case, although G −1 does not exist in (4), since f G is convex and lower semi-continuous, and the strong duality f * * G = f G holds here, we can re-define f G via the biconjugate as The remaining challenge is to solve the proximal problem (11).

A. Real space smoothing
One choice of G is G = γ D D.Here D is the discrete gradient operator, and then D D is the negative of discrete Laplacian.In this case, Dy 2 , which we shall refer as the real space smoothing.Since G is not diagonal, the closed-form solution to ( 11) is not available.
For small γ, we approximate the solution by The projection is followed by the matrix inversion to ensure the smoothness of the reconstructed image after each iteration.In fact, the real space smoothing is related to the diffusion process.Consider the heat equation with an initial value condition on R n × [0, +∞): A numerical approach to the above problem is the backward Euler scheme: where dt is the step size for time discretization.On the other hand, the exact solution of the heat equation is given by a Gaussian convolution of is a heat kernel.This observation leads to a fast approximated implementation of (14) when γ is small: where the convolution can be done via the efficient DFT.In the context of physics, this is known as the low-pass filtering.
Algorithm 1: GPS-R features low-pass filters for smoothing.Here we abuse notation γ to imply the filter.Inspired by OSS [16], we choose an increasing sequence of spatial frequency filters {γ l } (a sequence of finer filters).In our experiments, we do 1000 iterations with totally 10 stages.
Each stage contains 100 iterations, in which we stick with the same filter frequency.We monitor the R-factor (relative error) during the iterative process, which is defined as The reconstruction with minimal R F at each state is fed into the next stage.By applying the smoothing on the entire domain, GPS-R can remove noise in real space and obtain the spatial resolution with fine features.
Algorithm 1 GPS-R: GPS with smoothing in real space.

B. Fourier space smoothing
Another simple choice is the diagonal matrix where r ∈ R n and r i is the distance in the original 2D lattice between the i-th pixel and the center of image.Note that G is not invertible since r i = 0 for the pixel at the center.By (13), So f G (u) is a weighted sum of squares penalty on u.The weight is inversely proportional to the squared radius, which is infinity for density in the center.The further the density off the center, the smaller the penalty for the object constraint being violated.
By the Parseval's identity, for square-integrable function u, we have d dξ û(ξ) where û is the Fourier transform of u.In the discrete setting, this amounts to Therefore, by (6), This means that we are smoothing f * (y) by regularizing with the 2 gradient of Fourier coefficients of y.We thus refer it as the Fourier space smoothing.
Since G is diagonal, (11) has the closed-form solution The solution can be also approximated by a direct multiplication with the Gaussian filter when γ is small.Hence, we update y k+1 as GPS with smoothing in Fourier space (GPS-F) is summarized in Algorithm 2.

C. Real-Fourier space smoothing
Combining both Fourier and real space smoothing is an option.In each iteration, one can first apply the low-pass filter and then the Gaussian kernel in a heuristic way (GPS-RF).

D. Incomplete measurements
In practice, not all diffraction intensities can be experimentally measured.For example, to prevent a detector from being damaged by an intense X-ray beam, either a beamstop has to be placed in front of the detector to block the direct beam or a hole has to be made at the center of the detector, resulting in missing data at the center [45].Furthermore, missing data may also be present in the form of gaps between detector panels.For incomplete data, the alternating projection algorithms skip the projection onto the magnitude constraint in this region.Similarly, we only apply the relaxation 2 on the known data for GPS.A simple exercise shows that

A. Reconstruction from simulated data
We expect GPS to be a reliable PR algorithm in the reconstruction of the images of weakly scattering objects, in particular biological specimens, which have become more popular [46].Since OSS has been shown to consistently outperform ER, HIO, ER-HIO, NR-HIO [16], we perform both quantitative and qualitative comparisons between GPS and OSS.To simulate realistic experimental conditions, the Fourier magnitudes of a vesicle model are first corrupted with 5% Poisson noise.R noise is defined to be the relative error with respect to the noise-free Fourier measurements Fig. 2: Histogram (first row), and convergence (second row) of R F and R real on Vesicle data using HIO, OSS, GPS.GPS consistently produces smaller R F and R real than HIO or OSS.Moreover, GPS converges fastest with the fewest oscillations.
where u o is the noise-free model, and b are noisy Fourier magnitudes.Due to the discrete nature of photon counting, experimentally measured data inherently contain Poisson noise that is directly related to the incident photon flux on the object.In addition to Poisson noise, the data is also corrupted with zero-mean Gaussian noise to simulate readout from a CCD.Any resulting negative values are set to zero.Therefore, an accurate simulation of b i can be calculated as where σ is proportional to the readout noise.[47] In some cases, the reconstructed image by an algorithm yields a small relative error R F but has low quality.This is the issue of over-fitting, an example of which can be seen in certain reconstructions using ER-HIO [16].Smoothing is a technique to avoid data over-fitting.To validate results and show that our algorithm does not develop over-fitting, we measure the difference between the reconstructed image and the model by In addition, we also look at the residual Res = Fu| − b which measures the difference between the Fourier magnitudes of the reconstructed image and the experimental measurements.The residual can validate the noise-removal model, telling how much noise is removed.Figure 1 shows the reconstruction of vesicle model from simulated noisy data using HIO, OSS, GPS-R, and GPS-F.GPS-F and GPS-R obtain lower R F and R real than OSS.Moreover, GPS-F can get very close to the ground truth with R F = 5.90% and R real = 0.7%.In addition to lower R values, GPS-R and GPS-F converge to zero outside the support.They both obtain lower residuals than OSS, specifically GPS-F produces the least residual.If we choose larger parameter for the 2 gradient regularizer in Fourier space, we will get a smoother residual.Overall for realistic Gaussian and Poisson noise in Fourier space, GPS-F is a suitable noise-removal model.
Figure 2 shows the histogram and the convergence of R F and R real on 100 independent, randomly seeded runs using HIO, OSS and GPS on the simulated vesicle data.The histogram shows that GPS is more consistent and robust than OSS.It has a higher chance to converge to good minima with lower R F and R real than OSS.Furthermore, R F and R real of OSS scatter widely due to the initial value dependency.In contrast, GPS is more selective and less dependent on initial values.R F and R real of GPS are seen at a lower mean minimum with less variance.
Similar to HIO and ER, OSS keeps oscillating until a finer low-pass filter is applied.In contrast, GPS converges faster and is less oscillatory than OSS.In the presence of noise, alternating projection methods (ER, HIO, OSS) keep oscillating but do not converge.Applying smoothing and replacing the measurement constraint by the least squares fidelity term g σ (z) = 1 2σ |z| − b 2 helps to reduce the oscillations; hence, the method can converge to a stationary point.Note that larger σ reduces more oscillations, but also decreases the chance to escape from local minima.Alternating projection methods have σ = 0 since they impose measurement constraints.GPS obtains both smaller R F , R real , and lower variance.Even though R F are close to each other, R real of GPS is much smaller than OSS.This means GPS recovers the vesicle cell with higher quality than OSS.This simulation shows that GPS is more reliable and consistent than OSS.

1) S. Pombe Yeast Spore:
To demonstrate the applicability of GPS to biological samples, we do phase retrieval on the diffraction pattern in figure 3 taken of a S. Pombe yeast spore from an experiment done using beamline BL29 at SPring-8 [48].We do 500 independent, randomly seeded reconstructions with each algorithm and record R F , excluding the missing center.We choose default parameters for these experiments: t = 1, s = 0.9.The sequence of low-pass filters are chosen to be the same as in OSS [16].For the first 400 iterations, σ = 0.01, then is increased to σ = 0.1 for the remaining 600 iterations.The left column of figure 3 is the mean of the best 5 reconstructions obtained by the respective algorithm.The right column shows the variance of the same 5 reconstructions.It is evident from the variance that GPS achieves more consistent reconstructions.Figure 4 shows the histogram and convergence of R F .We can conclude that not only are GPS-R results the most consistent, but also the most faithful to the experimental data.
2) Nanorice: To demonstrate the generality of GPS, we also do testing with experimental data of inorganic samples.The diffraction patterns shown in the top row of figure 5 from ellipsoidal iron oxide nanoparticles (referred to as 'nanorice1'and 'nanorice2') were taken at the AMO instrument at LCLS at an X-ray energy of 1.2keV [49].This data is freely available online on the CXIDB [50].We choose default parameters for these experiments: t = 1, s = 0.9.The sequence of low-pass filters are chosen to be the same as in OSS [16].The fidelity parameter σ is chosen small for the first 800 iterations, specifically σ ∈ [0, 0.01], to produce oscillations which is necessary for the algorithm to skip bad local minima.Once the reconstruction arrives at a good local minimum region, we increase σ to reduce oscillations.This later value of σ depends on noise level and data.We test different values of σ and σ = 1 has been found to be the optimal for both nanorice data.Figure 5 shows OSS(second row) and GPS-F(third row) reconstructions of the two nanorice particles.Figure 6 shows again that GPS obtains more consistent and faithful reconstructions as compared to those obtained by OSS.GPS-F with σ = 1 converges to lower relative error than OSS at all times.OSS cannot get lower relative error because σ = 0 does not work for this case.In general, alternating projection methods do not treat noise correctly.For example, in this case, HIO keeps oscillating but does not converge.Therefore, its results are omitted here.The better approach, OSS model, can reduce oscillations by smoothing but this is not enough.In contrast, the least squares g σ (z) = 1 2σ |z| − b 2 of GPS works for noise removal since relaxing the constraints allows GPS to reach lower relative error.The values of σ depend on noise level and type.To optimize the convergence of GPS-F on nanorice2, we apply σ = 0.01 for the first 400 iterations, σ = 0.1 for the next 300 iterations, and σ = 1 for the last 300 iterations.This test shows the effect of σ on the convergence.OSS (σ = 0) oscillates the most.GPS with σ = 0.01, 0.1, 1 oscillates less and less.As σ increases, GPS also gets to lower R F .The algorithm finally reaches a stable minimum as σ goes up to 1. Continuing to increase σ does not help with R F .Choosing large σ in the beginning may reduce oscillations but also limit the mobility to skip local minima.We recommend start with small σ and then gradually increase it until the iterative process reaches a stable minimum.

V. CONCLUSION
In this work, we have developed a fast and robust phase retrieval algorithm GPS for the reconstruction of images from noisy diffraction intensities.Similar to [36], the Moreau-Yosida regularization was used to relax the hard constraints considered in the noiseless model.GPS utilizes a primal-dual algorithm and a noise-removal technique, in which the 2 gradient smoothing is effectively applied on either real or Fourier space of the dual variable.GPS shows more reliable and consistent results than OSS, HIO for the reconstruction of weakly scattered objects such as biological specimens.Looking forward, we aim to explore the role of dual variables in non-convex optimization.Smoothing the dual variable, which is equivalent to smoothing the gradient of convex conjugate, represents a new and effective technique that can in principle be applied to other nonsmooth, non-convex problems.

Fig. 4 :
Fig. 4: Top: histogram of R F in 500 independent runs (top).Bottom: the convergence curves of a singe construction of R F on S. pombe yeast spore by GPS-RF, OSS, and HIO.

Fig. 6 :
Fig. 6: Histogram (first column) and convergence (second column) of OSS and GPS-F on nanorice1 (first row) and nanorice2 (second row).The results of HIO are omitted due to lack of convergence.