Ptychographic phase retrieval by proximal algorithms

We derive a set of ptychography phase-retrieval iterative engines based on proximal algorithms originally developed in convex optimization theory, and discuss their connections with existing ones. The use of proximal operator creates a simple frame work that allows us to incorporate the effect of noise from a maximum-likelihood (ML) principle. We focus on three particular algorithms, namely proximal minimization, alternating direction method of multiplier and accelerated proximal gradient (APG). We benchmark their performance with numerical simulations, and discuss their optimal conditions for convergence and accuracy. An experimental dataset is used to demonstrate their effectiveness as well, in which case an array of cubic Au nanoparticles with a size of 50 nm is imaged. We show that with the presence of Poisson noise, a dataset with photon counts up to 104 at one detector pixel already requires ML-based methods to achieve a stable convergence. Among the three algorithms derived in this work, APG method is reported first time for its application in ptychographic reconstruction and shows superior performance in terms of both accuracy and convergence rate with a noisy dataset.


Introduction
Ptychography is a powerful scanning imaging technique that utilizes advanced mathematical tools to retrieve the missing phase information of the wave-field from a sequence of intensity measurements [1][2][3]. The attraction of this technique comes from its capability of recovering both the complex-valued probe and object functions, a blind deconvolution, as well as its ability of breaking the resolution barrier set by the focusing optics. It gains increasing popularity in recent years for its robustness in practice and was used successfully for many imaging applications in different fields [4][5][6][7][8][9]. The major challenge of this technique resides in the fact that the mathematical problem is non-convex and ill posed. Particularly for real-world problems, the experimental data always contains noise and other types of errors, therefore finding a solution optimized globally is extremely difficult, if not completely impossible. A great deal of efforts have been devoted to developing robust ptychographic iterative engines, either based on alternating-projection or gradient-descent methods [3,[10][11][12][13][14][15][16][17][18]. More complex algorithms that can handle mixed states [19], positioning errors [20,21], diffraction effect [22,23] and instability of the probe [24] were developed as well.
For convex optimization problems, a class of algorithms called proximal algorithms have been studied extensively [25]. They turn out to be well-suited for constrained, large-scale and distributed optimization problems, and ptychography falls into this category. In these techniques, the solving process is divided into subproblems involving the evaluation of the proximal operator, which usually has a closed-form solution. Inspired by these developments, here we propose to combine proximal algorithms and Wirtinger derivatives to create a simple frame work for solving ptychography problems, where either alternating-projection or gradient-descent algorithms can be derived straightforwardly.
We want to emphasize that proximal algorithms are originally developed for convex problems dealing with real-valued numbers, and ptychography problems are non-convex and involves complex-valued numbers. The Wirtinger derivative allows us to use the common rules for differentiation known from real-valued analysis [26], so the developed solving techniques in proximal algorithms can be readily applied. We show some previously reported ptychography algorithms can be derived in this frame work. Here we do not attempt to provide a rigorous theoretical proof of the overall convergence, but rather offer a heuristic for future work by demonstrating the effectiveness of these algorithms with numerical simulation and real-world applications. As have been shown that many phase-retrieval algorithms can find their counterparts in convex optimization theory [27], so can ptychography. In this paper, we focus on ptychographic reconstruction with noise and round-off errors, a common problem encountered in all measurements. Three maximum-likelihood (ML) based algorithms, namely proximal minimization (PM), alternating direction method of multiplier (ADMM) and accelerated proximal gradient (APG) are derived and benchmarked with both simulation and experiment data. Among them, APG, which has not been reported before, exhibits a superior performance. Although current model only considers noise and round-off errors, it is not difficult to extend it to a more complex case. This paper is organized as follows. In the model section, we first build the mathematical model for the optimization problem in ptychography, and then discuss how different solving algorithms can be derived from proximal operators. Connections with existing techniques are discussed. Two statistical models for the noise, intensity Poisson and amplitude Gaussian, are considered throughout the paper. In the numerical simulation section, we benchmark performance of the derived algorithms at different signal-to-noise ratios (SNRs), and discuss the optimal conditions for convergence and accuracy in respective cases. In the experimental data section, PM, ADMM and APG are tested with an experimental dataset taken with x-rays. The result confirms that APG outperforms the other two and achieves the state-of-the-art performance.

Model
In a ptychographic scan, a probe, p, impinges on different parts of an object, o. The transmitted wave is assumed to be simply a product of the probe and object function, and propagates to a farfield detector. The wave-field at the object plane is linked to the wave-field at detector plane by Fourier transform. Because a physical detector only measures intensity (or amplitude in other words), the phase of the wave-field is lost in an experiment. A phase-retrieval algorithm intends to recover this set of missing information based on the measured amplitude under certain conditions. Assuming that we collect farfield diffraction patterns at K different positions, We now have two constraints for the measured data to satisfy. One is in real space (sample plane). The wave-field has to be written as a product of a probe and an object function with known translations. The other is in reciprocal space (detector plane). The modulus of its Fourier transform has to agree with the measurement where y i is the measured amplitude of the ith image and F is the two-dimensional Fourier transform operator. In equation (1), another set of known information is the probed position, r i . The goal here is to find complexvalued functions p and o that satisfy both the amplitude and translation constraints. We discretize the problem and rewrite equation (1) into a vector form by concatenating an image along its column directions Here the lower-case bold letter represents a column vector and a capital bold one corresponds to a matrix. The absolute operator is element-wise. In the above equation, Î y i N 1 is the measured amplitude, Î F N N is the Fourier transform matrix, Î p N 1 is the probe vector, P= diag(p) is a diagonal matrix, Î S i N M is a sparse matrix containing only zeros and ones that selects the illuminated elements of the object vector, Î o M 1 , at the ith probed position. The resulting vector is Î o i N 1 . Problem described by equation (2) can be turned into an unconstrained optimization problem where f and g are indicator functions corresponding to the two constraints respectively

Alternating projection (AP)
The classical method of solving the above problem is AP algorithm that projects the solution into two domains, Here Π f and Π g are Euclidean projection operators. By iteratively projecting into these two domains, we hope that an initial guess can converge to a solution fulfilling both constraints. A proximal operator of a function f is defined by Here we use 1/λ instead of 1/2λ for convenience because we need to deal with complex-valued variables. For the two indicator functions in equation (3) their corresponding proximal operators are simplified to x dom x dom The solution to the first projection is simply to replace the amplitude of v i with y i while retain its phase The divide is an element-wise operation. The solution to the second projection can be obtained by many different ways. The probe and object functions can be updated sequentially [16], collectively [3] or jointly [14].
Here we choose the usual collective update  where superscript H denotes conjugate and transpose operation. Π g defines another projection satisfying the second constraint. Specifically, we back-propagate the wave-field to the sample plane using inverse Fourier transform. Then we update probe and object functions collectively based on the probed positions. One can run the iteration once or multiple times for high accuracy. Lastly, we replace the wave-field at the sample plane by the product of the probe and object functions and propagate it to the detector plane. This method is often referred as error reduction (ER) algorithm. From a statistical point of view, if we assume a Gaussian likelihood function of the amplitude and use its negative log as our cost function, we arrive at Their Wirtinger derivatives with respect to the probe and object are They can be solved iteratively by a fixed-point algorithm that seeks a fix point of the equation, z=q(z). In this case, the unknown variables are o and p. If they are solved in sequence we arrive at ⎡ This is no different from the classical ER algorithm shown in equation (10). Therefore, we can also interpret ER algorithm for ptychography as some sort of fix-point algorithm that seeks the stationary point of the amplitude Gaussian likelihood function with respect to p and o.

Alternating direction method of multiplier
For real-world phase-retrieval problems, ER is known to suffer from slow convergence and stagnation issues. A far more robust and popular algorithm is ADMM. One special variant of its form is Douglas-Rachford splitting method, also known as difference map (DM), which is widely used for ptychography reconstructions. ADMM is usually derived from argumented Lagrangian method. In the framework of proximal algorithms, it can be written in a very concise form. An in-depth discussion of ADMM can be found in the monography by Boyd et al [28], and its application for ptychography were reported in previous publications [12,18]. Recently it was applied for joint ptycho-tomography reconstruction [29]. Thus, here we skip the derivation process. We change our optimization problem (equation (3)) slightly For two indicator functions defined in equation (4), the ADMM algorithm is, Variable + x i t 1 is not independent and can be replaced. Rearrange terms and use Euclidean projection operators derived in equations (9) and (10), we have, This is the well-known DM algorithm [3].
For a noisy dataset, DM is known to have stability problem because it attempts to find a solution with its amplitude exactly equal to the measured value. A simple remedy is to replace the indicator function, f, with a negative log likelihood function, . In such a case, x i update in equation (16) is modified to Here 'G' and 'P' refers to amplitude Gaussian and intensity Poisson, respectively. One may notice that the update scheme for x i is now parameter-dependent. From Bayes' theorem, the proximal operator can be interpreted as maximum-a-posterior (MAP) probability estimate, where the prior probability follows a normal distribution. The parameter, λ, controls how close the new update should be to its prior value, and plays an important role in determining the performance of the algorithm. We will have a more detailed dissuasion in the following section.

Proximal minimization
The fixed point of a proximal operator is also the minimizer of the original function. This leads to a simple proximal iterative algorithm For the ptychography problem considered in this paper, we define Again, we can interpret the update as a MAP estimate. The difference from ADMM discussed above is that x i and z i are forced to be equal here. In ADMM, the splitted variables belong to their individual domains and are not necessary the same. Similar to the derivation of equation (14), we make two-step update Compared to ER, the only difference is that the measured amplitude is replaced with a MAP-estimated value. This method is first proposed by Katkovnik et al [13]. Here we show it is equivalent to PM algorithm and provide an alternative perspective. Repeat

Accelerated proximal gradient
Let's consider the optimization problem Wirtinger derivative allows us to derive the gradient of the real-valued likelihood function with respect to the complex-valued variable Here ε is a small real-valued constant introduced to avoid the discontinuity at zero, as suggested in [18]. The negative of the gradient is also the steepest descent direction of the function. The proximal gradient algorithm is Here λ t is a positive step size that can vary at each iteration. We use a simple method to determine its value [30]. The step size remains the same unless the following condition is violated  In such a case, we reject the update and multiply the step size by a factor βä(0, 1). The process is repeated until the above condition is satisfied and then the iteration is completed, = + x z i t i 1 . The proximal gradient algorithm can be understood from a point of view of localized optimization. For completeness, we give an explanation due to Beck and Teboulle [30]. The function, ( ) This function can be considered as an first-order approximation to  with a regularization term. We may rewrite it as In the vicinity of x i t , we replace the original optimization problem with an approximate one Dropping constant terms in equation (27) does not affect the solution to equation (28). As a result we arrive at Consequently, we can interpret each iteration as a proximal operator of g along the steepest decent direction of , as the name proximal gradient suggests. By definition In this particular case g is an indicator function, thus We can simplify the inequality as Therefore, each iteration decent the negative log likelihood function meanwhile satisfying the constraint g. For a faster convergence, the accelerated version of the proximal gradient method which includes an additional extrapolation step can be used We note that Xu et al [15] recently proposed accelerated Wirtinger flow (AWF) method for ptychography, which stems from the popular Wirtinger flow algorithm for phase-retrieval problems [31]. It shares some similarity with APG. However, they differ fundamentally in many aspects. AWF can be considered as a steepest decent method with a constant step size, while APG here is a projected gradient method with a varying step size. We may consider APG as a hybrid algorithm combining gradient descent and projection methods. We first decent the cost function in reciprocal space with a small step, and then project it to the domain in real space which satisfies the translation constraint. The update is completed only when such a move would make the cost function smaller. For a Gaussian amplitude likelihood function, if λ t is equal to one, as can be seen the updating scheme is no different from ER algorithm. Therefore, we can choose one as the initial value of the step size. Repeat ; else λ t =λ t β; end until λ t is sufficiently small; λ t+1 =λ t ; ifλ t+1 <δ then λ t+1 =λ 0 ; end until t>t max

Numerical simulation
In this section we will compare the performance of different methods using simulation data. The test object function is shown in figure 1. The image 'Cameraman' is used as its amplitude and 'Barbara' as its phase. The pixel size of the image is assumed to be 5 nm. A probe of size 37 nm is produced by a Fresnel Zone plate and a special fermat scan that follows the equation (in polar coordinates) [32] ( ) is performed to illuminate the different parts of the sample. c is chosen to be 20 nm (4 pixels) and a total of 1261 far-field diffraction patterns are collected. The maximum detector intensity is scaled to the range of 10 2 -10 6 and a Poisson noise is added to each pixel accordingly. The 'measured' intensity contains two different types of errors. One is the round-off error since the measured intensity are integers. This approximation reduces the dynamical range of the signal. Particularly in the detector region where the intensity drops below 0.5 count, they are all set to zero. Two is the Poisson noise, which adds background fluctuation to the signal. To be more quantitative, we calculate the SNR of the intensity as where y î are the ground-truth values.
We first study the case with SNR=32.25 dB (max. detector pixel intensity=10 4 counts). The reconstruction results obtained from different algorithms discussed in the preceding section are shown in figure 1. All of ML-based methods yield a high-quality reconstruction with nearly indistinguishable difference. If we pay close attention on the reconstructed amplitude, a very faint cross-talk from the phase image can be seen in the background. In contrast, the phase reconstruction does not show any visible artifacts. As a comparison, we also plot results from the non-statistical algorithm, DM. Strong cross-talk in the reconstructed amplitude image can be observed. In addition, the reconstructed phase image is less sharp than the others and contains some visible artifacts. This suggests that even at this level of signals, one may still need to use ML-based algorithm to achieve the best result.
To quantify the error for a systematic study, we use a root mean square error (RMSE) defined aŝˆ( whereô is the ground-truth and a is a complex-valued constant to account for the ambiguity in ptychography reconstruction. Here we choose RMSE over other image quantification methods (such as the popular structural similarity index [33]) because the reconstructed quantity is a complex-valued number. RMSE allows the calculation of a single metric without evaluating the amplitude and phase images separately. The assessment under different conditions is presented in figure 2. For PM ( figure 2(a)), its convergence rate is λ-dependent, while the resultant RMSE is not; they all converge to the same value. The Poisson model leads to a smaller RMSE, which is not a surprise since a Poisson noise is added. When the amplitude Gaussian is used, PM is no better than ER. As we discussed earlier, ER can also be considered as a ML-based method. For ADMM ( figure 2(b)), we observe that a larger value of λ leads to a faster convergence. In the limit of infinity, ML-based ADMM becomes DM, which shows the fastest convergence rate at the beginning. However, a large value of λ can cause a stability issue, which can be seen in the plot. In this case when λ>5, they do not tend to converge to a stable solution after initial fast convergence, but rather fluctuate as iteration goes. Also, a large λ results in a higher RMSE. Therefore the solution is less accurate. This suggests a multi-stage strategy for ADMM to optimize both the convergence rate and accuracy. We can start with a large λ for fast convergence. When a stable ), we reduce the value of λ by multiplying it with a constant βä(0, 1), and then use the solution obtained from the last stage as the initial guess and continue the iteration. To distinguish it from a regular ADMM algorithm, we call it mADMM thereafter, where 'm' refer to multi-stage.
For APG (figure 2(c)), we always choose the initial value of λ 0 as one and it is adjusted automatically in the update. Similar to the cases in PM and ADMM, intensity Poisson model outperforms amplitude Gaussian. It is worth noting that when the initial guess of the probe is bad, λ t can quickly becomes very small, particularly for intensity Poisson model. This will cause a stagnation problem. A remedy is to reset λ t to its initial value when it becomes too small, but a price to pay is more computation time.
To assess performance across algorithms, in figure 2(d) we plot their RMSE variations as a function of iteration number. For a fair comparison, all start with a disk-like probe and a square object as the initial guess. Poisson model is used in all algorithms. Among them, APG has the best performance. Not only its overall convergences rate is higher, but also the resultant RMSE is smaller.
With current SNR, though quantitative analysis shows the difference in the reconstruction, visually the results look almost the same (figure 1). In a more extreme case, we consider a maximum detector intensity of 100 counts (SNR=12.77). Compared to the previous case, the diffraction intensity is reduced by 100-fold. Figure 3(a) shows a typical diffraction pattern with limited counts in logarithmic scale, and a comparison with the ground-truth ( figure 3(b)). The error-free data has a dynamical range over seven orders of magnitude, while that for the limited counts is reduced to less than two due to the round-off to integers. The added noise makes the situation even worse. Such a noisy dataset poses a significant challenge to ptychographic reconstruction. We present in figure 3(c) the reconstructed results obtained from PM, APG and mADMM algorithms. Because DM does not converge at all in this case, its result is not shown here. Again a disk-like probe and a square object as the initial guess are used. All three algorithms lead to a converged solution. As is clear, their results however differ considerably from algorithm to algorithm. PM recovers high-frequency features well, but the reconstructed images look more grainy. There are also strong cross-contamination between the amplitude and the phase images. In contrast, mADMM yields smooth images, but the high-frequency details are lost. APG balances the two well and produces the best overall results. The quality of the obtained phase image is still very acceptable, without having visible artifacts and losing too much details. Figure 4 depicts the achieved RMSE of the reconstructed complex image at various levels of signal. In general, the log-log plot shows a close-to-linear relationship, suggesting that they all follow a power law approximately. In most circumstances, APG outperforms the other two. Unlike PM and mADMM algorithms for which the value of λ has to be chosen accordingly with the noise level to achieve the best performance, there is no need to tune any parameter for APG when SNR changes.

Experimental data
The simulation data only take into account round-off errors and Poisson noise. The real-world problem can be much more complicated. In order to assess the robustness of these algorithms, we perform reconstruction on an experimental dataset that was taken at the hard x-ray nanoprobe beamline of National Synchrotron Light Source II, Brookhaven National Laboratory. The nanobeam with a size of ∼13 nm 2 at 12 keV was produced by two crossed multilayer Laue lenses. The beamline layout is shown in figure 5(a). Details about the experimental setup can be found elsewhere [34]. The sample consists of cubic Au nanoparticles with a size of 50 nm deposited on a Si substrate [35]. They form an ordered array with gaps between them as small as 10 nm, as seen in the scanning electron microscopy image ( figure 5(b)). The sample was placed at a downstream position with a distance of 25 mm to the focal plane. A fermat scan with c=20 nm (equation (34)) was performed to avoid periodic aliasing effect, and a total of 792 frames were collected. The diffraction pattern collected on a far-field pixel-array detector (Merlin, Quantum Detectors) has over 4×10 4 maximum detector counts at one pixel.
In figure 5(c) reconstructed complex-valued images with different algorithms are presented. For a fair comparison, they all start with the same initial guess of the probe function, which is obtained by inverse Fourier transforming the measured far-field amplitude and then propagating 25 μm to the sample plane. In other words, the initial guess assumes a lens with no phase aberration. Because DM does not converge to a stable solution, we have to choose an intermediate reconstruction result that looks the best. Nevertheless, the phase image is a bit noisy, and the amplitude part is barely recognizable. PM yields a smoother result, but both the phase and amplitude exhibit some ghost image around the boundary of the array. mADMM produces a further improved result, but the ghost image can still be seen, particularly in the amplitude image. Not surprisingly, the best result is achieved with APG. It has no apparent artifacts seen in the reconstruction. We can clearly resolve the shape of individual 50 nm nanoparticles and their sharp edges in the phase image, even though the amount of the phase variation of one layer is in the order of ∼0.05 radian. Because the sample has a very low absorption contrast (∼1.7%), the amplitude image usually is too fuzzy to be useful. However in the one obtained from APG we can still recognize the particle array. For this dataset, we conclude that APG leads to a reconstruction result with overall image quality noticeably better than that of others.
There are a few remarks we'd like to point out. Among all the algorithms tested in this paper, DM takes the least number of iterations to arrive at a plausible solution, particularly when the initial guess of the probe is far from the ground truth. However, with the presence of noise DM can become unstable and diverge. On the contrary, ML-based algorithms converge slower, but are stable. The reason is that their iteration processes usually involves a MAP-based sub-optimization step which requires the updated solution to be close to its prior value. As a result, a good guess of the initial probe is more important for these ML-based algorithms. In addition, for the experimental data tested in this case we do not see visible difference between Poisson and Gaussian models. This suggests that the difference in reconstruction between two statistical models diminishes as intensity increases.

Conclusion
In summary, we presented several solving techniques for ptychographic imaging derived in the framework of proximal algorithms. The separable nature of the proximal operator makes it well-suited for dealing with largescale ptychography reconstruction problems where its evaluation can be parallelized. The optimization problem is usually divided into sub-optimization steps involving proximal operators for which often a closed-form solution can be found. Therefore, the problem becomes more tractable. We derived ER, PM, ADMM, DM and APG algorithms and benchmarked their performance with noisy datasets. Among them, APG depicted the best reconstruction result not only in numerical simulation but also in experiment.
In the current work, we only consider a noise model and round-off errors. In the same frame work, it is not difficult to enable modes to deal with partial coherence [19] and bluring effect in fly-scan [36,37]. In such cases, contribution from different modes will add up incoherently and the likelihood function has to be modified accordingly. We can also consider to incorporate more constraints on probe or object function. For example, adding a regularization term with denoiser has shown suppressed noise in the reconstructed object function [13,38]. With an additional constraint on the probe, it was demonstrated that the periodic aliasing effect seen in grid scans could be mitigated [18]. There is still a lot of room for improvement, and they will be the future work.