Efficient and flexible approach to ptychography using an optimization framework based on automatic differentiation

Ptychography is a lensless imaging method that allows for wavefront sensing and phase-sensitive microscopy from a set of diffraction patterns. Recently, it has been shown that the optimization task in ptychography can be achieved via automatic differentiation (AD). Here, we propose an open-access AD-based framework implemented with TensorFlow, a popular machine learning library. Using simulations, we show that our AD-based framework performs comparably to a state-of-the-art implementation of the momentum-accelerated ptychographic iterative engine (mPIE) in terms of reconstruction speed and quality. AD-based approaches provide great flexibility, as we demonstrate by setting the reconstruction distance as a trainable parameter. Lastly, we experimentally demonstrate that our framework faithfully reconstructs a biological specimen.


INTRODUCTION
Ptychography is a computational imaging method that allows to recover both the complex illumination and the transmission function of an object [1]. The method is based on the illumination of an object with a localized, coherent probe and the measurement of the resulting diffraction patterns. By laterally scanning the object with overlapping illumination areas, the lost phase information in the diffraction intensities can be recovered through phase retrieval. Further algorithmic extensions to ptychography and related methods have been introduced. For example, some algorithms allow to weaken the coherence requirement of the probe beam [2], to correct for experimental errors in the positioning and the setup geometry [3][4][5][6], to obtain three-dimensional reconstructions [7][8][9][10], or to acquire images without any moving parts using Fourier ptychography [11][12][13] or single-shot ptychography [13,14]. Conceptionally similar algorithms have been employed for wide-field fluorescence imaging [15]. Potential applications of quantitative phase imaging include label-free imaging in biomedicine [16], characterization of lens aberrations [17], and x-ray crystallography [18]. Many phase retrieval algorithms are based on the alternating projections scheme [19], where an object is iteratively reconstructed from the modulus of its Fourier transform and by applying known constraints. One variant of this approach for ptychography is called the ptychographical iterative engine (PIE) [20]. Significant improvements to the convergence and robustness of PIE have been made by enabling the simultaneous reconstruction of the probe beam [21] (known as ePIE), and later by revising the update functions and by borrowing the idea of momentum from the machine learning community [22]. The latter introduction to the PIE family has been coined momentum-accelerated PIE (mPIE) and can be considered as a state-of-the-art approach to perform ptychographic reconstructions.
Instead of solving the phase problem in ptychography using algorithms associated with PIE, it has recently been shown that the object can also be recovered by using optimizers based on automatic differentiation (AD) [23][24][25][26]. This approach allows to solve the inverse problem in ptychography without the need to find an analytical derivation of the update function, which can be challenging to obtain. Moreover, it directly benefits from the fast progress in the machine-learning community in terms of software tools, hardware, and algorithms. However, it remains unclear how this approach compares to state-of-the-art algorithms, notably in terms of computational time and reconstruction quality. In this paper, we implement an AD-based reconstruction framework using TensorFlow [27] and Keras [28] and benchmark it against an implementation of mPIE running on a graphics processing unit (GPU) [6]. In addition, we provide an open-source implementation for performing AD-based reconstructions as we show in Code File 1 [29]. Within this framework, we thus provide a flexible tool to perform ptychographic reconstructions and to simultaneously estimate unknown experimental parameters, opening interesting perspectives to improve the reconstruction quality for visible and X-ray lensless imaging techniques.

PROBLEM FORMULATION
The AD-based reconstruction framework implemented in this paper is presented in Fig. 1. Let O(r) represent the two-dimensional, complex-valued object and let P (r) represent the probe while r represents the coordinate vector in the object plane. The object is illuminated at different positions R i in such a way that the probed areas are overlapping. In the projection approximation [30], the exit field distribution in the object plane ψ exit (r, R i ) is then given as: The field in the detection plane ψ cam (r , R i ) with coordinate vector r can be obtained by choosing an appropriate diffraction model. We have implemented both the angular spectrum method and the Fraunhofer diffraction model, the latter of which conveniently accommodates Fresnel diffraction by absorption of a quadratic phase function into the probe [31]. While the angular spectrum method can be used to model both near-field and far-field diffraction, it comes at the cost of computing two fast Fourier transforms (FFT). The Fraunhofer diffraction model, which is based on the far-field approximation, can be performed using a single FFT. The last step in the forward model is the calculation of the intensity in the detection plane using where I pred (r , R i , θ) is the predicted coherent diffraction pattern at the ith position given the current best-fit for some parameters θ that describe the model. While the set of parameters θ usually includes amplitude and phase values characterizing the probe field and the object, we will show below that θ can also include other experimental parameters such as the reconstruction distance. Eventually, the objective of the framework is to minimize the mean squared error (MSE) function given as where N is the total count of scanning points in the chosen illumination scheme, and I meas (r , R i ) is the experimentally measured diffraction pattern at scanning position R i . As a data standardization method, we normalize the ptychographic data set to the highest measured pixel intensity, a choice that only affects the ensuing tuning of the optimizer hyperparameters. Fig. 1. Flowchart of the optimization framework based on automatic differentiation. The dataflow from left to right represents the forward model in ptychography. The object is initially guessed and adapted by the optimizer in order to minimize the error between the experimental and predicted diffraction patterns. By applying the chain rule, automatic differentiation finds the derivative of the error function with respect to the independent variables (such as object, probe, or parameters of the setup geometry). Gradient descent optimization then backpropagates the changes to the variables.
The key idea of AD-based ptychography is to rely on the optimizer to reconstruct the object and probe. As computers execute mathematical functions as a sequence of elementary arithmetic operations with known derivatives, the derivative of the objective function with respect to θ is computationally known (by applying the chain rule repeatedly). Therefore, AD enables us to converge to a solution of the ptychographic problem without the need to find an analytical expression for the update function, in contrast to algorithms based on PIE. In practice, we employ the popular Adam optimizer [32] to make changes to θ in each backpropagation epoch.

SIMULATED DATA
With our goal to quantitatively compare the reconstruction speed and quality for our AD-based framework and mPIE, we simulate diffraction patterns according to the setup shown in Fig. 2(a). A ptychographical dataset is synthesized by generating a 512 × 512 complex-valued object generated from two images, respectively defining the transmittance and phase contrast of the object. The pixel size is 6.45 µm. Using coherent illumination at λ = 561 nm, we shift the object to 80 overlapping locations with a relative overlap of 60 % as defined and recommended by Bunk et al. [33]. The projection approximation is applied to calculate the exit waves which are then propagated to the detection plane using the angular spectrum method. To calculate the synthetic diffraction patterns, we then take the absolute squares of the propagated fields and add both (Poissonian) shot noise and (Gaussian) camera read-out noise to the synthetic data. The simulated intensities are in the order of a few hundred counts per pixel, and the Gaussian noise is characterized by a standard deviation of σ = 10 counts. Then, we utilize our optimization framework based on AD to reconstruct the object and probe functions. As shown in Fig. 2(b), we successfully reconstruct both the complex-valued object and the probe by running the algorithm for 170 epochs (4 min). For a quantitative comparison between our AD-based framework and mPIE, we run them on the same graphics card (NVIDIA GeForce RTX 2070) using the synthetic dataset. To this end, we use the GPU-enabled mPIE implementation from Ref. [6] and measure both reconstruction quality and speed. It must be stressed that the computational time needed for one iteration in mPIE and for one epoch of our AD-based framework can be different. Therefore, to provide a meaningful comparison, we measure the reconstruction speed in time units. In order to quantify reconstruction quality, a useful figure of merit to compare complex functions is the complex correlation coefficient C defined by where O(r) is the ground truth,Ô(r) is the estimate of the object and n is the total number of pixels per image. Note that the reconstructed object can be shifted in space relative to the ground truth, which is a typical ambiguity in ptychography. We estimate this shift and compensate for it before calculating the correlation coefficient. The absolute value of the coefficient C is a quantitative indicator for how well an estimate resembles the ground truth. The argument of the correlation coefficient can be discarded as it only represents a global phase shift in the reconstruction. The performances of mPIE and of our AD-based framework generally depend on the choice of some hyperparameters. We choose hyperparameters for mPIE according to the values suggested in Ref. [22]. For our AD-based framework, we can control the learning rate of the optimizer, which determines the step size at each epoch while moving toward a minimum of the objective function. In Fig. 3, we show the magnitude of the correlation coefficient as a function of the reconstruction time for different learning rates. We also include ePIE for reference. Using a learning rate of 0.01, our AD-based framework converges robustly but is significantly outperformed by mPIE in terms of reconstruction speed. Using a learning rate of 0.04, our AD-based framework performs comparably to mPIE. Higher learning rates become numerically unstable and do not converge to a reasonable estimate of the ground truth. Both algorithms converge rapidly within the first 60 s. After this time, we observe a decrease of the reconstruction performance in our AD-based framework due to stronger overfitting to noise in comparison to mPIE. For long reconstruction times of more than 200 s, noise overfitting also occurs in mPIE, but to a significantly smaller extent. The influence of noise raises a relevant question for future research on AD-based ptychography. In Ref. [34] and [35], different noise models and mitigation strategies such as regularization and maximum-likelihood refinement have already been explored for coherent diffraction imaging. A comparison for the separate reconstruction qualities for amplitude and phase images is provided in Fig. S1 in the supplementary document [36]. In conventional ptychography, θ usually comprises the pixel values for O(r) and P (r) as originally introduced in the ePIE algorithm [21]. However, one important strength of our framework is that we can choose θ relatively freely, e. g. by incorporating experimental parameters into the forward model. We demonstrate this flexibility by estimating and correcting the axial distance z between the object and detector, similar to the recently shown autofocusing in ptychography using a sharpness metric and an algorithm called zPIE in Ref. [37]. By using the angular spectrum method in our AD framework, the forward model becomes explicitly dependent on z. Therefore, we can easily define z as a trainable parameter of the model in the same way as we defined the pixel values of O(r) or P (r). To illustrate this approach, we initialize the reconstruction algorithm with an error of 15 mm in the axial distance z. As shown in Fig. 4, our AD-based framework is able to find the true value of z with submillimeter precision in approximately 15 s (35 epochs), in addition to the pixel values of O(r) and P (r). Here, note that we only obtain this result by using a precalibrated probe.

EXPERIMENTAL DATA
In order to test our AD-based framework on experimental data, we study a biological sample. We acquire a nearinfrared ptychographic scan of a histological slice of a mouse cerebellum. The sample is illuminated with a focused supercontinuum laser source that is spectrally filtered to a wavelength of λ = 708.9 nm (filter bandwidth, 0.6 nm). The sample is mounted on a XY translation stage (2x Smaract SLC-1770-D-S) with a sample-detector distance of z = 34.95 mm. The XY translation stage is driven by piezoelectric actuators with an accuracy on the position of the Fig. 4. Correction of the reconstruction distance (axial distance between camera and object) using our AD-based framework. In the reconstruction, the camera position is defined as trainable and initialized with an error of 1.5 cm.
order of tens of nanometers, making it unnecessary to employ position correction methods. A charge-coupled device (CCD) camera (AVT prosilica, 1456 × 1456 pixels with pixel size 4.54 µm) is used to collect 1824 diffraction patterns downstream of the specimen. Reconstructions are carried out in the same way as described before using the synthetic dataset. The resulting reconstructions obtained with our AD-based framework and mPIE can be visually judged to be of similar quality (Fig. 5) with an edge response (10 % − 90 %) of 10 µm in both cases. However, our AD-based framework is more prone to the apparition of high-frequency components in the reconstructed image. This effect, which is possibly due to overfitting to noise, could certainly be mitigated by the use of regularization procedures.

CONCLUSIONS
To conclude, we have evaluated the performances of a ptychographic reconstruction framework based on automatic differentiation. Using numerical simulations, we show that our AD-based framework performs comparably to the state-of-the-art algorithm mPIE in terms of speed and quality of reconstructions. Furthermore, we show that the flexibility of the forward model in our AD-based framework can readily be utilized for estimating experimental parameters in addition to the pixel values of the probe and object. As an example, we have successfully corrected the reconstruction distance. Lastly, we have experimentally demonstrated that our framework faithfully reconstructs a biological specimen with a large space-bandwidth product. We believe that the presented results are important for establishing optimization frameworks based on AD as viable methods within the field of coherent diffraction imaging. Moreover, we can expect AD-based techniques to further improve thanks to the fast-paced progress in machine-learning toolboxes like TensorFlow and in computer hardware like application-specific integrated circuits (e. g., tensor processing units [38]).
Efficient and flexible approach to ptychography using an optimization framework based on automatic differentiation This document provides supplementary information to "Efficient and flexible approach to ptychography using an optimization framework based on automatic differentiation".

AMPLITUDE AND PHASE CORRELATIONS
Correlating the ground truth with the reconstructed complex-valued object using equation (4) provides an accurate measure of the overall reconstruction performance. To gain knowledge about the reconstruction performance of the amplitude and phase images independently, it is useful to correlate them to the ground truth separately. Since the phase values in ptychography are cyclic variables and not absolute, one has to avoid spurious influences from phase wrapping and global phase shift when comparing two phase images. This is achieved by computing the magnitude of the complex correlation of exp[iϕ(r)], where ϕ(r) = arg[O(r)] represents the phase image. The results are shown in Fig. S1. Fig. S1. Convergence results for reconstructions using mPIE, ePIE and our AD-based framework in simulation for (a) amplitude images and (b) phase images. The correlations between the ground truth and the reconstruction estimates are shown as a function of computation time. All algorithms run on the same computer hardware. lr: learning rate for the Adam optimizer.

DIFFRACTION INTENSITIES AND SCANNING TRAJECTORIES
In Fig. S2, a selection of randomly selected diffraction intensities for both the synthetic and experimental ptychography datasets are shown. The diffraction intensities are shown after the application of Poisson noise and readout noise, and are sampled with a dynamic range of 12 bits. Furthermore, the scanning trajectories for the probe positions are visualized in Fig. S3. To avoid the raster grid pathology in ptychography, we utilize trajectories following a concentric pattern for the experiment and Poisson disk sampling for the synthetic data [1,2].