Using Automatic Differentiation as a General Framework for Ptychographic Reconstruction

Coherent diffraction imaging methods enable imaging beyond lens-imposed resolution limits. In these methods, the object can be recovered by minimizing an error metric that quantifies the difference between diffraction patterns as observed, and those calculated from a present guess of the object. Efficient minimization methods require analytical calculation of the derivatives of the error metric, which is not always straightforward. This limits our ability to explore variations of basic imaging approaches. In this paper, we propose to substitute analytical derivative expressions with the automatic differentiation method, whereby we can achieve object reconstruction by specifying only the physics-based experimental forward model. We demonstrate the generality of the proposed method through straightforward object reconstruction for a variety of complex ptychographic experimental models.


Introduction
Ptychography is a coherent diffraction imaging (CDI) technique that acquires a series of intensity diffraction patterns through spatial shifts of the illumination (probe) across the sample (object) in a set of overlapping beam positions. Given a large number of overlapping beam positions, the ptychography experiment yields sufficient redundant information with which we can reconstruct the object structure to sub-beam-size spatial resolution, and even determine additional experimental parameters such as the structure of the probe itself. First proposed by Hoppe in 1969 [1], the ptychographic technique was realized experimentally and rapidly developed algorithmically in the 2000s [2,3,4,5,6]. By removing the typical CDI limitation that the probe size has to be larger than the sample, ptychography has enabled high resolution imaging of extended objects, making it a powerful imaging technique. As such, the ptychographic technique has found application not only as a 2D far-field diffraction imaging method but also as 2D near-field diffraction imaging method [7], a 3D Bragg imaging method [8], a 3D multislice imaging method [9], and a part of the 3D tomographic imaging method [10] including for objects beyond the depth of focus limit [11].
Imaging with ptychography involves solving the challenging phase retrieval problem, where one attempts to reconstruct an object from only the magnitude of its Fourier transform. In general, the phase retrieval problem is ill-posed [12]; solving it requires the use of oversampling and support constraints. These are typically used in an iterative projection framework that updates the object guess by applying a Fourier magnitude projection and a real-space constraint projection [13,14,15,16]. Alternatively, we can also frame phase retrieval as a nonlinear minimization problem, where we minimize an error metric using a gradient-based approach. The gradient-based approach is flexible and can include in the forward model a large variety of the physical phenomena related to the probing light (such as partial coherence [17], source fluctuations [18], and errors in positions [19,20]), or the detection process (such as the measurement noise [21,22] and the finite size of the pixel [23]). As such, this method has been the focus of much recent literature, leading to the development of steepest descent methods [16,24,25], conjugate gradient methods [4,16,26], Gauss-Newton methods [27], and quasi-Newton methods [28]. These algorithms have found application in the far-field ptychographic problem not only to solve for the object alone [24,25,29,30,31,32] but also to additionally solve for the probe [32,33] as well.
Gradient-based phase retrieval methods in the literature tend to rely on the availability of a closed-form expression for the gradient calculation. This closed-form expression is typically obtained by writing down an explicit expression for the error metric to minimize, then symbolically differentiating the error metric with respect to the individual input parameters [23]. Calculating the gradient in this fashion is laborious; a slight modification of the forward model usually requires a complete rederivation and algorithmic reimplementation of the gradient expressions. This becomes especially limiting if we desire to explore variations of, or introduce new capabilities to, our basic experimental methodology. As such, it is more than desirable to have an approach beyond symbolic differentiation in order to easily explore a variety of algorithms and approaches.
Automatic differentiation [34], or algorithmic differentiation, provides such an alternative to symbolic differentiation. This approach is based upon the observation that vector-valued functions can, in general, be interpreted as composites of basic arithmetic operations and a limited number of elementary functions (including exponentials and trigonometric operations). Differentiation of functions can then be understood as a recursive application of the chain rule of differentiation, wherein we repeatedly differentiate the same elementary functions (with known derivatives), only with different input parameters. This is a mechanistic process and can hence be performed entirely in software. Given a set of input numeric parameters, the automatic differentiation method computes the exact derivative by accumulating the numerical values of the elementary functions and their derivatives, without ever calculating the closed form derivative expression (Section 2).
While the field of automatic differentiation has a long and storied history [34], it is only recently that the emergence of deep neural network methods has driven its widespread adoption in the optimization and machine learning communities. Specifically, there has now arisen a need to perform gradient-based minimization to optimize state-of-the-art neural networks, which can be compositions of thousands or even millions of individual elementary functions. Since calculating closed-form derivatives for these is not feasible, automatic differentiation has become the tool of choice, thus leading to the recent rapid adoption and advancement of such software. In 2014, when Jurling and Fienup first proposed an automatic differentiation framework for phase retrieval [23], they commented on the lack of suitable existing software packages. Since then, we have seen the rise of multiple powerful, easy-to-use, and computationally efficient automatic differentiation frameworks such as TensorFlow [35], PyTorch [36], and Autograd [37]. As has been demonstrated recently [38,39], we can now simply adapt one of these software packages to solve the phase retrieval problem.
In this paper, we first provide an overview of the reverse-mode automatic differentiation algorithm (Section 2) for gradient calculations, and then demonstrate the application of this algorithm as a general framework for ptychographic phase retrieval. We provide a numerical validation of the reverse-mode automatic differentiation framework through a comparison with the popular symbolic-gradient-based ePIE method (Section 3). Finally, we demonstrate the generalizability of this framework through successful phase retrieval for increasingly complex ptychographic forward models-near-field ptychography and 3D Bragg projection ptychography-emphasizing the flexibility and potential capacity of the approach for non-standard ptychography experiments (Section 4). This shows that the reverse-mode automatic differentiation framework allows the practitioner to update and change the forward model as necessary to better reflect the physics of the problem, without prior consideration towards how to symbolically differentiate the error metric.

Overview of automatic differentiation
In this section, we provide a limited overview of the automatic differentiation procedure, focusing singularly on the reverse-mode automatic differentiation (or reverse-mode AD) framework, finally motivating the application of this framework to the phase retrieval problem. Detailed and rigorous examinations of the automatic differentiation procedure-the various modes and the algorithmic frameworks-are available [34], as is a detailed exposition of the reverse-mode AD procedure in application to the phase retrieval problem [23].
To demonstrate the idea of automatic differentiation, we first consider differentiable functions f, φ 1 , φ 2 , φ 3 : R → R with the function composition f = φ 3 • φ 2 • φ 1 , and with the priorly available individual function derivatives dφ1 dx , dφ2 dx , and dφ3 dx (for any x ∈ R). At a given point x = c, we can compute the value of f through the sequence of successive evaluations shown in Table 1. We refer to this sequence of computations as the forward pass.

Forward Pass Backward Pass
Computational Forward pass Backward pass In the evaluation trace shown in Table 1, we follow the notation in the literature [34,40,41] and index the variables stored in the memory as v i , with i ≤ 0 for the input variables, and i > 0 for the intermediate computed variables. To calculate the derivative of f at x = c, we use the chain rule of differentiation: To evaluate this derivative, we perform a backward pass (i.e., a reversed sequence of computations) during which we associate each intermediate variable from the forward pass, v i , with a new adjoint variablev i = df dvi ; we then evaluatev i from i = 3 to i = 0. This is shown in Table 1. Noting that we can see that each step in the backward pass requires as input not only the derivative values calculated in the previous steps, but also the intermediate variables calculated during the forward pass. This is illustrated in the computational graph in Table 1 This reverse-mode automatic differentiation scheme calculates the exact derivative value (up to floating point errors) without relying on the closed form expression of the derivative; instead, it relies only on the structure of the computational graph. The reverse-mode AD method generalizes in a straightforward manner to gradient calculations for real-valued multivariate functions of the form f : R n → R, as we demonstrate for a toy function in the Supplementary Material (Section 1). From Tables 1 and S1 (Supplementary Material), we can see that for the reverse-mode AD gradient calculation, the values of the intermediate variables calculated during the forward pass are shared with the backward pass; each elementary expression is only computed once but reused multiple times. This ensures that the gradient calculation is very efficient computationally. In fact, for functions of the form f : R n → R, if the function evaluation (forward pass) requires ops(f ) = N floating point operations, then the number of floating point operations required for the gradient evaluation (backward pass) is always given by ops(∇f ) = k · ops(f ) = kN, with 0 < k < 6 a constant, such that k typically satisfies k ∈ [2,3], no matter the value of n [34,40]. In other words, for reverse-mode AD, the time required to calculate the gradient ∇f is always within an order of magnitude of the time required to calculate the function value itself. This is known as the cheap gradient principle.
As such, the reverse-mode AD procedure is ideally suited to provide gradient-based iterations aiming at solving of optimization problems. A quintessential case in point is machine learning with neural networks-the 'training step' in large neural networks boils down to the numerical optimization of an error metric involving a large number of functional compositions and up to millions of input parameters. In such a situation, the derivation of the closed-form gradient with pen and paper is simply out of reach. Thus, the implementation of gradient descent procedures that use the AD framework has been key to the recent meteoric rise in machine learning applications.
In contrast to the neural network case, a typical error metric for the phase retrieval problem only involves a limited number of functional compositions, and the closed form gradients can be calculated manually. This has been demonstrated in prior phase retrieval literature: the Wirtinger flow method and its variants [24,25,42], the PIE family of methods [32], and numerous other methods proposed in the literature [14,16,43,22,29] are examples of gradient-based descent procedures that rely on such closed-form gradient expressions. However, the impressively flexible AD framework not only simplifies these selfsame gradient calculations, but also allows us to modify the forward model; this empowers us to address the full range of problems that are related to phase retrieval. Consequently, we expect that the AD framework should also greatly benefit the phase retrieval community.
As we demonstrate in Section 3 for the ptychographic problem, the error metric for the phase retrieval problem is a scalar-valued multivariate objective function of complex variables, f : C n → R [44,16]. To minimize such a function using a gradient descent approach, we adopt the Wirtinger gradient descent formalism [45,46,47]. For some z ∈ C n , the Wirtinger gradient operator is defined as and I[z] are respectively the real and imaginary parts of z, and ∂f ∂R[z] and ∂f ∂I[z] the componentwise partial derivatives with respect to the real and imaginary parts of z. Since the partial derivatives ∂f ∂R[z] and ∂f ∂I[z] are both individually real-valued, we can calculate them by separately using the reverse-mode AD framework in the same fashion as shown in Tables 1 and S1 (Supplementary Material). This ensures that the gradient calculation procedure is very efficient, with the time cost once again comparable to that for the objective function itself.

Validation: Far-field transmission ptychography
In this section, we first establish a forward model for far-field transmission ptychography. We then use reverse-mode AD to set up a gradient descent procedure that is, by construction, equivalent to the ePIE reconstruction method [32] which uses a closed-form gradient expression. We compare these frameworks numerically and establish that the automatic differentiation framework calculates gradient values that are identical to those calculated via the closed-form gradient expressions.
This comparison to the well-known ePIE method serves to establish the validity of the AD framework for phase retrieval. The ePIE method, however, cannot be used out-of-the-box for object reconstruction once we modify the ptychographic forward model-as such, it is not well-suited for use with the AD framework. Instead, we demonstrate that we can use the flexible and easy-to-use state-of-the-art accelerated adaptive gradient descent algorithms (like the Adaptive Moment Estimation or Adam algorithm [48]) that are commonly available out-of-the-box with AD software to efficiently solve the ptychographic reconstruction problem.

Forward model for far-field transmission ptychography
In far-field transmission ptychography, an unknown object is illuminated with a coherent beam, called the probe, which is localized to a small area on the object. The intensity of the wavefront transmitted through the object is then measured at the far field by a pixel-array detector. The beam is used to raster scan the object in a grid of J spatially overlapping illumination spots, generating a set of J diffraction patterns at the detector plane. The forward model for far-field ptychography consists of the following steps: 1. The complex valued object transmission function O is approximated by a total of N pixels in the object plane. For the illumination position r j , a shift operator S j , with an M ×N matrix representation, extracts the M object pixels illuminated by the probe beam P containing M pixels in the object plane. The thus transmitted wave function is represented by the exit wave ψ j ∈ C M : where is the element-wise Hadamard product operator.
2. Each transmitted exit wave ψ j propagates to the far field detector plane, where the detector then records a real-valued intensity pattern containing M pixels. If there is no noise fluctuation, the expected intensity of this jth wave-field at the detector plane reads where b j is the expected level of background events, F is the matrix representation of the two-dimensional digital Fourier transform, and | · | extracts the modulus element-wise for a complex-valued vector. (4) describes the behavior of the detected intensity in average. During an experiment, each diffraction pattern is subject to random fluctuations produced by the instrumental and the Poisson counting fluctuations (shot-noise). If one further assumes that the instrumental (thermal) noise is negligible, a Gaussian additive perturbation is usually accurate enough to describe how the counting fluctuation plagues the square-root of the measured intensity (see, for instance, [22] and references therein):

The model in Equation
where ε j is a centered Gaussian random vector.
The above relations are the forward (observation) model that predicts how the observations behave when the sample and the probe are jointly given. The inversion/reconstruction step simply aims at retrieving both these quantities from the observations. For that purpose, the maximum likelihood estimator defines the solution of this reconstruction step via a minimization problem that can be easily derived from the fluctuation model of Eq. (5). In our case it leads to where g is a separable fitting-function that reads where · denotes the usual Euclidian norm in C N , and h 1/2 j and y 1/2 j denote the componentwise square roots of the vectors h j and y j respectively. While this fitting-function has been used in the phase retrieval literature for decades, there exist some alternative noise models that would in turn provide alternative functionals to minimize [22,4]. In any case, the optimization step derived from the noise model involves a large-scale phase retrieval problem that is rather challenging. Phase retrieval problems are NP-hard problems because of their inherent non-convex structure and the search for good and computationally efficient heuristics is still an open problem. The overlapping between successive scanning probe is however recognized as a key factor that helps in preventing stagnation for standard, derivative-based iterations. Thanks to AD, these derivatives can be efficiently computed and further used by one of the iterative solvers that will be discussed in the next section.

Iterative ptychographic reconstruction with reverse-mode AD: ePIE and Adam
We first consider the extended Ptychographical Iterative Engine or ePIE algorithm [5], which can be summarized in a generalized form in Algorithm 1.

Algorithm 1 Generalized ePIE gradient descent iteration
Require: Probe and object initial guesses P 0 and O 0 . Require: Minibatch size b. For the original ePIE procedure, b = 1. With the probe positions indexed as j ∈ J , calculate the partial derivatives:

4:
Calculate the step sizes where · max denotes the maximum element of the enclosed vector.

5:
Set 6: end for With b = 1, Algorithm 1 exhibits one iteration of the ePIE procedure, during which it uses the information in single diffraction patterns to cyclically update the current probe and object estimates. The core (or "Engine") of this update is the correction (Equation (10)) computed from the considered probe position j ∈ J using the derivative of g with respect to the sample O and the probe P. In its original formulation, the ePIE procedure uses the well-known closed form expression of these derivatives (see Equations (6) and (7) in Ref. [5]) to compute these updates.
Alternatively, we can use the reverse-mode AD procedure for the numerical computation of these derivatives by relying on the Wirtinger formalism (Equation (2)) for complex derivatives. The derivative with respect to the real and imaginary parts of the complex variable are evaluated separately, and they are then accumulated according to the equations and The AD version of ePIE is mathematically equivalent to the original ePIE [5]; thus, both the algorithms should generate the same sequence of iterates. We refer to the AD version of the ePIE solver as AD-ePIE. In Figure 1 we present a simplified representation of the computational graph structure that generates the AD-ePIE iterates (see also Table 1). We can see from the figure that the modular reverse-mode AD framework is agnostic as to the choice of the error metric, the update scheme, and even to the forward model itself; it generalizes straightforwardly to any variations in these. We demonstrate this in Section 4, where we vary both the forward model and the update scheme. A natural extension to the ePIE/AD-ePIE algorithm is to use several probe positions to compute a single update: that is, to increase the minibatch size b ≥ 1 in Algorithm 1. In this perspective, Algorithm 1 belongs to a larger family of iterates that takes advantage of the natural partitioning of the dataset [22]. Similar or identical optimization strategies are indeed well known under different names in several communities, such as 'ordered-subset' in image processing [49], 'incremental gradient' [50,Chapter 1] in the optimization literature, or 'stochastic gradient' for neural-network learning algorithms [51].
A salient feature of ePIE/AD-ePIE is that the chosen step sizes α k O and α k P (Equation (9)) for the object and probe updates respectively equate to the inverse of the Lipschitz constant of the partial gradients ∂ O g k and ∂ P g k [33]. This choice of step sizes is well-known in the optimization literature, and is particularly useful in a batch gradient descent setting where the derivatives in the update (Equation (10)) are calculated using all available probe positions at once (setting b = J). As an example, with a steepest-descent iteration algorithm [50] these step sizes would then ensure global convergence (toward a local minimizer). However, the Lipschitz constants are not usually known a priori ; they generally have to be carefully derived using the closed-form gradient expressions [33]. Changes to the forward model, or the inclusion of additional terms in the error metric (e.g. regularizers), can thus require a re-deriving of both the closed form gradient expression and the Lipschitz constant. As such, using the Lipschitz constants to calculate the step sizes would negate the flexibility that is the hallmark of the AD procedure.
To circumvent this limitation and enable phase retrieval within the AD framework, we can substitute the ePIE/AD-ePIE method with a choice of state-of-the-art adaptive gradient descent algorithms that are widely used in the machine learning literature [52], such as Momentum, Nesterov Momentum, Adagrad, Adadelta, RMSProp, or Adam. The performance of these exotic methods specifically in phase retrieval applications is a new but promising area of research, with the recent literature [32,53,54] demonstrating that momentum-based accelerated gradient descent methods converge to a solution faster than standard gradient descent methods. In this work, we use the Adam (Adaptive Moment Estimation) gradient descent procedure [48] that uses a momentumlike acceleration and, crucially, does not rely on the Lipschitz constant for the gradient descent. As such, the Adam method is robust to changes in the error metric and therefore well-suited to phase retrieval applications.
The Adam optimization method is available out-of-the-box in the commonly used AD toolsets TensorFlow [35] and PyTorch [36], so that it can be used directly without first implementing the underlying algorithm. Nevertheless, for the sake of clarity, we present in Appendix A the parameter initialization step and the variable update step that make up the Adam optimizer for the object variable (the probe updates are similarly calculated); we also provide a sample code in the Supplementary Material. To solve the ptychography problem, we incorporate the Adam optimizer in a fashion identical to Algorithm 1, but with steps 4 and 5 substituted with the update computed from the Adam optimizer. The initial Adam step size is chosen through a trial and error method. In Section 3.3, we provide a minimal comparison of performance of the Adam method to the ePIE/AD-ePIE method by iterating through the individual probe positions and calculating the object and probe updates separately per probe position. A detailed evaluation of the performance of the Adam algorithm for phase retrieval is an interesting research question but is beyond the scope of this paper.

Numerical results
For a numerical validation of the reverse-mode AD procedure, we simulated a far-field transmission ptychography experiment with an incident x-ray of energy 8.7 keV. We used the 'Mandrill' and 'Cameraman' images, each 128 × 128 pixels in size, as the test object magnitude and phase respectively. We illuminated the test object with a complex-valued probe function approximated using an array of size 64 × 64 pixels and with a total integrated intensity of 10 6 photons at the object plane. The probe function was obtained by propagating the exit wave from a square aperture of width 7 µm to a distance of ζ = 15 cm so that it contained diffraction fringes characterized by √ λζ = 4.6 µm. The raster grid for the object scan was obtained by translating the probe latitudinally and longitudinally in steps of 3.5 µm (6 pixels) each and additionally wrapping the probe around the object (i.e., using a periodic object). The real-space pixel grid was obtained by assuming exact Nyquist sampling with respect to a detector with a pixel pitch of 55 µm placed at a distance of 14.6 m from the object. We thereby generated 484 far-field diffraction patterns at the detector position, then incorporated Poisson noise into these diffraction patterns to get the simulated far-field data points. To implement the reverse-mode AD approach, we followed previous work [38,39] and used the TensorFlow [35] machine learning library for the gradient calculation. For the reconstruction, we used a 7 µm (12 pixels) square aperture as the initial probe guess, and a random complex array as the initial object guess. Using the same starting parameters, we performed 150 iterations through the data set (first five with the probe value held constant) for the ePIE, AD-ePIE, and the Adam methods, with the same randomized sequence of diffraction patterns for all three algorithms. For the Adam method, we used step sizes 0.05 and 0.01 for the probe and object respectively.
As we demonstrate in Figure 2, the AD-ePIE method followed the same reconstruction path as the ePIE method for both the error metric g(O, P), and the normalized object reconstruction error calculated per pixel from the ground truth [55]. This demonstrates that the reverse-mode AD framework described in this paper calculates gradient values numerically identical to those calculated using the closed form symbolic derivatives [5]. Additionally, the Adam method also successfully reconstructed the probe and the object, and in fewer iterations than with the ePIE method.

Applications to other ptychographic forward models
In this section, we apply the reverse-mode AD framework developed in Section 3 for the far-field transmission ptychography experiment to the more complex experimental regimes of near-field ptychography and multi-angle Bragg ptychography.

Near-field ptychography
Just as in the far-field case, the near-field ptychography experiment uses a probe beam to raster scan the object in a grid of spatially overlapping illumination spots, generating a set of J exit waves after the probe-object interaction. The detector, however, is placed at a distance z from the object so that the Fresnel number satisfies the condition where λ is the wavelength of the incident probe, and W is the lateral extent of the illuminated area [7] at any given probe position. This is the high Fresnel number condition for the near-field regime. In this regime, given a probe P, an object O, a lateral shift operator S j , and an exit wave ψ j = P (S j O), the expected intensity at the detector plane is given by where b j is the expected background, and D z is the Fresnel free-space propagator for the distance z defined by the expression where (q x , q y ) the reciprocal space coordinates [56]. The experimentally measured intensity patterns are again denoted as y j . Detailed characterizations of this near-field ptychography experiment can be found in [7], [57], [58], and [59]. We simulated a near-field ptychography experiment using an incident 8.7 keV probe beam approximated using an array of size 512 × 512 pixels at the object plane, with the pixel pitch set to 0.6 µm. The probe was initialized as a Gaussian with a FWHM of 19 µm (50 pixels), then passed through a diffuser and modeled as a speckle pattern with an average flux of 10 4 photons/pixel. The simulated object consists of the 'Mandrill' and 'Cameraman' images of size 192 × 192 pixels, modeled as the object magnitude and phase respectively. The raster grid was obtained by translating the object in the horizontal and vertical in steps of 10 µm (44 pixels), generating a total of 16 diffraction patterns at a detector placed 4.7 cm from the object plane. We added Poisson noise to the diffraction patterns to obtain our simulated dataset.
For the near-field reconstruction, we obtained an initial probe estimate by backpropagating the average measured intensity, y avg = ( J j=1 y j )/J, to give The object was initialized as a 192 × 192 array of random complex numbers. To solve for the probe and object structures, we again used the Adam optimizer in the approach outlined in Algorithm 1, but considering 5 probe positions per iteration (i.e. with a minibatch size of 5). The initial Adam step sizes were set to α A O = 0.1 and α A P = 10 for the object and the probe respectively. After 10,000 iterations of Adam gradient descent, we obtained the object and probe reconstructions shown in Figure 3. The object was reconstructed with an NRMSE of 0.01, and the probe with an NRMSE of 0.12. These reconstructions demonstrate that the straightforward generalization of the reverse-mode AD reconstruction framework from the far-field ptychography case (Figure 1) to the near-field ptychography case successfully solves the near-field ptychography phase retrieval problem.

Multi-angle Bragg ptychography
Multi-angle Bragg projection ptychography (maBPP) [60] is a ptychographic experiment that allows for two degrees of freedom in the scan parameters: 1) the choice of the planar scan positions for the usual two-dimensional ptychographic scan, and 2) the choice of angular scan positions corresponding to small object rotations of a crystalline object oriented to satisfy a Bragg diffraction condition. The maBPP experiment uses the far-field ptychography setup, but with the detector placed to measure a crystalline Bragg peak, typically displaced from the direct beam by tens of degrees. The detector records a set of two-dimensional (2D) coherent diffraction patterns at overlapping scan positions at one or more angles near the Bragg diffraction condition. The diffraction patterns are then used to reconstruct a three-dimensional (3D) strain-sensitive image of an extended crystalline sample [61]. An example experimental geometry with the exit beam at a 60 • angle to the incident beam direction is shown in Figure 4.
To develop the maBPP forward model, we consider a 3D diffracting crystal volume O and an illumination volume P. The incident probe direction k i and the exit beam direction k f satisfy the Bragg diffraction condition when the scattering vector q = k f − k i coincides with some reciprocal lattice vector G hkl for the crystal. At the Bragg condition, if the probe volume interacts with the slice of the object indicated by the lateral shift operator S j , then the 2D exit wave ψ j is calculated as [61] where R is the matrix operator performing the projection along the exit beam direction. When the object is rotated by a small angle ∆θ j , the scattering vector deviates from the Bragg condition by Q j = q j − G hkl . The corresponding change in the object-probe interaction can be encoded in terms of a phase shift operator defined as Q j = exp (ir · Q j ). The 2D exit wave is then given by [60] and the expected intensity at the detector plane is then where b j accounts for the background events. The experimentally measured intensities are denoted by y j . We can again adapt the forward model of the far-field ptychography experiment to the multiangle Bragg experiment by generalizing the probe and object models to 3D, and applying the angle-dependent phase shifts Q j along with the projection operator R. This 3D generalization comes at the cost of a dramatic increase in the parameter space, and the reconstruction becomes correspondingly more difficult. With this in mind, for ease of reconstruction we use a speckle pattern, which was found to help in solving the phase retrieval problem for 2D CDI [62,63], as a highly diverse structured probe illumination. We also impose the commonly applied [61] additional constraints on the reconstruction problem: we assume that the probe structure and the object thickness (along the x−direction) are known in advance.
For the multi-angle Bragg ptychography experiments, we simulated the object as a compact crystal represented by set of grid points inside a faceted volume. We set the interior points to have magnitude 0.5 and a spatially varying phase structure emulating a strain field. We defined the orthonormal real space axes (x, y, z) such that each object voxel is of size 66 × 66 × 66 nm 3 , and such that the polyhedral crystal is situated obliquely within (with ≈ 32% of the volume of) a cuboid of size 26 × 56 × 50 voxels (≈ 25 µm 3 ). The cuboid was itself placed at the center of a simulation box of total size 64 × 162 × 112 voxels (≈ 334 µm 3 ). A 64 × 64 pixel detector with a pixel pitch of 55 µm was placed at a distance of 1.5 m from the object, normal to the real space y-axis. The probe was initialized as Gaussian beam of energy 8.7 keV with an FWHM of 396 nm (6 pixels) contained entirely within a 64 × 64 pixel array, then passed through a diffuser and modeled as a speckle pattern. The probe was then interpolated to approximate the incident beam at a Bragg condition of 2θ = 60 • (Figure 4) such that k i ⊥ z and k f y. The beam profile was assumed to be constant along the propagation direction during its propagation through the simulation box. To obtain the ptychographic data set, we applied phase modulations corresponding to an angular shift between ±0.14 • around the Bragg condition using an angular step size of ∆θ = 0.02 • , for a total of 15 angles. At each such angle, we translated the probe along the y and z directions with a step size of 132 nm (2 pixels) in a raster grid of 41 × 24 scan positions, i.e. with an overlap of ≈ 86% per step in the y direction and ≈ 83% per step in the z direction. This generated a total of 14760 diffraction patterns, to each of which we added Poisson noise, which gave us a data set with an overall intensity maximum of 13560 photons/pixel. During the reconstruction, we solved for a volume of size 30 × 78 × 82 voxels, initialized as a random complex array, and placed centrally in the simulation box. The object support along the x−direction is set to be slightly (≈ 15%) larger than the actual object thickness. This allows for the expected spreading of the object induced by resolution effects due to the low photon count [64]. To solve for the object structure, we again adapted the approach outlined in Algorithm 1 to use the Adam optimizer with minibatches containing 400 diffraction patterns each, with the diffraction patterns selected in a stochastic fashion irrespective of either probe position or object rotation. We chose this minibatch size through trial and error, optimizing for computational speed and memory requirements. After 70 iterations of Adam gradient descent, we obtained a reconstruction of the crystal (and the loose support) shown in Figure 5. As we can see in Figures  5(c,f), the error in the reconstruction is primarily due to discrepancies at the object edges, which is as expected given that the diffraction data is corrupted by the presence of shot noise [64]. The overall normalized reconstruction error, calculated using a 3D adaptation of the subpixel registration algorithm presented in [55], was found to be 0.09.

Conclusion
Our results in this paper demonstrate that the automatic differentiation technique can be used to construct a general gradient-based inversion framework for ptychographic phase retrieval. Specifically, we have shown that we can minimize a ptychographic error metric by first using reverse-mode AD to numerically calculate the gradients of the error metric, then using these within a state-ofthe-art adaptive gradient descent algorithm. We can thereby solve the ptychographic problem without ever performing an analytical derivation for any closed-form gradient expression. This inversion framework is robust to changes in the forward model and can be used identically for phase retrieval in such varying experimental configurations as far-field transmission ptychography, near-field ptychography, and multi-angle Bragg ptychography.
The inversion framework showcased in this work does not rely on any particular choice of the error metric or the experimental configuration. We can change the error metric either to accommodate alternate noise models, or to incorporate a priori knowledge about the experiment through the choice of a regularizer. We can tailor the experimental setup and introduce new degrees of freedom when necessary, then simply change the forward model correspondingly. Additionally, by using the state-of-the-art TensorFlow library for the gradient calculations, we gain in-built scalability and parallelization (multi-CPU/GPU architectures), thus allowing for convenient phase retrieval for experimental models large and small. In the future, we aim to leverage these flexibilities to incorporate a variety of physical phenomena-such as probe fluctuations, positional inaccuracies, and limitations in the detection process-into the phase retrieval framework. This would enable highresolution phase retrieval from near-field/far-field 3D (or 2D) Bragg and non-Bragg experiments, all within a single consistent platform.
The flexible AD-based inversion framework is expected to provide a unified approach to phase retrieval for general (ptychographic and non-ptychographic) CDI experiments with x-rays, electrons, or optical waves, even as we move towards increasingly complex imaging regimes. This would potentially give researchers the ability to explore variations of basic CDI methodologies in a convenient and straightforward manner, and could thereby prove a powerful addition to the phase retreiver's toolbox. Adaptation of automatic differentiation to the maBPP diffraction geometry was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Science and Engineering Division.

A Parameter update with the Adam optimizer
To examine how the Adam optimizer performs the parameter updates, we consider the far-field transmission ptychography setting from Section 3.1, with the error metric g(O, P) which is to be minimized with respect to the object parameter O. The optimization is performed in the minibatch setting described in Algorithm 1, wherein the partial derivative at the k th step, ∂ O g k , is calculated using Equation (8).
The Adam optimization step consists of a parameter initialization step and a variable update step. The initialization step, which precedes the gradient descent iterations, is outlined in Algorithm 2a. The parameter update step in Algorithm 2b replaces Steps 4 and 5 in the iterative descent setting outlined in Algorithm 1. Typical out-of-the-box implementations of the Adam optimizer only support gradient descent for real-valued variables, in which case the real and imaginary components of the object O are separately updated. To formalize this update step, we can separate the real and imaginary components of ∂ O g k as and

See Supplement 1 for supporting content.
Abstract This document provides supplementary information to "Using automatic differentiation as a general framework for ptychographic reconstruction." In particular, this document provides a detailed example of gradient calculation using the reverse-mode automatic differentiation method. Additionally, it also contains a sample code for ptychographic phase retrieval using the TensorFlow library within the Python programming language.
In Table 1, we present the sequence of computations that we can perform to evaluate the objective function (forward pass), and the gradient (backward pass). The table follows the usual notation with the input variables indexed as v i≤0 , and intermediate variables indexed as v i>0 and v i>0 = ∂ω ∂vi for the forward and backward passes respectively. At (x 1 , x 2 ) = (0.5, π/4), the objective function and the final gradient values are calculated to be ω(0.5, π/4) = 0.7765, and,

Forward pass
Backward pass Table 1: Reverse-mode AD for a multivariate function. While the backward pass again requires the values of the intermediate variables computed during the forward pass, these connections are omitted from the computational graph for the sake of clarity.

Software implementation for ptychographic phase retrieval
We implemented the phase retrieval framework described in this paper using the TensorFlow machine learning library within the Python programming language. In Listing 1, we present sample code that uses TensorFlow to perform a batch gradient descent with the out-of-the-box Adam optimizer for ptychographic phase retrieval. To accomplish this, we create separate realvalued variables for the real and imaginary components of the object (and the probe), then combine these into a single complex-valued tensor for subsequent forward propagation. Once we define the error metric to be minimized, we create separate instances of the Adam optimizer for the probe and object minimization. Initializing a session creates a computational graph and loads it in memory. We can now minimize the probe and object variables.