Overcoming information reduced data and experimentally uncertain parameters in ptychography with regularized optimization

The overdetermination of the mathematical problem underlying ptychography is reduced by a host of experimentally more desirable settings. Furthermore, reconstruction of the sample-induced phase shift is typically limited by uncertainty in the experimental parameters and finite sample thicknesses. Presented is a conjugate gradient descent algorithm, regularized optimization for ptychography (ROP), that recovers the partially known experimental parameters along with the phase shift, improves resolution by incorporating the multislice formalism to treat finite sample thicknesses, and includes regularization in the optimization process, thus achieving reliable results from noisy data with severely reduced and underdetermined information.


Introduction
Ptychography recovers the phase shift of scattered radiation within a probed specimen from a series of diffraction patterns [1]. Thanks to improvements in computing power, detector hardware and algorithm development, it has evolved from a technique with relatively limited practical application into a mature technique that is popular in light microscopy [2], the X-ray community [3,4] and electron microscopy, where it has recently reached subangstrom resolution [5]. Its principle is based on illuminating the specimen with a spatially limited probe that is shifted such that an overlap between illuminated areas provides information redundancy. (Correspondence should be addressed to Wouter Van den Broek, vandenbroek@physik.hu-berlin.de.)
One of the main benefits of ptychography is that spatial resolution is not limited by the imaging optics [1]. In practice, however, this comes at the price of a challenging phase problem for the reconstruction algorithm. While the algorithm must recover the phase, the quality of the reconstruction depends strongly on precise knowledge of the shape [6,7,8,9], the coherence [10], and the positions [11,12,13] of the probe. If these parameters are not known at the outset, they must be retrieved by the algorithm during the reconstruction process. Initially, ptychography relied on the projection approximation, which restricted the method to relatively thin samples [1,7].
To account for multiple scattering in thicker samples, a multislice method can be adopted [14,15,16,17,18,19,20]. When the noise level in the diffraction data is too high and/or the oversampling limit is not reached, ptychographic reconstruction algorithms tend either to converge to the wrong solution or to be too unstable to reach convergence [21].
A variety of ptychographic reconstruction algorithms exist, using either iterative methods [8] or direct inversion [22] to solve for the phase of the object, and sometimes for the probe shape and/or positions [16]. Iterative algorithms have introduced gradient-based optimization [23] (including the use of finite-difference derivatives in the optimization of the beam positions [24]), multislice reconstructions [16,15] as well as regularization [23,25,15], but these innovations have not previously been combined.
We introduce regularized optimization for ptychography (ROP), which is a multislice-capable conjugate gradient descent reconstruction algorithm that can reconstruct the phase while simultaneously retrieving the probe shape and positions. ROP is an extension to the inversion of dynamical electron scattering (IDES) algorithm, which has been successfully demonstrated to achieve inversion from simulated high resolution transmission electron microscope images, diffraction data and experimental optical Fourier ptychography data [14,15,26,10]. The reconstruction algorithm adapts the design of an Artificial Neural Network (ANN), a model very well established in the machine learning community, to efficiently calculate the gradients and the reconstructed phase, which can be regularized during optimization to ensure sparseness.
In this paper it is shown that the conditioning of the reconstruction problem, characterized by the oversampling ratio [27], deteriorates for desirable experimental conditions: smaller observed diffraction patterns and larger step sizes, which increase recording speed and field of view and reduce the amount of acquired data, and more convergent electron beams, which improve resolution. A smaller beam support, which improves computation speed, also worsens the conditioning.
The power of using ROP is demonstrated by relaxing the experimental requirements and significantly reducing the oversampling ratio to even below unity, where the problem becomes underdetermined. It is thus shown that the incoherent resolution can be reached from just the intensities in the central diffraction discs, due to the regularization better conditioning the phase problem. Furthermore, only partially known experimental conditions such as the beam shape and the beam positions can be treated, as can relatively thick specimens due to the incorporation of the multislice algorithm. These improvements will enable the application of ptychography at higher frame rates and scanning speeds, more convergent probes, and larger fields of view. The reduced data load per recording will furthermore facilitate a fast computation of the reconstruction.
The paper is organized as follows. In Section 2 the image formation in electron ptychography and its impact on the inverse problem are treated, and an overview of the settings for the simulations and the experiments is given. In Section 3 the probe and position corrections are demonstrated on simulated MoS2 data, the effect of regularization on underdetermined data is investigated, and reconstructions from experimental MoS2 and Nb3Cl8 data are presented. Finally, conclusions are drawn in Section 4.

Image formation in electron ptychography
We consider the multislice method for the scattering of an electron beam that passes through a sample. In this framework, the solution of the Schrödinger equation is expressed as an electron wavefunction that evolves through a specimen sectioned into Z slices. The wavefunction ψ_z(r) is alternately transmitted through a slice z and propagated to the next slice z + 1:

ψ_{z+1}(r) = [ψ_z(r) t_z(r)] ⊗ (1/(iλ∆z)) exp(iπr²/(λ∆z)),   (1)

where ⊗ denotes the convolution operation, σ the interaction constant, V_z(r) a complex quantity with its real part V_z^re(r) the projected atomic potential and its imaginary part V_z^im(r) the absorptive potential at slice z, λ the wavelength, ∆z the slice thickness, and r the lateral real-space coordinate vector. Since V_z(r) is complex, this model accounts for amplitude contrast. Based on the projection approximation, the transmission function at slice z is defined as t_z = e^{iσV_z(r)}, and the convolution kernel in the second half of Eq.
(1) is the so-called Fresnel propagator responsible for free-space propagation to the next slice. The wave exiting the sample, ψ_{Z+1}(r), is further propagated to the detector, which can be modelled by a Fourier transform, so that the intensity of diffraction pattern p is

I_p(k) = |F[ψ_{Z+1}(r)]|²,   (2)

with F the Fourier transform and k the reciprocal-space coordinate vector. The multiplication in real space of incoming wave and transmission function amounts to a convolution in reciprocal space, causing wrap-around artifacts due to the periodic boundary conditions. Following [28], these artifacts are prevented by setting to zero the frequencies above two-thirds of the Nyquist frequency for each slice in the multislice algorithm, resulting in a diffraction pattern that is zero beyond the circle defined by this limit. For each diffraction pattern p in the stack of the ptychographic dataset, consisting of P diffraction patterns, the incoming wave is centered on the corresponding probe position. The main variables of the ptychography set-up are illustrated in Figure 1. The beam is scanned with a step size ∆x, and has a real-space support of width w, divided into m pixels of width d. This translates into an m × m diffraction pattern in reciprocal space with pixel size δ = 1/w. The experimentally recorded diffraction patterns are n × n pixels wide, with n ≤ m, and θ_obs is their angular width. Furthermore, θ_con stands for the beam's convergence semi-angle. The reconstruction aims for a resolution r, corresponding to the angular frequency θ_res. As detailed in Appendix A, when single scattering is dominant, the information at θ_res gets smeared out between θ_res − θ_con and θ_res + θ_con. This implies that ptychography is able to reach the incoherent resolution limit of 2θ_con from just the intensities within the central disc.
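The multislice recursion of Eq. (1) and the far-field intensity can be sketched in NumPy as below. This is a minimal sketch, not the ROP implementation: function and variable names are ours, the Fresnel propagator is applied in reciprocal space, and the band limit at two-thirds of the Nyquist frequency is enforced at every slice, as described in the text.

```python
import numpy as np

def multislice_forward(psi0, V, sigma, lam, dz, d):
    """Propagate an incoming wave psi0 through the Z potential slices V[z]
    and return the far-field diffraction intensity.
    psi0: m x m complex array; V: Z x m x m array; d: real-space pixel size."""
    m = psi0.shape[0]
    k = np.fft.fftfreq(m, d)                        # spatial frequencies
    k2 = k[:, None] ** 2 + k[None, :] ** 2
    fresnel = np.exp(-1j * np.pi * lam * dz * k2)   # Fresnel propagator, reciprocal space
    bandlimit = k2 <= (1.0 / (3.0 * d)) ** 2        # two-thirds of Nyquist 1/(2d)
    psi = psi0.astype(complex)
    for Vz in V:
        psi = psi * np.exp(1j * sigma * Vz)         # transmission t_z = exp(i*sigma*V_z)
        psi = np.fft.ifft2(np.fft.fft2(psi) * fresnel * bandlimit)
    return np.abs(np.fft.fft2(psi)) ** 2            # intensity on the detector
```

For a vanishing potential the beam passes through unchanged, so the total detected intensity is set by Parseval's theorem alone.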
Retrieving the electrostatic object potential V from the measurements is a mathematical problem with a number of knowns, N_k, given by the number of pixels in the measurements, and a number of unknowns, N_u, given by double the number of pixels in the reconstructed complex potential. The ratio N_k/N_u is the so-called oversampling ratio [27], for which an expression is given in Appendix A. In the limiting case of a large scan area, the incoherent resolution r, and a thin specimen, the expression reduces to

N_k/N_u = (2/81) (w/∆x)².   (3)

The problem thus becomes worse conditioned for smaller observed diffraction patterns, larger step sizes and a smaller beam support, properties that improve both recording and computation speed. Furthermore, a more convergent electron beam, through its larger opening angle, further reduces the oversampling ratio.
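The ratio can be evaluated with a small helper, under the assumption (cf. Appendix A) that N_k counts the measured pixels and N_u twice the real-space pixels of a reconstruction window spanning the scan area plus the beam support; the function name is ours.

```python
def oversampling_ratio(P, n, s, w, d):
    """N_k/N_u for a ptychographic data set: P diffraction patterns of
    n x n pixels versus a complex potential sampled on the scan area of
    width s plus the beam support w, with real-space pixel size d."""
    N_k = P * n * n                      # measured pixels
    N_u = 2 * ((s + w) / d) ** 2         # real and imaginary parts of V
    return N_k / N_u
```

With the simulated Nb3Cl8 settings of Section 2.3 (P = 6600, n = 4, s ≈ 3.71 nm, w = 0.232 nm, d = 0.0116 nm) this helper reproduces the quoted underdetermined ratio of about 0.47.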

The inverse problem
The following derivations are carried out for a single beam position and the associated diffraction pattern. The index p is thus dropped for the time being, but will be reintroduced from Eq. (15) onward to treat the interdependence. Furthermore, the variable σ is absorbed into V for brevity. The discrepancy between simulation and experiment of a single diffraction pattern is quantified by an error metric E. In this paper, three different metrics are used: the sum of absolute differences,

E_1 = Σ_k |I_k − J_k|,   (4)

the sum of squared differences,

E_2 = Σ_k (I_k − J_k)²,   (5)

and, up to a model-independent constant, the negative log-likelihood of the measurements under a Poissonian noise model,

E_3 = Σ_k (I_k − J_k ln I_k).   (6)

To more clearly distinguish the measurements from the model intensities I, the measurements are denoted by J; the index k runs over the central n × n pixels of the m × m simulated diffraction patterns, and the error metrics take only these pixels into account. Derivative-based methods allow a free choice of error metric. When the noise is purely Poissonian, as is the case with direct detectors [29], the negative log-likelihood E_3 is well suited, while in [10] it was shown that E_1 was advantageous in the presence of small modelling errors.
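The three metrics named above can be sketched in plain Python over flattened pixel lists; the Poisson term is written up to a model-independent constant, and the exact normalizations used in ROP are not reproduced here.

```python
import math

def error_metric(I, J, metric=1):
    """Discrepancy between simulated intensities I and measurements J
    over the central n x n pixels, here given as flat lists."""
    if metric == 1:   # E1: sum of absolute differences
        return sum(abs(i - j) for i, j in zip(I, J))
    if metric == 2:   # E2: sum of squared differences
        return sum((i - j) ** 2 for i, j in zip(I, J))
    # E3: negative Poisson log-likelihood, up to a model-independent constant
    return sum(i - j * math.log(i) for i, j in zip(I, J))
```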
In addition to the error metric, a regularization term R is introduced that acts as a sparsity constraint on the object. Among different alternatives [23,25,15], we express that atomic-resolution images are sparse in the pixel basis and thus define R by the ℓ1-norm of the projected potential,

R = ‖V‖₁ = Σ_r |V(r)|.   (7)

Optimization methods for inverse problems that involve a regularization need to trade off the error, corresponding to the exactness of the fit of the model to the given data, against the amount of regularization in the solution. This balance is controlled by a regularization parameter, which in ROP is denoted by µ. Error metric E and penalization term R are then combined in a loss function,

L = E + µR.   (8)

Here, a high µ value imposes a strong sparsity constraint, while a low µ value emphasizes a strong compliance with the data. Object, probe shape and positions are eventually retrieved by iteratively minimizing the loss function using the respective gradients. As illustrated in Figure 2, the multislice algorithm is recast as an ANN and the backpropagation algorithm is used to compute the derivatives of the loss w.r.t. the parameter of interest [14,15,26,10]. The derivative of the error w.r.t. the atomic potential at each slice, Eq. (9), combines the product ψ_z t_z, which is computed in the forward propagation, with a factor that is obtained in the backpropagation algorithm.

Figure 2: Schematic of the network model. The core ANN is divided into sub-networks by the number of probe positions. For each sub-network, the impinging wave ψ_z is multiplied with a transmission function t_z. The next layer of edges and nodes encodes a real-space convolution with the Fresnel propagator, resulting in ψ_{z+1}. This is repeated until the exit wave ψ_{Z+1} is produced and the intensity is taken in the far field.

To account for scattering effects in a thick sample while keeping the complexity of the reconstruction problem low, the atomic potential in all slices can be constrained to be equal, so that the oversampling ratio N_k/N_u remains unaffected. Thus, the gradient w.r.t. this shared potential becomes the sum of the per-slice gradients of Eq. (9),

∂E/∂V = Σ_{z=1}^{Z} ∂E/∂V_z.   (11)

The derivative of the regularization term R with respect to the potential is calculated as described in [15]. The derivative of the loss with respect to the incoming probe, ψ_0 = ψ_0^re + iψ_0^im, Eq. (12), is calculated with a translated object instead of a translated beam, to ensure that the result is centered on the origin; the expression is derived in Appendix B. The derivative of the error w.r.t. the probe position, also derived in Appendix B, is a sum over all elements k of the probe of two factors: the first is the derivative of the loss function with respect to the probe at position R and follows directly from the backpropagation; the second is the derivative of the probe at position R with respect to the spatial coordinate r, which can be calculated numerically by a convolution with a one-dimensional derivative filter in each of the two spatial directions. Note that, contrary to [24], this expression is analytical. The forward propagation of the electron wave and the subsequent backpropagation of the loss are computed independently for each diffraction pattern p. The updates for the atomic potential V and the incoming beam ψ_0 are then performed with a loss function and gradients that are formed from the entire batch of P diffraction patterns. It is assumed that the incoming beam ψ_0 is constant over the beam positions. Furthermore, R collects the individual position vectors R_p.
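A minimal sketch of the regularized loss and its gradient is given below, assuming a plain subgradient for the ℓ1 term; ROP's actual treatment of the ℓ1 derivative follows [15] and may differ, and the function name is ours.

```python
def loss_and_gradient(E, dE_dV, V, mu):
    """Combine an error metric E and its gradient dE_dV with the l1
    regularizer R = sum |V|:  L = E + mu * R.  The l1 term is handled
    here by its subgradient mu * sign(V); V and dE_dV are flat lists."""
    sign = lambda v: (v > 0) - (v < 0)
    R = sum(abs(v) for v in V)
    L = E + mu * R
    dL_dV = [g + mu * sign(v) for g, v in zip(dE_dV, V)]
    return L, dL_dV
```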
The projected potential of the object, the probe shape and the probe positions are retrieved with a non-linear conjugate-gradient method, using Polak-Ribière search directions d that depend on the batched gradients g and on the search directions of the previous iteration [30,31,32]:

d_{f+1} = −g_{f+1} + β d_f,  with  β = g_{f+1}·(g_{f+1} − g_f) / (g_f·g_f).

In each epoch, enumerated by the superscript a, the three quantities are optimized in turn during so-called sub-epochs. At the end of the iterations for V, enumerated by the subscript 0 ≤ f < F, it holds that V^{a+1} = V^a_F, and, mutatis mutandis, the same rules hold for the other two quantities. The step sizes of the respective optimizations, α, β and γ, are found by cubic interpolation. The initial step sizes must be chosen sufficiently small that only the search space of the local minimum is considered.
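The direction update can be sketched as follows; the non-negative restart β ← max(0, β), known as the PR+ variant, is our assumption and not stated in the text.

```python
def polak_ribiere_step(g_new, g_old, d_old):
    """Polak-Ribiere search direction d = -g_new + beta * d_old with
    beta = max(0, g_new.(g_new - g_old) / g_old.g_old); flat lists."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    beta = max(0.0, dot(g_new, [x - y for x, y in zip(g_new, g_old)])
               / dot(g_old, g_old))
    return [-gn + beta * dn for gn, dn in zip(g_new, d_old)]
```

When two successive gradients coincide, β vanishes and the update falls back to plain steepest descent.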

Settings for simulated and experimental data
To investigate the performance of the algorithm and the effect of the regularization on the reconstruction, we conducted one simulation and one experiment on each of a MoS2 and a Nb3Cl8 specimen. The first simulation tested for successful convergence of the algorithm when either the probe or the positions were distorted, and the result was supported by an experiment. For the simulation, the ptychographic data of a MoS2 monolayer was generated using the code described in [33], with parameters mostly chosen to match those used in the experiment [5]. This data, as well as all following simulated data, can be found in Dataset 1 [34]. The experiment, also investigating a MoS2 monolayer, was performed on the Titan Themis 300 microscope, operated at an acceleration voltage of 80 kV (λ = 4.18 pm) and a convergence semi-angle θ_con of 21.4 mrad. The EMPAD direct electron detector [29] recorded 124 × 124 pixel diffraction patterns; while an 87 × 51 probe-position scan with a step size ∆x of 0.021 nm was used in the experiment, a scan of 100 × 100 probe positions was chosen in the simulation. Our second simulation and experiment aimed at improving the resolution of the reconstruction for limited data from a Nb3Cl8 specimen. A resolution corresponding to twice the beam convergence semi-angle, θ_res = 2θ_con, was aimed for; during reconstruction the angular frequencies were calculated out to θ_cal = 3θ_con. In this case, a measure of spatial resolution is the minimum resolvable distance between two atoms. Nb3Cl8 is a trigonal system with lattice parameters a = b = 6.831 Å and c = 13.75 Å. In the [001] orientation, Nb dumbbells with an atom spacing of 0.67 Å can be observed, as shown in Fig. 5(c). At an acceleration voltage of 120 kV, equivalent to λ = 3.35 pm, the dumbbell spacing corresponds to a scattering angle of 50 mrad.
We simulated ptychographic data using the code described in [33], with a semi-convergence angle, θ con , set to 28 mrad so that when ρ = 2 the angular frequency of the dumbbell falls well below θ res = ρθ con = 56 mrad. The number of pixels in the beam support of the simulation was chosen to be four times higher than needed for the reconstruction to avoid the so-called "inverse crime" fallacy [35]. Because this results in a four-times smaller pixel size in reciprocal space, a four-by-four binning of the simulated diffraction patterns was done before reconstruction. The number of slices in the multislice simulation was set to 35 with a thickness of 0.1338 nm per slice, thus covering the crystal's vertical extent of 3c = 4.125 nm. The scan pattern followed a Halton sequence to avoid artifacts in the reconstruction that could originate from the translation symmetry of a grid scan pattern which has been dubbed "raster grid pathology" [21]. A total of 6,600 points within a disc of diameter 4.19 nm were probed, resulting in an average distance between neighboring beam positions of ∆x = 0.0457 nm.
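Such a Halton scan within a disc can be generated as sketched below; the choice of bases 2 and 3 and the rejection of candidates outside the disc are our assumptions, not taken from [33].

```python
def halton(i, base):
    """i-th element of the one-dimensional Halton sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton_disc_scan(n_points, diameter):
    """Quasi-random scan positions inside a disc, avoiding the
    'raster grid pathology' [21] of regular grid scans."""
    pts, i = [], 1
    while len(pts) < n_points:
        x = (halton(i, 2) - 0.5) * diameter   # base 2 for x
        y = (halton(i, 3) - 0.5) * diameter   # base 3 for y
        if x * x + y * y <= (diameter / 2) ** 2:
            pts.append((x, y))
        i += 1
    return pts
```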
The desired resolution corresponded to 2θ_con = 56 mrad, which is sufficient to resolve the dumbbells at 50 mrad. The diffraction patterns contained only the central disc, i.e. θ_obs = θ_con. After binning, the beam support was w = 0.232 nm wide, and the diffraction patterns measured n = 4 pixels, with m = 20. The resulting oversampling ratio was 0.47, i.e. the problem was underdetermined by a factor of more than two.
In addition, we generated simulated data similar to the aforementioned data, differing only in the number of beam positions by a factor of four, totalling 26,400 positions. The diffraction patterns were subsequently binned down by a factor of two, resulting in an oversampling ratio of 6.6. The reconstruction from this strongly overdetermined and noiseless data acted as ground truth in the evaluation of the other reconstructions. The settings of the simulation and reconstruction are detailed in Table 1.
For the experiment, a 4 nm thick Nb3Cl8 crystal was produced through exfoliation and transferred onto a holey SiN TEM grid. It was investigated on the Titan Themis 300 microscope at an acceleration voltage of 120 kV (λ = 3.35 pm) and a convergence semi-angle θ_con of 24 mrad. 124 × 124 pixel diffraction patterns were recorded on the EMPAD direct electron detector, with a 64 × 64 scan and a scan step size ∆x of 0.043 nm.
From these raw data, three reconstructions were calculated: the first from the unaltered data, and two from reduced data sets. For the first reduced data set (the second reconstruction), the diffraction patterns were binned 3 × 3 and only the central 12 × 12 pixels were retained, yielding diffraction patterns out to θ_obs = 36.8 mrad and a beam support of w = 0.571 nm. Since θ_obs + θ_con = 60.8 mrad encompasses the dumbbell signal at 50 mrad, resolved dumbbells were expected. The second reduced data set (the third reconstruction) was obtained by a 6 × 6 binning, retaining only the central 14 × 14 pixels. This yielded a beam support of w = 0.285 nm and diffraction patterns out to θ_obs = 82.3 mrad, well beyond the farthest extent of the dumbbell signal at 50 mrad + θ_con = 74 mrad, and hence resolved dumbbells were expected here too. The three data sets had oversampling ratios of 70, 5.6 and 7.5, respectively. All settings are summarized in Table 1.

Results

Probe and position correction on simulated MoS 2 data
We investigated the performance of the update functions (20) and (21) on the simulated MoS2 data described in Section 2.3. For all tests, optimization ran for 400 iterations, regularization was neglected, the initial atomic potential was assumed to be vacuum and, since a monolayer MoS2 sample exhibits only a weak phase shift, the depth of the ROP model was set to one slice. For the analysis of the probe-shape optimization, a false incoming probe, deviated in defocus relative to the correct probe, was read in. In Figure 3, we apply the optimization for incoming probes that deviate in defocus by up to 12 nm. The correct probe positions were used, leaving the algorithm to alternate only between the object and probe-shape update functions. Comparing different optimization strategies, a sub-epoch of 5 iterations for the object update function and 10 iterations for the probe-shape update function achieved the lowest root mean squared error (RMSE) between the updated probe and the correct probe. We further chose the error metric E_1; the initial step size α_0 for the object update function was set to 1, while the initial step size β_0 for the probe-shape update function was set to 1e-5. Using these parameter settings, Figure 3(a) shows the intensity line profile of a probe with a defocus of 10 nm that was taken as the initial guess (Probe A), the updated final probe (Probe B) and the correct probe that was used to generate the data (Probe C). The RMSE of the probe shape between Probe B and Probe C was 2.05e-6. Figure 3(b) shows, for all test cases, the RMSE of the probe shape between the probe at each iteration of the optimization process and Probe C.
In our second analysis, we focused on the probe-position optimization. Here, the reconstruction algorithm started the optimization with probe positions (Positions A) that were randomly deviated from the correct positions (Positions C) and finally converged to the updated probe positions (Positions B). The performance was then evaluated by taking the RMSE between the positions at every iteration and Positions C. In this analysis, the correct probe shape was used and therefore only update functions (19) and (21) were alternated. Sub-epochs of 3 iterations for the object update function and 4 for the probe-position update function gave the best result. The initial step size α_0 for the object update function was set to 1e3 and the initial step size γ_0 for the probe-position update function was set to 1e5. The optimal error metric appeared to be E_2, showing a much lower final RMSE than E_1. Figure 4(a) shows the Positions A, the Positions B and the Positions C; the trajectories they form during the optimization process are indicated by a dashed blue line. In Figure 4(b), we show the optimization for cases with initial mean deviations from 0.54∆x up to 1.92∆x and compare the RMSE during the optimization process.

Effect of regularization on under-determined data
In this section, we analyzed the influence of the regularization R on the reconstructed object potential. In particular, we concentrated on the simulated Nb3Cl8 data set, from which we created 7 data sets that differed in the electron count per pixel in the central disc, further denoted as the electron count. We increased the electron count from 100 to 1000 and restricted the analysis to update function (19) only. The optimization was limited to 100 iterations and again the initial atomic potential was chosen to be vacuum. The thickness of the Nb3Cl8 specimen was covered in our model by 7 slices, each separated by 0.669 nm and updated equally according to Eq. (11). The initial step size α_0 for the object update function was set to 1 and we selected the error metric E_2.
In Figure 5 we demonstrate the influence of the regularization on the reconstruction quality, with a particular focus on the data set with an electron count of 316. Figure 5(a) shows that each of the 7 data sets requires a different regularization parameter µ: the higher the electron count in the data, i.e. the less noise present, the less regularization is needed for an optimal reconstruction. Here, we estimated the optimal µ based on the RMSE between the final reconstruction of each test case and a reconstruction from the over-determined data set acting as ground truth. Tested regularization parameters covered a range from 5.18 to 2.68e2, and the optimal µ was determined as the sampling point closest to the local minimum of a third-order polynomial fitted to a sub-region of the test range. To increase the accuracy of the fit, we sampled the range close to the local minimum more densely. Figure 5(b) exemplifies the parameter estimation based on the RMSE for the data set with an electron count of 316; the parameter resulting in the lowest RMSE was found to be µ = 3.20e1. Figure 5(c) shows the reconstruction from the ground-truth data, against which the reconstructions of the test cases are compared. Figure 5(d) shows a strongly under-regularized reconstruction (µ = 5.18), Figure 5(e) an optimally regularized reconstruction (µ = 3.20e1) and Figure 5(f) a strongly over-regularized reconstruction (µ = 2.68e2).
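The parameter sweep can be sketched as below; the third-order polynomial fit around the minimum is omitted, so this helper simply returns the sampled µ with the lowest RMSE, and all function names are hypothetical.

```python
def best_mu(mus, reconstruct, rmse_to_ground_truth):
    """Sweep the regularization parameter and keep the value whose
    reconstruction lies closest (in RMSE) to a ground-truth reference.
    reconstruct(mu) and rmse_to_ground_truth(recon) are user-supplied."""
    errors = [rmse_to_ground_truth(reconstruct(mu)) for mu in mus]
    i = min(range(len(mus)), key=errors.__getitem__)
    return mus[i], errors[i]
```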
Although µ was determined through a sweep, other approaches have been considered in [36,37].

Reconstructions from experimental MoS2 and Nb3Cl8 data

Figure 6 shows the result from the MoS2 data, focusing on the different parts of the algorithm. The settings are shown in Tables 1 and 2, where µ is the regularization parameter, α_0 is the initial object update step size, β_0 is the initial probe update step size, and sub ep. α and sub ep. β are the respective sub-epoch lengths in iterations. Figure 6(a) shows the resulting object when only the object is updated and the remaining updatable quantities (probe positions and probe shape) are left at their defaults. This results in an object with a large background ramp of illumination from top to bottom, which mostly obscures the atomic potentials. However, fitting and removing this background plane of potential does reveal the underlying atomic potentials. A possible reason for the phase ramp in the background is a sub-pixel error in positioning the diffraction patterns [38]. Figure 6(b) shows the result of then adding just the probe update, which helps to remove the ramp of illumination but does not accurately reconstruct the expected spherically symmetric atomic potentials. Figure 6(c) shows the result when a previously optimized probe, from 6(b), is passed in as an initial argument to the reconstruction, resulting in much more symmetric atomic potentials. Figure 6(d) shows the same as 6(c), but now also allowing the positions to be updated. This results in even more circularly symmetric atoms, particularly in the row between the arrows in 6(c), in which the atoms are elongated. The change in atom shape can be seen in the subpanels of 6(c) and 6(d), which show the average shape of the atoms in the rows marked by the white arrows, along with the best-fit ellipse to the half-maximum intensities of the class average. A clear difference between the ellipse's major and minor axis lengths can be seen in 6(c), with a much less noticeable difference in 6(d).
The updated positions are shown in 6(e), demonstrating the ability to compensate for global inaccuracies in probe position, due to imperfect rotation and pixel-size scaling calibrations, as well as for local inaccuracies in the region with the elongated atoms between the arrows in 6(c). The region corresponding to the area between the white arrows in 6(c) and 6(d) lies between the black arrows in 6(e), where a clear shift in positions can be seen, possibly due to microscope instabilities. Figure 6(f) finally shows the reconstructed probe at the sample's exit plane used for 6(c) and 6(d).
Here we show that only through the optimization of probe positions, probe shape and the object itself do we obtain the highest-quality reconstructions with ROP. Figure 7 shows the result of using ROP on data acquired from a 4 nm slab of Nb3Cl8, and the benefits of the multislice approach. The parameters used can be seen in Tables 1 and 3, where ∆z_slice is the propagation distance between slices. In all cases, updated probe positions from a prior reconstruction were used, in order to focus on the multislice capabilities of the algorithm. Figure 7(a) shows the result of reconstructing the object with only one slice in the beam direction. This ignores all aspects of multiple scattering and results in the dumbbells being unresolved; in the unit-cell class average (lower right), the locations of the dumbbells are resolved, but not the spacing between the atoms. Figure 7(b) shows the same result, but this time with 6 slices at a fixed propagation distance of 0.66875 nm, corresponding to half the vertical lattice parameter and thereby matching the sample's total thickness of 4 nm. The dumbbells are sharper, resolvable in individual cases, and can be seen in the class average. Figures 7(c) and 7(d) both show data-reduction techniques, in which the acquired diffraction patterns were binned and cropped around the central beam. In Figure 7(c), in which the diffraction patterns are binned by 3 and then cropped to 12 × 12 pixels, the dumbbells are still visible in both the class average and the object itself. However, in Figure 7(d), where the patterns are first binned by 6 and then cropped to the central 14 × 14 pixels, the dumbbells are no longer resolvable, although, as in 7(a), their positions can be found. Figure 7(e) shows the Fourier transform of 7(c), with 6-fold symmetric reflections around the dotted circle indicating that the desired resolution to resolve the dumbbells was achieved.
In Figure 7(f), the Fourier transform of 7(d), 6-fold symmetric spots are only observed out to a maximum resolution of 0.84 Å when our reduced settings with sixfold binning are employed. Both Fourier transforms were averaged using a sixfold symmetry operation and weighted with an r^0.4 strength ramp to emphasize the signal over the noise. As Figures 7(c) and 7(e) show, appropriately chosen acquisition settings hold the potential for much faster data acquisition as well as reconstruction, because fewer pixels are required.

Conclusions
In this paper we present the regularized optimization for ptychography (ROP) algorithm, which uses derivative-based non-linear conjugate-gradient optimization with Polak-Ribière search directions for the inversion. Three error metrics were used: the sum of absolute differences, the sum of squared differences and the negative log-likelihood for Poisson noise.
Through the incorporation of the multislice formalism, multiple scattering was accounted for, resulting in an improved lateral resolution, as demonstrated by its necessity in resolving the Nb dumbbells in 4 nm thick Nb3Cl8 with and without data reduction. Data compromised by partially known experimental parameters, such as beam shape and positions, was successfully treated by optimizing said parameters along with the object. Errors of up to 10 nm in defocus and up to 0.024 nm, or 1.15 times the step size, in beam position were shown to be recoverable.
It was shown how the experimentally favorable settings of large step sizes and convergence angles, and small beam support and width of the recorded diffraction patterns, lower the oversampling ratio and make the inverse problem less well-conditioned. Regularization nonetheless made reconstruction possible, even from noisy simulated data with an oversampling ratio of 0.47, i.e. underdetermined by a factor of more than two. The overdetermination of the experimental data was significantly reduced, from a factor of 70 down to 5.6, while the desired resolution was still obtained.
These improvements relax the experimental requirements and hence are likely to extend the applicability of ptychography to thicker samples, and higher frame rates and scanning speeds, more convergent probes, and larger fields of view. The reduced data load per recording furthermore facilitates a fast transfer of the experimental data and computation of the reconstruction. This will help solidify the applicability of ptychography as a standard experimental technique for electron microscopy and optical setups, and increase its applicability to a broader class of microscopes and detectors.
A The oversampling ratio

The reconstruction of the potential is calculated out to the angular frequency θ_cal, while the fit of the model is restricted to observations that extend out to θ_obs, with generally θ_obs ≤ θ_cal. The width in pixels of the calculated diffraction patterns, m, is 3θ_cal/(λδ), and the equivalent width of the fitting region, n, is given by 2θ_obs/(λδ). The pixel width in reciprocal space, δ, is 1/w, and the pixel width in real space is d = λ/(3θ_cal).
The number of knowns and unknowns, N_k and N_u, is expressed as

N_k = (s/∆x + 1)² n²,  N_u = 2 (s + w)²/d².   (22)

The term of 1 for N_k prevents fencepost errors, and ∆x is the scan step. The factor of 2 in the expression for N_u accounts for the complex nature of the reconstruction, and s is the width of the scanned area. This yields

N_k/N_u = n² d² (s/∆x + 1)² / (2 (s + w)²) = n² d² / (2 ∆x²),   (23)

with the second equation describing a typical widefield scan with s ≫ w, ∆x. For thin specimens and the incoherent resolution limit, θ_obs equals θ_con, θ_cal is substituted with 3θ_con and (23) reduces further to (3).
The approximate beam width is given by the Rayleigh criterion as 0.61λ/θ_con, and practical experience with both the experimental and simulated data sets in this paper has indicated that ∆x should be considerably lower than this value, for example one half to two thirds of it.
To ensure an acceptable forward simulation, w must be wide enough to encompass a sizeable fraction of the beam intensity. At w ≈ (k + 0.24)λ/θ_con the support encloses the first k minima of a focused, diffraction-limited probe. It is hence recommended to choose w of the order of 2λ/θ_con or above.
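These two rules of thumb can be collected in a small helper; the function name and the default step fraction of one half are our choices.

```python
def suggested_sampling(lam, theta_con, k=2, step_fraction=0.5):
    """Rule-of-thumb beam support w enclosing the first k probe minima,
    and a scan step well below the Rayleigh beam width 0.61*lam/theta_con.
    lam and the returned lengths share the same unit; theta_con in rad."""
    w = (k + 0.24) * lam / theta_con              # support width
    dx = step_fraction * 0.61 * lam / theta_con   # scan step
    return w, dx
```

For the Nb3Cl8 experiment (λ = 3.35 pm = 3.35e-3 nm, θ_con = 24 mrad) this suggests w ≈ 0.31 nm and ∆x ≈ 0.043 nm, in line with the settings of Section 2.3.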

B Derivatives
Here, the derivatives of the loss function with respect to the incoming beam shape and the beam positions, as given by Eqs. (12), (13) and (14), are derived.

C Algorithm implementation
All the reconstructions presented were generated with a ROP implementation that utilizes the parallel hardware architecture of an NVIDIA V100 GPU, allowing a simultaneous propagation through the network for a batch of probe positions. For a batch size of 100, convergence of the algorithm took between 3 and 4 h for the tasks described in Section 3.1, and roughly 5 min for every task in Section 3.2. Experimental results in Section 3.3 were obtained with a batch size of 87 for the reconstructions of the MoS2 data and a batch size of 10 for the reconstructions of the Nb3Cl8 data, with average times of 15 min and 40 min, respectively.

Table 1: Settings for the simulations, and for the reconstructions from simulated and experimental measurements of MoS2 and Nb3Cl8. The acceleration voltages are 80 kV (λ = 4.18 pm) and 120 kV (λ = 3.35 pm), respectively. To treat the non-square scan patterns, the width s of the scan area is approximated as the width of a square of equal area. N_k/N_u has been calculated from (22).