Ptychography with multiple wavelength illumination.

To perform phase retrieval in the extreme ultraviolet (EUV) regime more efficiently, it is desirable to develop polychromatic ptychography. As an alternative to the existing ptychographic information multiplexing (PIM) method, we present another scheme, in which all monochromatic exit waves are expressed in terms of the amplitude of the transmission function and the thickness function of the object. Our proposed algorithm is a gradient-based method and its validity is studied numerically. In addition, the sampling issue that appears in the polychromatic ptychography scheme and its influence on the reconstruction quality are discussed.


Introduction
Ptychography [1][2][3][4][5][6][7] is a scanning coherent diffraction imaging (CDI) technique that is suitable for EUV and X-ray imaging applications due to its high fidelity and its minimal requirements on optical imaging elements. Its goal is to reconstruct a complex-valued object from a set of data recorded in the Fraunhofer or Fresnel diffraction region. In accordance with the meaning of 'ptychography' (from the Greek 'πτυχή' for 'fold'), for each measurement a part of the object is illuminated, and adjacent illuminations should partially overlap. This data redundancy, together with a priori information about the relative position between the illuminating beam and the object, is the cause of the robustness of ptychography. During the last few years, many ptychography experiments have been successfully demonstrated with visible light sources [5][6][7][8][9], synchrotron radiation [3,4,10-14], electron beams [15,16], and tabletop high-harmonic generation lasers [17-20]. It is widely agreed that ptychography provides not only a wide field-of-view (FoV), but also simultaneous reconstructions of the complex-valued object function and the illumination probe function.
Regarding reconstruction algorithms in ptychography, the most commonly used approaches are the difference-map method [4] and the ptychographic iterative engine (PIE) [2,6]. The difference-map method is designed to solve problems that can be formulated as finding the intersection of two constraint sets [7]. In contrast, the PIE method is derived from a gradient descent scheme, sequentially minimizing the distance between the estimated diffraction intensities and the measurements [5]. In principle, highly coherent illumination is demanded when performing ptychography [21,22]. However, in the X-ray regime, where ptychography has been widely used, light sources are either spatially partially coherent (e.g. synchrotron radiation [23,24]) or temporally partially coherent (e.g. tabletop high-harmonic generation lasers [25,26]). To mitigate the unwanted effects of partial coherence, and also to improve the throughput of the imaging system, novel algorithms have been introduced into ptychography during the last decade, among which the most popular approaches are the blind deconvolution method [27] and the mode decomposition method [10]. Both of these approaches were initially used to perform ptychography with spatially partially coherent illumination. The mode decomposition method was later applied to ptychography with spatially coherent but temporally broadband illumination. This polychromatic ptychography was named the ptychographic information multiplexing (PIM) method [8]: the object is illuminated by a spatially fully coherent light beam whose spectrum consists of several wavelengths, and the exit wave immediately behind the object is decomposed into mutually incoherent modes, each mode corresponding to one wavelength.
Then these modes are reconstructed simultaneously by minimizing the distance between the estimated diffraction intensity and the measurement, which is the incoherent sum of the intensities of the separate wavelengths.
In this paper, we propose an alternative polychromatic ptychography scheme in which the modes of the PIM method are all expressed in terms of the transmission and the thickness function of the sample. We consider both the case where the probe for the different wavelengths is assumed known, and the case of simultaneous reconstruction of an unknown probe and the object. Our method is described and derived in Section 2.1. After introducing the error functions in Section 2.3, our simulation settings and results are presented in Subsection 3.1.1, followed by a quantitative study of the error functions in Subsection 3.1.2 and a comparison with the PIM method in Subsection 3.1.3. The simulation including probe reconstruction is demonstrated in Subsection 3.2. We conclude the paper with a summary and outlook in the last section.

Plane-wave illumination
We start by considering the polychromatic ptychography configuration depicted in Fig. 1. A part of the object is illuminated by a spatially coherent plane wave, which has distinct peaks in its temporal spectrum S(λ). The object is moved to a number of positions while a set of ptychographic data is collected in the far field. For one position of the object and for one wavelength λ, we denote the exit wave immediately behind the object by Ψ(r, λ) and the measured intensity of the diffracted field by I_M(r'). Here r and r' are 2-D coordinates in the object plane and the detector plane, respectively. In our proposed scheme, instead of decomposing the exit wave into mutually incoherent modes and calculating their diffraction intensities, we exploit the relation between these modes. In the plane-wave illumination configuration, the exit wave Ψ(r, λ) is given by the object's complex transmission function multiplied by a plane wave with wavelength λ. For the case where the illumination contains several separate wavelengths λ_1 < λ_2 < ··· < λ_k < ··· < λ_N in its temporal spectrum S(λ), the exit wave for wavelength λ_k and probe position R_j is (apart from a phase constant, see Appendix D.1):

Ψ_j(r, λ_k) = P(r − R_j) A(r) exp[i (λ_1/λ_k) φ(r)], (1)

where R_j represents the jth relative position between the probe and the object, A(r) is the object's transmission function, and φ(r) stands for 2π times the ratio of the object's thickness function and the wavelength λ_1, so that the phase at wavelength λ_k is (λ_1/λ_k) φ(r). Note that both A(r) and φ(r) are real valued and that A(r) is positive. P(r) stands for the illumination (or probe) function, which in this subsection is treated as a plane wave truncated by a circular aperture with radius r_0:

P(r) = 1 for |r| ≤ r_0, and P(r) = 0 otherwise. (2)

Note that in Eq. (1) we assume that the object has no dispersion. For dispersive materials, the exponential term in Eq. (1) should be modified by introducing the ratio of the refractive indices at wavelengths λ_1 and λ_k, and the absorption should be represented by the imaginary part of the refractive indices.
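The non-dispersive exit-wave model of Eq. (1) can be sketched numerically as follows; the array sizes and the random test object are purely illustrative, and `exit_wave` is a hypothetical helper name:

```python
import numpy as np

def exit_wave(A, phi, P, lam_k, lam_1):
    """Exit wave for wavelength lam_k under the non-dispersive thin-object
    model of Eq. (1): the amplitude is A(r), while the phase scales as
    lam_1/lam_k, since phi(r) = 2*pi * thickness(r) / lam_1."""
    return P * A * np.exp(1j * (lam_1 / lam_k) * phi)

# toy 64x64 object and a circular plane-wave probe
n = 64
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
P = (x**2 + y**2 <= (n // 4)**2).astype(complex)  # circular aperture
A = 0.1 + 0.9 * np.random.rand(n, n)              # positive transmission in [0.1, 1]
phi = np.pi * np.random.rand(n, n)                # phase at lam_1, in [0, pi]

lams = [30e-9, 40e-9, 50e-9]
psis = [exit_wave(A, phi, P, lk, lams[0]) for lk in lams]
```

Inside the probe support, the modulus of every monochromatic mode equals A(r), and the phase of the mode at λ_k is the phase at λ_1 compressed by λ_1/λ_k.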
The goal of our polychromatic ptychography scheme, for the case that the probe P is known, is to retrieve A(r) and φ(r) simultaneously. To do that, we minimize for each probe position R_j a cost function E_j(A, φ), defined as the distance between the measured intensity I_M(r') at position R_j and the estimated polychromatic far-field diffraction intensity I_E(r'), which is an incoherent sum over all calculated monochromatic diffraction patterns [28]:

I_E,j(r') = 1_D(r') Σ_k |F[Ψ_j(·, λ_k)](r')|², (4)

where 1_D(r') is a binary window function representing the region of the detector, and F denotes the Fourier transform, which is used to propagate the exit wave to the far field over the large distance z. The reconstruction of A(r) and φ(r) is done by applying the steepest descent procedure to the cost function E_j(A, φ). In Appendix A and B detailed derivations are given of the update formulas Eq. (7) for A(r) and φ(r); there, n stands for the iteration number, and the correction term ∆Ψ(r, λ_k) of Eq. (8) contains a regularization parameter α > 0 which prevents division by zero. The value of α should be chosen comparable to the noise level so that an accurate reconstruction can be guaranteed. Note that A(r) and φ(r) are only updated where P_j(r) is nonzero. We stress that the expression for ∆Ψ(r, λ_k) can also be identified as the gradient descent direction of the cost function with respect to each mode in the PIM method. For successfully retrieving A(r) and φ(r), the complete ptychographic dataset must be used: Eq. (7) is implemented sequentially for every lateral position of the object, which constitutes one complete iteration. In addition, we apply a positive-value correction to the amplitude reconstruction A(r) at the end of each iteration, so that a positive amplitude is obtained and hence the phase is well defined. Our proposed method is summarized in Algorithm 1.
Algorithm 1: polychromatic ptychography algorithm with plane-wave illumination

  iteration number n = 1; γ = a small positive number (e.g. 10⁻³)
  repeat
    for each probe position R_j do
      for each λ_k do: forward propagate the wavefield; end for
      use Eq. (4) to calculate I_E(r');
      for each λ_k do: use Eq. (8) to apply the intensity constraint on the total diffraction field; backward propagate the wavefield; end for
      use Eq. (7) to update A(r) and φ(r);
      if A(r)_n < 0 then A(r)_n = γ; end if
    end for
    n = n + 1;
  until the algorithm converges
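The intensity-constraint step inside Algorithm 1's inner loop (Eqs. (4) and (8)) can be sketched as below. Since the exact expression of Eq. (8) is derived in the appendices and not reproduced here, the code uses a common amplitude-rescaling form with the regularization parameter α; `delta_psi` is a hypothetical helper name, and plain FFTs stand in for the far-field propagation:

```python
import numpy as np

def delta_psi(psis, I_M, alpha=1e-3):
    """A common amplitude-rescaling form of the correction described by
    Eq. (8): propagate every monochromatic exit wave to the far field,
    rescale the total amplitude to the measured one (alpha regularizes the
    division by a near-zero modulus), and propagate back.  This is a sketch
    of the modulus-constraint step, not the paper's exact formula."""
    spectra = [np.fft.fft2(p) for p in psis]
    I_E = sum(np.abs(s)**2 for s in spectra)          # Eq. (4): incoherent sum
    ratio = np.sqrt(I_M) / (np.sqrt(I_E) + alpha)
    return [np.fft.ifft2(ratio * s) - p for s, p in zip(spectra, psis)]
```

When the estimated polychromatic intensity already matches the measurement, every ∆Ψ(r, λ_k) is close to zero (up to the α regularization), so the update of A(r) and φ(r) stalls at the solution, as it should.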


With probe reconstruction
In the previous subsection we restricted ourselves to the case where the illumination (or probe) is a localized plane wave. One can imagine that if this condition is violated in practice, directly applying the algorithm will end in stagnation or an inaccurate reconstruction. Therefore, to achieve better reconstruction quality, it is necessary in such cases to simultaneously retrieve the probe function and the object's transmission and thickness functions. As shown in Fig. 1, in our model the object is illuminated by a wavefield which contains multiple wavelength components. If these components have different intensity and wavefront profiles, then we have to model this polychromatic probe function as an incoherent superposition of different modes, and using the PIM formula to reconstruct these modes would be the most reasonable choice. Here we only consider the simple situation where every mode shares the same intensity and wavefront profile. In that case the exit wavefield Ψ(r, λ) can be written as in Eq. (1); however, P(r) is now a complex-valued probe function which represents the illumination wavefield of all wavelengths. To incorporate the reconstruction of this probe function, Eq. (7) is modified to Eq. (9), where ∆Ψ(r, λ_k) is again given by Eq. (8). Note that Eq. (9) is designed for the simple situation mentioned above, and should be implemented sequentially for all lateral positions as one complete iteration. In Section 3, numerical experiments are performed to test our algorithm.

Definition of error functions
To monitor our numerical experiments, we define the normalized error in Fourier space (NEF), E_F, in Eq. (10); as before, the subscript j labels the object's lateral position and the subscript n is the current iteration number. The NEF is commonly used in practical experiments because both I_M and I_E are available there. Due to the close relation between the NEF and the cost function defined in Eq. (3), the convergence of our algorithm is more suitably assessed by monitoring the evolution of the NEF than that of the NER.
To estimate the quality of the results, we also define a normalized error in real space (NER), E_R, in Eq. (11), where O(r) = A(r) · exp[iφ(r)] is the actual object function, defined as the exit wave at wavelength λ_1, and O_n(r) is the reconstructed object, i.e. the reconstructed exit wave for wavelength λ_1, after n iterations. The parameter γ is a multiplicative constant that makes the NER invariant with respect to a global phase offset [29]. The NER can be regarded as a direct measure of the quality of the reconstructions. Hence in numerical experiments the NER is the more suitable error monitor, whereas in real experiments only the NEF can be used.
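A minimal sketch of the NER computation, assuming γ is the least-squares constant Σ O_n* O / Σ |O_n|² (which makes the error invariant to a global phase factor); `ner` is a hypothetical helper name:

```python
import numpy as np

def ner(O_true, O_rec):
    """Normalized error in real space in the spirit of Eq. (11).  gamma is
    the complex constant that best fits O_rec to O_true in the least-squares
    sense, which removes any global phase offset (and scale) from the error."""
    gamma = np.vdot(O_rec, O_true) / np.vdot(O_rec, O_rec)
    return np.sum(np.abs(O_true - gamma * O_rec)**2) / np.sum(np.abs(O_true)**2)
```

With this choice, a reconstruction that differs from the true object only by a constant phase factor has NER equal to zero.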

Plane-wave illumination
To examine Algorithm 1, the results of numerical experiments are reported in this section. A detailed description and demonstration of our simulation is presented in the first part of this section; in the following subsection we analyze the performance of the algorithm by studying the error functions. In the final part of the section a comparison between Algorithm 1 and the PIM method is reported.
3.1.1. Validation of the proposed algorithm.
Parameter settings of the simulation. We first construct a complex-valued object with the 'Mona Lisa' painting as amplitude map and the 'camera man' picture as phase map. The amplitude map contains non-zero values in the range [0.1, 1] to avoid phase uncertainty, and the phase map varies between [0, π] to prevent phase wrapping. A planar wavefield which contains a certain number of wavelength components, transmitted by a circular aperture, is used as the probe function. For each wavelength component, the exit wavefield is modeled as in Eq. (1) and then propagated to the far field, where the intensities are added up. The measured total intensity forms the polychromatic ptychographic data set in accordance with Eq. (4). Note that for every wavelength λ_k other than the shortest one, λ_1, the propagated wavefield is magnified by a factor λ_k/λ_1, since the far-field diffraction patterns scale with the reciprocal wavelength. Because the discrete mesh in the far-field region should correspond to the pixels of the detector, we use the chirp z-transform [30,31] instead of the FFT to enable the desired flexibility of the mesh. The discretized regions of the diffraction patterns are limited to spatial frequencies below the maximum set by Nyquist sampling for the shortest wavelength λ_1. To give an example, we assume in the configuration of Fig. 1 a detector with a 320 × 320 array of 15 µm pixels, at a propagation distance of z = 1 cm behind the object. The maximum spatial frequencies that can be measured by this detector for the wavelengths 30 nm, 40 nm and 50 nm are listed in Table 1. The probe function is a matrix with 320 × 320 pixels, in line with the number of pixels of the detector. The probe has a circular support with a diameter of 160 pixels, equivalent to a diameter of 10 µm. The Fresnel number is 1/3 for the 30 nm wavelength, and smaller than 1/3 for larger wavelengths.
Therefore the detector is in the Fraunhofer region. The circular support is used as a priori knowledge in the reconstruction. In addition, the a priori knowledge is used that the object is moved over an equidistant 4×4 grid with 80% overlap between adjacent illuminated areas. Suppose the diameter of the circular support is L, and the distance between adjacent illumination positions is d ∈ [0, L]. The overlap ratio is then defined by 1 − d/L, and is usually assigned a value from 60% to 85% to achieve optimal performance of the reconstruction algorithm [32]. A detailed demonstration of how the overlap ratio influences the reconstruction quality can be found in Appendix G. This scanning procedure of the probe gives an object array with 480 × 480 pixels in total. For the pixels of the discretized object region which are never illuminated, the value of A(r) is set to zero during the iteration. The width of this region of zeros is roughly 80 pixels. In practice, the object's scanning grid can be obtained from a translation stage's readings, which can be refined using previously proposed algorithms [5,33].
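A small sketch of the scan-grid bookkeeping described above, assuming the overlap definition 1 − d/L; all numbers follow the stated configuration (probe support of 160 pixels, 4 × 4 grid, 80% overlap, 320-pixel probe array):

```python
# probe support diameter (pixels) and target overlap ratio
L_diam = 160
overlap = 0.80

# step between adjacent illumination positions: d = (1 - overlap) * L
d = (1.0 - overlap) * L_diam          # 32 pixels for 80% overlap

# equidistant 4x4 scan grid of relative positions R_j (in pixels)
grid = [(i * d, j * d) for i in range(4) for j in range(4)]

# total extent covered by the 320-pixel probe array over the grid
span = 3 * d + 320                    # fits inside the 480-pixel object array
```

With these numbers the scanned region spans about 416 pixels, which is consistent with the 480-pixel object array carrying a border of never-illuminated zeros of roughly the stated width.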
On the other hand, a priori knowledge about the probe's support size can be estimated by Fourier transforming the diffraction intensity, and can then be registered to a real-world spatial scale using the propagation distance and the camera's parameters.
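The detector-limited maximum spatial frequencies of the kind listed in Table 1 can be estimated from the stated geometry (320 × 320 pixels of 15 µm, z = 1 cm) under the small-angle approximation f_max ≈ x_max/(λz), with x_max the detector half-width; this is a sketch of the scaling, not a reproduction of Table 1:

```python
# detector geometry from the text: 320x320 array, 15 um pixels, z = 1 cm
npix, pitch, z = 320, 15e-6, 1e-2
half_width = (npix / 2) * pitch               # 2.4 mm from the optical axis

# small-angle approximation: maximum spatial frequency f_max = x_max / (lambda * z)
f_max = {lam: half_width / (lam * z) for lam in (30e-9, 40e-9, 50e-9)}
```

The ratio between any two entries is the inverse wavelength ratio, which is exactly why the longer wavelengths lose their highest spatial frequencies once the window is fixed by the Nyquist limit of λ_1.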
It is worth noting that, by taking more wavelengths into account, the computational cost of the algorithm is generally higher than in the case of monochromatic illumination. This can be understood by inspecting the framework of Algorithm 1 and Eq. (8). To complete a single iteration of Algorithm 1, one needs to perform forward and backward propagation of the wavefield for every wavelength and for every probe position, which involves multiple FFTs. If the propagation is calculated for each λ_k sequentially, the required computation time increases almost linearly with the number of wavelengths. To shorten the calculation, one can adopt accelerated gradient-based algorithms (e.g. nonlinear optimization methods [34,35] and momentum-based methods [36]) or take advantage of modern computational devices (e.g. GPU-based parallel computing [37-39]). In our numerical experiments, 1000 iterations were applied for each single simulation to ensure convergence. To make the algorithm converge faster, the Nesterov momentum-based algorithm [36] was applied between the 50th and the 500th iteration; the momentum is added to the reconstruction formula in the manner suggested in [40]. Our simulations run on an NVIDIA GeForce GTX 1060 GPU. For easy integration with the device, our code is written in Python, using the scikit-cuda package for calculating 2D FFTs and the pycuda package for other operations.
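The momentum acceleration mentioned above can be illustrated with a generic Nesterov-style update on a toy quadratic cost; this is only a schematic of the idea, not the specific formula of [40], and all names are illustrative:

```python
import numpy as np

def momentum_step(x, v, grad, beta=0.9, step=0.1):
    """One Nesterov-style update: evaluate the gradient at the look-ahead
    point x + beta*v, update the velocity v, then move the iterate by v.
    'grad' is a callable returning the gradient of the cost at a point."""
    g = grad(x + beta * v)
    v = beta * v - step * g
    return x + v, v

# toy quadratic cost 0.5*||x||^2 with gradient x: the iterate decays to 0
x, v = np.ones(4), np.zeros(4)
for _ in range(200):
    x, v = momentum_step(x, v, lambda y: y)
```

In the reconstruction, the same pattern is applied to A(r) and φ(r): the velocity accumulates the recent gradient history, which damps the oscillations of plain steepest descent and speeds up convergence.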
Because in this part of the simulation the probe consists of polychromatic plane waves, we first investigate how the number of wavelengths in the probe's temporal spectrum influences the quality of the reconstruction. The probe's spectrum was generated as follows: (i) when the illumination is a monochromatic plane wave, the wavelength is 30 nm; (ii) for the polychromatic situation, we start by generating two spectral components at 30 nm and 50 nm. To include more spectral components (N > 2), we add the additional frequencies between those of 30 nm and 50 nm, with identical spacing between adjacent frequencies. A schematic demonstration of how we generate the temporal spectrum can be found at the left of Fig. 2. When the measurements are noise-free, we let all the wavelengths have the same intensity in the probe spectrum, as shown in Fig. 2. For the noisy situation, we build the probe spectrum such that all the wavelengths share the same number of photons. A more detailed description of the noisy case can be found in the next subsection.
Adding noise to the measurements. To investigate the influence of noise, we added Poisson noise to every diffraction intensity measurement. Considering that one of the tasks of the numerical experiment is to investigate how the algorithm performs with different numbers of wavelengths at a fixed amount of noise, it is important to define a noise-level criterion that is insensitive to the number of wavelengths in the probe's spectrum. In the simulation we focus on the total photon number of the diffraction intensity, which we call the TPN (short for 'total photon number'), and we use it as a reference to define the Poisson noise level. In accordance with the Poisson noise model, the signal-to-noise ratio (SNR) equals √TPN. Hence when the TPN remains at the same value, each measurement stays at the same noise level for different numbers of wavelengths. Note that a single photon carries an energy hc/λ_k for wavelength λ_k, so for the same photon number each wavelength contributes a different intensity. However, through a simple calculation one finds that the total energy of the diffraction intensity is the same for a fixed TPN; the calculation is presented in Appendix C.

Reconstruction results. In Fig. 2 we present our simulation results for the cases where 1, 2, 5 and 20 spectral components are included in the probe's spectrum. The results in Figs. 2(a1)-2(a8) are obtained with the PIE algorithm, while Figs. 2(b1)-2(b8), 2(c1)-2(c8) and 2(d1)-2(d8) are reconstructions obtained with our proposed Algorithm 1. We use a constant amplitude map A_0(r) = 0.5 and a constant phase map φ_0(r) = 1 as the initial guess for the object function, which proves sufficient for our algorithm to alleviate the ambiguities described in Appendix D.2. Reconstruction results for different TPN values are also depicted in Fig. 2.
Roughly examining these pictures, one can conclude that the reconstructions are visually acceptable when the TPN equals 10⁷ or 10⁶, whereas the reconstructions are corrupted by noise when the TPN is decreased to 10⁵. By carefully comparing the phase reconstructions (the 'camera man' pictures), it is also noticeable that stronger defects occur when the probe contains 20 spectral components. For a better understanding of this defect, a more quantitative analysis is given in the following subsection by inspecting the evolution of the error functions.
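The fixed-TPN Poisson noise model described above can be sketched as follows; the pattern here is a random stand-in for a diffraction intensity, and `add_poisson_noise` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_poisson_noise(I, tpn):
    """Scale the noise-free polychromatic pattern so that it contains 'tpn'
    photons in total, draw Poisson-distributed counts, and rescale back.
    With this normalization the SNR behaves as sqrt(TPN), independently of
    how many wavelengths contribute to the pattern."""
    scale = tpn / I.sum()
    counts = rng.poisson(I * scale)
    return counts / scale

I = rng.random((64, 64))              # stand-in diffraction pattern
I_noisy = add_poisson_noise(I, 1e7)   # TPN = 10^7
```

Because the scaling is applied to the total pattern, adding more wavelengths redistributes the photons but does not change the overall noise level, which is exactly the property the comparison in Fig. 2 relies on.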

Evolution of the error function.
To better understand the simulation results, we examine the evolution of the error functions described in Subsection 2.3. A series of NERs and NEFs, computed from the simulation results described in Subsection 3.1.1 and Fig. 2, are drawn in Figs. 3(a) and 3(b), respectively. All the NERs and NEFs were computed for probes containing 1 to 20 components in their spectrum, with the TPN varying from 10⁵ to 10⁷. Focusing on a single TPN value, one sees that although the NEFs stay at almost the same value, the NER increases almost linearly with the number of wavelengths, in agreement with what we observed in Figs. 2(c1)-2(c8). Note that when the TPN equals 10⁷ the signal-to-noise ratio is so large that the noise has negligible influence; therefore the blue and red dots in this plot almost overlap.
To find the reason behind this phenomenon, additional simulations were performed. The difference between the new and the previously discussed simulations is that we now ignore the wavelength dependence when propagating the wavefields (see Eq. (4)). Figure 4 gives a schematic description of the two different ways of computing the polychromatic diffraction intensity pattern. In line with Subsection 3.1.1, it is assumed that the contribution of the shortest wavelength (λ_1) to the diffraction intensity always fulfills the Nyquist sampling criterion. Hence in Fig. 4 we use a red frame to represent the Nyquist frequency with respect to λ_1, which is equivalent to the boundary of an imaginary detector 1_D(r'). Figure 4(a) illustrates the calculation process of Eq. (4), which includes the wavelength dependency of the wavefield propagation. In the new numerical experiment the wavefield propagation is performed without the wavelength dependency, as shown in Fig. 4(b). For the wavefield propagation model of Fig. 4(b), the simulated NERs and NEFs obtained after convergence of our algorithm are illustrated in Figs. 5(a) and 5(b), respectively. Comparing Fig. 3 and Fig. 5, it is obvious that the wavelength-dependent error in the reconstruction disappears once the wavelength dependency in the propagation model is removed. Hence, one can conclude that the way we measure and compute the polychromatic diffraction wavefield is the cause of the increase in NER in Fig. 3(a). This is because when the binary function 1_D(r') in Eq. (4) corresponds to a window size that fits the Nyquist sampling criterion for the short wavelength λ_1, the same maximum spatial frequencies of the larger wavelengths are not completely measurable (see Table 1). As illustrated in Fig. 4(a), for wavelengths larger than λ_1, part of the diffraction wavefield (outside the red frame) is cut off by the boundary of the detector.
Hence the recorded data is incomplete, which leads to the increase of the NER. It follows that the theoretical resolution of the reconstruction cannot be estimated from the size of the detector and the sampling rule of the shortest wavelength alone. This is important if one performs polychromatic ptychography without using wavelength scanning or spectroscopic detection to separately detect the diffracted intensities of the individual wavelengths. In other words, although in principle performing ptychography with a polychromatic light source can give a higher SNR and a shorter acquisition time, the reconstruction quality is not necessarily better than the monochromatic ptychography result. A similar effect was also reported in [8].

Fig. 5. The numerical experiment results corresponding to the situation where the wavefield propagation is wavelength-independent, as in Fig. 4(b). In these plots the same choices have been made for the number of wavelengths and noise as in Fig. 3.

Comparison with PIM method.
To study the difference in performance between our proposed algorithm and the PIM method, we present some numerical simulations in this subsection. All parameter values are chosen as in Subsection 3.1.1. In Fig. 6 we illustrate the final reconstructed NERs and NEFs for simulated noise-free measurements. The NERs of our proposed algorithm were calculated as described in Section 2.3, while the NERs of the PIM method were also calculated from Eq. (11), with the reconstructed object O_n(r) given by the exit wave in PIM corresponding to λ_1. From Fig. 6 it can be concluded that although both Algorithm 1 and the PIM method converge to a very low value of the NEF, the reconstructed object functions have different qualities. When the probe contains polychromatic plane waves with more than roughly 10 wavelengths, the PIM method suffers more from the limited-detector-size issue described in Subsection 3.1.2. As shown in Fig. 7, when the probe has 20 spectral components, all the reconstructed modes in the PIM method are disrupted. Hence in this case it is more suitable to employ Algorithm 1.

With probe reconstruction
We also compare the simulation outcomes for different situations when the probe function is unknown. As mentioned in Subsection 2.2, we consider the relatively simple case where the complex probe function is the same for every wavelength. To inspect the performance of our proposed iterative scheme of Eq. (9), we use a complex initial probe function and show the noise-free reconstruction result in Figs. 8(a1)-8(a6). The probe's temporal spectrum is assumed to contain 3 components with identical wavelength intervals between them, as illustrated at the left of Fig. 8. The results confirm that the update formulas given in Eq. (9) are able to retrieve the object and the probe function, even when the probe has a very complicated profile. However, because in this simulation the relative positions R_j between the probe and the object form a regular grid in two orthogonal directions, raster grid pathology appears in the reconstructions of both the object and the probe. This kind of defect is visually more obvious when the initial probe function is a polychromatic plane wave (the reconstruction for this case is shown in Figs. 8(b1)-8(b4)). To eliminate the raster grid defect, ideas for optimizing the scanning trajectory have been proposed in [41-43], which however are beyond the scope of this paper.

Conclusion and outlook
In this paper we have introduced a polychromatic ptychographic algorithm which can be regarded as an alternative to the PIM approach. It is based on the idea that the mutually incoherent modes in the multiple-wavelength scheme can be related by representing the object by real transmission and thickness functions. The algorithm is derived from the steepest descent method and is validated numerically. We first discussed polychromatic plane-wave illumination which is assumed to be known, and afterwards included the reconstruction of an unknown probe. The results show that Algorithm 1 performs well for a known probe, and that a reasonable level of noise can be handled by the algorithm. For a sufficiently thin object, the reconstruction given by the polychromatic approach has a higher NER than the case of monochromatic illumination. This is due to the fact that for longer wavelengths the higher spatial frequencies are not captured by the detector because of stronger diffraction. Hence, although in principle performing ptychography with a polychromatic light source can give a higher SNR and a shorter acquisition time, the reconstruction quality is not necessarily better than the monochromatic ptychography result. Compared to the PIM method, Algorithm 1 is less sensitive to this missing-data issue. However, defects are observed when a very complex unknown probe function is introduced and must be reconstructed as well. For varying parameter settings (i.e. the noise and the number of spectral components in the probe), different behaviors are observed and discussed in this paper. As next steps, pathologies caused by the raster scanning grid of the probe should be eliminated and the performance of the algorithm should be validated using experimental data.

Appendix
The steepest descent method and the monochromatic ptychography algorithm. In this section of the Appendix we derive the ptychography algorithms for a single wavelength. The goal of ptychography is to reconstruct a complex-valued object from a set of diffraction intensity patterns recorded in the Fraunhofer or Fresnel region. Let r and r' be 2-D coordinates in the object plane and the detector plane, respectively. The exit wave immediately behind the object is denoted by Ψ(r) and the measured diffraction intensity by I_M(r'). According to the thin-object model, the exit wave Ψ(r) is given by the multiplication of the probe function P(r − R_j) and the object function O(r):

Ψ_j(r) = P(r − R_j) O(r),

where R_j is the jth relative position of the probe and the object. The probe function is assumed to have a circular support of finite radius r_0. For a detector located in the far field, we can estimate the diffraction intensity pattern as I_E(r') = |F[Ψ_j](r')|², where F stands for the Fourier transform operator. A cost function E can be defined as the distance between the estimated diffraction intensities I_E(r') and the measurements I_M(r'). The task of ptychography is then to find an object function and a probe function which fit the given a priori knowledge (i.e. the support constraint on the probe function and the relative positions R_j between the probe and the object) while minimizing the cost function E.
To recover the object function, a straightforward way is to sequentially perform projections in the Fourier plane and the object plane. In recent research on phase retrieval, it has become more common to implement these projections in the Fourier and object domains with more advanced schemes (e.g. difference map [7,10,44], RAAR [37,45], etc.) to achieve better convergence.
Alternatively, one can obtain a ptychography reconstruction by sequentially minimizing a cost function E_j with the steepest descent method. Let E_j be the difference between I_E,j(r′) and I_M,j(r′) for position j:

E_j = ∬ [√I_E,j(r′) − √I_M,j(r′)]² dr′.

First, we compute the retrieval formula for the object function O(r). To do that, we calculate the functional derivative of E_j with respect to O at every point r, where ΔΨ_j(r) is an auxiliary function given by:

ΔΨ_j(r) = Ψ_j(r) − F⁻¹{√(I_M,j) F{Ψ_j}/|F{Ψ_j}|}(r),

and ℜ is the operator which takes the real part of a complex number. To determine the steepest descent direction, we seek the perturbation δO which minimizes the functional derivative of E_j under a constraint on the norm ‖δO‖². To solve this problem, we construct a Lagrange function L, where λ_O is a real Lagrange multiplier. Now, we differentiate L by perturbing the function δO with an arbitrary auxiliary function δÕ. According to the Lagrange multiplier rule, for the optimal solution of the problem in (21), the resulting variation must vanish for all δÕ(r). If we assign δÕ(r) to be purely real-valued, we obtain one solution. On the other hand, if we assign δÕ(r) to be purely imaginary-valued, we obtain a second solution, where ℑ is the operator which takes the imaginary part of a complex number. Combining (25) and (26) shows that the steepest descent direction of E_j at O(r) is proportional to the function ΔΨ_j(r) · P*_j(r) at every point r. The iteration formula for the object function is then given by:

O_{n+1}(r) = O_n(r) − β_O ΔΨ_j(r) P*(r − R_j),

where β_O is the step size, which is normally chosen to be a constant.
In a similar fashion, we can derive the iteration formula for updating the probe function using the steepest descent direction. We find:

P_{n+1}(r − R_j) = P_n(r − R_j) − β_P ΔΨ_j(r) O*_n(r),

where β_P is a constant step size which takes the same value as β_O. We note that (28) and (29) are equivalent to the ePIE algorithm [2,6].
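One joint update step can be sketched as follows. The sign convention (ΔΨ_j taken as the intensity-corrected exit wave minus the estimate), the Fourier modulus projection used to form it, and the step sizes are assumptions of this sketch, not the paper's exact formulas:

```python
import numpy as np

# Sketch of one steepest-descent update of the object and probe estimates.
# Delta_Psi is formed by replacing the Fourier modulus of the estimated exit
# wave with the measured one (an assumption of this sketch); beta values are
# illustrative constants.
def update_step(O_est, P_est, I_M, Rj, beta_O=0.1, beta_P=0.1):
    P_shift = np.roll(P_est, Rj, axis=(0, 1))
    psi = P_shift * O_est                          # estimated exit wave Psi_j
    Psi = np.fft.fft2(psi)
    # Keep the estimated phase, impose the measured modulus (Fourier projection).
    Psi_corr = np.sqrt(I_M) * np.exp(1j * np.angle(Psi))
    dpsi = np.fft.ifft2(Psi_corr) - psi            # auxiliary function Delta_Psi_j
    # Gradient-type updates: partner-conjugate times Delta_Psi_j.
    O_new = O_est + beta_O * np.conj(P_shift) * dpsi
    P_new = P_est + beta_P * np.roll(np.conj(O_est) * dpsi,
                                     tuple(-r for r in Rj), axis=(0, 1))
    return O_new, P_new
```

Repeating this step over all positions j, in random or sequential order, gives an ePIE-like reconstruction loop.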

Polychromatic ptychography algorithm
In this section we propose a ptychography algorithm for multiple-wavelength illumination with mutually incoherent wavelengths, using measured total diffracted intensities (i.e. the intensities are not spectrally separated). As mentioned in the main text, for a non-dispersive thin object we can express the exit wave for wavelength λ_k as:

Ψ_{j,k}(r) = P_k(r − R_j) A(r) exp[i (λ_1/λ_k) φ(r)],

where R_j represents the jth relative position between the probe and the object, A(r) is the object's local transmission function, and φ(r) stands for 2π times the ratio of the object thickness function and wavelength λ_1. Note that both A(r) and φ(r) are real-valued and that A(r) is positive. As in Appendix A, the steepest descent method provides us with updating formulas for A(r), φ(r) and P(r). Because the exit wave is not monochromatic, we need to re-define the estimated diffraction intensity I_E(r′) as an incoherent sum over the monochromatic components:

I_E,j(r′) = 1_D(r′) Σ_k |F_{λ_k}{Ψ_{j,k}}(r′)|²,

where 1_D(r′) is a binary window function which represents the area of the detector, with r′ = x′ ê_x + y′ ê_y. To start with, we compute the functional derivative of E_j, which was defined in Eq. (18), with respect to A(r), φ(r) and P(r) at every point r, where we use the fact that both A(r) and φ(r) are real-valued by definition. Once again an auxiliary function ΔΨ_{j,k}(r) appears, defined analogously to Appendix A. Similar to the derivation in Eqs. (18)-(29), we arrive at formulas to update the estimates for A(r), φ(r) and P(r) for each position j, where δ_A, δ_φ and δ_P are constant step sizes and n is the iteration number.
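A minimal sketch of this polychromatic forward model follows. The wavelengths, grids and detector window are illustrative assumptions, and the wavelength-dependent scaling of the diffraction pattern on the detector is deliberately omitted here for simplicity:

```python
import numpy as np

# Sketch of the polychromatic forward model: the object is parametrized by a
# real, positive transmission A(r) and a phase phi(r) defined at wavelength
# lambda_1, so the mode at lambda_k carries the phase (lambda_1/lambda_k)*phi(r).
# The wavelength-dependent scaling of each diffraction pattern is omitted
# (an assumption of this sketch).
def poly_intensity(A, phi, P, Rj, wavelengths, window):
    lam1 = wavelengths[0]
    P_shift = np.roll(P, Rj, axis=(0, 1))
    I = np.zeros(A.shape, dtype=float)
    for lam in wavelengths:                        # incoherent sum over modes
        psi_k = P_shift * A * np.exp(1j * (lam1 / lam) * phi)
        I += np.abs(np.fft.fft2(psi_k))**2
    return window * I                              # detector window 1_D(r')
```

With a single wavelength and a fully transmissive window, the model reduces to the monochromatic far-field intensity of Appendix A.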

The relationship between the photon number and the total energy
In this appendix we prove that the total energy of the measurements is constant for a fixed total photon number. In line with Section 3.1.1, we denote by N the number of wavelengths and by TPN the total photon number in the diffracted wavefield. Because every wavelength is presumed to have the same number of photons, the total energy e_N of the measurement is given by:

e_N = Σ_{k=1}^{N} (TPN/N) h ν_k,

where h is the Planck constant and ν_k is the kth frequency. Considering that all the frequencies lie in the range [ν_1, ν_N] and that adjacent frequencies are equally spaced, Eq. (36) can be rewritten by splitting off the first and last terms, where m is an auxiliary integer; the third term on the right-hand side of Eq. (37) stands for the total energy from ν_2 to ν_{N−1}. By computing the sum in Eq. (37), the total energy e_N can be expressed as:

e_N = TPN · h (ν_1 + ν_N)/2.

Hence, for a fixed TPN, ν_1 and ν_N, the total energy e_N of the polychromatic diffracted wavefield is constant.
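The key step is the arithmetic-series identity for equally spaced frequencies, which makes the result independent of N:

```latex
\sum_{k=1}^{N}\nu_k=\frac{N(\nu_1+\nu_N)}{2}
\quad\Longrightarrow\quad
e_N=\frac{\mathrm{TPN}}{N}\,h\sum_{k=1}^{N}\nu_k
   =\frac{\mathrm{TPN}\,h\,(\nu_1+\nu_N)}{2}.
```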

Ambiguities in ptychography
Compared to traditional single-measurement phase retrieval methods, ptychography alleviates a number of issues which might disrupt the reconstruction. However, there are still ambiguities that can lead to an unwanted solution in ptychography.

Global phase shift
Due to the fact that in ptychography only diffracted intensities are detected, we have:

|F{Ψ_j exp(iα)}(r′)|² = |F{Ψ_j}(r′)|²,

where α is an arbitrary constant. Hence Ψ′_j(r) = Ψ_j(r) exp(iα) is an alternative solution for the same ptychographical measurement.
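This ambiguity is easy to verify numerically; the test field and the value of α below are arbitrary assumptions:

```python
import numpy as np

# Numerical check of the global phase ambiguity: multiplying the exit wave by
# exp(i*alpha) leaves the diffraction intensity unchanged.
rng = np.random.default_rng(0)
psi = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
alpha = 0.73                                       # arbitrary global phase

I1 = np.abs(np.fft.fft2(psi))**2
I2 = np.abs(np.fft.fft2(psi * np.exp(1j * alpha)))**2
print(np.allclose(I1, I2))  # True
```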

Conjugate reconstruction
This ambiguity comes from the nature of the Fourier transform. For a given exit wave function Ψ_j(r), we have the following relationship:

F{Ψ*_j(−r)}(r′) = [F{Ψ_j}(r′)]*,

so the far-field diffraction intensity pattern is insensitive to simultaneous complex conjugation and point reflection of the exit wave. Hence Ψ*_j(−r) can be an alternative solution for the same ptychographical measurement. A well-chosen initial guess for the algorithm can avoid this issue.
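On a discrete grid the identity can be checked directly. Point reflection of the spectrum corresponds to flipping both axes followed by a one-pixel roll, so that the zero-frequency sample stays at index 0; the test field is an arbitrary assumption:

```python
import numpy as np

# Numerical check of the conjugate ambiguity: the diffraction intensity of the
# conjugated exit wave equals the point-reflected intensity of the original.
rng = np.random.default_rng(1)
psi = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))

I = np.abs(np.fft.fft2(psi))**2
I_conj = np.abs(np.fft.fft2(np.conj(psi)))**2
# Discrete point reflection k -> (-k) mod N: flip both axes, then roll by 1.
I_reflected = np.roll(I[::-1, ::-1], (1, 1), axis=(0, 1))
print(np.allclose(I_conj, I_reflected))  # True
```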

Raster grid pathology
The raster grid pathology is a periodic artifact which can occur in ptychography reconstructions if the relative positions R_j between the probe and the object lie on a regular grid. To be explicit, we start with the expression of the exit wave in Eq. (14). If we define an alternative probe P′(r) = P(r) f(r) and object O′(r) = O(r)/f(r), where f(r) is periodic with the period of the scanning grid, e.g. f(r) = exp(2πi m x/Δ) with Δ the scan step and m an arbitrary integer, then Eq. (41) is true for all positions j. Therefore P′(r) and O′(r) can be an alternative probe and object reconstruction for the same ptychographical measurement.
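A one-dimensional sketch of this ambiguity follows; the grid size, scan step Δ and integer m are illustrative assumptions:

```python
import numpy as np

# 1-D illustration of the raster grid pathology: for scan positions on a
# regular grid with step Delta, multiplying the probe by a Delta-periodic
# phase factor f and dividing the object by the same f leaves every exit
# wave on the grid unchanged.
N, Delta, m = 64, 8, 1                             # N must be a multiple of Delta
x = np.arange(N)
f = np.exp(2j * np.pi * m * x / Delta)             # Delta-periodic phase factor

rng = np.random.default_rng(2)
P = rng.standard_normal(N) + 1j * rng.standard_normal(N)
O = np.exp(1j * rng.standard_normal(N))

P2, O2 = P * f, O / f                              # alternative probe/object pair
for j in range(N // Delta):                        # every grid position j*Delta
    psi1 = np.roll(P, j * Delta) * O
    psi2 = np.roll(P2, j * Delta) * O2
    assert np.allclose(psi1, psi2)
print("all exit waves identical")
```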

Additional simulations about the effect of incomplete measurements
In Subsection 3.1.2, we demonstrated that the quality of the reconstruction can degrade in the polychromatic ptychography scheme because a detector of limited size cannot capture strongly diffracted far-field intensities at longer wavelengths. To demonstrate this, we described two simulations in Subsection 3.1.2, namely: (1) with an incomplete ptychographic data-set; (2) with a complete measured data-set, which was obtained by ignoring the wavelength-dependency in the wavefield propagation model, as illustrated in Fig. 4. The simulation results are shown in Fig. 5.
In this Appendix, an alternative simulation is provided to further support our explanation of the inaccuracies of the reconstructions. As illustrated in Fig. 9(b), instead of ignoring the wavelength-dependency in the wavefield propagation model, we expand the boundary of the detector such that the maximum spatial frequency for the largest wavelength λ_N can be collected. We emphasize that in this simulation all monochromatic diffracted wavefields have the same maximum spatial frequency, namely that of the largest wavelength λ_N. This is because the discretized object has the same pixel size and illuminated area for every wavelength. For the wavelengths shorter than λ_N, the relevant parts of the diffraction patterns fall inside the area of the detector, and the pixel size is assumed to be sufficiently small that for the smallest wavelength (and hence for all wavelengths) the intensity patterns are sufficiently well sampled. As a comparison, we again inspect the reconstruction with the incomplete measurement. The reconstruction error functions for these two numerical experiments are plotted in Fig. 10.
Because the limited size of the detector only influences the quality of the reconstruction for the propagation model shown in Fig. 9(a), we refer to the model of Fig. 9(a) as the case 'with window function' and, similarly, to the model of Fig. 9(b) as the case 'without window function', which explains the legend in Fig. 10. From Fig. 10(b) we see that all the NEFs have reached zero, which indicates that the algorithm has converged to solutions with the same NEF in all situations. However, in Fig. 10(a) the NERs of the solutions obtained with the window function are higher than those obtained without it. Therefore one can conclude that the limited size of the detector is the cause of the degradation of the results.

Evolution of error functions
In this section we give an example of the evolution of the error functions, which aims to show the convergence of the algorithm. Each curve in Figs. 11(a) and 11(b) is related to one situation in Fig. 9.

Fig. 9. Graphical description of the two different ways of computing the polychromatic diffraction intensity pattern. The red frame represents the boundary of an imaginary detector 1_D(r′). (a) illustrates the situation where the detector records incomplete data. (b) describes the situation where the detector is able to measure the maximum spatial frequency component for the largest wavelength λ_N; hence, for wavelengths shorter than λ_N, each monochromatic diffraction intensity is zero-padded in accordance with Eq. (4).

Fig. 10. A comparison between simulation results of the two sampling situations shown in Figs. 9(a) and 9(b), with noise-free measurements. The orange dots are related to the propagation model illustrated in Fig. 9(a), while the blue dots correspond to the sampling scheme shown in Fig. 9(b).

Although we have not defined a termination criterion in our numerical experiment, it is clear that running the algorithm for 1000 iterations is sufficient for reaching the minima in our simulation.

Influence of the overlap ratio
Considering that ptychography is a scanning diffraction imaging technique, it is reasonable that the overlap ratio defined in Eq. (13) plays an important role in the quality of the reconstructed image. In [32] the authors studied for the first time, through simulation and experiment, how the overlap ratio influences the performance of PIE. The results indicate that, with monochromatic and spatially fully coherent illumination, one can obtain satisfactory reconstructions by employing an overlap ratio of 60%-85%. It was also mentioned in the literature that 30% overlap can be sufficient if fast overview scans are demanded. On the other hand, a theoretical explanation of how overlapping ptychographic scans facilitate the convergence of the algorithm was proposed in [46]. However, to the best of our knowledge, no optimal overlap ratio has yet been derived from a theoretical point of view.

Fig. 11. The evolution of NEF and NER in our simulation described in Subsection 3.1.1. Each curve in (a) and (b) is related to one situation in Fig. 3.

Fig. 12. Simulation results showing how the overlap ratio affects the performance of polychromatic ptychography. In this plot the diffraction intensity measurements are noise-free. (a) shows the reconstructed object when the probe contains only 5 wavelengths and the overlap ratio equals 0%, 80% and 99%, respectively. In (b) we demonstrate the NERs for a series of wavelengths and for overlap ratios ranging from 0% to 99%.
In this appendix we conduct a simulation to demonstrate how the overlap ratio influences the performance of our proposed polychromatic ptychography method. In this numerical experiment, we presume the diffraction intensity measurements are noise-free. The rest of the parameter settings are duplicated from Subsection 3.1.1. To observe how the overlap ratio affects the quality of the reconstruction, we adjust the overlap ratio from 0% to 100% with an interval of 10%. Once again, in each simulation 1000 iterations were applied to ensure convergence. As an example of the reconstruction results, in Fig. 12(a) we show the reconstructed object when the probe contains only 5 wavelengths and the overlap ratio equals 0%, 80% and 99%, respectively. The NERs for a series of wavelengths and overlap ratios can be found in Fig. 12(b). From Fig. 12(b) it can be concluded that the overlap ratio should be selected between 60% and 90% to achieve optimal performance of the algorithm, which is in good agreement with the monochromatic situation reported in [32]. In line with this conclusion, we employ an 80% overlap ratio for the numerical studies in the main text.
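In simulations like the one above, the scan step must be derived from the chosen overlap ratio. The helper below is hypothetical: the linear definition overlap = 1 − d/(2 r_0), for circular probes of radius r_0 and scan step d, is an assumption, since Eq. (13) in the main text may define the ratio differently:

```python
# Hypothetical helper for choosing the scan step from a target overlap ratio.
# The linear definition overlap = 1 - d / (2 * r0) is an assumption; the
# paper's Eq. (13) may use a different (e.g. area-based) definition.
def step_from_overlap(r0, overlap):
    """Return the scan step d giving the requested linear overlap ratio."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap ratio must be in [0, 1)")
    return 2.0 * r0 * (1.0 - overlap)

d = step_from_overlap(20, 0.8)   # ~8 pixels for r0 = 20 and 80% overlap
```

With this convention, a 0% overlap ratio places adjacent probes edge-to-edge (d = 2 r_0), while ratios approaching 100% make the step vanish.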