Elementary signals in ptychography

Abstract: Ptychographic imaging has gained popularity for its high resolving power and sensitivity as well as for its ability to map simultaneously the sample's complex-valued refractive index and the illumination. Yet, despite significant progress that allows for reliable practical implementation, some of the technique's fundamentals remain poorly understood, and oftentimes successful data acquisition is either overly conservative or relies more on the experimenter's experience than on rational data acquisition strategies. Here, we propose a theoretical framework of ptychography, which is based on Gabor's notion of decomposition into elementary signals and the concept of frames. We demonstrate how this framework can straightforwardly be used to derive sampling requirements or to provide arguments on how to optimize the ptychographic scan. More generally, our theoretical framework can serve as a bridge between the experimental technique and the rich and well-established mathematical disciplines of wavelet decomposition and spectrogram analysis.


Introduction
Ptychography is a coherent diffractive imaging technique capable of providing highly detailed images of a sample's complex-valued transmittance. To this end, a collection of coherent diffraction patterns is generated by scanning a spatially confined illumination across the sample with sufficient overlap of adjacent illumination footprints. These overlapping illuminations introduce redundancy in the data, which is exploited to simultaneously provide information on specimen and illumination.
The technique was proposed in the late 1960s for application in electron microscopy [1] but could be demonstrated experimentally only decades later [2][3][4] using Wigner distribution function (WDF) deconvolution [5] as the reconstruction technique. Whereas this approach allows for a useful theoretical interpretation of ptychography, the required Nyquist sampling both in reciprocal and in real space rendered the experiments rather impractical.
The interest in ptychography was reinvigorated by incorporating efficient phase retrieval algorithms, which iterate between real and reciprocal space through Fourier transforms and are based on distance-minimizing projections onto two constraint sets [6][7][8][9][10]: firstly ensuring consistency of the image with measured data and secondly ensuring self-consistency of the reconstructed images in the multiply illuminated areas. Additionally, minimization of a cost function [11][12][13], such as the negative log-likelihood, can be used.
These methods allow for significantly less restrictive sampling, rendering the technique practically useful. Yet despite some efforts [14][15][16], the connection between these iterative algorithms and Wigner distribution deconvolution is not fully understood. In particular, common sampling practices [17] do not take full advantage of the redundancy in the data and are oftentimes rather heuristic. Only recently, an ad hoc approach showed that the sampling of ptychographic data is independent of the illumination spot size on the specimen plane [18,19].
Ptychography's combination of measuring in reciprocal space as a function of real-space translation, and similarly the combination with reversed roles in Fourier ptychography [20], lends itself to a description in phase space. Examples range from early representations in terms of Wigner distribution functions [3][4][5] over more recent investigations into sampling requirements [18,19] or illumination design [21] to full statistical descriptions [16,22], among others. Here we introduce to ptychographic imaging Gabor's notion that any digital signal can be expanded into a discrete set of elementary signals in phase space [23]. This interpretation is conveniently expressed in the context of frames [24][25][26]. We show that the inversion formulas, which yield both sample and illumination from the set of diffraction patterns, are consistent with those used in the difference map algorithm [10,27], and we use this theoretical framework for a better understanding of sampling requirements [18,19] and of the influence of the illumination on the optimal scanning pattern. Finally, we indicate how this description can be used to investigate the role of the spatial-frequency spectrum of the illumination [18,19,21] or partial-coherence effects [22].

Ptychography
A central assumption in ptychography is that the interaction of the sample with the illuminating wave is multiplicative:

$$\psi(\mathbf{r};\mathbf{R}) = O(\mathbf{r})\,P(\mathbf{r}-\mathbf{R}), \tag{1}$$

where ψ(r; R) represents the exit wave past the sample or past a "slice" of the sample [28], O(r) describes the sample, and P(r) is the illumination function, also called probe. The coordinate R is the translation coordinate of the illumination scanned across the sample. Further, it is assumed that, for each illumination spot, the measured intensities in the diffraction patterns equal the square magnitude of the exit wave propagated to the detector plane. While the results presented here can easily be applied also to ptychography in the Fresnel regime [29], for simplicity we assume that the Fraunhofer far field is detected, in which case the propagation between sample and detector corresponds to a simple Fourier transform, whose scaling we will omit:

$$I(\mathbf{q};\mathbf{R}) = |\Psi(\mathbf{q};\mathbf{R})|^2 = \left|\int O(\mathbf{r})\,P(\mathbf{r}-\mathbf{R})\,e^{-i\mathbf{q}\cdot\mathbf{r}}\,\mathrm{d}\mathbf{r}\right|^2. \tag{2}$$

This four-dimensional function of the reciprocal-space coordinate q and the illumination shift R preserves both spatial and spatial-frequency information of the specimen. The task of ptychography is to retrieve from this measurable quantity the object function O(r) and the illumination function P(r).
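The forward model described above can be illustrated numerically. The following is only a minimal sketch, assuming a discrete, periodic grid, integer-pixel shifts, and NumPy's unnormalized FFT as the far-field propagator; the function name `diffraction_patterns`, the random phase object, and the Gaussian probe are hypothetical choices made for illustration and are not taken from the original work.

```python
import numpy as np

def diffraction_patterns(obj, probe, shifts):
    """Return I(q; R) = |FT{O(r) P(r - R)}|^2 for each integer-pixel shift R."""
    patterns = []
    for dy, dx in shifts:
        # P(r - R): shift the probe (periodic boundaries are an assumption)
        shifted_probe = np.roll(probe, shift=(dy, dx), axis=(0, 1))
        exit_wave = obj * shifted_probe            # multiplicative interaction
        far_field = np.fft.fft2(exit_wave)         # Fraunhofer propagation
        patterns.append(np.abs(far_field) ** 2)    # detector records intensity
    return np.array(patterns)

rng = np.random.default_rng(0)
n = 64
obj = np.exp(1j * rng.uniform(-0.5, 0.5, (n, n)))  # hypothetical phase object
yy, xx = np.mgrid[:n, :n] - n // 2
probe = np.exp(-(xx**2 + yy**2) / (2 * 6.0**2)).astype(complex)  # Gaussian probe
shifts = [(dy, dx) for dy in range(-16, 17, 8) for dx in range(-16, 17, 8)]
intensities = diffraction_patterns(obj, probe, shifts)
print(intensities.shape)  # (number of scan points, n, n)
```

The resulting array is the sampled four-dimensional data set: two scan indices (flattened here into one) and two detector indices.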
In order to derive an analytical theory for ptychography, we note that P(r−R) plays in Eq. (2) the role of a window function by limiting the portion of the object function which is expanded onto the Fourier basis. This allows local fluctuations of frequency across the sample to be explored rather than infinite wave trains emanating from the entire sample at once. Accordingly, we consider the object function in Eq. (2) as expanded onto a windowed-Fourier basis (Fig. 1), akin to the decomposition into confined elementary signals proposed by Gabor [23] and later worked out in the context of frames [24,25] and wavelets [26,30].
While its linearly dependent family of functions and the resulting redundancy render the windowed-Fourier transform more complex to analyze than the "common" sequence of expanding onto the Fourier basis only after having windowed the signal, the relevant theory is well established [26,30] and offers new insights.

Fig. 1. A Gaussian window function with σ equal to 0.25 × image size was used to "window" the Fourier basis.

Windowed Fourier transform and Heisenberg boxes
Let us now define for P a window function, which is shifted by R and modulated by a frequency q:

$$P_{\mathbf{R},\mathbf{q}}(\mathbf{r}) = P^{*}(\mathbf{r}-\mathbf{R})\,e^{i\mathbf{q}\cdot\mathbf{r}}, \tag{3}$$

where the symbol * indicates the complex conjugate. Then, Eq. (2) can be expressed as

$$|\Psi(\mathbf{q};\mathbf{R})|^2 = \left|\int O(\mathbf{r})\,P^{*}_{\mathbf{R},\mathbf{q}}(\mathbf{r})\,\mathrm{d}\mathbf{r}\right|^2, \tag{4}$$

which can be interpreted as the squared magnitude of a 2D windowed-Fourier transform [31,32] of O(r) with a complex window function $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$. The measurable quantity $|\Psi(\mathbf{q};\mathbf{R})|^2$ is referred to as the spectrogram [30]. It represents the energy density of O(r) in a joint real-reciprocal space neighborhood of the position (R, q). This neighborhood is specified by boxes that one can think of as Heisenberg boxes. The size of such a box is set by the spread of $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$ in real and reciprocal space, $\sigma_r$ and $\sigma_q$, respectively, and is independent of its position in the four-dimensional phase space. This allows phase space to be conveniently discretized [30]. Figure 2(a) illustrates a 2D sub-space of the spectrogram phase space. For ptychography, $\sigma_r$ and $\sigma_q$ measure the spread of $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$ over the size D of the confined illumination at the sample position and over its spread 2π/D in the diffracted waves, respectively.
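The equivalence between the far-field intensity and the squared inner product with a shifted, modulated window can be checked in one dimension. This is a sketch under stated assumptions: the window is taken as conj(P(x − R))·exp(iqx), the grid is periodic, and q is restricted to the discrete FFT frequencies 2πk/n. The helper names `window` and `spectrogram_fft` and the test signals are hypothetical.

```python
import numpy as np

n = 128
x = np.arange(n)
rng = np.random.default_rng(1)
obj = rng.normal(size=n) + 1j * rng.normal(size=n)   # hypothetical complex object
# Hypothetical complex probe: Gaussian envelope with a linear phase ramp
probe = np.exp(-((x - n // 2) ** 2) / (2 * 5.0**2)) * np.exp(1j * 0.1 * x)

def window(shift, k):
    """Assumed window P_{R,q}(x) = conj(P(x - R)) * exp(i q x), q = 2*pi*k/n."""
    q = 2 * np.pi * k / n
    return np.conj(np.roll(probe, shift)) * np.exp(1j * q * x)

def spectrogram_fft(shift):
    """|Psi(q; R)|^2 at all FFT frequencies for one probe position R."""
    return np.abs(np.fft.fft(obj * np.roll(probe, shift))) ** 2

shift, k = 7, 5
inner = np.sum(obj * np.conj(window(shift, k)))       # inner product <O, P_{R,q}>
print(np.isclose(abs(inner) ** 2, spectrogram_fft(shift)[k]))
```

Each pair (shift, k) addresses one Heisenberg box; the FFT simply evaluates all boxes that share a probe position at once.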
We further note that, after some algebra, the spectrogram can be written, up to a constant factor, as [5,16,33]:

$$|\Psi(\mathbf{q};\mathbf{R})|^2 \propto \iint W_O(\mathbf{r}',\mathbf{q}')\,W_P(\mathbf{r}'-\mathbf{R},\,\mathbf{q}-\mathbf{q}')\,\mathrm{d}\mathbf{r}'\,\mathrm{d}\mathbf{q}', \tag{5}$$

where $W_O(\mathbf{r},\mathbf{q})$ and $W_P(\mathbf{r},\mathbf{q})$ are the WDFs of the object and probe, respectively. Since the spectrogram represents the far-field intensity distribution I(q; R), Eq. (5) shows that the intensities can be interpreted as a weighted average in space and frequency of the Wigner distribution of the object with the Wigner distribution of the illumination as weighting kernel.

Fig. 2. Representation of ptychography phase space. Two Heisenberg boxes and their spread in real and reciprocal space are located at $(\mathbf{q}_n, \mathbf{R}_n)$ and $(\mathbf{q}_m, \mathbf{R}_m)$. The projection of these boxes onto real space represents $|P_{\mathbf{R},\mathbf{q}}(\mathbf{r})|$ (solid and dashed blue lines), here illustrated by Gaussian functions, and the projection onto reciprocal space represents $|\mathcal{F}\{P_{\mathbf{R},\mathbf{q}}(\mathbf{r})\}|$ (solid and dashed green lines).

Reconstruction formulas
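In order to obtain the object function O(r) and the illumination function P(r), we identify the diffracted wavefield, Eqs. (2) and (4), as an expansion of the object signal onto a set of "elementary" 2D signals [23,31,32,34]. Therefore, from windowed-Fourier transform theory, we can directly write the inversion formula for the object as [23,30-32,35-37]:

$$O(\mathbf{r}) = \frac{\iint \Psi(\mathbf{q};\mathbf{R})\,P_{\mathbf{R},\mathbf{q}}(\mathbf{r})\,\mathrm{d}\mathbf{q}\,\mathrm{d}\mathbf{R}}{\int |P(\mathbf{r}-\mathbf{R})|^2\,\mathrm{d}\mathbf{R}}, \tag{6a}$$

which can be interpreted as the expansion of the object O(r) in terms of a continuum of shifted and modulated window functions $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$, Eq. (3). As before, the scaling of the Fourier transform is omitted. Along similar lines, we can derive the reconstruction formula for the probe (cf. Appendix):

$$P(\mathbf{r}) = \frac{\iint \Psi(\mathbf{q};\mathbf{R})\,O^{*}(\mathbf{r}+\mathbf{R})\,e^{i\mathbf{q}\cdot(\mathbf{r}+\mathbf{R})}\,\mathrm{d}\mathbf{q}\,\mathrm{d}\mathbf{R}}{\int |O(\mathbf{r}+\mathbf{R})|^2\,\mathrm{d}\mathbf{R}}. \tag{6b}$$

Changing the order of the integrals in Eqs. (6) and using the inverse Fourier transform to recover the exit wave of Eq. (1), the object and probe formulas can be rewritten:

$$O(\mathbf{r}) = \frac{\int P^{*}(\mathbf{r}-\mathbf{R})\,\psi(\mathbf{r};\mathbf{R})\,\mathrm{d}\mathbf{R}}{\int |P(\mathbf{r}-\mathbf{R})|^2\,\mathrm{d}\mathbf{R}}, \tag{7a}$$

$$P(\mathbf{r}) = \frac{\int O^{*}(\mathbf{r}+\mathbf{R})\,\psi(\mathbf{r}+\mathbf{R};\mathbf{R})\,\mathrm{d}\mathbf{R}}{\int |O(\mathbf{r}+\mathbf{R})|^2\,\mathrm{d}\mathbf{R}}, \tag{7b}$$

which are the continuous analogues of the discrete update formulas of the DM algorithm [27], where they were derived by minimizing a distance between the experimental data and Eq. (1).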
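The object formula can be tried out on synthetic data. The sketch below assumes the discrete form O(r) = Σ_R P*(r − R) ψ(r; R) / Σ_R |P(r − R)|², a 1D periodic grid, and, importantly, exit waves whose phases are known. In an experiment only the magnitudes are measured, so this illustrates a single update step rather than a full reconstruction. The function name and test signals are hypothetical.

```python
import numpy as np

def object_from_exit_waves(exit_waves, probe, shifts, eps=1e-12):
    """Discrete analogue of O(r) = sum_R P*(r-R) psi(r;R) / sum_R |P(r-R)|^2."""
    numerator = np.zeros(probe.shape, dtype=complex)
    denominator = np.zeros(probe.shape, dtype=float)
    for psi, shift in zip(exit_waves, shifts):
        shifted_probe = np.roll(probe, shift)           # P(r - R), periodic
        numerator += np.conj(shifted_probe) * psi
        denominator += np.abs(shifted_probe) ** 2
    return numerator / (denominator + eps)              # eps guards unlit pixels

n = 128
x = np.arange(n)
rng = np.random.default_rng(2)
obj = rng.normal(size=n) + 1j * rng.normal(size=n)      # hypothetical 1D object
probe = np.exp(-((x - n // 2) ** 2) / (2 * 5.0**2)).astype(complex)
shifts = list(range(0, n, 8))                           # overlapping scan positions
exit_waves = [obj * np.roll(probe, s) for s in shifts]  # exit waves with known phase
recovered = object_from_exit_waves(exit_waves, probe, shifts)
print(np.allclose(recovered, obj))
```

The denominator is the accumulated illumination energy at each pixel, which is why pixels that are never illuminated cannot be recovered.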

Elementary functions
To use the inversion formulas of Eqs. (6), it is sufficient to determine the spectrum of Ψ(q, R) at points of a 4D periodic lattice in phase space [23-26, 30, 35-37]. We define the real-space spacing in the direction i ∈ {x, y} as $R_i = \alpha_i D_i$ and, equivalently, the reciprocal-space spacing as $Q_i = \beta_i q_i$, where $D_i q_i = 2\pi$ and $\alpha_i \beta_i \le 1$ [26,35-37], as illustrated in Fig. 3. We note that $1/(\alpha_i \beta_i) = 2\pi/(Q_i R_i)$ has been defined as the "one-dimensional sampling ratio" in the recent ad hoc approach estimating ptychographic sampling requirements [18]. With the sampling

$$a_{k_x,k_y,n_x,n_y} = \Psi(k_x Q_x, k_y Q_y, n_x R_x, n_y R_y), \tag{8}$$

where $k_x$, $k_y$, $n_x$, and $n_y$ all take integer values, the object function O(r) can be reconstructed by considering the sampling values as the coefficients of Gabor's signal expansion with the synthesis window being the same, or approximately the same, as the illumination P(r) [35,36]. Thus, Eq. (6a) can be discretized [31,32]:

$$O(\mathbf{r}) = \sum_{k_x,k_y,n_x,n_y} a_{k_x,k_y,n_x,n_y}\,P_{\mathbf{R}_{\mathbf{n}},\mathbf{q}_{\mathbf{k}}}(\mathbf{r}), \qquad \mathbf{R}_{\mathbf{n}} = (n_x R_x, n_y R_y),\quad \mathbf{q}_{\mathbf{k}} = (k_x Q_x, k_y Q_y), \tag{9}$$

where each index runs over the $N_i$ samples in the corresponding direction i and where we omitted the denominator for clarity. Since the Heisenberg boxes of the elementary functions $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$, Eq. (3), cover the entire 4D phase space, the windowed Fourier transform is complete. In fact, a discrete windowed Fourier transform representation given by the set of functions $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$ is complete and stable if this set is a frame of $L^2(\mathbb{R})$, which is true only if [24][25][26]30,38]

$$Q_i R_i \le 2\pi, \quad \text{i.e.,} \quad \alpha_i \beta_i \le 1, \tag{10}$$

which supports the finding that dense sampling in real space can compensate for poor sampling in reciprocal space and vice versa [19]. Analogous to previously reported results [18,19], at critical sampling, i.e., $\alpha_i \beta_i = 1$, the localization properties of each window in the 4D lattice are poor, and one can expect poor numerical stability of the signal expansion [17,39].
This condition can be exemplified by three situations: (i) the lateral shifts are of the same size as the illumination size D without any overlap of illumination spots and, consequently, no redundancy in the data; (ii) the lateral shifts of the illumination keep the overlap of the spots in the real space, but the reciprocal space is highly undersampled; (iii) the opposite of situation (ii) in which reciprocal space is oversampled, but without overlap of the illumination spots in the real space.
When $\alpha_i \beta_i < 1$, the shifts of the illumination spots are reduced by the factor $\alpha_i$, giving rise to overlap, and $\beta_i < 1/\alpha_i$ defines the "oversampling" in reciprocal space. Thus, the localization of each window in the 4D lattice is good, and better numerical stability is obtained at the cost of redundancy and non-unique Gabor coefficients in Eq. (8) [39]. However, the value of such coefficients can be determined from ptychographic phase retrieval [6-9, 12, 27]. Formally, the functions $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$ form a frame and are not linearly independent.
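The lattice parameters discussed above translate into a short calculation. The following sketch assumes the definitions R_i = α_i D_i, Q_i = β_i q_i, and D_i q_i = 2π from the text; the function name `sampling_parameters` and all numerical values are hypothetical and serve only as an example.

```python
import math

def sampling_parameters(scan_step, probe_size, q_step):
    """Return (alpha, beta, sampling ratio) for one direction i."""
    alpha = scan_step / probe_size          # from R_i = alpha_i * D_i
    q_unit = 2.0 * math.pi / probe_size     # q_i from D_i * q_i = 2*pi
    beta = q_step / q_unit                  # from Q_i = beta_i * q_i
    ratio = 1.0 / (alpha * beta)            # equals 2*pi / (Q_i * R_i)
    return alpha, beta, ratio

# Hypothetical numbers: 1 um probe, 0.3 um scan step, and a reciprocal-space
# step that corresponds to a 4 um real-space field of view.
probe_size = 1.0e-6
scan_step = 0.3e-6
q_step = 2.0 * math.pi / 4.0e-6
alpha, beta, ratio = sampling_parameters(scan_step, probe_size, q_step)
is_frame_possible = alpha * beta <= 1.0     # necessary condition alpha*beta <= 1
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, ratio = {ratio:.2f}")
```

A ratio well above one indicates redundancy; a ratio of exactly one corresponds to the critical sampling discussed above.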

Role of the spatial-frequency spectrum of the illumination
According to the theory discussed thus far, $\alpha_i \beta_i > 1$ would preclude the existence of the signal expansion because the functions $P_{\mathbf{R},\mathbf{q}}(\mathbf{r})$ would not completely fill the 4D phase space and would not form a frame. However, Edo et al. have discussed just such a case, in which the reconstruction of ptychographic data did not break down even though Eq. (10) did not hold [18]. In this case the illumination was highly structured, which changes the bandwidth of the signal and thus affects the bandwidth of the diffracted wavefield, Eq. (2). In fact, the reconstruction of a ptychographic dataset can be affected by the spatial-frequency content of the illumination [21].
According to Carson's rule, the bandwidth of a frequency-modulated signal is given by $2(\Delta f + f_m)$, where $\Delta f$ is the maximum deviation from the carrier frequency and $f_m$ is the highest frequency component in the modulating signal [40]. Consequently, the sampling requirements for ptychographic data acquired with highly structured illumination may change, and the modulation caused by the illumination needs to be taken into account. However, an exhaustive discussion of modulation by structured illumination in ptychography [21] is beyond the scope of the present work.

Role of the overlap
Finally, we demonstrate the role of the overlap in ptychography by means of a simple 1D example. Let us define a signal φ(x) = exp(−ixq) and the window function as p(x) = cos(x) if |x| ≤ π/2 and 0 elsewhere. Its square magnitude, $p^2(x)$, is shown in Fig. 4(a). We then scan the signal by translating the window function by $x_n = n\pi/2$ with n ∈ {−1, 0, 1} (Fig. 4(b)) and sum all square-magnitude contributions of the window, $p^2(x) + p^2(x + \pi/2) + p^2(x - \pi/2)$, as well as the windowed real and imaginary parts of the signal (Fig. 4(c)). We observe that within the interval [−π/2, π/2] the summed window energy is constant and we recover the original signal. This is an example of tight frames [25,30]. Tight frames have the advantage that the synthesis windows are the same as the analysis window in the windowed-Fourier transform context. Applied to ptychography, the consequence is that the overlap would be ideal when the square magnitude (i.e., energy) of the illumination functions is such that:

$$\sum_{n} |P(\mathbf{r}-\mathbf{R}_n)|^2 = \mathrm{const.} \tag{11}$$

We note that the algorithms can still work well if Eq. (11) holds only approximately, which is the case for the so-called snug frames [26]. Consequently, the energy of the illumination functions, Eq. (3), should be distributed as uniformly as possible over the scanned area in ptychography while ensuring sufficient sampling.
The overlap, i.e., real-space sampling, has long been recognized as providing redundancy in the data. Here we have shown how it assists reconstruction algorithms in recovering the original signal of the object function. In addition to the advantages of a uniform distribution of the overlap [41], Eq. (11) can provide arguments on how the size and shape of the illumination at the sample position can help optimize the scan pattern and density.
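The 1D overlap example with the cosine window can be reproduced numerically. This is a minimal sketch, assuming the window p(x) = cos(x) on |x| ≤ π/2, the three shifts nπ/2 with n ∈ {−1, 0, 1}, and an arbitrarily chosen frequency q = 3 (a hypothetical value). It checks that the summed window energy equals one on [−π/2, π/2] and that summing the windowed signal, weighted once more by the same window, returns the original signal.

```python
import numpy as np

def p(x):
    """Window p(x) = cos(x) for |x| <= pi/2 and 0 elsewhere."""
    return np.where(np.abs(x) <= np.pi / 2, np.cos(x), 0.0)

q = 3.0                                            # hypothetical signal frequency
x = np.linspace(-np.pi / 2, np.pi / 2, 1001)       # interval covered by the scan
phi = np.exp(-1j * q * x)                          # signal phi(x) = exp(-i x q)
shifts = [n * np.pi / 2 for n in (-1, 0, 1)]       # window positions x_n

# Summed square magnitude of the shifted windows (tight-frame condition)
energy = sum(p(x - s) ** 2 for s in shifts)

# Analysis (window the signal) followed by synthesis with the same window
recovered = sum(p(x - s) * (p(x - s) * phi) for s in shifts)

print(np.allclose(energy, 1.0), np.allclose(recovered, phi))
```

Because the analysis and synthesis windows coincide and their summed energy is constant, no division by the accumulated illumination is needed in this special case.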

Conclusion
We have presented an interpretation of ptychography based on the concept of windowed Fourier transform frames, where the illumination function has the role of a window function. We described a 4D phase space on the premise of Gabor's lattice and showed that a ptychographic scan ought to fully cover this phase space, with some oversampling ensured. From this, we derived sampling requirements, which are in agreement with previous works based on ad hoc approaches and artificial undersampling of ptychography data. The sampling requirements we derived here are fully compatible with iterative reconstruction approaches, such as the extended ptychographic iterative engine (ePIE) and the difference map (DM). At the same time, our theoretical approach successfully links to the Wigner distribution function, which could previously be exploited only in a rather restrictive theoretical framework.
We propose a criterion for optimizing the overlap and scan geometry. More generally, our theoretical framework can provide guidelines for optimizing experimental parameters, including the illumination size, overlap, scan pattern, sample-to-detector distance, and the detector pixel size, while keeping the dose imparted to the sample acceptable. For instance, the empirical experience that many short low signal-to-noise acquisitions tend to be more advantageous than distributing the same dose on fewer low-noise acquisitions can now be easily rationalized.
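Phase space descriptions of ptychography have been tremendously useful in the past. Here, we have introduced a "natural" discretization in terms of elemental information. The formal description in terms of frames is sufficiently general that intricacies, such as non-trivial illumination structures or partial coherence, can be taken into account in future studies. We expect that this framework will facilitate the positioning of ptychography among other spectrogram characterization techniques and that it will allow ptychographic sampling and measurement strategies to benefit from well-established theories [21,25].

Appendix
Shifting the coordinate r by R in Eq. (1) gives

$$\psi(\mathbf{r}+\mathbf{R};\mathbf{R}) = O(\mathbf{r}+\mathbf{R})\,P(\mathbf{r}). \tag{12}$$

Applying the Fourier shift theorem and using Eq. (2), this yields:

$$\mathcal{F}\{O(\mathbf{r}+\mathbf{R})\,P(\mathbf{r})\} = \Psi(\mathbf{q};\mathbf{R})\,e^{i\mathbf{q}\cdot\mathbf{R}}. \tag{13}$$

Now, we use the inverse Fourier transform definition, multiply both sides of Eq. (13) by $O^{*}(\mathbf{r}+\mathbf{R})$ and integrate over R in order to obtain (scaling omitted):

$$P(\mathbf{r})\int |O(\mathbf{r}+\mathbf{R})|^2\,\mathrm{d}\mathbf{R} = \iint \Psi(\mathbf{q};\mathbf{R})\,O^{*}(\mathbf{r}+\mathbf{R})\,e^{i\mathbf{q}\cdot(\mathbf{r}+\mathbf{R})}\,\mathrm{d}\mathbf{q}\,\mathrm{d}\mathbf{R}, \tag{14}$$

from which we can isolate the probe function after rearranging the terms, similar to Eq. (6a).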

Derivation of Eqs. 7a and 7b
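Starting from Eq. (6a), inserting Eq. (3), and carrying out the integral over q first:

$$O(\mathbf{r})\int |P(\mathbf{r}-\mathbf{R})|^2\,\mathrm{d}\mathbf{R} = \int P^{*}(\mathbf{r}-\mathbf{R})\left[\int \Psi(\mathbf{q};\mathbf{R})\,e^{i\mathbf{q}\cdot\mathbf{r}}\,\mathrm{d}\mathbf{q}\right]\mathrm{d}\mathbf{R} = \int P^{*}(\mathbf{r}-\mathbf{R})\,\psi(\mathbf{r};\mathbf{R})\,\mathrm{d}\mathbf{R},$$

from which we can write Eq. (7a). And starting from Eq. (6b):

$$P(\mathbf{r})\int |O(\mathbf{r}+\mathbf{R})|^2\,\mathrm{d}\mathbf{R} = \int O^{*}(\mathbf{r}+\mathbf{R})\left[\int \Psi(\mathbf{q};\mathbf{R})\,e^{i\mathbf{q}\cdot(\mathbf{r}+\mathbf{R})}\,\mathrm{d}\mathbf{q}\right]\mathrm{d}\mathbf{R} = \int O^{*}(\mathbf{r}+\mathbf{R})\,\psi(\mathbf{r}+\mathbf{R};\mathbf{R})\,\mathrm{d}\mathbf{R},$$

from which we can write Eq. (7b). In both cases the bracketed integral is the inverse Fourier transform of Ψ(q; R), i.e., the exit wave, with the scaling omitted.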
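The probe formula can be checked in the same spirit as the object formula. The sketch below assumes the discrete form P(r) = Σ_R O*(r + R) ψ(r + R; R) / Σ_R |O(r + R)|², a 1D periodic grid, and exit waves with known phases, so it illustrates the algebra of the derivation rather than a phase-retrieval algorithm. The function name and test signals are hypothetical.

```python
import numpy as np

def probe_from_exit_waves(exit_waves, obj, shifts, eps=1e-12):
    """Discrete analogue of P(r) = sum_R O*(r+R) psi(r+R;R) / sum_R |O(r+R)|^2."""
    numerator = np.zeros(obj.shape, dtype=complex)
    denominator = np.zeros(obj.shape, dtype=float)
    for psi, shift in zip(exit_waves, shifts):
        shifted_obj = np.roll(obj, -shift)          # O(r + R), periodic
        shifted_psi = np.roll(psi, -shift)          # psi(r + R; R)
        numerator += np.conj(shifted_obj) * shifted_psi
        denominator += np.abs(shifted_obj) ** 2
    return numerator / (denominator + eps)          # eps guards empty regions

n = 128
x = np.arange(n)
rng = np.random.default_rng(3)
obj = rng.normal(size=n) + 1j * rng.normal(size=n)  # hypothetical 1D object
probe = np.exp(-((x - n // 2) ** 2) / (2 * 5.0**2)).astype(complex)
shifts = list(range(0, n, 8))
exit_waves = [obj * np.roll(probe, s) for s in shifts]  # psi(r;R) = O(r) P(r-R)
recovered_probe = probe_from_exit_waves(exit_waves, obj, shifts)
print(np.allclose(recovered_probe, probe))
```

Shifting each exit wave back by its scan position places every view in the probe's own coordinate frame, which mirrors the coordinate shift of Eq. (12).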