High-resolution (diffraction limited) single-shot multiplexed coded-aperture ptychography

We propose and numerically explore a method that upgrades a conventional single-shot microscope into a single-shot ptychographic microscope, without impairing its optical performance. It is based on splitting the microscope’s intermediate image plane into multiple replicas, and detecting a set of their coded Fourier transform magnitudes, using a different sensor for each replica. To code each beam, it is transmitted through a distinct coding mask. A ptychographic phase-retrieval algorithm is used to reconstruct the imaged object’s complex-valued transmission function. The proposed method also enables ultrahigh-speed imaging by using a pulse-burst as illumination and applying a multi-state reconstruction algorithm. This approach paves the way to both single-frame and ultrahigh-speed, diffraction-limited microscopy of complex-valued objects.


Introduction
Ptychography [1][2][3][4] and diffraction-based coded aperture [5][6][7][8][9] are two powerful modalities of coherent diffraction imaging (CDI) [10,11]. Both methods enable us to determine an object's complex-valued (i.e. amplitude and phase) optical transmission function from a set of diffraction measurements. In conventional ptychography, the measurements are obtained by scanning the object with a localized probe over different overlapping regions, and recording the far-field diffraction patterns, which correspond to the squared magnitude of the spatial Fourier transform of the probed regions. In diffraction-based coded-aperture imaging, the optical field is modulated either before [7] or after [5,8] interacting with the object using several optical masks. As in ptychography, a set of diffraction patterns of the modulated fields is then measured. In both techniques, complex-valued images of the object are computationally obtained from the set of recorded intensity patterns using iterative phase retrieval algorithms [7,12].
An obvious limitation of such multiple-exposure techniques is their long measurement duration, typically more than a second. This limitation has motivated researchers to devise schemes for single-shot ptychography (SSP), i.e. schemes in which a ptychographic data set is recorded in a single exposure [13][14][15][16][17]. In essence, such SSP schemes are based on splitting the beam either before illuminating the object, by using a grating [13] or a mask in the input of a 4f system [14,15], or splitting the outgoing beam from the object, by a beam-splitter [17] or a grating [16,17], and recording the diffraction patterns on a single sensor within a single exposure. The detection plane is divided into zones, such that the recorded diffraction pattern in each zone is associated with a localized region in the object. An exciting application of SSP is ultrahigh-speed imaging [18]. In time-resolved imaging by multiplexed ptychography (TIMP) [15], an SSP setup is illuminated with a burst of pulses that probe the object at different times, and multiple frames (a movie) are reconstructed from the multiplexed data recorded in a single exposure using a mixed-state reconstruction algorithm [19]. Recently, multiplexed SSP and TIMP were demonstrated experimentally [20,21].
To date, all SSP setups sacrifice resolution for data redundancy. Partitioning the detection plane, and with it the diffraction angles, into cones restricts the angular range collected in each zone and therefore degrades the resolution. For instance, an N×N division limits the spatial resolution to Nλ/2. Moreover, measuring an object with Fourier components beyond this limit gives rise to cross-talk between different zones, which degrades the reconstruction.
Here, we propose a single-shot ptychographic coded-diffraction scheme that does not sacrifice resolution. It is based on a method that upgrades a conventional single-shot microscope into a phase and amplitude microscope, while maintaining its spatial resolution and field of view (FOV). Furthermore, we show that this system can also be used for TIMP. In the proposed technique, which we term single-shot coded-aperture ptychography (SSCAP), the magnified image at the output of the single-shot microscope is split into several replicas in several arms by using a set of beam splitters. In each arm, the replica is projected through a distinct transmissive coding mask and finally its Fourier transform intensity is recorded. We explore three different types of masks: shifted identical localized aperture (ptychography-like) masks, extended phase-only masks and extended complex-valued masks. The recorded patterns are used to reconstruct the complex-valued field of the magnified image at the output of the single-shot microscope using a ptychographic reconstruction algorithm. We numerically explored the performance of SSCAP for both the single-frame and multiple-frame cases. We found that when using phase-only masks, a single, high-resolution complex-valued frame can be reconstructed from as few as two coded-diffraction patterns, and that ten such frames can be reconstructed from only nine multiplexed coded-diffraction measurements.

Single-shot coded-aperture ptychography
Figure 1 presents the proposed setup schematically. Starting from the left side, an object is imaged using a spatially coherent light source and a single-shot high-resolution microscope. The beam propagating out from the microscope is split prior to the imaging plane into K arms using beam-splitters, hence a magnified object optical field, $O(\mathbf{r})$, is formed in each arm (where $\mathbf{r}$ is the spatial coordinates vector in the imaging plane). Each such replica image is then projected through a different coding mask. Each arm ends with a Fourier transforming lens and a sensor that records the coded diffraction pattern. The recorded set of intensity patterns is thus given by:
$$I_k(\boldsymbol{\nu}) = \left| \mathcal{F}\left[ O(\mathbf{r})\, M_k(\mathbf{r}) \right](\boldsymbol{\nu}) \right|^2, \quad k = 1, \ldots, K, \tag{1}$$
where $M_k(\mathbf{r})$ is the kth arm coding mask's complex transfer function, $\mathcal{F}$ is the two-dimensional spatial Fourier operator, and $\boldsymbol{\nu}$ is the spatial coordinates vector in the sensor plane, scaled according to wavelength λ and focal length f. Importantly, we reconstruct a magnified image of the object, $O(\mathbf{r})$, which is paraxial, that is, its spatial bandwidth is much smaller than $\lambda^{-1}$. Thus, the optics downstream of the conventional single-shot microscope can be paraxial (the same principle was applied in upgrading a single-shot conventional microscope to an ultrafast phase and amplitude microscope using interferometry [22]).
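As an illustration of equation (1), the following is a minimal NumPy sketch of the forward model, not the authors' simulation code. The function name `coded_diffraction_patterns`, the toy random object, the uniformly random second mask, and the use of a centered, orthonormal FFT as a stand-in for the Fourier-transforming lens are all assumptions made for this example; only the 72×72 support and 160×160 padded grid follow the text.

```python
import numpy as np

def coded_diffraction_patterns(obj, masks):
    """Return |F[O * M_k]|^2 for every mask, stacked as a (K, H, W) array."""
    obj = np.asarray(obj, dtype=complex)
    patterns = []
    for mask in masks:
        field = obj * mask  # replica of the image coded by the k-th mask
        # Centered, orthonormal FFT as a stand-in for the Fourier lens (assumption).
        spectrum = np.fft.fftshift(
            np.fft.fft2(np.fft.ifftshift(field), norm="ortho")
        )
        patterns.append(np.abs(spectrum) ** 2)  # sensors record intensity only
    return np.stack(patterns)

# Toy example (hypothetical data): a random complex object on a 72x72 support,
# zero padded to 160x160, coded by a trivial mask and a random phase-only mask.
rng = np.random.default_rng(0)
support, padded = 72, 160
start = (padded - support) // 2
window = slice(start, start + support)

obj = np.zeros((padded, padded), dtype=complex)
amplitude = rng.uniform(0.2, 1.0, (support, support))
phase = rng.uniform(-np.pi, np.pi, (support, support))
obj[window, window] = amplitude * np.exp(1j * phase)

masks = [
    np.ones((padded, padded), dtype=complex),                      # CDI-like arm
    np.exp(1j * rng.uniform(-np.pi, np.pi, (padded, padded))),     # coded arm
]
patterns = coded_diffraction_patterns(obj, masks)
```

Because the masks in this toy example are phase-only and the FFT is orthonormal, each pattern carries the same total energy as the object, which is a convenient sanity check on the forward model.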
Figures 1(b)-(d) show three different types of coding masks that we explored in this work. Figure 1(b) shows a localized-aperture complex-valued mask. Using a set of such identical masks with shifted centers, i.e. $M_k(\mathbf{r}) = M(\mathbf{r} - \mathbf{r}_k)$, equation (1) takes the form of conventional scanning ptychography. Notably, in contrast to all previous SSP schemes [13][14][15][16][17], equation (1) describes the measured data in our setup without the need to impose limitations on the spatial power spectrum of the object [13][14][15] or on its field of view (FOV) [16,17].
We first explored the performance of the system with K=1-4 modulating phase-only masks and measurements. Each mask set comprised one 'trivial' mask (i.e. $M_1(\mathbf{r}) = 1$ inside the support, which is equivalent to CDI) while the rest (k=2,...,K) were randomly generated using low-pass filtered white Gaussian noise (WGN) as the phase function. The complex-valued objects were generated using images from ImageNet [23]; different images were used for the amplitude and phase values, normalized to the unit disc. The objects' support size was 72×72 pixels and they were zero padded to 160×160 before Fourier transforming to attain an oversampling ratio of more than 2 [24]. We added WGN to the amplitudes in the sensor plane at several signal-to-noise ratio (SNR) levels and then quantized each calculated intensity pattern with a dynamic range (DR) of 12 bits (4096 levels), relative to its maximum (i.e. the exposure level is controlled independently in each arm, for each sensor). For the K=1 case, which is essentially CDI, we also quantized with a 20-bit dynamic range as another comparison point.
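The simulation ingredients described above (a phase-only mask built from low-pass filtered WGN, amplitude-domain noise, and 12-bit quantization relative to the pattern maximum) can be sketched as follows. This is only an illustrative sketch: the ideal circular low-pass filter, the cutoff value, the phase span, the SNR convention (mean amplitude-squared over noise variance), and the clipping of negative amplitudes are assumptions, since the text does not specify them, and all function names are hypothetical.

```python
import numpy as np

def random_phase_mask(shape, cutoff, rng, phase_span=2 * np.pi):
    """Phase-only mask whose phase is low-pass filtered white Gaussian noise."""
    noise = rng.standard_normal(shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    lowpass = np.hypot(fy, fx) <= cutoff  # ideal circular filter (assumption)
    smooth = np.fft.ifft2(np.fft.fft2(noise) * lowpass).real
    smooth /= np.abs(smooth).max()        # normalize to [-1, 1]
    return np.exp(1j * 0.5 * phase_span * smooth)

def add_amplitude_noise(intensity, snr_db, rng):
    """Add WGN to the sensor-plane amplitudes at the requested SNR (in dB)."""
    amplitude = np.sqrt(intensity)
    # Assumed SNR convention: mean squared amplitude over noise variance.
    sigma = np.sqrt(np.mean(amplitude ** 2) / 10 ** (snr_db / 10))
    noisy = amplitude + sigma * rng.standard_normal(amplitude.shape)
    return np.clip(noisy, 0, None) ** 2   # clipping negatives is an assumption

def quantize(intensity, bits=12):
    """Quantize an intensity pattern to 2**bits levels relative to its maximum."""
    levels = 2 ** bits - 1
    peak = intensity.max()
    return np.round(intensity / peak * levels) / levels * peak

rng = np.random.default_rng(1)
mask = random_phase_mask((160, 160), cutoff=0.05, rng=rng)

# Hypothetical object: unit amplitude inside a 72x72 support, zero outside.
obj = np.zeros((160, 160), dtype=complex)
obj[44:116, 44:116] = 1.0

intensity = np.abs(np.fft.fft2(obj * mask, norm="ortho")) ** 2
noisy = add_amplitude_noise(intensity, snr_db=30, rng=rng)
quantized = quantize(noisy, bits=12)
```

Quantizing relative to each pattern's own maximum mirrors the statement that the exposure is controlled independently for each sensor.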
The reconstruction process is based on the momentum-accelerated ptychographic iterative engine (mPIE) algorithm [25], without probe updates (i.e. a priori knowledge of the masks is assumed). To reconstruct from a single diffraction measurement, a combined algorithm of error-reduction with hybrid input-output was used [10]. Reconstruction results were evaluated using the log-scaled normalized mean square error (NMSE) criterion (equation (2)), for which higher values indicate a better reconstruction. Several examples are given in figure 3(b) to visualize the reconstruction quality at different NMSE values. Figure 2 depicts typical reconstruction results from two diffraction patterns (SNR=30 dB). An object's complex optical field (figure 2(a)) is coded using two masks (figures 2(c) and (d)) and the calculated diffraction patterns (squared Fourier magnitude of the modulated object) are shown in figures 2(e) and (f). The reconstructed field (NMSE=26 dB) is shown in figure 2(b). Figure 3 presents a statistical investigation of SSCAP as a function of SNR, for K=1-4 and a 12-bit dynamic range, where each data point presents the mean and standard deviation over 700 mask sets (size=K) and objects. Results for K=1 (i.e. CDI) are also shown at a 20-bit dynamic range. Unlike typical CDI reconstruction methods, apart from the limited support (i.e. Fourier plane oversampling), no further assumptions were made. SSCAP is superior to CDI, as is evident from the reconstruction NMSE, which is both higher (mean) and more consistent (standard deviation). This is especially evident with regard to the 12-bit quantized CDI reconstruction, which essentially fails. As expected, larger K leads to better and more consistent reconstruction results.
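For readers who want to compute a comparable figure of merit, the sketch below shows one plausible log-scaled NMSE. The exact definition used in the paper is not reproduced in this text, so the form here is an assumption: the normalized squared error is converted to dB with a negative sign (so that higher is better, consistent with the reported values), and the reconstruction is first aligned to the ground truth by a global phase factor, because intensity measurements are insensitive to a global phase. The function name `nmse_db` is hypothetical.

```python
import numpy as np

def nmse_db(reference, reconstruction):
    """Assumed log-scaled NMSE in dB (higher is better).

    The reconstruction is aligned to the reference by the global phase
    that minimizes the squared error before the error is computed.
    """
    reference = np.asarray(reference, dtype=complex)
    reconstruction = np.asarray(reconstruction, dtype=complex)
    # np.vdot conjugates its first argument: sum(conj(rec) * ref).
    overlap = np.vdot(reconstruction, reference)
    aligned = reconstruction * np.exp(1j * np.angle(overlap))
    error = np.sum(np.abs(reference - aligned) ** 2)
    return -10.0 * np.log10(error / np.sum(np.abs(reference) ** 2))

# Hypothetical check: a reconstruction that is 10% too weak and carries an
# arbitrary global phase has a normalized error of 0.01, i.e. 20 dB.
rng = np.random.default_rng(2)
truth = rng.standard_normal((72, 72)) + 1j * rng.standard_normal((72, 72))
estimate = 0.9 * truth * np.exp(1j * 0.7)
score = nmse_db(truth, estimate)
```

A perfect reconstruction gives zero error and hence an infinite score under this definition, so in practice a small floor on the error may be needed.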

High-resolution TIMP using SSCAP
Next, we explore the implementation of SSCAP in a TIMP setup, as illustrated in figure 4. The hardware modification is that in this case the system is illuminated by a burst of N pulses, each with a different carrier wavelength (WL), $\lambda_n$. Assuming that the cameras' integration time is much longer than the pulse-burst duration, each recorded pattern corresponds to the sum of several diffraction intensity patterns, each originating from a different pulse. It was recently shown that reconstruction from such multiplexed diffraction patterns is feasible, relying either on probe diversity [26,27] or on prior knowledge such as the probes' mode-orthogonality [19,28] or that the objects are phase-only [28]. In our proposed method, probe diversity is realized by the system's WL dependence through two mechanisms. First, assuming a weak dependence of the refractive index on λ, the nth pulse experiences a phase accumulation at the kth mask, which is approximated by:
$$\phi_{k,n}(\mathbf{r}) \approx \frac{\lambda_1}{\lambda_n}\, \phi_{k,1}(\mathbf{r}), \tag{3}$$
where $\phi_{k,1}$ is the phase accumulated by the n=1 pulse. Second, each diffraction pattern scales in proportion to $\lambda^{-1}$. Thus, the measurement in the kth arm is given by
$$I_k(\boldsymbol{\nu}) = \sum_{n=1}^{N} \left| \mathcal{F}_n\left[ O_n(\mathbf{r})\, M_{k,n}(\mathbf{r}) \right](\boldsymbol{\nu}) \right|^2, \tag{4}$$
where $O_n(\mathbf{r})$ is the object during the nth frame, $M_{k,n}(\mathbf{r})$ is the kth mask with its phase scaled according to equation (3), $\mathcal{F}_n$ is the two-dimensional spatial Fourier transform scaled according to $\lambda_1/\lambda_n$, and $\boldsymbol{\nu} = \mathbf{r}'/(\lambda_1 f)$ is the spatial coordinates vector in the sensor plane, scaled according to WL $\lambda_1$ and focal length f.
Figure 2. Typical SSCAP reconstruction. The object was reconstructed from only two diffraction patterns with NMSE=26 dB.
Figure 4. High-resolution TIMP using an SSCAP setup. A burst of N pulses is used to illuminate an object placed at the imaged plane of an SSCAP setup. Each pulse has a different carrier WL, $\lambda_n$, and therefore experiences a different phase accumulation and Fourier plane scaling, both proportional to $\lambda_n^{-1}$. The resulting diffraction patterns are summed incoherently (i.e. the sum of intensities) and recorded (equation (4)).
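The multiplexed measurement of equation (4) can be sketched in NumPy as an incoherent sum over pulses, with the mask phase scaled per pulse according to equation (3). This is a simplified illustration under stated assumptions: the wavelength-dependent rescaling of the Fourier plane is deliberately omitted (every pulse uses the same FFT grid), the masks are stored as separate amplitude and unwrapped phase arrays, and the function name, array shapes, and wavelength values are hypothetical.

```python
import numpy as np

def multiplexed_patterns(frames, mask_amp, mask_phase, wavelengths):
    """Incoherent sum over pulses of mask-coded diffraction intensities.

    frames:      (N, H, W) complex objects, one per pulse
    mask_amp:    (K, H, W) mask amplitudes
    mask_phase:  (K, H, W) mask phases at the first wavelength (unwrapped)
    wavelengths: length-N sequence of carrier wavelengths
    Note: the Fourier-plane scaling with wavelength is NOT modeled here.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    total = np.zeros(mask_phase.shape, dtype=float)
    for frame, wl in zip(frames, wavelengths):
        ratio = wavelengths[0] / wl                       # lambda_1 / lambda_n
        masks = mask_amp * np.exp(1j * ratio * mask_phase)  # phase scaling, eq. (3)
        spectra = np.fft.fft2(frame[None] * masks, norm="ortho")
        total += np.abs(spectra) ** 2                     # sum of intensities
    return total

# Hypothetical toy data: N=3 frames, K=4 phase-only masks on a 64x64 grid.
rng = np.random.default_rng(3)
N, K, H, W = 3, 4, 64, 64
frames = rng.standard_normal((N, H, W)) + 1j * rng.standard_normal((N, H, W))
mask_amp = np.ones((K, H, W))
mask_phase = rng.uniform(-np.pi, np.pi, (K, H, W))
wavelengths = 800e-9 * (1 + 0.01 * np.arange(N))  # illustrative values only

measured = multiplexed_patterns(frames, mask_amp, mask_phase, wavelengths)
```

Summing intensities rather than fields reflects the assumption that the camera integration time is much longer than the pulse-burst duration, so pulses do not interfere on the sensor.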
To numerically explore the method's performance, we simulated it with K=9 arms using similar parameters to the single-shot case (but with zero-padding to 180×180 to keep the sampling ratio above 2, and DR=16 bit), and N=1-10 frames/pulses. We scaled each object and mask by adding two pixels in each dimension to obtain the effective WL, $\lambda_n$, of each pulse. Figure 5 shows the reconstruction quality as a function of the number of reconstructed frames/pulses N, where each data point represents an average over 20 cases, each with a randomized mask set (size=K) and a randomly picked object set (size=N). Three coding mask types were simulated: identical shifting localized phase and amplitude (i.e. ptychography-like) masks, nonidentical extended phase-only masks, and nonidentical extended amplitude and phase masks (examples are shown in figures 1(b)-(d)). We simulated noise at 25 dB and 45 dB SNR levels as well as the noiseless case (denoted as SNR=∞). Extended phase-only masks gave the best results in all cases except for high-SNR reconstruction of 1 or 2 frames, where ptychography-like masks performed best. In all cases and for all frame numbers, extended amplitude and phase masks showed similar trends to extended phase-only masks yet always performed worse. Using extended phase-only masks, we were able to reconstruct 10 frames without significant visual artifacts from SNR=45 dB measurements.

Discussion and outlook
We have proposed SSCAP, a coded-aperture ptychographic method, which enables ultrahigh-speed phase and amplitude microscopy, for either a single frame or multiple frames, via a module added to a single-shot microscope. Using a spatially or spectrally diverse pulse-burst to illuminate the object allows frame rates limited only by the pulse repetition rate. We found that our method outperforms CDI in reconstruction quality and robustness even when using a high dynamic range (20 bits) measurement for CDI. Using SSCAP should allow high-resolution ultrahigh-speed complex-valued microscopy of dynamic microscopic phenomena in fields such as phononics [29], spintronics [30], and life sciences [31] (notably, photodamage in SSCAP should be examined, especially because SSCAP is based on downstream beam splitting). SSCAP may be improved by several modifications. In this work, the coding masks were at the imaging plane and the measurement was taken in its corresponding Fourier plane. However, it is also possible to locate the masks at any plane before or after the imaging plane and to measure at any diffraction plane afterwards, with optional lenses in each arm, thus obtaining other measured constraints on the object, similar, for example, to Fourier ptychography [32] or coherent modulation imaging [33]. Also, in addition to varying the probes' carrier WL, different modes or spatial profiles may also introduce probe diversity sufficient for separating different frames. For example, different polarizations in a beam-splitting setup can be spatially modulated before recombining them to illuminate the object, as was done in SSP [20]. A second, parallel approach is to use a Bayer filter in order to concurrently measure three multiplexed diffraction patterns in each arm, thus further increasing the number of reconstructed frames per arm. Thirdly, the results may be improved by utilizing structure-based prior knowledge [34] or intraframe and interframe self-similarity [35], or by augmenting or performing the reconstruction process with a deep-learning based approach [36,37].
Figure 5. NMSE (equation (2)) as a function of the number of reconstructed frames for K=9 arms, DR=16 bit. Three mask types were compared: identical shifting localized phase and amplitude (i.e. ptychography-like) masks, nonidentical nonlocal phase-only masks, and nonidentical nonlocal amplitude and phase masks (examples are shown in figures 1(b)-(d)). Noise was added at three different SNR levels. The lines represent the mean level of NMSE while the colored areas show their respective standard deviation. The horizontal dashed line is at NMSE=15 dB, where visual artifacts become significant.

Funding Information
European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (819440-TIMP).