Experimental time-resolved imaging by multiplexed ptychography

A recently proposed technique adds to ptychographic microscopy the ability to time-resolve fast, transient, non-repetitive events. This technique, termed time-resolved imaging by multiplexed ptychography (TIMP), is based on algorithmic reconstruction of multiple frames from data recorded in a single camera acquisition of a single-shot ptychographic microscope. We demonstrate TIMP experimentally, reconstructing thirty-six frames of a dynamical complex-valued object from ptychographic data recorded in a single camera snapshot. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Ptychography [1] is a powerful coherent diffractive imaging (CDI) [2] technique that yields label-free, high-contrast, quantitative amplitude and phase information without requiring prior information (e.g. support) on the object or the probe beam. In a conventional ptychographic microscope, a complex-valued object is scanned in a stepwise fashion through a localized beam. In each step, the far-field intensity diffraction pattern from the illuminated region on the object is recorded. Critically, the illumination spot in each step overlaps substantially with neighboring spots, resulting in significant redundancy in the measured data, which enables robust amplitude and phase reconstructions of the object using an iterative phase retrieval algorithm [3,4].
A limitation of conventional ptychography is the long acquisition time (> 1 ms) due to the scanning, which precludes the application of ptychography to imaging of fast dynamics. To overcome this restriction, single-shot ptychography (SSP) schemes, in which ptychographic data (multiple intensity diffraction patterns from overlapping regions) is recorded in a single camera exposure, were proposed and demonstrated [5-9]. The recorded data is divided into zones, where each zone contains approximately one diffraction pattern that is associated with a known localized region of the object. Notably, ultrafast SSP (a single frame of a static object) was demonstrated using a single 150 psec illumination pulse [7].
Recently, Time-resolved Imaging by Multiplexed Ptychography (TIMP) was proposed as a promising approach to obtain ultrahigh-speed high-resolution imaging of complex-valued objects [7]. In TIMP, an SSP system is illuminated by a burst of pulses that is much shorter than the integration time of the sensor, so the diffraction patterns from all the pulses are summed up and recorded in a single camera snapshot. In order to produce a movie of the event from the recorded multiplexed ptychographic data, i) the burst should consist of different (preferably mutually orthogonal) probe pulses and ii) the ordinary ptychographic reconstruction algorithm should be replaced by a multi-state ptychographic algorithm (MsPA) [10,11]. TIMP offers exciting possibilities for single-shot ultrahigh-speed imaging [12]. First, due to its relative simplicity, it should be applicable across the electromagnetic spectrum, including the extreme UV and x-ray spectral regions. Second, in TIMP, the spatial resolution and frame rate can be largely decoupled from the number of frames (the cost of increasing the number of frames can be paid by reducing the field of view or by increasing the complexity of the microscope [7]). Thus, TIMP may allow ultrahigh-speed microscopy of complex-valued objects with submicrometer and picosecond resolution scales. While multiplexed single-shot ptychography was demonstrated for static polarization-sensitive objects [13], TIMP has not yet been demonstrated experimentally.
Here we demonstrate TIMP experimentally, reconstructing up to thirty-six complex-valued images from data recorded in a single camera snapshot. We explore two different schemes for producing a burst of pulses that are mutually orthogonal. The first one achieves high mutual orthogonality by spatially modulating the probes with different vortex-like phases. The second method modulates the probe beams with different linear phases. We first explore both methods separately and then combine them in order to increase the sequence depth (i.e., the number of frames captured in one acquisition). This work constitutes an important step in the development of an ultrahigh-speed high-resolution ptychographic microscope.

Experimental setup
Our TIMP setup is based on SSP through a 4f system (4fSSP) [6,7,13,14] (Fig. 1(a)). In 4fSSP, an array of pinholes is located at the input plane of a 4f system. Lens L1 focuses the light beams that diffract from the array onto the object, which is located at a distance d before the back focal plane of lens L1. The small displacement from the Fourier plane, d, creates a partial overlap between the beams, which is essential in ptychography. Lens L2 collects the diffracted light from the object and transfers it to a camera at the output plane of the 4f system. Assuming that the spatial power spectrum of the object is largely confined to a low-frequency region, the camera measures an intensity pattern consisting of clearly distinguishable blocks. Each block contains a diffraction pattern associated with a beam originating from a single pinhole, and contains spectral information about a specific region on the object plane.
In our experiment (Fig. 1(b)), the optical setup comprises a 520 nm diode laser that is temporally modulated electrically to emit a burst of pulses, each with a duration of τ = 5 msec, with a pulse separation of ∆t = 16.66 msec (60 Hz repetition rate). The beam is coupled to a single-mode fiber for spatial filtering, then spatially magnified by a telescope (not shown in Fig. 1) and enters a modified 4fSSP setup, with $f_1 = f_2 = 100$ mm. In order to vary the probe beams in time, we replaced the static pinhole array by a reflective HOLOEYE PLUTO-2 phase-only spatial light modulator (SLM) that generates a tunable mask-like beam structure on the input plane of the 4f system. The induced phase mask consists of a fixed component and an additional component that varies between pulses.
Specifically, we set the fixed phase component such that the SLM acts as a micro-lens array (MLA), producing an effective pinhole array at a focal distance, $f_{MLA}$, downstream from the SLM. Hence, the SLM is located $f_{MLA}$ before the input plane of the 4f system. The programmed fixed-component phase mask is shown in Fig. 1(c). It is an array of 20 phase masks of the form $\Phi_0(r) = \exp(i\pi r^2/\lambda f_{MLA})$, where r is the distance from the center of a single micro-lens, and is locally defined in each block. We determined $f_{MLA}$ according to $f_{MLA} = \pi b D/4\lambda$, where b = 1.4 mm is the distance between consecutive lenses/pinholes, D = 120 µm is the required effective spot/pinhole size on the input plane of the 4f system and λ = 520 nm is the illumination wavelength.
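As a quick numerical check of the focal-length relation above, the following sketch evaluates $f_{MLA} = \pi b D/4\lambda$ for the quoted parameters and counts how many SLM pixels one micro-lens spans. The variable names are illustrative, and the 8 µm pixel pitch is the value quoted for the SLMs in the text; nothing here is taken from the authors' software.

```python
import math

# Values quoted in the text
b = 1.4e-3             # micro-lens / pinhole pitch (m)
D = 120e-6             # required spot size on the 4f input plane (m)
wavelength = 520e-9    # illumination wavelength (m)
pixel_pitch = 8e-6     # SLM pixel pitch (m)

# f_MLA = pi * b * D / (4 * lambda)
f_mla = math.pi * b * D / (4 * wavelength)

# Number of SLM pixels across one micro-lens block
pixels_per_lens = round(b / pixel_pitch)

print(f"f_MLA = {f_mla * 1e3:.0f} mm")                 # f_MLA = 254 mm
print(f"pixels per micro-lens = {pixels_per_lens}")    # 175
```

With these numbers each micro-lens occupies 175 × 175 SLM pixels, so a 4 × 5 array fits within the 1920 × 1080 panel.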
In our experiment, the dynamical object is an amplitude-only SLM (HOLOEYE HED 6001 monochrome LCOS microdisplay) that was placed d = 16 mm before the Fourier plane of the 4f system, yielding ∼75% overlap on the object plane between beams originating from neighboring pinholes. Notably, we used reflective SLMs as they can typically have a smaller pixel size than transmissive SLMs (because the electronic systems can be placed behind the pixels), along with a considerably larger pixel fill factor. In order to further increase the contrast of the SLM-generated MLA, the SLM was slightly tilted so that the non-diffracted light is deflected outside of the experiment's numerical aperture, reaching an SNR of 40 dB. Both SLMs have 1920 × 1080 pixels, 8 µm pixel pitch, 8 bit dynamic range, and 60 Hz input frame rate (synchronized with the laser source).
Lens L2 transforms the object plane's exit-wave to the spatial frequency domain at the exit plane of the 4f system, where we placed the camera. The transformation from real-space to the spatial frequency coordinates is given by $\boldsymbol{\nu} = \mathbf{r}/\lambda f_2$ (the fact that the object is located a distance $d + f_2$ before lens L2 merely adds a parabolic phase, which is not detectable by the camera), where $\mathbf{r}$ is a spatial coordinate vector on the camera plane. The resulting images were captured with a Basler acA2440-35um camera that has 2448 × 2048 pixels and a pixel size of 3.45 × 3.45 µm².
In order to make the most efficient use of the rectangular detection area of the camera, we sampled the object with a rectangular $N_x \times N_y = 4 \times 5$ pinhole array, yielding a cutoff frequency $\nu_{max} = b/2\lambda f_1 = 14 \times 10^{-3}/2\lambda$, i.e. ∼1% of the diffraction limit, and a field of view of $\mathrm{FOV} = N_x N_y (bd/f_1)^2 \approx 1$ mm² [6].
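The quoted cutoff frequency and field of view can be checked with a few lines of arithmetic. This sketch reads the field of view as $N_x N_y$ tiles of side $bd/f_1$ (the spacing of neighboring beams on the object), which is an interpretation that reproduces the stated ≈ 1 mm²; it also takes $1/\lambda$ as the diffraction-limited cutoff, which is an assumption made only for the percentage comparison.

```python
# Values quoted in the text
b = 1.4e-3            # pinhole pitch (m)
d = 16e-3             # object displacement from the Fourier plane (m)
f1 = 100e-3           # focal length of lens L1 (m)
wavelength = 520e-9   # illumination wavelength (m)
Nx, Ny = 4, 5         # pinhole array dimensions

# Cutoff spatial frequency set by half the block width on the camera
nu_max = b / (2 * wavelength * f1)      # cycles per metre

# Spacing of neighbouring beam centres on the object plane
spacing = b * d / f1                    # metres

# Field of view as Nx * Ny tiles of side `spacing` (assumed reading)
fov = Nx * Ny * spacing ** 2            # square metres

# Fraction of an assumed diffraction-limited cutoff of 1 / lambda
fraction = nu_max * wavelength

print(f"nu_max   = {nu_max / 1e3:.1f} cycles/mm")   # 13.5 cycles/mm
print(f"spacing  = {spacing * 1e6:.0f} um")          # 224 um
print(f"FOV      = {fov * 1e6:.2f} mm^2")            # 1.00 mm^2
print(f"fraction = {fraction * 100:.1f} %")          # 0.7 %
```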
In order to measure K frames with TIMP, we illuminate the system with a burst of K ≤ 36 pulses [7]. The acquisition time of the camera was set to 4 sec (which is larger than K∆t), hence the recorded intensity pattern corresponds to an incoherent sum of the diffraction patterns from all the K pulses in the burst:
$$I_m(\boldsymbol{\nu}) = \sum_{k=1}^{K} \left| \mathcal{F}\left[ P_{m,k}(\mathbf{r})\, O_k(\mathbf{r}) \right] \right|^2, \qquad (1)$$
where I, P, O stand for intensity, probe and object distributions, respectively, $m = 1, 2, 3, ..., N_x \times N_y$ is the block/pinhole index, k is the frame/pulse index, and $\mathcal{F}$ stands for the 2D spatial Fourier operator.
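The incoherent-sum measurement described above can be illustrated with a minimal NumPy sketch for a single block m: each pulse contributes the squared magnitude of the Fourier transform of its probe-object product, and the camera integrates over all K pulses. The function name, array shapes and random test data are assumptions made for illustration; this is not the authors' forward model or reconstruction code.

```python
import numpy as np

def timp_block_intensity(probes, objects):
    """Incoherent sum over K pulses of |FFT(probe_k * object_k)|^2.

    probes, objects: complex arrays of shape (K, N, N) for one block.
    Returns a real (N, N) array, the multiplexed intensity of that block.
    """
    exit_waves = probes * objects
    # Centred, unitary 2D FFT applied to each frame separately
    spectra = np.fft.fftshift(
        np.fft.fft2(np.fft.ifftshift(exit_waves, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1),
    )
    return np.sum(np.abs(spectra) ** 2, axis=0)

# Illustrative random data: K = 3 frames on a 64 x 64 grid
rng = np.random.default_rng(0)
K, N = 3, 64
probes = np.exp(2j * np.pi * rng.random((K, N, N)))   # phase-only probes
objects = rng.random((K, N, N)).astype(complex)       # amplitude-only objects
intensity = timp_block_intensity(probes, objects)
```

Because the phases of the individual frames are lost in the sum, the K frames cannot be separated from `intensity` without additional structure, which is why the text requires mutually orthogonal probes.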

Experimental results
Generally, multiple frames of a dynamical complex-valued object cannot be recovered uniquely from the measurements described in Eq. (1). However, an orthogonal set of probes can lead to a unique reconstruction of both the probes and the objects [11]. Thus, in our experiment we employ the SLM to produce a burst of pulses with orthogonal complex-valued spatial profiles. Notably, according to Plancherel's theorem [15], the orthogonality is preserved after propagation to the object plane. We use two different encoding approaches to obtain mutually orthogonal pulses: an orbital angular momentum encoding (OAME), which is based on the orthogonality of Laguerre-Gaussian beams [16], and a phase gradient encoding (PGE), which transversely shifts the pattern induced by the MLA.

Orbital angular momentum encoding
First, we explore TIMP using orbital angular momentum encoding of the probe pulses. That is, the phase mask induced by SLM_P (the probe-shaping SLM) is given by:
$$\Phi_l(r, \theta) = \Phi_0(r) \exp(i l \theta), \qquad (2)$$
where l is the angular mode index number, and r and θ are the radial and angular coordinates, respectively. The orthogonality condition for the OAME is $l \in \mathbb{Z}$ [16].
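The orthogonality of vortex-encoded probes, and its preservation under Fourier propagation (Plancherel's theorem), can be checked numerically. The sketch below assumes the pulse-dependent component of the mask is $\exp(il\theta)$ on a circular aperture and omits the common lens phase, which cancels in the inner products; the grid size, aperture radius and function names are hypothetical. The unitary FFT stands in for propagation to the far field.

```python
import numpy as np

def vortex_modes(l_values, n=128):
    """Unit-norm vortex masks exp(i*l*theta) on a circular aperture."""
    # Half-pixel offset keeps the grid symmetric and avoids the singular point
    coords = np.arange(n) - n / 2 + 0.5
    x, y = np.meshgrid(coords, coords)
    theta = np.arctan2(y, x)
    aperture = np.hypot(x, y) <= n / 2 - 4
    modes = np.array([aperture * np.exp(1j * l * theta) for l in l_values])
    norms = np.sqrt((np.abs(modes) ** 2).sum(axis=(1, 2), keepdims=True))
    return modes / norms

def gram(modes):
    """Matrix of inner products <mode_i | mode_j>."""
    flat = modes.reshape(len(modes), -1)
    return flat.conj() @ flat.T

l_set = [-9, -7, -5, -3, -1, 2, 4, 6, 8]               # set used in the text
modes = vortex_modes(l_set)
g_slm = gram(modes)                                      # overlaps at the SLM plane
g_far = gram(np.fft.fft2(modes, norm="ortho"))           # overlaps after a unitary FFT
max_crosstalk = np.abs(g_slm - np.eye(len(l_set))).max()
```

On a pixelated grid the off-diagonal overlaps are small rather than exactly zero, which is consistent with the remark in the text that the SLM pixel size limits the usable mode orders.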
In each experiment, we first measured a set of 9 such probes using the digit 0 as a known single-frame reference object, and later used them as initial guesses for the probe beams in MsPA [10] reconstructions of multi-frame measurements.
Next, we used a burst of 9 such probes to illuminate the dynamical object (SLM_O, the object SLM, displaying the digits 1-9) and recorded the integrated diffraction patterns in a single camera snapshot. Specifically, we used the following zipper-like OAME set: l = {−9, −7, −5, −3, −1, 2, 4, 6, 8}. The captured snapshot is shown in Fig. 2(a). Using MsPA [10], we successfully reconstructed 9 complex-valued digits and 9 complex-valued probes (Fig. 2(b)). Two features are worth mentioning regarding OAME. First, as higher modes contain higher spatial frequencies, they broaden the diffraction patterns, limiting the resolution of the system. As shown in Fig. 2, OAME transfers the spatial spectral information to ring-like shapes with different radii. In order to minimize the overlap between different frames, and thus improve the reconstruction quality, we chose the mode orders in the zipper-like way described above. Second, the probe mode order is also limited by the SLM pixel size. Because of the circular profiles of vortices, the required resolution increases towards the center of the vortex. Therefore, the pixel resolution of the SLM limits the number of measurable frames in a single snapshot.

Phase gradient encoding
Next, we explore TIMP with phase gradient (PG) induced masks:
$$\Phi_{\mathbf{k}}(\mathbf{r}) = \Phi_0(r) \exp(i \mathbf{k} \cdot \mathbf{r}), \qquad (3)$$
where $\mathbf{k} = (k_x, k_y)$ is a phase gradient mode vector, and $\mathbf{r}$ is the spatial coordinates vector. The overlap between PG modes in our system is given by:
$$\langle \Phi_{\mathbf{k}} | \Phi_{\tilde{\mathbf{k}}} \rangle = \iint_{-b/2}^{b/2} \Phi_{\mathbf{k}}(\mathbf{r})\, \Phi^*_{\tilde{\mathbf{k}}}(\mathbf{r})\, d^2r = b^2\, \mathrm{sinc}\left(\frac{\Delta k_x b}{2}\right) \mathrm{sinc}\left(\frac{\Delta k_y b}{2}\right), \qquad (4)$$
where $\mathbf{k} = (k_x, k_y)$ and $\tilde{\mathbf{k}} = (\tilde{k}_x, \tilde{k}_y)$ are phase gradient modes, $\Delta\mathbf{k} = \mathbf{k} - \tilde{\mathbf{k}}$, Φ is a phase image displayed on the SLM, $\mathrm{sinc}(x) = \sin(x)/x$, and * stands for complex conjugation. According to Eq. (4), in order to form an orthogonal set (i.e. zero overlap), every pair of different PGE modes must satisfy:
$$\Delta\mathbf{k} = \frac{2\pi}{b}(i, j), \quad i, j \in \mathbb{Z}. \qquad (5)$$
Notably, a PG increment ∆k = 2π/b displaces the output intensity diffraction patterns on the camera plane by $\Delta r = \lambda f_{MLA}/b$ (for magnification $M = f_2/f_1 = 1$), thus limiting the achievable spatial spectral content of the original system. According to Eq. (5), for objects with a typical bandwidth of $\Delta k_{obj} = n\nu_{max}$ with 0 ≤ n ≤ 1, the highest PG mode contained in the available spectral width is given by Eq. (6). Substituting the optical parameters into Eq. (6), we get that the highest measurable mode is $i_{max} = \lfloor 3 \times (1 - n) \rfloor$. Therefore, the system is capable of capturing up to 25 frames if the captured objects' bandwidths are up to $\nu_{max}/3$, and up to 9 frames if the captured objects' bandwidths are up to $\nu_{max} \times 2/3$.
Figure 3 presents experimental results of TIMP using phase gradient encoding. As in the experiment presented in section 3.1, we reconstructed the probes in advance and used the results as initial guesses for the probes.
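The phase-gradient orthogonality condition and the resulting frame counts can be checked with a short sketch. It assumes a square micro-lens block of side b sampled by 175 SLM pixels (1.4 mm / 8 µm), evaluates the overlap along one axis only (the 2D overlap is the product of the x and y factors), and assumes that the frame count is $(2 i_{max} + 1)^2$ with $i_{max}$ rounded down to an integer. Both assumptions are inferred from the 25-frame and 9-frame figures quoted in the text; the function names are hypothetical.

```python
import math
from fractions import Fraction

import numpy as np

def pg_overlap(delta_i, n_pix=175):
    """Normalised 1D overlap of two PG modes whose gradients differ by
    delta_i * 2*pi/b, sampled on n_pix pixels across one block."""
    x = (np.arange(n_pix) + 0.5) / n_pix - 0.5    # position in units of b
    return abs(np.exp(2j * np.pi * delta_i * x).mean())

def max_frames(n):
    """Frame count for object bandwidth n * nu_max (assumed (2*i_max + 1)^2)."""
    i_max = math.floor(3 * (1 - n))
    return (2 * i_max + 1) ** 2

print(pg_overlap(1))                   # integer increment: ~0 (orthogonal)
print(pg_overlap(0.5))                 # half increment: ~0.64 (not orthogonal)
print(max_frames(Fraction(1, 3)))      # 25
print(max_frames(Fraction(2, 3)))      # 9
```

Exact fractions are used for n so that the rounding step does not depend on floating-point error at the boundary values 1/3 and 2/3.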

Orbital angular momentum & phase gradient encoding
The number of measurable frames is limited in both probe encoding methods described above, due to the aforementioned technical limitations.
However, these limitations can be overcome by combining the two encoding methods, e.g. by equipping every PGE-displaced probe with multiple OAME modes. Thus, by using 4 OAME modes with 9 PGE modes, we increase the sequence depth of the system to 36 frames per snapshot. Figure 4 shows the captured snapshot as well as 36 reconstructed frames consisting of complex-valued objects and probes.
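The bookkeeping of the combined encoding amounts to a Cartesian product of the two mode sets. In the sketch below, the specific vortex orders and gradient indices are hypothetical placeholders, since the text states only the counts (4 OAME modes and 9 PGE modes).

```python
from itertools import product

# Hypothetical mode choices; only the counts (4 and 9) come from the text
oam_modes = [-3, -1, 2, 4]                        # 4 vortex orders l
pg_modes = list(product([-1, 0, 1], repeat=2))    # 3 x 3 gradient indices (i, j)

# Every PG-displaced probe carries each of the OAM orders
combined = [(l, i, j) for l, (i, j) in product(oam_modes, pg_modes)]

print(len(combined))   # 36
```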

Image quality comparison
We demonstrated above TIMP reconstruction of multiple frames using three encoding methods of the probe beams. Next, we evaluate the performance of the methods. We implemented three comparison criteria: radially averaged spectral width, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [17]. The spectral width indicates the resolution obtained by each method, while PSNR and SSIM are commonly used image quality metrics that indicate the visual similarity (in real space) between the retrieved and original objects. For the evaluation, we used the digit 8 as it is relatively symmetrical (important for the radial averaging to be indicative) but still more geometrically complex than a simple circle or ellipse. Figures 5(a)-5(e) show the frames that we used in this comparison. The original object (ground truth) that we used was a real-valued image with no noise (Fig. 5(a)). Figure 5(b) shows the SSP reconstruction of the same object; its performance corresponds to the upper limit of our current setup before introducing multi-framing. Figures 5(c) and 5(d) display the TIMP reconstruction of the 8th frame (out of 9 frames) using OAME and PGE, respectively. The 9th reconstructed frame (out of 36 frames) using the combined OAME and PGE encoding is shown in Fig. 5(e).
Table 1. Evaluation of the reconstruction quality, using three approaches (FWHM, PSNR and SSIM; see text for details), of a frame containing the digit 8, taken within the following experiments: single-frame single-shot ptychography (SSP), 9-frame TIMP using orbital angular momentum encoding (OAME), 9-frame TIMP using phase gradient encoding (PGE), and 36-frame TIMP using combined OAME and PGE. The reconstructed amplitudes of the frames are shown in Fig. 5.
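Two of the three comparison criteria are straightforward to compute with NumPy. The sketch below uses the standard PSNR definition and a simple integer-radius binning for the radially averaged power spectrum; the function names, the peak value of 1 and the binning scheme are assumptions, not a description of how the values in Table 1 were obtained. SSIM [17] is omitted here because it is usually taken from an image-processing library.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two real-valued images."""
    mse = np.mean((reference - estimate) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def radial_average(image):
    """Radially averaged power spectrum, binned by integer radius from DC."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    ny, nx = image.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(x - nx // 2, y - ny // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

# Illustrative check on synthetic data
reference = np.zeros((32, 32))
estimate = reference + 0.1              # uniform error of 0.1
quality_db = psnr(reference, estimate)  # about 20 dB
profile = radial_average(np.ones((32, 32)))
```

A spectral width such as the FWHM can then be read off the returned profile, for example as the radius at which it first falls below half of its peak.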
The radially averaged spatial spectra are shown in Fig. 5(f). The spectral width (FWHM) as well as the PSNR and SSIM values for each method are shown in Table 1. The SSP row shows that our SSP setup does not give rise to high-resolution microscopy. Importantly, however, comparing the SSP and TIMP rows indicates that the multiplexing component in TIMP does not significantly reduce the resolution of the system further. Strikingly, OAME largely maintained the reconstruction quality, even though it contains eight frames more than SSP. Also noticeable is that PGE exhibits poorer performance than SSP and OAME. This is likely because the PGE orthogonality condition in Eq. (5) is more sensitive than OAME to pixelization and inaccuracies in the phase masks induced by the SLM, which has limited resolution and dynamic range. The performance of the reconstruction with the combined encoding is naturally the poorest.
Fig. 5. The amplitude images and radially averaged spatial spectra that were used for comparison in Table 1. Panels: Ground Truth, SSP, OAME, PGE, Combined.

Measuring complex-valued spatial profiles of pulses in a burst
Mathematically, the object and the probe beam play symmetric roles in ptychography (the measured signal is a function of their product [1]), hence they are interchangeable. Indeed, ptychography is sometimes used for characterizing the probe beams using a known object [18,19]. Applying this exchange of roles between the objects and probe beams to TIMP offers a method for characterizing the (different) spatial profiles of the pulses in a pulse train. We show below a proof-of-concept demonstration of this possibility. To this end, we apply our TIMP reconstruction algorithm to the data presented in Fig. 2(a), where the reconstructed frames presented in Fig. 2(b) are used as initial guesses for the objects and an SSP reconstruction of a TEM_00 profile is used as an initial guess for all the probe beams. The reconstruction results are presented in Fig. 6, showing very good correspondence of the reconstructed probe beam functions to those presented in Fig. 2. Hence, we conclude that by using a known dynamic object, TIMP is indeed capable of measuring the complex-valued spatial profiles of pulses in a burst.

Conclusion
In summary, time-resolved imaging by multiplexed ptychography (TIMP) is a new scheme for multi-frame imaging from data recorded in a single camera snapshot [7]. In this work we demonstrated TIMP experimentally, reconstructing up to thirty-six frames that were recorded in a single camera exposure. In this experiment, the spatial resolution was low (∼300 microns) and the frame rate was 60 Hz (corresponding to the frame rate of the SLMs). In order to increase the frame rate to the THz scale, one needs to generate a train of femtosecond pulses that are separated by a picosecond-scale interval, with each pulse modulated (coded) uniquely such that the pulses are mutually orthogonal. A possible approach to producing such a train of pulses is to launch a single pulse into a multi-mode fiber. Due to modal dispersion, the single pulse is split into a train of pulses, each in a different mode of the fiber [20]. Approaches to enhancing the spatial resolution of TIMP include the use of high-NA lenses in 4fSSP, utilizing SSP based on a coded aperture [21] and using short-wavelength radiation.
Fig. 6. Reconstruction of 9 complex-valued spatial profiles of pulses in a burst from measured data shown in Fig. 2(a). Each frame is divided into two parts (as marked on the first frame): amplitude on the left and phase on the right. The amplitudes are normalized, and the phases are in the [−π, π] range.

Funding
European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (819440-TIMP).