Multiplexed phase-space imaging for 3D fluorescence microscopy

Optical phase-space functions describe spatial and angular information simultaneously; examples include light fields in ray optics and Wigner functions in wave optics. Measurement of phase-space enables digital refocusing, aberration removal and 3D reconstruction. High-resolution capture of 4D phase-space datasets is, however, challenging. Previous scanning approaches are slow, light inefficient and do not achieve diffraction-limited resolution. Here, we propose a multiplexed method that solves these problems. We use a spatial light modulator (SLM) in the pupil plane of a microscope in order to sequentially pattern multiplexed coded apertures while capturing images in real space. Then, we reconstruct the 3D fluorescence distribution of our sample by solving an inverse problem via regularized least squares with a proximal accelerated gradient descent solver. We experimentally reconstruct a 101-Megavoxel 3D volume ($1010\times510\times500\mu$m with NA 0.4), demonstrating improved acquisition time, light throughput and resolution compared to scanning aperture methods. Our flexible patterning scheme further allows sparsity in the sample to be exploited for reduced data capture.


Introduction
3D fluorescence microscopy is a critical tool for bioimaging, since most samples are thick and can be functionally labeled. High-resolution 3D imaging typically uses confocal [1], two-photon [2] or light sheet microscopy [3]. Because these methods all involve scanning, they are are inherently limited in terms of speed or volume. Light field microscopy [4,5], on the other hand, achieves single-shot 3D capture, but sacrifices resolution. High resolution and single-shot capture are possible with coded aperture microscopy [6][7][8][9]; however, this requires an extremely sparse sample. Here, we describe a multi-shot coded aperture microscopy method for high-resolution imaging of large and dense volumes with efficient data capture.
Our work fits into the framework of phase-space optics, which is the general term for any spaceangle description of light [10][11][12][13]. Light fields, for example, are phase-space functions for ray optics with incoherent light. The light field describes each ray's position and angle at a particular plane, which can be used for digital refocusing [14], 3D imaging, aberration correction [15] or imaging in scattering [16]. In microscopy, wave-optical effects become prominent and so ray optics no longer suffices if one wishes to achieve diffraction-limited resolution. Hence, we use Wigner functions, which are the wave-optical analog to light fields [12,13].
The Wigner function describes spatial and spatial frequency (propagation angle) information for a wave-field of arbitrary coherence. It converges to the light field in the limit of incoherent ray optics [13]. Capturing Wigner functions is akin to spatial coherence imaging [17,18], since they contain the same information as mutual intensity and coherent mode decompositions [10,19]. In the case of fluorescence microscopy, where the object is a set of incoherent emitters, Wigner representations define the incoherent 3D Optical Transfer Function (OTF) [10].
The 4D nature of phase space (two spatial and two spatial-frequency dimensions) poses significant measurement challenges. Single-shot schemes (e.g. lenslet arrays [20]) must distribute the 4D function across a 2D sensor, severely restricting resolution. Scanning aperture methods [21][22][23] are slow (∼minutes) and light inefficient. We seek here a flexible trade-off between capture time and resolution, with the ability to exploit sparsity for further data reduction.
Our experimental setup consists of a widefield fluorescence microscope with a spatial light modulator (SLM) in Fourier space (the pupil plane). The SLM implements a series of quasirandom coded aperture patterns, while collecting real space images for each ( Fig. 1) [24]. The recovered 4D phase space has very large pixel count (the product of the pixel counts of the SLM and the sensor) ∼ 10 12 . Compared to scanning aperture methods [21,22], the new scheme has three major benefits. First, it achieves better resolution by capturing high-frequency interference effects (high-order correlations). This enables diffraction-limited resolution at the microscope's full numerical aperture (NA). Second, we achieve higher light throughput by opening up more of the pupil in each capture; this can be traded for shorter exposure time and faster acquisition. Third, the multiplexed nature of the measurements means that we can employ compressed sensing approaches (when samples are sparse) in order to capture fewer images without sacrificing resolution. This means that the number of images required scales not with the reconstructed number of resolved voxels, but rather with the sparsity of the volume.
Our method can be thought of as a multi-shot coded aperture scheme for diffraction-limited 3D fluorescence microscopy. It is analogous to coded aperture photography [25-28]; however, we use a wave-optical model to account for diffraction effects, so intensity measurements are nonlinear with complex-field. Fluorescent imaging allows a simplification of the forward model, since each fluorophore is spatially coherent with itself but incoherent with all other fluorophores. Our reconstruction algorithm then becomes a large-scale inverse problem akin to multi-image 3D deconvolution, formulated as a convex 1 regularized least-square error problem and solved by a fast iterative shrinkage-thresholding algorithm (FISTA) [29].  Fig. 1. Phase-space multiplexing for 3D fluorescence microscopy. The microscope (a 4 f system) uses a Spatial Light Modulator (SLM) in Fourier space to implement multiple coded apertures, while capturing 2D intensity images in real space for each. Our wave-optical forward model A relates the object c to the measured images y meas for each pattern. The inverse problem recovers the object, subject to sparsity priors where applicable.

Theory and methods
We use the Wigner function (WF) and its close relative, the Mutual Intensity (MI) function, to describe our multiplexing scheme. The WF describes position and spatial frequency, where spatial frequency can be thought of as the direction of ray propagation in geometrical optics. The concept was introduced by Walther [30] to describe the connection between ray intensity and wave theory. Here we assume that the light is quasi-monochromatic, temporally stationary and ergodic, so the Wigner function is [10,11]: where r = (x, y) denotes transverse spatial coordinates, u = (u x , u y ) denotes spatial frequency coordinates and · denotes ensemble average. The quantity contained by angle brackets, is, apart from a coordinate transform, the Mutual Intensity. E(r) is a spatially coherent electric field (e.g. from a single fluorophore) andẼ(u) is its Fourier transform. The ensemble average allows representation of both coherent and partially (spatially) coherent light. Here, we assume that the object is a 3D volume of incoherent emitters with no occlusions. Thus, the phase-space description of light from the object is a linear sum of that from each fluorophore.

General forward model
Our forward model relates each coded aperture's captured image to the 3D object's intensity. Each image contains information from multiple frequencies and their interference terms (which are key to resolution enhancement). The MI framework facilitates our mathematical analysis, since the projection of a mask in MI space is equivalent to applying the 3D Optical Transfer Function (OTF) to an incoherent source (see Fig. 2). MI analysis further reveals the interference effects that may be obscured by looking only at a projected quantity (the OTF). For simplicity, we first describe the forward model of a point source and then generalize to multiple mutually incoherent sources. Consider a complex electric field, specified by some properties, α (e.g. location and wavelength of a point source). The field E s (r 1 ; α) acts like a unique coherent mode -it interferes coherently with itself but not with other modes. For a single coherent mode at the front focal plane of the 4 f system in Fig. 1, the complex-field just before the SLM is the Fourier transform of the input complex-field [31], expressed asẼ s (u 1 , α), where u 1 is in units of spatial frequency that relate to the lateral coordinates x 1 on the SLM by multiplying the wavelength λ and the front focal length f , such that x 1 = λ f u 1 . The fieldẼ s (u 1 ; α) is then multiplied by the coded aperture, giving, whereẼ n is the patterned complex-field in Fourier space and M n represents the n'th binary coded aperture pattern. At the sensor plane (the back focal plane of the 4 f system) the intensity I n is: where u 2 is a duplicate of u 1 . The intensity image can be alternately related to the MI or WF by a where MI n and W n are the MI and WF associated with the field modified by the n'th mask. Describing the intensity measurement in terms of both its WF and MI gives new insights. Eq. (7) shows that the intensity image is a projection of the patterned Wigner function W n across all spatial frequencies. Alternately, looking at the Fourier transform of I n , denoted byĨ n , we can interpret the intensity measurement as a patterning and projection of the source MI: where MI s is the MI of source field E s . This interpretation is illustrated for a 1D complex-field in Fig. 2. Each SLM pattern probes different parts of the input MI. By using multiple coded apertures, one may reconstruct the MI from its projections. The extension of our forward model to multiple coherent modes is straightforward. Since each mode is mutually incoherent with others, we sum over all of them with their weights C(α):

Forward model for incoherent sources
From the general forward model of Eq. (9), we can now derive the case in which the object is a 3D incoherent intensity distribution (e.g. a fluorescent sample). We specifyẼ s in Eq. (3) to be from a single-color fluorophore located at position (r s , z s ) where z s is the defocus distance. The parameter α is (r s , z s , λ) for a single point source, and our goal is to solve for the coefficients C(α) which represents intensity of each. To account for off-focus point sources, the field can be propagated to −z s using angular spectrum propagation [31]: Using Eqs. (2), (6) and (8), after some algebra the intensity becomes: where is the kernel for mask M n at depth z s . Plugging Eq. (11) into Eq. (9) gives the final expression of the forward model. Before doing so, we assume that the fluorescent color spectrum S(λ) is identical and known for all flourophores. Thus, we can decompose the mode weights in Eq. (9) into a product of the spectral weight and the 3D intensity distribution, S(λ)C(r s , z s ), giving: Equation (13) describes the forward model for a 3D fluorescent object C(r s , z s ) with no occlusions. The term in parentheses is a convolution kernel describing the 3D Point Spread Function (PSF) for mask M n (shown in Fig. 1). For simplicity, we assume here no scattering, though incorporating the scattering forward model in [16, 22] is straightforward.

Inverse problem
Based on the raw data and forward model, the inverse problem is formulated as a nonlinear optimization. Our goal is to reconstruct the 3D intensity distribution C(r s , z s ) from the measured images. To do so, we aim to minimize data mismatch, with an 1 regularizer to mitigate the effects of noise (and promote sparsity where applicable). The mismatch is defined as the least-square error between the measured intensity images and the intensity predicted by our forward model (Eq. (13)). This formulation has a smooth part and a non-smooth part in the objective function and is efficiently solved by a proximal gradient descent solver (FISTA [29]).
To formulate the inverse problem, we first discretize the forward model in Eq. (13) to be y = Ac.
Here y ∈ R M P×1 corresponds to predicted images on the sensor; each small chunk (∈ R P×1 ) of y is a vectorized image I n (r). We discretize r into P pixels, and the number of masks is M, so n = 1 . . . M. Similarly, we discretize r s and z s into P pixels and L samples, respectively, to obtain a vectorized version of C(r s , z s ) (c ∈ R L P ×1 ). The matrix A ∈ R M P×L P , which is not materialized, represents the summation and convolution in Eq. (13) using 2D Fast Fourier Transforms (FFTs) for each subvector (∈ R P ×1 ) of c, with zero-padding to avoid periodic boundary condition errors. The convolution kernel is precomputed and stored for speed.
The inverse problem becomes a data fidelity term plus an 1 regularization parameter µ: where y meas ∈ R M P×1 is the measured intensity. We also use a diagonal matrix W ∈ R L P ×L P to lower the weight of point sources near the borders of images whose light falls off the sensor. Each diagonal entry of W is obtained by summing the corresponding column in A. Outside point sources may also contribute to the measured intensity due to defocus; hence, we use an extended field-of-view method [32] to solve for more sample points in c than y (i.e. P > P).

Design of coded apertures
In the scanning-aperture scheme [21,22], smaller apertures give better frequency sampling of the 4D phase space, at a cost of: 1) lower resolution, 2) lower signal-to-noise ratio (SNR) and 3) large data sets. Our multiplexing scheme alleviates all of these problems. Multiplexing achieves diffraction-limited resolution by additionally capturing interference terms, which cover the full NA-limited bandwidth. This is evident in the Fourier transform of the captured images (Fig. 3).
The SNR improvement is also visible; the multiplexed image is less noisy. Our masks are chosen by quasi-random non-replacement selection. We section the SLM plane into 18×18 square blocks and keep only the 240 blocks that are inside the system NA. For each mask, we open 12 blocks, selected randomly from the blocks remaining after excluding ones that were open in previous sequences. In this scheme, the full NA can be covered by 20 masks. To allow for both diversity and redundancy, we choose to to cover the entire pupil 5 times, resulting in 100 multiplexed aperture patterns, one of which is shown in Fig. 3.
Importantly, the number of multiplexed patterns can be flexibly chosen to trade off accuracy for speed of capture. For instance, by increasing the number of openings in each pattern, we can cover the entire pupil with fewer patterns. This means that we may be able to reconstruct the object from fewer measurements, if the inverse problem is solvable. By using a priori information about the object (such as sparsity in 3D) as a constraint, we can solve under-determined problems with fewer total measured pixels than voxels in the reconstruction.

Experiments
Our experimental setup consists of the 4 f system ( f 1 = 250 mm, f 2 = 225 mm) shown in Fig. 1, with an additional 4 f system in front, made of an objective lens (20× NA 0.4) and a tube lens ( f = 200 mm) to image the sample at the input plane. The SLM (1400×1050 pixels of size 10.3µm) is a liquid crystal chip from a 3-LCOS projector (Canon SX50) which is reflective and polarization-sensitive, so we fold the optical train with a polarization beam splitter and insert linear polarizers. Our sensor (Hamamatsu ORCA-Flash4.0 V2) captures the multiplexed images and synchronizes with the SLM via computer control.
Our sample is a fixed fluorescent brine shrimp (Carolina Biological). It is relatively dense, yet does not fill the entire 3D volume. The reconstructed 3D intensity ( Fig. 4(a), 4(b) and 4(f)-4(h)) is stitched from five volume reconstructions, each with 640×640×120 voxels to represent the sample volume of 455×455×600 µm. The reconstruction is cropped to the central part of our extended field of view, so the final volume contains 1422×715×100 voxels corresponding to 1010×510×500µm. The dataset size is large (9 GB), and since the size of the 3D array is ∼5×10 7 without the extended field-of-view and the measured data is ∼4×10 7 , the number of operations for evaluating Eq. (13) is on the order of 3×10 10 . This takes 4 seconds to compute on a computer with 48-core 3.0 GHz CPUs and requires 94 GB memory to store the kernel (Eq. (12)). . Image quality can be traded for capture speed (number of coded aperture images). 3D reconstructions from increasing numbers of images with different coded apertures show that this object is too dense to be accurately reconstructed by a single coded-aperture image, but gives a reasonable reconstruction with 10 or more images, due to sparsity of the sample.
The reconstructed 3D intensity is shown in Fig. 4, alongside images from a confocal microscope and a widefield focus stack, for comparison. Both our method and the focus stack use a 0.4 NA objective, while the confocal uses 0.25 NA; hence, the confocal results should have slightly better resolution. Our reconstructed slices appear to have slightly lower resolution than the defocus stack and confocal, possibly due to the missing information in the frequency mutual intensity illustrated in Fig. 2(c), which is greatly undersampled. As expected, the depth slices of our reconstruction have better rejection of information from other depths, similar to the confocal images.
To illustrate the flexible tradeoff between capture time (number of coded apertures used) and quality, we show reconstructions in Fig. 5 using different numbers of coded aperture images. The case of only 1 image corresponds to a single coded aperture and gives a poor result, since the sample is relatively dense. However, with as few as 10 images we obtain a reasonable result, despite the fact that we are solving a severely under-determined problem. This is possible because the measurements are multiplexed and so the 1 regularizer acts as a sparsity promoter.

Conclusion
We demonstrated 3D reconstruction of a large-volume high-resolution dense fluorescent object from multiplexed phase-space measurements. An SLM in Fourier space dynamically implements quasi-random coded apertures while intensity images are collected in real space for each coded aperture. Theory is developed in the framework phase space, with relation to Mutual Intensity functions and 3D OTFs. Reconstruction is formulated as an 1 -regularized least-square problem. This method enables diffraction-limited 3D imaging with high resolution across large volumes, efficient data capture and a flexible acquisition scheme for different types and sizes of samples.