Fourier DiffuserScope: Single-shot 3D Fourier light field microscopy with a diffuser

Light field microscopy (LFM) uses a microlens array (MLA) near the sensor plane of a microscope to achieve single-shot 3D imaging of a sample, without any moving parts. Unfortunately, the 3D capability of LFM comes with a significant loss of lateral resolution at the focal plane, which is highly undesirable in microscopy. Placing the MLA near the pupil plane of the microscope, rather than the image plane, can mitigate the artifacts near focus and provide an efficient shift-invariant model at the expense of field-of-view. Here, we show that our Fourier DiffuserScope achieves significantly better performance than Fourier LFM. Fourier DiffuserScope uses a diffuser in the pupil plane to encode depth information, then reconstructs volumetric information computationally by solving a sparsity-constrained inverse problem. Our diffuser consists of randomly placed microlenses with varying focal lengths. We show that by randomizing the microlens positions, a larger lateral field-of-view can be achieved compared to a conventional MLA; furthermore, by adding diversity to the focal lengths, the axial depth range is increased. To predict system performance based on diffuser parameters, we for the first time establish a theoretical framework as a design guideline, followed by numerical simulations to verify. Under both theoretical and numerical analysis, we demonstrate that our diffuser design provides more uniform resolution over a larger volume, both laterally and axially, outperforming the MLA used in LFM. We build an experiment system with an optimized lens set and achieve<3 um lateral and 4 um axial resolution over a 1000 x 1000 x 280 um^3 volume.


Introduction
Volumetric fluorescence microscopy with video-rate capture is essential for understanding dynamic biological systems. Single-shot 3D imaging with a 2D sensor is possible by using a hardware encoding procedure followed by a computational decoding procedure. Light field microscopy (LFM) [1,2] is one popular implementation of this, where a microlens array (MLA) is inserted in front of the microscope's image sensor to simultaneously capture 2D spatial and 2D angular information. The resulting 4D light field can be used for digital refocusing, perspective synthesis, or 3D reconstruction. However, using a 2D sensor to sample a 4D light field fundamentally requires trading off angular and spatial sampling, resulting in poor resolution. This is particularly undesirable in microscopy, where resolution is the key performance metric.
The resolution of a LFM can be improved by taking a deconvolution approach to image reconstruction [3,4]. In this case, the captured 2D measurement is used to directly solve for the 3D object, without the intermediate step of calculating a 4D light field. The method makes an implicit assumption of no occlusions, which holds well for most fluorescence microscopy applications. Deconvolution LFM can achieve significantly better (nearly diffraction-limited) resolution at some depth planes, but its performance degrades quickly with depth, even with extra phase masks added to address the problem [5]. Besides suffering from non-uniform resolution throughout the volume, deconvolution LFM incurs artifacts at the native focal plane and requires a computationally-intensive spatially-varying deconvolution procedure. These artifacts and the resolution loss can be mitigated by placing the MLA in an off-focus plane [6][7][8], but spatial variance and resolution inhomogeneity remain.
To solve some of these problems, an alternative configuration, termed Fourier light field microscopy (FLFM), places the MLA at the Fourier (pupil) plane of the objective, with the sensor one microlens focal length away [9][10][11]. This effectively splits the 2D sensor into a grid of sub-images, with each microlens imaging the sample from a different perspective angle. FLFM achieves more uniform resolution near the native focal plane and has a spatially-invariant point spread function (PSF) for improved computational efficiency. However, the fundamental trade-off between spatial and angular sampling remains, unless a camera array is used, greatly increasing cost and complexity [12]. Previous single-sensor implementations use 7-9 microlenses in the pupil plane, which require limiting the microscope's field-of-view (FOV) by a factor of (at least) 7-9× in order to avoid overlap of the sub-images at the sensor [9][10][11]. The resolution is more homogeneous than LFM, but still degrades quickly with depth.
Our Fourier DiffuserScope improves on FLFM by replacing the regular MLA with a diffuser consisting of randomly-spaced multi-focal microlenses. The new architecture has several advantages: 1) By using microlenses with multiple focal lengths [13][14][15], the PSF will have sharp features at a wide range of depth planes, improving the axial depth range and resolution homogeneity.
2) The randomness of the diffuser eliminates periodicity in the PSF and thus removes the ambiguities that required FOV limits in FLFM. Thus, we allow the microlens sub-images to overlap, then use compressed sensing algorithms [16,17] to reconstruct the 3D volume with the fully-available FOV, without trading off volumetric FOV and depth resolution. This 'best of both worlds' scenario is possible only when the sample is sparse in some domain, as is generally true in fluorescence microscopy. The resulting system achieves uniform resolution over a large volume, with imaging speed limited only by signal strength or camera frame rate.
Fourier DiffuserScope can be considered a variant on our previous methods for diffuser-based imaging with different architectures [18][19][20][21][22][23]. Here, we for the first time provide a theoretical framework for Fourier DiffuserScope design with given performance metrics (e.g. resolution, volumetric FOV), and we directly compare with FLFM in simulation. We demonstrate the advantages of both the random and multi-focal properties of our diffuser design for achieving bigger imaging volume and more uniform resolution than FLFM or a random uni-focal diffuser. Finally, we build an experimental system, designed in Zemax OpticStudio, that achieves 2-3 µm lateral and 4 µm axial resolution over a 1000 × 1000 × 280 µm 3 volume. We use the system to record a 3D video of a freely-moving C. elegans nematode at 25 fps.

Related Work
Besides variations of LFM, we compare our proposed approach to other methods for single-shot 3D fluorescence microscopy.
Multifocal microscopy methods simultaneously capture multiple in-focus images at different depths. This can be done by using beamsplitters and multiple cameras conjugate to different depth planes [24]; however, the resulting system is expensive and bulky. To acquire multiple depths with a single sensor, a distorted phase grating can be inserted in the pupil plane, with diffraction orders designed to project different axial layers onto different sub-images on the camera [25][26][27]. A spatial light modulator with superimposed Fresnel lenses [28] or a diffractive metalens [29] can achieve a similar result. For more than a few depth planes, multiplexed volume holography is a good option due to its low cross-talk [30]. In all these methods, however, the FOV is sacrificed by dividing the sensor into small tiles, and the number of depth planes is limited by the number of sub-images that fit on the sensor (generally less than 25 [27]).
PSF engineering for point localization refers to methods that use a coded mask in Fourier space, like our Fourier DiffuserScope, but with the image captured in image (real) space. This configuration results in a depth-dependant PSF (e.g. astigmatic [31,32], double-helix [33][34][35], tetrapod [36], etc.) that, along with localization algorithms, is well-suited to localize separated point-like molecules [31][32][33][34]36] but ill-posed when the object is continuous [35]. Because our Fourier DiffuserScope places both the phase mask and the sensor near the Fourier plane, we have a much larger PSF in which the energy is distributed into more features, so that the cross-correlation of laterally and axially shifted PSFs is lower than that of engineered PSFs. As a result, the design matrix of our random diffuser has nearly orthogonal columns due to the pseudo-random PSF, which is better suited to reconstruct a 3D volume from an undersampled 2D measurement according to the restricted isometry property [37] in compressed sensing theory. Lensless mask-based imaging, which uses a coded aperture for lens-free 2D [38] or 3D [39] imaging, first emerged in X-ray and gamma-ray systems [40,41] for 2D imaging in situations where lenses are difficult to implement. Amplitude coded masks are straightforward to design and easy to fabricate, but come with the inherent issue of blocking a lot of light, which leads to low signal-to-noise ratio (SNR) in the acquisition and noise amplification during reconstruction. Phase masks are more difficult to fabricate but have much better light efficiency [42].
Diffuser-based microscopy describes several different architectures that emerged from our original DiffuserCam [18], which is a lensless phase-mask-based imager that uses a diffuser for encoding 3D information. We have demonstrated 2D [21,43], 3D [18,20,22,23], and 4D light field imaging [19], flat [22] and miniature microscopy [23]. Our original diffuser was an off-the-shelf Gaussian pseudo-random phase mask with 100% fill factor, placed directly in front of the sensor. However, the resulting PSFs had substantial background light which amplifies noise during deconvolution. Hence, this work instead uses a designed diffuser made of randomly-located microlenses to focus light into high-contrast random multi-focal spots, providing good SNR across a large depth range, while maintaining the randomness of the PSF.

System overview
Our Fourier DiffuserScope consists of a diffuser (a phase mask with randomly-located multi-focal microlenses) in the Fourier plane of a microscope objective, with the sensor placed after, spaced by the average focal length of the diffuser (Fig. 1). Because the actual Fourier plane of the objective is physically inaccessible, we insert a relay system to image its pupil plane onto the diffuser. For each point emitter in the object space, the diffuser will produce a unique multi-spot PSF on the sensor. As compared to Gaussian diffusers or highly-scattered speckle patterns, our diffuser's PSFs concentrate light onto fewer pixels in order to improve SNR, while also ensuring that different points come into focus at different depths. Because our PSF is distributed and different for each point location within the 3D space, it is possible to reconstruct the whole volume from a single measurement with compressed sensing algorithms.
To model the image formation process, we divide the 3D volume into 2D slices, where each slice corresponds to a single depth plane. Neighboring slices are separated axially by less than half the axial resolution. Our experimental system is designed to ensure that the system PSF (the sensor measurement resulting from a single point source) for each depth can be modeled as shift-invariant. Thus, the measurement contribution from each 2D plane is the convolution between the object slice at that depth and the PSF at that depth. At different depths, the PSFs have different sizes, and different microlenses come into focus with our multi-focal design, so that each depth has a unique PSF. The final sensor measurement is the sum of the contributions from each 2D layer, assuming that the light from different fluorescent sources is mutually incoherent and there are no occlusions: Here, y is the intensity measurement on the sensor, h z is the measured PSF at depth z (acquired  Fig. 1. System overview for Fourier DiffuserScope and Fourier light field microscopy (FLFM). A diffuser or microlens array is placed in the Fourier plane of the objective (relayed by a 4f system) and a sensor is placed one microlens focal length after. From a single 2D sensor measurement, together with a previously calibrated point spread function (PSF) stack, 3D objects can be reconstructed by solving a sparsity-constrained inverse problem. Here, we compare three choices of diffuser/microlens array: FLFM with a uni-focal microlens array (MLA), random uni-focal microlenses (RUM) and our Fourier DiffuserScope with random multi-focal microlenses (RMM). Our RMM design provides a non-periodic PSF with different spots coming into focus at different planes, enabling 3D reconstructions with full FOV and high resolution across a wider depth range. Note that the PSF images (bottom right) are shown with a gamma correction of 0.4 for better visibility. during calibration), x z is the object intensity at depth z, and * represents 2D convolution over the transverse dimensions. Since this is a linear operation, we can write our model in matrix form where vector x is a vector representing the entire 3D volume and H is a matrix with columns containing the calibration measurements from every depth. Because our system is shift-invariant at each depth, we can compute Hx computationally efficiently using FFT-based convolutions. The forward model in Eq. 1 defines the data fidelity term of our inverse problem. Because we solve for 3D from a single 2D measurement without reducing the number of lateral pixels in the reconstruction, the inverse problem is under-determined. We solve it by using a compressed sensing algorithm that leverages the multiplexed nature of our measurements and assumes the sample is sparse in some domain. This sparsity-constrained inverse solver can be written as: Here · 2 2 is the data fidelity term, · 1 is a regularization term that enforces sparsity, and τ is a tuning parameter related to the sparsity level. We use 3D Total Variation (TV) sparsity here, with ∇ = [∇ x , ∇ y , ∇ z ] being the gradient operator along each dimension [44]. Since most fluorophores are isotropic in shape, while the resolution of our system is not necessarily isotropic, we add a weighting vector γ = (γ xy , γ xy , γ z ) where the TV value in the lateral and axial directions are weighted differently.
The system architecture for our Fourier DiffuserScope is essentially the same as FLFM, except that we use random multi-focal microlenses (RMM) instead of a uni-focal MLA. To demonstrate the advantages of the RMM over MLA, we theoretically and numerically compare performance by looking at the properties of their respective sensing matrices H. To separate out the effects of randomness and multi-focal, we also compare to random uni-focal microlenses (RUM). The MLA ( Fig. 1) focuses light from the native focal plane into a grid of sharp spots, providing an in-focus PSF with high SNR, but the periodicity causes ambiguity when shifted by more than one pitch, reducing the effective FOV. Randomizing the location of the microlenses (as with the RUM and RMM) breaks the ambiguity, enabling full-FOV imaging with our sparsity-constrained inverse solver. The RUM, like the MLA, uses the same focal length for all microlenses, resulting in blurred PSFs off-focus and therefore a shallow axial imaging range, particularly for high-NA microscopy. Assigning different focal lengths to each of the microlenses, as with our RMM (Fig. 1), extends the imaging depth range, within which there is always a subset of microlenses in focus. This trades SNR near the native focus plane for an increased depth range due to spreading high-frequency information across the whole volume. The RMM PSFs form nearly orthogonal columns of the design matrix H, enabling a compressed sensing 3D reconstruction with more voxels than there were pixels in the 2D measurement (50× more in our experimental prototype).

Diffuser design theory
In this section, we derive the relationship between diffuser design and system performance in terms of lateral resolution, axial resolution, FOV and depth range. The diffuser is characterized by the following parameters: the size of the diffuser L × L, the number of microlenses on the diffuser N 2 (giving an average of N microlenses in each transverse direction), the minimum focal length f min , the maximum focal length f max and the average focal length f average of the microlenses. We investigate three different phase masks (MLA, RUM and RMM) to be placed in Fourier configuration. All three designs have the same size and number of microlenses, but the locations and focal-length distributions are different. The MLA and RUM microlenses all have a single focal length f average , whereas the RMM microlenses all have different focal lengths, varying between f min and f max . The minimum and maximum focal lengths are designed to focus at the closest and furthest depth planes within the volume-of-interest. The rest focus at depth planes evenly spaced within that range, which means their focal lengths are dioptrically distributed between f min and f max . The system schematic and parameter definitions are shown in Fig. 2 and Table 1, respectively.

Lateral resolution
In Fourier configuration, each microlens forms a perspective view of the object. Consider the middle microlens in Fig. 2, which collects light from the yellow region, with acceptance angle α, from an in-focus point source (the orange dot in object space) and forms a diffraction-limited spot on the sensor. If all the microlenses have the same aperture size, other bundles of light from the same point source and with the same acceptance angle will reach other microlenses, focusing to separate spots on the sensor. With the MLA, these spots will form a grid, whereas with the RUM or RMM, they will form a random set of points at the sensor. The in-focus lateral resolution is determined by the size of the diffraction-limited spot beneath a single microlens, which is determined by the effective numerical aperture (NA), or the acceptance angle α, of a microlens sub-aperture. Since the the back pupil of the objective is divided into N microlenses in each direction, the effective NA (under paraxial approximation) is the objective NA divided by N: NA eff = NA obj /N. The diffraction-limited lateral resolution is given by the Rayleigh criterion: ,-.
For RUM and RMM, the aperture size of each microlens varies, so we determine expected resolution by the average sub-aperture size, which is designed to match the MLA effective NA, in order to compare the two situations fairly. To achieve the diffraction-limited optical resolution derived above, the sensor pixel spacing must be small enough to Nyquist sample the pattern after taking into account magnification. To quantify this requirement for our Fourier configuration, consider two point sources laterally separated by ∆d (the orange and purple dots in the object space of Fig. 2). After the 4f system of the objective and the tube lens, their intermediate images will be spaced by ∆d = ( f TL / f obj )∆d. Then, using similar triangles between the relay lens, the microlens plane and the sensor, the distance between the two chief rays on the sensor is ∆d = ( f average / f RL )∆d . Together, we have ∆d = M∆d, where M is the lateral magnification rate from the object space to the sensor: Thus, the pixel pitch p achieves Nyquist sampling when p ≤ M R lateral /2. Because we reconstruct 3D information, we also investigate how lateral resolution changes for objects away from the objective's native focal plane. For MLA and RUM, in which all microlenses have the same focal length, the minimum resolvable spot is determined by the circle of confusion; we define off-focus lateral resolution to be the radius of the circle of confusion, ∆c. To derive the off-focus resolution in our setup, we first calculate the defocus distance of the intermediate image for an off-focus point source (the green dot in object space in Fig. 2), which is scaled by the objective's magnification: ∆z = ( f TL / f obj ) 2 ∆z. Then, by applying the Newtonian form of the thin lens equation for the relay lens, we calculate the location of the second intermediate image of the green point, relative to the diffuser, after passing through the relay lens: z defocus = f 2 RL /∆z . This serves as the 'object' for the diffuser microlenses and z defocus is the 'object distance'. So, the circle of confusion size depends on z defocus , the diffuser focal length f average and the size of a single microlens L/N. The resulting expression describes how the lateral resolution degrades linearly with defocus distance: The primary advantage of using an RMM, as we do in Fourier DiffuserScope, is that it has multiple focal lengths. This means that, within a designed axial range, the spots from the subset of microlenses that are in focus at each depth will have the same size as the in-focus diffraction-limited lateral resolution derived in the previous section. Hence, the lateral resolution does not degrade with depth within the volume-of-interest. When the object moves beyond the designed range, the lateral resolution will increase as the defocus distance increases. A detailed analysis on the depth range is in Sec. 4.4.

Axial resolution
We define the axial resolution as the minimum axial distance between two point emitters that can be resolved. The off-axis microlenses have disparity, such that point sources from different depths are imaged to different lateral locations on the sensor; two points will be resolved if their images are separated by at least the diffraction-limited lateral resolution after magnification. Since the outermost microlens has the largest disparity angle, we analyze the limits on the topmost microlens in Fig. 2, whose center is h = ((N − 1)/2N)L away from the optical axis. Two point sources with the same lateral location are axially spaced by ∆z (the orange and green dots in object space, Fig. 2). In the previous section we have already related ∆z to z defocus . From the similar triangles formed by the relay lens, the diffuser and the sensor, we can calculate the lateral distance between the orange chief ray and the green chief ray on the sensor, ∆h = ( f average /z defocus )h. The minimum distance on the sensor for resolving the points is M R lateral , which sets the minimum value for ∆h, and the value of ∆z we solve for is the axial resolution R axial . Given the relation between the relayed pupil diameter and numerical aperture L = ( f RL / f TL )2NA obj f obj , the axial resolution is: To derive the axial resolution for an off-focus plane, we replace the R lateral term in Eq. 6 with the radius of the circle of confusion ∆c in Eq. 5, and thus the slope of defocused axial resolution as a function of depth is proportional to that of defocused lateral resolution.

Field-of-view
The FOV throughout the volume will be approximately the same as that at the native focal plane of the objective, so we analyze the in-focus FOV for each of the three microlens designs. The regular layout of the MLA results in a periodic PSF. When a point in the scene moves laterally by an amount that shifts the PSF by an integer number of pitches, the shifted PSF is nearly the same as the unshifted one; this creates ambiguities that cause the deconvolution to fail. To avoid this problem, a field stop is inserted to guarantee that the PSF shifts by less than one period over the FOV [10,11], which directly reduces the FOV by the number of microlenses, giving a FOV for the MLA-based FLFM: FOV MLA = FOV obj /N. The randomly located lenses in the RUM and RMM create PSFs with randomly-located spots that do not suffer from the ambiguity caused by periodicity. So, both RUM and RMM are able to reconstruct the whole objective FOV and we have FOV RUM = FOV RMM = FOV obj . This conclusion is based on the assumption that the optics are perfect; however, in reality aberrations can break the shift-invariance of the PSF in the peripheral FOV so that the final reconstruction has a smaller FOV or reduced resolution near the edges. In practice, we determine the FOV for random diffusers by calculating the similarity between on-axis and off-axis PSFs, described in more detail in Sec. 5.2.

Depth range
The depth range describes the axial distance over which the object can be reconstructed with diffraction-limited resolution. For the uni-focal MLA and RUM, the depth range is the depthof-field (DOF) of a single microlens, since all microlenses have the same focal length. The microscope DOF expression is the sum of a wave optics term and a geometrical optics term [45], and we use the effective NA to account for the partitioning of the back pupil plane into multiple microlenses: The main advantage of using multi-focal microlenses in the RMM for Fourier DiffuserScope is that the depth range will be much larger, since the DOFs of different microlenses are offset. The RMM can be designed for the largest possible depth range by ensuring that the focus positions of different microlenses are separated axially by their DOF; thus, the upper bound of the depth range is the product of a single microlens' DOF and the number of microlenses, N 2 × DOF microlens .
To design such a RMM to cover a depth range from −∆z to ∆z, the maximum and minimum focal lengths are designed to focus on the farthest or nearest depth planes: The remaining focal lengths are dioptrically distributed between f min and f max to provide equally-spaced focus planes in the object space. In practice, however, because the microlenses have different sizes and shapes, there will be variation in the resolution of different microlenses. To balance the gain of depth range and the stability of performance, in practice we design the DOFs to overlap by half and the depth range to cover half of its upper bound.

Simulation results
We use simulations to numerically validate the design theory derived in the previous section, and to demonstrate the advantage of using RMM over MLA and RUM. We set the target performance to be ∼ 2 µm resolution across a ∼ 200 µm depth range using a 20×, 1.0NA objective lens ( f obj = 9 µm, NA obj = 1.0, FOV obj = 1.1 mm, D = 18 mm). The design wavelength is λ = 510 nm for common green fluorescent calcium indicators. The tube lens and the relay lens form a 1 : 1 relay system to conjugate the back pupil plane onto the phase mask, so the diffuser side length equals the pupil diameter (L = 18 mm). Calculated from Eq. 3 and Eq. 6, the diffuser has at most N = 5 microlenses in one transverse direction, resulting in an effective NA of 0.2 and predicted resolution R lateral = 1.56 µm and R axial = 1.94 µm. The average focal length of the RMM ( f average = 58.5 mm), matched to the focal lengths of the MLA and RUM, is chosen to achieve a total magnification of M = 6.5×. For the RMM, with our goal of ±∆z = ±100 µm, the microlens focusing at the nearest and farthest depth planes have f min = 54.6 mm and f max = 63.1 mm, respectively. The focal lengths of the remaining 23 microlenses are dioptrically distributed between f min and f max . The surface height of the three phase mask designs are shown in Fig. 1. The centers of the randomly spaced microlenses are generated from a uniform distribution, under constraint that the distance between adjacent centers is at least 70% of the microlens pitch size. Then a spherical surface grows around the center by considering its focal length and the refractive index (n r = 1.56 for photopolymer), and we take the point-wise maximum surface height to form the final diffuser with 100% fill factor. The sensor is located at the distance of the average focal length behind the diffuser with 2 µm pixel size so that Nyquist sampling of the diffraction-limited pattern is achieved.
Our wave-optics simulation framework models light propagation from the object to the sensor. From a point source location in the object space, we calculate the spherical wavefront at the the back focal plane, then multiply by the apodization function of the objective to get the wavefront distribution at the pupil plane. The wavefront at pupil is then multiplied by the transmission function of the diffuser/MLA, then propagated to the sensor using the angular spectrum method [46]. The resulting PSFs are shown in Fig. 1 for each case. The in-focus PSFs are the measured intensity with a point source at the native focal plane of the objective; the defocus PSFs depicted in Fig. 1 are the measured intensity when the point source is off focus by 100 µm towards the objective. For both uni-focal designs (MLA, RUM), all the lenslets are in-focus or out-of-focus simultaneously, while for RMM each microlens comes into focus at a different plane.

Resolution
To characterize the lateral and axial resolution, which vary with depth, we reconstruct volumes from acquisitions with two point sources at varying separation distances. After 8 iterations of Richardson-Lucy deconvolution [47,48], we consider the two points resolved when there is at least a 20% intensity drop between neighboring points, as in the Rayleigh criterion. For lateral resolution, the two points are placed on the same depth plane with separation only in the x-y direction; for axial resolution, the two points are both on the optical axis and symmetrically set apart from the designated depth plane. The results in Fig. 3 compare the reconstruction resolution for the three diffuser/MLA designs, with comparison to the theory presented in Sec 4.
At the native focal plane (z = 0 µm) the MLA has a lateral resolution of 0.6 µm and the RUM has a lateral resolution of 1.1 µm (Fig. 3(a)), somewhat better than the predicted R lateral = 1.56 mm owing to deconvolution. However, the resolution of both uni-focal designs (MLA, RUM) degrades rapidly with depth; based on Eq. 5, the slope of the resolution with depth is 0.13 laterally and 0.1625 axially. The lateral resolution of our Fourier DiffuserScope (RMM) remains relatively steady over a large depth range (z = −80 µm to z = 90 µm), varying between 1.4 ∼ 2.6 µm.
The axial resolution ( Fig. 3(b)) follows similar trends. The highest axial resolution for both MLA and RUM is 1.75 µm at the native focal plane, which is somewhat better than our theoretical prediction of R axial = 1.94 µm (Sec. 4.2). The axial resolution of RMM oscillates between 2.0 ∼ 3.8 µm within a 170 µm depth range. Thus, we conclude that our RMM design, relative to the MLA and RUM, slightly sacrifices lateral and axial resolving power at the native focal plane, but gains uniformly high performance across a large imaging volume.

Field-of-view
To compare the FOV of the three different designs, we simulate and reconstruct a 2D phantom that fills the objective FOV (1.1 × 1.1 mm 2 ), placed at the native focal plane of the objective (where the uni-focal microlenses have the best performance). The theory in Sec. 4.3 predicts that the random diffusers (RUM, RMM) should be able to reconstruct the whole object, while the MLA will only reconstruct FOV MLA = 220 µm.
To simulate the imaging pipeline accurately, we take into account the aberration from planoconvex microlenses, which means that the PSF at the edges of the FOV will have subtle differences from the center PSF. We divide the object into 10 × 10 µm 2 blocks, convolve each block's content with its corresponding PSF (calculated at the center of the block) and then sum up the convolution result from all the blocks to get the simulated measured image. After the spatially-variant block-wise convolution is done, we add 5% Gaussian noise to generate the final measurement shown in Fig. 4(a), first row. The simulated MLA measurement has a periodic pattern because of its periodic PSF, while the diffuser measurements are more random.
To reconstruct the image, we deconvolve the simulated acquisition with a single on-axis PSF, assuming shift invariance. The reconstructions after 8 iterations of Richardson-Lucy deconvolution [47,48] are shown in Fig. 4(a), second row. No regularization is added (τ = 0 in Eq. 2) in order to compare the worst-case performance. The reconstruction using the MLA shows periodic replicas and large errors, due to the ambiguity of its PSFs, which are nearly the same for parts of the FOV that correspond to an integer shift of the periodic pattern. Restricting the FOV with a field stop eliminates this ambiguity at the cost of a severely reduced FOV. Both random diffusers, which do not have ambiguities in their PSFs, are able to reconstruct the whole object faithfully. The RUM has a slightly better peak signal-to-noise ratio (PSNR), since at the native focal plane all its microlenses are in focus, while only some of the RMM microlenses are. The error maps (error = reconstruction − ground truth) in Fig. 4(a) show significantly less error for the random microlenses designs. The reconstruction from the MLA distributes lots of energy into ghost structures. As for the random diffusers, the overall error is significantly smaller than that from MLA, and mainly at edges of objects, which can be reduced by adding TV regularization to the reconstruction.
The shift-variance introduced by the aberrations in our simulation will cause model-mismatch that reduces the performance of the system when using a single-PSF reconstruction. To quantify the shift-variance, we examine the cosine similarity (normalized cross-correlation) between the on-axis and off-axis PSFs (Fig. 4(b)). At each lateral shift location, we register the off-axis PSF to The FLFM (with MLA) reconstruction suffers from ghosting replicas (blue regions in the error map) due to its periodic structure. Both the RUM and the RMM reconstruct the phantom successfully. The error of the random diffusers mainly occurs at sharp edges, which can be fixed by adding total variation regularization. Error = reconstruction − ground truth. (b) Cosine similarity between the on-axis PSF and off-axis PSFs is used to quantify the shift-invariance assumption. The MLA has the highest similarity value (red), but its FOV is limited by the microlens pitch. The similarity of RUM (blue) and RMM (orange) are all above 75% across the full objective FOV. the on-axis PSF [49] and calculate the normalized cross-correlation between them. The similarity value for randomly-located microlenses is at least 75% across the FOV, which is sufficient for single-PSF deconvolution [18]. At the edges of the FOV, the similarity goes down because the aberration and distortion are most severe at the periphery. The MLA provides the highest values because all microlenses have a regular shape and are of the same size, but the benefits are not useful because the FOV is actually limited by periodicity, as described in Sec. 4.3. The randomly distributed microlenses have irregular borders where the surfaces of neighboring microlenses are merged, which increases the aberration, and the multi-focal diffuser adds additional defocus aberration as compared to the uni-focal diffuser.
If shift-variance is of concern or high accuracy near the periphery is important, we can correct model mismatch with a spatially-varying deconvolution algorithm [22]. This algorithm calibrates the PSFs at multiple points across the FOV and interpolates them to find the PSFs at each position. It should give better reconstructions, but at a cost of significantly longer computation times and larger memory requirements. In our experimental system, the highest angle incident onto the diffuser (13 degrees) is much smaller than the highest angle (50 degrees) in [22], and the shift-invariant assumption holds well. Thus, we choose to use only a single PSF for each depth for computational efficiency.

Depth range
The two-point resolution in Fig. 3 can be used to estimate the depth range. For the uni-focal designs (MLA, RUM), the lateral resolution remains below its predicted in-focus value over a range of ∼ 20 µm, which is in agreement with the depth range predicted by Eq. 7 using our system parameters: DOF microlens = 0.51×1.33 0.2 2 + 1.33×2 6.5×0.2 = 19 µm. The multi-focal design has stable performance from z = −80 µm to z = 90 µm, demonstrating the improvement of depth range over uni-focal designs.
To demonstrate the depth range differences, we reconstruct a long 3D spiral of point sources covering a 200 µm depth range (Fig. 5). This phantom contains 39 spheres of 2 µm diameter, with the first one at z = 95 µm and the last one at z = −95 µm, spaced axially by 5 µm (resulting in a 3 µm gap between spheres axially). The lateral distance between the spheres starts from 3 µm (gap is 1 µm) at the center of the spiral and increases up to 7 µm (gap is 5 µm) at the outer circle of the spiral. The lateral extent of the spiral (66 µm) stays within the restricted FOV of the MLA to avoid ghosting artifacts. We divide the 200 µm-long object into 200 layers of 2D slices, implement the forward model in Eq. 1 and add 5% Gaussian noise to the simulated measurement (Fig. 5). The measurement contains 25 sub-images of the spiral object, one for each microlens which observes the spiral from a specific angle; in this way the 3D information is encoded into a single 2D acquisition. The simulated measurements highlight why the depth range of the multi-focal RMM (Fourier DiffuserScope) is much larger than the uni-focal design cases. For the uni-focal (MLA, RUM) cases, only the waist area of the spiral is sharp in all the sub-images. For the multi-focal RMM, different spiral sub-images contain different sharp areas; hence, more in-focus information about the entire depth range passes into the measurement.
The 3D reconstructions for each of the three cases are shown in Fig. 5. We use a PSF calibration stack with fewer (100) PSFs than were used in the forward simulation, to mimic practical axial sampling rates of continuous objects. The sparsity parameter (τ = 1e − 5) is hand-tuned and remains the same for all cases. From the reconstructions, the benefit of using multi-focal microlenses is obvious. 36 spheres are clearly resolved (from z = −80 µm to z = 95 µm) with the RMM design, while only up to 13 spheres are resolved with the uni-focal designs (green shaded regions). The depth range of the three cases matches the depth range where the axial two-point resolution is under 5 µm. However, from both the two-point resolution result and the 3D object reconstruction result, the depth range of our RMM is slightly worse than predicted. This is likely due to most microlenses being out-of-focus at both ends of the targeted depth range, causing a lack of high-frequency information that is difficult to deconvolve.

Experimental system and results
We build an experimental Fourier DiffuserScope system using the RMM design from our simulation, with a 20×, 1.0 NA objective lens (Olympus XLUMPLFLN). The fluorescent sample is excited with blue light from a Xenon lamp light filtered by a band-pass emission filter (Semrock FF01-474/27). The emitted green light is filtered using a dichroic mirror (Semrock FF495-Di03) and an emission filter (Semrock FF01-520/35). Since the back pupil diameter is larger than the sensor size (Andor Zyla 4.2, sensor size 13.3 × 13.3 mm, pixel size 6.5 µm), we demagnify the pupil by 3.75× so that the full FOV can be recorded. The relay lens design is optimized using Zemax OpticStudio to reduce aberration (see Supplemental materials).
To fabricate our RMM diffuser, we make a negative mold by randomly indenting polished copper using ball bearings with diameters ranging from 10 mm to 16 mm. We then use polydimethylsiloxane (PDMS) to make a replica of the mold with convex-plano microlenses. This approximately achieves our diffuser design parameters of average focal length of 15.6 mm (after considering the relay system), with minimum and maximum focal lengths of 12.3 mm and 21.4 mm, respectively, giving a ∼ 200 µm depth range. The main fabrication errors come from deformation error during indentation and shrinkage of the PDMS material. These have opposite effects, since the indented deformation will have bigger diameter than the indenter while the material shrinkage gives smaller diameter, so they offset each other to some extent.
Fabrication errors should be accounted for during calibration, such that they do not cause model mismatch during deconvolution. The calibration PSF stack is acquired by placing a sub-resolution fluorescent bead at different depths, controlled by a motorized stage, then recording the intensity with a sensor located at the average back focal plane of the diffuser. In total, 350 PSF images were recorded with a 2 µm axial increment from 200 µm below the native focal plane to 500 µm above. When the point source moves more than 200 µm below the native focal plane, the overall PSF becomes so small that the out-of-focus blur from neighboring microlenses will merge into each other, which causes very noisy reconstruction, so we avoid placing objects in this region. Based on the PSF measurements, shown in Fig. 6a, there are ∼ 60 microlenses in the illuminated region of the diffuser. We apply the theory in section 4.1 and 4.2 to get the predicted lateral resolution of 2.4 µm and axial resolution of 2.8 µm.
We experimentally measured the two-point resolution (Fig. 6b) in order to benchmark the resolution and depth range of our Fourier DiffuserScope prototype. For practicality, measurements were synthetically generated by summing two experimental PSFs at different locations. The image was recovered by solving Eq. 2, and we then calculated the smallest distance at which the two points were still resolved, both laterally and axially, at each depth z. The increment of separation distance is 0.1 µm laterally and 1 µm axially. Across a depth range of 280 µm (from z = −150 µm µm to z = 130 µm), the lateral resolution fluctuates between 2.5 µm and 2.9 µm and the axial resolution is mostly 4 µm, close to their theoretical predictions. The depth range is larger than the design, suggesting that the the actual diameter range of the microlenses is wider than the ball bearings used.
We next demonstrate our system on a live adult C. elegans organism that is pan-neuronally expressing a GCaMP fluorescent indicator. The C. elegans is anesthetized by levamisole in M9 buffer and then loaded into a 1000 × 1000 × 100 µm 3 arena on a microfluidic chip which constrains the worms to move within the FOV of the objective. Since our method is able to reconstruct a 3D object from a single shot, the frame rate is only limited by the sensor. We recorded a raw video at 25 fps while the worm was freely moving (Fig. 6c). There is one C. elegans image behind every microlens and in total we see ∼ 60 overlapping C. elegans images, each from a different angle. Given that every location in the object space has a unique PSF on the sensor, we are able to deconvolve the overlapping images. Our deconvolution algorithm applies ADMM due to its fast convergence rate [50]. To save memory, we did not deconvolve the measurement with all the calibrated PSFs, instead we firstly use a coarse axial sampling of the PSFs to locate the object occupied depth range and then a small subset of fine sampling PSFs to reconstruct the object. The C. elegans in our raw video moves within a 80 µm depth range and we use 50 PSFs with 2 µm axial increment to cover the whole object. The reconstructed C. elegans from the corresponding frame is displayed in a color-coded depth image in Fig. 6c, showing the potential of our method to locate the neurons of the whole animal simultaneously in 3D. The full video is available in Visualization 1. The randomness of the diffuser also enables compressed sensing reconstructions with more voxels in the 3D result than pixels on the sensor. From a 4.2 mega pixel sensor, the reconstructed C. elegans volume contains 50× more voxels and the gain could increase to 140× if we deconvolve with all the available PSFs within the 280 µm depth range. With regards to the resolvable voxels, for our experimental system the lower bound equals the imaging volume divided by the worst lateral (2.9 µm) and axial (4 µm) resolution, which gives 10 mega resolvable voxels per frame.

Discussion and conclusion
Like light field microscopy, our Fourier DiffuserScope achieves single-shot 3D imaging with high light throughput. Like Fourier light field microscope, it has efficient computation and reduced artifacts near the objective's native focal plane. Beyond Fourier light field microscope, our Fourier diffuser design and sparsity-based inverse algorithm enables nearly uniform resolution across a large imaging volume. Here, we: (1) provide a theoretical design framework for calculating system performance (e.g. resolution, volumetric FOV) from the diffuser parameters (e.g. number of microlenses, focal lengths), (2) carry out theoretical and numerical comparison to demonstrate that our RMM can achieve more uniform resolution across a larger imaging volume than a uni-focal MLA (Fourier light field microscope) or a RUM.
In the future, our system can be further improved in several ways. (1) For fabricating diffusers, our current indentation method is fast and cheap, but imprecise. While as-built surface shape should be captured by the PSF calibration and computationally accounted for, using more time-consuming and expensive manufacture methods (e.g. diamond turning, injection molding, or two-photon polymerization) could improve the surface quality of the diffuser to precisely fabricate a pre-defined diffuser surface and guarantee the system performance. (2) In our forward model, we didn't take into account scattering or use any space-time models for video processing. Particularly for neural activity tracking applications, the temporal behavior of calcium indicators can be used as a constraint [51], and the scattering potential can be incorporated to enable deeper imaging with higher-fidelity reconstructions, further suppressing noise and enhancing resolution.
(3) Our first-principle derivation for our random diffuser design is made for a general-purpose imaging situation; however, for a specific type of data set, the microlenses locations and focal lengths distributions can be optimized using data-driven approaches for end-to-end learning.

Supplemental materials
In this section, we examine the design of the relay lens. The relayed pupil plane cannot have huge aberrations, otherwise the shift-invariant assumption does not hold. To investigate how aberration affects out system, we model the system in Zemax OpticStudio (Fig. 7). The objective is represented by a paraxial plane since its lens data is not publicly available. The tube lens data is a black box model downloaded from Thorlabs (Thorlabs TTL180-A). There are two restrictions in choosing the relay lens: the focal lengths should be < 60 mm to make the full FOV recorded and the clear aperture is at least ∼ 30 mm to prevent vignetting. However, we see a huge amount of aberration with the off-the-shelf lenses, even achromatic doublets. To demonstrate, Fig. 7 (b) shows the layout of an achromatic lens with 50 mm focal length (Edmund Optics 89682, clear aperture 39 mm) and the resulting footprint of the relayed pupil plane at different field heights. Since the phase mask is put at the relayed pupil plane, the drifting footprints mean that different areas on the diffuser will be utilized for different field locations, which breaks the shift-invariant forward model. To implement an aberration-free and cost-effective relay lens, we designed a customized lens set with off-the-shelf elements. Noticing that the function of the relay lens is similar to an eyepiece in a traditional microscope, we start from the Erfle eyepiece design due to its wide FOV and short working distance [52]. Since the Erfle contains one piece of uncommon convex-concave lens, we replace it with a convex-convex achromatic doublet plus a concave-plano lens. We alternately optimize the radii of all the surfaces and replace every lens with the most similar counterpart in the catalog. We also optimize the air gap between every two components which can be controlled by using a spacers inside the lens tube. After many iterations, we arrive at the final design described in the Table 2. In Fig. 7 (c), the new design's footprint of the peripheral field mostly overlaps with the center footprint and the aberration is greatly reduced. The relayed pupil size is 4.8 mm which means that the effective de-magnification rate of the back pupil plane is 3.75× and the effective focal length of the lens set is 48 mm.