Diffractive lensless imaging with optimized Voronoi-Fresnel phase

Lensless cameras are a class of imaging devices that shrink the physical dimensions to the very close vicinity of the image sensor by replacing conventional compound lenses with integrated flat optics and computational algorithms. Here we report a diffractive lensless camera with spatially-coded Voronoi-Fresnel phase to achieve superior image quality. We propose a design principle of maximizing the acquired information in optics to facilitate the computational reconstruction. By introducing an easy-to-optimize Fourier domain metric, Modulation Transfer Function volume (MTFv), which is related to the Strehl ratio, we devise an optimization framework to guide the optimization of the diffractive optical element. The resulting Voronoi-Fresnel phase features an irregular array of quasi-Centroidal Voronoi cells containing a base first-order Fresnel phase function. We demonstrate and verify the imaging performance for photography applications with a prototype Voronoi-Fresnel lensless camera on a 1.6-megapixel image sensor in various illumination conditions. Results show that the proposed design outperforms existing lensless cameras, and could benefit the development of compact imaging systems that work in extreme physical conditions.


Introduction
Imaging devices, such as photographic objectives and microscopes, have long relied on lenses to focus light and create a projection of the scene onto photosensitive sensors. In such designs, as in human eyes, the focal length of the lens presents a fundamental limit on the overall device form factor. In addition, various optical aberrations preclude the use of single lenses for wide field-of-view (FOV) imaging. Instead, carefully designed groups of compound lenses must be used to achieve good image quality.
With the ever-increasing demand for compact imaging optics, great efforts have been devoted to developing lensless cameras during the past few years. To date, two prevailing lensless approaches exist: natural compound-eye mimicry [1] and heuristic point spread function (PSF) engineering [2].
Analogous to insect eyes, various artificial compound-eye designs have been proposed to directly mimic the eye structures found in nature. Early artificial compound-eye structures tessellate regular Micro-Lens Arrays (MLAs) on planar or curved surfaces. Each lenslet is an independent unit that creates a small image. By tessellating an array of lenslets on a planar surface, TOMBO [3,4] produces final images by a backprojection algorithm that stitches sub-images together. To prevent optical cross-talk, a light-blocking layer has to be employed under each microlens, and the number of output image pixels is equal to the number of lenslets. The Gabor superlens [5] employs sophisticated multi-layer planar MLAs and aperture arrays along the optical path to re-arrange light rays on the sensor. Follow-up implementations either stitch the sub-images algorithmically (eCley [6]) or optically (oCley [7]). A recent artificial-ommatidia lensless camera [8] employs plasmonic structures to allow a larger FOV for planar compound eyes. Tessellating lenslets on curved substrates is more challenging, but yields a wide FOV. A digital compound-eye camera [9] tessellates a uniform elastomeric micro-lens array with stretchable electronics on a hemisphere to fully resemble arthropod eyes. CurvACE [10] achieves a 180° horizontal FOV and a 60° vertical FOV with polymer microlenses and flexible printed circuits. These methods focus more on the optical tessellation, with less attention to the computational reconstruction. The resolution is relatively low, and only simple scene targets can be imaged.
The other strategy considers the entire system from the perspective of the PSF it generates. Instead of mapping each scene point to a single focused spot on the image plane, as in conventional lens-based systems, such lensless cameras render each scene point as a distributed pattern that covers large areas of the image plane. The captured raw data is interpreted by computational imaging algorithms based on the physical model to reconstruct the latent image. Early implementations make use of amplitude masks [11] to create patterns induced by shadowing effects. In order to improve the numerical conditioning, separable masks have been proposed in a coded-aperture design [12] and in FlatCam [13] for optimal amplitude mask design. A Fresnel Zone Aperture (FZA) [14] has also been used for lensless imaging. These amplitude masks inherently suffer from low light efficiency. Phase-only lensless cameras improve the overall throughput without blocking light. A binary phase profile with tailored odd-symmetry gratings in PicoCam [15,16] produces spiral-shaped PSFs. An important pioneering work is DiffuserCam [17], where a non-optimized diffuser is used to generate caustic patterns as the PSF. A Perlin pattern was later introduced in PhlatCam [18] as a better heuristic PSF. Random lenslets have also been favored in 3D light field microscopy, such as Miniscope3D [19] and Fourier DiffuserScope [20]. A random lenslet diffuser has also been used in a lensless on-chip fluorescence microscope [21]. A custom microlens array (MLA) has been adopted as the single optical component in the computational miniature mesoscope [22,23] for fluorescence imaging. Very recently, a learned 3D lensless camera [24] co-designs the MLA and the neural network to achieve single-shot 3D imaging without the need for PSF calibration.
Here we report a new diffractive lensless camera with a spatially-coded Voronoi-Fresnel phase that improves the imaging performance by leveraging the benefits of both the compound-eye structure and PSF engineering. Inspired by the random yet uniform distribution of lenslets in apposition compound eyes [25], and motivated by an observation about the properties of engineered heuristic PSFs [18], we find that the image quality after algorithmic reconstruction is positively related to the sparse distribution of high-contrast bright spots in the PSFs, subject to the phase being physically realizable. In other words, lensless cameras favor PSFs with concentrated sparse patterns. From the perspective of Fourier optics, the Modulation Transfer Function (MTF) [26], the Fourier counterpart of the PSF, offers a more comprehensive measure to quantify this phenomenon. High-frequency details are better recoverable by the reconstruction algorithms if the cut-off frequency is kept as large as possible and the MTF is uniformly distributed in all directions in the Fourier domain. However, the MTF is a 2D function of spatial frequencies. We therefore define the MTF volume (MTFv) as a single-number metric for information maximization. A larger MTFv value encourages the system to transmit more information from hardware to software.
By applying this principle, we find that a certain number of individual point PSFs is an excellent match to the PSFs preferred in lensless imaging. It is therefore possible to engineer optimal PSFs from compound-eye mimicry. We devise a metric-guided optimization framework with modified Lloyd's iterations to find non-trivial, non-heuristic solutions to this problem. The resulting PSF is a collection of diffraction-limited, directionally diverse spots with optimized spatial locations and total number. They do not work individually as in conventional compound eyes; rather, their union forms the equivalent PSF to be coupled with computational algorithms to reconstruct sharp images. With significantly lower requirements on the optics and more flexibility in the computation, the proposed method offers a route to high-quality 2D photography devices that work in extreme physical conditions, such as wearable cameras, in-cabin monitoring, and capsule endoscopy. We expect the emerging importance of flat optics imaging to trigger unconventional imaging applications across various disciplines.

Overview
An overview of the proposed Voronoi-Fresnel lensless camera is shown in Fig. 1. Inspired by the delicate distribution of the ant's ommatidia (Fig. 1a), our lensless camera consists of an optimized phase element just a few millimeters above the image sensor (Fig. 1b). The phase features an irregular array of quasi-Centroidal Voronoi cells containing a base first-order Fresnel phase function (Fig. 1c). By applying the proposed design principle, our Voronoi-Fresnel phase yields an optimized PSF (Fig. 1d) and MTF (Fig. 1e). An image reconstruction pipeline (Fig. 1f) is then employed to recover high-quality images from the non-interpretable raw data. The proposed Voronoi-Fresnel lensless camera consists only of a phase element in close proximity to the sensor (the distance is a few millimeters). The zoom-in illustrates the detailed PSF and Bayer filters. (c) The first-order Fresnel phase function is the building block of the camera. The cell centers are distributed in a quasi-Centroidal Voronoi Tessellation across the 2D plane. (d) The PSF is an array of diffraction-limited spots with optimal spatial locations (intensity enhanced for better visualization). (e) The MTF (in log scale) is uniform across the Fourier domain. (f) The image reconstruction pipeline converts the uninterpretable raw data (left) into a high-quality image (right).

Voronoi-Fresnel phase
Our Voronoi-Fresnel phase is composed of a base first-order Fresnel phase that is duplicated in various sub-regions on the 2D plane (Fig. 1c). The base Fresnel phase at the design wavelength λ is defined as

φ(x, y) = mod( −k(x² + y²)/(2z), 2π ),    (1)

where (x, y) are the Cartesian coordinates on the phase plane, k = 2π/λ is the wave number, and z is the distance from the optics to the image sensor. This phase function is the first-order approximation of an ideal lens, producing a diffraction-limited spot in the paraxial region. We divide the design space into regions that form a complete tessellation of the whole area. A typical tessellation is a Voronoi diagram, which is a collection of sub-regions that contain points closer to the corresponding generating sites than to any other sites. Each sub-region, also known as a Voronoi cell V_i, features a center location and a few vertices. The origin of the Fresnel phase function coincides with the center location, and the polygon determined by the vertices creates a distinct aperture for that cell. We refer to each Voronoi cell with the Fresnel phase as a Voronoi-Fresnel cell. The entire Voronoi-Fresnel phase is a collection of all the constituent Voronoi-Fresnel cells,

Φ(x, y) = Σ_i A_i(x, y) φ(x − x_i, y − y_i),    (2)

where (x_i, y_i) are the center locations of all the Voronoi-Fresnel cells. The aperture function A_i is defined by the vertices of the i-th sub-region, where A_i = 1 inside V_i, and A_i = 0 outside. The union of all cells is the whole region, and any two different cells have no intersections.

PSF and the MTFv metric
The PSF is characterized by the Fresnel diffraction pattern of the entire Voronoi-Fresnel phase on the image sensor, which is the squared magnitude of the diffracted optical field. The panchromatic PSF can then be calculated as an integral of the spectral PSFs over the effective spectral range [λ_min, λ_max],

PSF(x, y) = ∫_{λ_min}^{λ_max} | F_z{ e^{iΦ_λ(x, y)} } |² dλ,    (3)

where F_z{·} is the Fresnel propagator for a distance z, and Φ_λ is the Voronoi-Fresnel phase at wavelength λ.
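A minimal way to evaluate one spectral PSF in Eq. (3) is a transfer-function Fresnel propagator implemented with FFTs. The sketch below assumes a phase-only element, a single wavelength, and periodic boundaries; a panchromatic PSF would sum this over the simulated spectral bands. All names are illustrative, not the paper's code.

```python
import numpy as np

def fresnel_psf(phase, pitch, wavelength, z):
    """Spectral PSF of a phase-only element via the Fresnel transfer function.

    phase: (H, W) phase map in radians; pitch: sample pitch in metres.
    Returns the intensity on a plane at distance z, normalized to unit sum.
    Valid when z is small enough for the paraxial transfer function to be
    adequately sampled (roughly z <= N * pitch**2 / wavelength).
    """
    H, W = phase.shape
    fx = np.fft.fftfreq(W, d=pitch)
    fy = np.fft.fftfreq(H, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Paraxial (Fresnel) angular-spectrum transfer function, constant
    # on-axis phase dropped since only the intensity is needed.
    Hz = np.exp(-1j * np.pi * wavelength * z * (FX ** 2 + FY ** 2))
    field = np.exp(1j * phase)
    out = np.fft.ifft2(np.fft.fft2(field) * Hz)
    psf = np.abs(out) ** 2
    return psf / psf.sum()
```

Because |Hz| = 1, the propagation conserves energy, so the normalization only fixes the overall scale.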
Since the entire phase function is a collection of the base Fresnel phase, we further investigate the individual PSFs that are generated by the polygonal sub-apertures. As demonstrated in Supplemental Document 1, when adjacent Fresnel phase centers are at the centroids of their cells, the cross-interference between individual cells is negligible, with a maximum error of less than 1%. The equivalent PSF can be approximated by the union of the individual PSFs,

PSF(x, y) ≈ Σ_i PSF_0(x − x_i, y − y_i),    (4)

where PSF_0 is the centered PSF as if the aperture were located at the origin. Note that in the simulation and optimization below, we do not simulate individual PSFs separately, but use the entire constructed phase function to obtain the panchromatic PSF. These irregular apertures are significant, since the resulting diffraction patterns generate a set of compact directional filters that depend on the aperture geometries. We show this directional filtering effect in Fig. 2. The center of a base Fresnel phase is first shifted randomly off the origin. A polygonal aperture is then imposed on the phase profile. Here, without loss of generality, we illustrate four simple polygons: a triangle, a rectangle, a pentagon, and a hexagon (Fig. 2a). The corresponding panchromatic PSFs are rendered in false color in Fig. 2b. Long tails show up in the PSFs in the directions perpendicular to the polygon edges due to diffraction. A collection of such directional spots resulting from these cells makes up the effective PSF.
It is more effective to evaluate the PSF through its frequency counterpart, the MTF. With a higher MTF across all frequencies, the imaging system preserves more detail in the scene, a property that has long been used to evaluate the quality of lens-based optics. It can be seen from the log-scale panchromatic MTFs in Fig. 2c that the long tails in the spatial domain cause the MTF to drop in certain directions, while information is maintained in other directions. For example, in the triangle and rectangle cases, the elongated directions in the PSF are compressed in the MTF, whereas the squeezed directions in the PSF are stretched in the frequency domain. This is a result of the scaling property of the Fourier transform. As the polygon edges become more regular (e.g., the pentagon and hexagon examples), the directional filtering tends to be more isotropic. A circular aperture, as in conventional lenses, would result in no directional filtering effect at all.
The distribution of the panchromatic PSF, or equivalently the MTF, is key to the imaging performance. However, the MTF is a 2D function in the Fourier domain, which makes it difficult to use directly as a metric to optimize the optical element. We propose the normalized panchromatic MTF volume (MTFv) as a single-number metric to evaluate the system performance,

MTFv = (1/|Ω|) ∫∫∫ MTF(f_x, f_y, λ) df_x df_y dλ,    (5)

where f_x and f_y are the Fourier frequencies, and |Ω| is the area of the design space. The MTF is the magnitude of the Fourier transform of the spectral PSF, i.e., MTF(f_x, f_y, λ) = |F{PSF_λ(x, y)}|, with F{·} being the Fourier transform.
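On a sampled grid, the integral in Eq. (5) divided by |Ω| reduces to the mean of the discrete MTF samples. A minimal single-wavelength sketch (the panchromatic metric would average this over the simulated spectral bands; the function name is ours):

```python
import numpy as np

def mtfv(psf):
    """Normalized MTF volume: mean MTF magnitude over all sampled frequencies.

    The PSF is first normalized to unit sum so that MTF(0, 0) = 1, making
    the metric independent of the overall intensity scale.
    """
    psf = psf / psf.sum()
    mtf = np.abs(np.fft.fft2(psf))
    return float(mtf.mean())
```

A single-pixel (delta) PSF attains the maximum value of 1, while any spreading of the PSF lowers the mean MTF, matching the intuition that concentrated sparse spots carry more recoverable information.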
The MTFv is not only simple to compute, but also shares a connection with the well-established optics metric, the Strehl ratio, which is widely used in image quality evaluation. As derived in Supplemental Document 1, the Strehl ratio can be approximated by integrating the MTF, which serves as an upper bound [27]. Since the reference diffraction-limited quantity only affects the denominator, we can ignore it when used as a figure-of-merit, leading to the above MTFv metric. In addition, MTFv is in principle a generic metric that imposes no restrictions on the intensity distributions of the PSFs, so it is applicable to all phase-type lensless designs. A larger MTFv value encourages more information to be recorded by the optical system, so we can employ it as a guide to seek optimal PSFs generated by the Voronoi-Fresnel phase.

Optimization and properties
MTFv is a highly nonlinear and non-convex function of the number of Voronoi regions N and their center locations (x_i, y_i). It is challenging to optimize the MTFv over the entire parameter space, so we do not aim to find a closed-form solution that maximizes MTFv. Instead, we devise an optimization framework to search for a feasible solution. We show in Supplemental Document 1 that the effective MTF largely depends on three factors: the diffraction by each aperture; the phase delay terms in the Fourier domain introduced by the spatial shifts from the centered PSFs; and the total number N of Voronoi-Fresnel cells.
The smallest dimension in the aperture geometry plays an important role due to diffraction. The phase delay terms vary dramatically as the distances between each pair of center locations change. Given a certain number of sites within a fixed area, there are infinitely many Voronoi tessellations, and their MTFv values vary significantly. Taking the extreme case of N = 1, we get a compact single-spot PSF, but it is impractical to realize such an optical element that covers the whole image sensor. On the other hand, when N approaches infinity, the PSF becomes pixel-wise uniform, and carries no useful information. Therefore, there exists an optimal number N that maximizes the MTFv. Our solution to the Voronoi-Fresnel phase optimization is based on the conjecture that the optimal center locations require a uniform distribution of the Voronoi cells over the 2D design space, while keeping some degree of irregularity in each Voronoi region to diversify the spatial filtering of individual PSFs. To take the above factors into account, our optimization framework employs a two-step search scheme. In the first step, we fix N and adopt a quasi-CVT routine to maximize the panchromatic MTFv, as summarized in Algorithm 1. The second step is then a parameter sweep to find the best N. In the first step, we initialize a set of random coordinates. A Voronoi tessellation is constructed to produce the Voronoi-Fresnel phase function by Eq. (2), and the corresponding MTFv is computed by Eq. (5). In each iteration, we compute the centroid of each cell and update the center coordinates with the centroids. A new Voronoi tessellation is constructed, leading to a new MTFv for the updated Voronoi-Fresnel phase. The new site coordinates and the corresponding MTFv are recorded only if the current MTFv is larger than the existing maximum MTFv; otherwise they are discarded.
After each iteration, we compute the root-mean-square (RMS) distance between the new sites and the previous sites as the residual error. The iterations terminate when the residual error falls below a preset tolerance, or after a maximum number of iterations maxiter is reached. To make the algorithm fabrication-aware, we set the terminating tolerance tol to one hundredth (0.01) of the smallest feature size that can be fabricated.
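The geometric core of Algorithm 1, Lloyd's centroid updates with RMS-based termination, can be sketched as follows. For brevity this omits the MTFv bookkeeping (recording the best-scoring tessellation), and the grid-based centroid computation is our simplification; function names are illustrative.

```python
import numpy as np

def lloyd_step(sites, shape):
    """One grid-based Lloyd iteration: move each site to the centroid of
    the grid samples falling in its Voronoi cell."""
    ys, xs = np.indices(shape)
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    d2 = ((pts[:, None, :] - sites[None, :, :]) ** 2).sum(-1)
    label = d2.argmin(axis=1)                 # nearest-site assignment
    out = sites.copy()
    for i in range(len(sites)):
        sel = label == i
        if sel.any():
            out[i] = pts[sel].mean(axis=0)    # cell centroid
    return out

def quasi_cvt(sites, shape, tol=0.01, maxiter=100):
    """Iterate Lloyd steps until the RMS site displacement drops below tol
    (fabrication-aware tolerance) or maxiter is reached."""
    sites = np.asarray(sites, dtype=float)
    for _ in range(maxiter):
        new = lloyd_step(sites, shape)
        rms = np.sqrt(((new - sites) ** 2).sum(axis=1).mean())
        sites = new
        if rms < tol:
            break
    return sites
```

Stopping early (before full CVT convergence) is what preserves the irregularity the paper exploits; running to convergence would drift toward the regular hexagonal pattern.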
An exemplary quasi-CVT optimization is shown in Fig. 3a, followed by a parameter sweep step in Fig. 3b. This simulation uses 240 × 160 sensor pixels with a pixel pitch of 3.45 μm. The distance is 2 mm from the optical plane to the sensor plane. The quasi-CVT optimization is initialized by a random set of points as the center locations of the base Fresnel phase (green dots). The green edges mark the individual apertures. As the optimization evolves, the Voronoi-Fresnel cells tend to scatter uniformly in the design space, while a certain degree of irregularity of the apertures is maintained. When the optimization converges, an optimal phase profile is achieved. See Visualization 1 for an animation of the optimization process. To account for random initialization effects, for each N we run the quasi-CVT steps 100 times with different initializations, and take the mean value as the resulting MTFv. We take the standard deviation among the 100 runs as the uncertainty (Fig. 3b). After repeating the optimization for different N values in the same manner, the mean MTFv can be plotted against N, and a polynomial curve is fitted to the data. The maximum MTFv is found to occur at N = 23 in this example. The corresponding Voronoi-Fresnel phase is selected as the final result. A notable difference between our quasi-CVT algorithm and conventional CVT is that a certain degree of irregularity is preserved in quasi-CVT, while CVT tends to converge to a regular hexagonal grid. Theoretically, the optimal CVT in a 2D plane results in a hexagonal pattern [28]. To demonstrate that the optimization is non-trivial, we compare the optimized result from the above algorithm against a regular rectangular and a hexagonal Voronoi-Fresnel phase in Fig. 4. The design area is 256 × 256 pixels, and the number of cells is 64 in all three cases. The two regular grids are equivalent to common off-the-shelf microlens arrays.
The MTFv on the regular rectangular grid (MTFv = 1.14) or the hexagonal grid (MTFv = 1.19) is significantly lower than that of the optimized result (MTFv = 2.29). Periodic patterns exist in both regular grids. This is attributable to the fact that the regular grids lack diversity in the directional filtering. Therefore, the optimization is non-trivial, and simple, regular off-the-shelf microlens arrays are not well-suited for this purpose. The parameter sweep in the second step of the algorithm is time-consuming, especially when the design space is large. We can facilitate the search for the best number of sites at full scale by extrapolating from smaller scales. We evaluate the optimal N at various common scales for aspect ratios of 1:1, 4:3, 3:2, 16:9, and 21:9, and plot the MTFv against the design area in Fig. 5. See detailed analysis in Supplemental Document 1. The result indicates our algorithm is linearly scalable with respect to the design area, regardless of the aspect ratio.

Image reconstruction
We formulate the image formation process as a convolution between the ground-truth image and the PSF. It can be expressed in matrix-vector product form,

y = Ax + n,    (6)

where x ∈ R^{3N} is the ground-truth image with N pixels in three color channels, A ∈ R^{3N×3N} is a matrix that represents convolution with the color PSF, y ∈ R^{3N} is the captured raw data, and n is the additive noise that degrades the data.
There are various methods to recover the image based on the above image formation model. A straightforward way is to solve an inverse problem with effective image priors in an optimization framework. Recent data-driven image reconstruction methods make use of end-to-end deep neural networks to infer the latent image. They usually require a large dataset captured with a specific design, such as DiffuserCam [29,30] or PhlatCam [31], and do not generalize from one hardware setup to another. In this paper, we focus on the advantages of the optical optimization, so we adopt image deconvolution with a Total Variation (TV) regularizer to demonstrate and compare the optical advantages against prior works. The TV regularizer encourages sparsity of edges in natural images, which has proved effective in lensless image reconstruction [11,17]. Specifically, we solve

arg min_x (1/2) ‖Ax − y‖₂² + μ‖Dx‖₁,    (7)

where ‖·‖₂ and ‖·‖₁ are the ℓ2-norm and ℓ1-norm respectively, D is the finite difference operator, and μ is a penalization weight. An effective way to solve this problem is the Alternating Direction Method of Multipliers (ADMM) [32]. The detailed implementation is presented in Supplemental Document 1.
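Under circular boundary conditions, the x-update of ADMM for Eq. (7) has a closed form in the Fourier domain, and the z-update is element-wise soft-thresholding. Below is a minimal single-channel sketch with an anisotropic TV penalty, our own simplification rather than the paper's implementation (see its Supplemental Document 1 for the actual method).

```python
import numpy as np

def admm_tv_deconv(y, psf, mu=1e-3, rho=1.0, iters=50):
    """ADMM for min_x 0.5*||k * x - y||_2^2 + mu*||Dx||_1 (circular conv).

    y: observed image; psf: blur kernel, same shape as y, centred.
    """
    H = np.fft.fft2(np.fft.ifftshift(psf))     # blur in the Fourier domain
    # Horizontal/vertical finite differences as circular convolutions.
    dx = np.zeros_like(y); dx[0, 0], dx[0, -1] = 1.0, -1.0
    dy = np.zeros_like(y); dy[0, 0], dy[-1, 0] = 1.0, -1.0
    Dx, Dy = np.fft.fft2(dx), np.fft.fft2(dy)
    denom = np.abs(H) ** 2 + rho * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
    denom = np.maximum(denom, 1e-12)           # guard against division by zero
    Hty = np.conj(H) * np.fft.fft2(y)
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    zx = zy = ux = uy = np.zeros_like(y)
    x = np.zeros_like(y)
    for _ in range(iters):
        # x-update: quadratic subproblem solved exactly in the Fourier domain.
        rhs = Hty + rho * (np.conj(Dx) * np.fft.fft2(zx - ux)
                           + np.conj(Dy) * np.fft.fft2(zy - uy))
        x = np.real(np.fft.ifft2(rhs / denom))
        gx = np.real(np.fft.ifft2(Dx * np.fft.fft2(x)))
        gy = np.real(np.fft.ifft2(Dy * np.fft.fft2(x)))
        # z-update: proximal operator of the l1 norm (soft-thresholding).
        zx, zy = soft(gx + ux, mu / rho), soft(gy + uy, mu / rho)
        # Dual (scaled multiplier) update.
        ux, uy = ux + gx - zx, uy + gy - zy
    return x
```

With a delta PSF and a tiny μ, the solver should return the observation almost unchanged, a useful sanity check before deconvolving real lensless captures channel by channel.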

Simulation results
We first validate our method in simulation, and compare its performance with recent phase-type lensless cameras that share the most similar inspirations, DiffuserCam [17] and PhlatCam [18]. To ensure a fair comparison, all simulations are performed under the same conditions. We assume the sensor pixel pitch is 4 μm, and the resolution of the phase element is 1 μm, which provides a good sampling rate between the two. The sensor is 256 × 256 pixels, and the Voronoi-Fresnel phase is 1024 × 1024 pixels. The distance between the optical element and the sensor is 2 mm. The PSFs for the Voronoi-Fresnel design and the PhlatCam design are obtained through a full-spectrum simulation, from 400 nm to 700 nm with an interval of 10 nm (31 spectral bands in total). We are not able to simulate a diffuser PSF, so we adopt the closest PSF from DiffuserCam, and assume the intensities are the same for the three color channels. To take the spectral variations into account, we use multispectral image data [33,34] for the full-spectrum simulation. Example raw data with the respective PSFs are shown in Fig. 6a. The raw data from our lensless camera is more structured than the flat patterns of the other two cases. For a complete comparison, we also include the regular Fresnel array configurations in both rectangular and hexagonal grids, as shown in Fig. 4. The corresponding reconstructed images are shown in Fig. 6b. We evaluate the performance by Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) in Fig. 7. The results show that our Voronoi-Fresnel design outperforms existing methods in both PSNR and SSIM, owing to the MTFv-guided optimization. The optimized Voronoi-Fresnel configuration also performs better than the two reference regular Fresnel configurations.

Prototype design
A prototype Voronoi-Fresnel lensless camera is built with a board-level camera (FLIR BFS-GE-16S2C-BD2) and a custom fabricated phase element. The image sensor (Sony IMX273) has 1440 × 1080 pixels, with a pixel pitch of 3.45 μm. The optical distance in-between is 3 mm, where the cover glass on the sensor has a thickness of 0.7 mm. The sensor's angular response covers approximately ±20° × ±15° FOV, so the shift-invariance of the PSF should be maintained within this angular range. Marginal cells beyond the FOV are excluded for this purpose. The optimized phase profile is shown in Fig. 8a. The excluded cells are empty of the Fresnel phase. A calibration PSF is shown in Fig. 8b with a zoom-in area highlighting three adjacent spots. The optical element is fabricated on a 0.5-mm-thick fused silica substrate by a combination of photolithography and reactive-ion etching techniques. At a nominal wavelength of 550 nm, the 2π phase modulation corresponds to a total depth of 1200 nm, which is discretized into 16 levels for fabrication. The lateral fabrication resolution is 1.15 μm, so each sensor pixel is sampled by 3 × 3 optical pixels. The optical element is assembled onto the sensor by 3D-printed mechanical mounts (Fig. 8c). An optical microscope image (Nikon Eclipse L200N, 5×) and a 3D measurement (Zygo NewView 7300, 20×) of the fabricated sample are shown in Fig. 8d. Details of the prototype design and fabrication can be found in Supplemental Document 1.
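As a side calculation, the etch depth follows from the wrapped phase: a 2π modulation needs λ/(n−1) of material, about 1.2 μm for fused silica (n ≈ 1.46) at 550 nm, consistent with the 1200 nm figure above. The helper below is a hypothetical sketch of the 16-level quantization, not the authors' fabrication pipeline.

```python
import numpy as np

def phase_to_etch_depth(phase, wavelength=550e-9, n_substrate=1.46, levels=16):
    """Quantize a wrapped [0, 2*pi) phase map into discrete etch depths.

    A 2*pi phase step at the design wavelength requires a depth of
    wavelength / (n - 1) in a substrate of refractive index n
    (~1.2 um for fused silica at 550 nm).
    """
    full_depth = wavelength / (n_substrate - 1.0)
    level = np.floor(phase / (2 * np.pi) * levels).astype(int)
    level = np.clip(level, 0, levels - 1)      # guard against phase == 2*pi
    return level / levels * full_depth
```

Multiplying by the design-wavelength-dependent full depth is what ties the nominal wavelength of 550 nm to the quoted 1200 nm total etch depth.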

Prototype results
In the following, we present the raw data at full sensor resolution; the reconstructed images are cropped to 640 × 480 pixels to cover the ±20° × ±15° FOV. The object distance is about 30 cm from the lensless camera. The exposure time is set so that no pixels saturate (peak value ∼240/255 in 8-bit mode). Gamma correction is disabled during capture, and raw data are used in the reconstruction.
We evaluate the imaging performance in two illumination scenarios. The first is self-illuminating images displayed on a computer monitor. Self-illuminating objects emit light in a confined angular range, so little stray light or cross-talk is introduced in the captured data. Fluorescence imaging would be a good application for this mode. Figures 8e and 8f show the captured raw data. The reconstructed image in Fig. 8i reveals fine details in the camera and the human face. A zoom-in area of the camera object is shown in Fig. 8m. We evaluate the spatial resolution with a USAF resolution target in Fig. 8j, with two cross lines (yellow and magenta) plotted in Fig. 8n for the RGB color channels. Line pairs in the central area are clearly visible.
The second is real objects under ambient illumination (Figs. 8g and 8h). This is a more realistic scenario for photography outside of a lab environment or in applications like endoscopy. Ambient illumination poses a severe challenge for existing flat or lensless imaging systems, since "stray" light can enter the camera at angles outside the nominal field of view, which is then not modeled by the reconstruction algorithm. We show the reconstructed toy car image in Fig. 8k and a printed USAF resolution chart in Fig. 8l. Since the ambient illumination is not uniform across the scene, the intensity fall-off from on-axis to off-axis (vignetting) is more obvious than in the self-illuminating case. The recovered image is still able to reveal sufficient details despite the complicated environmental light condition. Similarly, we evaluate the spatial resolution with the printed USAF resolution target. The cross lines in Fig. 8p indicate that the line pairs can be discriminated very well in the green and blue channels, while the red channel becomes less reliable for high-frequency line pairs. The differences between color channels also indicate that residual chromatic aberrations exist in the reconstructed images. We present a more thorough discussion of the prototype, including characteristic measurements, color reproduction, resolution analysis, and additional results, in Supplemental Document 1.

Advantages over existing designs
The proposed Voronoi-Fresnel lensless camera differentiates itself from existing lensless technologies that focus on either biomimetic optics or PSF engineering. Our design not only makes use of the compound-eye structure optically, but also facilitates the subsequent algorithm computationally. The constituent cells in our design do not work individually as in compound eyes, but form an optimized PSF collectively. Optical cross-talk between adjacent cells is not a concern, so no blocking layer is necessary. The reconstruction algorithm does not stitch sub-images, but solves an inverse problem based on the physical model. Most prominently, unlike the regular tessellations, our method features an optimized irregular structure. The number of cells is determined by the imaging performance, not by the final image pixel count.
Our Voronoi-Fresnel phase is mostly inspired by phase-only designs for PSF engineering, but pushes the optimality of the PSF structure further. The resulting PSF not only possesses the properties assumed in existing works, but also exhibits some unique features, such as more compact directional filtering, a non-trivial random yet uniform distribution, and an optimal number of diffraction-limited spots. As we have demonstrated, the distribution of the focusing units matters significantly, and a search over the number of units is also necessary to find the optimal solution. Another related optimization metric is the auto-correlation of the PSF, as used in Miniscope3D [19]. Since the auto-correlation is not a single number, it cannot be used directly as an optimization metric; a diffraction-limited MTF is necessary as a reference instead. We show a simple example in Supplemental Document 1 that, without a reference, this metric could lead to ambiguity. In contrast, our MTFv concept distills the spatial and spectral information in the PSF into a single figure-of-merit that can be used for numerical optimization. MTFv requires no reference value, and evaluates the amount of useful information by itself. Different from the custom MLAs in the computational miniature mesoscope [22,23] and the learned 3D lensless camera [24], the number and geometries of our Voronoi-Fresnel cells are determined automatically by the algorithm.
The proposed design brings additional benefits in several aspects. Since our design is a tessellation of the base Fresnel phase, it facilitates large-area design at high spatial resolution, which alleviates the sampling load of pixel-wise phase element optimization on megapixel sensors. The resulting smooth microstructures also ease fabrication requirements compared with the high-frequency random features in prior designs. Overall, the Voronoi-Fresnel design finds an in-between strategy that takes advantage of both compound-eye and PSF engineering methods.

Optical characteristics
Compared to its lens counterparts, the performance of the Voronoi-Fresnel lensless camera can also be characterized and analyzed by effective focal length, spatial resolution, FOV, and so on. Since the Voronoi-Fresnel phase is a collection of the same base Fresnel phase, the effective focal length is also the focal length of the base Fresnel phase. The FOV is determined by three factors, as shown in Fig. 9a. First, the image sensor usually has a cut-off angle θ_c, beyond which light is not sensed. Each Voronoi-Fresnel cell is limited by this cut-off angle; the outermost object point that the central cell V_0 can see is P_0 in the object space. Second, the marginal cell has a lateral center displacement h from the optical axis. This corresponds to an angular displacement of arctan(h/z). The equivalent half FOV is then

θ_half = θ_c + arctan( h/(m z) ),    (8)

where m = d/z is the magnification calculated from the object distance d and the phase-to-sensor distance z. In our experiments we did not remove the sensor cover glass, which imposes a minimum constraint on this distance. The resolution of lensless cameras is usually object-dependent. Since the base Fresnel phase is a first-order approximation of an ideal lens, each individual cell is close to diffraction-limited, so there exists a theoretical limiting resolution. However, variations in the aperture shapes do exist between different Voronoi-Fresnel cells. We therefore define an effective diameter D̄ for the base Fresnel phase by statistically calculating the root-mean-square (RMS) distance of all the vertices to their respective center locations. Assuming the i-th cell has M_i vertices, and there are N cells in total, the RMS diameter is

D̄ = 2 √[ (1/Σ_i M_i) Σ_i Σ_j ( (x_ij − x_i)² + (y_ij − y_i)² ) ],    (9)

where (x_i, y_i) are the center coordinates of the i-th cell, and (x_ij, y_ij) is the j-th vertex of the i-th cell. The equivalent limiting resolution can be evaluated according to the conventional Rayleigh criterion, i.e., the radius of the distinguishable spot is 1.22 λ z / D̄. For our prototype, the effective diameter D̄ is 233.5 μm, so the diffraction-limited spot diameter is 15.7 μm.
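The effective diameter and the Rayleigh spot size can be computed directly; in the sketch below, D̄ is taken as twice the RMS vertex-to-center distance, which is our reading of the definition above, and the function names are illustrative.

```python
import numpy as np

def rms_diameter(centers, cells):
    """Effective diameter: twice the RMS distance from every cell vertex
    to its cell's center location (our reading of Eq. (9)).

    centers: iterable of (x, y) cell centers.
    cells: iterable of (M_i, 2) vertex arrays, one per cell.
    """
    sq = []
    for (cx, cy), verts in zip(centers, cells):
        v = np.asarray(verts, dtype=float)
        sq.extend(((v[:, 0] - cx) ** 2 + (v[:, 1] - cy) ** 2).tolist())
    return 2.0 * float(np.sqrt(np.mean(sq)))

def rayleigh_spot_radius(wavelength, z, d_eff):
    """Rayleigh-criterion spot radius 1.22 * lambda * z / D for the
    effective aperture diameter d_eff at propagation distance z."""
    return 1.22 * wavelength * z / d_eff
```

For a square cell of side 2 centered on its site, all four vertices lie at distance √2, giving D̄ = 2√2 in the same units as the vertex coordinates.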
We evaluate the prototype resolution by imaging a cross-hair target and measuring the cross-section diameters. The measured resolution is 18.4 µm and 20.7 µm in the horizontal and vertical directions, respectively (see Supplemental Document 1). In practice, the final image quality is degraded by the complexity of the scene, the noise, and the algorithm efficacy. Owing to the blue-noise sampling, the optimized MTF shows minimal low-frequency components while preserving the mid and high frequencies well. This indicates that contrast in smooth objects is sacrificed to allow more mid- and high-frequency information to be recorded.

Depth-dependent PSFs
Similar to the PSFs in DiffuserCam [17] and PhlatCam [18], our PSF expands laterally when the object moves closer to the sensor, while remaining nearly constant when the object is far away. To demonstrate the PSF expansion, we simulate the depth-varying PSFs for the optimized example in Fig. 4c. The results are shown in Fig. 10. We label the centroids of the spots in the PSF at infinity in red as a reference; the centroids of the PSFs at the evaluated depths are shown in green, and the corresponding displacements are denoted with blue lines. The spot locations displace much more for the small object distance of 10 mm than for the larger distances of 100 mm and 1000 mm, with respect to the infinity PSFs. The lateral displacements of the spots make the PSFs at different depths distinct and less correlated with each other. Since the PSFs are depth-dependent, our Voronoi-Fresnel lensless camera is also capable of imaging 3D scenes. To demonstrate this, Fig. 11 shows an example of refocusing at various depths with the prototype. We place three objects in front of the lensless camera at 50 mm, 100 mm, and 200 mm, respectively. From a single shot, we can reconstruct images focused at the different depths. For example, in Fig. 11a the focus is on the letters "KAUST" at 50 mm, and the rest of the scene is severely blurred. In Fig. 11b the logo in the middle (100 mm) is in focus, the front letters are completely unrecognizable, and the drawing in the back is slightly blurred. Similarly, when the drawing in Fig. 11c is sharp, the other two objects are out of focus. We also note that in 2D photography applications the PSFs are often nearly constant. To encode depth, not only the geometric displacement of the PSFs but also their spectral variations could be employed; the example above shows only the geometric effect.
It is also possible to optimize the base phase in each cell to make the PSFs more suitable for 3D applications. This would require depth-aware optimization of the phase jointly with the reconstruction scheme; end-to-end optimization in a deep learning framework will be explored in future work.

Limitations and future work
The proposed Voronoi-Fresnel lensless camera also exhibits a few limitations. First, the two-step optimization algorithm is time-consuming due to the parameter sweep step. Although the scalability property helps estimate the correct range, more efficient algorithms are still needed to accelerate the process for large-area designs. Second, the FOV is currently limited by the cut-off angle of the sensor, which is mainly due to the microlens array embedded in the image sensor; the intensity fall-off may be a concern for wide-angle applications. It may be possible to design telecentric Voronoi-Fresnel phases to mitigate this, although at increased fabrication complexity. Another limitation of the current prototype is stray light. A common practice to control stray light in conventional lenses is to use baffles; however, it is difficult to implement this idea without sacrificing compactness. In apposition-type compound eyes, evolution has developed a screening pigment around each ommatidium that blocks stray light and cross-talk from entering the photoabsorbing rhabdom. Similar structures could be fabricated by etching deep trenches around the boundaries of the Voronoi-Fresnel cells and filling them with black material. Finally, an end-to-end framework that optimizes the optical element together with the reconstruction network is still lacking, and is worth exploring in future work.

Conclusion
We have demonstrated a compound-eye-inspired lensless camera with optimal information preservation in optics. Guided by the proposed Fourier domain metric, MTFv, we are able to tailor a spatially-coded Voronoi-Fresnel phase for better computational image reconstruction. Experimental results show the superior image quality of the prototype lensless camera in various illumination conditions. The proposed Voronoi-Fresnel lensless camera offers a simple yet cost-effective imaging solution for significantly reducing the volume of imaging devices, and its suitability for mass production makes it a promising candidate for applications such as fluorescence imaging, endoscopy, and the internet of things.

Diffractive lensless imaging with optimized Voronoi-Fresnel phase: supplemental document
This document provides supplementary information to "Diffractive lensless imaging with optimized Voronoi-Fresnel phase". Technical details of the Point Spread Function, Modulation Transfer Function, Centroidal Voronoi Tessellation, Voronoi-Fresnel phase optimization, image formation and reconstruction, prototype fabrication, and additional results are provided to support the findings in the main paper.

POINT SPREAD FUNCTION
Here we derive the mathematical expressions for the point spread functions (PSFs) in the main text. First, we inspect a single Voronoi-Fresnel cell V_i. Under the Fresnel approximation, the complex optical field P_i(x, y, λ) after propagating from the phase element to the image sensor plane is given by Eq. (S1), where we substitute ξ′ = ξ − ξ_i and η′ = η − η_i in the second line; the integral in the third line is exactly the Fourier transform of the aperture function A_i evaluated at the spatial frequencies ((x − ξ_i)/λz, (y − η_i)/λz). The corresponding PSF is then

PSF_i(x, y, λ) = PSF⁰_i(x − ξ_i, y − η_i, λ),   (S2)

where PSF⁰_i(x, y, λ) denotes the centered PSF, as if the aperture were located at the origin (S3). This implies that the PSF of the i-th Voronoi-Fresnel cell is a shifted version of the centered PSF. It is worth noting that the shape and distribution of the centered PSF depend on the geometry of the aperture functions. These apertures are finite-edge polygons, so the centered PSFs are compact yet highly directional filters, and it is important to diversify such directional filtering of the PSFs to achieve optimal performance. The effective PSF of the Voronoi-Fresnel lensless camera is obtained in the same way by taking the whole phase into account,

PSF(x, y, λ) = |Σ_{i=1}^{K} P_i(x, y, λ)|².   (S4)

Although the Voronoi cells share no intersections, V_i ∩ V_j = ∅ for all i ≠ j, and the individual aperture functions have no overlapping areas, the diffraction patterns P_i(x, y, λ) and P_j(x, y, λ) in general interfere with each other, so the cross terms in Eq. (S4) are not necessarily zero. We investigate the cross terms in two situations: one with a random distribution of adjacent Fresnel centers, and one with the centers at the centroids. A simple example is a rectangular space tessellated with two adjacent cells, as shown in Fig. S1. The optical parameters are the same as in Fig. 3 of the main paper.
When the centers of two adjacent cells are not located at the centroids, and lie very close to each other near the boundary, the sum of the individual PSFs differs from the true PSF predicted by Fresnel propagation. However, when the centers are at the centroids of the cells, the sum of the individual PSFs is approximately the same as the accurate PSF predicted by Fresnel diffraction; the maximum error is less than 1%. Hence, for the Centroidal Voronoi case, and for analysis purposes only, the entire PSF can be approximated by omitting the cross terms,

PSF(x, y, λ) ≈ Σ_{i=1}^{K} |P_i(x, y, λ)|² = Σ_{i=1}^{K} PSF_i(x, y, λ).   (S5)

Note that in the simulation and optimization we do not simulate individual PSFs and superpose them, but use the entire constructed phase function to obtain the panchromatic PSF. The above PSF is monochromatic; to obtain the panchromatic PSF, we simply integrate the spectral PSFs over the spectral range of interest,

PSF_pan(x, y) = ∫_{λ_min}^{λ_max} PSF(x, y, λ) dλ.   (S6)
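The size of the dropped cross terms can be probed numerically. A minimal sketch (Python; the fields here are toy complex amplitudes standing in for actual Fresnel-propagated patterns, and the function name is ours):

```python
import numpy as np

def cross_term_error(fields):
    """Peak relative error made by dropping interference cross terms,
    i.e. comparing |sum_i P_i|^2 with sum_i |P_i|^2 (the Eq. S5 form)."""
    fields = np.asarray(fields, dtype=complex)
    coherent = np.abs(fields.sum(axis=0)) ** 2        # true intensity
    incoherent = (np.abs(fields) ** 2).sum(axis=0)    # cross terms omitted
    return np.max(np.abs(coherent - incoherent)) / coherent.max()

# Disjoint diffraction patterns: the approximation is exact.
disjoint = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
# Fully overlapping, in-phase patterns: large cross-term error.
overlap = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```

As the text notes, well-separated (centroidal) spot patterns behave like the disjoint case, while crowded centers behave like the overlapping one.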

MODULATION TRANSFER FUNCTION
Here we provide a detailed derivation and analysis of the Modulation Transfer Function (MTF). The MTF is defined as the magnitude of the Optical Transfer Function (OTF), which is the Fourier transform of the PSF for incoherent imaging systems, with 0 ≤ MTF ≤ 1. Taking the Fourier transform of the above PSF and applying the translation property of the Fourier transform gives

OTF(f_X, f_Y) = Σ_{i=1}^{K} OTF⁰_i(f_X, f_Y) e^{−i2π(f_X ξ_i + f_Y η_i)},

where f_X and f_Y are the Fourier domain frequencies. Writing the individual OTFs in complex form, OTF⁰_i(f_X, f_Y) = M(f_X, f_Y) e^{iP_i(f_X, f_Y)}, the remaining phase delay terms are simplified as Ψ_i(f_X, f_Y) = P_i(f_X, f_Y) − 2π(f_X ξ_i + f_Y η_i), so the MTF can be rewritten as

MTF(f_X, f_Y) = M(f_X, f_Y) |Σ_{i=1}^{K} e^{iΨ_i(f_X, f_Y)}|.

This equation reveals three factors that affect the MTF: the diffraction by the similar apertures, which determines the M(f_X, f_Y) and P_i(f_X, f_Y) terms; the additional phase delay terms Ψ_i(f_X, f_Y) introduced by the spatial shifts from the centered PSFs; and the total number of Voronoi-Fresnel cells K.
In addition, we show how the MTFv is related to the Strehl ratio. The Strehl ratio is defined as the peak intensity ratio between the aberrated PSF and the diffraction-limited PSF [1],

S = I_ab(0, 0) / I_dl(0, 0),

where I_ab(0, 0) is the peak intensity of the aberrated PSF at the origin, and I_dl(0, 0) is the corresponding intensity of the diffraction-limited PSF. In practice it is difficult to use the Strehl ratio in this form as an optimization metric: optimizing the peak intensity of the PSF is challenging, and there may be multiple peaks in a composite system like ours. According to the definition of the Fourier transform, we can rewrite the above equation in the frequency domain,

S = ∫∫ OTF_ab(f_X, f_Y) df_X df_Y / ∫∫ OTF_dl(f_X, f_Y) df_X df_Y,

a quantity that can be calculated more easily with OTFs. Since the diffraction-limited OTF in the denominator is a fixed value for a given system, only the numerator matters. In addition, the OTF is complex-valued; for stable numerical computation we can replace the OTF with the MTF. Since the MTF is non-negative, the integral of the MTF is always no less than the magnitude of the integral of the OTF [2],

∫∫ MTF(f_X, f_Y) df_X df_Y ≥ |∫∫ OTF(f_X, f_Y) df_X df_Y|.

So finally we have

MTFv = ∫∫ MTF(f_X, f_Y) df_X df_Y.

We note that, strictly speaking, although the MTFv is related to the Strehl ratio, they are not identical. The MTFv is a more general term that can be calculated easily from the MTF, and a tractable quantity that measures the information collected by an optical system. Therefore we choose MTFv as the figure-of-merit for optimizing our lensless imaging system.
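As a concrete illustration, the MTFv of a sampled monochromatic PSF can be computed with a discrete Fourier transform. A minimal sketch (Python; the unit-energy normalization and the Riemann-sum frequency weighting are our assumptions):

```python
import numpy as np

def mtf_volume(psf, df=1.0):
    """MTFv of a 2-D intensity PSF: integrate MTF = |OTF| over the sampled
    frequency grid. A panchromatic MTFv would additionally integrate such
    volumes over wavelength."""
    otf = np.fft.fft2(psf / psf.sum())   # OTF: Fourier transform of unit-energy PSF
    return np.abs(otf).sum() * df ** 2   # MTF integrated as a Riemann sum
```

A delta-like PSF (all energy in one sample) attains the maximum MTFv on the grid, while a fully spread-out PSF scores the minimum, matching the intuition that MTFv measures preserved frequency content.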

CENTROIDAL VORONOI TESSELLATION
Finding the optimal tessellation of the 2D design space is essentially a sampling problem. Among various sampling methods, blue noise sampling [3,4] offers minimal low-frequency components and no concentrated spikes of energy, which are the properties required for our application. An effective way to achieve blue noise sampling is Centroidal Voronoi Tessellation (CVT) [5]. A CVT is a special Voronoi diagram in which the sites coincide with the mass centroids of the corresponding Voronoi regions. The mass centroid c_i of a Voronoi region V_i is defined as

c_i = ∫_{V_i} p τ(p) dp / ∫_{V_i} τ(p) dp,

where p is a point in Cartesian coordinates and τ is a given density function; we assume a constant density across the 2D plane for simplicity, i.e., τ ≡ 1. A CVT is a critical point of the energy function

E = Σ_{i=1}^{K} ∫_{V_i} τ(p) ‖p − c_i‖² dp.

There are various algorithms to minimize this energy function and generate an optimal CVT [5,6]. A classic method is to use Lloyd iterations [7] to update the Voronoi sites with their centroids until convergence. In each iteration, the mass centroids are computed for the current Voronoi regions, the sites are replaced by these centroids, and a new Voronoi tessellation is constructed. The process repeats until a convergence criterion is met.
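The Lloyd iteration above can be sketched in a discrete form, where nearest-site assignment on a pixel grid stands in for an exact Voronoi construction (Python; function name and grid discretization are ours):

```python
import numpy as np

def lloyd_cvt(sites, width, height, iters=50):
    """Discrete Lloyd relaxation toward a Centroidal Voronoi Tessellation
    with uniform density (tau = 1) on a width x height pixel grid."""
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    sites = np.asarray(sites, dtype=float)
    for _ in range(iters):
        # Nearest-site assignment = discrete Voronoi regions.
        d2 = ((pts[:, None, :] - sites[None, :, :]) ** 2).sum(-1)
        label = d2.argmin(axis=1)
        # Replace each site by the centroid of its region.
        for k in range(len(sites)):
            mask = label == k
            if mask.any():
                sites[k] = pts[mask].mean(axis=0)
    return sites
```

With two sites on a rectangular domain, the iteration converges to the centroids of the two half-rectangles, the discrete analogue of the two-cell example in Fig. S1.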

SCALABILITY
We find that the optimal number of Voronoi-Fresnel cells is linearly proportional to the design area. Here we analyze and validate this relationship for different design areas with various aspect ratios. For all the experimental designs below, we assume a fused silica substrate and design the phase at 550 nm. The distance between the phase and the sensor is 2 mm, and the phase pixel size is 1 µm. In Fig. S2a-e we show the optimization curves for designs with aspect ratios of 1:1, 4:3, 3:2, 16:9, and 21:9, respectively; these ratios cover common sensor shapes. For each aspect ratio, we start from a small area and optimize the Voronoi-Fresnel phase for various numbers of cells. We then double the size in both dimensions (quadrupling the area), and repeat this procedure 4 times for each aspect ratio. The best number of Voronoi-Fresnel cells in each design is obtained by fitting the data with a 5th-order polynomial and evaluating the cell number at which the MTFv peaks. These data are used in Fig. 5 of the main paper.
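The fit-and-peak step can be sketched as follows (Python; the function name and the fine-grid peak search are ours — only the 5th-order polynomial fit itself is from the text):

```python
import numpy as np

def best_cell_count(cell_counts, mtfv_values, degree=5):
    """Fit MTFv-vs-cell-count samples with a polynomial (5th order in the
    paper) and return the cell count where the fitted curve peaks."""
    coeffs = np.polyfit(cell_counts, mtfv_values, degree)
    fine = np.linspace(min(cell_counts), max(cell_counts), 10_000)
    return fine[np.argmax(np.polyval(coeffs, fine))]
```

On synthetic samples of a concave curve, the estimate lands on the true maximizer to within the search-grid spacing.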

IMAGE RECONSTRUCTION
A color image recorded on the image sensor is an integral of spectrally blurred images weighted by the color response of the sensor. This process can be expressed as

g_c(x, y) = ∫ q_c(λ) [f(x, y, λ) * h(x, y, λ)] dλ,   (S18)

where f(x, y, λ) is the latent spectral image at wavelength λ, h(x, y, λ) is the spectral PSF, and * denotes spatial convolution; q_c(λ) is the color response function of the sensor, and g_c(x, y) is the captured image in color channel c (for sensors with Bayer filters, c = r, g, b). If the imaging system were perfect, the spectral PSFs would all be identical Dirac delta functions, i.e., h(x, y, λ) = δ(x, y), and the ground-truth sharp image would be

f_c(x, y) = ∫ q_c(λ) f(x, y, λ) dλ.   (S19)

On the other hand, if the object is a spectrally uniform ideal point source, f(x, y, λ) = δ(x, y), the captured image is the color PSF, h_c(x, y) = ∫ q_c(λ) h(x, y, λ) dλ. Note that if the imaging system is approximately achromatic, h(x, y, λ) ≈ h(x, y), Eq. (S18) reduces to a per-channel 2D convolution. We recover the sharp image by total-variation-regularized deconvolution solved with ADMM, splitting the gradient term into an auxiliary variable z, where u is the scaled dual variable. The x-problem can be efficiently solved in the Fourier domain with an analytical solution. The z-problem has the closed-form solution z = S_κ(v), where S_κ(v) = (1 − κ/|v|)_+ v is an element-wise soft-thresholding operator. Finally, u is updated with the new x and z.
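The two closed-form updates can be sketched numerically. In this sketch (Python) we use an identity splitting z = x for brevity rather than the paper's TV splitting z = Dx, so the x-update denominator lacks the |D|² term; the function names and this simplification are ours.

```python
import numpy as np

def soft_threshold(v, kappa):
    """z-update: S_kappa(v) = (1 - kappa/|v|)_+ * v, element-wise."""
    mag = np.maximum(np.abs(v), 1e-12)            # guard against divide-by-zero
    return np.maximum(1.0 - kappa / mag, 0.0) * v

def x_update(G, H, V, rho):
    """Fourier-domain x-update for 0.5*||h*x - g||^2 + (rho/2)*||x - v||^2,
    with capitalized variables denoting 2-D FFTs:
    X = (conj(H)*G + rho*V) / (|H|^2 + rho), element-wise."""
    return (np.conj(H) * G + rho * V) / (np.abs(H) ** 2 + rho)
```

Because convolution diagonalizes under the FFT, the x-update is a single element-wise division per iteration, which is what makes the ADMM solver efficient.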
We show more comparison results in Fig. S3 for the three candidate designs, the diffuser PSF [8], the Perlin PSF [9], and our Voronoi-Fresnel PSF. The top two rows are from dataset [10] with lab setup scenes, and the bottom two rows are from the dataset [11] with natural indoor scenes. Our design outperforms the other two in both PSNR and SSIM.

COMPARISON WITH AUTO-CORRELATION
The auto-correlation of the PSF is a concept related to the MTF, and could in principle be used for the optimization of optical phase elements, as in Miniscope3D [12]. However, since the auto-correlation is not a single number, it cannot be used directly as an optimization metric, and a diffraction-limited MTF must be taken as the reference. A major difference of the proposed MTFv is that the spatial and spectral information in the PSF is distilled into a single number suitable for numerical optimization. Our MTFv metric requires no reference value, and evaluates the amount of useful information by itself.
In addition, we present an example showing that the auto-correlation can be inconsistent with final image quality in certain cases, whereas our MTFv metric is more directly related to the final performance. Here we use four PSFs with different auto-correlation properties. The first three are those used in Fig. 4 of the main paper; their auto-correlation functions are shown in Fig. S4a-c. The fourth is a white Gaussian noise pattern, which has an extremely sharp, Dirac-like auto-correlation function (Fig. S4d). By comparison, the rectangular-grid and hexagonal-grid PSFs exhibit very large support, whereas the optimized Voronoi-Fresnel PSF has a moderate shape. We also compute the corresponding MTFv values: 1.14 (rectangular), 1.19 (hexagonal), 2.29 (Voronoi-Fresnel), and 0.83 (Gaussian). To evaluate the final image quality for these PSFs, we reconstruct the sharp images using the same parameters. The reconstructed images are shown in Fig. S4e-h, with the PSNR and SSIM values below each. Our Voronoi-Fresnel PSF yields the best PSNR (32.4 dB) and SSIM (0.765), significantly better than the other three.
This example indicates that although the white Gaussian noise pattern has a strong peak in its auto-correlation, it fails to reconstruct a reasonable image. The rectangular and hexagonal PSFs show very large support in the auto-correlation, yet perform worse than the optimized Voronoi-Fresnel PSF. The result also emphasizes that the auto-correlation metric succeeds only with a proper reference quantity. By contrast, the proposed MTFv metric is consistently correlated with image quality and naturally rules out the white Gaussian noise PSF, making it a more robust metric for this design problem.

FABRICATION
The experimental samples are fabricated by a combination of photolithography and reactive-ion etching (RIE). The substrate is a 4-inch fused silica wafer with a thickness of 0.5 mm. We binarize the optimized Voronoi-Fresnel phase profile into 2⁴ = 16 levels, and repeat 4 iterations of the basic photolithography with different masks, followed by RIE with doubled etching time each iteration. The masks are fabricated on soda lime substrates by laser direct writing on a Heidelberg µPG 501 mask maker. In each iteration, the wafer is first cleaned in Piranha solution at 115 °C for 10 min and dried with N₂. A 150-nm-thick Chromium (Cr) layer is deposited by sputtering on one side of the wafer. A 0.6-µm-thick layer of photoresist AZ1505 is then spin-coated on top of the Cr after HMDS (Hexamethyldisilazane) vapor priming. To transfer the mask patterns to the wafer, we align the wafer with the mask on a contact aligner EVG6200∞ in the hard+vacuum mode, and then apply UV exposure with a dose of 9 mJ/cm².

The prototype Voronoi-Fresnel phase is designed at 550 nm for 2π modulation. Considering the fabrication resolution and the sensor pixel size, we fix an upsampling ratio of 3× for the optical element, i.e., a 1.15 µm feature size, which is well controlled by our fabrication method. The sensor has 1440 × 1080 pixels, and the Voronoi-Fresnel phase has 4320 × 3240 pixels. The optimized full-area phase profile is shown in Fig. S5a. Our sensor has a field-of-view of ±20° in the horizontal direction and ±15° in the vertical direction, so we design an aperture (Fig. S5b) that excludes the cells outside of the field-of-view. The optimal number of Voronoi-Fresnel cells from the parameter sweep step is 594, as shown in Fig. S5c. The number of effective cells within the field-of-view is 424, about 71.4% of the total.
The final presentation of the image requires some pre-processing and post-processing. We first demosaic the raw sensor data into a color image according to the sensor's Bayer layout. Before reconstruction, we normalize the blurred data by its norm in each color channel. For all experimental results, the weights are set to µ = 1 × 10⁻⁷ and ρ = 1 × 10⁻⁵. In post-processing, we use a simple gray-world algorithm in MATLAB (chromadapt) for automatic white balancing in the linear color space, with 10 percent of pixels excluded when estimating the illuminant. Finally, gamma correction (γ = 1.25) is applied.
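Outside MATLAB, the post-processing chain can be roughly approximated as below (Python). This is a simplified stand-in: `chromadapt` performs chromatic adaptation in a linear color space, which the channel scaling here only approximates, and we assume the gamma step maps out = in^(1/γ).

```python
import numpy as np

def gray_world_balance(img):
    """Gray-world white balance: scale each of the 3 channels of an
    (H, W, 3) image so all channel means equal the global mean."""
    means = img.reshape(-1, 3).mean(axis=0)
    return img * (means.mean() / means)

def gamma_correct(img, gamma=1.25):
    """Gamma correction on a [0, 1] image (out = in ** (1/gamma))."""
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)
```

Gray-world balancing assumes the average scene color is neutral, which is also the assumption behind the MATLAB pipeline described above.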
We present additional characterization results for the prototype in Fig. S6. First, we evaluate geometric distortion using a checkerboard target. As shown in Fig. S6a, the geometry is restored well across the ±20° × ±15° field-of-view. In the border regions, however, there are residual chromatic artifacts. These may arise from two factors: the off-axis aberrations of the base Fresnel phase, and the difference between on-axis and off-axis PSFs, which degrades the reconstruction quality. Second, we evaluate color fidelity using a color checker target in Fig. S6b. Despite the residual color artifacts in the border regions, the overall color fidelity is retrieved from the raw data; a more advanced white balancing algorithm than the simple gray-world algorithm used here could improve it further. Last, we evaluate the spatial resolution variation using a Siemens star target in Fig. S6c. The reconstructed image shows a uniform spatial resolution, demonstrating again that the MTF optimization is effective, with a uniform response.

Color reproduction remains a challenge in the current prototype. To analyze the color fidelity, we extract the color patches from the reconstructed image of the color checker (Fig. S6b), and tile them side by side in their original color-checker order to synthesize the image in Fig. S7a. Each extracted patch has 50 × 50 pixels. As a reference, we create a reference patch image of the same size from the corresponding true RGB values, as shown in Fig. S7b, with patch indices labeled. The color fidelity is measured as the color difference dE using the CIEDE2000 standard [13]. We visualize the pixel-wise color difference with the dE map in Fig. S7c. Since the dE map varies within each color patch due to the noisy reconstruction, we take the average value in each patch as the reconstructed color value.
The color difference dE values are then calculated and plotted in Fig. S7d. The maximum dE (largest color difference) is 31.6 at index 19, the "White" patch, and the minimum dE (smallest color difference) is 3.7 at index 10, the "Purple" patch. This agrees with visual perception.
We attribute the limited color fidelity of our prototype mainly to residual chromatic aberrations and to the simple white balancing algorithm we currently use. Two factors cause the residual chromatic aberrations. First, the fixed base Fresnel phase in each cell is designed at a single wavelength (550 nm in our prototype) and is inherently dispersive, a fundamental property of all diffractive optical elements. The PSF geometry therefore changes with the illumination wavelength, and hence the MTFv is also a function of wavelength; we optimize the spectral integral of the MTF, but not the cross differences between wavelengths. We evaluate this chromatic effect by plotting the MTFv against wavelength across the visible band from 400 nm to 700 nm, as shown in Fig. S7e. The MTFv drops by around 71% at both the short and long ends of the wavelength range, compared with the design wavelength of 550 nm. This could be mitigated by replacing the static Fresnel phase with an achromatically optimized base phase in each cell, at the cost of additional design effort. Second, the simple image reconstruction algorithm adopted here does not account for chromatic aberration correction: the total variation regularization only addresses spatial structures in the image. Employing neural networks for the reconstruction, together with more advanced white balance algorithms, could further alleviate the chromatic artifacts and improve the color fidelity.
To evaluate the resolution of the prototype, we capture a cross-hair target and reconstruct the final image. The raw data are shown in Fig. S8a, and the reconstructed image in Fig. S8b. We plot the cross-sections in the horizontal and vertical directions, and measure the spot diameter around each line as the width over which the intensity first reaches zero. Knowing the pixel size, we convert these to physical diameters. In the horizontal direction, the diameters are 17.25 µm, 17.25 µm, and 20.7 µm for the red, green, and blue channels, respectively, averaging 18.4 µm. In the vertical direction, the diameters are 20.7 µm, 17.25 µm, and 24.15 µm, averaging 20.7 µm. To compare with the theoretical value, we calculate the effective diameter of all the Voronoi-Fresnel cells; the ideal spot diameter is 15.7 µm, so the resolution of the prototype is indeed close to the theoretical value.
Additionally, we present more example results in Fig. S9. Again, the results in