Computational imaging with a highly parallel image-plane-coded architecture: challenges and solutions

This paper investigates a highly parallel extension of the singlepixel camera based on a focal plane array. It discusses the practical challenges that arise when implementing such an architecture and demonstrates that system-specific optical effects must be measured and integrated within the system model for accurate image reconstruction. Three different projection lenses were used to evaluate the ability of the system to accommodate varying degrees of optical imperfection. Reconstruction of binary and grayscale objects using system-specific models and Nesterov’s proximal gradient method produced images with higher spatial resolution and lower reconstruction error than using either bicubic interpolation or a theoretical system model that assumes ideal optical behavior. The high-quality images produced using relatively few observations suggest that higher throughput imaging may be achieved with such architectures than with conventional single-pixel cameras. The optical design considerations and quantitative performance metrics proposed here may lead to improved image reconstruction for similar highly parallel systems. 2015 Optical Society of America OCIS codes: (110.1758) Computational imaging; (110.3010) Image reconstruction techniques. References and links 1. R. G. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag. 24, 118-124 (2007). 2. M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. E. Kelly, R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Sig. Proc. Mag. 25(2), 83-91. (2008). 3. M. A. Neifeld, J. Ke, “Optical architectures for compressive imaging,” Appl. Opt. 46(22), 5293-5303 (2007). 4. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, D. S. Kittle, “Compressive coded aperture spectral imaging,” IEEE Sig. Proc. Mag. 31(1), 105-115 (2014). 5. R. M. Willett, R. F. Marcia, J. M. Nichols, “Compressed sensing for practical optical imaging systems: a tutorial,” Opt. Eng. 50(7), 072601 (2011). 6. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science 340(6134), 844-847 (2013). 7. Y. Wu, P. Ye, I. O. Mirza, G. R. Arce, D. W. Prather, “Experimental demonstration of an optical-sectioning compressive sensing microscope (CSM),” Opt. Express 18(24), 24565-24578 (2010). 8. V. Studer, J. Bobin, M. Chahid, H. S. Mousavi, E. Candes, M. Dahan, “Compressive fluorescence microscopy for biological and hyperspectral imaging,” Proc. Natl. Acad. Sci. 109(26), E1679-E1687 (2012). 9. M. S. Mermelstein, “Synthetic aperture microscopy,” Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, (1999). 10. J. Ryu, S. S. Hong, B. K. P. Horn, D. M. Freeman, and M. S. Mermelstein, “Multibeam interferometric illumination as the primary source of resolution in optical microscopy,” Appl. Phys. Lett. 88, 171112 (2006). 11. R. H. Shepard, C. Fernandez-Cull, R. Raskar, B. Shi, C. Barsi, H. Zhao, “Optical design and characterization of an advanced computational imaging system,” Proc. SPIE 9216, 92160A (2014). 12. R. Kerviche, N. Zhu, A. Ashok, “Information-optimal scalable compressive imaging system,” in Computational Imaging and Sensing Conference (Optical Society of America, 2014), paper CM2D.2. 13. J. Ke, E. Y. Lam, “Object reconstruction in block-based compressive imaging,” Opt. Express 20(20), 2210222117 (2012). 14. A. Mahalanobis, R. Shilling, R. Murphy, R. Muise, “Recent results of medium wave infrared compressive sensing” Appl. Opt. 53(34), 8060-8070, (2014). 15. J. Wang, M. Gupta, A. C. Sankaranarayanan, “LiSens A scalable architecture for video compressive sensing,” in Proceedings of IEEE International Conference on Computational Photography (2015). 16. M. Elad, “Optimized projections for compressed sensing,” IEEE Trans. Signal Process. 55(12), 5695–5702 (2007). 17. J. Ke, E. Y. Lam, P. Wei. "Binary sensing matrix design for compressive imaging measurements." Signal Recovery and Synthesis. Optical Society of America, (2014). 18. H. Arguello and G. R. Arce, “Rank minimization code aperture design for spectrally selective compressive imaging,” IEEE Trans. Image Process. 22(3), 941–954 (2013). 19. G. H. Golub, C. F. V. Loan, “Matrix computations,” Johns Hopkins University Press, Baltimore, MD, (2013). 20. M. Rudelson, R. Vershynin, “Non-asymptotic theory of random matrices: extreme singular values,” Proc. Int. Congr. of Mathematicians, 1-25, (2010). 21. M. Rudelson, R. Vershynin, “The Littlewood-Offord problem and invertibility of random matrices,” Adv. Math. 218(2),600-633 (2008). 22. R. Gu and A. Dogandzic, “A fast proximal gradient algorithm for reconstructing nonnegative signals with sparse transform coefficients”, in Proceedings of Asilomar Conference on Signals, Systems, and Computers, pp.1662–1667, (2014).


Introduction
Compressive sensing (CS) techniques can be implemented in applications where signal acquisition is sparse in some known domain [1]. Recently, such techniques have been applied to optical imaging systems. The CS framework states that high-fidelity images can be constructed with a larger number of pixels than are physically present in the optical sensor. An extreme example of CS-based imaging is the "single-pixel camera", which uses just one photodetector, yet can reconstruct images with several hundred thousand pixels [2]. The trade-off, of course, is that many sequential measurements of the object have to be made, each corresponding to a linear projection of the object intensity onto one of the elements in a set of known functions. With selection of an appropriate set of functions, a high-resolution image can be recovered from a low-resolution sensor through CS reconstruction methods. Singlepixel camera architectures typically implement these linear projections by placing a spatial light modulator (SLM) at a conjugate image plane ( Fig. 1(a)). The SLM is configured to impart a distinct high-resolution intensity or phase modulation onto the object field for each projection, with the resulting light signal integrated at the single photodetector. Due to the location of the SLM, we refer to this as image-plane coding (IPC). This is in contrast to coded-aperture architectures, which introduce a modulation at an aperture (Fourier) plane within the optical path [3,4].
Since their introduction, single-pixel cameras have demonstrated potential for applications ranging from remote sensing [5] to 3D imaging [6], and microscopy [7,8]. However, the single-pixel camera remains a highly-sequential measurement system, with consequent limitations on light collection efficiency and imaging speed. Several architechures have been proposed to potentially overcome these limitations by using multiple pixels of a focal plane array to implement highly parallel extensions of the single-pixel camera ( Fig. 1(b)) [9][10][11][12][13][14][15]. Measurement models and simulated data for these architectures have outlined the feasibility of the parallel approach [10][11][12][13]. Experimental implementations have also demonstrated some of the advantages to this strategy, particularly in the infrared spectral region where increasing the pixel density of imaging cameras can be challenging [14,15].
In this paper, we investigate several practical factors that arise when implementing compressive sensing with a parallel focal plane array architecture, particularly when the pixel size is small. Similar to the single pixel camera framework ( Fig. 1(a)), a coded mask is located at a conjugate image plane with multiple elements mapped to individual array pixels at the sensor ( Fig. 1(b)). We demonstrate that system-specific optical effects must be measured and integrated into the system model for accurate image reconstruction. Data is acquired using a benchtop platform ( Fig. 1(c)), which takes sequential measurements that are reconstructed into images. We introduce measurable, quantitative metrics to relate reconstruction accuracy to the hardware components and report representative figures and image data for our test platform. Finally, we verify that our findings translate to imaging real objects by presenting data for a 1951 USAF resolution target.

Methods
In this section, we describe the design and characterization of our experimental platform and formulate a mathematical model for compressive imaging with parallel IPC. We identify the challenges that arise with physical implementation of IPC-based CS and discuss how to address such challenges. We develop a mathematical model for IPC CS and describe how to measure and incorporate system-specific mapping parameters into the model.

Experimental platform
Our experimental platform uses a digital micromirror device (DMD) to provide twodimensional binary or grayscale mask patterns. The elements of binary mask patterns were generated from a Bernoulli distribution with parameter value 0.5, where an outcome of 0 indicates a mirror (mask element) that blocks all incident light and a 1 indicates a mirror that reflects all incident light. The elements of grayscale mask patterns consisted of random values uniformly distributed between 0 and 255, providing 256 levels of light modulation. While previous studies have explored the structure of measurement matrices, which dictates mask pattern selection [16][17][18], the focus of this report is on optical system design and CS implementation; random patterns were used here as an example of a common mask design.
We removed the projection lens from a Texas Instruments LightCrafter 4500 unit to provide direct access to the 1140 × 912 micromirror array. Each mirror element is 7.64 m square. The LightCrafter unit contains red, green, and blue LEDs for illumination. The DMD

Single pixel
Pixel array (a) is imaged onto a 14-bit CCD array with 1384 × 1036 pixels, each 6.45 m square (Point Grey Research, GRAS-14S5M-C). Since the object intensity is multiplied by the mask pattern in IPC, the locations of the object and mask in Fig. 1 are interchangeable and both configurations have been used previously for single-pixel cameras [2,7]. To maximize the impact of this work and focus on the optical architecture and subsequent image reconstruction, we generated objects synthetically and multiplied them by a number of mask patterns in software. We then uploaded sequences of these masked objects to the on-board flash memory of LightCrafter 4500 and projected them using the unit's internal illumination. All masked objects were imaged onto the CCD array by either a multi-element microscope objective (Olympus Plan N, 4/0.1), a plano-convex singlet (Thorlabs LA1422-A, 40 mm focal length), or an achromatic doublet (Thorlabs AC254-040-A, 40 mm focal length), each referred to as lens L p in Fig. 1(b). For each lens, an iris limited the entrance pupil diameter to 9 mm. As with the single-pixel camera and ensuing parallel architectures, the key to generating diverse observations for subsequent CS reconstruction is to image multiple mask elements onto each sensor pixel. The number of mask elements imaged onto each sensor pixel is termed the undersampling factor. In principle, a very large undersampling factor can be achieved by using a DMD with small mask elements, a sensor array with large pixels, or using a projection lens (L p ) with large demagnification factor. However, DMD manufacturing constraints, optical aberrations, and imperfect alignment of system components all represent practical limitations to achievable undersampling factors. For all experiments in our case, we used 3 × 3 binning of DMD mirrors to generate an effective mask element of 22.92 m square, with 2 × 2 binning on the CCD sensor to yield an effective pixel size of 12.9 m square. We positioned the projection lens to produce a 0.28 magnification from mask to sensor, resulting in a theoretical undersampling factor of 4 ( Fig. 2(a)). In practice, an exact integer undersampling factor is extremely difficult to achieve. Light from individual mask elements can leak onto sensor pixels neighboring the geometrically imaged pixel. A non-uniform or inexact magnification factor can lead to some mask elements mapped partially onto multiple sensor pixels. Optical aberrations result in a broadening of the point spread function (PSF) that in turn causes image blurring at the sensor. Other effects, such as distortion can also cause errors in mapping of mask elements to the sensor array ( 2(c)). Physical vibrations can cause components to drift out of position, altering the mask-topixel alignment. Collectively, these effects result in light from individual mask elements being captured by sensor pixels lying beyond the geometrically mapped pixel. However, we show that these mapping imperfections in the imaging system can be measured in a principled manner and then mathematically integrated into the CS framework for improved image reconstruction.

Mathematical model for parallel IPC
An image-plane-coded architecture first combines the object X with a mask C to form a pointwise modulated version (C ⨀ X) of the object intensity, where ⨀ denotes elementwise multiplication of entries. This modulated object is then imaged onto the sensor array to obtain an observation Y that depends on the undersampling factor d and a system-specific matrix H that describes the contributions of (and mapping imperfections from) different mask elements to individual sensor pixels. In an ideal and perfectly aligned paraxial system, H would be a matrix with binary entries that eliminates the boundary entries of Z . Then (1) where ⊗ denotes the Kronecker product, T H denotes an N × N block Toeplitz matrix with Toeplitz blocks constructed from the convolution kernel H, D C is an N × N diagonal matrix with the mask elements in C as its diagonal entries, and x denotes the N × 1 (columnwise) vectorized version of object X. In  vectorized version of H and Cx B is an n × m matrix whose i-th row consists of m pixels from the object-mask combination that get mapped to the i-th element of y (these m pixels include the geometrically mapped as well as the neighboring pixels). To incorporate more than one object-mask combination into the equation, the rows of Cx B and the entries of y can be appended accordingly. We solve this new system of linear equations by least squares to determine H for a particular system. Recall that a reliable least squares solution requires a well-conditioned matrix ( Cx B in this case) [19]. We also know from random matrix theory that random matrices are well-conditioned with very high probability [20,21]. Therefore, we use object-mask combinations having random entries, by virtue of which the matrix Cx B is also random and therefore well-conditioned, to determine H. To quantify the accuracy of the estimated H, we also define a metric termed the prediction error, which is the error between the experimentally observed and the mathematically predicted (through H using Eq. (1)

Results
The prediction errors for H with different values of m and for three different projection lenses are reported in Fig. 3. In each case, using an experimentally determined H leads to lower prediction error than using the "ideal" H (comprising four entries, all equal to 0.25), resulting in prediction errors of 13.1%, 8.0%, and 5.3% for the plano-convex, achromatic doublet, and microscope objective lenses, respectively. Fig. 3 shows that increasing m for an experimentally determined H initially lowers the prediction error to below 1% for either the achromatic doublet or microscope objective lenses. However, as m increases beyond around 200, the prediction error begins to increase for both these lenses, likely due to the mathematical model attempting to account for elements that make insignificant contributions to measured pixel values. The values of m that minimize the prediction error are marked as closed circles in Fig. 3. While the prediction error quantifies how well the estimated H captures the imperfections of the experimental setup, it does not directly indicate how close the system is to achieving perfect optical mapping. For this purpose we define photon transfer accuracy (PTA) as the sum of the four central entries in the system-specific mapping matrix H. We consider these four central values because our experimental platform uses an undersampling factor of two in each dimension. In principle, this means that exactly four elements from the coded mask are mapped to each individual pixel on the sensor array. In this ideal case, H is a 2 × 2 matrix with all values equal to 0.25 which results in a PTA value equal to 1. However, as illustrated in Fig. 2, the optical PSF may blur the mapping from mask elements to sensor elements, causing light which is intended for a specific sensor pixel to be captured by adjacent pixels. Therefore, in practice, the H matrix must be larger in size than the "ideal" H to account for photons contributing to each pixel which are intended for neighboring pixels. To quantify the proximity of the experimental H to the ideal H we compute the sum of the four central entries; the closer this sum (or PTA value) is to 1, the closer the system is to achieving ideal mask to sensor mapping and the narrower the optical PSF. In this regard, Fig. 3 shows that the microscope objective not only has the lowest prediction error for all sizes of H, but it also exhibits a PTA metric (0.77) that is closest to 1. Fig. 4. MTF data for the three lenses used in our experimental setup. Each projection lens was positioned for 3.55 demagnification from the coded mask to sensor array with a 9 mm entrance pupil diameter.
To further relate the properties of the estimated convolution kernel matrix H to the physical attributes of the projection lenses, we quantified the performance of each lens in terms of its "modulation transfer function" (MTF). MTF data were generated by displaying binary objects at the DMD with line pairs ranging from 65.5 lp/mm to 3.3 lp/mm. For these input objects, a CCD sensor with 1.67 m square pixels was used for measurements to ensure that the experimentally measured resolution was limited by the lens itself, rather than by the sampling at the sensor. Fig. 4 indicates that the objective lens MTF maintained the highest contrast ratio at all spatial frequencies tested, followed by the achromatic lens and then by the plano-convex lens. Therefore, the objective lens exhibits the narrowest PSF, giving rise to the least blurring at the sensor plane. This assessment of the quality of lenses is in agreement with the PTA metrics obtained from our experimentally determined H's (Fig. 3).
We have now established a mathematical model for the parallel IPC architecture (Eq. (1)) and discussed experimental estimation of the convolution kernel matrix H for different projection lenses. Next, we evaluate the performance of our setup in terms of quality of the image X reconstructed from the observations synthetic binary target as the object X (Fig. 5(a)). We collected and stacked 10 observations of the object X into a vector y using the microscope objective lens and 10 different pseudorandom binary masks. We then reconstructed a 64 × 64 X from these observations by making use of an optimization-based CS reconstruction technique that is based on Nesterov's proximal gradient (NPG) method [22]. We used this technique because of its ability to take into account the non-negativity constraint inherent in optical systems. This results in lower error in the reconstructed image when compared to other techniques that ignore the nonnegativity constraint. The measurement matrix H A used in these experiments corresponds to either the "ideal" 2 × 2 matrix H or an experimentally determined 6 × 6 matrix H (m = 36), (Fig. 3). Qualitative analysis of the NPG-based reconstruction (which used the "Daubechies-4 wavelet" as the sparsifying basis) illustrates superior image detail (Figs. 5(d) and 5(e)) over a single observation or bicubic interpolation. A single observation of the object without any post-processing results in loss of edge integrity and fine detail (Fig. 5(b)). In particular, the smallest bars at the top of the object become indistinguishable (Fig. 5(b)), which can also be quantified in terms of a loss of contrast in the corresponding line profile (Fig. 5(g)). Further, 4 bicubic interpolation of the image in Fig. 5(b) fails to recover the lost detail (Figs. 5(c) and 5(h)). In contrast, Figs. 5(d) and 5(e) show CS-based reconstructions of the object using H A with the ideal H ( Fig. 5(d)) and the experimentally determined H (Fig. 5(e)). When reconstructed using 10 observations with the ideal H, we can only partially recover the lost contrast at the top of the image and there are still visible artifacts throughout the reconstructed image (Figs. 5(d) and 5(i)). On the other hand, reconstruction using the same set of observations with the experimentally determined H recovers nearly full contrast and detail without the visible artifacts (Figs. 5(e) and 5(j)). We next tested the parallel image-plane-coded system on a more structurally complex, grayscale object ( Fig. 6(a)). We imaged the cameraman test pattern under the same conditions and subsequently reconstructed using the same methods as described for the binary target experiment. We once again see that reconstructions with the experimentally determined 6 × 6 matrix H and 14 × 14 matrix H (Figs. 6(e) and 6(f), respectively) result in sharper images with fewer artifacts than reconstruction with the ideal H (Fig. 6(d)). Furthermore, quantitative analysis of the (normalized) cumulative error between corresponding pixel values of the reconstructed image and the original object, i.e., the reconstruction error ( Fig. 6(g)), as Pixel value a function of the number of observations reveals that use of the experimentally determined H is more accurate than using the ideal 2 × 2 H for any number of observations, except when the observational diversity is low (< 4 observations). Note from Fig. 3 that the two experimental H matrices used in our reconstructions correspond to the ones with the lowest PTA (6 × 6 H) and the lowest prediction error (14 × 14 H). However, owing to only slight differences between the respective PTA values and prediction errors of these H matrices, we see that they perform near identically in terms of the reconstruction error. We next evaluated the relationship between quality of the projection lens and the matrix H. We tested different lenses to determine whether accounting for lens imperfections in the system-specific H allows a lower quality lens to achieve reconstruction accuracy that is comparable to that of a high quality lens. According to the MTF results (Fig. 4) and PTA data (Fig. 3), the microscope objective is a high quality lens and the plano-convex lens is lower quality, with the achromat lens being of intermediate quality. In this experiment, we performed CS reconstruction for each lens using H matrices of size 36, 36, and 676 for the objective, achromat, and plano-convex lens respectively (see Fig. 3). We can observe in Figs. 7(a) and 7(d) that the reconstruction error for the plano-convex singlet is significantly higher than that for the other lenses, consistent with the low area under the MTF curve (Fig. 4) and the low PTA metric (0.16) of this lens. On the other hand, the reconstruction errors for the achromatic lens ( Fig. 7(b)) and microscope objective lens (Fig. 7(c)) are very similar to each other for the case of experimentally determined H's, regardless of the number of observations ( Fig. 7(d)). This suggests that use of an appropriately calibrated H during reconstruction can compensate for imperfect lens performance and other factors (Fig. 2). However, in some cases, a lens' imperfections cannot be fully compensated by H, as shown by results for the plano-convex lens. Finally, we used the IPC-based CS architecture to image a physical object. We modified our experimental platform to include additional imaging optics and a white LED (Thorlabs MCWHL5) for illumination, as illustrated in Fig. 8(a). The LightCrafter 4500 unit was replaced by a LightCrafter 6500 unit, which offers a direct path for imaging real objects onto the DMD. Two achromatic doublet lenses (Thorlabs AC254-150-A, 150 mm focal length), were arranged to image the object onto the DMD plane with unit magnification. The DMD was then used solely to impose mask patterns onto the object, which in turn was projected to the CCD sensor by a 10 microscope objective (Olympus Plan F, 10/0.3) as the projection lens. An undersampling ratio of 4 was maintained. We measured the system specific mask-to-sensor mapping term H as in our earlier description, but now a uniform white object was used in place of the binary object displaying all ones. The results in Figs. 8(b)-(e) are for a 1951 USAF Resolution Target object printed on photographic paper (Edmund Optics, 53-715). Fig. 8(b) shows a direct image of the target with no mask imposed. Fig. 8(c) shows a reconstructed image following bicubic interpolation of the single image in Fig. 8(b). Figs. 8(d) and 8(e) are images reconstructed from 30 modulated measurements. The reconstructed images exhibit finer detail than those in Figs. 8(b) and 8(c). For example, the numbers down the left hand side of the image that are blurred in Fig. 8(b) are distinguishable in the reconstructed image in Fig. 8(e). Significantly, the image reconstructed using the experimentally estimated H (Fig. 8(e)) is of higher quality than the image reconstructed using an ideal H (Fig. 8(d)), further indicating the importance of including a system-specific mapping term in the reconstruction model.

Conclusions
We have described the theoretical and experimental aspects of an architecture for imageplane-coded computational imaging that is based on a highly parallel version of the singlepixel camera. We have established that NPG-based CS reconstruction can produce highresolution images of binary and grayscale objects using relatively few observations. Some of the challenges that arise when translating CS-based imaging theory to practice for focal plane arrays were identified and the relationship between hardware quality and computational compensation was quantitatively and qualitatively analyzed. An inferior quality lens can still provide high quality images if the convolution kernel matrix associated with the lens is accurately measured and integrated into the measurement matrix. This analysis of the optical and mathematical aspects of the parallel architecture may provide guidance for researchers developing IPC compressive systems and lead to more precise image reconstruction.