Compressive Sensing Based Active Imaging System Using Programmable Coded Mask and a Photodiode

Compressive sensing (CS) and its applications have progressed significantly in recent years. Here, we have developed a CS-based active imaging system that does not require a pixelated array detector but instead uses a transmissive spatial light modulator (SLM) as a programmable coded mask. Single-pixel cameras (SPCs) have many applications in fields such as terahertz (THz) and infrared (IR) imaging, where array detectors become expensive as the number of pixels increases. In this article, we explore the concept of active illumination, which could be helpful in developing SPCs. The article focuses on the development of an active SPC and on the analysis of how the coded masks affect the quality of the reconstructed image.


I. INTRODUCTION
In the conventional digital camera, photons incident at each pixel are converted into electrons and stored as a digital value in memory. For example, a 2.0-megapixel monochrome camera that uses an 8-bit analog-to-digital converter (ADC) requires 16 megabits of memory to store a picture. This consumes a considerable amount of memory and causes a significant delay when transferring the picture from one medium to another. To tackle this issue, images are compressed using appropriate compression algorithms, which can lower the storage cost or the transmission time [1], [2]. The selection of an algorithm depends upon the size of the image, the level of compression, and the information (quality) that needs to be maintained. Image compression algorithms essentially represent the image in terms of mathematical basis functions, such as those of the Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), and Wavelet Transform (WT), then store the crucial coefficients and discard the others [3], [4]. The main limitation of a conventional digital imaging system is that it generally uses a large detector array, which adds considerable cost in the image sampling and acquisition phase [5]. Another aspect of a digital camera is its spatial resolution, which is determined by the number of pixels per unit distance and therefore increases with the number of pixels. This quantity is also known as the sampling frequency, which is limited by the Nyquist criterion [6]. According to the Nyquist criterion, the sampling frequency should be greater than twice the highest spatial frequency present in the optical image to accurately preserve the spatial resolution in the resulting digital image [7], [8]. As the spatial resolution increases, the cost of the camera also increases accordingly [9].
CS [10], [11] addresses this problem and has emerged as an alternative to the traditional method for efficiently acquiring and reconstructing a signal [12], [13], [14], [15]. According to CS theory, images can be sampled at a much lower sampling frequency than the Nyquist criterion requires. Therefore, by reducing the sampling frequency, CS can reduce the imaging system's acquisition time and its overall weight, size, and cost [16], [17].
CS compresses the image during the sampling phase itself [18]. The necessary condition for compression is that the signal must be sparse: when the signal is represented in a suitable basis, only a few coefficients are non-zero. CS measures these non-zero coefficients by applying a suitable "coded mask" to the image [19]. The number of measurements should be slightly larger than the number of non-zero coefficients so that no important coefficient is missed [20]. CS ensures that the original signal can be reconstructed from these compressive measurements [21]. The SPC is an application of CS in digital imaging that replaces the millions of pixels of modern digital cameras [22], [23]. Remarkable progress has been observed in SPCs since they were first reported about one and a half decades ago by Duarte et al. [24], [25]. Recently, a review article by G. M. Gibson et al. [26] reported significant developments in single-pixel imaging in terms of modulation technology, sampling schemes, illumination techniques, and reconstruction algorithms. Some remarkable improvements have also been observed in the field of active SPCs. For example, F. Magalhães et al. [27] demonstrated an active single-pixel imaging system using structured and non-structured illumination techniques. Random coded masks were used in both illumination techniques, and the authors reported a Peak Signal to Noise Ratio (PSNR) of 11 dB at a 20% compression ratio. K. Ikeda et al. [28] demonstrated active single-pixel imaging using a diffractive pattern generated from a multi-core fiber laser. However, the authors did not report any evaluation of the reconstructed image quality. A hybrid sampling mechanism that combined structured illumination and scanning was proposed by K. Zhang et al. [29]. This approach used a spatially coded laser beam to scan the object, increasing the acquisition speed. The reconstructed image achieved a PSNR of 29.42 dB at a compression ratio of 25%.
In a different study, M. Sun et al. [30] used micro-scanning of coded masks at the subpixel level to produce high-resolution images. The authors reported that micro-scanning improved the image signal-to-noise ratio (SNR) by 50%. R. Horisaki et al. [31] demonstrated active-illumination single-pixel imaging for phase imaging without reference light. The study utilized an SLM to produce a random coded mask for compressive sampling, proposed a phase retrieval algorithm, and reported a PSNR of approximately 18 dB at an 8% compression ratio. Differential sampling in an active SPC for microscopy in the near-IR spectral region was demonstrated by O. Denk et al. [32]. In this study, a halogen lamp served as the active source, and compressive measurements were taken by capturing random projections of the light transmitted through the sample using dual photodiodes. The authors demonstrated a low-cost, temperature-stabilized IR microscopy concept but did not report any assessment of the quality of the reconstructed images. An active imaging system that combined an array of SPCs was proposed by Z. Wang et al. [33]. Through simulation, they demonstrated that adaptive sampling can produce a high-resolution image with a large field of view and rapid reconstruction. However, the authors reported neither the coded pattern used to modulate the digital micromirror device nor the quality of the reconstructed image. C. Xu et al. [34] also proposed an adaptive sampling method for active single-pixel imaging. They used orthogonal basis functions to spatially modulate the projector's light that illuminates the scene. The reconstructed images were evaluated in terms of PSNR, reported as 21.95 dB, and the structural similarity index (SSIM), reported as 0.804, at a compression ratio of 48.84%. Researchers worldwide have made efforts to improve the quality of images while achieving better compression rates.
Table I presents a comparative analysis of their progress in active single-pixel imaging.
So far, it has been observed that either the light source or the light scattered from the object was modulated for single-pixel imaging. As a result, the spatial light modulator is a crucial device that controls the amount of light used in imaging and has a significant impact on the quality of the final reconstructed image. To our knowledge, this article is the first to study the effect of the light transmission of the coded masks on the quality of reconstructed images in an SPC. The object is illuminated using a coherent light source, and the light scattered from the object is modulated using a computer-generated programmable coded mask. Various random coded masks are generated by varying the percentage of transmissive pixels in the SLM. The images are reconstructed using a total variation (TV) based image reconstruction algorithm. The CS measurements and the analysis of the reconstructed images are carried out in two ways:
i) Ascending number of CS measurements with constant coded mask transmission
ii) Ascending coded mask transmission with constant number of CS measurements
In the first method, images are reconstructed from an increasing number of CS measurements while the transmission of the coded aperture is kept constant. In the second method, images are reconstructed for increasing transmission of the mask patterns while the number of CS measurements is kept constant. The quality of the reconstructed images is assessed using SSIM [43], [44] to investigate the effect of the light transmission of the coded mask. An SSIM score of 0.86 is achieved at a compression rate of 30%, which surpasses the recently reported results listed in Table I.

A. Compressive Sensing
CS is an emerging data acquisition technique that enables the acquisition of signals at sampling rates below the Nyquist criterion [45], [46], [47]. CS substitutes traditional sampling with measurements of the inner products of the signal with a set of known vectors [48]. The original signal can then be estimated from these inner products and the known vectors. A signal is said to be compressible if it is sparse or sparsifiable in some transformed domain [49]. This characteristic of the signal is known as sparsity. Most natural signals are compressible in this sense, so compressive sampling can be used to acquire them [50]. CS theory mainly includes three steps: 1) sparse representation, 2) random projection, and 3) signal recovery [51], [52], [53], [54].

B. Sparse Representation
Let $x$ be a 2D image, viewed as an $N \times 1$ real vector $x = [x_1, x_2, x_3, \ldots, x_N]^T$, and let $\Psi = [\psi_1, \psi_2, \psi_3, \ldots, \psi_N]$ be an $N \times N$ orthonormal basis matrix, where each $\psi_i$ is a column vector. Then $x$ can be expressed in terms of its transform coefficients as

$$x = \sum_{i=1}^{N} \alpha_i \psi_i, \qquad (1)$$

where the coefficient vector $\alpha = [\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N]^T$ of $x$ in the basis $\Psi$ is sparse. Equivalently, $x$ is uniquely represented as

$$x = \Psi \alpha. \qquad (2)$$

If the basis $\Psi$ offers a $K$-sparse representation of $x$, then (1) can be rephrased as

$$x = \sum_{l=1}^{K} \alpha_{n_l} \psi_{n_l},$$

where $n_1, \ldots, n_K$ are the indices of the $K$ non-zero coefficients. Many bases are used for the sparse representation of a signal. The most widely used are the discrete wavelet transform (DWT) and the discrete cosine transform (DCT) [55].
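As a minimal sketch of the sparse representation described above, the following NumPy example builds an orthonormal DCT basis, computes the coefficients $\alpha = \Psi^T x$, and keeps only the $K$ largest ones. The function names, the 1-D signal of length 64, and the coefficient values are illustrative assumptions, not taken from the experiment; a 2-D image would first be vectorized.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the constant row so that c @ c.T = I
    return c


def k_sparse_approximation(x, psi, k):
    """Keep the k largest-magnitude coefficients of x in the orthonormal basis psi."""
    alpha = psi.T @ x                      # analysis: alpha = Psi^T x
    keep = np.argsort(np.abs(alpha))[-k:]  # indices of the k largest coefficients
    alpha_k = np.zeros_like(alpha)
    alpha_k[keep] = alpha[keep]
    return alpha_k, psi @ alpha_k          # synthesis: x_k = Psi alpha_k


n = 64
psi = dct_matrix(n).T                      # columns psi_i are the basis vectors
alpha_true = np.zeros(n)
alpha_true[[1, 4, 9, 20, 33]] = [3.0, -2.0, 1.5, 1.0, -0.5]  # hypothetical 5-sparse signal
x = psi @ alpha_true
alpha_k, x_k = k_sparse_approximation(x, psi, k=5)
```

Because the example signal is exactly 5-sparse in the DCT basis, keeping five coefficients reproduces it; for a natural image the same truncation would give an approximation whose error shrinks as $K$ grows.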

C. Compressive Measurement
To take compressive measurements of the real-valued image $x$, $M$ ($M \ll N$) linear measurements are taken against a set of vectors $\{\phi_j\}_{j=1}^{M}$. Each measurement is the inner product of the image $x$ and a $1 \times N$ row vector $\phi_j$, giving $y_j = \langle x, \phi_j^T \rangle$. Thus, the compressive measurements $y = [y_1, y_2, y_3, \ldots, y_M]^T$ are given by

$$y = \Phi x. \qquad (6)$$

By plugging (2) into (6), the final equation of the system can be represented by

$$y = \Phi \Psi \alpha = A \alpha, \qquad (7)$$

where $\Phi$ is the $M \times N$ measurement matrix, and $A = \Phi \Psi$ is called the sensing matrix. It is essential to construct the measurement matrix so that it satisfies the restricted isometry property (RIP), because this allows a unique recovery of the sparse signal with high probability [56], [57]. This is possible because a matrix with the RIP approximately preserves the pairwise distances between sparse signals when they are projected from the high-dimensional to the low-dimensional space [58], [59]. The number of measurements required for successful recovery of the image must satisfy $M \geq cK \log(N/K)$, where $c$ is some constant and $K$ is the sparsity level [60], [61].
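The measurement model can be illustrated with a short NumPy sketch. It assumes small illustrative sizes, a random orthonormal matrix as a stand-in for the basis Ψ, and binary 0/1 rows for Φ; the constant `c=1.0` and the natural logarithm in `min_measurements` are placeholders, since the text only states that c is some constant.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 64, 8                        # illustrative sizes, not the experimental ones

# Stand-in orthonormal basis Psi (Q factor of a random matrix) and a k-sparse alpha.
psi, _ = np.linalg.qr(rng.standard_normal((n, n)))
alpha = np.zeros(n)
alpha[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x = psi @ alpha                             # signal, x = Psi alpha

# Each row phi_j of Phi is one binary mask; y_j is its inner product with x.
phi = rng.integers(0, 2, size=(m, n)).astype(float)
y = phi @ x                                 # compressive measurements, y = Phi x
a = phi @ psi                               # sensing matrix, A = Phi Psi


def min_measurements(k: int, n: int, c: float = 1.0) -> int:
    """Evaluate the bound M >= c K log(N / K) for an assumed constant c."""
    return math.ceil(c * k * math.log(n / k))
```

The same vector `y` is obtained from `a @ alpha`, which is the relation the recovery step relies on.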

D. Sparse Signal Recovery
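Estimating $\alpha$ by solving (7) with the least-squares method does not give a unique solution [62], because with $M < N$ the system is underdetermined and the inverse problem is ill-posed. Candes, Romberg, and Tao showed that the coefficient vector $\alpha$ of the image $x$ can be reconstructed from its compressive measurements $y$ with high probability by solving the convex optimization problem [63] given by:

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad y = \Phi \Psi \alpha.$$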
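To make the recovery step concrete, the sketch below uses the iterative shrinkage-thresholding algorithm (ISTA) on the ℓ1-regularized least-squares form of the problem, $\min_\alpha \tfrac{1}{2}\|A\alpha - y\|_2^2 + \lambda\|\alpha\|_1$. This is a swapped-in textbook solver chosen for brevity, not the algorithm used in the experiments, and the Gaussian sensing matrix, problem sizes, `lam`, and `n_iter` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1 (element-wise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(a: np.ndarray, y: np.ndarray, lam: float = 0.02, n_iter: int = 5000) -> np.ndarray:
    """Minimize 0.5 * ||a @ alpha - y||^2 + lam * ||alpha||_1 by ISTA."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2   # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(a.shape[1])
    for _ in range(n_iter):
        grad = a.T @ (a @ alpha - y)         # gradient of the data-fidelity term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha


rng = np.random.default_rng(1)
n, m, k = 128, 80, 4                         # illustrative sizes
a = rng.standard_normal((m, n)) / np.sqrt(m) # assumed Gaussian sensing matrix
support = rng.choice(n, size=k, replace=False)
alpha_true = np.zeros(n)
alpha_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = a @ alpha_true                           # noiseless measurements

alpha_hat = ista(a, y)
rel_err = np.linalg.norm(alpha_hat - alpha_true) / np.linalg.norm(alpha_true)
```

The regularized form trades a small bias in the recovered amplitudes for a simple iteration; the equality-constrained problem stated above corresponds to the limit of vanishing `lam`.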

III. EXPERIMENTAL SETUP
A schematic of the experimental setup used for active single-pixel imaging is shown in Fig. 1. The corresponding photograph of the experimental setup is shown in Fig. 2. A commercially available diode laser (50 mW, λ = 650 nm) is used as the active source. The laser beam is diverged using a lens of short focal length (3 cm) to illuminate the entire scene. The scattered light from the scene is collected using a lens (f/1.8) and focused on the SLM plane, as shown in Fig. 2.
A twisted nematic liquid-crystal SLM (LC2002) from Holoeye generates the coded masks. The SLM has 800 × 600 pixels with a 32 μm pitch and 256 grey levels. The pixels of the device are modulated electrically and controlled through VGA signals from a desktop computer (i7-4790 CPU @ 3.60 GHz, 4 GB RAM). We used the polarizer-SLM-analyzer configuration to operate the SLM as an amplitude modulator. The polarizer and the analyzer are calibrated so that the 0° position blocks horizontally polarized light and transmits orthogonally polarized light. These positions are located through a manual optimization process.
A 10× objective lens (NA 0.25) is used to collect the light modulated by the SLM and focus it onto the photodetector (Texas Instruments OPT101). The OPT101 is a large-area (2.29 mm × 2.29 mm) monolithic photodiode integrated with a trans-impedance amplifier. It offers a wide operating voltage range, from 2.7 V to 36 V; we have operated the detector at 5 V. The detector's responsivity is 0.45 A/W at 650 nm, with a quiescent current of 120 μA. The photodiode is interfaced with the analog input channel of Advantech's PCI-based ADC card (PCI-1747U). The card samples at up to 250 kS/s with 16-bit resolution, which is sufficient for the data acquisition. The computer's display resolution is set to 800 × 600 to match the SLM resolution, so that there is a one-to-one pixel mapping. The application program creates a 300 × 300 pixel window at the center of the display, which is mapped to the 64 × 64 coded mask pattern. The mask patterns are based on pre-generated binary random matrices of size 64 × 64, whose entries are drawn from the Bernoulli distribution. The 1's and 0's in a binary matrix represent transmissive and opaque pixels, respectively. As the number of transmissive pixels increases, the transmittance of the coded mask also increases. Each 64 × 64 matrix is flattened and stored as one row of the measurement matrix Φ. A set of measurement matrices is generated by varying the fraction of transmissive pixels to study the effect of the transmission of the coded aperture on the quality of the reconstructed image.
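The mask generation described above can be sketched as follows. This is a minimal illustration, not the authors' application code: the function names are hypothetical, and the nearest-neighbour mapping from the 64 × 64 mask to the 300 × 300 display window is an assumption, since the text does not state how the resizing is done.

```python
import numpy as np


def make_measurement_matrix(n_masks, side=64, transmission=0.5, seed=None):
    """Draw Bernoulli(transmission) binary masks and store each one as a row of Phi."""
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_masks, side, side)) < transmission).astype(np.uint8)
    return masks.reshape(n_masks, side * side)   # 1 = transmissive, 0 = opaque


def upscale_mask(row, side=64, window=300):
    """Map one row of Phi back to a side x side mask and stretch it to the display window.

    Nearest-neighbour index mapping is assumed here for illustration only.
    """
    mask = row.reshape(side, side)
    idx = (np.arange(window) * side) // window    # source pixel for each display pixel
    return mask[np.ix_(idx, idx)]


phi = make_measurement_matrix(n_masks=100, transmission=0.3, seed=0)
frame = upscale_mask(phi[0])                      # pattern to render in the SLM window
```

Varying the `transmission` argument from 0.3 to 0.8 would produce measurement matrices analogous to the sets of assorted transmission used in the second study.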

IV. RESULTS AND DISCUSSION
As the experimental setup in Fig. 2 shows, the laser beam is diverged using a small lens so that it overfills the scene. The collection lens collects the scattered light from the scene and forms the image at the center of the SLM in such a way that it fits inside the mask pattern displayed by the application program, as shown in Fig. 3.
The application program is developed in the Win32 framework using OpenGL and multi-threading. OpenGL is an application programming interface (API) used for rendering computer graphics, and a thread is a lightweight unit of execution that runs instructions independently. The thread in our application calls the function that renders the coded masks as many times as the number of CS measurements specified. For each coded mask pattern, the output level of the photodiode is converted into digital format using the ADC card and stored in the computer memory for further processing. Two objects are used as the input scenes: 1) the letters "IITD" printed on white paper with a black background, of size 3 cm × 1 cm, and 2) the letter "T" cut out of white paper (2.2 cm²). The photographs of these objects, captured using a smartphone camera, are shown in Fig. 4(a) and (b), respectively.
During the CS measurements, the application program selects a row from the measurement matrix Φ and displays it on the SLM after reshaping it into a 64 × 64 matrix to create a mask pattern. The light intensity on the photodetector is converted into digital format and stored in the computer as the column vector y. The TV minimization by augmented Lagrangian and alternating direction algorithm (TVAL3) [64] is used to reconstruct the image from the CS measurements. TVAL3 is commonly used for single-pixel imaging because it offers a better reconstruction rate and quality than predecessor algorithms such as l1-magic, NESTA, and TwIST [65], which is why we have employed it in this study [66]. About 1230 CS measurements (30% of the 64 × 64 image) are recorded based on the measurement matrix of 50% transmission. Images are reconstructed from numbers of CS measurements that range from 5% to 30% of the total number of pixels in steps of 5%. The reconstructed images are shown in Fig. 5.
Fig. 4. Picture of the actual scene: (a) printout of 'IITD' on paper, (b) 'T'-shaped paper cutout, (c) ground truth image of study i, (d) ground truth image of study ii.
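The measurement counts implied by these compression ratios follow from simple arithmetic on the 64 × 64 image. The sketch below computes them; rounding to the nearest integer is an assumption, since the text only quotes approximate counts.

```python
import numpy as np

n_pixels = 64 * 64                                   # 4096 pixels in the reconstructed image
ratios_percent = np.arange(5, 35, 5)                 # 5%, 10%, ..., 30%
counts = np.rint(n_pixels * ratios_percent / 100).astype(int)

# Pair each compression ratio with its number of CS measurements.
schedule = dict(zip(ratios_percent.tolist(), counts.tolist()))
```

For 30% this gives 1229 measurements, consistent with the "about 1230" recorded in the first study, and for 10% it gives 410.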
The quality assessment of the reconstructed images is carried out using SSIM, which is a well-established and commonly used technique for measuring the similarity between two images [67], [68], [69], [70]. The SSIM of two images $x$ and $y$ is calculated using (10):

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \qquad (10)$$

where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, respectively; $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$, respectively; $\sigma_{xy}$ is the covariance of $x$ and $y$; and $c_1$ and $c_2$ are two constants that stabilize the division when the denominator is weak and that depend on the dynamic range of the pixel values.
The best reconstructed image, which is taken as the ground truth for the subsequent SSIM calculations, is obtained with full sampling: the light from the scene is modulated with 4096 coded patterns, one per pixel of the 64 × 64 image, and the image is reconstructed from the corresponding photodetector signal. The SSIM scores are therefore computed between images generated with the same optical setup. The ground truth images for Fig. 4(a) and (b) are provided in Fig. 4(c) and (d), respectively. The values of the SSIM scores vary from 0 to 1. SSIM scores are calculated for the reconstructed images and plotted against the number of CS measurements used for the reconstruction, as shown in Fig. 6. It is found that as the number of CS measurements increases, the SSIM scores increase accordingly. The results of our study are encouraging, and the design can also be applied in other imaging fields.
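As a minimal sketch of (10), the function below evaluates the SSIM expression once over the whole image. Two points are assumptions rather than details from the text: the constants are set to the commonly used values $c_1 = (0.01L)^2$ and $c_2 = (0.03L)^2$ for a dynamic range $L$, and the statistics are global, whereas many SSIM implementations average the same expression over local windows.

```python
import numpy as np


def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Evaluate the SSIM expression once, using whole-image statistics."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * dynamic_range) ** 2               # assumed stabilizing constants
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()              # population variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()    # population covariance
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.float64)  # synthetic stand-in image
degraded = np.clip(reference + rng.normal(0.0, 25.0, size=reference.shape), 0, 255)

score_same = global_ssim(reference, reference)
score_degraded = global_ssim(reference, degraded)
```

Comparing an image with itself gives a score of 1, and any degradation lowers the score, which is the behaviour used here to rank the reconstructions against the fully sampled ground truth.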
To determine the compression rate at which the SSIM score reaches saturation, a second-order polynomial curve fit (red dotted line, Fig. 6) is employed. It is observed that when the compression rate exceeds 20%, the SSIM score stabilizes at its maximum value. Another set of CS measurements is recorded based on a set of measurement matrices of assorted transmission. The transmission of the measurement matrices is varied from 30% to 80% in steps of 10%. One matrix from each set is shown in Fig. 7. Around 410 CS measurements (10% of the 64 × 64 image) are recorded from each set of measurement matrices. The reconstructed images are shown in Fig. 8.
SSIM scores are calculated for the reconstructed images and plotted against the percentage of transmission of the coded masks used during the CS measurements, as shown in Fig. 9. It is observed that the best image is reconstructed when the transmission percentage of the coded mask is 50%. A second-order polynomial curve fit (green dotted line, Fig. 9) is used to characterize the trend of the SSIM score in relation to the transmission of the coded aperture. The results indicate that the maximal SSIM score is achieved when the transmission of the coded aperture ranges from 50% to 60%.
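A second-order polynomial fit of this kind, and the transmission at which the fitted curve peaks, can be sketched with NumPy. The SSIM values below are synthetic placeholders generated from a known parabola purely to exercise the code; they are not the measured scores from Fig. 9.

```python
import numpy as np

transmission = np.array([30.0, 40.0, 50.0, 60.0, 70.0, 80.0])   # percent, as in the second study
# Synthetic placeholder scores on an exact parabola peaking at 55% (NOT measured data).
ssim_scores = 0.80 - 2.0e-4 * (transmission - 55.0) ** 2

# Fit s(t) = a t^2 + b t + c; np.polyfit returns the highest-order coefficient first.
a, b, c = np.polyfit(transmission, ssim_scores, deg=2)

# For a concave parabola (a < 0) the maximum lies at the vertex t = -b / (2a).
peak_transmission = -b / (2.0 * a)
peak_score = np.polyval([a, b, c], peak_transmission)
```

With real scores, the vertex of the fitted parabola gives a single estimate of the best-performing transmission between the sampled values.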

V. CONCLUSION
The single-pixel camera, using the active imaging modality described in this article, can reconstruct the image of a scene by illuminating it with a monochromatic light source. A comprehensive analysis of the quality of the reconstructed images with respect to the light transmission of the coded masks is presented. The analysis indicates that the optimal quality of the reconstructed image occurs when the transmission of the coded mask is 50%, resulting in an SSIM score of 0.86. It was also observed that the quality of the reconstructed image increases as the number of CS measurements increases. The transceiver-mode active illumination process described looks promising, as it provides a compact and affordable platform for military, medical, and low-intensity conflict applications. In addition to static frames, single-pixel imaging may be explored for dynamic objects in the future, where the dynamic scene can be captured as video from longer distances. Short-pulsed lasers and compatible detectors with suitable optical filters may be used to capture the dynamic scene. Additionally, the application of faster projection and synchronized readout techniques may enhance the temporal resolution. The source pulse, the projection mechanism for the coded mask, and the signal at the detector can be well synchronized by suitably selecting the hardware components. The quality of the reconstructed image can be further improved by using sophisticated sampling schemes and optimization techniques and by utilizing a priori information about the object.